
I have a Python script written with a SparkContext and I want to run it. I tried to integrate IPython with Spark, but I could not get that to work. So I tried setting the Spark path [ Installation folder/bin ] as an environment variable and called the spark-submit command at the cmd prompt. I believe it is finding the Spark context, but it produces a very long error. Can someone please help me with this issue?

Environment variable path: C:/Users/Name/Spark-1.4;C:/Users/Name/Spark-1.4/bin

After that, in cmd prompt: spark-submit script.py

[screenshot of the error output]
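For reference, what Python actually sees for these variables can be checked directly; SPARK_HOME below is only set if you configured it, so None is possible:

import os

# Print what the interpreter actually sees; None means the variable is not set
print(os.environ.get('SPARK_HOME'))
print(os.environ.get('PATH'))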

Brian Spiering
SRS

3 Answers


I'm fairly new to Spark, and have figured out how to integrate it with IPython on Windows 10 and 7. First, check your environment variables for Python and Spark. Here are mine: SPARK_HOME: C:\spark-1.6.0-bin-hadoop2.6\. I use Enthought Canopy, so Python is already integrated in my system path. Next, launch Python or IPython and use the following code. If you get an error, check what you get for 'spark_home'. Otherwise, it should run just fine.

import os
import sys

spark_home = os.environ.get('SPARK_HOME', None)
if not spark_home:
    raise ValueError('SPARK_HOME environment variable is not set')

sys.path.insert(0, os.path.join(spark_home, 'python'))
## may need to adjust on your system depending on which Spark version you're using and where you installed it
sys.path.insert(0, os.path.join(spark_home, 'python/lib/py4j-0.9-src.zip'))

execfile(os.path.join(spark_home, 'python/pyspark/shell.py'))  # Python 2 only; see the Python 3 variant below

[screenshot: pySpark on IPython]
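Once shell.py has run, a SparkContext named sc should be available in the session. A minimal smoke test, assuming the bootstrap above succeeded:

# Assumes the bootstrap above has created `sc`
rdd = sc.parallelize(range(100))
print(rdd.map(lambda x: x * 2).take(5))  # expect [0, 2, 4, 6, 8]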

Jon

Johnnyboycurtis's answer works for me. If you are using Python 3, use the code below; his code doesn't work in Python 3. I am editing only the last line of his code.

import os
import sys

spark_home = os.environ.get('SPARK_HOME', None)
print(spark_home)
if not spark_home:
    raise ValueError('SPARK_HOME environment variable is not set')

sys.path.insert(0, os.path.join(spark_home, 'python'))
## may need to adjust on your system depending on which Spark version you're using and where you installed it
sys.path.insert(0, os.path.join(spark_home, 'python/lib/py4j-0.9-src.zip'))

filename = os.path.join(spark_home, 'python/pyspark/shell.py')
exec(compile(open(filename, "rb").read(), filename, 'exec'))
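For what it's worth, the standard library's runpy can achieve the same effect as that exec line; a sketch assuming SPARK_HOME is set and sys.path has been extended as above:

import os
import runpy

spark_home = os.environ['SPARK_HOME']
filename = os.path.join(spark_home, 'python', 'pyspark', 'shell.py')
# Run shell.py and merge its resulting names (sc, sqlContext, ...) into this session
globals().update(runpy.run_path(filename))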
user2543622
  • I have been using the code provided by user2543622 successfully, but recently encountered a problem with the following error message. Do you know what went wrong? Thanks. Exception: Java gateway process exited before sending the driver its port number – user27155 Dec 16 '16 at 21:16
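That exception usually means PySpark could not launch the JVM; a common culprit is a missing or misconfigured JAVA_HOME. A quick way to check from Python (illustrative only):

import os
import subprocess

# None here would explain why the Py4J gateway cannot start
print(os.environ.get('JAVA_HOME'))

# Confirm a Java runtime is reachable from this shell (prints its version)
subprocess.call(['java', '-version'])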

Finally, I resolved the issue. I had to add the pyspark location to the PATH variable and the py4j-0.8.2.1-src.zip location to the PYTHONPATH variable.
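For reference, roughly the same effect can be achieved from inside a script rather than through system settings; the paths below are illustrative and must match your own installation:

import os
import sys

# Illustrative paths; point these at your own Spark installation
spark_home = 'C:/Users/Name/Spark-1.4'
os.environ['SPARK_HOME'] = spark_home
sys.path.insert(0, os.path.join(spark_home, 'python'))  # makes `import pyspark` work
sys.path.insert(0, os.path.join(spark_home, 'python', 'lib', 'py4j-0.8.2.1-src.zip'))  # what PYTHONPATH provided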

SRS