
I have a Python script written with a SparkContext and I want to run it. I tried to integrate IPython with Spark, but I could not get that to work. So I tried setting the Spark path [ Installation folder/bin ] as an environment variable and called the spark-submit command at the cmd prompt. I believe it is finding the Spark context, but it produces a very long error. Can someone please help me with this issue?

Environment variable path: C:/Users/Name/Spark-1.4;C:/Users/Name/Spark-1.4/bin

After that, in cmd prompt: spark-submit script.py

[screenshot of the error output]
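For reference, what Python actually sees for these variables can be checked directly; SPARK_HOME below is only set if you configured it, so None is possible:

import os

# Print what the interpreter actually sees; None means the variable is not set
print(os.environ.get('SPARK_HOME'))
print(os.environ.get('PATH'))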

Brian Spiering
SRS

3 Answers


I'm fairly new to Spark, and have figured out how to integrate it with IPython on Windows 10 and 7. First, check your environment variables for Python and Spark. Here are mine: SPARK_HOME: C:\spark-1.6.0-bin-hadoop2.6\. I use Enthought Canopy, so Python is already integrated in my system path. Next, launch Python or IPython and use the following code. If you get an error, check what you get for 'spark_home'. Otherwise, it should run just fine.

import os
import sys

spark_home = os.environ.get('SPARK_HOME', None)
if not spark_home:
    raise ValueError('SPARK_HOME environment variable is not set')

sys.path.insert(0, os.path.join(spark_home, 'python'))
## may need to adjust on your system depending on which Spark version you're using and where you installed it
sys.path.insert(0, os.path.join(spark_home, 'python/lib/py4j-0.9-src.zip'))

execfile(os.path.join(spark_home, 'python/pyspark/shell.py'))  # Python 2 only; see the Python 3 variant below

[screenshot: pySpark on IPython]
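Once shell.py has run, a SparkContext named sc should be available in the session. A minimal smoke test, assuming the bootstrap above succeeded:

# Assumes the bootstrap above has created `sc`
rdd = sc.parallelize(range(100))
print(rdd.map(lambda x: x * 2).take(5))  # expect [0, 2, 4, 6, 8]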

Jon

Johnnyboycurtis's answer works for me. If you are using Python 3, use the code below; his code doesn't work in Python 3. I am editing only the last line of his code.

import os
import sys

spark_home = os.environ.get('SPARK_HOME', None)
print(spark_home)
if not spark_home:
    raise ValueError('SPARK_HOME environment variable is not set')

sys.path.insert(0, os.path.join(spark_home, 'python'))
## may need to adjust on your system depending on which Spark version you're using and where you installed it
sys.path.insert(0, os.path.join(spark_home, 'python/lib/py4j-0.9-src.zip'))

filename = os.path.join(spark_home, 'python/pyspark/shell.py')
exec(compile(open(filename, "rb").read(), filename, 'exec'))
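For what it's worth, the standard library's runpy can achieve the same effect as that exec line; a sketch assuming SPARK_HOME is set and sys.path has been extended as above:

import os
import runpy

spark_home = os.environ['SPARK_HOME']
filename = os.path.join(spark_home, 'python', 'pyspark', 'shell.py')
# Run shell.py and merge its resulting names (sc, sqlContext, ...) into this session
globals().update(runpy.run_path(filename))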
user2543622
  • I have been using the code provided by user2543622 successfully, but recently encountered a problem with the following error message. Do you know what went wrong? Thanks. Exception: Java gateway process exited before sending the driver its port number – user27155 Dec 16 '16 at 21:16
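That exception usually means PySpark could not launch the JVM; a common culprit is a missing or misconfigured JAVA_HOME. A quick way to check from Python (illustrative only):

import os
import subprocess

# None here would explain why the Py4J gateway cannot start
print(os.environ.get('JAVA_HOME'))

# Confirm a Java runtime is reachable from this shell (prints its version)
subprocess.call(['java', '-version'])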

Finally, I resolved the issue. I had to add the pyspark location to the PATH variable and the py4j-0.8.2.1-src.zip location to the PYTHONPATH variable.
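For reference, roughly the same effect can be achieved from inside a script rather than through system settings; the paths below are illustrative and must match your own installation:

import os
import sys

# Illustrative paths; point these at your own Spark installation
spark_home = 'C:/Users/Name/Spark-1.4'
os.environ['SPARK_HOME'] = spark_home
sys.path.insert(0, os.path.join(spark_home, 'python'))  # makes `import pyspark` work
sys.path.insert(0, os.path.join(spark_home, 'python', 'lib', 'py4j-0.8.2.1-src.zip'))  # what PYTHONPATH provided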

SRS