
I'm using Spark (1.5.1) from an IPython notebook on a MacBook Pro. After installing Spark and Anaconda, I start IPython from a terminal by executing: IPYTHON_OPTS="notebook" pyspark. This opens a web page listing all my IPython notebooks. I can select one of them, opening it in a second web page. SparkContext (sc) is already available, and my first command in the notebook is help(sc), which runs fine. The problem is that I'm getting a Java heap space error that I don't know how to address. How do I view my current Java heap setting, and how do I increase it within the context of my setup? The error message I'm getting follows:

Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 19 in stage 247.0 failed 1 times, most recent failure: Lost task 19.0 in stage 247.0 (TID 953, localhost): java.lang.OutOfMemoryError: Java heap space

3 Answers


You can manage Spark memory limits programmatically, through the API.

Since the SparkContext is already available in your notebook, you can read the current driver memory setting:

sc._conf.get('spark.driver.memory')
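If you want to see everything the running context picked up, not just the driver memory, SparkConf.getAll() returns every key/value pair. A minimal sketch:

# Print every configuration entry on the current SparkContext
for key, value in sorted(sc._conf.getAll()):
    print(key, '=', value)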

You can set it as well, but you have to shut down the existing SparkContext first:

from pyspark import SparkConf, SparkContext

# Stop the existing context before creating a new one with more memory
sc.stop()

conf = SparkConf().setAppName("App")
conf = (conf.setMaster('local[*]')
        .set('spark.executor.memory', '4G')
        .set('spark.driver.memory', '45G')
        .set('spark.driver.maxResultSize', '10G'))
sc = SparkContext(conf=conf)
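You can then read the value back from the new context to confirm it took effect; for example, with the sketch above:

# Should print 45G, the value set on the rebuilt context
print(sc._conf.get('spark.driver.memory'))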

If your workload is the same for every analysis, then editing spark-defaults.conf, as described in another answer here, is the way to go.


I solved it by creating a spark-defaults.conf file in apache-spark/1.5.1/libexec/conf/ and adding the following line to it: spark.driver.memory 14g

That solved my issue. But then I ran into another problem: exceeding the max result size of 1024 MB. The solution was to add another line to the same file: spark.driver.maxResultSize 2g
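For reference, a minimal spark-defaults.conf covering both settings from this answer would look like the sketch below (the values are just what worked on my machine; adjust them to yours):

# conf/spark-defaults.conf
spark.driver.memory        14g
spark.driver.maxResultSize 2g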


Just use the config option when building the SparkSession (as of Spark 2.4):

from pyspark.sql import SparkSession

# Single value reused for both executor and driver memory
MAX_MEMORY = "5g"

spark = SparkSession \
    .builder \
    .appName("Foo") \
    .config("spark.executor.memory", MAX_MEMORY) \
    .config("spark.driver.memory", MAX_MEMORY) \
    .getOrCreate()
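To double-check that the session applied the setting, you can read it back (assuming the spark session created above):

# Should print 5g if the builder config was applied
print(spark.conf.get("spark.driver.memory"))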