Thursday, June 18, 2015

How to use Spark in the IPython notebook


Note: This guide assumes you have the Anaconda Python distribution installed.
1. Create an IPython notebook profile for the Spark configuration.

    ipython profile create spark

2. This creates a folder named profile_spark in your IPython directory.
3. Now create the file C:\Users\Bellamkonda\.ipython\profile_spark\startup\00-pyspark-setup.py and add the following:

import os
import sys

# Configure the environment: point SPARK_HOME at your Spark installation
if 'SPARK_HOME' not in os.environ:
    os.environ['SPARK_HOME'] = r'C:\Spark\spark-1.3.1'  # insert your Spark location

# Create a variable for our root path
SPARK_HOME = os.environ['SPARK_HOME']

# Add PySpark/py4j to the Python path
sys.path.insert(0, os.path.join(SPARK_HOME, "python", "build"))
sys.path.insert(0, os.path.join(SPARK_HOME, "python"))
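
Depending on how your Spark distribution was built, the py4j library that PySpark relies on may ship as a zip under python/lib instead of python/build. The following is a hedged addition you can append to the same 00-pyspark-setup.py; the glob pattern and the py4j zip name are assumptions that vary by Spark version, so adjust them to match your install:

import glob
import os
import sys

SPARK_HOME = os.environ['SPARK_HOME']

# Some Spark builds bundle py4j as a zip under python/lib; the exact
# filename (e.g. py4j-0.8.2.1-src.zip) differs between Spark versions.
for py4j_zip in glob.glob(os.path.join(SPARK_HOME, "python", "lib", "py4j-*.zip")):
    sys.path.insert(0, py4j_zip)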


4. Now start up the IPython notebook with the profile we just created:
   
    ipython notebook --profile spark
              
5. The above command launches the IPython notebook server with the spark profile.
6. Check that PySpark works by importing its libraries or running a command, as in the example and the sanity check below.
        Example: from pyspark import SparkContext
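
If the import succeeds, a quick way to confirm the whole setup is to run a tiny job in a notebook cell. This is a minimal sketch assuming Spark 1.x, where the notebook does not create a SparkContext for you; the "local[*]" master and the application name are just illustrative choices:

from pyspark import SparkContext

# Create a local SparkContext; "local[*]" uses all available cores
# and "notebook-test" is an arbitrary application name.
sc = SparkContext("local[*]", "notebook-test")

# A tiny job to confirm PySpark can actually run work end to end.
rdd = sc.parallelize(range(100))
print(rdd.filter(lambda x: x % 2 == 0).count())  # expected output: 50

sc.stop()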

Source: https://districtdatalabs.silvrback.com/getting-started-with-spark-in-python


