How to use Spark in an IPython notebook
Note: This assumes you have the Anaconda Python distribution installed.
If not, install it from https://store.continuum.io/cshop/anaconda
1. Create an IPython notebook profile for the Spark configuration (see the command after step 2).
2. This will create a profile_spark folder inside your IPython directory.
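Assuming the standard IPython profile workflow described in the source article, the profile in step 1 can be created from a command prompt:
ipython profile create spark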
3. Now create the file C:\Users\Bellamkonda\.ipython\profile_spark\startup\00-pyspark-setup.py (adjust the user directory to your own) and add the following:
import os
import sys

# Configure the environment
if 'SPARK_HOME' not in os.environ:
    os.environ['SPARK_HOME'] = r'C:\Spark\spark-1.3.1'  # insert your Spark location

# Create a variable for our root path
SPARK_HOME = os.environ['SPARK_HOME']

# Add the PySpark/py4j modules to the Python path
sys.path.insert(0, os.path.join(SPARK_HOME, "python", "build"))
sys.path.insert(0, os.path.join(SPARK_HOME, "python"))
4. Now start an IPython notebook with the profile we just created:
ipython notebook --profile spark
5. The above command launches your IPython notebook server.
6. Verify that PySpark works by importing its libraries or running a command (a fuller check is sketched below).
Example: from pyspark import SparkContext
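For a slightly fuller sanity check you can build a SparkContext and run a small job; the local master and app name here are illustrative choices, not part of the original post:

from pyspark import SparkContext

sc = SparkContext("local[*]", "SanityCheck")  # hypothetical app name, local mode
print(sc.parallelize(range(100)).sum())       # should print 4950
sc.stop()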
Source: https://districtdatalabs.silvrback.com/getting-started-with-spark-in-python