
Thursday, July 21, 2016

How to integrate Jupyter Notebook and Apache Spark on Windows? [Tested on Spark 1.6.1]


  1. Download Anaconda Python and install it (https://www.continuum.io/downloads)
  2. Open a command prompt and run "ipython notebook" or "jupyter notebook"
  3. Create a new Python notebook and paste in the commands below

import os
import sys

# Point Spark at your local installation
os.environ['SPARK_HOME'] = "C:/Spark1.6.1/spark-1.6.1-bin-hadoop2.6"
# Tell Spark where to find Java (set JAVA_HOME; adding the JDK to sys.path does nothing)
os.environ['JAVA_HOME'] = "C:/Program Files/Java/jdk1.8.0_73"

# Make the PySpark and py4j modules importable
sys.path.append("C:/Spark1.6.1/spark-1.6.1-bin-hadoop2.6/bin")
sys.path.append("C:/Spark1.6.1/spark-1.6.1-bin-hadoop2.6/python")
sys.path.append("C:/Spark1.6.1/spark-1.6.1-bin-hadoop2.6/python/pyspark")
sys.path.append("C:/Spark1.6.1/spark-1.6.1-bin-hadoop2.6/python/lib")
sys.path.append("C:/Spark1.6.1/spark-1.6.1-bin-hadoop2.6/python/lib/pyspark.zip")
sys.path.append("C:/Spark1.6.1/spark-1.6.1-bin-hadoop2.6/python/lib/py4j-0.9-src.zip")

from pyspark import SparkContext
from pyspark import SparkConf

# Create a local SparkContext named "test"
sc = SparkContext("local", "test")

Replace the SPARK_HOME path with the location of your Spark installation, and update the remaining paths (and the JDK path) to match your setup.
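If you'd rather not repeat the installation folder six times, here is a minimal sketch (my own variation, not from the original steps) that derives the same entries from the single SPARK_HOME value set above. The glob pattern for the py4j zip is an assumption, since the version number varies between Spark releases:

import glob
import os
import sys

SPARK_HOME = os.environ['SPARK_HOME']  # set earlier in the notebook

# Build the sys.path entries from one variable instead of hard-coding each
sys.path.append(os.path.join(SPARK_HOME, "python"))
sys.path.append(os.path.join(SPARK_HOME, "python", "lib", "pyspark.zip"))

# Match the py4j zip by pattern so the version number doesn't matter
for zip_path in glob.glob(os.path.join(SPARK_HOME, "python", "lib", "py4j-*-src.zip")):
    sys.path.append(zip_path)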

Testing

textFile = sc.textFile("README.md")  # load a local text file as an RDD
textFile.count()                     # count its lines
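If README.md is not sitting in the notebook's working directory, the call above will fail; a file-free sanity check like this one (my addition) exercises the context just as well:

# Sum the integers 0..99 without touching the filesystem; expect 4950
sc.parallelize(range(100)).sum()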



Thursday, June 18, 2015

How to use Spark in an IPython notebook


Note: This assumes you have installed the Anaconda Python distribution.
1. Create an IPython notebook profile for the Spark configuration.

             
    ipython profile create spark

2. This creates a folder named profile_spark inside your IPython directory.
3. Now create the file C:\Users\Bellamkonda\.ipython\profile_spark\startup\00-pyspark-setup.py and add the following:

import os
import sys

# Configure the environment
if 'SPARK_HOME' not in os.environ:
    os.environ['SPARK_HOME'] = 'C:/Spark/spark-1.3.1'  # insert your Spark location

# Create a variable for our root path
SPARK_HOME = os.environ['SPARK_HOME']

# Add PySpark/py4j to the Python path
sys.path.insert(0, os.path.join(SPARK_HOME, "python", "build"))
sys.path.insert(0, os.path.join(SPARK_HOME, "python"))


4. Now start an IPython notebook with the profile we just created:
   
    ipython notebook --profile spark
              
5. The above command will launch your IPython notebook with the Spark profile loaded.
6. Check that PySpark is available by importing its libraries or running a command.
        Example: from pyspark import SparkContext
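For a slightly fuller smoke test (a minimal sketch of mine, not from the source article), create a local context and run a trivial job:

from pyspark import SparkContext

# Create a local context and run a trivial job to confirm the setup works
sc = SparkContext('local', 'profile-test')
print(sc.parallelize(range(100)).sum())  # expect 4950
sc.stop()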

Source: https://districtdatalabs.silvrback.com/getting-started-with-spark-in-python