Saturday, June 27, 2015

How to upgrade Spark 1.3.1 to Spark 1.4.0

1) Download Spark 1.4.0 from https://spark.apache.org/downloads.html
2) Check the dependencies (Scala 2.11, Maven 3.3.3) using the commands
    scala -version
    mvn -version
3) Build Spark with Apache Maven by running
    mvn -DskipTests clean package
4) Wait until the build succeeds; the process takes around 45 minutes.
5) Check the Scala and Python shells using the commands
    ./bin/spark-shell   (Scala)
    ./bin/pyspark       (Python)
   A quick version check from the PySpark shell is shown after this list.
6) If you need to use Spark 1.4 with an IPython notebook, follow the steps in the post below, pointing SPARK_HOME at the new Spark 1.4.0 location.
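
To confirm the shells picked up the new build, here is the quick check mentioned in step 5, run from inside ./bin/pyspark (sc is the SparkContext the shell creates for you):

    sc.version   # should print u'1.4.0' after the upgrade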

Thursday, June 18, 2015

How to use Spark in an IPython notebook


Note: This assumes you have the Anaconda Python distribution installed.
1. Create an IPython profile for the Spark configuration.

             
    ipython profile create spark

2. This creates a profile_spark folder in your IPython directory.
3. Now create the file C:\Users\Bellamkonda\.ipython\profile_spark\startup\00-pyspark-setup.py and add the following:

import os
import sys
# Configure the environment
if 'SPARK_HOME' not in os.environ:
    os.environ['SPARK_HOME'] = r'C:\Spark\spark-1.3.1'  # insert your Spark location
# Create a variable for our root path
SPARK_HOME = os.environ['SPARK_HOME']
# Add the PySpark/py4j to the Python Path
sys.path.insert(0, os.path.join(SPARK_HOME, "python", "build"))
sys.path.insert(0, os.path.join(SPARK_HOME, "python"))


4. Now start up an IPython notebook with the profile we just created:
   
    ipython notebook --profile spark
              
5. The above command launches the IPython notebook server.
6. Verify PySpark by importing its libraries or running a command, for example:
        from pyspark import SparkContext
   A fuller smoke test is sketched below.
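
As that fuller smoke test, here is a minimal sketch you can paste into a notebook cell; the app name NotebookTest and the local[*] master are illustrative choices, not from the original post:

    from pyspark import SparkContext

    # Create a local SparkContext (assumes no other context is already running)
    sc = SparkContext("local[*]", "NotebookTest")

    # Run a tiny job end to end: count the even numbers in 0..99
    rdd = sc.parallelize(range(100))
    print(rdd.filter(lambda x: x % 2 == 0).count())  # expect 50

    sc.stop()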

Source: https://districtdatalabs.silvrback.com/getting-started-with-spark-in-python