
Thursday, June 18, 2015

How to use Spark in an IPython notebook


Note: This assumes you have the Anaconda Python distribution installed.
1. Create an IPython profile for the Spark configuration:

             
    ipython profile create spark

2. This creates a profile_spark folder inside your .ipython directory.
3. Now create the file C:\Users\Bellamkonda\.ipython\profile_spark\startup\00-pyspark-setup.py and add the following:

import os
import sys

# Configure the environment
if 'SPARK_HOME' not in os.environ:
    os.environ['SPARK_HOME'] = 'C:\\Spark\\spark-1.3.1'  # insert your Spark location

# Create a variable for our root path
SPARK_HOME = os.environ['SPARK_HOME']

# Add PySpark and py4j to the Python path
sys.path.insert(0, os.path.join(SPARK_HOME, "python", "build"))
sys.path.insert(0, os.path.join(SPARK_HOME, "python"))
# Note: prebuilt Spark downloads have no python/build folder; in that case, add
# the bundled py4j zip instead (check python\lib for the exact file name; the
# version below is an assumption):
sys.path.insert(0, os.path.join(SPARK_HOME, "python", "lib", "py4j-0.8.2.1-src.zip"))


4. Now start an IPython notebook with the profile we just created:
   
    ipython notebook --profile spark
              
5. The above command will launch the IPython notebook server with the spark profile.
6. Check that PySpark works by importing its libraries or running commands; a short sanity check is sketched below.
        Example: from pyspark import SparkContext
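
As a quick end-to-end test (a minimal sketch assuming local mode; the app name "SanityCheck" is arbitrary), run the following in a notebook cell:

        from pyspark import SparkContext

        # Create a local SparkContext and run a trivial job to confirm the setup.
        sc = SparkContext("local", "SanityCheck")
        print(sc.parallelize(range(100)).sum())  # should print 4950
        sc.stop()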

Source: https://districtdatalabs.silvrback.com/getting-started-with-spark-in-python



Tuesday, May 19, 2015

How to install Apache Spark on a Windows machine

1) Download Apache Spark from http://spark.apache.org/downloads.html
2) Extract the files and open the README.md documentation.
3) Building Spark with Maven requires Maven 3.0.4 or newer, Java 6+, and Scala.
4) Install the JDK and Maven (http://maven.apache.org/download.cgi) and set up their paths in the system variables, as sketched below.
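
    From a Command Prompt the paths can be set like this for the current session (the install locations below are assumptions; substitute your own, and use the System Properties dialog to make them permanent):

        set JAVA_HOME=C:\Program Files\Java\jdk1.7.0_79
        set MAVEN_HOME=C:\apache-maven-3.3.3
        set PATH=%PATH%;%JAVA_HOME%\bin;%MAVEN_HOME%\bin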
5) Run mvn -DskipTests clean package to build the package.
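
    If you need a build against a specific Hadoop version, the Spark build documentation describes Maven profiles for this; a sketch assuming Hadoop 2.4:

        mvn -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests clean package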
    
6) Install Scala from http://www.scala-lang.org/
7) Add Scala to the system PATH as well.


8) After the build completes, invoke the Spark shell using ./bin/spark-shell (on Windows, bin\spark-shell.cmd).
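
    Once the shell starts, a SparkContext is already available as sc, so a quick Scala one-liner such as sc.parallelize(1 to 1000).count() should return 1000 and confirms that the build works.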