Monday, May 16, 2016

Install Apache Spark on Mac/Linux using prebuilt package

If you do not want to run Apache Spark on Hadoop, standalone mode is what you are looking for. Here are the steps to install and run Apache Spark on Mac/Linux in standalone mode.

1. Java is a prerequisite for running Apache Spark. Install Java 7 or later; if Java is not present, download it from here.
If Java is already installed, run the following command to verify the Java version:

$ java -version
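If you want to script this check, the sketch below parses the quoted version string that `java -version` prints (to stderr) and compares its major version with 7. It assumes the usual formats, such as `1.7.0_79` for Java 8 and earlier and `11.0.2` for Java 9 and later; the function name `java_major_version` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the Java major version for a version string.
# Handles the legacy "1.<major>.<minor>_<update>" scheme (Java 8 and earlier)
# and the "<major>.<minor>.<patch>" scheme (Java 9 and later).
java_major_version() {
  local v="$1"
  local major="${v%%.*}"        # text before the first dot
  if [ "$major" = "1" ]; then
    v="${v#1.}"                 # drop the leading "1."
    major="${v%%.*}"
  fi
  echo "$major"
}

# `java -version` writes to stderr, so redirect it before parsing.
if command -v java >/dev/null 2>&1; then
  ver=$(java -version 2>&1 | awk -F '"' '/version/ {print $2; exit}')
  major=$(java_major_version "$ver")
  if [ -n "$major" ] && [ "$major" -ge 7 ] 2>/dev/null; then
    echo "Java $ver found: meets the Java 7+ requirement"
  else
    echo "Java $ver found: Java 7 or later is required" >&2
  fi
else
  echo "java not found on PATH: install Java 7 or later" >&2
fi
```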

2. Download Scala. Choose the first option, "Download Scala x.y.z binaries for your system".
Untar the Scala tar file using the following command:

$ tar xvf scala-2.11.8.tgz

3. Use the following command to move the Scala directory to /usr/local/scala:

$ sudo mv scala-2.11.8 /usr/local/scala
Password:
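If you prefer to script the untar-and-move steps, here is a minimal sketch. It assumes the archive holds everything under a single top-level directory (true for the Scala and Spark tarballs used here) and reads that directory name from the archive itself, so it cannot be mistyped. The function name `install_tarball` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract a .tgz whose contents live under a single
# top-level directory and move that directory to DEST (e.g. /usr/local/scala).
install_tarball() {
  local tarball="$1" dest="$2" top workdir
  # First path component of the first archive entry; awk consumes the whole
  # listing, so tar never receives SIGPIPE on large archives.
  top=$(tar tzf "$tarball" | awk -F/ 'NR == 1 {print $1}')
  if [ -z "$top" ]; then
    echo "could not read $tarball" >&2
    return 1
  fi
  if [ -e "$dest" ]; then
    echo "$dest already exists; remove it first" >&2
    return 1
  fi
  workdir=$(mktemp -d) || return 1
  tar xzf "$tarball" -C "$workdir" || return 1
  mv "$workdir/$top" "$dest" || return 1
  rmdir "$workdir"   # fails if the archive had extra top-level entries
}

# Example (run the whole script with sudo when DEST is under /usr/local):
# install_tarball scala-2.11.8.tgz /usr/local/scala
```

Because `sudo` cannot call a shell function directly, put the function and the call in a script and run that script with `sudo`. The same helper works for the Spark archive in step 7.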

4. Set PATH for Scala.

$ export PATH=$PATH:/usr/local/scala/bin
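An `export` typed at the prompt only affects the current terminal session. To keep the setting for new shells, the line can be appended to a startup file such as `~/.bash_profile` (common on macOS with bash) or `~/.bashrc` (common on Linux); which file your shell actually reads is an assumption to check for your setup. The hypothetical `add_line_once` helper below appends a line only if it is not already there, so rerunning it is harmless.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append LINE to FILE unless FILE already contains it
# verbatim, so running the install steps twice does not duplicate entries.
add_line_once() {
  local line="$1" file="$2"
  touch "$file" || return 1
  # -q: no output, -x: whole-line match, -F: fixed string (no regex)
  if ! grep -qxF -- "$line" "$file"; then
    if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
      echo >> "$file"   # file did not end with a newline
    fi
    printf '%s\n' "$line" >> "$file"
  fi
}

# Example: single quotes keep $PATH unexpanded so it is evaluated at login.
# add_line_once 'export PATH=$PATH:/usr/local/scala/bin' "$HOME/.bash_profile"
```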

5. To check whether Scala is working, run the following command:

$ scala -version

Scala code runner version 2.11.8 -- Copyright 2002-2016, LAMP/EPFL
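If `scala -version` reports an unexpected version, another Scala installation may appear earlier in PATH. The sketch below (with a hypothetical `path_has_dir` helper) checks that /usr/local/scala/bin is a PATH entry and shows which `scala` the shell will actually run.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if DIR is exactly one of the entries in PATH.
# Wrapping PATH in colons lets the pattern match whole entries only.
path_has_dir() {
  case ":$PATH:" in
    *":$1:"*) return 0 ;;
    *) return 1 ;;
  esac
}

if path_has_dir /usr/local/scala/bin; then
  # command -v prints the first match on PATH; an older scala earlier in PATH
  # would shadow the one installed under /usr/local/scala.
  echo "scala resolves to: $(command -v scala)"
else
  echo "/usr/local/scala/bin is not on PATH; re-run the export step" >&2
fi
```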

6. Apache Spark can be installed in two ways:
  • Building Spark from source using SBT
  • Using a prebuilt Spark package
Here we use a Spark package prebuilt for Hadoop, available from here; this walkthrough downloads the spark-1.6.1-bin-hadoop2.6 version. After downloading, the Spark tar file will be in your Downloads folder.
From that folder, untar the downloaded file using the following command:

$ tar xvf spark-1.6.1-bin-hadoop2.6.tgz
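Optionally, before running the tar command you can confirm the archive downloaded intact by comparing its checksum with the one published alongside the release. Which algorithms and formats are published varies by release, so SHA-256 here is an assumption. macOS ships `shasum`, while most Linux systems ship `sha256sum`; the hypothetical `sha256_of` helper uses whichever is available.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the lowercase hex SHA-256 digest of FILE using
# whichever tool is installed (sha256sum on most Linux, shasum on macOS).
sha256_of() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$1" | awk '{print $1}'
  elif command -v shasum >/dev/null 2>&1; then
    shasum -a 256 "$1" | awk '{print $1}'
  else
    echo "no SHA-256 tool found" >&2
    return 1
  fi
}

# Example: compare against the published value (normalize case/spacing first
# if the published file formats it differently).
# expected="<paste published checksum here>"
# [ "$(sha256_of spark-1.6.1-bin-hadoop2.6.tgz)" = "$expected" ] && echo "checksum OK"
```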

7. Move the Spark files to the /usr/local/spark directory:

$ sudo mv spark-1.6.1-bin-hadoop2.6 /usr/local/spark
Password:

Add the Spark bin directory to the PATH variable:

$ export PATH=$PATH:/usr/local/spark/bin

8. To test whether Spark is working, run the following command:

$ spark-shell

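If Spark is installed successfully, you will see output similar to the following. Note that spark-shell reports the Scala version bundled with Spark (2.10.5 here), which can differ from the standalone Scala installed earlier.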


Spark assembly has been built with Hive, including Datanucleus jars on classpath 
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties 
15/06/04 15:25:22 INFO SecurityManager: Changing view acls to: hadoop 
15/06/04 15:25:22 INFO SecurityManager: Changing modify acls to: hadoop
15/06/04 15:25:22 INFO SecurityManager: SecurityManager: authentication disabled;
   ui acls disabled; users with view permissions: Set(hadoop); users with modify permissions: Set(hadoop) 
15/06/04 15:25:22 INFO HttpServer: Starting HTTP Server 
15/06/04 15:25:23 INFO Utils: Successfully started service 'HTTP class server' on port 43292. 
Welcome to 
      ____              __ 
     / __/__  ___ _____/ /__ 
    _\ \/ _ \/ _ `/ __/  '_/ 
   /___/ .__/\_,_/_/ /_/\_\   version 1.6.1
      /_/  
  
Using Scala version 2.10.5 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_79) 
Type in expressions to have them evaluated. 
Spark context available as sc.  
scala> 
