When working with Apache Spark, I often like to test things out locally so I don’t have to worry about compiling the full project or running everything through a test suite. Apache Spark ships with the Spark Shell, which is essentially a Scala REPL with a ready-made SparkSession instance, making it easy to explore and quickly test out your code.
But can we do better?
Ammonite is just like the Scala REPL, but on steroids! It has some very nice features, like easier code navigation, importing external libraries straight from the REPL, and more. I’m not going to go over all of Ammonite’s features here; you can learn more on the Ammonite website.
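To give a taste of the library-import feature: the following, typed at an Ammonite prompt, fetches a library from Maven Central on the fly (upickle is just an arbitrary example here; any published artifact works):
import $ivy.`com.lihaoyi::upickle:1.4.0`
upickle.default.write(Map("hello" -> "world")) // returns the JSON string {"hello":"world"}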
Here are the instructions for setting up your local Spark with Ammonite. Note that the command below installs the Scala 2.12 build of Ammonite 2.4.0, matching the Scala version of the Spark artifacts we import next.
$ sudo sh -c '(echo "#!/usr/bin/env sh" && curl -L https://github.com/com-lihaoyi/Ammonite/releases/download/2.4.0/2.12-2.4.0) > /usr/local/bin/amm && chmod +x /usr/local/bin/amm' && amm
spark_session.sc
import ammonite.ops._
// Fetch the Spark artifacts from Maven Central via Ammonite's $ivy imports.
import $ivy.`org.apache.spark:spark-sql_2.12:2.4.4`
import $ivy.`org.apache.spark:spark-core_2.12:2.4.4`
import $ivy.`org.apache.spark:spark-avro_2.12:2.4.4`
import org.apache.spark.sql.SparkSession
// A local SparkSession for interactive exploration.
val spark: SparkSession = SparkSession.builder().master("local").getOrCreate()
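As a side note, the builder accepts standard Spark settings if the single-threaded local master is too limiting; a small sketch (the app name and partition count are just illustrative choices):
// Illustrative variant of the session above.
val spark: SparkSession = SparkSession.builder()
  .master("local[*]")                          // use all available cores
  .appName("ammonite-spark-playground")        // any name you like
  .config("spark.sql.shuffle.partitions", "4") // fewer shuffle partitions suit small local data
  .getOrCreate()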
$ amm
Execute this in the same directory as spark_session.sc, then load the script from the REPL:
import $exec.spark_session
That’s it! You can now start working with Spark.
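As a quick sanity check, you can try a throwaway DataFrame right at the prompt:
spark.version                     // should print 2.4.4, matching the $ivy imports above
val df = spark.range(5).toDF("n") // a tiny DataFrame with values 0..4
df.show()                         // renders the five rows in the console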
Let’s edit the spark_session.sc file and add some additional code, for example helper methods to read data in common formats.
def loadAvro(path: String) = spark.read.format("avro").load(path) // needs the spark-avro module imported above
def loadParquet(path: String) = spark.read.format("parquet").load(path)
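Following the same pattern, you could add helpers for other formats; a sketch (the header option is an assumption about your CSV files):
def loadCsv(path: String) = spark.read.format("csv").option("header", "true").load(path) // hypothetical helper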
And use them.
$ amm
import $exec.spark_session
val df = loadParquet("/tmp/parquet_data")
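From here the full DataFrame API is at your fingertips; for instance, writing results back out (the output path is illustrative):
df.write.mode("overwrite").format("avro").save("/tmp/avro_out") // works thanks to the spark-avro import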