When working with Apache Spark, I often like to test things out locally so I don't have to worry about compiling the full project or trying things out through the test suite. Apache Spark ships with the Spark Shell, which is essentially a Scala REPL with a SparkSession instance already available, allowing you to quickly explore and test your code.
But can we do better?
Ammonite is just like the Scala REPL, but on steroids! It has some very nice features such as easier code navigation, importing external libraries, and more. I'm not going to go over Ammonite's features here; you can learn more at the Ammonite website.
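As a quick taste of the external-library feature, you can pull a dependency straight from Maven Central without restarting the session. This is only a sketch: the library (upickle) and its version are arbitrary illustrations, and any published artifact works the same way.

```scala
// Inside an Ammonite session: fetch a library on the fly with the $ivy magic import.
// upickle here is purely an example artifact; substitute anything from Maven Central.
import $ivy.`com.lihaoyi::upickle:1.4.0`

// The library is immediately usable in the same session.
val json = upickle.default.write(Map("a" -> 1))
println(json)
```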
Here are the instructions for setting up local Spark with Ammonite.
First, install and launch Ammonite:

```
$ sudo sh -c '(echo "#!/usr/bin/env sh" && curl -L https://github.com/com-lihaoyi/Ammonite/releases/download/2.4.0/2.12-2.4.0) > /usr/local/bin/amm && chmod +x /usr/local/bin/amm' && amm
```
Create a `spark_session.sc` file with the following content:

```scala
import ammonite.ops._
import $ivy.`org.apache.spark:spark-sql_2.12:2.4.4`
import $ivy.`org.apache.spark:spark-core_2.12:2.4.4`
import $ivy.`org.apache.spark:spark-avro_2.12:2.4.4`

import org.apache.spark.sql.SparkSession

val spark: SparkSession = SparkSession.builder().master("local").getOrCreate()
```
Then run `$ amm` in the same directory as the `spark_session.sc` file and load it with `import $exec.spark_session`.
That’s it! You can now start working with Spark.
You can also extend the `spark_session.sc` file with additional code, for example helper methods to read data in common formats.
```scala
def loadAvro(path: String) = spark.read.format("avro").load(path)
def loadParquet(path: String) = spark.read.format("parquet").load(path)
```
And use it:

```scala
import $exec.spark_session

val df = loadParquet("/tmp/parquet_data")
```
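To sanity-check the whole setup, a quick round trip through Parquet works well. This is a hypothetical session sketch: the `/tmp/parquet_data` path is just an example location, and it assumes the `loadParquet` helper from `spark_session.sc` above.

```scala
// In an amm session, after loading the script:
import $exec.spark_session

// Build a tiny DataFrame, write it out, and read it back with the helper.
val sample = spark.range(5).toDF("n")
sample.write.mode("overwrite").parquet("/tmp/parquet_data")
loadParquet("/tmp/parquet_data").show()
```

If the five rows print back, the dependency resolution, the SparkSession, and the helpers are all wired up correctly.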