Does anybody know why this project is no longer being updated?
ondra-m/ruby-spark on GitHub (Ruby wrapper for Apache Spark)
It targets Spark 1.x, while we now use Spark 3.x.
It doesn't even have DataFrame API support.
Some comments from ondra-m in the issues:
Feb 26, 2017 "If more people will want compatibility with version 2,
I'll look at it."
Sep 3, 2018 "Sorry but currently I don't have time to maintain this
library."
Nov 6, 2019 "its very hard to use dynamic language in a distributed
environment. Specially ruby cannot serialize an anonymous function"
So I was thinking that if the project got active development again, it
would help a lot of people like me who work in both Ruby development and
data science.
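
To make the 2019 comment about serialization concrete, here is a minimal sketch using only Ruby's built-in Marshal. The helper name serializable? and the lambda double are my own, made up for illustration; they are not taken from ruby-spark. Plain data round-trips fine, but a Proc (the kind of anonymous function you would hand to a map on a worker) cannot be dumped:

```ruby
# A lambda of the kind one would like to ship to a remote worker.
double = ->(x) { x * 2 }

# Plain data serializes and deserializes without trouble.
payload  = Marshal.dump([1, 2, 3])
restored = Marshal.load(payload)

# Hypothetical helper: true if Marshal can dump the object, false otherwise.
# Marshal.dump raises TypeError for objects it cannot serialize, such as Proc.
def serializable?(obj)
  Marshal.dump(obj)
  true
rescue TypeError
  false
end

puts serializable?([1, 2, 3])  # true
puts serializable?(double)     # false
```

This only shows the limitation in core Ruby; how ruby-spark actually worked around it is not something I have checked.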
Something more native in Ruby would be great, but I did a quick test
based on mrkn/pycall.rb on GitHub (calling Python functions from the Ruby
language):
$ cat pyspark.rb
require 'pycall'
SparkSql = PyCall.import_module('pyspark.sql')
SparkSession = SparkSql.SparkSession
SparkRow = SparkSql.Row
spark = SparkSession.builder.getOrCreate
df = spark.createDataFrame [
  SparkRow.new(a: 2, b: 3.0, c: 'string1'),
  SparkRow.new(a: 4, b: 9.0, c: 'string2'),
  SparkRow.new(a: 8, b: 17.0, c: 'string3'),
]
df.show
# =>
# +---+----+-------+
# |  a|   b|      c|
# +---+----+-------+
# |  2| 3.0|string1|
# |  4| 9.0|string2|
# |  8|17.0|string3|
# +---+----+-------+
df.printSchema
# =>
# root
#  |-- a: long (nullable = true)
#  |-- b: double (nullable = true)
#  |-- c: string (nullable = true)
df.select('a', 'c').describe.show
# =>
# +-------+-----------------+-------+
# |summary|                a|      c|
# +-------+-----------------+-------+
# |  count|                3|      3|
# |   mean|4.666666666666667|   null|
# | stddev|3.055050463303893|   null|
# |    min|                2|string1|
# |    max|                8|string3|
# +-------+-----------------+-------+
df.filter(df.a < 5).show
# =>
# +---+---+-------+
# |  a|  b|      c|
# +---+---+-------+
# |  2|3.0|string1|
# |  4|9.0|string2|
# +---+---+-------+
df.filter(df.b > 5).show
# =>
# +---+----+-------+
# |  a|   b|      c|
# +---+----+-------+
# |  4| 9.0|string2|
# |  8|17.0|string3|
# +---+----+-------+
···
On 12/24/21, Piper H <potthua@gmail.com> wrote: