(Why) do we need to call cache or persist on an RDD?
When a resilient distributed dataset (RDD) is created from a text file or collection (or from another RDD), do we need to call …
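The question comes up because RDD transformations are lazy: without cache() or persist(), Spark roughly recomputes an RDD's lineage every time an action needs it. The following is a minimal pure-Python sketch of that behavior. `ToyRDD` and `load_source` are hypothetical names for illustration only, not the PySpark API; the sketch just counts how often the "source" is re-read.

```python
from typing import Callable, List, Optional


class ToyRDD:
    """Hypothetical toy model of a lazy RDD; NOT the PySpark API."""

    def __init__(self, compute: Callable[[], List[int]]) -> None:
        self._compute = compute  # recipe (lineage) for producing the data
        self._persisted = False  # set by cache()
        self._stored: Optional[List[int]] = None

    def map(self, f: Callable[[int], int]) -> "ToyRDD":
        # A transformation: nothing runs yet, only the recipe for the child is recorded.
        return ToyRDD(lambda: [f(x) for x in self._materialize()])

    def cache(self) -> "ToyRDD":
        # Mark the RDD so the first action stores its result for later actions.
        self._persisted = True
        return self

    def _materialize(self) -> List[int]:
        if self._stored is not None:
            return self._stored  # reuse previously stored data
        data = self._compute()  # otherwise recompute the whole lineage
        if self._persisted:
            self._stored = data
        return data

    # Actions: these force evaluation.
    def count(self) -> int:
        return len(self._materialize())

    def collect(self) -> List[int]:
        return list(self._materialize())


reads = 0


def load_source() -> List[int]:
    """Hypothetical stand-in for reading a text file; counts its calls."""
    global reads
    reads += 1
    return [1, 2, 3, 4]


squares = ToyRDD(load_source).map(lambda x: x * x)
squares.count()
squares.collect()
print(reads)  # 2: each action re-read the source

reads = 0
cached = ToyRDD(load_source).map(lambda x: x * x).cache()
cached.count()
cached.collect()
print(reads)  # 1: the second action reused the cached data
```

The toy only tracks recomputation; real Spark also handles partitions, eviction, and reuse of shuffle output, so caching pays off mainly when the same RDD feeds several actions.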
I prefer Python over Scala, but since Spark is natively written in Scala, I was expecting my code to run faster in the …
In terms of RDD persistence, what are the differences between cache() and persist() in Spark?
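In the RDD API, cache() is shorthand for persist() with the default storage level, MEMORY_ONLY; persist() additionally accepts another level such as MEMORY_AND_DISK or DISK_ONLY. The pure-Python sketch that follows models only that relationship. `StorageLevel` and `ToyRDD` here are hypothetical stand-ins named after Spark's concepts, not the real classes.

```python
from enum import Enum


class StorageLevel(Enum):
    """Hypothetical subset named after Spark's StorageLevel constants."""
    NONE = "NONE"
    MEMORY_ONLY = "MEMORY_ONLY"
    MEMORY_AND_DISK = "MEMORY_AND_DISK"
    DISK_ONLY = "DISK_ONLY"


class ToyRDD:
    """Hypothetical toy; models only how cache() relates to persist()."""

    def __init__(self) -> None:
        self.storage_level = StorageLevel.NONE

    def persist(self, level: StorageLevel = StorageLevel.MEMORY_ONLY) -> "ToyRDD":
        # Like Spark's RDD.persist, refuse to switch a level that is already assigned.
        if self.storage_level not in (StorageLevel.NONE, level):
            raise ValueError("cannot change the storage level once it is assigned")
        self.storage_level = level
        return self

    def cache(self) -> "ToyRDD":
        # cache() is simply persist() with the default level (MEMORY_ONLY for RDDs).
        return self.persist()


a = ToyRDD().cache()
b = ToyRDD().persist(StorageLevel.MEMORY_AND_DISK)
print(a.storage_level, b.storage_level)
```

Calling `b.cache()` afterwards raises in this toy, mirroring the fact that Spark will not silently change an RDD's storage level once one has been assigned.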
I’m just wondering what the difference is between an RDD and a DataFrame (in Spark 2.0.0, DataFrame is a mere type alias for Dataset[Row]) in …
According to Learning Spark: "Keep in mind that repartitioning your data is a fairly expensive operation. Spark also has an optimized version of …"
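To make the cost concrete: on an RDD, repartition(n) always performs a full shuffle, whereas the RDD API's coalesce(n) can lower the partition count by merging existing partitions, without a shuffle by default. The pure-Python sketch that follows is a hypothetical toy (`toy_repartition` and `toy_coalesce` are made-up helpers, not Spark functions) that contrasts the two by counting how many elements pass through a shuffle.

```python
from typing import List, Tuple

Partitions = List[List[int]]


def toy_repartition(parts: Partitions, n: int) -> Tuple[Partitions, int]:
    """Toy full shuffle: every element is sent round-robin to one of n new partitions.

    Returns the new partitions and how many elements went through the shuffle.
    """
    out: Partitions = [[] for _ in range(n)]
    shuffled = 0
    for x in (x for part in parts for x in part):
        out[shuffled % n].append(x)
        shuffled += 1
    return out, shuffled


def toy_coalesce(parts: Partitions, n: int) -> Tuple[Partitions, int]:
    """Toy narrow merge: whole existing partitions are grouped, nothing is shuffled.

    As with Spark's coalesce() without shuffle, asking for more partitions than
    currently exist leaves the count unchanged.
    """
    n = min(n, len(parts))
    out: Partitions = [[] for _ in range(n)]
    for idx, part in enumerate(parts):
        out[idx * n // len(parts)].extend(part)  # consecutive partitions share a bucket
    return out, 0


data = [[1, 2], [3, 4], [5, 6], [7, 8]]
print(toy_repartition(data, 2))  # ([[1, 3, 5, 7], [2, 4, 6, 8]], 8)
print(toy_coalesce(data, 2))     # ([[1, 2, 3, 4], [5, 6, 7, 8]], 0)
```

Real Spark groups partitions with data locality in mind rather than strictly by position, so treat the bucket assignment here as illustrative only.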