(Why) do we need to call cache or persist on an RDD?
When a resilient distributed dataset (RDD) is created from a text file or a collection (or from another RDD), do we need to call “cache” or “persist” explicitly to store the RDD data in memory? Or is the RDD data stored in memory, in a distributed way, by default?

val textFile = sc.textFile("/user/emp.txt")

As per …