What is Spark Parallelize?

Parallelize is a method to create an RDD from an existing collection. Parallelized collections are created by calling JavaSparkContext’s parallelize method on an existing Collection in your driver program. The elements of the collection are copied to form a distributed dataset that can be operated on in parallel.

List<Integer> data = Arrays.asList(1, 2, 3, 4, 5);
JavaRDD<Integer> distData = sc.parallelize(data);

Comments

Popular posts from this blog

Gove confirms mandatory housebuilding targets for councils will be abolished in face of Tory rebellion – UK politics live

Kotak Mahindra Bank Recruitment 2022 Released for Graduate Candidates And Apply Online