Thursday, May 4, 2017

Hadoop Streaming: Specifying Map-Only Jobs

Often, you may want to process input data using a map function only. To do this, simply set mapred.reduce.tasks to zero. The Map/Reduce framework will not create any reducer tasks. Rather, the outputs of the mapper tasks will be the final output of the job.

    -D mapred.reduce.tasks=0

Hadoop Streaming also supports map-only jobs by specifying "-D mapred.reduce.tasks=0".

To specify map-only jobs, use

hadoop jar hadoop-streaming-2.7.1.2.4.0.0-169.jar 
-D mapred.reduce.tasks=0 
-input /user/root/wordcount 
-output /user/root/out 
-mapper /bin/cat


We can also achieve map-only jobs by specifying "-numReduceTasks 0"

hadoop jar hadoop-streaming-2.7.1.2.4.0.0-169.jar 
-input /user/root/wordcount 
-output /user/root/out 
-mapper /bin/cat 
-numReduceTasks 0

No comments:

Post a Comment

Amazon S3: Basic Concepts

Amazon S3 is an reliable, scalable, online object storage that stores files. Bucket: A bucket is a container in Amazon S3 where the fil...