Learning Big Data: Hadoop Streaming: Specifying Map-Only Jobs

Thursday, May 4, 2017

Hadoop Streaming: Specifying Map-Only Jobs

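Often you want to process input data with a map function only, for example to filter or transform records. To do this, set mapred.reduce.tasks to zero. The Map/Reduce framework then creates no reducer tasks and skips the shuffle and sort phase, so the outputs of the mapper tasks become the final output of the job. In Hadoop 2.x this property is named mapreduce.job.reduces; mapred.reduce.tasks is the older, deprecated name, which is still accepted.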

-D mapred.reduce.tasks=0

In Hadoop Streaming, pass this setting on the command line as a generic option. Generic options such as -D must come before the streaming options (-input, -output, -mapper):

hadoop jar hadoop-streaming-2.7.1.2.4.0.0-169.jar \
    -D mapred.reduce.tasks=0 \
    -input /user/root/wordcount \
    -output /user/root/out \
    -mapper /bin/cat

The streaming option "-numReduceTasks 0" has the same effect:

hadoop jar hadoop-streaming-2.7.1.2.4.0.0-169.jar \
    -input /user/root/wordcount \
    -output /user/root/out \
    -mapper /bin/cat \
    -numReduceTasks 0
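
To see what a map-only job changes, it can help to dry-run the mapper locally before submitting it. The sketch below is only an approximation and does not use Hadoop at all: the mapper function and the sample input are made up for illustration, sort stands in for the shuffle phase, and cat stands in for an identity reducer. It shows that without a reduce phase the mapper's records come out in the order they were emitted, while a job with reducers sees them sorted by key.

```shell
#!/bin/sh
# Hypothetical mapper: upper-cases every input line. In a real streaming job
# this would be the program passed to -mapper in place of /bin/cat.
mapper() {
    tr '[:lower:]' '[:upper:]'
}

# Made-up sample records standing in for the files under -input.
sample_input() {
    printf '%s\n' 'pear' 'apple' 'fig'
}

# Map-only: the mapper output is the final output, in emission order.
map_only_output=$(sample_input | mapper)

# With a reduce phase: mapper output is sorted by key before the reducer
# reads it. Here sort approximates the shuffle and cat is an identity reducer.
with_reduce_output=$(sample_input | mapper | LC_ALL=C sort | cat)

printf 'map-only:\n%s\n' "$map_only_output"
printf 'with reduce:\n%s\n' "$with_reduce_output"
```

On a cluster each mapper task writes its own output file, so the ordering shown here holds only within a single mapper's output, not across the whole job.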
