hadoop - Not getting correct output when running standard "WordCount" program using Hadoop 0.20.2 -


I'm new to Hadoop. I'm trying to run the famous "WordCount" program - which counts the number of occurrences of each word in a list of files - using Hadoop 0.20.2. I am using a single-node cluster.

Following is my program:

import java.io.IOException;
import java.util.*;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class WordCount {

    public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                context.write(word, one);
            }
        }
    }

    public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterator<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next().get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "wordcount");
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        job.setJarByClass(WordCount.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        job.setMapperClass(Map.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setReducerClass(Reduce.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        job.setNumReduceTasks(5);
        job.waitForCompletion(true);
    }
}

When I run the program using Hadoop 0.20.2 (not showing the command for clarity), the output I get is:

A 1
A 1
B 1
B 1
C 1
C 1
D 1
D 1

which is wrong. The actual output should be:

A 2
B 2
C 2
D 2

This "WordCount" program is a very standard program. I'm not sure whether the problem is in this code or in the contents of the configuration files such as mapred-site.xml, core-site.xml, etc.

I would be happy if someone could help me.

Thank you.

This code actually runs a local MapReduce job. If you want to submit it to an actual cluster, you must provide the fs.default.name and mapred.job.tracker configuration parameters. These keys are mapped to a host:port pair for your machine, just like in your mapred-site.xml and core-site.xml.
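For example, the two parameters mentioned above normally live in core-site.xml and mapred-site.xml. A minimal sketch of those entries (the localhost host/port values are placeholders, assuming a typical single-node 0.20.x setup - substitute your own):

```xml
<!-- core-site.xml: tells clients where HDFS lives (value is an assumption) -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

<!-- mapred-site.xml: tells clients where the JobTracker lives -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
```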
Also ensure that your data is available in HDFS, not on the local disk, and reduce the number of reducers. With about 2 records per reducer, you should set it to 1.
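Putting those suggestions together in the question's main method, the job setup might look like the sketch below. The hdfs://localhost:9000 and localhost:9001 addresses are assumptions and must match whatever is in your own core-site.xml and mapred-site.xml:

```java
Configuration conf = new Configuration();
// Point the client at the real cluster instead of the local runner.
// These host:port values are placeholders -- use the ones from your
// core-site.xml and mapred-site.xml.
conf.set("fs.default.name", "hdfs://localhost:9000");
conf.set("mapred.job.tracker", "localhost:9001");

Job job = new Job(conf, "wordcount");
// With only a handful of input records, one reducer is enough;
// five reducers just scatter the output across five part files.
job.setNumReduceTasks(1);
```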
