Wednesday, October 22, 2014

VirtualBox/Ubuntu: Full Screen Mode resolution

By default, at startup, the Ubuntu guest in VirtualBox comes up at a resolution smaller than the actual monitor. But when I switch to either full screen or scaled mode, instead of resizing everything smoothly (like a vector image), it just stretches that small frame, making everything fuzzy, grainy, and hard to look at. Installing the VirtualBox Guest Additions inside the guest (Devices > Insert Guest Additions CD image, then reboot) should fix this, as it lets the guest display resize to the monitor's native resolution.

Friday, September 26, 2014

Eclipse : Error in WordCount MapReduce code : Type java.util.Map$Entry cannot be resolved

I was trying to compile the Hadoop MapReduce WordCount example and faced the following errors. I was using Java 1.8.

Error:
  1. The project was not built since its build path is incomplete. Cannot find the class file for java.util.Map$Entry. Fix the build path then try building this project.
  2. The type java.util.Map$Entry cannot be resolved. It is indirectly referenced from required .class files.
Resolution:
 Downgrade your Java version from 1.8 to 1.7. Below are the steps I followed to resolve this error.

1. Uninstall Java 1.8 from your computer: go to Control Panel > Programs and Features and uninstall the Java 1.8 kit.
2. Download and install Java 1.7. Then change the PATH environment variable on Windows to point to the newly installed version, i.e. add "C:\Program Files\Java\jdk1.7.0_67\bin" to the PATH variable.
3. Restart the computer.
4. Open Eclipse and build the project against 1.7. The error should disappear. (To confirm which JDK the system now picks up, see the snippet below.)
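
If you want to double-check which Java version actually resolves after the PATH change, a minimal sketch (the class name here is just an illustration) is:

CheckJavaVersion.java
--------------------------------------

public class CheckJavaVersion {
  public static void main(String[] args) {
    // Should print something like "1.7.0_67" once PATH points at the 1.7 JDK
    System.out.println("java.version = " + System.getProperty("java.version"));
    // Installation directory of the JRE/JDK that is actually running
    System.out.println("java.home    = " + System.getProperty("java.home"));
  }
}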

Thursday, September 25, 2014

Eclipse : Syntax Error : Parameterized types are only available if source level is 1.5

Follow these steps to resolve the above error.

1. Right-click the project and go to Project Properties.
2. Then go to Java Compiler.
3. Check the box "Enable project specific settings".
4. Change the compiler compliance level to '1.5' (or higher).
5. Also check the box "Use default compliance settings" and click OK.

Rebuild the project and the error will be resolved.
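
For context, this error is raised by any use of generics when the source level is below 1.5; a hypothetical snippet that triggers it:

GenericsExample.java
--------------------------------------

import java.util.ArrayList;
import java.util.List;

public class GenericsExample {
  public static void main(String[] args) {
    // The parameterized type List<String> needs source level 1.5 or above;
    // below that, the compiler reports the syntax error in the title.
    List<String> words = new ArrayList<String>();
    words.add("hadoop");
    System.out.println(words.get(0));
  }
}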

Wednesday, September 24, 2014

Hadoop Integration with Eclipse : Errors while running the WordCount program

Following are the WordCount Mapper, Reducer, and Driver classes, respectively, for the WordCount example picked from YDN (the Yahoo! Developer Network Hadoop tutorial).

WordCountMapper.java
--------------------------------------

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class WordCountMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, IntWritable> {

  private final IntWritable one = new IntWritable(1);
  private Text word = new Text();

  // Note: the raw (WritableComparable, Writable, ...) signature in the original
  // YDN listing does not override Mapper<LongWritable, Text, Text, IntWritable>.map()
  // and fails to compile, so the typed signature is used here.
  public void map(LongWritable key, Text value,
      OutputCollector<Text, IntWritable> output, Reporter reporter)
      throws IOException {

    // Emit (word, 1) for every whitespace-delimited token in the line
    String line = value.toString();
    StringTokenizer itr = new StringTokenizer(line.toLowerCase());
    while (itr.hasMoreTokens()) {
      word.set(itr.nextToken());
      output.collect(word, one);
    }
  }
}


WordCountReducer.java
-----------------------------------

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class WordCountReducer extends MapReduceBase
    implements Reducer<Text, IntWritable, Text, IntWritable> {

  public void reduce(Text key, Iterator<IntWritable> values,
      OutputCollector<Text, IntWritable> output, Reporter reporter)
      throws IOException {

    // Sum all the counts emitted for this word
    int sum = 0;
    while (values.hasNext()) {
      sum += values.next().get();
    }

    output.collect(key, new IntWritable(sum));
  }
}

WordCount.java (Driver Class)
--------------------------------------

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class WordCount {

  public static void main(String[] args) {
    JobClient client = new JobClient();
    JobConf conf = new JobConf(WordCount.class);

    // specify output types
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(IntWritable.class);

    // specify input and output dirs
    FileInputPath.addInputPath(conf, new Path("input"));
    FileOutputPath.addOutputPath(conf, new Path("output"));

    // specify a mapper
    conf.setMapperClass(WordCountMapper.class);

    // specify a reducer
    conf.setReducerClass(WordCountReducer.class);
    conf.setCombinerClass(WordCountReducer.class);

    client.setConf(conf);
    try {
      JobClient.runJob(conf);
    } catch (Exception e) {
      e.printStackTrace();
    }
  }
}

Errors:

1. FileInputPath cannot be resolved
2. FileOutputPath cannot be resolved


Resolution: modify the driver class (WordCount.java) as follows.

1. Change FileInputPath.addInputPath to FileInputFormat.addInputPath.
2. Change FileOutputPath.addOutputPath to FileOutputFormat.setOutputPath.
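
With those two changes, the input/output setup in the driver reads:

    // specify input and output dirs
    FileInputFormat.addInputPath(conf, new Path("input"));
    FileOutputFormat.setOutputPath(conf, new Path("output"));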

Hope it helps.




Wednesday, September 17, 2014

ERROR dfs.NameNode: java.io.IOException: Cannot lock storage /tmp/hadoop-hadoop-user/dfs/name. The directory is already locked


The following error occurred while formatting the namenode.
----------------------------------------------------------------------------------------------------------------
hadoop-user@hadoop-desk:~/hadoop$ bin/hadoop namenode -format
14/09/14 18:53:27 INFO dfs.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = hadoop-desk/127.0.1.1
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 0.18.0
STARTUP_MSG:   build = http://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.18 -r 686010; compiled by 'hadoopqa' on Thu Aug 14 19:48:33 UTC 2008
************************************************************/
Re-format filesystem in /tmp/hadoop-hadoop-user/dfs/name ? (Y or N) Y
14/09/14 18:53:35 INFO fs.FSNamesystem: fsOwner=hadoop-user,hadoop-user,adm,dialout,cdrom,floppy,audio,dip,video,plugdev,fuse,lpadmin,admin
14/09/14 18:53:35 INFO fs.FSNamesystem: supergroup=supergroup
14/09/14 18:53:35 INFO fs.FSNamesystem: isPermissionEnabled=true
14/09/14 18:53:35 INFO dfs.Storage: Cannot lock storage /tmp/hadoop-hadoop-user/dfs/name. The directory is already locked.
14/09/14 18:53:35 ERROR dfs.NameNode: java.io.IOException: Cannot lock storage /tmp/hadoop-hadoop-user/dfs/name. The directory is already locked.
        at org.apache.hadoop.dfs.Storage$StorageDirectory.lock(Storage.java:447)
        at org.apache.hadoop.dfs.FSImage.format(FSImage.java:930)
        at org.apache.hadoop.dfs.FSImage.format(FSImage.java:949)
        at org.apache.hadoop.dfs.NameNode.format(NameNode.java:741)
        at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:822)
        at org.apache.hadoop.dfs.NameNode.main(NameNode.java:839)
14/09/14 18:53:35 INFO dfs.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at hadoop-desk/127.0.1.1
************************************************************/
------------------------------------------------------------------------------------------------------

The username with which you are trying to format the namenode might not have the necessary privileges on the namenode and datanode directories listed in the hadoop-site.xml configuration file. Grant the necessary privileges on all the namenode and datanode directories and their subdirectories, then rerun the format command.
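
For example, assuming the default /tmp layout shown in the log above (adjust the path and username to your setup), you can take ownership of the whole tree and re-run the format:

sudo chown -R hadoop-user:hadoop-user /tmp/hadoop-hadoop-user
bin/hadoop namenode -format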

Getting "Error:Null" while browsing HDFS from mapreduce plugin of Eclipse

While navigating the HDFS directory structure in the project explorer, if you are getting Error:NULL, check the Advanced Parameters settings. I am using Hadoop 0.18.0 and Eclipse SDK version 3.5.0.

In many Eclipse versions you may not find the "hadoop.job.ugi" parameter when setting the Advanced Parameters, and missing this parameter can result in Error:NULL.

First, set all the required parameters except hadoop.job.ugi (because it's missing). Then click Finish.

Now go to your workspace directory and navigate to the locations folder:

"\workspace\.metadata\.plugins\org.apache.hadoop.eclipse\locations"

There, open the XML file and add the following property:

<property>
  <name>hadoop.job.ugi</name>
  <value>hadoop-user,ABC</value>
</property>

Close Eclipse and start it again. Now you should be able to connect to HDFS and view its file system.

MapReduce Plugin for Eclipse

Eclipse versions before 3.6 are compatible with the MapReduce plugin, so be sure to download the right version for proper functioning.
 
You can download an older Eclipse version from http://archive.eclipse.org/eclipse/downloads/. Installing Eclipse is pretty straightforward: download the packaged .zip file into any directory on your computer, unzip it, and run eclipse.exe from the eclipse directory directly, with no further installation procedure.

After downloading Eclipse,

1. Move the MapReduce plugin's jar file, hadoop-0.18.0-eclipse-plugin.jar, to the plugins directory under the eclipse folder.
2. Run eclipse.exe to start Eclipse.
3. Select the workspace directory. This is where Eclipse stores all your source projects and their related settings.
4. After choosing the workspace directory, if you are opening Eclipse for the first time you are presented with a welcome screen; click the button that says "Workbench". This is the main view of Eclipse, where you write source code, manage your projects, and so on.
5. In the upper right-hand corner of the workbench, click the "Open Perspective" button, which is next to the "Java Perspective" button.
6. Select "Other", then Map/Reduce, and press OK.
7. Once this is done, you will see a Map/Reduce Locations panel next to the Problems, Tasks, and Javadoc panels at the bottom of the workbench.
8. In the Map/Reduce Locations panel, click the elephant logo that says "New Hadoop Location".
9. You will be asked to fill in a number of parameters identifying the server.
Under the General tab:
  • Location Name: any descriptive name
  • Map/Reduce Master Host: the IP address of the Hadoop server
  • Map/Reduce Master Port: 9001
  • DFS Master Port: 9000
  • User Name: the username used to connect to the Hadoop server
Under the Advanced Parameters tab:
  • hadoop.job.ugi: this contains your current Windows login credentials. Highlight the first comma-separated value in this list (your username) and replace it with the Hadoop username.
  • mapred.system.dir: erase the current value and set it to /hadoop/mapred/system.
When you are done, your server will appear in the Map/Reduce Locations panel. In the project explorer, the MapReduce plugin adds the ability to browse HDFS; any files in the HDFS file system can be viewed by expanding the tree with the [+] buttons.
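
For reference, the host and ports entered above should mirror the server's hadoop-site.xml. A minimal sketch, assuming the server from the earlier log is reachable as hadoop-desk (substitute your own host):

<property>
  <name>fs.default.name</name>
  <value>hdfs://hadoop-desk:9000</value>
</property>
<property>
  <name>mapred.job.tracker</name>
  <value>hadoop-desk:9001</value>
</property>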
