Monday, February 09, 2009

Distributed copy from remote hdfs to local hdfs

The DistCp command copies data from one HDFS cluster to another. The problem is that, as far as I am aware, it is not specified anywhere what the source URL and destination URL should look like.
Many places say to use hdfs://namenode1:50070/path_to_file, and many others give some other port number.
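After a lot of debugging, I figured out that each URL should be built from the fs.default.name property in that cluster's hadoop-site.xml:
hadoop distcp hdfs://<source host:port from fs.default.name>/path_to_file hdfs://<destination host:port from fs.default.name>/path_to_file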

For example:
hadoop distcp hdfs://remotehost:10000/opt/hadoop-name/foo/bar hdfs://localhost:54310/opt/hadoop-name/foo/bar

The important point is that each URL should be taken exactly as it appears in fs.default.name in hadoop-site.xml, in the conf directory of the corresponding Hadoop installation.
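
To avoid copying the value by hand, the lookup can be scripted. The following is only a rough sketch, not part of Hadoop: the function names get_fs_default_name and build_distcp_cmd are made up for illustration, and it assumes that hadoop-site.xml uses the usual <name>/<value> layout and that the fs.default.name value already includes the hdfs:// scheme (for example hdfs://localhost:54310). It only prints the distcp command instead of running it.

```shell
# Hypothetical helper: print the value of fs.default.name from a Hadoop config file.
# Assumes the usual <property><name>..</name><value>..</value></property> layout.
get_fs_default_name() {
  conf_file="$1"
  # Flatten the file to one line, then capture the <value> that follows
  # the <name>fs.default.name</name> element.
  tr -d '\n' < "$conf_file" \
    | sed -n 's|.*<name>[[:space:]]*fs\.default\.name[[:space:]]*</name>[[:space:]]*<value>[[:space:]]*\([^<[:space:]]*\)[[:space:]]*</value>.*|\1|p'
}

# Hypothetical helper: print (not run) a distcp command for one path.
# $1 = source fs.default.name, $2 = destination fs.default.name, $3 = absolute path
build_distcp_cmd() {
  src_fs="${1%/}"   # drop one trailing slash, if present
  dst_fs="${2%/}"
  path="$3"
  printf 'hadoop distcp %s%s %s%s\n' "$src_fs" "$path" "$dst_fs" "$path"
}
```

For the example above, calling build_distcp_cmd hdfs://remotehost:10000 "$(get_fs_default_name /path/to/conf/hadoop-site.xml)" /opt/hadoop-name/foo/bar on the local machine should print the same command, provided the local fs.default.name is hdfs://localhost:54310. The source value still has to be taken from the remote cluster's own hadoop-site.xml.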