The DistCp command copies data from one HDFS cluster to another. The problem is that, as far as I am aware, nothing clearly documents what the source and destination URLs should be.
Many places say to use hdfs://namenode1:50070/path_to_file, and many others give some other port number.
After a lot of debugging, I figured out that the command should be
hadoop distcp hdfs://<host:port from fs.default.name in hadoop-site.xml>/path_to_file
For example:
hadoop distcp hdfs://remotehost:10000/opt/hadoop-name/foo/bar hdfs://localhost:54310/opt/hadoop-name/foo/bar
The important point is that each URL must match exactly what is set for fs.default.name in hadoop-site.xml, in that cluster's Hadoop conf directory.
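To avoid copying the value by hand, the lookup can be scripted. The following is only a rough sketch, not an official Hadoop tool: the helper names get_default_fs and build_distcp_cmd are made up for illustration, and it assumes that fs.default.name holds a full URI such as hdfs://localhost:54310 and that hadoop-site.xml uses the usual one-tag-per-line property layout. It only prints the command, so you can check it before running it.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Hadoop).

# Print the value of fs.default.name from the config file given as $1.
# Assumption: the <value> tag follows the matching <name> tag, either on
# the same line or on a later one, as in a typical hadoop-site.xml.
get_default_fs() {
    awk '
        /<name>[ \t]*fs\.default\.name[ \t]*<\/name>/ { found = 1 }
        found && /<value>/ {
            sub(/.*<value>[ \t]*/, "")    # drop everything up to <value>
            sub(/[ \t]*<\/value>.*/, "")  # drop </value> and what follows
            print
            exit
        }
    ' "$1"
}

# Print (do not run) a distcp command.
# $1 = source fs.default.name, $2 = destination fs.default.name,
# $3 = absolute path to copy (same path on both sides).
build_distcp_cmd() {
    # ${1%/} strips one trailing slash so the path is not doubled up.
    printf 'hadoop distcp %s%s %s%s\n' "${1%/}" "$3" "${2%/}" "$3"
}
```

For instance, if the local hadoop-site.xml sets fs.default.name to hdfs://localhost:54310, then `build_distcp_cmd hdfs://remotehost:10000 "$(get_default_fs conf/hadoop-site.xml)" /opt/hadoop-name/foo/bar` prints the command from the example above. The remote value still has to be read from the remote cluster's own hadoop-site.xml.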
Monday, February 09, 2009
Distributed copy from remote hdfs to local hdfs