hadoop restart copy local files to dfs

December 5th, 2008
Tags: java heap hadoop datanode Posted in Hadoop

 

Copying local files into DFS is a really time-consuming step and, what's worse, the datanodes tend to die easily along the way. I couldn't figure out the root cause, although this exception shows up in the datanode.*.out file:

Exception in thread "DataNode: [/*/dfs/data]" java.lang.OutOfMemoryError: Java heap space
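A quick way to check which datanode output files contain this error is a small grep helper. This is only a sketch: the function name `find_datanode_oom` is made up for illustration, and the `*datanode*.out` glob and the log directory you pass in are assumptions about how your install names and stores its files.

```shell
# List DataNode .out files that contain the heap-space OutOfMemoryError.
# $1: log directory (an assumption; adjust to wherever your .out files live).
find_datanode_oom() {
    grep -l "java.lang.OutOfMemoryError: Java heap space" "$1"/*datanode*.out 2>/dev/null
}
```

Call it as `find_datanode_oom /path/to/logs` on each datanode; it prints nothing if no file matches.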

So once all the datanodes have died, I have to restart DFS and carry on with `fs -put` to copy the files where I want them. Fortunately, Hadoop seems able to figure out by itself where to pick up and continue.

Update:

So I decided to increase the heap size anyway, bumping it from the default 1000 MB to 1500 MB. And then it just works!!!

How to change the heap size:

In conf/hadoop-env.sh:

  • change the value of HADOOP_HEAPSIZE (given in MB); this increases the heap size for all Hadoop daemons (DataNode, TaskTracker, and so on)
  • or add -Xmx1500m to the value of HADOOP_DATANODE_OPTS; then it only affects the DataNode
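As a rough sketch, the two options could look like this in conf/hadoop-env.sh. Pick one of them. The 1500 figure is just the value that worked for me, and I am assuming the JVM honors the last -Xmx it sees on the command line when both the global heap size and a daemon-specific option are present.

```shell
# conf/hadoop-env.sh (fragment)

# Option 1: raise the maximum heap for every Hadoop daemon. Value is in MB.
# export HADOOP_HEAPSIZE=1500

# Option 2: raise the maximum heap for the DataNode only.
# Any options already set in HADOOP_DATANODE_OPTS are kept.
export HADOOP_DATANODE_OPTS="-Xmx1500m ${HADOOP_DATANODE_OPTS:-}"
```

The change only takes effect after the daemons are restarted.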

 

Also, how to restart a dead DataNode:

  1. manually find the leftover DataNode process on the datanode machine and kill it (I also copied its exact command line into my own start-datanode.sh script)
  2. then run start-dfs.sh on the master (or run start-datanode.sh on the datanode itself)
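The two steps above can be sketched as a small script to run on the datanode itself. Assumptions here: `jps` from the JDK is on the PATH, HADOOP_HOME points at the Hadoop install, and the stock bin/hadoop-daemon.sh is used in place of my hand-made start-datanode.sh. The function names are made up for illustration.

```shell
# Print the PIDs of DataNode JVMs, reading `jps`-style lines ("<pid> <name>") on stdin.
datanode_pids() {
    awk '$2 == "DataNode" { print $1 }'
}

# Kill any leftover DataNode process, then start a fresh one.
# HADOOP_HOME is assumed to be set to the Hadoop install directory.
restart_datanode() {
    for pid in $(jps | datanode_pids); do
        kill "$pid"    # a JVM stuck after an OutOfMemoryError may need kill -9
    done
    "$HADOOP_HOME/bin/hadoop-daemon.sh" start datanode
}
```

Nothing happens until you call `restart_datanode`. Running start-dfs.sh on the master instead should also bring back any datanode that is not currently running.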
