hadoop restart copy local files to dfs
December 5th, 2008
Tags: java heap hadoop datanode Posted in Hadoop
Copying local files to DFS is a really time-consuming step, and what's worse, the datanodes tend to die easily along the way. I couldn't figure out the reason, although there is this exception in the datanode.*.out file:
Exception in thread "DataNode: [/*/dfs/data]" java.lang.OutOfMemoryError: Java heap space
So after all the datanodes have died, I have to restart dfs and continue to `fs -put` the files where I want them. Fortunately, it seems hadoop can figure out by itself where to pick up and continue.
Update:
So I decided to increase the heap size anyway, bumping it from the default 1000m to 1500m. And then it just works!!!
How to change the heap size:
in conf/hadoop-env.sh
- change the value of HADOOP_HEAPSIZE (in MB); this increases the heap size for all Hadoop daemons (DataNode, TaskTracker, and so on)
- or add -Xmx1500m to the value of HADOOP_DATANODE_OPTS; then it only affects the DataNode
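For reference, here is a minimal sketch of how the two options might look in conf/hadoop-env.sh. The 1500 value is simply the one that worked for me; I am assuming the stock file layout, where both variables are present but commented out, and you would normally pick only one of the two.

```shell
# conf/hadoop-env.sh (fragment)

# Option 1: maximum heap, in MB, for every Hadoop daemon started from this
# install. The default is 1000.
export HADOOP_HEAPSIZE=1500

# Option 2 (alternative, left commented out): raise the heap for the
# DataNode only and leave the other daemons at the default.
# export HADOOP_DATANODE_OPTS="$HADOOP_DATANODE_OPTS -Xmx1500m"
```

The daemons only read this file at startup, so the DataNodes need a restart before the new size takes effect.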
Also, how to restart a dead DataNode:
- manually find the DataNode process on that machine and kill it if it is still hanging around (I also copied the exact process command line into my own start-datanode.sh)
- then run start-dfs.sh on the master (or run start-datanode.sh on the datanode itself)
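The restart steps could be sketched roughly like this. The helper name `datanode_pids` and the `HADOOP_HOME` variable are my own assumptions for illustration. Instead of a hand-made start-datanode.sh, the sketch uses the `hadoop-daemon.sh` script that ships in Hadoop's bin directory, which should do much the same job.

```shell
#!/bin/sh
# Print the PIDs of processes whose command line mentions DataNode.
# Reads `ps -eo pid,args` style output on stdin. The [D] in the pattern keeps
# this awk process from matching its own command line.
datanode_pids() {
    awk '/[D]ataNode/ { print $1 }'
}

# Hypothetical usage on the affected datanode (HADOOP_HOME is an assumption):
#   ps -eo pid,args | datanode_pids                      # leftover DataNode JVMs
#   kill <pid>                                           # stop a stuck one
#   "$HADOOP_HOME"/bin/hadoop-daemon.sh start datanode   # start it on this node
# Or, from the master:
#   "$HADOOP_HOME"/bin/start-dfs.sh
```

Listing the PIDs before killing anything is deliberate: a DataNode that hit OutOfMemoryError may already be gone, in which case only the start step is needed.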