Data lost after Hdfs client was killed
I wrote a simple tool to upload logs to HDFS, and I noticed some curious behavior.
If I run the tool in the foreground and stop it with Ctrl-C, the data shows up in HDFS.
If I run the tool in the background and kill the process with "kill -KILL pid", the data that has been processed is lost, leaving an empty file in HDFS.
My tool tries to sync frequently (every 1000 lines) by invoking SequenceFile.Writer.syncFs().
I just couldn't figure out why the data was lost. If my tool had been running all day and the machine suddenly crashed, would all of that day's data be lost?
My tool collects logs from different servers and uploads them to HDFS, aggregating all logs into a single file per day.
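To make the pattern concrete: here is a minimal local-filesystem analogue of the write loop, using plain java.io rather than the HDFS API. FileDescriptor.sync() stands in for SequenceFile.Writer.syncFs(); the file name, sample input, and batch size are illustrative, not from the original tool. The point is that anything buffered since the last flush+sync can vanish if the process is killed.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class SyncEveryN {
    public static void main(String[] args) throws IOException {
        File out = new File("collected.log");   // illustrative path
        int syncEvery = 1000;                   // same batch size as the tool
        try (FileOutputStream fos = new FileOutputStream(out);
             Writer w = new OutputStreamWriter(fos, StandardCharsets.UTF_8)) {
            long lines = 0;
            // stand-in for the real log input
            for (String line : new String[] {"a", "b", "c"}) {
                w.write(line);
                w.write('\n');
                if (++lines % syncEvery == 0) {
                    w.flush();          // push buffered bytes to the OS
                    fos.getFD().sync(); // ask the kernel to persist them
                }
            }
            // Without this final flush+sync, a kill -KILL arriving here
            // could lose everything written since the last sync.
            w.flush();
            fos.getFD().sync();
        }
    }
}
```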
The signals SIGKILL and SIGSTOP cannot be caught or ignored.
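One practical consequence for the JVM: a shutdown hook runs on normal exit, SIGTERM (plain kill), or SIGINT (Ctrl-C), so the tool gets a chance to close its writer cleanly in those cases, but a hook never runs on kill -KILL. A minimal sketch, where the close logic is a placeholder for whatever cleanup the tool needs:

```java
public class CloseOnTerm {
    // Hook runs on normal exit, SIGTERM, or SIGINT -- never on SIGKILL,
    // because the JVM is destroyed before it can execute anything.
    static final Thread HOOK = new Thread(() -> {
        // Placeholder: this is where the tool could close its
        // SequenceFile.Writer so the final block reaches HDFS.
        System.out.println("closing writer");
    });

    public static void main(String[] args) {
        Runtime.getRuntime().addShutdownHook(HOOK);
        // ... run the upload loop here ...
    }
}
```

This explains the observed difference: Ctrl-C delivers SIGINT, which the JVM turns into an orderly shutdown (hooks and finalization run), while kill -KILL terminates the process immediately with buffers unflushed.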
You could run strace on the process to see the effect of your syncFs() call: does it actually result in a call to fdatasync() or a similar syscall? Also consider a different design: could you close the file during idle periods instead of keeping it open all day?