Monday, November 18, 2013

Disk full: beyond a resource leak, it can mean increased business risk

We do our best to identify big files and directories, delete them and so on. But is that enough? We live in a world of abundance and think that throwing more hardware at the problem is the way to go when we get that "Disk full" error or the like. As a consequence, developers end up using better hardware than the servers their code will run on.

This, combined with a lack of performance and stress testing, ends up hiding important code problems that cause resource leaks (memory, file system, CPU) and only show up on the servers at a late stage.

If you constrain resources on developer machines on purpose, you might be able to find some of those problems sooner.

On a developer machine you will then see the disk fill up:

$ ssh df -h
Filesystem                           Size  Used Avail Use% Mounted on
/dev/sda1                             34G   32G  4.0K 100% /

Time to use lsof to list the open files:
$ ssh lsof >~/lsof.txt
After a reboot I got back roughly a quarter of the disk space:
$ ssh df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1        34G   23G  8.9G  73% /

Now it is time to analyze the lsof output:
$ sort -n -k 7 ~/lsof.txt | tail -1
java       1645        dev  202w      REG                8,1 9086951424     530660 /home/dev/.local/share/Trash/expunged/555119177 (deleted)

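The 7th column gives us the size, so we sort by its numeric value and take the last record, which is the biggest consumer. It tells us there is a 9GB file which was deleted but is still in use by Tomcat (process 1645). The file system cannot release the space of a deleted file while a process still holds it open, which is why the reboot freed the disk. Most likely there is a resource leak.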
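The same filtering can be sketched in Java, restricted to entries marked "(deleted)". This is only an illustration: the class and method names are hypothetical, the second sample line is made up, and the code assumes the default lsof column layout shown above, where the 7th whitespace-separated field is the size.

```java
import java.util.Arrays;
import java.util.List;

public class LsofDeleted {

    // Returns the "(deleted)" line with the largest 7th field, or null if none.
    // Assumption: default lsof layout, where the 7th field is SIZE/OFF.
    static String largestDeleted(List<String> lines) {
        String best = null;
        long bestSize = -1;
        for (String line : lines) {
            String trimmed = line.trim();
            if (!trimmed.endsWith("(deleted)")) {
                continue; // only files that were deleted but are still open
            }
            String[] fields = trimmed.split("\\s+");
            if (fields.length < 9) {
                continue; // unexpected layout, skip rather than guess
            }
            long size;
            try {
                size = Long.parseLong(fields[6]);
            } catch (NumberFormatException e) {
                continue; // the field held an offset (e.g. 0t0), not a size
            }
            if (size > bestSize) {
                bestSize = size;
                best = line;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
            // First line is taken from the lsof output above.
            "java 1645 dev 202w REG 8,1 9086951424 530660 "
                + "/home/dev/.local/share/Trash/expunged/555119177 (deleted)",
            // Second line is a made-up entry for a file that still exists.
            "java 1645 dev 10r REG 8,1 1024 12345 /tmp/example.log");
        System.out.println(largestDeleted(sample));
    }
}
```

Filtering on "(deleted)" first keeps large but legitimate files, such as logs that still exist on disk, out of the result.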

How can we find it? Stop any automated processes in charge of deleting files and run lsof when you run out of disk space again. It should tell you exactly which file it is, and from there you should be able to search your source code for the resource that is not being closed.

In Java 7, try-with-resources should be used. Before that we relied on libraries, or those of us coming from C were simply far more careful when working with resources. In any case, look into the IDE or compiler options that can help identify unclosed streams. Java developers should turn on the warnings in their IDEs and clean up the classes they touch. If the leak is not picked up by compiler or IDE warnings, reach out to the community to find out why. FindBugs could probably help, and if it does not, reach out to its developers; they will be more than happy to help as far as I can tell.
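As a minimal sketch of the difference, here is a leaky write next to its try-with-resources equivalent. The class and method names are hypothetical and are not taken from the application discussed in this post.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ReportWriter {

    // Leaky variant: if write() throws, close() is never reached and the
    // file descriptor stays open for as long as the writer is reachable.
    static void writeLeaky(Path target, String text) throws IOException {
        BufferedWriter out = Files.newBufferedWriter(target, StandardCharsets.UTF_8);
        out.write(text);
        out.close();
    }

    // Java 7 try-with-resources: the writer is closed when the block exits,
    // whether it completes normally or with an exception.
    static void writeSafely(Path target, String text) throws IOException {
        try (BufferedWriter out = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            out.write(text);
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("report", ".txt");
        writeSafely(tmp, "hello");
        System.out.println(new String(Files.readAllBytes(tmp), StandardCharsets.UTF_8));
        Files.delete(tmp);
    }
}
```

Any class implementing AutoCloseable can be declared in the try header, so the same pattern applies to streams, readers, sockets and JDBC connections.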

In my years as a developer I have found that we get "overwhelmed" by alerts, compiler warnings and many other "inconveniences", and as a result we ignore them all. This goes on until IT challenges the team, arguing that the software it has built is not efficient.

Code quality is important and, as in any other business, quality defines its very future. Code quality is about risk management, and as such you, as a developer, should not ignore warnings.

Happy (and responsible) coding!
