Thursday, May 07, 2015

Talend OutOfMemoryError: Java heap space because of many files in a directory

I have blogged in the past about how to debug OutOfMemoryError in Talend jobs.

There is at least one official Talend component that would be generating these errors when we point to a specific directory containing a really large amount of files. The reason is that some code generates an array of strings containing the file names which clearly will not scale. The way I figured this out was following the steps in that previous post. From Eclipse Memory Analyzer I saw the cause for high memory consumption was an array of strings which matched file names.

Of course it is a bad practice to use a root directory to store all files, one should use a temporary directory per run. So the solution is actually simple. Nevertheless keeping such array of strings is just a waste of resources so that should be avoided as well.

The bottom line is that just automatically increasing memory when a JVM code throws OutOfMemoryError is not an option. Instead the engineer should investigate and get to the bottom of why processes are inefficient. Failure to do so will only postpone the inevitable because simply underperforming jobs won't scale. In the case of Talend as in any java application the JVM provides the tools to understand what happened when a memory leak originated a crash.

No comments: