Thursday, May 07, 2015
Talend OutOfMemoryError: Java heap space because of many files in a directory
I have blogged in the past about how to debug OutOfMemoryError in Talend jobs.
There is at least one official Talend component that will generate these errors when pointed at a directory containing a very large number of files. The reason is that the generated code builds an array of strings holding all of the file names, which clearly will not scale. I figured this out by following the steps in that previous post: Eclipse Memory Analyzer showed that the cause of the high memory consumption was an array of strings matching the file names in the directory.
Of course it is bad practice to store all files in a single root directory; one should use a temporary directory per run, so the workaround is actually simple. Nevertheless, keeping such an array of strings is a waste of resources and should be avoided as well.
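Here is a minimal sketch of the per-run directory approach, assuming a bash wrapper around the job (the paths and the cleanup policy are hypothetical, not part of any Talend component):

  #!/bin/bash
  # Hypothetical wrapper: give each job run its own working directory so that
  # no single directory ever accumulates an unbounded number of files.
  RUN_DIR=$(mktemp -d /tmp/talend_run.XXXXXX)

  # ... point the job's file components at $RUN_DIR instead of a shared root ...

  # Remove the per-run directory once the job has finished.
  rm -rf "$RUN_DIR"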
The bottom line is that blindly increasing memory whenever JVM code throws an OutOfMemoryError is not an option. Instead, the engineer should investigate and get to the bottom of why the process is inefficient. Failure to do so will only postpone the inevitable, because underperforming jobs simply won't scale. In the case of Talend, as in any Java application, the JVM provides the tools to understand what happened when a memory leak caused a crash.
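For example, the JVM can be told to dump the heap at the exact moment of the crash so it can be inspected later in Eclipse Memory Analyzer. A minimal sketch, assuming the job runs as a standalone jar (my_talend_job.jar and the dump path are placeholders):

  # Write a .hprof heap dump on OutOfMemoryError; open it afterwards in
  # Eclipse Memory Analyzer to find the dominating objects (in this case,
  # the array of file name strings).
  java -Xmx1024m \
       -XX:+HeapDumpOnOutOfMemoryError \
       -XX:HeapDumpPath=/tmp/talend_job.hprof \
       -jar my_talend_job.jar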
Saturday, May 02, 2015
Fastest idempotent way to install Node.js on Linux or Mac OS X
Simply run the one-liner from my plain old bash (POB) recipe as shown below:
export NODE_VERSION=18.19.0; curl -sL https://raw.githubusercontent.com/nestoru/pob-recipes/master/common/nodejs.sh | sudo bash -s $NODE_VERSION $USER

Note that you might need to run 'npm rebuild' in your project(s) if you run into errors like:
Error: The module '/app/node_modules/node-expat/build/Release/node_expat.node' was compiled against a different Node.js version using NODE_MODULE_VERSION 64. This version of Node.js requires NODE_MODULE_VERSION 83. Please try re-compiling or re-installing
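A minimal sketch of that fix, assuming the affected project lives at /app as in the error message above (your project path will differ):

  # Recompile native addons (such as node-expat) against the ABI of the
  # newly installed Node.js version.
  cd /app
  npm rebuild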