Wednesday, August 05, 2015

Use 'set -x' in docker multiline statements

Some people new to Docker are also new to bash, which is a must-have skill for a *nix sysadmin. Perhaps the most important statement in bash, 'set -x', prints each command before it runs and should be used in multiline RUN statements to see exactly where the whole RUN command fails. The classic example involves entering a directory and issuing several commands there. Without 'set -x' you will be lost about why the command failed, and in some cases even about which specific command failed. Example:
RUN set -x \
      && cd /tmp \
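
The effect is easy to reproduce outside Docker. What follows is a minimal shell sketch, not a Dockerfile: the function name run_chain and the path /nonexistent-setx-demo are made up for illustration. It runs an &&-chain of the kind a multiline RUN would contain, with tracing on, and the third command is made to fail on purpose.

```shell
#!/usr/bin/env bash
# Hypothetical demo: run_chain and /nonexistent-setx-demo are illustrative names.

run_chain() {
  (
    set -x                          # print each command before it runs
    cd /tmp \
      && pwd \
      && ls /nonexistent-setx-demo \
      && echo "never reached"
  ) 2>&1                            # the trace goes to stderr; merge it into stdout
}

# Capture the trace; '|| true' keeps the script going although the chain fails.
trace="$(run_chain)" || true
printf '%s\n' "$trace"
```

The last traced command is the one that failed, followed by its own error message. The final echo never shows up in the trace because && stops the chain at the first failure.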

Wednesday, June 17, 2015

Scatter Diagrams from any two columns in Excel 2010

In Excel 2010, to plot a scatter diagram from two columns that appear in any position and in any order, follow these steps:

  1. Click on an empty cell; select Menu | Insert | Scatter | select the first chart type
  2. Right click on the chart area: click “Select Data” | Add legend entries (series) | Pick X and Y values | Pick a name for the series, for example “Comparison of size and rental cost of apartments” or in general “Comparison of X and Y” | click OK
  3. From “Chart Tools | Design | Chart Layout” pick the first one (layout 1), which adds the axis labels
  4. Remove the label on the right which contains the name of the series. It is redundant, as the title already states the same
  5. Click on each axis title label to select it, then click again inside it to change it to the real name of Y or X

Thursday, May 07, 2015

Talend OutOfMemoryError: Java heap space because of many files in a directory

I have blogged in the past about how to debug OutOfMemoryError in Talend jobs.

There is at least one official Talend component that generates these errors when pointed at a directory containing a very large number of files. The reason is that some code builds an array of strings containing all the file names, which clearly will not scale. I figured this out by following the steps in that previous post: in Eclipse Memory Analyzer I saw that the cause of the high memory consumption was an array of strings which matched file names.

Of course it is bad practice to use a single root directory to store all files; one should use a temporary directory per run. So the solution is actually simple. Nevertheless, keeping such an array of strings is just a waste of resources, so that should be avoided as well.
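
A temporary directory per run can be sketched in a few lines of shell. This is only an illustration under assumptions: the "job-run" prefix and the input file names are made up, and the touch line stands in for whatever the job would really write.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: the "job-run" prefix and the file names are made up.

# Create a fresh working directory for this run only.
run_dir="$(mktemp -d "${TMPDIR:-/tmp}/job-run.XXXXXX")"

# Remove it when the script exits, so files never pile up between runs.
trap 'rm -rf "$run_dir"' EXIT

# Stand-in for whatever the job would write.
touch "$run_dir/input-1.csv" "$run_dir/input-2.csv"

# Anything listing this directory sees only this run's files.
file_count="$(find "$run_dir" -type f | wc -l | tr -d ' ')"
echo "files in $run_dir: $file_count"
```

Because each run starts from an empty directory, the list of file names a component has to hold stays proportional to one run instead of growing with every run.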

The bottom line is that automatically increasing memory whenever JVM code throws OutOfMemoryError is not an option. Instead, the engineer should investigate and get to the bottom of why processes are inefficient. Failure to do so will only postpone the inevitable, because underperforming jobs simply won't scale. In the case of Talend, as in any Java application, the JVM provides the tools to understand what happened when a memory leak caused a crash.
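
One such tool is the heap dump that HotSpot JVMs can write on their own when an OutOfMemoryError is thrown; the resulting .hprof file can then be opened in Eclipse Memory Analyzer. The sketch below only assembles the options and prints the command. The dump directory and my_job.jar are hypothetical names, and how the options reach a particular Talend job depends on how that job is launched.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: dump_dir and my_job.jar are made-up names.

dump_dir="${TMPDIR:-/tmp}/heapdumps"
mkdir -p "$dump_dir"               # the dump directory has to exist beforehand

# HotSpot options: write a heap dump when OutOfMemoryError is thrown.
jvm_opts="-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=$dump_dir"

# Print the command instead of running it, since the jar is only a placeholder.
echo "java $jvm_opts -jar my_job.jar"
```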

Saturday, May 02, 2015

Fastest idempotent way to install Node.js on Linux or Mac OS X

Simply install the binary from a plain old bash (POB) recipe ;-)
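
For readers who want the general idea, here is a minimal sketch of what such a recipe could look like. It is not the original recipe: the function names, the example version 0.12.2, the hard-coded x64 architecture and the /usr/local prefix are all assumptions. Idempotency comes from checking the installed version first and doing nothing when it already matches.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not the original recipe: function names, the example
# version and the /usr/local prefix are assumptions.

# Build the download URL for a version, OS (linux|darwin) and arch (e.g. x64).
node_tarball_url() {
  echo "https://nodejs.org/dist/v$1/node-v$1-$2-$3.tar.gz"
}

# Succeeds only when the requested version is already on the PATH.
node_is_installed() {
  command -v node >/dev/null 2>&1 && [ "$(node --version)" = "v$1" ]
}

# Download and unpack the binaries unless they are already in place.
install_node() {
  version="$1"
  prefix="${2:-/usr/local}"
  os="$(uname -s | tr '[:upper:]' '[:lower:]')"   # linux or darwin
  if node_is_installed "$version"; then
    echo "node v$version already installed, nothing to do"
    return 0
  fi
  curl -fsSL "$(node_tarball_url "$version" "$os" x64)" \
    | tar -xzf - -C "$prefix" --strip-components=1
}

# Only show the URL that would be fetched; nothing is installed here.
node_tarball_url 0.12.2 linux x64
```

To actually install, one would call something like install_node 0.12.2 with enough privileges to write to the prefix. Running it a second time with the same version is a no-op.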

Thursday, April 30, 2015

Fastest idempotent way to install Node.js on Ubuntu

Originally I created a gist tailored to Ubuntu; however, a faster way is just to use a Plain Old Bash script to install the binaries, as presented here. This will work on any Linux and on Mac OS X.