Thursday, December 20, 2012

Tomcat catalina.out rotation and truncation

Tomcat rotates its log so that everything logged on the current date goes to its own file ( catalina.yyyy-MM-DD.log ), but it also keeps the whole log, without truncation, in catalina.out. Most of the traces going to catalina.out can be avoided by removing the Console Handler from the below line in conf/
.handlers =, java.util.logging.ConsoleHandler
The reality is that catalina.out will still grow if any library that bypasses the logging framework sends its output to stdout, for example (probably not the best example for production as you won't run a debugger there, will you?):
Listening for transport dt_socket at address: 8000
This might not be a big deal after all in terms of file size, but we will agree it is easier to read from the same file when monitoring your logs rather than having to inspect a different log every day. It makes sense IMO to go to rotated logs only for historical reasons.

I believe the most sensible approach is to set a file size you are comfortable with and make sure catalina.out is truncated any time it goes beyond that size. However, that is not easy to achieve from outside tomcat itself.

Truncating catalina.out from the stop command in makes the most sense out of the very different ways you can stop catalina.out from growing. That means right before the stop section ends (before the else for configtest) we can resolve this issue just adding:
  if [ "$(stat -c %s "$CATALINA_OUT")" -gt "$CATALINA_OUT_MAX_SIZE" ]; then
    cp /dev/null "$CATALINA_OUT"
  fi
elif [ "$1" = "configtest" ] ; then
Of course you need to configure the value for $CATALINA_OUT_MAX_SIZE in

Please understand that catalina.out is only truncated when it is greater than CATALINA_OUT_MAX_SIZE *AND* the stop command is used. This means you can eventually end up with a file bigger than the expected CATALINA_OUT_MAX_SIZE, but it should not grow out of control as it does when you take no measures at all.
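The same size check can be tried outside of tomcat before touching its scripts. This is a minimal sketch, not the tomcat code itself: truncate_if_larger is a hypothetical helper name, and wc -c replaces stat -c %s only so the sketch also runs on systems without GNU stat.

```shell
# Hypothetical helper (not part of Tomcat): empty a log file once it
# exceeds a byte limit, leaving smaller files untouched.
truncate_if_larger() {
  log_file="$1"
  max_bytes="$2"
  # Nothing to do when the file does not exist yet.
  [ -f "$log_file" ] || return 0
  # wc -c is used instead of "stat -c %s" so this also works on BSD/OSX.
  size=$(wc -c < "$log_file" | tr -d ' ')
  if [ "$size" -gt "$max_bytes" ]; then
    # Truncate in place so the path stays valid for whoever has it open.
    : > "$log_file"
  fi
}
```

For example, truncate_if_larger "$CATALINA_OUT" "$CATALINA_OUT_MAX_SIZE" would mirror the check added to the stop section.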

Wednesday, December 19, 2012

Ubuntu 12.04 Unity Launcher How To

Every time I have to come back to Ubuntu Desktop there is some major change I need to adapt to. I will make no comparisons with other Desktop OS but instead will keep here some notes I found useful for Ubuntu 12.04.
  1. You run applications by searching for them via "Dash Board", the top icon from the Launcher (the by-default left bar)
  2. You can add a shortcut in the Launcher to a particular application: Run the app, right click on its icon from the Launcher and select "Lock to Launcher"
  3. The bar can autohide if you use the "Appearance" app, "Behavior" option
  4. The Launcher is buggy at least in 12.04 so it might disappear, freeze, God knows what else. You just need to be able to connect to the console and run the below (which will restart the box and try to rebuild all compiz related settings)
    $ mv ~/.config/compiz-1 ~/.config/compiz-1.BACKUP
    $ shutdown -r now

Tuesday, December 18, 2012

DevOps and VDI configuration - Ubuntu Desktop POB recipes

We have presented before how to install remotely in any Ubuntu desktop Talend Open Studio IDE. The same can be done of course with any other package and I have just released the simple recipes that helped me installing not only Talend but Eclipse and iReport as well.

Here is how you would use it to install Eclipse Juno from a CIFS mounting point in your network:
common/ubuntu-desktop/ //fileServer/path/to/Eclipse/gzip/distro/file /mnt/Eclipse WIN_DOMAIN `logname` `logname` eclipse-jee-juno-SR1-linux-gtk-x86_64.tar.gz eclipse-jee-juno-SR1-linux-gtk-x86_64

Automating the Desktop is usually ignored, but the time you save is simply too big not to get my attention. Agility after all is to be applied to all areas of your development lifecycle. How can you possibly claim you are agile if you do not automate?

Having Desktop golden images for the Virtual Desktop Infrastructure (VDI) is as important for a software shop as having recipes for server configuration, management and maintenance. From Plain Old Bash scripts you can build a huge collection of recipes that can take care of your whole Infrastructure and development. With a tool like Remoto-IT you can deploy those on demand in new virtual or physical boxes.

Only when your Devs are agile AND your Ops are agile can you claim you have DevOps in your workplace.

Mountain Lion OSX slow SMB / CIFS / Windows network share access

Mountain Lion OSX slow SMB / CIFS / Windows network share access?
$ sudo bash
# if [ -f /etc/smb.conf ]; then mv /etc/smb.conf /etc/smb.conf.bak; fi
# echo "[default]" > /etc/smb.conf
# echo "notify_off=yes" >> /etc/smb.conf

Multi line comments in bash

Friday, December 14, 2012

Security: Stop Tomcat info disclosure

Not that I believe obscuring information of your running backend services will stop any smart hacker from figuring out what version you are running. In fact are hackers still doing that manually? I would expect them to have automated tools that would simply try to attack using a reverse historical vulnerability list for example.

In any case it is "recommended" to obscure tomcat version number and here is a recipe to do exactly that. We cannot afford patching dozens of tomcat servers following steps and while any configuration and management tool could help we prefer Plain Old Bash (POB) recipes run from Remoto-IT. So we just include a recipe line like:
common/tomcat/ /opt/tomcat

Tuesday, December 11, 2012

couchdb Unknown SSL protocol error in connection

After hardening a couchdb server I found the below:
$ curl -v -X GET https://localhost:6984
* About to connect() to localhost port 6984 (#0)
*   Trying connected
* successfully set certificate verify locations:
*   CAfile: none
  CApath: /etc/ssl/certs
* SSLv3, TLS handshake, Client hello (1):
* Unknown SSL protocol error in connection to localhost:6984 
* Closing connection #0
curl: (35) Unknown SSL protocol error in connection to localhost:6984 
But I have done this several times before and it worked, so I knew it had to be a problem with the certificates. How to debug what is going on? The erlang traces gave barely any clue other than a crash after loading the key. The next step was therefore to run a local server using the same certificate and key, to later test whether curl gets a better response from that server (default port is 4433):
$ openssl s_server -key couchdb.pem -cert couchdb.cert.pem -www
Enter pass phrase for
And that was the issue, the key needed a password. This can be either configured or removed from the key so it works without a password. No need to say which one is more secure.
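A quick way to spot this before starting any server is to look at the key file itself. This is a minimal sketch, assuming the key is a PEM text file: traditional encrypted keys carry a "Proc-Type: 4,ENCRYPTED" header and PKCS#8 ones begin with "BEGIN ENCRYPTED PRIVATE KEY". The helper name key_is_encrypted is hypothetical.

```shell
# Hypothetical helper: succeed (exit 0) when a PEM private key file
# looks passphrase protected.
# Assumption: the file is PEM text, where both the traditional header
# ("Proc-Type: 4,ENCRYPTED") and the PKCS#8 header
# ("-----BEGIN ENCRYPTED PRIVATE KEY-----") contain the word ENCRYPTED.
key_is_encrypted() {
  grep -q 'ENCRYPTED' "$1"
}
```

Running key_is_encrypted couchdb.pem && echo "key needs a pass phrase" would have pointed at the problem right away. For an RSA key, openssl rsa -in couchdb.pem -out some-other-file.pem writes an unprotected copy after asking for the pass phrase (the output file name is just a placeholder).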

Running tomcat remotely

Let us say you want to restart tomcat remotely. Any of the below commands would work when issued directly in the server:
$ service tomcat start
$ /etc/init.d/tomcat start
Unfortunately they will not work if you try something like:
$ ssh -t remoteserver sudo /etc/init.d/tomcat start
That is what ultimately happens when you try to restart tomcat from a POB recipe using Remoto-IT as well.


The tomcat/bin/ script needs to be modified to use the nohup command like:
$ vi /opt/tomcat/bin/ 
exec nohup "$PRGDIR"/"$EXECUTABLE" start "$@"
The reason for this issue is that when the command is run remotely, closing the terminal session sends a hangup signal (SIGHUP) that kills the tomcat process, which depends on that session. The nohup command makes the process ignore that signal, so it keeps running after the terminal is gone and basically behaves like a daemon.
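The effect of nohup can be tried locally with any command before touching the tomcat script. This is a minimal sketch under assumptions: start_detached is a hypothetical helper, not part of tomcat, and the log path is whatever you pass in. Redirecting all three standard streams matters because an ssh session can otherwise stay open waiting on them.

```shell
# Hypothetical helper: run a command immune to SIGHUP, in the background,
# with stdin/stdout/stderr detached from the terminal. Prints the PID.
start_detached() {
  log_file="$1"
  shift
  # nohup ignores the hangup signal; the redirections keep the process
  # from holding on to the terminal (or ssh channel) it was started from.
  nohup "$@" > "$log_file" 2>&1 < /dev/null &
  echo $!
}
```

Something like start_detached /tmp/demo.log sleep 600 followed by closing the terminal should leave the sleep process alive.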

Monday, December 03, 2012

Mount a VirtualBox shared folder in Ubuntu guest

VirtualBox "presents" to the Linux operating systems a device named after the shared name you selected from "Devices|Shared Folders" VM menu option. In order to use that as a directory in Ubuntu or any Linux system you need to follow the same steps you would follow to mount any device. Let us say you picked as a name "reports" to mount some reports folder in our host:
  1. Create a directory to be used as the destination of the mounting point:
    sudo mkdir -p /mnt/reports
  2. Mount the device using the proper file system type (-t)
    sudo mount -t vboxsf -o uid=`logname`,gid=`logname` reports /mnt/reports
Note that I mounted under /mnt. That is for the sake of organization: you should not be mounting in arbitrary paths (even though it is perfectly valid). It is better to know from one single place how many mounted paths you have.

Finally look at the use of `logname` invocation to make sure the VBOX folder gets mounted in a directory where the username of the current logged in user has enough permissions.

Friday, November 30, 2012

Map hostname to interface IP from a POB recipe

Here is a POB recipe you can use to add an entry in /etc/hosts for the IP of a given interface. This is great for boxes like developer Desktops where for example you want to make sure apache is installed using SSL virtual hosts, just to name a simple example.

Here is how to call it with an example which creates two entries in the remote /etc/hosts (if used through Remoto-IT):
common/tools/ bhub eth0
common/tools/ eth0

Linux Server cries for Linux Desktop - bash ': invalid option'

This should be an FAQ for bash. If you edit a bash file in a Windows machine or with an editor that uses carriage return (CR) in addition to line feed (LF) to produce new line characters you end up with:
: invalid option
If you open that file in Linux with some editors like vi you see ^M characters (corresponding to CR) at the end of every line. With vim you are less likely to see them, but the ":set ff?" command shows the file format is "dos". Another way to look at the file is just using the file command:
$ file Bourne-Again shell script, ASCII text executable, with CRLF line terminators
And we all know you can correct that file in various ways like:
$ dos2unix myFile
Running the file command again will no longer report CRLF, because the file is now converted to use just line feed (^J) as line terminator.
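When dos2unix is not installed, the same check and fix can be done with standard tools. This is a minimal sketch with hypothetical helper names (has_crlf, strip_cr). Note that tr removes every carriage return in the file, not only those right before a line feed, which is normally what you want for a bash script.

```shell
# Hypothetical helper: succeed when the file contains carriage returns.
has_crlf() {
  # Command substitution only strips trailing newlines, so CR survives.
  cr=$(printf '\r')
  grep -q "$cr" "$1"
}

# Hypothetical helper: remove all carriage returns from the file.
strip_cr() {
  # Write to a temporary copy first, then overwrite the original in place
  # so its permissions (for example the executable bit) are preserved.
  tr -d '\r' < "$1" > "$1.tmp" && cat "$1.tmp" > "$1" && rm -f "$1.tmp"
}
```

A recipe could then guard its scripts with has_crlf myFile && strip_cr myFile.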

Even if you edit such a file in Linux you will still end up respecting the original ^M in most editors.

Some editors will do this trick as part of the encoding conversion. For example I have seen the Eclipse editor converting the file correctly when switched to UTF-8 in file properties, text file encoding option. While I saw that working on Linux before, in OSX Mountain Lion at least it does not. Luckily Eclipse comes with a "File, Convert Line Delimiters To, Unix" option that does work on OSX as well. On my Mac even the file command will not report the file as having CRLF when in fact it does.

What everybody keeps on forgetting is the amount of time we lose when we develop in one OS and deploy in a different one. Nobody questions that different environments like Integration, Staging, Production should be similar and ideally clones just with different data state so why is it so difficult to understand that if you deploy in Linux your code should be produced in a Linux machine? Way easier for everybody and with powerful hardware like we have today having Windows and Linux running at the same time in your machine should not be that difficult (in case you need to code in both).

Specifically for bash programming you need to be careful with file permissions as well. Go figure how to handle that in Windows. Unless you work with cygwin on top of Windows and use an editor from inside it, editing bash files in Windows should be banned! Just kidding, but seriously, your environment is the foundation of your SDLC; anything wrong there will have tremendous consequences for productivity.

org.quartz.JobPersistenceException: Couldn't retrieve job because a required class was not found

Quartz API will fail if you ever dare to change the name of any existing Job class:
org.quartz.JobPersistenceException: Couldn't retrieve job because a required class was not found
So if you make changes in a scheduled job class do remember to provide migration plans before your release. For sure you will need to rename in the database the class name in case it changes before you use the Quartz API. This is true even if you are trying to delete the job and triggers like I have posted before.

Thursday, November 29, 2012

The agile alternative to Ubuntu release upgrade

We had some Ubuntu 10.10 machines we wanted to upgrade to 12.04.1 LTS so I said let us give it a try in a clone to see how it goes before we work on the real servers.

After updating and upgrading 10.10 we proceed to a release upgrade:
After running this command, going through some questions and restarting the box, the system was upgraded to natty (11.04), not precise (12.04). Right there the problems started: mysql would not start, the couchdb service would not listen and more. Running the command again made some progress and, as you can see below, after repeating the procedure 3 times I finally got the version I wanted (albeit several services will need to be tweaked to work correctly):
$ cat /etc/lsb-release 

$ cat /etc/lsb-release 

$ cat /etc/lsb-release 

$ cat /etc/lsb-release 
So this upgrade in phases, restarting the machine again and again, responding to questions etc. demands too much human intervention for my taste, and leaves too much room for errors. Granted, this would not be as bad if coming from 10.04 LTS, but still ...
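Checking which release you landed on after each reboot is at least easy to script. This is a minimal sketch, assuming the usual KEY=value layout of /etc/lsb-release with a DISTRIB_RELEASE entry; release_of is a hypothetical helper name.

```shell
# Hypothetical helper: print DISTRIB_RELEASE from an lsb-release style file.
# Assumption: the file is plain KEY=value shell syntax, as on Ubuntu.
release_of() {
  file="${1:-/etc/lsb-release}"
  # Source the file in a subshell so its variables do not leak into ours.
  # Pass an absolute path; "." searches PATH for names without a slash.
  ( . "$file" && printf '%s\n' "$DISTRIB_RELEASE" )
}
```

A recipe could then stop as soon as [ "$(release_of)" = "12.04" ] holds instead of relying on someone reading the file by hand.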

What is the alternative? Having your system scriptable, being able to reinstall the whole server from recipes. Plain Old Bash (POB) recipes would suffice and you can remotely deploy them using Remoto-IT.

Here are some advantages of rebuilding the box from scratch in comparison to upgrading the OS in the box:
  1. Going beyond Disaster Recovery (DR): To recover even from total loss of snapshots (SQL data corruption in snapshots for example), or corrupted DR data. Just tape backups are enough
  2. System Tests: As a real life exercise to test backup/restore procedure plus automated DevOps recipes for configuration and deployment management
  3. Recipes update: Recipes might not actually work with newer versions of OS so with this process recipes are kept current or worst case scenario customized per OS release version
  4. House keeping: From time to time IT guys will do manual stuff, keep big files where they should not, forget about cleaning up broken installations or simply can't afford the time it takes to correct broken dependencies. POB recipes on top of a brand new OS is a solution for that mess for sure
  5. Performance: Only the bare minimum needed is guaranteed to be in the new fresh OS. That is not the situation for most servers, which have seen several IT people tweaking them, in many cases for several years
  6. Reliability: Answering questions about the upgrade and manually hitting the button will always be less reliable than automated scripting. It is just the nature of manual intervention
  7. Efficiency: Even today with virtualized environments you will spend more time upgrading the OS and dealing with potential problems than the time it will take to automatically deploy the necessary services and data in a brand new OS release
  8. Security: Last but not least going with POB recipes makes sure there is no hidden malware (root-kit or not) deployed from an external or internal attacker. You simply get a cleaner, more reliable and secure system.
If you use an Ubuntu LTS distribution you need to do this every two years, not bad.

couchdb not listening

After the latest release upgrade to Ubuntu 12.04.1 LTS the couchdb server would start without complaining and all processes would show up, and yet it would not be listening on any port. The log would say nothing.

In this case we need to look at the standard console after running the couchdb server manually:
$ sudo /usr/local/etc/init.d/couchdb stop
 * Stopping database server couchdb                                                                                                                                           [ OK ] 
$ sudo -u couchdb couchdb
Apache CouchDB 1.1.1a1187836 (LogLevel=info) is starting.

=CRASH REPORT==== 29-Nov-2012::16:35:18 ===
    initial call: application_master:init/4
    pid: <0.31.0>
    registered_name: []
    exception exit: {bad_return,
                             " cannot open shared object file: No such file or directory"}}}
      in function  application_master:init/4
    ancestors: [<0.30.0>]
    messages: [{'EXIT',<0.32.0>,normal}]
    links: [<0.30.0>,<0.7.0>]
    dictionary: []
    trap_exit: true
    status: running
    heap_size: 987
    stack_size: 24
    reductions: 176

=INFO REPORT==== 29-Nov-2012::16:35:18 ===
    application: couch
    exited: {bad_return,{{couch_app,start,
                         {'EXIT'," cannot open shared object file: No such file or directory"}}}
    type: temporary

At that point I bet you can correct a lot of stuff, but in these cases I save time with a fresh installation using a POB recipe run remotely with the help of Remoto-IT. Here is such a recipe.

Especially after a release upgrade expect to get package corruption. More on Ubuntu release upgrades soon.

Tuesday, November 27, 2012

Debugging JDBC based applications

You are a Java developer and you use JDBC because you use a RDBMS like MySQL. Clearly you will be blind if everything you do is at JPA or ORM level. You must understand JDBC and databases if you want to find out why your application is suffering from performance issues, deadlocks and more.

The p6spy driver helps you with that. It will give you all the complete queries including parameters that are run in your system, when they are run and how long it takes to execute them. We have built a POB recipe that easily installs p6spy in tomcat. Below is a copy of it:
#!/bin/bash -e
# common/tomcat/


curl -o /opt/tomcat/lib/p6spy.jar "$ARTIFACTORY_JAR"
common/tools/  /opt/tomcat/lib/ SVN_P6SPY_PROPERTIES
sed -i 's/\(jdbc.*driverClassName\).*/\1=com.p6spy.engine.spy.P6SpyDriver/g' /opt/tomcat/lib/
sed -i "/JAVA_OPTS -Dp6.home/d" $TOMCAT_SETENV_PATH
echo "JAVA_OPTS=\"\$JAVA_OPTS -Dp6.home=/opt/tomcat/lib/\"" >> $TOMCAT_SETENV_PATH
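The last two lines of the recipe form an idempotent pair: delete any previous setting, then append the current one, so running the recipe twice never duplicates the line. This is a minimal sketch of that pattern with a hypothetical helper name (ensure_p6_home). It uses grep -v with a temporary file instead of sed -i only so it also runs where sed -i behaves differently.

```shell
# Hypothetical helper: make sure a setenv file contains exactly one
# JAVA_OPTS line setting -Dp6.home, no matter how often it is run.
ensure_p6_home() {
  setenv_file="$1"
  p6_home="$2"
  touch "$setenv_file"
  # Drop any previous line. grep -v exits non-zero when nothing is left,
  # which is fine here, hence the "|| true".
  grep -v 'JAVA_OPTS -Dp6.home' "$setenv_file" > "$setenv_file.tmp" || true
  # Overwrite in place to keep the original file permissions.
  cat "$setenv_file.tmp" > "$setenv_file"
  rm -f "$setenv_file.tmp"
  # Append the current value, keeping $JAVA_OPTS literal in the file.
  echo "JAVA_OPTS=\"\$JAVA_OPTS -Dp6.home=$p6_home\"" >> "$setenv_file"
}
```

Calling ensure_p6_home "$TOMCAT_SETENV_PATH" /opt/tomcat/lib/ any number of times leaves a single line in the file.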
Breakpoints do not help when troubleshooting concurrency issues. Real world goes beyond the development box and so only hitting your application with a stress test (read JMeter for example) can prove that you have no issues that will later come back to your plate. At a minimum run those stress tests in Integration environment every so often.

Friday, November 23, 2012

MySQL EXPLAIN Divide and Conquer

Especially when you are using sub-selects (AKA subqueries) the results of EXPLAIN are not that clear to analyze. A good technique is to run your subqueries and try to tune them separately before trying to adjust the whole query.

I found myself wearing the DBA hat during this long weekend resulting in a speed improvement of more than four thousand times.

As MySQL manual suggests make sure you run EXPLAIN and look first at those entries with the "extra" field stating "Using temporary" and/or "Using filesort".

Then identify the query bits using those conflicting tables and try to isolate them (divide)

Continue running EXPLAIN on those bits to come up with a solution that increases performance (conquer).

Contrary to popular belief it is not always a lack of indexes. In fact too many indexes can slow the query or other queries down, and they will for sure slow down inserts.

Many times it is about a query that should be rewritten.

Look for things like not filtering in the subquery. The WHERE clause is your friend. In my case I found the developer used a WHERE clause in the wrapper dataset but not in the sub-selects where he could have done the same for the very same key.

Look for multiple keys or indexes that are just redundant. Remember the rule: a composite index/key is read from left to right, so there is no need to add other keys or indexes covering a leading portion of the composite. If your query uses the fields in that order, whether one, two or more of them, that single composite key will be enough.

Any job can be done in multiple ways; a developer can use any of them but the engineer should always pick the fastest. Let us strive to be better Engineers when we develop code! Rewrite your query, for which of course you must have documentation of what you are trying to achieve. Yes, documentation is on the right side in the Agile Manifesto, but that does not mean it is not necessary.

Learn by challenge. You will not master MySQL optimization by reading the whole manual and ten books. You will master it as you face problems and wonder why they occur. In that process you will definitely learn *just* what you need to perform your job.

In extreme cases some table structures will have to be changed. Even de-normalization will sometimes be the only option, but please be sure there is nothing else to do with indexes or by rewriting your query. Changing the table structure might hide a lot more time and complexity, and probably not that much efficiency improvement.

Tuesday, November 20, 2012

Windows account locked when working from a MAC

Your Enterprise Domain Controller is for sure expiring passwords but your MAC is not integrated with it and yet you access several network resources like Exchange and the Shared File System (AKA Common Internet File System or CIFS)

I thought I had found all the entries related to my account from the and so I told the sysadmin. However today I had a second thought, what about if the KeyChain search is actually not that smart? And indeed. The search will always look at key entries but never inside them.

In my case the sysadmin was seeing the account being locked from "\\workstation" which is not a name defined in any place after all.

Running the below command I was able to find several old entries for Remote Desktop Connections, File System, Exchange, iCal, Address Book and more.
$ security dump-keychain | grep $username
Then I realized I really had too many entries so I started going through them using:
$ security dump-keychain | more
I ended up then verifying I had a lot of old entries with expired passwords (I went back to and searched for the specific key named "srvr" in the CLI output). Some of those key/attribute/srvr names I found were having interesting Access Control names like "localhost/Address Book", "https/exchange", "IISupport/Mail,Mail,iSync" (note the multiple access control values).

Deleting and later recreating keys on demand sounds like a good option especially if you have migrated from a previous major version of the Operating system. We all know software is ultimately buggy so you never know how the old garbage can actually impact in your newly installed OS. In my case I upgraded from Snow Leopard to Mountain Lion. BTW to avoid confusion I am not stating though the upgrade is responsible for the locking as this was happening when I was in Snow Leopard as well.

Other causes

I have found manually created entries in /etc/fstab to be a cause for this issue as well.

Monday, November 19, 2012

Talend Eclipse default new line and Control M characters

I noticed some code I wrote in Talend Component's Designer Perspective was inserting Control M characters as new lines. Apparently the default for "New text file line delimiter" is not being picked correctly in OSX (10.8.2) using Talend Version: 4.2.3 Build id: r67267-20110905-0421

Sunday, November 11, 2012

OSX brew: Error: Cannot write to /usr/local/Cellar

sudo chown -RL `logname` /usr/local/Cellar

Filter the flow in Talend Components

When building a custom Talend component you might want to ignore certain input rows. I spent a lot of time trying to figure out why my tFileInputCSVFilter component was insisting on printing empty records for the filter flow and for the reject flow, the reason? Just an attribute from /COMPONENT/HEADER node :
In Talend by default when you use a flow from a main template and you don't assign anything to the output connector the result will be an empty record unless you set HAS_CONDITIONAL_OUTPUTS="true".

Wednesday, November 07, 2012

VDI Path: Access Linux Desktop remotely using RDP

Update 20220123: Works well for 20.04.

Update 20190511: In 18.04 clipboard does work again

Update 20161201: In 16.04 the clipboard is no longer working.

Update 20150731: Clipboard works great in Ubuntu 14.04 LTS, however Unity is not supported.

Original post for historical reasons: RDP has been for ages a great protocol giving Windows the big advantage on the Virtual Desktop Infrastructure (VDI) competition. The FreeRDP GitHub project is changing that little by little especially thanks to XRDP. Here is a POB Recipe to install XRDP. For the latest version of the script please pull the recipe from github pob-recipes project.
#!/bin/bash -e
# common/ubuntu/

apt-get -q -y remove xrdp
apt-get -q -y update
apt-get -q -y clean
apt-get -q -y purge xrdp
apt-get -q -y install xrdp
After you have done so the Ubuntu Desktop will be available via RDP, so you can use any RDC to connect to it on the default port 3389. You must log in with an existing linux user, so make sure the user you are trying to connect with is in /etc/passwd

I have found a very good performance after setting the desktop background to a simple color. Also restarting the desktop is recommended to make sure after a fresh reboot RDP will be available.

You can configure the port and a lot of other settings from ini files in /etc/xrdp/
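To see which port a given ini file actually configures, a small parser is enough. This is a minimal sketch under assumptions: it expects the port as a port= key (no spaces around the equals sign) inside a [Globals] section of /etc/xrdp/xrdp.ini, which you should verify against your installed file. The helper name ini_get is hypothetical.

```shell
# Hypothetical helper: print the value of KEY inside [SECTION] of an ini
# file. Section names are compared case-insensitively. Values containing
# "=" are cut at the first "=" (good enough for a port number).
ini_get() {
  awk -F= -v section="$2" -v key="$3" '
    /^\[/ { in_section = (tolower($0) == "[" tolower(section) "]"); next }
    in_section && $1 == key { print $2; exit }
  ' "$1"
}
```

With that in place, ini_get /etc/xrdp/xrdp.ini Globals port prints the port your RDC client needs.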

RDP Sessions

XRDP does a great job and by default it does what you would expect if you are disconnected from your session: Upon reconnection you will continue working using the very same session you were when you got disconnected. Just make sure you do not change the resolution of your client before you completely logout from an existing session because that will end up creating a brand new session.

Tuesday, October 30, 2012

Monday, October 29, 2012

JSON viewer and editor from a local HTML

I have looked at several JSON tools out there. So far I like the simplicity of json-editor because JSON is after all a Javascript Object Notation and so it makes sense to just have an HTML page which uses javascript to manipulate a JSON structure you want to either visualize or edit.

Just in case you are afraid of using github, not clear on front end web development or simply trying to quickly test one more json editing/viewing tool, here are the steps to follow:
  1. Save the zip file as "json-editor". Alternatively you can clone the project and so on but I said I wanted to provide instructions for those not using git and still interested in trying just yet another json editor
  2. Uncompress the zip and double click on index.html
  3. Paste your json in "Value", hit "Save" (no worries it is just "saving" the json in memory)
  4. On the right you get the JSON tree representation. As you click on different components you get different paths in "label"
  5. You can manipulate the json structure while adding or deleting child or sibling elements in any existing element. You can edit an element and pressing save will update again the local (in memory representation of the object). Clicking on the root node you can see the whole JSON string again.
Handy right? I have asked one of the developers what he thinks about supporting JsonPath

Saturday, October 27, 2012

Talend Component Creation Tutorial

Building custom ETL components is a necessity in any ETL suite. With Talend it is not difficult to create your own components albeit it is not straightforward either.

I have written a tutorial that I just released in github together with a hopefully useful component as well (tFileInputCSVFilter).

The tFileInputCSVFilter component is just a second step after the initial approach of running the code out of a tJavaFlex component.

So you could basically try to develop your code using a tJavaFlex and once happy you can move to the custom Talend component creation. Of course you can jump right away into the component creation as well.

Thursday, October 25, 2012

Automate security patching in Ubuntu

Important: This worked for Ubuntu 10.10. Here is a POB recipe you can use with Remoto-IT to deploy cron-apt in your servers correctly configured to get security updates automatically installed.

While you need to be careful with automated updates and especially upgrades in Ubuntu, security updates should be performed ASAP and they are *most of the time* safe.

If you want to get notifications only when a security upgrade is performed (recommended) then use:
common/debian/ upgrade

Test the installation

You can force the process to run right away by changing the cron expression in /etc/cron.d/cron-apt, then inspect the log file in /var/log/cron-apt/log. You should get something like:
CRON-APT RUN [/etc/cron-apt/config]: Mon Oct 2 04:00:01 EDT 2012
CRON-APT SLEEP: 3322, Mon Oct 2 04:55:23 EDT 2012
CRON-APT LINE: /usr/bin/apt-get update -o quiet=2
CRON-APT ACTION: 3-download
CRON-APT LINE: /usr/bin/apt-get autoclean -y
Reading package lists...
Building dependency tree...
Reading state information...
CRON-APT LINE: /usr/bin/apt-get upgrade -u -y -o APT::Get::Show-Upgraded=true
Reading package lists...
Building dependency tree...
Reading state information...
The following packages have been kept back:
  linux-headers-server linux-image-server linux-server
0 upgraded, 0 newly installed, 0 to remove and 3 not upgraded.

Friday, October 19, 2012

Notes on SpringOne 2012

SpringOne 2012 was full of good information, as usual. The sad part of 2011 was the absence of Rod Johnson, who was supposed to give the keynote but was apparently sick and could not attend. SpringOne 2012 was no different in this regard. As we already know, Rod departed from VMware in July. Even though this is supposed to be a straight technical note, I could not help but write some words of gratitude to the great Engineer who saved us from the complexities of J2EE, taking us on a journey of IoC that we of course still enjoy. Thank you Rod, we owe you a lot!

Back to the event here are some of my notes about it.

The classical Spring Triangle (DI/AOP/Portable Service Abstractions) is being re@annotated (Injection Annotations/Composable Stereotypes/Service Oriented Annotations). All this can be defined with two simple words "Annotated Components".

Stereotype means defining a noun for example @Service. You can compose stereotypes meaning you can build an annotation that groups several others. A composable stereotype model is then just an Annotation definition (Interface) which groups some annotations itself. Injection Annotation defines a need for example @Autowired. Service Oriented Annotations define a capacity like @Transactional, @Scheduled, @Cacheable.

Spring supports a programmatic way of configuring applications. While this can be handy, it looks to me like a potential problem for teams lacking good architectural direction. Separation of concerns could be easily violated if not used with care. Spring 3.1 already takes advantage of Servlet 3.0: WebApplicationInitializer replaces a lot of xml with java code. There is a composition model: some properties can be in web.xml while others are in the Java initializer. There is no overriding mechanism. The application can now be initialized without the help of web.xml. A couple of methods worth mentioning: scan() and register(). A couple of annotations worth mentioning: @Configuration, @Bean.

Spring 3.2 being in github is now more open for contributions. Expected to be released in December 2012 it features Gradle based build.

Spring 3.3 is expected by the end of December next year. It is based on JDK 8, so it will support and use JDK 8 closures (lambda expressions), the Date and Time API (JSR 310), NIO based HTTP client APIs (getting rid, as we all know, of jakarta commons http-client), parameter name discovery and java.util.concurrent enhancements.

Support for XML free JPA setup is now available.

MVC on the browser is becoming more popular and Spring is promoting that, proposing architectures that have not only the View but also the Controller in the client (in fact even a local model). This is of course not new, look at Gmail for one popular example. However I would expect Google to have an MVC on the Server side of Gmail as well. Spring proposes that only the service and data access live on the server side. I personally have advocated for years that the MVC pattern does not necessarily mean you must have all layers in the Server. That is now a reality with frameworks like Backbone and Angular, which deploy Controllers on the Front end and support local storage in browsers. However IMO the interaction between back-end and front-end MVCs might suffer from a serious lack of DRY. The proposal to eliminate the Controller completely from the backend could IMO make security an even bigger pain than it is today. Not to mention that moving just the Controller to the Browser already imposes additional security threats. The reality is that rich applications are demanding more and more, and it looks unavoidable to follow at least *part* of such advice.

Programmatic configuration goes further with MVC Java Config, which can be used instead of the MVC namespace. I would say that Architects now have bigger concerns about the classical "Where did my Architecture go", so my advice would be "Do review the code!".

The current @Cacheable is proprietary, but as soon as Java has the implementation available (JCache, JSR-107) Spring will integrate with it. We personally have been enjoying caching for a while, but it is good to go with the standards (whenever they are kept simple, of course).

ASM and CGLIB are now included in the Spring module jars.

Async MVC processing: This uses the Servlet 3 Async Thread Model. There are different approaches depending on the use case: Callable (swaps the servlet container thread for an application thread to process the request; the servlet thread is released and the request is resumed later, once it has been processed), DeferredResult (the result is provided by a thread or process outside of Spring's control) and AsyncTask (which wraps a Callable to add features like a timeout). @RequestMapping methods can return any of these objects. Look at the async tab in the Spring MVC showcase. There is a chat sample using Redis, showing a distributed chat application which uses the async concepts. Long Polling is supported through these async approaches. All Spring Filters have been updated to support the async features from the specification.
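To picture the Callable hand-off without a servlet container, here is a minimal plain-JDK sketch (Java 8+). It is an analogy only and does not use any Spring API: the class name, the handOff method and the thread pool standing in for the application threads are all made up for the example.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical illustration of the thread hand-off idea; not Spring code.
class AsyncHandOffSketch {

    // Runs the work on an application thread and returns immediately,
    // so the calling ("container") thread is free to serve other requests.
    static <T> CompletableFuture<T> handOff(Callable<T> work, ExecutorService appThreads) {
        CompletableFuture<T> result = new CompletableFuture<>();
        appThreads.execute(() -> {
            try {
                result.complete(work.call());
            } catch (Exception e) {
                result.completeExceptionally(e);
            }
        });
        return result;
    }

    public static void main(String[] args) {
        ExecutorService appThreads = Executors.newFixedThreadPool(2);
        try {
            CompletableFuture<String> response =
                handOff(() -> "processed on " + Thread.currentThread().getName(), appThreads);
            System.out.println("caller continues on " + Thread.currentThread().getName());
            // The "response" is only written once the application thread is done.
            System.out.println(response.join());
        } finally {
            appThreads.shutdown();
        }
    }
}
```

In Spring MVC the controller would simply return the Callable (or a DeferredResult) and the framework would take care of the dispatching shown here by hand.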

There is support for an annotation driven JMS endpoint model.

Spring will not stop support for java 5 and java 6.

To find out what is new in Spring 3.2, look at Stoyanchev's presentation, accessible from the spring-mvc-32 update github project.

Here is a handy class: UriComponentsBuilder, to build URLs.

If you are using servlet 3 multipart requests, simplify your life using @RequestPart

Do better exception handling in your applications: @ControllerAdvice allows you to define @ExceptionHandler methods globally, to handle exceptions across all controllers. The same applies to @InitBinder for global binding and to @ModelAttribute across all controllers. Global exception handling is presented in the Spring MVC showcase project as well.

ContentNegotiatingViewResolver looks by default at the extension of the URL to determine the View, and then at the Accept header and a request parameter (format, which I have been calling in my implementations ert or Expected Response Type).

For error handling now we can rely on Custom Error Page in Servlet 3 (error-page in web.xml).

Path segment name-value pairs are supported through @MatrixVariable
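To make the URL shape concrete: matrix variables are the ";name=value" pairs attached to a path segment, as in /cars;color=red;year=2012. The sketch below parses them in plain Java just to illustrate the syntax. It is not Spring's implementation, and the class and method names are made up for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, for illustration only: Spring does this binding for you
// when a handler method parameter is annotated with @MatrixVariable.
class MatrixVariableSketch {

    // Parses one path segment such as "cars;color=red;year=2012"
    // into its name-value pairs, skipping the segment name itself.
    static Map<String, String> parse(String pathSegment) {
        Map<String, String> variables = new LinkedHashMap<>();
        String[] parts = pathSegment.split(";");
        for (int i = 1; i < parts.length; i++) { // parts[0] is the segment name
            int eq = parts[i].indexOf('=');
            if (eq > 0) {
                variables.put(parts[i].substring(0, eq), parts[i].substring(eq + 1));
            }
        }
        return variables;
    }

    public static void main(String[] args) {
        System.out.println(parse("cars;color=red;year=2012"));
    }
}
```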

Spring Mobile is a server-side extension to Spring MVC. It complements client-side mobile frameworks. It does device detection, site preference management (a storage engine for user preferences, by default cookies) and site switching (to switch between mobile and desktop). LiteDeviceResolver is the default implementation. Other classes to take a look at: DeviceWebArgumentResolver and DeviceResolverHandlerInterceptor. Support for Java configuration will be added soon. The site switcher approach, which redirects to different sites for different devices, does not look very DRY to me.

Platform targeted sites can be developed using technologies like Lumbar and Thorax. I am not a big fan of this, to be honest. I already explained my position about MVC, DRY and the need for a Business Hub when building web applications. In terms of selecting which front-end pieces should be included in one project or the other, we have been using Maven overlays with success, so I personally am not embarking on this.

CouchDB is still not supported in Spring Data. I suggested looking at the Ektorp project for this. The familiar Spring Template pattern is used to access the supported NoSQL DBs. CrudRepository and PagingAndSortingRepository are some of the interfaces worth mentioning here. By implementing CrudRepository, the JPA entity can be directly exposed via REST. JSON is the first class universal protocol for this. Even for JPA you can use CrudRepository, which exposes basic operations on entities. The entities can be exported automatically using REST semantics. I suggested looking at the jpasecurity project for some enhancements to the Spring Data project.

QueryDSL is less verbose than JPA2 Criteria API and Spring has embraced it for their data project.

The Spring projects are on GitHub, which we already know is great for code collaboration as well as code review. The Spring team calls the code review part "Jurgenization", praising Jurgen's contributions to coding conventions. They are migrating everything they can to Gradle.

Websockets support is not fully standardized yet but Spring has been working on it. Websockets try to solve issues like too many connections, too much overhead and too much burden on the client side. Trading, chat, gaming, collaboration and applications visualizing a lot of data are good candidates to take advantage of websockets. The WebSocket Protocol (RFC 6455) uses HTTP to bootstrap but then runs directly on TCP, which makes it a low overhead solution. The client sends a simple header, "Upgrade: websocket", and the server replies with a 101 "Switching Protocols" status code and an "Upgrade: websocket" header. d3.js is a good library to visualize data, which the guys at SpringSource have combined with the vert.x library to rewrite the application (which uses Long Polling) using websockets (In there are several implementations using different technologies). From Chrome the websocket protocol frames can be inspected. The technology is still new and a lot of users are still on browsers which do not support websockets. Existing proxies become a problem for websocket support. Encrypted (wss:) traffic will create better possibilities to go around this issue. Some manual configuration could be needed in browsers and server side proxies. Keeping connections alive is a problem when using websockets. Out of the box there is no confirmation of message delivery (even though there is "ping-pong", which can be used to provide "keepalive" and "heartbeat"). In Java, the Java API for WebSocket (JSR-356) is still evolving and most likely will not be tied to the Servlet specification. Spring plus vert.x can be used to develop applications based on websockets. Sock.js is a great client side library to implement websockets applications: it falls back to other push mechanisms when the client does not support websockets, so sock.js is definitely an excellent API to do asynchronous messaging between client and server.
The message from Spring team: Websockets is a promising technology as a complement but it is not a silver bullet, the need for fallback options will be there for a long time. Backward protocol support is an important niche for Frameworks which Spring probably will address in future versions.
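The handshake mentioned above carries a bit more than the Upgrade header: per RFC 6455 the client also sends a Sec-WebSocket-Key header, and the server proves it understood the request by answering with a Sec-WebSocket-Accept header derived from that key. Here is a minimal sketch of that derivation in plain Java (Java 8+ for java.util.Base64). The class and method names are made up for the example; the GUID constant is the one fixed by the RFC.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

// Illustration of the RFC 6455 opening handshake; not tied to any framework.
class WebSocketHandshakeSketch {

    // Fixed GUID defined by RFC 6455 for computing Sec-WebSocket-Accept.
    private static final String GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11";

    // Accept value = base64( SHA-1( key + GUID ) ).
    static String acceptFor(String secWebSocketKey) {
        try {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            byte[] digest = sha1.digest((secWebSocketKey + GUID).getBytes(StandardCharsets.US_ASCII));
            return Base64.getEncoder().encodeToString(digest);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 is not available", e);
        }
    }

    public static void main(String[] args) {
        // Sample key taken from the RFC 6455 example handshake.
        String key = "dGhlIHNhbXBsZSBub25jZQ==";
        System.out.println("HTTP/1.1 101 Switching Protocols");
        System.out.println("Upgrade: websocket");
        System.out.println("Connection: Upgrade");
        System.out.println("Sec-WebSocket-Accept: " + acceptFor(key));
    }
}
```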

SpringSource is promoting "Spring MVC used less for page rendering, more for REST API". We saw that in the second keynote and also in several other presentations. Hence the search for a good front-end framework starts, and the options are really huge. Mustache is one good template library just to mention one; look for alternatives to see the rest. Backbone is one of the most popular MVC JavaScript frameworks.

REST is fully supported in Spring; for example, @PathVariable binds a method parameter to a variable in the request path. Spring (Data) REST uses Spring HATEOAS. Wikipedia says "The HATEOAS constraint serves to decouple client and server in a way that allows the server to evolve functionality independently", which I still have to see to believe. Design by Contract cannot be ignored, and developing a client that will adapt on the fly to new server needs is not something we mere mortals can do in 2012. Just the fact that the available resources can be listed does not guarantee the above, not to mention concerns about security.

While REST might be the way you want to go when all your clients will be speaking REST, the reality is that this does not comply with my idea of a Business Hub (BHUB). As a reminder, this idea is what allows me to serve the resources from one entry point and let a View resolver determine what type of response the client needs. This is not just about a JavaScript client framework but about reports rendered in PDF or Excel, for example. Of course it is kind of impossible to get both a real REST approach and a Business Hub approach. This is because you cannot force non-rich web clients to use JSON posts. On the other hand, your REST applications need as much documentation as a proprietary BHUB approach. With BHUB you play just with normal POST and GET parameters and you certainly get JSON (but not only JSON) back.

Spring Mobile has cool projects like Urban Airship to abstract the way you use push notifications for different platforms like the Apple iPhone.

A JBoss presentation discussed the coexistence of Spring and JEE. There will always be a space for service simplification. In particular JPA (and JNDI), JTA, JMS, JCA, EJB, Cache (JSR-107), WebSockets (JSR-356), CDI and Bean Validation 1.1 are services that the JEE application server is already providing, and Spring will transparently use those (if available) provided the correct configuration exists. So the point is that Spring does support JEE provided services and not just plain servlet containers like Tomcat, where none of these services are available or deployed by default (certainly there are ways, but with Spring you rarely use those for non-JEE containers). Other capabilities like JSONP are simple enough, and Spring will not introduce any simplifications. Multithreading and Spring Batch are still preferred to the vendor specific JSR 237 (WorkManager), which was withdrawn anyway in favor of the Java Concurrent API (JSR 236), dormant since 2003 but recently announced to come back to life in Q1 2013. The use of Arquillian is again proposed to test JEE based code. I think I already blogged about my opinion on this when I posted about JavaOne. The Seam framework/CDI (JSR-299) extensions have been donated to DeltaSpike, and yet CDI beans can be used from Spring. In fact there is work in progress for bidirectional injection from CDI to Spring and the reverse. In JEE7 that bidirectional relation is getting tighter. The integration with JEE is a high priority for Spring.

IOC in Javascript looked appealing for those not happy with functional programming. IMO separation of concerns should reign and a Front End Engineer cannot say Functional Programming is not a perfect paradigm for the event driven nature of UIs.

A migration to a JSR-352 approach was demonstrated showing how Spring adapts easily as in fact many ideas of the specification come from the Spring Batch implementation. I have to say this again this year, I will not comment on Spring Batch versus ETL tools because I believe it is matter of how you structure your team and probably a subject for a lot of complex considerations that go beyond simple software development. For now I am not planning on using Spring Batch.

Some interesting notes on Testing: MockMvc allows you to test Controllers. I personally think your behavior tests (with Selenium WebDriver) should cover anything wrong with Controllers, however there is more to Spring Test MVC. There is a new @WebAppConfiguration (defaults to src/main/webapp) in Spring 3.2, and @ContextConfiguration defaults to a local file with the name of the class followed by "-context.xml" in the same path. Using EasyMock and a factory method, Spring manages to inject mocked objects for testing. Mockito is also supported through a constructor receiving the to-be-mocked class. In both cases a Factory is in charge of generating the mock. MockServletContext, MockHttpSession, MockFilterChain, MockClientHttpRequest and MockClientHttpResponse have been introduced. Here is another concern for Architects: I only hope developers will not put servlet scope objects (request, context, response, session) in services now that you can mock those from Spring. An ApplicationContextInitializer can be used to avoid annotations or XML for initializing the Spring Context for testing. Spring Test MVC is an independent project; it depends on Spring 3.2. Some limitations: no forward nor redirect and no JSP rendering; other rendering technologies do work, as they do not depend on a real servlet container. I still believe Selenium WebDriver is the best way to test that tier in any case (granted, the problem is with side effects). In any case there is value in Controller unit tests, of course. IMO this framework creates interesting possibilities to perform automated security tests, like XSS attacks for example; however, as noted before, JSP won't be supported. You can check not only status, headers and content but flash attributes, handler and model content from the Spring context as well. The method alwaysDo(print()) is used to provide information about the "perform" action.
Method andReturn() will return all context servlet main objects in case we want to assert more specific data not available yet from the framework. Testing filters is powerful for example for spring security filter testing. HtmlUnit enables using Selenium tests but again that is not available for JSP. Learn more about this from the spring-32-test-webapps github project. In addition there are client and server tests in the Spring Framework itself so get the source code from github and start your own journey of fresh spring simple coding.

Notes on JavaOne 2012

Make the Future Java was the slogan for JavaOne 2012.

After spending 5 days x 12 hours in different training sessions at JavaOne I couldn't help blogging about my impressions on the pure technical side of the event. Please note that a significant portion of the presentations were handled by community driven projects and not precisely by Oracle, so some of my notes below reflect information acquired from entities external to Oracle.

Garbage Collection optimization is still a time consuming and complex process that demands a lot of trial and error. The hope is that G1 will come to the rescue of mere mortal programmers.

Troubleshooting JVM performance issues: Oracle is working on having all features from Flight Recorder plus JRockit Mission Control in HotSpot Mission Control. JRockit will be deprecated in 2013. For commercial reasons JVisualVM will not be integrated with Mission Control. Java Mission Control is a graphical tool that provides information about the JVM: the client side of Java Flight Recorder, if you will.

From the origins of RMI all the way to WebSockets we are still trying to get distributed computing right. WebSockets is pure TCP, the same way REST is pure HTTP and with pros and cons it looks like the community will only keep using both in the next years.

Use the new generics enhancements to make sure a specific class is returned by methods which operate only on interfaces. Guava and Goldman Sachs collections are recognized as great enhancements to the JDK library.

Use command line tools (and of course script them) to know more about your JVMs: jps to find the JVMs running in the system (-m and -v options); jcmd, which is similar to jps for listing but can also send commands to the JVM, so it can be used to diagnose it (jcmd <pid> VM.version for example). A list of commands can be passed in a file via the -f flag. Out of the box it allows for deadlock detection, as it can pull stack traces from the application (this creates a possibility for some interesting monitoring, right?); jstat is used to list counters from inside the JVM. JVisualVM can be used to take JVM core dump files and analyze them, but jstack will do the same from the command line (you get more power, we would agree). Again jcmd is useful here (jcmd <pid> GC.heap_dump file.dump).
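The same kind of deadlock check can also be done from inside the application through the standard java.lang.management API, which is handy if you want to wire it into your own monitoring. This is a minimal sketch, not what jcmd or jstack run internally; the class and method names are made up for the example.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// In-process deadlock check using the platform ThreadMXBean.
class DeadlockCheckSketch {

    // Returns the names of the threads currently deadlocked, or an empty list.
    static List<String> deadlockedThreadNames() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long[] ids = threads.findDeadlockedThreads(); // null when there is no deadlock
        List<String> names = new ArrayList<>();
        if (ids != null) {
            for (ThreadInfo info : threads.getThreadInfo(ids)) {
                if (info != null) { // a thread may have died in the meantime
                    names.add(info.getThreadName());
                }
            }
        }
        return names;
    }

    public static void main(String[] args) {
        List<String> deadlocked = deadlockedThreadNames();
        System.out.println(deadlocked.isEmpty()
            ? "No deadlock detected"
            : "Deadlocked threads: " + deadlocked);
    }
}
```

A scheduled task calling this method and alerting when the list is not empty would be one way to get the "interesting monitoring" mentioned above.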

Inspecting the JVM can be done through several methods: JMX (jvisualvm, like jconsole, uses JMX for remote access), daemon (jstatd is a daemon that can be run on the server, and then jstat is used to connect to it. There are no permissions here, so be careful where you run it) and attach (used by jmap, jcmd and jps; only available locally and for the same user). The file /tmp/hsperfdata has a lot of JVM runtime information which is constantly updated by the JVM. Use the jstack command for core files or a non-responsive JVM. Use it as a last resort, as it uses the debugger to pull information. The JVM built-in profiler and tracer use a circular buffer with low overhead, collecting info from the JVM.

In the future jcmd should replace jstack, jmap and jinfo. On improved logging for the JVM: garbage collector logs have rotation but the rest do not, so the plan is to unify them.
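The data that jconsole and jvisualvm read over JMX is exposed by the platform MXBeans, which any Java program can query locally. A minimal sketch, assuming you just want a one-line summary (the class name and the summary format are made up for the example):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.RuntimeMXBean;

// Reads a few values from the platform MXBeans, the same beans JMX clients see.
class JvmInfoSketch {

    static String summary() {
        RuntimeMXBean runtime = ManagementFactory.getRuntimeMXBean();
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        return "vm=" + runtime.getVmName()
            + " version=" + runtime.getVmVersion()
            + " uptimeMs=" + runtime.getUptime()
            + " heapUsedBytes=" + memory.getHeapMemoryUsage().getUsed();
    }

    public static void main(String[] args) {
        System.out.println(summary());
    }
}
```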

The Java Discovery Protocol (JEP 158) will be used to broadcast information from the jvm so tools like jvisualvm can be notified.

JRockit Mission Control can be used to find duplicated Strings, for example (those are candidates for interning, right? :). We know we can use tools like Eclipse Memory Analyzer (MAT) for that, but certainly it would be nice if the JDK itself came with the tools we need as developers (the one-stop shop concept saves time, of course).

Intel presented their SPECjbb2012 results for JDK7. They found no issues with most APIs: New I/O, JAXB, Try-with-resource, Catching Multiple Exceptions types, Type Inference for Generics Instance creation, Underscores in constants, Concurrent Utilities. However the Fork/Join Pool was found to be the big problem: Contentions and network throughput issues. JDK8 according to Oracle is simplifying this API so probably they will correct these detected issues.

The need to move to Java7 is clear. Just to mention a fact even though there is commitment to patch security bugs for JDK6 the support for it will cease in 2013. But there is more.

Find out from the jdk7 release notes website what is new in jdk7. Here are some of those features: string switch, diamond operator, simplified exception handling, better garbage collection (the basis of G1), multi-catch and try with (exceptions improvements)
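Here is a small sketch touching several of those language features in one place: string switch, the diamond operator, try-with-resources and multi-catch. The class and method names are made up for the example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class Jdk7FeaturesSketch {

    // String switch: the case labels are String constants.
    static int daysIn(String month) {
        switch (month) {
            case "feb":
                return 28;
            case "apr": case "jun": case "sep": case "nov":
                return 30;
            default:
                return 31;
        }
    }

    // Diamond operator plus try-with-resources: the reader is closed automatically.
    static List<String> readLines(String text) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    // Multi-catch: one handler for two unrelated exception types.
    static int parseOrDefault(String value, int fallback) {
        try {
            return Integer.parseInt(value.trim());
        } catch (NumberFormatException | NullPointerException e) {
            return fallback;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(daysIn("feb"));
        System.out.println(readLines("first\nsecond"));
        System.out.println(parseOrDefault(null, -1));
    }
}
```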

G1 is specially designed for big heaps (above 6GB). It is better than CMS, especially for fragmented heaps. It works by dividing the memory into regions which are heuristically selected for garbage collection (Divide and Conquer, right? ;-). It can be tested in JDK7u4+ with "-XX:+UseG1GC". CMS GC will be deprecated soon.

Contention is avoided in Date (from Hashtable to ConcurrentHashMap). There are BigDecimal improvements. String to byte conversion improvements.

Java upgrades are supposed to come now every two years. Skipping versions means a bigger update gap. IntelliJ was presented as the only automated refactoring capable tool for migration to java 7. Personally I recommend looking into current open bugs before deciding for an upgrade to java 7 of course but you should be planning for it.

There are performance improvements in JDK7, specifically in JDBC, JAX-WS, JAXB, java.io and async I/O.

Here is a recipe for your JDK7 migration: Compile to java 6 using jdk7 first. Test for some time, then upgrade the Runtime to Java 7. Finally migrate code to Java 7.

Java 7 has a stricter API, so expect some incorrect assumptions you have made to break parts of your code. For example, SortedKey must have as input an Object which implements the Comparable interface.
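As an illustration of this kind of stricter check, and assuming the note refers to sorted collections such as java.util.TreeMap (that mapping is my assumption, the notes only say SortedKey), here is a sketch showing that on Java 7 and later even the very first put of a key that is not Comparable is rejected. The class names are made up for the example.

```java
import java.util.Map;
import java.util.TreeMap;

class TreeMapStrictnessSketch {

    // Deliberately does not implement Comparable.
    static final class Plain { }

    // Returns true when the map refuses a non-Comparable key on the first put.
    static boolean firstPutRejected() {
        Map<Object, String> sorted = new TreeMap<>(); // no Comparator supplied
        try {
            sorted.put(new Plain(), "value");
            return false;
        } catch (ClassCastException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("first put rejected: " + firstPutRejected());
    }
}
```

Code that relied on a single-element sorted map holding an arbitrary key is the sort of incorrect assumption that breaks after the upgrade.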

OpenCL is used today in JDK7; the integration will continue in JDK8 and be finalized in JDK9, bringing full abstraction to the developer while the JVM takes advantage of GPU computing. Project Sumatra promises to bridge the gap between Java and GPUs. MMUs can allow sharing virtual address space (heterogeneous computing). There is important collaboration with Intel that might even lead to shipping Java in hardware in the near future ;-)

JDK8 will remove PermGen space with the data going to the heap and native memory.

The London Java User Group has been praised for their work on 'adopt a JSR' and their contributions to the JCP. The message is: Oracle is taking the interaction with the community very seriously, so they are asking us to contribute.

Java embedded had a Perrone Robotics presentation (demo failed as expected, Murphy's Law).

The best Java embedded moment IMO was when Liquid Robotics (presented by Gosling) showed how they can control thousands of little ships which move using wave energy in a mechanical fashion. Each one generates its own energy to communicate data from its sensors via GSM or satellite, depending on how close it is to GSM networks. Quite a piece of engineering.

"Use the process to change the process" is what Oracle is expecting from the JCP. We certainly are looking forward to it.

A considerable part of the JavaOne talks was dedicated to the promise of Lambdas (closures) in Java 8. Lambdas are nothing more than anonymous functions, but they are not just new syntax: look for the use cases they can cover inside the JDK code itself. Java has been behind most programming languages in this regard, BTW. You can learn more about Project Lambda on the lambda-dev mailing list. I heard more than once the statement: developers are looking into Scala for features that are not supported in Java. Oracle is listening to the community and we can expect Java to get richer. Lambdas abstract behavior just like Generics abstract type. The code is treated as data (behavior can be stored in variables). Lambdas are more about the what and less about the how. For example, with lambdas, instead of the client being in charge of managing the loop, the library is in charge of the internal iteration. To let existing JDK interfaces evolve to support lambdas, interfaces can now provide default methods. This rapidly brings a lot of questions about multiple inheritance, and here is the explanation from the JDK team: interfaces already provide a multiple inheritance mechanism for types; default methods extend multiple inheritance to behavior BUT not to state, which is the real problem with C++.
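A short sketch of those three ideas (behavior stored in a variable, internal iteration and a default method), written with the syntax Java 8 eventually shipped with, which may differ from what was shown at the conference. The class, interface and method names are made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

class LambdaSketch {

    // An interface that carries behavior (but no state) through a default method.
    interface Greeter {
        String name();

        default String greet() {
            return "Hello, " + name();
        }
    }

    // Internal iteration: the library (forEach) drives the loop, the caller
    // only supplies the behavior that decides what to keep.
    static List<String> keep(List<String> items, Predicate<String> test) {
        List<String> kept = new ArrayList<>();
        items.forEach(item -> {
            if (test.test(item)) {
                kept.add(item);
            }
        });
        return kept;
    }

    public static void main(String[] args) {
        // Behavior stored in a variable, like any other piece of data.
        Predicate<String> startsWithJ = s -> s.startsWith("J");
        System.out.println(keep(Arrays.asList("Java", "Scala", "JRuby"), startsWithJ));

        Greeter greeter = () -> "JavaOne"; // implements name(), inherits greet()
        System.out.println(greeter.greet());
    }
}
```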

JavaFX is the de facto standard to build native applications. AWT, while providing OS specific native components, lacks a lot of the bells and whistles that Swing came with; on the other hand, Swing's lack of support for OS specific native UI features is calling for its end of life.

JavaFX web view and jfx panel create a good opportunity to construct hybrid applications (Native UI with JavaFX + HTML5 + Javascript). A clone of the jVisualVM done with JavaFX was presented.

Use Solaris truss and Unix/Linux strace to debug database performance issues.

Nashorn (Naz-horn is the right pronunciation) brings JavaScript inside the JVM. A demo was presented using Mustache as the JavaScript templating engine. It scales well and runs on small devices like the Raspberry Pi. The engine is 20 times faster than Rhino. The Nashorn implementation relies heavily on invokedynamic. Of course shebang is supported, so Nashorn can be run from the command line as well. There is node.jar, which is a port of the Node.js API. This is interesting news: Node could benefit from the existing Java services, and Java itself from the power of Node.js.

JEE 7 includes in the platform key features "to avoid the use of proprietary frameworks" and I quote it. I will be posting soon my notes about SpringOne BTW ;-)

Cleaner API is a mission for JEE. JMS is so simplified that it looked to me like Apache Camel code.

I heard the word POJO a lot, and not just Beans ;-)

The web socket API looks really clean, same for batch with annotations and Java Temporary caching.

DI is heavily used across the whole JDK.

There was a demo on web sockets called Angry Bids. All built on top of JEE using a REST approach.

In a retrospective on remoting we reviewed DCE, COM, CORBA, RMI, RMI/IIOP, SOAP, REST and WebSockets (web sockets is just plain TCP).

Look into the JDK secure coding guidelines.

Software Archeology is unfortunately a common challenge, especially when you are a consultant or simply switching jobs. The amount of legacy and undocumented code makes your life difficult, and we discussed how to mitigate this reality. Finding behavior is reduced to documenting using activity and sequence diagrams. Finding structure is about deployment, component and class diagrams. I would add User Stories to the equation (if not favoring them as the highest priority). Some tools can help here, like trace based analysis using byte codes or aspects. Tools like Mission Control can help, but they do not provide the order of method calls. There are tools that produce an output like the one from strace, but from the JVM. We can generate system dumps and then analyze them afterwards. IBM hosts on their website some of these free tools (JVM trace for example).

Codename One presented how to build iPhone applications from Java. I share Martin Fowler's opinion on this issue.

Verisign presented JEE security in practice. You can find information they maintain in here. They discussed the use of @ServletSecurity and HttpServletRequest#[authenticate(), logout(), getRemoteUser(), getUserPrincipal(), isUserInRole()]. It is not always possible, but white listing is always preferred. Use declarative security first, then use programmatic security as needed. It was clear to me how far ahead the Spring Framework is in terms of security in comparison with JEE in the Web Tier. IMO avoiding vendor lock-in is precisely where Spring excels, so to claim that Spring Security cannot be compared with official JEE just because the latter is a standard is not a wise statement as far as I can tell. For one, all vendors end up adding security features to their application servers, so you will be locked in anyway.

In Java 8/9 we can expect a Modular Java Platform (Project Jigsaw). It allows packaging ME and SE together, and it will try to resolve the jar hell problem and scalability (down to small devices like the Raspberry Pi and up to the cloud, like the Oracle Exalogic Elastic Cloud T3-1B). Performance is expected to increase both in terms of download and startup time. A couple of comments on language keywords: the module keyword allows for organization of Java packages, and the public keyword loses its meaning, as a type is no longer public to the outside unless exported.

There was a presentation promoting agile JEE development using JBoss, IntelliJ and JSF, including the use of Arquillian for testing, which basically had to restart a servlet container every time a JUnit test was triggered. IMO it was really not that agile.

JDK Enhancement Proposals (JEP) promise to bring more community participation to Java as an open standard. The OpenJDK project is after all the incubator for new features of the Oracle JDK (HotSpot). As an example of one of the current JEPs, boxing will be removed at some point. Just search for JEP to get an idea of the new features and enhancement proposals.

JEP 159: Enhanced Class Redefinition is in the implementation phase. This will allow HotSpot to support redefinition of classes, method signatures and more.

A clarification for the meaning of @deprecated within JDK code: For the JDK team it does not mean it will disappear from the source code. The reason is backward compatibility.

Check that your application is correctly using all cores. Modern computers use NUMA, so use the optimization flag -XX:+UseNUMA to allow a more optimal usage of memory. A bunch of other flags for you to look at: -XX:StringTableSize (interned Strings) and -XX:+UnlockExperimentalVMOptions, which allows using -XX:+UseG1GC even in JDK 6u21+, among others.
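A quick way to see, from inside the application, how many cores the JVM sees and which of these flags it was actually started with. This is a minimal sketch using standard JDK APIs; the class and method names are made up for the example.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

class CoresAndFlagsSketch {

    // Number of processors available to this JVM.
    static int cores() {
        return Runtime.getRuntime().availableProcessors();
    }

    // True when the exact flag is present in the given JVM arguments.
    static boolean hasFlag(List<String> jvmArgs, String flag) {
        return jvmArgs.contains(flag);
    }

    public static void main(String[] args) {
        // Arguments passed to the JVM itself (not to main), e.g. -XX:+UseNUMA.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("cores: " + cores());
        System.out.println("UseNUMA set: " + hasFlag(jvmArgs, "-XX:+UseNUMA"));
        System.out.println("UseG1GC set: " + hasFlag(jvmArgs, "-XX:+UseG1GC"));
    }
}
```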

Tuesday, October 16, 2012

Install a Custom Talend Component

Talend custom components are a nice way to work around limitations like bugs and missing features. Here is how you install them (tested in version 4.2.3):
  1. Download the component (most likely a zipped file) from a provider (most likely from Talend Exchange)
  2. Uncompress the zip, making sure all files end up inside the root of the resulting directory
  3. Copy the directory to plugins/org.talend.designer.components.localprovider_$TALEND_VERSION/components/
  4. Restart Talend and access your component
Some components could fail to be recognized due to issues with the xml declaration schema, which you can find with a command similar to:
$ find $TALEND_HOME/ -name "Component.xsd"
You can validate the component xml against the schema using an online service like That is how I found for example the tFTPGetFile was missing the node. As a side note I also had to repoint to the module edtftpj-1.5.6.jar from the GUI for this component.

Note that you can avoid restarting Talend to get your components recognized and ready to be used. The Generation Engine initialization is responsible for recompiling the javajet templates. This is triggered when you first load Talend, but it can also be triggered by pressing shift+ctrl+f3 (add fn if using a Mac).

If you find out the component is unable to load the jar files it needs, or you see any other weird behavior, consider cleaning the cache by deleting the file \configuration\ComponentCache.javacache and restarting Talend afterwards.

Friday, October 12, 2012

Skipping lines on top of a file with Talend

This is straightforward in Talend. Just use the tFileInputFullRow "header" setting, which as per the help defines the "Number of rows to be skipped at the beginning of a file":
I hurried too much to provide a solution using the tJavaFlex component, which I no longer need thanks to the answer in the Talend forums.

CSV Splitter or Filter with Talend Java

The Data Team was in need of a nonexistent Talend behavior.

Something that we could call the CSVSplitter or the CSVFilter: a component that would take a CSV file and output a row unchanged only if a lookup column matches certain content.

Of course you might think at first of a combination of tFileInputDelimited and tFilterRow, but that would not work if you do not know the schema. We need a schemaless or dynamic schema component for this use case.

Jump directly to learn how to get this done from a component or read below to understand how you can do this from tJavaFlex and later build your own component with similar code.

Here is a project that shows this proof of concept. It parses an inputFile using a delimiter, and outputs only the lines where the lookupColumn has a specific lookupValue (four parameters). Below is a screenshot of the POC. I needed to use the tFilterRow because tJavaFlex will output blank lines when there is no output from code:

This approach has a big advantage. Instead of having to create a job, subjob or project per schema to parse, a single job can take care of all your CSV splitting or filtering needs.

You can test the project with the below file:
person| city
Paul| Miami
John| Boston
Mathew| San Francisco
Craig| Miami
Change the lookupColumn between person and city and change the lookupValue to see how it filters the rows. Change the delimiter to test that as well.

Below is the code for the import, begin, main and end sections of the tJavaFlex, with the addition of a new requirement: start parsing the file at a given row (starting at 0) where the header is expected to be.

Import:
import java.io.*;
import com.csvreader.CsvReader;

Begin:
// Copy the file into memory, skipping every row before the header row
BufferedReader reader = new BufferedReader(new FileReader(context.inputFile));
ByteArrayOutputStream out = new ByteArrayOutputStream();
int rowNumber = 0;
String line = null;
while ((line = reader.readLine()) != null) {
  if(rowNumber >= context.headerRowNumber) {
    out.write((line + "\n").getBytes());
  }
  rowNumber++;
}
reader.close();
InputStream is = new ByteArrayInputStream(out.toByteArray());
CsvReader csvReader = new CsvReader(new InputStreamReader(is));
char delimiter = context.delimiter.charAt(0);
csvReader.setDelimiter(delimiter);
char textQualifier = csvReader.getTextQualifier();

// Rebuild the header line so it can be emitted together with the first matching row
csvReader.readHeaders();
String[] headers = csvReader.getHeaders();
StringBuffer sb = new StringBuffer();
for(int i = 0; i < headers.length; i++ ) {
  String header = headers[i];
  sb.append(textQualifier + header + textQualifier);
  if( i != headers.length - 1 ) {
    sb.append(delimiter);
  }
}
int i = 0; // number of matching rows emitted so far
while (csvReader.readRecord()) {

Main:
String lookupValue = csvReader.get(context.lookupColumn);
//System.out.println("'" + context.lookupColumn + "'|'" + context.lookupValue + "'|'" + lookupValue + "'");
if(lookupValue.equals(context.lookupValue)) {
  if( i == 0 ) {
    // The first matching row carries the header line with it
    row2.line = sb.toString() + "\n" + csvReader.getRawRecord();
  } else {
    row2.line = csvReader.getRawRecord();
  }
  i++;
}

End:
}
csvReader.close();

Putting it all in a Talend Component

I have built a Talend component that encapsulates the logic presented here. It is included in a GitHub project which contains a tutorial on how to build Talend custom components.

To use it you just need to configure the component to parse a file like the above. Look at the picture below for a usage example:

Tuesday, October 09, 2012

iReport Attribute 'uuid' is not allowed to appear in element

Some renegades still refuse to use Linux as development environment when working with Java, JasperReports, Talend etc. OSX so far looks OK but Windows one way or the other is always bringing issues.

Today I had to spend some time with the iReport Designer tool on Windows. We wanted to upgrade from version 4.1.3 to 4.7.1, so we opened the old reports in 4.7.1, compiled them and ran them. Everything seemed perfect until a report was modified, at which point version 4.7.1 would behave like version 4.1.3: it did not understand the new XML schema:
Error loading the report template: org.xml.sax.SAXParseException: cvc-complex-type.3.2.2: Attribute 'uuid' is not allowed to appear in element 'jasperReport'
I could not find a way to make 4.7.1 import the settings from 4.1.3 without breaking its ability to parse the JRXML, which in newer versions contains the uuid attribute in multiple nodes.

So my only option was to tell the renegades to:
  1. Close iReport
  2. Delete the 4.7.1 settings directory. If you installed iReport on the C drive, here is the command to use. Otherwise locate the directory and delete it:
    rmdir /s "c:%HOMEPATH%\.ireport\4.7.1"
  3. Start iReport and cancel the import of settings from 4.1.3
Bottom line is it looks like in Windows an iReport upgrade will result in losing your previous settings.

Delete old files except for certain directories and files with one liner bash

DISCLAIMER: Do understand what you are doing before proceeding. I am not responsible for your own actions. I just make public useful code which might become harmful in the wrong hands.

Here is a one liner "find" command that allows you to iterate through all files inside a given directory providing exceptions for certain files and directories. With the result you can run any command.

Let me read the below example for you: Find starting at /home/ directory all files (-type f) older than 30 days (-mtime +30), print full path file names followed by a NUL character so white spaces are correctly interpreted (-print0), ignoring (-prune) directory "nestor" or any hidden files (.*). Then list (ls) the items terminated by a null character and no-run-if-empty (xargs -0 -r). The "-o" switch says "Do not evaluate the next expression if the previous is true", reason why the exceptions go first.
$ find /home/ -type d -name "nestor" -prune -o -name ".*" -prune -o -type f -mtime +30 -print0  | xargs -0 -r ls -al
Clearly you can replace "ls -al" with "rm" once you are confident all those files can go away.

On a related issue it is common to forget to use "-mindepth 1" option which basically tells find "do not list in your 'findings' the start directory". Imagine you want to delete anything below /opt/tmp/realtime_temp. If you do not use the "-mindepth 1" option the directory itself will be deleted as well. So do use the flag if you are not trying to delete the start directory. I have seen some Linux installations delete the dir while others won't ... go figure.
find /opt/tmp/realtime_temp/ -mindepth 1 -mtime +5 -exec rm -Rf {} \; > /dev/null

Another related issue is that 'find' needs the '-depth' flag in order to correctly remove all files and directories matching certain rules when using -exec or piping to xargs (as per the man pages, -delete already implies -depth). Not using '-depth' results in errors like 'No such file or directory', because 'find' tries to run the remove command on files contained in directories that it has already removed.
find /opt/tmp/realtime_temp/ -depth -mindepth 1 -mtime +5 -exec rm -Rf {} \; > /dev/null
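
Before swapping "ls" for "rm" it helps to have a dry run you can repeat. Below is a minimal sketch that wraps the first one-liner in a function; the name list_old_files and its three arguments are hypothetical, and it assumes GNU find as found on Linux.

```bash
#!/usr/bin/env bash
# list_old_files START_DIR DAYS SKIP_DIR
# Hypothetical dry-run helper: prints, one per line, the regular files
# older than DAYS days under START_DIR, pruning directories named
# SKIP_DIR and anything whose name starts with a dot.
list_old_files() {
  local start_dir="$1" days="$2" skip_dir="$3"
  find "$start_dir" -type d -name "$skip_dir" -prune \
    -o -name ".*" -prune \
    -o -type f -mtime +"$days" -print
}
```

For example, list_old_files /home/ 30 nestor shows what the delete would touch. Only once the list looks right would I switch back to -print0 and pipe it to xargs -0 -r rm.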

Friday, September 28, 2012

Show MySQL permissions AKA GRANTS

Listing current permissions including passwords in MySQL can be easily achieved using POB script, for example:
$ history -d $((HISTCMD-2)) && ./ "-u root -p$MYSQL_PASSWORD"
Note the password doesn't stay in the history, which is a wise thing to do.

Thursday, September 27, 2012

Installing netcat in Solaris 10

Netcat or nc is the Swiss Army Knife of the Network-Security-Sysadmin Engineer. Here is a POB recipe I put together to ensure we push the package into our Solaris boxes.

In reality it is better to build a wrapper that will install any package from sunfreeware. That is left to the reader. Any takers?

Wednesday, September 26, 2012

Ubuntu Couldn't find your SSL library files for Monit

DevOps is, I would argue, a term many people misuse nowadays (more on this in upcoming posts, but be aware of the hype). In reality the Ops in the equation (meaning the sysadmin) should strive for identical servers. Management through recipes is the way to go; however, many feel that a sysadmin should not code, and that is wrong. The sysadmin probably needs nothing to do with OOP, but scripting and automation should be part of their daily work.

Today we got a weird error in one of the Ubuntu Servers. Monit would not install from the POB Recipe we have been using successfully, but instead fail with the below error:
$ ./configure --prefix=/usr/sbin --bindir=/usr/sbin --sysconfdir=/etc/monit/ 
checking for static SSL support... disabled
checking for SSL support... enabled
checking for SSL include directory... /usr/include
checking for SSL library directory... Not found

Couldn't find your SSL library files.
Use --with-ssl-lib-dir option to fix this problem or disable the
SSL support with --without-ssl
The only explanation for these inconsistencies is actually not managing the servers through recipes. With time the manual actions done in certain servers are different than in others.

Here is how we solved it in case someone else is having a similar error:
$ ./configure --prefix=/usr/sbin --bindir=/usr/sbin --sysconfdir=/etc/monit/ --with-ssl-lib-dir=/usr/lib/x86_64-linux-gnu
checking for static SSL support... disabled
checking for SSL support... enabled
checking for SSL include directory... /usr/include
checking for SSL library directory... /usr/lib/x86_64-linux-gnu
So try to keep your actions as sysadmin versioned in scripts which at the same time access versioned configurations. Then deploy always from scripts/recipes to have consistent environments.
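
If you would rather have the recipe work out the library directory than hardcode it, something along these lines could help. It is only a sketch: the function name find_ssl_lib_dir and the list of candidate directories are my assumptions, not part of the recipe we use.

```bash
#!/usr/bin/env bash
# find_ssl_lib_dir DIR [DIR...]
# Hypothetical helper: prints the first candidate directory that
# contains a libssl shared library, or returns 1 if none does.
find_ssl_lib_dir() {
  local dir
  for dir in "$@"; do
    # An unmatched glob is passed literally to ls, which then fails
    if ls "$dir"/libssl.so* >/dev/null 2>&1; then
      printf '%s\n' "$dir"
      return 0
    fi
  done
  return 1
}
```

The result can then feed the configure call, for example --with-ssl-lib-dir="$(find_ssl_lib_dir /usr/lib/x86_64-linux-gnu /usr/lib64 /usr/lib)".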

Mount CIFS or NFS from a POB Recipe

I have packaged these frequent actions in a couple of bash scripts: and

As a reminder you can run these scripts directly as root or you can use Remoto-IT to run them remotely on your servers.

Tuesday, September 25, 2012

Jasper Reports from XML Datasource : Rendering the Enterprise Hierarchical Data

Data is hierarchical by nature. Whether you model it with a relational database or a noSql database (it looks like nowadays anything non sql92+ compliant is simply noSQL and you do not need to talk about the big differences about graph, OO, XML and more) the fact is you need to report out of that data.

Your hierarchical data (did I say that any report will ultimately render hierarchical data?) can be rendered with the help of iReport at Design time and JasperReports at runtime. A combination of XML+XPATH and sub-reports will do the job for us.

Here is a showcase to illustrate how to build a report which parses a potentially big XML containing several companies to extract employees and their contacts. The source code can be downloaded from

The key here is to understand that you need a datasource per report and so to show hierarchical data you will need sub-reports. We will use bands which will result in rendering all nodes matching the xpath provided. Probably using tables for rendering tabular data is a better idea but for simple reports this should be enough.

  1. Create an XML datasource pointing to the file (In our case companies.xml)
  2. Check the "Use the report XPath expression when filling the report" option. Use the file name as the datasource name, for example "companies.xml"
  3. Go to File|New|Report|Blank|Open this template and name the report something like "employee"|Next Finish
  4. From the left pane remove all bands (right click|delete band) but the "title", "column header" and "detail" bands. Choose as title "Employees"
  5. Click on the Report Query button (database with arrow icon) | Xpath as query language. On the right pane of the Report Query Window drill down until you find the node containing the fields to present, in this case "employee". Right click and select "Set record node (generate xpath)". The text "/companies/company/employees/employee" appears as a result and you can see that 2 nodes are selected. Drag name and phone to the fields pane and click OK.
  6. On the left pane (Report Inspector) the fields are now accessible. Drag and drop them in the details band. Two labels appear in the Column header band and two fields appear in the details band. Adjust the height of the two bands so they do not take more space than needed by a typical row
  7. Click on Preview and the two employees will show up. Now let us jump into the subreport to show the contacts below each employee
  8. Create a new report as explained before but name it "employee_contacts". Use as title "Contacts". Use as datasource root the "contact" node (note the xpath is now /companies/company/employees/employee/contacts/contact) and drag and drop the name and phone for that node.
  9. Hit Preview and note we have a problem, we are not filtering by a specific employee name. Let's go back to the Report Query and change the xpath to "/companies/company/employees/employee[@name='$P{employee_name}']/contacts/contact"
  10. In Report Inspector pane create a Parameter called "employee_name". Hit Preview and you will be prompted for an employee name. Pick John or Paul to get content back.
  11. Go back to the "employee" report and expand the details band so there is space for the "employee_contacts" subreport.
  12. Drag and drop the Subreport component from the Palette pane into the details band. Select "Use existing subreport" and point to "employee_contacts.jasper", click next to accept "Use the same connection used to fill the master report". Click next. For the parameter expression pick from the dropdown "name field" ($F{name}), which is the employee name. Use the option for absolute path and click Finish.
  13. Clicking preview at this point at least in version 4.1.2 will end in the below error, reason why we migrated to 4.1.7:
    Error filling print... null
    java.lang.NullPointerException
        at net.sf.jasperreports.engine.fill.JRPrintBand.addOffsetElements(
        at net.sf.jasperreports.engine.fill.JRFillElementContainer.addSubElements(
        at net.sf.jasperreports.engine.fill.JRFillElementContainer.fillElements(
        at net.sf.jasperreports.engine.fill.JRFillBand.fill(
        at net.sf.jasperreports.engine.fill.JRFillBand.fill(
        at net.sf.jasperreports.engine.fill.JRVerticalFiller.fillColumnBand(
        at net.sf.jasperreports.engine.fill.JRVerticalFiller.fillDetail(
        at net.sf.jasperreports.engine.fill.JRVerticalFiller.fillReportStart(
        at net.sf.jasperreports.engine.fill.JRVerticalFiller.fillReport(
        at net.sf.jasperreports.engine.fill.JRBaseFiller.fill(
        at net.sf.jasperreports.engine.fill.JRFiller.fillReport(
        at net.sf.jasperreports.engine.JasperFillManager.fillReport(
        at net.sf.jasperreports.engine.JasperFillManager.fillReport(
        at
        at org.openide.util.RequestProcessor$
        at org.openide.util.RequestProcessor$
    Print not filled. Try to use an EmptyDataSource...
  14. Open the XML and add the following to the subreport node, below the declaration of the report element node:
    <subreportParameter name="XML_DATA_DOCUMENT">
  15. Remove the absolute path and use just relative path. You can also remove the below node:
  16. Go back to the Designer Editor and make sure it shows up as parameter. I have found sometimes it does not in which case closing and opening the properties editor or saving xml and switching to Designer or just closing and opening the report will restore it. For this subreport we have two parameters as the jrxml shows.
  17. Here are a couple of screenshots of how it looks locally for me:
  18. To invoke the report from Java using the JasperReports library you can do something like:
    Map<String, Object> reportParameters = new HashMap<String, Object>(); // assumed declaration, not shown in the original snippet
    Document document = JRXmlUtils.parse(xmlFile);
    reportParameters.put(JRXPathQueryExecuterFactory.PARAMETER_XML_DATA_DOCUMENT, document);
    jasperPrint = JasperFillManager.fillReport(jasperFilePath, reportParameters);
The version on SVN shows a header just to demonstrate how it correctly shows the page numbers and total pages in the report header even when the subreport is the one printing full pages. The trick is that you use the same variable for the current page number and for the total pages, but for the latter the field has the attribute evaluationTime="Report". That setting makes the Jasper engine evaluate the field only once the whole report has been pre-rendered, at which point Jasper knows how many pages the final report will have.

Friday, September 21, 2012

Solaris Monit installation from POB recipe - Unattended Security Installation

Use POB Recipe together with Remoto-IT or directly as root in your server to install a specific monit version in an unattended and secure way. The script should be idempotent BTW.

Here is how to call it locally to install version 5.5:
./  5.5

Thursday, September 20, 2012

SSH Sessions in multiple tabs from one command

Seriously, aren't you tired of typing or clicking hundreds of times to get to hundreds of remote Linux machines? Well I got tired today, and I have to confess I interact with fewer than a dozen of them and only when there are serious issues. I cannot imagine what life is like for those issuing commands to Linux servers the whole day without a way to open connections to a whole farm of servers from just one command.

I created which only works for the gnome-terminal. You can do similar stuff with iTerm or plain Terminal plus Applescript in OSX even though that is out of the scope of this post as I am trying to push the team to work with Desktops that are closer to the servers where the applications are hosted.

Suppose you have server1 accessible, server2 and server3 accessible from server1, and server4 only accessible from server3 and server2. It is not hard to find this kind of situation, especially in environments where security must be put in place with scarce resources. What do you do?
  1. Download the script.
  2. Create wrappers for your SSH connections especially if they are multi-hop
  3. Include the commands in a file like billing-environment.txt
    #A simple direct ssh
    ssh -t user@server1
    #Wrapped (for simplicity) ssh commands
    #A complex direct ssh equivalent to /home/nestor/
    #ssh -t user@server1 \"ssh -t server2 \"ssh -t server4\"\"
    Here is for example /home/nestor/
    ssh -t user@server1 "ssh -t server2 \"ssh -t server4\""
  4. Run just one command and get the 4 tabs with an ssh session to a different server each:
    $ ./ billing-environment.txt
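
To give an idea of what such a script has to do, here is a minimal sketch. It is not the script linked above: build_tab_args and open_tabs are names I made up, and it assumes a gnome-terminal that still accepts the "--tab -e" options, which was the case at the time of writing.

```bash
#!/usr/bin/env bash
# build_tab_args FILE
# Hypothetical helper: fills the TAB_ARGS array with one
# "--tab -e <command>" group per command line found in FILE,
# skipping blank lines and lines starting with '#'.
build_tab_args() {
  local file="$1" line
  TAB_ARGS=()
  while IFS= read -r line || [ -n "$line" ]; do
    # Strip leading whitespace so indented entries work too
    line="${line#"${line%%[![:space:]]*}"}"
    case "$line" in
      ''|'#'*) continue ;;
    esac
    TAB_ARGS+=(--tab -e "$line")
  done < "$file"
}

# open_tabs FILE: opens one terminal tab per command in FILE
open_tabs() {
  build_tab_args "$1" && gnome-terminal "${TAB_ARGS[@]}"
}
```

Calling open_tabs billing-environment.txt would then open the tabs. Keeping the parsing in its own function makes it easy to inspect TAB_ARGS before anything is launched.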

Using monit instead of cron for scheduling high frequency tasks

Monit can actually be used for more than just monitoring. Let us consider the following scenario: you need to schedule a task to run every minute (for example shipping logs from one server to another via rsync) and you want to receive a notification if it fails, but if there is a networking issue you do not want an alert every minute during an hour-long disruption. Instead it would be ideal to receive just one alert when it fails and another one when the service is back to normal. Crontab would do the work only partially, as you will either need to live with receiving an execution error every minute or code some kind of counter logic yourself to make sure cron won't bombard you with hundreds or thousands of messages.

Here is what I ended up doing. I just removed from cron the commands I needed to run. If there is an error running a command monit will alert and it will not do it again until the script runs without errors to inform everything is back to normal:
#!/bin/bash -e
# name: /sbin/
# date: 20120919
# author: Nestor Urquiza
su sampleadmin -c 'rsync -avz -e "ssh -i /home/sampleadmin/.sample-logs_rsa" /opt/tomcat/logs/sample-app.log > /dev/null'
su sampleadmin -c 'rsync -avz -e "ssh -i /home/sampleadmin/.sample-logs_rsa" /opt/talend/log/talend.log > /dev/null'
Of course you do not need monit if all you do is run a program once a day (although you can use it there as well, as monit supports cron syntax), but if the program runs multiple times a day, or especially multiple times an hour, you will definitely save a lot of email cleaning/inspecting time.
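
For comparison, this is roughly the kind of state-tracking logic you would have to write yourself to get the same "alert once, then once again on recovery" behavior out of cron. It is a sketch under my own assumptions: the function name run_and_notify and the state file are hypothetical, and the echo stands in for whatever mail command you use.

```bash
#!/usr/bin/env bash
# run_and_notify STATE_FILE COMMAND [ARGS...]
# Hypothetical wrapper: runs COMMAND and prints a message only when
# its outcome differs from the previous run recorded in STATE_FILE.
run_and_notify() {
  local state_file="$1"; shift
  local previous="ok" current="ok"
  # With no state file yet, assume the previous run was fine
  [ -f "$state_file" ] && previous="$(cat "$state_file")"
  "$@" >/dev/null 2>&1 || current="failed"
  if [ "$current" != "$previous" ]; then
    echo "state changed: $previous -> $current"   # replace with your mail command
  fi
  printf '%s\n' "$current" > "$state_file"
}
```

A crontab entry would call it every minute with the rsync command as arguments. Monit gives you this bookkeeping for free, which is the whole point of the post.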

Multi-hop SSH

If you are constantly ssh-ing into boxes you had better automate your daily work a little, especially if you have to go through multiple hops to get to some of them.

You can generate and authorize keys from one server to the other and then later on issue a command like:
ssh -t user@firstHost "ssh -t secondHost \"ssh -t thirdHost\""
You can even use Remoto-IT to deploy the above POB Recipes remotely in multiple servers.

You can BTW open several of these complex connections from just one command in multiple tabs in your terminal application. Here is how to do it in gnome.
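
Escaping the nested quotes by hand gets painful beyond three hops. As a sketch, the command can be generated instead; build_multihop is a name I made up, and it relies on bash's printf '%q' to quote each inner command, so the result looks different from the hand-written version above but should run the same hops, assuming the login shells on the intermediate hosts are POSIX-like.

```bash
#!/usr/bin/env bash
# build_multihop HOST [HOST...]
# Hypothetical helper: prints a nested "ssh -t" command that hops
# through the given hosts in order, quoting each inner level.
build_multihop() {
  local cmd="" i host
  # Build from the last hop outwards so every level wraps the next one
  for (( i = $#; i >= 1; i-- )); do
    host="${!i}"
    if [ -z "$cmd" ]; then
      cmd="ssh -t $host"
    else
      cmd="ssh -t $host $(printf '%q' "$cmd")"
    fi
  done
  printf '%s\n' "$cmd"
}
```

Print it first with build_multihop user@firstHost secondHost thirdHost and, once it looks right, run it with eval "$(build_multihop user@firstHost secondHost thirdHost)".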

Wednesday, September 19, 2012

Monit reinstallation in Ubuntu from a POB Recipe

It sucks to do updates in multiple servers but if you build a POB recipe (idempotent enough) you can install anything in seconds.

Use POB Recipe together with Remoto-IT or directly as root in your server to make sure you install a non-fake specific monit version. The script should be idempotent BTW.

Here is how to call it locally to install version 5.5:
$ sudo ./  5.5 8276b060b3f0e6453c9748d421dec044ddae09d3e4c4666e13472aab294d7c53
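
The second argument looks like a SHA-256 digest (64 hex characters). Assuming that is what it is, the "non-fake" check boils down to something like the sketch below; verify_sha256 is a hypothetical name and the recipe itself may do it differently.

```bash
#!/usr/bin/env bash
# verify_sha256 FILE EXPECTED_DIGEST
# Hypothetical helper: succeeds only if FILE hashes to EXPECTED_DIGEST.
verify_sha256() {
  local file="$1" expected="$2"
  # sha256sum -c expects lines of the form "<digest>  <file>"
  printf '%s  %s\n' "$expected" "$file" | sha256sum -c --status -
}
```

In a recipe you would call it right after the download, for example verify_sha256 monit-5.5.tar.gz "$EXPECTED" || exit 1 (the tarball name here is just an illustration), so nothing gets built from a tampered file.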

Saturday, September 15, 2012

Ubuntu from USB drive and wireless support in Dell Inspiron 1545

Nowadays I install OS from USB whenever I can. Most of the time the installation would be straightforward but Dell and its Broadcom wireless cards are still a problem just because of licencing issues.

Today I helped a family member with his Inspiron 1545 and here is the story of that journey:
  1. Insert USB with Ubuntu
  2. Start the "Additional Drivers" application (Search for "Driver" from "Dash Home" application, the first icon with the Ubuntu Logo).
  3. The "Broadcom STA wireless driver" was found in my case. Select "Activate". I got:
    Sorry. installation of this driver failed. Please have a look at the log files for details: /var/log/jockey.log
    A quick look at the jockey.log revealed:
    2012-09-15 19:30:56,301 ERROR: Package fetching failed: Failed to fetch cdrom:[Ubuntu 12.04.1 LTS _Precise Pangolin_ - Release amd64 (20120823.1)]/pool/main/d/dkms/dkms_2.2.0.3-1ubuntu3_all.deb Unable to stat the mount point /media/cdrom/ - stat (2: No such file or directory)
  4. Of course the problem here is that a cdrom is expected but we are using a usb device. Take a look at /media directory and you should see the name of the USB device (In my case UBUNTU) so a simple symlink will help:
    sudo ln -s /media/UBUNTU /media/cdrom
  5. You should now be able to activate the driver as explained above
  6. Once the driver is installed delete the symlink we created:
    sudo rm /media/cdrom
  7. You might want to restart and use the fn+f2 combination to get to the BIOS, select wireless and disable the wireless hardware switch to avoid wireless disconnections.