Tuesday, April 30, 2013

Pre-populate Talend custom component schema

I ran into an issue today where the tFileInputCSVFilter custom component was outputting empty lines instead of real records. The issue was related to using READONLY="true" in PARAMETER FIELD="SCHEMA_TYPE". Setting it to false, or just removing it so that it falls back to the default value, did the trick. Leaving it set to true makes the schema non-editable. Going the extra mile, I set the TABLE node inside the filter and reject schemas to make sure the "line" column (which represents a whole line from the input) was available automatically. For example, for the filter (I did the same for main and reject, even though main is configured as having no outputs):
...
     <PARAMETER NAME="SCHEMA_FILTER" FIELD="SCHEMA_TYPE" NUM_ROW="1" CONTEXT="FILTER">
       <TABLE READONLY="false">
         <COLUMN NAME="line" TYPE="id_String" READONLY="false"/>
       </TABLE>
     </PARAMETER>
...
Do not forget to set the component schema to auto propagate using COMPONENT SCHEMA_AUTO_PROPAGATE="true".

Bottom line: I have found that the schema must be editable in Talend custom components in order for it to be pre-populated or even available to the following components. Failure to do so results in empty lines, as the schema will be missing and there will be no place to flush the output.

Thursday, April 25, 2013

Finding big size files and directories in Unix and Linux

What is eating my HDD? That is a common question I hear. The response to it is pretty simple. Just search for big files or directories containing a lot of information.

Here are two ways to find big immediate subdirectories, the first starting from / (skipping proc) and the second from /home/user:
sudo du -h --max-depth 1 / --exclude=proc
ls -d -1 /home/user/*/ | xargs du -hs
Here is how to find files bigger than 20 MB starting at /home/user:
find /home/user -type f -size +20000k -exec ls -lh {} \;
Here is how to recursively find directory sizes starting at /home/user (the biggest will appear last):
find /home/user -type d -print0 | xargs -0 du -s | sort -n
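The commands above can be wrapped in a small helper. This is only a sketch: the function name biggest_subdirs and its two arguments are my own invention. It prints sizes in kilobytes rather than human-readable units because sort -n cannot order values like 1.2G and 300M correctly.

```shell
# List the N largest immediate subdirectories of a directory, biggest first.
# $1: directory to inspect (defaults to the current one)
# $2: how many entries to print (defaults to 10)
biggest_subdirs() {
  # du -sk prints "<size in KB><TAB><path>" for each subdirectory,
  # so a plain numeric sort orders the lines by size.
  du -sk "${1:-.}"/*/ 2>/dev/null | sort -rn | head -n "${2:-10}"
}
```

For example, biggest_subdirs /home/user 5 would show the five heaviest directories directly under /home/user.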

Monday, April 22, 2013

Ubuntu apt-get update Err http://us.archive.ubuntu.com/ubuntu/ 404 Not Found [IP: w.x.y.z]

Ubuntu, like any other OS, should be upgraded when the time comes. Keeping an unsupported version of the OS is a no-no. Yet, for whatever reason, on the technology side we often have to deal with servers that have not been patched or upgraded. When you say that the solution is an upgrade and that it will take longer, the statement, believe me, will not be appreciated.

In those cases you had better correct the issue, even if that means providing a non-secure solution, and then state what should be done ASAP to keep possible exploits from compromising your outdated servers.

Commonly the first problem arises when you try to install a package in an old distro:
$ sudo apt-get  install nfs-common
Reading package lists... Done
Building dependency tree       
Reading state information... Done
Package nfs-common is not available, but is referred to by another package.
This may mean that the package is missing, has been obsoleted, or
is only available from another source

E: Package 'nfs-common' has no installation candidate
To illustrate a possible "fix" with an example let us say you get the below error when trying to update an old maverick distro:
Err http://us.archive.ubuntu.com/ubuntu/ maverick-updates/main portmap amd64 6.0.0-2ubuntu1.1 404 Not Found [IP: 91.189.91.13 80]
Maverick is no longer a supported release, so its packages won't be found on the original servers. You need to look for them on the old-releases server, so most likely the below will work:
sudo sed -i 's/us.archive.ubuntu.com/old-releases.ubuntu.com/g' /etc/apt/sources.list
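A slightly more careful variant of that replacement is sketched below. The function name use_old_releases is hypothetical, and rewriting security.ubuntu.com as well is my assumption for releases that have reached end of life. It takes the file as an argument so you can try it on a copy first, and it keeps a .bak backup next to the file.

```shell
# Point an apt sources file at the old-releases server.
# $1: path to the sources file (for example a copy of /etc/apt/sources.list)
use_old_releases() {
  # -i.bak edits in place and leaves the original as "<file>.bak".
  # The dots are escaped so they only match literal dots in the host names.
  sed -i.bak \
    -e 's/us\.archive\.ubuntu\.com/old-releases.ubuntu.com/g' \
    -e 's/security\.ubuntu\.com/old-releases.ubuntu.com/g' \
    "$1"
}
```

On a real server you would run it with root privileges against /etc/apt/sources.list and follow it with sudo apt-get update.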
Most likely you will then get some key-related issues like:
W: GPG error: http://old-releases.ubuntu.com maverick-security Release: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY 40976EAF437D05B5
Which you can solve with:
$ sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys 40976EAF437D05B5
Now you should be able to install that package which was missing in your old distribution. However let me state this once again, you should never rely on this procedure. You should always upgrade your distribution. To safely upgrade the distribution I believe automated server management in terms of recipes is the way to go.

IllegalArgumentException - Executable name has embedded quote, split the arguments

Your Java program can invoke external commands through Runtime.exec, through java.lang.ProcessBuilder, or by calling a web server wrapper that avoids huge memory leaking (the JVM fork() problem), as I have posted before.

If you are using Runtime.exec without arrays for parameters then most likely you will face an "IllegalArgumentException" starting with version 1.7.0_21 on Windows platforms. The message states "Executable name has embedded quote, split the arguments". The solution, as explained by Oracle, is to use an array if you stick to Runtime.exec, or to migrate to java.lang.ProcessBuilder. Please refer to http://www.oracle.com/technetwork/java/javase/7u21-relnotes-1932873.html#jruntime for details.

I do not know whether Java 7 has already fixed for good the momentary overcommit of swap space that results from using fork(). Using the wrapper mentioned above has done the trick for us so far, although a wrapper is certainly not always an option.

Friday, April 12, 2013

Keeping the file execution permission in subversion

Subversion won't store the execution bit, which is used in Unix, OSX and Linux systems, unless we use a property:
svn propset svn:executable yes test.sh
I always say that it is useful to use file extensions. Even though you can set the executable permission on any file, the time will come when you need to know which files should carry this property. Extensions allow you, for example, to set the flag for all shell scripts starting from the current directory (the -print0 / -0 pair keeps file names with spaces intact):
find . -name "*.sh" -print0 | xargs -0 svn propset svn:executable yes
If you are creating a new file, set the execution flag in the operating system first. Then, and only then, run the 'svn add' command. Subversion will automatically add the svn:executable property to the file. Here is the POC:
$ rm -f test.sh
$ touch test.sh
$ chmod +x test.sh 
$ svn add test.sh 
A         test.sh
$ svn proplist test.sh
Properties on 'test.sh':
  svn:executable
Now you should be able to explain why sometimes your executables might come back from SVN without the +x flag/bit. If you are using an IDE you will need to look at what it is doing behind the scenes with a file. For example, if you copy an existing file with the svn:executable flag from Eclipse into a new file, once Eclipse commits the new file it will end up with the right flag on SVN.

If on the contrary you create a brand new file in Eclipse, commit that file and later on change the file permissions, then Eclipse won't set the svn:executable flag. This is the behavior of at least the Subversive plugin (the one I currently use). I would say this functionality (the SVN client automatically setting the svn:executable property according to the file system execution bit in Unix/OSX/Linux) is a good candidate for a feature request. Which IDEs would do this for you? Good question for a community forum.

Probably it is worth mentioning that when you checkout or export an existing file containing the svn executable flag, the subversion client will set the OS executable bit for you.
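Until a client does this for you, the file system bit can drive the property by hand. The sketch below is an assumption-laden illustration, not a Subversion feature: the function names are hypothetical, only *.sh files are considered, and the second function expects every listed file to be already under version control.

```shell
# Print the shell scripts under a directory whose owner-execute bit is set.
# $1: directory to scan (defaults to the current one)
executable_scripts() {
  # -perm -100 matches files that have at least the owner execute bit.
  find "${1:-.}" -type f -name '*.sh' -perm -100 -print
}

# Set svn:executable on each of those scripts.
# Assumes the files are versioned; file names containing newlines are not handled.
sync_svn_executable() {
  executable_scripts "$1" | while IFS= read -r script; do
    svn propset svn:executable yes "$script"
  done
}
```

Running sync_svn_executable . from a working copy root and then committing would bring the repository in line with the local permissions.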

Wednesday, April 10, 2013

IT Agile and Lean Hiring

Simply put: give the applicant an NDA and hire her/him as a consultant. Assign her/him a task that creates value for your company. Do not go with a pet or PoC project, which is just waste you should eliminate in your lean process.

You need VDI for this. You want an isolated Desktop with access to few resources, no clipboard availability and proxy controlled access to the web. You want to make sure the person you are hiring will not go evil.

Pay the Applicant (now the Consultant) for her/his work and if you like the outcome (which goes beyond human relations, commitment, availability, hard work, knowledge ...) then approve her/him. When the new member joins the team there is already confidence among all parties involved, and there are no surprises.

IT personnel are knowledge workers. You should have a layered infrastructure, architecture and SDLC in place respecting Separation Of Concerns. If you do so, then you should be able to define tasks that demand little or no knowledge of the current services you provide or the nature of your business. That translates into new team members who are less stressed, who provide value from day one and who keep providing more and more as they learn.

While the technical manager discusses the intricacies of the first few tickets that will help with your project, there will be a full understanding of the capabilities of the applicant. The applicant will in turn understand whether this job is the right one, whether the technology stack makes sense to her/him and whether s(he) sees potential for growth. It is a win-win situation.

We should all learn from the Automobile Industry. It is easy to build a Software Shop. It is not that easy to build a Software Factory.

Let us practice an agile approach to hiring. Lean processes start at hiring and they go all the way up to Strategic Planning. Agility in SDLC is not enough to make a business succeed.

Saturday, April 06, 2013

Talend: Run batch sql statements from internal resource script missing. The alternative: tSQLScriptParser

When you need to run multiple sql statements you commonly use sql scripts that are supplied to the SQL Engine. The Engine is optimized to digest those in one or multiple "batches". There is a solution for MySQL, as the tMySQLRow component supports batching through the option "allowMultipleQueries". That is not the case for tSqliteRow or other components though.

From Talend we could invoke the Engine directly, but at the time of this writing Talend lacks support for defining internal resources.

A custom component exists though, called tSQLScriptParser, which can provide a workaround. Here is how:
  1. Download the component and install it.
  2. Define your sql inside "SQL Script" field, for example for a sqlite table called person:
    "drop table if exists person;
    create table person (name varchar(32));
    insert into person values('Foo');
    insert into person values('Bar');"
    
  3. Connect "iterate" output to a t{DB}Row component. In this example we use tSqliteRow.
  4. In the SQL field for the t{DB}Row use the below:
    ((String)globalMap.get("tSQLScriptParser_1_STATEMENT_SQL"))
    
  5. Run it and you can confirm the statements have run. For our example:
    $ echo "select * from person;" | sqlite3 /tmp/sqlite_test.db
    Foo
    Bar
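
The idea behind the steps above (one statement per iteration) can be tried outside Talend with a few lines of shell. This is a naive sketch and not how tSQLScriptParser is implemented: the function name split_sql_statements is hypothetical, and it simply splits on every semicolon, so a semicolon inside a string literal would break it.

```shell
# Read a SQL script on stdin and print one statement per line.
split_sql_statements() {
  awk 'BEGIN { RS = ";" }
  {
    gsub(/^[ \t\n]+|[ \t\n]+$/, "")  # trim whitespace around the statement
    gsub(/\n[ \t]*/, " ")            # fold multi-line statements into one line
    if (length($0) > 0) print $0 ";"
  }'
}

# Possible usage with the sqlite3 command line client (script.sql is a placeholder):
# split_sql_statements < script.sql | while IFS= read -r stmt; do
#   sqlite3 /tmp/sqlite_test.db "$stmt"
# done
```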
    

Thursday, April 04, 2013

MySQL ERROR 1045 (28000): Access denied for user

This issue happens every once in a while in certain mysql installations. You know the password and the user, you have created it yourself and yet mysql complains:
$ mysql -u myUser -p myDB
Enter password: 
ERROR 1045 (28000): Access denied for user 'myUser'@'localhost' (using password: YES)
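In my experience this error almost always comes from a blank (anonymous) user defined for localhost. When myUser is defined for a wildcard host such as '%', the anonymous localhost entry is more specific, so MySQL matches it first and the password check fails. The below commands should solve it: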
mysql> delete from mysql.user where Host='localhost' and user='';
mysql> flush privileges;

Ubuntu 12.04 WARNING: The following packages cannot be authenticated!

We were getting the below message today on one of our servers:
WARNING: The following packages cannot be authenticated!
Sometimes just running a couple of commands does the trick:
sudo apt-get update
sudo apt-get upgrade
Sometimes you need to delete the repositories list, reimport the keys, clean, purge, update, upgrade, etc. Just Google this error to convince yourself of how many things could be causing this issue (or similar ones).

Perhaps one of the first places to look for problems is the output of 'sudo apt-get update', where a WARNING might reveal the problem, like:
W: GPG error: http://ppa.launchpad.net precise Release: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY 2EE5793634EF4A35
You then need to import the key named in the warning to resolve it:
sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys 2EE5793634EF4A35
After that you should be able to install packages that were failing before with errors like:
E: There are problems and -y was used without --force-yes
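When several repositories complain at once, the key ids can be pulled out of the warnings instead of being copied by hand. A minimal sketch, assuming the warnings keep the NO_PUBKEY format shown above (the function name missing_apt_keys is made up):

```shell
# Read apt-get update output on stdin and print each missing key id once.
missing_apt_keys() {
  # grep -o keeps only the "NO_PUBKEY <id>" fragments, awk keeps the id.
  grep -o 'NO_PUBKEY [0-9A-F]*' | awk '{ print $2 }' | sort -u
}

# Possible usage (warnings go to stderr, hence 2>&1):
# sudo apt-get update 2>&1 | missing_apt_keys | while IFS= read -r key; do
#   sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys "$key"
# done
```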
Some other times network security devices and/or appliances will interfere with the upgrade. Firewalls in particular use either proxying or packet filtering, and those might be the culprits. If you face this error I would recommend installing the same version of the OS on a machine in the same subnetwork and comparing the results with an installation on a machine in a different network. You might have to run strace, tcpdump and God knows what else before you convince the Firewall admin that there is something funky going on with the Network.

Wednesday, April 03, 2013

Changing MYSQL Collation for all tables from bash

I just released another useful recipe where bash simplicity shows up. You use the command from bash exactly the way it works, with no wrapper. Simple, fast, agile.

Run it from remoto-it or just locally like in:
./mysql-change-collation-all-tables-db.sh mysql.sample.com root mydb utf8_unicode_ci
I try to keep the same collation and charset across all db objects. So I make sure my database (table_schema) has the same collation as the tables it hosts:
show variables like "collation_database";
SELECT table_schema, table_name,table_collation  FROM information_schema.tables where table_schema='myDB'; #show table status;
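For readers who want to see the kind of statements involved, here is a sketch of how they could be generated. It is not the released recipe: the function name collation_statements is hypothetical, and it assumes the character set is the part of the collation name before the first underscore (utf8 for utf8_unicode_ci).

```shell
# Read table names on stdin and print one ALTER TABLE statement per table.
# $1: database name, $2: target collation
collation_statements() {
  cs_db="$1"
  cs_collation="$2"
  cs_charset="${cs_collation%%_*}"   # utf8_unicode_ci -> utf8
  while IFS= read -r cs_table; do
    [ -n "$cs_table" ] || continue   # skip empty lines
    printf 'ALTER TABLE `%s`.`%s` CONVERT TO CHARACTER SET %s COLLATE %s;\n' \
      "$cs_db" "$cs_table" "$cs_charset" "$cs_collation"
  done
}

# Possible usage: review the generated file before feeding it back to mysql.
# mysql -h mysql.sample.com -u root -p -N \
#   -e "SELECT table_name FROM information_schema.tables WHERE table_schema='mydb' AND table_type='BASE TABLE'" \
#   | collation_statements mydb utf8_unicode_ci > /tmp/change-collation.sql
```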
