Thinking In Software: January 2014

Thursday, January 30, 2014

The history of Browser usage, the rise of Chrome and the collapse of Internet Explorer

A link is enough.

Google Chrome + Lego

My kids will laugh at my poor Lego building skills, but anyway, in honor of the place where I was born, here is my first Chrome Lego creation.

Is Kanban used as a buzz word?

Today I read a good post from David J Anderson. Minutes later I read another one in which the word "buzz" is rightly named as one of the factors leading to bad kanban implementations, where the term is basically used as a "permission giver" by hipsters.

Some people simply like to use buzzwords. Find out who is actually using "kanban" as a buzzword through some simple questions in the first link. Furthermore, I would say you should answer those questions yourself if you are planning to introduce kanban.

Unique list of components in use in Talend job

$ find /path/to/talend/job/ -name "*.item"|xargs grep -E "value=\"t[A-Z]+" | sed -E 's/^.*value=\"([^\"]*)\".*$/\1/g' | sed 's/_.*//g' |  sort | uniq
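The same idea can be sketched in Python for cases where the job files are processed from a script. This is only a rough equivalent of the pipeline above, not a drop-in replacement: it collects every value="t..." attribute in a file, and the function names and the sample attribute layout used to try it out are assumptions for illustration.

```python
import re
from pathlib import Path

# Matches attribute values that look like Talend component instances,
# e.g. value="tMap_2" (same pattern the grep above looks for).
COMPONENT_RE = re.compile(r'value="(t[A-Z][^"]*)"')


def component_names(text):
    """Return the sorted unique component names found in the text of one .item file."""
    # Drop everything from the first underscore on, as the second sed does.
    return sorted({value.split("_")[0] for value in COMPONENT_RE.findall(text)})


def components_in_job(job_dir):
    """Walk a job directory and merge the component names of all *.item files."""
    names = set()
    for path in Path(job_dir).rglob("*.item"):
        names.update(component_names(path.read_text(errors="ignore")))
    return sorted(names)
```

Calling components_in_job("/path/to/talend/job/") should give a list comparable to the one printed by the shell pipeline.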

Thursday, January 23, 2014

The last packet successfully received from the server was x milliseconds ago. The last packet sent successfully to the server was y milliseconds ago.

The most annoying part of this alert is that, when it actually happens, your server needs to be restarted.

The last packet successfully received from the server was x milliseconds ago. The last packet sent successfully to the server was y milliseconds ago.

In my experience this alert always points to a misconfigured JDBC pool. The most recent issue we had with it was related to a misconfiguration in the Tomcat JDBC Pool. The first parameter below (validationQuery) does nothing unless you explicitly tell the pool when to run it, which is what the second parameter (testOnBorrow) does. In other pools specifying the query is enough. Reading the documentation for your pool should give you the necessary clues to adjust it correctly. The issue should be reproducible by restarting MySQL, which closes the pool connections from the server side. If the client side (the Java app server) is correctly configured it should pick up the fact that the connections were closed. If that is not the case you will end up with several connections in TIME_WAIT state.

validationQuery=SELECT 1
testOnBorrow=true
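To make the idea behind these two settings concrete, here is a minimal sketch of "validate on borrow" written in Python. It is not how Tomcat JDBC Pool is implemented; the class and the connect/is_valid callables are hypothetical names used only to show why a validation query without a "when to run it" rule cannot help.

```python
from collections import deque


class ValidatingPool:
    """Toy pool that checks idle connections before handing them out."""

    def __init__(self, connect, is_valid):
        self._connect = connect    # creates a new connection
        self._is_valid = is_valid  # plays the role of the validation query
        self._idle = deque()

    def borrow(self):
        # Test on borrow: discard idle connections that fail validation.
        while self._idle:
            conn = self._idle.popleft()
            if self._is_valid(conn):
                return conn
        # No healthy idle connection left, open a fresh one.
        return self._connect()

    def give_back(self, conn):
        self._idle.append(conn)
```

If borrow() skipped the is_valid check, a connection closed by the server during a restart would be handed to the application as if it were healthy, which is the situation the alert describes.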

When Talend is not enough - Sectioned CSV parsing with awk

Sectioned CSV files are commonly used for reporting: they present several datasets in sections. Clearly the format is meant to be read and not really to be post-processed. However, sometimes there is no option because "it is all the external party can give you", and you will have to do your best from your ETL tool.

In the case of Talend I found no component capable of doing this, and since my time is limited at the moment I decided to build something quick that could potentially be ported to a Talend component in the future, for example a tFileSectionInput that accepts a section string and a delimiter. It looks for the section and, once matched, collects the first line as the header and every line after it as a record. It appends the section keyword as an additional field. At least for a file with balanced section records this approach should be enough. Let us illustrate with an example:

$ cat ~/sectioned.csv 
This is a sample of a typical sectioned csv file

 It contains indented fields

It also contains sections identified by a title, possible header and rows

For this proof of concept header and row are treated the same even though with awk specific rules could be applied in the future.

section1
c1,c2,c3,c4,c5
a,b,c,d,e
a,b,c,d,e

section2
c1,c2,c3,c4,c5
1,2,3,4,5
1,2,3,4,5
1,2,3,4,5

section3
c1,c2,c3,c4,c5
A,B,C,D,E
A,B,C,D,E
A,B,C,D,E
A,B,C,D,E

Our script should be able to provide the below output:
$ cat ~/sectioned.csv | ~/extract_csv_section.sh section1 ,
c1,c2,c3,c4,c5,section
a,b,c,d,e,section1
a,b,c,d,e,section1
$ cat ~/sectioned.csv | ~/extract_csv_section.sh section2 ,
c1,c2,c3,c4,c5,section
1,2,3,4,5,section2
1,2,3,4,5,section2
1,2,3,4,5,section2
$ cat ~/sectioned.csv | ~/extract_csv_section.sh section3 ,
c1,c2,c3,c4,c5,section
A,B,C,D,E,section3
A,B,C,D,E,section3
A,B,C,D,E,section3
A,B,C,D,E,section3

For this we use a bash wrapper script:

#!/bin/bash
#/usr/local/bin/extract_csv_section.sh
# author: Nestor Urquiza
# date: 20140123
# description: bash wrapper to call awk for extracting a section of a file

# quote every expansion so paths, sections and separators with special characters survive
dirname=$(dirname "$0")
awk -v section="$1" -v separator="$2" -f "$dirname/extract_csv_section.awk"

Which calls the awk script below. Note that there is no need in this particular implementation to use any array. Instead of declaring header and records we can just print '$0 separator "section"' for the header and '$0 separator section' for each record:

#!/usr/bin/awk -f
#/usr/local/bin/extract_csv_section.awk
# author: Nestor Urquiza
# date: 20140123
# description: section and separator are expected as awk variables passed with -v (see extract_csv_section.sh)

BEGIN {
 section_found = 0;
 # the section title may be surrounded by blanks but must be alone on its line
 section_regex = "^[[:blank:]]*" section "[[:blank:]]*$";
 header = "";
 # a record is any line that contains the separator
 record_regex = "^.*" separator ".*$";
 record_number = 0;
}
{
 if( match($0, section_regex) ) {
  section_found = 1;
  next;
 }
 if( section_found == 1 ) {
  if( header == "" ) {
   # the first line after the section title is the header
   header = $0 separator "section";
  } else if ( match($0, record_regex) ) {
   records[record_number] = $0 separator section;
   record_number++;
  } else {
   # the first non-record line ends the section; exit jumps straight to END
   exit;
  }
 }
}
END {
 # print nothing when the section was never found
 if( section_found == 1 ) {
  print header;
  for( i = 0; i < record_number; i++ ) {
   print records[i];
  }
 }
}
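For readers who prefer to prototype the logic outside awk, the same state machine can be sketched in Python. This is an illustration under assumptions, not a port of the script: the function name is made up, and section and separator are compared as literal strings here while the awk version treats them as regular expressions.

```python
def extract_section(lines, section, separator=","):
    """Return the header and records of one section, each tagged with the section name."""
    out = []
    found = False
    for raw in lines:
        line = raw.rstrip("\n")
        if not found:
            # The section title may be surrounded by blanks but must be alone on its line.
            found = line.strip(" \t") == section
            continue
        if not out:
            # First line after the title is the header.
            out.append(line + separator + "section")
        elif separator in line:
            out.append(line + separator + section)
        else:
            # First non-record line ends the section.
            break
    return out
```

With the sample file above, something like extract_section(open("sectioned.csv"), "section2") should yield the header line followed by the three tagged records.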

Wednesday, January 22, 2014

JIRA custom reports from your current BI platform

Granted, you can build plugins in JIRA; in fact you can even sell them. You are not supposed to hit the database directly, as there is a JIRA REST API which provides a degree of isolation to maintain backward compatibility. However, in a team where separation of concerns is respected, and especially if it is implemented through team specialization, I am reluctant to accept that middle tier developers will be providing reports while backend developers cannot. Furthermore, company-wide reports, while based in part on WIP and work done, are usually generated outside of JIRA, so it makes sense to have a way to provide JIRA reports from outside of it, in other words from your custom BI platform.

Data analysts exist for a reason. Their skills concentrate on data manipulation rather than UX/UI and middle tier programming like Java.

Consuming JSON from their ETL tools should allow data analysts to mash up information and present it through their report engine* in Excel, PDF, etc.

Below are some important API entry points to help you build such reports:

/rest/api/2/project/${project}: Information about ${project}. Note that you might have to use the description field if you want to add custom metadata to the project.
/rest/api/latest/field: Find out how a custom field is actually named by the API. For example, customfield_10013 could be a custom field representing a custom "department" field.
/rest/api/latest/search?jql=&expand=customfield_10013: Return all tickets matching the JQL provided, adding our custom "department" field.
/rest/api/2/search?jql=issueFunction in workLogged('by admin on 2014/03/11'): This works only if you install the free plugin named "script runner".
/rest/api/latest/search?jql=key%20in%20workedIssues(%222013/1/20%22,%222014/1/20%22,%22jira-users%22)&fields=worklog: Another, non-free way to list tickets with worklog fields, including the time spent on them. It requires the installation of the jiratimesheet plugin.
Search for your issue and, if there are no clues for it, open a question in Jira Answers. The main question for reports will mostly be "How do I do ... through the REST API".
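When these entry points are called from a script rather than typed by hand, the JQL has to be URL-encoded. Below is a minimal Python sketch that only builds the URLs and performs no request. The helper names, the base URL (borrowed from the sample domain used elsewhere in this blog) and the project key are assumptions for illustration; authentication and paging are left out.

```python
from urllib.parse import quote, urlencode


def project_url(base_url, project):
    """URL for the project information entry point."""
    return f"{base_url}/rest/api/2/project/{quote(project, safe='')}"


def search_url(base_url, jql, fields=None, api_version="latest"):
    """URL for the search entry point with the JQL percent-encoded."""
    params = {"jql": jql}
    if fields:
        # Restrict the response to the listed fields, e.g. ["worklog"].
        params["fields"] = ",".join(fields)
    # quote_via=quote encodes spaces as %20 instead of "+".
    query = urlencode(params, quote_via=quote)
    return f"{base_url}/rest/api/{api_version}/search?{query}"


if __name__ == "__main__":
    base = "http://jira.sample.com"  # hypothetical server
    print(project_url(base, "ABC"))
    print(search_url(base, "project = ABC", fields=["worklog"]))
```

An ETL job can then fetch the resulting URL with whatever HTTP component it already has and parse the JSON response.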

So while I see value in designing reports within JIRA I believe those should be easily exposed via JSON REST API (as jiratimesheet does) if the work spent there is expected to be reused from outside JIRA.

*If you ask me I would prefer Front End Engineers to take care of the layout of reports but at the time of this writing report engines are still a concern of Data Analysts.

Monday, January 20, 2014

issueId to issueKey in JIRA

http://jira.sample.com/secure/ViewIssue.jspa?id=58743

Migrating apache 2.2 to 2.4 to support forward secrecy

Apache 2.2 and below are compromised in this respect, but Ubuntu will not ship the security update for its 12.04 LTS release, which forces users to migrate to 13.10 in order to get Apache 2.4. In our experience so far the only changes needed were:

Remove the sections below from apache2.conf

#
# The accept serialization lock file MUST BE STORED ON A LOCAL DISK.
#
LockFile ${APACHE_LOCK_DIR}/accept.lock
...
#
# DefaultType is the default MIME type the server will use for a document
# if it cannot otherwise determine one, such as from filename extensions.
# If your server contains mostly text or HTML documents, "text/plain" is
# a good value.  If most of your content is binary, such as applications
# or images, you may want to use "application/octet-stream" instead to
# keep browsers from trying to display binary files as though they are
# text.
#
DefaultType text/plain

Edit virtual host

...
 SSLProtocol all -SSLv2
 SSLHonorCipherOrder on
 SSLCipherSuite "EECDH+ECDSA+AESGCM EECDH+aRSA+AESGCM EECDH+ECDSA+SHA384 \
EECDH+ECDSA+SHA256 EECDH+aRSA+SHA384 EECDH+aRSA+SHA256 EECDH+aRSA+RC4 \
EECDH EDH+aRSA RC4 !aNULL !eNULL !LOW !3DES !MD5 !EXP !PSK !SRP !DSS"
...

Reminder: If you are doing this on a new server, as you should IMO, you will need to hardcode the domain to server IP mapping in /etc/hosts until the DNS changes are performed. At that point remove the IP to domain mapping.

Unix / Linux Power Tools - awk for group by

You have a list of issued commands, let us say from a log trace. You want to provide a report that shows at least one example per command, which demands showing not just the command but also the parameters being passed. For example:

$ cat /tmp/test
command1 param1 param2
command2 param3 param4
command1 param5 para6

$ cat /tmp/test | awk '{ array[$1]=$0; }  END {for (i in array) print array[i]}'
command2 param3 param4
command1 param5 para6

We use $1 (the command) as the key and the whole line ($0) as a value in an associative array which guarantees a unique entry per command (in our case the last entry). At the END we print the array content.
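The associative array trick maps directly onto a dictionary in other languages. A minimal Python sketch of the same "last entry wins" grouping follows; the function name is made up for illustration, and unlike the awk one-liner it skips blank lines.

```python
def last_example_per_command(lines):
    """Keep one full line per command, keyed by the first field (the last one seen wins)."""
    examples = {}
    for line in lines:
        fields = line.split()
        if fields:
            # Same as array[$1] = $0 in awk.
            examples[fields[0]] = line
    return examples


if __name__ == "__main__":
    log = [
        "command1 param1 param2",
        "command2 param3 param4",
        "command1 param5 para6",
    ]
    for example in last_example_per_command(log).values():
        print(example)
```

To keep the first example per command instead of the last, examples.setdefault(fields[0], line) would do.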

Friday, January 17, 2014

Kanban WIP Limit - How to?

How does Kanban help with Work In Progress (WIP) limits? Officially speaking, Kanban proposes limiting the WIP in all columns, but after struggling with some tools' "deficiencies" I came to the realization that there are two types of WIP limits: those related to "columns" or "value stream states" and those related to "personal" instant capacity.

Limiting the WIP per person is trivial. A good number is to allow no more than 2 standard or intangible tickets, in that order, plus 1 expedite or fixed delivery date ticket, which will actually exceed the imposed limit of 2 per person. The question is how to enforce it. JIRA at least does not have a good way to enforce personal WIP limits; would you vote for it please? The Kanban method tries to address the WIP limit with just limits on the value stream states or columns, but I argue that is not enough.

In terms of columns we have queues, for example, whose limits should be adjusted empirically. On the other hand we also want to visualize the personal WIP limits in columns, for which our board must be designed carefully. What you might think of as a stage to be used across multiple teams should not be modeled as just one column but as a number of columns matching the number of teams involved.

If you have Back End (BE) and Front End (FE) developers, and the FE developers basically pull the work after the BE developers have finalized their part, the pull system just behaves as expected (pulling from the previous stage).

However, if they can work in parallel you will be tempted to create a common state to describe “dev-in-progress” and basically use a different swim lane for each of the two teams. While the separate swim lanes allow further visualization, if you do not create separate columns for them you will be mixing the two concerns under a single WIP limit (specified in the column), which would be incorrect. The fact that they are columns does not mean that all tickets need to spend time in both of them. This column skipping is not a violation of the Just In Time (JIT) system. The workflow should prevent tickets from being pulled from BE into FE and instead have them pulled from the backlog.

While you could set WIP limits per team swim lane the WIP limits for queues will still need to be set by columns generating a less appealing board. If you want to visualize several projects in swim lanes you have no option other than using WIP limits in columns. The same applies for swim lanes dedicated to any other kind of criteria like classes of services.

I have found in practice that setting the personal WIP limits alone is enough for a gradual reduction of WIP. However, setting limits on queues is also very important; otherwise your cycle time will increase for no reason, and you will be able neither to balance demand against throughput nor to control the waste generated by prioritization. A good personal WIP limit is 2 standard items per person with an additional expedite or fixed delivery date item. A good queue WIP limit is 5 per person, considering that you are trying to fix one ticket per day and that team members usually do not take more than one week of vacation in a row. Your mileage will vary of course, but this should be at least some food for thought.
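Since the tooling does not enforce personal limits, a small script over exported tickets can at least flag violations. The Python sketch below uses the numbers suggested above; the (assignee, class of service) input shape, the class names and the rule that a team queue limit is 5 times the team size are assumptions made for illustration.

```python
from collections import Counter

PERSONAL_STANDARD_LIMIT = 2   # standard or intangible items per person
PERSONAL_URGENT_LIMIT = 1     # expedite or fixed delivery date items per person
QUEUE_LIMIT_PER_PERSON = 5
URGENT_CLASSES = {"expedite", "fixed-date"}  # hypothetical class-of-service names


def personal_wip_violations(tickets):
    """Return the assignees whose in-progress tickets exceed the personal limits.

    tickets is an iterable of (assignee, class_of_service) pairs.
    """
    standard, urgent = Counter(), Counter()
    for assignee, service_class in tickets:
        bucket = urgent if service_class in URGENT_CLASSES else standard
        bucket[assignee] += 1
    people = set(standard) | set(urgent)
    return sorted(
        p for p in people
        if standard[p] > PERSONAL_STANDARD_LIMIT or urgent[p] > PERSONAL_URGENT_LIMIT
    )


def queue_limit(team_size):
    """Queue WIP limit for a team, assuming the per-person figure simply scales."""
    return QUEUE_LIMIT_PER_PERSON * team_size
```

The list of pairs could come from any export of the board, for example a search restricted to the in-progress statuses.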

These complexities are usually not easy to visualize in just one view, so you could show the number of issues per member for WIP limits in a classical table layout, the number of issues per team in specific queues in another classical table layout, and finally the total number of issues in the kanban board. All this is possible using JIRA Agile, for example.

You can read a bit more about this issue in Personal WIP limit directly impacts Cycle Time

Monday, January 13, 2014

Apache Virtual Hosts not working? Are you using IP or wildcard?

The Apache virtual host directive starts with something like the below:

<VirtualHost sample.com:80>
  ServerName sample.com
  ...

Basically you need to include the IP or domain of your service and the port the service is listening on, separated by a colon. Very often you see sysadmins suggesting, and in fact using, the below instead:

<VirtualHost *:80>
  ServerName sample.com
  ...

That basically says the server will listen on all IPs for the specific domain. This will work perfectly for SSL as well until ... a second website is needed.

The reason is that SSL works at the IP level and not at the domain level. That means you cannot have two SSL domains served by the same IP (unless both the server and the clients support Server Name Indication, SNI). At that point you need to specify the resolvable domain or the IP.

What happens when you are trying to migrate the existing site to a new server? The IPs will be different but your DNS cannot be changed until the migration is performed. You have no other option than resolving to the future IP in *just* the new server and that is exactly what /etc/hosts is for.

192.168.0.12 sample.com

Without that entry your new server will never work for the existing domain. Of course you now need to remember to remove the entry once the migration has finished.

Monday, January 06, 2014

On security: Linux known_hosts and the warning The RSA host key for domain has changed

The SSH host key fingerprint stored in known_hosts (at least for a Mac accessing an Ubuntu server) is not built out of the host key alone but out of the host IP as well. That is the reason why, even when you copy your existing public and private keys, you get the below warning:

$ ssh user@sample.com
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@       WARNING: POSSIBLE DNS SPOOFING DETECTED!          @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
The RSA host key for sample.com has changed,
and the key for the corresponding IP address 192.168.1.60
is unknown. This could either mean that
DNS SPOOFING is happening or the IP address for the host
and its host key have changed at the same time.
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@    WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!     @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
Someone could be eavesdropping on you right now (man-in-the-middle attack)!
It is also possible that a host key has just been changed.
The fingerprint for the RSA key sent by the remote host is
a6:d8:17:45:c3:74:eb:cd:a8:5a:a5:91:37:f8:8c:7f.
Please contact your system administrator.
Add correct host key in /Users/nestor/.ssh/known_hosts to get rid of this message.
Offending RSA key in /Users/nestor/.ssh/known_hosts:68
RSA host key for sample.com has changed and you have requested strict checking.
Host key verification failed.

This protects the client against spoofing. Clients could disable "CheckHostIP", but that would leave them vulnerable to DNS spoofing. Bottom line: when the IP for a domain changes, clients will need to update their known_hosts file, which means they will need to accept the new key, and this will be a manual operation.
