Thinking In Software: September 2013

Saturday, September 28, 2013

Add the session cookie to Apache logs

Business intelligence starts with your application logs. How your users actually use the system is at the core of the questions you need to answer to make sure your product really helps them be productive.

Apache can log the value of specific cookies, such as the session ID, using the %{name}C format string. Below is an example configuration for a typical JEE application, where the session cookie is called JSESSIONID:

 LogFormat "%h %l %u %t \"%r\" %>s %b %{JSESSIONID}C" custom
 CustomLog /var/log/apache2/sample.com.log custom
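With the session ID in the log, questions such as "how many requests did each session make?" become one-liners. A minimal sketch, assuming the custom LogFormat above (the cookie is the last field on every line, and Apache writes "-" when the request carried no JSESSIONID cookie); hits_per_session is a hypothetical helper name.

```shell
# hits_per_session LOGFILE
# Assumes the "custom" LogFormat above, so the JSESSIONID cookie is the
# last field; requests without the cookie (logged as "-") are skipped.
hits_per_session() {
  awk '$NF != "-" { hits[$NF]++ }
       END { for (s in hits) print hits[s], s }' "$1" | sort -nr
}

# Usage: hits_per_session /var/log/apache2/sample.com.log | head
```

Keep in mind the counts are per session, not per user: a user who logs in twice shows up as two sessions.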

My Twitter Account was hacked - Solution is double factor authentication

Double factor authentication is an absolute must-have. It is among the 10 most important security measures for any application that the public can reach, one way or another.

My Twitter account was recently hacked, and as a result it sent around three spam messages each to about 10 of my followers. My apologies for this incident.

I had a strong password and I usually change my passwords every three months (a pain, I know). I even use different passwords for my different online accounts (another necessary pain). Had I looked into Twitter's privacy settings, I would have saved time and, more importantly, some reputation, because the service offers double factor authentication.

Twitter supports double factor authentication via SMS or the Twitter app.

Double factor authentication is an inconvenience, but it is better to go through that pain than to get hacked.

Friday, September 27, 2013

Apache log files statistics - Hits per resource - Finding most consumed resources

In a typical modern web application, users hit resources both directly and indirectly. Often, especially in REST designs, the path contains a number identifying the specific collection member being accessed. Unix power tools can quickly answer questions like "List hits by page" or, to put it in Web 2.0 terms, "List hits per resource".

Given Apache log lines like this one:

192.168.0.100 - - [22/Sep/2013:06:25:09 -0400] "POST /my/resource HTTP/1.1" 200 3664

When the following command is run:

sed -e 's/?[^ ]*//' -e 's/[0-9]*//g' access.log | awk '{url[$7]++} END{for (i in url) {print url[i], i}}' | sort -nr

Then output like the following is returned:

10000 /my/top/hit/resource
...
50 /my/number//including/hit/resource
...
1 /my/bottom/hit/resource

The command first strips the query string, then removes all digits (so resources that differ only by their ids are counted as the same resource), builds an associative array (a map) whose key is the resource and whose value is the number of hits, prints it as "counter resource", and finally sorts it in descending order. The -n switch matters here: it makes sort compare the counters numerically, so 10000 ranks above 50 instead of below it.
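A variation worth considering, sketched under the assumption of the common log format shown above (request path in field 7): do all the work in awk on field 7 only, so digits in the IP, timestamp or user fields can never shift the columns, and fold numbers into a placeholder instead of deleting them, so the double slash in /my/number//including/hit/resource becomes /my/number/{id}/including/hit/resource. The function name hits_per_resource and the {id} placeholder are illustrative choices, not part of any tool.

```shell
# hits_per_resource LOGFILE
# Counts hits per resource in a common-format Apache log (path in field 7),
# folding every run of digits into the hypothetical placeholder "{id}".
hits_per_resource() {
  awk '{
    path = $7
    sub(/[?].*/, "", path)          # drop the query string
    gsub(/[0-9]+/, "{id}", path)    # /users/42 -> /users/{id}
    hits[path]++
  }
  END { for (p in hits) print hits[p], p }' "$1" | sort -nr
}

# Usage: hits_per_resource access.log | head
```

Note that digits inside a segment are folded too (v2 becomes v{id}), which is usually acceptable for this kind of quick report.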

Force ssh password instead of public key authentication

ssh -o PubkeyAuthentication=no user@my.domain.com

Thursday, September 19, 2013

Limit CPU consumption for processes

Compression algorithms eat CPU. While they are needed for backups, you do not want to bring your resources down just because of one process. Use cpulimit for that:

$ mycommand.sh & sleep $delay; cpulimit -e gzip -l 30 -z

Check the man page of your cpulimit version for how -l is measured: in current builds 100 corresponds to one full CPU core (values above 100 are allowed on multi-core machines), so -l 30 caps gzip at roughly 30% of a single core rather than 30% of the whole machine. The -z option makes cpulimit quit if no gzip process is found, which is why we sleep first: $delay must be long enough for mycommand.sh to start gzip, so its value depends on what that command actually does.
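A fixed sleep is a guess: too short and cpulimit -z exits before gzip has started, too long and gzip runs unthrottled in the meantime. A minimal sketch of an alternative, assuming pgrep (from procps) is installed: poll until a process named gzip exists, then attach cpulimit. wait_for_process is a hypothetical helper, not part of cpulimit.

```shell
# wait_for_process NAME TIMEOUT_SECONDS
# Polls once per second until a process whose name is exactly NAME exists.
# Returns 0 when found, 1 if TIMEOUT_SECONDS elapse first.
wait_for_process() {
  local name=$1 timeout=$2 waited=0
  until pgrep -x "$name" >/dev/null; do
    if [ "$waited" -ge "$timeout" ]; then
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}

# Usage (mycommand.sh is the script from the post):
#   mycommand.sh &
#   wait_for_process gzip 60 && cpulimit -e gzip -l 30 -z
```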

Friday, September 13, 2013

Stress to test - Simulate cpu, memory, io load

How do you stress test Linux? Just use the stress command, available via apt-get and probably other package managers as well. Here is how to consume 500MB of RAM for 30 seconds (--vm-hang 30 makes the worker hold the memory for 30 seconds before freeing it):

$ stress --vm 1 --vm-bytes 500M -t 30s --vm-hang 30
stress: info: [7651] dispatching hogs: 0 cpu, 0 io, 1 vm, 0 hdd
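The heading also promises CPU and I/O load, and stress can combine several kinds of workers in one run. A sketch, assuming the stress package from apt-get, with worker counts picked arbitrarily for illustration: --cpu workers spin on sqrt(), --io workers spin on sync(), --vm workers malloc and free memory, and --timeout stops everything after the given time. stress_mix is a hypothetical wrapper name.

```shell
# stress_mix SECONDS
# Hypothetical wrapper: 2 CPU workers, 1 I/O worker and 1 memory worker
# using 128MB, all stopped by stress itself after SECONDS seconds.
stress_mix() {
  stress --cpu 2 --io 1 --vm 1 --vm-bytes 128M --timeout "${1}s"
}

# Usage: run "stress_mix 30" and watch the load with top or vmstat
# in another terminal.
```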

Test up front: quality assurance is the very first step toward continuous process improvement.

Sunday, September 08, 2013

Reconsider those newlines in file names

This is a huge issue. IMO, file names containing newlines should be considered invalid. I would say the same about file names containing spaces, but that is mission impossible: ask any regular user. Simply put, we write using spaces to separate words, so why should anyone be forced to use underscores, dashes or CamelCase?

So any code that generates file names containing newline characters (or anything other than alphanumerics and spaces) should be fixed. If that is not possible, for example when you do not own the code generating them, and you still need to process such files with your own tools, it is better to just rename them.

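I have scripted this as:

#!/bin/bash -e
#/usr/sbin/fixInvalidFileName.sh
# Strips carriage returns and newlines from the last component of a path.

USAGE="Usage: $(basename "$0") <filePath>"

if [ $# -ne 1 ]
then
  echo "$USAGE"
  exit 1
fi

file=$1
pattern=$'[\r\n]'

# Rename only the last path component; parent directories get their own
# invocation when the script is run through find -depth
case $file in
  */*) dir=${file%/*} ;;
  *)   dir=. ;;
esac
name=${file##*/}

if [[ "$name" == *$pattern* ]]; then
  # -n: do not overwrite a file that already has the cleaned-up name
  mv -n -- "$file" "$dir/${name//$pattern/}"
fi

Now you can use it like this (-depth makes find process a directory's contents before the directory itself, so renaming a directory never breaks the paths find still has to visit):

find . -depth -exec /usr/sbin/fixInvalidFileName.sh {} \;

As a result, newlines (and carriage returns) are stripped out of the file names.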
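Before renaming anything in bulk, it can help to see which files would be touched. A minimal dry-run sketch, assuming bash (for $'...' quoting and read -d '') and a find that supports -print0 (GNU and BSD find do); list_invalid_names is a hypothetical helper name. printf %q escapes each name so the embedded newline shows up as \n instead of breaking the output line.

```shell
# list_invalid_names DIR
# Dry run: prints every path under DIR whose last component contains a
# carriage return or newline, escaped with printf %q so each fits on one line.
list_invalid_names() {
  find "$1" -name $'*[\r\n]*' -print0 |
    while IFS= read -r -d '' path; do
      printf '%q\n' "$path"
    done
}

# Usage: list_invalid_names .
```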
