Monday, June 13, 2016

#NoProjects are needed for software development - Prioritize MMF based on multi-dimensional analysis (value, risk and cost of delay)

#NoProjects are needed for software development. Prioritize MMF based on multi-dimensional analysis (value, risk and cost of delay) instead.

I was tempted to share why I think that projects in software development kill value rather than bring it, but before posting I did my homework and found that somebody had already written about it.

Prioritization should be about selecting the minimal marketable features (MMF) that bring the biggest value, minimize the cost of delay, and/or reduce the most risk (what I call multi-dimensional analysis). It should be done continuously, just as software should be shipped continuously. Assuming that a big effort will be more profitable than focusing on MMF delivery is a mistake. The devil is *always* in the details.

Saturday, June 04, 2016

AH00051: child pid ? exit signal Segmentation fault (11), possible coredump in ?

I ran into the segmentation fault below in apache, accompanied by high CPU usage on the server:
[Fri Jun 03 06:00:01.270303 2016] [core:notice] [pid 29628:tid 140412199950208] AH00051: child pid 2343 exit signal Segmentation fault (11), possible coredump in /tmp/apache-coredumps
After debugging the problem I concluded that mod_proxy was the culprit; there appears to be a recursion bug. I decided not to report it because it was hard to replicate (it happened only late at night) and there was a workaround I could use. The error occurred when the first request came in for a resource the proxied worker could not serve: apache then tried to use a specific error document, but a configuration issue caused that document to be requested from the worker as well. Since that document does not exist on the remote server either, mod_proxy enters an endless loop and apache crashes. Fixing the configuration so that the error document is served by the local apache hides this potential mod_proxy bug; a configuration sketch follows the log excerpt below.
[Fri Jun 03 05:56:40.831871 2016] [proxy:error] [pid 2343:tid 140411834115840] (502)Unknown error 502: [client 192.168.0.180:58819] AH01084: pass request body failed to 172.16.2.2:8443 (sample.com)
[Fri Jun 03 05:56:40.831946 2016] [proxy:error] [pid 2343:tid 140411834115840] [client 192.168.0.180:58819] AH00898: Error during SSL Handshake with remote server returned by /sdk
[Fri Jun 03 05:56:40.831953 2016] [proxy_http:error] [pid 2343:tid 140411834115840] [client 192.168.0.180:58819] AH01097: pass request body failed to 172.16.2.2:8443 (sample.com) from 192.168.0.180 ()
[Fri Jun 03 05:56:40.844138 2016] [proxy:error] [pid 2343:tid 140411834115840] (502)Unknown error 502: [client 192.168.0.180:58819] AH01084: pass request body failed to 172.16.2.2:8443 (sample.com)
[Fri Jun 03 05:56:40.844177 2016] [proxy:error] [pid 2343:tid 140411834115840] [client 192.168.0.180:58819] AH00898: Error during SSL Handshake with remote server returned by /html/error/503.html
[Fri Jun 03 05:56:40.844185 2016] [proxy_http:error] [pid 2343:tid 140411834115840] [client 192.168.0.180:58819] AH01097: pass request body failed to 172.16.2.2:8443 (sample.com) from 192.168.0.180 ()
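The fix, in sketch form, is to exclude the error document from the proxy so the local apache serves it. The directives below are illustrative assumptions reconstructed from the paths in the log, not the actual site configuration:
ProxyErrorOverride On
ErrorDocument 503 /html/error/503.html
# The exclusion must appear before the general ProxyPass, otherwise the error
# page itself is requested from the failing backend and the recursion starts.
# /html/error/503.html must exist under the local DocumentRoot (or via an Alias).
ProxyPass /html/error/ !
ProxyPass / https://172.16.2.2:8443/
ProxyPassReverse / https://172.16.2.2:8443/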

Why do I get a Linux SIGSEGV / segfault / Segmentation fault ?

To answer this question you will need to get a coredump and use the GNU debugger (gdb).
The application generating the segmentation fault must either be configured to produce a coredump or, if the fault can be reproduced manually, be run directly under gdb.
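For the manual route, a minimal sketch looks like this (Debian/Ubuntu paths assumed; -X runs a single worker process in the foreground, which makes the crash easier to catch):
$ sudo bash
# source /etc/apache2/envvars
# gdb --args /usr/sbin/apache2 -X
(gdb) run
... reproduce the failing request, then once the segmentation fault is hit:
(gdb) bt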
Let us pick Apache to go through the debugging steps with a real-world example. To produce a coredump, apache2 needs to be configured:
$ vi /etc/apache2/apache2.conf
...
CoreDumpDirectory /tmp/apache-coredumps
...
The OS must also not impose limits on the size of the core file, since we do not know how big it will be:
$ sudo bash
# ulimit -c unlimited
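Note that ulimit only affects processes started from that same shell. If apache is started by the init system instead, one option (assuming the Debian/Ubuntu layout, where apachectl sources this file) is to set the limit in the apache environment file:
$ echo 'ulimit -c unlimited' | sudo tee -a /etc/apache2/envvars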
In addition, the core dump directory must exist and must be owned by the apache user (www-data in this case):
$ sudo mkdir /tmp/apache-coredumps
$ sudo chown www-data:www-data /tmp/apache-coredumps
Make sure to stop and start apache separately:
$ sudo apachectl stop
$ sudo apachectl start
Look into running processes:
$ ps -ef|grep apache2
Confirm that the processes are running with "Max core file size" unlimited:
$ cat /proc/${pid}/limits
...
Max core file size        unlimited            unlimited            bytes
...
Here is a quick way to check this with a one-liner that lists the max core file size for the parent apache2 process and all its children:
$ ps -ef | grep 'sbin/apache2' | grep -v grep | awk '{print $2}' | while read -r pid; do cat /proc/$pid/limits | grep core ; done
To test that your configuration works, just force a coredump. This can be achieved by sending a SIGABRT signal to the process:
$ kill -SIGABRT ${pid}
Analyze the coredump file with gdb. In this case it confirms that the SIGABRT signal was used:
$ gdb core  /tmp/apache-coredumps/core
...
Core was generated by `/usr/sbin/apache2 -k start'.
Program terminated with signal SIGABRT, Aborted.
#0  0x00007f1bc74073bd in ?? ()
...
Leave apache running, and when it fails with a segmentation fault you can confirm the reason:
$ gdb core  /tmp/apache-coredumps/core
...
Core was generated by `/usr/sbin/apache2 -k start'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007fc4d7ef6a11 in ?? ()
...
Then dig deeper to find out what the problem really is. Note that we now give gdb the apache2 binary together with the coredump so it can resolve symbols:
$ gdb /usr/sbin/apache2 /tmp/apache-coredumps/core
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007fc4d7ef6a11 in apr_brigade_cleanup (data=0x7fc4d8760e68) at /build/buildd/apr-util-1.5.3/buckets/apr_brigade.c:44
44 /build/buildd/apr-util-1.5.3/buckets/apr_brigade.c: No such file or directory.
Using the bt command from the gdb prompt gives us more:
Core was generated by `/usr/sbin/apache2 -k start'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007fc4d7ef6a11 in apr_brigade_cleanup (data=0x7fc4d8760e68) at /build/buildd/apr-util-1.5.3/buckets/apr_brigade.c:44
44 /build/buildd/apr-util-1.5.3/buckets/apr_brigade.c: No such file or directory.
(gdb) bt
#0  0x00007fc4d7ef6a11 in apr_brigade_cleanup (data=0x7fc4d8760e68) at /build/buildd/apr-util-1.5.3/buckets/apr_brigade.c:44
#1  0x00007fc4d7cd79ce in run_cleanups (cref=) at /build/buildd/apr-1.5.0/memory/unix/apr_pools.c:2352
#2  apr_pool_destroy (pool=0x7fc4d875f028) at /build/buildd/apr-1.5.0/memory/unix/apr_pools.c:814
#3  0x00007fc4d357de00 in ssl_io_filter_output (f=0x7fc4d876a8e0, bb=0x7fc4d8767ab8) at ssl_engine_io.c:1659
#4  0x00007fc4d357abaa in ssl_io_filter_coalesce (f=0x7fc4d876a8b8, bb=0x7fc4d8767ab8) at ssl_engine_io.c:1558
#5  0x00007fc4d87f1d2d in ap_process_request_after_handler (r=r@entry=0x7fc4d875f0a0) at http_request.c:256
#6  0x00007fc4d87f262a in ap_process_async_request (r=r@entry=0x7fc4d875f0a0) at http_request.c:353
#7  0x00007fc4d87ef500 in ap_process_http_async_connection (c=0x7fc4d876a330) at http_core.c:143
#8  ap_process_http_connection (c=0x7fc4d876a330) at http_core.c:228
#9  0x00007fc4d87e6220 in ap_run_process_connection (c=0x7fc4d876a330) at connection.c:41
#10 0x00007fc4d47fe81b in process_socket (my_thread_num=19, my_child_num=, cs=0x7fc4d876a2b8, sock=, p=, thd=)
    at event.c:970
#11 worker_thread (thd=, dummy=) at event.c:1815
#12 0x00007fc4d7aa7184 in start_thread (arg=0x7fc4c3fef700) at pthread_create.c:312
#13 0x00007fc4d77d437d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
I was misled by the messages above until I started changing apache configurations and re-examining the generated coredumps. The crash would show up at seemingly arbitrary lines of arbitrary source files; the "No such file or directory" complaint simply means gdb cannot find those source files on this machine, so it is not the real clue:
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/sbin/apache2 -k start'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007fb44328d992 in ap_save_brigade (f=f@entry=0x7fb4432e5148, saveto=saveto@entry=0x7fb4432e59c8, b=b@entry=0x7fb435ffa568, 
    p=0x7fb443229028) at util_filter.c:648
648 util_filter.c: No such file or directory.
By disabling modules I was able to get to the bottom of it: an apparent bug in mod_proxy, triggered when a misconfiguration causes recursive pulling of unavailable resources from the proxied target. Once the root cause is found, comment out or remove the coredump configuration from apache and reload:
$ sudo vi /etc/apache2/apache2.conf
...
# CoreDumpDirectory /tmp/apache-coredumps
...
$ sudo apachectl graceful

Increasing swap size in Ubuntu Linux

Here is how to depart with parted (no pun intended) from a swap partition and move to a swap file in Ubuntu Linux. When you need more swap, it is far easier to grow a swap file than to resize a partition, and starting with the 2.6 Linux kernel swap files are just as fast as swap partitions. Remove the current swap partition:
$ sudo parted
…
(parted) print                                                            
…
Number  Start   End     Size    Type      File system     Flags
 1      1049kB  40.8GB  40.8GB  primary   ext4            boot
 2      40.8GB  42.9GB  2145MB  extended
 5      40.8GB  42.9GB  2145MB  logical   linux-swap(v1)
…
(parted) rm 2                                                             
(parted) print                                                            
…
Number  Start   End     Size    Type     File system  Flags
 1      1049kB  40.8GB  40.8GB  primary  ext4         boot
…
Delete the old swap entry from /etc/fstab:
# swap was on /dev/sda5 during installation
# UUID=544f0d91-d3db-4301-8d1b-f6bfb2fdee5b none            swap    sw              0       0
Disable all swapping devices:
$ sudo swapoff -a
Create a swap file with the correct permissions; for example, the commands below create a 4 GB one:
$ sudo fallocate -l 4G /swapfile
$ sudo chmod 600 /swapfile
Set up the Linux swap area:
$ sudo mkswap /swapfile
Enable the swap device:
$ sudo swapon /swapfile
Confirm that it was created:
$ sudo swapon -s
…
Filename    Type  Size       Used Priority
/swapfile            file  4194300   0     -1
…
Add the entry to /etc/fstab:
/swapfile       none    swap    sw      0       0
Restart:
$ sudo reboot now
Confirm that the swap is active:
$ free
             total       used       free     shared    buffers     cached
Mem:       4048220    1651032    2397188        536      66648     336164
-/+ buffers/cache:    1248220    2800000
Swap:      4194300          0    4194300

$ sudo swapon -s
…
Filename    Type  Size       Used Priority
/swapfile            file  4194300   0     -1
…

Wednesday, May 25, 2016

From subversion to git with the help of gitlab

Here are the steps I recently followed to migrate a subversion repo into git:
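In outline, a git-svn based migration looks like this (a sketch only; the repository URLs, group name and author mapping below are illustrative assumptions, and it presumes the standard trunk/branches/tags layout):
$ cat authors.txt
jdoe = John Doe <jdoe@example.com>
$ git svn clone --stdlayout --authors-file=authors.txt https://svn.example.com/myrepo myrepo
$ cd myrepo
$ # svn tags arrive as remote branches; convert them to real git tags if you need them
$ git remote add origin git@gitlab.example.com:mygroup/myrepo.git
$ git push -u origin --all
$ git push origin --tags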

Thursday, May 19, 2016

Filter output of Linux top command

To filter by process name:
top -cb | grep processName
Or, if you want to interact with top, start it with 'top -c', then press "o" and enter the filter "COMMAND=processName".
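For a single snapshot instead of a continuous stream:
top -bc -n 1 | grep processName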

Thursday, April 21, 2016

Streaming and saving your presentation video

I need these steps so often that I'd rather have them handy:
  1. Go to https://www.youtube.com/
  2. Click on menu | My Channel
  3. Click on Video Manager
  4. Click on Live Streaming | Events
  5. Click on New live event
  6. Title: ${Name of presentation} ${part number}
  7. Select the "Unlisted" option if you want it to be available only to certain people
  8. Click on "Go live now"
  9. Share your screen and start the broadcast
  10. When done, stop the broadcast
  11. From the Video Manager get the link and share it with the relevant people

Thursday, April 14, 2016

Upgrade Talend sqlite jdbc library

Talend is still ages behind with the outdated sqlite jar file it ships (sqlitejdbc-v056.jar). While we wait for a fix, here is a workaround (using version 6.0.0 here; you can easily figure out what to do for other versions). A shell sketch of the two copies follows the steps:
  1. Download latest sqlite-jdbc jar from https://github.com/xerial/sqlite-jdbc/releases
  2. Copy the jar (for example sqlite-jdbc-3.8.11.2.jar) as ~/.m2/repository/org/talend/libraries/sqlitejdbc-v056/6.0.0-SNAPSHOT/sqlitejdbc-v056.jar. Note that you must keep the same old name
  3. Copy the jar as /opt/TOS_DI-20150702_1326-V6.0.0/plugins/org.talend.libraries.jdbc.sqlite3_6.0.0.20150702_1326/lib/sqlitejdbc-v056.jar. Note again that you must keep the same old name. I assume your installation is done in /opt, if not follow the path convention and I am sure you will figure it out.
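The two copies as shell commands, assuming the paths and versions from the steps above:
$ cp sqlite-jdbc-3.8.11.2.jar ~/.m2/repository/org/talend/libraries/sqlitejdbc-v056/6.0.0-SNAPSHOT/sqlitejdbc-v056.jar
$ cp sqlite-jdbc-3.8.11.2.jar /opt/TOS_DI-20150702_1326-V6.0.0/plugins/org.talend.libraries.jdbc.sqlite3_6.0.0.20150702_1326/lib/sqlitejdbc-v056.jar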

Sunday, April 10, 2016

Apache proxy to tomcat - Error during SSL Handshake with remote server (AH00898), pass request body failed (AH01097)

An error like the one below means that the certificates on the proxy server and the target server do not match or have expired:
[Sun Apr 10 08:13:51.513836 2016] [proxy:error] [pid 32426:tid 140087715120896] [client 192.168.0.5:34425] AH00898: Error during SSL Handshake with remote server returned by /some/path
[Sun Apr 10 08:13:51.513848 2016] [proxy_http:error] [pid 32426:tid 140087715120896] [client 192.168.0.5:34425] AH01097: pass request body failed to 192.168.0.5:8443 (sample.com) from 192.168.0.5 ()
To understand exactly what is going on, increase the log level temporarily:
LogLevel info proxy:trace5
This will explain what is going on, for example:
[Sun Apr 10 11:45:30.708783 2016] [ssl:info] [pid 26391:tid 140560622925568] [remote 192.168.0.5:8443] AH02004: SSL Proxy: Peer certificate is expired
A one-liner will reveal why. The cert below just expired:
$ echo | openssl s_client -connect 192.168.0.5:8443 2>/dev/null | openssl x509 -noout -dates | grep 'notAfter=.*GMT'
notAfter=Apr 10 12:13:04 2016 GMT
To avoid cluster node certificates expiring at a different time than those on the proxy server, use the same certificates everywhere. To confirm they are indeed the same, compare their hashes:
$ md5sum /opt/tomcat/certs/my.crt
$ md5sum /opt/tomcat/certs/my.key
$ md5sum /etc/apache2/certs/my.crt 
$ md5sum /etc/apache2/certs/my.key
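A fingerprint comparison also works, and catches the case where the backend serves something different from what is on disk (host and paths taken from above):
$ echo | openssl s_client -connect 192.168.0.5:8443 2>/dev/null | openssl x509 -noout -fingerprint -sha256
$ openssl x509 -noout -fingerprint -sha256 -in /etc/apache2/certs/my.crt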
