Thinking In Software: 2016

Friday, December 30, 2016

AWS S3 Data transfer - Kill the Colo ;-)

Transferring data from your premises or Colo infrastructure to the AWS Cloud is no longer as difficult as it used to be ;-)

Besides dedicated links and physical transport of files, Amazon provides a pure internet solution (S3 Transfer Acceleration) to transfer files to Amazon Simple Storage Service (S3), which might be enough for your needs. I will describe here my experience using this method from a manual perspective (no scripting this time), which should be enough for cases such as moving those big on-premises or Colo files to the cloud.

Start by running a test to see how much faster this method will be compared with the direct upload method: http://s3-accelerate-speedtest.s3-accelerate.amazonaws.com/en/accelerate-speed-comparsion.html

I had to transfer 1 TB of data to S3 from Windows 2008 servers. Here is how I did it.

To transfer your files with S3 Transfer Acceleration, select your bucket from the AWS console | Properties | Transfer Acceleration | Enable. You then get the accelerated endpoint, which works just like the regular endpoint; for example mys3bucketname.s3-accelerate.amazonaws.com.

You can use the AWS CLI in Windows just as well as in *nix systems ( http://docs.aws.amazon.com/cli/latest/userguide/installing.html#install-msi-on-windows ) to upload or download files.

See below for an example that shows how I configure AWS CLI, list what I have in S3, set the AWS CLI to use the accelerated endpoint and finally copy the data to the bucket.

C:\> aws configure
AWS Access Key ID [****************HTAQ]:
AWS Secret Access Key [****************J+aE]:
Default region name [None]: us-east-1
Default output format [None]: json

C:\> aws s3 ls
2016-12-24 06:25:01 mys3bucket1
2016-12-03 15:15:37 mys3bucket2

C:\> aws configure set default.s3.use_accelerate_endpoint true

C:\> aws s3 sync E:\BigData\ s3://mys3bucket2

I got 12MB/sec versus 1.2MB/sec at regular speed. I was able to transfer 1 TB in around 16 hours. The good thing is that the command behaves like rsync, meaning that only new or changed files will be transferred after that first attempt. This is good news when you are planning to move the infrastructure to the cloud, as it minimizes the window of possible business disruption.

On the destination server (an EC2 hosted Windows Server in my case), the same command with source and target swapped pulls the data down from S3:

C:\> aws configure set default.s3.use_accelerate_endpoint true

C:\> aws s3 sync s3://mys3bucket2 D:\SmallToBeBigDATA\

WARNING: S3 Transfer Acceleration has an associated cost. All I can say is: it is worth it. It cost us $8.28 to transfer just over 1 TB of data from an on-premises Windows Server to an EC2 hosted Windows Server via S3.
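
Going back to the throughput numbers above, here is a back-of-the-envelope check in Python. It is only a sketch: it assumes decimal units (1 TB = 10**12 bytes, 1 MB = 10**6 bytes) and a perfectly constant rate, which a real transfer never has.

```python
# Back-of-the-envelope transfer times, assuming decimal units
# (1 TB = 10**12 bytes, 1 MB = 10**6 bytes) and a constant rate.
TB = 10**12
MB = 10**6


def hours_to_transfer(size_bytes, rate_mb_per_sec):
    """Hours needed to move size_bytes at a sustained rate."""
    return size_bytes / (rate_mb_per_sec * MB) / 3600


def average_rate_mb_per_sec(size_bytes, hours):
    """Average MB/sec implied by moving size_bytes in the given hours."""
    return size_bytes / (hours * 3600) / MB


accelerated = hours_to_transfer(TB, 12)    # sustained accelerated rate
regular = hours_to_transfer(TB, 1.2)       # sustained regular rate
implied = average_rate_mb_per_sec(TB, 16)  # rate implied by a 16 hour run

print(f"12 MB/sec: {accelerated:.1f} h, 1.2 MB/sec: {regular:.1f} h")
print(f"1 TB in 16 h implies {implied:.1f} MB/sec on average")
```

Under those assumptions a sustained 12MB/sec would need about 23 hours for 1 TB, and 16 hours corresponds to an average closer to 17MB/sec, so treat the per-second figures as point-in-time readings rather than averages. At 1.2MB/sec the same transfer would take more than 9 days.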

AWS Inventory - Audit the Cloud Infrastructure

Update 20180103: I just created a new PR that adds support for IAM, namely it lists users and the policies assigned to them. In order to use this version you need to add a new policy to the role used to run the Lambda:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "iam:ListUsers",
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": "iam:ListUserPolicies",
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": "iam:ListAttachedUserPolicies",
            "Resource": "*"
        }
    ]
}
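
To give an idea of what those three permissions are used for, here is a minimal sketch (not the code from the PR) of listing users with their inline and attached policies. It assumes an object that behaves like boto3.client("iam"); the FakeIam class and its data are hypothetical and exist only so the sketch runs without AWS credentials.

```python
def users_with_policies(iam):
    """Return {user_name: {"inline": [...], "attached": [...]}}.

    `iam` is expected to behave like boto3.client("iam"). Pagination is
    ignored for brevity, so large accounts would need paginators.
    """
    report = {}
    for user in iam.list_users()["Users"]:
        name = user["UserName"]
        inline = iam.list_user_policies(UserName=name)["PolicyNames"]
        attached = iam.list_attached_user_policies(UserName=name)["AttachedPolicies"]
        report[name] = {
            "inline": list(inline),
            "attached": [p["PolicyName"] for p in attached],
        }
    return report


class FakeIam:
    """Hypothetical stand-in for the IAM client, with made-up data."""

    def list_users(self):
        return {"Users": [{"UserName": "alice"}, {"UserName": "bob"}]}

    def list_user_policies(self, UserName):
        return {"PolicyNames": ["inline-s3"] if UserName == "alice" else []}

    def list_attached_user_policies(self, UserName):
        return {"AttachedPolicies": [{
            "PolicyName": "ReadOnlyAccess",
            "PolicyArn": "arn:aws:iam::aws:policy/ReadOnlyAccess",
        }]}


report = users_with_policies(FakeIam())
print(report)
```

Inside the Lambda you would pass boto3.client("iam") instead of the fake.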

Update 20171217: I just created a new PR with a couple of fixes:

* Lambda functions must finish within a short period of time, and therefore the number of snapshots to process should come from an external environment variable.
* Reserved IP addresses might not be in use, in which case we should show the instance id as empty; otherwise we get an exception.
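
Both fixes boil down to a couple of lines of Python. The sketch below is an illustration rather than the code from the PR: MAX_SNAPSHOTS is a hypothetical variable name, and the sample addresses are made up in the shape that describe_addresses() returns.

```python
import os


def max_snapshots(default=100):
    """Read the snapshot cap from the environment.

    MAX_SNAPSHOTS is a hypothetical variable name, not necessarily the
    one used in the actual Lambda.
    """
    return int(os.environ.get("MAX_SNAPSHOTS", default))


def elastic_ip_rows(addresses):
    """Build (public_ip, instance_id) rows from describe_addresses()["Addresses"].

    An unassociated address has no "InstanceId" key, so .get() with an
    empty default avoids the KeyError.
    """
    return [(a["PublicIp"], a.get("InstanceId", "")) for a in addresses]


sample = [
    {"PublicIp": "203.0.113.10", "InstanceId": "i-0123456789abcdef0"},
    {"PublicIp": "203.0.113.11"},  # reserved but not attached to anything
]
print(max_snapshots())
print(elastic_ip_rows(sample))
```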

How difficult is it to audit your AWS Cloud Infrastructure?

Instances, volumes, snapshots, RDS, security groups, elastic IPs and beyond. A single report to get all the invaluable information that will keep you informed to make critical and quick decisions.

The guys from powerupcloud shared an initial script in their blog, which they put on GitHub. I forked it and after some tweaks found it so useful that I decided to send the author a pull request. The new Lambda Function:

* Adds support for environment variables
* Adds security groups listing
* Removes hardcoded and non generic names
* Corrects some comments
* Retrieves the ownerId instead of hardcoding it
* Adds the description for volumes for clearer identification
* Lists the Elastic IPs with the instanceId they are assigned to for clearer identification
* Has a TODO ;-): Naming conventions should be established

Here is a quick start:

Create IAM role | Name:Inventory; Role Type: AWSLambda; Policy: EC2ReadOnly, AmazonS3FullAccess, AmazonRDSReadOnlyAccess
Create S3 bucket | Name: YourInventoryS3Name
Create Lambda | Name: YourInventoryLambda; Description: Extracts the AWS Inventory; Runtime: Python; Role: Inventory; Timeout: 3 min; environment variables: SES_SMTP_USER, SES_SMTP_PASSWORD, S3_INVENTORY_BUCKET, MAIL_FROM, MAIL_TO
Schedule the Lambda: Select Lambda | Trigger | Add Trigger | CloudWatch Events - Schedule | Rule Name: InventorySchedulerRule; Rule Description: Inventory Scheduler Rule; Schedule Expression: cron(0 14 ? * FRI *) if you want it to run every Friday at 9AM ET (schedule expressions are evaluated in UTC) | Enable Trigger | Submit
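
Since the schedule expression is evaluated in UTC, the local firing time shifts by one hour with daylight saving time. A small sketch to check it, using fixed offsets for Eastern time so that it has no dependencies (real code would more likely use zoneinfo):

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets keep the sketch dependency-free; real code would more
# likely use zoneinfo.ZoneInfo("America/New_York").
EST = timezone(timedelta(hours=-5), "EST")
EDT = timezone(timedelta(hours=-4), "EDT")


def local_hour(utc_hour, tz):
    """Hour of day in tz for a schedule firing at utc_hour:00 UTC."""
    # 2016-12-30 is a Friday; with fixed offsets the date does not matter.
    fired = datetime(2016, 12, 30, utc_hour, 0, tzinfo=timezone.utc)
    return fired.astimezone(tz).hour


print(local_hour(14, EST))  # standard time (winter)
print(local_hour(14, EDT))  # daylight saving time (summer)
```

So cron(0 14 ? * FRI *) means 9AM during standard time and 10AM during daylight saving time. If 9AM matters all year round, the hour in the expression has to be changed to 13 while daylight saving time is in effect.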

To be able to send emails from AWS you need to allow Amazon Simple Email Service (SES) to send emails on behalf of addresses in your domain:

Verify the domain: SES | Manage Entities | Verify a New Domain ; yourdomain.com
Follow the instructions. To complete the verification you must add several TXT/CNAME records to your domain's DNS settings.
Verify email addresses: SES | hit the domain | Verify a New Email Address | Confirm from email
After getting confirmation go to SES | SMTP Settings | Create my SMTP Credentials | Provide a username like ‘anyusername’ for example and Amazon will show you the credentials (SMTP Username and SMTP Password). Keep them in a safe place. You can also download them. This is the last time AWS will share them with you.
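
Once the credentials are in the Lambda environment variables, sending the report is plain SMTP. The following is a minimal sketch with Python's standard library, not the code of the Lambda itself. The endpoint is an assumption (us-east-1); use the one shown under SES | SMTP Settings for your region. The addresses are placeholders.

```python
import os
import smtplib
from email.message import EmailMessage

# Assumption: the SES SMTP endpoint for us-east-1; use the one shown
# under SES | SMTP Settings for your region.
SES_HOST = "email-smtp.us-east-1.amazonaws.com"
SES_PORT = 587  # STARTTLS


def build_message(mail_from, mail_to, subject, body):
    """Assemble a plain-text message."""
    msg = EmailMessage()
    msg["From"] = mail_from
    msg["To"] = mail_to
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send_via_ses(msg):
    """Send msg using the credentials from the Lambda environment variables."""
    with smtplib.SMTP(SES_HOST, SES_PORT) as smtp:
        smtp.starttls()
        smtp.login(os.environ["SES_SMTP_USER"], os.environ["SES_SMTP_PASSWORD"])
        smtp.send_message(msg)


msg = build_message("inventory@yourdomain.com", "ops@yourdomain.com",
                    "AWS Inventory", "Inventory test message.")
print(msg["Subject"])
# send_via_ses(msg)  # uncomment once the SMTP credentials are in the environment
```

Both the sender and, while the SES account is still in the sandbox, the recipient must be verified addresses.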

You can clone/fetch/checkout my latest version from my fork. Really useful stuff. Thanks PowerUpCloud for the quick start!

Tuesday, December 27, 2016

AWS EC2 DR - cross region replication for COB via Lambda Function

It happens. The continuity of the business (COB) will be compromised if there is not a sound Disaster Recovery (DR) plan. DR is the ultimate goal of a business continuity plan (BCP). To accomplish this task for Amazon hosted machines we could use a CloudFormation stack template as we did for the in region backup. However, if we just need to replicate all snapshots created in a region (in region backup) to another, then all we need to build is a Lambda function and schedule it.

Create an AWS Identity and Access Management (IAM) Role: AWS Console | Services | IAM | Roles | Create New Role | Name: CrossRegionReplication | Next | Select Role Type: AWS Lambda | Attach Policy: AmazonEC2FullAccess | Create Role

Create a Lambda Function: AWS Console | Services | Lambda | Create a Lambda Function | Configure Function | Name: replicateAll; Description: Cross Region Replication; Runtime: Python; Paste the code from aws-cross-region-replicate-all-lambda.py script and customize it to your needs; Role: CrossRegionReplication; Timeout: 5 min; | Next | Create Function

Test the Lambda: You can test the whole function or just part of it using the “Test” button. Very useful for example to see what it will do if you comment out the copy_snapshot() statement.

Schedule the Lambda: Select Lambda | Trigger | Add Trigger | CloudWatch Events - Schedule | Rule Name: ReplicationSchedulerRule; Rule Description: Replication Scheduler Rule; Schedule Expression: rate(1 hour) | Enable Trigger | Submit
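
For readers who just want the gist of such a function, here is a minimal sketch of the idea. It is not the aws-cross-region-replicate-all-lambda.py script: it assumes two objects that behave like boto3.client("ec2", region_name=...), it recognises an existing copy by the source snapshot id embedded in the copy's description (a convention of this sketch), and the FakeEc2 class with its data is hypothetical so the example runs without AWS.

```python
def replicate_missing(source_ec2, dest_ec2, source_region):
    """Copy completed snapshots that have no copy in the destination yet.

    A copy is recognised by the source snapshot id embedded in its
    description. Pagination is ignored for brevity.
    """
    existing = dest_ec2.describe_snapshots(OwnerIds=["self"])["Snapshots"]
    already_copied = " ".join(s.get("Description", "") for s in existing)

    copied = []
    for snap in source_ec2.describe_snapshots(OwnerIds=["self"])["Snapshots"]:
        snap_id = snap["SnapshotId"]
        if snap["State"] != "completed" or snap_id in already_copied:
            continue
        # copy_snapshot is called on the destination region's client.
        dest_ec2.copy_snapshot(
            SourceRegion=source_region,
            SourceSnapshotId=snap_id,
            Description="Copy of %s from %s" % (snap_id, source_region),
        )
        copied.append(snap_id)
    return copied


class FakeEc2:
    """Hypothetical in-memory stand-in for an EC2 client."""

    def __init__(self, snapshots):
        self.snapshots = snapshots
        self.copy_calls = []

    def describe_snapshots(self, OwnerIds):
        return {"Snapshots": self.snapshots}

    def copy_snapshot(self, SourceRegion, SourceSnapshotId, Description):
        self.copy_calls.append(SourceSnapshotId)
        return {"SnapshotId": "snap-copy"}


source = FakeEc2([
    {"SnapshotId": "snap-aaa", "State": "completed", "Description": "nightly"},
    {"SnapshotId": "snap-bbb", "State": "pending", "Description": "nightly"},
    {"SnapshotId": "snap-ccc", "State": "completed", "Description": "nightly"},
])
dest = FakeEc2([{"SnapshotId": "snap-zzz", "State": "completed",
                 "Description": "Copy of snap-aaa from us-east-1"}])
copied = replicate_missing(source, dest, "us-east-1")
print(copied)
```

Because snapshots that were already copied are skipped, running it every hour only copies what is new since the previous run.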

What did we do? In the last two posts I have shown how to create a bare minimum BCP/DR for AWS EC2. There is a lot more to have ready in order to make sure that a DR datacenter can operate as the new live datacenter, however having the data volumes available is the absolute first step. Out of the snapshots we could at least manually rebuild the environment. Culture first.

CloudFormation AWS EC2 BCP - in region backups for COB

To achieve continuity of business (COB) you need a business continuity plan (BCP). A crucial part of such a plan is having backups available.

Amazon Web Services (AWS) offers Amazon Elastic Compute Cloud (EC2) instances whose volumes can be backed up in a crash-consistent way into Amazon Simple Storage Service (S3) via Amazon Elastic Block Store (EBS) snapshots.

Here I show how to use the out of the box provided AWS CloudFormation Template for EBS Snapshot Scheduler. No need to deal with AWS CLI, lambda functions, task scheduling, extra dedicated instances, IAM policies and permissions.

From https://s3.amazonaws.com/solutions-reference/ebs-snapshot-scheduler/latest/ebs-snapshot-scheduler.pdf locate the “View Template” button which will download the ebs-snapshot-scheduler.template file (there is also a “download the template”). Rename your template and customize it. Below is an example of relevant fields you might want to change up front:

...
    "Description": "EBS Snapshot Scheduler for Test Environment",
...
        "DefaultRetentionDays": {
            "Description": "Default Retention Period (number of days)",
            "Type": "String",
            "Default": "7"
        },
...
        "CustomTagName": {
            "Description": "Custom Tag Name for testing environment backup",
            "Type": "String",
            "Default": "scheduler:ebs-snapshot:test"
        },
...
        "DefaultTimeZone": {
            "Type": "String",
            "Default": "US/Eastern",
...
        "AutoSnapshotDeletion": {
            "Description": "Enable auto-delete EBS snapshots after retention period.",
            "Type": "String",
            "Default": "Yes",
...
        "HistoryDDBTableName": {
            "Description": "History DynamoDB Table Name for Testing Environment",
            "Type": "String",
            "Default": "Scheduler-EBS-Snapshot-History-Test"
        },
        "PolicyDDBTableName": {
            "Description": "Policy DynamoDB Table Name for Testing Environment",
            "Type": "String",
            "Default": "Scheduler-EBS-Snapshot-Policy-Test"
        },
...

The Launch Solution button/link will allow you to define a new CloudFormation Stack (the collection of resources to be provisioned which are defined in a template file). Alternatively use Console | Services | CloudFormation | Create Stack | Choose a Template. Make sure to check which region you are in on the console UI and, as usual, change it to the correct one.

Select “Upload a template to Amazon S3” and upload the one you created above.

Type a stack name like EBSSnapshotSchedulerTest. If you don’t pick a name the default would be EBSSnapshotScheduler. You can configure from the UI most of the previously mentioned configuration but you might want to do that from the template anyway to keep your configuration files current and correctly versioned.

Press Next all the way through to go with defaults, check “I acknowledge that AWS CloudFormation might create IAM resources.” and hit Create.

Wait until the status of your stack is CREATE_COMPLETE and tag the instances you want to include in the backup, for example the below tag will use defaults:

scheduler:ebs-snapshot:test=true

The below tags will take two daily snapshots, one at midnight and one at noon in the UTC zone, and keep them for 7 years (2555 days):

scheduler:ebs-snapshot:prod:00=0000;2555;utc;all
scheduler:ebs-snapshot:prod:12=1200;2555;utc;all

You can deactivate by removing the tags.
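
The tag values above pack four settings separated by semicolons. A small sketch that splits them apart makes the format easier to read; the field names are my own labels for the four positions, not names used by the scheduler.

```python
from collections import namedtuple

SnapshotPolicy = namedtuple("SnapshotPolicy", "time retention_days timezone days")


def parse_tag_value(value):
    """Split a tag value such as '0000;2555;utc;all' into named fields.

    Field names are this sketch's own labels for the four positions.
    """
    time, retention, tz, days = value.split(";")
    return SnapshotPolicy(time, int(retention), tz, days)


midnight = parse_tag_value("0000;2555;utc;all")
noon = parse_tag_value("1200;2555;utc;all")
print(midnight)
print(noon.retention_days / 365)  # years of retention, ignoring leap days
```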

To get an idea about how much time you saved take a look at your template from the AWS CloudFormation Designer:

That’s probably 24 hours worth of work and troubleshooting. It is worth mentioning that I published a scheduled Lambda that takes care of cross region replication, which demonstrates that we could achieve the same most important goal for in-region backup in a simpler way. Of course you will need to code a bit for that solution. Hopefully the creators of the CloudFormation EBS Snapshot Scheduler will either add replication as a feature or build a new CloudFormation template to take care of replication for the masses.

Adding new backup policies should be straightforward. You just need to focus on customizing a new template and following the steps described above. For example, for a typical production backup with 7 years of retention we can use the below, selecting EBSSnapshotSchedulerProd as the Stack name. Don’t forget to tag the instances you want to include in the backup with “scheduler:ebs-snapshot:prod=true”. To deactivate it use “scheduler:ebs-snapshot:prod=none” or just remove the tag:

...
    "Description": "EBS Snapshot Scheduler for Production Environment",
...
        "DefaultRetentionDays": {
            "Description": "Default Retention Period (number of days)",
            "Type": "String",
            "Default": "2555"
        },
...
        "CustomTagName": {
            "Description": "Custom Tag Name for production environment backup",
            "Type": "String",
            "Default": "scheduler:ebs-snapshot:prod"
        },
...
        "DefaultTimeZone": {
            "Type": "String",
            "Default": "US/Eastern",
...
        "AutoSnapshotDeletion": {
            "Description": "Enable auto-delete EBS snapshots after retention period.",
            "Type": "String",
            "Default": "Yes",
...
        "HistoryDDBTableName": {
            "Description": "History DynamoDB Table Name for Production Environment",
            "Type": "String",
            "Default": "Scheduler-EBS-Snapshot-History-Prod"
        },
        "PolicyDDBTableName": {
            "Description": "Policy DynamoDB Table Name for Production Environment",
            "Type": "String",
            "Default": "Scheduler-EBS-Snapshot-Policy-Prod"
        },
...

The ebs-snapshot-scheduler is open source.

If you need to take multiple backups within a day you will need to retag your instance but you can still use just one CloudFormation stack. I have opened a feature request to avoid adding an extra tag per snapshot creation time leveraging cron expressions.

Thursday, December 22, 2016

Working with Microsoft SQL Server (MSSQL) from Ubuntu Linux Terminal

Change your TDS version below as needed. You can add the $password directly at the end if you know what you are doing.

Wednesday, December 21, 2016

NetSuite tokenPassport Generator

If you need to test the NetSuite SOAP API (SuiteCloud) with tools like SOAPUI you might need a quick way to generate the tokenPassport node:

Friday, December 09, 2016

The lean http to https redirection in IIS

The leanest way to run a secure website from IIS is to serve just one application. If you can live with that, then redirecting HTTP to HTTPS traffic should not be painful:

Add IIS Redirect Module: Server Manager | Roles Summary | Add Roles | Web Server - Common HTTP Features - HTTP Redirection
Redirect http to https: IIS Manager | Select server node | Bindings | Remove port 80 binding | Default Web Site node | Bindings | Set just port 80 binding | HTTP Redirect | Redirect to a specific https:// destination | Status code = Found (302)

I prefer to use the "Found" (302 option) rather than "Permanent" (301 option) just in case I want to change my domains in the future.

Simplicity rules.

Wednesday, November 23, 2016

lxde logout button not working

sudo apt-get update
sudo apt-get install lxsession-logout

Monday, November 21, 2016

What is my IP from *nix

Just for my own reminder:

dig +short myip.opendns.com @resolver1.opendns.com

Friday, October 21, 2016

Test your SSL certificate before deploying into Apache

Put key, cert and CA in a directory. Confirm the below outputs no errors and ends in ‘ACCEPT’:

site=sample.com
cacert=GeoTrust_CA_Bundle.crt
openssl s_server -accept 9090 -www \
  -cert ${site}.crt -key ${site}.key \
  -CAfile $cacert

Change User Agent in Chrome 53

Right click | Inspect | Vertical three dots menu | More Tools | Network Conditions:

Thanks to Williams Medina for pointing this out.

Saturday, August 27, 2016

Stop Apache from rendering icons README file

If you try to access /icons/README or /icons/README.html from your domain and you get a page with content, then you know Apache is running on the server side and exposing some default content. How bad is that? Probably not a big deal, but its removal is mandatory when it comes to hardening Apache for most situations. Better not to give the public any chance to access a resource that should not be exposed. You can correct this with a simple POB recipe:

Thursday, August 25, 2016

No Physical Windows Login Screen after accessing it via RDP?

Use the Windows logo key + P keyboard shortcut, which is intended to extend the desktop to multiple monitors. Why you need to do that and not just hit a key or move your mouse is an excellent question. To me, it is a bug.

Monday, August 22, 2016

Show version number in jenkins job history

Everybody handles versioning in a different way, but if your Jenkins job sends the version to the output stream you can use the below steps to show that version in the build history. This is especially handy for test jobs, as you want to make sure that tests did pass for that specific version.

Let us assume that your job prints something like "Front-End Version: 1.2509". Here is how you print the version number in the Jenkins job history:

Install "Groovy Postbuild" plugin.
Add a post build action type "Groovy postbuild"

Insert the below in "Groovy Script" and save

def m = manager.getLogMatcher("^.*Front-End Version: (.*)\$")
// guard against a missing match before reading the group
if (m != null && m.matches()) {
  def version = m.group(1)
  // to debug using job logs
  // manager.listener.logger.println("found version:" + version)
  manager.addShortText(version)
}

Run the job and you will get the version number next to the build.
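
If you want to try the pattern against a sample of your console output before pasting it into Jenkins, a quick check in Python works well enough, since this particular pattern behaves the same in both regex flavors. This is only a sketch; the sample log is made up apart from the version line.

```python
import re

# Same pattern as in the Groovy script, minus the Groovy-specific \$ escape.
PATTERN = re.compile(r"^.*Front-End Version: (.*)$")


def find_version(console_log):
    """Return the version from the first matching log line, or None."""
    for line in console_log.splitlines():
        m = PATTERN.match(line)
        if m:
            return m.group(1)
    return None


sample_log = "Starting tests\nFront-End Version: 1.2509\nAll specs passed\n"
print(find_version(sample_log))
```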

Wednesday, August 03, 2016

Speeding up end to end (e2e) tests

How fast can you end to end test your application? It depends on how fast your application is.

You should parallelize your tests, which of course must be idempotent. This means your tests should not step on each other's toes. With enough parallelism, the whole run should take as long as the longest scenario you are trying to assert.

There are several ways to parallelize the tests. For example, for web development with protractor you can use {capabilities: {shardTestFiles: true, maxInstances: n}}. You should not use for n more than the number of processors minus 1. You can use tools like Jenkins or custom scripts that spawn several tests at the same time, but you will always face the limitations of the hardware used in each node. However you are clustering, right? So why bother about the hardware for each node? Here are some numbers from a test run in a 4 processor VM using protractor and direct web driver configured to use 1, 2 and 4 maxInstances:

1 instance: 87 sec
Wed Aug  3 09:53:25 EDT 2016
Wed Aug  3 09:54:52 EDT 2016

2 instances: 133 sec
Wed Aug  3 09:59:41 EDT 2016
Wed Aug  3 10:01:54 EDT 2016

4 instances: 117 sec
Wed Aug  3 10:10:06 EDT 2016
Wed Aug  3 10:12:03 EDT 2016
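
The elapsed time of each run can be recomputed from the pairs of date lines above. A small sketch, assuming both timestamps of a run fall on the same day:

```python
from datetime import datetime


def elapsed_seconds(start, end):
    """Seconds between two `date` lines taken on the same day."""
    fmt = "%H:%M:%S"
    # The fourth whitespace-separated field of the date output is the time.
    t0 = datetime.strptime(start.split()[3], fmt)
    t1 = datetime.strptime(end.split()[3], fmt)
    return int((t1 - t0).total_seconds())


runs = {
    1: ("Wed Aug  3 09:53:25 EDT 2016", "Wed Aug  3 09:54:52 EDT 2016"),
    2: ("Wed Aug  3 09:59:41 EDT 2016", "Wed Aug  3 10:01:54 EDT 2016"),
    4: ("Wed Aug  3 10:10:06 EDT 2016", "Wed Aug  3 10:12:03 EDT 2016"),
}
for instances, (start, end) in runs.items():
    print(instances, elapsed_seconds(start, end))
```

According to the timestamps the runs took 87, 133 and 117 seconds, so on this 4 processor VM adding instances did not make the run any faster.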

No big savings, really. We know better. Imposing a limit on how long each test should take should be your premise. From there you act: build tests that include just enough scenarios so that they do not go beyond your acceptable time to run all e2e tests for your application, isolate them from each other, distribute the load. Divide and conquer.

I had an argument with somebody (I believe on Stack Overflow) over the efficiency of running browsers in the background (so-called headless mode) versus in the foreground. When it comes to development, you should run them in the background most of the time so that your screen does not distract you from other tasks you are currently performing. When it comes to live testing it does not matter. There are no performance gains whatsoever in running your tests in the foreground or in the background. Of course if you run in headless mode you can leverage servers instead of desktops, but it turns out that desktops allow you to easily debug what is going on in case you really need to: just RDP into the machine and you see what is happening there. We must remember that we are testing the user interface/experience because that is how any program in the world ultimately functions: top-down.

Trying headless mode is straightforward. Here is how to do it in Debian/Ubuntu, which I shared on Stack Overflow:

curl -k https://gist.githubusercontent.com/terrancesnyder/995250/raw/cdd1f52353bb614a5a016c2e8e77a2afb718f3c3/ephemeral-x.sh -o ~/ephemeral-x.sh
chmod +x ~/ephemeral-x.sh
date; ~/ephemeral-x.sh protractor … ; date

Here are my numbers for one instance running in headless mode. Compare the previous GUI based result with the headless one. Having the real browser up and running ended up with a slightly better result, but I would not conclude from this that it is better. I would need to repeat this experiment several times before making such a statement and I really do not have the time for it, but perhaps you do and want to report back your findings:

1 instance headless mode: 93 sec

Wed Aug  3 10:35:41 EDT 2016
Wed Aug  3 10:37:14 EDT 2016

Bottom line: stop trying to resolve "test inefficiencies" or departing from UAT because "e2e are slow". Face the fact that perception is reality and take care of the performance of your application, make it fast. The application code is not just the runtime code, it is also the test code. All code should be as efficient as it can possibly be. All user interactions included in a particular scenario or group of scenarios should be wrapped in a unique test, but only if those meet your test time objective (TTO). If the TTO = 1 minute then no test should take more than 1 minute. Then, spend money on a cluster of machines to perform the tests in parallel and get everything tested in less than a minute. These machines can be spawned on demand of course, but then account for the startup time as part of your TTO. Humans should be driven by goals and not by tools' deficiencies.
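
To put a number on that cluster, here is a sketch that estimates how many machines are needed so that every machine finishes within the TTO. It uses a simple first-fit-decreasing packing, which is just one reasonable heuristic, and the test durations are hypothetical.

```python
def machines_needed(test_seconds, tto_seconds):
    """First-fit-decreasing packing of tests onto machines.

    Each machine runs its tests one after another and must finish within
    tto_seconds. Returns one list of durations per machine.
    """
    if any(t > tto_seconds for t in test_seconds):
        raise ValueError("a single test already exceeds the TTO; split it")
    machines = []
    for duration in sorted(test_seconds, reverse=True):
        for load in machines:
            if sum(load) + duration <= tto_seconds:
                load.append(duration)
                break
        else:
            # No existing machine has room left, so add one more.
            machines.append([duration])
    return machines


# Hypothetical per-test durations in seconds, with TTO = 60 seconds.
plan = machines_needed([45, 30, 30, 20, 15, 10, 50], 60)
print(len(plan), plan)
```

With these made-up durations four machines are enough. Startup time of on-demand machines is not modeled here; subtract it from the TTO before packing.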

Monday, June 13, 2016

#NoProjects are needed for software development - Prioritize MMF based on multi-dimensional analysis (value, risk and cost of delay)

#NoProjects are needed for software development. Prioritize MMF based on multi-dimensional analysis (value, risk and cost of delay) instead.

I was tempted to share why I think that Projects in software development kill value rather than bring value but before posting I did my homework and I found that somebody wrote about it before.

Prioritization should be about selecting the minimal marketable features (MMF) that bring the biggest value, minimize the cost of delay and/or are bigger risk reducers (what I call multi-dimensional analysis). It should be done continuously just as software should be shipped. Accepting a big effort as more profitable than focusing on MMF delivery is a mistake. The devil is *always* in the details.

Saturday, June 04, 2016

AH00051: child pid ? exit signal Segmentation fault (11), possible coredump in ?

I ran into the below segmentation fault in Apache, accompanied by high CPU usage on the server:

[Fri Jun 03 06:00:01.270303 2016] [core:notice] [pid 29628:tid 140412199950208] AH00051: child pid 2343 exit signal Segmentation fault (11), possible coredump in /tmp/apache-coredumps

After debugging the problem, I concluded that mod_proxy was the culprit. I decided not to report it because it was hard to replicate (it happened only late at night) and there was a workaround I could use. The error happened on a first request for a resource that the proxied worker could not serve: Apache tried to use a specific document for the error, but a configuration issue made it try to pull that document from the worker as well. Since the document is not on the remote server, mod_proxy enters an infinite loop that ends in an Apache crash, most likely a recursion related bug. Resolving the configuration issue so that the document is served from the local Apache avoids the crash, although it also hides this potential mod_proxy bug.

[Fri Jun 03 05:56:40.831871 2016] [proxy:error] [pid 2343:tid 140411834115840] (502)Unknown error 502: [client 192.168.0.180:58819] AH01084: pass request body failed to 172.16.2.2:8443 (sample.com)
[Fri Jun 03 05:56:40.831946 2016] [proxy:error] [pid 2343:tid 140411834115840] [client 192.168.0.180:58819] AH00898: Error during SSL Handshake with remote server returned by /sdk
[Fri Jun 03 05:56:40.831953 2016] [proxy_http:error] [pid 2343:tid 140411834115840] [client 192.168.0.180:58819] AH01097: pass request body failed to 172.16.2.2:8443 (sample.com) from 192.168.0.180 ()
[Fri Jun 03 05:56:40.844138 2016] [proxy:error] [pid 2343:tid 140411834115840] (502)Unknown error 502: [client 192.168.0.180:58819] AH01084: pass request body failed to 172.16.2.2:8443 (sample.com)
[Fri Jun 03 05:56:40.844177 2016] [proxy:error] [pid 2343:tid 140411834115840] [client 192.168.0.180:58819] AH00898: Error during SSL Handshake with remote server returned by /html/error/503.html
[Fri Jun 03 05:56:40.844185 2016] [proxy_http:error] [pid 2343:tid 140411834115840] [client 192.168.0.180:58819] AH01097: pass request body failed to 172.16.2.2:8443 (sample.com) from 192.168.0.180 ()

Why do I get a Linux SIGSEGV / segfault / Segmentation fault ?

To respond to this question you will need to get a coredump and use the GNU debugger (gdb).
The application generating the segmentation fault must be configured to produce a coredump or should be manually run with gdb in case a segmentation fault can be manually replicated.
Let us pick Apache to go through the debugging steps with a real world example. In order to produce a coredump apache2 needs to be configured:

$ vi /etc/apache2/apache2.conf
...
CoreDumpDirectory /tmp/apache-coredumps
...

Also the OS must not impose limits on the size of the core file in case we don't know how big it would be:

$ sudo bash
# ulimit -c unlimited

In addition, the core dump directory must exist and must be owned by the apache user (www-data in this case):

sudo mkdir /tmp/apache-coredumps
sudo chown www-data:www-data /tmp/apache-coredumps

Make sure to stop and start apache separately:

$ sudo apachectl stop
$ sudo apachectl start

Look into running processes:

$ ps -ef|grep apache2

Confirm that the processes are running with "Max core file size" unlimited:

$ cat /proc/${pid}/limits
...
Max core file size        unlimited            unlimited            bytes
...

Here is a quick way to do it with a one liner. It lists the max core file size for the parent apache2 process and all its children:

$ ps -ef | grep 'sbin/apache2' | grep -v grep | awk '{print $2}' | while read -r pid; do cat /proc/$pid/limits | grep core ; done
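
If you prefer to do the same check from a script, the limits file is easy to parse. A minimal sketch in Python; the sample text mimics the layout shown above, and on a real server you would read /proc/<pid>/limits instead.

```python
import re


def core_file_limits(limits_text):
    """Return (soft, hard) for 'Max core file size' from /proc/<pid>/limits text."""
    for line in limits_text.splitlines():
        if line.startswith("Max core file size"):
            # Columns are separated by runs of two or more spaces.
            fields = re.split(r"\s{2,}", line.strip())
            return fields[1], fields[2]
    return None


sample = (
    "Limit                     Soft Limit           Hard Limit           Units\n"
    "Max core file size        unlimited            unlimited            bytes\n"
)
print(core_file_limits(sample))
```

For a live process that would be something like core_file_limits(open("/proc/%d/limits" % pid).read()).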

To test that your configuration works just force a coredump. This can be achieved by sending a SIGABRT signal to the process:

$ kill -SIGABRT ${pid}

Analyze the coredump file with gdb. In this case it confirms that the SIGABRT signal was used:

$ gdb core  /tmp/apache-coredumps/core
...
Core was generated by `/usr/sbin/apache2 -k start'.
Program terminated with signal SIGABRT, Aborted.
#0  0x00007f1bc74073bd in ?? ()
...

Leave apache running and when it fails with a segmentation fault you can confirm the reason:

$ gdb core  /tmp/apache-coredumps/core
...
Core was generated by `/usr/sbin/apache2 -k start'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007fc4d7ef6a11 in ?? ()
...

Then explore deeper to find out what the problem really is. Note that we now pass the apache2 binary to gdb to get deeper information out of the coredump:

$ gdb /usr/sbin/apache2 /tmp/apache-coredumps/core
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007fc4d7ef6a11 in apr_brigade_cleanup (data=0x7fc4d8760e68) at /build/buildd/apr-util-1.5.3/buckets/apr_brigade.c:44
44 /build/buildd/apr-util-1.5.3/buckets/apr_brigade.c: No such file or directory.

Using the bt command from the gdb prompt gives us more:

Core was generated by `/usr/sbin/apache2 -k start'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007fc4d7ef6a11 in apr_brigade_cleanup (data=0x7fc4d8760e68) at /build/buildd/apr-util-1.5.3/buckets/apr_brigade.c:44
44 /build/buildd/apr-util-1.5.3/buckets/apr_brigade.c: No such file or directory.
(gdb) bt
#0  0x00007fc4d7ef6a11 in apr_brigade_cleanup (data=0x7fc4d8760e68) at /build/buildd/apr-util-1.5.3/buckets/apr_brigade.c:44
#1  0x00007fc4d7cd79ce in run_cleanups (cref=) at /build/buildd/apr-1.5.0/memory/unix/apr_pools.c:2352
#2  apr_pool_destroy (pool=0x7fc4d875f028) at /build/buildd/apr-1.5.0/memory/unix/apr_pools.c:814
#3  0x00007fc4d357de00 in ssl_io_filter_output (f=0x7fc4d876a8e0, bb=0x7fc4d8767ab8) at ssl_engine_io.c:1659
#4  0x00007fc4d357abaa in ssl_io_filter_coalesce (f=0x7fc4d876a8b8, bb=0x7fc4d8767ab8) at ssl_engine_io.c:1558
#5  0x00007fc4d87f1d2d in ap_process_request_after_handler (r=r@entry=0x7fc4d875f0a0) at http_request.c:256
#6  0x00007fc4d87f262a in ap_process_async_request (r=r@entry=0x7fc4d875f0a0) at http_request.c:353
#7  0x00007fc4d87ef500 in ap_process_http_async_connection (c=0x7fc4d876a330) at http_core.c:143
#8  ap_process_http_connection (c=0x7fc4d876a330) at http_core.c:228
#9  0x00007fc4d87e6220 in ap_run_process_connection (c=0x7fc4d876a330) at connection.c:41
#10 0x00007fc4d47fe81b in process_socket (my_thread_num=19, my_child_num=, cs=0x7fc4d876a2b8, sock=, p=, thd=)
    at event.c:970
#11 worker_thread (thd=, dummy=) at event.c:1815
#12 0x00007fc4d7aa7184 in start_thread (arg=0x7fc4c3fef700) at pthread_create.c:312
#13 0x00007fc4d77d437d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

I was misled by the messages above until I started changing Apache configurations and looking again into the generated coredumps. I realized the crash could show up at any line of any source file. (The "No such file or directory" lines are gdb reporting that the source files are not installed on this machine, not the reason for the crash.)

Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/sbin/apache2 -k start'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007fb44328d992 in ap_save_brigade (f=f@entry=0x7fb4432e5148, saveto=saveto@entry=0x7fb4432e59c8, b=b@entry=0x7fb435ffa568, 
    p=0x7fb443229028) at util_filter.c:648
648 util_filter.c: No such file or directory.

By disabling modules I was able to get to the bottom of it: an apparent bug in mod_proxy when a misconfiguration causes recursive pulling of unavailable resources from the proxied target. Once the issue is found, comment out or remove the coredump configuration from apache:

$ sudo vi /etc/apache2/apache2.conf
...
# CoreDumpDirectory /tmp/apache-coredumps
...
$ sudo apachectl graceful

Increasing swap size in Ubuntu Linux

Here is how to depart with parted (no pun intended) from a swap partition to a swap file in Ubuntu Linux. When you need more swap, it is way easier to increase the size of a swap file than the size of a partition. Starting with the 2.6 Linux kernel, swap files are just as fast as swap partitions.

Remove the current swap partition:

$ sudo parted
…
(parted) print                                                            
…
Number  Start   End     Size    Type      File system     Flags
 1      1049kB  40.8GB  40.8GB  primary   ext4            boot
 2      40.8GB  42.9GB  2145MB  extended
 5      40.8GB  42.9GB  2145MB  logical   linux-swap(v1)
…
(parted) rm 2                                                             
(parted) print                                                            
…
Number  Start   End     Size    Type     File system  Flags
 1      1049kB  40.8GB  40.8GB  primary  ext4         boot
…

Delete (or comment out) the old swap entry from /etc/fstab:

# swap was on /dev/sda5 during installation
# UUID=544f0d91-d3db-4301-8d1b-f6bfb2fdee5b none            swap    sw              0       0

Disable all swapping devices:

$ sudo swapoff -a

Create a swap file with the correct permissions; for example, the below creates a 4 GB one:

$ sudo fallocate -l 4G /swapfile
$ sudo chmod 600 /swapfile

Set up the Linux swap area:

$ sudo mkswap /swapfile

Enable the swap device:

$ sudo swapon /swapfile

Confirm that it was created:

$ sudo swapon -s
…
Filename    Type  Size       Used Priority
/swapfile            file  4194300   0     -1
…

Add the entry to /etc/fstab:

/swapfile       none    swap    sw      0       0

Restart:

$ sudo reboot now

Confirm that the swap is active:

$ free
             total       used       free     shared    buffers     cached
Mem:       4048220    1651032    2397188        536      66648     336164
-/+ buffers/cache:    1248220    2800000
Swap:      4194300          0    4194300

$ sudo swapon -s
…
Filename    Type  Size       Used Priority
/swapfile            file  4194300   0     -1
…
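If you would rather check this from a script than by eye, the `swapon -s` output can be parsed. Below is a minimal Python sketch, assuming the five-column layout shown above and file names without spaces; `parse_swaps` is a name I made up for illustration:

```python
def parse_swaps(text):
    """Parse `swapon -s` style output into a list of dicts (sizes in KB)."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines, the header row and anything that is not a data row.
        if len(fields) != 5 or fields[0] == "Filename":
            continue
        name, kind, size_kb, used_kb, priority = fields
        entries.append({
            "name": name,
            "type": kind,
            "size_kb": int(size_kb),
            "used_kb": int(used_kb),
            "priority": int(priority),
        })
    return entries


# Sample copied from the output above.
sample = """Filename    Type  Size       Used Priority
/swapfile            file  4194300   0     -1
"""
swaps = parse_swaps(sample)
print(swaps[0]["name"], swaps[0]["size_kb"])
```

In real use you would feed it the captured output of `swapon -s` instead of the hard-coded sample.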

Wednesday, May 25, 2016

From subversion to git with the help of gitlab

Here are the steps I recently followed to migrate a subversion repo into git:

Thursday, May 19, 2016

Filter output of Linux top command

To filter by process name:

top -cb | grep processName

Or, if you want to interact with top, start it using 'top -c', then press "o" and enter "COMMAND=processName" as the filter.

Thursday, April 21, 2016

Streaming and saving your presentation video

I need to write these steps down so often that I would rather have them handy:

Go to https://www.youtube.com/
Click on menu | mychannel
Click on video Manager
Click on Live Streaming | Events
Click on New Live event
Title: ${Name of presentation} ${part number}
Select Option "Unlisted" if you want it to be available to only certain people
Click on "go live now"
Share screen and start broadcast
When done stop broadcast
From the video manager get the link and share it with the relevant people

Thursday, April 14, 2016

Upgrade Talend sqlite jdbc library

Talend is still ages behind with the outdated sqlite jar file it uses (sqlitejdbc-v056.jar). While we wait for a fix, here is a workaround (I am using version 6.0.0 here, but I am sure you will easily figure out what to do for other versions):

Download latest sqlite-jdbc jar from https://github.com/xerial/sqlite-jdbc/releases
Copy the jar (for example sqlite-jdbc-3.8.11.2.jar) as ~/.m2/repository/org/talend/libraries/sqlitejdbc-v056/6.0.0-SNAPSHOT/sqlitejdbc-v056.jar. Note that you must keep the same old name
Copy the jar as /opt/TOS_DI-20150702_1326-V6.0.0/plugins/org.talend.libraries.jdbc.sqlite3_6.0.0.20150702_1326/lib/sqlitejdbc-v056.jar. Note again that you must keep the same old name. I assume your installation is done in /opt, if not follow the path convention and I am sure you will figure it out.

Sunday, April 10, 2016

Apache proxy to tomcat - Error during SSL Handshake with remote server (AH00898), pass request body failed (AH01097)

An error like the below means that certificates in the proxy server and the target server are not the same or are expired:

[Sun Apr 10 08:13:51.513836 2016] [proxy:error] [pid 32426:tid 140087715120896] [client 192.168.0.5:34425] AH00898: Error during SSL Handshake with remote server returned by /some/path
[Sun Apr 10 08:13:51.513848 2016] [proxy_http:error] [pid 32426:tid 140087715120896] [client 192.168.0.5:34425] AH01097: pass request body failed to 192.168.0.5:8443 (sample.com) from 192.168.0.5 ()

To understand exactly what is going on increase log level temporarily:

LogLevel info proxy:trace5

This will explain what is going on, for example:

[Sun Apr 10 11:45:30.708783 2016] [ssl:info] [pid 26391:tid 140560622925568] [remote 192.168.0.5:8443] AH02004: SSL Proxy: Peer certificate is expired

A one-liner will reveal why. The cert below just expired:

$ echo | openssl s_client -connect 192.168.0.5:8443 2>/dev/null | openssl x509 -noout -dates | grep 'notAfter=.*GMT'
notAfter=Apr 10 12:13:04 2016 GMT
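The notAfter value printed above can also be turned into "time left" programmatically. This is a small Python sketch, not part of my original procedure; it relies on the standard library's `ssl.cert_time_to_seconds`, and the helper name `seconds_until_expiry` is hypothetical:

```python
import ssl
from datetime import datetime, timezone


def seconds_until_expiry(not_after, now=None):
    """Seconds left before a certificate's notAfter date (negative if expired)."""
    if now is None:
        now = datetime.now(timezone.utc)
    # ssl.cert_time_to_seconds parses the "Mon DD HH:MM:SS YYYY GMT" format.
    return ssl.cert_time_to_seconds(not_after) - now.timestamp()


# Date string taken from the openssl output above.
remaining = seconds_until_expiry("Apr 10 12:13:04 2016 GMT")
if remaining < 0:
    print("certificate expired")
else:
    print("valid for %.1f more days" % (remaining / 86400))
```

Running something like this from a scheduled job would give you a warning before the proxy starts failing handshakes.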

To avoid cluster node certificates expiring at a different time than those on the proxy server, use the same certificate and key for all! To confirm they are indeed the same, compare their hashes:

$ md5sum /opt/tomcat/certs/my.crt
$ md5sum /opt/tomcat/certs/my.key
$ md5sum /etc/apache2/certs/my.crt 
$ md5sum /etc/apache2/certs/my.key
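The same comparison can be scripted so that nobody has to eyeball hashes. A minimal Python sketch follows; note that it uses SHA-256 from `hashlib` rather than md5sum, and that `file_digest` and `all_identical` are illustrative names, not anything from Tomcat or Apache:

```python
import hashlib


def file_digest(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def all_identical(paths):
    """True when every file in `paths` has the same content hash."""
    return len({file_digest(path) for path in paths}) <= 1


# Example call (paths as used above; it needs read access to both files):
# all_identical(["/opt/tomcat/certs/my.crt", "/etc/apache2/certs/my.crt"])
```

Compare the .crt files with each other and the .key files with each other; a certificate and its key are different files and will never hash the same.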

Saturday, April 02, 2016

What are the most important Key Performance Indicators (KPI) to be measured?

One of the most commonly asked questions about Project Management is "what are the most important Key Performance Indicators (KPI) I should choose?" The answer as usual is: it depends.
However, assuming that you are building an identified minimal viable product (MVP), for which you will be continuously delivering minimal marketable features (MMF) to create or maintain a minimal marketable product (MMP) or a minimal marketable service (MMS), here is what I think the starting point should be:

Defect Ratio: Because without quality there is no sustainable productivity. To measure it, calculate what percentage of the total MMF are defects, both for total open MMF and for MMF closed per period, to understand the quality of your service or product.
Work In Progress (WIP): Because context switching works against human productivity. To measure it, inspect the cumulative flow diagram (CFD) for increased MMF WIP slope in relationship to closed MMF slope. To be precise, divide the gradients for a ratio. Both slopes should be the same to ensure optimal WIP limits are in place. That means the ratio should be 1. Forcing the Personal WIP limit to 1 is ideal, beyond 3 is chaotic.
Delivery Frequency: Because to avoid entropy you need to deliver as soon as it is done without piling up. To measure it, inspect the CFD for changes in closed-issues slope. The slope should be either constant or increasing, never decreasing. Deployments per period could give you a good measure of how often the company delivers.
Demand versus throughput balance: Because delivery is done through a funnel where the mouth and stem sizes signal the need for adjustment of one of the three iron triangle measures: resources, scope and schedule. To measure it, calculate the open issues divided by target issues. This ratio should be maintained or decreasing, never increasing.
Prioritization time: Because finding out what the customer wants is crucial but time consuming, any time overspent on this is pure waste. Prioritize based on available slots from the already prioritized backlog queue to keep a small weekly scope-framed value-cost-of-delay-risk multi-var-dimensional analysis. Measure it by assigning to each prioritized MMF the time it took to arrive at its choice. Make the prioritization choice based on the results from a radar chart using relevant variables as axes and deciding what is most important based on the resulting taxonomies.
Variability: Because predictability is the ultimate goal for continuous delivery of MMF (without continuous delivery the cost of delay and risk goes up). Measure it by looking at the control chart for cycle and lead time. The objective is to narrow the standard deviation.
Throughput: Because it is what the business unit can offer in terms of resources and capabilities. Calculate it using the number of MMF completed. Its result is a quantitative measurement of the output.
Productivity: Because the objective is to deliver products or services that are effective (needed) and efficient (optimal use of resources). Calculate effectiveness as the ratio between current hours spent in the value stream and total hours paid. Calculate efficiency as the ratio between expected hours per MMF and your current hours per MMF. The team is more productive when both effectiveness and efficiency are high.
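Most of the ratios described above are simple divisions, so they are easy to automate once you export the counts from your tracker. Here is a rough Python sketch of the arithmetic; the function and field names are mine, and every number in the example is invented purely for illustration:

```python
import statistics


def ratio(numerator, denominator):
    """Plain ratio; returns None when the denominator is zero."""
    return numerator / denominator if denominator else None


def kpis(defects, total_mmf, wip_slope, closed_slope, open_issues,
         target_issues, value_stream_hours, paid_hours,
         expected_hours_per_mmf, actual_hours_per_mmf, cycle_times_days):
    return {
        "defect_ratio": ratio(defects, total_mmf),
        "wip_ratio": ratio(wip_slope, closed_slope),  # ideally 1
        "demand_vs_throughput": ratio(open_issues, target_issues),
        "effectiveness": ratio(value_stream_hours, paid_hours),
        "efficiency": ratio(expected_hours_per_mmf, actual_hours_per_mmf),
        # Variability: population standard deviation of the cycle time.
        "cycle_time_stdev": statistics.pstdev(cycle_times_days),
    }


# Invented sample month, not real data.
report = kpis(defects=5, total_mmf=50, wip_slope=2.0, closed_slope=2.0,
              open_issues=30, target_issues=40, value_stream_hours=120,
              paid_hours=160, expected_hours_per_mmf=8,
              actual_hours_per_mmf=10, cycle_times_days=[2, 4, 4, 4, 5, 5, 7, 9])
for name, value in report.items():
    print(name, value)
```

The slopes still have to be read off the CFD, and prioritization time has to be logged per MMF; the sketch only covers the final calculations.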

I can't see how a modern organization can achieve sustainable growth and a high capability maturity without using these essential metrics. To me, they are a must to reach operational effectiveness. What is left is Industry comparison. If you have numbers for the below, feel free to share them confidentially with me (nestor.urquiza at gmail). I would like to build Industry standards out of this survey:

Main Industry: ?
Defect Ratio: ?
Personal WIP: ?
Deployments per month: ?
Total MMF open at the end of the month / Total MMF closed per month: ?
Average prioritization time per MMF: ?
MMF cycle time standard deviation: ?
MMF lead time standard deviation: ?
MMF closed per month: ?
Hours spent in value stream / Hours paid: ?
Expected hours per MMF: ?
Current average hours per MMF: ?

Acknowledgement: I wouldn't have dared to think profoundly about metrics without entering the world of lean thinking. I entered this world after reading some W. Edwards Deming literature, getting introduced to TPS, Toyota Kanban and finally studying the blue book from David J. Anderson (a real revelation and source of constant reference for me).

Wednesday, March 30, 2016

AngularJS EvalError: Refused to evaluate a string as JavaScript because 'unsafe-eval'

Judging from the Angular documentation for ngCsp, it looks like using the directive alone is not enough to avoid errors if you use inline scripts. Even after following the guidelines we were still randomly getting the errors below, which caused Angular to stop working and consequently left the content blank. It was random because inline JavaScript was showing up randomly in the page:

angular.js:13424 EvalError: Refused to evaluate a string as JavaScript because 'unsafe-eval' is not an allowed source of script in the following Content Security Policy directive: "default-src 'self' 'unsafe-inline'". at Function.jQuery.extend.globalEval (jquery.js:343) at domManip (jquery.js:5290) at jQuery.fn.extend.after (jquery.js:5456) at domInsert (angular.js:5189) at Object.$provide.$get.$$animateQueue.enter (angular.js:5352) at angular.js:25332 at $$sanitizeUriProvider.$get.node (angular.js:8212) at angular.js:8551 at boundTranscludeFn (angular.js:8350) at controllersBoundTransclude (angular.js:9072)

Inspecting the jquery code it is clear that eval is used:

globalEval: function( code ) {
  var script,
   indirect = eval;

  code = jQuery.trim( code );

  if ( code ) {

   // If the code includes a valid, prologue position
   // strict mode pragma, execute code by injecting a
   // script tag into the document.
   if ( code.indexOf( "use strict" ) === 1 ) {
    script = document.createElement( "script" );
    script.text = code;
    document.head.appendChild( script ).parentNode.removeChild( script );
   } else {

    // Otherwise, avoid the DOM node creation, insertion
    // and removal by using an indirect global eval

    indirect( code );
   }
  }
 }

When angular finds inline scripts it calls the jQuery after() which triggers the globalEval()

      afterElement ? afterElement.after(element) : parentElement.prepend(element);

Clearly a quick way to identify the culprit is just looking into the html content for embedded scripts. In our case Google mod-pagespeed for apache inserted an embedded script tag as shown below:

If the scripts are below an angular directive as in the below example then you will get the error:

<!DOCTYPE html>
<html lang="en" ng-app  ng-csp>
  <head>
    <script type="text/javascript" charset="utf-8" src="jquery.js"></script>
    <script type="text/javascript" charset="utf-8" src="angular.js"></script>
  </head>
  <body>
    <div ng-if="true">
      <script>console.log('foo');</script>
    </div>
  </body>
</html>

If you can live without it then you can remove the script like we did with mod-pagespeed. If not, you will need to include your javascript code from an external source file or avoid loading jquery before angular which will limit what you can do with for example angular.element.

I think Angular should probably just detect whether ng-csp is being used and avoid executing inline scripts in that case. It could also come up with a non-eval way to run inline scripts if there is still a need for them, just as it does with "Angular's built-in subset of jQuery, called 'jQuery lite' or jqLite". At a minimum the documentation should state that currently the ng-csp directive will not work if there is inline JavaScript inside an ng-if directive. This is a problem even if the server sends unsafe-inline, as in:

 Content-Security-Policy: default-src 'self' 'unsafe-inline'

I submitted this issue as a potential bug for consideration.
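To see why 'unsafe-inline' does not help here, remember that eval() is governed by the separate 'unsafe-eval' keyword, looked up in script-src or, if that is absent, in default-src. The Python sketch below is a deliberately simplified reading of that rule (it ignores everything else in the CSP specification), and the helper names are hypothetical:

```python
def parse_csp(policy):
    """Split a Content-Security-Policy value into {directive: [sources]}."""
    directives = {}
    for part in policy.split(";"):
        tokens = part.split()
        if tokens:
            # The first occurrence of a directive wins; duplicates are ignored.
            directives.setdefault(tokens[0].lower(), [t.lower() for t in tokens[1:]])
    return directives


def allows_eval(policy):
    """True if the policy lets scripts call eval()."""
    directives = parse_csp(policy)
    # script-src falls back to default-src when it is not present.
    sources = directives.get("script-src", directives.get("default-src"))
    if sources is None:
        return True  # no applicable directive, so nothing restricts eval
    return "'unsafe-eval'" in sources


print(allows_eval("default-src 'self' 'unsafe-inline'"))
print(allows_eval("default-src 'self' 'unsafe-inline' 'unsafe-eval'"))
```

With the policy from the error message the check fails, which matches what the browser reports when jQuery's globalEval() is reached.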

Disabling impossible. Removing/Uninstalling mod-pagespeed

The below alone will disable Google mod-pagespeed apache module only temporarily:

sudo a2dismod pagespeed
sudo apachectl configtest
sudo apachectl graceful

For some reason we have seen it re-enabled some time after running the above. You will need to actually uninstall it. Here is how, if you installed it from the Debian package in Ubuntu:

sudo sh -c \
  'dpkg -r mod-pagespeed-stable \
  && rm -f /etc/apache2/mods-available/pagespeed* \
  && service apache2 restart'

Thursday, March 17, 2016

svn: E155021: This client is too old to work with the working copy at - Finding the executable path in MAC OS X

Got this error on Mac OS X even though you believe you have the right version available?

svn: E155021: This client is too old to work with the working copy at

First look at how many svn executables are in your PATH:

$ which -a svn
/usr/local/bin/svn
/usr/bin/svn
/usr/local/bin/svn

Run each of them with the --version flag and compare the output. You might notice that the default which output is not the same as the default command output:

$ command -v svn
/usr/bin/svn
$ which svn
/usr/local/bin/svn

You might be tempted to update some symlinks or your PATH variable, but before doing so try from a new console; it might just be that you are using a console that was opened before you installed the latest version.
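If you want the same "show me every copy" view from a script, walking the PATH yourself is enough. A minimal Python sketch, assuming a POSIX-style system; `find_executables` is a made-up helper that mimics `which -a`:

```python
import os


def find_executables(name, path=None):
    """Return every executable called `name` on the search path, in PATH order."""
    if path is None:
        path = os.environ.get("PATH", "")
    matches = []
    for directory in path.split(os.pathsep):
        # An empty PATH entry means the current directory.
        candidate = os.path.join(directory or ".", name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            matches.append(candidate)
    return matches


for executable in find_executables("svn"):
    print(executable)
```

The first entry printed is the one a freshly started shell would normally run, which makes stale consoles easy to spot.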

Thursday, March 03, 2016

Web Application Firewall in Ubuntu with Apache and ModSecurity

Here is a recipe to install and configure ModSecurity (mod_security) tested in Ubuntu 14.04 Apache.

Note that besides copying setup files we edit inline modsecurity.conf to make sure SecRuleEngine is set to On instead of DetectionOnly (switch between them to activate the rules or just get logging information) and to make sure SecAuditLogRelevantStatus is set to "^$" instead of "^(?:5|4(?!04))" (switch between them to get log entries when the application returns 4xx or 5xx status codes or not log them at all)

This recipe will activate sql and command injection protection rules. There are several other core rules you can add just by copying them to /etc/modsecurity and restarting the server after. There are base_rules, experimental_rules and optional_rules distributed in the ModSecurity project.

To test the effectiveness of SQL injection protection, do not activate the rule (remove the specific crs file from the /etc/modsecurity directory), restart the server and try the below request. Apache will pass the request to your application as usual:

https://sample.com/foo?bar=%27%20or%20true%20--
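The query string above is just the classic `' or true --` probe, percent-encoded. If you want to build similar test requests, a tiny Python sketch with the standard `urllib.parse` module does the encoding (sample.com and the bar parameter are the placeholders used above):

```python
from urllib.parse import quote, unquote

payload = "' or true --"
# safe="" makes quote() encode every reserved character, including "/".
encoded = quote(payload, safe="")
print("https://sample.com/foo?bar=" + encoded)
print(unquote(encoded) == payload)
```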

Now activate the rule (put the specific crs file in the /etc/modsecurity directory), restart the server and try the same request. You will receive a Forbidden status code (403). In the logs (/var/log/modsec_audit.log) you can read:

Message: Access denied with code 403 (phase 2). Pattern match "(^[\"'`\xc2\xb4\xe2\x80\x99\xe2\x80\x98;]+|[\"'`\xc2\xb4\xe2\x80\x99\xe2\x80\x98;]+$)" at ARGS:a. [file "/etc/modsecurity/modsecurity_crs_41_sql_injection_attacks.conf"] [line "64"] [id "981318"] [rev "2"] [msg "SQL Injection Attack: Common Injection Testing Detected"] [data "Matched Data: ' found within ARGS:a: ' or true --"] [severity "CRITICAL"] [ver "OWASP_CRS/2.2.8"] [maturity "9"] [accuracy "8"] [tag "OWASP_CRS/WEB_ATTACK/SQL_INJECTION"] [tag "WASCTC/WASC-19"] [tag "OWASP_TOP_10/A1"] [tag "OWASP_AppSensor/CIE1"] [tag "PCI/6.5.2"] Action: Intercepted (phase 2)

The recipe also activates command injection which you can test as described above using the below url:

https://sample.com/foo?bar=curl

This is a fairly simple setup which I would consider basic to secure any production web application.

You might need to add exclusions for certain existing, non-compliant URLs. Since developers could take a while to fix several existing issues and you do not want to delay the firewall protection, here are some guidelines to smooth the installation and produce your MMF ASAP.

Set 'SecRuleEngine DetectionOnly' so that you can compile the current problems from the log file:

sudo sed -i  's/^SecRuleEngine.*/SecRuleEngine DetectionOnly/' /etc/modsecurity/modsecurity.conf && sudo apachectl graceful

Find the modsecurity rule id for issues so far:

cat /var/log/apache2/yourlogname.log|grep ModSecurity|grep -o 'id "[0-9]*"'|sort|uniq

cat /var/log/apache2/yourlogname.log | grep ModSecurity | grep 981240

sudo less /var/log/modsec_audit.log
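The grep pipeline above lists the rule ids; when deciding which exclusions to write first it also helps to know how often each one fires. Here is a small Python sketch of that count. The regular expression assumes the `[id "NNNNNN"]` form seen in the log message earlier, and the two sample lines are abbreviated, made-up entries:

```python
import re
from collections import Counter

RULE_ID = re.compile(r'\[id "(\d+)"\]')


def count_rule_ids(lines):
    """Count ModSecurity rule ids found in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        if "ModSecurity" in line:
            counts.update(RULE_ID.findall(line))
    return counts


# Abbreviated, invented log lines for illustration only.
sample_lines = [
    'ModSecurity: Warning. Pattern match [file "x.conf"] [id "981318"] [msg "a"]',
    'ModSecurity: Warning. Pattern match [file "y.conf"] [id "981240"] [msg "b"]',
    'ModSecurity: Warning. Pattern match [file "x.conf"] [id "981318"] [msg "a"]',
    'unrelated line [id "1"]',
]
for rule_id, hits in count_rule_ids(sample_lines).most_common():
    print(rule_id, hits)
```

To run it against a real log, pass an open file object (for example the Apache error log mentioned above) instead of `sample_lines`.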

Start adding your own rules. Note that I start at rule 1 as per the documentation and I put all of them in a custom.conf file:

# Using 25 MB (26214400 bytes) as the response body limit; upload size is governed by SecRequestBodyLimit
SecResponseBodyLimit 26214400
# Exceptions for URLs that violate PCRE limits, dangerous characters accepted and parameters too long
SecRule REQUEST_URI "@contains /path/containing/service/accepting/long/parameters/content/and/literal/code/and/others" "phase:1,t:none,pass,id:'1',nolog,ctl:ruleRemoveById=959070,ctl:ruleRemoveById=960024,ctl:ruleRemoveById=981173,ctl:ruleRemoveById=950901,ctl:ruleRemoveById=981231,ctl:ruleRemoveById=981240,ctl:ruleRemoveById=981243,ctl:ruleRemoveById=981245,ctl:ruleRemoveById=981318"

Sometimes you need to skip ModSecurity rules entirely for a specific URI (each rule needs its own unique id), for example:
```
SecRule REQUEST_BASENAME "@contains blogs" "id:2,ctl:ruleEngine=Off"
```
Once you get no alerts in your integration environment where surely you constantly run automated e2e tests then you can set 'SecRuleEngine On' to enable the APP FW:
```
sudo sed -i  's/^SecRuleEngine.*/SecRuleEngine On/' /etc/modsecurity/modsecurity.conf &&  sudo apachectl graceful
```

Friday, January 22, 2016

Internet Explorer 11 and Cache-Control: no-store - bug or feature?

We spent a considerable amount of time on this today. IE11 wouldn't render Font Awesome fonts. Why? Because we were protecting the privacy of our users and stopping hackers from pulling sensitive information stored on users' computers. In short, IE11 will not render Font Awesome fonts if you use the below header:

Cache-Control: no-store

The "solution" is to set a max-age for Cache-Control only when fonts are requested. This is an example of "let us please those that do not care much about security affecting those that do care". In my opinion this is an IE11 bug and I would certainly ban this browser until fixed from accessing any application that should comply with privacy laws.
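In application code the workaround boils down to choosing the header by resource type. The Python sketch below shows only that decision; the list of font extensions and the one-day max-age are my assumptions, and `cache_control_for` is a hypothetical helper rather than part of any framework:

```python
import os
from urllib.parse import urlsplit

# Assumed set of web font extensions; adjust to what your application serves.
FONT_EXTENSIONS = {".woff", ".woff2", ".ttf", ".eot", ".otf"}


def cache_control_for(url, font_max_age=86400):
    """Return the Cache-Control value to send for the requested URL."""
    # Drop any query string (such as a version parameter) before checking.
    extension = os.path.splitext(urlsplit(url).path)[1].lower()
    if extension in FONT_EXTENSIONS:
        return "max-age=%d" % font_max_age
    return "no-store"


print(cache_control_for("/fonts/fontawesome-webfont.woff?v=4.5.0"))
print(cache_control_for("/account/statement"))
```

Everything that is not a font keeps the strict no-store header, so the sensitive responses are still never written to disk.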