Saturday, October 21, 2017

Simple and precise problem definition leads to the best software specifications

Be simple and precise. Simplicity brings efficiency, and preciseness brings effectiveness. Combined, they bring productivity.

Simple means "easily understood or done; presenting no difficulty". Precise means "clearly expressed or delineated".

Consider the simple specification below:
"Examine any workflow task with status 'not started' and send a 'tasks pending to be performed' notification to its owner if for such task workflow there is no previous task or if the previous task is in status 'completed'"
It is easy to understand, and implementing it should present no difficulty. However, this specification is not precise, and because of that the transaction costs will make its implementation at least an order of magnitude more expensive than that of its simple and precise counterpart:
Create task owner 1
Create task owner 2
Create workflow 1 accepting default values
Create workflow 2 accepting default values
Assert these tasks persist and their status is 'Not Started' because this is the default status and the owner is not a mandatory field
Assign owner 1 to workflow 1 task 1
Assign owner 2 to workflow 1 task 2
Assign owner 1 to workflow 2 task 1
Assign owner 2 to workflow 2 task 2
Assert that owner 1 gets two 'tasks pending to be performed' notifications because owner 1 is assigned to the first task of each workflow
Assert that owner 2 gets no 'tasks pending to be performed' notifications because owner 2 is assigned to a task with a predecessor task that is not completed yet
Update status 'In Progress' for workflow 1 task 1
Update status 'In Progress' for workflow 2 task 1
Assert that no 'tasks pending to be performed' notifications are sent because owner 1 is still working on their tasks and owner 2 should not start working until owner 1 has completed them
Update status 'Not Started' for workflow 1 task 1
Update status 'Completed' for workflow 2 task 1
Assert that owner 1 gets one 'tasks pending to be performed' notification for workflow 1 task 1 because this is the first task in the workflow, it is assigned to owner 1 and the task should be started if it is not started
Assert that owner 2 gets one 'tasks pending to be performed' notification for workflow 2 task 2 because the previous task was completed and the assigned task has not started
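The rule that the steps above exercise can be sketched as a small shell function. This is only an illustration under stated assumptions: the name `should_notify` and the convention of passing an empty string when a task has no predecessor are hypothetical, and the sketch ignores whether the task has an owner to notify.

```shell
#!/usr/bin/env bash
# Sketch of the notification rule (hypothetical helper, not a real implementation).
# $1: status of the task being examined
# $2: status of the previous task in the workflow, empty if there is none (assumed convention)
should_notify() {
  local status="$1" previous_status="${2:-}"
  # Only tasks that have not started can trigger a notification.
  [ "$status" = "Not Started" ] || return 1
  # Notify when the task is first in its workflow or its predecessor is completed.
  [ -z "$previous_status" ] || [ "$previous_status" = "Completed" ]
}

# Example: workflow 2 task 2 once workflow 2 task 1 is completed.
if should_notify "Not Started" "Completed"; then
  echo "notify owner"
fi
```

Each 'Assert' line in the specification maps to one call of this function with the statuses produced by the preceding 'Update' lines.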
Instead of going through the top-down exercise above, which would allow us to put automated end-to-end (e2e) testing in place and guarantee that important business rules are never broken, we get lazy with both specs and QA. Documentation and QA are as important as implementing the functionality of your product. All three (documentation, implementation, and QA) should be simple and precise.

The devil is in the details, and we cannot be simpler than what is absolutely needed. We kept as much simplicity as we could by structuring our specification around verbs that command what we should do (Create, Update, Assign, Assert), and at the same time we brought preciseness to the mix, especially by using the 'because' keyword for assertions. With simplicity we achieve efficiency, and with preciseness we achieve effectiveness. Combined, they bring productivity.

We resolve the whole problem using cause-and-effect-led specifications: we have actions, an assertion on those *specific* actions, and the cause/effect explanation via a 'because' statement. Not only are the business rules clear, but the test case is straightforward; in fact, the test case is what drives the whole specification and implementation. The result is a software development lifecycle based on total quality control.
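The action / assertion / 'because' structure translates almost directly into an automated check. The sketch below is a minimal illustration; `assert_because` is a hypothetical helper, and the variable assignment merely stands in for a real 'Update' action against the system under test.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a check and report the business reason alongside the result.
# $1: the 'because' explanation; remaining arguments: the command to evaluate
assert_because() {
  local reason="$1"
  shift
  if "$@"; then
    echo "PASS: because $reason"
  else
    echo "FAIL: expected to hold because $reason" >&2
    return 1
  fi
}

# Action: a stand-in for "Update status 'Completed' for workflow 2 task 1".
previous_status="Completed"

# Assertion plus its cause/effect explanation.
assert_because "the previous task was completed" [ "$previous_status" = "Completed" ]
```

Keeping the reason next to the check means a failing test names the business rule that was broken, not just the condition that failed.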

In this post I have explained how the documentation problem can be resolved in a simple and precise manner. I should probably write about how to resolve the QA and implementation problems in a simple and precise manner soon ...

Sunday, October 15, 2017

Tail logs from all Kubernetes pods at once with podlogs.sh

Unfortunately, kubectl (at least in the version used here) cannot follow logs and use a selector at the same time:
$ kubectl logs -l 'role=nodejs' --tail=2 -f 
error: only one of follow (-f) or selector (-l) is allowed
See 'kubectl logs -h' for help and examples.
However, there are alternatives. Here is how to tail logs from all Kubernetes pods at once using just one Plain Old Bash (POB) script.
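One way to approach this, shown here as a minimal sketch and not as the podlogs.sh script itself, is to list the matching pods first and then start one background `kubectl logs -f` per pod. The function name `tail_pod_logs` and the `[pod-name]` line prefix are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Sketch: follow the logs of every pod matching a label selector.
# $1: label selector, e.g. 'role=nodejs'
# $2: number of trailing lines to start from (defaults to 2)
tail_pod_logs() {
  local selector="$1" tail_lines="${2:-2}" pod line
  # '-o name' prints one resource name per line, e.g. pod/nodejs-123
  for pod in $(kubectl get pods -l "$selector" -o name); do
    # One background follower per pod; prefix each line with the pod name.
    kubectl logs -f --tail="$tail_lines" "$pod" |
      while IFS= read -r line; do
        printf '[%s] %s\n' "${pod##*/}" "$line"
      done &
  done
  # Block until every follower exits (Ctrl-C stops them all).
  wait
}

# Usage (requires kubectl configured for the cluster):
#   tail_pod_logs 'role=nodejs' 2
```

The pod list is resolved once at startup, so pods created afterwards are not picked up until the function is run again.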

Wednesday, October 04, 2017

Upgrading Kubernetes - container pods stuck in state 'unknown'

I deleted an old pod that was stuck in our cluster without explanation, and it went into state 'unknown'. Getting logs from the nodejs apps was impossible; in fact, 'kubectl exec' hung SSH sessions. I remembered seeing errors like these (pods reluctant to be deleted) when GKE was expecting a k8s upgrade. So I upgraded, and the issue was resolved.
# add temporary access from 0.0.0.0/0 (anywhere) to the protected services that pods connect to
# check cluster version
gcloud container clusters list
# switch to the specific project
gcloud config set project my-project
gcloud container clusters get-credentials my-project-cluster --zone us-east1-b --project my-project
# check available versions
gcloud container get-server-config
# upgrade cluster master. Note that you have to go up one minor version at a time, for example 1.5.7 needs to go up to 1.6.7 before being upgraded to 1.7.2
gcloud container clusters upgrade my-project-cluster --master --cluster-version=1.7.6-gke.1
# upgrade cluster nodes
gcloud container clusters upgrade my-project-cluster --cluster-version=1.7.6
# list instances external IPs
gcloud compute instances list
# remove access from old external pod IPs, add access to the new external IPs using CIDR /32
# remove temporary access from 0.0.0.0/0 (anywhere)
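For the step that needs the new external IPs in /32 form, the list can be produced directly from gcloud. This is a small sketch under assumptions: the function name `node_external_cidrs` is hypothetical, and it reads only the first network interface and first access config of each instance.

```shell
#!/usr/bin/env bash
# Sketch: print each instance's external IP as a /32 CIDR, one per line.
node_external_cidrs() {
  # natIP is the external address; instances without one yield an empty line,
  # which awk skips via the NF (non-empty) test.
  gcloud compute instances list \
    --format='value(networkInterfaces[0].accessConfigs[0].natIP)' |
    awk 'NF { print $1 "/32" }'
}

# Usage (requires gcloud configured for the project):
#   node_external_cidrs
```

The output can then be pasted into whatever allow list protects the services the pods connect to.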
