Monday, October 31, 2011

Spring One 2011 Notes

It was quite a week of learning with the folks behind the Spring Framework at the SpringOne conference in Chicago.

Not only did I have the opportunity to hear Spring explained by the code committers, but I also managed to discuss some common architectural, managerial and leadership issues with other attendees.

The ground Spring covers could fill four weeks of conference, so the talks were split into four parallel slices (or tracks) of closely related sessions from which you had to pick just one. Here are some comments about those I attended.

Javascript application engineering

It is clear JavaScript is now stronger than ever. This was a nice look into how to do object-oriented programming with JavaScript: inheritance through prototypes (Object.create(Object.prototype)), closures and more. Among other resources, sites like JavaScript Weekly and Hacker News were recommended to follow up.

Spring Integration. Practical Tips and tricks

The Spring Integration project is an attempt to implement the patterns described in the book Enterprise Integration Patterns by Gregor Hohpe and Bobby Woolf.
It was a discussion about consumers, producers and the publish/subscribe pattern using AMQP channels, comparing them with JMS channels. Of course SOAP, REST and others are supported as well. Channel redundancy was presented for RabbitMQ and JMS. The project reminds me of an ESB, and indeed there was a mention of it. Basically Spring Integration is a different approach than an ESB because no server is needed, just patterns applied.
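As a minimal sketch of the channel idea (this is my own illustration using Spring Integration 2.x APIs, not code from the talk), a producer and a consumer can be decoupled through a DirectChannel:

import org.springframework.integration.Message;
import org.springframework.integration.channel.DirectChannel;
import org.springframework.integration.core.MessageHandler;
import org.springframework.integration.support.MessageBuilder;

public class ChannelDemo {
    public static void main(String[] args) {
        // a point-to-point channel: one subscriber handles each message
        DirectChannel channel = new DirectChannel();
        channel.subscribe(new MessageHandler() {
            public void handleMessage(Message<?> message) {
                System.out.println("Consumed: " + message.getPayload());
            }
        });
        // the producer only knows about the channel, not about the consumer
        channel.send(MessageBuilder.withPayload("hello").build());
    }
}

In a real application the channel would typically be declared in XML and the handler would be a @ServiceActivator bean, but the pattern is the same.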

Messaging

Messaging was presented through RabbitMQ, the reference implementation of the Advanced Message Queuing Protocol (AMQP). It was even proposed to start sending messages between controllers and services to prepare the application to scale in the future. I have to say I won't buy into that, even though I would certainly consider it to implement future distributed needs in my projects. Messaging was also presented as a way around problems inherent to the use of shared class loaders (OSGi). In reality that is what any distributed technique does after all, being more than anything a consequence of the paradigm in use.
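To make the messaging idea concrete, here is a minimal send/receive sketch using the Spring AMQP library against a local RabbitMQ broker. The queue name and host are placeholders of mine, not anything shown at the conference:

import org.springframework.amqp.core.Queue;
import org.springframework.amqp.rabbit.connection.CachingConnectionFactory;
import org.springframework.amqp.rabbit.core.RabbitAdmin;
import org.springframework.amqp.rabbit.core.RabbitTemplate;

public class AmqpDemo {
    public static void main(String[] args) {
        // assumes a RabbitMQ broker listening on localhost:5672
        CachingConnectionFactory connectionFactory = new CachingConnectionFactory("localhost");
        new RabbitAdmin(connectionFactory).declareQueue(new Queue("demo.queue"));
        RabbitTemplate template = new RabbitTemplate(connectionFactory);
        // publish to the default exchange using the queue name as routing key
        template.convertAndSend("demo.queue", "hello");
        // synchronous receive; returns null if the queue is empty
        System.out.println(template.receiveAndConvert("demo.queue"));
        connectionFactory.destroy();
    }
}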

Neo4J

This is a graph database, and the presentation was useful especially for putting to rest the ongoing discussions about which is the best database in the NoSQL domain. It is clear they all address different problems, while of course there is more than one choice within each possible domain. Basically there is a trade-off between size and complexity, and in that order we have different options: key-value, BigTable-style, document and graph.

Spring Insight Plugin Development

This is a project with a mission that sounds familiar: inspect how the application is behaving performance-wise. Classes are annotated with @InsightOperation and @InsightEndPoint to allow performance inspection: basically how long a method takes to execute, in other words where the time is spent in the application. While this could definitely be useful in many scenarios, we already have tools doing this, so it looked to me like an effort to build a lightweight profiler focused on vFabric.
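For illustration only, annotating a method so its timing shows up in the Insight dashboard looks roughly like this (the service below is made up, and I am assuming the com.springsource.insight.annotation package from the Insight developer kit):

import com.springsource.insight.annotation.InsightOperation;

public class ReportService {

    // the execution time of this method is captured as an Insight operation
    @InsightOperation
    public void generateMonthlyReport() {
        // expensive work here would show up in the trace timeline
    }
}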

Spring Integration. Implementing Scalable Architectures with Spring Integration

An implementation based on locking a relational database row was presented as a way to provide a clustered messaging implementation.
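The underlying trick is that only one node at a time can hold a row lock. Here is a minimal sketch of that idea in plain JDBC; the H2 in-memory database and all the names are placeholders of mine, not the implementation presented:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class RowLockDemo {
    public static void main(String[] args) throws Exception {
        // in-memory database used just for demonstration
        Connection con = DriverManager.getConnection("jdbc:h2:mem:demo", "sa", "");
        con.setAutoCommit(false);
        Statement st = con.createStatement();
        st.execute("CREATE TABLE message_lock(id INT PRIMARY KEY)");
        st.execute("INSERT INTO message_lock VALUES (1)");
        con.commit();
        // SELECT ... FOR UPDATE blocks competing transactions until commit,
        // so only one cluster node "wins" and processes the work item
        ResultSet rs = st.executeQuery("SELECT id FROM message_lock WHERE id = 1 FOR UPDATE");
        if (rs.next()) {
            System.out.println("Lock acquired: safe to process");
        }
        con.commit(); // releases the lock
        con.close();
    }
}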

Spring Batch

At a glance this is a project to manipulate data. We learned how Spring Batch works while defining and instantiating jobs (Job and JobInstance) that basically act on data. I will not start a war about whether Spring is better or worse than ETL tools for this task. It is just another way of doing the same thing, and depending on your architecture and team one will be favored over the other. I tend to separate concerns as much as possible, and I think data is better processed and tackled by data-centric developers and existing tools. In my experience most people dealing with Spring actually love to implement business rules that are not necessarily related to data-intensive processing. But again, this discussion should end here; there is no correct way, just different ways.
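For reference, launching a job looks roughly like the sketch below (Spring Batch 2.x API; the job and the parameter names are hypothetical). Each distinct set of JobParameters produces a new JobInstance, while re-running with the same parameters restarts the existing one:

import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.launch.JobLauncher;

public class NightlyImport {

    private final JobLauncher jobLauncher;
    private final Job importJob; // defined elsewhere, typically in XML

    public NightlyImport(JobLauncher jobLauncher, Job importJob) {
        this.jobLauncher = jobLauncher;
        this.importJob = importJob;
    }

    public void runFor(String businessDate) throws Exception {
        // a new businessDate creates a new JobInstance of the same Job
        JobExecution execution = jobLauncher.run(importJob,
                new JobParametersBuilder().addString("businessDate", businessDate).toJobParameters());
        System.out.println("Exit status: " + execution.getExitStatus());
    }
}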

Deliver Performance and scalability with ehcache and spring

JSR-107 will standardize caching, while some of us still use a caching API from Google Code to cache data at the method level. However, Terracotta's Ehcache and the new standard add more flexibility to caching. For simple caching you can keep the Google Code API, but for more complicated situations you will use this approach, which involves the @Cacheable and @CacheEvict annotations. When the source code is not available the same caching can be declared from XML, which is great you will agree; AOP is what makes caching certain methods possible here. Different providers, like the Terracotta-backed open source Ehcache, offer possibilities that go beyond the basics of the default ConcurrentMap implementation. Automatic Resource Control, now in beta, will allow Ehcache to size caches in bytes or as a percentage, besides the existing object count. Then of course you can even go for enterprise-class monitoring with the Terracotta Developer Console.
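A minimal sketch of the two annotations using the Spring 3.1 cache abstraction (the cache name and service are invented; a CacheManager backed by Ehcache or the default ConcurrentMap would be configured separately):

import org.springframework.cache.annotation.CacheEvict;
import org.springframework.cache.annotation.Cacheable;

public class ClientService {

    // the result is cached in the "clients" cache keyed by clientId,
    // so repeated calls with the same id skip the expensive lookup
    @Cacheable("clients")
    public String findClientName(int clientId) {
        return expensiveLookup(clientId);
    }

    // invalidates the cached entry when the client changes
    @CacheEvict(value = "clients", key = "#clientId")
    public void renameClient(int clientId, String newName) {
        // persist the change here
    }

    private String expensiveLookup(int clientId) {
        return "client-" + clientId;
    }
}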

Eventing Data with RabbitMQ and Riak

Riak was discussed as the only NoSQL database presented that allows hooking into data insertion. This is an advantage if you want to trigger a RabbitMQ message when new data gets stored. A very cool example was presented where a WebSocket was used to notify the client directly from the server. Again the CAP theorem was discussed: you can pick only two from the Consistency, Availability and Partition-tolerance triangle, and that would ultimately drive your decision towards one database or another.

Where did my architecture go

It was discussed how to control the gap between architecture and codebase. Here you have code analysis with JDepend, Sonar and SonarGraph (formerly SonarJ). Suggestions were made about using ServiceLoader or OSGi for externalizing class creation.
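As a quick illustration of the ServiceLoader suggestion, callers depend only on an interface while implementations are discovered at runtime; everything below is an invented example:

import java.util.ServiceLoader;

public class RendererLocator {

    public interface ReportRenderer {
        void render(String report);
    }

    public static void main(String[] args) {
        // each provider jar ships a META-INF/services entry naming its
        // implementation, so new renderers can be dropped in without
        // touching or recompiling this code
        ServiceLoader<ReportRenderer> loader = ServiceLoader.load(ReportRenderer.class);
        for (ReportRenderer renderer : loader) {
            renderer.render("monthly-summary");
        }
    }
}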

Tailoring Spring for Custom Usage

As we all know Spring is about DI, AOP and enterprise service abstractions (libraries). This lecture went through each of those components giving real-world uses for them: @Configuration and @Bean to provide configuration from Java code instead of XML; @Value to allow the injection of a system property as a string member of a class; Spring scopes, used for example to eliminate the need to pass custom context objects from method to method (a technique I still use in my projects for maintaining ControllerContext), and while singleton and prototype are the most common scopes there are several others like request and session; and the @Profile annotation to mark beans for production or development. The Spring Expression Language in general was also introduced.
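A small sketch combining @Configuration, @Bean and @Value (the property name and beans are invented, and the ${...} resolution assumes a property placeholder configurer is registered):

import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class AppConfig {

    // injects the "app.name" system/environment property, defaulting to "demo"
    @Value("${app.name:demo}")
    private String appName;

    @Bean
    public GreetingService greetingService() {
        return new GreetingService(appName);
    }

    public static class GreetingService {
        private final String name;

        public GreetingService(String name) {
            this.name = name;
        }

        public String greet() {
            return "Hello from " + name;
        }
    }
}

An AnnotationConfigApplicationContext pointed at AppConfig can then retrieve greetingService like any XML-defined bean.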

Improving Java with Groovy

There were several tracks on Groovy and Grails. I attended just this one from the last slice because there was nothing Spring-Java specific available, and even though it was basic stuff we saw working examples from an excellent demo showing how to use both Java and Groovy in the same project. It was basically a practical presentation of the ideas behind the book "Making Java Groovy". Heavy use of closures showed how the code got smaller while avoiding interfaces and inner classes.

Monday, October 24, 2011

Cron error email notification in Ubuntu

A member of the team realized one of the cron processes was failing but we were not getting any alerts.

While I favor the use of LogMonitor for absolutely everything, there are simple commands used in cron jobs for which you would ideally like to get notifications without the need for logging.

If you design your script carefully, respecting the fact that error messages must go to stderr, this procedure should work:

  1. Be sure sSMTP is installed and properly configured:
    $ sudo apt-get install ssmtp
    $ sudo vi /etc/ssmtp/ssmtp.conf
    ...
    root=nurquiza@nestorurquiza.com
    ...
    mailhub=mail.nestorurquiza.com
    ...
    
  2. Add a line at the beginning of the crontab stating who the error email should be sent to:
    $ crontab -e 
    MAILTO=alerts@nestorurquiza.com
    0 22 * * * /usr/sbin/sftp_backup.sh > /dev/null
    ...
    

Note the redirection of stdout to /dev/null. Most likely you only want to receive the errors.

Saturday, October 22, 2011

Upgrading CouchDB in Ubuntu

I had to upgrade CouchDB to candidate release 1.1.1, and I did it by simply following the same instructions I used to install it from scratch.

Installing CouchDB in Ubuntu

I have tested the below procedure in Ubuntu 10.10 (Maverick) to install CouchDB 1.1.1 candidate release.

$ ps -ef|grep couch|awk '{print $2}'|xargs sudo kill -9
$ sudo apt-get update
$ sudo apt-get autoremove
$ sudo apt-get remove couchdb
$ sudo apt-get build-dep couchdb
$ sudo apt-get install libtool zip
$ cd
$ curl -O http://ftp.mozilla.org/pub/mozilla.org/js/js185-1.0.0.tar.gz
$ tar xvzf js185-1.0.0.tar.gz 
$ cd js-1.8.5/js/src
$ ./configure
$ make
$ sudo make install
$ cd
$ curl -O http://www.erlang.org/download/otp_src_R14B04.tar.gz
$ tar xvzf otp_src_R14B04.tar.gz 
$ cd otp_src_R14B04
$ ./configure --enable-smp-support --enable-dynamic-ssl-lib --enable-kernel-poll
$ make
$ sudo make install
$ cd
$ svn co http://svn.apache.org/repos/asf/couchdb/branches/1.1.x/ couchdb1.1.x
$ cd couchdb1.1.x
$ ./bootstrap
$ prefix='/usr/local'
$ ./configure --prefix=${prefix} 
$ make
$ sudo make install
$ sudo useradd -d /var/lib/couchdb couchdb
$ sudo chown -R couchdb: ${prefix}/var/{lib,log,run}/couchdb ${prefix}/etc/couchdb
$ for dir in `whereis couchdb | sed 's/couchdb: //'`; do echo $dir | xargs sudo chown couchdb; done
$ export xulrunnerversion=`xulrunner -v 2>&1 >  /dev/null | egrep -o "([0-9]{1,2})(\.[0-9]{1,2})+"`
$ echo $xulrunnerversion
$ echo "/usr/lib/xulrunner-$xulrunnerversion" > /etc/ld.so.conf.d/xulrunner.conf
$ echo "/usr/lib/xulrunner-devel-$xulrunnerversion" >> /etc/ld.so.conf.d/xulrunner.conf
$ sudo ln -s /usr/local/etc/init.d/couchdb /etc/init.d/couchdb
$ update-rc.d couchdb defaults
$ /etc/init.d/couchdb start
$ curl -X GET http://localhost:5984
{"couchdb":"Welcome","version":"1.1.1a1187726"}

Friday, October 21, 2011

Monitoring CouchDB with Monit

Just add the below in monitrc and reload the config. I am assuming you hardened CouchDB so it is listening on an SSL port. Note that couchdb is already supervised by its heart process, so you can probably skip monitoring the process itself ;-)

#################################################################
# couchdb
################################################################

#check process couchdb
#  with pidfile /usr/local/var/run/couchdb/couchdb.pid
#  start program = "/usr/local/etc/init.d/couchdb start"
#  stop program = "/usr/local/etc/init.d/couchdb stop"
#  if failed port 6984 then restart
#  if failed url https://localhost:6984/ and content == '"couchdb"' then restart

#couchdb does not save the parent pid of the starting process so the above would serve no purpose
check host couchdb with address localhost
  if failed port 6984 then alert
  if failed url https://localhost:6984/ and content == '"couchdb"' then alert
group couchdb

Hardening CouchDB: A more secure distributed database

Here are the steps to follow in order to harden CouchDB. I am currently using candidate release 1.1.1 and these are instructions for a local environment. For a production environment you have to locate the non-dev config files local.ini and default.ini (just run "ps -ef | grep couchdb" and look for the path; in my case /usr/local/etc/couchdb/local.ini and /usr/local/etc/couchdb/default.ini).

  1. Create a private key and a self-signed PEM certificate:
    $ export DOMAIN=couchdb.nestorurquiza.com
    $ openssl genrsa -des3 -out ${DOMAIN}.pem 1024
    $ openssl req -new -key ${DOMAIN}.pem -out ${DOMAIN}.csr
    $ openssl x509 -req -days 3650 -in ${DOMAIN}.csr -signkey ${DOMAIN}.pem -out ${DOMAIN}.cert.pem
    $ openssl rsa -in ${DOMAIN}.pem -out ${DOMAIN}.pem
    $ sudo mkdir -p /opt/couchdb/certs/
    $ sudo cp ${DOMAIN}.pem /opt/couchdb/certs/
    $ sudo cp ${DOMAIN}.cert.pem /opt/couchdb/certs/
    

  2. Configure CouchDB for *SSL only* and restart the server:
    $ vi /Users/nestor/Downloads/couchdb1.1.x/etc/couchdb/local.ini
    [admins]
    ;admin = mysecretpassword
    nestorurquizaadmin = secret1
    ...
    [daemons]
    ; enable SSL support by uncommenting the following line and supply the PEM's below.
    ; the default ssl port CouchDB listens on is 6984
    httpsd = {couch_httpd, start_link, [https]}
    ...
    [ssl]
    cert_file = /opt/couchdb/certs/couchdb.nestorurquiza.com.cert.pem
    key_file = /opt/couchdb/certs/couchdb.nestorurquiza.com.pem
    $ vi /Users/nestor/Downloads/couchdb1.1.x/etc/couchdb/default.ini
    [httpd]
    bind_address = 0.0.0.0 ;use a non-loopback address only if the server will be reached from other machines
    ...
    [daemons]
    ...
    ; httpd={couch_httpd, start_link, []}
    ...
    
  3. Restart
    sudo couchdb #local OSX
    sudo /usr/local/etc/init.d/couchdb restart #Ubuntu
    

  4. Now we can access the secured CouchDB instance
    HOST="https://nestorurquizaadmin:mysecret@127.0.0.1:6984"
    curl -k -X GET $HOST
    

Note that at this time, to remove the insecure http port you have to edit /usr/local/etc/couchdb/default.ini, which gets overwritten after any upgrade.

Of course we should sign our certificate with a Certificate Authority (CA) to make this really secure, but you can also add some security by playing with firewall rules and allowing access to the SSL port from certain IPs only. I can do so because I use a middle tier between CouchDB and the browser. If you are hitting CouchDB directly from the wild you had better use a CA.

Thursday, October 20, 2011

CouchDB filtered replication

One of the greatest features of CouchDB is its replication, which allows for great distributed computing. It reminds me of 1999, when I encountered the Erlang language for the first time (working for a telco). Erlang is made for distributed computing, and so is CouchDB, which of course is built in Erlang.

I have to say I have successfully tested this in the upcoming version 1.1.1 (built from a branch). Do not try this in 1.1.0.

The example below is based on the document that I have been discussing in the three part tutorial about building a Document Management System (DMS) with CouchDB.

Filtered or selective replication is a two step process:
  1. First create a filter named for example "clientFilter" in a new document called "replicateFilter". This sample filter will reject any client not matching the clientId parameter (step 2 explains what this parameter is about). Any deleted documents will be deleted from the target as well.
    curl -H 'Content-Type: application/json' -X PUT http://127.0.0.1:5984/dms4/_design/replicateFilter -d \
    '{
      "filters":{
        "clientFilter":"function(doc, req) {
          if (doc._deleted) {
            return true;
          }
     
          if(!doc.clientId) {
            return false;
          }
     
          if(!req.query.clientId) {
            throw(\"Please provide a query parameter clientId.\");
          }
     
          if(doc.clientId == req.query.clientId) {
            return true;
          }
          return false;
        }"
      }
    }'
    
  2. Create a replication document called "by_clientId". This example passes clientId=1 as a parameter to the filter we created in step number 1 ("replicateFilter/clientFilter"). You figured we will end up replicating documents for that client.
    curl -H 'Content-Type: application/json' -X POST http://127.0.0.1:5984/_replicator -d \
    '{
      "_id":"by_clientId",
      "source":"dms4",
      "target":"http://couchdb.nestorurquiza.com:5984/dms4",
      "create_target":true,
      "continuous":true,
      "filter":"replicateFilter/clientFilter",
      "query_params":{"clientId":1}
    }'
    

Deleting a replication document is how you turn off that replication. This is no different from deleting any other document:
nestor:~ nestor$ curl -X GET http://127.0.0.1:5984/_replicator/by_clientId
{"_id":"by_clientId","_rev":"5-e177ca7f79d9ba6f91b803a2cb2abc1e","source":"dms4","target":"http://couchdb.nestorurquiza.com:5984/dms4","create_target":true,"continuous":true,"filter":"replicateFilter/clientFilter","query_params":{"clientId":1},"_replication_state":"triggered","_replication_state_time":"2011-10-20T13:09:56-04:00","_replication_id":"d8dc09e97f4948de0294260dda19fc6f"}
nestor:~ nestor$ curl -X DELETE http://127.0.0.1:5984/_replicator/by_clientId?rev=5-e177ca7f79d9ba6f91b803a2cb2abc1e
{"ok":true,"id":"by_clientId","rev":"6-0d20d90cbed22837eb3233e2bd8dfb2c"}

The same applies to getting a list of the currently defined "selective replicators". You can use a temporary view, as I show here, or create a permanent view to list all the replicators:
$ curl -X POST http://127.0.0.1:5984/_replicator/_temp_view -H "Content-Type: application/json" -d '{
  "map": "function(doc) {
            emit(null, doc);
          }"
}'

OSX homebrew update: unexpected token near HOMEBREW_BREW_FILE

Update: I have seen this issue with other packages as well. Here is a recipe to make sure a specific version of ruby is used:
cp -p /usr/local/bin/brew /usr/local/bin/brew.old
echo  '#!'/Users/`whoami`/.rvm/rubies/ruby-1.8.7-p160/bin/ruby > /tmp/f1
sed '1d' /usr/local/bin/brew > /tmp/f2
cat /tmp/f1 /tmp/f2 > /usr/local/bin/brew
Original Post: After upgrading homebrew:
$ brew update

I got errors:
$ brew install mozilla-js
/usr/local/bin/brew: line 4: syntax error near unexpected token `('
/usr/local/bin/brew: line 4: `HOMEBREW_BREW_FILE = ENV['HOMEBREW_BREW_FILE'] = File.expand_path(__FILE__)'

Here is a temporary solution, which you will need to reapply every time you update brew:
$ sudo vi /usr/local/bin/brew
#!/Users/nestor/.rvm/rubies/ruby-1.9.2-p290/bin/ruby
##!/usr/bin/ruby

Upgrading couchDB in OSX

While trying to use CouchDB filtered replication I had some problems, as documented in a gist.

I then had to upgrade CouchDB to a non-released version. Here are the steps I followed.

Delete all files from previous installations:
$ sudo find /usr/local -name couchdb | sudo xargs rm -fR

Install ICU from http://download.icu-project.org
$ tar xvzf icu4c-4_8_1-src.tgz 
$ cd icu/source/
$ ./runConfigureICU MacOSX --with-library-bits=64 --disable-samples --enable-static # if this fails for you try: ./configure --enable-64bit-libs
$ make
$ sudo make install

Install Spidermonkey
$ curl -O http://ftp.mozilla.org/pub/mozilla.org/js/js185-1.0.0.tar.gz
$ tar xvzf js185-1.0.0.tar.gz 
$ cd js-1.8.5/js/src
$ ./configure
$ make
$ sudo make install

Then install latest Erlang:
$ curl -O http://www.erlang.org/download/otp_src_R14B04.tar.gz
$ tar xvzf otp_src_R14B04.tar.gz 
$ cd otp_src_R14B04
$ ./configure --enable-smp-support --enable-dynamic-ssl-lib --enable-kernel-poll --enable-darwin-64bit
$ make
$ sudo make install

Then check out the needed version. I tried git (http://git-wip-us.apache.org/repos/asf/couchdb.git) first but I had several problems with autoconf and beyond (./configure was not available so I needed to go with automake -a; autoconf; autoheader), so I built from svn instead:
$ svn co http://svn.apache.org/repos/asf/couchdb/branches/1.1.x/ couchdb1.1.x
$ cd couchdb1.1.x
$ ./bootstrap
$ ./configure
$ make
$ sudo make install
$ sudo couchdb

Sunday, October 16, 2011

Document Management System with CouchDB - Third Part

This is the last part of my attempt to cover how to use CouchDB to build a DMS. We will be using Java and the Ektorp library for the implementation.

Using Ektorp

A good place to start is downloading the BlogPost sample application (org.ektorp.sample-1.7-project), which shows some of the most important concepts around the Ektorp API. Follow the Ektorp tutorial and of course do not miss the reference documentation.

I will show here the steps to make your existing Spring project interact with CouchDB using the Ektorp library. The code presented here is an implementation of the ideas exposed in the second part of this series.

Here are the dependencies for your project. Note that I am adding file upload dependencies because I am building a DMS and will provide an upload module:
<!-- Ektorp for CouchDB -->
        <dependency>
            <groupId>org.ektorp</groupId>
            <artifactId>org.ektorp</artifactId>
            <version>${ektorp.version}</version>
            <exclusions>
             <exclusion>
              <artifactId>slf4j-api</artifactId>
              <groupId>org.slf4j</groupId>
             </exclusion>
            </exclusions>
        </dependency>
        <dependency>
            <groupId>org.ektorp</groupId>
            <artifactId>org.ektorp.spring</artifactId>
            <version>${ektorp.version}</version>
        </dependency>

        <!-- File Upload -->
        <dependency>
            <groupId>commons-fileupload</groupId>
            <artifactId>commons-fileupload</artifactId>
            <version>1.2.2</version>
        </dependency>

In the Spring application context XML (note the upload component, which again is only needed for the DMS upload functionality):
...
xmlns:util="http://www.springframework.org/schema/util"
xmlns:couchdb="http://www.ektorp.org/schema/couchdb"
...
http://www.springframework.org/schema/util http://www.springframework.org/schema/util/spring-util-3.0.xsd
http://www.ektorp.org/schema/couchdb http://www.ektorp.org/schema/couchdb/couchdb.xsd
...
<context:component-scan base-package="com.nu.dms.couchdb.ektorp.model"/>
<context:component-scan base-package="com.nu.dms.couchdb.ektorp.dao"/>
...
<util:properties id="couchdbProperties" location="classpath:/couchdb.properties"/>
<couchdb:instance id="dmsCouchdb" url="${couchdb.url}" properties="couchdbProperties" />
<couchdb:database id="dmsDatabase" name="${couchdb.db}" instance-ref="dmsCouchdb" />
...
<!-- File Upload -->
<bean id="multipartResolver" class="org.springframework.web.multipart.commons.CommonsMultipartResolver">
        <property name="maxUploadSize" value="2000000"/>
</bean>

In the classpath environment properties file:
couchdb.url=http://localhost:5984
# my database name
couchdb.db=dms4

The couchdb.properties file, also on the classpath:
host=localhost
port=5984
maxConnections=20
connectionTimeout=1000
socketTimeout=10000
autoUpdateViewOnChange=true
caching=false

A Document POJO following the BlogPost POJO from the Ektorp example:
package com.nu.dms.couchdb.ektorp.model;

import org.ektorp.Attachment;
import org.ektorp.support.CouchDbDocument;

public class CustomCouchDbDocument extends CouchDbDocument {

    private static final long serialVersionUID = -9012014877538917152L;

    @Override
    public void addInlineAttachment(Attachment a) {
        super.addInlineAttachment(a);
    }   
}


package com.nu.dms.couchdb.ektorp.model;

import java.util.Date;

import javax.validation.constraints.NotNull;

import org.ektorp.support.TypeDiscriminator;

public class Document extends CustomCouchDbDocument {

    private static final long serialVersionUID = 59516215253102057L;
    
    public Document() {
        super();
    }
    
    public Document(String title) {
        this.title = title;
    }
    
    /**
     * @TypeDiscriminator is used to mark properties that make this class's documents unique in the database. 
     */
    @TypeDiscriminator
    @NotNull
    private String title;
    
    private int clientId;
    private int createdByEmployeeId;
    private int reviewedByEmployeeId;
    private int approvedByManagerId;
    private Date dateEffective;
    private Date dateCreated;
    private Date dateReviewed;
    private Date dateApproved;
    private int investorId;
    private int categoryId;
    private int subCategoryId;
    private int statusId;
    
    public String getTitle() {
        return title;
    }
    public void setTitle(String title) {
        this.title = title;
    }
    public int getClientId() {
        return clientId;
    }
    public void setClientId(int clientId) {
        this.clientId = clientId;
    }
    public int getCreatedByEmployeeId() {
        return createdByEmployeeId;
    }
    public void setCreatedByEmployeeId(int createdByEmployeeId) {
        this.createdByEmployeeId = createdByEmployeeId;
    }
    public int getReviewedByEmployeeId() {
        return reviewedByEmployeeId;
    }
    public void setReviewedByEmployeeId(int reviewedByEmployeeId) {
        this.reviewedByEmployeeId = reviewedByEmployeeId;
    }
    public int getApprovedByManagerId() {
        return approvedByManagerId;
    }
    public void setApprovedByManagerId(int approvedByManagerId) {
        this.approvedByManagerId = approvedByManagerId;
    }
  
    public Date getDateEffective() {
        return dateEffective;
    }

    public void setDateEffective(Date dateEffective) {
        this.dateEffective = dateEffective;
    }

    public Date getDateCreated() {
        return dateCreated;
    }
    public void setDateCreated(Date dateCreated) {
        this.dateCreated = dateCreated;
    }
    public Date getDateReviewed() {
        return dateReviewed;
    }
    public void setDateReviewed(Date dateReviewed) {
        this.dateReviewed = dateReviewed;
    }
    public Date getDateApproved() {
        return dateApproved;
    }
    public void setDateApproved(Date dateApproved) {
        this.dateApproved = dateApproved;
    }
    public int getInvestorId() {
        return investorId;
    }
    public void setInvestorId(int investorId) {
        this.investorId = investorId;
    }
    public int getCategoryId() {
        return categoryId;
    }
    public void setCategoryId(int categoryId) {
        this.categoryId = categoryId;
    }
    public int getSubCategoryId() {
        return subCategoryId;
    }

    public void setSubCategoryId(int subCategoryId) {
        this.subCategoryId = subCategoryId;
    }
    public int getStatusId() {
        return statusId;
    }
    public void setStatusId(int statusId) {
        this.statusId = statusId;
    }
    @Override
    public void setRevision(String s) {
        // downstream code does not like revision set to an empty string, which Spring does when binding
        if (s != null && !s.isEmpty()) super.setRevision(s);
    }
    
    public boolean isNew() {
        return getId() == null;
    }
}

A Document Repository:
package com.nu.dms.couchdb.ektorp.dao;

import org.ektorp.CouchDbConnector;
import org.ektorp.support.CouchDbRepositorySupport;

public class CustomCouchDbRepositorySupport<T> extends CouchDbRepositorySupport<T> {


    protected CustomCouchDbRepositorySupport(Class<T> type, CouchDbConnector db) {
        super(type, db);
    }

    public CouchDbConnector getDb() {
        return super.db;
    }   
}

package com.nu.dms.couchdb.ektorp.dao;

import java.io.InputStream;
import java.util.List;

import org.ektorp.CouchDbConnector;
import org.ektorp.Page;
import org.ektorp.PageRequest;
import org.ektorp.ViewQuery;
import org.ektorp.support.GenerateView;
import org.ektorp.support.View;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.stereotype.Component;

import com.nu.dms.couchdb.ektorp.model.Document;

@Component
public class DocumentRepository  extends CustomCouchDbRepositorySupport<Document> {
    
    private static final Logger log = LoggerFactory.getLogger(DocumentRepository.class);
    
    @Autowired
    public DocumentRepository(@Qualifier("dmsDatabase") CouchDbConnector db) {
        super(Document.class, db);
        initStandardDesignDocument();
    }

    @GenerateView @Override
    public List<Document> getAll() {
        ViewQuery q = createQuery("all")
                        .includeDocs(true);
        return db.queryView(q, Document.class);
    }
    
    public Page<Document> getAll(PageRequest pr) {
        ViewQuery q = createQuery("all")
                        .includeDocs(true);
        return db.queryForPage(q, pr, Document.class);
    }
    
    @View( name = "tree", map = "classpath:/couchdb/tree_map.js", reduce = "classpath:/couchdb/tree_reduce.js")
    public InputStream getTree(String startKey, String endKey, int groupLevel) {
        ViewQuery q = createQuery("tree")
        .startKey(startKey)
        .endKey(endKey)
        .groupLevel(groupLevel)
        .group(true);
        InputStream is = db.queryForStream(q);
        return is;
    }

}

Map and reduce JavaScript functions go in src/main/resources/couchdb, in other words on the classpath:
//by_categoryId_map.js
function(doc) { 
 if(doc.title && doc.categoryId) {
  emit(doc.categoryId, doc._id)
 } 
}

//by_categoryId_reduce.js
_count

//tree_map.js
function(doc) {
  var tokens = doc.dateEffective.split("-");
  var year = null;
  var month = null;
  if(tokens.length == 3) {
    year = tokens[0];
    month = tokens[1];
  }
  var key = [doc.clientId, doc.categoryId, doc.subCategoryId, year, month].concat(doc.title);
  var value = null;
  emit(key, value);
}

//tree_reduce.js
_count

In web.xml we need to map *.swf because we will use SWFUpload to upload multiple files:
<url-pattern>*.swf</url-pattern>
</servlet-mapping>

To keep things short I am not using a service layer, so I provide a single controller that allows creating a document including the attachment and metadata, uploading multiple documents (using the SWFUpload flash component) which follow a strict naming convention so all the necessary metadata can be produced out of their names, and viewing the documents in a tree view as explained in part two (using jquery.treeview.async.js):

package com.nu.web.controller.dms;

import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.net.URLConnection;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Iterator;
import java.util.Locale;
import java.util.Map;
import java.util.Scanner;

import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import org.apache.commons.codec.binary.Base64;
import org.apache.commons.fileupload.FileItemFactory;
import org.apache.commons.fileupload.FileUploadException;
import org.apache.commons.fileupload.disk.DiskFileItemFactory;
import org.apache.commons.fileupload.servlet.ServletFileUpload;
import org.apache.commons.lang.StringUtils;
import org.apache.poi.util.IOUtils;
import org.codehaus.jackson.JsonNode;
import org.codehaus.jackson.map.ObjectMapper;
import org.codehaus.jackson.node.ObjectNode;
import org.ektorp.Attachment;
import org.ektorp.AttachmentInputStream;
import org.ektorp.PageRequest;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Controller;
import org.springframework.ui.Model;
import org.springframework.validation.BindingResult;
import org.springframework.validation.ObjectError;
import org.springframework.web.bind.WebDataBinder;
import org.springframework.web.bind.annotation.InitBinder;
import org.springframework.web.bind.annotation.ModelAttribute;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.multipart.MultipartFile;
import org.springframework.web.multipart.support.ByteArrayMultipartFileEditor;
import org.springframework.web.servlet.ModelAndView;

import com.nu.dms.couchdb.ektorp.dao.DocumentRepository;
import com.nu.dms.couchdb.ektorp.model.Document;
import com.nu.web.ControllerContext;
import com.nu.web.RootController;
import com.nu.web.WebConstants;
import com.nu.web.validator.BeanValidator;
import com.windriver.gson.extension.GeneralObjectDeserializer;

@Controller
@RequestMapping("/dms/document/*")
public class DocumentController extends RootController {
    private static final Logger log = LoggerFactory.getLogger(DocumentController.class);
    
    @Autowired
    DocumentRepository documentRepository;
    
    @Autowired
    private BeanValidator validator;
    
    private static final String LIST_PATH = "/dms/document/list";
    private static final String FORM_PATH = "/dms/document/form";
    private static final String TREE_PATH = "/dms/document/tree";
    public static final long UPLOAD_MAX_FILE_SIZE = 20 * 1024 * 1024; //20 MB
    public static final long UPLOAD_MAX_TOTAL_FILES_SIZE = 1 * 1024 * 1024 * 1024; //1 GB
    private static final int DOCUMENT_LEVEL = 5;
    
    @RequestMapping("/list")
    public ModelAndView list(   HttpServletRequest request, 
                                HttpServletResponse response, 
                                Model m, 
                                @RequestParam(value = "p", required = false) String pageLink) {
        ControllerContext ctx = new ControllerContext(request, response);
        init(ctx);

        if (!isValidCsrfToken(ctx)) {
            return getModelAndView(ctx, LIST_PATH);
        }
        
        PageRequest pr = pageLink != null ? PageRequest.fromLink(pageLink) : PageRequest.firstPage(5);
        m.addAttribute(documentRepository.getAll(pr));
        return getModelAndView(ctx, LIST_PATH);
    }
    
    @RequestMapping("/add")
    public ModelAndView add(HttpServletRequest request,
            HttpServletResponse response,
            @RequestParam(value = "attachment", required = false) MultipartFile multipartFile,
            @ModelAttribute("document") Document document,
            BindingResult result) {

        //Store constants for JSP
        request.setAttribute("UPLOAD_MAX_FILE_SIZE", UPLOAD_MAX_FILE_SIZE); 
        
        ControllerContext ctx = new ControllerContext(request, response);
        init(ctx);

        if (!isValidCsrfToken(ctx)) {
            return getModelAndView(ctx, FORM_PATH);
        }

        if (!isSubmission(ctx)) {
            return getModelAndView(ctx, FORM_PATH);
        } else {
            
            validator.validate(document, result);
            
            if (result.hasErrors()) {
                return getModelAndView(ctx, FORM_PATH);
            } else {
                if(multipartFile == null) {
                    result.addError(new ObjectError("document", getMessage("error.add", new String[] {"document"})));
                    return getModelAndView(ctx, FORM_PATH);
                }
                
                try {
                    String title = document.getTitle();
                    if(StringUtils.isEmpty(title)) throw new Exception("Empty title");
                    document.setId(title);
                    String contentType = multipartFile.getContentType();
                    String base64 = new String (Base64.encodeBase64(multipartFile.getBytes()));
                    if(StringUtils.isEmpty(base64)) throw new Exception("Empty attachment");
                    Attachment a = new Attachment(title, base64, contentType);
                    document.addInlineAttachment(a);
                } catch (Exception ex) {
                    result.addError(new ObjectError("attachmentError", getMessage("error.attachingDocument")));
                    log.error(null, ex);
                    return getModelAndView(ctx, FORM_PATH);
                }   
                
                try {
                    document.setDateCreated(new Date());
                    documentRepository.add(document);
                } catch (Exception ex) {
                    result.addError(new ObjectError("document", getMessage("error.add", new String[] {"document"})));
                    log.error(null, ex);
                    return getModelAndView(ctx, FORM_PATH);
                }
                return getModelAndView(ctx, LIST_PATH, true, true);
            }
        }
    }
    
    
    @RequestMapping("/{id}/edit")
    public ModelAndView edit(HttpServletRequest request,
            HttpServletResponse response,
            @ModelAttribute("document") Document document,
            BindingResult result,
            @PathVariable("id") String id,
            Model model) {

        ControllerContext ctx = new ControllerContext(request, response);
        init(ctx);

        if (!isValidCsrfToken(ctx)) {
            return getModelAndView(ctx, FORM_PATH);
        }

        try {
            Document storedDocument = documentRepository.get(id);
            if(storedDocument == null) {
                throw new Exception("No document found with id '" + id + "'");
            }
           
            if (!isSubmission(ctx)) {
                model.addAttribute("document", storedDocument);
                return getModelAndView(ctx, FORM_PATH);
            } else {
                validator.validate(document, result);
                
                if (result.hasErrors()) {
                    return getModelAndView(ctx, FORM_PATH);
                } else {
                    String title = document.getTitle();
                    if(StringUtils.isEmpty(title)) throw new Exception("Empty title");
                    document.setId(title);
                    document.setDateCreated(storedDocument.getDateCreated());
                    document.setRevision(storedDocument.getRevision());
                    documentRepository.update(document);
                    return getModelAndView(ctx, LIST_PATH, true, true);
                }
            }
        } catch (Exception ex) {
            result.addError(new ObjectError("document", getMessage("error.edit", new String[] {"document"}) + "." + ex.getMessage()));
            log.error(null, ex);
            return getModelAndView(ctx, FORM_PATH);
        }
    }
    
    
    @RequestMapping("/{id}/delete")
    public ModelAndView delete(HttpServletRequest request,
            HttpServletResponse response,
            @ModelAttribute("document") Document document,
            BindingResult result,
            @PathVariable("id") String id,
            Model model) {
        ControllerContext ctx = new ControllerContext(request, response);
        init(ctx);

        if (!isValidCsrfToken(ctx)) {
            return getModelAndView(ctx, LIST_PATH);
        }

        try {
            document = documentRepository.get(id);
            if(document == null) {
                throw new Exception("No document found with id '" + id + "'");
            }
            documentRepository.remove(document);
        } catch (Exception ex) {
            result.addError(new ObjectError("document", getMessage("error.add", new String[] {"document"})));
            log.error(null, ex);
            return getModelAndView(ctx, LIST_PATH);
        }
        return getModelAndView(ctx, LIST_PATH, true, true);
    }
    
    @RequestMapping("/{id}/show")
    public ModelAndView show(HttpServletRequest request,
            HttpServletResponse response,
            @ModelAttribute("document") Document document,
            BindingResult result,
            @PathVariable("id") String id,
            Model model) {
        ControllerContext ctx = new ControllerContext(request, response);
        init(ctx);

        if (!isValidCsrfToken(ctx)) {
            return getModelAndView(ctx, LIST_PATH);
        }

        try {
            document = documentRepository.get(id);
            if(document == null) {
                throw new Exception("No document found with id '" + id + "'");
            }
            Map<String, Attachment> attachments = document.getAttachments();
            if(attachments == null || attachments.size() == 0) {
                throw new Exception("No attachment found for id '" + id + "'");
            }
            for(Map.Entry<String, Attachment> entry : attachments.entrySet()) {
                String attachmentId = entry.getKey();
                Attachment attachment = entry.getValue();
                //long contentLength = attachment.getContentLength();
                String contentType = attachment.getContentType();
                AttachmentInputStream ais = documentRepository.getDb().getAttachment(id, attachmentId);
                response.setHeader("Content-Disposition", "attachment; filename=\"" + document.getTitle() + "\"");
                response.setContentType(contentType);
                final ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
                IOUtils.copy(ais, outputStream);
                render(response, outputStream);
            }
            return getModelAndView(ctx, LIST_PATH, true, true);
        } catch (Exception ex) {
            result.addError(new ObjectError("document", getMessage("error.internal", new String[] {"document"})));
            log.error(null, ex);
            return getModelAndView(ctx, LIST_PATH);
        }
        
    }
    
    @RequestMapping("/tree")
    public ModelAndView tree(HttpServletRequest request,
            HttpServletResponse response,
            @RequestParam(value = "root", required = false) String root) {
        ControllerContext ctx = new ControllerContext(request, response);
        init(ctx);

        if (!isValidCsrfToken(ctx)) {
            return getModelAndView(ctx, TREE_PATH);
        }
        if(root == null) {
            return getModelAndView(ctx, TREE_PATH);
        }
        try {
            Object objTree = getTreeObject(root);
            if(objTree == null) {
                ctx.setRequestAttribute("treeInfo", getMessage("noItemFound", new String[] {"record"}));
            }
            ctx.setRequestAttribute("tree", objTree);
            return getModelAndView(ctx, TREE_PATH);
        } catch (Exception ex) {
            ctx.setRequestViewAttribute("treeError", ex.getMessage());
            log.error(null, ex);
            return getModelAndView(ctx, TREE_PATH);
        }
        
    }
    
    /**
     * The current treeview plugin mandates a certain json structure, so parsing the /document/tree service is not an option at the moment
     * @param request
     * @param response
     * @param root
     * @return
     */
    @RequestMapping("/ajaxTree")
    public ModelAndView ajaxTree(HttpServletRequest request,
            HttpServletResponse response,
            @RequestParam(value = "root", required = true) String root) {
        ControllerContext ctx = new ControllerContext(request, response);
        init(ctx);

        if (!isValidCsrfToken(ctx)) {
            return getModelAndView(ctx, LIST_PATH);
        }

        try {
            InputStream is = getTreeInputStream(root);
            ObjectMapper mapper = new ObjectMapper();
            JsonNode rootNode = mapper.readValue(is, JsonNode.class);
            JsonNode rowsNode = rootNode.get("rows");
            Iterator<JsonNode> iter = rowsNode.getElements();
            while (iter.hasNext()) {
                JsonNode row = iter.next();
                JsonNode keyNode = row.get("key");
                String[] key = mapper.readValue(keyNode, String[].class);
                if(key == null) {
                    continue;
                }
                String name = key[key.length - 1];
                String classes = null;
                if(key.length == DOCUMENT_LEVEL + 1) {
                    //Listing files
                    String extension = "unknownExtension";
                    String fileName = key[DOCUMENT_LEVEL];
                    String[] tokens = fileName.split("\\.");
                    if(tokens.length >= 2) {
                        extension = tokens[tokens.length - 1];
                    }
                    classes = "file " + extension;
                } else {
                    //Listing folders
                    classes = "folder";
                }
                
                ((ObjectNode)row).put("classes", classes);
                ((ObjectNode)row).put("name", name);
                boolean hasChildren = false;
                //To use the key as id for next request
                if(key.length != DOCUMENT_LEVEL + 1) {
                    int value = mapper.readValue(row.get("value"), Integer.class);
                    if(value > 0) {
                        hasChildren = true;
                    }
                } else {
                    ((ObjectNode)row).put("url", request.getContextPath() + "/dms/document/" + name + "/show?ctoken=" + ctx.getSessionAttribute(WebConstants.CSRF_TOKEN, ""));
                }
                ((ObjectNode)row).put("hasChildren", hasChildren);
                ((ObjectNode)row).put("id", keyNode.toString().replaceAll("[\\[\\]]", ""));
                
            }
            
            response.setContentType("application/json");
            final ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
            mapper.writeValue(outputStream, rowsNode);
            render(response, outputStream);
            return null;
        } catch (Exception ex) {
            log.error(null, ex);
            return null;
        }
        
    }
    
    private Object getTreeObject(String root) {
        InputStream is = getTreeInputStream(root);
        String tree = new Scanner(is).useDelimiter("\\A").next();
        Object objTree = GeneralObjectDeserializer.fromJson(tree);
        return objTree;
    }
    
    private InputStream getTreeInputStream(String root) {
        //Making it compatible with the jquery tree view in use by Portal in Liferay
        String endKey = null;
        if("0".equals(root)) {
            endKey = "[{}]";
        } else {
            endKey = "[" + root.substring(0,root.length()) + ",{}]";
        }
         
        String[] tokens = endKey.replaceAll("[\\[\\]]", "").split(",");
        int groupLevel = tokens.length;
        String startKey = "[1]";
        if(!"{}".equals(tokens[0])) {
            startKey = "[" + root + "]";
        }
        InputStream is = documentRepository.getTree(startKey, endKey, groupLevel);
        return is;
    }
    
    @InitBinder
    public void initBinder(WebDataBinder binder) {
        binder.registerCustomEditor(byte[].class, new ByteArrayMultipartFileEditor());
    }

    
    /**
     * To be consumed by swfupload.swf which is in charge of uploading multiple files
     * 
     * @param request
     * @param response
     * @return
     */
    @RequestMapping("/addBatch")
    public void addBatch(HttpServletRequest request,
            HttpServletResponse response,
            @RequestParam(value = "Filedata", required = false) MultipartFile multipartFile,
            @ModelAttribute("document") Document document,
            BindingResult result) {

        String message = "Completed";
        ControllerContext ctx = new ControllerContext(request, response);
        init(ctx);
        
        try {
            if (!isValidCsrfToken(ctx)) {
                message = error("Invalid session token");
            } else {
                uploadFile(request, multipartFile);
            }
            
        } catch (FileUploadException e) {
            log.error("FileUploadException:", e);
            message = error(e.getMessage());
        } catch (Exception e) {
            String errorMessage = e.toString();
            log.error("FileUploadException:", e);
            if (errorMessage == null) {
                errorMessage = "Internal Error. Please look at server logs for more detail";
            }
            message = error(errorMessage);
        }
        
        try {
            render(response, message.getBytes());
        } catch (Exception ex) {
            log.error(null, ex);
        }
    }
    
    /**
     * File must follow this convention: clientId_categoryId_subCategoryId_year_month_investorId.extension
     * @param req
     * @param multipartFile
     * @throws Exception
     */
    private void uploadFile(HttpServletRequest req, MultipartFile multipartFile) throws Exception {

            // Create a new file upload handler
            FileItemFactory factory = new DiskFileItemFactory();
            ServletFileUpload upload = new ServletFileUpload(factory);
            upload.setFileSizeMax(UPLOAD_MAX_FILE_SIZE);
            upload.setSizeMax(UPLOAD_MAX_TOTAL_FILES_SIZE);

            if(multipartFile == null) {
                throw new FileUploadException(getMessage("error.add", new String[] {"document"}));
            }
            
            String fileName = multipartFile.getOriginalFilename();
            String[] tokens = fileName.split("_");
            if(tokens.length != 6) {
                throw new Exception("Filename must have 6 tokens");
            }
            int clientId = Integer.parseInt(tokens[0]);
            int categoryId = Integer.parseInt(tokens[1]);
            int subCategoryId = Integer.parseInt(tokens[2]);
            String year = tokens[3];
            String month = tokens[4];
            //Use whole filename as title
            int investorId = Integer.parseInt(tokens[5].split("\\.")[0]);
            String title = fileName;
            //Using swfupload we almost always get "application/octet-stream"
            String contentType = multipartFile.getContentType();
            String guessedContentType = URLConnection.guessContentTypeFromName(fileName);
            if (guessedContentType != null) {
                contentType = guessedContentType;
            }
            String base64 = new String (Base64.encodeBase64(multipartFile.getBytes()));
            if(StringUtils.isEmpty(base64)) throw new Exception("Empty attachment");
            Attachment a = new Attachment(fileName, base64, contentType);
            Document document = new Document(title);
            document.setId(fileName);
            document.addInlineAttachment(a);
            document.setDateCreated(new Date());
            document.setClientId(clientId);
            document.setCategoryId(categoryId);
            document.setSubCategoryId(subCategoryId);
            document.setDateEffective(new SimpleDateFormat("yyyy-MM-dd", Locale.ENGLISH).parse(year + "-" + month + "-" + 1));
            document.setInvestorId(investorId);
            documentRepository.add(document);
            
            //Within Spring the below does not work
            /*
            List<FileItem> items = (List<FileItem>) upload.parseRequest(req);
            Iterator<FileItem> iter = items.iterator();
            while (iter.hasNext()) {
                FileItem item = iter.next();
                if (item.isFormField()) {
                    String name = item.getFieldName();
                    String value = item.getString();
                    log.debug("Form field " + name + " with value " + value);
                } else {
                    String fileName = item.getName();
                    String contentType = item.getContentType();
                    Document document = new Document(fileName);
                    String base64 = new String (Base64.encodeBase64(item.get()));
                    if(StringUtils.isEmpty(base64)) throw new Exception("Empty attachment");
                    Attachment a = new Attachment(fileName, base64, contentType);
                    document.addInlineAttachment(a);
                    document.setDateCreated(new Date());
                    documentRepository.add(document);
                }
            }
            */
    }
    /*
     * To format the message so swfupload understands it
     */
    private String error(String error) {
        return "ERROR: " + error;
    }
    
    
}

Below are some screenshots of our simple user interface (images not included here).

Why CouchDB and not a SQL database?

Relational Database Management Systems (RDBMS) are good at storing tabular data, enforcing relationships, removing duplicated information, ensuring data consistency, and the list goes on and on. There is one thing, though, that makes relational databases not ideal for distributed computing, and that is locking. The need for an alternative comes from impediments related to replication, but also from the fact that storing hierarchical structures in an RDBMS is not natural. Finally, it is difficult to manage inheritance, and schema changes can easily become a big problem when the system grows and new simple necessities emerge from business requirements. RDBMS are also slow if you want an index for every field in a table or multiple complex indexes. So if your content management system needs distributed computing and fast storage, and you can afford to give up easy normalization (for example, your documents once created are stamped with metadata available at that time, which does not change afterwards), a document database is a natural fit.

CouchDB is a NoSQL database which stores data structures as documents (JSON strings). Schema-less, based on B-trees and designed without locking (it uses Multi-Version Concurrency Control instead), it is a really fast alternative when you look for a solution to store hierarchical, non-strict-schema data, for example web pages or binary files. All this robustness is exposed through an HTTP REST API where JSON is used both to send messages to the server and to receive messages from it. This makes it really attractive for those looking for lightweight solutions. One of the trade-offs in CouchDB is that the only way to query it without creating indexes is a temporary view, and that is not an option for production, as the mapped result will not be stored in a B-tree, hurting performance on your server.

I have no other option than to consider CouchDB the logical pick for my BHUB Document Management functionality. CouchDB provides fast access to data through stored keys, and since you can create at any point a key composed of several document fields using the map functions in your views, there is no limit on the efficiency you can get out of it.

CouchDB has been engineered with replication in mind, and that means you get distributed computing on top of the advantages I already discussed above. You just run a command specifying URLs for the source and the destination server and the replication is done. By default the latest changes are favored, and if there are conflicts you will get the differences so you can apply changes that resolve the conflict. Think of any versioning system, like Subversion, for a comparison of how it works. You can of course replicate in both directions.

Document Management System with CouchDB - Second Part

In the first part we installed CouchDB and explained how to create databases and documents, update them, create attachments and run queries.

Now we will design a DMS and plan for the different functionality we will need with one example.

For simplicity we will say our documents are imported in batch, for which it makes sense to have a convention for the file names: client_cat_subcat_year_month_investor.ext. After importing the 14 documents below we can start navigating the tree. Note that in this example the importer code will replace the month with two digits, so "3" becomes "03".
1_1_1_2003_1_1.pdf
1_1_1_2003_1_2.pdf
1_1_1_2003_2_3.pdf
1_1_1_2003_2_4.pdf
1_1_1_2004_3_5.pdf
1_1_2_2005_4_6.pdf
1_1_3_2006_5_7.pdf
1_1_3_2006_6_8.pdf
1_2_4_2007_7_9.pdf
2_3_5_2008_8_10.pdf
2_3_5_2009_9_11.pdf
2_3_6_2010_10_12.pdf
2_3_6_2010_11_13.pdf
2_3_7_2011_12_14.pdf

We want to offer a tree view of our documents. First we show the available clients. When the user clicks one of them we show the categories available. Clicking one of the categories will render all subcategories, and so on. Let us walk through the example of the first document (1_1_1_2003_1_1.pdf).

Here is how to pull all clients. Note the use of {} which means "any":
nestor-nu:~ nestor$ curl -X GET "http://127.0.0.1:5984/dms4/_design/Document/_view/tree?group=true&group_level=1" -G --data-urlencode startkey='[1]' --data-urlencode endkey='[{}]'
{"rows":[
{"key":[1],"value":9},
{"key":[2],"value":5}
]}
The categories for a client (1)
nestor-nu:~ nestor$ curl -X GET "http://127.0.0.1:5984/dms4/_design/Document/_view/tree?group=true&group_level=2" -G --data-urlencode startkey='[1]' --data-urlencode endkey='[1, {}]'
{"rows":[
{"key":[1,1],"value":8},
{"key":[1,2],"value":1}
]}
The subcategories for a client category (1,1)
nestor-nu:~ nestor$ curl -X GET "http://127.0.0.1:5984/dms4/_design/Document/_view/tree?group=true&group_level=3" -G --data-urlencode startkey='[1,1]' --data-urlencode endkey='[1, 1, {}]'
{"rows":[
{"key":[1,1,1],"value":5},
{"key":[1,1,2],"value":1},
{"key":[1,1,3],"value":2}
]} 
The effective years for the client category subcategory (1,1,1)
nestor-nu:~ nestor$ curl -X GET "http://127.0.0.1:5984/dms4/_design/Document/_view/tree?group=true&group_level=4" -G --data-urlencode startkey='[1,1,1]' --data-urlencode endkey='[1,1,1,{}]'
{"rows":[
{"key":[1,1,1,"2003"],"value":4},
{"key":[1,1,1,"2004"],"value":1}
]}
The effective months for the client category subcategory year (1,1,1,"2003"). Note the quotes for 2003 as it is a String obtained from a token.
nestor-nu:~ nestor$ curl -X GET "http://127.0.0.1:5984/dms4/_design/Document/_view/tree?group=true&group_level=5" -G --data-urlencode startkey='[1,1,1,"2003"]' --data-urlencode endkey='[1,1,1,"2003",{}]'
{"rows":[
{"key":[1,1,1,"2003","01"],"value":2},
{"key":[1,1,1,"2003","02"],"value":2}
]}
The documents for the client category subcategory year month (1,1,1,"2003","01"). Note "01" instead of "1", just because our importer treats months as 2-digit values.
nestor-nu:~ nestor$ curl -X GET "http://127.0.0.1:5984/dms4/_design/Document/_view/tree?group=true&group_level=6" -G --data-urlencode startkey='[1,1,1,"2003","01"]' --data-urlencode endkey='[1,1,1,"2003","01",{}]'
{"rows":[
{"key":[1,1,1,"2003","01","1_1_1_2003_1_1.pdf"],"value":1},
{"key":[1,1,1,"2003","01","1_1_1_2003_1_2.pdf"],"value":1}
]}
So you have figured out that if we want to navigate to document 2_3_5_2009_9_11.pdf we just have to pass startkey='[2,3,5,"2009","09"]' and endkey='[2,3,5,"2009","09",{}]' with group_level=6:
nestor-nu:~ nestor$ curl -X GET "http://127.0.0.1:5984/dms4/_design/Document/_view/tree?group=true&group_level=6" -G --data-urlencode startkey='[2,3,5,"2009","09"]' --data-urlencode endkey='[2,3,5,"2009","09",{}]'
{"rows":[
{"key":[2,3,5,"2009","09","2_3_5_2009_9_11.pdf"],"value":1}
]}

If we were to build a JSON web service we just need to accept the startkey. The endkey is always an array containing the startkey with a new last element, {}; for example startkey [2,3] yields endkey [2,3,{}].

Specifically, if I use BHUB (which is just a concept around Spring Framework) a typical request and response will look like this:
http://localhost:8080/nu-app/dms/document/tree?root=2,3,5,"2009","09"&ert=json
{"rows":[
{"key":[2,3,5,"2009","09","2_3_5_2009_9_11.pdf"],"value":1}
]}

In the final part I show how to use the Ektorp Java library to implement the DMS we have been covering so far.

Document Management System with CouchDB - First Part

I will start documenting my experience using CouchDB to build a Document Management System (DMS), an important component of any Content Management System (CMS).

The first part concentrates on installing and using CouchDB in OSX and Ubuntu.

OSX

The steps below use Homebrew; alternatively you could install from sources, which I prefer in order to get the latest and greatest.

  1. Download any pending updates for OSX, then the latest version of Xcode.
  2. Install Homebrew if you do not have it yet. It is the best package manager for OSX:
    /usr/bin/ruby -e "$(curl -fsSL https://raw.github.com/gist/323731)"
    
  3. Install CouchDB following the instructions from http://wiki.apache.org/couchdb/Installation with just one command. It could take a while; if it hangs, restart it and it will continue from where it broke:
    brew install couchdb
    
  4. Start the server:
    $ couchdb
    $ curl -X GET http://127.0.0.1:5984/
    
  5. If you need to change the default port:
    sudo vi /usr/local/etc/couchdb/default.ini
    

Ubuntu

In Ubuntu I decided to build from sources to get the latest available version for my 10.10 Maverick. For 11.04 I found this also works but you have to issue 'sudo apt-get remove libmozjs185-dev' in order to build.
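
The exact steps depend on the CouchDB version; here is a sketch of a typical source build, assuming CouchDB 1.1.0 and its usual build dependencies (the version, URL and package names are illustrative):
$ sudo apt-get install build-essential erlang libicu-dev libmozjs-dev libcurl4-openssl-dev
$ curl -O http://archive.apache.org/dist/couchdb/1.1.0/apache-couchdb-1.1.0.tar.gz
$ tar xzf apache-couchdb-1.1.0.tar.gz && cd apache-couchdb-1.1.0
$ ./configure
$ make && sudo make install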

Using CouchDB

Let us start interacting with CouchDB to create a database, create a document, attach a file to it, update it and so on. We use curl to be sure we can issue the different HTTP request methods (GET, POST, PUT, DELETE).
  1. We create our Document Management System database and we confirm it was created:
    $ curl -X PUT http://127.0.0.1:5984/dms
    {"ok":true}
    $ curl -X GET http://127.0.0.1:5984/_all_dbs
    ["_replicator","_users","dms"]
    
  2. Let us create a first document. In this example we use POST instead of PUT so we get a UUID generated by CouchDB:
    $ curl -d '{
        "name":"Investor Document 11",
        "clientId": "1001",
        "createdByEmployeeId": "2",
        "reviewedByEmployeeId": "1",
        "approvedByManagerId": "21",
        "created": "2/2/2011",
        "reviewed": "2/3/2011",
        "approved": "2/4/2011",
        "investorId": "32",
        "categoryId": "2",
        "statusId": "2"
    }' -H "Content-Type: application/json" -X POST http://127.0.0.1:5984/dms
    
    Result:
    {"ok":true,"id":"296ef7cde8fe533efe0c7dded873505b","rev":"1-d5dd0fa82df07553f3a2b82947864fc6"}
    
  3. Let us attach a file to the above document. Note we need the document id and rev (below, $DMS stands for http://127.0.0.1:5984/dms):
    $ curl -X PUT -H "Content-Type: application/pdf"  --data-binary @DailyReport.pdf  $DMS/296ef7cde8fe533efe0c7dded873505b/DailyReport.pdf?rev=1-d5dd0fa82df07553f3a2b82947864fc6
    
  4. You can visually manage your CouchDB server via the Futon user interface. Just hit http://localhost:5984/_utils/ and start playing with it.
  5. Create some documents for different combinations of categoryId, clientId and investorId either from curl or from Futon.
  6. Let us start querying our DB.

    You query a View in CouchDB. Views are a combination of two functions applied to the original data, Map and Reduce (MapReduce style: Map functions generate indexes and Reduce functions aggregate the indexed rows). The Map function, as its name suggests, specifies the mapping between the document structure and the structure of the View; the Reduce function specifies how to group and aggregate the resulting data. The View is consequently just a transformation of the documents where an index is usually defined; if you need to group results, a Reduce step is applied as well.

    You must become familiar with how to write the map and reduce functions for Views. This is done using the javascript language. From Futon select the "Temporary View ..." option from the View dropdown. You have two panes now, the left for your Map function and the right for the Reduce. By default CouchDB proposes the below code, which is equivalent to "Do not use any custom key and show all values from the document". There is no transformation nor custom index at all in this case (if no key is specified CouchDB uses the document id as unique identifier). Remember the Map function generates rows containing the id, an optional key and an optional value.
    function(doc) {
      emit(null, doc);
    }
    
  7. Let us edit the function to "Use the name as key and show only clientId and investorId". When you run the view using both functions you will notice the difference. By now you should be aware that emit() accepts just two parameters, the key for an index and the value that will be returned. Of course the results come ordered by the key if one is provided. Both key and value are JSON expressions as well.
    function(doc) {
      //emit(doc.name, doc);
      if(doc.name && doc.clientId) {
        var key = doc.name;
        var value = {name: doc.name, clientId: doc.clientId, investorId: doc.investorId}
        emit(key, value);
      }
    }
    
  8. Here we use a composite key out of the clientId and the investorId so we can find the documents for that combination. Again the results are ordered first by clientId and then by investorId (see the range-query sketch after this list):
    function(doc) {
      if(doc.clientId && doc.investorId) {
        var key = [doc.clientId, doc.investorId];
        var value = {name: doc.name, clientId: doc.clientId, investorId: doc.investorId}
        emit(key, value);
      }
    }
    
  9. Save the view. The options you pick will be used in the URL to retrieve the View results. I have decided to use "common" for the design document name and "by_client_investor" for the name of the view:
    Design Document: _design/common
    View Name: by_client_investor
    

  10. Now the View is saved so we can query it at any time. The View is now "Permanent" and no longer "Temporary". Let us query it for just one key. Note that as we decided to use an array as key we need to look for something like ["1000","30"]. As you might have noticed the key contains characters that must be URL encoded, in this case %5B%221000%22%2C%2230%22%5D. Here is how the command looks:
    $ curl -X GET http://127.0.0.1:5984/dms/_design/common/_view/by_client_investor?key=%5B%221000%22%2C%2230%22%5D
    
    Alternatively you can use a clearer approach with some other curl flags:
    curl -X GET http://127.0.0.1:5984/dms/_design/common/_view/by_client_investor -G --data-urlencode key='["1000","30"]'
    
  11. Here is how you use curl to create and execute a temporary View from the command line, using categoryId as key and getting the whole document as a result of the "non existent" transformation. Note the function must be kept on a single line because literal newlines are not valid inside a JSON string:
    curl -X POST http://127.0.0.1:5984/dms/_temp_view -H "Content-Type: application/json" -d \
    '{"map": "function(doc) { if (doc.categoryId) { emit(doc.categoryId, doc); } }"}'
    
  12. Let us explore the results of the below temporary View. Here we are interested in the total of documents by category. As we are aggregating we need a Reduce function, where we take advantage of the built-in _count. Note the key in the result is null because without grouping the reduce collapses all rows into a single total:
    curl -X POST http://127.0.0.1:5984/dms/_temp_view -H "Content-Type: application/json" -d \
    '{"map": "function(doc) { if (doc.categoryId) { emit(doc.categoryId, doc); } }", "reduce": "_count"}'
    
  13. Here is how we generate the count by key, which translates to "Grouping=exact" in Futon or, as shown below, "group=true" in the HTTP request:
    curl -X POST "http://127.0.0.1:5984/dms/_temp_view?group=true" -H "Content-Type: application/json" -d \
    '{"map": "function(doc) { if (doc.categoryId) { emit(doc.categoryId, doc); } }", "reduce": "_count"}'
    
  14. We already saw how to make a View permanent while saving it from Futon. Here is how you do the same from an HTTP request. This time we are adding a View called count to a Design Document called category:
    curl -X PUT http://127.0.0.1:5984/dms/_design/category -d \
    '{
       "_id": "_design/category",
       "language": "javascript",
       "views": {
         "count": {
           "map": "function(doc) { if (doc.categoryId) { emit(doc.categoryId, doc); } }",
           "reduce": "_count"
         }
       }
    }'
    
  15. As we already saw, we can query this view at any time; note we add group=true to get the count per key:
    curl -X GET "http://127.0.0.1:5984/dms/_design/category/_view/count?group=true"
    {"rows":[
    {"key":"1","value":17},
    {"key":"2","value":1}
    ]}
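
As promised in step 8, here is a sketch of a range query against the composite-key view, pulling every investor for one client by combining startkey, endkey and the {} trick (the client id "1000" is illustrative):
curl -X GET http://127.0.0.1:5984/dms/_design/common/_view/by_client_investor -G \
  --data-urlencode startkey='["1000"]' --data-urlencode endkey='["1000",{}]'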
    

Some other examples

Here is how you would pull document information, delete its attachment (named the same as the document) using the revision number, try to delete it again and get an error, try to pull the attachment from the document and get an error, and finally pull the information again to confirm the document is still in the DB but simply has no attachments. Note that here I am accessing a production system where we use SSL with a user and password:
$ curl -k -X GET "https://user:password@example.com:6984/dms/sample.pdf"
{"_id":"sample.pdf","_rev":"1-adb7b6f2e32d73758dfa16966c1caef9","approvedOn":"2012-03-15T16:34:08.000-0400","createdByEmployeeEmail":"nestor@example.com","title":"sample.pdf","_attachments":{"sample.pdf":{"content_type":"application/pdf","revpos":1,"digest":"md5-ZRJB3hYW9LuwL2p9wjJr0g==","length":167546,"stub":true}}}
$ curl -k -X DELETE "https://user:password@example.com:6984/dms/sample.pdf/sample.pdf/?rev=1-adb7b6f2e32d73758dfa16966c1caef9"
{"ok":true,"id":"sample.pdf","rev":"2-87988b99af60e2f7cb9022b65b7565d5"}
$ curl -k -X DELETE "https://user:password@example.com:6984/dms/sample.pdf/sample.pdf/?rev=1-adb7b6f2e32d73758dfa16966c1caef9"
{"error":"conflict","reason":"Document update conflict."}
$ curl -k -X GET "https://user:password@example.com:6984/dms/sample.pdf/sample.pdf"
{"error":"not_found","reason":"Document is missing attachment"}
$ curl -k -X GET "https://user:password@example.com:6984/dms/sample.pdf"
{"_id":"sample.pdf","_rev":"1-adb7b6f2e32d73758dfa16966c1caef9","approvedOn":"2012-03-15T16:34:08.000-0400","createdByEmployeeEmail":"nestor@example.com","title":"sample.pdf"}

Logging

If you are unsure where couchdb is logging just issue the below command:
$ curl -X GET http://localhost:5984/_config/log
{"file":"/usr/local/var/log/couchdb/couch.log","include_sasl":"true","level":"info"}
You can also get the latest log lines directly from the below request:
$ curl -X GET http://localhost:5984/_log

Review

At this point you can interact with CouchDB from any language using plain REST commands. You might want to use some abstraction with an API that lets you go through CRUD operations with CouchDB without being concerned about the details of sending and parsing JSON.

In the next part we start the design of the DMS for which we will not use any specific language other than plain HTTP with the help of curl.

Monday, October 03, 2011

Tomcat 7 scans all jars for TLDs

Tomcat 7 scans all jars for TLDs; I am unsure if Tomcat 6 does the same:
INFO: At least one JAR was scanned for TLDs yet contained no TLDs. Enable debug logging for this logger for a complete list of JARs that were scanned but no TLDs were found in them. Skipping unneeded JARs during scanning can improve startup time and JSP compilation time.

Once the log level was increased to FINE in conf/logging.properties:
org.apache.catalina.core.ContainerBase.[Catalina].[localhost].level = FINE

We got:
...
FINE: No TLD files were found in [file:/opt/tomcat/webapps/nestorurquiza-app/WEB-INF/lib/org.springframework.transaction-3.0.5.RELEASE.jar]. Consider adding the JAR to the tomcat.util.scan.DefaultJarScanner.jarsToSkip property in CATALINA_BASE/conf/catalina.properties file.
Oct 3, 2011 2:05:58 PM org.apache.jasper.compiler.TldLocationsCache tldScanJar
FINE: No TLD files were found in [file:/opt/tomcat/webapps/nestorurquiza-app/WEB-INF/lib/jsr250-api-1.0.jar]. Consider adding the JAR to the tomcat.util.scan.DefaultJarScanner.jarsToSkip property in CATALINA_BASE/conf/catalina.properties file.
Oct 3, 2011 2:05:58 PM org.apache.jasper.compiler.TldLocationsCache tldScanJar
FINE: No TLD files were found in [file:/opt/tomcat/webapps/nestorurquiza-app/WEB-INF/lib/org.springframework.security.ldap-3.0.5.RELEASE.jar]. Consider adding the JAR to the tomcat.util.scan.DefaultJarScanner.jarsToSkip property in CATALINA_BASE/conf/catalina.properties file.
...

Solution

Add all the project jars to the list in catalina.properties. I will need to see a real performance impact from this issue before spending the time to fill this list out in all of our Tomcat servers.
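
For reference, a sketch of what the entry in conf/catalina.properties could look like, using the jars from the log above:
tomcat.util.scan.DefaultJarScanner.jarsToSkip=\
org.springframework.transaction-3.0.5.RELEASE.jar,\
jsr250-api-1.0.jar,\
org.springframework.security.ldap-3.0.5.RELEASE.jar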

Tomcat 7 reveals log4j memory leak

We use MDC to log certain information in all traces.

Tomcat 7 reports the following (unlike Tomcat 6, which was silent):
SEVERE: The web application [/the-app] created a ThreadLocal with key of type [org.apache.log4j.helpers.ThreadLocalMap] (value [org.apache.log4j.helpers.ThreadLocalMap@757e5533]) and a value of type [java.util.Hashtable] (value [{sessionId=5366DB999B9EA1AC4CF30BED024BA44C, remoteAddress=127.0.0.1}]) but failed to remove it when the web application was stopped. Threads are going to be renewed over time to try and avoid a probable memory leak. 

Solution

Still waiting for a resolution on a Log4j memory leak: https://issues.apache.org/bugzilla/show_bug.cgi?id=50486

Tomcat 7 JSTL Failed to parse the expression

In Tomcat 7 (v7.0.22) a method like isNew() can no longer be referenced as ${myObject.new} as before:
org.apache.jasper.JasperException: /WEB-INF/jsp/client/form.jsp (line: 5, column: 4) "${client.new}" contains invalid expression(s): javax.el.ELException: Failed to parse the expression [${client.new}]

This problem was documented a year ago, and someone might be tempted to change the code to something like ${myObject.isNew()} after realizing that works. However the latest version of jasper-el breaks for this case, which makes me think I would need to change my code again in future versions of Tomcat.

Solution

Change the code from ${client.new} to ${client['new']}

Alternative(s)

In the mailing lists I understood Tomcat 7 is less permissive, and since 'new' is not a valid Java identifier it cannot be part of an EL expression like ${client.new}. There is a flag to make Tomcat 7 more permissive, albeit rewriting the code should be preferred:
-Dorg.apache.el.parser.SKIP_IDENTIFIER_CHECK=true
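
One place to set that flag, if you go this route, is $CATALINA_BASE/bin/setenv.sh (create the file if it does not exist):
CATALINA_OPTS="$CATALINA_OPTS -Dorg.apache.el.parser.SKIP_IDENTIFIER_CHECK=true"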

I came up with an alternative, temporary (and again not recommended) solution to avoid changing the code, which is downloading the latest jasper-el from http://repo1.maven.org/maven2/org/apache/tomcat/jasper-el/6.0.33/jasper-el-6.0.33.jar or even copying the jar from a previous Tomcat installation. Just remember to remove the old jar file:
$ cd ~
$ curl http://repo1.maven.org/maven2/org/apache/tomcat/jasper-el/6.0.33/jasper-el-6.0.33.jar > jasper-el-6.0.33.jar
$ mv /opt/apache-tomcat-7.0.22/lib/jasper-el.jar .
$ cp jasper-el-6.0.33.jar /opt/apache-tomcat-7.0.22/lib/
However that did not work for some other specific and more complex EL expressions (not even 6.0.36 works with them). Here is just an example of one:
Internal Server Error org.apache.jasper.JasperException: /WEB-INF/jsp/workflow/processTaskInstance/list.jsp (line: 55, column: 20) "${serviceAgreementTypeNames.contains(processTaskInstanceDto.processDefinitionName)}" contains invalid expression(s): javax.el.ELException: Failed to parse the expression [${serviceAgreementTypeNames.contains(processTaskInstanceDto.processDefinitionName)}]
  at org.apache.jasper.compiler.DefaultErrorHandler.jspError(DefaultErrorHandler.java:42)
  at org.apache.jasper.compiler.ErrorDispatcher.dispatch(ErrorDispatcher.java:408)
  at org.apache.jasper.compiler.ErrorDispatcher.jspError(ErrorDispatcher.java:199)
  at org.apache.jasper.compiler.Validator$ValidateVisitor.checkXmlAttributes(Validator.java:1218)
  ...

Tomcat 7 TLD skipped ... is already defined

After deployment Tomcat 7 would log the below messages. Tomcat 6 did not:
INFO: TLD skipped. URI: http://java.sun.com/jstl/core_rt is already defined
Oct 3, 2011 11:45:44 AM org.apache.catalina.startup.TaglibUriRule body
INFO: TLD skipped. URI: http://java.sun.com/jstl/core is already defined
Oct 3, 2011 11:45:44 AM org.apache.catalina.startup.TaglibUriRule body
INFO: TLD skipped. URI: http://java.sun.com/jsp/jstl/core is already defined
Oct 3, 2011 11:45:44 AM org.apache.catalina.startup.TaglibUriRule body
INFO: TLD skipped. URI: http://java.sun.com/jstl/fmt_rt is already defined
Oct 3, 2011 11:45:44 AM org.apache.catalina.startup.TaglibUriRule body
INFO: TLD skipped. URI: http://java.sun.com/jstl/fmt is already defined
Oct 3, 2011 11:45:44 AM org.apache.catalina.startup.TaglibUriRule body
INFO: TLD skipped. URI: http://java.sun.com/jsp/jstl/fmt is already defined
Oct 3, 2011 11:45:44 AM org.apache.catalina.startup.TaglibUriRule body
INFO: TLD skipped. URI: http://java.sun.com/jsp/jstl/functions is already defined
Oct 3, 2011 11:45:44 AM org.apache.catalina.startup.TaglibUriRule body
INFO: TLD skipped. URI: http://jakarta.apache.org/taglibs/standard/permittedTaglibs is already defined
Oct 3, 2011 11:45:44 AM org.apache.catalina.startup.TaglibUriRule body
INFO: TLD skipped. URI: http://jakarta.apache.org/taglibs/standard/scriptfree is already defined
Oct 3, 2011 11:45:44 AM org.apache.catalina.startup.TaglibUriRule body
INFO: TLD skipped. URI: http://java.sun.com/jstl/sql_rt is already defined
Oct 3, 2011 11:45:44 AM org.apache.catalina.startup.TaglibUriRule body
INFO: TLD skipped. URI: http://java.sun.com/jstl/sql is already defined
Oct 3, 2011 11:45:44 AM org.apache.catalina.startup.TaglibUriRule body
INFO: TLD skipped. URI: http://java.sun.com/jsp/jstl/sql is already defined
Oct 3, 2011 11:45:44 AM org.apache.catalina.startup.TaglibUriRule body
INFO: TLD skipped. URI: http://java.sun.com/jstl/xml_rt is already defined
Oct 3, 2011 11:45:44 AM org.apache.catalina.startup.TaglibUriRule body
INFO: TLD skipped. URI: http://java.sun.com/jstl/xml is already defined
Oct 3, 2011 11:45:44 AM org.apache.catalina.startup.TaglibUriRule body
INFO: TLD skipped. URI: http://java.sun.com/jsp/jstl/xml is already defined

Solution

Look for duplicates among the server/project jars. In my case the SpringSource JSTL jar has a dependency on the SpringSource version of the Apache standard taglibs, and excluding the latter solves the problem (it includes the same TLDs again):
<dependency>
    <groupId>javax.servlet</groupId>
    <artifactId>com.springsource.javax.servlet.jsp.jstl</artifactId>
    <version>1.2.0</version>
    <exclusions>
        <exclusion>
            <artifactId>com.springsource.org.apache.taglibs.standard</artifactId>
            <groupId>org.apache.taglibs</groupId>
        </exclusion>
    </exclusions>
</dependency>
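
If it is not obvious which dependency drags in the duplicate TLDs, the Maven dependency tree can help; the includes filter below is optional:
$ mvn dependency:tree -Dincludes=org.apache.taglibs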

Saturday, October 01, 2011

Upgrade Ubuntu Apache to latest version in available repositories

We are using Ubuntu 10.10 (Maverick) on some servers. It ships with Apache 2.2.14 and there is no repository offering an upgrade for this highly vulnerable Apache version.

The latest version of Ubuntu, still in beta, is 11.10 (Oneiric). It ships Apache 2.2.20, which includes important vulnerability fixes.

When you are in a situation like this you need to look for available Debian repositories. A good place to search for them is http://repogen.simplylinux.ch/

From the site you will be able to obtain the sources.list file for any Ubuntu distro. Once you have the entries you need to add them locally and then run some commands.

So here is what you can do to upgrade Apache to 2.2.20 in Maverick (and probably other Ubuntu versions):
$ sudo vi /etc/apt/sources.list
...
deb http://us.archive.ubuntu.com/ubuntu/ oneiric main
...
$ sudo apt-get update
$ sudo apt-get install apache2
$ apache2 -v
Server version: Apache/2.2.20 (Ubuntu)
Server built:   Sep  6 2011 18:40:05
$ sudo vi /etc/apt/sources.list
...
#comment it out or delete it completely
#deb http://us.archive.ubuntu.com/ubuntu/ oneiric main
...
$ sudo apt-get update
