Thursday, October 20, 2011

CouchDB filtered replication

One of CouchDB's greatest features is its replication, which makes distributed computing practical. It reminds me of 1999, when I first met the Erlang language while working for a telco. Erlang is made for distributed computing, and so is CouchDB, which of course is built in Erlang.

I have successfully tested this in the upcoming version 1.1.1 (built from a branch). Do not try this in 1.1.0.

The example below is based on the document that I have been discussing in the three part tutorial about building a Document Management System (DMS) with CouchDB.

Filtered or selective replication is a two-step process:
  1. First create a filter named, for example, "clientFilter" in a new design document called "replicateFilter". This sample filter rejects any document whose clientId does not match the clientId query parameter (step 2 explains what this parameter is about). Deletions always pass the filter, so documents deleted on the source are deleted from the target as well.
    curl -H 'Content-Type: application/json' -X PUT http://127.0.0.1:5984/dms4/_design/replicateFilter -d \
    '{
      "filters":{
        "clientFilter":"function(doc, req) {
          if (doc._deleted) {
            return true;
          }
     
          if(!doc.clientId) {
            return false;
          }
     
          if(!req.query.clientId) {
            throw(\"Please provide a query parameter clientId.\");
          }
     
          if(doc.clientId == req.query.clientId) {
            return true;
          }
          return false;
        }"
      }
    }'
    
  2. Create a replication document called "by_clientId". This example passes clientId=1 as a parameter to the filter we created in step 1 ("replicateFilter/clientFilter"). As you might expect, only the documents for that client end up being replicated.
    curl -H 'Content-Type: application/json' -X POST http://127.0.0.1:5984/_replicator -d \
    '{
      "_id":"by_clientId",
      "source":"dms4",
      "target":"http://couchdb.nestorurquiza.com:5984/dms4",
      "create_target":true,
      "continuous":true,
      "filter":"replicateFilter/clientFilter",
      "query_params":{"clientId":1}
    }'
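
The filter from step 1 can be exercised outside CouchDB before it is uploaded. The following is a minimal sketch, not the author's method: it assumes Node.js is available, the sample documents are hypothetical, and the loose `==` comparison is kept from the original because the replication document passes clientId as a number while a query parameter may arrive as a string. It also builds the design document body with JSON.stringify, which escapes the line breaks in the function source (strict JSON parsers reject raw line breaks inside string values).

```javascript
// Same logic as "clientFilter" in step 1, kept as an anonymous function
// expression so that toString() yields source in the same form as the post.
const clientFilter = function (doc, req) {
  if (doc._deleted) {
    return true; // deletions always pass, so they reach the target
  }
  if (!doc.clientId) {
    return false; // documents without a clientId are never replicated
  }
  if (!req.query.clientId) {
    throw("Please provide a query parameter clientId.");
  }
  // Loose equality on purpose: 1 and "1" are treated as the same client.
  return doc.clientId == req.query.clientId;
};

// Body for PUT /dms4/_design/replicateFilter, serialized as valid JSON.
const designDoc = { filters: { clientFilter: clientFilter.toString() } };
const body = JSON.stringify(designDoc);

// Hypothetical sample documents and request, for local checking only.
const req = { query: { clientId: "1" } };
console.log(clientFilter({ _id: "a", clientId: 1 }, req));    // true
console.log(clientFilter({ _id: "b", clientId: 2 }, req));    // false
console.log(clientFilter({ _id: "c", _deleted: true }, req)); // true
console.log(body.length > 0);                                 // true
```

The value of `body` could then be passed to curl with -d in place of the hand-written JSON.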
    

Deleting a replication document is how you turn off that replication. This is no different from deleting any other document: fetch it to learn its current revision, then issue a DELETE with that rev:
nestor:~ nestor$ curl -X GET http://127.0.0.1:5984/_replicator/by_clientId
{"_id":"by_clientId","_rev":"5-e177ca7f79d9ba6f91b803a2cb2abc1e","source":"dms4","target":"http://couchdb.nestorurquiza.com:5984/dms4","create_target":true,"continuous":true,"filter":"replicateFilter/clientFilter","query_params":{"clientId":1},"_replication_state":"triggered","_replication_state_time":"2011-10-20T13:09:56-04:00","_replication_id":"d8dc09e97f4948de0294260dda19fc6f"}
nestor:~ nestor$ curl -X DELETE http://127.0.0.1:5984/_replicator/by_clientId?rev=5-e177ca7f79d9ba6f91b803a2cb2abc1e
{"ok":true,"id":"by_clientId","rev":"6-0d20d90cbed22837eb3233e2bd8dfb2c"}
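
The two commands above follow a fixed pattern: read the document, then reuse its _id and _rev in the DELETE URL. As a rough sketch, assuming Node.js, the hypothetical helper below (replicationDeleteUrl is not part of CouchDB) builds that URL from a fetched replication document. Performing the HTTP request itself is left out.

```javascript
// Build the DELETE URL for a replication document read from _replicator.
// `base` is the CouchDB root URL; `doc` is the parsed body of the GET response.
function replicationDeleteUrl(base, doc) {
  return base + "/_replicator/" + encodeURIComponent(doc._id) +
    "?rev=" + encodeURIComponent(doc._rev);
}

// Values copied from the transcript above.
const fetched = {
  _id: "by_clientId",
  _rev: "5-e177ca7f79d9ba6f91b803a2cb2abc1e"
};
console.log(replicationDeleteUrl("http://127.0.0.1:5984", fetched));
// http://127.0.0.1:5984/_replicator/by_clientId?rev=5-e177ca7f79d9ba6f91b803a2cb2abc1e
```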

The same applies to getting a list of the currently defined "selective replicators". You can use a temporary view, as I show here, or create a permanent view to list all the replicators:
$ curl -X POST http://127.0.0.1:5984/_replicator/_temp_view -H "Content-Type: application/json" -d '{
  "map": "function(doc) {
            emit(null, doc);
          }"
}'
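
For the permanent-view alternative, a design document stored in the _replicator database can hold the map function. The sketch below is only an illustration under assumptions: the design document name "replicators" and the view name "all" are hypothetical, Node.js is assumed, and emit() is stubbed locally because inside CouchDB it is supplied by the view server. Unlike the temporary view above, it emits a few fields rather than the whole document. Temporary views were removed in later CouchDB releases, so a permanent view is the longer-lived option.

```javascript
// Local stand-in for the emit() that CouchDB's view server provides.
const rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// Map function: one row per replication document, keyed by its _id.
const listReplicators = function (doc) {
  emit(doc._id, {
    source: doc.source,
    target: doc.target,
    state: doc._replication_state
  });
};

// Hypothetical design document body, e.g. for
// PUT /_replicator/_design/replicators
const viewDesignDoc = {
  views: { all: { map: listReplicators.toString() } }
};

// Try the map function on the replication document from this post.
listReplicators({
  _id: "by_clientId",
  source: "dms4",
  target: "http://couchdb.nestorurquiza.com:5984/dms4",
  _replication_state: "triggered"
});
console.log(JSON.stringify(rows));
console.log(JSON.stringify(viewDesignDoc));
```

Once stored under those assumed names, the view would be queried with GET /_replicator/_design/replicators/_view/all.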

7 comments:

Hackworth said...

I have been looking at couchdb for a project I am working on. This post about filtered replication was very enlightening.
However, I am not sure CouchDB will do what I need. Here are my assumptions:
1. There are two DBs (A and B). A has some documents that we want to replicate to B.
2. Not all documents from A should be replicated to B, only the ones that the user of DB B is authorized to view.
3. The filter argument on the replication is optional, even if a filter is defined.
4. This means that the user of DB B could start a replication without the filter argument and get access to unauthorized documents.

Firstly, are any of my assumptions wrong? Secondly, is there a way to force the filter on a replication? Or is there a way to restrict user B from starting a replication (so user A can start the replication and enforce the filter)?
Is there a way, now or in the planning stages, to limit replication and access to specific documents based on the roles of the user?

Thanks
Jeff
clarke.hackworth(at)gmail.com

Nestor Urquiza said...

You will probably get a better answer from the user@couchdb.apache.org mailing list, but let me try to respond:

3. Filters can accept parameters, meaning they are not mandatory.

4. I wouldn't rely on CouchDB authorization for anything other than keeping the DB protected from the outside. So I use a middle tier where authorization is actually handled in layers. In general I do not believe it is a good idea to have your application composed of just CouchDB and JavaScript in the browser. If you follow this pattern you have no replication permission problems.

So no user starts any replication. You just instruct CouchDB to run certain replications using different filters, and they keep running until you cancel them, again from an admin account.

Meanwhile, your two users are application users and not really CouchDB users, so of course those credentials will not work to trigger or remove any replication.

If you insist on going the anemic "JS in the browser plus CouchDB only" route, I suggest you ask your question on the mailing list. People are surely working on that, if it is not ready already. I see many people going that route. (I call it anemic because MVC is a pattern that IMO needs to be respected and correctly separated. An attempt to merge layers will end up with a design that falls short on separation of concerns, multiple client views and security, for example.)

Best,
-Nestor

Edillon said...

I am finally getting a better understanding of CouchDB and I am trying to figure out how selective replication works. I now think I understand that you create a filter doc and then a replication doc, and in your example you are filtering by "clientId". So do you just create documents that have a field called clientId? I have tried setting this up but I can't get the replication to happen.

Nestor Urquiza said...

Hi Edillon,

You are correct. Not sure why it is not working for you. Are you sure you are running the version I suggested?

I forgot to post the document I am using, so I have now added a paragraph: "The example below is based on the document that I have been discussing in the three part tutorial about building a Document Management System (DMS) with CouchDB."

Hopefully following the tutorial will allow you to understand and even test the replication based on the document I am using.

Best,
-Nestor

Edillon said...

OK, I got the replication to work, but it only seems to work when I send up a new replication document. I was assuming that since I added the document to the _replicator db with continuous=true, it would replicate any document that came into the db with a clientId of 1.

Edillon said...

I figured it out. I was going into Futon and editing the continuous field manually in there, which wouldn't re-trigger the replication. When I blew away the doc and then re-posted it to the _replicator db, it set the replication to a "triggered" state and now new docs are getting replicated over. Thanks for your help.

Nestor Urquiza said...

Glad you got it working. Very powerful feature indeed.
