Friday, September 30, 2011

Reusing Talend ETL Jobs

The question of how to reuse Talend jobs comes up in the forums all the time. Here I demonstrate, with a proof of concept, how this can be achieved using lightweight JSON HTTP requests.

You can find the source code in here.

If you prefer not to hack into the code then just keep on reading for an explanation.

Build a job (retrieve_stock) like the below which basically uses a Yahoo API to retrieve data about a security symbol:

The job uses a tFileInputJson:
//Basic Settings
context.url + "%22" + context.symbol + "%22"

... and a tJavaFlex:
//Advanced Settings/Import
import org.json.simple.JSONObject;
import org.json.simple.JSONArray;
import org.json.simple.parser.JSONParser;
//Basic Settings
//./Start Code
JSONParser parser = new JSONParser(); 
//./Main Code
JSONObject json = (JSONObject) parser.parse(row1.json);

As you already noticed you need to declare a context with two variables like below (showing default values):
url: "*"
symbol: "AAPL"

For those who like screenshots ;-)

All we are doing is getting the JSON from the remote URL (built out of context parameters) and printing it to stdout. The URL will remain fixed in this example but I use a context variable to show it could change in the future.
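The URL expression from the tFileInputJson settings can be sketched outside Talend as follows. This is only an illustration: `buildQuoteUrl` and the base URL are hypothetical (the real default of context.url is not shown here); only the `%22` wrapping of the symbol comes from the job.

```javascript
// Hypothetical helper mirroring: context.url + "%22" + context.symbol + "%22"
// %22 is the URL-encoded double quote, so the symbol travels quoted.
function buildQuoteUrl(baseUrl, symbol) {
  return baseUrl + '%22' + symbol + '%22';
}

// Placeholder base URL, NOT the one used by the job.
const exampleBase = 'http://example.invalid/query?symbol=';
console.log(buildQuoteUrl(exampleBase, 'GOOG'));
```

Passing a different symbol is the equivalent of overriding the context parameter from the command line.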

Export the job as "Autonomous", uncompress the zip and run the wrapper shell/batch. I am working on Windows this time, so we will need to modify the batch file by adding @ECHO OFF at the beginning of the file; otherwise the batch commands themselves are echoed to stdout together with the JSON.

We can now run our Talend job from the command line for a different symbol, for example Google (GOOG):
C:\etl\releases\retrieve_stock_0.1\retrieve_stock>retrieve_stock_run.bat --context_param symbol=GOOG

Here is a typical response
cationreturnedforsymbolchangedinvalid":null,"DaysRange":"519.41 - 537.30","MoreInfo":"cnprmiIed","AnnualizedGain":null,"Change_PercentChange":"-1.34 - -0.25%","DaysRangeRealtime":"N\/A - N\/
e":"9\/29\/2011","TwoHundreddayMovingAverage":"543.247","AskRealtime":"620.00","DividendPayDate":null,"PercentChange":"-0.25%","YearRange":"473.02 - 642.96","symbol":"GOOG","Change":"-1.34",
"PercentChangeFromFiftydayMovingAverage":"-1.17%","HoldingsGainPercent":"- - -","Notes":null,"HoldingsGain":null,"YearHigh":"642.96","Symbol":"GOOG","AfterHoursChangeRealtime":"N\/A - N\/A",
"HoldingsGainPercentRealtime":"N\/A - N\/A","MarketCapitalization":"170.3B","BidRealtime":"485.00","LastTradePriceOnly":"527.50","PERatio":"19.08","EPSEstimateNextQuarter":"10.04","MarketCap
Realtime":null,"AverageDailyVolume":"3820260","PercentChangeFromYearLow":"+11.52%","TickerTrend":"&nbsp;======&nbsp;","LastTradeWithTime":"4:00pm - <b>527.50<\/b>","ChangeFromYearHigh":"-115
0","DaysValueChange":"- - -0.25%","HighLimit":null,"TradeDate":null,"OneyrTargetPrice":"719.68","ChangeRealtime":"-1.34","YearLow":"473.02","ExDividendDate":null,"EPSEstimateNextYear":"41.98
gsGainRealtime":null,"PEGRatio":"0.79","Name":"Google Inc.","Commission":null,"ChangePercentRealtime":"N\/A - -0.25%","DaysValueChangeRealtime":"N\/A - N\/A","LastTradeRealtimeWithTime":"N\/
A - <b>527.50<\/b>","HoldingsValueRealtime":null,"EarningsShare":"27.719","PriceBook":"3.28","ChangeinPercent":"-0.25%","SharesOwned":null,"ShortRatio":"1.60","PriceSales":"5.12"}
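A caller would typically parse this output and pick the fields it needs. Below is a minimal sketch; the sample string is a shortened, hypothetical stand-in for the job output (the field names and values are taken from the GOOG response above, the flat shape is an assumption) and `extractQuote` is a name made up for this example.

```javascript
// Hypothetical, shortened stand-in for the job output.
const sampleOutput = '{"symbol":"GOOG","LastTradePriceOnly":"527.50","YearHigh":"642.96","YearLow":"473.02"}';

// Parse the JSON text and convert the numeric strings the caller cares about.
function extractQuote(jsonText) {
  const quote = JSON.parse(jsonText);
  return {
    symbol: quote.symbol,
    lastPrice: parseFloat(quote.LastTradePriceOnly),
    yearHigh: parseFloat(quote.YearHigh),
    yearLow: parseFloat(quote.YearLow)
  };
}

console.log(extractQuote(sampleOutput));
```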

This job is meant to be reused by other jobs. If at any time the way the symbol is retrieved changes, we only need to change the logic in one place. After that we just need to export the retrieve_stock job and deploy it on our server.

We could reuse the retrieve_stock job by using a tSystem component to invoke the command, but I am going to go a step further and propose something else.

It is well known that Java uses fork() to invoke an external command, which means the whole JVM heap memory is duplicated when the process runs. This is of course not efficient. I have posted a solution to this issue before.

So the proposal here is to run a local server that executes local talend command line scripts. The output will be JSON. As you have already figured I am advocating here to use JSON as a lightweight data structure that can be used to maintain the communication channel between different jobs.

Using a nodejs server is better than using Jetty or any other java container for just running shell commands. In my tests the memory footprint for nodejs is really low and the performance is similar to a java container.

After you set up your server you should get the same response after hitting a url like the below:

What happens if the amount of data generated by one job is too big? In that case I recommend using a shared resource like a file or a DB (or both, as in the case of sqlite ;-). If a job works by generating a file or DB tables, the interface will of course need to be documented, but you can still invoke it via a REST call like the one explained here, returning the results of its execution back to the caller (the parent job).

The job we will use to invoke retrieve_stock actually uses the same pattern as you see below:

You will be able to hit a url like the below and your job will return the AAPL price:

As you can see at the end a job will call external jobs using the same JSON response strategy.

I am not including any transformations in these examples because the whole purpose of this post is to discuss alternatives when it comes to inter-job communication in Talend. Here is what this architecture is allowing me to do:
  1. Projects can be isolated and so better maintained in version control systems using the (free and open source) TOS version
  2. There is a good separation of concerns when it comes to building an ETL. You do not need to be a Java developer: Talend also supports Perl, and even if you build Java projects the language is only used in a scripting fashion, so there is no overhead of unneeded OOP for ETL. (Many developers are tempted to push data logic into their Java code once they have the luxury of working from Java and using Talend jobs written in Java.) The ETL developer can test absolutely everything without merging jars, troubleshooting class loading issues, etc.
  3. Reusability.
  4. Deployment is easy, just uncompressing a zip file.
  5. Release is not hard if you keep the versioned zip file in a repository. Of course this should be so much better but unfortunately the Open Source version is still not providing maven integration for example. It would be great to be able to run 'mvn release' and get the project tagged.
  6. Interoperability: For example a job written in version 4 could interact with others written in version 5 as the only interface between them is an HTTP JSON Service call.


You are working with shell commands, which can be pretty destructive if you do not ensure the JSON server only runs locally, bound to a loopback IP. You will need a more robust server implementation if you violate that rule or if you cannot guarantee that your server will not be hosting any other service through which a different user could run malicious code.
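As an extra guard on top of binding to a loopback IP, the server could also reject any request whose client address is not a loopback one. This is a minimal sketch; `isLoopbackAddress` is a hypothetical helper, not part of the server shown in this post.

```javascript
// Hypothetical guard: returns true only for loopback client addresses.
// '::ffff:127.0.0.1' is how an IPv4 loopback client appears on a dual-stack socket.
function isLoopbackAddress(address) {
  return address === '::1' ||
         /^127\.\d{1,3}\.\d{1,3}\.\d{1,3}$/.test(address) ||
         /^::ffff:127\.\d{1,3}\.\d{1,3}\.\d{1,3}$/.test(address);
}

// Possible use inside an http request handler:
//   if (!isLoopbackAddress(req.socket.remoteAddress)) {
//     res.writeHead(403);
//     res.end();
//     return;
//   }
console.log(isLoopbackAddress('127.0.0.1'), isLoopbackAddress('192.168.1.10'));
```

This does not replace proper authentication; it only narrows who can reach the command runner.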


In case you did not notice the below two lines are equivalent. The second is URL encoded. We need that to pass the whole command as a parameter from a query string or a POST request.
C:\etl\releases\retrieve_stock_0.1\retrieve_stock\retrieve_stock_run.bat --context_param symbol=AAPL
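The encoding step can be reproduced with the standard `encodeURIComponent` function. In this sketch the base URL is an assumption (localhost and port 8088, as used elsewhere in this post); the command is the one above.

```javascript
// The raw command, exactly as typed on the command line
// (backslashes are doubled only because this is a JavaScript string literal).
const cmd = 'C:\\etl\\releases\\retrieve_stock_0.1\\retrieve_stock\\retrieve_stock_run.bat --context_param symbol=AAPL';

// encodeURIComponent escapes ':', '\', ' ' and '=' so the whole command
// fits in a single query string parameter.
const encoded = encodeURIComponent(cmd);

// Assumed base URL of the local command server.
const requestUrl = 'http://localhost:8088/?cmd=' + encoded;

console.log(requestUrl);
console.log(decodeURIComponent(encoded) === cmd); // true
```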

A nodejs server to run shell commands

WARNING: Since I initially posted about this the JDK patched the issue and there is no need to use a server like this to run commands from a JVM.

I was tempted to use Jetty as a lightweight server to resolve the JVM fork() memory problem, as I already posted; however nodejs provides a lighter and so, from my simplistic design point of view, better alternative.

Below is the code for such a server:
/*
** shell-server.js returns json response with the stdout and stderr of a shell command
** @Author: Nestor Urquiza
** @Date: 09/29/2011
*/

/* Dependencies */
var http = require('http'),
    url = require('url'),
    exec = require('child_process').exec;

/* Server Config */
var host = "", // set this to your loopback IP so the server is only reachable locally
    port = "8088",
    thisServerUrl = "http://" + host + ":" + port;

/* Main */
http.createServer(function (req, res) {
  req.addListener('end', function () {
    var parsedUrl = url.parse(req.url, true);
    var cmd = parsedUrl.query['cmd'];
    var async = parsedUrl.query['async'];
    var result;

    res.writeHead(200, {'Content-Type': 'text/plain'});

    if( !cmd ) {
      result = '{"stdout":"' + '' + '","stderr":"' + 'cmd is mandatory' + '","cmd":"' + cmd + '"}';
      res.end(result + '\n');
      return;
    }

    var child = exec(cmd, function (error, stdout, stderr) {
      // stdout is embedded unquoted: the command is expected to print valid JSON
      var result = '{"stdout":' + stdout + ',"stderr":"' + stderr + '","cmd":"' + cmd + '"}';
      // in async mode the response was already sent before the command finished
      if( async != "true" ) {
        res.end(result + '\n');
      }
    });

    if( async == "true" ) {
      // answer right away without waiting for stdout nor stderr
      result = '{"stdout":"async request' + '' + '","stderr":"' + '' + '","cmd":"' + cmd + '"}';
      res.end(result + '\n');
    }
  });
  // the query string is all we need: discard the body so 'end' is emitted
  req.resume();
}).listen(port, host);
console.log('Server running at ' + thisServerUrl );
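Building the response by string concatenation breaks as soon as stderr or the command contains a double quote or a newline. A possible alternative, sketched here with a hypothetical `buildResult` helper, is to let `JSON.stringify` do the escaping and to embed stdout as nested JSON only when it actually parses.

```javascript
// Hypothetical replacement for the string concatenation in the exec callback.
function buildResult(stdout, stderr, cmd) {
  let parsedStdout;
  try {
    // Jobs are expected to print JSON; keep it as a nested value when they do.
    parsedStdout = JSON.parse(stdout);
  } catch (e) {
    // Otherwise fall back to the raw text.
    parsedStdout = stdout;
  }
  // JSON.stringify escapes quotes, backslashes and newlines.
  return JSON.stringify({ stdout: parsedStdout, stderr: stderr, cmd: cmd });
}

console.log(buildResult('{"symbol":"AAPL"}', 'warning: "quoted" text\n', 'retrieve_stock_run.bat'));
```

The trade-off is a slightly different contract: commands that print plain text (like a user listing) come back as a JSON string instead of producing an invalid document.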

Once the server is running you can hit the below URL to get a list of the users logged in to an OSX/Linux/Unix box:

Here is how to run a sleep command and a touch command demonstrating the usage of async=true, which runs the commands but does not wait for stdout or stderr. You can see the response comes back almost instantaneously; however the file is touched 10 seconds later:
$ ls --full-time /tmp/here
-rw-r--r-- 1 dev dev 0 2014-03-06 16:17:07.546398632 -0500 /tmp/here
$ date
Thu Mar  6 16:17:08 EST 2014
$ curl "http://localhost:8088/?cmd=sleep%2010;%20touch%20/tmp/here&async=true"
{"stdout":"async request","stderr":"","cmd":"sleep 10; touch /tmp/here"}
$ date
Thu Mar  6 16:17:12 EST 2014
$ ls --full-time /tmp/here
-rw-r--r-- 1 dev dev 0 2014-03-06 16:17:20.602399455 -0500 /tmp/here

Development environment

If you are running Windows you should download the node executable and run the server as:
node c:\shell-server.js

If you are running OSX/Linux/Unix you have to install node, make the script executable and run the below:

Production deployment

You need to be sure the server runs as a daemon and that it restarts if it fails to serve. On Debian/Ubuntu with upstart and monit you can follow these steps:
$ sudo vi /opt/nodejs/shell-server.js
$ sudo vi /etc/init/shell-server.conf
description "shell server runs a command specified by HTTP GET param 'cmd'"
author      "admin"

start on startup
stop on shutdown

script
    #export HOME="/root"
    #exec sudo -u admin /usr/local/bin/node /opt/nodejs/shell-server.js
        #exec su -c "/usr/local/bin/node /opt/nodejs/shell-server.js" dev
    exec start-stop-daemon --start -c dev --exec /usr/local/bin/node /opt/nodejs/shell-server.js
end script
$ sudo vi /etc/monit/monitrc 
# shell-server

check host shell-server with address
start = "/sbin/start shell-server"
stop = "/sbin/stop shell-server"
if failed port 8088 protocol HTTP
  request /
  with timeout 10 seconds
then restart
group server
$ sudo monit reload
$ sudo vi /opt/nodejs/shell-server.js
$ wget http://localhost:8088 -O -
Note that upstart will log stdout and stderr to /var/log/upstart/shell-server.log

Tuesday, September 27, 2011

Mediawiki email notifications for any changes

I have to admit I love the simplicity of the MoinMoin wiki, but more than anything I think it is really developer oriented.

Mediawiki on the other hand is more oriented to a general audience. It is a great product which, for developers and other technical teams (my area of expertise), lacks an important concept: people should be able to get alerts for any change even if they did not visit the page after a previous change was notified. If you want to do something like that in Mediawiki you need to add the email addresses in a configuration file. However the email will not contain the diff itself, just a link to visit the diff. Not ideal.

The first thing you will try in Mediawiki to get notifications about a page update is to watch that page. Every time you get an email notification you must click through to the page, otherwise you will stop getting new updates. Updates will resume once you have visited the page. The steps to get email notifications are:
  1. Go to "My Preferences | Email". You might have a link asking you to activate your email. Click it or you will never get an email.
  2. Once you have done it you will receive the first email, and after clicking on the link, if you come back to "My Preferences | Profile | Email Options" you will see something like "Your e-mail address was authenticated on 12 August 2010 at 00:37."
  3. Go to "My Preferences | Watchlist" to set extra options like automatically getting subscribed to pages you create.
  4. Now you can monitor any page by clicking on the "watch" link. In the monitored page the link should change from "watch" to "unwatch".

While you can subscribe to an RSS feed you might prefer email.

Fortunately there is a way around this using a freely available python application called rss2email.

This will send emails with the typical diff format but using colors to highlight conflicts, what is new and what has been changed/removed. See below an example:

Below are the steps to install and configure such application:
$ sudo apt-get remove rss2email
$ cd
$ tar xvf rss2email-2.70.tar.gz 
$ sudo cp -R rss2email-2.70 /opt/
$ sudo mv
$ sudo chown -R nestorurquizaadmin:nestorurquizaadmin /opt/rss2email-2.70/
$ cd /opt/rss2email-2.70/
$ sudo vi 
$ r2e new
$ r2e add
$ r2e run --no-send #drop the flag if you want to receive emails for the existing entries right away; keep it to be emailed only about new changes
$ crontab -e
# Send wiki changes to developers every 10 minutes
*/10 * * * * cd /opt/rss2email-2.70;./r2e run
I have found that ~/.rss2email/feeds.dat can get corrupted, in which case you will get errors like:
Traceback (most recent call last):
  File "", line 940, in
    elif action == "reset": reset()
  File "", line 874, in reset
    feeds, feedfileObject = load()
  File "", line 487, in load
    feeds = pickle.load(feedfileObject)
EOFError
Following all the steps above will of course reinstall rss2email, but you will need to provide again all the configuration you have added. Hence it is so important that you keep track of it. Remember: the palest ink is better than the best memory.

Thursday, September 22, 2011

Securing your Apache SSL site

The default Apache SSL configuration accepts the weak RC4+RSA cipher and SSL v2, both of which are vulnerable.

Here is what you have to do to make it secure.

SSLProtocol all -SSLv2
SSLHonorCipherOrder on

If you are in doubt you can use the free ssllabs service to find out if your SSL server is secure enough.

You will be amazed how many websites are vulnerable to man-in-the-middle (MITM) attacks just because some people still think it is enough to buy a signed certificate. What is perhaps even sadder is that some people were surprised by the recent DigiNotar hack, but if you actually run the test for their site you will see it rated as "D" because it accepts weak ciphers and still supports insecure SSL 2.0. At the time of this writing that is still the case. Below are the results I just got:

Please do yourself a favor and make sure your website is hosted in an "A" rated SSL host.

Thinking in Security after OWASP AppSec USA 2011

Just came back from Minneapolis after two days of application security training where we went through several tools that can be used to find out vulnerabilities in Web Applications.

OWASP-WTE is a Ubuntu distribution packaging several open source utilities used to perform what is called Penetration Testing (PenTest). The training focused on basic concepts about HTTP specification that any security tester should know, it presented the most common attacks and the available tools and manual procedures the tester is supposed to master.

Here are some reflections that I have come up with after these two days.

In my current project I have gone through the top ten application security risks, I have used skipfish and websecurity, and I have documented at least one of my experiences using this product to PenTest Liferay. We have done the same for our BHUB based application; however, in terms of automated tools it is never enough. What one tool finds others will miss and vice versa.

Dealing with false positives is really annoying, but in a world where the number of threats only increases we have no other option than going through this practice on a regular basis.

If you are hosting a web application go through the OWASP Testing Guide. Web apps should be prepared to live in the wild and that is as important as hardening the OS.

Perhaps the most forgotten point related to security is monitoring. Trying your best with dozens of tools and manual hacking attempts is a must do however that is not enough. Controls must be put in place and your server logs are full of useful information you should analyze looking for patterns, eliminating false positives and hopefully automating blocking when a threat is identified.

The PenTest individual is someone who must be willing to script, to be a hacker, a programmer, a human being who knows there is a big responsibility in the job s(he) does. S(he) might be saving the company from failure after all.

The skills for such a person go beyond being tech savvy. Discipline and persistence are a must.

Any additional effort you can put into protecting your web application will be worth it, but application security is just the tip of the iceberg: ultimately it will always rely on some credentials for a user to gain access to certain resources, and the credentials can be obtained even in applications for which an exploit has not yet been spotted.

Social engineering is one example. Take just the real life example of an employee from a security related company who dared to transmit a password via email. Ultimately your company is as secure as the most careless of its employees.

Phishing is another good example. Look here and here for a couple of posts I have made in the past related to twitter hacking attempts.

I have left Minneapolis convinced that like in any other aspect of our mortal life there is no silver bullet. For sure there is none when you try to implement security.

Wednesday, September 21, 2011

Shell processes from Java and the infamous OutOfMemory

If you run shell processes from a Java Application Server you can very easily run out of heap memory, because Java will invoke a fork() system call which duplicates the parent memory (the current JVM memory in use) to be able to run the child (the command you are trying to run). Apparently this is supposed to be corrected in version 8. Is it?

Here is a workaround for this issue. Basically it relies on a war file containing a servlet that accepts regular GET parameters like "cmd", containing the complete command to run. It returns a JSON response with the stderr and stdout content. The war file needs to be deployed in an application server with a really small footprint to prevent the child process from causing an OutOfMemory again.

Note that I purposely do not use any special jar files because that way I keep memory usage to a minimum. Do not use a JSON parser for this, just build the response string yourself. Simpler is better ;-)

import java.io.*;
import javax.servlet.*;
import javax.servlet.http.*;

public class ShellServlet extends HttpServlet {
    public void doGet(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        String cmd = request.getParameter("cmd");
        StringBuilder stdout = new StringBuilder();
        StringBuilder stderr = new StringBuilder();
        if( cmd != null && cmd.length() > 0 ) {
            try {
                Process process = new ProcessBuilder(cmd.split(" ")).start();
                // Collect the standard output of the command
                InputStream is = process.getInputStream();
                InputStreamReader isr = new InputStreamReader(is);
                BufferedReader br = new BufferedReader(isr);
                String line;
                while ((line = br.readLine()) != null) {
                    stdout.append(line);
                }
                // Collect the standard error of the command
                InputStream errorInputStream = process.getErrorStream();
                isr = new InputStreamReader(errorInputStream);
                br = new BufferedReader(isr);
                while ((line = br.readLine()) != null) {
                    stderr.append(line);
                }
            } catch (Exception e) {
                stderr.append(e.getMessage());
            }
        }
        // Build the response string by hand to avoid any extra jar
        StringBuilder sb = new StringBuilder();
        sb.append("{");
        sb.append("'stdout':'" + stdout + "',");
        sb.append("'stderr':'" + stderr + "',");
        sb.append("'cmd':'" + cmd + "'");
        sb.append("}");
        PrintWriter out = response.getWriter();
        out.println(sb.toString());
    }
}

For a request like:

You will get (provided the notifier.png file exists in the working directory) something like:
{'stdout':'-rw-r--r--  1 nestor  staff  886 Oct 18  2010 notifier.png','stderr':'','cmd':'ls -al notifier.png'}

Probably your best bet in Java would be Jetty because it is really lightweight. Simply download the zip file, uncompress it and follow the below steps:
$ vi etc/jetty.xml #change port to 8088
$ java -jar start.jar & 

Edit: A better alternative for a server to run local commands would be to build a lighter nodejs service like I explain here.

Mediawiki Error creating thumbnail: sh: convert: No such file or directory

The error below refers to a non-existent ImageMagick executable:
Error creating thumbnail: sh: /usr/local/bin/convert: No such file or directory

After installing the executable my MediaWiki installation was still complaining. I fixed the problem using action=purge on the page, for example:

Saturday, September 17, 2011

error client ip File does not exist: /etc/apache2/htdocs after cloning

We cloned an ESX VM today to take advantage of all configurations in there. I went ahead and removed all unnecessary services and just left apache however something was not working:
[error] [client] File does not exist: /etc/apache2/htdocs

I knew this was not a configuration issue, as in the original VM Apache does run without complaining.

It ended up being something about relative and absolute paths. For some reason Apache was failing to find the sites-enabled directory when using relative paths. A correction in /etc/apache2/apache2.conf did the trick:
$ sudo vi /etc/apache2/apache2.conf 
#Include sites-enabled/
Include /etc/apache2/sites-enabled/
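Apache resolves a relative Include argument against its ServerRoot, so the same directive can point to different directories depending on that setting, while an absolute path does not. Whether a different ServerRoot was the actual cause on the cloned VM is not established here. The sketch below only illustrates the resolution rule; `resolveInclude` and the path /some/other/root are hypothetical.

```javascript
const path = require('path');

// Resolve an Include argument the way a server-root-relative lookup would:
// absolute paths are used as-is, relative ones are joined to the server root.
function resolveInclude(serverRoot, includeArg) {
  return path.posix.resolve(serverRoot, includeArg);
}

console.log(resolveInclude('/etc/apache2', 'sites-enabled/'));
console.log(resolveInclude('/some/other/root', 'sites-enabled/'));
console.log(resolveInclude('/some/other/root', '/etc/apache2/sites-enabled/'));
```

Only the second call ends up outside /etc/apache2, which is the kind of mismatch an absolute Include avoids.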

Friday, September 16, 2011

monit: fatal: /usr/sfw/lib/ wrong ELF class: ELFCLASS32

The Monit installation on a 64-bit Solaris 10 5/9 box went fine but Monit was not working:

monit: fatal: /usr/sfw/lib/ wrong ELF class: ELFCLASS32

A simple ldd showed the library was not 64-bit:
# ldd /usr/local/bin/monit
 => /usr/sfw/lib/  - wrong ELF class: ELFCLASS32
 => /lib/64/
 => /lib/64/
 => /lib/64/
 => /lib/64/
 => /lib/64/
 => /lib/64/
 => /usr/sfw/lib/  - wrong ELF class: ELFCLASS32
 => /usr/sfw/lib/  - wrong ELF class: ELFCLASS32
 => /lib/64/
 => /lib/64/
 => /lib/64/
 => /lib/64/
 => /lib/64/
 => /lib/64/
 => /lib/64/
 => /lib/64/
 => /lib/64/

Here is how you solve this kind of issue on your 64-bit Solaris box:
export LD_LIBRARY_PATH=/usr/sfw/lib/64:$LD_LIBRARY_PATH

However you had better use crle in this case, as you want the change to affect your whole system. After all this is a 64-bit machine, isn't it?

crle -64 -u -l /usr/sfw/lib/64
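The "wrong ELF class" message comes from the fifth byte of the library file (EI_CLASS), which is 1 for ELFCLASS32 and 2 for ELFCLASS64. The following sketch shows how that byte can be inspected; `elfClass` is a hypothetical helper and the file path in the usage comment is a placeholder.

```javascript
// Returns 32 or 64 for an ELF file header, or null when the bytes are not ELF.
// Byte 4 (EI_CLASS) is 1 for ELFCLASS32 and 2 for ELFCLASS64.
function elfClass(header) {
  const isElf = header.length >= 5 &&
    header[0] === 0x7f && header[1] === 0x45 && // 0x7f 'E'
    header[2] === 0x4c && header[3] === 0x46;   // 'L' 'F'
  if (!isElf) return null;
  if (header[4] === 1) return 32;
  if (header[4] === 2) return 64;
  return null;
}

// Usage against a real file (placeholder path):
//   const fs = require('fs');
//   const fd = fs.openSync('/path/to/library.so', 'r');
//   const buf = Buffer.alloc(5);
//   fs.readSync(fd, buf, 0, 5, 0);
//   fs.closeSync(fd);
//   console.log(elfClass(buf));
console.log(elfClass(Buffer.from([0x7f, 0x45, 0x4c, 0x46, 1])));
```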

Monday, September 12, 2011

Allowing changes in subversion log messages or comments

After you commit a change to a subversion repository you have the option to change the comment you used:
svn propset --revprop -r 15125 svn:log "Editing this comment with more detail, blah, blah ..."

There is a hook script template that you could use to allow this but there is a reason why the default template should not be used as is. It is dangerous to be able to affect good comments with garbage if you just make a mistake with the version number but the most dangerous of all is that you could change comments done by a different user!!! From a GUI like Eclipse or Netbeans it is harder to make a mistake like the first one of course, however you should protect subversion so log changes are restricted to the owner of the commit.

Use the below to achieve it (Linux, if Windows translate to DOS)
$ sudo vi /var/local/svn/

owner=`svnlook author -r "$REV" "$REPOS"`
if [ "$ACTION" = "M" -a "$PROPNAME" = "svn:log" -a "$owner" = "$USER" ]; then exit 0; fi

echo "Changing revision properties other than svn:log is prohibited and you must be the owner($owner) to change $REPOS@$REV" >&2
exit 1 
$sudo chmod +x  /var/local/svn/
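The decision taken by the hook is a conjunction of three conditions. The same rule is written below as a small function so that it can be checked in isolation; `mayChangeRevprop` and the user names are hypothetical, the conditions mirror the shell test.

```javascript
// Mirrors the shell test: allow only a Modify ("M") of svn:log
// requested by the same user who authored the revision.
function mayChangeRevprop(action, propName, owner, user) {
  return action === 'M' && propName === 'svn:log' && owner === user;
}

console.log(mayChangeRevprop('M', 'svn:log', 'alice', 'alice')); // true
console.log(mayChangeRevprop('M', 'svn:log', 'alice', 'bob'));   // false
console.log(mayChangeRevprop('D', 'svn:log', 'alice', 'alice')); // false
```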

Phishing Attack: Using redirection

I have noticed an escalation in phishing attempts coming to my gmail account. In 4 days I got 3 emails that managed to pass the spam protection. They all claimed they were my twitter friends and that they found someone faking my account, my twitter picture, twitter avatar and what not.

I always inspect the URL before clicking because I want to explore the vulnerabilities (of course the safest thing to do is just to report anything suspicious as spam). So I took a look at them and they were all referring to well-known websites that are "offering free redirection services". I hope this is just a bug in those websites. Do not hit the URLs below before reading the rest of this post. Here are the URLs:

Open Firefox and delete all your cookies. Failure to do that will probably compromise things like your google/gmail account.

If you take a look at the traces you will notice there were attempts to get some stuff from another domain. If you are logged into gmail or other google services your cookies for that domain will be compromised and the intruder could fake your session, resulting in identity theft.

Thursday, September 08, 2011

svnadmin: Couldn't perform atomic initialization database is locked

I was trying to use a CIFS (Windows) path as the svn repository but I was having locking issues:

$ sudo svnadmin load /repo/path < ~/svn_dump
<<< Started new transaction, based on original revision 1
    * adding path : projects ... done.
svnadmin: Couldn't perform atomic initialization
svnadmin: database is locked

Taking a look at man pages I gave option "nobrl" a try and that seemed to solve the problem which apparently is that our NetApp SAN does not support byte range locks.

Using the below did the trick then:
mount -t cifs // /mnt/local/path -o credentials=/root/cifs/cifs_credentials.txt,domain=COMPANYX,file_mode=0600,dir_mode=0700,uid=admin,gid=admin,nobrl

CIFS VFS: cifs_mount failed return code -13 0xc000006d NT_STATUS_LOGON_FAILURE

Using the following command to mount a CIFS (Windows) path:
mount -t cifs // /mnt/local/path -o credentials=/root/cifs/cifs_credentials.txt,domain=COMPANYX,file_mode=0600,dir_mode=0700,uid=admin,gid=admin

I was getting this error:
[307906.131366] Status code returned 0xc000006d NT_STATUS_LOGON_FAILURE
[307906.131374] CIFS VFS: Send error in SessSetup = -13
[307906.131575] CIFS VFS: cifs_mount failed w/return code = -13

While this can be caused by any permission problem, the lack of more verbose log traces sometimes makes it harder to find the exact issue.

In my case this was related to the credentials file having spaces:
username = user 
password = password

Fixing it was a matter of trimming spaces out:
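The clean-up can also be done programmatically. This is a minimal sketch assuming the file only contains key = value lines like the ones above; `trimCredentials` is a hypothetical helper, and it would also strip any leading or trailing spaces that are genuinely part of a password.

```javascript
// Strip whitespace around the first '=' and at both ends of each line.
function trimCredentials(text) {
  return text
    .split('\n')
    .map(function (line) {
      const idx = line.indexOf('=');
      if (idx === -1) return line.trim();
      return line.slice(0, idx).trim() + '=' + line.slice(idx + 1).trim();
    })
    .join('\n');
}

console.log(trimCredentials('username = user \npassword = password'));
```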

Saturday, September 03, 2011

FCKEditor for Mediawiki 1.17: WYSIWYG based on CKEditor

Mediawiki no longer supports the deprecated FCKEditor. The new CKEditor is supported through the WYSIWYG extension. I tried WYSIWYG 1.5.6 in a new installation of Mediawiki 1.17.0 but I would get nothing every time I tried to switch to the Rich Editor.

Debugging with Firebug I found a "variable not defined" error: "CKEDITOR is not defined". This was a result of the AddType directives in wiki/extensions/WYSIWYG/ckeditor/.htaccess. You either need to comment out those lines or get the file out of the way (rename it, delete it or move it somewhere else).

So below are the commands to get your WYSIWYG editor working in Mediawiki:
$ unzip
$ sudo cp -R extensions/WYSIWYG /var/www/wiki/extensions/
$ sudo chown -R www-data:www-data /var/www/wiki/
$ sudo mv /var/www/wiki/extensions/WYSIWYG/ckeditor/.htaccess /var/www/wiki/extensions/WYSIWYG/ckeditor/.htaccess.old
$ sudo vi /var/www/wiki/LocalSettings.php

Friday, September 02, 2011

Upgrading subversion

We are using WebDAV with Apache for subversion. Below are the steps I followed to migrate an old subversion repository to a brand new Ubuntu server with latest subversion.

Note that "admin" is a user who can administer subversion.

$ sudo apt-get install subversion libapache2-svn
$ sudo mkdir -p /var/local/svn/
$ sudo addgroup svn
$ sudo usermod -a -G svn www-data
$ sudo usermod -a -G svn admin
$ sudo chmod 2770 /var/local/svn/
$ sudo svnadmin create /var/local/svn/
$ sudo vi /var/local/svn/ #ACL
$ sudo mkdir /var/log/apache2/
$ sudo vi /etc/apache2/sites-available/subversion
<VirtualHost *>

 DocumentRoot /var/local/svn/

 <Location /repos/reporting>
   DAV svn
   SVNListParentPath off
   AuthType Basic
   AuthName "Subversion repository"
   SVNPath /var/local/svn/
   AuthzSVNAccessFile /var/local/svn/
   AuthUserFile /var/local/svn/
   Require valid-user
 </Location>

 <Directory "/var/local/svn/">
   Options -Indexes
 </Directory>
</VirtualHost>
$ sudo cp authz /var/local/svn/ #assuming there is an existing svn access file. Better keep it on SVN ;-)
$ sudo cp passwd /var/local/svn/ #assuming there is an existing password file. Better keep it on SVN ;-)
$ sudo htpasswd /var/local/svn/ "new username here" #to create individual users
$ sudo ln -s /etc/apache2/sites-available/subversion /etc/apache2/sites-enabled/004-subversion
$ sudo svnadmin load /var/local/svn/ < ~/file_from_command_svnadmin_dump_originalRepoPath
$ sudo chown -R www-data:svn /var/local/svn/
$ sudo chmod -R g+w /var/local/svn/
$ sudo /etc/init.d/apache2 restart

Thursday, September 01, 2011

Upgrading Bugzilla

This one was pretty straightforward (from version 2 to 4 actually)

  1. If this is a new server restore your DB from current bugzilla installation
  2. Uncompress the distro in your document root, commonly /var/www/bugzilla-4.0.2
  3. Update apache virtual host to point to that directory
  4. Replace file 'localconfig' and directory 'local' from the previous installation
  5. If needed change the db user and password in 'localconfig'
  6. If needed update urlbase in the 'data/params' file
  7. Run the below commands from the bugzilla document root directory and make sure you get no warnings nor errors. Correct them all before considering your migration completed
    $ sudo ./ 
    $ sudo ./ 
  8. Restart apache and hit the bugzilla url

Upgrading Mediawiki

This one took a good chunk of time. More than anything the order was the important part:

  1. Install a fresh copy
  2. Modify LocalSettings.php to meet your goals
  3. Restore the previous db backup
  4. Run the below to update the tables so they work with the newest code
    $ cd /var/www/mediawiki/maintenance/
    $ php update.php