Thursday, September 30, 2010

ApacheDS LDAP from Spring Security and Liferay

There are so many ways you can integrate LDAP with Liferay that I think a whole book could be filled with examples. Clearly this is a consequence of the many different scenarios you might face in your company.


Showcase
The showcase I am presenting is:
1. ApacheDS hosts users and groups (roles and groups are the same on the LDAP side; in other words, we do not have the differentiation Liferay has)
2. ApacheDS is accessible from spring security
3. ApacheDS is accessible from the Liferay 5.2.3 LDAP implementation, following the user stories below.

I have already documented the ApacheDS setup. If you have not set up ApacheDS, read here.

I have also documented, at the same link, how to get Spring Security working with ApacheDS.

User Stories
Liferay LDAP authentication user stories:
1. When a user is created in LDAP, then the user can log in with his credentials when accessing Liferay, even if the user has never been created in Liferay
2. When a user is assigned to a group/role in LDAP, then after the user logs in the new group and the user-group association will be created in Liferay
3. When a user is detached from a group/role in LDAP, then after the user logs in the user-group association will be removed.

The above guarantees that we can handle the setup of users for both portlet and servlet applications in just one LDAP server.


Implementation

I thought this was going to be easy plumbing, but it turned out not to be. I posted the issue and continued investigating until I arrived at the following solution.

1. Spring Security will work only if the group contains its users (uniqueMember attribute in the group cn)
2. Liferay can work as expected only if the user contains its groups (any attribute that points to a valid group cn)
3. I have then added an extra attribute to users (ou), which basically closes a cyclic reference between users and groups.
4. See below for the configuration in Liferay. Note that I do not include the groups section, as in Liferay you must decide to import either users or groups; if you import groups you will not be able to log in with version 5.2.3, as I posted in the issues link. I do not show the import/export section either, as I neither import nor export users and roles. As said before, this showcase is precisely about leaving those tasks to LDAP alone. Performance-wise this is a good decision, BTW.


5. Of course we need to build an application that handles this cyclical reference.
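
To make that idea concrete, here is a minimal sketch of what such an application could do with plain JNDI. The DNs, the attribute names (uniqueMember on the group for Spring Security, ou on the user for Liferay) and the way the DirContext is obtained are illustrative assumptions, not the exact code I use:

import javax.naming.NamingException;
import javax.naming.directory.BasicAttribute;
import javax.naming.directory.DirContext;
import javax.naming.directory.ModificationItem;

public class LdapGroupAssigner {

    /**
     * Keeps both sides of the cyclic reference in sync: the group lists the user
     * (read by Spring Security) and the user points back to the group (read by Liferay).
     */
    public void assignUserToGroup(DirContext ctx, String userDn, String groupDn,
            String groupName) throws NamingException {
        // Group side: add the user DN to the group's uniqueMember attribute
        ModificationItem addMember = new ModificationItem(DirContext.ADD_ATTRIBUTE,
                new BasicAttribute("uniqueMember", userDn));
        ctx.modifyAttributes(groupDn, new ModificationItem[] { addMember });

        // User side: add the group name to the user's ou attribute
        ModificationItem addGroup = new ModificationItem(DirContext.ADD_ATTRIBUTE,
                new BasicAttribute("ou", groupName));
        ctx.modifyAttributes(userDn, new ModificationItem[] { addGroup });
    }
}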

Tuesday, September 28, 2010

ETL: Importing data with Talend

ETL is used for Operational Data Stores, Data Warehouses and Data Marts. ETL tools can also be handy for simple imports into existing application databases.

Importing data into your application is as important as reporting out of your application. Importing can be achieved while distributing the workload across several people.

On one end you want a BA to decide when to run an import process, which components to assemble, which data sources to use and which mappings to apply.

There is an existing model, data sources representing feeds that need to get to your model, and a mapping to make that transformation happen. ETL processes are commonly used for this task. You can do something as simple as manual SQL scripting, something more elaborate using a Rules Engine, or something even more polished using an ETL tool.

Talend has a very good tool for ETL (which JasperETL uses as well). The fact that the latest JasperETL 3.2.3 does not work that well on Mac OS X, even after tweaking, made me decide to stick with Talend. These tools can be used to:
  1. Construct a business model with a graphical interface that allows a BA to drop the general blocks, for example an Excel source file to be used in conjunction with a web service output to fill out records in an existing database.
  2. Design a Job to implement the business model blocks.
  3. Schedule Jobs.

This tutorial is about designing a Job with Talend/Jasper ETL. I am not interested here in covering points 1 and 3 as they are really not needed for our task: Importing data from Excel into an application database.

For the impatient

Create a local MySQL DB and name it myapp. Use the model from http://thinkinginsoftware.blogspot.com/2010/09/jasper-real-time-report-services.html

Download Talend ( I used TOS-All-r48998-V4.1.0RC3.zip ) and uncompress in c:\TOS-All-r48998-V4.1.0RC3

  1. Create folder c:\projects\talend to be used as workspace
  2. Open Talend (use the non-wpf exe file for example TalendOpenStudio-win32-x86.exe ) and create a project named DbImport. Use as workspace folder c:\projects\talend
  3. After Talend is done updating the project close it.
  4. Checkout http://nestorurquiza.googlecode.com/svn/trunk/talend/DbImport/ in a temporary directory
  5. Copy all files from the temporary directory to the workspace DbImport folder
  6. I have included a sample Excel employees.xls with the project. Create folder c:\projects\DbImport and drop the excel there
  7. Start Talend and open project DbImport
  8. Run PopulateAll job and confirm offices, departments and employees have been added to “myapp” database
  9. Note: even if no changes are made, after closing the IDE you will get differences in some project files. It is a good idea to always update from SVN before starting to work on a project, as other developers might commit their local project files to the repository.

Let us review in detail what I have done in this simple project.

Showcase

  1. Go ahead and create a MySQL database named “myapp”. We are going to use the simple model we created on http://thinkinginsoftware.blogspot.com/2010/09/jasper-real-time-report-services.html
  2. We want to import employees from an existing Excel spreadsheet into our new database.
  3. Our Excel import file contains de-normalized data that we need to import into our normalized tables. The columns are:
first_name
last_name
office_name
department_name

Using Talend / Jasper ETL

  1. Install Talend (I am using version 4.1.0) or Jasper ETL (I have tested version 3.2.3, which at least on Mac OS X needs a little tweak). Install it near the file system root. I will describe everything here for Windows, but you should be able to do the same on other operating systems. The important thing is to keep paths reusable across your team, so install Talend in “c:\talend”.
  2. Start the program (if using Windows XP use the executable, for example TalendOpenStudio-win32-x86.exe; the Eclipse -wpf- version flickers). Set up a connection: click on the button near “Repository” (in Talend it is the “Email” button) and provide your email (this will end up stamped in many files from now on, so use a real personal or work email). For the workspace folder use a common folder that other users can later use as well on their own machines, for example “C:\projects\talend”. You will need to hit “restart” if using Talend and changing the workspace.
  3. Select “Create a new Local project” from “Project” section and click on “Go”. Use as project name “DbImport” and as language generation “Java”. Pick the new project from the last dropdown. Click on “Open”.
  4. After you close the welcome window you should see the “Window|Perspective|Design Workspace”. Right click on Job Designs on the left and create the first Job called “PopulateLookup” with Purpose “Import from Excel to lookup MySQL tables”. This is a job that will populate the department and office tables. We need office_id and department_id for the employee table; that is why we must be sure the department and office exist in the DB beforehand.
  5. On the repository View (left panel by default) right click on Metadata/Db Connection and create a MySQL connection to the database containing the employee, department and office tables. Right click on the connection and “retrieve schema” for the three tables.
  6. Create an Excel file named c:\projects\DbImport\employees.xls containing the data in the appendix.
  7. Right click on Metadata/File Excel and point to a local Excel file (Use Path button to point to the file)
  8. Select the sheet, click next, select “Set heading row as column names”, click next and select as name “employee”
  9. Drag and drop one by one (into the job area) the department and office metadata (when prompted select tMySQLOutput)
  10. Look for the Palette (Components Library). If it is not showing up use “Window|Show View|General|Palette.”
  11. Drag and drop a tMap component from the palette “Processing” section.
  12. Right click on the employee inner square (you must select the inner square, otherwise the option will not be available) and select “Row|Main”; a line will start and will end wherever you click as the final destination component. In this case click on the tMap component.
  13. Right click on tMap, select “Row|New Output” and drop the line into department and name it outputDepartment. Do the same for Office. When prompted to get the schema from the target respond yes as that helps to see the available destination fields.
  14. Double click on the tMap and drag and drop the fields from the input to the output
  15. Run the Job from the Run tab. If there are problems the specific component will turn red, and double clicking on it will show a description of the problem. You might notice that this is the case here, as we have specified office_name instead of just name as the destination field in the tMap, so correct that and rerun.
  16. Save your job and create a second job named “PopulateEmployee” purpose “Populate table employee”
  17. Drop department and office boxes into the working area. Be sure to select type tMySQLInput
  18. Drop the employee Excel
  19. Drop a tMap
  20. Drop the employee MySQL as tMySQLOutput
  21. Create input and output connections as explained before. Use naming conventions for example inputEmployee, inputOffice, inputDepartment and outputEmployee
  22. Open the tMap and in the input panel drag and drop the inputOffice.name to inputEmployee.office_name and inputDepartment.name to inputEmployee.department_name. Here you are defining the necessary joins from input sources.
  23. Drag and drop inputOffice.id, inputDepartment.id, inputEmployee.first_name and inputEmployee.last_name into the output panel left column, right next to the destination fields.
  24. Run the project to get the data imported. Check the data in the MySQL tables.
  25. Of course both jobs are related. We want to run PopulateLookup and then later PopulateEmployee. That is why we need to create a third job now. Name it “PopulateAll”
  26. Drop two components type “tRunJob” from the palette. From the Component tab select for the first “PopulateLookup” and for the second “PopulateEmployee”
  27. Right click on the first and select “Row|Main”. Drop the line into the second sub job.
  28. Clean up the records from the database so you can see them all recreated.


  29. delete from employee;
    delete from department;
    delete from office;
    
  30. Run “populateAll” job and your data will be in the destination.

Sharing the project

Talend, and therefore JasperETL, is designed to do version control through a server. To avoid using an extra server you could use export/import (but that would be limiting):
1. To export: right click on “Business Models” and export “all” to the root folder (in our case C:\projects\) that is shared, let us say, through a Subversion repository. This will create/update “c:\projects\DbImport”. Now you can share that on SVN.

2. To import: Checkout from SVN. Go to Talend and import.

As a better option you can (at least in Talend version 4.1.0) share the whole project (which does not include any binaries). Below is the list of all files for the project in this tutorial:
|-- TDQ_Data Profiling
|   |-- Analyses
|   `-- Reports
|-- TDQ_Libraries
|   |-- Indicators
|   |-- JRXML Template
|   |-- Patterns
|   `-- Rules
|-- businessProcess
|-- businessProcessSVG
|-- code
|   |-- jobscripts
|   |-- routines
|   |   `-- system
|   |       |-- DataOperation_0.1.item
|   |       |-- DataOperation_0.1.properties
|   |       |-- Mathematical_0.1.item
|   |       |-- Mathematical_0.1.properties
|   |       |-- Numeric_0.1.item
|   |       |-- Numeric_0.1.properties
|   |       |-- Relational_0.1.item
|   |       |-- Relational_0.1.properties
|   |       |-- StringHandling_0.1.item
|   |       |-- StringHandling_0.1.properties
|   |       |-- TalendDataGenerator_0.1.item
|   |       |-- TalendDataGenerator_0.1.properties
|   |       |-- TalendDate_0.1.item
|   |       |-- TalendDate_0.1.properties
|   |       |-- TalendString_0.1.item
|   |       `-- TalendString_0.1.properties
|   `-- snippets
|-- components
|-- context
|-- documentations
|-- images
|   |-- job_outlines
|   `-- joblet_outlines
|-- joblets
|-- libs
|-- metadata
|   |-- BRMSconnections
|   |-- FTPconnections
|   |-- LDAPSchema
|   |-- MDMconnections
|   |-- SalesforceSchema
|   |-- WSDLSchema
|   |-- connections
|   |   |-- myapp_0.1.item
|   |   `-- myapp_0.1.properties
|   |-- fileDelimited
|   |-- fileEBCDIC
|   |-- fileExcel
|   |   |-- employee_0.1.item
|   |   `-- employee_0.1.properties
|   |-- fileHL7
|   |-- fileLdif
|   |-- filePositional
|   |-- fileRegex
|   |-- fileXml
|   |-- genericSchema
|   |-- header_footer
|   |-- rules
|   `-- sapconnections
|-- process
|   |-- PopulateEmployee_0.1.item
|   |-- PopulateEmployee_0.1.properties
|   |-- PopulateLookup_0.1.item
|   |-- PopulateLookup_0.1.properties
|   |-- populateAll_0.1.item
|   `-- populateAll_0.1.properties
|-- sqlPatterns
|   |-- Generic
|   |   |-- UserDefined
|   |   `-- system
|   |       |-- Aggregate_0.1.item
|   |       |-- Aggregate_0.1.properties
|   |       |-- Commit_0.1.item
|   |       |-- Commit_0.1.properties
|   |       |-- DropSourceTable_0.1.item
|   |       |-- DropSourceTable_0.1.properties
|   |       |-- DropTargetTable_0.1.item
|   |       |-- DropTargetTable_0.1.properties
|   |       |-- FilterColumns_0.1.item
|   |       |-- FilterColumns_0.1.properties
|   |       |-- FilterRow_0.1.item
|   |       |-- FilterRow_0.1.properties
|   |       |-- MergeInsert_0.1.item
|   |       |-- MergeInsert_0.1.properties
|   |       |-- MergeUpdate_0.1.item
|   |       |-- MergeUpdate_0.1.properties
|   |       |-- Rollback_0.1.item
|   |       `-- Rollback_0.1.properties
|   |-- Hive
|   |   |-- UserDefined
|   |   `-- system
|   |       |-- HiveAggregate_0.1.item
|   |       |-- HiveAggregate_0.1.properties
|   |       |-- HiveCreateSourceTable_0.1.item
|   |       |-- HiveCreateSourceTable_0.1.properties
|   |       |-- HiveCreateTargetTable_0.1.item
|   |       |-- HiveCreateTargetTable_0.1.properties
|   |       |-- HiveDropSourceTable_0.1.item
|   |       |-- HiveDropSourceTable_0.1.properties
|   |       |-- HiveDropTargetTable_0.1.item
|   |       |-- HiveDropTargetTable_0.1.properties
|   |       |-- HiveFilterColumns_0.1.item
|   |       |-- HiveFilterColumns_0.1.properties
|   |       |-- HiveFilterRow_0.1.item
|   |       `-- HiveFilterRow_0.1.properties
|   |-- MySQL
|   |   |-- UserDefined
|   |   `-- system
|   |       |-- MySQLAggregate_0.1.item
|   |       |-- MySQLAggregate_0.1.properties
|   |       |-- MySQLCreateSourceTable_0.1.item
|   |       |-- MySQLCreateSourceTable_0.1.properties
|   |       |-- MySQLCreateTargetTable_0.1.item
|   |       |-- MySQLCreateTargetTable_0.1.properties
|   |       |-- MySQLDropSourceTable_0.1.item
|   |       |-- MySQLDropSourceTable_0.1.properties
|   |       |-- MySQLDropTargetTable_0.1.item
|   |       |-- MySQLDropTargetTable_0.1.properties
|   |       |-- MySQLFilterColumns_0.1.item
|   |       |-- MySQLFilterColumns_0.1.properties
|   |       |-- MySQLFilterRow_0.1.item
|   |       `-- MySQLFilterRow_0.1.properties
|   |-- Netezza
|   |   |-- UserDefined
|   |   `-- system
|   |       |-- NetezzaAggregate_0.1.item
|   |       |-- NetezzaAggregate_0.1.properties
|   |       |-- NetezzaCreateSourceTable_0.1.item
|   |       |-- NetezzaCreateSourceTable_0.1.properties
|   |       |-- NetezzaCreateTargetTable_0.1.item
|   |       |-- NetezzaCreateTargetTable_0.1.properties
|   |       |-- NetezzaDropSourceTable_0.1.item
|   |       |-- NetezzaDropSourceTable_0.1.properties
|   |       |-- NetezzaDropTargetTable_0.1.item
|   |       |-- NetezzaDropTargetTable_0.1.properties
|   |       |-- NetezzaFilterColumns_0.1.item
|   |       |-- NetezzaFilterColumns_0.1.properties
|   |       |-- NetezzaFilterRow_0.1.item
|   |       `-- NetezzaFilterRow_0.1.properties
|   |-- Oracle
|   |   |-- UserDefined
|   |   `-- system
|   |       |-- OracleAggregate_0.1.item
|   |       |-- OracleAggregate_0.1.properties
|   |       |-- OracleCreateSourceTable_0.1.item
|   |       |-- OracleCreateSourceTable_0.1.properties
|   |       |-- OracleCreateTargetTable_0.1.item
|   |       |-- OracleCreateTargetTable_0.1.properties
|   |       |-- OracleDropSourceTable_0.1.item
|   |       |-- OracleDropSourceTable_0.1.properties
|   |       |-- OracleDropTargetTable_0.1.item
|   |       |-- OracleDropTargetTable_0.1.properties
|   |       |-- OracleFilterColumns_0.1.item
|   |       |-- OracleFilterColumns_0.1.properties
|   |       |-- OracleFilterRow_0.1.item
|   |       |-- OracleFilterRow_0.1.properties
|   |       |-- OracleMerge_0.1.item
|   |       `-- OracleMerge_0.1.properties
|   |-- ParAccel
|   |   |-- UserDefined
|   |   `-- system
|   |       |-- ParAccelAggregate_0.1.item
|   |       |-- ParAccelAggregate_0.1.properties
|   |       |-- ParAccelCommit_0.1.item
|   |       |-- ParAccelCommit_0.1.properties
|   |       |-- ParAccelDropSourceTable_0.1.item
|   |       |-- ParAccelDropSourceTable_0.1.properties
|   |       |-- ParAccelDropTargetTable_0.1.item
|   |       |-- ParAccelDropTargetTable_0.1.properties
|   |       |-- ParAccelFilterColumns_0.1.item
|   |       |-- ParAccelFilterColumns_0.1.properties
|   |       |-- ParAccelFilterRow_0.1.item
|   |       |-- ParAccelFilterRow_0.1.properties
|   |       |-- ParAccelRollback_0.1.item
|   |       `-- ParAccelRollback_0.1.properties
|   `-- Teradata
|       |-- UserDefined
|       `-- system
|           |-- TeradataAggregate_0.1.item
|           |-- TeradataAggregate_0.1.properties
|           |-- TeradataColumnList_0.1.item
|           |-- TeradataColumnList_0.1.properties
|           |-- TeradataCreateSourceTable_0.1.item
|           |-- TeradataCreateSourceTable_0.1.properties
|           |-- TeradataCreateTargetTable_0.1.item
|           |-- TeradataCreateTargetTable_0.1.properties
|           |-- TeradataDropSourceTable_0.1.item
|           |-- TeradataDropSourceTable_0.1.properties
|           |-- TeradataDropTargetTable_0.1.item
|           |-- TeradataDropTargetTable_0.1.properties
|           |-- TeradataFilterColumns_0.1.item
|           |-- TeradataFilterColumns_0.1.properties
|           |-- TeradataFilterRow_0.1.item
|           |-- TeradataFilterRow_0.1.properties
|           |-- TeradataTableList_0.1.item
|           `-- TeradataTableList_0.1.properties
|-- talend.project
`-- temp

Unfortunately SVN support is not included in the IDE. By following some steps, though, you can still share the project.

Commit the project to SVN

  1. Create “DbImport” project as explained before.
  2. Delete temp directory
  3. Import in your SVN
  4. Checkout the project from SVN
  5. Add svn:ignore for the temp directory (svn propset svn:ignore "temp" .)
  6. Commit the project.

Check out the project from SVN

  1. Create a new local “DbImport” project. Close the IDE.
  2. Outside the workspace folder checkout “DbImport” from SVN.
  3. Replace the content of the workspace “DbImport” directory with the files checked out from SVN.
  4. Open the IDE and modify the project as you wish.
  5. Close the IDE and use svn update and/or commit commands as you need.

Documentation

  1. http://sourceforge.net/projects/jasperetl/files/
  2. http://talend.dreamhosters.com/tos/user-guide-download/V402/DocumentationSet_UG&RG_40b_EN.zip
  3. Help from the GUI

Appendix


first_name  last_name  office_name  department_name
John        Smith      London       Legal
Mathew      Parker     USA          Marketing
Andrea      Polini     Rome         Sales

Saturday, September 25, 2010

Friday, September 24, 2010

Android YNotifier: A Yahoo Email Notifier

Yahoo Email Notifier (YNotifier) allows you to configure a Yahoo ID (please note you must provide the ID and not the complete email address) and password to get notified about new and unread emails.
* It checks Yahoo every 5 minutes.
* Just click the alert and land in yahoo mobile to check, reply or send new emails.
* Once installed it will start automatically every time you restart your phone.
* To stop checking emails just leave id and password empty.
* It won’t bother you with the same unread emails alert if you do not perform any actions after clicking the alert and no new emails are received.
* For support go to http://thinkinginsoftware.blogspot.com/ynotifier

Installation

Go to the Android Market and download YNotifier, or if you are browsing this page from your Android device click here to get any of my applications.

Support

Use this page for support, questions, enhancements and feature requests. Post any issues here or drop me an email. I will be glad to help make this application better.

* If the application does not work as expected please check you actually have unread emails in your yahoo mail from a Desktop/Laptop computer. Then be sure you have configured your correct user id and password.
* If you suspect YNotifier is responsible for any performance issues you can install "TaskPanel" and kill "YNotifier".
* If you find out any problems (bugs) please install "Log Collector" and send me the content by email.

Note for iPhone customers

If there is enough demand I will make it available to the iPhone community as well, so drop me an email if interested.

Real Time Database Documentation

All the information about a project should be maintained from within the project whenever possible. I have never seen perfect or up-to-date documentation live long enough in any company where I have been an employee or contractor. That is why I believe the most valuable documentation is the kind that can be auto-generated when you need it.

The database or Model information in particular is very important, as that is the bottom layer, the foundation of any modern software architecture. Having the ERD/EER available is then a must-have for the agility of true Business Driven Development (BDD). Suffice it to say it allows planning for new features by exposing just the gap to be implemented.



1. Provide business access to your current database metadata. You are maintaining it in a SCM repository, aren't you? This is the best way for business to see your naming conventions are aligned with their business language. So let them see your current tables and fields.
2. Provide business with all current default values. I hope you understand you must script them and keep them in a repository as well.
3. Provide business with a tool to get the EER/ERD by themselves. For MySQL install MySQL Workbench.  Here is all they need to get a whole diagram whenever they want:

1. Open an already saved MySQL Workbench project. In a project some of the settings will already be saved, but you are free to create a project from scratch every time you want a new EER/ERD.
2. Choose "Create EER Model From SQL Script" from the Home page and point to the metadata file.
3. Select "Model|Create Diagram From Catalog Objects" to generate the diagram.
4. If the tables show up too tight select "Model|Diagram Properties and size" and expand columns and rows as needed.
5. Hit Arrange/Autolayout. If you need more space go back to the previous step.
6. In big diagrams sometimes we do not need all the information about foreign keys, for example. From preferences/diagram select “Hide Captions” and deselect “Draw Line crossings” and “Center captions over the line”.
7. Confirm all tables are viewable and save the project.

Here is a nice command to clean up the MySQL database metadata (also referred to as the dump file) to show just tables and fields. You can encapsulate it in a batch/bash script so business people have a cleaner file to look into when wondering if a given keyword has already been used in the system.
cat application.sql |grep -v "^/.*"|grep -v "\-.*"|grep -v "DROP.*"|grep -v "\`id.*"|grep -v "\`version.*"|grep -v "UNIQUE.*"|grep -v "KEY.*"|sed s/ENGINE.*//g|sed s/CREATE.//g > application_tables_and_fields.txt

Wednesday, September 22, 2010

Jasper Real Time Report Services Framework

Operational reporting (real time reporting) is an important part of a company's software. The cost involved in real time reporting can be high, as it affects the existing system's transactional capabilities.


On the other hand, client reporting is usually done with data stored in a data warehouse. The data in there has a certain delay and sometimes gets populated just daily.


So the first task when designing reports is identifying which data can simply be refreshed from time to time and which one must be done in real time.


Once we have identified the data sets that must be generated in real time we run into a new issue: some data must be joined but lives in non-linkable sources, or even worse, some of the data to be joined comes from an external application, for example a web service. In those cases you will need a custom reporting solution.


A good custom reporting solution must provide the best trade-off between high data availability for reporting and good application performance.


Resources are not unlimited, and it is crucial that we use the ones we have at as high a utilization as possible. The MVC pattern is to be applied to any software with a User Interface (UI), and I say UI and not GUI because even in the case of console applications you still have a View.


If your needs are just pulling information from one existing database and you can live with just SQL, then any report designer will be able to easily use a report utility like iReport to generate even the most complex reports you can imagine.

Of course the real world is far from that. You need to pull data from different databases, some of them data warehouses and some of them real time application databases. You need data from other sources like web services, Excel, text files and even (God forbid) PDF documents.


Only a high level language can come to your rescue to get Real Time Reporting in place.


The architecture

The purpose of this post is to document one implementation using Jasper Reports. I will show how real time reporting can be done while still separating the concerns of visualization, data and logic.


Take a look at the below diagram


Regular users see a list of reports generated from files in the file system. The files follow the convention <datasource>_<name>.jrxml and are Jasper XML files. The user selects a report and a form shows up asking for parameters, or if no parameters are needed a PDF is returned with the contents of the report.


Report designers use the iReport tool. They build reports containing subreports. They use parameters to communicate from the main report to the subreport or to dynamically customize the necessary data source connections. They use a connection to a local database that is built following the indications from a Java developer (from files daily.sql and daily_data.sql, for example). When they are satisfied with the result they copy reports and subreports to a specific file system path. For example, rt_daily.jasper will be generated from rt_daily.jrxml. The jrxml is maintained in SCM, of course.


Java Developers build code that (look at the numbers in the diagram):

1. Decides which connection to supply to the main report. The data source comes as part of the report name; for this example it is “rt”, which means real time, and so a local sqlite database will be populated using the same metadata the iReport designers used for their tests (daily.sql). It could also be a connection to a non-realtime database, for example a CRM database. Regardless of what connection we supply, the “realTimeDbPath” parameter will be passed to Jasper in case any subreport needs a realtime connection.



2. Runs a Service#populateData() method following a convention like, for example, “DailyReportService”. This service is in charge of preparing the local sqlite DB with all the datasets needed by either the main report or its subreports.



3. Resolves the name of the compiled report file to pull from the file system (rt_daily.jasper)


4. Supplies the connection, parameters and jasper file to the Jasper Report engine to get a PDF file with the results of the report. All parameters supplied as part of the form are passed to the Jasper engine, BTW. This minimizes coding but also introduces a security concern: be sure you do not rely just on parameters but on internal security at the services layer. Spring Security with the help of AOP is ideal for this.



MVC pattern respected

View: Report writing
Ideally someone who knows how the report should look will take care of this layer. This person just cares about the organization and layout of the data. One important assumption should be made at this point: the report writer shouldn’t necessarily be a DB developer, a High Level Language (HLL) developer (like a Java or C# developer) or any other technical person. The report writer should have certain data sources available to visualize his report. The writer should be familiar with basic SQL concepts. The writer can be, and should be IMO, a Business Analyst (BA).

Model: Data
To get the report your company needs you will probably need to dig into Excel, SOAP services, text documents, databases, XML, you name it. It makes sense though that this data gets translated into fixed tables from which the person in charge of the View can easily build the report. A DB developer is the best fit for this layer. An HLL programmer is a good fit as well. A BA can definitely build a reporting model too; after all, nobody knows better than him which dataset he will need per subreport.


Controller: Logic
An HLL developer will be needed for this layer. This is the layer in charge of all the plumbing between View and Model:

1. Uses a Services layer that in turn uses a DAO layer.



2. Implements security to determine which users have access to run which stored reports.



3. When the report is run it looks for the need of any real time data and if needed it populates it.



4. It invokes the Jasper Engine to run the particular report.



Local Environment

It is easier when everything is in a simple database and better when the data is de-normalized. For this document we are starting from three tables that will be de-normalized into just one. I am using MySQL here. Note that this is an example to illustrate complicated scenarios where you need to get data from different sources into just one data set. In reality, if you have all you need in three tables from the same database and you must provide real time reporting, you are fine just pointing to the real database from iReport to develop the report, and later naming the report with a proper data source to be sure the needed connection is available at runtime.


Let us say that our original DB has three tables


CREATE TABLE `office` (
 `id` int(11) NOT NULL AUTO_INCREMENT,
 `name` varchar(50) NOT NULL,
 PRIMARY KEY (`id`),
 UNIQUE KEY `name` (`name`)
) ENGINE=InnoDB AUTO_INCREMENT=11 DEFAULT CHARSET=latin1

CREATE TABLE `department` (
 `id` int(11) NOT NULL AUTO_INCREMENT,
 `name` varchar(50) NOT NULL,
 PRIMARY KEY (`id`),
 UNIQUE KEY `name` (`name`)
) ENGINE=InnoDB AUTO_INCREMENT=35 DEFAULT CHARSET=latin1

CREATE TABLE `employee` (
 `id` int(11) NOT NULL AUTO_INCREMENT,
`first_name` varchar(50) NOT NULL,
`last_name` varchar(50) NOT NULL,
`department_id` int(11) DEFAULT NULL,
 `office_id` int(11) DEFAULT NULL,
 PRIMARY KEY (`id`),
 KEY `FK_employee_office_id` (`office_id`),
 KEY `FK_employee_department_id` (`department_id`),
 CONSTRAINT `FK_employee_department_id` FOREIGN KEY (`department_id`) REFERENCES `department` (`id`),
 CONSTRAINT `FK_employee_office_id` FOREIGN KEY (`office_id`) REFERENCES `office` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=15 DEFAULT CHARSET=latin1
Let us say it has this data:


INSERT INTO `office` (name) VALUES ('Buenos Aires');
SET @office_id = last_insert_id();
INSERT INTO `department` (name) VALUES ('Engineering');
SET @department_id = last_insert_id();
INSERT INTO employee (first_name, last_name, office_id, department_id) values ('Pablo', "Cardone", @office_id, @department_id);
INSERT INTO `office` (name) VALUES ('Sao Paulo');
SET @office_id = last_insert_id();
INSERT INTO `department` (name) VALUES ('Marketing');
SET @department_id = last_insert_id();
INSERT INTO employee (first_name, last_name, office_id, department_id) values ('Ronaldo', "Gomes", @office_id, @department_id);
Our good-practices sense tells us that for a report we are better off providing the whole dataset denormalized (just one table). We use sqlite3 for the local database. We must provide the necessary metadata and some initial data for the iReport designers. We will package the metadata in a file called sampleEmployee.sql. This file will be accessible in the classpath of our application, as we want to share it with report designers and ensure the same version is used from the application:


CREATE TABLE `employee` (
 `first_name` varchar(50) NOT NULL,
 `last_name` varchar(50) NOT NULL,
 `office_name` varchar(50) NOT NULL,
 `department_name` varchar(50) NOT NULL
) ;
In a different file sampleEmployee_data.sql (not to be in the classpath of the application) we expose some sample data so the iReport users can start designing the layout:


INSERT INTO employee (first_name, last_name, office_name, department_name) VALUES ('Nestor','Urquiza','Buenos Aires','Engineering');
INSERT INTO employee (first_name, last_name, office_name, department_name) VALUES ('Pablo','Cardone','Sao Paulo','Sales');
INSERT INTO employee (first_name, last_name, office_name, department_name) VALUES ('Ronaldo','Gomes','Sao Paulo','Sales');
So our report writers can use the below commands to get their testing data locally:


cd ~/
sqlite3 sampleEmployee.db < sampleEmployee.sql
sqlite3 sampleEmployee.db < sampleEmployee_data.sql


Designing the report with iReport

iReport is a visual tool that allows you to design JasperReports. This is then your tool to create the View side of the report framework. I have tested this using iReport-3.7.4.


1. Open iReport. Go to preferences and be sure the classpath points to the sqlite JDBC driver file: sqlitejdbc-v056.jar. Of course you need to download that file if you do not have it; just Google it.


2. Select File | New | Report | Blank | Launch Report Wizard. In Connections pick a new Database JDBC connection and use the below settings (You can always reconfigure this from the Designer toolbar clicking on the Report Datasources icon):

name: sampleEmployee
JDBC Driver: org.sqlite.JDBC
JDBC URL:  jdbc:sqlite:/Users/nestor/sampleEmployee.db


3. Hit the Test button (username/password are both blank) and you should get a success message. Hit Save.


4. In Query(SQL) paste this:

SELECT first_name, last_name, office_name, department_name FROM employee


5. Expand “Fields” on the left and drag each of them to the Detail portion of the report.


6. From the report elements palette on the right drag labels to the Column Header section.


7. Click on Preview and the report should show the records we initially added to the employee table. When you hit “Preview”, Jasper compiles the .jrxml file into a .jasper file. This .jasper file will be deployed to a reports folder that the application reads, so the expensive report compilation will be done by the iReport user and we will reuse it later from our backend.



Implementation

We use Spring for dependency injection; however, for JasperReports we do not use Spring, as Jasper is simple enough on its own. First you will need to include some dependencies. As I use Maven, all I have to do is include the below in my pom.xml:

<dependency>
          <groupId>net.sf.jasperreports</groupId>
          <artifactId>jasperreports</artifactId>
          <version>3.7.4</version>
</dependency>
Real time reporting needs a local database, but concurrency makes it impractical to have just one. Ideally, User A running the sampleEmployee report should not be affected by User B running the same report. This is not a problem if the database is of a warehouse type, meaning it gets populated asynchronously. In our case though we need to be sure we create tables per user for those cases where the data must be populated synchronously (real time). We address this using an individual database per user session. So we will have several local databases like:

...
report_17A1B49C645D39C2F2BE4CD12B54AF75.db report_E33819D245A598EDA01D1E3FC468EFE8.db
...
Each database has a DataSource associated with it, which is built on the fly by a ReportDataSourceService. As you can see, the local per-user DB convention is “report_” + JSESSIONID + “.db”.

Given a URL like report/{dataSourceName}/{reportName}, for example report/rt/sampleEmployee, we can determine by convention that the data source is to be built on the fly (instead of using one already injected by Spring). That is what real time (rt) stands for. The Controller can then instantiate a service following conventions (SampleEmployeeReportService) and call a populateData() method on it (of course it implements a custom ReportService interface). The service will use a DAO that accesses the local sqlite DB: it drops the table representing the report data set and populates it with data that follows some business rules. The power here is unlimited, as we play with a high level language like Java; data can literally come from any place. The DAO uses Spring's JDBC template to connect to the specific local user DB. It creates the metadata and fills out the table as well.

Finally the Controller invokes JasperReports to render the table content with the help of the sampleEmployee.jasper file created from iReport. Below is what I think is relevant from the Java perspective. This is just a typical Controller class relying on injected services.

@Controller
public class ReportController extends RootController {

    @Autowired
    Properties applicationProperties;

    @Autowired
    ReportDataSourceService reportDataSourceService;

    /**
    * A filename for reports is composed of two tokens
    * <datasource>_<description> If datasource == 'rt' the datasource will be a
    * local to the server sqlite db built on the fly
    *
    *
    * @param request
    * @param response
    * @param result
    * @param reportType
    * @param dataSourceName
    * @param reportName
    * @param model
    * @return
    * @throws IOException
    */
    @RequestMapping("/report/{dataSourceName}/{name}")
    public ModelAndView run(HttpServletRequest request,
          HttpServletResponse response,
          @ModelAttribute("report") Report report, BindingResult result,
          @PathVariable("dataSourceName") String dataSourceName,
          @PathVariable("name") String name, Model model) throws IOException {
      // Initialize the context (mandatory)
      ControllerContext ctx = new ControllerContext(request, response);
      init(ctx);

      String sessionId = request.getSession().getId();

      // Bind to path variables
      report.setDataSourceName(dataSourceName);
      report.setName(name);
      report.setSessionId(sessionId);
      report.setParams(ctx.getParameterMapWithEncoding());

      // If real time type get the Service bean and populate DB
      try {
          // name = name.substring(0, 1).toLowerCase()
          // + name.substring(1);
          ReportService reportService = (ReportService) applicationContext
                  .getBean(name + "ReportService");
          reportService.populateData(report);
      } catch (Throwable e) {
          e.printStackTrace();
      }

      // Get proper parameters for jasper
      Map<String, String> jasperParams = getParamsForJasper(report
              .getParams());

      // insert parameters commonly used by most reports
      String realTimeDbPath = reportDataSourceService
              .getRealTimeDatabasePath(report);
      jasperParams.put("realTimeDbPath", realTimeDbPath);

      // Get the master report datasource
      DataSource dataSource = reportDataSourceService.getDataSource(report);

      // Get reports path
      String path = getReportsPath();

      File reportFile = new File(path + "/" + dataSourceName + "_" + name
              + ".jasper");
      byte[] bytes = null;

      Connection connection = null;
      try {
          connection = dataSource.getConnection();
          bytes = JasperRunManager.runReportToPdf(reportFile.getPath(),
                  jasperParams, connection);

          response.setContentType("application/pdf");
          response.setContentLength(bytes.length);
          response.getOutputStream().write(bytes);
      } catch (Exception e) {
          StringWriter stringWriter = new StringWriter();
          PrintWriter printWriter = new PrintWriter(stringWriter);
          e.printStackTrace(printWriter);
          String stackTrace = stringWriter.toString();
          result.addError(new ObjectError("report", stackTrace));
          return getModelAndView(ctx, "report/error");
      } finally {
          if (connection != null) {
              try {
                  connection.close();
              } catch (SQLException e) {
                  // TODO Auto-generated catch block
                  e.printStackTrace();
              }
          }
      }
      return null;
    }

    @RequestMapping("/report/list")
    public ModelAndView list(HttpServletRequest request,
          HttpServletResponse response, Model model) throws IOException {
      // Initialize the context (mandatory)
      ControllerContext ctx = new ControllerContext(request, response);
      init(ctx);

      // Get report path
      String path = getReportsPath();

      // Get all available reports
      File inFolder = new File(path);
      FileFilter fileFilter = new FileFilter() {
          public boolean accept(File file) {
              return !file.isDirectory()
                      && file.getName().endsWith(".jasper");
          }
      };
      File[] files = inFolder.listFiles(fileFilter);
      TreeMap<String, String> reports = new TreeMap<String, String>();
      for (File file : files) {
          String fullName = file.getName();
          String baseName = fullName.substring(0, fullName.length() - 7);
          String[] tokens = baseName.split("_");
          if (tokens.length == 2) {
              String url = tokens[0] + "/" + tokens[1];
              String name = tokens[1];
              reports.put(name, url);
          }
      }
      model.addAttribute("reports", reports);
      return getModelAndView(ctx, "report/list");
    }

    private String getReportsPath() throws IOException {
      return (String) applicationProperties.get("jasper.reports.path");
    }

    /**
    * Jasper will not accept more than one parameter named with the same name.
    * We most likely will not need to pass complex objects to Jasper so we
    * should be OK
    *
    * @param requestParams
    * @return
    */
    private Map<String, String> getParamsForJasper(
            Map<String, List<String>> requestParams) {
        Map<String, String> jasperParams = new HashMap<String, String>();
        for (String key : requestParams.keySet()) {
            jasperParams.put(key, requestParams.get(key).get(0));
        }
        return jasperParams;
    }
}
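
The ReportDataSourceService injected above is not shown in this post. As a rough sketch only, and under the assumption that Spring's DriverManagerDataSource is acceptable for these short-lived per-session connections, its “rt” behavior could look like the below (the local directory is a placeholder, and the Report getter matches the setters used by the controller):

import javax.sql.DataSource;

import org.springframework.jdbc.datasource.DriverManagerDataSource;

public class ReportDataSourceService {

    // Injected in the real application; hypothetical default for illustration
    private String localDbDirectory = "/tmp";

    // Convention: "report_" + JSESSIONID + ".db"
    public String getRealTimeDatabasePath(Report report) {
        return localDbDirectory + "/report_" + report.getSessionId() + ".db";
    }

    // Builds the per-session sqlite DataSource on the fly for "rt" reports.
    // Non real time data source names would instead resolve to Spring-injected DataSources.
    public DataSource getDataSource(Report report) {
        DriverManagerDataSource dataSource = new DriverManagerDataSource();
        dataSource.setDriverClassName("org.sqlite.JDBC");
        dataSource.setUrl("jdbc:sqlite:" + getRealTimeDatabasePath(report));
        return dataSource;
    }
}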


Sub Reports

To illustrate subreports let us create a report that accepts a parameter, the full name of an employee. We can run for example a Bing query to see some public pages that might be related to each employee (like a custom and proprietary background check ;-).


1. Establish the dataset you will need and script the metadata. Here is sampleCheck.sql:

CREATE TABLE `links` (
`full_name` varchar(50) NOT NULL,
`title` varchar(250) NOT NULL,
`url` varchar(50) NOT NULL
) ;


2. Script the data. Here is sampleCheck_data.sql
INSERT INTO links (full_name, title, url) VALUES ("Nestor Urquiza", "Nestor Urquiza", "http://www.bing.com:80/search?q=nestor+urquiza");
INSERT INTO links (full_name, title, url) VALUES ("Nestor Urquiza", "Nestor Urquiza Resume", "http://www.nestorurquiza.com/resume");
INSERT INTO links (full_name, title, url) VALUES ("Pablo Cardone", "Pablo Cardone", "http://www.bing.com:80/search?q=pablo+cardone");
INSERT INTO links (full_name, title, url) VALUES ("Pablo Cardone", "Pablo Cardone Resume", "http://www.pablocardone.com/resume");


3. Build a local DB to be used to design the report:
sqlite3 sampleCheck.db < sampleCheck.sql
sqlite3 sampleCheck.db < sampleCheck_data.sql


4. At this point the table “links” exists inside sampleCheck.db, so open iReport and create a report called rt_sampleCheck. A file with extension .jrxml will be created. Use the newly created DB as the data source. So:
name: sampleCheck
JDBC Driver: org.sqlite.JDBC
JDBC URL:  jdbc:sqlite:/Users/nestor/sampleCheck.db
Query: select title, url from links where full_name = '$P!{fullName}';


5. In iReport include title and url. Create a parameter called “fullName”. Run the report; when asked for the parameter value use “Nestor Urquiza”. The jasper file (rt_sampleCheck.jasper) is generated in the same directory where the jrxml file is.


6. Create the DAO implementation (SampleCheckReportDAO) that accepts a List of Objects to persist
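
The DAO itself is not listed in this post. A minimal sketch, assuming Spring's JdbcTemplate and a hypothetical Link value object, could look like the below; it simply recreates the links table in the per-session sqlite database and persists whatever the service hands it:

import java.util.List;

import javax.sql.DataSource;

import org.springframework.jdbc.core.JdbcTemplate;

public class SampleCheckReportDao {

    // Drops and recreates the links table, then persists the rows for this report run
    public void persistLinks(DataSource dataSource, List<Link> links) {
        JdbcTemplate jdbcTemplate = new JdbcTemplate(dataSource);
        jdbcTemplate.execute("DROP TABLE IF EXISTS links");
        jdbcTemplate.execute("CREATE TABLE links (full_name varchar(50), title varchar(250), url varchar(50))");
        for (Link link : links) {
            jdbcTemplate.update("INSERT INTO links (full_name, title, url) VALUES (?, ?, ?)",
                    link.getFullName(), link.getTitle(), link.getUrl());
        }
    }

    // Hypothetical value object holding one result row
    public static class Link {
        private final String fullName;
        private final String title;
        private final String url;

        public Link(String fullName, String title, String url) {
            this.fullName = fullName;
            this.title = title;
            this.url = url;
        }

        public String getFullName() { return fullName; }
        public String getTitle() { return title; }
        public String getUrl() { return url; }
    }
}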


7. Create a Service (SampleCheckReportService) that queries Bing for all users' full names.


8. As we have followed conventions, it is now enough to call /report/rt/sampleCheck?fullName=Nestor+Urquiza from the browser. Note that we need a parameter this time. The parameter is inserted as-is as a report param; that is why the convention is so important here. Note also that we could have lived without a parameter in this simple example, as we can restrict from Java what we populate in the table; however it is needed for our final example, where we will use sampleCheck as a subreport of sampleEmployee.


Report plus Subreport

In reality both of the reports we have built so far use the same connection, as they are both real time reports and so they use the same local sqlite database. However there are more complicated cases, and I want to show here how a subreport can use a different connection than the main report. Once you add the subreport using the iReport GUI you will need to edit the XML as shown below. Note how I use a parameter to provide the location of the real time database; of course I do so because I want to be sure I can inject that value later from Java.

<subreport>
              <reportElement x="14" y="35" width="200" height="100"/>
              <subreportParameter name="fullName">
                  <subreportParameterExpression><![CDATA[$F{first_name} + " " + $F{last_name}]]></subreportParameterExpression>
              </subreportParameter>
              <connectionExpression><![CDATA[java.sql.DriverManager.getConnection("jdbc:sqlite:" + $P{realTimeDbPath}, "", "")]]></connectionExpression>
              <subreportExpression class="java.lang.String"><![CDATA[$P{SUBREPORT_DIR} + "rt_sampleCheck.jasper"]]></subreportExpression>
          </subreport>
The report when run will ask for the parameter and we will provide locally a value like below:

/Users/nestor/sampleCheck.db

The parent report must use an empty path to look for the subreport. The reason is that we will drop all reports in the same folder to avoid having to pass another parameter (the subreport path).

  


All that is left now is to run our main report from Java, so we need to provide the “realTimeDbPath” parameter. From Java we will of course need to populate the subreport data source, and that is why we call the sampleCheckService from the sampleEmployeeService. We can check that /report/rt/sampleCheck?fullName=Nestor+Urquiza still works. Now /report/rt/sampleEmployee renders the sampleCheck subreport as well. Note that the sampleCheck subreport gets the fullName from the master report, so there is no need to insert it from Java. This still works both from iReport with no server in the middle (ideal for report writers) and from Java, which is of course needed to present dynamic real time data.

It would be great if JasperReports allowed the use of the connectionExpression element at the main report level and not only at the subreport level. Unfortunately that is not the case, and that is why we need to supply the main report connection as part of the URL (the Jasper Report engine cannot dynamically discover, based on a parameter, which connection to use for the master report). So for the master report we provide a connection object, whereas for the subreports we use “connectionExpression”. The expression will use the “realTimeDbPath” parameter when the report uses a real time data set, or will be completely hardcoded when using any SQL database. Note that data sources do not necessarily have to be hardcoded in subreport “connectionExpression” elements; we can always use parameters to build them on the fly, as already explained.
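
As a rough illustration of that last point, the main report service can delegate to the subreport service so both tables end up in the same per-session real time database before Jasper runs. The DAO and its method name here are hypothetical; only the ReportService#populateData(Report) convention comes from the controller shown earlier:

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

@Service("sampleEmployeeReportService")
public class SampleEmployeeReportService implements ReportService {

    @Autowired
    private SampleEmployeeReportDao sampleEmployeeReportDao; // hypothetical DAO

    @Autowired
    private SampleCheckReportService sampleCheckReportService;

    public void populateData(Report report) {
        // Main report dataset: drop/create and fill the employee table
        sampleEmployeeReportDao.persistEmployees(report);

        // Subreport dataset: the links table lives in the same per-session sqlite file,
        // so populating it here is enough for the subreport connectionExpression
        sampleCheckReportService.populateData(report);
    }
}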

Below is a snapshot of the pdf report obtained from the web request:


And here is a snapshot of the same from iReport. The same JRXML is used from the backend and the frontend without stepping on each other's toes.





A Check List

Below is a check list for both iReport Designers (BA) and Java Developers. As a developer you will need to provide the iReport Designer with:


1. script.sql


2. script_data.sql


3. Path where to put the compiled report file so it shows up from the web interface


4. Agree on a name for the report. Use Spring resources for internationalization so the report can be shown under a more descriptive name, but try hard to pick a name that makes sense to everybody.


As a report Designer you will need to:


1. run the below commands:
sqlite3 local.db < script.sql
sqlite3 local.db < script_data.sql


2. Create a report and use local.db from above as the data source. Naming conventions are important: the report must be named using the data source name, followed by an underscore, and then the name of the service agreed with the developer. Case matters, so be aware.


3. Configure the subreport "subreportParameter" and "connectionExpression" nodes. Here is a list of useful connection expressions:

realtime sqlite:
<connectionExpression><![CDATA[java.sql.DriverManager.getConnection("jdbc:sqlite:" + $P{realTimeDbPath}, "", "")]]></connectionExpression>

mysql:
<connectionExpression><![CDATA[java.sql.DriverManager.getConnection("jdbc:mysql://localhost:3306/mySQLDatabaseName", "myUserName", "myPassword")]]></connectionExpression>

sqlserver:
<connectionExpression><![CDATA[java.sql.DriverManager.getConnection("jdbc:jtds:sqlserver://localhost:1433/sqlServerDatabaseName;prepareSQL=3", "myUserName", "myPassword")]]></connectionExpression>


4. Put the results in the server reports directory.


You can download the Jasper sources and sqlite databases from here.

Saturday, September 18, 2010

Command Line Interface CLI from Spring

A Command Line Interface (CLI) is useful when you want to script certain actions. While I prefer Perl and Python for this kind of task, sometimes you have a reduced team of developers that are only comfortable with one language. In addition, what happens when you have a lot of libraries already built in that language? You might still be able to interact with those, but let us face it, native is faster and so better.

Let us say you are using Spring and have a lot of investment in Java. You can really easily integrate with your existing libraries and provide complex command line driven actions built completely in Java. You can later run those commands from Python, Ruby, Perl, AWK, bash, etc.

The only thing you need to do from your main() method is get access to the Spring application context and get a service bean. The rest of the annotations and definitions will be wired, provided the Spring context file allows for correct dependency injection. This is nothing different from what you would do in regular Spring programming.

Below is a sample Java class intended to rename files from an input folder to an output folder. The actual implementation is not provided, as the only purpose of this post is to clarify how to interconnect a main() Java class with existing autowired Spring resources (let's say from an included jar file).
package com.nestorurquiza.cli;

import java.text.ParseException;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.context.support.ClassPathXmlApplicationContext;

import com.nestorurquiza.service.FilesService;

public class RenameFiles {
    private final static Logger log = LoggerFactory.getLogger(RenameFiles.class);
    
    /**
     * @param args
     * @throws ParseException 
     */
    public static void main(String[] args) {
  log.info("Starting ...");
  
        //get spring context first
        ClassPathXmlApplicationContext applicationContext = new ClassPathXmlApplicationContext("cli-context.xml");
        
        //Get then a reference to a service bean
        FilesService filesService = (FilesService) applicationContext.getBean("filesService");
        
        // parse command line parameters to CmdParser
        FilesCommandParser commandParser = new FilesCommandParser(args);
        
        String inFolderPath = commandParser.getInFolderPath();
        String outFolderPath = commandParser.getOutFolderPath();
        
        //Invoke the service method that executes the task
        filesService.renameFiles(inFolderPath, outFolderPath);
        
        //Exit the shell process
  log.info("... Done");
        System.exit(0);
    }
}
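
The FilesService referenced above is whatever service you already have wired in your Spring context; it is not part of this post. For completeness, a minimal (assumed) interface could be as simple as:

package com.nestorurquiza.service;

public interface FilesService {

    // Renames/moves every file found in inFolderPath into outFolderPath
    void renameFiles(String inFolderPath, String outFolderPath);
}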

Here is the command line parser
package com.nestorurquiza.cli;
import org.apache.commons.cli.BasicParser;
import org.apache.commons.cli.CommandLine;
import org.apache.commons.cli.CommandLineParser;
import org.apache.commons.cli.HelpFormatter;
import org.apache.commons.cli.Option;
import org.apache.commons.cli.Options;
import org.apache.commons.cli.ParseException;

public class FilesCommandParser {
    private Options options = new Options();
    private CommandLine cl;
    
    private String inFolderPath;
    private String outFolderPath;
   
    public FilesCommandParser(String[] args) {
        options.addOption(new Option("inFolderPath", true, "input folder path"));
        options.addOption(new Option("outFolderPath", true, "output folder path"));
        // The Help option must be registered, otherwise commons-cli rejects it at
        // parse time and help() could never see it
        options.addOption(new Option("Help", false, "print usage help"));
        parseCommandline(args);
        help();
        mandatory();
    }

    public String getInFolderPath() {
        return inFolderPath;
    }

    public void setInFolderPath(String inFolderPath) {
        this.inFolderPath = inFolderPath;
    }

    public String getOutFolderPath() {
        return outFolderPath;
    }

    public void setOutFolderPath(String outFolderPath) {
        this.outFolderPath = outFolderPath;
    }
    
    private void parseCommandline(String[] args) {
        CommandLineParser parser = new BasicParser();
        try {
            cl = parser.parse(options, args);
        } catch (ParseException e) {
            System.out.println("Parsing failed. Reason: " + e.getMessage());
            this.generateHelp();
            System.exit(1);
        }
    }
    
    private void generateHelp() {
        HelpFormatter formatter = new HelpFormatter();
        formatter.printHelp("RenameFiles", options);
    }
    
    private void help() {
        if (cl.hasOption("Help")) {
            this.generateHelp();
            System.exit(0);
        }
    }
    
    private void mandatory() {
        if (!cl.hasOption("inFolderPath") || !cl.hasOption("outFolderPath")) {
            this.generateHelp();
            System.exit(0);
        } else{
            setInFolderPath(cl.getOptionValue("inFolderPath"));
            setOutFolderPath(cl.getOptionValue("outFolderPath"));
        }
    }

}

Friday, September 17, 2010

Dynamic sitemap with annotations in Java

My current project is moving fast. After a month of work we have 159 dynamic pages, and this number is surely going up.

I was asked to provide a list of all the pages so the BA can specify what kind of security each of the pages should have.

I have built site maps (or sitemaps) in different languages, both dynamically and at release time (pre-published).

If you follow the Spring @Controller and @RequestMapping annotations, generating the site map is a piece of cake. Below is how to get a List of URLs:
...
import net.sourceforge.stripes.util.ResolverUtil;
...
private List<String> getSiteUrls(ControllerContext ctx, String controllersPackage) {
        List<String> urls = new ArrayList<String>();
        ResolverUtil<Object> resolver = new ResolverUtil<Object>();
        resolver.findAnnotated(Controller.class, controllersPackage);
        Set<Class<? extends Object>> controllerClasses = resolver.getClasses();
        for (Class<? extends Object> controller : controllerClasses) {
            String controllerRequestMapping = "";
            if(controller.isAnnotationPresent(RequestMapping.class)) {
                controllerRequestMapping = controller.getAnnotation(RequestMapping.class).value()[0];
                if(controllerRequestMapping.endsWith("/")) {
                    controllerRequestMapping = controllerRequestMapping.substring(0,controllerRequestMapping.length() - 1);
                }
                if(controllerRequestMapping.endsWith("/*")) {
                    controllerRequestMapping = controllerRequestMapping.substring(0,controllerRequestMapping.length() - 2);
                }
            }
            for (Method method : controller.getMethods()) {
                if (method.isAnnotationPresent(RequestMapping.class)) {
                    RequestMapping requestMapping = method.getAnnotation(RequestMapping.class);
                    urls.add(controllerRequestMapping + requestMapping.value()[0]);
                }
            }
        }
        Collections.sort(urls);
        return urls;
}
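
For completeness, here is a minimal sketch of how that method could be wired into a Spring controller so the JSP below can read the "urls" model attribute. The mapping path, view name and controllers package are assumptions, not part of the original code:

// Hypothetical controller method; adjust the mapping, view name and package to your project
@RequestMapping("/sitemap.html")
public String sitemap(ControllerContext ctx, Model model) {
    // "com.nestorurquiza.web" is an assumed root package for the controllers
    List<String> urls = getSiteUrls(ctx, "com.nestorurquiza.web");
    // The JSP below iterates over this "urls" attribute
    model.addAttribute("urls", urls);
    return "sitemap";
}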

Below is a JSP to render the results:
<%@ include file="/WEB-INF/jsp/includes.jsp"%>
<%@ include file="/WEB-INF/jsp/header.jsp"%>

<h2><spring:message  code="sitemap"/>:</h2>

<div id="global_error"><form:errors path="sitemap"
             cssClass="errors" /></div>

<c:forEach var="url" items="${urls}">
            <a href="<spring:url value="${url}"/>">${url}</a> <br/>
</c:forEach>

<%@ include file="/WEB-INF/jsp/footer.jsp"%>

Tuesday, September 14, 2010

State Machines, Business Process, Rules with Drools

I have posted before about how to use Drools as a rule engine, but as I said Drools is way more than that.

When I developed my own BPM based on SCXML (4 years ago), I was looking for a state machine implementation that would allow me to change business rules with a domain specific language.

Since then I have evaluated Spring Webflow, JBPM and Drools. Spring Webflow is too web-oriented, so I quickly understood it was not what I needed to bring BPM into my BHUB architecture.

Later on I found JBPM very complete but not simple enough; I wanted something lighter. That is when, a year ago, I came across this post. At that point I did a proof of concept and proposed its use.

Here I am a year later, proposing its use again for a different project, this time with an already released Drools Eclipse plugin that actually worked out of the box for me.

I had success with a previous version of the plugin, but this time I have to say it was much easier. My instructions to debug and develop with Eclipse are still valid. There are some things, like expression evaluation, that I would still like to see, but with variable inspection alone I would say the team can move forward. Of course this is only available in the Eclipse platform, so the team will need to use it (and I know how hard it is to switch IDEs).

State Machine

To see how you can implement a state machine using Drools refer to the StateExampleUsingSalience example from the distribution:

1. Download http://download.jboss.org/drools/release/5.1.1.34858.FINAL/drools-5.1.1-examples.zip.
2. Import the project into Eclipse and make it a Maven project.
3. Include the dependency below in the pom.xml (for some reason that dependency was forgotten):
<dependency>
            <groupId>com.thoughtworks.xstream</groupId>
            <artifactId>xstream</artifactId>
            <version>1.3.1</version>
</dependency>
4. Put some breakpoints in the "then" section of the org.drools.examples.StateExampleUsingSalience.drl file.
5. Right click on org.drools.examples.StateExampleUsingSalience and use the option "Debug as Drools Application".
6. Play with the "salience 10" statement in rule "B to D" to see how you can force a particular state transition to take priority. A sketch of how such a session is bootstrapped from plain Java follows these steps.
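
For reference, here is a minimal sketch against the Drools 5.1 API of how the rules in a .drl file like this one can be loaded and fired outside of Eclipse. It assumes the sketch lives next to the example (the State fact class and the .drl file ship with the examples project); it is not the distribution's own main method, which may differ in details:

package org.drools.examples;

import org.drools.KnowledgeBase;
import org.drools.KnowledgeBaseFactory;
import org.drools.builder.KnowledgeBuilder;
import org.drools.builder.KnowledgeBuilderFactory;
import org.drools.builder.ResourceType;
import org.drools.io.ResourceFactory;
import org.drools.runtime.StatefulKnowledgeSession;

public class StateMachineSketch {
    public static void main(String[] args) {
        // Compile the example rules from the classpath
        KnowledgeBuilder kbuilder = KnowledgeBuilderFactory.newKnowledgeBuilder();
        kbuilder.add(ResourceFactory.newClassPathResource(
                "StateExampleUsingSalience.drl", StateMachineSketch.class), ResourceType.DRL);
        if (kbuilder.hasErrors()) {
            throw new RuntimeException(kbuilder.getErrors().toString());
        }
        // Build a knowledge base and open a stateful session
        KnowledgeBase kbase = KnowledgeBaseFactory.newKnowledgeBase();
        kbase.addKnowledgePackages(kbuilder.getKnowledgePackages());
        StatefulKnowledgeSession ksession = kbase.newStatefulKnowledgeSession();
        // Insert the State facts (the State class ships with the example) and fire the rules;
        // the distribution's own main method may insert these differently, so check it as well
        ksession.insert(new State("A"));
        ksession.insert(new State("B"));
        ksession.insert(new State("C"));
        ksession.insert(new State("D"));
        ksession.fireAllRules();
        ksession.dispose();
    }
}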

Business Process

A business process is all about states; it is a synonym of workflow. A business process outlines the order in which the different components of your business are executed: for example, we need to pull data from different feeds, then run a report, make some corrections here and there, send emails, wait for responses and finally run the Executive Report.

Rules

I am not going to discuss semantics here, just the idea that any business process, besides states, needs rules that sometimes apply inside states and sometimes outside of them. Rules are essentially "if" statements that may or may not affect the state but ultimately have an impact on the output. As programmers we code rules in the back end, the middle tier, the front end, or a mix of them. It would be ideal if rules could be reused and also externalized, so someone with domain knowledge could play with them. A rules engine allows the use of Domain Specific Languages (DSL) to separate application concerns even further. Rules can be applied to any of the three components of the MVC pattern, even though you will find them more related to the M and C components.

Controversy

Whether a rules engine is right for your project is a very important question. Martin Fowler has written an article about rules engines and their drawbacks. You must be sure you understand where you are heading, because your project can easily become unstable and unmanageable.

A lot of discussion goes back and forth in the community about rules-centric versus process-centric solutions. Drools advocates mixing the concepts for more realistic implementations, and I cannot agree more. I personally like SCXML because of its parallelism (start here to learn about Harel statecharts).

In any case, so far Drools has the simplicity I look for, and I think the team will be happier using it than an in-house business process and rules engine based on SCXML, especially because nowadays it is "in vogue" to move away from XML ;-)

References

http://www.jboss.org/drools/drools-flow.html

Saturday, September 11, 2010

Liferay 6 West Coast Symposium Part 2

Brian Cheung introduced the LESA system. It is a support system internal to the Liferay team that resolves some of the issues JIRA has (for example multiple language support, issue response ranking so you can give 1 star to an engineer replying "won't fix, period", audit capabilities, and a simpler interface where just responding is enough to reassign the issue). I asked if the product would be available as open source, but no decision has been taken yet on this regard.

Mike Hon spoke about plans for the document library, like mounting from other data sources (repositories), a Windows Explorer kind of user interface, version control for layouts, and workflow engine integration to control document changes. He went through the integration of a workflow-backed form engine (Workflow Forms). There are improvements in Message Center (which is a more complete solution than the email portlet), Contact Center, Expertise Locator (for large developer teams), Knowledge Base and blogs (anti-spam for example). He gave an introduction to DSL and Drools for portal assets. He spoke about the upcoming "Workflow Designer for Kaleo", a visual tool that will allow managing workflows. Currently there are efforts to build a console for Tomcat a la the WebSphere management console. There will be broader support for caching technologies. I asked about Workflow Forms and Expando, and the response was that Expando is just one of the possibilities for the workflow forms. I had the opportunity to talk to him about the workflow engine. As it is today it is very limited, since it does not support any DSL; however, that seems to be enough for the needs of the community. In addition, thanks to the pluggable architecture, the workflow engine integrates very easily with Intalio or JBPM, so more complex needs are covered.

I talked again with Raymond about OSGi modularity in Liferay and the lack of documentation on this regard. This is because it is indeed a brand new subject that the Liferay team is considering with very high priority. Raymond talked about improvements in WCM. He presented the new WYSIWYG editor. There are better indexing capabilities. Freemarker support has been added for those looking for a better alternative to talk to Java from templates. More things coming up: support for Local and Remote Live environments, scheduled publishing to update the site with just the delta changes, and locale buckets per article. He showcased the Content Portlet, enhanced SEO support through smart tags, support for custom HTML header (h1, h2...) styles, forward compatibility of exported LARs, and support for Liferay tags (Alloy UI) from Freemarker, a very powerful feature indeed. He talked about site branches: for example, a brand new skinning of the site being worked on months ahead of time. The reality, as I pointed out, is that this is more like a fork rather than real SCM; the merging concept for example does not exist, and it is indeed not needed. Content people (CSS, HTML) are able to work on this without any help from developers. He talked about the Web Based RAD, a neat product coming up (officially still unnamed) which would allow RAD capabilities right from the web interface. About OSGi: Liferay will be able to be patched at runtime thanks to OSGi, and Java class loaders will no longer be a problem. This will of course make Liferay a more scalable platform. I asked why not use an existing SCM for WCM branching, and the response was that pages have other components associated with them, all stored in the database, so they developed the SCM themselves.

Ivan and Wesley talked about Liferay best practices. Importing from LDAP when a user logs in is a best practice, as the data is brought into Liferay just when it is needed. HA environment: sync Ehcache for consistent data on nodes, cluster the indexes for consistent search, and centralize the document library repository. They of course recommended the SAN/file system option for file storage over the DB. The existence of a Data Migration tool portlet to ease the conversion of files to a new document file system caught my attention. Hardening Liferay: change the admin password, the default timezone and locale, and the default communities and roles. Do nightly or hourly backups of the DB, schedule backups, use SCM to store code, and have disaster recovery (remote data backups, secondary servers). Disaster recovery: have a preproduction environment to minimize downtime, and swap pre and production environments when new features are considered stable. On data changes: avoid touching data directly, always go through the GUI and web service APIs, use absolute paths for the deployment directory, and specify in portal.properties where the Liferay home is (license key, Lucene search and deployment folder are stored there). On code maintenance: keep core and custom code separated, and apply EE service packs and emergency patches (hot fixes).

James Min talked about performance and scalability. There is a white paper with benchmark data on the Liferay website. He presented the following showcase: 1 HTTP server (Xeon QC 2.4GHz, 2GB RAM), 2 app servers (Xeon QC 2.4GHz, 8GB RAM), 1 DB server (Xeon QC 2.4GHz, 4GB RAM, 1 HDD 1500rpm). User data (1MM users) was taken from US census data (max 400MM users), with a ramp-up period of 1 user every 100 msec from multiple injectors and a warm-up of 5 minutes.
They used Grinder from Linux servers. For the login transaction (one of the most expensive operations in the current Liferay implementation) they simulated 3000 users during 30 minutes and got responses of about a second and a half with a maximum of 81% processor utilization. This is a point where it is clear more servers will be needed in the cluster, as over 1.5 sec response time would not be a good user experience. Data is available for WCM, message boards, blogging and other scenarios as well. He talked about standby modes like hot-hot, hot-warm and hot-cold. They use VisualVM as a good tool to measure performance. He recommended several best practices to create scenarios that reflect actual usage, like having a load tester and test scenario ready (configuration, test script). The disaster recovery procedure should be tested every quarter at a minimum. Caching mechanisms: replicated distributed Ehcache is available out of the box; just search for "cluster" in portal.properties and that is enough to identify how to use the cache. More advanced caching would be a sharded distributed cache: Terracotta and memcached are examples of it. He went ahead and recommended Mika's blog for Terracotta.


There was also a JSF presentation from the Triton lead architect. He is one of the main contributors to the project. Certainly, as we both agree, this is a good solution for teams lacking JavaScript, CSS and HTML skills (no front end developer/designer on board). Andy Shwartz's blog was recommended. He explained ICEfaces and ICEpush, and PortletFaces Bridge was introduced. Declarative programming is what ICEfaces and JSF in general promote. He showed the EVENT_PHASE of the portlet API to demonstrate inter-portlet communication (with the bridge). ICEpush is now bundled with ICEfaces. I asked about the plans towards semantic markup, and that will be resolved with the new widgets in version 2.0, which basically will be based on, guess what... YUI. I asked why XML for AJAX, and the response was that since there is markup coming back from the server that is a must-have. The reality is that I can see Liferay supporting ICEfaces more in upcoming versions. The big and most important thing, though, is to provide semantic markup as an end result.

I spoke to some senior developers and consultants about supporting transactions in Liferay Portal. Now, with the increased support at the plugin level, you can use the Liferay APIs in a transactional way. Search for JTA in the portal properties to find out how to support transactions in Liferay.

Finally I asked several people about backing up the whole environment: code in SVN, WCM in the DB, etc... is there a way to automate it? A data migration tool? In general that is a difficult task because Liferay is not opinionated but rather tries to support any environment. I keep defending that someone must come up with a rapid development environment setup (most likely based on Maven) that would allow thousands of developers with scarce resources in their teams to catch up with the Liferay way of developing fast enough. America at least is driven by small businesses, and teams of 4 developers are more common than teams of over 20. If the learning curve is high, Liferay will not be addressing a broader community, which could give up and move to more agile frameworks and languages in the near future.

One of the announcements was about Facebook Connect being supported as an open SSO solution (OAuth 2.0). I asked a question because I keep writing and consulting about this: true SSO is not just a transparent login, it must implement keep-alive and logout for a complete, rich user experience. The Facebook solution is so far a transparent login and not a real SSO solution.

Brian Kim talked about partner recognition and Brian Cheung concluded with a funny history of the company. I really enjoyed this part. It reminded me of my trip to Rome and staying in a hostel very similar to the one he described. I found out that the Liferay logo comes from what we call in Spanish "Vitros", those sensational stained glass windows you find in churches.

Jeff Handa talked about Liferay templates (page and site templates). This is an important step that addresses the issue of creating whole communities, pages and portlets from scratch; LAR import/export is not enough. With page templates, admins can define permissions on those templates and also other settings like page type, layout template, portlet placement and configuration. All that can then be reused when creating new communities. When creating new pages, a drop-down allows you to pick the template, provided you have permission to view it. Site templates (a similar concept but for a collection of pages) were also visited. These features are available programmatically through Service Builder managed entities (LayoutPrototype, LayoutSetPrototype); see the code of the default-site-templates-hook for how to use this programmatically, and see the util classes LayoutLocalServiceUtil and LayoutPrototypeLocalServiceUtil. Some recommendations were provided: establish good naming conventions when building templates and set the right permissions from the beginning. Once the templates are used to create a new community they become final sites and pages, so they no longer get updated after template changes.


Greg Amerson presented a workshop on Liferay Developer Studio. He showed how to point it to the portlet SDK. Support for Maven is coming before the end of the year. He has published several videos online showing how to use the Eclipse Liferay interface. He mentioned that, under the hood, Velocity templates are responsible for the magic behind the wizards, and he has plans to expose that to the community so developers can create their own, more complicated wizards. This will increase productivity for sure. He mentioned he will try to fix in the future the false negatives (red X's) that Eclipse shows on JSPs. Seriously, I will follow up on this one with him; JSP support in Eclipse is not good in comparison with NetBeans, and that is, well, simply not fair ;-)

I hope this long post was worth something. I congratulate the Liferay team again for the effort they are making; I got a very positive experience from the Symposium. Hopefully Liferay will become an agile tool despite its completeness and complexity.

Wednesday, September 08, 2010

Liferay 6: notes about the West Coast Symposium

According to the business presentation, more than 10,000 customers are looking for a change as their current portals are at end of feature life (EFL). Liferay is promising to cover that gap. How? Let's review what I heard today.

It was a pleasure meeting the Liferay team and learning about their plans. I started sympathizing with them when I first heard the story: they went to the Madrid Liferay Symposium dressed a la "Silicon Valley", you know, blue jeans and that cool stuff, and got a big surprise when everybody in the auditorium, from juniors to executives, was wearing a suit and a tie. Well, "we will do better next time", and here they are in Garden Grove, CA, wearing ties, just to find people wearing Silicon Valley clothes.

The point, though, is that Liferay supports all of us, those who wear a tie and those who don't, and we will see more of this broad support below.

The new Liferay 6 comes with workflow support, JSR 286, extended permission-aware modularity, user ranking based on better rules (social ranking), and a complete IDE solution based on Eclipse (Liferay Developer Studio).

Nate Cavanaugh, the leader of Alloy UI (based on YUI), presented an introduction to YQL, showing how all the APIs are hosted at Yahoo and only a minimum gets downloaded locally. That has made a difference in terms of performance (check this to see why even the cache will not help if you load all JavaScript in your site header). Thanks Nate for your explanations! The on-demand loading of the different JavaScript inclusions just when you really need them, and their combination to minimize the number of open sockets, are a plus that YUI natively offers out of the box. The YUI documentation is great and so is Liferay's (Alloy UI). As Nate stated, the Alloy UI tag libs are created from Javadocs. See for example the "aui:column" and "aui:script" tags. The examples I saw were not invasive, so any HTML attribute from the original HTML tag can be overridden. This is very important to keep sites with semantic markup. jQuery has been dropped and that is official now. The YUI migration was really easy according to Nate.

Chris Stavros from Level Studios talked about the needed collaboration between front end and back end guys. The message has been taken by Liferay, as you will read below. He talked about exposing services (for example User, Role), then accessing them (utility services) and finally aggregating them (using portal and template capabilities). He showed an intensive use of Velocity templates in his showcase: minimum code needed on the backend and a lot of front end development otherwise. I mentioned to him the BHUB idea instead of an extra service layer, which he agreed is a better approach. We talked about how JSR-286 is still not applied to most of the current portlets, which rather stick to the JSR-168 specification; the Search portlet is an example of it. I asked his opinion about JSR-286, servlets and Service Builder, just to find out we are on the same page. He concluded by pointing to monsterenergy.com as a sample implementation of a complete change of the Liferay UI; in fact it is full Flash!

Richard Sezov spoke about Service Builder and how the idea of defining entities in just one file (service.xml) sounds perfect. He showed how to put your portlet in the Control Panel. He presented MVCPortlet, which supports JSR-286. The most important feature of this, as I have said before, is the possibility of reusing the same controller to serve different content (see the serveResource() method). He showed how Alloy UI tag libraries allow reusing validations on the front end, and how to use for example the SearchContainer tag to render a table with the consistent Liferay look and feel. He talked about permissions: resources are declared so they get integrated with the permission system. He is finalizing the "Liferay in Action" book, the officially recommended developers guide...

Ali Loghmani from CIGNEX Technologies spoke about integrating Liferay in SaaS environments. "Liferay is SaaS ready", he stated. Liferay's caching mechanisms integrating with Terracotta and Ehcache are important features to consider, and they led Ali to choose Liferay as the view side (rendering engine) for his distributed projects. He stated that most of the operations in websites are read operations; this is arguably valid for all projects, but it is certainly the situation in many of those I have faced. Liferay as just the front end layer: he presented a case study using REST APIs. This is actually in sync with my postings about the BHUB architecture. A second showcase showed the use of an ESB (providing security, caching, transaction and transformation services) to integrate legacy SOAP APIs. I asked, and he has successfully used Mule for his ESB implementations.

Greg Amerson (who was a member of the MyEclipse project) presented the Liferay IDE (Liferay Developer Studio). Liferay IDE 6 will come with support for JIRA, for example. There are plans for visual workflow editors in the near future. He showed layout and theme development from the Eclipse IDE, and the same for Service Builder, including the new possibility of exposing web services from plugins. Some features like the Code Snippets view, Liferay tags search and wizard layouts from the Eclipse UI are of real interest to those "traditional developers" (keep on reading to find out what Brian Chiang thinks about the types of developers out there) in constant search of IDE capabilities. The old days of table generation for layouts in Liferay are gone: default layouts are generated using <div> elements and not tables like in the past. He showed how to import and convert plugins. He explained the current support for deployment and, responding to my question, he said he has plans to support external running servers (my preferred method, as it respects the separation of concerns; bottom line, an embedded server slows down the IDE unnecessarily). His recommendation in the meantime is to check out Eclipse RSE for external automatic deployment of things like JSPs. Now "Report Bugs" is integrated in the GUI and Greg will get the bug reports directly, I guess in JIRA.

Raymond Auge clarified some workflow doubts I had. I will need to come back with questions for Mike Hon on this regard. Basically the workflow engine is proprietary (named Kaleo). The reason for that decision is the current complexity of JBPM and Intalio which, even though they are supported as plugins, would make the integration a real challenge, since external systems would need to be configured and supported (no real embedded path for those monsters). Still, I want to understand why they went proprietary instead of reusing simpler engines like Drools, or why not SCXML. On a side note, I asked Raymond about the decision to use a proprietary scripting language and, more than that, to favor it over Velocity for email templating. He was clear that he will consider supporting Velocity for email templates in upcoming releases.


Brian Chiang spoke about development strategies: customizations and new developments. He went through the six types of plugins on the customization side. Even the extension environment is now a plugin. Hooks are lighter and have runtime support, however they are still limited in comparison with the extension environment. He put http://www.sesamestreet.org/ (the Sesame Street website) as an example. He quickly mentioned web plugins and the best practices using the liferay-plugin tag. He jumped into new development and went ahead classifying developers as: traditional developers (they love compilers and things like JSF, and they like IDEs), content developers (UI developers who love to do things directly in the browser), and script developers (lightweight developers, those who like interpreted languages). He said version 6.1 will make some features available for the latter. In fact he showed a controller built in pure JSP which I am very interested in seeing in action. Let us put it this way: this allows for real rapid development, even though of course if you are one of the first two types of developers you will probably not like it. A sandbox sample was presented where he just created a directory and, after naming it as a theme, Liferay automatically generated a theme project there and made it available directly from the server URL. He wants to do the same for portlet development, and basically this demonstrates the commitment the Liferay team has to agile approaches but at the same time the respect for those who prefer to work in different ways. The statement is clear: Liferay wants to attract all kinds of developers, not just Java developers. He zipped a PHP file and dropped it into Liferay, and the same can be done with Ruby, so PHP and Ruby will pretty soon be available for fast coding on the front end side. I have to admit the JSP controller was something cool I have thought about many times before. Simpler is better, and Java developers already know JSP. Listen carefully: I am still advocating for support of the MVC pattern, but giving the possibility for fast changes without recompiling. I asked about JPA and the response was that it is almost there; however, the Liferay team will continue supporting both JPA and Hibernate XML files. I guess something like the PetClinic Spring example, where you can choose from different persistence mechanisms, is what the Liferay team has in mind. I asked about Maven and I was pointed to Thiago's post (http://www.liferay.com/web/thiago.moreira/blog/-/blogs/liferay-s-artifact-are-now-mavenized). He has done the job of mavenizing Liferay and this is simply great news. I started that effort myself with some portions of the Portal, but now no effort is needed on this regard. Still, the Liferay team is committed to supporting both Maven and Ant, once again giving space for different tastes.

James Min gave a presentation on high availability: how to handle concurrent users and reduce single points of failure. Too much to cover in so little time: search index (Lucene), database, repositories (Jackrabbit), mod_jk, mod_proxy, Ehcache or Tomcat TCP clustering, F5 hardware for load balancing, Oracle RAC or MySQL Cluster for DB redundancy, Lucene in combination with SOLR or ClusterLink for centralization, and the jcrhook in the portal properties plus the DB for a centralized place for documents (better yet, a SAN). He stressed the need for a concrete test case to verify that clustering is working with cache replication before you actually assume your cluster is properly functioning. He spoke about performance tuning (portal properties, JVM params). Here are the three components he says are needed: a repeatable test script, a load test client (JMeter for example) and a Java profiler. At a minimum, covering the use case of concurrent users logging in at 9 am is a must-have, he said. Keep recording the results (starting from a baseline with, let us say, no changes to the default configuration), he advises, and after each change keep recording so you can show the statistics later. Use a Java profiler to identify bottlenecks and keep improving. He recommended the use of tools like VisualVM, the JMX console and JConsole. He explained the importance of tweaking the JVM garbage collector and the Java heap. I have been there before, and all his words make absolute sense to me. For Liferay he recommends keeping the minimum and maximum heap sizes equal. He went on through app server configuration and monitoring threads in development boxes: look at memory, CPU and blocked threads. An important question from the audience was about using or not using session replication. I have talked about this before: if it is not a need, do not use it! He answered that there is overhead for session replication, so it should be decided on a per-project basis. I asked about the need for cache replication and the response for that is of course a big YES, as Liferay uses Hibernate caching and so all servers in the cluster must be updated when a change is done in any of them.
