Thursday, May 15, 2014

Talend Open Source needs Dynamic Schema for Delimited files

Talend Open Source needs Dynamic Schema support for delimited files. Currently only the commercial version offers dynamic schema.

We need to build a component called tFileInputDelimitedExtract; tFileInputCSVFilter could serve as a reference implementation. Unfortunately I don't have time to implement it right now, but I can at least lay out its specification in case someone decides to take it further. It could be a good project for someone who wants to learn how to create Talend components.
Narrative
=========
* As a Talend user
* I want a component called tFileInputDelimitedExtract
  that accepts an input delimited file,
  the delimiter and a list of column names,
  and outputs *just* the listed columns, in the listed order
* So that I can guarantee a fixed schema
  regardless of the order or number of columns the file may contain in the future.

Until such a component exists, a quick "hack" for unexpected new columns inserted in the middle of a file is to drop them with 'cut'. The command below removes the unneeded 7th column from a pipe-delimited file and keeps every column from the 8th to the end:
cut -d'|' -f1-6,8- ~/input.bcp > ~/output.bcp
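The 'cut' hack selects columns by position, so it breaks again as soon as another column is inserted. The behaviour the narrative asks for selects columns by name instead. Below is a minimal sketch of that behaviour as a shell function, not a Talend component. The name extract_columns is hypothetical, and it assumes the first line of the file is a header row with the column names, the delimiter is a single literal character, fields contain no quoted or escaped delimiters, and at least one column name is requested.

```shell
# Hypothetical helper sketching tFileInputDelimitedExtract behaviour.
# Assumes: header row on line 1, single-character delimiter,
# no quoted fields, at least one requested column.
# Usage: extract_columns FILE DELIMITER COL1,COL2,...
extract_columns() {
  awk -F"$2" -v OFS="$2" -v want="$3" '
    NR == 1 {
      # Map each header name to its position in this particular file.
      for (i = 1; i <= NF; i++) pos[$i] = i
      n = split(want, cols, ",")
      # Fail loudly if a requested column is absent from the header.
      for (j = 1; j <= n; j++) {
        if (!(cols[j] in pos)) {
          printf "missing column: %s\n", cols[j] > "/dev/stderr"
          exit 1
        }
      }
    }
    {
      # Emit only the requested columns, in the requested order
      # (this also rewrites the header line to the fixed schema).
      line = $(pos[cols[1]])
      for (j = 2; j <= n; j++) line = line OFS $(pos[cols[j]])
      print line
    }
  ' "$1"
}
```

For example, `extract_columns ~/input.bcp '|' 'id,name,city' > ~/output.bcp` (column names hypothetical) would produce the same three columns in the same order even if new columns later appear anywhere in input.bcp. Failing on a missing column, rather than emitting an empty field, is a design choice of this sketch.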
