As a consequence XML transformations are a key skill for those folks doing data transformations.
Even though there are more efficient tools XSLT is a standard which is supported in Talend through the tXSLT component. You just need to provide your XML, XSL and output files and Talend will apply the transformation for you. Talend uses Saxon at the moment so you get the benefit of clear error messages when trying to build your XSL.
Of course XSLT might be a skill that even scares some people, however XSLT is not difficult at all and the more you work with it the better you get as with any other human skill. Do not try to avoid it, if you have XML to process and your ETL tool is Talend then do your homework and learn some XSL.
As there is no better way to teach than providing an example I decided to write this quick showcase that will pivot the data resulting from running an Advent Geneva RSL report (A SOAP service) which comes in the form of key value pairs into a tabular output. I will provide two responses: XML and HTML. The first is probably what you need in Talend while the second is probably what you need if you want to provide a quick add hoc HTML report page.
Here is the XML:
<?xml version='1.0' encoding='UTF-8'?> <SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/" xmlns:SOAP-ENC="http://schemas.xmlsoap.org/soap/encoding/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:nsg="http://geneva.advent.com"> <SOAP-ENV:Body id="_0"> <reportResults xmlns="http://geneva.advent.com"> <return xsi:type="nsg:reportResultsPortfolioStruct"> <results xsi:type="nsg:reportResultsStruct"> <portfolioName xsi:type="xsd:string">Fund1</portfolioName> <header xsi:type="nsg:reportResultsRecordStruct"> <field xsi:type="nsg:reportResultsVectorElement"> <name xsi:type="xsd:string">Head</name> <value xsi:type="xsd:string">Fund 1 Example</value> </field> <field xsi:type="nsg:reportResultsVectorElement"> <name xsi:type="xsd:string">Head</name> <value xsi:type="xsd:string">DIVIDENDS RECEIVABLE AND PAYABLE</value> </field> <field xsi:type="nsg:reportResultsVectorElement"> <name xsi:type="xsd:string">Head</name> <value xsi:type="xsd:string">FOR THE PERIOD INCEPTION TO July 31, 2011</value> </field> <field xsi:type="nsg:reportResultsVectorElement"> <name xsi:type="xsd:string">Head</name> <value xsi:type="xsd:string"/> </field> </header> <record xsi:type="nsg:reportResultsRecordStruct"> <field xsi:type="nsg:reportResultsVectorElement"> <name xsi:type="xsd:string">CcyCode</name> <value xsi:type="xsd:string">BRL</value> </field> <field xsi:type="nsg:reportResultsVectorElement"> <name xsi:type="xsd:string">Custodian</name> <value xsi:type="xsd:string">My Custodian</value> </field> <field xsi:type="nsg:reportResultsVectorElement"> <name xsi:type="xsd:string">IDesc</name> <value xsi:type="xsd:string">My IDesc</value> </field> <field xsi:type="nsg:reportResultsVectorElement"> <name xsi:type="xsd:string">TransID</name> <value xsi:type="xsd:string">10145834</value> </field> <field xsi:type="nsg:reportResultsVectorElement"> <name xsi:type="xsd:string">TDate</name> <value xsi:type="xsd:string">March 15, 2011 12:00:00 am</value> </field> <field xsi:type="nsg:reportResultsVectorElement"> <name xsi:type="xsd:string">SDate</name> <value xsi:type="xsd:string">December 31, 9999 11:59:59 pm</value> </field> <field xsi:type="nsg:reportResultsVectorElement"> <name xsi:type="xsd:string">Qty</name> <value xsi:type="xsd:string">13528.013</value> </field> <field xsi:type="nsg:reportResultsVectorElement"> <name xsi:type="xsd:string">PerShare</name> <value xsi:type="xsd:string">0.15100000</value> </field> <field xsi:type="nsg:reportResultsVectorElement"> <name xsi:type="xsd:string">TaxRate</name> <value xsi:type="xsd:string"/> </field> <field xsi:type="nsg:reportResultsVectorElement"> <name xsi:type="xsd:string">DivLocal</name> <value xsi:type="xsd:string">2042.73</value> </field> <field xsi:type="nsg:reportResultsVectorElement"> <name xsi:type="xsd:string">TaxLocal</name> <value xsi:type="xsd:string">0.00</value> </field> <field xsi:type="nsg:reportResultsVectorElement"> <name xsi:type="xsd:string">DivLocalNet</name> <value xsi:type="xsd:string">2042.73</value> </field> <field xsi:type="nsg:reportResultsVectorElement"> <name xsi:type="xsd:string">DivBook</name> <value xsi:type="xsd:string">1225.76</value> </field> <field xsi:type="nsg:reportResultsVectorElement"> <name xsi:type="xsd:string">TaxBook</name> <value xsi:type="xsd:string">0.00</value> </field> <field xsi:type="nsg:reportResultsVectorElement"> <name xsi:type="xsd:string">DivBookNet</name> <value xsi:type="xsd:string">1225.76</value> </field> <field xsi:type="nsg:reportResultsVectorElement"> <name xsi:type="xsd:string">UnrealFXGL</name> <value xsi:type="xsd:string">0.00</value> </field> <field xsi:type="nsg:reportResultsVectorElement"> <name xsi:type="xsd:string">BookBal</name> <value xsi:type="xsd:string">1225.76</value> </field> </record> <record xsi:type="nsg:reportResultsRecordStruct"> <field xsi:type="nsg:reportResultsVectorElement"> <name xsi:type="xsd:string">CcyCode</name> <value xsi:type="xsd:string">USD</value> </field> <field xsi:type="nsg:reportResultsVectorElement"> <name xsi:type="xsd:string">Custodian</name> <value xsi:type="xsd:string">My Custodian</value> </field> <field xsi:type="nsg:reportResultsVectorElement"> <name xsi:type="xsd:string">IDesc</name> <value xsi:type="xsd:string">My IDesc</value> </field> <field xsi:type="nsg:reportResultsVectorElement"> <name xsi:type="xsd:string">TransID</name> <value xsi:type="xsd:string">10756740</value> </field> <field xsi:type="nsg:reportResultsVectorElement"> <name xsi:type="xsd:string">TDate</name> <value xsi:type="xsd:string">April 27, 2011 12:00:00 am</value> </field> <field xsi:type="nsg:reportResultsVectorElement"> <name xsi:type="xsd:string">SDate</name> <value xsi:type="xsd:string">July 1, 2011 12:00:00 am</value> </field> <field xsi:type="nsg:reportResultsVectorElement"> <name xsi:type="xsd:string">Qty</name> <value xsi:type="xsd:string">205212.046</value> </field> <field xsi:type="nsg:reportResultsVectorElement"> <name xsi:type="xsd:string">PerShare</name> <value xsi:type="xsd:string">0.16918500</value> </field> <field xsi:type="nsg:reportResultsVectorElement"> <name xsi:type="xsd:string">TaxRate</name> <value xsi:type="xsd:string"/> </field> <field xsi:type="nsg:reportResultsVectorElement"> <name xsi:type="xsd:string">DivLocal</name> <value xsi:type="xsd:string">34718.80</value> </field> <field xsi:type="nsg:reportResultsVectorElement"> <name xsi:type="xsd:string">TaxLocal</name> <value xsi:type="xsd:string">0.00</value> </field> <field xsi:type="nsg:reportResultsVectorElement"> <name xsi:type="xsd:string">DivLocalNet</name> <value xsi:type="xsd:string">34718.80</value> </field> <field xsi:type="nsg:reportResultsVectorElement"> <name xsi:type="xsd:string">DivBook</name> <value xsi:type="xsd:string">22153.39</value> </field> <field xsi:type="nsg:reportResultsVectorElement"> <name xsi:type="xsd:string">TaxBook</name> <value xsi:type="xsd:string">0.00</value> </field> <field xsi:type="nsg:reportResultsVectorElement"> <name xsi:type="xsd:string">DivBookNet</name> <value xsi:type="xsd:string">22153.39</value> </field> <field xsi:type="nsg:reportResultsVectorElement"> <name xsi:type="xsd:string">UnrealFXGL</name> <value xsi:type="xsd:string">0.00</value> </field> <field xsi:type="nsg:reportResultsVectorElement"> <name xsi:type="xsd:string">BookBal</name> <value xsi:type="xsd:string">23379.15</value> </field> </record> <addendumErrors xsi:type="nsg:reportResultsRecordStruct"> <field xsi:type="nsg:reportResultsVectorElement"> <name xsi:type="xsd:string">BegDesc</name> <value xsi:type="xsd:string"/> </field> <field xsi:type="nsg:reportResultsVectorElement"> <name xsi:type="xsd:string">EndDesc</name> <value xsi:type="xsd:string">DIVIDENDS RECEIVABLE - CLOSING BALANCE</value> </field> <field xsi:type="nsg:reportResultsVectorElement"> <name xsi:type="xsd:string">BegBal</name> <value xsi:type="xsd:string"/> </field> <field xsi:type="nsg:reportResultsVectorElement"> <name xsi:type="xsd:string">EndBal</name> <value xsi:type="xsd:string">71643.79</value> </field> </addendumErrors> <addendumErrors xsi:type="nsg:reportResultsRecordStruct"> <field xsi:type="nsg:reportResultsVectorElement"> <name xsi:type="xsd:string">BegDesc</name> <value xsi:type="xsd:string"/> </field> <field xsi:type="nsg:reportResultsVectorElement"> <name xsi:type="xsd:string">EndDesc</name> <value xsi:type="xsd:string">DIVIDENDS PAYABLE - CLOSING BALANCE</value> </field> <field xsi:type="nsg:reportResultsVectorElement"> <name xsi:type="xsd:string">BegBal</name> <value xsi:type="xsd:string"/> </field> <field xsi:type="nsg:reportResultsVectorElement"> <name xsi:type="xsd:string">EndBal</name> <value xsi:type="xsd:string">-20200.35</value> </field> </addendumErrors> </results> </return> </reportResults> </SOAP-ENV:Body> </SOAP-ENV:Envelope>
If you open it with Firefox and you choose "View XPath" from a right click on the body of the page (contextual menu) you could try several XPATH expresions. Together with XSLT skill it comes XPATH which is just a way to address a node, an attribute or textual content in the XML. See below how I tested one of the XPATH using this tool. Pay attention to the namespace definition, I use simple letters to abbreviate more verbose prefixes.
Now that you have a quick tool for finding nodes in the XML document let us see the desired document structure. Here is a screenshot of what we would like to see in HTML:
Here is in the XML we would like to obtain for further processing in Talend:
Here is the XSL that will output HTML:
<?xml version="1.0"?> <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:e="http://schemas.xmlsoap.org/soap/envelope/" xmlns:g="http://geneva.advent.com"> <xsl:output omit-xml-declaration="yes" indent="yes"/> <xsl:strip-space elements="*"/> <xsl:key name="kFieldNameByValue" match="/e:Envelope/e:Body/g:reportResults/g:return/g:results/g:record/g:field/g:name" use="."/> <xsl:variable name="vCols" select= "/e:Envelope/e:Body/g:reportResults/g:return/g:results/g:record/g:field/g:name [generate-id() = generate-id(key('kFieldNameByValue',.)[1]) ]"/> <xsl:template match="/"> <table> <tr> <xsl:apply-templates select="$vCols"/> </tr> <xsl:for-each select= "/e:Envelope/e:Body/g:reportResults/g:return/g:results/g:record"> <tr> <xsl:variable name="vPos" select="position()"/> <xsl:for-each select="$vCols"> <td> <xsl:value-of select= "../../../g:record[$vPos]/g:field[g:name = current()]/g:value"/> </td> </xsl:for-each> </tr> </xsl:for-each> </table> </xsl:template> <xsl:template match="/e:Envelope/e:Body/g:reportResults/g:return/g:results/g:record/g:field/g:name"> <td> <xsl:value-of select="." /> </td> </xsl:template> </xsl:stylesheet>
Here is the XSL that will output XML:
<?xml version="1.0"?> <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:e="http://schemas.xmlsoap.org/soap/envelope/" xmlns:g="http://geneva.advent.com" > <xsl:output omit-xml-declaration="no" indent="yes"/> <xsl:strip-space elements="*"/> <xsl:key name="kFieldNameByValue" match="/e:Envelope/e:Body/g:reportResults/g:return/g:results/g:record/g:field/g:name" use="."/> <xsl:variable name="vCols" select= "/e:Envelope/e:Body/g:reportResults/g:return/g:results/g:record/g:field/g:name [generate-id() = generate-id(key('kFieldNameByValue',.)[1]) ]"/> <xsl:template match="/"> <xsl:element name="root"> <xsl:element name="header"> <xsl:apply-templates select="$vCols"/> </xsl:element> <xsl:for-each select="/e:Envelope/e:Body/g:reportResults/g:return/g:results/g:record"> <xsl:element name="record"> <xsl:variable name="vPos" select="position()"/> <xsl:for-each select="$vCols"> <xsl:element name="value"><xsl:value-of select="../../../g:record[$vPos]/g:field[g:name = current()]/g:value"/></xsl:element> </xsl:for-each> </xsl:element> </xsl:for-each> </xsl:element> </xsl:template> <xsl:template match="/e:Envelope/e:Body/g:reportResults/g:return/g:results/g:record/g:field/g:name"> <xsl:element name="name"><xsl:value-of select="."/></xsl:element> </xsl:template> </xsl:stylesheet>
A quick explanation of what the XSL code does:
1-6 Is just stating it is an XML document which uses a namespace prefixed "xsl" for the transaformation instructions, a soap namespace with prefix "e" and an Advent Geneva namespace with prefix "g". Note that I abreviated the original namespace suffixes for the last two.
7-9 How the final output should look like.
11-19 We use the Muenchian grouping that allows us to have a list of all possible column names.
21-37 XSLT is a functional language which works matching nodes and applying transformations to them. The logic can be affected using xsl:apply-template. We use xsl:element to create our custom nodes: First the headers which come out of the Muenchian key and later the records for which we again use the keys while addressing the correct record through an xsl:foreach nested loop.
39-41 The template responsible for generating the name nodes.