Samples of complex XML processing

We use XSLT 1.0 to process some very complex XML files. Most of them come from devices such as routers, whose output is designed to be written quickly rather than read easily. Here are a couple of real processing samples.

How to check it?

You can use any XSLT processor. In our case, we use xsltproc as the tool to test the transformations.

xsltproc -o results.csv transformation.xsl xml_source_file.xml

Samples

There are two samples, from two major hardware vendors, to use as worked examples.

Huawei case - Version 1

In the Huawei case, we use the XSLT processor to convert large XML files into CSV files. However, we still need to split the values in some columns into rows. For example, using huawei.xsl from the xlst folder produces a result like this:

2019-01-23T11:00:00-05:00 | 41249367 | SomeDevice/SomeCell:Label=SomeLabel, CellID=1234, LogicRNCID=111 | 67179298 67179299 67179302 67179303 | 134 134 2200 59310 

In this sample, the fourth and fifth columns hold array values. The fourth column lists the measurement types and the fifth column lists the corresponding measured values for that date and time. We still need to split them into something more useful for a database table or a columnar data lake file. In our case, we use Apache NiFi to do that job, so you can find a Groovy example of how to split the values using an InvokeScriptedProcessor.
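The split described above can be illustrated with a short sketch. It is written in Python for illustration only, not in the Groovy used by the NiFi processor, and it assumes that the columns are separated by `|`, that the array values are separated by whitespace, and that the fourth and fifth columns pair up by position. The function name and the field names (`timestamp`, `group_id`, `target`) are hypothetical.

```python
def split_measurements(line, sep="|"):
    """Turn one wide output line into one row per (type, value) pair.

    Assumes five columns separated by `sep`; the column names used here
    are hypothetical and only serve the illustration.
    """
    fields = [field.strip() for field in line.split(sep)]
    if len(fields) != 5:
        raise ValueError(f"expected 5 columns, got {len(fields)}")
    timestamp, group_id, target, types, values = fields

    type_ids = types.split()   # fourth column: measurement types
    measures = values.split()  # fifth column: measured values
    if len(type_ids) != len(measures):
        raise ValueError("type and value arrays differ in length")

    # Assumption: the n-th type belongs to the n-th value.
    return [
        (timestamp, group_id, target, type_id, measure)
        for type_id, measure in zip(type_ids, measures)
    ]


sample = (
    "2019-01-23T11:00:00-05:00 | 41249367 | "
    "SomeDevice/SomeCell:Label=SomeLabel, CellID=1234, LogicRNCID=111 | "
    "67179298 67179299 67179302 67179303 | 134 134 2200 59310 "
)
rows = split_measurements(sample)
for row in rows:
    print(" | ".join(row))
```

Each input line becomes four rows here, one per measurement, with the first three columns repeated on every row.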

XSLT that pivots the array values - Version 2

In version 2, named huawei_v2.xsl, you will find an XSLT that is slower but pivots the arrays into rows in the CSV. For a quick example, check this simple sample of how to pivot an array into rows.

This method is too slow and inefficient. It took 9 hours and 47 minutes to complete on a 2015 MacBook Pro with an Intel i7 at 2.5 GHz and 16 GB of RAM.

Doing the pivot in two steps - Version 3

In version 3, we still pivot the measures, but in two steps. First, we output an XML file containing only the values that we need in the CSV. In the second step, we use another XSLT to pivot the arrays into rows.

xsltproc --timing -o result_v3.xml huawei_v3.xsl A20190725.1100-0500-1130-0500_DIFRNC190_P003.xml

After we have the intermediate XML file, we run:

xsltproc --timing -o result_v3.csv huawei_v3_child.xsl result_v3.xml
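If you want to chain both steps from a script, a minimal sketch could look like the following. It only wraps the two xsltproc commands shown above; the function names and default file names are assumptions for illustration, and xsltproc must be available on the PATH for `run_two_steps` to work.

```python
import subprocess


def two_step_commands(source_xml,
                      intermediate_xml="result_v3.xml",
                      output_csv="result_v3.csv"):
    """Build the two xsltproc invocations of version 3, in order."""
    step_one = ["xsltproc", "--timing", "-o", intermediate_xml,
                "huawei_v3.xsl", source_xml]
    step_two = ["xsltproc", "--timing", "-o", output_csv,
                "huawei_v3_child.xsl", intermediate_xml]
    return [step_one, step_two]


def run_two_steps(source_xml):
    """Run both steps; stop at the first one that fails."""
    for command in two_step_commands(source_xml):
        # check=True raises CalledProcessError on a non-zero exit code,
        # so step two never runs on a missing or broken intermediate file.
        subprocess.run(command, check=True)


if __name__ == "__main__":
    for command in two_step_commands(
            "A20190725.1100-0500-1130-0500_DIFRNC190_P003.xml"):
        print(" ".join(command))
```

Calling `run_two_steps` with the source file name would execute both transformations in sequence.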

Times are not great: on a 2015 MacBook Pro (Intel i7 at 2.5 GHz, 16 GB of RAM) it takes around 86 seconds to process a 24 MB source XML. However, that is much better than the second version, which takes hours to produce the output on the same hardware.
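To put the two timings side by side, here is a rough back-of-the-envelope calculation. It assumes that the 9 hours and 47 minutes reported for version 2 were measured on a comparable input, which the figures above do not state explicitly, and it reads "24 MB" as megabytes.

```python
# Timings reported in this document.
v2_seconds = 9 * 3600 + 47 * 60  # version 2: 9 h 47 min
v3_seconds = 86                  # version 3: both steps together
source_size_mb = 24              # size of the source XML

# Assumption: both versions processed a comparable input file.
speedup = v2_seconds / v3_seconds
throughput_mb_per_s = source_size_mb / v3_seconds

print(f"version 2: {v2_seconds} s")
print(f"speedup of version 3: about {speedup:.0f}x")
print(f"version 3 throughput: about {throughput_mb_per_s:.2f} MB/s")
```

Under that assumption, the two-step approach is roughly 400 times faster, while still processing well under 1 MB per second.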

Ericsson case

In this case, the XML is simpler, so the XSLT is also very straightforward: we obtain the final CSV directly, ready to be inserted into a table.

References

Pivot