Paul Kiel's Data Integration Blog
Data integration using Xml / Xslt and anything else...




XSLT 1.0 grouping, iterating, and uniqueness without keys

I'll preface this by saying that I love the new features in XSLT 2.0. The very first ones I used took care of problems around grouping nodes and finding unique ones. The xsl:for-each-group instruction is a lifesaver. Previously, I'd used tool extensions such as Saxon's saxon:distinct() function to solve the problem. However, depending on the client, I may not have had the choice to use 2.0 or extensions. So I was stuck trying to figure out how to group nodes or find unique nodes using only XSLT 1.0 features, without proprietary extensions.
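For context, here is a minimal sketch of what the XSLT 2.0 approach looks like. It assumes the records/contact input used in the example later in this post, and the plain-text output layout is only for illustration.

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/records">
    <!-- One iteration per distinct surname (XSLT 2.0 only). -->
    <xsl:for-each-group select="contact" group-by="surname">
      <xsl:sort select="current-grouping-key()"/>
      <xsl:value-of select="current-grouping-key()"/>
      <xsl:text>&#10;</xsl:text>
      <!-- current-group() holds every contact sharing this surname. -->
      <xsl:for-each select="current-group()">
        <xsl:text>    </xsl:text>
        <xsl:value-of select="forename"/>
        <xsl:text> (</xsl:text>
        <xsl:value-of select="title"/>
        <xsl:text>)&#10;</xsl:text>
      </xsl:for-each>
    </xsl:for-each-group>
  </xsl:template>
</xsl:stylesheet>
```

None of this is available to an XSLT 1.0 processor, which is the constraint for the rest of this post.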

Some solutions have been documented, most notably by Jeni Tennison, who describes the Muenchian Method for grouping nodes. That is probably the best one.
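As a rough reminder of what that method involves, here is a minimal sketch, assuming the records/contact input used in the example later in this post. The key name contacts-by-surname and the plain-text output are illustrative choices, not something this post depends on.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <!-- Index every contact by its surname (the key name is illustrative). -->
  <xsl:key name="contacts-by-surname" match="contact" use="surname"/>

  <xsl:template match="/records">
    <!-- Keep only the first contact of each surname group: a contact qualifies
         when it is the same node as the first one the key returns for its surname. -->
    <xsl:for-each select="contact[generate-id() = generate-id(key('contacts-by-surname', surname)[1])]">
      <xsl:sort select="surname"/>
      <xsl:value-of select="surname"/>
      <xsl:text>&#10;</xsl:text>
      <!-- Iterate over every contact that shares this surname. -->
      <xsl:for-each select="key('contacts-by-surname', surname)">
        <xsl:text>    </xsl:text>
        <xsl:value-of select="forename"/>
        <xsl:text> (</xsl:text>
        <xsl:value-of select="title"/>
        <xsl:text>)&#10;</xsl:text>
      </xsl:for-each>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```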

The problem is that the Muenchian Method requires the use of XSLT keys (xsl:key). Some processors (Jeni notes that James Clark's XT is one) don't implement keys. I am a big fan of XT for certain applications and continue to use it on occasion. So I wanted to highlight how I've done grouping, iteration, and uniqueness using only XSLT 1.0 techniques, without keys or extensions.

Assumptions:

  • I assume the same scenario as in Jeni's example: records in a database that contain contacts that need to be sorted. Each surname needs to be displayed only once (uniqueness), followed by all of the matching forenames (iteration).
  • I am using XSLT 1.0 without any extensions.
  • I cannot use keys.
  • I need to iterate through a unique list of node values (duplicate values are likely among the nodes).
  • There is an unknown number of nodes to iterate through.

While this example is very simple and could be done with the preceding or following axes, a more complex XML document can make that kind of XPath too confusing to track. This method keeps the XPath simple.

<records>
  <contact id="0001">
    <title>Mr</title>
    <forename>John</forename>
    <surname>Smith</surname>
  </contact>
  <contact id="0003">
    <title>Ms</title>
    <forename>Fiona</forename>
    <surname>Smith</surname>
  </contact>
  <contact id="0002">
    <title>Dr</title>
    <forename>Amy</forename>
    <surname>Jones</surname>
  </contact>
  <contact id="0004">
    <title>Mr</title>
    <forename>Brian</forename>
    <surname>Jones</surname>
  </contact>
</records>
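For comparison, the axis-based alternative mentioned above could look roughly like this for the sample input. This is only a sketch: it assumes all contact elements are siblings under records, and the variable name currentSurname and the plain-text output are illustrative.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/records">
    <!-- A contact starts a group when no earlier sibling contact has the same surname. -->
    <xsl:for-each select="contact[not(surname = preceding-sibling::contact/surname)]">
      <xsl:sort select="surname"/>
      <xsl:variable name="currentSurname" select="surname"/>
      <xsl:value-of select="$currentSurname"/>
      <xsl:text>&#10;</xsl:text>
      <!-- Iterate over every contact that shares this surname. -->
      <xsl:for-each select="../contact[surname = $currentSurname]">
        <xsl:text>    </xsl:text>
        <xsl:value-of select="forename"/>
        <xsl:text> (</xsl:text>
        <xsl:value-of select="title"/>
        <xsl:text>)&#10;</xsl:text>
      </xsl:for-each>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

This works here because the values to compare sit at a fixed position relative to each other. Once they are spread across different levels of a larger document, the predicate gets much harder to read, which is the motivation for the string-based approach in this post.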


 

The stylesheet:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
  <!-- ====================================================================== -->
  <!-- ===== Root ===== -->
  <!-- ====================================================================== -->
  <xsl:template match="/">
    <xsl:variable name="delimitedSurnames">
      <!-- This variable holds a sorted, delimited list of all surnames. It includes duplicates and is delimited with a tilde "~".
           The sample xml above produces this:
           Jones~Jones~Smith~Smith~
      -->
      <xsl:for-each select="//contact/surname">
        <xsl:sort select="."/>
        <xsl:value-of select="."/>~</xsl:for-each>
    </xsl:variable>
    <xsl:call-template name="processUniqueValues">
      <xsl:with-param name="delimitedSurnames">
        <xsl:value-of select="$delimitedSurnames"/>
      </xsl:with-param>
    </xsl:call-template>
  </xsl:template>
  <!-- ====================================================================== -->
  <!-- ===== processUniqueValues: iterates through a unique list of surnames given a delimited string (delimited with '~') ===== -->
  <!-- ====================================================================== -->
  <xsl:template name="processUniqueValues">
    <xsl:param name="delimitedSurnames"/>
    <xsl:variable name="firstOne">
      <!-- variable firstOne: the first value in the delimited list of items -->
      <xsl:value-of select="substring-before($delimitedSurnames,'~')"/>
    </xsl:variable>
    <xsl:variable name="firstOneDelimited">
      <!-- variable firstOneDelimited: the first value in the delimited list of items with the tilde "~" delimiter -->
      <xsl:value-of select="substring-before($delimitedSurnames,'~')"/>~</xsl:variable>
    <xsl:variable name="theRest">
      <!-- variable theRest: the rest of the delimited list after the first one is removed -->
      <xsl:value-of select="substring-after($delimitedSurnames,'~')"/>
    </xsl:variable>
    <xsl:choose>
      <!-- when the current one exists again in the remaining list AND the first one isn't empty, skip it and process the rest -->
      <!-- note: contains() is a substring test, so a surname that is also the ending of a later surname in the sorted list (e.g. "Ann" before "McAnn") would be skipped -->
      <xsl:when test="contains($theRest,$firstOneDelimited) and not($firstOne='')">
        <xsl:call-template name="processUniqueValues">
          <xsl:with-param name="delimitedSurnames">
            <xsl:value-of select="$theRest"/>
          </xsl:with-param>
        </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
        <!-- otherwise this is the last occurrence in the list, so output the surname followed by its matching contacts -->
        <xsl:text>&#10;</xsl:text>
        <xsl:value-of select="$firstOne"/>, <br/>
        <xsl:for-each select="//contact[surname=$firstOne]">
          <xsl:text>&#10;    </xsl:text>
          <xsl:value-of select="forename"/> (<xsl:value-of select="title"/>) <br/>
        </xsl:for-each>
        <xsl:if test="contains($theRest,'~')">
          <!-- when there are more left in the delimited list, call the template with the remaining items -->
          <xsl:call-template name="processUniqueValues">
            <xsl:with-param name="delimitedSurnames">
              <xsl:value-of select="$theRest"/>
            </xsl:with-param>
          </xsl:call-template>
        </xsl:if>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>

The result:

Jones, <br/>
     Amy (Dr) <br/>
     Brian (Mr) <br/>
Smith, <br/>
     John (Mr) <br/>
     Fiona (Ms) <br/>


© Copyright 2007 Paul Kiel.
Last update: 9/22/2007; 4:15:31 PM.