Remove duplicate tags with XSLT

Sometimes we need to clean XML data by removing duplicate tags (elements or child elements). How can we do that? The answer is XSLT. So, how do we remove duplicate tags from XML with XSLT?

Here I’m using a very simple XML document as sample input.

Input XML:


				
<?xml version="1.0"?>
<customers>
   <customer id="55000">
      <name>Charter Group</name>
      <address>
         <street>100 Main</street>
         <city>Framingham</city>
         <state>MA</state>
         <zip>01701</zip>
      </address>
      <address>
         <street>720 Prospect</street>
         <city>Framingham</city>
         <state>MA</state>
         <zip>01701</zip>
         <detail>
            <A>akkk</A>
            <A>akkk</A>
            <A>akkk</A>   
         </detail>
      </address>   
      <address>
         <street>120 Ridge</street>
         <state>MA</state>
         <zip>01760</zip>
      </address>
      <address>
         <street>120 Ridge</street>
         <state>MA</state>
         <zip>01760</zip>
      </address>
   </customer>
</customers>
				
			

If we analyze the XML input above, we can see two places where tags are duplicated.

First duplicate tags

The <detail> tag contains multiple <A> tags with the same value, ‘akkk’, so only one distinct <A> should remain.

				
<detail>
   <A>akkk</A>
   <A>akkk</A>
   <A>akkk</A>
</detail>
				
			

Second duplicate tags

The last <address> tag is exactly the same as the second-to-last <address> tag, with the same child elements and values, so these two tags are duplicates as well. We need to keep only one of them.

				
<address>
   <street>120 Ridge</street>
   <state>MA</state>
   <zip>01760</zip>
</address>
<address>
   <street>120 Ridge</street>
   <state>MA</state>
   <zip>01760</zip>
</address>
				
			

Remove duplicate tags with XSLT

Here is a simple XSLT stylesheet that copies all elements but, within each parent, keeps only the first of any child elements that share the same name and string value. Both the tag names and their values are preserved.

				
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  
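  <xsl:output method="xml" indent="yes"/>

  <!-- Copy attributes, text, comments and processing instructions unchanged -->
  <xsl:template match="@* | text() | comment() | processing-instruction()">
    <xsl:copy/>
  </xsl:template>

  <!-- Copy each element, keeping only the first child element per group -->
  <xsl:template match="*">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <!-- Group key: element name plus whitespace-normalized string value -->
      <xsl:for-each-group select="*" group-by="concat(name(), '|', normalize-space(.))">
        <xsl:apply-templates select="current-group()[1]"/>
      </xsl:for-each-group>
      <!-- Text nodes are written after the child elements -->
      <xsl:apply-templates select="text()"/>
    </xsl:copy>
  </xsl:template>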
  
</xsl:stylesheet>
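
To check which siblings share a grouping key before removing anything, a small diagnostic stylesheet can help. The following is only a sketch: it assumes the same key as above (element name plus normalized string value) and an XSLT 2.0 processor, and the ' x ' report format is just an illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Visit every element that has at least one child element -->
    <xsl:for-each select="//*[*]">
      <!-- Group its children by the same key the clean-up stylesheet uses -->
      <xsl:for-each-group select="*"
                          group-by="concat(name(), '|', normalize-space(.))">
        <!-- Report only keys shared by more than one sibling -->
        <xsl:if test="count(current-group()) gt 1">
          <xsl:value-of select="concat(count(current-group()), ' x ',
                                       current-grouping-key(), '&#10;')"/>
        </xsl:if>
      </xsl:for-each-group>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

On the sample input this should report two lines, `2 x address|120 Ridge MA 01760` and `3 x A|akkk`, matching the two duplicate groups described earlier.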

				
			

If we run the above XSLT against the above input, it generates the output below.

				
<?xml version="1.0" encoding="UTF-8"?>
<customers>
   <customer id="55000">
      <name>Charter Group</name>
      <address>
         <street>100 Main</street>
         <city>Framingham</city>
         <state>MA</state>
         <zip>01701</zip>
      </address>
      <address>
         <street>720 Prospect</street>
         <city>Framingham</city>
         <state>MA</state>
         <zip>01701</zip>
         <detail>
            <A>akkk</A>
         </detail>
      </address>
      <address>
         <street>120 Ridge</street>
         <state>MA</state>
         <zip>01760</zip>
      </address>
   </customer>
</customers>

				
			

Now, if we analyse the output generated by the XSLT, we can see that both sets of duplicate tags, the repeated <A> child elements and the repeated <address> element, have been removed, leaving one of each.
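
One caveat: the key used above compares only the element name and its text, so two siblings with the same text but different attributes or a different child structure would also be treated as duplicates. If that matters for your data, one possible variant (not part of the stylesheet above, and only a sketch assuming an XSLT 2.0 processor) is to compare siblings with the standard deep-equal() function instead:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <!-- Ignore whitespace-only text nodes so indentation does not affect the comparison -->
  <xsl:strip-space elements="*"/>

  <!-- Identity template: copy every node and attribute as it is -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Drop an element when an earlier sibling has the same name,
       attributes and content -->
  <xsl:template match="*[some $prev in preceding-sibling::*
                         satisfies deep-equal(., $prev)]"/>

</xsl:stylesheet>
```

This compares each element with every earlier sibling, so it may be slow for parents with many children. For the sample input in this post it should leave the same single <A> and the same three <address> elements.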