Dude, Let's [XML] Shred!

May 28^th, 2018

Our guest blogger, Andrew Hinkle, gives us the basics of how to shred XML into a more useful query using SQL Server.

When working with databases, at one point, you'll encounter an XML datatype which requires you to extract data for a report or to send the data to fields in a table.

These XML data types have been part of SQL Server since 2008.

But how do you query them and convert it into something useful?

By shredding, dude...by shredding!

XML Shredding is the process of parsing XML into something more useful and easier to query.

The following examples will demonstrate some of the basic methods for shredding your XML.

This article assumes some basic knowledge of XML and XPATH. However, most of these concepts are covered under the references at the end of the article.

An XML Sample

First, let's define a very simple XML that contains a couple root elements, a collection, nested elements, and a couple attributes.

DECLARE @x XML
 
SET @x = '
<Library>
  <Books>
    <Book Type="Paperback">
      <Author>Robert Jordan</Author>
      <Id>26</Id>
      <PublicationDate>01/15/1990</PublicationDate>
      <Series>
        <Name>The Wheel of Time</Name>
        <Number>1</Number>
      </Series>
      <Title Chapters="53">The Eye of the World</Title>
    </Book>
    <Book Type="Hardback">
      <Author>Robert Jordan</Author>
      <Id>87</Id>
      <PublicationDate>09/15/1992</PublicationDate>
      <Series>
        <Name>The Wheel of Time</Name>
        <Number>4</Number>
      </Series>
      <Title Chapters="58">The Shadow Rising</Title>
    </Book>
    <Book Type="eBook">
      <Author>Robert Jordan</Author>
      <Id>43</Id>
      <PublicationDate>05/15/1996</PublicationDate>
      <Series>
        <Name>The Wheel of Time</Name>
        <Number>7</Number>
      </Series>
      <Title Chapters="41">A Crown of Swords</Title>
    </Book>
  </Books>
  <Id>51</Id>
  <Name>We Have Books... Read Them or Else!</Name>
</Library>'

Using .value

Now that we have our XML, let's parse this data with the "value" method.

It requires an XPATH within parenthesis, the [1] meaning the first element returned (not zero-based), followed by the second parameter the value type (INT, VARCHAR(MAX), etc.)

SELECT @x.value('(/Library/Id)[1]', 'INT') AS LibraryId,
       @x.value('(/Library/Name)[1]', 'VARCHAR(MAX)') AS LibraryName

Here are the results of the query.

LibraryId	LibraryName
51	We have Books...Read Them or Else!

(1 row affected)

Moving Forward with .nodes

.nodes allow you to access collections. Again, use an XPATH string to identify the elements. Following the method call, define an alias of the collection and the alias of the element.

In this example, that's Books(Book), but it could just as easily have been t(c).

Use .value to access the node you want just like before, however, now you'll want to use ./ to start within the scope of the element. Of course, you may use any XPATH.

Notice that you access attributes by navigating to the node and then use the @AttributeName to access it.

SELECT Books.Book.value('(./Id)[1]', 'INT') AS Id,
       Books.Book.value('(./Title)[1]', 'VARCHAR(MAX)') AS Title,
       Books.Book.value('(./Author)[1]', 'VARCHAR(MAX)') AS Author,
       Books.Book.value('(./PublicationDate)[1]', 'DATE') AS PublicationDate,
       Books.Book.value('(./Series/Name)[1]', 'VARCHAR(MAX)') AS SeriesName,
       Books.Book.value('(./Series/Number)[1]', 'VARCHAR(MAX)') AS SeriesNumber,
       Books.Book.value('@Type', 'VARCHAR(MAX)') AS BookType,
       Books.Book.value('(./Title/@Chapters)[1]', 'INT') AS Chapters
FROM @x.nodes('(/Library/Books/Book)') AS Books(Book)

Here are the results:

Id	Title
26	The Eye of the World
87	The Shadow Rising
43	A Crown of Swords

(3 rows affected)

Using XML Namespaces

Let's add in XML namespaces by adjusting the original XML Sample on the Library and Books nodes.

SET @x = '
<Library xmlns="http://www.example.com/library">
  <Books xmlns="http://www.example.com/library/books">
    <Book Type="Paperback">
      <Author>Robert Jordan</Author>
      <Id>26</Id>
      <PublicationDate>01/15/1990</PublicationDate>
      <Series>
        <Name>The Wheel of Time</Name>
        <Number>1</Number>
      </Series>
      <Title Chapters="53">The Eye of the World</Title>
    </Book>
    <Book Type="Hardback">
      <Author>Robert Jordan</Author>
      <Id>87</Id>
      <PublicationDate>09/15/1992</PublicationDate>
      <Series>
        <Name>The Wheel of Time</Name>
        <Number>4</Number>
      </Series>
      <Title Chapters="58">The Shadow Rising</Title>
    </Book>
    <Book Type="eBook">
      <Author>Robert Jordan</Author>
      <Id>43</Id>
      <PublicationDate>05/15/1996</PublicationDate>
      <Series>
        <Name>The Wheel of Time</Name>
        <Number>7</Number>
      </Series>
      <Title Chapters="41">A Crown of Swords</Title>
    </Book>
  </Books>
  <Id>51</Id>
  <Name>We Have Books... Read Them or Else!</Name>
</Library>'

Now that we have our namespaces, let's see how we can access data through the .value method.

.value works the same except now you declare the namespace before the XPATH separating the two with a semicolon.

.nodes works the same way, but adding this declaration reduces the readability.

SELECT @x.value('declare namespace l="http://www.example.com/library";
                 (/l:Library/l:Id)[1]', 'INT') AS LibraryId,
       @x.value('declare namespace l="http://www.example.com/library";
                 (/l:Library/l:Name)[1]', 'VARCHAR(MAX)') AS LibraryName

This gives us:

LibraryId	LibraryName
51	We Have Books... Read Them or Else!

Declaring namespaces for each value is annoying. Let's wrap it with a using statement.

Using WITH XMLNAMESPACES

Add the ;WITH XMLNAMESPACES before each of your select statements as if you were creating a Common Table Expression (CTE).

Before each of your nodes in the XPATH, add the namespace prefix.

Note that the prefix was not in the actual XML.

;WITH XMLNAMESPACES ('http://www.example.com/library' AS l)
SELECT @x.value('(/l:Library/l:Id)[1]', 'INT') AS LibraryId,
       @x.value('(/l:Library/l:Name)[1]', 'VARCHAR(MAX)') AS LibraryName
 
;WITH XMLNAMESPACES ('http://www.example.com/library' AS l,
                     'http://www.example.com/library/books' AS b)
SELECT Books.Book.value('(./b:Id)[1]', 'INT') AS Id,
       Books.Book.value('(./b:Title)[1]', 'VARCHAR(MAX)') AS Title,
       Books.Book.value('(./b:Author)[1]', 'VARCHAR(MAX)') AS Author,
       Books.Book.value('(./b:PublicationDate)[1]', 'DATE') AS PublicationDate,
       Books.Book.value('(./b:Series/b:Name)[1]', 'VARCHAR(MAX)') AS SeriesName,
       Books.Book.value('(./b:Series/b:Number)[1]', 'VARCHAR(MAX)') AS SeriesNumber,
       Books.Book.value('@Type', 'VARCHAR(MAX)') AS BookType,
       Books.Book.value('(./b:Title/@Chapters)[1]', 'INT') AS Chapters
FROM @x.nodes('(/l:Library/b:Books/b:Book)') AS Books(Book)

(1 row affected)

LibraryId	LibraryName
51	We have Books...Read Them or Else!

(1 row affected)

Id	Title
26	The Eye of the World
87	The Shadow Rising
43	A Crown of Swords

(3 rows affected)

Shredding Multiple XMLs

Of course, once you've managed to shred one xml, you'll want to do more. You can read the XML from files and queries, let's try an example from a temp table.

We'll populate it with the XML data we just tested into three rows.

IF OBJECT_ID('tempdb..#XmlShreddingTable') IS NOT NULL DROP TABLE #XmlShreddingTable
CREATE TABLE #XmlShreddingTable ( XmlId INT, XmlData XML )
 
INSERT INTO #XmlShreddingTable
VALUES (1, @x), (2, @x), (3, @x)
 
SELECT * FROM #XmlShreddingTable

For the collection of books we'll cross apply those rows against the nodes to get our results for all of the record sets.

Then we insert it into other tables, files, or wherever we need to store the shredded data.

;WITH XMLNAMESPACES ('http://www.example.com/library' AS l),
      Records AS
(
    SELECT xst.XmlId, xst.XmlData FROM #XmlShreddingTable xst
)
SELECT Records.XmlId,
       Records.XmlData.value('(/l:Library/l:Id)[1]', 'INT') AS LibraryId,
       Records.XmlData.value('(/l:Library/l:Name)[1]', 'VARCHAR(MAX)') AS LibraryName
FROM Records

XmlId	LibraryId	LibraryName
1	51	We have Books...Read Them or Else!
2	51	We have Books...Read Them or Else!
3	51	We have Books...Read Them or Else!

(3 rows affected)

;WITH XMLNAMESPACES ('http://www.example.com/library' AS l,
                     'http://www.example.com/library/books' AS b),
      Records AS
(
    SELECT xst.XmlId, xst.XmlData FROM #XmlShreddingTable xst
)
SELECT Records.XmlId,
       Books.Book.value('(./b:Id)[1]', 'INT') AS Id,
       Books.Book.value('(./b:Title)[1]', 'VARCHAR(MAX)') AS Title,
       Books.Book.value('(./b:Author)[1]', 'VARCHAR(MAX)') AS Author,
       Books.Book.value('(./b:PublicationDate)[1]', 'DATE') AS PublicationDate,
       Books.Book.value('(./b:Series/b:Name)[1]', 'VARCHAR(MAX)') AS SeriesName,
       Books.Book.value('(./b:Series/b:Number)[1]', 'VARCHAR(MAX)') AS SeriesNumber,
       Books.Book.value('@Type', 'VARCHAR(MAX)') AS BookType,
       Books.Book.value('(./b:Title/@Chapters)[1]', 'INT') AS Chapters
FROM Records CROSS APPLY Records.XmlData.nodes('(/l:Library/b:Books/b:Book)') AS Books(Book)

XmlId	LibraryId	LibraryName
1	26	The Eye of the World
1	87	The Shadow Rising
1	43	A Crown of Swords
2	26	The Eye of the World
2	87	The Shadow Rising
2	43	A Crown of Swords
3	26	The Eye of the World
3	87	The Shadow Rising
3	43	A Crown of Swords

(9 records affected)

Instead of using a Common Table Expression (CTE) you could move the select statement into the FROM statement as an inline query as follows.

;WITH XMLNAMESPACES ('http://www.example.com/library' AS l)
SELECT Records.XmlId,
       Records.XmlData.value('(/l:Library/l:Id)[1]', 'INT') AS LibraryId,
       Records.XmlData.value('(/l:Library/l:Name)[1]', 'VARCHAR(MAX)') AS LibraryName
FROM (SELECT xst.XmlId, xst.XmlData FROM #XmlShreddingTable xst) AS Records

XmlId	LibraryId	LibraryName
1	51	We have Books...Read Them or Else!
2	51	We have Books...Read Them or Else!
3	51	We have Books...Read Them or Else!

(3 records affected)

;WITH XMLNAMESPACES ('http://www.example.com/library' AS l,
                     'http://www.example.com/library/books' AS b)
SELECT Records.XmlId,
       Books.Book.value('(./b:Id)[1]', 'INT') AS Id,
       Books.Book.value('(./b:Title)[1]', 'VARCHAR(MAX)') AS Title,
       Books.Book.value('(./b:Author)[1]', 'VARCHAR(MAX)') AS Author,
       Books.Book.value('(./b:PublicationDate)[1]', 'DATE') AS PublicationDate,
       Books.Book.value('(./b:Series/b:Name)[1]', 'VARCHAR(MAX)') AS SeriesName,
       Books.Book.value('(./b:Series/b:Number)[1]', 'VARCHAR(MAX)') AS SeriesNumber,
       Books.Book.value('@Type', 'VARCHAR(MAX)') AS BookType,
       Books.Book.value('(./b:Title/@Chapters)[1]', 'INT') AS Chapters
FROM (SELECT xst.XmlId, xst.XmlData FROM #XmlShreddingTable xst) AS Records
      CROSS APPLY Records.XmlData.nodes('(/l:Library/b:Books/b:Book)') AS Books(Book)

XmlId	LibraryId	LibraryName
1	26	The Eye of the World
1	87	The Shadow Rising
1	43	A Crown of Swords
2	26	The Eye of the World
2	87	The Shadow Rising
2	43	A Crown of Swords
3	26	The Eye of the World
3	87	The Shadow Rising
3	43	A Crown of Swords

(9 records affected)

Other Considerations

There are plenty of ways to perform these steps some more efficient than others.

Other options could be to create a while loop or create cursors to call stored procedures that perform XML shredding and inserts each to a single table. This would allow you to separate the concerns and follow a command pattern with a master stored procedure that calls all of the others.

If you are creating this data for reports, you may want to consider performing a bulk load of all the data from the XML shredding.

After that you may want to have the stored procedure performing this task accept a date, date range, or flag so you only shred the XML for those records not yet processed.

References

XQuery Language Reference - There are many ways to shred XML.
XPath Tutorial - Syntax
;WITH XMLNAMESPACES - Taught me how to use them.
How to Shred a Single Value - This reference as well.
How to shred collections of nodes
How to shred attributes
How to perform a CROSS APPLY
Common Table Expression (CTE) Overview
Performant XML Shredding - takes the basics of what was taught here and made it efficient when you need to perform joins using CROSS APPLY and OUTER APPLY.
These Books are Awesome

Conclusion

Now that we've extracted the data from the XML, you can save the data to tables in your database, maybe store them in TEMP tables to generate a report.

There are plenty of options. This is just the beginning.

The code is located at my GitHub repo under the Tips.XmlShredding folder.

How do you shred your XML? Is there an easier way? Do you know of a way to make this perform faster and more efficiently?