Dude, Let's [XML] Shred!
Our guest blogger, Andrew Hinkle, gives us the basics of how to shred XML into a more useful query using SQL Server.
When working with databases, at one point, you'll encounter an XML datatype which requires you to extract data for a report or to send the data to fields in a table.
These XML data types have been part of SQL Server since 2008.
But how do you query them and convert it into something useful?
By shredding, dude...by shredding!
XML Shredding is the process of parsing XML into something more useful and easier to query.
The following examples will demonstrate some of the basic methods for shredding your XML.
This article assumes some basic knowledge of XML and XPATH. However, most of these concepts are covered under the references at the end of the article.
An XML Sample
First, let's define a very simple XML that contains a couple root elements, a collection, nested elements, and a couple attributes.
DECLARE @x XML SET @x = ' <Library> <Books> <Book Type="Paperback"> <Author>Robert Jordan</Author> <Id>26</Id> <PublicationDate>01/15/1990</PublicationDate> <Series> <Name>The Wheel of Time</Name> <Number>1</Number> </Series> <Title Chapters="53">The Eye of the World</Title> </Book> <Book Type="Hardback"> <Author>Robert Jordan</Author> <Id>87</Id> <PublicationDate>09/15/1992</PublicationDate> <Series> <Name>The Wheel of Time</Name> <Number>4</Number> </Series> <Title Chapters="58">The Shadow Rising</Title> </Book> <Book Type="eBook"> <Author>Robert Jordan</Author> <Id>43</Id> <PublicationDate>05/15/1996</PublicationDate> <Series> <Name>The Wheel of Time</Name> <Number>7</Number> </Series> <Title Chapters="41">A Crown of Swords</Title> </Book> </Books> <Id>51</Id> <Name>We Have Books... Read Them or Else!</Name> </Library>'
Using .value
Now that we have our XML, let's parse this data with the "value" method.
It requires an XPATH within parenthesis, the [1] meaning the first element returned (not zero-based), followed by the second parameter the value type (INT, VARCHAR(MAX)
, etc.)
SELECT @x.value('(/Library/Id)[1]', 'INT') AS LibraryId, @x.value('(/Library/Name)[1]', 'VARCHAR(MAX)') AS LibraryName
Here are the results of the query.
LibraryId | LibraryName |
---|---|
51 | We have Books...Read Them or Else! |
(1 row affected)
Moving Forward with .nodes
.nodes allow you to access collections. Again, use an XPATH string to identify the elements. Following the method call, define an alias of the collection and the alias of the element.
In this example, that's Books(Book)
, but it could just as easily have been t(c).
Use .value
to access the node you want just like before, however, now you'll want to use ./ to start within the scope of the element. Of course, you may use any XPATH.
Notice that you access attributes by navigating to the node and then use the @AttributeName to access it.
SELECT Books.Book.value('(./Id)[1]', 'INT') AS Id, Books.Book.value('(./Title)[1]', 'VARCHAR(MAX)') AS Title, Books.Book.value('(./Author)[1]', 'VARCHAR(MAX)') AS Author, Books.Book.value('(./PublicationDate)[1]', 'DATE') AS PublicationDate, Books.Book.value('(./Series/Name)[1]', 'VARCHAR(MAX)') AS SeriesName, Books.Book.value('(./Series/Number)[1]', 'VARCHAR(MAX)') AS SeriesNumber, Books.Book.value('@Type', 'VARCHAR(MAX)') AS BookType, Books.Book.value('(./Title/@Chapters)[1]', 'INT') AS Chapters FROM @x.nodes('(/Library/Books/Book)') AS Books(Book)
Here are the results:
Id | Title |
---|---|
26 | The Eye of the World |
87 | The Shadow Rising |
43 | A Crown of Swords |
(3 rows affected)
Using XML Namespaces
Let's add in XML namespaces by adjusting the original XML Sample on the Library and Books nodes.
SET @x = ' <Library xmlns="http://www.example.com/library"> <Books xmlns="http://www.example.com/library/books"> <Book Type="Paperback"> <Author>Robert Jordan</Author> <Id>26</Id> <PublicationDate>01/15/1990</PublicationDate> <Series> <Name>The Wheel of Time</Name> <Number>1</Number> </Series> <Title Chapters="53">The Eye of the World</Title> </Book> <Book Type="Hardback"> <Author>Robert Jordan</Author> <Id>87</Id> <PublicationDate>09/15/1992</PublicationDate> <Series> <Name>The Wheel of Time</Name> <Number>4</Number> </Series> <Title Chapters="58">The Shadow Rising</Title> </Book> <Book Type="eBook"> <Author>Robert Jordan</Author> <Id>43</Id> <PublicationDate>05/15/1996</PublicationDate> <Series> <Name>The Wheel of Time</Name> <Number>7</Number> </Series> <Title Chapters="41">A Crown of Swords</Title> </Book> </Books> <Id>51</Id> <Name>We Have Books... Read Them or Else!</Name> </Library>'
Now that we have our namespaces, let's see how we can access data through the .value
method.
.value
works the same except now you declare the namespace before the XPATH separating the two with a semicolon.
.nodes
works the same way, but adding this declaration reduces the readability.
SELECT @x.value('declare namespace l="http://www.example.com/library"; (/l:Library/l:Id)[1]', 'INT') AS LibraryId, @x.value('declare namespace l="http://www.example.com/library"; (/l:Library/l:Name)[1]', 'VARCHAR(MAX)') AS LibraryName
This gives us:
LibraryId | LibraryName |
---|---|
51 | We Have Books... Read Them or Else! |
Declaring namespaces for each value is annoying. Let's wrap it with a using statement.
Using WITH XMLNAMESPACES
Add the ;WITH XMLNAMESPACES before each of your select statements as if you were creating a Common Table Expression (CTE).
Before each of your nodes in the XPATH, add the namespace prefix.
Note that the prefix was not in the actual XML.
;WITH XMLNAMESPACES ('http://www.example.com/library' AS l) SELECT @x.value('(/l:Library/l:Id)[1]', 'INT') AS LibraryId, @x.value('(/l:Library/l:Name)[1]', 'VARCHAR(MAX)') AS LibraryName ;WITH XMLNAMESPACES ('http://www.example.com/library' AS l, 'http://www.example.com/library/books' AS b) SELECT Books.Book.value('(./b:Id)[1]', 'INT') AS Id, Books.Book.value('(./b:Title)[1]', 'VARCHAR(MAX)') AS Title, Books.Book.value('(./b:Author)[1]', 'VARCHAR(MAX)') AS Author, Books.Book.value('(./b:PublicationDate)[1]', 'DATE') AS PublicationDate, Books.Book.value('(./b:Series/b:Name)[1]', 'VARCHAR(MAX)') AS SeriesName, Books.Book.value('(./b:Series/b:Number)[1]', 'VARCHAR(MAX)') AS SeriesNumber, Books.Book.value('@Type', 'VARCHAR(MAX)') AS BookType, Books.Book.value('(./b:Title/@Chapters)[1]', 'INT') AS Chapters FROM @x.nodes('(/l:Library/b:Books/b:Book)') AS Books(Book)
(1 row affected)
LibraryId | LibraryName |
---|---|
51 | We have Books...Read Them or Else! |
(1 row affected)
Id | Title |
---|---|
26 | The Eye of the World |
87 | The Shadow Rising |
43 | A Crown of Swords |
(3 rows affected)
Shredding Multiple XMLs
Of course, once you've managed to shred one xml, you'll want to do more. You can read the XML from files and queries, let's try an example from a temp table.
We'll populate it with the XML data we just tested into three rows.
IF OBJECT_ID('tempdb..#XmlShreddingTable') IS NOT NULL DROP TABLE #XmlShreddingTable CREATE TABLE #XmlShreddingTable ( XmlId INT, XmlData XML ) INSERT INTO #XmlShreddingTable VALUES (1, @x), (2, @x), (3, @x) SELECT * FROM #XmlShreddingTable
For the collection of books we'll cross apply those rows against the nodes to get our results for all of the record sets.
Then we insert it into other tables, files, or wherever we need to store the shredded data.
;WITH XMLNAMESPACES ('http://www.example.com/library' AS l), Records AS ( SELECT xst.XmlId, xst.XmlData FROM #XmlShreddingTable xst ) SELECT Records.XmlId, Records.XmlData.value('(/l:Library/l:Id)[1]', 'INT') AS LibraryId, Records.XmlData.value('(/l:Library/l:Name)[1]', 'VARCHAR(MAX)') AS LibraryName FROM Records
XmlId | LibraryId | LibraryName |
---|---|---|
1 | 51 | We have Books...Read Them or Else! |
2 | 51 | We have Books...Read Them or Else! |
3 | 51 | We have Books...Read Them or Else! |
(3 rows affected)
;WITH XMLNAMESPACES ('http://www.example.com/library' AS l, 'http://www.example.com/library/books' AS b), Records AS ( SELECT xst.XmlId, xst.XmlData FROM #XmlShreddingTable xst ) SELECT Records.XmlId, Books.Book.value('(./b:Id)[1]', 'INT') AS Id, Books.Book.value('(./b:Title)[1]', 'VARCHAR(MAX)') AS Title, Books.Book.value('(./b:Author)[1]', 'VARCHAR(MAX)') AS Author, Books.Book.value('(./b:PublicationDate)[1]', 'DATE') AS PublicationDate, Books.Book.value('(./b:Series/b:Name)[1]', 'VARCHAR(MAX)') AS SeriesName, Books.Book.value('(./b:Series/b:Number)[1]', 'VARCHAR(MAX)') AS SeriesNumber, Books.Book.value('@Type', 'VARCHAR(MAX)') AS BookType, Books.Book.value('(./b:Title/@Chapters)[1]', 'INT') AS Chapters FROM Records CROSS APPLY Records.XmlData.nodes('(/l:Library/b:Books/b:Book)') AS Books(Book)
XmlId | LibraryId | LibraryName |
---|---|---|
1 | 26 | The Eye of the World |
1 | 87 | The Shadow Rising |
1 | 43 | A Crown of Swords |
2 | 26 | The Eye of the World |
2 | 87 | The Shadow Rising |
2 | 43 | A Crown of Swords |
3 | 26 | The Eye of the World |
3 | 87 | The Shadow Rising |
3 | 43 | A Crown of Swords |
(9 records affected)
Instead of using a Common Table Expression (CTE) you could move the select statement into the FROM statement as an inline query as follows.
;WITH XMLNAMESPACES ('http://www.example.com/library' AS l) SELECT Records.XmlId, Records.XmlData.value('(/l:Library/l:Id)[1]', 'INT') AS LibraryId, Records.XmlData.value('(/l:Library/l:Name)[1]', 'VARCHAR(MAX)') AS LibraryName FROM (SELECT xst.XmlId, xst.XmlData FROM #XmlShreddingTable xst) AS Records
XmlId | LibraryId | LibraryName |
---|---|---|
1 | 51 | We have Books...Read Them or Else! |
2 | 51 | We have Books...Read Them or Else! |
3 | 51 | We have Books...Read Them or Else! |
(3 records affected)
;WITH XMLNAMESPACES ('http://www.example.com/library' AS l, 'http://www.example.com/library/books' AS b) SELECT Records.XmlId, Books.Book.value('(./b:Id)[1]', 'INT') AS Id, Books.Book.value('(./b:Title)[1]', 'VARCHAR(MAX)') AS Title, Books.Book.value('(./b:Author)[1]', 'VARCHAR(MAX)') AS Author, Books.Book.value('(./b:PublicationDate)[1]', 'DATE') AS PublicationDate, Books.Book.value('(./b:Series/b:Name)[1]', 'VARCHAR(MAX)') AS SeriesName, Books.Book.value('(./b:Series/b:Number)[1]', 'VARCHAR(MAX)') AS SeriesNumber, Books.Book.value('@Type', 'VARCHAR(MAX)') AS BookType, Books.Book.value('(./b:Title/@Chapters)[1]', 'INT') AS Chapters FROM (SELECT xst.XmlId, xst.XmlData FROM #XmlShreddingTable xst) AS Records CROSS APPLY Records.XmlData.nodes('(/l:Library/b:Books/b:Book)') AS Books(Book)
XmlId | LibraryId | LibraryName |
---|---|---|
1 | 26 | The Eye of the World |
1 | 87 | The Shadow Rising |
1 | 43 | A Crown of Swords |
2 | 26 | The Eye of the World |
2 | 87 | The Shadow Rising |
2 | 43 | A Crown of Swords |
3 | 26 | The Eye of the World |
3 | 87 | The Shadow Rising |
3 | 43 | A Crown of Swords |
(9 records affected)
Other Considerations
There are plenty of ways to perform these steps some more efficient than others.
Other options could be to create a while loop or create cursors to call stored procedures that perform XML shredding and inserts each to a single table. This would allow you to separate the concerns and follow a command pattern with a master stored procedure that calls all of the others.
If you are creating this data for reports, you may want to consider performing a bulk load of all the data from the XML shredding.
After that you may want to have the stored procedure performing this task accept a date, date range, or flag so you only shred the XML for those records not yet processed.
References
- XQuery Language Reference - There are many ways to shred XML.
- XPath Tutorial - Syntax
- ;WITH XMLNAMESPACES - Taught me how to use them.
- How to Shred a Single Value - This reference as well.
- How to shred collections of nodes
- How to shred attributes
- How to perform a CROSS APPLY
- Common Table Expression (CTE) Overview
- Performant XML Shredding - takes the basics of what was taught here and made it efficient when you need to perform joins using CROSS APPLY and OUTER APPLY.
- These Books are Awesome
Conclusion
Now that we've extracted the data from the XML, you can save the data to tables in your database, maybe store them in TEMP tables to generate a report.
There are plenty of options. This is just the beginning.
The code is located at my GitHub repo under the Tips.XmlShredding folder.
How do you shred your XML? Is there an easier way? Do you know of a way to make this perform faster and more efficiently?