Parse sitemap with XmlSlurper

This example demonstrates how to parse a sitemap file with Groovy's XmlSlurper. A sitemap, typically in XML or HTML format, is a file hosted on a website that contains a list of accessible pages for crawlers or users to view. Below we make a request to leveluplunch's XML version of the sitemap and print out the elements. The highest-level node is a urlset containing one or more url elements. Each url node has the location of the page, the last modified date and a priority element. For a full listing and description of the elements, be sure to check out sitemaps.org.

Reading sitemap

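import groovy.xml.XmlSlurper // Groovy 3+; on Groovy 2.x omit this line, groovy.util.XmlSlurper is imported by default
import org.junit.Test

@Test
public void parse_sitemap_xml() {

    // Download the sitemap and read the response body as a String
    def siteMapXml = "http://www.leveluplunch.com/sitemap.xml".toURL().text

    // The parsed root is the urlset element; iterate over its url children
    def urlset = new XmlSlurper().parseText(siteMapXml)
    urlset.url.each {
        println it.loc
        println it.lastmod
        println it.priority
        println "^^^^^^^^"
    }
}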

Output

http://www.leveluplunch.com/blog/2014/10/21/solving-for-enum-inheritance-extend-mixin/
2014-10-21T11:46:18-05:00
0.8
^^^^^^^^
http://www.leveluplunch.com/blog/2014/10/09/why-agile-could-fail-in-large-enterprise/
2014-10-09T11:46:18-05:00
0.8
^^^^^^^^
http://www.leveluplunch.com/blog/2014/09/29/amazon-cloudfront-s3-subfolders-default-index/
2014-09-29T11:46:18-05:00
0.8
^^^^^^^^
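
Collecting typed values

The snippet above prints each node's raw text. If the entries need to be sorted or filtered, it can help to collect them into maps with typed values first. What follows is a minimal sketch, not part of the original example: it assumes Groovy 3 or later for the import, and it parses a hypothetical inline sample (sampleXml, mirroring the first two entries of the output above) so it runs without a network call. The names sampleXml and pages are made up for illustration.

```groovy
import groovy.xml.XmlSlurper      // assumes Groovy 3+; on Groovy 2.x remove this import
import java.time.OffsetDateTime

// Hypothetical inline sample, mirroring the first two entries printed above
def sampleXml = '''<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.leveluplunch.com/blog/2014/10/21/solving-for-enum-inheritance-extend-mixin/</loc>
    <lastmod>2014-10-21T11:46:18-05:00</lastmod>
    <priority>0.8</priority>
  </url>
  <url>
    <loc>http://www.leveluplunch.com/blog/2014/10/09/why-agile-could-fail-in-large-enterprise/</loc>
    <lastmod>2014-10-09T11:46:18-05:00</lastmod>
    <priority>0.8</priority>
  </url>
</urlset>'''

def urlset = new XmlSlurper().parseText(sampleXml)

// Turn each url node into a map with a String, a date-time and a number
def pages = urlset.url.collect { node ->
    [
        loc     : node.loc.text(),
        lastmod : OffsetDateTime.parse(node.lastmod.text()),
        priority: node.priority.text().toBigDecimal()
    ]
}

pages.each { println "${it.loc} modified ${it.lastmod.toLocalDate()} priority ${it.priority}" }
```

Parsing lastmod with OffsetDateTime works here because the sample timestamps include a time and a UTC offset; a sitemap that uses date-only values would need a different parser.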