<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Robust HTML parsing the Groovy way</title>
	<atom:link href="http://www.maclovin.de/2010/02/robust-html-parsing-the-groovy-way/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.maclovin.de/2010/02/robust-html-parsing-the-groovy-way/</link>
	<description>An Apple a day keeps the Windows away</description>
	<lastBuildDate>Sat, 07 Aug 2010 10:55:23 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
	<item>
		<title>By: Dennis</title>
		<link>http://www.maclovin.de/2010/02/robust-html-parsing-the-groovy-way/comment-page-1/#comment-519</link>
		<dc:creator>Dennis</dc:creator>
		<pubDate>Sat, 13 Feb 2010 12:42:57 +0000</pubDate>
		<guid isPermaLink="false">http://www.maclovin.de/?p=445#comment-519</guid>
		<description>Thanks for pointing this out. I know some other libraries aiming at the same goal, e.g. TidyHTML, but I had never heard of NekoHTML.
Looks like this one is the way to go if you would like to use a DOMParser, though I really like XmlSlurper&#039;s syntax in Groovy.

Performance is not really an issue in my projects, so I don&#039;t much care which one is faster. Reliability matters more: how close is the result to the real intention of the page?</description>
		<content:encoded><![CDATA[<p>Thanks for pointing this out. I know some other libraries aiming at the same goal, e.g. TidyHTML, but I had never heard of NekoHTML.<br />
Looks like this one is the way to go if you would like to use a DOMParser, though I really like XmlSlurper&#8217;s syntax in Groovy.</p>
<p>Performance is not really an issue in my projects, so I don&#8217;t much care which one is faster. Reliability matters more: how close is the result to the real intention of the page?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Johann</title>
		<link>http://www.maclovin.de/2010/02/robust-html-parsing-the-groovy-way/comment-page-1/#comment-518</link>
		<dc:creator>Johann</dc:creator>
		<pubDate>Fri, 12 Feb 2010 19:36:27 +0000</pubDate>
		<guid isPermaLink="false">http://www.maclovin.de/?p=445#comment-518</guid>
		<description>I&#039;ve been using &lt;a href=&quot;http://nekohtml.sourceforge.net/&quot; rel=&quot;nofollow&quot;&gt;NekoHTML&lt;/a&gt; lately. The syntax is a bit different, though, and I haven&#039;t benchmarked yet whether it&#039;s faster or slower than XmlSlurper.

FYI, it looks roughly like this in Groovy:

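&lt;code&gt;
	import org.cyberneko.html.parsers.DOMParser
	import org.w3c.dom.Document
	import org.xml.sax.InputSource
	import static groovyx.net.http.ContentType.TEXT
	import static groovyx.net.http.Method.GET

	// Fetch the raw page body as text via HTTP Builder
	String get(def uri) {
		builder.request(uri, GET, TEXT, {}).text
	}
	
	// Let NekoHTML parse the (possibly malformed) markup into a W3C DOM
	Document document(def uri) {
		DOMParser parser = new DOMParser()
		parser.parse(new InputSource(new StringReader(get(uri))))
		parser.document
	}
&lt;/code&gt;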

(builder is an &lt;a href=&quot;http://groovy.codehaus.org/modules/http-builder/&quot; rel=&quot;nofollow&quot;&gt;HTTP Builder&lt;/a&gt; instance)</description>
		<content:encoded><![CDATA[<p>I&#8217;ve been using <a href="http://nekohtml.sourceforge.net/" rel="nofollow">NekoHTML</a> lately. The syntax is a bit different, though, and I haven&#8217;t benchmarked yet whether it&#8217;s faster or slower than XmlSlurper.</p>
<p>FYI, it looks roughly like this in Groovy:</p>
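<p><code><br />
	import org.cyberneko.html.parsers.DOMParser<br />
	import org.w3c.dom.Document<br />
	import org.xml.sax.InputSource<br />
	import static groovyx.net.http.ContentType.TEXT<br />
	import static groovyx.net.http.Method.GET</p>
<p>	// Fetch the raw page body as text via HTTP Builder<br />
	String get(def uri) {<br />
		builder.request(uri, GET, TEXT, {}).text<br />
	}</p>
<p>	// Let NekoHTML parse the (possibly malformed) markup into a W3C DOM<br />
	Document document(def uri) {<br />
		DOMParser parser = new DOMParser()<br />
		parser.parse(new InputSource(new StringReader(get(uri))))<br />
		parser.document<br />
	}<br />
</code></p>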
<p>(builder is an <a href="http://groovy.codehaus.org/modules/http-builder/" rel="nofollow">HTTP Builder</a> instance)</p>
]]></content:encoded>
	</item>
</channel>
</rss>
