Stop Your Java SAX Parser from Downloading DTDs
Posted by: Stuart in Programming, tags: DTD, Java, SAX, XML

Back in February, in a slightly plaintive post, the W3 sysadmins asked that people stop hammering their servers with requests for XHTML DTDs. Everyone said yes, this is a stupid problem that wouldn’t have happened if a) the XML spec were less dumb, or b) XML libraries were less dumb.
After that post, I spent two whole days fighting with XML catalogs — possibly the worst-documented XML spec ever — to make sure my Java code wasn’t downloading a DTD every time it read an XHTML document.
To my annoyance, no one seems to have posted any cut-and-paste solutions to this problem. Setting properties on the SAX parser is no help, and the XML catalogs solution is a pain to set up.
So what if someone wrote a “dummy” XML entity resolver that does nothing? Here’s what I came up with:
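import java.io.StringReader;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class DummyEntityResolver implements EntityResolver {
    public InputSource resolveEntity(String publicID, String systemID)
            throws SAXException {
        // An empty stream stands in for the DTD, so nothing is fetched.
        return new InputSource(new StringReader(""));
    }
}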
Lo and behold, it works! The key is the return line — if you return null, the SAX parser reverts to its default behavior and downloads the DTD.
Use it like this:
XMLReader reader = XMLReaderFactory.createXMLReader();
reader.setEntityResolver(new DummyEntityResolver());
reader.setContentHandler(new YourContentHandler());
reader.parse(your_xml_source);
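For anyone who wants something to paste and run, here is a rough end-to-end sketch of the same idea. A few things in it are my own assumptions rather than part of the snippets above: the class name DummyResolverDemo, the nested EmptyResolver (a copy of the resolver idea), and ElementCounter, a hypothetical stand-in for YourContentHandler that just counts start tags. It also gets its XMLReader from SAXParserFactory instead of XMLReaderFactory, since the latter is deprecated in newer JDKs; the resolver is attached in exactly the same way.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class DummyResolverDemo {

    /** Answers every entity lookup with an empty stream, so no DTD is fetched. */
    static class EmptyResolver implements EntityResolver {
        @Override
        public InputSource resolveEntity(String publicId, String systemId) {
            return new InputSource(new StringReader(""));
        }
    }

    /** Hypothetical stand-in for YourContentHandler: counts element start tags. */
    static class ElementCounter extends DefaultHandler {
        int count = 0;

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes atts) {
            count++;
        }
    }

    /** Parses the given XML text with the empty resolver and returns the element count. */
    static int countElements(String xml) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        ElementCounter counter = new ElementCounter();
        reader.setEntityResolver(new EmptyResolver());
        reader.setContentHandler(counter);
        reader.parse(new InputSource(new StringReader(xml)));
        return counter.count;
    }

    public static void main(String[] args) throws Exception {
        // The DOCTYPE names the real XHTML DTD, but the resolver short-circuits the lookup.
        String xhtml =
            "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\" "
            + "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">"
            + "<html xmlns=\"http://www.w3.org/1999/xhtml\">"
            + "<head><title>t</title></head><body><p>hi</p></body></html>";
        System.out.println(countElements(xhtml)); // prints 5
    }
}
```

Run it with the network unplugged if you want a quick sanity check: the parse should still finish, because the resolver never hands the parser a URL to open.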
The catch is that this will break any externally-defined entities, including standard XHTML entities like &copy;. The built-in XML entities such as &amp;, and numeric character references like &#x43;, will still work.
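Here is a small sketch of the half that keeps working. The names EntityDemo, TextCollector and textOf are made up for illustration, and the XMLReader again comes from SAXParserFactory rather than XMLReaderFactory. It parses a document whose DOCTYPE points at the XHTML DTD and collects the character data, to show that &amp; and &#x43; are expanded by the parser itself and need no DTD. I have deliberately left &copy; out of the input: what a given parser does with an undeclared entity after being handed an empty DTD is worth checking for yourself rather than taking from this sketch.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class EntityDemo {

    /** Hypothetical handler that concatenates all character data it sees. */
    static class TextCollector extends DefaultHandler {
        final StringBuilder text = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            // May be called several times per text node, so always append.
            text.append(ch, start, length);
        }
    }

    /** Parses the XML with an empty-stream resolver and returns its text content. */
    static String textOf(String xml) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        TextCollector collector = new TextCollector();
        // Same trick as DummyEntityResolver: every lookup gets an empty stream.
        EntityResolver empty =
            (publicId, systemId) -> new InputSource(new StringReader(""));
        reader.setEntityResolver(empty);
        reader.setContentHandler(collector);
        reader.parse(new InputSource(new StringReader(xml)));
        return collector.text.toString();
    }

    public static void main(String[] args) throws Exception {
        String xhtml =
            "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\" "
            + "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">"
            + "<p>A &amp; &#x43;</p>";
        System.out.println(textOf(xhtml)); // prints "A & C"
    }
}
```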
You can check that you’re not downloading any DTDs by watching the output of ngrep -q DTD while running your XML parser. If it doesn’t print anything, you’re good.

Thanks, this was just what I needed. I myself was a little baffled as to why there was not more information about this on the web, but anyway, your solution worked out really well.