Stop Your Java SAX Parser from Downloading DTDs

Back in February, in a slightly plaintive post, the W3C sysadmins asked that people stop hammering their servers with requests for XHTML DTDs. Everyone said yes, this is a stupid problem that wouldn’t have happened if a) the XML spec were less dumb, or b) XML libraries were less dumb.

After that post, I spent two whole days fighting with XML catalogs — possibly the worst-documented XML spec ever — to make sure my Java code wasn’t downloading a DTD every time it read an XHTML document.

To my annoyance, no one seems to have posted any cut-and-paste solutions to this problem. Setting properties on the SAX parser is no help, and the XML catalogs solution is a pain to set up.

So what if someone wrote a “dummy” entity resolver that skips the DTD by handing the parser an empty document instead? Here’s what I came up with:

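import java.io.StringReader;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Resolves every external entity (including DTDs) to an empty document,
// so the parser never has to fetch anything itself.
public class DummyEntityResolver implements EntityResolver {
    public InputSource resolveEntity(String publicID, String systemID)
        throws SAXException {
        return new InputSource(new StringReader(""));
    }
}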

Lo and behold, it works! The key is the return line: an empty InputSource gives the parser an empty DTD to read. If you return null instead, the SAX parser falls back to its default behavior and downloads the DTD.

Use it like this:

XMLReader reader = XMLReaderFactory.createXMLReader();
reader.setEntityResolver(new DummyEntityResolver());
reader.setContentHandler(new YourContentHandler());
reader.parse(your_xml_source);
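Here is a self-contained sketch of the same wiring that you can run as-is. A few assumptions of mine, not from the snippet above: it builds the XMLReader through SAXParserFactory, since XMLReaderFactory has been deprecated since Java 9; it inlines the dummy resolver as a lambda; and ElementCounter and countElements are hypothetical names standing in for YourContentHandler. The test document carries the real XHTML 1.0 Strict DOCTYPE, so without the resolver the parser would try to fetch the DTD from w3.org.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class OfflineXhtmlParse {

    // Hypothetical stand-in for YourContentHandler: counts start tags.
    static class ElementCounter extends DefaultHandler {
        int count = 0;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            count++;
        }
    }

    static int countElements(String xml) throws Exception {
        // SAXParserFactory instead of the deprecated XMLReaderFactory.
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        // Same behavior as DummyEntityResolver: every external entity is empty.
        reader.setEntityResolver((publicId, systemId) -> new InputSource(new StringReader("")));
        ElementCounter counter = new ElementCounter();
        reader.setContentHandler(counter);
        reader.parse(new InputSource(new StringReader(xml)));
        return counter.count;
    }

    public static void main(String[] args) throws Exception {
        String xhtml =
            "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\" "
            + "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">"
            + "<html xmlns=\"http://www.w3.org/1999/xhtml\"><head><title>t</title></head>"
            + "<body><p>Hello</p></body></html>";
        System.out.println(countElements(xhtml)); // prints 5
    }
}
```

Setting the resolver on the XMLReader directly, rather than going through SAXParser.parse with a handler, keeps it from being replaced later.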

The catch is that this will break any externally defined entities, including standard XHTML entities like &copy;. The built-in XML entities such as &amp;, and numeric character references like &#x43;, will still work.
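To see which references survive, the sketch below (BuiltInEntitiesCheck and textOf are hypothetical names) parses a paragraph that uses only predefined entities and numeric character references, behind the same empty-DTD resolver, and collects the resulting text.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class BuiltInEntitiesCheck {

    static String textOf(String xml) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        // Same behavior as DummyEntityResolver: the external DTD becomes an empty document.
        reader.setEntityResolver((publicId, systemId) -> new InputSource(new StringReader("")));
        StringBuilder text = new StringBuilder();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                // The parser may split text into several chunks, so accumulate them.
                text.append(ch, start, length);
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        String xhtml =
            "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\" "
            + "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">"
            + "<html><p>&amp; &#x43; &#67; &lt;</p></html>";
        // Predefined entities and numeric character references need no DTD.
        System.out.println(textOf(xhtml)); // prints "& C C <"
    }
}
```

I left &copy; out on purpose. What a parser does with an undeclared entity once the DTD has been blanked out varies, so check your own parser before relying on it.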

You can check that you’re not downloading any DTDs by watching the output of ngrep -q DTD while running your XML parser. If it doesn’t print anything, you’re good.
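If you’d rather check from inside the JVM than with ngrep, a resolver can record what it was asked for. This is a minimal sketch, assuming you only want to see the requests. RecordingEntityResolver and requestedBy are my own hypothetical names, not part of any library. Every system ID it records is an entity the parser would otherwise have loaded on its own.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

public class RecordingEntityResolver implements EntityResolver {
    // System IDs of every external entity the parser asked to resolve.
    final List<String> requested = new ArrayList<>();

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        requested.add(systemId);
        // Still hand back an empty document, so nothing is fetched.
        return new InputSource(new StringReader(""));
    }

    static List<String> requestedBy(String xml) throws Exception {
        RecordingEntityResolver resolver = new RecordingEntityResolver();
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setEntityResolver(resolver);
        reader.parse(new InputSource(new StringReader(xml)));
        return resolver.requested;
    }

    public static void main(String[] args) throws Exception {
        String xhtml =
            "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\" "
            + "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">"
            + "<html/>";
        System.out.println(requestedBy(xhtml));
    }
}
```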

5 thoughts on “Stop Your Java SAX Parser from Downloading DTDs”

  1. Thanks, this was just what I needed. I was a little baffled as to why there wasn’t more information about this on the web, but your solution worked out really well.

  2. Awesome.

    I had the exact same problem. Your solution worked.

    I must point out, however, that while your solution works perfectly, the following, which seems like it should work too, does not:

    SAXParser saxParser = factory.newSAXParser();
    saxParser.getXMLReader().setEntityResolver(new DummyEntityResolver());
    saxParser.parse(new InputSource(conn.getInputStream()), new YourHandler());

  3. Great!

    That’s my solution too!

    And if you still need the DTDs to be read, you can download one and save it as a file, so your program doesn’t need an internet connection to run. See below:

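    import java.io.FileInputStream;
    import java.io.FileNotFoundException;
    import org.xml.sax.EntityResolver;
    import org.xml.sax.InputSource;
    import org.xml.sax.SAXException;

    // Serves a locally saved copy of the DTD instead of fetching it.
    // Note: every external entity gets this same file, whatever its systemID.
    public class DummyEntityResolver implements EntityResolver {
        public InputSource resolveEntity(String publicID, String systemID) throws SAXException {
            try {
                return new InputSource(new FileInputStream("temp/PropertyList-1.0.dtd"));
            } catch (FileNotFoundException e) {
                e.printStackTrace();
                // Returning null makes the parser fall back to downloading the DTD.
                return null;
            }
        }
    }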
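Comments 2 and 3 fit together. In the JDK’s implementation, SAXParser.parse(InputSource, DefaultHandler) registers the DefaultHandler as the reader’s EntityResolver (along with its content, error, and DTD handlers). That replaces any resolver set beforehand, and the inherited DefaultHandler.resolveEntity returns null, so the DTD gets downloaded after all. The sketch below puts the resolution logic in the handler itself and maps each system ID to a saved local file instead of one hardcoded path. LocalDtdHandler, parse, and the Map-based lookup are my own hypothetical design, and the temp file in main stands in for a DTD you saved beforehand.

```java
import java.io.IOException;
import java.io.StringReader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class LocalDtdHandler extends DefaultHandler {
    private final Map<String, Path> localCopies; // system ID -> saved DTD file
    final StringBuilder text = new StringBuilder();

    LocalDtdHandler(Map<String, Path> localCopies) {
        this.localCopies = localCopies;
    }

    // Because parse(InputSource, DefaultHandler) installs this handler as the
    // EntityResolver, resolution has to live here, not on getXMLReader().
    @Override
    public InputSource resolveEntity(String publicId, String systemId) throws IOException {
        Path local = localCopies.get(systemId);
        if (local == null) {
            // Unknown entity: resolve to nothing rather than let the parser download it.
            return new InputSource(new StringReader(""));
        }
        InputSource source = new InputSource(Files.newInputStream(local));
        source.setSystemId(local.toUri().toString());
        return source;
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    static String parse(String xml, Map<String, Path> localCopies) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        LocalDtdHandler handler = new LocalDtdHandler(localCopies);
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.text.toString();
    }

    public static void main(String[] args) throws Exception {
        String dtdUrl = "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd";
        // Stand-in for a DTD saved to disk earlier; it declares only &copy;.
        Path dtd = Files.createTempFile("local", ".dtd");
        Files.writeString(dtd, "<!ENTITY copy \"&#169;\">");
        String xhtml = "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\" \""
            + dtdUrl + "\"><html><p>&copy;</p></html>";
        System.out.println(parse(xhtml, Map.of(dtdUrl, dtd)));
    }
}
```

Serving a real local copy also brings back named entities like &copy;, which the empty-DTD trick gives up.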
