Archive for May 8th, 2008

Back in February, in a slightly plaintive post, the W3C sysadmins asked that people stop hammering their servers with requests for XHTML DTDs. Everyone said yes, this is a stupid problem that wouldn’t have happened if a) the XML spec were less dumb, or b) XML libraries were less dumb.

After that post, I spent two whole days fighting with XML catalogs — possibly the worst-documented XML spec ever — to make sure my Java code wasn’t downloading a DTD every time it read an XHTML document.

To my annoyance, no one seems to have posted any cut-and-paste solutions to this problem. Setting properties on the SAX parser is no help, and the XML catalogs solution is a pain to set up.

So what if someone wrote a “dummy” XML entity resolver that does nothing? Here’s what I came up with:

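import java.io.StringReader;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class DummyEntityResolver implements EntityResolver {
    public InputSource resolveEntity(String publicID, String systemID)
        throws SAXException {

        // Return an empty source, not null: null triggers the default download.
        return new InputSource(new StringReader(""));
    }
}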

Lo and behold, it works! The key is the return line — if you return null, the SAX parser reverts to its default behavior and downloads the DTD.

Use it like this:

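// Needs org.xml.sax.XMLReader and org.xml.sax.helpers.XMLReaderFactory
XMLReader reader = XMLReaderFactory.createXMLReader();
reader.setEntityResolver(new DummyEntityResolver());
reader.setContentHandler(new YourContentHandler());  // your own handler class
reader.parse(your_xml_source);  // an InputSource or a system ID string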

The catch is that this will break any externally defined entities, including standard XHTML entities like &copy;. The built-in XML entities such as &amp;, and numeric character references like &#x43;, will still work.
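If a document does rely on a named entity such as &copy;, one possible workaround is a resolver that hands back a small local DTD instead of an empty one. The following is only a sketch under assumptions: the class name LocalEntityResolver, the parseText helper, and the one-line TINY_DTD are hypothetical, made up for illustration. A real setup would more likely load complete local copies of the XHTML DTD and its entity files.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class LocalEntityResolver implements EntityResolver {
    // Hypothetical stand-in for a locally stored DTD; it declares only &copy;.
    static final String TINY_DTD = "<!ENTITY copy \"&#169;\">";

    // Maps a DOCTYPE public ID to the DTD text served in place of a download.
    private final Map<String, String> dtdsByPublicId = new HashMap<>();

    LocalEntityResolver() {
        dtdsByPublicId.put("-//W3C//DTD XHTML 1.0 Strict//EN", TINY_DTD);
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        String dtd = dtdsByPublicId.get(publicId);
        // Unknown IDs get an empty source rather than null, so the parser
        // never falls back to fetching the system ID itself.
        return new InputSource(new StringReader(dtd == null ? "" : dtd));
    }

    // Illustrative helper: parses the XML string and returns its character data.
    static String parseText(String xml) {
        StringBuilder text = new StringBuilder();
        try {
            XMLReader reader =
                SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setEntityResolver(new LocalEntityResolver());
            reader.setContentHandler(new DefaultHandler() {
                @Override
                public void characters(char[] ch, int start, int length) {
                    text.append(ch, start, length);
                }
            });
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Parse failed", e);
        }
        return text.toString();
    }
}
```

Any public ID that is not in the map still resolves to an empty source, so the behavior for unknown DTDs stays the same as with the dummy resolver.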

You can check that you’re not downloading any DTDs by watching the output of ngrep -q DTD while running your XML parser. If it doesn’t print anything, you’re good.


Paul Johnson, in the U.K., wrote a piece about how there is no known “process” for programming. At some point, all the theory and methodology go out the window and someone has to sit down, think about the problem, and write some code.

I’m sure I won’t be the only one to suggest this, but I like to think of programming as analogous to writing prose.  You have an idea, a concept, something nebulous in your head, and you have to express it in words.  A good program has both structure and flow, just as good writing does.

In one sense, the languages we program in are far less expressive than any human language, but seen in a mathematical light they are more expressive.  The code for, say, Euclid’s algorithm is much shorter than the English description of what it does, no matter how verbose your statically-typed object-oriented programming language may be.
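For a sense of scale, here is one way Euclid’s algorithm might look in Java. It is a minimal sketch, and the class and method names (Euclid, gcd) are chosen here purely for illustration:

```java
class Euclid {
    // Greatest common divisor of two non-negative integers.
    static int gcd(int a, int b) {
        // Replace (a, b) with (b, a mod b) until the remainder is zero.
        while (b != 0) {
            int remainder = a % b;
            a = b;
            b = remainder;
        }
        return a;
    }
}
```

Spelling out the same procedure in English takes at least a paragraph, even with the class boilerplate counted against the code.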
