DTD Magic

I want someone to write a DTD generator that looks at a given URL or referring page and creates a DTD based on whatever XML or HTML tag soup it finds. People crave validation.
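A naive version of the idea can be sketched in a few lines. This is only a rough illustration under some loud assumptions: it uses Python's standard `html.parser`, takes the markup as a string rather than fetching a URL, and the names `SoupObserver` and `infer_dtd` are made up for this example. It records which children and attributes it actually sees for each element and declares everything as a repeatable choice with optional attributes, so it describes the soup it was fed rather than any sensible document type.

```python
from collections import defaultdict
from html.parser import HTMLParser

# HTML void elements never get an end tag, so they must not stay on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class SoupObserver(HTMLParser):
    """Hypothetical helper: records the children, attributes and text seen per element."""

    def __init__(self):
        super().__init__()
        self.stack = []                   # currently open elements
        self.children = defaultdict(set)  # element -> child element names seen
        self.attrs = defaultdict(set)     # element -> attribute names seen
        self.has_text = set()             # elements that directly contained text

    def handle_starttag(self, tag, attrs):
        self.children[tag]  # register the element even if it never has children
        if self.stack:
            self.children[self.stack[-1]].add(tag)
        self.attrs[tag].update(name for name, _ in attrs)
        if tag not in VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Ignore stray end tags; otherwise close everything up to the nearest match.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        if self.stack and data.strip():
            self.has_text.add(self.stack[-1])


def infer_dtd(markup):
    """Return DTD declarations that permit exactly what was observed in markup."""
    observer = SoupObserver()
    observer.feed(markup)
    observer.close()
    lines = []
    for tag in sorted(observer.children):
        kids = sorted(observer.children[tag])
        if tag in observer.has_text:
            model = "(" + " | ".join(["#PCDATA"] + kids) + ")*"  # mixed content
        elif kids:
            model = "(" + " | ".join(kids) + ")*"
        else:
            model = "EMPTY"
        lines.append(f"<!ELEMENT {tag} {model}>")
        for attr in sorted(observer.attrs[tag]):
            # Every attribute is treated as optional character data.
            lines.append(f"<!ATTLIST {tag} {attr} CDATA #IMPLIED>")
    return "\n".join(lines)


print(infer_dtd('<p class="x">Hi <b>there</b><br></p>'))
```

For that one-line input the sketch prints declarations for `b`, `br` and `p`, with `p` allowed to mix text, `b` and `br` in any order. Unclosed tags simply nest inside whatever was open at the time, and attribute names that are not legal DTD names are passed through untouched, so real tag soup would need more care than this.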

18 thoughts on “DTD Magic”

  1. So… I was thinking I’d find some of the most snarled, twisted markup I could, and then print it off and use it as wrapping paper this Christmas.

    But then I couldn’t find any really decent candidates. Well, okay, a few, but I mostly try to remember the good sites.

  2. I’d love to see that come into being just to watch the XML snobs writhe on their own petards.

    (Mike: try the markup for excite.com. It’s downright frightening, especially if you run it through a markup whitespace optimizer.)

  3. Ryan, surely you don’t mean well formed in the XML sense? HTML 4 isn’t XML, but it’s got a DTD that you can validate against.

    Maybe you just meant that some tag soup isn’t consistent enough to reverse-engineer a DTD? But I can’t think of anything that would preclude it. DTDs allow for all sorts of element nesting (or not), optional elements, optional attributes…. Surely it would be possible to write a DTD for any kind of tag soup that we could invent. The hard part would be automating it.

  4. Matt: Are you talking about generating a DTD or just figuring out which DOCTYPE declaration needs to be placed at the top of the page based on its contents? I mean, really, it’s the page builder’s responsibility to choose a DTD and conform to it. Failing that, the page will not validate and that’s just a sign of something that needs fixing. Either creating a custom DTD to validate the garbage in a page or picking one that best fits the contents of a page takes the responsibility away from the designer and results in all kinds of unpredictable junk. Or am I misunderstanding you?

    Eric: What do you have against XML snobs? 😉 And yes, I had to eat humble pie when I switched over to application/xhtml+xml. It’s one thing to talk the talk, it’s entirely another to walk it. Touché.

  5. @dougal –

    I meant it in the technical sense. HTML well-formedness isn’t the same as XML well-formedness, but there are still some basic rules that must be followed (like closing quotes and such).


  6. @Matt: what you want sounds like something I did about 7 years ago in Perl, when XML was just about to be born. It worked with SGML as well, and given long enough documents it produced a slimmer (and necessarily incomplete) version of the DTD the documents had been written against. We used it as an extra possible test for our ‘document instances’, as we called them. Feels like ages ago..

  7. People love valid code but apparently WordPress doesn’t, preferring to write invalid code (at least, it doesn’t generate the required result when it is rendered correctly) just to accommodate Microsoft’s crappy non-implementation of the 10-year-old CSS standard.