18 Comments

  • Justin November 30, 2005 @ 7:50 pm

    Tag soup is not always valid SGML, though, which would preclude a DTD.

  • Matt November 30, 2005 @ 8:00 pm

    True.

  • Mike Purvis November 30, 2005 @ 8:15 pm

    So… I was thinking I’d find some of the most snarled, twisted markup I could, and then print it off and use it as wrapping paper this Christmas.

    But then I couldn’t find any really decent candidates. Well, okay, a few, but I mostly try to remember the good sites.

  • Dorothea November 30, 2005 @ 8:27 pm

    Might Trang come close to what you want?

  • Eric November 30, 2005 @ 9:21 pm

    I’d love to see that come into being just to watch the XML snobs writhe on their own petards.

    (Mike: try the markup for excite.com. It’s downright frightening, especially if you run it through a markup whitespace optimizer.)

  • ryan king December 1, 2005 @ 2:06 am

    Tag soup is not always well formed, which would also preclude any chance of validation.

  • David Dorward December 1, 2005 @ 3:49 am

    You don’t need to be well formed to be valid HTML. Well-formedness is a requirement of XML.
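    A small illustration of that distinction, as a sketch using only Python's standard library: in HTML 4 the </p> end tag is optional, so a fragment that leaves paragraphs unclosed is acceptable HTML but is rejected by an XML parser. The helper names below (is_well_formed_xml, StartTagCollector) are made up for this example, and html.parser only tokenizes markup; it does not validate anything against the HTML 4 DTD (that takes an SGML-aware validator).

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


def is_well_formed_xml(markup: str) -> bool:
    """Return True if an XML parser accepts the markup (hypothetical helper)."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


class StartTagCollector(HTMLParser):
    """Records start tags; HTMLParser tokenizes HTML but does not validate it."""

    def __init__(self) -> None:
        super().__init__()
        self.tags: list[str] = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


# HTML 4 lets the </p> end tags be implied, so this fragment is allowed
# in HTML, but it is not well-formed XML: the <p> elements are never closed.
soup = "<div><p>one<p>two</div>"
tidy = "<div><p>one</p><p>two</p></div>"

collector = StartTagCollector()
collector.feed(soup)
collector.close()

print(is_well_formed_xml(soup))   # False (mismatched tag)
print(is_well_formed_xml(tidy))   # True
print(collector.tags)             # ['div', 'p', 'p']
```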

  • Michael Cutler December 1, 2005 @ 8:16 am

    Well, ignoring commercial software like XMLSpy, there is XMLBuddy for Eclipse. However, writing a PHP-based DTD/Schema generator shouldn’t be that difficult; maybe even as a WordPress plugin. ;)

  • Erundur December 1, 2005 @ 8:59 am

    We need merely imbue the DTD generator with intelligence and it will remake the tag soup in its own image.

  • Dougal Campbell December 1, 2005 @ 9:15 am

    Ryan, surely you don’t mean well formed in the XML sense? HTML 4 isn’t XML, but it’s got a DTD that you can validate against.

    Maybe you just meant that some tag soup isn’t consistent enough to reverse-engineer a DTD? But I can’t think of anything that would preclude it. DTDs allow for all sorts of element nesting (or not), optional elements, optional attributes…. Surely it would be possible to write a DTD for any kind of tag soup that we could invent. The hard part would be automating it.
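    To make the "automating it" part concrete, here is a minimal sketch of DTD inference in Python. It assumes the input has already been turned into well-formed XML (real tag soup would need a cleanup pass first), ignores namespaces, and is not how Trang or any other named tool works. The function name infer_dtd is hypothetical. Each element gets the loosest content model consistent with what was observed: any of the children seen under it, in any order, any number of times, plus #PCDATA if text appeared. Every attribute seen is declared optional CDATA.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def infer_dtd(xml_text: str) -> str:
    """Infer a permissive DTD from one well-formed XML document (a sketch)."""
    root = ET.fromstring(xml_text)
    children = defaultdict(set)   # element name -> child element names seen
    has_text = defaultdict(bool)  # element name -> non-whitespace text seen
    attrs = defaultdict(set)      # element name -> attribute names seen
    order, seen = [], set()       # element names in first-seen document order

    for elem in root.iter():      # depth-first, document order
        if elem.tag not in seen:
            seen.add(elem.tag)
            order.append(elem.tag)
        attrs[elem.tag].update(elem.attrib)
        if elem.text and elem.text.strip():
            has_text[elem.tag] = True
        for child in elem:
            children[elem.tag].add(child.tag)
            # Text after a child (its "tail") belongs to the parent element.
            if child.tail and child.tail.strip():
                has_text[elem.tag] = True

    lines = []
    for name in order:
        kids = sorted(children[name])
        if has_text[name] and kids:
            model = "(#PCDATA | " + " | ".join(kids) + ")*"  # mixed content
        elif has_text[name]:
            model = "(#PCDATA)"
        elif kids:
            model = "(" + " | ".join(kids) + ")*"
        else:
            model = "EMPTY"
        lines.append(f"<!ELEMENT {name} {model}>")
        if attrs[name]:
            decls = " ".join(f"{a} CDATA #IMPLIED" for a in sorted(attrs[name]))
            lines.append(f"<!ATTLIST {name} {decls}>")
    return "\n".join(lines)


doc = "<page><title>Hi</title><p class='x'>one <b>two</b></p><hr/></page>"
print(infer_dtd(doc))
```

    For the sample document this prints declarations for page, title, p (mixed content plus an optional class attribute), b, and hr (EMPTY). As carsten notes further down, a DTD inferred this way is only as complete as the documents fed to it, and the loose `(a | b)*` models deliberately give up any ordering or required-element constraints.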

  • Ara Pehlivanian December 1, 2005 @ 9:17 am

    Matt: Are you talking about generating a DTD or just figuring out which DOCTYPE declaration needs to be placed at the top of the page based on its contents? I mean, really, it’s the page builder’s responsibility to choose a DTD and conform to it. Failing that, the page will not validate and that’s just a sign of something that needs fixing. Either creating a custom DTD to validate the garbage in a page or picking one that best fits the contents of a page takes the responsibility away from the designer and results in all kinds of unpredictable junk. Or am I misunderstanding you?

    Eric: What do you have against XML snobs? ;) And yes, I had to eat humble pie when I switched over to application/xhtml+xml. It’s one thing to talk the talk, it’s entirely another to walk it. Touché.

  • Firas December 1, 2005 @ 6:00 pm

    Guys… stop geeking out about well formedness, it’s more of a wry commentary than a sincere request, I’d think!

  • ryan king December 2, 2005 @ 1:06 am

    @dougal -

    I meant it in the technical sense. HTML well-formedness isn’t the same as XML well-formedness, but there are still some basic rules that must be followed (like closing quotes and such).

    -ryan

  • Jeff Wheeler December 3, 2005 @ 12:49 am

    I wrote a little short script that’ll do something like this for you. I wrote about it on my own blog: http://nokrev.com/older/custom-dtds-on-the-fly/ (I also pinged this site).

  • carsten December 3, 2005 @ 2:40 pm

    @Matt: what you want seems to me like something I once did like 7 years ago in perl, when XML was just about to be born. worked with SGML as well, and with long enough documents produced a slimmer (and necessarily incomplete) version of the DTD used to produce the document from. we used it as an extra possible test for our ‘document instances’ as we called them. feels like ages ago..

  • David Russell December 3, 2005 @ 2:42 pm

    People love valid code but apparently WordPress doesn’t, preferring to write invalid code (at least, it doesn’t generate the required result when it is rendered correctly) just to accommodate Microsoft’s crappy non-implementation of the 10-year-old CSS standard.

  • Jay Fienberg December 6, 2005 @ 12:04 am

    IBM used to have an alphaworks tool (c. 2000?) that took an XML document as input and outputted a DTD for the document. I don’t see it online now, though.
