I want someone to write a DTD generator that looks at a given URL or referring page and creates a DTD based on whatever XML or HTML tag soup it finds. People crave validation.
Tag soup is not always valid SGML, though, which would preclude a DTD.
True.
So… I was thinking I’d find some of the most snarled, twisted markup I could, and then print it off and use it as wrapping paper this Christmas.
But then I couldn’t find any really decent candidates. Well, okay, a few, but I mostly try to remember the good sites.
Might Trang come close to what you want?
I’d love to see that come into being just to watch the XML snobs writhe on their own petards.
(Mike: try the markup for excite.com. It’s downright frightening, especially if you run it through a markup whitespace optimizer.)
Tag soup is not always well formed, which would also preclude any chance of validation.
You don’t need to be well formed to be valid HTML. Well-formedness is a requirement of XML.
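A quick way to see the difference, as a rough sketch only: the snippet below feeds the same list markup to a strict XML parser and to Python's tolerant `html.parser` tokenizer. The helper names `is_well_formed_xml` and `html_start_tags` are made up for illustration, and `html.parser` is not a validator; it merely shows that HTML-style markup with omitted `</li>` end tags (which HTML 4 permits) can be read even though it is not well-formed XML.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


def is_well_formed_xml(text):
    """Return True if `text` parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


class _StartTagCollector(HTMLParser):
    """Records start tags; tolerates omitted end tags (it does NOT validate)."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def html_start_tags(text):
    """Return the start tags a lenient HTML tokenizer sees, in order."""
    collector = _StartTagCollector()
    collector.feed(text)
    collector.close()
    return collector.tags


# HTML 4 lets you omit </li>, so this fragment is fine as HTML but not as XML.
html_style = "<ul><li>one<li>two</ul>"
xml_style = "<ul><li>one</li><li>two</li></ul>"

print(is_well_formed_xml(html_style))  # rejected by the XML parser
print(is_well_formed_xml(xml_style))   # accepted by the XML parser
print(html_start_tags(html_style))     # the HTML tokenizer still reads it
```

Checking actual HTML validity would need an SGML-aware validator and the HTML 4 DTD, which this sketch does not attempt.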
Well, ignoring commercial software like XMLSpy, there is XMLBuddy for Eclipse. However, writing a PHP-based DTD/Schema generator shouldn’t be that difficult; maybe even as a WordPress plugin. 😉
We need merely imbue the DTD generator with intelligence and it will remake the tag soup in its own image.
Ryan, surely you don’t mean well formed in the XML sense? HTML 4 isn’t XML, but it’s got a DTD that you can validate against.
Maybe you just meant that some tag soup isn’t consistent enough to reverse-engineer a DTD? But I can’t think of anything that would preclude it. DTDs allow for all sorts of element nesting (or not), optional elements, optional attributes…. Surely it would be possible to write a DTD for any kind of tag soup that we could invent. The hard part would be automating it.
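For what it's worth, the easy half of the automation (well-formed XML in, loose DTD out) might look something like the sketch below. It is only an illustration under several assumptions: the function name `infer_dtd` is hypothetical, the input must already be well-formed XML (real tag soup would need cleaning up first), every attribute is declared as optional `CDATA`, child order and cardinality are ignored, and namespaces are not handled.

```python
import xml.etree.ElementTree as ET


def infer_dtd(xml_text):
    """Infer a permissive DTD from a single well-formed XML document."""
    root = ET.fromstring(xml_text)
    children = {}  # element name -> set of child element names seen
    attrs = {}     # element name -> set of attribute names seen
    has_text = {}  # element name -> True if non-whitespace text was seen

    for elem in root.iter():
        name = elem.tag
        children.setdefault(name, set())
        attrs.setdefault(name, set()).update(elem.attrib)
        has_text.setdefault(name, False)
        if elem.text and elem.text.strip():
            has_text[name] = True
        for child in elem:
            children[name].add(child.tag)
            # Text following a child belongs to the parent (mixed content).
            if child.tail and child.tail.strip():
                has_text[name] = True

    lines = []
    for name in children:  # dicts preserve first-seen (document) order
        kids = sorted(children[name])
        if has_text[name]:
            # Mixed content: (#PCDATA) alone, or (#PCDATA | a | b)* with children.
            model = "(" + " | ".join(["#PCDATA"] + kids) + ")" + ("*" if kids else "")
        elif kids:
            model = "(" + " | ".join(kids) + ")*"
        else:
            model = "EMPTY"
        lines.append(f"<!ELEMENT {name} {model}>")
        for attr in sorted(attrs[name]):
            lines.append(f"<!ATTLIST {name} {attr} CDATA #IMPLIED>")
    return "\n".join(lines)


sample = '<post id="1"><title>Hi</title><body>Some <em>text</em><br/></body></post>'
print(infer_dtd(sample))
```

Because it generalizes from one sample, the result only describes what that document happened to contain; an element that was empty in the sample gets declared `EMPTY` even if other pages put text in it. Merging observations from many documents would be the obvious next step.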
Matt: Are you talking about generating a DTD or just figuring out which DOCTYPE declaration needs to be placed at the top of the page based on its contents? I mean, really, it’s the page builder’s responsibility to choose a DTD and conform to it. Failing that, the page will not validate and that’s just a sign of something that needs fixing. Either creating a custom DTD to validate the garbage in a page or picking one that best fits the contents of a page takes the responsibility away from the designer and results in all kinds of unpredictable junk. Or am I misunderstanding you?
Eric: What do you have against XML snobs? 😉 And yes, I had to eat humble pie when I switched over to application/xhtml+xml. It’s one thing to talk the talk, it’s entirely another to walk it. Touché.
Guys… stop geeking out about well formedness, it’s more of a wry commentary than a sincere request, I’d think!
@dougal –
I meant it in the technical sense. HTML well-formedness isn’t the same as XML well-formedness, but there are still some basic rules that must be followed (like closing quotes and such).
-ryan
I wrote a little short script that’ll do something like this for you. I wrote about it on my own blog: http://nokrev.com/older/custom-dtds-on-the-fly/ (I also pinged this site).
@Matt: What you want sounds like something I did about 7 years ago in Perl, when XML was just about to be born. It worked with SGML as well, and with long enough documents it produced a slimmer (and necessarily incomplete) version of the DTD the document had been written against. We used it as an extra test for our ‘document instances’, as we called them. Feels like ages ago..
People love valid code but apparently WordPress doesn’t, preferring to write invalid code (at least, it doesn’t generate the required result when it is rendered correctly) just to accommodate Microsoft’s crappy non-implementation of the 10-year-old CSS standard.
IBM used to have an alphaworks tool (c. 2000?) that took an XML document as input and outputted a DTD for the document. I don’t see it online now, though.