16 thoughts on “Structured Blogging”

  1. Yes, ideally a solution for this wouldn’t need to duplicate all of the content. Putting structure around existing content would, I think, be much more effective and gain wider adoption.

  2. It may actually be perfectly valid XHTML; the W3C Validator tends to puke on DTDs it doesn’t know about, even if they are valid XML.

    I do agree that this is something that could be done with structured markup rather than XML, though. No need to reinvent the wheel…

  3. Jay wrote: “the W3C Validator tends to puke on DTDs it doesn’t know about, even if they are valid XML.”

    Correct. Any use of namespaces in XHTML will generate conformance complaints. This is discussed in section 3.1.2 of the XHTML V1.0 specification at: http://www.w3.org/TR/xhtml1/#well-formed.

    “3.1.2. Using XHTML with other namespaces — The XHTML namespace may be used with other XML namespaces as per [XMLNS], although such documents are not strictly conforming XHTML 1.0 documents as defined above. Work by W3C is addressing ways to specify conformance for documents involving multiple namespaces. For an example, see [XHTML+MathML].”

    This paragraph is usually taken to mean that while using namespaces in XHTML is not strictly conformant with the XHTML V1.0 specification, it is “ok” to use namespaces. At some future time, the W3C will “address ways to specify conformance for documents involving multiple namespaces.”

    The alternative to using XML syntax and namespaces for the subnode Little-Language would have been a great deal of reinvention and added complexity. Frankly, we’re really pleased by how much existing code we can leverage by sticking to standard XML and HTML mechanisms. For instance, given our current syntax, it is trivial to build processors for the subnode language using XSLT. An example of this is our “reference implementation”, which you can see at: http://structuredblogging.org/download/x-subnode-processor.xsl. That tiny implementation does the job and serves as a nice framework for more complex implementations. This ease of implementation simply wouldn’t have been possible if we had used some non-standard, non-XML syntax.
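
    As a rough illustration of how little machinery a subnode consumer needs, here is a minimal Python sketch (a stand-in, not the XSLT reference implementation linked above) that pulls subnode payloads out of an HTML page and parses them as XML. It assumes, hypothetically, that subnode blocks are marked with `type="application/x-subnode"` and that each block holds one well-formed XML element with no undeclared namespace prefixes; neither detail is specified in this thread.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Hypothetical MIME type for subnode script blocks (an assumption, not taken from the spec).
SUBNODE_TYPE = "application/x-subnode"


class SubnodeCollector(HTMLParser):
    """Collects the raw text of <script> elements whose type attribute is SUBNODE_TYPE."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._in_subnode = False
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == SUBNODE_TYPE:
            self._in_subnode = True
            self._buf = []

    def handle_data(self, data):
        # HTMLParser passes script contents through as raw text (possibly in
        # several chunks), so the embedded XML arrives unparsed.
        if self._in_subnode:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_subnode:
            self.blocks.append("".join(self._buf).strip())
            self._in_subnode = False


def extract_subnodes(html_text):
    """Return one parsed XML element per subnode block in html_text; other scripts are ignored."""
    collector = SubnodeCollector()
    collector.feed(html_text)
    collector.close()
    return [ET.fromstring(block) for block in collector.blocks]


page = """<html><body>
<p>I liked this record.</p>
<script type="text/javascript">var shown = 1 < 2;</script>
<script type="application/x-subnode">
<review><title>Abbey Road</title><rating>5</rating></review>
</script>
</body></html>"""

reviews = extract_subnodes(page)
```

    Because the collector keys off the script `type`, the ordinary JavaScript block on the same page is skipped, much as a browser without a subnode processor would ignore the subnode block.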

    We are *very* concerned to ensure that subnode stays within the HTML, XHTML, and XML standards. We are not interested in defining proprietary solutions. Thanks for your comment and please let me know of any other standards conformance issues that you might see.

    bob wyman

  4. > This paragraph is usually taken to mean that while using namespaces in XHTML is not strictly conformant with the XHTML V1.0 specification, it is “ok” to use namespaces.

    I don’t read it that way at all. A document is either valid or not, you can’t have “sort-of-valid-even-though-the-spec-says-otherwise” documents.

    I read it as saying that you can use XHTML element types within other XML documents, and that the XHTML semantics should apply, even though that document type is not XHTML.

  5. Matt asked about XHTML Modularization (i.e. XHTML M12N…)

    M12N offered a great deal of promise when it was first proposed many years ago, and it has certainly made it easier to understand and think about XHTML itself. However, as we all know, it hasn’t really caught on or gotten the support in browsers and XHTML processing tools that we might have liked. The current W3C Recommendation is pretty musty (10 Apr 2001), and although there are recent working drafts that update it, I have no idea when we’re going to see a new Recommendation. The current working draft finally addresses support for XML Schemas rather than just DTDs, but relying on it would mean relying on non-standard features, and we really wanted to do something within the existing standards.

    In any case, there are some serious problems with XHTML M12N that made it impossible for us to adopt it for this specific application, although it is a very useful solution for a great many other problems.

    We needed a solution that would work not only in XHTML but also in HTML and with the HTML or XHTML “fragments” that are typically inserted into XML feeds such as RSS and Atom. Clearly, M12N wouldn’t get us anywhere with HTML — and unfortunately, that is what a great many people still generate today. Also, since M12N relies on declaring DTDs, there isn’t any practical way to use it in RSS/Atom feeds since there is no way to tell a feed aggregator what to do with an entry-specific DTD. (Note: The “description” elements of RSS/Atom entries don’t contain full HTML or XHTML documents… They don’t include DTD declarations.) Even if this wasn’t a problem, we’d still have the issue of “teaching” the browsers and feed aggregators how to present or style elements from the M12N modules that we’d defined.

    If we had used M12N, we would have been able to produce something that was “conformant” to the W3C Recommendations, but it would have been incompatible with most of the browsers and other tools that we believe structured blogging needs to work with. It also would have made defining and modifying type definitions much more complicated than in our solution. Instead of just inserting a script tag, you would have had to define a DTD, style sheets, etc. for each schema. That’s too hard for most people.

    The method we’ve defined, creating the subnode Little-Language, achieves our purposes in that it can be silently inserted into HTML, XHTML or RSS/Atom document fragments. As far as we can tell, just about all browsers will ignore the script tags we insert since they don’t have processors for the subnode script language. The downside is, admittedly, a bit of bloat that results from repeated content. However, we think that this bloat is probably worth the cost, given that it allows us to build a whole class of very interesting applications that wouldn’t otherwise be possible. Perhaps, once we get some experience with these new applications and uses, we’ll be able to motivate folks to address the general problem and invent better solutions. (i.e. Let experience drive design.)

    I should probably clarify that we’re not completely wedded to the approach we’ve taken. If someone can come up with something better that is at least as standards-conformant as what we’ve defined, we’d be happy to switch to it. Consider what we’ve done so far as simply a recommendation — our best guess of the best available solution today — and as a “request for comment”. Our interest is in having structured blogging become a reality — we don’t really care how it happens.

    bob wyman

  6. Jim (in comment 7): If you are truly concerned about the use of namespaces in XHTML before the W3C clarifies the conformance issues, then simply wrap the contents of subnode script blocks in CDATA sections. Doing so will, in fact, bring you into full compliance with XHTML, although it will make it harder for people to process the data. (Note: a subnode language processor should be prepared to work with data with or without CDATA wrappers, or data which is URL-encoded, since some people will seek 100% standards conformance.) Please note that as far as we can tell, the *only* XHTML processor that actually cares about the use of namespaces or XML inside script elements is the XHTML validator… Because these namespaces are only used *inside* the subnode script element, browsers ignore the stuff.
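
    The parenthetical note above suggests that a processor accept plain, CDATA-wrapped, and URL-encoded payloads alike. A minimal Python sketch of that normalization step follows; the function name is hypothetical, and treating a leading `%3C` (the URL encoding of `<`) as the signal for URL-encoded data is a heuristic assumption rather than anything the subnode format specifies.

```python
from urllib.parse import quote, unquote

CDATA_OPEN = "<![CDATA["
CDATA_CLOSE = "]]>"


def normalize_subnode_payload(raw):
    """Return subnode XML text whether it arrived plain, CDATA-wrapped, or URL-encoded."""
    text = raw.strip()
    if text.startswith(CDATA_OPEN) and text.endswith(CDATA_CLOSE):
        # Drop the outer wrapper, then merge any adjacent sections that a writer
        # split in order to carry a literal "]]>" ("...]]><![CDATA[...").
        text = text[len(CDATA_OPEN):-len(CDATA_CLOSE)]
        text = text.replace(CDATA_CLOSE + CDATA_OPEN, "").strip()
    # Heuristic assumption: URL-encoded XML begins with "%3C", the encoding of "<".
    if text[:3].upper() == "%3C":
        text = unquote(text)
    return text


xml = "<review><rating>5</rating></review>"
variants = [xml, CDATA_OPEN + xml + CDATA_CLOSE, quote(xml)]
normalized = [normalize_subnode_payload(v) for v in variants]
```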

    bob wyman

  7. Maybe I’m missing something here, but if it’s designed to be machine-read, won’t the machines be grabbing it from the RSS/Atom feeds? Wouldn’t it be easier to just add <sb:review-type>, <sb:rating> etc. elements to those feeds?
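
    For comparison, the feed-only approach suggested here could be sketched in Python with ElementTree by appending namespaced children to an RSS `<item>`. The namespace URI below is a hypothetical placeholder (nothing in this thread defines one for `sb:`), and the helper name is illustrative only.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI bound to the "sb" prefix; a placeholder, not a published spec.
SB_NS = "http://example.org/ns/structured-blogging"
ET.register_namespace("sb", SB_NS)


def add_review_fields(item, review_type, rating):
    """Append <sb:review-type> and <sb:rating> children to an RSS <item> element."""
    ET.SubElement(item, f"{{{SB_NS}}}review-type").text = review_type
    ET.SubElement(item, f"{{{SB_NS}}}rating").text = str(rating)
    return item


item = ET.Element("item")
ET.SubElement(item, "title").text = "Abbey Road, revisited"
add_review_fields(item, "music-review", 5)
serialized = ET.tostring(item, encoding="unicode")
```

    An aggregator that does not recognize the namespace would typically just skip these elements, which is why this route depends on aggregator support.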

  8. Sam Angove wrote: “Wouldn’t it be easier to just add <sb:review-type>… to those feeds?”

    Yes, it would be much easier. However, it wouldn’t accomplish our goals and would be of significantly lower utility. The method we have defined allows structured data to be included in HTML, XHTML and feeds. The method you propose would work only in feeds. Additionally, the method that you propose would only be useful if feed aggregators were modified to support these format extensions. These two limitations, 1) scope limited to feeds only and 2) a requirement for modified aggregators, would just about guarantee that doing this was useless.

    By defining subnode so that it conforms to standards and can be inserted in HTML and XHTML as well as in feeds, we make it possible for users to benefit from the data wherever it is found. And, we make it trivial to do things like build Firefox extensions that can recognize and support structured blogging. Additionally, because our method supports non-feeds (i.e. HTML and XHTML on web sites), it means that we can build applications that discover these inclusions in either the world of feeds or the world of pages. The idea here is to try to break down some of the walls between blogging and non-blogging and bring the better ideas of blogging to the rest of the web.

    Jeff Minard wrote: “this would be a good time to release that ‘per post template’ plugin I have”
    Yes, I think so! Hopefully, you’ll extend it to support structured blogging subnodes. I think one of the nice user-benefits of our approach is that it allows entries to be “typed”, i.e. their presentation is dependent on their content or intent. Frankly, one of the things I really don’t like about most blogs is that every entry looks the same: basically a blob of text/html… With per-post formatting, I think we’ll be able to make the blog reading experience much more enjoyable and easy, as long as people don’t get crazy in the variety of post formats that they support. Even if such per-post templates don’t support full subnode structured blogging, I would ask that you at least do something to insert a “post-type” element in the posts so that people who are using search engines like Google or Feedster, or matching engines like PubSub’s, can search for entries based on type. (i.e. “Show me all ‘music-reviews’ that contain the keyword ‘Beatles’.”)
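
    The kind of query described above becomes a simple filter once each entry carries a type. This Python sketch assumes a hypothetical `Entry` record whose `post_type` field holds the value of the suggested “post-type” element; a real search or matching engine would use an index rather than a linear scan.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    """Hypothetical minimal entry record; post_type mirrors the suggested "post-type" element."""
    title: str
    body: str
    post_type: str


def find_entries(entries, post_type, keyword):
    """Return entries of the given type whose title or body contains keyword (case-insensitive)."""
    kw = keyword.lower()
    return [
        e for e in entries
        if e.post_type == post_type and (kw in e.title.lower() or kw in e.body.lower())
    ]


entries = [
    Entry("Abbey Road", "The Beatles at their peak.", "music-review"),
    Entry("Road trip notes", "We played the Beatles all day.", "travel-log"),
    Entry("Pet Sounds", "Brian Wilson's masterpiece.", "music-review"),
]
matches = find_entries(entries, "music-review", "Beatles")
```

    Without the type field, the travel log would match the keyword too; the type is what narrows the result to reviews.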

    bob wyman

  9. To address people’s concerns about strict XHTML conformance, we’ve been trying to build an update to our structured-blogging plug-in that would, by default, wrap the subnode language text in CDATA blocks. This is an appropriate thing to do according to the XHTML spec. However, we’re finding that it doesn’t work with WordPress, since the the_content() function in template-functions-post.php escapes CDATA. There doesn’t appear to be any way that a plug-in like ours can selectively override this function. The alternative would be to do character escaping of the subnode language text; however, that strikes me as a pretty ugly solution. I think we’ll have to talk to Matt before we can make progress on resolving this issue…
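
    The two options weighed here, CDATA wrapping versus character escaping, can be compared with a short Python sketch. The helper names are hypothetical; `xml.sax.saxutils.escape` is the standard-library escaper. Both forms parse back to the same text, but only the CDATA form stays readable as XML in the page source. One wrinkle: a literal `]]>` cannot appear inside a CDATA section, so the wrapper splits it across two sections.

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET


def wrap_cdata(text):
    """Wrap text in a CDATA section, splitting any literal ']]>' so the markup stays well-formed."""
    # End the section after "]]" and reopen it before ">", so "]]>" never occurs inside one.
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


def escape_text(text):
    """The alternative: escape &, <, and > as character entities."""
    return escape(text)


def parsed_text(encoded):
    """Parse encoded text inside a wrapper element and return the text an XML parser recovers."""
    return ET.fromstring("<script>" + encoded + "</script>").text


payload = "<review><rating>5</rating></review>"
as_cdata = wrap_cdata(payload)
as_escaped = escape_text(payload)
```

    The escaped form is the “pretty ugly” option: equally valid to a parser, but a person reading the page source sees `&lt;review&gt;` instead of markup.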

    (Note: In theory, for strict XHTML conformance, whenever you use JavaScript or any other script language in XHTML you should be escaping it. But, nobody seems to do that… Grumble…) Anyway, we’re trying to make this all as “clean” as it can be. Please bear with us and please send us any other suggestions that you might have.

    bob wyman

  10. Bob, yeah, the CDATA strip is kinda a bug in WP IMO. A few other authors have run into this same issue.

    If nothing else, it shouldn’t be getting done after the filters have been run; otherwise plugin authors lose the ability to fix it *back*.

  11. Aren’t we going to need modified aggregators anyway? Adding new elements should be easier to support — surely it’s trivial compared to discovering and parsing the embedded subnode? There’s still a lot of benefit there, though I see the point that it limits the scope too much.

  12. Jeff Minard wrote: “the CDATA strip is kinda a bug in WP IMO”… Whether or not it’s a “bug” is up for some debate… In any case, I talked to Matt about this and I think we understand how to move forward. But, it will take some time. Note: I don’t think anyone should get terribly worried about this in the meantime. Remember, virtually nobody ever escapes JavaScript or VBScript in XHTML and it doesn’t seem to have any bad effects…

    Sam Angove wrote: “Aren’t we going to need modified aggregators anyway?” Ideally, we would have modified aggregators. However, the approach we’ve defined works even if we don’t since we’re including the structured data in a way that aggregators and browsers should ignore if they don’t have a subnode processor. Additionally, the subnode stuff works with browsers and search engines as well as aggregators. The “cost” of the wider scope is duplication of data. Personally, I think it is a reasonable price to pay. My hope is that others will as well.

    Embedding the data in new elements is the approach taken by Joe Reger at http://reger.com. His “DataBlogging” relies on embedding new “EntryData” elements in blog feeds. Take a look at all the specialized “logs” that Reger produces if you want to get some ideas on what a semantic-aware blogging system that uses structured data can do… However, his approach requires support from the aggregator and won’t work in HTML/XHTML…

    bob wyman
