The explicit and the implicit (RDF vs XML)

And in this corner …

RDF makes data explicit:

  • The RDF graph supports manifest relationships between any two objects.

    By contrast, the XML tree supports hierarchical containment relationships. Other relationships require something in addition to XML proper — either XLink (which has seen limited adoption) or, more typically, an attribute with referencing semantics in the vocabulary.

  • RDF dynamic typing annotates an object with multiple types as the properties of the object qualify it for those types.

    By contrast, an XML element tags an object with a single name. To capture other names again requires something in addition to XML proper — typically, a class or role attribute with type list semantics in the vocabulary.

  • RDF inference adds explicit data to the graph incrementally based on the implications of the current graph.

    By contrast, XML itself has no method for identifying and operating on implications of the data. Applications for specific vocabularies sometimes add data to documents or, because additions might be invalid for the schema, rewrite documents completely. Just as often, vocabularies mandate processing rules that applications apply without adding any data to the document.
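To make the typing and inference points concrete, here is a minimal Python sketch. It is not a real RDF toolkit: triples are plain tuples, the `ex:` names are made up for illustration, and only two RDFS-style rules (domain and subclass) are applied. The point is that each inferred fact lands in the graph as one more explicit triple, and that an object picks up several types as its properties qualify it.

```python
# Hypothetical mini-graph: triples are plain (subject, predicate, object) tuples.
RDF_TYPE = "rdf:type"
RDFS_DOMAIN = "rdfs:domain"
RDFS_SUBCLASS = "rdfs:subClassOf"


def infer(graph):
    """Apply two RDFS-style rules until no new triples appear.

    Domain rule:   (p rdfs:domain C) and (s p o)           => (s rdf:type C)
    Subclass rule: (C rdfs:subClassOf D) and (s rdf:type C) => (s rdf:type D)
    """
    graph = set(graph)
    while True:
        domains = {(s, o) for s, p, o in graph if p == RDFS_DOMAIN}
        subclasses = {(s, o) for s, p, o in graph if p == RDFS_SUBCLASS}
        new = set()
        for s, p, o in graph:
            for prop, cls in domains:
                if p == prop:
                    new.add((s, RDF_TYPE, cls))
            if p == RDF_TYPE:
                for sub, sup in subclasses:
                    if o == sub:
                        new.add((s, RDF_TYPE, sup))
        if new <= graph:  # nothing new: the graph has reached a fixed point
            return graph
        graph |= new


def types_of(graph, subject):
    """Return every type currently asserted for the subject."""
    return {o for s, p, o in graph if s == subject and p == RDF_TYPE}


graph = {
    ("ex:wrote", RDFS_DOMAIN, "ex:Author"),
    ("ex:worksFor", RDFS_DOMAIN, "ex:Employee"),
    ("ex:Author", RDFS_SUBCLASS, "ex:Person"),
    ("ex:Employee", RDFS_SUBCLASS, "ex:Person"),
    ("ex:alice", "ex:wrote", "ex:post1"),
    ("ex:alice", "ex:worksFor", "ex:acme"),
}

inferred = infer(graph)
print(sorted(types_of(inferred, "ex:alice")))
```

Nothing in the input says that `ex:alice` is an author, an employee, or a person; all three types are added because her properties imply them.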

Explicit data makes for easy processing, integration, and, more generally, agile data (as Lee Feigenbaum notes).

And yet …

An explicit document is more difficult to create, maintain, and understand. People are adept at recognition, so anything that we would get without being told gets in the way: annoying at best; at worst, hiding the real news. Explicit data can also be awkward, as Benjamin Nowack notes with respect to tunneling structural and subject metadata through HTML with RDFa. In short, for authored documents, XML markup that leaves implicit what its audience can already infer is much more practical.

That leaves the challenge of getting from the XML representation suitable for people to the RDF representation that’s optimal for processors, a problem known in the RDF community as “lift.” The traditional answer has been GRDDL, in which an XML document refers to external XSLT transforms that extract the data from the document. XSLT, however, is a poor fit for the lift problem: its output is RDF/XML (an XML document that happens to be a serialization of RDF) rather than RDF directly.

XSPARQL, a proposal for integration of the XQuery query language for XML and the SPARQL query language for RDF, has a lot of promise for the lift problem. (Check out the use cases.) Although I didn’t find an example, it would seem plausible to embed XSPARQL within a script element, along the lines of:

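<script type="application/xsparql">
(: Bind the prefixes used in the paths and in the construct template. :)
declare namespace html="http://www.w3.org/1999/xhtml";
declare namespace dc="http://purl.org/dc/elements/1.1/";
(: Extract values from the XML with XPath. :)
let $page := fn:document-uri(/)
let $title := /html:html/html:head/html:title/text()
(: Build the triples from the extracted values. :)
construct {
    $page dc:title $title .
}
</script>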

A small subset of XSPARQL would be enough: a sequence of XPaths to extract values from the XML and a construct clause to build the triples. The XSPARQL could be referenced externally to share among XHTML documents or embedded to package with the XHTML documents.
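The "small subset" idea can be approximated in a few lines of Python, which may help show how little machinery it needs. This is only a sketch under stated assumptions, not XSPARQL: the `lift` function, its `bindings` and `template` parameters, and the `example.org` page URI are all hypothetical. The paths use the limited XPath subset of the standard library's ElementTree, and the page URI is passed in by hand because ElementTree has no equivalent of `fn:document-uri`.

```python
import xml.etree.ElementTree as ET

# Prefix-to-namespace map, playing the role of the "declare namespace" lines.
NAMESPACES = {"html": "http://www.w3.org/1999/xhtml"}
DC_TITLE = "http://purl.org/dc/elements/1.1/title"


def lift(xml_text, page_uri, bindings, template):
    """Evaluate each path against the document, then fill in the template.

    bindings: variable name -> ElementTree path (a limited XPath subset),
              evaluated relative to the document's root element
    template: list of (subject, predicate, object) patterns; terms that start
              with "$" are variables, everything else is copied through
    """
    root = ET.fromstring(xml_text)
    values = {"$page": page_uri}
    for name, path in bindings.items():
        node = root.find(path, NAMESPACES)
        if node is not None and node.text:
            values[name] = node.text.strip()

    triples = []
    for pattern in template:
        # Skip a pattern if any of its variables is unbound.
        if all(not term.startswith("$") or term in values for term in pattern):
            triples.append(tuple(values.get(term, term) for term in pattern))
    return triples


DOC = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>The explicit and the implicit</title></head>
  <body><p>...</p></body>
</html>"""

triples = lift(
    DOC,
    "http://example.org/post",            # assumed page URI
    {"$title": "html:head/html:title"},   # the "sequence of XPaths"
    [("$page", DC_TITLE, "$title")],      # the "construct clause"
)
print(triples)
```

The output is a list of triples rather than an RDF/XML document, which is the difference from the XSLT route: the extraction step hands RDF-shaped data straight to whatever consumes it.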

Such an approach would be more flexible and convenient than the current alternatives (RDFa and GRDDL), leveraging both XML and RDF to maximum advantage.
