Data 2 Documents

A description of the main d2d vocabulary elements and their properties

This page describes the d2d vocabulary, which resides in the d2d namespace http://rdfns.org/d2d/. A basic 'Hello World' example can be found here. More elaborate examples, containing multiple pages that showcase various possibilities of the vocabulary, can be found at http://example.d2dsite.net.

The d2d vocabulary builds upon and extends the following two notions of HTML5:

The separation of content and style
The semantic sectioning of content

Though historically HTML was mainly used to style content, today its main role is to establish a semantic document structure that contains the actual content. The styling of this content is done separately, i.e. using CSS, creating a separation between the content layer and the style layer. d2d builds on this notion by adding two abstract layers: that of re-usable content, i.e. content that can be used beyond a single document, and that of re-usable rendering of those specific pieces of content into a semantic document structure.

The d2d vocabulary is aligned with HTML5 semantic document elements, such as 'Article', 'Section', 'Header' and 'Footer'. According to the HTML5 specification, Sections and Articles can be nested and contain one another. The difference between the two is that an Article is a 'self-contained' fragment of content that is "in principle, independently distributable or reusable, e.g. in syndication". In other words, a 'Section' cannot be taken out of context, while an 'Article' can. The definition of the 'Article' element speaks of syndication; how this syndication is performed in practice, though, is out of scope for HTML5. Because Data 2 Documents is founded on RDF, it can be used to achieve this syndication.

Within d2d, each fragment of content that is used in a document is coupled to the semantics of a specific HTML5 element. The vocabulary is used to describe how data from the Web should be used as content, and to couple that content to specific HTML5 semantics. Both existing Linked Data and content specifically created for a web document can be used in a web document. This can be done through dereferencing as well as through SPARQL queries. In essence, the Data 2 Documents vocabulary can be used to specify a 'meta model' on top of RDF data that is structured using other, domain-oriented vocabularies. Using this meta model, the data is aligned towards publication in a web document. For this reason, the d2d vocabulary does not use strong typing or subclassing towards potential content, as doing so is not always possible and would add potentially unwanted semantics to selected data. Instead, d2d uses a more subtle way of associating selected resources with definitions on how to use them, namely by providing preferred definitions that match specific RDF classes.

Main elements of the d2d vocabulary

Key concepts of the vocabulary are Document and Section. Document refers to the web document as a whole and consists of a hierarchy of nested Sections. One Section (or, more precisely, an Article, which is defined as a subclass of Section) is the root of this hierarchy. Each Section contains one or more Fields, which are small fragments of content that together make up the Section. How many Fields a Section or Article has, and of which kind, is specified by a Section Definition that bundles multiple Field Specifications. How that Section or Article is rendered in HTML is specified by a Render Definition. Below is a description of the most important elements of the d2d vocabulary.

d2d:Document

The Document class represents the web document as a whole. It contains properties to specify document-specific values such as its (HTML) title and various meta fields. The Document class also points to an (X)HTML template file that is to be used to render the document, and to the so-called 'root article' of the document. Since the content of the document as a whole can be seen as self-contained, the root article is seen as an Article rather than its superclass Section. This follows from their definitions in HTML5. The property to indicate the root article is hasArticle; this is the only prescribed property to place content in a document. In all other cases the predicates that are used for this are defined by the Section Definition that places the (nested) content, which allows for the inclusion of arbitrary Linked Data. Two other important properties of the Document class are renderedBy and prefRenderDef. These properties point to a 'render definition' for the document as a whole and a preferred render definition for the root article of the document, respectively. Multiple preferred render definitions can be specified at the document level; this construction makes it possible to render resources from existing Linked Data as Section or Article. In such a case, it is often not possible to add a property specifying the required render definition on the data itself, e.g. if data from an external source is used. By specifying multiple render definitions on a higher level, a fitting definition can be chosen later on based on the type of resource that is to be rendered.

d2d:Section

As explained above, a Section refers to a fragment of content that can be part of a web document. However, in d2d these fragments are not directly defined as such. This is because one can neither force nor expect a data owner to define many resources in their data as instances of a d2d Section class. Neither would this be a desirable practice from the Linked Data perspective, as it would affect vast quantities of Linked Data, since in principle any resource might be used to act as a document Section or Article. Instead, within d2d, one can indicate that a selected resource is to be used as such within the Field Specification that selects it. The Section class has a number of subclasses that add additional semantics to a section, such as Article (a self-contained section), Header, Footer, Main, Aside and Nav (sections with a specific role within the parent section). All of these match their counterparts in HTML5.

d2d:SectionDefinition

The role of the SectionDefinition class is to create definitions that determine which properties related to a given resource should be used as content for a Section that follows that particular definition. It bundles a number of Field Specifications using a property hasFieldSpec. Furthermore the Section Definition indicates the RDF classes that it fits to, i.e. the classes that can be used to act as the data source for a Section that follows that definition.

d2d:FieldSpecification

The FieldSpecification class specifies how data should be selected for a field that is part of a Section. It has a property mustSatisfy that either a) directly specifies the predicate that the resource acting as Section should have in order to provide content for the field (shorthand notation), or b) points to an instance of class TripleSpecification that contains details on how to select the data. Alternatively, the Field Specification can have a property usesVariable to indicate a SPARQL variable that should be used. The Field Specification also has a property hasFieldType to indicate how the selected data should be treated as content. This is the property that ties a data field directly to HTML5 semantics, indicating whether a selected object should be treated as e.g. a nested Section or a Paragraph of text. There are additional properties to specify things such as the maximum number of results for a field, the required language, whether a field is optional or not, etc.
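
The shorthand form of mustSatisfy can be pictured as a simple predicate lookup on the resource that acts as Section. The following is a minimal sketch, not the reference implementation: the triples are plain Python tuples, and the names FieldSpec, max_results and select_field_values are hypothetical stand-ins for the vocabulary terms described above.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


@dataclass
class FieldSpec:
    """Hypothetical in-memory stand-in for a d2d Field Specification."""
    must_satisfy: str                   # predicate IRI (shorthand notation)
    field_type: str                     # e.g. "Paragraph" or "Section"
    max_results: Optional[int] = None   # assumed name for the result limit


def select_field_values(resource: str, spec: FieldSpec,
                        triples: List[Triple]) -> List[str]:
    """Collect the objects of the required predicate on the given resource."""
    values = [o for s, p, o in triples
              if s == resource and p == spec.must_satisfy]
    if spec.max_results is not None:
        values = values[:spec.max_results]
    return values


triples = [
    ("ex:a", "ex:title", "Hello"),
    ("ex:a", "ex:para", "One"),
    ("ex:a", "ex:para", "Two"),
]
print(select_field_values("ex:a", FieldSpec("ex:para", "Paragraph", 1), triples))
```

An empty result list corresponds to a field that could not be satisfied; how that case is handled for non-optional fields is left out of this sketch.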

d2d:TripleSpecification

The TripleSpecification class is used to define a property path in order to select data for a field. The required predicate can be specified as well as details regarding the selected object such as its required type (e.g. xsd:string or foaf:Person) and its role. The role determines how the selected object is used, e.g. as content for the field, as sort key, as SPARQL query, as query endpoint, etc. There is also a role to specify additional preferred Render Definitions. A Triple Specification can have a property mustSatisfy to chain additional Triple Specifications and create a property path. There is also a property hasAlternative to specify an alternative Triple Specification that needs to be followed in case its parent is not satisfied.
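
Chaining through mustSatisfy and falling back through hasAlternative can be sketched as a small recursive function. This is an illustration under assumptions: the TripleSpec class and the follow function are hypothetical, the type check only covers rdf:type on resources (not literal datatypes), and the alternative is assumed to start from the same subject as its parent.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


@dataclass
class TripleSpec:
    """Hypothetical in-memory stand-in for a d2d Triple Specification."""
    predicate: str
    required_type: Optional[str] = None            # required rdf:type of the object
    must_satisfy: Optional["TripleSpec"] = None    # next step of the property path
    has_alternative: Optional["TripleSpec"] = None # tried if this spec yields nothing


def objects_of(subject: str, predicate: str, triples: List[Triple]) -> List[str]:
    return [o for s, p, o in triples if s == subject and p == predicate]


def follow(subject: str, spec: TripleSpec, triples: List[Triple]) -> List[str]:
    """Follow a chain of Triple Specifications starting at the subject."""
    results: List[str] = []
    for obj in objects_of(subject, spec.predicate, triples):
        if spec.required_type and spec.required_type not in objects_of(obj, RDF_TYPE, triples):
            continue  # object does not have the required type
        if spec.must_satisfy:
            results.extend(follow(obj, spec.must_satisfy, triples))
        else:
            results.append(obj)
    if not results and spec.has_alternative:
        return follow(subject, spec.has_alternative, triples)
    return results
```

For example, a path "ex:author, then foaf:name" selects the names of all authors that are typed as foaf:Person, while a specification with an alternative falls back to a second predicate when the first one selects nothing.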

d2d:RenderDefinition

This class defines how a Section should be rendered. In order to do so, there is a property hasTemplate that can either point to a file containing an HTML5 (sub) template, or hold a literal containing the actual template. Alternatively, markup can also be specified on a per-field basis. Because Field Specifications indicate a specific field type such as Section or Paragraph, it is not mandatory to specify a Render Definition. If none is specified, the selected field values are rendered in an HTML5 element that corresponds to the indicated field type. However, the use of a Render Definition provides more control over the rendering.
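
The fallback rendering without a Render Definition amounts to a lookup from field type to HTML5 element. A minimal sketch, assuming a mapping that covers only the field types named on this page; the dictionary, the function name and the "div" fallback are assumptions and not part of the vocabulary:

```python
from html import escape

# Assumed mapping from d2d field type to HTML5 element name.
DEFAULT_ELEMENTS = {
    "Section": "section",
    "Article": "article",
    "Header": "header",
    "Footer": "footer",
    "Main": "main",
    "Aside": "aside",
    "Nav": "nav",
    "Paragraph": "p",
}


def render_default(field_type: str, value: str) -> str:
    """Wrap a literal value in the element matching its field type."""
    tag = DEFAULT_ELEMENTS.get(field_type, "div")  # "div" fallback is an assumption
    return f"<{tag}>{escape(value)}</{tag}>"


print(render_default("Paragraph", "a < b"))
```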

Schematic overview of the Data 2 Documents vocabulary

Schematic overview of how the Data 2 Documents vocabulary is used to select content. Ovals represent RDF resources, where instance and subClass relations are denoted by italics in those ovals (e.g. ex:MyDocument is an instance of d2d:Document). The implementation description below will take you through the processing sequence of the vocabulary using the indicators A - J.

Description of the implementation

In order to test and evaluate the Data 2 Documents vocabulary we developed a reference implementation, which is available as open source. This reference implementation is a generic parsing script for the vocabulary terms. The script selects the data needed for a document and renders it according to the specified definitions. The implementation is based on two recursive functions:

process-Section: A function to process Sections, that calls itself recursively when a field of that Section is a nested Section;
process-Field-Specification: A function to process Field Specifications and the Triple Specifications therein, in order to select and validate data to satisfy the field of a Section. The function calls itself recursively if there are nested Triple Specifications in order to form a property path or specify alternatives.

Besides these main routines there are several other functions that facilitate specific needs such as dereferencing resources, parsing templates and storing them in a temporary template library, etc.

To provide insight into the workings of the reference implementation and the vocabulary, we will describe the processing of an article based on the vocabulary schema shown above. If an HTTP request is made for an instance of class Document, i.e. the main document resource, the implementation performs content negotiation based on the HTTP Accept header. If raw RDF data is requested, the document resource is returned as a Symmetric Concise Bounded Description (SCBD). If text/html is requested, the implementation starts interpretation of the Document resource. It loads the specified Render Definitions and templates into a temporary library and calls the process-Section function with the IRI of the specified root Article (A) as parameter.
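
The content negotiation step can be illustrated with a small Accept header parser. This is a sketch only; the reference implementation is written in PHP and may decide differently. The list of RDF media types, the function names and the default of serving HTML when nothing matches are assumptions.

```python
# Assumed set of RDF serialisations that count as a request for raw RDF data.
RDF_TYPES = ("text/turtle", "application/rdf+xml", "application/ld+json")
HTML_TYPES = ("text/html", "application/xhtml+xml", "*/*")


def parse_accept(header: str) -> list:
    """Return the acceptable media types, highest quality value first."""
    entries = []
    for position, part in enumerate(header.split(",")):
        pieces = [piece.strip() for piece in part.split(";")]
        media_type = pieces[0].lower()
        quality = 1.0
        for param in pieces[1:]:
            if param.startswith("q="):
                try:
                    quality = float(param[2:])
                except ValueError:
                    quality = 0.0
        if media_type and quality > 0:
            # Sort by descending quality, then by position in the header.
            entries.append((-quality, position, media_type))
    return [media_type for _, _, media_type in sorted(entries)]


def negotiate(accept_header: str) -> str:
    """Decide whether to return raw RDF ("rdf") or a rendered document ("html")."""
    for media_type in parse_accept(accept_header):
        if media_type in RDF_TYPES:
            return "rdf"
        if media_type in HTML_TYPES:
            return "html"
    return "html"  # assumed default when no listed type matches
```

With this sketch, a browser-style header such as "text/html,application/xhtml+xml;q=0.9,*/*;q=0.8" leads to the rendered document, while "text/turtle" leads to the RDF description of the Document resource.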

The process-Section function checks the type of the provided resource and searches the temporary library for definitions that are associated with that particular class (B). If a Section Definition is found for the class (C), the Field Specifications (D) of that definition are processed in order to select data related to the provided Article resource. How that data should be selected is indicated by the Triple Specification (E) that specifies the predicate (F) that the resource acting as article should have in order to satisfy the particular field. Optionally, the required type of object for that predicate can be specified (G), and by chaining several Triple Specifications a longer property path or alternatives can be specified. Using the information from the Triple Specification, the implementation checks if the provided resource that is acting as article has the required property or property path (H). If this is the case, and the property object matches the (optionally) required type, the property object is selected as data for the field. This data can be a literal, e.g. a paragraph of text, or a resource that is to be processed as a nested Section. Finally, the implementation uses a matching Render Definition (I) from the library to render the selected data, using a (sub)template (J) that is referenced by the Render Definition. These steps are repeated recursively for each nested Section.
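
The recursion described above can be condensed into a few lines. The sketch below is a strong simplification and not the reference implementation: the library is a hypothetical dictionary that maps an RDF class directly to (predicate, field type) pairs, Triple Specifications and Render Definitions are left out, and the depth limit is an assumed guard against cyclic data.

```python
from html import escape
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical library: RDF class -> list of (predicate, field type) pairs,
# standing in for Section Definitions with shorthand Field Specifications.
Library = Dict[str, List[Tuple[str, str]]]


def process_section(resource: str, triples: List[Triple],
                    library: Library, depth: int = 0) -> str:
    """Select the fields of a resource and render them, recursing into nested Sections."""
    if depth > 10:  # assumed guard against cyclic data
        return ""
    types = [o for s, p, o in triples if s == resource and p == RDF_TYPE]
    field_specs = next((library[t] for t in types if t in library), [])
    parts = []
    for predicate, field_type in field_specs:
        for value in [o for s, p, o in triples if s == resource and p == predicate]:
            if field_type == "Section":
                parts.append(process_section(value, triples, library, depth + 1))
            else:
                parts.append(f"<p>{escape(value)}</p>")
    tag = "article" if depth == 0 else "section"  # the root is rendered as an Article
    return f"<{tag}>{''.join(parts)}</{tag}>"
```

Calling process_section with the IRI of the root Article yields one nested HTML fragment in which every nested resource has been rendered by the same function.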

Operational Requirements

The reference implementation has minimal operational requirements. It is implemented in PHP. The decision to implement the first reference implementation in PHP was motivated by the wide availability of PHP on common web hosting environments, including both Linux- and Windows-based hosting. The reference implementation uses the EasyRDF library to facilitate operations such as the parsing of several RDF serialisation formats. The exact operational requirements do not exceed those of the EasyRDF library and are as follows: PHP 5.3 or newer with the pcre and mbstring extensions. The implementation can interact with any Linked Data provider as long as it supports dereferencing or provides a SPARQL endpoint.

Data 2 Documents includes a declarative template solution.

Specifying templates is optional, as content is already associated with HTML5 semantic elements and can be rendered on that basis, while further styling can be done using CSS. However, in order to gain more control over document rendering, d2d includes a declarative template solution. (Sub)templates can be nested in a single HTML file that loads and renders individually, to facilitate design.

One of the requirements for Data 2 Documents is to stay within official web standards as much as possible. There are many existing template languages and solutions, but most use imperative terms such as {{if}}, {{then}} and {{for}} in order to express control over content placement. Other requirements are to have templates that can be loaded and viewed separately to facilitate easy web design, and the ability to nest sub templates in the same sense that sections of content are nested in the resulting document. By doing so, a template can contain multiple sub templates that together provide a complete mock-up version of the web document, optionally filled with sample content.

To meet these requirements a declarative template solution was developed for d2d. Templates can be expressed in HTML5 compliant XHTML (polyglot markup), which allows for adding the d2d namespace to the template and using special d2d template tags while staying within official standards. In principle, non-XML based HTML5 templates are also possible, though technically speaking these would not be valid HTML5 documents due to the 'unknown' template tags being present. However, browsers will skip the unknown tags when loading the template file separately, and in the resulting documents all template tags will be replaced.

The d2d template tags are:

d2d:Template: Indicates the start and end of a (sub) template. Template tags and all their child elements will be removed from the resulting document. A Template tag can be placed within a Content tag to specify sub templates.
d2d:Field: Indicates the placement of a content field. This tag can only occur within a Template tag and will be replaced by the actual content plus any optional HTML that is only placed if the field is present.
d2d:Segment: Can be used to provide several alternatives to render multiple values for a particular field. For example, to render the first value in a different way or treat odd and even indexed values differently. This tag can only occur within a Field tag.
d2d:Content: This tag can only occur within a Segment tag or directly within a Field tag. If specified, the Content tag will be replaced by the actual content, while any markup situated between the parent Field or Segment tag and the Content tag is regarded as optional content that is only placed if the field itself is placed. That is, if the Triple Specification for the field matched and selected a value. Within the Content tag, sample content can be placed to facilitate template design. As such, a nested sub template can also be specified within a Content tag.

Below are several examples of the various ways the template tags can be used together.

Each 'template' consists of a d2d:Template tag that can contain regular HTML code that is used to render the section or article, and one or more d2d:Field tags that indicate the placement of the actual content within the HTML:

<d2d:Template>
  <!-- Optional HTML; placed if article is placed -->
    <d2d:Field />
  <!-- Optional HTML; placed if article is placed -->
    <d2d:Field />
  <!-- Optional HTML; placed if article is placed -->
</d2d:Template>

A template can be specified as a literal value within a Render Definition or in a separate template file that possibly contains multiple (nested) templates. In the former case, the d2d:hasTemplate property of the Render Definition has a literal value containing the template. In the latter case, d2d:hasTemplate has a resource IRI as value, being the IRI of the template file. Templates embedded as literals are useful for small, additional templates while a separate file with nested templates facilitates easy Web design, because it allows a Web designer to create a complete page design separate from the data (compare our special example website with its separate main template file). Typically, template tags can be added after the complete design is finished. Because a template file can contain multiple (nested) templates, templates within such a file must indicate the Render Definition they are for, using the d2d:for property:

<d2d:Template d2d:for="http://Render-Definition-IRI" />
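
Since a template file is XHTML, the (nested) templates it contains can be located with any XML parser and indexed by their d2d:for value. A minimal sketch using Python's standard xml.etree.ElementTree module; the function name and the sample IRIs are hypothetical, and the template file is assumed to declare the d2d namespace http://rdfns.org/d2d/:

```python
import xml.etree.ElementTree as ET

D2D = "http://rdfns.org/d2d/"


def index_templates(xhtml: str) -> dict:
    """Map each Render Definition IRI to the Template element that is meant for it."""
    root = ET.fromstring(xhtml)
    library = {}
    # iter() also visits nested Template elements at any depth.
    for template in root.iter(f"{{{D2D}}}Template"):
        target = template.get(f"{{{D2D}}}for")
        if target is not None:
            library[target] = template
    return library


sample = (
    '<html xmlns="http://www.w3.org/1999/xhtml" xmlns:d2d="http://rdfns.org/d2d/"><body>'
    '<d2d:Template d2d:for="http://example.org/render/Page"><d2d:Field><d2d:Content>'
    '<d2d:Template d2d:for="http://example.org/render/Note"><d2d:Field /></d2d:Template>'
    '</d2d:Content></d2d:Field></d2d:Template></body></html>'
)
print(sorted(index_templates(sample)))
```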

The fields within a template can be defined with varying options and levels of complexity. The most basic version is shown below; it simply gets replaced by the actual content:

<d2d:Field />

This more advanced version allows for conditional HTML, that is placed only if there is a value to be placed for the specific field. Here, the d2d:Content tag is replaced by the actual field content:

<d2d:Field>
  <!-- Optional HTML; placed if field is placed -->
    <d2d:Content />
  <!-- Optional HTML; placed if field is placed -->
</d2d:Field>
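
The effect of conditional HTML around a Content tag can be demonstrated with a deliberately simplified substitution. The sketch below uses a regular expression and therefore only handles flat, non-nested Field tags without attributes; a real processor would parse the template as XML. The function name and the convention of passing None for a field without a value are assumptions.

```python
import re

# Matches one flat <d2d:Field> ... <d2d:Content /> ... </d2d:Field> block.
FIELD = re.compile(
    r"<d2d:Field>(.*?)<d2d:Content\s*/>(.*?)</d2d:Field>", re.DOTALL
)


def fill_fields(template: str, values: list) -> str:
    """Replace each Field in template order; None means the field has no value."""
    remaining = iter(values)

    def replace(match):
        value = next(remaining, None)
        if value is None:
            return ""  # the optional HTML is dropped together with the field
        return f"{match.group(1)}{value}{match.group(2)}"

    return FIELD.sub(replace, template)


template = (
    "<d2d:Field><h1><d2d:Content /></h1></d2d:Field>"
    "<d2d:Field><p>By <d2d:Content /></p></d2d:Field>"
)
print(fill_fields(template, ["Title", None]))
```

In this example the second field has no value, so neither the text "By" nor the surrounding paragraph element ends up in the output.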

This is the extended version of a content field: It allows for conditional HTML and sample content that facilitates template design. The sample content between the d2d:Content tags gets replaced by the actual content. As the sample content gets replaced, it can contain nested templates for nested articles or sections that would be placed as value for the particular field:

<d2d:Field>
  <!-- Optional HTML; placed if field is placed -->
    <d2d:Content>
      <!-- Sample content; gets replaced -->
      <!-- Can contain nested Templates -->
    </d2d:Content>
  <!-- Optional HTML; placed if field is placed -->
</d2d:Field>

This is the full version of a content field: It allows for multiple segments, conditional HTML and sample content. Using segments, a separate rendering can be defined in case there are possibly multiple values for a field. For example, in case a field contains a list of nested articles, a different rendering can be specified for odd and even articles:

<d2d:Field>
  <!-- Optional HTML; placed if field is placed -->
    <d2d:Segment d2d:matchRule="odd">
      <!-- Optional HTML -->
        <d2d:Content>
          <!-- Sample content; gets replaced -->
          <!-- Can contain nested Templates -->
        </d2d:Content>
      <!-- Optional HTML -->
    </d2d:Segment>
    <d2d:Segment d2d:matchRule="even">
      <!-- Optional HTML -->
        <d2d:Content>
          <!-- Sample content; gets replaced -->
          <!-- Can contain nested Templates -->
        </d2d:Content>
      <!-- Optional HTML -->
    </d2d:Segment>
  <!-- Optional HTML; placed if field is placed -->
</d2d:Field>
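
Choosing a Segment per value comes down to matching the position of the value against the matchRule of each Segment. A minimal sketch, assuming that positions are counted from 1 and that only the rules "odd" and "even" exist; the segments are represented as plain format strings in which {content} stands in for the d2d:Content tag:

```python
def pick_segment(position: int, segments: dict) -> str:
    """Return the segment markup whose match rule fits a 1-based position."""
    rule = "odd" if position % 2 == 1 else "even"
    return segments[rule]


def render_values(values: list, segments: dict) -> str:
    """Render every value of a field with the segment that matches its position."""
    return "".join(
        pick_segment(position, segments).format(content=value)
        for position, value in enumerate(values, start=1)
    )


segments = {
    "odd": '<li class="odd">{content}</li>',
    "even": '<li class="even">{content}</li>',
}
print(render_values(["a", "b", "c"], segments))
```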

Field and Content tags get replaced in the order they appear in the template. Alternatively, you can specify an index to indicate a different ordering; the following example results in a field ordering of 1, 4, 2, 3 with respect to the order of the Field Specifications in the Article or Section Definition:

<d2d:Field />
<d2d:Field d2d:index="4" />
<d2d:Field />
<d2d:Field />
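
One way to arrive at the ordering 1, 4, 2, 3 is to let every Field tag without an index take the next position of a running counter, while a Field tag with an index uses that index directly. This is a sketch of one reading that reproduces the example above; the function name is hypothetical and the reference implementation may resolve other combinations differently.

```python
def resolve_field_order(indices: list) -> list:
    """Map Field tags (None = no d2d:index attribute) to Field Specification positions."""
    order = []
    next_position = 1
    for explicit in indices:
        if explicit is None:
            order.append(next_position)  # take the next unindexed position
            next_position += 1
        else:
            order.append(explicit)       # an explicit index leaves the counter untouched
    return order


# The four Field tags from the example: only the second has d2d:index="4".
print(resolve_field_order([None, 4, None, None]))
```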

Field and Content tags can be nested within each other; the following example results in nothing being placed if a name is not specified, even if an age is specified for the resource acting as section or article:

<d2d:Field>
  <p>
    Name: <d2d:Content />
  </p>
  <d2d:Field>
    <p>
      Age: <d2d:Content />
    </p>
  </d2d:Field>
</d2d:Field>
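
The nesting rule can be modelled as a small tree in which a field without a value suppresses everything inside it. The sketch below is illustrative only: the TemplateField class and its label attribute are hypothetical and simply mirror the Name and Age paragraphs of the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TemplateField:
    """Hypothetical model of a d2d:Field tag with nested Field tags."""
    label: str
    value: Optional[str]
    children: List["TemplateField"] = field(default_factory=list)


def render(node: TemplateField) -> str:
    """Render a field and its nested fields; a missing value drops the whole subtree."""
    if node.value is None:
        return ""
    nested = "".join(render(child) for child in node.children)
    return f"<p>{node.label}: {node.value}</p>{nested}"


# A missing name suppresses the age as well.
print(repr(render(TemplateField("Name", None, [TemplateField("Age", "42")]))))
```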