Getting started with XProc

You hear at times that tools should just work — that your first impulse should always be right, and the overall experience should be like easing into a bath. That might make sense for familiar productivity tools, but should we never learn a new way of thinking about problems?

After deciding to write a more serious XSL tags preprocessor, I needed a way to run tests. The obvious solution would have been Ant, but the job gave me a chance to go swimming in XProc (the W3C pipeline language for processing XML documents).

It’s interesting to watch yourself learn a new tool, making mistakes that (in retrospect) are just silly because you’re trying to figure out the conceptual model, the syntax, and everything else all at the same time. Eventually, after enough blunders, things start working. (Or, as Blake put it, “If the fool would persist in his folly he would become wise.”)

In hopes it might help someone else get up to speed, here’s a list of practical realizations (all no doubt spelled out somewhere in the specification):

  • Make the top element either
    • p:pipeline for a pipeline that takes external input and output arguments
    • p:declare-step for a pipeline that identifies its own inputs and outputs
  • When a p:declare-step pipeline uses XSLT, specify a parameters port to avoid an XProc error, as in:
    <p:input port="parameters" kind="parameter" primary="true"/>
  • Use attributes for static values and a nested p:with-option element for dynamic values, as in:
    <p:store omit-xml-declaration="false">
     <p:with-option name="href" select="concat($builddir, $namein)"/>
    </p:store>
  • Apply a sequence of operations to an XML document with a list of steps that identify neither input nor output. Each step then reads the primary output of the step before it.
  • Reuse the output from a step as input for multiple steps (for instance, to generate different formats from a single XML document) by referring to the step that produces the output. Don’t bother changing the default “result” and “source” names for the output and input ports of the producer and consumer. Instead:
    1. Set the @name attribute on the step that produces the XML.
    2. Supply p:input and p:pipe subelements in the steps that consume the XML, as in:
    <p:wrap-sequence name="tests" wrapper="test-files"/>
    
    <p:xslt name="expected">
      <p:input port="source">
        <p:pipe step="tests" port="result"/>
      </p:input>
      ...
    </p:xslt>

    In short, XProc has a pull rather than a push polarity: the consumer names the producer it reads from.

  • Save modified XML in an intermediate file (similar to a tee in a UNIX / Linux pipe) by using the same technique with the p:store element as one consumer and the next real pipeline step as a separate consumer, as in:
    <p:xslt name="expected">
      ...
    </p:xslt>
    
    <p:store>
      <p:with-option name="href" select="'xstag2xsl.xspec'"/>
    </p:store>
    
    <xpc:xspec>
      <p:input port="source">
        <p:pipe step="expected" port="result"/>
      </p:input>
    </xpc:xspec>
  • Process the files in a directory with:
    1. A p:directory-list step with a regular expression to match the files. The file list gets wrapped up as a single XML document (using a built-in XProc vocabulary with c:directory and c:file elements) and passed along the pipeline.
    2. A p:for-each step with a p:iteration-source subelement to iterate over the child elements identifying each file.
    3. A p:load step to get the file contents, as in:
    <p:directory-list include-filter=".*\.xspec$">
      <p:with-option name="path" select="$testdir"/>
    </p:directory-list>
    <p:for-each>
      <p:iteration-source select="/c:directory/c:file"/>
    
      <p:variable name="namein" select="/c:file/@name"/>
    
      <p:load>
        <p:with-option name="href" select="concat($testdir, $namein)"/>
      </p:load>
    
      ...
    </p:for-each>
  • Reuse the file list after you’re done loading and processing the files by connecting the output of p:for-each to the step inside the loop that holds the file entry:
    <p:for-each>
      <p:output port="result">
        <p:pipe step="testname" port="result"/>
      </p:output>
    
      <p:iteration-source select="/c:directory/c:file"/>
    
      <p:variable name="namein" select="/c:file/@name"/>
    
      <p:identity name="testname"/>
    
      ...
    
      <p:store>
        <p:with-option name="href" select="concat($resultdir, $namein)"/>
      </p:store>
    </p:for-each>
    <p:wrap-sequence name="tests" wrapper="test-files"/>

    That is, the p:for-each step can specify which subordinate step generates its output.

  • Supply simple transforms inline within the pipeline:
    <p:xslt>
      <p:input port="stylesheet">
        <p:inline>
          <xsl:stylesheet version="2.0">
            ...
          </xsl:stylesheet>
        </p:inline>
      </p:input>
    </p:xslt>
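
Several of the items above (the top element, the parameters port, steps chained without explicit inputs, and inline XSLT) can be pulled together into one small pipeline. The following is a minimal sketch, not the pipeline from this project: the p:delete step and the identity stylesheet are placeholders chosen purely for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0" name="main">

  <!-- Ports the pipeline declares for itself -->
  <p:input port="source"/>
  <!-- Parameters port that p:xslt binds to by default -->
  <p:input port="parameters" kind="parameter" primary="true"/>
  <p:output port="result"/>

  <!-- No p:input: reads the pipeline's primary "source" port.
       (Placeholder step for illustration only.) -->
  <p:delete match="comment()"/>

  <!-- No p:input for "source": reads the primary output of p:delete -->
  <p:xslt>
    <p:input port="stylesheet">
      <p:inline>
        <xsl:stylesheet version="2.0">
          <!-- Placeholder identity transform: copies everything unchanged -->
          <xsl:template match="@*|node()">
            <xsl:copy>
              <xsl:apply-templates select="@*|node()"/>
            </xsl:copy>
          </xsl:template>
        </xsl:stylesheet>
      </p:inline>
    </p:input>
  </p:xslt>

</p:declare-step>
```

Because the p:output port has no explicit connection, it picks up the primary output of the last step in the pipeline, here the result of p:xslt.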

Initial reactions and speculations: I do miss some of Ant’s features — in particular, managing dependencies and skipping steps by checking timestamps on intermediate files. (Also, I haven’t checked whether xmltask improves the Ant story.) On the other hand, applying operations to XML in memory instead of applying tools to XML files is most beneficial, and inline XSLT is the bee’s knees.

Inline XSLT is so convenient that it cries out for co-mingling XProc and XSLT. Instead of using the fine-grained XML manipulation steps, it might be more straightforward for adopters to apply an XSLT template as a single pipeline step (where the template takes its current node from the step input and converts its result to the step output).

More importantly, would complex pipelines be more maintainable as a kind of state machine? After all, a pipeline step can be considered a state transition. If so, SCXML offers one model for state choreography. But so does XSLT — the combination of the current node and the mode is a state that transitions through apply operations.

In other words, the entire pipeline could be re-conceived as a single complex transform of a collection of XML sources to a collection of outputs. An XProc pipeline could be re-conceived as applying to a particular kind of XML input under specific circumstances — in short, a special kind of template with an XProc implementation that:

  • Executes when the template pattern matches the current XML node in the specified mode.
  • Takes primary input from the matched node.
  • Returns its primary output as the template result.
  • Can apply other templates as a pipeline step (as sketched above but including other XProc as well as XSLT templates).

Easy to sketch, of course; a bear to solve all the state and streaming challenges. Still, wouldn’t it be something to use XSLT as a pipeline engine?
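
A small part of this idea already works in plain XSLT 2.0: modes can act as pipeline stages, with a variable holding the intermediate document in memory. The sketch below only illustrates that technique and is not a proposal for the XProc-in-XSLT design. The mode names and the processed attribute are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">

  <!-- Entry point: run stage one, keep its result as a temporary tree,
       then hand that tree to stage two -->
  <xsl:template match="/">
    <xsl:variable name="stage1">
      <xsl:apply-templates select="node()" mode="strip-comments"/>
    </xsl:variable>
    <xsl:apply-templates select="$stage1/node()" mode="mark-done"/>
  </xsl:template>

  <!-- Stage one: copy everything... -->
  <xsl:template match="@*|node()" mode="strip-comments">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()" mode="strip-comments"/>
    </xsl:copy>
  </xsl:template>
  <!-- ...except comments (explicit priority avoids a conflict with node()) -->
  <xsl:template match="comment()" mode="strip-comments" priority="1"/>

  <!-- Stage two: copy everything... -->
  <xsl:template match="@*|node()" mode="mark-done">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()" mode="mark-done"/>
    </xsl:copy>
  </xsl:template>
  <!-- ...and add a marker attribute to the document element -->
  <xsl:template match="/*" mode="mark-done">
    <xsl:copy>
      <xsl:attribute name="processed">true</xsl:attribute>
      <xsl:apply-templates select="@*|node()" mode="mark-done"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Each mode plays the role of one pipeline step, and the apply-templates call that switches modes is the transition between them. What this leaves out is exactly the hard part: multiple ports, sequences of documents, and streaming.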


1 Response to Getting started with XProc

  1. Romain Deltour says:

    Interesting post!

    Note that there is actually an (experimental) XSLT-based XProc implementation, called Half-Pipe and hosted on Google Code.
