Interoperability
of XProc Pipelines

A real-world publishing scenario

…or how we eliminated cx:depends-on="ndw"

Achim Berndzen, <xml-project /> (@xml_project)

Gerrit Imsieke (@gimsieke),
le-tex publishing services GmbH (@letexml)

Background:

Joint project of le-tex publishing services and <xml-project />

Task:

  • Migrate a complex XProc pipeline system to another processor
  • Keep it runnable on both XProc processors
  • Ensure maintainability for pipeline developers

A test case for XProc’s interoperability

Is it possible to run the same real-world pipelines on two processors and produce the same results?

Testing interoperability is an aspect of W3C’s process, but:

  • Real-world pipelines are more complex than test suite use cases.
  • Authors of these will use every feature offered, standard or not.
  • XMLCalabash was the only serious choice for production environments.

Who cares (or should care) about interoperability?

Pipeline users: Freedom of choice
Pipeline authors: Write once, use everywhere
Technology decision makers: No vendor lock-in, protection of investment
XProc community: Approval of the work done, hints at future tasks

Our real-world publishing scenario

Diagram: a complex pipeline system runs from Start to Finish on XProc A. Can it also run on XProc B?

Our real-world publishing scenario

Diagram: transpect runs from Start to Finish on XMLCalabash. Can it also run on MorganaXProc?

Our real-world publishing scenario: docx2jats

  • Pipeline that converts .docx files to JATS XML and EPUB
  • As used for the XML Prague 2016 transpect tutorial (repo on GitHub)
  • Based on the transpect framework
  • 76 distinct steps
    • of which 28 belong to p:
    • 2 to pxp: and pos: (EXProc)
    • 2 to cx: (Calabash)
    • 39 to transpect libraries
    • 3 are façade steps for extensions written in Java
    • and 2 are project-specific

transpectdoc-generated project documentation for docx2jats

transpect: Conversion & Checking Framework

Functionality:

  • docx → Hub XML
  • IDML → Hub XML
  • flat Hub → structured Hub (≈ DocBook)
  • xlsx → XHTML
  • XML → IDML
  • Hub XML → docx
  • XHTML → EPUB
  • EPUB → Hub XML
  • Hub XML → TEI
  • Hub XML → JATS/BITS
  • … (more on GitHub)

Technology:

  • XProc
  • XSLT 2.0
  • Relax NG
  • Schematron

Methodology:

  • Cascaded configuration
  • HTML reports
  • Canonical URIs and XML Catalogs for resource import
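
To illustrate the last point, here is a minimal sketch of how a canonical URI can be mapped to a local checkout with an OASIS XML Catalog. The catalog location, the relative path and the imported file name are assumptions chosen for illustration; they are not taken from the docx2jats repository.

```xml
<!-- Hypothetical catalog file, assumed to live in docx2jats-demo/xmlcatalog/ -->
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <!-- Any URI starting with the canonical prefix is rewritten to the
       local xproc-util directory, relative to this catalog file. -->
  <rewriteURI uriStartString="http://transpect.io/xproc-util/"
              rewritePrefix="../xproc-util/"/>
  <!-- A pipeline could then import a library by its canonical URI, e.g.
       <p:import href="http://transpect.io/xproc-util/some-step/xpl/some-step.xpl"/>
       (the path after the prefix is a placeholder), and the processor's
       catalog resolver would load the local copy instead. -->
</catalog>
```

With this indirection, pipelines never refer to file system paths directly, so the same import statements work wherever the catalog is adjusted to the local layout.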

transpect: Cascaded Configuration

Caption:

XProc runtime

transpect library

project-specific (a9s = adaptations)

docx2jats-demo
├── a9s
│   ├── common
│   │   ├── css
│   │   ├── epubtools
│   │   ├── evolve-hub
│   │   ├── hub2jats
│   │   ├── jats2html
│   │   ├── schematron
│   │   │   ├── evolve-hub
│   │   │   ├── flat
│   │   │   ├── jats
│   │   │   └── styles
│   │   ├── styles
│   │   ├── xpl
│   │   └── xsl
│   └── Imprint1
│       └── Journal1
│           ├── schematron
│           │   └── jats
│           └── styles

├── calabash
│   ├── distro
│   └── extensions
│       └── transpect
├── cascade
├── conf
├── css-tools
├── docx2hub
├── epubcheck-idpf
├── epubtools
├── evolve-hub
├── htmlreports
├── hub2html
├── hub2jats
├── jats2html
├── map-style-names
├── nlm-stylechecker
├── xmlcatalog
├── xproc-util
└── xslt-util

Dynamically evaluate (sub-) pipelines
that are loaded from the cascade (cx:eval)

<tr:xslt mode="hub:split-at-tab"/>
<tr:xslt mode="hub:preprocess-hierarchy"/>
<tr:xslt mode="hub:hierarchy"/>
<tr:xslt mode="hub:postprocess-hierarchy"/>
<hub:evolve-hub_lists-by-indent/>
<tr:xslt-mode mode="hub:clean-hub"/>
<tr:xslt mode="hub:split-at-tab"/>
<tr:xslt mode="hub:preprocess-hierarchy"/>
<tr:xslt mode="hub:hierarchy"/>
<tr:xslt mode="hub:postprocess-hierarchy"/>
<hub:evolve-hub_lists-by-style-name/>
<tr:xslt-mode mode="hub:clean-hub"/>

docx2jats-demo
├── a9s
│   ├── common
│   │   └── evolve-hub
│   │      ├── driver.xpl
│   │      └── driver.xsl
│   └── Imprint1
│       ├── evolve-hub
│       │   └── driver.xsl
│       └── Journal1
│           └── evolve-hub
│               ├── driver.xpl
│               └── driver.xsl

transpect: HTML reports

(see XML Amsterdam 2015 presentation)

<xproc-config xmlns="http://xmlcalabash.com/ns/configuration"
  xmlns:tr="http://transpect.io">
  <implementation type="tr:unzip" class-name="UnZip"/>
  <implementation type="tr:validate-with-rng" class-name="LtxValidateWithRNG"/>
  <implementation type="tr:image-identify" class-name="ImageIdentify"/>
  <implementation type="tr:image-transform" class-name="ImageTransform"/>
</xproc-config>

<tr:image-identify name="ii">
  <p:with-option name="href" 
    select="/epub-config/cover/@href"/>
</tr:image-identify>

<c:results name="narrow-cover.png">
  <c:result name="mimetype" value="image/png"/>
  <c:result name="formatdescription" value="PNG Portable Network Graphics"/>
  <c:result name="formatdetails" value="Png"/>
  <c:result name="width" value="219px"/>
  <c:result name="height" value="131px"/>
  <c:result name="density" value="90dpi"/>
  <c:result name="colorspace" value="RGB"/>
  <c:result name="transparency" value="true"/>
  <c:result name="compressionalgorithm" value="PNG Filter"/>
</c:results>

sample HTML report, complaining about insufficient cover pixel width

transpect: Calabash Extensions

HTML reports use Calabash extensions, written in Java:

  • jing with error location XPaths (for display at error location)
  • read image properties (for Schematron checks)

Other Calabash extensions:

  • unzip whole archives to disk (docx, IDML, EPUB, …)
  • transform images

XMLCalabash – MorganaXProc: Fact sheet

XMLCalabash

  • Developed by: Norman Walsh – Chair of W3C WG
  • Since: 2008. Reference implementation for W3C recommendation
  • Recent version: 1.1.9 (more than 80 releases)
  • Test suite score: 100%
  • Highlights:
    • Very reliable: long history. Built on top of Saxon
    • Large library of extension steps
    • Integrated web server for XProc
    • and many more
  • Licence: GPL 2.0
  • Status: Gold standard

MorganaXProc

  • Developed by: <xml-project /> – German start up
  • Since: Publicly available since early 2014
  • Recent version: 0.95–10 (11th release – public beta)
  • Test suite score: 99.67% (Three tested optional features not implemented)
  • Highlights:
    • Pluggable support for XQuery & XSLT processors
    • Adaptable file system
    • Graphical user interface
    • and some more
  • Licence: GPL 2.0
  • Status: New kid on the block

Possible obstacles to migrating XProc pipelines:

XProc spec:

  • Implementation-defined features in the recommendation, e.g. required vs. optional steps (47 items)
  • Implementation-dependent features (21 items)
  • Processor-specific steps and author-defined steps in a secondary language
  • Extension steps that exist only as sketched proposals on EXProc.org
  • Problems inherited from the underlying technologies in façade steps

What obstacles to expect?

For our concrete migration project:

  • Resource management of XProc pipeline systems (XML Catalog or EXPath packaging system)
  • User-written extension steps in a secondary language
  • Differences in the supported step library: <cx:message/> and <cx:eval/>
  • Use of the extension attribute "depends-on"

A short note on "depends-on"

The evaluation order of steps not connected to one another is implementation-dependent.

XProc: An XML Pipeline Language (11 May 2010)

But, what if:

<nasp:log-in-to-a-web-service/>
<nasp:send-data-to-this-service/>

Solution: extension attribute 'depends-on'

  • Introduced by XMLCalabash (Norm Walsh)
  • Lists the steps that must be executed before this step
  • Very useful, but not standard!
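
As a rough sketch of how the attribute is used, the two steps from the example above can be ordered explicitly. The nasp: namespace URI and the library file name are placeholders invented for this illustration; only the cx: namespace and the cx:depends-on attribute come from XMLCalabash.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:cx="http://xmlcalabash.com/ns/extensions"
                xmlns:nasp="http://example.org/ns/nasp"
                version="1.0">
  <!-- Hypothetical library assumed to declare both nasp: steps -->
  <p:import href="nasp-library.xpl"/>

  <!-- No port connects the two steps, so without further hints
       their evaluation order is implementation-dependent. -->
  <nasp:log-in-to-a-web-service name="login"/>

  <!-- cx:depends-on names the step(s) that must have run first.
       It is an XMLCalabash extension, not part of the XProc standard. -->
  <nasp:send-data-to-this-service cx:depends-on="login"/>
</p:declare-step>
```

A processor that does not know the cx: extension attribute is free to ignore it, which is exactly why this idiom becomes an interoperability problem.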

Up to the lab: Problems found

  • No standard way to import (EXProc) libraries
  • Different behaviour of standard steps: <p:store/> etc.
  • Different interpretations of the specs:
    • Stylesheets without primary result document in <p:xslt/>
    • Trying to get a non-existent file in <p:http-request/>
    • Trying to create an existing folder with <pxf:mkdir/>
  • Different rules for namespace binding in underlying object models (xmlns:d='dummy')
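
The namespace issue in the last bullet can be illustrated with a small sketch. Namespaces in XML deprecates relative references such as "dummy" as namespace names, so object models may treat such a binding differently. Binding the prefix to an absolute URI avoids depending on that leniency. The example.org URI and the wrapper element name below are placeholders.

```xml
<!-- Before (problematic): xmlns:d="dummy" -->
<!-- After: the prefix is bound to an absolute URI (placeholder value) -->
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:d="http://example.org/ns/dummy"
                version="1.0">
  <p:input port="source"/>
  <p:output port="result"/>

  <!-- Wrap the document element in an element from the d: namespace -->
  <p:wrap match="/*" wrapper="d:wrapper"/>
</p:declare-step>
```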

The agenda for migration: Problems and solutions

Problem: Resource management
Solution: Add XML Catalog support to MorganaXProc

Problem: Divergent interpretation of <p:store/> etc.
Solution: Adapt MorganaXProc to XML Calabash’s interpretation

Problem: User-defined steps in a secondary language
Solution: Rewrite steps for MorganaXProc

Problems:
  • Namespace declarations without proper URIs
  • Missing primary result for <p:xslt/>
Solution: Change pipelines by hand

Problems:
  • Divergent message and eval steps
  • Different EXProc lib import mechanism
  • Implementation-specific attribute "depends-on"
  • Different behaviour in <pxf:mkdir/>
  • Different errors for "file://" in <p:http-request/>
Solution: XProc to the rescue

XProc to the rescue

Basic idea:
Use XProc pipelines to ensure the interoperability of XProc pipelines.

Two pipelines needed:

  • Bridge the namespace gap, adapt different behaviour: xproc-iop.xpl
  • Make pipelines interoperable: interoperator.xpl

XProc to the rescue - step 1: xproc-iop.xpl

  • Bridge the namespace gap
  • Adapt different behaviour
<p:declare-step type="iop:message">
  <p:input port="source" sequence="true" />
  <p:output port="result" sequence="true" />
  <p:option name="message" required="true" />
  <cx:message p:use-when=
    "p:system-property('p:product-name') = 'XML Calabash'">
    <p:with-option name="message" select="$message" />
  </cx:message>
  <mod:report p:use-when=
    "p:system-property('p:product-name') = 'MorganaXProc'">
    <p:with-option name="message" select="$message" />
  </mod:report>
</p:declare-step>
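
A pipeline can then call the wrapper instead of a processor-specific step. The following is a minimal usage sketch. It assumes that xproc-iop.xpl sits next to the calling pipeline and that the iop: prefix is bound to the same namespace URI as in that library; the example.org URI shown here is only a placeholder.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:iop="http://example.org/ns/iop"
                version="1.0">
  <p:input port="source" sequence="true"/>
  <p:output port="result" sequence="true"/>

  <!-- Assumed location of the interoperability library -->
  <p:import href="xproc-iop.xpl"/>

  <!-- Passes its input through unchanged and emits the message via
       whichever processor-specific step survived p:use-when. -->
  <iop:message message="Starting conversion"/>
</p:declare-step>
```

Because the processor check happens inside the library, the calling pipeline contains no cx: or mod: elements at all.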

XProc to the rescue - step 2: interoperator.xpl

  • A pipeline to make pipelines interoperable.
  • Imports are recursively resolved.
  • Call once, be interoperable once and for all!
Conditional import of Calabash libraries
Add mox:depends-on to steps with cx:depends-on
Rename <cx:message/> to <iop:message/>
Rename <cx:eval/> to <iop:eval/>
Move all EXProc steps to EXProc-namespaces
Import EXProc libraries if needed
. . .
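
Some of these rewrites can be sketched with standard XProc 1.0 steps alone. The pipeline below is an illustration of the idea, not the actual interoperator.xpl: it takes a pipeline document as input, renames the two Calabash steps and copies every cx:depends-on value into an additional mox:depends-on attribute. The iop: and mox: namespace URIs are placeholders and would have to be replaced with the real ones.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:cx="http://xmlcalabash.com/ns/extensions"
                xmlns:iop="http://example.org/ns/iop"
                xmlns:mox="http://example.org/ns/mox"
                version="1.0">
  <!-- The pipeline to be made interoperable -->
  <p:input port="source"/>
  <p:output port="result"/>

  <!-- Replace processor-specific steps with their iop: wrappers -->
  <p:rename match="cx:message" new-name="iop:message"/>
  <p:rename match="cx:eval" new-name="iop:eval"/>

  <!-- For every element carrying cx:depends-on, add mox:depends-on
       with the same value; the label expression is evaluated with
       the matched element as context. -->
  <p:label-elements match="*[@cx:depends-on]"
                    attribute="mox:depends-on"
                    label="@cx:depends-on"/>
</p:declare-step>
```

The original cx:depends-on attribute is kept, so the rewritten pipeline still runs on XMLCalabash. Resolving imports recursively and handling the EXProc libraries would need additional logic that is not shown here.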

XProc to the rescue

Publicly available on GitHub: https://github.com/xml-project

  • Make your XProc projects interoperable
  • Help to discover and overcome new obstacles
  • Help to document XProc’s state of interoperability

Available today!

Conclusions: What lessons to learn?

Is it possible to run the same real-world pipelines on two processors and produce the same results?
Pipeline users: It is easy to make even complex pipelines interoperable. Just use our two pipelines.
Pipeline authors:
  • KISS: Keep It Standard, Stupid!
  • Mind the obstacles we have discussed!
Technology decision makers: Do not worry about vendor lock-in or reusability of pipelines: XProc is a great tool!
XProc community: The Working Group did a really great job!

But:

We are not completely finished with making XProc a fully useful and interoperable language.

Thank you!

http://tiny.cc/xprocinterop