transpect Setup Manual


Table of Contents

About le-tex transpect
About this Manual
1. Concepts
The Configuration Cascade: Content in Clades
Hierarchical Organization vs. Content Tagging
Inheritance by xsl:import
Dynamic Assembly
Clades
How a Clade is Selected for Given Content
Example: Unionsverlag
2. System requirements
3. Setting up a transpect project
4. Examples
Conversion of DOCX into EPUB
Conversion of DOCX into IDML
Conversion of IDML into HUB into EPUB
Conversion of IDML into TEI into EPUB with ONIX metadata
5. Transpect Modules
calabash
6. Help
Common mistakes when setting up
I get I/O error reported by XML parser processing http://[…]
I get load-cascaded: no file available, evolve-hub/driver.xpl
I get SEVERE: If sequence is not specified on p:output, or has the value false, then it is a dynamic error if the step does not produce exactly one document on the declared port.
FAQ
Explain: http://transpect.le-tex.de/book-conversion/converter versus http://customers.le-tex.de/generic/book-conversion

List of Figures

1.1. Configuration directory layout for Unionsverlag

transpect is a collection of modules for converting and checking XML-based file formats, including XML itself and (X)HTML. Most of the modules (see http://www.le-tex.de/en/transpect.html#transpect-modules) are implemented in the programming languages XSLT 2.0 and XProc 1.0.

There are two overarching concepts or methodologies that turn this collection of conversion modules into a framework:

Configuration cascade

Default transformation and checking rules (XSLT, Schematron, CSS, …) may be superseded by more specific rules. These rules are specified according to the group of content that the input belongs to, for example per imprint, per series, or per work.

HTML report

Checks against Schematron or Relax NG schemas may be performed after each conversion step. Their results will uniformly be visualized at the error locations in an HTML rendering of the document.

The terms imprint, series, and work clearly stem from book publishing, and capturing the commonalities and disparities within a publisher’s book production is what transpect’s configuration cascade originally targeted. Since svn revision 2863 of the XProc library load-cascaded.xpl, an arbitrary configuration hierarchy may be specified. The concepts behind this configuration hierarchy are described in the section called “The Configuration Cascade: Content in Clades”.

The transpect modules are open-source software (2-clause BSD license). They reside in Subversion repositories, namely https://subversion.le-tex.de/common/, https://subversion.le-tex.de/docxtools/, and https://subversion.le-tex.de/idmltools/trunk/. Technically, transpect modules are assembled into a project by creating a path anywhere in a Subversion repository, importing the required transpect modules by means of an svn:externals property, importing the modules’ XML catalogs from the project’s catalog, and adding some glue code such as the project’s front-end XProc pipeline(s).
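As a rough illustration of the catalog part, a project catalog might look like the following sketch. The relative paths and the choice of modules are hypothetical; only the element and attribute names (nextCatalog, rewriteURI) are taken from the OASIS XML Catalogs vocabulary.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <!-- Hand over to the catalogs of modules attached as externals (hypothetical paths) -->
  <nextCatalog catalog="../docx2hub/xmlcatalog/catalog.xml"/>
  <nextCatalog catalog="../evolve-hub/xmlcatalog/catalog.xml"/>
  <!-- Map the project's canonical URI prefix to the local working copy (assumed layout) -->
  <rewriteURI uriStartString="http://customers.le-tex.de/generic/book-conversion/"
              rewritePrefix="../"/>
</catalog>
```

With such a catalog in place, canonical URIs in import statements resolve to files in the checked-out project, regardless of where the working copy lives.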

If the tasks at hand or the configurations are not too disparate, there is typically just a single transpect project per publisher. It may well accommodate both a pipeline for creating XML and EPUB from IDML input and another pipeline for synthesizing IDML from docx manuscripts, for example. In any case, it should not be necessary to create separate projects for the sole purpose of providing different conversion/checking settings because that’s what the cascade is for.

transpect is a conversion and checking framework. It does not provide services such as user and role administration, workflow management, or content management. There is currently no publicly available graphical UI to run arbitrary pipelines.

Some of the modules may be used as standalone converters. They typically run configuration-free. However, they may not run by themselves because they often depend on smaller libraries, not to mention on the Calabash processor. In order to make them available standalone, a minimal front-end setup is provided for each of them. Consider the module docx2hub. This module depends on utility XSLT stylesheets, XProc libraries, and on the ready-to-run Calabash XProc processor external with some le-tex extensions. It might attach them directly as SVN externals, but if every module that depends on other modules or utilities attached them as externals, a lot of duplicate externals would be present in a transpect project. Therefore the individual modules need front-end projects even if they are only supposed to perform the single task that they are made for. These front-end projects, such as https://subversion.le-tex.de/docxtools/trunk/docx2hub_frontend/ for docx→Hub conversion, attach the core docx2hub module along with Calabash and other externalized utilities.

Once configured, a transpect project should run on every platform with Java 1.6 or newer. Including the Calabash external with some preconfigured extension steps serves as a runtime library so that no other dependencies have to be satisfied. Only some dedicated tools, such as LaTeX or kindlegen, are invoked via p:exec XProc steps.

Depending on the size of the input documents, a transpect pipeline may consume a lot of RAM. The expanded XML of Word or InDesign documents that comprise several hundred pages may easily reach a size of 50 MB or more when serialized. The internal representation is of a similar order of magnitude, and there are dozens of XSLT passes. So it may be necessary to allocate as much as 4 GB of RAM to Calabash. The default is 1 GB.

This manual is a step-by-step guide to setting up a transpect project. There are some example projects (see Chapter 4, Examples) which convert the transpect white paper, for example DOCX→XML→EPUB. The sample projects may be downloaded from https://subversion.le-tex.de/common/transpect-demo/. This manual gives instructions on how to set up the code part (trunk subdirectory) from scratch.

Please note that this manual is work in progress. We’ve been busy setting up transpect projects for customers (4 projects so far, plus internal use), so there’s been little time for documentation.

Note

This documentation pertains to the clade concept that was introduced in revision 2863 of pubcoach, the XProc steps and XSLT transformations that implement the cascaded configuration. Older transpect installations might still rely on the rigid publisher/series/work configuration cascade.

Often different configuration settings are associated with different types of content. For example, one book series has boxes, marginal notes, footnotes and up to five heading levels while the other only has two heading levels and footnotes. Or books of a certain imprint extract their metadata from content files while the larger part of the book production relies on an ONIX dump. A more fundamental distinction runs between journals and books.

The transpect configuration cascade assumes that there is a fundamental setup, maybe for a whole group of publishers. Below that, there are overrides for the individual imprints, below that a distinction into books and journals, etc. An obvious disadvantage of this approach is that it is somewhat arbitrary where to insert a configuration override level, and in which order. It might be wiser to distinguish between books and journals first before dealing with the different imprints. Another distinction may be the input type. If a series has both Word and InDesign input, should one divide it into virtual subseries for each input type? Or should one distinguish between input types on the highest level and then replicate the imprint/series hierarchy below each input type? What if a book that was scheduled to be typeset in Word moves to an InDesign production line? Does the whole project then have to relocate in the content hierarchy?

These thoughts illustrate that it is often impossible to find the single organizational hierarchy for all the content. There are orthogonal categories such as input type, language, layout type, etc. Although these categories often coincide with the more or less arbitrary content hierarchy, they do not necessarily do so.

The question is why one should strive to organize the content in a single hierarchy. Another approach is to flexibly attach tags to each content item, such as “English”, “InDesign”, “A5 default layout”.

An important factor in favor of the hierarchical approach is the xsl:import instruction. At the core of transpect, there are inevitably some XSLT transformations. It is conceivable that transpect stores special building blocks with transformation rules for processing English-language, InDesign-input and A5-layout content (provided that there is a need at all for treating these properties individually). The fundamental XSLT then has to be enhanced in such a way that it also includes the overrides.

There are some issues with this approach, mostly because it is not in line with XSLT’s own customization method. Firstly, the XSLT that is finally applied to the content is not a static file; it must be generated. Although this is absolutely feasible, it is better avoided in order to lower the complexity and to ease debugging. Secondly, if the generated code is a wrapper that includes the fundamental code and the overrides, there will be priority conflicts between fundamental and override templates, which lead to warnings, or function redeclarations, which lead to errors. If the compound code uses xsl:import instead of xsl:include, these issues will disappear. It may not, however, be implemented in such a way that the fundamental code is enhanced with import statements, because the importing code always wins, no matter what the priority of imported templates is. So one would need to generate a new wrapper that first imports the fundamental code and then the stylesheets that implement the special code. Global rules must be established as to which special code should have the highest precedence in case there are patterns that match the same nodes. This code must be imported last, according to the XSLT import precedence rules. But what if a combination of English language and A5 layout requires a different template (for example, for running heads) than English language and A4 layout, or German and A5? Then the categories are not perfectly orthogonal any more; they are somewhat entangled.
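The generated wrapper discussed above might look like the following sketch. The file names fundamental.xsl, lang-en.xsl and layout-a5.xsl are hypothetical. The only point illustrated is that, among several xsl:import declarations, a later one has higher import precedence than an earlier one.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Hypothetical generated wrapper. Import precedence rises from top to bottom. -->
  <!-- Lowest precedence: the fundamental rules -->
  <xsl:import href="fundamental.xsl"/>
  <!-- Overrides for English-language content -->
  <xsl:import href="lang-en.xsl"/>
  <!-- Highest precedence among the imports: A5 layout rules win over both files above -->
  <xsl:import href="layout-a5.xsl"/>
  <!-- Anything declared in the wrapper itself would override all three imports -->
</xsl:stylesheet>
```

If lang-en.xsl and layout-a5.xsl both contain a template for running heads, the one in layout-a5.xsl is used. A combination-specific rule (English plus A5) has no natural place in this scheme.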

We try to overcome these issues by having an import cascade where the most specific stylesheet is chosen (more on how transpect selects it below). This stylesheet then imports the next-specific one, and so on, until finally the fundamental stylesheet is imported. This approach has its drawbacks exactly when cross-cutting categories are involved. Consider localization of generated content as an example. Imagine that it’s not just a linear string-for-string translation. For example, the English version reads “[for details see] [Chapter] [1]”, while the German reads “[siehe] [Kapitel] [1] [für Details]”. This kind of flexible localization may best be achieved in XSLT by supplying a default template in English that will construct the whole phrase, filling in “Chapter” and the number by calling other templates. Depending on the content’s language, either the English default or the German wrapper should be used. Suppose that the German wrapper is the same for all content, so it will be placed in a directory for the common configuration. But then every customization level must also provide a localized wrapper that imports the default-language stylesheet of this level and then the common localization stylesheet. This is all feasible, but if there are many combinations of orthogonal features, there needs to be a wrapper for each possible combination on each customization level. This is to illustrate that it is not always possible to organize content or configuration strictly hierarchically, and that dynamic assembly of stylesheets might at some point become inevitable. Stylesheets might indeed be assembled dynamically, but only with some orchestrating code such as XProc. Since we are using XProc, it is absolutely feasible (see next section), but we try to avoid it for reasons given above, at least for XSLT stylesheets.
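A stylesheet in such an import cascade could look like the following minimal sketch. The publisher name acme, the element name sidebar and the mode name are made up for illustration; the canonical URI prefix follows the pattern used elsewhere in this manual.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Hypothetical adaptions/acme/standard/evolve-hub/driver.xsl -->
  <!-- Import the next-specific level; that file in turn imports the common driver -->
  <xsl:import href="http://customers.le-tex.de/generic/book-conversion/adaptions/acme/evolve-hub/driver.xsl"/>

  <!-- Override only what differs on this level (element and mode names are made up) -->
  <xsl:template match="sidebar" mode="example-mode">
    <!-- Dissolve the box on this level but keep processing its contents -->
    <xsl:apply-templates mode="#current"/>
  </xsl:template>
</xsl:stylesheet>
```

Each level is a static file that can be inspected and debugged on its own, which is the main reason for preferring this layout over generated wrappers.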

For Schematron checks and for the merging of HTML with metadata, however, we are already using an XSLT-based assembly mechanism that not only selects the most specific file, but builds a compound ruleset from all relevant files that it finds in the cascade.

There is already a mechanism in place that permits dynamic assembly of an XSLT stylesheet. This might in fact be used for decorating an existing stylesheet with an import statement for localization, depending on the desired language.

Every file in the cascade may be replaced by an XSLT stylesheet with the same base name. The main motivation for this dynamic assembly mechanism was the redundancy that we were trying to avoid when maintaining many only slightly modified XProc pipelines.

This mechanism may also be used to share common configurations, for example between sibling journals, without introducing new customization levels. If 5 of 24 journals of a given imprint share a feature where the XML of each converted article must be automatically linked against a patent database, one could in principle introduce another configuration hierarchy level, journal-group. There will be two journal-groups, patent and no-patent. In no-patent, there will be no configuration at all – thus the parent configuration will be applied. For the journal-group called patent, there will be a hub2jats/hub2jats_driver.xpl.xsl that will patch the parent configuration’s XProc to invoke the additional XSLT pass (or XQuery script) that inserts the links to the patent DB. See Customizing XProc for a discussion of this approach.

Now consider that 2 of the 5 patent journals are produced in a two-column layout that they share with 13 other journals. The remaining 14 journals use a single-column layout. The processing will be different at a relatively early stage of the overall conversion process, evolve-hub. Let’s assume the XProc in this macroscopic step is the same for all journals and the only differences are in the XSLT. The XSLT that applies to all journals will contain the rules for one of the layouts. One of the 2-column journals will get an overriding evolve-hub/driver.xsl that imports the journal XSLT and overrides some templates/functions/variables in order to implement transformation rules for the 2-column layout. This XSLT may now serve as the master for all other 2-column journals. All other 2-column journals may then feature an identical, rather trivial evolve-hub/driver.xsl.xsl stylesheet that simply loads the master journal’s XSLT and reproduces it identically.
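Such a trivial driver.xsl.xsl could be sketched as follows. Both the master URI and the assumption that the generating stylesheet is applied to some source document, whose content is ignored here, are illustrative guesses and not taken from the transpect code.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Hypothetical adaptions/acme/journal-b/evolve-hub/driver.xsl.xsl -->
  <!-- Assumed URI of the master 2-column journal's stylesheet -->
  <xsl:param name="master-uri"
    select="'http://customers.le-tex.de/generic/book-conversion/adaptions/acme/journal-a/evolve-hub/driver.xsl'"/>

  <!-- Whatever the source document is, emit an identical copy of the master stylesheet -->
  <xsl:template match="/">
    <xsl:copy-of select="doc($master-uri)"/>
  </xsl:template>
</xsl:stylesheet>
```

Note that a copied stylesheet no longer has the master’s base URI, so import statements inside the master are safest when they use canonical (catalog-resolved) URIs rather than relative ones.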

By and large, we think that the hierarchical configuration mechanism, enhanced with this dynamic generation feature, is adequate to model many content zoos.

In phylogenetics, a clade is a group consisting of an ancestral species and all its descendants. Analogously, we call a subtree of common configuration settings a clade. Clades may contain other clades, and they may contain “content” elements – positions in the inheritance hierarchy where content items may be attached. These content items then belong exclusively to that clade.

As the first step in most pipelines, an XProc parameter document (c:param-set) will be calculated. It contains, among other settings, the paths that will be searched for configuration items.

Clade selection is done by an XSLT stylesheet that transforms the configuration file into a parameter document. The XSLT stylesheet takes two parameters as input, file and clades. Both are optional, but if you supply neither of them, the transformation will terminate with an error.

Supplying the string file is sufficient if its base name adheres to the file naming conventions that the function transpect:parse-file-name() implements. For an input of file:/some/path/acme_02651_std.idml, the function might then return a sequence of attributes such as (publisher=acme, production-line=standard, work=02651, ext=idml). The attributes will be passed as tunneled parameters to the transformation of the configuration file. If the configuration file contains a clade with (role=production-line, name=standard) that is a child of a clade with (role=publisher, name=acme) and that has a content child with (role=work, name-regex=\d{5}), the production-line clade that is immediately above the content element will be selected as the matching clade.
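A configuration fragment matching this description might look like the sketch below. It reuses the element and attribute names of the Unionsverlag example later in this chapter; the content base URI for the fictitious publisher acme is an assumption.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<conf xmlns="http://www.le-tex.de/namespace/transpect"
  content-base-uri="http://cms.acme.com/">
  <cascade>
    <clade role="publisher" name="acme">
      <clade role="production-line" name="standard">
        <!-- For publisher=acme, production-line=standard, work=02651, the
             production-line clade directly above this element is the matching clade -->
        <content role="work" name-regex="\d{5}"/>
      </clade>
    </clade>
  </cascade>
</conf>
```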

The matching algorithm is implemented in paths.xsl, which can be imported from its canonical location, http://transpect.le-tex.de/book-conversion/converter/xsl/paths.xsl, if the catalogs are properly set up (see below). The sample implementation of transpect:parse-file-name() in said stylesheet has to be overridden in the importing stylesheet (unless your project has clades with the roles “publisher” and “series”, a content element with the role “work”, and the same file name matching regex as the imported paths.xsl). The customized (importing) stylesheet may be supplied to paths.xpl on the primary input port.
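An importing stylesheet with an overriding file name parser might be sketched like this. The function signature (one string argument, a sequence of attributes as result), the namespace bound to the transpect prefix, and the abbreviation “leg” are assumptions made for this illustration; check them against the paths.xsl that you actually import.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:transpect="http://www.le-tex.de/namespace/transpect"
  exclude-result-prefixes="xs transpect">

  <xsl:import href="http://transpect.le-tex.de/book-conversion/converter/xsl/paths.xsl"/>

  <!-- Assumed signature: one string argument, returns a sequence of attributes -->
  <xsl:function name="transpect:parse-file-name" as="attribute(*)*">
    <xsl:param name="filename" as="xs:string?"/>
    <!-- Last path segment, e.g. acme_02651_std.idml -->
    <xsl:variable name="name" select="tokenize($filename, '/')[last()]"/>
    <!-- Base name without extension(s), e.g. acme_02651_std -->
    <xsl:variable name="base" select="replace($name, '\..*$', '')"/>
    <!-- Curly braces are doubled because the regex attribute is an attribute value template -->
    <xsl:analyze-string select="$base" regex="^([a-z]+)_(\d{{5}})_(std|leg)$">
      <xsl:matching-substring>
        <xsl:attribute name="publisher" select="regex-group(1)"/>
        <xsl:attribute name="work" select="regex-group(2)"/>
        <xsl:attribute name="production-line"
          select="if (regex-group(3) eq 'std') then 'standard' else 'legacy'"/>
        <xsl:attribute name="ext" select="substring-after($name, '.')"/>
      </xsl:matching-substring>
    </xsl:analyze-string>
  </xsl:function>
</xsl:stylesheet>
```

Because the function is declared in the importing stylesheet, it has higher import precedence than the sample implementation of the same name and arity in paths.xsl. In this sketch, a file name that does not fit the pattern yields no attributes at all.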

When a matching clade has been found, the paths where to look for configuration files will be calculated. For a given clade, the paths will be the names of all clades along the ancestor-or-self axis (in document order), joined by '/'. The paths are always relative to the adaptions subdirectory of the transpect base directory. For the matching clade in the example above, this would be $transpect/adaptions/acme/standard/. The less specific paths are generated by going up the clade hierarchy and calculating the paths for each clade in the same manner, yielding $transpect/adaptions/acme/. As the least specific path, $transpect/adaptions/common/ will be added.
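The path calculation described above can be illustrated with a few lines of XSLT. This is a simplified sketch, not the code from paths.xsl: the parameter transpect, the mode name and the output elements are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xpath-default-namespace="http://www.le-tex.de/namespace/transpect">

  <!-- Hypothetical parameter: URI of the transpect base directory -->
  <xsl:param name="transpect" select="'file:/home/user/Acme/transpect'"/>

  <!-- Hypothetical mode; apply it to the clade that was found to match -->
  <xsl:template match="clade" mode="search-paths">
    <paths>
      <!-- Most specific first: walk from the matching clade up to the outermost one -->
      <xsl:for-each select="reverse(ancestor-or-self::clade)">
        <path>
          <!-- Names along the ancestor-or-self axis in document order, joined by '/' -->
          <xsl:value-of select="concat($transpect, '/adaptions/',
                                       string-join(ancestor-or-self::clade/@name, '/'), '/')"/>
        </path>
      </xsl:for-each>
      <!-- The least specific path is always the common directory -->
      <path>
        <xsl:value-of select="concat($transpect, '/adaptions/common/')"/>
      </path>
    </paths>
  </xsl:template>
</xsl:stylesheet>
```

For the acme example, the three path elements end in adaptions/acme/standard/, adaptions/acme/ and adaptions/common/, in this order.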

An important concept of transpect is that there may be configuration overrides not only for a class of content (a clade), but for individual content objects. These overrides are expected to reside where the content is, not with the central transpect code. Each clade element (and the conf element) may carry a @content-base-uri attribute that specifies the URI where content is located. This URI is typically a canonical URI such as http://cms.acme.com/ that will be catalog-resolved to where the content is checked out in the local file system. Looking upward from the content element, the next @content-base-uri will be selected and the relative paths will be constructed as above, with the content name added as the last path component. Example: http://cms.acme.com/acme/standard/02651/, or its catalog-resolved version file:/C:/cygwin/home/user/Acme/content/acme/standard/02651/ if http://cms.acme.com/ resolves to file:/C:/cygwin/home/user/Acme/content/. In the resulting XProc parameter document, both the catalog-resolved and the canonical paths will be included for each of the paths, as we will see soon. It should be noted that content elements may include other content elements, for example when a work consists of multiple parts that are expected to be uploaded as single files. For example, the base name of the file above may be acme_02651-01_std, yielding an additional part=01 attribute as output of the custom file name parser.

In addition to the file parameter that goes into paths.xsl, a clades parameter may also be supplied. It is a space- or comma-separated list of name=value pairs, for example production-line=different,publisher=acme_uk. The parameter will be parsed into XML attributes in a straightforward way. In the clade matching algorithm, these attributes take precedence over the attributes parsed from the file name. This enables applications that accept arbitrary input file names and let the user perform the publisher/production-line/… selection by other means (e.g., dropdown lists).
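Parsing the clades string and giving its attributes precedence could be sketched as follows. The functions my:parse-clades and my:merge and their namespace are hypothetical helpers written for this illustration; they are not the functions used in paths.xsl.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:my="http://example.com/my"
  exclude-result-prefixes="xs my">

  <!-- Hypothetical helper: turn 'a=b,c=d' or 'a=b c=d' into attributes a="b" c="d" -->
  <xsl:function name="my:parse-clades" as="attribute(*)*">
    <xsl:param name="clades" as="xs:string?"/>
    <!-- Split on whitespace and/or commas; ignore tokens without '=' -->
    <xsl:for-each select="tokenize($clades, '[\s,]+')[contains(., '=')]">
      <!-- The part before '=' must be a valid attribute name -->
      <xsl:attribute name="{substring-before(., '=')}" select="substring-after(., '=')"/>
    </xsl:for-each>
  </xsl:function>

  <!-- Hypothetical helper: attributes from the clades string win over same-named
       attributes that were parsed from the file name -->
  <xsl:function name="my:merge" as="attribute(*)*">
    <xsl:param name="from-file" as="attribute(*)*"/>
    <xsl:param name="from-clades" as="attribute(*)*"/>
    <xsl:sequence select="$from-file[not(name() = $from-clades/name())], $from-clades"/>
  </xsl:function>
</xsl:stylesheet>
```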

A note on the matching algorithm: The sequence in which the attributes are given does not matter. The input above will also match a configuration document in which the outer clade has the role of production-line and the inner clade has the publisher role.
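For illustration, the same fictitious acme setup with the two roles swapped would look like the fragment below. By the path rule given above, the most specific configuration path would then presumably end in adaptions/standard/acme/ instead of adaptions/acme/standard/.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<conf xmlns="http://www.le-tex.de/namespace/transpect"
  content-base-uri="http://cms.acme.com/">
  <cascade>
    <!-- Outer clade: production line; inner clade: publisher -->
    <clade role="production-line" name="standard">
      <clade role="publisher" name="acme">
        <content role="work" name-regex="\d{5}"/>
      </clade>
    </clade>
  </cascade>
</conf>
```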

A note on file extensions: The function transpect:parse-file-name() accepts file names with or without paths, or URIs. It will strip everything up to and including the last forward slash (backward slashes are not expected here – in fact there is a file name to URI normalization step in place). From that remainder, everything from the first dot onward will be stripped.

By convention, the extension should be put into the attribute ext so that it can be used for calculating the repository location of the file. The extension is the result of the function transpect:ext() that is also overridable. By default, it will yield 'docx.xml' for a file 'file:/foo/bar.docx.xml'.
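The stripping rules for base name and extension can be expressed with standard XPath 2.0 string functions. The two functions below are hypothetical stand-ins that merely reproduce the behavior described above; they are not the transpect implementations.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:my="http://example.com/my"
  exclude-result-prefixes="xs my">

  <!-- 'file:/foo/bar.docx.xml' yields 'bar' -->
  <xsl:function name="my:base-name" as="xs:string">
    <xsl:param name="uri" as="xs:string"/>
    <!-- Drop everything up to and including the last '/', then everything from the first '.' -->
    <xsl:sequence select="replace(replace($uri, '^.*/', ''), '\..*$', '')"/>
  </xsl:function>

  <!-- 'file:/foo/bar.docx.xml' yields 'docx.xml'; a name without a dot yields '' -->
  <xsl:function name="my:ext" as="xs:string">
    <xsl:param name="uri" as="xs:string"/>
    <xsl:sequence select="substring-after(replace($uri, '^.*/', ''), '.')"/>
  </xsl:function>
</xsl:stylesheet>
```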

The subdirectory where a file will be placed if it is committed to revision control usually depends on its file name extension. In order to configure this, there is another overridable function in paths.xsl, transpect:target-subdir(), that takes a content element as its argument and will output the subdirectory (idml, docx, images, …) where a content item will be stored. In the course of the configuration file transformation, all content elements below the matching clade will receive all attributes that have been generated by transpect:parse-file-name(). So if you otherwise don’t need to override transpect:target-subdir(), let your transpect:parse-file-name() function generate an attribute ext and your destination subdirectory will be calculated properly.

Another use for file extensions can be seen in the example below, where the extension serves as a switch for the choice of configuration. A word of caution though: If you have separate overriding configurations for, e.g., 'docx' and 'idml', it may well happen that a file that once bore a 'docx' ending comes back with an 'idml' or an 'xml' ending at a later production stage. The different configurations should therefore only treat input differently during normalization stages (docx→XML, IDML→XML).

As a real-life example for a cascaded configuration, we’ll look at how Unionsverlag organize their configuration – the configuration file and how it translates to XSLT, CSS, … files in a system hierarchy.

This is the annotated configuration file:

<?xml version="1.0" encoding="UTF-8"?>
❶<?xml-model href="http://transpect.le-tex.de/book-conversion/converter/schema/transpect-conf.rng"?>
<?xml-model href="http://transpect.le-tex.de/book-conversion/converter/schema/transpect-conf.rng" 
  schematypens="http://purl.oclc.org/dsdl/schematron"?>
<conf xmlns="http://www.le-tex.de/namespace/transpect"
  content-base-uri="http://unionsverlag.com/content-repo/" 
  ❷paths-xsl-uri="http://customers.le-tex.de/generic/book-conversion/adaptions/unionsverlag/xsl/paths.xsl">
  <cascade>
    ❸<reserved name="css"/>
    <reserved name="epubtools"/>
    <reserved name="evolve-hub"/>
    <reserved name="fonts"/>
    <reserved name="htmlreports"/>
    <reserved name="htmltemplates"/>
    <reserved name="hub2html"/>
    <reserved name="hub2tei"/>
    <reserved name="metadata"/>
    <reserved name="schematron"/>
    <reserved name="styles"/>
    <reserved name="tei2html"/>
    <reserved name="xpl"/>
    <reserved name="xsl"/>
    <clade role="publisher" name="unionsverlag">
      ❹<param name="publisher-prefix-for-style-mapping" value="uv"/>
      <param name="use-css-decorator-classes" value="yes"/>
      <param name="epub-version" value="EPUB3"/>
      <param name="which-htmlreport" value="evolve-hub"/>
      <clade role="production-line" name="legacy">
        <content role="work" ❺name-regex="^\d{5}"/>
      </clade>
      <clade role="production-line" name="standard">
        <content role="work" name-regex="^\d{5}"/>
        <clade role="ext" name="idml" ❻content-base-uri="..">
          <content role="work" name-regex="^\d{5}">
            ❼<content role="chapter" name-regex="^\d{3}[fb]?$"
              content-base-uri=".."/>
          </content>
        </clade>
        <clade role="ext" name="docx" content-base-uri="..">
          <content role="work" name-regex="^\d{5}"/>
        </clade>
      </clade>
    </clade>
  </cascade>
</conf>

❶ It is recommended that you validate the configuration file with Relax NG and Schematron. It can be done most conveniently in oXygen, but we’ll also include validation in the paths calculation step.

❷ The location of an XSLT stylesheet that imports paths.xsl and overrides transpect:parse-file-name() (plus possibly some other functions and variables).

❸ A list of reserved names for the configuration subdirectories. These names must not be used in naming clades.

❹ Each clade may contain parameters that will be passed through, as c:params, to the XProc parameter document. Values of params in more specific clades will win, as expected.

❺ Content elements don’t have a name attribute as clades have. They may optionally have a name-regex attribute. If present, a parsed attribute (e.g., work="20655a") will be matched against this regex. If it does not match, the containing clade does not match. This obviously allows for rejecting work IDs that don’t comply with conventions, but it also allows for filing books with certain work IDs (everything with 4 digits, everything that starts with 2, etc.) with certain imprints.

If you want to accept any potential ID that has been parsed, simply omit the regex. If the parsing function already does the routing into different clades or the rejection of non-compliant content IDs, that is also fine.

The default implementation of transpect:parse-file-name() will use the whole base name as content ID if parsing fails. The parameter document will then contain the file’s directory as s9y1-path and the common directory as s9y2-path.

❻ This attribute on this clade and on its sister clade causes both clades to have the same content base URI as their parent clade. As a consequence, all content that is determined to belong to one of these clades will be stored at the same location (modulo subdirectories that depend on the file extension), while the code can be differentiated for Word and InDesign input, as the clade names imply.

Note that content base URIs can be given either absolutely or relative to their parent elements’ content base URIs.

❼ A content element may well be nested within another content element. This caters to a substructure where a work consists of multiple parts or chapters, for example individual .indd/.idml files that are bundled by an .indb file (or individual TEI .xml files bundled by an XIncluding TEI .xml file, once they are converted). What has been said about the content base URI above applies here, too: The URI '..' causes the chapter files to be stored together with the including files, rather than each in its own subdirectory.

The configuration file does not have to reside at any particular location. We usually store it as conf/transpect-conf.xml. It will often serve as the primary input to transpect pipelines that accept the input file name (docx, idml, …) as an XProc option.

In the file system, the cascaded configuration is stored like this:

Figure 1.1. Configuration directory layout for Unionsverlag


And this is the resulting parameter document:

<?xml version="1.0" encoding="UTF-8"?>
<c:param-set xmlns:c="http://www.w3.org/ns/xproc-step">
   <c:param name="basename" value="UV_STD_20655_00000_DOCX_TestBuch"/>
   <c:param name="debug" value="'no'"/>
   <c:param name="debug-dir-uri"
            value="file:/C:/cygwin/home/gerrit/Unionsverlag/transpect/debug"/>
   <c:param name="epub-version" value="EPUB3"/>
   <c:param name="file"
            ❶value="file:/C:/cygwin/home/gerrit/Unionsverlag/content/unionsverlag/standard/20655/docx/UV_STD_20655_00000_DOCX_TestBuch.docx"/>
   <c:param name="interface-language" value="en"/>
   <c:param name="pipeline" value="unknown"/>
   <c:param name="progress" value="no"/>
   <c:param name="progress-to-stdout" value="no"/>
   <c:param name="publisher-prefix-for-style-mapping" value="uv"/>
   <c:param name="repo-href-canonical"
            ❷value="http://unionsverlag.com/content-repo/unionsverlag/standard/20655/docx/UV_STD_20655_00000_DOCX_TestBuch.docx"/>
   <c:param name="repo-href-local"
            value="file:/C:/cygwin/home/gerrit/Unionsverlag/content/unionsverlag/standard/20655/docx/UV_STD_20655_00000_DOCX_TestBuch.docx"/>
   ❸<c:param name="s9y1" value="20655"/>
   <c:param name="s9y1-path"
            value="file:/C:/cygwin/home/gerrit/Unionsverlag/content/unionsverlag/standard/20655/"/>
   <c:param name="s9y1-path-canonical"
            value="http://unionsverlag.com/content-repo/unionsverlag/standard/20655/"/>
   <c:param name="s9y1-role" value="work"/>
   <c:param name="s9y2" value="docx"/>
   <c:param name="s9y2-path"
            value="file:/C:/cygwin/home/gerrit/Unionsverlag/transpect/adaptions/unionsverlag/standard/docx/"/>
   <c:param name="s9y2-path-canonical"
            value="http://customers.le-tex.de/generic/book-conversion/adaptions/unionsverlag/standard/docx/"/>
   <c:param name="s9y2-role" value="ext"/>
   <c:param name="s9y3" value="standard"/>
   <c:param name="s9y3-path"
            value="file:/C:/cygwin/home/gerrit/Unionsverlag/transpect/adaptions/unionsverlag/standard/"/>
   <c:param name="s9y3-path-canonical"
            value="http://customers.le-tex.de/generic/book-conversion/adaptions/unionsverlag/standard/"/>
   <c:param name="s9y3-role" value="production-line"/>
   <c:param name="s9y4" value="unionsverlag"/>
   <c:param name="s9y4-path"
            value="file:/C:/cygwin/home/gerrit/Unionsverlag/transpect/adaptions/unionsverlag/"/>
   <c:param name="s9y4-path-canonical"
            value="http://customers.le-tex.de/generic/book-conversion/adaptions/unionsverlag/"/>
   <c:param name="s9y4-role" value="publisher"/>
   <c:param name="s9y5-path"
            value="file:/C:/cygwin/home/gerrit/Unionsverlag/transpect/adaptions/common/"/>
   <c:param name="s9y5-path-canonical"
            value="http://customers.le-tex.de/generic/book-conversion/adaptions/common/"/>
   <c:param name="s9y5-role" value="common"/>
   <c:param name="srcpaths" value="yes"/>
   <c:param name="transpect-project-uri" value=""/>
   <c:param name="use-css-decorator-classes" value="yes"/>
   <c:param name="which-htmlreport" value="evolve-hub"/>
</c:param-set>

❶ The input file URI. It coincides with the calculated repo-href-local parameter because it was uploaded from its location in the locally checked-out content repository.

❷ This is the calculated canonical content repository URL of the uploaded file. It is currently not used for any specific purpose, but who knows what it’ll be good for.

❸ Now the interesting part begins. While previously transpect:load-cascaded and its implementing XSLT relied on a sequence of parameters called 'work-path', 'series-path', 'publisher-path', and 'common-path' (in descending specificity), these parameters are now called 's9y1-path', 's9y2-path', and so on, where 's9y' stands for “specificity”. (In contrast to “adaptations”, which we abbreviated by the much rarer “adaptions,” we kept the fully spelled-out word specificity, except that we didn’t spell it out due to its length and complexity. Maybe in a future revision we should replace “adaptions” with “adaptations,” abbreviated as 'a9s'…)

There are additional parameters now so that one knows which role (publisher, work, …), name (unionsverlag, 20655, …) and canonical path are associated with a given specificity. The least specific item is an exception: There is a parameter 's9y5-role' with the value 'common', and also 's9y5-path' and 's9y5-path-canonical', but no parameter 's9y5' as for the more specific items.

It should be clear that the number 5 in 's9y5-…' is not fixed. If another content item is determined to belong to a different clade, the total number of configuration levels could be different. For example, if a TEI XML file 'UV_STD_20655_00000_DOCX_TestBuch.xml' had been uploaded, the matching clade would have been the parent clade (production-line=standard), reducing the total number of levels to 4, that is, s9y4-role=common. Currently, the maximum nesting depth is 9.

This parameter document will be passed to all of transpect’s XProc steps that rely on cascaded configuration (and to some other steps, too – there are some more global settings apart from the cascade handily stowed away in that document), particularly to transpect:load-cascaded itself.

To wrap it up: The XProc parameter document contains the paths where transpect will look for configuration overrides. It is computed anew for each XProc run, by means of a customizable XSLT, from the transpect project’s configuration file. The input file name and optionally a clades selection string serve as parameters to that transformation. The configuration file specifies how content is organized into “clades” and where the content resides, both as local and as canonical repository URIs.



[1] Tagging in the sense of keywords that form “tag clouds”, not in the sense of XML tagging.

Minimum system requirements:

Optional system requirements:

This chapter describes the general setup of a transpect project. You can find examples in Chapter 4, Examples.

A transpect project can consist of different sub-projects, e.g., for imprints or series, or of other sub-projects that belong to one project. For example, the demo project contains several sub-projects for different conversions (see Chapter 4, Examples).

Components of a transpect project:

  • SVN code repository (generally called trunk)

  • SVN content repository (generally called content)

  • SVN externals for the transpect modules (one of the modules is calabash)

  • XML catalogs

  • Configuration file

  • Directory adaptions for project-specific adaptations (XSLT, XProc, Schematron, CSS, …)

  • Makefile (optional)

  • One or more front-end XProc pipelines, typically located in the directory adaptions/common/xpl

Step-by-step setup guide

  1. Creating SVN repositories

    1. Creating an SVN code repository

      1. Create an SVN code repository.

      2. Check out the SVN code repository.

        svn co web_address_of_the_code_repository name_of_the_code_directory

        e.g.

        svn co https://subversion.le-tex.de/common/transpect-demo/trunk/ transpect-demo/trunk
    2. Creating an SVN content repository (if none exists)

      1. Create an SVN content repository.

      2. Check out the SVN content repository.

        svn co web_address_of_the_content_repository name_of_the_content_directory

        e.g.

        svn co https://subversion.le-tex.de/common/transpect-demo/content/ transpect-demo/content

      Generally, the SVN code repository and the SVN content repository are located in the same directory. If the SVN content repository is located in a different directory, follow Step 7 as well.

  2. Creating an oXygen project file (if you use oXygen as editor)

    An oXygen project file is not required, but it makes working with the files of the demo project easier.

    1. Create an oXygen project file in the oXygen editor.

    2. Add it to the SVN code repository (trunk).

      svn add projectname.xpr
    3. Commit.

  3. Including the SVN externals

    The needed SVN externals depend on the transpect project and the conversion pipelines.

    calabash (https://subversion.le-tex.de/common/calabash/) must be included in any case.

    The examples in Chapter 4, Examples show the needed SVN externals for some selected conversions.

    1. Open an editor in the SVN code repository (trunk):

      export EDITOR=name_of_the_editor && svn pe svn:externals .

      e.g.

      export EDITOR=emacs && svn pe svn:externals .

      A temporary file will be opened in the chosen editor.

    2. Edit the file and include the needed SVN externals.

    3. Save the file.

    4. Commit the SVN externals.

    5. Update the SVN code repository.

      The externals will be fetched and are available in the SVN code repository.

  4. Configuring .subversion

    More recent versions of SVN may treat XML files as binary data, in which case the command svn diff cannot show the differences between versions of a file. For svn diff to work, XML files have to be treated as text. To achieve this, edit your Subversion settings (~/.subversion/config on Unix-like systems) so that new XML files are stored with the MIME type text/xml.

    1. Add the following lines to the [auto-props] section of the file ~/.subversion/config, and make sure that enable-auto-props = yes is set in the [miscellany] section:

      Makefile = svn:eol-style=LF
      *.xml = svn:mime-type=text/xml;svn:eol-style=LF
      *.xsl = svn:mime-type=text/xml;svn:eol-style=LF
      *.xpl = svn:mime-type=text/xml;svn:eol-style=LF
      *.xpr = svn:mime-type=text/xml;svn:eol-style=LF
      *.rng = svn:mime-type=text/xml;svn:eol-style=LF
      *.rnc = svn:mime-type=text/plain;svn:eol-style=LF
      *.css = svn:mime-type=text/css;svn:eol-style=LF
    2. Commit.

  5. Creating the central XML catalog
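
    A minimal sketch of such a central catalog, assuming it is stored as xmlcatalog/catalog.xml as in the examples in Chapter 4. The three nextCatalog entries are an illustrative selection taken from those examples; which entries you actually need depends on the SVN externals of your project.

    ```xml
    <!-- xmlcatalog/catalog.xml: central XML catalog (sketch) -->
    <catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">

      <!-- Map the canonical project URI to the root of the code repository -->
      <rewriteURI uriStartString="http://customers.le-tex.de/generic/book-conversion/" rewritePrefix="../"/>

      <!-- One nextCatalog entry per SVN external that provides its own catalog -->
      <nextCatalog catalog="../pubcoach/xmlcatalog/catalog.xml"/>
      <nextCatalog catalog="../evolve-hub/xmlcatalog/catalog.xml"/>

      <!-- Location of the SVN content repository (see Step 7) -->
      <nextCatalog catalog="content-repo.catalog.xml"/>

    </catalog>
    ```

    The rewriteURI entry maps the canonical project URI to the directory above xmlcatalog, so that pipelines can refer to project files by URI regardless of where the code repository is checked out.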

  6. Setting up a configuration file

    The configuration file contains information about

    • publisher name

    • bookseries (optional)

    • DOI prefix (optional)

    • path name mappings (optional)

    While it is necessary to specify a config file, most of the settings there could probably be specified by providing a different paths XSLT (see Step 11). We might eventually drop config files. If we don’t, we’ll document their use and provide a schema.

    Example for a minimum configuration file:

    <publisher-conf content-base-uri="http://www.le-tex.de/transpect-demo/content/">
      <publishers>
        <publisher xml:id="letex" name="le-tex" doi-prefix="xxxx" file-name-component="" filesystem-path-component="letex"/>
      </publishers>
    </publisher-conf>
    1. Create a file conf/conf.xml.

    2. Add the root element publisher-conf with the attribute content-base-uri.

      <publisher-conf content-base-uri="content_base_uri">

      e.g.

      <publisher-conf content-base-uri="http://www.le-tex.de/transpect-demo/content/">
    3. Add an element publishers with one or more publisher elements, e.g.

      <publishers>
        <publisher xml:id="letex" name="le-tex" doi-prefix="xxxx" file-name-component="" filesystem-path-component="letex"/>
      </publishers>

    If we talk about the configuration cascade, we are not referring to this config file. The cascade refers to the override files that may be specified for imprints, series, etc.

  7. Configuring the SVN content repository location

    Generally, the SVN code repository and the SVN content repository are located in the same directory. If the SVN content repository is located in a different directory, you have to add a catalog file that contains the path to the SVN content repository.

    Starting from the SVN code repository, the location of the SVN content repository is looked up in the following order:

    1. xmlcatalog/content-repo.catalog.xml

    2. xmlcatalog/content-repo.default.catalog.xml

    3. ../content/

    If there is no file xmlcatalog/content-repo.catalog.xml, the file xmlcatalog/content-repo.default.catalog.xml is looked for. If that file doesn't exist either, the SVN content repository is expected in the folder ../content.

    If the SVN content repository isn't located in the same directory as the SVN code repository:

    1. Create a catalog file in the SVN code repository.

    2. Save the catalog file under the name

      • xmlcatalog/content-repo.catalog.xml (unversioned)

      or

      • xmlcatalog/content-repo.default.catalog.xml (versioned)

    3. If you chose the versioned variant, add the file xmlcatalog/content-repo.default.catalog.xml to the SVN code repository so that it is versioned.

    4. Edit the catalog file (xmlcatalog/content-repo.catalog.xml or xmlcatalog/content-repo.default.catalog.xml) by adding the content-base-uri and the location of the SVN content repository.

      <catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
                      
        <rewriteURI uriStartString="content-base-uri" rewritePrefix="location_of_the_SVN_content_repository"/>
        
      </catalog>

      e.g.

      • absolute path to the SVN content repository

        <catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
                          
          <rewriteURI uriStartString="http://cms.acme.com/content/" rewritePrefix="file:///c:/cygwin/home/gerrit/ACME/content/"/>
          
        </catalog>
      • relative path to the SVN content repository

        <catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
                          
          <rewriteURI uriStartString="http://www.le-tex.de/transpect-demo/content/" rewritePrefix="../../content/"/> 
          
        </catalog>
    5. Add the catalog file to the central XML catalog xmlcatalog/catalog.xml

      • If the catalog file is unversioned (xmlcatalog/content-repo.catalog.xml):

        Add the line

        <nextCatalog catalog="content-repo.catalog.xml"/>

        e.g.

        <catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
                    
          <rewriteURI uriStartString="http://customers.le-tex.de/generic/book-conversion/" rewritePrefix="../"/>
          
          <nextCatalog catalog=""/>
          <nextCatalog catalog=""/>
          <nextCatalog catalog=""/>
          ...
         
          <nextCatalog catalog="content-repo.catalog.xml"/>
          
        </catalog>
      • If the catalog file is versioned (xmlcatalog/content-repo.default.catalog.xml):

        Add the line

        <nextCatalog catalog="content-repo.default.catalog.xml"/>

        e.g.

        <catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
                    
          <rewriteURI uriStartString="http://customers.le-tex.de/generic/book-conversion/" rewritePrefix="../"/>
          
          <nextCatalog catalog=""/>
          <nextCatalog catalog=""/>
          <nextCatalog catalog=""/>
          ...
         
          <nextCatalog catalog="content-repo.default.catalog.xml"/>
          
        </catalog>
  8. Creating a customizations directory

    A transpect project can consist of different sub-projects, e.g., for imprints or series, or of other sub-projects that belong to one project. For example, the demo project contains several sub-projects for different conversions (see Chapter 4, Examples).

    All folders and files that contain project-specific processing should be stored in a customizations directory in the SVN code repository. Call it adaptions, adaptations, customizations, whatever. This directory contains adaptions in XSLT, XProc, Schematron, CSS etc. The adaptions may exist on several levels (e.g., common, imprint, series, work).

    1. Create a directory adaptions within the SVN code repository.

    2. Create a directory common within the directory adaptions.

    3. Create a directory xpl within the directory adaptions/common.

    4. Create a directory for project-specific adaptions within the directory adaptions.

  9. Creating a Makefile

    The Makefile controls the starting of conversions. It shortens the command that is needed to invoke a conversion in the shell.

    1. Create a Makefile.

    2. Define a target name with the relevant information like

      • Location of calabash

      • Input ports which are used in the XProc pipeline (see Step 10)

      • Output ports which are used in the XProc pipeline (see Step 10)

      • XProc pipeline which controls the conversion

      • debug-dir-uri

      • debug=yes

    If you don't want to use a Makefile, you can start the conversion directly in the shell (see Step 14).

  10. Creating front-end XProc pipelines

    The conversion runs with XProc pipelines.

    1. Define the input ports.

      You have to specify the input ports depending on the conversion pipeline (see examples in Chapter 4, Examples).

      In any case, you need the port conf which declares the URI for the configuration file, e.g.

      <p:input port="conf" primary="true">
        <p:document href="http://customers.le-tex.de/generic/book-conversion/conf/conf.xml"/>
      </p:input>
    2. Define the output ports.

      You have to specify the output ports depending on the conversion pipeline (see examples in Chapter 4, Examples).

    3. Import the relevant subpipelines which depend on the SVN externals (see Step 3).

    4. Define an XProc step for the paths which permit more cascade levels for, e.g., imprints, series and work (see Step 11).

    5. Define the XProc steps of the conversion. You need at least one step for each transpect module.
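
    The steps above can be sketched as the following skeleton. It is only an outline under stated assumptions: the step name frontend-sketch, the output port result, the default values of the options, and the p:identity placeholder are hypothetical, while the conf port and the option names debug and debug-dir-uri are taken from this manual. A real front-end pipeline would import the module pipelines of its SVN externals and replace the placeholder with the conversion steps.

    ```xml
    <p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
      version="1.0"
      name="frontend-sketch">

      <!-- Configuration file; the URI is resolved through the central XML catalog -->
      <p:input port="conf" primary="true">
        <p:document href="http://customers.le-tex.de/generic/book-conversion/conf/conf.xml"/>
      </p:input>

      <!-- Hypothetical output port; real pipelines declare one port per result -->
      <p:output port="result" primary="true"/>

      <!-- Options named in this manual; the default values are assumptions -->
      <p:option name="debug" select="'no'"/>
      <p:option name="debug-dir-uri" select="'debug'"/>

      <!-- Imports of the module pipelines (one p:import per needed module) go here -->

      <!-- Placeholder for the paths step and the conversion steps:
           it simply copies the configuration document to the result port -->
      <p:identity name="placeholder"/>

    </p:declare-step>
    ```

    Because conf is the primary input in this sketch, the placeholder step reads the configuration document without an explicit connection.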

  11. Customizing the path synthesis XSLT

    The loading of customized files can be controlled by defining paths in the configuration file or in the file paths.xpl. The file paths.xpl, located in the directory pubcoach/xpl, contains path options such as series, work, and publisher. It loads the stylesheet pubcoach/xsl/paths.xsl, which determines the corresponding customizations directory for a front-end pipeline from the input file name. This requires file naming conventions.

    The files pubcoach/xpl/paths.xpl and pubcoach/xsl/paths.xsl should be adapted to the needs of your project. To do so, create corresponding files in the directories adaptions/common/xpl and adaptions/common/xsl, respectively. Both files import their pubcoach counterparts.

    An XProc step in the front-end pipeline of a conversion defines the path options for that conversion. For this purpose, the pipeline has to import the adapted file in adaptions/common/xpl. The paths of the configuration file (if existing, see Step 6) are overridden.

    The current paths for imprints, series, work etc. are stored in the file paths.xml, generally located in the debug directory.

  12. Setting up an HTML report

    Checks against Schematron or Relax NG schemas may be performed after each conversion step. Their results will uniformly be visualized at the error locations in an HTML rendering of the document.

    Note

    On the input port report-in of the first report step, choose between <p:inline><c:reports/></p:inline> and bc:empty-report.

    Create a custom htmlreports/svrl2xsl.xsl.
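
    The first alternative can be sketched as the following input binding, to be placed inside the first report step of your pipeline. The namespace binding of the c prefix is an assumption (the usual XProc step vocabulary namespace); the port name and the c:reports element are taken from the note above.

    ```xml
    <!-- Start the report chain with an empty report container -->
    <p:input port="report-in"
      xmlns:p="http://www.w3.org/ns/xproc"
      xmlns:c="http://www.w3.org/ns/xproc-step">
      <p:inline><c:reports/></p:inline>
    </p:input>
    ```

    In a complete pipeline the two namespaces are normally declared once on the root element instead of on the p:input element.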

  13. Creating metadata

    It is possible to integrate metadata into an EPUB file. You can find an example in the section called “Conversion of IDML into TEI into EPUB with ONIX metadata”.

    The following files are necessary for integrating metadata in EPUB files:

    • XML file with metadata

      This file contains the metadata for one or more books.

    • HTML template file

      This file provides the structure of the EPUB file. It will be filled with content generated by the file implementation.xsl.

    • XSL file

      This file defines the rules for generating metadata.

    • CSS file

      This file defines the layout for the HTML template file.

    1. Create an XML file with metadata.

      1. Fill the file with data.

      2. Save the file, e.g. as meta.xml in the customizations directory, e.g., in the directory adaptions/.../metadata.

    2. Adapt the front-end XProc pipeline (see the example in the section called “Conversion of IDML into TEI into EPUB with ONIX metadata”).

      1. Load the following files with the XProc steps <p:load>, <bc:load-whole-cascade>, or <bc:load-cascaded>:

        • meta.xml

        • template.xhtml

        • implementation.xsl

      2. Integrate the relevant XProc steps in the front-end XProc pipeline:

        • html:generate-xsl-from-html-template

        • html:apply-generated-xsl

        • p:add-attribute

        • p:template

    3. Create an HTML template file.

      1. Create an HTML file with the following elements in <body>:

        • <div id=...>

          The div container contains a block of metadata. You can fill it with headings, paragraphs etc. and with references to the metadata so that the metadata will be included. Headings, paragraphs and references within a div container appear in the EPUB file.

          It is necessary to specify an id attribute because it is used for references.

        • <h1>, <h2> etc. (if you want to declare sections)

        E.g.

        <html xmlns="http://www.w3.org/1999/xhtml">
          <head>
            <title></title>
            <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
            <link href="../../../htmltemplates/css/stylesheet.css" type="text/css" rel="stylesheet"/>
          </head>
          
          <body>
          
            <h1 class="...">Heading-1</h1>

              <div id="div-id">
                <h1>Heading-1</h1>
                <a ...>...</a>
                ...
              </div>

            <h1 class="...">Heading-1</h1>

              <div id="div-id">
                ...
              </div>

            <h1 class="...">Heading-1</h1>
            
              <div id="div-id">
                ...
              </div>
           
          </body>
          
        </html>
      2. Define references.

        There are two kinds of references:

      3. Save the HTML template file, e.g. as template.xhtml.

    4. Create an XSL file.

  14. Starting a conversion

    Now you can start the conversion from the shell. You have two possibilities:

    • Use a Makefile

    • Use a direct calabash call that provides the same information as a Makefile target (input ports, output ports, XProc pipeline, and options)

      The example shows an invocation of a conversion of IDML into HUB into EPUB. The SVN code repository is called trunk and the SVN content repository is called content. Both repositories are located in the directory transpect-demo. The IDML file transpect_wp_de.idml is located in the SVN content repository in the directory le-tex/whitepaper/de. The XProc pipeline idml2epub_hub.xpl controls the conversion. The output files and the debug files are written to the directory content/le-tex/whitepaper/de.

      calabash/calabash.sh -D \
        -i conf=absolute_path/transpect-demo/trunk/conf/conf.xml \
        -o hub=absolute_path/transpect-demo/content/le-tex/whitepaper/de/transpect_wp_de.flat.xml \
        -o hubevolved=absolute_path/transpect-demo/content/le-tex/whitepaper/de/transpect_wp_de.hub.xml \
        -o html=absolute_path/transpect-demo/content/le-tex/whitepaper/de/epub/transpect_wp_de.xhtml \
        -o htmlreport=absolute_path/transpect-demo/content/le-tex/whitepaper/de/transpect_wp_de.xhtml \
        -o schematron=absolute_path/transpect-demo/content/le-tex/whitepaper/de/transpect_wp_de.sch.xml \
        -o result=/dev/null \
        absolute_path/transpect-demo/trunk/adaptions/common/xpl/idml2epub_hub.xpl \
        idmlfile=absolute_path/transpect-demo/content/le-tex/whitepaper/de/transpect_wp_de.idml \
        check=yes \
        local-css=true \
        debug-dir-uri=absolute_path/transpect-demo/content/le-tex/whitepaper/de/debug

This chapter contains some examples. It describes the setting up of the following conversions:

  • DOCX → EPUB

  • DOCX → IDML

  • IDML → HUB → EPUB

  • IDML → TEI → EPUB with ONIX metadata

You can check out the transpect demo with

svn co https://subversion.le-tex.de/common/transpect-demo/ transpect-demo

The demo project will be checked out and you will get the directory transpect-demo with the sub-directories trunk (= SVN code repository) and content (= SVN content repository).

This section describes the setting up of a transpect project that converts a DOCX file into an EPUB file.

Intermediate steps are:

Input file:

Output files:

  1. Creating SVN repositories

    1. Creating an SVN code repository

    2. Creating an SVN content repository (if none exists)

  2. Including the SVN externals

    1. Open an editor in the SVN code repository (trunk):

      export EDITOR='name of the editor' && svn pe svn:externals .

      e.g.

      export EDITOR=emacs && svn pe svn:externals .

      A temporary file will be opened in the chosen editor.

    2. Edit the file and include the needed SVN externals.

      You need the following SVN externals:

      calabash https://subversion.le-tex.de/common/calabash/
      css-expand https://subversion.le-tex.de/common/css-expand/
      css-generate https://subversion.le-tex.de/common/css-generate/
      docx2hub https://subversion.le-tex.de/docxtools/trunk/docx2hub/
      epubtools https://subversion.le-tex.de/common/epubtools/
      evolve-hub https://subversion.le-tex.de/common/evolve-hub/
      fontlib/dejavu-sans https://subversion.le-tex.de/common/fontlib/dejavu-sans
      fontlib/xmlcatalog https://subversion.le-tex.de/common/fontlib/xmlcatalog/
      htmlreports https://subversion.le-tex.de/common/htmlreports/trunk
      hub2html https://subversion.le-tex.de/common/hub2html_simple/trunk
      pubcoach https://subversion.le-tex.de/common/pubcoach/trunk/
      schema/Hub https://github.com/gimsieke/Hub/trunk/
      schema/Hub/css https://github.com/gimsieke/CSSa/trunk/
      schema/xhtml1 https://subversion.le-tex.de/common/schema/xhtml1/
      schema/iso-schematron https://subversion.le-tex.de/common/schema/iso-schematron/
      xproc-util/store-debug https://subversion.le-tex.de/common/xproc-util/store-debug
      xproc-util/xml-model https://subversion.le-tex.de/common/xproc-util/xml-model
      xproc-util/xslt-mode https://subversion.le-tex.de/common/xproc-util/xslt-mode
      xproc-util/store-zip https://subversion.le-tex.de/common/xproc-util/store-zip
      xproc-util/copy-files https://subversion.le-tex.de/common/xproc-util/copy-files
      xslt-util/colors https://subversion.le-tex.de/common/letex-util/colors/
      xslt-util/hex https://subversion.le-tex.de/common/letex-util/hex/
      xslt-util/lengths https://subversion.le-tex.de/common/letex-util/lengths/
      xslt-util/mime-type https://subversion.le-tex.de/common/letex-util/mime-type/
      xslt-util/resolve-uri https://subversion.le-tex.de/common/letex-util/resolve-uri/
      xslt-util/xslt-based-catalog-resolver -r1688 https://subversion.le-tex.de/common/letex-util/xslt-based-catalog-resolver/ 
      xslt-util/functx/XML_Elements_and_Attributes/XML_Document_Structure https://subversion.le-tex.de/common/functx/XML_Elements_and_Attributes/XML_Document_Structure
      xslt-util/functx/Sequences/Positional https://subversion.le-tex.de/common/functx/Sequences/Positional
      xslt-util/functx/xmlcatalog https://subversion.le-tex.de/common/functx/xmlcatalog
      html-tables https://subversion.le-tex.de/common/html-tables/lib/
    3. Save the file.

    4. Commit the SVN externals.

    5. Update the SVN code repository.

      The externals will be fetched and are available in the SVN code repository.

  3. Creating the central XML catalog

    1. Create a central XML catalog in the same directory as the calabash external (generally, this is the code repository). The central XML catalog must be named xmlcatalog/catalog.xml.

    2. Include the element rewriteURI with the attributes uriStartString and rewritePrefix

      <rewriteURI uriStartString="http://customers.le-tex.de/generic/book-conversion/" rewritePrefix="../"/> 
    3. Include the XML catalogs via nextCatalog.

      This central XML catalog matches the externals above:

      <catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">        
      
        <rewriteURI uriStartString="http://customers.le-tex.de/generic/book-conversion/" rewritePrefix="../"/> 
        
        <nextCatalog catalog="../xproc-util/store-debug/xmlcatalog/catalog.xml"/> 
        <nextCatalog catalog="../xproc-util/store-zip/xmlcatalog/catalog.xml"/>
        <nextCatalog catalog="../xproc-util/xml-model/xmlcatalog/catalog.xml"/> 
        <nextCatalog catalog="../xproc-util/xslt-mode/xmlcatalog/catalog.xml"/> 
        <nextCatalog catalog="../xproc-util/copy-files/xmlcatalog/catalog.xml"/> 
        <nextCatalog catalog="../hub2html/xmlcatalog/catalog.xml"/> 
        <nextCatalog catalog="../pubcoach/xmlcatalog/catalog.xml"/> 
        <nextCatalog catalog="../schema/hub/xmlcatalog/catalog.xml"/>
        <nextCatalog catalog="../schema/xhtml1/xmlcatalog/catalog.xml"/>
        <nextCatalog catalog="../schema/iso-schematron/catalog.xml"/>
        <nextCatalog catalog="../docx2hub/xmlcatalog/catalog.xml"/>
        <nextCatalog catalog="../evolve-hub/xmlcatalog/catalog.xml"/>
        <nextCatalog catalog="../xslt-util/colors/xmlcatalog/catalog.xml"/>
        <nextCatalog catalog="../xslt-util/hex/xmlcatalog/catalog.xml"/>
        <nextCatalog catalog="../xslt-util/lengths/xmlcatalog/catalog.xml"/>
        <nextCatalog catalog="../xslt-util/functx/xmlcatalog/catalog.xml"/>
        <nextCatalog catalog="../css-expand/xmlcatalog/catalog.xml"/>  
        <nextCatalog catalog="../css-generate/xmlcatalog/catalog.xml"/>  
        <nextCatalog catalog="../epubtools/xmlcatalog/catalog.xml"/>    
        <nextCatalog catalog="../fontlib/xmlcatalog/catalog.xml"/>
        <nextCatalog catalog="../htmlreports/xmlcatalog/catalog.xml"/>
        <nextCatalog catalog="../xslt-util/mime-type/xmlcatalog/catalog.xml"/>
        <nextCatalog catalog="../xslt-util/resolve-uri/xmlcatalog/catalog.xml"/>
        <nextCatalog catalog="../xslt-util/xslt-based-catalog-resolver/xmlcatalog/catalog.xml"/>
        <nextCatalog catalog="../html-tables/xmlcatalog/catalog.xml"/>
        
        <nextCatalog catalog="content-repo.catalog.xml"/> 
        
      </catalog>
  4. Starting a conversion

    Now you can start the conversion from the shell.

    1. Change into the SVN code repository, which is called trunk.

    2. Call the conversion. You have two possibilities:

      • Call the conversion with the Makefile

        make docx2epub IN_FILE=../content/le-tex/whitepaper/de/transpect_wp_de.docx debug=yes
      • Call the conversion without the Makefile

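        The command line for a direct calabash call is not given here; Step 14 in Chapter 3 shows the general form of such a call for the IDML → HUB → EPUB pipeline.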

This section describes the setting up of a transpect project that converts a DOCX file into an IDML file.

Intermediate steps are:

Input file:

Output files:

  1. Creating SVN repositories

    1. Creating an SVN code repository

    2. Creating an SVN content repository (if none exists)

  2. Including the SVN externals

    1. Open an editor in the SVN code repository (trunk):

      export EDITOR='name of the editor' && svn pe svn:externals .

      e.g.

      export EDITOR=emacs && svn pe svn:externals .

      A temporary file will be opened in the chosen editor.

    2. Edit the file and include the needed SVN externals.

      You need the following SVN externals:

      calabash https://subversion.le-tex.de/common/calabash/
      css-expand https://subversion.le-tex.de/common/css-expand/
      css-generate https://subversion.le-tex.de/common/css-generate/
      docx2hub https://subversion.le-tex.de/docxtools/trunk/docx2hub/
      fontlib/dejavu-sans https://subversion.le-tex.de/common/fontlib/dejavu-sans
      fontlib/xmlcatalog https://subversion.le-tex.de/common/fontlib/xmlcatalog/
      pubcoach https://subversion.le-tex.de/common/pubcoach/trunk/
      schema/Hub https://github.com/gimsieke/Hub/trunk/
      schema/Hub/css https://github.com/gimsieke/CSSa/trunk/
      schema/xhtml1 https://subversion.le-tex.de/common/schema/xhtml1/
      schema/iso-schematron https://subversion.le-tex.de/common/schema/iso-schematron/
      xproc-util/store-debug https://subversion.le-tex.de/common/xproc-util/store-debug
      xproc-util/xml-model https://subversion.le-tex.de/common/xproc-util/xml-model
      xproc-util/xslt-mode https://subversion.le-tex.de/common/xproc-util/xslt-mode
      xproc-util/store-zip https://subversion.le-tex.de/common/xproc-util/store-zip
      xproc-util/copy-files https://subversion.le-tex.de/common/xproc-util/copy-files
      xslt-util/colors https://subversion.le-tex.de/common/letex-util/colors/
      xslt-util/hex https://subversion.le-tex.de/common/letex-util/hex/
      xslt-util/lengths https://subversion.le-tex.de/common/letex-util/lengths/
      xslt-util/mime-type https://subversion.le-tex.de/common/letex-util/mime-type/
      xslt-util/resolve-uri https://subversion.le-tex.de/common/letex-util/resolve-uri/
      xslt-util/xslt-based-catalog-resolver -r1688 https://subversion.le-tex.de/common/letex-util/xslt-based-catalog-resolver/ 
      xslt-util/functx/XML_Elements_and_Attributes/XML_Document_Structure https://subversion.le-tex.de/common/functx/XML_Elements_and_Attributes/XML_Document_Structure
      xslt-util/functx/Sequences/Positional https://subversion.le-tex.de/common/functx/Sequences/Positional
      xslt-util/functx/xmlcatalog https://subversion.le-tex.de/common/functx/xmlcatalog
      html-tables https://subversion.le-tex.de/common/html-tables/lib/
      idmlval https://subversion.le-tex.de/idmltools/trunk/validation/
      xml2idml https://subversion.le-tex.de/idmltools/trunk/xml2idml/lib
    3. Save the file.

    4. Commit the SVN externals.

    5. Update the SVN code repository.

      The externals will be fetched and are available in the SVN code repository.

  3. Creating the central XML catalog

    1. Create a central XML catalog in the same directory as the calabash external (generally, this is the code repository). The central XML catalog must be named xmlcatalog/catalog.xml.

    2. Include the element rewriteURI with the attributes uriStartString and rewritePrefix

      <rewriteURI uriStartString="http://customers.le-tex.de/generic/book-conversion/" rewritePrefix="../"/> 
    3. Include the XML catalogs via nextCatalog.

      This central XML catalog matches the externals above:

      <catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">        
      
        <rewriteURI uriStartString="http://customers.le-tex.de/generic/book-conversion/" rewritePrefix="../"/> 
        
        <nextCatalog catalog="../xproc-util/store-debug/xmlcatalog/catalog.xml"/> 
        <nextCatalog catalog="../xproc-util/store-zip/xmlcatalog/catalog.xml"/>
        <nextCatalog catalog="../xproc-util/xml-model/xmlcatalog/catalog.xml"/> 
        <nextCatalog catalog="../xproc-util/xslt-mode/xmlcatalog/catalog.xml"/> 
        <nextCatalog catalog="../xproc-util/copy-files/xmlcatalog/catalog.xml"/>
        <nextCatalog catalog="../xml2idml/xmlcatalog/catalog.xml"/> 
        <nextCatalog catalog="../pubcoach/xmlcatalog/catalog.xml"/> 
        <nextCatalog catalog="../schema/hub/xmlcatalog/catalog.xml"/>
        <nextCatalog catalog="../schema/xhtml1/xmlcatalog/catalog.xml"/>
        <nextCatalog catalog="../schema/iso-schematron/catalog.xml"/>
        <nextCatalog catalog="../docx2hub/xmlcatalog/catalog.xml"/>
        <nextCatalog catalog="../xslt-util/colors/xmlcatalog/catalog.xml"/>
        <nextCatalog catalog="../xslt-util/hex/xmlcatalog/catalog.xml"/>
        <nextCatalog catalog="../xslt-util/lengths/xmlcatalog/catalog.xml"/>
        <nextCatalog catalog="../xslt-util/functx/xmlcatalog/catalog.xml"/>
        <nextCatalog catalog="../css-expand/xmlcatalog/catalog.xml"/>  
        <nextCatalog catalog="../css-generate/xmlcatalog/catalog.xml"/>   
        <nextCatalog catalog="../fontlib/xmlcatalog/catalog.xml"/>
        <nextCatalog catalog="../xslt-util/mime-type/xmlcatalog/catalog.xml"/>
        <nextCatalog catalog="../xslt-util/resolve-uri/xmlcatalog/catalog.xml"/>
        <nextCatalog catalog="../xslt-util/xslt-based-catalog-resolver/xmlcatalog/catalog.xml"/>
        <nextCatalog catalog="../html-tables/xmlcatalog/catalog.xml"/>
        <nextCatalog catalog="../idmlval/xmlcatalog/catalog.xml"/>
        
        <nextCatalog catalog="content-repo.catalog.xml"/> 
        
      </catalog>
  4. Starting a conversion

    Now you can start the conversion from the shell.

    1. Change into the SVN code repository, which is called trunk.

    2. Call the conversion. You have two possibilities:

      • Call the conversion with the Makefile

        make docx2idml IN_FILE=../content/le-tex/whitepaper/de/transpect_wp_de.docx debug=yes
      • Call the conversion without the Makefile

        ???????????????????????????????????????????????????
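
A quick way to verify the central XML catalog from step 3 is to check that every nextCatalog entry points to an existing file. The following is a minimal sketch and not part of transpect: the function name check_next_catalogs is hypothetical, and it assumes one nextCatalog element per line and paths without spaces, as in the listing above.

```shell
#!/bin/sh
# Sketch: report nextCatalog entries whose target file does not exist.
# Assumption: one nextCatalog element per line, no spaces in paths.
check_next_catalogs() {
  catalog=$1
  catalog_dir=$(dirname "$catalog")
  missing=0
  # Extract the value of the catalog="..." attribute of each nextCatalog line.
  for ref in $(sed -n 's/.*<nextCatalog[^>]*catalog="\([^"]*\)".*/\1/p' "$catalog"); do
    # Relative references are resolved against the directory of the catalog file.
    if [ ! -f "$catalog_dir/$ref" ]; then
      echo "missing: $ref"
      missing=1
    fi
  done
  return $missing
}
```

Called from the code repository as check_next_catalogs xmlcatalog/catalog.xml, it prints one line per unresolved entry and returns a non-zero status if any entry is missing. A missing entry usually means that the corresponding SVN external has not been included or fetched.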

This section describes how to set up a transpect project that converts an IDML file into HUB and then into an EPUB file.

Intermediate steps are:

Input file:

Output files:

  1. Creating SVN repositories

    1. Creating an SVN code repository

    2. Creating an SVN content repository (if none exists)

      Generally, the SVN code repository and the SVN content repository are located in the same directory. The SVN content repository can, however, be located in a different directory. In that case, see Step 7.

  2. Including the SVN externals

    1. Open an editor in the SVN code repository (trunk):

      export EDITOR='name of the editor' && svn pe svn:externals .

      e.g.

      export EDITOR=emacs && svn pe svn:externals .

      A temporary file will be opened in the chosen editor.

    2. Edit the file and include the needed SVN externals.

      You need the following SVN externals:

      calabash https://subversion.le-tex.de/common/calabash/
      css-expand https://subversion.le-tex.de/common/css-expand/
      css-generate https://subversion.le-tex.de/common/css-generate/
      epubtools https://subversion.le-tex.de/common/epubtools/
      evolve-hub https://subversion.le-tex.de/common/evolve-hub/
      fontlib/dejavu-sans https://subversion.le-tex.de/common/fontlib/dejavu-sans
      fontlib/xmlcatalog https://subversion.le-tex.de/common/fontlib/xmlcatalog/
      htmlreports https://subversion.le-tex.de/common/htmlreports/trunk
      hub2html https://subversion.le-tex.de/common/hub2html_simple/trunk
      pubcoach https://subversion.le-tex.de/common/pubcoach/trunk/
      schema/Hub https://github.com/gimsieke/Hub/trunk/
      schema/Hub/css https://github.com/gimsieke/CSSa/trunk/
      schema/xhtml1 https://subversion.le-tex.de/common/schema/xhtml1/
      schema/iso-schematron https://subversion.le-tex.de/common/schema/iso-schematron/
      xproc-util/store-debug https://subversion.le-tex.de/common/xproc-util/store-debug
      xproc-util/xml-model https://subversion.le-tex.de/common/xproc-util/xml-model
      xproc-util/xslt-mode https://subversion.le-tex.de/common/xproc-util/xslt-mode
      xproc-util/store-zip https://subversion.le-tex.de/common/xproc-util/store-zip
      xproc-util/copy-files https://subversion.le-tex.de/common/xproc-util/copy-files
      xslt-util/colors https://subversion.le-tex.de/common/letex-util/colors/
      xslt-util/hex https://subversion.le-tex.de/common/letex-util/hex/
      xslt-util/lengths https://subversion.le-tex.de/common/letex-util/lengths/
      xslt-util/mime-type https://subversion.le-tex.de/common/letex-util/mime-type/
      xslt-util/resolve-uri https://subversion.le-tex.de/common/letex-util/resolve-uri/
      xslt-util/xslt-based-catalog-resolver -r1688 https://subversion.le-tex.de/common/letex-util/xslt-based-catalog-resolver/ 
      xslt-util/functx/XML_Elements_and_Attributes/XML_Document_Structure https://subversion.le-tex.de/common/functx/XML_Elements_and_Attributes/XML_Document_Structure
      xslt-util/functx/Sequences/Positional https://subversion.le-tex.de/common/functx/Sequences/Positional
      xslt-util/functx/xmlcatalog https://subversion.le-tex.de/common/functx/xmlcatalog
      html-tables https://subversion.le-tex.de/common/html-tables/lib/
      idmlval https://subversion.le-tex.de/idmltools/trunk/validation/
      idml2xml https://subversion.le-tex.de/idmltools/trunk/idml2xml
    3. Save the file.

    4. Commit the SVN externals.

    5. Update the SVN code repository.

      The externals will be fetched and are available in the SVN code repository.

  3. Creating the central XML catalog

    1. Create a central XML catalog in the same directory as the calabash external (generally, this is the code repository). The central XML catalog must be named xmlcatalog/catalog.xml.

    2. Include the element rewriteURI with the attributes uriStartString and rewritePrefix:

      <rewriteURI uriStartString="http://customers.le-tex.de/generic/book-conversion/" rewritePrefix="../"/> 
    3. Include the XML catalogs via nextCatalog.

      This central XML catalog matches the externals above:

      <catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">  
      
        <rewriteURI uriStartString="http://customers.le-tex.de/generic/book-conversion/" rewritePrefix="../"/> 
        
        <nextCatalog catalog="../xproc-util/store-debug/xmlcatalog/catalog.xml"/> 
        <nextCatalog catalog="../xproc-util/store-zip/xmlcatalog/catalog.xml"/>
        <nextCatalog catalog="../xproc-util/xml-model/xmlcatalog/catalog.xml"/> 
        <nextCatalog catalog="../xproc-util/xslt-mode/xmlcatalog/catalog.xml"/> 
        <nextCatalog catalog="../xproc-util/copy-files/xmlcatalog/catalog.xml"/>
        <nextCatalog catalog="../idml2xml/xmlcatalog/catalog.xml"/>
        <nextCatalog catalog="../hub2html/xmlcatalog/catalog.xml"/> 
        <nextCatalog catalog="../pubcoach/xmlcatalog/catalog.xml"/> 
        <nextCatalog catalog="../schema/hub/xmlcatalog/catalog.xml"/>
        <nextCatalog catalog="../schema/xhtml1/xmlcatalog/catalog.xml"/>
        <nextCatalog catalog="../schema/iso-schematron/catalog.xml"/>
        <nextCatalog catalog="../evolve-hub/xmlcatalog/catalog.xml"/>
        <nextCatalog catalog="../xslt-util/colors/xmlcatalog/catalog.xml"/>
        <nextCatalog catalog="../xslt-util/hex/xmlcatalog/catalog.xml"/>
        <nextCatalog catalog="../xslt-util/lengths/xmlcatalog/catalog.xml"/>
        <nextCatalog catalog="../xslt-util/functx/xmlcatalog/catalog.xml"/>
        <nextCatalog catalog="../css-expand/xmlcatalog/catalog.xml"/>  
        <nextCatalog catalog="../css-generate/xmlcatalog/catalog.xml"/>  
        <nextCatalog catalog="../epubtools/xmlcatalog/catalog.xml"/>    
        <nextCatalog catalog="../fontlib/xmlcatalog/catalog.xml"/>
        <nextCatalog catalog="../htmlreports/xmlcatalog/catalog.xml"/>
        <nextCatalog catalog="../xslt-util/mime-type/xmlcatalog/catalog.xml"/>
        <nextCatalog catalog="../xslt-util/resolve-uri/xmlcatalog/catalog.xml"/>
        <nextCatalog catalog="../xslt-util/xslt-based-catalog-resolver/xmlcatalog/catalog.xml"/>
        <nextCatalog catalog="../html-tables/xmlcatalog/catalog.xml"/>
        <nextCatalog catalog="../idmlval/xmlcatalog/catalog.xml"/>
        
        <nextCatalog catalog="content-repo.catalog.xml"/> 
        
      </catalog>
  4. Starting a conversion

    Now you can start the conversion from the bash command line.

    1. Change into the SVN code repository, which is called trunk.

    2. Call the conversion. There are two ways to do this:

      • Call the conversion with the Makefile

        make idml2epub_hub IN_FILE=../content/le-tex/whitepaper/de/transpect_wp_de.idml debug=yes
      • Call the conversion without the Makefile

        ???????????????????????????????????????????????????
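
Step 2 of this procedure edits svn:externals interactively. As an alternative, the property can be set from a plain text file with svn propset and its -F option. The following is a minimal sketch under stated assumptions: the function name apply_externals, the file name externals.txt and the commit message are hypothetical, and the file is assumed to contain the externals list exactly as shown in step 2. Setting SVN=echo prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch: set svn:externals from a file instead of an interactive editor.
# SVN may be overridden (for example SVN=echo) to print the commands only.
apply_externals() {
  externals_file=$1
  workdir=${2:-.}
  svn_cmd=${SVN:-svn}
  if [ ! -r "$externals_file" ]; then
    echo "cannot read $externals_file" >&2
    return 1
  fi
  # Set the property, commit it, then update so that the externals are fetched.
  "$svn_cmd" propset svn:externals -F "$externals_file" "$workdir" &&
    "$svn_cmd" commit -m "Set svn:externals" "$workdir" &&
    "$svn_cmd" update "$workdir"
}
```

For example, SVN=echo apply_externals externals.txt . shows the three commands that would run in the code repository. Keeping the externals list in a file also makes it easy to compare it with the nextCatalog entries of the central XML catalog.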

This section describes how to set up a transpect project that converts an IDML file into TEI and then into an EPUB file whose metadata is generated from ONIX.

Intermediate steps are:

Input files:

Output files:

  1. Creating SVN repositories

    1. Creating an SVN code repository

    2. Creating an SVN content repository (if none exists)

      Generally, the SVN code repository and the SVN content repository are located in the same directory. The SVN content repository can, however, be located in a different directory. In that case, see Step 7.

  2. Including the SVN externals

    1. Open an editor in the SVN code repository (trunk):

      export EDITOR='name of the editor' && svn pe svn:externals .

      e.g.

      export EDITOR=emacs && svn pe svn:externals .

      A temporary file will be opened in the chosen editor.

    2. Edit the file and include the needed SVN externals.

      You need the following SVN externals:

      calabash https://subversion.le-tex.de/common/calabash/
      css-expand https://subversion.le-tex.de/common/css-expand/
      css-generate https://subversion.le-tex.de/common/css-generate/
      epubtools https://subversion.le-tex.de/common/epubtools/
      evolve-hub https://subversion.le-tex.de/common/evolve-hub/
      fontlib/dejavu-sans https://subversion.le-tex.de/common/fontlib/dejavu-sans
      fontlib/xmlcatalog https://subversion.le-tex.de/common/fontlib/xmlcatalog/
      htmlreports https://subversion.le-tex.de/common/htmlreports/trunk
      htmltemplates https://subversion.le-tex.de/common/htmltemplates/trunk/
      hub2html https://subversion.le-tex.de/common/hub2html_simple/trunk
      hub2tei https://subversion.le-tex.de/common/hub2tei/trunk/
      tei2html https://subversion.le-tex.de/common/tei2html/
      pubcoach https://subversion.le-tex.de/common/pubcoach/trunk/
      schema/Hub https://github.com/gimsieke/Hub/trunk/
      schema/Hub/css https://github.com/gimsieke/CSSa/trunk/
      schema/xhtml1 https://subversion.le-tex.de/common/schema/xhtml1/
      schema/iso-schematron https://subversion.le-tex.de/common/schema/iso-schematron/
      xproc-util/store-debug https://subversion.le-tex.de/common/xproc-util/store-debug
      xproc-util/xml-model https://subversion.le-tex.de/common/xproc-util/xml-model
      xproc-util/xslt-mode https://subversion.le-tex.de/common/xproc-util/xslt-mode
      xproc-util/store-zip https://subversion.le-tex.de/common/xproc-util/store-zip
      xproc-util/copy-files https://subversion.le-tex.de/common/xproc-util/copy-files
      xslt-util/colors https://subversion.le-tex.de/common/letex-util/colors/
      xslt-util/hex https://subversion.le-tex.de/common/letex-util/hex/
      xslt-util/lengths https://subversion.le-tex.de/common/letex-util/lengths/
      xslt-util/mime-type https://subversion.le-tex.de/common/letex-util/mime-type/
      xslt-util/resolve-uri https://subversion.le-tex.de/common/letex-util/resolve-uri/
      xslt-util/xslt-based-catalog-resolver -r1688 https://subversion.le-tex.de/common/letex-util/xslt-based-catalog-resolver/ 
      xslt-util/functx/XML_Elements_and_Attributes/XML_Document_Structure https://subversion.le-tex.de/common/functx/XML_Elements_and_Attributes/XML_Document_Structure
      xslt-util/functx/Sequences/Positional https://subversion.le-tex.de/common/functx/Sequences/Positional
      xslt-util/functx/xmlcatalog https://subversion.le-tex.de/common/functx/xmlcatalog
      html-tables https://subversion.le-tex.de/common/html-tables/lib/
      idmlval https://subversion.le-tex.de/idmltools/trunk/validation/
      idml2xml https://subversion.le-tex.de/idmltools/trunk/idml2xml
    3. Save the file.

    4. Commit the SVN externals.

    5. Update the SVN code repository.

      The externals will be fetched and are available in the SVN code repository.

  3. Creating the central XML catalog

    1. Create a central XML catalog in the same directory as the calabash external (generally, this is the code repository). The central XML catalog must be named xmlcatalog/catalog.xml.

    2. Include the element rewriteURI with the attributes uriStartString and rewritePrefix:

      <rewriteURI uriStartString="http://customers.le-tex.de/generic/book-conversion/" rewritePrefix="../"/> 
    3. Include the XML catalogs via nextCatalog.

      This central XML catalog matches the externals above:

      <catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">  
        
        <rewriteURI uriStartString="http://customers.le-tex.de/generic/book-conversion/" rewritePrefix="../"/> 
        
        <nextCatalog catalog="../xproc-util/store-debug/xmlcatalog/catalog.xml"/> 
        <nextCatalog catalog="../xproc-util/store-zip/xmlcatalog/catalog.xml"/>
        <nextCatalog catalog="../xproc-util/xml-model/xmlcatalog/catalog.xml"/> 
        <nextCatalog catalog="../xproc-util/xslt-mode/xmlcatalog/catalog.xml"/> 
        <nextCatalog catalog="../xproc-util/copy-files/xmlcatalog/catalog.xml"/>
        <nextCatalog catalog="../idml2xml/xmlcatalog/catalog.xml"/>
        <nextCatalog catalog="../hub2html/xmlcatalog/catalog.xml"/> 
        <nextCatalog catalog="../pubcoach/xmlcatalog/catalog.xml"/> 
        <nextCatalog catalog="../schema/hub/xmlcatalog/catalog.xml"/>
        <nextCatalog catalog="../schema/xhtml1/xmlcatalog/catalog.xml"/>
        <nextCatalog catalog="../schema/iso-schematron/catalog.xml"/>
        <nextCatalog catalog="../evolve-hub/xmlcatalog/catalog.xml"/>
        <nextCatalog catalog="../xslt-util/colors/xmlcatalog/catalog.xml"/>
        <nextCatalog catalog="../xslt-util/hex/xmlcatalog/catalog.xml"/>
        <nextCatalog catalog="../xslt-util/lengths/xmlcatalog/catalog.xml"/>
        <nextCatalog catalog="../xslt-util/functx/xmlcatalog/catalog.xml"/>
        <nextCatalog catalog="../css-expand/xmlcatalog/catalog.xml"/>  
        <nextCatalog catalog="../css-generate/xmlcatalog/catalog.xml"/>  
        <nextCatalog catalog="../epubtools/xmlcatalog/catalog.xml"/>    
        <nextCatalog catalog="../fontlib/xmlcatalog/catalog.xml"/>
        <nextCatalog catalog="../htmlreports/xmlcatalog/catalog.xml"/>
        <nextCatalog catalog="../xslt-util/mime-type/xmlcatalog/catalog.xml"/>
        <nextCatalog catalog="../xslt-util/resolve-uri/xmlcatalog/catalog.xml"/>
        <nextCatalog catalog="../xslt-util/xslt-based-catalog-resolver/xmlcatalog/catalog.xml"/>
        <nextCatalog catalog="../html-tables/xmlcatalog/catalog.xml"/>
        <nextCatalog catalog="../idmlval/xmlcatalog/catalog.xml"/>
        <nextCatalog catalog="../hub2tei/xmlcatalog/catalog.xml"/> 
        <nextCatalog catalog="../tei2html/xmlcatalog/catalog.xml"/> 
        <nextCatalog catalog="../htmltemplates/xmlcatalog/catalog.xml"/>
        
        <nextCatalog catalog="content-repo.catalog.xml"/> 
        
      </catalog>
  4. Creating metadata

    You have to work with the following files:

    • ONIX file with metadata: trunk/adaptions/le-tex/idml2epub_tei_onix/metadata/meta.xml

      This file contains the metadata for one or more books. In this example, the metadata is stored in ONIX format.

    • HTML template file: trunk/adaptions/common/htmltemplates/template.xhtml

      This file provides the structure of the EPUB file. It is filled with content generated by the file implementation.xsl.

    • XSL file for the transformation trunk/adaptions/common/htmltemplates/implementation.xsl

      This file defines

    1. Configure the file meta.xml.

    2. Configure the file template.xhtml.

    3. Configure the file implementation.xsl.

  5. Starting a conversion

    Now you can start the conversion from the bash command line.

    1. Change into the SVN code repository, which is called trunk.

    2. Call the conversion. There are two ways to do this:

      • Call the conversion with the Makefile

        make idml2epub_tei_onix IN_FILE=../content/le-tex/whitepaper/de/transpect_wp_de.idml debug=yes
      • Call the conversion without the Makefile

        # Assumption: $(DEVNULL) is a Makefile variable; /dev/null is used here instead.
        calabash/calabash.sh -D \
            -i conf=conf/conf.xml \
            -o hub=transpect-demo/content/le-tex/whitepaper/de/output/hub/transpect_wp_de.flat.xml \
            -o hubevolved=transpect-demo/content/le-tex/whitepaper/de/output/flat/transpect_wp_de.evolved.xml \
            -o tei=transpect-demo/content/le-tex/whitepaper/de/output/tei/transpect_wp_de.tei.xml \
            -o html=transpect-demo/content/le-tex/whitepaper/de/output/epub/transpect_wp_de.xhtml \
            -o htmlreport=transpect-demo/content/le-tex/whitepaper/de/output/report/transpect_wp_de.xhtml \
            -o schematron=transpect-demo/content/le-tex/whitepaper/de/output/report/transpect_wp_de.sch.xml \
            -o result=/dev/null \
            transpect-demo/trunk/adaptions/common/xpl/idml2epub_tei_onix.xpl \
            idmlfile=transpect-demo/content/le-tex/whitepaper/de/output/transpect_wp_de.idml \
            idml-target-uri=transpect-demo/content/le-tex/whitepaper/de/output/transpect_wp_de.idml \
            check=yes \
            local-css=true \
            debug-dir-uri=transpect-demo/content/le-tex/whitepaper/de/output/debug \
            debug=yes
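
The output port bindings in the call above all follow the same pattern: an output directory, a subdirectory per format, and the base name of the input file. The following is a minimal sketch that generates these -o options for another file. The function name calabash_outputs and its two parameters are hypothetical, and the directory layout is simply copied from the example call.

```shell
#!/bin/sh
# Sketch: print the -o port bindings of the example call for a given
# output directory and base name (both are parameters of this helper only).
calabash_outputs() {
  out_dir=$1
  base=$2
  # One binding per line, in the order used in the example call.
  printf '%s\n' \
    "-o hub=$out_dir/hub/$base.flat.xml" \
    "-o hubevolved=$out_dir/flat/$base.evolved.xml" \
    "-o tei=$out_dir/tei/$base.tei.xml" \
    "-o html=$out_dir/epub/$base.xhtml" \
    "-o htmlreport=$out_dir/report/$base.xhtml" \
    "-o schematron=$out_dir/report/$base.sch.xml"
}
```

For paths without spaces, the result can be spliced into the command line as $(calabash_outputs transpect-demo/content/le-tex/whitepaper/de/output transpect_wp_de), which reproduces the six bindings of the example.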