Gerrit Imsieke (@gimsieke), le-tex publishing services (@letexml)
Also:
DITA OT Day 2018 video (~2:30–4:25)
xsl-list post “breaking up XML on page break element” (Geert’s 2014-07-04 post to the XSL mailing list)
An “overlapping markup” problem (page division vs. document hierarchy)
Martin Luther’s translation of the New Testament into German (1522), TEI P5 XML from Deutsches Textarchiv
452 pb milestone elements at varying depths:
<pbs>
<pb path="/TEI/text/body/div/div/p/pb" count="238"/>
<pb path="/TEI/text/body/div/div/pb" count="91"/>
<pb path="/TEI/text/body/pb" count="52"/>
<pb path="/TEI/text/body/div/pb" count="47"/>
<pb path="/TEI/text/front/pb" count="11"/>
<pb path="/TEI/text/body/div/p/pb" count="10"/>
<pb path="/TEI/text/front/div/p/pb" count="3"/>
</pbs>
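The path statistics above can be reproduced with a few lines of Python. This is a sketch using the standard library's ElementTree, not the tool that produced the listing. The names `local_name` and `pb_path_counts` are hypothetical, and namespace URIs are stripped so that TEI's `{http://www.tei-c.org/ns/1.0}` prefix does not show up in the paths.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def local_name(tag: str) -> str:
    """Strip a '{namespace}' prefix, e.g. '{http://www.tei-c.org/ns/1.0}pb' -> 'pb'."""
    return tag.rsplit("}", 1)[-1]


def pb_path_counts(root: ET.Element, milestone: str = "pb") -> Counter:
    """Count milestone elements by their ancestor path, e.g. '/TEI/text/body/pb'."""
    counts: Counter = Counter()

    def walk(elem: ET.Element, path: str) -> None:
        here = f"{path}/{local_name(elem.tag)}"
        if local_name(elem.tag) == milestone:
            counts[here] += 1
        for child in elem:
            walk(child, here)

    walk(root, "")
    return counts


doc = ET.fromstring(
    "<TEI><text><front><pb/></front>"
    "<body><pb/><div><p>a<pb/>b</p><pb/></div></body></text></TEI>"
)
# Print the counts in the same shape as the <pbs> report, most frequent first.
for path, n in pb_path_counts(doc).most_common():
    print(f'<pb path="{path}" count="{n}"/>')
```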
/TEI in split mode with a tunneled $restricted-to parameter (xsl:result-document omitted here for brevity)
teiHeader would be missing… if it weren’t for this template:
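The split-mode pass can be approximated in Python to show its two ingredients: a restricted-to set per chunk (every element that has content between one pb and the next, plus all its ancestors) and a conditional identity copy that drops everything else. This is a sketch under assumptions, not the XSLT from the slides: `split_at_milestones`, `keep_whole` and the other names are hypothetical, chunks are assumed to start at each pb (so content before the first pb, such as the teiHeader, would otherwise be lost), and `keep_whole` plays the role of the extra teiHeader template.

```python
import copy
import xml.etree.ElementTree as ET
from typing import Dict, Iterable, List, Set


def local_name(tag: str) -> str:
    return tag.rsplit("}", 1)[-1]


def split_at_milestones(root: ET.Element, milestone: str = "pb",
                        keep_whole: Iterable[str] = ("teiHeader",)) -> List[ET.Element]:
    """Return one pruned copy of `root` per milestone; chunk k starts at the k-th milestone."""
    keep_whole = set(keep_whole)
    parent: Dict[int, ET.Element] = {id(c): p for p in root.iter() for c in p}
    members: Dict[int, Set[int]] = {}  # chunk -> ids of elements to copy ("restricted-to")
    text_chunk: Dict[int, int] = {}    # id(elem) -> chunk that elem.text belongs to
    tail_chunk: Dict[int, int] = {}    # id(elem) -> chunk that elem.tail belongs to
    current = -1                       # content before the first milestone is in chunk -1

    def mark(elem, k):
        # An element takes part in chunk k, and so do all of its ancestors.
        bucket = members.setdefault(k, set())
        while elem is not None and id(elem) not in bucket:
            bucket.add(id(elem))
            elem = parent.get(id(elem))

    def walk(elem):
        # Pass 1: assign element starts, text and tails to chunks in document order.
        nonlocal current
        if local_name(elem.tag) == milestone:
            current += 1
        mark(elem, current)
        if elem.text:
            text_chunk[id(elem)] = current
        for child in elem:
            walk(child)
            if child.tail:
                tail_chunk[id(child)] = current
                mark(elem, current)

    walk(root)
    # Whole-copy elements (like teiHeader) must appear in every chunk.
    for elem in root.iter():
        if local_name(elem.tag) in keep_whole:
            for k in range(current + 1):
                mark(elem, k)

    def copy_chunk(elem, k):
        # Pass 2: conditional identity copy that keeps only members of chunk k.
        if local_name(elem.tag) in keep_whole:
            dup = copy.deepcopy(elem)
            dup.tail = None
            return dup
        out = ET.Element(elem.tag, dict(elem.attrib))
        out.text = elem.text if text_chunk.get(id(elem)) == k else None
        last = None
        for child in elem:
            if id(child) in members[k]:
                last = copy_chunk(child, k)
                out.append(last)
            if child.tail and tail_chunk.get(id(child)) == k:
                if last is not None:
                    last.tail = (last.tail or "") + child.tail
                else:
                    out.text = (out.text or "") + child.tail
        return out

    return [copy_chunk(root, k) for k in range(current + 1)]


doc = ET.fromstring(
    "<TEI><teiHeader><title>T</title></teiHeader><text><body>"
    '<pb n="1"/><div><p>a<pb n="2"/>b</p><p>c</p></div>'
    "</body></text></TEI>"
)
for chunk in split_at_milestones(doc):
    print(ET.tostring(chunk, encoding="unicode"))
```

Both chunks keep the teiHeader and the div/p ancestry, while each one contains only the text that follows its own pb. The restricted-to sets are kept as sets of object ids here, which is already the cheap kind of membership test.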
FO block splitting
Nested grouping (group-starting-with for <two-col-start>, group-ending-with for <two-col-end>)
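The nested grouping can be mimicked in Python, with the FO block's children simplified to strings. `group_starting_with` and `group_ending_with` are hypothetical helpers that follow the xsl:for-each-group rules (a starting-with group also begins at the first item, an ending-with group also ends at the last item); `column_sections` is an illustrative name, not part of the original stylesheet.

```python
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")


def group_starting_with(items: List[T], starts: Callable[[T], bool]) -> List[List[T]]:
    """A new group begins at every matching item (and at the first item)."""
    groups: List[List[T]] = []
    for item in items:
        if not groups or starts(item):
            groups.append([])
        groups[-1].append(item)
    return groups


def group_ending_with(items: List[T], ends: Callable[[T], bool]) -> List[List[T]]:
    """A group ends after every matching item (and at the last item)."""
    groups: List[List[T]] = []
    current: List[T] = []
    for item in items:
        current.append(item)
        if ends(item):
            groups.append(current)
            current = []
    if current:
        groups.append(current)
    return groups


MARKERS = ("two-col-start", "two-col-end")


def column_sections(children: List[str]) -> List[Tuple[int, List[str]]]:
    """Return (column_count, content) pairs for a flat list of block children."""
    sections = []
    # Outer grouping opens a two-column region at each start marker ...
    for outer in group_starting_with(children, lambda x: x == "two-col-start"):
        # ... inner grouping closes it at the end marker; the rest is one-column again.
        for inner in group_ending_with(outer, lambda x: x == "two-col-end"):
            columns = 2 if inner[0] == "two-col-start" else 1
            content = [x for x in inner if x not in MARKERS]
            if content:
                sections.append((columns, content))
    return sections


print(column_sections(["p1", "two-col-start", "p2", "p3", "two-col-end", "p4"]))
# [(1, ['p1']), (2, ['p2', 'p3']), (1, ['p4'])]
```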
Split at line breaks
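Splitting a block at line breaks can be sketched in Python as follows, including the exception for list items and footnotes: their content is copied as-is, in the spirit of switching from split mode to #default mode. Assumptions, all hypothetical: a `br` element marks a line break, `list-item` and `footnote` are the protected element names, and inline markup outside them is flattened to text, which a real FO splitter would not do.

```python
import copy
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical names: 'br' marks a line break; these elements are never split.
OPAQUE = {"list-item", "footnote"}


def local_name(tag: str) -> str:
    return tag.rsplit("}", 1)[-1]


def split_lines(block: ET.Element, br: str = "br") -> List[str]:
    """Split mixed content into lines at br elements at any depth,
    but keep opaque elements (and any br inside them) intact as serialized XML."""
    lines = [""]

    def walk(elem: ET.Element) -> None:
        if elem.text:
            lines[-1] += elem.text
        for child in elem:
            name = local_name(child.tag)
            if name == br:
                lines.append("")
            elif name in OPAQUE:
                # No splitting inside: serialize the whole element unchanged.
                whole = copy.copy(child)
                whole.tail = None  # the tail is appended below, in the parent's context
                lines[-1] += ET.tostring(whole, encoding="unicode")
            else:
                walk(child)
            if child.tail:
                lines[-1] += child.tail

    walk(block)
    return lines


block = ET.fromstring(
    "<block>one<br/>two <footnote>fn<br/>still fn</footnote> end<br/>three</block>"
)
for line in split_lines(block):
    print(repr(line))
```

The br inside the footnote does not produce a new line; it stays inside the serialized footnote on the middle line.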
Avoid splitting at line breaks in embedded list items or footnotes by switching from split mode to #default mode when processing list item / footnote content
Luther’s 1522 New Testament translation:
pb elements: 452
Hypothesis:
Surprise: the first 10 pages take milliseconds, the first 375 pages take minutes
⇒ Need to measure the dependence on chunk size at constant document length
Repeatedly removing every other pb results in fewer, larger chunks (the text stays the same)
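The thinning step can be sketched with ElementTree: delete the 2nd, 4th, 6th, … pb in document order and re-attach any text that followed each deleted pb, so that the document length stays constant while chunks roughly double in size. The function names are hypothetical, and the tree is mutated in place; a benchmark would presumably serialize or split each intermediate version before thinning further. Starting from 452 pb, this procedure yields 452, 226, 113, 57, 29, 15, 8, 4, 2, 1 milestones.

```python
import xml.etree.ElementTree as ET
from typing import List


def local_name(tag: str) -> str:
    return tag.rsplit("}", 1)[-1]


def count_milestones(root: ET.Element, milestone: str = "pb") -> int:
    return sum(1 for e in root.iter() if local_name(e.tag) == milestone)


def remove_every_other(root: ET.Element, milestone: str = "pb") -> None:
    """Delete the 2nd, 4th, 6th, ... milestone in document order, keeping all text."""
    parent_of = {id(child): p for p in root.iter() for child in p}
    doomed = [e for e in root.iter() if local_name(e.tag) == milestone][1::2]
    for pb in doomed:
        parent = parent_of[id(pb)]
        idx = list(parent).index(pb)
        if pb.tail:  # re-attach text that followed the removed milestone
            if idx > 0:
                prev = parent[idx - 1]
                prev.tail = (prev.tail or "") + pb.tail
            else:
                parent.text = (parent.text or "") + pb.tail
        parent.remove(pb)


def thinning_series(root: ET.Element, milestone: str = "pb") -> List[int]:
    """Milestone counts after each halving; the text (document length) is unchanged."""
    counts = [count_milestones(root, milestone)]
    while counts[-1] > 1:
        remove_every_other(root, milestone)
        counts.append(count_milestones(root, milestone))
    return counts


doc = ET.fromstring(
    "<TEI><text><body><pb/>a<pb/>b<div><pb/>c<p>d<pb/>e</p></div><pb/>f</body></text></TEI>"
)
print(thinning_series(doc), "".join(doc.itertext()))
```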
Culprit: Conditional Identity Template
When chunk length grows…
The number of Conditional Identity Template invocations cannot be reduced
Pass generated IDs instead of nodes and compare generate-id() = $restricted-to instead of exists(. intersect $restricted-to)
⇒ 20-fold acceleration for large chunks
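A Python analogy of this change (not a model of what the XSLT processor does internally): if every conditional identity call checks membership by scanning a list of node references, each call costs time proportional to the chunk, and a chunk's total cost grows quadratically. Looking up a precomputed ID string in a hash set does not grow with the chunk. `make_ids` is a hypothetical stand-in for generate-id(), and the comparison counter only makes the scaling visible; the 20-fold figure above is not reproduced by this code.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Set


def make_ids(root: ET.Element) -> Dict[int, str]:
    """Hypothetical generate-id() stand-in: one stable string per element."""
    return {id(e): f"n{i}" for i, e in enumerate(root.iter())}


def in_chunk_by_node(elem: ET.Element, restricted: List[ET.Element],
                     stats: Dict[str, int]) -> bool:
    """Analogue of exists(. intersect $restricted-to): scan the node list."""
    for node in restricted:
        stats["comparisons"] += 1
        if node is elem:
            return True
    return False


def in_chunk_by_id(elem: ET.Element, ids: Dict[int, str], restricted_ids: Set[str]) -> bool:
    """Analogue of generate-id() = $restricted-to, with the IDs kept in a set."""
    return ids[id(elem)] in restricted_ids


# A flat chunk of 1000 paragraphs, all of which belong to the chunk.
body = ET.Element("body")
paras = [ET.SubElement(body, "p") for _ in range(1000)]
ids = make_ids(body)
restricted_ids = {ids[id(p)] for p in paras}
stats = {"comparisons": 0}
same = all(in_chunk_by_node(p, paras, stats) == in_chunk_by_id(p, ids, restricted_ids)
           for p in paras)
print(same, stats["comparisons"])  # True 500500
```

Both predicates agree, but the node scan needs 1 + 2 + … + 1000 = 500500 comparisons for a single 1000-element chunk, and doubling the chunk roughly quadruples that number.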
No. Michael Kay wrote in 2014:
“... the real problem is that the logic is going down to descendants, then up to their ancestors, and then down again, and that's intrinsically not processing nodes in document order, which is a precondition for streaming.”
Even if it were feasible, the scaling with chunk size would be detrimental.
Can we have a configurable splitter for JATS, DocBook, TEI, HTML etc.?
final entry template and private internal templates