pdf2fxl – A PDF → Fixed Layout EPUB Converter

If you are browsing the source code in our Subversion repository, you won’t see most of it because pdf2fxl relies on svn externals.

If you don’t have an svn client at your disposal, you may see which externals are being used by issuing the following statement on the command line:

curl -H Depth:0 -X PROPFIND https://subversion.le-tex.de/common/pdf2fxl/

and looking for <S:externals> in the output. If you don’t have curl, you… are lost. Sorry.
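If you do have curl, you can filter the externals out of the response instead of scanning it by eye. This is a minimal sketch, assuming the property arrives as an <S:externals>…</S:externals> element that may span several lines. The helper name list_externals is made up for this example and is not part of pdf2fxl.

```shell
# list_externals: read a PROPFIND response on stdin and print only the
# lines belonging to the <S:externals> element.
# Hypothetical helper for illustration; not shipped with pdf2fxl.
list_externals() {
  awk '
    /<S:externals>/   { inside = 1 }   # element starts on this line
    inside            { print }        # print while inside the element
    /<\/S:externals>/ { inside = 0 }   # element ends on this line
  '
}

# Usage (needs network access):
#   curl -s -H Depth:0 -X PROPFIND https://subversion.le-tex.de/common/pdf2fxl/ | list_externals
```

awk is used rather than a sed range so that an element that opens and closes on the same line is handled as well.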

Maybe we should start with the important stuff anyway.

The repo contains generated documentation of the complete project, including externals. It was produced with transpectdoc, an XProc documentation generator.

Prerequisites

  1. Poppler, in particular poppler’s pdftohtml binary
  2. ImageMagick

Poppler and ImageMagick are available for many package managers, including Cygwin’s. We made sure that the Bash front-end script runs on the Oracle Java for Windows / Cygwin combo. We have also tried to avoid utilities/options such as readlink -f that are known not to work on vanilla Mac OS X / BSD systems. However, we haven’t tried it on a Mac yet, and we’d like to hear from you whether it’s working there.

The other prerequisites are Java 1.6 (or newer) and bash. Sorry, no Windows batch file at the moment. But you may figure out the Poppler, ImageMagick, and Calabash invocation by peeking into the bash script. The Calabash processor is included as an svn external, and so are le-tex transpect’s XProc/XSLT modules for building EPUBs, etc.
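Before the first run, it may help to check that the external tools are on your PATH. The following is a rough sketch, not something taken from pdf2fxl.sh. The function name missing_tools is hypothetical, and the ImageMagick binary name convert is an assumption (newer ImageMagick releases may install magick instead).

```shell
# missing_tools: print every given command name that cannot be found on PATH.
# Returns 0 if all commands were found, 1 otherwise.
# Hypothetical helper for illustration; not shipped with pdf2fxl.
missing_tools() {
  rc=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      rc=1
    fi
  done
  return "$rc"
}

# Usage, assuming ImageMagick is installed under its classic name "convert":
#   missing_tools pdftohtml convert java bash
```

The check only relies on command -v, so it should behave the same under Cygwin, Linux, and BSD-style shells.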

Invocation

./pdf2fxl.sh -i sample-input/demojam.pdf -d -e

(Call it without arguments to see a brief help text.)

You will find the resulting epub in sample-input/demojam/demojam.wrap.epub.
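Judging from this one example, the output seems to land in a directory named after the input file, next to the input. A small sketch of that mapping follows. The naming rule is inferred from the demojam example only, and the function name expected_epub is hypothetical.

```shell
# expected_epub: map an input PDF path to <dir>/<name>/<name>.wrap.epub,
# the pattern suggested by the demojam example (an inference, not a spec).
# Hypothetical helper for illustration; not shipped with pdf2fxl.
expected_epub() {
  pdf=$1
  case $pdf in
    */*) dir=${pdf%/*} ;;   # keep everything before the last slash
    *)   dir=. ;;           # no slash: file is in the current directory
  esac
  name=${pdf##*/}           # strip the directory part
  name=${name%.pdf}         # strip the .pdf extension
  echo "$dir/$name/$name.wrap.epub"
}

# Usage:
#   ls -l "$(expected_epub sample-input/demojam.pdf)"
```

It uses plain parameter expansion instead of readlink -f or similar utilities, in keeping with the portability concerns mentioned under Prerequisites.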

Sample Input/Output

You may download the source PDF and its EPUB output directly from the repo. Please note that as of this writing, fixed layout EPUB3 files are best viewed in Readium or Adobe Digital Editions 4 if you are on a desktop computer.

You can verify in the readers that the text is actually selectable and not part of the images.

Generating Documentation

The documentation in the repo has been generated with a Calabash invocation as in generate-doc.sh. If you checked out the repo, you can generate it yourself.

Author

Martin Kraetke, le-tex publishing services

@mkraetke, @letexml

(small contributions, such as this page, by Gerrit Imsieke, @gimsieke)

License

This application and the underlying transpect framework are licensed under a 2-clause BSD license.

© 2014–2015, le-tex publishing services GmbH. All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
  2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY LE-TEX PUBLISHING SERVICES “AS IS” AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL LE-TEX PUBLISHING SERVICES OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Individual license terms of the included 3rd-party libraries apply.

Last modified: 2015-02-16T14:54:56Z