transpect:file-uri file-uri

file-uri/file-uri.xpl

Import URI: http://transpect.le-tex.de/xproc-util/file-uri/file-uri.xpl

This step accepts either a file system path or a URL in its 'filename' option. It will normalize them so that both a file system path and a file: URL are available. If filename starts with http: or https:, the file will be retrieved and stored locally. Please note that this retrieval will not work for remote directories.

Its primary uses are

  • giving users the liberty to specify either a URL or an OS-specific path for input file parameters;
  • making XML catalog resolution available to any URI, not just when accessing resources through catalog-enabled methods such as doc();
  • storing the file locally via p:http-request if, after optional catalog resolution, the 'filename' URI is still http: or https:.

Examples for 'filename' values

  • C:/temp/file.docx,
  • c:\temp\file.docx,
  • file:/C:/temp/file.docx,
  • file:///C:/temp/file.docx,
  • /tmp/file.docx,
  • subdir/file.docx,
  • https://github.com/me/myrepo/blob/master/file.docx?raw=true
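
The following Python sketch shows how such values could be told apart. It is not the step itself: it only re-expresses the regular expressions from the Subpipeline listing further down, in the order in which they appear there. The function name classify and the branch labels are made up for this illustration.

```python
import re

# Tests in the order of the listing; the first match wins.
# The labels on the left are hypothetical names for this sketch.
BRANCHES = [
    ("unc-file-uri", r"^file://///[^/]", 0),
    ("file-uri", r"^file:/", 0),
    ("unix-path", r"^/", 0),
    ("windows-path", r"^[a-z]:", re.IGNORECASE),
    ("http-url", r"^https?:", 0),
    ("unc-path", r"^\\\\[^\\]", 0),
]


def classify(filename, fetch_http=True):
    """Return the label of the first branch whose pattern matches."""
    for name, pattern, flags in BRANCHES:
        if re.search(pattern, filename, flags):
            # The HTTP branch additionally requires fetch-http to be true.
            if name == "http-url" and not fetch_http:
                continue
            return name
    # Everything else is treated as a relative path.
    return "relative-path"


if __name__ == "__main__":
    for value in ("C:/temp/file.docx", "/tmp/file.docx", "subdir/file.docx"):
        print(value, "->", classify(value))
```

In this sketch an HTTP URL with fetch_http=False falls through to the last branch, which mirrors the order of the tests in the listing.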

Relative Paths

Relative paths will be resolved against the current working directory. Most of the time this is more useful than the static base URI, but it might not always be what the user wants. It is a good idea to pass absolute paths, as in $(readlink -f subdir/file.docx) or $(cygpath -ma subdir/file.docx).
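
As a rough illustration, the resolution of a relative Unix path could look like the Python sketch below. It assumes a POSIX-style working directory that is passed in explicitly; the function name absolutize is hypothetical, and the 'file:' prefix follows the Unix filename branch of the Subpipeline listing.

```python
import posixpath


def absolutize(path, cwd):
    """Resolve a POSIX path against cwd and derive both output forms."""
    # posixpath.join keeps `path` unchanged if it is already absolute.
    os_path = posixpath.normpath(posixpath.join(cwd, path))
    return {"os-path": os_path, "local-href": "file:" + os_path}


if __name__ == "__main__":
    print(absolutize("subdir/file.docx", "/home/user/work"))
```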

XML Catalogs

If a catalog is provided on the catalog port and an XSLT stylesheet for catalog resolution is supplied on the resolver port, http:/https: URIs will be catalog-resolved first; see the descriptions of the catalog and resolver input ports below.

Storage Location for HTTP Downloads

It is possible to specify a temporary directory in the 'tmpdir' option. By default, it will be the subdir 'tmp' of the user’s home directory. The 'tmpdir' option accepts both a file: URL and an OS path, thanks to this normalization step.

Please note that temporary files will not be deleted by this step.
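
The choice of the download directory can be sketched in Python as follows. The sketch assumes, as the Subpipeline listing suggests, that an empty or whitespace-only 'tmpdir' value falls back to the 'tmp' subdirectory of the user's home directory; the function name effective_tmpdir and its parameters are hypothetical.

```python
def effective_tmpdir(tmpdir, user_home):
    """Return tmpdir if it is non-blank, else the tmp subdir of user_home."""
    if tmpdir.strip():
        return tmpdir
    return user_home + "/tmp/"


if __name__ == "__main__":
    print(effective_tmpdir("", "/home/user"))
    print(effective_tmpdir("file:/var/cache/dl/", "/home/user"))
```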

Unique File Names for HTTP Downloads

If the option 'make-unique' is true (which it is by default), the files that are fetched by p:http-request will get a random string like _0fa8d348 appended to their base name.
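
A minimal Python sketch of this naming scheme is given below. It approximates the XPath expression in the Subpipeline listing rather than reproducing it exactly: the last path component is split into base name and extension, any query string or fragment is dropped, and the first eight characters of a UUID are inserted in between. The function name download_name and the token parameter (a fixed UUID string, for reproducible output) are assumptions made for this illustration.

```python
import re
import uuid


def download_name(uri, make_unique=True, token=None):
    """Derive a local file name from the last path component of uri."""
    last = re.sub(r"^.+/", "", uri)  # e.g. 'file.docx?raw=true'
    match = re.match(r"[^.?#]+", last)
    base = match.group(0) if match else ""  # e.g. 'file'
    # Drop the base name, then any query string or fragment.
    ext = re.sub(r"[?#].*$", "", re.sub(r"^[^?#.]+", "", last))
    suffix = ""
    if make_unique:
        token = token or uuid.uuid4().hex
        suffix = "_" + token[:8]
    return base + suffix + ext


if __name__ == "__main__":
    print(download_name("https://github.com/me/myrepo/blob/master/file.docx?raw=true"))
```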

Output format

The output is a c:result element with the following attributes:

os-path
OS-specific path. This is always present except when error-status is present.
local-href
file: URI. This is always present except when error-status is present.
error-status
This is present only if the 'filename' was an HTTP URI and there was an error retrieving the resource.
href
The post catalog-resolution URI of the resource (if it is an HTTP URI)
orig-href
The pre catalog-resolution URI of the resource (if different from post catalog)
lastpath
For ordinary files, the non-directory part including suffix. For directories, the last path component without trailing slash.
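
The lastpath rule can be illustrated with a short Python sketch. It applies the same regular expression that the Subpipeline listing uses on the local-href attribute; the function name lastpath is chosen for this illustration only.

```python
import re


def lastpath(local_href):
    """Return the last path component, ignoring trailing slashes."""
    return re.sub(r"^.+/([^/]+)/*$", r"\1", local_href)


if __name__ == "__main__":
    print(lastpath("file:/tmp/file.docx"))
    print(lastpath("file:/tmp/subdir/"))
```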

Input Ports

Name | Documentation | Connections

source

This port exists only to prevent the default readable port from being connected to the catalog or resolver ports.

catalog

If it is a <catalog> document in the namespace urn:oasis:names:tc:entity:xmlns:xml:catalog, it will be used for catalog resolution of URIs that start with 'http'.

resolver

An XSLT stylesheet that provides the named template resolve. This template takes a parameter $uri and produces a document <result unresolved="{$uri}"/>. If the URI could be resolved to another URI, the result will take the form <result unresolved="{$uri}" resolved="{$resolved-uri}"/>.

By default, this step only provides trivial (i.e., identity) catalog resolution.

You have to supply an XSLT-based catalog resolver on the resolver port in order to use catalog resolution. That is because native catalog resolution is not available for p:http-request or through an XPath function. Without such a resolver, you can’t programmatically decide whether to retrieve a file via p:http-request or use the local file.

You may use the repository version of the XSLT-based resolver. However, in order to avoid network traffic, you should consider using a local copy. In order to avoid importing it via its absolute or relative file system path, you should use the transpect approach of importing the resolver’s XML catalog via a <nextCatalog> element from your project catalog. Then you can import the XSLT-based resolver by its canonical URI.
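
To make the contract of the resolve template more tangible, here is a Python sketch of a resolver that returns the same kind of result element. It is not the XSLT-based resolver: it only handles rewriteURI entries (with their uriStartString and rewritePrefix attributes) by longest-prefix match and assumes absolute rewrite prefixes. The catalog content and the local path in EXAMPLE_CATALOG are hypothetical.

```python
import xml.etree.ElementTree as ET

CAT_NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"

# Hypothetical catalog; the local target path is an assumption.
EXAMPLE_CATALOG = """\
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <rewriteURI uriStartString="http://transpect.le-tex.de/xproc-util/"
              rewritePrefix="file:/opt/transpect/xproc-util/"/>
</catalog>
"""


def resolve(uri, catalog_xml):
    """Return <result unresolved=".."/>, plus @resolved on a catalog match."""
    root = ET.fromstring(catalog_xml)
    best = None  # (uriStartString, rewritePrefix) of the longest match
    for rule in root.iter("{%s}rewriteURI" % CAT_NS):
        start = rule.get("uriStartString", "")
        if start and uri.startswith(start):
            if best is None or len(start) > len(best[0]):
                best = (start, rule.get("rewritePrefix", ""))
    result = ET.Element("result", {"unresolved": uri})
    if best is not None:
        result.set("resolved", best[1] + uri[len(best[0]):])
    return result


if __name__ == "__main__":
    uri = "http://transpect.le-tex.de/xproc-util/file-uri/file-uri.xpl"
    print(ET.tostring(resolve(uri, EXAMPLE_CATALOG), encoding="unicode"))
```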

Output Ports

Name | Documentation | Connections

result

A c:result document with a local-href and an os-path attribute.

Options

Name | Documentation | Default

filename

A URI or an OS-specific identifier. Relative paths will be resolved against the current working directory (see 'Relative Paths' above), not against static-base-uri(). A future improvement might use the XSLT-based catalog resolver in order to detect whether a given http: URL will actually resolve to a local file.

make-unique

Whether to store files retrieved over HTTP with a unique random name in the temp dir.

'true'

fetch-http

Whether to fetch files referenced by URIs matching '^https?:'.

'true'

tmpdir

URI or OS name of a directory for storing files retrieved via HTTP.

''

Subpipeline

Step | Inputs | Outputs | Options

p:xslt catalog-resolve

stylesheet

resolver on file-uri

source

p:document

catalog on file-uri

result

template-name = 'resolve'

p:sink d53e308

source

result on catalog-resolve

pos:info info

result

p:add-attribute empty-result

source

 <c:result/>

result

attribute-name = 'cwd'

match = '/*'

attribute-value = if (/*/@file-separator = '/') then replace(/*/@cwd, '\\', '/') else /*/@cwd

p:set-attributes add-orig-href

If the URL has been catalog-resolved, the original URL will be copied here from the preceding XSLT step, in an orig-href attribute. Apart from that, the XSLT step has to produce an href attribute.

Please note that despite its name, the @href attribute doesn’t necessarily contain a URI. If $filename is an OS path, @href will contain this path.

source

result on empty-result

attributes

result on catalog-resolve

result

match = '/c:result'

p:group d53e340

p:variable catalog-resolved-uri

/c:result/@href

p:choose analyze-filename

matches($catalog-resolved-uri, '^file://///[^/]')

Windows UNC path URI. file:///// → \\ .

p:add-attribute d53e351

source

 <c:result/>

result

attribute-name = 'local-href'

match = '/*'

attribute-value = $catalog-resolved-uri

p:add-attribute d53e364

source

result on d53e351

result

match = '/*'

attribute-name = 'os-path'

attribute-value = replace(replace($catalog-resolved-uri, '^file:///', ''), '/', '\\')

matches($catalog-resolved-uri, '^file:/')

Unix file URI or Windows file: URI containing a drive letter.

p:add-attribute d53e375

source

source on file-uri

result

match = '/*'

attribute-name = 'local-href'

attribute-value = $catalog-resolved-uri

p:add-attribute d53e380

source

result on d53e375

result

match = '/*'

attribute-name = 'os-path'

attribute-value = replace($catalog-resolved-uri, '^file:/+(([a-z]:)/)?', '$2/', 'i')
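
The two file: URI branches above can be re-expressed in Python as a sketch. It uses the same patterns as the attribute-value expressions, translated to Python replacement syntax; the function name file_uri_to_os_path is hypothetical.

```python
import re


def file_uri_to_os_path(uri):
    """Convert a file: URI to an OS path, following the two branches above."""
    if re.match(r"^file://///[^/]", uri):
        # UNC form: strip 'file:///' and turn the remaining slashes around.
        return re.sub(r"^file:///", "", uri).replace("/", "\\")
    # Drive letter (if any) is kept; otherwise a single leading slash remains.
    return re.sub(r"^file:/+(([a-z]:)/)?", r"\2/", uri, flags=re.IGNORECASE)


if __name__ == "__main__":
    for u in ("file:/C:/temp/file.docx", "file:/tmp/file.docx"):
        print(u, "->", file_uri_to_os_path(u))
```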

matches($catalog-resolved-uri, '^/')

Unix Filename

p:add-attribute d53e391

source

source on file-uri

result

match = '/*'

attribute-name = 'local-href'

attribute-value = concat('file:', $catalog-resolved-uri)

p:add-attribute d53e396

source

result on d53e391

result

match = '/*'

attribute-name = 'os-path'

attribute-value = $catalog-resolved-uri

matches($catalog-resolved-uri, '^[a-z]:', 'i')

Windows path, either with forward or backward slashes.

p:add-attribute d53e407

source

source on file-uri

result

match = '/*'

attribute-name = 'local-href'

attribute-value = concat('file:///', replace($catalog-resolved-uri, '\\', '/'))

p:add-attribute d53e412

source

result on d53e407

result

match = '/*'

attribute-name = 'os-path'

attribute-value = $catalog-resolved-uri
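
For the opposite direction, the Unix filename branch and the Windows path branch above can be sketched in Python like this. The function name os_path_to_file_uri is hypothetical, and the sketch deliberately rejects anything that is neither an absolute Unix path nor a path with a drive letter.

```python
import re


def os_path_to_file_uri(path):
    """Build the local-href for an absolute Unix path or a drive-letter path."""
    if path.startswith("/"):
        return "file:" + path
    if re.match(r"^[a-z]:", path, re.IGNORECASE):
        # Backslashes become forward slashes in the URI.
        return "file:///" + path.replace("\\", "/")
    raise ValueError("not an absolute Unix or drive-letter path: " + path)


if __name__ == "__main__":
    print(os_path_to_file_uri("/tmp/file.docx"))
    print(os_path_to_file_uri("c:\\temp\\file.docx"))
```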

matches($catalog-resolved-uri, '^https?:') and $fetch-http = 'true'

HTTP URL. Since there is no system property for a temp dir, store it in the subdir tmp of the user’s home dir. Optionally generate a random name.

p:uuid uuid

source

 <doc uuid=""/>

result

match = '/*/@uuid'

p:sink d53e434

source

result on uuid

transpect:file-uri tmp-dir

source

result

filename = ($tmpdir[normalize-space()], concat(/c:result/@user-home, '/tmp/'))[1]

p:group d53e444

p:variable tmp-dir-href

result on tmp-dir

/c:result/@local-href

p:add-attribute local-href

source

result on uuid

result

attribute-name = 'local-href'

match = '/*'

attribute-value = concat( $tmp-dir-href, replace( replace($catalog-resolved-uri, '^.+/', ''), '(.+?)([.?#].+)?', '$1' ), if ($make-unique = 'true') then concat('_', substring(/*/@uuid, 1, 8)) else '', replace(replace(replace($catalog-resolved-uri, '^.+/', ''), '^[^?#.]+', ''), '[?#].*$', '') )

p:sink d53e464

source

result on local-href

p:identity d53e466

source

 <c:request method="GET" detailed="true"/>

result

p:add-attribute d53e477

source

result on d53e466

result

match = '/c:request'

attribute-name = 'href'

attribute-value = $catalog-resolved-uri

p:http-request http-request

source

result on d53e477

result

p:choose store-http-resource

not(starts-with(/c:response/@status, '2'))

cx:message d53e489

source

result on http-request

result

message = concat('Cannot retrieve ', $catalog-resolved-uri, '. Status: ', /c:response/@status)

p:sink d53e494

source

result on d53e489

p:add-attribute d53e496

source

 <c:result/>

result

attribute-name = 'error-status'

match = '/c:result'

attribute-value = /c:response/@status

/c:response/c:body/(.[normalize-space(.)] | c:data)

p:store d53e517

source

result on http-request

result

href = /doc/@local-href

transpect:file-uri http-to-local-result_binary

source

result on d53e517

result

filename = /doc/@local-href

p:otherwise

p:store d53e541

source

result on http-request

result

omit-xml-declaration = 'false'

href = /doc/@local-href

transpect:file-uri http-to-local-result_xml

source

result on d53e541

result

filename = /doc/@local-href
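
The three-way decision in store-http-resource can be summarized with a small Python sketch. This is an interpretation of the listing, not a statement about the step's internals: a status that does not start with '2' leads to an error-status result; otherwise a response body with text content or a c:data child is assumed to be stored as is, and everything else is stored as XML. The function name and the returned labels are hypothetical.

```python
def storage_branch(status, body_has_text_or_data):
    """Pick the branch that handles an HTTP response with this status."""
    if not status.startswith("2"):
        return "error-status"
    if body_has_text_or_data:
        return "store-binary-or-text"
    return "store-xml"


if __name__ == "__main__":
    print(storage_branch("404", False))
    print(storage_branch("200", True))
```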

p:add-attribute d53e564

source

result

match = '/c:result'

attribute-name = 'href'

attribute-value = $catalog-resolved-uri

matches($catalog-resolved-uri, '^\\\\[^\\]')

Windows UNC path. \\ → file:///// .

p:add-attribute d53e577

source

 <c:result/>

result

attribute-name = 'os-path'

match = '/*'

attribute-value = $catalog-resolved-uri

p:add-attribute d53e590

source

result on d53e577

result

match = '/*'

attribute-name = 'local-href'

attribute-value = concat('file:///', replace($catalog-resolved-uri, '\\', '/'))

p:otherwise

Other protocol or relative filename. We don’t support other protocols/notations, so we assume it to be a relative path.

transpect:file-uri cwd-uri

source

source on file-uri

result

filename = concat(/c:result/@cwd, '/')

transpect:file-uri resolved-uri

source

result on cwd-uri

result

filename = resolve-uri($catalog-resolved-uri, /c:result/@local-href)

p:add-attribute lastpath

source

result

attribute-name = 'lastpath'

match = '/*'

attribute-value = replace(/*/@local-href, '^.+/([^/]+)/*$', '$1')

Used by