transpect:file-uri file-uri
file-uri/file-uri.xpl
Import URI: http://transpect.le-tex.de/xproc-util/file-uri/file-uri.xpl
This step accepts either a file system path or a URL in its 'filename' option. It will normalize them so that both a file system path and a file: URL are available. If filename starts with http: or https:, the file will be retrieved and stored locally. Please note that this retrieval will not work for remote directories.
Its primary uses are
- giving users the liberty to either specify a URL or an OS-specific path for input file parameters;
- making XML catalog resolution available to any URI, not just when accessing resources through catalog-enabled methods such as doc();
- retrieving the file with p:http-request and storing it locally if, after optional catalog resolution, the 'filename' URI is still http:/https:.
Examples for 'filename' values:
- C:/temp/file.docx
- c:\temp\file.docx
- file:/C:/temp/file.docx
- file:///C:/temp/file.docx
- /tmp/file.docx
- subdir/file.docx
- https://github.com/me/myrepo/blob/master/file.docx?raw=true
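How such values are told apart can be sketched in Python, mirroring the order of the tests in the analyze-filename p:choose of the subpipeline. This is only an illustrative sketch, not the step's implementation: the function name classify_filename and the returned labels are made up for this example, and the 'fetch-http' option is ignored.

```python
import re


def classify_filename(filename: str) -> str:
    """Return a label for the branch that would handle this value (sketch)."""
    if re.search(r'^file://///[^/]', filename):
        return 'unc-file-uri'      # Windows UNC path as a file: URI
    if re.search(r'^file:/', filename):
        return 'file-uri'          # Unix or Windows file: URI
    if re.search(r'^/', filename):
        return 'unix-path'         # absolute Unix path
    if re.search(r'^[a-z]:', filename, re.IGNORECASE):
        return 'windows-path'      # drive letter, either slash style
    if re.search(r'^https?:', filename):
        return 'http-url'          # candidate for download
    if re.search(r'^\\\\[^\\]', filename):
        return 'unc-path'          # \\server\share notation
    return 'relative-path'         # everything else


for value in ['C:/temp/file.docx', '/tmp/file.docx', 'subdir/file.docx']:
    print(value, '->', classify_filename(value))
```

The order of the tests matters: the UNC file: URI has to be checked before the general file: case, because it would match that test as well.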
Relative Paths
Relative paths will be resolved against the current working directory, which is better than the static base URI most of the time but which might not always be what the user wants. It is a good idea to absolutize paths, as in $(readlink -f subdir/file.docx) or $(cygpath -ma subdir/file.docx).
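As a rough illustration of what resolving against the current working directory means, the following Python sketch joins a relative path with an explicitly passed directory. The function name absolutize and the sample directories are hypothetical, and the sketch only covers POSIX-style paths, not Windows paths or URIs.

```python
import posixpath


def absolutize(path: str, cwd: str) -> str:
    """Resolve 'path' against 'cwd' unless it is already absolute (sketch)."""
    # posixpath.join discards 'cwd' when 'path' starts with '/';
    # normpath then collapses '.' and '..' segments.
    return posixpath.normpath(posixpath.join(cwd, path))


print(absolutize('subdir/file.docx', '/home/me/project'))
```

The same relative path therefore yields different results depending on where the pipeline is started, which is why passing an absolute path is the safer choice.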
XML Catalogs
If a catalog is provided on the catalog port and an XSLT stylesheet for catalog resolution is supplied on the resolver port, http:/https: URIs will be catalog-resolved first, see below.
Storage Location for HTTP Downloads
It is possible to specify a temporary directory in the 'tmpdir' option. By default, it will be the subdir 'tmp' of the user’s home directory. The 'tmpdir' option accepts both a file: URL and an OS path, thanks to this normalization step.
Please note that temporary files will not be deleted by this step.
Unique File Names for HTTP Downloads
If the option 'make-unique' is true (which it is by default), the files that are fetched by p:http-request will get a random string like _0fa8d348 appended to their base name.
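The naming scheme can be sketched in Python as follows. This is a simplified approximation of the XPath expression in the subpipeline's local-href step, not a port of it: the function name download_name and its 'token' parameter are hypothetical and exist only to make the example reproducible.

```python
import re
import uuid
from typing import Optional


def download_name(url: str, make_unique: bool = True,
                  token: Optional[str] = None) -> str:
    """Build the local file name for a downloaded resource (sketch)."""
    # Last path component, still including any query or fragment.
    last = re.sub(r'^.+/', '', url)
    # Base name: everything before the first '.', '?' or '#'.
    base = re.split(r'[.?#]', last, maxsplit=1)[0]
    # Suffix: the remainder, with query and fragment stripped.
    suffix = re.sub(r'[?#].*$', '', last[len(base):])
    unique = ''
    if make_unique:
        # Use the first 8 characters of a UUID unless a token is given.
        unique = '_' + (token or uuid.uuid4().hex)[:8]
    return base + unique + suffix


print(download_name(
    'https://github.com/me/myrepo/blob/master/file.docx?raw=true',
    token='0fa8d348'))  # prints file_0fa8d348.docx
```

Because the random string is inserted before the suffix, the stored file keeps its extension and the query string is dropped.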
Output format
The output is a c:result element with the following attributes:
- os-path: OS-specific path. Always present unless error-status is set.
- local-href: file: URI. Always present unless error-status is set.
- error-status: Only present if the 'filename' was an HTTP URI and there was an error retrieving the resource.
- href: The post-catalog-resolution URI of the resource (if it is an HTTP URI).
- orig-href: The pre-catalog-resolution URI of the resource (if different from the post-catalog-resolution URI).
- lastpath: For ordinary files, the non-directory part including suffix. For directories, the last path component without trailing slash.
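How os-path, local-href and lastpath relate to each other can be sketched in Python, using regular expressions modelled on the ones in the subpipeline. The function names are hypothetical, and only the plain file: URI, Unix path and Windows drive-letter cases are covered (no UNC paths, no HTTP).

```python
import re


def os_path_from_file_uri(uri: str) -> str:
    """Strip the file: prefix, keeping a drive letter if there is one."""
    return re.sub(r'^file:/+(([a-z]:)/)?', r'\2/', uri, flags=re.IGNORECASE)


def local_href_from_os_path(path: str) -> str:
    """Turn an absolute OS path into a file: URI."""
    if re.match(r'[a-z]:', path, re.IGNORECASE):
        # Windows path: normalize backslashes to forward slashes.
        return 'file:///' + path.replace('\\', '/')
    return 'file:' + path  # absolute Unix path


def lastpath(local_href: str) -> str:
    """Last path component, without any trailing slash."""
    return re.sub(r'^.+/([^/]+)/*$', r'\1', local_href)


print(os_path_from_file_uri('file:///C:/temp/file.docx'))
print(local_href_from_os_path('/tmp/file.docx'))
print(lastpath('file:/tmp/subdir/'))
```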
Input Ports
Name | Documentation | Connections |
---|---|---|
sourceⓅ | Exists only to prevent the default readable port from being connected to the catalog or resolver ports. | |
catalog | If it is a <catalog> document in the namespace urn:oasis:names:tc:entity:xmlns:xml:catalog , it will be used for catalog resolution of URIs that start with 'http'. | |
resolver | An XSLT stylesheet that provides the named template (invoked with template-name = 'resolve' by the catalog-resolve step in the subpipeline). By default, this step only provides trivial (i.e., identity) catalog resolution. You have to supply an XSLT-based catalog resolver on the resolver port in order to use catalog resolution. That is because native catalog resolution is not available for p:http-request or through an XPath function. This means that you can’t programmatically decide whether to retrieve a file via p:http-request or use the local file. You may use the repository version of the XSLT-based resolver. However, in order to avoid network traffic, you should consider using a local copy. In order to avoid importing it via its absolute or relative file system path, you should use the transpect approach of importing the resolver’s XML catalog via |
Output Ports
Name | Documentation | Connections |
---|---|---|
resultⓅ | A c:result document with a local-href and an os-path attribute. |
Options
Name | Documentation | Default |
---|---|---|
filename | A URI or an OS-specific identifier. Relative paths will be resolved against the current working directory (see the cwd-uri and resolved-uri steps in the subpipeline). A future improvement might use the XSLT-based catalog resolver in order to detect whether a given http: URL will actually resolve to a local file. | |
make-unique | Whether to store files retrieved over HTTP with a unique random name in the temp dir. | 'true' |
fetch-http | Whether to fetch files referenced by URIs matching '^https?:'. | 'true' |
tmpdir | URI or OS name of a directory for storing files retrieved via HTTP. | '' |
Subpipeline
Step | Inputs | Outputs | Options | ||||||
---|---|---|---|---|---|---|---|---|---|
p:xslt catalog-resolve | result | template-name = 'resolve' | |||||||
p:sink d53e308 |
| ||||||||
pos:info info | result | ||||||||
p:add-attribute empty-result |
| result | attribute-name = 'cwd' match = '/*' attribute-value = if (/*/@file-separator = '/') then replace(/*/@cwd, '\\', '/') else /*/@cwd | ||||||
p:set-attributes add-orig-href If the URL has been catalog-resolved, the original URL will be copied here from the preceding XSLT step, in an orig-href attribute. Apart from that, the XSLT step has to produce an href attribute. Please note that despite its name, the @href attribute doesn’t necessarily contain a URI. If $filename is an OS path, @href will contain this path. |
| result | match = '/c:result' | ||||||
p:group d53e340 | |||||||||
p:variable catalog-resolved-uri | /c:result/@href | ||||||||
p:choose analyze-filename | |||||||||
matches($catalog-resolved-uri, '^file://///[^/]') | Windows UNC path URI. file:///// → \\ . | ||||||||
p:add-attribute d53e351 |
| result | attribute-name = 'local-href' match = '/*' attribute-value = $catalog-resolved-uri | ||||||
p:add-attribute d53e364 | result | match = '/*' attribute-name = 'os-path' attribute-value = replace(replace($catalog-resolved-uri, '^file:///', ''), '/', '\\') | |||||||
matches($catalog-resolved-uri, '^file:/') | Unix file URI or Windows file: URI containing a drive letter. | ||||||||
p:add-attribute d53e375 | result | match = '/*' attribute-name = 'local-href' attribute-value = $catalog-resolved-uri | |||||||
p:add-attribute d53e380 | result | match = '/*' attribute-name = 'os-path' attribute-value = replace($catalog-resolved-uri, '^file:/+(([a-z]:)/)?', '$2/', 'i') | |||||||
matches($catalog-resolved-uri, '^/') | Unix Filename | ||||||||
p:add-attribute d53e391 | result | match = '/*' attribute-name = 'local-href' attribute-value = concat('file:', $catalog-resolved-uri) | |||||||
p:add-attribute d53e396 | result | match = '/*' attribute-name = 'os-path' attribute-value = $catalog-resolved-uri | |||||||
matches($catalog-resolved-uri, '^[a-z]:', 'i') | Windows path, either with forward or backward slashes. | ||||||||
p:add-attribute d53e407 | result | match = '/*' attribute-name = 'local-href' attribute-value = concat('file:///', replace($catalog-resolved-uri, '\\', '/')) | |||||||
p:add-attribute d53e412 | result | match = '/*' attribute-name = 'os-path' attribute-value = $catalog-resolved-uri | |||||||
matches($catalog-resolved-uri, '^https?:') and $fetch-http = 'true' | HTTP URL. Since there is no system property for a temp dir, store it in the subdir tmp of the user’s home dir. Optionally generate a random name. | ||||||||
p:uuid uuid |
| result | match = '/*/@uuid' | ||||||
p:sink d53e434 | |||||||||
transpect:file-uri tmp-dir |
| result | filename = ($tmpdir[normalize-space()], concat(/c:result/@user-home, '/tmp/'))[1] | ||||||
p:group d53e444 | |||||||||
p:variable tmp-dir-href | /c:result/@local-href | ||||||||
p:add-attribute local-href | result | attribute-name = 'local-href' match = '/*' attribute-value = concat( $tmp-dir-href, replace( replace($catalog-resolved-uri, '^.+/', ''), '(.+?)([.?#].+)?', '$1' ), if ($make-unique = 'true') then concat('_', substring(/*/@uuid, 1, 8)) else '', replace(replace(replace($catalog-resolved-uri, '^.+/', ''), '^[^?#.]+', ''), '[?#].*$', '') ) | |||||||
p:sink d53e464 |
| ||||||||
p:identity d53e466 |
| result | |||||||
p:add-attribute d53e477 | result | match = '/c:request' attribute-name = 'href' attribute-value = $catalog-resolved-uri | |||||||
p:http-request http-request | result | ||||||||
p:choose store-http-resource | |||||||||
not(starts-with(/c:response/@status, '2')) | |||||||||
cx:message d53e489 |
| result | message = concat('Cannot retrieve ', $catalog-resolved-uri, '. Status: ', /c:response/@status) | ||||||
p:sink d53e494 | |||||||||
p:add-attribute d53e496 |
| result | attribute-name = 'error-status' match = '/c:result' attribute-value = /c:response/@status | ||||||
/c:response/c:body/(.[normalize-space(.)] | c:data) | |||||||||
p:store d53e517 |
| result | href = /doc/@local-href | ||||||
transpect:file-uri http-to-local-result_binary | result | filename = /doc/@local-href | |||||||
p:otherwise | |||||||||
p:store d53e541 |
| result | omit-xml-declaration = 'false' href = /doc/@local-href | ||||||
transpect:file-uri http-to-local-result_xml | result | filename = /doc/@local-href | |||||||
p:add-attribute d53e564 |
| result | match = '/c:result' attribute-name = 'href' attribute-value = $catalog-resolved-uri | ||||||
matches($catalog-resolved-uri, '^\\\\[^\\]') | Windows UNC path. \\ → file:///// . | ||||||||
p:add-attribute d53e577 |
| result | attribute-name = 'os-path' match = '/*' attribute-value = $catalog-resolved-uri | ||||||
p:add-attribute d53e590 | result | match = '/*' attribute-name = 'local-href' attribute-value = concat('file:///', replace($catalog-resolved-uri, '\\', '/')) | |||||||
p:otherwise | Other protocol or relative filename. We don’t support other protocols/notations, so we assume it to be a relative path. | ||||||||
transpect:file-uri cwd-uri | result | filename = concat(/c:result/@cwd, '/') | |||||||
transpect:file-uri resolved-uri | result | filename = resolve-uri($catalog-resolved-uri, /c:result/@local-href) | |||||||
p:add-attribute lastpath |
| result | attribute-name = 'lastpath' match = '/*' attribute-value = replace(/*/@local-href, '^.+/([^/]+)/*$', '$1') |