Converters
1. Introduction
Converters are processors that convert XML documents from one format to another. For
example, the standard HTML converter documented below converts an XML document into
an HTML document. This HTML document can then be sent to a web browser using the
HTTP serializer, or attached to an email with the Email processor.
Converters typically have a data output containing the converted document.
2. Standard Converters
The standard converters convert XML infosets (the XML documents that circulate in
PresentationServer pipelines) into text according to standard output methods
defined by the XSLT specification. They convert to the following formats:
- XML: a standard XML document
- HTML: a standard HTML document
- Text: any text document
The resulting text is sent to the data output. It is embedded in an XML
document as specified by the text
document format.
2.1 Configuration
The configuration of the standard converters consists of the following optional
elements:
Element | Purpose | Default
method | XSLT output method | html, xml or text, depending on the converter
content-type | Content type hint specified on the output document element | Specific to each converter
encoding | Encoding hint specified on the output document element | utf-8
version | HTML or XML version number | 4.01 for HTML (ignored for XML, which always outputs 1.0)
public-doctype | The public doctype | "-//W3C//DTD HTML 4.01 Transitional//EN" for HTML, none otherwise
system-doctype | The system doctype | "http://www.w3.org/TR/html4/loose.dtd" for HTML, none otherwise
omit-xml-declaration | Specifies whether an XML declaration must be omitted | false for XML and HTML (i.e. a declaration is output by default), ignored otherwise
standalone | If true, specifies standalone="yes" in the document declaration. If false, specifies standalone="no" in the document declaration. If missing, no standalone attribute is produced. For more information about standalone document declarations, please refer to the relevant section of the XML specification. In most cases, this does not need to be specified. | not specified for XML, ignored otherwise
indent | Specifies whether the output is indented. This means that line breaks may be inserted between adjacent elements. The actual level of indentation is specified with the indent-amount configuration element. | true (ignored for text method)
indent-amount | Specifies the number of indentation spaces | 1 (ignored for text method)
Example:
<config>
    <content-type>text/html</content-type>
    <encoding>utf-8</encoding>
    <version>4.01</version>
    <public-doctype>-//W3C//DTD HTML 4.01//EN</public-doctype>
    <system-doctype>http://www.w3.org/TR/html4/strict.dtd</system-doctype>
    <indent-amount>4</indent-amount>
</config>
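The options specific to XML output can be combined in the same way. The following is a minimal sketch that uses only elements listed in the table above; the chosen values (a standalone declaration and 4 spaces of indentation) are arbitrary assumptions made for illustration, not recommended settings.

```xml
<!-- Hypothetical configuration for XML output; all values are illustrative -->
<config>
    <method>xml</method>
    <encoding>utf-8</encoding>
    <!-- Keep the XML declaration (the documented default for XML and HTML) -->
    <omit-xml-declaration>false</omit-xml-declaration>
    <!-- Produces standalone="yes" in the document declaration -->
    <standalone>true</standalone>
    <!-- Allow line breaks between adjacent elements, indented by 4 spaces -->
    <indent>true</indent>
    <indent-amount>4</indent-amount>
</config>
```

Leaving out the standalone element entirely should produce no standalone attribute, which the table describes as sufficient in most cases.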
2.2 XML Converter
The XML converter outputs an XML document that conforms to the semantics of the
XSLT xml output method. By default, the output is indented with no spaces and
encoded using the UTF-8 character set. The default MIME content type is
application/xml. The following is a simple XML converter example:
<p:processor name="oxf:xml-converter" xmlns:p="http://www.orbeon.com/oxf/pipeline">
    <p:input name="config">
        <config>
            <content-type>application/xml</content-type>
            <encoding>iso-8859-1</encoding>
            <version>1.0</version>
        </config>
    </p:input>
    <p:input name="data" href="oxf:/my-xml-document.xml"/>
    <p:output name="data" id="xml-document"/>
</p:processor>
This is an example of output produced by the XML converter:
<document xsi:type="xs:string" content-type="application/xml; charset=iso-8859-1">
<?xml version="1.0" encoding="iso-8859-1" standalone="no"?>
<claim xmlns="http://orbeon.org/oxf/examples/bizdoc/claim">
    <insured-info>
        <general-info>
            <name-info>
                <title-prefix>Dr.</title-prefix>
                <last-name>Doe</last-name>
                <first-name>John</first-name>
                <title-suffix/>
            </name-info>
            <address>
                <address-detail>
                    <street-name>N Columbus Dr.</street-name>
                    <street-number>511</street-number>
                    <unit-number/>
                </address-detail>
                <city>Chicago</city>
                <state-province>IL</state-province>
                <postal-code>60611</postal-code>
                <country>USA</country>
                <email>jdoe@acme.org</email>
            </address>
        </general-info>
        <person-info>
            <gender-code>M</gender-code>
            <birth-date>1972-10-01</birth-date>
            <marital-status-code>C</marital-status-code>
            <occupation>Manager</occupation>
        </person-info>
        <family-info>
            <children>
                <child>
                    <birth-date>2003-02-02</birth-date>
                    <first-name>Marco</first-name>
                </child>
                <child>
                    <birth-date/>
                    <first-name/>
                </child>
            </children>
            <comments>No comments at this point!</comments>
        </family-info>
        <claim-info>
            <accident-type>FOOT</accident-type>
            <accident-date>2004-07-06</accident-date>
            <rate/>
        </claim-info>
    </insured-info>
</claim>
</document>
2.3 HTML Converter
The HTML converter outputs an HTML document that conforms to the semantics of the
XSLT html output method. By default, the doctype is set to HTML 4.01 Transitional
and the content is indented with no spaces and encoded using the UTF-8 character
set. The default content type is text/html. The following is a simple HTML
converter example:
<p:processor name="oxf:html-converter" xmlns:p="http://www.orbeon.com/oxf/pipeline">
    <p:input name="config">
        <config>
            <content-type>text/html</content-type>
            <encoding>iso-8859-1</encoding>
            <public-doctype>-//W3C//DTD HTML 4.01 Transitional//EN</public-doctype>
            <version>4.01</version>
        </config>
    </p:input>
    <p:input name="data">
        <html>
            <head>
                <title>My HTML document</title>
            </head>
            <body>
                <p>This is the content of the HTML document.</p>
            </body>
        </html>
    </p:input>
    <p:output name="data" id="html-document"/>
</p:processor>
This is an example of output produced by the HTML converter:
<document xsi:type="xs:string" content-type="text/html; charset=iso-8859-1">
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
    <head>
        <title>My HTML document</title>
    </head>
    <body>
        <p>This is the content of the HTML document.</p>
    </body>
</html>
</document>
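As mentioned in the introduction, the converted HTML document can be sent to a web browser with the HTTP serializer. The following sketch shows how the two processors might be chained. The ids page and html-document are hypothetical, and the empty configuration passed to the HTTP serializer is an assumption; consult the HTTP serializer documentation for the elements it actually requires.

```xml
<!-- Convert the XML infoset identified by the hypothetical id "page" to HTML -->
<p:processor name="oxf:html-converter" xmlns:p="http://www.orbeon.com/oxf/pipeline">
    <p:input name="config">
        <config>
            <encoding>utf-8</encoding>
        </config>
    </p:input>
    <p:input name="data" href="#page"/>
    <p:output name="data" id="html-document"/>
</p:processor>
<!-- Send the converted document to the browser -->
<p:processor name="oxf:http-serializer" xmlns:p="http://www.orbeon.com/oxf/pipeline">
    <p:input name="config">
        <!-- Assumption: an empty configuration is accepted -->
        <config/>
    </p:input>
    <p:input name="data" href="#html-document"/>
</p:processor>
```

The only link between the two steps is the id on the converter's data output, which the serializer references in its data input.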
2.4 Text Converter
The Text converter outputs a text document that conforms to the semantics of the
XSLT text output method. By default, the output is encoded using the UTF-8
character set. This converter is typically useful for pipelines generating
Comma-Separated Values (CSV) files. The default content type is text/plain. The
following is a simple Text converter example:
<p:processor name="oxf:text-converter" xmlns:p="http://www.orbeon.com/oxf/pipeline">
    <p:input name="config">
        <config/>
    </p:input>
    <p:input name="data">
        <document>This is just plain text. It will be output without the <em>text</em> and <em>em</em> elements.
        </document>
    </p:input>
    <p:output name="data" id="text-document"/>
</p:processor>
This is an example of output produced by the Text converter:
<document xsi:type="xs:string" content-type="text/plain; charset=utf-8">This is just plain text. It will be output without the text and em elements.
</document>
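For the CSV use case mentioned above, a configuration along the following lines could be used. This is only a sketch: the text/csv content type, the csv-document id, and the inline CSV rows are assumptions made for illustration. In a real pipeline the text would more likely be produced by an earlier step than written inline.

```xml
<p:processor name="oxf:text-converter" xmlns:p="http://www.orbeon.com/oxf/pipeline">
    <p:input name="config">
        <config>
            <!-- Override the default text/plain content type -->
            <content-type>text/csv</content-type>
            <encoding>utf-8</encoding>
        </config>
    </p:input>
    <p:input name="data">
        <!-- Hypothetical CSV content; only the text nodes are output -->
        <document>id,lastname,firstname
5398,Aas,Nils
5028,Abbasi,Ali</document>
    </p:input>
    <p:output name="data" id="csv-document"/>
</p:processor>
```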
3. XSL-FO Converter
The XSL-FO Converter produces PDF documents from an XSL-FO description of the page. The default
content type is application/pdf.
The resulting binary stream is sent to the data output. It is embedded
in an XML document as specified by the binary document format.
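By analogy with the text document format shown for the standard converters, the data output can be pictured as follows. This is a sketch under the assumption that the binary document format carries Base64-encoded content typed as xs:base64Binary; the payload is deliberately left out.

```xml
<!-- Assumed shape of the XSL-FO converter's data output -->
<document xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xmlns:xs="http://www.w3.org/2001/XMLSchema"
          xsi:type="xs:base64Binary"
          content-type="application/pdf">
    <!-- Base64-encoded PDF bytes would appear here -->
</document>
```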
4. XLS Converters
PresentationServer ships with the POI library which allows import and
export of Microsoft Excel documents. PresentationServer uses an Excel file
template to define the layout of the spreadsheet. You define cells that will contain
the values with a special markup.
4.1 Preparing the Spreadsheet
First, create an Excel spreadsheet with the formatting of your choosing. Then apply
special markup to each cell that should receive a value:
- Select the cell.
- Go to the menu Format->Cell.
- In the Number tab, choose the Custom format and enter a format that looks
  like: #,##0;"/a/b|/c/d". In this example there are two XPath expressions
  separated by a pipe character (|): /a/b and /c/d. The first XPath expression is
  used when creating the Excel file (exporting) and is run against the data input
  document of the To XLS converter. The second expression is optional and is used
  when recreating an XML document from the Excel file (importing with the From XLS
  converter).
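To make the export side of this markup concrete, here is a hypothetical data input document for the To XLS converter that matches the first expression of the sample format. The element names a and b come from the example expression /a/b; the value 1234 is invented for illustration.

```xml
<!-- Hypothetical data input for the To XLS converter -->
<a>
    <!-- Selected by the export expression /a/b and written to the marked cell -->
    <b>1234</b>
</a>
```

With the #,##0 number format from the example, Excel would be expected to display this value with a thousands separator. The second expression, /c/d, plays no role during export.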
4.2 To XLS Converter
The To XLS converter takes a config input describing the XLS
template file, and a data input containing the values to be
inserted in the template. The processor scans the template and applies the XPath
expressions to fill it in. It returns a binary document on its
data output.
The config input takes a single config element with one attribute:
template | A URL pointing to an XLS template file
<p:processor name="oxf:xls-serializer" xmlns:p="http://www.orbeon.com/oxf/pipeline">
    <p:input name="config">
        <config template="oxf:/excel/template.xls"/>
    </p:input>
    <p:input name="data">
        <currency>
            <value1>10</value1>
            <value2>20</value2>
            <value3>30</value3>
        </currency>
    </p:input>
</p:processor>
The config element can also contain zero or more
repeat-row elements with two attributes, row-num and
for-each.
The To XLS converter is typically connected to the HTTP serializer. This allows
specifying headers such as Content-Disposition:
<!-- Convert to XLS -->
<p:processor name="oxf:to-xls-converter" xmlns:p="http://www.orbeon.com/oxf/pipeline">
    <p:input name="config">
        <config template="oxf:/examples/employees/export-excel/employees.xls">
            <repeat-row row-num="3" for-each="employees/employee"/>
        </config>
    </p:input>
    <p:input name="data" href="#workbook"/>
    <p:output name="data" id="xls-binary"/>
</p:processor>
<!-- Serialize -->
<p:processor name="oxf:http-serializer" xmlns:p="http://www.orbeon.com/oxf/pipeline">
    <p:input name="data" href="#xls-binary"/>
    <p:input name="config">
        <config>
            <header>
                <name>Content-Disposition</name>
                <value>attachment; filename=employees.xls</value>
            </header>
        </config>
    </p:input>
</p:processor>
4.3 From XLS Converter
The From XLS converter takes an Excel file (for example uploaded with an XForms
upload control), finds special markup cells and reconstructs an XML document
from this markup. The converter has one data input which must
receive a binary document, and
a data output containing the generated XML document. Assume the
following XForms model:
<xf:model xmlns:xf="http://www.w3.org/2002/xforms">
    <xf:instance>
        <form>
            <action/>
            <files>
                <file filename="" mediatype="" size="" xsi:type="xs:anyURI"/>
            </files>
        </form>
    </xf:instance>
    <xf:submission method="post" encoding="multipart/form-data"/>
</xf:model>
The model can be filled with the following XForms controls:
<xforms:group ref="/form" xmlns:xforms="http://www.w3.org/2002/xforms">
    <p>
        <xforms:upload ref="files/file[1]"/>
        <xforms:submit>
            <xforms:label>Submit</xforms:label>
            <xforms:setvalue ref="action">import</xforms:setvalue>
        </xforms:submit>
    </p>
</xforms:group>
Then the following pipeline can extract the data from the uploaded file:
<!-- Dereference URI stored in instance and return a binary -->
<p:processor name="oxf:url-generator" xmlns:p="http://www.orbeon.com/oxf/pipeline">
    <p:input name="config" href="aggregate('config', aggregate('url', #instance#xpointer(string(/form/files/file[1]))), aggregate('content-type', #instance#xpointer('application/octet-stream')))"/>
    <p:output name="data" id="xls-binary"/>
</p:processor>
<!-- Convert file to XML -->
<p:processor name="oxf:from-xls-converter" xmlns:p="http://www.orbeon.com/oxf/pipeline">
    <p:input name="data" href="#xls-binary"/>
    <p:output name="data" id="xls"/>
</p:processor>
This is an example of the document returned, given an appropriate configuration of
the Excel template:
<workbook>
    <sheet>
        <employees>
            <employee-id>5398</employee-id>
            <firstname>Nils</firstname>
            <lastname>Aas</lastname>
            <phone>(555) 123 0434</phone>
            <title>Norwegian sculptor and illustrator</title>
            <age>70</age>
            <manager-id/>
            <employee-id>5028</employee-id>
            <firstname>Ali</firstname>
            <lastname>Abbasi</lastname>
            <phone>(555) 123 0060</phone>
            <title>BBC Scotland travel presenter</title>
            <age>42</age>
            <manager-id/>
        </employees>
    </sheet>
</workbook>