



































|
|
|
This
document describes the transformation API for XML (TrAX), the set
of APIs contained in
javax.xml.transform,
javax.xml.transform.stream,
javax.xml.transform.dom, and
javax.xml.transform.sax.
There is a
broad need for Java applications to be able to transform XML and
related tree-shaped data structures. In fact, XML is not normally
very useful to an application without going through some sort of
transformation, unless the semantic structure is used directly as
data. Almost all XML-related applications need to perform
transformations. Transformations may be described by Java code,
Perl code, XSLT
Stylesheets, other types of script, or by proprietary formats. The
inputs to a transformation, one or more, may be a URL, an XML
stream, a DOM tree, SAX events, or a proprietary format or data
structure. The output types are much the same as the input
types, but different inputs may need to be combined with
different outputs.
The great
challenge of a transformation API is how to deal with all the
possible combinations of inputs and outputs, without becoming
specialized for any of the given types.
The Java
community will greatly benefit from a common API that will allow
them to understand and apply a single model, write to consistent
interfaces, and apply the transformations polymorphically. TrAX
attempts to define a model that is clean and generic, yet fills
general application requirements across a wide variety of
uses.
|
|
|
This
section will explain some general terminology used in this
document. Technical terminology will be explained in the Model
section. In many cases, the general terminology overlaps with the
technical terminology.
- Tree
- This
term, as used within this document, describes an abstract structure
that consists of nodes or events that may be produced by XML. A
Tree physically may be a DOM tree, a series of well-balanced parse
events (such as those coming from a SAX2 ContentHandler), a series
of requests (the result of which can describe a tree), or a stream
of marked-up characters.
- Source
Tree(s)
- One or
more trees that are the inputs to the transformation.
- Result
Tree(s)
- One or
more trees that are the output of the transformation.
- Transformation
- The
process of consuming a stream or tree to produce another stream or
tree.
- Identity
(or Copy) Transformation
- The
process of transforming a source into a result while making as few
structural changes as possible and no informational changes. The
term is used somewhat loosely, as the process is really a copy
from one "format" (such as a DOM tree, stream, or set of SAX
events) to another.
- Serialization
- The
process of taking a tree and turning it into a stream. In some
sense, a serialization is a specialized transformation.
- Parsing
- The
process of taking a stream and turning it into a tree. In some
sense, parsing is a specialized transformation.
- Transformer
- A
Transformer is the object that executes the
transformation.
- Transformation
instructions
- A description of
the transformation: a form of code, script, or simply a declaration
or series of declarations.
- Stylesheet
- The same
as "transformation instructions," except it is likely to be used in
conjunction with XSLT.
- Templates
- Another
form of "transformation instructions." In the TrAX interface, this
term is used to describe processed or compiled transformation
instructions. The Source flows through a Templates object to be
formed into the Result.
- Processor
- A general
term for the component that may both process the transformation
instructions and perform the transformation.
- DOM
- Document
Object Model, specifically referring to the Document Object Model
(DOM) Level 2 Specification.
- SAX
- Simple
API for XML, specifically referring to the SAX 2.0
release.
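The terms Parsing and Identity Transformation above can be illustrated with a short sketch. It assumes the processor accepts stream input and DOM output, and the class name `ParsingSketch` and the sample XML are hypothetical. A Transformer created without any transformation instructions copies its Source to its Result, so pointing it at a stream Source and a DOM Result amounts to parsing.

```java
import org.w3c.dom.Node;

import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.stream.StreamSource;
import java.io.StringReader;

// Hypothetical illustration class; not part of TrAX.
class ParsingSketch {

    // Copies a stream of marked-up characters into a DOM tree and
    // returns the name of the root element of that tree.
    static String rootName(String xml) throws TransformerException {
        // newTransformer() without arguments yields an identity (copy) Transformer.
        Transformer identity = TransformerFactory.newInstance().newTransformer();

        // An empty DOMResult lets the processor create the result node itself.
        DOMResult tree = new DOMResult();
        identity.transform(new StreamSource(new StringReader(xml)), tree);

        // The input has no comments or processing instructions before the root,
        // so the first child of the result node is the root element.
        Node root = tree.getNode().getFirstChild();
        return root.getNodeName();
    }

    public static void main(String[] args) throws TransformerException {
        System.out.println(rootName("<catalog><item/></catalog>"));
    }
}
```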
|
|
|
The
following requirements are drawn from the broad experience
with XML projects of the various members participating in the
JCP.
- TrAX must
provide a clean, simple interface for simple uses.
- TrAX must
be powerful enough to be applied to a wide range of uses, such as
e-commerce, content management, server content delivery, and client
applications.
- A
processor that implements a TrAX interface must be optimizable.
Performance is a critical issue for most transformation use
cases.
- As a
specialization of the above requirement, a TrAX processor must be
able to support a compiled model, so that a single set of
transformation instructions can be compiled, optimized, and applied
to a large set of input sources.
- TrAX must
not be dependent on any given type of transformation instructions.
For instance, it must remain independent of XSLT.
- TrAX must
be able to allow processors to transform DOM trees.
- TrAX must
be able to allow processors to produce DOM trees.
- TrAX must
allow processors to transform SAX events.
- TrAX must
allow processors to produce SAX events.
- TrAX must
allow processors to transform streams of XML.
- TrAX must
allow processors to produce XML, HTML, and other types of
streams.
- TrAX must
allow processors to implement the various combinations of inputs
and outputs within a single processor.
- TrAX must
allow processors to implement only a limited set of inputs. For
instance, it should be possible to write a processor that
implements the TrAX interfaces and that only processes DOM trees,
not streams or SAX events.
- TrAX
should allow a processor to implement transformations of
proprietary data structures. For instance, it should be possible to
implement a processor that provides TrAX interfaces and performs
transformations of JDOM trees.
- TrAX must
allow the setting of serialization properties, without constraint
as to what the details of those properties are.
- TrAX must
allow the setting of parameters to the transformation
instructions.
- TrAX must
support the setting of parameters and properties as XML Namespaced
items (i.e., qualified names).
- TrAX must
support URL resolution from within the transformation, and have it
return the needed data structure.
- TrAX must
have a mechanism for reporting errors and warnings to the calling
application.
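Several of the requirements above (parameters, serialization properties, and error reporting) map onto methods of the Transformer object. The following is a minimal sketch, assuming an XSLT processor is available; the class name, the stylesheet, and the parameter name `who` are hypothetical and chosen only for illustration.

```java
import javax.xml.transform.ErrorListener;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; not part of TrAX.
class ParametersAndPropertiesSketch {

    // Hypothetical stylesheet with one top-level parameter named "who".
    static final String XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:param name='who' select=\"'nobody'\"/>"
      + "<xsl:template match='/'>Hello, <xsl:value-of select='$who'/>!</xsl:template>"
      + "</xsl:stylesheet>";

    // Warnings reported by the processor are collected here.
    static final List<String> warnings = new ArrayList<>();

    static String greet(String who) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSLT)));

        // Error reporting: collect warnings, rethrow errors to the caller.
        transformer.setErrorListener(new ErrorListener() {
            public void warning(TransformerException e) { warnings.add(e.getMessage()); }
            public void error(TransformerException e) throws TransformerException { throw e; }
            public void fatalError(TransformerException e) throws TransformerException { throw e; }
        });

        // Parameter for the transformation instructions. A namespaced
        // parameter would be named "{namespace-uri}local-name".
        transformer.setParameter("who", who);

        // Serialization property: emit plain text rather than XML.
        transformer.setOutputProperty(OutputKeys.METHOD, "text");

        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader("<doc/>")),
                              new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        System.out.println(greet("world"));
    }
}
```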
|
|
|
This
section defines the abstract model for TrAX, apart from the details
of the interfaces.
A TrAX TransformerFactory
is an object that processes transformation instructions, and
produces Templates (in
the technical terminology). A Templates object provides a
Transformer, which
transforms one or more Sources into one or more Results.
To use the
TrAX interface, you create a TransformerFactory,
which may directly provide a Transformer, or which can
provide Templates from a
variety of Sources. The Templates object is a
processed or compiled representation of the transformation
instructions, and provides a Transformer. The Transformer processes a Source according to the
instructions found in the Templates, and produces a Result.
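The flow just described (factory, then Templates, then Transformer, then Result) can be sketched as follows. This is an illustration under stated assumptions rather than a reference usage: it assumes XSLT transformation instructions, and the class name `TraxModelSketch`, the stylesheet, and the input document are hypothetical.

```java
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import java.io.StringReader;
import java.io.StringWriter;

// Hypothetical illustration class; not part of TrAX.
class TraxModelSketch {

    // Hypothetical transformation instructions: wrap the text of <name>
    // in a <greeting> element.
    static final String XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='/name'>"
      + "<greeting>Hello, <xsl:value-of select='.'/></greeting>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    static String transform(String xml) throws TransformerException {
        // 1. The factory processes the transformation instructions...
        TransformerFactory factory = TransformerFactory.newInstance();

        // 2. ...and produces Templates, the processed or compiled form.
        Templates templates = factory.newTemplates(new StreamSource(new StringReader(XSLT)));

        // 3. The Templates object provides a Transformer.
        Transformer transformer = templates.newTransformer();

        // 4. The Transformer turns a Source into a Result.
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        System.out.println(transform("<name>TrAX</name>"));
    }
}
```

When the instructions are applied only once, `factory.newTransformer(source)` skips the explicit Templates step; keeping the Templates object is what allows the compiled instructions to be reused across many inputs.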
The
process of transformation from a tree, either in the form of an
object model, or in the form of parse events, into a stream, is
known as serialization. We believe this is the most suitable
term for this process, despite the overlap with Java object
serialization.
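Serialization in this sense can be sketched with an identity Transformer that takes a DOM tree as its Source and a character stream as its Result. The class name and the sample document below are hypothetical, and the sketch assumes the processor supports DOM input and stream output.

```java
import org.w3c.dom.Document;
import org.w3c.dom.Element;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import java.io.StringWriter;

// Hypothetical illustration class; not part of TrAX.
class SerializationSketch {

    // Builds a small in-memory tree: <order id="42">widget</order>
    static Document buildTree() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("order");
        root.setAttribute("id", "42");
        root.setTextContent("widget");
        doc.appendChild(root);
        return doc;
    }

    // Turns the tree into a stream of marked-up characters.
    static String serialize(Document doc) throws Exception {
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        // Leave out the XML declaration so only the element markup is written.
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        StringWriter out = new StringWriter();
        identity.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(serialize(buildTree()));
    }
}
```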
|
|
|
The
intent, responsibilities, and thread safety of TrAX
objects:
|
|
Processor
- Intent
- Generic concept for the set of objects that implement the
TrAX interfaces.
- Responsibilities
- Create compiled transformation instructions, transform
sources, and manage transformation parameters and
properties.
- Thread safety
- Only the Templates object can be used concurrently in
multiple threads. The rest of the processor does not perform
any synchronization, and so may not be used to perform multiple
concurrent operations.
|
|
|
TransformerFactory
- Intent
- Serve as a vendor-neutral Processor interface for XSLT and similar
processors.
- Responsibilities
- Serve as a factory for a concrete implementation of a
TransformerFactory, serve as a direct factory for Transformer
objects, serve as a factory for Templates objects, and manage
processor-specific features.
- Thread safety
- A TransformerFactory may not perform multiple concurrent
operations.
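Because a processor may implement only some of the possible inputs and outputs, an application can ask the factory which ones it supports before relying on them. The sketch below uses the `FEATURE` constants defined on the Source and Result classes; the class name and the map labels are hypothetical.

```java
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration class; not part of TrAX.
class FactoryFeaturesSketch {

    // Reports which kinds of Source and Result the installed processor accepts.
    static Map<String, Boolean> supportedFeatures() {
        TransformerFactory factory = TransformerFactory.newInstance();

        Map<String, Boolean> features = new LinkedHashMap<>();
        features.put("stream source", factory.getFeature(StreamSource.FEATURE));
        features.put("stream result", factory.getFeature(StreamResult.FEATURE));
        features.put("DOM source", factory.getFeature(DOMSource.FEATURE));
        features.put("DOM result", factory.getFeature(DOMResult.FEATURE));
        features.put("SAX source", factory.getFeature(SAXSource.FEATURE));
        features.put("SAX result", factory.getFeature(SAXResult.FEATURE));
        // When this is true, the factory may be cast to SAXTransformerFactory.
        features.put("SAX factory", factory.getFeature(SAXTransformerFactory.FEATURE));
        return features;
    }

    public static void main(String[] args) {
        supportedFeatures().forEach((name, ok) -> System.out.println(name + ": " + ok));
    }
}
```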
|
|
|
Templates
- Intent
- The runtime representation of the transformation
instructions.
- Responsibilities
- A data bag for transformation instructions; act as a
factory for Transformers.
- Thread safety
- Threadsafe for concurrent usage over multiple threads
once construction is complete.
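The thread-safety split between Templates and Transformer suggests a common arrangement: build one Templates object, share it, and let each task obtain its own Transformer. The following is a minimal sketch of that arrangement; the class name, the stylesheet, and the pool size are hypothetical.

```java
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration class; not part of TrAX.
class SharedTemplatesSketch {

    // Hypothetical stylesheet: writes the document's text inside brackets.
    static final String XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/'>"
      + "<xsl:value-of select=\"concat('[', ., ']')\"/>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    static List<String> transformAll(List<String> inputs) throws Exception {
        // Compile once; the finished Templates object is shared by all tasks.
        Templates templates = TransformerFactory.newInstance()
            .newTemplates(new StreamSource(new StringReader(XSLT)));

        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<String>> pending = new ArrayList<>();
            for (String xml : inputs) {
                pending.add(pool.submit(() -> {
                    // Each task creates its own Transformer; it is never shared.
                    Transformer transformer = templates.newTransformer();
                    StringWriter out = new StringWriter();
                    transformer.transform(new StreamSource(new StringReader(xml)),
                                          new StreamResult(out));
                    return out.toString();
                }));
            }
            // Collect results in submission order.
            List<String> results = new ArrayList<>();
            for (Future<String> f : pending) {
                results.add(f.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(transformAll(List.of("<v>a</v>", "<v>b</v>", "<v>c</v>")));
    }
}
```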
|
|
|
Transformer
- Intent
- Act as a per-thread execution context for
transformations, and act as an interface for performing the
transformation.
- Responsibilities
- Perform the transformation.
- Thread safety
- A Transformer instance may be used by only one thread at a
time; use a separate instance for each thread.
The Transformer is
bound to the Templates object that created it.
|
|
|
Source
- Intent
- Serve as a single vendor-neutral object for multiple
types of input.
- Responsibilities
- Act as a simple data holder for System IDs, DOM nodes,
streams, etc.
- Thread safety
- Safe for concurrent read-only use by multiple threads;
edit operations must be synchronized.
|
|
|
Result
Alternative name: ResultTarget.
- Intent
- Serve as a single object for multiple types of output, so
there can be simple process method signatures.
- Responsibilities
- Act as a simple data holder for an output stream, DOM node,
ContentHandler, etc.
- Thread safety
- Safe for concurrent read-only use by multiple threads;
edit operations must be synchronized.
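Since the input and output holders are plain data objects, the same transform call can be pointed at different kinds of output simply by changing the holder. The sketch below sends the output of an identity Transformer to a SAX ContentHandler instead of a stream; the class name and the counting handler are hypothetical, and the sketch assumes the processor supports SAX output.

```java
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.stream.StreamSource;
import java.io.StringReader;

// Hypothetical illustration class; not part of TrAX.
class SaxResultSketch {

    // Counts the elements in a document by receiving the result as SAX events.
    static int countElements(String xml) throws TransformerException {
        final int[] count = {0};

        // The handler is the real destination; the SAXResult only holds it.
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes) {
                count[0]++;
            }
        };

        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.transform(new StreamSource(new StringReader(xml)), new SAXResult(handler));
        return count[0];
    }

    public static void main(String[] args) throws TransformerException {
        System.out.println(countElements("<a><b/><c/></a>"));
    }
}
```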
|
|
|
|