Provides the tools for converting StarWriter XML to and from the AportisDoc format.

It follows the {@link org.openoffice.xmerge} framework for the conversion process.

Because AportisDoc is a Palm application format, these converters use the PalmDB stream format when writing out to, or reading in from, the Palm sync client.

Note that PluginFactoryImpl also provides a DocumentMerger object, i.e. {@link org.openoffice.xmerge.converter.xml.sxw.aportisdoc.DocumentMergerImpl DocumentMergerImpl}. This functionality is inherited from its superclass {@link org.openoffice.xmerge.converter.xml.sxw.SxwPluginFactory SxwPluginFactory}.

AportisDoc pdb format - Doc

The AportisDoc pdb format is widely used by different Palm applications, e.g. QuickWord, AportisDoc Reader, MiniWrite, etc. Note that some of these applications add their own tweaks to the format. The converters support only the default AportisDoc format, plus some very minor tweaks to accommodate other applications.

The text content of the format is plain text, i.e. there are no styles or structures. There is no notion of lists, list items, paragraphs, headings, etc. The format does have support for bookmarks.

For most Doc applications, the default supported character encoding is the extended ASCII character set, i.e. ISO-8859-1. StarWriter XML uses the UTF-8 encoding scheme. Since UTF-8 covers more characters, converting UTF-8 strings into extended ASCII can lose character mappings.

Using JAXP, XML files can be parsed and read in as Java Strings, which are in Unicode format, so there is no loss of character mapping from UTF-8 to Java Strings. Character mappings can be lost, however, when Java Strings are converted to extended ASCII bytes: Java characters that cannot be represented in extended ASCII are converted into the ASCII character '?' (0x3F in hex) by the String.getBytes(encoding) API.
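The lossy step described above can be illustrated with a minimal sketch. It is not taken from the converter sources: the class name Latin1LossDemo and the method toLatin1 are hypothetical, and the sketch uses String.getBytes(Charset) with StandardCharsets.ISO_8859_1, which is documented to substitute the charset's replacement byte for unmappable characters.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the xmerge code base.
final class Latin1LossDemo {

    /** Encodes text as ISO-8859-1; characters outside that set become '?' (0x3F). */
    static byte[] toLatin1(String text) {
        return text.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // U+00E9 (e with acute) exists in ISO-8859-1; U+20AC (euro sign) does not.
        String text = "caf\u00e9 \u20ac5";
        byte[] encoded = toLatin1(text);

        // Print each byte in hex so the substitution is visible regardless
        // of the console's own character encoding.
        StringBuilder hex = new StringBuilder();
        for (byte b : encoded) {
            hex.append(String.format("%02X ", b & 0xFF));
        }
        System.out.println(hex.toString().trim());
    }
}
```

In this sketch the accented character survives as the single byte 0xE9, while the euro sign is replaced by 0x3F and cannot be recovered when the bytes are decoded again.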

SXW to DOC Conversion

The DocumentSerializerImpl class implements the org.openoffice.xmerge.DocumentSerializer interface. This class specifically provides the conversion process from a given SxwDocument object to DOC formatted records, which are then passed back to the client via the ConvertData object.

The following XML tags are handled. [Note that some may not be implemented yet.]

There may be other tags that will still need to be addressed for this conversion.

Refer to {@link org.openoffice.xmerge.converter.xml.sxw.aportisdoc.DocumentSerializerImpl DocumentSerializerImpl} for implementation details. It uses the DocEncoder class to do the encoding.

DOC to SXW Conversion

The DocumentDeserializerImpl class implements the org.openoffice.xmerge.DocumentDeserializer interface. It is passed the device document in the form of a ConvertData object. It then creates an SxwDocument object by converting the DOC formatted records.

The text content of the Doc format will be transferred as text. Paragraph elements will be formed based on the existence of an ASCII LF character. There will be at least one paragraph element.
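As a rough illustration of the paragraph rule above, the following sketch splits decoded Doc text at each LF character and always yields at least one paragraph. The class DocParagraphSplitter and its method are hypothetical names, not the DocumentDeserializerImpl code, and treating text after a trailing LF as a final empty paragraph is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the xmerge code base.
final class DocParagraphSplitter {

    /** Splits text at each LF; the result always has at least one entry. */
    static List<String> splitParagraphs(String text) {
        List<String> paragraphs = new ArrayList<>();
        int start = 0;
        int lf;
        // Each LF closes the paragraph that began at 'start'.
        while ((lf = text.indexOf('\n', start)) >= 0) {
            paragraphs.add(text.substring(start, lf));
            start = lf + 1;
        }
        // Whatever follows the last LF (possibly nothing) is the final
        // paragraph, so even empty input produces one paragraph.
        paragraphs.add(text.substring(start));
        return paragraphs;
    }

    public static void main(String[] args) {
        for (String p : splitParagraphs("First line\nSecond line")) {
            System.out.println("[" + p + "]");
        }
    }
}
```

A real deserializer would wrap each resulting string in a paragraph element of the SxwDocument rather than return a list of strings.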

Bookmarks in the Doc format will be converted to the bookmark element <text:bookmark> [Not implemented yet].

Merging changes

As mentioned above, the DocumentMerger object produced by PluginFactoryImpl is DocumentMergerImpl. Refer to the javadocs for that package/class for its merging specifications.

TODO list

  1. Investigate Palm devices with different character encodings.
  2. Investigate other StarWriter XML tags.