<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<!--
#**************************************************************
#
#  Licensed to the Apache Software Foundation (ASF) under one
#  or more contributor license agreements.  See the NOTICE file
#  distributed with this work for additional information
#  regarding copyright ownership.  The ASF licenses this file
#  to you under the Apache License, Version 2.0 (the
#  "License"); you may not use this file except in compliance
#  with the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
#  Unless required by applicable law or agreed to in writing,
#  software distributed under the License is distributed on an
#  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
#  KIND, either express or implied.  See the License for the
#  specific language governing permissions and limitations
#  under the License.
#
#**************************************************************
 -->
<html>
<head>
<title>org.openoffice.xmerge.converter.xml.sxw.aportisdoc package</title>
</head>

<body bgcolor="white">

<p>Provides the tools for converting StarWriter XML to and from the
AportisDoc format.</p>

<p>It follows the {@link org.openoffice.xmerge} framework for the conversion process.</p>

<p>Because they convert to and from a Palm application format, these converters
follow the <a href="../../../../converter/palm/package-summary.html#streamformat">
<code>PalmDB</code> stream format</a> when writing to or reading from the
Palm sync client.</p>

<p>Note that <code>PluginFactoryImpl</code> also provides a
<code>DocumentMerger</code> object, i.e. {@link org.openoffice.xmerge.converter.xml.sxw.aportisdoc.DocumentMergerImpl DocumentMergerImpl}.
This functionality is derived from its superclass
{@link org.openoffice.xmerge.converter.xml.sxw.SxwPluginFactory
SxwPluginFactory}.</p>

<h2>AportisDoc pdb format - Doc</h2>

<p>The AportisDoc pdb format is widely used by different Palm applications,
e.g. QuickWord, AportisDoc Reader, MiniWrite, etc. Note that some
of these applications put tweaks into the format. The converters support
only the default AportisDoc format, plus some very minor tweaks to accommodate
other applications.</p>

<p>The text content of the format is plain text, i.e. there are no styles
or structures. There is no notion of lists, list items, paragraphs,
headings, etc. The format does have support for bookmarks.</p>

<p>For most Doc applications, the default character encoding supported is
the extended ASCII character set, i.e. ISO-8859-1. StarWriter XML uses the
UTF-8 encoding scheme. Since UTF-8 covers far more characters,
converting UTF-8 strings into extended ASCII can lose character
mappings.</p>

<p>Using JAXP, XML files can be parsed and read in as Java <code>String</code>s,
which are in Unicode format, so there is no loss of character mapping from UTF-8
to Java <code>String</code>s. Character mappings can be lost when
converting Java <code>String</code>s to extended ASCII bytes. Java characters that
cannot be represented in extended ASCII are converted into the ASCII
character '?' (0x3F) by the <code>String.getBytes(encoding)</code>
API.</p>

<h2>SXW to DOC Conversion</h2>

<p>The <code>DocumentSerializerImpl</code> class implements the
<code>org.openoffice.xmerge.DocumentSerializer</code> interface.
This class specifically provides the conversion process from a given
<code>SxwDocument</code> object to DOC formatted records, which are
then passed back to the client via the <code>ConvertData</code> object.</p>

<p>The following XML tags are handled.
[Note that some may not be implemented yet.]</p>
<ul>
<li>
    <p>Paragraphs <tt>&lt;text:p&gt;</tt> and Headings <tt>&lt;text:h&gt;</tt></p>

    <p>Heading elements are classified the same as paragraph
    elements since both have the same possible elements inside.
    Their main difference is that they refer to different types
    of style information, which is outside of their element tags.
    Since there are no styles in the DOC format, headings are
    converted the same way as paragraphs.</p>

    <p>For paragraph elements, convert and transfer the essential
    text nodes, i.e. the text nodes directly contained within paragraph
    nodes. There are also a number of elements that
    a paragraph element may contain. These are explained in their
    own context.</p>

    <p>At the end of the paragraph, an EOL character is added by
    the converter to provide a separation for each paragraph,
    since the Doc format does not have a notion of a paragraph.</p>
</li>
<li>
    <p>White spaces <tt>&lt;text:s&gt;</tt> and Tabs <tt>&lt;text:tab-stop&gt;</tt></p>

    <p>In SXW, normally 2 or more white-space characters are collapsed into
    a single space character. In order to make sure that the document
    content really contains those white-space characters, there are special
    elements assigned to them.</p>

    <p>The space element specifies the number of spaces it represents.
    Thus, converting it just means emitting the specific number of spaces
    that the element requires.</p>

    <p>There is also the tab-stop element. This is a bit tricky. In a
    StarWriter document, tab-stops are specified by a column position.
    A tab is not an exact number of spaces, but rather a specific column
    position. Say regular tab-stops are set at every 5th column.
    At column 4, hitting a tab moves the cursor to column 5. At column 1, hitting
    a tab puts the cursor at column 5 as well.
    SmartDoc and AportisDoc
    applications go by columns for the ASCII tab character. The only problem
    is that in StarWriter one can specify different tab-stops, but not
    in most of these Doc applications, at least I have not seen one that allows it.
    The solution is simply to convert to the ASCII tab
    character and not do anything for different tab-stop positions.</p>
</li>
<li>
    <p>Line breaks <tt>&lt;text:line-break&gt;</tt></p>

    <p>To represent line breaks, it is simplest to just emit an ASCII LF
    character. Note the side effect: an end of paragraph
    is also marked by an ASCII LF character. Thus, in the DOC to SXW conversion,
    line breaks cannot be distinguished from the end of a
    paragraph.</p>
</li>
<li>
    <p>Text spans <tt>&lt;text:span&gt;</tt></p>

    <p>Text spans contain text that has different style attributes
    from the enclosing paragraph's. Text spans can be embedded within another
    text span. Since spans are purely for style tagging, we only need
    to convert and transfer the text elements within them.</p>
</li>
<li>
    <p>Hyperlinks <tt>&lt;text:a&gt;</tt></p>

    <p>Convert and transfer the text portion.</p>
</li>
<li>
    <p>Bookmarks <tt>&lt;text:bookmark&gt;</tt> <tt>&lt;text:bookmark-start&gt;</tt>
    <tt>&lt;text:bookmark-end&gt;</tt> [Not implemented yet]</p>

    <p>In SXW, bookmark elements are embedded inside paragraph elements.
    Bookmarks can mark either a text position or a text range. <tt>&lt;text:bookmark&gt;</tt>
    marks a position, while the pair <tt>&lt;text:bookmark-start&gt;</tt> and
    <tt>&lt;text:bookmark-end&gt;</tt> marks a text range. The DOC format only
    supports bookmarking a text position.
    Thus, for the conversion,
    <tt>&lt;text:bookmark&gt;</tt> and <tt>&lt;text:bookmark-start&gt;</tt> will both mark
    a text position.</p>
</li>
<li>
    <p>Change Tracking <tt>&lt;text:tracked-changes&gt;</tt>
    <tt>&lt;text:change*&gt;</tt> [Not implemented yet]</p>

    <p>Change tracking elements are not yet supported by the current
    OpenOffice.org XML filters, so this needs to be watched. The text
    within these elements has to be interpreted properly during the
    conversion process.</p>
</li>
<li>
    <p>Lists <tt>&lt;text:unordered-list&gt;</tt> and
    <tt>&lt;text:ordered-list&gt;</tt></p>

    <p>A list can only contain one optional <tt>&lt;text:list-header&gt;</tt>
    and one or more <tt>&lt;text:list-item&gt;</tt> elements.</p>

    <p>A <tt>&lt;text:list-header&gt;</tt> contains one or more paragraph
    elements. Since there are no styles, the conversion process does not
    do anything special for list headers. Conversion of the paragraphs
    within list headers is the same as explained above.</p>

    <p>A <tt>&lt;text:list-item&gt;</tt> may contain one or more paragraphs,
    headings, lists, etc. Since the Doc format does not support any list
    structure, there is no special handling for this element.
    Conversion of the elements within it is applied according to the
    element type. Thus, a list with paragraphs within it results in just
    plain paragraphs. Sublists will not be identifiable, but paragraphs in
    sublists will still appear.</p>
</li>
<li>
    <p><tt>&lt;text:section&gt;</tt></p>

    <p>I am not sure what this is yet; it needs more investigation.</p>
</li>
</ul>
<p>There may be other tags that still need to be addressed for this conversion.</p>

<p>Refer to {@link org.openoffice.xmerge.converter.xml.sxw.aportisdoc.DocumentSerializerImpl DocumentSerializerImpl}
for details of the implementation.
It uses the <code>DocEncoder</code> class to do the
encoding.</p>

<h2>DOC to SXW Conversion</h2>

<p>The <code>DocumentDeserializerImpl</code> class implements the
<code>org.openoffice.xmerge.DocumentDeserializer</code> interface. It is
passed the device document in the form of a <code>ConvertData</code> object.
It then creates an <code>SxwDocument</code> object from the conversion of
the DOC formatted records.</p>

<p>The text content of the Doc format is transferred as text. Paragraph
elements are formed based on the existence of an ASCII LF character. There
will be at least one paragraph element.</p>

<p>Bookmarks in the Doc format will be converted to the bookmark element
<tt>&lt;text:bookmark&gt;</tt> [Not implemented yet].</p>


<h2>Merging changes</h2>

<p>As mentioned above, the <code>DocumentMerger</code> object produced by
<code>PluginFactoryImpl</code> is <code>DocumentMergerImpl</code>.
Refer to the javadocs for that class for its merging specifications.
</p>

<h2>TODO list</h2>

<ol>
<li>Investigate Palm devices with different character encodings.</li>
<li>Investigate other StarWriter XML tags.</li>
</ol>

</body>
</html>