commit | 295caa80be4cfc37e0e7bdd4aa3fcc6ad7b1a1b6 | |
---|---|---|
author | Gabriela Gibson <gbg@apache.org> | Sat May 23 00:21:12 2015 +0100 |
committer | Gabriela Gibson <gbg@apache.org> | Sat May 23 00:21:12 2015 +0100 |
tree | 2e394afc602762f4380ba17519fed1f861ded1bc | |
parent | f60c7ce9c4313e47d33944e96df51eda27cd4cba | |
Tables, headers, lists, bold, italic, underline.

Tables, headers and lists are working albeit in a very rudimentary
fashion.

Regards bold, italic and underline, nothing is what I expected --
everything seems to be out of sync.

I expected a node list of the shape:

HTML_I
DOM_TEXT
HTML_B
DOM_TEXT
...

Where I am assuming that eventually the HTML tag is used to wrap the
next DOM_TEXT up, with something of the shape like:

<i>DOM_TEXT</i>
<b>DOM_TEXT</b>
...

But nodes get produced that do not seem to have this ordering.

-----------------------------------------------------------
Code changes:
-----------------------------------------------------------

Personal small samples this code should cover (mostly):

* sample/documents/odf/Table.odt
* sample/documents/odf/bold-italic-underlined.odt
* sample/documents/odf/headers.odt
* sample/documents/odf/lists.odt

* .gitignore
  Add emacs and patch exclusions.

* CMakeLists.txt
  Add -DCOLOR=1 to gcc cmd, so I can read the long output easier.
  Colors will be removed again, but it makes it much easier to spot
  things (for me)

* DocFormats/core/src/xml/DFNameMap.c
  Reason: Needed to access DFNameMap but could not do so because it
  was defined inside this file. Move to the header file.
  (HASH_TABLE_SIZE): Remove.
  (DFNameEntry): Remove.
  (DFNameHashTable): Remove.
  (DFNameMap): Remove.

* DocFormats/core/src/xml/DFNameMap.h
  (HASH_TABLE_SIZE): Add.
  (DFNameEntry): Add.
  (DFNameHashTable): Add.
  (DFNameMap): Add.

* DocFormats/filters/odf/CMakeLists.txt
  (set): Add temporary color.* files.

* DocFormats/filters/odf/src/text/ODFText.c
  (ODFTextGet): Put in a better structure.

* DocFormats/filters/odf/src/text/color.h
* DocFormats/filters/odf/src/text/color.c
  ANSI colors for easier debugging of long lists of text.

* DocFormats/filters/odf/src/text/gbg_test.c
  (#include) "DFNameMap.h"
  (find_HTML): Add lots of switch cases and a structure that reports
  missing tags and prints them as ready code to stdout.
  (show_nodes): dev tool -- print all the nodes.
  (print_node_info): dev tool -- print everything possible about a node.
  (node_id_info): dev tool -- print minimalist info about a node.
  (missing_tag_info): dev tool -- create minimalist char* about a
  missing node.
  (print_line): dev tool -- 3 styles of printed lines, to make
  recognition easier.

* DocFormats/filters/odf/src/text/gbg_test.h
  (#define) TAG_NOT_FOUND: New tag that has never been seen before.
  TAG_NOT_MATCHED: Known tag that has no HTML match yet.
  (various): prototype definitions for gbg_test.c
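
The node ordering the commit message expects can be sketched roughly as follows. This is only an illustration of the stated assumption (a formatting node wraps the text node that follows it), not code from the commit: the `NodeKind` enum, the `Node` struct, and `wrap_nodes` are hypothetical names, and the enum values are placeholders rather than the real DocFormats tag constants.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical node kinds for illustration only; these are not the
   real DocFormats tag values. */
typedef enum { NODE_HTML_I, NODE_HTML_B, NODE_HTML_U, NODE_DOM_TEXT } NodeKind;

typedef struct {
    NodeKind kind;
    const char *text; /* only read for NODE_DOM_TEXT */
} Node;

/* Map a formatting node kind to its HTML element name. */
static const char *tag_name(NodeKind kind)
{
    switch (kind) {
    case NODE_HTML_I: return "i";
    case NODE_HTML_B: return "b";
    case NODE_HTML_U: return "u";
    default:          return NULL;
    }
}

/* Walk the flat list in order. Each formatting node is remembered and
   then used to wrap the next text node. Text with no preceding
   formatting node is copied unwrapped.
   Returns 0 on success, -1 if out is too small. */
int wrap_nodes(const Node *nodes, size_t count, char *out, size_t out_size)
{
    size_t used = 0;
    const char *pending = NULL;

    assert(nodes != NULL && out != NULL);
    if (out_size == 0)
        return -1;
    out[0] = '\0';

    for (size_t i = 0; i < count; i++) {
        int n;
        if (nodes[i].kind != NODE_DOM_TEXT) {
            pending = tag_name(nodes[i].kind);
            continue;
        }
        if (pending != NULL)
            n = snprintf(out + used, out_size - used, "<%s>%s</%s>",
                         pending, nodes[i].text, pending);
        else
            n = snprintf(out + used, out_size - used, "%s", nodes[i].text);
        if (n < 0 || (size_t)n >= out_size - used)
            return -1;
        used += (size_t)n;
        pending = NULL; /* a tag wraps only the text right after it */
    }
    return 0;
}
```

Under these assumptions, the list HTML_I, DOM_TEXT, HTML_B, DOM_TEXT comes out as one `<i>` element followed by one `<b>` element. A list that arrives in any other order, as the message reports, would need a different traversal.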
Corinthia is a library for converting between different word-processing file formats. Initially, it supports .docx (part of the OOXML specification), HTML, and LaTeX (export-only). The Corinthia project also provides convenience executables. The library has shipped as part of UX Write since February 2013.
On December 8, 2014, Corinthia entered the Apache Software Foundation incubator. The accepted proposal and incubation status provide incubation background and progress information.
The communication hub of the project is the development mailing list,
dev @ corinthia.incubator.apache.org
To receive list postings and interact on the list, simply send a message to
dev-subscribe @ corinthia.incubator.apache.org
from the email address at which you want to receive list messages. The list robot's reply to that address provides confirmation instructions and information on managing the subscription.
The project has a Corinthia incubator web site, a project wiki, and a JIRA issue tracker.
The sites and documentation for this project are at a preliminary stage. Content will be moved to Apache and improved as incubation moves along.
Meanwhile, there is a Facebook page and a Twitter account, @ApacheCorinthia.
Corinthia is licensed under the Apache License version 2.0; see LICENSE.txt for details.
There are three major components, in their respective directories:
DocFormats
- the library itself
dfutil
- a driver program used for running [...]
Run dfutil without any command-line arguments to see a list of operations. Here is an example of converting a .docx file to HTML, modifying it, and then updating the original .docx. Note that it is important, due to how internal mapping works, that the .docx file being written is the same file as the original; using a new file won't work.
dfutil filename.docx filename.html
vi filename.html # Make some changes
dfutil filename.html filename.docx
If you examine the convertFile function in dfutil/Commands.c, you will see the main entry points to perform these conversions, which you can call from your own program.
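
As a rough sketch of driving the same round trip from another program without linking against the library, the two dfutil invocations shown above can be assembled as command strings. `RoundTrip`, `format_command`, and `build_round_trip` are hypothetical helpers, not part of Corinthia. The only thing taken from the text is the `dfutil <input> <output>` form and the rule that the update step must write to the original .docx.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical holder for the two commands of one round trip. */
typedef struct {
    char to_html[256]; /* .docx -> .html */
    char to_docx[256]; /* .html -> the same .docx */
} RoundTrip;

/* Format "dfutil <in> <out>" into buf.
   Returns 0 on success, -1 if the buffer is too small.
   Note: filenames are not shell-quoted in this sketch. */
static int format_command(char *buf, size_t size,
                          const char *in, const char *out)
{
    int n = snprintf(buf, size, "dfutil %s %s", in, out);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/* Build both commands from a single docx path, so the update step
   always targets the original file rather than a new one. */
int build_round_trip(RoundTrip *rt, const char *docx, const char *html)
{
    assert(rt != NULL && docx != NULL && html != NULL);
    if (strlen(docx) == 0 || strlen(html) == 0)
        return -1;
    if (format_command(rt->to_html, sizeof rt->to_html, docx, html) != 0)
        return -1;
    return format_command(rt->to_docx, sizeof rt->to_docx, html, docx);
}
```

A caller could pass each string to `system()` once the HTML has been edited. Taking the .docx path only once is a deliberate choice, since it leaves no way to name a different file for the update step.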
Corinthia builds and runs on iOS, OS X, and Linux. Windows support is in the works.
To build DocFormats, you will need to have the following installed:
Corinthia currently builds on Linux and OS X (mac). See the build instructions.
Contributors are welcome and prized. Details on how to participate on the project will be posted soon.
Meanwhile, the easiest way to contribute is by subscribing to the development list and asking your questions and offering suggestions there.