About Apache Corinthia (incubating)

Corinthia is a set of libraries and tools for dealing with different file formats for productivity applications, with an initial focus on word processing. The goal of the project is to provide components which developers can easily integrate into their own applications and scripts for converting and manipulating data in a wide range of formats via a consistent interface.

This is the first public release of Corinthia, and consists of a single core library called DocFormats. The library provides two-way conversion between OOXML word processing documents (aka Microsoft Word .docx) and HTML. The Microsoft Word support has previously been used in commercial applications and is fairly mature. Support for other file formats is in development, but not part of this release.

The Corinthia project is part of the Apache Software Foundation incubator, which it entered on December 8, 2014. The accepted proposal and the incubation status page provide background and progress information.

The communication hub of the project is the development mailing list,

dev @

To receive list postings and interact on the list, simply send a message to

dev-subscribe @

from the email address at which you want to receive list messages. The list robot replies to that address with confirmation instructions and information on managing the subscription.

Further links:

These sites and the documentation for this project are at a preliminary stage. Content will be moved to Apache and improved as incubation moves along.

There is also a Facebook page and a Twitter account, @ApacheCorinthia.


Corinthia is licensed under the Apache License version 2.0; see LICENSE.txt for details.

What the library can do

  1. Create new HTML files from a .docx source
  2. Create new .docx files from an HTML source
  3. Update existing .docx files based on a modified HTML file produced in (1)
  4. Convert .docx or HTML files to LaTeX
  5. Provide access to document structure, in terms of a DOM-like API for manipulating XML trees, and an object model for working with CSS stylesheets


There are three major components, in their respective directories:

  • DocFormats - file format conversion library
  • dfconvert - driver program for performing conversions
  • dftest - test harness

Run dfconvert without any command-line arguments to see a list of possible operations. The following is an example of converting a .docx file to HTML, modifying it, and then updating the original .docx file based on the modified HTML file. Any content or formatting information that could not be converted to HTML (e.g. embedded spreadsheets) will be left untouched.

dfconvert get report.docx report.html
vi report.html # Make some changes
dfconvert put report.docx report.html

Note that when executing a put operation to update the document, the .docx file must be identical to that from which the HTML file was originally generated. This is because of assumptions the update process relies on about the relationship between elements in the HTML file and their counterparts in the .docx file. If you have modified the .docx file between get and put, or execute a put on the same file twice, this will be automatically detected and an error will be reported.
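If you drive the get/put round trip from a script, it can help to check the "unchanged since get" condition yourself before calling put. The following is a minimal sketch in Python, not part of Corinthia. It assumes dfconvert is on your PATH and uses only the get and put invocations shown above. The helper names (file_digest, get_html, put_html) and the use of a SHA-256 digest are illustrative assumptions; this is not how dfconvert itself detects a modified file.

```python
import hashlib
import subprocess
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def get_html(docx, html, dfconvert="dfconvert"):
    """Run `dfconvert get` and return a digest of the .docx at that moment.

    Hypothetical helper: keep the returned digest and pass it to put_html().
    """
    subprocess.run([dfconvert, "get", str(docx), str(html)], check=True)
    return file_digest(docx)


def put_html(docx, html, digest_at_get, dfconvert="dfconvert"):
    """Run `dfconvert put` only if the .docx is byte-identical to its state at get.

    This is an extra pre-check in the calling script (an assumption of this
    sketch); dfconvert performs its own detection and reports an error too.
    """
    if file_digest(docx) != digest_at_get:
        raise RuntimeError(f"{docx} changed since get; run get again first")
    subprocess.run([dfconvert, "put", str(docx), str(html)], check=True)
```

A typical use would be to call get_html("report.docx", "report.html"), edit report.html, and then call put_html with the digest that get_html returned. Because put updates the .docx, a second put with the same digest fails the check, which mirrors the rule that put cannot be run twice on the same file.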

Look at consumers/dfconvert/src/main.c to see how to use the API. The public API headers are in the DocFormats/api/headers/DocFormats directory.

Platforms and dependencies

Corinthia builds and runs on OS X, Linux and Windows.

To build DocFormats, you will need to have the following installed:

Build instructions

See the build instructions for Linux, OS X and Windows.


Contributors are welcome. Details on how to participate in the project will be posted soon.

Meanwhile, the easiest way to contribute is by subscribing to the development list and asking your questions and offering suggestions there.