NAME

docx2troff, docx2txt, word2troff, word2txt – translate Microsoft(tm) Office(tm) documents

SYNOPSIS

docx2troff [ file.docx ]
docx2txt [ file.docx ]
opc/word2troff
opc/word2txt

DESCRIPTION

Microsoft's new format for Office documents is a zipped directory hierarchy containing XML files. This format is known as the ``Open Packaging Conventions'' or OPC.
Docx2txt is an rc(1) script that uses fs/zipfs (see tapefs(4)) and opc/word2txt to extract the printable text from the body of a Microsoft Word docx document and write it on the standard output. Typically this is then piped through fmt(1) to wrap paragraphs.
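The extraction described above can be illustrated outside Plan 9 with a minimal Python sketch. It is not the opc/word2txt implementation. It assumes that the main document part is stored at word/document.xml and uses the transitional WordprocessingML namespace, which is the common case, although a package may locate the part elsewhere through its relationship files. The names docx_text and make_sample_docx are hypothetical and exist only for this illustration.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# WordprocessingML main namespace (assumed: transitional OOXML).
NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = "{" + NS + "}"


def docx_text(source):
    """Return the body text of a .docx, one line per paragraph.

    `source` is a path or a binary file-like object. The main part is
    assumed to live at word/document.xml.
    """
    with zipfile.ZipFile(source) as package:
        root = ET.fromstring(package.read("word/document.xml"))
    body = root.find(W + "body")
    if body is None:
        return ""
    lines = []
    for para in body.iter(W + "p"):
        # A paragraph's text is the concatenation of its w:t runs.
        lines.append("".join(t.text or "" for t in para.iter(W + "t")))
    return "\n".join(lines)


def make_sample_docx(paragraphs):
    """Build a tiny in-memory package holding only word/document.xml.

    Hypothetical helper for the demonstration; a real .docx carries
    more parts (content types, relationships, styles).
    """
    body = "".join(
        "<w:p><w:r><w:t>%s</w:t></w:r></w:p>" % escape(p) for p in paragraphs
    )
    xml = '<w:document xmlns:w="%s"><w:body>%s</w:body></w:document>' % (NS, body)
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as package:
        package.writestr("word/document.xml", xml)
    buf.seek(0)
    return buf


if __name__ == "__main__":
    print(docx_text(make_sample_docx(["First paragraph.", "Second paragraph."])))
```

Like docx2txt, the sketch leaves paragraphs unwrapped, so its output would normally be passed through a formatter such as fmt(1).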
Docx2troff is similar, but emits troff source corresponding to the document. If the document contains tables, additional commands are emitted for tbl(1). Opc/word2troff does not attempt to produce an exact facsimile of the source layout, but rather a reasonable-looking troff version of the document.
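For readers unfamiliar with tbl(1) input, the following Python sketch shows the general shape that table commands take in troff source. It is an assumption-laden illustration, not the output of opc/word2troff, whose choice of column formats and options may differ. The function rows_to_tbl is hypothetical; it left-aligns every column and relies on tbl's default tab separator, so cell text must not itself contain tabs.

```python
def rows_to_tbl(rows):
    """Format a list of equal-length rows of strings as tbl(1) input.

    Every column is left-aligned ("l"). The format line ends with a
    period, and data cells are separated by tabs, tbl's default.
    """
    if not rows:
        return ""
    ncols = len(rows[0])
    lines = [".TS", " ".join(["l"] * ncols) + "."]
    for row in rows:
        lines.append("\t".join(row))
    lines.append(".TE")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(rows_to_tbl([["Name", "Size"], ["report.docx", "12K"]]), end="")
```

Troff source containing such .TS/.TE blocks must be run through tbl(1) before troff.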

SOURCE

/sys/src/cmd/opc

SEE ALSO

docx2troff(1), tapefs(4), xml(2)
``2007 Office Document: Open XML Markup Explained'', http://www.microsoft.com/en-us/download/details.aspx?id=15359