Date:      Tue, 9 Aug 2011 16:40:14 -0400
From:      Alejandro Imass <ait@p2ee.org>
To:        freebsd-questions@freebsd.org
Subject:   Re: extracting text from docx files
Message-ID:  <CAHieY7SKAbaqyY9LkS6krsxtHOUHi1i4RAtV03AmmLcSt9cAbA@mail.gmail.com>
In-Reply-To: <CAJ5UdcNqxZwTjs33xdUXatWCN%2BSDP3EFkqb_MYeVTF34rvsmxg@mail.gmail.com>
References:  <20110809133632.GA37445@mech-cluster241.men.bris.ac.uk> <20110809191610.GA6129@nyx.user-mode.org> <CAJ5UdcNqxZwTjs33xdUXatWCN%2BSDP3EFkqb_MYeVTF34rvsmxg@mail.gmail.com>

On Tue, Aug 9, 2011 at 3:57 PM, Antonio Olivares
<olivares14031@gmail.com> wrote:
>> But if you really, really need to read docx, you can try the web
>> application from Microsoft. A few months ago, I also got a lot of docx
>> files and opened them with the Microsoft web app; this worked for me to
>> extract the information...
>>

Just a thought, but since docx is XML underneath (a .docx file is a ZIP
archive whose main text lives in word/document.xml), why not find or
write some XSLT that extracts what you need into another format?
You probably already have libxml2 and libxslt on your system, along with
the command-line utility xsltproc.
There are probably existing XSLT stylesheets that transform it to RTF
and plain text.
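
To make the idea concrete, here is a minimal sketch of pulling the text
out of that XML. Note that it swaps in Python's standard library
(zipfile and xml.etree) for XSLT, since it needs nothing beyond a stock
Python install; an XSLT stylesheet run through xsltproc would match the
same w:p (paragraph) and w:t (text run) elements. The names docx_to_text
and make_sample_docx are hypothetical, and the sample archive holds only
word/document.xml, so it is enough for this demo but is not a complete
Word file.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Main WordprocessingML namespace used inside word/document.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = "{" + W_NS + "}"


def docx_to_text(source):
    """Return the text of a .docx, one line per paragraph.

    `source` is a file path or a binary file-like object.
    Hypothetical helper: it ignores headers, footers and footnotes,
    and does not treat tabs or line breaks inside a paragraph specially.
    """
    with zipfile.ZipFile(source) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    lines = []
    for para in root.iter(W + "p"):
        # Join every text run in the paragraph into a single line.
        lines.append("".join(t.text or "" for t in para.iter(W + "t")))
    return "\n".join(lines)


def make_sample_docx(paragraphs):
    """Build a tiny in-memory archive for the demo (assumption: only
    word/document.xml is present, which is all docx_to_text reads)."""
    body = "".join(
        "<w:p><w:r><w:t>" + escape(p) + "</w:t></w:r></w:p>"
        for p in paragraphs
    )
    xml = (
        '<w:document xmlns:w="' + W_NS + '">'
        "<w:body>" + body + "</w:body></w:document>"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as archive:
        archive.writestr("word/document.xml", xml)
    buf.seek(0)
    return buf


if __name__ == "__main__":
    sample = make_sample_docx(["First paragraph", "Second paragraph"])
    print(docx_to_text(sample))
```

For a real file, docx_to_text("report.docx") would be the equivalent
call, where report.docx is a placeholder name.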

--
Alejandro Imass


