Date:      Tue, 9 Aug 2011 21:16:11 +0200
From:      Christian Barthel <test@nyx.user-mode.org>
To:        Anton Shterenlikht <mexas@bristol.ac.uk>
Cc:        freebsd-questions@freebsd.org
Subject:   Re: extracting text from docx files
Message-ID:  <20110809191610.GA6129@nyx.user-mode.org>
In-Reply-To: <20110809133632.GA37445@mech-cluster241.men.bris.ac.uk>
References:  <20110809133632.GA37445@mech-cluster241.men.bris.ac.uk>

On Tue, Aug 09, 2011 at 02:36:32PM +0100, Anton Shterenlikht wrote:
> I often receive information in *.docx format
> from my MS using colleagues. Sometimes I can
> ask for a pdf (or similar) instead, but not always.

You have a lot of nice options: 
- Force them to use BSD/Linux ;)
- explain to them why docx is shit!
- don't read it

> 
> Usually I unzip a docx and then search
> through all *xml  files to find the
> useful data. However, I can't find any
> xml styles to use, so I have to convert
> the relevant xml file(s) to plain text
> by hand. I wonder if anybody can suggest
> a better way. Perhaps there's something
> in ports that can help.

But if you really, really need to read docx, you can try the web
application from Microsoft. A few months ago I also received a lot of
docx files and opened them with the Microsoft web app; that worked for
me to extract the information...

More information: 
http://office.microsoft.com/en-us/web-apps/

The downside: you have to sign up for a Microsoft service :(
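
If signing up is not an option, the unzip-and-read-the-XML approach
described in the question can probably be scripted. What follows is only
a minimal sketch in Python, not a tested tool. It assumes the body text
sits in word/document.xml inside the archive, that paragraphs are w:p
elements, and that the visible text is in w:t runs, which is how
standard WordprocessingML files are laid out. The function name
docx_to_text is made up for illustration.

```python
import sys
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace bound to the "w:" prefix in word/document.xml
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_text(path):
    """Return the plain text of a .docx file, one line per paragraph.

    Hypothetical helper: `path` may be a file name or a file-like object.
    """
    with zipfile.ZipFile(path) as archive:
        # Assumption: the main document body is stored under this name.
        root = ET.fromstring(archive.read("word/document.xml"))

    lines = []
    for paragraph in root.iter(W_NS + "p"):
        # Concatenate every text run (w:t) found inside this paragraph.
        runs = (t.text or "" for t in paragraph.iter(W_NS + "t"))
        lines.append("".join(runs))
    return "\n".join(lines)


if __name__ == "__main__":
    # Usage: python docx_to_text.py file.docx
    if len(sys.argv) > 1:
        print(docx_to_text(sys.argv[1]))
```

This ignores headers, footers, footnotes and comments (they live in
other XML files in the archive), drops tabs and manual line breaks, and
may repeat text for paragraphs nested inside text boxes, so treat the
output as a rough dump rather than a faithful conversion.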

cheers

-- 
Christian Barthel 
Public-Key: http://bc.user-mode.org/bc.asc 
Mail: bc@nyx.user-mode.org
Web: http://bc.user-mode.org


