Date:      Mon, 6 Sep 2010 12:04:37 -0700
From:      Chip Camden <sterling@camdensoftware.com>
To:        FreeBSD Questions <freebsd-questions@freebsd.org>
Subject:   Re: PDF to HTML translations
Message-ID:  <20100906190437.GB26054@libertas.local.camdensoftware.com>
In-Reply-To: <20100906184802.GC28608@guilt.hydra>
References:  <20100904230920.GA20735@guilt.hydra> <20100905065711.GA34993@slackbox.erewhon.net> <20100905083154.GA89704@owl.midgard.homeip.net> <20100906184802.GC28608@guilt.hydra>


--LpQ9ahxlCli8rRTG
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

Quoth Chad Perrin on Monday, 06 September 2010:
> On Sun, Sep 05, 2010 at 10:31:54AM +0200, Erik Trulsson wrote:
> > On Sun, Sep 05, 2010 at 08:57:11AM +0200, Roland Smith wrote:
> > > On Sat, Sep 04, 2010 at 05:09:20PM -0600, Chad Perrin wrote:
> > > > What PDF to HTML translators, other than pdftohtml, am I likely to be
> > > > able to find in ports?  I went looking for pdf2html, expecting to find
> > > > that there, but no luck.  Before I spend hours sifting through, still
> > > > without knowing whether I missed something that should be obvious,
> > >
> > > Yes, you did. :-)
>
> Apparently not.  See below.
>
>
> > >
> > > > I
> > > > figured I'd ask here whether anyone knows of something off the top of
> > > > his/her head.
> > >
> > > Try textproc/pdftohtml
> >
> > Uhm, he said "other than pdftohtml" so I suspect he already knew about
> > that one.
>
> This is indeed the case.
>
> I appreciate the several suggestions I've received, though I see in
> retrospect that I haven't been sufficiently specific, since I have not
> gotten any suitable answers.
>
> I have "inherited" a Perl script that wraps pdftohtml.  The reason a
> wrapper is needed is that a substantial amount of cleanup work is needed
> to produce HTML suitable to our final needs.  The output of pdftohtml is
> sufficiently far from "perfect" that I would like to test the output of a
> few other possible "back ends" for the script to see if a significant
> amount of work being done by the script can be eliminated.
>
> Toward that end, the simpler the tool the better -- and the tool on the
> "back end" should not be something that must be contacted across a
> network, or that cannot be redistributed freely.  I wanted to start with
> things I have in the base system on my FreeBSD laptop (where I'm doing my
> development) or through ports.  OpenOffice.org is quite a bit larger and
> more unwieldy than we would really want to deal with at this point.
> Using Google or Adobe tools online is well outside the range of what we
> need (requiring network access for the tool to work).
>
> I've started looking at the Xpdf tools as well as pdftohtml.  Other
> suggestions from within ports would be appreciated.  Additional options
> other than what can be found in ports might also be useful, understanding
> the needs I sketched out above.  The script itself is Perl, in case that
> matters.
>
> To everyone who has replied so far: thank you for your time.
>
> --
> Chad Perrin [ original content licensed OWL: http://owl.apotheon.org ]

How about print/p5-PDFLib and print/pecl-pdflib to roll your own?  Maybe
that's more work than you wanted.
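
For comparing candidate back ends side by side, a minimal sketch of a
pluggable wrapper might look like the following.  It is written in Python
only for brevity (the inherited script is Perl), and the back-end table,
the function names, and the exact command-line flags are assumptions for
illustration, not taken from that script; check each tool's man page
before relying on them.

```python
import shutil
import subprocess

# Hypothetical table of candidate back ends.  Each entry is a command
# template; "{pdf}" is replaced with the input file name.  The flags shown
# are assumptions and should be verified against the installed versions.
BACKENDS = {
    "pdftohtml": ["pdftohtml", "-stdout", "-noframes", "{pdf}"],
    "pdftotext": ["pdftotext", "-layout", "{pdf}", "-"],
}


def build_command(backend, pdf_path):
    """Fill the PDF path into the command template for the named back end."""
    return [arg.format(pdf=pdf_path) for arg in BACKENDS[backend]]


def available_backends():
    """Return the names of back ends whose executable is found on PATH."""
    return [name for name, cmd in BACKENDS.items() if shutil.which(cmd[0])]


def convert(backend, pdf_path):
    """Run one back end locally and return its standard output as text."""
    result = subprocess.run(
        build_command(backend, pdf_path),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the tool exits non-zero
    )
    return result.stdout
```

Looping over available_backends() and feeding each result of convert()
into the existing cleanup step would show which tool leaves the least
cleanup to do.  Everything runs locally, so no network access is needed.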

--
Sterling (Chip) Camden    | sterling@camdensoftware.com | 2048D/3A978E4F
http://camdensoftware.com | http://chipstips.com        | http://chipsquips.com

--LpQ9ahxlCli8rRTG
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (FreeBSD)

iQEcBAEBAgAGBQJMhTtEAAoJEIpckszW26+RVKoH/jVEgYohW5uY8QzVcxD4hKM4
EAC7Cvy+KVb++6sJTY9YGPJFPfhZjeMdfPaQXPk4JHdi1FHlcr2NGAYZNy8oelOo
XJWgAAjN22jJFen3Y2UK+3Z2TH+0ZEEaB4TniSkDlAQob5xUz6gBnL1cnOZxoI0z
h32kNmGuMj2YU6kwcl3hFEANhaEox9L10Cu/csYc6AbTts6e8sVUhBs5i8EVb3r+
APhAR7AqqS8WyJr+R9ABl9L3yXdHJYbAXS75aebEt9Mmbz0G7JbBNou7L93E9L7Z
c8oU0KuMVR07VJ0NZA9hLAtTwWOJJsHTzK9WJpFinDpsua1d1kPY3RKwxAWHSJc=
=oRb/
-----END PGP SIGNATURE-----

--LpQ9ahxlCli8rRTG--


