Date:      Tue, 26 Oct 2010 22:03:01 +0200
From:      Roland Smith <rsmith@xs4all.nl>
To:        Gary Kline <kline@thought.org>
Cc:        Polytropon <freebsd@edvax.de>, FreeBSD Mailing List <freebsd-questions@freebsd.org>, Liontaur <liontaur@gmail.com>
Subject:   Re: Is there any way of transfering my excellent PDF file into plain HTML
Message-ID:  <20101026200301.GA12886@slackbox.erewhon.net>
In-Reply-To: <20101026193020.GA3792@thought.org>
References:  <20101026182958.GA3646@thought.org> <AANLkTikBba5k4CAG_Qa%2BBjwy-URiKY0ak%2BNK0cc38OJB@mail.gmail.com> <20101026205924.91748d4c.freebsd@edvax.de> <20101026193020.GA3792@thought.org>


--vkogqOf2sHV7VnPd
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Tue, Oct 26, 2010 at 12:30:20PM -0700, Gary Kline wrote:
> On Tue, Oct 26, 2010 at 08:59:24PM +0200, Polytropon wrote:
> > On Tue, 26 Oct 2010 11:38:20 -0700, Liontaur <liontaur@gmail.com> wrote:
> > > Related but slightly OT, I've never had much luck getting it the other way
> > > around, HTML to PDF. It's often off a bit. I can't remember off the top of
> > > my head what ports i've tried but yea. Either the images are wonky or my
> > > forms go wonky.
> >
> > This is simply because HTML is not typesetting-capable. Depending
> > on the source of the PDF file, it may help to convert from THAT
> > format instead of from PDF. E. g. if you have a .tex (LaTeX) file
> > that has been the source of the PDF file, you can use a converter
> > from LaTeX to HTML, often with acceptable results.
> >
> > The HTML concept, especially when incorporating CSS for formatting,
> > _can_ be used to gain a bit of typographic quality, e. g. by defining
> > parameters for "screen" and for "printed" media. Still it suffers
> > from things like maintaining good grey values, hyphenation and
> > ligatures.

You can add proper justification to the list that HTML doesn't do well!

> 	Hmm. The ligatures that looked so great in my .tex/PDF output
> 	got lost.

Very few programs do ligatures well. If you're using Unicode text, you can use
them directly in your text, like this: ﬀ ﬁ ﬂ ﬃ ﬄ ﬆ

How good these look depends on the fonts used. I've got a whole list of handy
Unicode characters on my webpage. See the entry marked 2010-10-16.
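
To make the ligature characters above concrete, here is a minimal Python
sketch that lists their code points and shows that Unicode NFKC
normalization expands each one back into plain letters. The names
LIGATURES, REPLACEMENTS, describe and add_ligatures are made up for this
illustration; only the standard unicodedata module is used. The blind
replacement in add_ligatures is an assumption for demonstration only and
ignores cases where a ligature would be typographically wrong.

```python
import unicodedata

# The six ligature characters shown above: U+FB00 to U+FB04 and U+FB06.
LIGATURES = "\ufb00\ufb01\ufb02\ufb03\ufb04\ufb06"

# Longest sequences first, so "ffi" is not consumed as "ff" followed by "i".
REPLACEMENTS = [
    ("ffi", "\ufb03"),
    ("ffl", "\ufb04"),
    ("ff", "\ufb00"),
    ("fi", "\ufb01"),
    ("fl", "\ufb02"),
]


def describe(text):
    """Return (code point, Unicode name, NFKC expansion) for each character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.normalize("NFKC", ch))
        for ch in text
    ]


def add_ligatures(text):
    """Replace plain letter sequences with single ligature characters."""
    for plain, ligature in REPLACEMENTS:
        text = text.replace(plain, ligature)
    return text


for code, name, plain in describe(LIGATURES):
    print(code, name, "->", plain)
```

Normalizing with NFKC is also a handy way to get searchable plain text
back out of a document that contains these characters.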

>       Only that somehow, HTML4 can read the hex code that
> 	abiword's html created.  :-)   Also, the `` and '' look great in
> 	Times.  I fixed the page numbers--all had to go away; I edited
> 	the chapter headings--all by hand.  What's left are the hundreds
> 	of broken paragraphs.

You might fare better by taking the TeX source, running it through detex(1)
and using markdown [http://daringfireball.net/projects/markdown/] to create
HTML.
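
The detex-then-markdown route could be scripted roughly as follows. This
is only a sketch under assumptions: the function name tex_to_html and the
file names are hypothetical, the Markdown converter is assumed to be
installed as an executable called "markdown" that reads standard input
and writes HTML to standard output, and detex is assumed to be in PATH.
Adjust the command name to whatever your installation provides.

```python
import shutil
import subprocess
from pathlib import Path


def tex_to_html(tex_path, html_path, markdown_cmd="markdown"):
    """Strip TeX markup with detex(1), then convert the plain text to HTML.

    markdown_cmd is an assumed executable name; it must read Markdown on
    standard input and write HTML on standard output.
    """
    # Fail early with a clear message if either tool is missing.
    for tool in ("detex", markdown_cmd):
        if shutil.which(tool) is None:
            raise FileNotFoundError(f"{tool} not found in PATH")

    # detex writes the de-TeXed text to standard output.
    plain = subprocess.run(
        ["detex", str(tex_path)],
        check=True, capture_output=True, text=True,
    ).stdout

    # Feed the plain text to the Markdown converter on standard input.
    html = subprocess.run(
        [markdown_cmd],
        input=plain, check=True, capture_output=True, text=True,
    ).stdout

    Path(html_path).write_text(html, encoding="utf-8")
    return html
```

A call such as tex_to_html("book.tex", "book.html") would then write the
HTML file; expect to touch up headings and emphasis by hand, since detex
discards the markup that Markdown would need to recreate them.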

> 	What utility take a LaTeX file -> HTML?  ((Be nice to have both
> 	*strictly professional typeset* and then HTML.  I can add
> 	indents for AE style paragraphing, and much more.  Fix the
> 	hyphenation, etc.

Apart from the obvious textproc/latex2html? :-)

Roland
-- 
R.F.Smith                                   http://www.xs4all.nl/~rsmith/
[plain text _non-HTML_ PGP/GnuPG encrypted/signed email much appreciated]
pgp: 1A2B 477F 9970 BA3C 2914  B7CE 1277 EFB0 C321 A725 (KeyID: C321A725)

--vkogqOf2sHV7VnPd
Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.16 (FreeBSD)

iEYEARECAAYFAkzHM/UACgkQEnfvsMMhpyUkXwCgnt7OZRIRy2ia7jXyxkQ/vQK0
MncAn3AKJd3cAEJdOTsPFQY1kaCR3EVo
=Nqki
-----END PGP SIGNATURE-----

--vkogqOf2sHV7VnPd--


