Date:      Tue, 3 Aug 1999 15:48:03 -0400 (EDT)
From:      stanislav shalunov <shalunov@att.com>
To:        jfieber@FreeBSD.org
Cc:        freebsd-ports@FreeBSD.org
Subject:   sgmlfmt: producing text files
Message-ID:  <199908031948.PAA11536@tuzik.lz.att.com>

John,

I'm using sgmlfmt (Id: sgmlfmt.pl,v 1.26 1997/05/12 14:16:48 jfieber
Exp, the version that came with 3.1-RELEASE) to format my SGML
linuxdoc documents.  I need to be able to produce plain text output.
I noticed that "sgmlfmt -f ascii" goes through groff to produce the
text.  Unfortunately, this means that the file will be formatted all
right for printing on a line printer, but that's the least likely use
of a text file (it would rather be used for Usenet postings, emailing,
etc.; if I wanted to print, I'd produce PostScript!).

The disadvantages of using groff to produce text are:

	* Underlined/bold text (easily fixed with "ul -l dumb");

	* Headings/footings (not so easily fixed, because one needs to
	  extract the title, decode entities, etc.);

	* Hyphenation: hyphenated words can't be found by searching, and
	  spell-checking won't work on them.  The accepted practice is to
	  just wrap the lines on word boundaries.
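For illustration only, here is a minimal Python sketch of the two fixes described above: dropping backspace overstrikes (the way nroff draws bold as "X\bX" and underline as "_\bX") and re-wrapping paragraphs strictly on word boundaries with no hyphenation.  The function names are hypothetical and are not part of sgmlfmt.  The sketch assumes paragraphs are separated by blank lines.  It does not remove running headers/footers, and it does not rejoin words that groff has already hyphenated across lines.

```python
import re
import textwrap

# One overstrike pair: any character followed by a backspace (\x08).
# Deleting each pair keeps only the final glyph, so "X\bX" -> "X"
# and "_\bX" -> "X" (similar in spirit to piping through "col -b").
_OVERSTRIKE = re.compile(r".\x08")


def strip_overstrike(text: str) -> str:
    """Remove nroff-style backspace overstriking (bold/underline)."""
    return _OVERSTRIKE.sub("", text)


def rewrap(paragraph: str, width: int = 72) -> str:
    """Reflow one paragraph, breaking only at whitespace, never hyphenating."""
    words = " ".join(paragraph.split())  # collapse existing line breaks
    return textwrap.fill(
        words,
        width=width,
        break_long_words=False,  # a word wider than `width` gets its own line
        break_on_hyphens=False,  # never break after a hyphen inside a word
    )


def to_plain_text(raw: str, width: int = 72) -> str:
    """Strip overstrikes, then rewrap each blank-line-separated paragraph."""
    clean = strip_overstrike(raw)
    paragraphs = re.split(r"\n\s*\n", clean.strip())
    return "\n\n".join(rewrap(p, width) for p in paragraphs)
```

Because `break_long_words` and `break_on_hyphens` are both off, every word comes out intact, so the result stays searchable and spell-checkable.  The trade-off is that a single very long word can overflow the chosen width.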

I found that I can get much better results by editing sgmlfmt so that
$maxlevel=0, producing an HTML file, and then running "lynx -dump -nolist"
on it.  I also found that for moderate-size documents I *don't* want to
have them split into multiple files, so maxlevel 0 seems a very
reasonable default for HTML generation as well.

In short:

Suggestion one: Produce text files from HTML (using "lynx -dump -nolist").

Suggestion two: Make $maxlevel configurable from the command line (I
think latex2html uses option name "-split" for the variable with this
meaning, just in case you want to be consistent with something).

Bug report: When making HTML from a linuxdoc file that has an <email>
tag inside <author>, the email address is not handled correctly.

Question: What about producing LaTeX output?  I would very much prefer
to have LaTeX formatting to the shitty paragraphs produced by groff
(no offense to groff, but TeX paragraph formatting and hyphenation
algorithms are just much better).

Seems like the script was last updated a lo-o-ong time ago, do you
still support it?

--Stanislav
