Date:      Tue, 3 Aug 1999 23:26:05 -0400 (EDT)
From:      Chuck Robey <chuckr@picnic.mat.net>
To:        stanislav shalunov <shalunov@att.com>
Cc:        jfieber@FreeBSD.ORG, freebsd-ports@FreeBSD.ORG
Subject:   Re: sgmlfmt: producing text files
Message-ID:  <Pine.BSF.4.10.9908032321350.451-100000@picnic.mat.net>
In-Reply-To: <199908031948.PAA11536@tuzik.lz.att.com>

On Tue, 3 Aug 1999, stanislav shalunov wrote:

> John,
> 
> I'm using sgmlfmt (Id: sgmlfmt.pl,v 1.26 1997/05/12 14:16:48 jfieber
> Exp, the version that came with 3.1-RELEASE) to format my SGML
> linuxdoc documents.  I need to be able to produce plain text output.
> I noticed that "sgmlfmt -f ascii" goes through groff to produce the
> text.  Unfortunately, this means that the file will be formatted all
> right for printing on a line printer, but that's the least likely use
> of a text file (it would rather be used for Usenet postings, emailing,
> etc.; if I wanted to print, I'd produce PostScript!).

You people are forgetting that man pages need to be available on the
base system, and the sgmlformat tools are way too large EVER to be
considered for the main system.  Sgmlformat is not a viable solution.

> 
> The disadvantages of using groff to produce text are:
> 
> 	* Underlined/bold text (easily fixed with "ul -l dumb");
> 
> 	* Headings/footings (not so easily fixed, because one needs to
> 	  extract the title, decode entities, etc.);

Untrue.  You don't seem to know groff well enough; it's very easy to do.

> 
> 	* Hyphenations: this makes text not searchable, and
>           spell-checking won't work.  It's the accepted practice to
>           just wrap the lines on word boundaries.

Not true, man -k uses the troff source for searching, which is not
hyphenated.  Hyphenation *could* be turned off, tho.
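
The underlined/bold point in the quoted list can be illustrated with a small
filter.  This is only a sketch, assuming the groff ASCII output marks bold and
underline with the usual overstrike idiom (a character, a backspace, then the
character to keep); the function name strip_overstrike is hypothetical and is
not part of sgmlfmt or groff.

```python
import re

# One character followed by a backspace: the overstrike idiom assumed here.
# "_\bX" is an underlined X, "X\bX" is a bold X.
_OVERSTRIKE = re.compile(r".\x08")


def strip_overstrike(text: str) -> str:
    """Drop every 'char + backspace' pair, keeping only the final glyph."""
    return _OVERSTRIKE.sub("", text)
```

Piping the formatter's output through such a filter leaves plain characters
only; it does nothing about page headers, footers, or hyphenation.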

> 
> I found that I can get much better results by editing sgmlfmt so that
> $maxlevel=0, producing an HTML file, and then doing "lynx -dump -nolist".
> I also found that for moderate size documents, I *don't* want to
> have them split in multiple files, so maxlevel 0 seems a very
> reasonable default for HTML generation as well.
> 
> In short:
> 
> Suggestion one: Produce text files from HTML (using "lynx -dump -nolist").
> 
> Suggestion two: Make $maxlevel configurable from the command line (I
> think latex2html uses option name "-split" for the variable with this
> meaning, just in case you want to be consistent with something).
> 
> Bug report: When making HTML from a linuxdoc file that has <email> tag
> in <author>, email is not handled correctly.
> 
> Question: What about producing LaTeX output?  I would very much prefer
> to have LaTeX formatting to the shitty paragraphs produced by groff
> (no offense to groff, but TeX paragraph formatting and hyphenation
> algorithms are just much better).
> 
> Seems like the script was last updated a lo-o-ong time ago, do you
> still support it?
> 
> --Stanislav
> 
> 
> To Unsubscribe: send mail to majordomo@FreeBSD.org
> with "unsubscribe freebsd-ports" in the body of the message
> 
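
Suggestion one in the quoted message (text from HTML via "lynx -dump -nolist")
could be scripted roughly as follows.  This is a sketch only: it assumes lynx
is installed and on PATH, and the helper names lynx_dump_command and
html_to_text are hypothetical.  The two lynx options are the ones quoted above.

```python
import subprocess
from typing import List


def lynx_dump_command(html_path: str) -> List[str]:
    """Build the argument list for the lynx invocation quoted in the message."""
    return ["lynx", "-dump", "-nolist", html_path]


def html_to_text(html_path: str) -> str:
    """Run lynx on an HTML file and return the rendered plain text.

    Raises subprocess.CalledProcessError if lynx exits with an error, and
    FileNotFoundError if lynx is not installed.
    """
    result = subprocess.run(
        lynx_dump_command(html_path),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Building the argument list separately keeps the command easy to inspect and
avoids shell quoting issues with file names.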

----------------------------+-----------------------------------------------
Chuck Robey                 | Interests include any kind of voice or data 
chuckr@picnic.mat.net       | communications topic, C programming, and Unix.
213 Lakeside Drive Apt T-1  |
Greenbelt, MD 20770         | I run picnic and jaunt, both FreeBSD-current.
(301) 220-2114              | 
----------------------------+-----------------------------------------------