From owner-freebsd-ports Tue Aug 3 20:27:51 1999
Delivered-To: freebsd-ports@freebsd.org
Received: from picnic.mat.net (picnic.mat.net [206.246.122.133])
	by hub.freebsd.org (Postfix) with ESMTP id 5A20114DB2;
	Tue, 3 Aug 1999 20:27:18 -0700 (PDT)
	(envelope-from chuckr@picnic.mat.net)
Received: from localhost (chuckr@localhost)
	by picnic.mat.net (8.9.3/8.9.3) with ESMTP id XAA53951;
	Tue, 3 Aug 1999 23:26:05 -0400 (EDT)
	(envelope-from chuckr@picnic.mat.net)
Date: Tue, 3 Aug 1999 23:26:05 -0400 (EDT)
From: Chuck Robey
To: stanislav shalunov
Cc: jfieber@FreeBSD.ORG, freebsd-ports@FreeBSD.ORG
Subject: Re: sgmlfmt: producing text files
In-Reply-To: <199908031948.PAA11536@tuzik.lz.att.com>
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-ports@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

On Tue, 3 Aug 1999, stanislav shalunov wrote:

> John,
>
> I'm using sgmlfmt (Id: sgmlfmt.pl,v 1.26 1997/05/12 14:16:48 jfieber
> Exp, the version that came with 3.1-RELEASE) to format my SGML
> linuxdoc documents.  I need to be able to produce plain text output.
> I noticed that "sgmlfmt -f ascii" goes through groff to produce the
> text.  Unfortunately, this means that the file will be formatted all
> right for printing on a line printer, but that's the least likely use
> of a text file (it would rather be used for Usenet postings, emailing,
> etc.; if I wanted to print, I'd produce PostScript!).

You people are forgetting that man pages need to be available on the
base system, and the sgmlformat tools are way too large EVER to be
considered for the main system.  Sgmlformat is not a viable solution.

>
> The disadvantages of using groff to produce text are:
>
>         * Underlined/bold text (easily fixed with "ul -l dumb");
>
>         * Headings/footings (not so easily fixed, because one needs to
>           extract the title, decode entities, etc.);

Untrue.  You don't seem to know groff well enough; it's very easy to do.

>
>         * Hyphenations: this makes text not searchable, and
>           spell-checking won't work.  It's the accepted practice to
>           just wrap the lines on word boundaries.

Not true: man -k uses the troff source for searching, which is not
hyphenated.  Hyphenation *could* be turned off, tho.

>
> I found that I can get much better results by editing sgmlfmt so that
> $maxlevel=0, producing an HTML file, and then doing "lynx -dump -nolist".
> I also found that for moderate size documents, I *don't* want to
> have them split into multiple files, so maxlevel 0 seems a very
> reasonable default for HTML generation as well.
>
> In short:
>
> Suggestion one: Produce text files from HTML (using "lynx -dump -nolist").
>
> Suggestion two: Make $maxlevel configurable from the command line (I
> think latex2html uses option name "-split" for the variable with this
> meaning, just in case you want to be consistent with something).
>
> Bug report: When making HTML from a linuxdoc file that has tag
> in , email is not handled correctly.
>
> Question: What about producing LaTeX output?  I would very much prefer
> to have LaTeX formatting to the shitty paragraphs produced by groff
> (no offense to groff, but TeX paragraph formatting and hyphenation
> algorithms are just much better).
>
> Seems like the script was last updated a lo-o-ong time ago, do you
> still support it?
>
> --Stanislav
>
>
> To Unsubscribe: send mail to majordomo@FreeBSD.org
> with "unsubscribe freebsd-ports" in the body of the message
>

----------------------------+-----------------------------------------------
Chuck Robey                 | Interests include any kind of voice or data
chuckr@picnic.mat.net       | communications topic, C programming, and Unix.
213 Lakeside Drive Apt T-1  |
Greenbelt, MD 20770         | I run picnic and jaunt, both FreeBSD-current.
(301) 220-2114              |
----------------------------+-----------------------------------------------
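
For anyone following the thread: two of the complaints about the groff
text output (bold/underline overstrikes, and wanting lines wrapped only
on word boundaries) can be illustrated with a small sketch.  This is not
part of sgmlfmt and not anything either poster proposed.  It assumes the
formatter marks bold and underline with backspace overstrikes (a
character, a backspace, then another character), and the function names
strip_overstrike and wrap_plain are made up for illustration.

```python
import re
import textwrap


def strip_overstrike(text: str) -> str:
    """Drop every character that is immediately followed by a backspace.

    Assumption: bold is encoded as "c\\bc" and underline as "_\\bc", so
    removing each "<char><backspace>" pair leaves only the final glyph.
    """
    return re.sub(r".\x08", "", text)


def wrap_plain(text: str, width: int = 72) -> str:
    """Re-wrap a paragraph on whitespace only.

    Words are never split and no hyphens are inserted; a word longer
    than `width` is left whole on its own line.
    """
    return textwrap.fill(
        text,
        width=width,
        break_long_words=False,
        break_on_hyphens=False,
    )


if __name__ == "__main__":
    raw = "N\x08NA\x08AM\x08ME\x08E  _\x08s_\x08g_\x08m_\x08l_\x08f_\x08m_\x08t"
    print(wrap_plain(strip_overstrike(raw), width=40))
```

Note that re-wrapping does not undo hyphenation the formatter has already
inserted; that would still have to be switched off in groff itself.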