From: Ian Smith
To: Gary Kline
Cc: freebsd-questions@freebsd.org
Date: Tue, 15 May 2007 15:34:14 +1000 (EST)
Subject: Re: what's the easiest way to de-html-ize files?

On Sat, 12 May 2007 14:34:52 -0700 Gary Kline wrote:

> On Mon, May 14, 2007 at 12:09:07PM -0700, Chuck Swiger wrote:
> > On May 12, 2007, at 12:54 PM, Gary Kline wrote:
> > > This is for those of us who appreciate ASCII or straight
> > > ISO_8859-15 rather than marked-up files. I have slapped together
> > > a crude C program that does scotch (or *cleanse*) text of
> > > and so on. Still... is there some standalone converter
> > > that gets rid of markup more elegantly? Something where I
> > > can say
> > >
> > > % cmd file_1.html ... file_N.html and output file_1.text ...
> > > file_N.text?
> >
> > Perhaps:
> >
> >   lynx -dump file1.html ... > file.text
> >
> > ...?
> > Hm, maybe I need Bill Campbell's -force_html switch.
> >
> > Yes, seems that way. Using just -dump got most of them, but
> > using -force_html caught all. Need to script something to
> > reformat, but the worst of it's done!

Also, if you're using Mozilla (and so, I would assume, Firefox), the
'Save Page As' dialog offers a 'Files of Type' picklist that includes
'Text Files'. This does a pretty decent job of producing text from
HTML files, and is quicker than firing up lynx (or links) if you're
already viewing the page.

Cheers, Ian
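Gary's original wish, "cmd file_1.html ... file_N.html" producing file_1.text ... file_N.text, can be sketched as a small sh function that runs lynx over each file with the -dump and -force_html flags discussed in this thread. This is a minimal sketch, assuming lynx is installed and on PATH; the function name html_to_text and the .text suffix are illustrative choices, not anything from the thread.

```shell
#!/bin/sh
# html_to_text (hypothetical name): convert each named HTML file to
# plain text with lynx, writing e.g. file_1.html -> file_1.text next
# to the original. Assumes lynx is installed and on PATH.
html_to_text() {
    status=0
    for f in "$@"; do
        # Replace a trailing .html or .htm with .text; otherwise append .text
        case "$f" in
            *.html) out="${f%.html}.text" ;;
            *.htm)  out="${f%.htm}.text" ;;
            *)      out="$f.text" ;;
        esac
        # -dump renders the document to stdout and exits;
        # -force_html treats the input as HTML regardless of its name.
        if ! lynx -dump -force_html "$f" > "$out"; then
            echo "html_to_text: failed on $f" >&2
            status=1
        fi
    done
    return $status
}
```

Saved as, say, html_to_text.sh (again a hypothetical filename), it could be used as `. ./html_to_text.sh && html_to_text *.html`. The function returns non-zero if any file failed, so it can be chained in a larger script; any further reformatting of the text output is left to a separate step.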