Date:      Tue, 27 Dec 2005 10:04:45 -0600
From:      David Kelly <dkelly@hiwaay.net>
To:        Jack Stone <antennex@hotmail.com>
Cc:        freebsd-questions@freebsd.org
Subject:   Re: a SED need
Message-ID:  <20051227160445.GA56368@Grumpy.DynDNS.org>
In-Reply-To: <BAY106-F1673797A89767CF16F02ECC370@phx.gbl>
References:  <BAY106-F1673797A89767CF16F02ECC370@phx.gbl>

On Tue, Dec 27, 2005 at 09:18:56AM -0600, Jack Stone wrote:
> I have some HTML files with hundreds of URLs that I need to modify using a 
> search/replace string. I assume that SED(1) is the right tool to use, but 
> every syntax I've tried has not worked.
> 
> Here is what I'm trying to do:
> Change full URLs to relative paths, in other words, chop off the
> "http://www.example.com/" portion:
> 
> From this:
> <li><a href="http://www.example.com/model/many.html">
> To this:
> <li><a href="model/many.html">
> 
> I think it is the slashes and quotes that are giving me fits as I'm very 
> much a novice on SED(1) syntax.

Am sure sed is the right high power production tool for getting the job
done but I get such things done easier in awk. Am sure many say the same
about perl. Sed, awk, perl, is the evolutionary order.

Save this as something like "example.awk" and chmod +x to make it
executable for easy reuse. Or you could "awk -f example.awk input >
output".

By saving to a file you bypass the need to escape characters from the
shell (which will be different depending on csh vs. sh) on top of
escaping them for the RE parser. The escapes below are to make sure the
literal character is matched rather than given a possible RE
interpretation.
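
For comparison, here is a rough sketch of the sed route the original
question asked about. It assumes a Bourne-style shell (sh), and the
function name strip_prefix is made up for illustration. Single quotes
keep the shell away from the double quotes, and using | as the delimiter
of the s command means the slashes in the URL need no escaping. Only the
dots are escaped, so they match literal dots.

```shell
# Hypothetical helper: drop the site prefix from href attributes.
# With no arguments it filters stdin; with file arguments it reads those files.
strip_prefix() {
    # s|old|new|g uses | instead of / as the delimiter, so the URL's
    # slashes can be written as-is. \. matches a literal dot.
    sed 's|<a href="http://www\.example\.com/|<a href="|g' "$@"
}

# Try it on the sample line from the question.
printf '%s\n' '<li><a href="http://www.example.com/model/many.html">' | strip_prefix
```

Something like "strip_prefix input.html > output.html" would then process
a whole file.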

The script contains two patterns to match. The first matches the thing
you are looking to change. The match regular expression is repeated in
gsub(), where it's replaced with the plain text you desire. "print"
causes the line to be output, and "next" ends the processing of that
input line so the next pattern isn't tried. Therefore the next match-all
pattern prints everything the first skipped.

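#!/usr/bin/awk -f

# Lines containing the absolute URL prefix: strip the prefix and print.
# The dots are escaped so they match a literal "." only.
/<a href=\"http:\/\/www\.example\.com\// {
	gsub(/<a href=\"http:\/\/www\.example\.com\//, "<a href=\"")
	print
	next
}

# Every other line passes through unchanged.
{ print }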
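
To try the script end to end, something along these lines should work
from sh. This is only a sketch: the temporary directory, the file names
input.html and output.html, and the two sample lines are assumptions made
for the demonstration. The dots in the host name are escaped here so they
match literally.

```shell
# Work in a scratch directory so nothing in the current directory is touched.
tmp=$(mktemp -d /tmp/awkdemo.XXXXXX)

# Write the awk program to a file. Quoting 'EOF' stops the shell from
# interpreting the backslashes and quotes inside the script.
cat > "$tmp/example.awk" <<'EOF'
/<a href=\"http:\/\/www\.example\.com\// {
	gsub(/<a href=\"http:\/\/www\.example\.com\//, "<a href=\"")
	print
	next
}
{ print }
EOF

# Hypothetical sample input: one line to rewrite, one to leave alone.
printf '%s\n' \
    '<li><a href="http://www.example.com/model/many.html">' \
    '<p>untouched line</p>' > "$tmp/input.html"

# Run the script and show the result.
awk -f "$tmp/example.awk" "$tmp/input.html" > "$tmp/output.html"
cat "$tmp/output.html"
```

The first line comes out as <li><a href="model/many.html"> and the second
is passed through unchanged. Since gsub() leaves a line alone when nothing
matches, a single "{ gsub(...); print }" rule would give the same output.
The two-pattern form just makes the match explicit.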


-- 
David Kelly N4HHE, dkelly@HiWAAY.net
========================================================================
Whom computers would destroy, they must first drive mad.


