From owner-freebsd-questions  Thu May 11 10:27:21 2000
Delivered-To: freebsd-questions@freebsd.org
Received: from larryboy.graphics.cornell.edu (larryboy.graphics.cornell.edu [128.84.247.48])
	by hub.freebsd.org (Postfix) with ESMTP id BD04537B973
	for <questions@FreeBSD.ORG>; Thu, 11 May 2000 10:27:17 -0700 (PDT)
	(envelope-from mkc@larryboy.graphics.cornell.edu)
Received: from larryboy.graphics.cornell.edu (mkc@localhost)
	by larryboy.graphics.cornell.edu (8.9.3/8.9.3) with ESMTP id NAA86965;
	Thu, 11 May 2000 13:27:15 -0400 (EDT)
	(envelope-from mkc@larryboy.graphics.cornell.edu)
Message-Id: <200005111727.NAA86965@larryboy.graphics.cornell.edu>
To: "Dan Larsson" <dl@tyfon.net>
Cc: questions@FreeBSD.ORG
Subject: Re: regexp driving me nuts, help needed! 
In-Reply-To: Message from "Dan Larsson" <dl@tyfon.net> 
   of "Thu, 11 May 2000 18:42:59 +0200." <NEBBJANJCNNAKCPFKHHFEEENCCAA.dl@tyfon.net> 
Date: Thu, 11 May 2000 13:27:15 -0400
From: Mitch Collinsworth <mkc@Graphics.Cornell.EDU>
Sender: owner-freebsd-questions@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG


>I need to get the domain and tld from an url.
>
>this is my idea of what would match and return 'domain.com':
>echo http://www.domain.com/html.asp | sed -e 's/\([\.a-zA-Z0-9]+[a-zA-Z]{2,3}\)/\1 /g'
>
>But that's not what sh thinks ( it returns the whole url )
>What regexp should I use to get the desired result?

Here's a perl 1-liner:

echo http://www.domain.com/html.asp |\
 perl -e '$u=<>; $u=~s/^http:\/\///i; $u=~s/^www\.//i; $u=~s/\/.*$//; print $u'
domain.com

This works in stages, so it doesn't depend on the starting string
always containing every syntactic element (scheme, www prefix, path).
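
If you'd rather stay in sed, the same staged idea could be sketched
roughly as below. This is only a sketch under a few assumptions: the
function name strip_url is made up for illustration, any scheme (not
just http) is stripped, and only basic regular expressions are used.
In basic regular expressions a bare + or {2,3} is taken literally,
which is probably why the original pattern matched nothing and sed
printed the whole URL unchanged.

```shell
#!/bin/sh
# strip_url is a hypothetical helper name, not something from the thread.
# Each -e expression is one stage; a stage that does not match is a no-op,
# so the input does not need to contain every part of a URL.
strip_url() {
    printf '%s\n' "$1" | sed \
        -e 's|^[a-zA-Z]*://||' \
        -e 's|^[Ww][Ww][Ww]\.||' \
        -e 's|/.*$||'
    # stage 1: drop a leading scheme such as http://
    # stage 2: drop a leading www. in any letter case
    # stage 3: drop the first slash and everything after it
}

strip_url http://www.domain.com/html.asp   # prints domain.com
strip_url www.domain.com                   # prints domain.com
strip_url domain.com/path                  # prints domain.com
```

Using | as the delimiter avoids having to escape every slash. Like the
perl version, this leaves other subdomains (e.g. mail.domain.com) alone;
it only knows about a leading www.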

-Mitch

