Date:      Thu, 11 May 2000 13:56:10 -0600
From:      Charles Randall <crandall@matchlogic.com>
To:        Mitch Collinsworth <mkc@Graphics.Cornell.EDU>, Dan Larsson <dl@tyfon.net>
Cc:        questions@FreeBSD.ORG
Subject:   RE: regexp driving me nuts, help needed! 
Message-ID:  <5FE9B713CCCDD311A03400508B8B3013B256B8@bdr-xcln.is.matchlogic.com>

That seems like a lot of work:

% echo http://www.domain.com/www.blah/html.asp |\
 perl -lne 'print $1 if m|http://www\.([^/]+)|i'
domain.com

Because -n runs the code once per input line, this also works on a big list of URLs on stdin, one per line.
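For a box without Perl, the same single-match idea can be sketched with sed alone. This is a minimal sketch, assuming every URL of interest starts with exactly "http://www."; the function name www_domains is hypothetical. Unlike the Perl version's /i flag, this pattern is case-sensitive, and URLs without the www. prefix are skipped, just as they are by the Perl match.

```shell
#!/bin/sh
# Hypothetical helper: for each URL read on stdin (one per line), print the
# host that follows "http://www.". With -n plus the p flag, lines that do
# not match print nothing, mirroring "print ... if m|...|" in Perl.
www_domains() {
    sed -nE 's|^http://www\.([^/]+).*$|\1|p'
}

# Usage: feed several URLs at once; the ftp:// line produces no output.
printf '%s\n' \
    http://www.domain.com/www.blah/html.asp \
    http://www.example.org \
    ftp://files.example.net/pub |
    www_domains
```

The whole line is matched and replaced by the captured group, which is what turns a substitution into an extraction.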

Charles

-----Original Message-----
From: Mitch Collinsworth [mailto:mkc@Graphics.Cornell.EDU]
Sent: Thursday, May 11, 2000 11:27 AM
To: Dan Larsson
Cc: questions@FreeBSD.ORG
Subject: Re: regexp driving me nuts, help needed! 



>I need to get the domain and TLD from a URL.
>
>This is my idea of what would match and return 'domain.com':
>echo http://www.domain.com/html.asp | sed -e 's/\([\.a-zA-Z0-9]+[a-zA-Z]{2,3}\)/\1 /g'
>
>But that's not what sed does (it returns the whole URL).
>What regexp should I use to get the desired result?
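
Why the sed attempt returns the whole URL: in sed's default syntax (POSIX basic regular expressions), + is an ordinary character and an interval must be written \{2,3\}, so the pattern in the question never matches and sed prints the line unchanged. And even a matching pattern would not extract anything, because s/// only rewrites the matched text and leaves the rest of the line in place. A minimal sketch of a working sed version, using extended regexes (-E) and matching the whole line so that only the captured host survives; it assumes the input is an http:// URL, and the function name host_of is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: print the host part of an http:// URL, dropping an
# optional leading "www." and everything from the first "/" onward.
# The pattern covers the entire line, so s/// replaces the whole URL with \2.
host_of() {
    printf '%s\n' "$1" | sed -E 's|^http://(www\.)?([^/]+).*$|\2|'
}

host_of http://www.domain.com/html.asp   # domain.com
host_of http://domain.com                # domain.com
```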

Here's a Perl one-liner:

echo http://www.domain.com/html.asp |\
 perl -e '$u=<>; $u=~s/http:\/\///; $u=~s/^www\.//i; $u=~s/\/.*$//; print $u'
domain.com

This works in stages, so it doesn't depend on the input string always
containing every syntactic element (the http:// scheme, the www. prefix, a path).
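The staged approach carries over directly to sed, with one -e expression per stage; a stage whose piece is missing simply substitutes nothing. This is a sketch, not Mitch's code: the function name strip_url is hypothetical, the scheme match is anchored to the start of the line, and [Ww][Ww][Ww] stands in for the /i flag on the www. stage. One design difference: $u=<> reads only the first line of input, while sed applies every stage to each line, so a whole list of URLs can be piped through.

```shell
#!/bin/sh
# Hypothetical helper mirroring the staged Perl one-liner:
#   stage 1 strips a leading "http://",
#   stage 2 strips a leading "www." (any case),
#   stage 3 strips everything from the first "/" onward.
strip_url() {
    sed -e 's|^http://||' \
        -e 's|^[Ww][Ww][Ww]\.||' \
        -e 's|/.*$||'
}

# Usage: inputs that lack some of the pieces still come out right.
printf '%s\n' \
    http://www.domain.com/html.asp \
    domain.com/index.html \
    WWW.example.org |
    strip_url
```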

-Mitch


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-questions" in the body of the message





