Date:      Fri, 14 Sep 2007 03:42:36 -0400
From:      Steve Bertrand <iaccounts@ibctech.ca>
To:        Jonathan McKeown <jonathan+freebsd-questions@hst.org.za>
Cc:        Kurt Buff <kurt.buff@gmail.com>, freebsd-questions@freebsd.org
Subject:   Re: Scripting question
Message-ID:  <46EA3B6C.7050200@ibctech.ca>
In-Reply-To: <200709140930.21142.jonathan%2Bfreebsd-questions@hst.org.za>
References:  <a9f4a3860709131016w54c12b6fy94fc2b0f286aea3d@mail.gmail.com>	<20070913183504.GC11683@slackbox.xs4all.nl> <200709140930.21142.jonathan%2Bfreebsd-questions@hst.org.za>


>>> I don't have the perl skills, though that would be ideal.

-- snip --

> Another approach in Perl would be:
> 
> #!/usr/bin/perl
> my (%names, %dups);
> while (<>) {
>     my ($key) = split;
>     $dups{$key} = 1 if $names{$key};
>     $names{$key} = 1;
> }
> delete @names{keys %dups};
> #
> # keys %names is now an unordered list of only non-repeated elements
> # keys %dups is an unordered list of only repeated elements
> 
> split splits on whitespace, returning a list of fields which can be assigned 
> to a list of variables. Here we only want to capture the first field: split 
> is more efficient for this than using a regex. The first occurrence of $key 
> is in parens because it's actually a list of one variable name.
> 
> We build two hashes, one, %names, keyed by the original names (this is the 
> classic way to reduce duplicates to single occurrences, since the duplicated 
> keys overwrite the originals), and one, %dups, whose keys are names already 
> appearing in %names - the duplicated entries. Having done that we use a hash 
> slice to delete from %names all the keys of %dups, which leaves the keys of 
> %names holding all the entries which only appear once (and the keys of %dups 
> all the duplicated entries if that's useful).
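
For anyone who wants to try this without piping a file in, here is a
minimal sketch of the same idea wrapped in a sub and fed a fixed list.
The sub name split_unique_and_dups and the sample lines are made up for
illustration; they are not part of the script quoted above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: takes a list of lines and returns two hash refs,
# both keyed by the first whitespace-separated field of each line.
sub split_unique_and_dups {
    my @lines = @_;
    my (%names, %dups);
    for my $line (@lines) {
        my ($key) = split ' ', $line;   # list assignment keeps the first field
        next unless defined $key;       # skip blank lines
        $dups{$key}  = 1 if $names{$key};
        $names{$key} = 1;
    }
    delete @names{ keys %dups };        # hash slice: drop every repeated key
    return (\%names, \%dups);
}

# Made-up sample input
my ($once, $repeated) = split_unique_and_dups(
    "alice 1", "bob 2", "alice 3", "carol 4", "bob 5",
);
print "once:     @{[ sort keys %$once ]}\n";      # once:     carol
print "repeated: @{[ sort keys %$repeated ]}\n";  # repeated: alice bob
```

The sort is only there to make the output stable, since hash keys come
back in no particular order.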

I don't know if this is completely relevant, but it appears as though it
may help.

Bob Showalter once advised me on the Perl Beginners list as follows
(quoted, but snipped for clarity):

See "perldoc -q duplicate". If the array elements can be compared with
string semantics (as you are doing here), the following will work:

   my @array = do { my %seen; grep !$seen{$_}++, @clean };
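
A small sketch of how that one-liner behaves, assuming @clean is simply
a list of strings (the sub name dedupe and the sample values are made
up). Note that it keeps one copy of every element, in first-seen order,
which is not quite the same thing as the hash-slice approach quoted
earlier, where repeated names are dropped from %names altogether.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical wrapper around the idiom from perlfaq4.
sub dedupe {
    my %seen;
    # $seen{$_}++ returns the old count: false the first time an element
    # is seen, true afterwards, so grep keeps only first occurrences.
    return grep { !$seen{$_}++ } @_;
}

my @clean = qw(alice bob alice carol bob);   # made-up sample data
my @array = dedupe(@clean);
print "@array\n";                            # alice bob carol
```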

Steve


