Date:      Thu, 23 Jan 2014 18:56:04 +0000
From:      RW <rwmaillists@googlemail.com>
To:        freebsd-questions@freebsd.org
Subject:   Re: awk programming question
Message-ID:  <20140123185604.4cbd7611@gumby.homeunix.com>
In-Reply-To: <alpine.BSF.2.00.1401230900270.76961@wonkity.com>
References:  <F01EB9CE742DEB17DB6B51C7@localhost> <alpine.BSF.2.00.1401230900270.76961@wonkity.com>

On Thu, 23 Jan 2014 09:30:35 -0700 (MST)
Warren Block wrote:

> On Thu, 23 Jan 2014, Paul Schmehl wrote:
> 
> > I'm kind of stubborn.  There's lots of different ways to skin a
> > cat, but I like to force myself to use the built-in utilities to do
> > things so I can learn more about them and better understand how
> > they work.
> >
> > So, I'm trying to parse a file of snort rules, extract two string
> > values and insert a double pipe between them to create a
> > sig-msg.map file
> >
> > Here's a typical rule:
> >
> > alert udp $HOME_NET any -> $EXTERNAL_NET 69 (msg:"E3[rb] ET POLICY
> > Outbound TFTP Read Request"; content:"|00 01|"; depth:2;
> > classtype:bad-unknown; sid:2008120; rev:1;)
> >
> > Here's a typical sig-msg.map file entry:
> >
> > 9624 || RPC UNIX authentication machinename string overflow attempt
> > UDP
> >
> > So, from the above rule I would want to create a single line like
> > this:
> >
> > 2008120 || E3[rb] ET POLICY Outbound TFTP Read Request
> >
> > There are several ways I can extract one or the other value, and
> > I've figured out how to extract the sid and add the double pipe,
> > but for the life of me I can't figure out how to extract and print
> > out sid || msg.
> >
> > This prints out the sid and the double pipe:
> >
> > echo `awk 'match($0,/sid:[0-9]*;/) {print
> > substr($0,RSTART,RLENGTH)" || "}' /tmp/mtc.rules | tr -d ";sid"`
> >
> > It seems I could put the results into a variable rather than
> > printing them out, and then print var1 || var2, but my google-fu
> > hasn't found a useful example.
> >
> > Surely there's a way to do this using awk?  I can use tr for
> > cleanup.  I just need to get close to the right result.
> >
> > How about it, awk experts?  What's the cleanest way to get this done?
> 
> Not an awk expert, but you can do math on the start and length
> variables to get just the sid number:
> 
> echo "sid:2008120;" \
>    | awk '{ match($0, /sid:[0-9]*;/) ; \
>  	ymd=substr($0, RSTART+4, RLENGTH-5) ; print ymd }'
> 
> Closer to what you want:
> 
> echo 'msg:"E3[rb] ET POLICY Outbound TFTP Read Request"; sid:2008120;' \
>    | awk '{ match($0, /sid:[0-9]*;/) ; \
>  	ymd=substr($0, RSTART+4, RLENGTH-5) ; \
>  	match($0, /msg:.*;/) ; \
>  	msg = substr($0, RSTART+4, RLENGTH-5) ; \
>  	print ymd, "||", msg }'
> 
> Note the error that the too-greedy regex creates, and the inability
> of awk to capture regex sub-expressions.  awk does not have a way to
> reduce the greediness, at least that I'm aware of.  You may be able to
> work around that, like if the message is always the same length.


$ echo 'msg:"E3[rb] ET POLICY Outbound TFTP Read Request"; sid:2008120;'    |\
 awk '{ match($0, /sid:[0-9]+;/) ;  ymd=substr($0, RSTART+4, RLENGTH-5) ; \
      match($0, /msg:[^;]+;/) ; msg = substr($0, RSTART+4, RLENGTH-5) ;   \
      print ymd, "||", msg }'

2008120 || "E3[rb] ET POLICY Outbound TFTP Read Request"
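To drop the quotes and run it against whole rules, something along these
lines might do. This is only a sketch: the function name sid_msg is made up
for illustration, and it assumes one rule per line with no escaped quotes
inside the msg string. It stops the msg match at the next double quote
rather than at the next semicolon.

```shell
# Sketch: print "sid || msg" for each Snort rule read from stdin.
# Assumes one rule per line and no escaped double quotes inside msg.
# sid_msg is a hypothetical helper name, not part of any tool.
sid_msg() {
    awk '
    match($0, /sid:[0-9]+;/) {
        # Skip the 4-character sid: prefix and drop the trailing semicolon.
        sid = substr($0, RSTART + 4, RLENGTH - 5)
        # Match up to the next double quote instead of the next semicolon.
        if (match($0, /msg:"[^"]*"/)) {
            # Skip the 5-character prefix and drop the closing quote.
            msg = substr($0, RSTART + 5, RLENGTH - 6)
            print sid " || " msg
        }
    }'
}

rule='alert udp $HOME_NET any -> $EXTERNAL_NET 69 (msg:"E3[rb] ET POLICY Outbound TFTP Read Request"; content:"|00 01|"; depth:2; classtype:bad-unknown; sid:2008120; rev:1;)'
printf '%s\n' "$rule" | sid_msg
```

For the file in the original question that would be
sid_msg < /tmp/mtc.rules > sig-msg.map, with lines lacking either field
silently skipped.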


Note that awk supports + and bracket expressions like [^;], but not
newfangled things like non-greedy quantifiers (*?).
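A quick way to see the difference, as a rough sketch (the variable names
line, greedy and bounded are just for illustration): run both patterns over
the same input and print what each one matched.

```shell
# Compare the greedy pattern with the bracket-expression workaround.
line='msg:"E3[rb] ET POLICY Outbound TFTP Read Request"; sid:2008120;'

# .* runs on to the last semicolon on the line.
greedy=$(printf '%s\n' "$line" |
    awk '{ match($0, /msg:.*;/); print substr($0, RSTART, RLENGTH) }')

# [^;]+ cannot cross a semicolon, so the match ends at the first one.
bounded=$(printf '%s\n' "$line" |
    awk '{ match($0, /msg:[^;]+;/); print substr($0, RSTART, RLENGTH) }')

printf 'greedy:  %s\nbounded: %s\n' "$greedy" "$bounded"
```

The greedy version swallows the sid field as well, while the bounded one
stops right after the closing quote of the message.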


