Date:      Thu, 23 Jan 2014 12:20:26 -0800
From:      <dteske@FreeBSD.org>
To:        "'RW'" <rwmaillists@googlemail.com>, <freebsd-questions@freebsd.org>
Cc:        'Devin Teske' <dteske@freebsd.org>
Subject:   RE: awk programming question
Message-ID:  <04a201cf1878$8ebce540$ac36afc0$@FreeBSD.org>
In-Reply-To: <20140123185604.4cbd7611@gumby.homeunix.com>
References:  <F01EB9CE742DEB17DB6B51C7@localhost> <alpine.BSF.2.00.1401230900270.76961@wonkity.com> <20140123185604.4cbd7611@gumby.homeunix.com>



> -----Original Message-----
> From: RW [mailto:rwmaillists@googlemail.com]
> Sent: Thursday, January 23, 2014 10:56 AM
> To: freebsd-questions@freebsd.org
> Subject: Re: awk programming question
> 
> On Thu, 23 Jan 2014 09:30:35 -0700 (MST) Warren Block wrote:
> 
> > On Thu, 23 Jan 2014, Paul Schmehl wrote:
> >
> > > I'm kind of stubborn.  There's lots of different ways to skin a cat,
> > > but I like to force myself to use the built-in utilities to do
> > > things so I can learn more about them and better understand how they
> > > work.
> > >
> > > So, I'm trying to parse a file of snort rules, extract two string
> > > values and insert a double pipe between them to create a sig-msg.map
> > > file
> > >
> > > Here's a typical rule:
> > >
> > > alert udp $HOME_NET any -> $EXTERNAL_NET 69 (msg:"E3[rb] ET POLICY
> > > Outbound TFTP Read Request"; content:"|00 01|"; depth:2;
> > > classtype:bad-unknown; sid:2008120; rev:1;)
> > >
> > > Here's a typical sig-msg.map file entry:
> > >
> > > 9624 || RPC UNIX authentication machinename string overflow attempt
> > > UDP
> > >
> > > So, from the above rule I would want to create a single line like
> > > this:
> > >
> > > 2008120 || E3[rb] ET POLICY Outbound TFTP Read Request
> > >
> > > There are several ways I can extract one or the other value, and
> > > I've figured out how to extract the sid and add the double pipe, but
> > > for the life of me I can't figure out how to extract and print out
> > > sid || msg.
> > >
> > > This prints out the sid and the double pipe:
> > >
> > > echo `awk 'match($0,/sid:[0-9]*;/) {print substr($0,RSTART,RLENGTH)"
> > > || "}' /tmp/mtc.rules | tr -d ";sid"
> > >
> > > It seems I could put the results into a variable rather than
> > > printing them out, and then print var1 || var2, but my google foo
> > > hasn't found a useful example.
> > >
> > > Surely there's a way to do this using awk?  I can use tr for
> > > cleanup.  I just need to get close to the right result.
> > >
> > > How about it awk experts?  What's the cleanest way to get this done?
> >
> > Not an awk expert, but you can do math on the start and length
> > variables to get just the date part:
> >
> > echo "sid:2008120;" \
> >    | awk '{ match($0, /sid:[0-9]*;/) ; \
> >  	ymd=substr($0, RSTART+4, RLENGTH-5) ; print ymd }'
> >
> > Closer to what you want:
> >
> > echo 'msg:"E3[rb] ET POLICY Outbound TFTP Read Request"; sid:2008120;' \
> >    | awk '{ match($0, /sid:[0-9]*;/) ; \
> >  	ymd=substr($0, RSTART+4, RLENGTH-5) ; \
> >  	match($0, /msg:.*;/) ; \
> >  	msg = substr($0, RSTART+4, RLENGTH-5) ; \
> >  	print ymd, "||", msg }'
> >
> > Note the error that the too-greedy regex creates, and the inability of
> > awk to capture regex sub-expressions.  awk does not have a way to
> > reduce the greediness, at least that I'm aware.  You may be able to
> > work around that, like if the message is always the same length.
> 
> 
> $ echo 'msg:"E3[rb] ET POLICY Outbound TFTP Read Request"; sid:2008120;' |\
>  awk '{ match($0, /sid:[0-9]+;/) ;  ymd=substr($0, RSTART+4, RLENGTH-5) ; \
>       match($0, /msg:[^;]+;/) ; msg = substr($0, RSTART+4, RLENGTH-5) ; \
>       print ymd, "||", msg }'
> 
> 2008120 || "E3[rb] ET POLICY Outbound TFTP Read Request"
> 
> Note that awk supports +, but not newfangled things like *.

With respect to regex, what awk really needs is the quantifier syntax...

* = {0,} = zero or more
+ = {1,} = one or more
{x,y} = any quantity from x up to y, inclusive
{x,} = x or more

sed supports it -- e.g., echo "aaa" | sed -e 's/a\{1,2\}//' # produces "a"
sed -E (aka sed -r) supports it -- e.g., echo "aaa" | sed -E 's/a{1,2}//' # produces "a"
grep supports it -- e.g., echo "aaa" | grep 'a\{2,\}' # match printed
grep -E (aka egrep) supports it -- e.g., echo "aaa" | grep -E 'a{2,}' # match printed
perl supports it -- obviously (in the modern regex form, lacking backslash)
nvi supports it -- e.g., :%s/a\{1,2\}//
vim supports it -- obviously (and uses the backslash form; even with
nocompatible set)

onetrueawk however does NOT support it -- example given...
echo aaa | awk '/a{2,}/{print}' # no match printed
echo aaa | awk '/a\{2,\}/{print}' # no match printed
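
Until the braces work, one workaround is to spell the repetition out by hand: a{2,} is the same as aaa* (two literal a's, then zero or more), and a{1,2} is the same as aa?. A minimal sketch of that idea follows; the function name has_two_or_more_a is made up for illustration and is not part of any tool.

```shell
# Hypothetical helper: succeeds when stdin contains a line with two or
# more consecutive "a" characters. "aaa*" is a{2,} written without braces.
has_two_or_more_a() {
	awk '/aaa*/ { found = 1 } END { exit !found }'
}

echo aaa | has_two_or_more_a && echo "match"
echo aba | has_two_or_more_a || echo "no match"
```

This only stays readable for small counts; something like a{10,20} gets unwieldy quickly when written out this way.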

There are a couple of other nits here...

1. The sig-msg.map file, according to the OP, shouldn't have the quotes that
are present in the snort rule input
2. It doesn't ignore lines of no interest (See http://oreilly.com/pub/h/1393)
NB: The result code of match() is ignored; I don't think the program should
output known-bad sig-msg.map lines (where an sid is not given, for example,
which appears to be the key for the sig-msg.map file).
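
To illustrate the point about the result code: match() returns 0 when nothing matches (setting RSTART to 0 and RLENGTH to -1), so it can be used directly as a guard. The following is only a sketch; the function name extract_sids and the two sample input lines are invented for the example.

```shell
# Hypothetical helper: print the sid of each input line that has one,
# and silently skip lines that do not.
extract_sids() {
	awk '{
		# match() returns 0 on failure; skip the line instead of printing junk
		if (!match($0, /sid:[0-9]+;/)) next
		# drop the leading "sid:" (4 chars) and the trailing ";" (1 char)
		print substr($0, RSTART + 4, RLENGTH - 5)
	}'
}

printf '%s\n' 'msg:"no sid here";' 'msg:"ok"; sid:2008120;' | extract_sids
```

Only the second sample line has an sid, so only 2008120 is printed; without the guard, the first line would still produce an output line with an empty key.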

I gather that a more complete solution would be as follows:

awk '!/^[[:space:]]*(#|$)/{
if (!match($0, /[[:space:](;]sid:[[:space:]]*[0-9]/)) next;
sid = substr($0, RSTART + RLENGTH - 1); sub(/[^0-9].*/, "", sid);
if (!match($0, /[[:space:](;]msg:[[:space:]]*/)) next;
buf = substr($0, RSTART + RLENGTH); quoted = substr(buf, 1, 1) == "\"";
split(buf, msg, quoted ? "\"" : FS);
print sid, "||", msg[quoted ? 2 : 1]}' rules_file

Where "rules_file" is the name of the file you want to parse.

Putting this into a script, we can clean it up so that it's readable...

#!/bin/sh
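awk '
# Skip blank lines and comment lines
!/^[[:space:]]*(#|$)/ {
	# No sid means no usable sig-msg.map key; skip the rule
	if (!match($0, /[[:space:](;]sid:[[:space:]]*[0-9]/)) next
	# The match ends on the first digit; keep only the leading digits
	sid = substr($0, RSTART + RLENGTH - 1)
	sub(/[^0-9].*/, "", sid)
	if (!match($0, /[[:space:](;]msg:[[:space:]]*/)) next
	buf = substr($0, RSTART + RLENGTH)
	# substr() positions start at 1; a start of 0 is not portable
	quoted = substr(buf, 1, 1) == "\""
	# Quoted: element 2 is the text between the quotes; else the first word
	split(buf, msg, quoted ? "\"" : FS)
	print sid, "||", msg[quoted ? 2 : 1]
}' "$@"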

-- 
Cheers,
Devin



