From owner-freebsd-questions@FreeBSD.ORG Thu Jan 23 20:20:33 2014
Sender: Devin Teske
To: "'RW'"
Cc: 'Devin Teske'
References: <20140123185604.4cbd7611@gumby.homeunix.com>
In-Reply-To: <20140123185604.4cbd7611@gumby.homeunix.com>
Subject: RE: awk programming question
Date: Thu, 23 Jan 2014 12:20:26 -0800
Message-ID: <04a201cf1878$8ebce540$ac36afc0$@FreeBSD.org>
List-Id: User questions

> -----Original Message-----
> From: RW
> [mailto:rwmaillists@googlemail.com]
> Sent: Thursday, January 23, 2014 10:56 AM
> To: freebsd-questions@freebsd.org
> Subject: Re: awk programming question
>
> On Thu, 23 Jan 2014 09:30:35 -0700 (MST) Warren Block wrote:
>
> > On Thu, 23 Jan 2014, Paul Schmehl wrote:
> >
> > > I'm kind of stubborn.  There's lots of different ways to skin a cat,
> > > but I like to force myself to use the built-in utilities to do
> > > things so I can learn more about them and better understand how they
> > > work.
> > >
> > > So, I'm trying to parse a file of snort rules, extract two string
> > > values and insert a double pipe between them to create a sig-msg.map
> > > file
> > >
> > > Here's a typical rule:
> > >
> > > alert udp $HOME_NET any -> $EXTERNAL_NET 69 (msg:"E3[rb] ET POLICY
> > > Outbound TFTP Read Request"; content:"|00 01|"; depth:2;
> > > classtype:bad-unknown; sid:2008120; rev:1;)
> > >
> > > Here's a typical sig-msg.map file entry:
> > >
> > > 9624 || RPC UNIX authentication machinename string overflow attempt
> > > UDP
> > >
> > > So, from the above rule I would want to create a single line like
> > > this:
> > >
> > > 2008120 || E3[rb] ET POLICY Outbound TFTP Read Request
> > >
> > > There are several ways I can extract one or the other value, and
> > > I've figured out how to extract the sid and add the double pipe, but
> > > for the life of me I can't figure out how to extract and print out
> > > sid || msg.
> > >
> > > This prints out the sid and the double pipe:
> > >
> > > echo `awk 'match($0,/sid:[0-9]*;/) {print substr($0,RSTART,RLENGTH)" || "}' /tmp/mtc.rules | tr -d ";sid"
> > >
> > > It seems I could put the results into a variable rather than
> > > printing them out, and then print var1 || var2, but my google foo
> > > hasn't found a useful example.
> > >
> > > Surely there's a way to do this using awk?  I can use tr for
> > > cleanup.  I just need to get close to the right result.
> > >
> > > How about it awk experts?
> > > What's the cleanest way to get this done?
> >
> > Not an awk expert, but you can do math on the start and length
> > variables to get just the date part:
> >
> > echo "sid:2008120;" \
> >   | awk '{ match($0, /sid:[0-9]*;/) ; \
> >   ymd=substr($0, RSTART+4, RLENGTH-5) ; print ymd }'
> >
> > Closer to what you want:
> >
> > echo 'msg:"E3[rb] ET POLICY Outbound TFTP Read Request"; sid:2008120;' \
> >   | awk '{ match($0, /sid:[0-9]*;/) ; \
> >   ymd=substr($0, RSTART+4, RLENGTH-5) ; \
> >   match($0, /msg:.*;/) ; \
> >   msg = substr($0, RSTART+4, RLENGTH-5) ; \
> >   print ymd, "||", msg }'
> >
> > Note the error that the too-greedy regex creates, and the inability of
> > awk to capture regex sub-expressions.  awk does not have a way to
> > reduce the greediness, at least that I'm aware.  You may be able to
> > work around that, like if the message is always the same length.
> >
> $ echo 'msg:"E3[rb] ET POLICY Outbound TFTP Read Request"; sid:2008120;' |\
>   awk '{ match($0, /sid:[0-9]+;/) ; ymd=substr($0, RSTART+4, RLENGTH-5) ; \
>   match($0, /msg:[^;]+;/) ; msg = substr($0, RSTART+4, RLENGTH-5) ; \
>   print ymd, "||", msg }'
>
> 2008120 || "E3[rb] ET POLICY Outbound TFTP Read Request"
>
> Note that awk supports +, but not newfangled things like *.

With respect to regex, what awk really needs is the quantifier syntax...

    *     = {0,} = zero or more
    +     = {1,} = one or more
    {x,y} = any quantity from x up to and including y
    {x,}  = x or more

sed supports it -- e.g.,

    echo "aaa" | sed -e 's/a\{1,2\}//'    # produces "a"

sed -E (aka sed -r) supports it -- e.g.,

    echo "aaa" | sed -E 's/a{1,2}//'    # produces "a"

grep supports it -- e.g.,

    echo "aaa" | grep 'a\{2,\}'    # match printed

grep -E (aka egrep) supports it -- e.g.,

    echo "aaa" | grep -E 'a{2,}'    # match printed

perl supports it -- obviously (in the modern regex form, lacking backslash)

nvi supports it -- e.g.,

    :%s/a\{1,2\}//

vim supports it -- obviously (and uses the backslash form; even with
nocompatible set)

onetrueawk however does NOT support it -- example given...

    echo aaa | awk '/a{2,}/{print}'    # no match printed
    echo aaa | awk '/a\{2,\}/{print}'    # no match printed

There's a couple of other nits here...

1. sig-msg.map file according to OP shouldn't have the quotes that are
   present from the snort rule input
2. Doesn't ignore lines of disinterest (See http://oreilly.com/pub/h/1393)

NB: The return value of match() is ignored; I don't think the program should
output known bad sig-msg.map lines (where an sid is not given, for example;
which appears to be the key for the sig-msg.map file).

I gather that a more complete solution would be as follows:

    awk '!/^[[:space:]]*(#|$)/{if (!match($0, /[[:space:](;]sid:[[:space:]]*[0-9]/)) next; sid = substr($0, RSTART + RLENGTH - 1); sub(/[^0-9].*/, "", sid); if (!match($0, /[[:space:](;]msg:[[:space:]]*/)) next; buf = substr($0, RSTART + RLENGTH); quoted = substr(buf, 1, 1) == "\""; split(buf, msg, quoted ? "\"" : FS); print sid, "||", msg[quoted ? 2 : 1]}' rules_file

Where "rules_file" is the name of the file you want to parse.

Putting this into a script, we can clean it up so that it's readable...

#!/bin/sh
awk '
# Skip blank lines and comment lines
!/^[[:space:]]*(#|$)/ {
	# No sid means no key for sig-msg.map, so skip the rule
	if (!match($0, /[[:space:](;]sid:[[:space:]]*[0-9]/)) next
	# Start at the first digit, then drop everything after the number
	sid = substr($0, RSTART + RLENGTH - 1)
	sub(/[^0-9].*/, "", sid)
	if (!match($0, /[[:space:](;]msg:[[:space:]]*/)) next
	buf = substr($0, RSTART + RLENGTH)
	# A quoted msg ends at the next double quote; a bare one at whitespace
	quoted = substr(buf, 1, 1) == "\""
	split(buf, msg, quoted ? "\"" : FS)
	print sid, "||", msg[quoted ? 2 : 1]
}' "$@"

-- 
Cheers,
Devin
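
P.S. For an awk whose regex engine lacks the {x,y} form, the repetition can
be spelled out by hand. What follows is only a minimal sketch, assuming
nothing beyond the standard +, ? and * operators; the two function names are
made up for illustration and are not part of the script above.

```shell
#!/bin/sh
# Sketch only: emulate interval expressions by writing the repetition out.
#   a{2,}  becomes aa+   (two or more)
#   a{1,2} becomes aa?   (one or two)

# Hypothetical helper: succeed if the argument contains two or more
# consecutive "a" characters (the hand-written form of /a{2,}/).
has_two_or_more_a() {
	printf '%s\n' "$1" | awk '/aa+/ { found = 1 } END { exit !found }'
}

# Hypothetical helper: delete the first run of one or two "a" characters,
# mirroring the sed 's/a\{1,2\}//' example.
strip_one_or_two_a() {
	printf '%s\n' "$1" | awk '{ sub(/aa?/, ""); print }'
}

has_two_or_more_a aaa && echo "aaa: match"
has_two_or_more_a a || echo "a: no match"
strip_one_or_two_a aaa    # prints "a"
```

The same idea extends to a bounded range at the cost of a longer pattern;
a{2,4}, for instance, can be written as aaa?a?.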