Date:      Fri, 20 Aug 2010 12:12:20 -0500
From:      Paul Schmehl <pschmehl_lists@tx.rr.com>
To:        FreeBSD Questions <freebsd-questions@freebsd.org>
Subject:   Any awk gurus on the list?
Message-ID:  <23BA961B74BA2B5CA8B523F9@utd65257.utdallas.edu>

I'm trying to figure out how to use awk, from within a shell script, to parse 
values from a string of unknown length with an unknown number of fields, and 
write those values to a file in a certain order.

Here's a typical string that I want to parse:

alert ip 
[50.0.0.0/8,100.0.0.0/6,104.0.0.0/5,112.0.0.0/6,173.0.0.0/8,174.0.0.0/7,176.0.0.0/5,184.0.0.0/6] 
any -> $HOME_NET any (msg:"ET POLICY Reserved IP Space Traffic - Bogon Nets 2"; 
classtype:bad-unknown; reference:url,www.cymru.com/Documents/bogon-list.html; 
threshold: type limit, track by_src, count 1, seconds 360; sid:2002750; rev:10;)

What I want to do is extract the value after "sid:", the value after 
"reference:" and the value after "msg:" and insert them into a file that would 
look like this:

2002750 || "ET POLICY Reserved IP Space Traffic - Bogon Nets 2" || 
url,www.cymru.com/Documents/bogon-list.html

Yes, I know I could do this easily in Perl.  I'm doing this to try and improve 
my understanding of awk.  I *think* I've figured out that the right approach is 
to use an associative array, and this command:

#  awk '!/#/ { for (i=1; i<=NF; i++) { if ( $i ~ /sid/) {mtcmsg[sid]=$i;
print mtcmsg[sid]}}}' < /usr/local/etc/snort/rules/mtc.rules.test

produces this data:
sid:299913;
sid:52123;
sid:3001441;
sid:1444;
sid:2008120;
sid:5001684;
sid:2001683;
sid:22466;
sid:2002750;
sid:3000003;
sid:292000032;
sid:22000032;
sid:3000000;
sid:2003070;
sid:2003484;
sid:2003603;
sid:31000004;
sid:299998;

So it appears (at least to me) that I'm on the right path, but I thought I'd 
query the awk gurus on the list.  Is there a better way to approach this?
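
A side note on the array in the command above, offered as a small sketch rather than a diagnosis: in awk an unquoted subscript such as mtcmsg[sid] is read as the variable sid, and since that variable is never assigned, the key is the empty string rather than the text "sid". The two-record input and the array name arr below are made up for illustration.

```shell
# Hypothetical input: two records that each contain a sid field.
sample='a sid:1;
b sid:2;'

# Shared END block that reports which keys exist in the array.
report='END {
    has_empty = ("" in arr) ? "yes" : "no"
    has_sid   = ("sid" in arr) ? "yes" : "no"
    print "empty=" has_empty " sid=" has_sid
}'

# Unquoted subscript: sid is an unset variable, so the key is "".
unquoted=$(printf '%s\n' "$sample" |
    awk '{ for (i = 1; i <= NF; i++) if ($i ~ /sid/) arr[sid] = $i } '"$report")

# Quoted subscript: the key is the literal string "sid".
quoted=$(printf '%s\n' "$sample" |
    awk '{ for (i = 1; i <= NF; i++) if ($i ~ /sid/) arr["sid"] = $i } '"$report")

printf 'unquoted: %s\nquoted:   %s\n' "$unquoted" "$quoted"
```

Either way the array holds a single element that is overwritten on every record, so it behaves like a plain variable here; an array only pays off if it is keyed on something that differs per rule, such as the sid value itself.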

The standard FS breaks the msg into multiple fields, which is unacceptable.  So 
my thinking is that I would need to do something like this (pseudocode):

!/#/; FS=";" {if ($i ~ /sid/) then use tr to strip the "sid:" and ";" and 
insert the result into an element named sid
if ($i ~ /reference/) then ditto into an element named ref
if ($i ~ /msg/) then ditto into an element named msg
then print array[sid]" || "array[msg]" || " array[ref] > resulting file.}
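
The pseudocode above could be sketched in runnable form roughly as follows. This is one possible reading, not a tested solution for the real rules file: the function name extract_rule_fields is made up, the sample rule is shortened to a single address and placed on one line, sub() stands in for the tr step, and only lines that begin with "#" are skipped (the original !/#/ skips any line containing one).

```shell
# Hypothetical sample rule on a single line, as it would appear in the rules file.
rule='alert ip [50.0.0.0/8] any -> $HOME_NET any (msg:"ET POLICY Reserved IP Space Traffic - Bogon Nets 2"; classtype:bad-unknown; reference:url,www.cymru.com/Documents/bogon-list.html; threshold: type limit, track by_src, count 1, seconds 360; sid:2002750; rev:10;)'

# Reads rules on stdin and prints: sid || msg || reference
extract_rule_fields() {
    awk -F';' '
    /^[ \t]*#/ { next }                 # skip commented-out rules
    {
        sid = msg = ref = ""            # reset for every rule
        for (i = 1; i <= NF; i++) {
            f = $i
            # Strip everything up to and including the keyword.
            if (f ~ /msg:/)            { sub(/.*msg:/, "", f);       msg = f }
            else if (f ~ /reference:/) { sub(/.*reference:/, "", f); ref = f }
            else if (f ~ /sid:/)       { sub(/.*sid:/, "", f);       sid = f }
        }
        if (sid != "") print sid " || " msg " || " ref
    }'
}

result=$(printf '%s\n' "$rule" | extract_rule_fields)
printf '%s\n' "$result"
```

Splitting on ";" assumes no semicolon appears inside the msg text, and a rule with several reference options would keep only the last one. Redirecting the function's output with > would write the result file.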

But when I add an FS to the script, I get odd results:

#  awk '!/#/ { FS=";"; for (i=1; i<=NF; i++) { if ( $i ~ /sid/)
{mtcmsg[sid]=$i; print mtcmsg[sid]}}}' \
    < /usr/local/etc/snort/rules/mtc.rules.test
sid:299913;
 sid:52123
 sid:3001441
 sid:1444
 sid:2008120
 sid:5001684
 sid:2001683
 sid:22466
 sid:2002750
 sid:3000003
 sid:292000032
 sid:22000032
 sid:3000000
 sid:2003070
 sid:2003484
 sid:2003603
 sid:31000004
 sid:299998

Why is the first value not indented and not stripped of the semi-colon?
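
A likely explanation, sketched under the assumption that the awk in use follows POSIX: a change to FS takes effect with the next record read, so when FS is assigned inside the action, the first record has already been split on whitespace. The two-record input below is hypothetical and only mimics the shape of a rule.

```shell
# Hypothetical two-record input standing in for the rules file.
records='x; sid:1; y
x; sid:2; y'

# FS set inside the action: record 1 was already split on whitespace,
# so its second field keeps the ";" and has no leading space.
late=$(printf '%s\n' "$records" | awk '{ FS=";"; print "[" $2 "]" }')

# FS set before any input is read (-F, or FS=";" in a BEGIN block):
# every record, including the first, is split on ";".
early=$(printf '%s\n' "$records" | awk -F';' '{ print "[" $2 "]" }')

printf 'late:\n%s\nearly:\n%s\n' "$late" "$early"
```

The leading space on the later values is the blank that follows each ";" in the rule; it survives because FS=";" splits on the semicolon only.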

-- 
Paul Schmehl, Senior Infosec Analyst
As if it wasn't already obvious, my opinions
are my own and not those of my employer.
*******************************************
"It is as useless to argue with those who have
renounced the use of reason as to administer
medication to the dead." Thomas Jefferson



