Date: Fri, 20 Aug 2010 12:12:20 -0500
From: Paul Schmehl <pschmehl_lists@tx.rr.com>
To: FreeBSD Questions <freebsd-questions@freebsd.org>
Subject: Any awk gurus on the list?
Message-ID: <23BA961B74BA2B5CA8B523F9@utd65257.utdallas.edu>

I'm trying to figure out how to use awk, from within a shell script, to parse values from a string of unknown length with an unknown number of fields, and write those values to a file in a certain order.

Here's a typical string that I want to parse:

    alert ip [50.0.0.0/8,100.0.0.0/6,104.0.0.0/5,112.0.0.0/6,173.0.0.0/8,174.0.0.0/7,176.0.0.0/5,184.0.0.0/6] any -> $HOME_NET any (msg:"ET POLICY Reserved IP Space Traffic - Bogon Nets 2"; classtype:bad-unknown; reference:url,www.cymru.com/Documents/bogon-list.html; threshold: type limit, track by_src, count 1, seconds 360; sid:2002750; rev:10;)

What I want to do is extract the value after "sid:", the value after "reference:" and the value after "msg:" and insert them into a file that would look like this:

    2002750 || "ET POLICY Reserved IP Space Traffic - Bogon Nets 2" || url,www.cymru.com/Documents/bogon-list.html

Yes, I know I could do this easily in Perl. I'm doing this to try and improve my understanding of awk.

I *think* I've figured out that the right approach is to use an associative array, and this command:

    # awk '!/#/ { for (i=1; i<=NF; i++) { if ( $i ~ /sid/) {mtcmsg[sid]=$i; print mtcmsg[sid]}}}' < /usr/local/etc/snort/rules/mtc.rules.test

produces this data:

    sid:299913;
    sid:52123;
    sid:3001441;
    sid:1444;
    sid:2008120;
    sid:5001684;
    sid:2001683;
    sid:22466;
    sid:2002750;
    sid:3000003;
    sid:292000032;
    sid:22000032;
    sid:3000000;
    sid:2003070;
    sid:2003484;
    sid:2003603;
    sid:31000004;
    sid:299998;

So it appears (at least to me) that I'm on the right path, but I thought I'd query the awk gurus on the list. Is there a better way to approach this? The standard FS breaks the msg into multiple fields, which is unacceptable.
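
To make the field-splitting problem concrete, here is a minimal sketch that runs the sample rule through awk twice: once with the default FS and once with FS set to ";" through the -F option. The variable names (rule, default_msg, semi_msg) are made up for this illustration and are not part of the original script.

```shell
#!/bin/sh
# The sample rule quoted in the message, held in a shell variable.
rule='alert ip [50.0.0.0/8,100.0.0.0/6,104.0.0.0/5,112.0.0.0/6,173.0.0.0/8,174.0.0.0/7,176.0.0.0/5,184.0.0.0/6] any -> $HOME_NET any (msg:"ET POLICY Reserved IP Space Traffic - Bogon Nets 2"; classtype:bad-unknown; reference:url,www.cymru.com/Documents/bogon-list.html; threshold: type limit, track by_src, count 1, seconds 360; sid:2002750; rev:10;)'

# Default FS: fields are separated by runs of blanks, so the quoted
# msg text is spread over several fields and only its first word matches.
default_msg=$(printf '%s\n' "$rule" |
    awk '{ for (i = 1; i <= NF; i++) if ($i ~ /msg:/) print $i }')

# FS=";" set with -F before any input is read: the whole msg option
# stays in one field (together with the rule header in front of it).
semi_msg=$(printf '%s\n' "$rule" |
    awk -F';' '{ for (i = 1; i <= NF; i++) if ($i ~ /msg:/) print $i }')

printf 'default FS: %s\n' "$default_msg"   # prints: default FS: (msg:"ET
printf 'FS=";":     %s\n' "$semi_msg"
```

With ";" as the separator the msg text arrives in one piece, although the field still carries the rule header before "msg:", which has to be stripped separately.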

So my thinking is that I would need to do something like this (pseudocode):

    !/#/; FS=";" {
        if ($i ~ /sid/) then use tr to strip the "sid:" and ";" and insert the result into an element named sid
        if ($i ~ /reference/) then ditto into an element named ref
        if ($i ~ /msg/) then ditto into an element named msg
        then print array[sid] " || " array[msg] " || " array[ref] > resulting file
    }

But when I add an FS to the script, I get odd results:

    # awk '!/#/ { FS=";"; for (i=1; i<=NF; i++) { if ( $i ~ /sid/) {mtcmsg[sid]=$i; print mtcmsg[sid]}}}' < /usr/local/etc/snort/rules/mtc.rules.test
    sid:299913;
    sid:52123
    sid:3001441
    sid:1444
    sid:2008120
    sid:5001684
    sid:2001683
    sid:22466
    sid:2002750
    sid:3000003
    sid:292000032
    sid:22000032
    sid:3000000
    sid:2003070
    sid:2003484
    sid:2003603
    sid:31000004
    sid:299998

Why is the first value indented and not stripped of the semi-colon?

-- 
Paul Schmehl, Senior Infosec Analyst
As if it wasn't already obvious, my opinions are my own and not those of my employer.
*******************************************
"It is as useless to argue with those who have renounced the use of reason as to administer medication to the dead." Thomas Jefferson
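
One likely explanation for the odd first line, offered as a sketch and not a definitive answer: awk splits each record into fields before the action runs, so an FS assigned inside the action only takes effect from the next record. The first rule is therefore still split on blanks (hence the trailing ";"), while later rules are split on ";" and keep the blank that follows each semicolon. Setting FS in a BEGIN block avoids this. The script below follows the pseudocode under a few assumptions that are not in the original: the temporary file name, the variable names, the anchored comment pattern, and sub() in place of tr are all illustrative choices. It also assumes the msg text contains no ";" and that each rule has at most one reference (the last one wins otherwise).

```shell
#!/bin/sh
# Hypothetical test file holding one comment line and the sample rule.
rules=$(mktemp /tmp/mtc.rules.XXXXXX)
cat > "$rules" <<'EOF'
# a commented-out line that should be skipped
alert ip [50.0.0.0/8,100.0.0.0/6,104.0.0.0/5,112.0.0.0/6,173.0.0.0/8,174.0.0.0/7,176.0.0.0/5,184.0.0.0/6] any -> $HOME_NET any (msg:"ET POLICY Reserved IP Space Traffic - Bogon Nets 2"; classtype:bad-unknown; reference:url,www.cymru.com/Documents/bogon-list.html; threshold: type limit, track by_src, count 1, seconds 360; sid:2002750; rev:10;)
EOF

result=$(awk '
BEGIN { FS = ";" }            # set FS before the first record is split
/^[ \t]*#/ { next }           # skip comment lines
{
    sid = ref = msg = ""      # reset per rule
    for (i = 1; i <= NF; i++) {
        f = $i
        sub(/^[ \t]+/, "", f)                      # drop the blank after ";"
        if (f ~ /^sid:/)            { sub(/^sid:/, "", f);       sid = f }
        else if (f ~ /^reference:/) { sub(/^reference:/, "", f); ref = f }
        else if (f ~ /msg:/)        { sub(/^.*msg:/, "", f);     msg = f }
    }
    if (sid != "") print sid " || " msg " || " ref
}' "$rules")

printf '%s\n' "$result"
rm -f "$rules"
```

Plain variables are enough here because each rule is printed as soon as it has been parsed; an associative array would only be needed if values had to be kept across rules.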