Date: Thu, 23 Jan 2014 18:56:04 +0000
From: RW
To: freebsd-questions@freebsd.org
Subject: Re: awk programming question
Message-ID: <20140123185604.4cbd7611@gumby.homeunix.com>

On Thu, 23 Jan 2014 09:30:35 -0700 (MST)
Warren Block wrote:

> On Thu, 23 Jan 2014, Paul Schmehl wrote:
>
> > I'm kind of stubborn. There's lots of different ways to skin a
> > cat, but I like to force myself to use the built-in utilities to do
> > things so I can learn more about them and better understand how
> > they work.
> >
> > So, I'm trying to parse a file of snort rules, extract two string
> > values and insert a double pipe between them to create a
> > sig-msg.map file
> >
> > Here's a typical rule:
> >
> > alert udp $HOME_NET any -> $EXTERNAL_NET 69 (msg:"E3[rb] ET POLICY
> > Outbound TFTP Read Request"; content:"|00 01|"; depth:2;
> > classtype:bad-unknown; sid:2008120; rev:1;)
> >
> > Here's a typical sig-msg.map file entry:
> >
> > 9624 || RPC UNIX authentication machinename string overflow attempt
> > UDP
> >
> > So, from the above rule I would want to create a single line like
> > this:
> >
> > 2008120 || E3[rb] ET POLICY Outbound TFTP Read Request
> >
> > There are several ways I can extract one or the other value, and
> > I've figured out how to extract the sid and add the double pipe,
> > but for the life of me I can't figure out how to extract and print
> > out sid || msg.
> >
> > This prints out the sid and the double pipe:
> >
> > echo `awk 'match($0,/sid:[0-9]*;/) {print
> > substr($0,RSTART,RLENGTH)" || "}' /tmp/mtc.rules | tr -d ";sid"`
> >
> > It seems I could put the results into a variable rather than
> > printing them out, and then print var1 || var2, but my google foo
> > hasn't found a useful example.
> >
> > Surely there's a way to do this using awk? I can use tr for
> > cleanup. I just need to get close to the right result.
> >
> > How about it awk experts? What's the cleanest way to get this done?
>
> Not an awk expert, but you can do math on the start and length
> variables to get just the date part:
>
>   echo "sid:2008120;" \
>   | awk '{ match($0, /sid:[0-9]*;/) ; \
>     ymd=substr($0, RSTART+4, RLENGTH-5) ; print ymd }'
>
> Closer to what you want:
>
>   echo 'msg:"E3[rb] ET POLICY Outbound TFTP Read Request"; sid:2008120;' \
>   | awk '{ match($0, /sid:[0-9]*;/) ; \
>     ymd=substr($0, RSTART+4, RLENGTH-5) ; \
>     match($0, /msg:.*;/) ; \
>     msg = substr($0, RSTART+4, RLENGTH-5) ; \
>     print ymd, "||", msg }'
>
> Note the error that the too-greedy regex creates, and the inability
> of awk to capture regex sub-expressions. awk does not have a way to
> reduce the greediness, at least that I'm aware. You may be able to
> work around that, like if the message is always the same length.

$ echo 'msg:"E3[rb] ET POLICY Outbound TFTP Read Request"; sid:2008120;' |\
awk '{ match($0, /sid:[0-9]+;/) ; ymd=substr($0, RSTART+4, RLENGTH-5) ; \
match($0, /msg:[^;]+;/) ; msg = substr($0, RSTART+4, RLENGTH-5) ; \
print ymd, "||", msg }'
2008120 || "E3[rb] ET POLICY Outbound TFTP Read Request"

Note that awk supports +, but not newfangled things like the non-greedy
*?. The negated bracket expression [^;]+ gets the same effect here by
stopping the match at the first semicolon.
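
The output above still carries the double quotes around the message,
which the requested sig-msg.map line does not have. A minimal sketch of
one way to drop them, using the same match()/substr() arithmetic: anchor
the msg pattern on the quotes themselves and shift the offsets by one
more character at each end. The function name sid_msg_map is made up
for illustration, and the sketch assumes each rule sits on a single
line and that the msg text never contains an embedded double quote.

```shell
# Hypothetical helper (name is illustrative): print "sid || msg" for each
# rule line that carries both fields. Reads the named files, or stdin.
# Assumes one rule per line and no embedded double quote inside msg.
sid_msg_map() {
    awk '
    match($0, /sid:[0-9]+;/) {
        # skip the 4 characters of sid: and drop the trailing semicolon
        sid = substr($0, RSTART + 4, RLENGTH - 5)
        if (match($0, /msg:"[^"]+"/)) {
            # skip the 5 characters of msg:" and drop the closing quote
            msg = substr($0, RSTART + 5, RLENGTH - 6)
            print sid, "||", msg
        }
    }' "$@"
}

# The rule from the original question, fed in on stdin.
printf '%s\n' 'alert udp $HOME_NET any -> $EXTERNAL_NET 69 (msg:"E3[rb] ET POLICY Outbound TFTP Read Request"; content:"|00 01|"; depth:2; classtype:bad-unknown; sid:2008120; rev:1;)' | sid_msg_map
```

For that rule it should print

2008120 || E3[rb] ET POLICY Outbound TFTP Read Request

and something like "sid_msg_map /tmp/mtc.rules > sig-msg.map" would
process a whole file. Lines without a sid or a quoted msg are skipped
silently, which may or may not be what you want.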