Date: Tue, 18 Apr 2017 02:19:26 +0200
From: Polytropon
To: Ernie Luzar
Cc: freebsd-questions@freebsd.org
Subject: Re: awk help
Message-Id: <20170418021926.8410148b.freebsd@edvax.de>
In-Reply-To: <58F53EEA.2030206@gmail.com>
Organization: EDVAX

On Mon, 17 Apr 2017 18:17:14 -0400, Ernie Luzar wrote:
> In general I am experimenting with ipfilter ippools {IE; in-core
> I have written a csh "process hits" script that takes 5+ minutes to
> process that report. I have seen awk used in some public scripts but I
> have never used it before. I wanted to learn awk and though it would be
> a good idea to rewrite my csh "process hits" script in awk. To have a
> fair comparison I needed the awk version to do the rm & touch on the
> files that the csh version does.

Allow me a short side note: What you've written (and presented to
the list) is not a csh script. It's a sh script. FreeBSD's default
interactive shell is csh, the C shell, but the default scripting
shell is sh, a "kind of" Bourne shell. Using sh for scripting is
something like an "industry standard". Nobody likes to write
scripts for the C shell. :-)

> Its obvious that awk is far superior in performance over native csh
> programming.

It is. The awk scripting language is intended for text processing,
pattern matching, output manipulation and "text-related" programming,
while sh (not csh!) is much better for "general" programming, and
of course as "all purpose programming glue". :-)

> I have another csh script to expire records from the master file that
> runs a long time.
> The csh script follows;
>
> # The following logic removes expired records
> for line in `cat $temp_master_db`; do
>   ip=`echo -n $line | cut -w -f 2`
>   date=`echo -n $line | cut -w -f 1`
>
>   if [ "$on_one" = "YES" ]; then
>     on_one="NO"
>     previous_ip="$ip"
>     previous_date="$date"
>     continue
>   fi
>
>   if [ "$ip" != "$previous_ip" ]; then
>
>     if [ $previous_date -le $expire_date ]; then
>       # Drop the record from the master db file as expired.
>       previous_ip="$ip"
>       previous_date="$date"
>       continue
>     else
>       db_rec="$previous_date $previous_ip"
>       echo "${db_rec}" >> $master_db_new
>       previous_ip="$ip"
>       previous_date="$date"
>     fi
>   else
>     # Here current ip and previous_ip are the same.
>     # Check if expired.
>     if [ $previous_date -le $expire_date ]; then
>       # Drop the record from the master db file as expired.
>       previous_ip="$ip"
>       previous_date="$date"
>       continue
>     fi
>     if [ $previous_date -le $date ]; then
>       # Drop the record from the master db file as expired.
>       previous_ip="$ip"
>       previous_date="$date"
>       continue
>     fi
>
>     db_rec="$previous_date $previous_ip"
>     echo "${db_rec}" >> $master_db_new
>     previous_ip="$ip"
>     previous_date="$date"
>
>   fi
> done
>
> # At EOF, must still process previous record.
> if [ $previous_date -le $expire_date ]; then
>   db_rec="$previous_date $previous_ip"
>   echo "${db_rec}" >> $master_db_new
> fi
>
>
> Is there some standard awk model to achieve this previous-save logic?

From quickly reading that code, it should be possible to re-implement
this with awk. I'm currently not aware of a "pattern name" for what
you're trying to accomplish, but the sh code should be "translatable"
into awk code.

> Also can a csh $variable be used inside of an awk program?

Not directly. A sh (not csh!) variable is prefixed by $, but the awk
program is typically enclosed in single quotes, which prohibit the
normal function of $FOO or ${FOO}; awk uses $ itself, for example
in field identifiers like $0, $1, $2 and so on.
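
As a side sketch covering both questions: awk's -v option assigns an
awk variable from the command line before the program runs, so a sh
variable can be handed to a single-quoted awk program without touching
the quoting. The example combines that with one common way of writing
"remember the previous record" logic, namely a few variables plus an
END block. It is not a translation of the script quoted above. It
assumes a simplified reading of it: the input has "date ip" per line,
is sorted by ip and then by date, and only the newest record per ip is
kept unless it has expired. All file names, variable names and sample
values are made up for illustration.

```shell
#!/bin/sh
# Hypothetical values and file names, not taken from the original script.
expire_date=20170301
master_db=$(mktemp /tmp/master_db.XXXXXX)

# Sample data: "date ip", sorted by ip, then by date (an assumption).
cat > "$master_db" <<'EOF'
20170101 10.0.0.1
20170410 10.0.0.1
20170102 10.0.0.2
20170405 10.0.0.3
EOF

# -v copies the sh variable into the awk variable "expire".
result=$(awk -v expire="$expire_date" '
    # Print the remembered record unless it has expired.
    function emit_prev() {
        if (have_prev && prev_date + 0 > expire + 0)
            print prev_date, prev_ip
    }

    # A different ip starts: the remembered record closed its run.
    have_prev && $2 != prev_ip { emit_prev() }

    # Remember the current record for the next line (or for END).
    { prev_date = $1; prev_ip = $2; have_prev = 1 }

    # At EOF the last remembered record is still pending.
    END { emit_prev() }
' "$master_db")

echo "$result"
rm -f "$master_db"
```

With the sample data this should keep the newest record for 10.0.0.1
and the record for 10.0.0.3, and drop 10.0.0.2 as expired. Inside the
awk program, "expire" is an ordinary awk variable, so it does not
collide with awk's own use of $ for fields.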
If you'd have _no_ $ in your awk code, you could probably do
something like this:

	#!/bin/sh
	FOO=100
	awk "BEGIN { print $FOO }"

But of course, now you'll get problems using double quotes in awk.

However, there is (at least) one way to deal with this problem:
Prefix the data you're going to process with "special lines", let's
say they start with #, followed by a name (the "variable name"), a =,
and the "value". You can easily generate this as a temporary file
from your "glue" script.

Example:

	#!/bin/sh

	# variables and values
	FOO="100"
	BAR="123.456.789.0"

	# file names
	CONFIGFILE="/tmp/config.tmp"
	DATA_IN="ip_in.txt"
	DATA_OUT="ip_out.txt"

	echo "#FOO=${FOO}" > ${CONFIGFILE}
	echo "#BAR=${BAR}" >> ${CONFIGFILE}

	cat ${CONFIGFILE} ${DATA_IN} | awk -F "=" '
	/^#[A-Z]/ {
		if ($1 == "#FOO")
			foo = $2
		if ($1 == "#BAR")
			bar = $2
	}

	/Address/ {
		... # something that uses foo
	}

	/Hits/ {
		... # something else that uses bar
	}
	' > ${DATA_OUT}

	rm ${CONFIGFILE}

In case you want to "filter out" those "special lines", you can
for example use

	| grep -v "^#" |

in your processing pipeline.

Another option would be a "search and replace" mechanism that
modifies the awk program itself. That can be done with awk or
sed. NB: sed, the stream editor, is one of the most convenient
ways to do a "search and replace" operation:

	| sed "s/from/to/g" |

in your pipeline. As you see from the " quotes, using shell variables
is no problem here.

Let's say your awk script has two "placeholders" called FOO and
BAR (make sure they're unique!). You simply replace them with
the values present in the sh "glue".

Example:

	#!/bin/sh

	# variables and values
	FOO="100"
	BAR="123.456.789.0"

	# file names
	DATA_IN="ip_in.txt"
	DATA_OUT="ip_out.txt"
	SCRIPT_ORIG="process_ip_orig.awk"
	SCRIPT_MOD="process_ip.awk"

	sed "s/FOO/${FOO}/g; s/BAR/${BAR}/g" < ${SCRIPT_ORIG} > ${SCRIPT_MOD}

	cat ${DATA_IN} | awk -f ${SCRIPT_MOD} > ${DATA_OUT}

	rm ${SCRIPT_MOD}

NB: Useless use of cat.
:-)

I'm sure there are several other ways of doing this, but maybe
those two examples can help or at least inspire you. :-)

-- 
Polytropon
Magdeburg, Germany
Happy FreeBSD user since 4.0
Andra moi ennepe, Mousa, ...