From owner-freebsd-questions@FreeBSD.ORG Thu Dec 13 22:41:10 2007 Return-Path: Delivered-To: freebsd-questions@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A963816A41A for ; Thu, 13 Dec 2007 22:41:10 +0000 (UTC) (envelope-from kdk@daleco.biz) Received: from ezekiel.daleco.biz (southernuniform.com [66.76.92.18]) by mx1.freebsd.org (Postfix) with ESMTP id 51C0013C448 for ; Thu, 13 Dec 2007 22:41:10 +0000 (UTC) (envelope-from kdk@daleco.biz) Received: from localhost (localhost [127.0.0.1]) by ezekiel.daleco.biz (8.13.8/8.13.8) with ESMTP id lBDMetAc041687; Thu, 13 Dec 2007 16:40:55 -0600 (CST) (envelope-from kdk@daleco.biz) X-Virus-Scanned: amavisd-new at daleco.biz Received: from ezekiel.daleco.biz ([127.0.0.1]) by localhost (ezekiel.daleco.biz [127.0.0.1]) (amavisd-new, port 10024) with LMTP id 9xpstgy0MaNm; Thu, 13 Dec 2007 16:40:49 -0600 (CST) Received: from archangel.daleco.biz (dsl.daleco.biz [209.125.108.70]) by ezekiel.daleco.biz (8.13.8/8.13.8) with ESMTP id lBDMdsZO041662; Thu, 13 Dec 2007 16:39:56 -0600 (CST) (envelope-from kdk@daleco.biz) Message-ID: <4761B4B4.8080100@daleco.biz> Date: Thu, 13 Dec 2007 16:39:48 -0600 From: Kevin Kinsey User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.8.1.2) Gecko/20070418 SeaMonkey/1.1.1 MIME-Version: 1.0 To: Rudy References: <4754C19E.5060708@monkeybrains.net> <4754CD5B.90605@daleco.biz> <4754DD17.6050701@monkeybrains.net> <20071204074444.GB12505@dan.emsphone.com> <476198EB.5010802@monkeybrains.net> In-Reply-To: <476198EB.5010802@monkeybrains.net> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: Dan Nelson , freebsd-questions@freebsd.org Subject: Re: cron pile up! Lot's of "cron: running job (cron)" X-BeenThere: freebsd-questions@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: User questions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 13 Dec 2007 22:41:10 -0000 Rudy wrote: > Dan Nelson wrote: >> In the last episode (Dec 03), Support (Rudy) said: >>> Below is part of the cron... Seems like any random cronjob can get >>> clogged up... load varies from 0.2 to 1.0 on this dual-core box. I >>> rebooted the box -- cron's continue to slowly pile up. >>> >>> One of the cronjobs that is 'stuck' is this one: >>> /root/bin/raid-status.sh >>> which can be found here: >>> http://www.monkeybrains.net/~rudy/example/raid_status.html >>> >>> Forgot to mention, I am running: >>> 6.2-STABLE FreeBSD 6.2-STABLE #3: Thu May 31 01:18:15 PDT 2007 >>> >>> OH, ps shows this: >>> 58383 ?? D 0:00.00 cron: running job (cron) >>> 58384 ?? IVs 0:00.00 cron: running job (cron) >> >> In general, when troubleshhoting, "ps axlw" is a more useful command. >> It adds among other columns, the MWCHAN one, which details exactly why >> a process is stuck in the D state. >> Anyway, cron does a fork and then a vfork creating a child and a >> grandchild process. I'm sort of surprised at the amount of code >> between vfork and exec in the grandchild in >> /src/usr.sbin/cron/cron/do_command.c . Since process 3 is actually >> using process 2's address space one must be extremely careful not to >> modify static variables or change other global state that would affect >> the parent once it resumes execution, and all the logging, >> environment-setting, and user-context calls are certain to mess with >> the parent's state, especially with nss modules in the mix. I'd >> personally recompile cron with all vforks replaced with fork and see >> what happens. >> >> It couldn't hurt to update to a newer kernel version along the RELENG_6 >> branch as a test, I guess. Note that your uname will change to >> 6.3-PRERELEASE, but apart from causing lsof to complain, you should be >> okay. >> >>> /var/log/cron has this entry: >>> Dec 3 20:16:00 pita /usr/sbin/cron[58384]: (root) CMD >>> (/root/bin/raid-status.sh CRON) >>> >>> BUT there is no 'raid-status.sh' stuck in the "ps axw". Seems like >>> the vfork set off the cronjob, it ran, but then cron didn't 'stop' >>> executing. Any debuggin tips? >> >> Can you tell if raid-status.sh ever ran? i.e. is process 2 >> stuck at the start of vfork or at the end. > > I added this line to the top of my cronjob: > logger -t DEBUG "$0: $$" > and cron seems stuck BEFORE the script is ever run. Whether it sticks > or not appears random, as plenty of log lines are showing up with the > output of the logger command in my /var/log/messages. > > # tail /var/log/messages > Dec 13 11:16:00 pita DEBUG: /root/bin/raid-status.sh: 64414 > Dec 13 12:00:00 pita DEBUG: /root/bin/raid-status.sh: 80115 > Dec 13 12:00:00 pita DEBUG: /root/bin/raid-status.sh: 80119 > Dec 13 12:11:00 pita DEBUG: /root/bin/raid-status.sh: 84283 > > Here is the ps output: > # ps axlw > UID PID PPID CPU PRI NI VSZ RSS MWCHAN STAT TT TIME COMMAND > 0 85939 82253 0 8 0 2148 1560 ppwait D ?? 0:00.00 > cron: running job (cron) > 0 85940 85939 0 4 0 2148 1560 sbwait IVs ?? 0:00.00 > cron: running job (cron) > # grep 85940 /var/log/cron > Dec 13 12:16:00 pita /usr/sbin/cron[85940]: (root) CMD > (/root/bin/raid-status.sh CRON) > > - Rudy Just as a favor to an old coot, could you change your crontab entry to read like this: */16 * * * * "/root/bin/raid-status.sh" and see if it makes any difference? Kevin Kinsey -- There are never any bugs you haven't found yet.