Date: Thu, 13 Dec 2007 16:39:48 -0600
From: Kevin Kinsey <kdk@daleco.biz>
To: Rudy <crapsh@monkeybrains.net>
Cc: Dan Nelson <dnelson@allantgroup.com>, freebsd-questions@freebsd.org
Subject: Re: cron pile up! Lot's of "cron: running job (cron)"
Message-ID: <4761B4B4.8080100@daleco.biz>
In-Reply-To: <476198EB.5010802@monkeybrains.net>
References: <4754C19E.5060708@monkeybrains.net> <4754CD5B.90605@daleco.biz> <4754DD17.6050701@monkeybrains.net> <20071204074444.GB12505@dan.emsphone.com> <476198EB.5010802@monkeybrains.net>

Rudy wrote:
> Dan Nelson wrote:
>> In the last episode (Dec 03), Support (Rudy) said:
>>> Below is part of the cron...  Seems like any random cronjob can get
>>> clogged up... load varies from 0.2 to 1.0 on this dual-core box.  I
>>> rebooted the box -- cron's continue to slowly pile up.
>>>
>>> One of the cronjobs that is 'stuck' is this one:
>>>   /root/bin/raid-status.sh
>>> which can be found here:
>>>   http://www.monkeybrains.net/~rudy/example/raid_status.html
>>>
>>> Forgot to mention, I am running:
>>>   6.2-STABLE FreeBSD 6.2-STABLE #3: Thu May 31 01:18:15 PDT 2007
>>>
>>> OH, ps shows this:
>>>   58383  ??  D      0:00.00 cron: running job (cron)
>>>   58384  ??  IVs    0:00.00 cron: running job (cron)
>>
>> In general, when troubleshooting, "ps axlw" is a more useful command.
>> It adds, among other columns, the MWCHAN one, which details exactly why
>> a process is stuck in the D state.
>>
>> Anyway, cron does a fork and then a vfork, creating a child and a
>> grandchild process.  I'm sort of surprised at the amount of code
>> between vfork and exec in the grandchild in
>> /src/usr.sbin/cron/cron/do_command.c .  Since process 3 is actually
>> using process 2's address space, one must be extremely careful not to
>> modify static variables or change other global state that would affect
>> the parent once it resumes execution, and all the logging,
>> environment-setting, and user-context calls are certain to mess with
>> the parent's state, especially with nss modules in the mix.  I'd
>> personally recompile cron with all vforks replaced with fork and see
>> what happens.
>>
>> It couldn't hurt to update to a newer kernel version along the RELENG_6
>> branch as a test, I guess.  Note that your uname will change to
>> 6.3-PRERELEASE, but apart from causing lsof to complain, you should be
>> okay.
>>
>>> /var/log/cron has this entry:
>>>   Dec  3 20:16:00 pita /usr/sbin/cron[58384]: (root) CMD
>>>   (/root/bin/raid-status.sh CRON)
>>>
>>> BUT there is no 'raid-status.sh' stuck in the "ps axw".  Seems like
>>> the vfork set off the cronjob, it ran, but then cron didn't 'stop'
>>> executing.  Any debugging tips?
>>
>> Can you tell if raid-status.sh ever ran?  i.e. is process 2
>> stuck at the start of vfork or at the end?
>
> I added this line to the top of my cronjob:
>   logger -t DEBUG "$0: $$"
> and cron seems stuck BEFORE the script is ever run.  Whether it sticks
> or not appears random, as plenty of log lines are showing up with the
> output of the logger command in my /var/log/messages.
>
> # tail /var/log/messages
> Dec 13 11:16:00 pita DEBUG: /root/bin/raid-status.sh: 64414
> Dec 13 12:00:00 pita DEBUG: /root/bin/raid-status.sh: 80115
> Dec 13 12:00:00 pita DEBUG: /root/bin/raid-status.sh: 80119
> Dec 13 12:11:00 pita DEBUG: /root/bin/raid-status.sh: 84283
>
> Here is the ps output:
> # ps axlw
>   UID   PID  PPID CPU PRI NI   VSZ   RSS MWCHAN STAT  TT       TIME COMMAND
>     0 85939 82253   0   8  0  2148  1560 ppwait D     ??    0:00.00 cron: running job (cron)
>     0 85940 85939   0   4  0  2148  1560 sbwait IVs   ??    0:00.00 cron: running job (cron)
> # grep 85940 /var/log/cron
> Dec 13 12:16:00 pita /usr/sbin/cron[85940]: (root) CMD
> (/root/bin/raid-status.sh CRON)
>
> - Rudy

Just as a favor to an old coot, could you change your crontab
entry to read like this:

  */16 * * * * "/root/bin/raid-status.sh"

and see if it makes any difference?

Kevin Kinsey
-- 
There are never any bugs you haven't found yet.
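
For anyone repeating the "ps axlw" check discussed in this thread, here is a minimal sketch of a filter that pulls out just the cron job processes and their wait channels. The function name cron_children is hypothetical (it is not part of cron or ps), and the field positions assume the column order shown in Rudy's FreeBSD 6.2 output (UID PID PPID CPU PRI NI VSZ RSS MWCHAN STAT TT TIME COMMAND); other systems may order the columns differently.

```sh
#!/bin/sh
# Hypothetical helper, not from the original thread.
# Reads "ps axlw" output on stdin and prints "PID PPID MWCHAN STAT"
# for every process whose command is "cron: running job (cron)".
# Assumes MWCHAN is field 9 and STAT is field 10, as in the output
# quoted in this message.
cron_children() {
    # The escaped parentheses match the literal "(cron)" in the process
    # title; they also keep this awk process's own command line (which
    # contains backslashes) from matching the pattern.
    awk '/cron: running job \(cron\)/ { print $2, $3, $9, $10 }'
}

# Usage: ps axlw | cron_children
```

Fed the two process lines from Rudy's output, this prints "85939 82253 ppwait D" and "85940 85939 sbwait IVs", which makes the parent/child relationship (85940's PPID is 85939) and the two wait channels easy to see at a glance.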