Date:      Thu, 13 Dec 2007 16:39:48 -0600
From:      Kevin Kinsey <kdk@daleco.biz>
To:        Rudy <crapsh@monkeybrains.net>
Cc:        Dan Nelson <dnelson@allantgroup.com>, freebsd-questions@freebsd.org
Subject:   Re: cron pile up!  Lot's of "cron: running job (cron)"
Message-ID:  <4761B4B4.8080100@daleco.biz>
In-Reply-To: <476198EB.5010802@monkeybrains.net>
References:  <4754C19E.5060708@monkeybrains.net>	<4754CD5B.90605@daleco.biz>	<4754DD17.6050701@monkeybrains.net>	<20071204074444.GB12505@dan.emsphone.com> <476198EB.5010802@monkeybrains.net>

Rudy wrote:
> Dan Nelson wrote:
>> In the last episode (Dec 03), Support (Rudy) said:
>>> Below is part of the cron...  Seems like any random cronjob can get
>>> clogged up... load varies from 0.2 to 1.0 on this dual-core box.  I
>>> rebooted the box -- cron's continue to slowly pile up.
>>>
>>> One of the cronjobs that is 'stuck' is this one: 
>>> /root/bin/raid-status.sh
>>> which can be found here:
>>>  http://www.monkeybrains.net/~rudy/example/raid_status.html
>>>
>>> Forgot to mention, I am running:
>>>   6.2-STABLE FreeBSD 6.2-STABLE #3: Thu May 31 01:18:15 PDT 2007
>>>
>>> OH, ps shows this:
>>> 58383  ??  D      0:00.00 cron: running job (cron)
>>> 58384  ??  IVs    0:00.00 cron: running job (cron)
>>
>> In general, when troubleshooting, "ps axlw" is a more useful command.
>> It adds, among other columns, MWCHAN, which shows exactly what a
>> process stuck in the D state is waiting on.
>> Anyway, cron does a fork and then a vfork, creating a child and a
>> grandchild process.  I'm sort of surprised at the amount of code
>> between vfork and exec in the grandchild in
>> /src/usr.sbin/cron/cron/do_command.c .  Since process 3 is actually
>> using process 2's address space one must be extremely careful not to
>> modify static variables or change other global state that would affect
>> the parent once it resumes execution, and all the logging,
>> environment-setting, and user-context calls are certain to mess with
>> the parent's state, especially with nss modules in the mix.  I'd
>> personally recompile cron with all vforks replaced with fork and see
>> what happens.
>>
>> It couldn't hurt to update to a newer kernel version along the RELENG_6
>> branch as a test, I guess.  Note that your uname will change to
>> 6.3-PRERELEASE, but apart from causing lsof to complain, you should be
>> okay.
>>
>>> /var/log/cron has this entry:
>>> Dec  3 20:16:00 pita /usr/sbin/cron[58384]: (root) CMD (/root/bin/raid-status.sh CRON)
>>>
>>> BUT there is no 'raid-status.sh' stuck in the "ps axw".  Seems like 
>>> the vfork set off the cronjob, it ran, but then cron didn't 'stop' 
>>> executing.  Any debugging tips?
>>
>> Can you tell if raid-status.sh ever ran?  That is, is process 2
>> stuck at the start of vfork or at the end?
> 
> I added this line to the top of my cronjob:
>  logger -t DEBUG "$0: $$"
> and cron seems stuck BEFORE the script is ever run.  Whether it sticks 
> or not appears random, as plenty of log lines are showing up with the 
> output of the logger command in my /var/log/messages.
> 
> # tail /var/log/messages
> Dec 13 11:16:00 pita DEBUG: /root/bin/raid-status.sh: 64414
> Dec 13 12:00:00 pita DEBUG: /root/bin/raid-status.sh: 80115
> Dec 13 12:00:00 pita DEBUG: /root/bin/raid-status.sh: 80119
> Dec 13 12:11:00 pita DEBUG: /root/bin/raid-status.sh: 84283
> 
> Here is the ps output:
> # ps axlw
>   UID   PID  PPID CPU PRI NI   VSZ   RSS MWCHAN STAT  TT       TIME COMMAND
>     0 85939 82253   0   8  0  2148  1560 ppwait D     ??    0:00.00 cron: running job (cron)
>     0 85940 85939   0   4  0  2148  1560 sbwait IVs   ??    0:00.00 cron: running job (cron)
> # grep 85940 /var/log/cron
> Dec 13 12:16:00 pita /usr/sbin/cron[85940]: (root) CMD (/root/bin/raid-status.sh CRON)
> 
> - Rudy

Just as a favor to an old coot, could you change your
crontab entry to read like this:

*/16  *  *  *  *  "/root/bin/raid-status.sh"

and see if it makes any difference?
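
To tell whether it makes a difference, it may help to list the stuck
cron children at intervals and watch whether the list grows. A rough
sketch, assuming the "ps axlw" column layout from your output (PID in
column 2, PPID in column 3, MWCHAN in column 9, command starting in
column 13); the name stuck_cron is made up for illustration:

```sh
#!/bin/sh
# stuck_cron (hypothetical helper): reads "ps axlw" output on stdin and
# prints PID, PPID and MWCHAN for each "cron: running job" process.
# Assumes the column order UID PID PPID CPU PRI NI VSZ RSS MWCHAN STAT
# TT TIME COMMAND, with no empty columns.
# Matching on fields 13 and 14 (rather than on the whole line) keeps the
# awk process itself out of the results.
stuck_cron() {
    awk '$13 == "cron:" && $14 == "running" { print $2, $3, $9 }'
}
```

Run it as "ps axlw | stuck_cron". Entries sitting in ppwait or sbwait
that are still there a few minutes later would be the ones piling up.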

Kevin Kinsey
-- 
There are never any bugs you haven't found yet.


