Date:      Sat, 25 Mar 2000 23:51:03 -0600 (CST)
From:      Jonathan Lemon <jlemon@flugsvamp.com>
To:        sue@welearn.com.au, hackers@freebsd.org
Subject:   Re: syslogd stops logging - caught in the act
Message-ID:  <200003260551.XAA37498@prism.flugsvamp.com>
In-Reply-To: <local.mail.freebsd-hackers/20000326140241.C43926@welearn.com.au>

I asked Sue to get a ktrace of the syslogd, and here's the
output:

 18869 syslogd  954045445.977145 PSIG  SIGALRM caught handler=0x804b068 mask=0x0 code=0x0
 18869 syslogd  954045445.977343 RET   poll -1 errno 4 Interrupted system call
 18869 syslogd  954045445.977366 CALL  gettimeofday(0xbfbfc5f0,0)
 18869 syslogd  954045445.977382 RET   gettimeofday 0
 18869 syslogd  954045445.977403 CALL  setitimer(0,0xbfbfc5e8,0xbfbfc5d8)
 18869 syslogd  954045445.977424 RET   setitimer 0
 18869 syslogd  954045445.977438 CALL  old.sigreturn(0xbfbfc624)
 18869 syslogd  954045445.977456 RET   old.sigreturn JUSTRETURN
 18869 syslogd  954045445.977476 CALL  poll(0xbfbfc6f0,0x1,0x9c40)
 18869 syslogd  954045475.987785 PSIG  SIGALRM caught handler=0x804b068 mask=0x0 code=0x0
 18869 syslogd  954045475.987859 RET   poll -1 errno 4 Interrupted system call
 18869 syslogd  954045475.987879 CALL  gettimeofday(0xbfbfc5f0,0)
 18869 syslogd  954045475.987895 RET   gettimeofday 0
 18869 syslogd  954045475.987917 CALL  setitimer(0,0xbfbfc5e8,0xbfbfc5d8)
 18869 syslogd  954045475.987938 RET   setitimer 0
 18869 syslogd  954045475.987952 CALL  old.sigreturn(0xbfbfc624)
 18869 syslogd  954045475.987969 RET   old.sigreturn JUSTRETURN
 18869 syslogd  954045475.987990 CALL  poll(0xbfbfc6f0,0x1,0x9c40)
 18869 syslogd  954045505.997954 PSIG  SIGALRM caught handler=0x804b068 mask=0x0 code=0x0
 18869 syslogd  954045505.998120 RET   poll -1 errno 4 Interrupted system call


The poll() calls are from libc/net/res_send, while the gettimeofday()
calls are from the alarm handler (in syslogd).  The res_send code does
roughly the following:

	msec = (timeout calculated based on # of tries);
    repeat:
	n = poll(pfd, 1, msec);
	if (n < 0 && errno == EINTR)
		goto repeat;	/* retries with the same full msec */

So what seems to be happening is that once the number of tries grows
large enough, the timeout passed to poll() becomes longer than the
interval between SIGALRM deliveries.  Since the poll() timeout is not
reduced after an interruption, every retry starts a fresh full-length
wait, is interrupted by the next SIGALRM before it can expire, and the
loop never terminates.

In the trace above, the poll() timeout is 0x9c40 == 40000 msec (40 sec),
and the alarm handler runs every 30 sec (compare the PSIG timestamps).

The fix should probably be to change res_send.c so that it properly
decrements its timeout value after being interrupted, i.e. retries with
only the time remaining rather than the full timeout.
--
Jonathan





