From owner-cvs-all  Mon Feb 28  9:49:49 2000
Delivered-To: cvs-all@freebsd.org
Received: from freefall.freebsd.org (freefall.FreeBSD.ORG [204.216.27.21])
	by hub.freebsd.org (Postfix) with ESMTP
	id CE4F837B804; Mon, 28 Feb 2000 09:49:44 -0800 (PST)
	(envelope-from joerg@FreeBSD.org)
Received: (from joerg@localhost)
	by freefall.freebsd.org (8.9.3/8.9.2) id JAA35872;
	Mon, 28 Feb 2000 09:49:44 -0800 (PST)
	(envelope-from joerg@FreeBSD.org)
Message-Id: <200002281749.JAA35872@freefall.freebsd.org>
From: Joerg Wunsch <joerg@FreeBSD.org>
Date: Mon, 28 Feb 2000 09:49:44 -0800 (PST)
To: cvs-committers@FreeBSD.org, cvs-all@FreeBSD.org
Subject: cvs commit: src/usr.sbin/syslogd syslogd.c
Sender: owner-cvs-all@FreeBSD.ORG
Precedence: bulk

joerg       2000/02/28 09:49:44 PST

  Modified files:
    usr.sbin/syslogd     syslogd.c 
  Log:
  Fix a serious bug in syslogd's handling of pipes.  The bug would
  cause syslogd to eventually kill innocent processes in the system
  (note: not `could' but `would').  Many thanks to my colleague Mirko
  for digging into the kernel structures and providing me with the
  debugging framework to find out the nature of this bug (and to
  establish that syslogd was the culprit) on a rather large set of
  distributed machines at client sites where this happened
  occasionally.
  
  Whenever a child process is no longer responsive, or when syslogd
  receives a SIGHUP and closes all its logging file descriptors, then
  for any descriptor that refers to a pipe, syslogd enters the data
  about the old logging child process into a `dead queue'.  The entry
  is removed again (and the status of the dead kitten fetched) upon
  receipt of a SIGCHLD.  However, there is a high probability that the
  SIGCHLD arrives before the child's data are actually entered into
  the dead queue inside the SIGHUP handler, so the SIGCHLD handler has
  nothing to fetch and remove and simply continues.  Whenever this
  happens, the process's data remain on the dead queue forever.  Since
  domark() tries to get rid of totally unresponsive children by first
  sending a SIGTERM and later a SIGKILL to everything on the dead
  queue, it was only a matter of time until the system had recycled
  enough PIDs that an innocent process got shot to death.
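  
  The bad ordering described above can be replayed without any real
  signals.  This is only a minimal sketch: struct dq_entry, dq_push(),
  dq_remove() and replay_race() are hypothetical names made up for
  illustration and are not taken from syslogd.c.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical dead-queue entry; the real syslogd.c structure differs. */
struct dq_entry {
    pid_t            pid;
    struct dq_entry *next;
};

/* Push pid onto the queue; returns 0 on success, -1 if out of memory. */
static int dq_push(struct dq_entry **head, pid_t pid)
{
    struct dq_entry *e = malloc(sizeof(*e));

    if (e == NULL)
        return -1;
    e->pid = pid;
    e->next = *head;
    *head = e;
    return 0;
}

/* The SIGCHLD side: returns 1 if pid was found and removed, else 0. */
static int dq_remove(struct dq_entry **head, pid_t pid)
{
    struct dq_entry **pp, *e;

    for (pp = head; (e = *pp) != NULL; pp = &e->next) {
        if (e->pid == pid) {
            *pp = e->next;
            free(e);
            return 1;
        }
    }
    return 0;
}

/* Replays the bad ordering; returns the number of stale entries left. */
static int replay_race(pid_t pid)
{
    struct dq_entry *head = NULL, *e;
    int stale = 0;

    (void)dq_remove(&head, pid);    /* SIGCHLD first: nothing to remove */
    if (dq_push(&head, pid) == -1)  /* then the SIGHUP path enqueues it */
        return -1;
    for (e = head; e != NULL; e = e->next)
        stale++;
    while (head != NULL)            /* free the demo queue */
        (void)dq_remove(&head, head->pid);
    return stale;
}
```

  With this ordering the entry is never removed again, so
  replay_race() reports one stale entry for whatever PID it is given.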
  
  Fix the race by blocking both SIGHUP and SIGCHLD while either handler
  runs.
  
  Add additional bandaids ``just in case'': don't enter a process into
  the dead queue if we can't signal it (this should only happen if it
  is already dead by that time, so we can fetch the status immediately
  instead of deferring this to the SIGCHLD handler).  For the kill(2)
  inside domark(), check for an error status (/* Can't happen */ :)
  and remove the process from the dead queue in this case.  Had that
  check been there in the first place, it would have reduced the
  problem to a statistically minimal likelihood, so I certainly would
  never have noticed the bug at all :).
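  
  Both bandaids can be sketched roughly as follows.  All names here
  (struct dq_entry, dead_queue, dq_enter(), dq_sweep()) are
  hypothetical, and probing with signal 0 is an assumption about how
  ``can't signal it'' might be tested, not a quote from syslogd.c.

```c
#include <assert.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical dead-queue entry; the real syslogd.c structure differs. */
struct dq_entry {
    pid_t            pid;
    struct dq_entry *next;
};

static struct dq_entry *dead_queue;

/*
 * Enter pid into the dead queue only if it can still be signalled.
 * Returns 1 if queued, 0 if the child could not be signalled (its
 * status is fetched right away instead), -1 if out of memory.
 */
static int dq_enter(pid_t pid)
{
    struct dq_entry *e;

    if (kill(pid, 0) == -1) {               /* signal 0: probe only */
        (void)waitpid(pid, NULL, WNOHANG);  /* reap now, do not queue */
        return 0;
    }
    if ((e = malloc(sizeof(*e))) == NULL)
        return -1;
    e->pid = pid;
    e->next = dead_queue;
    dead_queue = e;
    return 1;
}

/*
 * Periodic pass in the style of domark(): send sig to every queued
 * child and drop any entry whose kill(2) fails, so that a recycled PID
 * is never signalled later.  Returns the number of entries dropped.
 */
static int dq_sweep(int sig)
{
    struct dq_entry **pp = &dead_queue, *e;
    int dropped = 0;

    while ((e = *pp) != NULL) {
        if (kill(e->pid, sig) == -1) {
            *pp = e->next;
            free(e);
            dropped++;
        } else {
            pp = &e->next;
        }
    }
    return dropped;
}
```

  A caller would use dq_enter() where a logging child is retired and
  dq_sweep(SIGTERM), later dq_sweep(SIGKILL), from the periodic timer.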
  
  Mirko also reviewed the fix in principle (mutual blocking of both
  signals inside the handlers), but not the actual code.
  
  Reviewed by:	Mirko Kaffka <mirko@interface-business.de>
  Approved by:	jkh
  
  Revision  Changes    Path
  1.58      +97 -36    src/usr.sbin/syslogd/syslogd.c

