From owner-freebsd-hackers@FreeBSD.ORG  Mon Jan 24 15:39:24 2005
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 6039616A4CE
	for <freebsd-hackers@freebsd.org>;
	Mon, 24 Jan 2005 15:39:24 +0000 (GMT)
Received: from wproxy.gmail.com (wproxy.gmail.com [64.233.184.198])
	by mx1.FreeBSD.org (Postfix) with ESMTP id D6CE543D1F
	for <freebsd-hackers@freebsd.org>;
	Mon, 24 Jan 2005 15:39:23 +0000 (GMT)
	(envelope-from felixhc@gmail.com)
Received: by wproxy.gmail.com with SMTP id 71so355038wra
	for <freebsd-hackers@freebsd.org>;
	Mon, 24 Jan 2005 07:39:23 -0800 (PST)
DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws;
	s=beta; d=gmail.com;
	h=received:message-id:date:from:reply-to:to:subject:mime-version:content-type:content-transfer-encoding;
	b=SEDv3gJUdiD8cXmO5aArewvEzrfSw61TGVLAthBLY1zCTDBv/TRxxm7/diEMh58RsBzgEMbxKLdcVg78hkRXIBZnxkrYqX8iI8hq89SWOUW26/esKZcUKikpwbr0P/PEgh91hKHv4Ea/OTOUdIIuTHQGpbHeQbwytaqP6Ck7FE4=
Received: by 10.54.23.3 with SMTP id 3mr32840wrw;
        Mon, 24 Jan 2005 07:39:23 -0800 (PST)
Received: by 10.54.24.13 with HTTP; Mon, 24 Jan 2005 07:39:23 -0800 (PST)
Message-ID: <ea018ca805012407392b1180a3@mail.gmail.com>
Date: Mon, 24 Jan 2005 10:39:23 -0500
From: Felix Hernandez-Campos <felixhc@gmail.com>
To: freebsd-hackers@freebsd.org
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
Subject: syslogd pipelines losing messages
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
Reply-To: at <felixhc@FreeBSD.ORG>
List-Id: Technical Discussions relating to FreeBSD
	<freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
	<mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
	<mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 24 Jan 2005 15:39:24 -0000

Hi,

I've been using syslogd to monitor a few hundred hosts, and I
encountered an interesting problem, which seems to be explained by a
bug in syslogd. In my work, I'm using a slow Pentium II host running
FreeBSD 4.10 to collect syslog messages that arrive at a rate of at
most a couple hundred messages per second. A large percentage of these
messages is useless and I filter them out with grep using a syslog
pipeline, i.e.,/etc/syslog.conf reads

local0.*        | exec grep -v DUPLEX_MISMATCH >> /usr/home2/wap_syslog_loc=
al0

Given the moderate arrival rate and that the machine is only used for
this purpose, I would not expect any losses of syslog messages. After
I disabled DNS lookups (syslogd -n), syslogd's CPU usage is very small
(1%), and grep's one is even smaller. However, I observe a large
number of "resource temporarily unavailable" messages in
/var/log/messages. Furthermore, I also noticed that each of these
errors also implies a restart of the grep process (which worried me).

I tracked this down to the failure of a writev call in syslogd. This
is the relevant part of the code (see writev in second "if"
statement):

fprintlog(struct filed *f, int flags, const char *msg) {
=E2=80=A6
=09case F_PIPE:
=09=09dprintf(" %s\n", f->f_un.f_pipe.f_pname);
=09=09v->iov_base =3D lf;
=09=09v->iov_len =3D 1;
=09=09if (f->f_un.f_pipe.f_pid =3D=3D 0) {
=09=09=09if ((f->f_file =3D p_open(f->f_un.f_pipe.f_pname,
=09=09=09=09=09=09&f->f_un.f_pipe.f_pid)) < 0) {
=09=09=09=09f->f_type =3D F_UNUSED;
=09=09=09=09logerror(f->f_un.f_pipe.f_pname);
=09=09=09=09break;
=09=09=09}
=09=09}
=09=09if (writev(f->f_file, iov, 7) < 0) {
=09=09=09int e =3D errno;
=09=09=09(void)close(f->f_file);
=09=09=09if (f->f_un.f_pipe.f_pid > 0)
=09=09=09=09deadq_enter(f->f_un.f_pipe.f_pid,
=09=09=09=09=09    f->f_un.f_pipe.f_pname);
=09=09=09f->f_un.f_pipe.f_pid =3D 0;
=09=09=09errno =3D e;
=09=09=09logerror(f->f_un.f_pipe.f_pname);
=09=09}
=09=09break;

For some reason, the pipeline is often not writeable and writev fails
with errno EWOULDBLOCK (f_file is opened in non-blocking mode). After
writev fails, the message is quietly discarded, and the pipeline is
closed. This means that the process reading the pipeline (grep) does
finish its work gracefully, but one message is completely lost. The
next message goes through the first "if" statement in the "switch",
which creates a new pipeline reader. Interestingly, if I add code
inside the second "if" statement to give grep a chance to complete,
e.g.,

                        if (e =3D=3D EWOULDBLOCK) {
                                usleep(1);
                                if (writev(f->f_file, iov, 7) >=3D 0) {
                                        logerror("IT WORKED!");
                                        break;
                                }
                        }

the writev call always works. I'm not familiar with the pipeline
implementation, but I would guess the reader is locking the pipeline,
and sometimes syslogd ends up running in the middle of the read
operation, so writev fails. The usleep call gives the pipeline reader
a chance to finish, so the second writev call works fine every time
after my change. My solution is just a proof of concept that solves my
problem, but I wonder whether someone in this list can suggest a
better solution, which should go into the source tree. The simplest
one would be to make f_file blocking, but there was a good reason for
making it non-blocking in the first place, as stated in the code

p_open(const char *prog, pid_t *rpid) {
=E2=80=A6
=09/*
=09 * Avoid blocking on a hung pipe.  With O_NONBLOCK, we are
=09 * supposed to get an EWOULDBLOCK on writev(2), which is
=09 * caught by the logic above anyway, which will in turn close
=09 * the pipe, and fork a new logging subprocess if necessary.
=09 * The stale subprocess will be killed some time later unless
=09 * it terminated itself due to closing its input pipe (so we
=09 * get rid of really dead puppies).
=09 */
=09if (fcntl(pfd[1], F_SETFL, O_NONBLOCK) =3D=3D -1) {=E2=80=A6}

Anyway, I think I did all the homework, and I just need someone to
suggest an elegant solution rather than my usleep (is there a
yield-type syscall?). I'm more than willing to try out your ideas in
our environment and prepare the patch when something good comes up.

Best regards,
Felix Hernandez-Campos