From owner-freebsd-stable@FreeBSD.ORG Wed Nov 3 23:14:57 2004
Message-ID: <4189666A.9020500@geminix.org>
Date: Thu, 04 Nov 2004 00:14:50 +0100
From: Uwe Doering
Organization: Private UNIX Site
MIME-Version: 1.0
To: Igor Sysoev
References: <4168578F.7060706@geminix.org> <20041103191641.K63546@is.park.rambler.ru>
In-Reply-To: <20041103191641.K63546@is.park.rambler.ru>
Content-Type: multipart/mixed; boundary="------------040502050407080706070809"
cc: stable@freebsd.org
Subject: Re: vnode_pager_putpages errors and DOS?

This is a multi-part message in MIME format.
--------------040502050407080706070809
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit

Igor Sysoev wrote:
> On Sat, 9 Oct 2004, Uwe Doering wrote:
>>[...]
>>I wonder whether the unresponsiveness is actually just the result of the
>>kernel spending most of the time in printf(), generating warning
>>messages.
>>vnode_pager_generic_putpages() doesn't return any error in
>>case of a write failure, so the caller (syncer in this case) isn't aware
>>that the paging out failed, that is, it is supposed to carry on as if
>>nothing happened.
>>
>>So how about limiting the number of warnings to one per second?  UFS has
>>similar code in order to curb "file system full" and the like.  Please
>>consider trying the attached patch, which applies cleanly to 4-STABLE.
>>It won't make the actual application causing these errors any happier,
>>but it may eliminate the DoS aspect of the issue.
>
> I have just tried your patch. To test I ran the program from
> http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/67919
>
> The patch allows me to login on machine while the system reports about
> "vnode_pager_putpages: I/O error 28". However, the file system access is
> very limited and after some time the system became unresponsible.

Limited file system access is to be expected, since
vnode_pager_putpages() keeps the number of dirty buffers
('numdirtybuffers') near its upper limit ('hidirtybuffers').  However,
the unresponsiveness may be caused by another shortcoming I found in
the meantime.

When 'numdirtybuffers' is greater than or equal to 'hidirtybuffers',
the function bwillwrite() blocks until 'numdirtybuffers' drops below
some threshold value.  bwillwrite() gets called in a number of places
that deal with writing data to disk.  Two of these places are
dofilewrite() (which is in turn called by write() and pwrite()) and
writev().  There, bwillwrite() gets called if the file descriptor is of
type DTYPE_VNODE.

Now, this unfortunately doesn't take into account that ttys, including
pseudo ttys, and even /dev/null and friends, are character device nodes
and therefore vnodes as well, but have nothing to do with writing data
to disk.  That is, in case of heavy disk write activity, write attempts
to these device nodes get blocked, too!
As a consequence, the system appears to become unresponsive at the
shell prompt, or reacts only sporadically.  Even daemonized processes
that happen to log data to /dev/null (on stdout & stderr, for example)
will block.

What we need here is an additional test that makes sure that, in the
case of a character device, bwillwrite() gets called only if the device
is in fact a disk.  Please consider trying out the attached patch.  It
will not reduce the heavy disk activity (which is, after all,
legitimate), but it is supposed to enable you to operate the system at
shell level and kill the offending process, or do whatever is necessary
to resolve the problem.

    Uwe
-- 
Uwe Doering         |  EscapeBox - Managed On-Demand UNIX Servers
gemini@geminix.org  |  http://www.escapebox.net

--------------040502050407080706070809
Content-Type: text/plain; name="sys_generic.c.diff"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline; filename="sys_generic.c.diff"

--- src/sys/kern/sys_generic.c.orig	Tue Sep 14 19:56:53 2004
+++ src/sys/kern/sys_generic.c	Sun Sep 26 13:13:46 2004
@@ -48,6 +48,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -78,6 +79,23 @@
 static int	dofilewrite __P((struct proc *, struct file *, int,
 		    const void *, size_t, off_t, int));
 
+static __inline int
+isndchr(vp)
+	struct vnode *vp;
+{
+	struct cdevsw *dp;
+
+	if (vp->v_type != VCHR)
+		return (0);
+	if (vp->v_rdev == NULL)
+		return (0);
+	if ((dp = devsw(vp->v_rdev)) == NULL)
+		return (0);
+	if (dp->d_flags & D_DISK)
+		return (0);
+	return (1);
+}
+
 struct file*
 holdfp(fdp, fd, flag)
 	struct filedesc* fdp;
@@ -403,7 +420,7 @@
 	}
 #endif
 	cnt = nbyte;
-	if (fp->f_type == DTYPE_VNODE)
+	if (fp->f_type == DTYPE_VNODE && !isndchr((struct vnode *)(fp->f_data)))
 		bwillwrite();
 	if ((error = fo_write(fp, &auio, fp->f_cred, flags, p))) {
 		if (auio.uio_resid != cnt && (error == ERESTART ||
@@ -496,7 +513,7 @@
 	}
 #endif
 	cnt = auio.uio_resid;
-	if (fp->f_type == DTYPE_VNODE)
+	if (fp->f_type == DTYPE_VNODE && !isndchr((struct vnode *)(fp->f_data)))
 		bwillwrite();
 	if ((error = fo_write(fp, &auio, fp->f_cred, 0, p))) {
 		if (auio.uio_resid != cnt && (error == ERESTART ||

--------------040502050407080706070809--
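
For readers who want to check the effect of the patch without a kernel
tree at hand, the decision it changes can be modelled in user space.
This is only a rough sketch: every sk_* name below is a hypothetical
stand-in and not kernel API, the model covers vnode-backed descriptors
only (the DTYPE_VNODE case), and the two NULL checks of isndchr() are
folded into a single one.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's vnode type and D_DISK flag. */
enum sk_vtype { SK_VREG, SK_VCHR };
#define SK_D_DISK 0x1u

/* Simplified device switch entry: only the flags word matters here. */
struct sk_cdevsw {
	unsigned int d_flags;
};

/* Simplified vnode: its type plus an optional device switch entry. */
struct sk_vnode {
	enum sk_vtype v_type;
	const struct sk_cdevsw *v_devsw;	/* NULL: no driver attached */
};

/*
 * Mirrors isndchr() from the patch: true only for a character device
 * that has a driver and is not flagged as a disk.
 */
bool
sk_isndchr(const struct sk_vnode *vp)
{
	if (vp->v_type != SK_VCHR)
		return (false);
	if (vp->v_devsw == NULL)
		return (false);
	if (vp->v_devsw->d_flags & SK_D_DISK)
		return (false);
	return (true);
}

/* Before the patch: every vnode-backed write goes through bwillwrite(). */
bool
sk_throttle_before_patch(const struct sk_vnode *vp)
{
	(void)vp;
	return (true);
}

/* After the patch: non-disk character devices skip bwillwrite(). */
bool
sk_throttle_after_patch(const struct sk_vnode *vp)
{
	return (!sk_isndchr(vp));
}
```

With this model, a regular file and a disk device are still throttled
after the patch, while a tty or a /dev/null style node is not.  A
character device without a driver entry stays on the throttled path,
which matches the "return (0)" fallbacks in isndchr().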