From: Andrew Heybey
To: freebsd-hackers@freebsd.org
Subject: Advice wanted on tracking down bug (or hw problem?) in 3.1R
Date: Thu, 25 Feb 1999 14:35:23 -0500

I have just submitted PR kern/10243, but I thought I would ask for some advice on hackers as well.

The bug is that under certain loads, read(2) can return corrupted data (i.e. data that are not in the file on disk). The instances I have seen are relatively small amounts (8-64 bytes) of corrupt data at the end of a 4k page. The corrupt data is from a file previously read or from another position in the current file. I have also seen this problem in 3.0-RELEASE but not in 2.2.8-RELEASE.

The load under which I see this bug is several programs reading data from disk combined with a very high network interrupt rate (about 45k pkts/sec on an fxp interface in promiscuous mode with a bpf listener). The PR has a longer description of exactly what I am doing to reproduce the bug. I put a tar file containing a small set of programs that I use to generate this load at http://www.niksun.com/~ath/fbsd_bug.tgz if anyone wants to try to reproduce this.

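For anyone who wants to characterize the corruption rather than just detect it, here is a minimal sketch of a checker. It is not one of the programs in the tar file above, and the names (find_corrupt_runs, describe_run, check_file) are hypothetical. It assumes you have a trusted copy of the file contents in memory (for example, a deterministic pattern you generated yourself) and a 4096-byte page size. It only compares bytes over the common length of the two buffers, and it does nothing to defeat the buffer cache, so on its own it may not force the reads to go to disk.

```python
import os

PAGE_SIZE = 4096  # assumption: the 4k page size mentioned in the report


def find_corrupt_runs(expected: bytes, actual: bytes):
    """Return (offset, length) for each contiguous run where the buffers differ.

    Only the common prefix length is compared; a short read is not reported here.
    """
    runs = []
    start = None
    n = min(len(expected), len(actual))
    for i in range(n):
        if expected[i] != actual[i]:
            if start is None:
                start = i  # a new mismatching run begins here
        elif start is not None:
            runs.append((start, i - start))
            start = None
    if start is not None:
        runs.append((start, n - start))
    return runs


def describe_run(offset, length, page_size=PAGE_SIZE):
    """Locate a run inside its page and note whether it ends on a page boundary."""
    return {
        "page": offset // page_size,
        "offset_in_page": offset % page_size,
        "length": length,
        "ends_at_page_end": (offset + length) % page_size == 0,
    }


def check_file(path, reference: bytes, passes=10):
    """Re-read `path` several times via read(2) and collect any mismatches."""
    findings = []
    for p in range(passes):
        fd = os.open(path, os.O_RDONLY)
        try:
            chunks = []
            while True:
                buf = os.read(fd, 65536)
                if not buf:  # empty bytes means end of file
                    break
                chunks.append(buf)
        finally:
            os.close(fd)
        data = b"".join(chunks)
        for off, length in find_corrupt_runs(reference, data):
            findings.append((p, describe_run(off, length)))
    return findings
```

If the runs reported under load are all short and all have ends_at_page_end set, that would match the pattern described above; runs elsewhere in the page would suggest a different failure.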
It looks to me like not enough splfoo() calls someplace, but I'm not sure where to start looking. CAM? VM? UFS? BPF (though it seems unlikely that BPF would reach out and mess with data from another process)?

It is extremely load sensitive, so it is difficult to reproduce the same way every time. I can almost always make it happen within 5-10 minutes of testing, but not in exactly the same way. I have reproduced the bug on two different machines, so I don't think that the hw is defective (though the machines have substantially the same kind of hardware, so it could be a HW bug of some kind).

I would sure appreciate it if someone with a larger collection of clues than I would take a look at this or give me some advice as to where I should start looking.

thanks,
andrew