Date: Fri, 21 Sep 2001 12:00:36 -0700 (PDT)
From: Julian Elischer
To: Josef Karthauser
Cc: Dag-Erling Smorgrav, Jun Kuriyama, Julian Elischer, current@freebsd.org
Subject: Re: Problems with interrupts on -current.

I must say I'm worried, but stumped. I cannot see this problem here, and I
cannot think of a change in the KSE support stuff that would have this
effect. There was some small change in the statistics-gathering code that
is done at clock time, but nothing so low-level as to affect the further
generation of clock ticks. It does sound as though statclock has been
stopped, though.

On Fri, 21 Sep 2001, Josef Karthauser wrote:

> [This is the continuation of a thread that started on -committers]
>
> On Sun, Sep 16, 2001 at 02:48:48PM +0100, Josef Karthauser wrote:
> > On Sun, Sep 16, 2001 at 01:35:20AM +0100, Josef Karthauser wrote:
> > > On Sat, Sep 15, 2001 at 03:51:07PM +0200, Dag-Erling Smorgrav wrote:
> > > > Josef Karthauser writes:
> > > > > Is there a possibility that this commit is causing me to lose key
> > > > > presses? I'm finding it hard to imagine that I'm mistyping, as
> > > > > I've never noticed it before. (Every N, where N is > 30 or 40, a key
> > > > > that I press doesn't register and I have to press it again).
> > > >
> > > > Educated guess: your interrupt latency just went to hell (where mine's
> > > > been for three months now, I'm still waiting to hear if Matt could
> > > > make any sense out of my crash dump) and you're losing interrupts. If
> > > > you have a serial mouse, try moving it around a lot and see if it
> > > > seems to hang (you should see mentions of interrupt-level buffer
> > > > overflows in your /var/log/messages). Also, just for kicks, check how
> > > > much CPU time your syncer process is using, and try running sync(8)
> > > > and see if your keyboard wedges for a couple of seconds when you do
> > > > that.
> > >
> > > My mouse is /dev/psm0. From time to time the ata device's
> > > interrupts/second go through the roof for no apparent reason (i.e.
> > > several hundred interrupts/sec). Sync never wedges anything.
> >
> > There's almost definitely an interrupt problem. I regularly have
> > the machine wedge almost solid when rsyncing a lot of data to and
> > fro. The machine begins to behave erratically, which I now think
> > happens mainly because all the timers stop working (maybe the
> > interrupts stop working?); 'systat -vmstat' doesn't produce any
> > numbers because the initial time delay never passes. :( Also, I
> > don't appear to be able to enter the kernel debugger when this
> > happens! :( Can someone in the know give me a hand debugging this?
> > It really ought to be fixed, but my knowledge isn't sufficient to
> > find this on my own.
> >
> > Thanks,
> > Joe
>
> This also happens from time to time:
>
>
> 6 users Load 1.39 1.23 1.14 Sep 21 13:32
>
> Mem:KB REAL VIRTUAL VN PAGER SWAP PAGER
> Tot Share Tot Share Free in out in out
> Act 62696 8932 111764 14728 15052 count
> All 249864 12164 2806932 25860 pages
> Interrupts
> Proc:r p d s w Csw Trp Sys Int Sof Flt 1 cow 1743 total
> 6 32 12398 13 866 1823 26 45516 wire stray irq0
> 90820 act stray irq6
> 8.3%Sys 5.1%Intr 0.2%User 0.0%Nice 86.4%Idl 102140 inact stray irq7
> | | | | | | | | | | 11388 cache 1 acpi0 irq9
> ====+++ 3664 free 1505 ata0 irq14
> daefr uhci0 irq5
> Namei Name-cache Dir-cache 5 prcfr 2 pcm0 irq5
> Calls hits % hits % react 7 atkbd0 irq
> 688 687 100 pdwak psm0 irq12
> 4 zfod pdpgs 100 clk irq0
> Disks ad0 fd0 ofod intrn 128 rtc irq8
> KB/t 6.00 0.00 9 %slo-z 35712 buf
> tps 1507 0 7 tfree 10 dirtybuf
> MB/s 8.83 0.00 17913 desiredvnodes
> % busy 98 0 14595 numvnodes
> 4798 freevnodes
>
>
> Look at the number of interrupts that the ata device is generating.
> This is in no way normal! It happens randomly and causes the machine
> to basically grind to a halt.
>
> As a comparison on the same machine, here's the output of systat -vmstat
> for the machine after I rebooted it and it was running a background
> fsck:
>
>
> 4 users Load 1.01 0.42 0.16 Sep 21 13:50
>
> Mem:KB REAL VIRTUAL VN PAGER SWAP PAGER
> Tot Share Tot Share Free in out in out
> Act 40328 3848 71980 4408 53308 count
> All 200248 6884 1085132 10232 pages
> Interrupts
> Proc:r p d s w Csw Trp Sys Int Sof Flt cow 329 total
> 2 30 622 11 955 402 2 34 35928 wire stray irq0
> 35492 act stray irq6
> 1.4%Sys 1.9%Intr 1.2%User 0.6%Nice 94.9%Idl 128800 inact stray irq7
> | | | | | | | | | | 28 cache acpi0 irq9
> =+- 53280 free 97 ata0 irq14
> daefr uhci0 irq5
> Namei Name-cache Dir-cache prcfr 1 pcm0 irq5
> Calls hits % hits % react 3 atkbd0 irq
> 536 534 100 pdwak psm0 irq12
> 8 zfod pdpgs 100 clk irq0
> Disks ad0 fd0 1 ofod intrn 128 rtc irq8
> KB/t 7.99 0.00 7 %slo-z 35712 buf
> tps 97 0 1 tfree 33 dirtybuf
> MB/s 0.76 0.00 17913 desiredvnodes
> % busy 98 0 1655 numvnodes
> 29 freevnodes
>
>
> Who's responsible for this area? I'm happy to help in getting to the
> bottom of it. Is it an interrupt routing problem? Is it an ata device
> problem? Is it something else (maybe locking) altogether?
>
> This problem has existed in -current for at least 6 weeks.
>
> Thanks for any suggestions,
> Joe
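For a side-by-side reading of the two systat -vmstat screens quoted above, here is a minimal sketch that computes what fraction of all interrupts per second came from ata0 in each, and how far the ata0 rate rose between them. The numbers are copied from the screens; the dictionary layout and the `share` helper are illustrative assumptions, not part of systat or FreeBSD.

```python
# Interrupts/sec copied from the two systat -vmstat screens quoted above.
# The dict layout and helper name are illustrative only (not systat code).
SNAPSHOTS = {
    "wedging (13:32)": {"total": 1743, "ata0 irq14": 1505, "rtc irq8": 128, "clk irq0": 100},
    "fsck after reboot (13:50)": {"total": 329, "ata0 irq14": 97, "rtc irq8": 128, "clk irq0": 100},
}


def share(snapshot, source):
    """Fraction of the total interrupt rate attributed to one source."""
    return snapshot[source] / snapshot["total"]


for label, snap in SNAPSHOTS.items():
    print(f"{label}: ata0 {snap['ata0 irq14']}/s = "
          f"{share(snap, 'ata0 irq14'):.0%} of {snap['total']}/s")

# How many times higher the ata0 rate is in the wedging screen.
ratio = (SNAPSHOTS["wedging (13:32)"]["ata0 irq14"]
         / SNAPSHOTS["fsck after reboot (13:50)"]["ata0 irq14"])
print(f"ata0 rate ratio: {ratio:.1f}x")
```

This prints an ata0 share of 86% for the first screen and 29% for the second, with the ata0 rate about 15.5 times higher in the first. The clk (100/s) and rtc (128/s) lines read the same in both screens.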