From owner-freebsd-bugs Sat Oct 25 15:33:28 1997
Return-Path:
Received: (from root@localhost)
	by hub.freebsd.org (8.8.7/8.8.7) id PAA27661
	for bugs-outgoing; Sat, 25 Oct 1997 15:33:28 -0700 (PDT)
	(envelope-from owner-freebsd-bugs)
Received: from pat.idi.ntnu.no (0@pat.idi.ntnu.no [129.241.103.5])
	by hub.freebsd.org (8.8.7/8.8.7) with ESMTP id PAA27654
	for ; Sat, 25 Oct 1997 15:33:23 -0700 (PDT)
	(envelope-from Tor.Egge@idi.ntnu.no)
Received: from idt.unit.no (tegge@ikke.idi.ntnu.no [129.241.111.65])
	by pat.idi.ntnu.no (8.8.6/8.8.6) with ESMTP id AAA16319;
	Sun, 26 Oct 1997 00:33:12 +0200 (MET DST)
Message-Id: <199710252233.AAA16319@pat.idi.ntnu.no>
To: dillon@best.net
Cc: freebsd-bugs@hub.freebsd.org
Subject: Re: kern/4844: vm lookup, endless loop in vm_map_lookup_entry()
In-Reply-To: Your message of "Sat, 25 Oct 1997 01:20:01 -0700 (PDT)"
References: <199710250820.baa21715@hub.freebsd.org>
X-Mailer: Mew version 1.70 on Emacs 19.34.1
Mime-Version: 1.0
Content-Type: Text/Plain; charset=iso-8859-1
Date: Sun, 26 Oct 1997 00:33:12 +0200
From: Tor Egge
Content-Transfer-Encoding: 8bit
X-MIME-Autoconverted: from quoted-printable to 8bit by hub.freebsd.org id PAA27656
Sender: owner-freebsd-bugs@FreeBSD.ORG
X-Loop: FreeBSD.org
Precedence: bulk

> Hmm (looking at PR4630). This looks like a rather serious problem
> considering the core nature of brelse(). This may be responsible
> for several other crashes we have had involving "biodone: buffer already
> done" panics. We've had four or five of those.

I've also experienced some panics of the form

	biodone: buffer already done
	biodone: buffer already done
	biodone: buffer already done
	biodone: buffer already done
	biodone: buffer already done
	panic: biodone: buffer not busy

The last panic was on a kernel where extra sanity checks should cause
an earlier panic if vm_map_entry_create or vm_map_entry_dispose was
called during an interrupt. Thus this is a different problem.

biodone seems to be called several times on the same buffer, probably
due to a bug in the low-level device driver (ahc). Use of the ccd
device makes debugging more difficult, as the buffer has been freed
and reused for other purposes in the meantime.

254 operations in progress is clearly *wrong* (in the scsi_link structure).

IdlePTD 22b000
current pcb at 20e21c
panic: biodone: buffer not busy
(kgdb) where
#0  boot (howto=260) at ../../kern/kern_shutdown.c:266
#1  0xe0117676 in panic (fmt=0xe0131d6d "biodone: buffer not busy")
    at ../../kern/kern_shutdown.c:393
#2  0xe0131ec8 in biodone (bp=0xe4098000) at ../../kern/vfs_bio.c:1754
#3  0xe019849c in scsi_done (xs=0xe3071d00) at ../../scsi/scsi_base.c:450
#4  0xe01f295c in ahc_done (ahc=0xe2fa5000, scb=0xe259f880)
    at ../../i386/scsi/aic7xxx.c:1969
#5  0xe01f04c7 in ahc_intr (arg=0xe2fa5000) at ../../i386/scsi/aic7xxx.c:843
#6  0xe01ccffc in splx (ipl=0) at ../../i386/isa/ipl_funcs.c:93
#7  0xe011915d in tsleep (ident=0xe01fb1d0, priority=22,
    wmesg=0xe011704b "cpu0wt", timo=10) at ../../kern/kern_synch.c:329
#8  0xe01171d0 in boot (howto=256) at ../../kern/kern_shutdown.c:186
#9  0xe0117676 in panic (fmt=0xe0131d6d "biodone: buffer not busy")
    at ../../kern/kern_shutdown.c:393
#10 0xe0131ec8 in biodone (bp=0xe3d6db00) at ../../kern/vfs_bio.c:1754
#11 0xe019849c in scsi_done (xs=0xe304c800) at ../../scsi/scsi_base.c:450
#12 0xe01f295c in ahc_done (ahc=0xe2fa5000, scb=0xe259f9e0)
    at ../../i386/scsi/aic7xxx.c:1969
#13 0xe01f04c7 in ahc_intr (arg=0xe2fa5000) at ../../i386/scsi/aic7xxx.c:843
#14 0xed66 in ?? ()
#15 0xa1f9 in ?? ()
#16 0xc3f3 in ?? ()
#17 0x1095 in ?? ()
(kgdb) up 2
#2  0xe0131ec8 in biodone (bp=0xe4098000) at ../../kern/vfs_bio.c:1754
(kgdb) up 1
#3  0xe019849c in scsi_done (xs=0xe3071d00) at ../../scsi/scsi_base.c:450
(kgdb) print *xs
$1 = {next = 0xe304c800, flags = 2097, sc_link = 0xe25a1900,
  retries = 4 '\004', spare = "Ç\001À", timeout = 10000, cmd = 0xe3071d58,
  cmdlen = 10, data = 0xe702d000 "", datalen = 8192, resid = 0, error = 0,
  bp = 0xe4098000, sense = {error_code = 80 'P', ext = {unextended = {
        blockhi = 48 '0', blockmed = 194 'Â', blocklow = 0 '\000'},
      extended = {segment = 48 '0', flags = 194 'Â', info = "\000`\020\020",
        extra_len = 64 '@',
        extra_bytes = "O\000^\b¢\000\\*\211[\000\002\230ñ0\b-ÝÓ\000rÝ \004"}}},
  req_sense_length = -2147483638, status = 0, cmdstore = {opcode = 42 '*',
    bytes = "\000\000\025?ô\000\000\020\000\001\001"}}
(kgdb) print *xs->sc_link
$2 = {target = 3 '\003', lun = 0 '\000', adapter_targ = 7 '\a',
  adapter_unit = 1 '\001', adapter_bus = 0 '\000', scsibus = 1 '\001',
  dev_unit = 7 '\a', opennings = 6 '\006', active = 254 'þ', flags = 4101,
  quirks = 0, adapter = 0xe020c474, device = 0xe01ffd88, active_xs = 0x0,
  fordriver = 0x0, devmodes = 0x0, dev = 3384, sd = 0xe259fe40, inqbuf = {
    device = 0 '\000', dev_qual2 = 0 '\000', version = 2 '\002',
    response_format = 2 '\002', additional_length = 91 '[', unused = "\000",
    flags = 62 '>', vendor = "QUANTUM ", product = "XP34550W ",
    revision = "LXY4", extra = "1847051"}, adapter_softc = 0xe2fa5000}
(kgdb)

> It sounds to me that a slight modification to the PR4630 suggestion
> would work. Rather than call bfreekva(), brelse() puts the bp on a
> deferred free list, yes, but why not clear out this list from
> getnewbuf()? I don't particularly see the need for a high priority
> kernel process or other complexity.

I agree. The last suggested patch in PR#4630 does not even have a
deferred list. Using a deferred list is a more robust (and more
complex) solution.

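To make the deferred-list idea concrete, here is a minimal userland
sketch in C. It is not kernel code and not the patch from PR#4630: the
names sim_buf, sim_brelse, sim_bfreekva and sim_getnewbuf_drain, and the
in_interrupt flag, are all hypothetical stand-ins invented for
illustration. It assumes that the release path only queues the buffer,
and that the map-touching free happens later from process context. A
real kernel version would presumably also need to protect the list
against interrupts (e.g. with splbio()).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct buf; not the real kernel layout. */
struct sim_buf {
    struct sim_buf *defer_next; /* link on the deferred-free list */
    int has_kva;                /* 1 while address space is attached */
};

static struct sim_buf *deferred_head; /* buffers awaiting a KVA free */
static int in_interrupt;              /* simulated interrupt-context flag */
static int map_ops_in_interrupt;      /* counts unsafe map operations */

/* Stand-in for bfreekva(): the operation that must not run at interrupt time. */
void sim_bfreekva(struct sim_buf *bp)
{
    if (in_interrupt)
        map_ops_in_interrupt++; /* would be the bug being avoided */
    bp->has_kva = 0;
}

/* Stand-in for brelse(): only queue the buffer, never touch the map. */
void sim_brelse(struct sim_buf *bp)
{
    bp->defer_next = deferred_head;
    deferred_head = bp;
}

/*
 * Stand-in for the start of getnewbuf(): drain the deferred list before
 * allocating. Returns the number of buffers whose KVA was freed.
 */
int sim_getnewbuf_drain(void)
{
    int drained = 0;

    assert(!in_interrupt); /* assumed to run in process context only */
    while (deferred_head != NULL) {
        struct sim_buf *bp = deferred_head;

        deferred_head = bp->defer_next;
        bp->defer_next = NULL;
        sim_bfreekva(bp);
        drained++;
    }
    return drained;
}
```

In this sketch a caller sets in_interrupt, calls sim_brelse() for each
completed buffer, clears the flag, and then calls sim_getnewbuf_drain();
map_ops_in_interrupt stays at zero because the free never runs while the
flag is set.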
> If getnewbuf() (called by getblk()) is not called from an interrupt,
> we are home free. I don't think anyone else allocates out of the
> buffer_map so the deferred frees would not create a secondary effect
> anywhere else.

If getnewbuf() is called from an interrupt, something is seriously
broken.

- Tor Egge