Date:      Wed, 1 Sep 1999 07:24:37 +0930
From:      Greg Lehey <grog@lemis.com>
To:        Bernd Walter <ticso@cicely.de>
Cc:        Matthew Dillon <dillon@apollo.backplane.com>, Mike Smith <mike@smith.net.au>, Parag Patel <parag@cgt.com>, freebsd-current@FreeBSD.ORG
Subject:   Help needed with debugging (was: 4.0-CURRENT SMP crash with vinum raid-5 and softupdates)
Message-ID:  <19990901072437.A86067@freebie.lemis.com>
In-Reply-To: <19990830075311.A30271@cicely8.cicely.de>; from Bernd Walter on Mon, Aug 30, 1999 at 07:53:12AM +0200
References:  <199908292224.PAA15435@dingo.cdrom.com> <199908292348.QAA07774@apollo.backplane.com> <19990830075311.A30271@cicely8.cicely.de>

On Monday, 30 August 1999 at  7:53:12 +0200, Bernd Walter wrote:
> On Sun, Aug 29, 1999 at 04:48:32PM -0700, Matthew Dillon wrote:
>> :
>> :How similar?  The trap above is extremely bad; it looks like a return
>> :on a corrupted stack or a jump through a null function vector.
>> :
>> :Make very sure that your vinum kld is in sync with your kernel.
>>
>>     This looks like an indirect call through a NULL function pointer.
>
> In my case it is a call to bp->b_iodone in kern/vfs_bio.c:2580 which is 0 :(

We seem to have a specific problem here.  To summarize:

1.  It was first reported (by Bernd) on 15 August.

2.  The system crashes in biodone because the buffer header has the
    B_CALL flag set, but the value of bp->b_iodone is NULL.

3.  bp->b_iodone is only set to NULL (well, 0) in one place:
    getnewbuf, which I don't call.  It gets reset to a previous value
    in dsiodone, which is about the only place where things could
    conceivably go wrong.  I put check code in at every likely place,
    and the only check that found the situation was the one in
    biodone.

4.  In every case, the fields just before b_iodone were also zeroed:

      b_dev = 0xc098d840,
      b_data = 0xc0befc00 "\2275\t",
      b_kvabase = 0x0,
      b_kvasize = 0x0,
      b_lblkno = 0x0,
      b_blkno = 0x0,
      b_offset = 0x0,
      b_iodone = 0,
      b_iodone_chain = 0x0,
      b_vp = 0xc5d4a700,

    The b_dev and b_vp fields are OK, and b_data looks OK as well.
    I'm guessing that something is overwriting some of the fields in
    the header, but it's always the same set of fields, and I can't
    find anything in the code that does that.

5.  In one case yesterday, there were two requests involved in the
    vinum request.  The other one had already completed (been through
    biodone), but the bp->b_iodone word was zeroed out in the same
    manner.  In addition, other fields have been set by Vinum's iodone
    function.  From this I deduce that the fields were zeroed after
    biodone, which makes it very unlikely that it was done by Vinum.
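
For readers following along, the inconsistent state described in points
2 and 3 can be sketched as a small userland check.  This is only an
illustration: struct mock_buf, the B_CALL value used here, and
buf_iodone_consistent are hypothetical stand-ins, not the kernel's
struct buf or the actual check code mentioned above.

```c
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the real <sys/buf.h>. */
#define B_CALL 0x40L            /* flag value chosen for this sketch only */

struct mock_buf {
    long b_flags;                           /* B_CALL asks for a callback */
    void (*b_iodone)(struct mock_buf *);    /* callback run on completion */
};

/*
 * Return 1 if the header is consistent, 0 if B_CALL is set while
 * b_iodone is NULL (the state that would make an indirect call crash).
 */
static int
buf_iodone_consistent(const struct mock_buf *bp)
{
    if ((bp->b_flags & B_CALL) != 0 && bp->b_iodone == NULL)
        return 0;
    return 1;
}

/* Do-nothing completion routine, used only to build a valid header. */
static void
noop_iodone(struct mock_buf *bp)
{
    (void)bp;
}
```

A check of this shape, placed at several points along the I/O path,
narrows down where the header goes bad only if one of the earlier
checks fires; here only the final one did.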

I don't know how to proceed at the moment.  Matt Dillon has suggested
adding some dummy fields in the buffer header and setting them to
known values, but I expect this will drive the problem into hiding.
Instead, I'm migrating the whole thing to -STABLE to see if it happens
there.  In view of the impending release of 3.3, this makes sense
anyway.  In the meantime, if anybody has any ideas, or if any of this
rings a bell, I'd be grateful for feedback.
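
As a rough sketch of the dummy-field idea mentioned above (not code
from Vinum or the kernel): bracket the suspect fields with guard words
holding known values and test them at convenient points.  The struct
layout, the guard constants, and the function names below are all
assumptions made up for the illustration.

```c
#include <stdint.h>
#include <string.h>

/* Arbitrary known patterns; any distinctive non-zero values would do. */
#define GUARD_BEFORE 0xdeadbeefU
#define GUARD_AFTER  0xfeedfaceU

/* Hypothetical header fragment: the zeroed fields, bracketed by guards. */
struct guarded_hdr {
    uint32_t guard_before;          /* dummy field ahead of the fields */
    void *b_kvabase;
    long b_kvasize;
    long b_lblkno;
    long b_blkno;
    long b_offset;
    void (*b_iodone)(void *);
    void *b_iodone_chain;
    uint32_t guard_after;           /* dummy field after the fields */
};

/* Clear the fragment and plant the known guard values. */
void
guard_init(struct guarded_hdr *h)
{
    memset(h, 0, sizeof(*h));
    h->guard_before = GUARD_BEFORE;
    h->guard_after = GUARD_AFTER;
}

/* Return 1 if both guards still hold their known values, else 0. */
int
guard_intact(const struct guarded_hdr *h)
{
    return h->guard_before == GUARD_BEFORE &&
        h->guard_after == GUARD_AFTER;
}
```

Note that the guards only notice an overwrite that spills onto them; a
write that zeroes just the fields in between goes undetected, and the
extra fields change the layout, which is one way the problem could be
driven into hiding.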

Greg
--
See complete headers for address, home page and phone numbers
finger grog@lemis.com for PGP public key


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-current" in the body of the message



