From: Terry Lambert
Date: Sat, 02 Aug 2003 17:56:49 -0700
To: Greg 'groggy' Lehey
Cc: current@freebsd.org
Subject: Re: Yet another crash in FreeBSD 5.1
Message-ID: <3F2C5DD1.36570B38@mindspring.com>

Greg 'groggy' Lehey wrote:
> > You don't actually need a crash dump to debug a stack traceback.
>
> Great!  So you know the answer?  Please submit a patch.
>
> Seriously, this is nonsense.  Yes, it's a null pointer dereference.

What?  That is precisely what doing what I suggested discovers, Greg.
If you haven't seen his response posting:

    (kgdb) list *(g_dev_strategy+29)
    0xc02e812d is in g_dev_strategy (/usr/src/sys/geom/geom_dev.c:415).
    410             KASSERT(cp->acr || cp->acw,
    411                 ("Consumer with zero access count in g_dev_strategy"));
    412
    413             bp2 = g_clone_bio(bp);
    414             KASSERT(bp2 != NULL, ("XXX: ENOMEM in a bad place"));
    415             bp2->bio_offset = (off_t)bp->bio_blkno << DEV_BSHIFT;
    416             KASSERT(bp2->bio_offset >= 0,
    417                 ("Negative bio_offset (%jd) on bio %p",
    418                 (intmax_t)bp2->bio_offset, bp));
    419             bp2->bio_length = (off_t)bp->bio_bcount;

Clearly, either bp2 or bp is NULL at the time of the dereference on line 415.

> Why?

Programmer error.  Either bp2 or bp is a NULL pointer.

> How do you fix it?

It depends on the root cause.

If the root cause is that bp is NULL, then I'd hope that it would have been caught higher up; if it wasn't, then I'd hope that g_clone_bio(bp) would have returned NULL.

Is the KASSERT() on line 414 active at the time of the problem, i.e. was his kernel built with INVARIANTS?  I don't know; if it isn't, it probably should be converted to an if()...panic().  If it is, then I'd have to expect that the pointer's validity fell out from under it after the check, as a result of an interrupt, preemption, reentrancy, or an SMP race that the locking failed to prevent.

I really can't answer it, for the same reason that I couldn't locate the failing source line from his original posting of hex offsets into functions compiled from unknown source code: I have neither his object set for the problem in question nor his debug kernel.

> Finding the first step doesn't solve the problem.

No.  Finding the first step is *necessary* to solving the problem, but you are entirely correct in pointing out that it's not in itself *sufficient*.

But it's one step farther along than he was.  I didn't see anyone else helping him take that first step, so I did.

-- 
Terry
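The if()...panic() conversion suggested above for the KASSERT() on line 414 can be sketched in userland C. This is a sketch under stated assumptions, not the actual GEOM code: panic() here is a stand-in that prints and calls abort(), the KASSERT() macro only approximates the kernel's (the check exists only when INVARIANTS is defined), and struct bio, clone_bio() and setup_clone() are hypothetical, trimmed-down stand-ins for the kernel's struct bio, g_clone_bio() and lines 413-419 of g_dev_strategy(). Note that an explicit check still cannot catch a pointer invalidated after the test by an interrupt, preemption or SMP race; that takes locking.

```c
#include <stdarg.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define DEV_BSHIFT 9    /* log2 of the 512-byte sector size */

/* Userland stand-in for the kernel's panic(): report and abort. */
static void
panic(const char *fmt, ...)
{
	va_list ap;

	va_start(ap, fmt);
	vfprintf(stderr, fmt, ap);
	va_end(ap);
	fputc('\n', stderr);
	abort();
}

/*
 * Approximation of KASSERT(): the test is compiled in only with
 * INVARIANTS; otherwise it disappears entirely.  As in the kernel,
 * msg is a parenthesized argument list handed to panic().
 */
#ifdef INVARIANTS
#define KASSERT(exp, msg) do { if (!(exp)) panic msg; } while (0)
#else
#define KASSERT(exp, msg) do { } while (0)
#endif

/* Hypothetical struct bio holding only the fields used below. */
struct bio {
	int64_t	bio_blkno;
	int64_t	bio_bcount;
	int64_t	bio_offset;
	int64_t	bio_length;
};

/*
 * Hypothetical stand-in for g_clone_bio(): the real one can return
 * NULL on allocation failure (hence the "ENOMEM" KASSERT above);
 * this one just allocates a zeroed bio, and returns NULL for NULL.
 */
static struct bio *
clone_bio(const struct bio *bp)
{
	if (bp == NULL)
		return (NULL);
	return (calloc(1, sizeof(struct bio)));
}

/*
 * Lines 413-419 with the NULL checks made unconditional, so a bad
 * pointer is reported by name even without INVARIANTS.
 */
static struct bio *
setup_clone(const struct bio *bp)
{
	struct bio *bp2;

	if (bp == NULL)
		panic("setup_clone: NULL bio");
	bp2 = clone_bio(bp);
	if (bp2 == NULL)
		panic("setup_clone: clone_bio() failed for bio %p",
		    (const void *)bp);
	bp2->bio_offset = bp->bio_blkno << DEV_BSHIFT;
	/* Sanity check that may stay a KASSERT(). */
	KASSERT(bp2->bio_offset >= 0,
	    ("Negative bio_offset (%jd) on bio %p",
	    (intmax_t)bp2->bio_offset, (const void *)bp));
	bp2->bio_length = bp->bio_bcount;
	return (bp2);
}
```

For a bio with bio_blkno 4, setup_clone() returns a clone whose bio_offset is 2048 (4 << 9). Building with -DINVARIANTS additionally enables the negative-offset KASSERT(), while the two NULL checks fire either way.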