From owner-freebsd-hackers  Thu Feb  4 20:41:45 1999
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Received: (from majordom@localhost)
          by hub.freebsd.org (8.8.8/8.8.8) id UAA24067
          for freebsd-hackers-outgoing; Thu, 4 Feb 1999 20:41:45 -0800 (PST)
          (envelope-from owner-freebsd-hackers@FreeBSD.ORG)
Received: from skynet.ctr.columbia.edu (skynet.ctr.columbia.edu [128.59.64.70])
          by hub.freebsd.org (8.8.8/8.8.8) with SMTP id UAA24054
          for <hackers@freebsd.org>; Thu, 4 Feb 1999 20:41:38 -0800 (PST)
          (envelope-from wpaul@skynet.ctr.columbia.edu)
Received: (from wpaul@localhost) by skynet.ctr.columbia.edu (8.6.12/8.6.9) id XAA10060; Thu, 4 Feb 1999 23:48:21 -0500
From: Bill Paul <wpaul@skynet.ctr.columbia.edu>
Message-Id: <199902050448.XAA10060@skynet.ctr.columbia.edu>
Subject: Re: Seen fxp or mbuf problems?
To: julian@whistle.com (Julian Elischer)
Date: Thu, 4 Feb 1999 23:48:11 -0500 (EST)
Cc: hackers@FreeBSD.ORG
In-Reply-To: <36BA4603.1CFBAE39@whistle.com> from "Julian Elischer" at Feb 4, 99 05:14:43 pm
X-Mailer: ELM [version 2.4 PL24]
Content-Type: text
Sender: owner-freebsd-hackers@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

Of all the gin joints in all the towns in all the world, Julian Elischer 
had to walk into mine and say:

> Anyone seen bugs in fxp driver or mbuf related code recently?
> 
> Here is a crash dump from a system about 10 days old (3.x series)
> 
> We are willing to believe that we've done this (we do enough 
> networking stuff) but I'm just looking to see if there
> is anyone else that has seen this.
> 
> julian
> 

> #5  0xf015f320 in fxp_add_rfabuf (sc=0xf059ce00, oldm=0xf0390400)
>     at ../../pci/if_fxp.c:1535
[...]
>                 MCLGET(m, M_DONTWAIT);   <-------- error here.
[...]
> (kgdb) p mclfree
> $3 = (union mcluster *) 0xa0225000
> 
> *cough*

This means that either some code in the kernel has a stale pointer to
an mbuf cluster and has modified it after it was released, or the
Intel chip itself has been given the address of this cluster buffer
and it DMA'ed data into it after it had been released. Unfortunately,
the trashed buffer has already been reallocated by a call to MCLGET()
immediately prior to this one; when you pull a cluster buffer off the 
free list, its first 4 bytes contain the address of the next buffer in 
the free list. (Well... I suppose it's 8 bytes on the alpha.) This 
address gets saved in mclfree and then the buffer gets handed out. If a 
buffer is trashed while it's sitting on the free list and then it gets 
reallocated, mclfree will be loaded with garbage, and the next time 
you call MCLGET(), hijinx will ensue.
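
To make the failure mode concrete, here is a minimal userland sketch of
the same kind of free list. It is not the kernel's code: the names
cl_get, cl_put, in_pool, free_head and pool, and the sizes, are all
invented for illustration, with free_head standing in for mclfree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CL_SIZE  2048   /* assumed cluster size, for this sketch only */
#define CL_COUNT 4      /* assumed pool size */

/* A cluster is either payload or, while free, a link to the next free one. */
union cluster {
    union cluster *next;    /* meaningful only while on the free list */
    char data[CL_SIZE];
};

static union cluster pool[CL_COUNT];
static union cluster *free_head;    /* hypothetical stand-in for mclfree */

/* True if p is the address of one of the clusters in pool[]. */
static int in_pool(const void *p)
{
    uintptr_t a = (uintptr_t)p;
    uintptr_t lo = (uintptr_t)pool;
    uintptr_t hi = (uintptr_t)(pool + CL_COUNT);

    return a >= lo && a < hi && (a - lo) % sizeof(union cluster) == 0;
}

/* Thread every cluster onto the free list; pool[0] ends up at the head. */
static void cl_init(void)
{
    free_head = NULL;
    for (int i = CL_COUNT - 1; i >= 0; i--) {
        pool[i].next = free_head;
        free_head = &pool[i];
    }
}

/* Pop the head. The new head is read out of the buffer's first bytes. */
static union cluster *cl_get(void)
{
    union cluster *c = free_head;

    if (c != NULL)
        free_head = c->next;
    return c;
}

/* Push a cluster back; its first bytes become the link. */
static void cl_put(union cluster *c)
{
    assert(in_pool(c));
    c->next = free_head;
    free_head = c;
}

/*
 * Simulate a late write through a stale pointer (or a late DMA).
 * Returns 1 if the allocation itself succeeded but left garbage
 * in free_head, which is the state seen in the crash dump.
 */
static int demo_stale_write(void)
{
    union cluster *stale, *c;

    cl_init();
    stale = cl_get();
    cl_put(stale);                                    /* released...          */
    memset(stale->data, 0xA0, sizeof(stale->next));   /* ...then written to   */
    c = cl_get();                                     /* looks fine to caller */
    return c == stale && free_head != NULL && !in_pool(free_head);
}
```

Note that the cl_get() that loads the garbage returns normally; it is
the next caller that dereferences the bogus head and blows up, which is
why the crash lands in code that did nothing wrong.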

If you can reproduce the crash reliably, you might be able to catch
mclfree getting clobbered by modifying the MCLGET() macro to test for
'reasonably sane' values when updating mclfree and then panic()ing if
it spots an insane one.
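
A rough userland sketch of that kind of check follows. It is not a patch
to MCLGET(): cl_get_checked, ptr_is_sane, free_head and pool are
hypothetical names, the pool bounds are an assumed definition of
"reasonably sane", and the -1 return marks the spot where a kernel
version would call panic().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CL_SIZE  2048   /* assumed cluster size, for this sketch only */
#define CL_COUNT 4      /* assumed pool size */

union cluster {
    union cluster *next;    /* meaningful only while on the free list */
    char data[CL_SIZE];
};

static union cluster pool[CL_COUNT];
static union cluster *free_head;    /* hypothetical stand-in for mclfree */

/* Thread every cluster onto the free list; pool[0] ends up at the head. */
static void cl_init(void)
{
    free_head = NULL;
    for (int i = CL_COUNT - 1; i >= 0; i--) {
        pool[i].next = free_head;
        free_head = &pool[i];
    }
}

/* "Reasonably sane": NULL (end of list) or the address of a pool cluster. */
static int ptr_is_sane(const union cluster *p)
{
    uintptr_t a = (uintptr_t)p;
    uintptr_t lo = (uintptr_t)pool;
    uintptr_t hi = (uintptr_t)(pool + CL_COUNT);

    if (p == NULL)
        return 1;
    return a >= lo && a < hi && (a - lo) % sizeof(union cluster) == 0;
}

/*
 * Pop the head, but validate the link before storing it in free_head.
 * Returns 0 on success (*out is NULL if the list is empty).
 * Returns -1 if the link is garbage; a kernel version would panic() here,
 * while the clobbered buffer is still at hand for inspection.
 */
static int cl_get_checked(union cluster **out)
{
    union cluster *c = free_head;

    assert(out != NULL);
    *out = NULL;
    if (c == NULL)
        return 0;
    if (!ptr_is_sane(c->next))
        return -1;
    free_head = c->next;
    *out = c;
    return 0;
}
```

The point of checking at update time is that the failure is reported
one allocation earlier than the original crash, at the moment the
trashed buffer is being handed out rather than on the following call.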

-Bill

-- 
=============================================================================
-Bill Paul            (212) 854-6020 | System Manager, Master of Unix-Fu
Work:         wpaul@ctr.columbia.edu | Center for Telecommunications Research
Home:  wpaul@skynet.ctr.columbia.edu | Columbia University, New York City
=============================================================================
 "It is not I who am crazy; it is I who am mad!" - Ren Hoek, "Space Madness"
=============================================================================
