From owner-freebsd-bugs  Tue May 11 15:30:57 1999
Delivered-To: freebsd-bugs@freebsd.org
Received: from luke.pmr.com (luke.pmr.com [207.170.114.132])
	by hub.freebsd.org (Postfix) with ESMTP
	id B74BF15A38; Tue, 11 May 1999 15:30:42 -0700 (PDT)
	(envelope-from bob@luke.pmr.com)
Received: (from bob@localhost)
	by luke.pmr.com (8.9.3/8.9.2) id RAA34133;
	Tue, 11 May 1999 17:30:19 -0500 (CDT)
	(envelope-from bob)
Date: Tue, 11 May 1999 17:30:19 -0500
From: Bob Willcox <bob@luke.pmr.com>
To: Pierre Beyssac <beyssac@enst.fr>
Cc: Bob Willcox <bob@pmr.com>, freebsd-bugs@freebsd.org,
	FreeBSD-gnats-submit@freebsd.org
Subject: Re: kern/10872: Panic in sorecieve()
Message-ID: <19990511173019.A33995@luke.pmr.com>
Reply-To: Bob Willcox <bob@pmr.com>
References: <19990511185956.A12679@enst.fr> <19990511124117.A28606@luke.pmr.com> <19990511195311.R427@enst.fr>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 0.95.1i
In-Reply-To: <19990511195311.R427@enst.fr>; from Pierre Beyssac on Tue, May 11, 1999 at 07:53:11PM +0200
Sender: owner-freebsd-bugs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

Well, I can easily recreate the panic with -current as of this morning.
I tried the "maxusers 128" change and that did not help.  I have
attached a slightly modified test shell script that I have been using.

I run this shell script on three other systems simultaneously, all
writing to the same SCSI disk on the test system (this roughly simulates
Amanda activity, with multiple systems all dumping to the holding disk).
As I mentioned in an earlier note, these systems are all connected
together via a 100 Mbps full-duplex switching hub.  Two of them are
running 3.1-stable and the other is running 2.2.8-release.

I run the tests simultaneously on the three systems as follows:

On obiwan:
./panic_test 5 10000 lando /stuff/tmp/obiwan

On deathstar:
./panic_test 5 10000 lando /stuff/tmp/deathstar

On luke:
./panic_test 5 10000 lando /stuff/tmp/luke

(I've got kind of a Star Wars theme going here)

Usually lando panics within about 5 minutes.  Note that I have built
lando's kernel with options INVARIANTS and INVARIANT_SUPPORT.  Without
them you'll still get a panic (in sbdrop), but it occurs later, when the
socket is closed, rather than as the "receive 1" panic triggered by the
KASSERT() we've been talking about.

One more thing: I never ran low on mbufs prior to the panic.

Thanks,
Bob

On Tue, May 11, 1999 at 07:53:11PM +0200, Pierre Beyssac wrote:
> On Tue, May 11, 1999 at 12:41:17PM -0500, Bob Willcox wrote:
> > fix).  The problem as I have seen it is that the mbuf chain pointer (m)
> > is NULL and so_rcv.sb_cc is not zero.  It's as though somewhere either
> > the mbuf chain pointer gets zapped with NULL or something fails to
> 
> This can happen when the system is out of mbufs. Sadly there are
> many places in the kernel where the condition is not trapped at
> all.
> 
> How many mbufs does netstat -m report on your system? Maybe I
> couldn't reproduce it because my kernel is configured with maxusers
> 128, which yields more mbufs. You can try that as a temporary fix.
> 
> > properly update so_rcv.sb_cc as mbufs are processed.
> > 
> > I believe one can expand the KASSERT macro and rewrite the line:
> > 	if (m == 0 && so->so_rcv.sb_cc != 0)
> 
> Oops, you're right. I stupidly looked at so_snd.sb_cc in the debug
> output, which is 0.
> 
> I prefer that; it'll probably be easier to fix.
> -- 
> Pierre Beyssac		pb@enst.fr

-- 
Bob Willcox             The man who follows the crowd will usually get no
bob@luke.pmr.com        further than the crowd.  The man who walks alone is
Austin, TX              likely to find himself in places no one has ever
                        been.            -- Alan Ashley-Pitt

