Date:      Mon, 14 Jul 2003 17:29:36 -0700
From:      "Andrew Kinney" <andykinney@advantagecom.net>
To:        John Fox <jjf@mind.net>, freebsd-hackers@freebsd.org
Subject:   Re: Kernel panic when moving lots of data over network
Message-ID:  <3F12E880.3304.39F3E27D@localhost>
In-Reply-To: <20030709175336.GF5200@mind.net>

On 9 Jul 2003, at 10:53, John Fox wrote:

> Strange problem on a new server we're setting up.  It's very stable,
> except when moving a large amount of data onto it via the network.  I
> begin moving approx 4GB of data onto it, and before the xfer can
> complete, the system panics and reboots.  (I am generally able to get
> from 1 to 2 GB transferred before the panic occurs.)
> 

I'm not really a kernel hacker, but I've solved many of our own kernel
problems on 4.5-RELEASE, 4.7-RELEASE, and 4.8-RELEASE with help
from others on this list.

We haven't had any problems exactly like what you described, but 
I seem to remember some open PRs relating to SSH and/or the xl 
network driver causing panics.  You might want to browse through 
them and see if any match your situation.  FWIW, though, we run 
4.8-RELEASE, SSH, and the xl driver (3Com 905C-TX, I believe) on
one of our heavily used dual-CPU machines and don't have any
problems, so I'd be surprised if any of those PRs had any bearing
on this. We don't do any large file transfers over SSH, though. We
usually use rsync for that, since we deal with lots of little files that
easily get out of sync.


> #6  0xc021745f in xl_newbuf ()
> #7  0xc021761e in xl_rxeof ()
> #8  0xc0219296 in xl_watchdog ()
> #9  0xc01b662f in if_slowtimo ()
> #10 0xc0180799 in softclock ()


Here are some slightly educated guesses that you'll want to rule out
one by one until you isolate the trouble:

1.  In my experience, a lot of "trap 12" panics (page faults in kernel
mode) come from running out of some hard-limited kernel resource.
Try logging sysctl vm.zone once a minute through cron to see if
you're bumping up against any of those limits. You'll also want to
log sysctl kvm_free the same way to make sure you're not running
out of KVA or KVM. Our system is set up with 2GB of KVA (the
default is 1GB), which solved all the trap 12 issues it was having
due to running out of KVA/KVM.

2.  Check your RAM. Bad RAM caused us innumerable headaches
with seemingly random trap 12 panics on one of our other systems.
The panics usually hit during some buffer allocation, especially
when that was the primary activity in RAM. SSH is especially
sensitive to bad RAM; we could usually trigger a panic on a
system with bad RAM just by exercising SSH a bit.

3.  A known or unknown problem with the xl driver and long file
transfers over SSH. Check those PRs (sorry, I don't know the
numbers offhand).
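
The vm.zone check in item 1 can be partly automated. The Python sketch below scans captured sysctl vm.zone text and lists zones whose USED count is within a chosen fraction of their LIMIT. It assumes each zone line looks like "NAME: size, limit, used, free, requests" and that a LIMIT of 0 means "no hard cap"; both are assumptions to verify against your own 4.x output. The function names, the 90% threshold, and the sample zones are hypothetical.

```python
import re

# Assumed zone line layout (verify against your own `sysctl vm.zone`):
#   "NAME:   size,   limit,   used,   free,   requests"
# Header lines and blank lines do not match and are ignored.
ZONE_LINE = re.compile(
    r"^\s*(?P<name>[^:]+):\s*(?P<size>\d+),\s*(?P<limit>\d+),"
    r"\s*(?P<used>\d+),\s*(?P<free>\d+),\s*(?P<requests>\d+)\s*$"
)


def parse_vm_zone(text):
    """Return {zone_name: (limit, used)} for every line matching ZONE_LINE."""
    zones = {}
    for line in text.splitlines():
        m = ZONE_LINE.match(line)
        if m:
            name = m.group("name").strip()
            zones[name] = (int(m.group("limit")), int(m.group("used")))
    return zones


def zones_near_limit(text, threshold=0.9):
    """List zones whose used count is at or above threshold * limit.

    A limit of 0 is treated as "no hard limit" and skipped (assumption).
    """
    near = []
    for name, (limit, used) in parse_vm_zone(text).items():
        if limit > 0 and used >= threshold * limit:
            near.append(name)
    return sorted(near)


# Made-up sample in the assumed format; zone names are hypothetical.
SAMPLE = """\
ITEM            SIZE     LIMIT    USED    FREE  REQUESTS

ZONE_A:          160,        0,     14,     88,   245390
ZONE_B:          160,   233016,      0,      0,        0
ZONE_C:          256,    65536,  64000,    100,  9999999
"""

print(zones_near_limit(SAMPLE))
```

In practice the cron job from item 1 would save each sysctl vm.zone snapshot to a file, and the contents could be handed to zones_near_limit; a zone that keeps creeping toward its limit just before a panic is a good suspect.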

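As for the 2GB KVA mentioned in item 1: on i386 the kernel's share of the 4GB address space is normally raised with the KVA_PAGES kernel config option, where each unit is one 4MB page-directory entry (256 gives the 1GB default, 512 gives 2GB). The email doesn't say which option was used, so treat that as an assumption. The hypothetical helper below covers non-PAE i386 only and just does the arithmetic, showing that whatever the kernel gains, user processes lose.

```python
# Assumption: non-PAE i386, where one KVA_PAGES unit is one 4 MB
# page-directory entry and the whole virtual address space is 4 GB.
PDE_BYTES = 4 * 1024 * 1024
ADDRESS_SPACE = 4 * 1024 ** 3
GB = 1024 ** 3


def kva_split(kva_pages):
    """Return (kernel_bytes, user_bytes) for a given KVA_PAGES value."""
    kernel = kva_pages * PDE_BYTES
    if not 0 < kernel < ADDRESS_SPACE:
        raise ValueError("KVA_PAGES out of range for a 4 GB address space")
    return kernel, ADDRESS_SPACE - kernel


for pages in (256, 512):
    kernel, user = kva_split(pages)
    print(f"KVA_PAGES={pages}: kernel {kernel // GB} GB, user {user // GB} GB")
```

Doubling KVA halves the address space available to each user process, which is worth keeping in mind on a box running large userland programs.
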
Sincerely,
Andrew Kinney
President and
Chief Technology Officer
Advantagecom Networks, Inc.
http://www.advantagecom.net
