From owner-freebsd-hackers@FreeBSD.ORG  Mon Jul 14 17:31:16 2003
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id D5B2937B401
	for <freebsd-hackers@freebsd.org>;
	Mon, 14 Jul 2003 17:31:16 -0700 (PDT)
Received: from mail.advantagecom.net (mail.advantagecom.net [65.103.151.155])
	by mx1.FreeBSD.org (Postfix) with ESMTP id D4C6143F75
	for <freebsd-hackers@freebsd.org>;
	Mon, 14 Jul 2003 17:31:15 -0700 (PDT)
	(envelope-from andykinney@advantagecom.net)
Received: from SCSI-MONSTER (scsi-monster.advantagecom.net [207.109.186.200])
	by mail.advantagecom.net (8.11.6/8.11.6) with ESMTP id h6F0VBn32311;
	Mon, 14 Jul 2003 17:31:11 -0700
From: "Andrew Kinney" <andykinney@advantagecom.net>
Organization: Advantagecom Networks, Inc.
To: John Fox <jjf@mind.net>, freebsd-hackers@freebsd.org
Date: Mon, 14 Jul 2003 17:29:36 -0700
MIME-Version: 1.0
Content-type: text/plain; charset=US-ASCII
Content-transfer-encoding: 7BIT
Message-ID: <3F12E880.3304.39F3E27D@localhost>
Priority: normal
In-reply-to: <20030709175336.GF5200@mind.net>
X-mailer: Pegasus Mail for Win32 (v3.12c)
Subject: Re: Kernel panic when moving lots of data over network
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
Reply-To: andykinney@advantagecom.net
List-Id: Technical Discussions relating to FreeBSD
	<freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
	<mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
	<mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 15 Jul 2003 00:31:17 -0000

On 9 Jul 2003, at 10:53, John Fox wrote:

> Strange problem on a new server we're setting up.  It's very stable,
> except when moving a large amount of data onto it via the network.  I
> begin moving approx 4GB of data onto it, and before the xfer can
> complete, the system panics and reboots.  (I am generally able to get
> from 1 to 2 GB transferred before the panic occurs.)
> 

I'm not really a kernel hacker, but I've solved lots of our own kernel 
problems on 4.5-RELEASE, 4.7-RELEASE, and 4.8-RELEASE with the help 
of others on this list.  

We haven't had any problems exactly like what you described, but 
I seem to remember some open PRs relating to SSH and/or the xl 
network driver causing panics.  You might want to browse through 
them and see if any match your situation.  FWIW, though, we run 
4.8-RELEASE, SSH, and the xl driver (3com 905C-TX, I believe) on 
one of our heavily used dual CPU machines and don't have any 
problems, so I'd be surprised if any of those PRs had any bearing 
on this.  We don't do any large file transfers over SSH, though.  We 
usually use rsync for that since we deal with lots of little files that 
get out of sync easily.


> #6  0xc021745f in xl_newbuf ()
> #7  0xc021761e in xl_rxeof ()
> #8  0xc0219296 in xl_watchdog ()
> #9  0xc01b662f in if_slowtimo ()
> #10 0xc0180799 in softclock ()


Here are some slightly educated guesses that you'll want to rule out 
one by one until you isolate the trouble:

1.  In my experience, a lot of "trap 12" panics seem to come from 
running out of some hard-limited kernel resource.  Try logging the 
output of sysctl vm.zone once a minute through cron to see if you're 
bumping into any of those limits.  You'll also want to log sysctl 
kvm_free in the same manner to make sure you're not running out 
of KVA or KVM.  Our system is set up with 2GB of KVA (the default is 
1GB), which solved all the trap 12 issues it was having 
due to running out of KVA/KVM.

2.  Check your RAM.  Bad RAM caused us innumerable 
headaches from seemingly random trap 12 panics on one of our 
other systems.  They usually hit during some buffer allocation, 
especially when that was the primary activity in RAM.  SSH is 
especially sensitive to bad RAM.  We could usually trigger a panic 
on a system with bad RAM just by exercising SSH a bit.

3.  Some unknown or known problem with the xl driver and long file 
transfers over SSH.  Check those PRs (sorry, I don't know the 
numbers offhand).
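
For the once-a-minute logging in item 1, here is a minimal sketch, 
assuming Python 3.7 or later is installed on the machine.  The helper 
names and the file paths in the crontab line are made up for 
illustration.  "kvm_free" is copied as written above; the full sysctl 
name on your system may differ, so check sysctl -a first.

```python
#!/usr/bin/env python3
"""Print a timestamped snapshot of a few sysctls (meant to be run from cron)."""
import subprocess
import sys
import time

# "vm.zone" is the name given in the message. "kvm_free" is copied as written
# there; the full name on a given system may differ (assumption, verify it).
SYSCTL_NAMES = ["vm.zone", "kvm_free"]


def read_sysctl(name):
    """Return the output of `sysctl <name>`, or an error note if it fails."""
    try:
        result = subprocess.run(
            ["sysctl", name], capture_output=True, text=True, check=False
        )
    except OSError as exc:
        # sysctl binary missing or not executable
        return "error running sysctl %s: %s" % (name, exc)
    if result.returncode != 0:
        return "sysctl %s failed: %s" % (name, result.stderr.strip())
    return result.stdout.rstrip()


def snapshot(names, reader=read_sysctl, now=None):
    """Build one log entry: a timestamp line, then each sysctl's output."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    lines = ["=== %s ===" % stamp]
    for name in names:
        lines.append(reader(name))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Write to stdout and let cron append it to a log file.
    sys.stdout.write(snapshot(SYSCTL_NAMES))
```

A crontab entry along these lines (both paths are hypothetical) would 
append one snapshot per minute:

    * * * * * /usr/local/bin/python3 /root/sysctl_snapshot.py >> /var/log/sysctl_snapshot.log 2>&1

After a panic, the last few entries in the log show whether any zone 
was close to its limit or free KVA was shrinking before the crash.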

Sincerely,
Andrew Kinney
President and
Chief Technology Officer
Advantagecom Networks, Inc.
http://www.advantagecom.net