From: "Andrew Kinney"
Organization: Advantagecom Networks, Inc.
To: John Fox, freebsd-hackers@freebsd.org
Date: Mon, 14 Jul 2003 17:29:36 -0700
Subject: Re: Kernel panic when moving lots of data over network
Message-ID: <3F12E880.3304.39F3E27D@localhost>
In-reply-to: <20030709175336.GF5200@mind.net>
Reply-To: andykinney@advantagecom.net

On 9 Jul 2003, at 10:53, John Fox wrote:

> Strange problem on a new server we're setting up. It's very stable,
> except when moving a large amount of data onto it via the network. I
> begin moving approx 4GB of data onto it, and before the xfer can
> complete, the system panics and reboots. (I am generally able to get
> from 1 to 2 GB transferred before the panic occurrs.)
>

I'm not really a kernel hacker, but I've solved lots of our own kernel
problems on 4.5 release, 4.7 release, and 4.8 release with the help of
others on this list.

We haven't had any problems exactly like what you described, but I
seem to remember some open PRs relating to SSH and/or the xl network
driver causing panics. You might want to browse through them and see
if any match your situation.

FWIW, though, we run 4.8-RELEASE, SSH, and the xl driver (3Com
905C-TX, I believe) on one of our heavily used dual-CPU machines and
don't have any problems, so I'd be surprised if any of those PRs had
any bearing on this. We don't do any large file transfers over SSH,
though. We usually use rsync for that since we deal with lots of
little files that get out of sync easily.

> #6 0xc021745f in xl_newbuf ()
> #7 0xc021761e in xl_rxeof ()
> #8 0xc0219296 in xl_watchdog ()
> #9 0xc01b662f in if_slowtimo ()
> #10 0xc0180799 in softclock ()

Here are some slightly educated guesses that you'll want to eliminate
one by one until you isolate the trouble:

1. In my experience, a lot of "trap 12" panics come from running out
   of some hard-limited kernel resource. Try logging the output of
   sysctl vm.zone once a minute through cron to see if you're bumping
   into any of those limits. You'll also want to log sysctl kvm_free
   in the same manner to make sure you're not running out of KVA or
   KVM. Our system is set up with 2GB KVA (default is 1GB), which
   solved all the trap 12 issues our system was having due to running
   out of KVA/KVM.

2. Check your RAM. Bad RAM caused us innumerable headaches from
   seemingly random trap 12 problems on one of our other systems.
   The panics usually hit during some buffer allocation, especially
   when that was the primary activity in RAM. SSH is especially
   sensitive to bad RAM. We could usually trigger a panic on a system
   with bad RAM just by exercising SSH a bit.

3. Some unknown or known problem with the xl driver and long file
   transfers over SSH. Check those PRs (sorry, don't know the numbers
   offhand).

Sincerely,
Andrew Kinney
President and Chief Technology Officer
Advantagecom Networks, Inc.
http://www.advantagecom.net
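
The once-a-minute logging suggested in item 1 could be done with a
small script run from cron. What follows is only a minimal sketch and
not something from the original message: the script name, the LOGFILE
path and the SYSCTL variable are made up for illustration, and the
sysctl names are passed as arguments rather than hard-coded.

```shell
#!/bin/sh
# Sketch only: append timestamped sysctl snapshots to a log file.
# LOGFILE, SYSCTL and the script name are hypothetical choices, not
# taken from the original message.
LOGFILE="${LOGFILE:-/var/log/kernres.log}"
SYSCTL="${SYSCTL:-sysctl}"   # e.g. /sbin/sysctl if cron's PATH lacks /sbin

# Print a timestamp header, then whatever sysctl reports for the name in $1.
# Errors stay in the output so an unknown name is visible in the log.
snapshot() {
    echo "=== $(date '+%Y-%m-%d %H:%M:%S') $1"
    "$SYSCTL" "$1" 2>&1
}

# Log every sysctl name given on the command line; do nothing without any.
if [ "$#" -gt 0 ]; then
    for name in "$@"; do
        snapshot "$name"
    done >> "$LOGFILE"
fi
```

A user crontab entry along these lines (path hypothetical) would run
it once a minute:

    * * * * * /usr/local/sbin/kernres.sh vm.zone vm.kvm_free

The message only says "kvm_free"; vm.kvm_free is an assumption about
the full name, so check the output of sysctl -a on your release first.
After a panic, the last few snapshots in the log show whether any
limit was close to exhaustion.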