From: Terry Lambert
Date: Wed, 02 Apr 2003 07:11:26 -0800
To: Dmitry Sivachenko
cc: hackers@freebsd.org
Subject: Re: Repeated similar panics on -STABLE
Message-ID: <3E8AFD9E.A34213B4@mindspring.com>
References: <20030402134428.GA43549@fling-wing.demos.su>

Dmitry Sivachenko wrote:
> We have three machines under relatively high load. They are running -STABLE
> on the same hardware with 2 processors (and SMP kernel).
> Periodically (approximately once a week) they panic with similar symptoms:

[ ... ]

Panic.

> #18 0xc0162549 in panic (fmt=0xc028e3b9 "%s")
>     at /mnt/se3/releng_4/src/sys/kern/kern_shutdown.c:595
> #19 0xc0251b1a in trap_fatal (frame=0xeb278e04, eva=1558020096)
>     at /mnt/se3/releng_4/src/sys/i386/i386/trap.c:974
> #20 0xc0251775 in trap_pfault (frame=0xeb278e04, usermode=0, eva=1558020096)
>     at /mnt/se3/releng_4/src/sys/i386/i386/trap.c:867
> #21 0xc02512b7 in trap (frame={tf_fs = -1072300008, tf_es = -361627632,
>       tf_ds = 16, tf_edi = -1070989600, tf_esi = -349729108,
>       tf_ebp = -349729176, tf_isp = -349729232, tf_ebx = -1070870564,
>       tf_edx = 1558020096, tf_ecx = 7, tf_eax = 128, tf_trapno = 12,
>       tf_err = 0, tf_eip = -1072309505, tf_cs = 8, tf_eflags = 66054,
>       tf_esp = 0, tf_ss = -349729108})
>     at /mnt/se3/releng_4/src/sys/i386/i386/trap.c:466

Page fault (trap 12): page not present.

> #22 0xc015daff in malloc (size=72, type=0xc029fee0, flags=0)
>     at /mnt/se3/releng_4/src/sys/kern/kern_malloc.c:243

The source does not check whether the free list is empty before using
it. Probably the kbp list had just been refilled, and it was emptied
again while you were inside the failing malloc().

This generally means that KVA (kernel virtual address space) was
exhausted, and the caller did not expect that.

To work around it: don't exhaust KVA; you have probably tuned some
kernel parameter way too high.

To fix it: at line 243, check whether va is NULL. If it is, look at
the flags: if M_NOWAIT is set, return NULL; otherwise (M_WAITOK),
restart the allocation. This has to happen before the next line, where
va is dereferenced. Maybe something like:

Change:

	va = kbp->kb_next;
	kbp->kb_next = ((struct freelist *)va)->next;

To:

	va = kbp->kb_next;
	if (va == NULL) {
		if (flags & M_NOWAIT) {
			splx(s);
			return ((void *) NULL);
		}
		goto restart;	/* put this label above the "while" */
	}
	kbp->kb_next = ((struct freelist *)va)->next;

Working around the problem is easier (IMO): just change your tuning
parameters to avoid running out of KVA.
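The change above boils down to "never dereference the head of the free
list without checking it first". A minimal userspace model of that idea
follows. It is a sketch, not the kernel code: struct freelist and
struct bucket are simplified stand-ins, the flag values are assumed to
match FreeBSD 4.x (where M_WAITOK is 0), and bucket_alloc(), refill_fn,
stub_refill() and the stub_* variables are hypothetical names standing
in for the kmem_malloc() refill path.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed to match FreeBSD 4.x <sys/malloc.h>: M_WAITOK is 0, so a
 * caller willing to wait is recognised by the absence of M_NOWAIT. */
#define M_WAITOK 0x0000
#define M_NOWAIT 0x0001

/* Simplified stand-ins for the kernel's free-list and bucket types. */
struct freelist {
	struct freelist *next;
};

struct bucket {
	struct freelist *kb_next;	/* head of this size class's free list */
};

/* Hypothetical refill hook standing in for the kmem_malloc() path;
 * it may add nothing (e.g. KVA exhausted). */
typedef void (*refill_fn)(struct bucket *kbp);

/* Pop one element, re-checking for an empty list before dereferencing. */
void *
bucket_alloc(struct bucket *kbp, int flags, refill_fn refill)
{
	void *va;

	assert(kbp != NULL && refill != NULL);
restart:
	if (kbp->kb_next == NULL)
		refill(kbp);
	va = kbp->kb_next;
	if (va == NULL) {
		/* Still empty: never dereference it. */
		if (flags & M_NOWAIT)
			return (NULL);	/* caller must handle the failure */
		/* M_WAITOK: this model just retries; a kernel would be
		 * expected to sleep here rather than spin. */
		goto restart;
	}
	kbp->kb_next = ((struct freelist *)va)->next;
	return (va);
}

/* Test stub: the first stub_failures refills add nothing (as if KVA were
 * exhausted); later ones supply a single free element. */
static struct freelist stub_pool[1];
int stub_failures;
int stub_calls;

void
stub_refill(struct bucket *kbp)
{
	stub_calls++;
	if (stub_calls <= stub_failures)
		return;
	stub_pool[0].next = NULL;
	kbp->kb_next = &stub_pool[0];
}
```

With M_NOWAIT an empty bucket becomes a NULL return that the caller has
to check; with M_WAITOK the model keeps retrying until a refill succeeds,
which is the behaviour the "goto restart" in the patch is after.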

Probably your mbuf or mbuf cluster counts are set way too large for
your amount of physical RAM; remember that, except in very special
circumstances, kernel memory is non-pageable.

> #23 0xc015a3fe in exit1 (p=0xea726820, rv=15)
>     at /mnt/se3/releng_4/src/sys/kern/kern_exit.c:166

It was trying to allocate a "zombie" structure: that is the 72-byte,
flags=0 (M_WAITOK) malloc() in frame #22, so the caller expected to
wait rather than fail.

> #24 0xc0164011 in sigexit (p=0xea726820, sig=15)
>     at /mnt/se3/releng_4/src/sys/kern/kern_sig.c:1503

This is for a process that someone sent a SIGTERM (signal 15) to, to
kill it.

> #25 0xc0163d9c in postsig (sig=15)
>     at /mnt/se3/releng_4/src/sys/kern/kern_sig.c:1406
> #26 0xc0251fc5 in syscall2 (frame={tf_fs = 47, tf_es = 47, tf_ds = 47,
>       tf_edi = 174, tf_esi = 1049187701, tf_ebp = -1077936960,
>       tf_isp = -349728812, tf_ebx = 1, tf_edx = 3, tf_ecx = -1078002496,
>       tf_eax = 3, tf_trapno = 7, tf_err = 2, tf_eip = 672039098, tf_cs = 31,
>       tf_eflags = 659, tf_esp = -1078069180, tf_ss = 47})
>     at /mnt/se3/releng_4/src/sys/i386/i386/trap.c:174

Looks like you caused a floating point exception, and died when exit1()
failed to create a zombie structure for the process.

-- 
Terry