From: Max Laier
To: freebsd-current@freebsd.org
Date: Mon, 31 Oct 2005 19:54:53 +0100
Subject: Re: CURRENT + amd64 + user-ppp = panic
Message-Id: <200510311955.13137.max@love2party.net>
In-Reply-To: <200510281404.33462.jhb@freebsd.org>
References: <20051027022313.R675@kushnir1.kiev.ua> <43602F2F.7080500@samsco.org> <200510281404.33462.jhb@freebsd.org>

On Friday 28 October 2005 20:04, John Baldwin wrote:
> On Wednesday 26 October 2005 09:36 pm, Scott Long wrote:
> > Vladimir Kushnir wrote:
> > > Hello,
> > > For a couple of days already my -CURRENT amd64 reliably panics
> > > whenever I'm trying to connect via ppp (nothing fancy - plain dialup,
> > > no firewall). It's 100% reproducible both with custom kernel and with
> > > GENERIC. A typescript of kgdb is attached.
> > >
> > > I'm running now on the kernel from Oct 19 which also panics, BTW, with
> > > "kmem_map too small" on an attempt to run something like Linux
> > > OpenOffice or Mathematica (neither kern.ipc.nmbclusters nor
> > > vm.kmem_size_max tweaking helps; besides, I've only 512 MB RAM)
> > >
> > > Regards,
> > > Vladimir
> >
> > I think that this is a result of the interrupt handler changes that John
> > Baldwin made yesterday. Can you step your source back in time and see
> > where it stops panicking?
>
> Actually, it can't be if softclock() is called directly from
> ithread_loop(). In the new code ithread_loop() calls
> ithread_execute_handlers() which would call softclock().
>
> > > #0 doadump () at pcpu.h:172
> > >
> > > 172 pcpu.h: No such file or directory.
> > >
> > > in pcpu.h
> > >
> > > (kgdb) where
> > >
> > > #0 doadump () at pcpu.h:172
> > > #1 0xffffffff803c65fc in boot (howto=260)
> > >     at /usr/src/sys/kern/kern_shutdown.c:399
> > > #2 0xffffffff803c609b in panic (fmt=0xffffffff805f2f46 "from
> > > debugger") at /usr/src/sys/kern/kern_shutdown.c:555
> > > #3 0xffffffff801a8a32 in db_panic (addr=0, have_addr=0, count=0,
> > > modif=0x0)
> > >     at /usr/src/sys/ddb/db_command.c:435
> > > #4 0xffffffff801a8f75 in db_command_loop ()
> > >     at /usr/src/sys/ddb/db_command.c:404
> > > #5 0xffffffff801aae83 in db_trap (type=-1794574032, code=0)
> > >     at /usr/src/sys/ddb/db_main.c:221
> > > #6 0xffffffff803e5279 in kdb_trap (type=9, code=0,
> > > tf=0xffffffff9508fb10)
> > >     at /usr/src/sys/kern/subr_kdb.c:445
> > > #7 0xffffffff8058d84e in trap_fatal (frame=0xffffffff9508fb10,
> > > eva=18446742974715243568) at /usr/src/sys/amd64/amd64/trap.c:672
> > > #8 0xffffffff8058ddb1 in trap (frame=
> > >     {tf_rdi = 1, tf_rsi = 70876, tf_rdx = -2401050962867404578,
> > > tf_rcx = 70876, tf_r8 = 0, tf_r9 = 1, tf_rax = 5340, tf_rbx = 1, tf_rbp
> > > = -1794573296, tf_r10 = 1, tf_r11 = 4, tf_r12 = -1099511143680, tf_r13
> > > = -1099035903488, tf_r14 = -1964245152, tf_r15 = 2, tf_trapno = 9,
> > > tf_addr = 0, tf_flags = 0, tf_err = 0, tf_rip = -2143462195, tf_cs = 8,
> > > tf_rflags = 65538, tf_rsp = -1794573360, tf_ss = 16}) at
> > > /usr/src/sys/amd64/amd64/trap.c:488
> > > #9 0xffffffff8057b3bb in calltrap ()
> > >     at /usr/src/sys/amd64/amd64/exception.S:168
>
> This looks like a page fault rather than a 'kmem_map too small' panic.
>
> > > ---Type <return> to continue, or q <return> to quit---
> > >
> > > #10 0xffffffff803d5ccd in softclock (dummy=0x1)
> > >     at /usr/src/sys/kern/kern_timeout.c:220
>
> This is here:
>         while (c) {
>                 depth++;
> ==>             if (c->c_time != curticks) {
>                         c = TAILQ_NEXT(c, c_links.tqe);
>
> c can't be NULL due to the while loop.
> Are any kernel modules being unloaded when this happens?

It isn't a NULL deref, as "eva" is clearly non-NULL above. This makes me
think of a callout list inconsistency. Most likely, judging from the rest
of the thread, this was introduced via "ln_timer_ch" in struct llinfo_nd6.
I am thinking of a double callout_stop() or something like that. The
callout_stop()/callout_reset() calls on that callout are clearly too deeply
nested to make sense of at a quick glance :-\

The easiest approach seems to be to put some good old printf() debugging in
nd6_llinfo_settimer() and see what it does. Vladimir, could you try that?
"Patch" attached.

> > > #11 0xffffffff803b05cc in ithread_loop (arg=0xffffff0000031780)
> > >     at /usr/src/sys/kern/kern_intr.c:662
> > > #12 0xffffffff803af3cb in fork_exit (
> > >     callout=0xffffffff803b0480 , arg=0xffffff0000031780,
> > >     frame=0xffffffff9508fc90) at /usr/src/sys/kern/kern_fork.c:789
> > > #13 0xffffffff8057b71e in fork_trampoline ()
> > >     at /usr/src/sys/amd64/amd64/exception.S:394
> > > #14 0x0000000000000000 in ?? ()

-- 
/"\  Best regards,                      | mlaier@freebsd.org
\ /  Max Laier                          | ICQ #67774661
 X   http://pf4freebsd.love2party.net/  | mlaier@EFnet
/ \  ASCII Ribbon Campaign              | Against HTML Mail and News

[Attachment: nd6_llinfo_settimer.printf.diff]

Index: nd6.c
===================================================================
RCS file: /usr/store/mlaier/fcvs/src/sys/netinet6/nd6.c,v
retrieving revision 1.62
diff -u -p -r1.62 nd6.c
--- nd6.c	22 Oct 2005 05:07:16 -0000	1.62
+++ nd6.c	31 Oct 2005 18:49:58 -0000
@@ -395,6 +395,7 @@ nd6_llinfo_settimer(ln, tick)
 	struct llinfo_nd6 *ln;
 	long tick;
 {
+	printf("For %p %ld ticks\n", ln, tick);
 	if (tick < 0) {
 		ln->ln_expire = 0;
 		ln->ln_ntick = 0;
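
The remark that "eva" is non-NULL is easier to see once the decimal value
from frame #7 is written in hex. Below is a minimal userland sketch of that
conversion. It assumes only the number printed in the backtrace; the helper
name eva_to_hex() is made up for illustration and is not part of FreeBSD or
kgdb.

```c
#include <assert.h>
#include <errno.h>
#include <inttypes.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not part of FreeBSD or kgdb): parse a decimal
 * address string, such as the "eva" shown in the trap_fatal frame, and
 * render it as a 0x-prefixed, 16-digit hex string.
 * Returns 0 on success, -1 if the text is not a plain decimal number
 * that fits the parser or if the output buffer is too small.
 */
static int
eva_to_hex(const char *text, uint64_t *value, char *buf, size_t len)
{
	char *end;
	unsigned long long parsed;
	int n;

	assert(text != NULL && value != NULL && buf != NULL);

	/* Reject signs, whitespace and empty input up front. */
	if (text[0] < '0' || text[0] > '9')
		return (-1);

	errno = 0;
	parsed = strtoull(text, &end, 10);
	if (errno != 0 || *end != '\0')
		return (-1);

	*value = (uint64_t)parsed;

	/* Print all 16 hex digits so the upper bits stay visible. */
	n = snprintf(buf, len, "0x%016" PRIx64, *value);
	if (n < 0 || (size_t)n >= len)
		return (-1);
	return (0);
}
```

Called with "18446742974715243568" and a 32-byte buffer, this gives
0xffffff001ed5ac30, which is nowhere near zero.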