Date:      Sun, 2 Jun 2002 23:38:39 -0700 (PDT)
From:      Oliver Crow <ocrow@simplexity.net>
To:        Miguel Mendez <flynn@energyhq.homeip.net>
Cc:        freebsd-stable@FreeBSD.ORG
Subject:   Re: [stable] 4.5-S crashing like clockwork (fwd)
Message-ID:  <20020602231348.A79925-100000@iguana.simplexity.net>



On Sun, 2 Jun 2002, Miguel Mendez wrote:

> Back in January the fan of my server stopped working, and my computer
> exhibited that same behaviour, e.g. a kernel panic every 24 hours or so,
> then the box would reboot and keep working until next day. Have you
> checked for a possible hardware issue?


I haven't tried running the same configuration on different hardware.
There's some evidence that leads me to think that this is a software
problem though.

Firstly it started on the day that I installed a new kernel, which I
rarely do.  Secondly it seems to crash in the same piece of code each time
-- whilst processing an ioctl.  Thirdly, Mike Nowlin reports the same
symptoms, with the same crash dump call stack, on completely different
hardware.

Mike's box is running Zebra.  Mine is running mpd.  Both packages deal
with PPP links.  Also Karl Joch reports the same symptoms on a box running
mpd.

I've pasted below the crash dump debugging session that Mike provided.
That seems to provide some pretty interesting clues...
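
As an aid for reading the numbers in that session: kgdb prints the trap frame registers as signed decimals, which hides how they relate to the hex addresses in the panic message. The following is a minimal Python sketch that converts the values copied from the backtrace and splits the ioctl command word into its fields. It assumes the BSD field layout from sys/ioccom.h (direction in the top three bits, a 13-bit parameter length, then group and number bytes); the helper names are hypothetical. Reading the result as SIOCGIFCONF is an assumption based on the usual definition in sys/sockio.h, not something verified against this source tree.

```python
# Values copied from the kgdb session in the forwarded message.
EVA = 4244730277          # eva= argument to trap_pfault()
TF_EIP = -1071626177      # tf_eip in the trap() frame
TF_EDX = -50237019        # tf_edx in the trap() frame
IOCTL_CMD = 3221776676    # cmd= argument to ifconf()

# Direction bits, assuming the sys/ioccom.h encoding.
DIRECTIONS = {
    0x20000000: "void",
    0x40000000: "out",
    0x80000000: "in",
    0xC0000000: "inout",
}


def u32(value):
    """Reinterpret a possibly negative 32-bit integer as unsigned."""
    return value & 0xFFFFFFFF


def decode_ioctl(cmd):
    """Split an ioctl command word into direction, length, group and number."""
    return {
        "direction": DIRECTIONS.get(cmd & 0xE0000000, "none"),
        "length": (cmd >> 16) & 0x1FFF,   # size of the argument structure
        "group": chr((cmd >> 8) & 0xFF),  # 'i' is the network interface group
        "number": cmd & 0xFF,
    }


if __name__ == "__main__":
    print("fault address  :", hex(EVA))
    print("tf_eip         :", hex(u32(TF_EIP)))
    print("tf_edx         :", hex(u32(TF_EDX)))
    print("edx == eva     :", u32(TF_EDX) == EVA)
    print("ioctl fields   :", decode_ioctl(IOCTL_CMD))
```

If the layout assumption holds, tf_eip matches the reported instruction pointer 0xc020483f, tf_edx equals the faulting address 0xfd0171a5 (so the bad pointer was sitting in edx when the read faulted), and the command decodes to a read/write ioctl in group 'i', number 36, with an 8-byte argument.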

Oliver


---------- Forwarded message ----------
Date: Sun, 2 Jun 2002 13:29:54 -0700 (PDT)
From: Oliver Crow <ocrow@simplexity.net>
To: Mike Nowlin <mike@argos.org>
Subject: Re: [stable] 4.5-S crashing like clockwork


On Sun, 2 Jun 2002, Mike Nowlin wrote:

> > Not running Zebra, no.  The only unusual networking software I'm running
> > is mpd, the PPP & PPTP tunnelling daemon.
>
> Hmm - will have to think about this a bit - I'm using Zebra to handle the
> route up/down events from the PPP connections that this box handles...
> (Needed to use OSPF for various reasons, hence zebra instead of routed.)
>
> > > Hmm - that's exactly what I'm getting on that system.  I'm pretty sure
> > > that on every crash dump I've gotten, it's dealing with the lo0 loopback
> > > interface in ifconf() when it blows up...
> >
> > Interesting ... how were you able to determine that?
>
> nas-1:/home/mike/crashdumps# gdb -k kernel.7 vmcore.7
> GNU gdb 4.18
> Copyright 1998 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB.  Type "show warranty" for details.
> This GDB was configured as "i386-unknown-freebsd"...
> IdlePTD at phsyical address 0x004e4000
> initial pcb at physical address 0x00419f20
> panicstr: page fault
> panic messages:
> ---
> Fatal trap 12: page fault while in kernel mode
> fault virtual address   = 0xfd0171a5
> fault code              = supervisor read, page not present
> instruction pointer     = 0x8:0xc020483f
> stack pointer           = 0x10:0xc76bdda0
> frame pointer           = 0x10:0xc76bde28
> code segment            = base 0x0, limit 0xfffff, type 0x1b
>                         = DPL 0, pres 1, def32 1, gran 1
> processor eflags        = interrupt enabled, resume, IOPL = 0
> current process         = 5364 (sendmail)
> interrupt mask          = none
> trap number             = 12
> panic: page fault
>
> syncing disks... 23 22 18 14 10 7 4 1
> done
> Uptime: 1d10h3m50s
>
>
> dumping to dev #ad/0x20001, offset 16384
> dump ata0: resetting devices .. done
> 92 91 90 89 88 87 86 85 84 83 82 81 80 79 78 77 76 75 74 73 72 71 70 69 68
> 67 66 65 64 63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47 46 45 44 43
> 42 41 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18
> 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1
> ---
> #0  dumpsys () at /usr/src/sys/kern/kern_shutdown.c:473
> 473 SYSCTL_PROC(_kern, KERN_DUMPDEV, dumpdev, CTLTYPE_OPAQUE|CTLFLAG_RW,
> (kgdb) bt
> #0  dumpsys () at /usr/src/sys/kern/kern_shutdown.c:473
> #1  0xc01c16b3 in boot (howto=256) at /usr/src/sys/kern/kern_shutdown.c:313
> #2  0xc01c1a88 in poweroff_wait (junk=0xc03c190c, howto=-1069804497) at
>   /usr/src/sys/kern/kern_shutdown.c:581
> #3  0xc034f802 in trap_fatal (frame=0xc76bdd60, eva=4244730277) at
>   /usr/src/sys/i386/i386/trap.c:956
> #4  0xc034f4d5 in trap_pfault (frame=0xc76bdd60, usermode=0, eva=4244730277)
>   at /usr/src/sys/i386/i386/trap.c:849
> #5  0xc034f093 in trap (frame={tf_fs = 16, tf_es = 16, tf_ds = 16, tf_edi =
>   15976, tf_esi = 135164312, tf_ebp = -949232088, tf_isp = -949232244,
>   tf_ebx = -1059625860, tf_edx = -50237019, tf_ecx = 0, tf_eax = 28,
>   tf_trapno = 12, tf_err = 0, tf_eip = -1071626177, tf_cs = 8, tf_eflags
>   = 66182, tf_esp = -961908672,
>   tf_ss = -1073190620}) at /usr/src/sys/i386/i386/trap.c:448
> #6  0xc020483f in ifconf (cmd=3221776676, data=0xc76bdeac "") at
>   /usr/src/sys/net/if.c:1300
> #7  0xc0204061 in ifioctl (so=0xc6ec0040, cmd=3221776676, data=0xc76bdeac
>   "", p=0xc6aa7040) at /usr/src/sys/net/if.c:958
> #8  0xc01d3746 in soo_ioctl (fp=0xc0f979c0, cmd=3221776676, data=0xc76bdeac
>   "", p=0xc6aa7040) at /usr/src/sys/kern/sys_socket.c:143
> #9  0xc01d07de in ioctl (p=0xc6aa7040, uap=0xc76bdf80) at
>   /usr/src/sys/sys/file.h:177
> #10 0xc034faad in syscall2 (frame={tf_fs = 47, tf_es = 47, tf_ds = 47,
>   tf_edi = -1077941716, tf_esi = -1077936586,
>   tf_ebp = -1077941808, tf_isp = -949231660, tf_ebx = -1077937184,
>   tf_edx = 135163904, tf_ecx = 4, tf_eax = 54,
>   tf_trapno = 12, tf_err = 2, tf_eip = 673130376, tf_cs = 31, tf_eflags
>   = 663, tf_esp = -1077942252, tf_ss = 47})
>   at /usr/src/sys/i386/i386/trap.c:1155
> #11 0xc03434e5 in Xint0x80_syscall ()
> #12 0x80682e0 in ?? ()
> #13 0x804c049 in ?? ()
>
>
> (kgdb) frame 6
> #6  0xc020483f in ifconf (cmd=3221776676, data=0xc76bdeac "") at
> /usr/src/sys/net/if.c:1300
> 1300            for ( ; space > sizeof (ifr) && ifa;
> (kgdb) print ifr
> $1 = {ifr_name = "lo0\000h0\000\000ÌÞkÇ\000\000\000", ifr_ifru = {ifru_addr
> = {sa_len = 16 '\020', sa_family = 2 '\002',
>       sa_data = "\000\000\n\201\001\a\000\000\000\000\000\000\000"},
> ifru_dstaddr = {sa_len = 16 '\020',
>       sa_family = 2 '\002', sa_data =
> "\000\000\n\201\001\a\000\000\000\000\000\000\000"}, ifru_broadaddr = {
>       sa_len = 16 '\020', sa_family = 2 '\002', sa_data =
> "\000\000\n\201\001\a\000\000\000\000\000\000\000"},
>     ifru_flags = {528, 0}, ifru_metric = 528, ifru_mtu = 528, ifru_phys =
> 528, ifru_media = 528,
>     ifru_data = 0x210 <Address 0x210 out of bounds>, ifru_cap = {528,
> 117539082}}}
>
> (Forgive the formatting - I'm too brain-fried to fix it right now... :) )
>
>
> ...it's always at line 1300 in if.c - note the first part of ifr->ifr_name -
> "lo0".  I need to look through this further at an earlier step in the
> progressing crash, but I'm highly suspicious of the "ifru_data = 0x210 <Address
> 0x210 out of bounds>" part...
>
>
> --mike
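
On the ifru_data = 0x210 value: ifr_ifru is a union, so kgdb is printing the same bytes through every member. A sockaddr with sa_len = 16 and sa_family = 2 starts with the bytes 0x10 0x02, and read as a little-endian integer those give 0x210 (528). The sketch below rebuilds the 16 bytes kgdb printed for ifru_addr and reinterprets them as the other members. It assumes an i386 little-endian layout, and the helper name is hypothetical.

```python
import socket
import struct


def rebuild_sockaddr():
    """Rebuild the 16 bytes kgdb printed for ifru_addr."""
    # sa_data as printed: "\000\000\n\201\001\a" followed by zero padding.
    sa_data = bytes([0, 0, 10, 129, 1, 7]) + bytes(8)
    # sa_len = 16, sa_family = 2 (AF_INET), then the 14 bytes of sa_data.
    return bytes([16, 2]) + sa_data


raw = rebuild_sockaddr()

# View the same bytes through the other union members.
flags = struct.unpack_from("<2h", raw)       # ifru_flags: two shorts
as_pointer = struct.unpack_from("<I", raw)[0]  # ifru_data: 32-bit pointer
cap = struct.unpack_from("<2i", raw)         # ifru_cap: two ints
address = socket.inet_ntoa(raw[4:8])         # IPv4 address inside sa_data

if __name__ == "__main__":
    print("ifru_flags :", flags)
    print("ifru_data  :", hex(as_pointer))
    print("ifru_cap   :", cap)
    print("address    :", address)
```

The reinterpreted values reproduce what kgdb showed: ifru_flags = {528, 0}, ifru_data = 0x210 and ifru_cap = {528, 117539082}, with the embedded address reading as 10.129.1.7. That suggests, without proving it, that 0x210 is just the interface address seen through a pointer-typed union member rather than a sign of corruption in ifr itself.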








