Date: Thu, 3 Jun 2010 09:29:20 +0300
From: Nikolay Denev <ndenev@gmail.com>
To: pyunyh@gmail.com
Cc: freebsd-stable@freebsd.org, John Baldwin <jhb@freebsd.org>
Subject: Re: if_sge related panics
Message-ID: <87BA8EDC-BE95-4C84-94CD-5CA12961708A@gmail.com>
In-Reply-To: <20100524171210.GA1418@michelle.cdnetworks.com>
References: <E7AF7DD3-FE50-42DD-8391-0F576708EAF7@gmail.com>
 <77DFF2E5-7A1E-4063-A852-2C7AD9BC3DD4@gmail.com>
 <201005240948.33555.jhb@freebsd.org>
 <20100524171210.GA1418@michelle.cdnetworks.com>
On May 24, 2010, at 8:12 PM, Pyun YongHyeon wrote:

> On Mon, May 24, 2010 at 09:48:33AM -0400, John Baldwin wrote:
>> On Monday 24 May 2010 6:35:01 am Nikolay Denev wrote:
>>> On May 24, 2010, at 8:57 AM, Nikolay Denev wrote:
>>>
>>>> Hi,
>>>>
>>>> Recently I started to experience a if_sge(4) related panic.
>>>> It happens almost every time I try to download a torrent file for example.
>>>> Copying of large files over NFS seem not to trigger it, but I haven't tested extensively.
>>>>
>>>> Here is the panic message :
>>>>
>>>> Fatal trap 12: page fault while in kernel mode
>>>> cpuid = 0; apic id = 00
>>>> fault virtual address   = 0x8
>>>> fault code              = supervisor write data, page not present
>>>> instruction pointer     = 0x20:0xffffffff80230413
>>>> stack pointer           = 0x28:0xffffff80001e9280
>>>> frame pointer           = 0x28:0xffffff80001e9510
>>>> code segment            = base 0x0, limit 0xfffff, type 0x1b
>>>>                         = DPL 0, pres 1, long 1, def32 0, gran 1
>>>> processor eflags        = interrupt enabled, resume, IOPL = 0
>>>> current process         = 12 (irq19: sge0)
>>>> trap number             = 12
>>>> panic: page fault
>>>> cpuid = 0
>>>> Uptime: 1d20h56m20s
>>>> Cannot dump. Device not defined or unavailable
>>>> Automatic reboot in 15 seconds - press a key on the console to abort
>>>> Sleeping thread (tid 100039, pid 12) owns a non-sleepable lock
>>>>
>>>> My swap is on a zvol, so I don't have dump. I'll try to attach a disk on the eSATA port and dump there if needed.
>>>
>>> Here is some info from the crashdump :
>>>
>>> (kgdb) #0  doadump () at pcpu.h:223
>>> #1  0xffffffff802fb149 in boot (howto=260)
>>>     at /usr/src/sys/kern/kern_shutdown.c:416
>>> #2  0xffffffff802fb57c in panic (fmt=0xffffffff8055d564 "%s")
>>>     at /usr/src/sys/kern/kern_shutdown.c:590
>>> #3  0xffffffff805055b8 in trap_fatal (frame=0xffffff000288a3e0, eva=Variable "eva" is not available.
>>> )
>>>     at /usr/src/sys/amd64/amd64/trap.c:777
>>> #4  0xffffffff805059dc in trap_pfault (frame=0xffffff80001e91d0, usermode=0)
>>>     at /usr/src/sys/amd64/amd64/trap.c:693
>>> #5  0xffffffff805061c5 in trap (frame=0xffffff80001e91d0)
>>>     at /usr/src/sys/amd64/amd64/trap.c:451
>>> #6  0xffffffff804eb977 in calltrap ()
>>>     at /usr/src/sys/amd64/amd64/exception.S:223
>>> #7  0xffffffff80230413 in sge_start_locked (ifp=0xffffff000270d800)
>>>     at /usr/src/sys/dev/sge/if_sge.c:1591
>>
>> Try this. sge_encap() can sometimes return an error with m_head set to NULL:
>>
>
> Thanks John. Committed in r208512.
>
>> Index: if_sge.c
>> ===================================================================
>> --- if_sge.c (revision 208375)
>> +++ if_sge.c (working copy)
>> @@ -1588,7 +1588,8 @@
>>          if (m_head == NULL)
>>              break;
>>          if (sge_encap(sc, &m_head)) {
>> -            IFQ_DRV_PREPEND(&ifp->if_snd, m_head);
>> +            if (m_head != NULL)
>> +                IFQ_DRV_PREPEND(&ifp->if_snd, m_head);
>>              ifp->if_drv_flags |= IFF_DRV_OACTIVE;
>>              break;
>>          }
>>
>> --
>> John Baldwin

After the patch I experienced several network outages (ping reporting "no buffer space available") that were resolved by ifconfig down/up of the sge(4) interface.

I can see that most of the other drivers that handle XXX_encap() returning m_head pointing NULL, break when this condition is hit: i.e. :

Index: if_sge.c
===================================================================
--- if_sge.c (revision 208375)
+++ if_sge.c (working copy)
@@ -1588,7 +1588,8 @@
         if (m_head == NULL)
             break;
         if (sge_encap(sc, &m_head)) {
-            IFQ_DRV_PREPEND(&ifp->if_snd, m_head);
+            if (m_head == NULL)
+                break;
             IFQ_DRV_PREPEND(&ifp->if_snd, m_head);
             ifp->if_drv_flags |= IFF_DRV_OACTIVE;
             break;
         }

But here in sge(4) we always set IFF_DRV_OACTIVE. Do you think this can be the source of the problem ?

Regards,
Niki
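
To make the question above concrete, here is a small user-space model of the two error paths. It is only a sketch of the hypothesis, not code from if_sge.c: every toy_* name and the POLICY_* constants are invented for illustration. It also assumes that the start routine returns early while the "output active" flag is set, and that the flag is cleared only by a TX-completion handler that runs when descriptors are in flight. Whether sge(4) behaves exactly this way is not confirmed by this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for kernel structures; all names here are hypothetical. */
struct toy_mbuf { int len; };

struct toy_ifnet {
    bool oactive;       /* models IFF_DRV_OACTIVE */
    int  queued;        /* packets waiting in the software send queue */
    int  tx_in_flight;  /* descriptors handed to the imaginary NIC */
};

enum toy_policy {
    POLICY_ALWAYS_OACTIVE,  /* shape of the committed change quoted above */
    POLICY_BREAK_ON_NULL    /* shape of the variant proposed in this message */
};

/* Returns nonzero on failure; when consume is true the packet is "freed"
 * and the caller's pointer is set to NULL. */
static int toy_encap(struct toy_mbuf **m, bool fail, bool consume)
{
    if (!fail)
        return 0;
    if (consume)
        *m = NULL;
    return 1;
}

static void toy_start(struct toy_ifnet *ifp, enum toy_policy policy,
                      bool fail, bool consume)
{
    static struct toy_mbuf pkt;

    if (ifp->oactive)           /* assumed: no transmit attempt while flagged */
        return;
    while (ifp->queued > 0) {
        struct toy_mbuf *m_head = &pkt;

        ifp->queued--;          /* dequeue one packet */
        if (toy_encap(&m_head, fail, consume)) {
            if (m_head == NULL && policy == POLICY_BREAK_ON_NULL)
                break;          /* packet is gone; leave the flag alone */
            if (m_head != NULL)
                ifp->queued++;  /* put the packet back (prepend) */
            ifp->oactive = true;
            break;
        }
        ifp->tx_in_flight++;
    }
}

/* Assumed to be the only place the flag is cleared in this model. */
static void toy_txeof(struct toy_ifnet *ifp)
{
    if (ifp->tx_in_flight > 0) {
        ifp->tx_in_flight = 0;
        ifp->oactive = false;
    }
}

/* True if the interface ends up flagged busy with nothing in flight,
 * so no completion can ever clear the flag. */
bool toy_wedged_after_drop(enum toy_policy policy)
{
    struct toy_ifnet ifp = { .oactive = false, .queued = 3, .tx_in_flight = 0 };

    toy_start(&ifp, policy, true, true);    /* encap fails and frees the packet */
    toy_txeof(&ifp);                        /* nothing pending, nothing cleared */
    toy_start(&ifp, policy, false, false);  /* later attempt, encap now healthy */
    assert(ifp.queued >= 0);
    return ifp.oactive && ifp.tx_in_flight == 0;
}
```

In this model, toy_wedged_after_drop(POLICY_ALWAYS_OACTIVE) reports a stuck interface, while toy_wedged_after_drop(POLICY_BREAK_ON_NULL) does not. That would be consistent with the "no buffer space available" symptom, but only under the assumptions stated above.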