Date:      Thu, 3 Jun 2010 09:29:20 +0300
From:      Nikolay Denev <ndenev@gmail.com>
To:        pyunyh@gmail.com
Cc:        freebsd-stable@freebsd.org, John Baldwin <jhb@freebsd.org>
Subject:   Re: if_sge related panics
Message-ID:  <87BA8EDC-BE95-4C84-94CD-5CA12961708A@gmail.com>
In-Reply-To: <20100524171210.GA1418@michelle.cdnetworks.com>
References:  <E7AF7DD3-FE50-42DD-8391-0F576708EAF7@gmail.com> <77DFF2E5-7A1E-4063-A852-2C7AD9BC3DD4@gmail.com> <201005240948.33555.jhb@freebsd.org> <20100524171210.GA1418@michelle.cdnetworks.com>

On May 24, 2010, at 8:12 PM, Pyun YongHyeon wrote:

> On Mon, May 24, 2010 at 09:48:33AM -0400, John Baldwin wrote:
>> On Monday 24 May 2010 6:35:01 am Nikolay Denev wrote:
>>> On May 24, 2010, at 8:57 AM, Nikolay Denev wrote:
>>>
>>>> Hi,
>>>>
>>>> Recently I started to experience a if_sge(4) related panic.
>>>> It happens almost every time I try to download a torrent file for example.
>>>> Copying of large files over NFS seem not to trigger it, but I haven't tested extensively.
>>>>
>>>> Here is the panic message :
>>>>
>>>> Fatal trap 12: page fault while in kernel mode
>>>> cpuid = 0; apic id = 00
>>>> fault virtual address   = 0x8
>>>> fault code              = supervisor write data, page not present
>>>> instruction pointer     = 0x20:0xffffffff80230413
>>>> stack pointer           = 0x28:0xffffff80001e9280
>>>> frame pointer           = 0x28:0xffffff80001e9510
>>>> code segment            = base 0x0, limit 0xfffff, type 0x1b
>>>>                         = DPL 0, pres 1, long 1, def32 0, gran 1
>>>> processor eflags        = interrupt enabled, resume, IOPL = 0
>>>> current process         = 12 (irq19: sge0)
>>>> trap number             = 12
>>>> panic: page fault
>>>> cpuid = 0
>>>> Uptime: 1d20h56m20s
>>>> Cannot dump. Device not defined or unavailable
>>>> Automatic reboot in 15 seconds - press a key on the console to abort
>>>> Sleeping thread (tid 100039, pid 12) owns a non-sleepable lock
>>>>
>>>> My swap is on a zvol, so I don't have dump. I'll try to attach a disk on the eSATA port and dump there if needed.
>>>
>>> Here is some info from the crashdump :
>>>
>>> (kgdb) #0  doadump () at pcpu.h:223
>>> #1  0xffffffff802fb149 in boot (howto=260)
>>>    at /usr/src/sys/kern/kern_shutdown.c:416
>>> #2  0xffffffff802fb57c in panic (fmt=0xffffffff8055d564 "%s")
>>>    at /usr/src/sys/kern/kern_shutdown.c:590
>>> #3  0xffffffff805055b8 in trap_fatal (frame=0xffffff000288a3e0, eva=Variable "eva" is not available.
>>> )
>>>    at /usr/src/sys/amd64/amd64/trap.c:777
>>> #4  0xffffffff805059dc in trap_pfault (frame=0xffffff80001e91d0, usermode=0)
>>>    at /usr/src/sys/amd64/amd64/trap.c:693
>>> #5  0xffffffff805061c5 in trap (frame=0xffffff80001e91d0)
>>>    at /usr/src/sys/amd64/amd64/trap.c:451
>>> #6  0xffffffff804eb977 in calltrap ()
>>>    at /usr/src/sys/amd64/amd64/exception.S:223
>>> #7  0xffffffff80230413 in sge_start_locked (ifp=0xffffff000270d800)
>>>    at /usr/src/sys/dev/sge/if_sge.c:1591
>>
>> Try this.  sge_encap() can sometimes return an error with m_head set to NULL:
>>
>
> Thanks John. Committed in r208512.
>
>> Index: if_sge.c
>> ===================================================================
>> --- if_sge.c	(revision 208375)
>> +++ if_sge.c	(working copy)
>> @@ -1588,7 +1588,8 @@
>> 		if (m_head == NULL)
>> 			break;
>> 		if (sge_encap(sc, &m_head)) {
>> -			IFQ_DRV_PREPEND(&ifp->if_snd, m_head);
>> +			if (m_head != NULL)
>> +				IFQ_DRV_PREPEND(&ifp->if_snd, m_head);
>> 			ifp->if_drv_flags |= IFF_DRV_OACTIVE;
>> 			break;
>> 		}
>>
>> -- 
>> John Baldwin
>> John Baldwin
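
For what it's worth, the fault address in the original panic fits John's
explanation. Below is a rough userland sketch, not kernel code. toy_mbuf and
nextpkt_store_addr() are made-up names; the sketch only assumes that the real
struct mbuf starts with two pointers (m_next, then m_nextpkt) and that
IFQ_DRV_PREPEND() begins by storing into m->m_nextpkt. Under those
assumptions, a store through a NULL m_head on amd64 lands at address 0x8,
which matches the reported "fault virtual address = 0x8" (supervisor write).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the first two fields of struct mbuf.
 * Assumption: the real header begins with m_next followed by m_nextpkt.
 */
struct toy_mbuf {
	struct toy_mbuf *m_next;
	struct toy_mbuf *m_nextpkt;
};

/* The second pointer sits exactly one pointer width into the structure. */
static_assert(offsetof(struct toy_mbuf, m_nextpkt) == sizeof(struct toy_mbuf *),
    "m_nextpkt is expected right after m_next");

/*
 * Address that a store to m->m_nextpkt would touch for a given m.
 * Computed arithmetically, so nothing is dereferenced here.
 */
static uintptr_t
nextpkt_store_addr(const struct toy_mbuf *m)
{
	return ((uintptr_t)m + offsetof(struct toy_mbuf, m_nextpkt));
}
```

On amd64, where pointers are 8 bytes wide, nextpkt_store_addr(NULL) evaluates
to 8.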

After applying the patch I experienced several network outages (ping reporting
"no buffer space available") that were resolved by an ifconfig down/up of the
sge(4) interface.

I can see that most of the other drivers that handle XXX_encap() returning
with m_head set to NULL simply break out of the loop when this condition is
hit, i.e.:

Index: if_sge.c
===================================================================
--- if_sge.c	(revision 208375)
+++ if_sge.c	(working copy)
@@ -1588,7 +1588,9 @@
 		if (m_head == NULL)
 			break;
 		if (sge_encap(sc, &m_head)) {
+			if (m_head == NULL)
+				break;
 			IFQ_DRV_PREPEND(&ifp->if_snd, m_head);
 			ifp->if_drv_flags |= IFF_DRV_OACTIVE;
 			break;
 		}

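But with the committed change, sge(4) always sets IFF_DRV_OACTIVE when
sge_encap() fails, even if m_head is NULL and nothing was put back on the queue.
Do you think this could be the source of the problem?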
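
To make the question concrete, here is a toy userland model of the transmit
loop. It is only a sketch: every name in it (toy_if, toy_encap, toy_start,
toy_txeof, ...) is hypothetical, and it assumes the usual driver pattern where
the start routine returns early while OACTIVE is set and only a TX completion
clears the flag. Whether sge(4) behaves exactly like this has not been
verified. In the model, setting the flag after encap dropped a packet with
nothing in flight leaves the queue stuck, while breaking out without setting
it lets the next call make progress.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_QMAX 8	/* send queue capacity */
#define TOY_RING 4	/* TX descriptors */

struct toy_pkt { int id; };	/* id < 0: encap will drop this packet */

struct toy_if {
	struct toy_pkt *q[TOY_QMAX];	/* send queue, head at index 0 */
	int qlen;
	int inflight;			/* descriptors handed to "hardware" */
	bool oactive;			/* stands in for IFF_DRV_OACTIVE */
};

static void
toy_enqueue(struct toy_if *ifp, struct toy_pkt *p)
{
	assert(ifp->qlen < TOY_QMAX);
	ifp->q[ifp->qlen++] = p;
}

static void
toy_prepend(struct toy_if *ifp, struct toy_pkt *p)
{
	assert(ifp->qlen < TOY_QMAX);
	for (int i = ifp->qlen; i > 0; i--)
		ifp->q[i] = ifp->q[i - 1];
	ifp->q[0] = p;
	ifp->qlen++;
}

static struct toy_pkt *
toy_dequeue(struct toy_if *ifp)
{
	if (ifp->qlen == 0)
		return (NULL);
	struct toy_pkt *p = ifp->q[0];
	for (int i = 1; i < ifp->qlen; i++)
		ifp->q[i - 1] = ifp->q[i];
	ifp->qlen--;
	return (p);
}

/* Fails either by dropping the packet (*pp = NULL) or because the ring is full. */
static int
toy_encap(struct toy_if *ifp, struct toy_pkt **pp)
{
	if ((*pp)->id < 0) {
		*pp = NULL;		/* a real driver would free the mbuf here */
		return (1);
	}
	if (ifp->inflight == TOY_RING)
		return (1);		/* ring full, packet left untouched */
	ifp->inflight++;
	return (0);
}

/* break_on_null selects the "break without setting OACTIVE" variant. */
static void
toy_start(struct toy_if *ifp, bool break_on_null)
{
	struct toy_pkt *p;

	if (ifp->oactive)
		return;			/* nothing is sent while OACTIVE is set */
	while ((p = toy_dequeue(ifp)) != NULL) {
		if (toy_encap(ifp, &p)) {
			if (p == NULL && break_on_null)
				break;
			if (p != NULL)
				toy_prepend(ifp, p);
			ifp->oactive = true;
			break;
		}
	}
}

/* TX completion: only fires, and only clears OACTIVE, if something was in flight. */
static void
toy_txeof(struct toy_if *ifp)
{
	if (ifp->inflight > 0) {
		ifp->inflight = 0;
		ifp->oactive = false;
	}
}
```

Queueing one packet that encap drops followed by a good one, then calling
toy_start(ifp, false), leaves oactive set with inflight at 0, so toy_txeof()
never clears it and the good packet stays queued. With toy_start(ifp, true) the
flag stays clear and a second call sends the good packet.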

Regards,
Niki


