Date:      Mon, 20 Jun 2005 02:11:57 +0200
From:      Eirik Øverby <ltning@anduin.net>
To:        Robert Watson <rwatson@FreeBSD.org>
Cc:        stable@freebsd.org, mlaier@FreeBSD.org
Subject:   Re: NFS-related hang in 5.4?
Message-ID:  <CF3CB334-ACF4-4DA5-9CE5-D2C7466DCD10@anduin.net>
In-Reply-To: <20050619185338.J6413@fledge.watson.org>
References:  <8149D7F8-3FA2-48F5-BF03-9AF813448BF0@anduin.net> <20050619185338.J6413@fledge.watson.org>


On 19. jun. 2005, at 20.06, Robert Watson wrote:

>
> On Sun, 19 Jun 2005, Eirik Øverby wrote:
>
>
>> when doing large file transfers (backing up jails using tar+gzip
>> to a neighboring server), NFS has a tendency to lock up on me.
>> This usually happens after quite a while - like a few hours or so.
>> Also, before the hang, performance is generally bad.
>>
>
> Hmm.  Looks like a bug in dummynet.  ipfw should not be directly
> re-injecting UDP traffic back into the input path from an outbound
> path, or it risks re-entering, generating lock order problems, etc.
> It should be getting dropped into the netisr queue to be processed
> from the netisr context.
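
The re-entrancy problem described in the quoted paragraph can be sketched in a few lines of Python. This is an illustration only, not FreeBSD code: `pcb_lock`, `netisr_queue`, `reinject_direct` and `reinject_deferred` are hypothetical names, and a plain non-recursive `threading.Lock` stands in for whichever kernel mutex the output path holds. The function names `udp_input` and `udp_output` are borrowed from the trace purely as labels.

```python
import threading
from collections import deque

# Hypothetical stand-ins: a non-recursive lock held on the output path,
# and a queue playing the role of the netisr input queue.
pcb_lock = threading.Lock()
netisr_queue = deque()
delivered = []


def udp_input(pkt):
    """Input path: needs the same lock the output path may already hold."""
    if not pcb_lock.acquire(blocking=False):
        # A real non-recursive mutex would spin or sleep here forever.
        raise RuntimeError("would deadlock: lock already held by this path")
    try:
        delivered.append(pkt)
    finally:
        pcb_lock.release()


def udp_output(pkt, reinject):
    """Output path: holds the lock while the packet passes the firewall hook."""
    with pcb_lock:
        reinject(pkt)


def reinject_direct(pkt):
    """Problematic pattern: call the input path from inside the output path."""
    udp_input(pkt)


def reinject_deferred(pkt):
    """Safer pattern: only enqueue; input runs later from another context."""
    netisr_queue.append(pkt)


def netisr_run():
    """Drain the queue after the output path has released its lock."""
    while netisr_queue:
        udp_input(netisr_queue.popleft())
```

With `reinject_direct`, `udp_output` fails as soon as the packet loops back, because the input path wants a lock that the same call chain already holds. With `reinject_deferred`, the packet is delivered only once `netisr_run()` is called after `udp_output` has returned.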

Would this problem exist across all 5.4 installations, both i386 and
amd64? Would it depend on heavy load, or could it theoretically
happen at any time when there's traffic? All three of my fbsd5
servers (dual opteron, dual p3-1ghz, dual p3-700mhz) are experiencing
random hangs a few weeks apart; my impression is that they are all
stable when running in single-CPU mode. All use dummynet in a
comparable manner. Ideas?

> Is it possible to configure dummynet out of your configuration, and
> see if the problem goes away?

I'm running a test right now, will let you know in the morning.

>
> Robert N M Watson
>
>
>>
>> KDB trace:
>>
>> db> trace
>> Tracing pid 56 tid 100064 td 0xc1a18600
>> kdb_enter(c096bad3,4,480758,c08dcbf9,f5) at kdb_enter+0x30
>> siointr1(c1a8e000,c1a18600,c1a148d4,c1a12700,c1a12700) at siointr1+0xe7
>> siointr(c1a8e000,0,0,4,c1a18600) at siointr+0x78
>> intr_execute_handlers(c19bd090,d54807bc,d5480818,c08d05a3,34) at intr_execute_handlers+0x88
>> lapic_handle_intr(34) at lapic_handle_intr+0x3a
>> Xapic_isr1() at Xapic_isr1+0x33
>> --- interrupt, eip = 0xc06b8490, esp = 0xd5480800, ebp = 0xd5480818 ---
>> _mtx_lock_sleep(c0a1cd2c,c1a18600,0,0,0) at _mtx_lock_sleep+0xb0
>> udp_input(c2d40000,14,c1a99000,1,0) at udp_input+0x257
>> ip_input(c2d40000,0,0,0,0) at ip_input+0x590
>> transmit_event(c1c64100,20940000,0,c1d58a80,7f4220) at transmit_event+0x107
>> ready_event_wfq(c1c64100,20940000,0,c1d58a80,c06d860a) at ready_event_wfq+0x511
>> dummynet_io(c2bd2e00,64,1,d54809c8,c2bd2e00) at dummynet_io+0x519
>> ipfw_check_out(0,d5480a24,c1a99000,2,c1d1821c) at ipfw_check_out+0xf1
>> pfil_run_hooks(c0a1c160,d5480a9c,c1a99000,2,c1d1821c) at pfil_run_hooks+0x138
>> ip_output(c2bd2e00,0,0,0,0) at ip_output+0x593
>> udp_output(c1d1821c,c2bd2e00,0,0,c1a18600) at udp_output+0x597
>> udp_send(c2242654,0,c1e12100,0,0) at udp_send+0x30
>> sosend(c2242654,0,0,c1e12100,0) at sosend+0x6f1
>> nfs_send(c2242654,c1d57860,c1e12100,c2313900,1c) at nfs_send+0xc9
>> nfs_request(c22cf108,c1e12a00,7,0,c20bb300) at nfs_request+0x342
>> nfs_writerpc(c22cf108,d5480ca4,c20bb300,d5480c94,d5480c98) at nfs_writerpc+0x2a0
>> nfs_doio(cbf75e08,c20bb300,0,c094f9b4,0) at nfs_doio+0x508
>> nfssvc_iod(c0a21828,d5480d38,0,0,0) at nfssvc_iod+0x1db
>> fork_exit(c07c5150,c0a21828,d5480d38) at fork_exit+0x80
>> fork_trampoline() at fork_trampoline+0x8
>> --- trap 0x1, eip = 0, esp = 0xd5480d6c, ebp = 0 ---
>>
>> I cannot seem to kill process 56 (nfsiod), so I have to reset the
>> box.
>>
>> Anyone got a clue? What can I do to ease debugging here? Next time
>> it happens I can probably make a dump, at least I will have a
>> debug kernel running then.
>>
>> /Eirik
>> _______________________________________________
>> freebsd-stable@freebsd.org mailing list
>> http://lists.freebsd.org/mailman/listinfo/freebsd-stable
>> To unsubscribe, send any mail to
>> "freebsd-stable-unsubscribe@freebsd.org"
>
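
As a rough aid for spotting the same pattern in future traces, the check Robert describes (an input-path function running while an output-path function is still on the same stack) could be automated along these lines. This is a sketch under assumptions: `frames` and `reenters_input` are hypothetical helper names, the parser only understands frame lines shaped like the ones in the quoted DDB trace, and the three-line `SAMPLE` is an excerpt of that trace with the intermediate frames left out.

```python
import re

# Matches a DDB frame line such as
#   ">> ip_input(c2d40000,0,0,0,0) at ip_input+0x590"
# and captures the function name. Leading mail-quote markers are skipped.
FRAME_RE = re.compile(r"^(?:>\s*)*(\w+)\(")


def frames(trace_text):
    """Return the function names in the trace, innermost frame first."""
    names = []
    for line in trace_text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            names.append(match.group(1))
    return names


def reenters_input(trace_text, inner="ip_input", outer="ip_output"):
    """True if `inner` was called while `outer` was still on the stack."""
    names = frames(trace_text)
    if inner not in names or outer not in names:
        return False
    # DDB prints the innermost frame first, so a smaller index is deeper.
    return names.index(inner) < names.index(outer)


SAMPLE = """\
>> udp_input(c2d40000,14,c1a99000,1,0) at udp_input+0x257
>> ip_input(c2d40000,0,0,0,0) at ip_input+0x590
>> ip_output(c2bd2e00,0,0,0,0) at ip_output+0x593
"""

if __name__ == "__main__":
    print(frames(SAMPLE))
    print(reenters_input(SAMPLE))
```

Wrapped continuation lines such as `>> +0xe7` and the `--- interrupt ---` markers are ignored because they do not look like a function call.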



