FreeBSD Mail Archives

Date:      Wed, 06 Jul 2011 13:50:24 +0200
From:      Martin Birgmeier <la5lbtyi@aon.at>
To:        freebsd-net@freebsd.org
Cc:        art@freebsd.org
Subject:   Re: amd + NFS reconnect = ICMP storm + unkillable process.
Message-ID:  <4E144C00.9020804@aon.at>


Hi Artem,

I have exactly the same problem as you are describing below, also with quite
a number of amd mounts.

In addition to the scenario you describe, another way this happens here
is when downloading a file via firefox to a directory currently open in
dolphin (KDE file manager). This will almost surely trigger the symptoms
you describe.

I had 7.4 running on this box before; the problem started with 8.2.

Alas, I don't have a solution.

We should probably file a PR, but I don't even know whom to assign it to.
Amd does not seem to be maintained much; it is probably using some old-style
mounts (it never mounts anything via IPv6, for example).

Regards,

Martin

 > Hi,
 >
 > I wonder if someone else has run into this issue before and, maybe, has a
 > solution.
 >
 > I've been running into a problem where access to filesystems mounted
 > with amd wedges processes in an unkillable state and produces an ICMP
 > storm on the loopback interface. I've managed to narrow it down to NFS
 > reconnect, but that's when I ran out of ideas.
 >
 > Usually the problem happens when I abort a parallel build job in an
 > i386 jail on FreeBSD-8/amd64 (r223055). When the build job is killed,
 > every now and then I end up with one process consuming 100% of CPU time on
 > one of the cores. At the same time I get a lot of messages on the
 > console saying "Limiting icmp unreach response from 49837 to 200
 > packets/sec" and the loopback traffic goes way up.
 >
 > As far as I can tell here's what's happening:
 >
 > * My setup uses a lot of filesystems mounted by amd.
 > * amd itself pretends to be an NFS server running on the localhost and
 > serving requests for amd mounts.
 > * Now and then amd seems to change the ports it uses. Beats me why.
 > * The problem seems to happen when some process is about to access an amd
 > mountpoint just as the amd instance disappears from the port it used to
 > listen on. In my case it does correlate with interrupted builds, but I
 > have no clue why.
 > * NFS client detects the disconnect and tries to reconnect using the same
 > destination port.
 > * That generates an ICMP response as the port is unreachable, and the
 > reconnect call returns almost immediately.
 > * We try to reconnect again, and again, and again....
 > * The process in this state is unkillable.
 >
 > Here's what the stack of the 'stuck' process looks like in those rare
 > moments when it gets to sleep:
 > 18779 100511 collect2         -                mi_switch+0x176
 > turnstile_wait+0x1cb _mtx_lock_sleep+0xe1 sleepq_catch_signals+0x386
 > sleepq_timedwait_sig+0x19 _sleep+0x1b1 clnt_dg_call+0x7e6
 > clnt_reconnect_call+0x12e nfs_request+0x212 nfs_getattr+0x2e4
 > VOP_GETATTR_APV+0x44 nfs_bioread+0x42a VOP_READLINK_APV+0x4a
 > namei+0x4f9 kern_statat_vnhook+0x92 kern_statat+0x15
 > freebsd32_stat+0x2e syscallenter+0x23d
 >
 > * Usually some timeout expires in a few minutes, the process dies, the ICMP
 > storm stops and the system is usable again.
 > * On occasion the process is stuck forever and I have to reboot the box.
 >
 > I'm not sure who's to blame here.
 >
 > Is the automounter at fault for disappearing from the port it was
 > supposed to listen on?
 > Is NFS guilty of blindly trying to reconnect on the same port and not
 > giving up sooner?
 > Should I flog the operator (a.k.a. myself) for misconfiguring something
 > (what?) in amd or NFS?
 >
 > More importantly -- how do I fix it?
 > Any suggestions on fixing/debugging this issue?
 >
 > --Artem
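
The retry behaviour described in the quoted message can be illustrated in
user space. The following is only a rough Python sketch, not the kernel RPC
code that shows up in the stack trace. It assumes that a connected UDP socket
reports an ICMP port-unreachable reply as ECONNREFUSED, as the BSDs and Linux
do. The function names probe_udp_port and reconnect_with_backoff, and all of
their parameters, are made up for this illustration. The first function shows
why each failed attempt returns almost at once; the second shows one possible
way of "giving up sooner", with a capped number of tries and a growing delay
between them instead of an immediate retry.

```python
import socket
import time


def probe_udp_port(host, port, timeout=1.0):
    """Send one datagram to host:port and report what comes back.

    Returns "refused" if the kernel reports ECONNREFUSED (the usual result
    of an ICMP port-unreachable reply on a connected UDP socket), "timeout"
    if nothing arrives in time, and "answered" if a datagram is received.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout)
        s.connect((host, port))
        try:
            s.send(b"\0")
            s.recv(1)
        except ConnectionRefusedError:
            return "refused"
        except socket.timeout:
            return "timeout"
        return "answered"


def reconnect_with_backoff(attempt, max_tries=5, base_delay=0.1,
                           max_delay=2.0, sleep=time.sleep):
    """Call attempt() until it returns True, at most max_tries times.

    The pause between tries doubles each time, capped at max_delay.
    Returns the number of the try that succeeded, or None after giving up.
    """
    delay = base_delay
    for n in range(1, max_tries + 1):
        if attempt():
            return n
        if n < max_tries:
            # Back off instead of retrying immediately against a dead port.
            sleep(delay)
            delay = min(delay * 2, max_delay)
    return None
```

A caller could combine the two, for example by passing
lambda: probe_udp_port("127.0.0.1", port) != "refused" as the attempt, where
port is whatever port the server was last seen on. Without the delay and the
cap, such a loop would send a datagram and collect an ICMP error as fast as
the loopback interface allows, which matches the storm described above.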

Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?4E144C00.9020804>
