Date:      Thu, 1 Jul 2010 13:36:12 -0700 (PDT)
From:      alan bryan <alan.bryan@yahoo.com>
To:        Garrett Cooper <yanefbsd@gmail.com>
Cc:        freebsd-stable@freebsd.org
Subject:   Re: NFS 75 second stall
Message-ID:  <119072.59868.qm@web50504.mail.re2.yahoo.com>
In-Reply-To: <AANLkTikxnw7sQ_cWCekS-qI3mP1Ui3dPjK1KAVqRg239@mail.gmail.com>

--- On Thu, 7/1/10, Garrett Cooper <yanefbsd@gmail.com> wrote:

> From: Garrett Cooper <yanefbsd@gmail.com>
> Subject: Re: NFS 75 second stall
> To: "alan bryan" <alan.bryan@yahoo.com>
> Cc: freebsd-stable@freebsd.org
> Date: Thursday, July 1, 2010, 1:28 PM
> On Thu, Jul 1, 2010 at 1:18 PM, alan bryan <alan.bryan@yahoo.com> wrote:
> >
> > --- On Thu, 7/1/10, Garrett Cooper <yanefbsd@gmail.com> wrote:
> >
> >> From: Garrett Cooper <yanefbsd@gmail.com>
> >> Subject: Re: NFS 75 second stall
> >> To: "alan bryan" <alan.bryan@yahoo.com>
> >> Cc: freebsd-stable@freebsd.org
> >> Date: Thursday, July 1, 2010, 12:23 PM
> >> On Thu, Jul 1, 2010 at 11:51 AM, alan bryan <alan.bryan@yahoo.com> wrote:
> >> >
> >> > --- On Thu, 7/1/10, Garrett Cooper <yanefbsd@gmail.com> wrote:
> >> >
> >> >> From: Garrett Cooper <yanefbsd@gmail.com>
> >> >> Subject: Re: NFS 75 second stall
> >> >> To: "alan bryan" <alan.bryan@yahoo.com>
> >> >> Cc: freebsd-stable@freebsd.org
> >> >> Date: Thursday, July 1, 2010, 11:13 AM
> >> >> On Thu, Jul 1, 2010 at 11:01 AM, alan bryan <alan.bryan@yahoo.com> wrote:
> >> >> > Setup:
> >> >> >
> >> >> > server - FreeBSD 8-STABLE from today.  2 UFS dirs exported via NFS.
> >> >> > client - FreeBSD 8.0-RELEASE.  Running a test PHP script that
> >> >> > copies various files to/from 2 separate NFS mounts.
> >> >> >
> >> >> > Situation:
> >> >> >
> >> >> > The script is started (forked to do 20 simultaneous runs) and 20
> >> >> > 1GB files are copied to the NFS dir, which works fine.  When it
> >> >> > then switches to reading those files back and simultaneously
> >> >> > writing to the other NFS mount, I see a hang of 75 seconds.  If I
> >> >> > do an "ls -l" on the NFS mount, it hangs too.  After 75 seconds the
> >> >> > client has reported:
> >> >> >
> >> >> > nfs server 192.168.10.133:/usr/local/export1: not responding
> >> >> > nfs server 192.168.10.133:/usr/local/export1: is alive again
> >> >> > nfs server 192.168.10.133:/usr/local/export1: not responding
> >> >> > nfs server 192.168.10.133:/usr/local/export1: is alive again
> >> >> >
> >> >> > and then things start working again.  The server was originally
> >> >> > FreeBSD 8.0-RELEASE as well but was upgraded to the latest
> >> >> > 8-STABLE to see if this issue could be avoided.
> >> >> >
> >> >> > # nfsstat -s -W -w 1
> >> >> >  GtAttr Lookup Rdlink   Read  Write Rename Access  Rddir
> >> >> >       0      0      0    222    257      0      0      0
> >> >> >       0      0      0    178    135      0      0      0
> >> >> >       0      0      0     85    127      0      0      0
> >> >> >       0      0      0      0      0      0      0      0
> >> >> >       0      0      0      0      0      0      0      0
> >> >> >       0      0      0      0      0      0      0      0
> >> >> >       0      0      0      0      0      0      0      0
> >> >> >       0      0      0      0      0      0      0      0
> >> >> >
> >> >> > ... for 75 rows of all zeros
> >> >> >
> >> >> >       0      0      0    272    266      0      0      0
> >> >> >       0      0      0    167    165      0      0      0
> >> >> >
> >> >> > I also tried runs with 15 and 25 simultaneous processes.  15
> >> >> > processes gave only about a 5 second stall, but 25 again gave the
> >> >> > same 75 second stall.
> >> >> >
> >> >> > Further, I tested with 2 mounts to the same server but from ZFS
> >> >> > filesystems, with exactly the same stall/timeout periods.  So it
> >> >> > doesn't appear to matter what the underlying filesystem is; it's
> >> >> > something in the NFS or networking code.
> >> >> >
> >> >> > Any ideas on what's going on here?  What's causing the complete
> >> >> > stall period of zero NFS activity?  Any flaws in my testing
> >> >> > methods?
> >> >> >
> >> >> > Thanks for any and all help/ideas.
> >> >>
> >> >> What network driver are you using? Have you tried tcpdumping the
> >> >> packets?
> >> >> -Garrett
> >> >>
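A minimal sketch for turning the nfsstat output above into a stall length instead of counting rows by hand. It assumes that the output of "nfsstat -s -W -w 1" is saved to a file while the test runs, that each data row is one 1-second interval made up only of integer counters, and that header rows should be skipped. The name longest_idle_run is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: find the longest stall in a saved "nfsstat -s -W -w 1" log.
# Assumptions: each data row holds only integer counters (GtAttr Lookup
# Rdlink Read Write Rename Access Rddir), one row per 1-second interval
# because of -w 1; header rows are skipped without ending a stall.

# longest_idle_run (hypothetical helper): reads the log on stdin and
# prints the longest run of consecutive all-zero rows, which is the
# stall length in seconds.
longest_idle_run() {
    awk '
        NF == 0 { next }                      # ignore blank lines
        {
            total = 0
            for (i = 1; i <= NF; i++) {
                if ($i !~ /^[0-9]+$/) next    # header or other text
                total += $i
            }
            if (total == 0) {
                run++                         # another idle second
                if (run > longest) longest = run
            } else {
                run = 0                       # activity resumed
            }
        }
        END { print longest + 0 }
    '
}
```

For example, run "nfsstat -s -W -w 1 | tee nfsstat.log" on the server during a 20- or 25-process run and stop it with Ctrl-C once traffic resumes. Then run "longest_idle_run < nfsstat.log". A log like the one above, with 75 rows of zeros, would report 75. Analysing a saved log, rather than piping nfsstat straight into awk, keeps the result from being lost when the capture is interrupted.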
> >> >
> >> > I'm using igb currently but have also used em.  I have not tried
> >> > tcpdumping the packets yet on this test.  Any suggestions on things
> >> > to look out for?  (I'm not that familiar with that whole process.)
> >> >
> >> > Which brings up another point: I'm using TCP connections for NFS,
> >> > not UDP.
> >>
> >>     Is the net.inet.tcp.tso sysctl enabled or not? What about rxcsum
> >> and txcsum?
> >> Thanks,
> >> -Garrett
> >>
> >
> > I haven't intentionally/explicitly set any of this, so it's "default":
> >
> > # sysctl net.inet.tcp.tso
> > net.inet.tcp.tso: 1
> >
> >
> > igb0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
> >         options=13b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,TSO4>
> >         ether 00:30:48:c3:26:94
> >         inet 192.168.10.133 netmask 0xffffff00 broadcast 192.168.10.255
> >         media: Ethernet autoselect (1000baseT <full-duplex>)
> >         status: active
>
> Devise all of the available permutations that you need to test this
> out; there are a total of 3 on/off variables, so 2^3 = 8 permutations,
> but you've already `tested one', so that makes the remaining count 7.
> Example:
>
> TXCSUM=off, RXCSUM=on, TSO=on
> TXCSUM=on, RXCSUM=off, TSO=on
> TXCSUM=on, RXCSUM=off, TSO=off
>
> ...
>
> Try executing the permutations on the client first, keeping the server
> constant, then keep the client constant and vary the server, and
> finally vary both the server and the client.
>
> Be sure to take measurements for each permutation to ensure that
> things make functional sense.
>
> The reason I'm suggesting this is that there were issues with em(4)
> [and igb(4) too, I think, since it uses common code] with various
> hardware offload bits on 8.0-RELEASE (IIRC disabling txcsum did the
> trick, but you may have to do more than that to get things to work).
>
> Here's a similar thread with a different driver:
> http://lists.freebsd.org/pipermail/freebsd-current/2009-June/008264.html
> (just to illustrate the thought process used to determine the source
> of failure).
>
> Thanks,
> -Garrett
>
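A minimal sketch of the offload test matrix Garrett describes. Three on/off settings give 2^3 = 8 combinations. The all-enabled default has already been tested, so 7 remain. The sketch assumes TSO is toggled per interface with ifconfig's tso/-tso flags rather than through the global net.inet.tcp.tso sysctl. The names offload_matrix and enabled_offloads are hypothetical helpers, and they print commands rather than running them.

```shell
#!/bin/sh
# Sketch of the offload test matrix from this thread: three on/off
# settings give 2^3 = 8 combinations, minus the all-enabled default that
# was already tested, leaving 7. Assumptions: TSO is toggled per
# interface with ifconfig tso/-tso (not the net.inet.tcp.tso sysctl);
# offload_matrix and enabled_offloads are hypothetical helpers.

# offload_matrix IFACE: print one ifconfig command per untested combination.
offload_matrix() {
    for tx in txcsum -txcsum; do
        for rx in rxcsum -rxcsum; do
            for tso in tso -tso; do
                # Skip the all-enabled default that was already measured.
                [ "$tx $rx $tso" = "txcsum rxcsum tso" ] && continue
                echo "ifconfig $1 $tx $rx $tso"
            done
        done
    done
}

# enabled_offloads: read "ifconfig IFACE" output on stdin and print which
# of RXCSUM, TXCSUM and TSO4 appear in its options=<...> line, so each
# change can be confirmed before measuring.
enabled_offloads() {
    sed -n 's/.*options=[0-9a-fA-F]*<\([^>]*\)>.*/\1/p' |
        tr ',' '\n' |
        grep -E '^(RXCSUM|TXCSUM|TSO4)$' |
        tr '\n' ' ' |
        sed 's/ $//'
}
```

Follow Garrett's order on each host: client first, then server, then both. Apply one printed line, confirm it with "ifconfig igb0 | enabled_offloads", and rerun the copy test. Then record the stall length for that combination.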
Thanks for the detailed test plan!

Is it also fair to assume, then, that updating the NFS client machine to
the latest 8-STABLE should fix this issue as well?  (Both machines would
then be running the latest 8-STABLE code.)  These are not in production,
so I can test or upgrade with no issues.

Thanks again.
--Alan


