Date:      Sat, 25 Jan 2014 23:39:45 -0500 (EST)
From:      Rick Macklem <rmacklem@uoguelph.ca>
To:        Daniel Braniss <danny@cs.huji.ac.il>
Cc:        Adrian Chadd <adrian@freebsd.org>, FreeBSD stable <freebsd-stable@freebsd.org>, Luigi Rizzo <luigi@freebsd.org>
Subject:   Re: on 9.2-stable nfs/zfs and 10g hang
Message-ID:  <1716898514.16343391.1390711185414.JavaMail.root@uoguelph.ca>
In-Reply-To: <C2102616-3239-4425-8475-51B709A57737@cs.huji.ac.il>

Daniel Braniss wrote:
>
> On Jan 18, 2014, at 6:13 PM, Adrian Chadd < adrian@freebsd.org >
> wrote:
>
> Hi!
>
> Please try reducing the size down to 32k but leave TSO enabled.
>
> did so, it worked ok, but took longer:
> with TSO disabled: 14834.61 real 609.29 user 1996.90 sys
> with TSO + 32k: 15714.46 real 639.98 user 1828.07 sys
>
I just might have an idea why TSO breaks for rsize/wsize=65536 for
NFS. (Note that I know diddly about these drivers and am going on
what I've seen in the last few minutes of looking at them.)

When the rsize/wsize is set to 64K, NFS will generate a list of
about 32 mbuf clusters. The krpc may add more for the RPC header,
then this list is handed to TCP via sosend(). I'm not sure what
TCP does with it when TSO is enabled, but hopefully you guys do?

What I am wondering about is...can the mbuf list exceed the scatter
size the driver handles (IXGBE_82599_SCATTER is 32)? The code in the
.._xmit() function seems to try and m_defrag() it once and then
throws it away. (If it is thrown away, that would be pretty disastrous
for NFS.)

Could this be happening?

Can IXGBE_82599_SCATTER be increased or is it a hardware limit?

Sorry, if this is total bunk, since I know diddly about this, rick

>
> It's 9.2, so there may be some bugfixes that haven't been backported
> from 10 or -HEAD. Would you be able to try a -HEAD snapshot here?
>
> ENOTIME :-).
>
> What's the NFS server and hosts? I saw the core.txt.16 that says
> "ix0/ix1" so I can glean the basic chipset family but which NIC in
> particular is it? What would people need to try and reproduce it?
>
> The hosts involved are Dell 720/710
> the 10G card are Intel
>
> ix0@pci0:5:0:0: class=0x020000 card=0x7a118086 chip=0x10fb8086 rev=0x01 hdr=0x00
>     vendor = 'Intel Corporation'
>     device = '82599EB 10-Gigabit SFI/SFP+ Network Connection'
>     class = network
>     subclass = ethernet
>
> the server is exporting a big ZFS file system, which is served via 2
> raid controllers:
>
> mfi1@pci0:65:0:0: class=0x010400 card=0x1f2d1028 chip=0x005b1000 rev=0x05 hdr=0x00
>     vendor = 'LSI Logic / Symbios Logic'
>     device = 'MegaRAID SAS 2208 [Thunderbolt]'
>     class = mass storage
>     subclass = RAID
> mfi2@pci0:66:0:0: class=0x010400 card=0x1f151028 chip=0x00791000 rev=0x05 hdr=0x00
>     vendor = 'LSI Logic / Symbios Logic'
>     device = 'MegaRAID SAS 2108 [Liberator]'
>     class = mass storage
>     subclass = RAID
>
> - just had the driver card lying around-
>
> I will try a divergent client, which has a Broadcom Nic later.
>
> Q: is the TSO bug in the NIC/driver or in the kernel or both?
>
> cheers
> danny
>
> -a
>
> On 18 January 2014 03:24, Daniel Braniss < danny@cs.huji.ac.il >
> wrote:
>
> On Jan 17, 2014, at 4:47 PM, Rick Macklem < rmacklem@uoguelph.ca >
> wrote:
>
> Daniel Braniss wrote:
>
> hi all,
>
> All was going ok till I decided to connect this host via a 10g nic
> and very soon it started
> to hang. Running multiple make buildworlds from other hosts connected
> via 10g and
> using both src and obj on the server via tcp/nfs did ok. but running
> find … -exec md5 {} + (the find finds over 6M files)
> from another host (at 10g) will hang it very quickly.
>
> If I wait a while (can’t be more specific) it sometimes recovers -
> but my users are not very
> patient :-)
>
> This suggests that an RPC request/reply gets dropped in a way that
> TCP
> doesn't recover. Eventually (after up to about 15min, I think?) the
> TCP
> connection will be shut down and a new TCP connection started, with a
> retry of outstanding RPCs.
>
> I will soon try the same experiment using the old 1G nic, but in the
> meantime, if someone
> could shed some light would be very helpful
>
> I’m attaching core.txt, but if it doesn’t make it, it’s also
> available at:
> ftp://ftp.cs.huji.ac.il/users/danny/freebsd/core.txt.16
>
> You might try disabling TSO on the net interface. There have been
> issues
> with TSO for segments around 64K in the past (or use
> rsize=32768,wsize=32768
> options on the client mount, to avoid RPCs over about 32K in size).
>
> BINGO! disabling tso did it. I’ll try reducing the packet size later.
> some numbers:
> there were some 7*10^6 files
> doing it locally (the find + md5) took about 3 hrs,
> via nfs at 1g took 11 hrs.
> at 10g it took 4 hrs.
>
> thanks!
> danny
>
> Beyond that, capturing a packet trace for the case that hangs easily
> and
> looking at what goes on near the end of it in wireshark might give
> you
> a hint about what is going on.
>
> rick
>
> thanks,
> danny
> _______________________________________________
> freebsd-stable@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to
> "freebsd-stable-unsubscribe@freebsd.org"


