Date:      Sat, 18 Jan 2014 13:24:43 +0200
From:      Daniel Braniss <danny@cs.huji.ac.il>
To:        Rick Macklem <rmacklem@uoguelph.ca>
Cc:        FreeBSD stable <freebsd-stable@freebsd.org>
Subject:   Re: on 9.2-stable nfs/zfs and 10g hang
Message-ID:  <2C287272-7B57-4AAD-B22F-6A65D9F8677B@cs.huji.ac.il>
In-Reply-To: <588564685.11730322.1389970076386.JavaMail.root@uoguelph.ca>
References:  <588564685.11730322.1389970076386.JavaMail.root@uoguelph.ca>


On Jan 17, 2014, at 4:47 PM, Rick Macklem <rmacklem@uoguelph.ca> wrote:

> Daniel Braniss wrote:
>> hi all,
>>
>> All was going ok till I decided to connect this host via a 10g nic
>> and very soon it started
>> to hang. Running multiple make buildworlds from other hosts connected
>> via 10g and
>> using both src and obj on the server via tcp/nfs did ok. but running
>> 	find ... -exec md5 {} + (the find finds over 6M files)
>> from another host (at 10g) will hang it very quickly.
>>
>> If I wait a while (can't be more specific) it sometimes recovers -
>> but my users are not very
>> patient :-)
>>
> This suggests that an RPC request/reply gets dropped in a way that TCP
> doesn't recover. Eventually (after up to about 15min, I think?) the TCP
> connection will be shut down and a new TCP connection started, with a
> retry of outstanding RPCs.
>
>> I will soon try the same experiment using the old 1G nic, but in the
>> meantime, if someone
>> could shed some light, that would be very helpful
>>
>> I'm attaching core.txt, but if it doesn't make it, it's also
>> available at:
>> 	ftp://ftp.cs.huji.ac.il/users/danny/freebsd/core.txt.16
>>
> You might try disabling TSO on the net interface. There have been issues
> with TSO for segments around 64K in the past (or use rsize=32768,wsize=32768
> options on the client mount, to avoid RPCs over about 32K in size).
>
BINGO! Disabling TSO did it. I'll try reducing the packet size later.
Some numbers:
there were some 7*10^6 files;
doing it locally (the find + md5) took about 3 hrs,
via NFS at 1G it took 11 hrs,
at 10G it took 4 hrs.
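
To put those timings in perspective, here is a rough back-of-the-envelope sketch of the average file rate for each run. It assumes exactly 7,000,000 files and whole hours, both of which are only approximations of the figures above; the helper name rate is made up for this illustration.

```shell
# Average files per second for a run over $1 files lasting $2 hours
# (integer division, so the result is rounded down).
rate() {
    echo $(( $1 / ($2 * 3600) ))
}

FILES=7000000   # "some 7*10^6 files", an approximate count
echo "local:   $(rate $FILES 3) files/s"
echo "NFS 1G:  $(rate $FILES 11) files/s"
echo "NFS 10G: $(rate $FILES 4) files/s"
```

With these inputs the three rates come out at 648, 176 and 486 files per second, so the 10G run lands much closer to the local run than to the 1G one.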

thanks!
	danny
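
For reference, a minimal sketch of the two workarounds discussed above, to be run as root. The interface name ix0, the export server:/export and the mount point /mnt are placeholders, not the actual setup; substitute your own.

```shell
# Assumption: ix0 is the 10G interface on the host that hangs.
# Turn off TCP segmentation offload on that interface.
ifconfig ix0 -tso

# Alternative, on the NFS client: cap read/write RPCs at 32K.
# server:/export and /mnt are placeholder names.
mount -t nfs -o tcp,rsize=32768,wsize=32768 server:/export /mnt
```

The ifconfig change does not survive a reboot on its own, so it would also have to be added to the interface configuration if it turns out to be the fix.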


> Beyond that, capturing a packet trace for the case that hangs easily and
> looking at what goes on near the end of it in wireshark might give you
> a hint about what is going on.
>
> rick
>
>> thanks,
>> 	danny
>> _______________________________________________
>> freebsd-stable@freebsd.org mailing list
>> http://lists.freebsd.org/mailman/listinfo/freebsd-stable
>> To unsubscribe, send any mail to
>> "freebsd-stable-unsubscribe@freebsd.org"
>>
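
The packet trace Rick suggests could be captured roughly as follows. This is only a sketch: ix0 (the 10G interface), client (the host running the find) and the output file name are assumptions, and it presumes NFS is on its default TCP port 2049.

```shell
# Assumptions: ix0 is the 10G interface, "client" is the host running the find,
# and NFS uses its default TCP port 2049.
# -s 0 captures full packets; -w writes them to a file for wireshark.
tcpdump -i ix0 -s 0 -w nfs-hang.pcap host client and tcp port 2049
```

At 10G the capture file grows very quickly, so it is probably worth starting it just before the find and stopping it (Ctrl-C) as soon as the hang shows up, or limiting it with tcpdump's -C and -W options so that only the most recent traffic is kept.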



