Date:      Thu, 15 Apr 2021 21:14:32 +0000
From:      "Scheffenegger, Richard" <Richard.Scheffenegger@netapp.com>
To:        Rick Macklem <rmacklem@uoguelph.ca>, Allan Jude <allanjude@freebsd.org>, "freebsd-current@freebsd.org" <freebsd-current@freebsd.org>
Cc:        Richard Scheffenegger <rscheff@FreeBSD.org>, Juraj Lutter <otis@FreeBSD.org>
Subject:   AW: NFS issues since upgrading to 13-RELEASE
Message-ID:  <SN4PR0601MB3728FEEB12F4F4F66E2276E0864D9@SN4PR0601MB3728.namprd06.prod.outlook.com>
In-Reply-To: <YQXPR0101MB096883332B60E632ADA6F2A4DD4D9@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM>
References:  <902a3c81-2ce8-49c0-b163-5ffa4b90afe5@www.fastmail.com>, <e8f585eb-a2a8-ae9d-7f33-526e412ec462@freebsd.org>, <YQXPR0101MB09681707D3F3DC10814A905BDD4D9@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM> <YQXPR0101MB096883332B60E632ADA6F2A4DD4D9@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM>

FWIW:

r367492 fixes an issue around "premature" transmission of an ACK due to the incoming segment having been only partially processed at the time - related to in-kernel TCP consumers which use socket upcalls.

Rick mentioned that the NFS server (one in-kernel TCP user) has stringent requirements on the state of the socket during the upcall, so D29690 retains the lock on the socket buffer until TCP processing is finalized and the upcall can be done without any risk of transmitting outdated information back to the other end.

However, I have no proper way to verify/validate this interaction.

My ask would be to test the behavior with D29690 first - but if similar hangs keep reoccurring, then revert r367492 (which will also mean more severe surgery on the TCP processing flow).

Thanks.

Richard Scheffenegger

-----Original Message-----
From: Rick Macklem <rmacklem@uoguelph.ca>
Sent: Thursday, 15 April 2021 23:05
To: Allan Jude <allanjude@freebsd.org>; freebsd-current@freebsd.org
Cc: Richard Scheffenegger <rscheff@FreeBSD.org>; Juraj Lutter <otis@FreeBSD.org>
Subject: Re: NFS issues since upgrading to 13-RELEASE

I wrote:
[stuff snipped]
>- Alternately you can try rscheff@'s alternate proposed patch that is at
>  https://reviews.freebsd.og/D29690.
Oops, that's
    https://reviews.freebsd.org/D29690

rick

  I have not yet had time to test this one, but since I cannot reproduce the
  hang, I can only test it to confirm that it is "no worse" than reverting
  r367492 on my setup.

Please let us know which you choose and whether or not it fixes your problem.

>> Any pointers for troubleshooting this? I've been looking through vmstat,
>> gstat, top, etc. when the problem occurs, but I haven't been able to
>> pinpoint the issue. I can get pcap, but it would be from the hosts, because
>> I don't have a 10G tap or managed switch.
>>
>
>run `nfsstat -d 1` and try to capture a few lines from before, during,
>and after the stall, and that may provide some insight.
>
>Specifically, does the queue length grow, suggesting it is waiting on
>the I/O subsystem, or does it just stop getting traffic altogether?

If the revert of r367492 does not fix the problem, monitor the TCP connection(s) via "netstat -a" and, if possible, capture packets via "tcpdump -s 0 -w hang.pcap host <nfs-client>" or similar, run on the server.

Ideally the tcpdump would be started before the "hang" occurs, but running one while the hang is occurring (until after it recovers) could also be useful.
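
A minimal sketch of how the netstat monitoring could be left running next to the capture, assuming a POSIX sh on the server. The helper names `stamp` and `watch_conn`, the 5-second default interval, and the log file name in the usage note are illustrative assumptions, not something specified in this thread; <nfs-client> remains a placeholder for the client's address.

```sh
#!/bin/sh
# Illustrative helpers only; names and defaults are assumptions.

# Prefix each line read from stdin with the current wall-clock time, so the
# samples can later be matched against packet times in the capture.
stamp() {
    while IFS= read -r line; do
        printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$line"
    done
}

# Print the netstat lines for one peer at a fixed interval, timestamped.
# $1 = client address as it appears in netstat output
# $2 = sampling interval in seconds (defaults to 5)
watch_conn() {
    while :; do
        netstat -an | grep -F -- "$1" | stamp
        sleep "${2:-5}"
    done
}
```

Something like `watch_conn <nfs-client> 5 >> netstat.log &` could be started before the tcpdump, so that the Recv-Q/Send-Q values around a stall can be lined up with the packet trace afterwards.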

Thanks for reporting this, rick

--
Allan Jude
_______________________________________________
freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/=
listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?SN4PR0601MB3728FEEB12F4F4F66E2276E0864D9>