Date: Thu, 15 Apr 2021 21:14:32 +0000
From: "Scheffenegger, Richard" <Richard.Scheffenegger@netapp.com>
To: Rick Macklem <rmacklem@uoguelph.ca>, Allan Jude <allanjude@freebsd.org>, "freebsd-current@freebsd.org" <freebsd-current@freebsd.org>
Cc: Richard Scheffenegger <rscheff@FreeBSD.org>, Juraj Lutter <otis@FreeBSD.org>
Subject: AW: NFS issues since upgrading to 13-RELEASE
Message-ID: <SN4PR0601MB3728FEEB12F4F4F66E2276E0864D9@SN4PR0601MB3728.namprd06.prod.outlook.com>
In-Reply-To: <YQXPR0101MB096883332B60E632ADA6F2A4DD4D9@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM>
References: <902a3c81-2ce8-49c0-b163-5ffa4b90afe5@www.fastmail.com>, <e8f585eb-a2a8-ae9d-7f33-526e412ec462@freebsd.org>, <YQXPR0101MB09681707D3F3DC10814A905BDD4D9@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM> <YQXPR0101MB096883332B60E632ADA6F2A4DD4D9@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM>

FWIW: r367492 fixes an issue around "premature" transmission of an ACK
due to the incoming segment having been only partially processed at the
time - related to in-kernel TCP consumers which use socket upcalls.

Rick mentioned that the NFS server (one in-kernel TCP user) has
stringent requirements on the state of the socket during the upcall.
D29690 therefore retains the lock on the socket buffer until TCP
processing is finalized, so that the upcall can be done without any risk
of transmitting outdated information back to the other end.

However, I have no proper way to verify/validate this interaction.

My ask would be to test the behavior with D29690 first - but if similar
hangs keep recurring, then revert r367492 (which will also mean more
severe surgery on the TCP processing flow).

Thanks.

Richard Scheffenegger

-----Original Message-----
From: Rick Macklem <rmacklem@uoguelph.ca>
Sent: Thursday, 15 April 2021 23:05
To: Allan Jude <allanjude@freebsd.org>; freebsd-current@freebsd.org
Cc: Richard Scheffenegger <rscheff@FreeBSD.org>; Juraj Lutter <otis@FreeBSD.org>
Subject: Re: NFS issues since upgrading to 13-RELEASE

I wrote:
[stuff snipped]
>- Alternately you can try rscheff@'s alternate proposed patch that is
>at
> https://reviews.freebsd.og/D29690.
Oops, that's https://reviews.freebsd.org/D29690

rick

I have not yet had time to test this one, but since I cannot reproduce
the hang, I can only do testing of it to see that it is "no worse" than
reverting r367492 for my setup.

Please let us know which you choose and whether or not it fixes your
problem.

>> Any pointers for troubleshooting this? I've been looking through
>> vmstat, gstat, top, etc. when the problem occurs, but I haven't been
>> able to pinpoint the issue. I can get pcap, but it would be from the
>> hosts, because I don't have a 10G tap or managed switch.
>>
>
>run `nfsstat -d 1` and try to capture a few lines from before, during,
>and after the stall, and that may provide some insight.
>
>Specifically, does the queue length grow, suggesting it is waiting on
>the I/O subsystem, or does it just stop getting traffic altogether.

If the revert of r367492 does not fix the problem, monitor the TCP
connection(s) via "netstat -a" and, if possible, capture packets via
tcpdump -s 0 -w hang.pcap host <nfs-client>
or similar, run on the server.

Ideally the tcpdump would be started before the "hang" occurs, but
running one while the hang is occurring (until after it recovers) could
also be useful.

Thanks for reporting this, rick

--
Allan Jude
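
The diagnostics suggested in this thread (`nfsstat -d 1`, "netstat -a" and a tcpdump run on the server) could be collected together so that the output from before, during, and after a stall lines up by time. The following is only a minimal sketch under stated assumptions: CLIENT, OUTDIR, the `stamp` and `capture` helpers, the log file names, and the 5-second netstat interval are hypothetical choices, not something proposed in the thread. The three commands themselves are used exactly as quoted above.

```shell
#!/bin/sh
# Sketch: gather the diagnostics suggested in the thread on the NFS server.
# CLIENT and OUTDIR are hypothetical placeholders; adjust them for your setup.
CLIENT="${CLIENT:-nfs-client.example.org}"
OUTDIR="${OUTDIR:-/var/tmp/nfs-hang}"

# Prefix every input line with the current wall-clock time so that the
# nfsstat and netstat logs can be matched against the packet capture.
stamp() {
    while IFS= read -r line; do
        printf '%s %s\n' "$(date '+%H:%M:%S')" "$line"
    done
}

# Start all three collectors; needs root for tcpdump.
capture() {
    mkdir -p "$OUTDIR" || return 1

    # Packet capture as quoted in the thread, started before the hang.
    tcpdump -s 0 -w "$OUTDIR/hang.pcap" host "$CLIENT" &
    tcpdump_pid=$!

    # Periodic TCP connection snapshots (interval is an arbitrary choice).
    ( while :; do netstat -a | stamp; sleep 5; done ) >> "$OUTDIR/netstat.log" &
    netstat_pid=$!

    # Stop the background collectors when the script exits.
    trap 'kill "$tcpdump_pid" "$netstat_pid" 2>/dev/null' EXIT
    trap 'exit 130' INT TERM

    # Per-second NFS statistics, as quoted in the thread; runs in the
    # foreground until interrupted with Ctrl-C.
    nfsstat -d 1 | stamp >> "$OUTDIR/nfsstat.log"
}

# Only start capturing when explicitly asked to: ./nfs-hang.sh run
if [ "${1:-}" = "run" ]; then
    capture
fi
```

Started as root with the `run` argument before the hang occurs and interrupted after it recovers, this would leave hang.pcap, netstat.log and nfsstat.log in OUTDIR.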