From: Sean McNeil
To: Dan Nelson
cc: freebsd-current@freebsd.org
Subject: Re: nfs server issues
Date: Fri, 02 Apr 2004 16:44:02 -0800
Message-Id: <1080953041.51638.11.camel@server.mcneil.com>
In-Reply-To: <20040403000742.GD49311@dan.emsphone.com>

On Fri, 2004-04-02 at 16:07, Dan Nelson wrote:
> In the last episode (Apr 02), Sean McNeil said:
> > On Fri, 2004-04-02 at 13:57, Dan Nelson wrote:
> > > In the last episode (Apr 02), Sean McNeil said:
> > > > OK, here is a tcpdump. It is confusing. It looks like after the
> > > > first fragment is received it is looking up some bizarre IP
> > > > address....
> > > >
> > > > 13:02:57.566952 free.mcneil.com.1360032988 > server.mcneil.com.nfs: 136 readdir fh 1002,54097/7890231 4096 bytes @ 0x000000000 (DF)
> > > > 13:02:57.567266 server.mcneil.com.nfs > free.mcneil.com.1360032988: reply ok 1472 readdir (frag 1645:1480@0+)
> > > > 13:02:57.567268 0.0.0.1 > 0.0.10.7: (frag 1645:4@1480)
> > >
> > > Weird. Is this at the server or the client?
> >
> > This is a client-side dump. Both server and client have MTU of 1500.
> >
> > Server side says:
> >
> > 15:37:44.292564 IP free.mcneil.com.851449566 > server.mcneil.com.nfs: 136 readdir fh 1002,54097/7890231 4096 bytes @ 0x0
> > 15:37:44.292705 IP server.mcneil.com.nfs > free.mcneil.com.851449566: reply ok 1472 readdir
> > 15:37:44.292711 IP server.mcneil.com > free.mcneil.com: udp
> >
> > Is there something in a packet that tells rpc/nfs to reassemble with
> > something other than the source/destination info?
>
> Neither RPC nor NFS is involved with fragmentation. That's all done at
> the UDP level. I wonder if it's a NIC problem. Can you try a
> different card (maybe even a different brand of card if possible)?
> Another interesting test would be to get a hub and a 3rd machine, then
> do dumps with the hub on the server's port, and then the client's port.
> If you get garbled frags in both places, I'd lean toward a NIC problem
> on the server. If your card supports checksum offloading, try
> disabling it (ifconfig xx0 -rxcsum -txcsum).

Bingo!
It looks like a problem with checksum offloading. I ran:

    ifconfig re0 -rxcsum -txcsum

and now it no longer hangs. Good call!

The NIC in question is:

    re0: port 0xa400-0xa4ff mem 0xdf004000-0xdf0040ff irq 12 at device 11.0 on pci1

The extremely odd thing is that http, ldap, samba, and many other
services, both those that go to the box and those sent out via nat,
all work fine. nfs is the only protocol I've seen that has an issue.

I am happy now :)

Cheers,
Sean
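
For anyone reading the client-side trace quoted earlier in this message: the old tcpdump output prints IP fragments as (frag id:size@offset), with a trailing + when more fragments follow. The rough Python sketch below parses that notation and checks whether the fragments line up into one contiguous datagram. The names Frag, parse_frags and reassembled_length are made up for illustration, and the code assumes every fragment passed in shares one IP id. It only checks the fragment bookkeeping, not addresses or checksums.

```python
import re
from typing import NamedTuple, Optional


class Frag(NamedTuple):
    ip_id: int    # IP identification field shared by all fragments
    length: int   # bytes of IP payload carried by this fragment
    offset: int   # byte offset of this fragment in the original datagram
    more: bool    # True when tcpdump printed '+', i.e. more fragments follow


# Matches tcpdump's "(frag id:size@offset)" with an optional trailing '+'.
FRAG_RE = re.compile(r"\(frag (\d+):(\d+)@(\d+)(\+?)\)")


def parse_frags(lines):
    """Collect the fragment annotations found in tcpdump output lines."""
    frags = []
    for line in lines:
        m = FRAG_RE.search(line)
        if m:
            frags.append(Frag(int(m.group(1)), int(m.group(2)),
                              int(m.group(3)), m.group(4) == "+"))
    return frags


def reassembled_length(frags) -> Optional[int]:
    """Return the total payload length if the fragments are contiguous and
    only the last one lacks the more-fragments flag, otherwise None.
    Assumes all fragments belong to the same datagram (same IP id)."""
    ordered = sorted(frags, key=lambda f: f.offset)
    if not ordered:
        return None
    expected = 0
    for i, frag in enumerate(ordered):
        is_last = i == len(ordered) - 1
        if frag.offset != expected or frag.more == is_last:
            return None
        expected += frag.length
    return expected


# The two fragment lines from the client-side dump in this thread.
DUMP = [
    "13:02:57.567266 server.mcneil.com.nfs > free.mcneil.com.1360032988: "
    "reply ok 1472 readdir (frag 1645:1480@0+)",
    "13:02:57.567268 0.0.0.1 > 0.0.10.7: (frag 1645:4@1480)",
]

if __name__ == "__main__":
    print(parse_frags(DUMP))
    print(reassembled_length(parse_frags(DUMP)))
```

On the two lines from the trace, the fragment id, sizes and offsets are consistent with each other (1480 bytes at offset 0, then 4 bytes at offset 1480). What looks wrong in the second fragment is the address pair that tcpdump printed, not the fragment bookkeeping.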