Date:      Mon, 2 Nov 2009 10:37:24 -0500 (EST)
From:      Rick Macklem <rmacklem@uoguelph.ca>
To:        Olaf Seibert <O.Seibert@cs.ru.nl>
Cc:        danny@cs.huji.ca.il, dfr@freebsd.org, freebsd-stable@freebsd.org
Subject:   Re: 8.0-RC1 NFS client timeout issue
Message-ID:  <Pine.GSO.4.63.0911021028140.10631@muncher.cs.uoguelph.ca>
In-Reply-To: <20091102100958.GY841@twoquid.cs.ru.nl>
References:  <20091027164159.GU841@twoquid.cs.ru.nl> <Pine.GSO.4.63.0910281624440.18390@muncher.cs.uoguelph.ca> <20091029135239.GX841@twoquid.cs.ru.nl> <Pine.GSO.4.63.0911011713290.23081@muncher.cs.uoguelph.ca> <20091102100958.GY841@twoquid.cs.ru.nl>

On Mon, 2 Nov 2009, Olaf Seibert wrote:

>> Although I think the patch does avoid sending the request on the
>> partially closed connection, it doesn't fix the "real problem",
>> so I don't know if it is worth testing?
>
> Well, I tested it anyway, just in case. It seems to work fine for me, so
> far.
>
Yes, I think the patch is ok, but it doesn't completely resolve the
reconnect issue. It's good to hear that it helps for your case.

> I don't see your extra RSTs either. Maybe that is because in my case the
> client used a different port number for the new connection. (Usually,
> this is controlled by the socket option SO_REUSEADDR from <sys/socket.h>).
>
For my packet trace, it is using different port#s. The problem is that,
for some reason, it sends the RST from the new port# instead of the port#
for the old connection just closed via soclose().
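
A rough userland way to see which local port# each connection gets, as a sketch only. It assumes a loopback listener standing in for the NFS server, and the helper name local_port_after_connect is made up for illustration. It does not go through the in-kernel RPC client or soclose(), so it will not reproduce the stray RSTs; it only shows where the port# and SO_REUSEADDR come into play.

```python
import socket


def local_port_after_connect(server_addr, reuse_addr=False):
    """Connect to server_addr; return (socket, local port the kernel picked)."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    if reuse_addr:
        # SO_REUSEADDR is a SOL_SOCKET-level option and has to be set
        # before the socket is bound or connected to have any effect.
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.connect(server_addr)
    return s, s.getsockname()[1]


if __name__ == "__main__":
    # Hypothetical stand-in for the server: a listener on an ephemeral port.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(5)
    addr = listener.getsockname()

    old, old_port = local_port_after_connect(addr)
    old.close()  # the client closes first, as in the reconnect case
    new, new_port = local_port_after_connect(addr)
    print("old port#:", old_port, "new port#:", new_port)

    new.close()
    listener.close()
```

Comparing the two printed port#s against a packet trace is one way to check which port a given RST was sent from.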

I don't know why you don't see the extra RSTs, but consider yourself
lucky, since you should be ok without them. (It may simply be that your
server isn't Solaris10 --> a different TCP stack in it.)

Do you happen to know what your server is?

>> I'm hoping that the "Help TCP Wizards..." thread I just started
>> on freebsd-current comes up with something.
>>
>> At least I can reproduce the problem now. (For some reason, I have
>> to reboot the Solaris10 server before the problem appears for me.
>> I can't think why this matters, but that's networking for you:-)
>
> Maybe it depends on server load or something. This particular server is
> a central file server at a university, it may have some more pressure to
> terminate unused connections.
>
Or type of server (i.e. not Solaris10). It definitely depends upon timing
in the client. (I'm about to try introducing a 1sec delay before the
soconnect() call and see if that makes the RSTs go away. Not much of a
fix, but...)

I now recall that I ran into a similar problem (although I didn't dig
into the packet traces then) when testing my Mac OS X 10 client, which
uses essentially the reconnect code from Mac OS X 10.4 Tiger. I "fixed"
it by adding a 1sec delay before the reconnect.
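
For what it's worth, the "delay before reconnect" workaround looks roughly like this in userland terms. This is a sketch under assumptions, not the kernel code: reconnect_with_delay and its parameters are hypothetical names, plain sockets stand in for soclose()/soconnect(), and the 1sec default is just the value mentioned above.

```python
import socket
import time


def reconnect_with_delay(old_sock, server_addr, delay=1.0, sleep=time.sleep):
    """Close old_sock, wait `delay` seconds, then open a fresh connection.

    `sleep` can be swapped out so the wait can be observed or skipped in tests.
    """
    old_sock.close()   # tear down the old connection first
    sleep(delay)       # give the close time to settle before reconnecting
    new_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        new_sock.connect(server_addr)
    except OSError:
        new_sock.close()  # do not leak the descriptor if the connect fails
        raise
    return new_sock
```

As noted, this is not much of a fix: it only moves the timing around and leaves the underlying reconnect behaviour unexplained.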

Thanks for helping with testing, rick