From owner-svn-src-all@FreeBSD.ORG  Thu Jun  9 18:31:34 2011
Return-Path: <owner-svn-src-all@FreeBSD.ORG>
Delivered-To: svn-src-all@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 1E6DD1065673;
	Thu,  9 Jun 2011 18:31:34 +0000 (UTC)
	(envelope-from sobomax@sippysoft.com)
Received: from mail.sippysoft.com (mail.sippysoft.com [4.59.13.245])
	by mx1.freebsd.org (Postfix) with ESMTP id A9F768FC1A;
	Thu,  9 Jun 2011 18:31:33 +0000 (UTC)
Received: from [4.59.13.245] (helo=[192.168.1.79])
	by mail.sippysoft.com with esmtpsa (TLSv1:CAMELLIA256-SHA:256)
	(Exim 4.72 (FreeBSD)) (envelope-from <sobomax@sippysoft.com>)
	id 1QUk0y-000Fn0-PV; Thu, 09 Jun 2011 11:31:32 -0700
Message-ID: <4DF11183.3060806@FreeBSD.org>
Date: Thu, 09 Jun 2011 11:31:31 -0700
From: Maxim Sobolev <sobomax@FreeBSD.org>
Organization: Sippy Software, Inc.
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US;
	rv:1.9.2.17) Gecko/20110414 Lightning/1.0b2 Thunderbird/3.1.10
MIME-Version: 1.0
To: Mikolaj Golub <trociny@freebsd.org>
References: <201106041601.p54G1Ut7016697@svn.freebsd.org>	<BA66495E-AED3-459F-A5CD-69B91DB359BC@lists.zabbadoz.net>	<4DEA653F.7070503@FreeBSD.org>	<201106061057.p56Av3u7037614@kernblitz.nuclight.avtf.net>	<4DED1CC5.1070001@FreeBSD.org>
	<86wrgvkv67.fsf@kopusha.home.net>
In-Reply-To: <86wrgvkv67.fsf@kopusha.home.net>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Sender: sobomax@sippysoft.com
X-ssp-trusted: yes
Cc: vadim_nuclight@mail.ru, Kostik Belousov <kib@FreeBSD.org>,
	svn-src-all@FreeBSD.org, Pawel Jakub Dawidek <pjd@FreeBSD.org>
Subject: Re: svn commit: r222688 - head/sbin/hastd
X-BeenThere: svn-src-all@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: "SVN commit messages for the entire src tree \(except for &quot;
	user&quot; and &quot; projects&quot; \)" <svn-src-all.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/svn-src-all>,
	<mailto:svn-src-all-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/svn-src-all>
List-Post: <mailto:svn-src-all@freebsd.org>
List-Help: <mailto:svn-src-all-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/svn-src-all>,
	<mailto:svn-src-all-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 09 Jun 2011 18:31:34 -0000

On 6/9/2011 6:10 AM, Mikolaj Golub wrote:
>   >>>  Hmm, not sure what exactly is wrong? Sender does 3 writes to the TCP
>   >>>  socket - 32k, 32k and 1071 bytes, while receiver does one
>   >>>  recv(MSG_WAITALL) with the size of 66607. So I suspect sender's kernel
>   >>>  does deliver two 32k packets and fills up receiver's buffer or
>   >>>  something. And the remaining 1071 bytes stay somewhere in sender's
>   >>>  kernel indefinitely, while recv() cannot complete in receiver's. Using
>   >>>  the same size when doing recv() solves the issue for me.
>
> With MSG_WAITALL, if data to receive are larger than receive buffer, after
> receiving some part of data it is drained to user buffer and the protocol is
> notified (sending window update) that there is some space in the receive
> buffer. So, normally, there should not be an issue with the scenario described
> above. But there was a race in soreceive_generic(), I believe I have fixed in
> r222454, when the connection could stall in sbwait. Do you still observe the
> issue with only r222454 applied?

The patch makes things slightly better, but it appears that there are 
still some "magic" message sizes that get stuck somewhere, in particular 
66607 bytes in my case. You can probably reproduce the issue easily by 
creating a large disk with data of various kinds (e.g. a FreeBSD UFS 
file system with source and object code), enabling compression and 
setting the block size to 128kb. Then, at least if you run this scenario 
over a WAN, it should get stuck from time to time when it hits that 
"magic" size. One could probably easily write a simple test case in C, 
with the server part sending 32k, 32k and 1071 bytes and the receiver 
reading the whole message with MSG_WAITALL. Unfortunately I am 
overloaded right now, so it's unlikely that I will do it.
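
For reference, such a test case could be sketched roughly as follows. 
This is an illustration only, not code from hastd: the function names 
(send_all, waitall_roundtrip), the use of the loopback interface and the 
10 second receive timeout are all assumptions made for the sketch. Over 
loopback it is not expected to reproduce the stall; to exercise the WAN 
case the sender and receiver halves would have to run on separate hosts.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define CHUNK 32768
#define TAIL  1071
#define TOTAL (2 * CHUNK + TAIL)    /* 66607 bytes, the "magic" size */

/* Send exactly len bytes, retrying on short writes. */
static int
send_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = send(fd, buf, len, 0);
        if (n <= 0)
            return (-1);
        buf += n;
        len -= (size_t)n;
    }
    return (0);
}

/*
 * Fork a sender that writes 32k, 32k and 1071 bytes, and read them in
 * the parent with a single recv(MSG_WAITALL).  Returns the number of
 * bytes received (TOTAL if nothing stalled), or -1 on failure.
 */
static ssize_t
waitall_roundtrip(void)
{
    static char sbuf[CHUNK], rbuf[TOTAL];
    struct sockaddr_in sin;
    socklen_t slen = sizeof(sin);
    struct timeval tv = { 10, 0 };  /* hypothetical stall timeout */
    ssize_t got;
    pid_t pid;
    int lfd, cfd, afd;
    char c;

    if ((lfd = socket(AF_INET, SOCK_STREAM, 0)) < 0)
        return (-1);
    memset(&sin, 0, sizeof(sin));
    sin.sin_family = AF_INET;
    sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    sin.sin_port = 0;               /* let the kernel pick a port */
    if (bind(lfd, (struct sockaddr *)&sin, sizeof(sin)) < 0 ||
        listen(lfd, 1) < 0 ||
        getsockname(lfd, (struct sockaddr *)&sin, &slen) < 0 ||
        (pid = fork()) < 0) {
        close(lfd);
        return (-1);
    }
    if (pid == 0) {                 /* child: the sender */
        close(lfd);
        cfd = socket(AF_INET, SOCK_STREAM, 0);
        if (cfd < 0 ||
            connect(cfd, (struct sockaddr *)&sin, sizeof(sin)) < 0)
            _exit(1);
        memset(sbuf, 'x', sizeof(sbuf));
        if (send_all(cfd, sbuf, CHUNK) != 0 ||
            send_all(cfd, sbuf, CHUNK) != 0 ||
            send_all(cfd, sbuf, TAIL) != 0)
            _exit(1);
        /* Keep the connection open until the receiver closes it. */
        (void)recv(cfd, &c, 1, 0);
        close(cfd);
        _exit(0);
    }
    afd = accept(lfd, NULL, NULL);  /* parent: the receiver */
    close(lfd);
    if (afd < 0) {
        waitpid(pid, NULL, 0);
        return (-1);
    }
    /* Return a short count instead of blocking forever on a stall. */
    (void)setsockopt(afd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));
    got = recv(afd, rbuf, TOTAL, MSG_WAITALL);
    close(afd);
    waitpid(pid, NULL, 0);
    return (got);
}
```

Calling waitall_roundtrip() from main() and comparing the result with 
TOTAL is enough to tell the cases apart: a return value below TOTAL 
means the single recv() gave up before the last chunk arrived.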

>   MS>  MSG_WAITALL might be an issue here. I suspect receiver's kernel can't
>   MS>  dequeue two 32k packets until the last chunk arrives. I don't have a
>   MS>  time to look into it in detail unfortunately.
>
> Sorry, but I think your patch is wrong. If even it fixes the issue for you,
> actually I think it does not fix but hides a real problem we have to address.
>
> Receiving the whole chunk at once should be more effectively because we do one
> syscall instead of several. Also, if you receive in smaller chunks no need to
> set MSG_WAITALL at all.
>
> Besides, with your patch I am observing hangs on primary startup in
>
> init_remote->primary_connect->proto_connection_recv->proto_common_recv
>
> The primary worker process asks the parent to connect to the secondary. After
> establishing the connection the parent sends connection protocol name and
> descriptor to the worker (proto_connection_send/proto_connection_recv). The
> issue here is that in proto_connection_recv() the size of protoname is
> unknown, so it calls proto_common_recv() with size = 127, larger than
> protoname ("tcp").
>
> It worked previously because after sending protoname proto_connection_send()
> sends the descriptor calling sendmsg(). This is data of different type and it
> makes recv() return although only 4 bytes of 127 requested were received.
>
> With your patch, after receiving these 4 bytes it returns back to recv()
> waiting for rest 123 bytes and gets stuck forever. Don't you observe this?  It
> is strange, because for me it hangs on every start up. I am seeing this on
> yesterday current.

Yes, you are right. It appears that I did not test the new code on the 
primary, only on the secondary, which explains why I did not see that 
issue. Can you please try the following patch and let me know if it 
solves the issue for you?

http://sobomax.sippysoft.com/hastd.diff

-Maxim