Date: Sat, 8 Jun 2013 14:16:30 +0100 (BST)
From: Robert Watson <rwatson@FreeBSD.org>
To: vasanth rao naik sabavat
Cc: freebsd-hackers@freebsd.org
Subject: Re: question in sosend_generic()

On Fri, 7 Jun 2013, vasanth rao naik sabavat wrote:

> When sending data out of the socket I don't see in the code where the
> sb_cc is incremented.

sb_cc reflects data appended to the socket buffer.  sosend_generic() is
responsible for arranging the copy-in and performing flow control, but the
protocol's own pru_send() routine performs the append.  E.g.,
tcp_usr_send() calls sbappendstream(), which actually adds the data to the
socket buffer.  Notice that not all protocols actually use the send socket
buffer -- for example, UNIX domain sockets directly cross-deliver to the
receiving socket's receive buffer.

> Is the socket send performed in the same thread of execution or the data
> is copied on to the socket send buffer and a different thread then sends
> the data out of the socket?

Protocols provide their own implementations to handle data moving down the
stack, so the specifics are protocol-dependent.  In TCP, the socket buffer
append occurs synchronously, in the same thread, as part of the pru_send()
downcall from the socket layer.  When data leaves the send socket buffer is
quite a different question.  For TCP, data may be sent immediately if the
various windows (e.g., flow control, congestion control) allow immediate
transmission, or it may remain enqueued in the send socket buffer until an
ACK is received indicating that the receiver is ready for more data (e.g.,
a growing window, ACK clocking, etc.).  In the steady sending state (e.g.,
filling the window) I would expect to see data sent, and later removed from
the socket buffer, only in an asynchronous context.  Typically, ACK
processing occurs in one of two threads: the device driver's interrupt
handling (i.e., the ithread), or the netisr thread for encapsulated or
looped-back traffic.

> Because, I see a call to sbwait(&so->so_snd) in the sosend_generic and I
> don't understand who would wake up this thread?

sbwait() implements blocking for flow/congestion control: when the socket
buffer fills, the sending thread must wait for space to open up.  Space
becomes available as a result of successful transmission -- e.g., the
truncation (sbdrop()) of the send socket buffer when a TCP ACK has been
received.  So the thread that triggers the wakeup will usually be the
ithread or netisr.
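To make the TCP case concrete, here is very roughly the shape of that
ACK-side path -- a paraphrased sketch, not the literal tcp_input()/
tcp_do_segment() code, which uses the locked sbdrop_locked()/
sowwakeup_locked() variants and handles partial ACKs, FIN, and so on:

	/*
	 * Sketch of ACK processing in the TCP input path (runs in the
	 * ithread or netisr): acknowledged bytes are dropped from the send
	 * socket buffer, and any thread sleeping in sbwait(&so->so_snd) is
	 * woken so sosend_generic() can retry its space check.
	 */
	int acked = th->th_ack - tp->snd_una;	/* bytes newly acknowledged */

	sbdrop(&so->so_snd, acked);	/* reclaim send socket buffer space */
	sowwakeup(so);			/* wake senders blocked in sbwait() */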
In the case of UNIX domain sockets, it's actually the receiving thread that
triggers the wakeup directly.

> If the data is not copied on to the socket buffers then it should
> technically send all data out in the same thread of execution and future
> socket send calls should see that space is always fully available. In
> that case I dont see a reason why we need to wait on the socket send
> buffer. As there would no one who will actually wake you up.

There are some false assumptions here.  The sending thread will always
append data [that fits] to the socket buffer, but may have to loop awaiting
space for the remaining data, depending on blocking/non-blocking status.
Space becomes available when the remote endpoint acknowledges receipt,
e.g., via a TCP ACK.  You might never wake up if the remote endpoint's flow
control never opens up space, the socket is blocking, and no timeout is
set.  If you fear that the recipient may block the sender, then you need
some timeout mechanism to decide how long you're willing to wait (one
common approach is sketched in the P.S. below).

>	if (space < resid + clen &&
>	    (atomic || space < so->so_snd.sb_lowat || space < clen)) {
>		if ((so->so_state & SS_NBIO) || (flags & MSG_NBIO)) {
>			SOCKBUF_UNLOCK(&so->so_snd);
>			error = EWOULDBLOCK;
>			goto release;
>		}
>		error = sbwait(&so->so_snd);
>		SOCKBUF_UNLOCK(&so->so_snd);
>		if (error)
>			goto release;
>		goto restart;
>	}
>
> In the above code snippet, for a blocking socket if the space is not
> available, then it may trigger a deadlock?

You can experience deadlocks between senders and receivers as a result of
cyclic waits for constrained resources (e.g., buffers).  However, that is a
property of application design, and applications that are killed will close
their sockets, releasing resources.  Most application designers attempt to
avoid deadlock by ensuring that there is always a path to progress, even a
slow one.

The deadlock you're suggesting does not, in general, exist -- it would be
silly to wait for something that could never happen.  Instead, we wait for
things that generally will happen (e.g., a TCP ACK) or for a timeout, which
would close the connection.  Notice that sbwait() is allowed to fail: if
the connection is severed due to a timeout or an RST, it returns
immediately with an error.

Robert
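P.S. If you do want to bound how long a blocking send(2) can sleep in
sbwait(), the usual application-level knob is a send timeout on the socket
(or a non-blocking socket plus your own retry policy).  A rough userland
sketch -- the function name send_with_timeout() and the five-second value
are just illustrative, and real code would also handle short writes:

	#include <sys/types.h>
	#include <sys/socket.h>
	#include <sys/time.h>
	#include <errno.h>

	/*
	 * Illustrative helper (not anything in the tree): bound how long
	 * send(2) may block waiting for send buffer space, then let the
	 * caller decide what to do when the peer isn't draining data.
	 */
	static ssize_t
	send_with_timeout(int s, const void *buf, size_t len)
	{
		struct timeval tv = { .tv_sec = 5, .tv_usec = 0 };
		ssize_t n;

		if (setsockopt(s, SOL_SOCKET, SO_SNDTIMEO, &tv,
		    sizeof(tv)) == -1)
			return (-1);
		n = send(s, buf, len, 0);
		if (n == -1 && (errno == EWOULDBLOCK || errno == EAGAIN))
			; /* buffer stayed full -- retry, back off, or give up */
		return (n);
	}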