Date:      Sun, 01 Sep 2013 01:54:59 -0700
From:      Yuri <yuri@rawbw.com>
To:        net@FreeBSD.org
Subject:   Packet loss when 'control' messages are present with large data (sendmsg(2))
Message-ID:  <522300E3.6050303@rawbw.com>

I found a case where sendmsg(2) silently loses packets in the AF_LOCAL
domain when large packets that carry a control part are sent.

Here is how:
The sockbuf has a high-water mark (field sockbuf.sb_hiwat) set by
net.local.stream.sendspace; the default is 8192 bytes.
When sendmsg(2) sends enough data to hit this limit (8K or more)
together with a control message, sosend_generic cuts the message data
into separate mbufs based on 'sbspace' (derived from the sb_hiwat limit
above), adjusted for the control message size as it sees it. This is how
it tries to enforce the sb_hiwat limit.
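
The arithmetic described above can be sketched in user space. This is
only a simplified model, not the kernel code: struct model_sockbuf,
model_sbspace and model_first_chunk are hypothetical names, and the
24-byte control length used in the usage note is an assumed value.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the send buffer; only the fields the
 * description above mentions are modelled. */
struct model_sockbuf {
    size_t sb_hiwat;  /* high-water mark, e.g. 8192 by default */
    size_t sb_cc;     /* bytes currently queued */
};

/* Free room left in the buffer, in the spirit of 'sbspace'. */
static size_t model_sbspace(const struct model_sockbuf *sb)
{
    assert(sb != NULL);
    return sb->sb_cc >= sb->sb_hiwat ? 0 : sb->sb_hiwat - sb->sb_cc;
}

/* Size of the first data chunk as the generic send path would pick it:
 * free room minus the control length it was handed by the caller,
 * capped at the amount of data still to be sent (resid). */
static size_t model_first_chunk(const struct model_sockbuf *sb,
                                size_t resid, size_t clen)
{
    size_t space = model_sbspace(sb);

    if (clen >= space)
        return 0;
    space -= clen;
    return resid < space ? resid : space;
}
```

With an empty 8192-byte buffer, 16384 bytes of data and an assumed
24-byte control message, model_first_chunk() picks 8168 bytes, so data
plus control add up to exactly the limit.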

However, down at the uipc level the control message is modified further
in two ways: unp_internalize converts it into an 'internal' form, and
unp_addsockcred adds another control message when LOCAL_CREDS is
requested by the client. Both functions only increase the control
message size beyond the original size seen by sosend_generic. As a
result, the first final mbuf sent (the concatenation of control and
data) will always be larger than the 'sbspace' limit that sosend_generic
was cutting the data for.
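
To give a feel for why the internal form can be larger, here is a rough
user-space sketch for the SCM_RIGHTS case. It rests on an assumption
that is not stated in this message: that the internal form stores one
kernel pointer per descriptor where user space passed one int. The
helper names are hypothetical, and the credentials message added for
LOCAL_CREDS is not modelled at all.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>

/* Control length as user space builds it: SCM_RIGHTS carries an array
 * of int file descriptors. */
static size_t rights_len_external(size_t nfds)
{
    assert(nfds > 0);
    return CMSG_SPACE(nfds * sizeof(int));
}

/* ASSUMPTION: the internalized message holds one pointer per
 * descriptor. On LP64 platforms that doubles the payload size. */
static size_t rights_len_internal(size_t nfds)
{
    assert(nfds > 0);
    return CMSG_SPACE(nfds * sizeof(void *));
}
```

Under that assumption the internal length is never smaller than the
external one, and on a 64-bit system it grows with the number of
descriptors passed.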

There is also the function sbappendcontrol_locked. It checks the
'sbspace' limit again and discards the packet when the limit is
exceeded. Its result code is essentially ignored in uipc_send. I
believe sbappendcontrol_locked shouldn't be checking space at all,
since packets are expected to be properly sized to begin with. But
removing the check wouldn't be the right fix, since the sizes would
still exceed the sbspace limit.

sosend_generic is one level above the uipc level, and it doesn't know
what uipc will do with the control message. Therefore it can't know the
real adjustment needed for the control message (to cut the data
properly). It wrongly uses the original control size, which makes the
first packet too large, so sbappendcontrol_locked discards it.
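
The whole failure path can be modelled in a few lines of user-space C.
This is a sketch under stated assumptions, not the kernel logic: the
function names are hypothetical, the buffer starts empty, and the
control sizes (24 bytes as seen by the generic layer, 56 bytes after
uipc has grown it) are made-up values.

```c
#include <assert.h>
#include <stddef.h>

/* Models the second space check: returns 1 if the record was appended,
 * 0 if it was discarded because data plus control do not fit. */
static int model_append_control(size_t *sb_cc, size_t hiwat,
                                size_t dlen, size_t clen)
{
    assert(sb_cc != NULL);
    size_t space = *sb_cc >= hiwat ? 0 : hiwat - *sb_cc;

    if (dlen + clen > space)
        return 0;
    *sb_cc += dlen + clen;
    return 1;
}

/* Returns how many data bytes of the first chunk reach the buffer.
 * clen_seen is the control size used to cut the data; clen_final is
 * the size after the lower layer has modified the control message. */
static size_t model_send_first_chunk(size_t hiwat, size_t resid,
                                     size_t clen_seen, size_t clen_final)
{
    size_t sb_cc = 0;  /* empty send buffer */
    size_t chunk = hiwat > clen_seen ? hiwat - clen_seen : 0;

    if (chunk > resid)
        chunk = resid;
    /* In the scenario described above the caller ignores this result,
     * so a 0 here corresponds to a silently lost chunk. */
    return model_append_control(&sb_cc, hiwat, chunk, clen_final) ? chunk : 0;
}
```

In this model, model_send_first_chunk(8192, 16384, 24, 56) returns 0
because 8168 + 56 exceeds 8192, while the same call with an unchanged
control size (24, 24) delivers all 8168 bytes. Small sends are
unaffected, which matches the observation that only large data triggers
the loss.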

To solve the problem, I propose adding another function to struct
pr_usrreqs:
int     (*pru_finalizecontrol)(struct socket *so, int flags,
            struct mbuf **pcontrol);

This function will be called from sosend_generic and sosend_dgram.
uipc_finalizecontrol will do what unp_internalize and unp_addsockcred
currently do at the uipc level. That lets sosend_generic see the final
version of the control message and split the data into properly sized
pieces when the data is large enough to hit the limit.
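
The effect of such a hook can be illustrated with a simplified
user-space model. Everything here is hypothetical: the real hook would
take a socket, flags and a struct mbuf ** as in the declaration above,
whereas this sketch tracks only a control length, and the 32-byte
growth is an arbitrary stand-in for whatever the local-domain code
would add.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified protocol switch with one optional
 * hook that finalizes the control message before data is cut. */
struct model_usrreqs {
    int (*finalizecontrol)(size_t *clen);  /* may be NULL */
};

/* Stand-in for the local-domain hook: grows the control length by an
 * assumed fixed amount and reports success. */
static int model_uipc_finalizecontrol(size_t *clen)
{
    assert(clen != NULL);
    *clen += 32;
    return 0;
}

/* Cuts the first data chunk only after the hook has produced the final
 * control length, so that chunk + *clen never exceeds space. */
static size_t model_chunk_with_hook(const struct model_usrreqs *pru,
                                    size_t space, size_t resid,
                                    size_t *clen)
{
    if (pru->finalizecontrol != NULL && pru->finalizecontrol(clen) != 0)
        return 0;
    if (*clen >= space)
        return 0;
    size_t chunk = space - *clen;
    return resid < chunk ? resid : chunk;
}
```

With 8192 bytes of space and a 24-byte control message, the hook grows
the control length to 56 and the chunk becomes 8136 bytes, so the
combined record fits. Protocols that leave the hook NULL keep the
current behaviour.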

I thought I had better discuss such an addition to struct pr_usrreqs
first, because it might seem like overkill to add this function just to
solve this one local issue. But there seems to be no other solution
(other than just ignoring the occasionally larger mbuf size).

I can easily make a patch fixing this issue with this new function.

Yuri


