From owner-freebsd-net@FreeBSD.ORG  Wed Sep 21 12:48:46 2005
Return-Path: <owner-freebsd-net@FreeBSD.ORG>
X-Original-To: freebsd-net@freebsd.org
Delivered-To: freebsd-net@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id B6E0D16A41F
	for <freebsd-net@freebsd.org>; Wed, 21 Sep 2005 12:48:46 +0000 (GMT)
	(envelope-from rwatson@FreeBSD.org)
Received: from cyrus.watson.org (cyrus.watson.org [204.156.12.53])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 0DD1343D45
	for <freebsd-net@freebsd.org>; Wed, 21 Sep 2005 12:48:46 +0000 (GMT)
	(envelope-from rwatson@FreeBSD.org)
Received: from fledge.watson.org (fledge.watson.org [204.156.12.50])
	by cyrus.watson.org (Postfix) with ESMTP id 7B3D046B3C;
	Wed, 21 Sep 2005 08:48:45 -0400 (EDT)
Date: Wed, 21 Sep 2005 13:48:45 +0100 (BST)
From: Robert Watson <rwatson@FreeBSD.org>
X-X-Sender: robert@fledge.watson.org
To: =?ISO-8859-1?Q?Sten_Daniel_S=F8rsdal?= <lists@wm-access.no>
In-Reply-To: <4331539D.9030204@wm-access.no>
Message-ID: <20050921134029.M34322@fledge.watson.org>
References: <20050918212110.61962.qmail@web54501.mail.yahoo.com>
	<20050920134408.Y34322@fledge.watson.org>
	<43313924.9050009@wm-access.no>
	<20050921114511.D34322@fledge.watson.org>
	<4331539D.9030204@wm-access.no>
MIME-Version: 1.0
Content-Type: MULTIPART/MIXED; BOUNDARY="0-1545815129-1127306925=:34322"
Cc: freebsd-net@freebsd.org
Subject: Re: UDP dont fragment bit
X-BeenThere: freebsd-net@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Networking and TCP/IP with FreeBSD <freebsd-net.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-net>,
	<mailto:freebsd-net-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-net>
List-Post: <mailto:freebsd-net@freebsd.org>
List-Help: <mailto:freebsd-net-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-net>,
	<mailto:freebsd-net-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 21 Sep 2005 12:48:46 -0000

  This message is in MIME format.  The first part should be readable text,
  while the remaining parts are likely unreadable without MIME-aware tools.

--0-1545815129-1127306925=:34322
Content-Type: TEXT/PLAIN; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: QUOTED-PRINTABLE


On Wed, 21 Sep 2005, Sten Daniel S=F8rsdal wrote:

> Robert Watson wrote:
>>
>> So if someone could generate some application pseudo-code that suggests
>> what specifically is necessary from the socket layer in order for the
>> application to function, we can talk about socket service extensions
>> that might support the application.  For example, a way to query
>> detailed error information rather than just the SO_ERROR socket option.
>> Or a longer haul PMTU data gathering mechanism for UDP sockets.  Or ways
>> for UDP applications to more usefully query the kernel for the TCP PMTU
>> data already being recorded.
>>
>> It sounds like for the bandwidth tester, IP raw sockets already provide
>> what you need, since you want to be able to do fairly irregular UDP
>> things (i.e., receive UDP packets with bad checksums, and see
>> fragments).
>
> IP raw sockets? Sure, Everything can be solved the complicated way :o)
> Some userland applications could benefit from having the option of DF
> flag set/unset.

UDP sockets are defined as being a way to send and receive valid UDP
datagrams.  Your list of things to receive included fragments and invalid
datagrams.  While I agree with your comments below about things UDP
applications want to do, I don't agree that we should teach UDP sockets to
receive UDP datagram fragments or packets with bad checksums.
Applications looking for non-accepted IP packets and complete ICMP
messages should be using the raw socket interface.  Applications looking
for post-processed abstracted interfaces to a datagram service should be
using UDP sockets.  See below for discussion of enhancing UDP sockets.

> What about applications that want to have a way of optimizing UDP
> transfers in their network path?
>
> Some networks filter ICMP and fragments irresponsibly (imho) and
> sometimes the combination of two or more networks causes
> problems for multicast/video/voip applications.
>
> Sometimes in one network UDP packets need fragmenting and in the next
> network fragments need to get reassembled to pass a firewall which in
> turn runs out of reassembling resources. ( It is more common to block
> ICMP messages about reassembly problems than DF problems IF a message is
> generated in the first place. )
>
> Sure, all of this could be fixed the complicated way, but what if one
> already has an application that runs in unprivileged userland? How many
> lines of code would a simple socket option plus the "tuning" code
> require?

You're still not answering my question about application pseudo-code,
however. Adding an IP_DF option to UDP sockets is easy, and can be done in
ten lines of code or less.  Adding a way to provide detailed feedback on
error conditions associated with UDP packets sent at arbitrary points in
the path is not something that falls naturally out of the socket API, and
will require non-trivial amounts of work.  Hence my asking about the
structure and event model of your application: what exactly do you want to
know about UDP packet delivery?
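
As a rough illustration of the easy half, here is a minimal sketch of what
the application side of such an option might look like.  It is not a
proposed interface.  It assumes the option is exposed as IP_DONTFRAG where
the system headers define it, and falls back to IP_MTU_DISCOVER on stacks
that provide that instead.  The helper name udp_set_dont_fragment is made
up for the example.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <errno.h>

/*
 * Hypothetical helper: ask the stack to set (on != 0) or clear (on == 0)
 * the DF bit on datagrams sent from UDP socket s.
 * Returns 0 on success, -1 with errno set on failure.
 */
int
udp_set_dont_fragment(int s, int on)
{
#if defined(IP_DONTFRAG)
	/* Assumption: a boolean per-socket option named IP_DONTFRAG. */
	return (setsockopt(s, IPPROTO_IP, IP_DONTFRAG, &on, sizeof(on)));
#elif defined(IP_MTU_DISCOVER)
	/* Linux-style stacks: DF is a side effect of the PMTU mode. */
	int mode = on ? IP_PMTUDISC_DO : IP_PMTUDISC_DONT;

	return (setsockopt(s, IPPROTO_IP, IP_MTU_DISCOVER, &mode,
	    sizeof(mode)));
#else
	/* No known way to request DF on this platform. */
	(void)s;
	(void)on;
	errno = ENOPROTOOPT;
	return (-1);
#endif
}
```

An unprivileged application would call udp_set_dont_fragment(s, 1) once
after socket(2) and then send as usual; the harder question of what it
learns afterwards is untouched by this.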

Specifically, what information do you as a developer need in order to
handle asynchronous error delivery from UDP packet send, and how will it
affect your application's interaction with the network stack? We can
already deliver a synchronous EMSGSIZE when you try to send a UDP packet
out of an interface with an MTU that is lower than the packet size, given
a socket option to force IP_DF. However, if the packet hits a potential
fragmentation problem out in the wide area network, that notification is
completely asynchronous from packet transmission, and we will need a way
to feed more detailed ICMP information to the application.  Right now
asynchronous error delivery on a UDP socket is already fairly messy due to
the fact that generally applications can only pick up the error when doing
further I/O, confusing the issue of which operation actually generated the
error.
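
To make that ambiguity concrete: one thing an application can do today is
collect any queued error with SO_ERROR before each send, so that whatever
send(2) then reports belongs to the datagram being sent.  The following is
a minimal sketch only.  The names take_pending_error, send_checked and
enum send_status are hypothetical, and it recovers just an errno value,
not the ICMP detail discussed above.  Stacks typically queue such errors
only on connected UDP sockets.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <errno.h>
#include <stddef.h>

enum send_status { SENT, TOO_BIG, EARLIER_ERROR, FAILED };

/*
 * Fetch and clear the error queued on socket s by earlier traffic.
 * Returns 0 if nothing is pending, the pending errno value if there
 * is one, or -1 (with errno set) if getsockopt itself fails.
 */
int
take_pending_error(int s)
{
	int err = 0;
	socklen_t len = sizeof(err);

	if (getsockopt(s, SOL_SOCKET, SO_ERROR, &err, &len) == -1)
		return (-1);
	return (err);
}

/*
 * Send one datagram on s, keeping errors from earlier datagrams
 * apart from errors caused by this send.  *err_out receives the
 * errno value involved, or 0 on success.
 */
enum send_status
send_checked(int s, const void *buf, size_t len, int *err_out)
{
	int pending = take_pending_error(s);

	if (pending > 0) {
		/* Reported late, about a previous datagram. */
		*err_out = pending;
		return (EARLIER_ERROR);
	}
	if (pending < 0) {
		/* Could not even query the socket. */
		*err_out = errno;
		return (FAILED);
	}
	if (send(s, buf, len, 0) == -1) {
		/* Synchronous: caused by this datagram. */
		*err_out = errno;
		return (errno == EMSGSIZE ? TOO_BIG : FAILED);
	}
	*err_out = 0;
	return (SENT);
}
```

A caller would treat TOO_BIG as a cue to shrink the payload and
EARLIER_ERROR as feedback about some previous datagram, without knowing
which one; that missing piece is exactly what the socket API does not
provide.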

Robert N M Watson
--0-1545815129-1127306925=:34322--