From owner-freebsd-net@FreeBSD.ORG Fri Dec 6 21:10:51 2013
Subject: Re: A small fix for if_em.c, if_igb.c, if_ixgbe.c
From: Michael Tuexen
Date: Fri, 6 Dec 2013 22:10:50 +0100
To: John-Mark Gurney
Cc: Yong-Hyeon Pyun, Jack F Vogel, Adrian Chadd, "freebsd-net@freebsd.org list"
In-Reply-To: <20131206202012.GG55638@funkthat.com>
Message-Id: <609C63CD-9332-4EAE-AACE-5B911416DF80@lurchi.franken.de>
List-Id: Networking and TCP/IP with FreeBSD

On Dec 6, 2013, at 9:20 PM, John-Mark Gurney wrote:

> Michael Tuexen wrote this message on Fri,
> Dec 06, 2013 at 21:17 +0100:
>> On Dec 5, 2013, at 11:37 PM, John-Mark Gurney wrote:
>>
>>> Adrian Chadd wrote this message on Thu, Dec 05, 2013 at 14:01 -0800:
>>>> On 5 December 2013 13:05, Michael Tuexen wrote:
>>>>
>>>>> Just to be clear: This would mean that xxx_transmit() would return
>>>>> an error even if the packet provided in the call xxx_transmit() is
>>>>> enqueued and not dropped?
>>>>> This would also be problem with the current SCTP stack.
>>>>
>>>> I think it'll return an error only if:
>>>>
>>>> * it queued the frame to the tail of the drbd;
>>>> * it then tried to transmit a frame from the head of the drbd;
>>>> * it failed to transmit the first frame in the drbd and it couldn't
>>>>   put it back into the queue for whatever reason.
>>>>
>>>> So I think it should be "ok enough" for both TCP and SCTP.
>>>
>>> IMO it should only return an error if the specific frame failed to be
>>> sent or queued. If you cannot determine at return time if the frame
>>> failed to be transmitted/queued, then it should return success.
>> Yes, this is exactly what I think too. This is what my first patch
>> realizes.
>>>
>>> In the above case, if there were other frames queued ahead, and the
>>> first one failed, then it sounds like the frame may eventually be sent
>>> and we will end up sending a duplicate frame.
>> Correct. SCTP will consider the frame even unsent... So the SCTP stack
>> behaves strange and sends a packet at wirespeed over and over again (which
>> is not good...).
>
> Sounds like a bug in SCTP, if it gets an error like that, it needs to back
> off a bit.. Though when to wake up, etc, is harder to decide...

Well, this is what happens:
The sender takes a packet from the send queue and calls ip_output(). Since
it returns an error, we don't move the packet to the sent queue, but leave
it in the send queue (assuming it did not go on the wire). However, the
driver puts it on the wire, it makes it to the peer, the peer sends a SACK,
and we receive the SACK.
Since the packet is not on the sent queue, we don't realize that it is
acked. Receiving a SACK is a trigger for sending a packet. So we take the
next one from the send queue (the same one as at the beginning), and send
it again. So it is a wire-speed ping pong...

So in case the lower layer tells us that there was a problem in sending
the packet, we
* don't consider it sent
* wait for the next normal protocol trigger for sending another packet.
This sounds OK to me...

That is why I need to know what an error from ip_output() means. If I can't
conclude that the provided packet was dropped, I can just consider it sent
and not try to do any optimisation.

Best regards
Michael
>
> --
> John-Mark Gurney				Voice: +1 415 225 5579
>
>      "All that I will do, has been done, All that I have, has not."
>
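
The send-queue/sent-queue behaviour described in the reply above can be modelled with a small toy simulation. This is only a sketch under stated assumptions, not the SCTP stack's code: `misreporting_output()`, `run()`, the packet names and the error value are all hypothetical, and every transmission is assumed to reach the peer and to be answered by a SACK that triggers the next send.

```python
from collections import deque


def misreporting_output(packet, wire):
    """Hypothetical stand-in for ip_output(): the packet reaches the wire,
    but the caller is still told that sending failed."""
    wire.append(packet)
    return 1  # nonzero means "error"; the value is illustrative only


def run(treat_error_as_unsent, sack_triggers=4):
    """Return the list of packets that ended up on the wire."""
    send_queue = deque(["DATA-1", "DATA-2"])
    sent_queue = deque()
    wire = []
    # One initial send, then one send per SACK received from the peer.
    for _ in range(1 + sack_triggers):
        if not send_queue:
            break
        packet = send_queue[0]
        error = misreporting_output(packet, wire)
        if error and treat_error_as_unsent:
            # The packet stays in the send queue. The peer's SACK matches
            # nothing on the sent queue, yet still triggers the next send.
            continue
        # Otherwise the packet is considered sent despite the error.
        send_queue.popleft()
        sent_queue.append(packet)
        # The peer's SACK acknowledges the packet on the sent queue.
        sent_queue.popleft()
    return wire


if __name__ == "__main__":
    print(run(treat_error_as_unsent=True))
    print(run(treat_error_as_unsent=False))
```

In this model, treating the error as "unsent" puts DATA-1 on the wire once per trigger, which is the ping pong described above. Considering the packet sent whenever a drop cannot be concluded sends each packet exactly once.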