From owner-freebsd-net@FreeBSD.ORG Mon Dec 2 02:23:45 2013
From: Yonghyeon PYUN
Date: Mon, 2 Dec 2013 11:23:38 +0900
To: Michael Tuexen
Cc: Jack F Vogel, "freebsd-net@freebsd.org list"
Subject: Re: A small fix for if_em.c, if_igb.c, if_ixgbe.c
Message-ID: <20131202022338.GA3500@michelle.cdnetworks.com>
References: <521B9C2A-EECC-4412-9F68-2235320EF324@lurchi.franken.de>
In-Reply-To: <521B9C2A-EECC-4412-9F68-2235320EF324@lurchi.franken.de>
Reply-To: pyunyh@gmail.com

On Fri, Nov 29, 2013 at 06:24:12PM +0100, Michael Tuexen wrote:
> Dear all,
>
> ifnet(9) says regarding if_transmit():
>
> Transmit a packet on an interface or queue it if the interface is
> in use. This function will return ENOBUFS if the devices software
> and hardware queues are both full.
>
> The drivers for em, igb and ixgbe might also return an error even
> in the case the packet was enqueued. The attached patches fix this
> issue.

How do you know the packet was successfully enqueued when the driver
returns an error? Do non-buf-ring-aware drivers also show the same
behavior?

>
> Any comments?

I'm afraid the patch you posted ignores any errors (e.g. from
m_defrag(9), bus_dma(9), etc.) that happened during TX processing.

>
> Jack: What do you think? Would you prefer to commit the fix if
> you think it is acceptable?
>
> Best regards
> Michael

From owner-freebsd-net@FreeBSD.ORG Mon Dec 2 04:17:30 2013
From: Sepherosa Ziehau
Date: Mon, 2 Dec 2013 12:17:26 +0800
To: Ermal Luçi
Cc: freebsd-net, Oleg Moskalenko, Tim Kientzle, "freebsd-current@freebsd.org"
Subject: Re: [PATCH] SO_REUSEADDR and SO_REUSEPORT behaviour
References: <4053E074-EDC5-49AB-91A7-E50ABE36602E@freebsd.org>

On Sat, Nov 30, 2013 at 2:42 AM, Ermal Luçi wrote:

> Well seems Dragonfly has some version of it already from commit [1].
>

The distribution algorithm was changed a little bit after the initial
commit to gain more idle time (bnx(4) output has already been maxed out):
http://gitweb.dragonflybsd.org/dragonfly.git/commitdiff/c275f18d832361be28b150d3f4fd518914bdeba6

Well, I also addressed a reasonable concern from the nginx folks (I am not
quite sure about Linux's position on it; Linux's original implementation
of SO_REUSEPORT from Google had this drawback, which I mentioned in the
commit message):
http://gitweb.dragonflybsd.org/dragonfly.git/commitdiff/02ad2f0b874fb0a45eb69750219f79f5e8982272

As for nginx, the SO_REUSEPORT patch for nginx (both 1.4.x and 1.5.x) is
in dports, which should make it easier to backport to FreeBSD's ports. I
failed to convince the nginx folks to merge it into mainline, and I am
currently busy with other things; I will come back to it later. If FreeBSD
is going to implement the Linux style of SO_REUSEPORT, pushing the patch
to the nginx mainline will be easier.

I also put up a brief description of SO_REUSEPORT in dfly; it may be
useful to you:
http://leaf.dragonflybsd.org/~sephe/netisr_so_reuseport.txt

Best Regards,
sephe

> In FreeBSD there is the framework for this with by defining PCBGROUP.
> Also the explanation of it at [2] and [3].
> It can achieve approximately the same features of SO_RESUSEPORT of linux.
> The only thing missing is the marketing behind it and i think and better
> RSS support.
> By looking at dates the support is there before linux so all you guys
> looking for it can experiment with it.
>
> What i was trying to accomplish was something else from performance
> improvement and
> maybe put a sysctl behind it to make it more acceptable..
>
> [1]
> http://gitweb.dragonflybsd.org/dragonfly.git/commitdiff/740d1d9f7b7bf9c9c021abb8197718d7a2d441c9
> [2]
> http://fxr.watson.org/fxr/source/netinet/in_pcbgroup.c?im=bigexcerpts#L51
> [3] http://lists.freebsd.org/pipermail/svn-src-head/2011-June/028190.html
>
>
> On Fri, Nov 29, 2013 at 7:03 PM, Oleg Moskalenko wrote:
>
> > Tim, you are wrong. Read what is "multicast" definition, and read how UDP
> > and TCP sockets work in Linux 3.9+ kernels.
> >
> > Oleg .
> >
> >
> > On Fri, Nov 29, 2013 at 9:59 AM, Tim Kientzle wrote:
> >
> >>
> >> On Nov 29, 2013, at 4:04 AM, Ermal Luçi wrote:
> >>
> >> > Hello,
> >> >
> >> > since SO_REUSEADDR and SO_REUSEPORT are supposed to allow two daemons to
> >> > share the same port and possibly listening ip …
> >>
> >> These flags are used with TCP-based servers.
> >>
> >> I’ve used them to make software upgrades go more smoothly.
> >> Without them, the following often happens:
> >>
> >> * Old server stops. In the process, all of its TCP connections are
> >> closed.
> >>
> >> * Connections to old server remain in the TCP connection table until the
> >> remote end can acknowledge.
> >>
> >> * New server starts.
> >>
> >> * New server tries to open port but fails because that port is “still in
> >> use” by connections in the TCP connection table.
> >>
> >> With these flags, the new server can open the port even though
> >> it is “still in use” by existing connections.
> >>
> >>
> >> > This is not the case today.
> >> > Only multicast sockets seem to have the behaviour of broadcasting the data
> >> > to all sockets sharing the same properties through these options!
> >>
> >> That is what multicast is for.
> >>
> >> If you want the same data sent to all listeners, then
> >> that is multicast behavior and you should be using
> >> a multicast socket.
> >>
> >> > The patch at [1] implements/corrects the behaviour for UDP sockets.
> >>
> >> You’re trying to turn all UDP sockets with those options
> >> into multicast sockets.
> >>
> >> If you want a multicast socket, you should ask for one.
> >>
> >> Tim
> >>
> >> _______________________________________________
> >> freebsd-net@freebsd.org mailing list
> >> http://lists.freebsd.org/mailman/listinfo/freebsd-net
> >> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"
> >>
> >
>
> --
> Ermal
> _______________________________________________
> freebsd-current@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-current
> To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"
>

--
Tomorrow Will Never Die

From owner-freebsd-net@FreeBSD.ORG Mon Dec 2 04:29:25 2013
From: Oleg Moskalenko
Date: Sun, 1 Dec 2013 20:29:24 -0800
To: Sepherosa Ziehau
Cc: Ermal Luçi, freebsd-net, Tim Kientzle, "freebsd-current@freebsd.org"
Subject: Re: [PATCH] SO_REUSEADDR and SO_REUSEPORT behaviour
References: <4053E074-EDC5-49AB-91A7-E50ABE36602E@freebsd.org>

Sepherosa,

while reading your description I noticed another long-standing problem
for UDP application developers: UDP sockets are always hashed with a
2-tuple. But UDP sockets can be "connected", too, to a remote address,
with the connect(...) function. Unfortunately, with 2-tuple hashing, that
pattern is useless for large-scale applications: if a large number of UDP
sockets on the same local port are "connected" to remote addresses, then
the kernel has to go through the long list of UDP sockets with the same
hash value.

If the connected UDP sockets used 4-tuples, it would be very helpful for
the new generation of UDP-based media applications. For example, servers
that use the DTLS protocol would become simpler and more efficient.
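
[Editorial illustration, not part of the original message.] The cost
described above can be sketched with a toy model. This is not kernel code:
the table layout, the bucket count, the tuple layout and the names
build_table and longest_chain are all assumptions made for this sketch. It
only shows why keying on (local address, local port) puts every connected
socket on one hash chain, while keying on the full 4-tuple spreads them out.

```python
from collections import defaultdict


def build_table(sockets, key_fn, nbuckets=1024):
    """Toy hash table: bucket index -> list of sockets (one hash chain)."""
    table = defaultdict(list)
    for sock in sockets:
        table[hash(key_fn(sock)) % nbuckets].append(sock)
    return table


def longest_chain(table):
    """Worst-case number of entries a single lookup may have to walk."""
    return max(len(chain) for chain in table.values())


# Hypothetical workload: 5000 "connected" UDP sockets sharing one local
# address and port, each with a distinct remote endpoint.
# Tuple layout (an assumption of this sketch):
#   (local_addr, local_port, remote_addr, remote_port)
sockets = [
    ("192.0.2.1", 3478, "198.51.100.%d" % (i % 250 + 1), 10000 + i)
    for i in range(5000)
]

# Key on local address and port only: every socket has the same key.
two_tuple = build_table(sockets, lambda s: (s[0], s[1]))
# Key on the full 4-tuple: every socket has a distinct key.
four_tuple = build_table(sockets, lambda s: s)

print("2-tuple longest chain:", longest_chain(two_tuple))
print("4-tuple longest chain:", longest_chain(four_tuple))
```

With the 2-tuple key all 5000 sockets end up in a single bucket, so a
lookup walks the whole list. With the 4-tuple key the chains stay short;
the exact length varies from run to run because Python randomizes string
hashing.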

Thanks
Oleg

On Sun, Dec 1, 2013 at 8:17 PM, Sepherosa Ziehau wrote:

>
> On Sat, Nov 30, 2013 at 2:42 AM, Ermal Luçi wrote:
>
>> Well seems Dragonfly has some version of it already from commit [1].
>>
>
> The distribution algorithm was changed a little bit after initial commit
> to gain more idle time (bnx(4) output has already been maxed out):
>
> http://gitweb.dragonflybsd.org/dragonfly.git/commitdiff/c275f18d832361be28b150d3f4fd518914bdeba6
>
> Well, I also addressed a reasonable concern from nginx folks (I am not
> quite sure about Linux's position on it; Linux original implementation of
> SO_REUSEPORT from Google had this drawback, which I mentioned in the commit
> message):
>
> http://gitweb.dragonflybsd.org/dragonfly.git/commitdiff/02ad2f0b874fb0a45eb69750219f79f5e8982272
>
> As about nginx, SO_REUSEPORT patch for nginx (both 1.4.x and 1.5.x) is in
> dports; should be easier to be back ported to FreeBSD's ports. I failed to
> convince nginx folks to merge it into mainline and I am currently onto
> other stuffs, will come back to them later. If FreeBSD is going to
> implement Linux's style of SO_REUSEPORT, pushing the patch to the nginx
> mainline will be easier.
>
> I also put up a brief description of SO_REUSEPORT in dfly; may be useful
> to you:
> http://leaf.dragonflybsd.org/~sephe/netisr_so_reuseport.txt
>
> Best Regards,
> sephe
>
>
>> In FreeBSD there is the framework for this with by defining PCBGROUP.
>> Also the explanation of it at [2] and [3].
>> It can achieve approximately the same features of SO_RESUSEPORT of linux.
>> The only thing missing is the marketing behind it and i think and better
>> RSS support.
>> By looking at dates the support is there before linux so all you guys
>> looking for it can experiment with it.
>>
>> What i was trying to accomplish was something else from performance
>> improvement and
>> maybe put a sysctl behind it to make it more acceptable..
>>
>> [1]
>> http://gitweb.dragonflybsd.org/dragonfly.git/commitdiff/740d1d9f7b7bf9c9c021abb8197718d7a2d441c9
>> [2]
>> http://fxr.watson.org/fxr/source/netinet/in_pcbgroup.c?im=bigexcerpts#L51
>> [3] http://lists.freebsd.org/pipermail/svn-src-head/2011-June/028190.html
>>
>>
>> On Fri, Nov 29, 2013 at 7:03 PM, Oleg Moskalenko wrote:
>>
>> > Tim, you are wrong. Read what is "multicast" definition, and read how UDP
>> > and TCP sockets work in Linux 3.9+ kernels.
>> >
>> > Oleg .
>> >
>> >
>> > On Fri, Nov 29, 2013 at 9:59 AM, Tim Kientzle wrote:
>> >
>> >>
>> >> On Nov 29, 2013, at 4:04 AM, Ermal Luçi wrote:
>> >>
>> >> > Hello,
>> >> >
>> >> > since SO_REUSEADDR and SO_REUSEPORT are supposed to allow two daemons to
>> >> > share the same port and possibly listening ip …
>> >>
>> >> These flags are used with TCP-based servers.
>> >>
>> >> I’ve used them to make software upgrades go more smoothly.
>> >> Without them, the following often happens:
>> >>
>> >> * Old server stops. In the process, all of its TCP connections are
>> >> closed.
>> >>
>> >> * Connections to old server remain in the TCP connection table until the
>> >> remote end can acknowledge.
>> >>
>> >> * New server starts.
>> >>
>> >> * New server tries to open port but fails because that port is “still in
>> >> use” by connections in the TCP connection table.
>> >>
>> >> With these flags, the new server can open the port even though
>> >> it is “still in use” by existing connections.
>> >>
>> >>
>> >> > This is not the case today.
>> >> > Only multicast sockets seem to have the behaviour of broadcasting the data
>> >> > to all sockets sharing the same properties through these options!
>> >>
>> >> That is what multicast is for.
>> >>
>> >> If you want the same data sent to all listeners, then
>> >> that is multicast behavior and you should be using
>> >> a multicast socket.
>> >>
>> >> > The patch at [1] implements/corrects the behaviour for UDP sockets.
>> >>
>> >> You’re trying to turn all UDP sockets with those options
>> >> into multicast sockets.
>> >>
>> >> If you want a multicast socket, you should ask for one.
>> >>
>> >> Tim
>> >>
>> >> _______________________________________________
>> >> freebsd-net@freebsd.org mailing list
>> >> http://lists.freebsd.org/mailman/listinfo/freebsd-net
>> >> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"
>> >>
>> >
>>
>> --
>> Ermal
>> _______________________________________________
>> freebsd-current@freebsd.org mailing list
>> http://lists.freebsd.org/mailman/listinfo/freebsd-current
>> To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"
>>
>
> --
> Tomorrow Will Never Die
>

From owner-freebsd-net@FreeBSD.ORG Mon Dec 2 05:02:53 2013
From: Adrian Chadd
Sender: adrian.chadd@gmail.com
Date: Sun, 1 Dec 2013 21:02:52 -0800
To: Sepherosa Ziehau
Cc: Ermal Luçi, freebsd-net, Oleg Moskalenko, Tim Kientzle, "freebsd-current@freebsd.org"
Subject: Re: [PATCH] SO_REUSEADDR and SO_REUSEPORT behaviour
References: <4053E074-EDC5-49AB-91A7-E50ABE36602E@freebsd.org>

Hi!

Thanks for the writeup!

On 1 December 2013 20:17, Sepherosa Ziehau wrote:
> I also put up a brief description of SO_REUSEPORT in dfly; may be useful to
> you:
> http://leaf.dragonflybsd.org/~sephe/netisr_so_reuseport.txt

Ok, so given this, how do you guarantee the UTHREAD stays on the given
CPU? You assume it stays on the CPU that the initial listen socket was
created on, right? If it's migrated to another CPU core then the
listen queue still stays in the original hash group that's in a netisr
on a different CPU?

-adrian

From owner-freebsd-net@FreeBSD.ORG Mon Dec 2 08:02:19 2013
From: Yuri
Date: Mon, 02 Dec 2013 00:02:12 -0800
To: net@freebsd.org
Subject: DIOCNATLOOK fails with ipfw
Message-ID: <529C3E84.1030203@rawbw.com>

I have an app with a transparent proxy that should intercept all TCP
connections on the interface.
This is done with an ipfw(8) rule like this:

    ipfw add 200 fwd 192.168.10.1,15020 tcp from 192.168.10.0/24 to any 80 keep-state

The transparent proxy is on 192.168.10.1:15020.

The proxy accepts the connections; however, it is using /dev/pf to get the
original destination, and the lookup procedure fails:

    ioctl(DIOCNATLOOK) failed: No such file or directory

It fails because nobody ever calls pf_state_insert. I see from the source
that the ioctl to add the pf_state is DIOCSTART, which is issued by
pfctl(8), but I am not using pfctl(8) at all.

My questions are:
What is the relationship between ipfw(8) and pfctl(8)? Do they do the same
thing? Why are there two of them?
If I only use ipfw, is there a way for the acceptor to find what the
original destination was without /dev/pf?

Yuri

From owner-freebsd-net@FreeBSD.ORG Mon Dec 2 08:42:48 2013
From: Hooman Fazaeli
Date: Mon, 02 Dec 2013 12:12:41 +0330
To: Yuri
Cc: net@freebsd.org
Subject: Re: DIOCNATLOOK fails with ipfw
Message-ID: <529C4801.3010000@gmail.com>
References: <529C3E84.1030203@rawbw.com>
In-Reply-To: <529C3E84.1030203@rawbw.com>

On 12/2/2013 11:32 AM, Yuri wrote:
> I have an app with transparent proxy that should intercept all TCP connections in the interface.
> This is done with ipfw(8) rule like this:
> ipfw add 200 fwd 192.168.10.1,15020 tcp from 192.168.10.0/24 to any 80 keep-state
> Transparent proxy is on 192.168.10.1:15020
>
> Proxy accepts the connections, however, it is using /dev/pf to get the original destination and the lookup procedure fails:
> ioctl(DIOCNATLOOK) failed: No such file or directory
> It fails because nobody ever calls pf_state_insert. I see from the source that ioctl to add the pf_state is DIOCSTART, which is issued by pfctl(8), but I am not using pfctl(8) at all.
>
> My questions are:
> What is the relationship between ipfw(8) and pfctl(8)? Do they do the same? Why two of them?
> If I only use ipfw, is there a way for the acceptor to find what the original destination was without /dev/pf?
>
> Yuri
> _______________________________________________

ipfw and pf are two completely separate firewalls. You cannot use
/dev/pf to control or query ipfw.

Use getsockname(2) to find out the original destination address with ipfw.

--
Best regards.
Hooman Fazaeli

From owner-freebsd-net@FreeBSD.ORG Mon Dec 2 11:06:50 2013
From: FreeBSD bugmaster
Date: Mon, 2 Dec 2013 11:06:49 GMT
To: freebsd-net@FreeBSD.org
Subject: Current problem reports assigned to freebsd-net@FreeBSD.org
Message-Id: <201312021106.rB2B6nZJ007796@freefall.freebsd.org>

Note: to view an individual PR, use:
  http://www.freebsd.org/cgi/query-pr.cgi?pr=(number).

The following is a listing of current problems submitted by FreeBSD users.
These represent problem reports covering all versions including
experimental development code and obsolete releases.

S Tracker      Resp.  Description
--------------------------------------------------------------------------------
o kern/184311  net    [bge] [panic] kernel panic with bge(4) on SunFire X210
o kern/184084  net    [ral] kernel crash by ral (RT3090)
o bin/183687   net    [patch] route(8): route add -net 172.20 add wrong host
o kern/183659  net    [tcp] ]TCP stack lock contention with short-lived conn
o conf/183407  net    [rc.d] [patch] Routing restart returns non-zero exitco
o kern/183391  net    [oce] 10gigabit networking problems with Emulex OCE 11
o kern/183390  net    [ixgbe] 10gigabit networking problems
o kern/182917  net    [igb] strange out traffic with igb interfaces
o kern/182847  net    [netinet6] [patch] Remove dead code
o kern/182665  net    [wlan] Kernel panic when creating second wlandev.
o kern/182382  net    [tcp] sysctl to set TCP CC method on BIG ENDIAN system
o kern/182297  net    [cm] ArcNet driver fails to detect the link address -
o kern/182212  net    [patch] [ng_mppc] ng_mppc(4) blocks on network errors
o kern/181970  net    [re] LAN Realtek® 8111G is not supported by re driver
o kern/181931  net    [vlan] [lagg] vlan over lagg over mlxen crashes the ke
o kern/181823  net    [ip6] [patch] make ipv6 mroute return same errror code
o kern/181741  net    [kernel] [patch] Packet loss when 'control' messages a
o kern/181703  net    [re] [patch] Fix Realtek 8111G Ethernet controller not
o kern/181657  net    [bpf] [patch] BPF_COP/BPF_COPX instruction reservation
o kern/181257  net    [bge] bge link status change
o kern/181236  net    [igb] igb driver unstable work
o kern/181135  net    [netmap] [patch] sys/dev/netmap patch for Linux compat
o kern/181131  net    [netmap] [patch] sys/dev/netmap memory allocation impr
o kern/181006  net    [run] [patch] mbuf leak in run(4) driver
o kern/180893  net    [if_ethersubr] [patch] Packets received with own LLADD
o kern/180844  net    [panic] [re] Intermittent panic (re driver?)
o kern/180775 net [bxe] if_bxe driver broken with Broadcom BCM57711 card
o kern/180722 net [bluetooth] bluetooth takes 30-50 attempts to pair to
s kern/180468 net [request] LOCAL_PEERCRED support for PF_INET
o kern/180065 net [netinet6] [patch] Multicast loopback to own host brok
o kern/179926 net [lacp] [patch] active aggregator selection bug
o kern/179824 net [ixgbe] System (9.1-p4) hangs on heavy ixgbe network t
o kern/179733 net [lagg] [patch] interface loses capabilities when proto
o kern/179429 net [tap] STP enabled tap bridge
o kern/179299 net [igb] Intel X540-T2 - unstable driver
a kern/179264 net [vimage] [pf] Core dump with Packet filter and VIMAGE
o kern/178947 net [arp] arp rejecting not working
o kern/178782 net [ixgbe] 82599EB SFP does not work with passthrough und
o kern/178612 net [run] kernel panic due the problems with run driver
o kern/178472 net [ip6] [patch] make return code consistent with IPv4 co
o kern/178079 net [tcp] Switching TCP CC algorithm panics on sparc64 wit
s kern/178071 net FreeBSD unable to recongize Kontron (Industrial Comput
o kern/177905 net [xl] [panic] ifmedia_set when pluging CardBus LAN card
o kern/177618 net [bridge] Problem with bridge firewall with trunk ports
o kern/177402 net [igb] [pf] problem with ethernet driver igb + pf / alt
o kern/177400 net [jme] JMC25x 1000baseT establishment issues
o kern/177366 net [ieee80211] negative malloc(9) statistics for 80211nod
f kern/177362 net [netinet] [patch] Wrong control used to return TOS
o kern/177194 net [netgraph] Unnamed netgraph nodes for vlan interfaces
o kern/177184 net [bge] [patch] enable wake on lan
o kern/177139 net [igb] igb drops ethernet ports 2 and 3
o kern/176884 net [re] re0 flapping up/down
o kern/176671 net [epair] MAC address for epair device not unique
o kern/176484 net [ipsec] [enc] [patch] panic: IPsec + enc(4); device na
o kern/176446 net [netinet] [patch] Concurrency in ixgbe driving out-of-
o kern/176420 net [kernel] [patch] incorrect errno for LOCAL_PEERCRED
o kern/176419 net [kernel] [patch] socketpair support for LOCAL_PEERCRED
o kern/176401 net [netgraph] page fault in netgraph
o kern/176167 net [ipsec][lagg] using lagg and ipsec causes immediate pa
o kern/176027 net [em] [patch] flow control systcl consistency for em dr
o kern/176026 net [tcp] [patch] TCP wrappers caused quite a lot of warni
o kern/175864 net [re] Intel MB D510MO, onboard ethernet not working aft
o kern/175852 net [amd64] [patch] in_cksum_hdr() behaves differently on
o kern/175734 net no ethernet detected on system with EG20T PCH chipset
o kern/175267 net [pf] [tap] pf + tap keep state problem
o kern/175236 net [epair] [gif] epair and gif Devices On Bridge
o kern/175182 net [panic] kernel panic on RADIX_MPATH when deleting rout
o kern/175153 net [tcp] will there miss a FIN when do TSO?
o kern/174959 net [net] [patch] rnh_walktree_from visits spurious nodes
o kern/174958 net [net] [patch] rnh_walktree_from makes unreasonable ass
o kern/174897 net [route] Interface routes are broken
o kern/174851 net [bxe] [patch] UDP checksum offload is wrong in bxe dri
o kern/174850 net [bxe] [patch] bxe driver does not receive multicasts
o kern/174849 net [bxe] [patch] bxe driver can hang kernel when reset
o kern/174822 net [tcp] Page fault in tcp_discardcb under high traffic
o kern/174602 net [gif] [ipsec] traceroute issue on gif tunnel with ipse
o kern/174535 net [tcp] TCP fast retransmit feature works strange
o kern/173871 net [gif] process of 'ifconfig gif0 create hangs' when if_
o kern/173475 net [tun] tun(4) stays opened by PID after process is term
o kern/173201 net [ixgbe] [patch] Missing / broken ixgbe sysctl's and tu
o kern/173137 net [em] em(4) unable to run at gigabit with 9.1-RC2
o kern/173002 net [patch] data type size problem in if_spppsubr.c
o kern/172895 net [ixgb] [ixgbe] do not properly determine link-state
o kern/172683 net [ip6] Duplicate IPv6 Link Local Addresses
o kern/172675 net [netinet] [patch] sysctl_tcp_hc_list (net.inet.tcp.hos
p kern/172113 net [panic] [e1000] [patch] 9.1-RC1/amd64 panices in igb(4
o kern/171840 net [ip6] IPv6 packets transmitting only on queue 0
o kern/171739 net [bce] [panic] bce related kernel panic
o kern/171711 net [dummynet] [panic] Kernel panic in dummynet
o kern/171532 net [ndis] ndis(4) driver includes 'pccard'-specific code,
o kern/171531 net [ndis] undocumented dependency for ndis(4)
o kern/171524 net [ipmi] ipmi driver crashes kernel by reboot or shutdow
s kern/171508 net [epair] [request] Add the ability to name epair device
o kern/171228 net [re] [patch] if_re - eeprom write issues
o kern/170701 net [ppp] killl ppp or reboot with active ppp connection c
o kern/170267 net [ixgbe] IXGBE_LE32_TO_CPUS is probably an unintentiona
o kern/170081 net [fxp] pf/nat/jails not working if checksum offloading
o kern/169898 net ifconfig(8) fails to set MTU on multiple interfaces.
o kern/169676 net [bge] [hang] system hangs, fully or partially after re
o kern/169620 net [ng] [pf] ng_l2tp incoming packet bypass pf firewall
o kern/169459 net [ppp] umodem/ppp/3g stopped working after update from
o kern/169438 net [ipsec] ipv4-in-ipv6 tunnel mode IPsec does not work
p kern/168294 net [ixgbe] [patch] ixgbe driver compiled in kernel has no
o kern/168246 net [em] Multiple em(4) not working with qemu
o kern/168245 net [arp] [regression] Permanent ARP entry not deleted on
o kern/168244 net [arp] [regression] Unable to manually remove permanent
o kern/168183 net [bce] bce driver hang system
o kern/167603 net [ip] IP fragment reassembly's broken: file transfer ov
o kern/167500 net [em] [panic] Kernel panics in em driver
o kern/167325 net [netinet] [patch] sosend sometimes return EINVAL with
o kern/167202 net [igmp]: Sending multiple IGMP packets crashes kernel
o kern/166462 net [gre] gre(4) when using a tunnel source address from c
o kern/166285 net [arp] FreeBSD v8.1 REL p8 arp: unknown hardware addres
o kern/166255 net [net] [patch] It should be possible to disable "promis
p kern/165903 net mbuf leak
o kern/165622 net [ndis][panic][patch] Unregistered use of FPU in kernel
s kern/165562 net [request] add support for Intel i350 in FreeBSD 7.4
o kern/165526 net [bxe] UDP packets checksum calculation whithin if_bxe
o kern/165488 net [ppp] [panic] Fatal trap 12 jails and ppp , kernel wit
o kern/165305 net [ip6] [request] Feature parity between IP_TOS and IPV6
o kern/165296 net [vlan] [patch] Fix EVL_APPLY_VLID, update EVL_APPLY_PR
o kern/165181 net [igb] igb freezes after about 2 weeks of uptime
o kern/165174 net [patch] [tap] allow tap(4) to keep its address on clos
o kern/165152 net [ip6] Does not work through the issue of ipv6 addresse
o kern/164495 net [igb] connect double head igb to switch cause system t
o kern/164490 net [pfil] Incorrect IP checksum on pfil pass from ip_outp
o kern/164475 net [gre] gre misses RUNNING flag after a reboot
o kern/164265 net [netinet] [patch] tcp_lro_rx computes wrong checksum i
o kern/163903 net [igb] "igb0:tx(0)","bpf interface lock" v2.2.5 9-STABL
o kern/163481 net freebsd do not add itself to ping route packet
o kern/162927 net [tun] Modem-PPP error ppp[1538]: tun0: Phase: Clearing
o kern/162558 net [dummynet] [panic] seldom dummynet panics
o kern/162153 net [em] intel em driver 7.2.4 don't compile
o kern/162110 net [igb] [panic] RELENG_9 panics on boot in IGB driver -
o kern/162028 net [ixgbe] [patch] misplaced #endif in ixgbe.c
o kern/161277 net [em] [patch] BMC cannot receive IPMI traffic after loa
o kern/160873 net [igb] igb(4) from HEAD fails to build on 7-STABLE
o kern/160750 net Intel PRO/1000 connection breaks under load until rebo
o kern/160693 net [gif] [em] Multicast packet are not passed from GIF0 t
o kern/160293 net [ieee80211] ppanic] kernel panic during network setup
o kern/160206 net [gif] gifX stops working after a while (IPv6 tunnel)
o kern/159817 net [udp] write UDPv4: No buffer space available (code=55)
o kern/159629 net [ipsec] [panic] kernel panic with IPsec in transport m
o kern/159621 net [tcp] [panic] panic: soabort: so_count
o kern/159603 net [netinet] [patch] in_ifscrubprefix() - network route c
o kern/159601 net [netinet] [patch] in_scrubprefix() - loopback route re
o kern/159294 net [em] em watchdog timeouts
o kern/159203 net [wpi] Intel 3945ABG Wireless LAN not support IBSS
o kern/158930 net [bpf] BPF element leak in ifp->bpf_if->bif_dlist
o kern/158726 net [ip6] [patch] ICMPv6 Router Announcement flooding limi
o kern/158694 net [ix] [lagg] ix0 is not working within lagg(4)
o kern/158665 net [ip6] [panic] kernel pagefault in in6_setscope()
o kern/158635 net [em] TSO breaks BPF packet captures with em driver
f kern/157802 net [dummynet] [panic] kernel panic in dummynet
o kern/157785 net amd64 + jail + ipfw + natd = very slow outbound traffi
o kern/157418 net [em] em driver lockup during boot on Supermicro X9SCM-
o kern/157410 net [ip6] IPv6 Router Advertisements Cause Excessive CPU U
o kern/157287 net [re] [panic] INVARIANTS panic (Memory modified after f
o kern/157200 net [network.subr] [patch] stf(4) can not communicate betw
o kern/157182 net [lagg] lagg interface not working together with epair
o kern/156877 net [dummynet] [panic] dummynet move_pkt() null ptr derefe
o kern/156667 net [em] em0 fails to init on CURRENT after March 17
o kern/156408 net [vlan] Routing failure when using VLANs vs. Physical e
o kern/156328 net [icmp]: host can ping other subnet but no have IP from
o kern/156317 net [ip6] Wrong order of IPv6 NS DAD/MLD Report
o kern/156279 net [if_bridge][divert][ipfw] unable to correctly re-injec
o kern/156226 net [lagg]: failover does not announce the failover to swi
o kern/156030 net [ip6] [panic] Crash in nd6_dad_start() due to null ptr
o kern/155680 net [multicast] problems with multicast
s kern/155642 net [new driver] [request] Add driver for Realtek RTL8191S
o kern/155597 net [panic] Kernel panics with "sbdrop" message
o kern/155420 net [vlan] adding vlan break existent vlan
o kern/155177 net [route] [panic] Panic when inject routes in kernel
o kern/155010 net [msk] ntfs-3g via iscsi using msk driver cause kernel
o kern/154943 net [gif] ifconfig gifX create on existing gifX clears IP
s kern/154851 net [new driver] [request]: Port brcm80211 driver from Lin
o kern/154850 net [netgraph] [patch] ng_ether fails to name nodes when t
o kern/154679 net [em] Fatal trap 12: "em1 taskq" only at startup (8.1-R
o kern/154600 net [tcp] [panic] Random kernel panics on tcp_output
o kern/154557 net [tcp] Freeze tcp-session of the clients, if in the gat
o kern/154443 net [if_bridge] Kernel module bridgestp.ko missing after u
o kern/154286 net [netgraph] [panic] 8.2-PRERELEASE panic in netgraph
o kern/154255 net [nfs] NFS not responding
o kern/154214 net [stf] [panic] Panic when creating stf interface
o kern/154185 net race condition in mb_dupcl
p kern/154169 net [multicast] [ip6] Node Information Query multicast add
o kern/154134 net [ip6] stuck kernel state in LISTEN on ipv6 daemon whic
o kern/154091 net [netgraph] [panic] netgraph, unaligned mbuf?
o conf/154062 net [vlan] [patch] change to way of auto-generatation of v
o kern/153937 net [ral] ralink panics the system (amd64 freeBSDD 8.X) wh
o kern/153936 net [ixgbe] [patch] MPRC workaround incorrectly applied to
o kern/153816 net [ixgbe] ixgbe doesn't work properly with the Intel 10g
o kern/153772 net [ixgbe] [patch] sysctls reference wrong XON/XOFF varia
o kern/153497 net [netgraph] netgraph panic due to race conditions
o kern/153454 net [patch] [wlan] [urtw] Support ad-hoc and hostap modes
o kern/153308 net [em] em interface use 100% cpu
o kern/153244 net [em] em(4) fails to send UDP to port 0xffff
o kern/152893 net [netgraph] [panic] 8.2-PRERELEASE panic in netgraph
o kern/152853 net [em] tftpd (and likely other udp traffic) fails over e
o kern/152828 net [em] poor performance on 8.1, 8.2-PRE
o kern/152569 net [net]: Multiple ppp connections and routing table prob
o kern/152235 net [arp] Permanent local ARP entries are not properly upd
o kern/152141 net [vlan] [patch] encapsulate vlan in ng_ether before out
o kern/152036 net [libc] getifaddrs(3) returns truncated sockaddrs for n
o kern/151690 net [ep] network connectivity won't work until dhclient is
o kern/151681 net [nfs] NFS mount via IPv6 leads to hang on client with
o kern/151593 net [igb] [panic] Kernel panic when bringing up igb networ
o kern/150920 net [ixgbe][igb] Panic when packets are dropped with heade
o kern/150557 net [igb] igb0: Watchdog timeout -- resetting
o kern/150251 net [patch] [ixgbe] Late cable insertion broken
o kern/150249 net [ixgbe] Media type detection broken
o bin/150224 net ppp(8) does not reassign static IP after kill -KILL co
f kern/149969 net [wlan] [ral] ralink rt2661 fails to maintain connectio
o kern/149643 net [rum] device not sending proper beacon frames in ap mo
o kern/149609 net [panic] reboot after adding second default route
o kern/149117 net [inet] [patch] in_pcbbind: redundant test
o kern/149086 net [multicast] Generic multicast join failure in 8.1
o kern/148018 net [flowtable] flowtable crashes on ia64
o kern/147912 net [boot] FreeBSD 8 Beta won't boot on Thinkpad i1300 11
o kern/147894 net [ipsec] IPv6-in-IPv4 does not work inside an ESP-only
o kern/147155 net [ip6] setfb not work with ipv6
o kern/146845 net [libc] close(2) returns error 54 (connection reset by
f kern/146792 net [flowtable] flowcleaner 100% cpu's core load
o kern/146719 net [pf] [panic] PF or dumynet kernel panic
o kern/146534 net [icmp6] wrong source address in echo reply
o kern/146427 net [mwl] Additional virtual access points don't work on m
f kern/146394 net [vlan] IP source address for outgoing connections
o bin/146377 net [ppp] [tun] Interface doesn't clear addresses when PPP
o kern/146358 net [vlan] wrong destination MAC address
o kern/146165 net [wlan] [panic] Setting bssid in adhoc mode causes pani
o kern/146082 net [ng_l2tp] a false invaliant check was performed in ng_
o kern/146037 net [panic] mpd + CoA = kernel panic
o kern/145825 net [panic] panic: soabort: so_count
o kern/145728 net [lagg] Stops working lagg between two servers.
p kern/145600 net TCP/ECN behaves different to CE/CWR than ns2 reference
f kern/144917 net [flowtable] [panic] flowtable crashes system [regressi
o kern/144882 net MacBookPro =>4.1 does not connect to BSD in hostap wit
o kern/144874 net [if_bridge] [patch] if_bridge frees mbuf after pfil ho
o conf/144700 net [rc.d] async dhclient breaks stuff for too many people
o kern/144616 net [nat] [panic] ip_nat panic FreeBSD 7.2
f kern/144315 net [ipfw] [panic] freebsd 8-stable reboot after add ipfw
o kern/144231 net bind/connect/sendto too strict about sockaddr length
o kern/143846 net [gif] bringing gif3 tunnel down causes gif0 tunnel to
s kern/143673 net [stf] [request] there should be a way to support multi
o kern/143622 net [pfil] [patch] unlock pfil lock while calling firewall
o kern/143593 net [ipsec] When using IPSec, tcpdump doesn't show outgoin
o kern/143591 net [ral] RT2561C-based DLink card (DWL-510) fails to work
o kern/143208 net [ipsec] [gif] IPSec over gif interface not working
o kern/143034 net [panic] system reboots itself in tcp code [regression]
o kern/142877 net [hang] network-related repeatable 8.0-STABLE hard hang
o kern/142774 net Problem with outgoing connections on interface with mu
o kern/142772 net [libc] lla_lookup: new lle malloc failed
f kern/142518 net [em] [lagg] Problem on 8.0-STABLE with em and lagg
o kern/142018 net [iwi] [patch] Possibly wrong interpretation of beacon-
o kern/141861 net [wi] data garbled with WEP and wi(4) with Prism 2.5
f kern/141741 net Etherlink III NIC won't work after upgrade to FBSD 8,
o kern/140742 net rum(4) Two asus-WL167G adapters cannot talk to each ot
o kern/140682 net [netgraph] [panic] random panic in netgraph
f kern/140634 net [vlan] destroying if_lagg interface with if_vlan membe
o kern/140619 net [ifnet] [patch] refine obsolete if_var.h comments desc
o kern/140346 net [wlan] High bandwidth use causes loss of wlan connecti
o kern/140142 net [ip6] [panic] FreeBSD 7.2-amd64 panic w/IPv6
o kern/140066 net [bwi] install report for 8.0 RC 2 (multiple problems)
o kern/139387 net [ipsec] Wrong lenth of PF_KEY messages in promiscuous
o bin/139346 net [patch] arp(8) add option to remove static entries lis
o kern/139268 net [if_bridge] [patch] allow if_bridge to forward just VL
p kern/139204 net [arp] DHCP server replies rejected, ARP entry lost bef
o kern/139117 net [lagg] + wlan boot timing (EBUSY)
o kern/138850 net [dummynet] dummynet doesn't work correctly on a bridge
o kern/138782 net [panic] sbflush_internal: cc 0 || mb 0xffffff004127b00
o kern/138688 net [rum] possibly broken on 8 Beta 4 amd64: able to wpa a
o kern/138678 net [lo] FreeBSD does not assign linklocal address to loop
o kern/138407 net [gre] gre(4) interface does not come up after reboot
o kern/138332 net [tun] [lor] ifconfig tun0 destroy causes LOR if_adata/
o kern/138266 net [panic] kernel panic when udp benchmark test used as r
f kern/138029 net [bpf] [panic] periodically kernel panic and reboot
o kern/137881 net [netgraph] [panic] ng_pppoe fatal trap 12
p bin/137841 net [patch] wpa_supplicant(8) cannot verify SHA256 signed
p kern/137776 net [rum] panic in rum(4) driver on 8.0-BETA2
o bin/137641 net ifconfig(8): various problems with "vlan_device.vlan_i
o kern/137392 net [ip] [panic] crash in ip_nat.c line 2577
o kern/137372 net [ral] FreeBSD doesn't support wireless interface from
o kern/137089 net [lagg] lagg falsely triggers IPv6 duplicate address de
o kern/136911 net [netgraph] [panic] system panic on kldload ng_bpf.ko t
o kern/136618 net [pf][stf] panic on cloning interface without unit numb
o kern/135502 net [periodic] Warning message raised by rtfree function i
o kern/134583 net [hang] Machine with jail freezes after random amount o
o kern/134531 net [route] [panic] kernel crash related to routes/zebra
o kern/134157 net [dummynet] dummynet loads cpu for 100% and make a syst
o kern/133969 net [dummynet] [panic] Fatal trap 12: page fault while in
o kern/133968 net [dummynet] [panic] dummynet kernel panic
o kern/133736 net [udp] ip_id not protected ...
o kern/133595 net [panic] Kernel Panic at pcpu.h:195
o kern/133572 net [ppp] [hang] incoming PPTP connection hangs the system
o kern/133490 net [bpf] [panic] 'kmem_map too small' panic on Dell r900
o kern/133235 net [netinet] [patch] Process SIOCDLIFADDR command incorre
f kern/133213 net arp and sshd errors on 7.1-PRERELEASE
o kern/133060 net [ipsec] [pfsync] [panic] Kernel panic with ipsec + pfs
o kern/132889 net [ndis] [panic] NDIS kernel crash on load BCM4321 AGN d
o conf/132851 net [patch] rc.conf(5): allow to setfib(1) for service run
o kern/132734 net [ifmib] [panic] panic in net/if_mib.c
o kern/132705 net [libwrap] [patch] libwrap - infinite loop if hosts.all
o kern/132672 net [ndis] [panic] ndis with rt2860.sys causes kernel pani
o kern/132354 net [nat] Getting some packages to ipnat(8) causes crash
o kern/132277 net [crypto] [ipsec] poor performance using cryptodevice f
o kern/131781 net [ndis] ndis keeps dropping the link
o kern/131776 net [wi] driver fails to init
o kern/131753 net [altq] [panic] kernel panic in hfsc_dequeue
o bin/131365 net route(8): route add changes interpretation of network
f kern/130820 net [ndis] wpa_supplicant(8) returns 'no space on device'
o kern/130628 net [nfs] NFS / rpc.lockd deadlock on 7.1-R
o kern/130525 net [ndis] [panic] 64 bit ar5008 ndisgen-erated driver cau
o kern/130311 net [wlan_xauth] [panic] hostapd restart causing kernel pa
o kern/130109 net [ipfw] Can not set fib for packets originated from loc
f kern/130059 net [panic] Leaking 50k mbufs/hour
f kern/129719 net [nfs] [panic] Panic during shutdown, tcp_ctloutput: in
o kern/129517 net [ipsec] [panic] double fault / stack overflow
f kern/129508 net [carp] [panic] Kernel panic with EtherIP (may be relat
o kern/129219 net [ppp] Kernel panic when using kernel mode ppp
o kern/129197 net [panic] 7.0 IP stack related panic
o bin/128954 net ifconfig(8) deletes valid routes
o bin/128602 net [an] wpa_supplicant(8) crashes with an(4)
o kern/128448 net [nfs] 6.4-RC1 Boot Fails if NFS Hostname cannot be res
o bin/128295 net [patch] ifconfig(8) does not print TOE4 or TOE6 capabi
o bin/128001 net wpa_supplicant(8), wlan(4), and wi(4) issues
o kern/127826 net [iwi] iwi0 driver has reduced performance and connecti
o kern/127815 net [gif] [patch] if_gif does not set vlan attributes from
o kern/127724 net [rtalloc] rtfree: 0xc5a8f870 has 1 refs
f bin/127719 net [arp] arp: Segmentation fault (core dumped)
f kern/127528 net [icmp]: icmp socket receives icmp replies not owned by
p kern/127360 net [socket] TOE socket options missing from sosetopt()
o bin/127192 net routed(8) removes the secondary alias IP of interface
f kern/127145 net [wi]: prism (wi) driver crash at bigger traffic
o kern/126895 net [patch] [ral] Add antenna selection (marked as TBD)
o kern/126874 net [vlan]: Zebra problem if ifconfig vlanX destroy
o kern/126695 net rtfree messages and network disruption upon use of if_
o kern/126339 net [ipw] ipw driver drops the connection
o kern/126075 net [inet] [patch] internet control accesses beyond end of
o bin/125922 net [patch] Deadlock in arp(8)
o kern/125920 net [arp] Kernel Routing Table loses Ethernet Link status
o kern/125845 net [netinet] [patch] tcp_lro_rx() should make use of hard
o kern/125258 net [socket] socket's SO_REUSEADDR option does not work
o kern/125239 net [gre] kernel crash when using gre
o kern/124341 net [ral] promiscuous mode for wireless device ral0 looses
o kern/124225 net [ndis] [patch] ndis network driver sometimes loses net
o kern/124160 net [libc] connect(2) function loops indefinitely
o kern/124021 net [ip6] [panic] page fault in nd6_output()
o kern/123968 net [rum] [panic] rum driver causes kernel panic with WPA.
o kern/123892 net [tap] [patch] No buffer space available
o kern/123890 net [ppp] [panic] crash & reboot on work with PPP low-spee
o kern/123858 net [stf] [patch] stf not usable behind a NAT
o kern/123758 net [panic] panic while restarting net/freenet6
o bin/123633 net ifconfig(8) doesn't set inet and ether address in one
o kern/123559 net [iwi] iwi periodically disassociates/associates [regre
o bin/123465 net [ip6] route(8): route add -inet6 -interfac
o kern/123463 net [ipsec] [panic] repeatable crash related to ipsec-tool
o conf/123330 net [nsswitch.conf] Enabling samba wins in nsswitch.conf c
o kern/123160 net [ip] Panic and reboot at sysctl kern.polling.enable=0
o kern/122989 net [swi] [panic] 6.3 kernel panic in swi1: net
o kern/122954 net [lagg] IPv6 EUI64 incorrectly chosen for lagg devices
f kern/122780 net [lagg] tcpdump on lagg interface during high pps wedge
o kern/122685 net It is not visible passing packets in tcpdump(1)
o kern/122319 net [wi] imposible to enable ad-hoc demo mode with Orinoco
o kern/122290 net [netgraph] [panic] Netgraph related "kmem_map too smal
o kern/122252 net [ipmi] [bge] IPMI problem with BCM5704 (does not work
o kern/122033 net [ral] [lor] Lock order reversal in ral0 at bootup ieee
o bin/121895 net [patch] rtsol(8)/rtsold(8) doesn't handle managed netw
s kern/121774 net [swi] [panic] 6.3 kernel panic in swi1: net
o kern/121555 net [panic] Fatal trap 12: current process = 12 (swi1: net
o kern/121534 net [ipl] [nat] FreeBSD Release 6.3 Kernel Trap 12:
o kern/121443 net [gif] [lor] icmp6_input/nd6_lookup
o kern/121437 net [vlan] Routing to layer-2 address does not work on VLA
o bin/121359 net [patch] [security] ppp(8): fix local stack overflow in
o kern/121257 net [tcp] TSO + natd -> slow outgoing tcp traffic
o kern/121181 net [panic] Fatal trap 3: breakpoint instruction fault whi
o kern/120966 net [rum] kernel panic with if_rum and WPA encryption
o kern/120566 net [request]: ifconfig(8) make order of arguments more fr
o kern/120304 net [netgraph] [patch] netgraph source assumes 32-bit time
o kern/120266 net [udp] [panic] gnugk causes kernel panic when closing U
o bin/120060 net routed(8) deletes link-level routes in the presence of
o kern/119945 net [rum] [panic] rum device in hostap mode, cause kernel
o kern/119791 net [nfs] UDP NFS mount of aliased IP addresses from a Sol
o kern/119617 net [nfs] nfs error on wpa network when reseting/shutdown
f kern/119516 net [ip6] [panic] _mtx_lock_sleep: recursed on non-recursi
o kern/119432 net [arp] route add -host -iface causes arp e
o kern/119225 net [wi] 7.0-RC1 no carrier with Prism 2.5 wifi card [regr
o kern/118727 net [netgraph] [patch] [request] add new ng_pf module
o kern/117423 net [vlan] Duplicate IP on different interfaces
o bin/117339 net [patch] route(8): loading routing management commands
o bin/116643 net [patch] [request] fstat(1): add INET/INET6 socket deta
o kern/116185 net [iwi] if_iwi driver leads system to reboot
o kern/115239 net [ipnat] panic with 'kmem_map too small' using ipnat
o kern/115019 net [netgraph] ng_ether upper hook packet flow stops on ad
o kern/115002 net [wi] if_wi timeout. failed allocation (busy bit). ifco
o kern/114915 net [patch] [pcn] pcn (sys/pci/if_pcn.c) ethernet driver f
o kern/113432 net [ucom] WARNING: attempt to net_add_domain(netgraph) af
o kern/112722 net [ipsec] [udp] IP v4 udp fragmented packet reject
o kern/112686 net [patm] patm driver freezes System (FreeBSD 6.2-p4) i38
o bin/112557 net [patch] ppp(8) lock file should not use symlink name
o kern/112528 net [nfs] NFS over TCP under load hangs with "impossible p
o kern/111537 net [inet6] [patch] ip6_input() treats mbuf cluster wrong
o kern/111457 net [ral] ral(4) freeze
o kern/110284 net [if_ethersubr] Invalid Assumption in SIOCSIFADDR in et
o kern/110249 net [kernel] [regression] [patch] setsockopt() error regre
o kern/109470 net [wi] Orinoco Classic Gold PC Card Can't Channel Hop
o bin/108895 net pppd(8): PPPoE dead connections on 6.2 [regression]
f kern/108197 net [panic] [gif] [ip6] if_delmulti reference counting pan
o kern/107944 net [wi] [patch] Forget to unlock mutex-locks
o conf/107035 net [patch] bridge(8): bridge interface given in rc.conf n
o kern/106444 net [netgraph] [panic] Kernel Panic on Binding to an ip to
o kern/106316 net [dummynet] dummynet with multipass ipfw drops packets
o kern/105945 net Address can disappear from network interface
s kern/105943 net Network stack may modify read-only mbuf chain copies
o bin/105925 net problems with ifconfig(8) and vlan(4) [regression]
o kern/104851 net [inet6] [patch] On link routes not configured when usi
o kern/104751 net [netgraph] kernel panic, when getting info about my tr
o kern/104738 net [inet] [patch] Reentrant problem with inet_ntoa in the
o kern/103191 net Unpredictable reboot
o kern/103135 net [ipsec] ipsec with ipfw divert (not NAT) encodes a pac
o kern/102540 net [netgraph] [patch] supporting vlan(4) by ng_fec(4)
o conf/102502 net [netgraph] [patch] ifconfig name does't rename netgrap
o kern/102035 net [plip] plip networking disables parallel port printing
o kern/100709 net [libc] getaddrinfo(3) should return TTL info
o kern/100519 net [netisr] suggestion to fix suboptimal network polling
o kern/98597 net [inet6] Bug in FreeBSD 6.1 IPv6 link-local DAD procedu
o bin/98218 net wpa_supplicant(8) blacklist not working
o kern/97306 net [netgraph] NG_L2TP locks after connection with failed
o conf/97014 net [gif] gifconfig_gif? in rc.conf does not recognize IPv
f kern/96268 net [socket] TCP socket performance drops by 3000% if pack
o kern/95519 net [ral] ral0 could not map mbuf
o kern/95288 net [pppd] [tty] [panic] if_ppp panic in sys/kern/tty_subr
o kern/95277 net [netinet] [patch] IP Encapsulation mask_match() return
o kern/95267 net packet drops periodically appear
f kern/93378 net [tcp] Slow data transfer in Postfix and Cyrus IMAP (wo
o kern/93019 net [ppp] ppp and tunX problems: no traffic after restarti
o kern/92880 net [libc] [patch] almost rewritten inet_network(3) functi
s kern/92279 net [dc] Core faults everytime I reboot, possible NIC issu
o kern/91859 net [ndis] if_ndis does not work with Asus WL-138
o kern/91364 net [ral] [wep] WF-511 RT2500 Card PCI and WEP
o kern/91311 net [aue] aue interface hanging
o kern/87421 net [netgraph] [panic]: ng_ether + ng_eiface + if_bridge
o kern/86871 net [tcp] [patch] allocation logic for PCBs in TIME_WAIT s
o kern/86427 net [lor] Deadlock with FASTIPSEC and nat
o kern/85780 net 'panic: bogus refcnt 0' in routing/ipv6
o bin/85445 net ifconfig(8): deprecated keyword to ifconfig inoperativ
o bin/82975 net route change does not parse classfull network as given
o kern/82881 net [netgraph] [panic] ng_fec(4) causes kernel panic after
o kern/82468 net Using 64MB tcp send/recv buffers, trafficflow stops, i
o bin/82185 net [patch] ndp(8) can delete the incorrect entry
o kern/81095 net IPsec connection stops working if associated network i
o kern/78968 net FreeBSD freezes on mbufs exhaustion (network interface
o kern/78090 net [ipf] ipf filtering on bridged packets doesn't work if
o kern/77341 net [ip6] problems with IPV6 implementation
o kern/75873 net Usability problem with non-RFC-compliant IP spoof prot
s kern/75407 net [an] an(4): no carrier after short time
a kern/71474 net [route] route lookup does not skip interfaces marked d
o kern/71469 net default route to internet magically disappears with mu
o kern/68889 net [panic] m_copym, length > size of mbuf chain
o kern/66225 net [netgraph] [patch] extend ng_eiface(4) control message
o kern/65616 net IPSEC can't detunnel GRE packets after real ESP encryp
s kern/60293 net [patch] FreeBSD arp poison patch
a kern/56233 net IPsec tunnel (ESP) over IPv6: MTU computation is wrong
s bin/41647 net ifconfig(8) doesn't accept lladdr along with inet addr
o kern/39937 net ipstealth issue
a kern/38554 net [patch] changing interface ipaddress doesn't seem to w
o kern/31940 net ip queue length too short for >500kpps
o kern/31647 net [libc] socket calls can return undocumented EINVAL
o kern/30186 net [libc] getaddrinfo(3) does not handle incorrect servna
f kern/24959 net [patch] proper TCP_NOPUSH/TCP_CORK compatibility
o conf/23063 net [arp] [patch] for static ARP tables in rc.network
o kern/21998 net [socket] [patch] ident only for outgoing connections
o kern/5877 net [socket] sb_cc counts control data as well as data dat

472 problems total.

From owner-freebsd-net@FreeBSD.ORG Mon Dec 2 11:36:48 2013 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 7E21C914; Mon, 2 Dec 2013 11:36:48 +0000 (UTC) Received: from mail-lb0-x22e.google.com (mail-lb0-x22e.google.com [IPv6:2a00:1450:4010:c04::22e]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 5CD4B1C65; Mon, 2 Dec 2013 11:36:47 +0000 (UTC) Received: by mail-lb0-f174.google.com with SMTP id c11so8403816lbj.33 for ; Mon, 02 Dec 2013 03:36:45 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=qTU7Q/AIp9kHrcHK+lK70NkoSLxkcH8Ceb88oF4YTGA=; b=dCpyXHnM47pnGYFAmZzhhNqqcVbBZKgxmy0+ScCJbwPoQVPZ4pJKSqKtMo2MSmqWVg IV4mCZKr4iV+zUFDukIiRJVpQ/xWAY7hmU54XyKotPV8XpyWT5v1KsgqDu8IKIfNWYhB UqTzc/Ut+WyYlBneXatwFu98v43A/jYH6h+/RD5u34v7GEOHhvtxY0zSH9kZ7cdzJFT5 Y37mcRR8XvjkAXkAkbdy42TMG744MFBs4/JoPkjM28KOfUZO5LJ4MuXL28jidE3z2Dx/ kUKh/d2TAD6TObUcgacb1lSoOyus6k5Akz4Kf0xZ03bjOXRMjJMYe3HkHXb3iutseAae 4s1w== MIME-Version: 1.0 X-Received: by 10.152.28.230 with SMTP id e6mr39187123lah.3.1385984205170; Mon, 02 Dec 2013 03:36:45 -0800 (PST) Received: by 10.114.166.163 with HTTP; Mon, 2 Dec 2013 03:36:45 -0800 (PST) In-Reply-To: References: <4053E074-EDC5-49AB-91A7-E50ABE36602E@freebsd.org> Date: Mon, 2 Dec 2013 19:36:45 +0800 Message-ID: Subject: Re: [PATCH] SO_REUSEADDR and SO_REUSEPORT behaviour From: Sepherosa Ziehau To: Oleg Moskalenko Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.17 Cc: =?ISO-8859-1?Q?Ermal_Lu=E7i?= , freebsd-net , Tim Kientzle , 
"freebsd-current@freebsd.org" X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 02 Dec 2013 11:36:48 -0000 On Mon, Dec 2, 2013 at 12:29 PM, Oleg Moskalenko wrote= : > Sepherosa, while reading your description I noticed another long-standing > problem for UDP application developers: the UDP sockets are always hashed > with 2-tuple. But UDP sockets can be "connected", too, to a remote addres= s, > with connect(...) > The connected UDP sockets will be in connect hash, which is hashed using faddr/laddr/fport/lport. SO_REUSEPORT only affects wildcard sockets. > function. Unfortunately, with 2-tuple hashing, that pattern is useless fo= r > large-scale applications: if a large number of UDP sockets on the same > local port are "connected" to remote address, then the kernel have to go > thru the long list of UDP sockets with the same hash value. > > If the connected UDP sockets would use 4-tuples, then it would be very > helpful for the new generation of the UDP-based media applications. For > example, servers which use DTLS protocol would become simpler and more > efficient. > > If you are talking about RSS, then igb, ixgbe and mxge (and may be other drivers) support RSS extension (mxge is not using RSS, but still 4-tuple hash), which will include UDP fport/lport into Toeplitz hash calculation. Well, for fragments of a UDP datagram, if the ports are taken into consideration the RSS hash will be different for leading fragment and rest of the fragments; I think that's why MS didn't include ports for UDP. Best Regards, sephe > Thanks > Oleg > > > > On Sun, Dec 1, 2013 at 8:17 PM, Sepherosa Ziehau wro= te: > >> >> >> >> On Sat, Nov 30, 2013 at 2:42 AM, Ermal Lu=E7i wrote: >> >>> Well seems Dragonfly has some version of it already from commit [1]. 
>>>
>>>
>> The distribution algorithm was changed a little bit after initial commit
>> to gain more idle time (bnx(4) output has already been maxed out):
>>
>> http://gitweb.dragonflybsd.org/dragonfly.git/commitdiff/c275f18d832361be28b150d3f4fd518914bdeba6
>>
>> Well, I also addressed a reasonable concern from nginx folks (I am not
>> quite sure about Linux's position on it; Linux original implementation of
>> SO_REUSEPORT from Google had this drawback, which I mentioned in the commit
>> message):
>>
>> http://gitweb.dragonflybsd.org/dragonfly.git/commitdiff/02ad2f0b874fb0a45eb69750219f79f5e8982272
>>
>> As about nginx, SO_REUSEPORT patch for nginx (both 1.4.x and 1.5.x) is in
>> dports; should be easier to be back ported to FreeBSD's ports. I failed to
>> convince nginx folks to merge it into mainline and I am currently onto
>> other stuffs, will come back to them later. If FreeBSD is going to
>> implement Linux's style of SO_REUSEPORT, pushing the patch to the nginx
>> mainline will be easier.
>>
>> I also put up a brief description of SO_REUSEPORT in dfly; may be useful
>> to you:
>> http://leaf.dragonflybsd.org/~sephe/netisr_so_reuseport.txt
>>
>> Best Regards,
>> sephe
>>
>>
>>> In FreeBSD there is the framework for this with by defining PCBGROUP.
>>> Also the explanation of it at [2] and [3].
>>> It can achieve approximately the same features of SO_RESUSEPORT of linux.
>>> The only thing missing is the marketing behind it and i think and better
>>> RSS support.
>>> By looking at dates the support is there before linux so all you guys
>>> looking for it can experiment with it.
>>>
>>> What i was trying to accomplish was something else from performance
>>> improvement and
>>> maybe put a sysctl behind it to make it more acceptable..
>>>
>>> [1]
>>>
>>> http://gitweb.dragonflybsd.org/dragonfly.git/commitdiff/740d1d9f7b7bf9c9c021abb8197718d7a2d441c9
>>> [2]
>>> http://fxr.watson.org/fxr/source/netinet/in_pcbgroup.c?im=bigexcerpts#L51
>>> [3]
>>> http://lists.freebsd.org/pipermail/svn-src-head/2011-June/028190.html
>>>
>>>
>>> On Fri, Nov 29, 2013 at 7:03 PM, Oleg Moskalenko wrote:
>>>
>>> > Tim, you are wrong. Read what is "multicast" definition, and read how UDP
>>> > and TCP sockets work in Linux 3.9+ kernels.
>>> >
>>> > Oleg .
>>> >
>>> >
>>> > On Fri, Nov 29, 2013 at 9:59 AM, Tim Kientzle wrote:
>>> >
>>> >>
>>> >> On Nov 29, 2013, at 4:04 AM, Ermal Luçi wrote:
>>> >>
>>> >> > Hello,
>>> >> >
>>> >> > since SO_REUSEADDR and SO_REUSEPORT are supposed to allow two daemons to
>>> >> > share the same port and possibly listening ip ...
>>> >>
>>> >> These flags are used with TCP-based servers.
>>> >>
>>> >> I've used them to make software upgrades go more smoothly.
>>> >> Without them, the following often happens:
>>> >>
>>> >> * Old server stops. In the process, all of its TCP connections are
>>> >> closed.
>>> >>
>>> >> * Connections to old server remain in the TCP connection table until the
>>> >> remote end can acknowledge.
>>> >>
>>> >> * New server starts.
>>> >>
>>> >> * New server tries to open port but fails because that port is "still in
>>> >> use" by connections in the TCP connection table.
>>> >>
>>> >> With these flags, the new server can open the port even though
>>> >> it is "still in use" by existing connections.
>>> >>
>>> >>
>>> >> > This is not the case today.
>>> >> > Only multicast sockets seem to have the behaviour of broadcasting the data
>>> >> > to all sockets sharing the same properties through these options!
>>> >>
>>> >> That is what multicast is for.
>>> >>
>>> >> If you want the same data sent to all listeners, then
>>> >> that is multicast behavior and you should be using
>>> >> a multicast socket.
>>> >>
>>> >> > The patch at [1] implements/corrects the behaviour for UDP sockets.
>>> >>
>>> >> You're trying to turn all UDP sockets with those options
>>> >> into multicast sockets.
>>> >>
>>> >> If you want a multicast socket, you should ask for one.
>>> >>
>>> >> Tim
>>> >>
>>> >> _______________________________________________
>>> >> freebsd-net@freebsd.org mailing list
>>> >> http://lists.freebsd.org/mailman/listinfo/freebsd-net
>>> >> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"
>>> >>
>>> >
>>> >
>>>
>>>
>>> --
>>> Ermal
>>> _______________________________________________
>>> freebsd-current@freebsd.org mailing list
>>> http://lists.freebsd.org/mailman/listinfo/freebsd-current
>>> To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"
>>>
>>
>>
>>
>> --
>> Tomorrow Will Never Die
>
>

-- 
Tomorrow Will Never Die

From owner-freebsd-net@FreeBSD.ORG Mon Dec 2 11:45:58 2013
References: <4053E074-EDC5-49AB-91A7-E50ABE36602E@freebsd.org>
Date: Mon, 2 Dec 2013 19:45:54 +0800
Subject: Re: [PATCH] SO_REUSEADDR and SO_REUSEPORT behaviour
From: Sepherosa Ziehau
To: Adrian Chadd
Cc: Ermal Luçi, freebsd-net, Oleg Moskalenko, Tim Kientzle, "freebsd-current@freebsd.org"

On Mon, Dec 2, 2013 at 1:02 PM, Adrian Chadd wrote:

> Hi! Thanks for the writeup!
>
> On 1 December 2013 20:17, Sepherosa Ziehau wrote:
>
> > I also put up a brief description of SO_REUSEPORT in dfly; may be useful to
> > you:
> > http://leaf.dragonflybsd.org/~sephe/netisr_so_reuseport.txt
>
> Ok, so given this, how do you guarantee the UTHREAD stays on the given
> CPU? You assume it stays on the CPU that the initial listen socket was
> created on, right? If it's migrated to another CPU core then the
> listen queue still stays in the original hash group that's in a netisr
> on a different CPU?
>
>

As I wrote in the above brief introduction, Dfly currently relies on the
scheduler doing the proper thing (the scheduler does do a very good job
during my tests).
I need to export a certain kind of socket option to make
that information available to user space programs. Forcing UTHREAD binding
in the kernel is not helpful, given that in a reverse proxy application
things are different. And even if that kind of binding information was
exported to user space, a user space program would still have to poll it
periodically (in Dfly at least), since other programs binding to the same
addr/port could come and go, which will cause reorganizing of the inp
localgroup in the current Dfly implementation.

Best Regards,
sephe

-- 
Tomorrow Will Never Die

From owner-freebsd-net@FreeBSD.ORG Mon Dec 2 20:57:52 2013
Subject: Re: A small fix for if_em.c, if_igb.c, if_ixgbe.c
From: Randall Stewart
In-Reply-To: <20131202022338.GA3500@michelle.cdnetworks.com>
Date: Mon, 2 Dec 2013 15:56:59 -0500
Message-Id: <1ED6A1C2-6CED-4FDA-9C61-76FBCB2D7452@lakerest.net>
References: <521B9C2A-EECC-4412-9F68-2235320EF324@lurchi.franken.de> <20131202022338.GA3500@michelle.cdnetworks.com>
To: pyunyh@gmail.com
Cc: Jack F Vogel, Michael Tuexen, "freebsd-net@freebsd.org list"

On Dec 1, 2013, at 9:23 PM, Yonghyeon PYUN wrote:

> On Fri, Nov 29, 2013 at 06:24:12PM +0100, Michael Tuexen wrote:
>> Dear all,
>>
>> ifnet(9) says regarding if_transmit():
>>
>> Transmit a packet on an interface or queue it if the interface is
>> in use. This function will return ENOBUFS if the devices software
>> and hardware queues are both full.
>>
>> The drivers for em, igb and ixgbe might also return an error even
>> in the case the packet was enqueued. The attached patches fix this
>> issue.
>
> How do you know the packet is successfully enqueued but driver
> returns an error? Do non-buf-ring-aware drivers also show the same
> behavior?
>

All of the drivers I have poked at, whether they use the new format (with
buf_ring) or the old one, have traditionally (from what I can tell) returned
an error from the enqueue only if we hit the limit. The driver down the road
can in theory drop the packet for other reasons (errors etc.), and nothing is
communicated back up to the upper layers when that occurs.

>>
>> Any comments?
>
> I'm afraid the patch you posted ignores any errors(i.e.
> m_defrag(9), bus_dma(9) etc) happened during TX processing.

But that is always the case. Most of the time, when you send down to
if_transmit(), your own thread is the one that ends up working on those
things (m_defrag() and bus_dma()). But if another thread awoke the driver
ahead of you, all you get is the return code of the enqueue into the
buffers; you can't know what is happening on the other thread that is
actually putting the work out. This has always been the case.
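
A minimal user-space sketch of the return-code question discussed above; it is not driver code. It assumes a transmit step that can fail with ENOBUFS while the packet stays in the software ring. Every toy_* name, RING_SIZE and the hw_avail counter are made up for illustration and are not part of the drbr or driver APIs. It contrasts propagating the drain result (pre-patch style) with reporting only the enqueue result (what the patch aims for):

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define RING_SIZE 4	/* hypothetical; tiny so "ring full" is easy to hit */

/* Toy stand-in for one TX queue: a software ring plus hardware slots. */
struct toy_txq {
	int	ring[RING_SIZE];	/* software ring (packet ids) */
	size_t	head;			/* index of the oldest queued packet */
	size_t	count;			/* packets currently queued */
	int	hw_avail;		/* free hardware descriptors */
	int	sent;			/* packets handed to the "hardware" */
};

/* Queue a packet; ENOBUFS only when the software ring is full. */
static int
toy_enqueue(struct toy_txq *q, int pkt)
{
	assert(q->count <= RING_SIZE);
	if (q->count == RING_SIZE)
		return (ENOBUFS);
	q->ring[(q->head + q->count) % RING_SIZE] = pkt;
	q->count++;
	return (0);
}

/* Hand one packet to the hardware; fails if no descriptor is free. */
static int
toy_xmit(struct toy_txq *q, int pkt)
{
	(void)pkt;
	if (q->hw_avail == 0)
		return (ENOBUFS);
	q->hw_avail--;
	q->sent++;
	return (0);
}

/* Drain the ring; on failure the packet stays queued for a later pass. */
static int
toy_drain(struct toy_txq *q)
{
	int err = 0;

	while (q->count > 0) {
		err = toy_xmit(q, q->ring[q->head]);
		if (err != 0)
			break;
		q->head = (q->head + 1) % RING_SIZE;
		q->count--;
	}
	return (err);
}

/* Pre-patch style: the drain result leaks out to the caller. */
static int
toy_transmit_old(struct toy_txq *q, int pkt)
{
	int err = toy_enqueue(q, pkt);

	if (err != 0)
		return (err);
	return (toy_drain(q));
}

/* Patched style: once the packet is queued, report success. */
static int
toy_transmit_new(struct toy_txq *q, int pkt)
{
	int err = toy_enqueue(q, pkt);

	if (err != 0)
		return (err);
	(void)toy_drain(q);
	return (0);
}
```

With no free descriptors, toy_transmit_old() returns ENOBUFS although the packet is still queued and goes out on the next toy_drain(); toy_transmit_new() reports ENOBUFS only when toy_enqueue() itself fails.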
This patch, I think, is *very* much needed on all the ring-buffer-aware
drivers except maybe Chelsio (since theirs is so different it probably does
not have this issue).

I will be applying this to all of Adara's code and I would *strongly*
encourage Jack to get this in to the Intel side.

I will also pull this patch (and fix all the other drivers) into the branch
I will be creating shortly, per Adrian's suggestion, for the multi-Q qos
stuff I was working on.

Jack? When can you get this in?

R

>
>>
>> Jack: What do you think? Would you prefer to commit the fix if
>> you think it is acceptable?
>>
>> Best regards
>> Michael
> _______________________________________________
> freebsd-net@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-net
> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"
>

------------------------------
Randall Stewart
803-317-4952 (cell)

From owner-freebsd-net@FreeBSD.ORG Mon Dec 2 21:06:37 2013
Subject: Re: A small fix for if_em.c, if_igb.c, if_ixgbe.c
From: Randall Stewart
In-Reply-To: <521B9C2A-EECC-4412-9F68-2235320EF324@lurchi.franken.de>
Date: Mon, 2 Dec 2013 16:06:30 -0500
Message-Id: <92126112-73DA-42D3-A8CD-DBF5FB8F45E8@lakerest.net>
References: <521B9C2A-EECC-4412-9F68-2235320EF324@lurchi.franken.de>
To: Michael Tuexen
Cc: Jack F Vogel, "freebsd-net@freebsd.org list"

Michael:

Looking at this patch (as I apply it to my world), I think
you can just take the

> -		if ((err = igb_xmit(txr, &next)) != 0) {
> +		if (igb_xmit(txr, &next) != 0) {

Type lines

and leave the

return(err)

since err will get set to 0 by the drbr_enqueue() and return the proper
response to the transport above sending the packet.

R
On Nov 29, 2013, at 12:24 PM, Michael Tuexen wrote:

> Dear all,
>
> ifnet(9) says regarding if_transmit():
>
> Transmit a packet on an interface or queue it if the interface is
> in use. This function will return ENOBUFS if the devices software
> and hardware queues are both full.
>
> The drivers for em, igb and ixgbe might also return an error even
> in the case the packet was enqueued. The attached patches fix this
> issue.
>
> Any comments?
>
> Jack: What do you think? Would you prefer to commit the fix if
> you think it is acceptable?
>
> Best regards
> Michael
>
>
> [bsd5:~/head/sys/dev] tuexen% svn diff -x -p
> Index: e1000/if_em.c
> ===================================================================
> --- e1000/if_em.c (revision 258746)
> +++ e1000/if_em.c (working copy)
> @@ -930,7 +930,7 @@ em_mq_start_locked(struct ifnet *ifp, struct tx_ri
>
>  	/* Process the queue */
>  	while ((next = drbr_peek(ifp, txr->br)) != NULL) {
> -		if ((err = em_xmit(txr, &next)) != 0) {
> +		if (em_xmit(txr, &next) != 0) {
>  			if (next == NULL)
>  				drbr_advance(ifp, txr->br);
>  			else
> @@ -957,7 +957,7 @@ em_mq_start_locked(struct ifnet *ifp, struct tx_ri
>  		em_txeof(txr);
>  	if (txr->tx_avail < EM_MAX_SCATTER)
>  		ifp->if_drv_flags |= IFF_DRV_OACTIVE;
> -	return (err);
> +	return (0);
>  }
>
>  /*
> Index: e1000/if_igb.c
> ===================================================================
> --- e1000/if_igb.c (revision 258746)
> +++ e1000/if_igb.c (working copy)
> @@ -192,7 +192,7 @@ static int igb_suspend(device_t);
>  static int igb_resume(device_t);
>  #ifndef IGB_LEGACY_TX
>  static int igb_mq_start(struct ifnet *, struct mbuf *);
> -static int igb_mq_start_locked(struct ifnet *, struct tx_ring *);
> +static void igb_mq_start_locked(struct ifnet *, struct tx_ring *);
>  static void igb_qflush(struct ifnet *);
>  static void igb_deferred_mq_start(void *, int);
>  #else
> @@ -989,31 +989,31 @@ igb_mq_start(struct ifnet *ifp, struct mbuf *m)
>  	if (err)
>  		return (err);
>  	if (IGB_TX_TRYLOCK(txr)) {
> -		err = igb_mq_start_locked(ifp, txr);
> +		igb_mq_start_locked(ifp, txr);
>  		IGB_TX_UNLOCK(txr);
>  	} else
>  		taskqueue_enqueue(que->tq, &txr->txq_task);
>
> -	return (err);
> +	return (0);
>  }
>
> -static int
> +static void
>  igb_mq_start_locked(struct ifnet *ifp, struct tx_ring *txr)
>  {
>  	struct adapter *adapter = txr->adapter;
>  	struct mbuf *next;
> -	int err = 0, enq = 0;
> +	int enq = 0;
>
>  	IGB_TX_LOCK_ASSERT(txr);
>
>  	if (((ifp->if_drv_flags & IFF_DRV_RUNNING) == 0) ||
>  	    adapter->link_active == 0)
> -		return (ENETDOWN);
> +		return;
>
>
>  	/* Process the queue */
>  	while ((next = drbr_peek(ifp, txr->br)) != NULL) {
> -		if ((err = igb_xmit(txr, &next)) != 0) {
> +		if (igb_xmit(txr, &next) != 0) {
>  			if (next == NULL) {
>  				/* It was freed, move forward */
>  				drbr_advance(ifp, txr->br);
> @@ -1045,7 +1045,7 @@ igb_mq_start_locked(struct ifnet *ifp, struct tx_r
>  		igb_txeof(txr);
>  	if (txr->tx_avail <= IGB_MAX_SCATTER)
>  		txr->queue_status |= IGB_QUEUE_DEPLETED;
> -	return (err);
> +	return;
>  }
>
>  /*
> Index: ixgbe/ixgbe.c
> ===================================================================
> --- ixgbe/ixgbe.c (revision 258746)
> +++ ixgbe/ixgbe.c (working copy)
> @@ -107,7 +107,7 @@ static void ixgbe_start(struct ifnet *);
>  static void ixgbe_start_locked(struct tx_ring *, struct ifnet *);
>  #else /* ! IXGBE_LEGACY_TX */
>  static int ixgbe_mq_start(struct ifnet *, struct mbuf *);
> -static int ixgbe_mq_start_locked(struct ifnet *, struct tx_ring *);
> +static void ixgbe_mq_start_locked(struct ifnet *, struct tx_ring *);
>  static void ixgbe_qflush(struct ifnet *);
>  static void ixgbe_deferred_mq_start(void *, int);
>  #endif /* IXGBE_LEGACY_TX */
> @@ -831,35 +831,35 @@ ixgbe_mq_start(struct ifnet *ifp, struct mbuf *m)
>  	if (err)
>  		return (err);
>  	if (IXGBE_TX_TRYLOCK(txr)) {
> -		err = ixgbe_mq_start_locked(ifp, txr);
> +		ixgbe_mq_start_locked(ifp, txr);
>  		IXGBE_TX_UNLOCK(txr);
>  	} else
>  		taskqueue_enqueue(que->tq, &txr->txq_task);
>
> -	return (err);
> +	return (0);
>  }
>
> -static int
> +static void
>  ixgbe_mq_start_locked(struct ifnet *ifp, struct tx_ring *txr)
>  {
>  	struct adapter *adapter = txr->adapter;
>  	struct mbuf *next;
> -	int enqueued = 0, err = 0;
> +	int enqueued = 0;
>
>  	if (((ifp->if_drv_flags & IFF_DRV_RUNNING) == 0) ||
>  	    adapter->link_active == 0)
> -		return (ENETDOWN);
> +		return;
>
>  	/* Process the queue */
>  #if __FreeBSD_version < 901504
>  	next = drbr_dequeue(ifp, txr->br);
>  	while (next != NULL) {
> -		if ((err = ixgbe_xmit(txr, &next)) != 0) {
> +		if (ixgbe_xmit(txr, &next) != 0) {
>  			if (next != NULL)
> -				err = drbr_enqueue(ifp, txr->br, next);
> +				drbr_enqueue(ifp, txr->br, next);
>  #else
>  	while ((next = drbr_peek(ifp, txr->br)) != NULL) {
> -		if ((err = ixgbe_xmit(txr, &next)) != 0) {
> +		if (ixgbe_xmit(txr, &next) != 0) {
>  			if (next == NULL) {
>  				drbr_advance(ifp, txr->br);
>  			} else {
> @@ -890,7 +890,7 @@ ixgbe_mq_start_locked(struct ifnet *ifp, struct tx
>  	if (txr->tx_avail < IXGBE_TX_CLEANUP_THRESHOLD)
>  		ixgbe_txeof(txr);
>
> -	return (err);
> +	return;
>  }
>
>  /*
>
> _______________________________________________
> freebsd-net@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-net
> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"
>

------------------------------
Randall Stewart
803-317-4952 (cell)

From owner-freebsd-net@FreeBSD.ORG Mon Dec 2 21:41:16 2013
Sender: adrian.chadd@gmail.com
References: <4053E074-EDC5-49AB-91A7-E50ABE36602E@freebsd.org>
Date: Mon, 2 Dec 2013 13:41:14 -0800
Subject: Re: [PATCH] SO_REUSEADDR and SO_REUSEPORT behaviour
From: Adrian Chadd
To: Sepherosa Ziehau
Cc: Ermal Luçi, freebsd-net, Oleg Moskalenko, Tim Kientzle, "freebsd-current@freebsd.org"

On 2 December 2013 03:45, Sepherosa Ziehau wrote:
>
> On Mon, Dec 2, 2013 at 1:02 PM, Adrian Chadd wrote:
>
>> Ok, so given this, how do you guarantee the UTHREAD stays on the given
>> CPU? You assume it stays on the CPU that the initial listen socket was
>> created on, right? If it's migrated to another CPU core then the
>> listen queue still stays in the original hash group that's in a netisr
>> on a different CPU?
>
> As I wrote in the above brief introduction, Dfly currently relies on the
> scheduler doing the proper thing (the scheduler does do a very good job
> during my tests). I need to export certain kind of socket option to make
> that information available to user space programs. Force UTHREAD binding in
> kernel is not helpful, given in reverse proxy application, things are
> different. And even if that kind of binding information was exported to
> user space, user space program still would have to poll it periodically (in
> Dfly at least), since other programs binding to the same addr/port could
> come and go, which will cause reorganizing of the inp localgroup in the
> current Dfly implementation.

Right. I kinda gathered that. It's fine, I was conceptually thinking
of doing some thread pinning into this anyway.

How do you see this scaling on massively multi-core machines? Like 32,
48, 64, 128 cores? I had some vague handwav-y notion of maybe limiting
the concept of pcbgroup hash / netisr threads to a subset of CPUs, or
have them be able to float between sockets but only have 1 (or n,
maybe) per socket. Or just have a fixed, smaller pool.
The idea then is the scheduler would need to be told that a given
userland thread/process belongs to a given netisr thread, and to
schedule them on the same CPU when possible.

Anyway, thanks for doing this work. I only wish that you'd do it for
FreeBSD. :-)

-adrian

From owner-freebsd-net@FreeBSD.ORG Mon Dec 2 21:44:49 2013
Subject: Re: A small fix for if_em.c, if_igb.c, if_ixgbe.c
From: Michael Tuexen
In-Reply-To: <92126112-73DA-42D3-A8CD-DBF5FB8F45E8@lakerest.net>
Date: Mon, 2 Dec 2013 22:44:46 +0100
References: <521B9C2A-EECC-4412-9F68-2235320EF324@lurchi.franken.de> <92126112-73DA-42D3-A8CD-DBF5FB8F45E8@lakerest.net>
To: Randall Stewart
Cc: Jack F Vogel, "freebsd-net@freebsd.org list"

On Dec 2, 2013, at 10:06 PM, Randall Stewart wrote:

> Michael:
>
>=20 > Looking at this patch (as I apply it to my world), I think=20 > you can just take the=20 >=20 >> - if ((err =3D igb_xmit(txr, &next)) !=3D 0) { >> + if (igb_xmit(txr, &next) !=3D 0) { >=20 > Type lines >=20 > and leave the=20 >=20 > return(err) >=20 > since err will get set to 0 by the drbr_enqueue() and return the = proper response to the > transport above sending the packet. True. Just thought this is clearer... But the patch is not minimal, you = are right. Best regards Michael >=20 > R > On Nov 29, 2013, at 12:24 PM, Michael Tuexen wrote: >=20 >> Dear all, >>=20 >> ifnet(9) says regarding if_transmit(): >>=20 >> Transmit a packet on an interface or queue it if the interface is >> in use. This function will return ENOBUFS if the devices software >> and hardware queues are both full. >>=20 >> The drivers for em, igb and ixgbe might also return an error even >> in the case the packet was enqueued. The attached patches fix this >> issue. >>=20 >> Any comments? >>=20 >> Jack: What do you think? Would you prefer to commit the fix if >> you think it is acceptable? 
>>=20 >> Best regards >> Michael >>=20 >>=20 >> [bsd5:~/head/sys/dev] tuexen% svn diff -x -p >> Index: e1000/if_em.c >> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D >> --- e1000/if_em.c (revision 258746) >> +++ e1000/if_em.c (working copy) >> @@ -930,7 +930,7 @@ em_mq_start_locked(struct ifnet *ifp, struct = tx_ri >>=20 >> /* Process the queue */ >> while ((next =3D drbr_peek(ifp, txr->br)) !=3D NULL) { >> - if ((err =3D em_xmit(txr, &next)) !=3D 0) { >> + if (em_xmit(txr, &next) !=3D 0) { >> if (next =3D=3D NULL) >> drbr_advance(ifp, txr->br); >> else=20 >> @@ -957,7 +957,7 @@ em_mq_start_locked(struct ifnet *ifp, struct = tx_ri >> em_txeof(txr); >> if (txr->tx_avail < EM_MAX_SCATTER) >> ifp->if_drv_flags |=3D IFF_DRV_OACTIVE; >> - return (err); >> + return (0); >> } >>=20 >> /* >> Index: e1000/if_igb.c >> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D >> --- e1000/if_igb.c (revision 258746) >> +++ e1000/if_igb.c (working copy) >> @@ -192,7 +192,7 @@ static int igb_suspend(device_t); >> static int igb_resume(device_t); >> #ifndef IGB_LEGACY_TX >> static int igb_mq_start(struct ifnet *, struct mbuf *); >> -static int igb_mq_start_locked(struct ifnet *, struct tx_ring *); >> +static void igb_mq_start_locked(struct ifnet *, struct tx_ring *); >> static void igb_qflush(struct ifnet *); >> static void igb_deferred_mq_start(void *, int); >> #else >> @@ -989,31 +989,31 @@ igb_mq_start(struct ifnet *ifp, struct mbuf *m) >> if (err) >> return (err); >> if (IGB_TX_TRYLOCK(txr)) { >> - err =3D igb_mq_start_locked(ifp, txr); >> + igb_mq_start_locked(ifp, txr); >> IGB_TX_UNLOCK(txr); >> } else >> taskqueue_enqueue(que->tq, &txr->txq_task); >>=20 >> - return 
(err); >> + return (0); >> } >>=20 >> -static int >> +static void >> igb_mq_start_locked(struct ifnet *ifp, struct tx_ring *txr) >> { >> struct adapter *adapter =3D txr->adapter; >> struct mbuf *next; >> - int err =3D 0, enq =3D 0; >> + int enq =3D 0; >>=20 >> IGB_TX_LOCK_ASSERT(txr); >>=20 >> if (((ifp->if_drv_flags & IFF_DRV_RUNNING) =3D=3D 0) || >> adapter->link_active =3D=3D 0) >> - return (ENETDOWN); >> + return; >>=20 >>=20 >> /* Process the queue */ >> while ((next =3D drbr_peek(ifp, txr->br)) !=3D NULL) { >> - if ((err =3D igb_xmit(txr, &next)) !=3D 0) { >> + if (igb_xmit(txr, &next) !=3D 0) { >> if (next =3D=3D NULL) { >> /* It was freed, move forward */ >> drbr_advance(ifp, txr->br); >> @@ -1045,7 +1045,7 @@ igb_mq_start_locked(struct ifnet *ifp, struct = tx_r >> igb_txeof(txr); >> if (txr->tx_avail <=3D IGB_MAX_SCATTER) >> txr->queue_status |=3D IGB_QUEUE_DEPLETED; >> - return (err); >> + return; >> } >>=20 >> /* >> Index: ixgbe/ixgbe.c >> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D >> --- ixgbe/ixgbe.c (revision 258746) >> +++ ixgbe/ixgbe.c (working copy) >> @@ -107,7 +107,7 @@ static void ixgbe_start(struct ifnet *); >> static void ixgbe_start_locked(struct tx_ring *, struct ifnet *); >> #else /* ! 
IXGBE_LEGACY_TX */ >> static int ixgbe_mq_start(struct ifnet *, struct mbuf *); >> -static int ixgbe_mq_start_locked(struct ifnet *, struct tx_ring *); >> +static void ixgbe_mq_start_locked(struct ifnet *, struct tx_ring *); >> static void ixgbe_qflush(struct ifnet *); >> static void ixgbe_deferred_mq_start(void *, int); >> #endif /* IXGBE_LEGACY_TX */ >> @@ -831,35 +831,35 @@ ixgbe_mq_start(struct ifnet *ifp, struct mbuf = *m) >> if (err) >> return (err); >> if (IXGBE_TX_TRYLOCK(txr)) { >> - err =3D ixgbe_mq_start_locked(ifp, txr); >> + ixgbe_mq_start_locked(ifp, txr); >> IXGBE_TX_UNLOCK(txr); >> } else >> taskqueue_enqueue(que->tq, &txr->txq_task); >>=20 >> - return (err); >> + return (0); >> } >>=20 >> -static int >> +static void >> ixgbe_mq_start_locked(struct ifnet *ifp, struct tx_ring *txr) >> { >> struct adapter *adapter =3D txr->adapter; >> struct mbuf *next; >> - int enqueued =3D 0, err =3D 0; >> + int enqueued =3D 0; >>=20 >> if (((ifp->if_drv_flags & IFF_DRV_RUNNING) =3D=3D 0) || >> adapter->link_active =3D=3D 0) >> - return (ENETDOWN); >> + return; >>=20 >> /* Process the queue */ >> #if __FreeBSD_version < 901504 >> next =3D drbr_dequeue(ifp, txr->br); >> while (next !=3D NULL) { >> - if ((err =3D ixgbe_xmit(txr, &next)) !=3D 0) { >> + if (ixgbe_xmit(txr, &next) !=3D 0) { >> if (next !=3D NULL) >> - err =3D drbr_enqueue(ifp, txr->br, = next); >> + drbr_enqueue(ifp, txr->br, next); >> #else >> while ((next =3D drbr_peek(ifp, txr->br)) !=3D NULL) { >> - if ((err =3D ixgbe_xmit(txr, &next)) !=3D 0) { >> + if (ixgbe_xmit(txr, &next) !=3D 0) { >> if (next =3D=3D NULL) { >> drbr_advance(ifp, txr->br); >> } else { >> @@ -890,7 +890,7 @@ ixgbe_mq_start_locked(struct ifnet *ifp, struct = tx >> if (txr->tx_avail < IXGBE_TX_CLEANUP_THRESHOLD) >> ixgbe_txeof(txr); >>=20 >> - return (err); >> + return; >> } >>=20 >> /* >>=20 >> _______________________________________________ >> freebsd-net@freebsd.org mailing list >> 
http://lists.freebsd.org/mailman/listinfo/freebsd-net >> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org" >> > > ------------------------------ > Randall Stewart > 803-317-4952 (cell) > > From owner-freebsd-net@FreeBSD.ORG Mon Dec 2 21:48:08 2013 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 6DB00D8A; Mon, 2 Dec 2013 21:48:08 +0000 (UTC) Received: from mail-n.franken.de (drew.ipv6.franken.de [IPv6:2001:638:a02:a001:20e:cff:fe4a:feaa]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 2CA481AFD; Mon, 2 Dec 2013 21:48:08 +0000 (UTC) Received: from [192.168.1.102] (p508F2CD2.dip0.t-ipconnect.de [80.143.44.210]) (Authenticated sender: macmic) by mail-n.franken.de (Postfix) with ESMTP id 4729E1C0C0693; Mon, 2 Dec 2013 22:48:06 +0100 (CET) Content-Type: text/plain; charset=us-ascii Mime-Version: 1.0 (Mac OS X Mail 6.6 \(1510\)) Subject: Re: A small fix for if_em.c, if_igb.c, if_ixgbe.c From: Michael Tuexen In-Reply-To: <20131202022338.GA3500@michelle.cdnetworks.com> Date: Mon, 2 Dec 2013 22:48:07 +0100 Content-Transfer-Encoding: 7bit Message-Id: References: <521B9C2A-EECC-4412-9F68-2235320EF324@lurchi.franken.de> <20131202022338.GA3500@michelle.cdnetworks.com> To: pyunyh@gmail.com X-Mailer: Apple Mail (2.1510) Cc: Jack F Vogel , "freebsd-net@freebsd.org list" X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 02 Dec 2013 21:48:08 -0000 On Dec 2, 2013, at 3:23 AM, Yonghyeon PYUN wrote: > On Fri, Nov 29, 2013 at 06:24:12PM +0100, Michael Tuexen wrote: >> Dear all, >> >> ifnet(9) says
regarding if_transmit(): >> >> Transmit a packet on an interface or queue it if the interface is >> in use. This function will return ENOBUFS if the devices software >> and hardware queues are both full. >> >> The drivers for em, igb and ixgbe might also return an error even >> in the case the packet was enqueued. The attached patches fix this >> issue. > > How do you know the packet is successfully enqueued but driver > returns an error? Do non-buf-ring-aware drivers also show the same When debugging the issue, I saw the packet on the wire but the if_transmit() returning ENOBUFS. > behavior? I don't know. I saw this issue with the igb driver. > >> >> Any comments? > > I'm afraid the patch you posted ignores any errors(i.e. > m_defrag(9), bus_dma(9) etc) happened during TX processing. Correct. I want to make sure that if ENOBUFS is returned, the packet hasn't made it on the wire. The other errors can occur for the packet provided to if_transmit() or due to packet processing of other packets. Am I missing something? Best regards Michael > >> >> Jack: What do you think? Would you prefer to commit the fix if >> you think it is acceptable? 
>> >> Best regards >> Michael > From owner-freebsd-net@FreeBSD.ORG Mon Dec 2 22:10:06 2013 Return-Path: Delivered-To: net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 69AFF3DA for ; Mon, 2 Dec 2013 22:10:06 +0000 (UTC) Received: from shell0.rawbw.com (shell0.rawbw.com [198.144.192.45]) by mx1.freebsd.org (Postfix) with ESMTP id 5704E1D3C for ; Mon, 2 Dec 2013 22:10:05 +0000 (UTC) Received: from eagle.yuri.org (stunnel@localhost [127.0.0.1]) (authenticated bits=0) by shell0.rawbw.com (8.14.4/8.14.4) with ESMTP id rB2MA5NU043505 for ; Mon, 2 Dec 2013 14:10:05 -0800 (PST) (envelope-from yuri@rawbw.com) Message-ID: <529D053D.8050700@rawbw.com> Date: Mon, 02 Dec 2013 14:10:05 -0800 From: Yuri User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:24.0) Gecko/20100101 Thunderbird/24.1.0 MIME-Version: 1.0 To: net@freebsd.org Subject: How to forward UDP packets to another port and get responses with port translation? Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 02 Dec 2013 22:10:06 -0000 I would like to translate the port in all DNS requests, so that the server works on the different port (ex. 1053) on the same net and the client works on the original port 53. I am thinking about two approaches: * forward packets into the server: ipfw add 200 fwd 192.168.10.1,1053 udp from 192.168.10.0/24 to 192.168.10.1 53 The problem with routing responses is that natd(8) doesn't allow to change the source port, only the source address. There is -alias_address option but no -alias_port option. 
* divert and natd(8): natd -port 8668 -interface tap0 -redirect_port udp 192.168.10.1:1053 53 $IPF 200 divert natd udp from 192.168.10.0/24 to 192.168.10.1 53 via tap0 keep-state In both cases reply packets have the source port 1053, and it isn't clear how to make it 53. It seems that divert only passes to natd(8) packets from one direction, and not from the other. Is there a way to properly translate the ports back and forth in such simple UDP communication? Yuri From owner-freebsd-net@FreeBSD.ORG Tue Dec 3 02:06:28 2013 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 85396B28; Tue, 3 Dec 2013 02:06:28 +0000 (UTC) Received: from mail-pa0-x22e.google.com (mail-pa0-x22e.google.com [IPv6:2607:f8b0:400e:c03::22e]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 4FECE1C8E; Tue, 3 Dec 2013 02:06:28 +0000 (UTC) Received: by mail-pa0-f46.google.com with SMTP id kl14so2202129pab.5 for ; Mon, 02 Dec 2013 18:06:27 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=from:date:to:cc:subject:message-id:reply-to:references:mime-version :content-type:content-disposition:in-reply-to:user-agent; bh=nZzME3o/TJ2qf0NkP981vJJVpjvwnP5B3AfryCKhFTQ=; b=W+qMX4o8sfuaB3nGJFcDp0SbtqO6Jvkkfz2xHhh0zlSqpU693R1CPx4ETS85JwfJuq eiL/XB0VeqJtuhkNRRqTCIkIRxS1dXSDQSSxD13L38nV0hdy0hIl3dEqxu4/nK3Tdaug Cpuv/oSOgS/0sXumIr/n6jGyktII/hnSZ6QdN1YTWeB2RAj8D/YkNi6pL7QhfWtydAJb ZpRt6Edfoy/eMfGYyVL9eP+VwX95yMUkob8WaTybbcGEbZuaysWDWD8+hn6Tw+IUmw1z Swpo0NGp9h+cOyiqHQzIG40bHvY87vUBbhXUjImx3jLgeEHpG0PmWbMUwjl0Xn/VSo1V njpA== X-Received: by 10.68.224.38 with SMTP id qz6mr7547294pbc.156.1386036386327; Mon, 02 Dec 2013 18:06:26 -0800 (PST) Received: from pyunyh@gmail.com (lpe4.p59-icn.cdngp.net. 
[114.111.62.249]) by mx.google.com with ESMTPSA id sg1sm125857604pbb.16.2013.12.02.18.06.22 for (version=TLSv1 cipher=RC4-SHA bits=128/128); Mon, 02 Dec 2013 18:06:25 -0800 (PST) Received: by pyunyh@gmail.com (sSMTP sendmail emulation); Tue, 03 Dec 2013 11:06:18 +0900 From: Yonghyeon PYUN Date: Tue, 3 Dec 2013 11:06:18 +0900 To: Randall Stewart Subject: Re: A small fix for if_em.c, if_igb.c, if_ixgbe.c Message-ID: <20131203020618.GB2981@michelle.cdnetworks.com> References: <521B9C2A-EECC-4412-9F68-2235320EF324@lurchi.franken.de> <20131202022338.GA3500@michelle.cdnetworks.com> <1ED6A1C2-6CED-4FDA-9C61-76FBCB2D7452@lakerest.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1ED6A1C2-6CED-4FDA-9C61-76FBCB2D7452@lakerest.net> User-Agent: Mutt/1.4.2.3i Cc: Jack F Vogel , Michael Tuexen , "freebsd-net@freebsd.org list" X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list Reply-To: pyunyh@gmail.com List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 03 Dec 2013 02:06:28 -0000 On Mon, Dec 02, 2013 at 03:56:59PM -0500, Randall Stewart wrote: > > On Dec 1, 2013, at 9:23 PM, Yonghyeon PYUN wrote: > > > On Fri, Nov 29, 2013 at 06:24:12PM +0100, Michael Tuexen wrote: > >> Dear all, > >> > >> ifnet(9) says regarding if_transmit(): > >> > >> Transmit a packet on an interface or queue it if the interface is > >> in use. This function will return ENOBUFS if the devices software > >> and hardware queues are both full. > >> > >> The drivers for em, igb and ixgbe might also return an error even > >> in the case the packet was enqueued. The attached patches fix this > >> issue. > > > > How do you know the packet is successfully enqueued but driver > > returns an error? Do non-buf-ring-aware drivers also show the same > > behavior? 
> > > All of the drivers have traditionally (from what I can tell > and all the ones I have poked at) no matter if they are the new > format (with ring-buf) or the old, would only return an error in > the enqueue if we hit the limit. > > The driver down the road can in theory drop the packet for other > reasons (errors etc) and there is no communication back up to the > upper layers that this occurred. > Hmm, I was under the impression that buf_ring changed old behavior we had in the past. Before introduction of if_transmit, queuing was done in upper layer so returning an error in driver's TX path didn't affect upper layer. With if_transmit, queuing and TX processing would be done in driver. In order to preserve old behavior, buf-ring-aware drivers may have to return ENOBUFS as you said. The compatibility code introduced in if_transmit for legacy drivers shall return ENOBUFS when there is no room in if_snd. This is the reason why I asked whether Michael sees the same behavior on non-buf-ring aware drivers. > > > >> > >> Any comments? > > > > I'm afraid the patch you posted ignores any errors(i.e. > > m_defrag(9), bus_dma(9) etc) happened during TX processing. > > But that is always the case. Most of the time when you send > down to if_transmit() the first time you are going to get > your thread working on those things m_defrag() and bus_dma().. but if > another thread awoke the driver ahead of you all you get is the return > code of the queue into the buffers.. you can't know what is happening on > the other thread that is actually putting the work out. > > This has always been the case. > > This patch I think is *very* much needed on all the ring buffer aware drivers except maybe > Chelsio (since there's is so different it probably does not have this issue). > > I will be applying this to all of Adara's code and I would *strongly* encourage Jack to get this > in to the intel side. 
> > I will also pull this patch (and fix all the other drivers) in the branch I will be creating > shortly per Adrian's suggestion on the multi-Q qos stuff I was working on.. > > Jack? when can you get this in ?? > > R > > > > > >> > >> Jack: What do you think? Would you prefer to commit the fix if > >> you think it is acceptable? > >> > >> Best regards > >> Michael > > _______________________________________________ > > freebsd-net@freebsd.org mailing list > > http://lists.freebsd.org/mailman/listinfo/freebsd-net > > To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org" > > > > ------------------------------ > Randall Stewart > 803-317-4952 (cell) > > _______________________________________________ > freebsd-net@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-net > To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org" From owner-freebsd-net@FreeBSD.ORG Tue Dec 3 02:17:06 2013 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 1DEF2D85; Tue, 3 Dec 2013 02:17:06 +0000 (UTC) Received: from mail-pd0-x22c.google.com (mail-pd0-x22c.google.com [IPv6:2607:f8b0:400e:c02::22c]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id DD2A91CFB; Tue, 3 Dec 2013 02:17:05 +0000 (UTC) Received: by mail-pd0-f172.google.com with SMTP id g10so19326517pdj.17 for ; Mon, 02 Dec 2013 18:17:05 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=from:date:to:cc:subject:message-id:reply-to:references:mime-version :content-type:content-disposition:in-reply-to:user-agent; bh=Z5kB6anVb5378+2xiqTsMA3VoBzRAM9hfX/pJGPeccU=; b=VqgHLbR5y7BTGfFFiMMQ7hpw7rX/DwSWULmHzS2mCPtzm3ekusxrv4DobU/9aNSfA7 
dPIKUZ1SuStkTN9OxyihvJj0uXwojJxX/xk567hdsH6Zqqt38xUJRk0u236F0cdOW3/r A6KJsEi1R4Bw7GWM8CphEp1S2KH0YWHAFQyDpxTs9kfkWGyNCyOXUYb4YQben/Ss4LjK dC9gBg+7sfZ0PiLwY2lexES7c0MverPr+wsoNYAS2zf+ug9HciYGAytD6gYdvvSXglxG Y5A7SxWR99Vg0woj52BUINSIA+JqkvKTTzKvTPtt5qRJp7OlLjp4JVWsrPGOQU9ykmA/ p4Zw== X-Received: by 10.68.170.225 with SMTP id ap1mr35390933pbc.117.1386037025514; Mon, 02 Dec 2013 18:17:05 -0800 (PST) Received: from pyunyh@gmail.com (lpe4.p59-icn.cdngp.net. [114.111.62.249]) by mx.google.com with ESMTPSA id gg10sm125876650pbc.46.2013.12.02.18.17.02 for (version=TLSv1 cipher=RC4-SHA bits=128/128); Mon, 02 Dec 2013 18:17:04 -0800 (PST) Received: by pyunyh@gmail.com (sSMTP sendmail emulation); Tue, 03 Dec 2013 11:16:58 +0900 From: Yonghyeon PYUN Date: Tue, 3 Dec 2013 11:16:58 +0900 To: Michael Tuexen Subject: Re: A small fix for if_em.c, if_igb.c, if_ixgbe.c Message-ID: <20131203021658.GC2981@michelle.cdnetworks.com> References: <521B9C2A-EECC-4412-9F68-2235320EF324@lurchi.franken.de> <20131202022338.GA3500@michelle.cdnetworks.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.4.2.3i Cc: Jack F Vogel , "freebsd-net@freebsd.org list" X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list Reply-To: pyunyh@gmail.com List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 03 Dec 2013 02:17:06 -0000 On Mon, Dec 02, 2013 at 10:48:07PM +0100, Michael Tuexen wrote: > On Dec 2, 2013, at 3:23 AM, Yonghyeon PYUN wrote: > > > On Fri, Nov 29, 2013 at 06:24:12PM +0100, Michael Tuexen wrote: > >> Dear all, > >> > >> ifnet(9) says regarding if_transmit(): > >> > >> Transmit a packet on an interface or queue it if the interface is > >> in use. This function will return ENOBUFS if the devices software > >> and hardware queues are both full. 
> >> > >> The drivers for em, igb and ixgbe might also return an error even > >> in the case the packet was enqueued. The attached patches fix this > >> issue. > > > > How do you know the packet is successfully enqueued but driver > > returns an error? Do non-buf-ring-aware drivers also show the same > When debugging the issue, I saw the packet on the wire but the if_transmit() > returning ENOBUFS. > > behavior? > I don't know. I saw this issue with the igb driver. I see. > > > >> > >> Any comments? > > > > I'm afraid the patch you posted ignores any errors(i.e. > > m_defrag(9), bus_dma(9) etc) happened during TX processing. > Correct. I want to make sure that if ENOBUFS is returned, the > packet hasn't made it on the wire. The other errors can occur > for the packet provided to if_transmit() or due to packet > processing of other packets. Am I missing something? > No. It seems the only return code buf-ring-aware drivers can return is ENOBUFS since queuing is done in the driver. > Best regards > Michael > > > >> > >> Jack: What do you think? Would you prefer to commit the fix if > >> you think it is acceptable? 
> >> > >> Best regards > >> Michael > > > From owner-freebsd-net@FreeBSD.ORG Tue Dec 3 18:44:40 2013 Return-Path: Delivered-To: net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id D131A16C; Tue, 3 Dec 2013 18:44:40 +0000 (UTC) Received: from bigwig.baldwin.cx (bigwig.baldwin.cx [IPv6:2001:470:1f11:75::1]) (using TLSv1 with cipher ADH-CAMELLIA256-SHA (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id A7DE718C5; Tue, 3 Dec 2013 18:44:40 +0000 (UTC) Received: from jhbbsd.localnet (unknown [209.249.190.124]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id 1C200B972; Tue, 3 Dec 2013 13:44:39 -0500 (EST) From: John Baldwin To: freebsd-net@freebsd.org Subject: Re: Defaults for if_capenable and detecting user initiated changes Date: Tue, 3 Dec 2013 12:13:41 -0500 User-Agent: KMail/1.13.5 (FreeBSD/8.4-CBSD-20130906; KDE/4.5.5; amd64; ; ) References: <0E13D481-9D6D-4B52-A5AD-B671BF3A85AF@scsiguy.com> In-Reply-To: <0E13D481-9D6D-4B52-A5AD-B671BF3A85AF@scsiguy.com> MIME-Version: 1.0 Content-Type: Text/Plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <201312031213.41677.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7 (bigwig.baldwin.cx); Tue, 03 Dec 2013 13:44:39 -0500 (EST) Cc: "Justin T. Gibbs" , Roger Pau Monné , net@freebsd.org X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 03 Dec 2013 18:44:40 -0000 On Wednesday, November 27, 2013 12:59:08 pm Justin T. Gibbs wrote: > Hi net, > > I’m reviewing a patch from Roger Pau Monné for the Xen netfront driver.
> The goal of the change is to avoid disturbing the user’s settings for the interface just because the backend device has changed or the connection to the backend was reset. I’ve attached the latest version of the patch.
>
> The current patch leaves the interface settings alone if they can be supported by the newly attached backend. What would be ideal is to enable capabilities that default to being enabled if they were not explicitly disabled by the user and can be supported by the new backend. Unfortunately, I don’t think the if_capenable and if_capabilities fields are descriptive enough to deal with an interface whose capabilities can change at runtime. Just as can be done with link speed, some of these settings need to allow an “auto/default” setting in addition to on or off. This would allow the user to explicitly disable a capability if needed, but generally allow the system to choose the most optimal settings when they are supported. Would this be difficult to add?

Couldn't you maintain this state in the Xen netfront driver's softc? You already get the ioctls that track changes to the capenable field, so when a change explicitly disables a capability you can set that in a 'forced off' or 'forced on' field. Perhaps more of a 'forced' field that you just update by doing:

sc->capforced |= (oldcapenable ^ newcapenable)

However, it's not clear to me if you can get the underlying adapter's initial capenable list. If so, I think capforced should be all you need to handle this (though it might be easier if you have separate forcedon and forcedoff fields).
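A minimal sketch of the 'forced' bookkeeping suggested above, written as plain C outside the kernel. The struct and function names (cap_state, cap_user_set, cap_backend_changed) are hypothetical and are not part of the netfront driver; the only line taken from the message is the XOR update of capforced. It also assumes the new backend's supported and default capability masks are known at reconnect time, which, as noted, may not hold.

```c
#include <stdint.h>

/* Hypothetical stand-in for the relevant part of a driver softc. */
struct cap_state {
	uint32_t capenable;	/* capabilities currently enabled */
	uint32_t capforced;	/* bits the user has explicitly toggled */
};

/*
 * Record a user-initiated change, e.g. from the driver's SIOCSIFCAP
 * handler. Any bit that differs from the current value is marked as
 * forced, exactly as in the one-line update above.
 */
static void
cap_user_set(struct cap_state *sc, uint32_t newcapenable)
{
	sc->capforced |= (sc->capenable ^ newcapenable);
	sc->capenable = newcapenable;
}

/*
 * Recompute the enabled set after the backend changes (assumed flow).
 * Forced bits keep the user's choice, all other bits follow the new
 * backend's defaults, and nothing unsupported stays enabled.
 */
static void
cap_backend_changed(struct cap_state *sc, uint32_t supported,
    uint32_t defaults)
{
	uint32_t user = sc->capenable & sc->capforced;
	uint32_t automatic = defaults & ~sc->capforced;

	sc->capenable = (user | automatic) & supported;
}
```

With a single capforced mask, a forced bit that the new backend cannot support is simply cleared, and the record of whether the user wanted it on or off is lost. Separate forcedon and forcedoff masks would preserve that, at the cost of one more field.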
-- John Baldwin From owner-freebsd-net@FreeBSD.ORG Tue Dec 3 22:51:44 2013 Return-Path: Delivered-To: freebsd-net@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id CD8D4956; Tue, 3 Dec 2013 22:51:44 +0000 (UTC) Received: from olgeni.olgeni.com (host-156-246-171-31.cloudsigma.com [31.171.246.156]) by mx1.freebsd.org (Postfix) with ESMTP id 738BB1918; Tue, 3 Dec 2013 22:51:44 +0000 (UTC) Received: from olgeni.olgeni (unknown [82.84.68.101]) by olgeni.olgeni.com (Postfix) with ESMTPSA id 9B573174483; Tue, 3 Dec 2013 23:51:42 +0100 (CET) Date: Tue, 3 Dec 2013 23:51:41 +0100 (CET) From: Jimmy Olgeni X-X-Sender: olgeni@olgeni.olgeni To: freebsd-questions@FreeBSD.org Subject: ipsec packets apparently not getting to destination Message-ID: User-Agent: Alpine 2.00 (BSF 1167 2008-08-23) X-OpenPGP-KeyID: 0x90B7A98E6450AE47 X-OpenPGP-Fingerprint: 7133 AB4D DFC8 0A0D F891 B0D2 90B7 A98E 6450 AE47 X-OpenPGP-URL: http://olgeni.olgeni.com/~olgeni/pgp/olgeni@olgeni.com MIME-Version: 1.0 Content-Type: TEXT/PLAIN; format=flowed; charset=US-ASCII Cc: freebsd-net@FreeBSD.org X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 03 Dec 2013 22:51:44 -0000 Hello, I'm trying to setup a VPN server using L2TP/IPSEC, racoon (from ipsec-tools) and mpd5, with certificates (to avoid patching racoon for handling wildcard PSKs). PF disabled for testing, no other firewall is active, no NAT on the server, NAT on the client using server port 4500. Server is running 9.2-RELEASE r256712, with this config appended to GENERIC: device crypto # core crypto support device cryptodev # /dev/crypto for access to h/w device enc # IPsec interface.
options IPSEC # IP security (requires device crypto) options IPSEC_NAT_T # NAT-T support, UDP encap of ESP options IPSEC_FILTERTUNNEL # filter ipsec packets from a tunnel (plus other unrelated things, ALTQ, SW_WATCHDOG, DDB, TEKEN_UTF8). After tens of tests I got to this point... If I disable ipsec on the Windows 8 client, the L2TP tunnel comes up perfectly. A sample PPTP tunnel (unrelated) also works fine. I take it as proof that mpd5 is configured in a more or less sensible manner. My /etc/ipsec.conf looks like this: flush; spdflush; spdadd 0.0.0.0/0[0] 0.0.0.0/0[1701] udp -P in ipsec esp/transport//require; spdadd 0.0.0.0/0[1701] 0.0.0.0/0[0] udp -P out ipsec esp/transport//require; Which translates to this at runtime: 0.0.0.0/0[1701] 0.0.0.0/0[any] udp in ipsec esp/transport//require spid=58 seq=1 pid=43822 refcnt=1 0.0.0.0/0[any] 0.0.0.0/0[1701] udp out ipsec esp/transport//require spid=57 seq=0 pid=43822 refcnt=1 When connecting with L2TP/IPSEC from the Windows client, racoon shows this output: (C.C.C.C -> NAT address before Windows client, S.S.S.S -> public address of L2TP server) 2013-12-03 23:10:03: INFO: respond new phase 1 negotiation: S.S.S.S[500]<=>C.C.C.C[49216] 2013-12-03 23:10:03: INFO: begin Identity Protection mode. 2013-12-03 23:10:03: INFO: received broken Microsoft ID: MS NT5 ISAKMPOAKLEY 2013-12-03 23:10:03: INFO: received Vendor ID: RFC 3947 2013-12-03 23:10:03: INFO: received Vendor ID: draft-ietf-ipsec-nat-t-ike-02 2013-12-03 23:10:03: INFO: received Vendor ID: FRAGMENTATION 2013-12-03 23:10:03: [C.C.C.C] INFO: Selected NAT-T version: RFC 3947 2013-12-03 23:10:03: ERROR: invalid DH group 20. 2013-12-03 23:10:03: ERROR: invalid DH group 19. 
2013-12-03 23:10:03: [S.S.S.S] INFO: Hashing S.S.S.S[500] with algo #2 2013-12-03 23:10:03: INFO: NAT-D payload #0 verified 2013-12-03 23:10:03: [C.C.C.C] INFO: Hashing C.C.C.C[49216] with algo #2 2013-12-03 23:10:03: INFO: NAT-D payload #1 doesn't match 2013-12-03 23:10:03: INFO: NAT detected: PEER 2013-12-03 23:10:03: [C.C.C.C] INFO: Hashing C.C.C.C[49216] with algo #2 2013-12-03 23:10:03: [S.S.S.S] INFO: Hashing S.S.S.S[500] with algo #2 2013-12-03 23:10:03: INFO: Adding remote and local NAT-D payloads. 2013-12-03 23:10:03: INFO: NAT-T: ports changed to: C.C.C.C[4500]<->S.S.S.S[4500] 2013-12-03 23:10:03: INFO: KA found: S.S.S.S[4500]->C.C.C.C[4500] (in_use=2) 2013-12-03 23:10:03: WARNING: unable to get certificate CRL(3) at depth:0 SubjectName:/C=IT/ST=Lombardia/L=Milano/O=MovieReading/CN=LiveSub Client 2013-12-03 23:10:03: WARNING: unable to get certificate CRL(3) at depth:1 SubjectName:/C=IT/ST=Lombardia/O=MovieReading/CN=ROOT CA 2013-12-03 23:10:03: INFO: ISAKMP-SA established S.S.S.S[4500]-C.C.C.C[4500] spi:077c160ee905cf2e:062d1918ab2b788f 2013-12-03 23:10:03: INFO: respond new phase 2 negotiation: S.S.S.S[4500]<=>C.C.C.C[4500] 2013-12-03 23:10:03: INFO: Adjusting my encmode UDP-Transport->Transport 2013-12-03 23:10:03: INFO: Adjusting peer's encmode UDP-Transport(4)->Transport(2) 2013-12-03 23:10:03: INFO: IPsec-SA established: ESP/Transport S.S.S.S[500]->C.C.C.C[500] spi=225553014(0xd71aa76) 2013-12-03 23:10:03: INFO: IPsec-SA established: ESP/Transport S.S.S.S[500]->C.C.C.C[500] spi=2749046390(0xa3db1e76) CRL aside, which is not configured right now, certificate handling looks ok. Client side NAT also looks good. 
Also, tcpdump on enc0 shows the relevant packets coming through IPSEC: tcpdump: WARNING: enc0: no IPv4 address assigned tcpdump: verbose output suppressed, use -v or -vv for full protocol decode listening on enc0, link-type ENC (OpenBSD encapsulated IP), capture size 65535 bytes 23:10:03.521573 (authentic,confidential): SPI 0x0d71aa76: IP client.dialup.tiscali.it.l2f > olgeni.olgeni.com.l2f: l2tp:[TLS](0/0)Ns=0,Nr=0 *MSGTYPE(SCCRQ) *PROTO_VER(1.0) *FRAMING_CAP(S) *BEARER_CAP() FIRM_VER(1539) *HOST_NAME(moviereading) VENDOR_NAME(Microsoft) *ASSND_TUN_ID(16) *RECV_WIN_SIZE(8) 23:10:04.513077 (authentic,confidential): SPI 0x0d71aa76: IP client.dialup.tiscali.it.l2f > olgeni.olgeni.com.l2f: l2tp:[TLS](0/0)Ns=0,Nr=0 *MSGTYPE(SCCRQ) *PROTO_VER(1.0) *FRAMING_CAP(S) *BEARER_CAP() FIRM_VER(1539) *HOST_NAME(moviereading) VENDOR_NAME(Microsoft) *ASSND_TUN_ID(16) *RECV_WIN_SIZE(8) Now, the really weird part is that mpd5 does not even see the packets addressed to the l2f (1701) port. I tried to bind mpd5 both to S.S.S.S and to 0.0.0.0, but nothing changed. Also, if I run "socat UDP-LISTEN:1701 STDOUT" in place of mpd5, *nothing* comes through, even if the dump on enc0 shows that something is coming in. 
Running "setkey -D" shows this: S.S.S.S C.C.C.C esp mode=transport spi=3417968112(0xcbba0df0) reqid=0(0x00000000) E: rijndael-cbc 65260e8e fd0d9dbf 8aa363d8 7cc81f41 2eb89aff d6984fb9 b7bdfc56 50774e0a A: hmac-sha1 fd5e6716 fe7e2c57 fc1f42b9 ec5307ab dae3ea6f seq=0x00000000 replay=4 flags=0x00000000 state=mature created: Dec 3 23:24:16 2013 current: Dec 3 23:24:29 2013 diff: 13(s) hard: 3600(s) soft: 2880(s) last: hard: 0(s) soft: 0(s) current: 0(bytes) hard: 0(bytes) soft: 0(bytes) allocated: 0 hard: 0 soft: 0 sadb_seq=1 pid=43884 refcnt=1 C.C.C.C S.S.S.S esp mode=transport spi=253016163(0x0f14b863) reqid=0(0x00000000) E: rijndael-cbc 1463f10b 87e52b9b 9d32ee04 350198ae 6779d06d 3f57389b 71bffd18 72211b36 A: hmac-sha1 1037b02e 7ec2cf51 50351bb6 cf8ab693 25d87e0a seq=0x00000004 replay=4 flags=0x00000000 state=mature created: Dec 3 23:24:16 2013 current: Dec 3 23:24:29 2013 diff: 13(s) hard: 3600(s) soft: 2880(s) last: Dec 3 23:24:23 2013 hard: 0(s) soft: 0(s) current: 532(bytes) hard: 0(bytes) soft: 0(bytes) allocated: 4 hard: 0 soft: 0 sadb_seq=0 pid=43884 refcnt=1 I cannot imagine any obvious reason for packets getting "lost" after enc0, so any hint would be much appreciated :) -- jimmy From owner-freebsd-net@FreeBSD.ORG Wed Dec 4 12:52:11 2013 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id CDA4AA4A for ; Wed, 4 Dec 2013 12:52:11 +0000 (UTC) Received: from smtp-out.dnepro.net (mail.dnepro.net [178.219.93.41]) by mx1.freebsd.org (Postfix) with ESMTP id 629701E98 for ; Wed, 4 Dec 2013 12:52:10 +0000 (UTC) Received: from traktor.dnepro.net (localhost [127.0.0.1]) by traktor.dnepro.net (8.14.3/8.14.3) with ESMTP id rB4CLGlL010929 for ; Wed, 4 Dec 2013 14:21:16 +0200 (EET) (envelope-from john@traktor.dnepro.net) Received: (from john@localhost) by 
traktor.dnepro.net (8.14.3/8.14.3/Submit) id rB4CLGfw010927 for freebsd-net@freebsd.org; Wed, 4 Dec 2013 14:21:16 +0200 (EET) (envelope-from john) Date: Wed, 4 Dec 2013 14:21:16 +0200 From: Eugene Perevyazko To: freebsd-net@freebsd.org Subject: Re: ipsec packets apparently not getting to destination Message-ID: <20131204122115.GA46835@traktor.dnepro.net> Mail-Followup-To: freebsd-net@freebsd.org References: Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.4.2.3i X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.2.6 (traktor.dnepro.net [127.0.0.1]); Wed, 04 Dec 2013 14:21:16 +0200 (EET) X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 04 Dec 2013 12:52:11 -0000 On Tue, Dec 03, 2013 at 11:51:41PM +0100, Jimmy Olgeni wrote: > > Hello, > > I'm trying to setup a VPN server using L2TP/IPSEC, racoon (from > ipsec-tools) and mpd5, with certificates (to avoid patching racoon for > handling wildcard PSKs). PF disabled for testing, no other firewall is > active, no NAT on the server, NAT on the client using server port 4500. > > Server is running 9.2-RELEASE r256712, with this config appended to > GENERIC: > > device crypto # core crypto support > device cryptodev # /dev/crypto for access to h/w > device enc # IPsec interface. > options IPSEC # IP security (requires device crypto) > options IPSEC_NAT_T # NAT-T support, UDP encap of ESP > options IPSEC_FILTERTUNNEL # filter ipsec packets from a tunnel > > (plus other unrelated things, ALTQ, SW_WATCHDOG, DDB, TEKEN_UTF8). > > After tens of tests I got to this point... > > If I disable ipsec on the Windows 8 client, the L2TP tunnel comes up > perfectly. A sample PPTP tunnel (unrelated) also works fine. 
I take it as > proof that mpd5 is configured in a more or less sensible manner. > > My /etc/ipsec.conf looks like this: > > flush; > spdflush; > spdadd 0.0.0.0/0[0] 0.0.0.0/0[1701] udp -P in ipsec > esp/transport//require; > spdadd 0.0.0.0/0[1701] 0.0.0.0/0[0] udp -P out ipsec > esp/transport//require; > > Which translates to this at runtime: > > 0.0.0.0/0[1701] 0.0.0.0/0[any] udp > in ipsec > esp/transport//require > spid=58 seq=1 pid=43822 > refcnt=1 > 0.0.0.0/0[any] 0.0.0.0/0[1701] udp > out ipsec > esp/transport//require > spid=57 seq=0 pid=43822 > refcnt=1 > > When connecting with L2TP/IPSEC from the Windows client, racoon shows this > output: > > (C.C.C.C -> NAT address before Windows client, S.S.S.S -> public > address of L2TP server) > > 2013-12-03 23:10:03: INFO: respond new phase 1 negotiation: > S.S.S.S[500]<=>C.C.C.C[49216] > 2013-12-03 23:10:03: INFO: begin Identity Protection mode. > 2013-12-03 23:10:03: INFO: received broken Microsoft ID: MS NT5 > ISAKMPOAKLEY > 2013-12-03 23:10:03: INFO: received Vendor ID: RFC 3947 > 2013-12-03 23:10:03: INFO: received Vendor ID: > draft-ietf-ipsec-nat-t-ike-02 > 2013-12-03 23:10:03: INFO: received Vendor ID: FRAGMENTATION > 2013-12-03 23:10:03: [C.C.C.C] INFO: Selected NAT-T version: RFC 3947 > 2013-12-03 23:10:03: ERROR: invalid DH group 20. > 2013-12-03 23:10:03: ERROR: invalid DH group 19. > 2013-12-03 23:10:03: [S.S.S.S] INFO: Hashing S.S.S.S[500] with algo #2 > 2013-12-03 23:10:03: INFO: NAT-D payload #0 verified > 2013-12-03 23:10:03: [C.C.C.C] INFO: Hashing C.C.C.C[49216] with algo #2 > 2013-12-03 23:10:03: INFO: NAT-D payload #1 doesn't match > 2013-12-03 23:10:03: INFO: NAT detected: PEER > 2013-12-03 23:10:03: [C.C.C.C] INFO: Hashing C.C.C.C[49216] with algo #2 > 2013-12-03 23:10:03: [S.S.S.S] INFO: Hashing S.S.S.S[500] with algo #2 > 2013-12-03 23:10:03: INFO: Adding remote and local NAT-D payloads. 
> 2013-12-03 23:10:03: INFO: NAT-T: ports changed to: > C.C.C.C[4500]<->S.S.S.S[4500] > 2013-12-03 23:10:03: INFO: KA found: S.S.S.S[4500]->C.C.C.C[4500] > (in_use=2) > 2013-12-03 23:10:03: WARNING: unable to get certificate CRL(3) at > depth:0 > SubjectName:/C=IT/ST=Lombardia/L=Milano/O=MovieReading/CN=LiveSub Client > 2013-12-03 23:10:03: WARNING: unable to get certificate CRL(3) at > depth:1 SubjectName:/C=IT/ST=Lombardia/O=MovieReading/CN=ROOT CA > 2013-12-03 23:10:03: INFO: ISAKMP-SA established > S.S.S.S[4500]-C.C.C.C[4500] spi:077c160ee905cf2e:062d1918ab2b788f > 2013-12-03 23:10:03: INFO: respond new phase 2 negotiation: > S.S.S.S[4500]<=>C.C.C.C[4500] > 2013-12-03 23:10:03: INFO: Adjusting my encmode UDP-Transport->Transport > 2013-12-03 23:10:03: INFO: Adjusting peer's encmode > UDP-Transport(4)->Transport(2) > 2013-12-03 23:10:03: INFO: IPsec-SA established: ESP/Transport > S.S.S.S[500]->C.C.C.C[500] spi=225553014(0xd71aa76) > 2013-12-03 23:10:03: INFO: IPsec-SA established: ESP/Transport > S.S.S.S[500]->C.C.C.C[500] spi=2749046390(0xa3db1e76) > > CRL aside, which is not configured right now, certificate handling looks > ok. Client side NAT also looks good. 
> > Also, tcpdump on enc0 shows the relevant packets coming through IPSEC: > > tcpdump: WARNING: enc0: no IPv4 address assigned > tcpdump: verbose output suppressed, use -v or -vv for full protocol > decode > listening on enc0, link-type ENC (OpenBSD encapsulated IP), capture > size 65535 bytes > 23:10:03.521573 (authentic,confidential): SPI 0x0d71aa76: IP > client.dialup.tiscali.it.l2f > olgeni.olgeni.com.l2f: > l2tp:[TLS](0/0)Ns=0,Nr=0 *MSGTYPE(SCCRQ) *PROTO_VER(1.0) > *FRAMING_CAP(S) *BEARER_CAP() FIRM_VER(1539) *HOST_NAME(moviereading) > VENDOR_NAME(Microsoft) *ASSND_TUN_ID(16) *RECV_WIN_SIZE(8) > 23:10:04.513077 (authentic,confidential): SPI 0x0d71aa76: IP > client.dialup.tiscali.it.l2f > olgeni.olgeni.com.l2f: > l2tp:[TLS](0/0)Ns=0,Nr=0 *MSGTYPE(SCCRQ) *PROTO_VER(1.0) > *FRAMING_CAP(S) *BEARER_CAP() FIRM_VER(1539) *HOST_NAME(moviereading) > VENDOR_NAME(Microsoft) *ASSND_TUN_ID(16) *RECV_WIN_SIZE(8) > > Now, the really weird part is that mpd5 does not even see the packets > addressed to the l2f (1701) port. > > I tried to bind mpd5 both to S.S.S.S and to 0.0.0.0, but nothing > changed. > > Also, if I run "socat UDP-LISTEN:1701 STDOUT" in place of mpd5, *nothing* > comes through, even if the dump on enc0 shows that something is coming in. 
> > Running "setkey -D" shows this: > > S.S.S.S C.C.C.C > esp mode=transport spi=3417968112(0xcbba0df0) reqid=0(0x00000000) > E: rijndael-cbc 65260e8e fd0d9dbf 8aa363d8 7cc81f41 2eb89aff > d6984fb9 b7bdfc56 50774e0a > A: hmac-sha1 fd5e6716 fe7e2c57 fc1f42b9 ec5307ab dae3ea6f > seq=0x00000000 replay=4 flags=0x00000000 state=mature > created: Dec 3 23:24:16 2013 current: Dec 3 23:24:29 2013 > diff: 13(s) hard: 3600(s) soft: 2880(s) > last: hard: 0(s) soft: 0(s) > current: 0(bytes) hard: 0(bytes) soft: 0(bytes) > allocated: 0 hard: 0 soft: 0 > sadb_seq=1 pid=43884 refcnt=1 > C.C.C.C S.S.S.S > esp mode=transport spi=253016163(0x0f14b863) reqid=0(0x00000000) > E: rijndael-cbc 1463f10b 87e52b9b 9d32ee04 350198ae 6779d06d > 3f57389b 71bffd18 72211b36 > A: hmac-sha1 1037b02e 7ec2cf51 50351bb6 cf8ab693 25d87e0a > seq=0x00000004 replay=4 flags=0x00000000 state=mature > created: Dec 3 23:24:16 2013 current: Dec 3 23:24:29 2013 > diff: 13(s) hard: 3600(s) soft: 2880(s) > last: Dec 3 23:24:23 2013 hard: 0(s) soft: 0(s) > current: 532(bytes) hard: 0(bytes) soft: 0(bytes) > allocated: 4 hard: 0 soft: 0 > sadb_seq=0 pid=43884 refcnt=1 > > I cannot imagine any obvious reason for packets getting "lost" after enc0, > so any hint would be much appreciated :) > mpd uses netgraph for most if not all processing. Could it be that ipsec-processed packets do not enter corresponding netgraph node? You can look at the netgraph tree to see where mpd expects to see incoming packets. 
--
Eugene Perevyazko

From owner-freebsd-net@FreeBSD.ORG Thu Dec 5 08:41:56 2013
Return-Path:
Delivered-To: freebsd-net@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 0BDFDAC0 for ; Thu, 5 Dec 2013 08:41:56 +0000 (UTC)
Received: from segfault.kiev.ua (segfault.kiev.ua [193.193.193.4]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 3EDC3186B for ; Thu, 5 Dec 2013 08:41:54 +0000 (UTC)
Received: from segfault.kiev.ua (localhost.segfault.kiev.ua [127.0.0.1]) by segfault.kiev.ua (8.14.5/8.14.5/8.Who.Cares) with ESMTP id rB58flqe031839; Thu, 5 Dec 2013 10:41:47 +0200 (EET) (envelope-from netch@segfault.kiev.ua)
Received: (from netch@localhost) by segfault.kiev.ua (8.14.5/8.14.5/Submit) id rB58fghP031836; Thu, 5 Dec 2013 10:41:42 +0200 (EET) (envelope-from netch)
Date: Thu, 5 Dec 2013 10:41:42 +0200
From: Valentin Nechayev
To: freebsd-net@freebsd.org
Subject: SCTP huge connect delays (at amd64) and yet another question
Message-ID: <20131205084142.GA31113@netch.kiev.ua>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
X-42: On
X-BeenThere: freebsd-net@freebsd.org
X-Mailman-Version: 2.1.17
Precedence: list
List-Id: Networking and TCP/IP with FreeBSD
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Thu, 05 Dec 2013 08:41:56 -0000

Hi,

I've got some surprising test results and would like a clarification.

A simple connection is created between two one-to-one SCTP sockets
(AF_INET, SOCK_STREAM, IPPROTO_SCTP) over loopback (127.0.0.1). The
server side sends six 3-byte messages to the client side and optionally
shuts down its writing side. The client receives all of them and
measures the time spent in each receive call.
The code is shown at the end of this message.
Tested systems are:
* FreeBSD 9.2-release/amd64
* FreeBSD 9.1-release/amd64
* FreeBSD 9.1-release/i386
* Linux OpenSuSE 12.2, kernel 3.4.63-2.44-default, x86_64
* Linux RHEL 6.3, kernel 2.6.32-279.22.1.38.0.el6.x86_64

The first discrepancy is specific to FreeBSD on amd64 and does not
appear on i386: connection setup takes 2-4 seconds (!!).
Tcpdump shows signs that could be read as a lost message:

tcpdump: listening on lo0, link-type NULL (BSD loopback), capture size 65535 bytes
08:18:34.639422 IP (tos 0x0, ttl 64, id 65094, offset 0, flags [none], proto SCTP (132), length 188, bad cksum 0 (->f274)!)
    10.0.0.2.50025 > 127.0.0.1.2500: sctp
        1) [INIT] [init tag: 3943463987] [rwnd: 1864135] [OS: 10] [MIS: 2048] [init TSN: 3475830004]
08:18:34.639450 IP (tos 0x0, ttl 64, id 42621, offset 0, flags [none], proto SCTP (132), length 524, bad cksum 0 (->48ee)!)
    127.0.0.1.2500 > 10.0.0.2.50025: sctp
        1) [INIT ACK] [init tag: 59811639] [rwnd: 1864135] [OS: 10] [MIS: 2048] [init TSN: 466863335]
08:18:34.639467 IP (tos 0x0, ttl 64, id 52783, offset 0, flags [none], proto SCTP (132), length 424, bad cksum 0 (->21a0)!)
    10.0.0.2.50025 > 127.0.0.1.2500: sctp
        1) [COOKIE ECHO]
08:18:35.639618 IP (tos 0x0, ttl 64, id 12109, offset 0, flags [DF], proto SCTP (132), length 424, bad cksum 0 (->8082)!)
    10.0.0.2.50025 > 127.0.0.1.2500: sctp
        1) [COOKIE ECHO]
08:18:36.692628 IP (tos 0x0, ttl 64, id 48682, offset 0, flags [DF], proto SCTP (132), length 76, bad cksum 0 (->7e01)!)
    127.0.0.1.2500 > 127.0.0.1.50025: sctp
        1) [HB REQ]
08:18:36.692668 IP (tos 0x0, ttl 64, id 10809, offset 0, flags [DF], proto SCTP (132), length 76, bad cksum 0 (->86f2)!)
    10.0.0.2.50025 > 127.0.0.1.2500: sctp
        1) [HB ACK]
08:18:36.692707 IP (tos 0x2,ECT(0), ttl 64, id 16588, offset 0, flags [DF], proto SCTP (132), length 52, bad cksum 0 (->fb75)!)
    127.0.0.1.2500 > 127.0.0.1.50025: sctp
        1) [DATA] (B)(E) [TSN: 466863335] [SID: 0] [SSEQ 0] [PPID 0x0] [Payload:
        0x0000: 6162 63 abc
[...]

At 08:18:34.639467 the COOKIE ECHO was sent but apparently ignored. One
second later it was resent. Then yet another strange timeout occurred
before the HB REQ.

Test series show that this can take more than 4 seconds; the average
is about 3 seconds. Two series of 20 runs took 58 and 63 seconds in
total, which gives an average connect time of 2.9 to 3.15 seconds.

Neither Linux nor 32-bit FreeBSD shows this.

The second discrepancy is the well-known case of the so-called "Nagle"
algorithm adapted for SCTP, but the details are confusing. If
SCTP_NODELAY isn't turned on on the server side, tcpdump shows that the
second packet is sent from the sender side without delay, but the
receiver's SACK is delayed for 200 ms by default. These results are
identical for FreeBSD (32 bit) and Linux, but not amd64 FreeBSD (see
below). But why? Common sense suggests that, if the client receives
everything immediately and the server has already prepared its data, no
additional delay should be introduced. By analogy with TCP, I would
expect that "def" isn't sent until the acknowledgement for "abc"
arrives, but that it is then sent immediately.

09:28:11.374335 IP (tos 0x2,ECT(0), ttl 64, id 24204, offset 0, flags [DF], proto SCTP (132), length 52, bad cksum 0 (->ddb5)!)
    127.0.0.1.2500 > 127.0.0.1.41007: sctp
        1) [DATA] (B)(E) [TSN: 183313025] [SID: 0] [SSEQ 0] [PPID 0x0] [Payload:
        0x0000: 6162 63 abc
09:28:11.374349 IP (tos 0x0, ttl 64, id 522, offset 0, flags [none], proto SCTP (132), length 48, bad cksum 0 (->7a3e)!)
    127.0.0.1.41007 > 127.0.0.1.2500: sctp
        1) [SACK] [cum ack 183313025] [a_rwnd 1863876] [#gap acks 0] [#dup tsns 0]
09:28:11.374368 IP (tos 0x2,ECT(0), ttl 64, id 64629, offset 0, flags [DF], proto SCTP (132), length 52, bad cksum 0 (->3fcc)!)
    127.0.0.1.2500 > 127.0.0.1.41007: sctp
        1) [DATA] (B)(E) [TSN: 183313026] [SID: 0] [SSEQ 1] [PPID 0x0] [Payload:
        0x0000: 6465 66 def
09:28:11.573780 IP (tos 0x0, ttl 64, id 12179, offset 0, flags [none], proto SCTP (132), length 48, bad cksum 0 (->4cb5)!)
    127.0.0.1.41007 > 127.0.0.1.2500: sctp
        1) [SACK] [cum ack 183313026] [a_rwnd 1864135] [#gap acks 0] [#dup tsns 0]

But if the server shuts its writing side down ("s" in argv[]), this
laziness disappears. Again, the logic is opaque and confusing.

64-bit (amd64) FreeBSD shows different behavior (both 9.1 and 9.2): in
addition to the setup delay (see above), the delay between the 2nd and
3rd received packets (when SCTP_NODELAY isn't activated) can be longer
than the minimum needed, ranging from a few hundred microseconds up to
the full 0.2 second delay shown on other platforms.
On average, 1/8 of runs show this delay:

$ fgrep ghi ll | sort -rn -k2,2 -t= | uniq -c
   1 got: ghi (with MSG_EOR) tdiff=200835
   1 got: ghi (with MSG_EOR) tdiff=200829
   1 got: ghi (with MSG_EOR) tdiff=200826
   1 got: ghi (with MSG_EOR) tdiff=200822
   1 got: ghi (with MSG_EOR) tdiff=200819
   1 got: ghi (with MSG_EOR) tdiff=200800
   1 got: ghi (with MSG_EOR) tdiff=200792
   1 got: ghi (with MSG_EOR) tdiff=199885
   1 got: ghi (with MSG_EOR) tdiff=163816
   1 got: ghi (with MSG_EOR) tdiff=55849
   1 got: ghi (with MSG_EOR) tdiff=1825
  21 got: ghi (with MSG_EOR) tdiff=2
  38 got: ghi (with MSG_EOR) tdiff=1

That's definitely better than a delay on every run, as on the other
platforms (but the initial delay is really annoying).
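
A note on reading the tdiff values: the test program derives them from
the tv_usec fields only, so a wait of one second or longer wraps around
and is reported modulo one second. A minimal sketch of a full
difference follows, assuming only struct timeval from <sys/time.h>; the
helper name elapsed_usec is hypothetical and not part of the original
test.

```c
#include <sys/time.h>

/*
 * Elapsed microseconds between two gettimeofday() samples.
 * Hypothetical helper: unlike a tv_usec-only difference, it also
 * counts the whole seconds that passed between the two samples.
 */
long long
elapsed_usec(const struct timeval *start, const struct timeval *end)
{
	long long sec = (long long)end->tv_sec - (long long)start->tv_sec;
	long long usec = (long long)end->tv_usec - (long long)start->tv_usec;

	/* usec may be negative here; the seconds term compensates. */
	return sec * 1000000LL + usec;
}
```

In the receive loop this would stand in for the two-line tdiff
computation, called as elapsed_usec(&tv0, &tv1) and printed with %lld.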

The testing code:
===
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include

#define PORT 2500

int main(int argc, char *argv[])
{
  int s_li, s_ac, s_cl;
  struct sockaddr_in sia;
  struct iovec iov[1];
  struct msghdr msg;
  socklen_t slen;
  struct timeval tv0, tv1;
  int tdiff;
  int i;

  s_li = socket(AF_INET, SOCK_STREAM, IPPROTO_SCTP);
  if (s_li < 0)
    err(1, "socket");
  memset(&sia, 0, sizeof(sia));
  sia.sin_family = AF_INET;
  sia.sin_addr.s_addr = htonl(0x7F000001);
  sia.sin_port = htons(PORT);
  if (bind(s_li, (struct sockaddr*)&sia, sizeof(sia)) < 0)
    err(1, "bind");
  if (listen(s_li, 1) < 0)
    err(1, "listen");
  s_cl = socket(AF_INET, SOCK_STREAM, IPPROTO_SCTP);
  if (s_cl < 0)
    err(1, "socket");
  if (connect(s_cl, (struct sockaddr*)&sia, sizeof(sia)) < 0)
    err(1, "connect");
  slen = sizeof(sia);
  s_ac = accept(s_li, (struct sockaddr*) &sia, &slen);
  if (s_ac < 0)
    err(1, "accept");
  for (i = 1; i < argc; ++i) {
    if (!strcmp(argv[i], "nn")) {
      const int one = 1;
      if (setsockopt(s_ac, IPPROTO_SCTP, SCTP_NODELAY, &one, sizeof(one)) < 0)
        warn("setsockopt(SCTP_NODELAY)");
    }
  }
  if (send(s_ac, "abc", 3, 0) != 3)
    err(1, "send");
  if (send(s_ac, "def", 3, MSG_EOR) != 3)
    err(1, "send");
  if (send(s_ac, "ghi", 3, 0) != 3)
    err(1, "send");
  if (send(s_ac, "jkl", 3, MSG_EOR) != 3)
    err(1, "send");
  if (send(s_ac, "mno", 3, 0) != 3)
    err(1, "send");
  if (send(s_ac, "pqr", 3, MSG_EOR) != 3)
    err(1, "send");
  for (i = 1; i < argc; ++i) {
    if (!strcmp(argv[i], "s"))
      shutdown(s_ac, SHUT_WR);
  }
  for(;;) {
    char buf[1024];
    memset(&msg, 0, sizeof(msg));
    iov[0].iov_base = buf; iov[0].iov_len = sizeof(buf) - 1;
    msg.msg_iov = iov; msg.msg_iovlen = 1;
    gettimeofday(&tv0, NULL);
    ssize_t got = recvmsg(s_cl, &msg, 0);
    gettimeofday(&tv1, NULL);
    tdiff = (int)tv1.tv_usec - (int)tv0.tv_usec;
    if (tdiff < 0)
      tdiff += 1000000;
    if (got == 0)
      break;
    if (got == -1) {
      perror("recvmsg");
      break;
    }
    buf[got] = 0;
    printf("got: %s (%s MSG_EOR) tdiff=%d\n",
           buf,
           (msg.msg_flags & MSG_EOR) ? "with" : "without",
           tdiff);
    if (!strncmp(buf, "pqr", 3))
      break;
  }
  return 0;
}
// vim:ts=2:sts=2:sw=2:et:si:
===


-netch-

From owner-freebsd-net@FreeBSD.ORG Thu Dec 5 10:32:05 2013
Return-Path:
Delivered-To: freebsd-net@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id E85085A5 for ; Thu, 5 Dec 2013 10:32:04 +0000 (UTC)
Received: from mail-n.franken.de (drew.ipv6.franken.de [IPv6:2001:638:a02:a001:20e:cff:fe4a:feaa]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 1F4ED103B for ; Thu, 5 Dec 2013 10:32:04 +0000 (UTC)
Received: from [10.225.9.5] (unknown [194.95.73.101]) (Authenticated sender: macmic) by mail-n.franken.de (Postfix) with ESMTP id ED9271C0C0692; Thu, 5 Dec 2013 11:32:00 +0100 (CET)
Content-Type: text/plain; charset=us-ascii
Mime-Version: 1.0 (Mac OS X Mail 6.6 \(1510\))
Subject: Re: SCTP huge connect delays (at amd64) and yet another question
From: Michael Tuexen
In-Reply-To: <20131205084142.GA31113@netch.kiev.ua>
Date: Thu, 5 Dec 2013 11:32:03 +0100
Content-Transfer-Encoding: quoted-printable
Message-Id: <11932BA9-A734-4D4F-BCBB-6A0D926A22A9@lurchi.franken.de>
References: <20131205084142.GA31113@netch.kiev.ua>
To: Valentin Nechayev
X-Mailer: Apple Mail (2.1510)
Cc: freebsd-net@freebsd.org
X-BeenThere: freebsd-net@freebsd.org
X-Mailman-Version: 2.1.17
Precedence: list
List-Id: Networking and TCP/IP with FreeBSD
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Thu, 05 Dec 2013 10:32:05 -0000

On Dec 5, 2013, at 9:41 AM, Valentin Nechayev wrote:

> Hi,
>
> I've got some test results which are surprising and I would get
> a clarification.
>
> A simple connection is created between two one-to-one SCTP sockets
> (AF_INET, SOCK_STREAM, IPPROTO_SCTP) at loopback (127.0.0.1). The
> server side sends 6 3-byte messages to client side and optionally
> designates writing shutdown. Client receives all them and measures
> a time before each receiving.
> Code is showed at the end of this message.
> Tested systems are:
> * FreeBSD 9.2-release/amd64
> * FreeBSD 9.1-release/amd64
> * FreeBSD 9.1-release/i386
> * Linux OpenSuSE 12.2, kernel 3.4.63-2.44-default, x86_64
> * Linux RHEL 6.3, kernel 2.6.32-279.22.1.38.0.el6.x86_64
>
> The first discrepancy found is specific for FreeBSD on amd64 and not
> for i386 version; it's that connection setup lasts 2-4 seconds (!!)
> Tcpdump shows indication that could be parsed as message miss:

Hi Valentin,

could you send me the .pcap file instead of the tcpdump output?
I would like to see the addresses listed in the INIT and INIT-ACK.

You can send that file to tuexen@freebsd.org.

>
> tcpdump: listening on lo0, link-type NULL (BSD loopback), capture size 65535 bytes
> 08:18:34.639422 IP (tos 0x0, ttl 64, id 65094, offset 0, flags [none], proto SCTP (132), length 188, bad cksum 0 (->f274)!)
>     10.0.0.2.50025 > 127.0.0.1.2500: sctp

I'm wondering why 10.0.0.2 is the source address and not 127.0.0.1.

>         1) [INIT] [init tag: 3943463987] [rwnd: 1864135] [OS: 10] [MIS: 2048] [init TSN: 3475830004]
> 08:18:34.639450 IP (tos 0x0, ttl 64, id 42621, offset 0, flags [none], proto SCTP (132), length 524, bad cksum 0 (->48ee)!)
>     127.0.0.1.2500 > 10.0.0.2.50025: sctp
>         1) [INIT ACK] [init tag: 59811639] [rwnd: 1864135] [OS: 10] [MIS: 2048] [init TSN: 466863335]
> 08:18:34.639467 IP (tos 0x0, ttl 64, id 52783, offset 0, flags [none], proto SCTP (132), length 424, bad cksum 0 (->21a0)!)
>     10.0.0.2.50025 > 127.0.0.1.2500: sctp
>         1) [COOKIE ECHO]
> 08:18:35.639618 IP (tos 0x0, ttl 64, id 12109, offset 0, flags [DF], proto SCTP (132), length 424, bad cksum 0 (->8082)!)
>     10.0.0.2.50025 > 127.0.0.1.2500: sctp
>         1) [COOKIE ECHO]
> 08:18:36.692628 IP (tos 0x0, ttl 64, id 48682, offset 0, flags [DF], proto SCTP (132), length 76, bad cksum 0 (->7e01)!)
>     127.0.0.1.2500 > 127.0.0.1.50025: sctp

The retransmission goes from 127.0.0.1. Hmm. Not sure why.

>         1) [HB REQ]
> 08:18:36.692668 IP (tos 0x0, ttl 64, id 10809, offset 0, flags [DF], proto SCTP (132), length 76, bad cksum 0 (->86f2)!)
>     10.0.0.2.50025 > 127.0.0.1.2500: sctp
>         1) [HB ACK]
> 08:18:36.692707 IP (tos 0x2,ECT(0), ttl 64, id 16588, offset 0, flags [DF], proto SCTP (132), length 52, bad cksum 0 (->fb75)!)
>     127.0.0.1.2500 > 127.0.0.1.50025: sctp
>         1) [DATA] (B)(E) [TSN: 466863335] [SID: 0] [SSEQ 0] [PPID 0x0] [Payload:
>         0x0000: 6162 63 abc
> [...]
>
> At 08:18:34.639467, cookie echo was sent but likely ignored. One
> second later it was resent. Then, yet another strange timeout was
> invented before HB REQ.
>
> Test series show this can spend more than 4 seconds, average value
> is about 3 seconds. Two 20-times run summary times are 58 to 63
> seconds, so, I've got 2.9...3.15 average connect time.
>
> Neither Linux nor 32-bit FreeBSD shows this.

FreeBSD shouldn't either... Do you see this on FreeBSD 9.2 amd64?

>
> The second discrepancy is well known case of so-called "Nagle"
> algorithm adapted for SCTP but details are confusing. If
> SCTP_NODELAY isn't turned on on server side, tcpdump shows that the
> second packet is sent from sender side without delay, but receiver's
> SACK is delayed for 200 ms by default. These results are identical for
> FreeBSD (32 bit) and Linux, but not amd64 FreeBSD (see below). But
> why? A common sense suggests that, if client receives all immediately,
> and server has already prepared its data, no additional delay shall be
> invented.
> In analogue to TCP, I would expect that, until acknowledgement
> for "abc" is got, "def" isn't sent, but then the latter is sent
> immediately.
>
> 09:28:11.374335 IP (tos 0x2,ECT(0), ttl 64, id 24204, offset 0, flags [DF], proto SCTP (132), length 52, bad cksum 0 (->ddb5)!)
>     127.0.0.1.2500 > 127.0.0.1.41007: sctp
>         1) [DATA] (B)(E) [TSN: 183313025] [SID: 0] [SSEQ 0] [PPID 0x0] [Payload:
>         0x0000: 6162 63 abc
> 09:28:11.374349 IP (tos 0x0, ttl 64, id 522, offset 0, flags [none], proto SCTP (132), length 48, bad cksum 0 (->7a3e)!)
>     127.0.0.1.41007 > 127.0.0.1.2500: sctp
>         1) [SACK] [cum ack 183313025] [a_rwnd 1863876] [#gap acks 0] [#dup tsns 0]
> 09:28:11.374368 IP (tos 0x2,ECT(0), ttl 64, id 64629, offset 0, flags [DF], proto SCTP (132), length 52, bad cksum 0 (->3fcc)!)
>     127.0.0.1.2500 > 127.0.0.1.41007: sctp
>         1) [DATA] (B)(E) [TSN: 183313026] [SID: 0] [SSEQ 1] [PPID 0x0] [Payload:
>         0x0000: 6465 66 def
> 09:28:11.573780 IP (tos 0x0, ttl 64, id 12179, offset 0, flags [none], proto SCTP (132), length 48, bad cksum 0 (->4cb5)!)
>     127.0.0.1.41007 > 127.0.0.1.2500: sctp
>         1) [SACK] [cum ack 183313026] [a_rwnd 1864135] [#gap acks 0] [#dup tsns 0]
>

Please note that the first SACK is returned without the 200 ms delay. This is
required by the RFC and the above trace seems to show that.

> But, if server shuts its writing side down ("s" in argv[]), this
> laziness disappears. Again, the logic is too opaque and confusing.

What do you mean by this?

>
> 64-bit (amd64) FreeBSD shows another behavior (both 9.1 and 9.2): in
> addition to setup delay (see above), the delay between 2nd and 3rd
> received packet (case SCTP_NODELAY isn't activated) could be longer
> than minimally needed one and spreads between a few hundreds of
> microseconds up to full 0.2 second delay shown on other platforms.
> In average, 1/8 of runs show this delay:
>
> $ fgrep ghi ll | sort -rn -k2,2 -t= | uniq -c
>    1 got: ghi (with MSG_EOR) tdiff=200835
>    1 got: ghi (with MSG_EOR) tdiff=200829
>    1 got: ghi (with MSG_EOR) tdiff=200826
>    1 got: ghi (with MSG_EOR) tdiff=200822
>    1 got: ghi (with MSG_EOR) tdiff=200819
>    1 got: ghi (with MSG_EOR) tdiff=200800
>    1 got: ghi (with MSG_EOR) tdiff=200792
>    1 got: ghi (with MSG_EOR) tdiff=199885
>    1 got: ghi (with MSG_EOR) tdiff=163816
>    1 got: ghi (with MSG_EOR) tdiff=55849
>    1 got: ghi (with MSG_EOR) tdiff=1825
>   21 got: ghi (with MSG_EOR) tdiff=2
>   38 got: ghi (with MSG_EOR) tdiff=1
>
> It's definitely better than delay each run, as on other platforms
> (but the initial delay annoys roughly).

Without SCTP_NODELAY, bundling may or may not happen; it depends on timing.
It would be great if you could provide a .pcap file for a transfer you
think shows some buggy behaviour. Then we can figure out what is going on.

>
> The testing code:
> ===
> #include
> #include
> #include
> #include
> #include
> #include
> #include
> #include
> #include
> #include
> #include
> #include
>
> #define PORT 2500
>
> int main(int argc, char *argv[])
> {
>   int s_li, s_ac, s_cl;
>   struct sockaddr_in sia;
>   struct iovec iov[1];
>   struct msghdr msg;
>   socklen_t slen;
>   struct timeval tv0, tv1;
>   int tdiff;
>   int i;
>
>   s_li = socket(AF_INET, SOCK_STREAM, IPPROTO_SCTP);
>   if (s_li < 0)
>     err(1, "socket");
>   memset(&sia, 0, sizeof(sia));
>   sia.sin_family = AF_INET;
>   sia.sin_addr.s_addr = htonl(0x7F000001);
>   sia.sin_port = htons(PORT);
>   if (bind(s_li, (struct sockaddr*)&sia, sizeof(sia)) < 0)
>     err(1, "bind");
>   if (listen(s_li, 1) < 0)
>     err(1, "listen");
>   s_cl = socket(AF_INET, SOCK_STREAM, IPPROTO_SCTP);
>   if (s_cl < 0)
>     err(1, "socket");
>   if (connect(s_cl, (struct sockaddr*)&sia, sizeof(sia)) < 0)
>     err(1, "connect");
>   slen = sizeof(sia);
>   s_ac = accept(s_li, (struct sockaddr*) &sia, &slen);
>   if (s_ac < 0)
>     err(1, "accept");
>   for (i = 1; i < argc; ++i) {
>     if (!strcmp(argv[i], "nn")) {
>       const int one = 1;
>       if (setsockopt(s_ac, IPPROTO_SCTP, SCTP_NODELAY, &one, sizeof(one)) < 0)
>         warn("setsockopt(SCTP_NODELAY)");
>     }
>   }
>   if (send(s_ac, "abc", 3, 0) != 3)
>     err(1, "send");
>   if (send(s_ac, "def", 3, MSG_EOR) != 3)

MSG_EOR is not something you pass to a send() call. The flag is only
returned by the recvmsg() call.

>     err(1, "send");
>   if (send(s_ac, "ghi", 3, 0) != 3)
>     err(1, "send");
>   if (send(s_ac, "jkl", 3, MSG_EOR) != 3)
>     err(1, "send");
>   if (send(s_ac, "mno", 3, 0) != 3)
>     err(1, "send");
>   if (send(s_ac, "pqr", 3, MSG_EOR) != 3)
>     err(1, "send");
>   for (i = 1; i < argc; ++i) {
>     if (!strcmp(argv[i], "s"))
>       shutdown(s_ac, SHUT_WR);
>   }
>   for(;;) {
>     char buf[1024];
>     memset(&msg, 0, sizeof(msg));
>     iov[0].iov_base = buf; iov[0].iov_len = sizeof(buf) - 1;
>     msg.msg_iov = iov; msg.msg_iovlen = 1;
>     gettimeofday(&tv0, NULL);
>     ssize_t got = recvmsg(s_cl, &msg, 0);
>     gettimeofday(&tv1, NULL);
>     tdiff = (int)tv1.tv_usec - (int)tv0.tv_usec;
>     if (tdiff < 0)
>       tdiff += 1000000;
>     if (got == 0)
>       break;
>     if (got == -1) {
>       perror("recvmsg");
>       break;
>     }
>     buf[got] = 0;
>     printf("got: %s (%s MSG_EOR) tdiff=%d\n",
>            buf,
>            (msg.msg_flags & MSG_EOR) ? "with" : "without",
>            tdiff);
>     if (!strncmp(buf, "pqr", 3))
>       break;
>   }
>   return 0;
> }

OK. Here is what I would expect on the wire:

Without SCTP_NODELAY:

  > INIT
  < INIT_ACK
  > COOKIE_ECHO
  < COOKIE_ACK
  < DATA(abc)
  > SACK
  < DATA(def);DATA(ghi);DATA(jkl);DATA(mno);DATA(pqr)
  > SACK
  > SHUTDOWN
  < SHUTDOWN_ACK
  > SHUTDOWN_COMPLETE

There should be no substantial delay between any messages above.

With SCTP_NODELAY:

  > INIT
  < INIT_ACK
  > COOKIE_ECHO
  < COOKIE_ACK
  < DATA(abc)
  < DATA(def)
  < DATA(ghi)
  < DATA(jkl)
  < DATA(mno)
  < DATA(pqr)
  > SHUTDOWN
  < SHUTDOWN_ACK
  > SHUTDOWN_COMPLETE

There will be three SACKs somewhere between the DATA chunks, depending on
the timing.
There should be no substantial delay between any messages above.

I think if you see anything else, there is a bug. So do you see a different
behavior on FreeBSD 9.2 (i386/amd64)? If yes, can you provide a .pcap file?

Here is what I see on a 9.2 amd64 system:

tuexen@bsd9:~ % uname -a
FreeBSD bsd9.fh-muenster.de 9.2-RELEASE FreeBSD 9.2-RELEASE #0 r255898: Thu Sep 26 22:50:31 UTC 2013 root@bake.isc.freebsd.org:/usr/obj/usr/src/sys/GENERIC amd64
tuexen@bsd9:~ % ./valentin
got: abc (with MSG_EOR) tdiff=3
got: def (with MSG_EOR) tdiff=1
got: ghi (with MSG_EOR) tdiff=1
got: jkl (with MSG_EOR) tdiff=1
got: mno (with MSG_EOR) tdiff=1
got: pqr (with MSG_EOR) tdiff=0
tuexen@bsd9:~ % ./valentin nn
got: abc (with MSG_EOR) tdiff=4
got: def (with MSG_EOR) tdiff=2
got: ghi (with MSG_EOR) tdiff=1
got: jkl (with MSG_EOR) tdiff=1
got: mno (with MSG_EOR) tdiff=1
got: pqr (with MSG_EOR) tdiff=1

Do you have any special routing setup?

Best regards
Michael

> // vim:ts=2:sts=2:sw=2:et:si:
> ===
>
>
> -netch-
> _______________________________________________
> freebsd-net@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-net
> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"
>

From owner-freebsd-net@FreeBSD.ORG Thu Dec 5 10:57:45 2013
Return-Path:
Delivered-To: freebsd-net@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id BDE1DC5C for ; Thu, 5 Dec 2013 10:57:45 +0000 (UTC)
Received: from mail-n.franken.de (drew.ipv6.franken.de [IPv6:2001:638:a02:a001:20e:cff:fe4a:feaa]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id E5B7F11AF for ; Thu, 5 Dec 2013 10:57:44 +0000 (UTC)
Received: from [10.225.9.5] (unknown [194.95.73.101]) (Authenticated sender:
        macmic) by mail-n.franken.de (Postfix) with ESMTP id 655081C0C0693; Thu, 5 Dec 2013 11:57:43 +0100 (CET)
Content-Type: text/plain; charset=us-ascii
Mime-Version: 1.0 (Mac OS X Mail 6.6 \(1510\))
Subject: Re: SCTP huge connect delays (at amd64) and yet another question
From: Michael Tuexen
In-Reply-To: <11932BA9-A734-4D4F-BCBB-6A0D926A22A9@lurchi.franken.de>
Date: Thu, 5 Dec 2013 11:57:46 +0100
Content-Transfer-Encoding: quoted-printable
Message-Id: <45DB7B10-68DE-41F2-A5E9-22AFFC65999E@lurchi.franken.de>
References: <20131205084142.GA31113@netch.kiev.ua> <11932BA9-A734-4D4F-BCBB-6A0D926A22A9@lurchi.franken.de>
To: Valentin Nechayev
X-Mailer: Apple Mail (2.1510)
Cc: freebsd-net@freebsd.org
X-BeenThere: freebsd-net@freebsd.org
X-Mailman-Version: 2.1.17
Precedence: list
List-Id: Networking and TCP/IP with FreeBSD
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Thu, 05 Dec 2013 10:57:45 -0000

More thinking and testing. Without SCTP_NODELAY the following can also happen:

  > INIT
  < INIT-ACK
  > COOKIE-ECHO
  < COOKIE-ACK
  < DATA(abc)
  > SACK
  < DATA(def) possibly more...
  200 ms delay
  > SACK
  < all remaining DATA chunks
  > SHUTDOWN
  < SHUTDOWN-ACK
  > SHUTDOWN-COMPLETE

Timing comes into play. The question is whether all send() calls have been
completed before the first SACK is received. Not sure this depends on i386
vs. amd64, but timing is important.

On a Raspberry Pi I saw, in a reproducible way:

  > INIT
  < INIT-ACK
  > COOKIE-ECHO
  < COOKIE-ACK
  < DATA(abc)
  > SACK
  < DATA(def)
  200 ms delay
  > SACK
  < DATA(ghi);DATA(jkl);DATA(mno);DATA(pqr);
  > SHUTDOWN
  < SHUTDOWN-ACK
  > SHUTDOWN-COMPLETE

Best regards
Michael

On Dec 5, 2013, at 11:32 AM, Michael Tuexen wrote:

> On Dec 5, 2013, at 9:41 AM, Valentin Nechayev wrote:
>
>> Hi,
>>
>> I've got some test results which are surprising and I would get
>> a clarification.
>>=20 >> A simple connection is created between two one-to-one SCTP sockets >> (AF_INET, SOCK_STREAM, IPPROTO_SCTP) at loopback (127.0.0.1). The >> server side sends 6 3-byte messages to client side and optionally >> designates writing shutdown. Client receives all them and measures >> a time before each receiving. >> Code is showed at the end of this message. >> Tested systems are: >> * FreeBSD 9.2-release/amd64 >> * FreeBSD 9.1-release/amd64 >> * FreeBSD 9.1-release/i386 >> * Linux OpenSuSE 12.2, kernel 3.4.63-2.44-default, x86_64 >> * Linux RHEL 6.3, kernel 2.6.32-279.22.1.38.0.el6.x86_64 >>=20 >> The first discrepancy found is specific for FreeBSD on amd64 and not >> for i386 version; it's that connection setup lasts 2-4 seconds (!!) >> Tcpdump shows indication that could be parsed as message miss: > Hi Valentin, >=20 > could you send me the .pcap file instead of the tcpdump output. > I would like to see the addresses listed in the INIT and INIT-ACK. >=20 > You can send that file to tuexen@freebsd.org. >>=20 >> tcpdump: listening on lo0, link-type NULL (BSD loopback), capture = size 65535 byt >> es >> 08:18:34.639422 IP (tos 0x0, ttl 64, id 65094, offset 0, flags = [none], proto SCT >> P (132), length 188, bad cksum 0 (->f274)!) >> 10.0.0.2.50025 > 127.0.0.1.2500: sctp > I'm wondering why 10.0.0.2 is the source address and not 127.0.0.1 >> 1) [INIT] [init tag: 3943463987] [rwnd: 1864135] [OS: 10] [MIS: = 2048] [i >> nit TSN: 3475830004] >> 08:18:34.639450 IP (tos 0x0, ttl 64, id 42621, offset 0, flags = [none], proto SCT >> P (132), length 524, bad cksum 0 (->48ee)!) >> 127.0.0.1.2500 > 10.0.0.2.50025: sctp >> 1) [INIT ACK] [init tag: 59811639] [rwnd: 1864135] [OS: 10] = [MIS: 2048] >> [init TSN: 466863335] >> 08:18:34.639467 IP (tos 0x0, ttl 64, id 52783, offset 0, flags = [none], proto SCT >> P (132), length 424, bad cksum 0 (->21a0)!) 
>> 10.0.0.2.50025 > 127.0.0.1.2500: sctp >> 1) [COOKIE ECHO] >> 08:18:35.639618 IP (tos 0x0, ttl 64, id 12109, offset 0, flags [DF], = proto SCTP >> (132), length 424, bad cksum 0 (->8082)!) >> 10.0.0.2.50025 > 127.0.0.1.2500: sctp >> 1) [COOKIE ECHO] >> 08:18:36.692628 IP (tos 0x0, ttl 64, id 48682, offset 0, flags [DF], = proto SCTP >> (132), length 76, bad cksum 0 (->7e01)!) >> 127.0.0.1.2500 > 127.0.0.1.50025: sctp > The retransmission goes from 127.0.0.1. Hmm. Not sure why. >> 1) [HB REQ] >> 08:18:36.692668 IP (tos 0x0, ttl 64, id 10809, offset 0, flags [DF], = proto SCTP (132), length 76, bad cksum 0 (->86f2)!) >> 10.0.0.2.50025 > 127.0.0.1.2500: sctp >> 1) [HB ACK]=20 >> 08:18:36.692707 IP (tos 0x2,ECT(0), ttl 64, id 16588, offset 0, flags = [DF], proto SCTP (132), length 52, bad cksum 0 (->fb75)!) >> 127.0.0.1.2500 > 127.0.0.1.50025: sctp >> 1) [DATA] (B)(E) [TSN: 466863335] [SID: 0] [SSEQ 0] [PPID 0x0] = [Payload: >> 0x0000: 6162 63 abc >> [...] >>=20 >> At 08:18:34.639467, cookie echo was sent but likely ignored. One >> second later it was resent. Then, yet another strange timeout was >> invented before HB REQ. >>=20 >> Test series show this can spend more than 4 seconds, average value >> is about 3 seconds. Two 20-times run summary times are 58 to 63 >> seconds, so, I've got 2.9...3.15 average connect time. >>=20 >> Neither Linux nor 32-bit FreeBSD shows this. > FreeBSD should neither... Do you see this on FreeBSD 9.2 amd64? >>=20 >> The second discrepancy is well known case of so-called "Nagle" >> algorithm adapted for SCTP but details are confusing. If >> SCTP_NODELAY isn't turned on on server side, tcpdump shows that the >> second packet is sent from sender side without delay, but receiver's >> SACK is delayed for 200 ms by default. These results are identical = for >> FreeBSD (32 bit) and Linux, but not amd64 FreeBSD (see below). But >> why? 
A common sense suggests that, if client receives all = immediately, >> and server has already prepared its data, no additional delay shall = be >> invented. In analogue to TCP, I would expect that, until acknoledge >> for "abc" is got, "def" isn't sent, but then the latter is sent >> immediately. >>=20 >> 09:28:11.374335 IP (tos 0x2,ECT(0), ttl 64, id 24204, offset 0, flags = [DF], prot >> o SCTP (132), length 52, bad cksum 0 (->ddb5)!) >> 127.0.0.1.2500 > 127.0.0.1.41007: sctp >> 1) [DATA] (B)(E) [TSN: 183313025] [SID: 0] [SSEQ 0] [PPID 0x0] = [Payload: >> 0x0000: 6162 63 abc >> 09:28:11.374349 IP (tos 0x0, ttl 64, id 522, offset 0, flags [none], = proto SCTP=20 >> (132), length 48, bad cksum 0 (->7a3e)!) >> 127.0.0.1.41007 > 127.0.0.1.2500: sctp >> 1) [SACK] [cum ack 183313025] [a_rwnd 1863876] [#gap acks 0] = [#dup tsns=20 >> 0]=20 >> 09:28:11.374368 IP (tos 0x2,ECT(0), ttl 64, id 64629, offset 0, flags = [DF], prot >> o SCTP (132), length 52, bad cksum 0 (->3fcc)!) >> 127.0.0.1.2500 > 127.0.0.1.41007: sctp >> 1) [DATA] (B)(E) [TSN: 183313026] [SID: 0] [SSEQ 1] [PPID 0x0] = [Payload: >> 0x0000: 6465 66 def >> 09:28:11.573780 IP (tos 0x0, ttl 64, id 12179, offset 0, flags = [none], proto SCT >> P (132), length 48, bad cksum 0 (->4cb5)!) >> 127.0.0.1.41007 > 127.0.0.1.2500: sctp >> 1) [SACK] [cum ack 183313026] [a_rwnd 1864135] [#gap acks 0] = [#dup tsns=20 >> 0]=20 >>=20 > Please note, that the first SACK is returned without the 200ms delay. = This is > required by the RFC and the above trace seems to show that. >> But, if server shuts its writing side down ("s" in argv[]), this >> laziness disappears. Again, the logic is too opaque and confusing. > What do you mean by this? 
>>=20 >> 64-bit (amd64) FreeBSD shows another behavior (both 9.1 and 9.2): in >> addition to setup delay (see above), the delay between 2nd and 3rd >> received packet (case SCTP_NODELAY isn't activated) could be longer >> than minimally needed one and spreads between a few hundreds of >> microseconds up to full 0.2 second delay shown on other platforms. >> In average, 1/8 of runs show this delay: >>=20 >> $ fgrep ghi ll | sort -rn -k2,2 -t=3D | uniq -c >> 1 got: ghi (with MSG_EOR) tdiff=3D200835 >> 1 got: ghi (with MSG_EOR) tdiff=3D200829 >> 1 got: ghi (with MSG_EOR) tdiff=3D200826 >> 1 got: ghi (with MSG_EOR) tdiff=3D200822 >> 1 got: ghi (with MSG_EOR) tdiff=3D200819 >> 1 got: ghi (with MSG_EOR) tdiff=3D200800 >> 1 got: ghi (with MSG_EOR) tdiff=3D200792 >> 1 got: ghi (with MSG_EOR) tdiff=3D199885 >> 1 got: ghi (with MSG_EOR) tdiff=3D163816 >> 1 got: ghi (with MSG_EOR) tdiff=3D55849 >> 1 got: ghi (with MSG_EOR) tdiff=3D1825 >> 21 got: ghi (with MSG_EOR) tdiff=3D2 >> 38 got: ghi (with MSG_EOR) tdiff=3D1 >>=20 >> It's definitely better than delay each run, as on other platforms >> (but the initial delay annoys roughly). > Without SCTP_NODELAY bundling can happen or not, it depends on timing. > It would be great, if you can provide a .pcap file for a transfer you > think shows some buggy behaviour. Then we can figure out what is going = on. 
>>=20 >> The testing code: >> =3D=3D=3D >> #include >> #include >> #include >> #include >> #include >> #include >> #include >> #include >> #include >> #include >> #include >> #include >>=20 >> #define PORT 2500 >>=20 >> int main(int argc, char *argv[]) >> { >> int s_li, s_ac, s_cl; >> struct sockaddr_in sia; >> struct iovec iov[1]; >> struct msghdr msg; >> socklen_t slen; >> struct timeval tv0, tv1; >> int tdiff; >> int i; >>=20 >> s_li =3D socket(AF_INET, SOCK_STREAM, IPPROTO_SCTP); >> if (s_li < 0) >> err(1, "socket"); >> memset(&sia, 0, sizeof(sia)); >> sia.sin_family =3D AF_INET; >> sia.sin_addr.s_addr =3D htonl(0x7F000001); >> sia.sin_port =3D htons(PORT); >> if (bind(s_li, (struct sockaddr*)&sia, sizeof(sia)) < 0) >> err(1, "bind"); >> if (listen(s_li, 1) < 0) >> err(1, "listen"); >> s_cl =3D socket(AF_INET, SOCK_STREAM, IPPROTO_SCTP); >> if (s_cl < 0) >> err(1, "socket"); >> if (connect(s_cl, (struct sockaddr*)&sia, sizeof(sia)) < 0) >> err(1, "connect"); >> slen =3D sizeof(sia); >> s_ac =3D accept(s_li, (struct sockaddr*) &sia, &slen); >> if (s_ac < 0) >> err(1, "accept"); >> for (i =3D 1; i < argc; ++i) { >> if (!strcmp(argv[i], "nn")) { >> const int one =3D 1; >> if (setsockopt(s_ac, IPPROTO_SCTP, SCTP_NODELAY, &one, = sizeof(one)) < 0) >> warn("setsockopt(SCTP_NODELAY)"); >> } >> } >> if (send(s_ac, "abc", 3, 0) !=3D 3) >> err(1, "send"); >> if (send(s_ac, "def", 3, MSG_EOR) !=3D 3) > MSG_EOR is nothing you provide at a send() call. The flag is only > returned by the recvmsg() call. 
>> err(1, "send"); >> if (send(s_ac, "ghi", 3, 0) !=3D 3) >> err(1, "send"); >> if (send(s_ac, "jkl", 3, MSG_EOR) !=3D 3) >> err(1, "send"); >> if (send(s_ac, "mno", 3, 0) !=3D 3) >> err(1, "send"); >> if (send(s_ac, "pqr", 3, MSG_EOR) !=3D 3) >> err(1, "send"); >> for (i =3D 1; i < argc; ++i) { >> if (!strcmp(argv[i], "s")) >> shutdown(s_ac, SHUT_WR); >> } >> for(;;) { >> char buf[1024]; >> memset(&msg, 0, sizeof(msg)); >> iov[0].iov_base =3D buf; iov[0].iov_len =3D sizeof(buf) - 1; >> msg.msg_iov =3D iov; msg.msg_iovlen =3D 1; >> gettimeofday(&tv0, NULL); >> ssize_t got =3D recvmsg(s_cl, &msg, 0); >> gettimeofday(&tv1, NULL); >> tdiff =3D (int)tv1.tv_usec - (int)tv0.tv_usec; >> if (tdiff < 0) >> tdiff +=3D 1000000; >> if (got =3D=3D 0) >> break; >> if (got =3D=3D -1) { >> perror("recvmsg"); >> break; >> } >> buf[got] =3D 0; >> printf("got: %s (%s MSG_EOR) tdiff=3D%d\n", >> buf, >> (msg.msg_flags & MSG_EOR) ? "with" : "without", >> tdiff); >> if (!strncmp(buf, "pqr", 3)) >> break; >> } >> return 0; >> } > OK. Here is what I would expect on the wire: >=20 > Without SCTP_NODELAY: >=20 >> INIT > < INIT_ACK >> COOKIE_ECHO > < COOKIE_ACK > < DATA(abc) >> SACK > < DATA(def);DATA(ghi);DATA(jkl);DATA(mno);DATA(pqr) >> SACK >> SHUTDOWN > < SHUTDOWN_ACK >> SHUTDOWN_COMPLETE >=20 > There should be no substantial delay between any messages above. >=20 > With SCTP_NODELAY >> INIT > < INIT_ACK >> COOKIE_ECHO > < COOKIE_ACK > < DATA(abc) > < DATA(def) > < DATA(ghi) > < DATA(mno) > < DATA(pqr) >> SHUTDOWN > < SHUTDOWN_ACK >> SHUTDOWN_COMPLETE >=20 > There will be three SACK somewhere between the DATA chunks depending = on > the timing. >=20 > There should be no substantial delay between any messages above. >=20 > I think if you see anything else, there is a bug. So do you see a = different > behavior on FreeBSD 9.2 (i386/amd64)? If yes, can you provide a .pcap = file? 
>=20 >=20 > Here is what I see on a 9.2 amd64 system: >=20 > tuexen@bsd9:~ % uname -a > FreeBSD bsd9.fh-muenster.de 9.2-RELEASE FreeBSD 9.2-RELEASE #0 = r255898: Thu Sep 26 22:50:31 UTC 2013 = root@bake.isc.freebsd.org:/usr/obj/usr/src/sys/GENERIC amd64 > tuexen@bsd9:~ % ./valentin=20 > got: abc (with MSG_EOR) tdiff=3D3 > got: def (with MSG_EOR) tdiff=3D1 > got: ghi (with MSG_EOR) tdiff=3D1 > got: jkl (with MSG_EOR) tdiff=3D1 > got: mno (with MSG_EOR) tdiff=3D1 > got: pqr (with MSG_EOR) tdiff=3D0 > tuexen@bsd9:~ % ./valentin nn > got: abc (with MSG_EOR) tdiff=3D4 > got: def (with MSG_EOR) tdiff=3D2 > got: ghi (with MSG_EOR) tdiff=3D1 > got: jkl (with MSG_EOR) tdiff=3D1 > got: mno (with MSG_EOR) tdiff=3D1 > got: pqr (with MSG_EOR) tdiff=3D1 >=20 > Do you have any special routing setup? >=20 > Best regards > Michael >> // vim:ts=3D2:sts=3D2:sw=3D2:et:si: >> =3D=3D=3D >>=20 >>=20 >> -netch- >> _______________________________________________ >> freebsd-net@freebsd.org mailing list >> http://lists.freebsd.org/mailman/listinfo/freebsd-net >> To unsubscribe, send any mail to = "freebsd-net-unsubscribe@freebsd.org" >>=20 >=20 > _______________________________________________ > freebsd-net@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-net > To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org" >=20 From owner-freebsd-net@FreeBSD.ORG Thu Dec 5 11:46:11 2013 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 2750216F for ; Thu, 5 Dec 2013 11:46:11 +0000 (UTC) Received: from mail-ve0-f177.google.com (mail-ve0-f177.google.com [209.85.128.177]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id D96CC158B for ; Thu, 5 Dec 2013 11:46:10 +0000 (UTC) 
Received: by mail-ve0-f177.google.com with SMTP id db12so13229678veb.8 for ; Thu, 05 Dec 2013 03:46:04 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:date:message-id:subject:from:to :content-type; bh=cZ0SuuLqXcu1J5zThNldubYBkpS6gR2tDX0Hzf/jIEU=; b=BsmnILlK0YkOwK11zmMLln3azyvhJUxRKPS3jaKkk4K3sG1PwTljotlcfHR+qqOyR3 x8G7rKN+hNJXRjOpTtKBgAIF18klyEZue7p0xMxmEZK+6rFkVtkvdJD7VEJidR2MQr1m EDX6+V4iuGLa1AjycMRY5JmZEdSxCinRJAr+R2/gD/62s/t3JCKBTOYrUFZ82zD0mxX4 ZV5VsuSJbt6wpEIJo/xBqdligGyfjNB0L1SIQS2DzB+C9eODtvAFrlnTWYz98pmjHdeg SDRRcHTrz/9GHM8ybQTZvHjEJDZRg5UdfC/K9EyomUGRiSEkPq3sKe1Fvgkmk3MqA7iv CtAw== X-Gm-Message-State: ALoCoQlaocxIH4l9bH6seEsPkp9dyhitHRHCile58Z6PLWgnVaopKzrgZnTzl8rEIL+IVJzgAteS MIME-Version: 1.0 X-Received: by 10.220.86.69 with SMTP id r5mr62999959vcl.9.1386243964186; Thu, 05 Dec 2013 03:46:04 -0800 (PST) Received: by 10.221.48.3 with HTTP; Thu, 5 Dec 2013 03:46:04 -0800 (PST) Date: Thu, 5 Dec 2013 13:46:04 +0200 Message-ID: Subject: Relayd and load balancing modes From: Ilias Bertsimas To: freebsd-net@freebsd.org Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.17 X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 05 Dec 2013 11:46:11 -0000 Hello All, We are baffled by the relayd modes for relays as it seems some of them are not working at all. We are running FreeBSD 9.1-RELEASE-p7 and the latest relayd from ports relayd-5.4.20131122. We notice on relays with 2 hosts both of them up 100% we only get traffic sent to just one server. We tried mode loadbalance/hash/source-hash without any success. We only get load balancing between the 2 hosts with roundrobin or random. Both clients and target hosts are on the same vlan. 
We also had issues with every version apart from the "stable" packaged one that comes with FreeBSD 9.1: we end up with 2-3 relayd child processes at 100% CPU without doing anything. I tried ktrace on them, but there were no syscalls or anything else going on. They are unresponsive, survive reloads, and can only be terminated with kill -9.

Any ideas what is going on?

Kind Regards,
Ilias Bertsimas.

From owner-freebsd-net@FreeBSD.ORG Thu Dec 5 12:30:27 2013 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 5549ADB3 for ; Thu, 5 Dec 2013 12:30:27 +0000 (UTC) Received: from segfault.kiev.ua (segfault.kiev.ua [193.193.193.4]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 9303D1856 for ; Thu, 5 Dec 2013 12:30:26 +0000 (UTC) Received: from segfault.kiev.ua (localhost.segfault.kiev.ua [127.0.0.1]) by segfault.kiev.ua (8.14.5/8.14.5/8.Who.Cares) with ESMTP id rB5CUAtj057583; Thu, 5 Dec 2013 14:30:10 +0200 (EET) (envelope-from netch@segfault.kiev.ua) Received: (from netch@localhost) by segfault.kiev.ua (8.14.5/8.14.5/Submit) id rB5CU5Pe057580; Thu, 5 Dec 2013 14:30:05 +0200 (EET) (envelope-from netch) Date: Thu, 5 Dec 2013 14:30:05 +0200 From: Valentin Nechayev To: Michael Tuexen Subject: Re: SCTP huge connect delays (at amd64) and yet another question Message-ID: <20131205123005.GE71737@netch.kiev.ua> References: <20131205084142.GA31113@netch.kiev.ua> <11932BA9-A734-4D4F-BCBB-6A0D926A22A9@lurchi.franken.de> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="bCsyhTFzCvuiizWE" Content-Disposition: inline In-Reply-To: <11932BA9-A734-4D4F-BCBB-6A0D926A22A9@lurchi.franken.de> X-42: On Cc: freebsd-net@freebsd.org X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list
List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 05 Dec 2013 12:30:27 -0000 --bCsyhTFzCvuiizWE Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Hi, Thu, Dec 05, 2013 at 11:32:03, Michael.Tuexen wrote about "Re: SCTP huge connect delays (at amd64) and yet another question": > > The first discrepancy found is specific for FreeBSD on amd64 and not > > for i386 version; it's that connection setup lasts 2-4 seconds (!!) > > Tcpdump shows indication that could be parsed as message miss: > Hi Valentin, > > could you send me the .pcap file instead of the tcpdump output. > I would like to see the addresses listed in the INIT and INIT-ACK. I've sent them, thanks. > > tcpdump: listening on lo0, link-type NULL (BSD loopback), capture size 65535 byt > > es > > 08:18:34.639422 IP (tos 0x0, ttl 64, id 65094, offset 0, flags [none], proto SCT > > P (132), length 188, bad cksum 0 (->f274)!) > > 10.0.0.2.50025 > 127.0.0.1.2500: sctp > I'm wondering why 10.0.0.2 is the source address and not 127.0.0.1 I've showed the code, it doesn't make any explicit binding or address suggestion. For this host (9.1/i386), 10.0.0.2 resides on xl0. There is no routing specifics which forces it to select 10.0.0.2: $ route -n get 127.0.0.1 route to: 127.0.0.1 destination: 127.0.0.1 interface: lo0 flags: recvpipe sendpipe ssthresh rtt,msec mtu weight expire 0 0 0 0 16384 1 0 $ telnet 127.0.0.1 25 Trying 127.0.0.1... Connected to localhost. Escape character is '^]'. 220 iv.local ESMTP Sendmail 8.14.5/8.14.5; Thu, 5 Dec 2013 13:48:31 +0200 (EET) ehlo zzz 250-iv.local Hello netch@localhost [127.0.0.1], pleased to meet you [...] At least for TCP and UDP, it's quite straightforward. > > At 08:18:34.639467, cookie echo was sent but likely ignored. One > > second later it was resent. Then, yet another strange timeout was > > invented before HB REQ. 
> > > > Test series show this can spend more than 4 seconds, average value > > is about 3 seconds. Two 20-times run summary times are 58 to 63 > > seconds, so, I've got 2.9...3.15 average connect time. > > > > Neither Linux nor 32-bit FreeBSD shows this. > FreeBSD should neither... Do you see this on FreeBSD 9.2 amd64? Yes. A fresh dump has reproduced this. > > It's definitely better than delay each run, as on other platforms > > (but the initial delay annoys roughly). > Without SCTP_NODELAY bundling can happen or not, it depends on timing. > It would be great, if you can provide a .pcap file for a transfer you > think shows some buggy behaviour. Then we can figure out what is going on. > MSG_EOR is nothing you provide at a send() call. The flag is only > returned by the recvmsg() call. Yes, I know. This has remained from the code which exposes SOCK_SEQPACKET specifics over different transport families (e.g. FreeBSD keeps this flag over AF_UNIX but Linux doesn't). I didn't take it into account, but, if is needed for sight clarity, I'll remove it:) > > } > OK. Here is what I would expect on the wire: > > Without SCTP_NODELAY: > > > INIT > < INIT_ACK > > COOKIE_ECHO > < COOKIE_ACK > < DATA(abc) > > SACK > < DATA(def);DATA(ghi);DATA(jkl);DATA(mno);DATA(pqr) > > SACK > > SHUTDOWN > < SHUTDOWN_ACK > > SHUTDOWN_COMPLETE > > There should be no substantial delay between any messages above. > > With SCTP_NODELAY > > INIT > < INIT_ACK > > COOKIE_ECHO > < COOKIE_ACK > < DATA(abc) > < DATA(def) > < DATA(ghi) > < DATA(mno) > < DATA(pqr) > > SHUTDOWN > < SHUTDOWN_ACK > > SHUTDOWN_COMPLETE > > There will be three SACK somewhere between the DATA chunks depending on > the timing. > > There should be no substantial delay between any messages above. > > I think if you see anything else, there is a bug. So do you see a different > behavior on FreeBSD 9.2 (i386/amd64)? If yes, can you provide a .pcap file? Sorry, I don't have 9.2/i386 yet. The dump from 9.1 is attached. 
It has no address mix-up, but the event sequence is as follows:

> INIT
< INIT_ACK
> COOKIE_ECHO
< COOKIE_ACK
< DATA(abc)
> SACK
< DATA(def)
... delay 200ms...
> SACK
< DATA(ghi); DATA(jkl); DATA(mno); DATA(pqr)

Compared to your description, there is an unexplained wait after
DATA(def) on the server side, and a SACK delay on the client side.

If you think it's fixed in 9.2, we can postpone this part of the
discussion until my upgrade to 9.2.

> Do you have any special routing setup?

This box (9.1/i386) is trivial, with no special routing.
For the amd64 boxes, I've sent routing details privately. But there
seems to be nothing fundamentally "special" there either, except
multiple addresses on loopback.

> Please note, that the first SACK is returned without the 200ms delay. This is
> required by the RFC and the above trace seems to show that.
> > But, if server shuts its writing side down ("s" in argv[]), this
> > laziness disappears. Again, the logic is too opaque and confusing.
> What do you mean by this?

At least, removing this delay by shutdown(,SHUT_WR) is unexpected.
-netch- --bCsyhTFzCvuiizWE Content-Type: application/octet-stream Content-Disposition: attachment; filename="dump.blocking.91.i386" Content-Transfer-Encoding: base64 1MOyoQIABAAAAAAAAAAAAP//AAAAAAAA7GmgUpLSBgCoAAAAqAAAAAIAAABFAACkjqAAAECE AAB/AAABfwAAAbCOCcQAAAAAAAAAAAEAAITCIlkIABxxxwAKCACDBLlQAAwACAAFAAbABgAI UExSU4AAAATAAAAEgAgACsGAwIGCDwAAgAIAJJXYsfcXBl4xhHUBiJhYaQrSXyWD9MxR7Ovu h+UaZfMDgAQACAABAAOAAwAGgMEAAAAFAAjBwcEEAAUACAoAAAEABQAIfwAAAexpoFK/0gYA 8AEAAPABAAACAAAARQAB7PA7AABAhAAAfwAAAX8AAAEJxLCOwiJZCAAAAAACAAHMmtfahgAc cccACggAjNQE78AGAAhQTFJTgAAABMAAAASACAAKwYDAgYIPAACAAgAk9Jph42rEKuidkZoz GJX+yxzBaaTvAJD628lchrCAA6yABAAIAAEAA4ADAAaAwQAAAAcBaEtBTUUtQlNEIDEuMQAA AAAp8nAAfAEEAGDqAAAAAAAAAAAAAMIiWQia19qGfwAAAQAAAAAAAAAAAAAAAAUAAAB/AAAB AAAAAAAAAAAAAAAABQAAAAAAAACwjgnEAQAAAQEBAAAAAAAAAQAAhMIiWQgAHHHHAAoIAIME uVAADAAIAAUABsAGAAhQTFJTgAAABMAAAASACAAKwYDAgYIPAACAAgAkldix9xcGXjGEdQGI mFhpCtJfJYP0zFHs6+6H5Rpl8wOABAAIAAEAA4ADAAaAwQAAAAUACMHBwQQABQAICgAAAQAF AAh/AAABAgABzJrX2oYAHHHHAAoIAIzUBO/ABgAIUExSU4AAAATAAAAEgAgACsGAwIGCDwAA gAIAJPSaYeNqxCronZGaMxiV/sscwWmk7wCQ+tvJXIawgAOsgAQACAABAAOAAwAGgMEAABNy y/c5fNDpoXfpgYzvgMFLDkul7GmgUtrSBgCMAQAAjAEAAAIAAABFAAGIpeUAAECEAAB/AAAB fwAAAbCOCcSa19qGAAAAAAoAAWhLQU1FLUJTRCAxLjEAAAAAKfJwAHwBBABg6gAAAAAAAAAA AADCIlkImtfahn8AAAEAAAAAAAAAAAAAAAAFAAAAfwAAAQAAAAAAAAAAAAAAAAUAAAAAAAAA sI4JxAEAAAEBAQAAAAAAAAEAAITCIlkIABxxxwAKCACDBLlQAAwACAAFAAbABgAIUExSU4AA AATAAAAEgAgACsGAwIGCDwAAgAIAJJXYsfcXBl4xhHUBiJhYaQrSXyWD9MxR7Ovuh+UaZfMD gAQACAABAAOAAwAGgMEAAAAFAAjBwcEEAAUACAoAAAEABQAIfwAAAQIAAcya19qGABxxxwAK CACM1ATvwAYACFBMUlOAAAAEwAAABIAIAArBgMCBgg8AAIACACT0mmHjasQq6J2RmjMYlf7L HMFppO8AkPrbyVyGsIADrIAEAAgAAQADgAMABoDBAAATcsv3OXzQ6aF36YGM74DBSw5Lpexp oFIM0wYAKAAAACgAAAACAAAARQAAJHjVQABAhAAAfwAAAX8AAAEJxLCOwiJZCAAAAAALAAAE 7GmgUpbTBgA4AAAAOAAAAAIAAABFAgA0aepAAECEAAB/AAABfwAAAQnEsI7CIlkIAAAAAAAD ABOM1ATvAAAAAAAAAABhYmMA7GmgUqTTBgA0AAAANAAAAAIAAABFAAAwx1YAAECEAAB/AAAB fwAAAbCOCcSa19qGAAAAAAMAABCM1ATvABxwxAAAAADsaaBSt9MGADgAAAA4AAAAAgAAAEUC 
ADRihEAAQIQAAH8AAAF/AAABCcSwjsIiWQgAAAAAAAMAE4zUBPAAAAABAAAAAGRlZgDsaaBS e94JADQAAAA0AAAAAgAAAEUAADChDwAAQIQAAH8AAAF/AAABsI4JxJrX2oYAAAAAAwAAEIzU BPAAHHHHAAAAAOxpoFKZ3gkAdAAAAHQAAAACAAAARQIAcKjqQABAhAAAfwAAAX8AAAEJxLCO wiJZCAAAAAAAAwATjNQE8QAAAAIAAAAAZ2hpAAADABOM1ATyAAAAAwAAAABqa2wAAAMAE4zU BPMAAAAEAAAAAG1ubwAAAwATjNQE9AAAAAUAAAAAcHFyAOxpoFIg3wkALAAAACwAAAACAAAA RQAAKALUQABAhAAAfwAAAX8AAAGwjgnEmtfahgAAAAAHAAAIjNQE9OxpoFIt3wkAKAAAACgA AAACAAAARQAAJDX4QABAhAAAfwAAAX8AAAEJxLCOwiJZCAAAAAAIAAAE7GmgUjffCQAoAAAA KAAAAAIAAABFAAAkAjRAAECEAAB/AAABfwAAAbCOCcSa19qGAAAAAA4AAAQ= --bCsyhTFzCvuiizWE Content-Type: application/octet-stream Content-Disposition: attachment; filename="dump.blocking.91.i386.with_shutdown" Content-Transfer-Encoding: base64 1MOyoQIABAAAAAAAAAAAAP//AAAAAAAAR3GgUo43AQCoAAAAqAAAAAIAAABFAACk7TkAAECE AAB/AAABfwAAAbqICcQAAAAAAAAAAAEAAIR66MIVABxxxwAKCAD9YvR3AAwACAAFAAbABgAI UExSU4AAAATAAAAEgAgACsGAwIGCDwAAgAIAJEhbs8Eq+J+5eMsDCDZeYbQkOOToNyfE9mtW kOtYMkRpgAQACAABAAOAAwAGgMEAAAAFAAjBwcEEAAUACAoAAAEABQAIfwAAAUdxoFLLNwEA 8AEAAPABAAACAAAARQAB7B05AABAhAAAfwAAAX8AAAEJxLqIeujCFQAAAAACAAHM1zt92QAc cccACggA+kkUysAGAAhQTFJTgAAABMAAAASACAAKwYDAgYIPAACAAgAkf1KiUVKTzCLyCOhR 3JrwxqftPTy4UkCNqSAfMInuC9CABAAIAAEAA4ADAAaAwQAAAAcBaEtBTUUtQlNEIDEuMQAA AACD+XAAJqoNAGDqAAAAAAAAAAAAAHrowhXXO33ZfwAAAQAAAAAAAAAAAAAAAAUAAAB/AAAB AAAAAAAAAAAAAAAABQAAAAAAAAC6iAnEAQAAAQEBAAAAAAAAAQAAhHrowhUAHHHHAAoIAP1i 9HcADAAIAAUABsAGAAhQTFJTgAAABMAAAASACAAKwYDAgYIPAACAAgAkSFuzwSr4n7l4ywMI Nl5htCQ45Og3J8T2a1aQ61gyRGmABAAIAAEAA4ADAAaAwQAAAAUACMHBwQQABQAICgAAAQAF AAh/AAABAgABzNc7fdkAHHHHAAoIAPpJFMrABgAIUExSU4AAAATAAAAEgAgACsGAwIGCDwAA gAIAJH9SolFSk8wi8gjoUdya8Man7T08uFJAjakgHzCJ7gvQgAQACAABAAOAAwAGgMEAAPpS mvBRdTocxYM2w4bfg3L/e5M5R3GgUuo3AQCMAQAAjAEAAAIAAABFAAGIkH4AAECEAAB/AAAB fwAAAbqICcTXO33ZAAAAAAoAAWhLQU1FLUJTRCAxLjEAAAAAg/lwACaqDQBg6gAAAAAAAAAA AAB66MIV1zt92X8AAAEAAAAAAAAAAAAAAAAFAAAAfwAAAQAAAAAAAAAAAAAAAAUAAAAAAAAA uogJxAEAAAEBAQAAAAAAAAEAAIR66MIVABxxxwAKCAD9YvR3AAwACAAFAAbABgAIUExSU4AA 
AATAAAAEgAgACsGAwIGCDwAAgAIAJEhbs8Eq+J+5eMsDCDZeYbQkOOToNyfE9mtWkOtYMkRp gAQACAABAAOAAwAGgMEAAAAFAAjBwcEEAAUACAoAAAEABQAIfwAAAQIAAczXO33ZABxxxwAK CAD6SRTKwAYACFBMUlOAAAAEwAAABIAIAArBgMCBgg8AAIACACR/UqJRUpPMIvII6FHcmvDG p+09PLhSQI2pIB8wie4L0IAEAAgAAQADgAMABoDBAAD6UprwUXU6HMWDNsOG34Ny/3uTOUdx oFIcOAEAKAAAACgAAAACAAAARQAAJJFZQABAhAAAfwAAAX8AAAEJxLqIeujCFQAAAAALAAAE R3GgUks5AQA4AAAAOAAAAAIAAABFAgA0h0JAAECEAAB/AAABfwAAAQnEuoh66MIVAAAAAAAD ABP6SRTKAAAAAAAAAABhYmMAR3GgUlo5AQA0AAAANAAAAAIAAABFAAAwEKkAAECEAAB/AAAB fwAAAbqICcTXO33ZAAAAAAMAABD6SRTKABxwxAAAAABHcaBSszkBADgAAAA4AAAAAgAAAEUC ADTXKkAAQIQAAH8AAAF/AAABCcS6iHrowhUAAAAAAAMAE/pJFMsAAAABAAAAAGRlZgBHcaBS yzkBAHQAAAB0AAAAAgAAAEUCAHBuJkAAQIQAAH8AAAF/AAABCcS6iHrowhUAAAAAAAMAE/pJ FMwAAAACAAAAAGdoaQAAAwAT+kkUzQAAAAMAAAAAamtsAAADABP6SRTOAAAABAAAAABtbm8A AAMAE/pJFM8AAAAFAAAAAHBxcgBHcaBS1DkBADQAAAA0AAAAAgAAAEUAADC2AwAAQIQAAH8A AAF/AAABuogJxNc7fdkAAAAAAwAAEPpJFM8AHGu1AAAAAEdxoFLhOQEALAAAACwAAAACAAAA RQAAKNpBQABAhAAAfwAAAX8AAAEJxLqIeujCFQAAAAAHAAAI/WL0dkdxoFLpOQEAKAAAACgA AAACAAAARQAAJDaaQABAhAAAfwAAAX8AAAG6iAnE1zt92QAAAAAIAAAER3GgUu85AQAoAAAA KAAAAAIAAABFAAAkdutAAECEAAB/AAABfwAAAQnEuoh66MIVAAAAAA4AAAQ= --bCsyhTFzCvuiizWE-- From owner-freebsd-net@FreeBSD.ORG Thu Dec 5 13:39:05 2013 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 956EED4A for ; Thu, 5 Dec 2013 13:39:05 +0000 (UTC) Received: from mail-n.franken.de (drew.ipv6.franken.de [IPv6:2001:638:a02:a001:20e:cff:fe4a:feaa]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id A74FD1D47 for ; Thu, 5 Dec 2013 13:39:04 +0000 (UTC) Received: from [10.225.9.5] (unknown [194.95.73.101]) (Authenticated sender: macmic) by mail-n.franken.de (Postfix) with ESMTP id 7B7691C0C0693; Thu, 5 Dec 
2013 14:39:01 +0100 (CET) Content-Type: text/plain; charset=us-ascii Mime-Version: 1.0 (Mac OS X Mail 6.6 \(1510\)) Subject: Re: SCTP huge connect delays (at amd64) and yet another question From: Michael Tuexen In-Reply-To: <20131205123005.GE71737@netch.kiev.ua> Date: Thu, 5 Dec 2013 14:39:01 +0100 Content-Transfer-Encoding: quoted-printable Message-Id: <1564E942-DC9E-4142-89F3-B82EEF1A103C@lurchi.franken.de> References: <20131205084142.GA31113@netch.kiev.ua> <11932BA9-A734-4D4F-BCBB-6A0D926A22A9@lurchi.franken.de> <20131205123005.GE71737@netch.kiev.ua> To: Valentin Nechayev X-Mailer: Apple Mail (2.1510) Cc: freebsd-net@freebsd.org X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 05 Dec 2013 13:39:05 -0000 On Dec 5, 2013, at 1:30 PM, Valentin Nechayev = wrote: > Hi, >=20 > Thu, Dec 05, 2013 at 11:32:03, Michael.Tuexen wrote about "Re: SCTP = huge connect delays (at amd64) and yet another question":=20 >=20 >>> The first discrepancy found is specific for FreeBSD on amd64 and not >>> for i386 version; it's that connection setup lasts 2-4 seconds (!!) >>> Tcpdump shows indication that could be parsed as message miss: >> Hi Valentin, >>=20 >> could you send me the .pcap file instead of the tcpdump output. >> I would like to see the addresses listed in the INIT and INIT-ACK. >=20 > I've sent them, thanks. I answered... >=20 >>> tcpdump: listening on lo0, link-type NULL (BSD loopback), capture = size 65535 byt >>> es >>> 08:18:34.639422 IP (tos 0x0, ttl 64, id 65094, offset 0, flags = [none], proto SCT >>> P (132), length 188, bad cksum 0 (->f274)!) >>> 10.0.0.2.50025 > 127.0.0.1.2500: sctp >> I'm wondering why 10.0.0.2 is the source address and not 127.0.0.1 >=20 > I've showed the code, it doesn't make any explicit binding or address > suggestion. For this host (9.1/i386), 10.0.0.2 resides on xl0. 
There > is no routing specifics which forces it to select 10.0.0.2: >=20 > $ route -n get 127.0.0.1 > route to: 127.0.0.1 > destination: 127.0.0.1 > interface: lo0 > flags: > recvpipe sendpipe ssthresh rtt,msec mtu weight expire > 0 0 0 0 16384 1 0 > $ telnet 127.0.0.1 25 > Trying 127.0.0.1... > Connected to localhost. > Escape character is '^]'. > 220 iv.local ESMTP Sendmail 8.14.5/8.14.5; Thu, 5 Dec 2013 13:48:31 = +0200 (EET) > ehlo zzz > 250-iv.local Hello netch@localhost [127.0.0.1], pleased to meet you > [...] >=20 > At least for TCP and UDP, it's quite straightforward. There might be an issue in the SCTP stack. It does handle addresses = differently than UDP. However, I wasn't able to reproduce your problem. I need to = test a setup similar to your, which I haven't done yet. >=20 >>> At 08:18:34.639467, cookie echo was sent but likely ignored. One >>> second later it was resent. Then, yet another strange timeout was >>> invented before HB REQ. >>>=20 >>> Test series show this can spend more than 4 seconds, average value >>> is about 3 seconds. Two 20-times run summary times are 58 to 63 >>> seconds, so, I've got 2.9...3.15 average connect time. >>>=20 >>> Neither Linux nor 32-bit FreeBSD shows this. >> FreeBSD should neither... Do you see this on FreeBSD 9.2 amd64? >=20 > Yes. A fresh dump has reproduced this. OK. Fine. This might an issue in the address handling... I'll try to reproduce this, >=20 >>> It's definitely better than delay each run, as on other platforms >>> (but the initial delay annoys roughly). >> Without SCTP_NODELAY bundling can happen or not, it depends on = timing. >> It would be great, if you can provide a .pcap file for a transfer you >> think shows some buggy behaviour. Then we can figure out what is = going on. >=20 >> MSG_EOR is nothing you provide at a send() call. The flag is only >> returned by the recvmsg() call. >=20 > Yes, I know. 
This has remained from the code which exposes
> SOCK_SEQPACKET specifics over different transport families (e.g.
> FreeBSD keeps this flag over AF_UNIX but Linux doesn't). I didn't take
> it into account, but, if is needed for sight clarity, I'll remove it:)
>
>>> }
>> OK. Here is what I would expect on the wire:
>>
>> Without SCTP_NODELAY:
>>
>>> INIT
>> < INIT_ACK
>>> COOKIE_ECHO
>> < COOKIE_ACK
>> < DATA(abc)
>>> SACK
>> < DATA(def);DATA(ghi);DATA(jkl);DATA(mno);DATA(pqr)
>>> SACK
>>> SHUTDOWN
>> < SHUTDOWN_ACK
>>> SHUTDOWN_COMPLETE
>>
>> There should be no substantial delay between any messages above.
>>
>> With SCTP_NODELAY
>>> INIT
>> < INIT_ACK
>>> COOKIE_ECHO
>> < COOKIE_ACK
>> < DATA(abc)
>> < DATA(def)
>> < DATA(ghi)
>> < DATA(mno)
>> < DATA(pqr)
>>> SHUTDOWN
>> < SHUTDOWN_ACK
>>> SHUTDOWN_COMPLETE
>>
>> There will be three SACK somewhere between the DATA chunks depending on
>> the timing.
>>
>> There should be no substantial delay between any messages above.
>>
>> I think if you see anything else, there is a bug. So do you see a different
>> behavior on FreeBSD 9.2 (i386/amd64)? If yes, can you provide a .pcap file?
>
> Sorry, I don't have 9.2/i386 yet. The dump from 9.1 is attached. It
I actually don't expect a difference between 32-bit or 64-bit. I guess
it might be more related to different address setup or timing.
> has no address mess but the event sequence is following:
>
>> INIT
> < INIT_ACK
>> COOKIE_ECHO
> < COOKIE_ACK
> < DATA(abc)
>> SACK
> < DATA(def)
> ... delay 200ms...
>> SACK
> < DATA(ghi); DATA(jkl); DATA(mno); DATA(pqr)
>
> Comparing to your description, it has unexplained waiting after
> DATA(def) from the server side, and SACK delay from the client side.
It is timing related as described in my other mail. Is the SACK received
before the send() calls finish or vice versa...
>
> If you think it's fixed in 9.2, we can postpone this part of
> discussion until my upgrade to 9.2.
>
>> Do you have any special routing setup?
>
> Just this box (9.1/i386) is trivial, no any routing specifics.
> For amd64 boxes, I've sent routing details privately. But it seems
> there are also none principally "special" these except multiple
> addresses at loopback.
>
>> Please note, that the first SACK is returned without the 200ms delay. This is
>> required by the RFC and the above trace seems to show that.
>>> But, if server shuts its writing side down ("s" in argv[]), this
>>> laziness disappears. Again, the logic is too opaque and confusing.
>> What do you mean by this?
>
> At least, removing this delay by shutdown(,SHUT_WR) is unexpected.
When you shutdown(,SHUT_WR) we send out pending data without waiting
for a SACK, since there will be no more data from the user. This is shown
by your attached traces and is intended.

So it seems that
* the timing is as expected for the data transmission phase
* there is an issue with setting up associations when there are
  specific addresses on loopback.

Do you agree?
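
The sender-side rule described above can be written down as a toy decision function. This is only a sketch for illustration: `struct toy_assoc` and `toy_may_send_now()` are hypothetical names, not code from the FreeBSD SCTP stack, and the model ignores bundling limits, full-sized packets, congestion control and timers.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: hypothetical names, not the FreeBSD SCTP stack. */
struct toy_assoc {
    bool   nodelay;         /* application set SCTP_NODELAY */
    bool   write_shutdown;  /* application called shutdown(fd, SHUT_WR) */
    size_t unacked_chunks;  /* DATA chunks sent but not yet covered by a SACK */
};

/*
 * Returns true if newly queued user data may go on the wire right away.
 * Without SCTP_NODELAY, small data waits until everything outstanding
 * has been SACKed, unless the writing side has been shut down, in which
 * case no more user data can arrive and waiting serves no purpose.
 */
static bool
toy_may_send_now(const struct toy_assoc *a)
{
    if (a->nodelay)
        return true;
    if (a->write_shutdown)
        return true;
    return a->unacked_chunks == 0;
}
```

In this model "abc" goes out at once (nothing outstanding), "def" and later messages wait for the SACK of "abc" unless SCTP_NODELAY is set or the writing side was shut down, which matches the sequences discussed in this thread.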
Best regards Michael >=20 >=20 > -netch- > From owner-freebsd-net@FreeBSD.ORG Thu Dec 5 18:29:52 2013 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 1709778F; Thu, 5 Dec 2013 18:29:52 +0000 (UTC) Received: from mail-qe0-x232.google.com (mail-qe0-x232.google.com [IPv6:2607:f8b0:400d:c02::232]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id B69FB11D2; Thu, 5 Dec 2013 18:29:51 +0000 (UTC) Received: by mail-qe0-f50.google.com with SMTP id 1so15503204qec.23 for ; Thu, 05 Dec 2013 10:29:51 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:date:message-id:subject :from:to:cc:content-type; bh=G0HpOnWOZgQvdeO5/FtHLsKbi5gBRJlunrlZSQE22TM=; b=bzUxqf/p7idy6hQlp6lmK6lcJ7yE7OI8fH2oSe1XGsowXlAApXqZlHnpvdD+r1l7aQ SzKWgdUgoUuGxlea7QluVlCo5ySlCy4GHzZfS61pXJ+ksXrYlVB/igAKtc9HzRUNewYg wRkJGO6y5yXApYCnGEx0JYRuoxWXkSqh3qa9rkVgffmRzcdOUphtadGS6ZRrrX9KUM/r 0fDJ32Z84Lc4WJEuOzDejnpWLe604pp3dfHcuR+bTmiyjATgd0oq8VxHHVjHT3v3EvvG /zkeyAlzTmRYDPD72f7qiqTbxiaDNKcZZtpyf0sX6XAx+qx+NMPdo5iHVxGyA5l+/hQv OyXQ== MIME-Version: 1.0 X-Received: by 10.49.24.163 with SMTP id v3mr87399765qef.78.1386268190994; Thu, 05 Dec 2013 10:29:50 -0800 (PST) Sender: adrian.chadd@gmail.com Received: by 10.224.53.200 with HTTP; Thu, 5 Dec 2013 10:29:50 -0800 (PST) In-Reply-To: <20131203021658.GC2981@michelle.cdnetworks.com> References: <521B9C2A-EECC-4412-9F68-2235320EF324@lurchi.franken.de> <20131202022338.GA3500@michelle.cdnetworks.com> <20131203021658.GC2981@michelle.cdnetworks.com> Date: Thu, 5 Dec 2013 10:29:50 -0800 X-Google-Sender-Auth: GgzO7v3Q3v1O6TEyiMPh6f7avAU Message-ID: Subject: Re: A small fix for if_em.c, if_igb.c, 
if_ixgbe.c From: Adrian Chadd To: Yong-Hyeon Pyun Content-Type: text/plain; charset=ISO-8859-1 Cc: Jack F Vogel , Michael Tuexen , "freebsd-net@freebsd.org list" X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 05 Dec 2013 18:29:52 -0000

Hi,

Yes. Looking at the ixgbe code, ixgbe_mq_start_locked() returns an
error from ixgbe_xmit(), but if that fails, it puts the buffer back. But
it has already successfully queued a frame to the driver, so in this
instance it shouldn't return the error from ixgbe_mq_start_locked().

The same applies to if_em.c and if_igb.c.

Now, drbr_putback() used to fail and now it doesn't, as you've said.
So we should change xxx_mq_start_locked() to set err=0 if we go
via the drbr_putback() routine, as it hasn't actually failed to
transmit.

Now the very dirty thing is this: the error from xxx_transmit() is
for the mbuf being queued at the end, but xxx_mq_start_locked()
failures are for transmitting from the front. If there's only one packet
in the queue and that fails, then they're the same thing, and returning
the error from xxx_mq_start_locked() matches the current mbuf being
queued. But otherwise, they're referring to totally different packets.
For TCP this may hurt; the TCP stack treats ENOBUFS a certain way and
kicks off a timer to schedule a retransmit. I don't think we can fix
_this_ right now.

So Michael, can you redo your patch to set err=0 if drbr_putback() is
called, and retest?

Thanks!
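
The err=0 idea can be illustrated with a toy model. This is only a sketch under assumptions: `toy_ring`, `toy_hw_xmit()`, `toy_mq_start_locked()` and `toy_transmit()` are hypothetical stand-ins, not the real drbr/buf_ring API and not the em/igb/ixgbe code. It shows just the one point under discussion: once a packet has been accepted into the software ring, a hardware-side failure that leaves the head packet in the ring is not reported to the caller.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not the real driver or buf_ring API. */
#define TOY_RING_SLOTS 8

struct toy_ring {
    int    pkts[TOY_RING_SLOTS];
    size_t head;    /* index of the oldest queued packet */
    size_t count;   /* number of queued packets */
};

/* Software queue: fails with ENOBUFS only when the ring itself is full. */
static int
toy_enqueue(struct toy_ring *r, int pkt)
{
    assert(r->count <= TOY_RING_SLOTS);
    if (r->count == TOY_RING_SLOTS)
        return ENOBUFS;
    r->pkts[(r->head + r->count) % TOY_RING_SLOTS] = pkt;
    r->count++;
    return 0;
}

/* Hardware queue: *hw_free is the number of free TX descriptors. */
static int
toy_hw_xmit(int *hw_free)
{
    if (*hw_free == 0)
        return ENOBUFS;
    (*hw_free)--;
    return 0;
}

/* Drain the ring from the front until the hardware refuses a packet. */
static int
toy_mq_start_locked(struct toy_ring *r, int *hw_free)
{
    int err = 0;

    while (r->count > 0) {
        err = toy_hw_xmit(hw_free);
        if (err != 0) {
            /*
             * The head packet stays in the ring (the "putback" case),
             * so nothing was lost: do not report an error upward.
             */
            err = 0;
            break;
        }
        r->head = (r->head + 1) % TOY_RING_SLOTS;
        r->count--;
    }
    return err;
}

/* Caller-visible entry point: an error means this packet was not queued. */
static int
toy_transmit(struct toy_ring *r, int *hw_free, int pkt)
{
    int err = toy_enqueue(r, pkt);

    if (err != 0)
        return err;
    return toy_mq_start_locked(r, hw_free);
}
```

With this shape toy_transmit() returns ENOBUFS only when the software ring is full as well, which is the behaviour ifnet(9) documents for if_transmit(). A real driver would still have to report failures where the mbuf is actually dropped; the toy model has no such path.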
-adrian From owner-freebsd-net@FreeBSD.ORG Thu Dec 5 19:07:04 2013 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id B0C5158E; Thu, 5 Dec 2013 19:07:04 +0000 (UTC) Received: from mail-n.franken.de (drew.ipv6.franken.de [IPv6:2001:638:a02:a001:20e:cff:fe4a:feaa]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 3E0021437; Thu, 5 Dec 2013 19:07:04 +0000 (UTC) Received: from [192.168.1.102] (p508F016D.dip0.t-ipconnect.de [80.143.1.109]) (Authenticated sender: macmic) by mail-n.franken.de (Postfix) with ESMTP id 5A0D01C0C0692; Thu, 5 Dec 2013 20:07:01 +0100 (CET) Content-Type: text/plain; charset=iso-8859-1 Mime-Version: 1.0 (Mac OS X Mail 6.6 \(1510\)) Subject: Re: A small fix for if_em.c, if_igb.c, if_ixgbe.c From: Michael Tuexen In-Reply-To: Date: Thu, 5 Dec 2013 20:07:00 +0100 Content-Transfer-Encoding: 7bit Message-Id: References: <521B9C2A-EECC-4412-9F68-2235320EF324@lurchi.franken.de> <20131202022338.GA3500@michelle.cdnetworks.com> <20131203021658.GC2981@michelle.cdnetworks.com> To: Adrian Chadd X-Mailer: Apple Mail (2.1510) Cc: Yong-Hyeon Pyun , Jack F Vogel , "freebsd-net@freebsd.org list" X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 05 Dec 2013 19:07:04 -0000 On Dec 5, 2013, at 7:29 PM, Adrian Chadd wrote: > Hi, > > Yes. Looking at the ixgbe code, ixgbe_mq_start_locked() returns an > error from ixgbe_xmit() but if it fails, it puts the buffer back. But > it's already successfully queued a frame to the driver, so in this > instance it shouldn't return the error from ixgbe_mq_start_locked(). 
>
> The same deal in if_em.c and if_igb.c.
>
> Now, drbr_putback() used to fail and now it doesn't, as you've said.
> So we should change the xxx_mq_start_locked() to set err=0 if we go
> via the drbr_putback() routine, as it hasn't actually failed to
> transmit.
>
> Now the very dirty thing is this - the error from xxx_transmit() is
> for the mbuf being queued at the end; but xxx_mq_start_locked()
> failures are for transmitting from the front. If there's only one packet
> in the queue and that fails then they're the same thing and returning
> the error from xxx_mq_start_locked() matches the current mbuf being
> queued. But otherwise, they're referring to totally different packets.
> For TCP this may hurt; the TCP stack treats ENOBUFS a certain way and
> kicks off a timer to schedule a retransmit. I don't think we can fix
> _this_ right now.
>
> So Michael - can you redo your patch to set err=0 if drbr_putback() is
> called, and retest?

Sure. I'll report the result.

Best regards
Michael
>
> Thanks!
> > > > > -adrian > From owner-freebsd-net@FreeBSD.ORG Thu Dec 5 21:05:21 2013 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id F1DB2297; Thu, 5 Dec 2013 21:05:20 +0000 (UTC) Received: from mail-n.franken.de (drew.ipv6.franken.de [IPv6:2001:638:a02:a001:20e:cff:fe4a:feaa]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 83B6C1C7A; Thu, 5 Dec 2013 21:05:20 +0000 (UTC) Received: from [192.168.1.102] (p508F016D.dip0.t-ipconnect.de [80.143.1.109]) (Authenticated sender: macmic) by mail-n.franken.de (Postfix) with ESMTP id 778861C0C0695; Thu, 5 Dec 2013 22:05:18 +0100 (CET) Content-Type: text/plain; charset=iso-8859-1 Mime-Version: 1.0 (Mac OS X Mail 6.6 \(1510\)) Subject: Re: A small fix for if_em.c, if_igb.c, if_ixgbe.c From: Michael Tuexen In-Reply-To: Date: Thu, 5 Dec 2013 22:05:16 +0100 Content-Transfer-Encoding: 7bit Message-Id: References: <521B9C2A-EECC-4412-9F68-2235320EF324@lurchi.franken.de> <20131202022338.GA3500@michelle.cdnetworks.com> <20131203021658.GC2981@michelle.cdnetworks.com> To: Adrian Chadd X-Mailer: Apple Mail (2.1510) Cc: Yong-Hyeon Pyun , Jack F Vogel , "freebsd-net@freebsd.org list" X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 05 Dec 2013 21:05:21 -0000 On Dec 5, 2013, at 7:29 PM, Adrian Chadd wrote: > Hi, > > Yes. Looking at the ixgbe code, ixgbe_mq_start_locked() returns an > error from ixgbe_xmit() but if it fails, it puts the buffer back. 
But
> it's already successfully queued a frame to the driver, so in this
> instance it shouldn't return the error from ixgbe_mq_start_locked().
>
> The same deal in if_em.c and if_igb.c.
>
> Now, drbr_putback() used to fail and now it doesn't, as you've said.
> So we should change the xxx_mq_start_locked() to set err=0 if we go
> via the drbr_putback() routine, as it hasn't actually failed to
> transmit.
>
> Now the very dirty thing is this - the error from xxx_transmit() is
> for the mbuf being queued at the end; but xxx_mq_start_locked()
> failures are for transmitting from the front. If there's only one packet
> in the queue and that fails then they're the same thing and returning
> the error from xxx_mq_start_locked() matches the current mbuf being
> queued. But otherwise, they're referring to totally different packets.
> For TCP this may hurt; the TCP stack treats ENOBUFS a certain way and
> kicks off a timer to schedule a retransmit. I don't think we can fix
> _this_ right now.

Just to be clear: This would mean that xxx_transmit() would return
an error even if the packet provided in the call to xxx_transmit() is
enqueued and not dropped?
This would also be a problem with the current SCTP stack.

Best regards
Michael
>
> So Michael - can you redo your patch to set err=0 if drbr_putback() is
> called, and retest?
>
> Thanks!
> > > > > -adrian > From owner-freebsd-net@FreeBSD.ORG Thu Dec 5 22:01:39 2013 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 9F26FD79; Thu, 5 Dec 2013 22:01:39 +0000 (UTC) Received: from mail-qe0-x22d.google.com (mail-qe0-x22d.google.com [IPv6:2607:f8b0:400d:c02::22d]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 47B811FB8; Thu, 5 Dec 2013 22:01:39 +0000 (UTC) Received: by mail-qe0-f45.google.com with SMTP id 6so18006147qea.32 for ; Thu, 05 Dec 2013 14:01:38 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:date:message-id:subject :from:to:cc:content-type; bh=u1PSICOLJ42xaneOr7sr51F8D9iFoAXIuPsswbBOP5E=; b=cRFfFZLRwcikolmBhurz5tIabVn/+/lUEIpmiyuE5iF+PfzeBZ1/4ixW93cJhphNT9 BcZMpQ8HlMJzcNGn1jo0kR+jBTATcwskL+f4OsqkxD+Ct3lvMqx+IARB7E0CDB+08zjk xodUV8VRLGGRCThZB2UNCVauIgvWdPnVL6TIEed27bp86T6E7BbO+sRZu7Eu+725AWv9 gGDOo0eAp2hWkmHrzKTfQqndoLtW6t2IlkNSQOMphzvZn1n7rAV97lxhCzEosE/j7M/x iJniXwcNVsONYF87mkK+0mp6gdY721NUK7G12faPXuT8/Ixe9H7/qllP1hP7+pUjo036 evCg== MIME-Version: 1.0 X-Received: by 10.229.137.69 with SMTP id v5mr633018qct.4.1386280898446; Thu, 05 Dec 2013 14:01:38 -0800 (PST) Sender: adrian.chadd@gmail.com Received: by 10.224.53.200 with HTTP; Thu, 5 Dec 2013 14:01:38 -0800 (PST) In-Reply-To: References: <521B9C2A-EECC-4412-9F68-2235320EF324@lurchi.franken.de> <20131202022338.GA3500@michelle.cdnetworks.com> <20131203021658.GC2981@michelle.cdnetworks.com> Date: Thu, 5 Dec 2013 14:01:38 -0800 X-Google-Sender-Auth: Q9_u_SdHfVKBB2BXEE5tt5WEKS4 Message-ID: Subject: Re: A small fix for if_em.c, if_igb.c, if_ixgbe.c From: Adrian Chadd To: Michael Tuexen Content-Type: text/plain; 
charset=ISO-8859-1 Cc: Yong-Hyeon Pyun , Jack F Vogel , "freebsd-net@freebsd.org list" X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 05 Dec 2013 22:01:39 -0000

On 5 December 2013 13:05, Michael Tuexen wrote:

> Just to be clear: This would mean that xxx_transmit() would return
> an error even if the packet provided in the call to xxx_transmit() is
> enqueued and not dropped?
> This would also be a problem with the current SCTP stack.

I think it'll return an error only if:

* it queued the frame to the tail of the drbr;
* it then tried to transmit a frame from the head of the drbr;
* it failed to transmit the first frame in the drbr and it couldn't
  put it back into the queue for whatever reason.

So I think it should be "ok enough" for both TCP and SCTP.

Give it a go and let me know how it goes.

It's an interesting architectural problem to completely solve.
-adrian From owner-freebsd-net@FreeBSD.ORG Thu Dec 5 22:37:13 2013 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id EBF493B9; Thu, 5 Dec 2013 22:37:13 +0000 (UTC) Received: from h2.funkthat.com (gate2.funkthat.com [208.87.223.18]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id A4117117D; Thu, 5 Dec 2013 22:37:13 +0000 (UTC) Received: from h2.funkthat.com (localhost [127.0.0.1]) by h2.funkthat.com (8.14.3/8.14.3) with ESMTP id rB5MbBfr074799 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Thu, 5 Dec 2013 14:37:12 -0800 (PST) (envelope-from jmg@h2.funkthat.com) Received: (from jmg@localhost) by h2.funkthat.com (8.14.3/8.14.3/Submit) id rB5MbBH3074798; Thu, 5 Dec 2013 14:37:11 -0800 (PST) (envelope-from jmg) Date: Thu, 5 Dec 2013 14:37:11 -0800 From: John-Mark Gurney To: Adrian Chadd Subject: Re: A small fix for if_em.c, if_igb.c, if_ixgbe.c Message-ID: <20131205223711.GB55638@funkthat.com> Mail-Followup-To: Adrian Chadd , Michael Tuexen , Yong-Hyeon Pyun , Jack F Vogel , "freebsd-net@freebsd.org list" References: <521B9C2A-EECC-4412-9F68-2235320EF324@lurchi.franken.de> <20131202022338.GA3500@michelle.cdnetworks.com> <20131203021658.GC2981@michelle.cdnetworks.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.4.2.3i X-Operating-System: FreeBSD 7.2-RELEASE i386 X-PGP-Fingerprint: 54BA 873B 6515 3F10 9E88 9322 9CB1 8F74 6D3F A396 X-Files: The truth is out there X-URL: http://resnet.uoregon.edu/~gurney_j/ X-Resume: http://resnet.uoregon.edu/~gurney_j/resume.html X-to-the-FBI-CIA-and-NSA: HI! HOW YA DOIN? can i haz chizburger? 
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.2.2 (h2.funkthat.com [127.0.0.1]); Thu, 05 Dec 2013 14:37:12 -0800 (PST) Cc: Yong-Hyeon Pyun , Michael Tuexen , Jack F Vogel , "freebsd-net@freebsd.org list" X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 05 Dec 2013 22:37:14 -0000

Adrian Chadd wrote this message on Thu, Dec 05, 2013 at 14:01 -0800:
> On 5 December 2013 13:05, Michael Tuexen wrote:
>
> > Just to be clear: This would mean that xxx_transmit() would return
> > an error even if the packet provided in the call to xxx_transmit() is
> > enqueued and not dropped?
> > This would also be a problem with the current SCTP stack.
>
> I think it'll return an error only if:
>
> * it queued the frame to the tail of the drbr;
> * it then tried to transmit a frame from the head of the drbr;
> * it failed to transmit the first frame in the drbr and it couldn't
>   put it back into the queue for whatever reason.
>
> So I think it should be "ok enough" for both TCP and SCTP.

IMO it should only return an error if the specific frame failed to be
sent or queued. If you cannot determine at return time if the frame
failed to be transmitted/queued, then it should return success.

In the above case, if there were other frames queued ahead, and the
first one failed, then it sounds like the frame may eventually be sent
and we will end up sending a duplicate frame.

> Give it a go and let me know how it goes.
>
> It's an interesting architectural problem to completely solve.

-- 
John-Mark Gurney    Voice: +1 415 225 5579

"All that I will do, has been done, All that I have, has not."
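The rule argued for above (report only the fate of the caller's own frame) can be contrasted with the propagating behaviour in a small userland C model. Everything here is hypothetical and greatly simplified: struct frame, struct txq, swq_enqueue(), txq_drain(), transmit_propagating() and transmit_own_status() are invented for illustration and are not the em/igb/ixgbe code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define SWQ_SIZE 4

/* Toy stand-in for an mbuf. */
struct frame { int id; };

/* Toy software queue plus a hypothetical hardware descriptor count. */
struct txq {
	struct frame *slot[SWQ_SIZE];
	size_t head;	/* index of the oldest queued frame */
	size_t count;	/* frames waiting in the software queue */
	int hw_free;	/* free hardware descriptors */
	int hw_sent;	/* frames handed to the hardware */
};

/* Append to the tail; fails only when the software queue is full. */
static int swq_enqueue(struct txq *q, struct frame *f)
{
	if (q->count == SWQ_SIZE)
		return ENOBUFS;
	q->slot[(q->head + q->count) % SWQ_SIZE] = f;
	q->count++;
	return 0;
}

/* Send from the head until the hardware is full. A non-zero return
 * describes the head frame, which stays queued for a later retry. */
static int txq_drain(struct txq *q)
{
	while (q->count > 0) {
		if (q->hw_free == 0)
			return ENOBUFS;
		q->hw_free--;
		q->hw_sent++;
		q->head = (q->head + 1) % SWQ_SIZE;
		q->count--;
	}
	return 0;
}

/* Simplified model of the behaviour discussed in the thread: the
 * drain result leaks out, so the caller can see an error for a frame
 * that is in fact queued and may be sent later. */
static int transmit_propagating(struct txq *q, struct frame *f)
{
	int err = swq_enqueue(q, f);

	if (err != 0)
		return err;
	return txq_drain(q);
}

/* The suggested rule: the return value says only whether f itself
 * was accepted; the head-of-queue outcome is not f's outcome. */
static int transmit_own_status(struct txq *q, struct frame *f)
{
	int err = swq_enqueue(q, f);

	if (err != 0)
		return err;
	(void)txq_drain(q);
	return 0;
}
```

With the propagating variant, a caller that reacts to ENOBUFS by resending would submit a frame that is still sitting in the queue, which is the duplicate-frame case raised above. With the second variant an error always means the frame was not accepted.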
From owner-freebsd-net@FreeBSD.ORG Thu Dec 5 23:10:31 2013 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 5602BF9B; Thu, 5 Dec 2013 23:10:31 +0000 (UTC) Received: from mail-qa0-x233.google.com (mail-qa0-x233.google.com [IPv6:2607:f8b0:400d:c00::233]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id E19B61371; Thu, 5 Dec 2013 23:10:30 +0000 (UTC) Received: by mail-qa0-f51.google.com with SMTP id o15so51607qap.17 for ; Thu, 05 Dec 2013 15:10:30 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:date:message-id:subject :from:to:content-type; bh=jAK6VOKg3hHSGafZSkKKlWuv5na1bc7HdO1VllnyIj4=; b=tOE0j8P8w/2uYvUv9eFVoN8O7gSQGKlRyyx1vSwcGg5r92wP84fSDwVUoEMZTl010o sITmErzYkSCdDAPsU2ek1IiekIm+yGmxpSlOyB0A9IJeH03jmULYxbWwxMAvnKbcG94/ WwbAKM3U/QiC0n/gfvekenMJu6R2VqR7pIPcFeLkOP6QF5hHWLs1Z9SMhyAMM3vvHXXr yPF2AQ0Lv5TgMe51oBz2dVqSRxpJihu6PkdWKm8UQulmbIjOD+j9fDBXfEL56QNyMdrb 0ACmljiCMezsmKFkHc519Naoza2JZR2Cs9lf7NT90wizH/I0Wao/EzwTY5dLOXD1s9o3 LPtA== MIME-Version: 1.0 X-Received: by 10.49.17.232 with SMTP id r8mr625571qed.74.1386285030050; Thu, 05 Dec 2013 15:10:30 -0800 (PST) Sender: adrian.chadd@gmail.com Received: by 10.224.53.200 with HTTP; Thu, 5 Dec 2013 15:10:29 -0800 (PST) In-Reply-To: <20131205223711.GB55638@funkthat.com> References: <521B9C2A-EECC-4412-9F68-2235320EF324@lurchi.franken.de> <20131202022338.GA3500@michelle.cdnetworks.com> <20131203021658.GC2981@michelle.cdnetworks.com> <20131205223711.GB55638@funkthat.com> Date: Thu, 5 Dec 2013 15:10:29 -0800 X-Google-Sender-Auth: uIGcF04FtMbrYJKt4ij-9g8qE7s Message-ID: Subject: Re: A small fix for if_em.c, if_igb.c, if_ixgbe.c From: Adrian Chadd To: Adrian 
Chadd , Michael Tuexen , Yong-Hyeon Pyun , Jack F Vogel , "freebsd-net@freebsd.org list" Content-Type: text/plain; charset=ISO-8859-1 X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 05 Dec 2013 23:10:31 -0000

On 5 December 2013 14:37, John-Mark Gurney wrote:
> Adrian Chadd wrote this message on Thu, Dec 05, 2013 at 14:01 -0800:
>> On 5 December 2013 13:05, Michael Tuexen wrote:
>>
>> > Just to be clear: This would mean that xxx_transmit() would return
>> > an error even if the packet provided in the call to xxx_transmit() is
>> > enqueued and not dropped?
>> > This would also be a problem with the current SCTP stack.
>>
>> I think it'll return an error only if:
>>
>> * it queued the frame to the tail of the drbr;
>> * it then tried to transmit a frame from the head of the drbr;
>> * it failed to transmit the first frame in the drbr and it couldn't
>>   put it back into the queue for whatever reason.
>>
>> So I think it should be "ok enough" for both TCP and SCTP.
>
> IMO it should only return an error if the specific frame failed to be
> sent or queued. If you cannot determine at return time if the frame
> failed to be transmitted/queued, then it should return success.

For the long term solution, I agree.

> In the above case, if there were other frames queued ahead, and the
> first one failed, then it sounds like the frame may eventually be sent
> and we will end up sending a duplicate frame.

Right. We should also fix this properly.

I think the right thing, long term, is something like this:

* xxx_mq_start_locked() returns whether the head frame was transmitted
  or not;
* the if_transmit() entry point(s) return whether the given frame was
  queued to the software queue or not;
* the if_transmit() entry point(s) ignore the return value of
  xxx_mq_start_locked(), as the stack _should_ handle the case of a
  frame handed to the driver but dropped.

So, I'd like to get Michael to first test fixing up
xxx_mq_start_locked() to only return an error if it failed to transmit
a frame and the frame was dropped.

Then, once we get feedback from that, I was going to propose that we
also do what Michael initially did - and that's ignore the error from
calling xxx_mq_start_locked(). Followed, hopefully, with some comments
explaining how this all holds together.

How's that sound?

-adrian

From owner-freebsd-net@FreeBSD.ORG Fri Dec 6 02:51:40 2013 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 2DB0EAD8 for ; Fri, 6 Dec 2013 02:51:40 +0000 (UTC) Received: from nm11-vm1.bullet.mail.bf1.yahoo.com (nm11-vm1.bullet.mail.bf1.yahoo.com [98.139.213.152]) by mx1.freebsd.org (Postfix) with SMTP id C3BA111F4 for ; Fri, 6 Dec 2013 02:51:39 +0000 (UTC) Received: from [66.196.81.172] by nm11.bullet.mail.bf1.yahoo.com with NNFMP; 06 Dec 2013 02:51:33 -0000 Received: from [68.142.230.65] by tm18.bullet.mail.bf1.yahoo.com with NNFMP; 06 Dec 2013 02:51:32 -0000 Received: from [127.0.0.1] by smtp222.mail.bf1.yahoo.com with NNFMP; 06 Dec 2013 02:51:32 -0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=yahoo.com; s=s1024; t=1386298292; bh=KgMYV567kAtn8o8hpPVZL2Q667pb0084Lr7nz9T18Ec=;
h=X-Yahoo-Newman-Id:X-Yahoo-Newman-Property:X-YMail-OSG:X-Yahoo-SMTP:X-Rocket-Received:Message-ID:Date:From:User-Agent:MIME-Version:To:Subject:Content-Type; b=5GqqE2L6HEGF7l+1sDKTOwkJzVveksqHORTdJ6n3klcJPDYymWS7ORdyK08j4Y2Uak50uH06UwNGFuT07mb5UK7Ba5dMLtVJd/pS97a2GacnE1pXCvwokxcJPz/Iiuy+c5y2Lr8jRa4ByG4GPHaejKjJHEZzeDw78SdIGjA8LiY= X-Yahoo-Newman-Id: 957067.31842.bm@smtp222.mail.bf1.yahoo.com X-Yahoo-Newman-Property: ymail-3 X-YMail-OSG: 91Vh7LAVM1niDGiwgV5R_KnKbYP67F12qauK4WQWJDqokmd EMq441Hjgq6E6P3g13wSNzwQlYcQtgStoIrETCb4VxSMH6acapw_6ScI5llJ iBB1Nk72uPDT9L0V1oqDrIcWgxoJ.12LEqLyc6Kzivnq5twArb9hXza8jQCc 5RefqM319e4oqihJKwGk8QBQXelnnOlUiLKO6Rj2HVgbMy9ok8qrKYEt31gt 0RqwEGGivcsC1hn4P2l3Pp5mDux3HQQgVyMrRTBadyOlh4IceqOzLGW8loT3 rDRc7Tghv_uatRR3dxn4lvknROJoTySNH1K3Er73uOD.DRL7aAWhamD4x5dU PZJ7HIPm7BgMnd3nVTI2ZZRAsfpVH5PASr2NDfdLD24K3DW..X64SkaJHsFX SCS7Vg5BeN_XKq5LXif1I7fUfe.8yLbMuygso.WWQ5_ptnSr3.AwvyGTuuh8 EVAcEpyrjaqZbShDJ1aF7Vi59kkWGApdYFMoA2BMEYchQGf6whxmjAt4T29d 2Ob.PR52qfmbAYFUlbAMpwdvAoWHr.oZv4nVOnajeFESbLTtz9q9Qzd9LKw- - X-Yahoo-SMTP: sHqPI42swBDl6e.0QxkIIsC77EttkMXsaRT5OA-- X-Rocket-Received: from [192.168.1.18] (blue_phoenix316@76.4.203.61 with ) by smtp222.mail.bf1.yahoo.com with SMTP; 06 Dec 2013 02:51:32 +0000 UTC Message-ID: <52A13BB1.50106@yahoo.com> Date: Thu, 05 Dec 2013 21:51:29 -0500 From: Darryl Lyle User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:24.0) Gecko/20100101 Thunderbird/24.1.0 MIME-Version: 1.0 To: freebsd-net@freebsd.org, yongari@freebsd.org Subject: Can't connect to network with my NIC Content-Type: multipart/mixed; boundary="------------050205050605040301090407" X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 06 Dec 2013 02:51:40 -0000 This is a multi-part message in MIME format. 
--------------050205050605040301090407
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

Hey guys,

I just installed the latest pc-bsd 10 stable image, and I can't connect
to my network with my nic, but I can connect fine via wifi.

ifconfig re0
re0: flags=8843 metric 0 mtu 1500
        options=8209b
        ether f8:b1:56:9d:84:3a
        inet6 fe80::fab1:56ff:fe9d:843a%re0 prefixlen 64 scopeid 0x1
        inet 0.0.0.0 netmask 0xff000000 broadcast 255.255.255.255
        nd6 options=23
        media: Ethernet autoselect (1000baseT )
        status: active

Attached is dmesg and pciconf -lv

If I try dhclient re0 I get no DHCPOFFERS

v/r
Darryl Lyle

--------------050205050605040301090407
Content-Type: text/plain; charset=UTF-8; name="dmesg.txt"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="dmesg.txt"

Copyright (c) 1992-2013 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 10-STABLE-p4 #0 403eae4(stable/10): Mon Nov 18 16:35:51 EST 2013 root@avenger:/usr/obj/root/pcbsd-build/git/freebsd/sys/GENERIC amd64 FreeBSD clang version 3.3 (tags/RELEASE_33/final 183502) 20130610 CPU: Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz (3392.21-MHz K8-class CPU) Origin = "GenuineIntel" Id = 0x306c3 Family = 0x6 Model = 0x3c Stepping = 3 Features=0xbfebfbff Features2=0x7ffafbff,FMA,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,TSCDLT,AESNI,XSAVE,OSXSAVE,AVX,F16C,RDRAND> AMD Features=0x2c100800 AMD Features2=0x21 Standard Extended Features=0x2fbb TSC: P-state invariant, performance statistics real memory = 9110028288 (8688 MB) avail memory = 8212455424 (7832 MB) Event timer "LAPIC" quality 600 ACPI APIC Table: FreeBSD/SMP: Multiprocessor System Detected: 8 CPUs FreeBSD/SMP: 1 package(s) x 4 core(s) x 2 SMT threads cpu0 (BSP): APIC ID: 0 cpu1 (AP): APIC ID: 1 cpu2 (AP): APIC ID: 2 cpu3 (AP): APIC ID: 3 cpu4 (AP): APIC ID: 4 cpu5 (AP): APIC ID: 5 cpu6 (AP): APIC ID: 6 cpu7 (AP): APIC ID: 7 ioapic0 irqs 0-23 on motherboard kbd1 at kbdmux0 random: initialized cryptosoft0: on motherboard aesni0: on motherboard acpi0: on motherboard acpi0: Power Button (fixed) acpi0: reservation of 67, 1 (4) failed cpu0: on acpi0 cpu1: on acpi0 cpu2: on acpi0 cpu3: on acpi0 cpu4: on acpi0 cpu5: on acpi0 cpu6: on acpi0 cpu7: on acpi0 hpet0: iomem 0xfed00000-0xfed003ff on acpi0 Timecounter "HPET" frequency 14318180 Hz quality 950 Event timer "HPET" frequency 14318180 Hz quality 550 atrtc0: port 0x70-0x77 irq 8 on acpi0 atrtc0: Warning: Couldn't map I/O. 
Event timer "RTC" frequency 32768 Hz quality 0 attimer0: port 0x40-0x43,0x50-0x53 irq 0 on acpi0 Timecounter "i8254" frequency 1193182 Hz quality 0 Event timer "i8254" frequency 1193182 Hz quality 100 Timecounter "ACPI-fast" frequency 3579545 Hz quality 900 acpi_timer0: <24-bit timer at 3.579545MHz> port 0x1808-0x180b on acpi0 pcib0: port 0xcf8-0xcff on acpi0 pci0: on pcib0 pcib1: irq 16 at device 1.0 on pci0 pci1: on pcib1 vgapci0: port 0xe000-0xe07f mem 0xf6000000-0xf6ffffff,0xe8000000-0xefffffff,0xf0000000-0xf1ffffff irq 16 at device 0.0 on pci1 nvidia0: on vgapci0 vgapci0: child nvidia0 requested pci_enable_io vgapci0: child nvidia0 requested pci_enable_io hdac0: mem 0xf7080000-0xf7083fff irq 17 at device 0.1 on pci1 xhci0: mem 0xf7300000-0xf730ffff irq 16 at device 20.0 on pci0 xhci0: 32 byte context size. xhci0: Port routing mask set to 0xffffffff usbus0 on xhci0 pci0: at device 22.0 (no driver attached) ehci0: mem 0xf7318000-0xf73183ff irq 16 at device 26.0 on pci0 usbus1: EHCI version 1.0 usbus1 on ehci0 hdac1: mem 0xf7310000-0xf7313fff irq 22 at device 27.0 on pci0 pcib2: irq 16 at device 28.0 on pci0 pci2: on pcib2 pcib3: irq 18 at device 28.2 on pci0 pci3: on pcib3 re0: port 0xd000-0xd0ff mem 0xf7200000-0xf7200fff,0xf2100000-0xf2103fff irq 18 at device 0.0 on pci3 re0: Using 1 MSI-X message re0: Chip rev. 0x4c000000 re0: MAC rev. 
0x00000000 miibus0: on re0 rgephy0: PHY 1 on miibus0 rgephy0: none, 10baseT, 10baseT-FDX, 10baseT-FDX-flow, 100baseTX, 100baseTX-FDX, 100baseTX-FDX-flow, 1000baseT-FDX, 1000baseT-FDX-master, 1000baseT-FDX-flow, 1000baseT-FDX-flow-master, auto, auto-flow re0: Ethernet address: f8:b1:56:9d:84:3a pcib4: irq 19 at device 28.7 on pci0 pci4: on pcib4 ath0: mem 0xf7100000-0xf717ffff irq 19 at device 0.0 on pci4 ar9300_set_stub_functions: setting stub functions ar9300_set_stub_functions: setting stub functions ar9300_attach: calling ar9300_hw_attach ar9300_hw_attach: calling ar9300_eeprom_attach ar9300_flash_map: unimplemented for now Restoring Cal data from DRAM Restoring Cal data from EEPROM Restoring Cal data from Flash Restoring Cal data from Flash Restoring Cal data from OTP ar9300_hw_attach: ar9300_eeprom_attach returned 0 ath0: RX status length: 48 ath0: RX buffer size: 4096 ath0: TX descriptor length: 128 ath0: TX status length: 36 ath0: TX buffers per descriptor: 4 ar9300_freebsd_setup_x_tx_desc: called, 0x0/0, 0x0/0, 0x0/0 ath0: ath_edma_setup_rxfifo: type=0, FIFO depth = 16 entries ath0: ath_edma_setup_rxfifo: type=1, FIFO depth = 128 entries ath0: [HT] enabling HT modes ath0: [HT] enabling short-GI in 20MHz mode ath0: [HT] 1 stream STBC receive enabled ath0: [HT] 1 RX streams; 1 TX streams ath0: AR9485 mac 576.1 RF5110 phy 0.0 ath0: 2GHz radio: 0x0000; 5GHz radio: 0x0000 ehci1: mem 0xf7317000-0xf73173ff irq 23 at device 29.0 on pci0 usbus2: EHCI version 1.0 usbus2 on ehci1 isab0: at device 31.0 on pci0 isa0: on isab0 ahci0: port 0xf070-0xf077,0xf060-0xf063,0xf050-0xf057,0xf040-0xf043,0xf020-0xf03f mem 0xf7316000-0xf73167ff irq 19 at device 31.2 on pci0 ahci0: AHCI v1.30 with 6 6Gbps ports, Port Multiplier not supported ahcich0: at channel 0 on ahci0 ahcich1: at channel 1 on ahci0 ahciem0: on ahci0 pci0: at device 31.3 (no driver attached) acpi_button0: on acpi0 acpi_tz0: on acpi0 acpi_tz1: on acpi0 sc0: at flags 0x100 on isa0 sc0: VGA <16 virtual consoles, 
flags=0x300> vga0: at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0 atkbdc0: at port 0x60,0x64 on isa0 atkbd0: irq 1 on atkbdc0 kbd0 at atkbd0 atkbd0: [GIANT-LOCKED] ppc0: cannot reserve I/O port range est0: on cpu0 p4tcc0: on cpu0 est1: on cpu1 p4tcc1: on cpu1 est2: on cpu2 p4tcc2: on cpu2 est3: on cpu3 p4tcc3: on cpu3 est4: on cpu4 p4tcc4: on cpu4 est5: on cpu5 p4tcc5: on cpu5 est6: on cpu6 p4tcc6: on cpu6 est7: on cpu7 p4tcc7: on cpu7 ZFS filesystem version: 5 ZFS storage pool version: features support (5000) Timecounters tick every 1.000 msec vboxdrv: fAsync=0 offMin=0x468 offMax=0x668 hdacc0: at cad 0 on hdac0 hdaa0: at nid 1 on hdacc0 pcm0: at nid 4 on hdaa0 pcm1: at nid 5 on hdaa0 pcm2: at nid 6 on hdaa0 pcm3: at nid 7 on hdaa0 hdacc1: at cad 0 on hdac1 hdaa1: at nid 1 on hdacc1 pcm4: at nid 20,22,21,23,27 and 24,25,26 on hdaa1 random: unblocking device. usbus0: 5.0Gbps Super Speed USB v3.0 usbus1: 480Mbps High Speed USB v2.0 usbus2: 480Mbps High Speed USB v2.0 ugen2.1: at usbus2 uhub0: on usbus2 ugen1.1: at usbus1 uhub1: on usbus1 ugen0.1: <0x8086> at usbus0 uhub2: <0x8086 XHCI root HUB, class 9/0, rev 3.00/1.00, addr 1> on usbus0 ada0 at ahcich0 bus 0 scbus0 target 0 lun 0 ada0: ATA-9 SATA 3.x device ada0: Serial Number Z1D7GHVB ada0: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes) ada0: Command Queueing enabled ada0: 953869MB (1953525168 512 byte sectors: 16H 63S/T 16383C) ada0: quirks=0x1<4K> ada0: Previously was known as ad4 ses0 at ahciem0 bus 0 scbus2 target 0 lun 0 ses0: SEMB S-E-S 2.00 device ses0: SEMB SES Device cd0 at ahcich1 bus 0 scbus1 target 0 lun 0 cd0: Removable CD-ROM SCSI-0 device cd0: Serial Number S10Q6YBD800E6C cd0: 150.000MB/s transfers (SATA 1.x, UDMA5, ATAPI 12bytes, PIO 8192bytes) cd0: Attempt to query device size failed: NOT READY, Medium not present - tray closed Netvsc initializing... SMP: AP CPU #1 Launched! SMP: AP CPU #3 Launched! SMP: AP CPU #2 Launched! SMP: AP CPU #6 Launched! SMP: AP CPU #7 Launched! 
SMP: AP CPU #4 Launched! SMP: AP CPU #5 Launched! Timecounter "TSC-low" frequency 1696106832 Hz quality 1000 Root mount waiting for: usbus2 usbus1 usbus0 uhub0: 2 ports with 2 removable, self powered uhub1: 2 ports with 2 removable, self powered uhub2: 21 ports with 21 removable, self powered Root mount waiting for: usbus2 usbus1 usbus0 xhci0: Port routing mask set to 0x00000000 usb_alloc_device: device init 2 failed (USB_ERR_IOERROR, ignored) ugen0.2: at usbus0 (disconnected) uhub_reattach_port: could not allocate new device ugen2.2: at usbus2 uhub3: on usbus2 ugen1.2: at usbus1 uhub4: on usbus1 uhub4: 6 ports with 6 removable, self powered uhub3: 8 ports with 8 removable, self powered Root mount waiting for: usbus2 usbus1 ugen1.3: at usbus1 umass0: on usbus1 umass0: SCSI over Bulk-Only; quirks = 0x4000 umass0:3:0:-1: Attached to scbus3 da0 at umass-sim0 bus 0 scbus3 target 0 lun 0 da0: Removable Direct Access SCSI-0 device da0: Serial Number 20100818841300000 da0: 40.000MB/s transfers da0: Attempt to query device size failed: NOT READY, Medium not present da0: quirks=0x2 da1 at umass-sim0 bus 0 scbus3 target 0 lun 1 da1: Removable Direct Access SCSI-0 device da1: Serial Number 20100818841300000 da1: 40.000MB/s transfers da1: Attempt to query device size failed: NOT READY, Medium not present da1: quirks=0x2 ugen2.3: at usbus2 ukbd0: on usbus2 da2 at umass-sim0 bus 0 scbus3 target 0 lun 2 da2: Removable Direct Access SCSI-0 device da2: Serial Number 20100818841300000 da2: 40.000MB/s transfers da2: Attempt to query device size failed: NOT READY, Medium not present da2: quirks=0x2 kbd2 at ukbd0 da3 at umass-sim0 bus 0 scbus3 target 0 lun 3 da3: Removable Direct Access SCSI-0 device da3: Serial Number 20100818841300000 da3: 40.000MB/s transfers da3: Attempt to query device size failed: NOT READY, Medium not present da3: quirks=0x2 ugen2.4: at usbus2 Root mount waiting for: usbus2 ugen2.5: at usbus2 Trying to mount root from zfs:tank/ROOT/default []... 
wlan0: Ethernet address: 80:56:f2:3b:a5:67 uhid0: on usbus2 ums0: on usbus2 ums0: 8 buttons and [XYZT] coordinates ID=0 Cuse4BSD v0.1.30 @ /dev/cuse pefs: AESNI hardware acceleration enabled ipfw2 (+ipv6) initialized, divert loadable, nat loadable, default to deny, logging disabled WARNING: attempt to domain_add(bluetooth) after domainfinalize() pid 3347 (VBoxSVC), uid 0: exited on signal 6 re0: link state changed to DOWN re0: link state changed to UP wlan0: Ethernet address: 80:56:f2:3b:a5:67 ath0: ath_edma_recv_tasklet: sc_inreset_cnt > 0; skipping wlan0: link state changed to UP wlan0: link state changed to DOWN re0: link state changed to DOWN re0: link state changed to UP wlan0: Ethernet address: 80:56:f2:3b:a5:67 wlan0: link state changed to UP --------------050205050605040301090407 Content-Type: text/plain; charset=UTF-8; name="pciconf.txt" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="pciconf.txt" hostb0@pci0:0:0:0: class=0x060000 card=0x05b71028 chip=0x0c008086 rev=0x06 hdr=0x00 vendor = 'Intel Corporation' device = 'Haswell DRAM Controller' class = bridge subclass = HOST-PCI pcib1@pci0:0:1:0: class=0x060400 card=0x05b71028 chip=0x0c018086 rev=0x06 hdr=0x01 vendor = 'Intel Corporation' device = 'Haswell PCI Express x16 Controller' class = bridge subclass = PCI-PCI xhci0@pci0:0:20:0: class=0x0c0330 card=0x05b71028 chip=0x8c318086 rev=0x05 hdr=0x00 vendor = 'Intel Corporation' device = 'Lynx Point USB xHCI Host Controller' class = serial bus subclass = USB none0@pci0:0:22:0: class=0x078000 card=0x05b71028 chip=0x8c3a8086 rev=0x04 hdr=0x00 vendor = 'Intel Corporation' device = 'Lynx Point MEI Controller' class = simple comms ehci0@pci0:0:26:0: class=0x0c0320 card=0x05b71028 chip=0x8c2d8086 rev=0x05 hdr=0x00 vendor = 'Intel Corporation' device = 'Lynx Point USB Enhanced Host Controller' class = serial bus subclass = USB hdac1@pci0:0:27:0: class=0x040300 card=0x05b71028 chip=0x8c208086 rev=0x05 hdr=0x00 vendor = 'Intel Corporation' 
device = 'Lynx Point High Definition Audio Controller' class = multimedia subclass = HDA pcib2@pci0:0:28:0: class=0x060400 card=0x05b71028 chip=0x8c108086 rev=0xd5 hdr=0x01 vendor = 'Intel Corporation' device = 'Lynx Point PCI Express Root Port' class = bridge subclass = PCI-PCI pcib3@pci0:0:28:2: class=0x060400 card=0x05b71028 chip=0x8c148086 rev=0xd5 hdr=0x01 vendor = 'Intel Corporation' device = 'Lynx Point PCI Express Root Port' class = bridge subclass = PCI-PCI pcib4@pci0:0:28:7: class=0x060400 card=0x05b71028 chip=0x8c1e8086 rev=0xd5 hdr=0x01 vendor = 'Intel Corporation' device = 'Lynx Point PCI Express Root Port' class = bridge subclass = PCI-PCI ehci1@pci0:0:29:0: class=0x0c0320 card=0x05b71028 chip=0x8c268086 rev=0x05 hdr=0x00 vendor = 'Intel Corporation' device = 'Lynx Point USB Enhanced Host Controller' class = serial bus subclass = USB isab0@pci0:0:31:0: class=0x060100 card=0x05b71028 chip=0x8c448086 rev=0x05 hdr=0x00 vendor = 'Intel Corporation' device = 'Lynx Point LPC Controller' class = bridge subclass = PCI-ISA ahci0@pci0:0:31:2: class=0x010601 card=0x05b71028 chip=0x8c028086 rev=0x05 hdr=0x00 vendor = 'Intel Corporation' device = 'Lynx Point 6-port SATA Controller 1 [AHCI mode]' class = mass storage subclass = SATA none1@pci0:0:31:3: class=0x0c0500 card=0x05b71028 chip=0x8c228086 rev=0x05 hdr=0x00 vendor = 'Intel Corporation' device = 'Lynx Point SMBus Controller' class = serial bus subclass = SMBus vgapci0@pci0:1:0:0: class=0x030000 card=0x098a10de chip=0x118510de rev=0xa1 hdr=0x00 vendor = 'NVIDIA Corporation' class = display subclass = VGA hdac0@pci0:1:0:1: class=0x040300 card=0x098a10de chip=0x0e0a10de rev=0xa1 hdr=0x00 vendor = 'NVIDIA Corporation' device = 'GK104 HDMI Audio Controller' class = multimedia subclass = HDA re0@pci0:3:0:0: class=0x020000 card=0x05b71028 chip=0x816810ec rev=0x0c hdr=0x00 vendor = 'Realtek Semiconductor Co., Ltd.' 
device = 'RTL8111/8168B PCI Express Gigabit Ethernet controller' class = network subclass = ethernet ath0@pci0:4:0:0: class=0x028000 card=0x02091028 chip=0x0032168c rev=0x01 hdr=0x00 vendor = 'Atheros Communications Inc.' device = 'AR9485 Wireless Network Adapter' class = network --------------050205050605040301090407-- From owner-freebsd-net@FreeBSD.ORG Fri Dec 6 03:04:33 2013 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 67052F0; Fri, 6 Dec 2013 03:04:33 +0000 (UTC) Received: from mail-la0-x22d.google.com (mail-la0-x22d.google.com [IPv6:2a00:1450:4010:c03::22d]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 2E7BF133A; Fri, 6 Dec 2013 03:04:32 +0000 (UTC) Received: by mail-la0-f45.google.com with SMTP id eh20so39634lab.4 for ; Thu, 05 Dec 2013 19:04:29 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=mm46h5ABagl0FCh8JpvgvRzoQH2L1l1AWnt4Tz+UZ3s=; b=FX2egA1tHxz2DbTswccBj2H9zS1iSgk2BW8PI4eK/xa2zMhDL8jvWRA0lPNf+sWsFD 8+cg15Xe/tNrLLAg0SNt9YIHnlMvzOWgpSIRWhGnbMqWNzKYKYoDBGJee4dFKRwkh6Uh trg1L5vy/2vFtnp6hPYPvv3SFLVWI5KeQlYVFoDPg8cOUGm6Oj8i/TSel1cDJVf8mMf4 0o1yGvAclB4x0NoXChxW0rAdASsFcy+RO6vSKVM0kiejyWcr58BDe/g15mCCLK0bPVlM dOOLBQinezRSfVCsO4HqwQHzW/V4Ov+YNGkpWlqwB8QKXM9QeXsdsLQCj/sFoxEHLR+Z PZ7w== MIME-Version: 1.0 X-Received: by 10.152.234.170 with SMTP id uf10mr238362lac.43.1386299069530; Thu, 05 Dec 2013 19:04:29 -0800 (PST) Received: by 10.114.166.163 with HTTP; Thu, 5 Dec 2013 19:04:29 -0800 (PST) In-Reply-To: References: <4053E074-EDC5-49AB-91A7-E50ABE36602E@freebsd.org> Date: Fri, 6 Dec 2013 11:04:29 +0800 Message-ID: Subject: 
Re: [PATCH] SO_REUSEADDR and SO_REUSEPORT behaviour From: Sepherosa Ziehau To: Adrian Chadd Content-Type: text/plain; charset=ISO-8859-1 Cc: =?ISO-8859-1?Q?Ermal_Lu=E7i?= , freebsd-net , Oleg Moskalenko , Tim Kientzle , "freebsd-current@freebsd.org" X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 06 Dec 2013 03:04:33 -0000 On Tue, Dec 3, 2013 at 5:41 AM, Adrian Chadd wrote: > > On 2 December 2013 03:45, Sepherosa Ziehau wrote: > > > > On Mon, Dec 2, 2013 at 1:02 PM, Adrian Chadd wrote: > > > >> Ok, so given this, how do you guarantee the UTHREAD stays on the given > >> CPU? You assume it stays on the CPU that the initial listen socket was > >> created on, right? If it's migrated to another CPU core then the > >> listen queue still stays in the original hash group that's in a netisr > >> on a different CPU? > > > > As I wrote in the above brief introduction, Dfly currently relies on the > > scheduler doing the proper thing (the scheduler does do a very good job > > during my tests). I need to export certain kind of socket option to make > > that information available to user space programs. Force UTHREAD binding in > > kernel is not helpful, given in reverse proxy application, things are > > different. And even if that kind of binding information was exported to > > user space, user space program still would have to poll it periodically (in > > Dfly at least), since other programs binding to the same addr/port could > > come and go, which will cause reorganizing of the inp localgroup in the > > current Dfly implementation. > > Right. I kinda gathered that. It's fine, I was conceptually thinking > of doing some thead pinning into this anyway. > > How do you see this scaling on massively multi-core machines? Like 32, > 48, 64, 128 cores? 
I had some vague handwav-y notion of maybe limiting We do have a 48 core box. It is mainly used for package building and other stuffs. I didn't run network stress tests on it. However, we do address some message passing problems on it which will not be unveiled on 8 cpu boxes. > the concept of pcbgroup hash / netisr threads to a subset of CPUs, or > have them be able to float between sockets but only have 1 (or n, Floating around may be good, but by pinning netisr to a specific CPU you could enjoy lockless per-cpu data. > maybe) per socket. Or just have a fixed, smaller pool. The idea then We used to have dedicated threads for UDP and TCP processing, but it turns out that one netisr per cpu works best in Dfly. You probably need to try and measure before deciding to move to 1 or N netisrs per cpu. Best Regards, sephe > is the scheduler would need to be told that a given userland > thread/process belongs to a given netisr thread, and to schedule them > on the same CPU when possible. > > Anyway, thanks for doing this work. I only wish that you'd do it for > FreeBSD. 
:-) > > > > -adrian -- Tomorrow Will Never Die From owner-freebsd-net@FreeBSD.ORG Fri Dec 6 03:50:59 2013 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 9BF7C194; Fri, 6 Dec 2013 03:50:59 +0000 (UTC) Received: from mail-qe0-x230.google.com (mail-qe0-x230.google.com [IPv6:2607:f8b0:400d:c02::230]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 1E0B21669; Fri, 6 Dec 2013 03:50:59 +0000 (UTC) Received: by mail-qe0-f48.google.com with SMTP id gc15so119188qeb.35 for ; Thu, 05 Dec 2013 19:50:58 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:date:message-id:subject :from:to:cc:content-type; bh=J41ZYyv2zM12gpm4hTJRsBZIJO9j6ai5ubyi0NvaAhQ=; b=bAT6sFNJnbcw8+8RESL27KPy3tmRymlOWsfAsiBp9OTBi1Hk6mjcJSnIp5KCUExQbk 6nkdztjH/VR2pXgLmwf194RU9HEVZFvKFfKYKcm3Xjt0ktE9W7VqHtQ7pz5ZQfvJYrOw pdCUziOT+7Jdt3Min6Hm+lJb33/fK0vF8q04oZeKPQ3ZRN6OCvtLB52DXZQ09j6WEEes LEfFGPothrY8eQy4cH4U0pv/nD3W4h4sW3bJCehQSon0e8FjTiHD9DAsQ1x8hm+zbfUU xyBO2D5FrCDKQW8wPDS/NyMEPjikV+f39LHz9mx867vrn6CsLXXiYgLqzRxScDP6LmK1 3jeA== MIME-Version: 1.0 X-Received: by 10.49.116.141 with SMTP id jw13mr2419321qeb.2.1386301858239; Thu, 05 Dec 2013 19:50:58 -0800 (PST) Sender: adrian.chadd@gmail.com Received: by 10.224.53.200 with HTTP; Thu, 5 Dec 2013 19:50:58 -0800 (PST) In-Reply-To: References: <4053E074-EDC5-49AB-91A7-E50ABE36602E@freebsd.org> Date: Thu, 5 Dec 2013 19:50:58 -0800 X-Google-Sender-Auth: BJDVu1NITTCeRaUA2CYm0PA57Pw Message-ID: Subject: Re: [PATCH] SO_REUSEADDR and SO_REUSEPORT behaviour From: Adrian Chadd To: Sepherosa Ziehau Content-Type: text/plain; charset=ISO-8859-1 Cc: =?ISO-8859-1?Q?Ermal_Lu=E7i?= , freebsd-net , 
Oleg Moskalenko , Tim Kientzle , "freebsd-current@freebsd.org" X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 06 Dec 2013 03:50:59 -0000 I was thinking of n netisrs per m CPUs, where n < m; or maybe 1 netisr for m CPUs, where m is less than the total number. Having 48 cores contending on netisr stuff is a bit crazy. It's highly unlikely you need that many cores doing packet pushing. -a From owner-freebsd-net@FreeBSD.ORG Fri Dec 6 08:47:05 2013 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 415BE8B8; Fri, 6 Dec 2013 08:47:05 +0000 (UTC) Received: from mail-ve0-x22b.google.com (mail-ve0-x22b.google.com [IPv6:2607:f8b0:400c:c01::22b]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id E071118DB; Fri, 6 Dec 2013 08:47:04 +0000 (UTC) Received: by mail-ve0-f171.google.com with SMTP id pa12so421249veb.16 for ; Fri, 06 Dec 2013 00:47:04 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=isOs9QJ92qWiJyDuRr0Jnv4a07JtfaIu3Fe/Mbn4pcE=; b=N0n52CteadD35VO+wTzmMtM8DFaPSA5tf1wIAW1RZ+RrmbixCYUndlL6c4HX8q713V KXFWvEffysJcbWN1evreurLjPV2664bmC7Ca5opoECDhYExYvzbxRKEmpR4p7OqV5YCJ wl4H8t8LBiQnK7reHgoAPJ2/X7gqUW5d915Y5jbO7UlKriUEP4hstn7MsJTgbFEUAn9a ON9xqHJ5IbKlyW2/HmlAAAWr3T5ASUN+LkBnrFq6EBFH5F1zi+CSsq6YzOEG/vPNfko9 jJhnWPtQtInOupUAGtQuzGrf2MacSZ1fdf9LZUclCYPx+/QAgoDkMIeL5C2YecooJ67Z 4WkQ== MIME-Version: 1.0 X-Received: by 10.220.174.200 with SMTP id u8mr1309687vcz.6.1386319623977; Fri, 06 Dec 2013 
00:47:03 -0800 (PST) Received: by 10.58.7.169 with HTTP; Fri, 6 Dec 2013 00:47:03 -0800 (PST) In-Reply-To: <52A13BB1.50106@yahoo.com> References: <52A13BB1.50106@yahoo.com> Date: Fri, 6 Dec 2013 12:47:03 +0400 Message-ID: Subject: Re: Can't connect to network with my NIC From: Mikhail Vorobyev To: Darryl Lyle Content-Type: text/plain; charset=UTF-8 X-Content-Filtered-By: Mailman/MimeDel 2.1.17 Cc: freebsd-net@freebsd.org, yongari@freebsd.org X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 06 Dec 2013 08:47:05 -0000 Hi man. Apparently it is necessary to properly configure network interfaces. 2013/12/6 Darryl Lyle > Hey guys, > > I just installed the latest pc-bsd 10 stable image, and I can't > connect to my network with my nic, but I can connect fine via wifi. > > ifconfig re0 > e0: flags=8843 metric 0 mtu 1500 > options=8209b HWCSUM,WOL_MAGIC,LINKSTATE> > ether f8:b1:56:9d:84:3a > inet6 fe80::fab1:56ff:fe9d:843a%re0 prefixlen 64 scopeid 0x1 > inet 0.0.0.0 netmask 0xff000000 broadcast 255.255.255.255 > nd6 options=23 > media: Ethernet autoselect (1000baseT ) > status: active > > Attached is dmesg and pciconf -lv > > If I try dhclient re0 I get no DHCPOFFERS > > > > > v/r > Darryl Lyle > > _______________________________________________ > freebsd-net@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-net > To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org" > -- Regards, Vorobyev Mikhail. 
From owner-freebsd-net@FreeBSD.ORG Fri Dec 6 09:42:55 2013 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id E4233825 for ; Fri, 6 Dec 2013 09:42:55 +0000 (UTC) Received: from vps.rulingia.com (host-122-100-2-194.octopus.com.au [122.100.2.194]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 7BA541DDB for ; Fri, 6 Dec 2013 09:42:54 +0000 (UTC) Received: from vps.rulingia.com (localhost [127.0.0.1]) by vps.rulingia.com (8.14.7/8.14.7) with ESMTP id rB69RZhY052065 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Fri, 6 Dec 2013 20:27:35 +1100 (EST) (envelope-from peter@vps.rulingia.com) Received: (from peter@localhost) by vps.rulingia.com (8.14.7/8.14.7/Submit) id rB69RZqu052064; Fri, 6 Dec 2013 20:27:35 +1100 (EST) (envelope-from peter) Date: Fri, 6 Dec 2013 20:27:35 +1100 From: Peter Jeremy To: Darryl Lyle Subject: Re: Can't connect to network with my NIC Message-ID: <20131206092735.GA51955@vps.rulingia.com> References: <52A13BB1.50106@yahoo.com> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="IJpNTDwzlM2Ie8A6" Content-Disposition: inline In-Reply-To: <52A13BB1.50106@yahoo.com> X-PGP-Key: http://www.rulingia.com/keys/peter.pgp User-Agent: Mutt/1.5.22 (2013-10-16) Cc: freebsd-net@freebsd.org X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 06 Dec 2013 09:42:56 -0000 --IJpNTDwzlM2Ie8A6 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On 2013-Dec-05 21:51:29 
-0500, Darryl Lyle wrote: > I just installed the latest pc-bsd 10 stable image, and I can't >connect to my network with my nic, but I can connect fine via wifi. I don't see anything immediately obvious that's wrong. >If I try dhclient re0 I get no DHCPOFFERS I presume there's a DHCP server visible from whatever re0 is plugged into. As further debugging steps: If you "tcpdump -n -i re0" on the affected box, do you see any network traffic? Can you see the outgoing DHCP requests? Is there any response? If you run "tcpdump -n -i ... ether host f8:b1:56:9d:84:3a" on another box on the network (ideally the DHCP server), do you see the DHCP requests? If it's the switch or DHCP server, do you see any responses? -- Peter --IJpNTDwzlM2Ie8A6 Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (FreeBSD) iEYEARECAAYFAlKhmIcACgkQ/opHv/APuIfRzACeNK7stJoPNq2rdU4OhbQ4mGtA CuIAoLSLaOUOJJ/GQVJFdQx2iYH5nP86 =1xw2 -----END PGP SIGNATURE----- --IJpNTDwzlM2Ie8A6-- From owner-freebsd-net@FreeBSD.ORG Fri Dec 6 16:26:04 2013 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id C7AD37A7; Fri, 6 Dec 2013 16:26:04 +0000 (UTC) Received: from aslan.scsiguy.com (mail.scsiguy.com [70.89.174.89]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 9BEC31B52; Fri, 6 Dec 2013 16:26:04 +0000 (UTC) Received: from raycaruso-lt.sldomain.com (207-225-98-3.dia.static.qwest.net [207.225.98.3]) (authenticated bits=0) by aslan.scsiguy.com (8.14.7/8.14.5) with ESMTP id rB6GPsHg047332 (version=TLSv1/SSLv3 cipher=AES128-SHA bits=128 verify=NO); Fri, 6 Dec 2013 09:25:57 -0700 (MST) (envelope-from gibbs@scsiguy.com) Mime-Version: 1.0 (Mac OS X Mail 7.0 \(1822\)) Subject: Re:
Defaults for if_capenable and detecting user initiated changes From: "Justin T. Gibbs" In-Reply-To: <201312031213.41677.jhb@freebsd.org> Date: Fri, 6 Dec 2013 09:25:48 -0700 Message-Id: <526A243B-7B66-45BD-9B45-3BFB04F1E16D@scsiguy.com> References: <0E13D481-9D6D-4B52-A5AD-B671BF3A85AF@scsiguy.com> <201312031213.41677.jhb@freebsd.org> To: John Baldwin X-Mailer: Apple Mail (2.1822) X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.4.3 (aslan.scsiguy.com [70.89.174.89]); Fri, 06 Dec 2013 09:25:58 -0700 (MST) Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.17 Cc: freebsd-net@freebsd.org, =?iso-8859-1?Q?Roger_Pau_Monn=E9?= , net@freebsd.org X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 06 Dec 2013 16:26:04 -0000 On Dec 3, 2013, at 10:13 AM, John Baldwin wrote: > On Wednesday, November 27, 2013 12:59:08 pm Justin T. Gibbs wrote: >> Hi net, >> >> I’m reviewing a patch from Roger Pau Monné for the Xen netfront driver. The > goal of the change is to avoid disturbing the user’s settings for the > interface just because the backend device has changed or the connection to the > backend was reset. I’ve attached the latest version of the patch. >> >> The current patch leaves the interface settings alone if they can be > supported by the newly attached backend. What would be ideal is to enable > capabilities that default to being enabled if they were not explicitly > disabled by the user and can be supported by the new backend. Unfortunately, > I don’t think the if_capenable and if_capabilities fields are descriptive > enough to deal with an interface whose capabilities can change at runtime.
> Just as can be done with link speed, some of these settings need to allow an > “auto/default” setting in addition to on or off. This would allow the user to > explicitly disable a capability if needed, but generally allow the system to > chose the most optimal settings when they are supported. Would this be > difficult to add? > > Couldn't you maintain this state in the Xen netfront driver's softc? > You already get the ioctls that track changes to the capenable field, > so you when a change explicitly disables a capability you can set that > in a 'forced off' or 'forced on' field. Perhaps more of a 'forced' > field that you just update by doing: > > sc->capforced |= (oldcapenable ^ newcapenable) > > However, it's not clear to me if you can get the underlying adapters > initial capenable list. If so, I think capforced should be all you > need to handle this (though it might be easier if you have separate > forcedon and forcedoff fields). > > -- > John Baldwin Certainly this could be done in the Xen driver. The reason I posted my question, however, was to ask whether this should be more generically tracked by the if layer instead of handled by the underlying driver. Lots of user interfaces support a “restore defaults” capability (e.g. for the novice administrator who screws up, or as a step in writing a script/procedure that starts by getting to a known state), so I think this is interesting for more than this particular Xen issue.
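A minimal sketch of the bookkeeping discussed above, using the separate forcedon/forcedoff fields John mentions so that an explicit user choice survives a backend change. The struct, the CAP_* bits, and both function names are hypothetical and only for illustration; a real driver would use the IFCAP_* flags and its own softc, and this is not the netfront patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical capability bits; a real driver would use IFCAP_* values. */
#define CAP_TXCSUM 0x1u
#define CAP_TSO4   0x2u
#define CAP_LRO    0x4u

/* Hypothetical per-interface state, standing in for fields in a softc. */
struct cap_state {
	uint32_t capabilities; /* what the current backend can support */
	uint32_t capenable;    /* what is enabled right now */
	uint32_t forcedon;     /* bits the user explicitly turned on */
	uint32_t forcedoff;    /* bits the user explicitly turned off */
};

/* User request (e.g. from an ioctl): remember which bits were toggled. */
static void
cap_user_set(struct cap_state *sc, uint32_t newcapenable)
{
	uint32_t changed, turned_on, turned_off;

	newcapenable &= sc->capabilities;         /* cannot enable unsupported bits */
	changed = sc->capenable ^ newcapenable;   /* same XOR as in the thread */
	turned_on = changed & newcapenable;
	turned_off = changed & ~newcapenable;

	sc->forcedon = (sc->forcedon | turned_on) & ~turned_off;
	sc->forcedoff = (sc->forcedoff | turned_off) & ~turned_on;
	sc->capenable = newcapenable;
}

/*
 * Backend changed: start from the new defaults, drop what the user forced
 * off, add what the user forced on, and clip to what the backend supports.
 */
static void
cap_backend_changed(struct cap_state *sc, uint32_t newcaps, uint32_t defaults)
{
	sc->capabilities = newcaps;
	sc->capenable = ((defaults & ~sc->forcedoff) | sc->forcedon) & newcaps;
	assert((sc->capenable & ~sc->capabilities) == 0);
}
```

With two masks, a capability the user forced on comes back automatically if a later backend supports it again, and clearing both masks would give the "restore defaults" behaviour mentioned above.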
-- Justin From owner-freebsd-net@FreeBSD.ORG Fri Dec 6 20:08:16 2013 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 9E6301F7; Fri, 6 Dec 2013 20:08:16 +0000 (UTC) Received: from mail-n.franken.de (drew.ipv6.franken.de [IPv6:2001:638:a02:a001:20e:cff:fe4a:feaa]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id C44A51B48; Fri, 6 Dec 2013 20:08:15 +0000 (UTC) Received: from [192.168.1.200] (p508F3521.dip0.t-ipconnect.de [80.143.53.33]) (Authenticated sender: macmic) by mail-n.franken.de (Postfix) with ESMTP id A43481C0C0692; Fri, 6 Dec 2013 21:08:12 +0100 (CET) Content-Type: text/plain; charset=iso-8859-1 Mime-Version: 1.0 (Mac OS X Mail 6.6 \(1510\)) Subject: Re: A small fix for if_em.c, if_igb.c, if_ixgbe.c From: Michael Tuexen In-Reply-To: Date: Fri, 6 Dec 2013 21:08:13 +0100 Content-Transfer-Encoding: quoted-printable Message-Id: <2D0F95A6-1321-4F8E-87FB-1B9DD33FD319@lurchi.franken.de> References: <521B9C2A-EECC-4412-9F68-2235320EF324@lurchi.franken.de> <20131202022338.GA3500@michelle.cdnetworks.com> <20131203021658.GC2981@michelle.cdnetworks.com> To: Adrian Chadd X-Mailer: Apple Mail (2.1510) Cc: Yong-Hyeon Pyun , Jack F Vogel , "freebsd-net@freebsd.org list" X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 06 Dec 2013 20:08:16 -0000 On Dec 5, 2013, at 7:29 PM, Adrian Chadd wrote: > Hi, > > Yes. Looking at the ixgbe code, ixgbe_mq_start_locked() returns an > error from ixgbe_xmit() but if it fails, it puts the buffer back.
But > it's already successfully queued a frame to the driver, so in this > instance it shouldn't return the error from ixgbe_mq_start_locked(). > > The same deal in if_em.c and igb.c > > Now, drbr_putback() used to fail and now it doesn't, as you've said. > So we should change the xxx_mq_start_locked() to set err=0 if we go > via the drbr_putback() routine, as it hasn't actually failed to > transmit. > > Now the very dirty thing is this - the error from xxx_transmit() is > for the mbuf being queued at the end; but xxx_mq_start_locked() > failures are for transmitting from the front. If there's only packet > in the queue and that fails then they're the same thing and returning > the error from xxx_mq_start_locked() matches the current mbuf being > queued. But otherwise, they're referring to totally different packets. > For TCP this may hurt; the TCP stack treats ENOBUFS a certain way and > kicks off a timer to schedule a retransmit. I don't think we can fix > _this_ right now. > > So Michael - can you redo your patch to set err=0 if drbr_putback() is > called, and retest?
Hi Adrian, I guess you are talking about a patch like: [bsd5:~/head/sys/dev] tuexen% svn diff -x -p Index: e1000/if_em.c =================================================================== --- e1000/if_em.c (revision 259039) +++ e1000/if_em.c (working copy) @@ -935,6 +935,7 @@ em_mq_start_locked(struct ifnet *ifp, struct tx_ri drbr_advance(ifp, txr->br); else drbr_putback(ifp, txr->br, next); + err = 0; break; } drbr_advance(ifp, txr->br); Index: e1000/if_igb.c =================================================================== --- e1000/if_igb.c (revision 259039) +++ e1000/if_igb.c (working copy) @@ -1024,6 +1024,7 @@ igb_mq_start_locked(struct ifnet *ifp, struct tx_r * may have changed it. */ drbr_putback(ifp, txr->br, next); + err = 0; } break; } Index: ixgbe/ixgbe.c =================================================================== --- ixgbe/ixgbe.c (revision 259039) +++ ixgbe/ixgbe.c (working copy) @@ -864,6 +864,7 @@ ixgbe_mq_start_locked(struct ifnet *ifp, struct tx drbr_advance(ifp, txr->br); } else { drbr_putback(ifp, txr->br, next); + err = 0; } #endif break; Index: ixgbe/ixv.c =================================================================== --- ixgbe/ixv.c (revision 259039) +++ ixgbe/ixv.c (working copy) @@ -629,6 +629,7 @@ ixv_mq_start_locked(struct ifnet *ifp, struct tx_r drbr_advance(ifp, txr->br); } else { drbr_putback(ifp, txr->br, next); + err = 0; } break; } Index: virtio/network/if_vtnet.c
=================================================================== --- virtio/network/if_vtnet.c (revision 259039) +++ virtio/network/if_vtnet.c (working copy) @@ -2242,9 +2242,10 @@ vtnet_txq_mq_start_locked(struct vtnet_txq *txq, s while ((m = drbr_peek(ifp, br)) != NULL) { error = vtnet_txq_encap(txq, &m); if (error) { - if (m != NULL) + if (m != NULL) { drbr_putback(ifp, br, m); - else + error = 0; + } else drbr_advance(ifp, br); break; } I looked for drivers using drbr_putback() and used a similar fix. Please note that sys/dev/oce/oce_if.c seems strange. It uses drbr_putback() and drbr_enqueue(), so I left it out for now. I tested the igb driver and the above patch fixes the problem I saw. From your above description I think the above patch is a valid patch. However, xxx_transmit() can still return an error, even if there is no problem with the provided packet. This is an issue for transport protocols... Best regards Michael > > Thanks!
> > > > > -adrian > From owner-freebsd-net@FreeBSD.ORG Fri Dec 6 20:15:22 2013 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 5B29968E; Fri, 6 Dec 2013 20:15:22 +0000 (UTC) Received: from mail-n.franken.de (drew.ipv6.franken.de [IPv6:2001:638:a02:a001:20e:cff:fe4a:feaa]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id E000E1D0D; Fri, 6 Dec 2013 20:15:21 +0000 (UTC) Received: from [192.168.1.200] (p508F3521.dip0.t-ipconnect.de [80.143.53.33]) (Authenticated sender: macmic) by mail-n.franken.de (Postfix) with ESMTP id E5B941C0C069B; Fri, 6 Dec 2013 21:15:19 +0100 (CET) Content-Type: text/plain; charset=iso-8859-1 Mime-Version: 1.0 (Mac OS X Mail 6.6 \(1510\)) Subject: Re: A small fix for if_em.c, if_igb.c, if_ixgbe.c From: Michael Tuexen In-Reply-To: Date: Fri, 6 Dec 2013 21:15:17 +0100 Content-Transfer-Encoding: 7bit Message-Id: <4E82B807-12DE-441E-BCB3-261866CC5B28@lurchi.franken.de> References: <521B9C2A-EECC-4412-9F68-2235320EF324@lurchi.franken.de> <20131202022338.GA3500@michelle.cdnetworks.com> <20131203021658.GC2981@michelle.cdnetworks.com> To: Adrian Chadd X-Mailer: Apple Mail (2.1510) Cc: Yong-Hyeon Pyun , Jack F Vogel , "freebsd-net@freebsd.org list" X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 06 Dec 2013 20:15:22 -0000 On Dec 5, 2013, at 11:01 PM, Adrian Chadd wrote: > On 5 December 2013 13:05, Michael Tuexen > wrote: > >> Just to be clear: This would mean that xxx_transmit() would return >> an error even if the packet provided in the call xxx_transmit() is >> enqueued and not
dropped? >> This would also be problem with the current SCTP stack. > > I think it'll return an error only if: > > * it queued the frame to the tail of the drbd; > * it then tried to transmit a frame from the head of the drbd; > * it failed to transmit the first frame in the drbd and it couldn't > put it back into the queue for whatever reason. > > So I think it should be "ok enough" for both TCP and SCTP. No it isn't. The transport layer calls ip_output() (or the v6 variant), and it needs to know if the packet provided will not be put on the wire. If it knows for sure that the provided packet was dropped by the local stack it can do some special treatment. In all other cases, the packet may or may not make it to the peer and the transport layer will take care, but can't optimize. If the above describes what I get from ip_output(), I can only ignore it, since it doesn't help. Which layer can make use of the above information? > > Give it a go and let me know how it goes. The patch in the other mail fixes the problem and improves the driver. > > It's an interesting architectural problem to completely solve. Yes, it is. I think setting err=0 makes sense. However, the information returned by xxx_transmit() as described above seems useless for me. This is an architectural point as you said and I'm interested in knowing which consumer of the return code of xxx_transmit() can make use of it. 
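To make the semantics under discussion concrete, here is a small user-space sketch in which the transmit routine reports an error only when the packet passed in could not be queued. A failure to hand the frame at the head of the ring to the hardware leaves that frame queued and is not reported to the caller. Everything here (struct txq, txq_enqueue, hw_xmit, txq_drain, sketch_transmit) is made up for illustration; it is not the drbr(9) API or any driver's code, and it ignores locking and mbuf ownership.

```c
#include <errno.h>
#include <stddef.h>

#define RING_SLOTS 4

/* Hypothetical packet and software transmit ring. */
struct pkt {
	int id;
};

struct txq {
	struct pkt *ring[RING_SLOTS];
	size_t head;   /* index of the oldest queued packet */
	size_t count;  /* number of queued packets */
	int hw_full;   /* nonzero: simulated hardware refuses frames */
	int sent;      /* frames accepted by the simulated hardware */
};

/* Append to the tail; fails only when the software ring is full. */
static int
txq_enqueue(struct txq *q, struct pkt *p)
{
	if (q->count == RING_SLOTS)
		return (ENOBUFS);
	q->ring[(q->head + q->count) % RING_SLOTS] = p;
	q->count++;
	return (0);
}

/* Simulated hardware transmit of one frame. */
static int
hw_xmit(struct txq *q, struct pkt *p)
{
	(void)p;
	if (q->hw_full)
		return (ENOBUFS);
	q->sent++;
	return (0);
}

/* Push queued frames to the hardware, oldest first. */
static void
txq_drain(struct txq *q)
{
	while (q->count > 0) {
		struct pkt *next = q->ring[q->head];     /* peek */

		if (hw_xmit(q, next) != 0)
			break;                            /* frame stays at the head */
		q->head = (q->head + 1) % RING_SLOTS;     /* advance */
		q->count--;
	}
}

/* The return value describes only the fate of p, never of older frames. */
static int
sketch_transmit(struct txq *q, struct pkt *p)
{
	int err = txq_enqueue(q, p);

	if (err != 0)
		return (err);   /* p itself was not queued */
	txq_drain(q);
	return (0);             /* p is queued or already sent */
}
```

Under this convention a transport protocol can treat a nonzero return as "this packet will not reach the wire" and everything else as "may or may not arrive", which is the distinction asked for above.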
Best regards Michael > > > -adrian > From owner-freebsd-net@FreeBSD.ORG Fri Dec 6 20:17:16 2013 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id F029D7B0; Fri, 6 Dec 2013 20:17:16 +0000 (UTC) Received: from mail-n.franken.de (drew.ipv6.franken.de [IPv6:2001:638:a02:a001:20e:cff:fe4a:feaa]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 804C51D38; Fri, 6 Dec 2013 20:17:16 +0000 (UTC) Received: from [192.168.1.200] (p508F3521.dip0.t-ipconnect.de [80.143.53.33]) (Authenticated sender: macmic) by mail-n.franken.de (Postfix) with ESMTP id A267B1C0C0692; Fri, 6 Dec 2013 21:17:14 +0100 (CET) Content-Type: text/plain; charset=us-ascii Mime-Version: 1.0 (Mac OS X Mail 6.6 \(1510\)) Subject: Re: A small fix for if_em.c, if_igb.c, if_ixgbe.c From: Michael Tuexen In-Reply-To: <20131205223711.GB55638@funkthat.com> Date: Fri, 6 Dec 2013 21:17:15 +0100 Content-Transfer-Encoding: 7bit Message-Id: <3576B69E-E943-46E0-83E5-0B2194A44ED0@lurchi.franken.de> References: <521B9C2A-EECC-4412-9F68-2235320EF324@lurchi.franken.de> <20131202022338.GA3500@michelle.cdnetworks.com> <20131203021658.GC2981@michelle.cdnetworks.com> <20131205223711.GB55638@funkthat.com> To: John-Mark Gurney X-Mailer: Apple Mail (2.1510) Cc: Yong-Hyeon Pyun , Jack F Vogel , Adrian Chadd , "freebsd-net@freebsd.org list" X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 06 Dec 2013 20:17:17 -0000 On Dec 5, 2013, at 11:37 PM, John-Mark Gurney wrote: > Adrian Chadd wrote this message on Thu, Dec 05, 2013 at 14:01 -0800: >> On 5 December 2013 13:05, 
Michael Tuexen >> wrote: >> >>> Just to be clear: This would mean that xxx_transmit() would return >>> an error even if the packet provided in the call xxx_transmit() is >>> enqueued and not dropped? >>> This would also be problem with the current SCTP stack. >> >> I think it'll return an error only if: >> >> * it queued the frame to the tail of the drbd; >> * it then tried to transmit a frame from the head of the drbd; >> * it failed to transmit the first frame in the drbd and it couldn't >> put it back into the queue for whatever reason. >> >> So I think it should be "ok enough" for both TCP and SCTP. > > IMO it should only return an error if the specific frame failed to be > sent or queued. If you cannot determine at return time if the frame > failed to be transmitted/queued, then it should return success. Yes, this is exactly what I think too. This is what my first patch realizes. > > In the above case, if there were other frames queued ahead, and the > first one failed, then it sounds like the frame may eventually be sent > and we will end up sending a duplicate frame. Correct. SCTP will consider the frame even unsent... So the SCTP stack behaves strange and sends a packet at wirespeed over and over again (which is not good...). Best regards Michael > >> Give it a go and let me know how it goes. >> >> It's an interesting architectural problem to completely solve. > > -- > John-Mark Gurney Voice: +1 415 225 5579 > > "All that I will do, has been done, All that I have, has not." 
> From owner-freebsd-net@FreeBSD.ORG Fri Dec 6 20:17:55 2013 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id B2017882; Fri, 6 Dec 2013 20:17:55 +0000 (UTC) Received: from h2.funkthat.com (gate2.funkthat.com [208.87.223.18]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 88DFA1D4E; Fri, 6 Dec 2013 20:17:55 +0000 (UTC) Received: from h2.funkthat.com (localhost [127.0.0.1]) by h2.funkthat.com (8.14.3/8.14.3) with ESMTP id rB6KHmgo092246 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Fri, 6 Dec 2013 12:17:48 -0800 (PST) (envelope-from jmg@h2.funkthat.com) Received: (from jmg@localhost) by h2.funkthat.com (8.14.3/8.14.3/Submit) id rB6KHmEr092245; Fri, 6 Dec 2013 12:17:48 -0800 (PST) (envelope-from jmg) Date: Fri, 6 Dec 2013 12:17:48 -0800 From: John-Mark Gurney To: Michael Tuexen Subject: Re: A small fix for if_em.c, if_igb.c, if_ixgbe.c Message-ID: <20131206201748.GF55638@funkthat.com> Mail-Followup-To: Michael Tuexen , Adrian Chadd , Yong-Hyeon Pyun , Jack F Vogel , "freebsd-net@freebsd.org list" References: <521B9C2A-EECC-4412-9F68-2235320EF324@lurchi.franken.de> <20131202022338.GA3500@michelle.cdnetworks.com> <20131203021658.GC2981@michelle.cdnetworks.com> <2D0F95A6-1321-4F8E-87FB-1B9DD33FD319@lurchi.franken.de> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <2D0F95A6-1321-4F8E-87FB-1B9DD33FD319@lurchi.franken.de> User-Agent: Mutt/1.4.2.3i X-Operating-System: FreeBSD 7.2-RELEASE i386 X-PGP-Fingerprint: 54BA 873B 6515 3F10 9E88 9322 9CB1 8F74 6D3F A396 X-Files: The truth is out there X-URL: http://resnet.uoregon.edu/~gurney_j/ X-Resume: http://resnet.uoregon.edu/~gurney_j/resume.html 
X-to-the-FBI-CIA-and-NSA: HI! HOW YA DOIN? can i haz chizburger?
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.2.2
 (h2.funkthat.com [127.0.0.1]); Fri, 06 Dec 2013 12:17:49 -0800 (PST)
Cc: Yong-Hyeon Pyun, Jack F Vogel, Adrian Chadd,
 "freebsd-net@freebsd.org list"
X-BeenThere: freebsd-net@freebsd.org
X-Mailman-Version: 2.1.17
Precedence: list
List-Id: Networking and TCP/IP with FreeBSD
List-Unsubscribe:
List-Archive:
List-Post:
List-Help:
List-Subscribe:
X-List-Received-Date: Fri, 06 Dec 2013 20:17:55 -0000

Michael Tuexen wrote this message on Fri, Dec 06, 2013 at 21:08 +0100:
> On Dec 5, 2013, at 7:29 PM, Adrian Chadd wrote:
>
> > Yes. Looking at the ixgbe code, ixgbe_mq_start_locked() returns an
> > error from ixgbe_xmit() but if it fails, it puts the buffer back. But
> > it's already successfully queued a frame to the driver, so in this
> > instance it shouldn't return the error from ixgbe_mq_start_locked().
> >
> > The same deal in if_em.c and if_igb.c
> >
> > Now, drbr_putback() used to fail and now it doesn't, as you've said.
> > So we should change the xxx_mq_start_locked() to set err=0 if we go
> > via the drbr_putback() routine, as it hasn't actually failed to
> > transmit.
> >
> > Now the very dirty thing is this - the error from xxx_transmit() is
> > for the mbuf being queued at the end; but xxx_mq_start_locked()
> > failures are for transmitting from the front. If there's only one packet
> > in the queue and that fails then they're the same thing and returning
> > the error from xxx_mq_start_locked() matches the current mbuf being
> > queued. But otherwise, they're referring to totally different packets.
> > For TCP this may hurt; the TCP stack treats ENOBUFS a certain way and
> > kicks off a timer to schedule a retransmit. I don't think we can fix
> > _this_ right now.
> >
> > So Michael - can you redo your patch to set err=0 if drbr_putback() is
> > called, and retest?
> Hi Adrian, > > I guess you are talking about a patch like: > > [bsd5:~/head/sys/dev] tuexen% svn diff -x -p > Index: e1000/if_em.c > =================================================================== > --- e1000/if_em.c (revision 259039) > +++ e1000/if_em.c (working copy) > @@ -935,6 +935,7 @@ em_mq_start_locked(struct ifnet *ifp, struct tx_ri > drbr_advance(ifp, txr->br); > else > drbr_putback(ifp, txr->br, next); > + err = 0; You probably want curly braces around this... -- John-Mark Gurney Voice: +1 415 225 5579 "All that I will do, has been done, All that I have, has not." From owner-freebsd-net@FreeBSD.ORG Fri Dec 6 20:20:13 2013 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id BC69E988; Fri, 6 Dec 2013 20:20:13 +0000 (UTC) Received: from h2.funkthat.com (gate2.funkthat.com [208.87.223.18]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 8FB451D6E; Fri, 6 Dec 2013 20:20:13 +0000 (UTC) Received: from h2.funkthat.com (localhost [127.0.0.1]) by h2.funkthat.com (8.14.3/8.14.3) with ESMTP id rB6KKCLE092298 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Fri, 6 Dec 2013 12:20:12 -0800 (PST) (envelope-from jmg@h2.funkthat.com) Received: (from jmg@localhost) by h2.funkthat.com (8.14.3/8.14.3/Submit) id rB6KKC3Z092297; Fri, 6 Dec 2013 12:20:12 -0800 (PST) (envelope-from jmg) Date: Fri, 6 Dec 2013 12:20:12 -0800 From: John-Mark Gurney To: Michael Tuexen Subject: Re: A small fix for if_em.c, if_igb.c, if_ixgbe.c Message-ID: <20131206202012.GG55638@funkthat.com> Mail-Followup-To: Michael Tuexen , Yong-Hyeon Pyun , Jack F Vogel , Adrian Chadd , "freebsd-net@freebsd.org list" References: <521B9C2A-EECC-4412-9F68-2235320EF324@lurchi.franken.de> 
<20131202022338.GA3500@michelle.cdnetworks.com> <20131203021658.GC2981@michelle.cdnetworks.com> <20131205223711.GB55638@funkthat.com> <3576B69E-E943-46E0-83E5-0B2194A44ED0@lurchi.franken.de> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <3576B69E-E943-46E0-83E5-0B2194A44ED0@lurchi.franken.de> User-Agent: Mutt/1.4.2.3i X-Operating-System: FreeBSD 7.2-RELEASE i386 X-PGP-Fingerprint: 54BA 873B 6515 3F10 9E88 9322 9CB1 8F74 6D3F A396 X-Files: The truth is out there X-URL: http://resnet.uoregon.edu/~gurney_j/ X-Resume: http://resnet.uoregon.edu/~gurney_j/resume.html X-to-the-FBI-CIA-and-NSA: HI! HOW YA DOIN? can i haz chizburger? X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.2.2 (h2.funkthat.com [127.0.0.1]); Fri, 06 Dec 2013 12:20:13 -0800 (PST) Cc: Yong-Hyeon Pyun , Jack F Vogel , Adrian Chadd , "freebsd-net@freebsd.org list" X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 06 Dec 2013 20:20:13 -0000 Michael Tuexen wrote this message on Fri, Dec 06, 2013 at 21:17 +0100: > On Dec 5, 2013, at 11:37 PM, John-Mark Gurney wrote: > > > Adrian Chadd wrote this message on Thu, Dec 05, 2013 at 14:01 -0800: > >> On 5 December 2013 13:05, Michael Tuexen > >> wrote: > >> > >>> Just to be clear: This would mean that xxx_transmit() would return > >>> an error even if the packet provided in the call xxx_transmit() is > >>> enqueued and not dropped? > >>> This would also be problem with the current SCTP stack. > >> > >> I think it'll return an error only if: > >> > >> * it queued the frame to the tail of the drbd; > >> * it then tried to transmit a frame from the head of the drbd; > >> * it failed to transmit the first frame in the drbd and it couldn't > >> put it back into the queue for whatever reason. 
> >> > >> So I think it should be "ok enough" for both TCP and SCTP. > > > > IMO it should only return an error if the specific frame failed to be > > sent or queued. If you cannot determine at return time if the frame > > failed to be transmitted/queued, then it should return success. > Yes, this is exactly what I think too. This is what my first patch > realizes. > > > > In the above case, if there were other frames queued ahead, and the > > first one failed, then it sounds like the frame may eventually be sent > > and we will end up sending a duplicate frame. > Correct. SCTP will consider the frame even unsent... So the SCTP stack > behaves strange and sends a packet at wirespeed over and over again (which > is not good...). Sounds like a bug in SCTP, if it gets an error like that, it needs to back off a bit.. Though when to wake up, etc, is harder to decide... -- John-Mark Gurney Voice: +1 415 225 5579 "All that I will do, has been done, All that I have, has not." From owner-freebsd-net@FreeBSD.ORG Fri Dec 6 21:04:43 2013 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 3D3C0184; Fri, 6 Dec 2013 21:04:43 +0000 (UTC) Received: from mail-n.franken.de (drew.ipv6.franken.de [IPv6:2001:638:a02:a001:20e:cff:fe4a:feaa]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 66D3B10B6; Fri, 6 Dec 2013 21:04:42 +0000 (UTC) Received: from [192.168.1.200] (p508F3521.dip0.t-ipconnect.de [80.143.53.33]) (Authenticated sender: macmic) by mail-n.franken.de (Postfix) with ESMTP id 08CA31C0C0695; Fri, 6 Dec 2013 22:04:39 +0100 (CET) Content-Type: text/plain; charset=us-ascii Mime-Version: 1.0 (Mac OS X Mail 6.6 \(1510\)) Subject: Re: A small fix for if_em.c, if_igb.c, if_ixgbe.c From: 
Michael Tuexen In-Reply-To: <20131206201748.GF55638@funkthat.com> Date: Fri, 6 Dec 2013 22:04:41 +0100 Content-Transfer-Encoding: 7bit Message-Id: <956436B1-5E20-4470-B415-3311F5CC24B8@lurchi.franken.de> References: <521B9C2A-EECC-4412-9F68-2235320EF324@lurchi.franken.de> <20131202022338.GA3500@michelle.cdnetworks.com> <20131203021658.GC2981@michelle.cdnetworks.com> <2D0F95A6-1321-4F8E-87FB-1B9DD33FD319@lurchi.franken.de> <20131206201748.GF55638@funkthat.com> To: John-Mark Gurney X-Mailer: Apple Mail (2.1510) Cc: Yong-Hyeon Pyun , Jack F Vogel , Adrian Chadd , "freebsd-net@freebsd.org list" X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 06 Dec 2013 21:04:43 -0000 On Dec 6, 2013, at 9:17 PM, John-Mark Gurney wrote: > Michael Tuexen wrote this message on Fri, Dec 06, 2013 at 21:08 +0100: >> On Dec 5, 2013, at 7:29 PM, Adrian Chadd wrote: >> >>> Yes. Looking at the ixgbe code, ixgbe_mq_start_locked() returns an >>> error from ixgbe_xmit() but if it fails, it puts the buffer back. But >>> it's already successfully queued a frame to the driver, so in this >>> instance it shouldn't return the error from ixgbe_mq_start_locked(). >>> >>> The same deal in if_em.c and igb.c >>> >>> Now, drbr_putback() used to fail and now it doesn't, as you've said. >>> So we should change the xxx_mq_start_locked() to set err=0 if we go >>> via the drbr_putback() routine, as it hasn't actually failed to >>> transmit. >>> >>> Now the very dirty thing is this - the error from xxx_transmit() is >>> for the mbuf being queued at the end; but xxx_mq_start_locked() >>> failures are for transmitting from the front. If there's only packet >>> in the queue and that fails then they're the same thing and returning >>> the error from xxx_mq_start_locked() matches the current mbuf being >>> queued. 
But otherwise, they're referring to totally different packets.
>>> For TCP this may hurt; the TCP stack treats ENOBUFS a certain way and
>>> kicks off a timer to schedule a retransmit. I don't think we can fix
>>> _this_ right now.
>>>
>>> So Michael - can you redo your patch to set err=0 if drbr_putback() is
>>> called, and retest?
>> Hi Adrian,
>>
>> I guess you are talking about a patch like:
>>
>> [bsd5:~/head/sys/dev] tuexen% svn diff -x -p
>> Index: e1000/if_em.c
>> ===================================================================
>> --- e1000/if_em.c (revision 259039)
>> +++ e1000/if_em.c (working copy)
>> @@ -935,6 +935,7 @@ em_mq_start_locked(struct ifnet *ifp, struct tx_ri
>>  				drbr_advance(ifp, txr->br);
>>  			else
>>  				drbr_putback(ifp, txr->br, next);
>> +				err = 0;
>
> You probably want curly braces around this...

For sure. Thanks for catching it:

[bsd5:~/head/sys/dev] tuexen% svn diff -x -p
Index: e1000/if_em.c
===================================================================
--- e1000/if_em.c (revision 259039)
+++ e1000/if_em.c (working copy)
@@ -933,8 +933,10 @@ em_mq_start_locked(struct ifnet *ifp, struct tx_ri
 		if ((err = em_xmit(txr, &next)) != 0) {
 			if (next == NULL)
 				drbr_advance(ifp, txr->br);
-			else
+			else {
 				drbr_putback(ifp, txr->br, next);
+				err = 0;
+			}
 			break;
 		}
 		drbr_advance(ifp, txr->br);
Index: e1000/if_igb.c
===================================================================
--- e1000/if_igb.c (revision 259039)
+++ e1000/if_igb.c (working copy)
@@ -1024,6 +1024,7 @@ igb_mq_start_locked(struct ifnet *ifp, struct tx_r
 				 * may have changed it.
 				 */
 				drbr_putback(ifp, txr->br, next);
+				err = 0;
 			}
 			break;
 		}
Index: ixgbe/ixgbe.c
===================================================================
--- ixgbe/ixgbe.c (revision 259039)
+++ ixgbe/ixgbe.c (working copy)
@@ -864,6 +864,7 @@ ixgbe_mq_start_locked(struct ifnet *ifp, struct tx
 				drbr_advance(ifp, txr->br);
 			} else {
 				drbr_putback(ifp, txr->br, next);
+				err = 0;
 			}
 #endif
 			break;
Index: ixgbe/ixv.c
===================================================================
--- ixgbe/ixv.c (revision 259039)
+++ ixgbe/ixv.c (working copy)
@@ -629,6 +629,7 @@ ixv_mq_start_locked(struct ifnet *ifp, struct tx_r
 				drbr_advance(ifp, txr->br);
 			} else {
 				drbr_putback(ifp, txr->br, next);
+				err = 0;
 			}
 			break;
 		}
Index: virtio/network/if_vtnet.c
===================================================================
--- virtio/network/if_vtnet.c (revision 259039)
+++ virtio/network/if_vtnet.c (working copy)
@@ -2242,9 +2242,10 @@ vtnet_txq_mq_start_locked(struct vtnet_txq *txq, s
 	while ((m = drbr_peek(ifp, br)) != NULL) {
 		error = vtnet_txq_encap(txq, &m);
 		if (error) {
-			if (m != NULL)
+			if (m != NULL) {
 				drbr_putback(ifp, br, m);
-			else
+				error = 0;
+			} else
 				drbr_advance(ifp, br);
 			break;
 		}
>
> --
> John-Mark Gurney Voice: +1 415 225 5579
>
> "All that I will do, has been done, All that I have, has not."
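
The return-value rule behind the patch above can be illustrated with a small userland sketch. This is not driver code: the toy_* names, the fixed-size ring, the hw_room counter and ERR_NOBUFS are all made up for illustration, and the case where the real xmit routine consumes the mbuf (next == NULL) is left out. It only shows the rule under discussion: report an error when the caller's own packet could not be queued, and return 0 when a frame at the head merely stays queued after a failed send attempt.

```c
#include <stddef.h>

#define RING_SIZE	8
#define ERR_NOBUFS	55	/* stand-in for ENOBUFS */

/* Toy FIFO standing in for the driver's buf_ring; not the real drbr API. */
struct toy_ring {
	int	pkts[RING_SIZE];
	size_t	head;
	size_t	count;
};

static int
toy_enqueue(struct toy_ring *r, int pkt)
{
	if (r->count == RING_SIZE)
		return (ERR_NOBUFS);	/* this packet really was dropped */
	r->pkts[(r->head + r->count) % RING_SIZE] = pkt;
	r->count++;
	return (0);
}

/* Hypothetical hardware: accepts at most *hw_room more frames. */
static int
toy_xmit(int *hw_room)
{
	if (*hw_room == 0)
		return (ERR_NOBUFS);
	(*hw_room)--;
	return (0);
}

/*
 * Drain the ring from the head. If the head frame cannot be sent it
 * stays queued (the "putback" case), so no error is reported.
 */
static int
toy_start_locked(struct toy_ring *r, int *hw_room)
{
	int err = 0;

	while (r->count > 0) {
		if ((err = toy_xmit(hw_room)) != 0) {
			/* Frame is still at the head of the ring: not lost. */
			err = 0;
			break;
		}
		r->head = (r->head + 1) % RING_SIZE;
		r->count--;
	}
	return (err);
}

/* Enqueue at the tail, then try to make progress from the head. */
static int
toy_transmit(struct toy_ring *r, int *hw_room, int pkt)
{
	int err;

	if ((err = toy_enqueue(r, pkt)) != 0)
		return (err);
	return (toy_start_locked(r, hw_room));
}
```

With hw_room at 0, toy_transmit() returns 0 and leaves the packet in the ring; only a full ring yields the ENOBUFS stand-in, which is the one case where the caller's packet was actually dropped.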
> From owner-freebsd-net@FreeBSD.ORG Fri Dec 6 21:10:51 2013 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id D5DB730A; Fri, 6 Dec 2013 21:10:51 +0000 (UTC) Received: from mail-n.franken.de (drew.ipv6.franken.de [IPv6:2001:638:a02:a001:20e:cff:fe4a:feaa]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 6410F10FC; Fri, 6 Dec 2013 21:10:51 +0000 (UTC) Received: from [192.168.1.200] (p508F3521.dip0.t-ipconnect.de [80.143.53.33]) (Authenticated sender: macmic) by mail-n.franken.de (Postfix) with ESMTP id 5E9881C0C0695; Fri, 6 Dec 2013 22:10:49 +0100 (CET) Content-Type: text/plain; charset=us-ascii Mime-Version: 1.0 (Mac OS X Mail 6.6 \(1510\)) Subject: Re: A small fix for if_em.c, if_igb.c, if_ixgbe.c From: Michael Tuexen In-Reply-To: <20131206202012.GG55638@funkthat.com> Date: Fri, 6 Dec 2013 22:10:50 +0100 Content-Transfer-Encoding: 7bit Message-Id: <609C63CD-9332-4EAE-AACE-5B911416DF80@lurchi.franken.de> References: <521B9C2A-EECC-4412-9F68-2235320EF324@lurchi.franken.de> <20131202022338.GA3500@michelle.cdnetworks.com> <20131203021658.GC2981@michelle.cdnetworks.com> <20131205223711.GB55638@funkthat.com> <3576B69E-E943-46E0-83E5-0B2194A44ED0@lurchi.franken.de> <20131206202012.GG55638@funkthat.com> To: John-Mark Gurney X-Mailer: Apple Mail (2.1510) Cc: Yong-Hyeon Pyun , Jack F Vogel , Adrian Chadd , "freebsd-net@freebsd.org list" X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 06 Dec 2013 21:10:52 -0000 On Dec 6, 2013, at 9:20 PM, John-Mark Gurney wrote: > Michael Tuexen wrote this message on 
Fri, Dec 06, 2013 at 21:17 +0100:
>> On Dec 5, 2013, at 11:37 PM, John-Mark Gurney wrote:
>>
>>> Adrian Chadd wrote this message on Thu, Dec 05, 2013 at 14:01 -0800:
>>>> On 5 December 2013 13:05, Michael Tuexen
>>>> wrote:
>>>>
>>>>> Just to be clear: This would mean that xxx_transmit() would return
>>>>> an error even if the packet provided in the call xxx_transmit() is
>>>>> enqueued and not dropped?
>>>>> This would also be a problem with the current SCTP stack.
>>>>
>>>> I think it'll return an error only if:
>>>>
>>>> * it queued the frame to the tail of the drbr;
>>>> * it then tried to transmit a frame from the head of the drbr;
>>>> * it failed to transmit the first frame in the drbr and it couldn't
>>>>   put it back into the queue for whatever reason.
>>>>
>>>> So I think it should be "ok enough" for both TCP and SCTP.
>>>
>>> IMO it should only return an error if the specific frame failed to be
>>> sent or queued. If you cannot determine at return time if the frame
>>> failed to be transmitted/queued, then it should return success.
>> Yes, this is exactly what I think too. This is what my first patch
>> realizes.
>>>
>>> In the above case, if there were other frames queued ahead, and the
>>> first one failed, then it sounds like the frame may eventually be sent
>>> and we will end up sending a duplicate frame.
>> Correct. SCTP will consider the frame even unsent... So the SCTP stack
>> behaves strangely and sends a packet at wirespeed over and over again (which
>> is not good...).
>
> Sounds like a bug in SCTP, if it gets an error like that, it needs to back
> off a bit.. Though when to wake up, etc, is harder to decide...
Well, this is what happens:
The sender takes a packet from the send-queue, calls ip-output. Since
it returns an error, we don't move it to the sent-queue, but leave
it in the send queue (assuming it didn't go on the wire).
However, the driver puts it on the wire, it makes it to the peer,
the peer sends SACK, and we receive the SACK.
Since the packet is not on the sent queue, we don't realize that it is acked. Receiving a SACK is a trigger for sending a packet. So we take the next one from the send-queue (the one from the beginning), and send it again. So it is a wire speed ping pong... So in case the lower layer tells us that there was a problem in sending the packet, we * don't consider it sent * wait for the next normal protocol trigger for send another packet. This sounds OK to me... That is why I need to know what an error from ip_output() means. If I can't conclude that the provided packet was dropped, I can just consider it sent and don't try to do any optimisation. Best regards Michael > > -- > John-Mark Gurney Voice: +1 415 225 5579 > > "All that I will do, has been done, All that I have, has not." > From owner-freebsd-net@FreeBSD.ORG Fri Dec 6 22:54:42 2013 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 0024BA3D for ; Fri, 6 Dec 2013 22:54:41 +0000 (UTC) Received: from mail-yh0-x230.google.com (mail-yh0-x230.google.com [IPv6:2607:f8b0:4002:c01::230]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id B5CBB192C for ; Fri, 6 Dec 2013 22:54:41 +0000 (UTC) Received: by mail-yh0-f48.google.com with SMTP id f73so1008990yha.21 for ; Fri, 06 Dec 2013 14:54:41 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=message-id:date:from:user-agent:mime-version:to:cc:subject :content-type:content-transfer-encoding; bh=xjntvCHfUWHfvlMVypviH0HxyFQaECAq6HOHVseK69s=; b=PS0CtaqHwcoA763gFgdQnSb1JFTMMgUwGMTsDd3rk8ReaoQBr54xbaataZmR1Z8xld TKvoYzkstPsg+gVId2/hMtRSGYOKrGOmMaJ066jLJGY9H+qsdrfvS2nfx0vERrCAGh9k qbb/03fBXsXGm1f1G7PDl5YW7vOFQPtSj4jCbDO5FUhUao7gVLVaZ01o2xMd5XtITaEc 
 nI72NTpgsM5UQUzTcBq8/zF7jkS1s93jXlcj+OKzRd/30Ht+k7lkVyoLnrzWhI39ejfC
 3JrHw3Xjmil8Pi+WPfFW/myg++BBCdmKfzgAmn7hKiXpbpZ5Req2MH28EYs/b1Nk2Bif
 iEJQ==
X-Received: by 10.236.174.37 with SMTP id w25mr4623808yhl.36.1386370480977;
 Fri, 06 Dec 2013 14:54:40 -0800 (PST)
Received: from [10.10.1.35] ([192.252.130.194])
 by mx.google.com with ESMTPSA id b30sm106230yhm.5.2013.12.06.14.54.40
 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128);
 Fri, 06 Dec 2013 14:54:40 -0800 (PST)
Message-ID: <52A255AB.8040905@gmail.com>
Date: Fri, 06 Dec 2013 17:54:35 -0500
From: Karim Fodil-Lemelin
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:24.0) Gecko/20100101
 Thunderbird/24.1.1
MIME-Version: 1.0
To: freebsd-net@FreeBSD.org
Subject: Avoiding an infinite loop in e1000 82575
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: Jack Vogel
X-BeenThere: freebsd-net@freebsd.org
X-Mailman-Version: 2.1.17
Precedence: list
List-Id: Networking and TCP/IP with FreeBSD
List-Unsubscribe:
List-Archive:
List-Post:
List-Help:
List-Subscribe:
X-List-Received-Date: Fri, 06 Dec 2013 22:54:42 -0000

Hi,

I have encountered a strange issue where the igb driver goes into an infinite
loop (I'm using version - 2.3.10) if many incantations of ifconfig are
running in a while loop very fast. The following patch fixed it for me:

@@ -1052,12 +1052,11 @@ static void e1000_release_swfw_sync_82575(struct e1000_hw *hw, u16 mask)
 {
 	u32 swfw_sync;

 	DEBUGFUNC("e1000_release_swfw_sync_82575");

-	while (e1000_get_hw_semaphore_generic(hw) != E1000_SUCCESS)
-		; /* Empty */
+	e1000_get_hw_semaphore_generic(hw);

 	swfw_sync = E1000_READ_REG(hw, E1000_SW_FW_SYNC);
 	swfw_sync &= ~mask;
 	E1000_WRITE_REG(hw, E1000_SW_FW_SYNC, swfw_sync);

Now, I haven't seen any side effect of this change except that it fixed my
issue although I wonder what they are and what effect will this change have
on the system?

Thanks,

Karim.
PS: Some more information on the devices: dmesg: igb0: port 0xc880-0xc89f mem 0xfba80000-0xfbafffff,0xfbb78000-0xfbb7bfff irq 16 at device 0.0 on pci4 igb0: Using MSIX interrupts with 2 vectors igb0: Ethernet address: 00:90:0b:2f:b8:00 igb0: [ITHREAD] igb0: [ITHREAD] igb1: port 0xcc00-0xcc1f mem 0xfbb80000-0xfbbfffff,0xfbb7c000-0xfbb7ffff irq 17 at device 0.1 on pci4 igb1: Using MSIX interrupts with 2 vectors igb1: Ethernet address: 00:90:0b:2f:b8:01 igb1: [ITHREAD] igb1: [ITHREAD] igb2: port 0xd880-0xd89f mem 0xfbc80000-0xfbcfffff,0xfbd78000-0xfbd7bfff irq 16 at device 0.0 on pci5 igb2: Using MSIX interrupts with 2 vectors igb2: Ethernet address: 00:90:0b:2f:b8:02 igb2: [ITHREAD] igb2: [ITHREAD] igb3: port 0xdc00-0xdc1f mem 0xfbd80000-0xfbdfffff,0xfbd7c000-0xfbd7ffff irq 17 at device 0.1 on pci5 igb3: Using MSIX interrupts with 2 vectors igb3: Ethernet address: 00:90:0b:2f:b8:03 igb3: [ITHREAD] igb3: [ITHREAD] pciconf igb0@pci0:4:0:0: class=0x020000 card=0x00008086 chip=0x150e8086 rev=0x01 hdr=0x00 igb1@pci0:4:0:1: class=0x020000 card=0x00008086 chip=0x150e8086 rev=0x01 hdr=0x00 igb2@pci0:5:0:0: class=0x020000 card=0x00008086 chip=0x150e8086 rev=0x01 hdr=0x00 igb3@pci0:5:0:1: class=0x020000 card=0x00008086 chip=0x150e8086 rev=0x01 hdr=0x00 From owner-freebsd-net@FreeBSD.ORG Fri Dec 6 23:25:05 2013 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id E3BE9BB9; Fri, 6 Dec 2013 23:25:05 +0000 (UTC) Received: from mail-qc0-x235.google.com (mail-qc0-x235.google.com [IPv6:2607:f8b0:400d:c01::235]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 892E11CB8; Fri, 6 Dec 2013 23:25:05 +0000 (UTC) Received: by mail-qc0-f181.google.com with SMTP id e9so994543qcy.26 for ; Fri, 06 Dec 2013 15:25:04 
-0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:date:message-id:subject :from:to:cc:content-type; bh=DY2p0OeQFtAmRu+8iIRt/bYIZT6fpNHMpFhsaXt/uUc=; b=qWk0cLZBCEW0B1R32tyTNUpVChOJNnJ6EccV4cH/tY4QCsE9ZTSNsQkGOLeYJlxAGb R7Lr18F+oEt1VX74E2W05IymVU0cuWlMRfwhwIyOTilXsvvT3SNGwY0vzLDgQqHSDPUD L9iYhi23//2/BjVvj74HohfERHSUxYS8ZAyO65RE9BS9+X/B+aS7tDJJZBe300dJJVeB KYnXsIb0tcdplFU0JFzTMj2ZFs6ucjnN10u8XG1Zc7Ak8Rk1FtaHdUT8MC0I96vxOsT4 Qhv/Bo6h+V5b5kp4adBydJepZxC0603mUSwf9T93hNm17C5g1poS/oXkZpPxj6qrepGY /eYA== MIME-Version: 1.0 X-Received: by 10.224.89.73 with SMTP id d9mr11480031qam.5.1386372304778; Fri, 06 Dec 2013 15:25:04 -0800 (PST) Sender: adrian.chadd@gmail.com Received: by 10.224.53.200 with HTTP; Fri, 6 Dec 2013 15:25:04 -0800 (PST) In-Reply-To: <609C63CD-9332-4EAE-AACE-5B911416DF80@lurchi.franken.de> References: <521B9C2A-EECC-4412-9F68-2235320EF324@lurchi.franken.de> <20131202022338.GA3500@michelle.cdnetworks.com> <20131203021658.GC2981@michelle.cdnetworks.com> <20131205223711.GB55638@funkthat.com> <3576B69E-E943-46E0-83E5-0B2194A44ED0@lurchi.franken.de> <20131206202012.GG55638@funkthat.com> <609C63CD-9332-4EAE-AACE-5B911416DF80@lurchi.franken.de> Date: Fri, 6 Dec 2013 15:25:04 -0800 X-Google-Sender-Auth: zRzZdbNmCEGXZcUFoZOuhoLJDug Message-ID: Subject: Re: A small fix for if_em.c, if_igb.c, if_ixgbe.c From: Adrian Chadd To: Michael Tuexen Content-Type: text/plain; charset=ISO-8859-1 Cc: Yong-Hyeon Pyun , Jack F Vogel , John-Mark Gurney , "freebsd-net@freebsd.org list" X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 06 Dec 2013 23:25:06 -0000 On 6 December 2013 13:10, Michael Tuexen wrote: > Well, this is what happens: > The sender takes a packet from the send-queue, calls ip-output. 
Since
> it returns an error, we don't move it to the sent-queue, but leave
> it in the send queue (assuming it didn't go on the wire).
> However, the driver puts it on the wire, it makes it to the peer,
> the peer sends SACK, and we receive the SACK. Since the packet is
> not on the sent queue, we don't realize that it is acked. Receiving
> a SACK is a trigger for sending a packet. So we take the next one
> from the send-queue (the one from the beginning), and send it again.
> So it is a wire speed ping pong...
> So in case the lower layer tells us that there was a problem in
> sending the packet, we
> * don't consider it sent
> * wait for the next normal protocol trigger for sending another packet.
> This sounds OK to me...
>
> That is why I need to know what an error from ip_output() means.
> If I can't conclude that the provided packet was dropped, I can just
> consider it sent and don't try to do any optimisation.

We're heading down the right path. I'm increasingly believing that
ignoring the return value is the correct thing to do.
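
The ping-pong described in the quoted text can be reproduced with a toy model. Everything here is hypothetical (toy_ip_output(), toy_sender() and the misreport flag are not part of any real stack); the sketch assumes the lower layer always puts the frame on the wire and that every SACK triggers exactly one further send attempt.

```c
#include <stdbool.h>

#define ERR_NOBUFS	55	/* stand-in for ENOBUFS */

/*
 * Hypothetical lower layer: the frame always reaches the wire, but the
 * return value may wrongly claim failure.
 */
static int
toy_ip_output(int *frames_on_wire, bool misreport)
{
	(*frames_on_wire)++;
	return (misreport ? ERR_NOBUFS : 0);
}

/*
 * One packet sits in the send queue. The sender makes an initial send
 * attempt and then one more attempt per SACK, for at most max_sacks
 * SACKs, as long as it still believes the packet is unsent. Returns
 * how many copies of the packet reached the wire.
 */
static int
toy_sender(bool misreport, int max_sacks)
{
	int frames_on_wire = 0;
	bool in_send_queue = true;	/* not yet considered sent */

	for (int sack = 0; sack <= max_sacks && in_send_queue; sack++) {
		if (toy_ip_output(&frames_on_wire, misreport) == 0)
			in_send_queue = false;	/* moved to the sent queue */
		/*
		 * On error the packet is assumed dropped and stays in the
		 * send queue; the peer still receives it, and its SACK
		 * starts the next round.
		 */
	}
	return (frames_on_wire);
}
```

With an honest return value, toy_sender(false, n) puts one copy on the wire; with a misreported error, toy_sender(true, n) puts n + 1 copies on the wire, one per SACK round trip.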
-adrian From owner-freebsd-net@FreeBSD.ORG Fri Dec 6 23:26:28 2013 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 2BF31CC1 for ; Fri, 6 Dec 2013 23:26:28 +0000 (UTC) Received: from mail-qa0-x22c.google.com (mail-qa0-x22c.google.com [IPv6:2607:f8b0:400d:c00::22c]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id E022E1CDB for ; Fri, 6 Dec 2013 23:26:27 +0000 (UTC) Received: by mail-qa0-f44.google.com with SMTP id i13so1041941qae.17 for ; Fri, 06 Dec 2013 15:26:27 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:date:message-id:subject :from:to:cc:content-type; bh=+hx6Q8PAcqLWlg81Rd0NBizlRy1v2VkAwDD6V6lpLak=; b=aZ4cECot0lL1Z1FzzewUunkFXxE+7siwmYk9xaiIbwBLKmSfJl8wwW+OG/cZoYKTby MkPYXugH5ra9JDAKnqzW6xbRicOAYEp9UQO+YIuiBz2sF4peSHx4P8XnppDliq39oyTm Y8lJc67LXJ1ubc6BqUQY6gZH7Kz77BFErO6fHMbequMe8f7TRikIBl5eBix49bFiaTJ5 3mCuu4vFHg7jQgBU0nWw2Vnys9wKN+1nlrAAB1E6VrPTabDP9eQgeZDuTyg9CX9t3L9f 9EVWZjmxXq7MOqQHxSTp9dqzqZbj2VdjiU775GB4XJ5WmfDL20QsZufgYa9VMYnT3Lip 6Dgw== MIME-Version: 1.0 X-Received: by 10.229.56.200 with SMTP id z8mr10765856qcg.1.1386372387046; Fri, 06 Dec 2013 15:26:27 -0800 (PST) Sender: adrian.chadd@gmail.com Received: by 10.224.53.200 with HTTP; Fri, 6 Dec 2013 15:26:27 -0800 (PST) In-Reply-To: <52A255AB.8040905@gmail.com> References: <52A255AB.8040905@gmail.com> Date: Fri, 6 Dec 2013 15:26:27 -0800 X-Google-Sender-Auth: aUEcvx-obMbMaL5hhxIhlI-mkgs Message-ID: Subject: Re: Avoiding an infinite loop in e1000 82575 From: Adrian Chadd To: Karim Fodil-Lemelin Content-Type: text/plain; charset=ISO-8859-1 Cc: FreeBSD Net , Jack Vogel X-BeenThere: freebsd-net@freebsd.org 
X-Mailman-Version: 2.1.17
Precedence: list
List-Id: Networking and TCP/IP with FreeBSD
List-Unsubscribe:
List-Archive:
List-Post:
List-Help:
List-Subscribe:
X-List-Received-Date: Fri, 06 Dec 2013 23:26:28 -0000

Heh, your solution isn't correct. There's a higher level race
condition somewhere.. :(

-a

On 6 December 2013 14:54, Karim Fodil-Lemelin wrote:
> Hi,
>
> I have encountered a strange issue where the igb driver goes into an infinite
> loop (I'm using version - 2.3.10) if many incantations of ifconfig are
> running in a while loop very fast. The following patch fixed it for me:
>
> @@ -1052,12 +1052,11 @@ static void e1000_release_swfw_sync_82575(struct
> e1000_hw *hw, u16 mask)
>  {
>  	u32 swfw_sync;
>
>  	DEBUGFUNC("e1000_release_swfw_sync_82575");
>
> -	while (e1000_get_hw_semaphore_generic(hw) != E1000_SUCCESS)
> -		; /* Empty */
> +	e1000_get_hw_semaphore_generic(hw);
>
>  	swfw_sync = E1000_READ_REG(hw, E1000_SW_FW_SYNC);
>  	swfw_sync &= ~mask;
>  	E1000_WRITE_REG(hw, E1000_SW_FW_SYNC, swfw_sync);
>
> Now, I haven't seen any side effect of this change except that it fixed my
> issue although I wonder what they are and what effect will this change have
> on the system?
>
> Thanks,
>
> Karim.
> > PS: Some more information on the devices: > > dmesg: > > igb0: port > 0xc880-0xc89f mem 0xfba80000-0xfbafffff,0xfbb78000-0xfbb7bfff irq 16 at > device 0.0 on pci4 > igb0: Using MSIX interrupts with 2 vectors > igb0: Ethernet address: 00:90:0b:2f:b8:00 > igb0: [ITHREAD] > igb0: [ITHREAD] > igb1: port > 0xcc00-0xcc1f mem 0xfbb80000-0xfbbfffff,0xfbb7c000-0xfbb7ffff irq 17 at > device 0.1 on pci4 > igb1: Using MSIX interrupts with 2 vectors > igb1: Ethernet address: 00:90:0b:2f:b8:01 > igb1: [ITHREAD] > igb1: [ITHREAD] > igb2: port > 0xd880-0xd89f mem 0xfbc80000-0xfbcfffff,0xfbd78000-0xfbd7bfff irq 16 at > device 0.0 on pci5 > igb2: Using MSIX interrupts with 2 vectors > igb2: Ethernet address: 00:90:0b:2f:b8:02 > igb2: [ITHREAD] > igb2: [ITHREAD] > igb3: port > 0xdc00-0xdc1f mem 0xfbd80000-0xfbdfffff,0xfbd7c000-0xfbd7ffff irq 17 at > device 0.1 on pci5 > igb3: Using MSIX interrupts with 2 vectors > igb3: Ethernet address: 00:90:0b:2f:b8:03 > igb3: [ITHREAD] > igb3: [ITHREAD] > > pciconf > > igb0@pci0:4:0:0: class=0x020000 card=0x00008086 chip=0x150e8086 > rev=0x01 hdr=0x00 > igb1@pci0:4:0:1: class=0x020000 card=0x00008086 chip=0x150e8086 > rev=0x01 hdr=0x00 > igb2@pci0:5:0:0: class=0x020000 card=0x00008086 chip=0x150e8086 > rev=0x01 hdr=0x00 > igb3@pci0:5:0:1: class=0x020000 card=0x00008086 chip=0x150e8086 > rev=0x01 hdr=0x00 > > _______________________________________________ > freebsd-net@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-net > To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org" From owner-freebsd-net@FreeBSD.ORG Sat Dec 7 23:16:31 2013 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 9342F3B7 for ; Sat, 7 Dec 2013 23:16:31 +0000 (UTC) Received: from mail-qe0-x231.google.com 
(mail-qe0-x231.google.com [IPv6:2607:f8b0:400d:c02::231]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 4D50A10F5 for ; Sat, 7 Dec 2013 23:16:31 +0000 (UTC) Received: by mail-qe0-f49.google.com with SMTP id w7so1667508qeb.8 for ; Sat, 07 Dec 2013 15:16:30 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=eitanadler.com; s=0xdeadbeef; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc:content-type; bh=NBeLw8RIAncRFcU2iYwYlfWLIemjg6JQfKq2RO2ft7I=; b=qO9ZaYLgrVV26a+N2T1hgssjQKV4qO9aHSXO4kn1H8w3V1wzFCNUZBsvefmoanxyIh 4nXywBVZU5TySi/G/j2B+rbxEqMoS6ghusBeX8EIr3CXuokP6+uzBQekWfdrZlvnfslH uwAIEGSwjJCuWQLUn1I9e+ZuWHsh09L5D9vrs= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to:cc:content-type; bh=NBeLw8RIAncRFcU2iYwYlfWLIemjg6JQfKq2RO2ft7I=; b=aMdQktbor6lBx8zda/9i2GO6zapRu6hGgeTAPlz6Xow6LJnv8sdujkLWJejV8h55BH emaWKmPtGQD0s4MkSK74McDFSZWkf7/KVZMcTZye1qTuwdu4PORXLEP43n78c4n3yHKz TjQq296jTWtQR6+k2OTpiCJ+NCthGTi+fEeFHZcufhjQ6eLCWamgEtswCu/l009lWmCz 9VbnL00FbKmKm14l3wxW1KRtHV3RAjdpsKP5D4LAK350LmV4vTOUemm+QvQ8m3IKhShi IfX6Jwx8ib4ickDH2xl0nKM4ahD1hap5bVGfTj4pW58jcaH0lGE42/MaT1MkLKVFgXn3 E6BQ== X-Gm-Message-State: ALoCoQmwoqbQg6BubdUbsIhCAFXA1DskhK5ORvwu5SuJWTiX1spMP7rorPDrurvO8rnSkBoWD3Ja X-Received: by 10.49.24.211 with SMTP id w19mr20056909qef.9.1386458190505; Sat, 07 Dec 2013 15:16:30 -0800 (PST) MIME-Version: 1.0 Received: by 10.96.86.42 with HTTP; Sat, 7 Dec 2013 15:16:00 -0800 (PST) In-Reply-To: <523457A1.3090606@debian.org> References: <523457A1.3090606@debian.org> From: Eitan Adler Date: Sat, 7 Dec 2013 18:16:00 -0500 Message-ID: Subject: Re: IPSEC To: Robert Millan Content-Type: text/plain; charset=UTF-8 Cc: "freebsd-net@freebsd.org" , "debian-bsd@lists.debian.org" X-BeenThere: freebsd-net@freebsd.org 
X-Mailman-Version: 2.1.17 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 07 Dec 2013 23:16:31 -0000

Hi all,

I understand this is an old thread but I do not see an answer here.
Can anyone answer the question below?

On Sat, Sep 14, 2013 at 8:33 AM, Robert Millan wrote:
>
> Hi!
>
> Is there any particular reason (performance, stability concerns...)
> IPSEC support is not enabled in GENERIC?
>
> In Debian GNU/kFreeBSD we're considering enabling it in our default
> builds, due to increased user demand and as it is already enabled for
> our Linux-based flavours.
>
> However we're concerned about diverging from FreeBSD as there might be
> unforeseen consequences. Is there any specific concern on your side?
>
> If not, perhaps it could be considered for HEAD after 10.0 release?

--
Eitan Adler