From owner-freebsd-net@FreeBSD.ORG Sun Aug 25 11:38:49 2013
Message-ID: <5219ECBD.4040209@gmail.com>
Date: Sun, 25 Aug 2013 11:38:37 +0000
From: carlopmart
To: freebsd-net@freebsd.org
Subject: Options to monitor/sniff network traffic under a vm

Hi all,

I need to monitor/sniff network traffic for three subnets (1 Gbit/s nets)
and I need to do this using a virtual guest under an ESXi 5 host (yes, it
is a "handicap"). I would like to use FreeBSD 8.4 + netmap, but I see some
problems:

a) How can I avoid sharing interrupts between NIC interfaces? This VM
needs to use 6 NIC interfaces.

b) Which is better: the em or the ixgb emulated driver?

c) Is it a good idea to enable polling on these NICs?

Thanks.

From owner-freebsd-net@FreeBSD.ORG Sun Aug 25 14:42:44 2013
In-Reply-To: <5218E8B6.5090407@freebsd.org>
References: <520A6D07.5080106@freebsd.org> <520B74DD.1060102@ipfw.ru>
 <20130814124024.GA64548@onelab2.iet.unipi.it> <201308141740.28779.zec@fer.hr>
 <20130814154853.GA66341@onelab2.iet.unipi.it> <521204A9.7080607@ipfw.ru>
 <52152837.9010101@freebsd.org> <5218ABB4.5070601@ipfw.ru>
 <5218E8B6.5090407@freebsd.org>
Date: Sun, 25 Aug 2013 07:42:42 -0700
Subject: Re: route/arp lifetime (Re: it's the output, not ack coalescing (Re: TSO and FreeBSD vs Linux))
From: Adrian Chadd
To: Andre Oppermann
Cc: FreeBSD Net, "Alexander V. Chernikov"

On 24 August 2013 10:09, Andre Oppermann wrote:

> On 24.08.2013 19:04, Adrian Chadd wrote:
>
>> I'm very close to starting an mbuf batching thing to use in a few places
>> like receive, transmit and transmit completion -> free path. I'd be
>> interested in your review/feedback and testing as it sounds like
>> something you can easily stress test there. :)
>>
>
> I'd strongly recommend fixing a number of other places and collect
> lower hanging fruit before starting with mbuf batching.

I'm open to suggestions. Scott killed our high hanging fruit (VM / buffer
page lifecycle) and what's left is not very low. If you have any
recommendations, I'd love to hear them.

-adrian

From owner-freebsd-net@FreeBSD.ORG Sun Aug 25 20:04:15 2013
Date: Sun, 25 Aug 2013 16:04:07 -0400
From: Mark Johnston
To: Yuri
Cc: freebsd-net@freebsd.org, freebsd-dtrace@freebsd.org
Subject: Re: DTrace network providers
Message-ID: <20130825200407.GA76615@raichu>
References: <20130821045926.GA17196@raichu> <52168B0E.1050308@rawbw.com>
 <5216FC0B.1060304@rawbw.com>
In-Reply-To: <5216FC0B.1060304@rawbw.com>

On Thu, Aug 22, 2013 at 11:07:07PM -0700, Yuri wrote:
> On 08/22/2013 15:30, Mark Johnston wrote:
> > My apologies! It looks like r254523 introduced a conflict. r254468 is a
> > minimum dependency.
> >
> > I'd suggest trying again with r254523 or later.
>
> I don't know what the problem is, but I tried on r254523 and on the
> later r254677, and am getting build error during buildkernel, see below.
> And I did successful buildworld and installworld before this.
>

I suspect that you didn't delete the files added by the patch before
applying the patch a second time. That'll result in two concatenated
copies of the new files, which will cause the build to fail at
in_kdtrace.c. I just made the same mistake myself. :)

>
> --- buildkernel errors ---
> /usr/src/sys/netinet/in_kdtrace.c:165:1: error: redefinition of
> 'sdt_provider_ip'
> SDT_PROVIDER_DEFINE(ip);
> ^
> /usr/src/sys/sys/sdt.h:136:22: note: expanded from macro
> 'SDT_PROVIDER_DEFINE'
> struct sdt_provider sdt_provider_##prov[1] =
> { \
> ^
> :127:1: note: expanded from here
> sdt_provider_ip
> ^
> /usr/src/sys/netinet/in_kdtrace.c:38:1: note: previous definition is here
> SDT_PROVIDER_DEFINE(ip);
> ^
> /usr/src/sys/sys/sdt.h:136:22: note: expanded from macro
> 'SDT_PROVIDER_DEFINE'
> struct sdt_provider sdt_provider_##prov[1] =
> { \
> ^
> :203:1: note: expanded from here
> sdt_provider_ip
> ^
> /usr/src/sys/netinet/in_kdtrace.c:166:1: error: redefinition of
> 'sdt_provider_tcp'
> SDT_PROVIDER_DEFINE(tcp);
> ^
> /usr/src/sys/sys/sdt.h:136:22: note: expanded from macro
> 'SDT_PROVIDER_DEFINE'
> struct sdt_provider sdt_provider_##prov[1] =
> { \
> ^
>

From owner-freebsd-net@FreeBSD.ORG Mon Aug 26 11:06:48 2013
Date: Mon, 26 Aug 2013 11:06:47 GMT
Message-Id: <201308261106.r7QB6lAW066010@freefall.freebsd.org>
From: FreeBSD bugmaster
To: freebsd-net@FreeBSD.org
Subject: Current problem reports assigned to freebsd-net@FreeBSD.org

Note: to view an individual PR, use:
  http://www.freebsd.org/cgi/query-pr.cgi?pr=(number).

The following is a listing of current problems submitted by FreeBSD users.
These represent problem reports covering all versions including
experimental development code and obsolete releases.

S Tracker      Resp.      Description
--------------------------------------------------------------------------------
o kern/181257  net        [bge] bge link status change
o kern/181236  net        [igb] igb driver unstable work
o kern/181225  net        [infiniband] [patch] unloading ipoib crashes the kerne
o kern/181135  net        [netmap] [patch] sys/dev/netmap patch for Linux compat
o kern/181131  net        [netmap] [patch] sys/dev/netmap memory allocation impr
o kern/181006  net        [run] [patch] mbuf leak in run(4) driver
o kern/180893  net        [if_ethersubr] [patch] Packets received with own LLADD
o kern/180844  net        [panic] [re] Intermittent panic (re driver?)
o kern/180775  net        [bxe] if_bxe driver broken with Broadcom BCM57711 card
o kern/180722  net        [bluetooth] bluetooth takes 30-50 attempts to pair to
s kern/180468  net        [request] LOCAL_PEERCRED support for PF_INET
o kern/180065  net        [netinet6] [patch] Multicast loopback to own host brok
o kern/179926  net        [lacp] [patch] active aggregator selection bug
o kern/179824  net        [ixgbe] System (9.1-p4) hangs on heavy ixgbe network t
o kern/179733  net        [lagg] [patch] interface loses capabilities when proto
o kern/179429  net        [tap] STP enabled tap bridge
o kern/179299  net        [igb] Intel X540-T2 - unstable driver
a kern/179264  net        [vimage] [pf] Core dump with Packet filter and VIMAGE
o kern/178947  net        [arp] arp rejecting not working
o kern/178782  net        [ixgbe] 82599EB SFP does not work with passthrough und
o kern/178612  net        [run] kernel panic due the problems with run driver
o kern/178472  net        [ip6] [patch] make return code consistent with IPv4 co
o kern/178079  net        [tcp] Switching TCP CC algorithm panics on sparc64 wit
s kern/178071  net        FreeBSD unable to recongize Kontron (Industrial Comput
o kern/177905  net        [xl] [panic] ifmedia_set when pluging CardBus LAN card
o kern/177618  net        [bridge] Problem with bridge firewall with trunk ports
o kern/177417  net        [ip6] Invalid protocol value in ipsec6_common_input_cb
o kern/177402  net        [igb] [pf] problem with ethernet driver igb + pf / alt
o kern/177400  net        [jme] JMC25x 1000baseT establishment issues
o kern/177366  net        [ieee80211] negative malloc(9) statistics for 80211nod
f kern/177362  net        [netinet] [patch] Wrong control used to return TOS
o kern/177194  net        [netgraph] Unnamed netgraph nodes for vlan interfaces
o kern/177139  net        [igb] igb drops ethernet ports 2 and 3
o kern/176884  net        [re] re0 flapping up/down
o kern/176671  net        [epair] MAC address for epair device not unique
o kern/176484  net        [ipsec] [enc] [patch] panic: IPsec + enc(4); device na
o kern/176446  net        [netinet] [patch] Concurrency in ixgbe driving out-of-
o kern/176420  net        [kernel] [patch] incorrect errno for LOCAL_PEERCRED
o kern/176419  net        [kernel] [patch] socketpair support for LOCAL_PEERCRED
o kern/176401  net        [netgraph] page fault in netgraph
o kern/176167  net        [ipsec][lagg] using lagg and ipsec causes immediate pa
o kern/176097  net        [lagg] [patch] lagg/lacp broken when aggregated interf
o kern/176027  net        [em] [patch] flow control systcl consistency for em dr
o kern/176026  net        [tcp] [patch] TCP wrappers caused quite a lot of warni
o bin/175974   net        ppp(8): logic issue
o kern/175864  net        [re] Intel MB D510MO, onboard ethernet not working aft
o kern/175852  net        [amd64] [patch] in_cksum_hdr() behaves differently on
o kern/175734  net        no ethernet detected on system with EG20T PCH chipset
o kern/175267  net        [pf] [tap] pf + tap keep state problem
o kern/175236  net        [epair] [gif] epair and gif Devices On Bridge
o kern/175182  net        [panic] kernel panic on RADIX_MPATH when deleting rout
o kern/175153  net        [tcp] will there miss a FIN when do TSO?
o kern/174959  net        [net] [patch] rnh_walktree_from visits spurious nodes
o kern/174958  net        [net] [patch] rnh_walktree_from makes unreasonable ass
o kern/174897  net        [route] Interface routes are broken
o kern/174851  net        [bxe] [patch] UDP checksum offload is wrong in bxe dri
o kern/174850  net        [bxe] [patch] bxe driver does not receive multicasts
o kern/174849  net        [bxe] [patch] bxe driver can hang kernel when reset
o kern/174822  net        [tcp] Page fault in tcp_discardcb under high traffic
o kern/174602  net        [gif] [ipsec] traceroute issue on gif tunnel with ipse
o kern/174535  net        [tcp] TCP fast retransmit feature works strange
o kern/173871  net        [gif] process of 'ifconfig gif0 create hangs' when if_
o kern/173475  net        [tun] tun(4) stays opened by PID after process is term
o kern/173201  net        [ixgbe] [patch] Missing / broken ixgbe sysctl's and tu
o kern/173137  net        [em] em(4) unable to run at gigabit with 9.1-RC2
o kern/173002  net        [patch] data type size problem in if_spppsubr.c
o kern/172895  net        [ixgb] [ixgbe] do not properly determine link-state
o kern/172683  net        [ip6] Duplicate IPv6 Link Local Addresses
o kern/172675  net        [netinet] [patch] sysctl_tcp_hc_list (net.inet.tcp.hos
o kern/172113  net        [panic] [e1000] [patch] 9.1-RC1/amd64 panices in igb(4
o kern/171840  net        [ip6] IPv6 packets transmitting only on queue 0
o kern/171739  net        [bce] [panic] bce related kernel panic
o kern/171711  net        [dummynet] [panic] Kernel panic in dummynet
o kern/171532  net        [ndis] ndis(4) driver includes 'pccard'-specific code,
o kern/171531  net        [ndis] undocumented dependency for ndis(4)
o kern/171524  net        [ipmi] ipmi driver crashes kernel by reboot or shutdow
s kern/171508  net        [epair] [request] Add the ability to name epair device
o kern/171228  net        [re] [patch] if_re - eeprom write issues
o kern/170701  net        [ppp] killl ppp or reboot with active ppp connection c
o kern/170267  net        [ixgbe] IXGBE_LE32_TO_CPUS is probably an unintentiona
o kern/170081  net        [fxp] pf/nat/jails not working if checksum offloading
o kern/169898  net        ifconfig(8) fails to set MTU on multiple interfaces.
o kern/169676  net        [bge] [hang] system hangs, fully or partially after re
o kern/169664  net        [bgp] Wrongful replacement of interface connected net
o kern/169620  net        [ng] [pf] ng_l2tp incoming packet bypass pf firewall
o kern/169459  net        [ppp] umodem/ppp/3g stopped working after update from
o kern/169438  net        [ipsec] ipv4-in-ipv6 tunnel mode IPsec does not work
p kern/168294  net        [ixgbe] [patch] ixgbe driver compiled in kernel has no
o kern/168246  net        [em] Multiple em(4) not working with qemu
o kern/168245  net        [arp] [regression] Permanent ARP entry not deleted on
o kern/168244  net        [arp] [regression] Unable to manually remove permanent
o kern/168183  net        [bce] bce driver hang system
o kern/167947  net        [setfib] [patch] arpresolve checks only the default FI
o kern/167603  net        [ip] IP fragment reassembly's broken: file transfer ov
o kern/167500  net        [em] [panic] Kernel panics in em driver
o kern/167325  net        [netinet] [patch] sosend sometimes return EINVAL with
o kern/167202  net        [igmp]: Sending multiple IGMP packets crashes kernel
o kern/166462  net        [gre] gre(4) when using a tunnel source address from c
o kern/166285  net        [arp] FreeBSD v8.1 REL p8 arp: unknown hardware addres
o kern/166255  net        [net] [patch] It should be possible to disable "promis
p kern/165903  net        mbuf leak
o kern/165622  net        [ndis][panic][patch] Unregistered use of FPU in kernel
s kern/165562  net        [request] add support for Intel i350 in FreeBSD 7.4
o kern/165526  net        [bxe] UDP packets checksum calculation whithin if_bxe
o kern/165488  net        [ppp] [panic] Fatal trap 12 jails and ppp , kernel wit
o kern/165305  net        [ip6] [request] Feature parity between IP_TOS and IPV6
o kern/165296  net        [vlan] [patch] Fix EVL_APPLY_VLID, update EVL_APPLY_PR
o kern/165181  net        [igb] igb freezes after about 2 weeks of uptime
o kern/165174  net        [patch] [tap] allow tap(4) to keep its address on clos
o kern/165152  net        [ip6] Does not work through the issue of ipv6 addresse
o kern/164495  net        [igb] connect double head igb to switch cause system t
o kern/164490  net        [pfil] Incorrect IP checksum on pfil pass from ip_outp
o kern/164475  net        [gre] gre misses RUNNING flag after a reboot
o kern/164265  net        [netinet] [patch] tcp_lro_rx computes wrong checksum i
o kern/163903  net        [igb] "igb0:tx(0)","bpf interface lock" v2.2.5 9-STABL
o kern/163481  net        freebsd do not add itself to ping route packet
o kern/162927  net        [tun] Modem-PPP error ppp[1538]: tun0: Phase: Clearing
o kern/162558  net        [dummynet] [panic] seldom dummynet panics
o kern/162153  net        [em] intel em driver 7.2.4 don't compile
o kern/162110  net        [igb] [panic] RELENG_9 panics on boot in IGB driver -
o kern/162028  net        [ixgbe] [patch] misplaced #endif in ixgbe.c
o kern/161277  net        [em] [patch] BMC cannot receive IPMI traffic after loa
o kern/160873  net        [igb] igb(4) from HEAD fails to build on 7-STABLE
o kern/160750  net        Intel PRO/1000 connection breaks under load until rebo
o kern/160693  net        [gif] [em] Multicast packet are not passed from GIF0 t
o kern/160293  net        [ieee80211] ppanic] kernel panic during network setup
o kern/160206  net        [gif] gifX stops working after a while (IPv6 tunnel)
o kern/159817  net        [udp] write UDPv4: No buffer space available (code=55)
o kern/159629  net        [ipsec] [panic] kernel panic with IPsec in transport m
o kern/159621  net        [tcp] [panic] panic: soabort: so_count
o kern/159603  net        [netinet] [patch] in_ifscrubprefix() - network route c
o kern/159601  net        [netinet] [patch] in_scrubprefix() - loopback route re
o kern/159294  net        [em] em watchdog timeouts
o kern/159203  net        [wpi] Intel 3945ABG Wireless LAN not support IBSS
o kern/158930  net        [bpf] BPF element leak in ifp->bpf_if->bif_dlist
o kern/158726  net        [ip6] [patch] ICMPv6 Router Announcement flooding limi
o kern/158694  net        [ix] [lagg] ix0 is not working within lagg(4)
o kern/158665  net        [ip6] [panic] kernel pagefault in in6_setscope()
o kern/158635  net        [em] TSO breaks BPF packet captures with em driver
f kern/157802  net        [dummynet] [panic] kernel panic in dummynet
o kern/157785  net        amd64 + jail + ipfw + natd = very slow outbound traffi
o kern/157418  net        [em] em driver lockup during boot on Supermicro X9SCM-
o kern/157410  net        [ip6] IPv6 Router Advertisements Cause Excessive CPU U
o kern/157287  net        [re] [panic] INVARIANTS panic (Memory modified after f
o kern/157200  net        [network.subr] [patch] stf(4) can not communicate betw
o kern/157182  net        [lagg] lagg interface not working together with epair
o kern/156877  net        [dummynet] [panic] dummynet move_pkt() null ptr derefe
o kern/156667  net        [em] em0 fails to init on CURRENT after March 17
o kern/156408  net        [vlan] Routing failure when using VLANs vs. Physical e
o kern/156328  net        [icmp]: host can ping other subnet but no have IP from
o kern/156317  net        [ip6] Wrong order of IPv6 NS DAD/MLD Report
o kern/156283  net        [ip6] [patch] nd6_ns_input - rtalloc_mpath does not re
o kern/156279  net        [if_bridge][divert][ipfw] unable to correctly re-injec
o kern/156226  net        [lagg]: failover does not announce the failover to swi
o kern/156030  net        [ip6] [panic] Crash in nd6_dad_start() due to null ptr
o kern/155680  net        [multicast] problems with multicast
s kern/155642  net        [new driver] [request] Add driver for Realtek RTL8191S
o kern/155597  net        [panic] Kernel panics with "sbdrop" message
o kern/155420  net        [vlan] adding vlan break existent vlan
o kern/155177  net        [route] [panic] Panic when inject routes in kernel
o kern/155010  net        [msk] ntfs-3g via iscsi using msk driver cause kernel
o kern/154943  net        [gif] ifconfig gifX create on existing gifX clears IP
s kern/154851  net        [new driver] [request]: Port brcm80211 driver from Lin
o kern/154850  net        [netgraph] [patch] ng_ether fails to name nodes when t
o kern/154679  net        [em] Fatal trap 12: "em1 taskq" only at startup (8.1-R
o kern/154600  net        [tcp] [panic] Random kernel panics on tcp_output
o kern/154557  net        [tcp] Freeze tcp-session of the clients, if in the gat
o kern/154443  net        [if_bridge] Kernel module bridgestp.ko missing after u
o kern/154286  net        [netgraph] [panic] 8.2-PRERELEASE panic in netgraph
o kern/154255  net        [nfs] NFS not responding
o kern/154214  net        [stf] [panic] Panic when creating stf interface
o kern/154185  net        race condition in mb_dupcl
p kern/154169  net        [multicast] [ip6] Node Information Query multicast add
o kern/154134  net        [ip6] stuck kernel state in LISTEN on ipv6 daemon whic
o kern/154091  net        [netgraph] [panic] netgraph, unaligned mbuf?
o conf/154062  net        [vlan] [patch] change to way of auto-generatation of v
o kern/153937  net        [ral] ralink panics the system (amd64 freeBSDD 8.X) wh
o kern/153936  net        [ixgbe] [patch] MPRC workaround incorrectly applied to
o kern/153816  net        [ixgbe] ixgbe doesn't work properly with the Intel 10g
o kern/153772  net        [ixgbe] [patch] sysctls reference wrong XON/XOFF varia
o kern/153497  net        [netgraph] netgraph panic due to race conditions
o kern/153454  net        [patch] [wlan] [urtw] Support ad-hoc and hostap modes
o kern/153308  net        [em] em interface use 100% cpu
o kern/153244  net        [em] em(4) fails to send UDP to port 0xffff
o kern/152893  net        [netgraph] [panic] 8.2-PRERELEASE panic in netgraph
o kern/152853  net        [em] tftpd (and likely other udp traffic) fails over e
o kern/152828  net        [em] poor performance on 8.1, 8.2-PRE
o kern/152569  net        [net]: Multiple ppp connections and routing table prob
o kern/152235  net        [arp] Permanent local ARP entries are not properly upd
o kern/152141  net        [vlan] [patch] encapsulate vlan in ng_ether before out
o kern/152036  net        [libc] getifaddrs(3) returns truncated sockaddrs for n
o kern/151690  net        [ep] network connectivity won't work until dhclient is
o kern/151681  net        [nfs] NFS mount via IPv6 leads to hang on client with
o kern/151593  net        [igb] [panic] Kernel panic when bringing up igb networ
o kern/150920  net        [ixgbe][igb] Panic when packets are dropped with heade
o kern/150557  net        [igb] igb0: Watchdog timeout -- resetting
o kern/150251  net        [patch] [ixgbe] Late cable insertion broken
o kern/150249  net        [ixgbe] Media type detection broken
o bin/150224   net        ppp(8) does not reassign static IP after kill -KILL co
f kern/149969  net        [wlan] [ral] ralink rt2661 fails to maintain connectio
o kern/149643  net        [rum] device not sending proper beacon frames in ap mo
o kern/149609  net        [panic] reboot after adding second default route
o kern/149117  net        [inet] [patch] in_pcbbind: redundant test
o kern/149086  net        [multicast] Generic multicast join failure in 8.1
o kern/148018  net        [flowtable] flowtable crashes on ia64
o kern/147912  net        [boot] FreeBSD 8 Beta won't boot on Thinkpad i1300 11
o kern/147894  net        [ipsec] IPv6-in-IPv4 does not work inside an ESP-only
o kern/147155  net        [ip6] setfb not work with ipv6
o kern/146845  net        [libc] close(2) returns error 54 (connection reset by
f kern/146792  net        [flowtable] flowcleaner 100% cpu's core load
o kern/146719  net        [pf] [panic] PF or dumynet kernel panic
o kern/146534  net        [icmp6] wrong source address in echo reply
o kern/146427  net        [mwl] Additional virtual access points don't work on m
f kern/146394  net        [vlan] IP source address for outgoing connections
o bin/146377   net        [ppp] [tun] Interface doesn't clear addresses when PPP
o kern/146358  net        [vlan] wrong destination MAC address
o kern/146165  net        [wlan] [panic] Setting bssid in adhoc mode causes pani
o kern/146082  net        [ng_l2tp] a false invaliant check was performed in ng_
o kern/146037  net        [panic] mpd + CoA = kernel panic
o kern/145825  net        [panic] panic: soabort: so_count
o kern/145728  net        [lagg] Stops working lagg between two servers.
p kern/145600  net        TCP/ECN behaves different to CE/CWR than ns2 reference
f kern/144917  net        [flowtable] [panic] flowtable crashes system [regressi
o kern/144882  net        MacBookPro =>4.1 does not connect to BSD in hostap wit
o kern/144874  net        [if_bridge] [patch] if_bridge frees mbuf after pfil ho
o conf/144700  net        [rc.d] async dhclient breaks stuff for too many people
o kern/144616  net        [nat] [panic] ip_nat panic FreeBSD 7.2
f kern/144315  net        [ipfw] [panic] freebsd 8-stable reboot after add ipfw
o kern/144231  net        bind/connect/sendto too strict about sockaddr length
o kern/143846  net        [gif] bringing gif3 tunnel down causes gif0 tunnel to
s kern/143673  net        [stf] [request] there should be a way to support multi
s kern/143666  net        [ip6] [request] PMTU black hole detection not implemen
o kern/143622  net        [pfil] [patch] unlock pfil lock while calling firewall
o kern/143593  net        [ipsec] When using IPSec, tcpdump doesn't show outgoin
o kern/143591  net        [ral] RT2561C-based DLink card (DWL-510) fails to work
o kern/143208  net        [ipsec] [gif] IPSec over gif interface not working
o kern/143034  net        [panic] system reboots itself in tcp code [regression]
o kern/142877  net        [hang] network-related repeatable 8.0-STABLE hard hang
o kern/142774  net        Problem with outgoing connections on interface with mu
o kern/142772  net        [libc] lla_lookup: new lle malloc failed
f kern/142518  net        [em] [lagg] Problem on 8.0-STABLE with em and lagg
o kern/142018  net        [iwi] [patch] Possibly wrong interpretation of beacon-
o kern/141861  net        [wi] data garbled with WEP and wi(4) with Prism 2.5
f kern/141741  net        Etherlink III NIC won't work after upgrade to FBSD 8,
o kern/140742  net        rum(4) Two asus-WL167G adapters cannot talk to each ot
o kern/140682  net        [netgraph] [panic] random panic in netgraph
f kern/140634  net        [vlan] destroying if_lagg interface with if_vlan membe
o kern/140619  net        [ifnet] [patch] refine obsolete if_var.h comments desc
o kern/140346  net        [wlan] High bandwidth use causes loss of wlan connecti
o kern/140142  net        [ip6] [panic] FreeBSD 7.2-amd64 panic w/IPv6
o kern/140066  net        [bwi] install report for 8.0 RC 2 (multiple problems)
o kern/139387  net        [ipsec] Wrong lenth of PF_KEY messages in promiscuous
o bin/139346   net        [patch] arp(8) add option to remove static entries lis
o kern/139268  net        [if_bridge] [patch] allow if_bridge to forward just VL
p kern/139204  net        [arp] DHCP server replies rejected, ARP entry lost bef
o kern/139117  net        [lagg] + wlan boot timing (EBUSY)
o kern/138850  net        [dummynet] dummynet doesn't work correctly on a bridge
o kern/138782  net        [panic] sbflush_internal: cc 0 || mb 0xffffff004127b00
o kern/138688  net        [rum] possibly broken on 8 Beta 4 amd64: able to wpa a
o kern/138678  net        [lo] FreeBSD does not assign linklocal address to loop
o kern/138407  net        [gre] gre(4) interface does not come up after reboot
o kern/138332  net        [tun] [lor] ifconfig tun0 destroy causes LOR if_adata/
o kern/138266  net        [panic] kernel panic when udp benchmark test used as r
f kern/138029  net        [bpf] [panic] periodically kernel panic and reboot
o kern/137881  net        [netgraph] [panic] ng_pppoe fatal trap 12
p bin/137841   net        [patch] wpa_supplicant(8) cannot verify SHA256 signed
p kern/137776  net        [rum] panic in rum(4) driver on 8.0-BETA2
o bin/137641   net        ifconfig(8): various problems with "vlan_device.vlan_i
o kern/137392  net        [ip] [panic] crash in ip_nat.c line 2577
o kern/137372  net        [ral] FreeBSD doesn't support wireless interface from
o kern/137089  net        [lagg] lagg falsely triggers IPv6 duplicate address de
o kern/136911  net        [netgraph] [panic] system panic on kldload ng_bpf.ko t
o kern/136618  net        [pf][stf] panic on cloning interface without unit numb
o kern/135502  net        [periodic] Warning message raised by rtfree function i
o kern/134583  net        [hang] Machine with jail freezes after random amount o
o kern/134531  net        [route] [panic] kernel crash related to routes/zebra
o kern/134157  net        [dummynet] dummynet loads cpu for 100% and make a syst
o kern/133969  net        [dummynet] [panic] Fatal trap 12: page fault while in
o kern/133968  net        [dummynet] [panic] dummynet kernel panic
o kern/133736  net        [udp] ip_id not protected ...
o kern/133595  net        [panic] Kernel Panic at pcpu.h:195
o kern/133572  net        [ppp] [hang] incoming PPTP connection hangs the system
o kern/133490  net        [bpf] [panic] 'kmem_map too small' panic on Dell r900
o kern/133235  net        [netinet] [patch] Process SIOCDLIFADDR command incorre
f kern/133213  net        arp and sshd errors on 7.1-PRERELEASE
o kern/133060  net        [ipsec] [pfsync] [panic] Kernel panic with ipsec + pfs
o kern/132889  net        [ndis] [panic] NDIS kernel crash on load BCM4321 AGN d
o conf/132851  net        [patch] rc.conf(5): allow to setfib(1) for service run
o kern/132734  net        [ifmib] [panic] panic in net/if_mib.c
o kern/132705  net        [libwrap] [patch] libwrap - infinite loop if hosts.all
o kern/132672  net        [ndis] [panic] ndis with rt2860.sys causes kernel pani
o kern/132354  net        [nat] Getting some packages to ipnat(8) causes crash
o kern/132277  net        [crypto] [ipsec] poor performance using cryptodevice f
o kern/131781  net        [ndis] ndis keeps dropping the link
o kern/131776  net        [wi] driver fails to init
o kern/131753  net        [altq] [panic] kernel panic in hfsc_dequeue
o bin/131365   net        route(8): route add changes interpretation of network
f kern/130820  net        [ndis] wpa_supplicant(8) returns 'no space on device'
o kern/130628  net        [nfs] NFS / rpc.lockd deadlock on 7.1-R
o kern/130525  net        [ndis] [panic] 64 bit ar5008 ndisgen-erated driver cau
o kern/130311  net        [wlan_xauth] [panic] hostapd restart causing kernel pa
o kern/130109  net        [ipfw] Can not set fib for packets originated from loc
f kern/130059  net        [panic] Leaking 50k mbufs/hour
f kern/129719  net        [nfs] [panic] Panic during shutdown, tcp_ctloutput: in
o kern/129517  net        [ipsec] [panic] double fault / stack overflow
f kern/129508  net        [carp] [panic] Kernel panic with EtherIP (may be relat
o kern/129219  net        [ppp] Kernel panic when using kernel mode ppp
o kern/129197  net        [panic] 7.0 IP stack related panic
o bin/128954   net        ifconfig(8) deletes valid routes
o bin/128602   net        [an] wpa_supplicant(8) crashes with an(4)
o kern/128448  net        [nfs] 6.4-RC1 Boot Fails if NFS Hostname cannot be res
o bin/128295   net        [patch] ifconfig(8) does not print TOE4 or TOE6 capabi
o bin/128001   net        wpa_supplicant(8), wlan(4), and wi(4) issues
o kern/127826  net        [iwi] iwi0 driver has reduced performance and connecti
o kern/127815  net        [gif] [patch] if_gif does not set vlan attributes from
o kern/127724  net        [rtalloc] rtfree: 0xc5a8f870 has 1 refs
f bin/127719   net        [arp] arp: Segmentation fault (core dumped)
f kern/127528  net        [icmp]: icmp socket receives icmp replies not owned by
p kern/127360  net        [socket] TOE socket options missing from sosetopt()
o bin/127192   net        routed(8) removes the secondary alias IP of interface
f kern/127145  net        [wi]: prism (wi) driver crash at bigger traffic
o kern/126895  net        [patch] [ral] Add antenna selection (marked as TBD)
o kern/126874  net        [vlan]: Zebra problem if ifconfig vlanX destroy
o kern/126695  net        rtfree messages and network disruption upon use of if_
o kern/126339  net        [ipw] ipw driver drops the connection
o kern/126075  net        [inet] [patch] internet control accesses beyond end of
o bin/125922   net        [patch] Deadlock in arp(8)
o kern/125920  net        [arp] Kernel Routing Table loses Ethernet Link status
o kern/125845  net        [netinet] [patch] tcp_lro_rx() should make use of hard
o kern/125258  net        [socket] socket's SO_REUSEADDR option does not work
o kern/125239  net        [gre] kernel crash when using gre
o kern/124341  net        [ral] promiscuous mode for wireless device ral0 looses
o kern/124225  net        [ndis] [patch] ndis network driver sometimes loses net
o kern/124160  net        [libc] connect(2) function loops indefinitely
o kern/124021  net        [ip6] [panic] page fault in nd6_output()
o kern/123968  net        [rum] [panic] rum driver causes kernel panic with WPA.
o kern/123892 net [tap] [patch] No buffer space available o kern/123890 net [ppp] [panic] crash & reboot on work with PPP low-spee o kern/123858 net [stf] [patch] stf not usable behind a NAT o kern/123758 net [panic] panic while restarting net/freenet6 o bin/123633 net ifconfig(8) doesn't set inet and ether address in one o kern/123559 net [iwi] iwi periodically disassociates/associates [regre o bin/123465 net [ip6] route(8): route add -inet6 -interfac o kern/123463 net [ipsec] [panic] repeatable crash related to ipsec-tool o conf/123330 net [nsswitch.conf] Enabling samba wins in nsswitch.conf c o kern/123160 net [ip] Panic and reboot at sysctl kern.polling.enable=0 o kern/122989 net [swi] [panic] 6.3 kernel panic in swi1: net o kern/122954 net [lagg] IPv6 EUI64 incorrectly chosen for lagg devices f kern/122780 net [lagg] tcpdump on lagg interface during high pps wedge o kern/122685 net It is not visible passing packets in tcpdump(1) o kern/122319 net [wi] imposible to enable ad-hoc demo mode with Orinoco o kern/122290 net [netgraph] [panic] Netgraph related "kmem_map too smal o kern/122252 net [ipmi] [bge] IPMI problem with BCM5704 (does not work o kern/122033 net [ral] [lor] Lock order reversal in ral0 at bootup ieee o bin/121895 net [patch] rtsol(8)/rtsold(8) doesn't handle managed netw s kern/121774 net [swi] [panic] 6.3 kernel panic in swi1: net o kern/121555 net [panic] Fatal trap 12: current process = 12 (swi1: net o kern/121534 net [ipl] [nat] FreeBSD Release 6.3 Kernel Trap 12: o kern/121443 net [gif] [lor] icmp6_input/nd6_lookup o kern/121437 net [vlan] Routing to layer-2 address does not work on VLA o bin/121359 net [patch] [security] ppp(8): fix local stack overflow in o kern/121257 net [tcp] TSO + natd -> slow outgoing tcp traffic o kern/121181 net [panic] Fatal trap 3: breakpoint instruction fault whi o kern/120966 net [rum] kernel panic with if_rum and WPA encryption o kern/120566 net [request]: ifconfig(8) make order of arguments more fr o 
kern/120304 net [netgraph] [patch] netgraph source assumes 32-bit time o kern/120266 net [udp] [panic] gnugk causes kernel panic when closing U o bin/120060 net routed(8) deletes link-level routes in the presence of o kern/119945 net [rum] [panic] rum device in hostap mode, cause kernel o kern/119791 net [nfs] UDP NFS mount of aliased IP addresses from a Sol o kern/119617 net [nfs] nfs error on wpa network when reseting/shutdown f kern/119516 net [ip6] [panic] _mtx_lock_sleep: recursed on non-recursi o kern/119432 net [arp] route add -host -iface causes arp e o kern/119225 net [wi] 7.0-RC1 no carrier with Prism 2.5 wifi card [regr o kern/118727 net [netgraph] [patch] [request] add new ng_pf module o kern/117423 net [vlan] Duplicate IP on different interfaces o bin/117339 net [patch] route(8): loading routing management commands o bin/116643 net [patch] [request] fstat(1): add INET/INET6 socket deta o kern/116185 net [iwi] if_iwi driver leads system to reboot o kern/115239 net [ipnat] panic with 'kmem_map too small' using ipnat o kern/115019 net [netgraph] ng_ether upper hook packet flow stops on ad o kern/115002 net [wi] if_wi timeout. failed allocation (busy bit). 
ifco o kern/114915 net [patch] [pcn] pcn (sys/pci/if_pcn.c) ethernet driver f o kern/113432 net [ucom] WARNING: attempt to net_add_domain(netgraph) af o kern/112722 net [ipsec] [udp] IP v4 udp fragmented packet reject o kern/112686 net [patm] patm driver freezes System (FreeBSD 6.2-p4) i38 o bin/112557 net [patch] ppp(8) lock file should not use symlink name o kern/112528 net [nfs] NFS over TCP under load hangs with "impossible p o kern/111537 net [inet6] [patch] ip6_input() treats mbuf cluster wrong o kern/111457 net [ral] ral(4) freeze o kern/110284 net [if_ethersubr] Invalid Assumption in SIOCSIFADDR in et o kern/110249 net [kernel] [regression] [patch] setsockopt() error regre o kern/109470 net [wi] Orinoco Classic Gold PC Card Can't Channel Hop o bin/108895 net pppd(8): PPPoE dead connections on 6.2 [regression] f kern/108197 net [panic] [gif] [ip6] if_delmulti reference counting pan o kern/107944 net [wi] [patch] Forget to unlock mutex-locks o conf/107035 net [patch] bridge(8): bridge interface given in rc.conf n o kern/106444 net [netgraph] [panic] Kernel Panic on Binding to an ip to o kern/106316 net [dummynet] dummynet with multipass ipfw drops packets o kern/105945 net Address can disappear from network interface s kern/105943 net Network stack may modify read-only mbuf chain copies o bin/105925 net problems with ifconfig(8) and vlan(4) [regression] o kern/104851 net [inet6] [patch] On link routes not configured when usi o kern/104751 net [netgraph] kernel panic, when getting info about my tr o kern/104738 net [inet] [patch] Reentrant problem with inet_ntoa in the o kern/103191 net Unpredictable reboot o kern/103135 net [ipsec] ipsec with ipfw divert (not NAT) encodes a pac o kern/102540 net [netgraph] [patch] supporting vlan(4) by ng_fec(4) o conf/102502 net [netgraph] [patch] ifconfig name does't rename netgrap o kern/102035 net [plip] plip networking disables parallel port printing o kern/100709 net [libc] getaddrinfo(3) should return TTL info o 
kern/100519 net [netisr] suggestion to fix suboptimal network polling o kern/98597 net [inet6] Bug in FreeBSD 6.1 IPv6 link-local DAD procedu o bin/98218 net wpa_supplicant(8) blacklist not working o kern/97306 net [netgraph] NG_L2TP locks after connection with failed o conf/97014 net [gif] gifconfig_gif? in rc.conf does not recognize IPv f kern/96268 net [socket] TCP socket performance drops by 3000% if pack o kern/95519 net [ral] ral0 could not map mbuf o kern/95288 net [pppd] [tty] [panic] if_ppp panic in sys/kern/tty_subr o kern/95277 net [netinet] [patch] IP Encapsulation mask_match() return o kern/95267 net packet drops periodically appear f kern/93378 net [tcp] Slow data transfer in Postfix and Cyrus IMAP (wo o kern/93019 net [ppp] ppp and tunX problems: no traffic after restarti o kern/92880 net [libc] [patch] almost rewritten inet_network(3) functi s kern/92279 net [dc] Core faults everytime I reboot, possible NIC issu o kern/91859 net [ndis] if_ndis does not work with Asus WL-138 o kern/91364 net [ral] [wep] WF-511 RT2500 Card PCI and WEP o kern/91311 net [aue] aue interface hanging o kern/87421 net [netgraph] [panic]: ng_ether + ng_eiface + if_bridge o kern/86871 net [tcp] [patch] allocation logic for PCBs in TIME_WAIT s o kern/86427 net [lor] Deadlock with FASTIPSEC and nat o kern/85780 net 'panic: bogus refcnt 0' in routing/ipv6 o bin/85445 net ifconfig(8): deprecated keyword to ifconfig inoperativ o bin/82975 net route change does not parse classfull network as given o kern/82881 net [netgraph] [panic] ng_fec(4) causes kernel panic after o kern/82468 net Using 64MB tcp send/recv buffers, trafficflow stops, i o bin/82185 net [patch] ndp(8) can delete the incorrect entry o kern/81095 net IPsec connection stops working if associated network i o kern/78968 net FreeBSD freezes on mbufs exhaustion (network interface o kern/78090 net [ipf] ipf filtering on bridged packets doesn't work if o kern/77341 net [ip6] problems with IPV6 implementation o kern/75873 
net Usability problem with non-RFC-compliant IP spoof prot s kern/75407 net [an] an(4): no carrier after short time a kern/71474 net [route] route lookup does not skip interfaces marked d o kern/71469 net default route to internet magically disappears with mu o kern/68889 net [panic] m_copym, length > size of mbuf chain o kern/66225 net [netgraph] [patch] extend ng_eiface(4) control message o kern/65616 net IPSEC can't detunnel GRE packets after real ESP encryp s kern/60293 net [patch] FreeBSD arp poison patch a kern/56233 net IPsec tunnel (ESP) over IPv6: MTU computation is wrong s bin/41647 net ifconfig(8) doesn't accept lladdr along with inet addr o kern/39937 net ipstealth issue a kern/38554 net [patch] changing interface ipaddress doesn't seem to w o kern/31940 net ip queue length too short for >500kpps o kern/31647 net [libc] socket calls can return undocumented EINVAL o kern/30186 net [libc] getaddrinfo(3) does not handle incorrect servna f kern/24959 net [patch] proper TCP_NOPUSH/TCP_CORK compatibility o conf/23063 net [arp] [patch] for static ARP tables in rc.network o kern/21998 net [socket] [patch] ident only for outgoing connections o kern/5877 net [socket] sb_cc counts control data as well as data dat 460 problems total. 
From owner-freebsd-net@FreeBSD.ORG Mon Aug 26 11:39:05 2013 Return-Path: Delivered-To: freebsd-net@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id A350E1C4; Mon, 26 Aug 2013 11:39:05 +0000 (UTC) (envelope-from hrs@FreeBSD.org) Received: from mail.allbsd.org (gatekeeper.allbsd.org [IPv6:2001:2f0:104:e001::32]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 154E72BE2; Mon, 26 Aug 2013 11:39:01 +0000 (UTC) Received: from alph.d.allbsd.org (p2049-ipbf1102funabasi.chiba.ocn.ne.jp [122.26.101.49]) (authenticated bits=128) by mail.allbsd.org (8.14.5/8.14.5) with ESMTP id r7QBcVEd036429 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Mon, 26 Aug 2013 20:38:41 +0900 (JST) (envelope-from hrs@FreeBSD.org) Received: from localhost (localhost [IPv6:::1]) (authenticated bits=0) by alph.d.allbsd.org (8.14.5/8.14.5) with ESMTP id r7QBcRsC020918; Mon, 26 Aug 2013 20:38:31 +0900 (JST) (envelope-from hrs@FreeBSD.org) Date: Mon, 26 Aug 2013 20:37:44 +0900 (JST) Message-Id: <20130826.203744.2304902117196747104.hrs@allbsd.org> To: d@delphij.net, delphij@delphij.net Subject: Re: Why default route is not installed last? 
From: Hiroki Sato In-Reply-To: <521670FF.6080407@delphij.net> References: <521670FF.6080407@delphij.net> X-PGPkey-fingerprint: BDB3 443F A5DD B3D0 A530 FFD7 4F2C D3D8 2793 CF2D X-Mailer: Mew version 6.5 on Emacs 24.3 / Mule 6.0 (HANACHIRUSATO) Mime-Version: 1.0 Content-Type: Multipart/Signed; protocol="application/pgp-signature"; micalg=pgp-sha1; boundary="--Security_Multipart(Mon_Aug_26_20_37_44_2013_842)--" Content-Transfer-Encoding: 7bit X-Virus-Scanned: clamav-milter 0.97.4 at gatekeeper.allbsd.org X-Virus-Status: Clean X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7 (mail.allbsd.org [133.31.130.32]); Mon, 26 Aug 2013 20:38:41 +0900 (JST) X-Spam-Status: No, score=-90.6 required=13.0 tests=CONTENT_TYPE_PRESENT, DIRECTOCNDYN,DYN_PBL,RCVD_IN_PBL,SPF_SOFTFAIL,USER_IN_WHITELIST autolearn=no version=3.3.2 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on gatekeeper.allbsd.org Cc: freebsd-net@FreeBSD.org, freebsd-rc@FreeBSD.org X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 26 Aug 2013 11:39:05 -0000 ----Security_Multipart(Mon_Aug_26_20_37_44_2013_842)-- Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit Xin Li wrote in <521670FF.6080407@delphij.net>: de> -----BEGIN PGP SIGNED MESSAGE----- de> Hash: SHA512 de> de> Hi, de> de> I've noticed that we do not install default route last (after other de> static routes). I think we should probably install it last, since the de> administrator may legitimately configure a static route (e.g. this de> IPv6 address goes to this interface) that is required by the default de> route. Do you have an example? I could imagine some theoretically but personally think that the default route which depends on a static route is one which should be avoided. 
-- Hiroki ----Security_Multipart(Mon_Aug_26_20_37_44_2013_842)-- Content-Type: application/pgp-signature Content-Transfer-Encoding: 7bit -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.13 (FreeBSD) iEYEABECAAYFAlIbPggACgkQTyzT2CeTzy3s+QCdF+QZ29eOQQI7iuBQpBdUsxjt 67QAoN7iRbfoSo7qEzA2w2yolz7XRqp8 =SN+U -----END PGP SIGNATURE----- ----Security_Multipart(Mon_Aug_26_20_37_44_2013_842)---- From owner-freebsd-net@FreeBSD.ORG Mon Aug 26 11:56:17 2013 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id 62FF794E; Mon, 26 Aug 2013 11:56:17 +0000 (UTC) (envelope-from kpaasial@gmail.com) Received: from mail-qe0-x230.google.com (mail-qe0-x230.google.com [IPv6:2607:f8b0:400d:c02::230]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 0351E2D30; Mon, 26 Aug 2013 11:56:16 +0000 (UTC) Received: by mail-qe0-f48.google.com with SMTP id 1so1657905qec.21 for ; Mon, 26 Aug 2013 04:56:16 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=OaZTMCwXbvKj8f+xQLhTBpc05r9z8jMTnwNKb9yVeMs=; b=A9Jhr1SH8rdfnQcX6uxPevIQ1v9gwkE4wwNCZ/i1istmwrEl/L+FyGfgyaDqlOrmr5 6QIVDWuVJOhZlrCdNw+aW1Mf2bfsSUViBuQr18YGTwPsHLLJU7RPZE4Kgq/K2RD9Y61a nJC5oW2pRIpLMUus+Wm68r8YN7Nq9TF/iSql5WUiIIhN0GpqSArBwK9tPmdCHjwKc8VT N6sHMnYXjLOpt2gTlzJwRKRRZ8lPcELKaT4Sj6+mOgA7BwBC6jnmMWBtGf5skuDI2mua LD4OWIDvPB2famN4Vr/ExIK0LFkhiPJpeto3nZlblWB93HkvnGXzIlBZjFTc40EXW04R 66bw== MIME-Version: 1.0 X-Received: by 10.224.112.69 with SMTP id v5mr958548qap.91.1377518176119; Mon, 26 Aug 2013 04:56:16 -0700 (PDT) Received: by 10.224.5.195 with HTTP; Mon, 26 Aug 2013 04:56:16 -0700 (PDT) In-Reply-To: 
<20130826.203744.2304902117196747104.hrs@allbsd.org> References: <521670FF.6080407@delphij.net> <20130826.203744.2304902117196747104.hrs@allbsd.org> Date: Mon, 26 Aug 2013 14:56:16 +0300 Message-ID: Subject: Re: Why default route is not installed last? From: Kimmo Paasiala To: Hiroki Sato Content-Type: text/plain; charset=UTF-8 Cc: freebsd-rc@freebsd.org, delphij@delphij.net, d@delphij.net, FreeBSD Net X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 26 Aug 2013 11:56:17 -0000 On Mon, Aug 26, 2013 at 2:37 PM, Hiroki Sato wrote: > Xin Li wrote > in <521670FF.6080407@delphij.net>: > > de> -----BEGIN PGP SIGNED MESSAGE----- > de> Hash: SHA512 > de> > de> Hi, > de> > de> I've noticed that we do not install default route last (after other > de> static routes). I think we should probably install it last, since the > de> administrator may legitimately configure a static route (e.g. this > de> IPv6 address goes to this interface) that is required by the default > de> route. > > Do you have an example? I could imagine some theoretically but > personally think that the default route which depends on a static > route is one which should be avoided. > > -- Hiroki Isn't that the case when the default gateway address is on a different subnet than the address assigned to the interface? Such set ups are admittedly odd but they should be possible on FreeBSD as well as on other OSes. 
-Kimmo From owner-freebsd-net@FreeBSD.ORG Mon Aug 26 14:28:04 2013 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id 7C456B2F for ; Mon, 26 Aug 2013 14:28:04 +0000 (UTC) (envelope-from andre@freebsd.org) Received: from c00l3r.networx.ch (c00l3r.networx.ch [62.48.2.2]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id DE6FA27D6 for ; Mon, 26 Aug 2013 14:28:03 +0000 (UTC) Received: (qmail 6815 invoked from network); 26 Aug 2013 15:10:14 -0000 Received: from c00l3r.networx.ch (HELO [127.0.0.1]) ([62.48.2.2]) (envelope-sender ) by c00l3r.networx.ch (qmail-ldap-1.03) with SMTP for ; 26 Aug 2013 15:10:14 -0000 Message-ID: <521B65EF.1030408@freebsd.org> Date: Mon, 26 Aug 2013 16:27:59 +0200 From: Andre Oppermann User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:17.0) Gecko/20130801 Thunderbird/17.0.8 MIME-Version: 1.0 To: Adrian Chadd Subject: Re: route/arp lifetime (Re: it's the output, not ack coalescing (Re: TSO and FreeBSD vs Linux)) References: <520A6D07.5080106@freebsd.org> <520B74DD.1060102@ipfw.ru> <20130814124024.GA64548@onelab2.iet.unipi.it> <201308141740.28779.zec@fer.hr> <20130814154853.GA66341@onelab2.iet.unipi.it> <521204A9.7080607@ipfw.ru> <52152837.9010101@freebsd.org> <5218ABB4.5070601@ipfw.ru> <5218E8B6.5090407@freebsd.org> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: FreeBSD Net , "Alexander V. 
Chernikov" X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 26 Aug 2013 14:28:04 -0000 On 25.08.2013 16:42, Adrian Chadd wrote: > On 24 August 2013 10:09, Andre Oppermann > wrote: > > On 24.08.2013 19:04, Adrian Chadd wrote: > > I'm very close to starting an mbuf batching thing to use in a few places like receive, > transmit and > transmit completion -> free path. I'd be interested in your review/feedback and testing as > it sounds > like something you can easily stress test there. :) > > > I'd strongly recommend fixing a number of other places and collect > lower hanging fruit before starting with mbuf batching. > > > I'm open to suggestions. > > Scott killed our high hanging fruit (VM / buffer page lifecycle) and what's left is not very low. If > you have any recommendations, I'd love to hear them. 1. lle lock to rmlock. 2. if_addr and IN_ADDR locks to rmlocks. 3. routing table locking (rmlocks, and by doing away with rtentry locks and refcounting through copy-out on lookup and prohibition of having any pointers into the rtable). 
-- Andre From owner-freebsd-net@FreeBSD.ORG Mon Aug 26 14:34:16 2013 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id 7956CC8F for ; Mon, 26 Aug 2013 14:34:16 +0000 (UTC) (envelope-from andre@freebsd.org) Received: from c00l3r.networx.ch (c00l3r.networx.ch [62.48.2.2]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id C67662863 for ; Mon, 26 Aug 2013 14:34:15 +0000 (UTC) Received: (qmail 6860 invoked from network); 26 Aug 2013 15:16:26 -0000 Received: from c00l3r.networx.ch (HELO [127.0.0.1]) ([62.48.2.2]) (envelope-sender ) by c00l3r.networx.ch (qmail-ldap-1.03) with SMTP for ; 26 Aug 2013 15:16:26 -0000 Message-ID: <521B6763.1080308@freebsd.org> Date: Mon, 26 Aug 2013 16:34:11 +0200 From: Andre Oppermann User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:17.0) Gecko/20130801 Thunderbird/17.0.8 MIME-Version: 1.0 To: Navdeep Parhar Subject: Re: Please review: LRO entry last-active timestamp. References: <521510CC.3040104@FreeBSD.org> In-Reply-To: <521510CC.3040104@FreeBSD.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: "freebsd-net@freebsd.org" X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 26 Aug 2013 14:34:16 -0000 On 21.08.2013 21:11, Navdeep Parhar wrote: > I'd like to add a last-active timestamp to the structure that tracks the > LRO state in a NIC's rx handler. This is r254336 in user/np/cxl_tuning > that will be merged to head if there are no objections. No objections. This is good thing. 
The last time I looked some time back there were a couple of additional issues with the soft-LRO code. One was that it would cause packet re-ordering when a new segment is out-of-order or can't be merged. There the new packet would be delivered first and the merged LRO chain after it. I think there were a couple of other similar issues. However I don't know if these have been fixed in the mean time. BTW an excellent resource on the correct LRO behavior is this detailed flow chart. It probably would be good to audit our soft-LRO against it. http://msdn.microsoft.com/en-us/library/windows/hardware/jj853325%28v=vs.85%29.aspx -- Andre > http://svnweb.freebsd.org/base?view=revision&revision=254336 > http://svnweb.freebsd.org/base/user/np/cxl_tuning/sys/netinet/tcp_lro.c?r1=254336&r2=254335&pathrev=254336 > http://svnweb.freebsd.org/base/user/np/cxl_tuning/sys/netinet/tcp_lro.h?r1=254336&r2=254335&pathrev=254336 > > > ----- > Add a last-modified timestamp to each LRO entry and provide an interface > to flush all inactive entries. Drivers decide when to flush and what > the inactivity threshold should be. > > Network drivers that process an rx queue to completion can enter a > livelock type situation when the rate at which packets are received > reaches equilibrium with the rate at which the rx thread is processing > them. When this happens the final LRO flush (normally when the rx > routine is done) does not occur. Pure ACKs and segments with total > payload < 64K can get stuck in an LRO entry. Symptoms are that TCP > tx-mostly connections' performance falls off a cliff during heavy, > unrelated rx on the interface. > > Flushing only inactive LRO entries works better than any of these > alternates that I tried: > - don't LRO pure ACKs > - flush _all_ LRO entries periodically (every 'x' microseconds or every > 'y' descriptors) > - stop rx processing in the driver periodically and schedule remaining > work for later.
> ----- > > > Regards, > Navdeep > _______________________________________________ > freebsd-net@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-net > To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org" > > From owner-freebsd-net@FreeBSD.ORG Mon Aug 26 14:41:09 2013 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id 22940EB4; Mon, 26 Aug 2013 14:41:09 +0000 (UTC) (envelope-from luigi@onelab2.iet.unipi.it) Received: from onelab2.iet.unipi.it (onelab2.iet.unipi.it [131.114.59.238]) by mx1.freebsd.org (Postfix) with ESMTP id D608328DA; Mon, 26 Aug 2013 14:41:08 +0000 (UTC) Received: by onelab2.iet.unipi.it (Postfix, from userid 275) id 77F387300A; Mon, 26 Aug 2013 16:46:01 +0200 (CEST) Date: Mon, 26 Aug 2013 16:46:01 +0200 From: Luigi Rizzo To: Andre Oppermann Subject: Re: route/arp lifetime (Re: it's the output, not ack coalescing (Re: TSO and FreeBSD vs Linux)) Message-ID: <20130826144601.GA11595@onelab2.iet.unipi.it> References: <20130814124024.GA64548@onelab2.iet.unipi.it> <201308141740.28779.zec@fer.hr> <20130814154853.GA66341@onelab2.iet.unipi.it> <521204A9.7080607@ipfw.ru> <52152837.9010101@freebsd.org> <5218ABB4.5070601@ipfw.ru> <5218E8B6.5090407@freebsd.org> <521B65EF.1030408@freebsd.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <521B65EF.1030408@freebsd.org> User-Agent: Mutt/1.5.20 (2009-06-14) Cc: FreeBSD Net , Adrian Chadd , "Alexander V. 
Chernikov" X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 26 Aug 2013 14:41:09 -0000 On Mon, Aug 26, 2013 at 04:27:59PM +0200, Andre Oppermann wrote: ... > > 1. lle lock to rmlock. > > 2. if_addr and IN_ADDR locks to rmlocks. > > 3. routing table locking (rmlocks, and by doing away with rtentry locks and refcounting > through copy-out on lookup and prohibition of having any pointers into the rtable). re. the last item, the problem is that we need to access *ifp after the route lookup, and this cannot be solved with a copy-on-lookup (I guess at the moment the rte has a refcounted pointer to the ifp). This is why i argued that it might be useful to cache into the socket a refcounted pointer into the ifp (or rte) and update it lazily (periodically or through generation counters). cheers luigi From owner-freebsd-net@FreeBSD.ORG Mon Aug 26 15:07:34 2013 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id 68ACA48F for ; Mon, 26 Aug 2013 15:07:34 +0000 (UTC) (envelope-from andre@freebsd.org) Received: from c00l3r.networx.ch (c00l3r.networx.ch [62.48.2.2]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id C9AA22A3D for ; Mon, 26 Aug 2013 15:07:33 +0000 (UTC) Received: (qmail 7014 invoked from network); 26 Aug 2013 15:49:44 -0000 Received: from c00l3r.networx.ch (HELO [127.0.0.1]) ([62.48.2.2]) (envelope-sender ) by c00l3r.networx.ch (qmail-ldap-1.03) with SMTP for ; 26 Aug 2013 15:49:44 -0000 Message-ID: <521B6F31.705@freebsd.org> Date: Mon, 26 Aug 2013 17:07:29 +0200 From: Andre Oppermann User-Agent: Mozilla/5.0 
(Windows NT 6.1; WOW64; rv:17.0) Gecko/20130801 Thunderbird/17.0.8 MIME-Version: 1.0 To: Luigi Rizzo Subject: Re: route/arp lifetime (Re: it's the output, not ack coalescing (Re: TSO and FreeBSD vs Linux)) References: <20130814124024.GA64548@onelab2.iet.unipi.it> <201308141740.28779.zec@fer.hr> <20130814154853.GA66341@onelab2.iet.unipi.it> <521204A9.7080607@ipfw.ru> <52152837.9010101@freebsd.org> <5218ABB4.5070601@ipfw.ru> <5218E8B6.5090407@freebsd.org> <521B65EF.1030408@freebsd.org> <20130826144601.GA11595@onelab2.iet.unipi.it> In-Reply-To: <20130826144601.GA11595@onelab2.iet.unipi.it> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: FreeBSD Net , Adrian Chadd , "Alexander V. Chernikov" X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 26 Aug 2013 15:07:34 -0000 On 26.08.2013 16:46, Luigi Rizzo wrote: > On Mon, Aug 26, 2013 at 04:27:59PM +0200, Andre Oppermann wrote: > ... >> >> 1. lle lock to rmlock. >> >> 2. if_addr and IN_ADDR locks to rmlocks. >> >> 3. routing table locking (rmlocks, and by doing away with rtentry locks and refcounting >> through copy-out on lookup and prohibition of having any pointers into the rtable). > > re. the last item, the problem is that we need to access *ifp > after the route lookup, and this cannot be solved with a copy-on-lookup > (I guess at the moment the rte has a refcounted pointer to the ifp). The ifp has always been a bit lazy and later access is fine. We have the same problem with packets coming up from an interface (m->pkthdr.rcvif). > This is why i argued that it might be useful to cache into the socket a > refcounted pointer into the ifp (or rte) and update it lazily > (periodically or through generation counters). 
Unless you want to ref-count every invocation of ifp the cheapest way to solve this is by making sure the ifp stays around for some time (2 minutes?) pointing to a dummy if_transmit after an interface departs. -- Andre From owner-freebsd-net@FreeBSD.ORG Mon Aug 26 17:18:26 2013 Return-Path: Delivered-To: net@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id 0BB05F39; Mon, 26 Aug 2013 17:18:26 +0000 (UTC) (envelope-from gibbs@FreeBSD.org) Received: from aslan.scsiguy.com (www.scsiguy.com [70.89.174.89]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id D259C222C; Mon, 26 Aug 2013 17:18:25 +0000 (UTC) Received: from [192.168.6.166] (207-225-98-3.dia.static.qwest.net [207.225.98.3]) (authenticated bits=0) by aslan.scsiguy.com (8.14.7/8.14.5) with ESMTP id r7QHINuA058863 (version=TLSv1/SSLv3 cipher=AES128-SHA bits=128 verify=NO); Mon, 26 Aug 2013 17:18:24 GMT (envelope-from gibbs@FreeBSD.org) From: "Justin T. 
Gibbs" Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Subject: Flow ID, LACP, and igb Date: Mon, 26 Aug 2013 11:18:18 -0600 Message-Id: To: net@FreeBSD.org Mime-Version: 1.0 (Mac OS X Mail 6.5 \(1508\)) X-Mailer: Apple Mail (2.1508) X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.4.3 (aslan.scsiguy.com [70.89.174.89]); Mon, 26 Aug 2013 17:18:24 +0000 (UTC) Cc: jfv@FreeBSD.org, Alan Somers X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 26 Aug 2013 17:18:26 -0000 Hi Net, I'm an infrequent traveler through the networking code and would appreciate some feedback on some proposed solutions to issues Spectra has seen with outbound LACP traffic. lacp_select_tx_port() uses the flow ID if it is available in the outbound mbuf to select the outbound port. The igb driver uses the msix queue of the inbound packet to set a packet's flow ID. This doesn't provide enough bits of information to yield a high quality flow ID. If, for example, the switch controlling inbound packet distribution does a poor job, the outbound packet distribution will also be poorly distributed. The majority of the adapters supported by this driver will compute the Toeplitz RSS hash. Using this data seems to work quite well in our tests (3 member LAGG group). Is there any reason we shouldn't use the RSS hash for flow ID? We also tried disabling the use of flow ID and doing the hash directly in the driver. Unfortunately, the current hash is pretty weak. It multiplies by 33, which yield very poor distributions if you need to mod the result by 3 (e.g. LAGG group with 3 members). Alan modified the driver to use the FNV hash, which is already in the kernel, and this yielded much better results. He is still benchmarking the impact of this change. 
Assuming we can get decent flow ID data, this should only impact outbound UDP, since the stack doesn't provide a flow ID in this case. Are there other checksums we should be looking at in addition to FNV? Thanks, Justin From owner-freebsd-net@FreeBSD.ORG Mon Aug 26 17:31:39 2013 Return-Path: Delivered-To: net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id 6CAFF978; Mon, 26 Aug 2013 17:31:39 +0000 (UTC) (envelope-from asomers@gmail.com) Received: from mail-qc0-x229.google.com (mail-qc0-x229.google.com [IPv6:2607:f8b0:400d:c01::229]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id EB7472330; Mon, 26 Aug 2013 17:31:38 +0000 (UTC) Received: by mail-qc0-f169.google.com with SMTP id k8so818484qcq.28 for ; Mon, 26 Aug 2013 10:31:38 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:date:message-id:subject:from:to:cc:content-type; bh=abG1iumbPc++xGdNRWmaRzpETXkxhFeXpGYx/ArwDZE=; b=SoS8SnZf9kKYWnVWgd7mScweWAR99mFWDA0I63io+FYJWnrSoG/sIACoJxku7bGGwa +gJog+mmwCYqr7u/s8q5N77F0c+AIGqt4Fle8aFgM6SNwZ2sibmixY5ZCsojhc5S7LbK bcQy4nxLqedvoQSw/9RafqZr872y/WHrMvBdoZsoZOFlwffjVGtPlF6AHnVIYgPrQCSW hALsv8xNHNAXocPIIRzhY68Bt7PD0FUbTUl3q0TStj+3GYDkkhcG8lKJvRi5Uw8E9GLx L2uvt+MlBkF4m9cKv7cYSMrN7qnpj4fHyntoD3e+hUALsP++RrRLOr9ZVI721BNQhrr2 784g== MIME-Version: 1.0 X-Received: by 10.224.23.134 with SMTP id r6mr17387677qab.34.1377538298027; Mon, 26 Aug 2013 10:31:38 -0700 (PDT) Sender: asomers@gmail.com Received: by 10.49.39.101 with HTTP; Mon, 26 Aug 2013 10:31:37 -0700 (PDT) Date: Mon, 26 Aug 2013 11:31:37 -0600 X-Google-Sender-Auth: eWRpuz1COx_oc4nkaa1z_5hvPNE Message-ID: Subject: Re: Flow ID, LACP, and igb From: Alan Somers To: "Justin T. 
Gibbs" Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.14 Cc: jfv@freebsd.org, Alan Somers , net@freebsd.org X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 26 Aug 2013 17:31:39 -0000 On Mon, Aug 26, 2013 at 11:18 AM, Justin T. Gibbs wrote: > Hi Net, > > I'm an infrequent traveler through the networking code and would > appreciate some feedback on some proposed solutions to issues Spectra > has seen with outbound LACP traffic. > > lacp_select_tx_port() uses the flow ID if it is available in the outbound > mbuf to select the outbound port. The igb driver uses the msix queue of > the inbound packet to set a packet's flow ID. This doesn't provide enough > bits of information to yield a high quality flow ID. If, for example, the > switch controlling inbound packet distribution does a poor job, the > outbound > packet distribution will also be poorly distributed. > It's actually worse than this. If two inbound TCP packets get sent to the same queue on different igb ports, then they will have the same flowid. That could happen even if the switch is distributing packets just fine. > > The majority of the adapters supported by this driver will compute > the Toeplitz RSS hash. Using this data seems to work quite well > in our tests (3 member LAGG group). Is there any reason we shouldn't > use the RSS hash for flow ID? > > We also tried disabling the use of flow ID and doing the hash directly in > the driver. Unfortunately, the current hash is pretty weak. It multiplies > by 33, which yield very poor distributions if you need to mod the result > by 3 (e.g. LAGG group with 3 members). Alan modified the driver to use > the FNV hash, which is already in the kernel, and this yielded much better > results. He is still benchmarking the impact of this change. 
Assuming we > can get decent flow ID data, this should only impact outbound UDP, since > the > stack doesn't provide a flow ID in this case. > It also affects outbound TCP packets for streams that originated on the host. For example, it affects tcp-mounted NFS clients. > > Are there other checksums we should be looking at in addition to FNV? > s/checksums/hashes/ > > Thanks, > Justin > > From owner-freebsd-net@FreeBSD.ORG Mon Aug 26 18:10:54 2013 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id D902753C; Mon, 26 Aug 2013 18:10:54 +0000 (UTC) (envelope-from julian@freebsd.org) Received: from vps1.elischer.org (vps1.elischer.org [204.109.63.16]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id A8C682548; Mon, 26 Aug 2013 18:10:53 +0000 (UTC) Received: from Julian-MBP3.local (ppp121-45-245-177.lns20.per2.internode.on.net [121.45.245.177]) (authenticated bits=0) by vps1.elischer.org (8.14.6/8.14.6) with ESMTP id r7QIAeHe014036 (version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO); Mon, 26 Aug 2013 11:10:43 -0700 (PDT) (envelope-from julian@freebsd.org) Message-ID: <521B9A1B.7080908@freebsd.org> Date: Tue, 27 Aug 2013 02:10:35 +0800 From: Julian Elischer User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:17.0) Gecko/20130801 Thunderbird/17.0.8 MIME-Version: 1.0 To: Kimmo Paasiala Subject: Re: Why default route is not installed last? 
References: <521670FF.6080407@delphij.net> <20130826.203744.2304902117196747104.hrs@allbsd.org> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: delphij@delphij.net, Hiroki Sato , freebsd-rc@freebsd.org, d@delphij.net, FreeBSD Net X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 26 Aug 2013 18:10:54 -0000 On 8/26/13 7:56 PM, Kimmo Paasiala wrote: > On Mon, Aug 26, 2013 at 2:37 PM, Hiroki Sato wrote: >> Xin Li wrote >> in <521670FF.6080407@delphij.net>: >> >> de> -----BEGIN PGP SIGNED MESSAGE----- >> de> Hash: SHA512 >> de> >> de> Hi, >> de> >> de> I've noticed that we do not install default route last (after other >> de> static routes). I think we should probably install it last, since the >> de> administrator may legitimately configure a static route (e.g. this >> de> IPv6 address goes to this interface) that is required by the default >> de> route. >> >> Do you have an example? I could imagine some theoretically but >> personally think that the default route which depends on a static >> route is one which should be avoided. >> >> -- Hiroki > Isn't that the case when the default gateway address is on a different > subnet than the address assigned to the interface? Such set ups are > admittedly odd but they should be possible on FreeBSD as well as on > other OSes. That has always been specifically not supported. default route needs to be directly attached. 
in fact the routing tables only ever deliver the 'next hop' > > -Kimmo > _______________________________________________ > freebsd-net@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-net > To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org" > > From owner-freebsd-net@FreeBSD.ORG Mon Aug 26 18:45:30 2013 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id 2B15D7D4; Mon, 26 Aug 2013 18:45:30 +0000 (UTC) (envelope-from davide.italiano@gmail.com) Received: from mail-vc0-x229.google.com (mail-vc0-x229.google.com [IPv6:2607:f8b0:400c:c03::229]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id A6C80273A; Mon, 26 Aug 2013 18:45:29 +0000 (UTC) Received: by mail-vc0-f169.google.com with SMTP id ib11so2362043vcb.14 for ; Mon, 26 Aug 2013 11:45:28 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:date:message-id:subject :from:to:cc:content-type; bh=1eN6suHFYVPG/ZPNbyyUSstgvMT2vDNQnDD+m7G3FIU=; b=PS0HPAq5tNsxGTR6LSYTpce17cMfwJSWvYLZcpilsAmDAywC/1X4ZBUbsA1Z4xr3pc FHfFJ8w+sOwdFxh/L8K/QzYJFIUCr1/3itLxNVMBW5C6I0wsJuUVy887DTrGOfHIfqk/ ilqtOskFYN0z2uzxuCZCIttpRsU/UWp81/p0DdUSPzU0FIPy1fsshTUBCiyeAyLz7CqS l+jI/3wMGG8kdORenv35b84pvs+DyEiK9UWsG7OnnDGw3PNqIEVENYcxCKgUJAPziXGi Ov5L+MNijeuIx7EwOlzrU26BLjrzDti/ndSbLVv9CAIEk5M02L2YDCT1f6o5DemBTSPp tS1A== MIME-Version: 1.0 X-Received: by 10.220.174.200 with SMTP id u8mr16166622vcz.6.1377542728764; Mon, 26 Aug 2013 11:45:28 -0700 (PDT) Sender: davide.italiano@gmail.com Received: by 10.220.65.132 with HTTP; Mon, 26 Aug 2013 11:45:28 -0700 (PDT) In-Reply-To: References: <5218AA36.1080807@ipfw.ru> Date: Mon, 26 Aug 2013 11:45:28 -0700 
X-Google-Sender-Auth: uAH71tZSdvuPxJdT5_x8nHM2C0g Message-ID: Subject: Re: [rfc] migrate lagg to an rmlock From: Davide Italiano To: Robert Watson Content-Type: text/plain; charset=ISO-8859-1 Cc: FreeBSD Net , Adrian Chadd , freebsd-current , "Alexander V. Chernikov" X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 26 Aug 2013 18:45:30 -0000 On Sat, Aug 24, 2013 at 7:16 AM, Robert Watson wrote: > On Sat, 24 Aug 2013, Alexander V. Chernikov wrote: > >> On 24.08.2013 00:54, Adrian Chadd wrote: >>> >>> >>> I'd like to commit this to -10. It migrates the if_lagg locking >>> from a rw lock to a rm lock. We see a bit of contention between the >>> transmit and >> >> >> We're running lagg with rmlock on several hundred heavily loaded machines, >> it really works better. However, there should not be any contention between >> receive and transmit side since there is actually no _real_ need to lock RX >> (and even use lagg receive code at all): >> >> http://lists.freebsd.org/pipermail/svn-src-all/2013-April/067570.html > > > We should distinguish "lock contention" from "line contention". When > acquiring a rwlock on multiple CPUs concurrently, the cache lines used to > implement the lock are contended, as they must bounce between caches via the > cache coherence protocol, also referred to as "contention". In the if_lagg > code, I assume that the read-only acquire of the rwlock (and perhaps now > rmlock) is for data stability rather than mutual exclusion -- e.g., to allow > processing to completion against a stable version of the lagg configuration. > As such, indeed, there should be no lock contention unless a configuration > update takes place, and any line contention is a property of the locking > primitive rather than data model. 
> > There are a number of other places in the kernel where migration to an > rmlock makes sense -- however, some care must be taken for four reasons: (1) > while read locks don't experience line contention, write locking becomes > observably more expensive so is not suitable for all rwlock line > contention spots -- e.g., rmlocks might not be suitable for tcbinfo; (2) rmlocks, > unlike rwlocks, implement reader priority propagation, so you must > reason about; and (3) historically, rmlocks have not fully implemented > WITNESS so you may get less good debugging output. if_lagg is a nice place I'm not sure what you mean here with (3), because from my understanding of the code WITNESS is implemented both in the sleepable and non-sleepable case, but there could be something I'm missing. Something I think we lack in rmlock code is fully supported LOCK_PROFILING as we have in all the other primitives, but again, if I'm wrong feel free to correct me. > to use rmlocks, as reconfigurations are very rare, and it's really all about > long-term data stability.
> > Robert > > _______________________________________________ > freebsd-current@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-current > To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org" Thanks, -- Davide "There are no solved problems; there are only problems that are more or less solved" -- Henri Poincare From owner-freebsd-net@FreeBSD.ORG Mon Aug 26 18:49:03 2013 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id 05738B16; Mon, 26 Aug 2013 18:49:03 +0000 (UTC) (envelope-from delphij@delphij.net) Received: from anubis.delphij.net (anubis.delphij.net [IPv6:2001:470:1:117::25]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id C832B2787; Mon, 26 Aug 2013 18:49:02 +0000 (UTC) Received: from zeta.ixsystems.com (unknown [69.198.165.132]) (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by anubis.delphij.net (Postfix) with ESMTPSA id 52BED5D45; Mon, 26 Aug 2013 11:49:01 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=delphij.net; s=anubis; t=1377542941; bh=2GkMwHbiHN9Qzi0yTKk7eNMMAZZLrz3kwlr/i3uW3es=; h=Date:From:Reply-To:To:CC:Subject:References:In-Reply-To; b=QDPPL+pekpTF1iBefBUO7CcruBk5cq1EO7pBRWQ9+AjbRZkeTZEMb2c3a1vlMoBiT s6uo9iPSCkrpVR6IOnshxTGlKquF+fm7T0DlUC46WjcbO1Xi4je4UzZ1iX0s9o6uWD BKS1LyITolGEpyx70MMWveqkqBkCEuYADvtm4ja8= Message-ID: <521BA31C.5000807@delphij.net> Date: Mon, 26 Aug 2013 11:49:00 -0700 From: Xin Li Organization: The FreeBSD Project MIME-Version: 1.0 To: Julian Elischer Subject: Re: Why default route is not installed last? 
References: <521670FF.6080407@delphij.net> <20130826.203744.2304902117196747104.hrs@allbsd.org> <521B9A1B.7080908@freebsd.org> In-Reply-To: <521B9A1B.7080908@freebsd.org> X-Enigmail-Version: 1.5.1 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: Kimmo Paasiala , Hiroki Sato , freebsd-rc@freebsd.org, d@delphij.net, FreeBSD Net X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list Reply-To: d@delphij.net List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 26 Aug 2013 18:49:03 -0000 -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA512 On 08/26/13 11:10, Julian Elischer wrote: > On 8/26/13 7:56 PM, Kimmo Paasiala wrote: >> On Mon, Aug 26, 2013 at 2:37 PM, Hiroki Sato >> wrote: >>> Xin Li wrote in >>> <521670FF.6080407@delphij.net>: >>> >>> de> -----BEGIN PGP SIGNED MESSAGE----- de> Hash: SHA512 de> de> >>> Hi, de> de> I've noticed that we do not install default route >>> last (after other de> static routes). I think we should >>> probably install it last, since the de> administrator may >>> legitimately configure a static route (e.g. this de> IPv6 >>> address goes to this interface) that is required by the >>> default de> route. >>> >>> Do you have an example? I could imagine some theoretically >>> but personally think that the default route which depends on a >>> static route is one which should be avoided. >>> >>> -- Hiroki >> Isn't that the case when the default gateway address is on a >> different subnet than the address assigned to the interface? Such >> set ups are admittedly odd but they should be possible on FreeBSD >> as well as on other OSes. > That has always been specifically not supported. default route > needs to be directly attached. in fact the routing tables only ever > deliver the 'next hop' Well, depends on whether the 'next hop' is an IP or an interface. 
For instance one can have a valid configuration that they have a static route of: 2607:5300:XXXX:XXXX:ff:ff:ff:ff -prefixlen 128 -interface em0 Then have 2607:5300:XXXX:XXXX:ff:ff:ff:ff as default router. This configuration is not possible with the current rc.d startup order. Cheers, - -- Xin LI https://www.delphij.net/ FreeBSD - The Power to Serve! Live free or die -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.21 (FreeBSD) iQEcBAEBCgAGBQJSG6MbAAoJEG80Jeu8UPuzAYMH/2K+wa2I2jexZourxzPgH25X OWxsxZgAwd/rEbsbm/0r0ApzGLNm7WQaXaBuNk+u9G9DWOLSTh1M/axRDAez4vOC EJiOfMQxMXlK7uBuA+1cUUrFbrPN4bNaRKY4DvSMWocd3x9T2CrxGaT9Y2SO6Q2g 1x2xSH63MXxebFaaT7nXqLLfpT4IK7yCOWPSXatBdZyZXAZh2ePa7wP4JX/Ti4ON IFE6IQwOs9q+w8EiyzLMtoqpZTt882Zw8beDmKMj7On+yXsw48+ryZF54kVu8+Sz dEwdvuKlXWB8FVWRz5gYbAOePq3XqCLeOuMZ5b6eIiHwhlY184nw2A94ahqVRGE= =27i9 -----END PGP SIGNATURE----- From owner-freebsd-net@FreeBSD.ORG Mon Aug 26 20:40:07 2013 Return-Path: Delivered-To: net@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id 1A6C6E6C for ; Mon, 26 Aug 2013 20:40:07 +0000 (UTC) (envelope-from andre@freebsd.org) Received: from c00l3r.networx.ch (c00l3r.networx.ch [62.48.2.2]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 652F62EA0 for ; Mon, 26 Aug 2013 20:40:06 +0000 (UTC) Received: (qmail 9678 invoked from network); 26 Aug 2013 21:22:13 -0000 Received: from c00l3r.networx.ch (HELO [127.0.0.1]) ([62.48.2.2]) (envelope-sender ) by c00l3r.networx.ch (qmail-ldap-1.03) with SMTP for ; 26 Aug 2013 21:22:13 -0000 Message-ID: <521BBD21.4070304@freebsd.org> Date: Mon, 26 Aug 2013 22:40:01 +0200 From: Andre Oppermann User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:17.0) Gecko/20130801 Thunderbird/17.0.8 MIME-Version: 1.0 To: "Justin T. 
Gibbs" Subject: Re: Flow ID, LACP, and igb References: In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: jfv@FreeBSD.org, Alan Somers , net@FreeBSD.org X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 26 Aug 2013 20:40:07 -0000 On 26.08.2013 19:18, Justin T. Gibbs wrote: > Hi Net, > > I'm an infrequent traveler through the networking code and would > appreciate some feedback on some proposed solutions to issues Spectra > has seen with outbound LACP traffic. > > lacp_select_tx_port() uses the flow ID if it is available in the outbound > mbuf to select the outbound port. The igb driver uses the msix queue of > the inbound packet to set a packet's flow ID. This doesn't provide enough > bits of information to yield a high quality flow ID. If, for example, the > switch controlling inbound packet distribution does a poor job, the outbound > packet distribution will also be poorly distributed. Please note that inbound and outbound flow ID do not need to be the same or symmetric. It only should stay the same for all packets in a single connection to prevent reordering. Generally it doesn't matter if in- and outbound packets do not use the same queue. Only in sophisticated setups with full affinity, which we don't support yet, it could matter. > The majority of the adapters supported by this driver will compute > the Toeplitz RSS hash. Using this data seems to work quite well > in our tests (3 member LAGG group). Is there any reason we shouldn't > use the RSS hash for flow ID? Using the RSS hash is the idea. The infrastructure and driver adjustments haven't been implemented throughout yet. > We also tried disabling the use of flow ID and doing the hash directly in > the driver. Unfortunately, the current hash is pretty weak. 
It multiplies > by 33, which yield very poor distributions if you need to mod the result > by 3 (e.g. LAGG group with 3 members). Alan modified the driver to use > the FNV hash, which is already in the kernel, and this yielded much better > results. He is still benchmarking the impact of this change. Assuming we > can get decent flow ID data, this should only impact outbound UDP, since the > stack doesn't provide a flow ID in this case. > > Are there other checksums we should be looking at in addition to FNV? siphash24() is fast, keyed and strong. -- Andre From owner-freebsd-net@FreeBSD.ORG Mon Aug 26 21:07:07 2013 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id 0E2BB61B; Mon, 26 Aug 2013 21:07:07 +0000 (UTC) (envelope-from crodr001@gmail.com) Received: from mail-la0-x231.google.com (mail-la0-x231.google.com [IPv6:2a00:1450:4010:c03::231]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id A358A2096; Mon, 26 Aug 2013 21:07:05 +0000 (UTC) Received: by mail-la0-f49.google.com with SMTP id ev20so2799553lab.22 for ; Mon, 26 Aug 2013 14:07:03 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:date:message-id:subject:from:to:cc:content-type; bh=41XSI9ovOm0lSR3h9cWuXdiI6EEqZZrtVuYcLWPDE3A=; b=kKCNa7wSPIlx6ZLLaYoIpxr2EEKqqp1gsgv+et0m9DFhckNdBXG/NDeJXznktOBtBH brgsvJK4aPaOTJkUE2QqIziQTR67yDKXh3QVM2NnPCNUZ8xy2+M/Y1AZJlA+9ihO6oFW RjM/vYPTfrn2dcGp3Rph8RMw1BZ713jBEq5bRuKD6R0xy+OoBf+bwl+2gGx+wDHJVu1H zZjAqFRnZ1m1UFiZ+6lwDrfNojByrW+pHu91mbazeue17uIHvlVOi098dyH+GaAZFJY3 hyhoFTby6fZEWPfdXVmOGFeUs0vDhx8+7hGbyNd/2A8EKDFXJTYA9hOm+UNzc1mZTIcr Hmxg== MIME-Version: 1.0 X-Received: by 10.152.2.4 with SMTP id 4mr15433149laq.0.1377551223471; Mon, 26 
Aug 2013 14:07:03 -0700 (PDT) Sender: crodr001@gmail.com Received: by 10.112.168.136 with HTTP; Mon, 26 Aug 2013 14:07:03 -0700 (PDT) Date: Mon, 26 Aug 2013 14:07:03 -0700 X-Google-Sender-Auth: 8Ity0X1dIH7ipY2m8rdkIMu_53A Message-ID: Subject: devel/jenkins port not starting. Kernel panic in IPv6 multicast code From: Craig Rodrigues To: lwhsu@freebsd.org Content-Type: multipart/mixed; boundary=089e0112c51c55c2ed04e4e023dc X-Content-Filtered-By: Mailman/MimeDel 2.1.14 Cc: freebsd-net@freebsd.org, freebsd-ports@freebsd.org, freebsd-java@freebsd.org X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 26 Aug 2013 21:07:07 -0000 --089e0112c51c55c2ed04e4e023dc Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Hi, Sorry for the cross-posting, but I'm not 100% sure where the problem is: the Jenkins port, Java, or the FreeBSD networking code. I recently tried to install the devel/jenkins port on two separate boxes: (1) box 1 running FreeBSD-9-STABLE. (2) box 2 running FreeBSD-CURRENT r254815 On box 1 when I tried to run jenkins with "service jenkins start", I got a Java error with backtrace: WARNING: Failed to advertise the service to DNS multi-cast (see attached jenkins.log.txt) On box 2, since this is a debug kernel with WITNESS and INVARIANTS enabled, I get a kernel panic.
(see attached core.txt.gz) The panic occurs here on line 1779: 1768 static struct ifnet * 1769 in6p_lookup_mcast_ifp(const struct inpcb *in6p, 1770 const struct sockaddr_in6 *gsin6) 1771 { 1772 struct route_in6 ro6; 1773 struct ifnet *ifp; 1774 1775 KASSERT(in6p->inp_vflag & INP_IPV6, 1776 ("%s: not INP_IPV6 inpcb", __func__)); 1777 KASSERT(gsin6->sin6_family =3D=3D AF_INET6, 1778 ("%s: not AF_INET6 group", __func__)); 1779 KASSERT(IN6_IS_ADDR_MULTICAST(&gsin6->sin6_addr), 1780 ("%s: not multicast", __func__)); If I look at gsin6->sin6_addr inside kgdb, I see: (kgdb) p gsin6->sin6_addr $1 =3D {__u6_addr =3D {__u6_addr8 =3D "\000\000\000\000\000\000\000\000\000\000=EF=BF=BD=EF=BF=BD=EF=BF=BDM|=EF= =BF=BD", __u6_addr16 =3D {0, 0, 0, 0, 0, 65535, 19951, 54652}, __u6_addr32 =3D {0, 0, 4294901760, 3581693423}}} I am not so familiar with this part of the networking code. Can someone recommend where is the best place to fix this would be? Thanks. -- Craig --089e0112c51c55c2ed04e4e023dc Content-Type: text/plain; charset=US-ASCII; name="jenkins.log.txt" Content-Disposition: attachment; filename="jenkins.log.txt" Content-Transfer-Encoding: base64 X-Attachment-Id: f_hku6268f0 UnVubmluZyBmcm9tOiAvdXNyL2xvY2FsL3NoYXJlL2plbmtpbnMvamVua2lucy53YXIKQXVnIDI2 LCAyMDEzIDk6NTg6MjcgQU0gd2luc3RvbmUuTG9nZ2VyIGxvZ0ludGVybmFsCklORk86IEJlZ2lu bmluZyBleHRyYWN0aW9uIGZyb20gd2FyIGZpbGUKSmVua2lucyBob21lIGRpcmVjdG9yeTogL3Vz ci9sb2NhbC9qZW5raW5zIGZvdW5kIGF0OiBTeXN0ZW0uZ2V0UHJvcGVydHkoIkpFTktJTlNfSE9N RSIpCkF1ZyAyNiwgMjAxMyA5OjU4OjMwIEFNIHdpbnN0b25lLkxvZ2dlciBsb2dJbnRlcm5hbApJ TkZPOiBIVFRQIExpc3RlbmVyIHN0YXJ0ZWQ6IHBvcnQ9ODE4MApBdWcgMjYsIDIwMTMgOTo1ODoz MCBBTSB3aW5zdG9uZS5Mb2dnZXIgbG9nSW50ZXJuYWwKSU5GTzogQUpQMTMgTGlzdGVuZXIgc3Rh cnRlZDogcG9ydD04MDA5CkF1ZyAyNiwgMjAxMyA5OjU4OjMwIEFNIHdpbnN0b25lLkxvZ2dlciBs b2dJbnRlcm5hbApJTkZPOiBXaW5zdG9uZSBTZXJ2bGV0IEVuZ2luZSB2MC45LjEwIHJ1bm5pbmc6 IGNvbnRyb2xQb3J0PWRpc2FibGVkCkF1ZyAyNiwgMjAxMyA5OjU4OjMwIEFNIGplbmtpbnMuSW5p 
dFJlYWN0b3JSdW5uZXIkMSBvbkF0dGFpbmVkCklORk86IFN0YXJ0ZWQgaW5pdGlhbGl6YXRpb24K QXVnIDI2LCAyMDEzIDk6NTg6MzggQU0gamVua2lucy5Jbml0UmVhY3RvclJ1bm5lciQxIG9uQXR0 YWluZWQKSU5GTzogTGlzdGVkIGFsbCBwbHVnaW5zCkF1ZyAyNiwgMjAxMyA5OjU4OjM4IEFNIGpl bmtpbnMuSW5pdFJlYWN0b3JSdW5uZXIkMSBvbkF0dGFpbmVkCklORk86IFByZXBhcmVkIGFsbCBw bHVnaW5zCkF1ZyAyNiwgMjAxMyA5OjU4OjM5IEFNIGplbmtpbnMuSW5pdFJlYWN0b3JSdW5uZXIk MSBvbkF0dGFpbmVkCklORk86IFN0YXJ0ZWQgYWxsIHBsdWdpbnMKQXVnIDI2LCAyMDEzIDk6NTg6 MzkgQU0gamVua2lucy5Jbml0UmVhY3RvclJ1bm5lciQxIG9uQXR0YWluZWQKSU5GTzogQXVnbWVu dGVkIGFsbCBleHRlbnNpb25zCkF1ZyAyNiwgMjAxMyA5OjU4OjM5IEFNIGplbmtpbnMuSW5pdFJl YWN0b3JSdW5uZXIkMSBvbkF0dGFpbmVkCklORk86IExvYWRlZCBhbGwgam9icwpBdWcgMjYsIDIw MTMgOTo1ODo0OCBBTSBvcmcuamVua2luc2NpLm1haW4ubW9kdWxlcy5zc2hkLlNTSEQgc3RhcnQK SU5GTzogU3RhcnRlZCBTU0hEIGF0IHBvcnQgMTk2NzIKQXVnIDI2LCAyMDEzIDk6NTg6NDggQU0g amVua2lucy5Jbml0UmVhY3RvclJ1bm5lciQxIG9uQXR0YWluZWQKSU5GTzogQ29tcGxldGVkIGlu aXRpYWxpemF0aW9uCkF1ZyAyNiwgMjAxMyA5OjU4OjQ4IEFNIGh1ZHNvbi5UY3BTbGF2ZUFnZW50 TGlzdGVuZXIgPGluaXQ+CklORk86IEpOTFAgc2xhdmUgYWdlbnQgbGlzdGVuZXIgc3RhcnRlZCBv biBUQ1AgcG9ydCA1NDY4NwpBdWcgMjYsIDIwMTMgOTo1ODo0OCBBTSBodWRzb24uVURQQnJvYWRj YXN0VGhyZWFkIHJ1bgpXQVJOSU5HOiBVRFAgaGFuZGxpbmcgcHJvYmxlbQpqYXZhLm5ldC5Tb2Nr ZXRFeGNlcHRpb246IEludmFsaWQgYXJndW1lbnQKCWF0IGphdmEubmV0LlBsYWluRGF0YWdyYW1T b2NrZXRJbXBsLmpvaW4oTmF0aXZlIE1ldGhvZCkKCWF0IGphdmEubmV0LkFic3RyYWN0UGxhaW5E YXRhZ3JhbVNvY2tldEltcGwuam9pbihBYnN0cmFjdFBsYWluRGF0YWdyYW1Tb2NrZXRJbXBsLmph dmE6MTY4KQoJYXQgamF2YS5uZXQuTXVsdGljYXN0U29ja2V0LmpvaW5Hcm91cChNdWx0aWNhc3RT b2NrZXQuamF2YTozMDApCglhdCBodWRzb24uVURQQnJvYWRjYXN0VGhyZWFkLnJ1bihVRFBCcm9h ZGNhc3RUaHJlYWQuamF2YTo3NikKQXVnIDI2LCAyMDEzIDk6NTg6NDggQU0gaHVkc29uLldlYkFw cE1haW4kMyBydW4KSU5GTzogSmVua2lucyBpcyBmdWxseSB1cCBhbmQgcnVubmluZwpBdWcgMjYs IDIwMTMgOTo1ODo0OCBBTSBodWRzb24uRE5TTXVsdGlDYXN0JDEgY2FsbApXQVJOSU5HOiBGYWls ZWQgdG8gYWR2ZXJ0aXNlIHRoZSBzZXJ2aWNlIHRvIEROUyBtdWx0aS1jYXN0CmphdmEubmV0LlNv 
Y2tldEV4Y2VwdGlvbjogSW52YWxpZCBhcmd1bWVudAoJYXQgamF2YS5uZXQuUGxhaW5EYXRhZ3Jh bVNvY2tldEltcGwuam9pbihOYXRpdmUgTWV0aG9kKQoJYXQgamF2YS5uZXQuQWJzdHJhY3RQbGFp bkRhdGFncmFtU29ja2V0SW1wbC5qb2luKEFic3RyYWN0UGxhaW5EYXRhZ3JhbVNvY2tldEltcGwu amF2YToxNjgpCglhdCBqYXZhLm5ldC5NdWx0aWNhc3RTb2NrZXQuam9pbkdyb3VwKE11bHRpY2Fz dFNvY2tldC5qYXZhOjMwMCkKCWF0IGphdmF4LmptZG5zLmltcGwuSm1ETlNJbXBsLm9wZW5NdWx0 aWNhc3RTb2NrZXQoSm1ETlNJbXBsLmphdmE6NDU5KQoJYXQgamF2YXguam1kbnMuaW1wbC5KbURO U0ltcGwuPGluaXQ+KEptRE5TSW1wbC5qYXZhOjQyMCkKCWF0IGphdmF4LmptZG5zLkptRE5TLmNy ZWF0ZShKbUROUy5qYXZhOjYwKQoJYXQgaHVkc29uLkROU011bHRpQ2FzdCQxLmNhbGwoRE5TTXVs dGlDYXN0LmphdmE6MzIpCglhdCBqYXZhLnV0aWwuY29uY3VycmVudC5GdXR1cmVUYXNrJFN5bmMu aW5uZXJSdW4oRnV0dXJlVGFzay5qYXZhOjMzNCkKCWF0IGphdmEudXRpbC5jb25jdXJyZW50LkZ1 dHVyZVRhc2sucnVuKEZ1dHVyZVRhc2suamF2YToxNjYpCglhdCBqYXZhLnV0aWwuY29uY3VycmVu dC5UaHJlYWRQb29sRXhlY3V0b3IucnVuV29ya2VyKFRocmVhZFBvb2xFeGVjdXRvci5qYXZhOjEx NDYpCglhdCBqYXZhLnV0aWwuY29uY3VycmVudC5UaHJlYWRQb29sRXhlY3V0b3IkV29ya2VyLnJ1 bihUaHJlYWRQb29sRXhlY3V0b3IuamF2YTo2MTUpCglhdCBqYXZhLmxhbmcuVGhyZWFkLnJ1bihU aHJlYWQuamF2YTo2NzkpCg== --089e0112c51c55c2ed04e4e023dc-- From owner-freebsd-net@FreeBSD.ORG Mon Aug 26 23:30:06 2013 Return-Path: Delivered-To: net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id 846071BE; Mon, 26 Aug 2013 23:30:06 +0000 (UTC) (envelope-from adrian.chadd@gmail.com) Received: from mail-qc0-x22c.google.com (mail-qc0-x22c.google.com [IPv6:2607:f8b0:400d:c01::22c]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 00AFC28E3; Mon, 26 Aug 2013 23:30:05 +0000 (UTC) Received: by mail-qc0-f172.google.com with SMTP id a1so2173567qcx.31 for ; Mon, 26 Aug 2013 16:30:05 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; 
h=mime-version:sender:in-reply-to:references:date:message-id:subject :from:to:cc:content-type; bh=PjAtJXlrewkZ3Vaflw80IOGnnmAuRAJ0xiqdsy8g1hw=; b=YYgq+xPMVnCMyiMDmvOJmetxgcC0NXZW0VCahrdDtcfBnrh9RnM6t9GLWOJZA8AI6+ M6yh6uTx4WVOjgH0df/Thtzj/nahqHLf2de3BalL9EqEOG7iBMnfZvMw1hznyxHXhZUU cZ8lRQlsUy7WHqGTg3r1vOPhmMzri8ZnweHkDBXFc/ENMoj+wRWPxZiBUfvCYYUjyaBq ZcKLfB63dYzh5LLgTd/AI6oMptv826NCb6ydDg6gQNLgM/r7xM3VCCDw/Qp5nsEnCTZm GJX6ppm+hdltn17GtH1Lj/l+LqNEdMkXUv6Ng5tqHL77N88KA+p4+nSFhDYqlYu/JvX2 WIAA== MIME-Version: 1.0 X-Received: by 10.224.166.129 with SMTP id m1mr18952146qay.46.1377559805194; Mon, 26 Aug 2013 16:30:05 -0700 (PDT) Sender: adrian.chadd@gmail.com Received: by 10.224.128.70 with HTTP; Mon, 26 Aug 2013 16:30:05 -0700 (PDT) In-Reply-To: <521BBD21.4070304@freebsd.org> References: <521BBD21.4070304@freebsd.org> Date: Mon, 26 Aug 2013 16:30:05 -0700 X-Google-Sender-Auth: ZViB53TuZ52B82R6Y-XGm7gykB0 Message-ID: Subject: Re: Flow ID, LACP, and igb From: Adrian Chadd To: Andre Oppermann Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.14 Cc: Jack F Vogel , "Justin T. Gibbs" , Alan Somers , "net@freebsd.org" X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 26 Aug 2013 23:30:06 -0000 ... is there any reason we wouldn't want to have the TX and RX for a given flow mapped to the same core? 
-adrian From owner-freebsd-net@FreeBSD.ORG Mon Aug 26 23:39:36 2013 Return-Path: Delivered-To: net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id A73FD8D7; Mon, 26 Aug 2013 23:39:36 +0000 (UTC) (envelope-from jfvogel@gmail.com) Received: from mail-ve0-x22d.google.com (mail-ve0-x22d.google.com [IPv6:2607:f8b0:400c:c01::22d]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 083FC295B; Mon, 26 Aug 2013 23:39:35 +0000 (UTC) Received: by mail-ve0-f173.google.com with SMTP id cy12so2585017veb.18 for ; Mon, 26 Aug 2013 16:39:35 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=StaHPv3JaA2uyd8RA6vLbGBVTa1E9rMB1u4HNXD3vOQ=; b=hDqkC5SaVQNsl7gNhW2Vrp5mPD4K25h+3U2XriaDewNvl4bv7VGxFif6XPuq3uXTqy ZCGkNY5WopySG0HebUkoMN3aSy4YBNVQ0C2e7AEeZAbzRvGxceXLL8abJUdw9CEgFRzO NiEALV+NtfXi1fpxmF74ugm68MweO24EM4jARHvGs7Wp1i5E618JURSTd6Hq4iDOeNVK GRr0ZnlXsqoiEhCYNh6NWEn5h7Ji6AXJ5Yr6hyUo5t9U1zu3IgoStMtA4GI3OQ4OPRYE Ga1Mx/eLpTQsXGq8WiUbwQUjzEGeO0EogkHHvHJnalVQxtdrYp50SzwWkDwcM6PTvJcu m9Pg== MIME-Version: 1.0 X-Received: by 10.52.96.100 with SMTP id dr4mr14105240vdb.17.1377560375179; Mon, 26 Aug 2013 16:39:35 -0700 (PDT) Received: by 10.220.159.141 with HTTP; Mon, 26 Aug 2013 16:39:35 -0700 (PDT) In-Reply-To: References: <521BBD21.4070304@freebsd.org> Date: Mon, 26 Aug 2013 16:39:35 -0700 Message-ID: Subject: Re: Flow ID, LACP, and igb From: Jack Vogel To: Adrian Chadd Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.14 Cc: Jack F Vogel , "Justin T. 
Gibbs" , Andre Oppermann , Alan Somers , "net@freebsd.org" X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 26 Aug 2013 23:39:36 -0000 None that I can think of. On Mon, Aug 26, 2013 at 4:30 PM, Adrian Chadd wrote: > ... is there any reason we wouldn't want to have the TX and RX for a given > flow mapped to the same core? > > > > > -adrian > From owner-freebsd-net@FreeBSD.ORG Mon Aug 26 23:56:01 2013 Return-Path: Delivered-To: net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id 0CE1DB98 for ; Mon, 26 Aug 2013 23:56:01 +0000 (UTC) (envelope-from scott4long@yahoo.com) Received: from nm8-vm9.bullet.mail.gq1.yahoo.com (nm8-vm9.bullet.mail.gq1.yahoo.com [98.136.218.232]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id A77A72A16 for ; Mon, 26 Aug 2013 23:56:00 +0000 (UTC) Received: from [216.39.60.183] by nm8.bullet.mail.gq1.yahoo.com with NNFMP; 26 Aug 2013 23:49:36 -0000 Received: from [98.136.164.78] by tm19.bullet.mail.gq1.yahoo.com with NNFMP; 26 Aug 2013 23:49:36 -0000 Received: from [127.0.0.1] by smtp240.mail.gq1.yahoo.com with NNFMP; 26 Aug 2013 23:49:36 -0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=yahoo.com; s=s1024; t=1377560976; bh=HiPf+RNrZ8RdfvmPRPsHoQcxBHO0clAK5x/ApCGySEA=; h=X-Yahoo-Newman-Id:X-Yahoo-Newman-Property:X-YMail-OSG:X-Yahoo-SMTP:X-Rocket-Received:Content-Type:Mime-Version:Subject:From:In-Reply-To:Date:Cc:Content-Transfer-Encoding:Message-Id:References:To:X-Mailer; 
b=YIqirAv/TKr/JTD0IrjNMHu5VdnOCBvwyKDxRbnVc+p+Pyd0Ed9KXv4/GzcPFwDfuLaYcffP1ndiacfjouIo9OIUPC3j0RqHSnNaYt28+qkWkDzo/+9oaC/j5nWUZ15jR1fJj4Www3ZLkFNpZbDWaE/9TaIKYFvARRumzhQhOeA= X-Yahoo-Newman-Id: 154045.52281.bm@smtp240.mail.gq1.yahoo.com X-Yahoo-Newman-Property: ymail-3 X-YMail-OSG: Yr.ymhkVM1mApmCJ_g2pzV0ERP7ONfVs4DfhU3s8vZ7awoH An0Ommj3NXu1yKAYwF.T9z1YJuthiuFFHKfdJ6dwsg6kipOwbNaob3KyItkY hfe.s7YNLr3NcIynOXgQBWI6hjY8Oxc6ZdHSM_LtNfTqF.M05SjwxzfYg6Az vJdSQ2jN9AwMN5BakKVynPKwHJy5zdHFqWVtp.nyHWf_8UXHYWAEusvnuiah k0UWUJa7msqUsN9SfUCmFaTz9lU03KbP4rr9EEkRiLVjZDMOAITqwAg5C755 .U7lzsmejCk8x5YEP34vhCyYX6U_TUhBMOM3ZIWmygRcTIl3fWwhY0my4WDB HpIyw5wRT_Mv2U95qI96Hno3qN8xlVjOSjbVoSdBvSolEgzHt6RTae4OKpRq GhmUnYoY8CuOxDC2wojuGSa4t1TZkSxl9q4YhtKqe264I3XDnR3m8uR7PFdh 32MzZpBuIgOSEyVFdAsI3m5Wrk_9Q7GVv4WU_b..Bj_CigB9OUQi094M9LYG _DPiXM1CmP3zlwXpiTuhDWX_XpIE_pB6NYWDJvBhgX3AEBhwO10i20wOE9TE rP7w- X-Yahoo-SMTP: clhABp.swBB7fs.LwIJpv3jkWgo2NU8- X-Rocket-Received: from lgwl-achen.corp.netflix.com (scott4long@69.53.237.126 with ) by smtp240.mail.gq1.yahoo.com with SMTP; 26 Aug 2013 23:49:36 +0000 UTC Content-Type: text/plain; charset=us-ascii Mime-Version: 1.0 (Mac OS X Mail 6.5 \(1508\)) Subject: Re: Flow ID, LACP, and igb From: Scott Long In-Reply-To: Date: Mon, 26 Aug 2013 17:49:34 -0600 Content-Transfer-Encoding: quoted-printable Message-Id: <2F36A2B1-A2AF-4331-BF2A-144915BEE706@yahoo.com> References: <521BBD21.4070304@freebsd.org> To: Adrian Chadd , Andre Oppermann X-Mailer: Apple Mail (2.1508) Cc: Jack F Vogel , Justin Gibbs , Alan Somers , net@freebsd.org X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 26 Aug 2013 23:56:01 -0000 On Aug 26, 2013, at 5:30 PM, Adrian Chadd wrote: > ... is there any reason we wouldn't want to have the TX and RX for a = given > flow mapped to the same core? 
>=20 Given than an inbound ACK is likely to be turned into an outbound = segment from within the same execution context and CPU instance, I can't imagine = why it would be useful for these flows to be different. However, I'm still = a n00b at this networking stuff, so please correct me if I'm wrong. Scott From owner-freebsd-net@FreeBSD.ORG Tue Aug 27 00:30:54 2013 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id 6991FF9F; Tue, 27 Aug 2013 00:30:54 +0000 (UTC) (envelope-from lists@rewt.org.uk) Received: from hosted.mx.as41113.net (abby.lhr1.as41113.net [91.208.177.20]) by mx1.freebsd.org (Postfix) with ESMTP id F301A2BCF; Tue, 27 Aug 2013 00:30:52 +0000 (UTC) Received: from jwhlaptop (unknown [91.208.177.70]) (using TLSv1.2 with cipher AES128-SHA256 (128/128 bits)) (No client certificate requested) (Authenticated sender: lists@rewt.org.uk) by hosted.mx.as41113.net (Postfix) with ESMTPSA id 3cP9wh24Gdz63; Tue, 27 Aug 2013 01:30:43 +0100 (BST) From: "Joe Holden" To: , "'Julian Elischer'" References: <521670FF.6080407@delphij.net> <20130826.203744.2304902117196747104.hrs@allbsd.org> <521B9A1B.7080908@freebsd.org> <521BA31C.5000807@delphij.net> In-Reply-To: <521BA31C.5000807@delphij.net> Subject: RE: Why default route is not installed last? 
Date: Tue, 27 Aug 2013 01:30:34 +0100 Message-ID: <1e7801cea2bc$a60acc80$f2206580$@rewt.org.uk> X-Mailer: Microsoft Outlook 15.0 Thread-Index: AQGgX+Eylv9X6MbbhKK+2kmDsedzAQL7Aov4AjQVfu4C+Dvc8AGD+cxAmbdetHA= Content-Language: en-gb Cc: 'Kimmo Paasiala' , 'Hiroki Sato' , freebsd-rc@freebsd.org, 'FreeBSD Net' X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 27 Aug 2013 00:30:54 -0000 A whole extra line is required in rc.conf to make that situation work and since it is an edge case and doesn't apply in 99% of uses it really shouldn't be catered for... but what do I know? There has been a few insane changes recently ;) > -----Original Message----- > From: owner-freebsd-net@freebsd.org [mailto:owner-freebsd- > net@freebsd.org] On Behalf Of Xin Li > Sent: 26 August 2013 19:49 > To: Julian Elischer > Cc: Kimmo Paasiala; Hiroki Sato; freebsd-rc@freebsd.org; d@delphij.net; > FreeBSD Net > Subject: Re: Why default route is not installed last? > > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA512 > > On 08/26/13 11:10, Julian Elischer wrote: > > On 8/26/13 7:56 PM, Kimmo Paasiala wrote: > >> On Mon, Aug 26, 2013 at 2:37 PM, Hiroki Sato > >> wrote: > >>> Xin Li wrote in > >>> <521670FF.6080407@delphij.net>: > >>> > >>> de> -----BEGIN PGP SIGNED MESSAGE----- de> Hash: SHA512 de> de> > >>> Hi, de> de> I've noticed that we do not install default route last > >>> (after other de> static routes). I think we should probably install > >>> it last, since the de> administrator may legitimately configure a > >>> static route (e.g. this de> IPv6 address goes to this interface) > >>> that is required by the default de> route. > >>> > >>> Do you have an example? I could imagine some theoretically but > >>> personally think that the default route which depends on a static > >>> route is one which should be avoided. 
> >>> > >>> -- Hiroki > >> Isn't that the case when the default gateway address is on a > >> different subnet than the address assigned to the interface? Such set > >> ups are admittedly odd but they should be possible on FreeBSD as well > >> as on other OSes. > > That has always been specifically not supported. default route needs > > to be directly attached. in fact the routing tables only ever deliver > > the 'next hop' > > Well, depends on whether the 'next hop' is an IP or an interface. For > instance one can have a valid configuration that they have a static route of: > > 2607:5300:XXXX:XXXX:ff:ff:ff:ff -prefixlen 128 -interface em0 > > Then have 2607:5300:XXXX:XXXX:ff:ff:ff:ff as default router. > > This configuration is not possible with the current rc.d startup order. > > Cheers, > - -- > Xin LI https://www.delphij.net/ > FreeBSD - The Power to Serve! Live free or die > -----BEGIN PGP SIGNATURE----- > Version: GnuPG v2.0.21 (FreeBSD) > > iQEcBAEBCgAGBQJSG6MbAAoJEG80Jeu8UPuzAYMH/2K+wa2I2jexZourxzPgH > 25X > OWxsxZgAwd/rEbsbm/0r0ApzGLNm7WQaXaBuNk+u9G9DWOLSTh1M/axRD > Aez4vOC > EJiOfMQxMXlK7uBuA+1cUUrFbrPN4bNaRKY4DvSMWocd3x9T2CrxGaT9Y2SO > 6Q2g > 1x2xSH63MXxebFaaT7nXqLLfpT4IK7yCOWPSXatBdZyZXAZh2ePa7wP4JX/Ti4O > N > IFE6IQwOs9q+w8EiyzLMtoqpZTt882Zw8beDmKMj7On+yXsw48+ryZF54kVu8 > +Sz > dEwdvuKlXWB8FVWRz5gYbAOePq3XqCLeOuMZ5b6eIiHwhlY184nw2A94ahq > VRGE= > =27i9 > -----END PGP SIGNATURE----- > _______________________________________________ > freebsd-net@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-net > To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org" From owner-freebsd-net@FreeBSD.ORG Tue Aug 27 07:28:03 2013 Return-Path: Delivered-To: net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id 4EBC9935 for ; Tue, 27 Aug 2013 07:28:03 +0000 
(UTC) (envelope-from andre@freebsd.org) Received: from c00l3r.networx.ch (c00l3r.networx.ch [62.48.2.2]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 9B37C21D5 for ; Tue, 27 Aug 2013 07:28:02 +0000 (UTC) Received: (qmail 11868 invoked from network); 27 Aug 2013 08:10:05 -0000 Received: from c00l3r.networx.ch (HELO [127.0.0.1]) ([62.48.2.2]) (envelope-sender ) by c00l3r.networx.ch (qmail-ldap-1.03) with SMTP for ; 27 Aug 2013 08:10:05 -0000 Message-ID: <521C54FD.2060109@freebsd.org> Date: Tue, 27 Aug 2013 09:27:57 +0200 From: Andre Oppermann User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:17.0) Gecko/20130801 Thunderbird/17.0.8 MIME-Version: 1.0 To: Adrian Chadd Subject: Re: Flow ID, LACP, and igb References: <521BBD21.4070304@freebsd.org> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: Jack F Vogel , "Justin T. Gibbs" , Alan Somers , "net@freebsd.org" X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 27 Aug 2013 07:28:03 -0000 On 27.08.2013 01:30, Adrian Chadd wrote: > ... is there any reason we wouldn't want to have the TX and RX for a given flow mapped to the same core? They are. Thing is the inbound and outbound packet flow id's are totally independent from each other. The inbound one determines the RX ring it will take to go up the stack. If that's bound to a core that's fine and gives affinity. If the socket and user-space application are bound to the same core as well, there is full affinity. Now on the way down the core doing the write to the socket matters entering the kernel. It stays there until the packet is generated (in tcp_output for example). 
The flow id of the packet doesn't matter at all so far because it is filled in only then. Now the packet goes down the stack, and the flow id is only used at the end, when the stack has to pick an outbound TX queue based on it. This outbound TX ring doesn't have to be the same one the flow came in on, as long as it stays the same for that flow, to prevent reordering.

This fixes Justin's issue with if_lagg and poor balancing. He can simply choose a good hash for the packets going out and stop worrying about it. More importantly, he's no longer hostage to random switches with poor hashing.

Ultimately you could try to bind the TX ring to a particular CPU as well and try to run it lockless. That is fraught with some difficult problems though. First, you must have exactly as many RX/TX queues as cores. That's often not the case, as many cards support only a limited number of rings. Then, for packets generated locally (think DNS query over UDP), you either simply stick to the local CPU-assigned queue to send without looking at the computed flow id, or you have to switch cores to send the packet on the correct queue. Such a very strong core binding is typically only really useful in embarrassingly parallel applications that only do packet pushing. If your application is also compute-intensive, you may want some more flexibility to schedule threads to prevent stalls from busy cores. In that case not binding TX to a core is a win. So we will pretty much end up with one lock per TX ring to protect the DMA descriptor structures.

We're still a long way from having to worry about this TX issue. The big win is the RX queue - socket - application affinity (to the same core).
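The TX side described above (a flow id filled in when the packet is generated, then used only to pick an outbound ring) can be illustrated with a minimal userland sketch. This is not the FreeBSD implementation: `struct flow_tuple`, `flow_id()` and `tx_ring_for_flow()` are hypothetical names, and FNV-1a is used here only as a stand-in for whatever hash the stack or driver actually applies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 4-tuple describing one flow; not a FreeBSD structure. */
struct flow_tuple {
	uint32_t src_ip;
	uint32_t dst_ip;
	uint16_t src_port;
	uint16_t dst_port;
};

/* FNV-1a over a byte buffer, continuing from the running hash h. */
static uint32_t
fnv1a(uint32_t h, const void *buf, size_t len)
{
	const uint8_t *p = buf;

	for (size_t i = 0; i < len; i++) {
		h ^= p[i];
		h *= 16777619u;		/* 32-bit FNV prime */
	}
	return (h);
}

/* Assumed flow id: a hash over the tuple fields (stand-in hash). */
static uint32_t
flow_id(const struct flow_tuple *ft)
{
	uint32_t h = 2166136261u;	/* 32-bit FNV offset basis */

	/* Hash field by field so struct padding never enters the hash. */
	h = fnv1a(h, &ft->src_ip, sizeof(ft->src_ip));
	h = fnv1a(h, &ft->dst_ip, sizeof(ft->dst_ip));
	h = fnv1a(h, &ft->src_port, sizeof(ft->src_port));
	h = fnv1a(h, &ft->dst_port, sizeof(ft->dst_port));
	return (h);
}

/*
 * Pick the outbound ring. The same flow id always yields the same ring,
 * which is what prevents reordering within a flow.
 */
static unsigned
tx_ring_for_flow(uint32_t flowid, unsigned num_tx_rings)
{
	assert(num_tx_rings > 0);
	return (flowid % num_tx_rings);
}
```

Nothing here ties the TX ring to the RX ring the flow arrived on; the only property the sketch relies on is that the mapping is stable per flow.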
-- Andre From owner-freebsd-net@FreeBSD.ORG Tue Aug 27 08:13:35 2013 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id 8749649B; Tue, 27 Aug 2013 08:13:35 +0000 (UTC) (envelope-from bu7cher@yandex.ru) Received: from forward16.mail.yandex.net (forward16.mail.yandex.net [IPv6:2a02:6b8:0:1402::1]) by mx1.freebsd.org (Postfix) with ESMTP id EB568250A; Tue, 27 Aug 2013 08:13:34 +0000 (UTC) Received: from smtp18.mail.yandex.net (smtp18.mail.yandex.net [95.108.252.18]) by forward16.mail.yandex.net (Yandex) with ESMTP id 5082AD21BB3; Tue, 27 Aug 2013 12:13:32 +0400 (MSK) Received: from smtp18.mail.yandex.net (localhost [127.0.0.1]) by smtp18.mail.yandex.net (Yandex) with ESMTP id E0C4F18A0959; Tue, 27 Aug 2013 12:13:31 +0400 (MSK) Received: from v10-165-45.yandex.net (v10-165-45.yandex.net [84.201.165.45]) by smtp18.mail.yandex.net (nwsmtp/Yandex) with ESMTP id dtrAOescQA-DVn0DlCD; Tue, 27 Aug 2013 12:13:31 +0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=yandex.ru; s=mail; t=1377591211; bh=J0WjbDTFYBH2zd4iHxxRlP5oB4p3hNtVD2ToVXys0xo=; h=Message-ID:Date:From:User-Agent:MIME-Version:To:CC:Subject: References:In-Reply-To:X-Enigmail-Version:Content-Type; b=DOTULRsFt4EJAC4jdkTkIHPoUlkkHy7/bScmMP9dyyXhu81EcAP4wYDrEYgy7Sl9F Fszj1lcdyy9GwyoB8Ef/aK1bQzG/bepHl5jBduHhguH4d8Tmnj7DAia2V7evujGi3r MzwZHv7M9yh/cGNqd59jWk7BQrMVGmmivws37exY= Authentication-Results: smtp18.mail.yandex.net; dkim=pass header.i=@yandex.ru Message-ID: <521C5EC2.1060901@yandex.ru> Date: Tue, 27 Aug 2013 12:09:38 +0400 From: "Andrey V. Elsukov" User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:17.0) Gecko/17.0 Thunderbird/17.0 MIME-Version: 1.0 To: Craig Rodrigues Subject: Re: devel/jenkins port not starting. 
Kernel panic in IPv6 multicast code References: In-Reply-To: X-Enigmail-Version: 1.4.6 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="------------enig1C98A100C131DF04096A6AEA" Cc: freebsd-net@freebsd.org, bms@freebsd.org, lwhsu@freebsd.org X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 27 Aug 2013 08:13:35 -0000 This is an OpenPGP/MIME signed message (RFC 2440 and 3156) --------------enig1C98A100C131DF04096A6AEA Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable
On 27.08.2013 01:07, Craig Rodrigues wrote:
> Hi,
>
> On box 2, since this is a debug kernel with WITNESS and INVARIANTS
> enabled, I get a kernel panic. (see attached core.txt.gz)

It seems the log was stripped by the mailing list.

> The panic occurs here on line 1779:
>
> 1768 static struct ifnet *
> 1769 in6p_lookup_mcast_ifp(const struct inpcb *in6p,
> 1770     const struct sockaddr_in6 *gsin6)
> 1771 {
> 1772         struct route_in6 ro6;
> 1773         struct ifnet *ifp;
> 1774
> 1775         KASSERT(in6p->inp_vflag & INP_IPV6,
> 1776             ("%s: not INP_IPV6 inpcb", __func__));
> 1777         KASSERT(gsin6->sin6_family == AF_INET6,
> 1778             ("%s: not AF_INET6 group", __func__));
> 1779         KASSERT(IN6_IS_ADDR_MULTICAST(&gsin6->sin6_addr),
> 1780             ("%s: not multicast", __func__));
>
> If I look at gsin6->sin6_addr inside kgdb,
> I see:
>
> (kgdb) p gsin6->sin6_addr
> $1 = {__u6_addr = {__u6_addr8 =
> "\000\000\000\000\000\000\000\000\000\000\377\377\357M|\325", __u6_addr16 = {0, 0, 0,
> 0, 0, 65535, 19951, 54652}, __u6_addr32 = {0, 0,
> 4294901760, 3581693423}}}
>
>
> I am not so familiar with this part of the networking code.
> Can someone recommend where the best place to fix
> this would be?

AFAIR, I already saw a similar report here. 
This is V4 mapped IPv6 address ::ffff:239.77.124.213. I guess application is trying to use setsockopt with IPV6_JOIN_GROUP option. And since outgoing interface isn't specified, the kernel is trying to determine it from routing table. But this mapped address triggers assert in in6p_lookup_mcast_ifp() function. It seems to me, that v4mapped addresses isn't supported in the multicast code. If you remove KASSERT from in6p_lookup_mcast_ifp(), this address will be treated as invalid later. --=20 WBR, Andrey V. Elsukov --------------enig1C98A100C131DF04096A6AEA Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.19 (FreeBSD) iQEcBAEBAgAGBQJSHF7IAAoJEAHF6gQQyKF6BVEIAJFM89HyzlmLQm9ybqPJafDU Rj1v5ePz+uf5gXXJiVQGCrTTydvm/SJyIITybZ6g/fSbxZkgu9hXDm8fKtuppVU3 hvhJWlQVTVHw/Khr8/HUXMHdYnEk04K3yTRP5B/wcoPAyh9bO3usrcHboFE8dqWY fmubgNjNhIylBfeOi7lDh3i4li1NQH4xgck/gJ+kePmkqrJxu05f/umRn28s7xyj ndAes6gIifV5729Vli6lmS0t95SXXcwPQk+x4b0krDuVRDmgUORXLNC4v/co2PFx BioxSsEzk8N1I06NvTdfD9it4uUMdyNO1OyeztkejZsPzKAXD+CwuDF1G/+qe8Q= =icq3 -----END PGP SIGNATURE----- --------------enig1C98A100C131DF04096A6AEA-- From owner-freebsd-net@FreeBSD.ORG Tue Aug 27 22:27:51 2013 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id 2C67C54B for ; Tue, 27 Aug 2013 22:27:51 +0000 (UTC) (envelope-from lists@jnielsen.net) Received: from ns1.jnielsen.net (secure.freebsdsolutions.net [69.55.234.48]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 0FA3A27A1 for ; Tue, 27 Aug 2013 22:27:50 +0000 (UTC) Received: from [10.10.1.32] (office.betterlinux.com [199.58.199.60]) 
(authenticated bits=0) by ns1.jnielsen.net (8.14.4/8.14.4) with ESMTP id r7RMQWKI062349 (version=TLSv1/SSLv3 cipher=AES128-SHA bits=128 verify=NOT); Tue, 27 Aug 2013 18:26:33 -0400 (EDT) (envelope-from lists@jnielsen.net) Content-Type: text/plain; charset=us-ascii Mime-Version: 1.0 (Mac OS X Mail 6.5 \(1508\)) Subject: Re: Options to monitor/sniff network traffic under a vm From: John Nielsen In-Reply-To: <5219ECBD.4040209@gmail.com> Date: Tue, 27 Aug 2013 16:26:34 -0600 Content-Transfer-Encoding: quoted-printable Message-Id: References: <5219ECBD.4040209@gmail.com> To: carlopmart X-Mailer: Apple Mail (2.1508) X-DCC-sonic.net-Metrics: ns1.jnielsen.net 1156; Body=2 Fuz1=2 Fuz2=2 X-Virus-Scanned: clamav-milter 0.97.8 at ns1.jnielsen.net X-Virus-Status: Clean Cc: freebsd-net@freebsd.org X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 27 Aug 2013 22:27:51 -0000 On Aug 25, 2013, at 5:38 AM, carlopmart wrote: > I need to monitor/sniff network traffic for three subnets (1 GiB nets) and I need to do this using a virtual guest under an ESXi 5 host (yes, it is a "handicap"). Not sure about your questions below, but doesn't ESXi 5 support port mirroring in the virtual switch? That seems like a better place to do most of the heavy lifting. You could still attach your FreeBSD instance to the monitor port(s) for analysis. That would hopefully help at least with a) by reducing the number of virtual NICs needed. > I would like to use FreeBSD 8.4 + netmap, but I see some problems: > > a) How can I avoid sharing interrupts for nics interfaces?? This vm needs to use 6 nic interfaces. > > b) Which is best: em or ixgb emulated drivers?? > > c) Is it a good idea to enable polling in these nics?? 
From owner-freebsd-net@FreeBSD.ORG Wed Aug 28 06:10:08 2013 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id 297A915F for ; Wed, 28 Aug 2013 06:10:08 +0000 (UTC) (envelope-from carlopmart@gmail.com) Received: from mail-wg0-x234.google.com (mail-wg0-x234.google.com [IPv6:2a00:1450:400c:c00::234]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id B5C662CFC for ; Wed, 28 Aug 2013 06:10:07 +0000 (UTC) Received: by mail-wg0-f52.google.com with SMTP id l18so4226511wgh.31 for ; Tue, 27 Aug 2013 23:10:05 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :content-type:content-transfer-encoding; bh=YV5nPs9z78furruCfwwpIiHAxlQ2gFzgdoGE7b2YIp0=; b=msBgt7FuEJyNDIRCA1mjC/XNE6cEaJ3hYRf3YCO6w5ua6X1Ddtew3DwGWEu/SC1+r1 TZYt1ur8wMlAl3AGD0r7T9pK+LaCzkp+N5ZeoSL+f6+RbhpXf3CeciifOEwUjWLVfJt+ l9L8qhoLU5FGFY0UzM+JkFFH0G3AhCpo3VYJFRmDEvKkTftucx4DQR5qlrZCZOIgMd0u VMD9quhf3Xmive7MqVQO68/F6dE8fiIbXpJvzkE+WfDClBwGw5Ihmg0jGLXS2+DBlOv3 ZtzawFHNMm2Zyjz3+nFFphMrheFEu466HWppimrDSagzeb86pxRxKP1+VWUxA1vwvXrd 4bkw== MIME-Version: 1.0 X-Received: by 10.194.120.68 with SMTP id la4mr9296316wjb.33.1377670205945; Tue, 27 Aug 2013 23:10:05 -0700 (PDT) Received: by 10.194.46.33 with HTTP; Tue, 27 Aug 2013 23:10:05 -0700 (PDT) In-Reply-To: References: <5219ECBD.4040209@gmail.com> Date: Wed, 28 Aug 2013 06:10:05 +0000 Message-ID: Subject: Re: Options to monitor/sniff network traffic under a vm From: "C. L. 
Martinez" To: freebsd-net@freebsd.org Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 28 Aug 2013 06:10:08 -0000 On Tue, Aug 27, 2013 at 10:26 PM, John Nielsen wrote: > On Aug 25, 2013, at 5:38 AM, carlopmart wrote: > >> I need to monitor/sniff network traffic for three subnets (1 GiB nets) and I need to do this using a virtual guest under an ESXi 5 host (yes, it is a "handicap"). > > Not sure about your questions below, but doesn't ESXi 5 support port mirroring in the virtual switch? That seems like a better place to do most of the heavy lifting. You could still attach your FreeBSD instance to the monitor port(s) for analysis. That would hopefully help at least with a) by reducing the number of virtual NICs needed. > Thanks John for your answer, but I can't use distributed switches on this ESXi server because it is a standalone server (distributed vswitches are only available when you manage more than two ESXi servers using clustering features, and they are the only option for port mirroring). On a standalone server you can enable promiscuous mode on a vswitch and use an external tap to see all traffic, but that's not actually the problem: I can see all traffic in this FreeBSD VM. About NICs: I can't reduce the number of virtual NICs. I need to use six to monitor six different subnets ... And here is the problem with IRQs. 
From owner-freebsd-net@FreeBSD.ORG Wed Aug 28 10:45:50 2013 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id 532D1CB5 for ; Wed, 28 Aug 2013 10:45:50 +0000 (UTC) (envelope-from misc+freebsd@talk2dom.com) Received: from mail.shmtech.biz (unknown [IPv6:2001:41c8:10:8c::4:3]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id CBD562CFC for ; Wed, 28 Aug 2013 10:45:49 +0000 (UTC) Received: from wingwang.domlan.talk2dom.com (5ac6e914.bb.sky.com [90.198.233.20] (may be forged)) (authenticated bits=0) by mail.shmtech.biz (8.14.7/8.14.5) with ESMTP id r7SAjlgw000621 (version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO) for ; Wed, 28 Aug 2013 11:45:47 +0100 (BST) (envelope-from misc+freebsd@talk2dom.com) DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=talk2dom.com; s=shmtech1; t=1377686748; bh=JKm1Nt9V/5KQ4oL2en+KhFt18KX5Eng/i55KNXFjcw0=; h=Date:From:To:Subject:References:In-Reply-To; b=0c2vUHNNkY3JZNZbLqrPJI4pDxAmexVUbwEPI/0x0/junj1V4DEmSyIzuqLEqYUVM JjoHC3edpBWLMlo13C38uzg6B/m0BGhe45IaCqQxOBhrJdw6SCVq4bJRavDlK/ryKh mW7/M/NUQy6Am6ZLCL9Kzwl+F9DFLTqHQUIWjoiY= X-Authentication-Warning: sendmail: Host 5ac6e914.bb.sky.com [90.198.233.20] (may be forged) claimed to be wingwang.domlan.talk2dom.com Message-ID: <521DD4D6.7010403@talk2dom.com> Date: Wed, 28 Aug 2013 11:45:42 +0100 From: Dom F User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:17.0) Gecko/20130809 Thunderbird/17.0.8 MIME-Version: 1.0 To: freebsd-net@freebsd.org Subject: [IPFW] [DIVERT] IP header checksums - why calculate twice? 
References: In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 28 Aug 2013 10:45:50 -0000 [Copy of my post to FreeBSD Firewalls forum sent here by suggestion from moderator] I've been toying with using IPDIVERT to adjust values in an IPv4 header. When adjusting an incoming IP header, the man page for divert(4) says: Quote: Packets written as incoming and having incorrect checksums will be dropped. My main issue was with trying to leverage the optimised kernel functions for checksumming an IP header, for example in_cksum_hdr(). Processes that connect to DIVERT sockets are based in user-land so in_cksum_hdr() isn't readily available during compile. Eventually the thought hit me that if some part of the kernel has to validate checksums (to decide whether to drop a packet) AND if my user-land process has to calculate a checksum to avoid its packet being dropped THEN surely there are two wasted checksum calculations going on? If a root-owned process, root needed for RAW socket, can be trusted to inject packets back into the IP stack then surely we can skip the checksum test and save a few CPU cycles plus a bit of latency. 
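For readers following the argument above: since in_cksum_hdr() is not available to a userland divert process, the checksum has to be computed by hand before the packet is written back. A minimal sketch of the standard ones-complement Internet checksum (RFC 1071) over an IPv4 header might look like this. The function names `ip_hdr_cksum()` and `ip_hdr_set_cksum()` are hypothetical, and the header is treated as a raw byte buffer rather than a `struct ip`, which is an assumption made to keep the sketch self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Ones-complement checksum over an IPv4 header given as raw bytes.
 * hdr_len is the header length in bytes (always a multiple of 4 for IPv4).
 * When computing a new checksum, the checksum field (bytes 10 and 11)
 * must be zero on entry. Run over a header with a valid checksum in
 * place, the result is 0.
 */
static uint16_t
ip_hdr_cksum(const uint8_t *hdr, size_t hdr_len)
{
	uint32_t sum = 0;

	/* Add the header up as big-endian 16-bit words. */
	for (size_t i = 0; i + 1 < hdr_len; i += 2)
		sum += ((uint32_t)hdr[i] << 8) | hdr[i + 1];

	/* Fold the carries back into the low 16 bits. */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);

	return ((uint16_t)~sum);
}

/* Recompute and store the checksum after the header has been modified. */
static void
ip_hdr_set_cksum(uint8_t *hdr, size_t hdr_len)
{
	uint16_t ck;

	assert(hdr_len >= 20);		/* minimum IPv4 header size */
	hdr[10] = 0;
	hdr[11] = 0;
	ck = ip_hdr_cksum(hdr, hdr_len);
	hdr[10] = (uint8_t)(ck >> 8);	/* stored in network byte order */
	hdr[11] = (uint8_t)(ck & 0xff);
}
```

A divert process would call something like `ip_hdr_set_cksum()` on the buffer just before writing the packet back to the divert socket. This is the per-packet work the post argues is redundant when the kernel then verifies the same checksum again.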
Very simple patch for /usr/src/sys/netinet/ip_divert.c (based on rev 224575): Code: --- ip_divert.c.orig 2013-08-26 20:52:18.000000000 +0100 +++ ip_divert.c 2013-08-26 20:52:44.000000000 +0100 @@ -496,6 +496,12 @@ /* Send packet to input processing via netisr */ switch (ip->ip_v) { case IPVERSION: + /* mark mbuf as having valid checksum + to save userland divert process from + calculating checksum, and kernel having + to check it */ + m->m_pkthdr.csum_flags |= CSUM_IP_CHECKED | + CSUM_IP_VALID; netisr_queue_src(NETISR_IP, (uintptr_t)so, m); break; #ifdef INET6 From owner-freebsd-net@FreeBSD.ORG Wed Aug 28 13:45:28 2013 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id 6CBD04CC for ; Wed, 28 Aug 2013 13:45:28 +0000 (UTC) (envelope-from joemoog@ebureau.com) Received: from internet06.ebureau.com (internet06.ebureau.com [65.127.24.25]) by mx1.freebsd.org (Postfix) with ESMTP id 4660C2830 for ; Wed, 28 Aug 2013 13:45:28 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by internet06.ebureau.com (Postfix) with ESMTP id 2F71A3C3ED86 for ; Wed, 28 Aug 2013 08:36:46 -0500 (CDT) X-Virus-Scanned: amavisd-new at ebureau.com Received: from internet06.ebureau.com ([127.0.0.1]) by localhost (internet06.ebureau.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id Z_U9hJ-rC1Wl for ; Wed, 28 Aug 2013 08:36:46 -0500 (CDT) Received: from nail.office.ebureau.com (nail.office.ebureau.com [10.10.20.23]) by internet06.ebureau.com (Postfix) with ESMTPSA id 0DA313C3ED7E for ; Wed, 28 Aug 2013 08:36:46 -0500 (CDT) Content-Type: text/plain; charset=iso-8859-1 Mime-Version: 1.0 (Mac OS X Mail 7.0 \(1800\)) Subject: Re: Intel 4-port ethernet adaptor link aggregation issue From: Joe Moog In-Reply-To: Date: Wed, 28 Aug 2013 08:36:45 -0500 Content-Transfer-Encoding: 
quoted-printable Message-Id: <71042F7C-5CBB-4494-B53A-EF4CE45B41BE@ebureau.com> References: To: freebsd-net X-Mailer: Apple Mail (2.1800) X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 28 Aug 2013 13:45:28 -0000 All: Thanks again to everybody for the responses and suggestions to our 4-port lagg issue. The solution (for those that may find the information of some value) was to set the value for kern.ipc.nmbclusters to a higher value than we had initially. Our previous tuning had this value set at 25600, but following a recommendation from the good folks at iXSystems we bumped this to a value closer to 2000000, and the 4-port lagg is functioning as expected now. Thank you all. Joe From owner-freebsd-net@FreeBSD.ORG Wed Aug 28 15:13:26 2013 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id 9DAADE94 for ; Wed, 28 Aug 2013 15:13:26 +0000 (UTC) (envelope-from milu@dat.pl) Received: from jab.dat.pl (dat.pl [80.51.155.34]) by mx1.freebsd.org (Postfix) with ESMTP id 5CD872FDA for ; Wed, 28 Aug 2013 15:13:26 +0000 (UTC) Received: from jab.dat.pl (jsrv.dat.pl [127.0.0.1]) by jab.dat.pl (Postfix) with ESMTP id 4DDD2C2 for ; Wed, 28 Aug 2013 17:07:55 +0200 (CEST) X-Virus-Scanned: amavisd-new at dat.pl Received: from jab.dat.pl ([127.0.0.1]) by jab.dat.pl (jab.dat.pl [127.0.0.1]) (amavisd-new, port 10024) with LMTP id g7Wo5pWo2Use for ; Wed, 28 Aug 2013 17:07:53 +0200 (CEST) Received: from [10.0.6.80] (unknown [212.69.68.42]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (No client certificate requested) by jab.dat.pl (Postfix) with ESMTPSA id 7CA715F for ; Wed, 
28 Aug 2013 17:07:53 +0200 (CEST) Message-ID: <521E1251.9040800@dat.pl> Date: Wed, 28 Aug 2013 17:08:01 +0200 From: Maciej Milewski User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130806 Thunderbird/17.0.8 MIME-Version: 1.0 To: freebsd-net Subject: LOR @netipsec/key.c:2434 Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 28 Aug 2013 15:13:26 -0000 I've observed the following LOR in my logs. Just after fresh start: lock order reversal: 1st 0x80865bb4 sptree (fast ipsec security policy database) @ /data/head-git/sys/netipsec/key.c:2434 2nd 0x80861554 rawcb (rawcb) @ /data/head-git/sys/netipsec/keysock.c:303 KDB: stack backtrace: db_trace_thread+30 (?,?,?,?) ra cd67f78800000018 sp 0 sz 0 db_trace_self+1c (?,?,?,?) ra cd67f7a000000018 sp 0 sz 0 8008f828+34 (?,?,?,?) ra cd67f7b8000001a0 sp 0 sz 0 kdb_backtrace+44 (?,?,?,?) ra cd67f95800000018 sp 0 sz 0 802defd8+34 (?,?,?,?) ra cd67f97000000020 sp 0 sz 0 witness_checkorder+b0c (?,?,8061bb24,12f) ra cd67f99000000050 sp 0 sz 1 __mtx_lock_flags+e8 (?,?,?,?) ra cd67f9e000000030 sp 0 sz 0 key_sendup_mbuf+274 (?,?,?,?) ra cd67fa1000000030 sp 0 sz 0 8044f238+150 (?,?,?,?) ra cd67fa4000000030 sp 0 sz 0 key_parse+f6c (?,?,?,?) ra cd67fa7000000180 sp 0 sz 0 key_output+334 (?,?,?,?) ra cd67fbf000000028 sp 0 sz 0 8036c314+8c (?,?,?,?) ra cd67fc1800000020 sp 0 sz 0 80451e80+28 (?,?,?,?) ra cd67fc3800000020 sp 0 sz 0 sosend_generic+4c4 (?,0,?,?) ra cd67fc5800000068 sp 1 sz 0 sosend+34 (?,?,?,?) ra cd67fcc000000028 sp 0 sz 0 kern_sendit+11c (?,?,?,?) ra cd67fce800000068 sp 0 sz 0 80307a48+b4 (?,?,?,?) ra cd67fd5000000038 sp 0 sz 0 sys_sendto+50 (?,?,?,?) ra cd67fd8800000040 sp 0 sz 0 trap+7f0 (?,?,?,?) 
ra cd67fdc8000000b8 sp 0 sz 0 MipsUserGenException+10c (?,?,?,40896240) ra cd67fe8000000000 sp 0 sz 0 pid 3035 root@RSPRO:~# ps auxwww | grep 3035 root 3035 0.0 1.3 13832 1736 - Is 2:03AM 0:00.45 /usr/local/sbin/racoon This is on MIPS Ubiquiti RSPRO board that is configured to use IPSEC transport mode with IPv6. I'm using for that racoon from ipsec-tools package. System is running currently: HEAD@r253582 I'll be happy with testing some patches if anyone knows why this happens and provides them. -- Pozdrawiam, Maciej Milewski From owner-freebsd-net@FreeBSD.ORG Wed Aug 28 18:30:46 2013 Return-Path: Delivered-To: net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id 624D2B52; Wed, 28 Aug 2013 18:30:46 +0000 (UTC) (envelope-from melifaro@yandex-team.ru) Received: from forward-corp1e.mail.yandex.net (forward-corp1e.mail.yandex.net [IPv6:2a02:6b8:0:202::10]) by mx1.freebsd.org (Postfix) with ESMTP id 3E5232D9C; Wed, 28 Aug 2013 18:30:45 +0000 (UTC) Received: from smtpcorp4.mail.yandex.net (smtpcorp4.mail.yandex.net [95.108.252.2]) by forward-corp1e.mail.yandex.net (Yandex) with ESMTP id 1CAD064006D; Wed, 28 Aug 2013 22:30:42 +0400 (MSK) Received: from smtpcorp4.mail.yandex.net (localhost [127.0.0.1]) by smtpcorp4.mail.yandex.net (Yandex) with ESMTP id 018FC2C0173; Wed, 28 Aug 2013 22:30:41 +0400 (MSK) Received: from dhcp170-36-red.yandex.net (dhcp170-36-red.yandex.net [95.108.170.36]) by smtpcorp4.mail.yandex.net (nwsmtp/Yandex) with ESMTP id fyMijftbk1-UfD0Ynrx; Wed, 28 Aug 2013 22:30:41 +0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=yandex-team.ru; s=default; t=1377714641; bh=AW6ufh6kIvAxiM+MsIJACctEm8ZbDDYBUX9jB61JP/4=; h=Message-ID:Date:From:User-Agent:MIME-Version:To:CC:Subject: Content-Type; b=u8yrzEOZBJx9qwddK0RtoZ49tkgibPrkr2JZ+Jf85+Jaj0CrmBmy77f0747mqE+AS 
w7TDf3gcCt+QbncXKKcIFCxas4ITtcPy4R4fhXsELRhDjBGz7WyBYNLhnt1KibM3je uhHqth/NVDpzXMr5toFTgvyh0qUB0BRcXtFa/P2I= Authentication-Results: smtpcorp4.mail.yandex.net; dkim=pass header.i=@yandex-team.ru Message-ID: <521E41CB.30700@yandex-team.ru> Date: Wed, 28 Aug 2013 22:30:35 +0400 From: "Alexander V. Chernikov" User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:17.0) Gecko/20130418 Thunderbird/17.0.5 MIME-Version: 1.0 To: FreeBSD Net , freebsd-hackers@freebsd.org, freebsd-arch@freebsd.org Subject: Network stack changes Content-Type: multipart/mixed; boundary="------------010308000904000207080306" Cc: ae@FreeBSD.org, adrian@freebsd.org, andre@freebsd.org, luigi@freebsd.org X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 28 Aug 2013 18:30:46 -0000 This is a multi-part message in MIME format. --------------010308000904000207080306 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit

Hello list!

There are a lot of constantly recurring discussions related to networking stack performance/changes. I'll try to summarize the current problems and possible solutions from my point of view. (Generally this is one problem: the stack is slooooooooooooooooooooooooooow, but we need to know why and what to do.)

Let's start with the current IPv4 packet flow on a typical router: http://static.ipfw.ru/images/freebsd_ipv4_flow.png (I'm sorry I can't provide this as text, since Visio doesn't have any 'ascii-art' exporter.)

Note that we are using the process-to-completion model, i.e. we process any packet in the ISR until it is either consumed by the L4+ stack, dropped, or put on the egress NIC queue.
(There is also a deferred ISR model implemented inside netisr, but it does not change much: it can help to do more fine-grained hashing (for GRE or other similar traffic), but
1) it uses per-packet mutex locking, which kills all performance
2) it currently does not have _any_ hashing functions (see the absence of flags in `netstat -Q`)
People using http://static.ipfw.ru/patches/netisr_ip_flowid.diff (or the modified PPPoE/GRE version) report some profit, but without fixing (1) it can't help much.)

So, let's start:

1) Ixgbe uses a mutex to protect each RX ring, which is perfectly fine since there is nearly no contention (the only thing that can happen is driver reconfiguration, which is rare, and, more significantly, we take the lock once for the batch of packets received in a given interrupt). However, due to some (im)possible deadlocks the current code does a per-packet ring unlock/lock (see ixgbe_rx_input()). There was a discussion that ended with nothing: http://lists.freebsd.org/pipermail/freebsd-net/2012-October/033520.html

1*) Possible BPF users. Here we have one rlock if there are any readers present (and a mutex for any matching packets, but this is more or less OK. Additionally, there is WIP to implement multiqueue BPF, and there is a chance that we can reduce lock contention there). There is also an "optimize_writers" hack permitting applications like CDP to use BPF as writers without registering them as receivers (which would imply the rlock).

2/3) Virtual interfaces (laggs/vlans over lagg and other similar constructions). Currently we simply use an rlock to make s/ix0/lagg0/ and, what is much funnier, we use a complex vlan_hash with another rlock to get the vlan interface from the underlying one. This is definitely not how things should be done, and this can be changed more or less easily.

There are some useful terms/techniques in the world of software/hardware routing: they have a clear 'control plane' and 'data plane' separation.
The former is for dealing with control traffic (IGP, MLD, IGMP snooping, lagg hellos, ARP/NDP, etc.) and some data traffic (packets with TTL=1, with options, destined to hosts without an ARP/NDP record, and similar). The latter is done in hardware (or in an efficient software implementation). The control plane is responsible for providing data for efficient data plane operations. This is the point we are missing nearly everywhere.

What I want to say is: lagg is pure control-plane stuff and vlan is nearly the same. We can't apply this approach to complex cases like lagg-over-vlans-over-vlans-over-(pppoe_ng0-and_wifi0), but we definitely can do this for the most common setups (like igb* or ix* in lagg, with or without vlans on top of lagg). We already have some capabilities like VLANHWFILTER/VLANHWTAG, and we can add some more. We even have per-driver hooks to program HW filtering.

One small step is to throw the packet to the vlan interface directly (P1), proof-of-concept (working in production): http://lists.freebsd.org/pipermail/freebsd-net/2013-April/035270.html

Another is to change lagg packet accounting: http://lists.freebsd.org/pipermail/svn-src-all/2013-April/067570.html
Again, this is more like what HW boxes do (aggregate all counters including errors) (and I can't imagine what real error we can get from _lagg_).

4) If we are a router, we can do either the slooow ip_input() -> ip_forward() -> ip_output() cycle or use the optimized ip_fastforward(), which falls back to the 'slow' path for multicast/options/local traffic (i.e. it works exactly like the 'data plane' part). (Btw, we can consider turning net.inet.ip.fastforwarding on by default, at least for non-IPSEC kernels.)

Here we have to determine whether this is a local packet or not, i.e. F(dst_ip) returning 1 or 0. Currently we are simply using the standard rlock + hash of iface addresses. (And some consumers like ipfw(4) do the same, but without the lock.) We don't need to do this!
We can build a sorted array of IPv4 addresses or another efficient structure on every address change and use it unlocked with delayed garbage collection (proof-of-concept attached). (There is another thing to discuss: maybe we can do this once somewhere in ip_input and mark the mbuf as 'local/non-local'?)

5, 9) Currently we have L3 ingress/egress PFIL hooks protected by rmlocks. This is OK. However, 6) and 7) are not. The firewall can use the same pfil lock as reader protection without imposing its own lock. Currently the pfil & ipfw code is ready to do this.

8) Radix/rt* api. This is probably the worst place in the entire stack. It is toooo generic, tooo slow and buggy (do you use IPv6? you definitely know what I'm talking about).

A) It really is too generic, and the assumption that it can be (effectively) used for every family is wrong. Two examples: we don't need to look up all 128 bits of an IPv6 address. Subnets with masks longer than /64 are not widely used (actually the only reason to use them is p2p links, due to potential ND problems). One of the common solutions is to look up 64 bits, and build another trie (or other structure) in case of collision. Another example is MPLS, where we can simply do a direct array lookup based on the ingress label.

B) It is terribly slow (AFAIR luigi@ did some performance measurements, numbers available in one of the netmap pdfs).

C) It is not multipath-capable. Stateful (and non-working) multipath is definitely not the right way.

8*) rtentry
We are doing it wrong. Currently _every_ lookup locks/unlocks the given rte twice. The first lock is related to an old-old story of trusting IP redirects (and auto-adding host routes for them). Fortunately, it is currently disabled automatically when you turn forwarding on. The second one is much more complicated: we are assuming that rte's with a non-zero refcount value can stop the egress interface from being destroyed. This is a wrong (but widely used) assumption.
We can use delayed GC instead of locking for rte's, and this won't break things more than they are broken now (patch attached). We can't do the same for ifp structures since
a) virtual ones can assume some state in the underlying physical NIC
b) physical ones just _can_ be destroyed (regardless of whether the user wants this or not, e.g. an SFP being unplugged from the NIC), or simply lead to a kernel crash due to SW/HW inconsistency

One possible solution is to implement stable refcounts based on PCPU counters and apply those counters to ifp, but this seems to be non-trivial.

Another rtalloc(9) problem is the fact that radix is used as both the 'control plane' and the 'data plane' structure/api. Some users always want to put more information in rte, while others want to make rte more compact. We just need _different_ structures for that: a feature-rich, lot-of-data control plane one (to store everything we want to store, including, for example, the PID of the process originating the route) - the current radix can be modified to do this - and another, address-family-dependent structure (array, trie, or anything) which contains _only_ the data necessary to put a packet on the wire.

11) arpresolve. Currently (this was decoupled in 8.x) we have
a) ifaddr rlock
b) lle rlock.

We don't need those locks. We need to
a) make the lle layer per-interface instead of global (and this can also solve the issue of multiple fibs with L2 mappings done in fib.0)
b) use the rtalloc(9)-provided lock instead of separate locking
c) actually, we need to rewrite this layer, because
d) lle actually is the place to do real multipath: briefly, you have an rte pointing to some special nexthop structure pointing to an lle, which has the following data:
num_of_egress_ifaces: [ifindex1, ifindex2, ifindex3] | L2 data to prepend to header
Separate post will follow.
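The nexthop layout described in d) can be sketched in C roughly as follows. This is only an illustration under stated assumptions: struct nhop_sketch, NH_MAX_IFACES, NH_L2_MAXLEN and both helper functions are hypothetical names, not existing FreeBSD code, and picking the egress interface by flowid modulo the number of interfaces is an assumed policy, not something taken from the attached patches.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NH_MAX_IFACES 8   /* hypothetical cap on egress interfaces per nexthop */
#define NH_L2_MAXLEN  18  /* room for an Ethernet header plus one 802.1Q tag */

/* Hypothetical nexthop record; NOT an existing FreeBSD structure. */
struct nhop_sketch {
	uint8_t  num_ifaces;              /* num_of_egress_ifaces */
	uint16_t ifindex[NH_MAX_IFACES];  /* candidate egress interfaces */
	uint8_t  l2_len;                  /* bytes of L2 data to prepend */
	uint8_t  l2_data[NH_L2_MAXLEN];   /* prebuilt L2 header */
};

/*
 * Pick the egress interface for a flow. The same flowid always maps
 * to the same interface, so packets of one flow are not reordered.
 */
static uint16_t
nhop_select_ifindex(const struct nhop_sketch *nh, uint32_t flowid)
{
	assert(nh->num_ifaces > 0 && nh->num_ifaces <= NH_MAX_IFACES);
	return (nh->ifindex[flowid % nh->num_ifaces]);
}

/*
 * Copy the prebuilt L2 header into buf.
 * Returns the number of bytes written, or 0 if it does not fit.
 */
static size_t
nhop_prepend_l2(const struct nhop_sketch *nh, uint8_t *buf, size_t buflen)
{
	if (nh->l2_len > NH_L2_MAXLEN || nh->l2_len > buflen)
		return (0);
	memcpy(buf, nh->l2_data, nh->l2_len);
	return (nh->l2_len);
}
```

The point of such a layout is that the data path needs only one lookup result to both choose the outgoing port and build the L2 header; everything else stays in the control plane structures.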
With such a nexthop structure, we can achieve lagg traffic distribution without actually using lagg_transmit and similar stuff (at least in the most common scenarios) (for example, TCP output definitely can benefit from this, since we can compute the flowid once per TCP session and use it in every mbuf).

So. Imagine we have done all this. How can we estimate the difference?

There was a thread, started a year ago, describing 'stock' performance and the difference for various modifications. It was done on 8.x, however I've got similar results on recent 9.x: http://lists.freebsd.org/pipermail/freebsd-net/2012-July/032680.html

Briefly: 2xE5645 @ Intel 82599 NIC.
Kernel: FreeBSD-8-S r237994, stock drivers, stock routing, no FLOWTABLE, no firewall.
Ixia XM2 (traffic generator) <> ix0 (FreeBSD).
Ixia sends 64-byte IP packets from vlan10 (10.100.0.64 - 10.100.0.156) to destinations in vlan11 (10.100.1.128 - 10.100.1.192). Static arps are configured for all destination addresses. The traffic level is slightly above or slightly below system performance.

We start from 1.4 MPPS (if we are using several routes to minimize mutex contention).

My 'current' result for the same test, on the same HW, with the following modifications:

* 1) ixgbe per-packet ring unlock removed
* P1) ixgbe is modified to do direct vlan input (so 2,3 are not used)
* 4) separate lockless in_localip() version
* 6) - using existing pfil lock
* 7) using lockless version
* 8) radix converted to use rmlock instead of rlock. Delayed GC is used instead of mutexes
* 10) - using existing pfil lock
* 11) using radix lock to do arpresolve(). Not using lle rlock

(so the rmlocks are the only locks used on the data path).

Additionally, ipstat counters are converted to PCPU (no real performance implications).
ixgbe does not do per-packet accounting (as in head).
if_vlan counters are converted to PCPU.
lagg is converted to rmlock, per-packet accounting is removed (using stats from the underlying interfaces).
lle hash size is bumped to 1024 instead of 32 (not applicable here, but the stock size slows things down for large L2 domains).

The result is 5.6 MPPS for a single port (11 cores) and 6.5 MPPS for lagg (16 cores), nearly the same with HT on and 22 cores.

... while Intel DPDK claims 80 MPPS (and 6windgate talks about 160 or so) on same-class hardware with _userland_ forwarding.

One of the key features making all such products possible (DPDK, netmap, packetshader, Cisco SW forwarding) is the use of batching instead of the process-to-completion model. Batching mitigates locking cost, batching does not wash out the CPU cache, and so on. So maybe we can consider passing batches from the NIC to at least the L2 layer with netisr? Or even up to ip_input()?

Another question is about making some sort of reliable GC (like "passive serialization" or other similar not-to-pronounce words about Linux and lockless objects).

P.S. Attached patches are 1) for 8.x 2) mostly 'hacks' showing roughly how this can be done and what benefit can be achieved.

--------------010308000904000207080306 Content-Type: text/plain; charset=UTF-8; name="1_ixgbe_unlock.diff" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="1_ixgbe_unlock.diff" commit 20a52503455c80cd149d2232bdc0d37e14381178 Author: Charlie Root Date: Tue Oct 23 21:20:13 2012 +0000 Remove RX ring unlock/lock before calling if_input() from ixgbe drivers. 
diff --git a/sys/dev/ixgbe/ixgbe.c b/sys/dev/ixgbe/ixgbe.c index 5d8752b..fc1491e 100644 --- a/sys/dev/ixgbe/ixgbe.c +++ b/sys/dev/ixgbe/ixgbe.c @@ -4171,9 +4171,7 @@ ixgbe_rx_input(struct rx_ring *rxr, struct ifnet *ifp, struct mbuf *m, u32 ptype if (tcp_lro_rx(&rxr->lro, m, 0) == 0) return; } - IXGBE_RX_UNLOCK(rxr); (*ifp->if_input)(ifp, m); - IXGBE_RX_LOCK(rxr); } static __inline void --------------010308000904000207080306 Content-Type: text/plain; charset=UTF-8; name="2_ixgbe_vlans2.diff" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="2_ixgbe_vlans2.diff" Index: sys/dev/ixgbe/ixgbe.c =================================================================== --- sys/dev/ixgbe/ixgbe.c (revision 248704) +++ sys/dev/ixgbe/ixgbe.c (working copy) @@ -2880,6 +2880,14 @@ ixgbe_allocate_queues(struct adapter *adapter) error = ENOMEM; goto err_rx_desc; } + + if ((rxr->vlans = malloc(sizeof(struct ifvlans), M_DEVBUF, + M_NOWAIT | M_ZERO)) == NULL) { + device_printf(dev, + "Critical Failure setting up vlan index\n"); + error = ENOMEM; + goto err_rx_desc; + } } /* @@ -4271,6 +4279,11 @@ ixgbe_free_receive_buffers(struct rx_ring *rxr) rxr->ptag = NULL; } + if (rxr->vlans != NULL) { + free(rxr->vlans, M_DEVBUF); + rxr->vlans = NULL; + } + return; } @@ -4303,7 +4316,7 @@ ixgbe_rx_input(struct rx_ring *rxr, struct ifnet * return; } IXGBE_RX_UNLOCK(rxr); - (*ifp->if_input)(ifp, m); + (*ifp->if_input)(m->m_pkthdr.rcvif, m); IXGBE_RX_LOCK(rxr); } @@ -4360,6 +4373,7 @@ ixgbe_rxeof(struct ix_queue *que) u16 count = rxr->process_limit; union ixgbe_adv_rx_desc *cur; struct ixgbe_rx_buf *rbuf, *nbuf; + struct ifnet *ifp_dst; IXGBE_RX_LOCK(rxr); @@ -4522,9 +4536,19 @@ ixgbe_rxeof(struct ix_queue *que) (staterr & IXGBE_RXD_STAT_VP)) vtag = le16toh(cur->wb.upper.vlan); if (vtag) { - sendmp->m_pkthdr.ether_vtag = vtag; - sendmp->m_flags |= M_VLANTAG; - } + ifp_dst = rxr->vlans->idx[EVL_VLANOFTAG(vtag)]; + + if (ifp_dst != NULL) { + ifp_dst->if_ipackets++; + 
sendmp->m_pkthdr.rcvif = ifp_dst; + } else { + sendmp->m_pkthdr.ether_vtag = vtag; + sendmp->m_flags |= M_VLANTAG; + sendmp->m_pkthdr.rcvif = ifp; + } + } else + sendmp->m_pkthdr.rcvif = ifp; + if ((ifp->if_capenable & IFCAP_RXCSUM) != 0) ixgbe_rx_checksum(staterr, sendmp, ptype); #if __FreeBSD_version >= 800000 @@ -4625,7 +4649,32 @@ ixgbe_rx_checksum(u32 staterr, struct mbuf * mp, u return; } +/* + * This routine gets real vlan ifp based on + * underlying ifp and vlan tag. + */ +static struct ifnet * +ixgbe_get_vlan(struct ifnet *ifp, uint16_t vtag) +{ + /* XXX: IFF_MONITOR */ +#if 0 + struct lagg_port *lp = ifp->if_lagg; + struct lagg_softc *sc = lp->lp_softc; + + /* Skip lagg nesting */ + while (ifp->if_type == IFT_IEEE8023ADLAG) { + lp = ifp->if_lagg; + sc = lp->lp_softc; + ifp = sc->sc_ifp; + } +#endif + /* Get vlan interface based on tag */ + ifp = VLAN_DEVAT(ifp, vtag); + + return (ifp); +} + /* ** This routine is run via an vlan config EVENT, ** it enables us to use the HW Filter table since @@ -4637,7 +4686,9 @@ static void ixgbe_register_vlan(void *arg, struct ifnet *ifp, u16 vtag) { struct adapter *adapter = ifp->if_softc; - u16 index, bit; + u16 index, bit, j; + struct rx_ring *rxr; + struct ifnet *ifv; if (ifp->if_softc != arg) /* Not our event */ return; @@ -4645,7 +4696,20 @@ ixgbe_register_vlan(void *arg, struct ifnet *ifp, if ((vtag == 0) || (vtag > 4095)) /* Invalid */ return; + ifv = ixgbe_get_vlan(ifp, vtag); + IXGBE_CORE_LOCK(adapter); + + if (ifp->if_capenable & IFCAP_VLAN_HWFILTER) { + rxr = adapter->rx_rings; + + for (j = 0; j < adapter->num_queues; j++, rxr++) { + IXGBE_RX_LOCK(rxr); + rxr->vlans->idx[vtag] = ifv; + IXGBE_RX_UNLOCK(rxr); + } + } + index = (vtag >> 5) & 0x7F; bit = vtag & 0x1F; adapter->shadow_vfta[index] |= (1 << bit); @@ -4663,7 +4727,8 @@ static void ixgbe_unregister_vlan(void *arg, struct ifnet *ifp, u16 vtag) { struct adapter *adapter = ifp->if_softc; - u16 index, bit; + u16 index, bit, j; + struct rx_ring *rxr; if 
(ifp->if_softc != arg) return; @@ -4672,6 +4737,15 @@ ixgbe_unregister_vlan(void *arg, struct ifnet *ifp return; IXGBE_CORE_LOCK(adapter); + + rxr = adapter->rx_rings; + + for (j = 0; j < adapter->num_queues; j++, rxr++) { + IXGBE_RX_LOCK(rxr); + rxr->vlans->idx[vtag] = NULL; + IXGBE_RX_UNLOCK(rxr); + } + index = (vtag >> 5) & 0x7F; bit = vtag & 0x1F; adapter->shadow_vfta[index] &= ~(1 << bit); @@ -4686,8 +4760,8 @@ ixgbe_setup_vlan_hw_support(struct adapter *adapte { struct ifnet *ifp = adapter->ifp; struct ixgbe_hw *hw = &adapter->hw; + u32 ctrl, j; struct rx_ring *rxr; - u32 ctrl; /* @@ -4713,6 +4787,15 @@ ixgbe_setup_vlan_hw_support(struct adapter *adapte if (ifp->if_capenable & IFCAP_VLAN_HWFILTER) { ctrl &= ~IXGBE_VLNCTRL_CFIEN; ctrl |= IXGBE_VLNCTRL_VFE; + } else { + /* Zero vlan table */ + rxr = adapter->rx_rings; + + for (j = 0; j < adapter->num_queues; j++, rxr++) { + IXGBE_RX_LOCK(rxr); + memset(rxr->vlans->idx, 0, sizeof(struct ifvlans)); + IXGBE_RX_UNLOCK(rxr); + } } if (hw->mac.type == ixgbe_mac_82598EB) ctrl |= IXGBE_VLNCTRL_VME; Index: sys/dev/ixgbe/ixgbe.h =================================================================== --- sys/dev/ixgbe/ixgbe.h (revision 248704) +++ sys/dev/ixgbe/ixgbe.h (working copy) @@ -284,6 +284,11 @@ struct ix_queue { u64 irqs; }; +struct ifvlans { + struct ifnet *idx[4096]; +}; + + /* * The transmit ring, one per queue */ @@ -307,7 +312,6 @@ struct tx_ring { } queue_status; u32 txd_cmd; bus_dma_tag_t txtag; - char mtx_name[16]; #ifndef IXGBE_LEGACY_TX struct buf_ring *br; struct task txq_task; @@ -324,6 +328,7 @@ struct tx_ring { unsigned long no_tx_dma_setup; u64 no_desc_avail; u64 total_packets; + char mtx_name[16]; }; @@ -346,8 +351,8 @@ struct rx_ring { u16 num_desc; u16 mbuf_sz; u16 process_limit; - char mtx_name[16]; struct ixgbe_rx_buf *rx_buffers; + struct ifvlans *vlans; bus_dma_tag_t ptag; u32 bytes; /* Used for AIM calc */ @@ -363,6 +368,7 @@ struct rx_ring { #ifdef IXGBE_FDIR u64 flm; #endif + char 
mtx_name[16]; }; /* Our adapter structure */ --------------010308000904000207080306 Content-Type: text/plain; charset=UTF-8; name="3_in_localip_fast.diff" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="3_in_localip_fast.diff" commit 7f1103ac622881182642b2d3ae17b6ff484c1293 Author: Charlie Root Date: Sun Apr 7 23:50:26 2013 +0000 Use lockles in_localip_fast() function. diff --git a/sys/net/route.h b/sys/net/route.h index 4d9371b..f588f03 100644 --- a/sys/net/route.h +++ b/sys/net/route.h @@ -365,6 +365,7 @@ void rt_maskedcopy(struct sockaddr *, struct sockaddr *, struct sockaddr *); */ #define RTGC_ROUTE 1 #define RTGC_IF 3 +#define RTGC_IFADDR 4 int rtexpunge(struct rtentry *); diff --git a/sys/netinet/in.c b/sys/netinet/in.c index 5341918..a83b8a9 100644 --- a/sys/netinet/in.c +++ b/sys/netinet/in.c @@ -93,6 +93,20 @@ VNET_DECLARE(struct inpcbinfo, ripcbinfo); VNET_DECLARE(struct arpstat, arpstat); /* ARP statistics, see if_arp.h */ #define V_arpstat VNET(arpstat) +struct in_ifaddrf { + struct in_ifaddrf *next; + struct in_addr addr; +}; + +struct in_ifaddrhashf { + uint32_t hmask; + uint32_t count; + struct in_ifaddrf **hash; +}; + +VNET_DEFINE(struct in_ifaddrhashf *, in_ifaddrhashtblf) = NULL; /* inet addr fast hash table */ +#define V_in_ifaddrhashtblf VNET(in_ifaddrhashtblf) + /* * Return 1 if an internet address is for a ``local'' host * (one to which we have a connection). 
If subnetsarelocal @@ -145,6 +159,120 @@ in_localip(struct in_addr in) return (0); } +int +in_localip_fast(struct in_addr in) +{ + struct in_ifaddrf *rec; + struct in_ifaddrhashf *f; + + if ((f = V_in_ifaddrhashtblf) == NULL) + return (0); + + rec = f->hash[INADDR_HASHVAL(in) & f->hmask]; + + while (rec != NULL && rec->addr.s_addr != in.s_addr) + rec = rec->next; + + if (rec != NULL) + return (1); + + return (0); +} + +struct in_ifaddrhashf * +in_hash_alloc(int additional) +{ + int count, hsize, i; + struct in_ifaddr *ia; + struct in_ifaddrhashf *new; + + count = additional + 1; + + IN_IFADDR_RLOCK(); + for (i = 0; i < INADDR_NHASH; i++) { + LIST_FOREACH(ia, &V_in_ifaddrhashtbl[i], ia_hash) + count++; + } + IN_IFADDR_RUNLOCK(); + + /* roundup to the next power of 2 */ + hsize = (1UL << flsl(count - 1)); + + new = malloc(sizeof(struct in_ifaddrhashf) + + sizeof(void *) * hsize + + sizeof(struct in_ifaddrf) * count, M_IFADDR, + M_NOWAIT | M_ZERO); + + if (new == NULL) + return (NULL); + + new->count = count; + new->hmask = hsize - 1; + new->hash = (struct in_ifaddrf **)(new + 1); + + return (new); +} + +int +in_hash_build(struct in_ifaddrhashf *new) +{ + struct in_ifaddr *ia; + int i, j, count, hsize, r; + struct in_ifaddrhashf *old; + struct in_ifaddrf *rec, *tmp; + + count = new->count - 1; + hsize = new->hmask + 1; + rec = (struct in_ifaddrf *)&new->hash[hsize]; + + IN_IFADDR_RLOCK(); + for (i = 0; i < INADDR_NHASH; i++) { + LIST_FOREACH(ia, &V_in_ifaddrhashtbl[i], ia_hash) { + rec->addr.s_addr = IA_SIN(ia)->sin_addr.s_addr; + + j = INADDR_HASHVAL(rec->addr) & new->hmask; + if ((tmp = new->hash[j]) == NULL) + new->hash[j] = rec; + else { + while (tmp->next) + tmp = tmp->next; + tmp->next = rec; + } + + rec++; + count--; + + /* End of memory */ + if (count < 0) + break; + } + + /* End of memory */ + if (count < 0) + break; + } + IN_IFADDR_RUNLOCK(); + + /* If count >0 then we succeeded in building hash. 
Stop cycle */ + + if (count >= 0) { + old = V_in_ifaddrhashtblf; + V_in_ifaddrhashtblf = new; + + rtgc_free(RTGC_IFADDR, old, 0); + + return (1); + } + + /* Fail. */ + if (new) + free(new, M_IFADDR); + + return (0); +} + + + /* * Determine whether an IP address is in a reserved set of addresses * that may not be forwarded, or whether datagrams to that destination @@ -239,6 +367,7 @@ in_control(struct socket *so, u_long cmd, caddr_t data, struct ifnet *ifp, struct sockaddr_in oldaddr; int error, hostIsNew, iaIsNew, maskIsNew; int iaIsFirst; + struct in_ifaddrhashf *new_hash; ia = NULL; iaIsFirst = 0; @@ -405,6 +534,11 @@ in_control(struct socket *so, u_long cmd, caddr_t data, struct ifnet *ifp, goto out; } + if ((new_hash = in_hash_alloc(1)) == NULL) { + error = ENOBUFS; + goto out; + } + ifa = &ia->ia_ifa; ifa_init(ifa); ifa->ifa_addr = (struct sockaddr *)&ia->ia_addr; @@ -427,6 +561,8 @@ in_control(struct socket *so, u_long cmd, caddr_t data, struct ifnet *ifp, IN_IFADDR_WLOCK(); TAILQ_INSERT_TAIL(&V_in_ifaddrhead, ia, ia_link); IN_IFADDR_WUNLOCK(); + + in_hash_build(new_hash); iaIsNew = 1; } break; @@ -649,6 +785,8 @@ in_control(struct socket *so, u_long cmd, caddr_t data, struct ifnet *ifp, ifa_free(&if_ia->ia_ifa); } else IN_IFADDR_WUNLOCK(); + if ((new_hash = in_hash_alloc(0)) != NULL) + in_hash_build(new_hash); ifa_free(&ia->ia_ifa); /* in_ifaddrhead */ out: if (ia != NULL) @@ -852,6 +990,7 @@ in_ifinit(struct ifnet *ifp, struct in_ifaddr *ia, struct sockaddr_in *sin, register u_long i = ntohl(sin->sin_addr.s_addr); struct sockaddr_in oldaddr; int s = splimp(), flags = RTF_UP, error = 0; + struct in_ifaddrhashf *new_hash; oldaddr = ia->ia_addr; if (oldaddr.sin_family == AF_INET) @@ -862,6 +1001,9 @@ in_ifinit(struct ifnet *ifp, struct in_ifaddr *ia, struct sockaddr_in *sin, LIST_INSERT_HEAD(INADDR_HASH(ia->ia_addr.sin_addr.s_addr), ia, ia_hash); IN_IFADDR_WUNLOCK(); + + if ((new_hash = in_hash_alloc(1)) != NULL) + in_hash_build(new_hash); } /* * Give the 
interface a chance to initialize @@ -887,6 +1029,8 @@ in_ifinit(struct ifnet *ifp, struct in_ifaddr *ia, struct sockaddr_in *sin, */ LIST_REMOVE(ia, ia_hash); IN_IFADDR_WUNLOCK(); + if ((new_hash = in_hash_alloc(1)) != NULL) + in_hash_build(new_hash); return (error); } } diff --git a/sys/netinet/in.h b/sys/netinet/in.h index b03e74c..948938a 100644 --- a/sys/netinet/in.h +++ b/sys/netinet/in.h @@ -741,6 +741,7 @@ int in_broadcast(struct in_addr, struct ifnet *); int in_canforward(struct in_addr); int in_localaddr(struct in_addr); int in_localip(struct in_addr); +int in_localip_fast(struct in_addr); int inet_aton(const char *, struct in_addr *); /* in libkern */ char *inet_ntoa(struct in_addr); /* in libkern */ char *inet_ntoa_r(struct in_addr ina, char *buf); /* in libkern */ diff --git a/sys/netinet/ip_fastfwd.c b/sys/netinet/ip_fastfwd.c index 692e3e5..f7734a9 100644 --- a/sys/netinet/ip_fastfwd.c +++ b/sys/netinet/ip_fastfwd.c @@ -347,7 +347,7 @@ ip_fastforward(struct mbuf *m) /* * Is it for a local address on this host? */ - if (in_localip(ip->ip_dst)) + if (in_localip_fast(ip->ip_dst)) return m; //IPSTAT_INC(ips_total); @@ -390,7 +390,7 @@ ip_fastforward(struct mbuf *m) /* * Is it now for a local address on this host? */ - if (in_localip(dest)) + if (in_localip_fast(dest)) goto forwardlocal; /* * Go on with new destination address @@ -479,7 +479,7 @@ passin: /* * Is it now for a local address on this host? */ - if (m->m_flags & M_FASTFWD_OURS || in_localip(dest)) { + if (m->m_flags & M_FASTFWD_OURS || in_localip_fast(dest)) { forwardlocal: /* * Return packet for processing by ip_input(). 
diff --git a/sys/netinet/ipfw/ip_fw2.c b/sys/netinet/ipfw/ip_fw2.c index b76a638..53f6e97 100644 --- a/sys/netinet/ipfw/ip_fw2.c +++ b/sys/netinet/ipfw/ip_fw2.c @@ -1450,10 +1450,7 @@ do { \ case O_IP_SRC_ME: if (is_ipv4) { - struct ifnet *tif; - - INADDR_TO_IFP(src_ip, tif); - match = (tif != NULL); + match = in_localip_fast(src_ip); break; } #ifdef INET6 @@ -1490,10 +1487,7 @@ do { \ case O_IP_DST_ME: if (is_ipv4) { - struct ifnet *tif; - - INADDR_TO_IFP(dst_ip, tif); - match = (tif != NULL); + match = in_localip_fast(dst_ip); break; } #ifdef INET6 diff --git a/sys/netinet/ipfw/ip_fw_pfil.c b/sys/netinet/ipfw/ip_fw_pfil.c index a21f501..bdf8beb 100644 --- a/sys/netinet/ipfw/ip_fw_pfil.c +++ b/sys/netinet/ipfw/ip_fw_pfil.c @@ -184,7 +184,7 @@ again: bcopy(args.next_hop, (fwd_tag+1), sizeof(struct sockaddr_in)); m_tag_prepend(*m0, fwd_tag); - if (in_localip(args.next_hop->sin_addr)) + if (in_localip_fast(args.next_hop->sin_addr)) (*m0)->m_flags |= M_FASTFWD_OURS; } #endif /* INET || INET6 */ --------------010308000904000207080306 Content-Type: text/plain; charset=UTF-8; name="80_use_rtgc.diff" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="80_use_rtgc.diff" commit 67a74d91a7b4a47a83fcfa5e79a6c6f0b4b1122d Author: Charlie Root Date: Fri Oct 26 17:10:52 2012 +0000 Remove rte locking for IPv4. Remove one of 2 locks from IPv6 rtes diff --git a/sys/net/if.c b/sys/net/if.c index a875326..eb6a723 100644 --- a/sys/net/if.c +++ b/sys/net/if.c @@ -487,6 +487,13 @@ if_alloc(u_char type) return (ifp); } + +void +if_free_real(struct ifnet *ifp) +{ + free(ifp, M_IFNET); +} + /* * Do the actual work of freeing a struct ifnet, and layer 2 common * structure. 
This call is made when the last reference to an @@ -499,6 +506,15 @@ if_free_internal(struct ifnet *ifp) KASSERT((ifp->if_flags & IFF_DYING), ("if_free_internal: interface not dying")); + if (rtgc_is_enabled()) { + /* + * FIXME: Sleep some time to permit packets + * using fastforwarding routine without locking + * die withour side effects. + */ + pause("if_free_gc", hz / 20); /* Sleep 50 milliseconds */ + } + if (if_com_free[ifp->if_alloctype] != NULL) if_com_free[ifp->if_alloctype](ifp->if_l2com, ifp->if_alloctype); @@ -511,7 +527,10 @@ if_free_internal(struct ifnet *ifp) IF_AFDATA_DESTROY(ifp); IF_ADDR_LOCK_DESTROY(ifp); ifq_delete(&ifp->if_snd); - free(ifp, M_IFNET); + if (rtgc_is_enabled()) + rtgc_free(RTGC_IF, ifp, 0); + else + if_free_real(ifp); } /* diff --git a/sys/net/if_var.h b/sys/net/if_var.h index 39c499f..5ef6264 100644 --- a/sys/net/if_var.h +++ b/sys/net/if_var.h @@ -857,6 +857,7 @@ void if_down(struct ifnet *); struct ifmultiaddr * if_findmulti(struct ifnet *, struct sockaddr *); void if_free(struct ifnet *); +void if_free_real(struct ifnet *); void if_free_type(struct ifnet *, u_char); void if_initname(struct ifnet *, const char *, int); void if_link_state_change(struct ifnet *, int); diff --git a/sys/net/route.c b/sys/net/route.c index 3059f5a..97965b3 100644 --- a/sys/net/route.c +++ b/sys/net/route.c @@ -142,6 +142,175 @@ VNET_DEFINE(int, rttrash); /* routes not in table but not freed */ static VNET_DEFINE(uma_zone_t, rtzone); /* Routing table UMA zone. 
*/ #define V_rtzone VNET(rtzone) +SYSCTL_NODE(_net, OID_AUTO, gc, CTLFLAG_RW, 0, "Garbage collector"); + +MALLOC_DEFINE(M_RTGC, "rtgc", "route GC"); +void rtgc_func(void *_unused); +void rtfree_real(struct rtentry *rt); + +int _rtgc_default_enabled = 1; +TUNABLE_INT("net.gc.enable", &_rtgc_default_enabled); + +#define RTGC_CALLOUT_DELAY 1 +#define RTGC_EXPIRE_DELAY 3 + +VNET_DEFINE(struct mtx, rtgc_mtx); +#define V_rtgc_mtx VNET(rtgc_mtx) +VNET_DEFINE(struct callout, rtgc_callout); +#define V_rtgc_callout VNET(rtgc_callout) +VNET_DEFINE(int, rtgc_enabled); +#define V_rtgc_enabled VNET(rtgc_enabled) +SYSCTL_VNET_INT(_net_gc, OID_AUTO, enable, CTLFLAG_RW, + &VNET_NAME(rtgc_enabled), 1, + "Enable garbage collector"); +VNET_DEFINE(int, rtgc_expire_delay) = RTGC_EXPIRE_DELAY; +#define V_rtgc_expire_delay VNET(rtgc_expire_delay) +SYSCTL_VNET_INT(_net_gc, OID_AUTO, expire, CTLFLAG_RW, + &VNET_NAME(rtgc_expire_delay), 1, + "Object expiration delay"); +VNET_DEFINE(int, rtgc_numfailures); +#define V_rtgc_numfailures VNET(rtgc_numfailures) +SYSCTL_VNET_INT(_net_gc, OID_AUTO, failures, CTLFLAG_RD, + &VNET_NAME(rtgc_numfailures), 0, + "Number of objects leaked from route garbage collector"); +VNET_DEFINE(int, rtgc_numqueued); +#define V_rtgc_numqueued VNET(rtgc_numqueued) +SYSCTL_VNET_INT(_net_gc, OID_AUTO, queued, CTLFLAG_RD, + &VNET_NAME(rtgc_numqueued), 0, + "Number of objects queued for deletion"); +VNET_DEFINE(int, rtgc_numfreed); +#define V_rtgc_numfreed VNET(rtgc_numfreed) +SYSCTL_VNET_INT(_net_gc, OID_AUTO, freed, CTLFLAG_RD, + &VNET_NAME(rtgc_numfreed), 0, + "Number of objects deleted"); +VNET_DEFINE(int, rtgc_numinvoked); +#define V_rtgc_numinvoked VNET(rtgc_numinvoked) +SYSCTL_VNET_INT(_net_gc, OID_AUTO, invoked, CTLFLAG_RD, + &VNET_NAME(rtgc_numinvoked), 0, + "Number of times GC was invoked"); + +struct rtgc_item { + time_t expire; /* Whe we can delete this entry */ + int etype; /* Entry type */ + void *data; /* data to free */ + TAILQ_ENTRY(rtgc_item) items; +}; + 
+VNET_DEFINE(TAILQ_HEAD(, rtgc_item), rtgc_queue); +#define V_rtgc_queue VNET(rtgc_queue) + +int +rtgc_is_enabled() +{ + return V_rtgc_enabled; +} + +void +rtgc_func(void *_unused) +{ + struct rtgc_item *item, *temp_item; + TAILQ_HEAD(, rtgc_item) rtgc_tq; + int empty, deleted; + + CTR2(KTR_NET, "%s: started with %d objects", __func__, V_rtgc_numqueued); + + TAILQ_INIT(&rtgc_tq); + + /* Move all contents of current queue to new empty queue */ + mtx_lock(&V_rtgc_mtx); + V_rtgc_numinvoked++; + TAILQ_SWAP(&rtgc_queue, &rtgc_tq, rtgc_item, items); + mtx_unlock(&V_rtgc_mtx); + + deleted = 0; + + /* Dispatch as much as we can */ + TAILQ_FOREACH_SAFE(item, &rtgc_tq, items, temp_item) { + if (item->expire > time_uptime) + break; + + /* We can definitely delete this item */ + TAILQ_REMOVE(&rtgc_tq, item, items); + + switch (item->etype) { + case RTGC_ROUTE: + CTR1(KTR_NET, "Freeing route structure %p", item->data); + rtfree_real((struct rtentry *)item->data); + break; + case RTGC_IF: + CTR1(KTR_NET, "Freeing iface structure %p", item->data); + if_free_real((struct ifnet *)item->data); + break; + default: + CTR2(KTR_NET, "Unknown type: %d %p", item->etype, item->data); + break; + } + + /* Remove item itself */ + free(item, M_RTGC); + deleted++; + } + + /* + * Add remaining data back to mail queue. + * Note items are still sorted by time_uptime after merge. + */ + + mtx_lock(&V_rtgc_mtx); + /* Add new items to the end of our temporary queue */ + TAILQ_CONCAT(&rtgc_tq, &rtgc_queue, items); + /* Move items back to stable storage */ + TAILQ_SWAP(&rtgc_queue, &rtgc_tq, rtgc_item, items); + /* Check if we need to run callout another time */ + empty = TAILQ_EMPTY(&rtgc_queue); + /* Update counters */ + V_rtgc_numfreed += deleted; + V_rtgc_numqueued -= deleted; + mtx_unlock(&V_rtgc_mtx); + + CTR4(KTR_NET, "%s: ended with %d object(s) (%d deleted), callout: %s", + __func__, V_rtgc_numqueued, deleted, empty ? 
"stopped" : "sheduled"); + /* Schedule ourself iff there are items to delete */ + if (!empty) + callout_reset(&V_rtgc_callout, hz * RTGC_CALLOUT_DELAY, rtgc_func, NULL); +} + +void +rtgc_free(int etype, void *data, int can_sleep) +{ + struct rtgc_item *item; + + item = malloc(sizeof(struct rtgc_item), M_RTGC, (can_sleep ? M_WAITOK : M_NOWAIT) | M_ZERO); + if (item == NULL) { + V_rtgc_numfailures++; /* XXX: locking */ + return; /* Skip route freeing. Memory leak is much better than panic */ + } + + item->expire = time_uptime + V_rtgc_expire_delay; + item->etype = etype; + item->data = data; + + if ((!can_sleep) && (mtx_trylock(&V_rtgc_mtx) == 0)) { + /* Fail to acquire lock. Add another leak */ + free(item, M_RTGC); + V_rtgc_numfailures++; /* XXX: locking */ + return; + } + + if (can_sleep) + mtx_lock(&V_rtgc_mtx); + + TAILQ_INSERT_TAIL(&rtgc_queue, item, items); + V_rtgc_numqueued++; + + mtx_unlock(&V_rtgc_mtx); + + /* Schedule callout if not running */ + if (!callout_pending(&V_rtgc_callout)) + callout_reset(&V_rtgc_callout, hz * RTGC_CALLOUT_DELAY, rtgc_func, NULL); +} + + /* * handler for net.my_fibnum */ @@ -241,6 +410,17 @@ vnet_route_init(const void *unused __unused) dom->dom_rtattach((void **)rnh, dom->dom_rtoffset); } } + + /* Init garbage collector */ + mtx_init(&V_rtgc_mtx, "routeGC", NULL, MTX_DEF); + /* Init queue */ + TAILQ_INIT(&V_rtgc_queue); + /* Init garbage callout */ + memset(&V_rtgc_callout, 0, sizeof(rtgc_callout)); + callout_init(&V_rtgc_callout, 1); + /* Set default from loader tunable */ + V_rtgc_enabled = _rtgc_default_enabled; + //callout_reset(&V_rtgc_callout, 3 * hz, &rtgc_func, NULL); } VNET_SYSINIT(vnet_route_init, SI_SUB_PROTO_DOMAIN, SI_ORDER_FOURTH, vnet_route_init, 0); @@ -351,6 +531,74 @@ rtalloc1(struct sockaddr *dst, int report, u_long ignflags) } struct rtentry * +rtalloc1_fib_nolock(struct sockaddr *dst, int report, u_long ignflags, + u_int fibnum) +{ + struct radix_node_head *rnh; + struct radix_node *rn; + struct rtentry 
*newrt; + struct rt_addrinfo info; + int err = 0, msgtype = RTM_MISS; + int needlock; + + KASSERT((fibnum < rt_numfibs), ("rtalloc1_fib: bad fibnum")); + switch (dst->sa_family) { + case AF_INET6: + case AF_INET: + /* We support multiple FIBs. */ + break; + default: + fibnum = RT_DEFAULT_FIB; + break; + } + rnh = rt_tables_get_rnh(fibnum, dst->sa_family); + newrt = NULL; + if (rnh == NULL) + goto miss; + + /* + * Look up the address in the table for that Address Family + */ + needlock = !(ignflags & RTF_RNH_LOCKED); + if (needlock) + RADIX_NODE_HEAD_RLOCK(rnh); +#ifdef INVARIANTS + else + RADIX_NODE_HEAD_LOCK_ASSERT(rnh); +#endif + rn = rnh->rnh_matchaddr(dst, rnh); + if (rn && ((rn->rn_flags & RNF_ROOT) == 0)) { + newrt = RNTORT(rn); + if (needlock) + RADIX_NODE_HEAD_RUNLOCK(rnh); + goto done; + + } else if (needlock) + RADIX_NODE_HEAD_RUNLOCK(rnh); + + /* + * Either we hit the root or couldn't find any match, + * Which basically means + * "caint get there frm here" + */ +miss: + V_rtstat.rts_unreach++; + + if (report) { + /* + * If required, report the failure to the supervising + * Authorities. + * For a delete, this is not an error. (report == 0) + */ + bzero(&info, sizeof(info)); + info.rti_info[RTAX_DST] = dst; + rt_missmsg_fib(msgtype, &info, 0, err, fibnum); + } +done: + return (newrt); +} + +struct rtentry * rtalloc1_fib(struct sockaddr *dst, int report, u_long ignflags, u_int fibnum) { @@ -422,6 +670,23 @@ done: return (newrt); } + +void +rtfree_real(struct rtentry *rt) +{ + /* + * The key is separatly alloc'd so free it (see rt_setgate()). + * This also frees the gateway, as they are always malloc'd + * together. + */ + Free(rt_key(rt)); + + /* + * and the rtentry itself of course + */ + uma_zfree(V_rtzone, rt); +} + /* * Remove a reference count from an rtentry. 
* If the count gets low enough, take it out of the routing table @@ -484,18 +749,13 @@ rtfree(struct rtentry *rt) */ if (rt->rt_ifa) ifa_free(rt->rt_ifa); - /* - * The key is separatly alloc'd so free it (see rt_setgate()). - * This also frees the gateway, as they are always malloc'd - * together. - */ - Free(rt_key(rt)); - /* - * and the rtentry itself of course - */ RT_LOCK_DESTROY(rt); - uma_zfree(V_rtzone, rt); + + if (V_rtgc_enabled) + rtgc_free(RTGC_ROUTE, rt, 0); + else + rtfree_real(rt); return; } done: diff --git a/sys/net/route.h b/sys/net/route.h index b26ac44..3aa694d 100644 --- a/sys/net/route.h +++ b/sys/net/route.h @@ -363,9 +363,14 @@ void rt_maskedcopy(struct sockaddr *, struct sockaddr *, struct sockaddr *); * * RTFREE() uses an unlocked entry. */ +#define RTGC_ROUTE 1 +#define RTGC_IF 3 + int rtexpunge(struct rtentry *); void rtfree(struct rtentry *); +void rtgc_free(int etype, void *data, int can_sleep); +int rtgc_is_enabled(void); int rt_check(struct rtentry **, struct rtentry **, struct sockaddr *); /* XXX MRT COMPAT VERSIONS THAT SET UNIVERSE to 0 */ @@ -394,6 +399,7 @@ int rt_getifa_fib(struct rt_addrinfo *, u_int fibnum); void rtalloc_ign_fib(struct route *ro, u_long ignflags, u_int fibnum); void rtalloc_fib(struct route *ro, u_int fibnum); struct rtentry *rtalloc1_fib(struct sockaddr *, int, u_long, u_int); +struct rtentry *rtalloc1_fib_nolock(struct sockaddr *, int, u_long, u_int); int rtioctl_fib(u_long, caddr_t, u_int); void rtredirect_fib(struct sockaddr *, struct sockaddr *, struct sockaddr *, int, struct sockaddr *, u_int); diff --git a/sys/netinet/in_rmx.c b/sys/netinet/in_rmx.c index 1389873..1c9d9db 100644 --- a/sys/netinet/in_rmx.c +++ b/sys/netinet/in_rmx.c @@ -122,12 +122,12 @@ in_matroute(void *v_arg, struct radix_node_head *head) struct rtentry *rt = (struct rtentry *)rn; if (rt) { - RT_LOCK(rt); +// RT_LOCK(rt); if (rt->rt_flags & RTPRF_OURS) { rt->rt_flags &= ~RTPRF_OURS; rt->rt_rmx.rmx_expire = 0; } - RT_UNLOCK(rt); +// 
RT_UNLOCK(rt); } return rn; } @@ -365,7 +365,7 @@ in_inithead(void **head, int off) rnh = *head; rnh->rnh_addaddr = in_addroute; - rnh->rnh_matchaddr = in_matroute; + rnh->rnh_matchaddr = rn_match; rnh->rnh_close = in_clsroute; if (_in_rt_was_here == 0 ) { callout_init(&V_rtq_timer, CALLOUT_MPSAFE); diff --git a/sys/netinet/ip_fastfwd.c b/sys/netinet/ip_fastfwd.c index d7fe411..d2b98b3 100644 --- a/sys/netinet/ip_fastfwd.c +++ b/sys/netinet/ip_fastfwd.c @@ -112,6 +112,22 @@ static VNET_DEFINE(int, ipfastforward_active); SYSCTL_VNET_INT(_net_inet_ip, OID_AUTO, fastforwarding, CTLFLAG_RW, &VNET_NAME(ipfastforward_active), 0, "Enable fast IP forwarding"); +void +rtalloc_ign_fib_nolock(struct route *ro, u_long ignore, u_int fibnum); + +void +rtalloc_ign_fib_nolock(struct route *ro, u_long ignore, u_int fibnum) +{ + struct rtentry *rt; + + if ((rt = ro->ro_rt) != NULL) { + if (rt->rt_ifp != NULL && rt->rt_flags & RTF_UP) + return; + ro->ro_rt = NULL; + } + ro->ro_rt = rtalloc1_fib_nolock(&ro->ro_dst, 1, ignore, fibnum); +} + static struct sockaddr_in * ip_findroute(struct route *ro, struct in_addr dest, struct mbuf *m) { @@ -126,7 +142,7 @@ ip_findroute(struct route *ro, struct in_addr dest, struct mbuf *m) dst->sin_family = AF_INET; dst->sin_len = sizeof(*dst); dst->sin_addr.s_addr = dest.s_addr; - in_rtalloc_ign(ro, 0, M_GETFIB(m)); + rtalloc_ign_fib_nolock(ro, 0, M_GETFIB(m)); /* * Route there and interface still up? @@ -140,8 +156,10 @@ ip_findroute(struct route *ro, struct in_addr dest, struct mbuf *m) } else { IPSTAT_INC(ips_noroute); IPSTAT_INC(ips_cantforward); +#if 0 if (rt) RTFREE(rt); +#endif icmp_error(m, ICMP_UNREACH, ICMP_UNREACH_HOST, 0, 0); return NULL; } @@ -334,10 +352,11 @@ ip_fastforward(struct mbuf *m) if (in_localip(ip->ip_dst)) return m; - IPSTAT_INC(ips_total); + //IPSTAT_INC(ips_total); /* * Step 3: incoming packet firewall processing + in_rtalloc_ign(ro, 0, M_GETFIB(m)); */ /* @@ -476,8 +495,10 @@ forwardlocal: * "ours"-label. 
*/ m->m_flags |= M_FASTFWD_OURS; +/* if (ro.ro_rt) RTFREE(ro.ro_rt); +*/ return m; } /* @@ -490,7 +511,7 @@ forwardlocal: m_tag_delete(m, fwd_tag); } #endif /* IPFIREWALL_FORWARD */ - RTFREE(ro.ro_rt); +// RTFREE(ro.ro_rt); if ((dst = ip_findroute(&ro, dest, m)) == NULL) return NULL; /* icmp unreach already sent */ ifp = ro.ro_rt->rt_ifp; @@ -601,17 +622,21 @@ passout: if (error != 0) IPSTAT_INC(ips_odropped); else { +#if 0 ro.ro_rt->rt_rmx.rmx_pksent++; IPSTAT_INC(ips_forward); IPSTAT_INC(ips_fastforward); +#endif } consumed: - RTFREE(ro.ro_rt); +// RTFREE(ro.ro_rt); return NULL; drop: if (m) m_freem(m); +/* if (ro.ro_rt) RTFREE(ro.ro_rt); +*/ return NULL; } diff --git a/sys/netinet6/in6_rmx.c b/sys/netinet6/in6_rmx.c index b526030..9aabe63 100644 --- a/sys/netinet6/in6_rmx.c +++ b/sys/netinet6/in6_rmx.c @@ -195,12 +195,12 @@ in6_matroute(void *v_arg, struct radix_node_head *head) struct rtentry *rt = (struct rtentry *)rn; if (rt) { - RT_LOCK(rt); + //RT_LOCK(rt); if (rt->rt_flags & RTPRF_OURS) { rt->rt_flags &= ~RTPRF_OURS; rt->rt_rmx.rmx_expire = 0; } - RT_UNLOCK(rt); + //RT_UNLOCK(rt); } return rn; } @@ -440,7 +440,7 @@ in6_inithead(void **head, int off) rnh = *head; rnh->rnh_addaddr = in6_addroute; - rnh->rnh_matchaddr = in6_matroute; + rnh->rnh_matchaddr = rn_match; if (V__in6_rt_was_here == 0) { callout_init(&V_rtq_timer6, CALLOUT_MPSAFE); --------------010308000904000207080306 Content-Type: text/plain; charset=UTF-8; name="81_radix_rmlock.diff" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="81_radix_rmlock.diff" commit 0e7cebd1753c3b77bdc00d728fbd5910c2d2afec Author: Charlie Root Date: Mon Apr 8 15:35:00 2013 +0000 Make radix use rmlock. 
diff --git a/sys/contrib/ipfilter/netinet/ip_compat.h b/sys/contrib/ipfilter/netinet/ip_compat.h index 31e5b11..5e74da4 100644 --- a/sys/contrib/ipfilter/netinet/ip_compat.h +++ b/sys/contrib/ipfilter/netinet/ip_compat.h @@ -870,6 +870,7 @@ typedef u_int32_t u_32_t; # if (__FreeBSD_version >= 500043) # include # if (__FreeBSD_version > 700014) +# include # include # define KRWLOCK_T struct rwlock # ifdef _KERNEL diff --git a/sys/contrib/pf/net/pf_table.c b/sys/contrib/pf/net/pf_table.c index 40c9f67..b1dd703 100644 --- a/sys/contrib/pf/net/pf_table.c +++ b/sys/contrib/pf/net/pf_table.c @@ -44,6 +44,7 @@ __FBSDID("$FreeBSD$"); #include #include #include +#include #include #ifdef __FreeBSD__ #include diff --git a/sys/kern/subr_witness.c b/sys/kern/subr_witness.c index e565d01..f913d27 100644 --- a/sys/kern/subr_witness.c +++ b/sys/kern/subr_witness.c @@ -508,7 +508,7 @@ static struct witness_order_list_entry order_lists[] = { * Routing */ { "so_rcv", &lock_class_mtx_sleep }, - { "radix node head", &lock_class_rw }, + { "radix node head", &lock_class_rm }, { "rtentry", &lock_class_mtx_sleep }, { "ifaddr", &lock_class_mtx_sleep }, { NULL, NULL }, diff --git a/sys/kern/sys_socket.c b/sys/kern/sys_socket.c index 4cbae74..fea12d0 100644 --- a/sys/kern/sys_socket.c +++ b/sys/kern/sys_socket.c @@ -50,6 +50,8 @@ __FBSDID("$FreeBSD$"); #include #include +#include +#include #include #include diff --git a/sys/kern/vfs_export.c b/sys/kern/vfs_export.c index 4185211..848c232 100644 --- a/sys/kern/vfs_export.c +++ b/sys/kern/vfs_export.c @@ -47,7 +47,7 @@ __FBSDID("$FreeBSD$"); #include #include #include -#include +#include #include #include #include @@ -427,6 +427,7 @@ vfs_export_lookup(struct mount *mp, struct sockaddr *nam) register struct netcred *np; register struct radix_node_head *rnh; struct sockaddr *saddr; + RADIX_NODE_HEAD_READER; nep = mp->mnt_export; if (nep == NULL) diff --git a/sys/net/if.c b/sys/net/if.c index 5ecde8c..351e046 100644 --- a/sys/net/if.c +++ 
b/sys/net/if.c @@ -51,6 +51,7 @@ #include #include #include +#include #include #include #include diff --git a/sys/net/radix.c b/sys/net/radix.c index 33fcf82..d8d1e8b 100644 --- a/sys/net/radix.c +++ b/sys/net/radix.c @@ -37,7 +37,7 @@ #ifdef _KERNEL #include #include -#include +#include #include #include #include diff --git a/sys/net/radix.h b/sys/net/radix.h index 29659b5..2d130f0 100644 --- a/sys/net/radix.h +++ b/sys/net/radix.h @@ -36,7 +36,7 @@ #ifdef _KERNEL #include #include -#include +#include #endif #ifdef MALLOC_DECLARE @@ -133,7 +133,7 @@ struct radix_node_head { struct radix_node rnh_nodes[3]; /* empty tree for common case */ int rnh_multipath; /* multipath capable ? */ #ifdef _KERNEL - struct rwlock rnh_lock; /* locks entire radix tree */ + struct rmlock rnh_lock; /* locks entire radix tree */ #endif }; @@ -146,18 +146,21 @@ struct radix_node_head { #define R_Zalloc(p, t, n) (p = (t) malloc((unsigned long)(n), M_RTABLE, M_NOWAIT | M_ZERO)) #define Free(p) free((caddr_t)p, M_RTABLE); +#define RADIX_NODE_HEAD_READER struct rm_priotracker tracker #define RADIX_NODE_HEAD_LOCK_INIT(rnh) \ - rw_init_flags(&(rnh)->rnh_lock, "radix node head", 0) -#define RADIX_NODE_HEAD_LOCK(rnh) rw_wlock(&(rnh)->rnh_lock) -#define RADIX_NODE_HEAD_UNLOCK(rnh) rw_wunlock(&(rnh)->rnh_lock) -#define RADIX_NODE_HEAD_RLOCK(rnh) rw_rlock(&(rnh)->rnh_lock) -#define RADIX_NODE_HEAD_RUNLOCK(rnh) rw_runlock(&(rnh)->rnh_lock) -#define RADIX_NODE_HEAD_LOCK_TRY_UPGRADE(rnh) rw_try_upgrade(&(rnh)->rnh_lock) - - -#define RADIX_NODE_HEAD_DESTROY(rnh) rw_destroy(&(rnh)->rnh_lock) -#define RADIX_NODE_HEAD_LOCK_ASSERT(rnh) rw_assert(&(rnh)->rnh_lock, RA_LOCKED) -#define RADIX_NODE_HEAD_WLOCK_ASSERT(rnh) rw_assert(&(rnh)->rnh_lock, RA_WLOCKED) + rm_init(&(rnh)->rnh_lock, "radix node head") +#define RADIX_NODE_HEAD_LOCK(rnh) rm_wlock(&(rnh)->rnh_lock) +#define RADIX_NODE_HEAD_UNLOCK(rnh) rm_wunlock(&(rnh)->rnh_lock) +#define RADIX_NODE_HEAD_RLOCK(rnh) rm_rlock(&(rnh)->rnh_lock, &tracker) 
+#define RADIX_NODE_HEAD_RUNLOCK(rnh) rm_runlock(&(rnh)->rnh_lock, &tracker) +//#define RADIX_NODE_HEAD_LOCK_TRY_UPGRADE(rnh) rw_try_upgrade(&(rnh)->rnh_lock) + + +#define RADIX_NODE_HEAD_DESTROY(rnh) rm_destroy(&(rnh)->rnh_lock) +#define RADIX_NODE_HEAD_LOCK_ASSERT(rnh) +#define RADIX_NODE_HEAD_WLOCK_ASSERT(rnh) +//#define RADIX_NODE_HEAD_LOCK_ASSERT(rnh) rw_assert(&(rnh)->rnh_lock, RA_LOCKED) +//#define RADIX_NODE_HEAD_WLOCK_ASSERT(rnh) rw_assert(&(rnh)->rnh_lock, RA_WLOCKED) #endif /* _KERNEL */ void rn_init(int); diff --git a/sys/net/radix_mpath.c b/sys/net/radix_mpath.c index ee7826f..c69888e 100644 --- a/sys/net/radix_mpath.c +++ b/sys/net/radix_mpath.c @@ -45,6 +45,8 @@ __FBSDID("$FreeBSD$"); #include #include #include +#include +#include #include #include #include diff --git a/sys/net/route.c b/sys/net/route.c index 5d56688..2cf6ea5 100644 --- a/sys/net/route.c +++ b/sys/net/route.c @@ -52,6 +52,8 @@ #include #include #include +#include +#include #include #include @@ -544,6 +546,7 @@ rtalloc1_fib_nolock(struct sockaddr *dst, int report, u_long ignflags, struct rtentry *newrt; struct rt_addrinfo info; int err = 0, msgtype = RTM_MISS; + RADIX_NODE_HEAD_READER; int needlock; KASSERT((fibnum < rt_numfibs), ("rtalloc1_fib: bad fibnum")); @@ -612,6 +615,7 @@ rtalloc1_fib(struct sockaddr *dst, int report, u_long ignflags, struct rtentry *newrt; struct rt_addrinfo info; int err = 0, msgtype = RTM_MISS; + RADIX_NODE_HEAD_READER; int needlock; KASSERT((fibnum < rt_numfibs), ("rtalloc1_fib: bad fibnum")); @@ -799,6 +803,7 @@ rtredirect_fib(struct sockaddr *dst, struct rt_addrinfo info; struct ifaddr *ifa; struct radix_node_head *rnh; + RADIX_NODE_HEAD_READER; ifa = NULL; rnh = rt_tables_get_rnh(fibnum, dst->sa_family); diff --git a/sys/net/rtsock.c b/sys/net/rtsock.c index 58c46a6..18d3e06 100644 --- a/sys/net/rtsock.c +++ b/sys/net/rtsock.c @@ -45,6 +45,7 @@ #include #include #include +#include #include #include #include @@ -577,6 +578,7 @@ route_output(struct mbuf 
*m, struct socket *so) struct ifnet *ifp = NULL; union sockaddr_union saun; sa_family_t saf = AF_UNSPEC; + RADIX_NODE_HEAD_READER; #define senderr(e) { error = e; goto flush;} if (m == NULL || ((m->m_len < sizeof(long)) && @@ -1818,6 +1820,7 @@ sysctl_rtsock(SYSCTL_HANDLER_ARGS) int i, lim, error = EINVAL; u_char af; struct walkarg w; + RADIX_NODE_HEAD_READER; name ++; namelen--; diff --git a/sys/netinet/in_rmx.c b/sys/netinet/in_rmx.c index 1c9d9db..775ba5a 100644 --- a/sys/netinet/in_rmx.c +++ b/sys/netinet/in_rmx.c @@ -53,6 +53,8 @@ __FBSDID("$FreeBSD$"); #include #include +#include +#include #include #include diff --git a/sys/netinet6/in6_ifattach.c b/sys/netinet6/in6_ifattach.c index 80eb022..cbfe1d8 100644 --- a/sys/netinet6/in6_ifattach.c +++ b/sys/netinet6/in6_ifattach.c @@ -42,6 +42,8 @@ __FBSDID("$FreeBSD$"); #include #include #include +#include +#include #include #include diff --git a/sys/netinet6/in6_rmx.c b/sys/netinet6/in6_rmx.c index 9aabe63..a291db2 100644 --- a/sys/netinet6/in6_rmx.c +++ b/sys/netinet6/in6_rmx.c @@ -84,6 +84,7 @@ __FBSDID("$FreeBSD$"); #include #include #include +#include #include #include #include diff --git a/sys/netinet6/nd6_rtr.c b/sys/netinet6/nd6_rtr.c index 687d84d..7737d47 100644 --- a/sys/netinet6/nd6_rtr.c +++ b/sys/netinet6/nd6_rtr.c @@ -45,6 +45,7 @@ __FBSDID("$FreeBSD: stable/8/sys/netinet6/nd6_rtr.c 233201 2012-03-19 20:49:42Z #include #include #include +#include #include #include #include --------------010308000904000207080306 Content-Type: text/plain; charset=UTF-8; name="11_no_lle_rlock.diff" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="11_no_lle_rlock.diff" commit 963196095589c03880ddd13a5c16f9e50cf6d7ce Author: Charlie Root Date: Sun Nov 4 15:52:50 2012 +0000 Do not require locking arp lle diff --git a/sys/net/if_llatbl.h b/sys/net/if_llatbl.h index 9f6531b..c1b2af9 100644 --- a/sys/net/if_llatbl.h +++ b/sys/net/if_llatbl.h @@ -169,6 +169,7 @@ MALLOC_DECLARE(M_LLTABLE); #define 
LLE_PUB 0x0020 /* publish entry ??? */ #define LLE_DELETE 0x4000 /* delete on a lookup - match LLE_IFADDR */ #define LLE_CREATE 0x8000 /* create on a lookup miss */ +#define LLE_UNLOCKED 0x1000 /* return lle unlocked */ #define LLE_EXCLUSIVE 0x2000 /* return lle xlocked */ #define LLATBL_HASH(key, mask) \ diff --git a/sys/netinet/if_ether.c b/sys/netinet/if_ether.c index f61b803..ecb9b8e 100644 --- a/sys/netinet/if_ether.c +++ b/sys/netinet/if_ether.c @@ -283,10 +283,10 @@ arpresolve(struct ifnet *ifp, struct rtentry *rt0, struct mbuf *m, struct sockaddr *dst, u_char *desten, struct llentry **lle) { struct llentry *la = 0; - u_int flags = 0; + u_int flags = LLE_UNLOCKED; struct mbuf *curr = NULL; struct mbuf *next = NULL; - int error, renew; + int error, renew = 0; *lle = NULL; if (m != NULL) { @@ -307,7 +307,41 @@ arpresolve(struct ifnet *ifp, struct rtentry *rt0, struct mbuf *m, retry: IF_AFDATA_RLOCK(ifp); la = lla_lookup(LLTABLE(ifp), flags, dst); + + /* + * Fast path. Do not require rlock on llentry. + */ + if ((la != NULL) && (flags & LLE_UNLOCKED)) { + if ((la->la_flags & LLE_VALID) && + ((la->la_flags & LLE_STATIC) || la->la_expire > time_uptime)) { + bcopy(&la->ll_addr, desten, ifp->if_addrlen); + /* + * If entry has an expiry time and it is approaching, + * see if we need to send an ARP request within this + * arpt_down interval. 
+ */ + if (!(la->la_flags & LLE_STATIC) && + time_uptime + la->la_preempt > la->la_expire) { + renew = 1; + la->la_preempt--; + } + + IF_AFDATA_RUNLOCK(ifp); + if (renew != 0) + arprequest(ifp, NULL, &SIN(dst)->sin_addr, NULL); + + return (0); + } + + /* Revert to normal path for other cases */ + *lle = la; + LLE_RLOCK(la); + } + + flags &= ~LLE_UNLOCKED; + IF_AFDATA_RUNLOCK(ifp); + if ((la == NULL) && ((flags & LLE_EXCLUSIVE) == 0) && ((ifp->if_flags & (IFF_NOARP | IFF_STATICARP)) == 0)) { flags |= (LLE_CREATE | LLE_EXCLUSIVE); @@ -324,27 +358,6 @@ retry: return (EINVAL); } - if ((la->la_flags & LLE_VALID) && - ((la->la_flags & LLE_STATIC) || la->la_expire > time_second)) { - bcopy(&la->ll_addr, desten, ifp->if_addrlen); - /* - * If entry has an expiry time and it is approaching, - * see if we need to send an ARP request within this - * arpt_down interval. - */ - if (!(la->la_flags & LLE_STATIC) && - time_second + la->la_preempt > la->la_expire) { - arprequest(ifp, NULL, - &SIN(dst)->sin_addr, IF_LLADDR(ifp)); - - la->la_preempt--; - } - - *lle = la; - error = 0; - goto done; - } - if (la->la_flags & LLE_STATIC) { /* should not happen! 
*/ log(LOG_DEBUG, "arpresolve: ouch, empty static llinfo for %s\n", inet_ntoa(SIN(dst)->sin_addr)); diff --git a/sys/netinet/in.c b/sys/netinet/in.c index eaba4e5..5341918 100644 --- a/sys/netinet/in.c +++ b/sys/netinet/in.c @@ -1561,7 +1561,7 @@ in_lltable_lookup(struct lltable *llt, u_int flags, const struct sockaddr *l3add if (LLE_IS_VALID(lle)) { if (flags & LLE_EXCLUSIVE) LLE_WLOCK(lle); - else + else if (!(flags & LLE_UNLOCKED)) LLE_RLOCK(lle); } done: --------------010308000904000207080306-- From owner-freebsd-net@FreeBSD.ORG Wed Aug 28 19:37:12 2013 Return-Path: Delivered-To: net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id BDB86FF7; Wed, 28 Aug 2013 19:37:12 +0000 (UTC) (envelope-from jfvogel@gmail.com) Received: from mail-ve0-x235.google.com (mail-ve0-x235.google.com [IPv6:2607:f8b0:400c:c01::235]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id DEA2221E4; Wed, 28 Aug 2013 19:37:11 +0000 (UTC) Received: by mail-ve0-f181.google.com with SMTP id jz10so4684066veb.12 for ; Wed, 28 Aug 2013 12:37:11 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=dxnKa8hM+AefrXStvsj3F1WeY+ZAph3Wgx8EEc4Juck=; b=nDn2BH6n6WYTCIyZd96zHN8qhwNcQ0pJVWb4lKM7AK7rcBXSJRUP5XpTA3vnpkd5Qz 5uPda4ecEq7EqxRDWBYRA4Schfm5gAeoGby4K41DMofd8RpKrdlnWj+7ZT8JND8Hedh5 tAPciWe84X9MKbEc4HINMV7Yku+OAZn2/zgpeaye7vPXzAsGnHwUEpFOcWWPTR05qs43 8usgrgTfZx3ua2xF9o1tCACdJTt7edXUX0o9mGvYSsaCSTDTJOTVjltYPOmw716R9Fwl 6k+XnaZw8cUQmXH6qpoWM4ijNVJEv5LgiNCZ6A6JWdDuv0U76pIIt5Z0f7hkNytI40AL Phew== MIME-Version: 1.0 X-Received: by 10.58.235.69 with SMTP id uk5mr27194246vec.17.1377718630983; Wed, 28 Aug 2013 12:37:10 -0700 
(PDT)
Received: by 10.220.159.141 with HTTP; Wed, 28 Aug 2013 12:37:10 -0700 (PDT)
In-Reply-To: <521E41CB.30700@yandex-team.ru>
References: <521E41CB.30700@yandex-team.ru>
Date: Wed, 28 Aug 2013 12:37:10 -0700
Message-ID:
Subject: Re: Network stack changes
From: Jack Vogel
To: "Alexander V. Chernikov"
Content-Type: text/plain; charset=ISO-8859-1
X-Content-Filtered-By: Mailman/MimeDel 2.1.14
Cc: Adrian Chadd, Andre Oppermann, FreeBSD Hackers, FreeBSD Net, Luigi Rizzo, "Andrey V. Elsukov", freebsd-arch@freebsd.org
X-BeenThere: freebsd-net@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Networking and TCP/IP with FreeBSD
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Wed, 28 Aug 2013 19:37:12 -0000

Very interesting material Alexander, only had time to glance at it now,
will look in more depth later, thanks!

Jack

On Wed, Aug 28, 2013 at 11:30 AM, Alexander V. Chernikov <melifaro@yandex-team.ru> wrote:

> Hello list!
>
> There are a lot of constantly recurring discussions related to networking
> stack performance/changes.
>
> I'll try to summarize current problems and possible solutions from my
> point of view.
> (Generally this is one problem: stack is slooooooooooooooooooooooooooow,
> but we need to know why and what to do).
>
> Let's start with current IPv4 packet flow on a typical router:
> http://static.ipfw.ru/images/freebsd_ipv4_flow.png
>
> (I'm sorry I can't provide this as text since Visio doesn't have any
> 'ascii-art' exporter).
>
> Note that we are using a process-to-completion model, i.e. we process any
> packet in ISR until it is either
> consumed by L4+ stack or dropped or put to egress NIC queue.
>
> (There is also a deferred ISR model implemented inside netisr, but it does
> not change much:
> it can help to do more fine-grained hashing (for GRE or other similar
> traffic), but
> 1) it uses per-packet mutex locking, which kills all performance
> 2) it currently does not have _any_ hashing functions (see absence of
> flags in `netstat -Q`)
> People using http://static.ipfw.ru/patches/netisr_ip_flowid.diff (or a
> modified PPPoE/GRE version)
> report some profit, but without fixing (1) it can't help much
> )
>
> So, let's start:
>
> 1) Ixgbe uses a mutex to protect each RX ring, which is perfectly fine since
> there is nearly no contention
> (the only thing that can happen is driver reconfiguration, which is rare,
> and, more significantly, we do this once
> for the batch of packets received in a given interrupt). However, due to
> some (im)possible deadlocks, current code
> does per-packet ring unlock/lock (see ixgbe_rx_input()).
> There was a discussion that ended with nothing:
> http://lists.freebsd.org/pipermail/freebsd-net/2012-October/033520.html
>
> 1*) Possible BPF users. Here we have one rlock if there are any readers
> present
> (and a mutex for any matching packets, but this is more or less OK.
> Additionally, there is WIP to implement multiqueue BPF
> and there is a chance that we can reduce lock contention there). There is
> also an "optimize_writers" hack permitting applications
> like CDP to use BPF as writers without registering them as receivers
> (which implies rlock)
>
> 2/3) Virtual interfaces (laggs/vlans over lagg and other similar
> constructions).
> Currently we simply use an rlock to make s/ix0/lagg0/ and, what is even
> funnier, we use a complex vlan_hash with another rlock to
> get the vlan interface from the underlying one.
>
> This is definitely not how things should be done, and this can be changed
> more or less easily.
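
To make the vlan lookup in item 2/3 concrete, here is a minimal userspace sketch of one way it could be made cheap. It assumes a flat per-parent table indexed by the 12-bit VLAN ID, filled in on configuration changes and read with a single array access per packet. The names `vlan_table`, `vlan_table_set` and `vlan_table_lookup` are hypothetical and are not taken from if_vlan or from the post; the actual proposal may look quite different.

```c
#include <assert.h>
#include <stdint.h>

#define VLAN_ID_MASK    0x0FFFu  /* low 12 bits of the 802.1Q TCI */
#define VLAN_TABLE_SIZE 4096     /* one slot per possible VLAN ID */
#define NO_IFINDEX      0        /* hypothetical "no interface" marker */

/* Hypothetical per-parent table: VLAN ID -> index of the vlan interface. */
struct vlan_table {
    uint16_t ifindex[VLAN_TABLE_SIZE];
};

/* Configuration path: called on vlan create/destroy, never per packet. */
void
vlan_table_set(struct vlan_table *t, uint16_t vid, uint16_t ifindex)
{
    assert(vid < VLAN_TABLE_SIZE);
    t->ifindex[vid] = ifindex;
}

/*
 * Packet path: mask off the priority bits of the tag and do one array
 * read. This sketch takes no lock; a real kernel version would still
 * need a safe way to publish and retire table updates.
 */
uint16_t
vlan_table_lookup(const struct vlan_table *t, uint16_t tci)
{
    return t->ifindex[tci & VLAN_ID_MASK];
}
```

The table costs 8 KB per parent interface, which is the price paid for replacing a hash walk with a direct index.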
>
> There are some useful terms/techniques in the world of software/hardware
> routing: they have a clear 'control plane' and 'data plane' separation.
> The former deals with control traffic (IGP, MLD, IGMP snooping, lagg
> hellos, ARP/NDP, etc.) and some data traffic (packets with TTL=1, with
> options, destined to hosts without an ARP/NDP record, and similar). The
> latter is done in hardware (or an efficient software implementation).
> The control plane is responsible for providing data for efficient data plane
> operations. This is the point we are missing nearly everywhere.
>
> What I want to say is: lagg is pure control-plane stuff and vlan is nearly
> the same. We can't apply this approach to complex cases like
> lagg-over-vlans-over-vlans-over-(pppoe_ng0-and_wifi0)
> but we definitely can do this for most common setups like (igb* or ix* in
> lagg with or without vlans on top of lagg).
>
> We already have some capabilities like VLANHWFILTER/VLANHWTAG, and we can add
> some more. We even have per-driver hooks to program HW filtering.
>
> One small step is to throw the packet to the vlan interface directly (P1),
> proof-of-concept (working in production):
> http://lists.freebsd.org/pipermail/freebsd-net/2013-April/035270.html
>
> Another is to change lagg packet accounting:
> http://lists.freebsd.org/pipermail/svn-src-all/2013-April/067570.html
> Again, this is more like what HW boxes do (aggregate all counters including
> errors) (and I can't imagine what real error we can get from _lagg_).
>
> 4) If we are a router, we can do either the slooow ip_input() -> ip_forward() ->
> ip_output() cycle or use the optimized ip_fastforward(), which falls back to
> the 'slow' path for multicast/options/local traffic (i.e. it works exactly
> like the 'data plane' part).
> (Btw, we can consider turning net.inet.ip.fastforwarding on by
> default, at least for non-IPSEC kernels)
>
> Here we have to determine if this is a local packet or not, i.e. F(dst_ip)
> returning 1 or 0.
> Currently we are simply using a standard rlock + hash of
> iface addresses.
> (And some consumers like ipfw(4) do the same, but without a lock).
> We don't need to do this! We can build a sorted array of IPv4 addresses or
> another efficient structure on every address change and use it unlocked with
> delayed garbage collection (proof-of-concept attached)
> (There is another thing to discuss: maybe we can do this once somewhere in
> ip_input and mark the mbuf as 'local/non-local'?)
>
> 5, 9) Currently we have L3 ingress/egress PFIL hooks protected by rmlocks.
> This is OK.
>
> However, 6) and 7) are not.
> The firewall can use the same pfil lock as reader protection without imposing
> its own lock. Currently the pfil and ipfw code is ready to do this.
>
> 8) Radix/rt* api. This is probably the worst place in the entire stack. It is
> toooo generic, tooo slow and buggy (do you use IPv6? you definitely know
> what I'm talking about).
> A) It really is too generic, and the assumption that it can be (effectively)
> used for every family is wrong. Two examples:
> we don't need to look up all 128 bits of an IPv6 address. Subnets with masks
> longer than /64 are not widely used (actually the only reason to use them is
> p2p links, due to potential ND problems).
> One common solution is to look up 64 bits and build another trie (or
> other structure) in case of collision.
> Another example is MPLS, where we can simply do a direct array lookup based
> on the ingress label.
>
> B) It is terribly slow (AFAIR luigi@ did some performance measurements,
> numbers available in one of the netmap pdfs)
> C) It is not multipath-capable. Stateful (and non-working) multipath is
> definitely not the right way.
>
> 8*) rtentry
> We are doing it wrong.
> Currently _every_ lookup locks/unlocks the given rte twice.
> The first lock is related to an old-old story of trusting IP redirects (and
> auto-adding host routes for them). Hopefully, it is currently disabled
> automatically when you turn forwarding on.
> The second one is much more complicated: we are assuming that rte's with a
> non-zero refcount value can stop the egress interface from being destroyed.
> This is a wrong (but widely used) assumption.
>
> We can use delayed GC instead of locking for rte's, and this won't break
> things more than they are broken now (patch attached).
> We can't do the same for ifp structures since
> a) virtual ones can assume some state in the underlying physical NIC
> b) physical ones just _can_ be destroyed (maybe regardless of whether the
> user wants this or not, e.g. an SFP being unplugged from the NIC) or simply
> lead to a kernel crash due to SW/HW inconsistency
>
> One possible solution is to implement stable refcounts based on PCPU
> counters and apply those counters to ifp, but this seems to be non-trivial.
>
>
> Another rtalloc(9) problem is the fact that radix is used as both the 'control
> plane' and the 'data plane' structure/api. Some users always want to put more
> information in rte, while others
> want to make rte more compact. We just need _different_ structures for
> that.
> A feature-rich, lot-of-data control plane one (to store everything we want
> to store, including, for example, the PID of the process originating the
> route): current radix can be modified to do this.
> And another, address-family-dependent structure (array, trie, or anything)
> which contains _only_ the data necessary to put a packet on the wire.
>
> 11) arpresolve. Currently (this was decoupled in 8.x) we have
> a) ifaddr rlock
> b) lle rlock.
>
> We don't need those locks.
> We need to
> a) make lle layer per-interface instead of global (and this can also solve
> multiple fibs and L2 mappings done in fib.0 issue)
> b) use rtalloc(9)-provided lock instead of separate locking
> c) actually, we need to do rewrite this layer because
> d) lle actually is the place to do real multipath:
>
> briefly,
> you have rte pointing to some special nexthop structure pointing to lle,
> which has the following data:
> num_of_egress_ifaces: [ifindex1, ifindex2, ifindex3] | L2 data to prepend
> to header
> Separate post will follow.
>
> With the following, we can achieve lagg traffic distribution without
> actually using lagg_transmit and similar stuff (at least in most common
> scenarious)
> (for example, TCP output definitely can benefit from this, since we can
> account flowid once for TCP session and use in in every mbuf)
>
>
> So. Imagine we have done all this. How we can estimate the difference?
>
> There was a thread, started a year ago, describing 'stock' performance and
> difference for various modifications.
> It is done on 8.x, however I've got similar results on recent 9.x
>
> http://lists.freebsd.org/pipermail/freebsd-net/2012-July/032680.html
>
> Briefly:
>
> 2xE5645 @ Intel 82599 NIC.
> Kernel: FreeBSD-8-S r237994, stock drivers, stock routing, no FLOWTABLE,
> no firewallIxia XM2 (traffic generator) <> ix0 (FreeBSD). Ixia sends 64byte
> IP packets from vlan10 (10.100.0.64 - 10.100.0.156) to destinations in
> vlan11 (10.100.1.128 - 10.100.1.192). Static arps are configured for all
> destination addresses. Traffic level is slightly above or slightly below
> system performance.
>
> we start from 1.4MPPS (if we are using several routes to minimize mutex
> contention).
> > My 'current' result for the same test, on same HW, with the following > modifications: > > * 1) ixgbe per-packet ring unlock removed > * P1) ixgbe is modified to do direct vlan input (so 2,3 are not used) > * 4) separate lockless in_localip() version > * 6) - using existing pfil lock > * 7) using lockless version > * 8) radix converted to use rmlock instead of rlock. Delayed GC is used > instead of mutexes > * 10) - using existing pfil lock > * 11) using radix lock to do arpresolve(). Not using lle rlock > > (so the rmlocks are the only locks used on data path). > > Additionally, ipstat counters are converted to PCPU (no real performance > implications). > ixgbe does not do per-packet accounting (as in head). > if_vlan counters are converted to PCPU > lagg is converted to rmlock, per-packet accounting is removed (using stat > from underlying interfaces) > lle hash size is bumped to 1024 instead of 32 (not applicable here, but > slows things down for large L2 domains) > > The result is 5.6 MPPS for single port (11 cores) and 6.5MPPS for lagg (16 > cores), nearly the same for HT on and 22 cores. > > .. > while Intel DPDK claims 80MPPS (and 6windgate talks about 160 or so) on > the same-class hardware and _userland_ forwarding. > > One of key features making all such products possible (DPDK, netmap, > packetshader, Cisco SW forwarding) - is use of batching instead of > process-to-completion model. > Batching mitigates locking cost, batching does not wash out CPU cache, and > so on. > > So maybe we can consider passing batches from NIC to at least L2 layer > with netisr? or even up to ip_input() ? > > Another question is about making some sort of reliable GC like ("passive > serialization" or other similar not-to-pronounce-words about Linux and > lockless objects). > > > P.S. Attached patches are 1) for 8.x 2) mostly 'hacks' showing roughly how > can this be done and what benefit can be achieved. 
> > > > > > > > > > > _______________________________________________ > freebsd-arch@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-arch > To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org" > From owner-freebsd-net@FreeBSD.ORG Wed Aug 28 22:25:04 2013 Return-Path: Delivered-To: net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id 786CDD38 for ; Wed, 28 Aug 2013 22:25:04 +0000 (UTC) (envelope-from andre@freebsd.org) Received: from c00l3r.networx.ch (c00l3r.networx.ch [62.48.2.2]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 6EA9E2CCC for ; Wed, 28 Aug 2013 22:25:03 +0000 (UTC) Received: (qmail 22174 invoked from network); 28 Aug 2013 23:06:41 -0000 Received: from c00l3r.networx.ch (HELO [127.0.0.1]) ([62.48.2.2]) (envelope-sender ) by c00l3r.networx.ch (qmail-ldap-1.03) with SMTP for ; 28 Aug 2013 23:06:41 -0000 Message-ID: <521E78B0.6080709@freebsd.org> Date: Thu, 29 Aug 2013 00:24:48 +0200 From: Andre Oppermann User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:17.0) Gecko/20130801 Thunderbird/17.0.8 MIME-Version: 1.0 To: "Alexander V. 
Chernikov" Subject: Re: Network stack changes References: <521E41CB.30700@yandex-team.ru> In-Reply-To: <521E41CB.30700@yandex-team.ru> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Cc: adrian@freebsd.org, freebsd-hackers@freebsd.org, FreeBSD Net , luigi@freebsd.org, ae@FreeBSD.org, freebsd-arch@freebsd.org X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 28 Aug 2013 22:25:04 -0000 On 28.08.2013 20:30, Alexander V. Chernikov wrote: > Hello list! Hello Alexander, you sent quite a few things in the same email. I'll try to respond as much as I can right now. Later you should split it up to have more in-depth discussions on the individual parts. If you could make it to the EuroBSDcon 2013 DevSummit that would be even more awesome. Most of the active network stack people will be there too. > There is a lot constantly raising discussions related to networking stack performance/changes. > > I'll try to summarize current problems and possible solutions from my point of view. > (Generally this is one problem: stack is slooooooooooooooooooooooooooow, but we need to know why and > what to do). Compared to others its not thaaaaaaat slow. ;) > Let's start with current IPv4 packet flow on a typical router: > http://static.ipfw.ru/images/freebsd_ipv4_flow.png > > (I'm sorry I can't provide this as text since Visio don't have any 'ascii-art' exporter). > > Note that we are using process-to-completion model, e.g. process any packet in ISR until it is either > consumed by L4+ stack or dropped or put to egress NIC queue. 
> > (There is also deferred ISR model implemented inside netisr but it does not change much: > it can help to do more fine-grained hashing (for GRE or other similar traffic), but > 1) it uses per-packet mutex locking which kills all performance > 2) it currently does not have _any_ hashing functions (see absence of flags in `netstat -Q`) > People using http://static.ipfw.ru/patches/netisr_ip_flowid.diff (or modified PPPoe/GRE version) > report some profit, but without fixing (1) it can't help much > ) > > So, let's start: > > 1) Ixgbe uses mutex to protect each RX ring which is perfectly fine since there is nearly no contention > (the only thing that can happen is driver reconfiguration which is rare and, more signifficant, we > do this once > for the batch of packets received in given interrupt). However, due to some (im)possible deadlocks > current code > does per-packet ring unlock/lock (see ixgbe_rx_input()). > There was a discussion ended with nothing: > http://lists.freebsd.org/pipermail/freebsd-net/2012-October/033520.html > > 1*) Possible BPF users. Here we have one rlock if there are any readers present > (and mutex for any matching packets, but this is more or less OK. Additionally, there is WIP to > implement multiqueue BPF > and there is chance that we can reduce lock contention there). Rlock to rmlock? > There is also an "optimize_writers" hack permitting applications > like CDP to use BPF as writers but not registering them as receivers (which implies rlock) I believe longer term we should solve this with a protocol type "ethernet" so that one can send/receive ethernet frames through a normal socket. > 2/3) Virtual interfaces (laggs/vlans over lagg and other simular constructions). > Currently we simply use rlock to make s/ix0/lagg0/ and, what is much more funny - we use complex > vlan_hash with another rlock to > get vlan interface from underlying one. > > This is definitely not like things should be done and this can be changed more or less easily. 
Indeed. > There are some useful terms/techniques in world of software/hardware routing: they have clear > 'control plane' and 'data plane' separation. > Former one is for dealing control traffic (IGP, MLD, IGMP snooping, lagg hellos, ARP/NDP, etc..) and > some data traffic (packets with TTL=1, with options, destined to hosts without ARP/NDP record, and > similar). Latter one is done in hardware (or effective software implementation). > Control plane is responsible to provide data for efficient data plane operations. This is the point > we are missing nearly everywhere. ACK. > What I want to say is: lagg is pure control-plane stuff and vlan is nearly the same. We can't apply > this approach to complex cases like lagg-over-vlans-over-vlans-over-(pppoe_ng0-and_wifi0) > but we definitely can do this for most common setups like (igb* or ix* in lagg with or without vlans > on top of lagg). ACK. > We already have some capabilities like VLANHWFILTER/VLANHWTAG, we can add some more. We even have > per-driver hooks to program HW filtering. We could. Though for vlan it looks like it would be easier to remove the hardware vlan tag stripping and insertion. It only adds complexity in all drivers for no gain. > One small step to do is to throw packet to vlan interface directly (P1), proof-of-concept(working in > production): > http://lists.freebsd.org/pipermail/freebsd-net/2013-April/035270.html > > Another is to change lagg packet accounting: > http://lists.freebsd.org/pipermail/svn-src-all/2013-April/067570.html > Again, this is more like HW boxes do (aggregate all counters including errors) (and I can't imagine > what real error we can get from _lagg_). > > 4) If we are router, we can do either slooow ip_input() -> ip_forward() -> ip_output() cycle or use > optimized ip_fastfwd() which falls back to 'slow' path for multicast/options/local traffic (e.g. > works exactly like 'data plane' part). 
> (Btw, we can consider net.inet.ip.fastforwarding to be turned on by default at least for non-IPSEC
> kernels)

ACK.

> Here we have to determine if this is local packet or not, e.g. F(dst_ip) returning 1 or 0. Currently
> we are simply using standard rlock + hash of iface addresses.
> (And some consumers like ipfw(4) do the same, but without lock).
> We don't need to do this! We can build sorted array of IPv4 addresses or other efficient structure
> on every address change and use it unlocked with delayed garbage collection (proof-of-concept attached)

I'm a bit uneasy with unlocked access.  On very weakly ordered
architectures this could trip over cache coherency issues.  An rmlock
is essentially free in the read case.

> (There is another thing to discuss: maybe we can do this once somewhere in ip_input and mark mbuf as
> 'local/non-local' ? )

The problem is that packet filters may change the destination address
and thus can invalidate such a lookup.

> 5, 9) Currently we have L3 ingress/egress PFIL hooks protected by rmlocks. This is OK.
>
> However, 6) and 7) are not.
> Firewall can use the same pfil lock as reader protection without imposing its own lock. currently
> pfil&ipfw code is ready to do this.

The problem with the global pfil rmlock is the comparatively long time
it is held in a locked state.  Also, packet filters may have to acquire
additional locks when they have to modify state tables.  Rmlocks are
not made for that because they pin the thread to the cpu it is
currently on.  This is what Gleb is complaining about.

My idea is to hold the pfil rmlock only for the lookup of the first/next
packet filter that will run, not for the entire duration.  That would
solve the problem.  However, packet filters then have to use their own
locks again, which could be rmlocks too.

> 8) Radix/rt* api. This is probably the worst place in entire stack. It is toooo generic, tooo slow
> and buggy (do you use IPv6? you definitely know what I'm talking about).
> A) It really is too generic and assumption that it can be (effectively) used for every family is
> wrong. Two examples:
> we don't need to lookup all 128 bits of IPv6 address. Subnets with mask >/64 are not used widely
> (actually the only reason to use them are p2p links due to ND potential problems).
> One of common solutions is to lookup 64bits, and build another trie (or other structure) in case of
> collision.
> Another example is MPLS where we can simply do direct array lookup based on ingress label.

Yes.  While we shouldn't throw it out, it should be run as the RIB and
allow a much more protocol-specific FIB for the hot packet path.

> B) It is terribly slow (AFAIR luigi@ did some performance management, numbers available in one of
> netmap pdfs)

Again not thaaaat slow but inefficient enough.

> C) It is not multipath-capable. Stateful (and non-working) multipath is definitely not the right way.

Indeed.

> 8*) rtentry
> We are doing it wrong.
> Currently _every_ lookup locks/unlocks given rte twice.
> First lock is related to and old-old story for trusting IP redirects (and auto-adding host routes
> for them). Hopefully currently it is disabled automatically when you turn forwarding on.

They're disabled.

> The second one is much more complicated: we are assuming that rte's with non-zero refcount value can
> stop egress interface from being destroyed.
> This is wrong (but widely used) assumption.

Not really.  The reason for the refcount is not the ifp reference but
other code parts that may hold direct pointers to the rtentry and do
direct dereferencing to access information in it.

> We can use delayed GC instead of locking for rte's and this won't break things more than they are
> broken now (patch attached).

Nope.  Delayed GC is not the way to go here.  To do away with rtentry
locking and refcounting we have to change rtalloc(9) to return the
information the caller wants (e.g. ifp, ia, others) and not the rtentry
address anymore.
So instead of rtalloc() we have rtlookup().

> We can't do the same for ifp structures since
> a) virtual ones can assume some state in underlying physical NIC
> b) physical ones just _can_ be destroyed (maybe regardless of user wants this or not, like: SFP
> being unplugged from NIC) or simply lead to kernel crash due to SW/HW inconsistency

Here I actually believe we can do a GC or stable storage based approach.
Ifp pointers are kept in too many places and properly refcounting them
is very (too) hard.  So whenever an interface gets destroyed or
disappears, its callable function pointers are replaced with dummies
returning an error.  The ifp in memory will stay for some time and may
even be reused for another new interface later (Cisco does it that way
in their IOS).

> One of possible solution is to implement stable refcounts based on PCPU counters, and apply thos
> counters to ifp, but seem to be non-trivial.
>
>
> Another rtalloc(9) problem is the fact that radix is used as both 'control plane' and 'data plane'
> structure/api. Some users always want to put more information in rte, while others
> want to make rte more compact. We just need _different_ structures for that.

ACK.

> Feature-rich, lot-of-data control plane one (to store everything we want to store, including, for
> example, PID of process originating the route) - current radix can be modified to do this.
> And address-family-depended another structure (array, trie, or anything) which contains _only_ data
> necessary to put packet on the wire.

ACK.

> 11) arpresolve. Currently (this was decoupled in 8.x) we have
> a) ifaddr rlock
> b) lle rlock.
>
> We don't need those locks.
> We need to
> a) make lle layer per-interface instead of global (and this can also solve multiple fibs and L2
> mappings done in fib.0 issue)

Yes!

> b) use rtalloc(9)-provided lock instead of separate locking

No.  Interface rmlock.
> c) actually, we need to do rewrite this layer because
> d) lle actually is the place to do real multipath:

No, you can do multipath through more than one interface.  If lle is
per interface that won't work, and it is not the right place.

> briefly,
> you have rte pointing to some special nexthop structure pointing to lle, which has the following data:
> num_of_egress_ifaces: [ifindex1, ifindex2, ifindex3] | L2 data to prepend to header
> Separate post will follow.

This should be part of the RIB/FIB and select one of the ifp+nexthops
to return on lookup.

> With the following, we can achieve lagg traffic distribution without actually using lagg_transmit
> and similar stuff (at least in most common scenarious)

This seems to be a rather nasty layering violation.

> (for example, TCP output definitely can benefit from this, since we can account flowid once for TCP
> session and use in in every mbuf)
>
> So. Imagine we have done all this. How we can estimate the difference?
>
> There was a thread, started a year ago, describing 'stock' performance and difference for various
> modifications.
> It is done on 8.x, however I've got similar results on recent 9.x
>
> http://lists.freebsd.org/pipermail/freebsd-net/2012-July/032680.html
>
> Briefly:
>
> 2xE5645 @ Intel 82599 NIC.
> Kernel: FreeBSD-8-S r237994, stock drivers, stock routing, no FLOWTABLE, no firewallIxia XM2
> (traffic generator) <> ix0 (FreeBSD). Ixia sends 64byte IP packets from vlan10 (10.100.0.64 -
> 10.100.0.156) to destinations in vlan11 (10.100.1.128 - 10.100.1.192). Static arps are configured
> for all destination addresses. Traffic level is slightly above or slightly below system performance.
>
> we start from 1.4MPPS (if we are using several routes to minimize mutex contention).
>
> My 'current' result for the same test, on same HW, with the following modifications:
>
> * 1) ixgbe per-packet ring unlock removed
> * P1) ixgbe is modified to do direct vlan input (so 2,3 are not used)
> * 4) separate lockless in_localip() version
> * 6) - using existing pfil lock
> * 7) using lockless version
> * 8) radix converted to use rmlock instead of rlock. Delayed GC is used instead of mutexes
> * 10) - using existing pfil lock
> * 11) using radix lock to do arpresolve(). Not using lle rlock
>
> (so the rmlocks are the only locks used on data path).
>
> Additionally, ipstat counters are converted to PCPU (no real performance implications).
> ixgbe does not do per-packet accounting (as in head).
> if_vlan counters are converted to PCPU
> lagg is converted to rmlock, per-packet accounting is removed (using stat from underlying interfaces)
> lle hash size is bumped to 1024 instead of 32 (not applicable here, but slows things down for large
> L2 domains)
>
> The result is 5.6 MPPS for single port (11 cores) and 6.5MPPS for lagg (16 cores), nearly the same
> for HT on and 22 cores.

That's quite good, but we want more. ;)

> ..
> while Intel DPDK claims 80MPPS (and 6windgate talks about 160 or so) on the same-class hardware and
> _userland_ forwarding.

Those numbers sound a bit far out.  Maybe if the packet isn't touched
or looked at at all in a pure netmap interface to interface bridging
scenario.  I don't believe these numbers.

> One of key features making all such products possible (DPDK, netmap, packetshader, Cisco SW
> forwarding) - is use of batching instead of process-to-completion model.
> Batching mitigates locking cost, batching does not wash out CPU cache, and so on.

The work has to be done eventually.  Batching doesn't relieve us from
it.  IMHO batch moving is only the last step we should look at.  It
makes the stack rather complicated and introduces other issues like
packet latency.
> So maybe we can consider passing batches from NIC to at least L2 layer with netisr? or even up to > ip_input() ? And then? You probably won't win much in the end (if the lock path is optimized). > Another question is about making some sort of reliable GC like ("passive serialization" or other > similar not-to-pronounce-words about Linux and lockless objects). Rmlocks are our secret weapon and just as good. > P.S. Attached patches are 1) for 8.x 2) mostly 'hacks' showing roughly how can this be done and what > benefit can be achieved. -- Andre From owner-freebsd-net@FreeBSD.ORG Wed Aug 28 23:42:59 2013 Return-Path: Delivered-To: net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id 10942647; Wed, 28 Aug 2013 23:42:59 +0000 (UTC) (envelope-from asomers@gmail.com) Received: from mail-qa0-x22f.google.com (mail-qa0-x22f.google.com [IPv6:2607:f8b0:400d:c00::22f]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 7E389211D; Wed, 28 Aug 2013 23:42:58 +0000 (UTC) Received: by mail-qa0-f47.google.com with SMTP id j7so2298910qaq.20 for ; Wed, 28 Aug 2013 16:42:57 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:date:message-id:subject :from:to:cc:content-type; bh=tW3XeRzW9/4QvfSFm3hG7IY07mMajzujPReCU73KOVI=; b=CYbZBPVWJj2AYBbBCOz+ExfmVNsUI45n41WrqbYtjOUIhawt34lWnF54B+rkOypwKO DxhtxOcF9op64TsJV7Ni0juXzNH/jBhvWZkftJ1L0t9CkRD68VbCFU6UIj82xiTRbBWN o741yHmXNN82jVNrK8JveYw2F3xvVVcTWY8chuUu34yT9kQ+sjZxMCQMIFRcrxstN9pS E+smGudEHyE+j2RpshTCZQOVTtBgBYMGLr9FDbXLBgNxO4ZpdMMb2pX2wLkwc6P/JvBP C5rXqjWsIytbCFwYnUTB0JBupCOtSndbpC8EPWpO7PD/hrYqVgPDjOqKVrQu6EB9QUNC u+DQ== MIME-Version: 1.0 X-Received: by 10.224.46.202 with SMTP id 
k10mr1317225qaf.63.1377733377637; Wed, 28 Aug 2013 16:42:57 -0700 (PDT) Sender: asomers@gmail.com Received: by 10.49.39.101 with HTTP; Wed, 28 Aug 2013 16:42:57 -0700 (PDT) In-Reply-To: <521BBD21.4070304@freebsd.org> References: <521BBD21.4070304@freebsd.org> Date: Wed, 28 Aug 2013 17:42:57 -0600 X-Google-Sender-Auth: EMv0kAPDL5Sh0dCB9KRcfiY-hdU Message-ID: Subject: Re: Flow ID, LACP, and igb From: Alan Somers To: Andre Oppermann Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.14 Cc: Jack F Vogel , "Justin T. Gibbs" , Alan Somers , net@freebsd.org X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 28 Aug 2013 23:42:59 -0000 On Mon, Aug 26, 2013 at 2:40 PM, Andre Oppermann wrote: > On 26.08.2013 19:18, Justin T. Gibbs wrote: > >> Hi Net, >> >> I'm an infrequent traveler through the networking code and would >> appreciate some feedback on some proposed solutions to issues Spectra >> has seen with outbound LACP traffic. >> >> lacp_select_tx_port() uses the flow ID if it is available in the outbound >> mbuf to select the outbound port. The igb driver uses the msix queue of >> the inbound packet to set a packet's flow ID. This doesn't provide enough >> bits of information to yield a high quality flow ID. If, for example, the >> switch controlling inbound packet distribution does a poor job, the >> outbound >> packet distribution will also be poorly distributed. >> > > Please note that inbound and outbound flow ID do not need to be the same > or symmetric. It only should stay the same for all packets in a single > connection to prevent reordering. > > Generally it doesn't matter if in- and outbound packets do not use the > same queue. Only in sophisticated setups with full affinity, which we > don't support yet, it could matter. 
> > > The majority of the adapters supported by this driver will compute >> the Toeplitz RSS hash. Using this data seems to work quite well >> in our tests (3 member LAGG group). Is there any reason we shouldn't >> use the RSS hash for flow ID? >> > > Using the RSS hash is the idea. The infrastructure and driver adjustments > haven't been implemented throughout yet. > > > We also tried disabling the use of flow ID and doing the hash directly in >> the driver. Unfortunately, the current hash is pretty weak. It >> multiplies >> by 33, which yield very poor distributions if you need to mod the result >> by 3 (e.g. LAGG group with 3 members). Alan modified the driver to use >> the FNV hash, which is already in the kernel, and this yielded much better >> results. He is still benchmarking the impact of this change. Assuming we >> can get decent flow ID data, this should only impact outbound UDP, since >> the >> stack doesn't provide a flow ID in this case. >> >> Are there other checksums we should be looking at in addition to FNV? >> > > siphash24() is fast, keyed and strong. > I benchmarked hash32 (the existing hash function) vs fnv_hash using both TCP and UDP, with 1500 and 9000 byte MTUs. At 10Gbps, I couldn't measure any difference in either throughput or cpu utilization. Given that siphash24 is definitely slower than hash32, there's no way that I'll find it to be significantly faster than fnv_hash for this application. In fact, I'm guessing that it will be slower due to the function call overhead and the fact that lagg_hashmbuf calls the hash function on very short buffers. Therefore I'm going to commit the change using fnv_hash in the next few days if no one objects. 
Here's the diff: ==== //SpectraBSD/stable/sys/net/ieee8023ad_lacp.c#4 (text) ==== @@ -763,7 +763,6 @@ sc->sc_psc = (caddr_t)lsc; lsc->lsc_softc = sc; - lsc->lsc_hashkey = arc4random(); lsc->lsc_active_aggregator = NULL; LACP_LOCK_INIT(lsc); TAILQ_INIT(&lsc->lsc_aggregators); @@ -841,7 +840,7 @@ if (sc->use_flowid && (m->m_flags & M_FLOWID)) hash = m->m_pkthdr.flowid; else - hash = lagg_hashmbuf(sc, m, lsc->lsc_hashkey); + hash = lagg_hashmbuf(sc, m); hash %= pm->pm_count; lp = pm->pm_map[hash]; ==== //SpectraBSD/stable/sys/net/ieee8023ad_lacp.h#2 (text) ==== @@ -244,7 +244,6 @@ LIST_HEAD(, lacp_port) lsc_ports; struct lacp_portmap lsc_pmap[2]; volatile u_int lsc_activemap; - u_int32_t lsc_hashkey; }; #define LACP_TYPE_ACTORINFO 1 ==== //SpectraBSD/stable/sys/net/if_lagg.c#9 (text) ==== @@ -35,7 +35,7 @@ #include #include #include -#include +#include #include #include #include @@ -1588,10 +1588,10 @@ } uint32_t -lagg_hashmbuf(struct lagg_softc *sc, struct mbuf *m, uint32_t key) +lagg_hashmbuf(struct lagg_softc *sc, struct mbuf *m) { uint16_t etype; - uint32_t p = key; + uint32_t p = FNV1_32_INIT; int off; struct ether_header *eh; const struct ether_vlan_header *vlan; @@ -1622,13 +1622,13 @@ eh = mtod(m, struct ether_header *); etype = ntohs(eh->ether_type); if (sc->sc_flags & LAGG_F_HASHL2) { - p = hash32_buf(&eh->ether_shost, ETHER_ADDR_LEN, p); - p = hash32_buf(&eh->ether_dhost, ETHER_ADDR_LEN, p); + p = fnv_32_buf(&eh->ether_shost, ETHER_ADDR_LEN, p); + p = fnv_32_buf(&eh->ether_dhost, ETHER_ADDR_LEN, p); } /* Special handling for encapsulating VLAN frames */ if ((m->m_flags & M_VLANTAG) && (sc->sc_flags & LAGG_F_HASHL2)) { - p = hash32_buf(&m->m_pkthdr.ether_vtag, + p = fnv_32_buf(&m->m_pkthdr.ether_vtag, sizeof(m->m_pkthdr.ether_vtag), p); } else if (etype == ETHERTYPE_VLAN) { vlan = lagg_gethdr(m, off, sizeof(*vlan), &buf); @@ -1636,7 +1636,7 @@ goto out; if (sc->sc_flags & LAGG_F_HASHL2) - p = hash32_buf(&vlan->evl_tag, sizeof(vlan->evl_tag), p); + p = 
fnv_32_buf(&vlan->evl_tag, sizeof(vlan->evl_tag), p); etype = ntohs(vlan->evl_proto); off += sizeof(*vlan) - sizeof(*eh); } @@ -1649,8 +1649,8 @@ goto out; if (sc->sc_flags & LAGG_F_HASHL3) { - p = hash32_buf(&ip->ip_src, sizeof(struct in_addr), p); - p = hash32_buf(&ip->ip_dst, sizeof(struct in_addr), p); + p = fnv_32_buf(&ip->ip_src, sizeof(struct in_addr), p); + p = fnv_32_buf(&ip->ip_dst, sizeof(struct in_addr), p); } if (!(sc->sc_flags & LAGG_F_HASHL4)) break; @@ -1665,7 +1665,7 @@ ports = lagg_gethdr(m, off, sizeof(*ports), &buf); if (ports == NULL) break; - p = hash32_buf(ports, sizeof(*ports), p); + p = fnv_32_buf(ports, sizeof(*ports), p); break; } break; @@ -1678,10 +1678,10 @@ if (ip6 == NULL) goto out; - p = hash32_buf(&ip6->ip6_src, sizeof(struct in6_addr), p); - p = hash32_buf(&ip6->ip6_dst, sizeof(struct in6_addr), p); + p = fnv_32_buf(&ip6->ip6_src, sizeof(struct in6_addr), p); + p = fnv_32_buf(&ip6->ip6_dst, sizeof(struct in6_addr), p); flow = ip6->ip6_flow & IPV6_FLOWLABEL_MASK; - p = hash32_buf(&flow, sizeof(flow), p); /* IPv6 flow label */ + p = fnv_32_buf(&flow, sizeof(flow), p); /* IPv6 flow label */ break; #endif } @@ -1904,7 +1904,7 @@ if (sc->use_flowid && (m->m_flags & M_FLOWID)) p = m->m_pkthdr.flowid; else - p = lagg_hashmbuf(sc, m, lb->lb_key); + p = lagg_hashmbuf(sc, m); p %= sc->sc_count; lp = lb->lb_ports[p]; ==== //SpectraBSD/stable/sys/net/if_lagg.h#5 (text) ==== @@ -262,7 +262,7 @@ extern void (*lagg_linkstate_p)(struct ifnet *, int ); int lagg_enqueue(struct ifnet *, struct mbuf *); -uint32_t lagg_hashmbuf(struct lagg_softc *, struct mbuf *, uint32_t); +uint32_t lagg_hashmbuf(struct lagg_softc *, struct mbuf *); SYSCTL_DECL(_net_link_lagg); > -- > Andre > > From owner-freebsd-net@FreeBSD.ORG Thu Aug 29 01:30:37 2013 Return-Path: Delivered-To: net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org 
(Postfix) with ESMTP id A19FC88D; Thu, 29 Aug 2013 01:30:37 +0000 (UTC) (envelope-from slw@zxy.spb.ru) Received: from zxy.spb.ru (zxy.spb.ru [195.70.199.98]) by mx1.freebsd.org (Postfix) with ESMTP id 5B62426E5; Thu, 29 Aug 2013 01:30:37 +0000 (UTC) Received: from slw by zxy.spb.ru with local (Exim 4.69 (FreeBSD)) (envelope-from ) id 1VEr6H-000In5-Q7; Thu, 29 Aug 2013 05:32:41 +0400 Date: Thu, 29 Aug 2013 05:32:41 +0400 From: Slawa Olhovchenkov To: Andre Oppermann Subject: Re: Network stack changes Message-ID: <20130829013241.GB70584@zxy.spb.ru> References: <521E41CB.30700@yandex-team.ru> <521E78B0.6080709@freebsd.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <521E78B0.6080709@freebsd.org> User-Agent: Mutt/1.5.21 (2010-09-15) X-SA-Exim-Connect-IP: X-SA-Exim-Mail-From: slw@zxy.spb.ru X-SA-Exim-Scanned: No (on zxy.spb.ru); SAEximRunCond expanded to false Cc: "Alexander V. Chernikov" , adrian@freebsd.org, freebsd-hackers@freebsd.org, freebsd-arch@freebsd.org, luigi@freebsd.org, ae@FreeBSD.org, FreeBSD Net X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 29 Aug 2013 01:30:37 -0000

On Thu, Aug 29, 2013 at 12:24:48AM +0200, Andre Oppermann wrote:

> > ..
> > while Intel DPDK claims 80MPPS (and 6windgate talks about 160 or so) on the same-class hardware and
> > _userland_ forwarding.
>
> Those numbers sound a bit far out.  Maybe if the packet isn't touched
> or looked at at all in a pure netmap interface to interface bridging
> scenario.  I don't believe these numbers.

80 Mpps * 64 bytes * 8 bits = 40.960 Gb/s

Maybe DCA? And use a CPU with 40 PCIe lanes and 4 memory channels.
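
For reference, the 40.960 Gb/s figure above can be checked with a short C sketch. It assumes 64-byte frames with the FCS included, and it also accounts for the 20 bytes of per-frame Ethernet wire overhead (7-byte preamble, 1-byte start-of-frame delimiter, 12-byte inter-frame gap) that the figure leaves out. The function names are made up for illustration.

```c
#include <stdint.h>

/* Per-frame overhead on the wire that is not part of the frame itself:
 * 7-byte preamble + 1-byte start-of-frame delimiter + 12-byte inter-frame gap. */
#define ETH_WIRE_OVERHEAD 20u

/* Bits per second occupied by `pps` frames of `frame_bytes` each,
 * plus `overhead_bytes` of extra wire overhead per frame. */
static uint64_t
wire_bps(uint64_t pps, uint32_t frame_bytes, uint32_t overhead_bytes)
{
	return pps * (uint64_t)(frame_bytes + overhead_bytes) * 8u;
}

/* Maximum frames per second a link of `link_bps` can carry (rounded down),
 * counting the standard Ethernet wire overhead for every frame. */
static uint64_t
max_pps(uint64_t link_bps, uint32_t frame_bytes)
{
	return link_bps / ((uint64_t)(frame_bytes + ETH_WIRE_OVERHEAD) * 8u);
}
```

wire_bps(80000000, 64, 0) reproduces the 40.96 Gb/s above, and with ETH_WIRE_OVERHEAD it comes to 53.76 Gb/s on the wire. max_pps(10000000000, 64) gives 14,880,952 frames/s for a single 10 Gb/s port, so 80 Mpps of 64-byte frames would need at least six such ports.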
From owner-freebsd-net@FreeBSD.ORG Thu Aug 29 06:23:31 2013 Return-Path: Delivered-To: net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id 1783447C for ; Thu, 29 Aug 2013 06:23:31 +0000 (UTC) (envelope-from andre@freebsd.org) Received: from c00l3r.networx.ch (c00l3r.networx.ch [62.48.2.2]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 7BB38252C for ; Thu, 29 Aug 2013 06:23:30 +0000 (UTC) Received: (qmail 26153 invoked from network); 29 Aug 2013 07:05:11 -0000 Received: from c00l3r.networx.ch (HELO [127.0.0.1]) ([62.48.2.2]) (envelope-sender ) by c00l3r.networx.ch (qmail-ldap-1.03) with SMTP for ; 29 Aug 2013 07:05:11 -0000 Message-ID: <521EE8DA.3060107@freebsd.org> Date: Thu, 29 Aug 2013 08:23:22 +0200 From: Andre Oppermann User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:17.0) Gecko/20130801 Thunderbird/17.0.8 MIME-Version: 1.0 To: Alan Somers Subject: Re: Flow ID, LACP, and igb References: <521BBD21.4070304@freebsd.org> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: Jack F Vogel , "Justin T. Gibbs" , net@freebsd.org X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 29 Aug 2013 06:23:31 -0000 On 29.08.2013 01:42, Alan Somers wrote: > On Mon, Aug 26, 2013 at 2:40 PM, Andre Oppermann wrote: > >> On 26.08.2013 19:18, Justin T. Gibbs wrote: >> >>> Hi Net, >>> >>> I'm an infrequent traveler through the networking code and would >>> appreciate some feedback on some proposed solutions to issues Spectra >>> has seen with outbound LACP traffic. 
>>> >>> lacp_select_tx_port() uses the flow ID if it is available in the outbound >>> mbuf to select the outbound port. The igb driver uses the msix queue of >>> the inbound packet to set a packet's flow ID. This doesn't provide enough >>> bits of information to yield a high quality flow ID. If, for example, the >>> switch controlling inbound packet distribution does a poor job, the >>> outbound >>> packet distribution will also be poorly distributed. >>> >> >> Please note that inbound and outbound flow ID do not need to be the same >> or symmetric. It only should stay the same for all packets in a single >> connection to prevent reordering. >> >> Generally it doesn't matter if in- and outbound packets do not use the >> same queue. Only in sophisticated setups with full affinity, which we >> don't support yet, it could matter. >> >> >> The majority of the adapters supported by this driver will compute >>> the Toeplitz RSS hash. Using this data seems to work quite well >>> in our tests (3 member LAGG group). Is there any reason we shouldn't >>> use the RSS hash for flow ID? >>> >> >> Using the RSS hash is the idea. The infrastructure and driver adjustments >> haven't been implemented throughout yet. >> >> >> We also tried disabling the use of flow ID and doing the hash directly in >>> the driver. Unfortunately, the current hash is pretty weak. It >>> multiplies >>> by 33, which yield very poor distributions if you need to mod the result >>> by 3 (e.g. LAGG group with 3 members). Alan modified the driver to use >>> the FNV hash, which is already in the kernel, and this yielded much better >>> results. He is still benchmarking the impact of this change. Assuming we >>> can get decent flow ID data, this should only impact outbound UDP, since >>> the >>> stack doesn't provide a flow ID in this case. >>> >>> Are there other checksums we should be looking at in addition to FNV? >>> >> >> siphash24() is fast, keyed and strong. 
>> > I benchmarked hash32 (the existing hash function) vs fnv_hash using both > TCP and UDP, with 1500 and 9000 byte MTUs. At 10Gbps, I couldn't measure > any difference in either throughput or cpu utilization. Given that > siphash24 is definitely slower than hash32, there's no way that I'll find > it to be significantly faster than fnv_hash for this application. In fact, > I'm guessing that it will be slower due to the function call overhead and > the fact that lagg_hashmbuf calls the hash function on very short buffers. No problem with fnv_hash(). While I agree that siphash24() is likely slower, if you could afford the time to do a test run it would be great to go from guessing to knowing. > Therefore I'm going to commit the change using fnv_hash in the next few > days if no one objects. Here's the diff: > > ==== //SpectraBSD/stable/sys/net/ieee8023ad_lacp.c#4 (text) ==== > > @@ -763,7 +763,6 @@ > sc->sc_psc = (caddr_t)lsc; > lsc->lsc_softc = sc; > > - lsc->lsc_hashkey = arc4random(); > lsc->lsc_active_aggregator = NULL; > LACP_LOCK_INIT(lsc); > TAILQ_INIT(&lsc->lsc_aggregators); > @@ -841,7 +840,7 @@ > if (sc->use_flowid && (m->m_flags & M_FLOWID)) > hash = m->m_pkthdr.flowid; > else > - hash = lagg_hashmbuf(sc, m, lsc->lsc_hashkey); > + hash = lagg_hashmbuf(sc, m); > hash %= pm->pm_count; > lp = pm->pm_map[hash]; The reason for the hashkey was to prevent directed "attacks" on the load balancing by choosing/predicting the outcome of it. This is both good and bad: it makes the result nondeterministic between runs, which makes debugging particular situations harder. To work around the lack of a key for fnv_hash(), XOR'ing the hash output with a pre-initialized random value is likely sufficient. The true importance of this randomization is debatable; I just point out why it was there, not to object to you removing it.
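To make the XOR workaround concrete, here is a minimal userland sketch. It is not the kernel code and not part of the diff above: fnv1_32() is a plain reimplementation of 32-bit FNV-1 using the standard published constants, and select_port() with its key and nports parameters is a hypothetical helper named here for illustration only. In a real driver the key would be filled once at attach time from a random source such as arc4random().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FNV1_32_INIT  0x811c9dc5u	/* standard 32-bit FNV offset basis */
#define FNV1_32_PRIME 0x01000193u	/* standard 32-bit FNV prime */

/* Plain 32-bit FNV-1: multiply by the prime, then XOR in each byte. */
static uint32_t
fnv1_32(const void *buf, size_t len)
{
	const uint8_t *p = buf;
	uint32_t h = FNV1_32_INIT;

	while (len-- > 0) {
		h *= FNV1_32_PRIME;
		h ^= *p++;
	}
	return (h);
}

/*
 * Hypothetical helper: map a flow's header bytes to a port index.
 * The unkeyed hash is XORed with a per-instance random key before the
 * modulo, so the flow-to-port mapping differs between instances while
 * staying stable for the lifetime of one key.
 */
static uint32_t
select_port(const void *flow, size_t len, uint32_t key, uint32_t nports)
{
	assert(nports > 0);
	return ((fnv1_32(flow, len) ^ key) % nports);
}
```

For debugging, passing a fixed key (for example 0) makes the mapping reproducible between runs, which addresses the determinism concern while keeping the random key available for production use.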
-- Andre From owner-freebsd-net@FreeBSD.ORG Thu Aug 29 06:46:54 2013 Return-Path: Delivered-To: net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id 0F26FE7D; Thu, 29 Aug 2013 06:46:54 +0000 (UTC) (envelope-from bryanv@daemoninthecloset.org) Received: from torment.daemoninthecloset.org (torment.daemoninthecloset.org [94.242.209.234]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id C098F2912; Thu, 29 Aug 2013 06:46:53 +0000 (UTC) Received: from sage.daemoninthecloset.org (unknown [70.114.209.60]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client CN "sage.daemoninthecloset.org", Issuer "daemoninthecloset.org" (verified OK)) by torment.daemoninthecloset.org (Postfix) with ESMTPS id DFBE342C08C6; Thu, 29 Aug 2013 08:52:03 +0200 (CEST) X-Virus-Scanned: amavisd-new at daemoninthecloset.org X-Virus-Scanned: amavisd-new at daemoninthecloset.org Date: Thu, 29 Aug 2013 01:46:32 -0500 (CDT) From: Bryan Venteicher To: Andre Oppermann Message-ID: <2112475076.435.1377758792082.JavaMail.root@daemoninthecloset.org> In-Reply-To: <521E78B0.6080709@freebsd.org> References: <521E41CB.30700@yandex-team.ru> <521E78B0.6080709@freebsd.org> Subject: Re: Network stack changes MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [192.168.10.20] X-Mailer: Zimbra 8.0.2_GA_5569 (ZimbraWebClient - GC20 ([unknown])/8.0.2_GA_5569) Thread-Topic: Network stack changes Thread-Index: anDUShTn7iVw7wFEqZDuK6ld/6VXsQ== Cc: "Alexander V. 
Chernikov" , adrian@freebsd.org, freebsd-hackers@freebsd.org, freebsd-arch@freebsd.org, luigi@freebsd.org, ae@FreeBSD.org, FreeBSD Net X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 29 Aug 2013 06:46:54 -0000 ----- Original Message ----- > On 28.08.2013 20:30, Alexander V. Chernikov wrote: > > Hello list! > > Hello Alexander, > > you sent quite a few things in the same email. I'll try to respond > as much as I can right now. Later you should split it up to have > more in-depth discussions on the individual parts. > > > > We already have some capabilities like VLANHWFILTER/VLANHWTAG, we can add > > some more. We even have > > per-driver hooks to program HW filtering. > > We could. Though for vlan it looks like it would be easier to remove the > hardware vlan tag stripping and insertion. It only adds complexity in all > drivers for no gain. > In the shorter term, can we remove the requirement for the parent interface to support IFCAP_VLAN_HWTAGGING in order to do checksum offloading on the VLAN interface (see vlan_capabilities())? 
From owner-freebsd-net@FreeBSD.ORG Thu Aug 29 08:17:03 2013 Return-Path: Delivered-To: net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id 33BCA637; Thu, 29 Aug 2013 08:17:03 +0000 (UTC) (envelope-from talayeh.asadi@gmail.com) Received: from mail-ie0-x233.google.com (mail-ie0-x233.google.com [IPv6:2607:f8b0:4001:c03::233]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id EF3F72E1F; Thu, 29 Aug 2013 08:17:02 +0000 (UTC) Received: by mail-ie0-f179.google.com with SMTP id m16so138942ieq.38 for ; Thu, 29 Aug 2013 01:17:02 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:reply-to:sender:in-reply-to:references:from:date :message-id:subject:to:content-type; bh=qepuzPMG2uo+0rXT3e3tnZmt6d0oVkbE/4WldObjjE0=; b=IR2nZocswP1rWvL/g+z/j8xKh1Q2JkzaNCUWoh5zyY4DX+FEIgwEN7t5KSN1u8OfY+ IgoH7k8U9DXlyAZKznwzRoYEeoZETUDl/y/jlsJ6r0coCyUec+HP9BB62g9LrNH6IVjD 0avwy+3NSjUfE4jL3tBT6ZaFQ99BL8owqYRMkE29PdbEkl1vgLEMSPpftqxm4KHYva1T t/dmU6vQVsI1DS9209buC/yZjObYsauQgQe2kEgHetvfGJdc57QWWB5ELo96/oh68EO7 vDkcIFyzFYqt/AEfngs0kjAExvjzT7nD+UY/ukOLGJKb2i3ObcpqzYpYu4WgRW5080Sz TK5A== X-Received: by 10.50.23.16 with SMTP id i16mr1178053igf.50.1377764222424; Thu, 29 Aug 2013 01:17:02 -0700 (PDT) MIME-Version: 1.0 Sender: talayeh.asadi@gmail.com Received: by 10.42.153.8 with HTTP; Thu, 29 Aug 2013 01:16:42 -0700 (PDT) In-Reply-To: References: From: takCoder Date: Thu, 29 Aug 2013 12:46:42 +0430 X-Google-Sender-Auth: --i8uKSVtRi7TVeziDJphz8vW8k Message-ID: Subject: Re: telnet authentication using RADIUS To: Freebsd-net , FreeBSD Questions Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.14 X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list 
Reply-To: tak.official@gmail.com List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 29 Aug 2013 08:17:03 -0000 hi again.. pardon me, but I still have not found anything to solve my problem with using pam/telnetd.. my problem is: I need pam.d/telnetd to always be used for the telnet AAA configuration.. but when a non-SRA telnet connection is created, pam.d/login is used for that telnet session's AAA configuration.. is there any way to integrate the two? any ideas?? please let me know of anything you may know about this.. thank you so much :) Best Regards, takCoder On Wed, Aug 14, 2013 at 2:38 PM, takCoder wrote: > hi all, > > I need to apply radius authentication for my remote connections. For ssh, > I have no problems, as I use pam.d/sshd file to add pam_radius.so entry.. > > but for telnet I've faced a problem.. as I have seen, for non-SRA telnet > connections, telnet authentication will be done via pam.d/login rather than > pam.d/telnetd.. and this depends on telnet client as well rather than just > my server.. > > I need it to always apply pam.d/telnetd file for all telnet > authentications, so i can separate my remote authentication policies from > local ones.. > > am I right with the facts I said above about telnet? > Do you know of any tip or trick on this?? any ideas are really > appreciated..
> Thank you :) > > Best Regards, > t.a.k > From owner-freebsd-net@FreeBSD.ORG Thu Aug 29 11:49:34 2013 Return-Path: Delivered-To: net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id B9C08243; Thu, 29 Aug 2013 11:49:34 +0000 (UTC) (envelope-from adrian.chadd@gmail.com) Received: from mail-wi0-x234.google.com (mail-wi0-x234.google.com [IPv6:2a00:1450:400c:c05::234]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 566572FB7; Thu, 29 Aug 2013 11:49:33 +0000 (UTC) Received: by mail-wi0-f180.google.com with SMTP id l12so352069wiv.13 for ; Thu, 29 Aug 2013 04:49:31 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:date:message-id:subject :from:to:cc:content-type; bh=t5rRGR7p2lccT5dIVAFMMz5iEhB8EN3uCWH29jaGBug=; b=jG6EG9SHnUTL+LWD0mt8ZdlyhVyFrBZee9eO0wArZSQi7Kxa4sipEEiBbicH27NlRE WCFqBBALUkxOLkfAinjqMBlaV/iJhly1bozkC2JSX40PczqetRoSxgspp1/Uf8S+/7Y/ SAPOMG5R/RfYBn/5LaIxPpziJpJ8uJvmxiuc1U90ViJZGA7R/XjoJgRyDWubRr53+sIM prBz7ivSPp48uUqSxvRc6u09Edy/XM3+hSFHKyMWPMoP/isaPhtr5W6IrGK0lz1Cm4Oc SNbYlKGLpk1MD4mzIVxrREDDjMFyu8VVJjqpBIM9jv8oxMF2TcPAFeMlBXnnKrl9XP9A PNgw== MIME-Version: 1.0 X-Received: by 10.194.79.33 with SMTP id g1mr2141120wjx.79.1377776971643; Thu, 29 Aug 2013 04:49:31 -0700 (PDT) Sender: adrian.chadd@gmail.com Received: by 10.216.146.2 with HTTP; Thu, 29 Aug 2013 04:49:31 -0700 (PDT) In-Reply-To: <521E41CB.30700@yandex-team.ru> References: <521E41CB.30700@yandex-team.ru> Date: Thu, 29 Aug 2013 04:49:31 -0700 X-Google-Sender-Auth: fjTZLF4GZ_Hxxlda_cdxncMN6aA Message-ID: Subject: Re: Network stack changes From: Adrian Chadd To: "Alexander V. 
Chernikov" Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.14 Cc: Luigi Rizzo , Andre Oppermann , "freebsd-hackers@freebsd.org" , FreeBSD Net , "Andrey V. Elsukov" , "freebsd-arch@freebsd.org" X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 29 Aug 2013 11:49:34 -0000 Hi, There's a lot of good stuff to review here, thanks! Yes, the ixgbe RX lock needs to die in a fire. It's kinda pointless to keep locking things like that on a per-packet basis. We should be able to do this in a cleaner way - we can defer RX into a CPU pinned taskqueue and convert the interrupt handler to a fast handler that just schedules that taskqueue. We can ignore the ithread entirely here. What do you think? Totally pie in the sky handwaving at this point: * create an array of mbuf pointers for completed mbufs; * populate the mbuf array; * pass the array up to ether_demux(). For vlan handling, it may end up populating its own list of mbufs to push up to ether_demux(). So maybe we should extend the API to have a bitmap of packets to actually handle from the array, so we can pass up a larger array of mbufs, note which ones are for the destination and then the upcall can mark which frames it has consumed. I specifically wonder how much work/benefit we may see by doing: * batching packets into lists so various steps can batch process things rather than run to completion; * batching the processing of a list of frames under a single lock instance - eg, if the forwarding code could do the forwarding lookup for 'n' packets under a single lock, then pass that list of frames up to inet_pfil_hook() to do the work under one lock, etc, etc.
Here, the processing would look less like "grab lock and process to completion" and more like "mark and sweep" - ie, we have a list of frames that we mark as needing processing and mark as having been processed at each layer, so we know where to next dispatch them. I still have some tool coding to do with PMC before I even think about tinkering with this as I'd like to measure stuff like per-packet latency as well as top-level processing overhead (ie, CPU_CLK_UNHALTED.THREAD_P / lagg0 TX bytes/pkts, RX bytes/pkts, NIC interrupts on that core, etc.) Thanks, -adrian From owner-freebsd-net@FreeBSD.ORG Thu Aug 29 11:52:54 2013 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id 1BB6D7AB for ; Thu, 29 Aug 2013 11:52:54 +0000 (UTC) (envelope-from darrenr@netbsd.org) Received: from out4-smtp.messagingengine.com (out4-smtp.messagingengine.com [66.111.4.28]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id E21772053 for ; Thu, 29 Aug 2013 11:52:53 +0000 (UTC) Received: from compute3.internal (compute3.nyi.mail.srv.osa [10.202.2.43]) by gateway1.nyi.mail.srv.osa (Postfix) with ESMTP id 8934321E1D; Thu, 29 Aug 2013 07:52:50 -0400 (EDT) Received: from frontend1 ([10.202.2.160]) by compute3.internal (MEProxy); Thu, 29 Aug 2013 07:52:50 -0400 DKIM-Signature: v=1; a=rsa-sha1; c=relaxed/relaxed; d= messagingengine.com; h=message-id:date:from:mime-version:to:cc :subject:references:in-reply-to:content-type :content-transfer-encoding; s=smtpout; bh=rV1HpaiAO2+B9JYuOWptUO I8mCw=; b=jxApQHAH9+4DjI7KJYU9t1NyD/iGUEeQ0kTmZStIjyul6s5FrAkeK8 zEt5+WWjoWfMgeNmZMcRGDeXUpMmzZUAivY/cv3I8ipjvDDpHLh8fdoTccerxINE aeO82R1Aw16GKGf6wJeuXhZUdewbrzb4Z2svlSFuFJKY3tzhz/Ttg= X-Sasl-enc: 
5xyMFwxwNbxr+WXDObK5YqBhIh2EwJC7pF0XWeaNtMwg 1377777170 Received: from [192.168.1.31] (unknown [203.206.138.26]) by mail.messagingengine.com (Postfix) with ESMTPA id F0915C00E84; Thu, 29 Aug 2013 07:52:48 -0400 (EDT) Message-ID: <521F4522.5070403@netbsd.org> Date: Thu, 29 Aug 2013 22:57:06 +1000 From: Darren Reed User-Agent: Thunderbird 2.0.0.24 (Windows/20100228) MIME-Version: 1.0 To: Mindaugas Rasiukevicius Subject: Re: BPF_MISC+BPF_COP and BPF_COPX (summary and patch) References: <20130804191310.2FFBB14A152@mail.netbsd.org> <20130822101623.3837E14A21D@mail.netbsd.org> In-Reply-To: <20130822101623.3837E14A21D@mail.netbsd.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: tech-net@netbsd.org, guy@alum.mit.edu, freebsd-net@freebsd.org X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 29 Aug 2013 11:52:54 -0000 Mindaugas Rasiukevicius wrote: > Hi, > > OK, to summarise what has been discussed: > > - Problem > > There is a need to perform more complex operations from the BPF program. > Currently, there is no (practical) way to do that from the byte-code. > Such functionality is useful for the packet filters or other components, > which could integrate with BPF. For example, while most of the packet > inspection logic can stay in the byte-code, such operations as looking up > an IP address in some container or walking the IPv6 headers and returning > some offsets have to be done externally. The first existing user of such > capability would be NPF in NetBSD. > I'd argue that the IPv6 problem is of such a generic nature that it deserves its own instruction/s. We may look at IPv6 today and think nobody uses it much but over time that is going to change. 
Thus there will be an outcome not possible with the co-processor approach if an instruction is created for that purpose and is common across all platforms through libpcap. Unless the IPv6 problem is too complex for a single instruction (this has not been demonstrated). In that case maybe BPF itself needs to evolve such that it can support more complex instructions. The current implementation of BPF makes it very hard to expand the instruction set without impinging on the ability to make future changes due to the way in which instructions are codified into 32 bits. Whilst the method of supporting a co-processor gets around that, it does so in such a generic fashion that it becomes too easy to use it as a bit-bucket for anything you think might be a good idea for BPF to do, without really evaluating whether it should. When it comes to looking up addresses in tables, I don't see the advantage in adding this to BPF to support NPF. My suspicion is that the goal is to support expressing the entire rule as just BPF byte code. For rules, it makes no sense as the expensive operation (table lookup) could just as easily be done after the rest of the packet is matched with BPF. Or is there something else here at play that I'm missing?
From owner-freebsd-net@FreeBSD.ORG Thu Aug 29 12:35:30 2013 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id 4AA164E2 for ; Thu, 29 Aug 2013 12:35:30 +0000 (UTC) (envelope-from freebsd-net@m.gmane.org) Received: from plane.gmane.org (plane.gmane.org [80.91.229.3]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 091B72370 for ; Thu, 29 Aug 2013 12:35:29 +0000 (UTC) Received: from list by plane.gmane.org with local (Exim 4.69) (envelope-from ) id 1VF1Re-0008SY-GO for freebsd-net@freebsd.org; Thu, 29 Aug 2013 14:35:26 +0200 Received: from rebar.astron.com ([208.77.212.97]) by main.gmane.org with esmtp (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Thu, 29 Aug 2013 14:35:26 +0200 Received: from christos by rebar.astron.com with local (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Thu, 29 Aug 2013 14:35:26 +0200 X-Injected-Via-Gmane: http://gmane.org/ To: freebsd-net@freebsd.org From: christos@astron.com (Christos Zoulas) Subject: Re: BPF_MISC+BPF_COP and BPF_COPX (summary and patch) Date: Thu, 29 Aug 2013 12:35:07 +0000 (UTC) Lines: 22 Message-ID: References: <20130804191310.2FFBB14A152@mail.netbsd.org> <20130822101623.3837E14A21D@mail.netbsd.org> <521F4522.5070403@netbsd.org> X-Complaints-To: usenet@ger.gmane.org X-Gmane-NNTP-Posting-Host: rebar.astron.com X-Newsreader: trn 4.0-test76 (Apr 2, 2001) Cc: tech-net@netbsd.org X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 29 Aug 2013 12:35:30 -0000 In article <521F4522.5070403@netbsd.org>, Darren Reed wrote: > >The current implementation of BPF makes 
it very hard to expand >the instruction set without impinging on the ability to make >future changes due to the way in which instructions are codified >into 32bits. Whilst the method of supporting a co-processor gets >around that, it does so in such a generic fashion that it becomes >too easy to use it as a bit-bucket for anything you think might >be a good idea if BPF could do without really evaluating if it >should do. I think that the COP/COPX encapsulation does not leak outside the kernel (in principle), so the functionality that the COP/COPX subroutines NPF provides doesn't become part of the BPF feature set and cannot be re-used outside the kernel. As such, this technique can be used to experiment and see what offloading mechanisms are required for efficient IPv6 processing. Once that is better understood, we can think about turning them into real and officially supported BPF instructions. christos From owner-freebsd-net@FreeBSD.ORG Thu Aug 29 14:08:17 2013 Return-Path: Delivered-To: net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id 57BF4DCD for ; Thu, 29 Aug 2013 14:08:17 +0000 (UTC) (envelope-from btv1==9536076923e==tgubatayao@barracuda.com) Received: from bsf01.barracuda.com (bsf01.barracuda.com [64.235.145.81]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 327EE2A36 for ; Thu, 29 Aug 2013 14:08:17 +0000 (UTC) X-ASG-Debug-ID: 1377784346-03dc6652b2ad260005-oFaieN Received: from bn-scl-fe06.Cudanet.local (mail.barracuda.com [10.8.1.48]) by bsf01.barracuda.com with ESMTP id Y4hkEaZuDp7hhtlV (version=TLSv1 cipher=AES128-SHA bits=128 verify=NO); Thu, 29 Aug 2013 06:52:28 -0700 (PDT) X-Barracuda-Envelope-From: tgubatayao@barracuda.com Received: from bn-scl-be03.Cudanet.local (10.8.1.54) by 
bn-scl-fe06.Cudanet.local (10.8.1.48) with Microsoft SMTP Server (TLS) id 8.3.298.1; Thu, 29 Aug 2013 06:51:19 -0700 Received: from BN-SCL-MBX03.Cudanet.local ([fe80::e5b6:9fef:a4d2:a5ba]) by bn-scl-be03.Cudanet.local ([::1]) with mapi; Thu, 29 Aug 2013 00:28:00 -0700 From: "T.C. Gubatayao" To: Andre Oppermann , Alan Somers Date: Thu, 29 Aug 2013 00:27:59 -0700 Subject: RE: Flow ID, LACP, and igb Thread-Topic: Flow ID, LACP, and igb X-ASG-Orig-Subj: RE: Flow ID, LACP, and igb Thread-Index: Ac6kgE7vX+IaHy2MTXGTTUxllQgr3AAAYtHH Message-ID: References: <521BBD21.4070304@freebsd.org> , <521EE8DA.3060107@freebsd.org> In-Reply-To: <521EE8DA.3060107@freebsd.org> Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: acceptlanguage: en-US Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-Barracuda-Connect: mail.barracuda.com[10.8.1.48] X-Barracuda-Start-Time: 1377784348 X-Barracuda-Encrypted: AES128-SHA X-Barracuda-URL: http://bsf01.barracuda.com:8000/cgi-mod/mark.cgi X-Barracuda-BRTS-Status: 1 X-Virus-Scanned: by bsmtpd at barracuda.com X-Barracuda-Spam-Score: 0.74 X-Barracuda-Spam-Status: No, SCORE=0.74 using per-user scores of TAG_LEVEL=1000.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=9.0 tests=CN_BODY_332, COMMA_SUBJECT, THREAD_INDEX, THREAD_TOPIC X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.2.139746 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- 0.01 THREAD_INDEX thread-index: AcO7Y8iR61tzADqsRmmc5wNiFHEOig== 0.01 THREAD_TOPIC Thread-Topic: ...(Japanese Subject)... 0.60 COMMA_SUBJECT Subject is like 'Re: FDSDS, this is a subject' 0.12 CN_BODY_332 BODY: CN_BODY_332 Cc: Jack F Vogel , "Justin T. 
Gibbs" , "net@freebsd.org" X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 29 Aug 2013 14:08:17 -0000 > No problem with fnv_hash(). Doesn't it have bad mixing? Good distribution is important since this code is for load balancing. FNV is also slower compared to most of the newer non-cryptographic hashes, certainly on large keys, but even on small ones. Of course, performance will vary with the architecture. > While I agree that it is likely that siphash24() is slower if you could afford > the time do a test run it would be great to from guess to know. +1 You might want to consider lookup3 too, since it's also readily available in the kernel [1]. T.C. [1] http://svnweb.freebsd.org/base/head/sys/libkern/jenkins_hash.c?view=markup From owner-freebsd-net@FreeBSD.ORG Thu Aug 29 14:58:15 2013 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id 8FA50605; Thu, 29 Aug 2013 14:58:15 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from bigwig.baldwin.cx (bigwig.baldwin.cx [IPv6:2001:470:1f11:75::1]) (using TLSv1 with cipher ADH-CAMELLIA256-SHA (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 1432F2DBC; Thu, 29 Aug 2013 14:58:15 +0000 (UTC) Received: from jhbbsd.localnet (unknown [38.105.238.108]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id 13E17B9A9; Thu, 29 Aug 2013 10:58:14 -0400 (EDT) From: John Baldwin To: freebsd-current@freebsd.org Subject: Re: [rfc] migrate lagg to an rmlock Date: Thu, 29 Aug 2013 10:42:13 -0400 User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110714-p28; KDE/4.5.5; amd64; ; ) References: <5218AA36.1080807@ipfw.ru> In-Reply-To:
MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Message-Id: <201308291042.13282.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7 (bigwig.baldwin.cx); Thu, 29 Aug 2013 10:58:14 -0400 (EDT) Cc: FreeBSD Net , Adrian Chadd , Robert Watson , "Alexander V. Chernikov" X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 29 Aug 2013 14:58:15 -0000 On Saturday, August 24, 2013 10:16:33 am Robert Watson wrote: > There are a number of other places in the kernel where migration to an rmlock > makes sense -- however, some care must be taken for four reasons: (1) while > read locks don't experience line contention, write locking becomes observably > more expensive so is not suitable for all rwlock line contention spots -- > e.g., rmlocks might not be suitable for tcbinfo; (2) rmlocks, unlike rwlocks, > implement reader priority propagation, so you must reason about; and (3) > historically, rmlocks have not fully implemented WITNESS so you may get less > good debugging output. if_lagg is a nice place to use rmlocks, as > reconfigurations are very rare, and it's really all about long-term data > stability. 3) should no longer be an issue. rmlocks now have full WITNESS and assertion support (including an rm_assert). However, one thing to consider is that rmlocks pin readers to CPUs while the read lock is held (which rwlocks do not do).
-- John Baldwin From owner-freebsd-net@FreeBSD.ORG Thu Aug 29 15:27:58 2013 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id A30AEB10 for ; Thu, 29 Aug 2013 15:27:58 +0000 (UTC) (envelope-from rmind@netbsd.org) Received: from mail.netbsd.org (mail.NetBSD.org [IPv6:2001:4f8:3:7::25]) (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 8DB7E2075 for ; Thu, 29 Aug 2013 15:27:58 +0000 (UTC) Received: from ws (localhost [IPv6:::1]) by mail.netbsd.org (Postfix) with SMTP id 030C414A13C; Thu, 29 Aug 2013 15:27:56 +0000 (UTC) Date: Thu, 29 Aug 2013 16:27:41 +0100 From: Mindaugas Rasiukevicius To: Darren Reed Subject: Re: BPF_MISC+BPF_COP and BPF_COPX (summary and patch) In-Reply-To: <521F4522.5070403@netbsd.org> References: <20130804191310.2FFBB14A152@mail.netbsd.org> <20130822101623.3837E14A21D@mail.netbsd.org> <521F4522.5070403@netbsd.org> X-Mailer: mail(1) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Message-Id: <20130829152757.030C414A13C@mail.netbsd.org> Cc: tech-net@netbsd.org, guy@alum.mit.edu, freebsd-net@freebsd.org X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 29 Aug 2013 15:27:58 -0000 Darren Reed wrote: > Mindaugas Rasiukevicius wrote: > > Hi, > > > > OK, to summarise what has been discussed: > > > > - Problem > > > > There is a need to perform more complex operations from the BPF program. > > Currently, there is no (practical) way to do that from the byte-code. 
> > Such functionality is useful for the packet filters or other components, > > which could integrate with BPF. For example, while most of the packet > > inspection logic can stay in the byte-code, such operations as looking > > up an IP address in some container or walking the IPv6 headers and > > returning some offsets have to be done externally. The first existing > > user of such capability would be NPF in NetBSD. > > > > I'd argue that the IPv6 problem is of such a generic nature that > it deserves its own instruction/s. We may look at IPv6 today and > think nobody uses it much but over time that is going to change. > Thus there will be an outcome not possible with co-processor > approach if an instruction is created for that purpose and is > common across all platforms through libpcap. Unless the IPv6 > problem is too complex for a single instruction (this has not > been demonstrated.) In that case maybe BPF itself needs to evolve > such that it can support more complex instructions. This is a separate issue. Feel free to propose a new instruction to parse IPv6 headers. > The current implementation of BPF makes it very hard to expand > the instruction set without impinging on the ability to make > future changes due to the way in which instructions are codified > into 32bits. Whilst the method of supporting a co-processor gets > around that, it does so in such a generic fashion that it becomes > too easy to use it as a bit-bucket for anything you think might > be a good idea if BPF could do without really evaluating if it > should do. It is certainly possible that some operations, which will be implemented using the BPF coprocessor, will be useful in general. Again, whether such operations should be "promoted" to be new BPF instructions or there should be a global "standardised" coprocessor or how BPF should evolve (including RISC vs CISC-like instruction set debate) is a separate discussion.
-- Mindaugas From owner-freebsd-net@FreeBSD.ORG Thu Aug 29 15:28:49 2013 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id 51FDDBCF; Thu, 29 Aug 2013 15:28:49 +0000 (UTC) (envelope-from rmind@netbsd.org) Received: from mail.netbsd.org (mail.NetBSD.org [IPv6:2001:4f8:3:7::25]) (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 3D468208E; Thu, 29 Aug 2013 15:28:49 +0000 (UTC) Received: from ws (localhost [IPv6:::1]) by mail.netbsd.org (Postfix) with SMTP id 9AC9514A14E; Thu, 29 Aug 2013 15:28:47 +0000 (UTC) Date: Thu, 29 Aug 2013 16:28:32 +0100 From: Mindaugas Rasiukevicius To: Adrian Chadd Subject: Re: BPF_MISC+BPF_COP and BPF_COPX In-Reply-To: References: <20130804191310.2FFBB14A152@mail.netbsd.org> <9813E50B-C557-4FE1-BADF-A2CFFCBB8BD7@felyko.com> <20130804195538.C87A614A135@mail.netbsd.org> <20130804225434.87A9C14A152@mail.netbsd.org> X-Mailer: mail(1) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Message-Id: <20130829152847.9AC9514A14E@mail.netbsd.org> Cc: tech-net@netbsd.org, guy@alum.mit.edu, freebsd-net@freebsd.org X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 29 Aug 2013 15:28:49 -0000 Hi, Adrian Chadd wrote: > >> > It provides us a capability to offload more complex packet > >> > processing. My primary user would be NPF in NetBSD, e.g. one of the > >> > operations is to lookup an IP address in a table/ipset. > > > > I would like to coordinate the reservation of BPF opcodes though. > > That's a good idea. I have no problem with that. 
> I have added these to the NetBSD tree: #define BPF_MISCOP(code) ((code) & 0xf8) #define BPF_TAX 0x00 +#define BPF_COP 0x20 +#define BPF_COPX 0x40 #define BPF_TXA 0x80 Would you like to reserve them in FreeBSD as well? Thanks. -- Mindaugas From owner-freebsd-net@FreeBSD.ORG Thu Aug 29 15:37:49 2013 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id B6FBB5B4 for ; Thu, 29 Aug 2013 15:37:49 +0000 (UTC) (envelope-from scott4long@yahoo.com) Received: from nm14-vm0.bullet.mail.bf1.yahoo.com (nm14-vm0.bullet.mail.bf1.yahoo.com [98.139.213.164]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 470B62188 for ; Thu, 29 Aug 2013 15:37:48 +0000 (UTC) Received: from [98.139.212.146] by nm14.bullet.mail.bf1.yahoo.com with NNFMP; 29 Aug 2013 15:37:41 -0000 Received: from [68.142.230.65] by tm3.bullet.mail.bf1.yahoo.com with NNFMP; 29 Aug 2013 15:37:41 -0000 Received: from [127.0.0.1] by smtp222.mail.bf1.yahoo.com with NNFMP; 29 Aug 2013 15:37:41 -0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=yahoo.com; s=s1024; t=1377790661; bh=sl2eof/Mq54DsMSXno1xhR4rB+OMdYDBqJioaWzAoJI=; h=X-Yahoo-Newman-Id:X-Yahoo-Newman-Property:X-YMail-OSG:X-Yahoo-SMTP:X-Rocket-Received:Content-Type:Mime-Version:Subject:From:In-Reply-To:Date:Cc:Content-Transfer-Encoding:Message-Id:References:To:X-Mailer; b=5nx67tQjcxswQuOUZtvuFUTji4snWQ8vKGCobZlla/wKprKSIXZucgcseXPtxw6IpAT6pQdUWVTpCmaiFJqjohx6D8nFFQ0clvx/oOs+r6HUcChmq9T9CyOeK6bdgi2fXa+m2ugPgy1f7o0yjY8MzjAH5tkjeRbtylxe8MPPGwA= X-Yahoo-Newman-Id: 508687.6034.bm@smtp222.mail.bf1.yahoo.com X-Yahoo-Newman-Property: ymail-3 X-YMail-OSG: zMjjfIIVM1nwUud_OaWyyfNc2uwkmn1U3bga6WLmaJ886ZJ mFcyxKf9A9K0uip.cX9.5LCYtBMK5PHk9981MniK5ueCwZbK1wEN5wCpq1AE 
fHMGpfdjuivwPcXX0ANpgauuue5rmEIXGs18M_tGKYsLHbB8F1QqqBrzGmQw O5U1DeIiLa1MBHsykyPzYvSGyLfxkS5W1eISdAjSlNLjJuRuDcA5a.9nGPs3 FAOFM.pb3k9ZB8UNPM5dSwgfGjyIHITfzEskSwnYPizsHmti7AzP25ahwg0t 95Qw_wiZuI4qPtKBJR3JcGR_7lVaswort6W4marwsh4L0zr9TtPIK1YlONP7 cZqwJh69eW48s1zb_Gpqld7waQPTk03qScPcSSrPIabrxiDY.rAkGnqJXER. Fl_MCNfVAe3MAAF89BwLW9G2Ak97mrnGzJq8oT1NTM_DTu_VHgClWPyTUqh2 c3rZfNXwUVn62rhmEN4b6cFkyc_CPB1xJlbWhBIIkRaJq5t6uywMx7hsV5dV bUmiw0X9kA3actusX5mAADFTRcN6ytJU- X-Yahoo-SMTP: clhABp.swBB7fs.LwIJpv3jkWgo2NU8- X-Rocket-Received: from [172.16.7.246] (scott4long@173.13.138.133 with ) by smtp222.mail.bf1.yahoo.com with SMTP; 29 Aug 2013 15:37:41 +0000 UTC Content-Type: text/plain; charset=us-ascii Mime-Version: 1.0 (Mac OS X Mail 6.5 \(1508\)) Subject: Re: [rfc] migrate lagg to an rmlock From: Scott Long In-Reply-To: <201308291042.13282.jhb@freebsd.org> Date: Thu, 29 Aug 2013 08:37:08 -0700 Content-Transfer-Encoding: quoted-printable Message-Id: <41614148-3900-4FE0-88AC-40F10DAE2030@yahoo.com> References: <5218AA36.1080807@ipfw.ru> <201308291042.13282.jhb@freebsd.org> To: John Baldwin X-Mailer: Apple Mail (2.1508) Cc: FreeBSD Net , Adrian Chadd , freebsd-current@freebsd.org, Robert Watson , "Alexander V. 
Chernikov" X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 29 Aug 2013 15:37:49 -0000 On Aug 29, 2013, at 7:42 AM, John Baldwin wrote: > On Saturday, August 24, 2013 10:16:33 am Robert Watson wrote: >> There are a number of other places in the kernel where migration to an rmlock >> makes sense -- however, some care must be taken for four reasons: (1) while >> read locks don't experience line contention, write locking becomes observably >> more expensive so is not suitable for all rwlock line contention spots -- >> e.g., rmlocks might not be suitable for tcbinfo; (2) rmlocks, unlike rwlocks, >> implement reader priority propagation, so you must reason about; and (3) >> historically, rmlocks have not fully implemented WITNESS so you may get less >> good debugging output. if_lagg is a nice place to use rmlocks, as >> reconfigurations are very rare, and it's really all about long-term data >> stability. > > 3) should no longer be an issue. rmlocks now have full WITNESS and assertion > support (including an rm_assert). > > However, one thing to consider is that rmlocks pin readers to CPUs while the > read lock is held (which rwlocks do not do). And this is not a problem for the application that we're giving it in the lagg driver.
Scott From owner-freebsd-net@FreeBSD.ORG Thu Aug 29 16:02:20 2013 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id E36A8E26; Thu, 29 Aug 2013 16:02:20 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from bigwig.baldwin.cx (bigwig.baldwin.cx [IPv6:2001:470:1f11:75::1]) (using TLSv1 with cipher ADH-CAMELLIA256-SHA (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id B1B0E23FE; Thu, 29 Aug 2013 16:02:20 +0000 (UTC) Received: from jhbbsd.localnet (unknown [38.105.238.108]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id 7183BB924; Thu, 29 Aug 2013 12:02:19 -0400 (EDT) From: John Baldwin To: Scott Long Subject: Re: [rfc] migrate lagg to an rmlock Date: Thu, 29 Aug 2013 12:01:03 -0400 User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110714-p28; KDE/4.5.5; amd64; ; ) References: <201308291042.13282.jhb@freebsd.org> <41614148-3900-4FE0-88AC-40F10DAE2030@yahoo.com> In-Reply-To: <41614148-3900-4FE0-88AC-40F10DAE2030@yahoo.com> MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Message-Id: <201308291201.04131.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7 (bigwig.baldwin.cx); Thu, 29 Aug 2013 12:02:19 -0400 (EDT) Cc: FreeBSD Net , Adrian Chadd , freebsd-current@freebsd.org, Robert Watson , "Alexander V. 
Chernikov" X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 29 Aug 2013 16:02:21 -0000 On Thursday, August 29, 2013 11:37:08 am Scott Long wrote: > > On Aug 29, 2013, at 7:42 AM, John Baldwin wrote: > > > On Saturday, August 24, 2013 10:16:33 am Robert Watson wrote: > >> There are a number of other places in the kernel where migration to an rmlock > >> makes sense -- however, some care must be taken for four reasons: (1) while > >> read locks don't experience line contention, write locking becomes observably > >> more expensive so is not suitable for all rwlock line contention spots -- > >> e.g., rmlocks might not be suitable for tcbinfo; (2) rmlocks, unlike rwlocks, > >> implement reader priority propagation, so you must reason about; and (3) > >> historically, rmlocks have not fully implemented WITNESS so you may get less > >> good debugging output. if_lagg is a nice place to use rmlocks, as > >> reconfigurations are very rare, and it's really all about long-term data > >> stability. > > > > 3) should no longer be an issue. rmlocks now have full WITNESS and assertion > > support (including an rm_assert). > > > > However, one thing to consider is that rmlocks pin readers to CPUs while the > > read lock is held (which rwlocks do not do). > > And this is not a problem for the application that we're giving it in the > lagg driver. That is likely true. I was merely tweaking Robert's general guidelines re: rmlock.
-- John Baldwin From owner-freebsd-net@FreeBSD.ORG Thu Aug 29 16:45:15 2013 Return-Path: Delivered-To: net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id 17E6A5AE; Thu, 29 Aug 2013 16:45:15 +0000 (UTC) (envelope-from asomers@gmail.com) Received: from mail-qa0-x233.google.com (mail-qa0-x233.google.com [IPv6:2607:f8b0:400d:c00::233]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 851C2273F; Thu, 29 Aug 2013 16:45:14 +0000 (UTC) Received: by mail-qa0-f51.google.com with SMTP id bv4so561234qab.17 for ; Thu, 29 Aug 2013 09:45:13 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:date:message-id:subject :from:to:cc:content-type; bh=HAyNn9bc3yaMa1x5xVTCmpi6ny00uipYeGPrsNlbhdk=; b=Q0IVXH1ksA0z0LLG7P4FpTpfLN6uun8FIlmCgIUGcNaHfxxdoDZnTNnWv6GqU6b5lI 9zlXV4dyxWKobcttaYvssbvZ9RBzMZw8QNGZ7g+737HR7mxLn2VrSyBp7P9e0rQ8Y0eb p4c9pWamP6QRSkjLIJIHhyG2ig+sA11JnwU25eukaoln93Sczh+JhGKfIg5AExnz8VI9 /Xbx0sEBSHzN8jdE9ACUGEGnpxeGayBLsT8112dCyGm+VXGOhCF7s+ePII4aBNGvwWDS WtSP9gtXEHak9y4svuFXO5HmhLgJjdYpxJndcmAr/2f/nqVsTBi6Tsq8Nga5yyTrznnI n4Aw== MIME-Version: 1.0 X-Received: by 10.224.122.195 with SMTP id m3mr6296790qar.9.1377794713672; Thu, 29 Aug 2013 09:45:13 -0700 (PDT) Sender: asomers@gmail.com Received: by 10.49.39.101 with HTTP; Thu, 29 Aug 2013 09:45:13 -0700 (PDT) In-Reply-To: References: <521BBD21.4070304@freebsd.org> <521EE8DA.3060107@freebsd.org> Date: Thu, 29 Aug 2013 10:45:13 -0600 X-Google-Sender-Auth: 7eBMBpBK73wvS-Ufq7TRJZINz8s Message-ID: Subject: Re: Flow ID, LACP, and igb From: Alan Somers To: "T.C. 
Gubatayao" Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.14 Cc: Jack F Vogel , "Justin T. Gibbs" , Andre Oppermann , Alan Somers , "net@freebsd.org" X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 29 Aug 2013 16:45:15 -0000 On Thu, Aug 29, 2013 at 1:27 AM, T.C. Gubatayao wrote: > > No problem with fnv_hash(). > > Doesn't it have bad mixing? Good distribution is important since this > code is > for load balancing. > The poor mixing in FNV hash comes from the 8-bit XOR operation. But that provides fine mixing of the last 8 bits, which should be sufficient for lagg_hash unless people are lagging together > 256 ports. > > FNV is also slower compared to most of the newer non-cryptographic hashes, > certainly on large keys, but even on small ones. Of course, performance > will > vary with the architecture. > > > While I agree that it is likely that siphash24() is slower if you could > afford > > the time do a test run it would be great to from guess to know. > > +1 > You might want to consider lookup3 too, since it's also readily available > in the > kernel [1]. > I pulled all four hash functions out into userland and microbenchmarked them. The upshot is that hash32 and fnv_hash are the fastest, jenkins_hash is slower, and siphash24 is the slowest. Also, Clang resulted in much faster code than gcc. 
http://people.freebsd.org/~asomers/lagg_hash/ [root@sm4u-4 /usr/home/alans/ctest/lagg_hash]# ./lagg_hash-gcc-4.8 FNV: 0.76 hash32: 1.18 SipHash24: 44.39 Jenkins: 6.20 [root@sm4u-4 /usr/home/alans/ctest/lagg_hash]# ./lagg_hash-gcc-4.2.1 FNV: 0.74 hash32: 1.35 SipHash24: 55.25 Jenkins: 7.37 [root@sm4u-4 /usr/home/alans/ctest/lagg_hash]# ./lagg_hash.clang-3.3 FNV: 0.30 hash32: 0.30 SipHash24: 55.97 Jenkins: 6.45 [root@sm4u-4 /usr/home/alans/ctest/lagg_hash]# ./lagg_hash.clang-3.2 FNV: 0.30 hash32: 0.30 SipHash24: 44.52 Jenkins: 6.48 > T.C. > > [1] > http://svnweb.freebsd.org/base/head/sys/libkern/jenkins_hash.c?view=markup From owner-freebsd-net@FreeBSD.ORG Thu Aug 29 17:40:42 2013 Return-Path: Delivered-To: net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id ECC1C8BA; Thu, 29 Aug 2013 17:40:41 +0000 (UTC) (envelope-from rizzo.unipi@gmail.com) Received: from mail-la0-x22a.google.com (mail-la0-x22a.google.com [IPv6:2a00:1450:4010:c03::22a]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id A38822B51; Thu, 29 Aug 2013 17:40:40 +0000 (UTC) Received: by mail-la0-f42.google.com with SMTP id ep20so665043lab.1 for ; Thu, 29 Aug 2013 10:40:38 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:date:message-id:subject :from:to:cc:content-type; bh=+QtmQRstoe0BYelCPkOV3t6KkzGns7QM4t+LglDOmmE=; b=aGsDbxYE4esUnxSRJ91o5B02qW9oDJydAbLQhRT4wuyy6DxoeTc5h00mgD3KwS0KLG oHtHywrJev7pe/iTl++bqJB9WzvyY0zsl4yodjVmagcHafXmJo31w4vbJSfMRubvMpCf kjBvXY1oQhfXY4HLqj119r4qxF65wxRGpEtK9Rv5nODck8e83Y8ABre8+P1t5ZJhSCLB b5yIu3pJItZOvY7MDFQOvQ+9xSvKQj4IT83qWUTqlrlLiT9v2Oz9XBPrhYjckccl9qPC yESXQd2H60hJMYH1awwnNEJ2x7HXY5R9Ya3rWvcGlAByLoZZ7oPpZaLyM5r8qaoPpODs 
uX4A== MIME-Version: 1.0 X-Received: by 10.152.170.166 with SMTP id an6mr3818549lac.20.1377798038395; Thu, 29 Aug 2013 10:40:38 -0700 (PDT) Sender: rizzo.unipi@gmail.com Received: by 10.114.200.165 with HTTP; Thu, 29 Aug 2013 10:40:38 -0700 (PDT) In-Reply-To: References: <521BBD21.4070304@freebsd.org> Date: Thu, 29 Aug 2013 19:40:38 +0200 X-Google-Sender-Auth: AoImMjnmcbj2urmbr8Z8qccl46Q Message-ID: Subject: Re: Flow ID, LACP, and igb From: Luigi Rizzo To: Alan Somers Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.14 Cc: Jack F Vogel , "Justin T. Gibbs" , Andre Oppermann , "freebsd-net@freebsd.org" X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 29 Aug 2013 17:40:42 -0000 On Thu, Aug 29, 2013 at 1:42 AM, Alan Somers wrote: > On Mon, Aug 26, 2013 at 2:40 PM, Andre Oppermann > wrote: > > > On 26.08.2013 19:18, Justin T. Gibbs wrote: > > > ... > > >> Are there other checksums we should be looking at in addition to FNV? > >> > > > > siphash24() is fast, keyed and strong. > > > I benchmarked hash32 (the existing hash function) vs fnv_hash using both > TCP and UDP, with 1500 and 9000 byte MTUs. At 10Gbps, I couldn't measure > any difference in either throughput or cpu utilization. Given that > siphash24 is definitely slower than hash32, there's no way that I'll find > with these large MTUs the packet rate is too low to see the difference between the various functions. Just as a data point, the jenkins hash used in the netmap code takes at most 10-15ns (with data in cache) on the i7-2600 CPUs i was using in my tests. I think the way to tell which hash is faster is to run the function in a tight loop, rather than relying on input traffic. 
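A minimal sketch of the tight-loop measurement suggested above, for illustration only. It assumes a local FNV-1 routine built from the published 32-bit FNV offset basis and prime (it is a stand-in, not the kernel's fnv_hash or hash32 code), a made-up 12-byte key holding two MAC addresses, and POSIX clock_gettime(). The names fnv1_32, sample_key and bench_ns_per_hash are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L /* for clock_gettime() */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

#define FNV_OFFSET_BASIS 0x811c9dc5u /* published 32-bit FNV offset basis */
#define FNV_PRIME_32     0x01000193u /* published 32-bit FNV prime */

/* FNV-1: multiply by the prime, then XOR in the next byte. */
static uint32_t
fnv1_32(const void *buf, size_t len, uint32_t h)
{
    const uint8_t *p = buf;

    while (len-- > 0) {
        h *= FNV_PRIME_32;
        h ^= *p++;
    }
    return (h);
}

/* Hypothetical key: source and destination MAC addresses back to back. */
static const uint8_t sample_key[12] = {
    181, 16, 73, 9, 219, 22, 69, 170, 210, 111, 24, 120
};

/*
 * Hash the same key `iterations` times and return the average cost of
 * one call in nanoseconds. Each result seeds the next call so the
 * compiler cannot hoist the hash out of the loop; the final value is
 * stored through `sink` so the loop is not discarded as dead code.
 */
static double
bench_ns_per_hash(unsigned long iterations, uint32_t *sink)
{
    struct timespec t0, t1;
    uint32_t h = FNV_OFFSET_BASIS;
    unsigned long i;

    assert(iterations > 0);
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (i = 0; i < iterations; i++)
        h = fnv1_32(sample_key, sizeof(sample_key), h);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    *sink = h;
    return (((double)(t1.tv_sec - t0.tv_sec) * 1e9 +
        (double)(t1.tv_nsec - t0.tv_nsec)) / (double)iterations);
}
```

Because the key stays in cache for the whole run, a loop like this measures only the arithmetic cost of the hash, not the cache behaviour seen with real traffic.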
Then of course there are cache misses that heavily impact the cost of the function, but that is an orthogonal issue that exists for all hashes. cheers luigi From owner-freebsd-net@FreeBSD.ORG Thu Aug 29 17:44:39 2013 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id D7404DEE for ; Thu, 29 Aug 2013 17:44:39 +0000 (UTC) (envelope-from adrian.chadd@gmail.com) Received: from mail-qe0-x22f.google.com (mail-qe0-x22f.google.com [IPv6:2607:f8b0:400d:c02::22f]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 962F52BB3 for ; Thu, 29 Aug 2013 17:44:39 +0000 (UTC) Received: by mail-qe0-f47.google.com with SMTP id b4so380675qen.20 for ; Thu, 29 Aug 2013 10:44:38 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:date:message-id:subject :from:to:cc:content-type; bh=IfNhQ761TGQg32fYKl0LKJwYGK2QmjHF1wl3LBkKJcQ=; b=h81pRuWqE15CXt6wd7x6HuWAYSmKSsNFYr9YcyMEhLuoPTBzBVvmgj+imhSjVc+7OQ h8XLKVu+YLORLR3PJ2FP82chlBvvDH5TjXcWAoPf8xSyzDPNSfWdghlokxVZllOTBVkU GHMEVibfAzNTYwbf2uYOYVDwqhchqXTMCt0F4VHknbpd9iNCBm1P5KeX9iKW8QkkFMd/ D6vBqxmIhDxeOugqjnXLmZBZfECwgz2bTpJ99ebBosZnc29qc3TOKaedMkJc3gUKWXdk VD9cS8yrgpjgjqaxk9IaOyIOPBHfsC/I3bo/RK/SiiyQyVD58V4LTdkOp/2miWeWe8FJ OzhQ== MIME-Version: 1.0 X-Received: by 10.224.42.200 with SMTP id t8mr6471462qae.4.1377798268830; Thu, 29 Aug 2013 10:44:28 -0700 (PDT) Sender: adrian.chadd@gmail.com Received: by 10.224.128.70 with HTTP; Thu, 29 Aug 2013 10:44:28 -0700 (PDT) In-Reply-To: <20130829152847.9AC9514A14E@mail.netbsd.org> References: <20130804191310.2FFBB14A152@mail.netbsd.org> <9813E50B-C557-4FE1-BADF-A2CFFCBB8BD7@felyko.com> <20130804195538.C87A614A135@mail.netbsd.org>
<20130804225434.87A9C14A152@mail.netbsd.org> <20130829152847.9AC9514A14E@mail.netbsd.org> Date: Thu, 29 Aug 2013 10:44:28 -0700 X-Google-Sender-Auth: FE7KujQzojcQhCsHhCascVHou_Q Message-ID: Subject: Re: BPF_MISC+BPF_COP and BPF_COPX From: Adrian Chadd To: Mindaugas Rasiukevicius Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.14 Cc: tech-net@netbsd.org, guy@alum.mit.edu, FreeBSD Net X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 29 Aug 2013 17:44:40 -0000 On 29 August 2013 08:28, Mindaugas Rasiukevicius wrote: > I have added these to the NetBSD tree: > > #define BPF_MISCOP(code) ((code) & 0xf8) > #define BPF_TAX 0x00 > +#define BPF_COP 0x20 > +#define BPF_COPX 0x40 > #define BPF_TXA 0x80 > > Would you like to reserve them in FreeBSD as well? > Sure, open up a PR so I at least add them to the tree. 
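As an illustration of how the quoted opcode values fit into the existing BPF_MISC class, here is a small decoding sketch. The BPF_MISCOP, BPF_TAX, BPF_COP, BPF_COPX and BPF_TXA values are the ones quoted above; BPF_CLASS and BPF_MISC are the usual bpf.h definitions, repeated so the fragment stands alone. The enum and the classify_misc() helper are hypothetical names, not part of any tree.

```c
#include <stdint.h>

/* Usual bpf.h definitions, repeated here so the sketch is self-contained. */
#define BPF_CLASS(code)  ((code) & 0x07)
#define BPF_MISC         0x07

/* Values as quoted in the message above. */
#define BPF_MISCOP(code) ((code) & 0xf8)
#define BPF_TAX          0x00
#define BPF_COP          0x20
#define BPF_COPX         0x40
#define BPF_TXA          0x80

/* Hypothetical result type for this sketch. */
enum misc_op {
    MISC_NOT_MISC, /* instruction is not in the BPF_MISC class */
    MISC_TAX,
    MISC_TXA,
    MISC_COP,
    MISC_COPX,
    MISC_UNKNOWN   /* BPF_MISC class, but an unassigned sub-opcode */
};

/* Split an instruction code into class and MISC sub-opcode. */
static enum misc_op
classify_misc(uint16_t code)
{
    if (BPF_CLASS(code) != BPF_MISC)
        return (MISC_NOT_MISC);

    switch (BPF_MISCOP(code)) {
    case BPF_TAX:
        return (MISC_TAX);
    case BPF_TXA:
        return (MISC_TXA);
    case BPF_COP:
        return (MISC_COP);
    case BPF_COPX:
        return (MISC_COPX);
    default:
        return (MISC_UNKNOWN);
    }
}
```

With these values, BPF_MISC|BPF_COP encodes as 0x27 and BPF_MISC|BPF_COPX as 0x47, neither of which collides with BPF_TAX (0x07) or BPF_TXA (0x87).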
Thanks, -adrian From owner-freebsd-net@FreeBSD.ORG Thu Aug 29 18:48:48 2013 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id CC946556 for ; Thu, 29 Aug 2013 18:48:48 +0000 (UTC) (envelope-from yuri@rawbw.com) Received: from shell0.rawbw.com (shell0.rawbw.com [198.144.192.45]) by mx1.freebsd.org (Postfix) with ESMTP id 9D852205A for ; Thu, 29 Aug 2013 18:48:48 +0000 (UTC) Received: from eagle.yuri.org (stunnel@localhost [127.0.0.1]) (authenticated bits=0) by shell0.rawbw.com (8.14.4/8.14.4) with ESMTP id r7TImf61093538 for ; Thu, 29 Aug 2013 11:48:42 -0700 (PDT) (envelope-from yuri@rawbw.com) Message-ID: <521F9789.5000903@rawbw.com> Date: Thu, 29 Aug 2013 11:48:41 -0700 From: Yuri User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:17.0) Gecko/20130822 Thunderbird/17.0.8 MIME-Version: 1.0 To: freebsd-net@freebsd.org Subject: LOCAL_CREDS are broken ? Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 29 Aug 2013 18:48:48 -0000 The example below breaks with "Protocol not available" But what is wrong? Isn't this the correct usage? LOCAL_CREDS are only handled in kern/uipc_usrreq.c for AF_LOCAL, so it isn't clear why this doesn't work. 
Yuri --- example.c --- #include #include #include #include #include main() { int sock; int error; int oval = 1; error = socket(AF_LOCAL, SOCK_SEQPACKET, 0); if (error == -1) {perror("socket"); exit(-1);} sock = error; error = setsockopt(sock, SOL_SOCKET, LOCAL_CREDS, &oval, sizeof(oval)); if (error) {perror("setsockopt"); exit(-1);} } From owner-freebsd-net@FreeBSD.ORG Thu Aug 29 19:53:44 2013 Return-Path: Delivered-To: net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id D8EEED3E for ; Thu, 29 Aug 2013 19:53:44 +0000 (UTC) (envelope-from btv1==9536076923e==tgubatayao@barracuda.com) Received: from bsf03.barracuda.com (bsf03.barracuda.com [64.235.145.83]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id AFDCE24B4 for ; Thu, 29 Aug 2013 19:53:44 +0000 (UTC) X-ASG-Debug-ID: 1377804860-05b9635b395d6430001-oFaieN Received: from bn-scl-fe06.Cudanet.local (mail.barracuda.com [10.8.1.48]) by bsf03.barracuda.com with ESMTP id GwE9GMdeuZwZMBSB (version=TLSv1 cipher=AES128-SHA bits=128 verify=NO); Thu, 29 Aug 2013 12:34:20 -0700 (PDT) X-Barracuda-Envelope-From: tgubatayao@barracuda.com Received: from bn-scl-be04.Cudanet.local (10.8.1.56) by bn-scl-fe06.Cudanet.local (10.8.1.48) with Microsoft SMTP Server (TLS) id 8.3.298.1; Thu, 29 Aug 2013 12:34:21 -0700 Received: from BN-SCL-MBX03.Cudanet.local ([fe80::e5b6:9fef:a4d2:a5ba]) by bn-scl-be04.Cudanet.local ([::1]) with mapi; Thu, 29 Aug 2013 12:34:20 -0700 From: "T.C. 
Gubatayao" To: Alan Somers Date: Thu, 29 Aug 2013 12:33:58 -0700 Subject: Re: Flow ID, LACP, and igb Thread-Topic: Flow ID, LACP, and igb X-ASG-Orig-Subj: Re: Flow ID, LACP, and igb Thread-Index: Ac6k7rUiDkFvYNlHSL69A2ttEnQXYA== Message-ID: <0771FC4F-BCDD-4985-A33F-09951806AD99@barracuda.com> References: <521BBD21.4070304@freebsd.org> <521EE8DA.3060107@freebsd.org> In-Reply-To: Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-apple-encoding-hint: 513 x-universally-unique-identifier: 608f5fdc-5bd1-4080-bc50-c66091b62866 x-apple-mail-remote-attachments: YES x-apple-base-url: x-msg://3642/ x-apple-windows-friendly: 1 x-apple-mail-signature: x-uniform-type-identifier: com.apple.mail-draft acceptlanguage: en-US Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-Barracuda-Connect: mail.barracuda.com[10.8.1.48] X-Barracuda-Start-Time: 1377804860 X-Barracuda-Encrypted: AES128-SHA X-Barracuda-URL: http://10.8.98.66:8000/cgi-mod/mark.cgi X-Barracuda-BRTS-Status: 1 X-Virus-Scanned: by bsmtpd at barracuda.com X-Barracuda-Spam-Score: 0.62 X-Barracuda-Spam-Status: No, SCORE=0.62 using per-user scores of TAG_LEVEL=1000.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=9.0 tests=COMMA_SUBJECT, THREAD_INDEX, THREAD_TOPIC X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.2.139766 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- 0.01 THREAD_INDEX thread-index: AcO7Y8iR61tzADqsRmmc5wNiFHEOig== 0.01 THREAD_TOPIC Thread-Topic: ...(Japanese Subject)... 0.60 COMMA_SUBJECT Subject is like 'Re: FDSDS, this is a subject' Cc: Jack F Vogel , "Justin T. 
Gibbs" , Andre Oppermann , "net@freebsd.org" X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 29 Aug 2013 19:53:44 -0000 On Aug 29, 2013, at 12:45 PM, Alan Somers wrote: > I pulled all four hash functions out into userland and microbenchmarked them. > The upshot is that hash32 and fnv_hash are the fastest, jenkins_hash is > slower, and siphash24 is the slowest. Also, Clang resulted in much faster > code than gcc. I didn't realize that you were testing incremental hashing with 4 and 6 byte keys. There might be advantages to conditionally filling out a contiguous key and then performing the hash on that. You could guarantee key alignment, for one, and this would benefit the hashes which perform word-sized reads. Based on my quick tests, lookup3 and SipHash improve significantly. T.C. diff -u a/lagg_hash.c b/lagg_hash.c --- a/lagg_hash.c 2013-08-29 14:21:17.255307349 -0400 +++ b/lagg_hash.c 2013-08-29 15:16:31.135653259 -0400 @@ -7,22 +7,30 @@ #include #include #include - -uint32_t jenkins_hash32(const uint32_t *, size_t, uint32_t); +#include #define ITERATIONS 100000000 typedef uint32_t do_hash_t(void); -// Pad the MACs with 0s because jenkins_hash operates on 32-bit inputs -const uint8_t ether_shost[] = {181, 16, 73, 9, 219, 22, 0, 0}; -const uint8_t ether_dhost[] = {69, 170, 210, 111, 24, 120, 0, 0}; +const uint8_t ether_shost[] = {181, 16, 73, 9, 219, 22}; +const uint8_t ether_dhost[] = {69, 170, 210, 111, 24, 120}; +const uint8_t ether_hosts[] = { 181, 16, 73, 9, 219, 22, + 69, 170, 210, 111, 24, 120 }; const struct in_addr ip_src = {.s_addr = 1329258245}; const struct in_addr ip_dst = {.s_addr = 1319097119}; +const struct in_addr ips[2] = { { .s_addr = 1329258245 }, + { .s_addr = 1319097119 } }; const uint32_t ports = 3132895450; const
uint8_t sipkey[16] = {7, 239, 255, 43, 68, 53, 56, 225, 98, 81, 177, 80, 92, 235, 242, 39}; +struct key { + uint8_t ether_hosts[12]; + struct in_addr ips[2]; + uint16_t ports[2]; +} __attribute__((packed)); + /* * Simulate how lagg_hashmbuf uses FNV hash for a TCP/IP packet * No VLAN tagging @@ -58,6 +66,15 @@ return (p); } +static __inline init_key(struct key *key) +{ + + /* Simulate copying the info out of the mbuf. */ + memcpy(key->ether_hosts, ether_hosts, sizeof(ether_hosts)); + memcpy(key->ips, ips, sizeof(ips)); + memcpy(key->ports, &ports, sizeof(ports)); +} + /* * Simulate how lagg_hashmbuf would use siphash24 for a TCP/IP packet * No VLAN tagging @@ -65,16 +82,11 @@ uint32_t do_siphash24(void) { SIPHASH_CTX ctx; + struct key key; - SipHash24_Init(&ctx); - SipHash_SetKey(&ctx, sipkey); + init_key(&key); - SipHash_Update(&ctx, ether_shost, 6); - SipHash_Update(&ctx, ether_dhost, 6); - SipHash_Update(&ctx, &ip_src, sizeof(struct in_addr)); - SipHash_Update(&ctx, &ip_dst, sizeof(struct in_addr)); - SipHash_Update(&ctx, &ports, sizeof(ports)); - return (SipHash_End(&ctx) & 0xFFFFFFFF); + return (SipHash24(&ctx, sipkey, &key, sizeof(key)) & 0xFFFFFFFF); } /* @@ -83,19 +95,11 @@ */ uint32_t do_jenkins(void) { - /* Jenkins hash does not recommend any specific initializer */ - uint32_t p = FNV1_32_INIT; + struct key key; - /* - * jenkins_hash uses 32-bit inputs, so we need to present the MACs as - * arrays of 2 32-bit values - */ - p = jenkins_hash32((uint32_t*)ether_shost, 2, p); - p = jenkins_hash32((uint32_t*)ether_dhost, 2, p); - p = jenkins_hash32((uint32_t*)&ip_src, sizeof(struct in_addr) / 4, p); - p = jenkins_hash32((uint32_t*)&ip_dst, sizeof(struct in_addr) / 4, p); - p = jenkins_hash32(&ports, sizeof(ports) / 4, p); - return (p); + init_key(&key); + + return (jenkins_hash(&key, sizeof(key), FNV1_32_INIT)); } diff -u a/siphash.h b/siphash.h --- a/siphash.h 2013-08-29 14:21:21.851306417 -0400 +++
b/siphash.h 2013-08-29 14:26:44.470240137 -0400 @@ -73,8 +73,8 @@ void SipHash_Final(void *, SIPHASH_CTX *); uint64_t SipHash_End(SIPHASH_CTX *); -#define SipHash24(x, y, z, i) SipHashX((x), 2, 4, (y), (z), (i)); -#define SipHash48(x, y, z, i) SipHashX((x), 4, 8, (y), (z), (i)); +#define SipHash24(x, y, z, i) SipHashX((x), 2, 4, (y), (z), (i)) +#define SipHash48(x, y, z, i) SipHashX((x), 4, 8, (y), (z), (i)) uint64_t SipHashX(SIPHASH_CTX *, int, int, const uint8_t [16], const void *, size_t); From owner-freebsd-net@FreeBSD.ORG Thu Aug 29 19:57:10 2013 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id 57BA2E2A for ; Thu, 29 Aug 2013 19:57:10 +0000 (UTC) (envelope-from nparhar@gmail.com) Received: from mail-pa0-x22d.google.com (mail-pa0-x22d.google.com [IPv6:2607:f8b0:400e:c03::22d]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 30D2424EB for ; Thu, 29 Aug 2013 19:57:10 +0000 (UTC) Received: by mail-pa0-f45.google.com with SMTP id bg4so1356247pad.18 for ; Thu, 29 Aug 2013 12:57:09 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=sender:message-id:date:from:user-agent:mime-version:to:subject :content-type:content-transfer-encoding; bh=CLIfD9kglveWVuPmeYgqi8K88nOIp9F7MaUw9nG4QS8=; b=Rtz4lid0AptAiqvvprO0IIQSUg0Do3ZE+BK3/vDhl1vTDiaVjr/MTRYeyMfxXsVWYY +GfVFHo1mdR38QDatsKIPfJ9Y4wcC2faCaqk6IXFiHD2V4UyhbzzpRk4Zo0WVyPNCP3H N2YiVHet08fbp5JTsQMIYyC0gAbiyITqd4FbfAZeBG0t9Yp+gYy4JF6/MsK5Yopm/olJ OXHIbm1yMxOBtqCdCsv5KqV89p8UshaaoR2A1mClcZn4Qf4sYg1hxlsJAMi2wjCtqZ/l HNvc4iUyU2N9kQ+q6TBZ5ey639ZASmf/v5X6bS0plcu7wtmSbnop17IcG8wVh6cFEnHx iVOA== X-Received: by 10.66.253.4 with SMTP id zw4mr6341755pac.119.1377806229909; Thu, 29 Aug
2013 12:57:09 -0700 (PDT) Received: from [10.192.166.0] (stargate.chelsio.com. [67.207.112.58]) by mx.google.com with ESMTPSA id yg3sm42772653pab.16.1969.12.31.16.00.00 (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Thu, 29 Aug 2013 12:57:08 -0700 (PDT) Sender: Navdeep Parhar Message-ID: <521FA792.9000807@FreeBSD.org> Date: Thu, 29 Aug 2013 12:57:06 -0700 From: Navdeep Parhar User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:17.0) Gecko/20130819 Thunderbird/17.0.8 MIME-Version: 1.0 To: freebsd-net@FreeBSD.org Subject: Please review: atomic updates to mbuf's external refcount Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 29 Aug 2013 19:57:10 -0000 I'd like to merge r254341 from user/np/cxl_tuning to head if there are no objections. It eliminates a couple of iffy looking constructs in uipc_mbuf.c http://svnweb.freebsd.org/base/user/np/cxl_tuning/sys/kern/uipc_mbuf.c?r1=254334&r2=254341&diff_format=u --------------------- Always increment or decrement an mbuf's external refcount atomically. Always decrement it in mb_free_ext() so that an external free routine can safely assert the refcount is 0 (and not 0 or 1) when it's called. --------------------- Regards, Navdeep diff -r 9e9639a7df80 -r 9753d3e51363 sys/kern/uipc_mbuf.c --- a/sys/kern/uipc_mbuf.c Thu Aug 29 11:16:04 2013 -0700 +++ b/sys/kern/uipc_mbuf.c Thu Aug 29 11:16:04 2013 -0700 @@ -282,7 +282,7 @@ m_extadd(struct mbuf *mb, caddr_t buf, u /* * Non-directly-exported function to clean up after mbufs with M_EXT - * storage attached to them if the reference count hits 1. + * storage attached to them if the reference count hits 0. 
*/ void mb_free_ext(struct mbuf *m) @@ -298,8 +298,7 @@ mb_free_ext(struct mbuf *m) skipmbuf = (m->m_flags & M_NOFREE); /* Free attached storage if this mbuf is the only reference to it. */ - if (*(m->m_ext.ref_cnt) == 1 || - atomic_fetchadd_int(m->m_ext.ref_cnt, -1) == 1) { + if (atomic_fetchadd_int(m->m_ext.ref_cnt, -1) == 1) { switch (m->m_ext.ext_type) { case EXT_PACKET: /* The packet zone is special. */ if (*(m->m_ext.ref_cnt) == 0) @@ -367,10 +366,7 @@ mb_dupcl(struct mbuf *n, struct mbuf *m) KASSERT(m->m_ext.ref_cnt != NULL, ("%s: ref_cnt not set", __func__)); KASSERT((n->m_flags & M_EXT) == 0, ("%s: M_EXT set", __func__)); - if (*(m->m_ext.ref_cnt) == 1) - *(m->m_ext.ref_cnt) += 1; - else - atomic_add_int(m->m_ext.ref_cnt, 1); + atomic_add_int(m->m_ext.ref_cnt, 1); n->m_ext.ext_buf = m->m_ext.ext_buf; n->m_ext.ext_free = m->m_ext.ext_free; n->m_ext.ext_arg1 = m->m_ext.ext_arg1; From owner-freebsd-net@FreeBSD.ORG Thu Aug 29 20:21:07 2013 Return-Path: Delivered-To: net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id 95BE64C6; Thu, 29 Aug 2013 20:21:07 +0000 (UTC) (envelope-from asomers@gmail.com) Received: from mail-qa0-x233.google.com (mail-qa0-x233.google.com [IPv6:2607:f8b0:400d:c00::233]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 0BE372692; Thu, 29 Aug 2013 20:21:06 +0000 (UTC) Received: by mail-qa0-f51.google.com with SMTP id bv4so702370qab.3 for ; Thu, 29 Aug 2013 13:21:06 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:date:message-id:subject :from:to:cc:content-type; bh=5vxxH+s3gd/HOVTD7ikX0bfmGwV4VP/RsWHW2nelBqo=; b=Ua8CzKrkq1z18X/uhwiFm3S99kiuGvJFR165oDmrXTdpTL/us6QrL4Yr6/hTzV7RsT 
IMH9dBMwTLcEvodj5xNhV6Rp6x9rWV55sZ374X4e6L7GoZlM7ruFbRg/cJY9l/v5uyv2 hGTweZ8ZRh19qnHLauxqgyDHkPLfEJWkiNhQ+h+qt014cffuvkL4e5bKsVPVpc0vl7qg 1aaoIuo6MG0t+Vxno8UOjgZSBybrBxUFnDSGTvtSVPybEbVbhJpYQV2Ce71g0Jv9ekrd 3bvDuMq4QwaWcgmG6k7mHAzFM6aOOrrZEworY91Wa/dn/WgcglF/Xws7C5OlIi8R+tl0 DjZQ== MIME-Version: 1.0 X-Received: by 10.224.23.134 with SMTP id r6mr7207108qab.34.1377807666112; Thu, 29 Aug 2013 13:21:06 -0700 (PDT) Sender: asomers@gmail.com Received: by 10.49.39.101 with HTTP; Thu, 29 Aug 2013 13:21:05 -0700 (PDT) In-Reply-To: <0771FC4F-BCDD-4985-A33F-09951806AD99@barracuda.com> References: <521BBD21.4070304@freebsd.org> <521EE8DA.3060107@freebsd.org> <0771FC4F-BCDD-4985-A33F-09951806AD99@barracuda.com> Date: Thu, 29 Aug 2013 14:21:05 -0600 X-Google-Sender-Auth: tD5HdcWsW6On3TITBCf8wMaZt7c Message-ID: Subject: Re: Flow ID, LACP, and igb From: Alan Somers To: "T.C. Gubatayao" Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.14 Cc: Jack F Vogel , "Justin T. Gibbs" , Andre Oppermann , Alan Somers , "net@freebsd.org" X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 29 Aug 2013 20:21:07 -0000 On Thu, Aug 29, 2013 at 1:33 PM, T.C. Gubatayao wrote: > On Aug 29, 2013, at 12:45 PM, Alan Somers wrote: > > > I pulled all four hash functions out into userland and microbenchmarked > them. > > The upshot is that hash32 and fnv_hash are the fastest, jenkins_hash is > > slower, and siphash24 is the slowest. Also, Clang resulted in much > faster > > code than gcc. > > I didn't realize that you were testing incremental hashing with 4 and 6 > byte > keys. > > There might be advantages to conditionally filling out a contiguous key > and then performing the hash on that. 
You could guarantee key alignment, > for > one, and this would benefit the hashes which perform word-sized reads. > > Based on my quick tests, lookup3 and SipHash improve significantly. > They're faster, but even with this change, jenkins_hash is still 6 times slower than FNV hash. Also, your technique of copying the hashable fields into a separate buffer would need modification to work with different types of packet and different LAGG_F_HASH[234] flags. Because different packets have different hashable fields, struct key would need to be expanded to include the vlan tag, IPV6 addresses, and IPv6 flowid. lagg_hashmbuf would then have to zero the unused fields. In any case, that's not going to make Jenkins and SipHash24 more likely to beat FNV. > > T.C. > > diff -u a/lagg_hash.c b/lagg_hash.c > --- a/lagg_hash.c 2013-08-29 14:21:17.255307349 -0400 > +++ b/lagg_hash.c 2013-08-29 15:16:31.135653259 -0400 > @@ -7,22 +7,30 @@ > #include > #include > #include > - > -uint32_t jenkins_hash32(const uint32_t *, size_t, uint32_t); > +#include > > #define ITERATIONS 100000000 > > typedef uint32_t do_hash_t(void); > > -// Pad the MACs with 0s because jenkins_hash operates on 32-bit inputs > -const uint8_t ether_shost[] = {181, 16, 73, 9, 219, 22, 0, 0}; > -const uint8_t ether_dhost[] = {69, 170, 210, 111, 24, 120, 0, 0}; > +const uint8_t ether_shost[] = {181, 16, 73, 9, 219, 22}; > +const uint8_t ether_dhost[] = {69, 170, 210, 111, 24, 120}; > +const uint8_t ether_hosts[] = { 181, 16, 73, 9, 219, 22, > + 69, 170, 210, 111, 24, 120 }; > const struct in_addr ip_src = {.s_addr = 1329258245}; > const struct in_addr ip_dst = {.s_addr = 1319097119}; > +const struct in_addr ips[2] = { { .s_addr = 1329258245 }, > + { .s_addr = 1319097119 } }; > const uint32_t ports = 3132895450; > const uint8_t sipkey[16] = {7, 239, 255, 43, 68, 53, 56, 225, > 98, 81, 177, 80, 92, 235, 242, 39}; > > +struct key { > + uint8_t ether_hosts[12]; > + struct in_addr ips[2]; > + uint16_t ports[2]; > +} 
__attribute__((packed)); > + > /* > * Simulate how lagg_hashmbuf uses FNV hash for a TCP/IP packet > * No VLAN tagging > @@ -58,6 +66,15 @@ > return (p); > } > > +static __inline init_key(struct key *key) > +{ > + > + /* Simulate copying the info out of the mbuf. */ > + memcpy(key->ether_hosts, ether_hosts, sizeof(ether_hosts)); > + memcpy(key->ips, ips, sizeof(ips)); > + memcpy(key->ports, &ports, sizeof(ports)); > +} > + > /* > * Simulate how lagg_hashmbuf would use siphash24 for a TCP/IP packet > * No VLAN tagging > @@ -65,16 +82,11 @@ > uint32_t do_siphash24(void) > { > SIPHASH_CTX ctx; > + struct key key; > > - SipHash24_Init(&ctx); > - SipHash_SetKey(&ctx, sipkey); > + init_key(&key); > > - SipHash_Update(&ctx, ether_shost, 6); > - SipHash_Update(&ctx, ether_dhost, 6); > - SipHash_Update(&ctx, &ip_src, sizeof(struct in_addr)); > - SipHash_Update(&ctx, &ip_dst, sizeof(struct in_addr)); > - SipHash_Update(&ctx, &ports, sizeof(ports)); > - return (SipHash_End(&ctx) & 0xFFFFFFFF); > + return (SipHash24(&ctx, sipkey, &key, sizeof(key)) & 0xFFFFFFFF); > } > > /* > @@ -83,19 +95,11 @@ > */ > uint32_t do_jenkins(void) > { > - /* Jenkins hash does not recommend any specific initializer */ > - uint32_t p = FNV1_32_INIT; > + struct key key; > > - /* > - * jenkins_hash uses 32-bit inputs, so we need to present the MACs > as > - * arrays of 2 32-bit values > - */ > - p = jenkins_hash32((uint32_t*)ether_shost, 2, p); > - p = jenkins_hash32((uint32_t*)ether_dhost, 2, p); > - p = jenkins_hash32((uint32_t*)&ip_src, sizeof(struct in_addr) / 4, > p); > - p = jenkins_hash32((uint32_t*)&ip_dst, sizeof(struct in_addr) / 4, > p); > - p = jenkins_hash32(&ports, sizeof(ports) / 4, p); > - return (p); > + init_key(&key); > + > + return (jenkins_hash(&key, sizeof(key), FNV1_32_INIT)); > } > > > diff -u a/siphash.h b/siphash.h > --- a/siphash.h 2013-08-29 14:21:21.851306417 -0400 > +++ b/siphash.h 2013-08-29 14:26:44.470240137 -0400 > @@ -73,8 +73,8 @@ > void SipHash_Final(void *, 
SIPHASH_CTX *); > uint64_t SipHash_End(SIPHASH_CTX *); > > -#define SipHash24(x, y, z, i) SipHashX((x), 2, 4, (y), (z), (i)); > -#define SipHash48(x, y, z, i) SipHashX((x), 4, 8, (y), (z), (i)); > +#define SipHash24(x, y, z, i) SipHashX((x), 2, 4, (y), (z), (i)) > +#define SipHash48(x, y, z, i) SipHashX((x), 4, 8, (y), (z), (i)) > uint64_t SipHashX(SIPHASH_CTX *, int, int, const uint8_t [16], const void > *, > size_t); > > From owner-freebsd-net@FreeBSD.ORG Thu Aug 29 20:48:00 2013 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id 32197AB7; Thu, 29 Aug 2013 20:48:00 +0000 (UTC) (envelope-from crodr001@gmail.com) Received: from mail-lb0-x22d.google.com (mail-lb0-x22d.google.com [IPv6:2a00:1450:4010:c04::22d]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 3189E2801; Thu, 29 Aug 2013 20:47:59 +0000 (UTC) Received: by mail-lb0-f173.google.com with SMTP id o14so1064062lbi.32 for ; Thu, 29 Aug 2013 13:47:56 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:date:message-id:subject :from:to:cc:content-type; bh=CcISQNVl0zmUJb2N5e+a1QsrMAEWMjRtpycuFtsB27Y=; b=vOqafh+99rUundUoAMiFiFPVyxGa2fjbPh4Oay3Mr1BfKmQOjYj1O+6ZI+MWSOeeAO jsvtbtg8pXg6riPzrw7sejVR+ym6Qp4tapDt6G37bIkw6dvodQJi1l6+U4+x9gk7u+xp 3BxWeu5iKhUhOao5eoIt2weDp6bSRqt7ePFemIv6RW5LfC8ZSoUy5c9m7eocoFvHFi31 ndKApq3AOxO85/vvWc84YgQTBbzKzNtsTapp8c8wV2hee8dRZ4cEtrpZrKOSH+7o0XAb a5D1/IUNMWpM3WztAs8BOYPo3yOOvCTMA3WDbO9HfEARjVQg6g3AP6YoLh7lPKWG3pFr 0pdg== MIME-Version: 1.0 X-Received: by 10.152.8.12 with SMTP id n12mr4555558laa.10.1377809276531; Thu, 29 Aug 2013 13:47:56 -0700 (PDT) Sender: crodr001@gmail.com Received: by 10.112.168.136 with HTTP; Thu, 29 Aug 
2013 13:47:56 -0700 (PDT) In-Reply-To: <521C5EC2.1060901@yandex.ru> References: <521C5EC2.1060901@yandex.ru> Date: Thu, 29 Aug 2013 13:47:56 -0700 X-Google-Sender-Auth: JQvbNg2aBkzW85Fsgxgquk6zpiY Message-ID: Subject: Re: devel/jenkins port not starting. Kernel panic in IPv6 multicast code From: Craig Rodrigues To: "Andrey V. Elsukov" Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.14 Cc: freebsd-net@freebsd.org, bms@freebsd.org, lwhsu@freebsd.org X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 29 Aug 2013 20:48:00 -0000

On Tue, Aug 27, 2013 at 1:09 AM, Andrey V. Elsukov wrote:

> On 27.08.2013 01:07, Craig Rodrigues wrote:
> > Hi,
> >
> > On box 2, since this is a debug kernel with WITNESS and INVARIANTS
> > enabled, I get a kernel panic. (see attached core.txt.gz)
>
> It seems the log was stripped by the mailing list.
>
> > The panic occurs here on line 1779:
> >
> > 1768 static struct ifnet *
> > 1769 in6p_lookup_mcast_ifp(const struct inpcb *in6p,
> > 1770     const struct sockaddr_in6 *gsin6)
> > 1771 {
> > 1772     struct route_in6 ro6;
> > 1773     struct ifnet *ifp;
> > 1774
> > 1775     KASSERT(in6p->inp_vflag & INP_IPV6,
> > 1776         ("%s: not INP_IPV6 inpcb", __func__));
> > 1777     KASSERT(gsin6->sin6_family == AF_INET6,
> > 1778         ("%s: not AF_INET6 group", __func__));
> > 1779     KASSERT(IN6_IS_ADDR_MULTICAST(&gsin6->sin6_addr),
> > 1780         ("%s: not multicast", __func__));
> >
> > If I look at gsin6->sin6_addr inside kgdb,
> > I see:
> >
> > (kgdb) p gsin6->sin6_addr
> > $1 = {__u6_addr = {__u6_addr8 =
> > "\000\000\000\000\000\000\000\000\000\000\377\377\357M|\325", __u6_addr16 = {0, 0, 0,
> > 0, 0, 65535, 19951, 54652}, __u6_addr32 = {0, 0,
> > 4294901760, 3581693423}}}
> >
> >
> > I am not so familiar with this part of the networking code.
> > Can someone recommend where the best place to fix
> > this would be?
>
> AFAIR, I already saw a similar report here.
> This is the v4-mapped IPv6 address ::ffff:239.77.124.213.
> I guess the application is trying to use setsockopt with the IPV6_JOIN_GROUP
> option. And since the outgoing interface isn't specified, the kernel is
> trying to determine it from the routing table. But this mapped address
> triggers the assert in the in6p_lookup_mcast_ifp() function. It seems to me
> that v4-mapped addresses aren't supported in the multicast code. If you
> remove the KASSERT from in6p_lookup_mcast_ifp(), this address will be
> treated as invalid later.
>
> --
> WBR, Andrey V. Elsukov
>

Andrey,

Thanks for the analysis. It looks like the mailing list is dropping my
attachments, so I uploaded my log files here:

http://people.freebsd.org/~rodrigc/jenkins-problem/

I don't understand where to go about fixing the problem. Should we fix the
FreeBSD kernel to not panic in this case, by changing the KASSERT to an
error? Does the Jenkins source code need to change?
Is there a way I can configure my system not to trigger this error? I am not using IPv6 in my network. Thanks -- Craig From owner-freebsd-net@FreeBSD.ORG Thu Aug 29 21:31:21 2013 Return-Path: Delivered-To: net@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id B6B1E944 for ; Thu, 29 Aug 2013 21:31:21 +0000 (UTC) (envelope-from bright@mu.org) Received: from elvis.mu.org (elvis.mu.org [192.203.228.196]) by mx1.freebsd.org (Postfix) with ESMTP id A4F5F2AF7 for ; Thu, 29 Aug 2013 21:31:21 +0000 (UTC) Received: from Alfreds-MacBook-Pro-9.local (c-67-180-208-218.hsd1.ca.comcast.net [67.180.208.218]) by elvis.mu.org (Postfix) with ESMTPSA id 259B91A3DAF; Thu, 29 Aug 2013 14:31:14 -0700 (PDT) Message-ID: <521FBDA1.4060106@mu.org> Date: Thu, 29 Aug 2013 14:31:13 -0700 From: Alfred Perlstein User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:17.0) Gecko/20130801 Thunderbird/17.0.8 MIME-Version: 1.0 To: Yuri , net@FreeBSD.org Subject: Re: LOCAL_CREDS are broken ? References: <521F9789.5000903@rawbw.com> In-Reply-To: <521F9789.5000903@rawbw.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 29 Aug 2013 21:31:21 -0000 On 8/29/13 11:48 AM, Yuri wrote: > The example below breaks with "Protocol not available" > But what is wrong? Isn't this the correct usage? > LOCAL_CREDS are only handled in kern/uipc_usrreq.c for AF_LOCAL, so it > isn't clear why this doesn't work. 
> > Yuri > > > > --- example.c --- > #include > #include > #include > #include > #include > > main() { > int sock; > int error; > int oval = 1; > > error = socket(AF_LOCAL, SOCK_SEQPACKET, 0); > if (error == -1) {perror("socket"); exit(-1);} > sock = error; > > error = setsockopt(sock, SOL_SOCKET, LOCAL_CREDS, &oval, sizeof(oval)); > if (error) {perror("setsockopt"); exit(-1);} > } > Looks like SOCK_SEQPACKET doesn't support LOCAL_CREDS because its protosw doesn't contain the entry for: .pr_ctloutput = &uipc_ctloutput, Have a look at src/sys/kern/uipc_usrreq.c at around lines 280-332: > static struct protosw localsw[] = { > { > .pr_type = SOCK_STREAM, > .pr_domain = &localdomain, > .pr_flags = PR_CONNREQUIRED|PR_WANTRCVD|PR_RIGHTS, > .pr_ctloutput = &uipc_ctloutput, > .pr_usrreqs = &uipc_usrreqs_stream > }, > { > .pr_type = SOCK_DGRAM, > .pr_domain = &localdomain, > .pr_flags = PR_ATOMIC|PR_ADDR|PR_RIGHTS, > .pr_ctloutput = &uipc_ctloutput, > .pr_usrreqs = &uipc_usrreqs_dgram > }, > { > .pr_type = SOCK_SEQPACKET, > .pr_domain = &localdomain, > > /* > * XXXRW: For now, PR_ADDR because soreceive will bump into them > * due to our use of sbappendaddr. A new sbappend variants is > needed > * that supports both atomic record writes and control data. > */ > .pr_flags = PR_ADDR|PR_ATOMIC|PR_CONNREQUIRED|PR_WANTRCVD| > PR_RIGHTS, > .pr_usrreqs = &uipc_usrreqs_seqpacket, > }, > }; I wonder if this is just a bug/missing code!? 
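One way to test that theory, as a minimal sketch rather than a fix: try the option on each AF_LOCAL socket type and look at the errno that comes back. The helper name try_local_creds() is made up for illustration. Its second argument lets you compare SOL_SOCKET (what the example above passes) with level 0; whether level 0 is what uipc_ctloutput() expects is an assumption to check against unix(4) and uipc_usrreq.c. On platforms that do not define LOCAL_CREDS the helper simply returns ENOSYS.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical helper: create an AF_LOCAL socket of the given type and try
 * to enable LOCAL_CREDS at the given option level.
 * Returns 0 if the kernel accepted the option, otherwise the errno reported
 * by socket() or setsockopt(), or ENOSYS where LOCAL_CREDS is not defined.
 */
int
try_local_creds(int type, int level)
{
#ifdef LOCAL_CREDS
	int on = 1;
	int err = 0;
	int s;

	s = socket(AF_LOCAL, type, 0);
	if (s == -1)
		return (errno);
	/* Save errno before close() has a chance to overwrite it. */
	if (setsockopt(s, level, LOCAL_CREDS, &on, sizeof(on)) == -1)
		err = errno;
	close(s);
	return (err);
#else
	(void)type;
	(void)level;
	return (ENOSYS);
#endif
}
```

Calling it for SOCK_STREAM, SOCK_DGRAM and SOCK_SEQPACKET with SOL_SOCKET, then again with level 0, and printing strerror() of each result would show whether the failure depends on the socket type or on the level.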
-Alfred > _______________________________________________ > freebsd-net@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-net > To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org" > -- Alfred Perlstein From owner-freebsd-net@FreeBSD.ORG Thu Aug 29 21:35:55 2013 Return-Path: Delivered-To: net@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id 0DE67C52 for ; Thu, 29 Aug 2013 21:35:55 +0000 (UTC) (envelope-from yuri@rawbw.com) Received: from shell0.rawbw.com (shell0.rawbw.com [198.144.192.45]) by mx1.freebsd.org (Postfix) with ESMTP id D56D02B5B for ; Thu, 29 Aug 2013 21:35:54 +0000 (UTC) Received: from eagle.yuri.org (stunnel@localhost [127.0.0.1]) (authenticated bits=0) by shell0.rawbw.com (8.14.4/8.14.4) with ESMTP id r7TLZswN044374; Thu, 29 Aug 2013 14:35:54 -0700 (PDT) (envelope-from yuri@rawbw.com) Message-ID: <521FBEB9.20403@rawbw.com> Date: Thu, 29 Aug 2013 14:35:53 -0700 From: Yuri User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:17.0) Gecko/20130822 Thunderbird/17.0.8 MIME-Version: 1.0 To: Alfred Perlstein Subject: Re: LOCAL_CREDS are broken ? 
References: <521F9789.5000903@rawbw.com> <521FBDA1.4060106@mu.org> In-Reply-To: <521FBDA1.4060106@mu.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: net@FreeBSD.org X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 29 Aug 2013 21:35:55 -0000 On 08/29/2013 14:31, Alfred Perlstein wrote: > Looks like SOCK_SEQPACKET doesn't support LOCAL_CREDS because its > protosw doesn't contain the entry for: > .pr_ctloutput = &uipc_ctloutput, > > Have a look at src/sys/kern/uipc_usrreq.c at around lines 280-332: But SOCK_DGRAM produces the same result. Sorry, I forgot to mention in my OP. I found this case by troubleshooting the linux code that does socketpair(AF_LOCAL, SOCK_DGRAM, 0,...) and later sets the equivalent of LOCAL_CREDS and this fails. Yuri From owner-freebsd-net@FreeBSD.ORG Thu Aug 29 21:40:57 2013 Return-Path: Delivered-To: net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id DDB5CEE7 for ; Thu, 29 Aug 2013 21:40:56 +0000 (UTC) (envelope-from btv1==9536076923e==tgubatayao@barracuda.com) Received: from bsf03.barracuda.com (bsf03.barracuda.com [64.235.145.83]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 96E982BEF for ; Thu, 29 Aug 2013 21:40:56 +0000 (UTC) X-ASG-Debug-ID: 1377812454-05b9635b3a5dd640001-oFaieN Received: from BN-SCL-FE02.Cudanet.local (bn-scl-fe02.cudanet.local [10.8.96.69]) by bsf03.barracuda.com with ESMTP id NrfXEpN5DlYB44gl (version=TLSv1 cipher=AES128-SHA bits=128 verify=NO); Thu, 29 Aug 2013 14:40:54 -0700 (PDT) X-Barracuda-Envelope-From: 
tgubatayao@barracuda.com Received: from BN-SCL-FE04.Cudanet.local (10.8.96.204) by BN-SCL-FE02.Cudanet.local (10.8.96.69) with Microsoft SMTP Server (TLS) id 8.3.298.1; Thu, 29 Aug 2013 14:40:54 -0700 Received: from BN-SCL-MBX03.Cudanet.local ([fe80::e5b6:9fef:a4d2:a5ba]) by BN-SCL-FE04.Cudanet.local ([fe80::7443:fe71:7539:9156%10]) with mapi; Thu, 29 Aug 2013 14:40:54 -0700 From: "T.C. Gubatayao" To: Alan Somers Date: Thu, 29 Aug 2013 14:40:53 -0700 Subject: Re: Flow ID, LACP, and igb Thread-Topic: Flow ID, LACP, and igb X-ASG-Orig-Subj: Re: Flow ID, LACP, and igb Thread-Index: Ac6lAHAMARmm9ywPRcWaH417AVY3jw== Message-ID: References: <521BBD21.4070304@freebsd.org> <521EE8DA.3060107@freebsd.org> <0771FC4F-BCDD-4985-A33F-09951806AD99@barracuda.com> In-Reply-To: Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: yes X-MS-TNEF-Correlator: x-apple-encoding-hint: 513 x-universally-unique-identifier: 2f974b20-2ce7-4994-a8de-d34babbf2057 x-apple-mail-remote-attachments: YES x-apple-base-url: x-msg://4347/ x-apple-mail-signature: x-uniform-type-identifier: com.apple.mail-draft acceptlanguage: en-US Content-Type: multipart/mixed; boundary="_002_C209B12FA40447EC82253F5E4123E05Ebarracudacom_" MIME-Version: 1.0 X-Barracuda-Connect: bn-scl-fe02.cudanet.local[10.8.96.69] X-Barracuda-Start-Time: 1377812454 X-Barracuda-Encrypted: AES128-SHA X-Barracuda-URL: http://10.8.98.66:8000/cgi-mod/mark.cgi X-Barracuda-BRTS-Status: 1 X-Virus-Scanned: by bsmtpd at barracuda.com X-Barracuda-Spam-Score: 0.62 X-Barracuda-Spam-Status: No, SCORE=0.62 using per-user scores of TAG_LEVEL=1000.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=9.0 tests=COMMA_SUBJECT, THREAD_INDEX, THREAD_TOPIC X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.2.139774 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- 0.01 THREAD_INDEX thread-index: AcO7Y8iR61tzADqsRmmc5wNiFHEOig== 0.01 THREAD_TOPIC Thread-Topic: ...(Japanese 
Subject)... 0.60 COMMA_SUBJECT Subject is like 'Re: FDSDS, this is a subject' X-Content-Filtered-By: Mailman/MimeDel 2.1.14 Cc: Jack F Vogel , "Justin T. Gibbs" , Andre Oppermann , "net@freebsd.org" X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 29 Aug 2013 21:40:57 -0000

--_002_C209B12FA40447EC82253F5E4123E05Ebarracudacom_ Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable

On Aug 29, 2013, at 4:21 PM, Alan Somers wrote:

> They're faster, but even with this change, jenkins_hash is still 6 times
> slower than FNV hash.

Actually, I think your test isn't accurately simulating memory access, which
might be skewing the results.

For example, from net/if_lagg.c:

    p = hash32_buf(&eh->ether_shost, ETHER_ADDR_LEN, p);
    p = hash32_buf(&eh->ether_dhost, ETHER_ADDR_LEN, p);

These two calls can't both be aligned, since ETHER_ADDR_LEN is 6 octets. The
same is true for the other hashed fields in the IP and TCP/UDP headers.
Assuming the mbuf data pointer is aligned, the IP addresses and ports are both
on 2-byte alignments (without VLAN or IP options). In your test, they're all
aligned and in the same cache line.

When I modify the test to simulate an mbuf, lookup3 beats FNV and hash32, and
SipHash is only 2-3 times slower.

> Also, your technique of copying the hashable fields into a separate buffer
> would need modification to work with different types of packet and different
> LAGG_F_HASH[234] flags. Because different packets have different hashable
> fields, struct key would need to be expanded to include the vlan tag, IPV6
> addresses, and IPv6 flowid. lagg_hashmbuf would then have to zero the unused
> fields.

Agreed, but this is relatively simple with a buffer on the stack, and does not
require zeroes or padding.
See my modified test, attached. T.C.= --_002_C209B12FA40447EC82253F5E4123E05Ebarracudacom_-- From owner-freebsd-net@FreeBSD.ORG Thu Aug 29 21:51:44 2013 Return-Path: Delivered-To: net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id 9D964230 for ; Thu, 29 Aug 2013 21:51:44 +0000 (UTC) (envelope-from btv1==9536076923e==tgubatayao@barracuda.com) Received: from bsf03.barracuda.com (bsf03.barracuda.com [64.235.145.83]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 1AC782CB6 for ; Thu, 29 Aug 2013 21:51:44 +0000 (UTC) X-ASG-Debug-ID: 1377813102-05b9635b395de7c0001-oFaieN Received: from BN-SCL-FE02.Cudanet.local (bn-scl-fe02.cudanet.local [10.8.96.69]) by bsf03.barracuda.com with ESMTP id IGzyeeDVpxjoVdSZ (version=TLSv1 cipher=AES128-SHA bits=128 verify=NO); Thu, 29 Aug 2013 14:51:42 -0700 (PDT) X-Barracuda-Envelope-From: tgubatayao@barracuda.com Received: from BN-SCL-FE04.Cudanet.local (10.8.96.204) by BN-SCL-FE02.Cudanet.local (10.8.96.69) with Microsoft SMTP Server (TLS) id 8.3.298.1; Thu, 29 Aug 2013 14:51:42 -0700 Received: from BN-SCL-MBX03.Cudanet.local ([fe80::e5b6:9fef:a4d2:a5ba]) by BN-SCL-FE04.Cudanet.local ([fe80::7443:fe71:7539:9156%10]) with mapi; Thu, 29 Aug 2013 14:51:42 -0700 From: "T.C. 
Gubatayao" To: Alan Somers Date: Thu, 29 Aug 2013 14:51:41 -0700 Subject: Re: Flow ID, LACP, and igb Thread-Topic: Flow ID, LACP, and igb X-ASG-Orig-Subj: Re: Flow ID, LACP, and igb Thread-Index: Ac6lAfIUZwEezkr9Tj603bLm0m2f8g== Message-ID: <49170157-EFC7-44A3-B881-12B4F2644F59@barracuda.com> References: <521BBD21.4070304@freebsd.org> <521EE8DA.3060107@freebsd.org> <0771FC4F-BCDD-4985-A33F-09951806AD99@barracuda.com> In-Reply-To: Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-universally-unique-identifier: 2f974b20-2ce7-4994-a8de-d34babbf2057 x-apple-mail-remote-attachments: YES x-apple-base-url: x-msg://4455/ x-apple-windows-friendly: 1 x-apple-mail-signature: x-uniform-type-identifier: com.apple.mail-draft acceptlanguage: en-US Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-Barracuda-Connect: bn-scl-fe02.cudanet.local[10.8.96.69] X-Barracuda-Start-Time: 1377813102 X-Barracuda-Encrypted: AES128-SHA X-Barracuda-URL: http://10.8.98.66:8000/cgi-mod/mark.cgi X-Barracuda-BRTS-Status: 1 X-Virus-Scanned: by bsmtpd at barracuda.com X-Barracuda-Spam-Score: 0.62 X-Barracuda-Spam-Status: No, SCORE=0.62 using per-user scores of TAG_LEVEL=1000.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=9.0 tests=COMMA_SUBJECT, THREAD_INDEX, THREAD_TOPIC X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.2.139776 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- 0.01 THREAD_INDEX thread-index: AcO7Y8iR61tzADqsRmmc5wNiFHEOig== 0.01 THREAD_TOPIC Thread-Topic: ...(Japanese Subject)... 0.60 COMMA_SUBJECT Subject is like 'Re: FDSDS, this is a subject' Cc: Jack F Vogel , "Justin T. 
Gibbs" , Andre Oppermann , "net@freebsd.org" X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 29 Aug 2013 21:51:44 -0000

On Aug 29, 2013, at 5:40 PM, T.C. Gubatayao wrote:

> On Aug 29, 2013, at 4:21 PM, Alan Somers wrote:
>
>> They're faster, but even with this change, jenkins_hash is still 6 times
>> slower than FNV hash.
>
> Actually, I think your test isn't accurately simulating memory access, which
> might be skewing the results.
>
> For example, from net/if_lagg.c:
>
>     p = hash32_buf(&eh->ether_shost, ETHER_ADDR_LEN, p);
>     p = hash32_buf(&eh->ether_dhost, ETHER_ADDR_LEN, p);
>
> These two calls can't both be aligned, since ETHER_ADDR_LEN is 6 octets. The
> same is true for the other hashed fields in the IP and TCP/UDP headers.
> Assuming the mbuf data pointer is aligned, the IP addresses and ports are both
> on 2-byte alignments (without VLAN or IP options). In your test, they're all
> aligned and in the same cache line.
>
> When I modify the test to simulate an mbuf, lookup3 beats FNV and hash32, and
> SipHash is only 2-3 times slower.
>
>> Also, your technique of copying the hashable fields into a separate buffer
>> would need modification to work with different types of packet and different
>> LAGG_F_HASH[234] flags. Because different packets have different hashable
>> fields, struct key would need to be expanded to include the vlan tag, IPV6
>> addresses, and IPv6 flowid. lagg_hashmbuf would then have to zero the unused
>> fields.
>
> Agreed, but this is relatively simple with a buffer on the stack, and does not
> require zeroes or padding. See my modified test, attached.
>
> T.C.

Attachment was stripped.
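As a rough illustration of the contiguous-key idea discussed above (a sketch, not the stripped attachment): the selected header fields are copied into a small stack buffer, so the hash reads one contiguous, 4-byte-aligned run of bytes and no zero padding is needed for layers that were not selected. Everything here is a stand-in: struct sample_hdr, the HASH_L* flags, build_key() and the plain FNV-1 routine with the textbook offset basis are hypothetical and are not the kernel's lagg or fnv_hash code. For a byte-at-a-time hash such as FNV-1, hashing the key in one shot gives the same value as hashing field by field, so the layout change affects memory access rather than the result.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for LAGG_F_HASHL2/L3/L4. */
#define HASH_L2 0x1
#define HASH_L3 0x2
#define HASH_L4 0x4

#define KEY_MAX 24	/* 12 (MACs) + 8 (IPv4 addresses) + 4 (ports) */

/* Hypothetical, already-extracted header fields for one TCP/IPv4 packet. */
struct sample_hdr {
	uint8_t macs[12];	/* destination + source MAC */
	uint8_t ips[8];		/* source + destination IPv4 address */
	uint8_t ports[4];	/* source + destination port */
};

/* Textbook 32-bit FNV-1: multiply by the prime, then XOR in each byte. */
static uint32_t
fnv1_32(const void *buf, size_t len, uint32_t h)
{
	const uint8_t *p = buf;

	while (len-- > 0) {
		h *= 0x01000193u;
		h ^= *p++;
	}
	return (h);
}

/* Copy only the selected layers into key; returns the number of bytes used. */
size_t
build_key(const struct sample_hdr *hdr, uint32_t flags, uint8_t *key)
{
	size_t len = 0;

	if (flags & HASH_L2) {
		memcpy(key + len, hdr->macs, sizeof(hdr->macs));
		len += sizeof(hdr->macs);
	}
	if (flags & HASH_L3) {
		memcpy(key + len, hdr->ips, sizeof(hdr->ips));
		len += sizeof(hdr->ips);
	}
	if (flags & HASH_L4) {
		memcpy(key + len, hdr->ports, sizeof(hdr->ports));
		len += sizeof(hdr->ports);
	}
	assert(len <= KEY_MAX);
	return (len);
}

/* Field-by-field hashing: one call per selected layer. */
uint32_t
hash_fields(const struct sample_hdr *hdr, uint32_t flags)
{
	uint32_t h = 2166136261u;

	if (flags & HASH_L2)
		h = fnv1_32(hdr->macs, sizeof(hdr->macs), h);
	if (flags & HASH_L3)
		h = fnv1_32(hdr->ips, sizeof(hdr->ips), h);
	if (flags & HASH_L4)
		h = fnv1_32(hdr->ports, sizeof(hdr->ports), h);
	return (h);
}

/* Build a 4-byte-aligned contiguous key on the stack, then hash it once. */
uint32_t
hash_key(const struct sample_hdr *hdr, uint32_t flags)
{
	uint32_t storage[KEY_MAX / 4];
	size_t len = build_key(hdr, flags, (uint8_t *)storage);

	return (fnv1_32(storage, len, 2166136261u));
}
```

A word-at-a-time hash such as lookup3 would be the one to plug into hash_key() to see the alignment benefit; FNV-1 is used here only because it is short enough to include in full.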
--- a/lagg_hash.c 2013-08-29 14:21:17.255307349 -0400
+++ b/lagg_hash.c 2013-08-29 17:26:14.055404918 -0400
@@ -7,35 +7,63 @@
 #include
 #include
 #include
-
-uint32_t jenkins_hash32(const uint32_t *, size_t, uint32_t);
+#include
+#include
+#include
+#include
 
 #define ITERATIONS 100000000
 
-typedef uint32_t do_hash_t(void);
+typedef uint32_t do_hash_t(uint32_t);
+
+/*
+ * Simulate mbuf data for a packet.
+ * No VLAN tagging and no IP options.
+ */
+struct _mbuf {
+    struct ether_header eh;
+    struct ip ip;
+    struct tcphdr th;
+} __attribute__((packed)) m = {
+    {
+        .ether_dhost = { 181, 16, 73, 9, 219, 22 },
+        .ether_shost = { 69, 170, 210, 11, 24, 120 },
+        .ether_type = 0x008
+    },
+    {
+        .ip_src.s_addr = 1329258245,
+        .ip_dst.s_addr = 1319097119,
+        .ip_p = 0x06
+    },
+    {
+        .th_sport = 12506,
+        .th_dport = 47804
+    }
+};
 
-// Pad the MACs with 0s because jenkins_hash operates on 32-bit inputs
-const uint8_t ether_shost[] = {181, 16, 73, 9, 219, 22, 0, 0};
-const uint8_t ether_dhost[] = {69, 170, 210, 111, 24, 120, 0, 0};
-const struct in_addr ip_src = {.s_addr = 1329258245};
-const struct in_addr ip_dst = {.s_addr = 1319097119};
-const uint32_t ports = 3132895450;
 const uint8_t sipkey[16] = {7, 239, 255, 43, 68, 53, 56, 225,
     98, 81, 177, 80, 92, 235, 242, 39};
 
+#define LAGG_F_HASHL2 0x1
+#define LAGG_F_HASHL3 0x2
+#define LAGG_F_HASHL4 0x4
+#define LAGG_F_HASHALL (LAGG_F_HASHL2|LAGG_F_HASHL3|LAGG_F_HASHL4)
+
 /*
  * Simulate how lagg_hashmbuf uses FNV hash for a TCP/IP packet
  * No VLAN tagging
  */
-uint32_t do_fnv(void)
+uint32_t do_fnv(uint32_t flags)
 {
     uint32_t p = FNV1_32_INIT;
 
-    p = fnv_32_buf(ether_shost, 6, p);
-    p = fnv_32_buf(ether_dhost, 6, p);
-    p = fnv_32_buf(&ip_src, sizeof(struct in_addr), p);
-    p = fnv_32_buf(&ip_dst, sizeof(struct in_addr), p);
-    p = fnv_32_buf(&ports, sizeof(ports), p);
+    if (flags & LAGG_F_HASHL2)
+        p = fnv_32_buf(&m.eh.ether_dhost, 12, p);
+    if (flags & LAGG_F_HASHL3)
+        p = fnv_32_buf(&m.ip.ip_src, 8, p);
+    if (flags & LAGG_F_HASHL4)
+        p = fnv_32_buf(&m.th.th_sport, 4, p);
+
     return (p);
 }
 
@@ -43,59 +71,74 @@
  * Simulate how lagg_hashmbuf uses hash32 for a TCP/IP packet
  * No VLAN tagging
  */
-uint32_t do_hash32(void)
+uint32_t do_hash32(uint32_t flags)
 {
     // Actually, if_lagg used a pseudorandom number determined at interface
     // creation time. But this should have the same timing
     // characteristics.
     uint32_t p = HASHINIT;
 
-    p = hash32_buf(ether_shost, 6, p);
-    p = hash32_buf(ether_dhost, 6, p);
-    p = hash32_buf(&ip_src, sizeof(struct in_addr), p);
-    p = hash32_buf(&ip_dst, sizeof(struct in_addr), p);
-    p = hash32_buf(&ports, sizeof(ports), p);
+    if (flags & LAGG_F_HASHL2)
+        p = hash32_buf(&m.eh.ether_dhost, 12, p);
+    if (flags & LAGG_F_HASHL3)
+        p = hash32_buf(&m.ip.ip_src, 8, p);
+    if (flags & LAGG_F_HASHL4)
+        p = hash32_buf(&m.th.th_sport, 4, p);
+
     return (p);
 }
 
+/* Simulate copying the info out of the mbuf. */
+static __inline size_t init_key(char *key, uint32_t flags)
+{
+    uint16_t etype;
+    size_t len = 0;
+
+    if (flags & LAGG_F_HASHL2) {
+        memcpy(key + len, &m.eh.ether_dhost, 12);
+        len += 12;
+    }
+
+    if (flags & LAGG_F_HASHL3) {
+        memcpy(key + len, &m.ip.ip_src, 8);
+        len += 8;
+    }
+
+    if (flags & LAGG_F_HASHL4) {
+        memcpy(key + len, &m.th.th_sport, 4);
+        len += 4;
+    }
+
+    return (len);
+}
+
 /*
  * Simulate how lagg_hashmbuf would use siphash24 for a TCP/IP packet
  * No VLAN tagging
  */
-uint32_t do_siphash24(void)
+uint32_t do_siphash24(uint32_t flags)
 {
     SIPHASH_CTX ctx;
+    char key[26];
+    size_t len;
 
-    SipHash24_Init(&ctx);
-    SipHash_SetKey(&ctx, sipkey);
+    len = init_key(key, flags);
 
-    SipHash_Update(&ctx, ether_shost, 6);
-    SipHash_Update(&ctx, ether_dhost, 6);
-    SipHash_Update(&ctx, &ip_src, sizeof(struct in_addr));
-    SipHash_Update(&ctx, &ip_dst, sizeof(struct in_addr));
-    SipHash_Update(&ctx, &ports, sizeof(ports));
-    return (SipHash_End(&ctx) & 0xFFFFFFFF);
+    return (SipHash24(&ctx, sipkey, key, len) & 0xFFFFFFFF);
 }
 
 /*
  * Simulate how lagg_hashmbuf would use lookup3 aka jenkins_hash
  * No VLAN tagging
  */
-uint32_t do_jenkins(void)
+uint32_t do_jenkins(uint32_t flags)
 {
-    /* Jenkins hash does not recommend any specific initializer */
-    uint32_t p = FNV1_32_INIT;
+    char key[26];
+    size_t len;
 
-    /*
-     * jenkins_hash uses 32-bit inputs, so we need to present the MACs as
-     * arrays of 2 32-bit values
-     */
-    p = jenkins_hash32((uint32_t*)ether_shost, 2, p);
-    p = jenkins_hash32((uint32_t*)ether_dhost, 2, p);
-    p = jenkins_hash32((uint32_t*)&ip_src, sizeof(struct in_addr) / 4, p);
-    p = jenkins_hash32((uint32_t*)&ip_dst, sizeof(struct in_addr) / 4, p);
-    p = jenkins_hash32(&ports, sizeof(ports) / 4, p);
-    return (p);
+    len = init_key(key, flags);
+
+    return (jenkins_hash(key, len, FNV1_32_INIT));
 }
 
 
@@ -120,7 +163,7 @@
 
         gettimeofday(&tv_old, NULL);
         for (j=0; j
Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id 9FA283A7 for ; Thu, 29 Aug 2013 22:58:34 +0000 (UTC) (envelope-from nparhar@gmail.com) Received: from mail-pb0-x232.google.com (mail-pb0-x232.google.com [IPv6:2607:f8b0:400e:c01::232]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 79BB620BE for ; Thu, 29 Aug 2013 22:58:34 +0000 (UTC) Received: by mail-pb0-f50.google.com with SMTP id uo5so1054151pbc.23 for ; Thu, 29 Aug 2013 15:58:34 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=sender:message-id:date:from:user-agent:mime-version:to:subject :content-type:content-transfer-encoding; bh=9rkCDT4G7LeQD9U1jM9zisyKCN2LL5mLf213x7qMhdw=; b=bheLelrRJvq0ikMuptz3sauC8/Bvw/s8yxTGd9qPTfBpgy0oy6H514cth4yxRao5/Q
gD0kCNsGwGUmqyTnt71cVJA9r4W6/a+M8qX7XTuI+1hUPBf4/OuydO+t+iFpjpFTvruf j9TxNv6JmnZ0gw3oVZtFjOAP4C6ruItFYGE2F6rVZF2tTb4RBaQm1N7Y+YVAlumXIK6Z kFclPujmZ+/me53YtH59fowfHTdtgr0b2ZoMPUjLzc6ycEmWt+cPrvEusLcKC0aRSRZx +lfQJ1RjSKKcWlAnqsKgohpbq5Mvb5m2zgcZmgSDeYrymHndojEqHmjUBztvN22MBxi9 mwIg== X-Received: by 10.66.136.131 with SMTP id qa3mr6992943pab.77.1377817114195; Thu, 29 Aug 2013 15:58:34 -0700 (PDT) Received: from [10.192.166.0] (stargate.chelsio.com. [67.207.112.58]) by mx.google.com with ESMTPSA id sb9sm30759708pbb.0.1969.12.31.16.00.00 (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Thu, 29 Aug 2013 15:58:33 -0700 (PDT) Sender: Navdeep Parhar Message-ID: <521FD217.5080603@FreeBSD.org> Date: Thu, 29 Aug 2013 15:58:31 -0700 From: Navdeep Parhar User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:17.0) Gecko/20130819 Thunderbird/17.0.8 MIME-Version: 1.0 To: "freebsd-net@freebsd.org" Subject: Please review: lazy ext refcount initialization Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 29 Aug 2013 22:58:34 -0000 I'd like to merge r254342 from user/np/cxl_tuning to head if there are no objections. http://svnweb.freebsd.org/base/user/np/cxl_tuning/sys/kern/kern_mbuf.c?r1=254334&r2=254342&diff_format=u http://svnweb.freebsd.org/base/user/np/cxl_tuning/sys/sys/mbuf.h?r1=254334&r2=254342&diff_format=u --------------------- Perform lazy initialization of a cluster's refcount if possible. This doesn't change anything for the common cases where the constructor is given an mbuf to attach to the cluster, or when the cluster is obtained with m_cljget(NULL, ...) and attached later with m_cljset(). 
But it allows for an alternate usage scenario where the cluster is managed EXT_EXTREF style without ever having to look up its "usual" refcount via uma_find_refcnt.
---------------------

Regards,
Navdeep

diff -r 9753d3e51363 -r c9388a59fba6 sys/kern/kern_mbuf.c
--- a/sys/kern/kern_mbuf.c Thu Aug 29 11:16:04 2013 -0700
+++ b/sys/kern/kern_mbuf.c Thu Aug 29 11:16:04 2013 -0700
@@ -503,8 +503,6 @@ mb_dtor_pack(void *mem, int size, void *
 static int
 mb_ctor_clust(void *mem, int size, void *arg, int how)
 {
-	struct mbuf *m;
-	u_int *refcnt;
 	int type;
 	uma_zone_t zone;

@@ -535,10 +533,11 @@ mb_ctor_clust(void *mem, int size, void
 		break;
 	}

-	m = (struct mbuf *)arg;
-	refcnt = uma_find_refcnt(zone, mem);
-	*refcnt = 1;
-	if (m != NULL) {
+	if (arg != NULL) {
+		struct mbuf *m = arg;
+		u_int *refcnt = uma_find_refcnt(zone, mem);
+
+		*refcnt = 1;
 		m->m_ext.ext_buf = (caddr_t)mem;
 		m->m_data = m->m_ext.ext_buf;
 		m->m_flags |= M_EXT;
@@ -549,6 +548,10 @@ mb_ctor_clust(void *mem, int size, void
 		m->m_ext.ext_type = type;
 		m->m_ext.ext_flags = 0;
 		m->m_ext.ref_cnt = refcnt;
+	} else {
+#ifdef INVARIANTS
+		*uma_find_refcnt(zone, mem) = 0;
+#endif
 	}

 	return (0);
diff -r 9753d3e51363 -r c9388a59fba6 sys/sys/mbuf.h
--- a/sys/sys/mbuf.h Thu Aug 29 11:16:04 2013 -0700
+++ b/sys/sys/mbuf.h Thu Aug 29 11:16:04 2013 -0700
@@ -721,6 +721,7 @@ m_cljset(struct mbuf *m, void *cl, int t
 	m->m_ext.ext_type = type;
 	m->m_ext.ext_flags = 0;
 	m->m_ext.ref_cnt = uma_find_refcnt(zone, cl);
+	*m->m_ext.ref_cnt = 1;
 	m->m_flags |= M_EXT;
 }

From owner-freebsd-net@FreeBSD.ORG Fri Aug 30 03:11:51 2013 Return-Path: Delivered-To: freebsd-net@smarthost.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id A3CD9296; Fri, 30 Aug 2013 03:11:51 +0000 (UTC) (envelope-from linimon@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org
[IPv6:2001:1900:2254:206c::16:87]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 78C842E11; Fri, 30 Aug 2013 03:11:51 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.7/8.14.7) with ESMTP id r7U3BpgZ001618; Fri, 30 Aug 2013 03:11:51 GMT (envelope-from linimon@freefall.freebsd.org) Received: (from linimon@localhost) by freefall.freebsd.org (8.14.7/8.14.7/Submit) id r7U3BpC8001617; Fri, 30 Aug 2013 03:11:51 GMT (envelope-from linimon) Date: Fri, 30 Aug 2013 03:11:51 GMT Message-Id: <201308300311.r7U3BpC8001617@freefall.freebsd.org> To: linimon@FreeBSD.org, freebsd-bugs@FreeBSD.org, freebsd-net@FreeBSD.org From: linimon@FreeBSD.org Subject: Re: kern/181657: [bpf] [patch] BPF_COP/BPF_COPX instruction reservation (sync with NetBSD) X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 30 Aug 2013 03:11:51 -0000 Old Synopsis: BPF_COP/BPF_COPX instruction reservation (sync with NetBSD) New Synopsis: [bpf] [patch] BPF_COP/BPF_COPX instruction reservation (sync with NetBSD) Responsible-Changed-From-To: freebsd-bugs->freebsd-net Responsible-Changed-By: linimon Responsible-Changed-When: Fri Aug 30 03:11:33 UTC 2013 Responsible-Changed-Why: Over to maintainer(s). 
http://www.freebsd.org/cgi/query-pr.cgi?pr=181657 From owner-freebsd-net@FreeBSD.ORG Fri Aug 30 06:36:04 2013 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id 76FF469D for ; Fri, 30 Aug 2013 06:36:04 +0000 (UTC) (envelope-from darrenr@netbsd.org) Received: from out3-smtp.messagingengine.com (out3-smtp.messagingengine.com [66.111.4.27]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 320AB292A for ; Fri, 30 Aug 2013 06:36:03 +0000 (UTC) Received: from compute6.internal (compute6.nyi.mail.srv.osa [10.202.2.46]) by gateway1.nyi.mail.srv.osa (Postfix) with ESMTP id 5FA4A21112; Fri, 30 Aug 2013 02:35:57 -0400 (EDT) Received: from web4 ([10.202.2.214]) by compute6.internal (MEProxy); Fri, 30 Aug 2013 02:35:57 -0400 DKIM-Signature: v=1; a=rsa-sha1; c=relaxed/relaxed; d= messagingengine.com; h=message-id:from:to:cc:mime-version :content-transfer-encoding:content-type:subject:reply-to:date :in-reply-to:references; s=smtpout; bh=uQRrtrMIsTRHTH+iZO+aVGzrn Cg=; b=FxRCpiQRaNosfpw3JQHSY5MXRxQX1snl65oPfMF2AeQHhTH1LEx0LI9Gx k8qzeRHcvMRNcIFT+xs9dl+PWWZjziGKG5HKFNSYOF7hqU+9pk0pgpb88skv1wXB Y3C8vGqOSD4Jqj6JqYFlJhFBX8hkecthHm7tP441HcyPi3hx2M= Received: by web4.nyi.mail.srv.osa (Postfix, from userid 99) id 29BD410B747; Fri, 30 Aug 2013 02:35:57 -0400 (EDT) Message-Id: <1377844557.1741.15873957.4FB18641@webmail.messagingengine.com> X-Sasl-Enc: 0q2qO4KgjuowgnRYLhFjsXDtthNOA8HB6tR/zmDNiNpB 1377844557 From: Darren Reed To: Mindaugas Rasiukevicius MIME-Version: 1.0 Content-Transfer-Encoding: 7bit Content-Type: text/plain X-Mailer: MessagingEngine.com Webmail Interface - ajax-f98c0b0f Subject: Re: BPF_MISC+BPF_COP and BPF_COPX (summary and patch) Date: Fri, 30 Aug 2013 08:35:57 +0200 
In-Reply-To: <20130829152757.030C414A13C@mail.netbsd.org> References: <20130804191310.2FFBB14A152@mail.netbsd.org> <20130822101623.3837E14A21D@mail.netbsd.org> <521F4522.5070403@netbsd.org> <20130829152757.030C414A13C@mail.netbsd.org> Cc: tech-net@netbsd.org, guy@alum.mit.edu, freebsd-net@freebsd.org X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list Reply-To: darrenr@netbsd.org List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 30 Aug 2013 06:36:04 -0000 On Thu, Aug 29, 2013, at 05:27 PM, Mindaugas Rasiukevicius wrote: > Darren Reed wrote: > > Mindaugas Rasiukevicius wrote: > > > Hi, > > > > > > OK, to summarise what has been discussed: > > > > > > - Problem > > > > > > There is a need to perform more complex operations from the BPF program. > > > Currently, there is no (practical) way to do that from the byte-code. > > > Such functionality is useful for the packet filters or other components, > > > which could integrate with BPF. For example, while most of the packet > > > inspection logic can stay in the byte-code, such operations as looking > > > up an IP address in some container or walking the IPv6 headers and > > > returning some offsets have to be done externally. The first existing > > > user of such capability would be NPF in NetBSD. > > > > > > > I'd argue that the IPv6 problem is of such a generic nature that > > it deserves its own instruction/s. We may look at IPv6 today and > > think nobody uses it much but over time that is going to change. > > Thus there will be an outcome not possible with co-processor > > approach if an instruction is created for that purpose and is > > common across all platforms through libpcap. Unless the IPv6 > > problem is too complex for a single instruction (this has not > > been demonstrated.) In that case maybe BPF itself needs to evolve > > such that it can support more complex instructions. 
>
> This is a separate issue. Feel free to propose a new instruction to
> parse IPv6 headers.

To do this requires understanding what we want to do with the extension headers. Do we want to:
1) find a particular extension header
2) find the start of the last extension header
3) get a list of all extension headers (including L4 protocols)
4) as per (3) but also their offset
5) ... anything else?

For example, if you were to write a tcpdump expression to filter fragments for IPv6, what should the BPF look like? Alternatively, if you wanted to display all packets that had a routing header and a fragment header and were UDP, how complex should the resulting set of BPF instructions be?

> > The current implementation of BPF makes it very hard to expand
> > the instruction set without impinging on the ability to make
> > future changes due to the way in which instructions are codified
> > into 32bits. Whilst the method of supporting a co-processor gets
> > around that, it does so in such a generic fashion that it becomes
> > too easy to use it as a bit-bucket for anything you think might
> > be a good idea if BPF could do without really evaluating if it
> > should do.
>
> It is certainly possible that some operations, which will be
> implemented using BPF coprocessor, will be useful in general.
> Again, whether such operations should be "promoted" to be
> new BPF instructions or there should be a global "standardised"
> coprocessor or how BPF should evolve (including RISC vs
> CISC-like instruction set debate) is a separate discussion.

Oh, my comment above was much more open than that. Effectively what I'm proposing is moving beyond the single 32bit instruction word for BPF: maybe it is time for BPF to join the 64bit world.

The current design of BPF enforces a certain amount of rigour because it can't be expanded ad-hoc due to the design and the implementation of that design. It forces you to consider "should I do it" not just "can I do it."
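To make the extension-header questions in the numbered list above concrete, here is roughly what "find a particular header and its offset" looks like in plain C rather than BPF byte-code. This is only a sketch under stated assumptions: the name find_ipv6_header and its return convention are made up for illustration, it understands only the hop-by-hop, routing, destination-options, fragment and AH headers, and it gives up on anything else (including ESP and non-first fragments). It is not code from any BPF implementation.

```c
#include <stddef.h>
#include <stdint.h>

#define IP6_FIXED_HDR_LEN 40	/* size of the fixed IPv6 header */

/* Next-header values, from the IANA protocol numbers registry. */
enum {
	NH_HOPOPTS = 0, NH_ROUTING = 43, NH_FRAGMENT = 44,
	NH_AH = 51, NH_DSTOPTS = 60
};

/*
 * Hypothetical helper: walk the header chain of the IPv6 packet in
 * buf[0..len) looking for next-header value `wanted`.  Returns 0 and
 * stores the byte offset of that header in *off, or returns -1 if the
 * header is not found or the chain cannot be followed.
 */
static int
find_ipv6_header(const uint8_t *buf, size_t len, uint8_t wanted, size_t *off)
{
	size_t pos = IP6_FIXED_HDR_LEN;
	size_t hlen;
	uint8_t nh;

	if (len < IP6_FIXED_HDR_LEN || (buf[0] >> 4) != 6)
		return (-1);
	nh = buf[6];		/* Next Header field of the fixed header */

	for (;;) {
		if (pos >= len)
			return (-1);
		if (nh == wanted) {
			*off = pos;
			return (0);
		}
		if (len - pos < 8)	/* extension headers are >= 8 bytes */
			return (-1);
		switch (nh) {
		case NH_HOPOPTS:
		case NH_ROUTING:
		case NH_DSTOPTS:
			/* Length field counts 8-byte units beyond the first. */
			hlen = ((size_t)buf[pos + 1] + 1) * 8;
			break;
		case NH_FRAGMENT:
			/* Later fragments do not carry the rest of the chain. */
			if ((((buf[pos + 2] << 8) | buf[pos + 3]) & 0xfff8) != 0)
				return (-1);
			hlen = 8;
			break;
		case NH_AH:
			/* Length field counts 4-byte units, minus two. */
			hlen = ((size_t)buf[pos + 1] + 2) * 4;
			break;
		default:
			return (-1);	/* upper-layer or unknown header */
		}
		if (hlen > len - pos)
			return (-1);
		nh = buf[pos];		/* each header starts with Next Header */
		pos += hlen;
	}
}
```

Even this restricted walk needs a loop with a data-dependent step, which is the part that is awkward to express as a fixed sequence of classic BPF instructions.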
My comment above about the "should" part refers to the co-processor argument taking away a certain amount of engineering discipline that currently goes with BPF. Currently you can't just use it for "anything and everything." Cheers, Darren From owner-freebsd-net@FreeBSD.ORG Fri Aug 30 23:24:52 2013 Return-Path: Delivered-To: net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id 073825DD; Fri, 30 Aug 2013 23:24:52 +0000 (UTC) (envelope-from asomers@gmail.com) Received: from mail-wi0-x234.google.com (mail-wi0-x234.google.com [IPv6:2a00:1450:400c:c05::234]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id E4CF82696; Fri, 30 Aug 2013 23:24:50 +0000 (UTC) Received: by mail-wi0-f180.google.com with SMTP id l12so51060wiv.13 for ; Fri, 30 Aug 2013 16:24:49 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:date:message-id:subject :from:to:cc:content-type; bh=fLDSSp0OfJacufg1SrGhgxfw3yeSmqu3OycbUyvI1ZA=; b=KRjo/9Xxgnx65lbyXYa9mKigS4d3fvR57I0DxqCvHYXuDBwnLoHWg9nceXNgiHfQ4J JM/JU8rs91hA99RANqVMjKjZ3Kx9BbXs4wMjhD5dS720Joy2t5CkKbFikJTXASP2wHcF jlLgAG7pgJHrRNOiKbMXaX2WHPL3jDK19scc9c9kwx+6V2MuRsonjK6Ib1GN8hQ72Llk wIGjzz8g1TBOLfCfzTzFh4uA0HIB6eVf+XBjRjFngyzaOkfDbj62qnQq/SZqW79CwNjq Y6Xv18+B57KceYAY1QEeqlqxhU5m+Fb/98D7vRZCyQRyP0f+trrfZ8+UwEL6ezW891xk 0UxA== MIME-Version: 1.0 X-Received: by 10.180.219.75 with SMTP id pm11mr4283869wic.47.1377905089252; Fri, 30 Aug 2013 16:24:49 -0700 (PDT) Sender: asomers@gmail.com Received: by 10.194.171.35 with HTTP; Fri, 30 Aug 2013 16:24:49 -0700 (PDT) In-Reply-To: References: <521BBD21.4070304@freebsd.org> <521EE8DA.3060107@freebsd.org> <0771FC4F-BCDD-4985-A33F-09951806AD99@barracuda.com> Date: Fri, 30 Aug 2013 17:24:49 
-0600 X-Google-Sender-Auth: n5LLHwtvkvGGDB9K70IMsIVL_yo Message-ID: Subject: Re: Flow ID, LACP, and igb From: Alan Somers To: "T.C. Gubatayao" Content-Type: text/plain; charset=ISO-8859-1 Cc: Jack F Vogel , "Justin T. Gibbs" , Andre Oppermann , Alan Somers , "net@freebsd.org" X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 30 Aug 2013 23:24:52 -0000 On Thu, Aug 29, 2013 at 3:40 PM, T.C. Gubatayao wrote: > On Aug 29, 2013, at 4:21 PM, Alan Somers wrote: > >> They're faster, but even with this change, jenkins_hash is still 6 times >> slower than FNV hash. > > Actually, I think your test isn't accurately simulating memory access, which > might be skewing the results. > > For example, from net/if_lagg.c: > > p = hash32_buf(&eh->ether_shost, ETHER_ADDR_LEN, p); > p = hash32_buf(&eh->ether_dhost, ETHER_ADDR_LEN, p); > > These two calls can't both be aligned, since ETHER_ADDR_LEN is 6 octets. The > same is true for the other hashed fields in the IP and TCP/UDP headers. > Assuming the mbuf data pointer is aligned, the IP addresses and ports are both > on 2-byte alignments (without VLAN or IP options). In your test, they're all > aligned and in the same cache line. > > When I modify the test to simulate an mbuf, lookup3 beats FNV and hash32, and > SipHash is only 2-3 times slower. Indeed, in your latest version FNV and hash32 are significantly slower. It isn't due to alignment issues though; those hashes don't care about alignment because they access data 8 bits at a time. The problem was that Clang was too smart for me. In my version, Clang was computing FNV hash and hash32 entirely at compile time. All the functions did at runtime was return the correct answer. Your mbuf simulation defeats that optimization. 
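A minimal sketch of the effect described above, and of one common way around it. This is not the benchmark from this thread: fnv1_32 is a local reimplementation using the published FNV-1 32-bit constants (not necessarily the values in FreeBSD's fnv_hash.h), and sample_fields, runtime_seed and the bench_* names are made up for illustration. With an inline hash, constant input and a constant seed, an optimizing compiler is free to compute the result at build time; reading the seed through a volatile object on every iteration forces the hash to be evaluated at run time.

```c
#include <stddef.h>
#include <stdint.h>

#define FNV1_32_BASIS 2166136261u	/* published FNV-1 offset basis */
#define FNV1_32_PRIME 16777619u		/* published FNV 32-bit prime */

/* Local FNV-1 implementation: multiply, then xor in one byte. */
static inline uint32_t
fnv1_32(const void *buf, size_t len, uint32_t h)
{
	const uint8_t *p = buf;

	while (len-- > 0) {
		h *= FNV1_32_PRIME;
		h ^= *p++;
	}
	return (h);
}

/* Hypothetical stand-in for the header fields being hashed. */
static const uint8_t sample_fields[12] = {
	181, 16, 73, 9, 219, 22, 69, 170, 210, 11, 24, 120
};

/* The compiler must assume this can change between any two reads. */
static volatile uint32_t runtime_seed = FNV1_32_BASIS;

/*
 * Everything here is a compile-time constant, so the hash (and possibly
 * the whole loop) may be folded away by the optimizer.
 */
uint32_t
bench_foldable(unsigned long iterations)
{
	uint32_t acc = 0;
	unsigned long i;

	for (i = 0; i < iterations; i++)
		acc ^= fnv1_32(sample_fields, sizeof(sample_fields),
		    FNV1_32_BASIS);
	return (acc);
}

/*
 * Same work, but the seed is re-read from a volatile on every pass, so
 * each iteration has to run the hash for real.
 */
uint32_t
bench_opaque(unsigned long iterations)
{
	uint32_t acc = 0;
	unsigned long i;

	for (i = 0; i < iterations; i++)
		acc ^= fnv1_32(sample_fields, sizeof(sample_fields),
		    runtime_seed);
	return (acc);
}
```

Timing both functions over the same iteration count with gettimeofday() should show whether folding is happening: if bench_foldable is dramatically faster, the "fast" number is measuring the loop, not the hash.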
I think that your latest version is fairly accurate, and it shows that Jenkins is the fastest when compiled with Clang and when all three layers are hashed. However, FNV is faster when compiled with GCC, or when only one or two layers are hashed. In any case, the difference between FNV and Jenkins is about 4ns/packet, which is about as significant as whether to paint the roof cyan or aquamarine.

As far as I'm concerned, FNV still has two major advantages: it's available in stable/9, and it's a drop-in replacement for hash32. Using Jenkins would require refactoring lagg_hashmbuf to copy the hashable fields into a stack buffer. I'm loath to do that, because then I would have to test lagg_hashmbuf with IPv6 and VLAN packets. My network isn't currently set up for those. Using FNV is a simple enough change that I would feel comfortable committing it without testing VLANs and IPv6.

We have a three-day weekend in my country, but hopefully I'll be able to wrap up my testing on Tuesday.

>
>> Also, your technique of copying the hashable fields into a separate buffer
>> would need modification to work with different types of packet and different
>> LAGG_F_HASH[234] flags. Because different packets have different hashable
>> fields, struct key would need to be expanded to include the vlan tag, IPV6
>> addresses, and IPv6 flowid. lagg_hashmbuf would then have to zero the unused
>> fields.
>
> Agreed, but this is relatively simple with a buffer on the stack, and does not
> require zeroes or padding. See my modified test, attached.
>
> T.C.
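The "drop-in replacement" point can be illustrated with a short sketch. A byte-at-a-time hash such as FNV can be chained across separate fields by feeding each call's result in as the next call's starting value, and that yields the same value as hashing the fields after copying them into one contiguous buffer; a function that only accepts a single contiguous key needs the copy first. The code below uses a local FNV-1 implementation with the published 32-bit constants (not necessarily FreeBSD's fnv_32_buf or its initial value), and struct fake_fields and both function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FNV1_32_BASIS 2166136261u	/* published FNV-1 offset basis */
#define FNV1_32_PRIME 16777619u		/* published FNV 32-bit prime */

/* Local FNV-1 implementation: multiply, then xor in one byte. */
static uint32_t
fnv1_32(const void *buf, size_t len, uint32_t h)
{
	const uint8_t *p = buf;

	while (len-- > 0) {
		h *= FNV1_32_PRIME;
		h ^= *p++;
	}
	return (h);
}

/* Hypothetical container for fields that live in different headers. */
struct fake_fields {
	uint8_t macs[12];	/* destination + source MAC */
	uint8_t addrs[8];	/* IPv4 source + destination */
	uint8_t ports[4];	/* source + destination port */
};

/* hash32/FNV style: hash each field where it lies, chaining the state. */
uint32_t
hash_chained(const struct fake_fields *f)
{
	uint32_t p = FNV1_32_BASIS;

	p = fnv1_32(f->macs, sizeof(f->macs), p);
	p = fnv1_32(f->addrs, sizeof(f->addrs), p);
	p = fnv1_32(f->ports, sizeof(f->ports), p);
	return (p);
}

/* Contiguous-key style: gather the fields on the stack, hash once. */
uint32_t
hash_copied(const struct fake_fields *f)
{
	uint8_t key[sizeof(f->macs) + sizeof(f->addrs) + sizeof(f->ports)];
	size_t len = 0;

	memcpy(key + len, f->macs, sizeof(f->macs));
	len += sizeof(f->macs);
	memcpy(key + len, f->addrs, sizeof(f->addrs));
	len += sizeof(f->addrs);
	memcpy(key + len, f->ports, sizeof(f->ports));
	len += sizeof(f->ports);
	return (fnv1_32(key, len, FNV1_32_BASIS));
}
```

Both functions return the same value for the same fields, which is why swapping one chained hash for another leaves the structure of the caller unchanged, while a contiguous-key hash adds the gather step.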
From owner-freebsd-net@FreeBSD.ORG Sat Aug 31 00:05:03 2013 Return-Path: Delivered-To: net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id 28D44731; Sat, 31 Aug 2013 00:05:03 +0000 (UTC) (envelope-from rizzo.unipi@gmail.com) Received: from mail-la0-x22d.google.com (mail-la0-x22d.google.com [IPv6:2a00:1450:4010:c03::22d]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id BCA2528D8; Sat, 31 Aug 2013 00:05:01 +0000 (UTC) Received: by mail-la0-f45.google.com with SMTP id eh20so2035237lab.18 for ; Fri, 30 Aug 2013 17:04:59 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:date:message-id:subject :from:to:cc:content-type; bh=JFxNqLmO9Al4PmG9V9J0Y+zFkCpcJ9iXICzugC7dKMQ=; b=oqwp2eBdnK8bHRD7KdCI1dmpprRWH0Nh3fN5VxfjPY56xoa427DB2ynTtLQ5uiXPGw P2gZMIvByNPfiVQtKoJqSbm8YkiFUzTHmn0Lp+OquVgpCHvnZJ8eyck/rVPtCiUuQ9Yt szQLxaYb4hMru8hx+zE0uYtcv5yvkoaDSJWQSAsjA9GCfZMYjUZOvrvRmXt0UE3P6ST+ NJD5AAsbxThBpenKMprQM3/1KgKTnJRRjOmCrBoWfRNBtSQwTMeWw4Qn4kUU2NOJaKET j58uZZ8yE3wlUqmQcmEuxqfdl+3Q9h/25mJM08vRbEEeFUPMZ0wGQ4v2JHhg7M+ZRA5D A3Zw== MIME-Version: 1.0 X-Received: by 10.112.14.102 with SMTP id o6mr9757380lbc.28.1377907499050; Fri, 30 Aug 2013 17:04:59 -0700 (PDT) Sender: rizzo.unipi@gmail.com Received: by 10.114.200.165 with HTTP; Fri, 30 Aug 2013 17:04:58 -0700 (PDT) In-Reply-To: References: <521BBD21.4070304@freebsd.org> <521EE8DA.3060107@freebsd.org> Date: Sat, 31 Aug 2013 02:04:58 +0200 X-Google-Sender-Auth: nP5sj5JRm-M4YdVnQI6x6G9HdPc Message-ID: Subject: Re: Flow ID, LACP, and igb From: Luigi Rizzo To: Alan Somers Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.14 Cc: Jack F Vogel , "net@freebsd.org" , "Justin T. 
Gibbs" , Andre Oppermann , "T.C. Gubatayao" X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 31 Aug 2013 00:05:03 -0000

Alan,

On Thu, Aug 29, 2013 at 6:45 PM, Alan Somers wrote:
>
>
> ...
> I pulled all four hash functions out into userland and microbenchmarked
> them. The upshot is that hash32 and fnv_hash are the fastest, jenkins_hash
> is slower, and siphash24 is the slowest. Also, Clang resulted in much
> faster code than gcc.
>
>
i missed this part of your message, but if i read your code well, you are running 100M iterations and the numbers below are in seconds, so if you multiply the numbers by 10 you have the cost per hash in nanoseconds.

What CPU did you use for your tests ?

Also some of the numbers (FNV and hash32) are suspiciously low.

I believe that the compiler (both of them) have figured out that everything is constant in these functions, and fnv_32_buf() and hash32_buf() are inline, hence they can be optimized to just return a constant. This does not happen for siphash and jenkins because they are defined externally.

Can you please re-run the tests in a way that defeats the optimization ? (e.g. pass a non-constant argument to the hashes so you actually need to run the code).

cheers
luigi

http://people.freebsd.org/~asomers/lagg_hash/
>
> [root@sm4u-4 /usr/home/alans/ctest/lagg_hash]# ./lagg_hash-gcc-4.8
> FNV: 0.76
> hash32: 1.18
> SipHash24: 44.39
> Jenkins: 6.20
> [root@sm4u-4 /usr/home/alans/ctest/lagg_hash]# ./lagg_hash-gcc-4.2.1
> FNV: 0.74
> hash32: 1.35
> SipHash24: 55.25
> Jenkins: 7.37
> [root@sm4u-4 /usr/home/alans/ctest/lagg_hash]# ./lagg_hash.clang-3.3
> FNV: 0.30
> hash32: 0.30
> SipHash24: 55.97
> Jenkins: 6.45
> [root@sm4u-4 /usr/home/alans/ctest/lagg_hash]# ./lagg_hash.clang-3.2
> FNV: 0.30
> hash32: 0.30
> SipHash24: 44.52
> Jenkins: 6.48
>
>
>
> > T.C.
> > > > [1] > > > http://svnweb.freebsd.org/base/head/sys/libkern/jenkins_hash.c?view=markup > _______________________________________________ > freebsd-net@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-net > To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org" > From owner-freebsd-net@FreeBSD.ORG Sat Aug 31 07:18:00 2013 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id 8F0ADB42 for ; Sat, 31 Aug 2013 07:18:00 +0000 (UTC) (envelope-from darrenr@netbsd.org) Received: from out5-smtp.messagingengine.com (out5-smtp.messagingengine.com [66.111.4.29]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 61C1E2FB1 for ; Sat, 31 Aug 2013 07:18:00 +0000 (UTC) Received: from compute6.internal (compute6.nyi.mail.srv.osa [10.202.2.46]) by gateway1.nyi.mail.srv.osa (Postfix) with ESMTP id 4CE33201A7; Sat, 31 Aug 2013 03:17:58 -0400 (EDT) Received: from web1 ([10.202.2.211]) by compute6.internal (MEProxy); Sat, 31 Aug 2013 03:17:58 -0400 DKIM-Signature: v=1; a=rsa-sha1; c=relaxed/relaxed; d= messagingengine.com; h=message-id:from:to:cc:mime-version :content-transfer-encoding:content-type:reply-to:in-reply-to :references:subject:date; s=smtpout; bh=Qd+sqhvzwVQDMRzPjX3/SdJj jFo=; b=BBmowzY5R73Jol9pN5RoxzUNO1N5YK7OnGzTMoOmVW7NDk+ACYqB/stl 60oW3Br5OTfwdIWx2E8fxnSILiJz7Ts1WUvt9difv5Z6cF5xD1ag7S1qeIKFlv5J T1nUsIqf4taYk0+ugEMQ716POjIXRM+R5zRhkFtgLz7yZjNylzw= Received: by web1.nyi.mail.srv.osa (Postfix, from userid 99) id 1AE2BF00016; Sat, 31 Aug 2013 03:17:58 -0400 (EDT) Message-Id: <1377933478.28627.16270117.27E3258F@webmail.messagingengine.com> X-Sasl-Enc: QCtYSly6OBYMcfelhT96gv4IBsdCzw3gr5fHWI7vq4+s 1377933478 From: Darren Reed To: 
Mindaugas Rasiukevicius MIME-Version: 1.0 Content-Transfer-Encoding: 7bit Content-Type: text/plain X-Mailer: MessagingEngine.com Webmail Interface - ajax-f98c0b0f In-Reply-To: <1377844557.1741.15873957.4FB18641@webmail.messagingengine.com> References: <20130804191310.2FFBB14A152@mail.netbsd.org> <20130822101623.3837E14A21D@mail.netbsd.org> <521F4522.5070403@netbsd.org> <20130829152757.030C414A13C@mail.netbsd.org> <1377844557.1741.15873957.4FB18641@webmail.messagingengine.com> Subject: Re: BPF_MISC+BPF_COP and BPF_COPX (summary and patch) Date: Sat, 31 Aug 2013 09:17:58 +0200 Cc: tech-net@netbsd.org, guy@alum.mit.edu, freebsd-net@freebsd.org X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list Reply-To: darrenr@netbsd.org List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 31 Aug 2013 07:18:00 -0000

Look, if you're going to appeal to core to approve changes for committing then don't bother posting them to tech-net or any other list because it is quite clear that you are not interested in feedback, only people to rubber-stamp your ideas.

All of which is to say that I'm sorry I bothered replying to any email in this thread because obviously I've wasted my time, as have others who have replied to emails in this thread.

There is no point involving people in the community if you (and those funding your work) take this approach in development.

In future please don't bother emailing this list or any other with your ideas as it is obvious that you (and your backers) are simply not interested in what others have to say.
Kind Regards, Darren From owner-freebsd-net@FreeBSD.ORG Sat Aug 31 12:42:01 2013 Return-Path: Delivered-To: net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id 8CD4D135 for ; Sat, 31 Aug 2013 12:42:01 +0000 (UTC) (envelope-from barney_cordoba@yahoo.com) Received: from nm25-vm5.bullet.mail.ne1.yahoo.com (nm25-vm5.bullet.mail.ne1.yahoo.com [98.138.91.247]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 2D9DE2F49 for ; Sat, 31 Aug 2013 12:42:00 +0000 (UTC) Received: from [98.138.90.54] by nm25.bullet.mail.ne1.yahoo.com with NNFMP; 31 Aug 2013 12:41:54 -0000 Received: from [98.138.101.163] by tm7.bullet.mail.ne1.yahoo.com with NNFMP; 31 Aug 2013 12:41:54 -0000 Received: from [127.0.0.1] by omp1074.mail.ne1.yahoo.com with NNFMP; 31 Aug 2013 12:41:54 -0000 X-Yahoo-Newman-Property: ymail-3 X-Yahoo-Newman-Id: 318374.35467.bm@omp1074.mail.ne1.yahoo.com Received: (qmail 59789 invoked by uid 60001); 31 Aug 2013 12:41:54 -0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=yahoo.com; s=s1024; t=1377952914; bh=1EHxvoH8V8rYNzrqDND6z3u4ABlyi9RltR2y16rhNjM=; h=X-YMail-OSG:Received:X-Rocket-MIMEInfo:X-Mailer:References:Message-ID:Date:From:Reply-To:Subject:To:Cc:In-Reply-To:MIME-Version:Content-Type; b=IuPYlnq9for77UvA044H65tVACn8J8rrGkaggtYRk9oBEYPRNGaocWhiNl+55t4O0k+T+meCEoqsMPrStPnMFE/6kfXnavzYSEQ4g4ktl9p2vnI17KwKwdsJcS7SKGIOwiWXgRklGDllgETEE572LKAjDTG7B50xoyTWdDCuN30= DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=s1024; d=yahoo.com; h=X-YMail-OSG:Received:X-Rocket-MIMEInfo:X-Mailer:References:Message-ID:Date:From:Reply-To:Subject:To:Cc:In-Reply-To:MIME-Version:Content-Type; 
b=pZgO0B3Gp6x2NoRhrJNcEX//Z6ONaceILTU++jbpF7/md/0aDMGiF6IwdrlMDX02kOdTMTUv2Zg7VtKITj+waD0VJzr0qdGWa6SM5YNYmG1it6BiOUBjaHyV3DdWgM77+sl5mDM4MBaeNJyH+IUpv8mSkYSFHmgBqCYsgnTaOvE=; X-YMail-OSG: QfW5D34VM1l93hf7uMM_4bP0UYKih.R14jWrEjISQwicC47 cL3J4dHwNrXXlPpFwOofeUQAMx_oEjX2uuPpbDw92i6y.hslxlTZrJrUZ553 Z5GSp8PKTg2QfbttpiUokXmJFsMobRb2fZIVaeSiRNv4jpXsaYwpl_t6GFNu oVFGoFLBMjClwlqerC8Kds4njsyCUml3mSrsywxSM1D3ZjTR.qZEtVLSRepU O3UDJb0hRI5HYGpoP8K0Ei.iO_295YeE7YLmkXmeq5FUsmHU3rECLwL8lEuU KtLhRZCoaQogYuu10ub8QSvzKarUgieId4r80GJtFBWBwWikjdYuD.dtFKZY H5R8NYh9W_uPDYMU7up6PBip8_B9ZmjopyTvqeUSiIrg.qRLes9fTCG1me40 NHRqYs.2x34Od3wYiIO36dO32geXiN0QaHCbCHQzErb3j2lUAdX4naFJcR9Y D3bH1nhNnvggYk52rKciAqKFSu.5xANkElzE9NUEDs60PsUdediAa_IAfrPa vgjFFiGD8n7RJO3NbY0uDDRwj3zFt3lpSI84R1Sp_S8ZRZThZ_WVJ.zjzKDi cQAPvq6.lggDfg7hDuwEe4Q-- Received: from [98.203.118.124] by web121605.mail.ne1.yahoo.com via HTTP; Sat, 31 Aug 2013 05:41:53 PDT X-Rocket-MIMEInfo: 002.001, TWF5IEkgZXhwcmVzcyBteSBnbGVlIGFuZCBhc3RvbmlzaG1lbnQgdGhhdCDCoHlvdSdyZSBkZWJhdGluZyB0aGUgdXNlIG9mIGNvbXBsaWNhdGVkIGhhc2ggZnVuY3Rpb25zCmZvciBzb21ldGhpbmcgdGhhdCdzIGxpa2VseSB0byBoYXZlIGZyb20gMi04IHNsb3RzPwoKQWxzbywgdGhlICptb3N0KiBpbXBvcnRhbnQgdGhpbmcgaXMgZGlzdHJpYnV0aW9uIHdpdGggcmVhbGlzdGljIGRhdGEuIFRoZSBnb2FsIHNob3VsZCBiZSB0byB1c2UgdGhlCm1vc3QgdHJpdmlhbCBmdW5jdGlvbiB0aGF0IGdpdmVzIHRoZSABMAEBAQE- X-Mailer: YahooMailWebService/0.8.156.576 References: <521BBD21.4070304@freebsd.org> <521EE8DA.3060107@freebsd.org> Message-ID: <1377952913.44129.YahooMailNeo@web121605.mail.ne1.yahoo.com> Date: Sat, 31 Aug 2013 05:41:53 -0700 (PDT) From: Barney Cordoba Subject: Re: Flow ID, LACP, and igb To: Luigi Rizzo , Alan Somers In-Reply-To: MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.14 Cc: Jack F Vogel , "T.C. Gubatayao" , "Justin T. 
Gibbs" , Andre Oppermann , "net@freebsd.org" X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list Reply-To: Barney Cordoba List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 31 Aug 2013 12:42:01 -0000 May I express my glee and astonishment that =A0you're debating the use of c= omplicated hash functions=0Afor something that's likely to have from 2-8 sl= ots?=0A=0AAlso, the *most* important thing is distribution with realistic d= ata. The goal should be to use the=0Amost trivial function that gives the m= ost balanced distribution with real numbers. Faster is=0Anot better if the = result is an unbalanced distribution.=0A=0AMany of your ports will be 80 an= d 53, and if you're going through a router your ethernets=0Amay not be very= unique, so why even bother to include them? Does getting a good distributi= on=0Arequire that you hash every element individually, or can you get the s= ame distribution with=0Aa faster, simpler way of creating the seed?=0A=0ATh= ere's also the other consideration of packet size. Packets on port 53 are l= ikely to be smaller=0Athan packets on port 80. What you want is equal distr= ibution PER PORT on the ports that will=0Acarry that vast majority of your = traffic.=0A=0AWhen designing efficient systems, you must not assume that po= rts and IPs are random, because they're=0Anot. 99% of your load will be on = a small number of destination ports and a limited range of source ports.=0A= =0AFor a web server application, geting a perfect distribution on the http = ports is most crucial.=0A=0AThe hash function in if_lagg.c looks like more = of a classroom exercise than a practical implementation.=A0=0AIf you're goi= ng to consider 100M iterations; consider that much of the time is wasted pa= rsing the=0Apacket (again). 
Why not add a simple sysctl that enables a hash that is created in the ip parser,
when all of the pieces are available without having to re-parse the mbuf?

Or better yet, use the same number of queues on igb as you have LAGG ports, and use the queue id (or RSS)
as the hash, so that your traffic is sync'd between the ethernet adapter queues and the LAGG ports. The card
has already done the work for you.

BC


________________________________
 From: Luigi Rizzo
To: Alan Somers
Cc: Jack F Vogel ; "net@freebsd.org" ; Justin T. Gibbs ; Andre Oppermann ; T.C. Gubatayao
Sent: Friday, August 30, 2013 8:04 PM
Subject: Re: Flow ID, LACP, and igb


Alan,


On Thu, Aug 29, 2013 at 6:45 PM, Alan Somers wrote:
>
>
> ...
> I pulled all four hash functions out into userland and microbenchmarked
> them.  The upshot is that hash32 and fnv_hash are the fastest, jenkins_hash
> is slower, and siphash24 is the slowest.  Also, Clang resulted in much
> faster code than gcc.
>
>
i missed this part of your message, but if i read your code well,
you are running 100M iterations and the numbers below are in seconds,
so if you multiply the numbers by 10 you have the cost per hash in
nanoseconds.

What CPU did you use for your tests ?

Also some of the numbers (FNV and hash32) are suspiciously low.

I believe that the compiler (both of them) have figured out that everything
is constant in these functions, and fnv_32_buf() and hash32_buf() are
inline,
hence they can be optimized to just return a constant.
This does not happen for siphash and jenkins because they are defined
externally.

Can you please re-run the tests in a way that defeats the optimization ?
(e.g.
pass a non constant argument to the hashes so you actually need
to run the code).

cheers
luigi


http://people.freebsd.org/~asomers/lagg_hash/
>
> [root@sm4u-4 /usr/home/alans/ctest/lagg_hash]# ./lagg_hash-gcc-4.8
> FNV: 0.76
> hash32: 1.18
> SipHash24: 44.39
> Jenkins: 6.20
> [root@sm4u-4 /usr/home/alans/ctest/lagg_hash]# ./lagg_hash-gcc-4.2.1
> FNV: 0.74
> hash32: 1.35
> SipHash24: 55.25
> Jenkins: 7.37
> [root@sm4u-4 /usr/home/alans/ctest/lagg_hash]# ./lagg_hash.clang-3.3
> FNV: 0.30
> hash32: 0.30
> SipHash24: 55.97
> Jenkins: 6.45
> [root@sm4u-4 /usr/home/alans/ctest/lagg_hash]# ./lagg_hash.clang-3.2
> FNV: 0.30
> hash32: 0.30
> SipHash24: 44.52
> Jenkins: 6.48
>
>
>
> > T.C.
> >
> > [1]
> >
> http://svnweb.freebsd.org/base/head/sys/libkern/jenkins_hash.c?view=markup
> _______________________________________________
> freebsd-net@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-net
> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"
>
_______________________________________________
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"

From owner-freebsd-net@FreeBSD.ORG Sat Aug 31 14:50:37 2013 Return-Path: Delivered-To: net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id 6E9D6602 for ; Sat, 31 Aug 2013 14:50:37 +0000 (UTC) (envelope-from barney_cordoba@yahoo.com) Received: from nm13-vm5.bullet.mail.ne1.yahoo.com (nm13-vm5.bullet.mail.ne1.yahoo.com [98.138.91.235]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id
2F39F25BB for ; Sat, 31 Aug 2013 14:50:36 +0000 (UTC) Received: from [98.138.226.179] by nm13.bullet.mail.ne1.yahoo.com with NNFMP; 31 Aug 2013 14:47:48 -0000 Received: from [98.138.88.239] by tm14.bullet.mail.ne1.yahoo.com with NNFMP; 31 Aug 2013 14:47:48 -0000 Received: from [127.0.0.1] by omp1039.mail.ne1.yahoo.com with NNFMP; 31 Aug 2013 14:47:48 -0000 X-Yahoo-Newman-Property: ymail-3 X-Yahoo-Newman-Id: 11055.63437.bm@omp1039.mail.ne1.yahoo.com Received: (qmail 1521 invoked by uid 60001); 31 Aug 2013 14:47:47 -0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=yahoo.com; s=s1024; t=1377960467; bh=91z46d+R+3onCaet/dXTIaEVGu76b3VAiwEAhziMStI=; h=X-YMail-OSG:Received:X-Rocket-MIMEInfo:X-Mailer:References:Message-ID:Date:From:Reply-To:Subject:To:Cc:In-Reply-To:MIME-Version:Content-Type; b=0vc/X0MuzCRm0Pe3apvKdA2M40v+jeAPakoMKmmcezHv4FpA418BfWX8f+0wN38qOI4EqLPw6/oV8V4zpwR/xgR3okQxQmu0/px1FIKnh1I/2LFr0QOf9xMW7oVgY798D0dIeej4QDaNblx3fUGULgR08Scm7NbHlplxEz7/aTw= DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=s1024; d=yahoo.com; h=X-YMail-OSG:Received:X-Rocket-MIMEInfo:X-Mailer:References:Message-ID:Date:From:Reply-To:Subject:To:Cc:In-Reply-To:MIME-Version:Content-Type; b=eO+dtiWX9Ukfsx7CmNIEay2MFKeHjU2zVSxXsYMoM+dRwhtXLC3+tNi2x4AsW1gNjhx2wOnj4B2clfyjNVCI4mQcTdra1bGkWlQNeDvxahig6Ewwyq7/exkb4McIoRBnaRsPr13xRiMsB4sA4walhmCv5fX5gBLI94AZgsxMarQ=; X-YMail-OSG: upHBLR4VM1kga1rhHiwyEXJI5MEDL17tFUKdbT7XFWh6rKD dTA_nZG9a2OdVZW7dGnNzW64q0AiCITb7GjphParmIq7qWUnw8irC60HvmYV Pja3Q1CbuYY1C_hNWkYeO.D1dHsFgxyWyYG7Znx5WHP7Sbqk9TsPnulk3CzX 8CznhUBzvNYg7iq4Dr.gLK3DX9NOBv8mmBS48EMApkNdpV6aJx91DwG1zn_N WMOIN7gnheymnx357Cv1suudwm8In0UzSJM07f9URCEGXwTrE4YTi.5cY4m6 .QVarhLcpZ556bfIQpHW8jq_HKi6ybnKdjP3_14MyeQpquQI.oVasgGsfleK toRbHI9rHfwNwq9IcZoWpYUv2A3KVU07gWLQSDZgxQA5zF95LcyZGGIvNTTi bCuXu9lHalBSwptou_BZl5jtTtG_VErF.ESYkmIi5gkEZqCuEejxLbQNZKd6 R77O5LtTtitak5EkfNc7rBaKPJl2Lqx0XQUCrT8LD1RV5JX5JrSaCNbJ7x_h S0x8h9.Jt6iWCaiV5VX86AXe4dr.U6Hx1GCg0bQaNVBko29oNALkTyyPU_E0 - Received: 
from [98.203.118.124] by web121603.mail.ne1.yahoo.com via HTTP; Sat, 31 Aug 2013 07:47:47 PDT X-Rocket-MIMEInfo: 002.001, QW5kIGFub3RoZXIgdGhpbmc7IHRoZSB1c2Ugb2YgbW9kdWxvIGlzIHZlcnkgZXhwZW5zaXZlIHdoZW4gdGhlIG51bWJlciBvZiBwb3J0cwp1c2VkIGluIExBR0cgaXMgKnVzdWFsbHkqIGEgcG93ZXIgb2YgMi4gZm9vJihTTE9UUy0xKSBpcyBhIGxvdCBmYXN0ZXIgdGhhbiAoZm9vJVNMT1RTKS7CoAoKaWYgKFNMT1RTID09IDIgfHwgU0xPVFMgPT0gNCB8fCBTTE9UUyA9PSA4KQrCoCDCoCBoYXNoID0gaGFzaCYoU0xPVFMtMSk7CmVsc2UKwqAgwqAgaGFzaCA9IGhhc2ggJSBTTE9UUzsKCmlzIG1vcmUgdGhhbiABMAEBAQE- X-Mailer: YahooMailWebService/0.8.156.576 References: <521BBD21.4070304@freebsd.org> <521EE8DA.3060107@freebsd.org> Message-ID: <1377960467.93173.YahooMailNeo@web121603.mail.ne1.yahoo.com> Date: Sat, 31 Aug 2013 07:47:47 -0700 (PDT) From: Barney Cordoba Subject: Re: Flow ID, LACP, and igb To: Luigi Rizzo , Alan Somers In-Reply-To: MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.14 Cc: Jack F Vogel , "T.C. Gubatayao" , "Justin T. Gibbs" , Andre Oppermann , "net@freebsd.org" X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list Reply-To: Barney Cordoba List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 31 Aug 2013 14:50:37 -0000

And another thing; the use of modulo is very expensive when the number of ports
used in LAGG is *usually* a power of 2. foo&(SLOTS-1) is a lot faster than (foo%SLOTS).

if (SLOTS == 2 || SLOTS == 4 || SLOTS == 8)
    hash = hash&(SLOTS-1);
else
    hash = hash % SLOTS;

is more than twice as fast as

hash % SLOTS;

BC


________________________________
 From: Luigi Rizzo
To: Alan Somers
Cc: Jack F Vogel ; "net@freebsd.org" ; Justin T. Gibbs ; Andre Oppermann ; T.C.
Gubatayao
Sent: Friday, August 30, 2013 8:04 PM
Subject: Re: Flow ID, LACP, and igb


Alan,


On Thu, Aug 29, 2013 at 6:45 PM, Alan Somers wrote:
>
>
> ...
> I pulled all four hash functions out into userland and microbenchmarked
> them.  The upshot is that hash32 and fnv_hash are the fastest, jenkins_hash
> is slower, and siphash24 is the slowest.  Also, Clang resulted in much
> faster code than gcc.
>
>
i missed this part of your message, but if i read your code well,
you are running 100M iterations and the numbers below are in seconds,
so if you multiply the numbers by 10 you have the cost per hash in
nanoseconds.

What CPU did you use for your tests ?

Also some of the numbers (FNV and hash32) are suspiciously low.

I believe that the compiler (both of them) have figured out that everything
is constant in these functions, and fnv_32_buf() and hash32_buf() are
inline,
hence they can be optimized to just return a constant.
This does not happen for siphash and jenkins because they are defined
externally.

Can you please re-run the tests in a way that defeats the optimization ?
(e.g.
pass a non constant argument to the hashes so you actually need
to run the code).

cheers
luigi


http://people.freebsd.org/~asomers/lagg_hash/
>
> [root@sm4u-4 /usr/home/alans/ctest/lagg_hash]# ./lagg_hash-gcc-4.8
> FNV: 0.76
> hash32: 1.18
> SipHash24: 44.39
> Jenkins: 6.20
> [root@sm4u-4 /usr/home/alans/ctest/lagg_hash]# ./lagg_hash-gcc-4.2.1
> FNV: 0.74
> hash32: 1.35
> SipHash24: 55.25
> Jenkins: 7.37
> [root@sm4u-4 /usr/home/alans/ctest/lagg_hash]# ./lagg_hash.clang-3.3
> FNV: 0.30
> hash32: 0.30
> SipHash24: 55.97
> Jenkins: 6.45
> [root@sm4u-4 /usr/home/alans/ctest/lagg_hash]# ./lagg_hash.clang-3.2
> FNV: 0.30
> hash32: 0.30
> SipHash24: 44.52
> Jenkins: 6.48
>
>
>
> > T.C.
> >
> > [1]
> >
> http://svnweb.freebsd.org/base/head/sys/libkern/jenkins_hash.c?view=markup
> _______________________________________________
> freebsd-net@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-net
> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"
>
_______________________________________________
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"

From owner-freebsd-net@FreeBSD.ORG Sat Aug 31 15:05:23 2013 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id 701BDB81 for ; Sat, 31 Aug 2013 15:05:23 +0000 (UTC) (envelope-from barney_cordoba@yahoo.com) Received: from nm21-vm4.bullet.mail.ne1.yahoo.com (nm21-vm4.bullet.mail.ne1.yahoo.com [98.138.91.181]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 1EA6D2656 for
; Sat, 31 Aug 2013 15:05:22 +0000 (UTC) Received: from [98.138.101.128] by nm21.bullet.mail.ne1.yahoo.com with NNFMP; 31 Aug 2013 15:03:08 -0000 Received: from [98.138.89.248] by tm16.bullet.mail.ne1.yahoo.com with NNFMP; 31 Aug 2013 15:03:08 -0000 Received: from [127.0.0.1] by omp1040.mail.ne1.yahoo.com with NNFMP; 31 Aug 2013 15:03:08 -0000 X-Yahoo-Newman-Property: ymail-3 X-Yahoo-Newman-Id: 307911.32973.bm@omp1040.mail.ne1.yahoo.com Received: (qmail 38024 invoked by uid 60001); 31 Aug 2013 15:03:08 -0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=yahoo.com; s=s1024; t=1377961388; bh=Y7fnNQoRaB9Mi4aMrvBp20u0Jchwx0rrwWWy0Jz3lMw=; h=X-YMail-OSG:Received:X-Rocket-MIMEInfo:X-Mailer:References:Message-ID:Date:From:Reply-To:Subject:To:In-Reply-To:MIME-Version:Content-Type; b=mYfGjAfYimzkoy0S2YCm47jExxgCxB9/sGt60fEAmtyGc88ohT79tDls79W6OI59UNLaSGxQDL2h89St8qo4G4b//1wlq9zAvjlPQjyvTkdhS45pNJmGLMLgyNBRdfXaq4b4MHK9bNZ9gwjavxPmO+jM6DQB8+cbgEdSuV2En9Y= DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=s1024; d=yahoo.com; h=X-YMail-OSG:Received:X-Rocket-MIMEInfo:X-Mailer:References:Message-ID:Date:From:Reply-To:Subject:To:In-Reply-To:MIME-Version:Content-Type; b=ExHGsnoQn35p3cmBt8J5dbDx+EClbaAS6h12mTmHDbzdPQgbYqYkn0v6paHO16ot1K+rn9e5lK+y5iGaa/ieFrTdJ8aocOSVSWIt1eHfXGqCYGI/HY3VKDu4GK/wNsN+i6G5SoW/tuofW8Tf2kvmYFq3Jz28Xb/hltoSgd7qjfY=; X-YMail-OSG: ob_reFgVM1lDffF6LE7hWbGzHbMGfKgIk0e6GWQyyC3FBT9 noVa6TSxl70hWFu9H3ep9GCTM0XrY0v.r8xdn0a3f4Dtv_9LvAI9UWM2rd8g o2TGdzGUYgdULWWfaxS_OgR1YU0XUcYRQRcCYViVK.V_89x6EWMqmJxUCvUk xp_l1ocXjkdwEp.6G5huw8624Wv0jchbKkoOXTPytM63KMLpwPV4aE_quPtS 94zfqrMowraoCynekyrfSnoW5xugQvykV4VKgpXP3VKlgC3in0uhOwLvEXCI .FhKTLbDrMeI.Wlv5PIypHX6TgnPJ_h91Lr4vByOHCEAF2jjIp4FoXtOQvoV hRmDmAS_1V9mezXJazZPjFEpeH63Wdemqi1HZnHzQY3hftGqvB5tIJ8V1Cw4 KN2EVxQIbhceL.pJN5kV_hxBtvzpBfwU_BKC7evZqMA5IeKjrnq7b1r07IVJ zripu_ZMp0Bi7QyunKQoaBSV5y_fxYi0EEHZIG4PFsNa.79N8Ck1ijCWvLrV Nu34TTEk7KdtQtx6e6gS41Zuy.fcwVtbtnF69Wcb0tyTAOYHVez2BVHNwokC 
L.VsnOocD1zqLDPIvlyQnT6ZAKENkgfWce8Via6y8n99ibWbQj1RG559YcJY lvoDJhCWhgjc- Received: from [98.203.118.124] by web121601.mail.ne1.yahoo.com via HTTP; Sat, 31 Aug 2013 08:03:08 PDT X-Rocket-MIMEInfo: 002.001, VGhhdCdzIHdheSB0b28gaGlnaC4gWW91ciBiYXNlIHJ4IHJlcXVpcmVtZW50IGlzwqAKClBvcnRzICogcXVldWVzICogcnhkwqAKCldpdGggYSBxdWFkIGNhcmQgeW91IHNob3VsZG4ndCBiZSB1c2luZyBtb3JlIHRoYW4gMiBxdWV1ZXMsIHNvIHlvdXIgcmVxdWlyZW1lbnQKd2l0aCA1IHBvcnRzIGlzIDEwLDI0MCBqdXN0IGZvciB0aGUgcmVjZWl2ZSBzZXR1cC4gSWYgeW91J3JlIHVzaW5nIDQgcXVldWVzIHRoYXQKbnVtYmVyIGRvdWJsZXMsIHdoaWNoIHdvdWxkIG1ha2UgMjUsNjAwIG5vdCBlbm91Z2guwqABMAEBAQE- X-Mailer: YahooMailWebService/0.8.156.576 References: <71042F7C-5CBB-4494-B53A-EF4CE45B41BE@ebureau.com> Message-ID: <1377961388.28903.YahooMailNeo@web121601.mail.ne1.yahoo.com> Date: Sat, 31 Aug 2013 08:03:08 -0700 (PDT) From: Barney Cordoba Subject: Re: Intel 4-port ethernet adaptor link aggregation issue To: Joe Moog , freebsd-net In-Reply-To: <71042F7C-5CBB-4494-B53A-EF4CE45B41BE@ebureau.com> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.14 X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list Reply-To: Barney Cordoba List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 31 Aug 2013 15:05:23 -0000

That's way too high. Your base rx requirement is

Ports * queues * rxd

With a quad card you shouldn't be using more than 2 queues, so your requirement
with 5 ports is 10,240 just for the receive setup. If you're using 4 queues that
number doubles, which would make 25,600 not enough.

Note that setting mbufs to a huge number doesn't allocate the buffers; they'll be
allocated as needed. It's a ceiling. The reason for the ceiling is so that you don't
blow up your memory.
If your system is using 2 million mbuf clusters then you
have much bigger problems than LAGG.

Anyone who recommends 2 million clearly has no idea what they're doing.

BC


________________________________
 From: Joe Moog
To: freebsd-net
Sent: Wednesday, August 28, 2013 9:36 AM
Subject: Re: Intel 4-port ethernet adaptor link aggregation issue


All:

Thanks again to everybody for the responses and suggestions to our 4-port lagg issue. The solution (for those that may find the information of some value) was to set the value for kern.ipc.nmbclusters to a higher value than we had initially. Our previous tuning had this value set at 25600, but following a recommendation from the good folks at iXSystems we bumped this to a value closer to 2000000, and the 4-port lagg is functioning as expected now.

Thank you all.

Joe

_______________________________________________
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"
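
The receive-side estimate in the reply above (ports * queues * rxd) can be sketched in C roughly as follows. This is only an illustration under stated assumptions: the function names are hypothetical and not part of any FreeBSD API, and the value of 1024 receive descriptors per queue is inferred from the quoted 10,240 figure (5 ports * 2 queues) rather than stated in the message.

```c
#include <stdint.h>

/*
 * Lower bound on the mbuf clusters held by the receive rings alone:
 * one cluster per RX descriptor, per queue, per port.
 * Hypothetical helper for illustration; not a FreeBSD kernel interface.
 */
uint64_t
rx_cluster_floor(uint32_t ports, uint32_t queues_per_port, uint32_t rxd)
{
	/* Widen before multiplying so large inputs cannot overflow 32 bits. */
	return (uint64_t)ports * queues_per_port * rxd;
}

/*
 * Nonzero if the configured ceiling (for example the value of
 * kern.ipc.nmbclusters) at least covers the receive rings.
 */
int
ceiling_covers_rx(uint64_t nmbclusters, uint64_t rx_floor)
{
	return nmbclusters >= rx_floor;
}
```

With 5 ports, 2 queues and an assumed 1024 descriptors, rx_cluster_floor() gives 10240; with 4 queues it gives 20480, which leaves only 5120 clusters of a 25600 ceiling for everything else.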