From owner-freebsd-net@FreeBSD.ORG Mon Feb 4 01:40:26 2013
From: George Neville-Neil
To: net@freebsd.org
Date: Sun, 3 Feb 2013 19:09:34 -0500
Subject: A question about SYN cookies...
Message-Id: <131E67C7-F336-414E-89C7-535D549443F5@neville-neil.com>
List-Id: Networking and TCP/IP with FreeBSD

Howdy,

I've been reviewing the SYN cache and SYN cookie code, and I'm wondering why we do all the work of generating a SYN cache entry before sending a SYN cookie. If the point of SYN cookies is to defend against a SYN flood, then, to my mind, the SYN/ACK for the cookie case should be sent off before doing all the work of trying to create and insert a cache entry. Has anyone, as yet, looked at a way to move the sending code earlier in syncache_add() and checked whether there is a performance improvement when a system is flooded with SYN packets?

Best,
George

From owner-freebsd-net@FreeBSD.ORG Mon Feb 4 09:09:19 2013
From: Andre Oppermann
To: George Neville-Neil
Cc: net@freebsd.org
Date: Mon, 04 Feb 2013 10:09:09 +0100
Subject: Re: A question about SYN cookies...
Message-ID: <510F7AB5.1040508@freebsd.org>
In-Reply-To: <131E67C7-F336-414E-89C7-535D549443F5@neville-neil.com>

On 04.02.2013 01:09, George Neville-Neil wrote:
> Howdy,
>
> I've been reviewing the SYN cache and SYN cookie code and I'm wondering why we do all the work
> of generating a SYN cache entry before sending a SYN cookie. If the point of SYN cookies is to
> defend against a SYN flood then, to my mind, the SYN/ACK for the cookie case should be sent off before
> doing all the work to try to create and insert a cache entry. Has anyone, as yet, looked at a way
> to move the sending code earlier into syncache_add() and checked to see if there is a performance
> improvement when a system is flooded with SYN packets?

So far, all syncookie implementations lose information because they can't store all of the connection state in the cookie unless timestamps are enabled. Apparently Windows 8 still doesn't enable timestamps but makes heavy use of window scaling, which leads to problems. See the recent bug report here on net@.

For generating syncookies we have three possible strategies:

1/ Use the syncache and cookies in parallel, bumping the oldest syncache entry and replacing it with the new SYN attempt. All outgoing SYN-ACKs carry syncookies.

2/ Fill the syncache but do not bump the oldest entry, other than by normal expiry. All further SYN-ACKs are syncookie-only (without window scaling etc.). Those in the syncache do not need to carry syncookies and are real, full SYN-ACKs.

3/ Only send syncookies and do not cache anything. Neither window scaling nor SACK-PERM can be carried, though.
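
Andre's point is that a cookie-only SYN-ACK can recover only what fits in the 32-bit ISS (plus the timestamp, when enabled). The Python sketch below is a hypothetical illustration of how an HMAC-keyed cookie could pack the MSS, window scale and SACK-permitted option into the ISS. It is not the scheme Andre sent for review, whose details are not given in this thread. The bit layout, the `MSS_TABLE` values, HMAC-SHA256 truncated to 23 bits, the one-bit selection between two rotating secrets, and the names `make_cookie`/`check_cookie` are all assumptions made for illustration.

```python
import hashlib
import hmac
import os
import struct

# Hypothetical 3-bit MSS table (ascending); only the index travels in the cookie.
MSS_TABLE = (216, 536, 1200, 1360, 1400, 1440, 1452, 1460)
NO_WSCALE = 15  # 4-bit value meaning "peer sent no window-scale option"

# Hypothetical 32-bit ISS layout used by this sketch:
#   bits 0-2   MSS table index
#   bits 3-6   window scale shift (0-14) or NO_WSCALE
#   bit  7     SACK permitted
#   bit  8     which of the two rotating secrets was used
#   bits 9-31  truncated HMAC (23 bits)


def _mac23(secret: bytes, flow: tuple, client_isn: int, info: int) -> int:
    """HMAC-SHA256 over the 4-tuple, the client ISN and the option bits, cut to 23 bits."""
    src_ip, src_port, dst_ip, dst_port = flow
    msg = f"{src_ip}|{src_port}|{dst_ip}|{dst_port}".encode()
    msg += struct.pack("!II", client_isn, info)
    digest = hmac.new(secret, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big") >> 9  # keep the top 23 bits


def make_cookie(secrets, gen, flow, client_isn, mss, wscale, sack_ok):
    """Build the ISS for a cookie-only SYN-ACK; no per-connection state is kept."""
    # Largest table entry not above the peer's MSS; the smallest entry if none fits.
    mss_idx = max((i for i, m in enumerate(MSS_TABLE) if m <= mss), default=0)
    ws = NO_WSCALE if wscale is None else min(wscale, 14)
    info = mss_idx | (ws << 3) | (int(sack_ok) << 7) | ((gen & 1) << 8)
    return (_mac23(secrets[gen & 1], flow, client_isn, info) << 9) | info


def check_cookie(secrets, flow, client_isn, cookie):
    """Validate an echoed ISS; return the recovered options, or None if the MAC fails."""
    info = cookie & 0x1FF
    expected = _mac23(secrets[info >> 8], flow, client_isn, info)
    if not hmac.compare_digest((cookie >> 9).to_bytes(3, "big"),
                               expected.to_bytes(3, "big")):
        return None  # forged, wrong flow, or made with a secret that has rotated out
    ws = (info >> 3) & 0xF
    return {
        "mss": MSS_TABLE[info & 0x7],
        "wscale": None if ws == NO_WSCALE else ws,
        "sack_ok": bool((info >> 7) & 1),
    }


if __name__ == "__main__":
    # Two random secrets; Andre's scheme renews its secrets every 30 seconds.
    secrets = [os.urandom(16), os.urandom(16)]
    flow = ("192.0.2.1", 40000, "198.51.100.7", 80)
    cookie = make_cookie(secrets, 0, flow, 123456789, mss=1460, wscale=8, sack_ok=True)
    print(check_cookie(secrets, flow, 123456789, cookie))
    # -> {'mss': 1460, 'wscale': 8, 'sack_ok': True}
```

When the final ACK arrives, a server using this sketch would pass the ACK's acknowledgment number minus one (mod 2^32) as `cookie` and its sequence number minus one as `client_isn`. With only 23 MAC bits, a blind guess succeeds with probability of about 1 in 8.4 million per attempt. How the 32 bits are split between MAC and option fields is a choice made for this sketch, not something taken from the thread.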

So far we've been doing option 1. We can switch to option 2, which, depending on the situation, may be better or worse. Option 3 isn't currently viable due to the loss of window scaling and SACK.

Based on the recent Windows 8 issue, I've devised a different HMAC-based syncookie scheme in which all necessary information can be stored in the ISS, forgoing the need for the timestamp bits. I have sent a description of the scheme to Colin and Nate for review. It must be cryptographically strong enough to withstand cracking attempts for about 30 seconds. Forward security isn't necessary, as the syncookie secrets are completely random and renewed every 30 seconds.

-- 
Andre

From owner-freebsd-net@FreeBSD.ORG Mon Feb 4 11:06:48 2013
From: FreeBSD bugmaster
To: freebsd-net@FreeBSD.org
Date: Mon, 4 Feb 2013 11:06:47 GMT
Subject: Current problem reports assigned to freebsd-net@FreeBSD.org
Message-Id: <201302041106.r14B6lDJ028831@freefall.freebsd.org>

Note: to view an individual PR, use:
  http://www.freebsd.org/cgi/query-pr.cgi?pr=(number).

The following is a listing of current problems submitted by FreeBSD users. These represent problem reports covering all versions including experimental development code and obsolete releases.

S Tracker      Resp. Description
--------------------------------------------------------------------------------
o kern/175734  net   no ethernet detected on system with EG20T PCH chipset
o kern/175267  net   [pf] [tap] pf + tap keep state problem
o kern/175236  net   [epair] [gif] epair and gif Devices On Bridge
o kern/175182  net   [panic] kernel panic on RADIX_MPATH when deleting rout
o kern/175153  net   [tcp] will there miss a FIN when do TSO?
o kern/174959  net   [net] [patch] rnh_walktree_from visits spurious nodes
o kern/174958  net   [net] [patch] rnh_walktree_from makes unreasonable ass
o kern/174897  net   [route] Interface routes are broken
o kern/174851  net   [bxe] [patch] UDP checksum offload is wrong in bxe dri
o kern/174850  net   [bxe] [patch] bxe driver does not receive multicasts
o kern/174849  net   [bxe] [patch] bxe driver can hang kernel when reset
o kern/174822  net   [tcp] Page fault in tcp_discardcb under high traffic
o kern/174602  net   [gif] [ipsec] traceroute issue on gif tunnel with ipse
o kern/174535  net   [tcp] TCP fast retransmit feature works strange
o kern/173475  net   [tun] tun(4) stays opened by PID after process is term
o kern/173201  net   [ixgbe] [patch] Missing / broken ixgbe sysctl's and tu
o kern/173137  net   [em] em(4) unable to run at gigabit with 9.1-RC2
o kern/173002  net   [patch] data type size problem in if_spppsubr.c
o kern/172985  net   [patch] [ip6] lltable leak when adding and removing IP
o kern/172895  net   [ixgb] [ixgbe] do not properly determine link-state
o kern/172683  net   [ip6] Duplicate IPv6 Link Local Addresses
o kern/172675  net   [netinet] [patch] sysctl_tcp_hc_list (net.inet.tcp.hos
o kern/172113  net   [panic] [e1000] [patch] 9.1-RC1/amd64 panices in igb(4
o kern/171840  net   [ip6] IPv6 packets transmitting only on queue 0
o kern/171838  net   [oce] [patch] Possible lock reversal and duplicate loc
o kern/171739  net   [bce] [panic] bce related kernel panic
o kern/171728  net   [arp] arp issue
o kern/171711  net   [dummynet] [panic] Kernel panic in dummynet
o kern/171532  net   [ndis] ndis(4) driver includes 'pccard'-specific code,
o kern/171531  net   [ndis] undocumented dependency for ndis(4)
o kern/171524  net   [ipmi] ipmi driver crashes kernel by reboot or shutdow
s kern/171508  net   [epair] [request] Add the ability to name epair device
o kern/171228  net   [re] [patch] if_re - eeprom write issues
o kern/170701  net   [ppp] killl ppp or reboot with active ppp connection c
o kern/170267  net   [ixgbe] IXGBE_LE32_TO_CPUS is probably an unintentiona
o kern/170081  net   [fxp] pf/nat/jails not working if checksum offloading
o kern/169898  net   ifconfig(8) fails to set MTU on multiple interfaces.
o kern/169676  net   [bge] [hang] system hangs, fully or partially after re
o kern/169664  net   [bgp] Wrongful replacement of interface connected net
o kern/169620  net   [ng] [pf] ng_l2tp incoming packet bypass pf firewall
o kern/169459  net   [ppp] umodem/ppp/3g stopped working after update from
o kern/169438  net   [ipsec] ipv4-in-ipv6 tunnel mode IPsec does not work
p kern/168294  net   [ixgbe] [patch] ixgbe driver compiled in kernel has no
o kern/168246  net   [em] Multiple em(4) not working with qemu
o kern/168245  net   [arp] [regression] Permanent ARP entry not deleted on
o kern/168244  net   [arp] [regression] Unable to manually remove permanent
o kern/168183  net   [bce] bce driver hang system
o kern/167947  net   [setfib] [patch] arpresolve checks only the default FI
o kern/167603  net   [ip] IP fragment reassembly's broken: file transfer ov
o kern/167500  net   [em] [panic] Kernel panics in em driver
o kern/167325  net   [netinet] [patch] sosend sometimes return EINVAL with
o kern/167202  net   [igmp]: Sending multiple IGMP packets crashes kernel
o kern/167059  net   [tcp] [panic] System does panic in in_pcbbind() and ha
o kern/166940  net   [ipfilter] [panic] Double fault in kern 8.2
o kern/166462  net   [gre] gre(4) when using a tunnel source address from c
o kern/166372  net   [patch] ipfilter drops UDP packets with zero checksum
o kern/166285  net   [arp] FreeBSD v8.1 REL p8 arp: unknown hardware addres
o kern/166255  net   [net] [patch] It should be possible to disable "promis
o kern/165963  net   [panic] [ipf] ipfilter/nat NULL pointer deference
o kern/165903  net   mbuf leak
o kern/165643  net   [net] [patch] Missing vnet restores in net/if_ethersub
o kern/165622  net   [ndis][panic][patch] Unregistered use of FPU in kernel
s kern/165562  net   [request] add support for Intel i350 in FreeBSD 7.4
o kern/165526  net   [bxe] UDP packets checksum calculation whithin if_bxe
o kern/165488  net   [ppp] [panic] Fatal trap 12 jails and ppp , kernel wit
o kern/165305  net   [ip6] [request] Feature parity between IP_TOS and IPV6
o kern/165296  net   [vlan] [patch] Fix EVL_APPLY_VLID, update EVL_APPLY_PR
o kern/165181  net   [igb] igb freezes after about 2 weeks of uptime
o kern/165174  net   [patch] [tap] allow tap(4) to keep its address on clos
o kern/165152  net   [ip6] Does not work through the issue of ipv6 addresse
o kern/164495  net   [igb] connect double head igb to switch cause system t
o kern/164490  net   [pfil] Incorrect IP checksum on pfil pass from ip_outp
o kern/164475  net   [gre] gre misses RUNNING flag after a reboot
o kern/164265  net   [netinet] [patch] tcp_lro_rx computes wrong checksum i
o kern/163903  net   [igb] "igb0:tx(0)","bpf interface lock" v2.2.5 9-STABL
o kern/163481  net   freebsd do not add itself to ping route packet
o kern/162927  net   [tun] Modem-PPP error ppp[1538]: tun0: Phase: Clearing
o kern/162926  net   [ipfilter] Infinite loop in ipfilter with fragmented I
o kern/162558  net   [dummynet] [panic] seldom dummynet panics
o kern/162153  net   [em] intel em driver 7.2.4 don't compile
o kern/162110  net   [igb] [panic] RELENG_9 panics on boot in IGB driver -
o kern/162028  net   [ixgbe] [patch] misplaced #endif in ixgbe.c
o kern/161277  net   [em] [patch] BMC cannot receive IPMI traffic after loa
o kern/160873  net   [igb] igb(4) from HEAD fails to build on 7-STABLE
o kern/160750  net   Intel PRO/1000 connection breaks under load until rebo
o kern/160693  net   [gif] [em] Multicast packet are not passed from GIF0 t
o kern/160293  net   [ieee80211] ppanic] kernel panic during network setup
o kern/160206  net   [gif] gifX stops working after a while (IPv6 tunnel)
o kern/159817  net   [udp] write UDPv4: No buffer space available (code=55)
o kern/159629  net   [ipsec] [panic] kernel panic with IPsec in transport m
o kern/159621  net   [tcp] [panic] panic: soabort: so_count
o kern/159603  net   [netinet] [patch] in_ifscrubprefix() - network route c
o kern/159601  net   [netinet] [patch] in_scrubprefix() - loopback route re
o kern/159294  net   [em] em watchdog timeouts
o kern/159203  net   [wpi] Intel 3945ABG Wireless LAN not support IBSS
o kern/158930  net   [bpf] BPF element leak in ifp->bpf_if->bif_dlist
o kern/158726  net   [ip6] [patch] ICMPv6 Router Announcement flooding limi
o kern/158694  net   [ix] [lagg] ix0 is not working within lagg(4)
o kern/158665  net   [ip6] [panic] kernel pagefault in in6_setscope()
o kern/158635  net   [em] TSO breaks BPF packet captures with em driver
f kern/157802  net   [dummynet] [panic] kernel panic in dummynet
o kern/157785  net   amd64 + jail + ipfw + natd = very slow outbound traffi
o kern/157418  net   [em] em driver lockup during boot on Supermicro X9SCM-
o kern/157410  net   [ip6] IPv6 Router Advertisements Cause Excessive CPU U
o kern/157287  net   [re] [panic] INVARIANTS panic (Memory modified after f
o kern/157209  net   [ip6] [patch] locking error in rip6_input() (sys/netin
o kern/157200  net   [network.subr] [patch] stf(4) can not communicate betw
o kern/157182  net   [lagg] lagg interface not working together with epair
o kern/156877  net   [dummynet] [panic] dummynet move_pkt() null ptr derefe
o kern/156667  net   [em] em0 fails to init on CURRENT after March 17
o kern/156408  net   [vlan] Routing failure when using VLANs vs. Physical e
o kern/156328  net   [icmp]: host can ping other subnet but no have IP from
o kern/156317  net   [ip6] Wrong order of IPv6 NS DAD/MLD Report
o kern/156283  net   [ip6] [patch] nd6_ns_input - rtalloc_mpath does not re
o kern/156279  net   [if_bridge][divert][ipfw] unable to correctly re-injec
o kern/156226  net   [lagg]: failover does not announce the failover to swi
o kern/156030  net   [ip6] [panic] Crash in nd6_dad_start() due to null ptr
o kern/155772  net   ifconfig(8): ioctl (SIOCAIFADDR): File exists on direc
o kern/155680  net   [multicast] problems with multicast
s kern/155642  net   [request] Add driver for Realtek RTL8191SE/RTL8192SE W
o kern/155597  net   [panic] Kernel panics with "sbdrop" message
o kern/155420  net   [vlan] adding vlan break existent vlan
o kern/155177  net   [route] [panic] Panic when inject routes in kernel
p kern/155030  net   [igb] igb(4) DEVICE_POLLING does not work with carp(4)
o kern/155010  net   [msk] ntfs-3g via iscsi using msk driver cause kernel
o kern/154943  net   [gif] ifconfig gifX create on existing gifX clears IP
s kern/154851  net   [request]: Port brcm80211 driver from Linux to FreeBSD
o kern/154850  net   [netgraph] [patch] ng_ether fails to name nodes when t
o kern/154679  net   [em] Fatal trap 12: "em1 taskq" only at startup (8.1-R
o kern/154600  net   [tcp] [panic] Random kernel panics on tcp_output
o kern/154557  net   [tcp] Freeze tcp-session of the clients, if in the gat
o kern/154443  net   [if_bridge] Kernel module bridgestp.ko missing after u
o kern/154286  net   [netgraph] [panic] 8.2-PRERELEASE panic in netgraph
o kern/154255  net   [nfs] NFS not responding
o kern/154214  net   [stf] [panic] Panic when creating stf interface
o kern/154185  net   race condition in mb_dupcl
o kern/154169  net   [multicast] [ip6] Node Information Query multicast add
o kern/154134  net   [ip6] stuck kernel state in LISTEN on ipv6 daemon whic
o kern/154091  net   [netgraph] [panic] netgraph, unaligned mbuf?
o conf/154062  net   [vlan] [patch] change to way of auto-generatation of v
o kern/153937  net   [ral] ralink panics the system (amd64 freeBSDD 8.X) wh
o kern/153936  net   [ixgbe] [patch] MPRC workaround incorrectly applied to
o kern/153816  net   [ixgbe] ixgbe doesn't work properly with the Intel 10g
o kern/153772  net   [ixgbe] [patch] sysctls reference wrong XON/XOFF varia
o kern/153497  net   [netgraph] netgraph panic due to race conditions
o kern/153454  net   [patch] [wlan] [urtw] Support ad-hoc and hostap modes
o kern/153308  net   [em] em interface use 100% cpu
o kern/153244  net   [em] em(4) fails to send UDP to port 0xffff
o kern/152893  net   [netgraph] [panic] 8.2-PRERELEASE panic in netgraph
o kern/152853  net   [em] tftpd (and likely other udp traffic) fails over e
o kern/152828  net   [em] poor performance on 8.1, 8.2-PRE
o kern/152569  net   [net]: Multiple ppp connections and routing table prob
o kern/152235  net   [arp] Permanent local ARP entries are not properly upd
o kern/152141  net   [vlan] [patch] encapsulate vlan in ng_ether before out
o kern/152036  net   [libc] getifaddrs(3) returns truncated sockaddrs for n
o kern/151690  net   [ep] network connectivity won't work until dhclient is
o kern/151681  net   [nfs] NFS mount via IPv6 leads to hang on client with
o kern/151593  net   [igb] [panic] Kernel panic when bringing up igb networ
o kern/150920  net   [ixgbe][igb] Panic when packets are dropped with heade
o kern/150557  net   [igb] igb0: Watchdog timeout -- resetting
o kern/150251  net   [patch] [ixgbe] Late cable insertion broken
o kern/150249  net   [ixgbe] Media type detection broken
o bin/150224   net   ppp(8) does not reassign static IP after kill -KILL co
f kern/149969  net   [wlan] [ral] ralink rt2661 fails to maintain connectio
o kern/149937  net   [ipfilter] [patch] kernel panic in ipfilter IP fragmen
o kern/149643  net   [rum] device not sending proper beacon frames in ap mo
o kern/149609  net   [panic] reboot after adding second default route
o kern/149117  net   [inet] [patch] in_pcbbind: redundant test
o kern/149086  net   [multicast] Generic multicast join failure in 8.1
o kern/148018  net   [flowtable] flowtable crashes on ia64
o kern/147912  net   [boot] FreeBSD 8 Beta won't boot on Thinkpad i1300 11
o kern/147894  net   [ipsec] IPv6-in-IPv4 does not work inside an ESP-only
o kern/147155  net   [ip6] setfb not work with ipv6
o kern/146845  net   [libc] close(2) returns error 54 (connection reset by
f kern/146792  net   [flowtable] flowcleaner 100% cpu's core load
o kern/146719  net   [pf] [panic] PF or dumynet kernel panic
o kern/146534  net   [icmp6] wrong source address in echo reply
o kern/146427  net   [mwl] Additional virtual access points don't work on m
f kern/146394  net   [vlan] IP source address for outgoing connections
o bin/146377   net   [ppp] [tun] Interface doesn't clear addresses when PPP
o kern/146358  net   [vlan] wrong destination MAC address
o kern/146165  net   [wlan] [panic] Setting bssid in adhoc mode causes pani
o kern/146082  net   [ng_l2tp] a false invaliant check was performed in ng_
o kern/146037  net   [panic] mpd + CoA = kernel panic
o kern/145825  net   [panic] panic: soabort: so_count
o kern/145728  net   [lagg] Stops working lagg between two servers.
p kern/145600  net   TCP/ECN behaves different to CE/CWR than ns2 reference
f kern/144917  net   [flowtable] [panic] flowtable crashes system [regressi
o kern/144882  net   MacBookPro =>4.1 does not connect to BSD in hostap wit
o kern/144874  net   [if_bridge] [patch] if_bridge frees mbuf after pfil ho
o conf/144700  net   [rc.d] async dhclient breaks stuff for too many people
o kern/144616  net   [nat] [panic] ip_nat panic FreeBSD 7.2
f kern/144315  net   [ipfw] [panic] freebsd 8-stable reboot after add ipfw
o kern/144231  net   bind/connect/sendto too strict about sockaddr length
o kern/143846  net   [gif] bringing gif3 tunnel down causes gif0 tunnel to
s kern/143673  net   [stf] [request] there should be a way to support multi
s kern/143666  net   [ip6] [request] PMTU black hole detection not implemen
o kern/143622  net   [pfil] [patch] unlock pfil lock while calling firewall
o kern/143593  net   [ipsec] When using IPSec, tcpdump doesn't show outgoin
o kern/143591  net   [ral] RT2561C-based DLink card (DWL-510) fails to work
o kern/143208  net   [ipsec] [gif] IPSec over gif interface not working
o kern/143034  net   [panic] system reboots itself in tcp code [regression]
o kern/142877  net   [hang] network-related repeatable 8.0-STABLE hard hang
o kern/142774  net   Problem with outgoing connections on interface with mu
o kern/142772  net   [libc] lla_lookup: new lle malloc failed
f kern/142518  net   [em] [lagg] Problem on 8.0-STABLE with em and lagg
o kern/142018  net   [iwi] [patch] Possibly wrong interpretation of beacon-
o kern/141861  net   [wi] data garbled with WEP and wi(4) with Prism 2.5
f kern/141741  net   Etherlink III NIC won't work after upgrade to FBSD 8,
o kern/140742  net   rum(4) Two asus-WL167G adapters cannot talk to each ot
o kern/140682  net   [netgraph] [panic] random panic in netgraph
f kern/140634  net   [vlan] destroying if_lagg interface with if_vlan membe
o kern/140619  net   [ifnet] [patch] refine obsolete if_var.h comments desc
o kern/140346  net   [wlan] High bandwidth use causes loss of wlan connecti
o kern/140142  net   [ip6] [panic] FreeBSD 7.2-amd64 panic w/IPv6
o kern/140066  net   [bwi] install report for 8.0 RC 2 (multiple problems)
o kern/139565  net   [ipfilter] ipfilter ioctl SIOCDELST broken
o kern/139387  net   [ipsec] Wrong lenth of PF_KEY messages in promiscuous
o bin/139346   net   [patch] arp(8) add option to remove static entries lis
o kern/139268  net   [if_bridge] [patch] allow if_bridge to forward just VL
p kern/139204  net   [arp] DHCP server replies rejected, ARP entry lost bef
o kern/139117  net   [lagg] + wlan boot timing (EBUSY)
o kern/139058  net   [ipfilter] mbuf cluster leak on FreeBSD 7.2
o kern/138850  net   [dummynet] dummynet doesn't work correctly on a bridge
o kern/138782  net   [panic] sbflush_internal: cc 0 || mb 0xffffff004127b00
o kern/138688  net   [rum] possibly broken on 8 Beta 4 amd64: able to wpa a
o kern/138678  net   [lo] FreeBSD does not assign linklocal address to loop
o kern/138407  net   [gre] gre(4) interface does not come up after reboot
o kern/138332  net   [tun] [lor] ifconfig tun0 destroy causes LOR if_adata/
o kern/138266  net   [panic] kernel panic when udp benchmark test used as r
o kern/138177  net   [ipfilter] FreeBSD crashing repeatedly in ip_nat.c:257
f kern/138029  net   [bpf] [panic] periodically kernel panic and reboot
o kern/137881  net   [netgraph] [panic] ng_pppoe fatal trap 12
p bin/137841   net   [patch] wpa_supplicant(8) cannot verify SHA256 signed
p kern/137776  net   [rum] panic in rum(4) driver on 8.0-BETA2
o bin/137641   net   ifconfig(8): various problems with "vlan_device.vlan_i
o kern/137392  net   [ip] [panic] crash in ip_nat.c line 2577
o kern/137372  net   [ral] FreeBSD doesn't support wireless interface from
o kern/137089  net   [lagg] lagg falsely triggers IPv6 duplicate address de
o bin/136994   net   [patch] ifconfig(8) print carp mac address
o kern/136911  net   [netgraph] [panic] system panic on kldload ng_bpf.ko t
o kern/136618  net   [pf][stf] panic on cloning interface without unit numb
o kern/135502  net   [periodic] Warning message raised by rtfree function i
o kern/134583  net   [hang] Machine with jail freezes after random amount o
o kern/134531  net   [route] [panic] kernel crash related to routes/zebra
o kern/134157  net   [dummynet] dummynet loads cpu for 100% and make a syst
o kern/133969  net   [dummynet] [panic] Fatal trap 12: page fault while in
o kern/133968  net   [dummynet] [panic] dummynet kernel panic
o kern/133736  net   [udp] ip_id not protected ...
o kern/133595  net   [panic] Kernel Panic at pcpu.h:195
o kern/133572  net   [ppp] [hang] incoming PPTP connection hangs the system
o kern/133490  net   [bpf] [panic] 'kmem_map too small' panic on Dell r900
o kern/133235  net   [netinet] [patch] Process SIOCDLIFADDR command incorre
f kern/133213  net   arp and sshd errors on 7.1-PRERELEASE
o kern/133060  net   [ipsec] [pfsync] [panic] Kernel panic with ipsec + pfs
o kern/132889  net   [ndis] [panic] NDIS kernel crash on load BCM4321 AGN d
o conf/132851  net   [patch] rc.conf(5): allow to setfib(1) for service run
o kern/132734  net   [ifmib] [panic] panic in net/if_mib.c
o kern/132705  net   [libwrap] [patch] libwrap - infinite loop if hosts.all
o kern/132672  net   [ndis] [panic] ndis with rt2860.sys causes kernel pani
o kern/132554  net   [ipl] There is no ippool start script/ipfilter magic t
o kern/132354  net   [nat] Getting some packages to ipnat(8) causes crash
o kern/132277  net   [crypto] [ipsec] poor performance using cryptodevice f
o kern/131781  net   [ndis] ndis keeps dropping the link
o kern/131776  net   [wi] driver fails to init
o kern/131753  net   [altq] [panic] kernel panic in hfsc_dequeue
o kern/131601  net   [ipfilter] [panic] 7-STABLE panic in nat_finalise (tcp
o bin/131567   net   [socket] [patch] Update for regression/sockets/unix_cm
o bin/131365   net   route(8): route add changes interpretation of network
f kern/130820  net   [ndis] wpa_supplicant(8) returns 'no space on device'
o kern/130628  net   [nfs] NFS / rpc.lockd deadlock on 7.1-R
o conf/130555  net   [rc.d] [patch] No good way to set ipfilter variables a
o kern/130525  net   [ndis] [panic] 64 bit ar5008 ndisgen-erated driver cau
o kern/130311  net   [wlan_xauth] [panic] hostapd restart causing kernel pa
o kern/130109  net   [ipfw] Can not set fib for packets originated from loc
f kern/130059  net   [panic] Leaking 50k mbufs/hour
f kern/129719  net   [nfs] [panic] Panic during shutdown, tcp_ctloutput: in
o kern/129517  net   [ipsec] [panic] double fault / stack overflow
f kern/129508  net   [carp] [panic] Kernel panic with EtherIP (may be relat
o kern/129219  net   [ppp] Kernel panic when using kernel mode ppp
o kern/129197  net   [panic] 7.0 IP stack related panic
o bin/128954   net   ifconfig(8) deletes valid routes
o bin/128602   net   [an] wpa_supplicant(8) crashes with an(4)
o kern/128448  net   [nfs] 6.4-RC1 Boot Fails if NFS Hostname cannot be res
o bin/128295   net   [patch] ifconfig(8) does not print TOE4 or TOE6 capabi
o bin/128001   net   wpa_supplicant(8), wlan(4), and wi(4) issues
o kern/127826  net   [iwi] iwi0 driver has reduced performance and connecti
o kern/127815  net   [gif] [patch] if_gif does not set vlan attributes from
o kern/127724  net   [rtalloc] rtfree: 0xc5a8f870 has 1 refs
f bin/127719   net   [arp] arp: Segmentation fault (core dumped)
f kern/127528  net   [icmp]: icmp socket receives icmp replies not owned by
p kern/127360  net   [socket] TOE socket options missing from sosetopt()
o bin/127192   net   routed(8) removes the secondary alias IP of interface
f kern/127145  net   [wi]: prism (wi) driver crash at bigger traffic
o kern/126895  net   [patch] [ral] Add antenna selection (marked as TBD)
o kern/126874  net   [vlan]: Zebra problem if ifconfig vlanX destroy
o kern/126695  net   rtfree messages and network disruption upon use of if_
o kern/126339  net   [ipw] ipw driver drops the connection
o kern/126075  net   [inet] [patch] internet control accesses beyond end of
o bin/125922   net   [patch] Deadlock in arp(8)
o kern/125920  net   [arp] Kernel Routing Table loses Ethernet Link status
o kern/125845  net   [netinet] [patch] tcp_lro_rx() should make use of hard
o kern/125258  net   [socket] socket's SO_REUSEADDR option does not work
o kern/125239  net   [gre] kernel crash when using gre
o kern/124341  net   [ral] promiscuous mode for wireless device ral0 looses
o kern/124225  net   [ndis] [patch] ndis network driver sometimes loses net
o kern/124160  net   [libc] connect(2) function loops indefinitely
o kern/124021  net   [ip6] [panic] page fault in nd6_output()
o kern/123968  net   [rum] [panic] rum driver causes kernel panic with WPA.
o kern/123892  net   [tap] [patch] No buffer space available
o kern/123890  net   [ppp] [panic] crash & reboot on work with PPP low-spee
o kern/123858  net   [stf] [patch] stf not usable behind a NAT
o kern/123796  net   [ipf] FreeBSD 6.1+VPN+ipnat+ipf: port mapping does not
o kern/123758  net   [panic] panic while restarting net/freenet6
o bin/123633   net   ifconfig(8) doesn't set inet and ether address in one
o kern/123559  net   [iwi] iwi periodically disassociates/associates [regre
o bin/123465   net   [ip6] route(8): route add -inet6 -interfac
o kern/123463  net   [ipsec] [panic] repeatable crash related to ipsec-tool
o conf/123330  net   [nsswitch.conf] Enabling samba wins in nsswitch.conf c
o kern/123160  net   [ip] Panic and reboot at sysctl kern.polling.enable=0
o kern/122989  net   [swi] [panic] 6.3 kernel panic in swi1: net
o kern/122954  net   [lagg] IPv6 EUI64 incorrectly chosen for lagg devices
f kern/122780  net   [lagg] tcpdump on lagg interface during high pps wedge
o kern/122685  net   It is not visible passing packets in tcpdump(1)
o kern/122319  net   [wi] imposible to enable ad-hoc demo mode with Orinoco
o kern/122290  net   [netgraph] [panic] Netgraph related "kmem_map too smal
o kern/122252  net   [ipmi] [bge] IPMI problem with BCM5704 (does not work
o kern/122033  net   [ral] [lor] Lock order reversal in ral0 at bootup ieee
o bin/121895   net   [patch] rtsol(8)/rtsold(8) doesn't handle managed netw
s kern/121774  net   [swi] [panic] 6.3 kernel panic in swi1: net
o kern/121555  net   [panic] Fatal trap 12: current process = 12 (swi1: net
o kern/121443  net   [gif] [lor] icmp6_input/nd6_lookup
o kern/121437  net   [vlan] Routing to layer-2 address does not work on VLA
o bin/121359   net   [patch] [security] ppp(8): fix local stack overflow in
o kern/121257  net   [tcp] TSO + natd -> slow outgoing tcp traffic
o kern/121181  net   [panic] Fatal trap 3: breakpoint instruction fault whi
o kern/120966  net   [rum] kernel panic with if_rum and WPA encryption
o kern/120566  net   [request]: ifconfig(8) make order of arguments more fr
o kern/120304  net   [netgraph] [patch] netgraph source assumes 32-bit time
o kern/120266  net   [udp] [panic] gnugk causes kernel panic when closing U
o bin/120060   net   routed(8) deletes link-level routes in the presence of
o kern/119945  net   [rum] [panic] rum device in hostap mode, cause kernel
o kern/119791  net   [nfs] UDP NFS mount of aliased IP addresses from a Sol
o kern/119617  net   [nfs] nfs error on wpa network when reseting/shutdown
f kern/119516  net   [ip6] [panic] _mtx_lock_sleep: recursed on non-recursi
o kern/119432  net   [arp] route add -host -iface causes arp e
o kern/119225  net   [wi] 7.0-RC1 no carrier with Prism 2.5 wifi card [regr
o kern/118727  net   [netgraph] [patch] [request] add new ng_pf module
o kern/117423  net   [vlan] Duplicate IP on different interfaces
o bin/117339   net   [patch] route(8): loading routing management commands
o bin/116643   net   [patch] [request] fstat(1): add INET/INET6 socket deta
o kern/116185  net   [iwi] if_iwi driver leads system to reboot
o kern/115239  net   [ipnat] panic with 'kmem_map too small' using ipnat
o kern/115019  net   [netgraph] ng_ether upper hook packet flow stops on ad
o kern/115002  net   [wi] if_wi timeout. failed allocation (busy bit). ifco
o kern/114915  net   [patch] [pcn] pcn (sys/pci/if_pcn.c) ethernet driver f
o kern/113432  net   [ucom] WARNING: attempt to net_add_domain(netgraph) af
o kern/112722  net   [ipsec] [udp] IP v4 udp fragmented packet reject
o kern/112686  net   [patm] patm driver freezes System (FreeBSD 6.2-p4) i38
o bin/112557   net   [patch] ppp(8) lock file should not use symlink name
o kern/112528  net   [nfs] NFS over TCP under load hangs with "impossible p
o kern/111537  net   [inet6] [patch] ip6_input() treats mbuf cluster wrong
o kern/111457  net   [ral] ral(4) freeze
o kern/110284  net   [if_ethersubr] Invalid Assumption in SIOCSIFADDR in et
o kern/110249  net   [kernel] [regression] [patch] setsockopt() error regre
o kern/109470  net   [wi] Orinoco Classic Gold PC Card Can't Channel Hop
o bin/108895   net   pppd(8): PPPoE dead connections on 6.2 [regression]
o kern/107944  net   [wi] [patch] Forget to unlock mutex-locks
o conf/107035  net   [patch] bridge(8): bridge interface given in rc.conf n
o kern/106444  net   [netgraph] [panic] Kernel Panic on Binding to an ip to
o kern/106316  net   [dummynet] dummynet with multipass ipfw drops packets
o kern/105945  net   Address can disappear from network interface
s kern/105943  net   Network stack may modify read-only mbuf chain copies
o bin/105925   net   problems with ifconfig(8) and vlan(4) [regression]
o kern/104851  net   [inet6] [patch] On link routes not configured when usi
o kern/104751  net   [netgraph] kernel panic, when getting info about my tr
o kern/103191  net   Unpredictable reboot
o kern/103135  net   [ipsec] ipsec with ipfw divert (not NAT) encodes a pac
o kern/102540  net   [netgraph] [patch] supporting vlan(4) by ng_fec(4)
o conf/102502  net   [netgraph] [patch] ifconfig name does't rename netgrap
o kern/102035  net   [plip] plip networking disables parallel port printing
o kern/101948  net   [ipf] [panic] Kernel Panic Trap No 12 Page Fault - cau
o kern/100709  net   [libc] getaddrinfo(3) should return TTL info
o kern/100519  net   [netisr] suggestion to fix suboptimal network polling
o kern/98978   net   [ipf] [patch] ipfilter drops OOW packets under 6.1-Rel
o kern/98597   net   [inet6] Bug in FreeBSD 6.1 IPv6 link-local DAD procedu
o bin/98218    net   wpa_supplicant(8) blacklist not working
o kern/97306   net   [netgraph] NG_L2TP locks after connection with failed
o conf/97014   net   [gif] gifconfig_gif? in rc.conf does not recognize IPv
f kern/96268   net   [socket] TCP socket performance drops by 3000% if pack
o kern/95519   net   [ral] ral0 could not map mbuf
o kern/95288   net   [pppd] [tty] [panic] if_ppp panic in sys/kern/tty_subr
o kern/95277   net   [netinet] [patch] IP Encapsulation mask_match() return
o kern/95267   net   packet drops periodically appear
f kern/93378   net   [tcp] Slow data transfer in Postfix and Cyrus IMAP (wo
o kern/93019   net   [ppp] ppp and tunX problems: no traffic after restarti
o kern/92880   net   [libc] [patch] almost rewritten inet_network(3) functi
s kern/92279   net   [dc] Core faults everytime I reboot, possible NIC issu
o kern/91859   net   [ndis] if_ndis does not work with Asus WL-138
s kern/91777   net   [ipf] [patch] wrong behaviour with skip rule inside an
o kern/91364   net   [ral] [wep] WF-511 RT2500 Card PCI and WEP
o kern/91311   net   [aue] aue interface hanging
o kern/87521   net   [ipf] [panic] using ipfilter "auth" keyword leads to k
o kern/87421   net   [netgraph] [panic]: ng_ether + ng_eiface + if_bridge
o kern/86871   net   [tcp] [patch] allocation logic for PCBs in TIME_WAIT s
o kern/86427   net   [lor] Deadlock with FASTIPSEC and nat
o kern/86103   net   [ipf] Illegal NAT Traversal in IPFilter
o kern/85780   net   'panic: bogus refcnt 0' in routing/ipv6
o bin/85445    net   ifconfig(8): deprecated keyword to ifconfig inoperativ
p kern/85320   net   [gre] [patch] possible depletion of kernel stack in ip
o bin/82975    net   route change does not parse classfull network as given
o kern/82881   net   [netgraph] [panic] ng_fec(4) causes kernel panic after
o kern/82468   net   Using 64MB tcp send/recv buffers, trafficflow stops, i
o bin/82185    net   [patch] ndp(8) can delete the incorrect entry
o kern/81095
net IPsec connection stops working if associated network i o kern/78968 net FreeBSD freezes on mbufs exhaustion (network interface o kern/78090 net [ipf] ipf filtering on bridged packets doesn't work if o kern/77341 net [ip6] problems with IPV6 implementation s kern/77195 net [ipf] [patch] ipfilter ioctl SIOCGNATL does not match o kern/75873 net Usability problem with non-RFC-compliant IP spoof prot s kern/75407 net [an] an(4): no carrier after short time a kern/71474 net [route] route lookup does not skip interfaces marked d o kern/71469 net default route to internet magically disappears with mu o kern/70904 net [ipf] ipfilter ipnat problem with h323 proxy support o kern/68889 net [panic] m_copym, length > size of mbuf chain o kern/66225 net [netgraph] [patch] extend ng_eiface(4) control message o kern/65616 net IPSEC can't detunnel GRE packets after real ESP encryp s kern/60293 net [patch] FreeBSD arp poison patch a kern/56233 net IPsec tunnel (ESP) over IPv6: MTU computation is wrong s bin/41647 net ifconfig(8) doesn't accept lladdr along with inet addr o kern/39937 net ipstealth issue a kern/38554 net [patch] changing interface ipaddress doesn't seem to w o kern/34665 net [ipf] [hang] ipfilter rcmd proxy "hangs". o kern/31940 net ip queue length too short for >500kpps o kern/31647 net [libc] socket calls can return undocumented EINVAL o kern/30186 net [libc] getaddrinfo(3) does not handle incorrect servna o kern/27474 net [ipf] [ppp] Interactive use of user PPP and ipfilter c f kern/24959 net [patch] proper TCP_NOPUSH/TCP_CORK compatibility o conf/23063 net [arp] [patch] for static ARP tables in rc.network o kern/21998 net [socket] [patch] ident only for outgoing connections o kern/5877 net [socket] sb_cc counts control data as well as data dat 441 problems total. 
From owner-freebsd-net@FreeBSD.ORG Mon Feb 4 17:22:54 2013 Return-Path: Delivered-To: freebsd-net@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 6934DB79; Mon, 4 Feb 2013 17:22:54 +0000 (UTC) (envelope-from randall@lakerest.net) Received: from lakerest.net (lakerest.net [70.155.160.98]) by mx1.freebsd.org (Postfix) with ESMTP id C658EC23; Mon, 4 Feb 2013 17:22:50 +0000 (UTC) Received: from [10.1.1.101] (bsd4.lakerest.net [70.155.160.102]) (authenticated bits=0) by lakerest.net (8.14.4/8.14.3) with ESMTP id r14HN5AZ028472 (version=TLSv1/SSLv3 cipher=AES128-SHA bits=128 verify=NOT); Mon, 4 Feb 2013 12:23:05 -0500 (EST) (envelope-from randall@lakerest.net) From: Randy Stewart Content-Type: multipart/mixed; boundary="Apple-Mail=_70D37FD7-3A2F-4331-BEC7-37AA9F73B8FC" Subject: Driver patch to look at... Date: Mon, 4 Feb 2013 12:22:49 -0500 Message-Id: To: John Baldwin , jv@FreeBSD.org, George Nevile Neil , Robert Watson , Kip Macy Mime-Version: 1.0 (Apple Message framework v1283) X-Mailer: Apple Mail (2.1283) Cc: freebsd-net X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 04 Feb 2013 17:22:54 -0000 --Apple-Mail=_70D37FD7-3A2F-4331-BEC7-37AA9F73B8FC Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset=us-ascii

All:

I have been working with TCP in gigabit networks (igb driver actually) and have found a very nasty problem with the way the driver is doing its put back when it fills the out-bound transmit queue.

Basically it has taken a packet from the head of the ring buffer, and then realizes it can't fit it into the transmit queue. So it just re-enqueues it into the ring buffer. What's wrong with that?
Well most of the time there are anywhere from 10-50 packets (maybe more) in that ring buffer when you are operating at full speed (or trying to). This means you will see 10 duplicate ACKs, do a fast retransmit and cut your cwnd in half.. not very nice actually.

The patch I have attached makes it so that

1) There are ways to swap back.
2) Use the peek in the ring buffer and only dequeue the packet if we put it into the transmit ring
3) If something goes wrong and the transmit frees the packet we dequeue it.
4) If the transmit changed it (defrag etc) then swap out the new mbuf that has taken its place.

I have fixed the four Intel drivers that had this systemic issue, but there are still more to fix.

Comments/review .. rotten eggs etc.. would be most welcome before I commit this..

Jack are you out there?

Thanks
R
--Apple-Mail=_70D37FD7-3A2F-4331-BEC7-37AA9F73B8FC Content-Disposition: attachment; filename=driver_patch.txt Content-Type: text/plain; x-unix-mode=0644; name="driver_patch.txt" Content-Transfer-Encoding: quoted-printable

Index: dev/e1000/if_em.c
===================================================================
--- dev/e1000/if_em.c	(revision 246323)
+++ dev/e1000/if_em.c	(working copy)
@@ -894,7 +894,7 @@ static int
 em_mq_start_locked(struct ifnet *ifp, struct tx_ring *txr, struct mbuf *m)
 {
 	struct adapter *adapter = txr->adapter;
-	struct mbuf *next;
+	struct mbuf *next, *snext, *dequeued;
 	int err = 0, enq = 0;
 
 	if ((ifp->if_drv_flags & (IFF_DRV_RUNNING | IFF_DRV_OACTIVE)) !=
@@ -905,22 +905,27 @@ em_mq_start_locked(struct ifnet *ifp, struct tx_ri
 	}
 
 	enq = 0;
-	if (m == NULL) {
-		next = drbr_dequeue(ifp, txr->br);
-	} else if (drbr_needs_enqueue(ifp, txr->br)) {
-		if ((err = drbr_enqueue(ifp, txr->br, m)) != 0)
+	if (m) {
+		err = drbr_enqueue(ifp, txr->br, m);
+		if (err) {
 			return (err);
-		next = drbr_dequeue(ifp, txr->br);
-	} else
-		next = m;
+		}
+	}
 
 	/* Process the queue */
-	while (next != NULL) {
+	while ((next = drbr_peek(ifp, txr->br)) != NULL) {
+		snext = next;
 		if ((err = em_xmit(txr, &next)) != 0) {
-			if (next != NULL)
-				err = drbr_enqueue(ifp, txr->br, next);
-			break;
+			if (next == NULL) {
+				dequeued = drbr_dequeue(ifp, txr->br);
+				KASSERT(dequeued == snext, ("dequeued incorrect packet from buf_ring"));
+			} else if (next != snext) {
+				drbr_swap(ifp, txr->br, next, snext);
+			}
+			break;
 		}
+		dequeued = drbr_dequeue(ifp, txr->br);
+		KASSERT(dequeued == snext, ("dequeued incorrect packet from buf_ring"));
 		enq++;
 		ifp->if_obytes += next->m_pkthdr.len;
 		if (next->m_flags & M_MCAST)
Index: dev/e1000/if_igb.c
===================================================================
--- dev/e1000/if_igb.c	(revision 246323)
+++ dev/e1000/if_igb.c	(working copy)
@@ -981,7 +981,7 @@ static int
 igb_mq_start_locked(struct ifnet *ifp, struct tx_ring *txr)
 {
 	struct adapter *adapter = txr->adapter;
-	struct mbuf *next;
+	struct mbuf *next, *snext, *dequeued;
 	int err = 0, enq;
 
 	IGB_TX_LOCK_ASSERT(txr);
@@ -994,12 +994,21 @@ igb_mq_start_locked(struct ifnet *ifp, struct tx_r
 	enq = 0;
 
 	/* Process the queue */
-	while ((next = drbr_dequeue(ifp, txr->br)) != NULL) {
+	while ((next = drbr_peek(ifp, txr->br)) != NULL) {
+		snext = next;
 		if ((err = igb_xmit(txr, &next)) != 0) {
-			if (next != NULL)
-				err = drbr_enqueue(ifp, txr->br, next);
+			if (next == NULL) {
+				/* It was freed, dequeue it */
+				dequeued = drbr_dequeue(ifp, txr->br);
+				KASSERT(dequeued == snext, ("dequeued incorrect packet from buf_ring"));
+			} else if (next != snext) {
+				/* it was changed -- defrag? pullup? */
+				drbr_swap(ifp, txr->br, next, snext);
+			}
 			break;
 		}
+		dequeued = drbr_dequeue(ifp, txr->br);
+		KASSERT(dequeued == snext, ("dequeued incorrect packet from buf_ring"));
 		enq++;
 		ifp->if_obytes += next->m_pkthdr.len;
 		if (next->m_flags & M_MCAST)
Index: dev/ixgbe/ixgbe.c
===================================================================
--- dev/ixgbe/ixgbe.c	(revision 246323)
+++ dev/ixgbe/ixgbe.c	(working copy)
@@ -821,7 +821,7 @@ static int
 ixgbe_mq_start_locked(struct ifnet *ifp, struct tx_ring *txr, struct mbuf *m)
 {
 	struct adapter *adapter = txr->adapter;
-	struct mbuf *next;
+	struct mbuf *next, *snext, *dequeued;
 	int enqueued, err = 0;
 
 	if (((ifp->if_drv_flags & IFF_DRV_RUNNING) == 0) ||
@@ -832,22 +832,27 @@ ixgbe_mq_start_locked(struct ifnet *ifp, struct tx
 	}
 
 	enqueued = 0;
-	if (m == NULL) {
-		next = drbr_dequeue(ifp, txr->br);
-	} else if (drbr_needs_enqueue(ifp, txr->br)) {
-		if ((err = drbr_enqueue(ifp, txr->br, m)) != 0)
+	if (m) {
+		err = drbr_enqueue(ifp, txr->br, m);
+		if (err) {
 			return (err);
-		next = drbr_dequeue(ifp, txr->br);
-	} else
-		next = m;
+		}
+	}
 
 	/* Process the queue */
-	while (next != NULL) {
+	while ((next = drbr_peek(ifp, txr->br)) != NULL) {
+		snext = next;
 		if ((err = ixgbe_xmit(txr, &next)) != 0) {
-			if (next != NULL)
-				err = drbr_enqueue(ifp, txr->br, next);
+			if (next == NULL) {
+				dequeued = drbr_dequeue(ifp, txr->br);
+				KASSERT(dequeued == snext, ("dequeued incorrect packet from buf_ring"));
+			} else if (next != snext) {
+				drbr_swap(ifp, txr->br, next, snext);
+			}
 			break;
 		}
+		dequeued = drbr_dequeue(ifp, txr->br);
+		KASSERT(dequeued == snext, ("dequeued incorrect packet from buf_ring"));
 		enqueued++;
 		/* Send a copy of the frame to the BPF listener */
 		ETHER_BPF_MTAP(ifp, next);
Index: dev/ixgbe/ixv.c
===================================================================
--- dev/ixgbe/ixv.c	(revision 246323)
+++ dev/ixgbe/ixv.c	(working copy)
@@ -605,7 +605,7 @@ static int
 ixv_mq_start_locked(struct ifnet *ifp, struct tx_ring *txr, struct mbuf *m)
 {
 	struct adapter *adapter = txr->adapter;
-	struct mbuf *next;
+	struct mbuf *next, *snext, *dequeued;
 	int enqueued, err = 0;
 
 	if ((ifp->if_drv_flags & (IFF_DRV_RUNNING | IFF_DRV_OACTIVE)) !=
@@ -620,22 +620,26 @@ ixv_mq_start_locked(struct ifnet *ifp, struct tx_r
 		ixv_txeof(txr);
 
 	enqueued = 0;
-	if (m == NULL) {
-		next = drbr_dequeue(ifp, txr->br);
-	} else if (drbr_needs_enqueue(ifp, txr->br)) {
-		if ((err = drbr_enqueue(ifp, txr->br, m)) != 0)
+	if (m) {
+		err = drbr_enqueue(ifp, txr->br, m);
+		if (err) {
 			return (err);
-		next = drbr_dequeue(ifp, txr->br);
-	} else
-		next = m;
-
+		}
+	}
 	/* Process the queue */
-	while (next != NULL) {
+	while ((next = drbr_peek(ifp, txr->br)) != NULL) {
+		snext = next;
 		if ((err = ixv_xmit(txr, &next)) != 0) {
-			if (next != NULL)
-				err = drbr_enqueue(ifp, txr->br, next);
+			if (next == NULL) {
+				dequeued = drbr_dequeue(ifp, txr->br);
+				KASSERT(dequeued == snext, ("dequeued incorrect packet from buf_ring"));
+			} else if (next != snext) {
+				drbr_swap(ifp, txr->br, next, snext);
+			}
 			break;
 		}
+		dequeued = drbr_dequeue(ifp, txr->br);
+		KASSERT(dequeued == snext, ("dequeued incorrect packet from buf_ring"));
 		enqueued++;
 		ifp->if_obytes += next->m_pkthdr.len;
 		if (next->m_flags & M_MCAST)
Index: net/if_var.h
===================================================================
--- net/if_var.h	(revision 246323)
+++ net/if_var.h	(working copy)
@@ -622,6 +622,41 @@ drbr_enqueue(struct ifnet *ifp, struct buf_ring *b
 }
 
 static __inline void
+drbr_swap(struct ifnet *ifp, struct buf_ring *br, struct mbuf *new, struct mbuf *prev)
+{
+	/*
+	 * The top of the list needs to be swapped
+	 * for this one.
+	 */
+#ifdef ALTQ
+	struct mbuf *m;
+	if (ifp != NULL && ALTQ_IS_ENABLED(&ifp->if_snd)) {
+		/* Pull it off and put it back in */
+		IFQ_DEQUEUE(&ifp->if_snd, m);
+		KASSERT(m == prev, ("Swap out failed to find prev mbuf"));
+		IFQ_DRV_DEQUEUE(&ifp->if_snd, new);
+		return;
+	}
+#endif
+	buf_ring_swap(br, new, prev);
+}
+
+static __inline struct mbuf *
+drbr_peek(struct ifnet *ifp, struct buf_ring *br)
+{
+#ifdef ALTQ
+	struct mbuf *m;
+	if (ifp != NULL && ALTQ_IS_ENABLED(&ifp->if_snd)) {
+		/* Pull it off and put it back in */
+		IFQ_DEQUEUE(&ifp->if_snd, m);
+		IFQ_DRV_DEQUEUE(&ifp->if_snd, m);
+		return (m);
+	}
+#endif
+	return(buf_ring_peek(br));
+}
+
+static __inline void
 drbr_flush(struct ifnet *ifp, struct buf_ring *br)
 {
 	struct mbuf *m;
Index: sys/buf_ring.h
===================================================================
--- sys/buf_ring.h	(revision 246323)
+++ sys/buf_ring.h	(working copy)
@@ -208,6 +208,27 @@ buf_ring_dequeue_sc(struct buf_ring *br)
 }
 
 /*
+ * Used to return a different mbuf to the
+ * top of the ring. This can happen if
+ * the driver changed the packets (some defragmentation
+ * for example) and then realized the transmit
+ * ring was full. In such a case the old packet
+ * is now freed, but we want the order of the actual
+ * data (being sent in the new packet) to remain
+ * the same.
+ */
+static __inline void
+buf_ring_swap(struct buf_ring *br, void *new, void *old)
+{
+	int ret;
+	if (br->br_cons_head == br->br_prod_tail)
+		/* Huh? */
+		return;
+	ret = atomic_cmpset_long((uint64_t *)&br->br_ring[br->br_cons_head], (uint64_t)old, (uint64_t)new);
+	KASSERT(ret, ("Swap out failed old:%p new:%p ret:%d", old, new, ret));
+}
+
+/*
  * return a pointer to the first entry in the ring
  * without modifying it, or NULL if the ring is empty
  * race-prone if not protected by a lock
--Apple-Mail=_70D37FD7-3A2F-4331-BEC7-37AA9F73B8FC Content-Transfer-Encoding: 7bit Content-Type: text/plain; charset=us-ascii

----- Randall Stewart randall@lakerest.net

--Apple-Mail=_70D37FD7-3A2F-4331-BEC7-37AA9F73B8FC--
From owner-freebsd-net@FreeBSD.ORG Mon Feb 4 18:11:10 2013 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 9BD7CC4E; Mon, 4 Feb 2013 18:11:10 +0000 (UTC) (envelope-from jfvogel@gmail.com) Received: from mail-vb0-f47.google.com (mail-vb0-f47.google.com [209.85.212.47]) by mx1.freebsd.org (Postfix) with ESMTP id 140DCE0; Mon, 4 Feb 2013 18:11:09 +0000 (UTC) Received: by mail-vb0-f47.google.com with SMTP id e21so4063547vbm.34 for ; Mon, 04 Feb 2013 10:11:03 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:x-received:in-reply-to:references:date:message-id :subject:from:to:cc:content-type; bh=Qk74qJKy68r7kX2OHTZ8zPFfIPIqSCfFwaTitGi9Hu4=; b=PG4cKJjiY0B0MF/xmAm1YCHbNvWlzZnVRMZ632JoEB1FxQWxc874vUhdJBx6lsxEbu J1n7+PL/+CC2RWISmI2tlLI4vDJ5PGMq31GUhdHy0zJlvSupsOm3Gaz7LwQPdmAK9T3v T2OH05AJkr/UmCb0cx/Ife7UoG23d18Hvy2qHCMMRq7rx9BC1f1ZlalFN7pA/Pw/lcMp 27566iGM+DglYAngcMtQ+wHLsbBBdyxnjNIzRdsNS8KUOB6YD3Hd0+hsllxvzO5mToT0 b3TstXBWbQR6lGLrgJ8m7aX0wuyMgh2fS5Wmk46L10kjLPvGj6FT9mEN4OWoyWALunjt v3Pg== MIME-Version: 1.0 X-Received: by 10.52.240.146 with SMTP id wa18mr20734212vdc.47.1360001463776; Mon, 04 Feb 2013 10:11:03 -0800 (PST) Received: by 10.220.191.132 with HTTP; Mon, 4 Feb 2013 10:11:03 -0800 (PST) In-Reply-To: References: Date: Mon, 4 Feb 2013 10:11:03 -0800
Message-ID: Subject: Re: Driver patch to look at... From: Jack Vogel To: Randy Stewart Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.14 Cc: Kip Macy , John Baldwin , freebsd-net , Robert Watson , Jack F Vogel X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 04 Feb 2013 18:11:10 -0000 On Mon, Feb 4, 2013 at 9:22 AM, Randy Stewart wrote: > All: > > I have been working with TCP in gigabit networks (igb driver actually) and > have > found a very nasty problem with the way the driver is doing its put back > when > it fills the out-bound transmit queue. > > Basically it has taken a packet from the head of the ring buffer, and then > realizes it can't fit it into the transmit queue. So it just re-enqueue's > it > into the ring buffer. Whats wrong with that? Well most of the time there > are anywhere from 10-50 packets (maybe more) in that ring buffer when you > are > operating at full speed (or trying to). This means you will see 10 > duplicate > ACKs, do a fast retransmit and cut your cwnd in half.. not very nice > actually. > > The patch I have attached makes it so that > > 1) There are ways to swap back. > 2) Use the peek in the ring buffer and only > dequeue the packet if we put it into the transmit ring > 3) If something goes wrong and the transmit frees the packet we dequeue it. > 4) If the transmit changed it (defrag etc) then swap out the new mbuf that > has taken its place. > > I have fixed the four intel drivers that had this systemic issue, but there > are still more to fix. > > Comments/review .. rotten egg's etc.. would be most welcome before > I commit this.. > > Jack are you out there? > > Yes, I'm usually perceived as being 'out there' :) If you had addressed it to 'jfv' rather than 'jv' it would have worked better. 
I have no theoretical objection to this, how much testing has it had? Jack
From owner-freebsd-net@FreeBSD.ORG Mon Feb 4 18:28:37 2013 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id AC73C4A2; Mon, 4 Feb 2013 18:28:37 +0000 (UTC) (envelope-from randall@lakerest.net) Received: from lakerest.net (lakerest.net [70.155.160.98]) by mx1.freebsd.org (Postfix) with ESMTP id 38D231F4; Mon, 4 Feb 2013 18:28:36 +0000 (UTC) Received: from [10.1.1.101] (bsd4.lakerest.net [70.155.160.102]) (authenticated bits=0) by lakerest.net (8.14.4/8.14.3) with ESMTP id r14ISpaA029199 (version=TLSv1/SSLv3 cipher=AES128-SHA bits=128 verify=NOT); Mon, 4 Feb 2013 13:28:51 -0500 (EST) (envelope-from randall@lakerest.net) Subject: Re: Driver patch to look at... Mime-Version: 1.0 (Apple Message framework v1283) Content-Type: text/plain; charset=iso-8859-1 From: Randy Stewart In-Reply-To: Date: Mon, 4 Feb 2013 13:28:35 -0500 Content-Transfer-Encoding: quoted-printable Message-Id: References: To: Jack Vogel X-Mailer: Apple Mail (2.1283) Cc: Kip Macy , John Baldwin , freebsd-net , Robert Watson , Jack F Vogel X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 04 Feb 2013 18:28:37 -0000

I am beating the heck out of it on my 9.x testbed where I lifted it from. I don't have any ix or ixgbe cards to play with yet though..

R

On Feb 4, 2013, at 1:11 PM, Jack Vogel wrote: > > > On Mon, Feb 4, 2013 at 9:22 AM, Randy Stewart wrote: > All: > > I have been working with TCP in gigabit networks (igb driver actually) and have > found a very nasty problem with the way the driver is doing its put back when > it fills the out-bound transmit queue.
> > Basically it has taken a packet from the head of the ring buffer, and then > realizes it can't fit it into the transmit queue. So it just re-enqueue's it > into the ring buffer. Whats wrong with that? Well most of the time there > are anywhere from 10-50 packets (maybe more) in that ring buffer when you are > operating at full speed (or trying to). This means you will see 10 duplicate > ACKs, do a fast retransmit and cut your cwnd in half.. not very nice actually. > > The patch I have attached makes it so that > > 1) There are ways to swap back. > 2) Use the peek in the ring buffer and only > dequeue the packet if we put it into the transmit ring > 3) If something goes wrong and the transmit frees the packet we dequeue it. > 4) If the transmit changed it (defrag etc) then swap out the new mbuf that > has taken its place. > > I have fixed the four intel drivers that had this systemic issue, but there > are still more to fix. > > Comments/review .. rotten egg's etc.. would be most welcome before > I commit this.. > > Jack are you out there? > > > Yes, I'm usually perceived as being 'out there' :) If you had addressed it to 'jfv' rather than 'jv' it would have worked better. > > I have no theoretical objection to this, how much testing has it had?
> > Jack >

----- Randall Stewart randall@lakerest.net
From owner-freebsd-net@FreeBSD.ORG Mon Feb 4 20:46:30 2013 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id F29A9566; Mon, 4 Feb 2013 20:46:29 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from bigwig.baldwin.cx (bigknife-pt.tunnel.tserv9.chi1.ipv6.he.net [IPv6:2001:470:1f10:75::2]) by mx1.freebsd.org (Postfix) with ESMTP id B9BE1D42; Mon, 4 Feb 2013 20:46:26 +0000 (UTC) Received: from pakbsde14.localnet (unknown [38.105.238.108]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id 1EE4CB94A; Mon, 4 Feb 2013 15:46:26 -0500 (EST) From: John Baldwin To: freebsd-net@freebsd.org Subject: Re: Driver patch to look at... Date: Mon, 4 Feb 2013 15:24:58 -0500 User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110714-p22; KDE/4.5.5; amd64; ; ) References: In-Reply-To: MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-15" Content-Transfer-Encoding: 7bit Message-Id: <201302041524.58699.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7 (bigwig.baldwin.cx); Mon, 04 Feb 2013 15:46:26 -0500 (EST) Cc: Robert Watson , Kip Macy , Randy Stewart , jv@freebsd.org X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 04 Feb 2013 20:46:30 -0000

On Monday, February 04, 2013 12:22:49 pm Randy Stewart wrote: > All: > > I have been working with TCP in gigabit networks (igb driver actually) and have > found a very nasty problem with the way the driver is doing its put back when > it fills the out-bound transmit queue. > > Basically it has taken a packet from the head of the ring buffer, and then > realizes it can't fit it into the transmit queue. So it just re-enqueue's it > into the ring buffer.
> Whats wrong with that? Well most of the time there > are anywhere from 10-50 packets (maybe more) in that ring buffer when you are > operating at full speed (or trying to). This means you will see 10 duplicate > ACKs, do a fast retransmit and cut your cwnd in half.. not very nice actually. > > The patch I have attached makes it so that > > 1) There are ways to swap back. > 2) Use the peek in the ring buffer and only > dequeue the packet if we put it into the transmit ring > 3) If something goes wrong and the transmit frees the packet we dequeue it. > 4) If the transmit changed it (defrag etc) then swap out the new mbuf that > has taken its place. > > I have fixed the four intel drivers that had this systemic issue, but there > are still more to fix. > > Comments/review .. rotten egg's etc.. would be most welcome before > I commit this..

Does this only happen in drivers that use bufring? I seem to recall that drivers using IFQ would just stuff the packet at the head of the IFQ via IFQ_DRV_PREPEND() in this case so it is still the next packet to transmit. See, for example, this bit in dc_start_locked():

	for (queued = 0; !IFQ_DRV_IS_EMPTY(&ifp->if_snd); ) {
		/*
		 * If there's no way we can send any packets, return now.
		 */
		if (sc->dc_cdata.dc_tx_cnt > DC_TX_LIST_CNT - DC_TX_LIST_RSVD) {
			ifp->if_drv_flags |= IFF_DRV_OACTIVE;
			break;
		}
		IFQ_DRV_DEQUEUE(&ifp->if_snd, m_head);
		if (m_head == NULL)
			break;

		if (dc_encap(sc, &m_head)) {
			if (m_head == NULL)
				break;
			IFQ_DRV_PREPEND(&ifp->if_snd, m_head);
			ifp->if_drv_flags |= IFF_DRV_OACTIVE;
			break;
		}

It sounds like drbr/buf_ring just don't handle this case correctly? It seems a shame to have to duplicate so much code in the various drivers to fix this, but that seems to be par for the course when using buf_ring. :( (buggy in edge cases and lots of duplicated code that is).

Also, doing the drbr_swap() just so that drbr_dequeue() returns what you just swapped in seems... odd.
It seems that it would be nicer instead to have some sort of drbr_peek() / drbr_advance() where the latter just skips over whatever the current head is? Then you could have something like:

	while ((next = drbr_peek()) != NULL) {
		if (!foo_encap(&next)) {
			if (next == NULL)
				drbr_advance();
			break;
		}
		drbr_advance();
	}

I guess the sticky widget is the case of ALTQ as you need to dequeue the packet always in the ALTQ case and put it back if the encap fails. For your patch it's not clear to me how that works. It seems that if the encap routine frees the mbuf you try to dereference a freed pointer when you call drbr_dequeue(). I really think you will instead need some sort of 'drbr_putback()' and have 'drbr_peek()' dequeue in the ALTQ case and use 'drbr_putback()' to put it back (PREPEND) in the ALTQ case.

-- John Baldwin
From owner-freebsd-net@FreeBSD.ORG Tue Feb 5 10:49:02 2013 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 65EE156A; Tue, 5 Feb 2013 10:49:02 +0000 (UTC) (envelope-from randall@lakerest.net) Received: from lakerest.net (lakerest.net [70.155.160.98]) by mx1.freebsd.org (Postfix) with ESMTP id D066CD62; Tue, 5 Feb 2013 10:49:01 +0000 (UTC) Received: from [10.1.1.101] (bsd4.lakerest.net [70.155.160.102]) (authenticated bits=0) by lakerest.net (8.14.4/8.14.3) with ESMTP id r15AnGZb038037 (version=TLSv1/SSLv3 cipher=AES128-SHA bits=128 verify=NOT); Tue, 5 Feb 2013 05:49:16 -0500 (EST) (envelope-from randall@lakerest.net) Subject: Re: Driver patch to look at...
Mime-Version: 1.0 (Apple Message framework v1283) Content-Type: text/plain; charset=us-ascii From: Randy Stewart In-Reply-To: <201302041524.58699.jhb@freebsd.org> Date: Tue, 5 Feb 2013 05:49:00 -0500 Content-Transfer-Encoding: quoted-printable Message-Id: <45AD1046-A630-4C96-B4D2-B8A7D6A6DC45@lakerest.net> References: <201302041524.58699.jhb@freebsd.org> To: John Baldwin X-Mailer: Apple Mail (2.1283) Cc: freebsd-net@freebsd.org, Robert Watson , Kip Macy , jv@freebsd.org X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 05 Feb 2013 10:49:02 -0000

On Feb 4, 2013, at 3:24 PM, John Baldwin wrote: > On Monday, February 04, 2013 12:22:49 pm Randy Stewart wrote: >> All: >> >> I have been working with TCP in gigabit networks (igb driver actually) and have >> found a very nasty problem with the way the driver is doing its put back when >> it fills the out-bound transmit queue. >> >> Basically it has taken a packet from the head of the ring buffer, and then >> realizes it can't fit it into the transmit queue. So it just re-enqueue's it >> into the ring buffer. Whats wrong with that? Well most of the time there >> are anywhere from 10-50 packets (maybe more) in that ring buffer when you are >> operating at full speed (or trying to). This means you will see 10 duplicate >> ACKs, do a fast retransmit and cut your cwnd in half.. not very nice actually. >> >> The patch I have attached makes it so that >> >> 1) There are ways to swap back. >> 2) Use the peek in the ring buffer and only >> dequeue the packet if we put it into the transmit ring >> 3) If something goes wrong and the transmit frees the packet we dequeue it. >> 4) If the transmit changed it (defrag etc) then swap out the new mbuf that >> has taken its place.
>>
>> I have fixed the four Intel drivers that had this systemic issue, but there
>> are still more to fix.
>>
>> Comments/review .. rotten eggs etc.. would be most welcome before
>> I commit this..
>
> Does this only happen in drivers that use buffering?

Yep, there are a lot of drivers that *do not* use the drbr_xxxx() functions,
and those do the IFQ_DRV_PREPEND().. it's only the newer drivers like the
Intel 1Gig and 10Gig ones that seem affected.

Also affected are:

bxe
cxgb
oce
en

I have not fixed those yet.

> I seem to recall that
> drivers using IFQ would just stuff the packet at the head of the IFQ via
> IFQ_DRV_PREPEND() in this case so it is still the next packet to transmit.
> See, for example, this bit in dc_start_locked():
>
>     for (queued = 0; !IFQ_DRV_IS_EMPTY(&ifp->if_snd); ) {
>         /*
>          * If there's no way we can send any packets, return now.
>          */
>         if (sc->dc_cdata.dc_tx_cnt > DC_TX_LIST_CNT - DC_TX_LIST_RSVD) {
>             ifp->if_drv_flags |= IFF_DRV_OACTIVE;
>             break;
>         }
>         IFQ_DRV_DEQUEUE(&ifp->if_snd, m_head);
>         if (m_head == NULL)
>             break;
>
>         if (dc_encap(sc, &m_head)) {
>             if (m_head == NULL)
>                 break;
>             IFQ_DRV_PREPEND(&ifp->if_snd, m_head);
>             ifp->if_drv_flags |= IFF_DRV_OACTIVE;
>             break;
>         }
>
> It sounds like drbr/buf_ring just don't handle this case correctly? It
> seems a shame to have to duplicate so much code in the various drivers to
> fix this, but that seems to be par for the course when using buf_ring. :(
> (buggy in edge cases and lots of duplicated code that is).
> Also, doing the drbr_swap() just so that drbr_dequeue() returns what you
> just swapped in seems... odd. It seems that it would be nicer instead
> to have some sort of drbr_peek() / drbr_advance() where the latter just
> skips over whatever the current head is?
> Then you could have something
> like:
>
>     while ((next = drbr_peek()) != NULL) {
>         if (!foo_encap(&next)) {
>             if (next == NULL)
>                 drbr_advance();
>             break;
>         }
>         drbr_advance();
>     }
>

That was what I originally did (without the rename), but there is a sure crash
waiting in that.

The only difference from what I originally had was just drbr_dequeue().. but
I was being a bit lazy and not wanting to add yet another function to the
drbr_xxxx code, since essentially it would be a clone of drbr_dequeue() without
returning the mbuf.

The crash potential here is that foo_encap(&next) oftentimes will return
a different mbuf (at least in the igb driver it does). I believe it's due
to either the m_pullup(), which could change the lead mbuf you want
in the drbr ring, or the m_defrag(), all within igb_xmit(). Thus you have
to track what comes back from the foo_encap() call and compare it to
see if you must swap it out.

> I guess the sticky widget is the case of ALTQ as you need to dequeue the
> packet always in the ALTQ case and put it back if the encap fails.

Yeah, ALTQ is ugly and IMO we need to rewrite it anyway.. maybe rethink
this whole layer ;-0

> For
> your patch it's not clear to me how that works. It seems that if the
> encap routine frees the mbuf you try to dereference a freed pointer when
> you call drbr_dequeue().

Hmm, you are right.. I forgot how we keep those using the mbuf itself...

> I really think you will instead need some sort
> of 'drbr_putback()' and have 'drbr_peek()' dequeue in the ALTQ case and
> use 'drbr_putback()' to put it back (PREPEND) in the ALTQ case.

We could do that, but drbr_putback() would probably need both the old
and new pointers, and then we could make it do the ring_swap() to put
the right mbuf in place..
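The peek/advance/putback scheme converged on here can be modelled in user space to see why putback needs both the old and the new pointer. In this sketch `toy_ring` and the `toy_*` functions are hypothetical simplifications of buf_ring (single consumer, caller holds the lock, no atomics), and `encap_defrag_then_full()` is an invented stand-in for an encap routine that replaced the head chain (as m_defrag() can) and then found the transmit ring full. toy_putback() only compares the old pointer and never dereferences it, since in the driver case the old chain may already be gone.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical single-consumer ring of packet pointers: a simplified
 * stand-in for buf_ring (no atomics; the caller holds the tx lock). */
#define RING_SIZE 8                         /* power of two, so we can mask */
#define RING_IDX(i) ((i) & (RING_SIZE - 1))

struct toy_ring {
    void     *slot[RING_SIZE];
    unsigned  cons;                         /* consumer head */
    unsigned  prod;                         /* producer tail */
};

static int toy_enqueue(struct toy_ring *r, void *p)
{
    if (r->prod - r->cons == RING_SIZE)
        return -1;                          /* ring full */
    r->slot[RING_IDX(r->prod++)] = p;
    return 0;
}

/* Return the head without consuming it, or NULL if the ring is empty. */
static void *toy_peek(struct toy_ring *r)
{
    return r->cons == r->prod ? NULL : r->slot[RING_IDX(r->cons)];
}

/* Consume the head once it was handed to hardware or freed. */
static void toy_advance(struct toy_ring *r)
{
    if (r->cons != r->prod)
        r->cons++;
}

/* Leave the head in place, swapping in a replacement chain if the
 * encap routine substituted one. old_p is only compared. */
static void toy_putback(struct toy_ring *r, void *new_p, void *old_p)
{
    void **head = &r->slot[RING_IDX(r->cons)];

    assert(r->cons != r->prod && *head == old_p);
    if (new_p != old_p)
        *head = new_p;
}

/* Hypothetical encap: "defragments" into a new buffer, then reports
 * that the transmit ring is full (nonzero return). */
static int spare_chain;
static int encap_defrag_then_full(void **pp)
{
    *pp = &spare_chain;
    return 1;
}

/* The transmit-loop shape from the thread, using the toy primitives. */
static int toy_start(struct toy_ring *r, int (*encap)(void **))
{
    void *next, *orig;
    int sent = 0;

    while ((next = toy_peek(r)) != NULL) {
        orig = next;
        if (encap(&next) != 0) {
            if (next == NULL)
                toy_advance(r);             /* encap freed it */
            else
                toy_putback(r, next, orig); /* keep it at the head */
            break;
        }
        toy_advance(r);
        sent++;
    }
    return sent;
}
```

With two packets queued, toy_start(&r, encap_defrag_then_full) sends nothing and leaves `&spare_chain` at the head, still ahead of the second packet: the replacement takes the original's place in line instead of being re-enqueued at the tail.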
Let me go explore that and come up with a better patch ;-)

R

>
> --
> John Baldwin
> _______________________________________________
> freebsd-net@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-net
> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"
>

-----
Randall Stewart
randall@lakerest.net

From owner-freebsd-net@FreeBSD.ORG Tue Feb 5 11:49:42 2013
Subject: Re: Driver patch to look at...
From: Randall Stewart
In-Reply-To: <45AD1046-A630-4C96-B4D2-B8A7D6A6DC45@lakerest.net>
Date: Tue, 5 Feb 2013 06:49:37 -0500
Message-Id: <39571D84-A8C0-46A4-8EFA-CF74D862EAAE@lakerest.net>
To: John Baldwin
Cc: freebsd-net, Robert Watson, Kip Macy, jlv@FreeBSD.org

John:

Here is an updated patch, per your suggestions. Note that I also expanded it,
and the only driver that uses these methods that I did not touch is cxgb, but
that's because I am not really sure it has the problem… it does not quite
enqueue the same way (it appears) that the other drivers do ;-)

R

Content-Disposition: attachment; filename=driver_patch.txt

Index: dev/e1000/if_em.c
===================================================================
--- dev/e1000/if_em.c (revision 246323)
+++ dev/e1000/if_em.c (working copy)
@@ -894,7 +894,7 @@ static int
 em_mq_start_locked(struct ifnet *ifp, struct tx_ring *txr, struct mbuf *m)
 {
     struct adapter *adapter = txr->adapter;
-    struct mbuf *next;
+    struct mbuf *next, *snext;
     int err = 0, enq = 0;

     if ((ifp->if_drv_flags & (IFF_DRV_RUNNING | IFF_DRV_OACTIVE)) !=
@@ -905,22 +905,25 @@ em_mq_start_locked(struct ifnet *ifp, struct tx_ri
     }

     enq = 0;
-    if (m == NULL) {
-        next = drbr_dequeue(ifp, txr->br);
-    } else if (drbr_needs_enqueue(ifp, txr->br)) {
-        if ((err = drbr_enqueue(ifp, txr->br, m)) != 0)
+    if (m) {
+        err = drbr_enqueue(ifp, txr->br, m);
+        if (err) {
             return (err);
-        next = drbr_dequeue(ifp, txr->br);
-    } else
-        next = m;
+        }
+    }

     /* Process the queue */
-    while (next != NULL) {
+    while ((next = drbr_peek(ifp, txr->br)) != NULL) {
+        snext = next;
         if ((err = em_xmit(txr, &next)) != 0) {
-            if (next != NULL)
-                err = drbr_enqueue(ifp, txr->br, next);
-            break;
+            if (next == NULL) {
+                drbr_advance(ifp, txr->br);
+            } else {
+                drbr_putback(ifp, txr->br, next, snext);
+            }
+            break;
         }
+        drbr_advance(ifp, txr->br);
         enq++;
         ifp->if_obytes += next->m_pkthdr.len;
         if (next->m_flags & M_MCAST)
@@ -928,7 +931,6 @@ em_mq_start_locked(struct ifnet *ifp, struct tx_ri
         ETHER_BPF_MTAP(ifp, next);
         if ((ifp->if_drv_flags & IFF_DRV_RUNNING) == 0)
             break;
-        next = drbr_dequeue(ifp, txr->br);
     }

     if (enq > 0) {
Index: dev/e1000/if_igb.c
===================================================================
--- dev/e1000/if_igb.c (revision 246323)
+++ dev/e1000/if_igb.c (working copy)
@@ -981,7 +981,7 @@ static int
 igb_mq_start_locked(struct ifnet *ifp, struct tx_ring *txr)
 {
     struct adapter *adapter = txr->adapter;
-    struct mbuf *next;
+    struct mbuf *next, *snext;
     int err = 0, enq;

     IGB_TX_LOCK_ASSERT(txr);
@@ -994,12 +994,23 @@ igb_mq_start_locked(struct ifnet *ifp, struct tx_r
     enq = 0;

     /* Process the queue */
-    while ((next = drbr_dequeue(ifp, txr->br)) != NULL) {
+    while ((next = drbr_peek(ifp, txr->br)) != NULL) {
+        snext = next;
         if ((err = igb_xmit(txr, &next)) != 0) {
-            if (next != NULL)
-                err = drbr_enqueue(ifp, txr->br, next);
+            if (next == NULL) {
+                /* It was freed, move forward */
+                drbr_advance(ifp, txr->br);
+            } else {
+                /*
+                 * Still have one left, it may not be
+                 * the same since the transmit function
+                 * may have changed it.
+                 */
+                drbr_putback(ifp, txr->br, next, snext);
+            }
             break;
         }
+        drbr_advance(ifp, txr->br);
         enq++;
         ifp->if_obytes += next->m_pkthdr.len;
         if (next->m_flags & M_MCAST)
Index: dev/oce/oce_if.c
===================================================================
--- dev/oce/oce_if.c (revision 246323)
+++ dev/oce/oce_if.c (working copy)
@@ -1154,6 +1154,7 @@ oce_multiq_transmit(struct ifnet *ifp, struct mbuf
     POCE_SOFTC sc = ifp->if_softc;
     int status = 0, queue_index = 0;
     struct mbuf *next = NULL;
+    struct mbuf *snext;
     struct buf_ring *br = NULL;

     br = wq->br;
@@ -1166,29 +1167,28 @@ oce_multiq_transmit(struct ifnet *ifp, struct mbuf
         return status;
     }

-    if (m == NULL)
-        next = drbr_dequeue(ifp, br);
-    else if (drbr_needs_enqueue(ifp, br)) {
+    if (m) {
         if ((status = drbr_enqueue(ifp, br, m)) != 0)
             return status;
-        next = drbr_dequeue(ifp, br);
-    } else
-        next = m;
-
-    while (next != NULL) {
+    }
+    while ((next = drbr_peek(ifp, br)) != NULL) {
+        snext = next;
         if (oce_tx(sc, &next, queue_index)) {
-            if (next != NULL) {
+            if (next == NULL) {
+                drbr_advance(ifp, br);
+            } else {
+                drbr_putback(ifp, br, next, snext);
                 wq->tx_stats.tx_stops ++;
                 ifp->if_drv_flags |= IFF_DRV_OACTIVE;
                 status = drbr_enqueue(ifp, br, next);
             }

             break;
         }
+        drbr_advance(ifp, br);
         ifp->if_obytes += next->m_pkthdr.len;
         if (next->m_flags & M_MCAST)
             ifp->if_omcasts++;
         ETHER_BPF_MTAP(ifp, next);
-        next = drbr_dequeue(ifp, br);
     }

     return status;
Index: dev/ixgbe/ixgbe.c
===================================================================
--- dev/ixgbe/ixgbe.c (revision 246323)
+++ dev/ixgbe/ixgbe.c (working copy)
@@ -821,7 +821,7 @@ static int
 ixgbe_mq_start_locked(struct ifnet *ifp, struct tx_ring *txr, struct mbuf *m)
 {
     struct adapter *adapter = txr->adapter;
-    struct mbuf *next;
+    struct mbuf *next, *snext;
     int enqueued, err = 0;

     if (((ifp->if_drv_flags & IFF_DRV_RUNNING) == 0) ||
@@ -832,22 +832,25 @@ ixgbe_mq_start_locked(struct ifnet *ifp, struct tx
     }

     enqueued = 0;
-    if (m == NULL) {
-        next = drbr_dequeue(ifp, txr->br);
-    } else if (drbr_needs_enqueue(ifp, txr->br)) {
-        if ((err = drbr_enqueue(ifp, txr->br, m)) != 0)
+    if (m) {
+        err = drbr_enqueue(ifp, txr->br, m);
+        if (err) {
             return (err);
-        next = drbr_dequeue(ifp, txr->br);
-    } else
-        next = m;
+        }
+    }

     /* Process the queue */
-    while (next != NULL) {
+    while ((next = drbr_peek(ifp, txr->br)) != NULL) {
+        snext = next;
         if ((err = ixgbe_xmit(txr, &next)) != 0) {
-            if (next != NULL)
-                err = drbr_enqueue(ifp, txr->br, next);
+            if (next == NULL) {
+                drbr_advance(ifp, txr->br);
+            } else {
+                drbr_putback(ifp, txr->br, next, snext);
+            }
             break;
         }
+        drbr_advance(ifp, txr->br);
         enqueued++;
         /* Send a copy of the frame to the BPF listener */
         ETHER_BPF_MTAP(ifp, next);
@@ -855,7 +858,6 @@ ixgbe_mq_start_locked(struct ifnet *ifp, struct tx
             break;
         if (txr->tx_avail < IXGBE_TX_OP_THRESHOLD)
             ixgbe_txeof(txr);
-        next = drbr_dequeue(ifp, txr->br);
     }

     if (enqueued > 0) {
Index: dev/ixgbe/ixv.c
===================================================================
--- dev/ixgbe/ixv.c (revision 246323)
+++ dev/ixgbe/ixv.c (working copy)
@@ -605,7 +605,7 @@ static int
 ixv_mq_start_locked(struct ifnet *ifp, struct tx_ring *txr, struct mbuf *m)
 {
     struct adapter *adapter = txr->adapter;
-    struct mbuf *next;
+    struct mbuf *next, *snext;
     int enqueued, err = 0;

     if ((ifp->if_drv_flags & (IFF_DRV_RUNNING | IFF_DRV_OACTIVE)) !=
@@ -620,22 +620,24 @@ ixv_mq_start_locked(struct ifnet *ifp, struct tx_r
         ixv_txeof(txr);

     enqueued = 0;
-    if (m == NULL) {
-        next = drbr_dequeue(ifp, txr->br);
-    } else if (drbr_needs_enqueue(ifp, txr->br)) {
-        if ((err = drbr_enqueue(ifp, txr->br, m)) != 0)
+    if (m) {
+        err = drbr_enqueue(ifp, txr->br, m);
+        if (err) {
             return (err);
-        next = drbr_dequeue(ifp, txr->br);
-    } else
-        next = m;
-
+        }
+    }
     /* Process the queue */
-    while (next != NULL) {
+    while ((next = drbr_peek(ifp, txr->br)) != NULL) {
+        snext = next;
         if ((err = ixv_xmit(txr, &next)) != 0) {
-            if (next != NULL)
-                err = drbr_enqueue(ifp, txr->br, next);
+            if (next == NULL) {
+                drbr_advance(ifp, txr->br);
+            } else {
+                drbr_putback(ifp, txr->br, next, snext);
+            }
             break;
         }
+        drbr_advance(ifp, txr->br);
         enqueued++;
         ifp->if_obytes += next->m_pkthdr.len;
         if (next->m_flags & M_MCAST)
@@ -648,7 +650,6 @@ ixv_mq_start_locked(struct ifnet *ifp, struct tx_r
             ifp->if_drv_flags |= IFF_DRV_OACTIVE;
             break;
         }
-        next = drbr_dequeue(ifp, txr->br);
     }

     if (enqueued > 0) {
Index: dev/bxe/if_bxe.c
===================================================================
--- dev/bxe/if_bxe.c (revision 246323)
+++ dev/bxe/if_bxe.c (working copy)
@@ -9491,7 +9491,7 @@ bxe_tx_mq_start_locked(struct ifnet *ifp,
     struct bxe_fastpath *fp, struct mbuf *m)
 {
     struct bxe_softc *sc;
-    struct mbuf *next;
+    struct mbuf *next, *snext;
     int depth, rc, tx_count;

     sc = fp->sc;
@@ -9506,24 +9506,16 @@ bxe_tx_mq_start_locked(struct ifnet *ifp,

     BXE_FP_LOCK_ASSERT(fp);

-    if (m == NULL) {
-        /* No new work, check for pending frames. */
-        next = drbr_dequeue(ifp, fp->br);
-    } else if (drbr_needs_enqueue(ifp, fp->br)) {
-        /* Both new and pending work, maintain packet order. */
+    if (m) {
         rc = drbr_enqueue(ifp, fp->br, m);
         if (rc != 0) {
             fp->tx_soft_errors++;
             goto bxe_tx_mq_start_locked_exit;
         }
-        next = drbr_dequeue(ifp, fp->br);
-    } else
-        /* New work only, nothing pending. */
-        next = m;
-
+    }
     /* Keep adding entries while there are frames to send. */
-    while (next != NULL) {
-
+    while ((next = drbr_peek(ifp, fp->br)) != NULL) {
+        snext = next;
         /* The transmit mbuf now belongs to us, keep track of it. */
         fp->tx_mbuf_alloc++;

@@ -9537,23 +9529,22 @@ bxe_tx_mq_start_locked(struct ifnet *ifp,
         if (__predict_false(rc != 0)) {
             fp->tx_encap_failures++;
             /* Very Bad Frames(tm) may have been dropped. */
-            if (next != NULL) {
+            if (next == NULL) {
+                drbr_advance(ifp, fp->br);
+            } else {
+                drbr_putback(ifp, fp->br, next, snext);
                 /*
                  * Mark the TX queue as full and save
                  * the frame.
                  */
                 ifp->if_drv_flags |= IFF_DRV_OACTIVE;
                 fp->tx_frame_deferred++;
-
-                /* This may reorder frame. */
-                rc = drbr_enqueue(ifp, fp->br, next);
                 fp->tx_mbuf_alloc--;
             }
-
             /* Stop looking for more work. */
             break;
         }
-
+        drbr_advance(ifp, fp->br);
         /* The transmit frame was enqueued successfully. */
         tx_count++;

@@ -9574,8 +9565,6 @@ bxe_tx_mq_start_locked(struct ifnet *ifp,
             ifp->if_drv_flags &= ~IFF_DRV_OACTIVE;
             break;
         }
-
-        next = drbr_dequeue(ifp, fp->br);
     }

     /* No TX packets were dequeued. */
Index: net/if_var.h
===================================================================
--- net/if_var.h (revision 246323)
+++ net/if_var.h (working copy)
@@ -622,6 +622,47 @@ drbr_enqueue(struct ifnet *ifp, struct buf_ring *b
 }

 static __inline void
+drbr_putback(struct ifnet *ifp, struct buf_ring *br, struct mbuf *new, struct mbuf *prev)
+{
+    /*
+     * The top of the list needs to be swapped
+     * for this one.
+     */
+#ifdef ALTQ
+    struct mbuf *m;
+    if (ifp != NULL && ALTQ_IS_ENABLED(&ifp->if_snd)) {
+        /*
+         * Peek in altq case dequeued it
+         * so put it back.
+         */
+        IFQ_DRV_PREPEND(ifq, new);
+        return;
+    }
+#endif
+    if (new != prev)
+        buf_ring_swap(br, new, prev);
+}
+
+static __inline struct mbuf *
+drbr_peek(struct ifnet *ifp, struct buf_ring *br)
+{
+#ifdef ALTQ
+    struct mbuf *m;
+    if (ifp != NULL && ALTQ_IS_ENABLED(&ifp->if_snd)) {
+        /*
+         * Pull it off like a dequeue
+         * since drbr_advance() does nothing
+         * for altq and drbr_putback() will
+         * use the old prepend function.
+         */
+        IFQ_DEQUEUE(&ifp->if_snd, m);
+        return (m);
+    }
+#endif
+    return(buf_ring_peek(br));
+}
+
+static __inline void
 drbr_flush(struct ifnet *ifp, struct buf_ring *br)
 {
     struct mbuf *m;
@@ -656,6 +697,17 @@ drbr_dequeue(struct ifnet *ifp, struct buf_ring *b
     return (buf_ring_dequeue_sc(br));
 }

+static __inline void
+drbr_advance(struct ifnet *ifp, struct buf_ring *br)
+{
+#ifdef ALTQ
+    /* Nothing to do here since peek dequeues in altq case */
+    return;
+#endif
+    return (buf_ring_advance_sc(br));
+}
+
+
 static __inline struct mbuf *
 drbr_dequeue_cond(struct ifnet *ifp, struct buf_ring *br,
     int (*func) (struct mbuf *, void *), void *arg)
Index: sys/buf_ring.h
===================================================================
--- sys/buf_ring.h (revision 246323)
+++ sys/buf_ring.h (working copy)
@@ -208,6 +208,53 @@ buf_ring_dequeue_sc(struct buf_ring *br)
 }

 /*
+ * single-consumer advance after a peek
+ * use where it is protected by a lock
+ * e.g. a network driver's tx queue lock
+ */
+static __inline void
+buf_ring_advance_sc(struct buf_ring *br)
+{
+    uint32_t cons_head, cons_next;
+    uint32_t prod_tail;
+    void *buf;
+
+    cons_head = br->br_cons_head;
+    prod_tail = br->br_prod_tail;
+
+    cons_next = (cons_head + 1) & br->br_cons_mask;
+
+    if (cons_head == prod_tail)
+        return;
+
+    br->br_cons_head = cons_next;
+    buf = br->br_ring[cons_head];
+    br->br_cons_tail = cons_next;
+}
+
+
+/*
+ * Used to return a differnt mbuf to the
+ * top of the ring. This can happen if
+ * the driver changed the packets (some defragmentation
+ * for example) and then realized the transmit
+ * ring was full. In such a case the old packet
+ * is now freed, but we want the order of the actual
+ * data (being sent in the new packet) to remain
+ * the same.
+ */
+static __inline void
+buf_ring_swap(struct buf_ring *br, void *new, void *old)
+{
+    int ret;
+    if (br->br_cons_head == br->br_prod_tail)
+        /* Huh? */
+        return;
+    ret = atomic_cmpset_long((uint64_t *)&br->br_ring[br->br_cons_head], (uint64_t)old, (uint64_t)new);
+    KASSERT(ret, ("Swap out failed old:%p new:%p ret:%d", old, new, ret));
+}
+
+/*
  * return a pointer to the first entry in the ring
  * without modifying it, or NULL if the ring is empty
  * race-prone if not protected by a lock
Index: ofed/drivers/net/mlx4/en_tx.c
===================================================================
--- ofed/drivers/net/mlx4/en_tx.c (revision 246323)
+++ ofed/drivers/net/mlx4/en_tx.c (working copy)
@@ -919,7 +919,7 @@ mlx4_en_transmit_locked(struct ifnet *dev, int tx_
 {
     struct mlx4_en_priv *priv = netdev_priv(dev);
     struct mlx4_en_tx_ring *ring;
-    struct mbuf *next;
+    struct mbuf *next, *snext;
     int enqueued, err = 0;

     ring = &priv->tx_ring[tx_ind];
@@ -931,22 +931,22 @@ mlx4_en_transmit_locked(struct ifnet *dev, int tx_
     }

     enqueued = 0;
-    if (m == NULL) {
-        next = drbr_dequeue(dev, ring->br);
-    } else if (drbr_needs_enqueue(dev, ring->br)) {
+    if (m) {
         if ((err = drbr_enqueue(dev, ring->br, m)) != 0)
             return (err);
-        next = drbr_dequeue(dev, ring->br);
-    } else
-        next = m;
-
+    }
     /* Process the queue */
-    while (next != NULL) {
+    while ((next = drbr_peek(ifp, txr->br)) != NULL) {
+        snext = next;
         if ((err = mlx4_en_xmit(dev, tx_ind, &next)) != 0) {
-            if (next != NULL)
-                err = drbr_enqueue(dev, ring->br, next);
+            if (next == NULL) {
+                drbr_advance(ifp, txr->br);
+            } else {
+                drbr_putback(ifp, txr->br, next, snext);
+            }
             break;
         }
+        drbr_advance(ifp, txr->br);
         enqueued++;
         dev->if_obytes += next->m_pkthdr.len;
         if (next->m_flags & M_MCAST)
@@ -955,7 +955,6 @@ mlx4_en_transmit_locked(struct ifnet *dev, int tx_
         ETHER_BPF_MTAP(dev, next);
         if ((dev->if_drv_flags & IFF_DRV_RUNNING) == 0)
             break;
-        next = drbr_dequeue(dev, ring->br);
     }

     if (enqueued > 0)

------------------------------
Randall Stewart
803-317-4952 (cell)

From owner-freebsd-net@FreeBSD.ORG Tue Feb 5 13:19:10 2013
Date: Tue, 5 Feb 2013 13:19:10 GMT
Message-Id: <201302051319.r15DJAVT045239@freefall.freebsd.org>
To: linimon@FreeBSD.org, freebsd-amd64@FreeBSD.org, freebsd-net@FreeBSD.org
From: linimon@FreeBSD.org
Subject: Re: kern/175852: [amd64] [patch] in_cksum_hdr() behaves differently on amd64 vs i386 with unaligned data

Old Synopsis: in_cksum_hdr() behaves differently on amd64 vs i386 with unaligned data
New Synopsis: [amd64] [patch] in_cksum_hdr() behaves differently on amd64 vs i386 with unaligned data

Responsible-Changed-From-To: freebsd-amd64->freebsd-net
Responsible-Changed-By: linimon
Responsible-Changed-When: Tue Feb 5 13:17:43 UTC 2013
Responsible-Changed-Why:
Even though this is an amd64-specific patch, I'm going to try to assign it
to the networking mailing list since it affects the networking code.

http://www.freebsd.org/cgi/query-pr.cgi?pr=175852

From owner-freebsd-net@FreeBSD.ORG Tue Feb 5 14:44:30 2013
From: h bagade
Date: Tue, 5 Feb 2013 18:14:03 +0330
Subject: debug em driver code after applying patch
To: freebsd-net@freebsd.org

Hi all,

I applied a patch to the em driver code and I want to check how it works in
different situations. I need to put some output in different parts of the
code to trace what's going on in different situations.

I've tried writing to files and executing commands (like echo) using the
system() function, but with both methods, adding the needed headers causes
conflicts which I don't know how to resolve!

I've tried to use its macros like INIT_DEBUGOUT to print some messages, but
that only works at startup, not while the system is running!

I don't know how to print out messages to debug the code?! Is there anybody
who can help me handle it? I really need help.
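One thing worth checking in the drbr_advance() hunk of the second driver_patch.txt above: its `#ifdef ALTQ` branch returns unconditionally, so in a kernel built with ALTQ the buf_ring head would never be consumed, even on interfaces where ALTQ is not enabled and drbr_peek() therefore did not dequeue anything. (Similarly, drbr_putback() passes `ifq`, which does not appear to be declared in that function, to IFQ_DRV_PREPEND(); `&ifp->if_snd` looks like what was meant.) The sketch below is a user-space model of that decision, not kernel code: `model_ifnet`, `model_ring` and both functions are hypothetical, with the compile-time option and ALTQ_IS_ENABLED() turned into boolean fields. It contrasts the posted behaviour with one gated on the same runtime check drbr_peek() already uses. The kernel-side shape in the comment is a sketch, not necessarily what the follow-up patch does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * User-space model (hypothetical types) of the drbr_advance() decision.
 * Kernel-side shape being modelled, as a sketch:
 *
 *   static __inline void
 *   drbr_advance(struct ifnet *ifp, struct buf_ring *br)
 *   {
 *   #ifdef ALTQ
 *       if (ifp != NULL && ALTQ_IS_ENABLED(&ifp->if_snd))
 *           return;    (drbr_peek() already dequeued it)
 *   #endif
 *       buf_ring_advance_sc(br);
 *   }
 */
struct model_ifnet {
    bool altq_compiled;   /* kernel built with "options ALTQ" */
    bool altq_enabled;    /* ALTQ_IS_ENABLED(&ifp->if_snd) */
};

struct model_ring {
    unsigned cons;        /* consumer head */
    unsigned prod;        /* producer tail */
};

static void model_ring_advance(struct model_ring *br)
{
    if (br->cons != br->prod)
        br->cons++;
}

/* As posted: with ALTQ compiled in, the ring is never advanced. */
static void advance_as_posted(const struct model_ifnet *ifp,
    struct model_ring *br)
{
    if (ifp->altq_compiled)
        return;
    model_ring_advance(br);
}

/* Gated on the same per-interface check that drbr_peek() uses. */
static void advance_gated(const struct model_ifnet *ifp,
    struct model_ring *br)
{
    if (ifp != NULL && ifp->altq_compiled && ifp->altq_enabled)
        return;   /* peek dequeued from if_snd; nothing to consume */
    model_ring_advance(br);
}
```

For an interface with ALTQ compiled in but not enabled, advance_as_posted() leaves `cons` unchanged, so the next drbr_peek() would hand back the mbuf that was just transmitted; advance_gated() consumes it and still skips the ring for an ALTQ-enabled interface.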
From owner-freebsd-net@FreeBSD.ORG Tue Feb 5 15:24:26 2013 Return-Path: Delivered-To: freebsd-net@FreeBSD.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id A06F6ADC; Tue, 5 Feb 2013 15:24:26 +0000 (UTC) (envelope-from rrs@lakerest.net) Received: from lakerest.net (lakerest.net [70.155.160.98]) by mx1.freebsd.org (Postfix) with ESMTP id ACC57BC; Tue, 5 Feb 2013 15:24:25 +0000 (UTC) Received: from [10.1.1.101] (bsd4.lakerest.net [70.155.160.102]) (authenticated bits=0) by lakerest.net (8.14.4/8.14.3) with ESMTP id r15FOe2t041001 (version=TLSv1/SSLv3 cipher=AES128-SHA bits=128 verify=NOT); Tue, 5 Feb 2013 10:24:40 -0500 (EST) (envelope-from rrs@lakerest.net) Subject: Re: Driver patch to look at... Mime-Version: 1.0 (Apple Message framework v1283) Content-Type: multipart/mixed; boundary="Apple-Mail=_3460D60E-1D83-4AEB-A9DF-11C2A6881F5A" From: Randall Stewart In-Reply-To: <39571D84-A8C0-46A4-8EFA-CF74D862EAAE@lakerest.net> Date: Tue, 5 Feb 2013 10:24:24 -0500 Message-Id: References: <201302041524.58699.jhb@freebsd.org> <45AD1046-A630-4C96-B4D2-B8A7D6A6DC45@lakerest.net> <39571D84-A8C0-46A4-8EFA-CF74D862EAAE@lakerest.net> To: John Baldwin X-Mailer: Apple Mail (2.1283) Cc: freebsd-net , Robert Watson , Kip Macy , Jack Vogel X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 05 Feb 2013 15:24:26 -0000 --Apple-Mail=_3460D60E-1D83-4AEB-A9DF-11C2A6881F5A Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset=windows-1252 Here is an updated patch… sigh.. I foobar'd the ALTQ stuff..
lots of crashes ;-D R --Apple-Mail=_3460D60E-1D83-4AEB-A9DF-11C2A6881F5A Content-Disposition: attachment; filename=driver_patch.txt Content-Type: text/plain; x-unix-mode=0644; name="driver_patch.txt" Content-Transfer-Encoding: quoted-printable Index: dev/e1000/if_em.c =================================================================== --- dev/e1000/if_em.c (revision 246357) +++ dev/e1000/if_em.c (working copy) @@ -894,7 +894,7 @@ static int em_mq_start_locked(struct ifnet *ifp, struct tx_ring *txr, struct mbuf *m) { struct adapter *adapter = txr->adapter; - struct mbuf *next; + struct mbuf *next, *snext; int err = 0, enq = 0; if ((ifp->if_drv_flags & (IFF_DRV_RUNNING | IFF_DRV_OACTIVE)) != @@ -905,22 +905,25 @@ em_mq_start_locked(struct ifnet *ifp, struct tx_ri } enq = 0; - if (m == NULL) { - next = drbr_dequeue(ifp, txr->br); - } else if (drbr_needs_enqueue(ifp, txr->br)) { - if ((err = drbr_enqueue(ifp, txr->br, m)) != 0) + if (m) { + err = drbr_enqueue(ifp, txr->br, m); + if (err) { return (err); - next = drbr_dequeue(ifp, txr->br); - } else - next = m; + } + } /* Process the queue */ - while (next != NULL) { + while ((next = drbr_peek(ifp, txr->br)) != NULL) { + snext = next; if ((err = em_xmit(txr, &next)) != 0) { - if (next != NULL) - err = drbr_enqueue(ifp, txr->br, next); - break; + if (next == NULL) { + drbr_advance(ifp, txr->br); + } else { + drbr_putback(ifp, txr->br, next, snext); + } + break; } + drbr_advance(ifp, txr->br); enq++; ifp->if_obytes += next->m_pkthdr.len; if (next->m_flags & M_MCAST) @@ -928,7 +931,6 @@ em_mq_start_locked(struct ifnet *ifp, struct tx_ri ETHER_BPF_MTAP(ifp, next); if ((ifp->if_drv_flags & IFF_DRV_RUNNING) == 0) break; - next = drbr_dequeue(ifp, txr->br); } if (enq > 0) { Index: dev/e1000/if_igb.c
=================================================================== --- dev/e1000/if_igb.c (revision 246357) +++ dev/e1000/if_igb.c (working copy) @@ -981,7 +981,7 @@ static int igb_mq_start_locked(struct ifnet *ifp, struct tx_ring *txr) { struct adapter *adapter = txr->adapter; - struct mbuf *next; + struct mbuf *next, *snext; int err = 0, enq; IGB_TX_LOCK_ASSERT(txr); @@ -994,12 +994,23 @@ igb_mq_start_locked(struct ifnet *ifp, struct tx_r enq = 0; /* Process the queue */ - while ((next = drbr_dequeue(ifp, txr->br)) != NULL) { + while ((next = drbr_peek(ifp, txr->br)) != NULL) { + snext = next; if ((err = igb_xmit(txr, &next)) != 0) { - if (next != NULL) - err = drbr_enqueue(ifp, txr->br, next); + if (next == NULL) { + /* It was freed, move forward */ + drbr_advance(ifp, txr->br); + } else { + /* + * Still have one left, it may not be + * the same since the transmit function + * may have changed it.
+ */ + drbr_putback(ifp, txr->br, next, snext); + } break; } + drbr_advance(ifp, txr->br); enq++; ifp->if_obytes += next->m_pkthdr.len; if (next->m_flags & M_MCAST) Index: dev/oce/oce_if.c =================================================================== --- dev/oce/oce_if.c (revision 246357) +++ dev/oce/oce_if.c (working copy) @@ -1154,6 +1154,7 @@ oce_multiq_transmit(struct ifnet *ifp, struct mbuf POCE_SOFTC sc = ifp->if_softc; int status = 0, queue_index = 0; struct mbuf *next = NULL; + struct mbuf *snext; struct buf_ring *br = NULL; br = wq->br; @@ -1166,29 +1167,28 @@ oce_multiq_transmit(struct ifnet *ifp, struct mbuf return status; } - if (m == NULL) - next = drbr_dequeue(ifp, br); - else if (drbr_needs_enqueue(ifp, br)) { + if (m) { if ((status = drbr_enqueue(ifp, br, m)) != 0) return status; - next = drbr_dequeue(ifp, br); - } else - next = m; - - while (next != NULL) { + } + while ((next = drbr_peek(ifp, br)) != NULL) { + snext = next; if (oce_tx(sc, &next, queue_index)) { - if (next != NULL) { + if (next == NULL) { + drbr_advance(ifp, br); + } else { + drbr_putback(ifp, br, next, snext); wq->tx_stats.tx_stops ++; ifp->if_drv_flags |= IFF_DRV_OACTIVE; status = drbr_enqueue(ifp, br, next); } break; } + drbr_advance(ifp, br); ifp->if_obytes += next->m_pkthdr.len; if (next->m_flags & M_MCAST) ifp->if_omcasts++; ETHER_BPF_MTAP(ifp, next); - next = drbr_dequeue(ifp, br); } return status; Index: dev/ixgbe/ixgbe.c =================================================================== --- dev/ixgbe/ixgbe.c (revision 246357) +++ dev/ixgbe/ixgbe.c (working copy) @@ -821,7 +821,7 @@ static int ixgbe_mq_start_locked(struct
ifnet *ifp, struct tx_ring *txr, struct mbuf *m) { struct adapter *adapter = txr->adapter; - struct mbuf *next; + struct mbuf *next, *snext; int enqueued, err = 0; if (((ifp->if_drv_flags & IFF_DRV_RUNNING) == 0) || @@ -832,22 +832,25 @@ ixgbe_mq_start_locked(struct ifnet *ifp, struct tx } enqueued = 0; - if (m == NULL) { - next = drbr_dequeue(ifp, txr->br); - } else if (drbr_needs_enqueue(ifp, txr->br)) { - if ((err = drbr_enqueue(ifp, txr->br, m)) != 0) + if (m) { + err = drbr_enqueue(ifp, txr->br, m); + if (err) { return (err); - next = drbr_dequeue(ifp, txr->br); - } else - next = m; + } + } /* Process the queue */ - while (next != NULL) { + while ((next = drbr_peek(ifp, txr->br)) != NULL) { + snext = next; if ((err = ixgbe_xmit(txr, &next)) != 0) { - if (next != NULL) - err = drbr_enqueue(ifp, txr->br, next); + if (next == NULL) { + drbr_advance(ifp, txr->br); + } else { + drbr_putback(ifp, txr->br, next, snext); + } break; } + drbr_advance(ifp, txr->br); enqueued++; /* Send a copy of the frame to the BPF listener */ ETHER_BPF_MTAP(ifp, next); @@ -855,7 +858,6 @@ ixgbe_mq_start_locked(struct ifnet *ifp, struct tx break; if (txr->tx_avail < IXGBE_TX_OP_THRESHOLD) ixgbe_txeof(txr); - next = drbr_dequeue(ifp, txr->br); } if (enqueued > 0) { Index: dev/ixgbe/ixv.c =================================================================== --- dev/ixgbe/ixv.c (revision 246357) +++ dev/ixgbe/ixv.c (working copy) @@ -605,7 +605,7 @@ static int ixv_mq_start_locked(struct ifnet *ifp, struct tx_ring *txr, struct mbuf *m) { struct adapter *adapter = txr->adapter; - struct mbuf *next; + struct mbuf *next, *snext; int enqueued, err = 0; if ((ifp->if_drv_flags & (IFF_DRV_RUNNING | IFF_DRV_OACTIVE)) != @@ -620,22 +620,24 @@ ixv_mq_start_locked(struct ifnet *ifp,
struct tx_r ixv_txeof(txr); enqueued = 0; - if (m == NULL) { - next = drbr_dequeue(ifp, txr->br); - } else if (drbr_needs_enqueue(ifp, txr->br)) { - if ((err = drbr_enqueue(ifp, txr->br, m)) != 0) + if (m) { + err = drbr_enqueue(ifp, txr->br, m); + if (err) { return (err); - next = drbr_dequeue(ifp, txr->br); - } else - next = m; - + } + } /* Process the queue */ - while (next != NULL) { + while ((next = drbr_peek(ifp, txr->br)) != NULL) { + snext = next; if ((err = ixv_xmit(txr, &next)) != 0) { - if (next != NULL) - err = drbr_enqueue(ifp, txr->br, next); + if (next == NULL) { + drbr_advance(ifp, txr->br); + } else { + drbr_putback(ifp, txr->br, next, snext); + } break; } + drbr_advance(ifp, txr->br); enqueued++; ifp->if_obytes += next->m_pkthdr.len; if (next->m_flags & M_MCAST) @@ -648,7 +650,6 @@ ixv_mq_start_locked(struct ifnet *ifp, struct tx_r ifp->if_drv_flags |= IFF_DRV_OACTIVE; break; } - next = drbr_dequeue(ifp, txr->br); } if (enqueued > 0) { Index: dev/bxe/if_bxe.c =================================================================== --- dev/bxe/if_bxe.c (revision 246357) +++ dev/bxe/if_bxe.c (working copy) @@ -9491,7 +9491,7 @@ bxe_tx_mq_start_locked(struct ifnet *ifp, struct bxe_fastpath *fp, struct mbuf *m) { struct bxe_softc *sc; - struct mbuf *next; + struct mbuf *next, *snext; int depth, rc, tx_count; sc = fp->sc; @@ -9506,24 +9506,16 @@ bxe_tx_mq_start_locked(struct ifnet *ifp, BXE_FP_LOCK_ASSERT(fp); - if (m == NULL) { - /* No new work, check for pending frames. */ - next = drbr_dequeue(ifp, fp->br); - } else if (drbr_needs_enqueue(ifp, fp->br)) { - /* Both new and pending work, maintain packet order.
*/ + if (m) { rc = drbr_enqueue(ifp, fp->br, m); if (rc != 0) { fp->tx_soft_errors++; goto bxe_tx_mq_start_locked_exit; } - next = drbr_dequeue(ifp, fp->br); - } else - /* New work only, nothing pending. */ - next = m; - + } /* Keep adding entries while there are frames to send. */ - while (next != NULL) { - + while ((next = drbr_peek(ifp, fp->br)) != NULL) { + snext = next; /* The transmit mbuf now belongs to us, keep track of it. */ fp->tx_mbuf_alloc++; @@ -9537,23 +9529,22 @@ bxe_tx_mq_start_locked(struct ifnet *ifp, if (__predict_false(rc != 0)) { fp->tx_encap_failures++; /* Very Bad Frames(tm) may have been dropped. */ - if (next != NULL) { + if (next == NULL) { + drbr_advance(ifp, fp->br); + } else { + drbr_putback(ifp, fp->br, next, snext); /* * Mark the TX queue as full and save * the frame. */ ifp->if_drv_flags |= IFF_DRV_OACTIVE; fp->tx_frame_deferred++; - - /* This may reorder frame. */ - rc = drbr_enqueue(ifp, fp->br, next); fp->tx_mbuf_alloc--; } - /* Stop looking for more work. */ break; } - + drbr_advance(ifp, fp->br); /* The transmit frame was enqueued successfully. */ tx_count++; @@ -9574,8 +9565,6 @@ bxe_tx_mq_start_locked(struct ifnet *ifp, ifp->if_drv_flags &= ~IFF_DRV_OACTIVE; break; } - - next = drbr_dequeue(ifp, fp->br); } /* No TX packets were dequeued. */ Index: net/if_var.h =================================================================== --- net/if_var.h (revision 246357) +++ net/if_var.h (working copy) @@ -622,6 +622,46 @@ drbr_enqueue(struct ifnet *ifp, struct buf_ring *b } static __inline void +drbr_putback(struct ifnet *ifp, struct buf_ring *br, struct mbuf *new, struct mbuf *prev) +{ + /* + * The top of the list needs to be swapped + * for this one.
+ */ +#ifdef ALTQ + if (ifp != NULL && ALTQ_IS_ENABLED(&ifp->if_snd)) { + /* + * Peek in altq case dequeued it + * so put it back. + */ + IFQ_DRV_PREPEND(&ifp->if_snd, new); + return; + } +#endif + if (new != prev) + buf_ring_swap(br, new, prev); +} + +static __inline struct mbuf * +drbr_peek(struct ifnet *ifp, struct buf_ring *br) +{ +#ifdef ALTQ + struct mbuf *m; + if (ifp != NULL && ALTQ_IS_ENABLED(&ifp->if_snd)) { + /* + * Pull it off like a dequeue + * since drbr_advance() does nothing + * for altq and drbr_putback() will + * use the old prepend function. + */ + IFQ_DEQUEUE(&ifp->if_snd, m); + return (m); + } +#endif + return(buf_ring_peek(br)); +} + +static __inline void drbr_flush(struct ifnet *ifp, struct buf_ring *br) { struct mbuf *m; @@ -656,6 +696,18 @@ drbr_dequeue(struct ifnet *ifp, struct buf_ring *b return (buf_ring_dequeue_sc(br)); } +static __inline void +drbr_advance(struct ifnet *ifp, struct buf_ring *br) +{ +#ifdef ALTQ + /* Nothing to do here since peek dequeues in altq case */ + if (ALTQ_IS_ENABLED(&ifp->if_snd)) + return; +#endif + return (buf_ring_advance_sc(br)); +} + + static __inline struct mbuf * drbr_dequeue_cond(struct ifnet *ifp, struct buf_ring *br, int (*func) (struct mbuf *, void *), void *arg) Index: sys/buf_ring.h =================================================================== --- sys/buf_ring.h (revision 246357) +++ sys/buf_ring.h (working copy) @@ -208,6 +208,51 @@ buf_ring_dequeue_sc(struct buf_ring *br) } /* + * single-consumer advance after a peek + * use where it is protected by a lock + * e.g.
a network driver's tx queue lock + */ +static __inline void +buf_ring_advance_sc(struct buf_ring *br) +{ + uint32_t cons_head, cons_next; + uint32_t prod_tail; + + cons_head = br->br_cons_head; + prod_tail = br->br_prod_tail; + + cons_next = (cons_head + 1) & br->br_cons_mask; + + if (cons_head == prod_tail) + return; + + br->br_cons_head = cons_next; + br->br_cons_tail = cons_next; +} + + +/* + * Used to return a different mbuf to the + * top of the ring. This can happen if + * the driver changed the packets (some defragmentation + * for example) and then realized the transmit + * ring was full. In such a case the old packet + * is now freed, but we want the order of the actual + * data (being sent in the new packet) to remain + * the same. + */ +static __inline void +buf_ring_swap(struct buf_ring *br, void *new, void *old) +{ + int ret; + if (br->br_cons_head == br->br_prod_tail) + /* Huh? */ + return; + ret = atomic_cmpset_long((uint64_t *)&br->br_ring[br->br_cons_head], (uint64_t)old, (uint64_t)new); + KASSERT(ret, ("Swap out failed old:%p new:%p ret:%d", old, new, ret)); +} + +/* * return a pointer to the first entry in the ring * without modifying it, or NULL if the ring is empty * race-prone if not protected by a lock Index: ofed/drivers/net/mlx4/en_tx.c =================================================================== --- ofed/drivers/net/mlx4/en_tx.c (revision 246357) +++ ofed/drivers/net/mlx4/en_tx.c (working copy) @@ -919,7 +919,7 @@ mlx4_en_transmit_locked(struct ifnet *dev, int tx_ { struct mlx4_en_priv *priv = netdev_priv(dev); struct mlx4_en_tx_ring *ring; - struct mbuf *next; + struct mbuf *next, *snext; int enqueued, err = 0; ring = &priv->tx_ring[tx_ind]; @@ -931,22 +931,22 @@ mlx4_en_transmit_locked(struct ifnet *dev, int tx_ } enqueued = 0; - if (m ==
NULL) { - next = drbr_dequeue(dev, ring->br); - } else if (drbr_needs_enqueue(dev, ring->br)) { + if (m) { if ((err = drbr_enqueue(dev, ring->br, m)) != 0) return (err); - next = drbr_dequeue(dev, ring->br); - } else - next = m; - + } /* Process the queue */ - while (next != NULL) { + while ((next = drbr_peek(ifp, txr->br)) != NULL) { + snext = next; if ((err = mlx4_en_xmit(dev, tx_ind, &next)) != 0) { - if (next != NULL) - err = drbr_enqueue(dev, ring->br, next); + if (next == NULL) { + drbr_advance(ifp, txr->br); + } else { + drbr_putback(ifp, txr->br, next, snext); + } break; } + drbr_advance(ifp, txr->br); enqueued++; dev->if_obytes += next->m_pkthdr.len; if (next->m_flags & M_MCAST) @@ -955,7 +955,6 @@ mlx4_en_transmit_locked(struct ifnet *dev, int tx_ ETHER_BPF_MTAP(dev, next); if ((dev->if_drv_flags & IFF_DRV_RUNNING) == 0) break; - next = drbr_dequeue(dev, ring->br); } if (enqueued > 0) --Apple-Mail=_3460D60E-1D83-4AEB-A9DF-11C2A6881F5A Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset=windows-1252 On Feb 5, 2013, at 6:49 AM, Randall Stewart wrote: > John: > > Here is an updated patch, per your suggestions. Note that I also > expanded and the only driver that uses these methods I did not touch > is the cxgb, but that's because I am not really sure it has the problem… it > does not quite enqueue the same way (it appears) that the other drivers do ;-) > > R > > > On Feb 5, 2013, at 5:49 AM, Randy Stewart wrote: > >> >> On Feb 4, 2013, at 3:24 PM, John Baldwin wrote: >> >>> On Monday, February 04, 2013 12:22:49 pm Randy Stewart wrote: >>>> All: >>>> >>>> I have been working with TCP in gigabit networks (igb driver actually) and have >>>> found a very nasty problem with the way the driver is doing its put back when >>>> it fills the out-bound transmit queue.
>>>> >>>> Basically it has taken a packet from the head of the ring buffer, and then >>>> realizes it can't fit it into the transmit queue. So it just re-enqueues it >>>> into the ring buffer. What's wrong with that? Well most of the time there >>>> are anywhere from 10-50 packets (maybe more) in that ring buffer when you are >>>> operating at full speed (or trying to). This means you will see 10 duplicate >>>> ACKs, do a fast retransmit and cut your cwnd in half.. not very nice actually. >>>> >>>> The patch I have attached makes it so that >>>> >>>> 1) There are ways to swap back. >>>> 2) Use the peek in the ring buffer and only >>>> dequeue the packet if we put it into the transmit ring >>>> 3) If something goes wrong and the transmit frees the packet we dequeue it. >>>> 4) If the transmit changed it (defrag etc) then swap out the new mbuf that >>>> has taken its place. >>>> >>>> I have fixed the four Intel drivers that had this systemic issue, but there >>>> are still more to fix. >>>> >>>> Comments/review .. rotten eggs etc.. would be most welcome before >>>> I commit this.. >>> >>> Does this only happen in drivers that use buffering? >> >> Yep, there are a lot of drivers that *do not* use the drbr_xxxx() functions and >> for those they do the IFQ_DRV_PREPEND().. it's only the newer drivers like the >> Intel 1Gig and 10Gig ones that seem affected >> >> Also affected are: >> >> bxe >> cxgb >> oce >> en >> >> I have not fixed those yet. >> >>> I seem to recall that >>> drivers using IFQ would just stuff the packet at the head of the IFQ via >>> IFQ_DRV_PREPEND() in this case so it is still the next packet to transmit. >>> See, for example, this bit in dc_start_locked(): >>> >>> for (queued = 0; !IFQ_DRV_IS_EMPTY(&ifp->if_snd); ) { >>> /* >>> * If there's no way we can send any packets, return now.
>>> */ >>> if (sc->dc_cdata.dc_tx_cnt > DC_TX_LIST_CNT - DC_TX_LIST_RSVD) { >>> ifp->if_drv_flags |= IFF_DRV_OACTIVE; >>> break; >>> } >>> IFQ_DRV_DEQUEUE(&ifp->if_snd, m_head); >>> if (m_head == NULL) >>> break; >>> >>> if (dc_encap(sc, &m_head)) { >>> if (m_head == NULL) >>> break; >>> IFQ_DRV_PREPEND(&ifp->if_snd, m_head); >>> ifp->if_drv_flags |= IFF_DRV_OACTIVE; >>> break; >>> } >>> >>> It sounds like drbr/buf_ring just don't handle this case correctly? It >>> seems a shame to have to duplicate so much code in the various drivers to >>> fix this, but that seems to be par for the course when using buf_ring. :( >>> (buggy in edge cases and lots of duplicated code that is). >> >>> Also, doing the drbr_swap() just so that drbr_dequeue() returns what you >>> just swapped in seems... odd. It seems that it would be nicer instead >>> to have some sort of drbr_peek() / drbr_advance() where the latter just >>> skips over whatever the current head is? Then you could have something >>> like: >>> >>> while ((next = drbr_peek()) != NULL) { >>> if (!foo_encap(&next)) { >>> if (next == NULL) >>> drbr_advance(); >>> break; >>> } >>> drbr_advance(); >>> } >>> >> >> That was what I originally did (without the rename), but there is a sure crash waiting in that. >> The only difference from what I originally had was just drbr_dequeue().. but >> I was being a bit lazy and not wanting to add yet another function to the >> drbr_xxxx code since essentially it would be a clone of drbr_dequeue() without >> returning the mbuf. >> >> The crash potential here is in that foo_encap(&next) often times will return >> a different mbuf (at least in the igb driver it does). I believe it's due >> to either the m_pullup() which could change the lead mbuf you want >> in the drbr_ring, or the m_defrag all within igb_xmit. Thus you have
Thus you have >> to track what comes back from the !foo_encap() call and compare it to=20= >> see if you must swap it out.=20 >>=20 >>=20 >>> I guess the sticky widget is the case of ATLQ as you need to dequeue = the >>> packet always in the ALTQ case and put it back if the encap fails. >>=20 >> Yeah ALTQ is ugly and IMO we need to re-write it anyway.. maybe = re-think >> this whole layer ;-0 >>=20 >>> For >>> your patch it's not clear to me how that works. It seems that if = the >>> encap routine frees the mbuf you try to dereference a freed pointer = when >>> you call drbr_dequeue(). >>=20 >> Hmm you are right.. I forgot how we keep those using the mbuf = itself... >>=20 >>> I really think you will instead need some sort >>> of 'drbr_putback()' and have 'drbr_peek()' dequeue in the ALTQ case = and >>> use 'drbr_putback()' to put it back (PREPEND) in the ALTQ case. >>=20 >> We could do that but drbr_putback() would probably need both the old >> and new pointers and then we could make it do the ring_swap() to put >> the right mbuf in place.. 
>> >> Let me go explore that and come up with a better patch ;-) >> >> R >> >> >>> >>> -- >>> John Baldwin >>> _______________________________________________ >>> freebsd-net@freebsd.org mailing list >>> http://lists.freebsd.org/mailman/listinfo/freebsd-net >>> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org" >>> >> >> ----- >> Randall Stewart >> randall@lakerest.net >> >> >> >> >> _______________________________________________ >> freebsd-net@freebsd.org mailing list >> http://lists.freebsd.org/mailman/listinfo/freebsd-net >> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org" >> > > ------------------------------ > Randall Stewart > 803-317-4952 (cell) > > _______________________________________________ > freebsd-net@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-net > To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org" ------------------------------ Randall Stewart 803-317-4952 (cell) --Apple-Mail=_3460D60E-1D83-4AEB-A9DF-11C2A6881F5A-- From owner-freebsd-net@FreeBSD.ORG Tue Feb 5 17:24:43 2013 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id D3C2120A; Tue, 5 Feb 2013 17:24:43 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from bigwig.baldwin.cx (bigknife-pt.tunnel.tserv9.chi1.ipv6.he.net [IPv6:2001:470:1f10:75::2]) by mx1.freebsd.org (Postfix) with ESMTP id ABF60A77; Tue, 5 Feb 2013 17:24:43 +0000 (UTC) Received: from pakbsde14.localnet (unknown [38.105.238.108]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id 195DEB91E; Tue, 5 Feb 2013 12:24:43 -0500 (EST) From: John Baldwin To: freebsd-net@freebsd.org Subject: Re: [PATCH] Add a new TCP_IGNOREIDLE socket option Date: Tue, 5 Feb 2013 12:11:42 -0500 User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110714-p22; KDE/4.5.5; amd64; ; ) References:
<201301221511.02496.jhb@freebsd.org> <201301301158.33838.jhb@freebsd.org> <510957B9.8070203@freebsd.org> In-Reply-To: <510957B9.8070203@freebsd.org> MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Message-Id: <201302051211.43345.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7 (bigwig.baldwin.cx); Tue, 05 Feb 2013 12:24:43 -0500 (EST) Cc: Sepherosa Ziehau , Bjoern Zeeb , Andre Oppermann X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 05 Feb 2013 17:24:43 -0000 On Wednesday, January 30, 2013 12:26:17 pm Andre Oppermann wrote: > You can simply create your own congestion control algorithm with only the > restart window changed. See (pseudo) code below. BTW, I just noticed that > the other cc algos do not reset the idle window. *sigh* I am fully competent at maintaining my own local changes. The point was to share this so that other people with similar workloads could make use of it. Also, a custom CC algo is not the right approach, as we would want this change regardless of the CC algo used for handling non-idle periods (so that this is an orthogonal knob). Linux also makes this an orthogonal knob rather than requiring a separate CC algo.
-- John Baldwin From owner-freebsd-net@FreeBSD.ORG Tue Feb 5 17:24:45 2013 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 98A8E211; Tue, 5 Feb 2013 17:24:45 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from bigwig.baldwin.cx (bigknife-pt.tunnel.tserv9.chi1.ipv6.he.net [IPv6:2001:470:1f10:75::2]) by mx1.freebsd.org (Postfix) with ESMTP id 7557FA79; Tue, 5 Feb 2013 17:24:45 +0000 (UTC) Received: from pakbsde14.localnet (unknown [38.105.238.108]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id CB521B946; Tue, 5 Feb 2013 12:24:44 -0500 (EST) From: John Baldwin To: freebsd-net@freebsd.org Subject: Re: Driver patch to look at... Date: Tue, 5 Feb 2013 12:13:51 -0500 User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110714-p22; KDE/4.5.5; amd64; ; ) References: <39571D84-A8C0-46A4-8EFA-CF74D862EAAE@lakerest.net> In-Reply-To: MIME-Version: 1.0 Content-Type: Text/Plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <201302051213.51401.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7 (bigwig.baldwin.cx); Tue, 05 Feb 2013 12:24:44 -0500 (EST) Cc: Jack Vogel , Robert Watson , Randall Stewart , Kip Macy X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 05 Feb 2013 17:24:45 -0000 On Tuesday, February 05, 2013 10:24:24 am Randall Stewart wrote: > Here is an updated patch… sigh.. I foobar'd the ALTQ stuff.. lots of crashes ;-D Heh, I like this better, thanks. I think you can remove buf_ring_swap() as it is no longer used?
-- John Baldwin From owner-freebsd-net@FreeBSD.ORG Tue Feb 5 17:44:09 2013 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id C8130A0E; Tue, 5 Feb 2013 17:44:09 +0000 (UTC) (envelope-from rrs@lakerest.net) Received: from lakerest.net (lakerest.net [70.155.160.98]) by mx1.freebsd.org (Postfix) with ESMTP id EB918C2C; Tue, 5 Feb 2013 17:44:08 +0000 (UTC) Received: from [10.1.1.101] (bsd4.lakerest.net [70.155.160.102]) (authenticated bits=0) by lakerest.net (8.14.4/8.14.3) with ESMTP id r15HiIPr042847 (version=TLSv1/SSLv3 cipher=AES128-SHA bits=128 verify=NOT); Tue, 5 Feb 2013 12:44:18 -0500 (EST) (envelope-from rrs@lakerest.net) Subject: Re: Driver patch to look at... Mime-Version: 1.0 (Apple Message framework v1283) Content-Type: multipart/mixed; boundary="Apple-Mail=_653FCD68-7101-4A93-A68D-4D518FFFEDD1" From: Randall Stewart In-Reply-To: <201302051213.51401.jhb@freebsd.org> Date: Tue, 5 Feb 2013 12:44:01 -0500 Message-Id: <0D421326-9A80-4E21-A18E-E717F5C02164@lakerest.net> References: <39571D84-A8C0-46A4-8EFA-CF74D862EAAE@lakerest.net> <201302051213.51401.jhb@freebsd.org> To: John Baldwin X-Mailer: Apple Mail (2.1283) Cc: freebsd-net@freebsd.org, Robert Watson , Jack Vogel , Kip Macy X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 05 Feb 2013 17:44:09 -0000 --Apple-Mail=_653FCD68-7101-4A93-A68D-4D518FFFEDD1 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset=us-ascii Actually, no, it is used. If you look in if_var.h in the drbr_putback() function, it does a buf_ring_swap when the old mbuf pointer does not equal the new mbuf pointer.
This *does* happen; I crashed at least once yesterday when the igb driver did something to free the original mbuf and return a new mbuf with the data (prepend or some such). I also have found several issues that I have fixed this morning.. it's been crash city on my test beds.. Here is the latest patch with all fixes and suggested changes from emaste (thanks Ed) R --Apple-Mail=_653FCD68-7101-4A93-A68D-4D518FFFEDD1 Content-Disposition: attachment; filename=driver_patch.txt Content-Type: text/plain; name="driver_patch.txt" Content-Transfer-Encoding: quoted-printable Index: dev/e1000/if_em.c =================================================================== --- dev/e1000/if_em.c (revision 246357) +++ dev/e1000/if_em.c (working copy) @@ -894,7 +894,7 @@ static int em_mq_start_locked(struct ifnet *ifp, struct tx_ring *txr, struct mbuf *m) { struct adapter *adapter = txr->adapter; - struct mbuf *next; + struct mbuf *next, *snext; int err = 0, enq = 0; if ((ifp->if_drv_flags & (IFF_DRV_RUNNING | IFF_DRV_OACTIVE)) != @@ -905,22 +905,25 @@ em_mq_start_locked(struct ifnet *ifp, struct tx_ri } enq = 0; - if (m == NULL) { - next = drbr_dequeue(ifp, txr->br); - } else if (drbr_needs_enqueue(ifp, txr->br)) { - if ((err = drbr_enqueue(ifp, txr->br, m)) != 0) + if (m != NULL) { + err = drbr_enqueue(ifp, txr->br, m); + if (err) { return (err); - next = drbr_dequeue(ifp, txr->br); - } else - next = m; + } + } /* Process the queue */ - while (next != NULL) { + while ((next = drbr_peek(ifp, txr->br)) != NULL) { + snext = next; if ((err = em_xmit(txr, &next)) != 0) { - if (next != NULL) - err = drbr_enqueue(ifp, txr->br, next); - break; + if (next == NULL) { + drbr_advance(ifp, txr->br); + } else { + drbr_putback(ifp, txr->br, next, snext); + } + break; } + drbr_advance(ifp,
txr->br); enq++; ifp->if_obytes += next->m_pkthdr.len; if (next->m_flags & M_MCAST) @@ -928,7 +931,6 @@ em_mq_start_locked(struct ifnet *ifp, struct tx_ri ETHER_BPF_MTAP(ifp, next); if ((ifp->if_drv_flags & IFF_DRV_RUNNING) == 0) break; - next = drbr_dequeue(ifp, txr->br); } if (enq > 0) { Index: dev/e1000/if_igb.c =================================================================== --- dev/e1000/if_igb.c (revision 246357) +++ dev/e1000/if_igb.c (working copy) @@ -965,12 +965,13 @@ igb_mq_start(struct ifnet *ifp, struct mbuf *m) ** out-of-order delivery, but ** settle for it if that fails */ - if (m) + if (m != NULL) drbr_enqueue(ifp, txr->br, m); err = igb_mq_start_locked(ifp, txr); IGB_TX_UNLOCK(txr); } else { - err = drbr_enqueue(ifp, txr->br, m); + if (m != NULL) + err = drbr_enqueue(ifp, txr->br, m); taskqueue_enqueue(que->tq, &txr->txq_task); } @@ -981,7 +982,7 @@ static int igb_mq_start_locked(struct ifnet *ifp, struct tx_ring *txr) { struct adapter *adapter = txr->adapter; - struct mbuf *next; + struct mbuf *next, *snext; int err = 0, enq; IGB_TX_LOCK_ASSERT(txr); @@ -994,12 +995,23 @@ igb_mq_start_locked(struct ifnet *ifp, struct tx_r enq = 0; /* Process the queue */ - while ((next = drbr_dequeue(ifp, txr->br)) != NULL) { + while ((next = drbr_peek(ifp, txr->br)) != NULL) { + snext = next; if ((err = igb_xmit(txr, &next)) != 0) { - if (next != NULL) - err = drbr_enqueue(ifp, txr->br, next); + if (next == NULL) { + /* It was freed, move forward */ + drbr_advance(ifp, txr->br); + } else { + /* + * Still have one left, it may not be + * the same since the transmit function + * may have changed it.
+ */ + drbr_putback(ifp, txr->br, next, snext); + } break; } + drbr_advance(ifp, txr->br); enq++; ifp->if_obytes += next->m_pkthdr.len; if (next->m_flags & M_MCAST) Index: dev/oce/oce_if.c =================================================================== --- dev/oce/oce_if.c (revision 246357) +++ dev/oce/oce_if.c (working copy) @@ -1154,6 +1154,7 @@ oce_multiq_transmit(struct ifnet *ifp, struct mbuf POCE_SOFTC sc = ifp->if_softc; int status = 0, queue_index = 0; struct mbuf *next = NULL; + struct mbuf *snext; struct buf_ring *br = NULL; br = wq->br; @@ -1166,29 +1167,28 @@ oce_multiq_transmit(struct ifnet *ifp, struct mbuf return status; } - if (m == NULL) - next = drbr_dequeue(ifp, br); - else if (drbr_needs_enqueue(ifp, br)) { + if (m != NULL) { if ((status = drbr_enqueue(ifp, br, m)) != 0) return status; - next = drbr_dequeue(ifp, br); - } else - next = m; - - while (next != NULL) { + } + while ((next = drbr_peek(ifp, br)) != NULL) { + snext = next; if (oce_tx(sc, &next, queue_index)) { - if (next != NULL) { + if (next == NULL) { + drbr_advance(ifp, br); + } else { + drbr_putback(ifp, br, next, snext); wq->tx_stats.tx_stops ++; ifp->if_drv_flags |= IFF_DRV_OACTIVE; status = drbr_enqueue(ifp, br, next); } break; } + drbr_advance(ifp, br); ifp->if_obytes += next->m_pkthdr.len; if (next->m_flags & M_MCAST) ifp->if_omcasts++; ETHER_BPF_MTAP(ifp, next); - next = drbr_dequeue(ifp, br); } return status; Index: dev/ixgbe/ixgbe.c =================================================================== --- dev/ixgbe/ixgbe.c (revision 246357) +++ dev/ixgbe/ixgbe.c (working copy) @@ -821,7 +821,7 @@ static int
ixgbe_mq_start_locked(struct ifnet *ifp, struct tx_ring *txr, struct mbuf *m) { struct adapter *adapter = txr->adapter; - struct mbuf *next; + struct mbuf *next, *snext; int enqueued, err = 0; if (((ifp->if_drv_flags & IFF_DRV_RUNNING) == 0) || @@ -832,22 +832,25 @@ ixgbe_mq_start_locked(struct ifnet *ifp, struct tx } enqueued = 0; - if (m == NULL) { - next = drbr_dequeue(ifp, txr->br); - } else if (drbr_needs_enqueue(ifp, txr->br)) { - if ((err = drbr_enqueue(ifp, txr->br, m)) != 0) + if (m != NULL) { + err = drbr_enqueue(ifp, txr->br, m); + if (err) { return (err); - next = drbr_dequeue(ifp, txr->br); - } else - next = m; + } + } /* Process the queue */ - while (next != NULL) { + while ((next = drbr_peek(ifp, txr->br)) != NULL) { + snext = next; if ((err = ixgbe_xmit(txr, &next)) != 0) { - if (next != NULL) - err = drbr_enqueue(ifp, txr->br, next); + if (next == NULL) { + drbr_advance(ifp, txr->br); + } else { + drbr_putback(ifp, txr->br, next, snext); + } break; } + drbr_advance(ifp, txr->br); enqueued++; /* Send a copy of the frame to the BPF listener */ ETHER_BPF_MTAP(ifp, next); @@ -855,7 +858,6 @@ ixgbe_mq_start_locked(struct ifnet *ifp, struct tx break; if (txr->tx_avail < IXGBE_TX_OP_THRESHOLD) ixgbe_txeof(txr); - next = drbr_dequeue(ifp, txr->br); } if (enqueued > 0) { Index: dev/ixgbe/ixv.c =================================================================== --- dev/ixgbe/ixv.c (revision 246357) +++ dev/ixgbe/ixv.c (working copy) @@ -605,7 +605,7 @@ static int ixv_mq_start_locked(struct ifnet *ifp, struct tx_ring *txr, struct mbuf *m) { struct adapter *adapter = txr->adapter; - struct mbuf *next; + struct mbuf *next, *snext; int enqueued, err = 0; if ((ifp->if_drv_flags & (IFF_DRV_RUNNING | IFF_DRV_OACTIVE)) != @@ -620,22 +620,24 @@
ixv_mq_start_locked(struct ifnet *ifp, struct tx_r ixv_txeof(txr); enqueued = 0; - if (m == NULL) { - next = drbr_dequeue(ifp, txr->br); - } else if (drbr_needs_enqueue(ifp, txr->br)) { - if ((err = drbr_enqueue(ifp, txr->br, m)) != 0) + if (m != NULL) { + err = drbr_enqueue(ifp, txr->br, m); + if (err) { return (err); - next = drbr_dequeue(ifp, txr->br); - } else - next = m; - + } + } /* Process the queue */ - while (next != NULL) { + while ((next = drbr_peek(ifp, txr->br)) != NULL) { + snext = next; if ((err = ixv_xmit(txr, &next)) != 0) { - if (next != NULL) - err = drbr_enqueue(ifp, txr->br, next); + if (next == NULL) { + drbr_advance(ifp, txr->br); + } else { + drbr_putback(ifp, txr->br, next, snext); + } break; } + drbr_advance(ifp, txr->br); enqueued++; ifp->if_obytes += next->m_pkthdr.len; if (next->m_flags & M_MCAST) @@ -648,7 +650,6 @@ ixv_mq_start_locked(struct ifnet *ifp, struct tx_r ifp->if_drv_flags |= IFF_DRV_OACTIVE; break; } - next = drbr_dequeue(ifp, txr->br); } if (enqueued > 0) { Index: dev/bxe/if_bxe.c =================================================================== --- dev/bxe/if_bxe.c (revision 246357) +++ dev/bxe/if_bxe.c (working copy) @@ -9491,7 +9491,7 @@ bxe_tx_mq_start_locked(struct ifnet *ifp, struct bxe_fastpath *fp, struct mbuf *m) { struct bxe_softc *sc; - struct mbuf *next; + struct mbuf *next, *snext; int depth, rc, tx_count; sc = fp->sc; @@ -9506,24 +9506,16 @@ bxe_tx_mq_start_locked(struct ifnet *ifp, BXE_FP_LOCK_ASSERT(fp); - if (m == NULL) { - /* No new work, check for pending frames. */ - next = drbr_dequeue(ifp, fp->br); - } else if (drbr_needs_enqueue(ifp, fp->br)) { - /* Both new and pending work, maintain packet order.
*/ + if (m != NULL) { rc = drbr_enqueue(ifp, fp->br, m); if (rc != 0) { fp->tx_soft_errors++; goto bxe_tx_mq_start_locked_exit; } - next = drbr_dequeue(ifp, fp->br); - } else - /* New work only, nothing pending. */ - next = m; - + } /* Keep adding entries while there are frames to send. */ - while (next != NULL) { - + while ((next = drbr_peek(ifp, fp->br)) != NULL) { + snext = next; /* The transmit mbuf now belongs to us, keep track of it. */ fp->tx_mbuf_alloc++; @@ -9537,23 +9529,22 @@ bxe_tx_mq_start_locked(struct ifnet *ifp, if (__predict_false(rc != 0)) { fp->tx_encap_failures++; /* Very Bad Frames(tm) may have been dropped. */ - if (next != NULL) { + if (next == NULL) { + drbr_advance(ifp, fp->br); + } else { + drbr_putback(ifp, fp->br, next, snext); /* * Mark the TX queue as full and save * the frame. */ ifp->if_drv_flags |= IFF_DRV_OACTIVE; fp->tx_frame_deferred++; - - /* This may reorder frame. */ - rc = drbr_enqueue(ifp, fp->br, next); fp->tx_mbuf_alloc--; } - /* Stop looking for more work. */ break; } - + drbr_advance(ifp, fp->br); /* The transmit frame was enqueued successfully. */ tx_count++; @@ -9574,8 +9565,6 @@ bxe_tx_mq_start_locked(struct ifnet *ifp, ifp->if_drv_flags &= ~IFF_DRV_OACTIVE; break; } - - next = drbr_dequeue(ifp, fp->br); } /* No TX packets were dequeued. */ Index: net/if_var.h =================================================================== --- net/if_var.h (revision 246357) +++ net/if_var.h (working copy) @@ -622,6 +622,46 @@ drbr_enqueue(struct ifnet *ifp, struct buf_ring *b } static __inline void +drbr_putback(struct ifnet *ifp, struct buf_ring *br, struct mbuf *new, struct mbuf *prev) +{ + /* + * The top of the list needs to be swapped + * for this one.
+ */ +#ifdef ALTQ + if (ifp != NULL && ALTQ_IS_ENABLED(&ifp->if_snd)) { + /* + * Peek in altq case dequeued it + * so put it back. + */ + IFQ_DRV_PREPEND(&ifp->if_snd, new); + return; + } +#endif + if (new != prev) + buf_ring_swap(br, new, prev); +} + +static __inline struct mbuf * +drbr_peek(struct ifnet *ifp, struct buf_ring *br) +{ +#ifdef ALTQ + struct mbuf *m; + if (ifp != NULL && ALTQ_IS_ENABLED(&ifp->if_snd)) { + /* + * Pull it off like a dequeue + * since drbr_advance() does nothing + * for altq and drbr_putback() will + * use the old prepend function. + */ + IFQ_DEQUEUE(&ifp->if_snd, m); + return (m); + } +#endif + return(buf_ring_peek(br)); +} + +static __inline void drbr_flush(struct ifnet *ifp, struct buf_ring *br) { struct mbuf *m; @@ -648,7 +688,7 @@ drbr_dequeue(struct ifnet *ifp, struct buf_ring *b #ifdef ALTQ struct mbuf *m; - if (ALTQ_IS_ENABLED(&ifp->if_snd)) { + if (ifp != NULL && ALTQ_IS_ENABLED(&ifp->if_snd)) { IFQ_DEQUEUE(&ifp->if_snd, m); return (m); } @@ -656,6 +696,18 @@ drbr_dequeue(struct ifnet *ifp, struct buf_ring *b return (buf_ring_dequeue_sc(br)); } +static __inline void +drbr_advance(struct ifnet *ifp, struct buf_ring *br) +{ +#ifdef ALTQ + /* Nothing to do here since peek dequeues in altq case */ + if (ifp != NULL && ALTQ_IS_ENABLED(&ifp->if_snd)) + return; +#endif + return (buf_ring_advance_sc(br)); +} + + static __inline struct mbuf * drbr_dequeue_cond(struct ifnet *ifp, struct buf_ring *br, int (*func) (struct mbuf *, void *), void *arg) Index: sys/buf_ring.h =================================================================== --- sys/buf_ring.h (revision 246357) +++ sys/buf_ring.h (working copy) @@ -208,6 +208,51 @@ buf_ring_dequeue_sc(struct buf_ring *br) } /* + * single-consumer advance after a peek + * use where it is protected by a lock + * e.g.
a network driver's tx queue lock + */ +static __inline void +buf_ring_advance_sc(struct buf_ring *br) +{ + uint32_t cons_head, cons_next; + uint32_t prod_tail; + + cons_head = br->br_cons_head; + prod_tail = br->br_prod_tail; + + cons_next = (cons_head + 1) & br->br_cons_mask; + if (cons_head == prod_tail) + return; + br->br_cons_head = cons_next; +#ifdef DEBUG_BUFRING + br->br_ring[cons_head] = NULL; +#endif + br->br_cons_tail = cons_next; +} + +/* + * Used to return a different mbuf to the + * top of the ring. This can happen if + * the driver changed the packets (some defragmentation + * for example) and then realized the transmit + * ring was full. In such a case the old packet + * is now freed, but we want the order of the actual + * data (being sent in the new packet) to remain + * the same. + */ +static __inline void +buf_ring_swap(struct buf_ring *br, void *new, void *old) +{ + int ret; + if (br->br_cons_head == br->br_prod_tail) + /* Huh? */ + return; + ret = atomic_cmpset_long((uint64_t *)&br->br_ring[br->br_cons_head], (uint64_t)old, (uint64_t)new); + KASSERT(ret, ("Swap out failed old:%p new:%p ret:%d", old, new, ret)); +} + +/* * return a pointer to the first entry in the ring * without modifying it, or NULL if the ring is empty * race-prone if not protected by a lock Index: ofed/drivers/net/mlx4/en_tx.c =================================================================== --- ofed/drivers/net/mlx4/en_tx.c (revision 246357) +++ ofed/drivers/net/mlx4/en_tx.c (working copy) @@ -919,7 +919,7 @@ mlx4_en_transmit_locked(struct ifnet *dev, int tx_ { struct mlx4_en_priv *priv = netdev_priv(dev); struct mlx4_en_tx_ring *ring; - struct mbuf *next; + struct mbuf *next, *snext; int enqueued, err = 0; ring = &priv->tx_ring[tx_ind]; @@ -931,22 +931,22 @@ mlx4_en_transmit_locked(struct
ifnet *dev, int tx_ } enqueued = 0; - if (m == NULL) { - next = drbr_dequeue(dev, ring->br); - } else if (drbr_needs_enqueue(dev, ring->br)) { + if (m != NULL) { if ((err = drbr_enqueue(dev, ring->br, m)) != 0) return (err); - next = drbr_dequeue(dev, ring->br); - } else - next = m; - + } /* Process the queue */ - while (next != NULL) { + while ((next = drbr_peek(ifp, txr->br)) != NULL) { + snext = next; if ((err = mlx4_en_xmit(dev, tx_ind, &next)) != 0) { - if (next != NULL) - err = drbr_enqueue(dev, ring->br, next); + if (next == NULL) { + drbr_advance(ifp, txr->br); + } else { + drbr_putback(ifp, txr->br, next, snext); + } break; } + drbr_advance(ifp, txr->br); enqueued++; dev->if_obytes += next->m_pkthdr.len; if (next->m_flags & M_MCAST) @@ -955,7 +955,6 @@ mlx4_en_transmit_locked(struct ifnet *dev, int tx_ ETHER_BPF_MTAP(dev, next); if ((dev->if_drv_flags & IFF_DRV_RUNNING) == 0) break; - next = drbr_dequeue(dev, ring->br); } if (enqueued > 0) --Apple-Mail=_653FCD68-7101-4A93-A68D-4D518FFFEDD1 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset=windows-1252 On Feb 5, 2013, at 12:13 PM, John Baldwin wrote: > On Tuesday, February 05, 2013 10:24:24 am Randall Stewart wrote: >> Here is an updated patch… sigh.. I foobar'd the ALTQ stuff.. lots of crashes > ;-D > > Heh, I like this better, thanks. I think you can remove buf_ring_swap() as it > is no longer used?
> > > -- > John Baldwin > _______________________________________________ > freebsd-net@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-net > To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org" > ------------------------------ Randall Stewart 803-317-4952 (cell) --Apple-Mail=_653FCD68-7101-4A93-A68D-4D518FFFEDD1-- From owner-freebsd-net@FreeBSD.ORG Tue Feb 5 17:44:33 2013 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 18ECDAA4 for ; Tue, 5 Feb 2013 17:44:33 +0000 (UTC) (envelope-from andre@freebsd.org) Received: from c00l3r.networx.ch (c00l3r.networx.ch [62.48.2.2]) by mx1.freebsd.org (Postfix) with ESMTP id 80C0CC37 for ; Tue, 5 Feb 2013 17:44:32 +0000 (UTC) Received: (qmail 53944 invoked from network); 5 Feb 2013 19:03:25 -0000 Received: from c00l3r.networx.ch (HELO [127.0.0.1]) ([62.48.2.2]) (envelope-sender ) by c00l3r.networx.ch (qmail-ldap-1.03) with SMTP for ; 5 Feb 2013 19:03:25 -0000 Message-ID: <511144FB.50807@freebsd.org> Date: Tue, 05 Feb 2013 18:44:27 +0100 From: Andre Oppermann User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:17.0) Gecko/20130107 Thunderbird/17.0.2 MIME-Version: 1.0 To: John Baldwin Subject: Re: [PATCH] Add a new TCP_IGNOREIDLE socket option References: <201301221511.02496.jhb@freebsd.org> <201301301158.33838.jhb@freebsd.org> <510957B9.8070203@freebsd.org> <201302051211.43345.jhb@freebsd.org> In-Reply-To: <201302051211.43345.jhb@freebsd.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: Sepherosa Ziehau , freebsd-net@freebsd.org, Bjoern Zeeb X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 05 Feb 2013 17:44:33 -0000 On 05.02.2013
18:11, John Baldwin wrote: > On Wednesday, January 30, 2013 12:26:17 pm Andre Oppermann wrote: >> You can simply create your own congestion control algorithm with only the >> restart window changed. See (pseudo) code below. BTW, I just noticed that >> the other cc algos don't reset the idle window. > > *sigh* I am fully competent at maintaining my own local changes. The point > was to share this so that other people with similar workloads could make use > of it. Also, a custom CC algo is not the right approach as we would want this > change regardless of the CC algo used for handling non-idle periods (so that > this is an orthogonal knob). Linux also makes this an orthogonal knob rather > than requiring a separate CC algo. If everything Linux does is good, then go ahead and commit it. Discussing this change further then is pointless. I don't mind too much and I have stated my case why I think it's the wrong thing to do. I would prefer to encapsulate it into its own not-so-much-congestion-management algorithm so you can eventually do other tweaks as well, like more aggressive loss recovery, which would fit your objective too. Since you have to modify your app anyway to do the sockopt call, this seems a more complete solution to me. At least it is better than a non-portable hack that violates one of the most fundamental TCP concepts.
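For readers following the restart-window argument: RFC 5681 (section 4.1) says a TCP that has not sent data for an interval exceeding the retransmission timeout should reduce cwnd to the restart window RW = min(IW, cwnd) before it starts sending again. The C sketch below shows that rule with an on/off switch in the spirit of the proposed TCP_IGNOREIDLE option. The function names and the ignore_idle flag are hypothetical; this is neither FreeBSD's congestion-control module interface nor John's patch, and the initial window uses the RFC 3390 formula.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * RFC 3390 initial window in bytes:
 * IW = min(4 * SMSS, max(2 * SMSS, 4380)).
 */
static uint32_t
initial_window(uint32_t smss)
{
	uint32_t lo, hi;

	assert(smss > 0);
	lo = (2 * smss > 4380) ? 2 * smss : 4380;
	hi = 4 * smss;
	return (lo < hi ? lo : hi);
}

/*
 * cwnd to use when a connection starts sending again (RFC 5681, 4.1):
 * after an idle period longer than the RTO, clamp cwnd to the restart
 * window RW = min(IW, cwnd). ignore_idle is a hypothetical per-socket
 * knob, independent of the CC algorithm, that skips the clamp.
 */
static uint32_t
cc_after_idle(uint32_t cwnd, uint32_t smss, uint32_t idle_ms,
    uint32_t rto_ms, bool ignore_idle)
{
	uint32_t rw;

	if (ignore_idle || idle_ms <= rto_ms)
		return (cwnd);
	rw = initial_window(smss);
	return (rw < cwnd ? rw : cwnd);
}
```

With a 1460-byte SMSS, a connection idle for longer than its RTO drops from a large cwnd to 4380 bytes unless the knob is set. The disagreement in the thread is only about where such a switch should live: John wants it orthogonal to the CC algorithm (as in this sketch), while Andre would put it inside a dedicated CC module.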
-- Andre From owner-freebsd-net@FreeBSD.ORG Tue Feb 5 18:49:39 2013 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id E3285B01; Tue, 5 Feb 2013 18:49:39 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from bigwig.baldwin.cx (bigknife-pt.tunnel.tserv9.chi1.ipv6.he.net [IPv6:2001:470:1f10:75::2]) by mx1.freebsd.org (Postfix) with ESMTP id 9F9AEF9A; Tue, 5 Feb 2013 18:49:39 +0000 (UTC) Received: from pakbsde14.localnet (unknown [38.105.238.108]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id ECF71B926; Tue, 5 Feb 2013 13:49:38 -0500 (EST) From: John Baldwin To: Randall Stewart Subject: Re: Driver patch to look at... Date: Tue, 5 Feb 2013 13:45:23 -0500 User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110714-p22; KDE/4.5.5; amd64; ; ) References: <201302051213.51401.jhb@freebsd.org> <0D421326-9A80-4E21-A18E-E717F5C02164@lakerest.net> In-Reply-To: <0D421326-9A80-4E21-A18E-E717F5C02164@lakerest.net> MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-15" Content-Transfer-Encoding: 7bit Message-Id: <201302051345.23569.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7 (bigwig.baldwin.cx); Tue, 05 Feb 2013 13:49:39 -0500 (EST) Cc: freebsd-net@freebsd.org, Robert Watson , Jack Vogel , Kip Macy X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 05 Feb 2013 18:49:40 -0000 On Tuesday, February 05, 2013 12:44:01 pm Randall Stewart wrote: > Actually, no it is used. > > If you look in if_var.h in the drbr_putback() function, it does > a buf_ring_swap when the old mbuf pointer does not equal the > new mbuf pointer.
This *does* happen, I crashed at least once > yesterday when the igb driver did something to free the original > mbuf and return a new mbuf with the data (prepend or some such). > > I also have found several issues that I have fixed this morning.. it's been > crash city on my test beds.. > > Here is the latest patch with all fixes and suggested changes from emaste (thanks Ed) Oh, I see now why that is needed. -- John Baldwin From owner-freebsd-net@FreeBSD.ORG Tue Feb 5 18:52:54 2013 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 6DCD9BBE; Tue, 5 Feb 2013 18:52:54 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from bigwig.baldwin.cx (bigknife-pt.tunnel.tserv9.chi1.ipv6.he.net [IPv6:2001:470:1f10:75::2]) by mx1.freebsd.org (Postfix) with ESMTP id 2ACB6FBC; Tue, 5 Feb 2013 18:52:54 +0000 (UTC) Received: from pakbsde14.localnet (unknown [38.105.238.108]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id 96168B926; Tue, 5 Feb 2013 13:52:53 -0500 (EST) From: John Baldwin To: Randall Stewart Subject: Re: Driver patch to look at...
Date: Tue, 5 Feb 2013 13:52:52 -0500 User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110714-p22; KDE/4.5.5; amd64; ; ) References: <201302051213.51401.jhb@freebsd.org> <0D421326-9A80-4E21-A18E-E717F5C02164@lakerest.net> In-Reply-To: <0D421326-9A80-4E21-A18E-E717F5C02164@lakerest.net> MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-15" Content-Transfer-Encoding: 7bit Message-Id: <201302051352.52741.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7 (bigwig.baldwin.cx); Tue, 05 Feb 2013 13:52:53 -0500 (EST) Cc: freebsd-net@freebsd.org, Robert Watson , Jack Vogel , Kip Macy X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 05 Feb 2013 18:52:54 -0000 On Tuesday, February 05, 2013 12:44:01 pm Randall Stewart wrote: > Actually, no it is used. > > If you look in if_var.h in the drbr_putback() function, it does > a buf_ring_swap when the old mbuf pointer does not equal the > new mbuf pointer. This *does* happen, I crashed at least once > yesterday when the igb driver did something to free the original > mbuf and return a new mbuf with the data (prepend or some such). > > I also have found several issues that I have fixed this morning.. it's been > crash city on my test beds.. > > Here is the latest patch with all fixes and suggested changes from emaste (thanks Ed) Actually, one more suggestion then (since you have to keep putback). It would be nice to not have to require 'snext' in all the callers. How about replacing buf_ring_swap() with a buf_ring_putback_sc() that accepts the mbuf and just stores it at the head unconditionally, and having drbr_putback() use that?
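John's suggestion can be tried outside the kernel. The toy ring below is a simplified single-producer/single-consumer ring loosely modeled on sys/buf_ring.h (no atomics, no separate producer head and tail), with peek, advance and an unconditional putback along the lines he describes. The struct and function names are hypothetical rather than the kernel API, and assert() stands in for KASSERT.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Toy single-producer/single-consumer ring, loosely modeled on
 * sys/buf_ring.h. One slot is kept empty so that "full" and "empty"
 * can be told apart.
 */
#define RING_SIZE 8			/* must be a power of two */

struct sketch_ring {
	uint32_t cons_head;		/* next slot to consume */
	uint32_t prod_tail;		/* next slot to fill */
	uint32_t mask;			/* RING_SIZE - 1 */
	void	*ring[RING_SIZE];
};

static void
ring_init(struct sketch_ring *br)
{
	br->cons_head = 0;
	br->prod_tail = 0;
	br->mask = RING_SIZE - 1;
}

/* Producer side: returns -1 if the ring is full. */
static int
ring_enqueue(struct sketch_ring *br, void *p)
{
	uint32_t next = (br->prod_tail + 1) & br->mask;

	if (next == br->cons_head)
		return (-1);
	br->ring[br->prod_tail] = p;
	br->prod_tail = next;
	return (0);
}

/* Return the head entry without consuming it, or NULL if empty. */
static void *
ring_peek(struct sketch_ring *br)
{
	if (br->cons_head == br->prod_tail)
		return (NULL);
	return (br->ring[br->cons_head]);
}

/* Consume the entry that the last ring_peek() returned. */
static void
ring_advance_sc(struct sketch_ring *br)
{
	if (br->cons_head == br->prod_tail)
		return;
	br->ring[br->cons_head] = NULL;
	br->cons_head = (br->cons_head + 1) & br->mask;
}

/*
 * The suggested putback: overwrite the head slot unconditionally with
 * whatever the transmit routine handed back, so callers do not have
 * to remember the pointer they peeked.
 */
static void
ring_putback_sc(struct sketch_ring *br, void *new)
{
	assert(br->cons_head != br->prod_tail);	/* must not be empty */
	br->ring[br->cons_head] = new;
}
```

Because the putback overwrites the head slot unconditionally, a caller no longer needs an 'snext' copy: if the transmit routine did not replace the mbuf, the store simply writes the same pointer back.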
-- John Baldwin From owner-freebsd-net@FreeBSD.ORG Tue Feb 5 19:04:14 2013 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 7F0E5EBD; Tue, 5 Feb 2013 19:04:14 +0000 (UTC) (envelope-from rrs@lakerest.net) Received: from lakerest.net (lakerest.net [70.155.160.98]) by mx1.freebsd.org (Postfix) with ESMTP id 0D8AEB5; Tue, 5 Feb 2013 19:04:13 +0000 (UTC) Received: from [10.1.1.101] (bsd4.lakerest.net [70.155.160.102]) (authenticated bits=0) by lakerest.net (8.14.4/8.14.3) with ESMTP id r15J4T9T044212 (version=TLSv1/SSLv3 cipher=AES128-SHA bits=128 verify=NOT); Tue, 5 Feb 2013 14:04:29 -0500 (EST) (envelope-from rrs@lakerest.net) Subject: Re: Driver patch to look at... Mime-Version: 1.0 (Apple Message framework v1283) Content-Type: text/plain; charset=windows-1252 From: Randall Stewart In-Reply-To: <201302051352.52741.jhb@freebsd.org> Date: Tue, 5 Feb 2013 14:04:12 -0500 Content-Transfer-Encoding: quoted-printable Message-Id: <990BD290-643B-4BC7-8D64-6D4CE987025A@lakerest.net> References: <201302051213.51401.jhb@freebsd.org> <0D421326-9A80-4E21-A18E-E717F5C02164@lakerest.net> <201302051352.52741.jhb@freebsd.org> To: John Baldwin X-Mailer: Apple Mail (2.1283) Cc: freebsd-net@freebsd.org, Robert Watson , Jack Vogel , Kip Macy X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 05 Feb 2013 19:04:14 -0000 Hmm That would trade off a stack pointer + a compare vs always doing the move. That's fine until I have to add the _mc() version, then the put back would be an atomic, and most of the time the return from this is probably not changed… I really would prefer not to since the compare and maybe store vs the always store.. though the same now, would be far more expensive in the _mc version..
if we do a _mc version of course ;-) But I am willing to do whatever .. since this really needs to be fixed. R On Feb 5, 2013, at 1:52 PM, John Baldwin wrote: > On Tuesday, February 05, 2013 12:44:01 pm Randall Stewart wrote: >> Actually, no it is used. >> >> If you look in if_var.h in the drbr_putback() function, it does >> a buf_ring_swap when the old mbuf pointer does not equal the >> new mbuf pointer. This *does* happen, I crashed at least once >> yesterday when the igb driver did something to free the original >> mbuf and return a new mbuf with the data (prepend or some such). >> >> I also have found several issues that I have fixed this morning.. it's been >> crash city on my test beds.. >> >> Here is the latest patch with all fixes and suggested changes from emaste > (thanks Ed) > > Actually, one more suggestion then (since you have to keep putback). It > would be nice to not have to require 'snext' in all the callers. How > about replacing buf_ring_swap() with a buf_ring_putback_sc() that accepts the > mbuf and just stores it at the head unconditionally, and having drbr_putback() > use that?
> > -- > John Baldwin > _______________________________________________ > freebsd-net@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-net > To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org" > ------------------------------ Randall Stewart 803-317-4952 (cell) From owner-freebsd-net@FreeBSD.ORG Tue Feb 5 19:08:18 2013 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 791D1FA0; Tue, 5 Feb 2013 19:08:18 +0000 (UTC) (envelope-from rrs@lakerest.net) Received: from lakerest.net (lakerest.net [70.155.160.98]) by mx1.freebsd.org (Postfix) with ESMTP id 03980E5; Tue, 5 Feb 2013 19:08:17 +0000 (UTC) Received: from [10.1.1.101] (bsd4.lakerest.net [70.155.160.102]) (authenticated bits=0) by lakerest.net (8.14.4/8.14.3) with ESMTP id r15J8V07044295 (version=TLSv1/SSLv3 cipher=AES128-SHA bits=128 verify=NOT); Tue, 5 Feb 2013 14:08:31 -0500 (EST) (envelope-from rrs@lakerest.net) Subject: Re: Driver patch to look at...
Mime-Version: 1.0 (Apple Message framework v1283) Content-Type: text/plain; charset=windows-1252 From: Randall Stewart In-Reply-To: <990BD290-643B-4BC7-8D64-6D4CE987025A@lakerest.net> Date: Tue, 5 Feb 2013 14:08:15 -0500 Content-Transfer-Encoding: quoted-printable Message-Id: <00075DDD-73A1-4CA4-9574-036D43B071D9@lakerest.net> References: <201302051213.51401.jhb@freebsd.org> <0D421326-9A80-4E21-A18E-E717F5C02164@lakerest.net> <201302051352.52741.jhb@freebsd.org> <990BD290-643B-4BC7-8D64-6D4CE987025A@lakerest.net> To: Randall Stewart X-Mailer: Apple Mail (2.1283) Cc: freebsd-net@freebsd.org, Robert Watson , Jack Vogel , John Baldwin , Kip Macy X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 05 Feb 2013 19:08:18 -0000 Hmm wait, I could probably do the compare to what's in the ring buffer ;-D R On Feb 5, 2013, at 2:04 PM, Randall Stewart wrote: > Hmm > > That would trade off a stack pointer + a compare > vs always doing the move. > > That's fine until I have to add the _mc() version, then the put > back would be an atomic, and most of the time the return from > this is probably not changed… > > I really would prefer not to since the compare and maybe store vs > the always store.. though the same now, would be far more expensive > in the _mc version.. if we do a _mc version of course ;-) > > But I am willing to do whatever .. since this really needs to be fixed. > > R > On Feb 5, 2013, at 1:52 PM, John Baldwin wrote: > >> On Tuesday, February 05, 2013 12:44:01 pm Randall Stewart wrote: >>> Actually, no it is used. >>> >>> If you look in if_var.h in the drbr_putback() function, it does >>> a buf_ring_swap when the old mbuf pointer does not equal the >>> new mbuf pointer.
This *does* happen, I crashed at least once >>> yesterday when the igb driver did something to free the original >>> mbuf and return a new mbuf with the data (prepend or some such). >>> >>> I also have found several issues that I have fixed this morning.. it's been >>> crash city on my test beds.. >>> >>> Here is the latest patch with all fixes and suggested changes from emaste >> (thanks Ed) >> >> Actually, one more suggestion then (since you have to keep putback). It >> would be nice to not have to require 'snext' in all the callers. How >> about replacing buf_ring_swap() with a buf_ring_putback_sc() that accepts the >> mbuf and just stores it at the head unconditionally, and having drbr_putback() >> use that? >> >> -- >> John Baldwin >> _______________________________________________ >> freebsd-net@freebsd.org mailing list >> http://lists.freebsd.org/mailman/listinfo/freebsd-net >> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org" >> > > ------------------------------ > Randall Stewart > 803-317-4952 (cell) > > _______________________________________________ > freebsd-net@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-net > To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org" > ------------------------------ Randall Stewart 803-317-4952 (cell) From owner-freebsd-net@FreeBSD.ORG Tue Feb 5 19:11:20 2013 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 068222A6; Tue, 5 Feb 2013 19:11:20 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from bigwig.baldwin.cx (bigknife-pt.tunnel.tserv9.chi1.ipv6.he.net [IPv6:2001:470:1f10:75::2]) by mx1.freebsd.org (Postfix) with ESMTP id D65CE112; Tue, 5 Feb 2013 19:11:19 +0000 (UTC) Received: from pakbsde14.localnet (unknown [38.105.238.108]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id 57B05B924; Tue, 5 Feb 2013
14:11:19 -0500 (EST) From: John Baldwin To: Randall Stewart Subject: Re: Driver patch to look at... Date: Tue, 5 Feb 2013 14:11:17 -0500 User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110714-p22; KDE/4.5.5; amd64; ; ) References: <201302051352.52741.jhb@freebsd.org> <990BD290-643B-4BC7-8D64-6D4CE987025A@lakerest.net> In-Reply-To: <990BD290-643B-4BC7-8D64-6D4CE987025A@lakerest.net> MIME-Version: 1.0 Content-Type: Text/Plain; charset="windows-1252" Content-Transfer-Encoding: quoted-printable Message-Id: <201302051411.17883.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7 (bigwig.baldwin.cx); Tue, 05 Feb 2013 14:11:19 -0500 (EST) Cc: freebsd-net@freebsd.org, Robert Watson , Jack Vogel , Kip Macy X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 05 Feb 2013 19:11:20 -0000 On Tuesday, February 05, 2013 2:04:12 pm Randall Stewart wrote: > Hmm > > That would trade off a stack pointer + a compare > vs always doing the move. Right, the store is probably cheaper than the branch. :) However, minimizing the duplicated code in drivers and having this interface be as clear/readable as possible is my main goal. > That's fine until I have to add the _mc() version, then the put > back would be an atomic, and most of the time the return from > this is probably not changed… > > I really would prefer not to since the compare and maybe store vs > the always store.. though the same now, would be far more expensive > in the _mc version.. if we do a _mc version of course ;-) I would just not bother with an _mc version until we actually need it. :) I think doing the sort of peek/advance type logic only works well with single consumers anyway.
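To make the single-consumer point concrete, here is a self-contained sketch of the loop shape the patch gives em_mq_start_locked() and the other drivers: peek, try to transmit, advance on success or when the frame was dropped, and put back (possibly a replacement buffer) when the hardware ring is full. struct pkt, toy_xmit() and the ring helpers are hypothetical stand-ins for mbufs, the drivers' xmit routines and the drbr_*() wrappers; putback here is the unconditional store John suggests.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins: struct pkt ~ mbuf, toy_xmit() ~ em_xmit(). */
#define NSLOTS 8			/* power of two, so % survives wrap */

struct pkt {
	int id;				/* negative id: a "very bad frame" */
	int needs_defrag;		/* transmit will replace the buffer */
};

struct toy_ring {
	unsigned head, tail;		/* consumer head, producer tail */
	struct pkt *slot[NSLOTS];
};

static int
enqueue(struct toy_ring *r, struct pkt *p)
{
	if (r->tail - r->head == NSLOTS)
		return (-1);		/* full */
	r->slot[r->tail++ % NSLOTS] = p;
	return (0);
}

static struct pkt *
peek(struct toy_ring *r)
{
	return (r->head == r->tail ? NULL : r->slot[r->head % NSLOTS]);
}

static void
advance(struct toy_ring *r)
{
	if (r->head != r->tail)
		r->head++;
}

/* Unconditional store at the head. */
static void
putback(struct toy_ring *r, struct pkt *p)
{
	assert(r->head != r->tail);
	r->slot[r->head % NSLOTS] = p;
}

static struct pkt defragged[NSLOTS];	/* stand-in for m_defrag() output */

/*
 * May replace *pp (like a defrag), may drop it (sets *pp to NULL and
 * fails), or may fail for lack of descriptors, leaving *pp set.
 */
static int
toy_xmit(struct pkt **pp, int *avail)
{
	struct pkt *p = *pp;

	if (p->id < 0) {
		*pp = NULL;		/* dropped and "freed" */
		return (-1);
	}
	if (p->needs_defrag) {
		defragged[p->id % NSLOTS] = (struct pkt){ p->id, 0 };
		*pp = &defragged[p->id % NSLOTS];
	}
	if (*avail == 0)
		return (-1);		/* hardware ring full */
	(*avail)--;
	return (0);
}

/* Drain the ring; returns how many packets reached the "hardware". */
static int
tx_drain(struct toy_ring *r, int avail)
{
	struct pkt *next;
	int sent = 0;

	while ((next = peek(r)) != NULL) {
		if (toy_xmit(&next, &avail) != 0) {
			if (next == NULL)
				advance(r);	  /* dropped: skip the slot */
			else
				putback(r, next); /* keep order, new pointer */
			break;
		}
		advance(r);
		sent++;
	}
	return (sent);
}
```

Between the peek and the advance the head entry stays in the ring, so the pattern is only clean with one consumer (in the drivers, whoever holds the TX lock); a second consumer could see or take the same entry in that window.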
-- John Baldwin From owner-freebsd-net@FreeBSD.ORG Tue Feb 5 19:11:54 2013 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 15D67341; Tue, 5 Feb 2013 19:11:54 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from bigwig.baldwin.cx (bigknife-pt.tunnel.tserv9.chi1.ipv6.he.net [IPv6:2001:470:1f10:75::2]) by mx1.freebsd.org (Postfix) with ESMTP id E5EF7123; Tue, 5 Feb 2013 19:11:53 +0000 (UTC) Received: from pakbsde14.localnet (unknown [38.105.238.108]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id 44833B91E; Tue, 5 Feb 2013 14:11:53 -0500 (EST) From: John Baldwin To: Randall Stewart Subject: Re: Driver patch to look at... Date: Tue, 5 Feb 2013 14:11:52 -0500 User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110714-p22; KDE/4.5.5; amd64; ; ) References: <990BD290-643B-4BC7-8D64-6D4CE987025A@lakerest.net> <00075DDD-73A1-4CA4-9574-036D43B071D9@lakerest.net> In-Reply-To: <00075DDD-73A1-4CA4-9574-036D43B071D9@lakerest.net> MIME-Version: 1.0 Content-Type: Text/Plain; charset="windows-1252" Content-Transfer-Encoding: quoted-printable Message-Id: <201302051411.52495.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7 (bigwig.baldwin.cx); Tue, 05 Feb 2013 14:11:53 -0500 (EST) Cc: freebsd-net@freebsd.org, Robert Watson , Jack Vogel , Kip Macy X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 05 Feb 2013 19:11:54 -0000 On Tuesday, February 05, 2013 2:08:15 pm Randall Stewart wrote: > Hmm > > wait, I could probably do the compare to whats in the ring buffer ;-D I wouldn't bother. The compare and branch is probably more expensive than the store. 
> R > On Feb 5, 2013, at 2:04 PM, Randall Stewart wrote: > > > Hmm > > > > That would trade off a stack pointer + a compare > > vs always doing the move. > > > > Thats fine until I have to add the _mc() version, then the put > > back would be an atomic, and most of the time the return from > > this is probably not changed… > > > > I really would prefer not to since the compare and maybe store vs > > the always store.. though the same now, would be far more expensive > > in the _mc version.. if we do a _mc version of course ;-) > > > > But I am willing to do whatever .. since this really needs to be fixed. > > > > R > > On Feb 5, 2013, at 1:52 PM, John Baldwin wrote: > > > >> On Tuesday, February 05, 2013 12:44:01 pm Randall Stewart wrote: > >>> Actually, no it is used. > >>> > >>> If you look in if_var.h int he drbr_putback() function, it does > >>> a buf_ring_swap when the old mbuf pointer does not equal the > >>> new mbuf pointer. This *does* happen, I crashed at least once > >>> yesterday when the igb driver did something to free the original > >>> mbuf and return a new mbuf with the data (prepend or some such). > >>> > >>> I also have found several issues that I have fixed this morning.. its been > >>> crash city on my test beds.. > >>> > >>> Here is the latest patch with all fixes and suggested changes from emaste > >> (thanks Ed) > >> > >> Actually, one more suggestion then (since you have to keep putback). It > >> would be nice to not have to require 'snext' in all the callers. How > >> about replace buf_ring_swap() with a buf_ring_putback_sc() that accepts the > >> mbuf and just stores it at the head unconditionally and have drbr_putback() > >> use that? 
> >> > >> -- > >> John Baldwin > >> _______________________________________________ > >> freebsd-net@freebsd.org mailing list > >> http://lists.freebsd.org/mailman/listinfo/freebsd-net > >> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org" > >> > > > > ------------------------------ > > Randall Stewart > > 803-317-4952 (cell) > > > > _______________________________________________ > > freebsd-net@freebsd.org mailing list > > http://lists.freebsd.org/mailman/listinfo/freebsd-net > > To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org" > > > > ------------------------------ > Randall Stewart > 803-317-4952 (cell) > > -- John Baldwin From owner-freebsd-net@FreeBSD.ORG Tue Feb 5 19:30:37 2013 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id C61EC628; Tue, 5 Feb 2013 19:30:37 +0000 (UTC) (envelope-from rrs@lakerest.net) Received: from lakerest.net (lakerest.net [70.155.160.98]) by mx1.freebsd.org (Postfix) with ESMTP id 152CB23D; Tue, 5 Feb 2013 19:30:36 +0000 (UTC) Received: from [10.1.1.101] (bsd4.lakerest.net [70.155.160.102]) (authenticated bits=0) by lakerest.net (8.14.4/8.14.3) with ESMTP id r15JUqXm044563 (version=TLSv1/SSLv3 cipher=AES128-SHA bits=128 verify=NOT); Tue, 5 Feb 2013 14:30:52 -0500 (EST) (envelope-from rrs@lakerest.net) Subject: Re: Driver patch to look at... 
Mime-Version: 1.0 (Apple Message framework v1283) Content-Type: multipart/mixed; boundary="Apple-Mail=_F817BBF6-0AC8-4C69-A5F6-20B355DAEAC1" From: Randall Stewart In-Reply-To: <201302051411.52495.jhb@freebsd.org> Date: Tue, 5 Feb 2013 14:30:36 -0500 Message-Id: <74624C61-0564-420F-9F82-0475488F857C@lakerest.net> References: <990BD290-643B-4BC7-8D64-6D4CE987025A@lakerest.net> <00075DDD-73A1-4CA4-9574-036D43B071D9@lakerest.net> <201302051411.52495.jhb@freebsd.org> To: John Baldwin X-Mailer: Apple Mail (2.1283) Cc: freebsd-net@freebsd.org, Robert Watson , Jack Vogel , Kip Macy X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 05 Feb 2013 19:30:37 -0000 --Apple-Mail=_F817BBF6-0AC8-4C69-A5F6-20B355DAEAC1 Content-Transfer-Encoding: 7bit Content-Type: text/plain; charset=windows-1252 Ok Here it is one last time (I hope) with the updates ;-) R --Apple-Mail=_F817BBF6-0AC8-4C69-A5F6-20B355DAEAC1 Content-Disposition: attachment; filename=driver_patch.txt Content-Type: text/plain; x-unix-mode=0644; name="driver_patch.txt" Content-Transfer-Encoding: 7bit Index: dev/e1000/if_em.c =================================================================== --- dev/e1000/if_em.c (revision 246357) +++ dev/e1000/if_em.c (working copy) @@ -905,22 +905,24 @@ em_mq_start_locked(struct ifnet *ifp, struct tx_ri } enq = 0; - if (m == NULL) { - next = drbr_dequeue(ifp, txr->br); - } else if (drbr_needs_enqueue(ifp, txr->br)) { - if ((err = drbr_enqueue(ifp, txr->br, m)) != 0) + if (m != NULL) { + err = drbr_enqueue(ifp, txr->br, m); + if (err) { return (err); - next = drbr_dequeue(ifp, txr->br); - } else - next = m; + } + } /* Process the queue */ - while (next != NULL) { + while ((next = drbr_peek(ifp, txr->br)) != NULL) { if ((err = em_xmit(txr, &next)) != 0) { - if (next != NULL) - err = drbr_enqueue(ifp, txr->br, next); - 
break; + if (next == NULL) { + drbr_advance(ifp, txr->br); + } else { + drbr_putback(ifp, txr->br, next); + } + break; } + drbr_advance(ifp, txr->br); enq++; ifp->if_obytes += next->m_pkthdr.len; if (next->m_flags & M_MCAST) @@ -928,7 +930,6 @@ em_mq_start_locked(struct ifnet *ifp, struct tx_ri ETHER_BPF_MTAP(ifp, next); if ((ifp->if_drv_flags & IFF_DRV_RUNNING) == 0) break; - next = drbr_dequeue(ifp, txr->br); } if (enq > 0) { Index: dev/e1000/if_igb.c =================================================================== --- dev/e1000/if_igb.c (revision 246357) +++ dev/e1000/if_igb.c (working copy) @@ -965,12 +965,13 @@ igb_mq_start(struct ifnet *ifp, struct mbuf *m) ** out-of-order delivery, but ** settle for it if that fails */ - if (m) + if (m != NULL) drbr_enqueue(ifp, txr->br, m); err = igb_mq_start_locked(ifp, txr); IGB_TX_UNLOCK(txr); } else { - err = drbr_enqueue(ifp, txr->br, m); + if (m != NULL) + err = drbr_enqueue(ifp, txr->br, m); taskqueue_enqueue(que->tq, &txr->txq_task); } @@ -994,12 +995,22 @@ igb_mq_start_locked(struct ifnet *ifp, struct tx_r enq = 0; /* Process the queue */ - while ((next = drbr_dequeue(ifp, txr->br)) != NULL) { + while ((next = drbr_peek(ifp, txr->br)) != NULL) { if ((err = igb_xmit(txr, &next)) != 0) { - if (next != NULL) - err = drbr_enqueue(ifp, txr->br, next); + if (next == NULL) { + /* It was freed, move forward */ + drbr_advance(ifp, txr->br); + } else { + /* + * Still have one left, it may not be + * the same since the transmit function + * may have changed it. 
+ */ + drbr_putback(ifp, txr->br, next); + } break; } + drbr_advance(ifp, txr->br); enq++; ifp->if_obytes += next->m_pkthdr.len; if (next->m_flags & M_MCAST) Index: dev/oce/oce_if.c =================================================================== --- dev/oce/oce_if.c (revision 246357) +++ dev/oce/oce_if.c (working copy) @@ -1166,29 +1166,27 @@ oce_multiq_transmit(struct ifnet *ifp, struct mbuf return status; } - if (m == NULL) - next = drbr_dequeue(ifp, br); - else if (drbr_needs_enqueue(ifp, br)) { + if (m != NULL) { if ((status = drbr_enqueue(ifp, br, m)) != 0) return status; - next = drbr_dequeue(ifp, br); - } else - next = m; - - while (next != NULL) { + } + while ((next = drbr_peek(ifp, br)) != NULL) { if (oce_tx(sc, &next, queue_index)) { - if (next != NULL) { + if (next == NULL) { + drbr_advance(ifp, br); + } else { + drbr_putback(ifp, br, next); wq->tx_stats.tx_stops ++; ifp->if_drv_flags |= IFF_DRV_OACTIVE; status = drbr_enqueue(ifp, br, next); } break; } + drbr_advance(ifp, br); ifp->if_obytes += next->m_pkthdr.len; if (next->m_flags & M_MCAST) ifp->if_omcasts++; ETHER_BPF_MTAP(ifp, next); - next = drbr_dequeue(ifp, br); } return status; Index: dev/ixgbe/ixgbe.c =================================================================== --- dev/ixgbe/ixgbe.c (revision 246357) +++ dev/ixgbe/ixgbe.c (working copy) @@ -832,22 +832,24 @@ ixgbe_mq_start_locked(struct ifnet *ifp, struct tx } enqueued = 0; - if (m == NULL) { - next = drbr_dequeue(ifp, txr->br); - } else if (drbr_needs_enqueue(ifp, txr->br)) { - if ((err = drbr_enqueue(ifp, txr->br, m)) != 0) + if (m != NULL) { + err = drbr_enqueue(ifp, txr->br, m); + if (err) { return (err); - next = drbr_dequeue(ifp, txr->br); - } else - next = m; + } + } /* Process the queue */ - while (next != NULL) { + while ((next = drbr_peek(ifp, txr->br)) != NULL) { if ((err = ixgbe_xmit(txr, &next)) != 0) { - if (next != NULL) - err = drbr_enqueue(ifp, txr->br, next); + if (next == NULL) { + drbr_advance(ifp, txr->br); + } 
else { + drbr_putback(ifp, txr->br, next); + } break; } + drbr_advance(ifp, txr->br); enqueued++; /* Send a copy of the frame to the BPF listener */ ETHER_BPF_MTAP(ifp, next); @@ -855,7 +857,6 @@ ixgbe_mq_start_locked(struct ifnet *ifp, struct tx break; if (txr->tx_avail < IXGBE_TX_OP_THRESHOLD) ixgbe_txeof(txr); - next = drbr_dequeue(ifp, txr->br); } if (enqueued > 0) { Index: dev/ixgbe/ixv.c =================================================================== --- dev/ixgbe/ixv.c (revision 246357) +++ dev/ixgbe/ixv.c (working copy) @@ -620,22 +620,23 @@ ixv_mq_start_locked(struct ifnet *ifp, struct tx_r ixv_txeof(txr); enqueued = 0; - if (m == NULL) { - next = drbr_dequeue(ifp, txr->br); - } else if (drbr_needs_enqueue(ifp, txr->br)) { - if ((err = drbr_enqueue(ifp, txr->br, m)) != 0) + if (m != NULL) { + err = drbr_enqueue(ifp, txr->br, m); + if (err) { return (err); - next = drbr_dequeue(ifp, txr->br); - } else - next = m; - + } + } /* Process the queue */ - while (next != NULL) { + while ((next = drbr_peek(ifp, txr->br)) != NULL) { if ((err = ixv_xmit(txr, &next)) != 0) { - if (next != NULL) - err = drbr_enqueue(ifp, txr->br, next); + if (next == NULL) { + drbr_advance(ifp, txr->br); + } else { + drbr_putback(ifp, txr->br, next); + } break; } + drbr_advance(ifp, txr->br); enqueued++; ifp->if_obytes += next->m_pkthdr.len; if (next->m_flags & M_MCAST) @@ -648,7 +649,6 @@ ixv_mq_start_locked(struct ifnet *ifp, struct tx_r ifp->if_drv_flags |= IFF_DRV_OACTIVE; break; } - next = drbr_dequeue(ifp, txr->br); } if (enqueued > 0) { Index: dev/bxe/if_bxe.c =================================================================== --- dev/bxe/if_bxe.c (revision 246357) +++ dev/bxe/if_bxe.c (working copy) @@ -9506,24 +9506,15 @@ bxe_tx_mq_start_locked(struct ifnet *ifp, BXE_FP_LOCK_ASSERT(fp); - if (m == NULL) { - /* No new work, check for pending frames. 
*/ - next = drbr_dequeue(ifp, fp->br); - } else if (drbr_needs_enqueue(ifp, fp->br)) { - /* Both new and pending work, maintain packet order. */ + if (m != NULL) { rc = drbr_enqueue(ifp, fp->br, m); if (rc != 0) { fp->tx_soft_errors++; goto bxe_tx_mq_start_locked_exit; } - next = drbr_dequeue(ifp, fp->br); - } else - /* New work only, nothing pending. */ - next = m; - + } /* Keep adding entries while there are frames to send. */ - while (next != NULL) { - + while ((next = drbr_peek(ifp, fp->br)) != NULL) { /* The transmit mbuf now belongs to us, keep track of it. */ fp->tx_mbuf_alloc++; @@ -9537,23 +9528,22 @@ bxe_tx_mq_start_locked(struct ifnet *ifp, if (__predict_false(rc != 0)) { fp->tx_encap_failures++; /* Very Bad Frames(tm) may have been dropped. */ - if (next != NULL) { + if (next == NULL) { + drbr_advance(ifp, fp->br); + } else { + drbr_putback(ifp, fp->br, next); /* * Mark the TX queue as full and save * the frame. */ ifp->if_drv_flags |= IFF_DRV_OACTIVE; fp->tx_frame_deferred++; - - /* This may reorder frame. */ - rc = drbr_enqueue(ifp, fp->br, next); fp->tx_mbuf_alloc--; } - /* Stop looking for more work. */ break; } - + drbr_advance(ifp, fp->br); /* The transmit frame was enqueued successfully. */ tx_count++; @@ -9574,8 +9564,6 @@ bxe_tx_mq_start_locked(struct ifnet *ifp, ifp->if_drv_flags &= ~IFF_DRV_OACTIVE; break; } - - next = drbr_dequeue(ifp, fp->br); } /* No TX packets were dequeued. */ Index: net/if_var.h =================================================================== --- net/if_var.h (revision 246357) +++ net/if_var.h (working copy) @@ -622,6 +622,45 @@ drbr_enqueue(struct ifnet *ifp, struct buf_ring *b } static __inline void +drbr_putback(struct ifnet *ifp, struct buf_ring *br, struct mbuf *new) +{ + /* + * The top of the list needs to be swapped + * for this one. + */ +#ifdef ALTQ + if (ifp != NULL && ALTQ_IS_ENABLED(&ifp->if_snd)) { + /* + * Peek in altq case dequeued it + * so put it back. 
+ */ + IFQ_DRV_PREPEND(&ifp->if_snd, new); + return; + } +#endif + buf_ring_putback_sc(br, new); +} + +static __inline struct mbuf * +drbr_peek(struct ifnet *ifp, struct buf_ring *br) +{ +#ifdef ALTQ + struct mbuf *m; + if (ifp != NULL && ALTQ_IS_ENABLED(&ifp->if_snd)) { + /* + * Pull it off like a dequeue + * since drbr_advance() does nothing + * for altq and drbr_putback() will + * use the old prepend function. + */ + IFQ_DEQUEUE(&ifp->if_snd, m); + return (m); + } +#endif + return(buf_ring_peek(br)); +} + +static __inline void drbr_flush(struct ifnet *ifp, struct buf_ring *br) { struct mbuf *m; @@ -648,7 +687,7 @@ drbr_dequeue(struct ifnet *ifp, struct buf_ring *b #ifdef ALTQ struct mbuf *m; - if (ALTQ_IS_ENABLED(&ifp->if_snd)) { + if (ifp != NULL && ALTQ_IS_ENABLED(&ifp->if_snd)) { IFQ_DEQUEUE(&ifp->if_snd, m); return (m); } @@ -656,6 +695,18 @@ drbr_dequeue(struct ifnet *ifp, struct buf_ring *b return (buf_ring_dequeue_sc(br)); } +static __inline void +drbr_advance(struct ifnet *ifp, struct buf_ring *br) +{ +#ifdef ALTQ + /* Nothing to do here since peek dequeues in altq case */ + if (ifp != NULL && ALTQ_IS_ENABLED(&ifp->if_snd)) + return; +#endif + return (buf_ring_advance_sc(br)); +} + + static __inline struct mbuf * drbr_dequeue_cond(struct ifnet *ifp, struct buf_ring *br, int (*func) (struct mbuf *, void *), void *arg) Index: sys/buf_ring.h =================================================================== --- sys/buf_ring.h (revision 246357) +++ sys/buf_ring.h (working copy) @@ -208,6 +208,55 @@ buf_ring_dequeue_sc(struct buf_ring *br) } /* + * single-consumer advance after a peek + * use where it is protected by a lock + * e.g. 
a network driver's tx queue lock + */ +static __inline void +buf_ring_advance_sc(struct buf_ring *br) +{ + uint32_t cons_head, cons_next; + uint32_t prod_tail; + + cons_head = br->br_cons_head; + prod_tail = br->br_prod_tail; + + cons_next = (cons_head + 1) & br->br_cons_mask; + if (cons_head == prod_tail) + return; + br->br_cons_head = cons_next; +#ifdef DEBUG_BUFRING + br->br_ring[cons_head] = NULL; +#endif + br->br_cons_tail = cons_next; +} + +/* + * Used to return a buffer (most likely already there) + * to the top od the ring. The caller should *not* + * have used any dequeue to pull it out of the ring + * but instead should have used the peek() function. + * This is normally used where the transmit queue + * of a driver is full, and an mubf must be returned. + * Most likely whats in the ring-buffer is what + * is being put back (since it was not removed), but + * sometimes the lower transmit function may have + * done a pullup or other function that will have + * changed it. As an optimzation we always put it + * back (since jhb says the store is probably cheaper), + * if we have to do a multi-queue version we will need + * the compare and an atomic. + */ +static __inline void +buf_ring_putback_sc(struct buf_ring *br, void *new) +{ + if (br->br_cons_head == br->br_prod_tail) + /* Huh? 
*/ + return; + br->br_ring[br->br_cons_head] = new; +} + +/* * return a pointer to the first entry in the ring * without modifying it, or NULL if the ring is empty * race-prone if not protected by a lock Index: ofed/drivers/net/mlx4/en_tx.c =================================================================== --- ofed/drivers/net/mlx4/en_tx.c (revision 246357) +++ ofed/drivers/net/mlx4/en_tx.c (working copy) @@ -931,22 +931,21 @@ mlx4_en_transmit_locked(struct ifnet *dev, int tx_ } enqueued = 0; - if (m == NULL) { - next = drbr_dequeue(dev, ring->br); - } else if (drbr_needs_enqueue(dev, ring->br)) { + if (m != NULL) { if ((err = drbr_enqueue(dev, ring->br, m)) != 0) return (err); - next = drbr_dequeue(dev, ring->br); - } else - next = m; - + } /* Process the queue */ - while (next != NULL) { + while ((next = drbr_peek(ifp, ring->br)) != NULL) { if ((err = mlx4_en_xmit(dev, tx_ind, &next)) != 0) { - if (next != NULL) - err = drbr_enqueue(dev, ring->br, next); + if (next == NULL) { + drbr_advance(ifp, ring->br); + } else { + drbr_putback(ifp, ring->br, next); + } break; } + drbr_advance(ifp, ring->br); enqueued++; dev->if_obytes += next->m_pkthdr.len; if (next->m_flags & M_MCAST) @@ -955,7 +954,6 @@ mlx4_en_transmit_locked(struct ifnet *dev, int tx_ ETHER_BPF_MTAP(dev, next); if ((dev->if_drv_flags & IFF_DRV_RUNNING) == 0) break; - next = drbr_dequeue(dev, ring->br); } if (enqueued > 0) --Apple-Mail=_F817BBF6-0AC8-4C69-A5F6-20B355DAEAC1 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset=windows-1252 On Feb 5, 2013, at 2:11 PM, John Baldwin wrote: > On Tuesday, February 05, 2013 2:08:15 pm Randall Stewart wrote: >> Hmm >>=20 >> wait, I could probably do the compare to whats in the ring buffer ;-D >=20 > I wouldn't bother. The compare and branch is probably more expensive = than > the store. 
> >> R >> On Feb 5, 2013, at 2:04 PM, Randall Stewart wrote: >> >>> Hmm >>> >>> That would trade off a stack pointer + a compare >>> vs always doing the move. >>> >>> Thats fine until I have to add the _mc() version, then the put >>> back would be an atomic, and most of the time the return from >>> this is probably not changed… >>> >>> I really would prefer not to since the compare and maybe store vs >>> the always store.. though the same now, would be far more expensive >>> in the _mc version.. if we do a _mc version of course ;-) >>> >>> But I am willing to do whatever .. since this really needs to be fixed. >>> >>> R >>> On Feb 5, 2013, at 1:52 PM, John Baldwin wrote: >>> >>>> On Tuesday, February 05, 2013 12:44:01 pm Randall Stewart wrote: >>>>> Actually, no it is used. >>>>> >>>>> If you look in if_var.h int he drbr_putback() function, it does >>>>> a buf_ring_swap when the old mbuf pointer does not equal the >>>>> new mbuf pointer. This *does* happen, I crashed at least once >>>>> yesterday when the igb driver did something to free the original >>>>> mbuf and return a new mbuf with the data (prepend or some such). >>>>> >>>>> I also have found several issues that I have fixed this morning.. its been >>>>> crash city on my test beds.. >>>>> >>>>> Here is the latest patch with all fixes and suggested changes from emaste >>>> (thanks Ed) >>>> >>>> Actually, one more suggestion then (since you have to keep putback). It >>>> would be nice to not have to require 'snext' in all the callers. How >>>> about replace buf_ring_swap() with a buf_ring_putback_sc() that accepts the >>>> mbuf and just stores it at the head unconditionally and have drbr_putback() >>>> use that? 
>>>> >>>> -- >>>> John Baldwin >>>> _______________________________________________ >>>> freebsd-net@freebsd.org mailing list >>>> http://lists.freebsd.org/mailman/listinfo/freebsd-net >>>> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org" >>>> >>> >>> ------------------------------ >>> Randall Stewart >>> 803-317-4952 (cell) >>> >>> _______________________________________________ >>> freebsd-net@freebsd.org mailing list >>> http://lists.freebsd.org/mailman/listinfo/freebsd-net >>> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org" >>> >> >> ------------------------------ >> Randall Stewart >> 803-317-4952 (cell) >> >> > > -- > John Baldwin > _______________________________________________ > freebsd-net@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-net > To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org" > ------------------------------ Randall Stewart 803-317-4952 (cell) --Apple-Mail=_F817BBF6-0AC8-4C69-A5F6-20B355DAEAC1-- From owner-freebsd-net@FreeBSD.ORG Tue Feb 5 20:52:19 2013 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 355BE78C; Tue, 5 Feb 2013 20:52:19 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from bigwig.baldwin.cx (bigknife-pt.tunnel.tserv9.chi1.ipv6.he.net [IPv6:2001:470:1f10:75::2]) by mx1.freebsd.org (Postfix) with ESMTP id 129A17CD; Tue, 5 Feb 2013 20:52:19 +0000 (UTC) Received: from pakbsde14.localnet (unknown [38.105.238.108]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id 7CE90B91A; Tue, 5 Feb 2013 15:52:18 -0500 (EST) From: John Baldwin To: Randall Stewart Subject: Re: Driver patch to look at... 
Date: Tue, 5 Feb 2013 15:52:04 -0500 User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110714-p22; KDE/4.5.5; amd64; ; ) References: <201302051411.52495.jhb@freebsd.org> <74624C61-0564-420F-9F82-0475488F857C@lakerest.net> In-Reply-To: <74624C61-0564-420F-9F82-0475488F857C@lakerest.net> MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-15" Content-Transfer-Encoding: 7bit Message-Id: <201302051552.04275.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7 (bigwig.baldwin.cx); Tue, 05 Feb 2013 15:52:18 -0500 (EST) Cc: freebsd-net@freebsd.org, Robert Watson , Jack Vogel , Kip Macy X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 05 Feb 2013 20:52:19 -0000 On Tuesday, February 05, 2013 2:30:36 pm Randall Stewart wrote: > Ok > > Here it is one last time (I hope) with the updates ;-) One more suggestion. I would make the check in buf_ring_putback_sc() a KASSERT() so that in the production case we don't pay for a branch that should never occur. 
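The KASSERT() suggestion turns the "ring is empty" test from a silent run-time return into a debug-only check. In the kernel, KASSERT() is compiled in only when the kernel is built with `options INVARIANTS`. The minimal userland sketch below uses assert() for the same role, which disappears when NDEBUG is defined. `struct ring` and `ring_putback_sc()` are hypothetical stand-ins for `struct buf_ring` and `buf_ring_putback_sc()`, not the kernel code.

```c
#include <assert.h>
#include <stdint.h>

#define RING_SIZE 8u                 /* power of two, as in buf_ring */

/* Hypothetical stand-in for the buf_ring consumer/producer fields. */
struct ring {
    uint32_t prod_tail;              /* one past the last produced slot */
    uint32_t cons_head;              /* slot the consumer peeked at */
    void    *slots[RING_SIZE];
};

/*
 * Store the (possibly replaced) packet back at the head. A putback on an
 * empty ring is a caller bug, since the caller must have peeked a non-empty
 * ring first, so it is caught only in debug builds. Release builds pay for
 * the store alone, with no compare-and-branch.
 */
static inline void ring_putback_sc(struct ring *r, void *p)
{
    assert(r->cons_head != r->prod_tail && "putback on an empty ring");
    r->slots[r->cons_head] = p;
}
```

The trade-off matches the thread. In debug builds a misuse fails loudly instead of silently dropping the packet. In production builds the function reduces to a single store.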
-- John Baldwin From owner-freebsd-net@FreeBSD.ORG Tue Feb 5 21:02:34 2013 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 7BBBAA94; Tue, 5 Feb 2013 21:02:34 +0000 (UTC) (envelope-from jfvogel@gmail.com) Received: from mail-ve0-f178.google.com (mail-ve0-f178.google.com [209.85.128.178]) by mx1.freebsd.org (Postfix) with ESMTP id E735C832; Tue, 5 Feb 2013 21:02:33 +0000 (UTC) Received: by mail-ve0-f178.google.com with SMTP id db10so524488veb.23 for ; Tue, 05 Feb 2013 13:02:27 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:x-received:in-reply-to:references:date:message-id :subject:from:to:cc:content-type; bh=pkkpziCS6MCrE3hTz5S5WRqNvMfdvvDBbPQap8qaqm8=; b=hKcC4b2XouVteqzsi7B3g25ZzoZHJihUH2RVH29fSMtE5Tyk2uqAQBsKASH9wwci24 DsKDH56anBkh9uVPPQF4uTHCjjfllQNs/1iLBo0k054Wgp9b3cLDjLPolk/MAq7xq8sB PRwIb6D3gu3xWoTpma20rOoD9tpPGdRNMJ6eMTNT9WF3rpi7Zl82+XokmMLTyzya2Kc9 OOar4BANa4Y5zsZXhnbo+Q21S2CF62CNCQQ5KjgjS0bpuuc2zudzYNM68P484hFn4P2J Yo6hhVoGPN5rCBsqZahs+a4iWTZrzl0T9eOQdAGnGfCeO4ygkm3VATZLcnHSmuoTjk/r aKMw== MIME-Version: 1.0 X-Received: by 10.52.29.109 with SMTP id j13mr26006577vdh.111.1360098147105; Tue, 05 Feb 2013 13:02:27 -0800 (PST) Received: by 10.220.191.132 with HTTP; Tue, 5 Feb 2013 13:02:26 -0800 (PST) In-Reply-To: <201302051552.04275.jhb@freebsd.org> References: <201302051411.52495.jhb@freebsd.org> <74624C61-0564-420F-9F82-0475488F857C@lakerest.net> <201302051552.04275.jhb@freebsd.org> Date: Tue, 5 Feb 2013 13:02:26 -0800 Message-ID: Subject: Re: Driver patch to look at... 
From: Jack Vogel To: John Baldwin Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.14 Cc: freebsd-net@freebsd.org, Robert Watson , Randall Stewart , Kip Macy X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 05 Feb 2013 21:02:34 -0000 Thanks for being the critical eye John :) And I appreciate the work Randall, thanks! Jack On Tue, Feb 5, 2013 at 12:52 PM, John Baldwin wrote: > On Tuesday, February 05, 2013 2:30:36 pm Randall Stewart wrote: > > Ok > > > > Here it is one last time (I hope) with the updates ;-) > > One more suggestion. I would make the check in buf_ring_putback_sc() a > KASSERT() so that in the production case we don't pay for a branch that > should > never occur. > > -- > John Baldwin > From owner-freebsd-net@FreeBSD.ORG Tue Feb 5 21:40:23 2013 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 120D726A; Tue, 5 Feb 2013 21:40:23 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from bigwig.baldwin.cx (bigknife-pt.tunnel.tserv9.chi1.ipv6.he.net [IPv6:2001:470:1f10:75::2]) by mx1.freebsd.org (Postfix) with ESMTP id DEA829A2; Tue, 5 Feb 2013 21:40:22 +0000 (UTC) Received: from pakbsde14.localnet (unknown [38.105.238.108]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id 54C42B91E; Tue, 5 Feb 2013 16:40:22 -0500 (EST) From: John Baldwin To: Andre Oppermann Subject: Re: [PATCH] Add a new TCP_IGNOREIDLE socket option Date: Tue, 5 Feb 2013 16:40:21 -0500 User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110714-p22; KDE/4.5.5; amd64; ; ) References: <201301221511.02496.jhb@freebsd.org> <201302051211.43345.jhb@freebsd.org> <511144FB.50807@freebsd.org> In-Reply-To: <511144FB.50807@freebsd.org> MIME-Version: 1.0 Content-Type: 
Text/Plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Message-Id: <201302051640.21412.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7 (bigwig.baldwin.cx); Tue, 05 Feb 2013 16:40:22 -0500 (EST) Cc: Sepherosa Ziehau , freebsd-net@freebsd.org, Bjoern Zeeb X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 05 Feb 2013 21:40:23 -0000 On Tuesday, February 05, 2013 12:44:27 pm Andre Oppermann wrote: > On 05.02.2013 18:11, John Baldwin wrote: > > On Wednesday, January 30, 2013 12:26:17 pm Andre Oppermann wrote: > >> You can simply create your own congestion control algorithm with only the > >> restart window changed. See (pseudo) code below. BTW, I just noticed that > >> the other cc algos don't do not reset the idle window. > > > > *sigh* I am fully competent at maintaining my own local changes. The point > > was to share this so that other people with similar workloads could make use > > of it. Also, a custom CC algo is not the right approach as we would want this > > change regardless of the CC algo used for handling non-idle periods (so that > > this is an orthogonal knob). Linux also makes this an orthogonal knob rather > > than requiring a separate CC algo. > > If everything Linux does is good, then go ahead and commit it. Discussing > this change further then is pointless. I don't mind too much and I have > stated my case why I think it's the wrong thing to do. Not everything Linux does is good, nor is everything Linux does bad. > I would prefer to encapsulate it into its own not-so-much-congestion-management > algorithm so you can eventually do other tweaks as well like more aggressive > loss recovery which would fit your objective as well. 
Since you have to modify > your app anyways to do the sockopt call this seems a more complete solution to > me. At least better than to do a non-portable hack that violates one of the > most fundamental TCP concepts. This is real rich from the guy pushing the increased IW that came from Linux. :) "Tools not policy" yadda yadda, but I digress. -- John Baldwin From owner-freebsd-net@FreeBSD.ORG Wed Feb 6 01:57:44 2013 Return-Path: Delivered-To: freebsd-net@smarthost.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id E3C8E5F8; Wed, 6 Feb 2013 01:57:44 +0000 (UTC) (envelope-from linimon@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:1900:2254:206c::16:87]) by mx1.freebsd.org (Postfix) with ESMTP id BF426635; Wed, 6 Feb 2013 01:57:44 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.6/8.14.6) with ESMTP id r161viHe001964; Wed, 6 Feb 2013 01:57:44 GMT (envelope-from linimon@freefall.freebsd.org) Received: (from linimon@localhost) by freefall.freebsd.org (8.14.6/8.14.6/Submit) id r161vi5C001960; Wed, 6 Feb 2013 01:57:44 GMT (envelope-from linimon) Date: Wed, 6 Feb 2013 01:57:44 GMT Message-Id: <201302060157.r161vi5C001960@freefall.freebsd.org> To: linimon@FreeBSD.org, freebsd-bugs@FreeBSD.org, freebsd-net@FreeBSD.org From: linimon@FreeBSD.org Subject: Re: kern/175864: [re] Intel MB D510MO, onboard ethernet not working after update to 9.1 [regression] X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 06 Feb 2013 01:57:45 -0000 Synopsis: [re] Intel MB D510MO, onboard ethernet not working after update to 9.1 [regression] Responsible-Changed-From-To: freebsd-bugS->freebsd-net Responsible-Changed-By: linimon 
Responsible-Changed-When: Wed Feb 6 01:57:30 UTC 2013 Responsible-Changed-Why: er, forgot to reassign. http://www.freebsd.org/cgi/query-pr.cgi?pr=175864 From owner-freebsd-net@FreeBSD.ORG Wed Feb 6 03:44:13 2013 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 8A78E7CD for ; Wed, 6 Feb 2013 03:44:13 +0000 (UTC) (envelope-from araujobsdport@gmail.com) Received: from mail-we0-x236.google.com (we-in-x0236.1e100.net [IPv6:2a00:1450:400c:c03::236]) by mx1.freebsd.org (Postfix) with ESMTP id 159C9A16 for ; Wed, 6 Feb 2013 03:44:12 +0000 (UTC) Received: by mail-we0-f182.google.com with SMTP id t57so779021wey.27 for ; Tue, 05 Feb 2013 19:44:11 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:x-received:reply-to:date:message-id:subject:from:to :content-type; bh=iWz0VkFrZ0Pcvgc2p4ULjC0cHIFREL8IrCszwWthJbc=; b=ujsFn60wT0itfEQ3wLIuwNnVF+qWZ56OZyKLbzyrvG0V6MgqvEQAkDpxXVIywW5o4/ ZC18zBUPTIU72OCgZgiKl0xrzVGereWkNnoq+/jd6e8fLDXT+80oxr605c4dg9wtK/M/ yPxaBuPceLf0bT6Xe0V9Uvp+AURpZ3k2RWkAQ4pPAdGt/E+S33xRjosXiTe0VSVxoOok sm4ZcVuspYqdao+6B4VkEttcmSnOhkAZwIPZcWkAF9qR3gUO5FD5ijdJ62kYi73dnu0h t6UuMlFDdGF/tIbjV8zK96Zgh87j6GU+/LgPey4rdGVf9e/25z/4+OCd1GjBTixYkAMU 053A== MIME-Version: 1.0 X-Received: by 10.180.85.97 with SMTP id g1mr2141805wiz.29.1360122251883; Tue, 05 Feb 2013 19:44:11 -0800 (PST) Received: by 10.180.145.44 with HTTP; Tue, 5 Feb 2013 19:44:11 -0800 (PST) Date: Wed, 6 Feb 2013 11:44:11 +0800 Message-ID: Subject: Patch: carpdev for 9.1-RELEASE. 
From: Marcelo Araujo To: freebsd-net@freebsd.org Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.14 X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list Reply-To: araujo@FreeBSD.org List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 06 Feb 2013 03:44:13 -0000 Hello all, Based on all the changes made by glebius@, I made a patch that brings carpdev support to 9.1-RELEASE. This patch will not be submitted to 9-STABLE! That is not my intention. As I need carpdev on 9.1-RELEASE, I made it, and now I want to share it with you guys. It works properly for me; however, if you have any review comments or problems, feel free to let me know and I'll try my best to fix them. Patch at: http://people.freebsd.org/~araujo/carpdev/ After applying the patch and rebuilding world/kernel, you can set up CARP as follows:
root# ifconfig ix0 10.0.0.1 vhid 100 pass my_password
root# ifconfig ix0 vhid 100 state backup
root# ifconfig ix0 -k
Note: Just remember, it is experimental; don't use it in production.
Best Regards, -- Marcelo Araujo araujo@FreeBSD.org From owner-freebsd-net@FreeBSD.ORG Wed Feb 6 05:39:37 2013 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 8541BFCE; Wed, 6 Feb 2013 05:39:37 +0000 (UTC) (envelope-from sodynet1@gmail.com) Received: from mail-ia0-x22f.google.com (ia-in-x022f.1e100.net [IPv6:2607:f8b0:4001:c02::22f]) by mx1.freebsd.org (Postfix) with ESMTP id 25684E2E; Wed, 6 Feb 2013 05:39:37 +0000 (UTC) Received: by mail-ia0-f175.google.com with SMTP id r4so1094915iaj.20 for ; Tue, 05 Feb 2013 21:39:36 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:x-received:in-reply-to:references:date:message-id :subject:from:to:cc:content-type; bh=awbHDzr9p8duoL+YGN1+4sWZiu1LBxz2qGIwukuUbR8=; b=cIOS6Flesfm8GcRgiWyYUKNooJKEq1mzqF6Z9jMXd/lFm5n38SSmh2JNpJdzv9f4sx PATpwsVivBWz1RREsZViJxk+fhD3r7Dt47ajqgXPVpHBfxQemExv17dadW4QISVUf55c WAhhxAmILx1h/KA5BOkgJ3jZQKCo3+Z6LVhgLQ/RjLjNjLAkJmh9asBe4yWPxz5gZmBu XAbDiG9ClWCaple+siMUC/s9RA1uabbJGsKXqpanz3HCaZUBSgaRIk4uUQ+w6KB7g+u3 3HWSTOH8XKAsvou538aDCy6HNC1P3Dy+mnS1U2ZN71KLqGOcwygCkEjtYALzKiiV7NY6 YGNA== MIME-Version: 1.0 X-Received: by 10.50.187.169 with SMTP id ft9mr3696055igc.25.1360129176749; Tue, 05 Feb 2013 21:39:36 -0800 (PST) Received: by 10.64.51.98 with HTTP; Tue, 5 Feb 2013 21:39:36 -0800 (PST) Received: by 10.64.51.98 with HTTP; Tue, 5 Feb 2013 21:39:36 -0800 (PST) In-Reply-To: References: Date: Wed, 6 Feb 2013 07:39:36 +0200 Message-ID: Subject: Re: Patch: carpdev for 9.1-RELEASE. 
From: Sami Halabi To: araujo@freebsd.org Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.14 Cc: freebsd-net@freebsd.org X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 06 Feb 2013 05:39:37 -0000 Hi, Is there an explanation somewhere of what carpdev is, and of its benefits and usage? I googled and ended up on mailing lists that do not describe it but only discuss bringing carpdev over from OpenBSD/NetBSD. Thanks in advance, Sami On Feb 6, 2013 5:44 AM, "Marcelo Araujo" wrote: > Hello all, > > Based on all changes made by glebius@, I made a patch to bring to > 9.1-RELEASE the capability to have the carpdev. > > This patch will not be submitted to 9-STABLE! This is not my intention. > > As I need carpdev on 9.1-RELEASE, I made it and now I want share with you > guys. It works properly for me, however any review our problem, feel free > to let me know and I'll try my best to fix it. > > Patch at: http://people.freebsd.org/~araujo/carpdev/ > > After apply the patch and build world/kernel, you can setup the carp as > follow: > root# ifconfig ix0 10.0.0.1 vhid 100 pass my_password > root# ifconfig ix0 vhid 100 state backup > root# ifconfig ix0 -k > > Note: Just remember, it is experimental, don't use it on production.
> > Best Regards, > -- > Marcelo Araujo > araujo@FreeBSD.org > _______________________________________________ > freebsd-net@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-net > To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org" > From owner-freebsd-net@FreeBSD.ORG Wed Feb 6 07:23:36 2013 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 23B49A0A for ; Wed, 6 Feb 2013 07:23:36 +0000 (UTC) (envelope-from araujobsdport@gmail.com) Received: from mail-we0-x236.google.com (we-in-x0236.1e100.net [IPv6:2a00:1450:400c:c03::236]) by mx1.freebsd.org (Postfix) with ESMTP id 98980268 for ; Wed, 6 Feb 2013 07:23:35 +0000 (UTC) Received: by mail-we0-f182.google.com with SMTP id t57so886986wey.27 for ; Tue, 05 Feb 2013 23:23:34 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:x-received:reply-to:in-reply-to:references:date :message-id:subject:from:to:cc:content-type; bh=8AQsN43dJ+J+QasrDc37HStB07FkN5CbZENb8JCYMmI=; b=IgH5A9f6NzqNpSZPBQRJnYwS1Ba4JtpTJaCGpw+Nec0Sz/k7W4sPbyEk3EpSzsxNPN z8pimcOXPaIb6Y8DPTGRz52eUE4WID9K1U98YkTP+2ILHaC3gllD+Tbu9mMNLC4Bg5BW silL6+8cTJ+BHk9Ft44vQjhsycchVvDwwVD7B+u4+zbhtwnVN438UdvnByG5pTxNjCPw uuIDIL2866QsHMJckpXuHwLu1l/XuIxR8E2VNKsXnVxTkc+UtJlMCI8iBA4fAigL1M0V M933xN8/xcuz+RsHjGTE9qXUQTuFiTKZ4dNkRhBrA5PUNtJbweCMhZc6lJwZG3VbH+iP 2haA== MIME-Version: 1.0 X-Received: by 10.180.101.98 with SMTP id ff2mr3159398wib.0.1360135414860; Tue, 05 Feb 2013 23:23:34 -0800 (PST) Received: by 10.180.145.44 with HTTP; Tue, 5 Feb 2013 23:23:34 -0800 (PST) In-Reply-To: References: Date: Wed, 6 Feb 2013 15:23:34 +0800 Message-ID: Subject: Re: Patch: carpdev for 9.1-RELEASE. 
From: Marcelo Araujo To: Sami Halabi Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.14 Cc: freebsd-net@freebsd.org X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list Reply-To: araujo@FreeBSD.org List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 06 Feb 2013 07:23:36 -0000 2013/2/6 Sami Halabi > Hi, > Is there explaination somewhere what ix carpdev, benefits snd usage? I > googled, an endex on mailinv listz nog describing but bringing carpdev frkm > open/net bsd. > > Thanks in advance, > Sami > > Hello, Well, the patch was based on this revision of HEAD: http://svnweb.freebsd.org/base?view=revision&revision=228571 Check the commit log for more information about it. Best Regards, -- Marcelo Araujo araujo@FreeBSD.org From owner-freebsd-net@FreeBSD.ORG Wed Feb 6 07:47:52 2013 Return-Path: Delivered-To: freebsd-net@smarthost.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id E16E030A; Wed, 6 Feb 2013 07:47:52 +0000 (UTC) (envelope-from araujo@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:1900:2254:206c::16:87]) by mx1.freebsd.org (Postfix) with ESMTP id A3E1B372; Wed, 6 Feb 2013 07:47:52 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.6/8.14.6) with ESMTP id r167lq6h066371; Wed, 6 Feb 2013 07:47:52 GMT (envelope-from araujo@freefall.freebsd.org) Received: (from araujo@localhost) by freefall.freebsd.org (8.14.6/8.14.6/Submit) id r167lq7g066367; Wed, 6 Feb 2013 07:47:52 GMT (envelope-from araujo) Date: Wed, 6 Feb 2013 07:47:52 GMT Message-Id: <201302060747.r167lq7g066367@freefall.freebsd.org> To: araujo@FreeBSD.org, araujo@FreeBSD.org, freebsd-net@FreeBSD.org From: araujo@FreeBSD.org Subject: Re: kern/171728: [arp] arp issue
X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 06 Feb 2013 07:47:52 -0000 Synopsis: [arp] arp issue State-Changed-From-To: open->closed State-Changed-By: araujo State-Changed-When: Wed Feb 6 07:47:52 UTC 2013 State-Changed-Why: It was a reply to another PR. :D http://www.freebsd.org/cgi/query-pr.cgi?pr=171728 From owner-freebsd-net@FreeBSD.ORG Wed Feb 6 10:32:39 2013 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 32A6A7D3; Wed, 6 Feb 2013 10:32:39 +0000 (UTC) (envelope-from lars@netapp.com) Received: from mx12.netapp.com (mx12.netapp.com [216.240.18.77]) by mx1.freebsd.org (Postfix) with ESMTP id 176C8F17; Wed, 6 Feb 2013 10:32:38 +0000 (UTC) X-IronPort-AV: E=Sophos;i="4.84,614,1355126400"; d="scan'208";a="17031072" Received: from smtp1.corp.netapp.com ([10.57.156.124]) by mx12-out.netapp.com with ESMTP; 06 Feb 2013 02:32:36 -0800 Received: from vmwexceht01-prd.hq.netapp.com (exchsmtp.hq.netapp.com [10.106.76.239]) by smtp1.corp.netapp.com (8.13.1/8.13.1/NTAP-1.6) with ESMTP id r16AWa90025677; Wed, 6 Feb 2013 02:32:36 -0800 (PST) Received: from SACEXCMBX01-PRD.hq.netapp.com ([169.254.2.54]) by vmwexceht01-prd.hq.netapp.com ([10.106.76.239]) with mapi id 14.02.0328.009; Wed, 6 Feb 2013 02:32:35 -0800 From: "Eggert, Lars" To: Jack Vogel Subject: Re: Data Center Bridging? Thread-Topic: Data Center Bridging? 
Thread-Index: AQHN+LdDgifA/mtK/UqYvtRGZs2iTJhWEL0AgAACXoCAABzGAIAXD1iA Date: Wed, 6 Feb 2013 10:32:34 +0000 Message-ID: References: <50FEBF0C.6050307@freebsd.org> <50FEC109.5080601@freebsd.org> In-Reply-To: Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [10.106.53.51] Content-Type: text/plain; charset="iso-8859-1" Content-ID: <83A03C9D87A4D94F84CDC20D4C836075@tahoe.netapp.com> Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Cc: "freebsd-net@freebsd.org" , "Vogel, Jack" X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 06 Feb 2013 10:32:39 -0000 Hi Jack, On Jan 22, 2013, at 19:23, Jack Vogel wrote: > I have never implemented this in the FreeBSD drivers primarily because the > motivation for it, say, in Linux, > was to handle multiple traffic classes, for instance FCoE or iSCSI, but > FreeBSD has not had these features > to implement this for. Give me a reason to do it, and I can see about > adding it :) I'm interested in seeing if DCB can be used for lossless IP communication over a simple and private LAN fabric. I have some student cycles that I can direct at helping with the implementation, if that's useful?
Lars From owner-freebsd-net@FreeBSD.ORG Wed Feb 6 10:39:21 2013 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 5E72E9B4; Wed, 6 Feb 2013 10:39:21 +0000 (UTC) (envelope-from rrs@lakerest.net) Received: from lakerest.net (lakerest.net [70.155.160.98]) by mx1.freebsd.org (Postfix) with ESMTP id DF20DF61; Wed, 6 Feb 2013 10:39:20 +0000 (UTC) Received: from [10.1.1.101] (bsd4.lakerest.net [70.155.160.102]) (authenticated bits=0) by lakerest.net (8.14.4/8.14.3) with ESMTP id r16AdZRY052379 (version=TLSv1/SSLv3 cipher=AES128-SHA bits=128 verify=NOT); Wed, 6 Feb 2013 05:39:36 -0500 (EST) (envelope-from rrs@lakerest.net) Subject: Re: Driver patch to look at... Mime-Version: 1.0 (Apple Message framework v1283) Content-Type: text/plain; charset=windows-1252 From: Randall Stewart In-Reply-To: <201302051552.04275.jhb@freebsd.org> Date: Wed, 6 Feb 2013 05:39:19 -0500 Content-Transfer-Encoding: quoted-printable Message-Id: References: <201302051411.52495.jhb@freebsd.org> <74624C61-0564-420F-9F82-0475488F857C@lakerest.net> <201302051552.04275.jhb@freebsd.org> To: John Baldwin X-Mailer: Apple Mail (2.1283) Cc: freebsd-net@freebsd.org, Robert Watson , Jack Vogel , Kip Macy X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 06 Feb 2013 10:39:21 -0000 Good idea… I will commit this late today, just in case there are any trailing comments ;-) R On Feb 5, 2013, at 3:52 PM, John Baldwin wrote: > On Tuesday, February 05, 2013 2:30:36 pm Randall Stewart wrote: >> Ok >> >> Here it is one last time (I hope) with the updates ;-) > > One more suggestion.
I would make the check in buf_ring_putback_sc() a > KASSERT() so that in the production case we don't pay for a branch that should > never occur. > > -- > John Baldwin ------------------------------ Randall Stewart 803-317-4952 (cell) From owner-freebsd-net@FreeBSD.ORG Wed Feb 6 10:53:40 2013 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id CC9ADCA1 for ; Wed, 6 Feb 2013 10:53:40 +0000 (UTC) (envelope-from rrs@lakerest.net) Received: from lakerest.net (lakerest.net [70.155.160.98]) by mx1.freebsd.org (Postfix) with ESMTP id 7C95FFF8 for ; Wed, 6 Feb 2013 10:53:40 +0000 (UTC) Received: from [10.1.1.101] (bsd4.lakerest.net [70.155.160.102]) (authenticated bits=0) by lakerest.net (8.14.4/8.14.3) with ESMTP id r16Artkw052473 (version=TLSv1/SSLv3 cipher=AES128-SHA bits=128 verify=NOT); Wed, 6 Feb 2013 05:53:56 -0500 (EST) (envelope-from rrs@lakerest.net) Subject: Re: Data Center Bridging? Mime-Version: 1.0 (Apple Message framework v1283) Content-Type: text/plain; charset=us-ascii From: Randall Stewart In-Reply-To: Date: Wed, 6 Feb 2013 05:53:39 -0500 Content-Transfer-Encoding: quoted-printable Message-Id: <4833B5CE-33A7-40EA-862D-4E345083FABE@lakerest.net> References: To: "Eggert, Lars" , Jack Vogel X-Mailer: Apple Mail (2.1283) Cc: freebsd-net X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 06 Feb 2013 10:53:40 -0000 Lars/Jack: I am pretty sure that my company would be interested in it as well,
and I can help out here too ;-) Jack: are there particular versions of Intel cards that this needs to be on? (We have both igb and ix cards in my office now.) Thanks. R On Jan 22, 2013, at 10:43 AM, Eggert, Lars wrote: > Hi, > > on Linux, various NICs (e.g., ixgbe) support Data Center Bridging. Is this also available under FreeBSD? Do *any* NICs support DCB under FreeBSD? > > Thanks, > Lars ------------------------------ Randall Stewart 803-317-4952 (cell) From owner-freebsd-net@FreeBSD.ORG Wed Feb 6 11:27:06 2013 Return-Path: Delivered-To: net@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id D9C7E731; Wed, 6 Feb 2013 11:27:06 +0000 (UTC) (envelope-from rrs@lakerest.net) Received: from lakerest.net (lakerest.net [70.155.160.98]) by mx1.freebsd.org (Postfix) with ESMTP id 49EFD1E7; Wed, 6 Feb 2013 11:27:05 +0000 (UTC) Received: from [10.1.1.101] (bsd4.lakerest.net [70.155.160.102]) (authenticated bits=0) by lakerest.net (8.14.4/8.14.3) with ESMTP id r16BRLbK052791 (version=TLSv1/SSLv3 cipher=AES128-SHA bits=128 verify=NOT); Wed, 6 Feb 2013 06:27:21 -0500 (EST) (envelope-from rrs@lakerest.net) Subject: Re: [PATCH] Add a new TCP_IGNOREIDLE socket option Mime-Version: 1.0 (Apple Message framework v1283) Content-Type: text/plain; charset=us-ascii From: Randall Stewart In-Reply-To: <50FF06AD.402@networx.ch> Date: Wed, 6 Feb 2013 06:27:04 -0500 Content-Transfer-Encoding: quoted-printable Message-Id: <061B4EA5-6A93-48A0-A269-C2C3A3C7E77C@lakerest.net> References: <201301221511.02496.jhb@freebsd.org> <50FEF81C.1070002@mu.org> <50FF06AD.402@networx.ch> To: John Baldwin X-Mailer: Apple Mail (2.1283) Cc: Alfred Perlstein , net@freebsd.org X-BeenThere: freebsd-net@freebsd.org
X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 06 Feb 2013 11:27:06 -0000 John: A burst at line rate will *often* cause drops, because router queues are of finite size. Also, such a burst (especially on a network with a large bandwidth-delay product) causes your RTT to increase even if there is no drop, which is going to hurt you as well. A SHOULD in an RFC says you really really really really need to do it unless there is something that makes you willing to override it. It is only slight wiggle room. In this I agree with Andre: we should not be *not* doing it. Otherwise folks will be turning this on, and it is plain wrong. It may be fine for your network, but I would not want to see it in FreeBSD. In my testing here at home I have put max-burst back into our stack. This uses Mark Allman's version (not Kacheong Poon's), where you clamp the cwnd at no more than 4 packets larger than your flight. All of my testing, on high bandwidth-delay links and on LANs alike, has shown this to improve TCP performance, because it helps you avoid bursting out so many packets that you overflow a queue. On your long-delay, high-bandwidth link, if you do burst out too many (and you never know how many that is, since you cannot predict how full all those MPLS queues are or how big they are), you will hurt yourself even worse. Note that in Cisco routers the default queue size is generally somewhere between 100 and 300 packets, depending on the router. Bottom line: IMO this is a bad idea. If you want to really improve that link, let me get with you offline and we can see about getting you a couple of our boxes again :-D.
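The window arithmetic argued over in this thread is small enough to sketch. The C fragment below is a minimal, hypothetical illustration: `struct snd_state`, `restart_after_idle()`, `decay_after_idle()`, `clamp_max_burst()` and `MAX_BURST` are invented for this example and are not FreeBSD kernel code. It assumes the RFC 6928 initial window, min(10*MSS, max(2*MSS, 14600 bytes)), which matches Andre's "up to 10 MSS nowadays"; applies the RFC 5681 section 4.1 restart window RW = min(IW, cwnd) after an idle period longer than one RTO; shows an RFC 2861-style per-RTO halving as one possible decay scheme; and clamps cwnd to at most four segments beyond the data in flight, as Randall describes max-burst. John's proposed TCP_IGNOREIDLE option would amount to skipping the idle-restart step entirely.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical, simplified sender state for illustration only; this is
 * not FreeBSD's struct tcpcb and none of these names are kernel APIs.
 */
struct snd_state {
	uint32_t mss;		/* sender maximum segment size, bytes */
	uint32_t cwnd;		/* congestion window, bytes */
	uint32_t flight;	/* bytes sent but not yet acknowledged */
	uint64_t last_send;	/* time of the last transmission, ms */
	uint64_t rto;		/* current retransmission timeout, ms */
};

#define MAX_BURST 4		/* segments, per the max-burst description */

static uint32_t
min_u32(uint32_t a, uint32_t b)
{
	return (a < b ? a : b);
}

static uint32_t
max_u32(uint32_t a, uint32_t b)
{
	return (a > b ? a : b);
}

/* Initial window from RFC 6928: min(10*MSS, max(2*MSS, 14600 bytes)). */
static uint32_t
initial_window(uint32_t mss)
{
	return (min_u32(10 * mss, max_u32(2 * mss, 14600)));
}

/*
 * RFC 5681 section 4.1 restart: if nothing was sent for longer than one
 * RTO, shrink cwnd to the restart window RW = min(IW, cwnd).
 * An "ignore idle" socket option would simply skip this call.
 */
static void
restart_after_idle(struct snd_state *s, uint64_t now)
{
	if (now - s->last_send > s->rto)
		s->cwnd = min_u32(initial_window(s->mss), s->cwnd);
}

/*
 * Time-based decay in the spirit of RFC 2861: halve cwnd once for every
 * full RTO spent idle, but never go below RW.  The ssthresh adjustment
 * that RFC 2861 also specifies is omitted from this sketch.
 */
static void
decay_after_idle(struct snd_state *s, uint64_t now)
{
	uint32_t rw = min_u32(initial_window(s->mss), s->cwnd);
	uint64_t periods = s->rto ? (now - s->last_send) / s->rto : 0;

	for (; periods > 0 && s->cwnd > rw; periods--)
		s->cwnd = max_u32(s->cwnd / 2, rw);
}

/*
 * Max-burst: clamp cwnd to at most MAX_BURST segments beyond what is
 * already in flight, so one send opportunity cannot dump a large burst
 * into router queues.
 */
static void
clamp_max_burst(struct snd_state *s)
{
	uint32_t limit = s->flight + MAX_BURST * s->mss;

	if (s->cwnd > limit)
		s->cwnd = limit;
}
```

For example, with a 1460-byte MSS and a 146000-byte cwnd, a connection idle for five RTOs restarts at 14600 bytes (10 segments) under `restart_after_idle()`, whereas `decay_after_idle()` after two idle RTOs only halves twice, to 36500 bytes. With 14600 bytes in flight, `clamp_max_burst()` limits cwnd to 20440 bytes.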
R On Jan 22, 2013, at 4:37 PM, Andre Oppermann wrote: > On 22.01.2013 21:35, Alfred Perlstein wrote: >> On 1/22/13 12:11 PM, John Baldwin wrote: >>> As I mentioned in an earlier thread, I recently had to debug an issue we were >>> seeing across a link with a high bandwidth-delay product (both high bandwidth >>> and high RTT). Our specific use case was to use a TCP connection to reliably >>> forward a latency-sensitive datagram stream across a WAN connection. We would >>> often see spikes in the latency of individual datagrams. I eventually tracked >>> this down to the connection entering slow start when it would transmit data >>> after being idle. The data stream was quite bursty and would often attempt to >>> transmit a burst of data after being idle for far longer than a retransmit >>> timeout. >>> >>> In 7.x we had worked around this in the past by disabling RFC 3390 and jacking >>> the slow start window size up via a sysctl. On 8.x this no longer worked. >>> The solution I came up with was to add a new socket option to disable idle >>> handling completely. That is, when an idle connection restarts with this new >>> option enabled, it keeps its current congestion window and doesn't enter slow >>> start. >>> >>> There are only a few cases where such an option is useful, but if anyone else >>> thinks this might be useful I'd be happy to add the option to FreeBSD. >> >> This looks good, but it almost sounds like a bug for TCP to be doing this anyhow. > > It's not a bug. It's by design. It's required by the RFC. > >> Why would one want this behavior? > > Network conditions change all the time. Traffic and congestion comes and goes. > Connections can go idle for milliseconds to minutes to hours. Whenever "enough" > time has passed network capacity probing has to start anew. > >> Wouldn't it make sense to keep the window large until there was a problem rather than >> unconditionally chop it down?
I almost think TCP is afraid that you might wind up swapping out a >> 10gig interface for a modem? I'm just not getting it. (Probably a simple oversight on my part.) > > The very real fear is congestion meltdown. That is the reason we ended up with > TCP's AIMD mechanism in the first place. If everybody were to blast into the > network, everyone would suffer. The bufferbloat issue identified recently makes things > even worse. > >> What do you think about also making this a sysctl for global on/off by default? > > Please don't. The correct fix is either a) to use the initial window as the restart > window (up to 10 MSS nowadays); or b) to use a decay mechanism based on the time since > the last network condition probe. Even the latter must decay to initCWND within at > most 1 MSL. > > -- > Andre ------------------------------ Randall Stewart 803-317-4952 (cell) From owner-freebsd-net@FreeBSD.ORG Wed Feb 6 11:32:30 2013 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 4B2C8821; Wed, 6 Feb 2013 11:32:30 +0000 (UTC) (envelope-from rrs@lakerest.net) Received: from lakerest.net (lakerest.net [70.155.160.98]) by mx1.freebsd.org (Postfix) with ESMTP id BE15121E; Wed, 6 Feb 2013 11:32:29 +0000 (UTC) Received: from [10.1.1.101] (bsd4.lakerest.net [70.155.160.102]) (authenticated bits=0) by lakerest.net (8.14.4/8.14.3) with ESMTP id r16BWjsU052830 (version=TLSv1/SSLv3 cipher=AES128-SHA bits=128 verify=NOT); Wed, 6 Feb 2013 06:32:45 -0500 (EST) (envelope-from rrs@lakerest.net) Subject: Re: [PATCH] Add a new TCP_IGNOREIDLE socket option Mime-Version: 1.0 (Apple Message framework v1283) Content-Type: text/plain;
charset=us-ascii From: Randall Stewart In-Reply-To: <201301241114.40734.jhb@freebsd.org> Date: Wed, 6 Feb 2013 06:32:28 -0500 Content-Transfer-Encoding: quoted-printable Message-Id: References: <201301221511.02496.jhb@freebsd.org> <5100EAD3.2090006@networx.ch> <201301241114.40734.jhb@freebsd.org> To: John Baldwin X-Mailer: Apple Mail (2.1283) Cc: Sepherosa Ziehau , freebsd-net@freebsd.org, Bjoern Zeeb X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 06 Feb 2013 11:32:30 -0000 John: In-line On Jan 24, 2013, at 11:14 AM, John Baldwin wrote: > On Thursday, January 24, 2013 3:03:31 am Andre Oppermann wrote: >> On 24.01.2013 03:31, Sepherosa Ziehau wrote: >>> On Thu, Jan 24, 2013 at 12:15 AM, John Baldwin wrote: >>>> On Wednesday, January 23, 2013 1:33:27 am Sepherosa Ziehau wrote: >>>>> On Wed, Jan 23, 2013 at 4:11 AM, John Baldwin wrote: >>>>>> As I mentioned in an earlier thread, I recently had to debug an issue we were >>>>>> seeing across a link with a high bandwidth-delay product (both high bandwidth >>>>>> and high RTT). Our specific use case was to use a TCP connection to reliably >>>>>> forward a latency-sensitive datagram stream across a WAN connection. We would >>>>>> often see spikes in the latency of individual datagrams. I eventually tracked >>>>>> this down to the connection entering slow start when it would transmit data >>>>>> after being idle. The data stream was quite bursty and would often attempt to >>>>>> transmit a burst of data after being idle for far longer than a retransmit >>>>>> timeout. >>>>>> >>>>>> In 7.x we had worked around this in the past by disabling RFC 3390 and jacking >>>>>> the slow start window size up via a sysctl. On 8.x this no longer worked.
>>>>>> The solution I came up with was to add a new socket option to disable idle >>>>>> handling completely. That is, when an idle connection restarts with this new >>>>>> option enabled, it keeps its current congestion window and doesn't enter slow >>>>>> start. >>>>>> >>>>>> There are only a few cases where such an option is useful, but if anyone else >>>>>> thinks this might be useful I'd be happy to add the option to FreeBSD. >>>>> >>>>> I think what you need is RFC2861; however, you probably should >>>>> ignore the "application-limited period" part of RFC2861. >>>> >>>> Hummm. It appears, btw, that Linux uses RFC 2861 but has a global knob to >>>> disable it due to applications having problems. When it is disabled, >>>> it doesn't decay the congestion window at all during idle handling. That is, >>>> it appears to act the same as if TCP_IGNOREIDLE were enabled. >>>> >>>> From http://www.kernel.org/doc/man-pages/online/pages/man7/tcp.7.html: >>>> >>>> tcp_slow_start_after_idle (Boolean; default: enabled; since Linux 2.6.18) >>>> If enabled, provide RFC 2861 behavior and time out the congestion >>>> window after an idle period. An idle period is defined as the current >>>> RTO (retransmission timeout). If disabled, the congestion window will >>>> not be timed out after an idle period. >>>> >>>> Also, in this thread on tcpm it appears no one on that list realizes that >>>> there are any implementations which follow the "SHOULD" in RFC 2581 for idle >>>> handling (which is what we do currently): >>> >>> Nah, I don't think the idle detection in FreeBSD follows >>> RFC2581/RFC5681 section 4.1 (the paragraph before the "SHOULD"). IMHO, that's >>> probably why the author of the following email questioned the >>> implementation of "SHOULD" in RFC2581/RFC5681.
>>> >>>> >>>> http://www.ietf.org/mail-archive/web/tcpm/current/msg02864.html >>>> >>>> So if we were to implement RFC 2861, the new socket option would be equivalent >>>> to setting Linux's 'tcp_slow_start_after_idle' to false, but on a per-socket >>>> basis rather than globally. >>> >>> Agreed, a per-socket option could be more useful than global sysctls in >>> certain situations. However, in addition to the per-socket option, >>> could global sysctl nodes to disable idle_restart/idle_cwv help too? >> >> No. This is far too dangerous once it makes it into some tuning guide. >> The threat of congestion breakdown is real. The Internet, or any packet >> network, can only survive in the long term if almost all follow the rules >> and self-constrain to remain fair to the others. What would happen if >> nobody respected the traffic lights anymore? > > The problem with this argument is that Linux has already had this as a tunable > option for years and the Internet hasn't melted as a result. Just because Linux behaves badly does *not* mean that we have to. They also put BIC CC in by default, and this makes things even worse for users than RFC2581 does, in the bufferbloat sense. The bufferbloat problems reported by Jim Gettys would not have been nearly as bad (they still would have existed) if he had been using standard RFC2581 CC. There are much better (and safer) ways to handle this type of network. Putting this in is not a good idea IMO. > >> Besides that, bursting into unknown network conditions is very likely to >> result in burst losses as well. TCP isn't good at recovering from it. >> In the end you most likely come out ahead if you decay the restartCWND. >> >> We have two cases primarily: a) long distance, medium to high RTT, and >> wildly varying bandwidth (a.k.a. the Internet); b) short distance, low >> RTT and mostly plenty of bandwidth (a.k.a. Datacenter). The former
The former >> absolutely definately requires a decayed restartCWND. The latter = less >> so but even there bursting at 10Gig TSO assisted wirespeed isn't = going >> to end too happy more often than not. >=20 > You forgot my case: c) dedicated long distance links with high = bandwidth. And it may help a little, but you are *far* likely, depending on what is going on in that link, to overflow your router queues. Hurting that flow even more. R >=20 >> Since this seems to be a burning issue I'll come up with a patch in = the >> next days to add a decaying restartCWND that'll be fair and allow a = very >> quick ramp up if no loss occurs. >=20 > I think this could be useful. OTOH, I still think the TCP_IGNOREIDLE = option > is useful both with and without a decaying restartCWND? >=20 > --=20 > John Baldwin > _______________________________________________ > freebsd-net@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-net > To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org" >=20 ------------------------------ Randall Stewart 803-317-4952 (cell) From owner-freebsd-net@FreeBSD.ORG Wed Feb 6 13:16:52 2013 Return-Path: Delivered-To: net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id E7721DF9 for ; Wed, 6 Feb 2013 13:16:52 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from bigwig.baldwin.cx (bigknife-pt.tunnel.tserv9.chi1.ipv6.he.net [IPv6:2001:470:1f10:75::2]) by mx1.freebsd.org (Postfix) with ESMTP id 3940C943 for ; Wed, 6 Feb 2013 13:16:52 +0000 (UTC) Received: from pakbsde14.localnet (unknown [38.105.238.108]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id 73BD1B911; Wed, 6 Feb 2013 08:16:51 -0500 (EST) From: John Baldwin To: Randall Stewart Subject: Re: [PATCH] Add a new TCP_IGNOREIDLE socket option Date: Wed, 6 Feb 2013 07:46:43 -0500 User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110714-p22; KDE/4.5.5; amd64; ; ) References: 
<201301221511.02496.jhb@freebsd.org> <50FF06AD.402@networx.ch> <061B4EA5-6A93-48A0-A269-C2C3A3C7E77C@lakerest.net> In-Reply-To: <061B4EA5-6A93-48A0-A269-C2C3A3C7E77C@lakerest.net> MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Message-Id: <201302060746.43736.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7 (bigwig.baldwin.cx); Wed, 06 Feb 2013 08:16:51 -0500 (EST) Cc: Alfred Perlstein , net@freebsd.org X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 06 Feb 2013 13:16:53 -0000 On Wednesday, February 06, 2013 6:27:04 am Randall Stewart wrote: > John: > > A burst at line rate will *often* cause drops. This is because > router queues are at a finite size. Also such a burst (especially > on a long delay bandwidth network) cause your RTT to increase even > if there is no drop which is going to hurt you as well. > > A SHOULD in an RFC says you really really really really need to do it > unless there is some thing that makes you willing to override it. It is > slight wiggle room. > > In this I agree with Andre, we should not be *not* doing it. Otherwise > folks will be turning this on and it is plain wrong. It may be fine > for your network but I would not want to see it in FreeBSD. > > In my testing here at home I have put back into our stack max-burst. This > uses Mark Allman's version (not Kacheong Poon's) where you clamp the cwnd at > no more than 4 packets larger than your flight. All of my testing > high-bw-delay or lan has shown this to improve TCP performance. This > is because it helps you avoid bursting out so many packets that you overflow > a queue. 
> > In your long-delay bw link if you do burst out too many (and you never > know how many that is since you can not predict how full all those > MPLS queues are or how big they are) you will really hurt yourself even worse. > Note that generally in Cisco routers the default queue size is somewhere between > 100-300 packets depending on the router. Due to the way our application works this never happens, but I am fine with just keeping this patch private. If there are other shops that need this they can always dig the patch up from the archives. -- John Baldwin From owner-freebsd-net@FreeBSD.ORG Wed Feb 6 14:21:07 2013 Return-Path: Delivered-To: net@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id F0854C4F; Wed, 6 Feb 2013 14:21:07 +0000 (UTC) (envelope-from melifaro@FreeBSD.org) Received: from mail.ipfw.ru (unknown [IPv6:2a01:4f8:120:6141::2]) by mx1.freebsd.org (Postfix) with ESMTP id BB139DDB; Wed, 6 Feb 2013 14:21:07 +0000 (UTC) Received: from v6.mpls.in ([2a02:978:2::5] helo=ws.su29.net) by mail.ipfw.ru with esmtpsa (TLSv1:CAMELLIA256-SHA:256) (Exim 4.76 (FreeBSD)) (envelope-from ) id 1U35vR-000HbU-Mn; Wed, 06 Feb 2013 18:24:37 +0400 Message-ID: <5112666F.3050904@FreeBSD.org> Date: Wed, 06 Feb 2013 18:19:27 +0400 From: "Alexander V. Chernikov" User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:9.0) Gecko/20120121 Thunderbird/9.0 MIME-Version: 1.0 To: net@freebsd.org, freebsd-hackers@FreeBSD.org Subject: Make kernel aware of NIC queues Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 06 Feb 2013 14:21:08 -0000 Hello list! 
Today more and more NICs are capable of splitting traffic across different RX/TX rings, permitting the OS to dispatch this traffic on different CPU cores. However, some problems arise when using multi-NIC (or even single multi-port NIC) configurations. Typical (OS) questions are: * How many queues should we allocate per port? * How should we mark packets received in a given queue? * What traffic pattern is the NIC used for: should we bind queues to CPU cores and, if so, to which ones? Currently, there is some AI implemented in the Intel drivers, such as: * use the maximum number of available queues if the CPU has a large number of cores; * bind every queue to a CPU core sequentially. Problems with this (probably with any AI) are: * Which NICs (ports) will _actually_ be used? E.g.: I have an 8-core system with a dual 82576 Intel NIC (which is capable of using 8 RX queues per port). If only one port is used, I can allocate 8 (or 7) queues and bind them to the given cores, which is generally good for forwarding traffic. For 2-port setups it is probably better to set up 4 queues per port to make sure ithreads from different cards do not interfere with each other. * How exactly should we mark packets? There are traffic flows which are not hashed properly by the NIC (mostly non-IP/IPv6 traffic; PPPoE and various tunnels are good examples), so the driver receives all such packets on q0 and marks them with FLOWID 0, which can be inconvenient in some situations. It would be better if we could instruct the NIC not to mark such packets with any ID, permitting the OS to recalculate the hash via the (probably more powerful) netisr hash function. * Traffic flow inside the OS / flowid marking: smarter flowid marking may be needed in some cases. For example, if we are using lagg with 2 NICs for traffic forwarding, the result is increased contention on the transmit side. From the previous example: port 0 has q0-q3 bound to cores 0-3, port 1 has q0-q3 bound to cores 4-7, and flow IDs are the same as core numbers.
lagg uses (flowid % number_nics) to pick the egress port, which leads to TX contention: flow 0: (0 % 2)=port0, (0 % 4)=queue0; flow 1: (1 % 2)=port1, (1 % 4)=queue1; flow 2: (2 % 2)=port0, (2 % 4)=queue2; flow 3: (3 % 2)=port1, (3 % 4)=queue3; flow 4: (4 % 2)=port0, (4 % 4)=queue0; flow 5: (5 % 2)=port1, (5 % 4)=queue1; flow 6: (6 % 2)=port0, (6 % 4)=queue2; flow 7: (7 % 2)=port1, (7 % 4)=queue3. Flow IDs 0 and 4, 1 and 5, 2 and 6, and 3 and 7 use the same TX queues on the same egress NICs, while queues 1 and 3 on port 0 and queues 0 and 2 on port 1 stay unused. This can be minimized by using configurations where GCD(queues, ports)=1 (3 queues should do the trick in this case), but this leads to suboptimal CPU usage. Internally we use a patched igb/ix driver which permits setting flow IDs manually (and I have heard that other people are using hacks to enable/disable setting M_FLOWID). I propose implementing a common API to permit drivers to: * read the user-supplied number of queues/other queue options (e.g.: * notify the kernel of each RX/TX queue being created/destroyed * bind queues to cores via the given API * export data to userland (for example, via sysctl) to permit users to: a) quickly see the current configuration b) change CPU binding on the fly c) change flowid numbers on the fly (with the possibility to set 1) the NIC-supplied hash 2) a manually supplied value 3) disable setting M_FLOWID) Having a common interface will make network stack tuning easier for users and puts us one step closer to a (probably userland) AI which can auto-tune the system according to a template ("router", "webserver") and rc.conf configuration (lagg presence, etc.). What do you guys think?
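The collision pattern in the flow table above can be reproduced with a small model. It assumes, as the table does, that lagg picks the egress port as flowid % nports and the egress driver picks the TX queue as flowid % nqueues; the function names are hypothetical and this is a toy model, not lagg or igb code.

```python
# Toy model of lagg port selection plus driver TX queue selection.
# Assumption (from the table): port = flowid % nports, queue = flowid % nqueues.
from collections import defaultdict


def tx_mapping(nflows, nports, nqueues):
    """Map each flow id to the (port, queue) pair it is transmitted on."""
    return {f: (f % nports, f % nqueues) for f in range(nflows)}


def shared_queues(nflows, nports, nqueues):
    """Return only the (port, queue) pairs used by more than one flow."""
    groups = defaultdict(list)
    for flow, pair in tx_mapping(nflows, nports, nqueues).items():
        groups[pair].append(flow)
    return {pair: flows for pair, flows in groups.items() if len(flows) > 1}


def used_tx_queues(nflows, nports, nqueues):
    """Count how many distinct TX queues actually receive traffic."""
    return len(set(tx_mapping(nflows, nports, nqueues).values()))


if __name__ == "__main__":
    # 8 flows (one per core), 2 ports, 4 queues per port, as in the table.
    print(shared_queues(8, 2, 4))
    print(used_tx_queues(8, 2, 4), "of", 2 * 4)
    # 3 queues per port: gcd(3, 2) == 1, so every (port, queue) pair is used.
    print(used_tx_queues(8, 2, 3), "of", 2 * 3)
```

With 2 ports and 4 queues, each port only ever sees even (or only odd) flow IDs, so just 4 of the 8 TX queues carry traffic. When gcd(nports, nqueues) = 1, the Chinese remainder theorem makes flowid mod (nports * nqueues) map one-to-one onto (port, queue) pairs, so any nports * nqueues consecutive flow IDs cover every pair exactly once.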
From owner-freebsd-net@FreeBSD.ORG Wed Feb 6 14:37:17 2013 Return-Path: Delivered-To: net@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 09C85837; Wed, 6 Feb 2013 14:37:17 +0000 (UTC) (envelope-from luigi@onelab2.iet.unipi.it) Received: from onelab2.iet.unipi.it (onelab2.iet.unipi.it [131.114.59.238]) by mx1.freebsd.org (Postfix) with ESMTP id C4AFEEDA; Wed, 6 Feb 2013 14:37:16 +0000 (UTC) Received: by onelab2.iet.unipi.it (Postfix, from userid 275) id 34E6D73029; Wed, 6 Feb 2013 15:37:14 +0100 (CET) Date: Wed, 6 Feb 2013 15:37:14 +0100 From: Luigi Rizzo To: "Alexander V. Chernikov" Subject: Re: Make kernel aware of NIC queues Message-ID: <20130206143714.GA45782@onelab2.iet.unipi.it> References: <5112666F.3050904@FreeBSD.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <5112666F.3050904@FreeBSD.org> User-Agent: Mutt/1.4.2.3i Cc: freebsd-hackers@freebsd.org, net@freebsd.org X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 06 Feb 2013 14:37:17 -0000 On Wed, Feb 06, 2013 at 06:19:27PM +0400, Alexander V. Chernikov wrote: > Hello list! > > Today more and more NICs are capable of splitting traffic to different > Rx/TX rings permitting OS to dispatch this traffic on different CPU > cores. However, there are some problems that arise from using multi-nic > (or even single multi-port NIC) configurations: ...
> I propose implementing common API to permit drivers: > * read user-supplied number of queues/other queue options (e.g: > * notify kernel of each RX/TX queue being created/destroyed > * make binding queues to cores via given API > * Export data to userland (for example, via sysctl) to permit users: > a) quickly see current configuration > b) change CPU binding on-fly > c) change flowid numbers on-fly (with the possibility to set 1) > NIC-supplied hash 2) manually supplied value 3) disable setting M_FLOWID) > > Having common interface will help users to make network stack tuning > easier and puts us one step further to make (probably userland) AI which > can auto-tune system according to template ("router", "webserver") and > rc.conf configuration (lagg presence, etc..) > > > What do you guys think? This is certainly a good idea and a welcome one. Linux has tried to come up with a common framework to implement these kinds of controls using "ethtool", and we should probably have a look at their approach and reuse it (or at least the good ideas) to avoid reinventing the same thing.
cheers luigi From owner-freebsd-net@FreeBSD.ORG Wed Feb 6 16:06:03 2013 Return-Path: Delivered-To: net@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 3C21476B; Wed, 6 Feb 2013 16:06:03 +0000 (UTC) (envelope-from gnn@neville-neil.com) Received: from vps.hungerhost.com (vps.hungerhost.com [216.38.53.176]) by mx1.freebsd.org (Postfix) with ESMTP id 01AFF684; Wed, 6 Feb 2013 16:06:02 +0000 (UTC) Received: from [38.105.238.108] (port=56842 helo=[10.7.1.235]) by vps.hungerhost.com with esmtpsa (TLSv1:AES128-SHA:128) (Exim 4.80) (envelope-from ) id 1U37VU-0002Sj-0r; Wed, 06 Feb 2013 11:05:56 -0500 Content-Type: text/plain; charset=us-ascii Mime-Version: 1.0 (Mac OS X Mail 6.2 \(1499\)) Subject: Re: Make kernel aware of NIC queues From: George Neville-Neil In-Reply-To: <20130206143714.GA45782@onelab2.iet.unipi.it> Date: Wed, 6 Feb 2013 11:05:59 -0500 Content-Transfer-Encoding: 7bit Message-Id: References: <5112666F.3050904@FreeBSD.org> <20130206143714.GA45782@onelab2.iet.unipi.it> To: Luigi Rizzo X-Mailer: Apple Mail (2.1499) X-AntiAbuse: This header was added to track abuse, please include it with any abuse report X-AntiAbuse: Primary Hostname - vps.hungerhost.com X-AntiAbuse: Original Domain - freebsd.org X-AntiAbuse: Originator/Caller UID/GID - [47 12] / [47 12] X-AntiAbuse: Sender Address Domain - neville-neil.com X-Get-Message-Sender-Via: vps.hungerhost.com: authenticated_id: gnn@neville-neil.com Cc: freebsd-hackers@freebsd.org, "Alexander V. Chernikov" , net@freebsd.org X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 06 Feb 2013 16:06:03 -0000 On Feb 6, 2013, at 09:37 , Luigi Rizzo wrote: > On Wed, Feb 06, 2013 at 06:19:27PM +0400, Alexander V. Chernikov wrote: >> Hello list! 
>> >> Today more and more NICs are capable of splitting traffic to different >> Rx/TX rings permitting OS to dispatch this traffic on different CPU >> cores. However, there are some problems that arise from using multi-nic >> (or even single multi-port NIC) configurations: > ... >> I propose implementing common API to permit drivers: >> * read user-supplied number of queues/other queue options (e.g: >> * notify kernel of each RX/TX queue being created/destroyed >> * make binding queues to cores via given API >> * Export data to userland (for example, via sysctl) to permit users: >> a) quickly see current configuration >> b) change CPU binding on-fly >> c) change flowid numbers on-fly (with the possibility to set 1) >> NIC-supplied hash 2) manually supplied value 3) disable setting M_FLOWID) >> >> Having common interface will help users to make network stack tuning >> easier and puts us one step further to make (probably userland) AI which >> can auto-tune system according to template ("router", "webserver") and >> rc.conf configuration (lagg presence, etc..) >> >> >> What do you guys think? > > this is certainly a good idea and a welcome one. > > Linux has tried to come up with a common framework to implement > this kind of controls using "ethtool", and we should probably > have a look at their approach and reuse it (or at least the good ideas) > to avoid reinventing the same thing. > And, though Luigi didn't say it, I will, this should integrate with netmap.
Best, George From owner-freebsd-net@FreeBSD.ORG Wed Feb 6 16:55:05 2013 Return-Path: Delivered-To: net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id C0B10E2C; Wed, 6 Feb 2013 16:55:05 +0000 (UTC) (envelope-from luigi@onelab2.iet.unipi.it) Received: from onelab2.iet.unipi.it (onelab2.iet.unipi.it [131.114.59.238]) by mx1.freebsd.org (Postfix) with ESMTP id 8476094D; Wed, 6 Feb 2013 16:55:05 +0000 (UTC) Received: by onelab2.iet.unipi.it (Postfix, from userid 275) id 69ACC73027; Wed, 6 Feb 2013 17:55:03 +0100 (CET) Date: Wed, 6 Feb 2013 17:55:03 +0100 From: Luigi Rizzo To: George Neville-Neil Subject: Re: Make kernel aware of NIC queues Message-ID: <20130206165503.GA46925@onelab2.iet.unipi.it> References: <5112666F.3050904@FreeBSD.org> <20130206143714.GA45782@onelab2.iet.unipi.it> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.4.2.3i Cc: freebsd-hackers@freebsd.org, "Alexander V. Chernikov" , net@freebsd.org X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 06 Feb 2013 16:55:05 -0000 On Wed, Feb 06, 2013 at 11:05:59AM -0500, George Neville-Neil wrote: > > On Feb 6, 2013, at 09:37 , Luigi Rizzo wrote: ... > > Linux has tried to come up with a common framework to implement > > this kind of controls using "ethtool", and we should probably > > have a look at their approach and reuse it (or at least the good ideas) > > to avoid reinventing the same thing. > > > And, though Luigi didn't say it, I will, this should integrate with netmap. 
I did not say it because it will work without any extra effort: - the netmap version I committed a few days ago already fetches the number of queues and the ring sizes at runtime; - ethtool (or whatever we will call it) only operates on the configuration/control plane (number of queues and slots, partitioning of packets onto input queues, etc.), whereas netmap operates only on the data plane. cheers luigi From owner-freebsd-net@FreeBSD.ORG Wed Feb 6 17:28:44 2013 Return-Path: Delivered-To: net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id CD141FFE; Wed, 6 Feb 2013 17:28:44 +0000 (UTC) (envelope-from bright@mu.org) Received: from elvis.mu.org (elvis.mu.org [192.203.228.196]) by mx1.freebsd.org (Postfix) with ESMTP id BB332CB4; Wed, 6 Feb 2013 17:28:44 +0000 (UTC) Received: from Alfreds-MacBook-Pro-9.local (c-67-180-208-218.hsd1.ca.comcast.net [67.180.208.218]) by elvis.mu.org (Postfix) with ESMTPSA id 1ABEE1A3C1B; Wed, 6 Feb 2013 09:28:42 -0800 (PST) Message-ID: <511292C9.4040307@mu.org> Date: Wed, 06 Feb 2013 09:28:41 -0800 From: Alfred Perlstein User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:17.0) Gecko/20130107 Thunderbird/17.0.2 MIME-Version: 1.0 To: John Baldwin Subject: Re: [PATCH] Add a new TCP_IGNOREIDLE socket option References: <201301221511.02496.jhb@freebsd.org> <50FF06AD.402@networx.ch> <061B4EA5-6A93-48A0-A269-C2C3A3C7E77C@lakerest.net> <201302060746.43736.jhb@freebsd.org> In-Reply-To: <201302060746.43736.jhb@freebsd.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: Randall Stewart , net@freebsd.org X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 06 Feb 2013 17:28:44 -0000 On 2/6/13 4:46 AM, John Baldwin wrote: > On Wednesday,
February 06, 2013 6:27:04 am Randall Stewart wrote: >> John: >> >> A burst at line rate will *often* cause drops. This is because >> router queues are of finite size. Also such a burst (especially >> on a long delay bandwidth network) causes your RTT to increase even >> if there is no drop, which is going to hurt you as well. >> >> A SHOULD in an RFC says you really really really really need to do it >> unless there is something that makes you willing to override it. It is >> slight wiggle room. >> >> In this I agree with Andre, we should not be *not* doing it. Otherwise >> folks will be turning this on and it is plain wrong. It may be fine >> for your network but I would not want to see it in FreeBSD. >> >> In my testing here at home I have put back into our stack max-burst. This >> uses Mark Allman's version (not Kacheong Poon's) where you clamp the cwnd at >> no more than 4 packets larger than your flight. All of my testing >> high-bw-delay or lan has shown this to improve TCP performance. This >> is because it helps you avoid bursting out so many packets that you overflow >> a queue. >> >> In your long-delay bw link if you do burst out too many (and you never >> know how many that is since you can not predict how full all those >> MPLS queues are or how big they are) you will really hurt yourself even worse. >> Note that generally in Cisco routers the default queue size is somewhere between >> 100-300 packets depending on the router. > Due to the way our application works this never happens, but I am fine with > just keeping this patch private. If there are other shops that need this they > can always dig the patch up from the archives. > This is yet another time when I'm sad about how things happen in FreeBSD. A developer comes forward with a non-default option that's very useful for some specific workloads, specifically one that contributes much time and $$$ to the project, and the community rejects the patches even though it's been successful in other OSes.
It makes zero sense. John, can you repost the patch? Maybe there is a way to refactor this somehow so it's like accept filters where we can plug in a hook for TCP? I am very disappointed, but not surprised. -Alfred From owner-freebsd-net@FreeBSD.ORG Wed Feb 6 19:29:50 2013 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id D1477485 for ; Wed, 6 Feb 2013 19:29:50 +0000 (UTC) (envelope-from kurt.buff@gmail.com) Received: from mail-ea0-f170.google.com (mail-ea0-f170.google.com [209.85.215.170]) by mx1.freebsd.org (Postfix) with ESMTP id 6CDDB307 for ; Wed, 6 Feb 2013 19:29:50 +0000 (UTC) Received: by mail-ea0-f170.google.com with SMTP id a11so790007eaa.29 for ; Wed, 06 Feb 2013 11:29:49 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:x-received:date:message-id:subject:from:to :content-type; bh=1+jog4FtciTdrgTzDVQ8YQxC3b8ECpBCxc2RrQGQ2i8=; b=GtH1xL3DBH72m+D4mRJu7jwZZUPu1BJg1HJ42YnLSx3+b7tpqe+rkyGQGpYc0OwX0W amHxczjyZtRvlxsV2G7/6wDdpW97rJrBlNwkxOfj7Dfas4BHY0j8VaC4iE3ZIEoPSNqY duWde1V7Rer+HNfUfIvT4qnl8wOeJQ5XROGlakQWoj3ookSELNew+5nPGzjNRd7vNofW lQOA77VDyAtHgkzPrGQ1bOvwCuwcw1NtTvs5ctMSy9A2ksFww3f0W51XkrCVJlPgbZzW ib/Ahw3Dk7PN53xQ+tXT6ijaP7VRGoySouQf05lTN4hrNFOW9fiJ3fjXSXKoQAZswuWR dTMQ== MIME-Version: 1.0 X-Received: by 10.14.203.3 with SMTP id e3mr100454591eeo.9.1360178989122; Wed, 06 Feb 2013 11:29:49 -0800 (PST) Received: by 10.14.124.79 with HTTP; Wed, 6 Feb 2013 11:29:48 -0800 (PST) Date: Wed, 6 Feb 2013 11:29:48 -0800 Message-ID: Subject: Guest network on corporate LAN - options for security From: Kurt Buff To: freebsd-net@freebsd.org Content-Type: text/plain; charset=UTF-8 X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 06 Feb 
2013 19:29:50 -0000 All, If this isn't the right list for this, please let me know. Quite some time ago, I set up an unsecured guest VLAN in our network, providing wireless access to all of the sundry devices that staff and visitors carry. I set up a small FreeBSD machine to serve IP addresses via DHCP, and that was dead simple. However, there are now other tenants in our building, and the subnet is getting too much bandwidth and address consumption - the range I set up is completely filled, and the VLAN is consuming about half of our Internet pipe, which is far too much for my comfort. I suspect the other tenants are leeching. Does anyone have ideas on how I can leverage that FreeBSD box to control this? It's not the firewall for the VLAN - it's simply a machine sitting on the subnet. What I've read of captive portals seems to indicate that the portal is part of the firewall, which will not be the case here, as the corporate firewall will not be allowed to be part of this solution. The only other alternative I see right now is to set up a password on the SSID, and have the front desk hand it out to guests, after mailing it to staff, and I'm getting pushback on that from my manager.
Thanks, Kurt From owner-freebsd-net@FreeBSD.ORG Wed Feb 6 19:38:13 2013 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 1B9F258F for ; Wed, 6 Feb 2013 19:38:13 +0000 (UTC) (envelope-from amvandemore@gmail.com) Received: from mail-wi0-x22a.google.com (mail-wi0-x22a.google.com [IPv6:2a00:1450:400c:c05::22a]) by mx1.freebsd.org (Postfix) with ESMTP id 9C306351 for ; Wed, 6 Feb 2013 19:38:12 +0000 (UTC) Received: by mail-wi0-f170.google.com with SMTP id hm11so7427805wib.5 for ; Wed, 06 Feb 2013 11:38:11 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:x-received:in-reply-to:references:date:message-id :subject:from:to:cc:content-type; bh=80QBu6/sYiyGWlLIQq//cfwfhWkdtkKM1zFymbOVgcg=; b=J7wZzpJFMRyUNm3uGOIzm3AKLzBUl05taB/XDa5v6qqqzm6rUJHivRzJ2z2MF14CLp t+kw4dtxXAUjQM5Ai1QtfldRPyswmlYfNF1jI4Ewx4v5V/e1/aIMpMuapUxttBQKgvd8 u2JEDCiHM1tv7KU9vjvOGayGAFRerCDly+JgPTOZCW1bqvtrLiP9j0cFFUzP0+J4NAS1 +4PeCKi5N52dZ0wKVmP5qlo9gRCMxxhi9FJFpEHHKzT30BWjMypnii6bNTiiNg3YMSqG DnYphNT3Kbj04a0NZxiqCVOTifYR+LJvYYoJj/cP68SQOrLVkmE1o5Igw1rxyN8ZIsnK VuSQ== MIME-Version: 1.0 X-Received: by 10.194.109.10 with SMTP id ho10mr6583929wjb.16.1360179491596; Wed, 06 Feb 2013 11:38:11 -0800 (PST) Received: by 10.194.165.170 with HTTP; Wed, 6 Feb 2013 11:38:11 -0800 (PST) In-Reply-To: References: Date: Wed, 6 Feb 2013 13:38:11 -0600 Message-ID: Subject: Re: Guest network on corporate LAN - options for security From: Adam Vande More To: Kurt Buff Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.14 Cc: freebsd-net@freebsd.org X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 06 Feb 2013 19:38:13 -0000 On Wed, Feb 6, 
2013 at 1:29 PM, Kurt Buff wrote: > All, > > If this isn't the right list for this, please let me know. > > Quite some time ago, I set up an unsecured guest VLAN in our network, > providing wireless access to all of the sundry devices that staff and > visitors carry. I set up a small FreeBSD machine to serve IP addresses > via DHCP, and that was dead simple. > > However, there are now other tenants in our building, and the subnet > is getting too much bandwidth and address consumption - the range I > set up is completely filled, and the VLAN is consuming about half of > our Internet pipe, which is far too much for my comfort. > > I suspect the other tenants are leeching. > > Does anyone have ideas on how I can leverage that FreeBSD box to control > this? > If it were me, I would consider replacing the FreeBSD Box with PfSense. It has a lot of management features built in, so if you're looking to get those without a big time sink otherwise, something like that is the way to go. -- Adam Vande More From owner-freebsd-net@FreeBSD.ORG Wed Feb 6 19:47:43 2013 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 382C19C8 for ; Wed, 6 Feb 2013 19:47:43 +0000 (UTC) (envelope-from kurt.buff@gmail.com) Received: from mail-ee0-f48.google.com (mail-ee0-f48.google.com [74.125.83.48]) by mx1.freebsd.org (Postfix) with ESMTP id B69F53FB for ; Wed, 6 Feb 2013 19:47:42 +0000 (UTC) Received: by mail-ee0-f48.google.com with SMTP id t10so889092eei.7 for ; Wed, 06 Feb 2013 11:47:36 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:x-received:in-reply-to:references:date:message-id :subject:from:to:cc:content-type; bh=RzFvqAeMQr2DAF/Rwz1+jf617SnhMy14UWCEx4Yx1xs=; b=gU/G2OvkWTgNHApgk7s3PFmfw9VksHsHPhA3Pb6DmAYXu7eHCJiqJMOplh6eb9zSfg SVFKIchqnmblWPZxP1y+KOxWXHgrpiJ5jI3McswNPB6uYbvtILY0aKwLwZBZ2WWfEAdT
md4LcoABt8UOVkscKfVIrpAVLGriQ2G5h/6gQ824PCZcDUA+F+KDHhfzBq+yvG0KFOjr gISQxlbPro/E8Zqmd1y2zJqkNfoBwdc+r1K4z7JJI+yt6Hmxs15wRAEfKUNhd6cFjJGa i6HOTwGawie2OEzET0sBhRADmYxirNYDKthNZRLCTbZe/1cRBpMGQN2Q7KtP+EWD1/Ma p0JA== MIME-Version: 1.0 X-Received: by 10.14.203.3 with SMTP id e3mr100623514eeo.9.1360180056458; Wed, 06 Feb 2013 11:47:36 -0800 (PST) Received: by 10.14.124.79 with HTTP; Wed, 6 Feb 2013 11:47:36 -0800 (PST) In-Reply-To: References: Date: Wed, 6 Feb 2013 11:47:36 -0800 Message-ID: Subject: Re: Guest network on corporate LAN - options for security From: Kurt Buff To: Adam Vande More Content-Type: text/plain; charset=UTF-8 Cc: freebsd-net@freebsd.org X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 06 Feb 2013 19:47:43 -0000 On Wed, Feb 6, 2013 at 11:38 AM, Adam Vande More wrote: > On Wed, Feb 6, 2013 at 1:29 PM, Kurt Buff wrote: >> >> All, >> >> If this isn't the right list for this, please let me know. >> >> Quite some time ago, I set up an unsecured guest VLAN in our network, >> providing wireless access to all of the sundry devices that staff and >> visitors carry. I set up a small FreeBSD machine to serve IP addresses >> via DHCP, and that was dead simple. >> >> However, there are now other tenants in our building, and the subnet >> is getting too much bandwidth and address consumption - the range I >> set up is completely filled, and the VLAN is consuming about half of >> our Internet pipe, which is far too much for my comfort. >> >> I suspect the other tenants are leeching. >> >> Does anyone have ideas on how I can leverage that FreeBSD box to control >> this? > > > If it were me, I would consider replacing the FreeBSD Box with PfSense. 
It > has a lot of management features built in so if you're looking to get those > without a big time sink otherwise, something like that is the way to go. Thanks. I'll take a look at that. Kurt From owner-freebsd-net@FreeBSD.ORG Thu Feb 7 08:09:03 2013 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 0246E9B0 for ; Thu, 7 Feb 2013 08:09:03 +0000 (UTC) (envelope-from lars@netapp.com) Received: from mx1.netapp.com (mx1.netapp.com [216.240.18.38]) by mx1.freebsd.org (Postfix) with ESMTP id DF42B7DB for ; Thu, 7 Feb 2013 08:09:02 +0000 (UTC) X-IronPort-AV: E=Sophos;i="4.84,621,1355126400"; d="scan'208";a="239872721" Received: from smtp2.corp.netapp.com ([10.57.159.114]) by mx1-out.netapp.com with ESMTP; 07 Feb 2013 00:09:02 -0800 Received: from vmwexceht01-prd.hq.netapp.com (exchsmtp.hq.netapp.com [10.106.76.239]) by smtp2.corp.netapp.com (8.13.1/8.13.1/NTAP-1.6) with ESMTP id r17891iY009241; Thu, 7 Feb 2013 00:09:02 -0800 (PST) Received: from SACEXCMBX01-PRD.hq.netapp.com ([169.254.2.54]) by vmwexceht01-prd.hq.netapp.com ([10.106.76.239]) with mapi id 14.02.0328.009; Thu, 7 Feb 2013 00:09:01 -0800 From: "Eggert, Lars" To: Matthew Luckie Subject: Re: high cpu usage on natd / dhcpd Thread-Topic: high cpu usage on natd / dhcpd Thread-Index: AQHN/49K3QG1cuBZpEGa6wjl1WYXnJhkDzQAgAqMkAA= Date: Thu, 7 Feb 2013 08:08:59 +0000 Message-ID: References: <510A87B8.7000705@luckie.org.nz> In-Reply-To: <510A87B8.7000705@luckie.org.nz> Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [10.106.53.51] Content-Type: text/plain; charset="iso-8859-1" Content-ID: <02EEE0B2A5AC25418D7123D96F4D80C5@tahoe.netapp.com> Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Cc: "freebsd-net@freebsd.org" X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 07 Feb 2013 08:09:03 -0000 On Jan 31, 2013, at 16:03, Matthew Luckie wrote: > > 00510 allow ip from me to not me out via em1 > 00550 divert 8668 ip from any to any via em1 > > Rule 510 fixes it. Yep, it does. Can I ask someone to commit this to rc.firewall? (And I wonder if the rules for the ipfw kernel firewall need a similar addition, because the system locks up under heavy network load if I use that instead of natd.) Lars From owner-freebsd-net@FreeBSD.ORG Thu Feb 7 11:59:09 2013 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 65AAD9AC for ; Thu, 7 Feb 2013 11:59:09 +0000 (UTC) (envelope-from VenkatKumar.Duvvuru@Emulex.Com) Received: from CMEXEDGE2.ext.emulex.com (cmexedge2.ext.emulex.com [138.239.224.100]) by mx1.freebsd.org (Postfix) with ESMTP id 075338E8 for ; Thu, 7 Feb 2013 11:59:08 +0000 (UTC) Received: from CMEXHTCAS1.ad.emulex.com (138.239.115.217) by CMEXEDGE2.ext.emulex.com (138.239.224.100) with Microsoft SMTP Server (TLS) id 14.2.318.4; Thu, 7 Feb 2013 03:59:41 -0800 Received: from CMEXMB1.ad.emulex.com ([169.254.1.163]) by CMEXHTCAS1.ad.emulex.com ([2002:8aef:73d9::8aef:73d9]) with mapi id 14.02.0318.004; Thu, 7 Feb 2013 03:57:56 -0800 From: "Duvvuru,Venkat Kumar" To: "freebsd-net@freebsd.org" Subject: OCE driver patches Thread-Topic: OCE driver patches Thread-Index: Ac4FKZvQ2m3Cu1QRR3u/QVUeigTouw== Date: Thu, 7 Feb 2013 11:57:56 +0000 Message-ID: Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [138.239.141.147] MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.14 X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with
FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 07 Feb 2013 11:59:09 -0000 Hi, I submitted this patch http://www.freebsd.org/cgi/query-pr.cgi?pr=171838 some time back. Could you please let me know when it will be pulled in? I have some more patches to submit. Please let me know whether submitting them online at http://www.freebsd.org/send-pr.html is the only way to get them in, or whether there is an alternative way to submit patches. Thanks, Venkat From owner-freebsd-net@FreeBSD.ORG Thu Feb 7 12:40:14 2013 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 045EC86A; Thu, 7 Feb 2013 12:40:14 +0000 (UTC) (envelope-from smithi@nimnet.asn.au) Received: from sola.nimnet.asn.au (paqi.nimnet.asn.au [115.70.110.159]) by mx1.freebsd.org (Postfix) with ESMTP id 6FD58AEB; Thu, 7 Feb 2013 12:40:12 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by sola.nimnet.asn.au (8.14.2/8.14.2) with ESMTP id r17Ce415023805; Thu, 7 Feb 2013 23:40:05 +1100 (EST) (envelope-from smithi@nimnet.asn.au) Date: Thu, 7 Feb 2013 23:40:04 +1100 (EST) From: Ian Smith To: "Eggert, Lars" Subject: Re: high cpu usage on natd / dhcpd In-Reply-To: Message-ID: <20130207231943.O21988@sola.nimnet.asn.au> References: <510A87B8.7000705@luckie.org.nz> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Cc: "freebsd-net@freebsd.org" , freebsd-ipfw@freebsd.org, Matthew Luckie X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 07 Feb 2013 12:40:14 -0000 On Thu, 7 Feb 2013 08:08:59 +0000, Eggert, Lars wrote: > On Jan 31, 2013, at 16:03, Matthew Luckie wrote: > > > > 00510 allow ip from me to not me out via em1 > > 00550 divert 8668
ip from any to any via em1 > > > > Rule 510 fixes it. > > Yep, it does. Can I ask someone to commit this to rc.firewall? The ruleset Matthew posted bears no resemblance to rc.firewall, so I don't see that (or how) it solves any generic problem. > (And I wonder if the rules for the ipfw kernel firewall need a > similar addition, because the system locks up under heavy network > load if I use that instead of natd.) > > Lars Which rc.firewall ruleset are you referring to? There certainly are problems with the 'simple' ruleset relating to use of $natd_enable vs $firewall_nat_enable (not to mention the denial of ALL icmp traffic) that I posted patches to a couple of years ago in ipfw@ to rc.firewall and /etc/rc.d/{ipfw,natd) addressing about 4 PRs .. sadly to no avail. I suggest following up to ipfw@ (cc'd) rather than net@ cheers, Ian From owner-freebsd-net@FreeBSD.ORG Thu Feb 7 12:50:54 2013 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 85AF8A85; Thu, 7 Feb 2013 12:50:54 +0000 (UTC) (envelope-from lars@netapp.com) Received: from mx12.netapp.com (mx12.netapp.com [216.240.18.77]) by mx1.freebsd.org (Postfix) with ESMTP id 69A84B6D; Thu, 7 Feb 2013 12:50:54 +0000 (UTC) X-IronPort-AV: E=Sophos;i="4.84,622,1355126400"; d="scan'208";a="17566565" Received: from smtp1.corp.netapp.com ([10.57.156.124]) by mx12-out.netapp.com with ESMTP; 07 Feb 2013 04:50:53 -0800 Received: from vmwexceht04-prd.hq.netapp.com (vmwexceht04-prd.hq.netapp.com [10.106.77.34]) by smtp1.corp.netapp.com (8.13.1/8.13.1/NTAP-1.6) with ESMTP id r17Coqa0009706; Thu, 7 Feb 2013 04:50:52 -0800 (PST) Received: from SACEXCMBX01-PRD.hq.netapp.com ([169.254.2.54]) by vmwexceht04-prd.hq.netapp.com ([10.106.77.34]) with mapi id 14.02.0328.009; Thu, 7 Feb 2013 04:50:52 -0800 From: "Eggert, Lars" To: Ian Smith Subject: Re: high cpu usage on natd / dhcpd Thread-Topic: high cpu usage on 
natd / dhcpd Thread-Index: AQHN/49K3QG1cuBZpEGa6wjl1WYXnJhkDzQAgAqMkACAAEu6AIAAAwSA Date: Thu, 7 Feb 2013 12:50:51 +0000 Message-ID: References: <510A87B8.7000705@luckie.org.nz> <20130207231943.O21988@sola.nimnet.asn.au> In-Reply-To: <20130207231943.O21988@sola.nimnet.asn.au> Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [10.106.53.51] Content-Type: text/plain; charset="us-ascii" Content-ID: Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Cc: "freebsd-net@freebsd.org" , "" , Matthew Luckie X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 07 Feb 2013 12:50:54 -0000

Hi,

On Feb 7, 2013, at 13:40, Ian Smith wrote:
> On Thu, 7 Feb 2013 08:08:59 +0000, Eggert, Lars wrote:
>> On Jan 31, 2013, at 16:03, Matthew Luckie wrote:
>>>
>>> 00510 allow ip from me to not me out via em1
>>> 00550 divert 8668 ip from any to any via em1
>>>
>>> Rule 510 fixes it.
>>
>> Yep, it does. Can I ask someone to commit this to rc.firewall?
>
> The ruleset Matthew posted bears no resemblance to rc.firewall, so I
> don't see that (or how) it solves any generic problem.

sorry for having been imprecise. What I was asking for was this change:

--- /usr/src/etc/rc.firewall 2012-11-17 12:36:10.000000000 +0100
+++ rc.firewall 2013-02-06 11:35:45.000000000 +0100
@@ -155,6 +155,7 @@
 	case ${natd_enable} in
 	[Yy][Ee][Ss])
 		if [ -n "${natd_interface}" ]; then
+			${fwcmd} add 49 allow ip from me to not me out via ${natd_interface}
 			${fwcmd} add 50 divert natd ip4 from any to any via ${natd_interface}
 		fi
 		;;

>> (And I wonder if the rules for the ipfw kernel firewall need a
>> similar addition, because the system locks up under heavy network
>> load if I use that instead of natd.)
>
> Which rc.firewall ruleset are you referring to?
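A dry-run sketch of the patched natd section, not part of the original thread: fwcmd is set to "echo ipfw" so the rules are printed rather than loaded, and the function name emit_natd_rules is hypothetical. With the rc.conf values used in this thread it shows the proposed rule 49 being emitted ahead of the divert rule 50. Because ipfw evaluates rules in ascending number order and an allow rule accepts a matching packet, packets whose source is one of the gateway's own addresses ("me") leaving via the NAT interface are accepted at rule 49 and never reach the divert to natd.

```sh
#!/bin/sh
# Assumption for this sketch: fwcmd prints the ipfw commands instead of
# running /sbin/ipfw, so nothing is changed on the system.
fwcmd="echo ipfw"

# Hypothetical wrapper around the patched fragment of rc.firewall.
emit_natd_rules() {
	case ${natd_enable} in
	[Yy][Ee][Ss])
		if [ -n "${natd_interface}" ]; then
			# Lower rule numbers are checked first: packets sourced from
			# the gateway's own addresses going out this interface are
			# accepted here and are not handed to natd by rule 50.
			${fwcmd} add 49 allow ip from me to not me out via ${natd_interface}
			${fwcmd} add 50 divert natd ip4 from any to any via ${natd_interface}
		fi
		;;
	esac
}

# Values taken from the rc.conf posted in this thread.
natd_enable="YES"
natd_interface="bce0"
emit_natd_rules
```

Setting natd_interface to an empty string makes the function print nothing, mirroring the -n guard in rc.firewall.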
My rc.conf has:

gateway_enable="YES"
firewall_enable="YES"
firewall_type="OPEN"
natd_enable="YES"
natd_interface="bce0"

With the patch above, that seems to work fine. I tried to replace the natd_* lines with:

firewall_nat_enable="YES"
firewall_nat_interface="bce0"

which caused the machine to lock up under load, similar to when natd started eating CPU cycles. This made me wonder if a similar patch to the above for the firewall_nat_* case in rc.firewall might be needed.

> There certainly are
> problems with the 'simple' ruleset relating to use of $natd_enable vs
> $firewall_nat_enable (not to mention the denial of ALL icmp traffic)
> that I posted patches to a couple of years ago in ipfw@ to rc.firewall
> and /etc/rc.d/{ipfw,natd} addressing about 4 PRs .. sadly to no avail.
>
> I suggest following up to ipfw@ (cc'd) rather than net@

Will subscribe, thanks.

Lars

From owner-freebsd-net@FreeBSD.ORG Thu Feb 7 14:20:02 2013 Return-Path: Delivered-To: freebsd-net@smarthost.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 6E80FA5F for ; Thu, 7 Feb 2013 14:20:02 +0000 (UTC) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:1900:2254:206c::16:87]) by mx1.freebsd.org (Postfix) with ESMTP id 5F765155 for ; Thu, 7 Feb 2013 14:20:02 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.6/8.14.6) with ESMTP id r17EK1Iw017212 for ; Thu, 7 Feb 2013 14:20:01 GMT (envelope-from gnats@freefall.freebsd.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.6/8.14.6/Submit) id r17EK1wH017211; Thu, 7 Feb 2013 14:20:01 GMT (envelope-from gnats) Date: Thu, 7 Feb 2013 14:20:01 GMT Message-Id: <201302071420.r17EK1wH017211@freefall.freebsd.org> To: freebsd-net@FreeBSD.org Cc: From: Andrey Simonenko Subject: Re: bin/131567: Update for regression/sockets/unix_cmsg
X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list Reply-To: Andrey Simonenko List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 07 Feb 2013 14:20:02 -0000 The following reply was made to PR bin/131567; it has been noted by GNATS. From: Andrey Simonenko To: bug-followup@freebsd.org Cc: Subject: Re: bin/131567: Update for regression/sockets/unix_cmsg Date: Thu, 7 Feb 2013 16:16:31 +0200 Completely redesigned unix_cmsg with improved logic. Details in README. diff -ruNp unix_cmsg.orig/README unix_cmsg/README --- unix_cmsg.orig/README 2012-11-19 14:38:48.000000000 +0200 +++ unix_cmsg/README 2013-02-07 15:56:03.000000000 +0200 @@ -1,127 +1,160 @@ $FreeBSD: src/tools/regression/sockets/unix_cmsg/README,v 1.2 2012/11/17 01:53:57 svnexp Exp $ About unix_cmsg -================ +=============== -This program is a collection of regression tests for ancillary (control) -data for PF_LOCAL sockets (local domain or Unix domain sockets). There -are tests for stream and datagram sockets. - -Usually each test does following steps: create Server, fork Client, -Client sends something to Server, Server verifies if everything -is correct in received message. Sometimes Client sends several -messages to Server. +This program is a collection of regression tests for ancillary data +(control information) for PF_LOCAL sockets (local domain or Unix domain +sockets). There are tests for stream and datagram sockets. + +Usually each test does following steps: creates Server, forks Client, +Client sends something to Server, Server verifies whether everything is +correct in received message(s). It is better to change the owner of unix_cmsg to some safe user -(eg. nobody:nogroup) and set SUID and SGID bits, else some tests -can give correct results for wrong implementation. +(eg. 
nobody:nogroup) and set SUID and SGID bits, else some tests that +check credentials can give correct results for wrong implementation. + +It is better to run this program by a user that belongs to more +than 16 groups. Available options ================= --d Output debugging information, values of different fields of - received messages, etc. Will produce many lines of information. - --h Output help message and exit. +usage: unix_cmsg [-dh] [-n num] [-s size] [-t type] [-z value] [testno] --t - Run tests only for the given socket type: "stream" or "dgram". - With this option it is possible to run only particular test, - not all of them. - --z Do not send real control data if possible. Struct cmsghdr{} - should be followed by real control data. It is not clear if - a sender should give control data in all cases (this is not - documented and an arbitrary application can choose anything). - - At least for PF_LOCAL sockets' control messages with types - SCM_CREDS and SCM_TIMESTAMP the kernel does not need any - control data. This option allow to not send real control data - for SCM_CREDS and SCM_TIMESTAMP control messages. + Options are: + -d Output debugging information + -h Output the help message and exit + -n num Number of messages to send + -s size Specify size of data for IPC + -t type Specify socket type (stream, dgram) for tests + -z value Do not send data in a message (bit 0x1), do not send + data array associated with a cmsghdr structure (bit 0x2) + testno Run one test by its number (require the -t option) Description of tests ==================== +If Client sends something to Server, then it sends 5 messages by default. +Number of messages can be changed in the -n command line option. Number +of messages will be given as N in the following descriptions. + +If Client sends something to Server, then it sends some data (few bytes) +in each message by default. The size of this data can be changed by the -s +command line option. 
The "-s 0" command line option means, that Client will +send zero bytes represented by { NULL, 0 } value of struct iovec{}, referenced +by the msg_iov field from struct msghdr{}. The "-z 1" or "-z 3" command line +option means, that Client will send zero bytes represented by the NULL value +in the msg_iov field from struct msghdr{}. + +If Client sends some ancillary data object, then this ancillary data object +always has associated data array by default. The "-z 2" or "-z 3" option +means, that Client will not send associated data array if possible. + For SOCK_STREAM sockets: ----------------------- 1: Sending, receiving cmsgcred - Client connects to Server and sends two messages with data and - control message with SCM_CREDS type to Server. Server should - receive two messages, in both messages there should be data and - control message with SCM_CREDS type followed by struct cmsgcred{} - and this structure should contain correct information. - - 2: Receiving sockcred (listening socket has LOCAL_CREDS) - - Server creates listen socket and set socket option LOCAL_CREDS - for it. Client connects to Server and sends two messages with data - to Server. Server should receive two messages, in first message - there should be data and control message with SCM_CREDS type followed - by struct sockcred{} and this structure should contain correct - information, in second message there should be data and no control - message. - - 3: Receiving sockcred (accepted socket has LOCAL_CREDS) - - Client connects to Server and sends two messages with data. Server - accepts connection and set socket option LOCAL_CREDS for just accepted - socket (here synchronization is used, to allow Client to see just set - flag on Server's socket before sending messages to Server). 
Server - should receive two messages, in first message there should be data and - control message with SOCK_CRED type followed by struct sockcred{} and - this structure should contain correct information, in second message - there should be data and no control message. + Client connects to Server and sends N messages with SCM_CREDS ancillary + data object. Server should receive N messages, each message should + have SCM_CREDS ancillary data object followed by struct cmsgcred{}. + + 2: Receiving sockcred (listening socket) + + Server creates a listening stream socket and sets the LOCAL_CREDS + socket option for it. Client connects to Server two times, each time + it sends N messages. Server accepts two connections and receives N + messages from each connection. The first message from each connection + should have SCM_CREDS ancillary data object followed by struct sockcred{}, + next messages from the same connection should not have ancillary data. + + 3: Receiving sockcred (accepted socket) + + Client connects to Server. Server accepts connection and sets the + LOCAL_CREDS socket option for just accepted socket. Client sends N + messages to Server. Server should receive N messages, the first + message should have SCM_CREDS ancillary data object followed by + struct sockcred{}, next messages should not have ancillary data. 4: Sending cmsgcred, receiving sockcred - Server creates listen socket and set socket option LOCAL_CREDS - for it. Client connects to Server and sends one message with data - and control message with SCM_CREDS type to Server. Server should - receive one message with data and control message with SCM_CREDS type - followed by struct sockcred{} and this structure should contain - correct information. - - 5: Sending, receiving timestamp - - Client connects to Server and sends message with data and control - message with SCM_TIMESTAMP type to Server. 
Server should receive - message with data and control message with SCM_TIMESTAMP type - followed by struct timeval{}. + Server creates a listening stream socket and sets the LOCAL_CREDS + socket option for it. Client connects to Server and sends N messages + with SCM_CREDS ancillary data object. Server should receive N messages, + the first message should have SCM_CREDS ancillary data object followed + by struct sockcred{}, each of next messages should have SCM_CREDS + ancillary data object followed by struct cmsgcred{}. + + 5: Sending, receiving timeval + + Client connects to Server and sends message with SCM_TIMESTAMP ancillary + data object. Server should receive one message with SCM_TIMESTAMP + ancillary data object followed by struct timeval{}. + + 6: Sending, receiving bintime + + Client connects to Server and sends message with SCM_BINTIME ancillary + data object. Server should receive one message with SCM_BINTIME + ancillary data object followed by struct bintime{}. + + 7: Checking cmsghdr.cmsg_len + + Client connects to Server and tries to send several messages with + SCM_CREDS ancillary data object that has wrong cmsg_len field in its + struct cmsghdr{}. All these attempts should fail, since cmsg_len + in all requests is less than CMSG_LEN(0). + + 8: Check LOCAL_PEERCRED socket option + + This test does not use ancillary data, but can be implemented here. + Client connects to Server. Both Client and Server verify that + credentials of the peer are correct using LOCAL_PEERCRED socket option. For SOCK_DGRAM sockets: ---------------------- 1: Sending, receiving cmsgcred - Client sends to Server two messages with data and control message - with SCM_CREDS type to Server. Server should receive two messages, - in both messages there should be data and control message with - SCM_CREDS type followed by struct cmsgcred{} and this structure - should contain correct information. + Client connects to Server and sends N messages with SCM_CREDS ancillary + data object. 
Server should receive N messages, each message should + have SCM_CREDS ancillary data object followed by struct cmsgcred{}. 2: Receiving sockcred - Server creates datagram socket and set socket option LOCAL_CREDS - for it. Client sends two messages with data to Server. Server should - receive two messages, in both messages there should be data and control - message with SCM_CREDS type followed by struct sockcred{} and this - structure should contain correct information. + Server creates datagram socket and sets the LOCAL_CREDS socket option + for it. Client sends N messages to Server. Server should receive N + messages, each message should have SCM_CREDS ancillary data object + followed by struct sockcred{}. 3: Sending cmsgcred, receiving sockcred - - Server creates datagram socket and set socket option LOCAL_CREDS - for it. Client sends one message with data and control message with - SOCK_CREDS type to Server. Server should receive one message with - data and control message with SCM_CREDS type followed by struct - sockcred{} and this structure should contain correct information. - - 4: Sending, receiving timestamp - - Client sends message with data and control message with SCM_TIMESTAMP - type to Server. Server should receive message with data and control - message with SCM_TIMESTAMP type followed by struct timeval{}. + + Server creates datagram socket and sets the LOCAL_CREDS socket option + for it. Client sends N messages with SCM_CREDS ancillary data object + to Server. Server should receive N messages, the first message should + have SCM_CREDS ancillary data object followed by struct sockcred{}, + each of next messages should have SCM_CREDS ancillary data object + followed by struct cmsgcred{}. + + 4: Sending, receiving timeval + + Client sends one message with SCM_TIMESTAMP ancillary data object + to Server. Server should receive one message with SCM_TIMESTAMP + ancillary data object followed by struct timeval{}. 
+ + 5: Sending, receiving bintime + + Client sends one message with SCM_BINTIME ancillary data object + to Server. Server should receive one message with SCM_BINTIME + ancillary data object followed by struct bintime{}. + + 6: Checking cmsghdr.cmsg_len + + Client tries to send Server several messages with SCM_CREDS ancillary + data object that has wrong cmsg_len field in its struct cmsghdr{}. All + these attempts should fail, since cmsg_len in all requests is less than + CMSG_LEN(0). - Andrey Simonenko -simon@comsys.ntu-kpi.kiev.ua +andreysimonenko@users.sourceforge.net diff -ruNp unix_cmsg.orig/unix_cmsg.c unix_cmsg/unix_cmsg.c --- unix_cmsg.orig/unix_cmsg.c 2012-11-20 11:26:18.000000000 +0200 +++ unix_cmsg/unix_cmsg.c 2013-02-07 16:09:02.000000000 +0200 @@ -27,48 +27,45 @@ #include __FBSDID("$FreeBSD: src/tools/regression/sockets/unix_cmsg/unix_cmsg.c,v 1.5 2012/11/19 22:59:17 svnexp Exp $"); -#include +#include #include #include +#include #include +#include #include #include -#include #include #include #include +#include #include #include -#include +#include #include #include +#include #include #include #include #include -#include #include /* * There are tables with tests descriptions and pointers to test * functions. Each t_*() function returns 0 if its test passed, - * -1 if its test failed (something wrong was found in local domain - * control messages), -2 if some system error occurred. If test - * function returns -2, then a program exits. + * -1 if its test failed, -2 if some system error occurred. + * If a test function returns -2, then a program exits. * - * Each test function completely control what to do (eg. fork or - * do not fork a client process). If a test function forks a client - * process, then it waits for its termination. If a return code of a - * client process is not equal to zero, or if a client process was - * terminated by a signal, then test function returns -2. 
+ * If a test function forks a client process, then it waits for its + * termination. If a return code of a client process is not equal + * to zero, or if a client process was terminated by a signal, then + * a test function returns -1 or -2 depending on exit status of client. * - * Each test function and complete program are not optimized - * a lot to allow easy to modify tests. - * - * Each function which can block, is run under TIMEOUT, if timeout - * occurs, then test function returns -2 or a client process exits - * with nonzero return code. + * Each function which can block, is run under TIMEOUT. If timeout + * occurs, then a test function returns -2 or a client process exits + * with a non-zero return code. */ #ifndef LISTENQ @@ -76,207 +73,292 @@ __FBSDID("$FreeBSD: src/tools/regression #endif #ifndef TIMEOUT -# define TIMEOUT 60 +# define TIMEOUT 3 #endif -#define EXTRA_CMSG_SPACE 512 /* Memory for not expected control data. */ - -static int t_cmsgcred(void), t_sockcred_stream1(void); -static int t_sockcred_stream2(void), t_cmsgcred_sockcred(void); -static int t_sockcred_dgram(void), t_timestamp(void); +static int t_cmsgcred(void); +static int t_sockcred_1(void); +static int t_sockcred_2(void); +static int t_cmsgcred_sockcred(void); +static int t_timeval(void); +static int t_bintime(void); +static int t_cmsg_len(void); +static int t_peercred(void); struct test_func { - int (*func)(void); /* Pointer to function. */ - const char *desc; /* Test description. 
*/ -}; - -static struct test_func test_stream_tbl[] = { - { NULL, " 0: All tests" }, - { t_cmsgcred, " 1: Sending, receiving cmsgcred" }, - { t_sockcred_stream1, " 2: Receiving sockcred (listening socket has LOCAL_CREDS)" }, - { t_sockcred_stream2, " 3: Receiving sockcred (accepted socket has LOCAL_CREDS)" }, - { t_cmsgcred_sockcred, " 4: Sending cmsgcred, receiving sockcred" }, - { t_timestamp, " 5: Sending, receiving timestamp" }, - { NULL, NULL } + int (*func)(void); + const char *desc; }; -static struct test_func test_dgram_tbl[] = { - { NULL, " 0: All tests" }, - { t_cmsgcred, " 1: Sending, receiving cmsgcred" }, - { t_sockcred_dgram, " 2: Receiving sockcred" }, - { t_cmsgcred_sockcred, " 3: Sending cmsgcred, receiving sockcred" }, - { t_timestamp, " 4: Sending, receiving timestamp" }, - { NULL, NULL } +static const struct test_func test_stream_tbl[] = { + { + .func = NULL, + .desc = "All tests" + }, + { + .func = t_cmsgcred, + .desc = "Sending, receiving cmsgcred" + }, + { + .func = t_sockcred_1, + .desc = "Receiving sockcred (listening socket)" + }, + { + .func = t_sockcred_2, + .desc = "Receiving sockcred (accepted socket)" + }, + { + .func = t_cmsgcred_sockcred, + .desc = "Sending cmsgcred, receiving sockcred" + }, + { + .func = t_timeval, + .desc = "Sending, receiving timeval" + }, + { + .func = t_bintime, + .desc = "Sending, receiving bintime" + }, + { + .func = t_cmsg_len, + .desc = "Check cmsghdr.cmsg_len" + }, + { + .func = t_peercred, + .desc = "Check LOCAL_PEERCRED socket option" + } }; -#define TEST_STREAM_NO_MAX (sizeof(test_stream_tbl) / sizeof(struct test_func) - 2) -#define TEST_DGRAM_NO_MAX (sizeof(test_dgram_tbl) / sizeof(struct test_func) - 2) - -static const char *myname = "SERVER"; /* "SERVER" or "CLIENT" */ - -static int debug = 0; /* 1, if -d. */ -static int no_control_data = 0; /* 1, if -z. */ - -static u_int nfailed = 0; /* Number of failed tests. 
*/ +#define TEST_STREAM_TBL_SIZE \ + (sizeof(test_stream_tbl) / sizeof(test_stream_tbl[0])) -static int sock_type; /* SOCK_STREAM or SOCK_DGRAM */ -static const char *sock_type_str; /* "SOCK_STREAM" or "SOCK_DGRAN" */ - -static char tempdir[] = "/tmp/unix_cmsg.XXXXXXX"; -static char serv_sock_path[PATH_MAX]; - -static char ipc_message[] = "hello"; - -#define IPC_MESSAGE_SIZE (sizeof(ipc_message)) - -static struct sockaddr_un servaddr; /* Server address. */ - -static sigjmp_buf env_alrm; +static const struct test_func test_dgram_tbl[] = { + { + .func = NULL, + .desc = "All tests" + }, + { + .func = t_cmsgcred, + .desc = "Sending, receiving cmsgcred" + }, + { + .func = t_sockcred_2, + .desc = "Receiving sockcred" + }, + { + .func = t_cmsgcred_sockcred, + .desc = "Sending cmsgcred, receiving sockcred" + }, + { + .func = t_timeval, + .desc = "Sending, receiving timeval" + }, + { + .func = t_bintime, + .desc = "Sending, receiving bintime" + }, + { + .func = t_cmsg_len, + .desc = "Check cmsghdr.cmsg_len" + } +}; -static uid_t my_uid; -static uid_t my_euid; -static gid_t my_gid; -static gid_t my_egid; +#define TEST_DGRAM_TBL_SIZE \ + (sizeof(test_dgram_tbl) / sizeof(test_dgram_tbl[0])) -/* - * my_gids[0] is EGID, next items are supplementary GIDs, - * my_ngids determines valid items in my_gids array. 
- */ -static gid_t my_gids[NGROUPS_MAX]; -static int my_ngids; +static bool debug = false; +static bool server_flag = true; +static bool send_data_flag = true; +static bool send_array_flag = true; +static bool failed_flag = false; + +static int sock_type; +static const char *sock_type_str; + +static const char *proc_name; + +static char tempdir[] = _PATH_TMP "unix_cmsg.XXXXXXX"; +static int serv_sock_fd; +static struct sockaddr_un serv_addr_sun; + +static struct { + char *buf_orig; + char *buf_recv; + size_t buf_size; + u_int msg_num; +} ipc_msg; + +#define IPC_MSG_NUM_DEF 5 +#define IPC_MSG_NUM_MAX 10 +#define IPC_MSG_SIZE_DEF 7 +#define IPC_MSG_SIZE_MAX 128 + +#define CMSG_SPACE_EXTRA 64 + +static struct { + uid_t uid; + uid_t euid; + gid_t gid; + gid_t egid; + gid_t *gid_arr; + int gid_num; +} proc_cred; + +static pid_t client_pid; + +#define SYNC_SERVER 0 +#define SYNC_CLIENT 1 +#define SYNC_RECV 0 +#define SYNC_SEND 1 -static pid_t client_pid; /* PID of forked client. */ +static int sync_fd[2][2]; -#define dbgmsg(x) do { \ - if (debug) \ - logmsgx x ; \ -} while (/* CONSTCOND */0) +#define LOGMSG_SIZE 128 static void logmsg(const char *, ...) __printflike(1, 2); static void logmsgx(const char *, ...) __printflike(1, 2); +static void dbgmsg(const char *, ...) __printflike(1, 2); static void output(const char *, ...) __printflike(1, 2); -extern char *__progname; /* The name of program. */ - -/* - * Output the help message (-h switch). 
- */ static void -usage(int quick) +usage(bool verbose) { - const struct test_func *test_func; + u_int i; - fprintf(stderr, "Usage: %s [-dhz] [-t ] [testno]\n", - __progname); - if (quick) + printf("usage: %s [-dh] [-n num] [-s size] [-t type] " + "[-z value] [testno]\n", getprogname()); + if (!verbose) return; - fprintf(stderr, "\n Options are:\n\ - -d\t\t\tOutput debugging information\n\ - -h\t\t\tOutput this help message and exit\n\ - -t \t\tRun test only for the given socket type:\n\ -\t\t\tstream or dgram\n\ - -z\t\t\tDo not send real control data if possible\n\n"); - fprintf(stderr, " Available tests for stream sockets:\n"); - for (test_func = test_stream_tbl; test_func->desc != NULL; ++test_func) - fprintf(stderr, " %s\n", test_func->desc); - fprintf(stderr, "\n Available tests for datagram sockets:\n"); - for (test_func = test_dgram_tbl; test_func->desc != NULL; ++test_func) - fprintf(stderr, " %s\n", test_func->desc); + printf("\n Options are:\n\ + -d Output debugging information\n\ + -h Output the help message and exit\n\ + -n num Number of messages to send\n\ + -s size Specify size of data for IPC\n\ + -t type Specify socket type (stream, dgram) for tests\n\ + -z value Do not send data in a message (bit 0x1), do not send\n\ + data array associated with a cmsghdr structure (bit 0x2)\n\ + testno Run one test by its number (require the -t option)\n\n"); + printf(" Available tests for stream sockets:\n"); + for (i = 0; i < TEST_STREAM_TBL_SIZE; ++i) + printf(" %u: %s\n", i, test_stream_tbl[i].desc); + printf("\n Available tests for datagram sockets:\n"); + for (i = 0; i < TEST_DGRAM_TBL_SIZE; ++i) + printf(" %u: %s\n", i, test_dgram_tbl[i].desc); } -/* - * printf-like function for outputting to STDOUT_FILENO. - */ static void output(const char *format, ...) 
{ - char buf[128]; + char buf[LOGMSG_SIZE]; va_list ap; va_start(ap, format); if (vsnprintf(buf, sizeof(buf), format, ap) < 0) - err(EX_SOFTWARE, "output: vsnprintf failed"); + err(EXIT_FAILURE, "output: vsnprintf failed"); write(STDOUT_FILENO, buf, strlen(buf)); va_end(ap); } -/* - * printf-like function for logging, also outputs message for errno. - */ static void logmsg(const char *format, ...) { - char buf[128]; + char buf[LOGMSG_SIZE]; va_list ap; int errno_save; - errno_save = errno; /* Save errno. */ - + errno_save = errno; va_start(ap, format); if (vsnprintf(buf, sizeof(buf), format, ap) < 0) - err(EX_SOFTWARE, "logmsg: vsnprintf failed"); + err(EXIT_FAILURE, "logmsg: vsnprintf failed"); if (errno_save == 0) - output("%s: %s\n", myname, buf); + output("%s: %s\n", proc_name, buf); else - output("%s: %s: %s\n", myname, buf, strerror(errno_save)); + output("%s: %s: %s\n", proc_name, buf, strerror(errno_save)); va_end(ap); + errno = errno_save; +} + +static void +vlogmsgx(const char *format, va_list ap) +{ + char buf[LOGMSG_SIZE]; + + if (vsnprintf(buf, sizeof(buf), format, ap) < 0) + err(EXIT_FAILURE, "logmsgx: vsnprintf failed"); + output("%s: %s\n", proc_name, buf); - errno = errno_save; /* Restore errno. */ } -/* - * printf-like function for logging, do not output message for errno. - */ static void logmsgx(const char *format, ...) { - char buf[128]; va_list ap; va_start(ap, format); - if (vsnprintf(buf, sizeof(buf), format, ap) < 0) - err(EX_SOFTWARE, "logmsgx: vsnprintf failed"); - output("%s: %s\n", myname, buf); + vlogmsgx(format, ap); va_end(ap); } -/* - * Run tests from testno1 to testno2. - */ +static void +dbgmsg(const char *format, ...) 
+{
+	va_list ap;
+
+	if (debug) {
+		va_start(ap, format);
+		vlogmsgx(format, ap);
+		va_end(ap);
+	}
+}
+
 static int
-run_tests(u_int testno1, u_int testno2)
+run_tests(int type, u_int testno1)
 {
-	const struct test_func *test_func;
-	u_int i, nfailed1;
+	const struct test_func *tf;
+	u_int i, testno2, failed_num;
 
-	output("Running tests for %s sockets:\n", sock_type_str);
-	test_func = (sock_type == SOCK_STREAM ?
-	    test_stream_tbl : test_dgram_tbl) + testno1;
+	sock_type = type;
+	if (type == SOCK_STREAM) {
+		sock_type_str = "SOCK_STREAM";
+		tf = test_stream_tbl;
+		i = TEST_STREAM_TBL_SIZE - 1;
+	} else {
+		sock_type_str = "SOCK_DGRAM";
+		tf = test_dgram_tbl;
+		i = TEST_DGRAM_TBL_SIZE - 1;
+	}
+	if (testno1 == 0) {
+		testno1 = 1;
+		testno2 = i;
+	} else
+		testno2 = testno1;
 
-	nfailed1 = 0;
-	for (i = testno1; i <= testno2; ++test_func, ++i) {
-		output(" %s\n", test_func->desc);
-		switch (test_func->func()) {
+	output("Running tests for %s sockets:\n", sock_type_str);
+	failed_num = 0;
+	for (i = testno1, tf += testno1; i <= testno2; ++tf, ++i) {
+		output(" %u: %s\n", i, tf->desc);
+		switch (tf->func()) {
 		case -1:
-			++nfailed1;
+			++failed_num;
 			break;
 		case -2:
-			logmsgx("some system error occurred, exiting");
+			logmsgx("some system error or timeout occurred");
 			return (-1);
 		}
 	}
 
-	nfailed += nfailed1;
+	if (failed_num != 0)
+		failed_flag = true;
 
 	if (testno1 != testno2) {
-		if (nfailed1 == 0)
-			output("-- all tests were passed!\n");
+		if (failed_num == 0)
+			output("-- all tests passed!\n");
 		else
-			output("-- %u test%s failed!\n", nfailed1,
-			    nfailed1 == 1 ? "" : "s");
+			output("-- %u test%s failed!\n",
+			    failed_num, failed_num == 1 ? "" : "s");
 	} else {
-		if (nfailed == 0)
-			output("-- test was passed!\n");
+		if (failed_num == 0)
+			output("-- test passed!\n");
 		else
 			output("-- test failed!\n");
 	}
@@ -284,183 +366,322 @@ run_tests(u_int testno1, u_int testno2)
 	return (0);
 }
 
-/* ARGSUSED */
-static void
-sig_alrm(int signo __unused)
+static int
+init(void)
+{
+	struct sigaction sigact;
+	size_t idx;
+	int rv;
+
+	proc_name = "SERVER";
+
+	sigact.sa_handler = SIG_IGN;
+	sigact.sa_flags = 0;
+	sigemptyset(&sigact.sa_mask);
+	if (sigaction(SIGPIPE, &sigact, (struct sigaction *)NULL) < 0) {
+		logmsg("init: sigaction");
+		return (-1);
+	}
+
+	if (ipc_msg.buf_size == 0)
+		ipc_msg.buf_orig = ipc_msg.buf_recv = NULL;
+	else {
+		ipc_msg.buf_orig = malloc(ipc_msg.buf_size);
+		ipc_msg.buf_recv = malloc(ipc_msg.buf_size);
+		if (ipc_msg.buf_orig == NULL || ipc_msg.buf_recv == NULL) {
+			logmsg("init: malloc");
+			return (-1);
+		}
+		for (idx = 0; idx < ipc_msg.buf_size; ++idx)
+			ipc_msg.buf_orig[idx] = (char)idx;
+	}
+
+	proc_cred.uid = getuid();
+	proc_cred.euid = geteuid();
+	proc_cred.gid = getgid();
+	proc_cred.egid = getegid();
+	proc_cred.gid_num = getgroups(0, (gid_t *)NULL);
+	if (proc_cred.gid_num < 0) {
+		logmsg("init: getgroups");
+		return (-1);
+	}
+	proc_cred.gid_arr = malloc(proc_cred.gid_num *
+	    sizeof(*proc_cred.gid_arr));
+	if (proc_cred.gid_arr == NULL) {
+		logmsg("init: malloc");
+		return (-1);
+	}
+	if (getgroups(proc_cred.gid_num, proc_cred.gid_arr) < 0) {
+		logmsg("init: getgroups");
+		return (-1);
+	}
+
+	memset(&serv_addr_sun, 0, sizeof(serv_addr_sun));
+	rv = snprintf(serv_addr_sun.sun_path, sizeof(serv_addr_sun.sun_path),
+	    "%s/%s", tempdir, proc_name);
+	if (rv < 0) {
+		logmsg("init: snprintf");
+		return (-1);
+	}
+	if ((size_t)rv >= sizeof(serv_addr_sun.sun_path)) {
+		logmsgx("init: not enough space for socket pathname");
+		return (-1);
+	}
+	serv_addr_sun.sun_family = PF_LOCAL;
+	serv_addr_sun.sun_len = SUN_LEN(&serv_addr_sun);
+
+	return (0);
+}
+
+static int
+client_fork(void)
 {
-	siglongjmp(env_alrm, 1);
+	int fd1, fd2;
+
+	if (pipe(sync_fd[SYNC_SERVER]) < 0 ||
+	    pipe(sync_fd[SYNC_CLIENT]) < 0) {
+		logmsg("client_fork: pipe");
+		return (-1);
+	}
+	client_pid = fork();
+	if (client_pid == (pid_t)-1) {
+		logmsg("client_fork: fork");
+		return (-1);
+	}
+	if (client_pid == 0) {
+		proc_name = "CLIENT";
+		server_flag = false;
+		fd1 = sync_fd[SYNC_SERVER][SYNC_RECV];
+		fd2 = sync_fd[SYNC_CLIENT][SYNC_SEND];
+	} else {
+		fd1 = sync_fd[SYNC_SERVER][SYNC_SEND];
+		fd2 = sync_fd[SYNC_CLIENT][SYNC_RECV];
+	}
+	if (close(fd1) < 0 || close(fd2) < 0) {
+		logmsg("client_fork: close");
+		return (-1);
+	}
+	return (client_pid != 0);
 }
 
-/*
- * Initialize signals handlers.
- */
 static void
-sig_init(void)
+client_exit(int rv)
+{
+	if (close(sync_fd[SYNC_SERVER][SYNC_SEND]) < 0 ||
+	    close(sync_fd[SYNC_CLIENT][SYNC_RECV]) < 0) {
+		logmsg("client_exit: close");
+		rv = -1;
+	}
+	rv = rv == 0 ? EXIT_SUCCESS : -rv;
+	dbgmsg("exit: code %d", rv);
+	_exit(rv);
+}
+
+static int
+client_wait(void)
 {
-	struct sigaction sa;
+	int status;
+	pid_t pid;
 
-	sa.sa_handler = SIG_IGN;
-	sigemptyset(&sa.sa_mask);
-	sa.sa_flags = 0;
-	if (sigaction(SIGPIPE, &sa, (struct sigaction *)NULL) < 0)
-		err(EX_OSERR, "sigaction(SIGPIPE)");
-
-	sa.sa_handler = sig_alrm;
-	if (sigaction(SIGALRM, &sa, (struct sigaction *)NULL) < 0)
-		err(EX_OSERR, "sigaction(SIGALRM)");
+	dbgmsg("waiting for client");
+
+	if (close(sync_fd[SYNC_SERVER][SYNC_RECV]) < 0 ||
+	    close(sync_fd[SYNC_CLIENT][SYNC_SEND]) < 0) {
+		logmsg("client_wait: close");
+		return (-1);
+	}
+
+	pid = waitpid(client_pid, &status, 0);
+	if (pid == (pid_t)-1) {
+		logmsg("client_wait: waitpid");
+		return (-1);
+	}
+
+	if (WIFEXITED(status)) {
+		if (WEXITSTATUS(status) != EXIT_SUCCESS) {
+			logmsgx("client exit status is %d",
+			    WEXITSTATUS(status));
+			return (-WEXITSTATUS(status));
+		}
+	} else {
+		if (WIFSIGNALED(status))
+			logmsgx("abnormal termination of client, signal %d%s",
+			    WTERMSIG(status), WCOREDUMP(status) ?
+			    " (core file generated)" : "");
+		else
+			logmsgx("termination of client, unknown status");
+		return (-1);
+	}
+
+	return (0);
 }
 
 int
 main(int argc, char *argv[])
 {
 	const char *errstr;
-	int opt, dgramflag, streamflag;
-	u_int testno1, testno2;
-
-	dgramflag = streamflag = 0;
-	while ((opt = getopt(argc, argv, "dht:z")) != -1)
+	u_int testno, zvalue;
+	int opt, rv;
+	bool dgram_flag, stream_flag;
+
+	ipc_msg.buf_size = IPC_MSG_SIZE_DEF;
+	ipc_msg.msg_num = IPC_MSG_NUM_DEF;
+	dgram_flag = stream_flag = false;
+	while ((opt = getopt(argc, argv, "dhn:s:t:z:")) != -1)
 		switch (opt) {
 		case 'd':
-			debug = 1;
+			debug = true;
 			break;
 		case 'h':
-			usage(0);
-			return (EX_OK);
+			usage(true);
+			return (EXIT_SUCCESS);
+		case 'n':
+			ipc_msg.msg_num = strtonum(optarg, 1,
+			    IPC_MSG_NUM_MAX, &errstr);
+			if (errstr != NULL)
+				errx(EXIT_FAILURE, "option -n: %s", errstr);
+			break;
+		case 's':
+			ipc_msg.buf_size = strtonum(optarg, 0,
+			    IPC_MSG_SIZE_MAX, &errstr);
+			if (errstr != NULL)
+				errx(EXIT_FAILURE, "option -s: %s", errstr);
+			break;
 		case 't':
 			if (strcmp(optarg, "stream") == 0)
-				streamflag = 1;
+				stream_flag = true;
 			else if (strcmp(optarg, "dgram") == 0)
-				dgramflag = 1;
+				dgram_flag = true;
 			else
-				errx(EX_USAGE, "wrong socket type in -t option");
+				errx(EXIT_FAILURE, "option -t: "
+				    "wrong socket type");
 			break;
 		case 'z':
-			no_control_data = 1;
+			zvalue = strtonum(optarg, 0, 3, &errstr);
+			if (errstr != NULL)
+				errx(EXIT_FAILURE, "option -z: %s", errstr);
+			if (zvalue & 0x1)
+				send_data_flag = false;
+			if (zvalue & 0x2)
+				send_array_flag = false;
 			break;
-		case '?':
 		default:
-			usage(1);
-			return (EX_USAGE);
+			usage(false);
+			return (EXIT_FAILURE);
 		}
 
 	if (optind < argc) {
 		if (optind + 1 != argc)
-			errx(EX_USAGE, "too many arguments");
-		testno1 = strtonum(argv[optind], 0, UINT_MAX, &errstr);
+			errx(EXIT_FAILURE, "too many arguments");
+		testno = strtonum(argv[optind], 0, UINT_MAX, &errstr);
 		if (errstr != NULL)
-			errx(EX_USAGE, "wrong test number: %s", errstr);
+			errx(EXIT_FAILURE, "wrong test number: %s", errstr);
 	} else
-		testno1 = 0;
-
-	if (dgramflag == 0 && streamflag == 0)
-		dgramflag = streamflag = 1;
+		testno = 0;
 
-	if (dgramflag && streamflag && testno1 != 0)
-		errx(EX_USAGE, "you can use particular test, only with datagram or stream sockets");
+	if (!dgram_flag && !stream_flag)
+		dgram_flag = stream_flag = true;
 
-	if (streamflag) {
-		if (testno1 > TEST_STREAM_NO_MAX)
-			errx(EX_USAGE, "given test %u for stream sockets does not exist",
-			    testno1);
+	if (dgram_flag && stream_flag && testno != 0)
+		errx(EXIT_FAILURE, "particular test can be used "
+		    "with the -t option only");
+
+	if (stream_flag) {
+		if (testno >= TEST_STREAM_TBL_SIZE)
+			errx(EXIT_FAILURE, "given test %u for stream "
+			    "sockets does not exist", testno);
 	} else {
-		if (testno1 > TEST_DGRAM_NO_MAX)
-			errx(EX_USAGE, "given test %u for datagram sockets does not exist",
-			    testno1);
-	}
-
-	my_uid = getuid();
-	my_euid = geteuid();
-	my_gid = getgid();
-	my_egid = getegid();
-	switch (my_ngids = getgroups(sizeof(my_gids) / sizeof(my_gids[0]), my_gids)) {
-	case -1:
-		err(EX_SOFTWARE, "getgroups");
-		/* NOTREACHED */
-	case 0:
-		errx(EX_OSERR, "getgroups returned 0 groups");
+		if (testno >= TEST_DGRAM_TBL_SIZE)
+			errx(EXIT_FAILURE, "given test %u for datagram "
+			    "sockets does not exist", testno);
 	}
 
-	sig_init();
-
 	if (mkdtemp(tempdir) == NULL)
-		err(EX_OSERR, "mkdtemp");
+		err(EXIT_FAILURE, "mkdtemp");
 
-	if (streamflag) {
-		sock_type = SOCK_STREAM;
-		sock_type_str = "SOCK_STREAM";
-		if (testno1 == 0) {
-			testno1 = 1;
-			testno2 = TEST_STREAM_NO_MAX;
-		} else
-			testno2 = testno1;
-		if (run_tests(testno1, testno2) < 0)
-			goto failed;
-		testno1 = 0;
-	}
+	if (init() < 0)
+		return (EXIT_FAILURE);
 
-	if (dgramflag) {
-		sock_type = SOCK_DGRAM;
-		sock_type_str = "SOCK_DGRAM";
-		if (testno1 == 0) {
-			testno1 = 1;
-			testno2 = TEST_DGRAM_NO_MAX;
-		} else
-			testno2 = testno1;
-		if (run_tests(testno1, testno2) < 0)
-			goto failed;
-	}
+	rv = EXIT_SUCCESS;
+	if (stream_flag)
+		if (run_tests(SOCK_STREAM, testno) < 0)
+			rv = EXIT_FAILURE;
+	if (dgram_flag && rv == EXIT_SUCCESS)
+		if (run_tests(SOCK_DGRAM, testno) < 0)
+			rv = EXIT_FAILURE;
 
 	if (rmdir(tempdir) < 0) {
 		logmsg("rmdir(%s)", tempdir);
-		return (EX_OSERR);
+		rv = EXIT_FAILURE;
 	}
 
-	return (nfailed ? EX_OSERR : EX_OK);
+	return (failed_flag ? EXIT_FAILURE : rv);
+}
 
-failed:
-	if (rmdir(tempdir) < 0)
-		logmsg("rmdir(%s)", tempdir);
-	return (EX_OSERR);
+static int
+socket_close(int fd)
+{
+	int rv;
+
+	rv = 0;
+	if (close(fd) < 0) {
+		logmsg("socket_close: close");
+		rv = -1;
+	}
+	if (server_flag && fd == serv_sock_fd)
+		if (unlink(serv_addr_sun.sun_path) < 0) {
+			logmsg("socket_close: unlink(%s)",
+			    serv_addr_sun.sun_path);
+			rv = -1;
+		}
+	return (rv);
 }
 
-/*
- * Create PF_LOCAL socket, if sock_path is not equal to NULL, then
- * bind() it. Return socket address in addr. Return file descriptor
- * or -1 if some error occurred.
- */
 static int
-create_socket(char *sock_path, size_t sock_path_len, struct sockaddr_un *addr)
+socket_create(void)
 {
-	int rv, fd;
+	struct timeval tv;
+	int fd;
 
-	if ((fd = socket(PF_LOCAL, sock_type, 0)) < 0) {
-		logmsg("create_socket: socket(PF_LOCAL, %s, 0)", sock_type_str);
+	fd = socket(PF_LOCAL, sock_type, 0);
+	if (fd < 0) {
+		logmsg("socket_create: socket(PF_LOCAL, %s, 0)", sock_type_str);
 		return (-1);
 	}
+	if (server_flag)
+		serv_sock_fd = fd;
 
-	if (sock_path != NULL) {
-		if ((rv = snprintf(sock_path, sock_path_len, "%s/%s",
-		    tempdir, myname)) < 0) {
-			logmsg("create_socket: snprintf failed");
-			goto failed;
-		}
-		if ((size_t)rv >= sock_path_len) {
-			logmsgx("create_socket: too long path name for given buffer");
-			goto failed;
-		}
+	tv.tv_sec = TIMEOUT;
+	tv.tv_usec = 0;
+	if (setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) < 0 ||
+	    setsockopt(fd, SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof(tv)) < 0) {
+		logmsg("socket_create: setsockopt(SO_RCVTIMEO/SO_SNDTIMEO)");
+		goto failed;
+	}
 
-		memset(addr, 0, sizeof(*addr));
-		addr->sun_family = AF_LOCAL;
-		if (strlen(sock_path) >= sizeof(addr->sun_path)) {
-			logmsgx("create_socket: too long path name (>= %lu) for local domain socket",
-			    (u_long)sizeof(addr->sun_path));
+	if (server_flag) {
+		if (bind(fd, (struct sockaddr *)&serv_addr_sun,
+		    serv_addr_sun.sun_len) < 0) {
+			logmsg("socket_create: bind(%s)",
+			    serv_addr_sun.sun_path);
 			goto failed;
 		}
-		strcpy(addr->sun_path, sock_path);
+		if (sock_type == SOCK_STREAM) {
+			int val;
 
-		if (bind(fd, (struct sockaddr *)addr, SUN_LEN(addr)) < 0) {
-			logmsg("create_socket: bind(%s)", sock_path);
-			goto failed;
+			if (listen(fd, LISTENQ) < 0) {
+				logmsg("socket_create: listen");
+				goto failed;
+			}
+			val = fcntl(fd, F_GETFL, 0);
+			if (val < 0) {
+				logmsg("socket_create: fcntl(F_GETFL)");
+				goto failed;
+			}
+			if (fcntl(fd, F_SETFL, val | O_NONBLOCK) < 0) {
+				logmsg("socket_create: fcntl(F_SETFL)");
+				goto failed;
+			}
 		}
 	}
 
@@ -468,1163 +689,1325 @@ create_socket(char *sock_path, size_t so
 
 failed:
 	if (close(fd) < 0)
-		logmsg("create_socket: close");
+		logmsg("socket_create: close");
+	if (server_flag)
+		if (unlink(serv_addr_sun.sun_path) < 0)
+			logmsg("socket_close: unlink(%s)",
+			    serv_addr_sun.sun_path);
 	return (-1);
 }
 
-/*
- * Call create_socket() for server listening socket.
- * Return socket descriptor or -1 if some error occurred.
- */
 static int
-create_server_socket(void)
+socket_connect(int fd)
 {
-	return (create_socket(serv_sock_path, sizeof(serv_sock_path), &servaddr));
-}
+	dbgmsg("connect");
 
-/*
- * Create unbound socket.
- */
-static int
-create_unbound_socket(void)
-{
-	return (create_socket((char *)NULL, 0, (struct sockaddr_un *)NULL));
+	if (connect(fd, (struct sockaddr *)&serv_addr_sun,
+	    serv_addr_sun.sun_len) < 0) {
+		logmsg("socket_connect: connect(%s)", serv_addr_sun.sun_path);
+		return (-1);
+	}
+	return (0);
 }
 
-/*
- * Close socket descriptor, if sock_path is not equal to NULL,
- * then unlink the given path.
- */
 static int
-close_socket(const char *sock_path, int fd)
+sync_recv(void)
 {
-	int error = 0;
+	ssize_t ssize;
+	int fd;
+	char buf;
 
-	if (close(fd) < 0) {
-		logmsg("close_socket: close");
-		error = -1;
-	}
-	if (sock_path != NULL)
-		if (unlink(sock_path) < 0) {
-			logmsg("close_socket: unlink(%s)", sock_path);
-			error = -1;
-		}
-	return (error);
-}
+	dbgmsg("sync: wait");
 
-/*
- * Connect to server (socket address in servaddr).
- */
-static int
-connect_server(int fd)
-{
-	dbgmsg(("connecting to %s", serv_sock_path));
+	fd = sync_fd[server_flag ? SYNC_SERVER : SYNC_CLIENT][SYNC_RECV];
 
-	/*
-	 * If PF_LOCAL listening socket's queue is full, then connect()
-	 * returns ECONNREFUSED immediately, do not need timeout.
-	 */
-	if (connect(fd, (struct sockaddr *)&servaddr, sizeof(servaddr)) < 0) {
-		logmsg("connect_server: connect(%s)", serv_sock_path);
+	ssize = read(fd, &buf, 1);
+	if (ssize < 0) {
+		logmsg("sync_recv: read");
+		return (-1);
+	}
+	if (ssize < 1) {
+		logmsgx("sync_recv: read %zd of 1 byte", ssize);
 		return (-1);
 	}
+	dbgmsg("sync: received");
+
 	return (0);
 }
 
-/*
- * sendmsg() with timeout.
- */
 static int
-sendmsg_timeout(int fd, struct msghdr *msg, size_t n)
+sync_send(void)
 {
-	ssize_t nsent;
-
-	dbgmsg(("sending %lu bytes", (u_long)n));
-
-	if (sigsetjmp(env_alrm, 1) != 0) {
-		logmsgx("sendmsg_timeout: cannot send message to %s (timeout)", serv_sock_path);
-		return (-1);
-	}
-
-	(void)alarm(TIMEOUT);
+	ssize_t ssize;
+	int fd;
 
-	nsent = sendmsg(fd, msg, 0);
+	dbgmsg("sync: send");
 
-	(void)alarm(0);
+	fd = sync_fd[server_flag ? SYNC_CLIENT : SYNC_SERVER][SYNC_SEND];
 
-	if (nsent < 0) {
-		logmsg("sendmsg_timeout: sendmsg");
+	ssize = write(fd, "", 1);
+	if (ssize < 0) {
+		logmsg("sync_send: write");
 		return (-1);
 	}
-
-	if ((size_t)nsent != n) {
-		logmsgx("sendmsg_timeout: sendmsg: short send: %ld of %lu bytes",
-		    (long)nsent, (u_long)n);
+	if (ssize < 1) {
+		logmsgx("sync_send: sent %zd of 1 byte", ssize);
 		return (-1);
 	}
 
 	return (0);
 }
 
-/*
- * accept() with timeout.
- */
 static int
-accept_timeout(int listenfd)
+message_send(int fd, const struct msghdr *msghdr)
 {
-	int fd;
-
-	dbgmsg(("accepting connection"));
-
-	if (sigsetjmp(env_alrm, 1) != 0) {
-		logmsgx("accept_timeout: cannot accept connection (timeout)");
+	size_t size;
+	ssize_t ssize;
+	const struct cmsghdr *cmsghdr;
+	int i;
+
+	size = 0;
+	for (i = 0; i < msghdr->msg_iovlen; ++i)
+		size += msghdr->msg_iov[i].iov_len;
+
+	cmsghdr = CMSG_FIRSTHDR(msghdr);
+	dbgmsg("send: msghdr.msg_controllen %u",
+	    (u_int)msghdr->msg_controllen);
+	if (cmsghdr != NULL)
+		dbgmsg("send: cmsghdr.cmsg_len %u",
+		    (u_int)cmsghdr->cmsg_len);
+	dbgmsg("send: data size %zu", size);
+
+	ssize = sendmsg(fd, msghdr, 0);
+	if (ssize < 0) {
+		logmsg("message_send: sendmsg");
+		return (-1);
+	}
+	if ((size_t)ssize != size) {
+		logmsgx("message_send: sendmsg: sent %zd of %zu bytes",
+		    ssize, size);
 		return (-1);
 	}
 
-	(void)alarm(TIMEOUT);
+	if (!send_data_flag)
+		if (sync_send() < 0)
+			return (-1);
 
-	fd = accept(listenfd, (struct sockaddr *)NULL, (socklen_t *)NULL);
+	return (0);
+}
 
-	(void)alarm(0);
+static int
+message_sendn(int fd, struct msghdr *msghdr)
+{
+	u_int i;
 
-	if (fd < 0) {
-		logmsg("accept_timeout: accept");
-		return (-1);
+	for (i = 1; i <= ipc_msg.msg_num; ++i) {
+		dbgmsg("message #%u", i);
+		if (message_send(fd, msghdr) < 0)
+			return (-1);
 	}
-
-	return (fd);
+	return (0);
 }
 
-/*
- * recvmsg() with timeout.
- */
 static int
-recvmsg_timeout(int fd, struct msghdr *msg, size_t n)
+message_recv(int fd, struct msghdr *msghdr)
 {
-	ssize_t nread;
+	size_t size;
+	ssize_t ssize;
+	int i;
 
-	dbgmsg(("receiving %lu bytes", (u_long)n));
+	if (!send_data_flag)
+		if (sync_recv() < 0)
+			return (-1);
 
-	if (sigsetjmp(env_alrm, 1) != 0) {
-		logmsgx("recvmsg_timeout: cannot receive message (timeout)");
+	size = 0;
+	for (i = 0; i < msghdr->msg_iovlen; ++i)
+		size += msghdr->msg_iov[i].iov_len;
+
+	dbgmsg("recv: data size %zu", size);
+
+	ssize = recvmsg(fd, msghdr, MSG_WAITALL);
+	if (ssize < 0) {
+		logmsg("message_recv: recvmsg");
 		return (-1);
 	}
-
-	(void)alarm(TIMEOUT);
-
-	nread = recvmsg(fd, msg, MSG_WAITALL);
-
-	(void)alarm(0);
-
-	if (nread < 0) {
-		logmsg("recvmsg_timeout: recvmsg");
+	if ((size_t)ssize != size) {
+		logmsgx("message_recv: recvmsg: received %zd of %zu bytes",
+		    ssize, size);
 		return (-1);
 	}
 
-	if ((size_t)nread != n) {
-		logmsgx("recvmsg_timeout: recvmsg: short read: %ld of %lu bytes",
-		    (long)nread, (u_long)n);
+	if (send_data_flag && memcmp(ipc_msg.buf_recv, ipc_msg.buf_orig,
+	    ipc_msg.buf_size) != 0) {
+		logmsgx("message_recv: recvmsg: message has wrong content");
 		return (-1);
 	}
 
 	return (0);
 }
 
-/*
- * Wait for synchronization message (1 byte) with timeout.
- */
 static int
-sync_recv(int fd)
+socket_accept(int listenfd)
 {
-	ssize_t nread;
-	char buf;
-
-	dbgmsg(("waiting for sync message"));
-
-	if (sigsetjmp(env_alrm, 1) != 0) {
-		logmsgx("sync_recv: cannot receive sync message (timeout)");
+	fd_set rset;
+	struct timeval tv;
+	int fd, rv, val;
+
+	dbgmsg("accept");
+
+	FD_ZERO(&rset);
+	FD_SET(listenfd, &rset);
+	tv.tv_sec = TIMEOUT;
+	tv.tv_usec = 0;
+	rv = select(listenfd + 1, &rset, (fd_set *)NULL, (fd_set *)NULL, &tv);
+	if (rv < 0) {
+		logmsg("socket_accept: select");
+		return (-1);
+	}
+	if (rv == 0) {
+		logmsgx("socket_accept: select timeout");
 		return (-1);
 	}
 
-	(void)alarm(TIMEOUT);
+	fd = accept(listenfd, (struct sockaddr *)NULL, (socklen_t *)NULL);
+	if (fd < 0) {
+		logmsg("socket_accept: accept");
+		return (-1);
+	}
 
-	nread = read(fd, &buf, 1);
+	val = fcntl(fd, F_GETFL, 0);
+	if (val < 0) {
+		logmsg("socket_accept: fcntl(F_GETFL)");
+		goto failed;
+	}
+	if (fcntl(fd, F_SETFL, val & ~O_NONBLOCK) < 0) {
+		logmsg("socket_accept: fcntl(F_SETFL)");
+		goto failed;
+	}
 
-	(void)alarm(0);
+	return (fd);
 
-	if (nread < 0) {
-		logmsg("sync_recv: read");
-		return (-1);
-	}
+failed:
+	if (close(fd) < 0)
+		logmsg("socket_accept: close");
+	return (-1);
+}
 
-	if (nread != 1) {
-		logmsgx("sync_recv: read: short read: %ld of 1 byte",
-		    (long)nread);
+static int
+check_nxthdr(struct msghdr *msghdr, struct cmsghdr *cmsghdr)
+{
+	if (CMSG_NXTHDR(msghdr, cmsghdr) != NULL) {
+		logmsgx("ancillary data has extra object");
 		return (-1);
 	}
-
 	return (0);
 }
 
-/*
- * Send synchronization message (1 byte) with timeout.
- */
 static int
-sync_send(int fd)
+check_msghdr(const struct msghdr *msghdr, size_t size)
 {
-	ssize_t nsent;
-
-	dbgmsg(("sending sync message"));
+	dbgmsg("recv: msghdr.msg_controllen %u",
+	    (u_int)msghdr->msg_controllen);
 
-	if (sigsetjmp(env_alrm, 1) != 0) {
-		logmsgx("sync_send: cannot send sync message (timeout)");
+	if (msghdr->msg_flags & MSG_TRUNC) {
+		logmsgx("msghdr.msg_flags has MSG_TRUNC");
 		return (-1);
 	}
-
-	(void)alarm(TIMEOUT);
-
-	nsent = write(fd, "", 1);
-
-	(void)alarm(0);
-
-	if (nsent < 0) {
-		logmsg("sync_send: write");
+	if (msghdr->msg_flags & MSG_CTRUNC) {
+		logmsgx("msghdr.msg_flags has MSG_CTRUNC");
 		return (-1);
 	}
-
-	if (nsent != 1) {
-		logmsgx("sync_send: write: short write: %ld of 1 byte",
-		    (long)nsent);
+	if (msghdr->msg_controllen < size) {
+		logmsgx("msghdr.msg_controllen %u < %zu",
+		    (u_int)msghdr->msg_controllen, size);
+		return (-1);
+	}
+	if (msghdr->msg_controllen > 0 && size == 0) {
+		logmsgx("msghdr.msg_controllen %u > 0",
+		    (u_int)msghdr->msg_controllen);
 		return (-1);
 	}
-
 	return (0);
 }
 
-/*
- * waitpid() for client with timeout.
- */
 static int
-wait_client(void)
+check_cmsghdr(const struct cmsghdr *cmsghdr, int type, size_t size)
 {
-	int status;
-	pid_t pid;
-
-	if (sigsetjmp(env_alrm, 1) != 0) {
-		logmsgx("wait_client: cannot get exit status of client PID %ld (timeout)",
-		    (long)client_pid);
+	if (cmsghdr == NULL) {
+		logmsgx("cmsghdr is NULL");
 		return (-1);
 	}
 
-	(void)alarm(TIMEOUT);
-
-	pid = waitpid(client_pid, &status, 0);
-
-	(void)alarm(0);
+	dbgmsg("recv: cmsghdr.cmsg_len %u", (u_int)cmsghdr->cmsg_len);
 
-	if (pid == (pid_t)-1) {
-		logmsg("wait_client: waitpid");
+	if (cmsghdr->cmsg_level != SOL_SOCKET) {
+		logmsgx("cmsghdr.cmsg_level %d != SOL_SOCKET",
+		    cmsghdr->cmsg_level);
 		return (-1);
 	}
-
-	if (WIFEXITED(status)) {
-		if (WEXITSTATUS(status) != 0) {
-			logmsgx("wait_client: exit status of client PID %ld is %d",
-			    (long)client_pid, WEXITSTATUS(status));
-			return (-1);
-		}
-	} else {
-		if (WIFSIGNALED(status))
-			logmsgx("wait_client: abnormal termination of client PID %ld, signal %d%s",
-			    (long)client_pid, WTERMSIG(status), WCOREDUMP(status) ? " (core file generated)" : "");
-		else
-			logmsgx("wait_client: termination of client PID %ld, unknown status",
-			    (long)client_pid);
+	if (cmsghdr->cmsg_type != type) {
+		logmsgx("cmsghdr.cmsg_type %d != %d",
+		    cmsghdr->cmsg_type, type);
+		return (-1);
+	}
+	if (cmsghdr->cmsg_len != CMSG_LEN(size)) {
+		logmsgx("cmsghdr.cmsg_len %u != %zu",
+		    (u_int)cmsghdr->cmsg_len, CMSG_LEN(size));
 		return (-1);
 	}
-
 	return (0);
 }
 
-/*
- * Check if n supplementary GIDs in gids are correct. (my_gids + 1)
- * has (my_ngids - 1) supplementary GIDs of current process.
- */
 static int
-check_groups(const gid_t *gids, int n)
+check_groups(const char *gid_arr_str, const gid_t *gid_arr,
+    const char *gid_num_str, int gid_num, bool all_gids)
 {
-	char match[NGROUPS_MAX] = { 0 };
-	int error, i, j;
+	int i;
 
-	if (n != my_ngids - 1) {
-		logmsgx("wrong number of groups %d != %d (returned from getgroups() - 1)",
-		    n, my_ngids - 1);
-		error = -1;
-	} else
-		error = 0;
-	for (i = 0; i < n; ++i) {
-		for (j = 1; j < my_ngids; ++j) {
-			if (gids[i] == my_gids[j]) {
-				if (match[j]) {
-					logmsgx("duplicated GID %lu",
-					    (u_long)gids[i]);
-					error = -1;
-				} else
-					match[j] = 1;
-				break;
-			}
+	for (i = 0; i < gid_num; ++i)
+		dbgmsg("%s[%d] %lu", gid_arr_str, i, (u_long)gid_arr[i]);
+
+	if (all_gids) {
+		if (gid_num != proc_cred.gid_num) {
+			logmsgx("%s %d != %d", gid_num_str, gid_num,
+			    proc_cred.gid_num);
+			return (-1);
 		}
-		if (j == my_ngids) {
-			logmsgx("unexpected GID %lu", (u_long)gids[i]);
-			error = -1;
+	} else {
+		if (gid_num > proc_cred.gid_num) {
+			logmsgx("%s %d > %d", gid_num_str, gid_num,
+			    proc_cred.gid_num);
+			return (-1);
 		}
 	}
-	for (j = 1; j < my_ngids; ++j)
-		if (match[j] == 0) {
-			logmsgx("did not receive supplementary GID %u", my_gids[j]);
-			error = -1;
-		}
-	return (error);
+	if (memcmp(gid_arr, proc_cred.gid_arr,
+	    gid_num * sizeof(*gid_arr)) != 0) {
+		logmsgx("%s content is wrong", gid_arr_str);
+		for (i = 0; i < gid_num; ++i)
+			if (gid_arr[i] != proc_cred.gid_arr[i]) {
+				logmsgx("%s[%d] %lu != %lu",
+				    gid_arr_str, i, (u_long)gid_arr[i],
+				    (u_long)proc_cred.gid_arr[i]);
+				break;
+			}
+		return (-1);
+	}
+	return (0);
 }
 
-/*
- * Send n messages with data and control message with SCM_CREDS type
- * to server and exit.
- */
-static void
-t_cmsgcred_client(u_int n)
+static int
+check_xucred(const struct xucred *xucred, socklen_t len)
 {
-	union {
-		struct cmsghdr cm;
-		char control[CMSG_SPACE(sizeof(struct cmsgcred))];
-	} control_un;
-	struct msghdr msg;
-	struct iovec iov[1];
-	struct cmsghdr *cmptr;
-	int fd;
-	u_int i;
+	if (len != sizeof(*xucred)) {
+		logmsgx("option value size %zu != %zu",
+		    (size_t)len, sizeof(*xucred));
+		return (-1);
+	}
 
-	assert(n == 1 || n == 2);
+	dbgmsg("xucred.cr_version %u", xucred->cr_version);
+	dbgmsg("xucred.cr_uid %lu", (u_long)xucred->cr_uid);
+	dbgmsg("xucred.cr_ngroups %d", xucred->cr_ngroups);
+
+	if (xucred->cr_version != XUCRED_VERSION) {
+		logmsgx("xucred.cr_version %u != %d",
+		    xucred->cr_version, XUCRED_VERSION);
+		return (-1);
+	}
+	if (xucred->cr_uid != proc_cred.euid) {
+		logmsgx("xucred.cr_uid %lu != %lu (EUID)",
+		    (u_long)xucred->cr_uid, (u_long)proc_cred.euid);
+		return (-1);
+	}
+	if (xucred->cr_ngroups == 0) {
+		logmsgx("xucred.cr_ngroups == 0");
+		return (-1);
+	}
+	if (xucred->cr_ngroups < 0) {
+		logmsgx("xucred.cr_ngroups < 0");
+		return (-1);
+	}
+	if (xucred->cr_ngroups > XU_NGROUPS) {
+		logmsgx("xucred.cr_ngroups %hu > %u (max)",
+		    xucred->cr_ngroups, XU_NGROUPS);
+		return (-1);
+	}
+	if (xucred->cr_groups[0] != proc_cred.egid) {
+		logmsgx("xucred.cr_groups[0] %lu != %lu (EGID)",
+		    (u_long)xucred->cr_groups[0], (u_long)proc_cred.egid);
+		return (-1);
+	}
+	if (check_groups("xucred.cr_groups", xucred->cr_groups,
+	    "xucred.cr_ngroups", xucred->cr_ngroups, false) < 0)
+		return (-1);
+	return (0);
+}
 
-	if ((fd = create_unbound_socket()) < 0)
-		goto failed;
+static int
+check_scm_creds_cmsgcred(struct cmsghdr *cmsghdr)
+{
+	const struct cmsgcred *cmsgcred;
 
-	if (connect_server(fd) < 0)
-		goto failed_close;
+	if (check_cmsghdr(cmsghdr, SCM_CREDS, sizeof(*cmsgcred)) < 0)
+		return (-1);
 
-	iov[0].iov_base = ipc_message;
-	iov[0].iov_len = IPC_MESSAGE_SIZE;
+	cmsgcred = (struct cmsgcred *)CMSG_DATA(cmsghdr);
 
-	msg.msg_name = NULL;
-	msg.msg_namelen = 0;
-	msg.msg_iov = iov;
-	msg.msg_iovlen = 1;
-	msg.msg_control = control_un.control;
-	msg.msg_controllen = no_control_data ?
-	    sizeof(struct cmsghdr) : sizeof(control_un.control);
-	msg.msg_flags = 0;
-
-	cmptr = CMSG_FIRSTHDR(&msg);
-	cmptr->cmsg_len = CMSG_LEN(no_control_data ?
-	    0 : sizeof(struct cmsgcred));
-	cmptr->cmsg_level = SOL_SOCKET;
-	cmptr->cmsg_type = SCM_CREDS;
-
-	for (i = 0; i < n; ++i) {
-		dbgmsg(("#%u msg_controllen = %u, cmsg_len = %u", i,
-		    (u_int)msg.msg_controllen, (u_int)cmptr->cmsg_len));
-		if (sendmsg_timeout(fd, &msg, IPC_MESSAGE_SIZE) < 0)
-			goto failed_close;
+	dbgmsg("cmsgcred.cmcred_pid %ld", (long)cmsgcred->cmcred_pid);
+	dbgmsg("cmsgcred.cmcred_uid %lu", (u_long)cmsgcred->cmcred_uid);
+	dbgmsg("cmsgcred.cmcred_euid %lu", (u_long)cmsgcred->cmcred_euid);
+	dbgmsg("cmsgcred.cmcred_gid %lu", (u_long)cmsgcred->cmcred_gid);
+	dbgmsg("cmsgcred.cmcred_ngroups %d", cmsgcred->cmcred_ngroups);
+
+	if (cmsgcred->cmcred_pid != client_pid) {
+		logmsgx("cmsgcred.cmcred_pid %ld != %ld",
+		    (long)cmsgcred->cmcred_pid, (long)client_pid);
+		return (-1);
+	}
+	if (cmsgcred->cmcred_uid != proc_cred.uid) {
+		logmsgx("cmsgcred.cmcred_uid %lu != %lu",
+		    (u_long)cmsgcred->cmcred_uid, (u_long)proc_cred.uid);
+		return (-1);
+	}
+	if (cmsgcred->cmcred_euid != proc_cred.euid) {
+		logmsgx("cmsgcred.cmcred_euid %lu != %lu",
+		    (u_long)cmsgcred->cmcred_euid, (u_long)proc_cred.euid);
+		return (-1);
+	}
+	if (cmsgcred->cmcred_gid != proc_cred.gid) {
+		logmsgx("cmsgcred.cmcred_gid %lu != %lu",
+		    (u_long)cmsgcred->cmcred_gid, (u_long)proc_cred.gid);
+		return (-1);
+	}
+	if (cmsgcred->cmcred_ngroups == 0) {
+		logmsgx("cmsgcred.cmcred_ngroups == 0");
+		return (-1);
 	}
+	if (cmsgcred->cmcred_ngroups < 0) {
+		logmsgx("cmsgcred.cmcred_ngroups %d < 0",
+		    cmsgcred->cmcred_ngroups);
+		return (-1);
+	}
+	if (cmsgcred->cmcred_ngroups > CMGROUP_MAX) {
+		logmsgx("cmsgcred.cmcred_ngroups %d > %d",
+		    cmsgcred->cmcred_ngroups, CMGROUP_MAX);
+		return (-1);
+	}
+	if (cmsgcred->cmcred_groups[0] != proc_cred.egid) {
+		logmsgx("cmsgcred.cmcred_groups[0] %lu != %lu (EGID)",
+		    (u_long)cmsgcred->cmcred_groups[0], (u_long)proc_cred.egid);
+		return (-1);
+	}
+	if (check_groups("cmsgcred.cmcred_groups", cmsgcred->cmcred_groups,
+	    "cmsgcred.cmcred_ngroups", cmsgcred->cmcred_ngroups, false) < 0)
+		return (-1);
+	return (0);
+}
 
-	if (close_socket((const char *)NULL, fd) < 0)
-		goto failed;
+static int
+check_scm_creds_sockcred(struct cmsghdr *cmsghdr)
+{
+	const struct sockcred *sockcred;
 
-	_exit(0);
+	if (check_cmsghdr(cmsghdr, SCM_CREDS,
+	    SOCKCREDSIZE(proc_cred.gid_num)) < 0)
+		return (-1);
 
-failed_close:
-	(void)close_socket((const char *)NULL, fd);
+	sockcred = (struct sockcred *)CMSG_DATA(cmsghdr);
 
-failed:
-	_exit(1);
+	dbgmsg("sockcred.sc_uid %lu", (u_long)sockcred->sc_uid);
+	dbgmsg("sockcred.sc_euid %lu", (u_long)sockcred->sc_euid);
+	dbgmsg("sockcred.sc_gid %lu", (u_long)sockcred->sc_gid);
+	dbgmsg("sockcred.sc_egid %lu", (u_long)sockcred->sc_egid);
+	dbgmsg("sockcred.sc_ngroups %d", sockcred->sc_ngroups);
+
+	if (sockcred->sc_uid != proc_cred.uid) {
+		logmsgx("sockcred.sc_uid %lu != %lu",
+		    (u_long)sockcred->sc_uid, (u_long)proc_cred.uid);
+		return (-1);
+	}
+	if (sockcred->sc_euid != proc_cred.euid) {
+		logmsgx("sockcred.sc_euid %lu != %lu",
+		    (u_long)sockcred->sc_euid, (u_long)proc_cred.euid);
+		return (-1);
+	}
+	if (sockcred->sc_gid != proc_cred.gid) {
+		logmsgx("sockcred.sc_gid %lu != %lu",
+		    (u_long)sockcred->sc_gid, (u_long)proc_cred.gid);
+		return (-1);
+	}
+	if (sockcred->sc_egid != proc_cred.egid) {
+		logmsgx("sockcred.sc_egid %lu != %lu",
+		    (u_long)sockcred->sc_egid, (u_long)proc_cred.egid);
+		return (-1);
+	}
+	if (sockcred->sc_ngroups == 0) {
+		logmsgx("sockcred.sc_ngroups == 0");
+		return (-1);
+	}
+	if (sockcred->sc_ngroups < 0) {
+		logmsgx("sockcred.sc_ngroups %d < 0",
+		    sockcred->sc_ngroups);
+		return (-1);
+	}
+	if (sockcred->sc_ngroups != proc_cred.gid_num) {
+		logmsgx("sockcred.sc_ngroups %d != %u",
+		    sockcred->sc_ngroups, proc_cred.gid_num);
+		return (-1);
+	}
+	if (check_groups("sockcred.sc_groups", sockcred->sc_groups,
+	    "sockcred.sc_ngroups", sockcred->sc_ngroups, true) < 0)
+		return (-1);
+	return (0);
 }
 
-/*
- * Receive two messages with data and control message with SCM_CREDS
- * type followed by struct cmsgcred{} from client. fd1 is a listen
- * socket for stream sockets or simply socket for datagram sockets.
- */
 static int
-t_cmsgcred_server(int fd1)
+check_scm_timestamp(struct cmsghdr *cmsghdr)
 {
-	char buf[IPC_MESSAGE_SIZE];
-	union {
-		struct cmsghdr cm;
-		char control[CMSG_SPACE(sizeof(struct cmsgcred)) + EXTRA_CMSG_SPACE];
-	} control_un;
-	struct msghdr msg;
-	struct iovec iov[1];
-	struct cmsghdr *cmptr;
-	const struct cmsgcred *cmcredptr;
-	socklen_t controllen;
-	int error, error2, fd2;
-	u_int i;
+	const struct timeval *timeval;
 
-	if (sock_type == SOCK_STREAM) {
-		if ((fd2 = accept_timeout(fd1)) < 0)
-			return (-2);
-	} else
-		fd2 = fd1;
+	if (check_cmsghdr(cmsghdr, SCM_TIMESTAMP, sizeof(struct timeval)) < 0)
+		return (-1);
 
-	error = 0;
+	timeval = (struct timeval *)CMSG_DATA(cmsghdr);
 
-	controllen = sizeof(control_un.control);
+	dbgmsg("timeval.tv_sec %"PRIdMAX", timeval.tv_usec %"PRIdMAX,
+	    (intmax_t)timeval->tv_sec, (intmax_t)timeval->tv_usec);
 
-	for (i = 0; i < 2; ++i) {
-		iov[0].iov_base = buf;
-		iov[0].iov_len = sizeof(buf);
+	return (0);
+}
 
-		msg.msg_name = NULL;
-		msg.msg_namelen = 0;
-		msg.msg_iov = iov;
-		msg.msg_iovlen = 1;
-		msg.msg_control = control_un.control;
-		msg.msg_controllen = controllen;
-		msg.msg_flags = 0;
+static int
+check_scm_bintime(struct cmsghdr *cmsghdr)
+{
+	const struct bintime *bintime;
 
-		controllen = CMSG_SPACE(sizeof(struct cmsgcred));
+	if (check_cmsghdr(cmsghdr, SCM_BINTIME, sizeof(struct bintime)) < 0)
+		return (-1);
 
-		if (recvmsg_timeout(fd2, &msg, sizeof(buf)) < 0)
-			goto failed;
+	bintime = (struct bintime *)CMSG_DATA(cmsghdr);
 
-		if (msg.msg_flags & MSG_CTRUNC) {
-			logmsgx("#%u control data was truncated, MSG_CTRUNC flag is on",
-			    i);
-			goto next_error;
-		}
+	dbgmsg("bintime.sec %"PRIdMAX", bintime.frac %"PRIu64,
+	    (intmax_t)bintime->sec, bintime->frac);
 
-		if (msg.msg_controllen < sizeof(struct cmsghdr)) {
-			logmsgx("#%u msg_controllen %u < %lu (sizeof(struct cmsghdr))",
-			    i, (u_int)msg.msg_controllen, (u_long)sizeof(struct cmsghdr));
-			goto next_error;
-		}
+	return (0);
+}
 
-		if ((cmptr = CMSG_FIRSTHDR(&msg)) == NULL) {
-			logmsgx("CMSG_FIRSTHDR is NULL");
-			goto next_error;
-		}
+static void
+msghdr_init_generic(struct msghdr *msghdr, struct iovec *iov, void *cmsg_data)
+{
+	msghdr->msg_name = NULL;
+	msghdr->msg_namelen = 0;
+	if (send_data_flag) {
+		iov->iov_base = server_flag ?
+		    ipc_msg.buf_recv : ipc_msg.buf_orig;
+		iov->iov_len = ipc_msg.buf_size;
+		msghdr->msg_iov = iov;
+		msghdr->msg_iovlen = 1;
+	} else {
+		msghdr->msg_iov = NULL;
+		msghdr->msg_iovlen = 0;
+	}
+	msghdr->msg_control = cmsg_data;
+	msghdr->msg_flags = 0;
+}
 
-		dbgmsg(("#%u msg_controllen = %u, cmsg_len = %u", i,
-		    (u_int)msg.msg_controllen, (u_int)cmptr->cmsg_len));
+static void
+msghdr_init_server(struct msghdr *msghdr, struct iovec *iov,
+    void *cmsg_data, size_t cmsg_size)
+{
+	msghdr_init_generic(msghdr, iov, cmsg_data);
+	msghdr->msg_controllen = cmsg_size;
+	dbgmsg("init: msghdr.msg_controllen %u",
+	    (u_int)msghdr->msg_controllen);
+	dbgmsg("init: data size %zu", msghdr->msg_iov != NULL ?
+	    msghdr->msg_iov->iov_len : (size_t)0);
+}
 
-		if (cmptr->cmsg_level != SOL_SOCKET) {
-			logmsgx("#%u cmsg_level %d != SOL_SOCKET", i,
-			    cmptr->cmsg_level);
-			goto next_error;
-		}
+static void
+msghdr_init_client(struct msghdr *msghdr, struct iovec *iov,
+    void *cmsg_data, size_t cmsg_size, int type, size_t arr_size)
+{
+	struct cmsghdr *cmsghdr;
 
-		if (cmptr->cmsg_type != SCM_CREDS) {
-			logmsgx("#%u cmsg_type %d != SCM_CREDS", i,
-			    cmptr->cmsg_type);
-			goto next_error;
-		}
+	msghdr_init_generic(msghdr, iov, cmsg_data);
+	if (cmsg_data != NULL) {
+		msghdr->msg_controllen = send_array_flag ?
+		    cmsg_size : CMSG_SPACE(0);
+		cmsghdr = CMSG_FIRSTHDR(msghdr);
+		cmsghdr->cmsg_level = SOL_SOCKET;
+		cmsghdr->cmsg_type = type;
+		cmsghdr->cmsg_len = CMSG_LEN(send_array_flag ? arr_size : 0);
+	} else
+		msghdr->msg_controllen = 0;
+}
 
-		if (cmptr->cmsg_len != CMSG_LEN(sizeof(struct cmsgcred))) {
-			logmsgx("#%u cmsg_len %u != %lu (CMSG_LEN(sizeof(struct cmsgcred))",
-			    i, (u_int)cmptr->cmsg_len, (u_long)CMSG_LEN(sizeof(struct cmsgcred)));
-			goto next_error;
-		}
+static int
+t_generic(int (*client_func)(int), int (*server_func)(int))
+{
+	int fd, rv, rv_client;
 
-		cmcredptr = (const struct cmsgcred *)CMSG_DATA(cmptr);
+	switch (client_fork()) {
+	case 0:
+		fd = socket_create();
+		if (fd < 0)
+			rv = -2;
+		else {
+			rv = client_func(fd);
+			if (socket_close(fd) < 0)
+				rv = -2;
+		}
+		client_exit(rv);
+		break;
+	case 1:
+		fd = socket_create();
+		if (fd < 0)
+			rv = -2;
+		else {
+			rv = server_func(fd);
+			rv_client = client_wait();
+			if (rv == 0 || (rv == -2 && rv_client != 0))
+				rv = rv_client;
+			if (socket_close(fd) < 0)
+				rv = -2;
+		}
+		break;
+	default:
+		rv = -2;
+	}
+	return (rv);
+}
 
-		error2 = 0;
-		if (cmcredptr->cmcred_pid != client_pid) {
-			logmsgx("#%u cmcred_pid %ld != %ld (PID of client)",
-			    i, (long)cmcredptr->cmcred_pid, (long)client_pid);
-			error2 = 1;
-		}
-		if (cmcredptr->cmcred_uid != my_uid) {
-			logmsgx("#%u cmcred_uid %lu != %lu (UID of current process)",
-			    i, (u_long)cmcredptr->cmcred_uid, (u_long)my_uid);
-			error2 = 1;
-		}
-		if (cmcredptr->cmcred_euid != my_euid) {
-			logmsgx("#%u cmcred_euid %lu != %lu (EUID of current process)",
-			    i, (u_long)cmcredptr->cmcred_euid, (u_long)my_euid);
-			error2 = 1;
-		}
-		if (cmcredptr->cmcred_gid != my_gid) {
-			logmsgx("#%u cmcred_gid %lu != %lu (GID of current process)",
-			    i, (u_long)cmcredptr->cmcred_gid, (u_long)my_gid);
-			error2 = 1;
-		}
-		if (cmcredptr->cmcred_ngroups == 0) {
-			logmsgx("#%u cmcred_ngroups = 0, this is wrong", i);
-			error2 = 1;
-		} else {
-			if (cmcredptr->cmcred_ngroups > NGROUPS_MAX) {
-				logmsgx("#%u cmcred_ngroups %d > %u (NGROUPS_MAX)",
-				    i, cmcredptr->cmcred_ngroups, NGROUPS_MAX);
-				error2 = 1;
-			} else if (cmcredptr->cmcred_ngroups < 0) {
-				logmsgx("#%u cmcred_ngroups %d < 0",
-				    i, cmcredptr->cmcred_ngroups);
-				error2 = 1;
-			} else {
-				dbgmsg(("#%u cmcred_ngroups = %d", i,
-				    cmcredptr->cmcred_ngroups));
-				if (cmcredptr->cmcred_groups[0] != my_egid) {
-					logmsgx("#%u cmcred_groups[0] %lu != %lu (EGID of current process)",
-					    i, (u_long)cmcredptr->cmcred_groups[0], (u_long)my_egid);
-					error2 = 1;
-				}
-				if (check_groups(cmcredptr->cmcred_groups + 1, cmcredptr->cmcred_ngroups - 1) < 0) {
-					logmsgx("#%u cmcred_groups has wrong GIDs", i);
-					error2 = 1;
-				}
-			}
-		}
+static int
+t_cmsgcred_client(int fd)
+{
+	struct msghdr msghdr;
+	struct iovec iov[1];
+	void *cmsg_data;
+	size_t cmsg_size;
+	int rv;
 
-		if (error2)
-			goto next_error;
+	if (sync_recv() < 0)
+		return (-2);
 
-		if ((cmptr = CMSG_NXTHDR(&msg, cmptr)) != NULL) {
-			logmsgx("#%u control data has extra header", i);
-			goto next_error;
-		}
+	rv = -2;
 
-		continue;
-next_error:
-		error = -1;
+	cmsg_size = CMSG_SPACE(sizeof(struct cmsgcred));
+	cmsg_data = malloc(cmsg_size);
+	if (cmsg_data == NULL) {
+		logmsg("malloc");
+		goto done;
 	}
+	msghdr_init_client(&msghdr, iov, cmsg_data, cmsg_size,
+	    SCM_CREDS, sizeof(struct cmsgcred));
 
-	if (sock_type == SOCK_STREAM)
-		if (close(fd2) < 0) {
-			logmsg("close");
-			return (-2);
-		}
-	return (error);
+	if (socket_connect(fd) < 0)
+		goto done;
 
-failed:
-	if (sock_type == SOCK_STREAM)
-		if (close(fd2) < 0)
-			logmsg("close");
-	return (-2);
+	if (message_sendn(fd, &msghdr) < 0)
+		goto done;
+
+	rv = 0;
+done:
+	free(cmsg_data);
+	return (rv);
 }
 
 static int
-t_cmsgcred(void)
+t_cmsgcred_server(int fd1)
 {
-	int error, fd;
+	struct msghdr msghdr;
+	struct iovec iov[1];
+	struct cmsghdr *cmsghdr;
+	void *cmsg_data;
+	size_t cmsg_size;
+	u_int i;
+	int fd2, rv;
 
-	if ((fd = create_server_socket()) < 0)
+	if (sync_send() < 0)
 		return (-2);
 
-	if (sock_type == SOCK_STREAM)
-		if (listen(fd,
LISTENQ) < 0) { - logmsg("listen"); - goto failed; - } + fd2 = -1; + rv = -2; - if ((client_pid = fork()) == (pid_t)-1) { - logmsg("fork"); - goto failed; + cmsg_size = CMSG_SPACE(sizeof(struct cmsgcred)) + CMSG_SPACE_EXTRA; + cmsg_data = malloc(cmsg_size); + if (cmsg_data == NULL) { + logmsg("malloc"); + goto done; } - if (client_pid == 0) { - myname = "CLIENT"; - if (close_socket((const char *)NULL, fd) < 0) - _exit(1); - t_cmsgcred_client(2); - } + if (sock_type == SOCK_STREAM) { + fd2 = socket_accept(fd1); + if (fd2 < 0) + goto done; + } else + fd2 = fd1; - if ((error = t_cmsgcred_server(fd)) == -2) { - (void)wait_client(); - goto failed; - } + rv = -1; + for (i = 1; i <= ipc_msg.msg_num; ++i) { + dbgmsg("message #%u", i); + + msghdr_init_server(&msghdr, iov, cmsg_data, cmsg_size); + if (message_recv(fd2, &msghdr) < 0) { + rv = -2; + break; + } - if (wait_client() < 0) - goto failed; + if (check_msghdr(&msghdr, sizeof(*cmsghdr)) < 0) + break; - if (close_socket(serv_sock_path, fd) < 0) { - logmsgx("close_socket failed"); - return (-2); + cmsghdr = CMSG_FIRSTHDR(&msghdr); + if (check_scm_creds_cmsgcred(cmsghdr) < 0) + break; + + if (check_nxthdr(&msghdr, cmsghdr) < 0) + break; } - return (error); + if (i > ipc_msg.msg_num) + rv = 0; +done: + free(cmsg_data); + if (sock_type == SOCK_STREAM && fd2 >= 0) + if (socket_close(fd2) < 0) + rv = -2; + return (rv); +} -failed: - if (close_socket(serv_sock_path, fd) < 0) - logmsgx("close_socket failed"); - return (-2); +static int +t_cmsgcred(void) +{ + return (t_generic(t_cmsgcred_client, t_cmsgcred_server)); } -/* - * Send two messages with data to server and exit. 
- */ -static void -t_sockcred_client(int type) +static int +t_sockcred_client(int type, int fd) { - struct msghdr msg; + struct msghdr msghdr; struct iovec iov[1]; - int fd; - u_int i; - - assert(type == 0 || type == 1); + int rv; - if ((fd = create_unbound_socket()) < 0) - goto failed; + if (sync_recv() < 0) + return (-2); - if (connect_server(fd) < 0) - goto failed_close; + rv = -2; - if (type == 1) - if (sync_recv(fd) < 0) - goto failed_close; - - iov[0].iov_base = ipc_message; - iov[0].iov_len = IPC_MESSAGE_SIZE; - - msg.msg_name = NULL; - msg.msg_namelen = 0; - msg.msg_iov = iov; - msg.msg_iovlen = 1; - msg.msg_control = NULL; - msg.msg_controllen = 0; - msg.msg_flags = 0; - - for (i = 0; i < 2; ++i) - if (sendmsg_timeout(fd, &msg, IPC_MESSAGE_SIZE) < 0) - goto failed_close; + msghdr_init_client(&msghdr, iov, NULL, 0, 0, 0); - if (close_socket((const char *)NULL, fd) < 0) - goto failed; + if (socket_connect(fd) < 0) + goto done; - _exit(0); + if (type == 2) + if (sync_recv() < 0) + goto done; -failed_close: - (void)close_socket((const char *)NULL, fd); + if (message_sendn(fd, &msghdr) < 0) + goto done; -failed: - _exit(1); + rv = 0; +done: + return (rv); } -/* - * Receive one message with data and control message with SCM_CREDS - * type followed by struct sockcred{} and if n is not equal 1, then - * receive another one message with data. fd1 is a listen socket for - * stream sockets or simply socket for datagram sockets. If type is - * 1, then set LOCAL_CREDS option for accepted stream socket. 
- */ static int -t_sockcred_server(int type, int fd1, u_int n) +t_sockcred_server(int type, int fd1) { - char buf[IPC_MESSAGE_SIZE]; - union { - struct cmsghdr cm; - char control[CMSG_SPACE(SOCKCREDSIZE(NGROUPS_MAX)) + EXTRA_CMSG_SPACE]; - } control_un; - struct msghdr msg; + struct msghdr msghdr; struct iovec iov[1]; - struct cmsghdr *cmptr; - const struct sockcred *sockcred; - int error, error2, fd2, optval; + struct cmsghdr *cmsghdr; + void *cmsg_data; + size_t cmsg_size; u_int i; + int fd2, rv, val; - assert(n == 1 || n == 2); - assert(type == 0 || type == 1); + fd2 = -1; + rv = -2; - if (sock_type == SOCK_STREAM) { - if ((fd2 = accept_timeout(fd1)) < 0) - return (-2); - if (type == 1) { - optval = 1; - if (setsockopt(fd2, 0, LOCAL_CREDS, &optval, sizeof optval) < 0) { - logmsg("setsockopt(LOCAL_CREDS) for accepted socket"); - if (errno == ENOPROTOOPT) { - error = -1; - goto done_close; - } - goto failed; - } - if (sync_send(fd2) < 0) - goto failed; - } - } else - fd2 = fd1; - - error = 0; - - for (i = 0; i < n; ++i) { - iov[0].iov_base = buf; - iov[0].iov_len = sizeof buf; - - msg.msg_name = NULL; - msg.msg_namelen = 0; - msg.msg_iov = iov; - msg.msg_iovlen = 1; - msg.msg_control = control_un.control; - msg.msg_controllen = sizeof control_un.control; - msg.msg_flags = 0; - - if (recvmsg_timeout(fd2, &msg, sizeof buf) < 0) - goto failed; + cmsg_size = CMSG_SPACE(SOCKCREDSIZE(proc_cred.gid_num)) + + CMSG_SPACE_EXTRA; + cmsg_data = malloc(cmsg_size); + if (cmsg_data == NULL) { + logmsg("malloc"); + goto done; + } - if (msg.msg_flags & MSG_CTRUNC) { - logmsgx("control data was truncated, MSG_CTRUNC flag is on"); - goto next_error; + if (type == 1) { + dbgmsg("setting LOCAL_CREDS"); + val = 1; + if (setsockopt(fd1, 0, LOCAL_CREDS, &val, sizeof(val)) < 0) { + logmsg("setsockopt(LOCAL_CREDS)"); + goto done; } + } - if (i != 0 && sock_type == SOCK_STREAM) { - if (msg.msg_controllen != 0) { - logmsgx("second message has control data, this is wrong for stream sockets"); 
- goto next_error; - } - dbgmsg(("#%u msg_controllen = %u", i, - (u_int)msg.msg_controllen)); - continue; - } + if (sync_send() < 0) + goto done; - if (msg.msg_controllen < sizeof(struct cmsghdr)) { - logmsgx("#%u msg_controllen %u < %lu (sizeof(struct cmsghdr))", - i, (u_int)msg.msg_controllen, (u_long)sizeof(struct cmsghdr)); - goto next_error; - } + if (sock_type == SOCK_STREAM) { + fd2 = socket_accept(fd1); + if (fd2 < 0) + goto done; + } else + fd2 = fd1; - if ((cmptr = CMSG_FIRSTHDR(&msg)) == NULL) { - logmsgx("CMSG_FIRSTHDR is NULL"); - goto next_error; + if (type == 2) { + dbgmsg("setting LOCAL_CREDS"); + val = 1; + if (setsockopt(fd2, 0, LOCAL_CREDS, &val, sizeof(val)) < 0) { + logmsg("setsockopt(LOCAL_CREDS)"); + goto done; + } + if (sync_send() < 0) + goto done; + } + + rv = -1; + for (i = 1; i <= ipc_msg.msg_num; ++i) { + dbgmsg("message #%u", i); + + msghdr_init_server(&msghdr, iov, cmsg_data, cmsg_size); + if (message_recv(fd2, &msghdr) < 0) { + rv = -2; + break; } - dbgmsg(("#%u msg_controllen = %u, cmsg_len = %u", i, - (u_int)msg.msg_controllen, (u_int)cmptr->cmsg_len)); - - if (cmptr->cmsg_level != SOL_SOCKET) { - logmsgx("#%u cmsg_level %d != SOL_SOCKET", i, - cmptr->cmsg_level); - goto next_error; - } + if (i > 1 && sock_type == SOCK_STREAM) { + if (check_msghdr(&msghdr, 0) < 0) + break; + } else { + if (check_msghdr(&msghdr, sizeof(*cmsghdr)) < 0) + break; - if (cmptr->cmsg_type != SCM_CREDS) { - logmsgx("#%u cmsg_type %d != SCM_CREDS", i, - cmptr->cmsg_type); - goto next_error; - } + cmsghdr = CMSG_FIRSTHDR(&msghdr); + if (check_scm_creds_sockcred(cmsghdr) < 0) + break; - if (cmptr->cmsg_len < CMSG_LEN(SOCKCREDSIZE(1))) { - logmsgx("#%u cmsg_len %u != %lu (CMSG_LEN(SOCKCREDSIZE(1)))", - i, (u_int)cmptr->cmsg_len, (u_long)CMSG_LEN(SOCKCREDSIZE(1))); - goto next_error; + if (check_nxthdr(&msghdr, cmsghdr) < 0) + break; } + } + if (i > ipc_msg.msg_num) + rv = 0; +done: + free(cmsg_data); + if (sock_type == SOCK_STREAM && fd2 >= 0) + if 
(socket_close(fd2) < 0) + rv = -2; + return (rv); +} - sockcred = (const struct sockcred *)CMSG_DATA(cmptr); +static int +t_sockcred_1(void) +{ + u_int i; + int fd, rv, rv_client; - error2 = 0; - if (sockcred->sc_uid != my_uid) { - logmsgx("#%u sc_uid %lu != %lu (UID of current process)", - i, (u_long)sockcred->sc_uid, (u_long)my_uid); - error2 = 1; - } - if (sockcred->sc_euid != my_euid) { - logmsgx("#%u sc_euid %lu != %lu (EUID of current process)", - i, (u_long)sockcred->sc_euid, (u_long)my_euid); - error2 = 1; - } - if (sockcred->sc_gid != my_gid) { - logmsgx("#%u sc_gid %lu != %lu (GID of current process)", - i, (u_long)sockcred->sc_gid, (u_long)my_gid); - error2 = 1; - } - if (sockcred->sc_egid != my_egid) { - logmsgx("#%u sc_egid %lu != %lu (EGID of current process)", - i, (u_long)sockcred->sc_gid, (u_long)my_egid); - error2 = 1; - } - if (sockcred->sc_ngroups > NGROUPS_MAX) { - logmsgx("#%u sc_ngroups %d > %u (NGROUPS_MAX)", - i, sockcred->sc_ngroups, NGROUPS_MAX); - error2 = 1; - } else if (sockcred->sc_ngroups < 0) { - logmsgx("#%u sc_ngroups %d < 0", - i, sockcred->sc_ngroups); - error2 = 1; - } else { - dbgmsg(("#%u sc_ngroups = %d", i, sockcred->sc_ngroups)); - if (check_groups(sockcred->sc_groups, sockcred->sc_ngroups) < 0) { - logmsgx("#%u sc_groups has wrong GIDs", i); - error2 = 1; + switch (client_fork()) { + case 0: + for (i = 1; i <= 2; ++i) { + dbgmsg("client #%u", i); + fd = socket_create(); + if (fd < 0) + rv = -2; + else { + rv = t_sockcred_client(1, fd); + if (socket_close(fd) < 0) + rv = -2; } + if (rv != 0) + break; } - - if (error2) - goto next_error; - - if ((cmptr = CMSG_NXTHDR(&msg, cmptr)) != NULL) { - logmsgx("#%u control data has extra header, this is wrong", - i); - goto next_error; + client_exit(rv); + break; + case 1: + fd = socket_create(); + if (fd < 0) + rv = -2; + else { + rv = t_sockcred_server(1, fd); + if (rv == 0) + rv = t_sockcred_server(3, fd); + rv_client = client_wait(); + if (rv == 0 || (rv == -2 && rv_client != 0)) 
+ rv = rv_client; + if (socket_close(fd) < 0) + rv = -2; } - - continue; -next_error: - error = -1; + break; + default: + rv = -2; } -done_close: - if (sock_type == SOCK_STREAM) - if (close(fd2) < 0) { - logmsg("close"); - return (-2); - } - return (error); + return (rv); +} -failed: - if (sock_type == SOCK_STREAM) - if (close(fd2) < 0) - logmsg("close"); - return (-2); +static int +t_sockcred_2_client(int fd) +{ + return (t_sockcred_client(2, fd)); } static int -t_sockcred(int type) +t_sockcred_2_server(int fd) { - int error, fd, optval; + return (t_sockcred_server(2, fd)); +} - assert(type == 0 || type == 1); +static int +t_sockcred_2(void) +{ + return (t_generic(t_sockcred_2_client, t_sockcred_2_server)); +} - if ((fd = create_server_socket()) < 0) - return (-2); +static int +t_cmsgcred_sockcred_server(int fd1) +{ + struct msghdr msghdr; + struct iovec iov[1]; + struct cmsghdr *cmsghdr; + void *cmsg_data, *cmsg1_data, *cmsg2_data; + size_t cmsg_size, cmsg1_size, cmsg2_size; + u_int i; + int fd2, rv, val; - if (sock_type == SOCK_STREAM) - if (listen(fd, LISTENQ) < 0) { - logmsg("listen"); - goto failed; - } + fd2 = -1; + rv = -2; - if (type == 0) { - optval = 1; - if (setsockopt(fd, 0, LOCAL_CREDS, &optval, sizeof optval) < 0) { - logmsg("setsockopt(LOCAL_CREDS) for %s socket", - sock_type == SOCK_STREAM ? 
"stream listening" : "datagram"); - if (errno == ENOPROTOOPT) { - error = -1; - goto done_close; - } - goto failed; - } + cmsg1_size = CMSG_SPACE(SOCKCREDSIZE(proc_cred.gid_num)) + + CMSG_SPACE_EXTRA; + cmsg2_size = CMSG_SPACE(sizeof(struct cmsgcred)) + CMSG_SPACE_EXTRA; + cmsg1_data = malloc(cmsg1_size); + cmsg2_data = malloc(cmsg2_size); + if (cmsg1_data == NULL || cmsg2_data == NULL) { + logmsg("malloc"); + goto done; } - if ((client_pid = fork()) == (pid_t)-1) { - logmsg("fork"); - goto failed; + dbgmsg("setting LOCAL_CREDS"); + val = 1; + if (setsockopt(fd1, 0, LOCAL_CREDS, &val, sizeof(val)) < 0) { + logmsg("setsockopt(LOCAL_CREDS)"); + goto done; } - if (client_pid == 0) { - myname = "CLIENT"; - if (close_socket((const char *)NULL, fd) < 0) - _exit(1); - t_sockcred_client(type); - } + if (sync_send() < 0) + goto done; - if ((error = t_sockcred_server(type, fd, 2)) == -2) { - (void)wait_client(); - goto failed; - } + if (sock_type == SOCK_STREAM) { + fd2 = socket_accept(fd1); + if (fd2 < 0) + goto done; + } else + fd2 = fd1; - if (wait_client() < 0) - goto failed; + cmsg_data = cmsg1_data; + cmsg_size = cmsg1_size; + rv = -1; + for (i = 1; i <= ipc_msg.msg_num; ++i) { + dbgmsg("message #%u", i); + + msghdr_init_server(&msghdr, iov, cmsg_data, cmsg_size); + if (message_recv(fd2, &msghdr) < 0) { + rv = -2; + break; + } -done_close: - if (close_socket(serv_sock_path, fd) < 0) { - logmsgx("close_socket failed"); - return (-2); - } - return (error); + if (check_msghdr(&msghdr, sizeof(*cmsghdr)) < 0) + break; -failed: - if (close_socket(serv_sock_path, fd) < 0) - logmsgx("close_socket failed"); - return (-2); -} + cmsghdr = CMSG_FIRSTHDR(&msghdr); + if (i == 1 || sock_type == SOCK_DGRAM) { + if (check_scm_creds_sockcred(cmsghdr) < 0) + break; + } else { + if (check_scm_creds_cmsgcred(cmsghdr) < 0) + break; + } -static int -t_sockcred_stream1(void) -{ - return (t_sockcred(0)); + if (check_nxthdr(&msghdr, cmsghdr) < 0) + break; + } + if (i > ipc_msg.msg_num) + rv = 
0; +done: + free(cmsg1_data); + free(cmsg2_data); + if (sock_type == SOCK_STREAM && fd2 >= 0) + if (socket_close(fd2) < 0) + rv = -2; + return (rv); } static int -t_sockcred_stream2(void) +t_cmsgcred_sockcred(void) { - return (t_sockcred(1)); + return (t_generic(t_cmsgcred_client, t_cmsgcred_sockcred_server)); } static int -t_sockcred_dgram(void) +t_timeval_client(int fd) { - return (t_sockcred(0)); + struct msghdr msghdr; + struct iovec iov[1]; + void *cmsg_data; + size_t cmsg_size; + int rv; + + if (sync_recv() < 0) + return (-2); + + rv = -2; + + cmsg_size = CMSG_SPACE(sizeof(struct timeval)); + cmsg_data = malloc(cmsg_size); + if (cmsg_data == NULL) { + logmsg("malloc"); + goto done; + } + msghdr_init_client(&msghdr, iov, cmsg_data, cmsg_size, + SCM_TIMESTAMP, sizeof(struct timeval)); + + if (socket_connect(fd) < 0) + goto done; + + if (message_sendn(fd, &msghdr) < 0) + goto done; + + rv = 0; +done: + free(cmsg_data); + return (rv); } static int -t_cmsgcred_sockcred(void) +t_timeval_server(int fd1) { - int error, fd, optval; + struct msghdr msghdr; + struct iovec iov[1]; + struct cmsghdr *cmsghdr; + void *cmsg_data; + size_t cmsg_size; + u_int i; + int fd2, rv; - if ((fd = create_server_socket()) < 0) + if (sync_send() < 0) return (-2); - if (sock_type == SOCK_STREAM) - if (listen(fd, LISTENQ) < 0) { - logmsg("listen"); - goto failed; - } + fd2 = -1; + rv = -2; - optval = 1; - if (setsockopt(fd, 0, LOCAL_CREDS, &optval, sizeof optval) < 0) { - logmsg("setsockopt(LOCAL_CREDS) for %s socket", - sock_type == SOCK_STREAM ? 
"stream listening" : "datagram"); - if (errno == ENOPROTOOPT) { - error = -1; - goto done_close; - } - goto failed; + cmsg_size = CMSG_SPACE(sizeof(struct timeval)) + CMSG_SPACE_EXTRA; + cmsg_data = malloc(cmsg_size); + if (cmsg_data == NULL) { + logmsg("malloc"); + goto done; } - if ((client_pid = fork()) == (pid_t)-1) { - logmsg("fork"); - goto failed; - } + if (sock_type == SOCK_STREAM) { + fd2 = socket_accept(fd1); + if (fd2 < 0) + goto done; + } else + fd2 = fd1; - if (client_pid == 0) { - myname = "CLIENT"; - if (close_socket((const char *)NULL, fd) < 0) - _exit(1); - t_cmsgcred_client(1); - } + rv = -1; + for (i = 1; i <= ipc_msg.msg_num; ++i) { + dbgmsg("message #%u", i); + + msghdr_init_server(&msghdr, iov, cmsg_data, cmsg_size); + if (message_recv(fd2, &msghdr) < 0) { + rv = -2; + break; + } - if ((error = t_sockcred_server(0, fd, 1)) == -2) { - (void)wait_client(); - goto failed; - } + if (check_msghdr(&msghdr, sizeof(*cmsghdr)) < 0) + break; - if (wait_client() < 0) - goto failed; + cmsghdr = CMSG_FIRSTHDR(&msghdr); + if (check_scm_timestamp(cmsghdr) < 0) + break; -done_close: - if (close_socket(serv_sock_path, fd) < 0) { - logmsgx("close_socket failed"); - return (-2); + if (check_nxthdr(&msghdr, cmsghdr) < 0) + break; } - return (error); + if (i > ipc_msg.msg_num) + rv = 0; +done: + free(cmsg_data); + if (sock_type == SOCK_STREAM && fd2 >= 0) + if (socket_close(fd2) < 0) + rv = -2; + return (rv); +} -failed: - if (close_socket(serv_sock_path, fd) < 0) - logmsgx("close_socket failed"); - return (-2); +static int +t_timeval(void) +{ + return (t_generic(t_timeval_client, t_timeval_server)); } -/* - * Send one message with data and control message with SCM_TIMESTAMP - * type to server and exit. 
- */ -static void -t_timestamp_client(void) +static int +t_bintime_client(int fd) { - union { - struct cmsghdr cm; - char control[CMSG_SPACE(sizeof(struct timeval))]; - } control_un; - struct msghdr msg; + struct msghdr msghdr; struct iovec iov[1]; - struct cmsghdr *cmptr; - int fd; - - if ((fd = create_unbound_socket()) < 0) - goto failed; - - if (connect_server(fd) < 0) - goto failed_close; - - iov[0].iov_base = ipc_message; - iov[0].iov_len = IPC_MESSAGE_SIZE; - - msg.msg_name = NULL; - msg.msg_namelen = 0; - msg.msg_iov = iov; - msg.msg_iovlen = 1; - msg.msg_control = control_un.control; - msg.msg_controllen = no_control_data ? - sizeof(struct cmsghdr) :sizeof control_un.control; - msg.msg_flags = 0; - - cmptr = CMSG_FIRSTHDR(&msg); - cmptr->cmsg_len = CMSG_LEN(no_control_data ? - 0 : sizeof(struct timeval)); - cmptr->cmsg_level = SOL_SOCKET; - cmptr->cmsg_type = SCM_TIMESTAMP; + void *cmsg_data; + size_t cmsg_size; + int rv; - dbgmsg(("msg_controllen = %u, cmsg_len = %u", - (u_int)msg.msg_controllen, (u_int)cmptr->cmsg_len)); + if (sync_recv() < 0) + return (-2); - if (sendmsg_timeout(fd, &msg, IPC_MESSAGE_SIZE) < 0) - goto failed_close; + rv = -2; - if (close_socket((const char *)NULL, fd) < 0) - goto failed; + cmsg_size = CMSG_SPACE(sizeof(struct bintime)); + cmsg_data = malloc(cmsg_size); + if (cmsg_data == NULL) { + logmsg("malloc"); + goto done; + } + msghdr_init_client(&msghdr, iov, cmsg_data, cmsg_size, + SCM_BINTIME, sizeof(struct bintime)); - _exit(0); + if (socket_connect(fd) < 0) + goto done; -failed_close: - (void)close_socket((const char *)NULL, fd); + if (message_sendn(fd, &msghdr) < 0) + goto done; -failed: - _exit(1); + rv = 0; +done: + free(cmsg_data); + return (rv); } -/* - * Receive one message with data and control message with SCM_TIMESTAMP - * type followed by struct timeval{} from client. 
- */ static int -t_timestamp_server(int fd1) +t_bintime_server(int fd1) { - union { - struct cmsghdr cm; - char control[CMSG_SPACE(sizeof(struct timeval)) + EXTRA_CMSG_SPACE]; - } control_un; - char buf[IPC_MESSAGE_SIZE]; - int error, fd2; - struct msghdr msg; + struct msghdr msghdr; struct iovec iov[1]; - struct cmsghdr *cmptr; - const struct timeval *timeval; + struct cmsghdr *cmsghdr; + void *cmsg_data; + size_t cmsg_size; + u_int i; + int fd2, rv; + + if (sync_send() < 0) + return (-2); + + fd2 = -1; + rv = -2; + + cmsg_size = CMSG_SPACE(sizeof(struct bintime)) + CMSG_SPACE_EXTRA; + cmsg_data = malloc(cmsg_size); + if (cmsg_data == NULL) { + logmsg("malloc"); + goto done; + } if (sock_type == SOCK_STREAM) { - if ((fd2 = accept_timeout(fd1)) < 0) - return (-2); + fd2 = socket_accept(fd1); + if (fd2 < 0) + goto done; } else fd2 = fd1; - iov[0].iov_base = buf; - iov[0].iov_len = sizeof buf; - - msg.msg_name = NULL; - msg.msg_namelen = 0; - msg.msg_iov = iov; - msg.msg_iovlen = 1; - msg.msg_control = control_un.control; - msg.msg_controllen = sizeof control_un.control; - msg.msg_flags = 0; + rv = -1; + for (i = 1; i <= ipc_msg.msg_num; ++i) { + dbgmsg("message #%u", i); + + msghdr_init_server(&msghdr, iov, cmsg_data, cmsg_size); + if (message_recv(fd2, &msghdr) < 0) { + rv = -2; + break; + } - if (recvmsg_timeout(fd2, &msg, sizeof buf) < 0) - goto failed; + if (check_msghdr(&msghdr, sizeof(*cmsghdr)) < 0) + break; - error = -1; + cmsghdr = CMSG_FIRSTHDR(&msghdr); + if (check_scm_bintime(cmsghdr) < 0) + break; - if (msg.msg_flags & MSG_CTRUNC) { - logmsgx("control data was truncated, MSG_CTRUNC flag is on"); - goto done; + if (check_nxthdr(&msghdr, cmsghdr) < 0) + break; } + if (i > ipc_msg.msg_num) + rv = 0; +done: + free(cmsg_data); + if (sock_type == SOCK_STREAM && fd2 >= 0) + if (socket_close(fd2) < 0) + rv = -2; + return (rv); +} - if (msg.msg_controllen < sizeof(struct cmsghdr)) { - logmsgx("msg_controllen %u < %lu (sizeof(struct cmsghdr))", - 
(u_int)msg.msg_controllen, (u_long)sizeof(struct cmsghdr)); - goto done; - } +static int +t_bintime(void) +{ + return (t_generic(t_bintime_client, t_bintime_server)); +} - if ((cmptr = CMSG_FIRSTHDR(&msg)) == NULL) { - logmsgx("CMSG_FIRSTHDR is NULL"); - goto done; - } +static int +t_cmsg_len_client(int fd) +{ + struct msghdr msghdr; + struct iovec iov[1]; + struct cmsghdr *cmsghdr; + void *cmsg_data; + size_t cmsg_size; + socklen_t socklen; + int rv; - dbgmsg(("msg_controllen = %u, cmsg_len = %u", - (u_int)msg.msg_controllen, (u_int)cmptr->cmsg_len)); + if (sync_recv() < 0) + return (-2); - if (cmptr->cmsg_level != SOL_SOCKET) { - logmsgx("cmsg_level %d != SOL_SOCKET", cmptr->cmsg_level); + rv = -2; + + cmsg_size = CMSG_SPACE(sizeof(struct cmsgcred)); + cmsg_data = malloc(cmsg_size); + if (cmsg_data == NULL) { + logmsg("malloc"); goto done; } - if (cmptr->cmsg_type != SCM_TIMESTAMP) { - logmsgx("cmsg_type %d != SCM_TIMESTAMP", cmptr->cmsg_type); + if (socket_connect(fd) < 0) goto done; + + iov[0].iov_base = ipc_msg.buf_orig; + iov[0].iov_len = ipc_msg.buf_size; + + msghdr.msg_name = NULL; + msghdr.msg_namelen = 0; + msghdr.msg_iov = iov; + msghdr.msg_iovlen = 1; + msghdr.msg_control = cmsg_data; + msghdr.msg_flags = 0; + msghdr.msg_controllen = cmsg_size; + + cmsghdr = CMSG_FIRSTHDR(&msghdr); + cmsghdr->cmsg_level = SOL_SOCKET; + cmsghdr->cmsg_type = SCM_CREDS; + + for (socklen = 0; socklen < CMSG_LEN(0); ++socklen) { + cmsghdr->cmsg_len = socklen; + dbgmsg("send: msghdr.msg_controllen %u", + (u_int)msghdr.msg_controllen); + dbgmsg("send: cmsghdr.cmsg_len %u", + (u_int)cmsghdr->cmsg_len); + dbgmsg("send: data size %zu", iov[0].iov_len); + if (sendmsg(fd, &msghdr, 0) < 0) + continue; + logmsgx("sent message with cmsghdr.cmsg_len %u < %u", + (u_int)cmsghdr->cmsg_len, (u_int)CMSG_LEN(0)); + rv = -1; + break; } + if (socklen == CMSG_LEN(0)) + rv = 0; - if (cmptr->cmsg_len != CMSG_LEN(sizeof(struct timeval))) { - logmsgx("cmsg_len %u != %lu (CMSG_LEN(sizeof(struct 
timeval))", - (u_int)cmptr->cmsg_len, (u_long)CMSG_LEN(sizeof(struct timeval))); + if (sync_send() < 0) { + rv = -2; goto done; } +done: + free(cmsg_data); + return (rv); +} - timeval = (const struct timeval *)CMSG_DATA(cmptr); +static int +t_cmsg_len_server(int fd1) +{ + int fd2, rv; - dbgmsg(("timeval tv_sec %jd, tv_usec %jd", - (intmax_t)timeval->tv_sec, (intmax_t)timeval->tv_usec)); + if (sync_send() < 0) + return (-2); - if ((cmptr = CMSG_NXTHDR(&msg, cmptr)) != NULL) { - logmsgx("control data has extra header"); - goto done; - } + rv = -2; - error = 0; + if (sock_type == SOCK_STREAM) { + fd2 = socket_accept(fd1); + if (fd2 < 0) + goto done; + } else + fd2 = fd1; + if (sync_recv() < 0) + goto done; + + rv = 0; done: - if (sock_type == SOCK_STREAM) - if (close(fd2) < 0) { - logmsg("close"); - return (-2); - } - return (error); + if (sock_type == SOCK_STREAM && fd2 >= 0) + if (socket_close(fd2) < 0) + rv = -2; + return (rv); +} -failed: - if (sock_type == SOCK_STREAM) - if (close(fd2) < 0) - logmsg("close"); - return (-2); +static int +t_cmsg_len(void) +{ + return (t_generic(t_cmsg_len_client, t_cmsg_len_server)); } static int -t_timestamp(void) +t_peercred_client(int fd) { - int error, fd; + struct xucred xucred; + socklen_t len; - if ((fd = create_server_socket()) < 0) - return (-2); + if (sync_recv() < 0) + return (-1); - if (sock_type == SOCK_STREAM) - if (listen(fd, LISTENQ) < 0) { - logmsg("listen"); - goto failed; - } + if (socket_connect(fd) < 0) + return (-1); - if ((client_pid = fork()) == (pid_t)-1) { - logmsg("fork"); - goto failed; + len = sizeof(xucred); + if (getsockopt(fd, 0, LOCAL_PEERCRED, &xucred, &len) < 0) { + logmsg("getsockopt(LOCAL_PEERCRED)"); + return (-1); } - if (client_pid == 0) { - myname = "CLIENT"; - if (close_socket((const char *)NULL, fd) < 0) - _exit(1); - t_timestamp_client(); - } + if (check_xucred(&xucred, len) < 0) + return (-1); - if ((error = t_timestamp_server(fd)) == -2) { - (void)wait_client(); - goto failed; - } + 
return (0); +} - if (wait_client() < 0) - goto failed; +static int +t_peercred_server(int fd1) +{ + struct xucred xucred; + socklen_t len; + int fd2, rv; - if (close_socket(serv_sock_path, fd) < 0) { - logmsgx("close_socket failed"); + if (sync_send() < 0) return (-2); + + fd2 = socket_accept(fd1); + if (fd2 < 0) + return (-2); + + len = sizeof(xucred); + if (getsockopt(fd2, 0, LOCAL_PEERCRED, &xucred, &len) < 0) { + logmsg("getsockopt(LOCAL_PEERCRED)"); + rv = -2; + goto done; } - return (error); -failed: - if (close_socket(serv_sock_path, fd) < 0) - logmsgx("close_socket failed"); - return (-2); + if (check_xucred(&xucred, len) < 0) { + rv = -1; + goto done; + } + + rv = 0; +done: + if (socket_close(fd2) < 0) + rv = -2; + return (rv); +} + +static int +t_peercred(void) +{ + return (t_generic(t_peercred_client, t_peercred_server)); } diff -ruNp unix_cmsg.orig/unix_cmsg.t unix_cmsg/unix_cmsg.t --- unix_cmsg.orig/unix_cmsg.t 2012-11-19 14:38:48.000000000 +0200 +++ unix_cmsg/unix_cmsg.t 2013-02-07 12:09:45.000000000 +0200 @@ -11,47 +11,66 @@ n=0 run() { - result=`${cmd} -t $2 $3 $4 2>&1` - if [ $? -eq 0 ]; then - echo -n "ok $1" - else - echo -n "not ok $1" + result=`${cmd} -t $2 $3 ${5%% *} 2>&1` + if [ $? 
-ne 0 ]; then + echo -n "not " fi - echo " -" $5 + echo "ok $1 - $4 ${5#* }" echo ${result} | grep -E "SERVER|CLIENT" | while read line; do echo "# ${line}" done } -echo "1..15" +echo "1..38" -for desc in \ - "Sending, receiving cmsgcred" \ - "Receiving sockcred (listening socket has LOCAL_CREDS) # TODO" \ - "Receiving sockcred (accepted socket has LOCAL_CREDS) # TODO" \ - "Sending cmsgcred, receiving sockcred # TODO" \ - "Sending, receiving timestamp" +for t1 in \ + "1 Sending, receiving cmsgcred" \ + "4 Sending cmsgcred, receiving sockcred" \ + "5 Sending, receiving timeval" \ + "6 Sending, receiving bintime" do - n=`expr ${n} + 1` - run ${n} stream "" ${n} "STREAM ${desc}" + for t2 in \ + "0 " \ + "1 (no data)" \ + "2 (no array)" \ + "3 (no data, array)" + do + n=$((n + 1)) + run ${n} stream "-z ${t2%% *}" STREAM "${t1} ${t2#* }" + done done -i=0 -for desc in \ - "Sending, receiving cmsgcred" \ - "Receiving sockcred # TODO" \ - "Sending cmsgcred, receiving sockcred # TODO" \ - "Sending, receiving timestamp" +n=$((n + 1)) +run ${n} stream "-z 0" STREAM "2 Receiving sockcred (listening socket)" + +n=$((n + 1)) +run ${n} stream "-z 0" STREAM "3 Receiving sockcred (accepted socket)" + +for t1 in \ + "1 Sending, receiving cmsgcred" \ + "2 Receiving sockcred" \ + "4 Sending, receiving timeval" \ + "5 Sending, receiving bintime" do - i=`expr ${i} + 1` - n=`expr ${n} + 1` - run ${n} dgram "" ${i} "DGRAM ${desc}" + for t2 in \ + "0 " \ + "1 (no data)" \ + "2 (no array)" \ + "3 (no data, array)" + do + n=$((n + 1)) + run ${n} dgram "-z ${t2%% *}" DGRAM "${t1} ${t2#* }" + done done -run 10 stream -z 1 "STREAM Sending, receiving cmsgcred (no control data)" -run 11 stream -z 4 "STREAM Sending cmsgcred, receiving sockcred (no control data) # TODO" -run 12 stream -z 5 "STREAM Sending, receiving timestamp (no control data)" - -run 13 dgram -z 1 "DGRAM Sending, receiving cmsgcred (no control data)" -run 14 dgram -z 3 "DGRAM Sending cmsgcred, receiving sockcred (no control data) 
# TODO" -run 15 dgram -z 4 "DGRAM Sending, receiving timestamp (no control data)" +n=$((n + 1)) +run ${n} dgram "-z 0" DGRAM "3 Sending cmsgcred, receiving sockcred" + +n=$((n + 1)) +run ${n} stream "-z 0" STREAM "7 Check cmsghdr.cmsg_len" + +n=$((n + 1)) +run ${n} dgram "-z 0" DGRAM "6 Check cmsghdr.cmsg_len" + +n=$((n + 1)) +run ${n} stream "-z 0" STREAM "8 Check LOCAL_PEERCRED socket option" From owner-freebsd-net@FreeBSD.ORG Thu Feb 7 19:38:17 2013 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 882D4EA3 for ; Thu, 7 Feb 2013 19:38:17 +0000 (UTC) (envelope-from tjg@ucsc.edu) Received: from mail-ie0-x232.google.com (ie-in-x0232.1e100.net [IPv6:2607:f8b0:4001:c03::232]) by mx1.freebsd.org (Postfix) with ESMTP id 5E3458EB for ; Thu, 7 Feb 2013 19:38:17 +0000 (UTC) Received: by mail-ie0-f178.google.com with SMTP id c13so3980562ieb.37 for ; Thu, 07 Feb 2013 11:38:16 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ucsc.edu; s=ucsc-google; h=mime-version:x-received:date:message-id:subject:from:to :content-type; bh=Iyhh5RzQEm7j/9VMWtZkZd94ML4MsMnsejGDriz6OaQ=; b=M5CY62E+esD532+uNYKBwYAfX9nLMBlQGmUVjPFOK/4aLxlv04uKY8cD1WJ1SudGyC drc5dJAZ+L2W7KDrS+pblGig6Pkxpxb5Gn28kwwQ07ZKrRXRFVmXaNqcBaZvAwUUSSPO Poqdg7PPZQtxmv6nqFsrEvHMGfGT4SJ3mEuxM= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=mime-version:x-received:date:message-id:subject:from:to :content-type:x-gm-message-state; bh=Iyhh5RzQEm7j/9VMWtZkZd94ML4MsMnsejGDriz6OaQ=; b=LYnYxvF1pm7egmXTAm17Wjy6v8vpd33BWoGJNcFYEw9jSUTHiDT7Dk7LRwLXxSfd4Y fOJxbOzGpx77MMxaOK/nASFKVSZGxvlh576HbmKRWaxVJT4zRAkZm5vkOZW8wJgPnqtz FuGDLq67TdRSxX4WDelaypSzk1Lh4Ynz9vGhU3Z3p/KRotxsrqNHvRzghVtlv7H6xTq/ +X7H39oCNAvYrvdEBcfKc6g2DaRvLd1ucGssA72uzYHLKbtnzUb7+r0UyJGXs7JmWl7K b+N2EXII+KrYsXXHWbhVYxA9oQn1dU7EjFh1WKFMWG8QgYQSYb1+64kU66QqVATGVhmL P6Qw== MIME-Version: 1.0 
X-Received: by 10.42.58.202 with SMTP id j10mr4487083ich.39.1360265896226; Thu, 07 Feb 2013 11:38:16 -0800 (PST) Received: by 10.42.241.73 with HTTP; Thu, 7 Feb 2013 11:38:15 -0800 (PST) Date: Thu, 7 Feb 2013 11:38:15 -0800 Message-ID: Subject: L2TP with Certificates From: Tim Gustafson To: freebsd-net@freebsd.org Content-Type: text/plain; charset=ISO-8859-1 X-Gm-Message-State: ALoCoQnKXNffSAT9I94ynPodf+V0ALUvRPWf4xRRtPtBfrgjPtTqxEPUmj7q7F1vrOXGH4QC7jdE X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 07 Feb 2013 19:38:17 -0000 Hi, I've found a handful of good tutorials explaining how to set up MPD and Racoon using pre-shared keys and user/pass authentication, but I can't seem to find anything that uses certificates. Is this not an option? Does anyone know if there's a good demo of this out there? Also, if I do use user/pass auth, I see that I can specify an external password-verification program in MPD. Has anyone had any luck with tying that to an LDAP server for auth? 
-- Tim Gustafson tjg@ucsc.edu 831-459-5354 Baskin Engineering, Room 313A From owner-freebsd-net@FreeBSD.ORG Thu Feb 7 20:04:03 2013 Return-Path: Delivered-To: net@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id D97859B1; Thu, 7 Feb 2013 20:04:03 +0000 (UTC) (envelope-from gnn@neville-neil.com) Received: from vps.hungerhost.com (vps.hungerhost.com [216.38.53.176]) by mx1.freebsd.org (Postfix) with ESMTP id 9EA46A75; Thu, 7 Feb 2013 20:04:03 +0000 (UTC) Received: from [38.105.238.108] (port=54565 helo=[10.7.1.235]) by vps.hungerhost.com with esmtpsa (TLSv1:AES128-SHA:128) (Exim 4.80) (envelope-from ) id 1U3XhQ-0005dY-PZ; Thu, 07 Feb 2013 15:04:00 -0500 Content-Type: text/plain; charset=us-ascii Mime-Version: 1.0 (Mac OS X Mail 6.2 \(1499\)) Subject: Re: [PATCH] Add a new TCP_IGNOREIDLE socket option From: George Neville-Neil In-Reply-To: <511292C9.4040307@mu.org> Date: Thu, 7 Feb 2013 15:04:08 -0500 Content-Transfer-Encoding: quoted-printable Message-Id: References: <201301221511.02496.jhb@freebsd.org> <50FF06AD.402@networx.ch> <061B4EA5-6A93-48A0-A269-C2C3A3C7E77C@lakerest.net> <201302060746.43736.jhb@freebsd.org> <511292C9.4040307@mu.org> To: Alfred Perlstein X-Mailer: Apple Mail (2.1499) X-AntiAbuse: This header was added to track abuse, please include it with any abuse report X-AntiAbuse: Primary Hostname - vps.hungerhost.com X-AntiAbuse: Original Domain - freebsd.org X-AntiAbuse: Originator/Caller UID/GID - [47 12] / [47 12] X-AntiAbuse: Sender Address Domain - neville-neil.com X-Get-Message-Sender-Via: vps.hungerhost.com: authenticated_id: gnn@neville-neil.com Cc: Randall Stewart , John Baldwin , net@freebsd.org X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 07 Feb 2013 20:04:03 -0000 On Feb 6, 2013, at 12:28 , Alfred 
Perlstein wrote: > On 2/6/13 4:46 AM, John Baldwin wrote: >> On Wednesday, February 06, 2013 6:27:04 am Randall Stewart wrote: >>> John: >>> >>> A burst at line rate will *often* cause drops. This is because >>> router queues are of finite size. Also, such a burst (especially >>> on a long delay-bandwidth network) causes your RTT to increase even >>> if there is no drop, which is going to hurt you as well. >>> >>> A SHOULD in an RFC says you really really really really need to do it >>> unless there is something that makes you willing to override it. It is >>> slight wiggle room. >>> >>> In this I agree with Andre: we should not be *not* doing it. Otherwise >>> folks will be turning this on and it is plain wrong. It may be fine >>> for your network but I would not want to see it in FreeBSD. >>> >>> In my testing here at home I have put max-burst back into our stack. This >>> uses Mark Allman's version (not Kacheong Poon's) where you clamp the cwnd at >>> no more than 4 packets larger than your flight. All of my testing, >>> high-bw-delay or LAN, has shown this to improve TCP performance. This >>> is because it helps you avoid bursting out so many packets that you overflow >>> a queue. >>> >>> In your long-delay bw link if you do burst out too many (and you never >>> know how many that is since you cannot predict how full all those >>> MPLS queues are or how big they are) you will really hurt yourself even worse. >>> Note that generally in Cisco routers the default queue size is somewhere between >>> 100-300 packets depending on the router. >> Due to the way our application works this never happens, but I am fine with >> just keeping this patch private. If there are other shops that need this they >> can always dig the patch up from the archives. >> > This is yet another time when I'm sad about how things happen in FreeBSD.
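Randall's description of max-burst quoted above (Mark Allman's variant: clamp cwnd to no more than four segments beyond the data already in flight) can be sketched as follows. This is a minimal illustration under stated assumptions, not FreeBSD's or Randall's code: all quantities are in bytes, the burst allowance is a hypothetical constant `MAX_BURST_SEGS`, and `maxburst_clamp_cwnd()` is a hypothetical name.

```c
#include <stdint.h>

/* Hypothetical constant: "no more than 4 packets larger than your flight". */
#define MAX_BURST_SEGS 4u

/*
 * Max-burst clamp in the spirit of Allman's proposal: before sending,
 * limit cwnd to flight_size + MAX_BURST_SEGS * mss so that at most
 * MAX_BURST_SEGS full-sized segments can leave back to back.
 * All values are in bytes; names are illustrative only.
 */
static uint32_t
maxburst_clamp_cwnd(uint32_t cwnd, uint32_t flight_size, uint32_t mss)
{
	/* Widen to 64 bits so the sum cannot overflow. */
	uint64_t limit = (uint64_t)flight_size + (uint64_t)MAX_BURST_SEGS * mss;

	/* If cwnd exceeds the limit, limit < cwnd <= UINT32_MAX, so it fits. */
	return (cwnd > limit ? (uint32_t)limit : cwnd);
}
```

For example, with 10000 bytes in flight and a 1460-byte MSS, a 100000-byte cwnd is clamped to 15840 bytes, while a 12000-byte cwnd is left alone. Whether the clamp permanently lowers cwnd (as here) or only limits a single transmission is a design choice this sketch does not settle.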
> > A developer comes forward with a non-default option that's very useful for some specific workloads, specifically one who contributes much time and $$$ to the project, and the community rejects the patches even though it's been successful in other OSes. > > It makes zero sense. > > John, can you repost the patch? Maybe there is a way to refactor this somehow so it's like accept filters where we can plug in a hook for TCP? > > I am very disappointed, but not surprised. > I take away the complete opposite feeling. This is how we work through these issues. It's clear from the discussion that this need not be a default in the system, and is a special case. We had a reasoned discussion of what would be best to do, and at least two experts in TCP weighed in on the effect this change might have. Not everything proposed by a developer needs to go into the tree, particularly since these discussions are archived and we can always revisit this later. This is exactly how collaborative development should look, whether or not the patch is integrated now, next week, next year, or ever. Best, George From owner-freebsd-net@FreeBSD.ORG Thu Feb 7 20:07:32 2013 Return-Path: Delivered-To: net@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 10684AC1; Thu, 7 Feb 2013 20:07:32 +0000 (UTC) (envelope-from gnn@neville-neil.com) Received: from vps.hungerhost.com (vps.hungerhost.com [216.38.53.176]) by mx1.freebsd.org (Postfix) with ESMTP id DC25BAB9; Thu, 7 Feb 2013 20:07:31 +0000 (UTC) Received: from [38.105.238.108] (port=54612 helo=[10.7.1.235]) by vps.hungerhost.com with esmtpsa (TLSv1:AES128-SHA:128) (Exim 4.80) (envelope-from ) id 1U3Xkp-0008Jv-8v; Thu, 07 Feb 2013 15:07:31 -0500 Content-Type: text/plain; charset=iso-8859-1 Mime-Version: 1.0 (Mac OS X Mail 6.2 \(1499\)) Subject: Re: A question about SYN cookies...
From: George Neville-Neil In-Reply-To: <510F7AB5.1040508@freebsd.org> Date: Thu, 7 Feb 2013 15:07:39 -0500 Content-Transfer-Encoding: quoted-printable Message-Id: <2615E46D-2C39-42DB-B38F-E15A39A730BB@neville-neil.com> References: <131E67C7-F336-414E-89C7-535D549443F5@neville-neil.com> <510F7AB5.1040508@freebsd.org> To: Andre Oppermann X-Mailer: Apple Mail (2.1499) X-AntiAbuse: This header was added to track abuse, please include it with any abuse report X-AntiAbuse: Primary Hostname - vps.hungerhost.com X-AntiAbuse: Original Domain - freebsd.org X-AntiAbuse: Originator/Caller UID/GID - [47 12] / [47 12] X-AntiAbuse: Sender Address Domain - neville-neil.com X-Get-Message-Sender-Via: vps.hungerhost.com: authenticated_id: gnn@neville-neil.com Cc: net@freebsd.org X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 07 Feb 2013 20:07:32 -0000 On Feb 4, 2013, at 04:09 , Andre Oppermann wrote: > On 04.02.2013 01:09, George Neville-Neil wrote: >> Howdy, >> >> I've been reviewing the SYN cache and SYN cookie code and I'm wondering why we do all the work >> of generating a SYN cache entry before sending a SYN cookie. If the point of SYN cookies is to >> defend against a SYN flood then, to my mind, the SYN/ACK for the cookie case should be sent off before >> doing all the work to try to create and insert a cache entry. Has anyone, as yet, looked at a way >> to move the sending code earlier into syncache_add() and checked to see if there is a performance >> improvement when a system is flooded with SYN packets? > > So far all syncookie implementations have an information loss because > they can't store all state in the cookie unless timestamps are enabled. > Apparently Windows 8 still doesn't enable timestamps but does quite a > bit of window scaling leading to problems.
See recent bug report here > on net@. > Yes, I heard about that off-list and then got time to review the mailbox. > For generating syncookies we have three possible strategies: > > 1/ Use syncache and cookies in parallel and bump the oldest syncache > entry, replacing it with the new SYN attempt. Syncookies are done > on all SYN-ACKs going out. > > 2/ Fill the syncache but do not bump the oldest entry, other than normal > expiry. All further SYN-ACKs are syncookies-only (w/o window scaling > etc). Those in the syncache do not need to carry syncookies and are > real full SYN-ACKs. > > 3/ Only send syncookies and do not cache anything. Neither window scaling > nor SACK-PERM can be carried, though. > > So far we've been doing option 1. We can switch to option 2 which, depending > on the situation, may be better or worse. Option 3 isn't viable currently > due to loss of window scaling and SACK. > > Based on the recent Windows 8 issue I've devised a different HMAC-based > syncookie scheme where all necessary information can be stored in the ISS, > forgoing the need for the timestamp bits. I have sent a description of > the scheme to Colin and Nate to have it reviewed. It must be cryptographically > strong enough to withstand cracking attempts for about 30 seconds. Forward > security isn't necessary as the syncookie secrets are completely random and > renewed every 30 seconds. I'll wait for Colin and Nate's evaluation of your scheme before weighing in, though given the limited key space already in place I do wonder how you got that much information into a 32-bit int.
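The question of how that much state fits into a 32-bit ISS can be made concrete with a bit-layout sketch. The layout below is an assumption for illustration, not the scheme Andre sent for review: a 24-bit truncated MAC, one bit selecting which of two rotating secrets was used, 3-bit indices into small hypothetical MSS and window-scale tables, and one SACK-permitted bit (24 + 1 + 3 + 3 + 1 = 32). Computing the MAC itself (presumably an HMAC over the connection 4-tuple and the peer's ISN, keyed by the current 30-second secret) is out of scope; `cookie_pack()`, `cookie_unpack()` and both tables are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical tables, sorted ascending; 8 entries so a 3-bit index suffices. */
static const unsigned mss_tab[8] = { 216, 536, 1200, 1360, 1400, 1440, 1452, 1460 };
static const unsigned wscale_tab[8] = { 0, 1, 2, 4, 6, 7, 8, 14 };

struct cookie_opts {
	unsigned mss;     /* peer's MSS option */
	unsigned wscale;  /* peer's window-scale shift (0..14) */
	bool sack_ok;     /* SACK-permitted option present */
	unsigned secret;  /* which of two rotating secrets keyed the MAC (0 or 1) */
};

/*
 * Index of the largest table entry <= v. Values below the smallest entry
 * still map to index 0, which a real scheme would have to handle.
 */
static unsigned
round_down_idx(const unsigned tab[8], unsigned v)
{
	unsigned i = 0;

	while (i < 7 && tab[i + 1] <= v)
		i++;
	return (i);
}

/*
 * Assumed layout, most significant bit first:
 *   [31..8] 24-bit MAC  [7] secret  [6..4] MSS idx  [3..1] wscale idx  [0] SACK
 */
static uint32_t
cookie_pack(uint32_t mac, const struct cookie_opts *o)
{
	uint32_t iss;

	iss = (mac & 0xffffffu) << 8;  /* keep only 24 MAC bits */
	iss |= (uint32_t)(o->secret & 1u) << 7;
	iss |= (uint32_t)round_down_idx(mss_tab, o->mss) << 4;
	iss |= (uint32_t)round_down_idx(wscale_tab, o->wscale) << 1;
	iss |= o->sack_ok ? 1u : 0u;
	return (iss);
}

/* Recover the MAC (for the caller to verify) and the encoded options. */
static void
cookie_unpack(uint32_t iss, uint32_t *mac, struct cookie_opts *o)
{
	*mac = iss >> 8;
	o->secret = (iss >> 7) & 1u;
	o->mss = mss_tab[(iss >> 4) & 7u];
	o->wscale = wscale_tab[(iss >> 1) & 7u];
	o->sack_ok = (iss & 1u) != 0;
}
```

Rounding down to the nearest table entry is the conservative choice here: a smaller MSS never oversizes segments, and a smaller window-scale shift only underestimates the peer's advertised window. The price of squeezing options into the ISS is a shorter MAC, which is why the 30-second secret lifetime Andre mentions matters.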
Thanks, George From owner-freebsd-net@FreeBSD.ORG Thu Feb 7 23:13:37 2013 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 5C9C1713 for ; Thu, 7 Feb 2013 23:13:37 +0000 (UTC) (envelope-from rfg@tristatelogic.com) Received: from outgoing.tristatelogic.com (segfault.tristatelogic.com [69.62.255.118]) by mx1.freebsd.org (Postfix) with ESMTP id 3E1EC636 for ; Thu, 7 Feb 2013 23:13:36 +0000 (UTC) Received: from segfault-nmh-helo.tristatelogic.com (localhost [127.0.0.1]) by segfault.tristatelogic.com (Postfix) with ESMTP id 26CA45081A for ; Thu, 7 Feb 2013 15:13:27 -0800 (PST) To: freebsd-net@freebsd.org Subject: Question: Why ain't I getting gigabit speed? Date: Thu, 07 Feb 2013 15:13:27 -0800 Message-ID: <18120.1360278807@tristatelogic.com> From: "Ronald F. Guilmette" X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 07 Feb 2013 23:13:37 -0000 I just acquired a brand new cheapie gigabit PCI Ethernet card off eBay. The main chip on it appears to be an RTL8110S-32. I stuck this card into a 9.1-RELEASE system that I have been putting together, and it seemed to be recognized OK (as re0) upon bootup, so I diddled my /etc/rc.conf file to get it to ifconfig as 192.168.1.3 on reboot. Then I rebooted. I have the card wired via a CAT6 cable to my Linksys E2000 gigabit router.
Nonetheless, upon reboot, followed by "ifconfig -a", the output from ifconfig says the following for this card: re0: flags=8843 metric 0 mtu 1500 options=8209b ether 00:13:3b:02:03:bd inet 192.168.1.3 netmask 0xffffff00 broadcast 192.168.1.255 inet6 fe80::213:3bff:fe02:3bd%re0 prefixlen 64 scopeid 0x7 nd6 options=29 media: Ethernet autoselect (100baseTX ) status: active I've tried two different CAT6 cables, two different LAN ports on my E2000, and I've even tried the card in two different PCI slots on my motherboard, but the results are always the same. So, um, what gives? Why does the driver appear to be setting this card to 100baseTX rather than the 1000baseTX that I was hoping for? Is there some magic spell that I am unaware of that I must cast on this in order to get it to work right? P.S. dmesg has this to say about the card: re0: port 0xbe00-0xbeff mem 0xdf9ff000-0xdf9ff0ff irq 18 at device 5.0 on pci4 re0: Chip rev. 0x04000000 re0: MAC rev. 0x00000000 re0: Ethernet address: 00:13:3b:02:03:bd re0: link state changed to UP re0: link state changed to DOWN re0: link state changed to UP From owner-freebsd-net@FreeBSD.ORG Thu Feb 7 23:53:17 2013 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 9F073619 for ; Thu, 7 Feb 2013 23:53:17 +0000 (UTC) (envelope-from rfg@tristatelogic.com) Received: from outgoing.tristatelogic.com (segfault.tristatelogic.com [69.62.255.118]) by mx1.freebsd.org (Postfix) with ESMTP id 84B06833 for ; Thu, 7 Feb 2013 23:53:17 +0000 (UTC) Received: from segfault-nmh-helo.tristatelogic.com (localhost [127.0.0.1]) by segfault.tristatelogic.com (Postfix) with ESMTP id 0FEA05081A for ; Thu, 7 Feb 2013 15:53:17 -0800 (PST) To: freebsd-net@freebsd.org Subject: Question: Why ain't I getting gigabit speed? Date: Thu, 07 Feb 2013 15:53:17 -0800 Message-ID: <18410.1360281197@tristatelogic.com> From: "Ronald F.
Guilmette" X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 07 Feb 2013 23:53:17 -0000 Apologies for following up on myself, but I just now found this: https://support.freenas.org/ticket/894 This thread would suggest that I ain't alone in experiencing this problem with the RTL8110S. That other guy apparently solved his problem by just simply switching to a CAT6 cable. I, however, am already using CAT6 cables, and the problem for me still exists. I tried adding: media 1000baseTX to my ifconfig_re0= line in my /etc/rc.conf file (and then rebooting). However, when I did that, a subsequent "ifconfig -a" showed that indeed, the card had now been correctly configured to speak 1000baseT, but it also said: status: no carrier even though the thing most definitely _is_ still plugged into my E2000 router, and I could not ping anything else, even on my own LAN. So I'm still stuck, and still looking for an answer. How can I get this card working at gigabit speed?
From owner-freebsd-net@FreeBSD.ORG Fri Feb 8 02:32:04 2013 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id CC859CC for ; Fri, 8 Feb 2013 02:32:04 +0000 (UTC) (envelope-from sinister@gmail.com) Received: from mail-ie0-x235.google.com (mail-ie0-x235.google.com [IPv6:2607:f8b0:4001:c03::235]) by mx1.freebsd.org (Postfix) with ESMTP id A01C3E9A for ; Fri, 8 Feb 2013 02:32:04 +0000 (UTC) Received: by mail-ie0-f181.google.com with SMTP id 17so4498987iea.12 for ; Thu, 07 Feb 2013 18:32:04 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=x-received:message-id:from:to:references:subject:date:mime-version :content-type:content-transfer-encoding:x-priority:x-msmail-priority :x-mailer:x-mimeole; bh=521Rbb+TDc3JCt3Nst/9tPqtUwJ1gRIeTVz8EJSLVC0=; b=SsERnzm2wn29vreuRsJkluAmoDB2nJKzB+CR/Okaabix5X4KKJpCRk31Ibe8QOnBz5 K73bziT3vw3QnNroiElg8jNhzxhThOIjwhq/o598F8y9T9yngk5TvSndgq8L8qWIbgmV MjlB3C86aUP1ERBnQQrFHJNpTuLvqMYfuvZ1pIQe/S00B+4kaMPV5rvqUCkaOAboZ9pC d2iEUeVboFP1HYNDZaPy1JB62WY1l6yqnhCVze1DLljJZYxvi8vCLUX7bmIisX65LO0O rubx9MKz2sHFvXfcDbayPB8rmfWZG8EVHw4uVrpjd+4w09crNiPyytrLUqnrnvv7lBPV NYHg== X-Received: by 10.42.11.203 with SMTP id v11mr6378170icv.28.1360290724342; Thu, 07 Feb 2013 18:32:04 -0800 (PST) Received: from dts (abaddon.markofthebeast.ca. [216.8.139.47]) by mx.google.com with ESMTPS id px5sm12265998igc.0.2013.02.07.18.32.03 (version=TLSv1 cipher=RC4-SHA bits=128/128); Thu, 07 Feb 2013 18:32:03 -0800 (PST) Message-ID: <48071C801E57457A9A291B94F284D9A5@dts> From: "Sin" To: , "Ronald F. Guilmette" References: <18410.1360281197@tristatelogic.com> Subject: Re: Question: Why ain't I getting gigabit speed? 
Date: Thu, 7 Feb 2013 21:32:12 -0500 MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=original Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.5931 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.5931 X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 08 Feb 2013 02:32:04 -0000 Maybe you're not using all 4 pairs in the CAT 6 cable. ----- Original Message ----- From: "Ronald F. Guilmette" To: Sent: Thursday, February 07, 2013 6:53 PM Subject: Question: Why ain't I getting gigabit speed? > > > Apologies for following up on myself, but I just now found this: > > https://support.freenas.org/ticket/894 > > This thread would suggest that I ain't alone in experiencing this > problem with the RTL8110S. > > That other guy apparently solved his problem by just simply switching > to a CAT6 cable. I however am already using CAT6 cables, and the problem > for me still exists. > > I tried adding: > > media 1000baseTX > > to my ifconfig_re0= line in my /etc/rc.conf file (and then rebooting), > however when I did that, a subsequent "ifconfig -a" showed that indeed, > the card had now been correctly configured to speak 1000baseT, however > it also said: > > status: no carrier > > even though the thing most definitely _is_ still plugged in to my > E2000 router, and I could not ping anything else, even on my own LAN. > > So I'm still stuck, and still looking for an answer. How can I get this > card working at gigabit speed?
> _______________________________________________ > freebsd-net@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-net > To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org" > From owner-freebsd-net@FreeBSD.ORG Fri Feb 8 10:10:01 2013 Return-Path: Delivered-To: freebsd-net@smarthost.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id CE8D6AEE for ; Fri, 8 Feb 2013 10:10:01 +0000 (UTC) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:1900:2254:206c::16:87]) by mx1.freebsd.org (Postfix) with ESMTP id B72B8996 for ; Fri, 8 Feb 2013 10:10:01 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.6/8.14.6) with ESMTP id r18AA1B8049384 for ; Fri, 8 Feb 2013 10:10:01 GMT (envelope-from gnats@freefall.freebsd.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.6/8.14.6/Submit) id r18AA1WM049383; Fri, 8 Feb 2013 10:10:01 GMT (envelope-from gnats) Date: Fri, 8 Feb 2013 10:10:01 GMT Message-Id: <201302081010.r18AA1WM049383@freefall.freebsd.org> To: freebsd-net@FreeBSD.org Cc: From: Andrey Simonenko Subject: Re: bin/131567: Update for regression/sockets/unix_cmsg X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list Reply-To: Andrey Simonenko List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 08 Feb 2013 10:10:01 -0000 The following reply was made to PR bin/131567; it has been noted by GNATS. From: Andrey Simonenko To: bug-followup@freebsd.org Cc: Subject: Re: bin/131567: Update for regression/sockets/unix_cmsg Date: Fri, 8 Feb 2013 12:08:54 +0200 Completely redesigned unix_cmsg with improved logic (corrected version). Details in README. 
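Andrey's patch below rewrites regression tests for ancillary data (control messages) on PF_LOCAL sockets. For readers new to the mechanics, here is a minimal, portable sketch of building and parsing one control message with the standard `CMSG_SPACE()`, `CMSG_LEN()`, `CMSG_FIRSTHDR()`, `CMSG_NXTHDR()` and `CMSG_DATA()` macros. It deliberately uses `SCM_RIGHTS` (descriptor passing) rather than the FreeBSD-specific `SCM_CREDS`/`struct cmsgcred` that unix_cmsg exercises, so it is not code from the patch; `send_fd()` and `recv_fd()` are hypothetical helper names. Note that `cmsg_len` is `CMSG_LEN()` of the payload (header plus data), which is why a `cmsg_len` smaller than `CMSG_LEN(0)`, as in the README's cmsg_len tests, describes a malformed object.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send one data byte plus one SCM_RIGHTS object carrying fd; 0 on success. */
static int
send_fd(int sock, int fd)
{
	char byte = 'x';
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union {
		struct cmsghdr hdr;                 /* forces cmsghdr alignment */
		char buf[CMSG_SPACE(sizeof(int))];  /* header + data + padding */
	} ctl;
	struct msghdr msg;
	struct cmsghdr *cmsg;

	memset(&msg, 0, sizeof(msg));
	memset(&ctl, 0, sizeof(ctl));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctl.buf;
	msg.msg_controllen = sizeof(ctl.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int)); /* header + data, no trailing pad */
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return (sendmsg(sock, &msg, 0) == 1 ? 0 : -1);
}

/* Receive one data byte and return the passed descriptor, or -1. */
static int
recv_fd(int sock)
{
	char byte;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union {
		struct cmsghdr hdr;
		char buf[CMSG_SPACE(sizeof(int))];
	} ctl;
	struct msghdr msg;
	struct cmsghdr *cmsg;
	int fd = -1;

	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctl.buf;
	msg.msg_controllen = sizeof(ctl.buf);

	/* MSG_CTRUNC means the control buffer was too small. */
	if (recvmsg(sock, &msg, 0) != 1 || (msg.msg_flags & MSG_CTRUNC))
		return (-1);
	for (cmsg = CMSG_FIRSTHDR(&msg); cmsg != NULL;
	    cmsg = CMSG_NXTHDR(&msg, cmsg)) {
		if (cmsg->cmsg_level == SOL_SOCKET &&
		    cmsg->cmsg_type == SCM_RIGHTS &&
		    cmsg->cmsg_len == CMSG_LEN(sizeof(int)))
			memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	}
	return (fd);
}
```

Typical use is over a `socketpair(AF_UNIX, SOCK_STREAM, 0, sv)`: one side calls `send_fd(sv[0], fd)`, the other gets a new descriptor for the same open file from `recv_fd(sv[1])`. The union exists only to give the control buffer the alignment that `CMSG_FIRSTHDR()` expects.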
diff -ruNp unix_cmsg.orig/README unix_cmsg/README --- unix_cmsg.orig/README 2012-11-19 14:38:48.000000000 +0200 +++ unix_cmsg/README 2013-02-08 12:08:52.000000000 +0200 @@ -1,127 +1,160 @@ $FreeBSD: src/tools/regression/sockets/unix_cmsg/README,v 1.2 2012/11/17 01:53:57 svnexp Exp $ About unix_cmsg -================ +=============== -This program is a collection of regression tests for ancillary (control) -data for PF_LOCAL sockets (local domain or Unix domain sockets). There -are tests for stream and datagram sockets. - -Usually each test does following steps: create Server, fork Client, -Client sends something to Server, Server verifies if everything -is correct in received message. Sometimes Client sends several -messages to Server. +This program is a collection of regression tests for ancillary data +(control information) for PF_LOCAL sockets (local domain or Unix domain +sockets). There are tests for stream and datagram sockets. + +Usually each test does following steps: creates Server, forks Client, +Client sends something to Server, Server verifies whether everything is +correct in received message(s). It is better to change the owner of unix_cmsg to some safe user -(eg. nobody:nogroup) and set SUID and SGID bits, else some tests -can give correct results for wrong implementation. +(eg. nobody:nogroup) and set SUID and SGID bits, else some tests that +check credentials can give correct results for wrong implementation. + +It is better to run this program by a user that belongs to more +than 16 groups. Available options ================= --d Output debugging information, values of different fields of - received messages, etc. Will produce many lines of information. - --h Output help message and exit. +usage: unix_cmsg [-dh] [-n num] [-s size] [-t type] [-z value] [testno] --t - Run tests only for the given socket type: "stream" or "dgram". - With this option it is possible to run only particular test, - not all of them. 
- --z Do not send real control data if possible. Struct cmsghdr{} - should be followed by real control data. It is not clear if - a sender should give control data in all cases (this is not - documented and an arbitrary application can choose anything). - - At least for PF_LOCAL sockets' control messages with types - SCM_CREDS and SCM_TIMESTAMP the kernel does not need any - control data. This option allow to not send real control data - for SCM_CREDS and SCM_TIMESTAMP control messages. + Options are: + -d Output debugging information + -h Output the help message and exit + -n num Number of messages to send + -s size Specify size of data for IPC + -t type Specify socket type (stream, dgram) for tests + -z value Do not send data in a message (bit 0x1), do not send + data array associated with a cmsghdr structure (bit 0x2) + testno Run one test by its number (require the -t option) Description of tests ==================== +If Client sends something to Server, then it sends 5 messages by default. +Number of messages can be changed in the -n command line option. Number +of messages will be given as N in the following descriptions. + +If Client sends something to Server, then it sends some data (few bytes) +in each message by default. The size of this data can be changed by the -s +command line option. The "-s 0" command line option means, that Client will +send zero bytes represented by { NULL, 0 } value of struct iovec{}, referenced +by the msg_iov field from struct msghdr{}. The "-z 1" or "-z 3" command line +option means, that Client will send zero bytes represented by the NULL value +in the msg_iov field from struct msghdr{}. + +If Client sends some ancillary data object, then this ancillary data object +always has associated data array by default. The "-z 2" or "-z 3" option +means, that Client will not send associated data array if possible. 
+ For SOCK_STREAM sockets: ----------------------- 1: Sending, receiving cmsgcred - Client connects to Server and sends two messages with data and - control message with SCM_CREDS type to Server. Server should - receive two messages, in both messages there should be data and - control message with SCM_CREDS type followed by struct cmsgcred{} - and this structure should contain correct information. - - 2: Receiving sockcred (listening socket has LOCAL_CREDS) - - Server creates listen socket and set socket option LOCAL_CREDS - for it. Client connects to Server and sends two messages with data - to Server. Server should receive two messages, in first message - there should be data and control message with SCM_CREDS type followed - by struct sockcred{} and this structure should contain correct - information, in second message there should be data and no control - message. - - 3: Receiving sockcred (accepted socket has LOCAL_CREDS) - - Client connects to Server and sends two messages with data. Server - accepts connection and set socket option LOCAL_CREDS for just accepted - socket (here synchronization is used, to allow Client to see just set - flag on Server's socket before sending messages to Server). Server - should receive two messages, in first message there should be data and - control message with SOCK_CRED type followed by struct sockcred{} and - this structure should contain correct information, in second message - there should be data and no control message. + Client connects to Server and sends N messages with SCM_CREDS ancillary + data object. Server should receive N messages, each message should + have SCM_CREDS ancillary data object followed by struct cmsgcred{}. + + 2: Receiving sockcred (listening socket) + + Server creates a listening stream socket and sets the LOCAL_CREDS + socket option for it. Client connects to Server two times, each time + it sends N messages. Server accepts two connections and receives N + messages from each connection. 
The first message from each connection + should have SCM_CREDS ancillary data object followed by struct sockcred{}, + next messages from the same connection should not have ancillary data. + + 3: Receiving sockcred (accepted socket) + + Client connects to Server. Server accepts connection and sets the + LOCAL_CREDS socket option for just accepted socket. Client sends N + messages to Server. Server should receive N messages, the first + message should have SCM_CREDS ancillary data object followed by + struct sockcred{}, next messages should not have ancillary data. 4: Sending cmsgcred, receiving sockcred - Server creates listen socket and set socket option LOCAL_CREDS - for it. Client connects to Server and sends one message with data - and control message with SCM_CREDS type to Server. Server should - receive one message with data and control message with SCM_CREDS type - followed by struct sockcred{} and this structure should contain - correct information. - - 5: Sending, receiving timestamp - - Client connects to Server and sends message with data and control - message with SCM_TIMESTAMP type to Server. Server should receive - message with data and control message with SCM_TIMESTAMP type - followed by struct timeval{}. + Server creates a listening stream socket and sets the LOCAL_CREDS + socket option for it. Client connects to Server and sends N messages + with SCM_CREDS ancillary data object. Server should receive N messages, + the first message should have SCM_CREDS ancillary data object followed + by struct sockcred{}, each of next messages should have SCM_CREDS + ancillary data object followed by struct cmsgcred{}. + + 5: Sending, receiving timeval + + Client connects to Server and sends message with SCM_TIMESTAMP ancillary + data object. Server should receive one message with SCM_TIMESTAMP + ancillary data object followed by struct timeval{}. 
+ + 6: Sending, receiving bintime + + Client connects to Server and sends message with SCM_BINTIME ancillary + data object. Server should receive one message with SCM_BINTIME + ancillary data object followed by struct bintime{}. + + 7: Checking cmsghdr.cmsg_len + + Client connects to Server and tries to send several messages with + SCM_CREDS ancillary data object that has wrong cmsg_len field in its + struct cmsghdr{}. All these attempts should fail, since cmsg_len + in all requests is less than CMSG_LEN(0). + + 8: Check LOCAL_PEERCRED socket option + + This test does not use ancillary data, but can be implemented here. + Client connects to Server. Both Client and Server verify that + credentials of the peer are correct using LOCAL_PEERCRED socket option. For SOCK_DGRAM sockets: ---------------------- 1: Sending, receiving cmsgcred - Client sends to Server two messages with data and control message - with SCM_CREDS type to Server. Server should receive two messages, - in both messages there should be data and control message with - SCM_CREDS type followed by struct cmsgcred{} and this structure - should contain correct information. + Client connects to Server and sends N messages with SCM_CREDS ancillary + data object. Server should receive N messages, each message should + have SCM_CREDS ancillary data object followed by struct cmsgcred{}. 2: Receiving sockcred - Server creates datagram socket and set socket option LOCAL_CREDS - for it. Client sends two messages with data to Server. Server should - receive two messages, in both messages there should be data and control - message with SCM_CREDS type followed by struct sockcred{} and this - structure should contain correct information. + Server creates datagram socket and sets the LOCAL_CREDS socket option + for it. Client sends N messages to Server. Server should receive N + messages, each message should have SCM_CREDS ancillary data object + followed by struct sockcred{}. 
3: Sending cmsgcred, receiving sockcred - - Server creates datagram socket and set socket option LOCAL_CREDS - for it. Client sends one message with data and control message with - SOCK_CREDS type to Server. Server should receive one message with - data and control message with SCM_CREDS type followed by struct - sockcred{} and this structure should contain correct information. - - 4: Sending, receiving timestamp - - Client sends message with data and control message with SCM_TIMESTAMP - type to Server. Server should receive message with data and control - message with SCM_TIMESTAMP type followed by struct timeval{}. + + Server creates datagram socket and sets the LOCAL_CREDS socket option + for it. Client sends N messages with SCM_CREDS ancillary data object + to Server. Server should receive N messages, the first message should + have SCM_CREDS ancillary data object followed by struct sockcred{}, + each of next messages should have SCM_CREDS ancillary data object + followed by struct cmsgcred{}. + + 4: Sending, receiving timeval + + Client sends one message with SCM_TIMESTAMP ancillary data object + to Server. Server should receive one message with SCM_TIMESTAMP + ancillary data object followed by struct timeval{}. + + 5: Sending, receiving bintime + + Client sends one message with SCM_BINTIME ancillary data object + to Server. Server should receive one message with SCM_BINTIME + ancillary data object followed by struct bintime{}. + + 6: Checking cmsghdr.cmsg_len + + Client tries to send Server several messages with SCM_CREDS ancillary + data object that has wrong cmsg_len field in its struct cmsghdr{}. + All these attempts should fail, since cmsg_len in all requests is less + than CMSG_LEN(0). 
- Andrey Simonenko -simon@comsys.ntu-kpi.kiev.ua +andreysimonenko@users.sourceforge.net diff -ruNp unix_cmsg.orig/unix_cmsg.c unix_cmsg/unix_cmsg.c --- unix_cmsg.orig/unix_cmsg.c 2012-11-20 11:26:18.000000000 +0200 +++ unix_cmsg/unix_cmsg.c 2013-02-08 11:42:08.000000000 +0200 @@ -27,48 +27,46 @@ #include __FBSDID("$FreeBSD: src/tools/regression/sockets/unix_cmsg/unix_cmsg.c,v 1.5 2012/11/19 22:59:17 svnexp Exp $"); -#include +#include #include #include +#include #include +#include #include #include -#include #include #include #include +#include #include #include -#include +#include #include #include +#include #include #include #include #include -#include #include /* * There are tables with tests descriptions and pointers to test * functions. Each t_*() function returns 0 if its test passed, - * -1 if its test failed (something wrong was found in local domain - * control messages), -2 if some system error occurred. If test - * function returns -2, then a program exits. + * -1 if its test failed, -2 if some system error occurred. + * If a test function returns -2, then a program exits. * - * Each test function completely control what to do (eg. fork or - * do not fork a client process). If a test function forks a client - * process, then it waits for its termination. If a return code of a - * client process is not equal to zero, or if a client process was - * terminated by a signal, then test function returns -2. + * If a test function forks a client process, then it waits for its + * termination. If a return code of a client process is not equal + * to zero, or if a client process was terminated by a signal, then + * a test function returns -1 or -2 depending on exit status of + * a client process. * - * Each test function and complete program are not optimized - * a lot to allow easy to modify tests. 
- *
- * Each function which can block, is run under TIMEOUT, if timeout
- * occurs, then test function returns -2 or a client process exits
- * with nonzero return code.
+ * Each function which can block, is run under TIMEOUT. If timeout
+ * occurs, then a test function returns -2 or a client process exits
+ * with a non-zero return code.
  */
 #ifndef LISTENQ
@@ -76,207 +74,290 @@ __FBSDID("$FreeBSD: src/tools/regression
 #endif
 #ifndef TIMEOUT
-# define TIMEOUT 60
+# define TIMEOUT 3
 #endif
-#define EXTRA_CMSG_SPACE 512 /* Memory for not expected control data. */
-
-static int t_cmsgcred(void), t_sockcred_stream1(void);
-static int t_sockcred_stream2(void), t_cmsgcred_sockcred(void);
-static int t_sockcred_dgram(void), t_timestamp(void);
+static int t_cmsgcred(void);
+static int t_sockcred_1(void);
+static int t_sockcred_2(void);
+static int t_cmsgcred_sockcred(void);
+static int t_timeval(void);
+static int t_bintime(void);
+static int t_cmsg_len(void);
+static int t_peercred(void);
 struct test_func {
-	int (*func)(void); /* Pointer to function. */
-	const char *desc; /* Test description. */
-};
-
-static struct test_func test_stream_tbl[] = {
-	{ NULL, " 0: All tests" },
-	{ t_cmsgcred, " 1: Sending, receiving cmsgcred" },
-	{ t_sockcred_stream1, " 2: Receiving sockcred (listening socket has LOCAL_CREDS)" },
-	{ t_sockcred_stream2, " 3: Receiving sockcred (accepted socket has LOCAL_CREDS)" },
-	{ t_cmsgcred_sockcred, " 4: Sending cmsgcred, receiving sockcred" },
-	{ t_timestamp, " 5: Sending, receiving timestamp" },
-	{ NULL, NULL }
+	int (*func)(void);
+	const char *desc;
 };
-static struct test_func test_dgram_tbl[] = {
-	{ NULL, " 0: All tests" },
-	{ t_cmsgcred, " 1: Sending, receiving cmsgcred" },
-	{ t_sockcred_dgram, " 2: Receiving sockcred" },
-	{ t_cmsgcred_sockcred, " 3: Sending cmsgcred, receiving sockcred" },
-	{ t_timestamp, " 4: Sending, receiving timestamp" },
-	{ NULL, NULL }
+static const struct test_func test_stream_tbl[] = {
+	{
+	  .func = NULL,
+	  .desc = "All tests"
+	},
+	{
+	  .func = t_cmsgcred,
+	  .desc = "Sending, receiving cmsgcred"
+	},
+	{
+	  .func = t_sockcred_1,
+	  .desc = "Receiving sockcred (listening socket)"
+	},
+	{
+	  .func = t_sockcred_2,
+	  .desc = "Receiving sockcred (accepted socket)"
+	},
+	{
+	  .func = t_cmsgcred_sockcred,
+	  .desc = "Sending cmsgcred, receiving sockcred"
+	},
+	{
+	  .func = t_timeval,
+	  .desc = "Sending, receiving timeval"
+	},
+	{
+	  .func = t_bintime,
+	  .desc = "Sending, receiving bintime"
+	},
+	{
+	  .func = t_cmsg_len,
+	  .desc = "Check cmsghdr.cmsg_len"
+	},
+	{
+	  .func = t_peercred,
+	  .desc = "Check LOCAL_PEERCRED socket option"
+	}
 };
-#define TEST_STREAM_NO_MAX (sizeof(test_stream_tbl) / sizeof(struct test_func) - 2)
-#define TEST_DGRAM_NO_MAX (sizeof(test_dgram_tbl) / sizeof(struct test_func) - 2)
-
-static const char *myname = "SERVER"; /* "SERVER" or "CLIENT" */
-
-static int debug = 0; /* 1, if -d. */
-static int no_control_data = 0; /* 1, if -z. */
-
-static u_int nfailed = 0; /* Number of failed tests. */
+#define TEST_STREAM_TBL_SIZE \
+	(sizeof(test_stream_tbl) / sizeof(test_stream_tbl[0]))
-static int sock_type; /* SOCK_STREAM or SOCK_DGRAM */
-static const char *sock_type_str; /* "SOCK_STREAM" or "SOCK_DGRAN" */
-
-static char tempdir[] = "/tmp/unix_cmsg.XXXXXXX";
-static char serv_sock_path[PATH_MAX];
-
-static char ipc_message[] = "hello";
-
-#define IPC_MESSAGE_SIZE (sizeof(ipc_message))
-
-static struct sockaddr_un servaddr; /* Server address. */
-
-static sigjmp_buf env_alrm;
+static const struct test_func test_dgram_tbl[] = {
+	{
+	  .func = NULL,
+	  .desc = "All tests"
+	},
+	{
+	  .func = t_cmsgcred,
+	  .desc = "Sending, receiving cmsgcred"
+	},
+	{
+	  .func = t_sockcred_2,
+	  .desc = "Receiving sockcred"
+	},
+	{
+	  .func = t_cmsgcred_sockcred,
+	  .desc = "Sending cmsgcred, receiving sockcred"
+	},
+	{
+	  .func = t_timeval,
+	  .desc = "Sending, receiving timeval"
+	},
+	{
+	  .func = t_bintime,
+	  .desc = "Sending, receiving bintime"
+	},
+	{
+	  .func = t_cmsg_len,
+	  .desc = "Check cmsghdr.cmsg_len"
+	}
+};
-static uid_t my_uid;
-static uid_t my_euid;
-static gid_t my_gid;
-static gid_t my_egid;
+#define TEST_DGRAM_TBL_SIZE \
+	(sizeof(test_dgram_tbl) / sizeof(test_dgram_tbl[0]))
-/*
- * my_gids[0] is EGID, next items are supplementary GIDs,
- * my_ngids determines valid items in my_gids array.
- */
-static gid_t my_gids[NGROUPS_MAX];
-static int my_ngids;
+static bool debug = false;
+static bool server_flag = true;
+static bool send_data_flag = true;
+static bool send_array_flag = true;
+static bool failed_flag = false;
+
+static int sock_type;
+static const char *sock_type_str;
+
+static const char *proc_name;
+
+static char tempdir[] = _PATH_TMP "unix_cmsg.XXXXXXX";
+static int serv_sock_fd;
+static struct sockaddr_un serv_addr_sun;
+
+static struct {
+	char *buf_send;
+	char *buf_recv;
+	size_t buf_size;
+	u_int msg_num;
+} ipc_msg;
+
+#define IPC_MSG_NUM_DEF 5
+#define IPC_MSG_NUM_MAX 10
+#define IPC_MSG_SIZE_DEF 7
+#define IPC_MSG_SIZE_MAX 128
+
+static struct {
+	uid_t uid;
+	uid_t euid;
+	gid_t gid;
+	gid_t egid;
+	gid_t *gid_arr;
+	int gid_num;
+} proc_cred;
+
+static pid_t client_pid;
+
+#define SYNC_SERVER 0
+#define SYNC_CLIENT 1
+#define SYNC_RECV 0
+#define SYNC_SEND 1
-static pid_t client_pid; /* PID of forked client. */
+static int sync_fd[2][2];
-#define dbgmsg(x) do { \
-	if (debug) \
-		logmsgx x ; \
-} while (/* CONSTCOND */0)
+#define LOGMSG_SIZE 128
 static void logmsg(const char *, ...) __printflike(1, 2);
 static void logmsgx(const char *, ...) __printflike(1, 2);
+static void dbgmsg(const char *, ...) __printflike(1, 2);
 static void output(const char *, ...) __printflike(1, 2);
-extern char *__progname; /* The name of program. */
-
-/*
- * Output the help message (-h switch).
- */
 static void
-usage(int quick)
+usage(bool verbose)
 {
-	const struct test_func *test_func;
+	u_int i;
-	fprintf(stderr, "Usage: %s [-dhz] [-t ] [testno]\n",
-	    __progname);
-	if (quick)
+	printf("usage: %s [-dh] [-n num] [-s size] [-t type] "
+	    "[-z value] [testno]\n", getprogname());
+	if (!verbose)
 		return;
-	fprintf(stderr, "\n Options are:\n\
- -d\t\t\tOutput debugging information\n\
- -h\t\t\tOutput this help message and exit\n\
- -t \t\tRun test only for the given socket type:\n\
-\t\t\tstream or dgram\n\
- -z\t\t\tDo not send real control data if possible\n\n");
-	fprintf(stderr, " Available tests for stream sockets:\n");
-	for (test_func = test_stream_tbl; test_func->desc != NULL; ++test_func)
-		fprintf(stderr, " %s\n", test_func->desc);
-	fprintf(stderr, "\n Available tests for datagram sockets:\n");
-	for (test_func = test_dgram_tbl; test_func->desc != NULL; ++test_func)
-		fprintf(stderr, " %s\n", test_func->desc);
+	printf("\n Options are:\n\
+ -d Output debugging information\n\
+ -h Output the help message and exit\n\
+ -n num Number of messages to send\n\
+ -s size Specify size of data for IPC\n\
+ -t type Specify socket type (stream, dgram) for tests\n\
+ -z value Do not send data in a message (bit 0x1), do not send\n\
+ data array associated with a cmsghdr structure (bit 0x2)\n\
+ testno Run one test by its number (require the -t option)\n\n");
+	printf(" Available tests for stream sockets:\n");
+	for (i = 0; i < TEST_STREAM_TBL_SIZE; ++i)
+		printf(" %u: %s\n", i, test_stream_tbl[i].desc);
+	printf("\n Available tests for datagram sockets:\n");
+	for (i = 0; i < TEST_DGRAM_TBL_SIZE; ++i)
+		printf(" %u: %s\n", i, test_dgram_tbl[i].desc);
 }
-/*
- * printf-like function for outputting to STDOUT_FILENO.
- */
 static void
 output(const char *format, ...)
 {
-	char buf[128];
+	char buf[LOGMSG_SIZE];
 	va_list ap;
 	va_start(ap, format);
 	if (vsnprintf(buf, sizeof(buf), format, ap) < 0)
-		err(EX_SOFTWARE, "output: vsnprintf failed");
+		err(EXIT_FAILURE, "output: vsnprintf failed");
 	write(STDOUT_FILENO, buf, strlen(buf));
 	va_end(ap);
 }
-/*
- * printf-like function for logging, also outputs message for errno.
- */
 static void
 logmsg(const char *format, ...)
 {
-	char buf[128];
+	char buf[LOGMSG_SIZE];
 	va_list ap;
 	int errno_save;
-	errno_save = errno; /* Save errno. */
-
+	errno_save = errno;
 	va_start(ap, format);
 	if (vsnprintf(buf, sizeof(buf), format, ap) < 0)
-		err(EX_SOFTWARE, "logmsg: vsnprintf failed");
+		err(EXIT_FAILURE, "logmsg: vsnprintf failed");
 	if (errno_save == 0)
-		output("%s: %s\n", myname, buf);
+		output("%s: %s\n", proc_name, buf);
 	else
-		output("%s: %s: %s\n", myname, buf, strerror(errno_save));
+		output("%s: %s: %s\n", proc_name, buf, strerror(errno_save));
 	va_end(ap);
+	errno = errno_save;
+}
+
+static void
+vlogmsgx(const char *format, va_list ap)
+{
+	char buf[LOGMSG_SIZE];
+
+	if (vsnprintf(buf, sizeof(buf), format, ap) < 0)
+		err(EXIT_FAILURE, "logmsgx: vsnprintf failed");
+	output("%s: %s\n", proc_name, buf);
-	errno = errno_save; /* Restore errno. */
 }
-/*
- * printf-like function for logging, do not output message for errno.
- */
 static void
 logmsgx(const char *format, ...)
 {
-	char buf[128];
 	va_list ap;
 	va_start(ap, format);
-	if (vsnprintf(buf, sizeof(buf), format, ap) < 0)
-		err(EX_SOFTWARE, "logmsgx: vsnprintf failed");
-	output("%s: %s\n", myname, buf);
+	vlogmsgx(format, ap);
 	va_end(ap);
 }
-/*
- * Run tests from testno1 to testno2.
- */
+static void
+dbgmsg(const char *format, ...)
+{
+	va_list ap;
+
+	if (debug) {
+		va_start(ap, format);
+		vlogmsgx(format, ap);
+		va_end(ap);
+	}
+}
+
 static int
-run_tests(u_int testno1, u_int testno2)
+run_tests(int type, u_int testno1)
 {
-	const struct test_func *test_func;
-	u_int i, nfailed1;
+	const struct test_func *tf;
+	u_int i, testno2, failed_num;
-	output("Running tests for %s sockets:\n", sock_type_str);
-	test_func = (sock_type == SOCK_STREAM ?
-	    test_stream_tbl : test_dgram_tbl) + testno1;
+	sock_type = type;
+	if (type == SOCK_STREAM) {
+		sock_type_str = "SOCK_STREAM";
+		tf = test_stream_tbl;
+		i = TEST_STREAM_TBL_SIZE - 1;
+	} else {
+		sock_type_str = "SOCK_DGRAM";
+		tf = test_dgram_tbl;
+		i = TEST_DGRAM_TBL_SIZE - 1;
+	}
+	if (testno1 == 0) {
+		testno1 = 1;
+		testno2 = i;
+	} else
+		testno2 = testno1;
-	nfailed1 = 0;
-	for (i = testno1; i <= testno2; ++test_func, ++i) {
-		output(" %s\n", test_func->desc);
-		switch (test_func->func()) {
+	output("Running tests for %s sockets:\n", sock_type_str);
+	failed_num = 0;
+	for (i = testno1, tf += testno1; i <= testno2; ++tf, ++i) {
+		output(" %u: %s\n", i, tf->desc);
+		switch (tf->func()) {
 		case -1:
-			++nfailed1;
+			++failed_num;
 			break;
 		case -2:
-			logmsgx("some system error occurred, exiting");
+			logmsgx("some system error or timeout occurred");
 			return (-1);
 		}
 	}
-	nfailed += nfailed1;
+	if (failed_num != 0)
+		failed_flag = true;
 	if (testno1 != testno2) {
-		if (nfailed1 == 0)
-			output("-- all tests were passed!\n");
+		if (failed_num == 0)
+			output("-- all tests passed!\n");
 		else
-			output("-- %u test%s failed!\n", nfailed1,
-			    nfailed1 == 1 ? "" : "s");
+			output("-- %u test%s failed!\n",
+			    failed_num, failed_num == 1 ? "" : "s");
 	} else {
-		if (nfailed == 0)
-			output("-- test was passed!\n");
+		if (failed_num == 0)
+			output("-- test passed!\n");
 		else
 			output("-- test failed!\n");
 	}
@@ -284,183 +365,325 @@ run_tests(u_int testno1, u_int testno2)
 	return (0);
 }
-/* ARGSUSED */
-static void
-sig_alrm(int signo __unused)
+static int
+init(void)
+{
+	struct sigaction sigact;
+	size_t idx;
+	int rv;
+
+	proc_name = "SERVER";
+
+	sigact.sa_handler = SIG_IGN;
+	sigact.sa_flags = 0;
+	sigemptyset(&sigact.sa_mask);
+	if (sigaction(SIGPIPE, &sigact, (struct sigaction *)NULL) < 0) {
+		logmsg("init: sigaction");
+		return (-1);
+	}
+
+	if (ipc_msg.buf_size == 0)
+		ipc_msg.buf_send = ipc_msg.buf_recv = NULL;
+	else {
+		ipc_msg.buf_send = malloc(ipc_msg.buf_size);
+		ipc_msg.buf_recv = malloc(ipc_msg.buf_size);
+		if (ipc_msg.buf_send == NULL || ipc_msg.buf_recv == NULL) {
+			logmsg("init: malloc");
+			return (-1);
+		}
+		for (idx = 0; idx < ipc_msg.buf_size; ++idx)
+			ipc_msg.buf_send[idx] = (char)idx;
+	}
+
+	proc_cred.uid = getuid();
+	proc_cred.euid = geteuid();
+	proc_cred.gid = getgid();
+	proc_cred.egid = getegid();
+	proc_cred.gid_num = getgroups(0, (gid_t *)NULL);
+	if (proc_cred.gid_num < 0) {
+		logmsg("init: getgroups");
+		return (-1);
+	}
+	proc_cred.gid_arr = malloc(proc_cred.gid_num *
+	    sizeof(*proc_cred.gid_arr));
+	if (proc_cred.gid_arr == NULL) {
+		logmsg("init: malloc");
+		return (-1);
+	}
+	if (getgroups(proc_cred.gid_num, proc_cred.gid_arr) < 0) {
+		logmsg("init: getgroups");
+		return (-1);
+	}
+
+	memset(&serv_addr_sun, 0, sizeof(serv_addr_sun));
+	rv = snprintf(serv_addr_sun.sun_path, sizeof(serv_addr_sun.sun_path),
+	    "%s/%s", tempdir, proc_name);
+	if (rv < 0) {
+		logmsg("init: snprintf");
+		return (-1);
+	}
+	if ((size_t)rv >= sizeof(serv_addr_sun.sun_path)) {
+		logmsgx("init: not enough space for socket pathname");
+		return (-1);
+	}
+	serv_addr_sun.sun_family = PF_LOCAL;
+	serv_addr_sun.sun_len = SUN_LEN(&serv_addr_sun);
+
+	return (0);
+}
+
+static int
+client_fork(void)
 {
-
-	siglongjmp(env_alrm, 1);
+	int fd1, fd2;
+
+	if (pipe(sync_fd[SYNC_SERVER]) < 0 ||
+	    pipe(sync_fd[SYNC_CLIENT]) < 0) {
+		logmsg("client_fork: pipe");
+		return (-1);
+	}
+	client_pid = fork();
+	if (client_pid == (pid_t)-1) {
+		logmsg("client_fork: fork");
+		return (-1);
+	}
+	if (client_pid == 0) {
+		proc_name = "CLIENT";
+		server_flag = false;
+		fd1 = sync_fd[SYNC_SERVER][SYNC_RECV];
+		fd2 = sync_fd[SYNC_CLIENT][SYNC_SEND];
+	} else {
+		fd1 = sync_fd[SYNC_SERVER][SYNC_SEND];
+		fd2 = sync_fd[SYNC_CLIENT][SYNC_RECV];
+	}
+	if (close(fd1) < 0 || close(fd2) < 0) {
+		logmsg("client_fork: close");
+		return (-1);
+	}
+	return (client_pid != 0);
 }
-/*
- * Initialize signals handlers.
- */
 static void
-sig_init(void)
+client_exit(int rv)
+{
+	if (close(sync_fd[SYNC_SERVER][SYNC_SEND]) < 0 ||
+	    close(sync_fd[SYNC_CLIENT][SYNC_RECV]) < 0) {
+		logmsg("client_exit: close");
+		rv = -1;
+	}
+	rv = rv == 0 ? EXIT_SUCCESS : -rv;
+	dbgmsg("exit: code %d", rv);
+	_exit(rv);
+}
+
+static int
+client_wait(void)
 {
-	struct sigaction sa;
+	int status;
+	pid_t pid;
-	sa.sa_handler = SIG_IGN;
-	sigemptyset(&sa.sa_mask);
-	sa.sa_flags = 0;
-	if (sigaction(SIGPIPE, &sa, (struct sigaction *)NULL) < 0)
-		err(EX_OSERR, "sigaction(SIGPIPE)");
-
-	sa.sa_handler = sig_alrm;
-	if (sigaction(SIGALRM, &sa, (struct sigaction *)NULL) < 0)
-		err(EX_OSERR, "sigaction(SIGALRM)");
+	dbgmsg("waiting for client");
+
+	if (close(sync_fd[SYNC_SERVER][SYNC_RECV]) < 0 ||
+	    close(sync_fd[SYNC_CLIENT][SYNC_SEND]) < 0) {
+		logmsg("client_wait: close");
+		return (-1);
+	}
+
+	pid = waitpid(client_pid, &status, 0);
+	if (pid == (pid_t)-1) {
+		logmsg("client_wait: waitpid");
+		return (-1);
+	}
+
+	if (WIFEXITED(status)) {
+		if (WEXITSTATUS(status) != EXIT_SUCCESS) {
+			logmsgx("client exit status is %d",
+			    WEXITSTATUS(status));
+			return (-WEXITSTATUS(status));
+		}
+	} else {
+		if (WIFSIGNALED(status))
+			logmsgx("abnormal termination of client, signal %d%s",
+			    WTERMSIG(status), WCOREDUMP(status) ?
+			    " (core file generated)" : "");
+		else
+			logmsgx("termination of client, unknown status");
+		return (-1);
+	}
+
+	return (0);
 }
 int
 main(int argc, char *argv[])
 {
 	const char *errstr;
-	int opt, dgramflag, streamflag;
-	u_int testno1, testno2;
-
-	dgramflag = streamflag = 0;
-	while ((opt = getopt(argc, argv, "dht:z")) != -1)
+	u_int testno, zvalue;
+	int opt, rv;
+	bool dgram_flag, stream_flag;
+
+	ipc_msg.buf_size = IPC_MSG_SIZE_DEF;
+	ipc_msg.msg_num = IPC_MSG_NUM_DEF;
+	dgram_flag = stream_flag = false;
+	while ((opt = getopt(argc, argv, "dhn:s:t:z:")) != -1)
 		switch (opt) {
 		case 'd':
-			debug = 1;
+			debug = true;
 			break;
 		case 'h':
-			usage(0);
-			return (EX_OK);
+			usage(true);
+			return (EXIT_SUCCESS);
+		case 'n':
+			ipc_msg.msg_num = strtonum(optarg, 1,
+			    IPC_MSG_NUM_MAX, &errstr);
+			if (errstr != NULL)
+				errx(EXIT_FAILURE, "option -n: number is %s",
+				    errstr);
+			break;
+		case 's':
+			ipc_msg.buf_size = strtonum(optarg, 0,
+			    IPC_MSG_SIZE_MAX, &errstr);
+			if (errstr != NULL)
+				errx(EXIT_FAILURE, "option -s: number is %s",
+				    errstr);
+			break;
 		case 't':
 			if (strcmp(optarg, "stream") == 0)
-				streamflag = 1;
+				stream_flag = true;
 			else if (strcmp(optarg, "dgram") == 0)
-				dgramflag = 1;
+				dgram_flag = true;
 			else
-				errx(EX_USAGE, "wrong socket type in -t option");
+				errx(EXIT_FAILURE, "option -t: "
+				    "wrong socket type");
 			break;
 		case 'z':
-			no_control_data = 1;
+			zvalue = strtonum(optarg, 0, 3, &errstr);
+			if (errstr != NULL)
+				errx(EXIT_FAILURE, "option -z: number is %s",
+				    errstr);
+			if (zvalue & 0x1)
+				send_data_flag = false;
+			if (zvalue & 0x2)
+				send_array_flag = false;
 			break;
-		case '?':
 		default:
-			usage(1);
-			return (EX_USAGE);
+			usage(false);
+			return (EXIT_FAILURE);
 		}
 	if (optind < argc) {
 		if (optind + 1 != argc)
-			errx(EX_USAGE, "too many arguments");
-		testno1 = strtonum(argv[optind], 0, UINT_MAX, &errstr);
+			errx(EXIT_FAILURE, "too many arguments");
+		testno = strtonum(argv[optind], 0, UINT_MAX, &errstr);
 		if (errstr != NULL)
-			errx(EX_USAGE, "wrong test number: %s", errstr);
+			errx(EXIT_FAILURE, "test number is %s", errstr);
 	} else
-		testno1 = 0;
-
-	if (dgramflag == 0 && streamflag == 0)
-		dgramflag = streamflag = 1;
+		testno = 0;
-	if (dgramflag && streamflag && testno1 != 0)
-		errx(EX_USAGE, "you can use particular test, only with datagram or stream sockets");
+	if (!dgram_flag && !stream_flag)
+		dgram_flag = stream_flag = true;
-	if (streamflag) {
-		if (testno1 > TEST_STREAM_NO_MAX)
-			errx(EX_USAGE, "given test %u for stream sockets does not exist",
-			    testno1);
+	if (dgram_flag && stream_flag && testno != 0)
+		errx(EXIT_FAILURE, "particular test can be used "
+		    "with the -t option only");
+
+	if (stream_flag) {
+		if (testno >= TEST_STREAM_TBL_SIZE)
+			errx(EXIT_FAILURE, "given test %u for stream "
+			    "sockets does not exist", testno);
 	} else {
-		if (testno1 > TEST_DGRAM_NO_MAX)
-			errx(EX_USAGE, "given test %u for datagram sockets does not exist",
-			    testno1);
-	}
-
-	my_uid = getuid();
-	my_euid = geteuid();
-	my_gid = getgid();
-	my_egid = getegid();
-	switch (my_ngids = getgroups(sizeof(my_gids) / sizeof(my_gids[0]), my_gids)) {
-	case -1:
-		err(EX_SOFTWARE, "getgroups");
-		/* NOTREACHED */
-	case 0:
-		errx(EX_OSERR, "getgroups returned 0 groups");
+		if (testno >= TEST_DGRAM_TBL_SIZE)
+			errx(EXIT_FAILURE, "given test %u for datagram "
+			    "sockets does not exist", testno);
 	}
-	sig_init();
-
 	if (mkdtemp(tempdir) == NULL)
-		err(EX_OSERR, "mkdtemp");
+		err(EXIT_FAILURE, "mkdtemp");
-	if (streamflag) {
-		sock_type = SOCK_STREAM;
-		sock_type_str = "SOCK_STREAM";
-		if (testno1 == 0) {
-			testno1 = 1;
-			testno2 = TEST_STREAM_NO_MAX;
-		} else
-			testno2 = testno1;
-		if (run_tests(testno1, testno2) < 0)
-			goto failed;
-		testno1 = 0;
-	}
+	if (init() < 0)
+		return (EXIT_FAILURE);
-	if (dgramflag) {
-		sock_type = SOCK_DGRAM;
-		sock_type_str = "SOCK_DGRAM";
-		if (testno1 == 0) {
-			testno1 = 1;
-			testno2 = TEST_DGRAM_NO_MAX;
-		} else
-			testno2 = testno1;
-		if (run_tests(testno1, testno2) < 0)
-			goto failed;
-	}
+	rv = EXIT_SUCCESS;
+	if (stream_flag)
+		if (run_tests(SOCK_STREAM, testno) < 0)
+			rv = EXIT_FAILURE;
+	if (dgram_flag && rv == EXIT_SUCCESS)
+		if (run_tests(SOCK_DGRAM, testno) < 0)
+			rv = EXIT_FAILURE;
 	if (rmdir(tempdir) < 0) {
 		logmsg("rmdir(%s)", tempdir);
-		return (EX_OSERR);
+		rv = EXIT_FAILURE;
 	}
-	return (nfailed ? EX_OSERR : EX_OK);
+	return (failed_flag ? EXIT_FAILURE : rv);
+}
-failed:
-	if (rmdir(tempdir) < 0)
-		logmsg("rmdir(%s)", tempdir);
-	return (EX_OSERR);
+static int
+socket_close(int fd)
+{
+	int rv;
+
+	rv = 0;
+	if (close(fd) < 0) {
+		logmsg("socket_close: close");
+		rv = -1;
+	}
+	if (server_flag && fd == serv_sock_fd)
+		if (unlink(serv_addr_sun.sun_path) < 0) {
+			logmsg("socket_close: unlink(%s)",
+			    serv_addr_sun.sun_path);
+			rv = -1;
+		}
+	return (rv);
 }
-/*
- * Create PF_LOCAL socket, if sock_path is not equal to NULL, then
- * bind() it. Return socket address in addr. Return file descriptor
- * or -1 if some error occurred.
- */
 static int
-create_socket(char *sock_path, size_t sock_path_len, struct sockaddr_un *addr)
+socket_create(void)
 {
-	int rv, fd;
+	struct timeval tv;
+	int fd;
-	if ((fd = socket(PF_LOCAL, sock_type, 0)) < 0) {
-		logmsg("create_socket: socket(PF_LOCAL, %s, 0)", sock_type_str);
+	fd = socket(PF_LOCAL, sock_type, 0);
+	if (fd < 0) {
+		logmsg("socket_create: socket(PF_LOCAL, %s, 0)", sock_type_str);
 		return (-1);
 	}
+	if (server_flag)
+		serv_sock_fd = fd;
-	if (sock_path != NULL) {
-		if ((rv = snprintf(sock_path, sock_path_len, "%s/%s",
-		    tempdir, myname)) < 0) {
-			logmsg("create_socket: snprintf failed");
-			goto failed;
-		}
-		if ((size_t)rv >= sock_path_len) {
-			logmsgx("create_socket: too long path name for given buffer");
-			goto failed;
-		}
+	tv.tv_sec = TIMEOUT;
+	tv.tv_usec = 0;
+	if (setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) < 0 ||
+	    setsockopt(fd, SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof(tv)) < 0) {
+		logmsg("socket_create: setsockopt(SO_RCVTIMEO/SO_SNDTIMEO)");
+		goto failed;
+	}
-		memset(addr, 0, sizeof(*addr));
-
-		addr->sun_family = AF_LOCAL;
-		if (strlen(sock_path) >= sizeof(addr->sun_path)) {
-			logmsgx("create_socket: too long path name (>= %lu) for local domain socket",
-			    (u_long)sizeof(addr->sun_path));
+	if (server_flag) {
+		if (bind(fd, (struct sockaddr *)&serv_addr_sun,
+		    serv_addr_sun.sun_len) < 0) {
+			logmsg("socket_create: bind(%s)",
+			    serv_addr_sun.sun_path);
 			goto failed;
 		}
-		strcpy(addr->sun_path, sock_path);
+		if (sock_type == SOCK_STREAM) {
+			int val;
-		if (bind(fd, (struct sockaddr *)addr, SUN_LEN(addr)) < 0) {
-			logmsg("create_socket: bind(%s)", sock_path);
-			goto failed;
+			if (listen(fd, LISTENQ) < 0) {
+				logmsg("socket_create: listen");
+				goto failed;
+			}
+			val = fcntl(fd, F_GETFL, 0);
+			if (val < 0) {
+				logmsg("socket_create: fcntl(F_GETFL)");
+				goto failed;
+			}
+			if (fcntl(fd, F_SETFL, val | O_NONBLOCK) < 0) {
+				logmsg("socket_create: fcntl(F_SETFL)");
+				goto failed;
+			}
 		}
 	}
@@ -468,1163 +691,1282 @@ create_socket(char *sock_path, size_t so
 failed:
 	if (close(fd) < 0)
-		logmsg("create_socket: close");
+		logmsg("socket_create: close");
+	if (server_flag)
+		if (unlink(serv_addr_sun.sun_path) < 0)
+			logmsg("socket_close: unlink(%s)",
+			    serv_addr_sun.sun_path);
 	return (-1);
 }
-/*
- * Call create_socket() for server listening socket.
- * Return socket descriptor or -1 if some error occurred.
- */
 static int
-create_server_socket(void)
+socket_connect(int fd)
 {
-	return (create_socket(serv_sock_path, sizeof(serv_sock_path), &servaddr));
-}
+	dbgmsg("connect");
-/*
- * Create unbound socket.
- */
-static int
-create_unbound_socket(void)
-{
-	return (create_socket((char *)NULL, 0, (struct sockaddr_un *)NULL));
+	if (connect(fd, (struct sockaddr *)&serv_addr_sun,
+	    serv_addr_sun.sun_len) < 0) {
+		logmsg("socket_connect: connect(%s)", serv_addr_sun.sun_path);
+		return (-1);
+	}
+	return (0);
 }
-/*
- * Close socket descriptor, if sock_path is not equal to NULL,
- * then unlink the given path.
- */
 static int
-close_socket(const char *sock_path, int fd)
+sync_recv(void)
 {
-	int error = 0;
+	ssize_t ssize;
+	int fd;
+	char buf;
-	if (close(fd) < 0) {
-		logmsg("close_socket: close");
-		error = -1;
-	}
-	if (sock_path != NULL)
-		if (unlink(sock_path) < 0) {
-			logmsg("close_socket: unlink(%s)", sock_path);
-			error = -1;
-		}
-	return (error);
-}
+	dbgmsg("sync: wait");
-/*
- * Connect to server (socket address in servaddr).
- */
-static int
-connect_server(int fd)
-{
-	dbgmsg(("connecting to %s", serv_sock_path));
+	fd = sync_fd[server_flag ? SYNC_SERVER : SYNC_CLIENT][SYNC_RECV];
-	/*
-	 * If PF_LOCAL listening socket's queue is full, then connect()
-	 * returns ECONNREFUSED immediately, do not need timeout.
-	 */
-	if (connect(fd, (struct sockaddr *)&servaddr, sizeof(servaddr)) < 0) {
-		logmsg("connect_server: connect(%s)", serv_sock_path);
+	ssize = read(fd, &buf, 1);
+	if (ssize < 0) {
+		logmsg("sync_recv: read");
+		return (-1);
+	}
+	if (ssize < 1) {
+		logmsgx("sync_recv: read %zd of 1 byte", ssize);
 		return (-1);
 	}
+	dbgmsg("sync: received");
+
 	return (0);
 }
-/*
- * sendmsg() with timeout.
- */
 static int
-sendmsg_timeout(int fd, struct msghdr *msg, size_t n)
+sync_send(void)
 {
-	ssize_t nsent;
-
-	dbgmsg(("sending %lu bytes", (u_long)n));
-
-	if (sigsetjmp(env_alrm, 1) != 0) {
-		logmsgx("sendmsg_timeout: cannot send message to %s (timeout)", serv_sock_path);
-		return (-1);
-	}
-
-	(void)alarm(TIMEOUT);
+	ssize_t ssize;
+	int fd;
-	nsent = sendmsg(fd, msg, 0);
+	dbgmsg("sync: send");
-	(void)alarm(0);
+	fd = sync_fd[server_flag ? SYNC_CLIENT : SYNC_SERVER][SYNC_SEND];
-	if (nsent < 0) {
-		logmsg("sendmsg_timeout: sendmsg");
+	ssize = write(fd, "", 1);
+	if (ssize < 0) {
+		logmsg("sync_send: write");
 		return (-1);
 	}
-
-	if ((size_t)nsent != n) {
-		logmsgx("sendmsg_timeout: sendmsg: short send: %ld of %lu bytes",
-		    (long)nsent, (u_long)n);
+	if (ssize < 1) {
+		logmsgx("sync_send: sent %zd of 1 byte", ssize);
 		return (-1);
 	}
 	return (0);
 }
-/*
- * accept() with timeout.
- */
 static int
-accept_timeout(int listenfd)
+message_send(int fd, const struct msghdr *msghdr)
 {
-	int fd;
-
-	dbgmsg(("accepting connection"));
-
-	if (sigsetjmp(env_alrm, 1) != 0) {
-		logmsgx("accept_timeout: cannot accept connection (timeout)");
+	const struct cmsghdr *cmsghdr;
+	size_t size;
+	ssize_t ssize;
+
+	size = msghdr->msg_iov != 0 ? msghdr->msg_iov->iov_len : 0;
+	dbgmsg("send: data size %zu", size);
+	dbgmsg("send: msghdr.msg_controllen %u",
+	    (u_int)msghdr->msg_controllen);
+	cmsghdr = CMSG_FIRSTHDR(msghdr);
+	if (cmsghdr != NULL)
+		dbgmsg("send: cmsghdr.cmsg_len %u",
+		    (u_int)cmsghdr->cmsg_len);
+
+	ssize = sendmsg(fd, msghdr, 0);
+	if (ssize < 0) {
+		logmsg("message_send: sendmsg");
+		return (-1);
+	}
+	if ((size_t)ssize != size) {
+		logmsgx("message_send: sendmsg: sent %zd of %zu bytes",
+		    ssize, size);
 		return (-1);
 	}
-	(void)alarm(TIMEOUT);
+	if (!send_data_flag)
+		if (sync_send() < 0)
+			return (-1);
-	fd = accept(listenfd, (struct sockaddr *)NULL, (socklen_t *)NULL);
+	return (0);
+}
-	(void)alarm(0);
+static int
+message_sendn(int fd, struct msghdr *msghdr)
+{
+	u_int i;
-	if (fd < 0) {
-		logmsg("accept_timeout: accept");
-		return (-1);
+	for (i = 1; i <= ipc_msg.msg_num; ++i) {
+		dbgmsg("message #%u", i);
+		if (message_send(fd, msghdr) < 0)
+			return (-1);
 	}
-
-	return (fd);
+	return (0);
 }
-/*
- * recvmsg() with timeout.
- */
 static int
-recvmsg_timeout(int fd, struct msghdr *msg, size_t n)
+message_recv(int fd, struct msghdr *msghdr)
 {
-	ssize_t nread;
+	const struct cmsghdr *cmsghdr;
+	size_t size;
+	ssize_t ssize;
-	dbgmsg(("receiving %lu bytes", (u_long)n));
+	if (!send_data_flag)
+		if (sync_recv() < 0)
+			return (-1);
-	if (sigsetjmp(env_alrm, 1) != 0) {
-		logmsgx("recvmsg_timeout: cannot receive message (timeout)");
+	size = msghdr->msg_iov != NULL ? msghdr->msg_iov->iov_len : 0;
+	ssize = recvmsg(fd, msghdr, MSG_WAITALL);
+	if (ssize < 0) {
+		logmsg("message_recv: recvmsg");
 		return (-1);
 	}
-
-	(void)alarm(TIMEOUT);
-
-	nread = recvmsg(fd, msg, MSG_WAITALL);
-
-	(void)alarm(0);
-
-	if (nread < 0) {
-		logmsg("recvmsg_timeout: recvmsg");
+	if ((size_t)ssize != size) {
+		logmsgx("message_recv: recvmsg: received %zd of %zu bytes",
+		    ssize, size);
 		return (-1);
 	}
-	if ((size_t)nread != n) {
-		logmsgx("recvmsg_timeout: recvmsg: short read: %ld of %lu bytes",
-		    (long)nread, (u_long)n);
+	dbgmsg("recv: data size %zd", ssize);
+	dbgmsg("recv: msghdr.msg_controllen %u",
+	    (u_int)msghdr->msg_controllen);
+	cmsghdr = CMSG_FIRSTHDR(msghdr);
+	if (cmsghdr != NULL)
+		dbgmsg("recv: cmsghdr.cmsg_len %u",
+		    (u_int)cmsghdr->cmsg_len);
+
+	if (memcmp(ipc_msg.buf_recv, ipc_msg.buf_send, size) != 0) {
+		logmsgx("message_recv: received message has wrong content");
 		return (-1);
 	}
 	return (0);
 }
-/*
- * Wait for synchronization message (1 byte) with timeout.
- */
 static int
-sync_recv(int fd)
+socket_accept(int listenfd)
 {
-	ssize_t nread;
-	char buf;
-
-	dbgmsg(("waiting for sync message"));
-
-	if (sigsetjmp(env_alrm, 1) != 0) {
-		logmsgx("sync_recv: cannot receive sync message (timeout)");
+	fd_set rset;
+	struct timeval tv;
+	int fd, rv, val;
+
+	dbgmsg("accept");
+
+	FD_ZERO(&rset);
+	FD_SET(listenfd, &rset);
+	tv.tv_sec = TIMEOUT;
+	tv.tv_usec = 0;
+	rv = select(listenfd + 1, &rset, (fd_set *)NULL, (fd_set *)NULL, &tv);
+	if (rv < 0) {
+		logmsg("socket_accept: select");
 		return (-1);
 	}
-
-	(void)alarm(TIMEOUT);
-
-	nread = read(fd, &buf, 1);
-
-	(void)alarm(0);
-
-	if (nread < 0) {
-		logmsg("sync_recv: read");
+	if (rv == 0) {
+		logmsgx("socket_accept: select timeout");
 		return (-1);
 	}
-	if (nread != 1) {
-		logmsgx("sync_recv: read: short read: %ld of 1 byte",
-		    (long)nread);
+	fd = accept(listenfd, (struct sockaddr *)NULL, (socklen_t *)NULL);
+	if (fd < 0) {
+		logmsg("socket_accept: accept");
 		return (-1);
 	}
-	return (0);
+	val = fcntl(fd, F_GETFL, 0);
+	if (val < 0) {
+		logmsg("socket_accept: fcntl(F_GETFL)");
+		goto failed;
+	}
+	if (fcntl(fd, F_SETFL, val & ~O_NONBLOCK) < 0) {
+		logmsg("socket_accept: fcntl(F_SETFL)");
+		goto failed;
+	}
+
+	return (fd);
+
+failed:
+	if (close(fd) < 0)
+		logmsg("socket_accept: close");
+	return (-1);
 }
-/*
- * Send synchronization message (1 byte) with timeout.
- */
 static int
-sync_send(int fd)
+check_msghdr(const struct msghdr *msghdr, size_t size)
 {
-	ssize_t nsent;
-
-	dbgmsg(("sending sync message"));
-
-	if (sigsetjmp(env_alrm, 1) != 0) {
-		logmsgx("sync_send: cannot send sync message (timeout)");
+	if (msghdr->msg_flags & MSG_TRUNC) {
+		logmsgx("msghdr.msg_flags has MSG_TRUNC");
 		return (-1);
 	}
-
-	(void)alarm(TIMEOUT);
-
-	nsent = write(fd, "", 1);
-
-	(void)alarm(0);
-
-	if (nsent < 0) {
-		logmsg("sync_send: write");
+	if (msghdr->msg_flags & MSG_CTRUNC) {
+		logmsgx("msghdr.msg_flags has MSG_CTRUNC");
 		return (-1);
 	}
-
-	if (nsent != 1) {
-		logmsgx("sync_send: write: short write: %ld of 1 byte",
-		    (long)nsent);
+	if (msghdr->msg_controllen < size) {
+		logmsgx("msghdr.msg_controllen %u < %zu",
+		    (u_int)msghdr->msg_controllen, size);
+		return (-1);
+	}
+	if (msghdr->msg_controllen > 0 && size == 0) {
+		logmsgx("msghdr.msg_controllen %u > 0",
+		    (u_int)msghdr->msg_controllen);
 		return (-1);
 	}
-
 	return (0);
 }
-/*
- * waitpid() for client with timeout.
- */
 static int
-wait_client(void)
+check_cmsghdr(const struct cmsghdr *cmsghdr, int type, size_t size)
 {
-	int status;
-	pid_t pid;
-
-	if (sigsetjmp(env_alrm, 1) != 0) {
-		logmsgx("wait_client: cannot get exit status of client PID %ld (timeout)",
-		    (long)client_pid);
+	if (cmsghdr == NULL) {
+		logmsgx("cmsghdr is NULL");
 		return (-1);
 	}
-
-	(void)alarm(TIMEOUT);
-
-	pid = waitpid(client_pid, &status, 0);
-
-	(void)alarm(0);
-
-	if (pid == (pid_t)-1) {
-		logmsg("wait_client: waitpid");
+	if (cmsghdr->cmsg_level != SOL_SOCKET) {
+		logmsgx("cmsghdr.cmsg_level %d != SOL_SOCKET",
+		    cmsghdr->cmsg_level);
 		return (-1);
 	}
-
-	if (WIFEXITED(status)) {
-		if (WEXITSTATUS(status) != 0) {
-			logmsgx("wait_client: exit status of client PID %ld is %d",
-			    (long)client_pid, WEXITSTATUS(status));
-			return (-1);
-		}
-	} else {
-		if (WIFSIGNALED(status))
-			logmsgx("wait_client: abnormal termination of client PID %ld, signal %d%s",
-			    (long)client_pid, WTERMSIG(status), WCOREDUMP(status) ? " (core file generated)" : "");
-		else
-			logmsgx("wait_client: termination of client PID %ld, unknown status",
-			    (long)client_pid);
+	if (cmsghdr->cmsg_type != type) {
+		logmsgx("cmsghdr.cmsg_type %d != %d",
+		    cmsghdr->cmsg_type, type);
+		return (-1);
+	}
+	if (cmsghdr->cmsg_len != CMSG_LEN(size)) {
+		logmsgx("cmsghdr.cmsg_len %u != %zu",
+		    (u_int)cmsghdr->cmsg_len, CMSG_LEN(size));
 		return (-1);
 	}
-
 	return (0);
 }
-/*
- * Check if n supplementary GIDs in gids are correct. (my_gids + 1)
- * has (my_ngids - 1) supplementary GIDs of current process.
- */
 static int
-check_groups(const gid_t *gids, int n)
+check_groups(const char *gid_arr_str, const gid_t *gid_arr,
+    const char *gid_num_str, int gid_num, bool all_gids)
 {
-	char match[NGROUPS_MAX] = { 0 };
-	int error, i, j;
+	int i;
-	if (n != my_ngids - 1) {
-		logmsgx("wrong number of groups %d != %d (returned from getgroups() - 1)",
-		    n, my_ngids - 1);
-		error = -1;
-	} else
-		error = 0;
-	for (i = 0; i < n; ++i) {
-		for (j = 1; j < my_ngids; ++j) {
-			if (gids[i] == my_gids[j]) {
-				if (match[j]) {
-					logmsgx("duplicated GID %lu",
-					    (u_long)gids[i]);
-					error = -1;
-				} else
-					match[j] = 1;
-				break;
-			}
+	for (i = 0; i < gid_num; ++i)
+		dbgmsg("%s[%d] %lu", gid_arr_str, i, (u_long)gid_arr[i]);
+
+	if (all_gids) {
+		if (gid_num != proc_cred.gid_num) {
+			logmsgx("%s %d != %d", gid_num_str, gid_num,
+			    proc_cred.gid_num);
+			return (-1);
 		}
-		if (j == my_ngids) {
-			logmsgx("unexpected GID %lu", (u_long)gids[i]);
-			error = -1;
+	} else {
+		if (gid_num > proc_cred.gid_num) {
+			logmsgx("%s %d > %d", gid_num_str, gid_num,
+			    proc_cred.gid_num);
+			return (-1);
 		}
 	}
-	for (j = 1; j < my_ngids; ++j)
-		if (match[j] == 0) {
-			logmsgx("did not receive supplementary GID %u", my_gids[j]);
-			error = -1;
-		}
-	return (error);
+	if (memcmp(gid_arr, proc_cred.gid_arr,
+	    gid_num * sizeof(*gid_arr)) != 0) {
+		logmsgx("%s content is wrong", gid_arr_str);
+		for (i = 0; i < gid_num; ++i)
+			if (gid_arr[i] != proc_cred.gid_arr[i]) {
+				logmsgx("%s[%d] %lu != %lu",
+				    gid_arr_str, i, (u_long)gid_arr[i],
+				    (u_long)proc_cred.gid_arr[i]);
+				break;
+			}
+		return (-1);
+	}
+	return (0);
 }
-/*
- * Send n messages with data and control message with SCM_CREDS type
- * to server and exit.
- */
-static void
-t_cmsgcred_client(u_int n)
+static int
+check_xucred(const struct xucred *xucred, socklen_t len)
 {
-	union {
-		struct cmsghdr cm;
-		char control[CMSG_SPACE(sizeof(struct cmsgcred))];
-	} control_un;
-	struct msghdr msg;
-	struct iovec iov[1];
-	struct cmsghdr *cmptr;
-	int fd;
-	u_int i;
+	if (len != sizeof(*xucred)) {
+		logmsgx("option value size %zu != %zu",
+		    (size_t)len, sizeof(*xucred));
+		return (-1);
+	}
-	assert(n == 1 || n == 2);
+	dbgmsg("xucred.cr_version %u", xucred->cr_version);
+	dbgmsg("xucred.cr_uid %lu", (u_long)xucred->cr_uid);
+	dbgmsg("xucred.cr_ngroups %d", xucred->cr_ngroups);
+
+	if (xucred->cr_version != XUCRED_VERSION) {
+		logmsgx("xucred.cr_version %u != %d",
+		    xucred->cr_version, XUCRED_VERSION);
+		return (-1);
+	}
+	if (xucred->cr_uid != proc_cred.euid) {
+		logmsgx("xucred.cr_uid %lu != %lu (EUID)",
+		    (u_long)xucred->cr_uid, (u_long)proc_cred.euid);
+		return (-1);
+	}
+	if (xucred->cr_ngroups == 0) {
+		logmsgx("xucred.cr_ngroups == 0");
+		return (-1);
+	}
+	if (xucred->cr_ngroups < 0) {
+		logmsgx("xucred.cr_ngroups < 0");
+		return (-1);
+	}
+	if (xucred->cr_ngroups > XU_NGROUPS) {
+		logmsgx("xucred.cr_ngroups %hu > %u (max)",
+		    xucred->cr_ngroups, XU_NGROUPS);
+		return (-1);
+	}
+	if (xucred->cr_groups[0] != proc_cred.egid) {
+		logmsgx("xucred.cr_groups[0] %lu != %lu (EGID)",
+		    (u_long)xucred->cr_groups[0], (u_long)proc_cred.egid);
+		return (-1);
+	}
+	if (check_groups("xucred.cr_groups", xucred->cr_groups,
+	    "xucred.cr_ngroups", xucred->cr_ngroups, false) < 0)
+		return (-1);
+	return (0);
+}
-	if ((fd = create_unbound_socket()) < 0)
-		goto failed;
+static int
+check_scm_creds_cmsgcred(struct cmsghdr *cmsghdr)
+{
+	const struct cmsgcred *cmsgcred;
-	if (connect_server(fd) < 0)
-		goto failed_close;
+	if (check_cmsghdr(cmsghdr, SCM_CREDS, sizeof(*cmsgcred)) < 0)
+		return (-1);
-	iov[0].iov_base = ipc_message;
-	iov[0].iov_len = IPC_MESSAGE_SIZE;
+	cmsgcred = (struct cmsgcred *)CMSG_DATA(cmsghdr);
-	msg.msg_name = NULL;
-	msg.msg_namelen = 0;
-	msg.msg_iov = iov;
-	msg.msg_iovlen = 1;
-	msg.msg_control = control_un.control;
-	msg.msg_controllen = no_control_data ?
-	    sizeof(struct cmsghdr) : sizeof(control_un.control);
-	msg.msg_flags = 0;
-
-	cmptr = CMSG_FIRSTHDR(&msg);
-	cmptr->cmsg_len = CMSG_LEN(no_control_data ?
-	    0 : sizeof(struct cmsgcred));
-	cmptr->cmsg_level = SOL_SOCKET;
-	cmptr->cmsg_type = SCM_CREDS;
-
-	for (i = 0; i < n; ++i) {
-		dbgmsg(("#%u msg_controllen = %u, cmsg_len = %u", i,
-		    (u_int)msg.msg_controllen, (u_int)cmptr->cmsg_len));
-		if (sendmsg_timeout(fd, &msg, IPC_MESSAGE_SIZE) < 0)
-			goto failed_close;
+	dbgmsg("cmsgcred.cmcred_pid %ld", (long)cmsgcred->cmcred_pid);
+	dbgmsg("cmsgcred.cmcred_uid %lu", (u_long)cmsgcred->cmcred_uid);
+	dbgmsg("cmsgcred.cmcred_euid %lu", (u_long)cmsgcred->cmcred_euid);
+	dbgmsg("cmsgcred.cmcred_gid %lu", (u_long)cmsgcred->cmcred_gid);
+	dbgmsg("cmsgcred.cmcred_ngroups %d", cmsgcred->cmcred_ngroups);
+
+	if (cmsgcred->cmcred_pid != client_pid) {
+		logmsgx("cmsgcred.cmcred_pid %ld != %ld",
+		    (long)cmsgcred->cmcred_pid, (long)client_pid);
+		return (-1);
+	}
+	if (cmsgcred->cmcred_uid != proc_cred.uid) {
+		logmsgx("cmsgcred.cmcred_uid %lu != %lu",
+		    (u_long)cmsgcred->cmcred_uid, (u_long)proc_cred.uid);
+		return (-1);
+	}
+	if (cmsgcred->cmcred_euid != proc_cred.euid) {
+		logmsgx("cmsgcred.cmcred_euid %lu != %lu",
+		    (u_long)cmsgcred->cmcred_euid, (u_long)proc_cred.euid);
+		return (-1);
+	}
+	if (cmsgcred->cmcred_gid != proc_cred.gid) {
+		logmsgx("cmsgcred.cmcred_gid %lu != %lu",
+		    (u_long)cmsgcred->cmcred_gid, (u_long)proc_cred.gid);
+		return (-1);
+	}
+	if (cmsgcred->cmcred_ngroups == 0) {
+		logmsgx("cmsgcred.cmcred_ngroups == 0");
+		return (-1);
+	}
+	if (cmsgcred->cmcred_ngroups < 0) {
+		logmsgx("cmsgcred.cmcred_ngroups %d < 0",
+		    cmsgcred->cmcred_ngroups);
+		return (-1);
+	}
+	if (cmsgcred->cmcred_ngroups > CMGROUP_MAX) {
+		logmsgx("cmsgcred.cmcred_ngroups %d > %d",
+		    cmsgcred->cmcred_ngroups, CMGROUP_MAX);
+		return (-1);
+	}
+	if (cmsgcred->cmcred_groups[0] != proc_cred.egid) {
+		logmsgx("cmsgcred.cmcred_groups[0] %lu != %lu (EGID)",
+		    (u_long)cmsgcred->cmcred_groups[0], (u_long)proc_cred.egid);
+		return (-1);
 	}
+	if (check_groups("cmsgcred.cmcred_groups", cmsgcred->cmcred_groups,
+	    "cmsgcred.cmcred_ngroups", cmsgcred->cmcred_ngroups, false) < 0)
+		return (-1);
+	return (0);
+}
-	if (close_socket((const char *)NULL, fd) < 0)
-		goto failed;
+static int
+check_scm_creds_sockcred(struct cmsghdr *cmsghdr)
+{
+	const struct sockcred *sockcred;
-	_exit(0);
+	if (check_cmsghdr(cmsghdr, SCM_CREDS,
+	    SOCKCREDSIZE(proc_cred.gid_num)) < 0)
+		return (-1);
-failed_close:
-	(void)close_socket((const char *)NULL, fd);
+	sockcred = (struct sockcred *)CMSG_DATA(cmsghdr);
-failed:
-	_exit(1);
+	dbgmsg("sockcred.sc_uid %lu", (u_long)sockcred->sc_uid);
+	dbgmsg("sockcred.sc_euid %lu", (u_long)sockcred->sc_euid);
+	dbgmsg("sockcred.sc_gid %lu", (u_long)sockcred->sc_gid);
+	dbgmsg("sockcred.sc_egid %lu", (u_long)sockcred->sc_egid);
+	dbgmsg("sockcred.sc_ngroups %d", sockcred->sc_ngroups);
+
+	if (sockcred->sc_uid != proc_cred.uid) {
+		logmsgx("sockcred.sc_uid %lu != %lu",
+		    (u_long)sockcred->sc_uid, (u_long)proc_cred.uid);
+		return (-1);
+	}
+	if (sockcred->sc_euid != proc_cred.euid) {
+		logmsgx("sockcred.sc_euid %lu != %lu",
+		    (u_long)sockcred->sc_euid, (u_long)proc_cred.euid);
+		return (-1);
+	}
+	if (sockcred->sc_gid != proc_cred.gid) {
+		logmsgx("sockcred.sc_gid %lu != %lu",
+		    (u_long)sockcred->sc_gid, (u_long)proc_cred.gid);
+		return (-1);
+	}
+	if (sockcred->sc_egid != proc_cred.egid) {
+		logmsgx("sockcred.sc_egid %lu != %lu",
+		    (u_long)sockcred->sc_egid, (u_long)proc_cred.egid);
+		return (-1);
+	}
+	if (sockcred->sc_ngroups == 0) {
+		logmsgx("sockcred.sc_ngroups == 0");
+		return (-1);
+	}
+	if (sockcred->sc_ngroups < 0) {
+		logmsgx("sockcred.sc_ngroups %d < 0",
+		    sockcred->sc_ngroups);
+		return (-1);
+	}
+	if (sockcred->sc_ngroups != proc_cred.gid_num) {
+		logmsgx("sockcred.sc_ngroups %d != %u",
+		    sockcred->sc_ngroups, proc_cred.gid_num);
+		return (-1);
+	}
+	if (check_groups("sockcred.sc_groups", sockcred->sc_groups,
+	    "sockcred.sc_ngroups", sockcred->sc_ngroups, true) < 0)
+		return (-1);
+	return (0);
 }
-/*
- * Receive two messages with data and control message with SCM_CREDS
- * type followed by struct cmsgcred{} from client. fd1 is a listen
- * socket for stream sockets or simply socket for datagram sockets.
- */
 static int
-t_cmsgcred_server(int fd1)
+check_scm_timestamp(struct cmsghdr *cmsghdr)
 {
-	char buf[IPC_MESSAGE_SIZE];
-	union {
-		struct cmsghdr cm;
-		char control[CMSG_SPACE(sizeof(struct cmsgcred)) + EXTRA_CMSG_SPACE];
-	} control_un;
-	struct msghdr msg;
-	struct iovec iov[1];
-	struct cmsghdr *cmptr;
-	const struct cmsgcred *cmcredptr;
-	socklen_t controllen;
-	int error, error2, fd2;
-	u_int i;
+	const struct timeval *timeval;
-	if (sock_type == SOCK_STREAM) {
-		if ((fd2 = accept_timeout(fd1)) < 0)
-			return (-2);
-	} else
-		fd2 = fd1;
+	if (check_cmsghdr(cmsghdr, SCM_TIMESTAMP, sizeof(struct timeval)) < 0)
+		return (-1);
-	error = 0;
+	timeval = (struct timeval *)CMSG_DATA(cmsghdr);
-	controllen = sizeof(control_un.control);
+	dbgmsg("timeval.tv_sec %"PRIdMAX", timeval.tv_usec %"PRIdMAX,
+	    (intmax_t)timeval->tv_sec, (intmax_t)timeval->tv_usec);
-	for (i = 0; i < 2; ++i) {
-		iov[0].iov_base = buf;
-		iov[0].iov_len = sizeof(buf);
+	return (0);
+}
-		msg.msg_name = NULL;
-		msg.msg_namelen = 0;
-		msg.msg_iov = iov;
-		msg.msg_iovlen = 1;
-		msg.msg_control = control_un.control;
-		msg.msg_controllen = controllen;
-		msg.msg_flags = 0;
+static int
+check_scm_bintime(struct cmsghdr *cmsghdr)
+{
+	const struct bintime *bintime;
-		controllen = CMSG_SPACE(sizeof(struct cmsgcred));
+	if
(check_cmsghdr(cmsghdr, SCM_BINTIME, sizeof(struct bintime)) < 0) + return (-1); - if (recvmsg_timeout(fd2, &msg, sizeof(buf)) < 0) - goto failed; + bintime = (struct bintime *)CMSG_DATA(cmsghdr); - if (msg.msg_flags & MSG_CTRUNC) { - logmsgx("#%u control data was truncated, MSG_CTRUNC flag is on", - i); - goto next_error; - } + dbgmsg("bintime.sec %"PRIdMAX", bintime.frac %"PRIu64, + (intmax_t)bintime->sec, bintime->frac); - if (msg.msg_controllen < sizeof(struct cmsghdr)) { - logmsgx("#%u msg_controllen %u < %lu (sizeof(struct cmsghdr))", - i, (u_int)msg.msg_controllen, (u_long)sizeof(struct cmsghdr)); - goto next_error; - } + return (0); +} - if ((cmptr = CMSG_FIRSTHDR(&msg)) == NULL) { - logmsgx("CMSG_FIRSTHDR is NULL"); - goto next_error; - } +static void +msghdr_init_generic(struct msghdr *msghdr, struct iovec *iov, void *cmsg_data) +{ + msghdr->msg_name = NULL; + msghdr->msg_namelen = 0; + if (send_data_flag) { + iov->iov_base = server_flag ? + ipc_msg.buf_recv : ipc_msg.buf_send; + iov->iov_len = ipc_msg.buf_size; + msghdr->msg_iov = iov; + msghdr->msg_iovlen = 1; + } else { + msghdr->msg_iov = NULL; + msghdr->msg_iovlen = 0; + } + msghdr->msg_control = cmsg_data; + msghdr->msg_flags = 0; +} - dbgmsg(("#%u msg_controllen = %u, cmsg_len = %u", i, - (u_int)msg.msg_controllen, (u_int)cmptr->cmsg_len)); +static void +msghdr_init_server(struct msghdr *msghdr, struct iovec *iov, + void *cmsg_data, size_t cmsg_size) +{ + msghdr_init_generic(msghdr, iov, cmsg_data); + msghdr->msg_controllen = cmsg_size; + dbgmsg("init: data size %zu", msghdr->msg_iov != NULL ? 
+ msghdr->msg_iov->iov_len : (size_t)0); + dbgmsg("init: msghdr.msg_controllen %u", + (u_int)msghdr->msg_controllen); +} - if (cmptr->cmsg_level != SOL_SOCKET) { - logmsgx("#%u cmsg_level %d != SOL_SOCKET", i, - cmptr->cmsg_level); - goto next_error; - } +static void +msghdr_init_client(struct msghdr *msghdr, struct iovec *iov, + void *cmsg_data, size_t cmsg_size, int type, size_t arr_size) +{ + struct cmsghdr *cmsghdr; - if (cmptr->cmsg_type != SCM_CREDS) { - logmsgx("#%u cmsg_type %d != SCM_CREDS", i, - cmptr->cmsg_type); - goto next_error; - } + msghdr_init_generic(msghdr, iov, cmsg_data); + if (cmsg_data != NULL) { + msghdr->msg_controllen = send_array_flag ? + cmsg_size : CMSG_SPACE(0); + cmsghdr = CMSG_FIRSTHDR(msghdr); + cmsghdr->cmsg_level = SOL_SOCKET; + cmsghdr->cmsg_type = type; + cmsghdr->cmsg_len = CMSG_LEN(send_array_flag ? arr_size : 0); + } else + msghdr->msg_controllen = 0; +} - if (cmptr->cmsg_len != CMSG_LEN(sizeof(struct cmsgcred))) { - logmsgx("#%u cmsg_len %u != %lu (CMSG_LEN(sizeof(struct cmsgcred))", - i, (u_int)cmptr->cmsg_len, (u_long)CMSG_LEN(sizeof(struct cmsgcred))); - goto next_error; - } +static int +t_generic(int (*client_func)(int), int (*server_func)(int)) +{ + int fd, rv, rv_client; - cmcredptr = (const struct cmsgcred *)CMSG_DATA(cmptr); + switch (client_fork()) { + case 0: + fd = socket_create(); + if (fd < 0) + rv = -2; + else { + rv = client_func(fd); + if (socket_close(fd) < 0) + rv = -2; + } + client_exit(rv); + break; + case 1: + fd = socket_create(); + if (fd < 0) + rv = -2; + else { + rv = server_func(fd); + rv_client = client_wait(); + if (rv == 0 || (rv == -2 && rv_client != 0)) + rv = rv_client; + if (socket_close(fd) < 0) + rv = -2; + } + break; + default: + rv = -2; + } + return (rv); +} - error2 = 0; - if (cmcredptr->cmcred_pid != client_pid) { - logmsgx("#%u cmcred_pid %ld != %ld (PID of client)", - i, (long)cmcredptr->cmcred_pid, (long)client_pid); - error2 = 1; - } - if (cmcredptr->cmcred_uid != my_uid) { - 
logmsgx("#%u cmcred_uid %lu != %lu (UID of current process)", - i, (u_long)cmcredptr->cmcred_uid, (u_long)my_uid); - error2 = 1; - } - if (cmcredptr->cmcred_euid != my_euid) { - logmsgx("#%u cmcred_euid %lu != %lu (EUID of current process)", - i, (u_long)cmcredptr->cmcred_euid, (u_long)my_euid); - error2 = 1; - } - if (cmcredptr->cmcred_gid != my_gid) { - logmsgx("#%u cmcred_gid %lu != %lu (GID of current process)", - i, (u_long)cmcredptr->cmcred_gid, (u_long)my_gid); - error2 = 1; - } - if (cmcredptr->cmcred_ngroups == 0) { - logmsgx("#%u cmcred_ngroups = 0, this is wrong", i); - error2 = 1; - } else { - if (cmcredptr->cmcred_ngroups > NGROUPS_MAX) { - logmsgx("#%u cmcred_ngroups %d > %u (NGROUPS_MAX)", - i, cmcredptr->cmcred_ngroups, NGROUPS_MAX); - error2 = 1; - } else if (cmcredptr->cmcred_ngroups < 0) { - logmsgx("#%u cmcred_ngroups %d < 0", - i, cmcredptr->cmcred_ngroups); - error2 = 1; - } else { - dbgmsg(("#%u cmcred_ngroups = %d", i, - cmcredptr->cmcred_ngroups)); - if (cmcredptr->cmcred_groups[0] != my_egid) { - logmsgx("#%u cmcred_groups[0] %lu != %lu (EGID of current process)", - i, (u_long)cmcredptr->cmcred_groups[0], (u_long)my_egid); - error2 = 1; - } - if (check_groups(cmcredptr->cmcred_groups + 1, cmcredptr->cmcred_ngroups - 1) < 0) { - logmsgx("#%u cmcred_groups has wrong GIDs", i); - error2 = 1; - } - } - } +static int +t_cmsgcred_client(int fd) +{ + struct msghdr msghdr; + struct iovec iov[1]; + void *cmsg_data; + size_t cmsg_size; + int rv; - if (error2) - goto next_error; + if (sync_recv() < 0) + return (-2); - if ((cmptr = CMSG_NXTHDR(&msg, cmptr)) != NULL) { - logmsgx("#%u control data has extra header", i); - goto next_error; - } + rv = -2; - continue; -next_error: - error = -1; + cmsg_size = CMSG_SPACE(sizeof(struct cmsgcred)); + cmsg_data = malloc(cmsg_size); + if (cmsg_data == NULL) { + logmsg("malloc"); + goto done; } + msghdr_init_client(&msghdr, iov, cmsg_data, cmsg_size, + SCM_CREDS, sizeof(struct cmsgcred)); - if (sock_type == 
SOCK_STREAM) - if (close(fd2) < 0) { - logmsg("close"); - return (-2); - } - return (error); + if (socket_connect(fd) < 0) + goto done; -failed: - if (sock_type == SOCK_STREAM) - if (close(fd2) < 0) - logmsg("close"); - return (-2); + if (message_sendn(fd, &msghdr) < 0) + goto done; + + rv = 0; +done: + free(cmsg_data); + return (rv); } static int -t_cmsgcred(void) +t_cmsgcred_server(int fd1) { - int error, fd; + struct msghdr msghdr; + struct iovec iov[1]; + struct cmsghdr *cmsghdr; + void *cmsg_data; + size_t cmsg_size; + u_int i; + int fd2, rv; - if ((fd = create_server_socket()) < 0) + if (sync_send() < 0) return (-2); - if (sock_type == SOCK_STREAM) - if (listen(fd, LISTENQ) < 0) { - logmsg("listen"); - goto failed; - } + fd2 = -1; + rv = -2; - if ((client_pid = fork()) == (pid_t)-1) { - logmsg("fork"); - goto failed; + cmsg_size = CMSG_SPACE(sizeof(struct cmsgcred)); + cmsg_data = malloc(cmsg_size); + if (cmsg_data == NULL) { + logmsg("malloc"); + goto done; } - if (client_pid == 0) { - myname = "CLIENT"; - if (close_socket((const char *)NULL, fd) < 0) - _exit(1); - t_cmsgcred_client(2); - } + if (sock_type == SOCK_STREAM) { + fd2 = socket_accept(fd1); + if (fd2 < 0) + goto done; + } else + fd2 = fd1; - if ((error = t_cmsgcred_server(fd)) == -2) { - (void)wait_client(); - goto failed; - } + rv = -1; + for (i = 1; i <= ipc_msg.msg_num; ++i) { + dbgmsg("message #%u", i); + + msghdr_init_server(&msghdr, iov, cmsg_data, cmsg_size); + if (message_recv(fd2, &msghdr) < 0) { + rv = -2; + break; + } - if (wait_client() < 0) - goto failed; + if (check_msghdr(&msghdr, sizeof(*cmsghdr)) < 0) + break; - if (close_socket(serv_sock_path, fd) < 0) { - logmsgx("close_socket failed"); - return (-2); + cmsghdr = CMSG_FIRSTHDR(&msghdr); + if (check_scm_creds_cmsgcred(cmsghdr) < 0) + break; } - return (error); + if (i > ipc_msg.msg_num) + rv = 0; +done: + free(cmsg_data); + if (sock_type == SOCK_STREAM && fd2 >= 0) + if (socket_close(fd2) < 0) + rv = -2; + return (rv); +} 
-failed: - if (close_socket(serv_sock_path, fd) < 0) - logmsgx("close_socket failed"); - return (-2); +static int +t_cmsgcred(void) +{ + return (t_generic(t_cmsgcred_client, t_cmsgcred_server)); } -/* - * Send two messages with data to server and exit. - */ -static void -t_sockcred_client(int type) +static int +t_sockcred_client(int type, int fd) { - struct msghdr msg; + struct msghdr msghdr; struct iovec iov[1]; - int fd; - u_int i; - - assert(type == 0 || type == 1); + int rv; - if ((fd = create_unbound_socket()) < 0) - goto failed; + if (sync_recv() < 0) + return (-2); - if (connect_server(fd) < 0) - goto failed_close; + rv = -2; - if (type == 1) - if (sync_recv(fd) < 0) - goto failed_close; - - iov[0].iov_base = ipc_message; - iov[0].iov_len = IPC_MESSAGE_SIZE; - - msg.msg_name = NULL; - msg.msg_namelen = 0; - msg.msg_iov = iov; - msg.msg_iovlen = 1; - msg.msg_control = NULL; - msg.msg_controllen = 0; - msg.msg_flags = 0; - - for (i = 0; i < 2; ++i) - if (sendmsg_timeout(fd, &msg, IPC_MESSAGE_SIZE) < 0) - goto failed_close; + msghdr_init_client(&msghdr, iov, NULL, 0, 0, 0); - if (close_socket((const char *)NULL, fd) < 0) - goto failed; + if (socket_connect(fd) < 0) + goto done; - _exit(0); + if (type == 2) + if (sync_recv() < 0) + goto done; -failed_close: - (void)close_socket((const char *)NULL, fd); + if (message_sendn(fd, &msghdr) < 0) + goto done; -failed: - _exit(1); + rv = 0; +done: + return (rv); } -/* - * Receive one message with data and control message with SCM_CREDS - * type followed by struct sockcred{} and if n is not equal 1, then - * receive another one message with data. fd1 is a listen socket for - * stream sockets or simply socket for datagram sockets. If type is - * 1, then set LOCAL_CREDS option for accepted stream socket. 
- */ static int -t_sockcred_server(int type, int fd1, u_int n) +t_sockcred_server(int type, int fd1) { - char buf[IPC_MESSAGE_SIZE]; - union { - struct cmsghdr cm; - char control[CMSG_SPACE(SOCKCREDSIZE(NGROUPS_MAX)) + EXTRA_CMSG_SPACE]; - } control_un; - struct msghdr msg; + struct msghdr msghdr; struct iovec iov[1]; - struct cmsghdr *cmptr; - const struct sockcred *sockcred; - int error, error2, fd2, optval; + struct cmsghdr *cmsghdr; + void *cmsg_data; + size_t cmsg_size; u_int i; + int fd2, rv, val; - assert(n == 1 || n == 2); - assert(type == 0 || type == 1); + fd2 = -1; + rv = -2; - if (sock_type == SOCK_STREAM) { - if ((fd2 = accept_timeout(fd1)) < 0) - return (-2); - if (type == 1) { - optval = 1; - if (setsockopt(fd2, 0, LOCAL_CREDS, &optval, sizeof optval) < 0) { - logmsg("setsockopt(LOCAL_CREDS) for accepted socket"); - if (errno == ENOPROTOOPT) { - error = -1; - goto done_close; - } - goto failed; - } - if (sync_send(fd2) < 0) - goto failed; - } - } else - fd2 = fd1; - - error = 0; - - for (i = 0; i < n; ++i) { - iov[0].iov_base = buf; - iov[0].iov_len = sizeof buf; - - msg.msg_name = NULL; - msg.msg_namelen = 0; - msg.msg_iov = iov; - msg.msg_iovlen = 1; - msg.msg_control = control_un.control; - msg.msg_controllen = sizeof control_un.control; - msg.msg_flags = 0; - - if (recvmsg_timeout(fd2, &msg, sizeof buf) < 0) - goto failed; - - if (msg.msg_flags & MSG_CTRUNC) { - logmsgx("control data was truncated, MSG_CTRUNC flag is on"); - goto next_error; - } - - if (i != 0 && sock_type == SOCK_STREAM) { - if (msg.msg_controllen != 0) { - logmsgx("second message has control data, this is wrong for stream sockets"); - goto next_error; - } - dbgmsg(("#%u msg_controllen = %u", i, - (u_int)msg.msg_controllen)); - continue; - } + cmsg_size = CMSG_SPACE(SOCKCREDSIZE(proc_cred.gid_num)); + cmsg_data = malloc(cmsg_size); + if (cmsg_data == NULL) { + logmsg("malloc"); + goto done; + } - if (msg.msg_controllen < sizeof(struct cmsghdr)) { - logmsgx("#%u msg_controllen %u 
< %lu (sizeof(struct cmsghdr))", - i, (u_int)msg.msg_controllen, (u_long)sizeof(struct cmsghdr)); - goto next_error; + if (type == 1) { + dbgmsg("setting LOCAL_CREDS"); + val = 1; + if (setsockopt(fd1, 0, LOCAL_CREDS, &val, sizeof(val)) < 0) { + logmsg("setsockopt(LOCAL_CREDS)"); + goto done; } + } - if ((cmptr = CMSG_FIRSTHDR(&msg)) == NULL) { - logmsgx("CMSG_FIRSTHDR is NULL"); - goto next_error; - } + if (sync_send() < 0) + goto done; - dbgmsg(("#%u msg_controllen = %u, cmsg_len = %u", i, - (u_int)msg.msg_controllen, (u_int)cmptr->cmsg_len)); + if (sock_type == SOCK_STREAM) { + fd2 = socket_accept(fd1); + if (fd2 < 0) + goto done; + } else + fd2 = fd1; - if (cmptr->cmsg_level != SOL_SOCKET) { - logmsgx("#%u cmsg_level %d != SOL_SOCKET", i, - cmptr->cmsg_level); - goto next_error; + if (type == 2) { + dbgmsg("setting LOCAL_CREDS"); + val = 1; + if (setsockopt(fd2, 0, LOCAL_CREDS, &val, sizeof(val)) < 0) { + logmsg("setsockopt(LOCAL_CREDS)"); + goto done; + } + if (sync_send() < 0) + goto done; + } + + rv = -1; + for (i = 1; i <= ipc_msg.msg_num; ++i) { + dbgmsg("message #%u", i); + + msghdr_init_server(&msghdr, iov, cmsg_data, cmsg_size); + if (message_recv(fd2, &msghdr) < 0) { + rv = -2; + break; } - if (cmptr->cmsg_type != SCM_CREDS) { - logmsgx("#%u cmsg_type %d != SCM_CREDS", i, - cmptr->cmsg_type); - goto next_error; - } + if (i > 1 && sock_type == SOCK_STREAM) { + if (check_msghdr(&msghdr, 0) < 0) + break; + } else { + if (check_msghdr(&msghdr, sizeof(*cmsghdr)) < 0) + break; - if (cmptr->cmsg_len < CMSG_LEN(SOCKCREDSIZE(1))) { - logmsgx("#%u cmsg_len %u != %lu (CMSG_LEN(SOCKCREDSIZE(1)))", - i, (u_int)cmptr->cmsg_len, (u_long)CMSG_LEN(SOCKCREDSIZE(1))); - goto next_error; + cmsghdr = CMSG_FIRSTHDR(&msghdr); + if (check_scm_creds_sockcred(cmsghdr) < 0) + break; } + } + if (i > ipc_msg.msg_num) + rv = 0; +done: + free(cmsg_data); + if (sock_type == SOCK_STREAM && fd2 >= 0) + if (socket_close(fd2) < 0) + rv = -2; + return (rv); +} - sockcred = (const struct 
sockcred *)CMSG_DATA(cmptr); +static int +t_sockcred_1(void) +{ + u_int i; + int fd, rv, rv_client; - error2 = 0; - if (sockcred->sc_uid != my_uid) { - logmsgx("#%u sc_uid %lu != %lu (UID of current process)", - i, (u_long)sockcred->sc_uid, (u_long)my_uid); - error2 = 1; - } - if (sockcred->sc_euid != my_euid) { - logmsgx("#%u sc_euid %lu != %lu (EUID of current process)", - i, (u_long)sockcred->sc_euid, (u_long)my_euid); - error2 = 1; - } - if (sockcred->sc_gid != my_gid) { - logmsgx("#%u sc_gid %lu != %lu (GID of current process)", - i, (u_long)sockcred->sc_gid, (u_long)my_gid); - error2 = 1; - } - if (sockcred->sc_egid != my_egid) { - logmsgx("#%u sc_egid %lu != %lu (EGID of current process)", - i, (u_long)sockcred->sc_gid, (u_long)my_egid); - error2 = 1; - } - if (sockcred->sc_ngroups > NGROUPS_MAX) { - logmsgx("#%u sc_ngroups %d > %u (NGROUPS_MAX)", - i, sockcred->sc_ngroups, NGROUPS_MAX); - error2 = 1; - } else if (sockcred->sc_ngroups < 0) { - logmsgx("#%u sc_ngroups %d < 0", - i, sockcred->sc_ngroups); - error2 = 1; - } else { - dbgmsg(("#%u sc_ngroups = %d", i, sockcred->sc_ngroups)); - if (check_groups(sockcred->sc_groups, sockcred->sc_ngroups) < 0) { - logmsgx("#%u sc_groups has wrong GIDs", i); - error2 = 1; + switch (client_fork()) { + case 0: + for (i = 1; i <= 2; ++i) { + dbgmsg("client #%u", i); + fd = socket_create(); + if (fd < 0) + rv = -2; + else { + rv = t_sockcred_client(1, fd); + if (socket_close(fd) < 0) + rv = -2; } + if (rv != 0) + break; } - - if (error2) - goto next_error; - - if ((cmptr = CMSG_NXTHDR(&msg, cmptr)) != NULL) { - logmsgx("#%u control data has extra header, this is wrong", - i); - goto next_error; + client_exit(rv); + break; + case 1: + fd = socket_create(); + if (fd < 0) + rv = -2; + else { + rv = t_sockcred_server(1, fd); + if (rv == 0) + rv = t_sockcred_server(3, fd); + rv_client = client_wait(); + if (rv == 0 || (rv == -2 && rv_client != 0)) + rv = rv_client; + if (socket_close(fd) < 0) + rv = -2; } - - continue; 
-next_error: - error = -1; + break; + default: + rv = -2; } -done_close: - if (sock_type == SOCK_STREAM) - if (close(fd2) < 0) { - logmsg("close"); - return (-2); - } - return (error); + return (rv); +} -failed: - if (sock_type == SOCK_STREAM) - if (close(fd2) < 0) - logmsg("close"); - return (-2); +static int +t_sockcred_2_client(int fd) +{ + return (t_sockcred_client(2, fd)); } static int -t_sockcred(int type) +t_sockcred_2_server(int fd) { - int error, fd, optval; + return (t_sockcred_server(2, fd)); +} - assert(type == 0 || type == 1); +static int +t_sockcred_2(void) +{ + return (t_generic(t_sockcred_2_client, t_sockcred_2_server)); +} - if ((fd = create_server_socket()) < 0) - return (-2); +static int +t_cmsgcred_sockcred_server(int fd1) +{ + struct msghdr msghdr; + struct iovec iov[1]; + struct cmsghdr *cmsghdr; + void *cmsg_data, *cmsg1_data, *cmsg2_data; + size_t cmsg_size, cmsg1_size, cmsg2_size; + u_int i; + int fd2, rv, val; - if (sock_type == SOCK_STREAM) - if (listen(fd, LISTENQ) < 0) { - logmsg("listen"); - goto failed; - } + fd2 = -1; + rv = -2; - if (type == 0) { - optval = 1; - if (setsockopt(fd, 0, LOCAL_CREDS, &optval, sizeof optval) < 0) { - logmsg("setsockopt(LOCAL_CREDS) for %s socket", - sock_type == SOCK_STREAM ? 
"stream listening" : "datagram"); - if (errno == ENOPROTOOPT) { - error = -1; - goto done_close; - } - goto failed; - } + cmsg1_size = CMSG_SPACE(SOCKCREDSIZE(proc_cred.gid_num)); + cmsg2_size = CMSG_SPACE(sizeof(struct cmsgcred)); + cmsg1_data = malloc(cmsg1_size); + cmsg2_data = malloc(cmsg2_size); + if (cmsg1_data == NULL || cmsg2_data == NULL) { + logmsg("malloc"); + goto done; } - if ((client_pid = fork()) == (pid_t)-1) { - logmsg("fork"); - goto failed; + dbgmsg("setting LOCAL_CREDS"); + val = 1; + if (setsockopt(fd1, 0, LOCAL_CREDS, &val, sizeof(val)) < 0) { + logmsg("setsockopt(LOCAL_CREDS)"); + goto done; } - if (client_pid == 0) { - myname = "CLIENT"; - if (close_socket((const char *)NULL, fd) < 0) - _exit(1); - t_sockcred_client(type); - } + if (sync_send() < 0) + goto done; - if ((error = t_sockcred_server(type, fd, 2)) == -2) { - (void)wait_client(); - goto failed; - } + if (sock_type == SOCK_STREAM) { + fd2 = socket_accept(fd1); + if (fd2 < 0) + goto done; + } else + fd2 = fd1; - if (wait_client() < 0) - goto failed; + cmsg_data = cmsg1_data; + cmsg_size = cmsg1_size; + rv = -1; + for (i = 1; i <= ipc_msg.msg_num; ++i) { + dbgmsg("message #%u", i); + + msghdr_init_server(&msghdr, iov, cmsg_data, cmsg_size); + if (message_recv(fd2, &msghdr) < 0) { + rv = -2; + break; + } -done_close: - if (close_socket(serv_sock_path, fd) < 0) { - logmsgx("close_socket failed"); - return (-2); - } - return (error); + if (check_msghdr(&msghdr, sizeof(*cmsghdr)) < 0) + break; -failed: - if (close_socket(serv_sock_path, fd) < 0) - logmsgx("close_socket failed"); - return (-2); -} + cmsghdr = CMSG_FIRSTHDR(&msghdr); + if (i == 1 || sock_type == SOCK_DGRAM) { + if (check_scm_creds_sockcred(cmsghdr) < 0) + break; + } else { + if (check_scm_creds_cmsgcred(cmsghdr) < 0) + break; + } -static int -t_sockcred_stream1(void) -{ - return (t_sockcred(0)); + cmsg_data = cmsg2_data; + cmsg_size = cmsg2_size; + } + if (i > ipc_msg.msg_num) + rv = 0; +done: + free(cmsg1_data); + 
free(cmsg2_data); + if (sock_type == SOCK_STREAM && fd2 >= 0) + if (socket_close(fd2) < 0) + rv = -2; + return (rv); } static int -t_sockcred_stream2(void) +t_cmsgcred_sockcred(void) { - return (t_sockcred(1)); + return (t_generic(t_cmsgcred_client, t_cmsgcred_sockcred_server)); } static int -t_sockcred_dgram(void) +t_timeval_client(int fd) { - return (t_sockcred(0)); + struct msghdr msghdr; + struct iovec iov[1]; + void *cmsg_data; + size_t cmsg_size; + int rv; + + if (sync_recv() < 0) + return (-2); + + rv = -2; + + cmsg_size = CMSG_SPACE(sizeof(struct timeval)); + cmsg_data = malloc(cmsg_size); + if (cmsg_data == NULL) { + logmsg("malloc"); + goto done; + } + msghdr_init_client(&msghdr, iov, cmsg_data, cmsg_size, + SCM_TIMESTAMP, sizeof(struct timeval)); + + if (socket_connect(fd) < 0) + goto done; + + if (message_sendn(fd, &msghdr) < 0) + goto done; + + rv = 0; +done: + free(cmsg_data); + return (rv); } static int -t_cmsgcred_sockcred(void) +t_timeval_server(int fd1) { - int error, fd, optval; + struct msghdr msghdr; + struct iovec iov[1]; + struct cmsghdr *cmsghdr; + void *cmsg_data; + size_t cmsg_size; + u_int i; + int fd2, rv; - if ((fd = create_server_socket()) < 0) + if (sync_send() < 0) return (-2); - if (sock_type == SOCK_STREAM) - if (listen(fd, LISTENQ) < 0) { - logmsg("listen"); - goto failed; - } + fd2 = -1; + rv = -2; - optval = 1; - if (setsockopt(fd, 0, LOCAL_CREDS, &optval, sizeof optval) < 0) { - logmsg("setsockopt(LOCAL_CREDS) for %s socket", - sock_type == SOCK_STREAM ? 
"stream listening" : "datagram"); - if (errno == ENOPROTOOPT) { - error = -1; - goto done_close; - } - goto failed; - } - - if ((client_pid = fork()) == (pid_t)-1) { - logmsg("fork"); - goto failed; + cmsg_size = CMSG_SPACE(sizeof(struct timeval)); + cmsg_data = malloc(cmsg_size); + if (cmsg_data == NULL) { + logmsg("malloc"); + goto done; } - if (client_pid == 0) { - myname = "CLIENT"; - if (close_socket((const char *)NULL, fd) < 0) - _exit(1); - t_cmsgcred_client(1); - } + if (sock_type == SOCK_STREAM) { + fd2 = socket_accept(fd1); + if (fd2 < 0) + goto done; + } else + fd2 = fd1; - if ((error = t_sockcred_server(0, fd, 1)) == -2) { - (void)wait_client(); - goto failed; - } + rv = -1; + for (i = 1; i <= ipc_msg.msg_num; ++i) { + dbgmsg("message #%u", i); + + msghdr_init_server(&msghdr, iov, cmsg_data, cmsg_size); + if (message_recv(fd2, &msghdr) < 0) { + rv = -2; + break; + } - if (wait_client() < 0) - goto failed; + if (check_msghdr(&msghdr, sizeof(*cmsghdr)) < 0) + break; -done_close: - if (close_socket(serv_sock_path, fd) < 0) { - logmsgx("close_socket failed"); - return (-2); + cmsghdr = CMSG_FIRSTHDR(&msghdr); + if (check_scm_timestamp(cmsghdr) < 0) + break; } - return (error); + if (i > ipc_msg.msg_num) + rv = 0; +done: + free(cmsg_data); + if (sock_type == SOCK_STREAM && fd2 >= 0) + if (socket_close(fd2) < 0) + rv = -2; + return (rv); +} -failed: - if (close_socket(serv_sock_path, fd) < 0) - logmsgx("close_socket failed"); - return (-2); +static int +t_timeval(void) +{ + return (t_generic(t_timeval_client, t_timeval_server)); } -/* - * Send one message with data and control message with SCM_TIMESTAMP - * type to server and exit. 
- */ -static void -t_timestamp_client(void) +static int +t_bintime_client(int fd) { - union { - struct cmsghdr cm; - char control[CMSG_SPACE(sizeof(struct timeval))]; - } control_un; - struct msghdr msg; + struct msghdr msghdr; struct iovec iov[1]; - struct cmsghdr *cmptr; - int fd; - - if ((fd = create_unbound_socket()) < 0) - goto failed; - - if (connect_server(fd) < 0) - goto failed_close; - - iov[0].iov_base = ipc_message; - iov[0].iov_len = IPC_MESSAGE_SIZE; - - msg.msg_name = NULL; - msg.msg_namelen = 0; - msg.msg_iov = iov; - msg.msg_iovlen = 1; - msg.msg_control = control_un.control; - msg.msg_controllen = no_control_data ? - sizeof(struct cmsghdr) :sizeof control_un.control; - msg.msg_flags = 0; - - cmptr = CMSG_FIRSTHDR(&msg); - cmptr->cmsg_len = CMSG_LEN(no_control_data ? - 0 : sizeof(struct timeval)); - cmptr->cmsg_level = SOL_SOCKET; - cmptr->cmsg_type = SCM_TIMESTAMP; + void *cmsg_data; + size_t cmsg_size; + int rv; - dbgmsg(("msg_controllen = %u, cmsg_len = %u", - (u_int)msg.msg_controllen, (u_int)cmptr->cmsg_len)); + if (sync_recv() < 0) + return (-2); - if (sendmsg_timeout(fd, &msg, IPC_MESSAGE_SIZE) < 0) - goto failed_close; + rv = -2; - if (close_socket((const char *)NULL, fd) < 0) - goto failed; + cmsg_size = CMSG_SPACE(sizeof(struct bintime)); + cmsg_data = malloc(cmsg_size); + if (cmsg_data == NULL) { + logmsg("malloc"); + goto done; + } + msghdr_init_client(&msghdr, iov, cmsg_data, cmsg_size, + SCM_BINTIME, sizeof(struct bintime)); - _exit(0); + if (socket_connect(fd) < 0) + goto done; -failed_close: - (void)close_socket((const char *)NULL, fd); + if (message_sendn(fd, &msghdr) < 0) + goto done; -failed: - _exit(1); + rv = 0; +done: + free(cmsg_data); + return (rv); } -/* - * Receive one message with data and control message with SCM_TIMESTAMP - * type followed by struct timeval{} from client. 
- */ static int -t_timestamp_server(int fd1) +t_bintime_server(int fd1) { - union { - struct cmsghdr cm; - char control[CMSG_SPACE(sizeof(struct timeval)) + EXTRA_CMSG_SPACE]; - } control_un; - char buf[IPC_MESSAGE_SIZE]; - int error, fd2; - struct msghdr msg; + struct msghdr msghdr; struct iovec iov[1]; - struct cmsghdr *cmptr; - const struct timeval *timeval; + struct cmsghdr *cmsghdr; + void *cmsg_data; + size_t cmsg_size; + u_int i; + int fd2, rv; + + if (sync_send() < 0) + return (-2); + + fd2 = -1; + rv = -2; + + cmsg_size = CMSG_SPACE(sizeof(struct bintime)); + cmsg_data = malloc(cmsg_size); + if (cmsg_data == NULL) { + logmsg("malloc"); + goto done; + } if (sock_type == SOCK_STREAM) { - if ((fd2 = accept_timeout(fd1)) < 0) - return (-2); + fd2 = socket_accept(fd1); + if (fd2 < 0) + goto done; } else fd2 = fd1; - iov[0].iov_base = buf; - iov[0].iov_len = sizeof buf; - - msg.msg_name = NULL; - msg.msg_namelen = 0; - msg.msg_iov = iov; - msg.msg_iovlen = 1; - msg.msg_control = control_un.control; - msg.msg_controllen = sizeof control_un.control; - msg.msg_flags = 0; - - if (recvmsg_timeout(fd2, &msg, sizeof buf) < 0) - goto failed; + rv = -1; + for (i = 1; i <= ipc_msg.msg_num; ++i) { + dbgmsg("message #%u", i); + + msghdr_init_server(&msghdr, iov, cmsg_data, cmsg_size); + if (message_recv(fd2, &msghdr) < 0) { + rv = -2; + break; + } - error = -1; + if (check_msghdr(&msghdr, sizeof(*cmsghdr)) < 0) + break; - if (msg.msg_flags & MSG_CTRUNC) { - logmsgx("control data was truncated, MSG_CTRUNC flag is on"); - goto done; + cmsghdr = CMSG_FIRSTHDR(&msghdr); + if (check_scm_bintime(cmsghdr) < 0) + break; } + if (i > ipc_msg.msg_num) + rv = 0; +done: + free(cmsg_data); + if (sock_type == SOCK_STREAM && fd2 >= 0) + if (socket_close(fd2) < 0) + rv = -2; + return (rv); +} - if (msg.msg_controllen < sizeof(struct cmsghdr)) { - logmsgx("msg_controllen %u < %lu (sizeof(struct cmsghdr))", - (u_int)msg.msg_controllen, (u_long)sizeof(struct cmsghdr)); - goto done; - } +static 
int +t_bintime(void) +{ + return (t_generic(t_bintime_client, t_bintime_server)); +} - if ((cmptr = CMSG_FIRSTHDR(&msg)) == NULL) { - logmsgx("CMSG_FIRSTHDR is NULL"); - goto done; - } +static int +t_cmsg_len_client(int fd) +{ + struct msghdr msghdr; + struct iovec iov[1]; + struct cmsghdr *cmsghdr; + void *cmsg_data; + size_t size, cmsg_size; + socklen_t socklen; + int rv; - dbgmsg(("msg_controllen = %u, cmsg_len = %u", - (u_int)msg.msg_controllen, (u_int)cmptr->cmsg_len)); + if (sync_recv() < 0) + return (-2); + + rv = -2; - if (cmptr->cmsg_level != SOL_SOCKET) { - logmsgx("cmsg_level %d != SOL_SOCKET", cmptr->cmsg_level); + cmsg_size = CMSG_SPACE(sizeof(struct cmsgcred)); + cmsg_data = malloc(cmsg_size); + if (cmsg_data == NULL) { + logmsg("malloc"); goto done; } + msghdr_init_client(&msghdr, iov, cmsg_data, cmsg_size, + SCM_CREDS, sizeof(struct cmsgcred)); + cmsghdr = CMSG_FIRSTHDR(&msghdr); - if (cmptr->cmsg_type != SCM_TIMESTAMP) { - logmsgx("cmsg_type %d != SCM_TIMESTAMP", cmptr->cmsg_type); + if (socket_connect(fd) < 0) goto done; + + size = msghdr.msg_iov != NULL ? 
msghdr.msg_iov->iov_len : 0; + rv = -1; + for (socklen = 0; socklen < CMSG_LEN(0); ++socklen) { + cmsghdr->cmsg_len = socklen; + dbgmsg("send: data size %zu", size); + dbgmsg("send: msghdr.msg_controllen %u", + (u_int)msghdr.msg_controllen); + dbgmsg("send: cmsghdr.cmsg_len %u", + (u_int)cmsghdr->cmsg_len); + if (sendmsg(fd, &msghdr, 0) < 0) + continue; + logmsgx("sent message with cmsghdr.cmsg_len %u < %u", + (u_int)cmsghdr->cmsg_len, (u_int)CMSG_LEN(0)); + break; } + if (socklen == CMSG_LEN(0)) + rv = 0; - if (cmptr->cmsg_len != CMSG_LEN(sizeof(struct timeval))) { - logmsgx("cmsg_len %u != %lu (CMSG_LEN(sizeof(struct timeval))", - (u_int)cmptr->cmsg_len, (u_long)CMSG_LEN(sizeof(struct timeval))); + if (sync_send() < 0) { + rv = -2; goto done; } +done: + free(cmsg_data); + return (rv); +} - timeval = (const struct timeval *)CMSG_DATA(cmptr); +static int +t_cmsg_len_server(int fd1) +{ + int fd2, rv; - dbgmsg(("timeval tv_sec %jd, tv_usec %jd", - (intmax_t)timeval->tv_sec, (intmax_t)timeval->tv_usec)); + if (sync_send() < 0) + return (-2); - if ((cmptr = CMSG_NXTHDR(&msg, cmptr)) != NULL) { - logmsgx("control data has extra header"); - goto done; - } + rv = -2; - error = 0; + if (sock_type == SOCK_STREAM) { + fd2 = socket_accept(fd1); + if (fd2 < 0) + goto done; + } else + fd2 = fd1; + if (sync_recv() < 0) + goto done; + + rv = 0; done: - if (sock_type == SOCK_STREAM) - if (close(fd2) < 0) { - logmsg("close"); - return (-2); - } - return (error); + if (sock_type == SOCK_STREAM && fd2 >= 0) + if (socket_close(fd2) < 0) + rv = -2; + return (rv); +} -failed: - if (sock_type == SOCK_STREAM) - if (close(fd2) < 0) - logmsg("close"); - return (-2); +static int +t_cmsg_len(void) +{ + return (t_generic(t_cmsg_len_client, t_cmsg_len_server)); } static int -t_timestamp(void) +t_peercred_client(int fd) { - int error, fd; + struct xucred xucred; + socklen_t len; - if ((fd = create_server_socket()) < 0) - return (-2); + if (sync_recv() < 0) + return (-1); - if (sock_type == 
SOCK_STREAM) - if (listen(fd, LISTENQ) < 0) { - logmsg("listen"); - goto failed; - } + if (socket_connect(fd) < 0) + return (-1); - if ((client_pid = fork()) == (pid_t)-1) { - logmsg("fork"); - goto failed; + len = sizeof(xucred); + if (getsockopt(fd, 0, LOCAL_PEERCRED, &xucred, &len) < 0) { + logmsg("getsockopt(LOCAL_PEERCRED)"); + return (-1); } - if (client_pid == 0) { - myname = "CLIENT"; - if (close_socket((const char *)NULL, fd) < 0) - _exit(1); - t_timestamp_client(); - } + if (check_xucred(&xucred, len) < 0) + return (-1); - if ((error = t_timestamp_server(fd)) == -2) { - (void)wait_client(); - goto failed; - } + return (0); +} - if (wait_client() < 0) - goto failed; +static int +t_peercred_server(int fd1) +{ + struct xucred xucred; + socklen_t len; + int fd2, rv; - if (close_socket(serv_sock_path, fd) < 0) { - logmsgx("close_socket failed"); + if (sync_send() < 0) return (-2); + + fd2 = socket_accept(fd1); + if (fd2 < 0) + return (-2); + + len = sizeof(xucred); + if (getsockopt(fd2, 0, LOCAL_PEERCRED, &xucred, &len) < 0) { + logmsg("getsockopt(LOCAL_PEERCRED)"); + rv = -2; + goto done; } - return (error); -failed: - if (close_socket(serv_sock_path, fd) < 0) - logmsgx("close_socket failed"); - return (-2); + if (check_xucred(&xucred, len) < 0) { + rv = -1; + goto done; + } + + rv = 0; +done: + if (socket_close(fd2) < 0) + rv = -2; + return (rv); +} + +static int +t_peercred(void) +{ + return (t_generic(t_peercred_client, t_peercred_server)); } diff -ruNp unix_cmsg.orig/unix_cmsg.t unix_cmsg/unix_cmsg.t --- unix_cmsg.orig/unix_cmsg.t 2012-11-19 14:38:48.000000000 +0200 +++ unix_cmsg/unix_cmsg.t 2013-02-08 12:08:52.000000000 +0200 @@ -11,47 +11,78 @@ n=0 run() { - result=`${cmd} -t $2 $3 $4 2>&1` - if [ $? -eq 0 ]; then - echo -n "ok $1" - else - echo -n "not ok $1" + result=`${cmd} -t $2 $3 ${5%% *} 2>&1` + if [ $? 
-ne 0 ]; then + echo -n "not " fi - echo " -" $5 + echo "ok $1 - $4 ${5#* }" echo ${result} | grep -E "SERVER|CLIENT" | while read line; do echo "# ${line}" done } -echo "1..15" +echo "1..47" -for desc in \ - "Sending, receiving cmsgcred" \ - "Receiving sockcred (listening socket has LOCAL_CREDS) # TODO" \ - "Receiving sockcred (accepted socket has LOCAL_CREDS) # TODO" \ - "Sending cmsgcred, receiving sockcred # TODO" \ - "Sending, receiving timestamp" +for t1 in \ + "1 Sending, receiving cmsgcred" \ + "4 Sending cmsgcred, receiving sockcred" \ + "5 Sending, receiving timeval" \ + "6 Sending, receiving bintime" \ + "7 Check cmsghdr.cmsg_len" do - n=`expr ${n} + 1` - run ${n} stream "" ${n} "STREAM ${desc}" + for t2 in \ + "0 " \ + "1 (no data)" \ + "2 (no array)" \ + "3 (no data, array)" + do + n=$((n + 1)) + run ${n} stream "-z ${t2%% *}" STREAM "${t1} ${t2#* }" + done +done + +for t1 in \ + "2 Receiving sockcred (listening socket)" \ + "3 Receiving sockcred (accepted socket)" +do + for t2 in \ + "0 " \ + "1 (no data)" + do + n=$((n + 1)) + run ${n} stream "-z ${t2%% *}" STREAM "${t1} ${t2#* }" + done done -i=0 -for desc in \ - "Sending, receiving cmsgcred" \ - "Receiving sockcred # TODO" \ - "Sending cmsgcred, receiving sockcred # TODO" \ - "Sending, receiving timestamp" +n=$((n + 1)) +run ${n} stream "-z 0" STREAM "8 Check LOCAL_PEERCRED socket option" + +for t1 in \ + "1 Sending, receiving cmsgcred" \ + "3 Sending cmsgcred, receiving sockcred" \ + "4 Sending, receiving timeval" \ + "5 Sending, receiving bintime" \ + "6 Check cmsghdr.cmsg_len" do - i=`expr ${i} + 1` - n=`expr ${n} + 1` - run ${n} dgram "" ${i} "DGRAM ${desc}" + for t2 in \ + "0 " \ + "1 (no data)" \ + "2 (no array)" \ + "3 (no data, array)" + do + n=$((n + 1)) + run ${n} dgram "-z ${t2%% *}" DGRAM "${t1} ${t2#* }" + done done -run 10 stream -z 1 "STREAM Sending, receiving cmsgcred (no control data)" -run 11 stream -z 4 "STREAM Sending cmsgcred, receiving sockcred (no control data) # TODO" -run 
12 stream -z 5 "STREAM Sending, receiving timestamp (no control data)" - -run 13 dgram -z 1 "DGRAM Sending, receiving cmsgcred (no control data)" -run 14 dgram -z 3 "DGRAM Sending cmsgcred, receiving sockcred (no control data) # TODO" -run 15 dgram -z 4 "DGRAM Sending, receiving timestamp (no control data)" +for t1 in \ + "2 Receiving sockcred" +do + for t2 in \ + "0 " \ + "1 (no data)" + do + n=$((n + 1)) + run ${n} dgram "-z ${t2%% *}" DGRAM "${t1} ${t2#* }" + done +done From owner-freebsd-net@FreeBSD.ORG Fri Feb 8 14:10:55 2013 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 03BF22FA; Fri, 8 Feb 2013 14:10:55 +0000 (UTC) (envelope-from smithi@nimnet.asn.au) Received: from sola.nimnet.asn.au (paqi.nimnet.asn.au [115.70.110.159]) by mx1.freebsd.org (Postfix) with ESMTP id 6D518840; Fri, 8 Feb 2013 14:10:53 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by sola.nimnet.asn.au (8.14.2/8.14.2) with ESMTP id r18EAjdP076787; Sat, 9 Feb 2013 01:10:45 +1100 (EST) (envelope-from smithi@nimnet.asn.au) Date: Sat, 9 Feb 2013 01:10:45 +1100 (EST) From: Ian Smith To: "Eggert, Lars" Subject: Re: high cpu usage on natd / dhcpd In-Reply-To: Message-ID: <20130208012008.L21988@sola.nimnet.asn.au> References: <510A87B8.7000705@luckie.org.nz> <20130207231943.O21988@sola.nimnet.asn.au> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Cc: "" , "freebsd-net@freebsd.org" , Matthew Luckie X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 08 Feb 2013 14:10:55 -0000 On Thu, 7 Feb 2013 12:50:51 +0000, Eggert, Lars wrote: > Hi, > > On Feb 7, 2013, at 13:40, Ian Smith wrote: > > On Thu, 7 Feb 2013 08:08:59 +0000, Eggert, Lars wrote: > >> On Jan 31, 2013, at 16:03, Matthew 
Luckie wrote: > >>> > >>> 00510 allow ip from me to not me out via em1 > >>> 00550 divert 8668 ip from any to any via em1 > >>> > >>> Rule 510 fixes it. > >> > >> Yep, it does. Can I ask someone to commit this to rc.firewall? > > > > The ruleset Matthew posted bears no resemblance to rc.firewall, so I > > don't see that (or how) it solves any generic problem. > > sorry for having been imprecise. What I was asking for was this change: > > --- /usr/src/etc/rc.firewall 2012-11-17 12:36:10.000000000 +0100 > +++ rc.firewall 2013-02-06 11:35:45.000000000 +0100 > @@ -155,6 +155,7 @@ > case ${natd_enable} in > [Yy][Ee][Ss]) > if [ -n "${natd_interface}" ]; then > + ${fwcmd} add 49 allow ip from me to not me out via ${natd_interface} > ${fwcmd} add 50 divert natd ip4 from any to any via ${natd_interface} > fi > ;; That could break the 'client' ruleset, which also includes this section, so to do this you may need another case for just 'open' to add that allow first, then the existing code for 'client' as well. Bit messy. My patch made it a setup_nat() function called with or without rule number, so it could be used in 'simple' too, which currently lacks kernel nat. That allows all outbound IP (4 or 6) from any address on your box (me) without trying to divert it via natd - which is a sensible aim for 'open', and as julian@ has said (paraphrasing perhaps) "Never waste natd's time with a packet it doesn't care about", which these are. I think you'd do better for this case to either put these few rules you need, including the following '65000 allow all..' into /etc/my.rules and set firewall_type="/etc/my.rules", or copy rc.firewall to rc.mywall, modify only that and set firewall_script="/etc/rc.mywall" in rc.conf ? Either way you'll still get setup_loopback() and setup_ipv6_mandatory() rules. If it improves performance, can you instrument that at all? 
> >> (And I wonder if the rules for the ipfw kernel firewall need a > >> similar addition, because the system locks up under heavy network > >> load if I use that instead of natd.) Perhaps finding the root cause of 'lock up' would be useful to pursue? Is there any ipv6 involved with this? Is your upstream DHCP server giving you an address in public or RFC1918 space? What packet rates? > > Which rc.firewall ruleset are you referring to? > > My rc.conf has: > > gateway_enable="YES" > firewall_enable="YES" > firewall_type="OPEN" > natd_enable="YES" > natd_interface="bce0" > > With the patch above, that seems to work fine. > > I tried to replace the natd_* lines with: > > firewall_nat_enable="YES" > firewall_nat_interface="bce0" > > which caused the machine to lock up under load, similar to when natd > started eating CPU cycles. This made me wonder if a similar patch to > the above for the firewall_nat_* case in rc.firewall might be needed. Well it shouldn't, but maybe you've reached some load / pps limit on your hardware in ipfw_nat too? Again, avoiding trying to do NAT on ineligible (outbound, from me) packets is not a bad idea per se. One of the issues in outstanding PRs for /etc/rc.d/ipfw is that if you still have natd_enable set, it won't load the ipfw_nat module needed, ie you currently need to know you must disable natd when enabling ipfw_nat. > > I suggest following up to ipfw@ (cc'd) rather than net@ > > Will subscribe, thanks. > > Lars I'll leave you to pull this out of net@ if you think it best. 
cheers, Ian From owner-freebsd-net@FreeBSD.ORG Fri Feb 8 18:16:58 2013 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 16390941; Fri, 8 Feb 2013 18:16:58 +0000 (UTC) (envelope-from jfvogel@gmail.com) Received: from mail-ve0-f176.google.com (mail-ve0-f176.google.com [209.85.128.176]) by mx1.freebsd.org (Postfix) with ESMTP id A93716E0; Fri, 8 Feb 2013 18:16:57 +0000 (UTC) Received: by mail-ve0-f176.google.com with SMTP id cz10so3548090veb.21 for ; Fri, 08 Feb 2013 10:16:50 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:x-received:date:message-id:subject:from:to:cc :content-type; bh=/HWKFgz9ZwkiIxlnuu1o7XRIvTSkuVs3bQva9h97swI=; b=cRpzOlHfJOoNMeQHaRLycuqq/MdFZph7Ag1sMD0Q9pkjTIFqoKiCPUFf9t+3jZYdm7 SNolqxT2lGw4tCy85YjurATp8B97qX3gG6FVnXDk/W/xEhwrOorPKeXsf4uw5mJOgyos dli1NHcEw5PhjSfcL23+zACjmLHTBwjnZfn8pnWpZ0Fs0x2VsctWJAEoPBBEceziV7Nc tCskHB5KoakPq6fb+vp5iU6mAUpZvpnpfz3h61cHySnzAdYghotJvZDsOJW7TW8JU7Ue UljccQZE8A8Ac7RV3aXLpPmGOqFa7FoQ2UE1bovSROmap+07DYOL3LqUEgN/8/SfKxGw NU/w== MIME-Version: 1.0 X-Received: by 10.52.29.109 with SMTP id j13mr6919204vdh.111.1360347409285; Fri, 08 Feb 2013 10:16:49 -0800 (PST) Received: by 10.220.191.132 with HTTP; Fri, 8 Feb 2013 10:16:49 -0800 (PST) Date: Fri, 8 Feb 2013 10:16:49 -0800 Message-ID: Subject: Intel 82574 issue reported on Slashdot From: Jack Vogel To: FreeBSD Net , FreeBSD Current , FreeBSD stable Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.14 Cc: "Pieper, Jeffrey E" , "Hearn, James R" , "Ronciak, John" , "Vogel, Jack" X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 08 Feb 2013 18:16:58 -0000 For 
those that may have run across the story on Slashdot about this NIC, here is our statement: Recently there were a few stories published, based on a blog post by an end-user, suggesting specific network packets may cause the Intel® 82574L Gigabit Ethernet Controller to become unresponsive until corrected by a full platform power cycle. Intel was made aware of this issue in September 2012 by the blogs author. Intel worked with the author as well as the original motherboard manufacturer to investigate and determine root cause. Intel root caused the issue to the specific vendor’s mother board design where an incorrect EEPROM image was programmed during manufacturing. We communicated the findings and recommended corrections to the motherboard manufacturer. It is Intel’s belief that this is an implementation issue isolated to a specific manufacturer, not a design problem with the Intel 82574L Gigabit Ethernet controller. Intel has not observed this issue with any implementations which follow Intel’s published design guidelines. Intel recommends contacting your motherboard manufacturer if you have continued concerns or questions whether your products are impacted. Here is the link: http://communities.intel.com/community/wired/blog/2013/02/07/intel-82574l-gigabit-ethernet-controller-statement Any questions or concerns may be sent to me.
Cheers, Jack From owner-freebsd-net@FreeBSD.ORG Fri Feb 8 18:49:07 2013 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id B25EFB6D for ; Fri, 8 Feb 2013 18:49:07 +0000 (UTC) (envelope-from artemb@gmail.com) Received: from mail-ve0-f182.google.com (mail-ve0-f182.google.com [209.85.128.182]) by mx1.freebsd.org (Postfix) with ESMTP id 7796E8DA for ; Fri, 8 Feb 2013 18:49:07 +0000 (UTC) Received: by mail-ve0-f182.google.com with SMTP id ox1so3628150veb.27 for ; Fri, 08 Feb 2013 10:49:01 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:x-received:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:content-type :content-transfer-encoding; bh=sdelzcUGXoGeOwZjo2T0XAkAm8zxxQ8MYKhYu5vcjn0=; b=RctkPMkoczsl48QYzBqQFvxZ+pSX/Z+yvlHK0CfZexbvTtxX696qKH2LbVLVcME+i0 ALDyH6+0c1lXaPZWJSI6k0doNH+cVCqL3rdeowEt6Wy3lEHbVgOK5r0rkexQSpMeyZ0A C1/Ae95mUK8o+389A/mIccLvtEEa5lEbV3igvLNJFOAvsKJB5/F8gSJbR9sX98fN/T1G 65EHyNwDzbNYe4vO/krrKwuw5NRi82jxLTDU9lOGfxI6/43n/lq7bkiny2cU8zUnfiZN JpnB08+s6I2Np4HlcwVe0kQuvlymbNheD+edPu1TdWNVYKfexpej1E2obzCYABtPQNNG 6/kQ== MIME-Version: 1.0 X-Received: by 10.52.18.235 with SMTP id z11mr7105043vdd.39.1360349341166; Fri, 08 Feb 2013 10:49:01 -0800 (PST) Sender: artemb@gmail.com Received: by 10.220.123.2 with HTTP; Fri, 8 Feb 2013 10:49:01 -0800 (PST) In-Reply-To: References: Date: Fri, 8 Feb 2013 10:49:01 -0800 X-Google-Sender-Auth: yPBFtuQaeMYmdPZp4JNJHku7HOg Message-ID: Subject: Re: Intel 82574 issue reported on Slashdot From: Artem Belevich To: Jack Vogel , FreeBSD Net Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: quoted-printable X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , 
X-List-Received-Date: Fri, 08 Feb 2013 18:49:07 -0000 Jack, How do I tell whether my motherboards are made by 'specific manufacturer' and whether NICs there are affected? Broadcasting packet of death is not a very good method in production environment. EEPROM dump on my 82574L NICs on Supermicro X9SAE-V motherboard do match the 'bad' EEPROM mentioned in the http://www.kriskinc.com/intel-pod --Artem On Fri, Feb 8, 2013 at 10:16 AM, Jack Vogel wrote: > For those that may have run across the story on Slashdot about this NIC, > here is our statement: > > Recently there were a few stories published, based on a blog post by an > end-user, suggesting specific network packets may cause the Intel® 82574L > Gigabit Ethernet Controller to become unresponsive until corrected by a > full platform power cycle. > > Intel was made aware of this issue in September 2012 by the blogs author. > Intel worked with the author as well as the original motherboard > manufacturer to investigate and determine root cause. Intel root caused the > issue to the specific vendor’s mother board design where an incorrect > EEPROM image was programmed during manufacturing. We communicated the > findings and recommended corrections to the motherboard manufacturer. > > It is Intel’s belief that this is an implementation issue isolated to a > specific manufacturer, not a design problem with the Intel 82574L Gigabit > Ethernet controller. Intel has not observed this issue with any > implementations which follow Intel’s published design guidelines. Intel > recommends contacting your motherboard manufacturer if you have continued > concerns or questions whether your products are impacted. > Here is the link: > > http://communities.intel.com/community/wired/blog/2013/02/07/intel-82574l-gigabit-ethernet-controller-statement > > Any questions or concerns may be sent to me.
> > Cheers, > > Jack > _______________________________________________ > freebsd-stable@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-stable > To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org" From owner-freebsd-net@FreeBSD.ORG Fri Feb 8 18:50:22 2013 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 395ACCDC for ; Fri, 8 Feb 2013 18:50:22 +0000 (UTC) (envelope-from john@jnielsen.net) Received: from ns1.jnielsen.net (secure.freebsdsolutions.net [69.55.234.48]) by mx1.freebsd.org (Postfix) with ESMTP id EFB8A8EC for ; Fri, 8 Feb 2013 18:50:21 +0000 (UTC) Received: from [10.10.1.32] (office.betterlinux.com [199.58.199.60]) (authenticated bits=0) by ns1.jnielsen.net (8.14.4/8.14.4) with ESMTP id r18ISWiX022314 (version=TLSv1/SSLv3 cipher=AES128-SHA bits=128 verify=NOT); Fri, 8 Feb 2013 13:28:34 -0500 (EST) (envelope-from john@jnielsen.net) Content-Type: text/plain; charset=us-ascii Mime-Version: 1.0 (Mac OS X Mail 6.2 \(1499\)) Subject: Re: Question: Why ain't I getting gigabit speed? From: John Nielsen In-Reply-To: <18120.1360278807@tristatelogic.com> Date: Fri, 8 Feb 2013 11:28:45 -0700 Content-Transfer-Encoding: quoted-printable Message-Id: References: <18120.1360278807@tristatelogic.com> To: "Ronald F. Guilmette" X-Mailer: Apple Mail (2.1499) X-DCC-sonic.net-Metrics: ns1.jnielsen.net 1156; Body=2 Fuz1=2 Fuz2=2 X-Virus-Scanned: clamav-milter 0.97.5 at ns1.jnielsen.net X-Virus-Status: Clean Cc: freebsd-net@freebsd.org X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 08 Feb 2013 18:50:22 -0000 On Feb 7, 2013, at 4:13 PM, Ronald F. 
Guilmette wrote: > I just aquired a brand new chepie gigabit PCI ethernet card off eBay. > The main chip on it appears to be an RTL8110S-32. > > I stuck this card into a 9.1-RELEASE system that I have been putting > together, and it seemed to be recognized ok (as re0) upon boot up, so > I diddled my /etc/rc.conf file to get it to ifconfig as 192.168.1.3 > on reboot. Then I rebooted. > > I have the card wired via a CAT6 cable to my Linksys E2000 gigabit > router. Nonetheless, upon reboot, followed by "ifconfig -a", the > output from ifconfig says the following for this card: > > re0: flags=8843 metric 0 mtu 1500 > options=8209b > ether 00:13:3b:02:03:bd > inet 192.168.1.3 netmask 0xffffff00 broadcast 192.168.1.255 > inet6 fe80::213:3bff:fe02:3bd%re0 prefixlen 64 scopeid 0x7 > nd6 options=29 > media: Ethernet autoselect (100baseTX ) > status: active > > I've tried two different CAT6 cables, two different LAN ports on my E2000, > and I've even tried the card in two different PCI slost on my motherboard, > but the results are always the same. > > So, um, what gives? Why does the driver appear to be setting this card to > 100baseTX rather than the 1000baseTX that I was hoping for? > > Is there some magic spell that I am unaware of that I must cast on this > in order to get it to work right? I would suspect the switch ("router"). FYI: http://forum.qnap.com/viewtopic.php?f=11&t=47421#p213242 I have an re interface on my FreeBSD router and it connects at 1000baseT no problem. > P.S. dmesg has this to say about the card: > > re0: port 0xbe00-0xbeff mem 0xdf9ff000-0xdf9ff0ff irq 18 at device 5.0 on pci4 > re0: Chip rev. 0x04000000 > re0: MAC rev.
0x00000000 > re0: Ethernet address: 00:13:3b:02:03:bd > re0: link state changed to UP > re0: link state changed to DOWN > re0: link state changed to UP > _______________________________________________ > freebsd-net@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-net > To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org" > From owner-freebsd-net@FreeBSD.ORG Fri Feb 8 20:48:37 2013 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 4E9D52A0 for ; Fri, 8 Feb 2013 20:48:37 +0000 (UTC) (envelope-from rfg@tristatelogic.com) Received: from outgoing.tristatelogic.com (segfault.tristatelogic.com [69.62.255.118]) by mx1.freebsd.org (Postfix) with ESMTP id 1BFDFF2C for ; Fri, 8 Feb 2013 20:48:36 +0000 (UTC) Received: from segfault-nmh-helo.tristatelogic.com (localhost [127.0.0.1]) by segfault.tristatelogic.com (Postfix) with ESMTP id 987645081B for ; Fri, 8 Feb 2013 12:48:32 -0800 (PST) To: freebsd-net@freebsd.org Subject: Re: Question: Why ain't I getting gigabit speed? In-Reply-To: Date: Fri, 08 Feb 2013 12:48:32 -0800 Message-ID: <29539.1360356512@tristatelogic.com> From: "Ronald F. Guilmette" X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 08 Feb 2013 20:48:37 -0000 In message , John Nielsen wrote: >On Feb 7, 2013, at 4:13 PM, Ronald F. Guilmette wrote: > >> I just aquired a brand new chepie gigabit PCI ethernet card off eBay. >> The main chip on it appears to be an RTL8110S-32. >>... >I would suspect the switch ("router"). FYI: >http://forum.qnap.com/viewtopic.php?f=11&t=47421#p213242 > >I have an re interface on my FreeBSD router and it connects at 1000baseT no problem.
Could you please send or post the relevant ifconfig printout for that, and also the applicable/relevant dmesg lines? This problem is very perplexing, but I don't think that the problem is with my Linksys E2000. I did some more experiments. Fortunately, I had a CAT6 crossover cable lying around. So I used that and connected my machine with the RTL8110S-32 in it directly to two other machines with gigabit interfaces. One was my other server. The other was a laptop I have here. The results were very strange. In the case of connecting to the laptop, all seemed to work correctly, however ifconfig showed that my re0 device in this case believed itself to be "master". (I suspect that this may make a difference, and that the current FreeBSD re driver may perhaps behave better when it is acting as master.) In the case of connecting (via CAT6 crossover) direct to my other server, things got even more strange. In this case, after making the connection, autonegotiation apparently worked correctly, and I could see "1000baseT" in the output from "ifconfig re0", *however* a moment or two later, suddenly the connection was entirely dropped, and now the ifconfig output said "no carrier". I reproduced this sequence multiple times. It is readily reproducable. (The other server is running FreeBSD 8.3- RELEASE with an on-motherboard Nvidia gigabit ethernet interface, BTW.) I am inclined to wonder if perhaps the re driver has some rough edges still. Regards, rfg P.S. Since this card is really not working out for me, has anybody got a suggestion and/or link they could send me for an _inexpensive_ gigabit PCI nic that works reliably with FreeBSD? (I am hoping for something under $12 USD.) 
From owner-freebsd-net@FreeBSD.ORG Fri Feb 8 22:01:10 2013 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 8D71B935 for ; Fri, 8 Feb 2013 22:01:10 +0000 (UTC) (envelope-from ml@my.gd) Received: from mail-wi0-f181.google.com (mail-wi0-f181.google.com [209.85.212.181]) by mx1.freebsd.org (Postfix) with ESMTP id 243DF351 for ; Fri, 8 Feb 2013 22:01:09 +0000 (UTC) Received: by mail-wi0-f181.google.com with SMTP id hm6so1384360wib.14 for ; Fri, 08 Feb 2013 14:01:09 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=x-received:references:in-reply-to:mime-version :content-transfer-encoding:content-type:message-id:cc:x-mailer:from :subject:date:to:x-gm-message-state; bh=FYCmSlrGjjjeS3u69mbiaN+OKXUsOC4jx4+LeQ0QAKo=; b=NKnVpH2U6zok1Uc7QTrYnHDJkAU8umN4IBMvvome0jV3Y7WUQVHVmyc+i/MyjJRqcG 9ixP/7C1Hlt/cGMxD8ZQpcn/qtieFy124EwGYiBHosKDPPrbrTGHu7lSSW3KDC/shzpX 87owVlj2wSjbDI6m8LYWxFDPYD5x5RtH5uyYOTXdaGP9LRcp/lEg1Jd0OjIWT2hld6eg vOLEOo/7l/Y1fRip5qyxu0mhN+pIIZtM7/QlreKyiQXMZ8tvHRqIW85ITSA4pJj0weRT w0FcKBA2ZP1vacjHiH1+pT9j/AZVN0H397o8WL/RwFz2C/Oi3sykaH+K/FwGdr+u4EsD NetQ== X-Received: by 10.180.101.104 with SMTP id ff8mr5104978wib.11.1360360869050; Fri, 08 Feb 2013 14:01:09 -0800 (PST) Received: from [10.33.180.116] ([92.90.20.9]) by mx.google.com with ESMTPS id e6sm18022976wiz.1.2013.02.08.14.01.07 (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Fri, 08 Feb 2013 14:01:08 -0800 (PST) References: <29539.1360356512@tristatelogic.com> In-Reply-To: <29539.1360356512@tristatelogic.com> Mime-Version: 1.0 (1.0) Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset=us-ascii Message-Id: X-Mailer: iPhone Mail (9A405) From: Damien Fleuriot Subject: Re: Question: Why ain't I getting gigabit speed? Date: Fri, 8 Feb 2013 22:59:54 +0100 To: "Ronald F. 
Guilmette" X-Gm-Message-State: ALoCoQn6OkiRHuupSiObiSSk9/W+hmLQkEljx5lUNv80IxD1bz214JdlZZaqq/BpMQh/NtIx45bA Cc: "freebsd-net@freebsd.org" X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 08 Feb 2013 22:01:10 -0000 On 8 Feb 2013, at 21:48, "Ronald F. Guilmette" wrote: > > In message , > John Nielsen wrote: > >> On Feb 7, 2013, at 4:13 PM, Ronald F. Guilmette wrote: >> >>> I just aquired a brand new chepie gigabit PCI ethernet card off eBay. >>> The main chip on it appears to be an RTL8110S-32. >>> ... > >> I would suspect the switch ("router"). FYI: >> http://forum.qnap.com/viewtopic.php?f=11&t=47421#p213242 >> >> I have an re interface on my FreeBSD router and it connects at 1000baseT no problem. > > Could you please send or post the relevant ifconfig printout for that, > and also the applicable/relevant dmesg lines? > > This problem is very perplexing, but I don't think that the problem > is with my Linksys E2000. > > I did some more experiments. Fortunately, I had a CAT6 crossover cable > lying around. So I used that and connected my machine with the RTL8110S-32 > in it directly to two other machines with gigabit interfaces. One was > my other server. The other was a laptop I have here. The results were > very strange. > > In the case of connecting to the laptop, all seemed to work correctly, > however ifconfig showed that my re0 device in this case believed itself > to be "master". (I suspect that this may make a difference, and that > the current FreeBSD re driver may perhaps behave better when it is > acting as master.) > ????? Come again ? Master what ? You never mentioned using lagg. > In the case of connecting (via CAT6 crossover) direct to my other server, > things got even more strange.
In this case, after making the connection, > autonegotiation apparently worked correctly, and I could see "1000baseT" > in the output from "ifconfig re0", *however* a moment or two later, > suddenly the connection was entirely dropped, and now the ifconfig > output said "no carrier". I reproduced this sequence multiple times. > It is readily reproducable. (The other server is running FreeBSD 8.3- > RELEASE with an on-motherboard Nvidia gigabit ethernet interface, BTW.) > > I am inclined to wonder if perhaps the re driver has some rough edges > still. > > > Regards, > rfg > > > P.S. Since this card is really not working out for me, has anybody got > a suggestion and/or link they could send me for an _inexpensive_ gigabit > PCI nic that works reliably with FreeBSD? (I am hoping for something under > $12 Come on you've got to be kidding here... Get an intel or a Broadcom, cough up a bit more than duh... $12, and you'll be happy with that card. FYI I'm getting 1000baseT from built-in NICs. From owner-freebsd-net@FreeBSD.ORG Fri Feb 8 22:15:25 2013 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id E2ACAE2A for ; Fri, 8 Feb 2013 22:15:25 +0000 (UTC) (envelope-from rfg@tristatelogic.com) Received: from outgoing.tristatelogic.com (segfault.tristatelogic.com [69.62.255.118]) by mx1.freebsd.org (Postfix) with ESMTP id 61FE9607 for ; Fri, 8 Feb 2013 22:15:25 +0000 (UTC) Received: from segfault-nmh-helo.tristatelogic.com (localhost [127.0.0.1]) by segfault.tristatelogic.com (Postfix) with ESMTP id 95E545081A for ; Fri, 8 Feb 2013 14:15:20 -0800 (PST) To: "freebsd-net@freebsd.org" Subject: Re: Question: Why ain't I getting gigabit speed? In-Reply-To: Date: Fri, 08 Feb 2013 14:15:20 -0800 Message-ID: <30352.1360361720@tristatelogic.com> From: "Ronald F.
Guilmette" X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 08 Feb 2013 22:15:25 -0000 In message , Damien Fleuriot wrote: >> In the case of connecting to the laptop, all seemed to work correctly, >> however ifconfig showed that my re0 device in this case believed itself >> to be "master". (I suspect that this may make a difference, and that >> the current FreeBSD re driver may perhaps behave better when it is >> acting as master.) >> > >????? >Come again ? >Master what ? The hell if I know! That's just what it said on the output of "ifconfig re0". I saw the word "master" following after the "1000BaseT" (but inside the <>) on the "media:" line of the output. >You never mentioned using lagg. I have no idea what that is. Thus, I have no idea if I am using it or not. If I am, it is certainly unintentional. Regards, rfg From owner-freebsd-net@FreeBSD.ORG Fri Feb 8 23:53:06 2013 Return-Path: Delivered-To: freebsd-net@smarthost.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 71E463E1; Fri, 8 Feb 2013 23:53:06 +0000 (UTC) (envelope-from eadler@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:1900:2254:206c::16:87]) by mx1.freebsd.org (Postfix) with ESMTP id 1D2F3A47; Fri, 8 Feb 2013 23:53:06 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.6/8.14.6) with ESMTP id r18Nr5xt001016; Fri, 8 Feb 2013 23:53:05 GMT (envelope-from eadler@freefall.freebsd.org) Received: (from eadler@localhost) by freefall.freebsd.org (8.14.6/8.14.6/Submit) id r18Nr50m001012; Fri, 8 Feb 2013 23:53:05 GMT (envelope-from eadler) Date: Fri, 8 Feb 2013 23:53:05 GMT Message-Id: <201302082353.r18Nr50m001012@freefall.freebsd.org> To: eadler@FreeBSD.org, 
freebsd-bugs@FreeBSD.org, freebsd-net@FreeBSD.org From: eadler@FreeBSD.org Subject: Re: bin/175974: ppp(8): logic issue X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 08 Feb 2013 23:53:06 -0000 Old Synopsis: logic issue in ppp(8) New Synopsis: ppp(8): logic issue Responsible-Changed-From-To: freebsd-bugs->freebsd-net Responsible-Changed-By: eadler Responsible-Changed-When: Fri Feb 8 23:52:20 UTC 2013 Responsible-Changed-Why: change synopsis and assign http://www.freebsd.org/cgi/query-pr.cgi?pr=175974 From owner-freebsd-net@FreeBSD.ORG Sat Feb 9 00:35:11 2013 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id B5EA1D5E for ; Sat, 9 Feb 2013 00:35:11 +0000 (UTC) (envelope-from lists@jnielsen.net) Received: from ns1.jnielsen.net (secure.freebsdsolutions.net [69.55.234.48]) by mx1.freebsd.org (Postfix) with ESMTP id 855CAC3A for ; Sat, 9 Feb 2013 00:35:11 +0000 (UTC) Received: from [10.10.1.32] (office.betterlinux.com [199.58.199.60]) (authenticated bits=0) by ns1.jnielsen.net (8.14.4/8.14.4) with ESMTP id r190Z4iq013507 (version=TLSv1/SSLv3 cipher=AES128-SHA bits=128 verify=NOT); Fri, 8 Feb 2013 19:35:05 -0500 (EST) (envelope-from lists@jnielsen.net) Content-Type: text/plain; charset=us-ascii Mime-Version: 1.0 (Mac OS X Mail 6.2 \(1499\)) Subject: Re: Question: Why ain't I getting gigabit speed? From: John Nielsen In-Reply-To: <29539.1360356512@tristatelogic.com> Date: Fri, 8 Feb 2013 17:35:18 -0700 Content-Transfer-Encoding: quoted-printable Message-Id: <6FA76794-13B9-4124-BD0E-87E1673B8B7A@jnielsen.net> References: <29539.1360356512@tristatelogic.com> To: "Ronald F. 
Guilmette" X-Mailer: Apple Mail (2.1499) X-DCC-sonic.net-Metrics: ns1.jnielsen.net 1117; Body=2 Fuz1=2 Fuz2=2 X-Virus-Scanned: clamav-milter 0.97.5 at ns1.jnielsen.net X-Virus-Status: Clean Cc: freebsd-net@freebsd.org X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 09 Feb 2013 00:35:11 -0000 On Feb 8, 2013, at 1:48 PM, Ronald F. Guilmette wrote: > In message , > John Nielsen wrote: > >> On Feb 7, 2013, at 4:13 PM, Ronald F. Guilmette wrote: >> >>> I just aquired a brand new chepie gigabit PCI ethernet card off eBay. >>> The main chip on it appears to be an RTL8110S-32. >>> ... > >> I would suspect the switch ("router"). FYI: >> http://forum.qnap.com/viewtopic.php?f=11&t=47421#p213242 >> >> I have an re interface on my FreeBSD router and it connects at 1000baseT no problem. > > Could you please send or post the relevant ifconfig printout for that, > and also the applicable/relevant dmesg lines?
% ifconfig re0
re0: flags=8843 metric 0 mtu 1500
	options=8209b
	ether 00:1f:e2:55:1d:bc
	inet 67.182.217.170 netmask 0xfffffc00 broadcast 255.255.255.255
	nd6 options=29
	media: Ethernet autoselect (1000baseT )
	status: active
% dmesg | egrep '^re0:|^miibus0:|^rgephy0:'
re0: port 0xd800-0xd8ff mem 0xfe9ff000-0xfe9fffff irq 17 at device 0.0 on pci2
re0: Using 1 MSI message
re0: Chip rev. 0x38000000
re0: MAC rev. 0x00400000
miibus0: on re0
rgephy0: PHY 1 on miibus0
rgephy0: none, 10baseT, 10baseT-FDX, 10baseT-FDX-flow, 100baseTX, 100baseTX-FDX, 100baseTX-FDX-flow, 1000baseT, 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, 1000baseT-FDX-flow, 1000baseT-FDX-flow-master, auto, auto-flow
re0: Ethernet address: 00:1f:e2:55:1d:bc
> This problem is very perplexing, but I don't think that the problem > is with my Linksys E2000.
> > I did some more experiments. Fortunately, I had a CAT6 crossover cable > lying around. So I used that and connected my machine with the RTL8110S-32 > in it directly to two other machines with gigabit interfaces. One was > my other server. The other was a laptop I have here. The results were > very strange. > > In the case of connecting to the laptop, all seemed to work correctly, > however ifconfig showed that my re0 device in this case believed itself > to be "master". (I suspect that this may make a difference, and that > the current FreeBSD re driver may perhaps behave better when it is > acting as master.) Agree with other followup--"master" shouldn't be applicable here; figure that out before you spend more time worrying about hardware. Would you mind posting a redacted version of /etc/rc.conf (and the contents of /etc/rc.conf.d, if any)? > In the case of connecting (via CAT6 crossover) direct to my other server, > things got even more strange. In this case, after making the connection, > autonegotiation apparently worked correctly, and I could see "1000baseT" > in the output from "ifconfig re0", *however* a moment or two later, > suddenly the connection was entirely dropped, and now the ifconfig > output said "no carrier". I reproduced this sequence multiple times. > It is readily reproducable. (The other server is running FreeBSD 8.3- > RELEASE with an on-motherboard Nvidia gigabit ethernet interface, BTW.) Any log or kernel messages on either side when this happens? > I am inclined to wonder if perhaps the re driver has some rough edges > still. I wouldn't jump to that conclusion. It's not exactly a new driver and its author (Bill Paul) was quite experienced. It is possible you have a dodgy board though. > P.S. Since this card is really not working out for me, has anybody got > a suggestion and/or link they could send me for an _inexpensive_ gigabit > PCI nic that works reliably with FreeBSD? 
(I am hoping for something under > $12 USD.) Most/all 1G NICs in that price range will be Realtek. You may be able to find a Marvell/SysKonnect card for a bit more, but for not much more than that you can get something from Intel. You may get gigabit links from a cheap card but I wouldn't count on gigabit performance. (Actually any PCI card will fall short of gigabit performance.) If you actually care then spend the $30 on an Intel card. JN From owner-freebsd-net@FreeBSD.ORG Sat Feb 9 00:38:25 2013 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 050EFF81; Sat, 9 Feb 2013 00:38:25 +0000 (UTC) (envelope-from doconnor@gsoft.com.au) Received: from cain.gsoft.com.au (cain.gsoft.com.au [203.31.81.10]) by mx1.freebsd.org (Postfix) with ESMTP id 7F730CC8; Sat, 9 Feb 2013 00:38:23 +0000 (UTC) Received: from [10.138.22.220] ([1.124.113.155]) (authenticated bits=0) by cain.gsoft.com.au (8.14.4/8.14.3) with ESMTP id r190bqro039647 (version=TLSv1/SSLv3 cipher=AES128-SHA bits=128 verify=NO); Sat, 9 Feb 2013 11:07:59 +1030 (CST) (envelope-from doconnor@gsoft.com.au) Subject: Re: Intel 82574 issue reported on Slashdot Mime-Version: 1.0 (Mac OS X Mail 6.2 \(1499\)) Content-Type: text/plain; charset=windows-1252 From: "Daniel O'Connor" In-Reply-To: Date: Sat, 9 Feb 2013 11:07:51 +1030 Content-Transfer-Encoding: quoted-printable Message-Id: References: To: Jack Vogel X-Mailer: Apple Mail (2.1499) X-Spam-Score: -0.023 () BAYES_00,HELO_MISC_IP,RDNS_NONE X-Scanned-By: MIMEDefang 2.67 on 203.31.81.10 Cc: FreeBSD stable , "Pieper, Jeffrey E" , FreeBSD Net , "Hearn, James R" , "Vogel, Jack" , "Ronciak, John" , FreeBSD Current X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 09 Feb 2013 00:38:25 
-0000 On 09/02/2013, at 4:46, Jack Vogel wrote: > recommends contacting your motherboard manufacturer if you have continued > concerns or questions whether your products are impacted. > Here is the link: > > http://communities.intel.com/community/wired/blog/2013/02/07/intel-82574l-gigabit-ethernet-controller-statement > > Any questions or concerns may be sent to me. In all honesty.. The blog post (and your email) are basically information free, they don't name names and provide no script or downloadable code that will allow end users to check if they are affected. "Contact your motherboard manufacturer" is much more time consuming than "Run sysctl... | grep foo | awk ..." to see if your system is affected. -- Daniel O'Connor software and network engineer for Genesis Software - http://www.gsoft.com.au "The nice thing about standards is that there are so many of them to choose from." -- Andrew Tanenbaum GPG Fingerprint - 5596 B766 97C0 0E94 4347 295E E593 DC20 7B3F CE8C From owner-freebsd-net@FreeBSD.ORG Sat Feb 9 08:15:47 2013 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 5D487293; Sat, 9 Feb 2013 08:15:47 +0000 (UTC) (envelope-from bygg@mail.cafax.se) Received: from mail.cafax.se (mail.cafax.se [IPv6:2a00:801:11:53::4]) by mx1.freebsd.org (Postfix) with ESMTP id BDC7BDDC; Sat, 9 Feb 2013 08:15:45 +0000 (UTC) Received: from mail.cafax.se (localhost [127.0.0.1]) by mail.cafax.se (8.14.6/8.14.6) with ESMTP id r198FhOv018342 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Sat, 9 Feb 2013 09:15:43 +0100 (MET) Received: (from bygg@localhost) by mail.cafax.se (8.14.6/8.14.6/Submit) id r198Fh51009505; Sat, 9 Feb 2013 09:15:43 +0100 (MET) Sender: Johnny Eriksson Date: Sat, 9 Feb 2013 9:15:43 WET From: Johnny Eriksson To: FreeBSD Net Subject: Re: Intel 82574 issue reported on Slashdot In-Reply-To: Your 
message of Sat, 9 Feb 2013 11:07:51 +1030 Message-ID: X-Scanned-By: MIMEDefang 2.71 on 192.71.228.4 Cc: FreeBSD Current X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list Reply-To: Johnny Eriksson List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 09 Feb 2013 08:15:47 -0000 > In all honesty.. The blog post (and your email) are basically > information free, they don't name names and provide no script > or downloadable code that will allow end users to check if they > are affected. A link with a little bit more information: http://blog.krisk.org/2013/02/packets-of-death.html > Daniel O'Connor software and network engineer --Johnny From owner-freebsd-net@FreeBSD.ORG Sat Feb 9 10:12:26 2013 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 97EE3EAD; Sat, 9 Feb 2013 10:12:26 +0000 (UTC) (envelope-from parv@pair.com) Received: from hrndva-omtalb.mail.rr.com (hrndva-omtalb.mail.rr.com [71.74.56.122]) by mx1.freebsd.org (Postfix) with ESMTP id 18E1E191; Sat, 9 Feb 2013 10:12:24 +0000 (UTC) X-Authority-Analysis: v=2.0 cv=bJmU0YCZ c=1 sm=0 a=lLOF/jpPrR0dcgWXP1EvZg==:17 a=xCuMbNp8hPoA:10 a=dLZphiicA4cA:10 a=R5FhY6rjjCMA:10 a=kj9zAlcOel0A:10 a=Ymsr-CWnAAAA:8 a=LDrFwfghZz0A:10 a=iReALljSAAAA:8 a=pGLkceISAAAA:8 a=QyXUC8HyAAAA:8 a=RaItmienAAAA:8 a=E70Ph-o_AAAA:8 a=WxxYND6mw3PQpSCbB8QA:9 a=CjuIK1q_8ugA:10 a=4elu-xP5etcA:10 a=MSl-tDqOz04A:10 a=lLOF/jpPrR0dcgWXP1EvZg==:117 X-Cloudmark-Score: 0 X-Authenticated-User: X-Originating-IP: 204.210.114.114 Received: from [204.210.114.114] ([204.210.114.114:49924] helo=localhost.hawaii.res.rr.com) by hrndva-oedge01.mail.rr.com (envelope-from ) (ecelerity 2.2.3.46 r()) with ESMTP id 59/B5-06157-10126115; Sat, 09 Feb 2013 10:12:18 +0000 Received: by localhost.hawaii.res.rr.com (Postfix, from userid 
1000) id AFE745CAD; Sat, 9 Feb 2013 00:12:19 -1000 (HST) Date: Sat, 9 Feb 2013 00:12:19 -1000 From: Parv To: Daniel O'Connor Subject: Re: Intel 82574 issue reported on Slashdot Message-ID: <20130209101219.GA2133@holstein.holy.cow> Mail-Followup-To: Daniel O'Connor , Jack Vogel , FreeBSD stable , "Pieper, Jeffrey E" , FreeBSD Net , "Hearn, James R" , "Vogel, Jack" , "Ronciak, John" , FreeBSD Current References: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Cc: FreeBSD stable , FreeBSD Current , "Pieper, Jeffrey E" , FreeBSD Net , "Hearn, James R" , "Vogel, Jack" , "Ronciak, John" , Jack Vogel X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 09 Feb 2013 10:12:26 -0000 in message , wrote Daniel O'Connor thusly... > > > On 09/02/2013, at 4:46, Jack Vogel wrote: > > > recommends contacting your motherboard manufacturer if you have > > continued concerns or questions whether your products are > > impacted. Here is the link: > > > > http://communities.intel.com/community/wired/blog/2013/02/07/intel-82574l-gigabit-ethernet-controller-statement > > > > Any questions or concerns may be sent to me. > > In all honesty.. The blog post (and your email) are basically > information free, they don't name names and provide no script or > downloadable code that will allow end users to check if they are > affected. > "Contact your motherboard manufacturer" is much more time > consuming than "Run sysctl... | grep foo | awk ..." to see if your > system is affected. Gift^WStraight from horse's mouth ... 
http://blog.krisk.org/2013/02/packets-of-death.html http://www.kriskinc.com/intel-pod - parv -- From owner-freebsd-net@FreeBSD.ORG Sat Feb 9 10:15:41 2013 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id BB5D12D3; Sat, 9 Feb 2013 10:15:41 +0000 (UTC) (envelope-from doconnor@gsoft.com.au) Received: from cain.gsoft.com.au (cain.gsoft.com.au [203.31.81.10]) by mx1.freebsd.org (Postfix) with ESMTP id 420451D2; Sat, 9 Feb 2013 10:15:40 +0000 (UTC) Received: from ur.dons.net.au (ppp118-210-73-50.lns20.adl2.internode.on.net [118.210.73.50]) (authenticated bits=0) by cain.gsoft.com.au (8.14.4/8.14.3) with ESMTP id r19AFQce071839 (version=TLSv1/SSLv3 cipher=AES128-SHA bits=128 verify=NO); Sat, 9 Feb 2013 20:45:32 +1030 (CST) (envelope-from doconnor@gsoft.com.au) Subject: Re: Intel 82574 issue reported on Slashdot Mime-Version: 1.0 (Mac OS X Mail 6.2 \(1499\)) Content-Type: text/plain; charset=us-ascii From: "Daniel O'Connor" In-Reply-To: <20130209101219.GA2133@holstein.holy.cow> Date: Sat, 9 Feb 2013 20:45:26 +1030 Content-Transfer-Encoding: quoted-printable Message-Id: <1D962264-6DF6-4192-8190-34E22AADE843@gsoft.com.au> References: <20130209101219.GA2133@holstein.holy.cow> To: Parv X-Mailer: Apple Mail (2.1499) X-Spam-Score: 0.163 () BAYES_00,RDNS_DYNAMIC X-Scanned-By: MIMEDefang 2.67 on 203.31.81.10 Cc: FreeBSD stable , FreeBSD Current , "Pieper, Jeffrey E" , FreeBSD Net , "Hearn, James R" , "Vogel, Jack" , "Ronciak, John" , Jack Vogel X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 09 Feb 2013 10:15:41 -0000 On 09/02/2013, at 20:42, Parv wrote: >> "Contact your motherboard manufacturer" is much more time >> consuming than "Run sysctl... | grep foo | awk ..." 
to see if your >> system is affected. > > Gift^WStraight from horse's mouth ... > > http://blog.krisk.org/2013/02/packets-of-death.html I've already read this. > http://www.kriskinc.com/intel-pod I'd really rather a test which reads the EEPROM and tells me if it's a problem rather than hang the interface on a machine :) In any case that isn't the point - this may be a "vendor issue" but it reflects poorly on Intel that they didn't take proper ownership of the issue. It would be far, far better for their image to say "some systems may have the fault, go to http://.... to find a way to test for your operating system". -- Daniel O'Connor software and network engineer for Genesis Software - http://www.gsoft.com.au "The nice thing about standards is that there are so many of them to choose from." -- Andrew Tanenbaum GPG Fingerprint - 5596 B766 97C0 0E94 4347 295E E593 DC20 7B3F CE8C From owner-freebsd-net@FreeBSD.ORG Sat Feb 9 12:17:46 2013 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 9FB7F8E2; Sat, 9 Feb 2013 12:17:46 +0000 (UTC) (envelope-from ohartman@zedat.fu-berlin.de) Received: from outpost1.zedat.fu-berlin.de (outpost1.zedat.fu-berlin.de [130.133.4.66]) by mx1.freebsd.org (Postfix) with ESMTP id 5380A772; Sat, 9 Feb 2013 12:17:45 +0000 (UTC) Received: from inpost2.zedat.fu-berlin.de ([130.133.4.69]) by outpost1.zedat.fu-berlin.de (Exim 4.69) with esmtp (envelope-from ) id <1U49NC-000SJA-Rk>; Sat, 09 Feb 2013 13:17:38 +0100 Received: from e178030124.adsl.alicedsl.de ([85.178.30.124] helo=thor.walstatt.dyndns.org) by inpost2.zedat.fu-berlin.de (Exim 4.69) with esmtpsa (envelope-from ) id <1U49NC-001uSu-OF>; Sat, 09 Feb 2013 13:17:38 +0100 Message-ID: <51163E5B.7070602@zedat.fu-berlin.de> Date: Sat, 09 Feb 2013 13:17:31 +0100 From: "O. 
Hartmann" User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:17.0) Gecko/20130131 Thunderbird/17.0.2 MIME-Version: 1.0 Subject: Re: Intel 82574 issue reported on Slashdot References: In-Reply-To: X-Enigmail-Version: 1.4.6 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="------------enigAFCE9A2736331D45EB2F96D7" X-Originating-IP: 85.178.30.124 Cc: FreeBSD Net , FreeBSD Current X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 09 Feb 2013 12:17:46 -0000 This is an OpenPGP/MIME signed message (RFC 2440 and 3156) --------------enigAFCE9A2736331D45EB2F96D7 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable On 02/09/13 09:15, Johnny Eriksson wrote: >> In all honesty.. The blog post (and your email) are basically >> information free, they don't name names and provide no script >> or downloadable code that will allow end users to check if they >> are affected. 
> > A link with a little bit more information: > > http://blog.krisk.org/2013/02/packets-of-death.html > >> Daniel O'Connor software and network engineer > > --Johnny We don't even have the tool tcpreplay in the ports mentioned in that BLOG. oh --------------enigAFCE9A2736331D45EB2F96D7 Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.19 (FreeBSD) iQEcBAEBAgAGBQJRFj5iAAoJEOgBcD7A/5N8JnoH/ii46xUh/BhwbRv8ZommzU81 Ku7zSQErG3Mew51y6j1GmRyBb51Fu6KaQEgR0I8d0c0InL7amJ64tk+4u6KmoRot 1BWuCQUfjonhOqdkoE3pqlfNSh5L9mfmiZrhfgByTDhMlYdiaXtGfKF1I9WV97XU V6780P5vswuFXkGfWdoBMH2JqKVIapeGxZfWUGnDhUWu/5tK7RceZDlhfnyNbYQu AX3Ev7ujGusIyMcYxu8iBYq1sn9y78Sghm+8sFhvlPvSgyZT7fH25ndGeX9Ou3Ty lX//u1MOock7pFpc9LV1FWywtxuJJpwbhoMITuNxlsrdYYk3FvOO61587IYnsYw= =A1er -----END PGP SIGNATURE----- --------------enigAFCE9A2736331D45EB2F96D7-- From owner-freebsd-net@FreeBSD.ORG Sat Feb 9 12:28:22 2013 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 1BABBB1E for ; Sat, 9 Feb 2013 12:28:22 +0000 (UTC) (envelope-from krzysiek@airnet.opole.pl) Received: from base.airnet.opole.pl (ns2.airmax.pl [176.111.128.3]) by mx1.freebsd.org (Postfix) with ESMTP id CE3BF7D0 for ; Sat, 9 Feb 2013 12:28:20 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by base.airnet.opole.pl (Postfix) with ESMTP id 7FBAE7FF04D for ; Sat, 9 Feb 2013 13:23:08 +0100 (CET) Received: from base.airnet.opole.pl ([127.0.0.1]) by localhost (mail.airnet.opole.pl [127.0.0.1]) (maiad, port 10024) with ESMTP id 23163-08 for ; Sat, 9 Feb 2013 13:23:08 +0100 (CET) Received: from [10.10.11.223] (unknown [176.111.138.12]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (No client certificate requested) (Authenticated sender: 
krzysiek@airnet.opole.pl) by base.airnet.opole.pl (Postfix) with ESMTPSA id 544907FF02D for ; Sat, 9 Feb 2013 13:23:08 +0100 (CET) Message-ID: <51163FA5.5020501@airnet.opole.pl> Date: Sat, 09 Feb 2013 13:23:01 +0100 From: Krzysztof Barcikowski User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:17.0) Gecko/20130107 Thunderbird/17.0.2 MIME-Version: 1.0 To: freebsd-net@freebsd.org Subject: Re: Intel 82574 issue reported on Slashdot References: <51163E5B.7070602@zedat.fu-berlin.de> In-Reply-To: <51163E5B.7070602@zedat.fu-berlin.de> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 09 Feb 2013 12:28:22 -0000 On 2013-02-09 13:17, O. Hartmann wrote: > We don't even have the tool tcpreplay in the ports mentioned in that > BLOG. oh base2[/usr/ports]# make search name=tcpreplay Port: tcpreplay-3.4.4 Path: /usr/ports/net-mgmt/tcpreplay Info: A tool to replay saved packet capture files Maint: ehaupt@FreeBSD.org B-deps: libpcapnav-0.8 R-deps: WWW: http://tcpreplay.synfin.net/trac/ From owner-freebsd-net@FreeBSD.ORG Sat Feb 9 12:40:02 2013 Return-Path: Delivered-To: freebsd-net@smarthost.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 4DC19F18 for ; Sat, 9 Feb 2013 12:40:02 +0000 (UTC) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:1900:2254:206c::16:87]) by mx1.freebsd.org (Postfix) with ESMTP id 3B4558B4 for ; Sat, 9 Feb 2013 12:40:02 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.6/8.14.6) with ESMTP id r19Ce1us050496 for ; Sat, 9 Feb 2013 12:40:01 GMT (envelope-from gnats@freefall.freebsd.org) Received: (from 
gnats@localhost) by freefall.freebsd.org (8.14.6/8.14.6/Submit) id r19Ce1UA050495; Sat, 9 Feb 2013 12:40:01 GMT (envelope-from gnats) Date: Sat, 9 Feb 2013 12:40:01 GMT Message-Id: <201302091240.r19Ce1UA050495@freefall.freebsd.org> To: freebsd-net@FreeBSD.org Cc: From: Andrey Simonenko Subject: Re: bin/131567: Update for regression/sockets/unix_cmsg X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list Reply-To: Andrey Simonenko List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 09 Feb 2013 12:40:02 -0000 The following reply was made to PR bin/131567; it has been noted by GNATS. From: Andrey Simonenko To: bug-followup@freebsd.org Cc: Subject: Re: bin/131567: Update for regression/sockets/unix_cmsg Date: Sat, 9 Feb 2013 14:35:42 +0200 Proposed commit log:
----
- Added tests for SCM_BINTIME, LOCAL_PEERCRED and cmsghdr.cmsg_len
- Code that checks correctness of groups was corrected (getgroups(2) change)
- unix_cmsg.c was completely redesigned and simplified
- Use a smaller timeout value in unix_cmsg.c so that the tests finish faster
- Added support for sending a message without data, and for sending an ancillary data object without the data array associated with its cmsghdr structure
- Existing tests were improved
- unix_cmsg.t was redesigned and simplified

Correctness of unix_cmsg was verified on 7.2-STABLE, 9.1-STABLE and 10-CURRENT.

Submitted by: Andrey Simonenko
----
I've found one bug with the working directory in unix_cmsg.c and simplified unix_cmsg.c:main() a bit; this is the corrected version.
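One of the new tests in this update checks cmsghdr.cmsg_len: sendmsg(2) is expected to fail whenever cmsg_len is less than CMSG_LEN(0). What follows is a minimal sketch of that idea, not code taken from the patch. It assumes an AF_UNIX SOCK_DGRAM socketpair(2) and, for portability, uses an SCM_RIGHTS ancillary data object in place of the SCM_CREDS object that unix_cmsg sends. The helper names send_with_cmsg_len() and count_accepted_short_lengths() are hypothetical.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical helper: send one byte of data together with an
 * SCM_RIGHTS ancillary data object carrying fd_to_pass, but store the
 * caller-supplied value in cmsg_len instead of CMSG_LEN(sizeof(int)).
 * Returns the sendmsg(2) result (-1 on failure).
 */
static ssize_t
send_with_cmsg_len(int sock, int fd_to_pass, socklen_t cmsg_len)
{
	/* The union keeps the control buffer aligned for struct cmsghdr. */
	union {
		struct cmsghdr hdr;
		char buf[CMSG_SPACE(sizeof(int))];
	} ctl;
	struct msghdr msg;
	struct iovec iov;
	struct cmsghdr *cmsg;
	char byte = 'x';

	memset(&ctl, 0, sizeof(ctl));
	memset(&msg, 0, sizeof(msg));
	iov.iov_base = &byte;
	iov.iov_len = sizeof(byte);
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctl.buf;
	msg.msg_controllen = sizeof(ctl.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	assert(cmsg != NULL);	/* buffer always holds one header */
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = cmsg_len;	/* possibly wrong on purpose */
	memcpy(CMSG_DATA(cmsg), &fd_to_pass, sizeof(fd_to_pass));

	return (sendmsg(sock, &msg, 0));
}

/*
 * Hypothetical helper: try every cmsg_len value below CMSG_LEN(0) and
 * return how many of them the kernel accepted (0 means all rejected).
 */
static int
count_accepted_short_lengths(int sock, int fd_to_pass)
{
	int accepted = 0;

	for (socklen_t len = 0; len < CMSG_LEN(0); len++)
		if (send_with_cmsg_len(sock, fd_to_pass, len) != -1)
			accepted++;
	return (accepted);
}
```

On a kernel that validates cmsg_len, count_accepted_short_lengths() should return 0 for a fresh socketpair, while a correctly sized object (cmsg_len set to CMSG_LEN(sizeof(int))) is accepted. Counting accepted sends, rather than stopping at the first one, shows how many short lengths slip through on a kernel that lacks the check. The specific errno for a rejected send is deliberately not checked here.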
diff -ruNp unix_cmsg.orig/README unix_cmsg/README --- unix_cmsg.orig/README 2012-11-19 14:38:48.000000000 +0200 +++ unix_cmsg/README 2013-02-09 14:29:23.000000000 +0200 @@ -1,127 +1,160 @@ $FreeBSD: src/tools/regression/sockets/unix_cmsg/README,v 1.2 2012/11/17 01:53:57 svnexp Exp $ About unix_cmsg -================ +=============== -This program is a collection of regression tests for ancillary (control) -data for PF_LOCAL sockets (local domain or Unix domain sockets). There -are tests for stream and datagram sockets. - -Usually each test does following steps: create Server, fork Client, -Client sends something to Server, Server verifies if everything -is correct in received message. Sometimes Client sends several -messages to Server. +This program is a collection of regression tests for ancillary data +(control information) for PF_LOCAL sockets (local domain or Unix domain +sockets). There are tests for stream and datagram sockets. + +Usually each test does following steps: creates Server, forks Client, +Client sends something to Server, Server verifies whether everything is +correct in received message(s). It is better to change the owner of unix_cmsg to some safe user -(eg. nobody:nogroup) and set SUID and SGID bits, else some tests -can give correct results for wrong implementation. +(eg. nobody:nogroup) and set SUID and SGID bits, else some tests that +check credentials can give correct results for wrong implementation. + +It is better to run this program by a user that belongs to more +than 16 groups. Available options ================= --d Output debugging information, values of different fields of - received messages, etc. Will produce many lines of information. - --h Output help message and exit. - --t - Run tests only for the given socket type: "stream" or "dgram". - With this option it is possible to run only particular test, - not all of them. - --z Do not send real control data if possible. Struct cmsghdr{} - should be followed by real control data. 
It is not clear if - a sender should give control data in all cases (this is not - documented and an arbitrary application can choose anything). - - At least for PF_LOCAL sockets' control messages with types - SCM_CREDS and SCM_TIMESTAMP the kernel does not need any - control data. This option allow to not send real control data - for SCM_CREDS and SCM_TIMESTAMP control messages. +usage: unix_cmsg [-dh] [-n num] [-s size] [-t type] [-z value] [testno] -Description of tests -==================== + Options are: + -d Output debugging information + -h Output the help message and exit + -n num Number of messages to send + -s size Specify size of data for IPC + -t type Specify socket type (stream, dgram) for tests + -z value Do not send data in a message (bit 0x1), do not send + data array associated with a cmsghdr structure (bit 0x2) + testno Run one test by its number (require the -t option) + +Description +=========== + +If Client sends something to Server, then it sends 5 messages by default. +Number of messages can be changed in the -n command line option. Number +of messages will be given as N in the following descriptions. + +If Client sends something to Server, then it sends some data (few bytes) +in each message by default. The size of this data can be changed by the -s +command line option. The "-s 0" command line option means, that Client will +send zero bytes represented by { NULL, 0 } value of struct iovec{}, referenced +by the msg_iov field from struct msghdr{}. The "-z 1" or "-z 3" command line +option means, that Client will send zero bytes represented by the NULL value +in the msg_iov field from struct msghdr{}. + +If Client sends some ancillary data object, then this ancillary data object +always has associated data array by default. The "-z 2" or "-z 3" option +means, that Client will not send associated data array if possible. 
For SOCK_STREAM sockets: ----------------------- 1: Sending, receiving cmsgcred - Client connects to Server and sends two messages with data and - control message with SCM_CREDS type to Server. Server should - receive two messages, in both messages there should be data and - control message with SCM_CREDS type followed by struct cmsgcred{} - and this structure should contain correct information. - - 2: Receiving sockcred (listening socket has LOCAL_CREDS) - - Server creates listen socket and set socket option LOCAL_CREDS - for it. Client connects to Server and sends two messages with data - to Server. Server should receive two messages, in first message - there should be data and control message with SCM_CREDS type followed - by struct sockcred{} and this structure should contain correct - information, in second message there should be data and no control - message. - - 3: Receiving sockcred (accepted socket has LOCAL_CREDS) - - Client connects to Server and sends two messages with data. Server - accepts connection and set socket option LOCAL_CREDS for just accepted - socket (here synchronization is used, to allow Client to see just set - flag on Server's socket before sending messages to Server). Server - should receive two messages, in first message there should be data and - control message with SOCK_CRED type followed by struct sockcred{} and - this structure should contain correct information, in second message - there should be data and no control message. + Client connects to Server and sends N messages with SCM_CREDS ancillary + data object. Server should receive N messages, each message should + have SCM_CREDS ancillary data object followed by struct cmsgcred{}. + + 2: Receiving sockcred (listening socket) + + Server creates a listening stream socket and sets the LOCAL_CREDS + socket option for it. Client connects to Server two times, each time + it sends N messages. Server accepts two connections and receives N + messages from each connection. 
The first message from each connection + should have SCM_CREDS ancillary data object followed by struct sockcred{}, + next messages from the same connection should not have ancillary data. + + 3: Receiving sockcred (accepted socket) + + Client connects to Server. Server accepts connection and sets the + LOCAL_CREDS socket option for just accepted socket. Client sends N + messages to Server. Server should receive N messages, the first + message should have SCM_CREDS ancillary data object followed by + struct sockcred{}, next messages should not have ancillary data. 4: Sending cmsgcred, receiving sockcred - Server creates listen socket and set socket option LOCAL_CREDS - for it. Client connects to Server and sends one message with data - and control message with SCM_CREDS type to Server. Server should - receive one message with data and control message with SCM_CREDS type - followed by struct sockcred{} and this structure should contain - correct information. - - 5: Sending, receiving timestamp - - Client connects to Server and sends message with data and control - message with SCM_TIMESTAMP type to Server. Server should receive - message with data and control message with SCM_TIMESTAMP type - followed by struct timeval{}. + Server creates a listening stream socket and sets the LOCAL_CREDS + socket option for it. Client connects to Server and sends N messages + with SCM_CREDS ancillary data object. Server should receive N messages, + the first message should have SCM_CREDS ancillary data object followed + by struct sockcred{}, each of next messages should have SCM_CREDS + ancillary data object followed by struct cmsgcred{}. + + 5: Sending, receiving timeval + + Client connects to Server and sends message with SCM_TIMESTAMP ancillary + data object. Server should receive one message with SCM_TIMESTAMP + ancillary data object followed by struct timeval{}. 
+ + 6: Sending, receiving bintime + + Client connects to Server and sends message with SCM_BINTIME ancillary + data object. Server should receive one message with SCM_BINTIME + ancillary data object followed by struct bintime{}. + + 7: Checking cmsghdr.cmsg_len + + Client connects to Server and tries to send several messages with + SCM_CREDS ancillary data object that has wrong cmsg_len field in its + struct cmsghdr{}. All these attempts should fail, since cmsg_len + in all requests is less than CMSG_LEN(0). + + 8: Check LOCAL_PEERCRED socket option + + This test does not use ancillary data, but can be implemented here. + Client connects to Server. Both Client and Server verify that + credentials of the peer are correct using LOCAL_PEERCRED socket option. For SOCK_DGRAM sockets: ---------------------- 1: Sending, receiving cmsgcred - Client sends to Server two messages with data and control message - with SCM_CREDS type to Server. Server should receive two messages, - in both messages there should be data and control message with - SCM_CREDS type followed by struct cmsgcred{} and this structure - should contain correct information. + Client connects to Server and sends N messages with SCM_CREDS ancillary + data object. Server should receive N messages, each message should + have SCM_CREDS ancillary data object followed by struct cmsgcred{}. 2: Receiving sockcred - Server creates datagram socket and set socket option LOCAL_CREDS - for it. Client sends two messages with data to Server. Server should - receive two messages, in both messages there should be data and control - message with SCM_CREDS type followed by struct sockcred{} and this - structure should contain correct information. + Server creates datagram socket and sets the LOCAL_CREDS socket option + for it. Client sends N messages to Server. Server should receive N + messages, each message should have SCM_CREDS ancillary data object + followed by struct sockcred{}. 
3: Sending cmsgcred, receiving sockcred - - Server creates datagram socket and set socket option LOCAL_CREDS - for it. Client sends one message with data and control message with - SOCK_CREDS type to Server. Server should receive one message with - data and control message with SCM_CREDS type followed by struct - sockcred{} and this structure should contain correct information. - - 4: Sending, receiving timestamp - - Client sends message with data and control message with SCM_TIMESTAMP - type to Server. Server should receive message with data and control - message with SCM_TIMESTAMP type followed by struct timeval{}. + + Server creates datagram socket and sets the LOCAL_CREDS socket option + for it. Client sends N messages with SCM_CREDS ancillary data object + to Server. Server should receive N messages, the first message should + have SCM_CREDS ancillary data object followed by struct sockcred{}, + each of next messages should have SCM_CREDS ancillary data object + followed by struct cmsgcred{}. + + 4: Sending, receiving timeval + + Client sends one message with SCM_TIMESTAMP ancillary data object + to Server. Server should receive one message with SCM_TIMESTAMP + ancillary data object followed by struct timeval{}. + + 5: Sending, receiving bintime + + Client sends one message with SCM_BINTIME ancillary data object + to Server. Server should receive one message with SCM_BINTIME + ancillary data object followed by struct bintime{}. + + 6: Checking cmsghdr.cmsg_len + + Client tries to send Server several messages with SCM_CREDS ancillary + data object that has wrong cmsg_len field in its struct cmsghdr{}. + All these attempts should fail, since cmsg_len in all requests is less + than CMSG_LEN(0). 
- Andrey Simonenko -simon@comsys.ntu-kpi.kiev.ua +andreysimonenko@users.sourceforge.net diff -ruNp unix_cmsg.orig/unix_cmsg.c unix_cmsg/unix_cmsg.c --- unix_cmsg.orig/unix_cmsg.c 2012-11-20 11:26:18.000000000 +0200 +++ unix_cmsg/unix_cmsg.c 2013-02-09 03:10:58.000000000 +0200 @@ -27,48 +27,46 @@ #include __FBSDID("$FreeBSD: src/tools/regression/sockets/unix_cmsg/unix_cmsg.c,v 1.5 2012/11/19 22:59:17 svnexp Exp $"); -#include +#include #include #include +#include #include +#include #include #include -#include #include #include #include +#include #include #include -#include +#include #include #include +#include #include #include #include #include -#include #include /* * There are tables with tests descriptions and pointers to test * functions. Each t_*() function returns 0 if its test passed, - * -1 if its test failed (something wrong was found in local domain - * control messages), -2 if some system error occurred. If test - * function returns -2, then a program exits. + * -1 if its test failed, -2 if some system error occurred. + * If a test function returns -2, then a program exits. * - * Each test function completely control what to do (eg. fork or - * do not fork a client process). If a test function forks a client - * process, then it waits for its termination. If a return code of a - * client process is not equal to zero, or if a client process was - * terminated by a signal, then test function returns -2. + * If a test function forks a client process, then it waits for its + * termination. If a return code of a client process is not equal + * to zero, or if a client process was terminated by a signal, then + * a test function returns -1 or -2 depending on exit status of + * a client process. * - * Each test function and complete program are not optimized - * a lot to allow easy to modify tests. 
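The return-code convention described in the comment above (0 passed, -1 failed, -2 system error) also has to cross the fork() boundary. In the patch, client_exit() exits with -rv, and client_wait() turns WEXITSTATUS() back into a negative value. The sketch below shows that round trip. It assumes a hypothetical run_in_child() wrapper and three stand-in test functions.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical wrapper mirroring client_exit()/client_wait(): run fn()
 * in a forked child and hand its 0 / -1 / -2 result back to the parent.
 */
static int
run_in_child(int (*fn)(void))
{
	pid_t pid;
	int rv, status;

	pid = fork();
	if (pid == (pid_t)-1)
		return (-2);
	if (pid == 0) {
		rv = fn();
		/* 0 -> exit status 0, -1 -> 1, -2 -> 2. */
		_exit(rv == 0 ? EXIT_SUCCESS : -rv);
	}
	if (waitpid(pid, &status, 0) == (pid_t)-1)
		return (-2);
	if (WIFEXITED(status))
		return (-WEXITSTATUS(status));
	/* Killed by a signal: reported as a plain failure. */
	return (-1);
}

/* Hypothetical stand-in test functions. */
static int t_pass(void) { return (0); }
static int t_fail(void) { return (-1); }
static int t_syserr(void) { return (-2); }
```

Here, a child that is killed by a signal is reported as -1. This matches client_wait(), which logs the signal number and returns -1.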
- * - * Each function which can block, is run under TIMEOUT, if timeout - * occurs, then test function returns -2 or a client process exits - * with nonzero return code. + * Each function which can block, is run under TIMEOUT. If timeout + * occurs, then a test function returns -2 or a client process exits + * with a non-zero return code. */ #ifndef LISTENQ @@ -76,207 +74,290 @@ __FBSDID("$FreeBSD: src/tools/regression #endif #ifndef TIMEOUT -# define TIMEOUT 60 +# define TIMEOUT 2 #endif -#define EXTRA_CMSG_SPACE 512 /* Memory for not expected control data. */ - -static int t_cmsgcred(void), t_sockcred_stream1(void); -static int t_sockcred_stream2(void), t_cmsgcred_sockcred(void); -static int t_sockcred_dgram(void), t_timestamp(void); +static int t_cmsgcred(void); +static int t_sockcred_1(void); +static int t_sockcred_2(void); +static int t_cmsgcred_sockcred(void); +static int t_timeval(void); +static int t_bintime(void); +static int t_cmsg_len(void); +static int t_peercred(void); struct test_func { - int (*func)(void); /* Pointer to function. */ - const char *desc; /* Test description. 
*/ -}; - -static struct test_func test_stream_tbl[] = { - { NULL, " 0: All tests" }, - { t_cmsgcred, " 1: Sending, receiving cmsgcred" }, - { t_sockcred_stream1, " 2: Receiving sockcred (listening socket has LOCAL_CREDS)" }, - { t_sockcred_stream2, " 3: Receiving sockcred (accepted socket has LOCAL_CREDS)" }, - { t_cmsgcred_sockcred, " 4: Sending cmsgcred, receiving sockcred" }, - { t_timestamp, " 5: Sending, receiving timestamp" }, - { NULL, NULL } + int (*func)(void); + const char *desc; }; -static struct test_func test_dgram_tbl[] = { - { NULL, " 0: All tests" }, - { t_cmsgcred, " 1: Sending, receiving cmsgcred" }, - { t_sockcred_dgram, " 2: Receiving sockcred" }, - { t_cmsgcred_sockcred, " 3: Sending cmsgcred, receiving sockcred" }, - { t_timestamp, " 4: Sending, receiving timestamp" }, - { NULL, NULL } +static const struct test_func test_stream_tbl[] = { + { + .func = NULL, + .desc = "All tests" + }, + { + .func = t_cmsgcred, + .desc = "Sending, receiving cmsgcred" + }, + { + .func = t_sockcred_1, + .desc = "Receiving sockcred (listening socket)" + }, + { + .func = t_sockcred_2, + .desc = "Receiving sockcred (accepted socket)" + }, + { + .func = t_cmsgcred_sockcred, + .desc = "Sending cmsgcred, receiving sockcred" + }, + { + .func = t_timeval, + .desc = "Sending, receiving timeval" + }, + { + .func = t_bintime, + .desc = "Sending, receiving bintime" + }, + { + .func = t_cmsg_len, + .desc = "Check cmsghdr.cmsg_len" + }, + { + .func = t_peercred, + .desc = "Check LOCAL_PEERCRED socket option" + } }; -#define TEST_STREAM_NO_MAX (sizeof(test_stream_tbl) / sizeof(struct test_func) - 2) -#define TEST_DGRAM_NO_MAX (sizeof(test_dgram_tbl) / sizeof(struct test_func) - 2) - -static const char *myname = "SERVER"; /* "SERVER" or "CLIENT" */ - -static int debug = 0; /* 1, if -d. */ -static int no_control_data = 0; /* 1, if -z. */ - -static u_int nfailed = 0; /* Number of failed tests. 
*/ +#define TEST_STREAM_TBL_SIZE \ + (sizeof(test_stream_tbl) / sizeof(test_stream_tbl[0])) -static int sock_type; /* SOCK_STREAM or SOCK_DGRAM */ -static const char *sock_type_str; /* "SOCK_STREAM" or "SOCK_DGRAN" */ - -static char tempdir[] = "/tmp/unix_cmsg.XXXXXXX"; -static char serv_sock_path[PATH_MAX]; - -static char ipc_message[] = "hello"; - -#define IPC_MESSAGE_SIZE (sizeof(ipc_message)) - -static struct sockaddr_un servaddr; /* Server address. */ - -static sigjmp_buf env_alrm; +static const struct test_func test_dgram_tbl[] = { + { + .func = NULL, + .desc = "All tests" + }, + { + .func = t_cmsgcred, + .desc = "Sending, receiving cmsgcred" + }, + { + .func = t_sockcred_2, + .desc = "Receiving sockcred" + }, + { + .func = t_cmsgcred_sockcred, + .desc = "Sending cmsgcred, receiving sockcred" + }, + { + .func = t_timeval, + .desc = "Sending, receiving timeval" + }, + { + .func = t_bintime, + .desc = "Sending, receiving bintime" + }, + { + .func = t_cmsg_len, + .desc = "Check cmsghdr.cmsg_len" + } +}; -static uid_t my_uid; -static uid_t my_euid; -static gid_t my_gid; -static gid_t my_egid; +#define TEST_DGRAM_TBL_SIZE \ + (sizeof(test_dgram_tbl) / sizeof(test_dgram_tbl[0])) -/* - * my_gids[0] is EGID, next items are supplementary GIDs, - * my_ngids determines valid items in my_gids array. 
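The new tables use designated initializers instead of the old { NULL, NULL } sentinel entries, and the compiler counts the entries. This is why run_tests() uses TEST_STREAM_TBL_SIZE - 1 as the last test number: entry 0 is the "All tests" pseudo-entry. The sketch below is a reduced version of the same layout. example_tbl, t_example and last_testno() are hypothetical names.

```c
#include <stddef.h>

struct test_func {
	int (*func)(void);
	const char *desc;
};

/* Hypothetical test function. */
static int t_example(void) { return (0); }

/* Entry 0 is the "All tests" pseudo-test, as in the patch. */
static const struct test_func example_tbl[] = {
	{ .func = NULL,      .desc = "All tests" },
	{ .func = t_example, .desc = "Example test" },
};

/* Entry count computed by the compiler; no sentinel entry needed. */
#define EXAMPLE_TBL_SIZE \
	(sizeof(example_tbl) / sizeof(example_tbl[0]))

/* Number of the last real test, i.e. testno2 when running all tests. */
static size_t
last_testno(void)
{
	return (EXAMPLE_TBL_SIZE - 1);
}
```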
- */ -static gid_t my_gids[NGROUPS_MAX]; -static int my_ngids; +static bool debug = false; +static bool server_flag = true; +static bool send_data_flag = true; +static bool send_array_flag = true; +static bool failed_flag = false; + +static int sock_type; +static const char *sock_type_str; + +static const char *proc_name; + +static char work_dir[] = _PATH_TMP "unix_cmsg.XXXXXXX"; +static int serv_sock_fd; +static struct sockaddr_un serv_addr_sun; + +static struct { + char *buf_send; + char *buf_recv; + size_t buf_size; + u_int msg_num; +} ipc_msg; + +#define IPC_MSG_NUM_DEF 5 +#define IPC_MSG_NUM_MAX 10 +#define IPC_MSG_SIZE_DEF 7 +#define IPC_MSG_SIZE_MAX 128 + +static struct { + uid_t uid; + uid_t euid; + gid_t gid; + gid_t egid; + gid_t *gid_arr; + int gid_num; +} proc_cred; + +static pid_t client_pid; + +#define SYNC_SERVER 0 +#define SYNC_CLIENT 1 +#define SYNC_RECV 0 +#define SYNC_SEND 1 -static pid_t client_pid; /* PID of forked client. */ +static int sync_fd[2][2]; -#define dbgmsg(x) do { \ - if (debug) \ - logmsgx x ; \ -} while (/* CONSTCOND */0) +#define LOGMSG_SIZE 128 static void logmsg(const char *, ...) __printflike(1, 2); static void logmsgx(const char *, ...) __printflike(1, 2); +static void dbgmsg(const char *, ...) __printflike(1, 2); static void output(const char *, ...) __printflike(1, 2); -extern char *__progname; /* The name of program. */ - -/* - * Output the help message (-h switch). 
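sync_fd[2][2] holds two pipes. The indexing is chosen so that sync_fd[SYNC_SERVER] is the pipe the server reads from and sync_fd[SYNC_CLIENT] is the pipe the client reads from. SYNC_RECV and SYNC_SEND correspond to the read and write ends returned by pipe(). The patch's sync_send() and sync_recv() pass a one-byte token over these pipes to order the two processes. When -z suppresses the data payload, message_send() and message_recv() rely on this token. The sketch below performs a self-contained server/client round trip with the same indexing. token_send(), token_recv() and sync_roundtrip() are hypothetical names.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define SYNC_SERVER	0	/* pipe the server reads from */
#define SYNC_CLIENT	1	/* pipe the client reads from */
#define SYNC_RECV	0	/* read end, as filled in by pipe() */
#define SYNC_SEND	1	/* write end */

/* Wait for one token byte; EOF (read() returning 0) is a failure. */
static int
token_recv(int fd)
{
	char c;

	return (read(fd, &c, 1) == 1 ? 0 : -1);
}

/* Send one token byte (a NUL, as sync_send() does). */
static int
token_send(int fd)
{
	return (write(fd, "", 1) == 1 ? 0 : -1);
}

/*
 * Hypothetical round trip: the server sends a token, the client echoes
 * one back and exits. Returns 0 if both directions worked and the
 * client exited with status 0.
 */
static int
sync_roundtrip(void)
{
	int fds[2][2], rv, status;
	pid_t pid;

	if (pipe(fds[SYNC_SERVER]) < 0 || pipe(fds[SYNC_CLIENT]) < 0)
		return (-1);
	pid = fork();
	if (pid < 0)
		return (-1);
	if (pid == 0) {
		/* Client: drop the ends only the server uses. */
		close(fds[SYNC_SERVER][SYNC_RECV]);
		close(fds[SYNC_CLIENT][SYNC_SEND]);
		if (token_recv(fds[SYNC_CLIENT][SYNC_RECV]) < 0 ||
		    token_send(fds[SYNC_SERVER][SYNC_SEND]) < 0)
			_exit(1);
		_exit(0);
	}
	/* Server: drop the ends only the client uses. */
	close(fds[SYNC_SERVER][SYNC_SEND]);
	close(fds[SYNC_CLIENT][SYNC_RECV]);
	rv = 0;
	if (token_send(fds[SYNC_CLIENT][SYNC_SEND]) < 0 ||
	    token_recv(fds[SYNC_SERVER][SYNC_RECV]) < 0)
		rv = -1;
	close(fds[SYNC_CLIENT][SYNC_SEND]);
	close(fds[SYNC_SERVER][SYNC_RECV]);
	if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status) ||
	    WEXITSTATUS(status) != 0)
		rv = -1;
	return (rv);
}
```

Each process must close its unused ends right after fork(). If the peer then exits early, read() returns 0 (end of file) instead of blocking forever, and token_recv() treats that as a failure.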
- */ static void -usage(int quick) +usage(bool verbose) { - const struct test_func *test_func; + u_int i; - fprintf(stderr, "Usage: %s [-dhz] [-t ] [testno]\n", - __progname); - if (quick) + printf("usage: %s [-dh] [-n num] [-s size] [-t type] " + "[-z value] [testno]\n", getprogname()); + if (!verbose) return; - fprintf(stderr, "\n Options are:\n\ - -d\t\t\tOutput debugging information\n\ - -h\t\t\tOutput this help message and exit\n\ - -t \t\tRun test only for the given socket type:\n\ -\t\t\tstream or dgram\n\ - -z\t\t\tDo not send real control data if possible\n\n"); - fprintf(stderr, " Available tests for stream sockets:\n"); - for (test_func = test_stream_tbl; test_func->desc != NULL; ++test_func) - fprintf(stderr, " %s\n", test_func->desc); - fprintf(stderr, "\n Available tests for datagram sockets:\n"); - for (test_func = test_dgram_tbl; test_func->desc != NULL; ++test_func) - fprintf(stderr, " %s\n", test_func->desc); + printf("\n Options are:\n\ + -d Output debugging information\n\ + -h Output the help message and exit\n\ + -n num Number of messages to send\n\ + -s size Specify size of data for IPC\n\ + -t type Specify socket type (stream, dgram) for tests\n\ + -z value Do not send data in a message (bit 0x1), do not send\n\ + data array associated with a cmsghdr structure (bit 0x2)\n\ + testno Run one test by its number (require the -t option)\n\n"); + printf(" Available tests for stream sockets:\n"); + for (i = 0; i < TEST_STREAM_TBL_SIZE; ++i) + printf(" %u: %s\n", i, test_stream_tbl[i].desc); + printf("\n Available tests for datagram sockets:\n"); + for (i = 0; i < TEST_DGRAM_TBL_SIZE; ++i) + printf(" %u: %s\n", i, test_dgram_tbl[i].desc); } -/* - * printf-like function for outputting to STDOUT_FILENO. - */ static void output(const char *format, ...) 
{ - char buf[128]; + char buf[LOGMSG_SIZE]; va_list ap; va_start(ap, format); if (vsnprintf(buf, sizeof(buf), format, ap) < 0) - err(EX_SOFTWARE, "output: vsnprintf failed"); + err(EXIT_FAILURE, "output: vsnprintf failed"); write(STDOUT_FILENO, buf, strlen(buf)); va_end(ap); } -/* - * printf-like function for logging, also outputs message for errno. - */ static void logmsg(const char *format, ...) { - char buf[128]; + char buf[LOGMSG_SIZE]; va_list ap; int errno_save; - errno_save = errno; /* Save errno. */ - + errno_save = errno; va_start(ap, format); if (vsnprintf(buf, sizeof(buf), format, ap) < 0) - err(EX_SOFTWARE, "logmsg: vsnprintf failed"); + err(EXIT_FAILURE, "logmsg: vsnprintf failed"); if (errno_save == 0) - output("%s: %s\n", myname, buf); + output("%s: %s\n", proc_name, buf); else - output("%s: %s: %s\n", myname, buf, strerror(errno_save)); + output("%s: %s: %s\n", proc_name, buf, strerror(errno_save)); va_end(ap); + errno = errno_save; +} + +static void +vlogmsgx(const char *format, va_list ap) +{ + char buf[LOGMSG_SIZE]; + + if (vsnprintf(buf, sizeof(buf), format, ap) < 0) + err(EXIT_FAILURE, "logmsgx: vsnprintf failed"); + output("%s: %s\n", proc_name, buf); - errno = errno_save; /* Restore errno. */ } -/* - * printf-like function for logging, do not output message for errno. - */ static void logmsgx(const char *format, ...) { - char buf[128]; va_list ap; va_start(ap, format); - if (vsnprintf(buf, sizeof(buf), format, ap) < 0) - err(EX_SOFTWARE, "logmsgx: vsnprintf failed"); - output("%s: %s\n", myname, buf); + vlogmsgx(format, ap); va_end(ap); } -/* - * Run tests from testno1 to testno2. - */ +static void +dbgmsg(const char *format, ...) 
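The logging rework replaces the dbgmsg(x) macro, which required double parentheses at every call site, with ordinary variadic functions. All of them funnel into one va_list formatter, vlogmsgx(), and logmsg() still restores errno before returning. The sketch below strips that structure down. To make the behaviour easy to check, it writes into a hypothetical last_line buffer instead of stdout. vlog_line(), log_x(), dbg(), log_errno() and debug_on are hypothetical names.

```c
#include <errno.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

#define LOGMSG_SIZE	128

static int debug_on;			/* hypothetical stand-in for -d */
static char last_line[2 * LOGMSG_SIZE];	/* stand-in for stdout */

/* Shared formatter, cf. vlogmsgx(): the one place a line is built. */
static void
vlog_line(const char *format, va_list ap)
{
	char buf[LOGMSG_SIZE];

	vsnprintf(buf, sizeof(buf), format, ap);
	snprintf(last_line, sizeof(last_line), "SERVER: %s", buf);
}

/* cf. logmsgx(): plain message. */
static void
log_x(const char *format, ...)
{
	va_list ap;

	va_start(ap, format);
	vlog_line(format, ap);
	va_end(ap);
}

/* cf. dbgmsg(): a real function, so no double parentheses needed. */
static void
dbg(const char *format, ...)
{
	va_list ap;

	if (!debug_on)
		return;
	va_start(ap, format);
	vlog_line(format, ap);
	va_end(ap);
}

/* cf. logmsg(): appends strerror() and leaves errno as it found it. */
static void
log_errno(const char *format, ...)
{
	char buf[LOGMSG_SIZE];
	va_list ap;
	int errno_save;

	errno_save = errno;
	va_start(ap, format);
	vsnprintf(buf, sizeof(buf), format, ap);
	va_end(ap);
	snprintf(last_line, sizeof(last_line), "SERVER: %s: %s", buf,
	    strerror(errno_save));
	errno = errno_save;
}
```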
+{ + va_list ap; + + if (debug) { + va_start(ap, format); + vlogmsgx(format, ap); + va_end(ap); + } +} + static int -run_tests(u_int testno1, u_int testno2) +run_tests(int type, u_int testno1) { - const struct test_func *test_func; - u_int i, nfailed1; + const struct test_func *tf; + u_int i, testno2, failed_num; - output("Running tests for %s sockets:\n", sock_type_str); - test_func = (sock_type == SOCK_STREAM ? - test_stream_tbl : test_dgram_tbl) + testno1; + sock_type = type; + if (type == SOCK_STREAM) { + sock_type_str = "SOCK_STREAM"; + tf = test_stream_tbl; + i = TEST_STREAM_TBL_SIZE - 1; + } else { + sock_type_str = "SOCK_DGRAM"; + tf = test_dgram_tbl; + i = TEST_DGRAM_TBL_SIZE - 1; + } + if (testno1 == 0) { + testno1 = 1; + testno2 = i; + } else + testno2 = testno1; - nfailed1 = 0; - for (i = testno1; i <= testno2; ++test_func, ++i) { - output(" %s\n", test_func->desc); - switch (test_func->func()) { + output("Running tests for %s sockets:\n", sock_type_str); + failed_num = 0; + for (i = testno1, tf += testno1; i <= testno2; ++tf, ++i) { + output(" %u: %s\n", i, tf->desc); + switch (tf->func()) { case -1: - ++nfailed1; + ++failed_num; break; case -2: - logmsgx("some system error occurred, exiting"); + logmsgx("some system error or timeout occurred"); return (-1); } } - nfailed += nfailed1; + if (failed_num != 0) + failed_flag = true; if (testno1 != testno2) { - if (nfailed1 == 0) - output("-- all tests were passed!\n"); + if (failed_num == 0) + output("-- all tests passed!\n"); else - output("-- %u test%s failed!\n", nfailed1, - nfailed1 == 1 ? "" : "s"); + output("-- %u test%s failed!\n", + failed_num, failed_num == 1 ? 
"" : "s"); } else { - if (nfailed == 0) - output("-- test was passed!\n"); + if (failed_num == 0) + output("-- test passed!\n"); else output("-- test failed!\n"); } @@ -284,183 +365,322 @@ run_tests(u_int testno1, u_int testno2) return (0); } -/* ARGSUSED */ -static void -sig_alrm(int signo __unused) +static int +init(void) +{ + struct sigaction sigact; + size_t idx; + int rv; + + proc_name = "SERVER"; + + sigact.sa_handler = SIG_IGN; + sigact.sa_flags = 0; + sigemptyset(&sigact.sa_mask); + if (sigaction(SIGPIPE, &sigact, (struct sigaction *)NULL) < 0) { + logmsg("init: sigaction"); + return (-1); + } + + if (ipc_msg.buf_size == 0) + ipc_msg.buf_send = ipc_msg.buf_recv = NULL; + else { + ipc_msg.buf_send = malloc(ipc_msg.buf_size); + ipc_msg.buf_recv = malloc(ipc_msg.buf_size); + if (ipc_msg.buf_send == NULL || ipc_msg.buf_recv == NULL) { + logmsg("init: malloc"); + return (-1); + } + for (idx = 0; idx < ipc_msg.buf_size; ++idx) + ipc_msg.buf_send[idx] = (char)idx; + } + + proc_cred.uid = getuid(); + proc_cred.euid = geteuid(); + proc_cred.gid = getgid(); + proc_cred.egid = getegid(); + proc_cred.gid_num = getgroups(0, (gid_t *)NULL); + if (proc_cred.gid_num < 0) { + logmsg("init: getgroups"); + return (-1); + } + proc_cred.gid_arr = malloc(proc_cred.gid_num * + sizeof(*proc_cred.gid_arr)); + if (proc_cred.gid_arr == NULL) { + logmsg("init: malloc"); + return (-1); + } + if (getgroups(proc_cred.gid_num, proc_cred.gid_arr) < 0) { + logmsg("init: getgroups"); + return (-1); + } + + memset(&serv_addr_sun, 0, sizeof(serv_addr_sun)); + rv = snprintf(serv_addr_sun.sun_path, sizeof(serv_addr_sun.sun_path), + "%s/%s", work_dir, proc_name); + if (rv < 0) { + logmsg("init: snprintf"); + return (-1); + } + if ((size_t)rv >= sizeof(serv_addr_sun.sun_path)) { + logmsgx("init: not enough space for socket pathname"); + return (-1); + } + serv_addr_sun.sun_family = PF_LOCAL; + serv_addr_sun.sun_len = SUN_LEN(&serv_addr_sun); + + return (0); +} + +static int +client_fork(void) { - 
siglongjmp(env_alrm, 1); + int fd1, fd2; + + if (pipe(sync_fd[SYNC_SERVER]) < 0 || + pipe(sync_fd[SYNC_CLIENT]) < 0) { + logmsg("client_fork: pipe"); + return (-1); + } + client_pid = fork(); + if (client_pid == (pid_t)-1) { + logmsg("client_fork: fork"); + return (-1); + } + if (client_pid == 0) { + proc_name = "CLIENT"; + server_flag = false; + fd1 = sync_fd[SYNC_SERVER][SYNC_RECV]; + fd2 = sync_fd[SYNC_CLIENT][SYNC_SEND]; + } else { + fd1 = sync_fd[SYNC_SERVER][SYNC_SEND]; + fd2 = sync_fd[SYNC_CLIENT][SYNC_RECV]; + } + if (close(fd1) < 0 || close(fd2) < 0) { + logmsg("client_fork: close"); + return (-1); + } + return (client_pid != 0); } -/* - * Initialize signals handlers. - */ static void -sig_init(void) +client_exit(int rv) +{ + if (close(sync_fd[SYNC_SERVER][SYNC_SEND]) < 0 || + close(sync_fd[SYNC_CLIENT][SYNC_RECV]) < 0) { + logmsg("client_exit: close"); + rv = -1; + } + rv = rv == 0 ? EXIT_SUCCESS : -rv; + dbgmsg("exit: code %d", rv); + _exit(rv); +} + +static int +client_wait(void) { - struct sigaction sa; + int status; + pid_t pid; - sa.sa_handler = SIG_IGN; - sigemptyset(&sa.sa_mask); - sa.sa_flags = 0; - if (sigaction(SIGPIPE, &sa, (struct sigaction *)NULL) < 0) - err(EX_OSERR, "sigaction(SIGPIPE)"); - - sa.sa_handler = sig_alrm; - if (sigaction(SIGALRM, &sa, (struct sigaction *)NULL) < 0) - err(EX_OSERR, "sigaction(SIGALRM)"); + dbgmsg("waiting for client"); + + if (close(sync_fd[SYNC_SERVER][SYNC_RECV]) < 0 || + close(sync_fd[SYNC_CLIENT][SYNC_SEND]) < 0) { + logmsg("client_wait: close"); + return (-1); + } + + pid = waitpid(client_pid, &status, 0); + if (pid == (pid_t)-1) { + logmsg("client_wait: waitpid"); + return (-1); + } + + if (WIFEXITED(status)) { + if (WEXITSTATUS(status) != EXIT_SUCCESS) { + logmsgx("client exit status is %d", + WEXITSTATUS(status)); + return (-WEXITSTATUS(status)); + } + } else { + if (WIFSIGNALED(status)) + logmsgx("abnormal termination of client, signal %d%s", + WTERMSIG(status), WCOREDUMP(status) ? 
+ " (core file generated)" : ""); + else + logmsgx("termination of client, unknown status"); + return (-1); + } + + return (0); } int main(int argc, char *argv[]) { const char *errstr; - int opt, dgramflag, streamflag; - u_int testno1, testno2; - - dgramflag = streamflag = 0; - while ((opt = getopt(argc, argv, "dht:z")) != -1) + u_int testno, zvalue; + int opt, rv; + bool dgram_flag, stream_flag; + + ipc_msg.buf_size = IPC_MSG_SIZE_DEF; + ipc_msg.msg_num = IPC_MSG_NUM_DEF; + dgram_flag = stream_flag = false; + while ((opt = getopt(argc, argv, "dhn:s:t:z:")) != -1) switch (opt) { case 'd': - debug = 1; + debug = true; break; case 'h': - usage(0); - return (EX_OK); + usage(true); + return (EXIT_SUCCESS); + case 'n': + ipc_msg.msg_num = strtonum(optarg, 1, + IPC_MSG_NUM_MAX, &errstr); + if (errstr != NULL) + errx(EXIT_FAILURE, "option -n: number is %s", + errstr); + break; + case 's': + ipc_msg.buf_size = strtonum(optarg, 0, + IPC_MSG_SIZE_MAX, &errstr); + if (errstr != NULL) + errx(EXIT_FAILURE, "option -s: number is %s", + errstr); + break; case 't': if (strcmp(optarg, "stream") == 0) - streamflag = 1; + stream_flag = true; else if (strcmp(optarg, "dgram") == 0) - dgramflag = 1; + dgram_flag = true; else - errx(EX_USAGE, "wrong socket type in -t option"); + errx(EXIT_FAILURE, "option -t: " + "wrong socket type"); break; case 'z': - no_control_data = 1; + zvalue = strtonum(optarg, 0, 3, &errstr); + if (errstr != NULL) + errx(EXIT_FAILURE, "option -z: number is %s", + errstr); + if (zvalue & 0x1) + send_data_flag = false; + if (zvalue & 0x2) + send_array_flag = false; break; - case '?': default: - usage(1); - return (EX_USAGE); + usage(false); + return (EXIT_FAILURE); } if (optind < argc) { if (optind + 1 != argc) - errx(EX_USAGE, "too many arguments"); - testno1 = strtonum(argv[optind], 0, UINT_MAX, &errstr); + errx(EXIT_FAILURE, "too many arguments"); + testno = strtonum(argv[optind], 0, UINT_MAX, &errstr); if (errstr != NULL) - errx(EX_USAGE, "wrong test number: 
%s", errstr); + errx(EXIT_FAILURE, "test number is %s", errstr); + if (stream_flag && testno >= TEST_STREAM_TBL_SIZE) + errx(EXIT_FAILURE, "given test %u for stream " + "sockets does not exist", testno); + if (dgram_flag && testno >= TEST_DGRAM_TBL_SIZE) + errx(EXIT_FAILURE, "given test %u for datagram " + "sockets does not exist", testno); } else - testno1 = 0; - - if (dgramflag == 0 && streamflag == 0) - dgramflag = streamflag = 1; + testno = 0; - if (dgramflag && streamflag && testno1 != 0) - errx(EX_USAGE, "you can use particular test, only with datagram or stream sockets"); - - if (streamflag) { - if (testno1 > TEST_STREAM_NO_MAX) - errx(EX_USAGE, "given test %u for stream sockets does not exist", - testno1); - } else { - if (testno1 > TEST_DGRAM_NO_MAX) - errx(EX_USAGE, "given test %u for datagram sockets does not exist", - testno1); + if (!dgram_flag && !stream_flag) { + if (testno != 0) + errx(EXIT_FAILURE, "particular test number " + "can be used with the -t option only"); + dgram_flag = stream_flag = true; } - my_uid = getuid(); - my_euid = geteuid(); - my_gid = getgid(); - my_egid = getegid(); - switch (my_ngids = getgroups(sizeof(my_gids) / sizeof(my_gids[0]), my_gids)) { - case -1: - err(EX_SOFTWARE, "getgroups"); - /* NOTREACHED */ - case 0: - errx(EX_OSERR, "getgroups returned 0 groups"); - } + if (mkdtemp(work_dir) == NULL) + err(EXIT_FAILURE, "mkdtemp(%s)", work_dir); - sig_init(); + rv = EXIT_FAILURE; + if (init() < 0) + goto done; - if (mkdtemp(tempdir) == NULL) - err(EX_OSERR, "mkdtemp"); + if (stream_flag) + if (run_tests(SOCK_STREAM, testno) < 0) + goto done; + if (dgram_flag) + if (run_tests(SOCK_DGRAM, testno) < 0) + goto done; - if (streamflag) { - sock_type = SOCK_STREAM; - sock_type_str = "SOCK_STREAM"; - if (testno1 == 0) { - testno1 = 1; - testno2 = TEST_STREAM_NO_MAX; - } else - testno2 = testno1; - if (run_tests(testno1, testno2) < 0) - goto failed; - testno1 = 0; + rv = EXIT_SUCCESS; +done: + if (rmdir(work_dir) < 0) { + 
logmsg("rmdir(%s)", work_dir); + rv = EXIT_FAILURE; } + return (failed_flag ? EXIT_FAILURE : rv); +} - if (dgramflag) { - sock_type = SOCK_DGRAM; - sock_type_str = "SOCK_DGRAM"; - if (testno1 == 0) { - testno1 = 1; - testno2 = TEST_DGRAM_NO_MAX; - } else - testno2 = testno1; - if (run_tests(testno1, testno2) < 0) - goto failed; - } +static int +socket_close(int fd) +{ + int rv; - if (rmdir(tempdir) < 0) { - logmsg("rmdir(%s)", tempdir); - return (EX_OSERR); + rv = 0; + if (close(fd) < 0) { + logmsg("socket_close: close"); + rv = -1; } - - return (nfailed ? EX_OSERR : EX_OK); - -failed: - if (rmdir(tempdir) < 0) - logmsg("rmdir(%s)", tempdir); - return (EX_OSERR); + if (server_flag && fd == serv_sock_fd) + if (unlink(serv_addr_sun.sun_path) < 0) { + logmsg("socket_close: unlink(%s)", + serv_addr_sun.sun_path); + rv = -1; + } + return (rv); } -/* - * Create PF_LOCAL socket, if sock_path is not equal to NULL, then - * bind() it. Return socket address in addr. Return file descriptor - * or -1 if some error occurred. 
- */ static int -create_socket(char *sock_path, size_t sock_path_len, struct sockaddr_un *addr) +socket_create(void) { - int rv, fd; + struct timeval tv; + int fd; - if ((fd = socket(PF_LOCAL, sock_type, 0)) < 0) { - logmsg("create_socket: socket(PF_LOCAL, %s, 0)", sock_type_str); + fd = socket(PF_LOCAL, sock_type, 0); + if (fd < 0) { + logmsg("socket_create: socket(PF_LOCAL, %s, 0)", sock_type_str); return (-1); } + if (server_flag) + serv_sock_fd = fd; - if (sock_path != NULL) { - if ((rv = snprintf(sock_path, sock_path_len, "%s/%s", - tempdir, myname)) < 0) { - logmsg("create_socket: snprintf failed"); - goto failed; - } - if ((size_t)rv >= sock_path_len) { - logmsgx("create_socket: too long path name for given buffer"); - goto failed; - } + tv.tv_sec = TIMEOUT; + tv.tv_usec = 0; + if (setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) < 0 || + setsockopt(fd, SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof(tv)) < 0) { + logmsg("socket_create: setsockopt(SO_RCVTIMEO/SO_SNDTIMEO)"); + goto failed; + } - memset(addr, 0, sizeof(*addr)); - addr->sun_family = AF_LOCAL; - if (strlen(sock_path) >= sizeof(addr->sun_path)) { - logmsgx("create_socket: too long path name (>= %lu) for local domain socket", - (u_long)sizeof(addr->sun_path)); + if (server_flag) { + if (bind(fd, (struct sockaddr *)&serv_addr_sun, + serv_addr_sun.sun_len) < 0) { + logmsg("socket_create: bind(%s)", + serv_addr_sun.sun_path); goto failed; } - strcpy(addr->sun_path, sock_path); + if (sock_type == SOCK_STREAM) { + int val; - if (bind(fd, (struct sockaddr *)addr, SUN_LEN(addr)) < 0) { - logmsg("create_socket: bind(%s)", sock_path); - goto failed; + if (listen(fd, LISTENQ) < 0) { + logmsg("socket_create: listen"); + goto failed; + } + val = fcntl(fd, F_GETFL, 0); + if (val < 0) { + logmsg("socket_create: fcntl(F_GETFL)"); + goto failed; + } + if (fcntl(fd, F_SETFL, val | O_NONBLOCK) < 0) { + logmsg("socket_create: fcntl(F_SETFL)"); + goto failed; + } } } @@ -468,1163 +688,1282 @@ create_socket(char 
*sock_path, size_t so failed: if (close(fd) < 0) - logmsg("create_socket: close"); + logmsg("socket_create: close"); + if (server_flag) + if (unlink(serv_addr_sun.sun_path) < 0) + logmsg("socket_close: unlink(%s)", + serv_addr_sun.sun_path); return (-1); } -/* - * Call create_socket() for server listening socket. - * Return socket descriptor or -1 if some error occurred. - */ static int -create_server_socket(void) +socket_connect(int fd) { - return (create_socket(serv_sock_path, sizeof(serv_sock_path), &servaddr)); -} + dbgmsg("connect"); -/* - * Create unbound socket. - */ -static int -create_unbound_socket(void) -{ - return (create_socket((char *)NULL, 0, (struct sockaddr_un *)NULL)); + if (connect(fd, (struct sockaddr *)&serv_addr_sun, + serv_addr_sun.sun_len) < 0) { + logmsg("socket_connect: connect(%s)", serv_addr_sun.sun_path); + return (-1); + } + return (0); } -/* - * Close socket descriptor, if sock_path is not equal to NULL, - * then unlink the given path. - */ static int -close_socket(const char *sock_path, int fd) +sync_recv(void) { - int error = 0; + ssize_t ssize; + int fd; + char buf; - if (close(fd) < 0) { - logmsg("close_socket: close"); - error = -1; - } - if (sock_path != NULL) - if (unlink(sock_path) < 0) { - logmsg("close_socket: unlink(%s)", sock_path); - error = -1; - } - return (error); -} + dbgmsg("sync: wait"); -/* - * Connect to server (socket address in servaddr). - */ -static int -connect_server(int fd) -{ - dbgmsg(("connecting to %s", serv_sock_path)); + fd = sync_fd[server_flag ? SYNC_SERVER : SYNC_CLIENT][SYNC_RECV]; - /* - * If PF_LOCAL listening socket's queue is full, then connect() - * returns ECONNREFUSED immediately, do not need timeout. 
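Previously, a blocked call was interrupted by arming alarm() and jumping out with siglongjmp(). socket_create() now sets SO_RCVTIMEO and SO_SNDTIMEO instead, using TIMEOUT, which is lowered from 60 to 2 seconds. Once the timeout expires, a blocked recv() or send() simply fails with errno set to EAGAIN/EWOULDBLOCK. The sketch below demonstrates this behaviour on a socketpair(). recv_times_out() is a hypothetical helper, and the 100 ms value is arbitrary.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Hypothetical demo: returns 1 if recv() on a socket with SO_RCVTIMEO
 * gave up after usec microseconds, 0 if it did not, -1 on setup error.
 */
static int
recv_times_out(long usec)
{
	struct timeval tv;
	int sv[2], timed_out;
	char c;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return (-1);
	tv.tv_sec = usec / 1000000;
	tv.tv_usec = usec % 1000000;
	if (setsockopt(sv[0], SOL_SOCKET, SO_RCVTIMEO, &tv,
	    sizeof(tv)) < 0) {
		close(sv[0]);
		close(sv[1]);
		return (-1);
	}
	/* Nothing is ever written to sv[1], so this can only time out. */
	timed_out = recv(sv[0], &c, 1, 0) < 0 &&
	    (errno == EAGAIN || errno == EWOULDBLOCK);
	close(sv[0]);
	close(sv[1]);
	return (timed_out);
}
```

Socket timeouts do not cover accept() on the listening socket, so the patch gives accept() its own select()-based timeout.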
- */ - if (connect(fd, (struct sockaddr *)&servaddr, sizeof(servaddr)) < 0) { - logmsg("connect_server: connect(%s)", serv_sock_path); + ssize = read(fd, &buf, 1); + if (ssize < 0) { + logmsg("sync_recv: read"); + return (-1); + } + if (ssize < 1) { + logmsgx("sync_recv: read %zd of 1 byte", ssize); return (-1); } + dbgmsg("sync: received"); + return (0); } -/* - * sendmsg() with timeout. - */ static int -sendmsg_timeout(int fd, struct msghdr *msg, size_t n) +sync_send(void) { - ssize_t nsent; - - dbgmsg(("sending %lu bytes", (u_long)n)); - - if (sigsetjmp(env_alrm, 1) != 0) { - logmsgx("sendmsg_timeout: cannot send message to %s (timeout)", serv_sock_path); - return (-1); - } - - (void)alarm(TIMEOUT); + ssize_t ssize; + int fd; - nsent = sendmsg(fd, msg, 0); + dbgmsg("sync: send"); - (void)alarm(0); + fd = sync_fd[server_flag ? SYNC_CLIENT : SYNC_SERVER][SYNC_SEND]; - if (nsent < 0) { - logmsg("sendmsg_timeout: sendmsg"); + ssize = write(fd, "", 1); + if (ssize < 0) { + logmsg("sync_send: write"); return (-1); } - - if ((size_t)nsent != n) { - logmsgx("sendmsg_timeout: sendmsg: short send: %ld of %lu bytes", - (long)nsent, (u_long)n); + if (ssize < 1) { + logmsgx("sync_send: sent %zd of 1 byte", ssize); return (-1); } return (0); } -/* - * accept() with timeout. - */ static int -accept_timeout(int listenfd) +message_send(int fd, const struct msghdr *msghdr) { - int fd; - - dbgmsg(("accepting connection")); - - if (sigsetjmp(env_alrm, 1) != 0) { - logmsgx("accept_timeout: cannot accept connection (timeout)"); + const struct cmsghdr *cmsghdr; + size_t size; + ssize_t ssize; + + size = msghdr->msg_iov != 0 ? 
msghdr->msg_iov->iov_len : 0; + dbgmsg("send: data size %zu", size); + dbgmsg("send: msghdr.msg_controllen %u", + (u_int)msghdr->msg_controllen); + cmsghdr = CMSG_FIRSTHDR(msghdr); + if (cmsghdr != NULL) + dbgmsg("send: cmsghdr.cmsg_len %u", + (u_int)cmsghdr->cmsg_len); + + ssize = sendmsg(fd, msghdr, 0); + if (ssize < 0) { + logmsg("message_send: sendmsg"); + return (-1); + } + if ((size_t)ssize != size) { + logmsgx("message_send: sendmsg: sent %zd of %zu bytes", + ssize, size); return (-1); } - (void)alarm(TIMEOUT); + if (!send_data_flag) + if (sync_send() < 0) + return (-1); - fd = accept(listenfd, (struct sockaddr *)NULL, (socklen_t *)NULL); + return (0); +} - (void)alarm(0); +static int +message_sendn(int fd, struct msghdr *msghdr) +{ + u_int i; - if (fd < 0) { - logmsg("accept_timeout: accept"); - return (-1); + for (i = 1; i <= ipc_msg.msg_num; ++i) { + dbgmsg("message #%u", i); + if (message_send(fd, msghdr) < 0) + return (-1); } - - return (fd); + return (0); } -/* - * recvmsg() with timeout. - */ static int -recvmsg_timeout(int fd, struct msghdr *msg, size_t n) +message_recv(int fd, struct msghdr *msghdr) { - ssize_t nread; + const struct cmsghdr *cmsghdr; + size_t size; + ssize_t ssize; - dbgmsg(("receiving %lu bytes", (u_long)n)); + if (!send_data_flag) + if (sync_recv() < 0) + return (-1); - if (sigsetjmp(env_alrm, 1) != 0) { - logmsgx("recvmsg_timeout: cannot receive message (timeout)"); + size = msghdr->msg_iov != NULL ? 
msghdr->msg_iov->iov_len : 0; + ssize = recvmsg(fd, msghdr, MSG_WAITALL); + if (ssize < 0) { + logmsg("message_recv: recvmsg"); return (-1); } - - (void)alarm(TIMEOUT); - - nread = recvmsg(fd, msg, MSG_WAITALL); - - (void)alarm(0); - - if (nread < 0) { - logmsg("recvmsg_timeout: recvmsg"); + if ((size_t)ssize != size) { + logmsgx("message_recv: recvmsg: received %zd of %zu bytes", + ssize, size); return (-1); } - if ((size_t)nread != n) { - logmsgx("recvmsg_timeout: recvmsg: short read: %ld of %lu bytes", - (long)nread, (u_long)n); + dbgmsg("recv: data size %zd", ssize); + dbgmsg("recv: msghdr.msg_controllen %u", + (u_int)msghdr->msg_controllen); + cmsghdr = CMSG_FIRSTHDR(msghdr); + if (cmsghdr != NULL) + dbgmsg("recv: cmsghdr.cmsg_len %u", + (u_int)cmsghdr->cmsg_len); + + if (memcmp(ipc_msg.buf_recv, ipc_msg.buf_send, size) != 0) { + logmsgx("message_recv: received message has wrong content"); return (-1); } return (0); } -/* - * Wait for synchronization message (1 byte) with timeout. - */ static int -sync_recv(int fd) +socket_accept(int listenfd) { - ssize_t nread; - char buf; - - dbgmsg(("waiting for sync message")); - - if (sigsetjmp(env_alrm, 1) != 0) { - logmsgx("sync_recv: cannot receive sync message (timeout)"); + fd_set rset; + struct timeval tv; + int fd, rv, val; + + dbgmsg("accept"); + + FD_ZERO(&rset); + FD_SET(listenfd, &rset); + tv.tv_sec = TIMEOUT; + tv.tv_usec = 0; + rv = select(listenfd + 1, &rset, (fd_set *)NULL, (fd_set *)NULL, &tv); + if (rv < 0) { + logmsg("socket_accept: select"); return (-1); } - - (void)alarm(TIMEOUT); - - nread = read(fd, &buf, 1); - - (void)alarm(0); - - if (nread < 0) { - logmsg("sync_recv: read"); + if (rv == 0) { + logmsgx("socket_accept: select timeout"); return (-1); } - if (nread != 1) { - logmsgx("sync_recv: read: short read: %ld of 1 byte", - (long)nread); + fd = accept(listenfd, (struct sockaddr *)NULL, (socklen_t *)NULL); + if (fd < 0) { + logmsg("socket_accept: accept"); return (-1); } - return (0); + val = 
fcntl(fd, F_GETFL, 0); + if (val < 0) { + logmsg("socket_accept: fcntl(F_GETFL)"); + goto failed; + } + if (fcntl(fd, F_SETFL, val & ~O_NONBLOCK) < 0) { + logmsg("socket_accept: fcntl(F_SETFL)"); + goto failed; + } + + return (fd); + +failed: + if (close(fd) < 0) + logmsg("socket_accept: close"); + return (-1); } -/* - * Send synchronization message (1 byte) with timeout. - */ static int -sync_send(int fd) +check_msghdr(const struct msghdr *msghdr, size_t size) { - ssize_t nsent; - - dbgmsg(("sending sync message")); - - if (sigsetjmp(env_alrm, 1) != 0) { - logmsgx("sync_send: cannot send sync message (timeout)"); + if (msghdr->msg_flags & MSG_TRUNC) { + logmsgx("msghdr.msg_flags has MSG_TRUNC"); return (-1); } - - (void)alarm(TIMEOUT); - - nsent = write(fd, "", 1); - - (void)alarm(0); - - if (nsent < 0) { - logmsg("sync_send: write"); + if (msghdr->msg_flags & MSG_CTRUNC) { + logmsgx("msghdr.msg_flags has MSG_CTRUNC"); return (-1); } - - if (nsent != 1) { - logmsgx("sync_send: write: short write: %ld of 1 byte", - (long)nsent); + if (msghdr->msg_controllen < size) { + logmsgx("msghdr.msg_controllen %u < %zu", + (u_int)msghdr->msg_controllen, size); + return (-1); + } + if (msghdr->msg_controllen > 0 && size == 0) { + logmsgx("msghdr.msg_controllen %u > 0", + (u_int)msghdr->msg_controllen); return (-1); } - return (0); } -/* - * waitpid() for client with timeout. 
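socket_accept() puts its timeout on select(). socket_create() makes the listening socket non-blocking, select() waits up to TIMEOUT for that socket to become readable, and O_NONBLOCK is then cleared on the accepted descriptor. The clearing step matters on FreeBSD, where the socket returned by accept() inherits O_NONBLOCK from the listening socket. Linux does not copy that flag. The sketch below follows the same sequence. accept_with_timeout() and listen_tmp() are hypothetical names, and the temporary directory stands in for work_dir.

```c
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/un.h>
#include <unistd.h>

/*
 * Wait up to usec microseconds for a pending connection, accept it and
 * make the new descriptor blocking. Returns the fd, or -1 on timeout
 * or error.
 */
static int
accept_with_timeout(int listenfd, long usec)
{
	fd_set rset;
	struct timeval tv;
	int fd, flags;

	FD_ZERO(&rset);
	FD_SET(listenfd, &rset);
	tv.tv_sec = usec / 1000000;
	tv.tv_usec = usec % 1000000;
	/* 0 means timeout, -1 means error; both end the attempt. */
	if (select(listenfd + 1, &rset, NULL, NULL, &tv) <= 0)
		return (-1);
	fd = accept(listenfd, NULL, NULL);
	if (fd < 0)
		return (-1);
	/* Clear O_NONBLOCK whether or not accept() inherited it. */
	flags = fcntl(fd, F_GETFL, 0);
	if (flags < 0 || fcntl(fd, F_SETFL, flags & ~O_NONBLOCK) < 0) {
		close(fd);
		return (-1);
	}
	return (fd);
}

/*
 * Hypothetical helper: mkdtemp(tmpl), then a bound, listening,
 * non-blocking PF_LOCAL socket at <dir>/sock; addr gets its address.
 */
static int
listen_tmp(char *tmpl, struct sockaddr_un *addr)
{
	int fd, flags;

	if (mkdtemp(tmpl) == NULL)
		return (-1);
	memset(addr, 0, sizeof(*addr));
	addr->sun_family = AF_UNIX;
	snprintf(addr->sun_path, sizeof(addr->sun_path), "%s/sock", tmpl);
	fd = socket(AF_UNIX, SOCK_STREAM, 0);
	if (fd < 0)
		return (-1);
	flags = fcntl(fd, F_GETFL, 0);
	if (bind(fd, (struct sockaddr *)addr, sizeof(*addr)) < 0 ||
	    listen(fd, 5) < 0 || flags < 0 ||
	    fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0) {
		close(fd);
		return (-1);
	}
	return (fd);
}
```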
- */ static int -wait_client(void) +check_cmsghdr(const struct cmsghdr *cmsghdr, int type, size_t size) { - int status; - pid_t pid; - - if (sigsetjmp(env_alrm, 1) != 0) { - logmsgx("wait_client: cannot get exit status of client PID %ld (timeout)", - (long)client_pid); + if (cmsghdr == NULL) { + logmsgx("cmsghdr is NULL"); return (-1); } - - (void)alarm(TIMEOUT); - - pid = waitpid(client_pid, &status, 0); - - (void)alarm(0); - - if (pid == (pid_t)-1) { - logmsg("wait_client: waitpid"); + if (cmsghdr->cmsg_level != SOL_SOCKET) { + logmsgx("cmsghdr.cmsg_level %d != SOL_SOCKET", + cmsghdr->cmsg_level); return (-1); } - - if (WIFEXITED(status)) { - if (WEXITSTATUS(status) != 0) { - logmsgx("wait_client: exit status of client PID %ld is %d", - (long)client_pid, WEXITSTATUS(status)); - return (-1); - } - } else { - if (WIFSIGNALED(status)) - logmsgx("wait_client: abnormal termination of client PID %ld, signal %d%s", - (long)client_pid, WTERMSIG(status), WCOREDUMP(status) ? " (core file generated)" : ""); - else - logmsgx("wait_client: termination of client PID %ld, unknown status", - (long)client_pid); + if (cmsghdr->cmsg_type != type) { + logmsgx("cmsghdr.cmsg_type %d != %d", + cmsghdr->cmsg_type, type); + return (-1); + } + if (cmsghdr->cmsg_len != CMSG_LEN(size)) { + logmsgx("cmsghdr.cmsg_len %u != %zu", + (u_int)cmsghdr->cmsg_len, CMSG_LEN(size)); return (-1); } - return (0); } -/* - * Check if n supplementary GIDs in gids are correct. (my_gids + 1) - * has (my_ngids - 1) supplementary GIDs of current process. 
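check_msghdr() and check_cmsghdr() apply the same checks to every received message:

- no MSG_TRUNC or MSG_CTRUNC flag,
- enough control data,
- a first cmsghdr with level SOL_SOCKET, the expected type, and cmsg_len exactly equal to CMSG_LEN(size).

The sketch below applies those checks portably. It passes a descriptor with SCM_RIGHTS, which is swapped in for the FreeBSD-specific SCM_CREDS and struct cmsgcred used by the tests. pass_fd_checked() is a hypothetical name.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical helper: send fd_to_pass over sv[0] with one data byte,
 * receive it on sv[1] and validate the headers the way check_msghdr()
 * and check_cmsghdr() do. Returns the received descriptor or -1.
 */
static int
pass_fd_checked(int sv[2], int fd_to_pass)
{
	union {
		struct cmsghdr hdr;	/* forces cmsghdr alignment */
		char buf[CMSG_SPACE(sizeof(int))];
	} ctl;
	struct msghdr msg;
	struct iovec iov;
	struct cmsghdr *cmsg;
	char data = 'x';
	int fd;

	memset(&msg, 0, sizeof(msg));
	memset(&ctl, 0, sizeof(ctl));
	iov.iov_base = &data;
	iov.iov_len = 1;
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctl.buf;
	msg.msg_controllen = sizeof(ctl.buf);
	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd_to_pass, sizeof(int));
	if (sendmsg(sv[0], &msg, 0) != 1)
		return (-1);

	/* Reuse the same buffers on the receive side. */
	memset(&ctl, 0, sizeof(ctl));
	data = 0;
	msg.msg_controllen = sizeof(ctl.buf);
	msg.msg_flags = 0;
	if (recvmsg(sv[1], &msg, 0) != 1)
		return (-1);
	/* cf. check_msghdr(): nothing truncated, enough control data. */
	if (msg.msg_flags & (MSG_TRUNC | MSG_CTRUNC))
		return (-1);
	if (msg.msg_controllen < CMSG_LEN(sizeof(int)))
		return (-1);
	/* cf. check_cmsghdr(): level, type and exact length. */
	cmsg = CMSG_FIRSTHDR(&msg);
	if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS ||
	    cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
		return (-1);
	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return (data == 'x' ? fd : -1);
}
```

The union with struct cmsghdr is the usual way to get a control buffer that is suitably aligned for CMSG_FIRSTHDR(). The removed t_cmsgcred_client() used the same control_un idiom.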
- */ static int -check_groups(const gid_t *gids, int n) +check_groups(const char *gid_arr_str, const gid_t *gid_arr, + const char *gid_num_str, int gid_num, bool all_gids) { - char match[NGROUPS_MAX] = { 0 }; - int error, i, j; + int i; - if (n != my_ngids - 1) { - logmsgx("wrong number of groups %d != %d (returned from getgroups() - 1)", - n, my_ngids - 1); - error = -1; - } else - error = 0; - for (i = 0; i < n; ++i) { - for (j = 1; j < my_ngids; ++j) { - if (gids[i] == my_gids[j]) { - if (match[j]) { - logmsgx("duplicated GID %lu", - (u_long)gids[i]); - error = -1; - } else - match[j] = 1; - break; - } + for (i = 0; i < gid_num; ++i) + dbgmsg("%s[%d] %lu", gid_arr_str, i, (u_long)gid_arr[i]); + + if (all_gids) { + if (gid_num != proc_cred.gid_num) { + logmsgx("%s %d != %d", gid_num_str, gid_num, + proc_cred.gid_num); + return (-1); } - if (j == my_ngids) { - logmsgx("unexpected GID %lu", (u_long)gids[i]); - error = -1; + } else { + if (gid_num > proc_cred.gid_num) { + logmsgx("%s %d > %d", gid_num_str, gid_num, + proc_cred.gid_num); + return (-1); } } - for (j = 1; j < my_ngids; ++j) - if (match[j] == 0) { - logmsgx("did not receive supplementary GID %u", my_gids[j]); - error = -1; - } - return (error); + if (memcmp(gid_arr, proc_cred.gid_arr, + gid_num * sizeof(*gid_arr)) != 0) { + logmsgx("%s content is wrong", gid_arr_str); + for (i = 0; i < gid_num; ++i) + if (gid_arr[i] != proc_cred.gid_arr[i]) { + logmsgx("%s[%d] %lu != %lu", + gid_arr_str, i, (u_long)gid_arr[i], + (u_long)proc_cred.gid_arr[i]); + break; + } + return (-1); + } + return (0); } -/* - * Send n messages with data and control message with SCM_CREDS type - * to server and exit. 
- */ -static void -t_cmsgcred_client(u_int n) +static int +check_xucred(const struct xucred *xucred, socklen_t len) { - union { - struct cmsghdr cm; - char control[CMSG_SPACE(sizeof(struct cmsgcred))]; - } control_un; - struct msghdr msg; - struct iovec iov[1]; - struct cmsghdr *cmptr; - int fd; - u_int i; + if (len != sizeof(*xucred)) { + logmsgx("option value size %zu != %zu", + (size_t)len, sizeof(*xucred)); + return (-1); + } - assert(n == 1 || n == 2); + dbgmsg("xucred.cr_version %u", xucred->cr_version); + dbgmsg("xucred.cr_uid %lu", (u_long)xucred->cr_uid); + dbgmsg("xucred.cr_ngroups %d", xucred->cr_ngroups); + + if (xucred->cr_version != XUCRED_VERSION) { + logmsgx("xucred.cr_version %u != %d", + xucred->cr_version, XUCRED_VERSION); + return (-1); + } + if (xucred->cr_uid != proc_cred.euid) { + logmsgx("xucred.cr_uid %lu != %lu (EUID)", + (u_long)xucred->cr_uid, (u_long)proc_cred.euid); + return (-1); + } + if (xucred->cr_ngroups == 0) { + logmsgx("xucred.cr_ngroups == 0"); + return (-1); + } + if (xucred->cr_ngroups < 0) { + logmsgx("xucred.cr_ngroups < 0"); + return (-1); + } + if (xucred->cr_ngroups > XU_NGROUPS) { + logmsgx("xucred.cr_ngroups %hu > %u (max)", + xucred->cr_ngroups, XU_NGROUPS); + return (-1); + } + if (xucred->cr_groups[0] != proc_cred.egid) { + logmsgx("xucred.cr_groups[0] %lu != %lu (EGID)", + (u_long)xucred->cr_groups[0], (u_long)proc_cred.egid); + return (-1); + } + if (check_groups("xucred.cr_groups", xucred->cr_groups, + "xucred.cr_ngroups", xucred->cr_ngroups, false) < 0) + return (-1); + return (0); +} - if ((fd = create_unbound_socket()) < 0) - goto failed; +static int +check_scm_creds_cmsgcred(struct cmsghdr *cmsghdr) +{ + const struct cmsgcred *cmsgcred; - if (connect_server(fd) < 0) - goto failed_close; + if (check_cmsghdr(cmsghdr, SCM_CREDS, sizeof(*cmsgcred)) < 0) + return (-1); - iov[0].iov_base = ipc_message; - iov[0].iov_len = IPC_MESSAGE_SIZE; + cmsgcred = (struct cmsgcred *)CMSG_DATA(cmsghdr); - msg.msg_name = NULL; - 
msg.msg_namelen = 0; - msg.msg_iov = iov; - msg.msg_iovlen = 1; - msg.msg_control = control_un.control; - msg.msg_controllen = no_control_data ? - sizeof(struct cmsghdr) : sizeof(control_un.control); - msg.msg_flags = 0; - - cmptr = CMSG_FIRSTHDR(&msg); - cmptr->cmsg_len = CMSG_LEN(no_control_data ? - 0 : sizeof(struct cmsgcred)); - cmptr->cmsg_level = SOL_SOCKET; - cmptr->cmsg_type = SCM_CREDS; - - for (i = 0; i < n; ++i) { - dbgmsg(("#%u msg_controllen = %u, cmsg_len = %u", i, - (u_int)msg.msg_controllen, (u_int)cmptr->cmsg_len)); - if (sendmsg_timeout(fd, &msg, IPC_MESSAGE_SIZE) < 0) - goto failed_close; + dbgmsg("cmsgcred.cmcred_pid %ld", (long)cmsgcred->cmcred_pid); + dbgmsg("cmsgcred.cmcred_uid %lu", (u_long)cmsgcred->cmcred_uid); + dbgmsg("cmsgcred.cmcred_euid %lu", (u_long)cmsgcred->cmcred_euid); + dbgmsg("cmsgcred.cmcred_gid %lu", (u_long)cmsgcred->cmcred_gid); + dbgmsg("cmsgcred.cmcred_ngroups %d", cmsgcred->cmcred_ngroups); + + if (cmsgcred->cmcred_pid != client_pid) { + logmsgx("cmsgcred.cmcred_pid %ld != %ld", + (long)cmsgcred->cmcred_pid, (long)client_pid); + return (-1); + } + if (cmsgcred->cmcred_uid != proc_cred.uid) { + logmsgx("cmsgcred.cmcred_uid %lu != %lu", + (u_long)cmsgcred->cmcred_uid, (u_long)proc_cred.uid); + return (-1); + } + if (cmsgcred->cmcred_euid != proc_cred.euid) { + logmsgx("cmsgcred.cmcred_euid %lu != %lu", + (u_long)cmsgcred->cmcred_euid, (u_long)proc_cred.euid); + return (-1); + } + if (cmsgcred->cmcred_gid != proc_cred.gid) { + logmsgx("cmsgcred.cmcred_gid %lu != %lu", + (u_long)cmsgcred->cmcred_gid, (u_long)proc_cred.gid); + return (-1); + } + if (cmsgcred->cmcred_ngroups == 0) { + logmsgx("cmsgcred.cmcred_ngroups == 0"); + return (-1); + } + if (cmsgcred->cmcred_ngroups < 0) { + logmsgx("cmsgcred.cmcred_ngroups %d < 0", + cmsgcred->cmcred_ngroups); + return (-1); + } + if (cmsgcred->cmcred_ngroups > CMGROUP_MAX) { + logmsgx("cmsgcred.cmcred_ngroups %d > %d", + cmsgcred->cmcred_ngroups, CMGROUP_MAX); + return (-1); + } + if 
(cmsgcred->cmcred_groups[0] != proc_cred.egid) { + logmsgx("cmsgcred.cmcred_groups[0] %lu != %lu (EGID)", + (u_long)cmsgcred->cmcred_groups[0], (u_long)proc_cred.egid); + return (-1); } + if (check_groups("cmsgcred.cmcred_groups", cmsgcred->cmcred_groups, + "cmsgcred.cmcred_ngroups", cmsgcred->cmcred_ngroups, false) < 0) + return (-1); + return (0); +} - if (close_socket((const char *)NULL, fd) < 0) - goto failed; +static int +check_scm_creds_sockcred(struct cmsghdr *cmsghdr) +{ + const struct sockcred *sockcred; - _exit(0); + if (check_cmsghdr(cmsghdr, SCM_CREDS, + SOCKCREDSIZE(proc_cred.gid_num)) < 0) + return (-1); -failed_close: - (void)close_socket((const char *)NULL, fd); + sockcred = (struct sockcred *)CMSG_DATA(cmsghdr); -failed: - _exit(1); + dbgmsg("sockcred.sc_uid %lu", (u_long)sockcred->sc_uid); + dbgmsg("sockcred.sc_euid %lu", (u_long)sockcred->sc_euid); + dbgmsg("sockcred.sc_gid %lu", (u_long)sockcred->sc_gid); + dbgmsg("sockcred.sc_egid %lu", (u_long)sockcred->sc_egid); + dbgmsg("sockcred.sc_ngroups %d", sockcred->sc_ngroups); + + if (sockcred->sc_uid != proc_cred.uid) { + logmsgx("sockcred.sc_uid %lu != %lu", + (u_long)sockcred->sc_uid, (u_long)proc_cred.uid); + return (-1); + } + if (sockcred->sc_euid != proc_cred.euid) { + logmsgx("sockcred.sc_euid %lu != %lu", + (u_long)sockcred->sc_euid, (u_long)proc_cred.euid); + return (-1); + } + if (sockcred->sc_gid != proc_cred.gid) { + logmsgx("sockcred.sc_gid %lu != %lu", + (u_long)sockcred->sc_gid, (u_long)proc_cred.gid); + return (-1); + } + if (sockcred->sc_egid != proc_cred.egid) { + logmsgx("sockcred.sc_egid %lu != %lu", + (u_long)sockcred->sc_egid, (u_long)proc_cred.egid); + return (-1); + } + if (sockcred->sc_ngroups == 0) { + logmsgx("sockcred.sc_ngroups == 0"); + return (-1); + } + if (sockcred->sc_ngroups < 0) { + logmsgx("sockcred.sc_ngroups %d < 0", + sockcred->sc_ngroups); + return (-1); + } + if (sockcred->sc_ngroups != proc_cred.gid_num) { + logmsgx("sockcred.sc_ngroups %d != %u", + 
sockcred->sc_ngroups, proc_cred.gid_num); + return (-1); + } + if (check_groups("sockcred.sc_groups", sockcred->sc_groups, + "sockcred.sc_ngroups", sockcred->sc_ngroups, true) < 0) + return (-1); + return (0); } -/* - * Receive two messages with data and control message with SCM_CREDS - * type followed by struct cmsgcred{} from client. fd1 is a listen - * socket for stream sockets or simply socket for datagram sockets. - */ static int -t_cmsgcred_server(int fd1) +check_scm_timestamp(struct cmsghdr *cmsghdr) { - char buf[IPC_MESSAGE_SIZE]; - union { - struct cmsghdr cm; - char control[CMSG_SPACE(sizeof(struct cmsgcred)) + EXTRA_CMSG_SPACE]; - } control_un; - struct msghdr msg; - struct iovec iov[1]; - struct cmsghdr *cmptr; - const struct cmsgcred *cmcredptr; - socklen_t controllen; - int error, error2, fd2; - u_int i; + const struct timeval *timeval; - if (sock_type == SOCK_STREAM) { - if ((fd2 = accept_timeout(fd1)) < 0) - return (-2); - } else - fd2 = fd1; + if (check_cmsghdr(cmsghdr, SCM_TIMESTAMP, sizeof(struct timeval)) < 0) + return (-1); - error = 0; + timeval = (struct timeval *)CMSG_DATA(cmsghdr); - controllen = sizeof(control_un.control); + dbgmsg("timeval.tv_sec %"PRIdMAX", timeval.tv_usec %"PRIdMAX, + (intmax_t)timeval->tv_sec, (intmax_t)timeval->tv_usec); - for (i = 0; i < 2; ++i) { - iov[0].iov_base = buf; - iov[0].iov_len = sizeof(buf); + return (0); +} - msg.msg_name = NULL; - msg.msg_namelen = 0; - msg.msg_iov = iov; - msg.msg_iovlen = 1; - msg.msg_control = control_un.control; - msg.msg_controllen = controllen; - msg.msg_flags = 0; +static int +check_scm_bintime(struct cmsghdr *cmsghdr) +{ + const struct bintime *bintime; - controllen = CMSG_SPACE(sizeof(struct cmsgcred)); + if (check_cmsghdr(cmsghdr, SCM_BINTIME, sizeof(struct bintime)) < 0) + return (-1); - if (recvmsg_timeout(fd2, &msg, sizeof(buf)) < 0) - goto failed; + bintime = (struct bintime *)CMSG_DATA(cmsghdr); - if (msg.msg_flags & MSG_CTRUNC) { - logmsgx("#%u control data was 
truncated, MSG_CTRUNC flag is on", - i); - goto next_error; - } + dbgmsg("bintime.sec %"PRIdMAX", bintime.frac %"PRIu64, + (intmax_t)bintime->sec, bintime->frac); - if (msg.msg_controllen < sizeof(struct cmsghdr)) { - logmsgx("#%u msg_controllen %u < %lu (sizeof(struct cmsghdr))", - i, (u_int)msg.msg_controllen, (u_long)sizeof(struct cmsghdr)); - goto next_error; - } + return (0); +} - if ((cmptr = CMSG_FIRSTHDR(&msg)) == NULL) { - logmsgx("CMSG_FIRSTHDR is NULL"); - goto next_error; - } +static void +msghdr_init_generic(struct msghdr *msghdr, struct iovec *iov, void *cmsg_data) +{ + msghdr->msg_name = NULL; + msghdr->msg_namelen = 0; + if (send_data_flag) { + iov->iov_base = server_flag ? + ipc_msg.buf_recv : ipc_msg.buf_send; + iov->iov_len = ipc_msg.buf_size; + msghdr->msg_iov = iov; + msghdr->msg_iovlen = 1; + } else { + msghdr->msg_iov = NULL; + msghdr->msg_iovlen = 0; + } + msghdr->msg_control = cmsg_data; + msghdr->msg_flags = 0; +} - dbgmsg(("#%u msg_controllen = %u, cmsg_len = %u", i, - (u_int)msg.msg_controllen, (u_int)cmptr->cmsg_len)); +static void +msghdr_init_server(struct msghdr *msghdr, struct iovec *iov, + void *cmsg_data, size_t cmsg_size) +{ + msghdr_init_generic(msghdr, iov, cmsg_data); + msghdr->msg_controllen = cmsg_size; + dbgmsg("init: data size %zu", msghdr->msg_iov != NULL ? + msghdr->msg_iov->iov_len : (size_t)0); + dbgmsg("init: msghdr.msg_controllen %u", + (u_int)msghdr->msg_controllen); +} - if (cmptr->cmsg_level != SOL_SOCKET) { - logmsgx("#%u cmsg_level %d != SOL_SOCKET", i, - cmptr->cmsg_level); - goto next_error; - } +static void +msghdr_init_client(struct msghdr *msghdr, struct iovec *iov, + void *cmsg_data, size_t cmsg_size, int type, size_t arr_size) +{ + struct cmsghdr *cmsghdr; - if (cmptr->cmsg_type != SCM_CREDS) { - logmsgx("#%u cmsg_type %d != SCM_CREDS", i, - cmptr->cmsg_type); - goto next_error; - } + msghdr_init_generic(msghdr, iov, cmsg_data); + if (cmsg_data != NULL) { + msghdr->msg_controllen = send_array_flag ? 
+ cmsg_size : CMSG_SPACE(0); + cmsghdr = CMSG_FIRSTHDR(msghdr); + cmsghdr->cmsg_level = SOL_SOCKET; + cmsghdr->cmsg_type = type; + cmsghdr->cmsg_len = CMSG_LEN(send_array_flag ? arr_size : 0); + } else + msghdr->msg_controllen = 0; +} - if (cmptr->cmsg_len != CMSG_LEN(sizeof(struct cmsgcred))) { - logmsgx("#%u cmsg_len %u != %lu (CMSG_LEN(sizeof(struct cmsgcred))", - i, (u_int)cmptr->cmsg_len, (u_long)CMSG_LEN(sizeof(struct cmsgcred))); - goto next_error; - } +static int +t_generic(int (*client_func)(int), int (*server_func)(int)) +{ + int fd, rv, rv_client; - cmcredptr = (const struct cmsgcred *)CMSG_DATA(cmptr); + switch (client_fork()) { + case 0: + fd = socket_create(); + if (fd < 0) + rv = -2; + else { + rv = client_func(fd); + if (socket_close(fd) < 0) + rv = -2; + } + client_exit(rv); + break; + case 1: + fd = socket_create(); + if (fd < 0) + rv = -2; + else { + rv = server_func(fd); + rv_client = client_wait(); + if (rv == 0 || (rv == -2 && rv_client != 0)) + rv = rv_client; + if (socket_close(fd) < 0) + rv = -2; + } + break; + default: + rv = -2; + } + return (rv); +} - error2 = 0; - if (cmcredptr->cmcred_pid != client_pid) { - logmsgx("#%u cmcred_pid %ld != %ld (PID of client)", - i, (long)cmcredptr->cmcred_pid, (long)client_pid); - error2 = 1; - } - if (cmcredptr->cmcred_uid != my_uid) { - logmsgx("#%u cmcred_uid %lu != %lu (UID of current process)", - i, (u_long)cmcredptr->cmcred_uid, (u_long)my_uid); - error2 = 1; - } - if (cmcredptr->cmcred_euid != my_euid) { - logmsgx("#%u cmcred_euid %lu != %lu (EUID of current process)", - i, (u_long)cmcredptr->cmcred_euid, (u_long)my_euid); - error2 = 1; - } - if (cmcredptr->cmcred_gid != my_gid) { - logmsgx("#%u cmcred_gid %lu != %lu (GID of current process)", - i, (u_long)cmcredptr->cmcred_gid, (u_long)my_gid); - error2 = 1; - } - if (cmcredptr->cmcred_ngroups == 0) { - logmsgx("#%u cmcred_ngroups = 0, this is wrong", i); - error2 = 1; - } else { - if (cmcredptr->cmcred_ngroups > NGROUPS_MAX) { - logmsgx("#%u 
cmcred_ngroups %d > %u (NGROUPS_MAX)", - i, cmcredptr->cmcred_ngroups, NGROUPS_MAX); - error2 = 1; - } else if (cmcredptr->cmcred_ngroups < 0) { - logmsgx("#%u cmcred_ngroups %d < 0", - i, cmcredptr->cmcred_ngroups); - error2 = 1; - } else { - dbgmsg(("#%u cmcred_ngroups = %d", i, - cmcredptr->cmcred_ngroups)); - if (cmcredptr->cmcred_groups[0] != my_egid) { - logmsgx("#%u cmcred_groups[0] %lu != %lu (EGID of current process)", - i, (u_long)cmcredptr->cmcred_groups[0], (u_long)my_egid); - error2 = 1; - } - if (check_groups(cmcredptr->cmcred_groups + 1, cmcredptr->cmcred_ngroups - 1) < 0) { - logmsgx("#%u cmcred_groups has wrong GIDs", i); - error2 = 1; - } - } - } +static int +t_cmsgcred_client(int fd) +{ + struct msghdr msghdr; + struct iovec iov[1]; + void *cmsg_data; + size_t cmsg_size; + int rv; - if (error2) - goto next_error; + if (sync_recv() < 0) + return (-2); - if ((cmptr = CMSG_NXTHDR(&msg, cmptr)) != NULL) { - logmsgx("#%u control data has extra header", i); - goto next_error; - } + rv = -2; - continue; -next_error: - error = -1; + cmsg_size = CMSG_SPACE(sizeof(struct cmsgcred)); + cmsg_data = malloc(cmsg_size); + if (cmsg_data == NULL) { + logmsg("malloc"); + goto done; } + msghdr_init_client(&msghdr, iov, cmsg_data, cmsg_size, + SCM_CREDS, sizeof(struct cmsgcred)); - if (sock_type == SOCK_STREAM) - if (close(fd2) < 0) { - logmsg("close"); - return (-2); - } - return (error); + if (socket_connect(fd) < 0) + goto done; -failed: - if (sock_type == SOCK_STREAM) - if (close(fd2) < 0) - logmsg("close"); - return (-2); + if (message_sendn(fd, &msghdr) < 0) + goto done; + + rv = 0; +done: + free(cmsg_data); + return (rv); } static int -t_cmsgcred(void) +t_cmsgcred_server(int fd1) { - int error, fd; + struct msghdr msghdr; + struct iovec iov[1]; + struct cmsghdr *cmsghdr; + void *cmsg_data; + size_t cmsg_size; + u_int i; + int fd2, rv; - if ((fd = create_server_socket()) < 0) + if (sync_send() < 0) return (-2); - if (sock_type == SOCK_STREAM) - if (listen(fd, 
LISTENQ) < 0) { - logmsg("listen"); - goto failed; - } + fd2 = -1; + rv = -2; - if ((client_pid = fork()) == (pid_t)-1) { - logmsg("fork"); - goto failed; + cmsg_size = CMSG_SPACE(sizeof(struct cmsgcred)); + cmsg_data = malloc(cmsg_size); + if (cmsg_data == NULL) { + logmsg("malloc"); + goto done; } - if (client_pid == 0) { - myname = "CLIENT"; - if (close_socket((const char *)NULL, fd) < 0) - _exit(1); - t_cmsgcred_client(2); - } + if (sock_type == SOCK_STREAM) { + fd2 = socket_accept(fd1); + if (fd2 < 0) + goto done; + } else + fd2 = fd1; - if ((error = t_cmsgcred_server(fd)) == -2) { - (void)wait_client(); - goto failed; - } + rv = -1; + for (i = 1; i <= ipc_msg.msg_num; ++i) { + dbgmsg("message #%u", i); + + msghdr_init_server(&msghdr, iov, cmsg_data, cmsg_size); + if (message_recv(fd2, &msghdr) < 0) { + rv = -2; + break; + } - if (wait_client() < 0) - goto failed; + if (check_msghdr(&msghdr, sizeof(*cmsghdr)) < 0) + break; - if (close_socket(serv_sock_path, fd) < 0) { - logmsgx("close_socket failed"); - return (-2); + cmsghdr = CMSG_FIRSTHDR(&msghdr); + if (check_scm_creds_cmsgcred(cmsghdr) < 0) + break; } - return (error); + if (i > ipc_msg.msg_num) + rv = 0; +done: + free(cmsg_data); + if (sock_type == SOCK_STREAM && fd2 >= 0) + if (socket_close(fd2) < 0) + rv = -2; + return (rv); +} -failed: - if (close_socket(serv_sock_path, fd) < 0) - logmsgx("close_socket failed"); - return (-2); +static int +t_cmsgcred(void) +{ + return (t_generic(t_cmsgcred_client, t_cmsgcred_server)); } -/* - * Send two messages with data to server and exit. 
- */ -static void -t_sockcred_client(int type) +static int +t_sockcred_client(int type, int fd) { - struct msghdr msg; + struct msghdr msghdr; struct iovec iov[1]; - int fd; - u_int i; - - assert(type == 0 || type == 1); + int rv; - if ((fd = create_unbound_socket()) < 0) - goto failed; + if (sync_recv() < 0) + return (-2); - if (connect_server(fd) < 0) - goto failed_close; + rv = -2; - if (type == 1) - if (sync_recv(fd) < 0) - goto failed_close; - - iov[0].iov_base = ipc_message; - iov[0].iov_len = IPC_MESSAGE_SIZE; - - msg.msg_name = NULL; - msg.msg_namelen = 0; - msg.msg_iov = iov; - msg.msg_iovlen = 1; - msg.msg_control = NULL; - msg.msg_controllen = 0; - msg.msg_flags = 0; - - for (i = 0; i < 2; ++i) - if (sendmsg_timeout(fd, &msg, IPC_MESSAGE_SIZE) < 0) - goto failed_close; + msghdr_init_client(&msghdr, iov, NULL, 0, 0, 0); - if (close_socket((const char *)NULL, fd) < 0) - goto failed; + if (socket_connect(fd) < 0) + goto done; - _exit(0); + if (type == 2) + if (sync_recv() < 0) + goto done; -failed_close: - (void)close_socket((const char *)NULL, fd); + if (message_sendn(fd, &msghdr) < 0) + goto done; -failed: - _exit(1); + rv = 0; +done: + return (rv); } -/* - * Receive one message with data and control message with SCM_CREDS - * type followed by struct sockcred{} and if n is not equal 1, then - * receive another one message with data. fd1 is a listen socket for - * stream sockets or simply socket for datagram sockets. If type is - * 1, then set LOCAL_CREDS option for accepted stream socket. 
- */ static int -t_sockcred_server(int type, int fd1, u_int n) +t_sockcred_server(int type, int fd1) { - char buf[IPC_MESSAGE_SIZE]; - union { - struct cmsghdr cm; - char control[CMSG_SPACE(SOCKCREDSIZE(NGROUPS_MAX)) + EXTRA_CMSG_SPACE]; - } control_un; - struct msghdr msg; + struct msghdr msghdr; struct iovec iov[1]; - struct cmsghdr *cmptr; - const struct sockcred *sockcred; - int error, error2, fd2, optval; + struct cmsghdr *cmsghdr; + void *cmsg_data; + size_t cmsg_size; u_int i; + int fd2, rv, val; - assert(n == 1 || n == 2); - assert(type == 0 || type == 1); + fd2 = -1; + rv = -2; - if (sock_type == SOCK_STREAM) { - if ((fd2 = accept_timeout(fd1)) < 0) - return (-2); - if (type == 1) { - optval = 1; - if (setsockopt(fd2, 0, LOCAL_CREDS, &optval, sizeof optval) < 0) { - logmsg("setsockopt(LOCAL_CREDS) for accepted socket"); - if (errno == ENOPROTOOPT) { - error = -1; - goto done_close; - } - goto failed; - } - if (sync_send(fd2) < 0) - goto failed; - } - } else - fd2 = fd1; - - error = 0; - - for (i = 0; i < n; ++i) { - iov[0].iov_base = buf; - iov[0].iov_len = sizeof buf; - - msg.msg_name = NULL; - msg.msg_namelen = 0; - msg.msg_iov = iov; - msg.msg_iovlen = 1; - msg.msg_control = control_un.control; - msg.msg_controllen = sizeof control_un.control; - msg.msg_flags = 0; - - if (recvmsg_timeout(fd2, &msg, sizeof buf) < 0) - goto failed; - - if (msg.msg_flags & MSG_CTRUNC) { - logmsgx("control data was truncated, MSG_CTRUNC flag is on"); - goto next_error; - } - - if (i != 0 && sock_type == SOCK_STREAM) { - if (msg.msg_controllen != 0) { - logmsgx("second message has control data, this is wrong for stream sockets"); - goto next_error; - } - dbgmsg(("#%u msg_controllen = %u", i, - (u_int)msg.msg_controllen)); - continue; - } + cmsg_size = CMSG_SPACE(SOCKCREDSIZE(proc_cred.gid_num)); + cmsg_data = malloc(cmsg_size); + if (cmsg_data == NULL) { + logmsg("malloc"); + goto done; + } - if (msg.msg_controllen < sizeof(struct cmsghdr)) { - logmsgx("#%u msg_controllen %u 
< %lu (sizeof(struct cmsghdr))", - i, (u_int)msg.msg_controllen, (u_long)sizeof(struct cmsghdr)); - goto next_error; + if (type == 1) { + dbgmsg("setting LOCAL_CREDS"); + val = 1; + if (setsockopt(fd1, 0, LOCAL_CREDS, &val, sizeof(val)) < 0) { + logmsg("setsockopt(LOCAL_CREDS)"); + goto done; } + } - if ((cmptr = CMSG_FIRSTHDR(&msg)) == NULL) { - logmsgx("CMSG_FIRSTHDR is NULL"); - goto next_error; - } + if (sync_send() < 0) + goto done; - dbgmsg(("#%u msg_controllen = %u, cmsg_len = %u", i, - (u_int)msg.msg_controllen, (u_int)cmptr->cmsg_len)); + if (sock_type == SOCK_STREAM) { + fd2 = socket_accept(fd1); + if (fd2 < 0) + goto done; + } else + fd2 = fd1; - if (cmptr->cmsg_level != SOL_SOCKET) { - logmsgx("#%u cmsg_level %d != SOL_SOCKET", i, - cmptr->cmsg_level); - goto next_error; + if (type == 2) { + dbgmsg("setting LOCAL_CREDS"); + val = 1; + if (setsockopt(fd2, 0, LOCAL_CREDS, &val, sizeof(val)) < 0) { + logmsg("setsockopt(LOCAL_CREDS)"); + goto done; + } + if (sync_send() < 0) + goto done; + } + + rv = -1; + for (i = 1; i <= ipc_msg.msg_num; ++i) { + dbgmsg("message #%u", i); + + msghdr_init_server(&msghdr, iov, cmsg_data, cmsg_size); + if (message_recv(fd2, &msghdr) < 0) { + rv = -2; + break; } - if (cmptr->cmsg_type != SCM_CREDS) { - logmsgx("#%u cmsg_type %d != SCM_CREDS", i, - cmptr->cmsg_type); - goto next_error; - } + if (i > 1 && sock_type == SOCK_STREAM) { + if (check_msghdr(&msghdr, 0) < 0) + break; + } else { + if (check_msghdr(&msghdr, sizeof(*cmsghdr)) < 0) + break; - if (cmptr->cmsg_len < CMSG_LEN(SOCKCREDSIZE(1))) { - logmsgx("#%u cmsg_len %u != %lu (CMSG_LEN(SOCKCREDSIZE(1)))", - i, (u_int)cmptr->cmsg_len, (u_long)CMSG_LEN(SOCKCREDSIZE(1))); - goto next_error; + cmsghdr = CMSG_FIRSTHDR(&msghdr); + if (check_scm_creds_sockcred(cmsghdr) < 0) + break; } + } + if (i > ipc_msg.msg_num) + rv = 0; +done: + free(cmsg_data); + if (sock_type == SOCK_STREAM && fd2 >= 0) + if (socket_close(fd2) < 0) + rv = -2; + return (rv); +} - sockcred = (const struct 
sockcred *)CMSG_DATA(cmptr); +static int +t_sockcred_1(void) +{ + u_int i; + int fd, rv, rv_client; - error2 = 0; - if (sockcred->sc_uid != my_uid) { - logmsgx("#%u sc_uid %lu != %lu (UID of current process)", - i, (u_long)sockcred->sc_uid, (u_long)my_uid); - error2 = 1; - } - if (sockcred->sc_euid != my_euid) { - logmsgx("#%u sc_euid %lu != %lu (EUID of current process)", - i, (u_long)sockcred->sc_euid, (u_long)my_euid); - error2 = 1; - } - if (sockcred->sc_gid != my_gid) { - logmsgx("#%u sc_gid %lu != %lu (GID of current process)", - i, (u_long)sockcred->sc_gid, (u_long)my_gid); - error2 = 1; - } - if (sockcred->sc_egid != my_egid) { - logmsgx("#%u sc_egid %lu != %lu (EGID of current process)", - i, (u_long)sockcred->sc_gid, (u_long)my_egid); - error2 = 1; - } - if (sockcred->sc_ngroups > NGROUPS_MAX) { - logmsgx("#%u sc_ngroups %d > %u (NGROUPS_MAX)", - i, sockcred->sc_ngroups, NGROUPS_MAX); - error2 = 1; - } else if (sockcred->sc_ngroups < 0) { - logmsgx("#%u sc_ngroups %d < 0", - i, sockcred->sc_ngroups); - error2 = 1; - } else { - dbgmsg(("#%u sc_ngroups = %d", i, sockcred->sc_ngroups)); - if (check_groups(sockcred->sc_groups, sockcred->sc_ngroups) < 0) { - logmsgx("#%u sc_groups has wrong GIDs", i); - error2 = 1; + switch (client_fork()) { + case 0: + for (i = 1; i <= 2; ++i) { + dbgmsg("client #%u", i); + fd = socket_create(); + if (fd < 0) + rv = -2; + else { + rv = t_sockcred_client(1, fd); + if (socket_close(fd) < 0) + rv = -2; } + if (rv != 0) + break; } - - if (error2) - goto next_error; - - if ((cmptr = CMSG_NXTHDR(&msg, cmptr)) != NULL) { - logmsgx("#%u control data has extra header, this is wrong", - i); - goto next_error; + client_exit(rv); + break; + case 1: + fd = socket_create(); + if (fd < 0) + rv = -2; + else { + rv = t_sockcred_server(1, fd); + if (rv == 0) + rv = t_sockcred_server(3, fd); + rv_client = client_wait(); + if (rv == 0 || (rv == -2 && rv_client != 0)) + rv = rv_client; + if (socket_close(fd) < 0) + rv = -2; } - - continue; 
-next_error: - error = -1; + break; + default: + rv = -2; } -done_close: - if (sock_type == SOCK_STREAM) - if (close(fd2) < 0) { - logmsg("close"); - return (-2); - } - return (error); + return (rv); +} -failed: - if (sock_type == SOCK_STREAM) - if (close(fd2) < 0) - logmsg("close"); - return (-2); +static int +t_sockcred_2_client(int fd) +{ + return (t_sockcred_client(2, fd)); } static int -t_sockcred(int type) +t_sockcred_2_server(int fd) { - int error, fd, optval; + return (t_sockcred_server(2, fd)); +} - assert(type == 0 || type == 1); +static int +t_sockcred_2(void) +{ + return (t_generic(t_sockcred_2_client, t_sockcred_2_server)); +} - if ((fd = create_server_socket()) < 0) - return (-2); +static int +t_cmsgcred_sockcred_server(int fd1) +{ + struct msghdr msghdr; + struct iovec iov[1]; + struct cmsghdr *cmsghdr; + void *cmsg_data, *cmsg1_data, *cmsg2_data; + size_t cmsg_size, cmsg1_size, cmsg2_size; + u_int i; + int fd2, rv, val; - if (sock_type == SOCK_STREAM) - if (listen(fd, LISTENQ) < 0) { - logmsg("listen"); - goto failed; - } + fd2 = -1; + rv = -2; - if (type == 0) { - optval = 1; - if (setsockopt(fd, 0, LOCAL_CREDS, &optval, sizeof optval) < 0) { - logmsg("setsockopt(LOCAL_CREDS) for %s socket", - sock_type == SOCK_STREAM ? 
"stream listening" : "datagram"); - if (errno == ENOPROTOOPT) { - error = -1; - goto done_close; - } - goto failed; - } + cmsg1_size = CMSG_SPACE(SOCKCREDSIZE(proc_cred.gid_num)); + cmsg2_size = CMSG_SPACE(sizeof(struct cmsgcred)); + cmsg1_data = malloc(cmsg1_size); + cmsg2_data = malloc(cmsg2_size); + if (cmsg1_data == NULL || cmsg2_data == NULL) { + logmsg("malloc"); + goto done; } - if ((client_pid = fork()) == (pid_t)-1) { - logmsg("fork"); - goto failed; + dbgmsg("setting LOCAL_CREDS"); + val = 1; + if (setsockopt(fd1, 0, LOCAL_CREDS, &val, sizeof(val)) < 0) { + logmsg("setsockopt(LOCAL_CREDS)"); + goto done; } - if (client_pid == 0) { - myname = "CLIENT"; - if (close_socket((const char *)NULL, fd) < 0) - _exit(1); - t_sockcred_client(type); - } + if (sync_send() < 0) + goto done; - if ((error = t_sockcred_server(type, fd, 2)) == -2) { - (void)wait_client(); - goto failed; - } + if (sock_type == SOCK_STREAM) { + fd2 = socket_accept(fd1); + if (fd2 < 0) + goto done; + } else + fd2 = fd1; - if (wait_client() < 0) - goto failed; + cmsg_data = cmsg1_data; + cmsg_size = cmsg1_size; + rv = -1; + for (i = 1; i <= ipc_msg.msg_num; ++i) { + dbgmsg("message #%u", i); + + msghdr_init_server(&msghdr, iov, cmsg_data, cmsg_size); + if (message_recv(fd2, &msghdr) < 0) { + rv = -2; + break; + } -done_close: - if (close_socket(serv_sock_path, fd) < 0) { - logmsgx("close_socket failed"); - return (-2); - } - return (error); + if (check_msghdr(&msghdr, sizeof(*cmsghdr)) < 0) + break; -failed: - if (close_socket(serv_sock_path, fd) < 0) - logmsgx("close_socket failed"); - return (-2); -} + cmsghdr = CMSG_FIRSTHDR(&msghdr); + if (i == 1 || sock_type == SOCK_DGRAM) { + if (check_scm_creds_sockcred(cmsghdr) < 0) + break; + } else { + if (check_scm_creds_cmsgcred(cmsghdr) < 0) + break; + } -static int -t_sockcred_stream1(void) -{ - return (t_sockcred(0)); + cmsg_data = cmsg2_data; + cmsg_size = cmsg2_size; + } + if (i > ipc_msg.msg_num) + rv = 0; +done: + free(cmsg1_data); + 
free(cmsg2_data); + if (sock_type == SOCK_STREAM && fd2 >= 0) + if (socket_close(fd2) < 0) + rv = -2; + return (rv); } static int -t_sockcred_stream2(void) +t_cmsgcred_sockcred(void) { - return (t_sockcred(1)); + return (t_generic(t_cmsgcred_client, t_cmsgcred_sockcred_server)); } static int -t_sockcred_dgram(void) +t_timeval_client(int fd) { - return (t_sockcred(0)); + struct msghdr msghdr; + struct iovec iov[1]; + void *cmsg_data; + size_t cmsg_size; + int rv; + + if (sync_recv() < 0) + return (-2); + + rv = -2; + + cmsg_size = CMSG_SPACE(sizeof(struct timeval)); + cmsg_data = malloc(cmsg_size); + if (cmsg_data == NULL) { + logmsg("malloc"); + goto done; + } + msghdr_init_client(&msghdr, iov, cmsg_data, cmsg_size, + SCM_TIMESTAMP, sizeof(struct timeval)); + + if (socket_connect(fd) < 0) + goto done; + + if (message_sendn(fd, &msghdr) < 0) + goto done; + + rv = 0; +done: + free(cmsg_data); + return (rv); } static int -t_cmsgcred_sockcred(void) +t_timeval_server(int fd1) { - int error, fd, optval; + struct msghdr msghdr; + struct iovec iov[1]; + struct cmsghdr *cmsghdr; + void *cmsg_data; + size_t cmsg_size; + u_int i; + int fd2, rv; - if ((fd = create_server_socket()) < 0) + if (sync_send() < 0) return (-2); - if (sock_type == SOCK_STREAM) - if (listen(fd, LISTENQ) < 0) { - logmsg("listen"); - goto failed; - } + fd2 = -1; + rv = -2; - optval = 1; - if (setsockopt(fd, 0, LOCAL_CREDS, &optval, sizeof optval) < 0) { - logmsg("setsockopt(LOCAL_CREDS) for %s socket", - sock_type == SOCK_STREAM ? 
"stream listening" : "datagram"); - if (errno == ENOPROTOOPT) { - error = -1; - goto done_close; - } - goto failed; - } - - if ((client_pid = fork()) == (pid_t)-1) { - logmsg("fork"); - goto failed; + cmsg_size = CMSG_SPACE(sizeof(struct timeval)); + cmsg_data = malloc(cmsg_size); + if (cmsg_data == NULL) { + logmsg("malloc"); + goto done; } - if (client_pid == 0) { - myname = "CLIENT"; - if (close_socket((const char *)NULL, fd) < 0) - _exit(1); - t_cmsgcred_client(1); - } + if (sock_type == SOCK_STREAM) { + fd2 = socket_accept(fd1); + if (fd2 < 0) + goto done; + } else + fd2 = fd1; - if ((error = t_sockcred_server(0, fd, 1)) == -2) { - (void)wait_client(); - goto failed; - } + rv = -1; + for (i = 1; i <= ipc_msg.msg_num; ++i) { + dbgmsg("message #%u", i); + + msghdr_init_server(&msghdr, iov, cmsg_data, cmsg_size); + if (message_recv(fd2, &msghdr) < 0) { + rv = -2; + break; + } - if (wait_client() < 0) - goto failed; + if (check_msghdr(&msghdr, sizeof(*cmsghdr)) < 0) + break; -done_close: - if (close_socket(serv_sock_path, fd) < 0) { - logmsgx("close_socket failed"); - return (-2); + cmsghdr = CMSG_FIRSTHDR(&msghdr); + if (check_scm_timestamp(cmsghdr) < 0) + break; } - return (error); + if (i > ipc_msg.msg_num) + rv = 0; +done: + free(cmsg_data); + if (sock_type == SOCK_STREAM && fd2 >= 0) + if (socket_close(fd2) < 0) + rv = -2; + return (rv); +} -failed: - if (close_socket(serv_sock_path, fd) < 0) - logmsgx("close_socket failed"); - return (-2); +static int +t_timeval(void) +{ + return (t_generic(t_timeval_client, t_timeval_server)); } -/* - * Send one message with data and control message with SCM_TIMESTAMP - * type to server and exit. 
- */ -static void -t_timestamp_client(void) +static int +t_bintime_client(int fd) { - union { - struct cmsghdr cm; - char control[CMSG_SPACE(sizeof(struct timeval))]; - } control_un; - struct msghdr msg; + struct msghdr msghdr; struct iovec iov[1]; - struct cmsghdr *cmptr; - int fd; - - if ((fd = create_unbound_socket()) < 0) - goto failed; - - if (connect_server(fd) < 0) - goto failed_close; - - iov[0].iov_base = ipc_message; - iov[0].iov_len = IPC_MESSAGE_SIZE; - - msg.msg_name = NULL; - msg.msg_namelen = 0; - msg.msg_iov = iov; - msg.msg_iovlen = 1; - msg.msg_control = control_un.control; - msg.msg_controllen = no_control_data ? - sizeof(struct cmsghdr) :sizeof control_un.control; - msg.msg_flags = 0; - - cmptr = CMSG_FIRSTHDR(&msg); - cmptr->cmsg_len = CMSG_LEN(no_control_data ? - 0 : sizeof(struct timeval)); - cmptr->cmsg_level = SOL_SOCKET; - cmptr->cmsg_type = SCM_TIMESTAMP; + void *cmsg_data; + size_t cmsg_size; + int rv; - dbgmsg(("msg_controllen = %u, cmsg_len = %u", - (u_int)msg.msg_controllen, (u_int)cmptr->cmsg_len)); + if (sync_recv() < 0) + return (-2); - if (sendmsg_timeout(fd, &msg, IPC_MESSAGE_SIZE) < 0) - goto failed_close; + rv = -2; - if (close_socket((const char *)NULL, fd) < 0) - goto failed; + cmsg_size = CMSG_SPACE(sizeof(struct bintime)); + cmsg_data = malloc(cmsg_size); + if (cmsg_data == NULL) { + logmsg("malloc"); + goto done; + } + msghdr_init_client(&msghdr, iov, cmsg_data, cmsg_size, + SCM_BINTIME, sizeof(struct bintime)); - _exit(0); + if (socket_connect(fd) < 0) + goto done; -failed_close: - (void)close_socket((const char *)NULL, fd); + if (message_sendn(fd, &msghdr) < 0) + goto done; -failed: - _exit(1); + rv = 0; +done: + free(cmsg_data); + return (rv); } -/* - * Receive one message with data and control message with SCM_TIMESTAMP - * type followed by struct timeval{} from client. 
- */ static int -t_timestamp_server(int fd1) +t_bintime_server(int fd1) { - union { - struct cmsghdr cm; - char control[CMSG_SPACE(sizeof(struct timeval)) + EXTRA_CMSG_SPACE]; - } control_un; - char buf[IPC_MESSAGE_SIZE]; - int error, fd2; - struct msghdr msg; + struct msghdr msghdr; struct iovec iov[1]; - struct cmsghdr *cmptr; - const struct timeval *timeval; + struct cmsghdr *cmsghdr; + void *cmsg_data; + size_t cmsg_size; + u_int i; + int fd2, rv; + + if (sync_send() < 0) + return (-2); + + fd2 = -1; + rv = -2; + + cmsg_size = CMSG_SPACE(sizeof(struct bintime)); + cmsg_data = malloc(cmsg_size); + if (cmsg_data == NULL) { + logmsg("malloc"); + goto done; + } if (sock_type == SOCK_STREAM) { - if ((fd2 = accept_timeout(fd1)) < 0) - return (-2); + fd2 = socket_accept(fd1); + if (fd2 < 0) + goto done; } else fd2 = fd1; - iov[0].iov_base = buf; - iov[0].iov_len = sizeof buf; - - msg.msg_name = NULL; - msg.msg_namelen = 0; - msg.msg_iov = iov; - msg.msg_iovlen = 1; - msg.msg_control = control_un.control; - msg.msg_controllen = sizeof control_un.control; - msg.msg_flags = 0; - - if (recvmsg_timeout(fd2, &msg, sizeof buf) < 0) - goto failed; + rv = -1; + for (i = 1; i <= ipc_msg.msg_num; ++i) { + dbgmsg("message #%u", i); + + msghdr_init_server(&msghdr, iov, cmsg_data, cmsg_size); + if (message_recv(fd2, &msghdr) < 0) { + rv = -2; + break; + } - error = -1; + if (check_msghdr(&msghdr, sizeof(*cmsghdr)) < 0) + break; - if (msg.msg_flags & MSG_CTRUNC) { - logmsgx("control data was truncated, MSG_CTRUNC flag is on"); - goto done; + cmsghdr = CMSG_FIRSTHDR(&msghdr); + if (check_scm_bintime(cmsghdr) < 0) + break; } + if (i > ipc_msg.msg_num) + rv = 0; +done: + free(cmsg_data); + if (sock_type == SOCK_STREAM && fd2 >= 0) + if (socket_close(fd2) < 0) + rv = -2; + return (rv); +} - if (msg.msg_controllen < sizeof(struct cmsghdr)) { - logmsgx("msg_controllen %u < %lu (sizeof(struct cmsghdr))", - (u_int)msg.msg_controllen, (u_long)sizeof(struct cmsghdr)); - goto done; - } +static 
int +t_bintime(void) +{ + return (t_generic(t_bintime_client, t_bintime_server)); +} - if ((cmptr = CMSG_FIRSTHDR(&msg)) == NULL) { - logmsgx("CMSG_FIRSTHDR is NULL"); - goto done; - } +static int +t_cmsg_len_client(int fd) +{ + struct msghdr msghdr; + struct iovec iov[1]; + struct cmsghdr *cmsghdr; + void *cmsg_data; + size_t size, cmsg_size; + socklen_t socklen; + int rv; - dbgmsg(("msg_controllen = %u, cmsg_len = %u", - (u_int)msg.msg_controllen, (u_int)cmptr->cmsg_len)); + if (sync_recv() < 0) + return (-2); + + rv = -2; - if (cmptr->cmsg_level != SOL_SOCKET) { - logmsgx("cmsg_level %d != SOL_SOCKET", cmptr->cmsg_level); + cmsg_size = CMSG_SPACE(sizeof(struct cmsgcred)); + cmsg_data = malloc(cmsg_size); + if (cmsg_data == NULL) { + logmsg("malloc"); goto done; } + msghdr_init_client(&msghdr, iov, cmsg_data, cmsg_size, + SCM_CREDS, sizeof(struct cmsgcred)); + cmsghdr = CMSG_FIRSTHDR(&msghdr); - if (cmptr->cmsg_type != SCM_TIMESTAMP) { - logmsgx("cmsg_type %d != SCM_TIMESTAMP", cmptr->cmsg_type); + if (socket_connect(fd) < 0) goto done; + + size = msghdr.msg_iov != NULL ? 
msghdr.msg_iov->iov_len : 0; + rv = -1; + for (socklen = 0; socklen < CMSG_LEN(0); ++socklen) { + cmsghdr->cmsg_len = socklen; + dbgmsg("send: data size %zu", size); + dbgmsg("send: msghdr.msg_controllen %u", + (u_int)msghdr.msg_controllen); + dbgmsg("send: cmsghdr.cmsg_len %u", + (u_int)cmsghdr->cmsg_len); + if (sendmsg(fd, &msghdr, 0) < 0) + continue; + logmsgx("sent message with cmsghdr.cmsg_len %u < %u", + (u_int)cmsghdr->cmsg_len, (u_int)CMSG_LEN(0)); + break; } + if (socklen == CMSG_LEN(0)) + rv = 0; - if (cmptr->cmsg_len != CMSG_LEN(sizeof(struct timeval))) { - logmsgx("cmsg_len %u != %lu (CMSG_LEN(sizeof(struct timeval))", - (u_int)cmptr->cmsg_len, (u_long)CMSG_LEN(sizeof(struct timeval))); + if (sync_send() < 0) { + rv = -2; goto done; } +done: + free(cmsg_data); + return (rv); +} - timeval = (const struct timeval *)CMSG_DATA(cmptr); +static int +t_cmsg_len_server(int fd1) +{ + int fd2, rv; - dbgmsg(("timeval tv_sec %jd, tv_usec %jd", - (intmax_t)timeval->tv_sec, (intmax_t)timeval->tv_usec)); + if (sync_send() < 0) + return (-2); - if ((cmptr = CMSG_NXTHDR(&msg, cmptr)) != NULL) { - logmsgx("control data has extra header"); - goto done; - } + rv = -2; - error = 0; + if (sock_type == SOCK_STREAM) { + fd2 = socket_accept(fd1); + if (fd2 < 0) + goto done; + } else + fd2 = fd1; + if (sync_recv() < 0) + goto done; + + rv = 0; done: - if (sock_type == SOCK_STREAM) - if (close(fd2) < 0) { - logmsg("close"); - return (-2); - } - return (error); + if (sock_type == SOCK_STREAM && fd2 >= 0) + if (socket_close(fd2) < 0) + rv = -2; + return (rv); +} -failed: - if (sock_type == SOCK_STREAM) - if (close(fd2) < 0) - logmsg("close"); - return (-2); +static int +t_cmsg_len(void) +{ + return (t_generic(t_cmsg_len_client, t_cmsg_len_server)); } static int -t_timestamp(void) +t_peercred_client(int fd) { - int error, fd; + struct xucred xucred; + socklen_t len; - if ((fd = create_server_socket()) < 0) - return (-2); + if (sync_recv() < 0) + return (-1); - if (sock_type == 
SOCK_STREAM) - if (listen(fd, LISTENQ) < 0) { - logmsg("listen"); - goto failed; - } + if (socket_connect(fd) < 0) + return (-1); - if ((client_pid = fork()) == (pid_t)-1) { - logmsg("fork"); - goto failed; + len = sizeof(xucred); + if (getsockopt(fd, 0, LOCAL_PEERCRED, &xucred, &len) < 0) { + logmsg("getsockopt(LOCAL_PEERCRED)"); + return (-1); } - if (client_pid == 0) { - myname = "CLIENT"; - if (close_socket((const char *)NULL, fd) < 0) - _exit(1); - t_timestamp_client(); - } + if (check_xucred(&xucred, len) < 0) + return (-1); - if ((error = t_timestamp_server(fd)) == -2) { - (void)wait_client(); - goto failed; - } + return (0); +} - if (wait_client() < 0) - goto failed; +static int +t_peercred_server(int fd1) +{ + struct xucred xucred; + socklen_t len; + int fd2, rv; - if (close_socket(serv_sock_path, fd) < 0) { - logmsgx("close_socket failed"); + if (sync_send() < 0) return (-2); + + fd2 = socket_accept(fd1); + if (fd2 < 0) + return (-2); + + len = sizeof(xucred); + if (getsockopt(fd2, 0, LOCAL_PEERCRED, &xucred, &len) < 0) { + logmsg("getsockopt(LOCAL_PEERCRED)"); + rv = -2; + goto done; } - return (error); -failed: - if (close_socket(serv_sock_path, fd) < 0) - logmsgx("close_socket failed"); - return (-2); + if (check_xucred(&xucred, len) < 0) { + rv = -1; + goto done; + } + + rv = 0; +done: + if (socket_close(fd2) < 0) + rv = -2; + return (rv); +} + +static int +t_peercred(void) +{ + return (t_generic(t_peercred_client, t_peercred_server)); } diff -ruNp unix_cmsg.orig/unix_cmsg.t unix_cmsg/unix_cmsg.t --- unix_cmsg.orig/unix_cmsg.t 2012-11-19 14:38:48.000000000 +0200 +++ unix_cmsg/unix_cmsg.t 2013-02-08 12:08:52.000000000 +0200 @@ -11,47 +11,78 @@ n=0 run() { - result=`${cmd} -t $2 $3 $4 2>&1` - if [ $? -eq 0 ]; then - echo -n "ok $1" - else - echo -n "not ok $1" + result=`${cmd} -t $2 $3 ${5%% *} 2>&1` + if [ $? 
-ne 0 ]; then + echo -n "not " fi - echo " -" $5 + echo "ok $1 - $4 ${5#* }" echo ${result} | grep -E "SERVER|CLIENT" | while read line; do echo "# ${line}" done } -echo "1..15" +echo "1..47" -for desc in \ - "Sending, receiving cmsgcred" \ - "Receiving sockcred (listening socket has LOCAL_CREDS) # TODO" \ - "Receiving sockcred (accepted socket has LOCAL_CREDS) # TODO" \ - "Sending cmsgcred, receiving sockcred # TODO" \ - "Sending, receiving timestamp" +for t1 in \ + "1 Sending, receiving cmsgcred" \ + "4 Sending cmsgcred, receiving sockcred" \ + "5 Sending, receiving timeval" \ + "6 Sending, receiving bintime" \ + "7 Check cmsghdr.cmsg_len" do - n=`expr ${n} + 1` - run ${n} stream "" ${n} "STREAM ${desc}" + for t2 in \ + "0 " \ + "1 (no data)" \ + "2 (no array)" \ + "3 (no data, array)" + do + n=$((n + 1)) + run ${n} stream "-z ${t2%% *}" STREAM "${t1} ${t2#* }" + done +done + +for t1 in \ + "2 Receiving sockcred (listening socket)" \ + "3 Receiving sockcred (accepted socket)" +do + for t2 in \ + "0 " \ + "1 (no data)" + do + n=$((n + 1)) + run ${n} stream "-z ${t2%% *}" STREAM "${t1} ${t2#* }" + done done -i=0 -for desc in \ - "Sending, receiving cmsgcred" \ - "Receiving sockcred # TODO" \ - "Sending cmsgcred, receiving sockcred # TODO" \ - "Sending, receiving timestamp" +n=$((n + 1)) +run ${n} stream "-z 0" STREAM "8 Check LOCAL_PEERCRED socket option" + +for t1 in \ + "1 Sending, receiving cmsgcred" \ + "3 Sending cmsgcred, receiving sockcred" \ + "4 Sending, receiving timeval" \ + "5 Sending, receiving bintime" \ + "6 Check cmsghdr.cmsg_len" do - i=`expr ${i} + 1` - n=`expr ${n} + 1` - run ${n} dgram "" ${i} "DGRAM ${desc}" + for t2 in \ + "0 " \ + "1 (no data)" \ + "2 (no array)" \ + "3 (no data, array)" + do + n=$((n + 1)) + run ${n} dgram "-z ${t2%% *}" DGRAM "${t1} ${t2#* }" + done done -run 10 stream -z 1 "STREAM Sending, receiving cmsgcred (no control data)" -run 11 stream -z 4 "STREAM Sending cmsgcred, receiving sockcred (no control data) # TODO" -run 
12 stream -z 5 "STREAM Sending, receiving timestamp (no control data)" - -run 13 dgram -z 1 "DGRAM Sending, receiving cmsgcred (no control data)" -run 14 dgram -z 3 "DGRAM Sending cmsgcred, receiving sockcred (no control data) # TODO" -run 15 dgram -z 4 "DGRAM Sending, receiving timestamp (no control data)" +for t1 in \ + "2 Receiving sockcred" +do + for t2 in \ + "0 " \ + "1 (no data)" + do + n=$((n + 1)) + run ${n} dgram "-z ${t2%% *}" DGRAM "${t1} ${t2#* }" + done +done From owner-freebsd-net@FreeBSD.ORG Sat Feb 9 12:58:59 2013 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 63F01614 for ; Sat, 9 Feb 2013 12:58:59 +0000 (UTC) (envelope-from pi@opsec.eu) Received: from home.opsec.eu (home.opsec.eu [IPv6:2001:14f8:200::1]) by mx1.freebsd.org (Postfix) with ESMTP id 25F9894C for ; Sat, 9 Feb 2013 12:58:59 +0000 (UTC) Received: from pi by home.opsec.eu with local (Exim 4.80.1 (FreeBSD)) (envelope-from ) id 1U4A1B-00039u-2b for freebsd-net@freebsd.org; Sat, 09 Feb 2013 13:58:57 +0100 Date: Sat, 9 Feb 2013 13:58:57 +0100 From: Kurt Jaeger To: FreeBSD Net Subject: Re: Intel 82574 issue reported on Slashdot Message-ID: <20130209125856.GX8239@home.opsec.eu> References: <51163E5B.7070602@zedat.fu-berlin.de> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <51163E5B.7070602@zedat.fu-berlin.de> X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 09 Feb 2013 12:58:59 -0000 Hi! > We don't even have the tool tcpreplay in the ports mentioned in that BLOG. net-mgmt/tcpreplay is not the same ? -- pi@opsec.eu +49 171 3101372 7 years to go ! 
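The unix_cmsg patch above replaces the hand-rolled control-message code with helpers such as msghdr_init_client() and check_scm_timestamp(), whose bodies are not part of this excerpt. The sketch below is self-contained and is not the patch's code. It shows the build-and-validate steps that the removed t_timestamp_client()/t_timestamp_server() performed: sizing the buffer with CMSG_SPACE(), filling the header with CMSG_LEN(), then checking MSG_CTRUNC, level, type, length and the absence of a second header. The names ts_control, ts_cmsg_build(), ts_cmsg_check() and ts_cmsg_demo() are hypothetical. The buffer is filled in memory rather than passed through sendmsg()/recvmsg(), so the timeval is supplied by the caller instead of being read from a received message.

```c
#define _DEFAULT_SOURCE		/* glibc: expose SCM_TIMESTAMP; no effect on FreeBSD */
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <assert.h>
#include <string.h>

/* Control buffer sized (and aligned via the union) for one struct timeval. */
union ts_control {
	struct cmsghdr	hdr;
	char		buf[CMSG_SPACE(sizeof(struct timeval))];
};

/* Hypothetical helper: lay out one SOL_SOCKET/SCM_TIMESTAMP message holding *tv. */
static void
ts_cmsg_build(struct msghdr *msg, union ts_control *ctl,
    const struct timeval *tv)
{
	struct cmsghdr *cmsg;

	memset(ctl, 0, sizeof(*ctl));
	msg->msg_control = ctl->buf;
	msg->msg_controllen = sizeof(ctl->buf);

	cmsg = CMSG_FIRSTHDR(msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_TIMESTAMP;
	cmsg->cmsg_len = CMSG_LEN(sizeof(*tv));
	memcpy(CMSG_DATA(cmsg), tv, sizeof(*tv));
}

/*
 * Hypothetical validator modelled on the checks in the removed
 * t_timestamp_server(): no truncation, exactly one header, and the
 * expected level, type and length.  Returns 0 and copies the timestamp
 * to *out on success, -1 otherwise.
 */
static int
ts_cmsg_check(struct msghdr *msg, struct timeval *out)
{
	struct cmsghdr *cmsg;

	if (msg->msg_flags & MSG_CTRUNC)
		return (-1);
	if (msg->msg_controllen < sizeof(struct cmsghdr))
		return (-1);
	if ((cmsg = CMSG_FIRSTHDR(msg)) == NULL)
		return (-1);
	/* A header claiming fewer than CMSG_LEN(0) bytes is malformed. */
	if (cmsg->cmsg_len < CMSG_LEN(0))
		return (-1);
	if (cmsg->cmsg_level != SOL_SOCKET || cmsg->cmsg_type != SCM_TIMESTAMP)
		return (-1);
	if (cmsg->cmsg_len != CMSG_LEN(sizeof(struct timeval)))
		return (-1);
	memcpy(out, CMSG_DATA(cmsg), sizeof(*out));
	/* A second header would be unexpected extra control data. */
	if (CMSG_NXTHDR(msg, cmsg) != NULL)
		return (-1);
	return (0);
}

/* Usage: build a message, validate it, and read the timestamp back. */
int
ts_cmsg_demo(void)
{
	struct msghdr msg;
	union ts_control ctl;
	struct timeval in = { .tv_sec = 1360422194, .tv_usec = 250000 };
	struct timeval out;

	memset(&msg, 0, sizeof(msg));
	ts_cmsg_build(&msg, &ctl, &in);
	if (ts_cmsg_check(&msg, &out) != 0)
		return (-1);
	assert(out.tv_sec == in.tv_sec && out.tv_usec == in.tv_usec);
	return (0);
}
```

The union ensures the buffer is aligned for struct cmsghdr, which is the same trick the removed test code used with control_un.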
From owner-freebsd-net@FreeBSD.ORG Sat Feb 9 14:41:38 2013 Return-Path: Delivered-To: net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id F26AF2D2; Sat, 9 Feb 2013 14:41:37 +0000 (UTC) (envelope-from bright@mu.org) Received: from elvis.mu.org (elvis.mu.org [192.203.228.196]) by mx1.freebsd.org (Postfix) with ESMTP id D16BFD6B; Sat, 9 Feb 2013 14:41:37 +0000 (UTC) Received: from Alfreds-MacBook-Pro-9.local (c-67-180-208-218.hsd1.ca.comcast.net [67.180.208.218]) by elvis.mu.org (Postfix) with ESMTPSA id 228CF1A3C43; Sat, 9 Feb 2013 06:41:29 -0800 (PST) Message-ID: <51166019.9040104@mu.org> Date: Sat, 09 Feb 2013 06:41:29 -0800 From: Alfred Perlstein User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:17.0) Gecko/20130107 Thunderbird/17.0.2 MIME-Version: 1.0 To: George Neville-Neil Subject: Re: [PATCH] Add a new TCP_IGNOREIDLE socket option References: <201301221511.02496.jhb@freebsd.org> <50FF06AD.402@networx.ch> <061B4EA5-6A93-48A0-A269-C2C3A3C7E77C@lakerest.net> <201302060746.43736.jhb@freebsd.org> <511292C9.4040307@mu.org> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: Randall Stewart , John Baldwin , net@freebsd.org X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 09 Feb 2013 14:41:38 -0000 On 2/7/13 12:04 PM, George Neville-Neil wrote: > On Feb 6, 2013, at 12:28 , Alfred Perlstein wrote: > >> On 2/6/13 4:46 AM, John Baldwin wrote: >>> On Wednesday, February 06, 2013 6:27:04 am Randall Stewart wrote: >>>> John: >>>> >>>> A burst at line rate will *often* cause drops. This is because >>>> router queues are at a finite size. 
Also such a burst (especially >>>> on a long delay bandwidth network) causes your RTT to increase even >>>> if there is no drop, which is going to hurt you as well. >>>> >>>> A SHOULD in an RFC says you really really really really need to do it >>>> unless there is something that makes you willing to override it. It is >>>> slight wiggle room. >>>> >>>> In this I agree with Andre, we should not be *not* doing it. Otherwise >>>> folks will be turning this on and it is plain wrong. It may be fine >>>> for your network but I would not want to see it in FreeBSD. >>>> >>>> In my testing here at home I have put back into our stack max-burst. This >>>> uses Mark Allman's version (not Kacheong Poon's) where you clamp the cwnd at >>>> no more than 4 packets larger than your flight. All of my testing >>>> high-bw-delay or LAN has shown this to improve TCP performance. This >>>> is because it helps you avoid bursting out so many packets that you overflow >>>> a queue. >>>> >>>> In your long-delay bw link if you do burst out too many (and you never >>>> know how many that is since you cannot predict how full all those >>>> MPLS queues are or how big they are) you will really hurt yourself even worse. >>>> Note that generally in Cisco routers the default queue size is somewhere between >>>> 100-300 packets depending on the router. >>> Due to the way our application works this never happens, but I am fine with >>> just keeping this patch private. If there are other shops that need this they >>> can always dig the patch up from the archives. >>> >> This is yet another time when I'm sad about how things happen in FreeBSD. >> >> A developer comes forward with a non-default option that's very useful for some specific workloads, specifically one that contributes much time and $$$ to the project and the community rejects the patches even though it's been successful in other OSes. >> >> It makes zero sense. >> >> John, can you repost the patch?
Maybe there is a way to refactor this somehow so it's like accept filters where we can plug in a hook for TCP? >> >> I am very disappointed, but not surprised. >> > I take away the complete opposite feeling. This is how we work through these issues. > It's clear from the discussion that this need not be a default in the system, > and is a special case. We had a reasoned discussion of what would be best to do > and at least two experts in TCP weighed in on the effect this change might have. > > Not everything proposed by a developer need go into the tree, in particular since these > discussions are archived we can always revisit this later. > > This is exactly how collaborative development should look, whether or not the patch > is integrated now, next week, next year, or ever. I agree that discussion is great; we have all learned quite a bit from it about TCP and the dangers of adjusting buffering without considerable thought. I would not be involved in FreeBSD had this type of discussion and information not been shared on the lists so readily. However, the end result must be far different from what has occurred so far. If the code was deemed unacceptable for general inclusion, then we must find a way to provide a light framework to accomplish the needs of the community member. Take for instance someone who is starting a company that needs this facility. Which OS will they choose? One that has integrated a useful feature? Or one that has rejected it and left that code in the mailing list archives? As much as expert opinion is valuable, it must include an understanding of the need to handle special cases and the ability to facilitate those special cases for our users and developers.
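Randall's max-burst description quoted above ("clamp the cwnd at no more than 4 packets larger than your flight") reduces to a single comparison. The sketch below is a hedged illustration and not FreeBSD's tcp_output() code. max_burst_clamp(), max_burst_demo() and MAX_BURST_SEGS are hypothetical names. "Flight" is assumed to mean bytes sent but not yet acknowledged, and the limit of 4 segments is the figure from Randall's message.

```c
#include <assert.h>
#include <stdint.h>

/* Randall's figure: cwnd may exceed the data in flight by at most 4 segments. */
#define MAX_BURST_SEGS	4

/*
 * Hypothetical max-burst clamp.  flight is the number of bytes sent but
 * not yet acknowledged (in a FreeBSD tcpcb, roughly snd_max - snd_una);
 * maxseg is the segment size.  Returns the window the sender may use.
 */
static uint32_t
max_burst_clamp(uint32_t cwnd, uint32_t flight, uint32_t maxseg)
{
	/* 64-bit arithmetic so flight + 4 * maxseg cannot wrap around. */
	uint64_t limit = (uint64_t)flight + (uint64_t)MAX_BURST_SEGS * maxseg;

	return (cwnd > limit ? (uint32_t)limit : cwnd);
}

/* Usage: coming back from idle nothing is in flight, so a large cwnd shrinks to 4 segments. */
void
max_burst_demo(void)
{
	assert(max_burst_clamp(100 * 1448, 0, 1448) == 4 * 1448);
	/* A cwnd already below the limit is left alone. */
	assert(max_burst_clamp(3 * 1448, 10 * 1448, 1448) == 3 * 1448);
}
```

The effect is that, however large cwnd has grown, the sender can emit at most four back-to-back segments beyond what is already outstanding. That is the line-rate burst Randall argues overflows router queues.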
-Alfred From owner-freebsd-net@FreeBSD.ORG Sat Feb 9 15:03:15 2013 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id C19C3B2C for ; Sat, 9 Feb 2013 15:03:15 +0000 (UTC) (envelope-from josh@tcbug.org) Received: from out5-smtp.messagingengine.com (out5-smtp.messagingengine.com [66.111.4.29]) by mx1.freebsd.org (Postfix) with ESMTP id 97216E2C for ; Sat, 9 Feb 2013 15:03:15 +0000 (UTC) Received: from compute1.internal (compute1.nyi.mail.srv.osa [10.202.2.41]) by gateway1.nyi.mail.srv.osa (Postfix) with ESMTP id 1AEDB20B78; Sat, 9 Feb 2013 10:03:15 -0500 (EST) Received: from frontend2.nyi.mail.srv.osa ([10.202.2.161]) by compute1.internal (MEProxy); Sat, 09 Feb 2013 10:03:15 -0500 DKIM-Signature: v=1; a=rsa-sha1; c=relaxed/relaxed; d= messagingengine.com; h=references:mime-version:in-reply-to :content-type:content-transfer-encoding:message-id:cc:from :subject:date:to; s=smtpout; bh=7qVrpk+SQmazQbkWsRYeAZ3ed84=; b= jlb+naDs7d1l23K1kqX5AyRhmnzA2Y9IO1a4EUmkV/OtOSBMLQ6MdtZ+aWZbUMHC ZhWaiLV/wKNo9T3K4uNNvpS99ohnR5mXRaj53a4vXH+Whp+piW/TVX73nRCjOVu9 ++PsGsZbneafjEeSV3OwqzE5kklFR5wYfHT5yBk+QM4= X-Sasl-enc: ew1go1yqd7HaADaJOl9g6ty62iiSmf4bOxcqRlCaSQsD 1360422194 Received: from [10.77.222.69] (unknown [166.137.184.140]) by mail.messagingengine.com (Postfix) with ESMTPA id 856304827A0; Sat, 9 Feb 2013 10:03:14 -0500 (EST) References: Mime-Version: 1.0 (1.0) In-Reply-To: Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: quoted-printable Message-Id: X-Mailer: iPhone Mail (10A551) From: Josh Paetzel Subject: Re: OCE driver patches Date: Sat, 9 Feb 2013 07:03:13 -0800 To: "Duvvuru,Venkat Kumar" Cc: "freebsd-net@freebsd.org" X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: 
Sat, 09 Feb 2013 15:03:15 -0000 Venkat, There's been a breakdown in communication. I've been working on oce with Adam and have a bunch of oce hardware. Please cc me on any patches you have. (PRs are fine, but they won't get my attention.) Thanks, Josh Paetzel On Feb 7, 2013, at 3:57 AM, "Duvvuru,Venkat Kumar" wrote: > Hi, > I have submitted this patch http://www.freebsd.org/cgi/query-pr.cgi?pr=171838 some time back. Could you please let me know when this will be pulled in? > I have some more patches to submit. Please let me know if submitting it online at this link http://www.freebsd.org/send-pr.html is the only way to get them in or is there an alternative to the patch submission? > > Thanks, > Venkat > > _______________________________________________ > freebsd-net@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-net > To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org" From owner-freebsd-net@FreeBSD.ORG Sat Feb 9 15:07:18 2013 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id D58EDD11 for ; Sat, 9 Feb 2013 15:07:18 +0000 (UTC) (envelope-from josh@tcbug.org) Received: from out5-smtp.messagingengine.com (out5-smtp.messagingengine.com [66.111.4.29]) by mx1.freebsd.org (Postfix) with ESMTP id AD2F9E6B for ; Sat, 9 Feb 2013 15:07:18 +0000 (UTC) Received: from compute2.internal (compute2.nyi.mail.srv.osa [10.202.2.42]) by gateway1.nyi.mail.srv.osa (Postfix) with ESMTP id 06CCB20C88; Sat, 9 Feb 2013 10:07:18 -0500 (EST) Received: from frontend2.nyi.mail.srv.osa ([10.202.2.161]) by compute2.internal (MEProxy); Sat, 09 Feb 2013 10:07:18 -0500 DKIM-Signature: v=1; a=rsa-sha1; c=relaxed/relaxed; d= messagingengine.com; h=references:mime-version:in-reply-to :content-type:content-transfer-encoding:message-id:cc:from :subject:date:to; s=smtpout; bh=X4sLkKIUw3lnxnxFDQoJ9YyE2OM=; b=
PTfgYYIKJssBvaQS8pbykEK1iLKHmQQi+X8mBugx/9QazTucxk46vP/frafMIasg KSaKnlzjJykMFHmc2LZuSOWA51OxyjvgCvjy5WhvKJIC74abHrJkBm05BHXHyyqf Sbs9FoAoAEFuTUT9oPK5hmelKSlQSmlBh1SMPmvgifQ= X-Sasl-enc: QMHOqOTyrDKyoo8urt+NqAvp6GBU/1Ew8vyTCC0Ey3A6 1360422437 Received: from [10.77.222.69] (unknown [166.137.184.140]) by mail.messagingengine.com (Postfix) with ESMTPA id 7DBDE4827A0; Sat, 9 Feb 2013 10:07:17 -0500 (EST) References: <18410.1360281197@tristatelogic.com> Mime-Version: 1.0 (1.0) In-Reply-To: <18410.1360281197@tristatelogic.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: quoted-printable Message-Id: <6D56DD0F-7A50-4423-8BFD-9E949187E203@tcbug.org> X-Mailer: iPhone Mail (10A551) From: Josh Paetzel Subject: Re: Question: Why ain't I getting gigabit speed? Date: Sat, 9 Feb 2013 07:07:16 -0800 To: "Ronald F. Guilmette" Cc: "freebsd-net@freebsd.org" X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 09 Feb 2013 15:07:18 -0000 There's likely something wrong hardware-wise, either with that NIC, the cable, or the port you are plugging it into. The NIC is (correctly) not autonegotiating 1000TX full duplex for some reason, and when you try to force it, it doesn't work. Thanks, Josh Paetzel On Feb 7, 2013, at 3:53 PM, "Ronald F. Guilmette" wrote: > > > Apologies for following up on myself, but I just now found this: > > https://support.freenas.org/ticket/894 > > This thread would suggest that I ain't alone in experiencing this > problem with the RTL8110S. > > That other guy apparently solved his problem by just simply switching > to a CAT6 cable. I however am already using CAT6 cables, and the problem > for me still exists.
> > I tried adding: > > media 1000baseTX > > to my ifconfig_re0= line in my /etc/rc.conf file (and then rebooting), > however when I did that, a subsequent "ifconfig -a" showed that indeed, > the card had now been correctly configured to speak 1000baseT, however > it also said: > > status: no carrier > > even though the thing most definitely _is_ still plugged in to my > E2000 router, and I could not ping anything else, even on my own LAN. > > So I'm still stuck, and still looking for an answer. How can I get this > card working at gigabit speed? > _______________________________________________ > freebsd-net@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-net > To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org" From owner-freebsd-net@FreeBSD.ORG Sat Feb 9 20:15:10 2013 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id CABDD27C; Sat, 9 Feb 2013 20:15:10 +0000 (UTC) (envelope-from sendtomatt@gmail.com) Received: from mail-pb0-f44.google.com (mail-pb0-f44.google.com [209.85.160.44]) by mx1.freebsd.org (Postfix) with ESMTP id A2F50147; Sat, 9 Feb 2013 20:15:10 +0000 (UTC) Received: by mail-pb0-f44.google.com with SMTP id wz12so201450pbc.3 for ; Sat, 09 Feb 2013 12:15:10 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=x-received:message-id:date:from:user-agent:mime-version:to:cc :subject:references:in-reply-to:content-type :content-transfer-encoding; bh=RKvXNRPcYWKcuBPxy/tOz4M0k5dwUEKxXpNBJe++Awg=; b=dhYee/onyDLIJLYcOPJz3OXkExGwAUR1jD8lGWvOCbWYf8VbKdRWeeAJN6BRhKHO+6 78BCxEzBGpkFg8yD/cKNnuIKhUcrl0pyIA5aBgcODWDpm8FDsiPpLdujD2PTr42oNnKq UjyAMjIcFnWd0J1zCnqnT34SOBq8id841eZGx0PHN38eNTDG2iZSW+VzEJMfUBqtEYfp +reu1kPUch29ldk7iMIIKWPVJymlTM67SAYUbVFSq8ujLUSznw5RXJhs7+1DEDkt6hwm jcJe3KXOlkGC/a1WLNN+GBAntpuQDoG9RyWCdfqr0UAn+6sqdUIfdgKoM+ltm6RLUzp0 sTxQ==
X-Received: by 10.66.81.231 with SMTP id d7mr28464986pay.27.1360434411550; Sat, 09 Feb 2013 10:26:51 -0800 (PST) Received: from flatline.local (70-36-223-239.dsl.dynamic.sonic.net. [70.36.223.239]) by mx.google.com with ESMTPS id xa2sm2155412pbc.23.2013.02.09.10.26.49 (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Sat, 09 Feb 2013 10:26:50 -0800 (PST) Message-ID: <511694B0.4060805@gmail.com> Date: Sat, 09 Feb 2013 10:25:52 -0800 From: matt User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:17.0) Gecko/20130202 Thunderbird/17.0.2 MIME-Version: 1.0 To: Johnny Eriksson Subject: Re: Intel 82574 issue reported on Slashdot References: In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: FreeBSD Net , FreeBSD Current X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 09 Feb 2013 20:15:10 -0000 On 02/09/13 09:15, Johnny Eriksson wrote: >> In all honesty.. The blog post (and your email) are basically >> information free, they don't name names and provide no script >> or downloadable code that will allow end users to check if they >> are affected. > A link with a little bit more information: > > http://blog.krisk.org/2013/02/packets-of-death.html > >> Daniel O'Connor software and network engineer > --Johnny > _______________________________________________ > freebsd-current@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-current > To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org" > Did anyone check to see if the Intel announcement had a 2 at 0x47f? :) I do have a machine with these controllers that had a bridge "hang" in a very odd fashion a while back, but it didn't repeat. It wasn't a SuperMicro board, which is what some posters were saying were affected. 
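The "2 at 0x47f" check is plain offset arithmetic. 0x47f is 1151 decimal, so only frames of at least 1152 bytes can carry a byte there. The hypothetical triage helper below classifies a captured frame by that byte. It assumes the offset is counted from the first byte of the captured frame and reads Matt's "2" as the ASCII character (0x32); neither assumption is confirmed in this thread. The FRAME_OTHER case simply encodes Matt's own, unverified, suggestion below that any other value leaves the NIC unaffected. classify_frame(), classify_frame_demo() and the constants are illustrative names only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SUSPECT_OFFSET	0x47f	/* 1151 decimal: the offset discussed in the thread */
#define SUSPECT_VALUE	0x32	/* assumed: ASCII '2' */

enum frame_class {
	FRAME_TOO_SHORT,	/* frame ends before the offset */
	FRAME_SUSPECT,		/* '2' at the offset */
	FRAME_OTHER		/* any other value there (Matt: "no longer affected") */
};

/* Hypothetical helper: classify a captured frame by the byte at SUSPECT_OFFSET. */
static enum frame_class
classify_frame(const uint8_t *frame, size_t len)
{
	if (len <= SUSPECT_OFFSET)
		return (FRAME_TOO_SHORT);
	if (frame[SUSPECT_OFFSET] == SUSPECT_VALUE)
		return (FRAME_SUSPECT);
	return (FRAME_OTHER);
}

/* Usage: a 1500-byte zero-filled frame, before and after planting '2' at 0x47f. */
void
classify_frame_demo(void)
{
	uint8_t frame[1500] = { 0 };

	assert(classify_frame(frame, 1151) == FRAME_TOO_SHORT);
	assert(classify_frame(frame, sizeof(frame)) == FRAME_OTHER);
	frame[SUSPECT_OFFSET] = '2';
	assert(classify_frame(frame, sizeof(frame)) == FRAME_SUSPECT);
}
```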
I would imagine a large ping packet (as used to test MTU) should inoculate any affected interface if issued at boot; I don't think our padding lines up with the problem. Once an interface sees a packet with anything else at 0x47f, it's no longer affected, so there's a narrow window of vulnerability in affected NICs. Matt From owner-freebsd-net@FreeBSD.ORG Sat Feb 9 22:18:14 2013 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 449B693F for ; Sat, 9 Feb 2013 22:18:14 +0000 (UTC) (envelope-from peter@rulingia.com) Received: from vps.rulingia.com (host-122-100-2-194.octopus.com.au [122.100.2.194]) by mx1.freebsd.org (Postfix) with ESMTP id CE495A82 for ; Sat, 9 Feb 2013 22:18:12 +0000 (UTC) Received: from server.rulingia.com (c220-239-255-116.belrs5.nsw.optusnet.com.au [220.239.255.116]) by vps.rulingia.com (8.14.5/8.14.5) with ESMTP id r19MBFEE009678 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK); Sun, 10 Feb 2013 09:11:17 +1100 (EST) (envelope-from peter@rulingia.com) X-Bogosity: Ham, spamicity=0.000000 Received: from server.rulingia.com (localhost.rulingia.com [127.0.0.1]) by server.rulingia.com (8.14.5/8.14.5) with ESMTP id r19MB9FG020723 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Sun, 10 Feb 2013 09:11:10 +1100 (EST) (envelope-from peter@server.rulingia.com) Received: (from peter@localhost) by server.rulingia.com (8.14.5/8.14.5/Submit) id r19MB8d8020718; Sun, 10 Feb 2013 09:11:08 +1100 (EST) (envelope-from peter) Date: Sun, 10 Feb 2013 09:11:08 +1100 From: Peter Jeremy To: "Ronald F. Guilmette" Subject: Re: Question: Why ain't I getting gigabit speed?
Message-ID: <20130209221107.GA32563@server.rulingia.com> References: <29539.1360356512@tristatelogic.com> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="MGYHOYXEY6WxJCY8" Content-Disposition: inline In-Reply-To: <29539.1360356512@tristatelogic.com> X-PGP-Key: http://www.rulingia.com/keys/peter.pgp User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-net@freebsd.org X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 09 Feb 2013 22:18:14 -0000 --MGYHOYXEY6WxJCY8 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On 2013-Feb-07 15:13:27 -0800, "Ronald F. Guilmette" wrote: >I just acquired a brand new cheapie gigabit PCI ethernet card off eBay. >The main chip on it appears to be an RTL8110S-32. ... >I've tried two different CAT6 cables, two different LAN ports on my E2000, >and I've even tried the card in two different PCI slots on my motherboard, >but the results are always the same. Based on the testing you've done, I'd suspect a broken card. I'll echo the comments that Realtek is at the cheapest end of the market and you'd be better off with a Broadcom or Intel NIC. >P.S. dmesg has this to say about the card: > >re0: port 0xbe00-0xbeff mem 0xdf9ff000-0xdf9ff0ff irq 18 at device 5.0 on pci4 >re0: Chip rev. 0x04000000 >re0: MAC rev. 0x00000000 >re0: Ethernet address: 00:13:3b:02:03:bd >re0: link state changed to UP >re0: link state changed to DOWN >re0: link state changed to UP The critical information you've left out is the PHY details.
This should look something like: miibus0: on re0 rgephy0: PHY 1 on miibus0 rgephy0: none, 10baseT, 10baseT-FDX, 10baseT-FDX-flow, 100baseTX, 100baseTX-FDX, 100baseTX-FDX-flow, 1000baseT, 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, 1000baseT-FDX-flow, 1000baseT-FDX-flow-master, auto, auto-flow On 2013-Feb-08 12:48:32 -0800, "Ronald F. Guilmette" wrote: >I did some more experiments. Fortunately, I had a CAT6 crossover cable >lying around. For future reference, you can join GigE interfaces with either straight-through or crossover cables. >In the case of connecting to the laptop, all seemed to work correctly, >however ifconfig showed that my re0 device in this case believed itself >to be "master". (I suspect that this may make a difference, and that >the current FreeBSD re driver may perhaps behave better when it is >acting as master.) The "master" term seems to only define which end is the clock source. >in the output from "ifconfig re0", *however* a moment or two later, >suddenly the connection was entirely dropped, and now the ifconfig >output said "no carrier". What status was reported on the lights at each end? -- Peter Jeremy --MGYHOYXEY6WxJCY8 Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.19 (FreeBSD) iEYEARECAAYFAlEWyXsACgkQ/opHv/APuIcDWwCfZtwmX62zoMEEjDBYX/ivoRLj KYMAn3jmti5xGTopzlcEh8zihhHoNl9C =QTxK -----END PGP SIGNATURE----- --MGYHOYXEY6WxJCY8--