From owner-freebsd-net@FreeBSD.ORG Sat Dec 14 05:04:58 2013
Date: Sat, 14 Dec 2013 00:04:57 -0500
Subject: buf_ring in HEAD is racy
From: Ryan Stone
To: freebsd-net

I am seeing spurious output packet drops that appear to be due to
insufficient memory barriers in buf_ring.  I believe that this is the
scenario that I am seeing:

1) The buf_ring is empty: br_prod_head = br_cons_head = 0.

2) Thread 1 attempts to enqueue an mbuf on the buf_ring.  It fetches
   br_prod_head (0) into a local variable called prod_head.

3) Thread 2 enqueues an mbuf on the buf_ring.  The sequence of events
   is essentially:

   Thread 2 claims an index in the ring and atomically sets
     br_prod_head (say, to 1)
   Thread 2 sets br_ring[1] = mbuf
   Thread 2 does a full memory barrier
   Thread 2 updates br_prod_tail to 1

4) Thread 2 dequeues the packet from the buf_ring using the
   single-consumer interface.  The sequence of events is essentially:

   Thread 2 checks whether the queue is empty (br_cons_head ==
     br_prod_tail); this is false
   Thread 2 sets br_cons_head to 1
   Thread 2 grabs the mbuf from br_ring[1]
   Thread 2 sets br_cons_tail to 1

5) Thread 1, still attempting to enqueue an mbuf on the ring, now
   fetches br_cons_tail (1) into a local variable called cons_tail.
   It sees cons_tail == 1 but prod_head == 0, concludes that the ring
   is full, and drops the packet (incrementing br_drops non-atomically,
   I might add).

I can reproduce several drops per minute by configuring the ixgbe
driver to use only one queue and then sending traffic from 8
concurrent iperf processes.  (You will need this hacky patch to even
see the drops with netstat, though:
http://people.freebsd.org/~rstone/patches/ixgbe_br_drops.diff)

I am investigating fixing buf_ring by using acquire/release semantics
rather than load/store barriers; the sketches below show the enqueue
path as it stands and the direction I have in mind.
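For reference, here is a condensed paraphrase of the
buf_ring_enqueue() fast path in sys/buf_ring.h (debug code and the
critical section elided; this is from memory for discussion, not the
verbatim source), annotated with where the scenario above goes wrong:

    static __inline int
    buf_ring_enqueue(struct buf_ring *br, void *buf)
    {
            uint32_t prod_head, prod_next, cons_tail;

            do {
                    /* Step 2: Thread 1 reads prod_head == 0 here. */
                    prod_head = br->br_prod_head;
                    /*
                     * Nothing orders these two reads, and nothing
                     * stops all of steps 3 and 4 from completing
                     * before the next read, so Thread 1 can observe
                     * cons_tail == 1 against a stale prod_head.
                     */
                    cons_tail = br->br_cons_tail;

                    prod_next = (prod_head + 1) & br->br_prod_mask;

                    if (prod_next == cons_tail) {
                            /*
                             * Step 5: 1 == 1, so we declare the ring
                             * full and drop, without ever validating
                             * prod_head against br_prod_head.
                             */
                            br->br_drops++;   /* not atomic */
                            return (ENOBUFS);
                    }
            } while (!atomic_cmpset_int(&br->br_prod_head, prod_head,
                prod_next));

            /* Step 3: fill the slot, barrier, then publish the tail. */
            br->br_ring[prod_head] = buf;
            mb();
            while (br->br_prod_tail != prod_head)
                    cpu_spinwait();
            br->br_prod_tail = prod_next;
            return (0);
    }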
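To give a flavor of the acquire/release direction (an untested sketch
using atomic(9) primitives, not a patch), the hand-off of br_prod_tail
between producer and consumer would become an explicit release/acquire
pair:

    /*
     * Producer side: fill the slot, then publish it with release
     * semantics, so the br_ring[] store is visible before the new
     * tail is.
     */
    br->br_ring[prod_head] = buf;
    while (br->br_prod_tail != prod_head)
            cpu_spinwait();
    atomic_store_rel_32(&br->br_prod_tail, prod_next);

    /*
     * Consumer side: load the tail with acquire semantics, pairing
     * with the release above, before looking at br_ring[].
     */
    prod_tail = atomic_load_acq_32(&br->br_prod_tail);
    if (cons_head == prod_tail)
            return (NULL);                  /* ring is empty */
    buf = br->br_ring[cons_head];

The loads of br_prod_head and br_cons_tail in the enqueue path would
need the same treatment, which is where the barrier count starts to
add up.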
However, I note that this will apparently be the second attempt to fix
buf_ring, and I'm seriously questioning whether it is worth the effort
compared to the simplicity of just using a mutex (sketched below).
I'm not convinced that a correct lockless implementation will even be
a performance win, given the number of memory barriers that will
apparently be necessary.
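For comparison, the locked version fits in a dozen lines and is hard
to get wrong.  A hypothetical sketch (br_mtx is not a field that
struct buf_ring actually has; it is assumed here for illustration):

    static __inline int
    buf_ring_enqueue_locked(struct buf_ring *br, void *buf)
    {
            uint32_t next;

            mtx_lock(&br->br_mtx);          /* hypothetical field */
            next = (br->br_prod_head + 1) & br->br_prod_mask;
            if (next == br->br_cons_tail) {
                    br->br_drops++;         /* now serialized, too */
                    mtx_unlock(&br->br_mtx);
                    return (ENOBUFS);
            }
            br->br_ring[br->br_prod_head] = buf;
            br->br_prod_head = next;
            br->br_prod_tail = next;        /* head == tail under the lock */
            mtx_unlock(&br->br_mtx);
            return (0);
    }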