Date:      Mon, 02 Jul 2018 23:49:29 +0200
From:      "Kristof Provost" <kristof@sigsegv.be>
To:        "Dr Josef Karthauser" <joe@truespeed.com>
Cc:        freebsd-net@freebsd.org, "David Athay" <davida@truespeed.com>, "Bjoern A. Zeeb" <bz@freebsd.org>
Subject:   Re: epair failure in production on 11.1-STABLE (r328930) ? weird!
Message-ID:  <CD02639C-AD41-458D-A2F7-609080EBBB2C@sigsegv.be>
In-Reply-To: <F58994AA-5012-482D-9D80-3DB9EEC16F71@truespeed.com>
References:  <20180620095844.9182416723@smtp-relay2.localdomain> <F58994AA-5012-482D-9D80-3DB9EEC16F71@truespeed.com>

On 2 Jul 2018, at 23:11, Dr Josef Karthauser wrote:
> Break break. We’ve just seen bugzilla report 22710, reporting 
> that epair fails when the queue limit 
> (net.link.epair.netisr_maxqlen) is hit. We’ve just introduced a high 
> bandwidth service on this machine, so that’s probably 
> what’s caused the issue.
>
I think you meant 227100 there.

> But why has hitting the queue limit broken it entirely?
>
It’s a bug in the epair code. Something goes wrong when it handles a 
queue overflow: it never leaves the overflow state and drops all new 
packets instead.
I’m afraid that I’ve not been able to do anything about it yet. 
I believe Bjoern is more familiar with that code and might be able to 
help.
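
As an illustration of the failure mode described above (an overflow state that is entered but never left), here is a minimal toy model. It is not the epair code and makes no claim about how the driver is implemented; the class `ToyQueue`, the `clear_on_drain` switch, and the queue length of 4 are all hypothetical names and values chosen for the sketch.

```python
from collections import deque


class ToyQueue:
    """Bounded queue with a sticky overflow flag (hypothetical, not epair code)."""

    def __init__(self, maxqlen, clear_on_drain):
        self.maxqlen = maxqlen
        self.clear_on_drain = clear_on_drain
        self.items = deque()
        self.overflowed = False
        self.dropped = 0

    def enqueue(self, pkt):
        # While the flag is set, every new packet is dropped,
        # even if the queue has room again.
        if self.overflowed or len(self.items) >= self.maxqlen:
            self.overflowed = True
            self.dropped += 1
            return False
        self.items.append(pkt)
        return True

    def drain(self):
        # The consumer empties the queue. Only the "fixed" variant
        # also clears the overflow flag here.
        n = len(self.items)
        self.items.clear()
        if self.clear_on_drain:
            self.overflowed = False
        return n


def run(clear_on_drain):
    q = ToyQueue(maxqlen=4, clear_on_drain=clear_on_drain)
    for i in range(6):          # burst: 4 accepted, 2 dropped
        q.enqueue(i)
    q.drain()                   # the queue is empty again
    accepted_after = sum(q.enqueue(i) for i in range(3))
    return accepted_after, q.dropped


if __name__ == "__main__":
    print("flag never cleared:", run(clear_on_drain=False))
    print("flag cleared on drain:", run(clear_on_drain=True))
```

With the flag never cleared, nothing is accepted after the burst even though the queue has been drained, which matches the symptom of all traffic stopping after a single overflow.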

Regards,
Kristof
From owner-freebsd-net@freebsd.org  Mon Jul  2 22:16:35 2018
From: "Bjoern A. Zeeb" <bzeeb-lists@lists.zabbadoz.net>
To: "Dr Josef Karthauser" <joe@truespeed.com>
Cc: freebsd-net@freebsd.org, "David Athay" <davida@truespeed.com>
Subject: Re: epair failure in production on 11.1-STABLE (r328930) ? weird!
Date: Mon, 02 Jul 2018 22:16:25 +0000
X-Mailer: MailMate (2.0BETAr6113)
Message-ID: <8D7CF4E7-0D5E-4464-AEB8-DEBD7EB6DB7D@lists.zabbadoz.net>
In-Reply-To: <F58994AA-5012-482D-9D80-3DB9EEC16F71@truespeed.com>
References: <20180620095844.9182416723@smtp-relay2.localdomain>
 <F58994AA-5012-482D-9D80-3DB9EEC16F71@truespeed.com>

On 2 Jul 2018, at 21:11, Dr Josef Karthauser wrote:

> We’re experiencing a strange production failure with epair 
> (which we’re using to talk to vimage jails).
>
> FreeBSD s5 11.1-STABLE FreeBSD 11.1-STABLE #2 r328930: Tue Feb  6 
> 16:05:59 GMT 2018     root@s5:/usr/obj/usr/src/sys/TRUESPEED  amd64
>
> Looks like epair has suddenly stopped forwarding packets between the 
> pair interfaces. Our server has been up for 82 days and it’s been 
> working fine, but suddenly packets have stopped being forwarded 
> between epairs across the entire system. (We’ve got around 30 epairs 
> on the host).  So, we’ve got a sudden ARP resolution failure which 
> is affecting all services. :(.

OK, that’s a very interesting new observation that I have either not 
heard before or missed. You are saying that of about 30 epair pairs 
NONE is working anymore? All 30 are “dead”?

/bz


