From owner-freebsd-bugs@FreeBSD.ORG Tue Jan 29 21:50:02 2008
Resent-Date: Tue, 29 Jan 2008 21:50:02 GMT
Resent-Message-Id: <200801292150.m0TLo2Co089464@freefall.freebsd.org>
Resent-From: FreeBSD-gnats-submit@FreeBSD.org (GNATS Filer)
Resent-To: freebsd-bugs@FreeBSD.org
Resent-Reply-To: FreeBSD-gnats-submit@FreeBSD.org, Christoph Weber-Fahr
Message-Id: <200801292146.m0TLkpBX084750@www.freebsd.org>
Date: Tue, 29 Jan 2008 21:46:51 GMT
From: Christoph Weber-Fahr
To: freebsd-gnats-submit@FreeBSD.org
X-Send-Pr-Version: www-3.1
Subject: kern/120130: carp causes kernel panics in any constellation

>Number:         120130
>Category:       kern
>Synopsis:       carp causes kernel panics in any constellation
>Confidential:   no
>Severity:       critical
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          open
>Quarter:
>Keywords:
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Tue Jan 29 21:50:02 UTC 2008
>Closed-Date:
>Last-Modified:
>Originator:     Christoph Weber-Fahr
>Release:        6.3-RELEASE
>Organization:
Arcor AG
>Environment:
FreeBSD XXX.tnd.lab.arcor.de 6.3-RELEASE FreeBSD 6.3-RELEASE #0: Fri Jan 25 21:34:42 CET 2008 wefa@XXX.tnd.lab.arcor.de:/usr/obj/usr/src/sys/DL380 i386
>Description:
Carp reliably and reproducibly causes kernel panics.

This report extends kern/117448 (which itself contains a back-reference
to kern/92776). The referenced PR reports this error only for the case of
having and destroying two carp interfaces.

We have tested carp extensively, with both 6.2-RELEASE-p9 and
6.3-RELEASE, and we have additionally encountered a number of
spontaneous reboots, spurious lockups and similar problems.

Note that even though the reproduction recipe given below is based on
ifconfig destroy commands, we also saw crashes in the normal course of
operation, during and between tests where carp was active, both with
only one and with multiple carp interfaces.
>How-To-Repeat:
So far we have found two ways to reproduce these effects repeatably:

1.)
as documented in the referenced kern/117448:

    ifconfig carp0 destroy
    ifconfig carp1 destroy

This happens regardless of the constellation those interfaces are in.
In some constellations the system crashes immediately, in others after
the next ifconfig operation.

2.) It is also possible to provoke a crash using only one carp
interface. We found the following script to reliably produce a kernel
panic within 15-20 minutes:

    while [ 1 ]
    do
        # restart networking, then destroy the carp interface again
        /etc/rc.d/netif restart
        sleep 35
        ifconfig carp0 destroy
        sleep 35
    done
>Fix:
We do not have a fix.

It should specifically be noted that using ucarp (from net/ucarp in the
ports collection) is not an alternative either. In our tests we found
ucarp 1.3 to have serious recovery issues after a failover, which
reproducibly left the cluster in a dysfunctional state.

We also tested the (not yet ported) ucarp4 and found it to be completely
broken in our environment (Cisco switch platform): the transport was
switched to multicast, and the implementation appears to be completely
botched, so that it works on neither FreeBSD nor Linux.
>Release-Note:
>Audit-Trail:
>Unformatted:
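
For anyone trying to confirm recipe 1, the sequence can be sketched as a
standalone script. This is only a sketch under stated assumptions, not
the reporter's actual setup: the vhid values, the password "example" and
the 192.0.2.0/24 addresses are placeholders chosen for illustration, and
the run() and repro_two_carp() helpers are hypothetical names. By default
it only prints the commands; set DRY_RUN=0 on a disposable test machine,
since the report describes this sequence as panicking the kernel.

```shell
#!/bin/sh
# Sketch of recipe 1: create two carp interfaces, then destroy them.
# ASSUMPTIONS (not from the report): vhids 1 and 2, password "example",
# and the 192.0.2.0/24 documentation addresses.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

repro_two_carp() {
    run ifconfig carp0 create
    run ifconfig carp0 vhid 1 pass example 192.0.2.10/24
    run ifconfig carp1 create
    run ifconfig carp1 vhid 2 pass example 192.0.2.11/24
    run ifconfig carp0 destroy
    run ifconfig carp1 destroy
    # The report notes that some setups only crash on the next
    # ifconfig operation, so issue one more.
    run ifconfig -a
}

repro_two_carp
```

Running it with the default DRY_RUN=1 lists the seven ifconfig commands
in order without touching any interface.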