Date:      Tue, 29 Jan 2008 21:46:51 GMT
From:      Christoph Weber-Fahr <cwf-mlqarcor.de@FreeBSD.org>
To:        freebsd-gnats-submit@FreeBSD.org
Subject:   kern/120130: carp causes kernel panics in any constellation
Message-ID:  <200801292146.m0TLkpBX084750@www.freebsd.org>
Resent-Message-ID: <200801292150.m0TLo2Co089464@freefall.freebsd.org>


>Number:         120130
>Category:       kern
>Synopsis:       carp causes kernel panics in any constellation
>Confidential:   no
>Severity:       critical
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Tue Jan 29 21:50:02 UTC 2008
>Closed-Date:
>Last-Modified:
>Originator:     Christoph Weber-Fahr
>Release:        6.3-RELEASE
>Organization:
Arcor AG
>Environment:
FreeBSD XXX.tnd.lab.arcor.de 6.3-RELEASE FreeBSD 6.3-RELEASE #0: Fri Jan 25 21:34:42 CET 2008     wefa@XXX.tnd.lab.arcor.de:/usr/obj/usr/src/sys/DL380  i386

>Description:
Carp reliably and reproducibly causes kernel panics.

This report extends kern/117448 (which itself contains a back-reference to kern/92776).

The referenced PR reports this error only for the case where two carp interfaces exist and are destroyed. We have tested carp extensively with both 6.2-RELEASE-p9 and 6.3-RELEASE, and we have additionally encountered a number of spontaneous reboots, spurious lockups and similar problems.

Note that even though the reproduction recipes given below are based on ifconfig destroy commands, we also saw crashes in the normal course of operation, during and between tests where carp was active, both with only one and with multiple carp interfaces.
>How-To-Repeat:
So far we have found two ways to reproduce these effects repeatably:

1.) as documented in the referred kern/11744
   ifconfig carp0 destroy
   ifconfig carp1 destroy

This happens regardless of the constellation the interfaces are in: in some constellations the system crashes immediately, in others only after the next ifconfig operation.

2.) It is also possible to trigger a crash using only one carp interface. We found the following script to reliably produce a kernel panic within 15-20 minutes:

   # loop until the kernel panics
   while true
   do
      # restart all network interfaces
      /etc/rc.d/netif restart
      sleep 35
      # destroy the single carp interface
      ifconfig carp0 destroy
      sleep 35
   done
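
A minimal sketch of how the loop above could be instrumented so that the number of cycles completed before a panic can be read back after the reboot. The log path, the function name carp_cycle and the overridable command variables are assumptions for illustration only; they are not part of the original test setup.

```shell
#!/bin/sh
# Hypothetical wrapper around the reproduction loop: the log path, the
# helper name and the overridable commands are illustration only.

LOG=${LOG:-/var/tmp/carp-repro.log}     # assumed location that survives a reboot
PAUSE=${PAUSE:-35}                      # seconds, as in the original loop
RESTART_CMD=${RESTART_CMD:-"/etc/rc.d/netif restart"}
DESTROY_CMD=${DESTROY_CMD:-"ifconfig carp0 destroy"}

# Run one restart/destroy cycle, logging a timestamped line before each step.
carp_cycle() {
    n=$1
    echo "$(date -u +%Y-%m-%dT%H:%M:%SZ) cycle $n: restart" >> "$LOG"
    sync                                # flush the log before the risky step
    $RESTART_CMD
    sleep "$PAUSE"
    echo "$(date -u +%Y-%m-%dT%H:%M:%SZ) cycle $n: destroy" >> "$LOG"
    sync
    $DESTROY_CMD
    sleep "$PAUSE"
}
```

To run it until the machine panics, the function would be called in an endless loop, for example `n=0; while true; do n=$((n + 1)); carp_cycle "$n"; done`. After the reboot, the last line of the log shows which step of which cycle was started last.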

>Fix:



We do not have a fix.

It should be noted specifically that using ucarp (net/ucarp in the ports collection) is no alternative either. In our tests we found ucarp 1.3 to have serious recovery issues after a failover, which reproducibly left the cluster in a dysfunctional state. We also tested the (not yet ported) ucarp4 and found it to be completely broken in our environment (Cisco switch platform): its authors switched the transport to multicast and apparently botched the implementation, so that it works on neither FreeBSD nor Linux.




>Release-Note:
>Audit-Trail:
>Unformatted:



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?200801292146.m0TLkpBX084750>