From: Evgeny Dolgopiat
To: "freebsd-cluster"
Cc: freebsd-hackers
Date: Wed, 1 Oct 2003 15:44:36 +0400
Subject: ng_one2many heartbeat algorithm for LAN fault tolerance
Reply-To: evg_dolgop@mail.ru

Hi all,

The patches and some docs are available here:
http://www.watson.org/~ilmar/download/ng_one2many.tbz

What is it

Link failure detection for the one2many netgraph node.

How it works

It is implemented as "heartbeat" packet counters on all interfaces trunked by one2many. If the number of packets a hook has received falls behind the maximum number received by the other hooks of the node by some specified value, the interface is marked as failed (subnet or link failure). If the difference is smaller than this value and the interface is marked as failed, the interface is marked as up and working again.

How to set up

The algorithm number is 2, so to configure the node, send the message "setconfig {xmitAlg=1 failAlg=2}" to the ng_one2many node.
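To illustrate the rule described under "How it works", here is a minimal Python sketch. It is not the code from the patch. The function name failed_hooks and the parameter threshold (standing in for the "specified value") are hypothetical, and treating a lag exactly equal to the threshold as a failure is an assumption. The hook names many0, many1, ... follow the usual ng_one2many naming.

```python
def failed_hooks(rx_counts, threshold):
    """Return the set of hooks that should currently be marked as failed.

    rx_counts: dict mapping hook name -> packets counted on that hook
    threshold: hypothetical name for the "specified value" in the post
    """
    # The reference point is the best-performing hook of the node.
    best = max(rx_counts.values())
    # A hook is failed while it lags the best hook by at least `threshold`
    # (the >= boundary is an assumption). Once the lag drops below the
    # threshold the hook is no longer in the set, i.e. it is up again.
    return {hook for hook, count in rx_counts.items()
            if best - count >= threshold}


# many2 lags the best hook by 7 packets, so it is reported as failed.
before = failed_hooks({"many0": 10, "many1": 9, "many2": 3}, threshold=5)

# After many2 catches up (lag of 2), no hook is reported as failed.
after = failed_hooks({"many0": 10, "many1": 9, "many2": 8}, threshold=5)
```

Recomputing the set on every check covers both directions of the rule: marking a lagging interface as failed and bringing it back up once the difference shrinks.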
The algorithm has two parameters:

timeout - time between sending heartbeat packets (an integer number of 1/10 sec)
period - number of timeouts used for the failure determination statistics

The default values are timeout=10 and period=10.

There are two new node messages, "gethbconfig" and "sethbconfig {timeout=X period=Y}", for getting and setting the heartbeat algorithm parameters.

Author: Evgeny Dolgopiat
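As a rough illustration of what the two parameters mean in wall-clock terms, here is a small Python sketch. The helper names are hypothetical, and reading "period" as the number of heartbeat intervals that make up one statistics window is an interpretation of the description above, not something confirmed by the patch.

```python
def heartbeat_interval(timeout):
    """Seconds between heartbeat packets; timeout is given in 1/10 sec."""
    return timeout / 10


def statistics_window(timeout, period):
    """Seconds covered by one failure-determination window.

    Assumes the window spans `period` consecutive heartbeat intervals.
    """
    return period * timeout / 10


# With the defaults timeout=10 and period=10:
default_interval = heartbeat_interval(10)      # 1.0 second between heartbeats
default_window = statistics_window(10, 10)     # 10.0 seconds per window
```

Under this reading, lowering timeout or period shortens the time needed to notice a failed link, at the cost of more heartbeat traffic or fewer samples per decision.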