Date: Fri, 7 Jul 2000 20:05:44 +0200 (CEST)
From: Andrzej Bialecki
To: Gabriel Ambuehl
Cc: Luigi Rizzo, Chris Shenton, Alan Batie, isp@FreeBSD.ORG
Subject: Re: Re[2]: load balancing

On Fri, 7 Jul 2000, Gabriel Ambuehl wrote:

> > having a machine acting as hot-backup is trivial as long as
> > you tolerate that during the crash recovery (an unlikely event)
> > all active sessions will drop and need to restart.
>
> I'm very interested in hearing such a solution. The point where we're
> failing here is the following one: one SERVICE (not the complete box)
> of the box goes down. IP itself stays up. Now the hotspare should jump
> in and take the IP over but how are you going to protect the network
> from being screwed up by two identical IP addresses? I'd really
> appreciate it if one could explain me how to solve this problem (IP
> takeover with completely failed boxes is easy).

Let's see:

    +----------+ 10.0.0.1 (main)     |
    |  box A   |=====================|
    +--+-------+ 10.0.1.1 (diag)     |
       |1.1.1.1                      |
       |                             |
       | heartbeat                   |
       |                             |
       |1.1.1.2                      |
    +--+-------+ (10.0.0.1) (main)   |
    |  box B   |=====================|
    +----------+ 10.0.1.2 (diag)     |

On both machines you should run monitoring software that checks the
health of the machine and/or the service. One of the things it needs to
check is the heartbeat (it can be as simple as pinging, although a
message exchange would be better).

The software should allow you to set up criteria for switchover, like:

Box A:

* monitor health (apps/OS). If sick, start the switchover on the other
  machine, shut down the main interface (e.g. delete the address, remove
  the MAC) and kill the service.

Box B:

* if there is no heartbeat, check over the diag interface
* if there is no response over diag either, assume the other box is dead
  and go active. This includes configuring the MAC/IP address.
* also, if the other box tells you so, go active after a specified
  timeout (to give box A a chance to shut down gracefully)

Optionally, you can involve a third machine that monitors and manages
the whole cluster. Also optionally, you can have a watchdog in each
machine to eliminate situations where IP stays up but the machine is
wedged anyway.

Hacking up something like that is about a week or two of work (I've done
this for my employer), but setting up the environment to work properly
is not a trivial task - you really need to understand the specifics and
behaviour of the OS and the monitored applications, adjust various
timeouts, add some heuristics, spit three times over your shoulder
etc...

If you take a look at commercial packages of this type, they do exactly
this - it's usually just a bunch of shell scripts, some small utils,
plus about a week of work for a highly skilled (and expensive)
consultant.

Andrzej Bialecki

//  WebGiro AB, Sweden (http://www.webgiro.com)
// -------------------------------------------------------------------
// ------ FreeBSD: The Power to Serve. http://www.freebsd.org --------
// --- Small & Embedded FreeBSD: http://www.freebsd.org/~picobsd/ ----
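
The rules for box B above can be sketched roughly as one decision function, called on every monitoring pass. This is only an illustration of the logic, not the scripts mentioned in the message: the names `Action` and `standby_decision` and the 10-second default grace period are assumptions made up for the example, and the actual probing (ping or message exchange) and the MAC/IP reconfiguration are left out.

```python
from enum import Enum


class Action(Enum):
    STANDBY = "standby"        # box A is presumed healthy, do nothing
    WAIT_GRACE = "wait_grace"  # box A asked for takeover, grace period running
    GO_ACTIVE = "go_active"    # take over the main MAC/IP and start the service


def standby_decision(heartbeat_ok, diag_ok, takeover_requested,
                     seconds_since_request=0.0, grace_seconds=10.0):
    """Decide what box B (the hot spare) should do on one monitoring pass.

    All parameter names and the default grace period are hypothetical.
    """
    # Box A asked us to take over: go active, but only after the grace
    # period, so that A has a chance to shut down gracefully.
    if takeover_requested:
        if seconds_since_request >= grace_seconds:
            return Action.GO_ACTIVE
        return Action.WAIT_GRACE
    # Heartbeat answers and nobody asked for a takeover: stay passive.
    if heartbeat_ok:
        return Action.STANDBY
    # No heartbeat: cross-check over the diag interface first, since the
    # heartbeat link itself may be what failed.
    if diag_ok:
        return Action.STANDBY
    # Silent on both paths: assume box A is dead.
    return Action.GO_ACTIVE
```

Keeping the decision separate from the probing and from the interface commands makes it easy to try out different timeouts and heuristics without touching a live interface; a real setup would wrap this in a loop that runs the probes and acts on the returned value.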