Date:      Fri, 7 Jul 2000 20:05:44 +0200 (CEST)
From:      Andrzej Bialecki <abial@webgiro.com>
To:        Gabriel Ambuehl <gabriel_ambuehl@buz.ch>
Cc:        Luigi Rizzo <luigi@info.iet.unipi.it>, Chris Shenton <cshenton@uucom.com>, Alan Batie <batie@rdrop.com>, isp@FreeBSD.ORG
Subject:   Re: Re[2]: load balancing
Message-ID:  <Pine.BSF.4.20.0007071943280.14402-100000@mx.webgiro.com>
In-Reply-To: <13990135708.20000707183631@buz.ch>

On Fri, 7 Jul 2000, Gabriel Ambuehl wrote:

> > having a machine acting as hot-backup is trivial as long as
> > you tolerate that during the crash recovery (an unlikely event)
> > all active sessions will drop and need to restart.
> 
> I'm very interested in hearing such a solution. The point where we're
> failing here is the following one: one SERVICE (not the complete box)
> of the box goes down. IP itself stays up. Now the hotspare should jump
> in and take the IP over but how are you going to protect the network
> from being screwed up by two identical IP addresses? I'd really
> appreciate it if one could explain to me how to solve this problem (IP
> takeover with completely failed boxes is easy).

Let's see:

+----------+ 10.0.0.1 (main)     |
| box A    |=====================|
+--+-------+ 10.0.1.1 (diag)     |
   |1.1.1.1                      |
   |                             |
   | heartbeat                   |
   |                             |
   |1.1.1.2                      |
+--+-------+ (10.0.0.1) (main)   |
| box B    |=====================|
+----------+ 10.0.1.2 (diag)     |

On both machines you should run monitoring software that checks the health
of the machine and/or the service. One of the things it needs to check is
the heartbeat: it can be as simple as pinging, although a message exchange
between the monitoring daemons would be better, since a ping only proves
that the peer's IP stack is up, not that the service is. The software
should allow you to set up criteria for switchover, like:

Box A:
* monitor health (apps/OS). If sick, tell the other machine to start
  switchover, shut down the main interface (e.g. delete the address,
  remove the MAC) and kill the service.

Box B:
* if there is no heartbeat, check the other box over the diag interface
* if there is no response over diag either, assume the other box is dead
  and go active. This includes configuring the MAC/IP address.
* also, if the other box tells you to, go active after a specified timeout
  (to give box A a chance to shut down gracefully)
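The switchover criteria above can be sketched as two pure decision
functions, one per box. This is a minimal illustration in Python, not the
software described in this message: the names Action, Observation,
box_a_decide and box_b_decide and the 5-second handover delay are
hypothetical, and the inputs are assumed to come from separate health,
heartbeat and diag checks that are not shown.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class Action(Enum):
    STAY_ACTIVE = auto()   # box A: healthy, keep serving
    HAND_OVER = auto()     # box A: tell B to take over, drop main address/MAC, kill service
    STAY_STANDBY = auto()  # box B: keep watching
    GO_ACTIVE = auto()     # box B: configure the shared MAC/IP and start the service


@dataclass
class Observation:
    """Inputs assumed to be gathered by separate checks at one polling instant."""
    local_healthy: bool                     # local apps/OS health checks passed
    heartbeat_ok: bool                      # peer answered over the heartbeat link
    diag_ok: bool                           # peer answered over the diag interface
    takeover_requested_at: Optional[float]  # time the peer asked us to go active, if it did
    now: float                              # current time, same clock as above


def box_a_decide(obs: Observation) -> Action:
    """Active box: hand over as soon as the local health checks fail."""
    if not obs.local_healthy:
        return Action.HAND_OVER
    return Action.STAY_ACTIVE


def box_b_decide(obs: Observation, handover_delay: float = 5.0) -> Action:
    """Standby box: go active on request (after a grace period) or when A looks dead."""
    if obs.takeover_requested_at is not None:
        # Give box A handover_delay seconds to shut down gracefully first.
        if obs.now - obs.takeover_requested_at >= handover_delay:
            return Action.GO_ACTIVE
        return Action.STAY_STANDBY
    if obs.heartbeat_ok:
        return Action.STAY_STANDBY
    # No heartbeat: only a failure over the diag interface as well means A is
    # dead; otherwise it is probably just the heartbeat link that broke.
    if obs.diag_ok:
        return Action.STAY_STANDBY
    return Action.GO_ACTIVE
```

Keeping the decisions free of side effects means the criteria can be tested
separately from the scripts that actually reconfigure interfaces.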

Optionally, you can involve a third machine that monitors and manages
the whole cluster. Also optionally, you can have a watchdog in each
machine to catch situations where IP stays up but the machine is
wedged anyway.
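A watchdog of this kind can be sketched in software as a deadline that the
supervised service must keep renewing. This is a hypothetical illustration
(the Watchdog class and its pet() and expired() methods are invented names);
a real setup would rely on a hardware watchdog or a separate supervisor that
resets the box, so that the standby machine sees the heartbeat disappear.

```python
import time
from typing import Callable


class Watchdog:
    """Minimal software watchdog sketch (hypothetical, not a real hardware API).

    The supervised code must call pet() at least once every `timeout`
    seconds; expired() reports when it has failed to do so, at which point
    a real system would reset the machine so the standby box takes over.
    """

    def __init__(self, timeout: float, clock: Callable[[], float] = time.monotonic):
        self.timeout = timeout
        self.clock = clock
        self.last_pet = clock()

    def pet(self) -> None:
        # Call this from the service's main loop only after it made real
        # progress, so a wedged service stops renewing the deadline.
        self.last_pet = self.clock()

    def expired(self) -> bool:
        # True once more than `timeout` seconds have passed since the last pet().
        return self.clock() - self.last_pet > self.timeout
```

The injectable clock lets the timeout logic be exercised without sleeping.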

Hacking up something like that is about one or two weeks of work (I've done
this for my employer), but setting up the environment to work properly is
not a trivial task: you really need to understand the specifics and
behaviour of the OS and the monitored applications, adjust various
timeouts, add some heuristics, spit three times over your shoulder etc.
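One example of the kind of heuristic meant here: rather than going active on
the first lost heartbeat, box B can require several consecutive misses. This
is a hypothetical sketch (MissCounter and the default threshold of 3 are
assumptions, not part of the setup described above).

```python
class MissCounter:
    """Declare the peer lost only after `threshold` consecutive missed heartbeats."""

    def __init__(self, threshold: int = 3):
        if threshold < 1:
            raise ValueError("threshold must be at least 1")
        self.threshold = threshold
        self.misses = 0

    def update(self, reply_received: bool) -> bool:
        """Record one heartbeat interval; return True while the peer still counts as alive."""
        if reply_received:
            self.misses = 0   # any reply resets the count
        else:
            self.misses += 1
        return self.misses < self.threshold
```

The trade-off is detection time: with a 1-second heartbeat interval and a
threshold of 3, a dead peer is detected after roughly 3 seconds, while a
single dropped packet no longer triggers a switchover.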

If you take a look at commercial packages of this type, they do exactly
this: it's usually just a bunch of shell scripts, some small utils, plus
about a week of work for a highly skilled (and expensive) consultant.

Andrzej Bialecki

//  <abial@webgiro.com> WebGiro AB, Sweden (http://www.webgiro.com)
// -------------------------------------------------------------------
// ------ FreeBSD: The Power to Serve. http://www.freebsd.org --------
// --- Small & Embedded FreeBSD: http://www.freebsd.org/~picobsd/ ----



