Date:      Sat, 12 Oct 1996 22:37:45 -0600 (MDT)
From:      Wes Peters <softweyr@xmission.com>
To:        James FitzGibbon <james@nexis.net>
Cc:        questions@freebsd.org
Subject:   Redundancy in FBSD web server
Message-ID:  <199610130437.WAA01056@obie.softweyr.com>
In-Reply-To: <72125273@toto.iv>

James FitzGibbon writes:
 > I need to set up a web server that (in my client's humble words) "CANNOT
 > EVER BE DOWN".  They've got the budget, so I recommended two servers that
 > can serve domains concurrently.
 > 
 > I'd be interested in hearing how people have/would implement this.  My
 > thoughts so far would be to:
 > 
 > a) Use a powerful box as the main server, with a backup box mirroring
 > sites and ready to take over should the main one go down.
 > 
 > -or-
 > 
 > b) Use machines of equal power, using a DNS entry with multiple A records
 > to shuffle requests back and forth.
 > 
 > Opinions appreciated, including ways of detecting a downed host and taking
 > over (ifconfig aliasing) IPs of a machine that has crashed.

I've just finished (5 minutes ago, literally) a project of this sort
at my "day" job: a redundant, 24x7 television broadcast automation
system.  Our system, a large audio/video switch, uses a control
processor based on an M68000.  In order to achieve reliable backup, we
put two of them in the system and have them monitor each other's
state.  This is a really simplistic system, but it works fairly well.*

What I'd suggest is to have two machines connected to your router.
Each has its own network interface address, and neither of those is
the www.whatever address.  When the "primary" machine boots, it adds
the address of www.whatever as an alias for its network interface; the
standby begins pinging (or attempting http connections to) the
www.whatever address.  If the standby machine detects that the primary
has gone down, because it stops answering the pings, it adds
www.whatever as an alias for *its* network interface and takes over.
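
A minimal sketch of the standby side of this scheme, in Python, might
look like the following.  The address 192.0.2.10, the interface name
ed0, the three-miss threshold and all of the function names are
placeholders made up for illustration.  It assumes FreeBSD-style
ping(8) and ifconfig(8) and must run as root to add the alias.

```python
import subprocess
import time

SERVICE_IP = "192.0.2.10"   # hypothetical stand-in for the www.whatever address
INTERFACE = "ed0"           # hypothetical interface name on the standby
MAX_MISSES = 3              # assumed: consecutive failed probes before takeover


def primary_alive(ip=SERVICE_IP):
    """Send one ICMP echo request; True if the address answered."""
    result = subprocess.run(["ping", "-c", "1", ip],
                            stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)
    return result.returncode == 0


def take_over(ip=SERVICE_IP, interface=INTERFACE):
    """Add the service address as an alias on this machine's interface."""
    subprocess.run(["ifconfig", interface, "inet", ip,
                    "netmask", "255.255.255.255", "alias"], check=True)


def watch(probe=primary_alive, claim=take_over, max_misses=MAX_MISSES,
          interval=1.0, sleep=time.sleep):
    """Probe until max_misses consecutive probes fail, then claim the address.

    Returns the number of probes made before taking over.
    """
    misses = 0
    probes = 0
    while misses < max_misses:
        probes += 1
        if probe():
            misses = 0          # any answer resets the failure count
        else:
            misses += 1
        if misses < max_misses:
            sleep(interval)
    claim()
    return probes
```

Calling watch() on the standby blocks until the primary stops
answering and then adds the alias.  Requiring several misses in a row
is a guess at how to avoid taking over on a single dropped packet; the
right threshold depends on your network.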

Things you have to account for:

 o The original machine comes back up.  Does it now take over and the
   "backup" shut up, or does it become the backup?  In our system, a
   control board always comes up "standby" and only goes "active" once
   it has determined there isn't another active board.

 o Keeping the HTML "database" up to date.  In our system, critical
   dynamic configuration data is downloaded from the active board
   whenever a system comes up standby and an active board exists.

 o Routing and ARP tables.  You're juggling the hardware address
   associated with the www.whatever IP address dynamically on your
   local network.  I know this isn't going to "just work," but I'd
   have to study the routing implications of this before committing to
   do this.

 o Communications between the two systems.  We use a pair of dedicated
   serial ports for our redundancy state messages; each board
   transmits its current state 4x/sec and expects the other board to
   report its state at least 2x/sec.  If a report is not seen within
   500 msec, it is assumed that the other board has died, and this
   board becomes active.  For your application, pinging over the
   network may be good enough.
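
The come-up-standby rule and the 500 msec timeout above can be
sketched as a small state machine.  This is only an illustration in
Python, not our control code: the class and method names are invented,
and the transport (serial port, ping, whatever) is left to the caller.
It deliberately does not resolve the case where both nodes are standby
and can hear each other, which needs some tie-break of your own.

```python
import time

HEARTBEAT_TIMEOUT = 0.5   # seconds; the 500 msec limit described above


class Node:
    """One half of a redundant pair.  Always starts out in standby."""

    def __init__(self, now=time.monotonic):
        self._now = now
        self.state = "standby"
        self.peer_state = None          # last state the peer reported, if any
        self._last_report = now()       # start the timeout clock at boot

    def on_peer_report(self, peer_state):
        """Record a state message received from the other node."""
        self.peer_state = peer_state
        self._last_report = self._now()

    def tick(self):
        """Call periodically, e.g. 4x/sec.

        A standby node goes active only when no peer report has arrived
        within HEARTBEAT_TIMEOUT.  An active node stays active.
        """
        silent = self._now() - self._last_report > HEARTBEAT_TIMEOUT
        if self.state == "standby" and silent:
            self.state = "active"
        return self.state
```

The clock is passed in so the timeout logic can be exercised without
waiting in real time.  A node that reboots comes back as standby and
stays there for as long as it keeps hearing the other node.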

People who really study backup systems will explain that you can't
have a proper redundant system with 2, or *any* even number of
processors, or with the same software on every system.  On the other
hand, you can measurably increase your reliability in the face of
simple hardware failures without a lot of custom programming.

Good luck.  Feel free to e-mail back if I can answer any questions for
you.  ;^)

	Wes Peters

* Our most common mode of failure is, of course, catastrophic software
failure.  In this case, what usually happens is the active board
crashes, the standby board takes over, the control system resends its
last command, and the newly active board dies in *exactly* the same
manner the previous active board did.  Sigh.  That's why you can't
have a truly redundant system running the *same* software.  But who
can afford to develop *two* control systems, when this one took twelve
years to develop to this stage already???

-- 
   Wes Peters	|
    Softweyr 	|   Where am I, and what am I doing in this handbasket? 
   Consulting	| 
 softweyr@xmission.com
