Date:      Thu, 23 Jan 1997 18:14:14 +1100
From:      Andrew McRae <amcrae@cisco.com>
To:        jdp@polstra.com
Cc:        hackers@freebsd.org
Subject:   Re: Fault-tolerant network with 2 ethernets
Message-ID:  <199701230714.SAA16460@metaplex-ss10.cisco.com>

jdp@polstra.com (John Polstra):
> This is probably a routing 101 question.  But I've never had to do much
> with routing, so I could use some advice.
> 
> A client wants a fault-tolerant LAN setup like this:
> 
>     ethernet A (100BaseT)
>     ---+------+------+------+------+------+---
>        |      |      |      |      |      |
>      host   host   host   host   host   host
>        |      |      |      |      |      |
>     ---+------+------+------+------+------+---
>     ethernet B (100BaseT)
> 
> The goal is that either ethernet could go down, yet all the hosts could
> still talk to each other.  Or, one of the ethernet cards on a host could
> go down, and it could still talk to all the other hosts.  In either
> case, it has to happen automatically, without manual intervention.  Load
> balancing isn't a goal, just fault-tolerance.

I used to work in the real time & high availability field,
and this is a typical architecture that we designed and
installed. Unfortunately, it is almost impossible to
do this without involving the applications in some way, or
without throwing a fair amount of hardware at the problem.

> At first I was hoping that routed could do this for me, without
> the applications even being aware of it.  But now I'm not so sure.
> Each ethernet will have to have its own IP network number (right?),
> and so each host will have to have 2 IP addresses.  A given packet
> will be addressed to only a single IP address, and that implies
> it's headed for a particular ethernet.  If that ethernet is down,
> all addresses on it are down, and the packet won't be delivered
> no matter what routed does.
> 
> Is this analysis correct?  Is there a simple way to get what I want?
> How about a non-simple way?

Your analysis is correct; the BSD stack and most IP-based
applications rely on IP endpoints for routing, i.e. a TCP
connection is tied to a remote & local IP address & port number.
I guess you can play games with routing and shutting interfaces
down (like detecting when an interface fails, then
setting up a bunch of IP host routes and marking the failed
interface down), but my experience is that it is difficult
to make this work in all circumstances.
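
To make the host-route game concrete, here is a rough sketch in Python.
It is only an illustration: the peer names, the 10.0.0.x/10.0.1.x
addresses and the helper host_route_commands() are all hypothetical, and
it assumes each peer will accept packets for its net-A address when they
arrive on its net-B interface. The commands are only built as strings,
never executed.

```python
def host_route_commands(peers, reachable_on_a):
    """Build route(8)-style commands for peers that stopped answering on net A.

    peers: dict mapping peer name -> (address on net A, address on net B)
    reachable_on_a: set of peer names that still answer on net A
    """
    commands = []
    for name, (addr_a, addr_b) in sorted(peers.items()):
        if name not in reachable_on_a:
            # Reach the peer's net-A address via its net-B address instead.
            commands.append(f"route add -host {addr_a} {addr_b}")
    return commands


# Hypothetical two-peer setup; "beta" no longer answers on net A.
peers = {
    "alpha": ("10.0.0.2", "10.0.1.2"),
    "beta": ("10.0.0.3", "10.0.1.3"),
}
cmds = host_route_commands(peers, reachable_on_a={"alpha"})
print("\n".join(cmds))
```

Even in this toy form, something still has to decide who is "reachable",
and every host has to reach the same conclusion, which is where it gets
difficult.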

There is no simple way, but consider this: using 100BT, you
are going to be using 100BT hubs or switches. A lot of
switches and hubs provide high availability these days
(dual power supplies etc.), so the kind of failure you
get is more likely to be a single port than the
entire net failing. In that case, all the other host interfaces
are still operational and are still sending/receiving to the
other hosts on the net that has the failed port, so how do you
tell when all the hosts should start using the backup net?

So a host may see a failed ethernet, but perhaps only that
host's port is down.  There are lots of combinations of
failure scenarios, and it is real hard to catch all these
with a simplistic, application-invisible method.
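
A small Python sketch of what a single host can actually infer from
probing its peers on one net. The function diagnose() and its inputs are
hypothetical (how the probing is done is left out); the point is only
that the "nobody answers" case is ambiguous from the host's own view.

```python
def diagnose(probe_results):
    """Classify what one host can infer about one network.

    probe_results: dict mapping peer name -> True if the peer answered
    a probe sent over that network.
    """
    answered = sum(1 for ok in probe_results.values() if ok)
    if answered == len(probe_results):
        return "network healthy"
    if answered == 0:
        # Own port, own cable, or the whole hub: cannot tell from here.
        return "local port or whole network down (ambiguous)"
    return "some peer ports down; network itself still up"


print(diagnose({"alpha": True, "beta": False, "gamma": True}))
print(diagnose({"alpha": False, "beta": False, "gamma": False}))
```

In the mixed case the net is still fine for most hosts, so switching
everybody to the backup net would be the wrong reaction.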

Another approach is to use a router (or routers), in various configurations
with switches or hubs.

Depending on how much money you want to spend, you can get
availabilities ranging from 98% to 99.99%.
In the architecture we designed in our high availability
product, the software was cognizant of multiple networks
and was able to handle multiple paths to hosts.
Most of the standard Unix net applications don't handle
network problems of this nature well :-)
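
As a minimal illustration of an application that knows about multiple
paths (this is not the product's code, just a sketch), a client can hold
one address per network for each peer and try them in turn. The helper
connect_any() is hypothetical; it relies only on the standard Python
socket.create_connection() call.

```python
import socket


def connect_any(endpoints, timeout=2.0):
    """Return a connected TCP socket to the first endpoint that answers.

    endpoints: list of (host, port) pairs for the same peer, one per
    network, in order of preference.
    """
    last_error = None
    for host, port in endpoints:
        try:
            return socket.create_connection((host, port), timeout=timeout)
        except OSError as exc:
            # This path failed (refused, timed out, unreachable); try the next.
            last_error = exc
    raise ConnectionError(f"no path to peer via {endpoints}") from last_error
```

A caller would pass something like the peer's net-A and net-B addresses
with the same port. Note this only covers connection setup; a connection
that dies halfway still has to be detected and re-established by the
application.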

The 100BT hub probably has a much higher MTBF and much lower
MTTR than any of the hosts anyway, so why not just keep a spare
hub lying around?
Adding lots of extra hardware and cabling and extra net
interfaces may just reduce the availability of the system.
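
A back-of-the-envelope check in Python, using the usual steady-state
formula availability = MTBF / (MTBF + MTTR). The MTBF/MTTR figures are
made up purely for illustration, and the model ignores the extra NICs,
cabling and failover software, all of which would add further terms in
series.

```python
def availability(mtbf_hours, mttr_hours):
    """Steady-state availability of one component."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def series(*parts):
    """All parts must work: availabilities multiply."""
    result = 1.0
    for a in parts:
        result *= a
    return result


def parallel(*parts):
    """Any one part suffices: unavailabilities multiply."""
    unavailable = 1.0
    for a in parts:
        unavailable *= 1.0 - a
    return 1.0 - unavailable


# Hypothetical figures, chosen only to show the shape of the result.
hub = availability(100_000, 1)
host = availability(10_000, 4)

single = series(host, hub, host)                # two hosts through one hub
dual = series(host, parallel(hub, hub), host)   # two hosts, redundant hubs

print(f"one hub:  {single:.6f}")
print(f"two hubs: {dual:.6f}")
```

With numbers like these the hosts dominate the result, so the second
hub buys very little.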

>    John Polstra                                       jdp@polstra.com
>    John D. Polstra & Co., Inc.                Seattle, Washington USA
>    "Self-knowledge is always bad news."                 -- John Barth

Cheers,
Andrew McRae (amcrae@cisco.com)


