Date:      Thu, 2 Jun 2011 18:16:34 -0700
From:      Jeremy Chadwick <freebsd@jdc.parodius.com>
To:        Per von Zweigbergk <pvz@itassistans.se>
Cc:        freebsd-fs@freebsd.org, freebsd-net@freebsd.org, John <jwd@SlowBlink.Com>, Patrick Lamaiziere <patfbsd@davenulle.org>
Subject:   Re: Production use of carp?
Message-ID:  <20110603011634.GA59971@icarus.home.lan>
In-Reply-To: <2E31CF74-416A-4310-9102-FD0C86275D0E@itassistans.se>
References:  <20110602203940.GA80549@slowblink.com> <20110603001036.5ad0ff8d@davenulle.org> <2E31CF74-416A-4310-9102-FD0C86275D0E@itassistans.se>

On Fri, Jun 03, 2011 at 02:37:53AM +0200, Per von Zweigbergk wrote:
> On 3 Jun 2011, at 00:10, Patrick Lamaiziere wrote:
> 
> > You may want to implement your own control because if the two hosts
> > cannot communicate, you will have two masters. This can happen if the
> > links on both hosts are up, but no packets are forwarded (i.e. the
> > switch connecting the two boxes is broken in some way).
> 
> As a general thought that might be interesting when you're building your HA solution:
> 
> One less-documented feature of VMware ESXi is that it checks whether it's isolated from the network by pinging the gateway on the management network.
> 
> This is how ESXi tries to avoid having a split-brain condition - by making sure that it only considers itself to be the master if it can reach the gateway, but cannot reach any other servers. You might implement gating in a similar way to avoid a split-brain condition in your HA solution.

If that's indeed true, VMware ESXi is doing something Extremely Bad.
Pinging the local gateway (read: A ROUTER) as a form of determining if
network I/O is failing is an unwise decision.

Commercial-grade routers (read: Cisco, Juniper) all implement a form of
ICMP prioritisation.  The router can (and will) discard/drop inbound
ICMP packets directed at the router itself (i.e. the destination IP is
the gateway's own address) during high CPU utilisation.  Packets
destined to the router itself are handled very, very differently from
packets the router merely forwards.

This is why network engineers always recommend that when testing for
network anomalies, the client (source IP) should attempt to speak to a
web server, another box, whatever -- anything as long as it's not a
router -- for its destination IP.
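As a rough illustration of the above (a sketch only, not what any
particular product does): a reachability check that talks to a service
on an ordinary host instead of pinging the gateway.  The function name
probe_tcp and the choice of a TCP connect as the probe are my own
assumptions; any host and port you control would do as the target.

```python
import socket


def probe_tcp(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) completes in time.

    The target is meant to be an ordinary host (web server, another box),
    not the router, so the result does not depend on how the router
    treats traffic addressed to itself.
    """
    try:
        # create_connection resolves the name and performs the handshake;
        # the with-block closes the socket again immediately.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable networks.
        return False
```

A refused connection is reported as a failure here; whether that is
the right call depends on what you use as the target.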

At my workplace, for quite some time our Solaris machines using mpathd
were configured to ping their default gateway (a Juniper M320).  After
we expanded and scaled out, we found that mpathd would randomly fail
over to the 2nd NIC for seemingly no reason.  The behaviour described
above was the root cause.  The solution was to have mpathd probe a
dedicated host (another Solaris box) rather than the network gateway.
Problem solved.
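If you roll your own check for something like carp, a single dropped
probe should probably not trigger a failover either.  A minimal sketch,
assuming a probe callable that returns True or False; the class name
LinkMonitor and the threshold of 3 are made up for illustration and are
not how mpathd itself is implemented:

```python
from typing import Callable


class LinkMonitor:
    """Declare the link down only after several consecutive failed probes."""

    def __init__(self, probe: Callable[[], bool], max_failures: int = 3):
        self.probe = probe                # hypothetical probe, returns True on success
        self.max_failures = max_failures  # illustrative threshold, tune to taste
        self.failures = 0

    def link_ok(self) -> bool:
        """Run one probe and report whether the link still counts as up."""
        if self.probe():
            self.failures = 0             # any success resets the counter
        else:
            self.failures += 1
        return self.failures < self.max_failures
```

Call link_ok() on a timer and only act (demote, fail over) once it
returns False.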

-- 
| Jeremy Chadwick                                   jdc@parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, US |
| Making life hard for others since 1977.               PGP 4BD6C0CB |



