Message-ID: <20021110080857.45866.qmail@fuzuli.enderunix.org>
From: "Omer Faruk Sen"
To: freebsd-cluster@freebsd.org
Subject: Re: clustering freebsd
Date: Sun, 10 Nov 2002 03:08:57 -0500

Hi.

I want to share my experiences with HA on FreeBSD. We don't have much choice for clustering on FreeBSD. A few days ago I read about Sporner's project (http://sporner.dyndns.org/freebsdclusters/) but didn't set it up. It seems promising, but I think it lacks file replication? I would really like to hear from him about the maturity level of his product.

I have installed PolyServe's (www.polyserve.com) clustering software for FreeBSD. It handles HA and file replication nicely. I can recommend it, but it is a commercial product.

Michael, I think you need to learn the clustering terminology. You may want to look at www.linuxvirtualserver.org and www.ultramonkey.org. Ultramonkey in particular has nice pictures that depict everything. Ultramonkey is part of the Vanessa project, and Vanessa also includes Super Sparrow (a global HA solution that uses BGP to find the nearest route). Anyway, go and look at the pictures at Ultramonkey.
There is also the linux-ha.org project, which is being (or was) ported to FreeBSD (I don't know its maturity level). I think that project is very good for HA on FreeBSD.

Someone mentioned VRRP. There is also a VRRP implementation for FreeBSD named freevrrpd (/usr/ports/net/freevrrpd).

To conclude: we have to admit that Linux is better at clustering solutions :( even though FreeBSD is better at networking (my personal opinion).

PS: I think we could set up a FreeBSD clustering page, just like www.lcic.org, which provides informational articles. I can arrange a web page, FTP and CVS for it.

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-cluster" in the body of the message

Date: Sun, 10 Nov 2002 12:13:40 +0100
From: Alexander Leidinger
To: Michael Grant
Cc:
freebsd-cluster@FreeBSD.ORG
Subject: Re: clustering freebsd
Message-Id: <20021110121340.3f2d1827.Alexander@Leidinger.net>
In-Reply-To: <200211091604.gA9G4wW28126@splat.grant.org>

On Sat, 9 Nov 2002 17:04:58 +0100 (MET)
Michael Grant wrote:

Judging from the other replies, I won't talk about Beowulf-style clustering and will concentrate on failover and HA solutions.

> One of the big things that cause me down time is upgrading the OS.
> I'm also worried about hardware failure (which luckily hasn't happened
> to me yet...) I too would like to achieve at least 5 nines.
>
> Let's say I have a cluster of n machines. Some of those n machines
> may be running a web server, some a shell server, some mail server,
> some pop/imap mail servers...etc. How is an incoming connection sent
> to the right machine? It seems like that there needs to be a single
> machine in front of the cluster to send connections the right way,
> isn't this a single point of failure?

Yes, in this case it is a single point of failure, but there are other ways to solve this.

> If you do have multiple machines answering requests, how's this done?
> With multiple IP addresses? I know one can specify multiple A
> records in DNS and that it'll do a sort of round-robin. But does this
> work well? What if one of the machines is down and a caching dns
> server returns an ip address of one of the down machines? Seems like
> you need then to start modifying the dns zone to take out the down
> machines and use a low ttl. This starts to get ugly quickly.
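The concern in the quoted question can be made concrete with a small simulation. This is a hedged Python sketch under stated assumptions, not a model of any real resolver: three A records in the 192.0.2.0/24 documentation range, one caching resolver, one client per second that connects only to the first address and never retries, and an operator who removes the dead record from the zone at t=400 s. All names and addresses are hypothetical.

```python
"""Toy model: round-robin A records behind a caching resolver.

Assumptions (illustrative only): three A records, one pointing at a dead
host; clients connect to the first address in the answer and never retry.
"""
from dataclasses import dataclass, field

DEAD = "192.0.2.2"  # hypothetical address of the machine that went down


@dataclass
class Zone:
    records: list  # A records published for the service name
    ttl: int       # seconds a resolver may cache the answer

    def answer(self, rotation):
        # The authoritative server rotates the record order on every query.
        k = rotation % len(self.records)
        return self.records[k:] + self.records[:k]


@dataclass
class CachingResolver:
    zone: Zone
    cached: list = field(default_factory=list)
    expires: int = -1
    queries: int = 0

    def resolve(self, now):
        # Only ask the authoritative server once the cached answer has expired.
        if now >= self.expires:
            self.cached = self.zone.answer(self.queries)
            self.queries += 1
            self.expires = now + self.zone.ttl
        return self.cached


def count_failures(ttl, remove_at, duration=900):
    """Number of seconds (one client per second) in which a client is sent to DEAD."""
    zone = Zone(["192.0.2.1", DEAD, "192.0.2.3"], ttl)
    resolver = CachingResolver(zone)
    failures = 0
    for now in range(duration):
        if now == remove_at:
            zone.records.remove(DEAD)  # the operator edits the zone
        if resolver.resolve(now)[0] == DEAD:
            failures += 1
    return failures


if __name__ == "__main__":
    for ttl in (300, 30):
        print(ttl, count_failures(ttl, remove_at=400))
```

In this run a 300-second TTL gives 300 failed connections, because the cached answer keeps pointing at the dead host until t=600 even though the zone was fixed at t=400; a 30-second TTL gives 150. A lower TTL narrows the window but does not remove it, and someone still has to notice the failure and edit the zone.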
No, you can configure the remaining systems to answer on the "bad" IP (e.g. via VRRP). Say you have n systems which provide the same service (e.g. HTTP). If one of those n systems goes down, one of the other n-1 systems takes over the IP of the failed system and answers requests for it too. You can use VRRP to do this: you don't need a single box in front of those n systems, because every one of the n systems runs a VRRP daemon, so there is no single point of failure.

> Second problem I have been thinking about is shared disk. I read a
> post by someone who also had this concern. One obvious way to solve
> the shared disk problem is to have another box which has a bunch of
> disks in a RAID configuration, and mount the disks via nfs. This disk
> box would probably need to be highly available with redundant power
> supplies and the like.

Another approach is to use e.g. AFS. You can spread the reads over multiple AFS "mirrors", but writes have to go to one specific box (at least this is how I understand it after reading a little about it; I haven't used AFS myself).

Another way is to keep the data in a database and let the database do the replication. MySQL (I'm talking about 3.x) has at least a working two-way replication feature. This way you also get load balancing on the data (you can read from and write to both, and the other one gets the data too). With PostgreSQL (and Oracle, ...) you can have n-way replication.

> However, I'm not so convinced that a third disk box is the right
> answer. I'd like to see something which could mirror (in real time) a
> file system over the lan, thus keeping 2+ disks in sync just like a
> RAID array spread over multiple systems. Does such a thing exist?
> After hours of searching, I could find nothing that did this.

I'm not aware of a free solution for this (but AFS may do something like this... for appropriate values of "something" and "like"). But is it necessary for your problem? E.g.
if your webserver has only a little static data that rarely changes, but a lot of dynamic pages which may change often, you don't need it. Use the replication feature of the database for "real-time" synchronisation of the dynamic data, and use e.g. rsync periodically for the static data (or keep the machines in sync on your own, or set up a master tree on one of the machines, separate from the tree the webserver serves from (the live tree), and periodically sync the master tree to every live tree). Obviously this isn't an option for mail/IMAP servers (unless they store all mail in a database).

> There seems to be essentially 2 types of clustering:
>
> 1) hot spare failovers
> 2) multiple machines operating in parallel
>
> (Perhaps someone could enlighten me if there are proper names for
> these).

In this case "2)" is "load balancing".

Your application has to support these features, but "application" doesn't mean e.g. "apache" here. Suppose you have a website which does session tracking (e.g. an e-commerce solution with a shopping basket), you do "1)", and the main system goes down. Everything in the basket is lost if the e-commerce engine does the wrong thing (e.g. storing the session data in the filesystem and identifying the session by its own hostname or something else which is host-specific); the customer has to start over. If the engine does the right thing (storing session data in a replicated database and not using host-specific data to identify the session), only the customer's last action may fail (if it happens at the moment the system goes down), and if he already has 100 items in the basket, he doesn't have to start from scratch.

> What's important to me at the moment is that if I have a user on one
> machine that goes down that they can get right back on another machine
> and get at their mail or files.
> Or if someone is surfing our site,
> they just automatically get files from the server that's up.

If you solve the file syncing problem, VRRP (net/freevrrpd) is an easy (and fast) way to achieve "1)". For "2)" you can add e.g. net/loadd (from the same author as freevrrpd) to the mix (I haven't tested loadd myself).

> Does anyone know of some list of clustering software? Is there
> anything I can use today to do #2 that runs on freebsd (or other bsd
> systems)?

Have a look at http://www.leidinger.net/cgi-bin/search.pl?q=cluster&num=10 for some links I found on this topic (freevrrpd and loadd are part of the HUT project).

Bye,
Alexander.

--
Speak softly and carry a cellular phone.
http://www.Leidinger.net Alexander @ Leidinger.net
GPG fingerprint = C518 BC70 E67F 143F BE91 3365 79E2 9C60 B006 3FE7
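Alexander's point that every one of the n hosts runs a VRRP daemon, so no front-end box is needed, can be illustrated with a toy election. This is a hedged Python sketch, not freevrrpd and not a VRRP implementation: there are no advertisements or timers, the election is computed centrally, and the host names, addresses (from the documentation ranges) and priorities are assumptions. It follows the VRRP rule that the highest priority wins and a tie goes to the higher primary IP address, and it treats each host as the owner (priority 255) of its own service address.

```python
"""Toy model of VRRP-style address takeover among n peers.

Illustration only: real VRRP elects a master per virtual router by exchanging
multicast advertisements; here the election is computed in one place.
"""
from ipaddress import IPv4Address

# Hypothetical cluster: host name -> primary (real) interface address.
HOSTS = {
    "web1": IPv4Address("192.0.2.11"),
    "web2": IPv4Address("192.0.2.12"),
    "web3": IPv4Address("192.0.2.13"),
}

# Hypothetical service addresses (e.g. published as DNS A records) -> owning host.
VIPS = {
    "198.51.100.1": "web1",
    "198.51.100.2": "web2",
    "198.51.100.3": "web3",
}

OWNER_PRIORITY = 255   # VRRP priority of the address owner
BACKUP_PRIORITY = 100  # VRRP default priority for backups


def priority(host, vip):
    """Priority `host` would advertise for the virtual router carrying `vip`."""
    return OWNER_PRIORITY if VIPS[vip] == host else BACKUP_PRIORITY


def elect_masters(alive):
    """Return {vip: master host} given the set of hosts that are still up."""
    masters = {}
    for vip in VIPS:
        candidates = [h for h in HOSTS if h in alive]
        if not candidates:
            continue  # every host is down; nobody answers for this address
        # Highest priority wins; ties go to the higher primary address.
        masters[vip] = max(candidates, key=lambda h: (priority(h, vip), HOSTS[h]))
    return masters


if __name__ == "__main__":
    print(elect_masters({"web1", "web2", "web3"}))
    # web2 fails: its address moves to a surviving peer.
    print(elect_masters({"web1", "web3"}))
```

With all three hosts up, each answers for its own address. When web2 fails, web1 and web3 tie at the backup priority for 198.51.100.2 and web3 takes it over because its primary address is higher, so clients that cached 198.51.100.2 keep getting answers without any DNS change.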
Date: Sun, 10 Nov 2002 12:19:39 +0100
From: Alexander Leidinger
To: "Omer Faruk Sen"
Cc: freebsd-cluster@FreeBSD.ORG
Subject: Re: clustering freebsd
Message-Id: <20021110121939.4e336b15.Alexander@Leidinger.net>
In-Reply-To: <20021110080857.45866.qmail@fuzuli.enderunix.org>

On Sun, 10 Nov 2002 03:08:57 -0500
"Omer Faruk Sen" wrote:

> PS: I think we can set up a FreeBSD Clustering page just like www.lcic.org
> does which provides information articles .... I can arrange a web page, ftp,
> cvs for it.

I'm sure we can find a way to integrate such a page into FreeBSD.org; we just have to find a soul we can suc^Wconvince to do the work.

Bye,
Alexander.

--
Where do you think you're going today?
http://www.Leidinger.net Alexander @ Leidinger.net
GPG fingerprint = C518 BC70 E67F 143F BE91 3365 79E2 9C60 B006 3FE7

Date: Sun, 10 Nov 2002 14:24:49 +0100 (MET)
Message-Id: <200211101324.gAADOng29727@splat.grant.org>
From: Michael Grant
To: freebsd-cluster@freebsd.org
Subject: Re: clustering freebsd

Thanks, all of you, for your excellent replies.

After reading several replies: yes, it's load balancing that I want to do. That looks very promising for solving the front-end machine problem. I have started looking around for load balancer devices; there seem to be quite a few on the market, in the US$10K range. Some of them, like Cisco's LocalDirector, seem to work in conjunction with a router. It's possible that my ISP has one of these or something similar.
A really cool one, only 1U high and fully redundant, is here:
http://www.loadbalancer.org/modules.php?name=Content&pa=showpage&pid=9

I'll look more into freevrrpd and loadd.

On the shared disk side, I haven't found the perfect solution (yet). This does seem to be a sticky problem. Several of you have said I don't want to do this. Well, I think I do, and in the worst case I'll end up using a separate HA-configured box and NFS-mounting it, like Compaq's ProLiant cluster thingy. I haven't given up looking for a better solution though.

The need for a shared writable file system mainly comes from having shell users with home directories and mailboxes. I have met great resistance from the users to moving mailboxes into any format other than plain mbox, i.e. moving mail into MySQL would not be a popular idea. If MySQL can replicate things like that, I wonder about implementing a file system on top of MySQL? The performance would probably suck though. I'm surprised that there isn't some extension to JFS (the Journaling File System) to do something like what I want.

Alexander mentions AFS. In fact, the folks who brought you AFS have something called Coda, which seems to be a network-replicated file system. I read up on using it; it's quite complicated and seems to have some restrictions on the size of file systems. It also seems to still be in an experimental state. It didn't give me a warm fuzzy feeling, but it certainly is cool. Does anyone else have experience using it?

I couldn't find much on iSCSI for FreeBSD, and it's not clear to me that you could have n systems writing to a RAID array with vinum. I can easily believe that you could have one system write to a vinum RAID array spread over several systems via iSCSI though. If anyone has any ideas on this front, I'd like to hear them.
Michael Grant

Message-ID: <3DCF8D5D.2080006@nentec.de>
Date: Mon, 11 Nov 2002 11:58:37 +0100
From: Andy Sporner
To: Omer Faruk Sen, freebsd-cluster
Subject: Re: clustering freebsd

Suddenly the list comes alive! :-) I like it!

>
> I want to share my experiences about HA in FreeBSD. We haven't much
> choice in FreeBSD for clustering. I have read about Sporner's
> (http://sporner.dyndns.org/freebsdclusters/ ) project a few days ago
> but didn't set it up. It seems promising but I think it lacks file
> replication? And I really want to hear from him about maturity level
> of his product.
First, maturity: it has been around for some time and I get downloads, but no real feedback, so I don't know how people are using it, *if* they are using it. I guess no news is good news, but...

No, there is no file replication. That is another matter. I had been thinking about a shared SCSI solution. There is a way to query which node is running as master. Since there are transition scripts that are run when a node leaves or joins, it is possible to say "if I am now the monitor, mount the filesystem as native UFS, otherwise mount it from NFS". This is a less than elegant solution to the problem, but short of a lock manager and a clustered filesystem there is no other way.

As for access, one can use the transition scripts to run "ifconfig add ...." to add an IP alias. We are lucky with FreeBSD in that it sends a gratuitous ARP when an address is configured.

I would be eager to see an HA Apache application running. I had thought about writing a white paper on the subject, but time being what it is, it's hard. I am currently working on a load-balancing switch for a commercial firm that handles almost all of the concerns around access failover (it runs FreeBSD and my HA software).

Please let me know what your experiences with the software are. Then I would feel better about spending more development time on it...

Thanks!

Andy
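Andy's rule ("if I am now the monitor, mount the filesystem as native UFS, otherwise mount it from NFS"), combined with moving an IP alias, could look roughly like the following. This is a hedged Python sketch that only builds the command lines instead of running them. The interface name, service address, device and mount point are hypothetical placeholders, and how a node learns that it is the master depends on Andy's software, so it is simply passed in as a boolean. The sketch uses the `alias`/`-alias` form of FreeBSD's ifconfig.

```python
"""Sketch of the decision a node transition script might make.

Everything here is hypothetical: interface, addresses, device and mount point
are placeholders, and whether this node is the master is passed in by the caller.
"""
import shlex

IFACE = "fxp0"              # hypothetical NIC name
SERVICE_IP = "192.0.2.50"   # hypothetical service address that follows the master
SHARED_DEV = "/dev/da1s1e"  # hypothetical partition on the shared SCSI disk
MOUNTPOINT = "/data"        # hypothetical mount point, NFS-exported by the master


def transition_commands(is_master):
    """Return the commands (as argv lists) this node should run after a transition."""
    # Drop whatever was mounted before (NFS or UFS); may fail harmlessly if nothing is.
    cmds = [["umount", MOUNTPOINT]]
    if is_master:
        # Only the master mounts the shared disk natively: UFS is not a cluster
        # filesystem, so two nodes mounting it read-write would corrupt it.
        cmds.append(["mount", "-t", "ufs", SHARED_DEV, MOUNTPOINT])
        # Take over the service address (per Andy, FreeBSD sends a gratuitous ARP).
        cmds.append(["ifconfig", IFACE, "inet", SERVICE_IP,
                     "netmask", "255.255.255.255", "alias"])
    else:
        # Give up the service address if this node held it (may fail harmlessly).
        cmds.append(["ifconfig", IFACE, "inet", SERVICE_IP, "-alias"])
        # Reach the data through whichever node currently holds the service address.
        cmds.append(["mount", "-t", "nfs", f"{SERVICE_IP}:{MOUNTPOINT}", MOUNTPOINT])
    return cmds


if __name__ == "__main__":
    for role in (True, False):
        print("master" if role else "other node")
        for cmd in transition_commands(role):
            print("  " + shlex.join(cmd))
```

A real transition script would run each list with `subprocess.run(cmd, check=...)` and decide which failures to tolerate; the master would also need to export the mount point over NFS (e.g. via /etc/exports) so that the other nodes can mount it. Mounting from the service address rather than a fixed hostname is a design choice that lets the NFS clients follow the master after a failover.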