From owner-freebsd-cluster Mon Mar 4 10:18:06 2002
Date: Mon, 4 Mar 2002 12:17:47 -0600 (CST)
From: Jason Fried
To: freebsd-cluster@FreeBSD.ORG
Subject: FreeBSD Cluster at SLU

I just finished setting up a cluster running FreeBSD 4.5 at Southeastern
Louisiana University, for the Computer Science Department. Each node is a
PII 450 MHz with 128 MB of RAM. It's small now, 10 nodes, but maybe I can
get a grant for more nodes. The cluster is using LAM/MPI along with
OpenPBS. Most of the jobs I've been running are benchmarks, but I hear the
Physics Department has a problem that is ideal for a cluster to crunch
away at.

--
Jason Fried
From owner-freebsd-cluster Tue Mar 5 1:21:59 2002
Date: Tue, 05 Mar 2002 10:21:54 +0100 (MET)
From: Andy Sporner
To: Jason Fried
Cc: freebsd-cluster@FreeBSD.ORG
Subject: RE: FreeBSD Cluster at SLU

Hi Jason,

I am curious about the focus of this group, which I have just joined; I
see few posts here. I have a clustering system that works on FreeBSD as
well. Its purpose is application failover. This weekend (I hope!!!) I will
release the next version (211), which provides centralized management of
the nodes in a cluster (you can see all the processes and do basic
administration). My next goal after 211 is process migration between
nodes.

Is the clustering discussed here only Beowulf-style clustering?

Thanks!

Andy Sporner

PS. The site is http://www.sporner.com/bsdclusters...

From owner-freebsd-cluster Tue Mar 5 7:13:53 2002
Date: Tue, 5 Mar 2002 09:13:31 -0600 (CST)
From: Jason Fried
To: Andy Sporner
Cc: freebsd-cluster@FreeBSD.ORG
Subject: RE: FreeBSD Cluster at SLU

Well, I just joined this list last week and didn't see any posts, so I
don't know. I guess this list covers all types of clustering projects
that use FreeBSD.

Your project sounds interesting.

I've been thinking about using some system for process migration, like
MOSIX, but that is Linux-only -- though I hear somebody has done the same
for FreeBSD.

Most of the time I spent working on my cluster went into handling the
setup of new nodes and changing configurations on existing nodes.

Now I just need to get a book on MPI programming and learn how to write
programs that take advantage of this cluster.

Jason Fried
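For a sense of what such programs look like, here is a minimal MPI "hello
world" in C, written against the standard MPI C API that LAM/MPI
implements (the file name and process count below are illustrative, not
anything from the thread):

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        int rank, size, len;
        char host[MPI_MAX_PROCESSOR_NAME];

        MPI_Init(&argc, &argv);               /* join the MPI run */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank); /* this process's rank */
        MPI_Comm_size(MPI_COMM_WORLD, &size); /* total process count */
        MPI_Get_processor_name(host, &len);   /* node it landed on */

        printf("rank %d of %d on %s\n", rank, size, host);

        MPI_Finalize();
        return 0;
    }

With LAM/MPI this would be compiled with mpicc hello.c -o hello and, once
lamboot has started the runtime on the nodes, launched with
mpirun -np 10 ./hello; each process prints its rank and host.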
From owner-freebsd-cluster Tue Mar 5 7:40:50 2002
Date: Tue, 05 Mar 2002 16:40:41 +0100 (MET)
From: Andy Sporner
To: Jason Fried
Cc: freebsd-cluster@FreeBSD.ORG
Subject: RE: FreeBSD Cluster at SLU

Hi,

> I've been thinking about using some system for process migration, like
> MOSIX, but that is Linux-only -- though I hear somebody has done the
> same for FreeBSD.

I don't know about that one, except for NOW and SPRITE. I would like to
expand on the 'jail' concept, whereby a virtual machine is spanned across
many physical machines. That way the only processes that can move are in
some sort of container. The biggest advantage I think I can bring is
migratable network sockets. Shared memory would also be available across
the machines (though I don't think that part is new).

> Most of the time I spent working on my cluster went into handling the
> setup of new nodes and changing configurations on existing nodes.

There is a thread going on about 'fish' in -hackers. I had thought about
the problem in my GUI, and I may actually try to address it: I already
handle configuration replication, and it would be a small matter to
include the settings in 'rc.conf' and other things (see the sketch after
this message).

> Now I just need to get a book on MPI programming and learn how to write
> programs that take advantage of this cluster.

Good luck!

Andy

PS. When I do the next release you might consider trying it, if for no
other reason than to have a central console...
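Folding a setting into rc.conf on each node comes down to merging a
key="value" line into a shell-style file. A minimal sketch of that merge
step in C -- not Andy's actual code; the function name and fixed buffer
sizes are illustrative:

    #include <stdio.h>
    #include <string.h>

    /*
     * Merge key="value" into an rc.conf-style file: rewrite the line if
     * the key already exists, append it otherwise.  Error handling is
     * kept minimal for the sketch.
     */
    int set_rcconf(const char *path, const char *key, const char *value)
    {
        char tmp[1024], line[1024];
        size_t klen = strlen(key);
        int found = 0;

        snprintf(tmp, sizeof tmp, "%s.new", path);
        FILE *in = fopen(path, "r");
        FILE *out = fopen(tmp, "w");
        if (in == NULL || out == NULL) {
            if (in)  fclose(in);
            if (out) fclose(out);
            return -1;
        }

        while (fgets(line, sizeof line, in) != NULL) {
            if (strncmp(line, key, klen) == 0 && line[klen] == '=') {
                fprintf(out, "%s=\"%s\"\n", key, value);
                found = 1;
            } else
                fputs(line, out);
        }
        if (!found)
            fprintf(out, "%s=\"%s\"\n", key, value);

        fclose(in);
        fclose(out);
        return rename(tmp, path);  /* atomic swap on one filesystem */
    }

Replication would then be a matter of shipping the same (key, value)
pairs to every node and running the merge there.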
From owner-freebsd-cluster Tue Mar 5 10:11:09 2002
Date: Tue, 5 Mar 2002 11:10:50 -0700 (MST)
From: Ronald G Minnich
To: Andy Sporner
Cc: Jason Fried, freebsd-cluster@FreeBSD.ORG
Subject: RE: FreeBSD Cluster at SLU

you could do a lot worse than just porting bproc to freebsd.

see www.clustermatic.org

ron

From owner-freebsd-cluster Wed Mar 6 0:20:40 2002
Date: Wed, 06 Mar 2002 09:20:21 +0100 (MET)
From: Andy Sporner
To: Ronald G Minnich
Cc: freebsd-cluster@FreeBSD.ORG, Jason Fried
Subject: RE: FreeBSD Cluster at SLU

Looks nice, but very asymmetric...

On 05-Mar-02 Ronald G Minnich wrote:
> you could do a lot worse than just porting bproc to freebsd.
>
> see www.clustermatic.org
From owner-freebsd-cluster Wed Mar 6 7:22:28 2002
Date: Wed, 6 Mar 2002 08:22:23 -0700 (MST)
From: Ronald G Minnich
To: Andy Sporner
Cc: freebsd-cluster@FreeBSD.ORG, Jason Fried
Subject: RE: FreeBSD Cluster at SLU

On Wed, 6 Mar 2002, Andy Sporner wrote:

> Looks nice, but very asymmetric...

and that's good in a cluster. Asymmetry is very, very good. There is no
need to do SSI (a single system image) on all the nodes in the cluster --
just the node you log into.

SSI on 1024 nodes is a huge mistake.

ron

From owner-freebsd-cluster Wed Mar 6 7:47:58 2002
Date: Wed, 06 Mar 2002 16:47:42 +0100 (MET)
From: Andy Sporner
To: Ronald G Minnich
Cc: Jason Fried, freebsd-cluster@FreeBSD.ORG
Subject: RE: FreeBSD Cluster at SLU

Within reason I agree... However, having everything in one place defeats
high availability on a cluster -- but we may be talking about different
things here. I am looking at making Unix machines more reliable, to get
to 99.999% uptime. If your configuration image is on one machine, then
you have no backups. The cluster approach I designed replicates the
configuration to cover this, so your "Cluster Monitor" node can fail over
when that machine fails (should it...).

On 06-Mar-02 Ronald G Minnich wrote:
> and that's good in a cluster. Asymmetry is very, very good. There is no
> need to do SSI (a single system image) on all the nodes in the cluster
> -- just the node you log into.
>
> SSI on 1024 nodes is a huge mistake.
From owner-freebsd-cluster Wed Mar 6 7:52:04 2002
Date: Wed, 6 Mar 2002 08:52:00 -0700 (MST)
From: Ronald G Minnich
To: Andy Sporner
Cc: Jason Fried, freebsd-cluster@FreeBSD.ORG
Subject: RE: FreeBSD Cluster at SLU

On Wed, 6 Mar 2002, Andy Sporner wrote:

> Within reason I agree... However, having everything in one place
> defeats high availability on a cluster -- but we may be talking about
> different things here.

no, this is actually the funny thing about uptime. People frequently
confuse three different things:

- a system with Multiple Points of Failure (MPOF), which strictly
  speaking has no *single* point of failure
- a system with a Single Point of Failure (SPOF)
- a system with no SPOF at all

Often, people build systems with MPOF and mistakenly think they have
achieved a system with no SPOF. Wrong. We're just trying to get to a
system with a SPOF -- harder than it looks.

> I am looking at making Unix machines more reliable, to get to 99.999%
> uptime.

You can actually do this with one node. It's doing it with lots of nodes
that is hard.

> If your configuration image is on one machine, then you have no
> backups.

See above.

> The cluster approach I designed replicates the configuration to cover
> this, so your "Cluster Monitor" node can fail over when that machine
> fails (should it...).

How large have you made your system to date? How many nodes? Have you
built it?

ron
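The numbers being traded here are easy to put on paper. A quick
back-of-the-envelope sketch (the per-node availability is an illustrative
assumption, not anyone's measurement): five nines allows about 5.3
minutes of downtime a year; a chain of nodes that must all be up
multiplies availabilities together, while a redundant pair squares the
unavailability:

    #include <stdio.h>
    #include <math.h>

    #define MIN_PER_YEAR (365.25 * 24 * 60)

    int main(void)
    {
        double a = 0.999;  /* assumed availability of one node */

        /* five nines: allowed downtime per year */
        printf("five nines:     %6.2f min/yr down\n",
               (1.0 - 0.99999) * MIN_PER_YEAR);

        /* 10 nodes that must all be up: availabilities multiply */
        printf("10-node chain:  %6.0f min/yr down\n",
               (1.0 - pow(a, 10)) * MIN_PER_YEAR);

        /* redundant pair: down only if both halves fail at once */
        printf("redundant pair: %6.2f min/yr down\n",
               (1.0 - a) * (1.0 - a) * MIN_PER_YEAR);
        return 0;
    }

Compiled with cc avail.c -lm, this prints roughly 5.26, 5236, and 0.53
minutes per year respectively -- which is Ron's point: one good node can
hit five nines, but many nodes in series cannot, unless failures are
decoupled by redundancy.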
From owner-freebsd-cluster Wed Mar 6 8:24:59 2002
Date: Wed, 06 Mar 2002 17:24:24 +0100 (MET)
From: Andy Sporner
To: Ronald G Minnich
Cc: freebsd-cluster@FreeBSD.ORG, Jason Fried
Subject: RE: FreeBSD Cluster at SLU

Hi Ron,

Hopefully this thread will bring some life to this group...

> no, this is actually the funny thing about uptime.

Yes, I agree. Here is where I am coming from. In 1995 I started working
with Sequent clusters. I saw a need to provide clustering such as what
was done with VAX clusters some time ago; on that model a large cluster
is about 32 nodes, and the application is business software (Oracle and
the like). I realize times have changed and clusters are much larger now.
But since then I have done systems architecture at two major corporations
that calculated downtime in millions of dollars per hour, so I am well
aware of the impacts that need to be addressed.

Up until now my focus has been application failover and nothing more (in
the tradition of the original Sequent clusters), except for a few
differences, most notably the lack of a distributed lock manager. Since
the goal is simple application failover, it wasn't needed. I'm not up to
date on what Oracle has been up to in version 8; they may have
implemented this outside the O/S by now. Version 7, which I did have
exposure to, needed the support in the O/S.

Again, my focus is a computing platform on which networking services can
be scaled reliably -- a platform with NO SPOF, where every component has
a redundant member. I don't think I have to tell you that even this
doesn't work completely... ;-)

> We're just trying to get to a system with a SPOF -- harder than it
> looks.

Clear. The "Monitor Node" does all of the administration on my clustering
system and the other nodes are passive. There is a "lady in waiting"
should the master fail. This is computed dynamically as nodes enter and
leave the cluster, in a deterministic manner, so there can be no doubt
which node will take over the monitor responsibility in the event of the
monitor node failing...
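Andy doesn't say what the deterministic computation is. One rule with the
property he wants -- every member derives the same monitor and the same
successor from the same membership list -- is to rank live nodes by a
fixed node ID. A sketch under that assumption (the lowest-ID convention
and the function names are illustrative, not his design):

    #include <stddef.h>

    /*
     * Pick the lowest live node ID, optionally skipping one node.
     * Node IDs are assumed non-negative; -1 means "skip nobody".
     * Every member evaluates the same list the same way, so there
     * is never doubt about the result.
     */
    static int lowest_id(const int *ids, size_t n, int skip)
    {
        int best = -1;
        for (size_t i = 0; i < n; i++)
            if (ids[i] != skip && (best == -1 || ids[i] < best))
                best = ids[i];
        return best;                  /* -1 if no candidate */
    }

    int monitor_node(const int *ids, size_t n)
    {
        return lowest_id(ids, n, -1);
    }

    /* The "lady in waiting": next in line if the monitor fails. */
    int lady_in_waiting(const int *ids, size_t n)
    {
        return lowest_id(ids, n, monitor_node(ids, n));
    }

Re-running this whenever membership changes gives the dynamic
recomputation Andy describes; the hard parts he alludes to next (stale
nodes, split brain) lie in agreeing on the membership list itself.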
As the monitor node updates its configuration, it passes the updates to
the other nodes. There is a lot of logic to prevent stale nodes from
entering the cluster, and other mishaps like the "split brain" scenario.

> > I am looking at making Unix machines more reliable, to get to 99.999%
> > uptime.
>
> You can actually do this with one node. It's doing it with lots of
> nodes that is hard.

Clear again, though I think only on an IBM mainframe or hardware of that
reliability. But let's not split hairs over this; it would take us off
topic.

> How large have you made your system to date? How many nodes? Have you
> built it?

Six nodes, and it works very well. I have a new version that provides a
centralized interface to the uptime and statistics of all of the nodes.
This is a prelude to a single process-table image across all of the nodes
in the cluster, which is the next major release -- easily a year away
(unless I find helpers! :-)

The idea is that wherever a process is started, it makes an entry in the
process table. PIDs are assigned in an N-modulus approach, so that the
PID determines the home node of the process. When a process migrates, it
keeps its entry on the home node and a new entry is created on the new
host node. If it moves again, the home node is updated. I haven't started
implementing or benchmarking this yet, so it could change, but that is
the initial idea (a sketch of the PID arithmetic follows this message).

Since the model is a scalable networking application platform, all of the
aspects of a process move with it (including sockets). The idea is that
you can telnet into a machine and have your in.telnetd and shell migrate
to another machine without breaking the connection. This uses a gateway
device, which keeps track of all of the sessions; when a process moves,
the session is updated to point to the new host machine. The gateway
needs to be redundant, and that is where the current generation of the
cluster software is put to work.

There is no hard-coded limit on how many nodes can be in the cluster. As
I recall, MOSIX has a limit. Last I heard they also had some issues with
creating a network-coherent memory space, and I think there was some
problem with open source (because of some military involvement in
Israel).

But I have digressed. The point is to apply an SMP approach to a network
of computers, as NUMA does, but without the O/S being a single point of
failure. If a node dies, only the programs that had resources there fail,
and they can be restarted immediately. The larger the cluster, hopefully
the smaller the impact; then it is simply a matter of statistics to
calculate downtime.

Regards

Andy
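The N-modulus scheme reduces to an arithmetic invariant: node k only
hands out PIDs congruent to k mod N, so any node can compute a process's
home node from the PID alone, with no lookup table. A sketch of that
invariant -- Andy had not implemented this yet, so the names and the
wraparound-free allocation are illustrative:

    #include <sys/types.h>

    /* The home node of a process is encoded in its PID. */
    int home_node(pid_t pid, int nnodes)
    {
        return (int)(pid % nnodes);
    }

    /*
     * Allocation on a given node stays in its congruence class: start
     * at the node's own ID and step by nnodes.  (PID wraparound and
     * reuse are ignored in this sketch.)
     */
    pid_t next_pid(pid_t last, int nnodes)
    {
        return last + nnodes;
    }

    /*
     * The home node keeps the authoritative entry; when the process
     * migrates, only host_node changes, so a lookup by PID still finds
     * where the process currently runs.
     */
    struct proc_entry {
        pid_t pid;
        int   host_node;   /* node the process is running on now */
    };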
From owner-freebsd-cluster Wed Mar 6 10:08:16 2002
Date: Wed, 6 Mar 2002 11:07:51 -0700 (MST)
From: Ronald G Minnich
To: Andy Sporner
Cc: freebsd-cluster@FreeBSD.ORG, Jason Fried
Subject: RE: FreeBSD Cluster at SLU

On Wed, 6 Mar 2002, Andy Sporner wrote:

> The idea is that wherever a process is started, it makes an entry in
> the process table. PIDs are assigned in an N-modulus approach, so that
> the PID determines the home node of the process. When a process
> migrates, it keeps its entry on the home node and a new entry is
> created on the new host node. If it moves again, the home node is
> updated.

this is very similar to bproc. Would a single hot-spare approach do the
job? I do know there is a telecom company using bproc to do this type of
thing.

> Since the model is a scalable networking application platform, all of
> the aspects of a process move with it (including sockets).

movable sockets sure would be nice.

your work sounds neat.

ron
From owner-freebsd-cluster Wed Mar 6 23:39:55 2002
Date: Thu, 07 Mar 2002 08:39:34 +0100 (MET)
From: Andy Sporner
To: Ronald G Minnich
Cc: Jason Fried, freebsd-cluster@FreeBSD.ORG
Subject: RE: FreeBSD Cluster at SLU

Hi Ron,

> this is very similar to bproc. Would a single hot-spare approach do the
> job?

Well, for scalability reasons, probably not. On the other hand, it would
also be very bad to be playing "hot potato" with an unruly process that
wants to dominate a machine's resources. No doubt some very complicated
handling will need to be added. I remember all the trouble they had with
NUMA and quad affinity; resource affinity (like shared memory) will also
have to be looked at.

I think you have convinced me to look into the effort of porting 'bproc'
to FreeBSD. It would certainly make a good starting point in the
direction I want to go -- and reduce certain pains. More on that later,
when I have had a look at it.

> movable sockets sure would be nice.
>
> your work sounds neat.

Thanks! Likewise!

Andy