From owner-freebsd-cluster Mon Mar 4 10:18:06 2002
Date: Mon, 4 Mar 2002 12:17:47 -0600 (CST)
From: Jason Fried
To: freebsd-cluster@FreeBSD.ORG
Subject: FreeBSD Cluster at SLU

I just finished setting up a cluster running FreeBSD 4.5 at Southeastern
Louisiana University, for the Computer Science Department. Each node is a
PII 450 MHz with 128 MB of RAM. It's small now, 10 nodes, but maybe I can
get a grant for more nodes. The cluster is using LAM/MPI along with
OpenPBS. Most of the jobs I've been running are benchmarks, but I hear the
Physics Department has a problem that is ideal for a cluster to crunch
away at.

--
Jason Fried
From owner-freebsd-cluster Tue Mar 5 1:21:59 2002
Date: Tue, 05 Mar 2002 10:21:54 +0100 (MET)
From: Andy Sporner
To: Jason Fried
Cc: freebsd-cluster@FreeBSD.ORG
Subject: RE: FreeBSD Cluster at SLU

Hi Jason,

I am curious about the focus of this group, which I have just joined; I
see few posts here. I have a clustering system that works on FreeBSD as
well. Its purpose is application failover. This weekend (I hope!!!) I will
release the next version (211), which provides centralized management of
the nodes in a cluster (you can see all the processes and do basic
administration). My next goal after 211 is process migration between
nodes.

Is the clustering discussed here only Beowulf-style clustering?

Thanks!

Andy Sporner

PS. The site is http://www.sporner.com/bsdclusters...

From owner-freebsd-cluster Tue Mar 5 7:13:53 2002
Date: Tue, 5 Mar 2002 09:13:31 -0600 (CST)
From: Jason Fried
To: Andy Sporner
Cc: freebsd-cluster@FreeBSD.ORG
Subject: RE: FreeBSD Cluster at SLU

Well, I just joined this list last week and didn't see any posts, so I
don't know. I guess this list covers all types of clustering projects
that use FreeBSD.

Your project sounds interesting.

I've been thinking about using some system for process migration, like
MOSIX, but that is Linux-only -- though I hear somebody has done the same
for FreeBSD.

Most of the time I spent working on my cluster went into handling the
setup of new nodes and changing configurations on existing nodes.

Now I just need to get a book on MPI programming and learn how to write
programs that take advantage of this cluster.

Jason Fried
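For a sense of what such programs look like, here is a minimal MPI "hello
world" in C, written against the standard MPI C API that LAM/MPI
implements (the file name and process count below are illustrative, not
anything from the thread):

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        int rank, size, len;
        char host[MPI_MAX_PROCESSOR_NAME];

        MPI_Init(&argc, &argv);               /* join the MPI run */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank); /* this process's rank */
        MPI_Comm_size(MPI_COMM_WORLD, &size); /* total process count */
        MPI_Get_processor_name(host, &len);   /* node it landed on */

        printf("rank %d of %d on %s\n", rank, size, host);

        MPI_Finalize();
        return 0;
    }

With LAM/MPI this would be compiled with mpicc hello.c -o hello and, once
lamboot has started the runtime on the nodes, launched with
mpirun -np 10 ./hello; each process prints its rank and host.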
From owner-freebsd-cluster Tue Mar 5 7:40:50 2002
Date: Tue, 05 Mar 2002 16:40:41 +0100 (MET)
From: Andy Sporner
To: Jason Fried
Cc: freebsd-cluster@FreeBSD.ORG
Subject: RE: FreeBSD Cluster at SLU

Hi,

> I've been thinking about using some system for process migration, like
> MOSIX, but that is Linux-only -- though I hear somebody has done the
> same for FreeBSD.

I don't know about that one, except for NOW and SPRITE. I would like to
expand on the 'jail' concept, whereby a virtual machine is spanned across
many physical machines. That way the only processes that can move are in
some sort of container. The biggest advantage I think I can bring is
migratable network sockets. Shared memory would also be available across
the machines (though I don't think that part is new).

> Most of the time I spent working on my cluster went into handling the
> setup of new nodes and changing configurations on existing nodes.

There is a thread going on about 'fish' in -hackers. I had thought about
the problem in my GUI, and I may actually try to address it: I already
handle configuration replication, and it would be a small matter to
include the settings in 'rc.conf' and other things (see the sketch after
this message).

> Now I just need to get a book on MPI programming and learn how to write
> programs that take advantage of this cluster.

Good luck!

Andy

PS. When I do the next release you might consider trying it, if for no
other reason than to have a central console...
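Folding a setting into rc.conf on each node comes down to merging a
key="value" line into a shell-style file. A minimal sketch of that merge
step in C -- not Andy's actual code; the function name and fixed buffer
sizes are illustrative:

    #include <stdio.h>
    #include <string.h>

    /*
     * Merge key="value" into an rc.conf-style file: rewrite the line if
     * the key already exists, append it otherwise.  Error handling is
     * kept minimal for the sketch.
     */
    int set_rcconf(const char *path, const char *key, const char *value)
    {
        char tmp[1024], line[1024];
        size_t klen = strlen(key);
        int found = 0;

        snprintf(tmp, sizeof tmp, "%s.new", path);
        FILE *in = fopen(path, "r");
        FILE *out = fopen(tmp, "w");
        if (in == NULL || out == NULL) {
            if (in)  fclose(in);
            if (out) fclose(out);
            return -1;
        }

        while (fgets(line, sizeof line, in) != NULL) {
            if (strncmp(line, key, klen) == 0 && line[klen] == '=') {
                fprintf(out, "%s=\"%s\"\n", key, value);
                found = 1;
            } else
                fputs(line, out);
        }
        if (!found)
            fprintf(out, "%s=\"%s\"\n", key, value);

        fclose(in);
        fclose(out);
        return rename(tmp, path);  /* atomic swap on one filesystem */
    }

Replication would then be a matter of shipping the same (key, value)
pairs to every node and running the merge there.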
From owner-freebsd-cluster Tue Mar 5 10:11:09 2002
Date: Tue, 5 Mar 2002 11:10:50 -0700 (MST)
From: Ronald G Minnich
To: Andy Sporner
Cc: Jason Fried, freebsd-cluster@FreeBSD.ORG
Subject: RE: FreeBSD Cluster at SLU

you could do a lot worse than just porting bproc to freebsd.

see www.clustermatic.org

ron

From owner-freebsd-cluster Wed Mar 6 0:20:40 2002
Date: Wed, 06 Mar 2002 09:20:21 +0100 (MET)
From: Andy Sporner
To: Ronald G Minnich
Cc: freebsd-cluster@FreeBSD.ORG, Jason Fried
Subject: RE: FreeBSD Cluster at SLU

Looks nice, but very asymmetric...

On 05-Mar-02 Ronald G Minnich wrote:
> you could do a lot worse than just porting bproc to freebsd.
>
> see www.clustermatic.org
From owner-freebsd-cluster Wed Mar 6 7:22:28 2002
Date: Wed, 6 Mar 2002 08:22:23 -0700 (MST)
From: Ronald G Minnich
To: Andy Sporner
Cc: freebsd-cluster@FreeBSD.ORG, Jason Fried
Subject: RE: FreeBSD Cluster at SLU

On Wed, 6 Mar 2002, Andy Sporner wrote:

> Looks nice, but very asymmetric...

and that's good in a cluster. Asymmetry is very, very good. There is no
need to do SSI (a single system image) on all the nodes in the cluster --
just the node you log into.

SSI on 1024 nodes is a huge mistake.

ron

From owner-freebsd-cluster Wed Mar 6 7:47:58 2002
Date: Wed, 06 Mar 2002 16:47:42 +0100 (MET)
From: Andy Sporner
To: Ronald G Minnich
Cc: Jason Fried, freebsd-cluster@FreeBSD.ORG
Subject: RE: FreeBSD Cluster at SLU

Within reason I agree... However, having everything in one place defeats
high availability on a cluster -- but we may be talking about different
things here. I am looking at making Unix machines more reliable, to get
to 99.999% uptime. If your configuration image is on one machine, then
you have no backups. The cluster approach I designed replicates the
configuration to cover this, so your "Cluster Monitor" node can fail over
when that machine fails (should it...).

On 06-Mar-02 Ronald G Minnich wrote:
> and that's good in a cluster. Asymmetry is very, very good. There is no
> need to do SSI (a single system image) on all the nodes in the cluster
> -- just the node you log into.
>
> SSI on 1024 nodes is a huge mistake.
From owner-freebsd-cluster Wed Mar 6 7:52:04 2002
Date: Wed, 6 Mar 2002 08:52:00 -0700 (MST)
From: Ronald G Minnich
To: Andy Sporner
Cc: Jason Fried, freebsd-cluster@FreeBSD.ORG
Subject: RE: FreeBSD Cluster at SLU

On Wed, 6 Mar 2002, Andy Sporner wrote:

> Within reason I agree... However, having everything in one place
> defeats high availability on a cluster -- but we may be talking about
> different things here.

no, this is actually the funny thing about uptime. People frequently
confuse three different things:

- a system with Multiple Points of Failure (MPOF), which strictly
  speaking has no *single* point of failure
- a system with a Single Point of Failure (SPOF)
- a system with no SPOF at all

Often, people build systems with MPOF and mistakenly think they have
achieved a system with no SPOF. Wrong. We're just trying to get to a
system with a SPOF -- harder than it looks.

> I am looking at making Unix machines more reliable, to get to 99.999%
> uptime.

You can actually do this with one node. It's doing it with lots of nodes
that is hard.

> If your configuration image is on one machine, then you have no
> backups.

See above.

> The cluster approach I designed replicates the configuration to cover
> this, so your "Cluster Monitor" node can fail over when that machine
> fails (should it...).

How large have you made your system to date? How many nodes? Have you
built it?

ron
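The numbers being traded here are easy to put on paper. A quick
back-of-the-envelope sketch (the per-node availability is an illustrative
assumption, not anyone's measurement): five nines allows about 5.3
minutes of downtime a year; a chain of nodes that must all be up
multiplies availabilities together, while a redundant pair squares the
unavailability:

    #include <stdio.h>
    #include <math.h>

    #define MIN_PER_YEAR (365.25 * 24 * 60)

    int main(void)
    {
        double a = 0.999;  /* assumed availability of one node */

        /* five nines: allowed downtime per year */
        printf("five nines:     %6.2f min/yr down\n",
               (1.0 - 0.99999) * MIN_PER_YEAR);

        /* 10 nodes that must all be up: availabilities multiply */
        printf("10-node chain:  %6.0f min/yr down\n",
               (1.0 - pow(a, 10)) * MIN_PER_YEAR);

        /* redundant pair: down only if both halves fail at once */
        printf("redundant pair: %6.2f min/yr down\n",
               (1.0 - a) * (1.0 - a) * MIN_PER_YEAR);
        return 0;
    }

Compiled with cc avail.c -lm, this prints roughly 5.26, 5236, and 0.53
minutes per year respectively -- which is Ron's point: one good node can
hit five nines, but many nodes in series cannot, unless failures are
decoupled by redundancy.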
From owner-freebsd-cluster Wed Mar 6 8:24:59 2002
Date: Wed, 06 Mar 2002 17:24:24 +0100 (MET)
From: Andy Sporner
To: Ronald G Minnich
Cc: freebsd-cluster@FreeBSD.ORG, Jason Fried
Subject: RE: FreeBSD Cluster at SLU

Hi Ron,

Hopefully this thread will bring some life to this group...

> no, this is actually the funny thing about uptime.

Yes, I agree. Here is where I am coming from. In 1995 I started working
with Sequent clusters. I saw a need to provide clustering such as what
was done with VAX clusters some time ago; on that model a large cluster
is about 32 nodes, and the application is business software (Oracle and
the like). I realize times have changed and clusters are much larger now.
But since then I have done systems architecture at two major corporations
that calculated downtime in millions of dollars per hour, so I am well
aware of the impacts that need to be addressed.

Up until now my focus has been application failover and nothing more (in
the tradition of the original Sequent clusters), except for a few
differences, most notably the lack of a distributed lock manager. Since
the goal is simple application failover, it wasn't needed. I'm not up to
date on what Oracle has been up to in version 8; they may have
implemented this outside the O/S by now. Version 7, which I did have
exposure to, needed the support in the O/S.

Again, my focus is a computing platform on which networking services can
be scaled reliably -- a platform with NO SPOF, where every component has
a redundant member. I don't think I have to tell you that even this
doesn't work completely... ;-)

> We're just trying to get to a system with a SPOF -- harder than it
> looks.

Clear. The "Monitor Node" does all of the administration on my clustering
system and the other nodes are passive. There is a "lady in waiting"
should the master fail. This is computed dynamically as nodes enter and
leave the cluster, in a deterministic manner, so there can be no doubt
which node will take over the monitor responsibility in the event of the
monitor node failing...
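Andy doesn't say what the deterministic computation is. One rule with the
property he wants -- every member derives the same monitor and the same
successor from the same membership list -- is to rank live nodes by a
fixed node ID. A sketch under that assumption (the lowest-ID convention
and the function names are illustrative, not his design):

    #include <stddef.h>

    /*
     * Pick the lowest live node ID, optionally skipping one node.
     * Node IDs are assumed non-negative; -1 means "skip nobody".
     * Every member evaluates the same list the same way, so there
     * is never doubt about the result.
     */
    static int lowest_id(const int *ids, size_t n, int skip)
    {
        int best = -1;
        for (size_t i = 0; i < n; i++)
            if (ids[i] != skip && (best == -1 || ids[i] < best))
                best = ids[i];
        return best;                  /* -1 if no candidate */
    }

    int monitor_node(const int *ids, size_t n)
    {
        return lowest_id(ids, n, -1);
    }

    /* The "lady in waiting": next in line if the monitor fails. */
    int lady_in_waiting(const int *ids, size_t n)
    {
        return lowest_id(ids, n, monitor_node(ids, n));
    }

Re-running this whenever membership changes gives the dynamic
recomputation Andy describes; the hard parts he alludes to next (stale
nodes, split brain) lie in agreeing on the membership list itself.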
As the monitor node updates its configuration, it passes the updates to
the other nodes. There is a lot of logic to prevent stale nodes from
entering the cluster, and other mishaps like the "split brain" scenario.

> > I am looking at making Unix machines more reliable, to get to 99.999%
> > uptime.
>
> You can actually do this with one node. It's doing it with lots of
> nodes that is hard.

Clear again, though I think only on an IBM mainframe or hardware of that
reliability. But let's not split hairs over this; it would take us off
topic.

> How large have you made your system to date? How many nodes? Have you
> built it?

Six nodes, and it works very well. I have a new version that provides a
centralized interface to the uptime and statistics of all of the nodes.
This is a prelude to a single process-table image across all of the nodes
in the cluster, which is the next major release -- easily a year away
(unless I find helpers! :-)

The idea is that wherever a process is started, it makes an entry in the
process table. PIDs are assigned in an N-modulus approach, so that the
PID determines the home node of the process. When a process migrates, it
keeps its entry on the home node and a new entry is created on the new
host node. If it moves again, the home node is updated. I haven't started
implementing or benchmarking this yet, so it could change, but that is
the initial idea (a sketch of the PID arithmetic follows this message).

Since the model is a scalable networking application platform, all of the
aspects of a process move with it (including sockets). The idea is that
you can telnet into a machine and have your in.telnetd and shell migrate
to another machine without breaking the connection. This uses a gateway
device, which keeps track of all of the sessions; when a process moves,
the session is updated to point to the new host machine. The gateway
needs to be redundant, and that is where the current generation of the
cluster software is put to work.

There is no hard-coded limit on how many nodes can be in the cluster. As
I recall, MOSIX has a limit. Last I heard they also had some issues with
creating a network-coherent memory space, and I think there was some
problem with open source (because of some military involvement in
Israel).

But I have digressed. The point is to apply an SMP approach to a network
of computers, as NUMA does, but without the O/S being a single point of
failure. If a node dies, only the programs that had resources there fail,
and they can be restarted immediately. The larger the cluster, hopefully
the smaller the impact; then it is simply a matter of statistics to
calculate downtime.

Regards

Andy
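The N-modulus scheme reduces to an arithmetic invariant: node k only
hands out PIDs congruent to k mod N, so any node can compute a process's
home node from the PID alone, with no lookup table. A sketch of that
invariant -- Andy had not implemented this yet, so the names and the
wraparound-free allocation are illustrative:

    #include <sys/types.h>

    /* The home node of a process is encoded in its PID. */
    int home_node(pid_t pid, int nnodes)
    {
        return (int)(pid % nnodes);
    }

    /*
     * Allocation on a given node stays in its congruence class: start
     * at the node's own ID and step by nnodes.  (PID wraparound and
     * reuse are ignored in this sketch.)
     */
    pid_t next_pid(pid_t last, int nnodes)
    {
        return last + nnodes;
    }

    /*
     * The home node keeps the authoritative entry; when the process
     * migrates, only host_node changes, so a lookup by PID still finds
     * where the process currently runs.
     */
    struct proc_entry {
        pid_t pid;
        int   host_node;   /* node the process is running on now */
    };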
From owner-freebsd-cluster Wed Mar 6 10:08:16 2002
Date: Wed, 6 Mar 2002 11:07:51 -0700 (MST)
From: Ronald G Minnich
To: Andy Sporner
Cc: freebsd-cluster@FreeBSD.ORG, Jason Fried
Subject: RE: FreeBSD Cluster at SLU

On Wed, 6 Mar 2002, Andy Sporner wrote:

> The idea is that wherever a process is started, it makes an entry in
> the process table. PIDs are assigned in an N-modulus approach, so that
> the PID determines the home node of the process. When a process
> migrates, it keeps its entry on the home node and a new entry is
> created on the new host node. If it moves again, the home node is
> updated.

this is very similar to bproc. Would a single hot-spare approach do the
job? I do know there is a telecom company using bproc to do this type of
thing.

> Since the model is a scalable networking application platform, all of
> the aspects of a process move with it (including sockets).

movable sockets sure would be nice.

your work sounds neat.

ron
From owner-freebsd-cluster Wed Mar 6 23:39:55 2002
Date: Thu, 07 Mar 2002 08:39:34 +0100 (MET)
From: Andy Sporner
To: Ronald G Minnich
Cc: Jason Fried, freebsd-cluster@FreeBSD.ORG
Subject: RE: FreeBSD Cluster at SLU

Hi Ron,

> this is very similar to bproc. Would a single hot-spare approach do the
> job?

Well, for scalability reasons, probably not. On the other hand, it would
also be very bad to be playing "hot potato" with an unruly process that
wants to dominate a machine's resources. No doubt some very complicated
handling will need to be added. I remember all the trouble they had with
NUMA and quad affinity; resource affinity (like shared memory) will also
have to be looked at.

I think you have convinced me to look into the effort of porting 'bproc'
to FreeBSD. It would certainly make a good starting point in the
direction I want to go -- and reduce certain pains. More on that later,
when I have had a look at it.

> movable sockets sure would be nice.
>
> your work sounds neat.

Thanks! Likewise!

Andy