Date:      Wed, 19 Mar 1997 05:57:58 -0800 (PST)
From:      Mike Pritchard <mpp>
To:        jkh@time.cdrom.com (Jordan K. Hubbard)
Cc:        hackers@freebsd.org
Subject:   Re: dup3() - I've thought it over and decided...
Message-ID:  <199703191357.FAA22301@freefall.freebsd.org>
In-Reply-To: <20682.858762363@time.cdrom.com> from "Jordan K. Hubbard" at Mar 19, 97 01:06:03 am

Jordan K. Hubbard wrote:
> 
> > How about going more along the checkpoint/restart route?  Suspend
> > the process, checkpoint it, and on restart you can reconnect stdin/out/err
> > to either the current tty, or to another file.  Reconnecting to
> > a pipeline should also be possible.  I can dig up the USENIX paper
> > the Cray guys wrote on this if you like :-).
> 
> Does the checkpoint feature allow you to complete snapshot process
> state?  That's another item on my wishlist. :-)
> 
> I wouldn't mind reading that paper, if you can dig it up.

The best I could come up with for now was the information available
from www.usenix.org:

Authors: Brent A. Kingsbury, John T. Kline, Cray Research
Title: Job and process recovery in a UNIX-based operating system
Winter 1989 USENIX

However, while searching around, I found what looks like a POSIX draft
that has something to say on the matter of checkpoint/restart.

As for Cray's implementation, yes, it allows you to create a complete
snapshot of the process, process group, or session.  At this point you
could either kill the proc/pgrp/session for later restart, or allow
it to keep running and only use the snapshot in case of a system crash.
I was involved in some work on this that allowed you to checkpoint the 
process on one machine and then restart it on another for load leveling 
purposes.
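
To make the "reconnect stdin/out/err on restart" idea concrete, here is
a minimal sketch of what a restart step might do with plain open() and
dup2().  This is not Cray's code: reconnect_fd() and reconnect_stdio()
are names made up for illustration, and a real restart would also have
to restore file offsets, tty state, and so on.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: make descriptor `target` refer to `path`.
 * Returns 0 on success, -1 on failure with errno set by open()/dup2().
 */
int
reconnect_fd(int target, const char *path, int flags)
{
	int fd, saved;

	fd = open(path, flags, 0644);
	if (fd < 0)
		return (-1);
	if (fd != target) {
		if (dup2(fd, target) < 0) {
			saved = errno;	/* keep dup2's errno across close() */
			close(fd);
			errno = saved;
			return (-1);
		}
		close(fd);		/* target now holds the only copy */
	}
	return (0);
}

/*
 * Hypothetical restart-time step: reattach stdin to in_path and
 * stdout/stderr to out_path (which may be a tty device or a file).
 */
int
reconnect_stdio(const char *in_path, const char *out_path)
{
	if (reconnect_fd(STDIN_FILENO, in_path, O_RDONLY) < 0)
		return (-1);
	if (reconnect_fd(STDOUT_FILENO, out_path,
	    O_WRONLY | O_CREAT | O_APPEND) < 0)
		return (-1);
	if (dup2(STDOUT_FILENO, STDERR_FILENO) < 0)
		return (-1);
	return (0);
}
```

Something like reconnect_stdio("/dev/null", "/tmp/job.out") would then
detach the restarted job from any terminal; passing the path of the
current tty for both arguments would reattach it instead.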

It was used mainly for checkpoint/restart of long-running batch
jobs submitted via NQS, but it was usable with interactive jobs
to a degree.  There was ongoing work for better interactive
support when I left Cray (see below).

> Anyone remember a timesharing system called ITS (from MIT)?  If you
> got disconnected from the modem (not uncommon in those days of
> Pennywhistle, 300 baud acoustically-coupled modems :-) you wouldn't
> lose your session, like you do under UNIX, rather the next time you
> logged in it would ask you:
> 
> [Attach your detached tree?]
> 
> And if you said 'y' you'd get your old process tree back, everything
> right where you left it.

I used to use CDC's NOS, which had this feature.  Very useful.
I think that Cray's current implementation also has this feature.
-- 
Mike Pritchard
mpp@FreeBSD.org
"Go that way.  Really fast.  If something gets in your way, turn"


