Date:      Wed, 19 Jun 2002 12:42:03 -0700
From:      Terry Lambert <tlambert2@mindspring.com>
To:        Andrey Simonenko <simon@comsys.ntu-kpi.kiev.ua>
Cc:        freebsd-hackers@freebsd.org
Subject:   Re: Is it possible to store process state and then restore process
Message-ID:  <3D10DE8B.F259D0BA@mindspring.com>
References:  <006401c21793$30721750$6d36120a@pm5149>

Andrey Simonenko wrote:
> Suppose there is a process that has no children and no open
> sockets, has a single thread, etc. But this process can malloc()
> memory and open local files. Let's take a very simple case.
> 
> Is it possible to store the process state in a file (i.e. somehow
> tell the kernel to do this), and then after rebooting restore the
> process from the file back into the system and continue executing it?
> 
> I understand that it is not very simple, but I want to know if it is
> possible. Are there any problems with memory addressing?

Do a web search on the two terms "checkpoint restart".  You can
also do a web search on the term "undump".

In a general sense, this won't work for any process
which uses sockets, because the endpoint information will not
be recoverable (in case you decide not to take the "simple case"
in the future).  It's possible to make it (mostly) recoverable,
but it requires modifications, such as pausing the TCP stack
after reboot until the checkpointed jobs that will be restarted
have been recovered, since you don't want to be sending RST
packets to the peers on network connections.

As a rule, most checkpoint and restart systems that you will find
out there on the net when you run the search will also not support
things like re-sharing of descriptors among a set of processes that
have used UNIX domain sockets to pass them, or maintaining proper
parent/child process relationships for things like SIGCHLD, etc.
You should assume that anything you checkpoint will be restarted
on another machine halfway around the planet, without any of the
other local processes running.

Anything having to do with pending outstanding operations (e.g.
alarms, I/O, etc.) will require OS support to recover.
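
As a rough illustration of why pending operations are a problem,
here is a minimal sketch of what a cooperating application could
do for one of them, a pending interval timer.  This is not the
transparent, OS-level recovery described above: it assumes the
application itself knows it is about to be snapshotted.  The
function names capture_pending_timer() and restore_pending_timer()
are made up for this example; signal.setitimer() is the standard
Python wrapper around setitimer(2).

```python
import signal


def capture_pending_timer():
    """Disarm ITIMER_REAL and return what the kernel was holding for it.

    Returns a (seconds_remaining, interval) pair; (0.0, 0.0) means that
    no timer was armed.
    """
    # Setting the timer to zero disarms it and returns the old settings,
    # so capture and cancel happen in a single call.
    return signal.setitimer(signal.ITIMER_REAL, 0)


def restore_pending_timer(saved):
    """Re-arm ITIMER_REAL from a pair returned by capture_pending_timer()."""
    remaining, interval = saved
    if remaining > 0:
        signal.setitimer(signal.ITIMER_REAL, remaining, interval)
```

The saved pair would have to be written into the snapshot along
with the rest of the state.  Note that this sketch ignores the
wall-clock time that passes while the process is down, and that
in-flight I/O has no equally simple user-level equivalent.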

Since it's a lot simpler to restart a long running application
*almost* where it left off, most of the useful and non-invasive
packages you will find from your web search will try to do a
periodic snapshot of process state, and restore it from the
point of the last snapshot, not the point of failure.  This will
also lose any implied IPC state, so it's best if the application
in question is written to open any resources, access them, close
them, do the long term computation, and then open and write an
output file only after it's done, rather than, say, holding the
output file open.  Otherwise, if the output file is written
incrementally, you will likely end up with duplicate results.
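
The application structure suggested above can be sketched at user
level as follows.  This is only an application-level analogue of
what the snapshot packages do to a whole process, and every name
in it (atomic_write, load_snapshot, run, the "next_i"/"total"
state layout, the crash_at test hook) is an assumption made for
the example.  State is snapshotted periodically, a restart resumes
from the last snapshot rather than the point of failure, and the
output file is written once, at the end.

```python
import json
import os


def atomic_write(path, text):
    """Write text so that a crash leaves either the old file or the new one."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        f.write(text)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)  # atomic rename within one filesystem


def load_snapshot(path):
    """Return the last saved state, or a fresh state if none exists."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return {"next_i": 0, "total": 0}


def run(n, snapshot_path, output_path, every=100, crash_at=None):
    """Sum i*i for i in range(n), snapshotting every `every` iterations.

    crash_at simulates a failure by raising at that iteration.
    """
    state = load_snapshot(snapshot_path)
    for i in range(state["next_i"], n):
        if crash_at is not None and i == crash_at:
            raise RuntimeError("simulated crash")
        state["total"] += i * i
        state["next_i"] = i + 1
        if state["next_i"] % every == 0:
            atomic_write(snapshot_path, json.dumps(state))
    # The output file is opened and written only after the computation
    # is done, so a restart cannot leave duplicate partial results in it.
    atomic_write(output_path, str(state["total"]) + "\n")
    if os.path.exists(snapshot_path):
        os.remove(snapshot_path)
    return state["total"]
```

With every=100, a run that dies at iteration 250 restarts from the
snapshot taken at 200 and redoes iterations 200 through 249.  That
repeated work is harmless here only because nothing was written to
the output file in the meantime.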

For these reasons, and others I haven't mentioned, you will
probably be happiest with "undump", unless you plan on doing
a large project.

-- Terry
