Date:      Mon, 24 Jan 2005 20:52:35 +0000 (GMT)
From:      Robert Watson <rwatson@freebsd.org>
To:        Peter Jeremy <PeterJeremy@optushome.com.au>
Cc:        Christian Laursen <xi@borderworlds.dk>
Subject:   Re: Resuming from a crashdump
Message-ID:  <Pine.NEB.3.96L.1050124204659.75661B-100000@fledge.watson.org>
In-Reply-To: <20050124202059.GA30458@cirb503493.alcatel.com.au>

On Tue, 25 Jan 2005, Peter Jeremy wrote:

> On Mon, 2005-Jan-24 20:22:27 +0100, Christian Laursen wrote:
> >The idea would be to force the system to "crash" and make a
> >dump on a dedicated partition. On boot after initializing devices
> >but before mounting /, the kernel would check that partition and
> >if it found a dump there restore it to the machine's memory,
> >reinitialize devices and continue where it left off.
> 
> At a process level, this is what emacs and TeX used to do many years ago
> (have a look for "undump"). 
> 
> What you are describing is basically the same as the suspend-to-disk
> that some laptops support.  Implementing it is non-trivial because each
> I/O device needs to be re-initialised into the state it was before the
> suspend.  You also have to work out how to handle the intervening (lost) 
> time - what happens to at/cron jobs and timers that should have fired in
> the intervening period?
> 
> Note that in many circumstances, you will lose all external TCP
> connections when keep-alive timers expire in remote systems and
> firewalls.

It strikes me that there are a number of classes of events that will resolve
themselves or can be worked through -- these tend to be network-related,
where TCP connections will most likely be closed due to timeouts,
etc.  This is no different from today's normal suspend behavior, and I'm
not sure there's really any expectation of handling it better --
applications see the faults, and recover if they know how.

The harder problems are things like:

- Making sure the state of the system is properly quiesced for dumping,
  such as truly halting system operation, making sure there are no pending
  I/Os, leaving distributed file systems in as clean a state as possible,
  etc.  Any custom "settings" for devices that are stored in device state
  and must be restored also need to be preserved properly.

- When coming back up again, identifying the various hardware devices and,
  where possible, making sure they remain attached to the same logical
  device driver attachments in the OS -- i.e., mounted file systems,
  device nodes, network interfaces, etc.

- Handling some of the nastier user space cases, such as the X server,
  which maintains extensive device state that the kernel cannot restore.
  This will require the applications themselves to handle the
  quiesce/restore behavior.

- Writing the bootstrap pieces to re-initialize VM/physical memory and get
  things "up and running" -- restoring hardware state, etc.
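As a rough illustration of the first two points (quiescing devices before the dump and restoring their state afterwards), here is a minimal user-space sketch. It is not FreeBSD's actual driver suspend/resume interface: `struct fake_dev`, `dev_suspend()`, `dev_resume()`, `suspend_all()`, `resume_all()` and `power_cycle()` are all hypothetical names, a single integer stands in for a device's register state, and it assumes the device array is in attach order (parents first), so suspend runs in reverse order and resume runs forward.

```c
/*
 * Hypothetical model of suspend-to-disk device handling; NOT a real
 * kernel API.  hw_reg stands in for hardware state lost at power-off,
 * saved_reg for the copy that would be written into the suspend image.
 */
struct fake_dev {
    const char *name;
    int hw_reg;      /* live "hardware" state, reset by power-off */
    int saved_reg;   /* snapshot stored alongside the memory image */
    int pending_io;  /* outstanding I/Os; device cannot quiesce if nonzero */
    int suspended;
};

/* Quiesce one device: refuse if I/O is pending, else snapshot its state. */
static int dev_suspend(struct fake_dev *d) {
    if (d->pending_io != 0)
        return -1;               /* busy: not safe to dump yet */
    d->saved_reg = d->hw_reg;
    d->suspended = 1;
    return 0;
}

/* Reprogram one device from its snapshot after the machine comes back. */
static int dev_resume(struct fake_dev *d) {
    if (!d->suspended)
        return -1;
    d->hw_reg = d->saved_reg;
    d->suspended = 0;
    return 0;
}

/*
 * Suspend in reverse attach order (assumed: children before parents).
 * If any device is busy, resume the ones already suspended and fail.
 */
static int suspend_all(struct fake_dev *devs, int n) {
    for (int i = n - 1; i >= 0; i--) {
        if (dev_suspend(&devs[i]) != 0) {
            for (int j = i + 1; j < n; j++)
                dev_resume(&devs[j]);
            return -1;
        }
    }
    return 0;
}

/* Simulate power-off: every device comes back in its reset state. */
static void power_cycle(struct fake_dev *devs, int n) {
    for (int i = 0; i < n; i++)
        devs[i].hw_reg = 0;
}

/* Resume in attach order (parents before children). */
static int resume_all(struct fake_dev *devs, int n) {
    for (int i = 0; i < n; i++)
        if (dev_resume(&devs[i]) != 0)
            return -1;
    return 0;
}
```

In this model a device with outstanding I/O makes `suspend_all()` back out and fail rather than dump inconsistent state, which is the "no pending I/Os" condition above; after `power_cycle()`, `resume_all()` puts each saved value back. Real drivers would of course have far more state than one register.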

That said, I think this is all very interesting, because one of the things
that currently irritates me in notebook use is the long boot time to get
from "go" to "fully initialized with applications running".  I think we
could do it faster with a real suspend to disk.

Robert N M Watson
