Date: Wed, 19 May 2010 23:28:35 +1000
From: Norberto Meijome
To: freebsd-cluster@freebsd.org
Subject: Re: saving job state over a power outage

On Wed, 5 May 2010 13:49:31 -0300
Joey Mingrone wrote:

> Hello,
>
> Our lab has a cluster (Sun Fire X40z master node, Opteron 270 and
> generic 2.0 GHz Opteron-based compute nodes) running 8.0-RELEASE.
> We've been informed that the power has to be turned off for the
> weekend, but one of the lab members has been running some jobs since
> December. I don't know much about the jobs, i.e., whether they can be
> restarted without losing all the computing work that's been done so
> far.
>
> Can anyone suggest a way to save the state of the jobs so they can
> continue when the power comes back on?

Sorry, no real answers :(

If your nodes were actually virtual machines, one per host, you could
suspend each VM to disk and then shut down the hosts safely. Of course,
this assumes your VMs can suspend to disk at all...

> From experience, ACPI seems flaky, if it works at all, because of
> problems with BIOS implementations. Also, IIRC ACPI has issues with
> SMP kernels. Is there something similar to the software suspend found
> in Linux? Does anyone have any other suggestions to accomplish this?

I think you'd need suspend to disk... unless you have enough UPS
capacity to keep the machines powered in suspend mode for the whole
outage...

Good luck,
_________________________
{Beto|Norberto|Numard} Meijome

Unix is user friendly. However, it isn't idiot friendly.

I speak for myself, not my employer. Contents may be hot. Slippery when
wet. Reading disclaimers makes you go blind. Writing them is worse. You
have been Warned.