From: John Baldwin
To: freebsd-stable@freebsd.org
Cc: Volker, Mark Kirkwood, James Seward, Kelly Black
Date: Fri, 8 Aug 2008 12:49:28 -0400
Subject: Re: Problem with /boot/loader [A new patch]
Message-Id: <200808081249.28513.jhb@freebsd.org>
In-Reply-To: <20080627031233.9DC4945047@ptavv.es.net>

On Thursday
26 June 2008 11:12:33 pm Kevin Oberman wrote:
> > Date: Thu, 26 Jun 2008 23:53:44 +0200
> > From: Volker
> > Sender: owner-freebsd-stable@freebsd.org
> >
> > On 12/23/-58 20:59, Kelly Black wrote:
> > > Hello,
> > >
> > > I have a problem with loader. I recently upgraded from 6_rel to 7_rel.
> > > Now when I install world there is a problem booting.
> > >
> > > Here is what I do:
> > > cd /usr/src
> > > make buildworld
> > > make buildkernel KERNCONF=BLACK
> > > make installkernel KERNCONF=BLACK
> > >
> > > At this point I can reboot and all is good. After boot I install the new world:
> > >
> > > cd /usr/src
> > > mergemaster -p
> > > reboot into single user mode
> > > cd /usr/src
> > > make installworld
> > > mergemaster
> > >
> > > Now when I reboot there is a problem. I get an error that the system
> > > cannot boot. Part of it looks like this:
> > > Can't work out which disk we are booting from.
> > > Guessed BIOS device 0xffffffff not found by probes, defaulting to disk0:
> > >
> > > If I boot from a live disk and replace /boot/loader with
> > > /boot/loader.old it boots up fine and everything looks good. A new
> > > world and a new kernel. I would be grateful for any help or any
> > > pointers.
> > >
> > > Sincerely,
> > > Kel
> > >
> > > PS I do not do anything special with my loader config files:
> > >
> > > $ cat loader.conf
> > >...
> >
> > Kelly,
> >
> > the /boot/loader.conf file does not come into play at that stage. Early
> > in the loader code, loader needs to figure out, which disk (BIOS device)
> > has been booted from. Until loader knows which device was booted up,
> > it's unable to access any files (even loader.conf) on your boot device.
> >
> > As I've never seen such a problem while upgrading any system, I suspect
> > your problem must be settings specific. Can you show me your kernel
> > config or are you using a plain vanilla GENERIC? Which arch are we
> > talking about?
> >
> > As I'm currently investigating another boot problem (but earlier in the
> > boot chain), I'll check boot logic in the source code and may check for
> > your issue, too, at that time, so it's just one effort. But please stay
> > patient for some days, as I'm currently too busy.
>
> We just got hit by this. The loader never loads and nothing boots. But a
> system admin discovered that the problem disappeared if the /boot.conf
> file was deleted. It just contained '-P'.
>
> Once this file was removed, the system just booted up as expected. When
> he changed it to -D or -h, the boot still locked up.

So I had a little epiphany in the shower this morning and have a possible
fix. I've suspected from the start that the hangs had to do with interrupts
being disabled/enabled at the wrong time. However, I had always been
assuming that the problem was interrupts being disabled when they should
have been enabled. Now I think it's actually the reverse. :)

Some background: There are three sorts of requests that BTX can handle that
require dropping to real mode (previously this was done with virtual 8086
mode):

1) hardware interrupt,
2) user request (boot2/loader) to simulate a software interrupt (e.g. int
   0x15 BIOS calls),
3) user request to perform a far call to a specified cs:ip in real mode.

For all 3 of these requests, we preserve the %eflags register as it was at
the time of the interrupt/user request and make it visible to the real mode
code, with some possible modifications. Previously the only modification I
made was to disable interrupts (PSL_I) in case 1).

When looking at this earlier, I noticed that none of the BTX clients
(boot2/gptboot/loader) had ever explicitly initialized the eflags value that
gets passed to BTX during vm86 requests, so the initial flags (including
PSL_I) were garbage. As a result, it was more or less random whether the
real mode code for cases 2) and 3) ran with interrupts enabled or disabled.
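
To make the background above concrete, here is a rough C sketch of the old
behaviour. It is only an illustration and not the BTX source: the enum, its
constants, and the function name real_mode_eflags_old() are hypothetical
names made up for this example. The one fixed fact is that PSL_I is the x86
interrupt-enable flag, bit 9 of eflags (0x200).

```c
#include <stdint.h>

/* x86 interrupt-enable flag (IF), bit 9 of eflags. */
#define PSL_I 0x00000200u

/* Hypothetical labels for the three request types listed above. */
enum btx_req {
	REQ_HW_INTR,	/* 1) hardware interrupt */
	REQ_SOFT_INT,	/* 2) client asks BTX to simulate 'int X' */
	REQ_FAR_CALL	/* 3) client asks for a far call to cs:ip */
};

/*
 * eflags as seen by the real mode code under the old behaviour:
 * only case 1) has PSL_I cleared; cases 2) and 3) pass the saved
 * value through untouched, so whatever the client left in its
 * eflags field decides whether interrupts are enabled.
 */
static uint32_t
real_mode_eflags_old(enum btx_req req, uint32_t saved_eflags)
{
	if (req == REQ_HW_INTR)
		return (saved_eflags & ~PSL_I);
	return (saved_eflags);
}
```

With an uninitialized eflags field, real_mode_eflags_old(REQ_SOFT_INT,
0xffffffff) keeps PSL_I set while the same call with 0 leaves it clear,
which is the randomness described above.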

My realization this morning is that software interrupts ('int X') in real
mode disable interrupts just like hardware interrupts do. Thus, my patch
changes BTX to disable interrupts for both cases 1) and 2) now. I think
this will fix the hangs.

I'm still including the code to explicitly initialize the eflags for user
requests to a known-good value. That value still has interrupts enabled,
which means that case 3) should now always run with interrupts enabled
(which is the desired state), but the client can disable interrupts in the
eflags in the vm86 structure if desired.

The updated patch (same URL, new patch) is at
http://www.FreeBSD.org/~jhb/patches/btx_hang.patch

-- 
John Baldwin