Date:      Fri, 8 Aug 2008 12:49:28 -0400
From:      John Baldwin <jhb@freebsd.org>
To:        freebsd-stable@freebsd.org
Cc:        Volker <volker@vwsoft.com>, Mark Kirkwood <markir@paradise.net.nz>, James Seward <jamesoff@gmail.com>, Kelly Black <kjblack@gmail.com>
Subject:   Re: Problem with /boot/loader [A new patch]
Message-ID:  <200808081249.28513.jhb@freebsd.org>
In-Reply-To: <20080627031233.9DC4945047@ptavv.es.net>
References:  <20080627031233.9DC4945047@ptavv.es.net>

On Thursday 26 June 2008 11:12:33 pm Kevin Oberman wrote:
> > Date: Thu, 26 Jun 2008 23:53:44 +0200
> > From: Volker <volker@vwsoft.com>
> > Sender: owner-freebsd-stable@freebsd.org
> > 
> > On 12/23/-58 20:59, Kelly Black wrote:
> > > Hello,
> > > 
> > > I have a problem with loader. I recently upgraded from 6_rel to 7_rel.
> > > Now when I install world there is a problem booting.
> > > 
> > > Here is what I do:
> > > cd /usr/src
> > > make buildworld
> > > make buildkernel KERNCONF=BLACK
> > > make installkernel KERNCONF=BLACK
> > > 
> > > At this point I can reboot and all is good. After boot I install the new world:
> > > 
> > > cd /usr/src
> > > mergemaster -p
> > > reboot into single user mode
> > > cd /usr/src
> > > make installworld
> > > mergemaster
> > > 
> > > Now when I reboot there is a problem. I get an error that the system
> > > cannot boot. Part of it looks like this:
> > > Can't work out which disk we are booting from.
> > > Guessed BIOS device 0xffffffff not found by probes, defaulting to disk0:
> > > 
> > > If I boot from a live disk and replace /boot/loader with
> > > /boot/loader.old it boots up fine and everything looks good. A new
> > > world and a new kernel. I would be grateful for any help or any
> > > pointers.
> > > 
> > > Sincerely,
> > > Kel
> > > 
> > > PS I do not do anything special with my loader config files:
> > > 
> > > $ cat loader.conf
> > >...
> > 
> > Kelly,
> > 
> > the /boot/loader.conf file does not come into play at that stage. Early
> > in the loader code, loader needs to figure out, which disk (BIOS device)
> > has been booted from. Until loader knows which device was booted up,
> > it's unable to access any files (even loader.conf) on your boot device.
> > 
> > As I've never seen such a problem while upgrading any system, I suspect
> > your problem must be settings specific. Can you show me your kernel
> > config or are you using a plain vanilla GENERIC? Which arch are we
> > talking about?
> > 
> > As I'm currently investigating another boot problem (but earlier in the
> > boot chain), I'll check boot logic in the source code and may check for
> > your issue, too, at that time, so it's just one effort. But please stay
> > patient for some days, as I'm currently too busy.
> 
> We just got hit by this. The loader never loads and nothing boots. But a
> system admin discovered that the problem disappeared if the /boot.conf
> file was deleted. It just contained '-P'.
> 
> Once this file was removed, the system just booted up as expected. When
> he changed it to -D or -h, the boot still locked up. 

So I had a little epiphany in the shower this morning and have a possible fix.  
I've suspected from the start that the hangs had to do with interrupts being 
disabled/enabled at the wrong time.  However, I had always been assuming that 
the problem was interrupts being disabled when they should have been enabled.  
Now I think it's actually the reverse. :)  Some background:

There are three sorts of requests that BTX can handle that require dropping to 
real mode (previously this was done with virtual 8086 mode): 1) hardware 
interrupt, 2) user request (boot2/loader) to simulate a software interrupt 
(e.g. int 0x15 BIOS calls), 3) user request to perform a far call to a 
specified cs:ip in real mode.

For all 3 of these requests, we preserve the %eflags register at the time 
of the interrupt/user request and make it visible to the real mode code 
largely as-is, with some possible modifications.  Previously the only 
modification I made was to disable interrupts (clear PSL_I) in case 1).  
When looking at this earlier, I noticed that none of the BTX clients 
(boot2/gptboot/loader) had ever explicitly initialized the eflags value that 
gets passed to BTX during vm86 requests, so the initial flags (including 
PSL_I) were garbage, and as a result, it was effectively random whether the 
real mode code for cases 2) and 3) ran with interrupts enabled or disabled.

My realization this morning is that software interrupts ('int X') in real mode 
disable interrupts just like hardware interrupts do.  Thus, my patch changes 
BTX to disable interrupts for both cases 1) and 2) now.  I think this will 
fix the hangs.  I'm still including the code to explicitly initialize the 
eflags for user requests to a known-good value.  That value still has 
interrupts enabled, which means that case 3) should now always run with 
interrupts enabled (which is the desired state), but the client can disable 
interrupts in the eflags in the vm86 structure if desired.
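
As a rough illustration of the policy described above (this is not the 
patch itself; BTX is written in assembly), the handling of the interrupt flag 
can be sketched in C.  The enum, the function names and PSL_RESERVED are 
hypothetical and exist only for this sketch; PSL_I is the x86 interrupt-enable 
flag (bit 9 of eflags), and bit 1 of eflags is reserved and always set.

```c
#include <stdint.h>

/* x86 EFLAGS bits: IF is bit 9; bit 1 is reserved and always reads as 1. */
#define PSL_I        0x00000200u
#define PSL_RESERVED 0x00000002u  /* hypothetical name for the always-set bit */

/* Hypothetical labels for the three request kinds described above. */
enum btx_request {
    REQ_HW_INTERRUPT,   /* case 1: hardware interrupt */
    REQ_SOFT_INTERRUPT, /* case 2: client asks to simulate 'int X' */
    REQ_FAR_CALL        /* case 3: client asks for a far call to cs:ip */
};

/*
 * Known-good initial flags for a client request (assumed value for this
 * sketch): interrupts enabled, nothing else set beyond the reserved bit.
 */
static uint32_t client_initial_eflags(void)
{
    return PSL_RESERVED | PSL_I;
}

/*
 * Flags made visible to the real mode code.  Cases 1 and 2 run with
 * interrupts disabled, matching what a real mode interrupt does; case 3
 * passes the client's flags through unchanged.
 */
static uint32_t real_mode_eflags(enum btx_request req, uint32_t saved)
{
    switch (req) {
    case REQ_HW_INTERRUPT:
    case REQ_SOFT_INTERRUPT:
        return saved & ~PSL_I;
    case REQ_FAR_CALL:
    default:
        return saved;
    }
}
```

With these definitions, a far call made with the default flags keeps PSL_I 
set, while a client that clears PSL_I in its own flags before a far call gets 
real mode code that runs with interrupts disabled.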

The updated patch (same URL as before, new contents) is at 
http://www.FreeBSD.org/~jhb/patches/btx_hang.patch

-- 
John Baldwin