Date:      Sun, 13 Apr 2014 00:38:58 -0700
From:      <dteske@FreeBSD.org>
To:        "'Doug Hardie'" <bc979@lafn.org>, <freebsd-stable@freebsd.org>
Cc:        dteske@FreeBSD.org, 'Chris H' <bsd-lists@bsdforge.com>
Subject:   RE: 9.2 Boot Problem
Message-ID:  <117a01cf56eb$6f989e50$4ec9daf0$@FreeBSD.org>
In-Reply-To: <1D50A38D-8919-4034-A4E5-EEF8E78E638D@lafn.org>
References:  <175D3755-BB9B-4EAD-BDAD-06E9670E06AB@lafn.org> <186472F9-A97B-4863-81BC-67BE788D5E9A@lafn.org> <a865b8f2ccb9ad4918544bad3d49554d.authenticated@ultimatedns.net> <791C8200-023A-4ACB-9B6F-F5A8B0E170F4@lafn.org> <5bfb4fb619954c3dfbd3499aafa98917.authenticated@ultimatedns.net> <4F983E6A-0A7D-403C-AFAA-9CCCCB05716F@lafn.org> <feeca307c8da9ca3b385cf47d75904a7.authenticated@ultimatedns.net> <0f3f01cf5439$13cf8570$3b6e9050$@FreeBSD.org> <981CAA9F-1E67-4E56-A119-BA6D1D29F383@lafn.org> <89290759-E5C2-4991-B644-A82648BEDD52@lafn.org> <1D50A38D-8919-4034-A4E5-EEF8E78E638D@lafn.org>

------=_NextPart_000_117B_01CF56B0.C33B25E0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit



> -----Original Message-----
> From: Doug Hardie [mailto:bc979@lafn.org]
> Sent: Saturday, April 12, 2014 7:08 PM
> To: freebsd-stable@freebsd.org
> Cc: dteske@FreeBSD.org Teske; Chris H
> Subject: Re: 9.2 Boot Problem
> 
> 
> On 10 April 2014, at 14:23, Doug Hardie <bc979@lafn.org> wrote:
> 
> >
> > On 9 April 2014, at 16:53, Doug Hardie <bc979@lafn.org> wrote:
> >
> >>
> >> On 9 April 2014, at 14:17, dteske@FreeBSD.org wrote:
> >>
> >>>
> >>>
> >>>> -----Original Message-----
> >>>> From: Chris H [mailto:bsd-lists@bsdforge.com]
> >>>> Sent: Wednesday, April 9, 2014 2:03 PM
> >>>> To: Doug Hardie
> >>>> Cc: freebsd-stable@freebsd.org List
> >>>> Subject: Re: 9.2 Boot Problem
> >>>>
> >>>>>
> >>>>> On 9 April 2014, at 13:49, "Chris H" <bsd-lists@bsdforge.com> wrote:
> >>>>>
> >>>>>>>
> >>>>>>> On 9 April 2014, at 11:29, "Chris H" <bsd-lists@bsdforge.com> wrote:
> >>>>>>>
> >>>>>>>>>
> >>>>>>>>> On 4 April 2014, at 21:08, Doug Hardie <bc979@lafn.org> wrote:
> >>>>>>>>>
> >>>>>>>>>> I put this out on Questions, but got no responses. Hopefully
> >>>>>>>>>> someone here has some ideas.
> >>>>>>>>>>
> >>>>>>>>>> FreeBSD 9.2.  All of my systems are hanging during boot right
> >>>>>>>>>> after the screen that has the picture.  It's as if someone hit
> >>>>>>>>>> a space on the keyboard.  However, these systems have no
> >>>>>>>>>> keyboard.  If I plug one in, or use the serial console, and
> >>>>>>>>>> enter a return, the boot continues properly.
> >>>>>>>>>>
> >>>>>>>>>> The boot menu is displayed along with Beastie.  However, the
> >>>>>>>>>> line that says "Autoboot in n seconds" never appears.  It just
> >>>>>>>>>> stops there.  These systems are all new installs from CD.
> >>>>>>>>>> I just used freebsd-update to take a toy server from 9.1 to
> >>>>>>>>>> 9.2 and it doesn't exhibit this behavior.  It boots properly.
> >>>>>>>>>> I have updated one of the production servers with the latest
> >>>>>>>>>> 9.2 changes and it still has the issue.  I first thought that
> >>>>>>>>>> some config file did not get updated properly on the CD.  I
> >>>>>>>>>> have dug around through the 4th files and don't see anything
> >>>>>>>>>> obvious that would cause this.  I have now verified that all
> >>>>>>>>>> the 4th files in /boot are identical (except for the version
> >>>>>>>>>> numbers, which are slightly different).  I don't believe this
> >>>>>>>>>> is a BIOS setting issue as FreeBSD 7.2 didn't exhibit this
> >>>>>>>>>> behavior.  All 4 systems are on totally different motherboards.
> >>>>>>>>>>
> >>>>>>>>>> I tried setting loader_logo="none" in /boot/config.rc and
> >>>>>>>>>> that eliminated the menu and Beastie.  I think the system
> >>>>>>>>>> completed booting, but the serial console was then dead.  It
> >>>>>>>>>> did not respond or output anything.  I had to remove that and
> >>>>>>>>>> reboot to get the console back again.
> >>>>>>>>>>
> >>>>>>>>>> I need to get this fixed as these are production servers that
> >>>>>>>>>> are essentially unmanned, so it's difficult to get them back up
> >>>>>>>>>> again.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> No response here either.  Surely someone must know the loader.
> >>>>>>>>> I have been digging through the code, and can't find any
> >>>>>>>>> differences between the systems that work and those that don't.
> >>>>>>>>> Is there any way to debug this?  Is there a way to find out
> >>>>>>>>> where the loader is sitting waiting on input from the terminal?
> >>>>>>>>> That might give a clue as to why it didn't autoboot.
> >>>>>>>>>
> >>>>>>>> OK. This is the first I've seen of your post. I'm not going to
> >>>>>>>> profess being an expert. But I might suggest adding the
> >>>>>>>> following to loader.conf(5):
> >>>>>>>>
> >>>>>>>> verbose_loading="YES"
> >>>>>>>> boot_verbose="YES"
> >>>>>>>>
> >>>>>>>> This raises the "noise level". Maybe that will help to provide
> >>>>>>>> you with a bit more information as to what, or if, you're
> >>>>>>>> booting. DO have a look through /boot/defaults/loader.conf for
> >>>>>>>> more hints as to what, and how, you can control the boot
> >>>>>>>> process, as well as /etc/defaults/rc.conf.
> >>>>>>>> In fact, you can pre-decide what, and how, to boot. Even
> >>>>>>>> passing by the boot menu entirely.
> >>>>>>>
> >>>>>>> Thanks Chris.  I did that and here is what I get:
> >>>>>>>
> >>>>>>> Rebooting...
> >>>>>>> cpu_reset: Stopping other CPUs
> >>>>>>> /boot.config: -Dh
> >>>>>>> Consoles: internal video/keyboard  serial port
> >>>>>>> BIOS drive A: is disk0
> >>>>>>> BIOS drive C: is disk1
> >>>>>>> BIOS 640kB/2087360kB available memory
> >>>>>>>
> >>>>>>> FreeBSD/x86 bootstrap loader, Revision 1.1
> >>>>>>> (doug@zool.lafn.org, Tue Apr  8 20:30:20 PDT 2014)
> >>>>>>> Loading /boot/defaults/loader.conf
> >>>>>>> Warning: unable to open file /boot/loader.conf.local
> >>>>>>> /boot/kernel/kernel text=0xdb3171 data=0xf3c04+0xbb770 syms=[0x4+0xeda80+0x4+0x1b8ebf]
> >>>>>>> zpool_cache...failed!
> >>>>>>> \
> >>>>>>> H[Esc]ape to loader prompt_   _____ _____
> >>>>>>> |  ____|             |  _ \ / ____|  __ \
> >>>>>>> | |___ _ __ ___  ___ | |_) | (___ | |  | |
> >>>>>>> |  ___| '__/ _ \/ _ \|  _ < \___ \| |  | |
> >>>>>>> | |   | | |  __/  __/| |_) |____) | |__| |
> >>>>>>> | |   | | |    |    ||     |      |      |
> >>>>>>> |_|   |_|  \___|\___||____/|_____/|_____/    ```
`
> >>>>>>>                                          s` `.....---.......--.```
-/
> >>>>>>> +            Welcome to FreeBSD           + +o   .--`         /y:`
+.
> >>>>>>> |                                         |  yo`:.            :o
`+-
> >>>>>>> |  1. Boot Multi User [Enter]             |   y/        3;46H /
> >>>>>>> |  2.--  /                                |
> >>>>>>> |                                         |
> >>>>>>> |  4. Reboot                              | `:
:`
> >>>>>>> |                                         | `:
:`
> >>>>>>> |  Options:                                  /
/
> >>>>>>> |  5. Configure Boot [O]ptions...            .-
-.
> >>>>>>> |                                             --
-.
> >>>>>>> |                                              `:`
`:`
> >>>>>>> |                                                .--
`--.
> >>>>>>> |                                                   .---.....----.
> >>>>>>> +-----------------------------------------+
> >>>>>>>
> >>>>>>>                                             FreeBSD `Nakatomi
> >>>>>>> Socrates' 9.2
> >>>>>>>
> >>>>>>>
> >>>>>>> Now it waits for a return.  I have tried changing the logo,
> >>>>>>> setting the autoboot timeout and a couple others.  The only
> >>>>>>> thing that did anything different was setting the logo to an
> >>>>>>> invalid value.  Basically the console was dead after that, but
> >>>>>>> the system did boot.  I never see the "Autoboot in n seconds"
> >>>>>>> message.  It's also interesting that the list of options above
> >>>>>>> appears incomplete.  On the working system, items 1 through 5
> >>>>>>> are all present.  I have now checked the cksums for all the
> >>>>>>> files in /boot and they are all the same.
> >>>>>>>
> >>>>>> Hmmm. Looks like you're going to make me do all your research for
> >>>>>> you. ;)
> >>>>>> You /did/ read the contents of /boot/defaults/loader.conf. Yes?
> >>>>>> I'm guessing that you've also already read loader.4th(8), and the
> >>>>>> other related info.
> >>>>>> Now this is pure supposition; as it appears that you're looking
> >>>>>> for a serial console, I'd /speculate/ that you want to turn all
> >>>>>> that NASTY ANSI stuff OFF.
> >>>>>> That's why you're not seeing the complete menu -- hear that Devin!
> >>>>>> I'm going to post just this much for now, just to get you
> >>>>>> started. I know what else you need/are looking for. But need to
> >>>>>> find the /correct/ syntax -- paraphrasing just won't get it. :)
> >>>>>
> >>>>> Setting loader_color="NO" (from man page) does give back the full
> >>>>> menu.  Still waits for return after the version name.  I haven't
> >>>>> found in the forth where it is reading the keyboard.  Yes, I have
> >>>>> to use a serial console.  These machines are about 100 miles away.
> >>>>> Something is stopping the autoboot from even starting.
> >>>>
> >>>> See my reply to this. I think I've given you the hints you need --
> >>>> fingers crossed. :)
> >>>>
> >>>
> >>> He's using console=comconsole (serial boot).
> >>> When that is the case, loader_color is automatically set to NO.
> >>> There's no reason to set both loader_color=NO and console=
> >>> comconsole. The code that does this is here:
> >>>
> >>> http://svnweb.freebsd.org/base/release/9.2.0/sys/boot/forth/color.4th?revision=255898&view=markup
> >>> Line 48 within the loader_color? function:
> >>> 	boot_serial? if FALSE else TRUE then
> >>>
> >>> As for answering the quandary of where the keyboard is polled during
> >>> the timeout countdown, that's the getkey function in here:
> >>>
> >>>
> >>> http://svnweb.freebsd.org/base/release/9.2.0/sys/boot/forth/menu.4th?revision=255898&view=markup
> >>> --
> >>
> >>
> >>
> >> I commented out the 3 cursor positions in menu-timeout-update.  It
> >> does not appear that word is being used.  The Autoboot message never
> >> appeared.  Obviously getkey is being used as it does respond properly
> >> to a return.  I am beginning to suspect that menu_timeout_enabled is
> >> zero.  I believe adding a line after getkey's begin with
> >>
> >>       s"menu_timeout_enabled = " type menu_timeout_enabled @ . 10 spaces
> >>
> >> will tell me.
> >
> >
> >
> > There is a missing space after the first " above.  However, that does
> > confirm my suspicion that menu_timeout_enabled is set to 0.  It is only
> > displayed once.  On a working system the value is 1 and that message is
> > output numerous times until the 10 seconds expires and then the boot
> > begins.
> >
> > Now to figure out how that value is getting set incorrectly.
> >
> 
> After much digging, I now know what is going on, but not why.  When
> getkey is called the first time, menu_timeout_enabled is set to one.
> However, it is set to zero on every check after that.  In getkey, after
> the comment "Was a key pressed", is a check of key to see if a key was
> pressed.  It is returning a decimal 7 (BEL).  That then clears
> menu_timeout_enabled and it then sits there waiting for a valid key
> input.  There is no keyboard plugged into the system.  I have no idea how
> that BEL is being generated or even how to prevent it.  Could it be
> possible that it comes from the serial console?  I tend to doubt that's
> the case since the system hangs during boot when the serial console is
> not connected.  I suppose that I could put in a test for a key value that
> is not a control character, but that would only work until the next
> system update.  I'd have to remember to put it back in each time.  That's
> not likely to happen.  My memory is not that good.  What's interesting is
> that I have 4 systems (i386) doing this and 1 system (i386) and 2 systems
> (amd64) not doing it.  The only common thread is the 4 systems doing it
> are about 100 miles from me and the working ones are here.
> 

Based on that feedback, I've developed the attached patch.txt.
Can you give it a whirl and let me know how it works?
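
For anyone skimming the attachment, here is a minimal sketch of the idea
behind it, not the patch itself: any key code of 7 (BEL) or lower is
treated as noise rather than as a keypress, so it no longer cancels the
autoboot countdown. The words usable-key? and poll-key are hypothetical
names used only for illustration; they do not exist in menu.4th, where the
real change is made inline in getkey.

```forth
\ Sketch only: usable-key? and poll-key are hypothetical helpers,
\ not words defined by menu.4th or the loader.

\ Treat ASCII 0..7 (NUL through BEL) as noise, 8 (BS) and up as a key.
: usable-key? ( c -- flag ) 7 > ;

\ Poll the console once. Leave the key code and TRUE if a usable key
\ is waiting; otherwise consume any stray byte and leave FALSE.
: poll-key ( -- c TRUE | FALSE )
	key? if
		key dup usable-key? if
			TRUE
		else
			drop FALSE	\ discard BEL and other low control codes
		then
	else
		FALSE
	then
;
```

With a filter like this, a stray BEL is read and dropped on each pass
through the loop, and only a code of 8 or higher stops the timeout.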
-- 
Cheers,
Devin



_____________
The information contained in this message is proprietary and/or confidential. If you are not the intended recipient, please: (i) delete the message and all copies; (ii) do not disclose, distribute or use the message in any manner; and (iii) notify the sender immediately. In addition, please be aware that any message addressed to our domain is subject to archiving and review by persons other than the intended recipient. Thank you.

------=_NextPart_000_117B_01CF56B0.C33B25E0
Content-Type: text/plain; name="patch.txt"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="patch.txt"

Index: sys/boot/forth/menu.4th
===================================================================
--- sys/boot/forth/menu.4th	(revision 264244)
+++ sys/boot/forth/menu.4th	(working copy)
@@ -897,10 +897,10 @@ create kernelsbuf 256 allot
 
 				menu_timeout @ 0= if
 					\ We've reached the end of the timeout
-					\ (user did not cancel by pressing ANY
-					\ key)
+					\ (user did not cancel by pressing
+					\ ASCII sequence 0x8 BS or higher)
 
-					s" menu_timeout_command"  getenv dup
+					s" menu_timeout_command" getenv dup
 					-1 = if
 						drop \ clean-up
 					else
@@ -915,10 +915,17 @@ create kernelsbuf 256 allot
 			( -- )
 		then
 
-		key? if \ Was a key pressed? (see loader(8))
+		key? if ( and ) key dup 7 > if
 
-			\ An actual key was pressed (if the timeout is running,
-			\ kill it regardless of which key was pressed)
+			( -- N )
+
+			\ Was a key pressed? (see loader(8))
+			\ ... and was it a usable ASCII sequence?
+
+			\ NB: Some systems may generate ASCII 0x7 BEL when a
+			\ keyboard is not connected (e.g., booting serial)
+
+			\ If the timeout is running, kill it
 			menu_timeout @ 0<> if
 				0 menu_timeout !
 				0 menu_timeout_enabled !
@@ -927,14 +934,9 @@ create kernelsbuf 256 allot
 				0 menu-timeout-update
 			then
 
-			\ get the key that was pressed and exit (if we
-			\ get a non-zero ASCII code)
-			key dup 0<> if
-				exit
-			else
-				drop
-			then
-		then
+			exit
+
+		else drop then then
 		50 ms \ sleep for 50 milliseconds (see loader(8))
 
 	again

------=_NextPart_000_117B_01CF56B0.C33B25E0--


