Date:      Fri, 15 Jul 2016 09:51:25 -0600
From:      Ian Lepore <ian@freebsd.org>
To:        Karl Denninger <karl@denninger.net>, freebsd-arm <freebsd-arm@freebsd.org>
Subject:   Re: Bizarre clone attempt failures on Raspberry Pi2...
Message-ID:  <1468597885.72182.286.camel@freebsd.org>
In-Reply-To: <d1aba096-e645-04df-dfda-5a9284250960@denninger.net>
References:  <548783e1-9047-68f7-5f50-449db684d602@denninger.net> <d2eb4035-e494-1a7b-98e5-2aa87efe0763@denninger.net> <EDE65B12-4961-4CEF-8AE9-BFDA4FD508A5@gromit.dlib.vt.edu> <5475ea53-ae22-2634-6f2a-5737d1b0e308@denninger.net> <398ae56c-8893-f188-c210-cf7f19ccf433@denninger.net> <1468518953.72182.219.camel@freebsd.org> <7a91fc79-1c85-fac8-aa3f-db90592f3f44@denninger.net> <bec46aff-a4d5-9c4d-49d0-78534b13f719@denninger.net> <E01579F5-9562-4E51-9CFB-EA510460A4C8@gromit.dlib.vt.edu> <60b6e156-981e-9fbd-b68c-0daae1961286@denninger.net> <04391154-A38E-46CD-B570-B2BECFD19022@gromit.dlib.vt.edu> <d1aba096-e645-04df-dfda-5a9284250960@denninger.net>

On Fri, 2016-07-15 at 09:44 -0500, Karl Denninger wrote:
> On 7/15/2016 09:22, Paul Mather wrote:
> 
> > On Jul 15, 2016, at 9:44 AM, Karl Denninger <karl@denninger.net>
> > wrote:
> > 
> > > On 7/15/2016 08:36, Paul Mather wrote:
> > > > On Jul 14, 2016, at 11:36 PM, Karl Denninger <
> > > > karl@denninger.net> wrote:
> > > > 
> > > > > Found it.
> > > > > 
> > > > > Apparently the current code *requires* the label be set on
> > > > > the msdos
> > > > > partition.  If it's not then not only does it not mount
> > > > > (which shouldn't
> > > > > matter post-boot as the loader is supposed to pass the dtb
> > > > > file, it is
> > > > > specified in the config file without any sort of path prefix,
> > > > > and thus
> > > > > once the kernel has loaded it should not matter if the dos
> > > > > partition is
> > > > > actually mounted or not) *but* the boot process hangs without
> > > > > any
> > > > > indication of why!
> > > > > 
> > > > > So, you must do newfs_msdos -L MSDOSBOOT -F 16 {device}
> > > > > 
> > > > > If the "-L" is missing you're hosed; the system facially
> > > > > appears to be
> > > > > just fine but while the loader comes up and so does the
> > > > > kernel, it hangs
> > > > > without ever proceeding -- and without any sort of error
> > > > > message
> > > > > indicating that it is unable to mount something it needs.
> > > > You have to do that because the device entry in the stock
> > > > /etc/fstab is /dev/msdosfs/MSDOSBOOT.  The /dev/msdosfs part
> > > > indicates it's using ms-dos labels.  In other words, this is
> > > > just the same sort of failure you were getting when you weren't
> > > > labelling the UFS partition as "rootfs".  Labelling the file
> > > > system properly "fixes" the issue, as you would expect.
> > > > 
> > > > It's a misnomer to say the code "requires" labels.  It's just
> > > > that's the way the distribution images are currently set up.  I
> > > > have an older Pi that predates the current distribution images
> > > > that just uses /dev/mmcsd0... device names in /etc/fstab.  Both
> > > > approaches work fine.  You just need to make sure the devices
> > > > you specify in /etc/fstab will actually exist when it comes
> > > > time to mount the corresponding file system.
> > > Except that if the root filesystem doesn't mount you get an
> > > error, and
> > > thus you can figure out what's going on.  What excuse is there
> > > for not
> > > printing an error message if a mount fails, and if something in
> > > /etc/fstab fails to mount what's with hanging the machine?  I've
> > > had
> > > disks be unavailable before on Intel architecture machines (it
> > > happens
> > > when disks fail) and the result is an error on the failure to
> > > mount but,
> > > unless it's the root volume, the system still comes up.
> > 
> > Are you sure you don't get an error?  When I forgot to label rootfs
> > recently when I cloned an SD card I got an error displayed on the
> > serial console.  I didn't get an error on the HDMI screen console.
> You get an error if rootfs is not labelled on the HDMI screen (as
> root
> fails to mount.) There is *no* error on an HDMI screen if the msdosfs
> is
> not labeled.
> > As I've mentioned before directly, FreeBSD/arm acts like
> > console="comconsole,vidconsole" is in effect.  This means that
> > during /etc/rc boot processing, you'll only get output on
> > comconsole (except for kernel messages, which seem to go to both).
> > That's been my experience in FreeBSD in general.
> > 
> > I dimly recall folks on here saying U-Boot doesn't currently
> > enable/support USB keyboards, so there's not really much you can do
> > to fix it interactively if you fail to boot the OS and hence enable
> > USB keyboard support via FreeBSD.  That's not a problem if you use
> > a serial console, which is supported by U-Boot.
> Well, that's not true if the kernel is loaded.  Once the kernel loads
> a
> usb keyboard works.
> > 
> > I'm not sure comparisons with Intel architecture machines are
> > entirely appropriate as they use a different boot
> > environment/mechanism.  Still, I stand by the fact that I've always
> > got an error message on the serial console when disks on my
> > FreeBSD/arm system have failed to mount at boot.  (It used to
> > happen regularly with an external USB drive I had that took a long
> > time to probe, and I ended up having to put a kern.cam.boot_delay
> > in /boot/loader.conf to avoid the system dropping into single-user
> > mode when doing a reboot.)
> > 
> > 
> > > > If you stop using labels in your /etc/fstab then you won't have
> > > > problems when those labels are missing.  If the labels are
> > > > missing, the /dev/{msdosfs,ufs} devices will not be present and
> > > > the system will drop to single-user mode because non-late,
> > > > non-noauto file systems can't be accessed via their device nodes
> > > > when attempting to mount them.  When that happens and you don't
> > > > have a serial console enabled then you have problems
> > > > remediating the situation.
> > > > 
> > > > If a file system is not needed to mount as part of booting (as
> > > > you suggest for /boot/msdos) then you should probably flag it
> > > > with the "noauto" option in /etc/fstab or remove it from
> > > > /etc/fstab entirely.
> > > > 
> > > > I think the problem you were having is not copying all the
> > > > required attributes of the file systems in question when
> > > > cloning your SD cards, given your /etc/fstab setup.  It sounds
> > > > like you've fixed that, now.
> > > Again, if it dropped to single user mode *and said it was doing
> > > so* or
> > > if there was an error message on the console when the filesystem
> > > failed
> > > to mount I would have found this in a reasonable period of time.
> > > It wasn't that rough to do so with the ufs label once I knew the
> > > filesystem
> > > was failing to mount, which was discernible from the console
> > > output.
> > > 
> > > Not printing an error when things error out is rude at best, and
> > > when
> > > that error is going to prevent the system from coming up this
> > > darn well
> > > ought to show up where one with a monitor plugged in can see it,
> > > eh?
> > > 
> > > There was literally no indication at all as to what was going on
> > > and
> > > since gpart does not show filesystem labels for *either* BSD
> > > labeled
> > > slices OR msdos figuring out what was different between the two
> > > proved
> > > to be a bit troublesome.  IMHO at least the failure to display an
> > > error
> > > message in this circumstance ought to be corrected.
> > 
> > See above re: serial console vs. video console.
> > 
> > As for the labels, these are file system labels and not partition
> > labels.  The big clue is in the device name in /etc/fstab.  (The
> > "-l" option to "gpart show" will only show labels "[f]or
> > partitioning schemes that support partition labels".  That's
> > reasonable, IMHO, as partitions are not the same as file systems
> > and gpart is concerned with partitions.)  In my experience,
> > complaints about not being able to access /dev/ufs/something means
> > you forgot to label a UFS file system as "something" when you made
> > it. :-)
> > 
> > Cheers,
> > 
> > Paul.
> 
> Understood, but the issue here is that there's no indication without
> a
> serial console that you have anything wrong -- the system appears to
> have simply hung.
> 
> The quick fix is to put "failok" (or noauto) in the default
> /etc/fstab
> entry for the dos filesystem, since it is not necessary for that
> filesystem to be mounted at all on a running machine.  If there is a
> policy reason to leave it accessible (and there's a fairly-clean
> argument that there is) then "failok" might be preferable to
> "noauto",
> but either way forcing a filesystem that is not necessary to be
> accessible or the system fails to come up and does not give any
> indication of same on what many users will have accessible to them is
> facially wrong.
> 
> These devices are thought of as "appliances" by many and as such the
> model of USB keyboard + HDMI (e.g. TV or monitor) is entirely
> reasonable, and IMHO FreeBSD ought to, when possible, make that a
> viable
> option.  It both is and can be provided the kernel loads, but the
> defaults in pre-built configurations right now preclude that.
> 

I'm having a hard time understanding how a problem report got generated
about all this, or how any of it is anything other than "Karl
misconfigured his system."

The downloadable system images work correctly.  You made a local change
(formatted new media) and depending on how you want to look at it,
either you didn't format correctly or you didn't make your config files
match the way you formatted, and that made your system stop working.
It doesn't mean there is anything wrong about the way the downloadable
images are generated.
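
For anyone cloning cards by hand, the "config files must match the way you
formatted" point can be checked before the first boot.  What follows is only
a rough sketch in Python, not part of any FreeBSD tooling.  SAMPLE_FSTAB is an
illustrative stand-in built from the device names mentioned in this thread
(it is not a copy of the stock file), and required_labels() is a hypothetical
helper name.  It lists the file system labels an fstab depends on and flags
entries that carry none of the "noauto", "failok" or "late" options discussed
above.

```python
# Sketch only: SAMPLE_FSTAB is an illustrative stand-in, not the exact
# fstab shipped in the images. The device names come from this thread.
SAMPLE_FSTAB = """\
# Device                 Mountpoint   FStype   Options      Dump  Pass#
/dev/ufs/rootfs          /            ufs      rw           1     1
/dev/msdosfs/MSDOSBOOT   /boot/msdos  msdosfs  rw,noatime   0     0
"""

# Label-based device directories mentioned in the thread.
LABEL_PREFIXES = {"/dev/ufs/": "ufs", "/dev/msdosfs/": "msdosfs"}

# Options named in the thread as keeping a mount out of the normal
# boot-time path (assumption: any one of them is treated as enough here).
NON_BLOCKING = {"noauto", "failok", "late"}


def required_labels(fstab_text):
    """Return (label_kind, label, mountpoint, needed_at_boot) tuples
    for every fstab entry that refers to a file system label."""
    found = []
    for raw in fstab_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        fields = line.split()
        if len(fields) < 4:
            continue  # not a complete entry; ignore it
        device, mountpoint, _fstype, options = fields[:4]
        for prefix, kind in LABEL_PREFIXES.items():
            if device.startswith(prefix):
                label = device[len(prefix):]
                opts = set(options.split(","))
                needed_at_boot = not (opts & NON_BLOCKING)
                found.append((kind, label, mountpoint, needed_at_boot))
    return found


if __name__ == "__main__":
    for kind, label, mountpoint, needed in required_labels(SAMPLE_FSTAB):
        note = "needed at boot" if needed else "optional at boot"
        print(f"{mountpoint}: needs {kind} label '{label}' ({note})")
```

Every label it reports has to exist on the new media (for the msdos
partition, that is what the -L argument to newfs_msdos sets), or the fstab
entry has to be changed to name a device that will exist.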

Changing fstab in the distributed images so that a failure to mount a
filesystem becomes a non-error seems like a bad idea to me.  The only
way that problem happens with a downloaded image is if the image wasn't
burned successfully, and that doesn't seem like something that needs to
get papered over just because in your use-case you don't really
need the filesystem that failed to mount.

A PR about the fact that it hung without visibly reporting an error may
be appropriate.  A PR that says we should just paper over the error
because you don't care about it doesn't seem appropriate.

-- Ian



