Skip site navigation (1)Skip section navigation (2)
Date:      Tue, 13 Dec 2016 21:28:21 +0100
From:      Dimitry Andric <dim@FreeBSD.org>
To:        Colin Percival <cperciva@FreeBSD.org>
Cc:        src-committers <src-committers@freebsd.org>, svn-src-all@freebsd.org, svn-src-head@freebsd.org, =?utf-8?Q?Roger_Pau_Monn=C3=A9?= <royger@freebsd.org>
Subject:   Re: svn commit: r310013 - head/sys/dev/xen/blkfront
Message-ID:  <A5CE6877-FDB2-4B07-A611-29FAD91FC9F2@FreeBSD.org>
In-Reply-To: <201612130654.uBD6sDtF073358@repo.freebsd.org>
References:  <201612130654.uBD6sDtF073358@repo.freebsd.org>

next in thread | previous in thread | raw e-mail | index | archive | help

--Apple-Mail=_CE0A3362-B11A-4518-AE43-BF90CD47FFBB
Content-Transfer-Encoding: 7bit
Content-Type: text/plain;
	charset=us-ascii

On 13 Dec 2016, at 07:54, Colin Percival <cperciva@FreeBSD.org> wrote:
> 
> Author: cperciva
> Date: Tue Dec 13 06:54:13 2016
> New Revision: 310013
> URL: https://svnweb.freebsd.org/changeset/base/310013
> 
> Log:
>  Check that blkfront devices have a non-zero number of sectors and a
>  non-zero sector size.  Such a device would be a virtual disk of zero
>  bytes; clearly not useful, and not something we should try to attach.
> 
>  As a fortuitous side effect, checking that these values are non-zero
>  here results in them not *becoming* zero later on the function.  This
>  odd behaviour began with r309124 (clang 3.9.0) but is challenging to
>  debug; making any changes to this function whatsoever seems to affect
>  the llvm optimizer behaviour enough to make the unexpected zeroing of
>  the sector_size variable cease.

I've been having some fun debugging a kernel under Xen, and what I found
is that sector_size changes from 512 to 0 because of an xs_gather() call
in xbd_connect().  The function starts as follows:

  1222  static void
  1223  xbd_connect(struct xbd_softc *sc)
  1224  {
  1225          device_t dev = sc->xbd_dev;
  1226          unsigned long sectors, sector_size, phys_sector_size;
  1227          unsigned int binfo;
  1228          int err, feature_barrier, feature_flush;
...
  1237          err = xs_gather(XST_NIL, xenbus_get_otherend_path(dev),
  1238              "sectors", "%lu", &sectors,
  1239              "info", "%u", &binfo,
  1240              "sector-size", "%lu", &sector_size,
  1241              NULL);

After this first call to xs_gather(), sectors is 44040322 (in my case of
a ~21G disk), binfo is zero, and sector_size is 512.  During execution
of the following few lines, sector_size stays 512, until just after the
fourth call to xs_gather():

  1259          err = xs_gather(XST_NIL, xenbus_get_otherend_path(dev),
  1260               "feature-flush-cache", "%lu", &feature_flush,
  1261               NULL);

and then it becomes zero!  This is because the %lu format scans a 64 bit
unsigned integer, while the feature_flush variable is a 32 bit signed
integer, thus overwriting an adjacent location on the stack, which
happens to contain sector_size: in the assembly output, sector_size is
at -88(%rbp), while feature_flush is at -92(%rbp).

There is another instance of such an incorrect format, a few lines
before this, but it doesn't seem to scribble over important data (or we
didn't panic because of it, yet):

  1253          err = xs_gather(XST_NIL, xenbus_get_otherend_path(dev),
  1254               "feature-barrier", "%lu", &feature_barrier,
  1255               NULL);

Here, feature_barrier is a 32 bit signed integer, while %lu again scans
a 64 bit unsigned integer.

In short, I think the following change would also fix the problem:

Index: sys/dev/xen/blkfront/blkfront.c
===================================================================
--- sys/dev/xen/blkfront/blkfront.c     (revision 309817)
+++ sys/dev/xen/blkfront/blkfront.c     (working copy)
@@ -1251,13 +1251,13 @@ xbd_connect(struct xbd_softc *sc)
        if (err || phys_sector_size <= sector_size)
                phys_sector_size = 0;
        err = xs_gather(XST_NIL, xenbus_get_otherend_path(dev),
-            "feature-barrier", "%lu", &feature_barrier,
+            "feature-barrier", "%d", &feature_barrier,
             NULL);
        if (err == 0 && feature_barrier != 0)
                sc->xbd_flags |= XBDF_BARRIER;

        err = xs_gather(XST_NIL, xenbus_get_otherend_path(dev),
-            "feature-flush-cache", "%lu", &feature_flush,
+            "feature-flush-cache", "%d", &feature_flush,
             NULL);
        if (err == 0 && feature_flush != 0)
                sc->xbd_flags |= XBDF_FLUSH;

However, I have some difficulty getting a custom-built kernel booting on
my very rudimentary Xen setup.  I am using a snapshot raw image, with no
idea how to insert a kernel into it.

E.g, can you please try this out, instead of the zero-check fix?  I am
very curious whether that fixes the panics. :)

-Dimitry

P.S.: We really need AddressSanitizer support in the kernel...


--Apple-Mail=_CE0A3362-B11A-4518-AE43-BF90CD47FFBB
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
	filename=signature.asc
Content-Type: application/pgp-signature;
	name=signature.asc
Content-Description: Message signed with OpenPGP using GPGMail

-----BEGIN PGP SIGNATURE-----
Version: GnuPG/MacGPG2 v2.0.30

iEYEARECAAYFAlhQWfIACgkQsF6jCi4glqNjoACg8Bm/ggGf+Pbrm9wIuajXxm6Y
2qoAoOl3Kob3ZGUUyfZaLS6y/lZgx27I
=RooA
-----END PGP SIGNATURE-----

--Apple-Mail=_CE0A3362-B11A-4518-AE43-BF90CD47FFBB--



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?A5CE6877-FDB2-4B07-A611-29FAD91FC9F2>