Skip site navigation (1)Skip section navigation (2)
Date:      Wed, 14 Dec 2016 16:47:05 +0000
From:      Roger Pau =?iso-8859-1?Q?Monn=E9?= <royger@freebsd.org>
To:        Dimitry Andric <dim@FreeBSD.org>
Cc:        Colin Percival <cperciva@FreeBSD.org>, src-committers <src-committers@freebsd.org>, svn-src-all@freebsd.org, svn-src-head@freebsd.org
Subject:   Re: svn commit: r310013 - head/sys/dev/xen/blkfront
Message-ID:  <20161214164704.grn5gvsm3u6difs2@dhcp-3-221.uk.xensource.com>
In-Reply-To: <A5CE6877-FDB2-4B07-A611-29FAD91FC9F2@FreeBSD.org>
References:  <201612130654.uBD6sDtF073358@repo.freebsd.org> <A5CE6877-FDB2-4B07-A611-29FAD91FC9F2@FreeBSD.org>

next in thread | previous in thread | raw e-mail | index | archive | help
On Tue, Dec 13, 2016 at 09:28:21PM +0100, Dimitry Andric wrote:
> On 13 Dec 2016, at 07:54, Colin Percival <cperciva@FreeBSD.org> wrote:
> > 
> > Author: cperciva
> > Date: Tue Dec 13 06:54:13 2016
> > New Revision: 310013
> > URL: https://svnweb.freebsd.org/changeset/base/310013
> > 
> > Log:
> >  Check that blkfront devices have a non-zero number of sectors and a
> >  non-zero sector size.  Such a device would be a virtual disk of zero
> >  bytes; clearly not useful, and not something we should try to attach.
> > 
> >  As a fortuitous side effect, checking that these values are non-zero
> >  here results in them not *becoming* zero later on the function.  This
> >  odd behaviour began with r309124 (clang 3.9.0) but is challenging to
> >  debug; making any changes to this function whatsoever seems to affect
> >  the llvm optimizer behaviour enough to make the unexpected zeroing of
> >  the sector_size variable cease.
> 
> I've been having some fun debugging a kernel under Xen, and what I found
> is that sector_size changes from 512 to 0 because of an xs_gather() call
> in xbd_connect().  The function starts as follows:
> 
>   1222  static void
>   1223  xbd_connect(struct xbd_softc *sc)
>   1224  {
>   1225          device_t dev = sc->xbd_dev;
>   1226          unsigned long sectors, sector_size, phys_sector_size;
>   1227          unsigned int binfo;
>   1228          int err, feature_barrier, feature_flush;
> ...
>   1237          err = xs_gather(XST_NIL, xenbus_get_otherend_path(dev),
>   1238              "sectors", "%lu", &sectors,
>   1239              "info", "%u", &binfo,
>   1240              "sector-size", "%lu", &sector_size,
>   1241              NULL);
> 
> After this first call to xs_gather(), sectors is 44040322 (in my case of
> a ~21G disk), binfo is zero, and sector_size is 512.  During execution
> of the following few lines, sector_size stays 512, until just after the
> fourth call to xs_gather():
> 
>   1259          err = xs_gather(XST_NIL, xenbus_get_otherend_path(dev),
>   1260               "feature-flush-cache", "%lu", &feature_flush,
>   1261               NULL);
> 
> and then it becomes zero!  This is because the %lu format scans a 64 bit
> unsigned integer, while the feature_flush variable is a 32 bit signed
> integer, thus overwriting an adjacent location on the stack, which
> happens to contain sector_size: in the assembly output, sector_size is
> at -88(%rbp), while feature_flush is at -92(%rbp).
> 
> There is another instance of such an incorrect format, a few lines
> before this, but it doesn't seem to scribble over important data (or we
> didn't panic because of it, yet):
> 
>   1253          err = xs_gather(XST_NIL, xenbus_get_otherend_path(dev),
>   1254               "feature-barrier", "%lu", &feature_barrier,
>   1255               NULL);
> 
> Here, feature_barrier is a 32 bit signed integer, while %lu again scans
> a 64 bit unsigned integer.
> 
> In short, I think the following change would also fix the problem:
> 
> Index: sys/dev/xen/blkfront/blkfront.c
> ===================================================================
> --- sys/dev/xen/blkfront/blkfront.c     (revision 309817)
> +++ sys/dev/xen/blkfront/blkfront.c     (working copy)
> @@ -1251,13 +1251,13 @@ xbd_connect(struct xbd_softc *sc)
>         if (err || phys_sector_size <= sector_size)
>                 phys_sector_size = 0;
>         err = xs_gather(XST_NIL, xenbus_get_otherend_path(dev),
> -            "feature-barrier", "%lu", &feature_barrier,
> +            "feature-barrier", "%d", &feature_barrier,
>              NULL);
>         if (err == 0 && feature_barrier != 0)
>                 sc->xbd_flags |= XBDF_BARRIER;
> 
>         err = xs_gather(XST_NIL, xenbus_get_otherend_path(dev),
> -            "feature-flush-cache", "%lu", &feature_flush,
> +            "feature-flush-cache", "%d", &feature_flush,
>              NULL);
>         if (err == 0 && feature_flush != 0)
>                 sc->xbd_flags |= XBDF_FLUSH;
> 
> However, I have some difficulty getting a custom-built kernel booting on
> my very rudimentary Xen setup.  I am using a snapshot raw image, with no
> idea how to insert a kernel into it.
> 
> E.g, can you please try this out, instead of the zero-check fix?  I am
> very curious whether that fixes the panics. :)

Yes, this indeed fixes the panic, I've reverted Colin's patch and applied yours, 
and everything seems fine. Please go ahead and commit it.

xs_gather should really go away and we should use something that we can sanitize 
at compile time using __scanflike & friends.

Thanks, Roger.



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20161214164704.grn5gvsm3u6difs2>