Skip site navigation (1)Skip section navigation (2)
Date:      Wed, 03 Sep 2014 10:16:21 -0600
From:      Ian Lepore <ian@FreeBSD.org>
To:        Alexander Motin <mav@FreeBSD.org>
Cc:        src-committers@FreeBSD.org, John-Mark Gurney <jmg@funkthat.com>, Roger Pau =?ISO-8859-1?Q?Monn=E9?= <royger@FreeBSD.org>, scottl@FreeBSD.org, cperciva@FreeBSD.org, svn-src-head@FreeBSD.org, gibbs@freebsd.org, svn-src-all@FreeBSD.org
Subject:   Re: svn commit: r269814 - head/sys/dev/xen/blkfront
Message-ID:  <1409760981.1150.284.camel@revolution.hippie.lan>
In-Reply-To: <54073BC2.1000703@FreeBSD.org>
References:  <53e8e31e.2179.30c1c657@svn.freebsd.org> <53FF7386.3050804@FreeBSD.org> <20140828184515.GV71691@funkthat.com> <53FF7BC4.6050801@FreeBSD.org> <5400BDC7.7020902@FreeBSD.org> <54058E1E.4050907@FreeBSD.org> <20140902171841.GX71691@funkthat.com> <5407385B.1000005@FreeBSD.org>  <54073BC2.1000703@FreeBSD.org>

next in thread | previous in thread | raw e-mail | index | archive | help
On Wed, 2014-09-03 at 19:03 +0300, Alexander Motin wrote:
> On 03.09.2014 18:48, Roger Pau Monn=E9 wrote:
> > El 02/09/14 a les 19.18, John-Mark Gurney ha escrit:
> >> Roger Pau Monn wrote this message on Tue, Sep 02, 2014 at 11:30 +020=
0:
> >>> El 29/08/14 a les 19.52, Roger Pau Monn=E9 ha escrit:
> >>>> El 28/08/14 a les 20.58, Alexander Motin ha escrit:
> >>>>> On 28.08.2014 21:45, John-Mark Gurney wrote:
> >>>>>> Alexander Motin wrote this message on Thu, Aug 28, 2014 at 21:23=
 +0300:
> >>>>>>> Hi, Roger.
> >>>>>>>
> >>>>>>> It looks to me like this commit does not work as it should. I g=
ot
> >>>>>>> problem when I just tried `newfs /dev/ada0 ; mount /dev/ada0 /m=
nt`.
> >>>>>>> Somehow newfs does not produce valid filesystem. Problem is rel=
iably
> >>>>>>> repeatable and reverting this commit fixes it.
> >>>>>>>
> >>>>>>> I found at least one possible cause there: If original data buf=
fer is
> >>>>>>> unmapped, misaligned and not physically contiguous, then presen=
t x86
> >>>>>>> bus_dmamap_load_bio() implementation will process each physical=
ly
> >>>>>>> contiguous segment separately. Due to the misalignment first an=
d last
> >>>>>>> physical segments may have size not multiple to 512 bytes. Sinc=
e each
> >>>>>>> segment processed separately, they are not joined together, and
> >>>>>>> xbd_queue_cb() is getting segments not multiple to 512 bytes. A=
ttempt to
> >>>>>>> convert them to exact number of sectors in the driver cause dat=
a corruption.
> >>>>>>
> >>>>>> Are you sure this isn't a problem w/ the tag not properly specif=
ying
> >>>>>> the correct alignement?=20
> >>>>>
> >>>>> I don't know how to specify it stronger then this:
> >>>>>         error =3D bus_dma_tag_create(
> >>>>>             bus_get_dma_tag(sc->xbd_dev),       /* parent */
> >>>>>             512, PAGE_SIZE,                     /* algnmnt, bound=
ary */
> >>>>>             BUS_SPACE_MAXADDR,                  /* lowaddr */
> >>>>>             BUS_SPACE_MAXADDR,                  /* highaddr */
> >>>>>             NULL, NULL,                         /* filter, filter=
arg */
> >>>>>             sc->xbd_max_request_size,
> >>>>>             sc->xbd_max_request_segments,
> >>>>>             PAGE_SIZE,                          /* maxsegsize */
> >>>>>             BUS_DMA_ALLOCNOW,                   /* flags */
> >>>>>             busdma_lock_mutex,                  /* lockfunc */
> >>>>>             &sc->xbd_io_lock,                   /* lockarg */
> >>>>>             &sc->xbd_io_dmat);
> >>>>>
> >>>>>> Also, I don't think there is a way for busdma
> >>>>>> to say that you MUST have a segment be a multiple of 512, though=
 you
> >>>>>> could use a 512 boundary, but that would force all segments to o=
nly be
> >>>>>> 512 bytes...
> >>>>>
> >>>>> As I understand, that is mandatory requirement for this "hardware=
".
> >>>>> Alike 4K alignment requirement also exist at least for SDHCI, and=
 IIRC
> >>>>> UHCI/OHCI hardware. Even AHCI requires both segment addresses and
> >>>>> lengths to be even.
> >>>>>
> >>>>> I may be wrong, but I think it is quite likely that hardware that
> >>>>> requires segment address alignment quite likely will have the sam=
e
> >>>>> requirements for segments length.
> >>>
> >>> Hello,
> >>>
> >>> I have the following fix, which makes sure the total length and the=
=20
> >>> size of each segment is aligned. I'm not very knowledgeable of the=20
> >>> busdma code, so someone has to review it.
> >>
> >> I feel that this alignment should only be enforced via a new option =
on
> >> the tag...  I don't see how alignment and segment size should be
> >> conflated...  I could totally see a device that requires an aligneme=
nt
> >> of 8 bytes, but has a segment size of 16, or vice versa, and requiri=
ng
> >> them to be the same means we will bounce unnecesarily...
> >>
> >> cc'd scottl since he knows this code better than I... and cperciva a=
s
> >> he touched it for similar reasons..
> >>
> >> Oh, I just found PR 152818, where cperciva did a similar fix to
> >> bounce_bus_dmamap_load_buffer for the exact same reason...  It was
> >> committed in r216194...
> >=20
> > Since Xen blkfront seems to be the only driver to have such segment=20
> > size requirements,=20
>=20
> No, it is not. I've already posted other examples I can recall: SDHCI,
> UHCI/OHCI and AHCI. Their limitations are different and less strict, bu=
t
> still may need handling. For SDHCI, since it is quite slow and has many
> other bugs, I practically implemented custom buffer bouncing. AHCI I
> suppose works only because limitation is only for even addresses, and
> odd ones happen extremely rarely (does not happen). For USB I am not
> sure, but at least umass driver does not support unmapped I/O.
>=20
> > it might be best to just fix blkfront to always=20
> > roundup segment size to 512, like the following:
>=20
> I think some coffee is needed here. ;) Rounding addresses won't make
> data properly aligned. Some copy is unavoidable in such cases. It would
> be good if it was done properly by default buffer bouncer.
>=20

And likewise I don't see how rounding up sizes can result in anything
other than writing extra garbage to disk.  If there is a 512 byte
sector's worth of data not aligned to a 512 boundary and crossing a page
boundary, the bounce code is going to have to cope with the fact that
the second half of the data to be copied may not be physically
contiguous, unless there's some rule in the phys bio world that g'tees
it is.

-- Ian

> > diff --git a/sys/dev/xen/blkfront/blkfront.c b/sys/dev/xen/blkfront/b=
lkfront.c
> > index 26b8f09..2d284d9 100644
> > --- a/sys/dev/xen/blkfront/blkfront.c
> > +++ b/sys/dev/xen/blkfront/blkfront.c
> > @@ -209,7 +209,8 @@ xbd_queue_cb(void *arg, bus_dma_segment_t *segs, =
int nsegs, int error)
> > =20
> >  			buffer_ma =3D segs->ds_addr;
> >  			fsect =3D (buffer_ma & PAGE_MASK) >> XBD_SECTOR_SHFT;
> > -			lsect =3D fsect + (segs->ds_len  >> XBD_SECTOR_SHFT) - 1;
> > +			lsect =3D fsect + (roundup2(segs->ds_len, 512)
> > +			    >> XBD_SECTOR_SHFT) - 1;
> > =20
> >  			KASSERT(lsect <=3D 7, ("XEN disk driver data cannot "
> >  			    "cross a page boundary"));
>=20
>=20





Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?1409760981.1150.284.camel>