Date:      Sun, 27 Nov 2011 20:41:20 +0200
From:      Kostik Belousov <kostikbel@gmail.com>
To:        Lev Serebryakov <lev@freebsd.org>
Cc:        Kirk McKusick <mckusick@mckusick.com>, freebsd-fs@freebsd.org
Subject:   Re: Does UFS2 send BIO_FLUSH to GEOM when update metadata (with softupdates)?
Message-ID:  <20111127184120.GT50300@deviant.kiev.zoral.com.ua>
In-Reply-To: <1381381670.20111127152414@serebryakov.spb.ru>
References:  <20111123194444.GE50300@deviant.kiev.zoral.com.ua> <201111260725.pAQ7PDow056289@chez.mckusick.com> <20111126080351.GD50300@deviant.kiev.zoral.com.ua> <1961318852.20111126121354@serebryakov.spb.ru> <20111126084151.GH50300@deviant.kiev.zoral.com.ua> <1381381670.20111127152414@serebryakov.spb.ru>

On Sun, Nov 27, 2011 at 03:24:14PM +0400, Lev Serebryakov wrote:
> Hello, Kostik.
> You wrote 26 November 2011, 12:41:51:
>
> > on the operation end. In fact, there is inherited ugliness due to async
> > nature, namely, the kernel-owned buffer locks. Getting rid of them would
> > be much more useful than breaking UFS.
>   Why do you call it breaking? How could an additional piece of
> meta-information break UFS?
Because disabling reordering of the writes issued by UFS slows it down
by a factor of 3-10.

>
> > The non-broken driver must not return the 'completed' bio into the up
> > queue until write is sent to hardware and hardware reported the completion.
>  So, holding a bio without completion for, say, 5 minutes, will be OK?
It is up to the users of your driver to decide whether it is OK or not.

For UFS/SU, the only consequence will be the accumulation of the
workitems in memory that track dependencies of other metadata buffers on
the delayed one. For UFS/SU+J, if some buffer is delayed indefinitely,
the journal might overflow.
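
A toy model of the SU+J case, for illustration only. It assumes that
journal space is reclaimed in order from the oldest record, and that a
record can be freed only after the buffer it covers has been written.
ToyJournal and simulate() are made-up names, not the softdep code.

```python
from collections import deque


class ToyJournal:
    """Fixed-capacity log; a record stays pinned until its buffer completes."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.records = deque()  # buffer ids, oldest first

    def log(self, buf_id):
        if len(self.records) >= self.capacity:
            raise OverflowError("journal full: oldest record is still pinned")
        self.records.append(buf_id)

    def reclaim(self, completed):
        # Assumption: space is freed only from the head, in order.
        while self.records and self.records[0] in completed:
            self.records.popleft()


def simulate(n_writes, capacity, stuck_buf=None):
    """Return the id of the first write that cannot be journaled, or None."""
    journal = ToyJournal(capacity)
    completed = set()
    for buf_id in range(n_writes):
        journal.reclaim(completed)
        try:
            journal.log(buf_id)
        except OverflowError:
            return buf_id
        # The hypothetical driver completes every bio except the stuck one.
        if buf_id != stuck_buf:
            completed.add(buf_id)
    return None
```

With no stuck buffer, simulate(10, 4) finishes all writes. With
stuck_buf=0 the head record is never freed and the journal fills up after
`capacity` records.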

>
> > Raid controllers which aggressively cache the writes use nvram or
> > battery backups, and do not allow to turn on write cache if battery is
> > non-functional. I had not seen SU inconsistencies on RAID 6 on mfi(4),
>   It is not always true. And it could be not true for network
> attached storage, as there are too many variables in the equation in such
> a case. Yes, a good controller should do this, I could not agree more. But
> it is not always possible, unfortunately.
Your claims are not backed by any facts. Please inform us of the models
and revisions of the firmware for the devices you declare are broken
in the described ways. Also, please reference the documentation which
states that devices behave in such a way.

At least I would know what to avoid.

>
> > despite one of our machines having an unfortunate habit of dropping the
> > boot disk over the SATA channel each week, for 2 years.
>    Great! But even a battery-backed (read: UPS) software implementation is
>  not protected from OS crashes. So, it is impossible to implement
>  software RAID5 which plays nicely with UFS (in case of crash --
>  until there is no crash, everything is perfect), now. Ok, you could
>  say ``we don't need it at all,'' but I could not agree with this
>  statement. Yes, I'm biased here. But, really, I see some interest in
>  software RAID5 on FreeBSD now.
Software RAID5 might lose the checksum block due to kernel or power
failure. This is not different from RAID1 declared inconsistent after
the unclean stop.

Your claim is not backed by facts, again.
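
For illustration, the parity-loss case above can be sketched as a toy
model. It assumes plain XOR parity over one stripe; parity() and rebuild()
are made-up helpers, not any GEOM class.

```python
from functools import reduce
from operator import xor


def parity(blocks):
    """XOR parity over equal-length blocks."""
    return bytes(reduce(xor, column) for column in zip(*blocks))


def rebuild(surviving, par):
    """Reconstruct the one missing data block from survivors plus parity."""
    return parity(list(surviving) + [par])


data = [b"AAAA", b"BBBB", b"CCCC"]
par = parity(data)

# Interrupted stripe update: the new data block reaches the disk,
# the matching parity update does not (kernel or power failure).
data[0] = b"ZZZZ"
stripe_consistent = parity(data) == par  # False: parity is stale

# Resync after the unclean stop: recompute the parity from the data,
# much like a RAID1 mirror is resynchronized.
par = parity(data)
recovered = rebuild([data[0], data[2]], par)  # equals data[1] again
```

Until the resync runs, the stripe is only marked inconsistent; after it,
a single missing data block is recoverable from the parity again.
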
>
> > You again missed the point - if metadata is not reorderable, but user
> > data is, you get security issues. They are similar (but inverse) to what
> > I described in the previous paragraph.
>   In case of crash -- yes. But, IMHO, in case of crash there could be
>  a scenario when some information is leaked in any case. If there is no
>  crash, you have no security issues. Because every read will return
>  actual information, either from write cache, or from platters.
>  Inconsistent cache implementation is a bad thing, for sure, but it is
>  a question orthogonal to what we discuss here.
I cannot understand how your answer is related to my statement.
