Skip site navigation (1)Skip section navigation (2)
Date:      Sat, 2 Apr 2011 09:57:02 +0200
From:      Olivier Smedts <olivier@gid0.org>
To:        freebsd-stable@freebsd.org
Subject:   Re: Constant rebooting after power loss
Message-ID:  <BANLkTik9aN7TZ_pSZ1b=nMeXO-mW-fYuUA@mail.gmail.com>
In-Reply-To: <201104020335.p323Zp8Q018666@apollo.backplane.com>
References:  <87d3l6p5xv.fsf@cosmos.claresco.hr> <AANLkTi=kEyz-mKLzdV8LAf91ZhMTP8gLKs=3Eu5WD8mh@mail.gmail.com> <874o6ip0ak.fsf@cosmos.claresco.hr> <7b15d37d28f8ddac9eb81e4390231c96.HRCIM@webmail.1command.com> <AANLkTi=KEwmm1hM6Z=r_SWUAn9KhUrkTVzfF6VmqQauW@mail.gmail.com> <14c23d4bf5b47a7790cff65e70c66151.HRCIM@webmail.1command.com> <AANLkTi=6pqRwJ96Lg=603cYg_f8QUXkg8aXtbjbYpFrV@mail.gmail.com> <201104020335.p323Zp8Q018666@apollo.backplane.com>

next in thread | previous in thread | raw e-mail | index | archive | help
2011/4/2 Matthew Dillon <dillon@apollo.backplane.com>:
> =A0 =A0The core of the issue here comes down to two things:
>
> =A0 =A0First, a power loss to the drive will cause the drive's dirty writ=
e cache
> =A0 =A0to be lost, that data will not make it to disk. =A0Nor do you real=
ly want
> =A0 =A0to turn of write caching on the physical drive. =A0Well, you CAN t=
urn it
> =A0 =A0off, but if you do performance will become so bad that there's no =
point.
> =A0 =A0So turning off the write caching is really a non-starter.
>
> =A0 =A0The solution to this first item is for the OS/filesystem to issue =
a
> =A0 =A0disk flush command to the drive at appropriate times. =A0If I reca=
ll the
> =A0 =A0ZFS implementation in FreeBSD *DOES* do this for transaction group=
s,
> =A0 =A0which guarantees that a prior transaction group is fully synced be=
fore
> =A0 =A0a new ones starts running (HAMMER in DragonFly also does this).
> =A0 =A0(Just getting an 'ack' from the write transaction over the SATA bu=
s only
> =A0 =A0means the data made it to the drive's cache, not that it made it t=
o
> =A0 =A0the platter).

Amen !

> =A0 =A0I'm not sure about UFS vis-a-vie the recent UFS logging features..=
.
> =A0 =A0it might be an option but I don't know if it is a default. =A0Perh=
aps
> =A0 =A0someone can comment on that.
>
> =A0 =A0One last note here. =A0Many modern drives have very large ram cach=
es.
> =A0 =A0OCZ's SSDs have something like 256MB write caches and many modern =
HDs
> =A0 =A0now come with 32MB and 64MB caches. =A0Aged drives with lots of re=
located
> =A0 =A0sectors and bit errors can also take a very long time to perform w=
rites
> =A0 =A0on certain sectors. =A0So these large caches take time to drain an=
d one
> =A0 =A0can't really assume that an acknowledged write to disk will actual=
ly
> =A0 =A0make it to the disk under adverse circumstances any more. =A0All s=
orts
> =A0 =A0of bad things can happen.
>
> =A0 =A0Finally, the drives don't order their writes to the platter (you c=
an
> =A0 =A0set a bit to tell them to, but like many similar bits in the past =
there
> =A0 =A0is no real guarantee that the drives will honor it). =A0So if two
> =A0 =A0transactions do not have a disk flush command inbetween them it is
> =A0 =A0possible for data from the second transaction to commit to the pla=
tter
> =A0 =A0before all the data from the first transaction commits to the plat=
ter.
> =A0 =A0Or worse, for the non-transactional data to update out of order re=
lative
> =A0 =A0to the transactional data which was supposed to commit first.
>
> =A0 =A0Hence IMHO the OS/filesystem must use the disk flush command in su=
ch
> =A0 =A0situations for good reliability.
>
> =A0 =A0--
>
> =A0 =A0The second problem is that a physical loss of power to the drive c=
an
> =A0 =A0cause the drive to physically lose one or more sectors, and can ev=
en
> =A0 =A0effectively destroy the drive (even with the fancy auto-park)... i=
f the
> =A0 =A0drive happens to be in the middle of a track write-back when power=
 is
> =A0 =A0lost it is possible to lose far more than a single sector, includi=
ng
> =A0 =A0sectors unrelated to recent filesystem operations.
>
> =A0 =A0The only solution to #2 is to make sure your machines (or at least=
 the
> =A0 =A0drives if they happen to be in external enclosures) are connected =
to
> =A0 =A0a UPS and that the machines are communicating with the UPS via
> =A0 =A0something like the "apcupsd" port. =A0AND also that you test to ma=
ke
> =A0 =A0sure the machines properly shut themselves down when AC is lost be=
fore
> =A0 =A0the UPS itself runs out of battery time. =A0After all, a UPS won't=
 help
> =A0 =A0if the machines don't at least idle their drives before power is l=
ost!!!
>
> =A0 =A0I learned this lesson the hard way about 3 years ago. =A0I had som=
ething
> =A0 =A0like a dozen drives in two raid arrays doing heavy write activity =
and
> =A0 =A0lost physical power and several of the drives were totally destroy=
ed,
> =A0 =A0with thousands of sector errors. =A0Not just one or two... thousan=
ds.
>
> =A0 =A0(It is unclear how SSDs react to physical loss of power during hea=
vy
> =A0 =A0writing activity. =A0Theoretically while they will certainly lose =
their
> =A0 =A0write cache they shouldn't wind up with any read errors).
>
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =
=A0 =A0 =A0 =A0 =A0 =A0-Matt
>
> _______________________________________________
> freebsd-stable@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"
>



--=20
Olivier Smedts=A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =
=A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0=A0 _
=A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0=
 =A0 ASCII ribbon campaign ( )
e-mail: olivier@gid0.org=A0 =A0 =A0 =A0 - against HTML email & vCards=A0 X
www: http://www.gid0.org=A0 =A0 - against proprietary attachments / \

=A0 "Il y a seulement 10 sortes de gens dans le monde :
=A0 ceux qui comprennent le binaire,
=A0 et ceux qui ne le comprennent pas."



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?BANLkTik9aN7TZ_pSZ1b=nMeXO-mW-fYuUA>