From: Ross Cameron
Reply-To: ross.cameron@linuxpro.co.za
Date: Mon, 25 Jun 2012 01:40:58 +0200
To: Wojciech Puchar
Cc: Julien Cigar, freebsd-questions@freebsd.org
Subject: Re: Is ZFS production ready?

On Thu, Jun 21, 2012 at 4:44 PM, Wojciech Puchar
<wojtek@wojtek.tensor.gdynia.pl> wrote:

>> One interesting feature of ZFS is its block checksum: all reads and
>> writes include a block checksum, so it can easily detect situations
>> where, for example, data is quietly corrupted by RAM.
>
> You may be shocked, but you are sometimes wrong. I have already
> demonstrated it: the checksumming doesn't catch any errors, and ZFS
> does write wrong data with correct checksums :)
>
> It's quite easy to explain if one understands the hardware details.
> Checksumming will protect you from:
>
> - a failed SATA/SAS port, or an on-disk controller that returns bad
>   data as good. This is actually a really rare case; I have never
>   seen it, but maybe it happens.
>
> - some types of DRAM failure, but not all. Actually just a small
>   fraction, because a DRAM failure like that would bring your system
>   to a crash so quickly that you are unlikely to get much data
>   corruption.
>
> The common case with DRAM is that after you write to it, the memory
> keeps the right data for some time and then RARELY flips some bit
> later, in spite of refresh.
>
> With this type of failure you may run your machine for hours, even
> days or longer. And ZFS will calculate a proper checksum of the wrong
> data and write it to disk.
>
> This is the reason I keep a few failed DIMMs - for testing how
> different software behaves on a broken machine.
>
> UFS resulted in a few corrupted files after half a day of heavy work
> and 4 crashes. fsck always recovered things well (of course with the
> usual "unexpected softupdate inconsistency...").
>
> ZFS survived 2 crashes. After the third it panicked on startup.
>
> Of course - there is no zfs_fsck. And there is no possibility of
> making a really good zfs_fsck, because of the data layout; at least
> not an easy one.
>
>> This feature is very important for databases.
>
> Is data integrity not important for the rest? :)
>
> Still - the disks themselves perform quite heavy ECC, as do both the
> SATA and SAS ports.

While I don't dispute your test's findings, I would like to point out
that you are SPECIFICALLY testing for something that the original
designers of ZFS (Sun, now Oracle) point out VERY clearly as an issue
you should avoid in your deployed environments. The filesystem is
designed to protect the ON DISK data and, being a highly
memory-intensive filesystem, should ALWAYS be deployed on hardware with
memory error correction built in (i.e. ECC RAM deployed across multiple
banks).

The filesystem comes from a hardware/OS environment that is HEAVILY
BIASED towards "self healing", as they put it, and as a result memory
module issues like yours would:

1) either be corrected by the ECC modules, or
2) be reported to the administrator of the system as soon as they occur
   (well, on a system where you have such reporting set up correctly).

As a result your argument is moot... whilst your findings are indeed
still valid. UFS2, being MUCH lighter on RAM, is quite possibly not
even touching the damaged sections of the memory modules in your test,
and I am almost certain that if we asked around on this mailing list,
plenty of examples of UFS/UFS2 corruption due to faulty RAM would come
up.

No filesystem (or other code, for that matter) can detect RAM content
corruption and correct it for you - that is NOT a filesystem's job.
Frankly, the kernel has no way of knowing whether the data in its
buffers is correct unless the application storing that data is coded
to check for such conditions. (I know of a patch to the Linux kernel
that does look for faulty RAM segments and work around them, but I am
*mostly* positive that no general-purpose OS in current deployment
does this, as that behaviour is VERY CPU intensive.)
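To make the timing point concrete: a checksum can only vouch for the
buffer as it existed at the moment the checksum was computed. Here is
a rough, purely illustrative Python sketch (a toy CRC stand-in, not
ZFS's actual checksum pipeline) of why a bit that flips in RAM before
the write path checksums it is invisible, while a flip afterwards is
caught:

    import zlib

    def write_block(data: bytes):
        """Model a ZFS-style write: checksum the buffer, store both."""
        return data, zlib.crc32(data)      # toy stand-in for fletcher/sha256

    def read_block(stored: bytes, cksum: int) -> bytes:
        """Model a read: verify the checksum before returning the data."""
        assert zlib.crc32(stored) == cksum, "checksum mismatch: corruption detected"
        return stored

    good = bytearray(b"important database record")

    # Case 1: the bit flips AFTER checksumming (bad cable/port/controller).
    data, cksum = write_block(bytes(good))
    bad_on_disk = bytes([data[0] ^ 0x01]) + data[1:]
    # read_block(bad_on_disk, cksum) would raise: the corruption is caught.

    # Case 2: the DIMM flips the bit BEFORE the write path checksums it.
    good[0] ^= 0x01                        # silent in-RAM corruption
    data, cksum = write_block(bytes(good)) # a proper checksum of wrong data
    read_block(data, cksum)                # verifies cleanly: invisible

(Real ZFS uses fletcher or SHA-256 checksums stored in the block
pointers rather than a CRC, but the ordering problem is exactly the
same.)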
Also (debate encouraged here), due to the copy-on-write nature of ZFS
a zfs_fsck command is basically unnecessary, as:

1) the last successfully completed write to a file will be intact, and
2) scrubbing the on-disk content performs much better filesystem
   maintenance than an fsck does, and it can be done online without
   impacting the uptime of your systems or the availability of your
   data.

On my systems I specifically trigger a scrub (via the ZFS init script)
whenever they are shut down uncleanly, as I am willing to tolerate a
slightly slower but available system under such conditions (see the
sketch at the end of this mail).

While UFS2 is indeed a wonderfully reliable filesystem, it (as with
all things) is not suited to every task. There are many instances
where I can see the features of ZFS far outweighing its drawbacks, and
just as many where the converse holds.

All of the above is based purely on my understanding of ZFS (I am one
of the people working on a port to GNU/Linux - admittedly not
directly, but I spend a LOT of my time reading and cleaning up the
code fork that I use) and on Sun's (now Oracle's) design/deployment
documents. It is still my opinion, and I would encourage a debate on
these opinions.
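P.S. The scrub trigger I mentioned amounts to roughly the following
idea. This is a simplified Python sketch of the logic, not my actual
init script; the pool name "tank" and the marker-file path are made up
for illustration:

    import os
    import subprocess

    MARKER = "/var/db/clean_shutdown"  # hypothetical flag written on clean shutdown
    POOL = "tank"                      # example pool name

    def on_boot():
        """If the marker is missing, the previous shutdown was unclean:
        start an online scrub instead of refusing to come up."""
        if not os.path.exists(MARKER):
            # 'zpool scrub' returns at once; the scrub runs in the background
            subprocess.run(["zpool", "scrub", POOL], check=True)
        else:
            os.unlink(MARKER)          # consume the marker for the next cycle

    def on_clean_shutdown():
        open(MARKER, "w").close()      # record that we went down cleanly

The scrub walks every allocated block and verifies it against its
checksum while the pool stays imported and serving data, which is
exactly the "slightly slower but available" trade-off I described
above.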