From: Oliver Fromme
To: freebsd-questions@FreeBSD.ORG
Date: Wed, 11 Jun 2008 21:29:10 +0200 (CEST)
Message-Id: <200806111929.m5BJTAPY070813@lurza.secnetix.de>
In-Reply-To: <20080611202926.X73093@wojtek.tensor.gdynia.pl>
Subject: Re: FreeBSD + ZFS on a production server?

[attribution fixed]

Wojciech Puchar wrote:
 > Oliver Fromme wrote:
 > > Wojciech Puchar wrote:
 > > > 3) a CPU,cache and memory bandwidth hogging "feature" of checksumming all
 > > > blocks. thing that are already done in disk hardware. fortunately you can
 > > > turn this off
 > >
 > > Obviously you have been lucky to never be a victim of
 > > silent disk corruption (or you just haven't noticed).
 >
 > what you mean. that disk wrote the data wrong and doesn't detect it on
 > read? i would mean broken disk processor, it's memory etc.

Correct.  It does happen.

 > possible - as much as broken main processor, main memory, some of chips on
 > motherboard etc.

A broken processor usually results in random crashes, not
silent data corruption.  Broken memory will be noticed if
it supports ECC; otherwise it will most probably result in
crashes, too.

 > which will make ZFS calculate checksum wrong on write,

Even if that happens (without crashes or other things that
you'll notice immediately), the error will be detected by
ZFS and fixed ("healed") if possible, i.e. when running
with redundancy and at least one copy has a good checksum.

(GELI can only detect errors, but not fix them.  ZFS can
fix them, too.  I assume in theory it would be possible to
make geli cooperate with gmirror so it could fix bad blocks,
too, but that's just theory.  ZFS is reality.)

 > or even calculate checksum right of wrong data generated by badly
 > operating programs.

What do you mean, wrong data generated by programs?
If a program generates wrong output, there's nothing any
file system could do about that.  That's not the file
system's job at all.  The file system's job is to ensure
the integrity of data written to the disk, and ZFS does
exactly that.
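
The detect-and-heal behaviour described above can be illustrated with a small toy model.  This is only a sketch under stated assumptions, not ZFS code: the class MirroredBlock and the function checksum are hypothetical names, SHA-256 is used as a stand-in checksum, and "disks" are plain in-memory byte arrays.  The one property it tries to show is that the expected checksum is kept apart from the data copies, so a copy that silently changes fails verification and can be rewritten from a copy that still verifies.

```python
import hashlib


def checksum(data: bytes) -> bytes:
    """Stand-in checksum (assumption: SHA-256, chosen only for illustration)."""
    return hashlib.sha256(data).digest()


class MirroredBlock:
    """Toy model of one logical block stored on several mirror copies.

    The expected checksum is stored separately from the copies, so a
    copy that silently goes bad cannot also alter the value it is
    verified against.
    """

    def __init__(self, data: bytes, copies: int = 2):
        self.expected = checksum(data)
        self.copies = [bytearray(data) for _ in range(copies)]

    def read(self) -> bytes:
        good = None
        bad = []
        # Verify every copy against the separately stored checksum.
        for index, copy in enumerate(self.copies):
            if checksum(bytes(copy)) == self.expected:
                if good is None:
                    good = bytes(copy)
            else:
                bad.append(index)
        if good is None:
            # Without a verified copy the error is detected but cannot be fixed.
            raise IOError("checksum mismatch on every copy")
        # "Heal": overwrite each bad copy with the verified data.
        for index in bad:
            self.copies[index] = bytearray(good)
        return good
```

For example, after `block = MirroredBlock(b"important data")`, flipping a byte with `block.copies[0][0] ^= 0xFF` simulates silent corruption on one side of the mirror; `block.read()` still returns the original bytes and rewrites the damaged copy.  With `copies=1` the same corruption raises an error instead: detection still works, but there is nothing to repair from, which corresponds to the "if possible" above.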

 > given the complexity of motherboard+CPU etc. to complexity of disk
 > hardware, i don't think "silent disk failure" happens often.

Fortunately it doesn't happen often, but it does happen.
And when it happens, you are in really serious trouble.
You usually notice it when it's too late and the last
good backup media was already recycled.

 > i think all your cases wasn't disk, but general hardware problems.

In my case it was a disk with media surface errors, and
the disk failed to report the error properly to the OS.
Instead it just returned bad data.

 > ZFS may help detect it, or it may not. if it helped for you.

Please stop spreading FUD.  There is no "may or may not".
If a disk returns bad data, ZFS _will_ detect it.  Silent
corruption _cannot_ happen with ZFS, except if you disable
the checksumming feature intentionally.

 > even without ZFS it WOULD cause problems with programs like random
 > crashes.

Please elaborate on what the problem is, if you think there
is one.

 > personally i often got disk failing the way that it was unable to read or
 > write giving an error, but never things like that.

As I said:  You were lucky.

 > > You're free to use UFS, of course, and keep suffering
 > > from its shortcomings.
 >
 > i have to start suffering at first....

Many people suffer without knowing.  :-)

I do suffer from UFS' shortcomings on many machines on
which I can't use ZFS (or other file systems) for various
reasons.

Best regards
   Oliver

-- 
Oliver Fromme, secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing b. M.
Handelsregister: Registergericht Muenchen, HRA 74606,  Geschäftsfuehrung:
secnetix Verwaltungsgesellsch. mbH, Handelsregister: Registergericht
München, HRB 125758,  Geschäftsführer: Maik Bachmann, Olaf Erb, Ralf Gebhart

FreeBSD services, products and more:  http://www.secnetix.de/bsd

With Perl you can manipulate text, interact with programs, talk over
networks, drive Web pages, perform arbitrary precision arithmetic,
and write programs that look like Snoopy swearing.