Date: Sun, 16 Dec 2007 03:42:59 +0100
From: Bernd Walter
To: Ivan Voras
Cc: freebsd-current@freebsd.org
Subject: Re: ZFS melting under postgres...

On Sat, Dec 15, 2007 at 11:04:04PM +0100, Ivan Voras wrote:
> Travis Mikalson wrote:
>
> > If you're using compact flash for something that's constantly updated
> > like a ZIL, wouldn't your CF card die real quick?
>
> Probably, for constant updates to the same areas. But as you say:

CF cards and flash-based SSD drives rotate the flash cells anyway, so it doesn't matter much whether you write the same block repeatedly or not. I wouldn't worry about wearing out those devices, since today's media survive many writes.

> > Since a ZIL is not really seek-intensive, why not just offload it to its
> > own standard hard disk that has its write caching and all other similar
> > data-corrupting technologies disabled?
>
> Yes. I don't see a point writing a log that's mostly sequentially
> accessed on a SSD, and which probably wears the same areas on the drive.
> I'm more interested in loads like databases.

I wouldn't use them for either purpose unless there is a specific reason to.

The problem is how they work. They contain NAND flash chips with two data areas, each made up of blocks that are typically slightly larger than 4 or 8 kB these days. One area is small, but 100% error-free and able to sustain a high write rate; the other is large, but of much lower quality. Devices use the latter for the data blocks they offer to the host and the good cells for maintaining the allocation of those blocks.

One problem is that, with data blocks that big, writing 512 bytes effectively means a read-modify-write of a much larger physical block. This can be handled quite well by using a larger filesystem block size.
The much bigger problem is a power loss while such a maintenance block is being written. When that fails, you lose a very large area of logical blocks, since a 4 kB maintenance block holds the allocation for several hundred kB of logical data blocks. In other words, you may lose data blocks that have not been written for a long time, and the database would not expect any problem with that data. Even for a ZIL, losing a large data area is very questionable, since its purpose is to keep data that was already synced readable after a power loss. I'm not sure what happens if the device is reset at the wrong moment; possibly this depends on the specific media, but I wouldn't be surprised to see read errors after a reset without power loss as well.

This is true for all NAND-based flash media: SD, MMC, SM, CF, ... Some media are less critical because of the way they use the maintenance blocks, but those details are usually kept secret by the vendor.

I do run PostgreSQL on SD media with ARM-based FreeBSD systems, but I'm prepared to lose the whole database and to recover it from backup if things go wrong.

-- 
B.Walter http://www.bwct.de http://www.fizon.de
bernd@bwct.de info@bwct.de support@fizon.de