From owner-freebsd-fs@FreeBSD.ORG  Mon Jan  7 03:09:31 2008
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 4590316A417
	for <freebsd-fs@freebsd.org>; Mon,  7 Jan 2008 03:09:31 +0000 (UTC)
	(envelope-from tzhuan@gmail.com)
Received: from fg-out-1718.google.com (fg-out-1718.google.com [72.14.220.155])
	by mx1.freebsd.org (Postfix) with ESMTP id C447B13C46A
	for <freebsd-fs@freebsd.org>; Mon,  7 Jan 2008 03:09:30 +0000 (UTC)
	(envelope-from tzhuan@gmail.com)
Received: by fg-out-1718.google.com with SMTP id 16so4940216fgg.35
	for <freebsd-fs@freebsd.org>; Sun, 06 Jan 2008 19:09:30 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=domainkey-signature:received:received:message-id:date:from:sender:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references:x-google-sender-auth;
	bh=AJU/9jyX/gdWGxgfbQLuSDUcQ9NRDemHwd3aAuwHX+I=;
	b=GK8A281gaSUHpZp9LE/uTAXAxuoL5XWiCsE8boyMk2mInPbsqQ7NN4txLAtrozuG00traxtOnW0ysc8i9Ra5PaAFiAZ5aBeCaQbxjU1MnY83bed3WPKqprZJoOvYmZZx+FXeQAC/QHETx+r/Lx2ggvQRIXm4cG71ylfOi07mchs=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma;
	h=message-id:date:from:sender:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references:x-google-sender-auth;
	b=toWCOy8gMSCL8gp2Cp3dyDrSB5UVCpvCFl2RoHDcwyiHRbZHazBkZYQCnx7lRxRYNFXQh0X74mfKOPP5CnjxJ5Us5HzVDiar4Z+SKm9pH7o/m2dopMfBxXPJJfAmh+VSNHXjJzYxFT5enWk78GVQZxG9opoi3REVkTwieiPqVYs=
Received: by 10.86.77.5 with SMTP id z5mr6664451fga.41.1199673853692;
	Sun, 06 Jan 2008 18:44:13 -0800 (PST)
Received: by 10.86.79.20 with HTTP; Sun, 6 Jan 2008 18:44:13 -0800 (PST)
Message-ID: <6a7033710801061844m59f8c62dvdd3eea80f6c239c1@mail.gmail.com>
Date: Mon, 7 Jan 2008 10:44:13 +0800
From: "Tz-Huan Huang" <tzhuan@csie.org>
Sender: tzhuan@gmail.com
To: "Brooks Davis" <brooks@freebsd.org>
In-Reply-To: <20080103171825.GA28361@lor.one-eyed-alien.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
References: <477B16BB.8070104@freebsd.org>
	<20080102070146.GH49874@cicely12.cicely.de>
	<477B8440.1020501@freebsd.org>
	<200801031750.31035.peter.schuller@infidyne.com>
	<477D16EE.6070804@freebsd.org>
	<20080103171825.GA28361@lor.one-eyed-alien.net>
X-Google-Sender-Auth: 9bfff906d4a5e24d
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS i/o errors - which disk is the problem?
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 07 Jan 2008 03:09:31 -0000

2008/1/4, Brooks Davis <brooks@freebsd.org>:
>
> We've definitely seen cases where hardware changes fixed ZFS checksum errors.
> In one case, a firmware upgrade on the raid controller fixed it.  In another
> case, we'd been connecting to an external array with a SCSI card that didn't
> have a PCI bracket and the errors went away when the replacement one arrived
> and was installed.  The fact that there were significant errors caught by ZFS
> was quite disturbing since we wouldn't have found them with UFS.

Hi,

We have an NFS server using ZFS with a similar problem.
The box is i386 7.0-PRERELEASE with 3 GB of RAM:

# uname -a
FreeBSD cml3 7.0-PRERELEASE FreeBSD 7.0-PRERELEASE #2:
Sat Jan  5 14:42:41 CST 2008 root@cml3:/usr/obj/usr/src/sys/CML2  i386

The ZFS pool now contains 3 RAID arrays:

2007-11-20.11:49:17 zpool create pool /dev/label/proware263
2007-11-20.11:53:31 zfs create pool/project
... (zfs create other filesystems) ...
2007-11-20.11:54:32 zfs set atime=off pool
2007-12-08.22:59:15 zpool add pool /dev/da0
2008-01-05.21:20:03 zpool add pool /dev/label/proware262

After a power loss yesterday, zpool status shows:

# zpool status -v
  pool: pool
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: scrub completed with 231 errors on Mon Jan  7 08:05:35 2008
config:

        NAME                STATE     READ WRITE CKSUM
        pool                ONLINE       0     0   516
          label/proware263  ONLINE       0     0   231
          da0               ONLINE       0     0   285
          label/proware262  ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        /system/database/mysql/flickr_geo/flickr_raw_tag.MYI
        pool/project:<0x0>
        pool/home/master/96:<0xbf36>
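
As a rough illustration of how to read the config table above, the
per-device counters can be pulled out with a few lines of Python. This is
only a sketch: it assumes the column layout shown in this particular
output (NAME STATE READ WRITE CKSUM), and the names STATUS_CONFIG,
parse_config and devices_with_errors are made up for the example. In this
output the pool-level CKSUM count equals the sum of the per-device counts
(231 + 285 + 0 = 516).

```python
import re

# Config table copied from the zpool status output above.
STATUS_CONFIG = """\
        NAME                STATE     READ WRITE CKSUM
        pool                ONLINE       0     0   516
          label/proware263  ONLINE       0     0   231
          da0               ONLINE       0     0   285
          label/proware262  ONLINE       0     0     0
"""

# indent, name, state, then three numeric counters (assumed layout).
ROW = re.compile(r"^(\s*)(\S+)\s+(\S+)\s+(\d+)\s+(\d+)\s+(\d+)\s*$")


def parse_config(text):
    """Return one dict per table row; the header line does not match."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m is None:
            continue  # header or blank line
        indent, name, state, read, write, cksum = m.groups()
        rows.append({
            "indent": len(indent),
            "name": name,
            "state": state,
            "read": int(read),
            "write": int(write),
            "cksum": int(cksum),
        })
    return rows


def devices_with_errors(rows):
    """Rows indented deeper than the pool line that have any non-zero counter."""
    top = min(r["indent"] for r in rows)
    return [r for r in rows
            if r["indent"] > top and (r["read"] or r["write"] or r["cksum"])]


rows = parse_config(STATUS_CONFIG)
for dev in devices_with_errors(rows):
    print(dev["name"], "cksum errors:", dev["cksum"])
```

Here two of the three devices report checksum errors while every device
is still ONLINE, so the STATE column alone does not single out a bad disk.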

The main problem is that we cannot mount pool/project any more:

# zfs mount pool/project
cannot mount 'pool/project': Input/output error
# grep ZFS /var/log/messages
Jan  7 10:08:35 cml3 root: ZFS: zpool I/O failure, zpool=pool error=86
(repeat many times)

There is a lot of data in pool/project, probably 3.24T. zdb shows:

# zdb pool
...
Dataset pool/project [ZPL], ID 33, cr_txg 57, 3.24T, 22267231 objects
...

(zdb is still running now, we can provide the output if helpful)

Is there any way to recover any data from pool/project?
Thank you very much.

Sincerely,
Tz-Huan

From owner-freebsd-fs@FreeBSD.ORG  Mon Jan  7 11:06:59 2008
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@hub.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 9F36F16A4DA
	for <freebsd-fs@hub.freebsd.org>; Mon,  7 Jan 2008 11:06:59 +0000 (UTC)
	(envelope-from owner-bugmaster@FreeBSD.org)
Received: from freefall.freebsd.org (freefall.freebsd.org
	[IPv6:2001:4f8:fff6::28])
	by mx1.freebsd.org (Postfix) with ESMTP id 933CE13C447
	for <freebsd-fs@hub.freebsd.org>; Mon,  7 Jan 2008 11:06:59 +0000 (UTC)
	(envelope-from owner-bugmaster@FreeBSD.org)
Received: from freefall.freebsd.org (localhost [127.0.0.1])
	by freefall.freebsd.org (8.14.2/8.14.2) with ESMTP id m07B6x3Z061761
	for <freebsd-fs@FreeBSD.org>; Mon, 7 Jan 2008 11:06:59 GMT
	(envelope-from owner-bugmaster@FreeBSD.org)
Received: (from gnats@localhost)
	by freefall.freebsd.org (8.14.2/8.14.1/Submit) id m07B6wWU061757
	for freebsd-fs@FreeBSD.org; Mon, 7 Jan 2008 11:06:58 GMT
	(envelope-from owner-bugmaster@FreeBSD.org)
Date: Mon, 7 Jan 2008 11:06:58 GMT
Message-Id: <200801071106.m07B6wWU061757@freefall.freebsd.org>
X-Authentication-Warning: freefall.freebsd.org: gnats set sender to
	owner-bugmaster@FreeBSD.org using -f
From: FreeBSD bugmaster <bugmaster@FreeBSD.org>
To: freebsd-fs@FreeBSD.org
Cc: 
Subject: Current problem reports assigned to freebsd-fs@FreeBSD.org
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 07 Jan 2008 11:06:59 -0000

Current FreeBSD problem reports
Critical problems
Serious problems

S Tracker      Resp.      Description
--------------------------------------------------------------------------------
o kern/112658  fs         [smbfs] [patch] smbfs and caching problems (resolves b
o kern/114676  fs         [ufs] snapshot creation panics: snapacct_ufs2: bad blo
o kern/114856  fs         [ntfs] [patch] Bug in NTFS allows bogus file modes.
o kern/116170  fs         Kernel panic when mounting /tmp

4 problems total.

Non-critical problems

S Tracker      Resp.      Description
--------------------------------------------------------------------------------
o kern/114847  fs         [ntfs] [patch] dirmask support for NTFS ala MSDOSFS
o bin/118249   fs         mv(1): moving a directory changes its mtime

2 problems total.


From owner-freebsd-fs@FreeBSD.ORG  Mon Jan  7 13:59:47 2008
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 7369E16A41B
	for <freebsd-fs@freebsd.org>; Mon,  7 Jan 2008 13:59:47 +0000 (UTC)
	(envelope-from ticso@cicely12.cicely.de)
Received: from raven.bwct.de (raven.bwct.de [85.159.14.73])
	by mx1.freebsd.org (Postfix) with ESMTP id F3B8113C467
	for <freebsd-fs@freebsd.org>; Mon,  7 Jan 2008 13:59:46 +0000 (UTC)
	(envelope-from ticso@cicely12.cicely.de)
Received: from cicely5.cicely.de ([10.1.1.7])
	by raven.bwct.de (8.13.4/8.13.4) with ESMTP id m07DxcQE029856;
	Mon, 7 Jan 2008 14:59:38 +0100 (CET)
	(envelope-from ticso@cicely12.cicely.de)
Received: from cicely12.cicely.de (cicely12.cicely.de [10.1.1.14])
	by cicely5.cicely.de (8.13.4/8.13.4) with ESMTP id m07DxRFl075184
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Mon, 7 Jan 2008 14:59:27 +0100 (CET)
	(envelope-from ticso@cicely12.cicely.de)
Received: from cicely12.cicely.de (localhost [127.0.0.1])
	by cicely12.cicely.de (8.13.4/8.13.3) with ESMTP id m07DxQbl075918;
	Mon, 7 Jan 2008 14:59:26 +0100 (CET)
	(envelope-from ticso@cicely12.cicely.de)
Received: (from ticso@localhost)
	by cicely12.cicely.de (8.13.4/8.13.3/Submit) id m07DxQZO075917;
	Mon, 7 Jan 2008 14:59:26 +0100 (CET) (envelope-from ticso)
Date: Mon, 7 Jan 2008 14:59:26 +0100
From: Bernd Walter <ticso@cicely12.cicely.de>
To: Tz-Huan Huang <tzhuan@csie.org>
Message-ID: <20080107135925.GF65134@cicely12.cicely.de>
References: <477B16BB.8070104@freebsd.org>
	<20080102070146.GH49874@cicely12.cicely.de>
	<477B8440.1020501@freebsd.org>
	<200801031750.31035.peter.schuller@infidyne.com>
	<477D16EE.6070804@freebsd.org>
	<20080103171825.GA28361@lor.one-eyed-alien.net>
	<6a7033710801061844m59f8c62dvdd3eea80f6c239c1@mail.gmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <6a7033710801061844m59f8c62dvdd3eea80f6c239c1@mail.gmail.com>
X-Operating-System: FreeBSD cicely12.cicely.de 5.4-STABLE alpha
User-Agent: Mutt/1.5.9i
X-Spam-Status: No, score=-4.4 required=5.0 tests=ALL_TRUSTED=-1.8,
	BAYES_00=-2.599 autolearn=ham version=3.2.3
X-Spam-Checker-Version: SpamAssassin 3.2.3 (2007-08-08) on cicely12.cicely.de
Cc: freebsd-fs@freebsd.org, Brooks Davis <brooks@freebsd.org>
Subject: Re: ZFS i/o errors - which disk is the problem?
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: ticso@cicely.de
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 07 Jan 2008 13:59:47 -0000

On Mon, Jan 07, 2008 at 10:44:13AM +0800, Tz-Huan Huang wrote:
> 2008/1/4, Brooks Davis <brooks@freebsd.org>:
> >
> > We've definitely seen cases where hardware changes fixed ZFS checksum errors.
> > In one case, a firmware upgrade on the raid controller fixed it.  In another
> > case, we'd been connecting to an external array with a SCSI card that didn't
> > have a PCI bracket and the errors went away when the replacement one arrived
> > and was installed.  The fact that there were significant errors caught by ZFS
> > was quite disturbing since we wouldn't have found them with UFS.
> 
> Hi,
> 
> We have an NFS server using ZFS with a similar problem.
> The box is i386 7.0-PRERELEASE with 3 GB of RAM:
> 
> # uname -a
> FreeBSD cml3 7.0-PRERELEASE FreeBSD 7.0-PRERELEASE #2:
> Sat Jan  5 14:42:41 CST 2008 root@cml3:/usr/obj/usr/src/sys/CML2  i386
> 
> The ZFS pool now contains 3 RAID arrays:
> 
> 2007-11-20.11:49:17 zpool create pool /dev/label/proware263
> 2007-11-20.11:53:31 zfs create pool/project
> ... (zfs create other filesystems) ...
> 2007-11-20.11:54:32 zfs set atime=off pool
> 2007-12-08.22:59:15 zpool add pool /dev/da0
> 2008-01-05.21:20:03 zpool add pool /dev/label/proware262
> 
> After a power loss yesterday, zpool status shows:
> 
> # zpool status -v
>   pool: pool
>  state: ONLINE
> status: One or more devices has experienced an error resulting in data
>         corruption.  Applications may be affected.
> action: Restore the file in question if possible.  Otherwise restore the
>         entire pool from backup.
>    see: http://www.sun.com/msg/ZFS-8000-8A
>  scrub: scrub completed with 231 errors on Mon Jan  7 08:05:35 2008
> config:
> 
>         NAME                STATE     READ WRITE CKSUM
>         pool                ONLINE       0     0   516
>           label/proware263  ONLINE       0     0   231
>           da0               ONLINE       0     0   285
>           label/proware262  ONLINE       0     0     0
> 
> errors: Permanent errors have been detected in the following files:
> 
>         /system/database/mysql/flickr_geo/flickr_raw_tag.MYI
>         pool/project:<0x0>
>         pool/home/master/96:<0xbf36>
> 
> The main problem is that we cannot mount pool/project any more:
> 
> # zfs mount pool/project
> cannot mount 'pool/project': Input/output error
> # grep ZFS /var/log/messages
> Jan  7 10:08:35 cml3 root: ZFS: zpool I/O failure, zpool=pool error=86
> (repeat many times)
> 
> There is a lot of data in pool/project, probably 3.24T. zdb shows:
> 
> # zdb pool
> ...
> Dataset pool/project [ZPL], ID 33, cr_txg 57, 3.24T, 22267231 objects
> ...
> 
> (zdb is still running now, we can provide the output if helpful)
> 
> Is there any way to recover any data from pool/project?

The data was corrupted by the controller and/or the disk subsystem.
You have no other source for the broken data, so it is lost.
The only guaranteed way to get it back is from backup.
Maybe older snapshots/clones are still readable - I don't know.
Nevertheless, the data is corrupted, and that is exactly what alternative
data sources such as raidz/mirror, and as a last resort backup, are for.
You shouldn't have ignored those errors in the first place, because you
are running on faulty hardware.
Without ZFS checksumming the system would just process the broken
data with unpredictable results.
If all those errors are fresh, then you likely used a broken RAID
controller below ZFS, which silently corrupted its synchronization and
then blew up when the disk state changed.
Unfortunately many RAID controllers are broken and therefore useless.

-- 
B.Walter                http://www.bwct.de      http://www.fizon.de
bernd@bwct.de           info@bwct.de            support@fizon.de

From owner-freebsd-fs@FreeBSD.ORG  Mon Jan  7 17:17:05 2008
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 20ED316A421
	for <freebsd-fs@freebsd.org>; Mon,  7 Jan 2008 17:17:05 +0000 (UTC)
	(envelope-from tzhuan@gmail.com)
Received: from fg-out-1718.google.com (fg-out-1718.google.com [72.14.220.155])
	by mx1.freebsd.org (Postfix) with ESMTP id 9901E13C4F0
	for <freebsd-fs@freebsd.org>; Mon,  7 Jan 2008 17:17:04 +0000 (UTC)
	(envelope-from tzhuan@gmail.com)
Received: by fg-out-1718.google.com with SMTP id 16so5092776fgg.35
	for <freebsd-fs@freebsd.org>; Mon, 07 Jan 2008 09:17:03 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=domainkey-signature:received:received:message-id:date:from:sender:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references:x-google-sender-auth;
	bh=UiFtb6+cU0UdSnDSQqa7Ohunxpr8LC5yIQE2rDxW+O0=;
	b=iJlyTltjHMiCYl/bvpzM5zx+Hxmk2mBJfZlMKs3SpJXzZq5sPJAZ3mW8vfgD0p+9rKRK5G3d3AG+1Pb2f6hqJ4PgxzG4QwCcWqW4GTC0plplKC33UcZceXzYFK1HzalJmqLV+UADtHC4X/jMmFpY5hS9KNyAlK28HRBO0LEsqls=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma;
	h=message-id:date:from:sender:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references:x-google-sender-auth;
	b=tpF62mr2NnyI3EO89qFMltWw08ylfGd3ezjoQGPNtjv2bBfP0lhV/B4S9y0LEjzpOkxABY9kq9kE3nvNd74yrYRBa5WF6rdXGXtx9rK+6AMvOWpQVne8vZsIYFwQ9Sn5k4Qf3M4mYgbsVh22lu0PDosCq0oAY2FPzTLzb9VAhgc=
Received: by 10.86.66.1 with SMTP id o1mr20313303fga.36.1199726223597;
	Mon, 07 Jan 2008 09:17:03 -0800 (PST)
Received: by 10.86.23.20 with HTTP; Mon, 7 Jan 2008 09:17:03 -0800 (PST)
Message-ID: <6a7033710801070917w4b453f10l7115bd9fe3a53a1b@mail.gmail.com>
Date: Tue, 8 Jan 2008 01:17:03 +0800
From: "Tz-Huan Huang" <tzhuan@csie.org>
Sender: tzhuan@gmail.com
To: ticso@cicely.de
In-Reply-To: <20080107135925.GF65134@cicely12.cicely.de>
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
References: <477B16BB.8070104@freebsd.org>
	<20080102070146.GH49874@cicely12.cicely.de>
	<477B8440.1020501@freebsd.org>
	<200801031750.31035.peter.schuller@infidyne.com>
	<477D16EE.6070804@freebsd.org>
	<20080103171825.GA28361@lor.one-eyed-alien.net>
	<6a7033710801061844m59f8c62dvdd3eea80f6c239c1@mail.gmail.com>
	<20080107135925.GF65134@cicely12.cicely.de>
X-Google-Sender-Auth: e5fa2e18a4fdf515
Cc: freebsd-fs@freebsd.org, Brooks Davis <brooks@freebsd.org>
Subject: Re: ZFS i/o errors - which disk is the problem?
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 07 Jan 2008 17:17:05 -0000

2008/1/7, Bernd Walter <ticso@cicely12.cicely.de>:
> The data was corrupted by the controller and/or the disk subsystem.
> You have no other source for the broken data, so it is lost.
> The only guaranteed way to get it back is from backup.
> Maybe older snapshots/clones are still readable - I don't know.
> Nevertheless, the data is corrupted, and that is exactly what alternative
> data sources such as raidz/mirror, and as a last resort backup, are for.
> You shouldn't have ignored those errors in the first place, because you
> are running on faulty hardware.
> Without ZFS checksumming the system would just process the broken
> data with unpredictable results.
> If all those errors are fresh, then you likely used a broken RAID
> controller below ZFS, which silently corrupted its synchronization and
> then blew up when the disk state changed.
> Unfortunately many RAID controllers are broken and therefore useless.

Hi,

Thank you very much for your answer.

We have run the self-test on all RAID controllers and they all reported OK.
Do you mean that many RAID controllers are broken (buggy?) even if they
pass the self-test? If all the disks are passed through to ZFS, is
that a safe way to use the buggy controllers?
Thank you very much.

Sincerely yours,
Tz-Huan

From owner-freebsd-fs@FreeBSD.ORG  Mon Jan  7 19:22:05 2008
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 7861D16A46D;
	Mon,  7 Jan 2008 19:22:05 +0000 (UTC)
	(envelope-from ticso@cicely12.cicely.de)
Received: from raven.bwct.de (raven.bwct.de [85.159.14.73])
	by mx1.freebsd.org (Postfix) with ESMTP id 1078413C448;
	Mon,  7 Jan 2008 19:22:04 +0000 (UTC)
	(envelope-from ticso@cicely12.cicely.de)
Received: from cicely5.cicely.de ([10.1.1.7])
	by raven.bwct.de (8.13.4/8.13.4) with ESMTP id m07JLwpd044625;
	Mon, 7 Jan 2008 20:21:58 +0100 (CET)
	(envelope-from ticso@cicely12.cicely.de)
Received: from cicely12.cicely.de (cicely12.cicely.de [10.1.1.14])
	by cicely5.cicely.de (8.13.4/8.13.4) with ESMTP id m07JLi1W078037
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Mon, 7 Jan 2008 20:21:45 +0100 (CET)
	(envelope-from ticso@cicely12.cicely.de)
Received: from cicely12.cicely.de (localhost [127.0.0.1])
	by cicely12.cicely.de (8.13.4/8.13.3) with ESMTP id m07JLiUg076746;
	Mon, 7 Jan 2008 20:21:44 +0100 (CET)
	(envelope-from ticso@cicely12.cicely.de)
Received: (from ticso@localhost)
	by cicely12.cicely.de (8.13.4/8.13.3/Submit) id m07JLibM076745;
	Mon, 7 Jan 2008 20:21:44 +0100 (CET) (envelope-from ticso)
Date: Mon, 7 Jan 2008 20:21:44 +0100
From: Bernd Walter <ticso@cicely12.cicely.de>
To: Tz-Huan Huang <tzhuan@csie.org>
Message-ID: <20080107192143.GC76422@cicely12.cicely.de>
References: <477B16BB.8070104@freebsd.org>
	<20080102070146.GH49874@cicely12.cicely.de>
	<477B8440.1020501@freebsd.org>
	<200801031750.31035.peter.schuller@infidyne.com>
	<477D16EE.6070804@freebsd.org>
	<20080103171825.GA28361@lor.one-eyed-alien.net>
	<6a7033710801061844m59f8c62dvdd3eea80f6c239c1@mail.gmail.com>
	<20080107135925.GF65134@cicely12.cicely.de>
	<6a7033710801070917w4b453f10l7115bd9fe3a53a1b@mail.gmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <6a7033710801070917w4b453f10l7115bd9fe3a53a1b@mail.gmail.com>
X-Operating-System: FreeBSD cicely12.cicely.de 5.4-STABLE alpha
User-Agent: Mutt/1.5.9i
X-Spam-Status: No, score=-4.4 required=5.0 tests=ALL_TRUSTED=-1.8,
	BAYES_00=-2.599 autolearn=ham version=3.2.3
X-Spam-Checker-Version: SpamAssassin 3.2.3 (2007-08-08) on cicely12.cicely.de
Cc: freebsd-fs@freebsd.org, Brooks Davis <brooks@freebsd.org>, ticso@cicely.de
Subject: Re: ZFS i/o errors - which disk is the problem?
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: ticso@cicely.de
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 07 Jan 2008 19:22:05 -0000

On Tue, Jan 08, 2008 at 01:17:03AM +0800, Tz-Huan Huang wrote:
> 2008/1/7, Bernd Walter <ticso@cicely12.cicely.de>:
> > The data was corrupted by the controller and/or the disk subsystem.
> > You have no other source for the broken data, so it is lost.
> > The only guaranteed way to get it back is from backup.
> > Maybe older snapshots/clones are still readable - I don't know.
> > Nevertheless, the data is corrupted, and that is exactly what alternative
> > data sources such as raidz/mirror, and as a last resort backup, are for.
> > You shouldn't have ignored those errors in the first place, because you
> > are running on faulty hardware.
> > Without ZFS checksumming the system would just process the broken
> > data with unpredictable results.
> > If all those errors are fresh, then you likely used a broken RAID
> > controller below ZFS, which silently corrupted its synchronization and
> > then blew up when the disk state changed.
> > Unfortunately many RAID controllers are broken and therefore useless.
> 
> Hi,
> 
> Thank you very much for your answer.
> 
> We have run the self-test on all RAID controllers and they all reported OK.
> Do you mean that many RAID controllers are broken (buggy?) even if they
> pass the self-test? If all the disks are passed through to ZFS, is
> that a safe way to use the buggy controllers?

If a controller were so buggy that even its own self-test failed,
it would be even worse.
But a self-test can't tell whether data has been corrupted - it can only
check the current state and the synchronization.
It could do massive read/write tests, but that would mean overwriting
the current data.
If you export single disks, ZFS can handle this using its redundancy:
if it encounters an error, it can use the other disks
to recover the data.
In your case ZFS doesn't know about any redundancy and your controller
returns faulty data, so no recovery is attempted.
Your RAID controller can't help either, because it isn't aware of its
own mess, since it doesn't use CRCs itself. Even if it did, the data
could still be corrupted in transit to or from the host, or by a
driver problem.
But relying on ZFS is not safe either, just a bit less critical.
The only safe option is not to use a buggy controller at all.
The good thing about ZFS CRCs is that you are aware of the problem even
when file data is corrupted.
In your case, unfortunately, too much seems to have been broken.
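
To make the difference concrete, here is a toy sketch in Python of a
checksummed read with and without a second copy. It is not how ZFS is
implemented: SHA-256 stands in for whatever checksum the pool actually
uses, and read_block and the sample data are invented for the example.

```python
import hashlib


def checksum(block: bytes) -> bytes:
    # Stand-in checksum for this sketch only.
    return hashlib.sha256(block).digest()


def read_block(copies, expected):
    """Return the first copy whose checksum matches the stored one.

    `copies` models the data returned by each redundant device
    (one entry for a single device, two for a mirror).
    """
    for data in copies:
        if checksum(data) == expected:
            return data
    # Every available copy is bad: the error is detected but not fixable.
    raise OSError("checksum mismatch on every copy")


good = b"project data"
expected = checksum(good)      # stored separately from the data block
corrupt = b"projecx data"      # what a faulty controller might return

# Mirror: one side is corrupt, the other copy still verifies.
print(read_block([corrupt, good], expected))

# Single device on top of a hardware RAID: only the bad copy is visible.
try:
    read_block([corrupt], expected)
except OSError as err:
    print("single device:", err)
```

With one copy the checksum can only report the damage; with a second
copy it can also pick the good one.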

-- 
B.Walter                http://www.bwct.de      http://www.fizon.de
bernd@bwct.de           info@bwct.de            support@fizon.de

From owner-freebsd-fs@FreeBSD.ORG  Tue Jan  8 05:37:27 2008
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 9DBED16A468
	for <freebsd-fs@freebsd.org>; Tue,  8 Jan 2008 05:37:27 +0000 (UTC)
	(envelope-from scottl@samsco.org)
Received: from pooker.samsco.org (pooker.samsco.org [168.103.85.57])
	by mx1.freebsd.org (Postfix) with ESMTP id 518DA13C4D1
	for <freebsd-fs@freebsd.org>; Tue,  8 Jan 2008 05:37:26 +0000 (UTC)
	(envelope-from scottl@samsco.org)
Received: from phobos.samsco.home (phobos.samsco.home [192.168.254.11])
	(authenticated bits=0)
	by pooker.samsco.org (8.13.8/8.13.8) with ESMTP id m085a0vW010919;
	Mon, 7 Jan 2008 22:36:01 -0700 (MST)
	(envelope-from scottl@samsco.org)
Message-ID: <47830BC0.5060100@samsco.org>
Date: Mon, 07 Jan 2008 22:36:00 -0700
From: Scott Long <scottl@samsco.org>
User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-US;
	rv:1.8.1.11) Gecko/20071128 SeaMonkey/1.1.7
MIME-Version: 1.0
To: ticso@cicely.de
References: <477B16BB.8070104@freebsd.org>	<20080102070146.GH49874@cicely12.cicely.de>	<477B8440.1020501@freebsd.org>	<200801031750.31035.peter.schuller@infidyne.com>	<477D16EE.6070804@freebsd.org>	<20080103171825.GA28361@lor.one-eyed-alien.net>	<6a7033710801061844m59f8c62dvdd3eea80f6c239c1@mail.gmail.com>
	<20080107135925.GF65134@cicely12.cicely.de>
In-Reply-To: <20080107135925.GF65134@cicely12.cicely.de>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
X-Greylist: Sender succeeded SMTP AUTH authentication, not delayed by
	milter-greylist-2.0.2 (pooker.samsco.org [168.103.85.57]);
	Mon, 07 Jan 2008 22:36:02 -0700 (MST)
X-Spam-Status: No, score=-1.4 required=5.4 tests=ALL_TRUSTED autolearn=failed
	version=3.1.8
X-Spam-Checker-Version: SpamAssassin 3.1.8 (2007-02-13) on pooker.samsco.org
Cc: freebsd-fs@freebsd.org, Brooks Davis <brooks@freebsd.org>,
	Tz-Huan Huang <tzhuan@csie.org>
Subject: Re: ZFS i/o errors - which disk is the problem?
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 08 Jan 2008 05:37:27 -0000

Bernd Walter wrote:
> On Mon, Jan 07, 2008 at 10:44:13AM +0800, Tz-Huan Huang wrote:
>> 2008/1/4, Brooks Davis <brooks@freebsd.org>:
>>> We've definitely seen cases where hardware changes fixed ZFS checksum errors.
>>> In one case, a firmware upgrade on the raid controller fixed it.  In another
>>> case, we'd been connecting to an external array with a SCSI card that didn't
>>> have a PCI bracket and the errors went away when the replacement one arrived
>>> and was installed.  The fact that there were significant errors caught by ZFS
>>> was quite disturbing since we wouldn't have found them with UFS.
>> Hi,
>>
>> We have an NFS server using ZFS with a similar problem.
>> The box is i386 7.0-PRERELEASE with 3 GB of RAM:
>>
>> # uname -a
>> FreeBSD cml3 7.0-PRERELEASE FreeBSD 7.0-PRERELEASE #2:
>> Sat Jan  5 14:42:41 CST 2008 root@cml3:/usr/obj/usr/src/sys/CML2  i386
>>
>> The ZFS pool now contains 3 RAID arrays:
>>
>> 2007-11-20.11:49:17 zpool create pool /dev/label/proware263
>> 2007-11-20.11:53:31 zfs create pool/project
>> ... (zfs create other filesystems) ...
>> 2007-11-20.11:54:32 zfs set atime=off pool
>> 2007-12-08.22:59:15 zpool add pool /dev/da0
>> 2008-01-05.21:20:03 zpool add pool /dev/label/proware262
>>
>> After a power loss yesterday, zpool status shows:
>>
>> # zpool status -v
>>   pool: pool
>>  state: ONLINE
>> status: One or more devices has experienced an error resulting in data
>>         corruption.  Applications may be affected.
>> action: Restore the file in question if possible.  Otherwise restore the
>>         entire pool from backup.
>>    see: http://www.sun.com/msg/ZFS-8000-8A
>>  scrub: scrub completed with 231 errors on Mon Jan  7 08:05:35 2008
>> config:
>>
>>         NAME                STATE     READ WRITE CKSUM
>>         pool                ONLINE       0     0   516
>>           label/proware263  ONLINE       0     0   231
>>           da0               ONLINE       0     0   285
>>           label/proware262  ONLINE       0     0     0
>>
>> errors: Permanent errors have been detected in the following files:
>>
>>         /system/database/mysql/flickr_geo/flickr_raw_tag.MYI
>>         pool/project:<0x0>
>>         pool/home/master/96:<0xbf36>
>>
>> The main problem is that we cannot mount pool/project any more:
>>
>> # zfs mount pool/project
>> cannot mount 'pool/project': Input/output error
>> # grep ZFS /var/log/messages
>> Jan  7 10:08:35 cml3 root: ZFS: zpool I/O failure, zpool=pool error=86
>> (repeat many times)
>>
>> There is a lot of data in pool/project, probably 3.24T. zdb shows:
>>
>> # zdb pool
>> ...
>> Dataset pool/project [ZPL], ID 33, cr_txg 57, 3.24T, 22267231 objects
>> ...
>>
>> (zdb is still running now, we can provide the output if helpful)
>>
>> Is there any way to recover any data from pool/project?
> 
> The data was corrupted by the controller and/or the disk subsystem.
> You have no other source for the broken data, so it is lost.
> The only guaranteed way to get it back is from backup.
> Maybe older snapshots/clones are still readable - I don't know.
> Nevertheless, the data is corrupted, and that is exactly what alternative
> data sources such as raidz/mirror, and as a last resort backup, are for.
> You shouldn't have ignored those errors in the first place, because you
> are running on faulty hardware.
> Without ZFS checksumming the system would just process the broken
> data with unpredictable results.
> If all those errors are fresh, then you likely used a broken RAID
> controller below ZFS, which silently corrupted its synchronization and
> then blew up when the disk state changed.
> Unfortunately many RAID controllers are broken and therefore useless.
> 

Huh?  Could you be any more vague?  Which controllers are broken?  Have 
you contacted anyone about the breakage?  Can you describe the breakage?
I call bullshit, pure and simple.

Scott

From owner-freebsd-fs@FreeBSD.ORG  Tue Jan  8 08:38:42 2008
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 826FA16A419;
	Tue,  8 Jan 2008 08:38:42 +0000 (UTC)
	(envelope-from ticso@cicely12.cicely.de)
Received: from raven.bwct.de (raven.bwct.de [85.159.14.73])
	by mx1.freebsd.org (Postfix) with ESMTP id 1C80A13C459;
	Tue,  8 Jan 2008 08:38:41 +0000 (UTC)
	(envelope-from ticso@cicely12.cicely.de)
Received: from cicely5.cicely.de ([10.1.1.7])
	by raven.bwct.de (8.13.4/8.13.4) with ESMTP id m088cYQm075025;
	Tue, 8 Jan 2008 09:38:34 +0100 (CET)
	(envelope-from ticso@cicely12.cicely.de)
Received: from cicely12.cicely.de (cicely12.cicely.de [10.1.1.14])
	by cicely5.cicely.de (8.13.4/8.13.4) with ESMTP id m088cObf084818
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Tue, 8 Jan 2008 09:38:24 +0100 (CET)
	(envelope-from ticso@cicely12.cicely.de)
Received: from cicely12.cicely.de (localhost [127.0.0.1])
	by cicely12.cicely.de (8.13.4/8.13.3) with ESMTP id m088cNeb078923;
	Tue, 8 Jan 2008 09:38:23 +0100 (CET)
	(envelope-from ticso@cicely12.cicely.de)
Received: (from ticso@localhost)
	by cicely12.cicely.de (8.13.4/8.13.3/Submit) id m088cMfG078922;
	Tue, 8 Jan 2008 09:38:22 +0100 (CET) (envelope-from ticso)
Date: Tue, 8 Jan 2008 09:38:22 +0100
From: Bernd Walter <ticso@cicely12.cicely.de>
To: Scott Long <scottl@samsco.org>
Message-ID: <20080108083822.GL76422@cicely12.cicely.de>
References: <477B16BB.8070104@freebsd.org>
	<20080102070146.GH49874@cicely12.cicely.de>
	<477B8440.1020501@freebsd.org>
	<200801031750.31035.peter.schuller@infidyne.com>
	<477D16EE.6070804@freebsd.org>
	<20080103171825.GA28361@lor.one-eyed-alien.net>
	<6a7033710801061844m59f8c62dvdd3eea80f6c239c1@mail.gmail.com>
	<20080107135925.GF65134@cicely12.cicely.de>
	<47830BC0.5060100@samsco.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <47830BC0.5060100@samsco.org>
X-Operating-System: FreeBSD cicely12.cicely.de 5.4-STABLE alpha
User-Agent: Mutt/1.5.9i
X-Spam-Status: No, score=-4.4 required=5.0 tests=ALL_TRUSTED=-1.8,
	BAYES_00=-2.599 autolearn=ham version=3.2.3
X-Spam-Checker-Version: SpamAssassin 3.2.3 (2007-08-08) on cicely12.cicely.de
Cc: freebsd-fs@freebsd.org, Brooks Davis <brooks@freebsd.org>, ticso@cicely.de,
	Tz-Huan Huang <tzhuan@csie.org>
Subject: Re: ZFS i/o errors - which disk is the problem?
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: ticso@cicely.de
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 08 Jan 2008 08:38:42 -0000

On Mon, Jan 07, 2008 at 10:36:00PM -0700, Scott Long wrote:
> Bernd Walter wrote:
> >On Mon, Jan 07, 2008 at 10:44:13AM +0800, Tz-Huan Huang wrote:
> >>2008/1/4, Brooks Davis <brooks@freebsd.org>:
> >The data is corrupted by controller and/or disk subsystem.
> >You have no other data sources for the broken data, so it is lost.
> >The only garantied way is to get it back from backup.
> >Maybe older snapshots/clones are still readable - I don't know.
> >Nevertheless data is corrupted and that's the purpose for alternative
> >data sources such as raidz/mirror and at last backup.
> >You shouldn't have ignored those errors at first, because you are
> >running with faulty hardware.
> >Without ZFS checksumming the system would just process the broken
> >data with unpredictable results.
> >If all those errors are fresh then you likely used a broken RAID
> >controller below ZFS, which silently corrupted syncronity and then
> >blow when disk state changed.
> >Unfortunately many RAID controllers are broken and therefor useless.
> >
> 
> Huh?  Could you be any more vague?  Which controllers are broken?  Have 
> you contacted anyone about the breakage?  Can you describe the breakage?
> I call bullshit, pure and simple.

Just go back a few mails in the same thread, where someone fixed CRC
errors by updating the RAID controller firmware.
I'm amazed how often I read something like this lately.
And if you read the whole thread then you will notice that we are
currently talking about another person who has corrupted data on
a RAID disk - not sure if this is the controller, a drive or the
drivers, but something is faulty here and I wouldn't be surprised
if it is the controller.
And then there are so many RAID controllers without battery-backed
memory or another mechanism to guarantee synchronization of the
disks, which I call broken by design.
You know yourself how important synchronization is for RAID,
especially when it comes to parity-based RAID, and you know how
fragile it is when it comes to power failure.
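
The point about parity-based RAID and power failure can be illustrated
with a small userland sketch. It assumes a hypothetical stripe of two
data blocks and one XOR parity block; the struct and function names are
made up for illustration and do not describe any particular controller.
If a data block is rewritten but the matching parity update is lost, a
later rebuild of the other data block silently returns wrong data.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stripe: two data blocks plus one XOR parity block. */
struct stripe {
	uint8_t d0;
	uint8_t d1;
	uint8_t parity;
};

/* Parity as it should be after a complete, consistent stripe write. */
uint8_t
stripe_parity(uint8_t d0, uint8_t d1)
{
	return (d0 ^ d1);
}

/* Rebuild d1 from the surviving members after the d1 disk is lost. */
uint8_t
rebuild_d1(const struct stripe *s)
{
	return (s->d0 ^ s->parity);
}

/*
 * Simulate an interrupted update: the new d0 reaches the disk, but
 * power fails before the matching parity block is written.
 */
void
torn_write_d0(struct stripe *s, uint8_t new_d0)
{
	s->d0 = new_d0;
	/* s->parity is intentionally left stale here. */
}
```

With a consistent stripe, rebuild_d1() returns the original d1. After
torn_write_d0() it returns a different value with no error reported,
which is the kind of damage an end-to-end checksum is meant to catch.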

-- 
B.Walter                http://www.bwct.de      http://www.fizon.de
bernd@bwct.de           info@bwct.de            support@fizon.de

From owner-freebsd-fs@FreeBSD.ORG  Tue Jan  8 15:21:06 2008
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 241C416A420;
	Tue,  8 Jan 2008 15:21:06 +0000 (UTC)
	(envelope-from scottl@samsco.org)
Received: from pooker.samsco.org (pooker.samsco.org [168.103.85.57])
	by mx1.freebsd.org (Postfix) with ESMTP id F279513C46E;
	Tue,  8 Jan 2008 15:21:05 +0000 (UTC)
	(envelope-from scottl@samsco.org)
Received: from phobos.samsco.home (phobos.samsco.home [192.168.254.11])
	(authenticated bits=0)
	by pooker.samsco.org (8.13.8/8.13.8) with ESMTP id m08FFJtH014016;
	Tue, 8 Jan 2008 08:15:20 -0700 (MST)
	(envelope-from scottl@samsco.org)
Message-ID: <47839386.8020203@samsco.org>
Date: Tue, 08 Jan 2008 08:15:18 -0700
From: Scott Long <scottl@samsco.org>
User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-US;
	rv:1.8.1.11) Gecko/20071128 SeaMonkey/1.1.7
MIME-Version: 1.0
To: ticso@cicely.de
References: <477B16BB.8070104@freebsd.org>
	<20080102070146.GH49874@cicely12.cicely.de>
	<477B8440.1020501@freebsd.org>
	<200801031750.31035.peter.schuller@infidyne.com>
	<477D16EE.6070804@freebsd.org>
	<20080103171825.GA28361@lor.one-eyed-alien.net>
	<6a7033710801061844m59f8c62dvdd3eea80f6c239c1@mail.gmail.com>
	<20080107135925.GF65134@cicely12.cicely.de>
	<47830BC0.5060100@samsco.org>
	<20080108083822.GL76422@cicely12.cicely.de>
In-Reply-To: <20080108083822.GL76422@cicely12.cicely.de>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
X-Greylist: Sender succeeded SMTP AUTH authentication, not delayed by
	milter-greylist-2.0.2 (pooker.samsco.org [168.103.85.57]);
	Tue, 08 Jan 2008 08:15:21 -0700 (MST)
X-Spam-Status: No, score=-1.4 required=5.4 tests=ALL_TRUSTED autolearn=failed
	version=3.1.8
X-Spam-Checker-Version: SpamAssassin 3.1.8 (2007-02-13) on pooker.samsco.org
Cc: freebsd-fs@freebsd.org, Brooks Davis <brooks@freebsd.org>,
	Tz-Huan Huang <tzhuan@csie.org>
Subject: Re: ZFS i/o errors - which disk is the problem?
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 08 Jan 2008 15:21:06 -0000

Bernd Walter wrote:
> On Mon, Jan 07, 2008 at 10:36:00PM -0700, Scott Long wrote:
>> Bernd Walter wrote:
>>> On Mon, Jan 07, 2008 at 10:44:13AM +0800, Tz-Huan Huang wrote:
>>>> 2008/1/4, Brooks Davis <brooks@freebsd.org>:
>>> The data is corrupted by controller and/or disk subsystem.
>>> You have no other data sources for the broken data, so it is lost.
>>> The only garantied way is to get it back from backup.
>>> Maybe older snapshots/clones are still readable - I don't know.
>>> Nevertheless data is corrupted and that's the purpose for alternative
>>> data sources such as raidz/mirror and at last backup.
>>> You shouldn't have ignored those errors at first, because you are
>>> running with faulty hardware.
>>> Without ZFS checksumming the system would just process the broken
>>> data with unpredictable results.
>>> If all those errors are fresh then you likely used a broken RAID
>>> controller below ZFS, which silently corrupted syncronity and then
>>> blow when disk state changed.
>>> Unfortunately many RAID controllers are broken and therefor useless.
>>>
>> Huh?  Could you be any more vague?  Which controllers are broken?  Have 
>> you contacted anyone about the breakage?  Can you describe the breakage?
>> I call bullshit, pure and simple.
> 
> Just go back a few mails in the same thread were someone fixed CRC
> errors by updating the RAID controller firmware.
> I'm amazed how often I read something like this lately.
> And if you read the whole thread then you will notice that we are
> currently talking about another person which has corrupted data on
> a RAID disk - not sure if this is the controller, a drive or the
> drivers, but something is faulty here and I wouldn't be surprised
> if it is the controller.
> And then there are so many RAID controllers without backed memory or
> other mechanism to garantie syncronity for the disks, which I call
> broken by design.
> You know yourself how important syncronity is for RAID, especially
> when it comes to parity based RAID and you know how fragile it is
> when it comes to power failure.
> 

Your argument is complete hearsay and poorly formed opinion.  That's
fine, just be honest about it and don't mislead others into thinking
that you know what you're talking about when it comes to RAID.

Scott


From owner-freebsd-fs@FreeBSD.ORG  Tue Jan  8 15:21:33 2008
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id C304916A46B;
	Tue,  8 Jan 2008 15:21:33 +0000 (UTC)
	(envelope-from ticso@cicely12.cicely.de)
Received: from raven.bwct.de (raven.bwct.de [85.159.14.73])
	by mx1.freebsd.org (Postfix) with ESMTP id 95C3413C50D;
	Tue,  8 Jan 2008 15:21:33 +0000 (UTC)
	(envelope-from ticso@cicely12.cicely.de)
Received: from cicely5.cicely.de ([10.1.1.7])
	by raven.bwct.de (8.13.4/8.13.4) with ESMTP id m08FLPTZ085260;
	Tue, 8 Jan 2008 16:21:25 +0100 (CET)
	(envelope-from ticso@cicely12.cicely.de)
Received: from cicely12.cicely.de (cicely12.cicely.de [10.1.1.14])
	by cicely5.cicely.de (8.13.4/8.13.4) with ESMTP id m08FLFRC087412
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Tue, 8 Jan 2008 16:21:15 +0100 (CET)
	(envelope-from ticso@cicely12.cicely.de)
Received: from cicely12.cicely.de (localhost [127.0.0.1])
	by cicely12.cicely.de (8.13.4/8.13.3) with ESMTP id m08FLEiU079883;
	Tue, 8 Jan 2008 16:21:14 +0100 (CET)
	(envelope-from ticso@cicely12.cicely.de)
Received: (from ticso@localhost)
	by cicely12.cicely.de (8.13.4/8.13.3/Submit) id m08FLEdq079882;
	Tue, 8 Jan 2008 16:21:14 +0100 (CET) (envelope-from ticso)
Date: Tue, 8 Jan 2008 16:21:14 +0100
From: Bernd Walter <ticso@cicely12.cicely.de>
To: Scott Long <scottl@samsco.org>
Message-ID: <20080108152114.GF79270@cicely12.cicely.de>
References: <20080102070146.GH49874@cicely12.cicely.de>
	<477B8440.1020501@freebsd.org>
	<200801031750.31035.peter.schuller@infidyne.com>
	<477D16EE.6070804@freebsd.org>
	<20080103171825.GA28361@lor.one-eyed-alien.net>
	<6a7033710801061844m59f8c62dvdd3eea80f6c239c1@mail.gmail.com>
	<20080107135925.GF65134@cicely12.cicely.de>
	<47830BC0.5060100@samsco.org>
	<20080108083822.GL76422@cicely12.cicely.de>
	<47839386.8020203@samsco.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <47839386.8020203@samsco.org>
X-Operating-System: FreeBSD cicely12.cicely.de 5.4-STABLE alpha
User-Agent: Mutt/1.5.9i
X-Spam-Status: No, score=-4.4 required=5.0 tests=ALL_TRUSTED=-1.8,
	BAYES_00=-2.599 autolearn=ham version=3.2.3
X-Spam-Checker-Version: SpamAssassin 3.2.3 (2007-08-08) on cicely12.cicely.de
Cc: freebsd-fs@freebsd.org, Brooks Davis <brooks@freebsd.org>, ticso@cicely.de,
	Tz-Huan Huang <tzhuan@csie.org>
Subject: Re: ZFS i/o errors - which disk is the problem?
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: ticso@cicely.de
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 08 Jan 2008 15:21:33 -0000

On Tue, Jan 08, 2008 at 08:15:18AM -0700, Scott Long wrote:
> Bernd Walter wrote:
> >On Mon, Jan 07, 2008 at 10:36:00PM -0700, Scott Long wrote:
> >>Bernd Walter wrote:
> >>>On Mon, Jan 07, 2008 at 10:44:13AM +0800, Tz-Huan Huang wrote:
> >>>>2008/1/4, Brooks Davis <brooks@freebsd.org>:
> >>>The data is corrupted by controller and/or disk subsystem.
> >>>You have no other data sources for the broken data, so it is lost.
> >>>The only garantied way is to get it back from backup.
> >>>Maybe older snapshots/clones are still readable - I don't know.
> >>>Nevertheless data is corrupted and that's the purpose for alternative
> >>>data sources such as raidz/mirror and at last backup.
> >>>You shouldn't have ignored those errors at first, because you are
> >>>running with faulty hardware.
> >>>Without ZFS checksumming the system would just process the broken
> >>>data with unpredictable results.
> >>>If all those errors are fresh then you likely used a broken RAID
> >>>controller below ZFS, which silently corrupted syncronity and then
> >>>blow when disk state changed.
> >>>Unfortunately many RAID controllers are broken and therefor useless.
> >>>
> >>Huh?  Could you be any more vague?  Which controllers are broken?  Have 
> >>you contacted anyone about the breakage?  Can you describe the breakage?
> >>I call bullshit, pure and simple.
> >
> >Just go back a few mails in the same thread were someone fixed CRC
> >errors by updating the RAID controller firmware.
> >I'm amazed how often I read something like this lately.
> >And if you read the whole thread then you will notice that we are
> >currently talking about another person which has corrupted data on
> >a RAID disk - not sure if this is the controller, a drive or the
> >drivers, but something is faulty here and I wouldn't be surprised
> >if it is the controller.
> >And then there are so many RAID controllers without backed memory or
> >other mechanism to garantie syncronity for the disks, which I call
> >broken by design.
> >You know yourself how important syncronity is for RAID, especially
> >when it comes to parity based RAID and you know how fragile it is
> >when it comes to power failure.
> >
> 
> Your argument is complete hearsay and poorly formed opinion.  That's
> fine, just be honest about it and don't mislead others into thinking
> that you know what you're talking about when it comes to RAID.

Go back into the thread and tell the people that their facts are not
true, that they are just dreaming about data corruption, and that
their data will be back when they wake up.

-- 
B.Walter                http://www.bwct.de      http://www.fizon.de
bernd@bwct.de           info@bwct.de            support@fizon.de

From owner-freebsd-fs@FreeBSD.ORG  Tue Jan  8 18:59:10 2008
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 27D4416A418
	for <freebsd-fs@freebsd.org>; Tue,  8 Jan 2008 18:59:10 +0000 (UTC)
	(envelope-from brooks@lor.one-eyed-alien.net)
Received: from lor.one-eyed-alien.net (cl-162.ewr-01.us.sixxs.net
	[IPv6:2001:4830:1200:a1::2])
	by mx1.freebsd.org (Postfix) with ESMTP id 9E5D513C45D
	for <freebsd-fs@freebsd.org>; Tue,  8 Jan 2008 18:59:09 +0000 (UTC)
	(envelope-from brooks@lor.one-eyed-alien.net)
Received: from lor.one-eyed-alien.net (localhost [127.0.0.1])
	by lor.one-eyed-alien.net (8.14.1/8.13.8) with ESMTP id m08IwwAA006573; 
	Tue, 8 Jan 2008 12:58:58 -0600 (CST)
	(envelope-from brooks@lor.one-eyed-alien.net)
Received: (from brooks@localhost)
	by lor.one-eyed-alien.net (8.14.1/8.13.8/Submit) id m08IwvQT006572;
	Tue, 8 Jan 2008 12:58:57 -0600 (CST) (envelope-from brooks)
Date: Tue, 8 Jan 2008 12:58:57 -0600
From: Brooks Davis <brooks@freebsd.org>
To: Scott Long <scottl@samsco.org>
Message-ID: <20080108185857.GA5601@lor.one-eyed-alien.net>
References: <20080102070146.GH49874@cicely12.cicely.de>
	<477B8440.1020501@freebsd.org>
	<200801031750.31035.peter.schuller@infidyne.com>
	<477D16EE.6070804@freebsd.org>
	<20080103171825.GA28361@lor.one-eyed-alien.net>
	<6a7033710801061844m59f8c62dvdd3eea80f6c239c1@mail.gmail.com>
	<20080107135925.GF65134@cicely12.cicely.de>
	<47830BC0.5060100@samsco.org>
	<20080108083822.GL76422@cicely12.cicely.de>
	<47839386.8020203@samsco.org>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature"; boundary="a8Wt8u1KmwUX3Y2C"
Content-Disposition: inline
In-Reply-To: <47839386.8020203@samsco.org>
User-Agent: Mutt/1.5.16 (2007-06-09)
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-3.0
	(lor.one-eyed-alien.net [127.0.0.1]);
	Tue, 08 Jan 2008 12:58:59 -0600 (CST)
Cc: freebsd-fs@freebsd.org, ticso@cicely.de, Tz-Huan Huang <tzhuan@csie.org>
Subject: Re: ZFS i/o errors - which disk is the problem?
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 08 Jan 2008 18:59:10 -0000


--a8Wt8u1KmwUX3Y2C
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Tue, Jan 08, 2008 at 08:15:18AM -0700, Scott Long wrote:
> Bernd Walter wrote:
>> On Mon, Jan 07, 2008 at 10:36:00PM -0700, Scott Long wrote:
>>> Bernd Walter wrote:
>>>> On Mon, Jan 07, 2008 at 10:44:13AM +0800, Tz-Huan Huang wrote:
>>>>> 2008/1/4, Brooks Davis <brooks@freebsd.org>:
>>>> The data is corrupted by controller and/or disk subsystem.
>>>> You have no other data sources for the broken data, so it is lost.
>>>> The only garantied way is to get it back from backup.
>>>> Maybe older snapshots/clones are still readable - I don't know.
>>>> Nevertheless data is corrupted and that's the purpose for alternative
>>>> data sources such as raidz/mirror and at last backup.
>>>> You shouldn't have ignored those errors at first, because you are
>>>> running with faulty hardware.
>>>> Without ZFS checksumming the system would just process the broken
>>>> data with unpredictable results.
>>>> If all those errors are fresh then you likely used a broken RAID
>>>> controller below ZFS, which silently corrupted syncronity and then
>>>> blow when disk state changed.
>>>> Unfortunately many RAID controllers are broken and therefor useless.
>>>>
>>> Huh?  Could you be any more vague?  Which controllers are broken?  Have
>>> you contacted anyone about the breakage?  Can you describe the breakage?
>>> I call bullshit, pure and simple.
>> Just go back a few mails in the same thread were someone fixed CRC
>> errors by updating the RAID controller firmware.
>> I'm amazed how often I read something like this lately.
>> And if you read the whole thread then you will notice that we are
>> currently talking about another person which has corrupted data on
>> a RAID disk - not sure if this is the controller, a drive or the
>> drivers, but something is faulty here and I wouldn't be surprised
>> if it is the controller.
>> And then there are so many RAID controllers without backed memory or
>> other mechanism to garantie syncronity for the disks, which I call
>> broken by design.
>> You know yourself how important syncronity is for RAID, especially
>> when it comes to parity based RAID and you know how fragile it is
>> when it comes to power failure.
>
> Your argument is complete hearsay and poorly formed opinion.  That's
> fine, just be honest about it and don't mislead others into thinking
> that you know what you're talking about when it comes to RAID.

We saw ZFS CRC errors on one system running Solaris x86 with a 16-port
Areca controller (I don't have the model number handy) until we did a
firmware upgrade after contacting Areca.  The controller was running in
JBOD mode.

-- Brooks

--a8Wt8u1KmwUX3Y2C
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (FreeBSD)

iD8DBQFHg8fwXY6L6fI4GtQRAsweAKCwDbsQ5vPGkkmUhCQ/4WLBNwV3KACcCNvL
6BKxUgfbh8VCgNSEzT6S7+U=
=0Scq
-----END PGP SIGNATURE-----

--a8Wt8u1KmwUX3Y2C--

From owner-freebsd-fs@FreeBSD.ORG  Wed Jan  9 14:47:40 2008
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 7366416A417
	for <fs@freebsd.org>; Wed,  9 Jan 2008 14:47:40 +0000 (UTC)
	(envelope-from asmrookie@gmail.com)
Received: from fg-out-1718.google.com (fg-out-1718.google.com [72.14.220.156])
	by mx1.freebsd.org (Postfix) with ESMTP id 04F4E13C46E
	for <fs@freebsd.org>; Wed,  9 Jan 2008 14:47:39 +0000 (UTC)
	(envelope-from asmrookie@gmail.com)
Received: by fg-out-1718.google.com with SMTP id 16so298780fgg.35
	for <fs@freebsd.org>; Wed, 09 Jan 2008 06:47:39 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=domainkey-signature:received:received:message-id:date:from:sender:to:subject:mime-version:content-type:content-transfer-encoding:content-disposition:x-google-sender-auth;
	bh=7FxIcbOi6PYFptfW2P94W8QiCIjb3Hl8qLrzkzc+p3s=;
	b=jzEKDCtCW8eqFgh7BhZLmL3LyNkhXntYdBw+ChAAI+6tFvCaCfAfkdPUiQ9AgYD2KHzEo8xMuQMlOQ2euUhb/7S1Y/BrQTCWA9B8TMnLDPYpOFbL4SJBCrtoxg96BfzlsBHU2UfUxHrGj5fV7Mve4tBHGtr8anYFWXZ/tVOuOGs=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma;
	h=message-id:date:from:sender:to:subject:mime-version:content-type:content-transfer-encoding:content-disposition:x-google-sender-auth;
	b=wlqPBD6dNav1ezmjaI+YixB2QWJzgWMHXHJaO0dvzoEPGlhE75UMz1NUuA83s5DFD7EqeRfyYMSnELCu/skwrgvx2PoD2VjI2NaklYMedzEEoqBcyRZ+KWscycYFzA10XN6VKuS66CInbzPLfZ3BorLc7SWCw/Ve7cBwWmeFKDU=
Received: by 10.86.66.1 with SMTP id o1mr700877fga.23.1199888375803;
	Wed, 09 Jan 2008 06:19:35 -0800 (PST)
Received: by 10.86.28.19 with HTTP; Wed, 9 Jan 2008 06:19:35 -0800 (PST)
Message-ID: <3bbf2fe10801090619x1ce5a178x1731db272c8d20fd@mail.gmail.com>
Date: Wed, 9 Jan 2008 15:19:35 +0100
From: "Attilio Rao" <attilio@freebsd.org>
Sender: asmrookie@gmail.com
To: current@freebsd.org, arch@freebsd.org, fs@freebsd.org
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
X-Google-Sender-Auth: 58fec5b188aea1bf
Cc: 
Subject: [PATCH] lockmgr and VFS plans
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 09 Jan 2008 14:47:40 -0000

Hi,
as previously explained in past e-mails, lockmgr() is going to undergo
a massive restructuring.
The work is progressing along two tracks: the first involves fixing
consumer code so that it is completely agnostic of implementation
details, which makes it cleaner and more robust. The second involves
providing a good replacement for the current functions and a faster
implementation.
lockmgr() is an old primitive widely used in our VFS subsystem, so
this overhaul will necessarily involve the VFS subsystem in some way,
in particular for the first track.

Part of this overhaul (in these preliminary stages) consists of
removing the 'thread' argument from the lockmgr() interface, which
also makes the same argument useless in the VFS functions (vn_lock(),
VOP_LOCK() and VOP_UNLOCK()). This removal can be done in a 'stacked'
way and can be split into 2 stages: the first will clean up only
vn_lock(), while the second will be more aggressive and will touch VFS
heavily, fixing VOP_LOCK1() and VOP_UNLOCK(). This patch removes the
'thread' argument from vn_lock():
http://people.freebsd.org/~attilio/vn_lock.diff
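
As a rough illustration of the shape of this change (not the contents
of the diff itself), here is a userland mock. The names vn_lock_old and
vn_lock_new, the struct layouts, the v_lockowner field, curthread as a
plain global, and MOCK_LK_EXCLUSIVE are all hypothetical stand-ins and
not the kernel's definitions. The sketch only assumes that the thread
passed by callers is the current thread, so the callee can work it out
on its own.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag value, for illustration only. */
#define	MOCK_LK_EXCLUSIVE	1

/* Stand-ins for the kernel types; the real ones are far larger. */
struct thread {
	int td_tid;
};
struct vnode {
	struct thread *v_lockowner;	/* hypothetical field */
};

/* Stand-in for the kernel's notion of the current thread. */
struct thread thread0 = { 1 };
struct thread *curthread = &thread0;

/* Old shape: every caller has to pass a thread explicitly. */
int
vn_lock_old(struct vnode *vp, int flags, struct thread *td)
{
	(void)flags;
	vp->v_lockowner = td;
	return (0);
}

/* New shape: the thread is implied, so the argument goes away. */
int
vn_lock_new(struct vnode *vp, int flags)
{
	(void)flags;
	vp->v_lockowner = curthread;
	return (0);
}
```

Under that assumption a call such as vn_lock_old(vp, flags, curthread)
and the shorter vn_lock_new(vp, flags) leave the mock vnode in the same
state, which is why the change at each call site is mechanical.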

What I'm looking for is:
- objections to this
- testers (even if a small crowd already offered to test this patch)

I test-compiled and ran LINT with this patch and it works
perfectly, but a wider audience would be better.

I would also appreciate it a lot if people planning to make changes
to lockmgr or VFS would coordinate their efforts with me, even for
small changes.

Thanks,
Attilio


-- 
Peace can only be achieved by understanding - A. Einstein

From owner-freebsd-fs@FreeBSD.ORG  Wed Jan  9 21:01:15 2008
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id E5B4C16A419
	for <fs@freebsd.org>; Wed,  9 Jan 2008 21:01:15 +0000 (UTC)
	(envelope-from pho@holm.cc)
Received: from relay00.pair.com (relay00.pair.com [209.68.5.9])
	by mx1.freebsd.org (Postfix) with SMTP id 93E6D13C474
	for <fs@freebsd.org>; Wed,  9 Jan 2008 21:01:15 +0000 (UTC)
	(envelope-from pho@holm.cc)
Received: (qmail 81358 invoked from network); 9 Jan 2008 20:34:33 -0000
Received: from unknown (HELO peter.osted.lan) (unknown)
	by unknown with SMTP; 9 Jan 2008 20:34:33 -0000
X-pair-Authenticated: 83.95.197.164
Received: from peter.osted.lan (localhost.osted.lan [127.0.0.1])
	by peter.osted.lan (8.13.6/8.13.6) with ESMTP id m09KYXun014071;
	Wed, 9 Jan 2008 21:34:33 +0100 (CET)
	(envelope-from pho@peter.osted.lan)
Received: (from pho@localhost)
	by peter.osted.lan (8.13.6/8.13.6/Submit) id m09KYXpL014070;
	Wed, 9 Jan 2008 21:34:33 +0100 (CET) (envelope-from pho)
Date: Wed, 9 Jan 2008 21:34:33 +0100
From: Peter Holm <peter@holm.cc>
To: Attilio Rao <attilio@freebsd.org>
Message-ID: <20080109203433.GA13933@peter.osted.lan>
References: <3bbf2fe10801090619x1ce5a178x1731db272c8d20fd@mail.gmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <3bbf2fe10801090619x1ce5a178x1731db272c8d20fd@mail.gmail.com>
User-Agent: Mutt/1.4.2.1i
Cc: arch@freebsd.org, current@freebsd.org, fs@freebsd.org
Subject: Re: [PATCH] lockmgr and VFS plans
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 09 Jan 2008 21:01:16 -0000

On Wed, Jan 09, 2008 at 03:19:35PM +0100, Attilio Rao wrote:
> Hi,
> as previously explained in past e-mails, lockmgr() is going to face a
> massive restructuration.
> The work is progressing on two different rails: the former involves
> fixing consumers code in order to make it completely implementative
> details agnostic, in order to make it cleaner and more robust. The
> latter involves giving a good replacement for current functions and a
> faster implementation.
> lockmgr() is an old primitive widely used in our VFS subsystem, so
> this overhaul would involve someway VFS subsystem necessarilly, in
> particular about the former line of development.
> 
> Part of this overhaul (for this preliminary stages) consists in
> removing the 'thread' argument from the lockmgr() interface which also
> means making useless the same argument about VFS functions (vn_lock,
> VOP_LOCK() and VOP_UNLOCK()). This removal can be done in a 'stacked'
> way and can be splitted in 2 different stages: the former will clean
> up only vn_lock() while the latter will be more aggressive and it will
> involve hardly VFS, fixing VOP_LOCK1() and VOP_UNLOCK(). This patch
> removes the 'thread' argument from vn_lock():
> http://people.freebsd.org/~attilio/vn_lock.diff
> 

I'll try to test it this weekend.

> What I'm looking for is:
> - objections to this
> - testers (even if a small crowd alredy offered to test this patch)
> 
> I test-compiled and runned LINT with this patch and it works
> perfectly, but a wider audience would be better.
> 
> I also would appreciate a lot if people planning to do changes to
> lockmgr or VFS would coordinate their efforts with me, even on small
> changes.
> 
> Thanks,
> Attilio
> 
> 
> -- 
> Peace can only be achieved by understanding - A. Einstein
> _______________________________________________
> freebsd-arch@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-arch
> To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org"

-- 
Peter Holm

From owner-freebsd-fs@FreeBSD.ORG  Thu Jan 10 21:09:02 2008
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 7781F16A468
	for <fs@freebsd.org>; Thu, 10 Jan 2008 21:09:02 +0000 (UTC)
	(envelope-from martin_voros@yahoo.com)
Received: from web55514.mail.re4.yahoo.com (web55514.mail.re4.yahoo.com
	[206.190.58.223])
	by mx1.freebsd.org (Postfix) with SMTP id F3EB513C448
	for <fs@freebsd.org>; Thu, 10 Jan 2008 21:09:01 +0000 (UTC)
	(envelope-from martin_voros@yahoo.com)
Received: (qmail 1521 invoked by uid 60001); 10 Jan 2008 20:42:20 -0000
DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=s1024; d=yahoo.com;
	h=X-YMail-OSG:Received:X-Mailer:Date:From:Subject:To:MIME-Version:Content-Type:Message-ID;
	b=Z0inDIYrOlo2stJk4tZFlSDr9hUXrEq3yF0P8F9n7zqDSVmr/gwJfHgNR6w660jnUCEHNuBw3tRzxnBiO4QtJnolg9Do7ENZMfZDwQW9ao2qurpubcFzS+2pNg+p5keC1QnPOmuKdJFgIaqdzLujj8PVAKMLHyq71iLEVswgaAE=;
X-YMail-OSG: MYdpgSMVM1nodhaYS97.NzS6qXcRBkgtj9ooUau9dVeZ6dWgs0QZoV9RaIOOHsOvrlcfNDhZngjlMiMF3YvK.YlUJqbCGaQEhROK.B4YcZWviXV0UA4-
Received: from [77.247.224.21] by web55514.mail.re4.yahoo.com via HTTP;
	Thu, 10 Jan 2008 12:42:20 PST
X-Mailer: YahooMailRC/818.31 YahooMailWebService/0.7.158.1
Date: Thu, 10 Jan 2008 12:42:20 -0800 (PST)
From: Martin Voros <martin_voros@yahoo.com>
To: Attilio Rao <attilio@freebsd.org>, current@freebsd.org, arch@freebsd.org, 
	fs@freebsd.org
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Message-ID: <806000.98905.qm@web55514.mail.re4.yahoo.com>
X-Mailman-Approved-At: Thu, 10 Jan 2008 22:09:02 +0000
Cc: 
Subject: Re: [PATCH] lockmgr and VFS plans
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 10 Jan 2008 21:09:02 -0000

----- Original Message ----
> From: Attilio Rao <attilio@freebsd.org>
> To: current@freebsd.org; arch@freebsd.org; fs@freebsd.org
> Sent: Wednesday, January 9, 2008 3:19:35 PM
> Subject: [PATCH] lockmgr and VFS plans
> 
> ........... 
> What I'm looking for is:
> - objections to this
> - testers (even if a small crowd already offered to test this patch)
> 
> I test-compiled and ran LINT with this patch and it works
> perfectly, but a wider audience would be better.

Hi Attilio


I compiled it without any problems, and I'm now running it on my
test installation. So far it seems to work fine.


Martin

 

From owner-freebsd-fs@FreeBSD.ORG  Sat Jan 12 01:24:50 2008
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 5CF9F16A419
	for <freebsd-fs@freebsd.org>; Sat, 12 Jan 2008 01:24:50 +0000 (UTC)
	(envelope-from baldur@foo.is)
Received: from gremlin.foo.is (gremlin.foo.is [194.105.250.10])
	by mx1.freebsd.org (Postfix) with ESMTP id 304D113C447
	for <freebsd-fs@freebsd.org>; Sat, 12 Jan 2008 01:24:50 +0000 (UTC)
	(envelope-from baldur@foo.is)
Received: from 127.0.0.1 (localhost.foo.is [127.0.0.1])
	by injector.foo.is (Postfix) with SMTP id ADD4CDA89C
	for <freebsd-fs@freebsd.org>; Sat, 12 Jan 2008 01:05:49 +0000 (GMT)
X-Spam-Checker-Version: SpamAssassin 3.1.7 (2006-10-05) on gremlin.foo.is
X-Spam-Level: 
X-Spam-Status: No, score=-2.6 required=6.0 tests=BAYES_00,NO_RELAYS 
	autolearn=ham version=3.1.7
Received: by gremlin.foo.is (Postfix, from userid 1000)
	id 133F7DA87C; Sat, 12 Jan 2008 01:05:47 +0000 (GMT)
Date: Sat, 12 Jan 2008 01:05:47 +0000
From: Baldur Gislason <baldur@foo.is>
To: freebsd-fs@freebsd.org
Message-ID: <20080112010546.GI37723@gremlin.foo.is>
User-Agent: Mutt/1.4.2.2i
X-Sanitizer: Foo
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Disposition: inline
Subject: GBDE problems
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 12 Jan 2008 01:24:50 -0000

I don't know if I'm posting this to the right list,
but I'm having some pretty nasty problems with GBDE
on two separate machines with very different setups.

One is a 7.0-RC1 that was just installed a couple days
ago and the other is a 6.2-REL.

Both exhibit the same problem: when doing any intensive
I/O on the gbde device (copying files around, etc.),
they flood the console with messages like this:
g_vfs_done():ad11s1d.bde[READ(offset=-8478054264274608128, length=65536)]error = 5
g_vfs_done():ad11s1d.bde[READ(offset=4422716238718156800, length=65536)]error = 5
g_vfs_done():ad11s1d.bde[READ(offset=-8478054264274608128, length=65536)]error = 5
g_vfs_done():ad11s1d.bde[READ(offset=4422716238718156800, length=65536)]error = 5
g_vfs_done():ad11s1d.bde[READ(offset=-8478054264274608128, length=65536)]error = 5
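
For anyone triaging a console flooded like this, here is a minimal sketch of
how such lines could be parsed and sanity-checked. error = 5 is EIO. The
regular expression, the classify() helper and the 1 PiB MAX_PLAUSIBLE_BYTES
ceiling are assumptions made for illustration; none of them come from gbde
or GEOM.

```python
import re

# Two of the console lines from the report, verbatim.
LOG = """\
g_vfs_done():ad11s1d.bde[READ(offset=-8478054264274608128, length=65536)]error = 5
g_vfs_done():ad11s1d.bde[READ(offset=4422716238718156800, length=65536)]error = 5
"""

# Hypothetical pattern for this message shape; not taken from kernel sources.
LINE_RE = re.compile(
    r"g_vfs_done\(\):(?P<dev>\S+?)\[(?P<op>READ|WRITE)"
    r"\(offset=(?P<offset>-?\d+), length=(?P<length>\d+)\)\]"
    r"error = (?P<error>\d+)"
)

# Assumed upper bound on provider size (1 PiB), far above any disk in the report.
MAX_PLAUSIBLE_BYTES = 1 << 50


def parse(log):
    """Return one dict per matching line, with numeric fields as ints."""
    records = []
    for m in LINE_RE.finditer(log):
        rec = m.groupdict()
        for key in ("offset", "length", "error"):
            rec[key] = int(rec[key])
        records.append(rec)
    return records


def classify(offset, limit=MAX_PLAUSIBLE_BYTES):
    """Label an offset as negative, beyond the assumed limit, or plausible."""
    if offset < 0:
        return "negative"
    if offset >= limit:
        return "beyond-limit"
    return "plausible"


records = parse(LOG)
for rec in records:
    print(rec["dev"], rec["op"], rec["offset"], classify(rec["offset"]))
```

Both offsets in the report fail this check: one is negative and the other is
on the order of 4 * 10^18 bytes. That hints that the requested offsets
themselves are garbage, rather than the disk failing at a real location.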

This is from the 7.0-RC1 machine, which for performance reasons has the block size
in gbde set to 8192 bytes, the fragment size in UFS2 set to the same, and the block size
in UFS2 set to 8x that, or 65536, as advised in the newfs man page.
The 6.2 machine has the storage sitting on an ATAPI RAID device (ar0), and I'm using
a 2048-byte block size in GBDE and default settings for UFS2.
I have soft updates turned on in both cases.
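
The sizes described above can be cross-checked with a short sketch. The
7.0-RC1 values are the ones stated in this report. The 6.2 values assume the
newfs defaults of that era (16384-byte blocks, 2048-byte fragments), which is
an assumption and not something stated here. check_geometry() is a
hypothetical helper, not part of any FreeBSD tool.

```python
def check_geometry(sector, fragment, block):
    """Return a list of problems; an empty list means the sizes are consistent."""
    problems = []
    # A UFS fragment should cover a whole number of gbde sectors.
    if fragment % sector != 0:
        problems.append("fragment is not a multiple of the gbde sector size")
    # UFS allows 1, 2, 4 or 8 fragments per block.
    if block % fragment != 0:
        problems.append("block is not a multiple of the fragment size")
    elif block // fragment not in (1, 2, 4, 8):
        problems.append("block/fragment ratio is not 1, 2, 4 or 8")
    return problems


# 7.0-RC1 machine: values as stated in the report.
rc1 = check_geometry(sector=8192, fragment=8192, block=65536)

# 6.2 machine: 2048-byte gbde sectors as stated; UFS2 sizes are ASSUMED defaults.
rel62 = check_geometry(sector=2048, fragment=2048, block=16384)

print("7.0-RC1:", rc1 or "consistent")
print("6.2-REL:", rel62 or "consistent")
```

Under those assumptions both layouts pass, so a simple size mismatch between
gbde and UFS2 does not look like the cause.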

Can anyone give me some advice on how to tackle this?

Baldur Gislason
baldur@foo.is