Date: Wed, 18 May 2011 10:59:37 +0300
From: Daniel Kalchev <daniel@digsys.bg>
To: freebsd-fs@freebsd.org
Subject: Re: HAST + ZFS self healing? Hot spares?
Message-ID: <4DD37C69.5020005@digsys.bg>
In-Reply-To: <85EC77D3-116E-43B0-BFF1-AE1BD71B5CE9@itassistans.se>

On 18.05.11 09:13, Per von Zweigbergk wrote:
> I've been investigating HAST as a possibility in adding synchronous replication and failover to a set of two NFS servers backed by ZFS. The servers themselves contain quite a few disks. 20 of them (7200 RPM SAS disks), to be exact. (If I didn't lose count again...) Plus two quick but small SSDs for ZIL and two not-as-quick but larger SSDs for L2ARC.

Your idea is to have a hot standby server that replaces the primary, should the primary fail (hardware-wise)? You will probably need CARP in addition to HAST in order to maintain the same shared IP address.

> Initially, my thoughts land on simply creating HAST resources for the corresponding pairs of disks and SSDs in servers A and B, and then using these HAST resources to make up the ZFS pool.

This would be the most natural decision, especially if you have identical hardware on both servers. Let's call this variant 1 (a rough sketch follows below).

Variant 2 would be to create local ZFS pools (as you already have), create ZVOLs in them, and let HAST manage those ZVOLs. You would then use the HAST providers for whatever storage needs you have. Performance might not be what you expect, and you have to trust HAST's checksumming.
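For variant 1, I imagine something along these lines. This is only a rough, untested sketch - the hostnames, addresses, resource names, device names and pool name are made-up placeholders:

  # /etc/hast.conf, identical on both nodes; one resource per disk
  # (repeat for the remaining disks and for the SSDs)
  resource disk0 {
          on nodeA {
                  local /dev/da0
                  remote 10.0.0.2
          }
          on nodeB {
                  local /dev/da0
                  remote 10.0.0.1
          }
  }
  resource disk1 {
          on nodeA {
                  local /dev/da1
                  remote 10.0.0.2
          }
          on nodeB {
                  local /dev/da1
                  remote 10.0.0.1
          }
  }

  # on both nodes: make sure hastd is running (hastd_enable="YES"
  # in rc.conf) and initialize the metadata for every resource
  hastctl create disk0
  hastctl create disk1

  # on the node that is to be primary (repeat for every resource)
  hastctl role primary disk0
  hastctl role primary disk1

  # the /dev/hast/* providers exist only on the current primary;
  # build the pool from them in whatever layout you planned for
  # the local disks
  zpool create tank mirror hast/disk0 hast/disk1

  # a manual failover would then look roughly like this
  hastctl role secondary disk0   # on the old primary, if still alive
  hastctl role primary disk0     # on the new primary
  zpool import -f tank           # on the new primary, once all
                                 # resources are primary there
  # ...plus CARP (or similar) to move the shared service IP

The role switching has to be done for every resource, so with 20+ resources you would want to script it.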
> 1. Hardware failure management. In case of a hardware failure, I'm not exactly sure what will happen, but I suspect the single-disk RAID-0 array containing the failed disk will simply fail. I assume it will still exist, but refuse to be read or written. In this situation I understand HAST will handle this by routing all I/O to the secondary server, in case the disk on the primary side dies, or simply by cutting off replication if the disk on the secondary server fails.

Having local ZFS pools makes hardware management easier, as ZFS is designed for exactly this - that is variant 2. With variant 1 you will have several issues:

- You have to handle disk failure and array management at the controller level. You will need to check whether this actually works; you may end up with a new array name and thus have to edit config files.
- There is no hot spare mechanism in HAST, and I do not believe you can switch to the secondary easily. Switching roles will certainly make the HAST device node disappear on the (former) primary server and reappear on the secondary. Perhaps someone can suggest a proper way to handle this.

> 2. ZFS self-healing. As far as I understand it, ZFS does self-healing, in that all data is checksummed, and if one disk in a mirror happens to contain corrupted data, ZFS will re-read the same data from the other disk in the ZFS mirror. I don't see any way this could work in a configuration where ZFS is not mirroring itself, but rather, running on top of HAST, currently. Am I wrong about this? Or is there any way to achieve this same self-healing effect except with HAST?

HAST is a simple mirror. It only makes sure that the blocks on the local and remote drives contain the same data, and I do not believe its checksumming is strong enough to compare with ZFS. Therefore, your best bet for data protection is to put ZFS on top of HAST: in your example, create 20 HAST resources, one per disk, and build the ZFS pool out of those HAST resources. ZFS will then be able to heal itself if the data on the HAST resources ever becomes inconsistent (for whatever reason).

Some people have reported using HAST for the SLOG as well. I do not know whether using HAST for the L2ARC makes any sense; on failure you will import the pool on the slave node, and that wipes the L2ARC anyway.

> I mean, ideally, ZFS would have a really neat synchronous replication feature built into it. Or ZFS could be HAST-aware, and know how to ask HAST to bring it a copy of a block of data on the remote block device in a HAST mirror in case the checksum on the local block device doesn't match. Or HAST would itself have some kind of block-level checksums, and do self-healing itself. (This would probably be the easiest to implement. The secondary site could even continually be reading the same data as the primary site is, merely to check the checksums on disk, not to send it over the wire. It's not like it's doing anything else useful with that untapped read performance.)

With HAST, no (hast) storage providers exist on the secondary node. Therefore, you cannot do any I/O on the secondary node until it becomes primary.

I, too, would be interested in the failure management scenario with HAST+ZFS, as I am currently experimenting with a similar system.

Daniel