From: Mike Wolman
Date: Fri, 20 Apr 2007 13:44:03 +0100 (BST)
To: freebsd-geom@freebsd.org, freebsd-fs@freebsd.org
Subject: lazy mirror / live backup

Hi List,

I'd like the ability to have gmirror do a more efficient re-silvering (or re-syncing) of the mirror members when a planned disconnect occurs.
This=20 would significantly reduce the mirror rebuild time for any component which= =20 had been deactivated, for network mirrors using ggated devices this would= =20 also reduce network usage and could be used for remote asynced mirrors=20 thus providing a live backup for laptops/workstations. Main Points When a normal mirror breaks this module must keep track of which block in= =20 the mirror have changed. - This can be done by keeping a list/map of just the block which change. - This list/map needs to be stored on a device not provided by the mirror= =20 or in memory. If this list is stored in memory on rebooting the machine any= =20 deattached drive would require a full resync so a way of saving dumping=20 this list to permanent storage would be required as this would be a=20 problem for large mirrors over slow links. Example uses: Usb/Firewire external drive nightly full backup. If a mirror is contracted with 3 components: ad1, umass1 and umass2 umass1 and umass2 are backup devices taken home on alternate nights by=20 different users, always allowing for a device to have a full 1 day old=20 backup at a remote location. This module should be able to use the change log for multiple devices=20 preserving the changes until all components are upto date. Should one of= =20 the usb devices fail and is removed from the mirror the change log should= =20 be cleared (provided all other components are upto date) allowing for=20 drive failers and stopping the block change log growing indefinitely. It= =20 should be possible to use the same change log for more than one device. Normal full backups to usb devices can take many hours, this should reduce= =20 the time to only the amount of data added within the period the device was= =20 last attached to the mirror. Example use for disaster recovery - slow links: If the mirror consist of 2 components ad1 and ggatec1 with component=20 ggatec1 being on a slow link. 
A flush-period tuneable could be used to deactivate the ggatec component and reactivate it at intervals, giving an asynchronous mirror, à la rsync but faster, as there is no file list etc.

A tuneable may be required to sync only blocks that have not been changed in the last xx seconds/minutes, to prevent the same blocks from being transferred too often.

A tuneable to specify the speed at which gmirror synchronises the out-of-sync component will also be required. This would possibly be useful for normal gmirror use on a busy server when rebuilding a drive: gmirror currently uses all available write speed to do so, and limiting the rebuild speed may therefore help prevent drive failures.

Live backup of laptop/workstation:

Suppose the mirror is created using a local disk ad0 and a ggated-mounted device ggate0, with a balance algorithm preferring ad0. When the network is unavailable, gmirror starts to keep track of the changed blocks. On reconnection to the network and activation of the ggatec component, the list of changed blocks can be flushed. Should the same block have changed more than once, only the last change needs to be sent, reducing network usage.

The main problem with mirroring the whole system drive is that any swap changes will need to be ignored.

Other Considerations/Suggestions:

- Gmirror will somehow need to be informed that a drive has actually failed and is not just temporarily disconnected.
- Data structure consideration: the difference between a list of block numbers that have changed and a block bitmap. A block bitmap is perfect for this, and requires at most 1 bit per block of storage.
A list= =20 of block numbers can get *HUGE* though, because the block numbers are=20 probably all 64bit numbers, so it will be 64x the amount of space required= =20 to store the list, not to mention the issues of sorting and maintaining it - For determining the size of this 'block change map', you could use the=20 ceiling of the max number of blocks. so, a 100Gb storage mirror, would=20 have roughly 200000000 512b blocks. So, 200million bits (using a bitmap=20 to store when a block needs resyncing or not (0 no sync, 1 sync) is=20 roughly 24MB. You could pretty easily keep that in memory, but if the size= =20 was 1Tb, you'd be at around 240MB, so that starts to get a little much.=20 Since this would be able to be enabled/disabled, it may not be an issue. - Possibly, you could cheat. Instead of marking each storage block=20 (512byte sector) as needing sync or not, you could do it in 16KB chunks.=20 So, if any sector inside that 16KB chunk was written, resync the whole=20 chunk. That reduces your memory footprint for a 100GB mirror down to=20 something less than 1MB! That means a 1Tb mirror would need only 7-10MB.=20 You'll resync a little extra data, but since drives cache and the GEOM=20 layer does requests efficiently in larger sizes anyhow, this might=20 actually perform better anyway. If there is anyone has further suggestions for this idea please let me=20 know and if there are and developers interested in this i may be able to=20 provide/donate some hardware - sorry not new - a laptop, desktop and some= =20 hard drives - and can setup a machine for any network related testing. --0-767011191-1177073043=:45782--