Date:      Fri, 20 Apr 2007 13:44:03 +0100 (BST)
From:      Mike Wolman <mike@nux.co.uk>
To:        freebsd-geom@freebsd.org, freebsd-fs@freebsd.org
Subject:   lazy mirror / live backup
Message-ID:  <20070420133854.G45782@nux.eros.office>

Hi List,

I'd like gmirror to be able to do a more efficient resilvering (or
resyncing) of the mirror members when a planned disconnect occurs. This
would significantly reduce the mirror rebuild time for any component
which had been deactivated. For network mirrors using ggated devices it
would also reduce network usage, and it could be used for remote
asynchronous mirrors, thus providing a live backup for
laptops/workstations.


Main Points

When a normal mirror breaks, this module must keep track of which blocks
in the mirror have changed.

- This can be done by keeping a list/map of just the blocks which change.

- This list/map needs to be stored either on a device not provided by
the mirror or in memory. If the list is kept only in memory, then after
rebooting the machine any detached drive would require a full resync,
which would be a problem for large mirrors over slow links, so a way of
dumping the list to permanent storage would be required.
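
As a rough illustration of the two points above, here is a minimal
Python sketch of a change list kept in memory and dumped to permanent
storage so that it survives a reboot. The ChangeMap class, its method
names and the one-block-number-per-line file format are all assumptions
made up for this example; none of this is existing gmirror code.

```python
import os


class ChangeMap:
    """Hypothetical in-memory record of changed block numbers."""

    def __init__(self):
        self.changed = set()

    def mark(self, block):
        # Marking the same block twice keeps a single entry.
        self.changed.add(block)

    def save(self, path):
        # Write to a temporary file and rename it into place, so a crash
        # in the middle of the dump does not leave a truncated list.
        tmp = path + ".tmp"
        with open(tmp, "w") as f:
            for block in sorted(self.changed):
                f.write(f"{block}\n")
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)

    @classmethod
    def load(cls, path):
        # Rebuild the list after a reboot from the dumped file.
        loaded = cls()
        with open(path) as f:
            loaded.changed = {int(line) for line in f if line.strip()}
        return loaded
```

The file would have to live on a device that is not provided by the
mirror itself, as noted above.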


Example uses:

USB/FireWire external drive nightly full backup.

Consider a mirror constructed with 3 components: ad1, umass1 and umass2.

umass1 and umass2 are backup devices taken home on alternate nights by
different users, so that there is always a device holding a full,
one-day-old backup at a remote location.

This module should be able to use the change log for multiple devices,
preserving the changes until all components are up to date. Should one
of the USB devices fail and be removed from the mirror, the change log
should be cleared (provided all other components are up to date),
allowing for drive failures and stopping the block change log growing
indefinitely. It should be possible to use the same change log for more
than one device.
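
One possible way to picture a change log shared by several components
is sketched below in Python. It keeps one set of out-of-date blocks per
component, which is only one of several possible layouts; the
SharedChangeLog class and its method names are hypothetical and do not
correspond to anything in gmirror today.

```python
class SharedChangeLog:
    """Hypothetical change log shared by several mirror components."""

    def __init__(self, components):
        # One set of out-of-date block numbers per component.
        self.stale = {name: set() for name in components}

    def record_write(self, block, attached):
        """Note a write that reached only the components in `attached`."""
        for name, blocks in self.stale.items():
            if name not in attached:
                blocks.add(block)

    def resync(self, name):
        """Return the blocks `name` needs, then treat it as up to date."""
        blocks = sorted(self.stale[name])
        self.stale[name].clear()
        return blocks

    def drop_failed(self, name):
        """Forget a component that has failed for good."""
        del self.stale[name]

    def pending(self):
        """Total number of block entries still being held."""
        return sum(len(blocks) for blocks in self.stale.values())


log = SharedChangeLog(["ad1", "umass1", "umass2"])
log.record_write(100, attached={"ad1", "umass1"})  # umass2 is at home
log.record_write(200, attached={"ad1", "umass2"})  # umass1 is at home
```

Here resyncing umass2 would need only block 100, and dropping a failed
umass1 would release the entry being held for it, so the log cannot grow
indefinitely on behalf of a drive that never comes back.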

Normal full backups to USB devices can take many hours; this should
reduce the time to that needed to copy only the data changed since the
device was last attached to the mirror.


Example use for disaster recovery - slow links:

Consider a mirror consisting of 2 components, ad1 and ggatec1, with
ggatec1 on a slow link.

A flush-period tuneable could be implemented by periodically
deactivating and reactivating the ggatec component, allowing for an
asynchronous mirror, à la rsync but faster as there is no file list etc.

A tuneable may be required to only sync blocks which have not been
changed in xx seconds/minutes, to prevent the same blocks being
transferred too often.
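
The "not changed in xx seconds" idea could look something like the
Python sketch below. It assumes the module records the time of the last
write to each changed block; the function name, the dictionary layout
and the quiet_seconds parameter are all invented for illustration.

```python
def blocks_ready_to_sync(last_write, now, quiet_seconds):
    """Return changed blocks that have been quiet for long enough.

    last_write maps a block number to the time (in seconds) of its most
    recent write. Blocks written more recently than quiet_seconds ago are
    held back, on the assumption that they may well change again soon.
    """
    return sorted(
        block
        for block, written_at in last_write.items()
        if now - written_at >= quiet_seconds
    )


last_write = {1: 0.0, 2: 95.0, 3: 40.0}
ready = blocks_ready_to_sync(last_write, now=100.0, quiet_seconds=30.0)
# ready == [1, 3]; block 2 was written 5 seconds ago and is held back
```

Note that keeping a timestamp per changed block costs a good deal more
memory than a single bit per block, so this would be a trade-off.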

A tuneable to specify the speed at which gmirror synchronises the
out-of-sync component will be required. This would possibly also be
useful for normal gmirror use on a busy server when rebuilding a drive,
as gmirror currently uses all available write speed to do so; limiting
rebuild speed may therefore prevent drive failures.
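
A speed limit of this kind can be sketched as simple pacing: before each
chunk is copied, wait until the average rate so far is back under the
limit. The Python below is only an illustration of that idea;
paced_sync, copy_chunk and max_bytes_per_sec are hypothetical names, and
the clock and sleep arguments exist only so the pacing can be tried
without really waiting.

```python
import time


def paced_sync(chunks, copy_chunk, chunk_bytes, max_bytes_per_sec,
               clock=time.monotonic, sleep=time.sleep):
    """Copy chunks in order without exceeding max_bytes_per_sec on average."""
    start = clock()
    sent = 0
    for chunk in chunks:
        # Earliest moment at which this chunk may start under the limit.
        due = start + sent / max_bytes_per_sec
        delay = due - clock()
        if delay > 0:
            sleep(delay)
        copy_chunk(chunk)
        sent += chunk_bytes
```

With 16KB chunks and a limit of 16KB per second, for example, the first
chunk goes out immediately and each later one waits about a second.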


Live backup of laptop/workstation

Consider a mirror created using a local disk ad0 and a ggated-mounted
device ggate0, with a balance algorithm preferring ad0.

When the network is unavailable, gmirror starts to keep track of the
changed blocks. On reconnection to the network and activation of the
ggatec component, the list of changed blocks can be flushed. Should the
same block have changed more than once, only the last change needs to
be sent, reducing network usage.
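
To make the saving concrete: if only the block numbers are remembered
(not the data), repeated writes to one block collapse into a single
entry, and the current contents are read from the local disk at flush
time. A small Python sketch, with a made-up function name and a made-up
write sequence:

```python
def blocks_to_send(write_log):
    """Collapse a sequence of written block numbers into unique blocks."""
    return sorted(set(write_log))


# Block numbers written while the laptop was off the network.
writes = [7, 3, 7, 7, 12, 3]
to_send = blocks_to_send(writes)
# to_send == [3, 7, 12]: six writes, but only three blocks to transfer
```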

The main problem with mirroring the whole system drive is that any swap
changes will need to be ignored.


Other Considerations/Suggestions:

- gmirror will somehow need to be informed that a drive has actually
failed and is not just temporarily disconnected.

- Data structure consideration: there is a difference between a list of
block numbers that have changed and a block bitmap. A block bitmap is
perfect for this, and only requires 1 bit per block of storage, max. No
more. A list of block numbers can get *HUGE* though, because the block
numbers are probably all 64-bit numbers, so in the worst case (every
block changed) it will be 64x the amount of space required to store
the list, not to mention the issues of sorting and maintaining it.

- For determining the size of this 'block change map', you could use the
ceiling of the max number of blocks. So a 100GB storage mirror would
have roughly 200000000 512-byte blocks. So 200 million bits (using a
bitmap to store whether a block needs resyncing or not: 0 no sync, 1
sync) is roughly 24MB. You could pretty easily keep that in memory, but
if the size was 1TB you'd be at around 240MB, so that starts to get a
little much. Since this would be able to be enabled/disabled, it may
not be an issue.

- Possibly, you could cheat. Instead of marking each storage block
(512-byte sector) as needing sync or not, you could do it in 16KB
chunks. So, if any sector inside that 16KB chunk was written, resync
the whole chunk. That reduces your memory footprint for a 100GB mirror
down to something less than 1MB! That means a 1TB mirror would need
only 7-10MB. You'll resync a little extra data, but since drives cache
and the GEOM layer does requests efficiently in larger sizes anyhow,
this might actually perform better anyway.
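
The bitmap points above can be checked with a short Python sketch. It
assumes decimal units (1GB = 10^9 bytes) and 8 bytes per entry for the
list of block numbers; map_bytes, list_bytes and ChunkBitmap are names
invented for this example, not anything that exists in GEOM.

```python
GB = 10**9  # decimal gigabyte, an assumption made for this example


def map_bytes(device_bytes, unit_bytes):
    """Bytes needed for a bitmap with one bit per unit_bytes of device."""
    units = -(-device_bytes // unit_bytes)  # ceiling division
    return (units + 7) // 8


def list_bytes(changed_blocks, bytes_per_entry=8):
    """Bytes needed for a plain list of 64-bit block numbers."""
    return changed_blocks * bytes_per_entry


class ChunkBitmap:
    """One bit per chunk; a set bit means the chunk needs resyncing."""

    def __init__(self, device_bytes, chunk_bytes=16 * 1024):
        self.chunk_bytes = chunk_bytes
        self.nchunks = -(-device_bytes // chunk_bytes)
        self.bits = bytearray((self.nchunks + 7) // 8)

    def mark_write(self, offset, length):
        """Mark every chunk touched by a write of length bytes at offset."""
        first = offset // self.chunk_bytes
        last = (offset + length - 1) // self.chunk_bytes
        for chunk in range(first, last + 1):
            self.bits[chunk // 8] |= 1 << (chunk % 8)

    def dirty_chunks(self):
        """Return the numbers of all chunks that need resyncing."""
        return [c for c in range(self.nchunks)
                if self.bits[c // 8] & (1 << (c % 8))]


per_sector = map_bytes(100 * GB, 512)       # one bit per 512-byte sector
per_chunk = map_bytes(100 * GB, 16 * 1024)  # one bit per 16KB chunk
worst_list = list_bytes(195312500)          # every sector of 100GB listed
```

Under these assumptions a 100GB mirror needs about 24.4MB at one bit per
sector and about 0.76MB at one bit per 16KB chunk, while a list holding
every sector number would need about 1.56GB, which is where the 64x
figure comes from. A list is only smaller than the bitmap while fewer
than 1 in 64 blocks have changed.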


If anyone has further suggestions for this idea please let me know, and
if there are any developers interested in this I may be able to
provide/donate some hardware (sorry, not new): a laptop, a desktop and
some hard drives. I can also set up a machine for any network-related
testing.

