Date:      Sun, 5 May 2013 17:51:31 GMT
From:      Nathaniel Filardo <nwf@cs.jhu.edu>
To:        freebsd-gnats-submit@FreeBSD.org
Subject:   kern/178349: zfs scrub on deduped data could be much less seeky
Message-ID:  <201305051751.r45HpVkq020980@oldred.FreeBSD.org>
Resent-Message-ID: <201305051800.r45I00O6018479@freefall.freebsd.org>


>Number:         178349
>Category:       kern
>Synopsis:       zfs scrub on deduped data could be much less seeky
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          change-request
>Submitter-Id:   current-users
>Arrival-Date:   Sun May 05 18:00:00 UTC 2013
>Closed-Date:
>Last-Modified:
>Originator:     Nathaniel Filardo
>Release:        9.1-STABLE
>Organization:
>Environment:
FreeBSD hydra.priv.oc.ietfng.org 9.1-STABLE FreeBSD 9.1-STABLE #46 r+c68cdd0-dirty: Tue Apr 23 22:59:02 EDT 2013     root@hydra.priv.oc.ietfng.org:/usr/obj/systank/src-git/sys/NWFKERN  sparc64

>Description:
ZFS tries to save time in scrubbing by visiting data in
most-referenced-to-least-referenced order, so that it need not visit a block
once for each reference to it: in short, it scans the DDT for all blocks
with refcount > 1, then walks the on-disk tree to visit the refcount == 1
blocks.  Unfortunately, the first phase is apparently prone to being very
seeky, since the DDT is keyed by checksum and so its traversal order is
effectively random with respect to disk layout.  The result is agonizingly
slow scrubs and resilvers: my disks each manage only 18-25 ops/sec during
this phase, for a grand total of ~1.5MB/sec from my raidz2, whereas the
later traversals are much more respectable at 35MB/sec or so.  It would be
better, I think, if the scrub logic traversed the DDT with some measure of
on-disk locality (though this will naturally take several passes to visit
all blocks).

A straightforward way to do this, though by no means necessarily the best,
would be to allocate in RAM a fixed-size sorted queue of visited block
pointers and ignore block pointers that fall outside the queue's current min
and max (rather like the HAMMER2 lazy deduplication logic, amusingly
enough).  Upon visiting a block pointer, the scrub would insert it into the
queue, possibly displacing a higher address (which will be unnecessarily
revisited later, but that's OK); the queue thereby restricts each pass to a
narrower region of the disk, reducing the number of long-distance seeks.
When a pass over the DDT has finished, if the queue's max is still infinity,
no additional passes are needed; otherwise, the queue's max should become
its min, the max should be reset to infinity, and another pass over the DDT
should be made.
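A rough sketch of that bounded-queue pass structure, in plain Python rather
than kernel C (`scrub_passes` and its parameters are invented names for
illustration, not ZFS internals):

```python
import heapq

INF = float("inf")

def scrub_passes(ddt_addresses, queue_size):
    # ddt_addresses: on-disk addresses of DDT entries, in DDT (hash)
    # order, i.e. effectively random with respect to disk layout.
    # Yields one sorted list of addresses per pass.
    lo = -1  # exclusive lower bound; everything <= lo was already visited
    while lo < INF:
        # Max-heap (via negation) holding the queue_size smallest
        # addresses seen so far that lie above lo.
        heap = []
        for addr in ddt_addresses:
            if addr <= lo:
                continue  # covered by an earlier pass
            if len(heap) < queue_size:
                heapq.heappush(heap, -addr)
            elif addr < -heap[0]:
                # Displace the highest queued address; it is simply
                # picked up again on a later pass, which is harmless.
                heapq.heapreplace(heap, -addr)
        if not heap:
            break  # nothing left above lo
        visited = sorted(-a for a in heap)
        yield visited  # scrub these addresses, now in disk order
        if len(heap) < queue_size:
            lo = INF  # queue never filled: its max "stayed infinity"
        else:
            lo = visited[-1]  # this pass's max becomes the next pass's min
```

(The sketch uses an exclusive lower bound so the boundary block is not
revisited; the inclusive variant described above would revisit it, which is
also fine.)  Each pass scans the whole DDT but only seeks within one narrow
band of the disk, and the number of passes is roughly the block count
divided by the queue size.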

The current bookmarking scheme is sufficient to resume this game as well, I
think, with the understanding that all blocks in the DDT whose on-disk
location is greater than the bookmark are still due for scanning: when
resuming, use the bookmark as the queue's min and initialize the max to
infinity.

It may make sense, rather than tracking exact block pointers in the queue,
to mask off some number of low-order bits of their addresses and track
those coarser values instead.
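That masking is just rounding each address down to a power-of-two-sized
bucket, so one queue entry stands for a whole region; a one-function sketch
(`coarsen` is an invented name):

```python
def coarsen(addr, low_bits):
    # Round an on-disk address down to a 2**low_bits-byte bucket, so the
    # queue tracks coarse disk regions instead of exact block pointers.
    return addr & ~((1 << low_bits) - 1)
```

With, say, `low_bits = 20`, every block in the same 1MB region coarsens to
the same queue entry, trading a little extra scanning at the window edges
for a much smaller queue.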
>How-To-Repeat:

>Fix:


>Release-Note:
>Audit-Trail:
>Unformatted:


