Date:      Tue, 29 Jan 2013 18:06:01 -0600
From:      Kevin Day <toasty@dragondata.com>
To:        Matthew Ahrens <mahrens@delphix.com>
Cc:        FreeBSD Filesystems <freebsd-fs@freebsd.org>
Subject:   Re: Improving ZFS performance for large directories
Message-ID:  <F4420A8C-FB92-4771-B261-6C47A736CF7F@dragondata.com>
In-Reply-To: <CAJjvXiE+8OMu_yvdRAsWugH7W=fhFW7bicOLLyjEn8YrgvCwiw@mail.gmail.com>
References:  <19DB8F4A-6788-44F6-9A2C-E01DEA01BED9@dragondata.com> <CAJjvXiE+8OMu_yvdRAsWugH7W=fhFW7bicOLLyjEn8YrgvCwiw@mail.gmail.com>


On Jan 29, 2013, at 5:42 PM, Matthew Ahrens <mahrens@delphix.com> wrote:

> On Tue, Jan 29, 2013 at 3:20 PM, Kevin Day <toasty@dragondata.com> wrote:
> I'm prepared to try an L2arc cache device (with secondarycache=metadata),
>
> You might first see how long it takes when everything is cached.  E.g. by doing this in the same directory several times.  This will give you a lower bound on the time it will take (or put another way, an upper bound on the improvement available from a cache device).
>

Doing it twice back-to-back makes a bit of difference, but it's still slow either way.

After not touching this directory for about 30 minutes:

# time ls -l >/dev/null
0.773u 2.665s 0:18.21 18.8%	35+2749k 3012+0io 0pf+0w

Immediately again:

# time ls -l > /dev/null
0.665u 1.077s 0:08.60 20.1%	35+2719k 556+0io 0pf+0w

18.2 vs 8.6 seconds is an improvement, but even the 8.6 seconds is longer than I was expecting.
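Not something from the thread, but one quick way to see how much of the warm-run time is per-file stat() traffic rather than the directory read itself is to compare a plain ls against ls -l (a sketch; dir=. is a placeholder for the slow directory):

```shell
# Separate the directory read from the per-file stat() calls.
# dir=. is a placeholder; point it at the slow directory.
dir=.
time ls "$dir" > /dev/null      # readdir only: just the directory (ZAP) blocks
time ls -l "$dir" > /dev/null   # adds a stat() per entry, touching every dnode
```

If the plain ls is fast and only ls -l is slow, the cost is in pulling in the per-file dnodes rather than reading the directory itself.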

>
> For a specific filesystem, nothing comes to mind, but I'm sure you could cobble something together with zdb.  There are several tools to determine the amount of metadata in a ZFS storage pool:
>
>  - "zdb -bbb <pool>"
>      but this is unreliable on pools that are in use

I tried this and it consumed >16GB of memory after about 5 minutes, so I had to kill it. I'll try it again during our next maintenance window, when it can be the only thing running.

>  - "zpool scrub <pool>; <wait for scrub to complete>; echo '::walk spa|::zfs_blkstats' | mdb -k"
>     the scrub is slow, but this can be mitigated by setting the global variable zfs_no_scrub_io to 1.  If you don't have mdb or equivalent debugging tools on FreeBSD, you can manually look at <spa_t>->spa_dsl_pool->dp_blkstats.
>
> In either case, the "LSIZE" is the size that's required for caching (in memory or on an l2arc cache device).  At a minimum you will need 512 bytes for each file, to cache the dnode_phys_t.

Okay, thanks a bunch. I'll try this on the next chance I get too.
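The 512-bytes-per-file figure above also gives a quick back-of-envelope lower bound on the metadata footprint before running any of those tools. A sketch, using an illustrative file count (3.5 million is an assumed number, not one from this thread):

```shell
# Minimum metadata cache footprint: one 512-byte dnode_phys_t per file.
nfiles=3500000        # assumed file count; substitute your own
dnode_bytes=512
echo "$(( nfiles * dnode_bytes / 1024 / 1024 )) MB minimum"
# prints: 1708 MB minimum
```

The real footprint will be larger once directory ZAP blocks and indirect blocks are counted, which is what the zdb/blkstats LSIZE numbers capture.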


I think some of the issue is that nothing is being allowed to stay cached for long. We have several parallel rsyncs running at once that are basically scanning every directory as fast as they can, combined with a bunch of rsync, HTTP, and FTP clients. I'm guessing that with all that activity, things are getting shoved out of the cache pretty quickly.
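One way to sanity-check that theory is to watch the ARC hit rate while the rsyncs are running; on FreeBSD the counters are exposed as sysctls under kstat.zfs.misc.arcstats (exact names can vary by release). A sketch of the ratio calculation, with made-up sample values standing in for the real counters:

```shell
# Sample the counters with something like:
#   sysctl kstat.zfs.misc.arcstats.hits kstat.zfs.misc.arcstats.misses
# The values below are hypothetical stand-ins, not measurements.
hits=900000
misses=300000
echo "ARC hit ratio: $(( 100 * hits / (hits + misses) ))%"
# prints: ARC hit ratio: 75%
```

A ratio that drops sharply while the scans run would support the eviction theory; a high ratio would point the finger elsewhere.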





