From owner-freebsd-fs@FreeBSD.ORG Wed Jan 30 00:06:06 2013
Subject: Re: Improving ZFS performance for large directories
From: Kevin Day <toasty@dragondata.com>
Date: Tue, 29 Jan 2013 18:06:01 -0600
To: Matthew Ahrens
Cc: FreeBSD Filesystems <freebsd-fs@freebsd.org>
List-Id: Filesystems

On Jan 29, 2013, at 5:42 PM, Matthew Ahrens wrote:

> On Tue, Jan 29, 2013 at 3:20 PM, Kevin Day wrote:
>> I'm prepared to try an L2ARC cache device (with secondarycache=metadata),
>
> You might first see how long it takes when everything is cached, e.g. by doing this in the same directory several times. This will give you a lower bound on the time it will take (or, put another way, an upper bound on the improvement available from a cache device).

Doing it twice back-to-back makes a bit of difference, but it's still slow either way. After not touching this directory for about 30 minutes:

# time ls -l >/dev/null
0.773u 2.665s 0:18.21 18.8% 35+2749k 3012+0io 0pf+0w

Immediately again:

# time ls -l > /dev/null
0.665u 1.077s 0:08.60 20.1% 35+2719k 556+0io 0pf+0w

18.2 vs. 8.6 seconds is an improvement, but even 8.6 seconds is longer than what I was expecting.

> For a specific filesystem, nothing comes to mind, but I'm sure you could cobble something together with zdb.
> There are several tools to determine the amount of metadata in a ZFS storage pool:
>
> - "zdb -bbb "
>   but this is unreliable on pools that are in use

I tried this, and it consumed >16GB of memory after about 5 minutes, so I had to kill it. I'll try it again during our next maintenance window, when it can be the only thing running.

> - "zpool scrub ; ; echo '::walk spa|::zfs_blkstats' | mdb -k"
>   The scrub is slow, but this can be mitigated by setting the global variable zfs_no_scrub_io to 1. If you don't have mdb or equivalent debugging tools on FreeBSD, you can manually look at ->spa_dsl_pool->dp_blkstats.
>
> In either case, the "LSIZE" is the size that's required for caching (in memory or on an L2ARC cache device). At a minimum you will need 512 bytes for each file, to cache the dnode_phys_t.

Okay, thanks a bunch. I'll try this on the next chance I get too.

I think some of the issue is that nothing is being allowed to stay cached for long. We have several parallel rsyncs running at once that are basically scanning every directory as fast as they can, combined with a bunch of rsync, HTTP and FTP clients. I'm guessing that with all that activity, things are getting shoved out of the cache pretty quickly.
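[Editor's note: as a back-of-the-envelope check on the 512-bytes-per-file figure mentioned above, here is a minimal sketch of the metadata cache sizing arithmetic. The file count is a hypothetical placeholder, not a number measured in this thread, and this estimates only the dnode portion, not the full LSIZE of all metadata blocks:]

```shell
#!/bin/sh
# Rough lower bound on ARC/L2ARC space needed just to cache dnodes,
# assuming ~512 bytes per file for the dnode_phys_t.
# The file count below is hypothetical.
files=5000000
bytes=$((files * 512))
echo "${files} files -> $((bytes / 1024 / 1024)) MiB minimum for dnodes"
```

For a directory tree this size, even the dnodes alone run to a couple of GiB, which is why a cache device sized only from file counts can still fall well short of the LSIZE reported by zdb or zfs_blkstats.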