From: Martin Matuska <mm@FreeBSD.org>
Date: Thu, 21 Jul 2011 20:25:35 +0200
To: Ivan Voras
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS and large directories - caveat report
Message-ID: <4E286F1F.6010502@FreeBSD.org>

Quoting:

...
The default record size ZFS uses is 128K, which is good for many storage
servers that hold larger files. However, when dealing with many files
that are only tens of kilobytes, or even bytes, in size, a considerable
slowdown results. ZFS can easily alter the record size of the data to be
written through attributes, which can be set at any time with the
"zfs set" command. To set the record size attribute, run
"zfs set recordsize=32K pool/share". This sets the record size to 32K on
share "share" within pool "pool". This functionality can even be applied
to nested shares for even more flexibility.
...

Read more:
http://www.articlesbase.com/information-technology-articles/improving-file-system-performance-utilizing-dynamic-record-sizes-in-zfs-4565092.html#ixzz1SlWZ7BM5
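For reference, a minimal sketch of the commands the article is describing
("pool/share" is just the article's placeholder dataset name; note that a
changed recordsize only applies to blocks written after the change, and
existing files keep their old block size):

    # check the current record size of the dataset
    zfs get recordsize pool/share

    # use 32K records for data written from now on
    zfs set recordsize=32K pool/share

    # verify the new setting
    zfs get recordsize pool/share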
> "ls") takes seconds, and a simple "find | wc -l" on one of the shards > takes > 30 minutes (I stopped it after 30 minutes). Another symptom is > that SIGINT-ing such find process takes 10-15 seconds to complete > (sic! this likely means the kernel operation cannot be interrupted for > so long). > > This wouldn't be a problem by itself, but operations on such > directories eat IOPS - clearly visible with the "find" test case, > making the rest of the services on the server fall as collateral > damage. Apparently there is a huge amount of seeking being done, even > though I would think that for read operations all the data would be > cached - and somehow the seeking from this operation takes priority / > livelocks other operations on the same ZFS pool. > > This is on a fresh 8-STABLE AMD64, pool version 28 and zfs version 5. > > Is there an equivalent of UFS dirhash memory setting for ZFS? (i.e. > the size of the metadata cache) > > _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" -- Martin Matuska FreeBSD committer http://blog.vx.sk