From: Josh Paetzel <jpaetzel@FreeBSD.org>
To: Konstantin Belousov
Cc: freebsd-fs@freebsd.org, Rick Macklem, ash@ixsystems.com
Message-Id: <1483207716.3465220.833841385.061386FF@webmail.messagingengine.com>
In-Reply-To: <20161231133350.GU1923@kib.kiev.ua>
Subject: Re: NFS readdirplus on ZFS with > 1 billion files
References:
 <1483179971.3381747.833629401.5EF242B8@webmail.messagingengine.com> <20161231133350.GU1923@kib.kiev.ua>
Date: Sat, 31 Dec 2016 12:08:36 -0600

On Sat, Dec 31, 2016, at 07:33 AM, Konstantin Belousov wrote:
> On Sat, Dec 31, 2016 at 04:26:11AM -0600, Josh Paetzel wrote:
> > We've been chasing this bug for a very long time and finally managed to
> > pin it down. When a ZFS dataset has more than 1 billion files on it and
> > an NFS client does a readdirplus, the file handles for files with high
> > znode/inode numbers get truncated due to a 64 -> 32 bit conversion.
> >
> > https://reviews.freebsd.org/D9009
> >
> > This isn't a fix so much as a workaround. From a performance standpoint
> > it's the same as if the client mounts with noreaddirplus; sometimes it's
> > a win, sometimes it's a loss. CPU usage does go up on the server a bit.
>
> Can you point to the places in the ZFS code where the truncation occurs?
> I have no idea about the ZFS code, and my question is mainly whether the
> truncation just occurs due to the different types of ino_t and the ZFS
> node id, or whether some code actively does the range reduction.
>
> My question is in the context of the long-dragging ino64 work, which
> might be finished in some visible future. In particular, I am curious
> whether just using the patched kernel fixes your issue. See
> https://github.com/FreeBSDFoundation/freebsd/tree/ino64
> although I do not make any claim about the state of the code yet.
>
> Your patch, after a review, might still be useful for stable/10 and 11,
> since I do not think that ino64 has any bits which could be merged.
That's a great question, and I will attempt to answer as best I can;
however, I am cc'ing Ash Gokhale and Rick Macklem here because they
understand the issue better and might be able to provide a better answer.

My understanding is that the issue occurs here:

http://fxr.watson.org/fxr/source/fs/nfsserver/nfs_nfsdport.c?v=FREEBSD10#L2090

This code path casts the dirent d_fileno from 32 to 64 bits to stuff the
NFS fileno, but the legacy struct dirent's d_fileno is still 32 bits.

I'm not entirely sure this is a ZFS-specific issue at all; I've never
tried to put 1 billion files on a UFS filesystem to see what would
happen. (I suspect this issue with the NFS server would be the least of
your problems.)

I agree that the correct solution is the ino64 work. I'm fine with this
hack going directly into 11-STABLE and 10-STABLE. (In fact, I think
that's the best solution.)

Another thing we kicked around was making this hack a sysctl, so that
you could manually activate it if a filesystem went over the threshold
for the bug to occur. No one is completely convinced that we fully
understand the performance implications of this patch.

-- 
Thanks,
Josh Paetzel