From owner-freebsd-fs@FreeBSD.ORG Fri Nov 21 23:45:54 2014
Date: Fri, 21 Nov 2014 18:45:52 -0500 (EST)
From: Rick Macklem
To: Konstantin Belousov
Cc: FreeBSD Filesystems
Message-ID: <420608613.5215411.1416613552066.JavaMail.root@uoguelph.ca>
In-Reply-To: <20141121155754.GN17068@kib.kiev.ua>
Subject: Re: RFC: patch to make d_fileno 64bits

Kostik wrote:
> On Thu, Nov 20, 2014 at 10:19:14PM -0500, Rick Macklem wrote:
> > The attached patch covers the
> > basics of a way to
> > convert the d_fileno field of "struct dirent" to
> > 64bits. This patch is incomplete and won't even
> > build, but I thought I'd post it in case anyone
> > wanted to take a look and comment on the approach
> > it uses.
> >
> > - renames the old/current one "struct dirent32"
> > - changes d_fileno to 64bits and adds a 64bit
> >   d_off field for the offset of the underlying
> >   file system
> > - defines a new VOP_READDIR() that will return
> >   the new "struct dirent" that is used as the
> >   default one for a new getdirentries(2).
> > - the old/current getdirentries(2) uses the old
> >   VOP_READDIR32() by default.
> >
> > For the case of a file system that supports both
> > the new and old VOP_READDIR(), they are used by
> > the corresponding new and old getdirentries(2)
> > syscalls.
> >
> > For a file system that only supports one of
> > the VOP_READDIR()s, the "struct dirent32"
> > is copied to "struct dirent" (or vice versa).
> >
> > At this point, all file systems would support
> > the old VOP_READDIR() and I think the new
> > VOP_READDIR() can easily be added for NFS,
> > ZFS. (OpenBSD already has UFS code for
> > essentially a new struct dirent and hopefully
> > that code could be ported easily, too.)
> >
> > Anyhow, any comments on this approach? rick
>
> I do not think we need to have in-kernel compatibility shims.
> The work, big but relatively trivial, is to convert filesystems to
> use the new ino_t, even if the on-disk structures still use 32bit
> inode number.
>
What about old binaries that call getdirentries(2) and expect the old
structure with a 32bit d_fileno, or the Linux compatibility code?
I suspect that there are some old statically linked binaries out there
that call, and expect, the old getdirentries(2). Having said that, most
apps will use readdir(3).

Do we need to somehow allow old binaries to work with a newer libc?
(If so, that's going to be really nasty.) I had assumed that old libc
code would call the old getdirentries(2) and, as such, having working
old and new getdirentries(2) syscalls would handle old binaries?

I was trying to avoid data copying for the case of an old
getdirentries(2) by having file systems provide VOP_READDIR() calls
for both the old and new structures. It is certainly possible to have
all file systems only produce the new "struct dirent" and then just
do data copying/conversion to the old one.

Btw, I think the new getdirentries(2) will need additional arguments,
since the offset for the underlying file system needs to be provided
along with the "logical offset", which is the byte offset within the
directory being returned as "struct dirent"s.

> Really problematic part of this change is the usermode ABI breakage.
> The struct dirent is only the start of the whole issue. ino_t is
> embedded into more structures which are part of the contract, e.g.
> struct stat. We have to provide new syscalls which accept or return
> the affected structures.
>
> And then, there are libraries which embed ino_t into their ABI.
> Immediate example is fts(3) in libc. Look at the FTSENT.fts_ino. Even
> after the base system is fixed by properly providing the compat shims
> and symbol versions for the affected libraries, we get the same
> problem with the binaries not from base.
>
> Summary of the issue with ino_t is that it is not too hard to fix the
> kernel, comparing with the ABI issues which must be solved in
> usermode.
>
Yes, I was just going to look at d_fileno as a starting point. (For
whatever reason d_fileno isn't defined as ino_t?) I was specifically
avoiding any use of "ino_t" and saw it as something that needed to
eventually change to 64 bits at the very end.

I was aware of Gleb Kurtsou's work, but didn't realize it lived in
projects/ino64. He had mentioned that he was busy, but would try to
find time to update the patch.
I will look at projects/ino64. It sounds like Kirk would like to
figure it all out in projects/ino64 and eventually do a "super patch"
to head. This sounds fine to me, if we can pull it off.

rick
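
For illustration only, the "struct dirent32" to "struct dirent" copy
(and vice versa) discussed above might look roughly like the sketch
below. Nothing here is taken from the patch: the ex_ struct layouts,
the field widths other than d_fileno and d_off, the fixed-size
d_reclen and both helper names are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define EX_MAXNAMLEN 255	/* assumed name length limit */

/* Hypothetical old layout with a 32bit d_fileno. */
struct ex_dirent32 {
	uint32_t d_fileno;
	uint16_t d_reclen;
	uint8_t  d_type;
	uint8_t  d_namlen;
	char     d_name[EX_MAXNAMLEN + 1];
};

/* Hypothetical new layout: 64bit d_fileno plus a 64bit d_off. */
struct ex_dirent {
	uint64_t d_fileno;
	uint64_t d_off;		/* offset in the underlying file system */
	uint16_t d_reclen;
	uint8_t  d_type;
	uint8_t  d_namlen;
	char     d_name[EX_MAXNAMLEN + 1];
};

/* Widen an old entry; the caller supplies the file system offset. */
static void
ex_dirent32_to_dirent(const struct ex_dirent32 *src, uint64_t off,
    struct ex_dirent *dst)
{
	assert(src != NULL && dst != NULL);
	memset(dst, 0, sizeof(*dst));
	dst->d_fileno = src->d_fileno;
	dst->d_off = off;
	dst->d_type = src->d_type;
	dst->d_namlen = src->d_namlen;
	memcpy(dst->d_name, src->d_name, src->d_namlen);
	dst->d_name[src->d_namlen] = '\0';
	/* Simplification: fixed-size records, no rounding to name length. */
	dst->d_reclen = (uint16_t)sizeof(*dst);
}

/*
 * Narrow a new entry for an old caller. Returns -1 when the inode
 * number does not fit in 32 bits, 0 otherwise.
 */
static int
ex_dirent_to_dirent32(const struct ex_dirent *src, struct ex_dirent32 *dst)
{
	assert(src != NULL && dst != NULL);
	if (src->d_fileno > UINT32_MAX)
		return (-1);
	memset(dst, 0, sizeof(*dst));
	dst->d_fileno = (uint32_t)src->d_fileno;
	dst->d_type = src->d_type;
	dst->d_namlen = src->d_namlen;
	memcpy(dst->d_name, src->d_name, src->d_namlen);
	dst->d_name[src->d_namlen] = '\0';
	dst->d_reclen = (uint16_t)sizeof(*dst);
	return (0);
}
```

The narrowing direction is the one that needs a policy decision: an
old getdirentries(2) caller cannot represent an inode number above
32 bits, and failing the conversion is only one possible choice.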