From owner-freebsd-fs@FreeBSD.ORG Sun Nov 23 00:27:09 2014
Date: Sat, 22 Nov 2014 19:27:02 -0500 (EST)
From: Rick Macklem
To: Konstantin Belousov
Cc: FreeBSD Filesystems, Kirk McKusick
Subject: Re: RFC: patch to make d_fileno 64bits
Message-ID: <933608854.5622292.1416702422030.JavaMail.root@uoguelph.ca>
In-Reply-To: <20141122175531.GZ17068@kib.kiev.ua>

Kostik wrote:
> On Sat, Nov 22, 2014 at 09:23:41AM -0800, Kirk McKusick
> wrote:
> > > Date: Sat, 22 Nov 2014 17:34:27 +0200
> > > From: Konstantin Belousov
> > > To: Rick Macklem
> > > Subject: Re: RFC: patch to make d_fileno 64bits
> > > Cc: FreeBSD Filesystems
> > >
> > > On Fri, Nov 21, 2014 at 06:45:52PM -0500, Rick Macklem wrote:
> > >> Kostik wrote:
> > >> What about old binaries that do getdirentries(2) and expect the old
> > >> structure with 32-bit d_fileno, or the Linux compatibility stuff?
> > >> I suspect that there are some old statically linked binaries out there
> > >> that do/expect the old getdirentries.
> > >
> > > No, let me restate my position. There are two places for backward
> > > compatibility: one is the in-kernel binary interface, and the other is
> > > applications, i.e. KBI and ABI.
> > >
> > > My opinion is that we must provide strict backward ABI compatibility
> > > to have even the right to be called a useful OS. In particular, the
> > > syscalls like the current getdirentries (156 and 196) providing 32-bit
> > > inode numbers must be kept with their current binary contract. The
> > > userspace issues do not end there, but this is not the currently
> > > discussed item.
> > >
> > > On the other hand, providing KBI compat for filesystems which work
> > > right now with 32-bit inode numbers should not be done. I.e., no
> > > VOP_READDIR_32INO(); all filesystems must be converted at once.
> > >
Yes, I did understand that others felt having the 2 VOP_READDIR_XX() wasn't
appropriate, and that is fine with me. (I just did it to try and allow file
systems to be converted when someone got around to it.)
What I will probably do for my testing is implement a MNTK_DIRENT64 flag that
file systems set when they are converted to generating the new dirent64.
(I'm not sure if this could go into head, since I'm not sure it would work
for stacked file systems. Maybe the stacked file system could just set the
flag if the flag is set for the underlying fs, but I'm not sure.
Anyhow, the objective will be to convert all file systems, so this doesn't
need to be in head.)
> > > For syscalls 156 and 196 (and some more), the converter, which
> > > translates the new dirents into old dirents on a best-effort basis,
> > > must be written in vfs_syscalls.c.
> >
> > I believe that we are all in agreement with you on the kernel approach
> > at this point.
> Well, I think this was Rick's patch and proposal to have compat ino32
> in the kernel.
>
Yes, I agree to one VOP_READDIR() and a converter (already written) from
new->old in vfs_syscalls.c.
> >
> > Do we have a way of versioning libc so that we can have the old version
> > that provides the 32-bit version of the syscalls (156 and 196) along
> > with 32-bit higher-level functions like fts and friends, and then a new
> > libc version that has the 64-bit version of the syscalls and other
> > higher-level functions?
>
> We do not need several versions of libc. We support symbol versioning,
> i.e. we can have an old getdirents symbol which resolves to the syscall
> stub for 196, and a new getdirents for the new syscall.
>
> It is a somewhat convoluted feature; you could look at the example in
> sys/kern/sysv_*.c, for instance, freebsd7_shmctl and shmctl. Also
> look at libc/include/compat.h. For pure usermode compat shims,
> lib/libc/gen/fts-compat.c was already handled once.
>
> I promise to write the necessary magic for libc versioning when needed.
> As I explained before, unfortunately libc is not the final point
> for userspace compat.
>
I'll keep plugging away at the dirent structure and am glad you guys are
going to attack the tough userland parts.
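A minimal sketch of the kind of best-effort new->old conversion Kostik describes, for illustration only. It assumes the d_cookie layout proposed later in this thread for the new entry and the original 32-bit layout (the same fields as freebsd9_dirent in the attached patch) for the old one. The names new_dirent, old_dirent and cvt_dirent_to_old are hypothetical, d_name is declared as char for simplicity, and rejecting entries whose inode number does not fit in 32 bits (instead of truncating them) is an assumed policy; this is not the converter Rick mentions having written.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#ifndef MAXNAMLEN
#define MAXNAMLEN 255
#endif

/* Hypothetical new layout (the d_cookie proposal from this thread). */
struct new_dirent {
	uint64_t d_cookie;	/* dir cookie for next dir entry */
	uint64_t d_fileno;	/* 64-bit inode number */
	uint16_t d_reclen;
	uint8_t  d_type;
	uint8_t  d_namlen;
	uint8_t  d_pad[4];	/* align d_name to 8 byte boundary */
	char     d_name[MAXNAMLEN + 1];
};

/* Hypothetical old layout (same fields as freebsd9_dirent in the patch). */
struct old_dirent {
	uint32_t d_fileno;	/* 32-bit inode number */
	uint16_t d_reclen;
	uint8_t  d_type;
	uint8_t  d_namlen;
	char     d_name[MAXNAMLEN + 1];
};

/*
 * Old record size: 8-byte header plus the NUL-terminated name, padded to a
 * 4-byte boundary as described in the original <sys/dirent.h> comment.
 */
static size_t
old_dirent_reclen(uint8_t namlen)
{
	return (offsetof(struct old_dirent, d_name) +
	    (((size_t)namlen + 1 + 3) & ~(size_t)3));
}

/*
 * Convert one entry.  Returns 0 on success, or -1 if d_fileno does not fit
 * in 32 bits; the caller decides whether to skip the entry or fail.
 */
static int
cvt_dirent_to_old(const struct new_dirent *src, struct old_dirent *dst)
{
	assert(src != NULL && dst != NULL);
	if (src->d_fileno > UINT32_MAX)
		return (-1);
	memset(dst, 0, sizeof(*dst));
	dst->d_fileno = (uint32_t)src->d_fileno;
	dst->d_type = src->d_type;
	dst->d_namlen = src->d_namlen;
	memcpy(dst->d_name, src->d_name, src->d_namlen);
	dst->d_name[src->d_namlen] = '\0';	/* already zeroed; explicit */
	dst->d_reclen = (uint16_t)old_dirent_reclen(src->d_namlen);
	return (0);
}
```

A caller walking a buffer of new entries would call this once per entry and then choose, per its own policy, between skipping a rejected entry and failing the whole request (for example with EOVERFLOW).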
Good luck with it,
rick

From owner-freebsd-fs@FreeBSD.ORG Sun Nov 23 01:49:20 2014
Date: Sat, 22 Nov 2014 17:50:22 -0800
From: Gleb Kurtsou
To: Rick Macklem
Cc: freebsd-fs@freebsd.org
Subject: Re: RFC: patch to make d_fileno 64bits
Message-ID: <20141123015021.GA1658@reks>
References: <20141121182219.GA1076@reks> <1346064334.5234809.1416616386575.JavaMail.root@uoguelph.ca>
In-Reply-To: <1346064334.5234809.1416616386575.JavaMail.root@uoguelph.ca>
Content-Type: multipart/mixed; boundary="SUOF0GtieIMvvwua"

--SUOF0GtieIMvvwua
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline

On (21/11/2014 19:33), Rick Macklem wrote:
> Gleb Kurtsou wrote:
> > On (21/11/2014 10:25), John Baldwin wrote:
> > [...]
> > >
> > > I think this is already done (along with several other changes) more
> > > fully in the projects/ino64 branch in svn?
> >
> > projects/ino64 was created by mdf for merging GSoC commits, and it
> > didn't even get halfway through.
> >
> > I'm currently working on merging the code to CURRENT. It's been more
> > than 2 years, so there is quite some work in there. I intend to update
> > the branch as soon as the code is ready for review.
> >
> Btw, I just took a quick look and I didn't find any changes to "struct dirent"
> in projects/ino64, so I think my original assumption that this piece of the
> puzzle hadn't yet been solved is correct. (Gleb, if you had changes to
> "struct dirent" and related fs changes, please let me know.)

projects/ino64 was created for merging GSoC commits.
The branch is incomplete, to say the least. There are only preparatory
changes in there. In case you are interested, please refer to
https://github.com/glk/freebsd-ino64/commits/projects/ino64

BTW, the original GSoC branch also changes VOP_READDIR for all file systems
to populate dirent.d_off.

I've attached the patch for some of the system headers, generated by
git diff -r 531f5069a9b0f61b8ecd08e4ed744cec3b022606 -r github/projects/ino64 sys/sys/{_types,dirent,mount,stat}.h > ~/ino64-sys-sys.patch

>
> Oh, and thanks to some comments, the new struct dirent has already changed to:
>
> struct dirent {
> 	__uint64_t d_cookie; /* dir cookie for next dir entry */
> 	__uint64_t d_fileno;
> 	__uint16_t d_reclen;
> 	__uint8_t d_type;
> 	__uint8_t d_namlen;
> 	__uint8_t d_pad[4]; /* align d_name to 8 byte boundary */
> 	__uint8_t d_name[MAXNAMLEN + 1];
> };

GSoC'2011 code:
struct dirent {
	ino_t d_fileno; /* file number of entry */
	off_t d_off; /* next entry seek offset */
	__uint16_t d_reclen; /* length of this record */
	__uint16_t d_namlen; /* length of string in d_name */
	__uint8_t d_type; /* file type, see below */
	__uint8_t d_unused1;
	__uint16_t d_unused2;
	char d_name[MAXNAMLEN + 1]; /* name must be no longer than this */
};

>
> It was pointed out that C would pad the structure to a multiple of 8 bytes
> on some arches, and without d_pad that would imply d_name wasn't at the end
> of the structure. (Apparently some code somewhere finds d_name by subtracting
> MAXNAMLEN + 1 from sizeof(struct dirent), and this fails if d_name isn't at
> the end. Yuck, but the above fixes it.)
>
> However, the size of d_namlen could become uint16_t, if anyone thinks MAXNAMLEN
> might want to be greater than 255 someday (a long way away, since that's
> another ABI change).

Somebody had already mentioned it previously, so I bumped it to uint16 back
then.
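The padding argument about d_name can be checked mechanically. The sketch below assumes C11 (for _Static_assert) and uses fixed-width types in place of __uint64_t, ino_t and off_t; it restates both proposed layouts plus a hypothetical variant without d_pad, and tests whether d_name is the last member, i.e. whether the fragile `sizeof(struct dirent) - (MAXNAMLEN + 1)` idiom really yields the offset of d_name. The struct names and the DNAME_AT_END macro are illustrative only, and d_name is declared as char in all three.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#ifndef MAXNAMLEN
#define MAXNAMLEN 255
#endif

/* Rick's proposed layout, with the explicit d_pad. */
struct dirent_cookie {
	uint64_t d_cookie;
	uint64_t d_fileno;
	uint16_t d_reclen;
	uint8_t  d_type;
	uint8_t  d_namlen;
	uint8_t  d_pad[4];	/* align d_name to 8 byte boundary */
	char     d_name[MAXNAMLEN + 1];
};

/* Hypothetical variant without d_pad, to show the problem. */
struct dirent_nopad {
	uint64_t d_cookie;
	uint64_t d_fileno;
	uint16_t d_reclen;
	uint8_t  d_type;
	uint8_t  d_namlen;
	char     d_name[MAXNAMLEN + 1];
};

/* GSoC'2011 layout (ino_t and off_t spelled as 64-bit integers). */
struct dirent_gsoc {
	uint64_t d_fileno;
	int64_t  d_off;
	uint16_t d_reclen;
	uint16_t d_namlen;
	uint8_t  d_type;
	uint8_t  d_unused1;
	uint16_t d_unused2;
	char     d_name[MAXNAMLEN + 1];
};

/* True when no tail padding follows d_name, so sizeof - (MAXNAMLEN + 1)
 * equals offsetof(type, d_name). */
#define DNAME_AT_END(type) \
	(offsetof(type, d_name) + MAXNAMLEN + 1 == sizeof(type))

_Static_assert(DNAME_AT_END(struct dirent_cookie), "d_pad keeps d_name last");
_Static_assert(DNAME_AT_END(struct dirent_gsoc), "GSoC layout keeps d_name last");
```

On amd64, where uint64_t is 8-byte aligned, `DNAME_AT_END(struct dirent_nopad)` is false: d_name starts at offset 20, but the struct is padded to 280 bytes, so the subtraction lands 4 bytes into d_name. On typical i386 ABIs, which align 64-bit integers to 4 bytes inside structs, the same check is true, which matches the "some arches" caveat.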
>
> rick
>
> > Besides, the branch also changes dev_t to 64-bit, bumps MNAMELEN to 1024,
> > and has complete ABI compatibility shims (probably except openaudit, which
> > had issues).

--SUOF0GtieIMvvwua
Content-Type: text/x-diff; charset=utf-8
Content-Disposition: attachment; filename="ino64-sys-sys.patch"

diff --git a/sys/sys/_types.h b/sys/sys/_types.h
index c59afd3..2d757f5 100644
--- a/sys/sys/_types.h
+++ b/sys/sys/_types.h
@@ -32,37 +32,37 @@
 #include
 #include
 /*
  * Standard type definitions.
  */
 typedef __uint32_t __blksize_t; /* file block size */
 typedef __int64_t __blkcnt_t; /* file block count */
 typedef __int32_t __clockid_t; /* clock_gettime()... */
 typedef __uint64_t __cap_rights_t; /* capability rights */
 typedef __uint32_t __fflags_t; /* file flags */
 typedef __uint64_t __fsblkcnt_t;
 typedef __uint64_t __fsfilcnt_t;
 typedef __uint32_t __gid_t;
 typedef __int64_t __id_t; /* can hold a gid_t, pid_t, or uid_t */
-typedef __uint32_t __ino_t; /* inode number */
+typedef __uint64_t __ino_t; /* inode number */
 typedef long __key_t; /* IPC key (for Sys V IPC) */
 typedef __int32_t __lwpid_t; /* Thread ID (a.k.a. LWP) */
 typedef __uint16_t __mode_t; /* permissions */
 typedef int __accmode_t; /* access permissions */
 typedef int __nl_item;
-typedef __uint16_t __nlink_t; /* link count */
+typedef __uint32_t __nlink_t; /* link count */
 typedef __int64_t __off_t; /* file offset */
 typedef __int32_t __pid_t; /* process [group] */
 typedef __int64_t __rlim_t; /* resource limit - intentionally */
 /* signed, because of legacy code */
 /* that uses -1 for RLIM_INFINITY */
 typedef __uint8_t __sa_family_t;
 typedef __uint32_t __socklen_t;
 typedef long __suseconds_t; /* microseconds (signed) */
 typedef struct __timer *__timer_t; /* timer_gettime()... */
 typedef struct __mq *__mqd_t; /* mq_open()... */
 typedef __uint32_t __uid_t;
 typedef unsigned int __useconds_t; /* microseconds (unsigned) */
 typedef int __cpuwhich_t; /* which parameter for cpuset. */
 typedef int __cpulevel_t; /* level parameter for cpuset. */
 typedef int __cpusetid_t; /* cpuset identifier. */
@@ -78,29 +78,29 @@ typedef int __cpusetid_t; /* cpuset identifier. */
  * ints cannot hold 32 bits, you will be in trouble. The reason an int was
  * chosen over a long is that the is*() and to*() routines take ints (says
  * ANSI C), but they use __ct_rune_t instead of int.
  *
  * NOTE: rune_t is not covered by ANSI nor other standards, and should not
  * be instantiated outside of lib/libc/locale. Use wchar_t. wchar_t and
  * rune_t must be the same type. Also, wint_t must be no narrower than
  * wchar_t, and should be able to hold all members of the largest
  * character set plus one extra value (WEOF), and must be at least 16 bits.
  */
 typedef int __ct_rune_t; /* arg type for ctype funcs */
 typedef __ct_rune_t __rune_t; /* rune_t (see above) */
 typedef __ct_rune_t __wchar_t; /* wchar_t (see above) */
 typedef __ct_rune_t __wint_t; /* wint_t (see above) */
-typedef __uint32_t __dev_t; /* device number */
+typedef __uint64_t __dev_t; /* device number */
 typedef __uint32_t __fixpt_t; /* fixed point number */
 /*
  * mbstate_t is an opaque object to keep conversion state during multibyte
  * stream conversions.
  */
 typedef union {
 	char __mbstate8[128];
 	__int64_t _mbstateL; /* for alignment */
 } __mbstate_t;
 #endif /* !_SYS__TYPES_H_ */
diff --git a/sys/sys/dirent.h b/sys/sys/dirent.h
index dfaacff..107936b 100644
--- a/sys/sys/dirent.h
+++ b/sys/sys/dirent.h
@@ -24,55 +24,76 @@
  * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
  * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
  * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
  * SUCH DAMAGE.
  *
  * @(#)dirent.h 8.3 (Berkeley) 8/10/94
  * $FreeBSD$
  */
 #ifndef _SYS_DIRENT_H_
 #define _SYS_DIRENT_H_
 #include
 #include
+#ifndef _INO_T_DECLARED
+typedef __ino_t ino_t;
+#define _INO_T_DECLARED
+#endif
+
+#ifndef _OFF_T_DECLARED
+typedef __off_t off_t;
+#define _OFF_T_DECLARED
+#endif
+
 /*
  * The dirent structure defines the format of directory entries returned by
  * the getdirentries(2) system call.
  *
  * A directory entry has a struct dirent at the front of it, containing its
  * inode number, the length of the entry, and the length of the name
  * contained in the entry. These are followed by the name padded to a 4
  * byte boundary with null bytes. All names are guaranteed null terminated.
  * The maximum length of a name in a directory is MAXNAMLEN.
  */
 struct dirent {
-	__uint32_t d_fileno; /* file number of entry */
+	ino_t d_fileno; /* file number of entry */
+	off_t d_off; /* next entry seek offset */
 	__uint16_t d_reclen; /* length of this record */
+	__uint16_t d_namlen; /* length of string in d_name */
 	__uint8_t d_type; /* file type, see below */
-	__uint8_t d_namlen; /* length of string in d_name */
+	__uint8_t d_unused1;
+	__uint16_t d_unused2;
 #if __BSD_VISIBLE
 #define MAXNAMLEN 255
 	char d_name[MAXNAMLEN + 1]; /* name must be no longer than this */
 #else
 	char d_name[255 + 1]; /* name must be no longer than this */
 #endif
 };
 #if __BSD_VISIBLE
+struct freebsd9_dirent {
+	__uint32_t d_fileno; /* file number of entry */
+	__uint16_t d_reclen; /* length of this record */
+	__uint8_t d_type; /* file type, see below */
+	__uint8_t d_namlen; /* length of string in d_name */
+	char d_name[255 + 1]; /* name must be no longer than this */
+};
+
 /*
  * File types
  */
 #define DT_UNKNOWN 0
 #define DT_FIFO 1
 #define DT_CHR 2
 #define DT_DIR 4
 #define DT_BLK 6
 #define DT_REG 8
 #define DT_LNK 10
 #define DT_SOCK 12
 #define DT_WHT 14
 /*
  * Convert between stat structure types and directory types.
diff --git a/sys/sys/mount.h b/sys/sys/mount.h
index ecaf7b8..01f9e04 100644
--- a/sys/sys/mount.h
+++ b/sys/sys/mount.h
@@ -52,57 +52,83 @@ typedef struct fsid { int32_t val[2]; } fsid_t; /* filesystem id type */
  * File identifier.
  * These are unique per filesystem on a single machine.
  */
 #define MAXFIDSZ 16
 struct fid {
 	u_short fid_len; /* length of data in bytes */
 	u_short fid_data0; /* force longword alignment */
 	char fid_data[MAXFIDSZ]; /* data (variable length) */
 };
 /*
  * filesystem statistics
  */
 #define MFSNAMELEN 16 /* length of type name including null */
-#define MNAMELEN 88 /* size of on/from name bufs */
-#define STATFS_VERSION 0x20030518 /* current version number */
+#define MNAMELEN 1024 /* size of on/from name bufs */
+#define STATFS_VERSION 0x20110618 /* current version number */
 struct statfs {
 	uint32_t f_version; /* structure version number */
 	uint32_t f_type; /* type of filesystem */
 	uint64_t f_flags; /* copy of mount exported flags */
 	uint64_t f_bsize; /* filesystem fragment size */
 	uint64_t f_iosize; /* optimal transfer block size */
 	uint64_t f_blocks; /* total data blocks in filesystem */
 	uint64_t f_bfree; /* free blocks in filesystem */
 	int64_t f_bavail; /* free blocks avail to non-superuser */
 	uint64_t f_files; /* total file nodes in filesystem */
 	int64_t f_ffree; /* free nodes avail to non-superuser */
 	uint64_t f_syncwrites; /* count of sync writes since mount */
 	uint64_t f_asyncwrites; /* count of async writes since mount */
 	uint64_t f_syncreads; /* count of sync reads since mount */
 	uint64_t f_asyncreads; /* count of async reads since mount */
 	uint64_t f_spare[10]; /* unused spare */
 	uint32_t f_namemax; /* maximum filename length */
 	uid_t f_owner; /* user that mounted the filesystem */
 	fsid_t f_fsid; /* filesystem id */
 	char f_charspare[80]; /* spare string space */
 	char f_fstypename[MFSNAMELEN]; /* filesystem type name */
 	char f_mntfromname[MNAMELEN]; /* mounted filesystem */
 	char f_mntonname[MNAMELEN]; /* directory on which mounted */
 };
+#define FREEBSD9_STATFS_VERSION 0x20030518 /* current version number */
+struct freebsd9_statfs {
+	uint32_t f_version; /* structure version number */
+	uint32_t f_type; /* type of filesystem */
+	uint64_t f_flags; /* copy of mount exported flags */
+	uint64_t f_bsize; /* filesystem fragment size */
+	uint64_t f_iosize; /* optimal transfer block size */
+	uint64_t f_blocks; /* total data blocks in filesystem */
+	uint64_t f_bfree; /* free blocks in filesystem */
+	int64_t f_bavail; /* free blocks avail to non-superuser */
+	uint64_t f_files; /* total file nodes in filesystem */
+	int64_t f_ffree; /* free nodes avail to non-superuser */
+	uint64_t f_syncwrites; /* count of sync writes since mount */
+	uint64_t f_asyncwrites; /* count of async writes since mount */
+	uint64_t f_syncreads; /* count of sync reads since mount */
+	uint64_t f_asyncreads; /* count of async reads since mount */
+	uint64_t f_spare[10]; /* unused spare */
+	uint32_t f_namemax; /* maximum filename length */
+	uid_t f_owner; /* user that mounted the filesystem */
+	fsid_t f_fsid; /* filesystem id */
+	char f_charspare[80]; /* spare string space */
+	char f_fstypename[16]; /* filesystem type name */
+	char f_mntfromname[88]; /* mounted filesystem */
+	char f_mntonname[88]; /* directory on which mounted */
+};
+
 #ifdef _KERNEL
 #define OMFSNAMELEN 16 /* length of fs type name, including null */
 #define OMNAMELEN (88 - 2 * sizeof(long)) /* size of on/from name bufs */
 /* XXX getfsstat.2 is out of date with write and read counter changes here. */
 /* XXX statfs.2 is out of date with read counter changes here. */
 struct ostatfs {
 	long f_spare2; /* placeholder */
 	long f_bsize; /* fundamental filesystem block size */
 	long f_iosize; /* optimal transfer block size */
 	long f_blocks; /* total data blocks in filesystem */
 	long f_bfree; /* free blocks in fs */
 	long f_bavail; /* free blocks avail to non-superuser */
 	long f_files; /* total file nodes in filesystem */
 	long f_ffree; /* free file nodes in fs */
diff --git a/sys/sys/stat.h b/sys/sys/stat.h
index 1b03bd2..a9b5ff4 100644
--- a/sys/sys/stat.h
+++ b/sys/sys/stat.h
@@ -90,86 +90,117 @@ typedef __off_t off_t;
 #ifndef _UID_T_DECLARED
 typedef __uid_t uid_t;
 #define _UID_T_DECLARED
 #endif
 #if !defined(_KERNEL) && __BSD_VISIBLE
 /*
  * XXX We get miscellaneous namespace pollution with this.
  */
 #include
 #endif
 #if __BSD_VISIBLE
 struct ostat {
 	__uint16_t st_dev; /* inode's device */
-	ino_t st_ino; /* inode's number */
+	__uint32_t st_ino; /* inode's number */
 	mode_t st_mode; /* inode protection mode */
-	nlink_t st_nlink; /* number of hard links */
+	__uint16_t st_nlink; /* number of hard links */
 	__uint16_t st_uid; /* user ID of the file's owner */
 	__uint16_t st_gid; /* group ID of the file's group */
 	__uint16_t st_rdev; /* device type */
 	__int32_t st_size; /* file size, in bytes */
 	struct timespec st_atim; /* time of last access */
 	struct timespec st_mtim; /* time of last data modification */
 	struct timespec st_ctim; /* time of last file status change */
 	__int32_t st_blksize; /* optimal blocksize for I/O */
 	__int32_t st_blocks; /* blocks allocated for file */
 	fflags_t st_flags; /* user defined flags for file */
 	__uint32_t st_gen; /* file generation number */
 };
+
+struct freebsd9_stat {
+	__uint32_t st_dev; /* inode's device */
+	__uint32_t st_ino; /* inode's number */
+	mode_t st_mode; /* inode protection mode */
+	__uint16_t st_nlink; /* number of hard links */
+	uid_t st_uid; /* user ID of the file's owner */
+	gid_t st_gid; /* group ID of the file's group */
+	__uint32_t st_rdev; /* device type */
+	struct timespec st_atim; /* time of last access */
+	struct timespec st_mtim; /* time of last data modification */
+	struct timespec st_ctim; /* time of last file status change */
+	off_t st_size; /* file size, in bytes */
+	blkcnt_t st_blocks; /* blocks allocated for file */
+	blksize_t st_blksize; /* optimal blocksize for I/O */
+	fflags_t st_flags; /* user defined flags for file */
+	__uint32_t st_gen; /* file generation number */
+	__int32_t st_lspare;
+	struct timespec st_birthtim; /* time of file creation */
+	/*
+	 * Explicitly pad st_birthtim to 16 bytes so that the size of
+	 * struct stat is backwards compatible. We use bitfields instead
+	 * of an array of chars so that this doesn't require a C99 compiler
+	 * to compile if the size of the padding is 0. We use 2 bitfields
+	 * to cover up to 64 bits on 32-bit machines. We assume that
+	 * CHAR_BIT is 8...
+	 */
+	unsigned int :(8 / 2) * (16 - (int)sizeof(struct timespec));
+	unsigned int :(8 / 2) * (16 - (int)sizeof(struct timespec));
+};
 #endif /* __BSD_VISIBLE */
 struct stat {
-	__dev_t st_dev; /* inode's device */
+	dev_t st_dev; /* inode's device */
 	ino_t st_ino; /* inode's number */
-	mode_t st_mode; /* inode protection mode */
 	nlink_t st_nlink; /* number of hard links */
+	mode_t st_mode; /* inode protection mode */
+	__int16_t st_padding0;
 	uid_t st_uid; /* user ID of the file's owner */
 	gid_t st_gid; /* group ID of the file's group */
-	__dev_t st_rdev; /* device type */
+	dev_t st_rdev; /* device type */
 	struct timespec st_atim; /* time of last access */
 	struct timespec st_mtim; /* time of last data modification */
 	struct timespec st_ctim; /* time of last file status change */
 	off_t st_size; /* file size, in bytes */
 	blkcnt_t st_blocks; /* blocks allocated for file */
 	blksize_t st_blksize; /* optimal blocksize for I/O */
 	fflags_t st_flags; /* user defined flags for file */
 	__uint32_t st_gen; /* file generation number */
 	__int32_t st_lspare;
 	struct timespec st_birthtim; /* time of file creation */
 	/*
 	 * Explicitly pad st_birthtim to 16 bytes so that the size of
 	 * struct stat is backwards compatible. We use bitfields instead
 	 * of an array of chars so that this doesn't require a C99 compiler
 	 * to compile if the size of the padding is 0. We use 2 bitfields
 	 * to cover up to 64 bits on 32-bit machines. We assume that
 	 * CHAR_BIT is 8...
 	 */
 	unsigned int :(8 / 2) * (16 - (int)sizeof(struct timespec));
 	unsigned int :(8 / 2) * (16 - (int)sizeof(struct timespec));
 };
 #if __BSD_VISIBLE
 struct nstat {
-	__dev_t st_dev; /* inode's device */
-	ino_t st_ino; /* inode's number */
+	__uint32_t st_dev; /* inode's device */
+	__uint32_t st_ino; /* inode's number */
 	__uint32_t st_mode; /* inode protection mode */
 	__uint32_t st_nlink; /* number of hard links */
 	uid_t st_uid; /* user ID of the file's owner */
 	gid_t st_gid; /* group ID of the file's group */
-	__dev_t st_rdev; /* device type */
+	__uint32_t st_rdev; /* device type */
 	struct timespec st_atim; /* time of last access */
 	struct timespec st_mtim; /* time of last data modification */
 	struct timespec st_ctim; /* time of last file status change */
 	off_t st_size; /* file size, in bytes */
 	blkcnt_t st_blocks; /* blocks allocated for file */
 	blksize_t st_blksize; /* optimal blocksize for I/O */
 	fflags_t st_flags; /* user defined flags for file */
 	__uint32_t st_gen; /* file generation number */
 	struct timespec st_birthtim; /* time of file creation */
 	/*
 	 * See above about the following padding.
 	 */
 	unsigned int :(8 / 2) * (16 - (int)sizeof(struct timespec));
 	unsigned int :(8 / 2) * (16 - (int)sizeof(struct timespec));
 };

--SUOF0GtieIMvvwua--

From owner-freebsd-fs@FreeBSD.ORG Sun Nov 23 02:17:20 2014
Date: Sat, 22 Nov 2014 21:17:18 -0500 (EST)
From: Rick Macklem
To: Gleb Kurtsou
Cc: freebsd-fs@freebsd.org
Subject: Re: RFC: patch to make d_fileno 64bits
Message-ID: <1070528800.5649836.1416709038079.JavaMail.root@uoguelph.ca>
In-Reply-To: <20141123015021.GA1658@reks>

Gleb Kurtsou wrote:
> On (21/11/2014 19:33), Rick Macklem wrote:
> > Gleb Kurtsou wrote:
> > > On (21/11/2014 10:25), John Baldwin wrote:
> [...]
> > > >
> > > > I think this is already done (along with several other changes)
> > > > more fully in the projects/ino64 branch in svn?
> > >
> > > projects/ino64 was created by mdf for merging GSoC commits, and it
> > > didn't even get halfway through.
> > >
> > > I'm currently working on merging the code to CURRENT. It's been more
> > > than 2 years, so there is quite some work in there. I intend to update
> > > the branch as soon as the code is ready for review.
> > >
> > Btw, I just took a quick look and I didn't find any changes to
> > "struct dirent" in projects/ino64, so I think my original assumption
> > that this piece of the puzzle hadn't yet been solved is correct.
> > (Gleb, if you had changes to "struct dirent" and related fs changes,
> > please let me know.)
>
> projects/ino64 was created for merging GSoC commits. The branch is
> incomplete, to say the least. There are only preparatory changes in
> there. In case you are interested, please refer to
> https://github.com/glk/freebsd-ino64/commits/projects/ino64
>
> BTW, the original GSoC branch also changes VOP_READDIR for all file
> systems to populate dirent.d_off.
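Because d_off in the GSoC layout holds the seek offset of the *next* entry, a compat path that can fit only some converted entries into a caller's buffer could stop early and hand back a resume offset. The sketch below illustrates that idea under several assumptions: dirent64_demo mirrors the GSoC struct with fixed-width types, new records are taken to be 8-byte aligned, old records use the 4-byte-padded size from the original dirent.h comment, and all names (dirent64_demo, demo_append, demo_consume, OLD_HDR_SIZE) are hypothetical rather than anything in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#ifndef MAXNAMLEN
#define MAXNAMLEN 255
#endif

/* Mirrors the GSoC'2011 struct dirent with fixed-width types. */
struct dirent64_demo {
	uint64_t d_fileno;	/* ino_t */
	int64_t  d_off;		/* seek offset of the NEXT entry */
	uint16_t d_reclen;
	uint16_t d_namlen;
	uint8_t  d_type;
	uint8_t  d_unused1;
	uint16_t d_unused2;
	char     d_name[MAXNAMLEN + 1];
};

#define OLD_HDR_SIZE 8	/* old 32-bit dirent header: 4 + 2 + 1 + 1 bytes */

/* New record size: header plus NUL-terminated name, rounded up to 8. */
static size_t
demo_reclen(size_t namlen)
{
	return ((offsetof(struct dirent64_demo, d_name) + namlen + 1 + 7) &
	    ~(size_t)7);
}

/* Hypothetical helper: append one packed entry at buf + *len. */
static void
demo_append(unsigned char *buf, size_t *len, uint64_t ino, int64_t next_off,
    const char *name)
{
	struct dirent64_demo d;
	size_t namlen = strlen(name);

	assert(namlen <= MAXNAMLEN);
	memset(&d, 0, sizeof(d));
	d.d_fileno = ino;
	d.d_off = next_off;
	d.d_namlen = (uint16_t)namlen;
	d.d_reclen = (uint16_t)demo_reclen(namlen);
	memcpy(d.d_name, name, namlen);
	memcpy(buf + *len, &d, d.d_reclen);
	*len += d.d_reclen;
}

/*
 * Consume entries from buf[0..len) while their old-format size still fits
 * in `budget` bytes.  Returns the number consumed and stores in *resume the
 * d_off of the last consumed entry (start_off if none fit), i.e. the offset
 * at which the next read should start.
 */
static size_t
demo_consume(const unsigned char *buf, size_t len, size_t budget,
    int64_t start_off, int64_t *resume)
{
	const size_t hdr = offsetof(struct dirent64_demo, d_name);
	size_t pos = 0, used = 0, n = 0;

	*resume = start_off;
	while (pos < len && len - pos >= hdr) {
		struct dirent64_demo d;
		size_t need;

		/* Copy just the header to avoid unaligned access. */
		memcpy(&d, buf + pos, hdr);
		if (d.d_reclen == 0)
			break;		/* malformed buffer; stop */
		need = OLD_HDR_SIZE + (((size_t)d.d_namlen + 1 + 3) & ~(size_t)3);
		if (used + need > budget)
			break;
		used += need;
		*resume = d.d_off;
		pos += d.d_reclen;
		n++;
	}
	return (n);
}
```

How a real getdirentries() compat path would use the resume value (presumably as the directory's new seek position) is not shown here and would depend on the actual kernel code.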
>
> I've attached the patch for some of the system headers, generated by
> git diff -r 531f5069a9b0f61b8ecd08e4ed744cec3b022606 -r github/projects/ino64 sys/sys/{_types,dirent,mount,stat}.h > ~/ino64-sys-sys.patch
>
> >
> > Oh, and thanks to some comments, the new struct dirent has already
> > changed to:
> >
> > struct dirent {
> > 	__uint64_t d_cookie; /* dir cookie for next dir entry */
> > 	__uint64_t d_fileno;
> > 	__uint16_t d_reclen;
> > 	__uint8_t d_type;
> > 	__uint8_t d_namlen;
> > 	__uint8_t d_pad[4]; /* align d_name to 8 byte boundary */
> > 	__uint8_t d_name[MAXNAMLEN + 1];
> > };
>
> GSoC'2011 code:
> struct dirent {
> 	ino_t d_fileno; /* file number of entry */
> 	off_t d_off; /* next entry seek offset */
> 	__uint16_t d_reclen; /* length of this record */
> 	__uint16_t d_namlen; /* length of string in d_name */
> 	__uint8_t d_type; /* file type, see below */
> 	__uint8_t d_unused1;
> 	__uint16_t d_unused2;
> 	char d_name[MAXNAMLEN + 1]; /* name must be no longer than this */
> };
>
Hmm. I was actually wondering if d_namlen should go to uint16_t, just in
case someone wanted to make MAXNAMLEN > 255 someday. I think your variant
is better than mine.

I'll try and take a look at the git stuff (I've never used git, but I guess
I can't avoid it forever).

Thanks for the pointer,
rick

> >
> > It was pointed out that C would pad the structure to a multiple of
> > 8 bytes on some arches, and without d_pad that would imply d_name
> > wasn't at the end of the structure. (Apparently some code somewhere
> > finds d_name by subtracting MAXNAMLEN + 1 from sizeof(struct dirent),
> > and this fails if d_name isn't at the end. Yuck, but the above fixes it.)
> >
> > However, the size of d_namlen could become uint16_t, if anyone
> > thinks MAXNAMLEN might want to be greater than 255 someday (a long
> > way away, since that's another ABI change).
>
> Somebody had already mentioned it previously, so I bumped it to
> uint16 back then.
> > > > > > rick > > > > > Besides branch also changes dev_t to 64-bit, bumps MNAMELEN to > > > 1024 > > > and > > > has complete ABI compatibility shims (probably except openaudit > > > which > > > had > > > issues). > > > > > > > > > _______________________________________________ > > > freebsd-fs@freebsd.org mailing list > > > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > > > To unsubscribe, send any mail to > > > "freebsd-fs-unsubscribe@freebsd.org" > > > > From owner-freebsd-fs@FreeBSD.ORG Sun Nov 23 19:59:21 2014 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 6F098273 for ; Sun, 23 Nov 2014 19:59:21 +0000 (UTC) Received: from onlyone.friendlyhosting.spb.ru (onlyone.friendlyhosting.spb.ru [46.4.40.135]) by mx1.freebsd.org (Postfix) with ESMTP id 35273E2F for ; Sun, 23 Nov 2014 19:59:21 +0000 (UTC) Received: from [192.168.135.70] (unknown [94.19.235.70]) (Authenticated sender: lev@serebryakov.spb.ru) by onlyone.friendlyhosting.spb.ru (Postfix) with ESMTPSA id 9EF9656400 for ; Sun, 23 Nov 2014 22:58:59 +0300 (MSK) Message-ID: <54723C85.70207@FreeBSD.org> Date: Sun, 23 Nov 2014 22:59:01 +0300 From: Lev Serebryakov Reply-To: lev@FreeBSD.org Organization: FreeBSD User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:31.0) Gecko/20100101 Thunderbird/31.2.0 MIME-Version: 1.0 To: freebsd-fs@freebsd.org Subject: FreeBSD 10 panic with "ffs_valloc: dup alloc" Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 23 Nov 2014 19:59:21 -0000 -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA512 Filesystem in question is clean after panic reboot (!) 
SU and "native" journal are used. Memory check is Ok. System is amd64, r269936. What could I add to this information to help track & fix this bug?

-- 
// Lev Serebryakov AKA Black Lion

From owner-freebsd-fs@FreeBSD.ORG Sun Nov 23 21:00:09 2014
Message-Id: <201411232100.sANL09lb036647@kenobi.freebsd.org>
From: bugzilla-noreply@FreeBSD.org
To: freebsd-fs@FreeBSD.org
Subject: Problem reports for freebsd-fs@FreeBSD.org that need special attention
Date: Sun, 23 Nov 2014 21:00:09 +0000

To view an individual PR, use:
  https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=(Bug Id).

The following is a listing of current problems submitted by FreeBSD users,
which need special attention. These represent problem reports covering
all versions including experimental development code and obsolete releases.

Status      |    Bug Id | Description
------------+-----------+---------------------------------------------------
Open        |    136470 | [nfs] Cannot mount / in read-only, over NFS
Open        |    139651 | [nfs] mount(8): read-only remount of NFS volume d
Open        |    144447 | [zfs] sharenfs fsunshare() & fsshare_main() non f

3 problems total for which you should take action.
From owner-freebsd-fs@FreeBSD.ORG Sun Nov 23 21:11:54 2014
Message-ID: <54724D8C.3030100@FreeBSD.org>
Date: Mon, 24 Nov 2014 00:11:40 +0300
From: Lev Serebryakov
To: freebsd-fs@freebsd.org
Subject: Re: FreeBSD 10 panic with "ffs_valloc: dup alloc"
References: <54723C85.70207@FreeBSD.org>
In-Reply-To: <54723C85.70207@FreeBSD.org>

On 23.11.2014 22:59, Lev Serebryakov wrote:
> Filesystem in question is clean after panic reboot (!) SU and
> "native" journal are used. Memory check is Ok. System is amd64,
> r269936. What could I add to this information to help track & fix
> this bug?

OK, I was able to reproduce it, three times in a row: it is triggered
by "svnsync" of the FreeBSD "base" repository.
-- 
// Lev Serebryakov AKA Black Lion

From owner-freebsd-fs@FreeBSD.ORG Sun Nov 23 22:00:20 2014
Date: Sun, 23 Nov 2014 17:00:19 -0500
Subject: Making sense of some ZDB output...?
From: Zaphod Beeblebrox
To: freebsd-fs

In my quest to recover files, I've come across something I don't understand.

100000  L0 1:720cfa33400:24c00 20000L/20000P F=1 B=11756828567/11756828567
120000  L0 0:94048dc9000:24000 20000L/20000P F=1 B=11756828567/11756828567

So far, files I've recovered have been on vdev '0' ... which has 9 devices... It therefore makes sense to me that 24000 (hex) bytes is composed of 20000 bytes of data and 4000 bytes of parity.

But this object I'm trying to fetch has some data on vdev '1' ... which has 8 devices. 24c00 / 8 is 4980 ... which isn't a very even number.

Am I looking at this wrongly? Does the size include the parity or not?
From owner-freebsd-fs@FreeBSD.ORG Mon Nov 24 01:00:52 2014
Date: Sun, 23 Nov 2014 20:00:50 -0500
Subject: Re: When a ZFS error is not an error.
From: Zaphod Beeblebrox
To: freebsd-fs

So... I recovered a 2nd file... this time a rar file. It fails the checksum, but I'm unsure of how I'm extracting the file.

The array consists of two vdevs: one of 9 disks and the other of 8 disks, both raidz1. I'm pretty sure that the 9-drive vdev is '0' in this output ... as the 9-drive vdev is listed first in the zpool status output (and was created first). Anyway, here are two lines from the verbose zdb output:

100000  L0 1:720cfa33400:24c00 20000L/20000P F=1 B=11756828567/11756828567
120000  L0 0:94048dc9000:24000 20000L/20000P F=1 B=11756828567/11756828567
...

What I'm not clear on is why the 8-drive vdev is writing 24c00 bytes and the 9-drive vdev is writing 24000 bytes. And ... in either case, am I to fetch the first 20000 bytes? i.e.: zdb -R 0:94048dc9000:20000 and zdb -R 1:720cfa33400:20000?

If, when I read these blocks with zdb, the filesystem is reporting to checksum errors, am I getting the right data? Do I need to process the parity?

On Sat, Nov 22, 2014 at 6:56 PM, Zaphod Beeblebrox wrote:
> I have a file that ZFS claims is in error that when I go through all the
> effort to retrieve it, is not in error. I have 405 files, then, that zfs
> says are in error on this array and since some are rather large and since
> retrieving one block seems to take 30 seconds (ie: hundreds of hours of
> time to recover some files), I'd like to ask if there's some way to finesse
> this... or to fix zfs.
>
> To start, my array has errors like:
>
>         NAME              STATE     READ WRITE CKSUM
>         vr2               ONLINE       0     0   989
>           raidz1-0        ONLINE       0     0 1.93K
>             label/vr2-d0  ONLINE       0     0     0
>
> (I've omitted the other lines ... they are all '0').
> I asked what this meant ... and the best I got was that the errors were
> not assigned to any particular device. So I learned how to use ZDB and I
> have a patch for ZDB. Apparently the deadlist can have a null in it that
> crashes ZDB.
>
> No matter. We have this file in the output of zpool status -v:
>
> vr2/Audio@20080305-1450:/cds/service/02-Lord_Have_Mercy_Kyrie.mp3
>
> ... now even though it picks on the snapshot (not all of the -v reports
> do), the following fails:
>
> [1:170:470]root@virtual:/vr1/tmp/diag> cp /vr2/Audio/cds/service/02-Lord_Have_Mercy_Kyrie.mp3 .
> cp: foo.mp3: Bad address
>
> So I did this:
>
> for i in `grep L0 4351-dddddddd.txt | grep -v vr2/Audio | head -50 | cut -c22-34`; do cc=`printf %05d $count`; echo getting $i 4035/b$cc; time zdb -R vr2 $i:20000:r >4035/b$cc & count=$[count+1]; done
>
> --- basically, 4351-dddddddd.txt is the output of zdb for that file (see
> http://pastebin.com/tdqEJKJB) and the little script calls zdb to get the
> first 20000 (hex) of each block because the remaining 4000 is the parity
> (9 disk array).
>
> Then I cat it into one file, then I truncate it to the specified length
> ....
>
> and lo and behold: The file is sound.
>
> So what's ZFS on about not wanting to read this file? Help?
>

From owner-freebsd-fs@FreeBSD.ORG Mon Nov 24 03:37:36 2014
Message-Id: <201411240337.sAO3bJIA076312@chez.mckusick.com>
To: lev@freebsd.org
Subject: Re: FreeBSD 10 panic with "ffs_valloc: dup alloc"
In-reply-to: <54724D8C.3030100@FreeBSD.org>
Date: Sun, 23 Nov 2014 19:37:19 -0800
From: Kirk McKusick
Cc: freebsd-fs@freebsd.org

> Date: Mon, 24 Nov 2014 00:11:40 +0300
> From: Lev Serebryakov
> To: freebsd-fs@freebsd.org
> Subject: Re: FreeBSD 10 panic with "ffs_valloc: dup alloc"
>
> On 23.11.2014 22:59, Lev Serebryakov wrote:
>
>> Filesystem in question is clean after panic reboot (!) SU and
>> "native" journal are used. Memory check is Ok. System is amd64,
>> r269936. What could I add to this information to help track & fix
>> this bug?
> Ok, I could reproduce it. It is triggered by "svnsync" of FreeBSD
> "base" repository. 3 times in a row.
>
> --
> // Lev Serebryakov AKA Black Lion

I don't know what you mean by "native" journal.
Is this SUJ or is it the journal GEOM layer? In either case, you should take the system down to single user. Run fsck over the filesystem in question. It will most likely find something wrong and fix it. After that the problem will be gone.

The problem with any type of journalling is that if a filesystem error creeps in either because of a write error or a lying disk (e.g., it says that it did the write, but in fact the write was lost when the power failed), then the journal does not know about the error and the only way to fix it is to run a full filesystem check.

Kirk McKusick

From owner-freebsd-fs@FreeBSD.ORG Mon Nov 24 09:53:01 2014
Date: Mon, 24 Nov 2014 11:52:52 +0200
From: Konstantin Belousov
To: Mateusz Guzik
Subject: Re: atomic v_usecount and v_holdcnt
Message-ID: <20141124095251.GH17068@kib.kiev.ua>
References: <20141122002812.GA32289@dft-labs.eu> <20141122092527.GT17068@kib.kiev.ua> <20141122211147.GA23623@dft-labs.eu>
In-Reply-To: <20141122211147.GA23623@dft-labs.eu>
Cc: freebsd-fs@freebsd.org

On Sat, Nov 22, 2014 at 10:11:47PM +0100, Mateusz Guzik wrote:
> On Sat, Nov 22, 2014 at 11:25:27AM +0200, Konstantin Belousov wrote:
> > On Sat, Nov 22, 2014 at 01:28:12AM +0100, Mateusz Guzik wrote:
> > > The idea is that we don't need an interlock as long as we don't
> > > transition either counter 1->0 or 0->1.
> > I already said that something along the lines of the patch should work.
> > In fact, you need vnode lock when hold count changes between 0 and 1,
> > and probably the same for use count.
> >
>
> I don't see why this would be required (not that I'm a VFS expert).
> vnode recycling seems to be protected with the interlock.
>
> In fact I would argue that if this is really needed, current code is
> buggy.

Yes, it is already (somewhat) buggy. The lock is needed mostly for the case of counts going from 1 to 0. The reason is the handling of the active vnode list, which is used for limiting the amount of vnode list walking in syncer. When hold count is decremented to 0, vnode is removed from the active list. When use count is decremented to 0, vnode is supposedly inactivated, and vinactive() cleans the cached pages belonging to vnode.
In other words, VI_OWEINACT for a dirty vnode is sort of a bug.

>
> interlock is taken in e.g. vgone with vnode already locked, so for cases
> where we get interlock -> lock, the kernel has to drop the former before
> blocking in order to avoid deadlocks. And this opens the same window
> present with my patch.
See above.

>
> > Some notes about the patch.
> >
> > mtx_owned() braces are intolerable ugliness. You should either pass a
> > boolean flag (preferred), or create locked/unlocked versions of the
> > functions.
> >
>
> That was a temporary hack.
>
> > Similarly, I dislike vget_held(). Add a flag to vget(), see LK_EATTR_MASK
> > in sys/lockmgr.h.
> >
>
> lockmgr has no business knowing or not knowing whether we held the
> vnode, so the flag would have to be cleared before it is passed to it.
> But then it seems like an abuse of LK_* namespace.
>
> But maybe I misunderstood your proposal.
Look at LK_RETRY, which is not a lockmgr flag. Yes, I want vget() to just grow one more flag bit to indicate that the hold was already done, instead of multiplying vget into more functions.

>
> > Could there be consequences of not taking vnode interlock and passing
> > LK_INTERLOCK to vn_lock() in vget() ?
> >
>
> You mean to add an assertion? I did it in the new patch.
>
> > Taking interlock when vnode lock is already owned is probably fine and
> > does not add to contention. I mean that making VI_OWEINACT so loose
> > breaks the VOP_INACTIVE() contract.
>
> namecache typically locks vnodes shared and in such cases vinactive is
> not executed and only OWEINACT is cleared.
>
> And it is a serialisation point. I just tested with multiple stats in
> the same directory and it went from ~1100000 to ~620000 ops/s.
>
> I can add unconditional locking of the interlock for exclusively locked
> vnodes if you insist, but for shared ones I don't see any benefit.
>
> Patch below should be split in 3, but imho is sufficiently readable for
> review in one batch.
>
> 1.
It adds refcount_{acquire,release}_if_greater functions to replace > open coded atomic_cmpset_int loops. > > 2. v_rdev->si_usecount manipulation is moved to v_{incr,decr}_devcount. > > 3. actual switch to atomics + some assertions > > diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/vnode.c b/sys/cddl/contrib/opensolaris/uts/common/fs/vnode.c > index 83f29c1..b587ebd 100644 > --- a/sys/cddl/contrib/opensolaris/uts/common/fs/vnode.c > +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/vnode.c > @@ -99,6 +99,6 @@ vn_rele_async(vnode_t *vp, taskq_t *taskq) > (task_func_t *)vn_rele_inactive, vp, TQ_SLEEP) != 0); > return; > } > - vp->v_usecount--; > + refcount_release(&vp->v_usecount); > vdropl(vp); > } > diff --git a/sys/kern/vfs_cache.c b/sys/kern/vfs_cache.c > index 55e3217..50a84d8 100644 > --- a/sys/kern/vfs_cache.c > +++ b/sys/kern/vfs_cache.c > @@ -665,12 +665,12 @@ success: > ltype = VOP_ISLOCKED(dvp); > VOP_UNLOCK(dvp, 0); > } > - VI_LOCK(*vpp); > + vhold(*vpp); > if (wlocked) > CACHE_WUNLOCK(); > else > CACHE_RUNLOCK(); > - error = vget(*vpp, cnp->cn_lkflags | LK_INTERLOCK, cnp->cn_thread); > + error = vget_held(*vpp, cnp->cn_lkflags, cnp->cn_thread); > if (cnp->cn_flags & ISDOTDOT) { > vn_lock(dvp, ltype | LK_RETRY); > if (dvp->v_iflag & VI_DOOMED) { > @@ -1376,9 +1376,9 @@ vn_dir_dd_ino(struct vnode *vp) > if ((ncp->nc_flag & NCF_ISDOTDOT) != 0) > continue; > ddvp = ncp->nc_dvp; > - VI_LOCK(ddvp); > + vhold(ddvp); > CACHE_RUNLOCK(); > - if (vget(ddvp, LK_INTERLOCK | LK_SHARED | LK_NOWAIT, curthread)) > + if (vget_held(ddvp, LK_SHARED | LK_NOWAIT, curthread)) > return (NULL); > return (ddvp); > } > diff --git a/sys/kern/vfs_hash.c b/sys/kern/vfs_hash.c > index 0271e49..d2fdbba 100644 > --- a/sys/kern/vfs_hash.c > +++ b/sys/kern/vfs_hash.c > @@ -83,9 +83,9 @@ vfs_hash_get(const struct mount *mp, u_int hash, int flags, struct thread *td, s > continue; > if (fn != NULL && fn(vp, arg)) > continue; > - VI_LOCK(vp); > + vhold(vp); > mtx_unlock(&vfs_hash_mtx); 
> - error = vget(vp, flags | LK_INTERLOCK, td); > + error = vget_held(vp, flags, td); > if (error == ENOENT && (flags & LK_NOWAIT) == 0) > break; > if (error) > @@ -127,9 +127,9 @@ vfs_hash_insert(struct vnode *vp, u_int hash, int flags, struct thread *td, stru > continue; > if (fn != NULL && fn(vp2, arg)) > continue; > - VI_LOCK(vp2); > + vhold(vp2); > mtx_unlock(&vfs_hash_mtx); > - error = vget(vp2, flags | LK_INTERLOCK, td); > + error = vget_held(vp2, flags, td); > if (error == ENOENT && (flags & LK_NOWAIT) == 0) > break; > mtx_lock(&vfs_hash_mtx); > diff --git a/sys/kern/vfs_subr.c b/sys/kern/vfs_subr.c > index 345aad6..564b5af 100644 > --- a/sys/kern/vfs_subr.c > +++ b/sys/kern/vfs_subr.c > @@ -68,6 +68,7 @@ __FBSDID("$FreeBSD$"); > #include > #include > #include > +#include > #include > #include > #include > @@ -105,6 +106,8 @@ static void v_incr_usecount(struct vnode *); > static void v_decr_usecount(struct vnode *); > static void v_decr_useonly(struct vnode *); > static void v_upgrade_usecount(struct vnode *); > +static void v_incr_devcount(struct vnode *); > +static void v_decr_devcount(struct vnode *); > static void vnlru_free(int); > static void vgonel(struct vnode *); > static void vfs_knllock(void *arg); > @@ -165,6 +168,10 @@ static int reassignbufcalls; > SYSCTL_INT(_vfs, OID_AUTO, reassignbufcalls, CTLFLAG_RW, &reassignbufcalls, 0, > "Number of calls to reassignbuf"); > > +static int vget_lock; > +SYSCTL_INT(_vfs, OID_AUTO, vget_lock, CTLFLAG_RW, &vget_lock, 0, > + "Lock the interlock unconditionally in vget"); > + > /* > * Cache for the mount type id assigned to NFS. This is used for > * special checks in nfs/nfs_nqlease.c and vm/vnode_pager.c. 
> @@ -854,7 +861,7 @@ vnlru_free(int count) > */ > freevnodes--; > vp->v_iflag &= ~VI_FREE; > - vp->v_holdcnt++; > + refcount_acquire(&vp->v_holdcnt); > > mtx_unlock(&vnode_free_list_mtx); > VI_UNLOCK(vp); > @@ -2050,14 +2057,10 @@ static void > v_incr_usecount(struct vnode *vp) > { > > + ASSERT_VI_UNLOCKED(vp, __func__); > CTR2(KTR_VFS, "%s: vp %p", __func__, vp); > - vholdl(vp); > - vp->v_usecount++; > - if (vp->v_type == VCHR && vp->v_rdev != NULL) { > - dev_lock(); > - vp->v_rdev->si_usecount++; > - dev_unlock(); > - } > + vhold(vp); > + v_upgrade_usecount(vp); > } > > /* > @@ -2068,13 +2071,14 @@ static void > v_upgrade_usecount(struct vnode *vp) > { > > + ASSERT_VI_UNLOCKED(vp, __func__); > CTR2(KTR_VFS, "%s: vp %p", __func__, vp); > - vp->v_usecount++; > - if (vp->v_type == VCHR && vp->v_rdev != NULL) { > - dev_lock(); > - vp->v_rdev->si_usecount++; > - dev_unlock(); > + if (!refcount_acquire_if_greater(&vp->v_usecount, 0)) { > + VI_LOCK(vp); > + refcount_acquire(&vp->v_usecount); > + VI_UNLOCK(vp); > } > + v_incr_devcount(vp); > } > > /* > @@ -2086,16 +2090,11 @@ static void > v_decr_usecount(struct vnode *vp) > { > > - ASSERT_VI_LOCKED(vp, __FUNCTION__); > + ASSERT_VI_LOCKED(vp, __func__); > VNASSERT(vp->v_usecount > 0, vp, > ("v_decr_usecount: negative usecount")); > CTR2(KTR_VFS, "%s: vp %p", __func__, vp); > - vp->v_usecount--; > - if (vp->v_type == VCHR && vp->v_rdev != NULL) { > - dev_lock(); > - vp->v_rdev->si_usecount--; > - dev_unlock(); > - } > + v_decr_useonly(vp); > vdropl(vp); > } > > @@ -2109,11 +2108,35 @@ static void > v_decr_useonly(struct vnode *vp) > { > > - ASSERT_VI_LOCKED(vp, __FUNCTION__); > + ASSERT_VI_LOCKED(vp, __func__); > VNASSERT(vp->v_usecount > 0, vp, > ("v_decr_useonly: negative usecount")); > CTR2(KTR_VFS, "%s: vp %p", __func__, vp); > - vp->v_usecount--; > + refcount_release(&vp->v_usecount); > + v_decr_devcount(vp); > +} > + > +/* > + * Increment si_usecount of the associated device, if any. 
> + */ > +static void > +v_incr_devcount(struct vnode *vp) > +{ > + > + if (vp->v_type == VCHR && vp->v_rdev != NULL) { > + dev_lock(); > + vp->v_rdev->si_usecount++; > + dev_unlock(); > + } > +} > + > +/* > + * Increment si_usecount of the associated device, if any. > + */ > +static void > +v_decr_devcount(struct vnode *vp) > +{ > + > if (vp->v_type == VCHR && vp->v_rdev != NULL) { > dev_lock(); > vp->v_rdev->si_usecount--; > @@ -2129,19 +2152,19 @@ v_decr_useonly(struct vnode *vp) > * vput try to do it here. > */ > int > -vget(struct vnode *vp, int flags, struct thread *td) > +vget_held(struct vnode *vp, int flags, struct thread *td) > { > int error; > > - error = 0; > VNASSERT((flags & LK_TYPE_MASK) != 0, vp, > ("vget: invalid lock operation")); > + if ((flags & LK_INTERLOCK) != 0) > + ASSERT_VI_LOCKED(vp, __func__); > + else > + ASSERT_VI_UNLOCKED(vp, __func__); > CTR3(KTR_VFS, "%s: vp %p with flags %d", __func__, vp, flags); > > - if ((flags & LK_INTERLOCK) == 0) > - VI_LOCK(vp); > - vholdl(vp); > - if ((error = vn_lock(vp, flags | LK_INTERLOCK)) != 0) { > + if ((error = vn_lock(vp, flags)) != 0) { > vdrop(vp); > CTR2(KTR_VFS, "%s: impossible to lock vnode %p", __func__, > vp); > @@ -2149,7 +2172,6 @@ vget(struct vnode *vp, int flags, struct thread *td) > } > if (vp->v_iflag & VI_DOOMED && (flags & LK_RETRY) == 0) > panic("vget: vn_lock failed to return ENOENT\n"); > - VI_LOCK(vp); > /* Upgrade our holdcnt to a usecount. */ > v_upgrade_usecount(vp); > /* > @@ -2158,16 +2180,27 @@ vget(struct vnode *vp, int flags, struct thread *td) > * here at preventing a reference to a removed file. If > * we don't succeed no harm is done. 
> */ > - if (vp->v_iflag & VI_OWEINACT) { > - if (VOP_ISLOCKED(vp) == LK_EXCLUSIVE && > - (flags & LK_NOWAIT) == 0) > - vinactive(vp, td); > - vp->v_iflag &= ~VI_OWEINACT; > + if (vget_lock || vp->v_iflag & VI_OWEINACT) { > + VI_LOCK(vp); > + if (vp->v_iflag & VI_OWEINACT) { > + if (VOP_ISLOCKED(vp) == LK_EXCLUSIVE && > + (flags & LK_NOWAIT) == 0) > + vinactive(vp, td); > + vp->v_iflag &= ~VI_OWEINACT; > + } > + VI_UNLOCK(vp); > } > - VI_UNLOCK(vp); > return (0); > } > > +int > +vget(struct vnode *vp, int flags, struct thread *td) > +{ > + > + _vhold(vp, (flags & LK_INTERLOCK) != 0); > + return (vget_held(vp, flags, td)); > +} > + > /* > * Increase the reference count of a vnode. > */ > @@ -2176,9 +2209,7 @@ vref(struct vnode *vp) > { > > CTR2(KTR_VFS, "%s: vp %p", __func__, vp); > - VI_LOCK(vp); > v_incr_usecount(vp); > - VI_UNLOCK(vp); > } > > /* > @@ -2193,13 +2224,8 @@ vref(struct vnode *vp) > int > vrefcnt(struct vnode *vp) > { > - int usecnt; > - > - VI_LOCK(vp); > - usecnt = vp->v_usecount; > - VI_UNLOCK(vp); > > - return (usecnt); > + return (vp->v_usecount); > } > > #define VPUTX_VRELE 1 > @@ -2218,12 +2244,19 @@ vputx(struct vnode *vp, int func) > ASSERT_VOP_LOCKED(vp, "vput"); > else > KASSERT(func == VPUTX_VRELE, ("vputx: wrong func")); > + ASSERT_VI_UNLOCKED(vp, __func__); > CTR2(KTR_VFS, "%s: vp %p", __func__, vp); > + > + if (refcount_release_if_greater(&vp->v_usecount, 1)) { > + if (func == VPUTX_VPUT) > + VOP_UNLOCK(vp, 0); > + v_decr_devcount(vp); > + vdrop(vp); > + return; > + } > + > VI_LOCK(vp); > > - /* Skip this v_writecount check if we're going to panic below. */ > - VNASSERT(vp->v_writecount < vp->v_usecount || vp->v_usecount < 1, vp, > - ("vputx: missed vn_close")); > error = 0; > > if (vp->v_usecount > 1 || ((vp->v_iflag & VI_DOINGINACT) && > @@ -2314,38 +2347,32 @@ vunref(struct vnode *vp) > } > > /* > - * Somebody doesn't want the vnode recycled. 
> - */ > -void > -vhold(struct vnode *vp) > -{ > - > - VI_LOCK(vp); > - vholdl(vp); > - VI_UNLOCK(vp); > -} > - > -/* > * Increase the hold count and activate if this is the first reference. > */ > void > -vholdl(struct vnode *vp) > +_vhold(struct vnode *vp, bool locked) > { > struct mount *mp; > > + if (locked) > + ASSERT_VI_LOCKED(vp, __func__); > + else > + ASSERT_VI_UNLOCKED(vp, __func__); > CTR2(KTR_VFS, "%s: vp %p", __func__, vp); > -#ifdef INVARIANTS > - /* getnewvnode() calls v_incr_usecount() without holding interlock. */ > - if (vp->v_type != VNON || vp->v_data != NULL) { > - ASSERT_VI_LOCKED(vp, "vholdl"); > - VNASSERT(vp->v_holdcnt > 0 || (vp->v_iflag & VI_FREE) != 0, > - vp, ("vholdl: free vnode is held")); > + if (refcount_acquire_if_greater(&vp->v_holdcnt, 0)) { > + VNASSERT((vp->v_iflag & VI_FREE) == 0, vp, > + ("_vhold: vnode with holdcnt is free")); > + return; > } > -#endif > - vp->v_holdcnt++; > - if ((vp->v_iflag & VI_FREE) == 0) > + if (!locked) > + VI_LOCK(vp); > + if ((vp->v_iflag & VI_FREE) == 0) { > + refcount_acquire(&vp->v_holdcnt); > + if (!locked) > + VI_UNLOCK(vp); > return; > - VNASSERT(vp->v_holdcnt == 1, vp, ("vholdl: wrong hold count")); > + } > + VNASSERT(vp->v_holdcnt == 0, vp, ("vholdl: wrong hold count")); > VNASSERT(vp->v_op != NULL, vp, ("vholdl: vnode already reclaimed.")); > /* > * Remove a vnode from the free list, mark it as in use, > @@ -2362,18 +2389,9 @@ vholdl(struct vnode *vp) > TAILQ_INSERT_HEAD(&mp->mnt_activevnodelist, vp, v_actfreelist); > mp->mnt_activevnodelistsize++; > mtx_unlock(&vnode_free_list_mtx); > -} > - > -/* > - * Note that there is one less who cares about this vnode. > - * vdrop() is the opposite of vhold(). > - */ > -void > -vdrop(struct vnode *vp) > -{ > - > - VI_LOCK(vp); > - vdropl(vp); > + refcount_acquire(&vp->v_holdcnt); > + if (!locked) > + VI_UNLOCK(vp); > } > > /* > @@ -2382,20 +2400,28 @@ vdrop(struct vnode *vp) > * (marked VI_DOOMED) in which case we will free it. 
> */ > void > -vdropl(struct vnode *vp) > +_vdrop(struct vnode *vp, bool locked) > { > struct bufobj *bo; > struct mount *mp; > int active; > > - ASSERT_VI_LOCKED(vp, "vdropl"); > + if (locked) > + ASSERT_VI_LOCKED(vp, __func__); > + else > + ASSERT_VI_UNLOCKED(vp, __func__); > CTR2(KTR_VFS, "%s: vp %p", __func__, vp); > if (vp->v_holdcnt <= 0) > panic("vdrop: holdcnt %d", vp->v_holdcnt); > - vp->v_holdcnt--; > - VNASSERT(vp->v_holdcnt >= vp->v_usecount, vp, > - ("hold count less than use count")); > - if (vp->v_holdcnt > 0) { > + if (refcount_release_if_greater(&vp->v_holdcnt, 1)) { > + if (locked) > + VI_UNLOCK(vp); > + return; > + } > + > + if (!locked) > + VI_LOCK(vp); > + if (refcount_release(&vp->v_holdcnt) == 0) { > VI_UNLOCK(vp); > return; > } > diff --git a/sys/sys/refcount.h b/sys/sys/refcount.h > index 4611664..360d50d 100644 > --- a/sys/sys/refcount.h > +++ b/sys/sys/refcount.h > @@ -64,4 +64,32 @@ refcount_release(volatile u_int *count) > return (old == 1); > } > > +static __inline int > +refcount_acquire_if_greater(volatile u_int *count, int val) > +{ > + int old; > +retry: > + old = *count; > + if (old > val) { > + if (atomic_cmpset_int(count, old, old + 1)) > + return (true); > + goto retry; > + } > + return (false); > +} > + > +static __inline int > +refcount_release_if_greater(volatile u_int *count, int val) > +{ > + int old; > +retry: > + old = *count; > + if (old > val) { > + if (atomic_cmpset_int(count, old, old - 1)) > + return (true); > + goto retry; > + } > + return (false); > +} > + > #endif /* ! 
__SYS_REFCOUNT_H__ */ > diff --git a/sys/sys/vnode.h b/sys/sys/vnode.h > index c78b9d1..2aab48a 100644 > --- a/sys/sys/vnode.h > +++ b/sys/sys/vnode.h > @@ -647,13 +647,16 @@ int vaccess_acl_posix1e(enum vtype type, uid_t file_uid, > struct ucred *cred, int *privused); > void vattr_null(struct vattr *vap); > int vcount(struct vnode *vp); > -void vdrop(struct vnode *); > -void vdropl(struct vnode *); > +#define vdrop(vp) _vdrop((vp), 0) > +#define vdropl(vp) _vdrop((vp), 1) > +void _vdrop(struct vnode *, bool); > int vflush(struct mount *mp, int rootrefs, int flags, struct thread *td); > int vget(struct vnode *vp, int lockflag, struct thread *td); > +int vget_held(struct vnode *vp, int lockflag, struct thread *td); > void vgone(struct vnode *vp); > -void vhold(struct vnode *); > -void vholdl(struct vnode *); > +#define vhold(vp) _vhold((vp), 0) > +#define vholdl(vp) _vhold((vp), 1) > +void _vhold(struct vnode *, bool); > void vinactive(struct vnode *, struct thread *); > int vinvalbuf(struct vnode *vp, int save, int slpflag, int slptimeo); > int vtruncbuf(struct vnode *vp, struct ucred *cred, off_t length, From owner-freebsd-fs@FreeBSD.ORG Mon Nov 24 18:49:39 2014 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id AFE62E25 for ; Mon, 24 Nov 2014 18:49:39 +0000 (UTC) Received: from mail-oi0-x234.google.com (mail-oi0-x234.google.com [IPv6:2607:f8b0:4003:c06::234]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 75B4FEFE for ; Mon, 24 Nov 2014 18:49:39 +0000 (UTC) Received: by mail-oi0-f52.google.com with SMTP id h136so6997510oig.39 for ; Mon, 24 Nov 2014 10:49:38 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; 
d=gmail.com; s=20120113; h=mime-version:date:message-id:subject:from:to:content-type; bh=JJt8yiukBknIyNDaHERF/f1OtcXcJGyRtzp+Wfh/++0=; b=SmwaBzrYNCkfyphdlgZXs1Q49cO8HiB6H02m/whUUxb4XfQrJmGMoobLDN8fFLtlRg QPtwr7DlvKsHliT2LfE4/6YXtRs8jPnlDUfZwt6zlel882iNoV3wMI0aM3rSqurapy1H P2HZcEmD81NmzEun4SIYmTH74/GXzsDQh6Lr1REzXGJsWoMnxeW5+d/uemBcNkYzncye HT/NGb0JQyoE8BkfGS+M2ivX0jqFNHGLy/E8EXbviBvUFtQfuHmiahl4jgqjMtau550/ onPiZIV+OkWRA3KfVpZWgY5CrpQgRq0P9U+uzmBJXuERoLeDUb9exrZjbPdH3zChuGnP Qtqw== MIME-Version: 1.0 X-Received: by 10.60.247.137 with SMTP id ye9mr4299283oec.35.1416854978847; Mon, 24 Nov 2014 10:49:38 -0800 (PST) Received: by 10.76.0.138 with HTTP; Mon, 24 Nov 2014 10:49:38 -0800 (PST) Date: Mon, 24 Nov 2014 13:49:38 -0500 Message-ID: Subject: ZDB -Z? From: Zaphod Beeblebrox To: freebsd-fs Content-Type: text/plain; charset=UTF-8 X-Content-Filtered-By: Mailman/MimeDel 2.1.18-1 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 24 Nov 2014 18:49:39 -0000 I'm reading about someone else's recovery of files from a damaged ZFS partition. He claims to have added (possibly to opensolaris or whatnot) an argument to zdb '-Z' ... which operates somewhat like -R, but which highlights what parts of the region are on what physical disks, and which are parity. Has anyone patched this into FreeBSD? 
From owner-freebsd-fs@FreeBSD.ORG Mon Nov 24 19:40:52 2014 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id EADA2457 for ; Mon, 24 Nov 2014 19:40:52 +0000 (UTC) Received: from mail-ob0-x232.google.com (mail-ob0-x232.google.com [IPv6:2607:f8b0:4003:c01::232]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id AE43C85E for ; Mon, 24 Nov 2014 19:40:52 +0000 (UTC) Received: by mail-ob0-f178.google.com with SMTP id gq1so7619706obb.37 for ; Mon, 24 Nov 2014 11:40:52 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:date:message-id:subject:from:to:content-type; bh=EhBgQtagqNl4pzv2EQOj0jijCGtz48zimcHTEQ1+DhE=; b=SxUkf6HLGodbMUNCyrVr/atdpA9qWNV0XHpydxVXHfinybKFZ3FOShNKAvH1f0wnn3 uJjwyHP/zhQ8AXD+bNImMjz4kyKbIGtATkrC4TwAP0/aIqDk27FhuUSgaFt7EUL+NF+V OaSXSsJu0yQImCi/LlhUES/DaOvNvn+Cqgui4LhwjnxFGqXGjaZjCaMIo80ur38ZC+5h HujHsHoMSDmbeN7QTux7cWDn6Ws9Cfmyg9272qGybLSBn6NydYX7SYMPXSmeDxMiOyzE eoIzzhYpscKqhNSRmGFKyNsWKO6MXXsPhcXKZxbCmCtH5Bun+vNiCcOFKNor2UdcgctB BJkg== MIME-Version: 1.0 X-Received: by 10.202.196.206 with SMTP id u197mr12795499oif.21.1416858052104; Mon, 24 Nov 2014 11:40:52 -0800 (PST) Received: by 10.76.0.138 with HTTP; Mon, 24 Nov 2014 11:40:52 -0800 (PST) Date: Mon, 24 Nov 2014 14:40:52 -0500 Message-ID: Subject: What does it mean when zdb -R x:xxx:xxx:g crashes? 
From: Zaphod Beeblebrox To: freebsd-fs Content-Type: text/plain; charset=UTF-8 X-Content-Filtered-By: Mailman/MimeDel 2.1.18-1 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 24 Nov 2014 19:40:53 -0000 So... another barrier in figuring out my zfs problem is that zdb crashes when asked to print the gang block header: [1:97:397]root@virtual:/vr1/tmp/diag> zdb -AAA -R vr2 0:94048dc9000:24000:g Found vdev type: raidz Assertion failed: (zio->io_error == 0 || (zio->io_flags & ZIO_FLAG_CANFAIL)), file /usr/src/cddl/lib/libzpool/../../../sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c, line 3297. Abort trap (core dumped) Now... this trace looks odd to me: there are no bookmarks in the filesystem... neither are there snapshots for this specific filesystem. (gdb) bt #0 0x0000000801cb26ca in thr_kill () from /lib/libc.so.7 #1 0x0000000801d87149 in abort () from /lib/libc.so.7 #2 0x0000000801920e21 in zio_init () from /lib/libzpool.so.2 #3 0x0000000801927e0e in zbookmark_is_before () from /lib/libzpool.so.2 #4 0x0000000801922df7 in zio_execute () from /lib/libzpool.so.2 #5 0x0000000801927f11 in zbookmark_is_before () from /lib/libzpool.so.2 #6 0x0000000801922df7 in zio_execute () from /lib/libzpool.so.2 #7 0x0000000801927f11 in zbookmark_is_before () from /lib/libzpool.so.2 #8 0x0000000801922df7 in zio_execute () from /lib/libzpool.so.2 #9 0x0000000801927f11 in zbookmark_is_before () from /lib/libzpool.so.2 #10 0x0000000801922df7 in zio_execute () from /lib/libzpool.so.2 #11 0x0000000801927f11 in zbookmark_is_before () from /lib/libzpool.so.2 #12 0x0000000801922df7 in zio_execute () from /lib/libzpool.so.2 #13 0x000000080191b8d9 in taskq_create () from /lib/libzpool.so.2 #14 0x0000000800e814f5 in pthread_create () from /lib/libthr.so.3 #15 0x00007ffff6fb9000 in ?? () Cannot access memory at address 0x7ffff71b9000 Help? 
From owner-freebsd-fs@FreeBSD.ORG Mon Nov 24 20:39:29 2014 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 6223934B for ; Mon, 24 Nov 2014 20:39:29 +0000 (UTC) Received: from bigwig.baldwin.cx (bigwig.baldwin.cx [IPv6:2001:470:1f11:75::1]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 3AB71E7B for ; Mon, 24 Nov 2014 20:39:29 +0000 (UTC) Received: from jhbbsd.localnet (unknown [209.249.190.124]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id 2FD21B9B4; Mon, 24 Nov 2014 15:39:28 -0500 (EST) From: John Baldwin To: borjam@sarenet.es Subject: Re: BIOS booting from disks > 2TB Date: Mon, 24 Nov 2014 14:25:32 -0500 User-Agent: KMail/1.13.5 (FreeBSD/8.4-CBSD-20140415; KDE/4.5.5; amd64; ; ) References: <17A2AC72-AD70-480A-9BAC-9CC8EAFD572F@we.lc.ehu.es> <201411201110.45066.jhb@freebsd.org> In-Reply-To: MIME-Version: 1.0 Content-Type: Text/Plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <201411241425.32989.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7 (bigwig.baldwin.cx); Mon, 24 Nov 2014 15:39:28 -0500 (EST) Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 24 Nov 2014 20:39:29 -0000 On Thursday, November 20, 2014 2:08:54 pm borjam@sarenet.es wrote: > On 20.11.2014 17:10, John Baldwin wrote: > > Can you start with 'lsdev -v' at the loader prompt? > > Sure! > > cd devices: > disk devices: > disk0: BIOS drive C: > pxe devices: Ugh.
So it means we aren't seeing any partitions or filesystems on disk0, and that is why the loader craps out. Can you build a loader with "-DDISK_DEBUG" enabled in CFLAGS and capture the output from that starting up? -- John Baldwin From owner-freebsd-fs@FreeBSD.ORG Mon Nov 24 20:41:44 2014 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id D462A3DD for ; Mon, 24 Nov 2014 20:41:44 +0000 (UTC) Received: from mail-oi0-x229.google.com (mail-oi0-x229.google.com [IPv6:2607:f8b0:4003:c06::229]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 963ABEA3 for ; Mon, 24 Nov 2014 20:41:44 +0000 (UTC) Received: by mail-oi0-f41.google.com with SMTP id a3so7335603oib.0 for ; Mon, 24 Nov 2014 12:41:43 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :content-type; bh=jcf8pdX+WRXwngG2GfG4vzMiBKsypZ0lqnHpdTV/XmM=; b=N59M38VhcWiM6JdiF/B62QWU3CeinN7CQV/NpaZDMIzS4aiC2DYliFEnPwCv3o4peY 4RK+TZl21wpDw9Zqmo2WgqcntD+y2Etc+AYWZjruhnitElRfvdlgxRiAeF+/trPp1OXc j+N+87/hkXSasNW8vZ0GAMaa/6FdNQcelcw9j5pWnqy6gJG9t4CP//OSmTd5v/tCVmvK tEXbMQJEj7T2zGSIgFeHZcWE1aa52A4k/dyfMSFuzIGrDsnu2oqD3z7d6h/uQRAjvYNm FSo0bPWiR+paiH1RguIjfwRnmj649/rkGrJ4SkIwYqsWQpu3fOgqY34x53wy2wHi0qJM u66A== MIME-Version: 1.0 X-Received: by 10.182.65.105 with SMTP id w9mr12859232obs.60.1416861703882; Mon, 24 Nov 2014 12:41:43 -0800 (PST) Received: by 10.76.0.138 with HTTP; Mon, 24 Nov 2014 12:41:43 -0800 (PST) In-Reply-To: References: Date: Mon, 24 Nov 2014 15:41:43 -0500 Message-ID: Subject: Re: What does it mean when zdb -R x:xxx:xxx:g crashes?
From: Zaphod Beeblebrox To: freebsd-fs Content-Type: text/plain; charset=UTF-8 X-Content-Filtered-By: Mailman/MimeDel 2.1.18-1 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 24 Nov 2014 20:41:44 -0000 I have retried this on a brand new pool with a brand new filesystem containing one file. zdb -R x:xxx:xxx:g <--- ie: retrieve the gang block header crashes. It also crashes with -A -AA and -AAA (which are supposed to prevent assertions from stopping the debugger). Help? On Mon, Nov 24, 2014 at 2:40 PM, Zaphod Beeblebrox wrote: > So... another barrier in figuring out my zfs problem is that zdb crashes > when asked to print the gang block header: > > [1:97:397]root@virtual:/vr1/tmp/diag> zdb -AAA -R vr2 > 0:94048dc9000:24000:g > Found vdev type: raidz > Assertion failed: (zio->io_error == 0 || (zio->io_flags & > ZIO_FLAG_CANFAIL)), file > /usr/src/cddl/lib/libzpool/../../../sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c, > line 3297. > Abort trap (core dumped) > > Now... this trace looks odd to me: there are no bookmarks in the > filesystem... neither are there snapshots for this specific filesystem. 
> > (gdb) bt > #0 0x0000000801cb26ca in thr_kill () from /lib/libc.so.7 > #1 0x0000000801d87149 in abort () from /lib/libc.so.7 > #2 0x0000000801920e21 in zio_init () from /lib/libzpool.so.2 > #3 0x0000000801927e0e in zbookmark_is_before () from /lib/libzpool.so.2 > #4 0x0000000801922df7 in zio_execute () from /lib/libzpool.so.2 > #5 0x0000000801927f11 in zbookmark_is_before () from /lib/libzpool.so.2 > #6 0x0000000801922df7 in zio_execute () from /lib/libzpool.so.2 > #7 0x0000000801927f11 in zbookmark_is_before () from /lib/libzpool.so.2 > #8 0x0000000801922df7 in zio_execute () from /lib/libzpool.so.2 > #9 0x0000000801927f11 in zbookmark_is_before () from /lib/libzpool.so.2 > #10 0x0000000801922df7 in zio_execute () from /lib/libzpool.so.2 > #11 0x0000000801927f11 in zbookmark_is_before () from /lib/libzpool.so.2 > #12 0x0000000801922df7 in zio_execute () from /lib/libzpool.so.2 > #13 0x000000080191b8d9 in taskq_create () from /lib/libzpool.so.2 > #14 0x0000000800e814f5 in pthread_create () from /lib/libthr.so.3 > #15 0x00007ffff6fb9000 in ?? () > Cannot access memory at address 0x7ffff71b9000 > > Help? 
> From owner-freebsd-fs@FreeBSD.ORG Mon Nov 24 21:13:33 2014 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id ACA2ACE7; Mon, 24 Nov 2014 21:13:33 +0000 (UTC) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 9E9022C1; Mon, 24 Nov 2014 21:13:32 +0000 (UTC) Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id XAA08522; Mon, 24 Nov 2014 23:15:23 +0200 (EET) (envelope-from avg@FreeBSD.org) Received: from localhost ([127.0.0.1]) by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1Xt0ws-000CkT-4S; Mon, 24 Nov 2014 23:13:30 +0200 Message-ID: <54739F41.8030407@FreeBSD.org> Date: Mon, 24 Nov 2014 23:12:33 +0200 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:31.0) Gecko/20100101 Thunderbird/31.2.0 MIME-Version: 1.0 To: sbruno@FreeBSD.org, freebsd-current@FreeBSD.org, freebsd-fs Subject: zfs locking vs vnode locking [Was: zfs/vfs lockups, via poudriere] References: <1416684021.7423.77.camel@bruno> <547109A2.9010506@FreeBSD.org> <1416761846.1186.0.camel@bruno> In-Reply-To: <1416761846.1186.0.camel@bruno> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 24 Nov 2014 21:13:33 -0000 On 23/11/2014 18:57, Sean Bruno wrote: > 31071 100995 rm - mi_switch+0xe1 sleepq_wait+0x3a sleeplk+0x18d __lockmgr_args+0x9ab vop_stdlock+0x3c VOP_LOCK1_APV+0xab _vn_lock+0x43 zfs_lookup+0x45d zfs_freebsd_lookup+0x6d VOP_CACHEDLOOKUP_APV+0xa1 vfs_cache_lookup+0xd6 
VOP_LOOKUP_APV+0xa1 lookup+0x5a1 namei+0x534 kern_rmdirat+0x8d amd64_syscall+0x3fb Xfast_syscall+0xfb > 31075 100693 mv - mi_switch+0xe1 sleepq_wait+0x3a sleeplk+0x18d __lockmgr_args+0xd5d vop_stdlock+0x3c VOP_LOCK1_APV+0xab _vn_lock+0x43 vputx+0x28a zfs_rename_unlock+0x3e zfs_freebsd_rename+0xe39 VOP_RENAME_APV+0xab kern_renameat+0x4a6 amd64_syscall+0x3fb Xfast_syscall+0xfb Stack traces alone are not sufficient to analyze the problem without examining the relevant vnodes and vnode locks. But I believe that I have seen reports about this kind of problem before, and I think that I understand what's going on. As part of my job, I tried to develop a fix [*] for this problem and got some positive feedback on it. But the fix is not just a few lines of changes; it is a lot of modifications to a lot of files. Besides, my changes alter quite a bit more code than the bare minimum required to fix the problem, and even that minimum would be a sizable change. So, right now I would like to describe the problem as I understand it. Some general information about the FreeBSD VFS and its differences from the Solaris VFS [**] can be useful, but is not really required. I'll try to explain by example. If we look at any mature and "native" FreeBSD filesystem with read-write support - ffs, maybe tmpfs - then we can make the following observations. In most of the vnode operation implementations there are no calls to vnode locking functions. E.g. for an operation like vop_remove, the two vnodes in question are already locked at the VFS layer. In some cases VOPs do locking, but it is trivial; e.g. in vop_create a newly created vnode must be returned locked. Naturally, if we look at the VFS code we see a lot of vnode locking for various purposes: locking the vnodes for a vop_remove call, or locking vnodes during their life-cycle management so that, for example, a vnode is not destroyed while there is an ongoing operation on it.
Also, we can see locking in the VFS namei / lookup implementation, where we need to hold onto a directory vnode while looking up a child vnode by name. But there are two vnode operation implementations where we can see a non-trivial vnode locking "dance": vop_rename and vop_lookup. Anyone is welcome to take a cursory look at the first hundred or so lines of ufs_rename(). The point of the above observations is that both VFS and a filesystem driver do vnode locking, and thus both VFS and the driver must cooperate by using the same locking protocol. Now, if we look at the ZFS ZPL code, and most prominently at zfs_rename(), we see that there is quite a bit of locking going on there, e.g. see zfs_rename_lock, but the locks in question are all ZFS-internal locks. We do not see the vnode locks. From this comes a suspicion, or even a conclusion, that ZFS currently does not use the vnode locking protocol that is expected from any filesystem driver. There is a weird form of redundancy between the fine-grained ZFS locks that got ported over and the FreeBSD vnode locks. In some cases the ZFS locks are always uncontested because the vnode locks held at the VFS level already serialize access. In other cases there is no protection at all: one thread is in VFS code, which uses the vnode locks, while another thread is in ZFS code, which uses the ZFS locks, so there is no real synchronization between those threads. My solution to this problem was to completely eliminate (at least) the following ZFS locks:
kmutex_t z_lock; /* znode modification lock */
krwlock_t z_parent_lock; /* parent lock for directories */
krwlock_t z_name_lock; /* "master" lock for dirent locks */
zfs_dirlock_t *z_dirlocks; /* directory entry lock list */
and to ensure that the proper vnode locking protocol is followed. That required substantial changes to the rename and lookup code.
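The rename "dance" mentioned above exists because a rename must hold several vnode locks at once without being able to rely on a single global lock order. A minimal userland sketch of the general idiom, assuming plain POSIX mutexes stand in for vnode locks: block on one lock, only *try* the other, and if the try fails, drop everything and retry, this time blocking on the lock that was busy. The helpers `lock_pair()` and `unlock_pair()` are hypothetical; this is not the code of ufs_rename() or of the proposed ZFS changes.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>

/*
 * Hypothetical helper: acquire two mutexes without a fixed global lock
 * order.  Block on one lock, only try the other, and if the try fails,
 * release everything and start over, blocking on the lock that was busy.
 */
void lock_pair(pthread_mutex_t *a, pthread_mutex_t *b)
{
	pthread_mutex_t *tmp;

	assert(a != b);
	for (;;) {
		pthread_mutex_lock(a);
		if (pthread_mutex_trylock(b) == 0)
			return;		/* both locks are now held */
		/* Never sleep on b while holding a: back off completely. */
		pthread_mutex_unlock(a);
		tmp = a;
		a = b;
		b = tmp;		/* next round blocks on the busy lock */
		sched_yield();
	}
}

/* Release both locks; the order does not matter for mutexes. */
void unlock_pair(pthread_mutex_t *a, pthread_mutex_t *b)
{
	pthread_mutex_unlock(a);
	pthread_mutex_unlock(b);
}
```

Because no thread ever sleeps on one lock while holding the other, two threads calling lock_pair(&x, &y) and lock_pair(&y, &x) concurrently cannot deadlock; the price is possible retries under contention, which sched_yield() only mitigates. The scheme works only if every party touching those locks follows it, which is exactly the point about VFS and the filesystem sharing one protocol.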
Finally, this is not really a suggestion to test or discuss my changes, but rather a call to discuss the problem and other possible ways to fix it. I do not preclude any options, including making changes to our VFS (and thus to all the filesystems) :-) [*] https://github.com/avg-I/freebsd/compare/wip/hc/zfs-fbsd-vfs [**] https://clusterhq.com/blog/complexity-freebsd-vfs-using-zfs-example-part-1-2/ https://clusterhq.com/blog/complexity-freebsd-vfs-using-zfs-example-part-2/ -- Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Mon Nov 24 21:51:59 2014 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id AFE69A7 for ; Mon, 24 Nov 2014 21:51:59 +0000 (UTC) Received: from mail-la0-x230.google.com (mail-la0-x230.google.com [IPv6:2a00:1450:4010:c03::230]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 444358C4 for ; Mon, 24 Nov 2014 21:51:59 +0000 (UTC) Received: by mail-la0-f48.google.com with SMTP id s18so8313637lam.21 for ; Mon, 24 Nov 2014 13:51:57 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=delphix.com; s=google; h=mime-version:date:message-id:subject:from:to:content-type; bh=RjyEgQGKhFHagMCqGfe+refh7jUNuW9Lb2shnEc0Ksk=; b=XFwn3hNcfU1Ls4FTBfsa+b0rz4XDOUK8I7CWS+qoBA04r5+vN4bJAYoD8YdIyEXhOv 4C4A+NJfgq8gJLwvVCPw6/6p69XN0m8vzuRVY8MWMwUWg7WU5JHTmCH6E2uEEJ/+nXFe OaHKX1djRG2l46ERUV2igtzZ7He9ZpzFlJUDY= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:date:message-id:subject:from:to :content-type; bh=RjyEgQGKhFHagMCqGfe+refh7jUNuW9Lb2shnEc0Ksk=; b=jVo+Cfk9iTfTwLPkipmsSzTp9k4L4FNRPjE6FlEiZ+9SY4dpO45fulzipC3r97ytjM
BTW4R1LZmlp0bJuuUILnPq/OXG3/P86Pv+34BTy7XB8BGKrMjYYcUw1udB0V8n/fdOjk ozPfH8eS95Ef+sb8yJ7Hur400BM15ixPT1smMSGVHelovK8uMyDcPnArBubLOw+m27Mu 0dWBaot9uCe6beQAuUIs5wn0GPq9/Bx85GWSGRQ34XmLNVA95z2N/O6S4SM7zqyeZumP Tlg4Hdy20vOjkhHn2K5PIHhNd7LrjlwbTcJxwvxNb7uMW8VyRC5AhMnHomq64rejC3uN L6Mw== X-Gm-Message-State: ALoCoQmtRAibrB1sKkPGbHSeq5SUn8PhNdVdhLsAWU06SUcvCECghSCLkELH9RfTGcEvI11rVARe MIME-Version: 1.0 X-Received: by 10.112.182.72 with SMTP id ec8mr21757540lbc.87.1416865917329; Mon, 24 Nov 2014 13:51:57 -0800 (PST) Received: by 10.25.170.9 with HTTP; Mon, 24 Nov 2014 13:51:57 -0800 (PST) Date: Mon, 24 Nov 2014 13:51:57 -0800 Message-ID: Subject: video and slides from 2014 OpenZFS Developer Summit From: Matthew Ahrens To: developer , illumos-zfs , freebsd-fs , zfs-discuss , "admin@open-zfs.org" Content-Type: text/plain; charset=UTF-8 X-Content-Filtered-By: Mailman/MimeDel 2.1.18-1 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 24 Nov 2014 21:51:59 -0000 Thanks to everyone who participated at the 2014 OpenZFS Developer Summit, 2 weeks ago; we received a lot of positive feedback on the event! We have posted the video recordings and slides from the event, including 12 main talks and the presentations from the hackathon. http://www.open-zfs.org/wiki/OpenZFS_Developer_Summit_2014 --matt p.s. My apologies that the audio is not the best -- turn the volume all the way up. We'll find a better solution for next year. 
From owner-freebsd-fs@FreeBSD.ORG Tue Nov 25 02:11:57 2014 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 42000730 for ; Tue, 25 Nov 2014 02:11:57 +0000 (UTC) Received: from mail-wg0-f52.google.com (mail-wg0-f52.google.com [74.125.82.52]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id CCE0481E for ; Tue, 25 Nov 2014 02:11:56 +0000 (UTC) Received: by mail-wg0-f52.google.com with SMTP id a1so14130192wgh.25 for ; Mon, 24 Nov 2014 18:11:54 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:message-id:date:from:user-agent:mime-version:to :subject:references:in-reply-to:content-type :content-transfer-encoding; bh=cc1xfSMs5bSqMfPR65NUkUENd1WgxCWNGaEwW29YLnI=; b=H208VTB+DKr8/EK6K5vGCHVpwJOt5Uj//TJhzpKwsgSH9NXCkzJlUrvLSnYJpNXKEW DOhxeH22bcy7OQcB8WetegM9hk/DjjrG0wI3OPGoLZxMDvX/+b0nnH5YBJUzwkmVGzTX 9784JwlxsU1kqSYu5+XYQrnexzjbrbjTrad+STLw3My5+9gRAwfjlcK3xlv0kYwctRZT sCwgFRRYwGGSvJhn/yIl5dHmJlAK1CFgcdIniAq5r2SMlfvPyBSkBFYS/nZNAaLYR1YD WB9qt5xwjo1zwOHxbNGLnRwoU747I3pttVid7b1Jocw7nva+Q/LYB4zY5KYECaxCyfPC KIrw== X-Gm-Message-State: ALoCoQmOPwpNPFwTkrrW4GxLiXrdTit1jPMR3sBfLt5YCF1vhqo4ULbABEMZF9ZFENusoyQoJKmz X-Received: by 10.194.200.1 with SMTP id jo1mr21608676wjc.64.1416881069276; Mon, 24 Nov 2014 18:04:29 -0800 (PST) Received: from [10.10.1.68] (82-69-141-170.dsl.in-addr.zen.co.uk. 
[82.69.141.170]) by mx.google.com with ESMTPSA id nj9sm579396wic.10.2014.11.24.18.04.28 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 24 Nov 2014 18:04:28 -0800 (PST) Message-ID: <5473E3D6.1030705@multiplay.co.uk> Date: Tue, 25 Nov 2014 02:05:10 +0000 From: Steven Hartland User-Agent: Mozilla/5.0 (Windows NT 5.1; rv:31.0) Gecko/20100101 Thunderbird/31.2.0 MIME-Version: 1.0 To: freebsd-fs@freebsd.org Subject: Re: ZDB -Z? References: In-Reply-To: Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 25 Nov 2014 02:11:57 -0000 If it's in the upstream illumos version, then it's likely already available in HEAD. On 24/11/2014 18:49, Zaphod Beeblebrox wrote: > I'm reading about someone else's recovery of files from a damaged ZFS > partition. He claims to have added (possibly to opensolaris or whatnot) an > argument to zdb '-Z' ... which operates somewhat like -R, but which > highlights what parts of the region are on what physical disks, and which > are parity. > > Has anyone patched this into FreeBSD?
> _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" From owner-freebsd-fs@FreeBSD.ORG Tue Nov 25 22:37:22 2014 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 4F5F26A2 for ; Tue, 25 Nov 2014 22:37:22 +0000 (UTC) Received: from mail-pd0-f180.google.com (mail-pd0-f180.google.com [209.85.192.180]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 24CF41A1 for ; Tue, 25 Nov 2014 22:37:21 +0000 (UTC) Received: by mail-pd0-f180.google.com with SMTP id p10so1445258pdj.39 for ; Tue, 25 Nov 2014 14:37:20 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:message-id:date:from:user-agent:mime-version:to :subject:content-type:content-transfer-encoding; bh=eEc3NyfPVHf6V8y1V2ij6Vfyc9ooV+0kHRnTt4j3+yk=; b=XSQZPpdtZprC0Su/tSAuovIDoKLMk64is3NWm7xBPd6vyj4eWIIjKnEmeI07wrNgWL 1aXKsJedZQdelYMGrqDW6z6el6lJX+FtYlXeSdBuDzUAzyXPD9/YjcBKhd8ojAyR1EYw lUQwjOyU0kuExFzEquZ+6WT0tK0/8HoH+Gs+P+UzLnB9tRrpJohoZ0LfdzMoH4cyY9q4 hb3O0SVCCHZkRrSHB968Vlz7adNovVKns9FunPFj5cpiENOQAWt1IGbPnWleiObwkM1o cM/sxLOJTnVAgF4VirNW0ZjJChOm6oxRLy4UC/dneOcn8JCHh+WoArmE5lKTBj2j28d3 12Qw== X-Gm-Message-State: ALoCoQnnQbIwkMjjnwK4HLrmPUd7193JZ8LuBcCa7f54rlgOFo3y4tihAVLGSIWvBkHtjdB0lzAq X-Received: by 10.66.124.196 with SMTP id mk4mr46780050pab.144.1416954540465; Tue, 25 Nov 2014 14:29:00 -0800 (PST) Received: from Michaels-MacBook-Pro.local (c-98-246-202-204.hsd1.or.comcast.net. 
[98.246.202.204]) by mx.google.com with ESMTPSA id i10sm2439657pdr.21.2014.11.25.14.28.58 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Tue, 25 Nov 2014 14:28:59 -0800 (PST) Message-ID: <547502A9.5060903@callfortesting.org> Date: Tue, 25 Nov 2014 14:28:57 -0800 From: Michael Dexter User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:24.0) Gecko/20100101 Thunderbird/24.6.0 MIME-Version: 1.0 To: freebsd-fs@freebsd.org Subject: How to manually validate a gpart layout? Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 25 Nov 2014 22:37:22 -0000 Hello all, The kernel will report this at boot if a primary GPT is invalid: GEOM: da1: the primary GPT table is corrupt or invalid How does one manually validate the table with gpart? I sense that 'gpart recover' will perform enough of a test to say that recovery is not needed but it is not clear if this is the same test that the kernel is performing. Thank you, Michael (And yes, I asked Michael W. Lucas if it is covered in his new book (GO BUY IT!) and alas, no, it is not.) 
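For reference, the UEFI specification defines the minimum checks a primary GPT header must pass: LBA 1 must start with the signature "EFI PART", the HeaderSize field (offset 12) must be at least 92 bytes and no larger than a sector, and the CRC-32 stored at offset 16 must equal the CRC-32 of the first HeaderSize bytes computed with that field zeroed. A minimal userland sketch of just those header checks follows; it is not the kernel's GEOM code (whose checks may be stricter), the function names are hypothetical, and it assumes 512-byte sectors. It does not check MyLBA, the partition entry array CRC, or the backup header at the end of the disk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GPT_HDR_MIN	92	/* bytes covered by the UEFI-defined fields */
#define GPT_OFF_SIZE	12	/* HeaderSize (little-endian uint32) */
#define GPT_OFF_CRC	16	/* HeaderCRC32 (little-endian uint32) */

/* Standard reflected CRC-32 (polynomial 0xEDB88320), as used by GPT. */
uint32_t crc32_ieee(const uint8_t *buf, size_t len)
{
	uint32_t crc = 0xFFFFFFFFu;

	for (size_t i = 0; i < len; i++) {
		crc ^= buf[i];
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
	}
	return ~crc;
}

/* Read a little-endian 32-bit field. */
static uint32_t le32(const uint8_t *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	    (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* CRC-32 over the first hdr_size bytes with HeaderCRC32 zeroed. */
uint32_t gpt_header_crc(const uint8_t *sector, uint32_t hdr_size)
{
	uint8_t copy[512];	/* this sketch handles headers up to 512 bytes */

	assert(hdr_size >= GPT_HDR_MIN && hdr_size <= sizeof(copy));
	memcpy(copy, sector, hdr_size);
	memset(copy + GPT_OFF_CRC, 0, 4);
	return crc32_ieee(copy, hdr_size);
}

/* Return 1 if `sector` (the contents of LBA 1) holds a valid GPT header. */
int gpt_header_valid(const uint8_t *sector, size_t sector_size)
{
	uint32_t hdr_size;

	if (sector_size < GPT_HDR_MIN || memcmp(sector, "EFI PART", 8) != 0)
		return 0;
	hdr_size = le32(sector + GPT_OFF_SIZE);
	if (hdr_size < GPT_HDR_MIN || hdr_size > sector_size || hdr_size > 512)
		return 0;
	return gpt_header_crc(sector, hdr_size) == le32(sector + GPT_OFF_CRC);
}
```

On a 512-byte-sector disk, one could capture LBA 1 with something like `dd if=/dev/da1 of=lba1.bin bs=512 skip=1 count=1` and pass the buffer to gpt_header_valid().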
From owner-freebsd-fs@FreeBSD.ORG Wed Nov 26 00:02:32 2014 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id BA6D59D7 for ; Wed, 26 Nov 2014 00:02:32 +0000 (UTC) Received: from forward5l.mail.yandex.net (forward5l.mail.yandex.net [IPv6:2a02:6b8:0:1819::5]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "forwards.mail.yandex.net", Issuer "Certum Level IV CA" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 75ED0C98 for ; Wed, 26 Nov 2014 00:02:32 +0000 (UTC) Received: from smtp2m.mail.yandex.net (smtp2m.mail.yandex.net [77.88.61.129]) by forward5l.mail.yandex.net (Yandex) with ESMTP id EB3CAC40F9C; Wed, 26 Nov 2014 03:02:28 +0300 (MSK) Received: from smtp2m.mail.yandex.net (localhost [127.0.0.1]) by smtp2m.mail.yandex.net (Yandex) with ESMTP id 78ABA420059; Wed, 26 Nov 2014 03:02:28 +0300 (MSK) Received: from 84.201.166.117-vpn.dhcp.yndx.net (84.201.166.117-vpn.dhcp.yndx.net [84.201.166.117]) by smtp2m.mail.yandex.net (nwsmtp/Yandex) with ESMTPSA id rLYqZcawSd-2SQ4ujh8; Wed, 26 Nov 2014 03:02:28 +0300 (using TLSv1.2 with cipher AES128-SHA (128/128 bits)) (Client certificate not present) X-Yandex-Uniq: 09614d4c-89d1-42a5-aec6-72b53c2d9af2 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=yandex.ru; s=mail; t=1416960148; bh=4LeJ/eOPjJ3A0o6drjTYmpIBINZt9+USrIQeWhdyx/Q=; h=Message-ID:Date:From:User-Agent:MIME-Version:To:Subject: References:In-Reply-To:Content-Type:Content-Transfer-Encoding; b=N6hLQrTlT4uZJgebNz4JIff02ENy4hjEZ9eEpORdQxWp6SASXZQ9rXb3TTDx+ACNr UXI4OLKyO5XvB4ZSVU3kprH6wAGeL4yFJmRT1MCpmfOZjsPAmu/7tMollV/yNoI3LX NuFT3h42gR90Hs3Z9Dcvp6VxrcaFnGPcQgcxKw9o= Authentication-Results: smtp2m.mail.yandex.net; dkim=pass header.i=@yandex.ru Message-ID: <54751876.3070801@yandex.ru> Date: Wed, 26 Nov 2014 03:01:58 +0300 
From: "Andrey V. Elsukov" User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:31.0) Gecko/20100101 Thunderbird/31.2.0 MIME-Version: 1.0 To: Michael Dexter , freebsd-fs@freebsd.org Subject: Re: How to manually validate a gpart layout? References: <547502A9.5060903@callfortesting.org> In-Reply-To: <547502A9.5060903@callfortesting.org> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 26 Nov 2014 00:02:32 -0000 On 26.11.2014 01:28, Michael Dexter wrote: > The kernel will report this at boot if a primary GPT is invalid: > > GEOM: da1: the primary GPT table is corrupt or invalid > > How does one manually validate the table with gpart? I sense that 'gpart > recover' will perform enough of a test to say that recovery is not > needed but it is not clear if this is the same test that the kernel is > performing. Hi, gpart(8) only sends control requests to the kernel, it doesn't do any tests. -- WBR, Andrey V. 
Elsukov From owner-freebsd-fs@FreeBSD.ORG Wed Nov 26 05:43:43 2014 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 5B185BB4 for ; Wed, 26 Nov 2014 05:43:43 +0000 (UTC) Received: from mail-wg0-f45.google.com (mail-wg0-f45.google.com [74.125.82.45]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id DEFC31FB for ; Wed, 26 Nov 2014 05:43:42 +0000 (UTC) Received: by mail-wg0-f45.google.com with SMTP id b13so2736285wgh.18 for ; Tue, 25 Nov 2014 21:43:35 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to:cc:content-type; bh=qAkdZBxjTbGx9um1v6E2BrykMabHzWYHkbofI4egPv0=; b=h3eao30SxhMCUfIv/PAMqAs8M8IB0XNt43SOh20fHuBAaNbiLJDLwyCuhBEmUJETsT W6pCADnBvUZipCHev2Bx21wOPdp5p5iorml6y0YxMo/NTrwz5x2ssjukwuEc3PAILc2D rU855TDnu9HETTPmQox2i3OFS+aXPgAaPSSq4B9DmDEAjxnRx4QNJEq0h/1miiR3zqrB nAFl3M2OVPkgqIEOSELZCGcxjoUkObRWI7/jRbo/4dRsflCgsZ8rAhKQdDSCBPR5f97O msS0/NgumAnjaX8GuJEgQahHxmjBRjYnZIwBeXjjMX27UyHW+JDweybWRFv/+2jBV7Nl 9zDg== X-Gm-Message-State: ALoCoQlGherbPgZhpLLYj8iXfQOb164H0TLWVqiEQqWyutOPtbzLfdwhtLAzZrEbs02FtkM6yn+0 X-Received: by 10.180.21.166 with SMTP id w6mr37858888wie.43.1416980615320; Tue, 25 Nov 2014 21:43:35 -0800 (PST) Received: from mail-wi0-f171.google.com (mail-wi0-f171.google.com. 
[209.85.212.171]) by mx.google.com with ESMTPSA id pf4sm4783601wjb.36.2014.11.25.21.43.34 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Tue, 25 Nov 2014 21:43:34 -0800 (PST) Received: by mail-wi0-f171.google.com with SMTP id bs8so11339303wib.4 for ; Tue, 25 Nov 2014 21:43:34 -0800 (PST) X-Received: by 10.180.83.105 with SMTP id p9mr32642132wiy.49.1416980614577; Tue, 25 Nov 2014 21:43:34 -0800 (PST) MIME-Version: 1.0 Received: by 10.194.157.137 with HTTP; Tue, 25 Nov 2014 21:43:14 -0800 (PST) In-Reply-To: References: From: Jov Date: Wed, 26 Nov 2014 13:43:14 +0800 Message-ID: Subject: Re: video and slides from 2014 OpenZFS Developer Summit To: Matthew Ahrens Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.18-1 Cc: developer , freebsd-fs , illumos-zfs , zfs-discuss , "admin@open-zfs.org" X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 26 Nov 2014 05:43:43 -0000 Great！Thanks！ Jov blog: http:amutu.com/blog 2014-11-25 5:51 GMT+08:00 Matthew Ahrens : > Thanks to everyone who participated at the 2014 OpenZFS Developer Summit, 2 > weeks ago; we received a lot of positive feedback on the event! > > We have posted the video recordings and slides from the event, including 12 > main talks and the presentations from the hackathon. > > http://www.open-zfs.org/wiki/OpenZFS_Developer_Summit_2014 > > --matt > > p.s. My apologies that the audio is not the best -- turn the volume all the > way up. We'll find a better solution for next year.
> _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" > From owner-freebsd-fs@FreeBSD.ORG Wed Nov 26 10:14:13 2014 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 92AD2918; Wed, 26 Nov 2014 10:14:13 +0000 (UTC) Received: from cu01176b.smtpx.saremail.com (cu01176b.smtpx.saremail.com [195.16.151.151]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 4FE23FBA; Wed, 26 Nov 2014 10:14:12 +0000 (UTC) Received: from [172.16.2.2] (izaro.sarenet.es [192.148.167.11]) by proxypop04.sare.net (Postfix) with ESMTPSA id 39FB29DF488; Wed, 26 Nov 2014 11:14:04 +0100 (CET) Subject: Re: BIOS booting from disks > 2TB Mime-Version: 1.0 (Apple Message framework v1283) Content-Type: text/plain; charset=iso-8859-1 From: Borja Marcos In-Reply-To: <201411241425.32989.jhb@freebsd.org> Date: Wed, 26 Nov 2014 11:14:00 +0100 Content-Transfer-Encoding: quoted-printable Message-Id: <3C86981C-F332-43B6-979E-6B9786DF9584@sarenet.es> References: <17A2AC72-AD70-480A-9BAC-9CC8EAFD572F@we.lc.ehu.es> <201411201110.45066.jhb@freebsd.org> <201411241425.32989.jhb@freebsd.org> To: John Baldwin X-Mailer: Apple Mail (2.1283) Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 26 Nov 2014 10:14:13 -0000 On Nov 24, 2014, at 8:25 PM, John Baldwin wrote: > On Thursday, November 20, 2014 2:08:54 pm borjam@sarenet.es wrote: >> On 20.11.2014 17:10, John Baldwin wrote: >>> Can you
start with 'lsdev -v' at the loader prompt? >> >> Sure! >> >> cd devices: >> disk devices: >> disk0: BIOS drive C: >> pxe devices: > > Ugh. So it means we aren't seeing any partitions or filesystems on > disk0, and that is why the loader craps out. Can you build a loader with > "-DDISK_DEBUG" enabled in CFLAGS and capture the output from that starting > up? Sure, I'll try to find some time today. Borja. From owner-freebsd-fs@FreeBSD.ORG Wed Nov 26 10:57:53 2014 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id E70CD1C9 for ; Wed, 26 Nov 2014 10:57:53 +0000 (UTC) Received: from smtprelay05.ispgateway.de (smtprelay05.ispgateway.de [80.67.31.97]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id A817768E for ; Wed, 26 Nov 2014 10:57:53 +0000 (UTC) Received: from [84.44.211.170] (helo=fabiankeil.de) by smtprelay05.ispgateway.de with esmtpsa (TLSv1.2:AES128-GCM-SHA256:128) (Exim 4.84) (envelope-from ) id 1XtaI4-000810-PS; Wed, 26 Nov 2014 11:57:44 +0100 Date: Wed, 26 Nov 2014 11:57:47 +0100 From: Fabian Keil To: Matthew Ahrens Subject: Re: video and slides from 2014 OpenZFS Developer Summit Message-ID: <1a14a4a4.41455a02@fabiankeil.de> In-Reply-To: References: MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; boundary="Sig_/GNTuOKbbM=umbI1/JahZ8cm"; protocol="application/pgp-signature" X-Df-Sender: Nzc1MDY3 Cc: freebsd-fs X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 26 Nov 2014 10:57:54 -0000 --Sig_/GNTuOKbbM=umbI1/JahZ8cm Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: quoted-printable Matthew Ahrens wrote: > Thanks to everyone who participated at the 2014 OpenZFS Developer Summit, 2 > weeks ago; we received a lot of positive feedback on the event! > > We have posted the video recordings and slides from the event, including 12 > main talks and the presentations from the hackathon. > > http://www.open-zfs.org/wiki/OpenZFS_Developer_Summit_2014 Thanks a lot. In somewhat related news, I recently published the slides for a zogftw presentation I did at ORR 2014: http://programm.openrheinruhr.de/2014/events/323.de.html The presentation was given in German, but the slides mostly consist of terminal "screen shots" and may be of interest to non-German speakers as well. Fabian --Sig_/GNTuOKbbM=umbI1/JahZ8cm Content-Type: application/pgp-signature Content-Description: OpenPGP digital signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iEYEARECAAYFAlR1sisACgkQBYqIVf93VJ15kgCgx4yh9SrzIsAs2v/eD2zEr5lz 00AAn1VMZGTgyw2Moj0Lgp0J1MRRYX89 =his6 -----END PGP SIGNATURE----- --Sig_/GNTuOKbbM=umbI1/JahZ8cm-- From owner-freebsd-fs@FreeBSD.ORG Wed Nov 26 18:14:09 2014 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id AF5B1E28; Wed, 26 Nov 2014 18:14:09 +0000 (UTC) Received: from odin.blazingdot.com (odin.blazingdot.com [204.109.60.170]) by mx1.freebsd.org (Postfix) with ESMTP id 9579AD49; Wed, 26 Nov 2014 18:14:09 +0000 (UTC) Received: by odin.blazingdot.com (Postfix, from userid 1001) id CC54D13143F; Wed, 26 Nov 2014 13:06:33 -0500 (EST) Date: Wed, 26 Nov 2014 13:06:33 -0500 From: Marcus Reid To: freebsd-current@freebsd.org, freebsd-fs@freebsd.org Subject: Delayed atime updates ("lazytime") Message-ID: <20141126180633.GA69028@blazingdot.com> MIME-Version: 1.0 Content-Type: text/plain;
charset=us-ascii Content-Disposition: inline X-Coffee-Level: nearly-fatal User-Agent: Mutt/1.5.23 (2014-03-12) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 26 Nov 2014 18:14:09 -0000 Hi, Looks like Linux is about to grow another solution to handling atime updates differently: http://lwn.net/SubscriberLink/621046/e59938475fd3e874/ In short, it will only write out atime changes periodically (daily), or if there is another reason to write out the inode, or if the inode is about to be pushed out of cache. This seems like a pretty good compromise. Currently, the ZFS configuration that results from using bsdinstall disables atime on all but /var/mail, which is the only example of disabling atime by default that I'm aware of outside of Gentoo Linux. I can't seem to find any information that talks about the rationale behind that, though a couple things come to mind: - some additional IO generated (but that's always been the case) - additional wear on SSD devices (enough to compel the change?) - zfs snapshot growth (but the snapshot stops growing after one full set of inode updates) - wake up otherwise idle spinning media on a laptop (the actual reason that was cited as motivation for the change) Something like lazytime would address most of those concerns, and people who are even more OCD than that could disable atime completely on their machine. 
Marcus From owner-freebsd-fs@FreeBSD.ORG Wed Nov 26 19:45:50 2014 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 8EC9DE8; Wed, 26 Nov 2014 19:45:50 +0000 (UTC) Received: from mail-ie0-x22c.google.com (mail-ie0-x22c.google.com [IPv6:2607:f8b0:4001:c03::22c]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 5D1B38DD; Wed, 26 Nov 2014 19:45:50 +0000 (UTC) Received: by mail-ie0-f172.google.com with SMTP id tr6so3331595ieb.3 for ; Wed, 26 Nov 2014 11:45:49 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:date:message-id:subject :from:to:cc:content-type; bh=Ha84WawLkX1GqtJTMN2NgsrNTAmQAN04zoM2SYz3U6U=; b=mGZ2q9Ifu/3s+30sTLr9ogRyU8NfjpPMMzatAPNeAJbCBGrMMVWThSUwSXe5uf76cH EVeHrfXfJfJMu+ZoYbAj9HXxgGTzYvgdxmBagoPzOtTsLPBpX3klha+I32DvZ7EwMkoG ieo4CaKwtqzOEVvYul+tvNHL4Dk+nOPlLQEdN4zYLU+YRw1MGr0/OPFrX3xlHLGeUpdg PIwe9jk0jTbmf+DeBVASeUByF4R60AFlY38LVb+DhDJS1tnzYR9OomOkZs2nH/MI52Ig 5fH1DNVcbhQ9QngvjRPvgued0838KLl8uSm59bK9rNm+StzheckVjpWQiHMu5f05Sr3H qKaw== MIME-Version: 1.0 X-Received: by 10.43.75.138 with SMTP id za10mr31832998icb.23.1417031149562; Wed, 26 Nov 2014 11:45:49 -0800 (PST) Sender: kob6558@gmail.com Received: by 10.107.7.169 with HTTP; Wed, 26 Nov 2014 11:45:49 -0800 (PST) In-Reply-To: <20141126180633.GA69028@blazingdot.com> References: <20141126180633.GA69028@blazingdot.com> Date: Wed, 26 Nov 2014 11:45:49 -0800 X-Google-Sender-Auth: GS3Coc1vbtv4zLsZVAXuZionv8I Message-ID: Subject: Re: Delayed atime updates ("lazytime") From: Kevin Oberman To: Marcus Reid Content-Type: text/plain; charset=UTF-8 X-Content-Filtered-By: 
Mailman/MimeDel 2.1.18-1 Cc: FreeBSD FS , FreeBSD Current X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 26 Nov 2014 19:45:50 -0000 On Wed, Nov 26, 2014 at 10:06 AM, Marcus Reid wrote: > Hi, > > Looks like Linux is about to grow another solution to handling atime > updates differently: > > http://lwn.net/SubscriberLink/621046/e59938475fd3e874/ > > In short, it will only write out atime changes periodically (daily), or > if there is another reason to write out the inode, or if the inode is > about to be pushed out of cache. This seems like a pretty good > compromise. > > Currently, the ZFS configuration that results from using bsdinstall > disables atime on all but /var/mail, which is the only example of > disabling atime by default that I'm aware of outside of Gentoo Linux. > I can't seem to find any information that talks about the rationale > behind that, though a couple things come to mind: > > - some additional IO generated (but that's always been the case) > - additional wear on SSD devices (enough to compel the change?) > - zfs snapshot growth (but the snapshot stops growing after one > full set of inode updates) > - wake up otherwise idle spinning media on a laptop (the actual reason > that was cited as motivation for the change) > > Something like lazytime would address most of those concerns, and people > who are even more OCD than that could disable atime completely on their > machine. > > Marcus > > About time. VMS started doing this over a quarter century ago. Worked very well. Of course, the VMS file system (ODS-2) has little in common with either ZFS or UFS, but it had an interesting twist. 
There was a per-disk update "window" that could be modified on a per-file basis, so that you could specify the "update atime for every access" if you really needed it, but normally it would only update atime every so many seconds. I don't remember the system default any more. This kept almost everyone happy. VMS previously had no equivalent to atime and had lots of requests for it, but the developers did not want to impact performance as drastically as updating the access time on every access would have done. I don't know how or if such a scheme could be implemented in FreeBSD file systems, but it was a very nice way of handling the issue. -- R. Kevin Oberman, Network Engineer, Retired E-mail: rkoberman@gmail.com From owner-freebsd-fs@FreeBSD.ORG Wed Nov 26 20:30:23 2014 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 2879AE50; Wed, 26 Nov 2014 20:30:23 +0000 (UTC) Received: from anubis.delphij.net (anubis.delphij.net [IPv6:2001:470:1:117::25]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "anubis.delphij.net", Issuer "StartCom Class 1 Primary Intermediate Server CA" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 023ABE1C; Wed, 26 Nov 2014 20:30:23 +0000 (UTC) Received: from zeta.ixsystems.com (unknown [12.229.62.2]) (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by anubis.delphij.net (Postfix) with ESMTPSA id 2EF60F7DA; Wed, 26 Nov 2014 12:30:22 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=delphij.net; s=anubis; t=1417033822; x=1417048222; bh=/mw5x7VqQvTHv9S9/O8T6v5vMe9KCysAIzvuvnL1q7M=; h=Date:From:Reply-To:To:Subject:References:In-Reply-To; b=4iTPbm4QXhcED5rcxqwTa23DhsPehSY0lFKoHvQNv6iDdCSNQk+oU0tt6XE1Xblfn
9BnlWyPh2Oj6yCvYAR/TiieAeC9iasupforjPKJ+lSFtGgteimn0WlUuF2AVCIm01v A/2a8x5rToiX1q/dOGjLqRtP/hBIDXgAjDoijWo4= Message-ID: <5476385D.5020404@delphij.net> Date: Wed, 26 Nov 2014 12:30:21 -0800 From: Xin Li Reply-To: d@delphij.net Organization: The FreeBSD Project MIME-Version: 1.0 To: Marcus Reid , freebsd-current@freebsd.org, freebsd-fs@freebsd.org Subject: Re: Delayed atime updates ("lazytime") References: <20141126180633.GA69028@blazingdot.com> In-Reply-To: <20141126180633.GA69028@blazingdot.com> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 8bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 26 Nov 2014 20:30:23 -0000 -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA512 On 11/26/14 10:06, Marcus Reid wrote: > Hi, > > Looks like Linux is about to grow another solution to handling > atime updates differently: > > http://lwn.net/SubscriberLink/621046/e59938475fd3e874/ > > In short, it will only write out atime changes periodically > (daily), or if there is another reason to write out the inode, or > if the inode is about to be pushed out of cache. This seems like a > pretty good compromise. > > Currently, the ZFS configuration that results from using > bsdinstall disables atime on all but /var/mail, which is the only > example of disabling atime by default that I'm aware of outside of > Gentoo Linux. I can't seem to find any information that talks about > the rationale behind that, though a couple things come to mind: > > - some additional IO generated (but that's always been the case) - > additional wear on SSD devices (enough to compel the change?) 
- zfs > snapshot growth (but the snapshot stops growing after one full set > of inode updates) - wake up otherwise idle spinning media on a > laptop (the actual reason that was cited as motivation for the > change) > > Something like lazytime would address most of those concerns, and > people who are even more OCD than that could disable atime > completely on their machine. I think bsdinstall disables atime because it's a "useful default". The lazytime idea seems to be a better compromise. PS. A while back I implemented a 'relatime' feature on FreeBSD in a private branch on my github repository, but never pushed it further, partly due to a difference in semantics (which needs to be fixed: atime should still be updated after some time, while my version only updates it once; the Linux semantics are more useful for cleanup applications to identify unused files) and partly due to lack of interest from the community. Cheers, - -- Xin LI https://www.delphij.net/ FreeBSD - The Power to Serve! Live free or die -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.1 iQIcBAEBCgAGBQJUdjhZAAoJEJW2GBstM+nsJvsQAJAYNhKU3+3OTIEX7+1w1WlQ SPO55FrZ86nRfYIDbioafXqXki5QrjDrZLwaP2wwLMOmclZBVxliKiFUnRSXNdl+ q0j2jSYiue3GKNvN6nLRTCWqe4lYg46btmVhqBsJnATLxDq4fH/5+FwsORgSgTOq LENUyYDJ8beuYCCD52Rs7RklNhQqfEPPbNWclLuWqjq6YYcqfRjgXD0PHJpmhMcR NOMRnkv8BtcvsOwD09uYqfsWZX5cO2yb1JdlvGRVft6xHLLOhCaAxOhhz7yeTSzq OrvUSRw2rCRJdNqfUpLcN1oK7Fu2f13HrqPXGeOKc96VE6pX2ADaoCtKXgtDFf0W qCmR1jhu5v/NAHxTZjRR+Lpf3zO/NA0lS3+uCFjxFjBy5NwFdh2MsNRBWV6EBdYF kJ5DqsIqLfW89F7jtKnp3qaxhyySwKlgqDooVMrClCkz6Doy84dBzA44b8yQnHri YcUlXgfBz33qfMP+pywRKOC25mQe05u1yk33dp1QTTxPVW+BvDMxgwaTqSpqTvyB yHTm//Dz+UdNDkxL82aVw4pfNhhOPb52jWz7MNTVYTP15w3+rY45sChgux02ltNE gEm1MnJIBYmFNQq5orcjLSGIKTL6VlrDmC6rd7zXEagQ1D34LknziE61m6/yeZTI 4lcmm6CWRz/L2cfOLR/p =I8Cg -----END PGP SIGNATURE----- From owner-freebsd-fs@FreeBSD.ORG Thu Nov 27 23:51:04 2014 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org
[IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 7F57D857 for ; Thu, 27 Nov 2014 23:51:04 +0000 (UTC) Received: from mail.yellowspace.net (mail.yellowspace.net [80.190.192.217]) by mx1.freebsd.org (Postfix) with ESMTP id 27A80F5C for ; Thu, 27 Nov 2014 23:51:03 +0000 (UTC) Received: from peta.fritz.box ([185.17.207.118]) (AUTH: PLAIN lopez.on.the.lists@yellowspace.net, SSL: TLSv1/SSLv3, 128bits, AES128-SHA) by mail.yellowspace.net with esmtp; Fri, 28 Nov 2014 00:45:52 +0100 id 02B25C2A.000000005477B7B0.0000178F Message-ID: <5477B7AF.5020802@yellowspace.net> Date: Fri, 28 Nov 2014 00:45:51 +0100 From: Lorenzo Perone User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:24.0) Gecko/20100101 Thunderbird/24.6.0 MIME-Version: 1.0 To: freebsd-fs@freebsd.org Subject: Re: HAST, zvols, istgt and carp working... References: In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: bill@ethernext.com X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 27 Nov 2014 23:51:04 -0000 Hello, I am quoting the below thread fully as this is an old thread. But I ended up here while searching on the same subject. I have a question on the same kind of setup. In some situations, I thought it might be useful to use HAST on ZVOLs for single jails - which might be running on machine a and b or c and d. So my idea is the following: HAST on a zvol, with two volumes per filesystem: One for a UFS filesystem and one for a gjournal. 
Example - quick code rush:

zfs create -o compression=on -s -b 4096 -V 2T rtank/vols/jail1
zfs create -o compression=on -b 4096 -V 32G rtank/vols/jail1_j
hastctl create jail1
service hastd onestart
hastctl create jail1_j
hastctl role primary jail1
hastctl role primary jail1_j
gjournal label hast/jail1 hast/jail1_j
newfs -J -U /dev/hast/jail1.journal
mount /dev/hast/jail1.journal /jails/jail1

I thought of gjournal, as that would allow me to choose replication = memsync for the filesystem and async for the journal: a speedup while keeping the filesystem consistent. As far as I can tell, it works in my tests, and performance is OK. So, now down to my question: It would be great to be able to mount read-only snapshots of the zvol... But it does not seem to succeed:

zfs snapshot rtank/vols/jail1@hellotest
mount -t ufs -o ro /dev/zvol/rtank/vols/jail1@hellotest /mnt
mount: /dev/zvol/rtank/vols/jail1@hellotest: Invalid argument

Even doing a clone does not succeed. I guess this is because the newfs, of course, was done on the hast volume. Is there any way to mount the snapshot read-only, or to have a 'hast' wrapper for the snapshot without having to really 'hast' it? Thanx a lot for any comment or hint (even if it is: "bad idea, don't do any of that"). Of course I could use the hast vols to make another zpool on top of them, and then snapshot those. But I had a bad feeling @ reliability for a setup like that (correct feeling?) Thanks a lot in advance for any time taken by anyone to reply.. Greetings and Regards, Lorenzo On 01.03.11 04:35, Bill Desjardins wrote: > Hello All, > > as an experiment today I set up a couple 8-stable guests on vmware ESXI to test > hast with zfs, carp and istgt for a redundant nas system I am putting > together. > I haven't seen any mention that anyone has used hast to mirror a zfs zvol so I > figured I would try it and at least my proof of concept seems to work just fine. > > is anyone doing this and using it in a production environment?
> > here's how I set up the testing environment... > > - created (2) 8-stable hosts on esxi 4.1: hast-1 & hast-2 (os on da0) > > on both hast-1 and hast-2 > > - added 4 x 8GB disks to each (da1 - da4) > - glabel'd disks disk1 - disk4 > - zpool create tank mirror disk1 disk2 mirror disk3 disk4 > - zfs create -p -s -b 64k -V 4G tank/hzvol.1 > > hast.conf on each > > resource tank_hzvol.1 { > local /dev/zvol/tank/hzvol.1 > on hast-1 { > remote x.x.x.9 > } > on hast-2 { > remote x.x.x.8 > } > } > > on hast-1 and hast-2 > >> hastd_enable="YES" in rc.conf >> hastctl create tank_hzvol.1 >> /etc/rc.d/hastd start > > on hast-2 > >> hastctl role secondary tank_hzvol.1 > > on hast-1 > >> hastctl role primary tank_hzvol.1 > > hastctl status reports all is well so far... > > next I configured istgt identically on hast-1 and hast-2 for the hast device > >>> LUN0 Storage /dev/hast/tank_hzvol.1 3G > > istgt was started (istgt onestart) and the zvol target was set up on > another vmware esxi server > which then was formatted as a vmfs volume. I created a 2GB disk on this volume > and added it to another 8-stable host as a ufs disk mounted on /mnt. so far > going good, everything working as expected. > > to test hast replication, I created a few 200MB files on the host with the > ufs vmdk volume and saw traffic over the hast network from hast-1 to hast-2. on > hast-1, the zvol size reflected correct sparse disk space usage, but > hast-2 showed > the full 4GB zvol allocated which I suspect is due to hast. > > to test failover of the iscsi zvol target from hast-1 to hast-2: > > on hast-1 > >> istgt stop >> hastctl role secondary tank_hzvol.1 > > on hast-2 > >> hastctl role primary tank_hzvol.1 >> istgt onestart > > NOTE: carp does not seem to work on esxi for me so between hast-1 and hast-2 I > manually moved the IP for istgt to hast-2.
> > the result was that the istgt hast zvol successfully failed over to hast-2 > with only a brief stall while I manually performed the failover process. I only > performed the ideal manual failover scenario for proof of concept. I will be > testing this on 2 real development servers later this week for a more > complete understanding. > > > I see some real advantages for zvol-only hast: > ++++++++++++++++++++++++++++++++++++++++++++++++ > > + no need to hast each individual disk in the zpool so you can access all > available storage on either storage unit > + maintaining storage units remains functionally consistent between them > + once set up, zvols are easily migrated to new storage environments in real-time > since there is only a single zvol hast resource to replicate. (no need > to have all > matching zpool hast members, just reconfigure the primary zvol hast > resource to point to > a new secondary server and swap roles/failover when ready) > + can have active hast zvols on each unit to distribute IO > + no need for zpool export/import on failover > + hast easily added to current zvols > + retains performance of entire zpool > + zpool can be expanded without changing hast config > + minimizes hast replication traffic between storage units > + hast split-brain localized to specific zvols > + can use ufs on hast zvol resource for things like samba and nfs > > cons > ------------------------------------------- > > - performance impact (???) > - each hast zvol requires distinct application configurations (more > configurations to deal with/screw up) > - zfs sparse volumes seem not to be working correctly via hast (???) > - expanding zvol requires hastctl create, init, startup plus may need > application specific changes/restart. > - other methods needed to replicate data in rest of pool > - possible long rebuild time on large zvols? > - snapshots / rollbacks (???) > - many more???
> > my main question is if using hast to replicate a zvol is a supported > configuration and what are the possible drawbacks? It's more than > likely I am overlooking some very basic requirement/restrictions and > am blatantly wrong in all this, but if it can perform, I think it's a big+ > for freebsd and zfs usability as a nas server. > > thoughts? comments? criticisms? :) > > Best, > > Bill > _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" > From owner-freebsd-fs@FreeBSD.ORG Fri Nov 28 11:16:32 2014 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 09C47BB5 for ; Fri, 28 Nov 2014 11:16:32 +0000 (UTC) Received: from mail.yellowspace.net (mail.yellowspace.net [80.190.192.217]) by mx1.freebsd.org (Postfix) with ESMTP id 9259C829 for ; Fri, 28 Nov 2014 11:16:29 +0000 (UTC) Received: from furia.intranet ([88.217.4.119]) (AUTH: PLAIN lopez.on.the.lists@yellowspace.net, SSL: TLSv1/SSLv3, 128bits, AES128-SHA) by mail.yellowspace.net with esmtp; Fri, 28 Nov 2014 12:16:22 +0100 id 02B25C31.0000000054785986.00012ADC Message-ID: <54785AA1.10107@yellowspace.net> Date: Fri, 28 Nov 2014 12:21:05 +0100 From: Lorenzo Perone User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:31.0) Gecko/20100101 Thunderbird/31.2.0 MIME-Version: 1.0 To: freebsd-fs@freebsd.org Subject: Mounting a filesystem on a HAST provider without HAST (was: Re: HAST, zvols, istgt and carp working...)
References: <5477B7AF.5020802@yellowspace.net> In-Reply-To: <5477B7AF.5020802@yellowspace.net> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 28 Nov 2014 11:16:32 -0000 Hi All, I succeeded on this topic, but a new question arose - so I'd like to share the (provisory) solution as well as the question... On 28.11.14 00:45, Lorenzo Perone wrote: > Is there any way to mount the snapshot [of a provider used by hast] read only - or to have a 'hast' > wrapper for the snapshot without having to really 'hast' it? I was able to succeed after comparing a hexdump of the first few MBs of the original zvol and the vol "under hast". ( /dev/zvol/rtank/vols/jail1@hellotest vs. /dev/hast/jail1 ) The UFS filesystem on the zvol provider begins after 135168 bytes (132KB) of hast-header. So I tried to create a transparent provider with gnop:

gnop create -o 135168 /dev/zvol/rtank/vols/jail1@hellotest

and the mount

mount -t ufs -o ro /dev/zvol/rtank/vols/jail1@hellotest.nop /mnt

succeeded. Now my new question is: Can I "safely" assume that this header is always 132K (I assume not - it probably contains the bitmap...)? In this case, which formula should be applied to calculate the offset? Thanks a lot in advance for any hint, And to the FreeBSD team / pjd: THANKs for GEOM! :-) It really is the best invention closely after espresso coffee...
Greetings and Regards,

From owner-freebsd-fs@FreeBSD.ORG Sat Nov 29 12:31:35 2014 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 40FCD72E for ; Sat, 29 Nov 2014 12:31:35 +0000 (UTC) Received: from mail-wi0-x22a.google.com (mail-wi0-x22a.google.com [IPv6:2a00:1450:400c:c05::22a]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id C3B2E17A for ; Sat, 29 Nov 2014 12:31:34 +0000 (UTC) Received: by mail-wi0-f170.google.com with SMTP id bs8so23027848wib.5 for ; Sat, 29 Nov 2014 04:31:33 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=date:from:to:cc:subject:message-id:references:mime-version :content-type:content-disposition:in-reply-to:user-agent; bh=PAXNUxYXPlVI2e0Jiyg+Vopt0RZuuXIBFPzYkK/Pg10=; b=Ld9lgXQ45hZJ2O/NILeQO1lCLBBooNwBgl95OfGPkJV/gdHOFzsb3hJtxolM1GHJM9 mMqo54POUtXd6xicf2+D+/DY29rtbI54WOx+pVG0RhEXRKlgAgByNPZ5XT166IJk9z+e kthKyQJIyst0zN9drwTlFc61OP0JqjSOEgX9AzV6dQtcZP95fR2zNHJQcMHK+katbMml qr9HyiCc7/UhgMzyNf1NOvgJsrajIBkLv7NKmtvWv21cQHyXYdH0JdKNgSaVyU0p3x6B 1/EX/uPWtlwcPv1qXEjwtSf8BUULRs4egEqZMn6gaZA/N1s9dvpicqSXIGIf5rBCFc6X vAeg== X-Received: by 10.181.8.72 with SMTP id di8mr42394106wid.1.1417264293226; Sat, 29 Nov 2014 04:31:33 -0800 (PST) Received: from localhost ([91.225.202.111]) by mx.google.com with ESMTPSA id h2sm19775259wix.5.2014.11.29.04.31.32 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Sat, 29 Nov 2014 04:31:32 -0800 (PST) Date: Sat, 29 Nov 2014 12:31:30 +0000 From: Mykola Golub To: Lorenzo Perone Subject: Re: Mounting a filesystem on a HAST provider without HAST (was: Re: HAST, zvols, istgt and carp working...)
Message-ID: <20141129123130.GA21946@gmail.com> References: <5477B7AF.5020802@yellowspace.net> <54785AA1.10107@yellowspace.net> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <54785AA1.10107@yellowspace.net> User-Agent: Mutt/1.5.23 (2014-03-12) Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 29 Nov 2014 12:31:35 -0000

On Fri, Nov 28, 2014 at 12:21:05PM +0100, Lorenzo Perone wrote:
> The UFS filesystem on the zvol provider begins after 135168 bytes
> (132KB) of hast-header.
>
> So I tried to create a transparent provider with gnop:
>
> gnop create -o 135168 /dev/zvol/rtank/vols/jail1@hellotest
>
> and the mount
>
> mount -t ufs -o ro /dev/zvol/rtank/vols/jail1@hellotest.nop /mnt
>
> succeeded.
>
> Now my new question is: Can I "safely" assume that this header is always
> 132K (I assume not - it probably contains the bitmap...)? In this case,
> which formula should be applied to calculate the offset?

The header contains 4096 bytes of HAST metadata (METADATA_SIZE) plus the
map, whose size depends on the media size and extent size. You can obtain
the offset by running `hastctl dump' command (localcnt field).

If you still want to calculate it yourself, see how hastctl calculates
hr_localoff in sbin/hastctl/hastctl.c:create_one() with the help of
sbin/hastd/metadata.c:activemap_calc_ondisk_size(). In short, you need
to calculate number of extents (mediasize / extentsize) and reserve a
bit for every extent in the map, rounding up to the sector size.
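As a rough illustration of the calculation described above, here is a minimal POSIX sh sketch. The function name `hast_localoff` is hypothetical. The 4096-byte METADATA_SIZE and the one-bit-per-extent map rounded up to the sector size come from the reply; counting extents over the space left after the metadata block (mediasize - 4096) is an assumption based on a reading of create_one(), so compare the result with `hastctl dump` before relying on it.

```shell
#!/bin/sh
# Sketch: offset at which HAST data (e.g. a UFS filesystem) starts on the
# local provider. Assumptions, not copied from hastd itself:
#   - METADATA_SIZE is 4096 bytes
#   - extents are counted over (mediasize - METADATA_SIZE), rounded up
#   - the map holds one bit per extent, rounded up to whole sectors
# Usage: hast_localoff MEDIASIZE EXTENTSIZE SECTORSIZE   (all in bytes)
hast_localoff() {
	hl_mediasize=$1
	hl_extentsize=$2
	hl_sectorsize=$3
	hl_metadata=4096

	# Number of extents; a partial extent at the end still needs a bit.
	hl_nextents=$(( (hl_mediasize - hl_metadata + hl_extentsize - 1) / hl_extentsize ))
	# One bit per extent, converted to bytes and rounded up.
	hl_mapbytes=$(( (hl_nextents + 7) / 8 ))
	# Round the map up to a whole number of sectors.
	hl_mapsize=$(( (hl_mapbytes + hl_sectorsize - 1) / hl_sectorsize * hl_sectorsize ))

	echo $(( hl_metadata + hl_mapsize ))
}
```

For example, `hast_localoff 2199023255552 2097152 512` (a 2 TiB provider with 2 MiB extents and 512-byte sectors) prints 135168. `diskinfo -v` reports a provider's sector size and media size in bytes; the extent size is the one the resource was created with (`hastctl create -e`, 2 MB by default according to hastctl(8)), and if `-m` was used at creation time, that media size applies instead of the provider's.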
--
Mykola Golub

From owner-freebsd-fs@FreeBSD.ORG Sat Nov 29 15:42:24 2014 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id C144CA4D for ; Sat, 29 Nov 2014 15:42:24 +0000 (UTC) Received: from mail.yellowspace.net (mail.yellowspace.net [80.190.192.217]) by mx1.freebsd.org (Postfix) with ESMTP id 69A6F69E for ; Sat, 29 Nov 2014 15:42:21 +0000 (UTC) Received: from peta.fritz.box ([88.217.180.10]) (AUTH: PLAIN lopez.on.the.lists@yellowspace.net, SSL: TLSv1/SSLv3, 128bits, AES128-SHA) by mail.yellowspace.net with esmtp; Sat, 29 Nov 2014 16:41:25 +0100 id 02B81C2F.000000005479E925.0000A2C5 Message-ID: <5479E91F.6020809@yellowspace.net> Date: Sat, 29 Nov 2014 16:41:19 +0100 From: Lorenzo Perone User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:24.0) Gecko/20100101 Thunderbird/24.6.0 MIME-Version: 1.0 To: Mykola Golub Subject: Re: Mounting a filesystem on a HAST provider without HAST References: <5477B7AF.5020802@yellowspace.net> <54785AA1.10107@yellowspace.net> <20141129123130.GA21946@gmail.com> In-Reply-To: <20141129123130.GA21946@gmail.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 29 Nov 2014 15:42:24 -0000

On 29.11.14 13:31, Mykola Golub wrote:
> You can obtain the offset by running `hastctl dump' command (localcnt
> field).
>
> If you still want to calculate it yourself, see how hastctl calculates
> hr_localoff in sbin/hastctl/hastctl.c:create_one() with the help of
> sbin/hastd/metadata.c:activemap_calc_ondisk_size(). In short, you need
> to calculate number of extents (mediasize / extentsize) and reserve a
> bit for every extent in the map, rounding up to the sector size.

Thanks a lot for both pointers! hastctl dump will do perfectly well for now; it can easily be wrapped in a local shell script.

Greetings and Regards,
Lorenzo
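A local wrapper along those lines might look like the following POSIX sh sketch. The script and function names, the argument order, and the error handling are all hypothetical. It assumes that `hastctl dump RESOURCE` prints the data offset on a line labeled `localoff:` (mirroring the `hr_localoff` variable mentioned earlier in the thread); if your `hastctl dump` output labels that field differently, adjust the awk pattern. It needs root and has to run on a node where the resource is configured in hast.conf.

```shell
#!/bin/sh
# Hypothetical wrapper: mount a snapshot of a HAST-backed zvol read-only
# by skipping the HAST header with a gnop(8) offset provider.
# Usage: hastsnap-mount RESOURCE SNAPSHOT_DEVICE MOUNTPOINT
#   e.g. hastsnap-mount jail1 /dev/zvol/rtank/vols/jail1@hellotest /mnt

# Read `hastctl dump` output on stdin and print the data offset.
# Assumption: the offset appears on a line of the form "localoff: N".
hast_offset_from_dump() {
	awk '$1 == "localoff:" { print $2; exit }'
}

hastsnap_mount() {
	resource=$1
	snapdev=$2
	mountpoint=$3

	off=$(hastctl dump "$resource" | hast_offset_from_dump)
	if [ -z "$off" ]; then
		echo "could not find the data offset for $resource" >&2
		return 1
	fi

	# gnop creates "$snapdev.nop", which starts $off bytes into the snapshot.
	gnop create -o "$off" "$snapdev" || return 1
	mount -t ufs -o ro "$snapdev.nop" "$mountpoint"
}

# Only act when given exactly three arguments, so the file can be sourced.
if [ $# -eq 3 ]; then
	hastsnap_mount "$@"
fi
```

To undo it, unmount the filesystem and run `gnop destroy` on the `.nop` provider. Reading the offset from `hastctl dump` rather than computing it avoids having to know the extent size and media size the resource was created with.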