From owner-freebsd-fs@FreeBSD.ORG Sun Jan 8 15:56:17 2012
Return-Path:
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
    by hub.freebsd.org (Postfix) with ESMTP id 35037106566B
    for ; Sun, 8 Jan 2012 15:56:17 +0000 (UTC)
    (envelope-from randy@psg.com)
Received: from ran.psg.com (ran.psg.com [IPv6:2001:418:1::36])
    by mx1.freebsd.org (Postfix) with ESMTP id 1B6348FC16
    for ; Sun, 8 Jan 2012 15:56:17 +0000 (UTC)
Received: from localhost ([127.0.0.1] helo=rair.psg.com.psg.com)
    by ran.psg.com with esmtp (Exim 4.77 (FreeBSD))
    (envelope-from )
    id 1Rjv6W-000MEx-BY
    for freebsd-fs@freebsd.org; Sun, 08 Jan 2012 15:56:16 +0000
Date: Sun, 08 Jan 2012 10:56:15 -0500
Message-ID:
From: Randy Bush
To: FreeBSD FS
User-Agent: Wanderlust/2.15.9 (Almost Unreal) Emacs/22.3 Mule/5.0 (SAKAKI)
MIME-Version: 1.0 (generated by SEMI 1.14.6 - "Maruoka")
Content-Type: text/plain; charset=US-ASCII
Subject: zfs with a bunch of drives
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Sun, 08 Jan 2012 15:56:17 -0000

we want to build a new 16-drive RELENG_9 server on which we intend to
run a bunch of vboxen. a couple of years ago, we built a 16-drive
server. based on advice here not to just do a big raidz, it was
configured as eight mirrors

    tank               ONLINE       0     0     0
      mirror           ONLINE       0     0     0
        label/m00-d01  ONLINE       0     0     0
        label/m00-d00  ONLINE       0     0     0
      mirror           ONLINE       0     0     0
        label/m01-d00  ONLINE       0     0     0
        label/m01-d01  ONLINE       0     0     0

and so forth. is this still the best advice for performance?

and is there a url for building a bootable RELENG_9 zfs-only system?
last i looked there were a half dozen all somewhat different.

thanks for clue.

randy

From owner-freebsd-fs@FreeBSD.ORG Sun Jan 8 17:38:43 2012
Return-Path:
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
    by hub.freebsd.org (Postfix) with ESMTP id BFA8A106564A
    for ; Sun, 8 Jan 2012 17:38:43 +0000 (UTC)
    (envelope-from kraduk@gmail.com)
Received: from mail-gx0-f182.google.com (mail-gx0-f182.google.com [209.85.161.182])
    by mx1.freebsd.org (Postfix) with ESMTP id 7BAD78FC0C
    for ; Sun, 8 Jan 2012 17:38:43 +0000 (UTC)
Received: by ggnp1 with SMTP id p1so1536310ggn.13
    for ; Sun, 08 Jan 2012 09:38:42 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
    d=gmail.com; s=gamma;
    h=mime-version:in-reply-to:references:date:message-id:subject:from:to
    :cc:content-type;
    bh=Pu+R8WgONtlRar7EH/UwMmLMzjN02v7HIwGrLEHiGhY=;
    b=eFt2+x5p+OY/4iQQP43SEEIgXSd44zB7VOboGtdFU26WlmcDHi+Hguw9nAoGAmYepW
    aLRZjUbg/tQbzBFbrvFTaofpm3aGnjHpJFZiKDrjUJ/e4J0ySjPBUsUaxdGvrak2Qkot
    PwLdJtrfxENcXAsAGakW/FJLKVxu69jTlW56s=
MIME-Version: 1.0
Received: by 10.100.206.2 with SMTP id d2mr5861705ang.3.1326044322819;
    Sun, 08 Jan 2012 09:38:42 -0800 (PST)
Received: by 10.236.139.193 with HTTP; Sun, 8 Jan 2012 09:38:42 -0800 (PST)
In-Reply-To:
References:
Date: Sun, 8 Jan 2012 17:38:42 +0000
Message-ID:
From: krad
To: Randy Bush
Content-Type: text/plain; charset=ISO-8859-1
X-Content-Filtered-By: Mailman/MimeDel 2.1.5
Cc: FreeBSD FS
Subject: Re: zfs with a bunch of drives
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Sun, 08 Jan 2012 17:38:43 -0000

On 8 January 2012 15:56, Randy Bush wrote:

> we want to build a new 16-drive RELENG_9 server on which we intend to
> run a bunch of vboxen. a couple of years ago, we built a 16-drive
> server. based on advice here not to just do a big raidz, it was
> configured as eight mirrors
>
>     tank               ONLINE       0     0     0
>       mirror           ONLINE       0     0     0
>         label/m00-d01  ONLINE       0     0     0
>         label/m00-d00  ONLINE       0     0     0
>       mirror           ONLINE       0     0     0
>         label/m01-d00  ONLINE       0     0     0
>         label/m01-d01  ONLINE       0     0     0
>
> and so forth. is this still the best advice for performance?
>
> and is there a url for building a bootable RELENG_9 zfs-only system?
> last i looked there were a half dozen all somewhat different.
>
> thanks for clue.
>
> randy
> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
>

striping mirrors will be fast but wasteful on space. two groups of raidz2
would be nice but slightly slower. depends what's more important to you
though

From owner-freebsd-fs@FreeBSD.ORG Sun Jan 8 18:17:48 2012
Return-Path:
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
    by hub.freebsd.org (Postfix) with ESMTP id 6400C106566B
    for ; Sun, 8 Jan 2012 18:17:48 +0000 (UTC)
    (envelope-from dg17@penx.com)
Received: from btw.pki2.com (btw.pki2.com [IPv6:2001:470:a:6fd::2])
    by mx1.freebsd.org (Postfix) with ESMTP id 09DBE8FC12
    for ; Sun, 8 Jan 2012 18:17:47 +0000 (UTC)
Received: from [127.0.0.1] (localhost [127.0.0.1])
    by btw.pki2.com (8.14.5/8.14.5) with ESMTP id q08IHhsj050808;
    Sun, 8 Jan 2012 10:17:43 -0800 (PST)
    (envelope-from dg17@penx.com)
From: Dennis Glatting
To: freebsd-fs@freebsd.org, Randy Bush
In-Reply-To:
References:
Content-Type: text/plain; charset="us-ascii"
Date: Sun, 08 Jan 2012 10:17:43 -0800
Message-ID: <1326046663.53707.39.camel@btw.pki2.com>
Mime-Version: 1.0
X-Mailer: Evolution 2.32.1 FreeBSD GNOME Team Port
Content-Transfer-Encoding: 7bit
X-yoursite-MailScanner-Information: Dennis Glatting
X-yoursite-MailScanner-ID: q08IHhsj050808
X-yoursite-MailScanner: Found to be clean
X-MailScanner-From: dg17@penx.com
X-yoursite-MailScanner-Watermark: 1326651464.86807@jRsYcbtZLqON7UTNnx+Nww
Cc:
Subject: Re: zfs with a bunch of drives
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: dg17@penx.com
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Sun, 08 Jan 2012 18:17:48 -0000

On Sun, 2012-01-08 at 10:56 -0500, Randy Bush wrote:
> we want to build a new 16-drive RELENG_9 server on which we intend to
> run a bunch of vboxen. a couple of years ago, we built a 16-drive
> server. based on advice here not to just do a big raidz, it was
> configured as eight mirrors
>
>     tank               ONLINE       0     0     0
>       mirror           ONLINE       0     0     0
>         label/m00-d01  ONLINE       0     0     0
>         label/m00-d00  ONLINE       0     0     0
>       mirror           ONLINE       0     0     0
>         label/m01-d00  ONLINE       0     0     0
>         label/m01-d01  ONLINE       0     0     0
>
> and so forth. is this still the best advice for performance?
>
> and is there a url for building a bootable RELENG_9 zfs-only system?
> last i looked there were a half dozen all somewhat different.
>

I would do a performance measurement against a three drive RAIDz with
cache versus a simple mirror. Of course performance is based on data but
I've found RAIDz a better performer than RAID5 on blocky I/O, which
I /think/ is the type of I/O you'd see under vboxen.

I've taken data on a simple mirror versus RAIDz/2/3 but I no longer have
it.

> thanks for clue.
>
> randy
> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"

From owner-freebsd-fs@FreeBSD.ORG Mon Jan 9 11:07:03 2012
Return-Path:
Delivered-To: freebsd-fs@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
    by hub.freebsd.org (Postfix) with ESMTP id 3CEC9106564A
    for ; Mon, 9 Jan 2012 11:07:03 +0000 (UTC)
    (envelope-from owner-bugmaster@FreeBSD.org)
Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28])
    by mx1.freebsd.org (Postfix) with ESMTP id 2A06D8FC14
    for ; Mon, 9 Jan 2012 11:07:03 +0000 (UTC)
Received: from freefall.freebsd.org (localhost [127.0.0.1])
    by freefall.freebsd.org (8.14.5/8.14.5) with ESMTP id q09B73rs042166
    for ; Mon, 9 Jan 2012 11:07:03 GMT
    (envelope-from owner-bugmaster@FreeBSD.org)
Received: (from gnats@localhost)
    by freefall.freebsd.org (8.14.5/8.14.5/Submit) id q09B729b042164
    for freebsd-fs@FreeBSD.org; Mon, 9 Jan 2012 11:07:02 GMT
    (envelope-from owner-bugmaster@FreeBSD.org)
Date: Mon, 9 Jan 2012 11:07:02 GMT
Message-Id: <201201091107.q09B729b042164@freefall.freebsd.org>
X-Authentication-Warning: freefall.freebsd.org: gnats set sender to
    owner-bugmaster@FreeBSD.org using -f
From: FreeBSD bugmaster
To: freebsd-fs@FreeBSD.org
Cc:
Subject: Current problem reports assigned to freebsd-fs@FreeBSD.org
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Mon, 09 Jan 2012 11:07:03 -0000

Note: to view an individual PR, use:
  http://www.freebsd.org/cgi/query-pr.cgi?pr=(number).

The following is a listing of current problems submitted by FreeBSD users.
These represent problem reports covering all versions including
experimental development code and obsolete releases.

S Tracker      Resp.      Description
--------------------------------------------------------------------------------
o kern/163770  fs         [zfs] [hang] LOR between zfs&syncer + vnlru leading to
o kern/162944  fs         [coda] Coda file system module looks broken in 9.0
o kern/162860  fs         [zfs] Cannot share ZFS filesystem to hosts with a hyph
o kern/162751  fs         [zfs] [panic] kernel panics during file operations
o kern/162591  fs         [nullfs] cross-filesystem nullfs does not work as expe
o kern/162519  fs         [zfs] "zpool import" relies on buggy realpath() behavi
o kern/162362  fs         [snapshots] [panic] ufs with snapshot(s) panics when g
o kern/162083  fs         [zfs] [panic] zfs unmount -f pool
o kern/161968  fs         [zfs] [hang] renaming snapshot with -r including a zvo
o kern/161897  fs         [zfs] [patch] zfs partition probing causing long delay
o kern/161864  fs         [ufs] removing journaling from UFS partition fails on
o bin/161807   fs         [patch] add option for explicitly specifying metadata
o kern/161674  fs         [ufs] snapshot on journaled ufs doesn't work
o kern/161579  fs         [smbfs] FreeBSD sometimes panics when an smb share is
o kern/161533  fs         [zfs] [panic] zfs receive panic: system ioctl returnin
o kern/161511  fs         [unionfs] Filesystem deadlocks when using multiple uni
o kern/161438  fs         [zfs] [panic] recursed on non-recursive spa_namespace_
o kern/161424  fs         [nullfs] __getcwd() calls fail when used on nullfs mou
o kern/161280  fs         [zfs] Stack overflow in gptzfsboot
o kern/161205  fs         [nfs] [pfsync] [regression] [build] Bug report freebsd
o kern/161169  fs         [zfs] [panic] ZFS causes kernel panic in dbuf_dirty
o kern/161112  fs         [ufs] [lor] filesystem LOR in FreeBSD 9.0-BETA3
o kern/160893  fs         [zfs] [panic] 9.0-BETA2 kernel panic
o kern/160860  fs         Random UFS root filesystem corruption with SU+J [regre
o kern/160801  fs         [zfs] zfsboot on 8.2-RELEASE fails to boot from root-o
o kern/160790  fs         [fusefs] [panic] VPUTX: negative ref count with FUSE
o kern/160777  fs         [zfs] [hang] RAID-Z3 causes fatal hang upon scrub/impo
o kern/160706  fs         [zfs] zfs bootloader fails when a non-root vdev exists
o kern/160591  fs         [zfs] Fail to boot on zfs root with degraded raidz2 [r
o kern/160410  fs         [smbfs] [hang] smbfs hangs when transferring large fil
o kern/160283  fs         [zfs] [patch] 'zfs list' does abort in make_dataset_ha
o kern/159971  fs         [ffs] [panic] panic with soft updates journaling durin
o kern/159930  fs         [ufs] [panic] kernel core
o kern/159402  fs         [zfs][loader] symlinks cause I/O errors
o kern/159357  fs         [zfs] ZFS MAXNAMELEN macro has confusing name (off-by-
o kern/159356  fs         [zfs] [patch] ZFS NAME_ERR_DISKLIKE check is Solaris-s
o kern/159351  fs         [nfs] [patch] - divide by zero in mountnfs()
o kern/159251  fs         [zfs] [request]: add FLETCHER4 as DEDUP hash option
o kern/159077  fs         [zfs] Can't cd .. with latest zfs version
o kern/159048  fs         [smbfs] smb mount corrupts large files
o kern/159045  fs         [zfs] [hang] ZFS scrub freezes system
o kern/158839  fs         [zfs] ZFS Bootloader Fails if there is a Dead Disk
o kern/158802  fs         amd(8) ICMP storm and unkillable process.
o kern/158711  fs         [ffs] [panic] panic in ffs_blkfree and ffs_valloc
o kern/158231  fs         [nullfs] panic on unmounting nullfs mounted over ufs o
f kern/157929  fs         [nfs] NFS slow read
o kern/157722  fs         [geli] unable to newfs a geli encrypted partition
o kern/157399  fs         [zfs] trouble with: mdconfig force delete && zfs strip
o kern/157179  fs         [zfs] zfs/dbuf.c: panic: solaris assert: arc_buf_remov
o kern/156797  fs         [zfs] [panic] Double panic with FreeBSD 9-CURRENT and
o kern/156781  fs         [zfs] zfs is losing the snapshot directory,
p kern/156545  fs         [ufs] mv could break UFS on SMP systems
o kern/156193  fs         [ufs] [hang] UFS snapshot hangs && deadlocks processes
o kern/156039  fs         [nullfs] [unionfs] nullfs + unionfs do not compose, re
o kern/155615  fs         [zfs] zfs v28 broken on sparc64 -current
o kern/155587  fs         [zfs] [panic] kernel panic with zfs
f kern/155411  fs         [regression] [8.2-release] [tmpfs]: mount: tmpfs : No
o kern/155199  fs         [ext2fs] ext3fs mounted as ext2fs gives I/O errors
o bin/155104   fs         [zfs][patch] use /dev prefix by default when importing
o kern/154930  fs         [zfs] cannot delete/unlink file from full volume -> EN
o kern/154828  fs         [msdosfs] Unable to create directories on external USB
o kern/154491  fs         [smbfs] smb_co_lock: recursive lock for object 1
p kern/154228  fs         [md] md getting stuck in wdrain state
o kern/153996  fs         [zfs] zfs root mount error while kernel is not located
o kern/153753  fs         [zfs] ZFS v15 - grammatical error when attempting to u
o kern/153716  fs         [zfs] zpool scrub time remaining is incorrect
o kern/153695  fs         [patch] [zfs] Booting from zpool created on 4k-sector
o kern/153680  fs         [xfs] 8.1 failing to mount XFS partitions
o kern/153520  fs         [zfs] Boot from GPT ZFS root on HP BL460c G1 unstable
o kern/153418  fs         [zfs] [panic] Kernel Panic occurred writing to zfs vol
o kern/153351  fs         [zfs] locking directories/files in ZFS
o bin/153258   fs         [patch][zfs] creating ZVOLs requires `refreservation'
s kern/153173  fs         [zfs] booting from a gzip-compressed dataset doesn't w
o kern/153126  fs         [zfs] vdev failure, zpool=peegel type=vdev.too_small
o kern/152022  fs         [nfs] nfs service hangs with linux client [regression]
o kern/151942  fs         [zfs] panic during ls(1) zfs snapshot directory
o kern/151905  fs         [zfs] page fault under load in /sbin/zfs
o bin/151713   fs         [patch] Bug in growfs(8) with respect to 32-bit overfl
o kern/151648  fs         [zfs] disk wait bug
o kern/151629  fs         [fs] [patch] Skip empty directory entries during name
o kern/151330  fs         [zfs] will unshare all zfs filesystem after execute a
o kern/151326  fs         [nfs] nfs exports fail if netgroups contain duplicate
o kern/151251  fs         [ufs] Can not create files on filesystem with heavy us
o kern/151226  fs         [zfs] can't delete zfs snapshot
o kern/151111  fs         [zfs] vnodes leakage during zfs unmount
o kern/150503  fs         [zfs] ZFS disks are UNAVAIL and corrupted after reboot
o kern/150501  fs         [zfs] ZFS vdev failure vdev.bad_label on amd64
o kern/150390  fs         [zfs] zfs deadlock when arcmsr reports drive faulted
o kern/150336  fs         [nfs] mountd/nfsd became confused; refused to reload n
o kern/149208  fs         mksnap_ffs(8) hang/deadlock
o kern/149173  fs         [patch] [zfs] make OpenSolaris installa
o kern/149015  fs         [zfs] [patch] misc fixes for ZFS code to build on Glib
o kern/149014  fs         [zfs] [patch] declarations in ZFS libraries/utilities
o kern/149013  fs         [zfs] [patch] make ZFS makefiles use the libraries fro
o kern/148504  fs         [zfs] ZFS' zpool does not allow replacing drives to be
o kern/148490  fs         [zfs]: zpool attach - resilver bidirectionally, and re
o kern/148368  fs         [zfs] ZFS hanging forever on 8.1-PRERELEASE
o kern/148138  fs         [zfs] zfs raidz pool commands freeze
o kern/147903  fs         [zfs] [panic] Kernel panics on faulty zfs device
o kern/147881  fs         [zfs] [patch] ZFS "sharenfs" doesn't allow different "
o kern/147560  fs         [zfs] [boot] Booting 8.1-PRERELEASE raidz system take
o kern/147420  fs         [ufs] [panic] ufs_dirbad, nullfs, jail panic (corrupt
o kern/146941  fs         [zfs] [panic] Kernel Double Fault - Happens constantly
o kern/146786  fs         [zfs] zpool import hangs with checksum errors
o kern/146708  fs         [ufs] [panic] Kernel panic in softdep_disk_write_compl
o kern/146528  fs         [zfs] Severe memory leak in ZFS on i386
o kern/146502  fs         [nfs] FreeBSD 8 NFS Client Connection to Server
s kern/145712  fs         [zfs] cannot offline two drives in a raidz2 configurat
o kern/145411  fs         [xfs] [panic] Kernel panics shortly after mounting an
f bin/145309   fs         bsdlabel: Editing disk label invalidates the whole dev
o kern/145272  fs         [zfs] [panic] Panic during boot when accessing zfs on
o kern/145246  fs         [ufs] dirhash in 7.3 gratuitously frees hashes when it
o kern/145238  fs         [zfs] [panic] kernel panic on zpool clear tank
o kern/145229  fs         [zfs] Vast differences in ZFS ARC behavior between 8.0
o kern/145189  fs         [nfs] nfsd performs abysmally under load
o kern/144929  fs         [ufs] [lor] vfs_bio.c + ufs_dirhash.c
p kern/144447  fs         [zfs] sharenfs fsunshare() & fsshare_main() non functi
o kern/144416  fs         [panic] Kernel panic on online filesystem optimization
s kern/144415  fs         [zfs] [panic] kernel panics on boot after zfs crash
o kern/144234  fs         [zfs] Cannot boot machine with recent gptzfsboot code
o kern/143825  fs         [nfs] [panic] Kernel panic on NFS client
o bin/143572   fs         [zfs] zpool(1): [patch] The verbose output from iostat
o kern/143212  fs         [nfs] NFSv4 client strange work ...
o kern/143184  fs         [zfs] [lor] zfs/bufwait LOR
o kern/142878  fs         [zfs] [vfs] lock order reversal
o kern/142597  fs         [ext2fs] ext2fs does not work on filesystems with real
o kern/142489  fs         [zfs] [lor] allproc/zfs LOR
o kern/142466  fs         Update 7.2 -> 8.0 on Raid 1 ends with screwed raid [re
o kern/142306  fs         [zfs] [panic] ZFS drive (from OSX Leopard) causes two
o kern/142068  fs         [ufs] BSD labels are got deleted spontaneously
o kern/141897  fs         [msdosfs] [panic] Kernel panic. msdofs: file name leng
o kern/141463  fs         [nfs] [panic] Frequent kernel panics after upgrade fro
o kern/141305  fs         [zfs] FreeBSD ZFS+sendfile severe performance issues (
o kern/141091  fs         [patch] [nullfs] fix panics with DIAGNOSTIC enabled
o kern/141086  fs         [nfs] [panic] panic("nfs: bioread, not dir") on FreeBS
o kern/141010  fs         [zfs] "zfs scrub" fails when backed by files in UFS2
o kern/140888  fs         [zfs] boot fail from zfs root while the pool resilveri
o kern/140661  fs         [zfs] [patch] /boot/loader fails to work on a GPT/ZFS-
o kern/140640  fs         [zfs] snapshot crash
o kern/140068  fs         [smbfs] [patch] smbfs does not allow semicolon in file
o kern/139725  fs         [zfs] zdb(1) dumps core on i386 when examining zpool c
o kern/139715  fs         [zfs] vfs.numvnodes leak on busy zfs
p bin/139651   fs         [nfs] mount(8): read-only remount of NFS volume does n
o kern/139597  fs         [patch] [tmpfs] tmpfs initializes va_gen but doesn't u
o kern/139564  fs         [zfs] [panic] 8.0-RC1 - Fatal trap 12 at end of shutdo
o kern/139407  fs         [smbfs] [panic] smb mount causes system crash if remot
o kern/138662  fs         [panic] ffs_blkfree: freeing free block
o kern/138421  fs         [ufs] [patch] remove UFS label limitations
o kern/138202  fs         mount_msdosfs(1) see only 2Gb
o kern/136968  fs         [ufs] [lor] ufs/bufwait/ufs (open)
o kern/136945  fs         [ufs] [lor] filedesc structure/ufs (poll)
o kern/136944  fs         [ffs] [lor] bufwait/snaplk (fsync)
o kern/136873  fs         [ntfs] Missing directories/files on NTFS volume
o kern/136865  fs         [nfs] [patch] NFS exports atomic and on-the-fly atomic
p kern/136470  fs         [nfs] Cannot mount / in read-only, over NFS
o kern/135546  fs         [zfs] zfs.ko module doesn't ignore zpool.cache filenam
o kern/135469  fs         [ufs] [panic] kernel crash on md operation in ufs_dirb
o kern/135050  fs         [zfs] ZFS clears/hides disk errors on reboot
o kern/134491  fs         [zfs] Hot spares are rather cold...
o kern/133676  fs         [smbfs] [panic] umount -f'ing a vnode-based memory dis
o kern/132960  fs         [ufs] [panic] panic:ffs_blkfree: freeing free frag
o kern/132397  fs         reboot causes filesystem corruption (failure to sync b
o kern/132331  fs         [ufs] [lor] LOR ufs and syncer
o kern/132237  fs         [msdosfs] msdosfs has problems to read MSDOS Floppy
o kern/132145  fs         [panic] File System Hard Crashes
o kern/131441  fs         [unionfs] [nullfs] unionfs and/or nullfs not combineab
o kern/131360  fs         [nfs] poor scaling behavior of the NFS server under lo
o kern/131342  fs         [nfs] mounting/unmounting of disks causes NFS to fail
o bin/131341   fs         makefs: error "Bad file descriptor" on the mount poin
o kern/130920  fs         [msdosfs] cp(1) takes 100% CPU time while copying file
o kern/130210  fs         [nullfs] Error by check nullfs
o kern/129760  fs         [nfs] after 'umount -f' of a stale NFS share FreeBSD l
o kern/129488  fs         [smbfs] Kernel "bug" when using smbfs in smbfs_smb.c:
o kern/129231  fs         [ufs] [patch] New UFS mount (norandom) option - mostly
o kern/129152  fs         [panic] non-userfriendly panic when trying to mount(8)
o kern/127787  fs         [lor] [ufs] Three LORs: vfslock/devfs/vfslock, ufs/vfs
o bin/127270   fs         fsck_msdosfs(8) may crash if BytesPerSec is zero
o kern/127029  fs         [panic] mount(8): trying to mount a write protected zi
o kern/126287  fs         [ufs] [panic] Kernel panics while mounting an UFS file
o kern/125895  fs         [ffs] [panic] kernel: panic: ffs_blkfree: freeing free
s kern/125738  fs         [zfs] [request] SHA256 acceleration in ZFS
o kern/123939  fs         [msdosfs] corrupts new files
f sparc/123566 fs         [zfs] zpool import issue: EOVERFLOW
o kern/122380  fs         [ffs] ffs_valloc:dup alloc (Soekris 4801/7.0/USB Flash
o bin/122172   fs         [fs]: amd(8) automount daemon dies on 6.3-STABLE i386,
o bin/121898   fs         [nullfs] pwd(1)/getcwd(2) fails with Permission denied
o bin/121072   fs         [smbfs] mount_smbfs(8) cannot normally convert the cha
o kern/120483  fs         [ntfs] [patch] NTFS filesystem locking changes
o kern/120482  fs         [ntfs] [patch] Sync style changes between NetBSD and F
o kern/118912  fs         [2tb] disk sizing/geometry problem with large array
o kern/118713  fs         [minidump] [patch] Display media size required for a k
o bin/118249   fs         [ufs] mv(1): moving a directory changes its mtime
o kern/118126  fs         [nfs] [patch] Poor NFS server write performance
o kern/118107  fs         [ntfs] [panic] Kernel panic when accessing a file at N
o kern/117954  fs         [ufs] dirhash on very large directories blocks the mac
o bin/117315   fs         [smbfs] mount_smbfs(8) and related options can't mount
f kern/117314  fs         [ntfs] Long-filename only NTFS fs'es cause kernel pani
o kern/117158  fs         [zfs] zpool scrub causes panic if geli vdevs detach on
o bin/116980   fs         [msdosfs] [patch] mount_msdosfs(8) resets some flags f
o conf/116931  fs         lack of fsck_cd9660 prevents mounting iso images with
o kern/116583  fs         [ffs] [hang] System freezes for short time when using
o bin/115361   fs         [zfs] mount(8) gets into a state where it won't set/un
o kern/114955  fs         [cd9660] [patch] [request] support for mask,dirmask,ui
o kern/114847  fs         [ntfs] [patch] [request] dirmask support for NTFS ala
o kern/114676  fs         [ufs] snapshot creation panics: snapacct_ufs2: bad blo
o bin/114468   fs         [patch] [request] add -d option to umount(8) to detach
o kern/113852  fs         [smbfs] smbfs does not properly implement DFS referral
o bin/113838   fs         [patch] [request] mount(8): add support for relative p
o bin/113049   fs         [patch] [request] make quot(8) use getopt(3) and show
o kern/112658  fs         [smbfs] [patch] smbfs and caching problems (resolves b
o kern/111843  fs         [msdosfs] Long Names of files are incorrectly created
o kern/111782  fs         [ufs] dump(8) fails horribly for large filesystems
s bin/111146   fs         [2tb] fsck(8) fails on 6T filesystem
o kern/109024  fs         [msdosfs] [iconv] mount_msdosfs: msdosfs_iconv: Operat
o kern/109010  fs         [msdosfs] can't mv directory within fat32 file system
o bin/107829   fs         [2TB] fdisk(8): invalid boundary checking in fdisk / w
o kern/106107  fs         [ufs] left-over fsck_snapshot after unfinished backgro
o kern/104406  fs         [ufs] Processes get stuck in "ufs" state under persist
o kern/104133  fs         [ext2fs] EXT2FS module corrupts EXT2/3 filesystems
o kern/103035  fs         [ntfs] Directories in NTFS mounted disc images appear
o kern/101324  fs         [smbfs] smbfs sometimes not case sensitive when it's s
o kern/99290   fs         [ntfs] mount_ntfs ignorant of cluster sizes
s bin/97498    fs         [request] newfs(8) has no option to clear the first 12
o kern/97377   fs         [ntfs] [patch] syntax cleanup for ntfs_ihash.c
o kern/95222   fs         [cd9660] File sections on ISO9660 level 3 CDs ignored
o kern/94849   fs         [ufs] rename on UFS filesystem is not atomic
o bin/94810    fs         fsck(8) incorrectly reports 'file system marked clean'
o kern/94769   fs         [ufs] Multiple file deletions on multi-snapshotted fil
o kern/94733   fs         [smbfs] smbfs may cause double unlock
o kern/93942   fs         [vfs] [patch] panic: ufs_dirbad: bad dir (patch from D
o kern/92272   fs         [ffs] [hang] Filling a filesystem while creating a sna
o kern/91134   fs         [smbfs] [patch] Preserve access and modification time
a kern/90815   fs         [smbfs] [patch] SMBFS with character conversions somet
o kern/88657   fs         [smbfs] windows client hang when browsing a samba shar
o kern/88555   fs         [panic] ffs_blkfree: freeing free frag on AMD 64
o kern/88266   fs         [smbfs] smbfs does not implement UIO_NOCOPY and sendfi
o bin/87966    fs         [patch] newfs(8): introduce -A flag for newfs to enabl
o kern/87859   fs         [smbfs] System reboot while umount smbfs.
o kern/86587   fs         [msdosfs] rm -r /PATH fails with lots of small files
o bin/85494    fs         fsck_ffs: unchecked use of cg_inosused macro etc.
o kern/80088   fs         [smbfs] Incorrect file time setting on NTFS mounted vi
o bin/74779    fs         Background-fsck checks one filesystem twice and omits
o kern/73484   fs         [ntfs] Kernel panic when doing `ls` from the client si
o bin/73019    fs         [ufs] fsck_ufs(8) cannot alloc 607016868 bytes for ino
o kern/71774   fs         [ntfs] NTFS cannot "see" files on a WinXP filesystem
o bin/70600    fs         fsck(8) throws files away when it can't grow lost+foun
o kern/68978   fs         [panic] [ufs] crashes with failing hard disk, loose po
o kern/65920   fs         [nwfs] Mounted Netware filesystem behaves strange
o kern/65901   fs         [smbfs] [patch] smbfs fails fsx write/truncate-down/tr
o kern/61503   fs         [smbfs] mount_smbfs does not work as non-root
o kern/55617   fs         [smbfs] Accessing an nsmb-mounted drive via a smb expo
o kern/51685   fs         [hang] Unbounded inode allocation causes kernel to loc
o kern/51583   fs         [nullfs] [patch] allow to work with devices and socket
o kern/36566   fs         [smbfs] System reboot with dead smb mount and umount
o bin/27687    fs         fsck(8) wrapper is not properly passing options to fsc
o kern/18874   fs         [2TB] 32bit NFS servers export wrong negative values t

256 problems total.

From owner-freebsd-fs@FreeBSD.ORG Mon Jan 9 11:36:00 2012
Return-Path:
Delivered-To: freebsd-fs@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
    by hub.freebsd.org (Postfix) with ESMTP id 6E2D51065673
    for ; Mon, 9 Jan 2012 11:36:00 +0000 (UTC)
    (envelope-from florian@wagner-flo.net)
Received: from umbracor.wagner-flo.net (umbracor.wagner-flo.net [213.165.81.202])
    by mx1.freebsd.org (Postfix) with ESMTP id BC5448FC16
    for ; Mon, 9 Jan 2012 11:35:59 +0000 (UTC)
Received: from naclador.mos32.de (ppp-88-217-80-44.dynamic.mnet-online.de [88.217.80.44])
    by umbracor.wagner-flo.net (Postfix) with ESMTPSA id EC3AC3C058F6;
    Mon, 9 Jan 2012 12:20:13 +0100 (CET)
Date: Mon, 9 Jan 2012 12:20:11 +0100
From: Florian Wagner
To: Andriy Gapon
Message-ID: <20120109122011.0ae6ad70@naclador.mos32.de>
In-Reply-To: <4ED35326.80402@FreeBSD.org>
References: <20111015214347.09f68e4e@naclador.mos32.de>
    <4E9ACA9F.5090308@FreeBSD.org>
    <20111019082139.1661868e@auedv3.syscomp.de>
    <4E9EEF45.9020404@FreeBSD.org>
    <20111019182130.27446750@naclador.mos32.de>
    <4EB98E05.4070900@FreeBSD.org>
    <20111119211921.7ffa9953@naclador.mos32.de>
    <4EC8CD14.4040600@FreeBSD.org>
    <20111120121248.5e9773c8@naclador.mos32.de>
    <4EC91B36.7060107@FreeBSD.org>
    <20111120191018.1aa4e882@naclador.mos32.de>
    <4ECA2DBD.5040701@FreeBSD.org>
    <20111121201332.03ecadf1@naclador.mos32.de>
    <4ECAC272.5080500@FreeBSD.org>
    <4ECEBD44.6090900@FreeBSD.org>
    <20111125224722.6cf3a299@naclador.mos32.de>
    <4ED0CFF9.4030503@FreeBSD.org>
    <20111126134927.60fe5097@naclador.mos32.de>
    <4ED35326.80402@FreeBSD.org>
X-Mailer: Claws Mail 3.7.10 (GTK+ 2.24.8; x86_64-pc-linux-gnu)
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=PGP-SHA1;
    boundary="Sig_/yZj6IQdWDsGiC2GeS=oaHxP";
    protocol="application/pgp-signature"
Cc: freebsd-fs@FreeBSD.org
Subject: Re: Extending zfsboot.c to allow selecting filesystem from boot.config
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Mon, 09 Jan 2012 11:36:00 -0000

--Sig_/yZj6IQdWDsGiC2GeS=oaHxP
Content-Type: multipart/mixed; boundary="MP_/8h6raSy1Ymd8hSZ2Yhq+T1H"

--MP_/8h6raSy1Ymd8hSZ2Yhq+T1H
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: quoted-printable
Content-Disposition: inline

On Mon, 28 Nov 2011 11:23:50 +0200 Andriy Gapon wrote:

> on 26/11/2011 14:49 Florian Wagner said the following:
> >
> > I'll try applying your patches to head instead of stable/8 in the
> > next days and test that. To make matters easier, can you tell me
> > which revision of head they are based on?
>
> I have finally updated my source repo and rebased the patches upon
> the recent CURRENT. Please see
> https://gitorious.org/~avg/freebsd/avgbsd/commits/devel-20111127_1.
> The interesting commits are all near the HEAD.

I've built gptzfsboot and zfsloader from the code from your branch and
can report this as working perfectly over multiple reboots (including a
boot.config which specifies a root filesystem and vfs.root.mountfrom
being automatically set).

I've then looked at the diffs of the relevant stuff between my version
and your branch and as they were rather small decided to mix a bit by
copying from your branch:

  sys/boot/zfs/zfs.c
  sys/boot/zfs/zfsimpl.c
  sys/boot/i386/zfsboot/zfsboot.c

This also results in a working version according to the tests mentioned
above.

Attached are:

 - zfsboot-broken-to-working.patch which is the diff between the broken
   stable-8 we were discussing prior and the working one which results
   from copying over the mentioned files from your branch.
 - zfsboot-avgbsd-mix.patch which is the full patch (svn diff) above
   stable-8 revision 229801 that I've tested.

Do you currently have any plans to merge any of that into stable-9 or
stable-8?

Regards
Florian Wagner

--MP_/8h6raSy1Ymd8hSZ2Yhq+T1H
Content-Type: text/x-patch
Content-Transfer-Encoding: quoted-printable
Content-Disposition: attachment; filename=zfsboot-avgbsd-mix.patch

Index: sys/boot/common/boot.c
===================================================================
--- sys/boot/common/boot.c (revision 229805)
+++ sys/boot/common/boot.c (working copy)
@@ -311,12 +311,12 @@
     if (getenv("vfs.root.mountfrom") != NULL)
         return(0);
 
+    error = 1;
     sprintf(lbuf, "%s/etc/fstab", rootdev);
     if ((fd = open(lbuf, O_RDONLY)) < 0)
-        return(1);
+        goto notfound;
 
     /* loop reading lines from /etc/fstab What was that about sscanf again? */
-    error = 1;
     while (fgetstr(lbuf, sizeof(lbuf), fd) >= 0) {
         if ((lbuf[0] == 0) || (lbuf[0] == '#'))
             continue;
@@ -377,6 +377,20 @@
         break;
     }
     close(fd);
+
+notfound:
+    if (error) {
+        const char *currdev;
+
+        currdev = getenv("currdev");
+        if (currdev != NULL && strncmp("zfs:", currdev, 4) == 0) {
+            cp = strdup(currdev);
+            cp[strlen(cp) - 1] = '\0';
+            setenv("vfs.root.mountfrom", cp, 0);
+            error = 0;
+        }
+    }
+
     return(error);
 }
 
Index: sys/boot/i386/libi386/devicename.c
===================================================================
--- sys/boot/i386/libi386/devicename.c (revision 229805)
+++ sys/boot/i386/libi386/devicename.c (working copy)
@@ -88,6 +88,8 @@
     int i, unit, slice, partition, err;
     char *cp;
     const char *np;
+    const char *sep;
+    const char *end;
 
     /* minimum length check */
     if (strlen(devspec) < 2)
@@ -171,7 +173,6 @@
 
     case DEVT_CD:
     case DEVT_NET:
-    case DEVT_ZFS:
         unit = 0;
 
         if (*np && (*np != ':')) {
@@ -193,6 +194,34 @@
             *path = (*cp == 0) ? cp : cp + 1;
         break;
 
+    case DEVT_ZFS:
+        if (*np != ':') {
+            err = EINVAL;
+            goto fail;
+        }
+        np++;
+        end = strchr(np, ':');
+        if (end == NULL) {
+            err = EINVAL;
+            goto fail;
+        }
+        sep = strchr(np, '/');
+        if (sep == NULL || sep >= end)
+            sep = end;
+        memcpy(idev->d_kind.zfs.poolname, np, sep - np);
+        idev->d_kind.zfs.poolname[sep - np] = '\0';
+        if (sep < end) {
+            sep++;
+            memcpy(idev->d_kind.zfs.rootname, sep, end - sep);
+            idev->d_kind.zfs.rootname[end - sep] = '\0';
+        }
+        else
+            idev->d_kind.zfs.rootname[0] = '\0';
+
+        if (path != NULL)
+            *path = (*end == '\0') ? end : end + 1;
+        break;
+
     default:
         err = EINVAL;
         goto fail;
@@ -216,7 +245,7 @@
 i386_fmtdev(void *vdev)
 {
     struct i386_devdesc *dev = (struct i386_devdesc *)vdev;
-    static char buf[128]; /* XXX device length constant? */
+    static char buf[256]; /* XXX device length constant? */
     char *cp;
 
     switch(dev->d_type) {
@@ -247,9 +276,14 @@
         break;
 
     case DEVT_NET:
-    case DEVT_ZFS:
         sprintf(buf, "%s%d:", dev->d_dev->dv_name, dev->d_unit);
         break;
+    case DEVT_ZFS:
+        if (dev->d_kind.zfs.rootname[0] == '\0')
+            sprintf(buf, "%s:%s:", dev->d_dev->dv_name, dev->d_kind.zfs.poolname);
+        else
+            sprintf(buf, "%s:%s/%s:", dev->d_dev->dv_name, dev->d_kind.zfs.poolname, dev->d_kind.zfs.rootname);
+        break;
     }
     return(buf);
 }
Index: sys/boot/i386/libi386/libi386.h
===================================================================
--- sys/boot/i386/libi386/libi386.h (revision 229805)
+++ sys/boot/i386/libi386/libi386.h (working copy)
@@ -49,6 +49,12 @@
         {
             void *data;
         } bioscd;
+        struct
+        {
+            void *data;
+            char poolname[256];
+            char rootname[256];
+        } zfs;
     } d_kind;
 };
 
Index: sys/boot/i386/zfsboot/zfsboot.c
===================================================================
--- sys/boot/i386/zfsboot/zfsboot.c (revision 229805)
+++ sys/boot/i386/zfsboot/zfsboot.c (working copy)
@@ -45,7 +45,8 @@
 /* Hint to loader that we came from ZFS */
 #define KARGS_FLAGS_ZFS 0x4
 
-#define PATH_CONFIG "/boot.config"
+#define PATH_DOTCONFIG "/boot.config"
+#define PATH_CONFIG "/boot/config"
 #define PATH_BOOT3 "/boot/zfsloader"
 #define PATH_KERNEL "/boot/kernel/kernel"
 
@@ -53,7 +54,7 @@
 #define NOPT 14
 #define NDEV 3
 
-#define BIOS_NUMDRIVES 0x475
+#define BIOS_NUMDRIVES 0x475
 #define DRV_HARD 0x80
 #define DRV_MASK 0x7f
 
@@ -91,8 +92,10 @@
 static const char *const dev_nm[NDEV] = {"ad", "da", "fd"};
 static const unsigned char dev_maj[NDEV] = {30, 4, 2};
 
+struct zfsmount zfsmount;
 static char cmd[512];
 static char kname[1024];
+static char rootname[256];
 static int comspeed = SIOSPD;
 static struct bootinfo bootinfo;
 static uint32_t bootdev;
@@ -495,7 +498,12 @@
  * will find any other available pools and it may fill in missing
  * vdevs for the boot pool.
*/ - for (i =3D 0; i < *(unsigned char *)PTOV(BIOS_NUMDRIVES); i++) { +#ifndef VIRTUALBOX + for (i =3D 0; i < *(unsigned char *)PTOV(BIOS_NUMDRIVES); i++) +#else + for (i =3D 0; i < MAXBDDEV; i++) +#endif + { if ((i | DRV_HARD) =3D=3D *(uint8_t *)PTOV(ARGS)) continue; =20 @@ -526,18 +534,21 @@ } } =20 - zfs_mount_pool(spa); - - if (zfs_lookup(spa, PATH_CONFIG, &dn) =3D=3D 0) { + if (zfs_spa_init(spa) !=3D 0 || zfs_mount(spa, 0, &zfsmount) !=3D 0) { + printf("%s: failed to mount default pool %s\n", + BOOTPROG, spa->spa_name); + autoboot =3D 0; + } else if (zfs_lookup(&zfsmount, PATH_CONFIG, &dn) =3D=3D 0 || + zfs_lookup(&zfsmount, PATH_DOTCONFIG, &dn) =3D=3D 0) { off =3D 0; zfs_read(spa, &dn, &off, cmd, sizeof(cmd)); } =20 if (*cmd) { + if (!OPT_CHECK(RBX_QUIET)) + printf("%s: %s", PATH_CONFIG, cmd); if (parse()) autoboot =3D 0; - if (!OPT_CHECK(RBX_QUIET)) - printf("%s: %s", PATH_CONFIG, cmd); /* Do not process this command twice */ *cmd =3D 0; } @@ -558,11 +569,17 @@ /* Present the user with the boot2 prompt. 
*/ =20 for (;;) { - if (!autoboot || !OPT_CHECK(RBX_QUIET)) - printf("\nFreeBSD/x86 boot\n" - "Default: %s:%s\n" - "boot: ", - spa->spa_name, kname); + if (!autoboot || !OPT_CHECK(RBX_QUIET)) { + printf("\nFreeBSD/x86 boot\n"); + if (zfs_rlookup(spa, zfsmount.rootobj, rootname) !=3D 0) + printf("Default: %s:<0x%llx>:%s\n" + "boot: ", + spa->spa_name, zfsmount.rootobj, kname); + else + printf("Default: %s:%s:%s\n" + "boot: ", + spa->spa_name, rootname, kname); + } if (ioctrl & IO_SERIAL) sio_flush(); if (!autoboot || keyhit(5)) @@ -598,7 +615,8 @@ uint32_t addr, x; int fmt, i, j; =20 - if (zfs_lookup(spa, kname, &dn)) { + if (zfs_lookup(&zfsmount, kname, &dn)) { + printf("\nCan't find %s\n", kname); return; } off =3D 0; @@ -677,7 +695,9 @@ KARGS_FLAGS_ZFS, (uint32_t) spa->spa_guid, (uint32_t) (spa->spa_guid >> 32), - VTOP(&bootinfo)); + VTOP(&bootinfo), + (uint32_t) zfsmount.rootobj, + (uint32_t) (zfsmount.rootobj >> 32)); } =20 static int @@ -729,7 +749,7 @@ } if (c =3D=3D '?') { dnode_phys_t dn; =20 - if (zfs_lookup(spa, arg, &dn) =3D=3D 0) { + if (zfs_lookup(&zfsmount, arg, &dn) =3D=3D 0) { zap_list(spa, &dn); } return -1; @@ -751,17 +771,32 @@ q =3D (char *) strchr(arg, ':'); if (q) { spa_t *newspa; + uint64_t newroot; =20 *q++ =3D 0; newspa =3D spa_find_by_name(arg); if (newspa) { + arg =3D q; spa =3D newspa; - zfs_mount_pool(spa); + newroot =3D 0; + q =3D (char *) strchr(arg, ':'); + if (q) { + *q++ =3D 0; + if (zfs_lookup_dataset(spa, arg, &newroot)) { + printf("\nCan't find dataset %s in ZFS pool %s\n", + arg, spa->spa_name); + return -1; + } + arg =3D q; + } + if (zfs_mount(spa, newroot, &zfsmount)) { + printf("\nCan't mount ZFS dataset\n"); + return -1; + } } else { printf("\nCan't find ZFS pool %s\n", arg); return -1; } - arg =3D q; } if ((i =3D ep - arg)) { if ((size_t)i >=3D sizeof(kname)) Index: sys/boot/i386/btx/btxldr/btxldr.S =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= 
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D --- sys/boot/i386/btx/btxldr/btxldr.S (revision 229805) +++ sys/boot/i386/btx/btxldr/btxldr.S (working copy) @@ -112,7 +112,7 @@ call hexout # relocation call putstr # message #endif -start_null_bi: movl $0x18,%ecx # Allocate space +start_null_bi: movl $0x20,%ecx # Allocate space subl %ecx,%ebp # for arguments leal 0x4(%esp,1),%esi # Source movl %ebp,%edi # Destination Index: sys/boot/i386/btx/lib/btxcsu.s =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D --- sys/boot/i386/btx/lib/btxcsu.s (revision 229805) +++ sys/boot/i386/btx/lib/btxcsu.s (working copy) @@ -26,7 +26,7 @@ # # Constants. # - .set ARGADJ,0xfa0 # Argument adjustment + .set ARGADJ,0xf98 # Argument adjustment # # Client entry point. # Index: sys/boot/i386/loader/main.c =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D --- sys/boot/i386/loader/main.c (revision 229805) +++ sys/boot/i386/loader/main.c (working copy) @@ -52,14 +52,21 @@ u_int32_t howto; u_int32_t bootdev; u_int32_t bootflags; +#ifdef LOADER_ZFS_SUPPORT union { +#endif struct { u_int32_t pxeinfo; u_int32_t res2; }; +#ifdef LOADER_ZFS_SUPPORT uint64_t zfspool; }; +#endif u_int32_t bootinfo; +#ifdef LOADER_ZFS_SUPPORT + uint64_t zfsroot; +#endif } *kargs; =20 static u_int32_t initial_howto; @@ -72,6 +79,9 @@ static int isa_inb(int port); static void isa_outb(int port, int value); void exit(int code); +#ifdef LOADER_ZFS_SUPPORT +extern int zfs_extract_currdev(uint64_t guid, uint64_t rootobj, struct i38= 6_devdesc *dev); +#endif =20 /* from vers.c */ extern char bootprog_name[], bootprog_rev[], bootprog_date[], 
bootprog_mak= er[]; @@ -259,33 +269,16 @@ "Guessed BIOS device 0x%x not found by probes, defaulting to disk0= :\n", biosdev); new_currdev.d_unit =3D 0; } + +#ifdef LOADER_ZFS_SUPPORT + if ((kargs->bootflags & KARGS_FLAGS_ZFS) !=3D 0) + zfs_extract_currdev(kargs->zfspool, kargs->zfsroot, &new_currdev); +#endif + env_setenv("currdev", EV_VOLATILE, i386_fmtdev(&new_currdev), i386_setcurrdev, env_nounset); env_setenv("loaddev", EV_VOLATILE, i386_fmtdev(&new_currdev), env_nose= t, env_nounset); - -#ifdef LOADER_ZFS_SUPPORT - /* - * If we were started from a ZFS-aware boot2, we can work out - * which ZFS pool we are booting from. - */ - if (kargs->bootflags & KARGS_FLAGS_ZFS) { - /* - * Dig out the pool guid and convert it to a 'unit number' - */ - uint64_t guid; - int unit; - char devname[32]; - extern int zfs_guid_to_unit(uint64_t); - - guid =3D kargs->zfspool; - unit =3D zfs_guid_to_unit(guid); - if (unit >=3D 0) { - sprintf(devname, "zfs%d", unit); - setenv("currdev", devname, 1); - } - } -#endif } =20 COMMAND_SET(reboot, "reboot", "reboot the system", command_reboot); Index: sys/boot/zfs/zfs.c =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D --- sys/boot/zfs/zfs.c (revision 229805) +++ sys/boot/zfs/zfs.c (working copy) @@ -42,9 +42,12 @@ #include #include #include +#include <../i386/libi386/libi386.h> =20 #include "zfsimpl.c" =20 +#define MAXBDDEV 31 + static int zfs_open(const char *path, struct open_file *f); static int zfs_write(struct open_file *f, void *buf, size_t size, size_t *= resid); static int zfs_close(struct open_file *f); @@ -83,35 +86,20 @@ static int zfs_open(const char *upath, struct open_file *f) { - spa_t *spa =3D (spa_t *) f->f_devdata; + struct zfsmount *mount =3D (struct zfsmount *)f->f_devdata; struct file *fp; int rc; =20 if (f->f_dev !=3D &zfs_dev) return (EINVAL); =20 - rc =3D 
zfs_mount_pool(spa); - if (rc) - return (rc); - /* allocate file system specific data structure */ fp =3D malloc(sizeof(struct file)); bzero(fp, sizeof(struct file)); f->f_fsdata =3D (void *)fp; =20 - if (spa->spa_root_objset.os_type !=3D DMU_OST_ZFS) { - printf("Unexpected object set type %llu\n", - spa->spa_root_objset.os_type); - rc =3D EIO; - goto out; - } - - rc =3D zfs_lookup(spa, upath, &fp->f_dnode); - if (rc) - goto out; - + rc =3D zfs_lookup(mount, upath, &fp->f_dnode); fp->f_seekp =3D 0; -out: if (rc) { f->f_fsdata =3D NULL; free(fp); @@ -140,7 +128,7 @@ static int zfs_read(struct open_file *f, void *start, size_t size, size_t *resid /* o= ut */) { - spa_t *spa =3D (spa_t *) f->f_devdata; + spa_t *spa =3D ((struct zfsmount *)f->f_devdata)->spa; struct file *fp =3D (struct file *)f->f_fsdata; struct stat sb; size_t n; @@ -214,16 +202,16 @@ static int zfs_stat(struct open_file *f, struct stat *sb) { - spa_t *spa =3D (spa_t *) f->f_devdata; + struct zfsmount *mount =3D (struct zfsmount *)f->f_devdata; struct file *fp =3D (struct file *)f->f_fsdata; =20 - return (zfs_dnode_stat(spa, &fp->f_dnode, sb)); + return (zfs_dnode_stat(mount, &fp->f_dnode, sb)); } =20 static int zfs_readdir(struct open_file *f, struct dirent *d) { - spa_t *spa =3D (spa_t *) f->f_devdata; + spa_t *spa =3D ((struct zfsmount *)f->f_devdata)->spa; struct file *fp =3D (struct file *)f->f_fsdata; mzap_ent_phys_t mze; struct stat sb; @@ -379,22 +367,31 @@ } } =20 -/* - * Convert a pool guid to a 'unit number' suitable for use with zfs_dev_op= en. 
- */ int -zfs_guid_to_unit(uint64_t guid) +zfs_extract_currdev(uint64_t guid, uint64_t rootobj, struct i386_devdesc *= dev) { spa_t *spa; - int unit; + int rv; =20 - unit =3D 0; - STAILQ_FOREACH(spa, &zfs_pools, spa_link) { - if (spa->spa_guid =3D=3D guid) - return unit; - unit++; + spa =3D spa_find_by_guid(guid); + if (spa =3D=3D NULL) + return (ENOENT); + + rv =3D zfs_spa_init(spa); + if (rv !=3D 0) + return (rv); + strcpy(dev->d_kind.zfs.poolname, spa->spa_name); + if (rootobj =3D=3D 0 && zfs_get_root(spa, &rootobj)) { + printf("ZFS: can't find root filesystem\n"); + return (EIO); } - return (-1); + if (zfs_rlookup(spa, rootobj, dev->d_kind.zfs.rootname)) { + printf("ZFS: can't map root filesystem to its name\n"); + return (ENOENT); + } + dev->d_dev =3D &zfs_dev; + dev->d_type =3D dev->d_dev->dv_type; + return (0); } =20 static int @@ -410,7 +407,7 @@ * diskN, diskNpM or diskNsM. */ zfs_init(); - for (unit =3D 0; unit < 32 /* XXX */; unit++) { + for (unit =3D 0; unit < MAXBDDEV; unit++) { sprintf(devname, "disk%d:", unit); fd =3D open(devname, O_RDONLY); if (fd =3D=3D -1) @@ -448,17 +445,14 @@ { spa_t *spa; char line[80]; - int unit; =20 if (verbose) { spa_all_status(); return; } - unit =3D 0; STAILQ_FOREACH(spa, &zfs_pools, spa_link) { - sprintf(line, " zfs%d: %s\n", unit, spa->spa_name); + sprintf(line, " zfs:%s\n", spa->spa_name); pager_output(line); - unit++; } } =20 @@ -469,33 +463,38 @@ zfs_dev_open(struct open_file *f, ...) { va_list args; - struct devdesc *dev; - int unit, i; + struct i386_devdesc *dev; + struct zfsmount *mount; spa_t *spa; + uint64_t rootobj; + int rv; =20 va_start(args, f); - dev =3D va_arg(args, struct devdesc*); + dev =3D va_arg(args, struct i386_devdesc *); va_end(args); =20 - /* - * We mostly ignore the stuff that devopen sends us. For now, - * use the unit to find a pool - later we will override the - * devname parsing so that we can name a pool and a fs within - * the pool. 
- */ - unit =3D dev->d_unit; -=09 - i =3D 0; - STAILQ_FOREACH(spa, &zfs_pools, spa_link) { - if (i =3D=3D unit) - break; - i++; - } - if (!spa) { + spa =3D spa_find_by_name(dev->d_kind.zfs.poolname); + if (!spa) return (ENXIO); + rv =3D zfs_spa_init(spa); + if (rv !=3D 0) + return (rv); + mount =3D malloc(sizeof(*mount)); + rootobj =3D 0; + if (dev->d_kind.zfs.rootname[0] !=3D '\0') { + rv =3D zfs_lookup_dataset(spa, dev->d_kind.zfs.rootname, &rootobj); + if (rv !=3D 0) + return (rv); } - - f->f_devdata =3D spa; + rv =3D zfs_mount(spa, rootobj, mount); + if (rv !=3D 0) + return (rv); + if (mount->objset.os_type !=3D DMU_OST_ZFS) { + printf("Unexpected object set type %llu\n", + mount->objset.os_type); + return (EIO); + } + f->f_devdata =3D mount; free(dev); return (0); } @@ -504,6 +503,7 @@ zfs_dev_close(struct open_file *f) { =20 + free(f->f_devdata); f->f_devdata =3D NULL; return (0); } Index: sys/boot/zfs/zfsimpl.c =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D --- sys/boot/zfs/zfsimpl.c (revision 229805) +++ sys/boot/zfs/zfsimpl.c (working copy) @@ -36,6 +36,13 @@ #include "zfsimpl.h" #include "zfssubr.c" =20 + +struct zfsmount { + spa_t *spa; + objset_phys_t objset; + uint64_t rootobj; +}; + /* * List of all vdevs, chained through v_alllink. 
*/ @@ -458,6 +465,9 @@ =20 if (strcmp(type, VDEV_TYPE_MIRROR) && strcmp(type, VDEV_TYPE_DISK) +#ifdef ZFS_TEST + && strcmp(type, VDEV_TYPE_FILE) +#endif && strcmp(type, VDEV_TYPE_RAIDZ) && strcmp(type, VDEV_TYPE_REPLACING)) { printf("ZFS: can only boot from disk, mirror, raidz1, raidz2 and raidz3 = vdevs\n"); @@ -623,8 +633,6 @@ return (0); } =20 -#ifdef BOOT2 - static spa_t * spa_find_by_name(const char *name) { @@ -637,8 +645,6 @@ return (0); } =20 -#endif - static spa_t * spa_create(uint64_t guid) { @@ -968,7 +974,7 @@ int v; =20 for (v =3D 0; v < 32; v++) - if (n =3D=3D (1 << v)) + if (n =3D=3D (1 << v)) /* XXX n is expected to be a power of two? */ return v; return -1; } @@ -1449,6 +1455,259 @@ dnode, sizeof(dnode_phys_t)); } =20 +static int +mzap_rlookup(spa_t *spa, const dnode_phys_t *dnode, char *name, uint64_t v= alue) +{ + const mzap_phys_t *mz; + const mzap_ent_phys_t *mze; + size_t size; + int chunks, i; + + /* + * Microzap objects use exactly one block. Read the whole + * thing. 
+ */ + size =3D dnode->dn_datablkszsec * 512; + + mz =3D (const mzap_phys_t *) zap_scratch; + chunks =3D size / MZAP_ENT_LEN - 1; + + for (i =3D 0; i < chunks; i++) { + mze =3D &mz->mz_chunk[i]; + if (value =3D=3D mze->mze_value) { + strcpy(name, mze->mze_name); + return (0); + } + } + + return (ENOENT); +} + +static void +fzap_name_copy(const zap_leaf_t *zl, const zap_leaf_chunk_t *zc, char *nam= e) +{ + size_t namelen; + const zap_leaf_chunk_t *nc; + char *p; + + namelen =3D zc->l_entry.le_name_length; + + nc =3D &ZAP_LEAF_CHUNK(zl, zc->l_entry.le_name_chunk); + p =3D name; + while (namelen > 0) { + size_t len; + len =3D namelen; + if (len > ZAP_LEAF_ARRAY_BYTES) + len =3D ZAP_LEAF_ARRAY_BYTES; + memcpy(p, nc->l_array.la_array, len); + p +=3D len; + namelen -=3D len; + nc =3D &ZAP_LEAF_CHUNK(zl, nc->l_array.la_next); + } + + *p =3D '\0'; +} + +static int +fzap_rlookup(spa_t *spa, const dnode_phys_t *dnode, char *name, uint64_t v= alue) +{ + int bsize =3D dnode->dn_datablkszsec << SPA_MINBLOCKSHIFT; + zap_phys_t zh =3D *(zap_phys_t *) zap_scratch; + fat_zap_t z; + uint64_t *ptrtbl; + uint64_t hash; + int rc; + + if (zh.zap_magic !=3D ZAP_MAGIC) + return (EIO); + + z.zap_block_shift =3D ilog2(bsize); + z.zap_phys =3D (zap_phys_t *) zap_scratch; + + /* + * Figure out where the pointer table is and read it in if necessary. + */ + if (zh.zap_ptrtbl.zt_blk) { + rc =3D dnode_read(spa, dnode, zh.zap_ptrtbl.zt_blk * bsize, + zap_scratch, bsize); + if (rc) + return (rc); + ptrtbl =3D (uint64_t *) zap_scratch; + } else { + ptrtbl =3D &ZAP_EMBEDDED_PTRTBL_ENT(&z, 0); + } + + hash =3D zap_hash(zh.zap_salt, name); + + zap_leaf_t zl; + zl.l_bs =3D z.zap_block_shift; + + off_t off =3D ptrtbl[hash >> (64 - zh.zap_ptrtbl.zt_shift)] << zl.l_bs; + zap_leaf_chunk_t *zc; + + rc =3D dnode_read(spa, dnode, off, zap_scratch, bsize); + if (rc) + return (rc); + + zl.l_phys =3D (zap_leaf_phys_t *) zap_scratch; + + /* + * Make sure this chunk matches our hash. 
+ */ + if (zl.l_phys->l_hdr.lh_prefix_len > 0 + && zl.l_phys->l_hdr.lh_prefix + !=3D hash >> (64 - zl.l_phys->l_hdr.lh_prefix_len)) + return (ENOENT); + + /* + * Hash within the chunk to find our entry. + */ + int shift =3D (64 - ZAP_LEAF_HASH_SHIFT(&zl) - zl.l_phys->l_hdr.lh_prefix= _len); + int h =3D (hash >> shift) & ((1 << ZAP_LEAF_HASH_SHIFT(&zl)) - 1); + h =3D zl.l_phys->l_hash[h]; + if (h =3D=3D 0xffff) + return (ENOENT); + zc =3D &ZAP_LEAF_CHUNK(&zl, h); + while (zc->l_entry.le_hash !=3D hash) { + if (zc->l_entry.le_next =3D=3D 0xffff) { + zc =3D 0; + break; + } + zc =3D &ZAP_LEAF_CHUNK(&zl, zc->l_entry.le_next); + } + if (fzap_leaf_value(&zl, zc) =3D=3D value) { + fzap_name_copy(&zl, zc, name); + return (0); + } + + return (ENOENT); +} + +static int +zap_rlookup(spa_t *spa, const dnode_phys_t *dnode, char *name, uint64_t va= lue) +{ + int rc; + uint64_t zap_type; + size_t size =3D dnode->dn_datablkszsec * 512; + + rc =3D dnode_read(spa, dnode, 0, zap_scratch, size); + if (rc) + return (rc); + + zap_type =3D *(uint64_t *) zap_scratch; + if (zap_type =3D=3D ZBT_MICRO) + return mzap_rlookup(spa, dnode, name, value); + else + return fzap_rlookup(spa, dnode, name, value); +} + +static int +zfs_rlookup(spa_t *spa, uint64_t objnum, char *result) +{ + char name[256]; + char component[256]; + uint64_t dir_obj, parent_obj, child_dir_zapobj; + dnode_phys_t child_dir_zap, dataset, dir, parent; + dsl_dir_phys_t *dd; + dsl_dataset_phys_t *ds; + char *p; + int len; + + p =3D &name[sizeof(name) - 1]; + *p =3D '\0'; + + if (objset_get_dnode(spa, &spa->spa_mos, objnum, &dataset)) { + printf("ZFS: can't find dataset %llu\n", objnum); + return (EIO); + } + ds =3D (dsl_dataset_phys_t *)&dataset.dn_bonus; + dir_obj =3D ds->ds_dir_obj; + + for (;;) { + if (objset_get_dnode(spa, &spa->spa_mos, dir_obj, &dir) !=3D 0) + return (EIO); + dd =3D (dsl_dir_phys_t *)&dir.dn_bonus; + + /* Actual loop condition. 
*/ + parent_obj =3D dd->dd_parent_obj; + if (parent_obj =3D=3D 0) + break; + + if (objset_get_dnode(spa, &spa->spa_mos, parent_obj, &parent) !=3D 0) + return (EIO); + dd =3D (dsl_dir_phys_t *)&parent.dn_bonus; + child_dir_zapobj =3D dd->dd_child_dir_zapobj; + if (objset_get_dnode(spa, &spa->spa_mos, child_dir_zapobj, &child_dir_za= p) !=3D 0) + return (EIO); + if (zap_rlookup(spa, &child_dir_zap, component, dir_obj) !=3D 0) + return (EIO); + + len =3D strlen(component); + p -=3D len; + memcpy(p, component, len); + --p; + *p =3D '/'; + + /* Actual loop iteration. */ + dir_obj =3D parent_obj; + } + + if (*p !=3D '\0') + ++p; + strcpy(result, p); + + return (0); +} + +static int +zfs_lookup_dataset(spa_t *spa, const char *name, uint64_t *objnum) +{ + char element[256]; + uint64_t dir_obj, child_dir_zapobj; + dnode_phys_t child_dir_zap, dir; + dsl_dir_phys_t *dd; + const char *p, *q; + + if (objset_get_dnode(spa, &spa->spa_mos, DMU_POOL_DIRECTORY_OBJECT, &dir)) + return (EIO); + if (zap_lookup(spa, &dir, DMU_POOL_ROOT_DATASET, &dir_obj)) + return (EIO); + + p =3D name; + for (;;) { + if (objset_get_dnode(spa, &spa->spa_mos, dir_obj, &dir)) + return (EIO); + dd =3D (dsl_dir_phys_t *)&dir.dn_bonus; + + while (*p =3D=3D '/') + p++; + /* Actual loop condition #1. */ + if (*p =3D=3D '\0') + break; + + q =3D strchr(p, '/'); + if (q) { + memcpy(element, p, q - p); + element[q - p] =3D '\0'; + p =3D q + 1; + } else { + strcpy(element, p); + p +=3D strlen(p); + } + + child_dir_zapobj =3D dd->dd_child_dir_zapobj; + if (objset_get_dnode(spa, &spa->spa_mos, child_dir_zapobj, &child_dir_za= p) !=3D 0) + return (EIO); + + /* Actual loop condition #2. 
*/ + if (zap_lookup(spa, &child_dir_zap, element, &dir_obj) !=3D 0) + return (ENOENT); + } + + *objnum =3D dd->dd_head_dataset_obj; + return (0); +} + /* * Find the object set given the object number of its dataset object * and return its details in *objset @@ -1478,11 +1737,13 @@ * dataset if there is none and return its details in *objset */ static int -zfs_mount_root(spa_t *spa, objset_phys_t *objset) +zfs_get_root(spa_t *spa, uint64_t *objid) { dnode_phys_t dir, propdir; uint64_t props, bootfs, root; =20 + *objid =3D 0; + /* * Start with the MOS directory object. */ @@ -1498,8 +1759,10 @@ && objset_get_dnode(spa, &spa->spa_mos, props, &propdir) =3D=3D 0 && zap_lookup(spa, &propdir, "bootfs", &bootfs) =3D=3D 0 && bootfs !=3D 0) - return zfs_mount_dataset(spa, bootfs, objset); - + { + *objid =3D bootfs; + return (0); + } /* * Lookup the root dataset directory */ @@ -1514,36 +1777,52 @@ * to find the dataset object and from that the object set itself. */ dsl_dir_phys_t *dd =3D (dsl_dir_phys_t *) &dir.dn_bonus; - return zfs_mount_dataset(spa, dd->dd_head_dataset_obj, objset); + *objid =3D dd->dd_head_dataset_obj; + return (0); } =20 static int -zfs_mount_pool(spa_t *spa) +zfs_mount(spa_t *spa, uint64_t rootobj, struct zfsmount *mount) { =20 + mount->spa =3D spa; + /* - * Find the MOS and work our way in from there. 
+ * Find the root object set if not explicitly provided */ - if (zio_read(spa, &spa->spa_uberblock.ub_rootbp, &spa->spa_mos)) { - printf("ZFS: can't read MOS\n"); + if (rootobj =3D=3D 0 && zfs_get_root(spa, &rootobj)) { + printf("ZFS: can't find root filesystem\n"); return (EIO); } =20 - /* - * Find the root object set - */ - if (zfs_mount_root(spa, &spa->spa_root_objset)) { - printf("Can't find root filesystem - giving up\n"); + if (zfs_mount_dataset(spa, rootobj, &mount->objset)) { + printf("ZFS: can't open root filesystem\n"); return (EIO); } =20 + mount->rootobj =3D rootobj; + return (0); } =20 static int -zfs_dnode_stat(spa_t *spa, dnode_phys_t *dn, struct stat *sb) +zfs_spa_init(spa_t *spa) { =20 + if (zio_read(spa, &spa->spa_uberblock.ub_rootbp, &spa->spa_mos)) { + printf("ZFS: can't read MOS of pool %s\n", spa->spa_name); + return (EIO); + } + return (0); +} + +static int +zfs_dnode_stat(const struct zfsmount *mount, dnode_phys_t *dn, struct stat= *sb) +{ + spa_t *spa; + + spa =3D mount->spa; + if (dn->dn_bonustype !=3D DMU_OT_SA) { znode_phys_t *zp =3D (znode_phys_t *)dn->dn_bonus; =20 @@ -1596,10 +1875,11 @@ * Lookup a file and return its dnode. */ static int -zfs_lookup(spa_t *spa, const char *upath, dnode_phys_t *dnode) +zfs_lookup(const struct zfsmount *mount, const char *upath, dnode_phys_t *= dnode) { int rc; uint64_t objnum, rootnum, parentnum; + spa_t *spa; dnode_phys_t dn; const char *p, *q; char element[256]; @@ -1607,16 +1887,17 @@ int symlinks_followed =3D 0; struct stat sb; =20 - if (spa->spa_root_objset.os_type !=3D DMU_OST_ZFS) { + spa =3D mount->spa; + if (mount->objset.os_type !=3D DMU_OST_ZFS) { printf("ZFS: unexpected object set type %llu\n", - spa->spa_root_objset.os_type); + mount->objset.os_type); return (EIO); } =20 /* * Get the root directory dnode. 
*/ - rc =3D objset_get_dnode(spa, &spa->spa_root_objset, MASTER_NODE_OBJ, &dn); + rc =3D objset_get_dnode(spa, &mount->objset, MASTER_NODE_OBJ, &dn); if (rc) return (rc); =20 @@ -1624,7 +1905,7 @@ if (rc) return (rc); =20 - rc =3D objset_get_dnode(spa, &spa->spa_root_objset, rootnum, &dn); + rc =3D objset_get_dnode(spa, &mount->objset, rootnum, &dn); if (rc) return (rc); =20 @@ -1645,7 +1926,7 @@ p =3D 0; } =20 - rc =3D zfs_dnode_stat(spa, &dn, &sb); + rc =3D zfs_dnode_stat(mount, &dn, &sb); if (rc) return (rc); if (!S_ISDIR(sb.st_mode)) @@ -1657,14 +1938,14 @@ return (rc); objnum =3D ZFS_DIRENT_OBJ(objnum); =20 - rc =3D objset_get_dnode(spa, &spa->spa_root_objset, objnum, &dn); + rc =3D objset_get_dnode(spa, &mount->objset, objnum, &dn); if (rc) return (rc); =20 /* * Check for symlink. */ - rc =3D zfs_dnode_stat(spa, &dn, &sb); + rc =3D zfs_dnode_stat(mount, &dn, &sb); if (rc) return (rc); if (S_ISLNK(sb.st_mode)) { @@ -1699,7 +1980,7 @@ objnum =3D rootnum; else objnum =3D parentnum; - objset_get_dnode(spa, &spa->spa_root_objset, objnum, &dn); + objset_get_dnode(spa, &mount->objset, objnum, &dn); } } =20 Index: sys/cddl/boot/zfs/zfsimpl.h =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D --- sys/cddl/boot/zfs/zfsimpl.h (revision 229805) +++ sys/cddl/boot/zfs/zfsimpl.h (working copy) @@ -1327,5 +1327,4 @@ struct uberblock spa_uberblock; /* best uberblock so far */ vdev_list_t spa_vdevs; /* list of all toplevel vdevs */ objset_phys_t spa_mos; /* MOS for this pool */ - objset_phys_t spa_root_objset; /* current mounted ZPL objset */ } spa_t; --MP_/8h6raSy1Ymd8hSZ2Yhq+T1H Content-Type: text/x-patch Content-Transfer-Encoding: quoted-printable Content-Disposition: attachment; filename=zfsboot-broken-to-working.patch diff --exclude=3D.svn -Nur boot-stable-8/i386/zfsboot/zfsboot.c stable-8/sy= 
s/boot/i386/zfsboot/zfsboot.c --- boot-stable-8/i386/zfsboot/zfsboot.c 2012-01-08 19:01:50.409526858 +0000 +++ stable-8/sys/boot/i386/zfsboot/zfsboot.c 2012-01-08 19:06:06.021632256 = +0000 @@ -45,7 +45,8 @@ /* Hint to loader that we came from ZFS */ #define KARGS_FLAGS_ZFS 0x4 =20 -#define PATH_CONFIG "/boot.config" +#define PATH_DOTCONFIG "/boot.config" +#define PATH_CONFIG "/boot/config" #define PATH_BOOT3 "/boot/zfsloader" #define PATH_KERNEL "/boot/kernel/kernel" =20 @@ -53,7 +54,7 @@ #define NOPT 14 #define NDEV 3 =20 -#define BIOS_NUMDRIVES 0x475 +#define BIOS_NUMDRIVES 0x475 #define DRV_HARD 0x80 #define DRV_MASK 0x7f =20 @@ -497,7 +498,12 @@ * will find any other available pools and it may fill in missing * vdevs for the boot pool. */ - for (i =3D 0; i < *(unsigned char *)PTOV(BIOS_NUMDRIVES); i++) { +#ifndef VIRTUALBOX + for (i =3D 0; i < *(unsigned char *)PTOV(BIOS_NUMDRIVES); i++) +#else + for (i =3D 0; i < MAXBDDEV; i++) +#endif + { if ((i | DRV_HARD) =3D=3D *(uint8_t *)PTOV(ARGS)) continue; =20 @@ -528,9 +534,12 @@ } } =20 - zfs_mount(spa, 0, &zfsmount); - - if (zfs_lookup(&zfsmount, PATH_CONFIG, &dn) =3D=3D 0) { + if (zfs_spa_init(spa) !=3D 0 || zfs_mount(spa, 0, &zfsmount) !=3D 0) { + printf("%s: failed to mount default pool %s\n", + BOOTPROG, spa->spa_name); + autoboot =3D 0; + } else if (zfs_lookup(&zfsmount, PATH_CONFIG, &dn) =3D=3D 0 || + zfs_lookup(&zfsmount, PATH_DOTCONFIG, &dn) =3D=3D 0) { off =3D 0; zfs_read(spa, &dn, &off, cmd, sizeof(cmd)); } @@ -565,7 +574,7 @@ if (zfs_rlookup(spa, zfsmount.rootobj, rootname) !=3D 0) printf("Default: %s:<0x%llx>:%s\n" "boot: ", - spa->spa_name, rootname, kname); + spa->spa_name, zfsmount.rootobj, kname); else printf("Default: %s:%s:%s\n" "boot: ", diff --exclude=3D.svn -Nur boot-stable-8/zfs/zfs.c stable-8/sys/boot/zfs/zf= s.c --- boot-stable-8/zfs/zfs.c 2012-01-08 19:01:50.409526858 +0000 +++ stable-8/sys/boot/zfs/zfs.c 2012-01-08 19:04:04.050596447 +0000 @@ -46,6 +46,8 @@ =20 #include "zfsimpl.c" =20 
+#define MAXBDDEV 31 + static int zfs_open(const char *path, struct open_file *f); static int zfs_write(struct open_file *f, void *buf, size_t size, size_t *= resid); static int zfs_close(struct open_file *f); @@ -369,15 +371,16 @@ zfs_extract_currdev(uint64_t guid, uint64_t rootobj, struct i386_devdesc *= dev) { spa_t *spa; + int rv; =20 - STAILQ_FOREACH(spa, &zfs_pools, spa_link) - if (spa->spa_guid =3D=3D guid) { - strcpy(dev->d_kind.zfs.poolname, spa->spa_name); - break; - } - + spa =3D spa_find_by_guid(guid); if (spa =3D=3D NULL) return (ENOENT); + + rv =3D zfs_spa_init(spa); + if (rv !=3D 0) + return (rv); + strcpy(dev->d_kind.zfs.poolname, spa->spa_name); if (rootobj =3D=3D 0 && zfs_get_root(spa, &rootobj)) { printf("ZFS: can't find root filesystem\n"); return (EIO); @@ -404,7 +407,7 @@ * diskN, diskNpM or diskNsM. */ zfs_init(); - for (unit =3D 0; unit < 32 /* XXX */; unit++) { + for (unit =3D 0; unit < MAXBDDEV; unit++) { sprintf(devname, "disk%d:", unit); fd =3D open(devname, O_RDONLY); if (fd =3D=3D -1) @@ -470,11 +473,12 @@ dev =3D va_arg(args, struct i386_devdesc *); va_end(args); =20 - STAILQ_FOREACH(spa, &zfs_pools, spa_link) - if (strcmp(spa->spa_name, dev->d_kind.zfs.poolname) =3D=3D 0) - break; + spa =3D spa_find_by_name(dev->d_kind.zfs.poolname); if (!spa) return (ENXIO); + rv =3D zfs_spa_init(spa); + if (rv !=3D 0) + return (rv); mount =3D malloc(sizeof(*mount)); rootobj =3D 0; if (dev->d_kind.zfs.rootname[0] !=3D '\0') { diff --exclude=3D.svn -Nur boot-stable-8/zfs/zfsimpl.c stable-8/sys/boot/zf= s/zfsimpl.c --- boot-stable-8/zfs/zfsimpl.c 2012-01-08 19:01:50.409526858 +0000 +++ stable-8/sys/boot/zfs/zfsimpl.c 2012-01-08 19:04:04.070575607 +0000 @@ -633,8 +633,6 @@ return (0); } =20 -#ifdef BOOT2 - static spa_t * spa_find_by_name(const char *name) { @@ -647,8 +645,6 @@ return (0); } =20 -#endif - static spa_t * spa_create(uint64_t guid) { @@ -967,15 +963,6 @@ } zfs_free(upbuf, VDEV_UBERBLOCK_SIZE(vdev)); =20 - STAILQ_FOREACH(top_vdev, 
&spa->spa_vdevs, v_childlink) - if (top_vdev->v_state < VDEV_STATE_DEGRADED) - break; - if (top_vdev =3D=3D NULL) { /* all top vdevs are sufficiently healthy to = try */ - if (zio_read(spa, &spa->spa_uberblock.ub_rootbp, &spa->spa_mos)) { - printf("ZFS: can't read MOS\n"); - } - } - if (spap) *spap =3D spa; return (0); @@ -987,7 +974,7 @@ int v; =20 for (v =3D 0; v < 32; v++) - if (n =3D=3D (1 << v)) + if (n =3D=3D (1 << v)) /* XXX n is expected to be a power of two? */ return v; return -1; } @@ -1819,6 +1806,17 @@ } =20 static int +zfs_spa_init(spa_t *spa) +{ + + if (zio_read(spa, &spa->spa_uberblock.ub_rootbp, &spa->spa_mos)) { + printf("ZFS: can't read MOS of pool %s\n", spa->spa_name); + return (EIO); + } + return (0); +} + +static int zfs_dnode_stat(const struct zfsmount *mount, dnode_phys_t *dn, struct stat= *sb) { spa_t *spa; --MP_/8h6raSy1Ymd8hSZ2Yhq+T1H-- --Sig_/yZj6IQdWDsGiC2GeS=oaHxP Content-Type: application/pgp-signature; name=signature.asc Content-Disposition: attachment; filename=signature.asc -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.17 (GNU/Linux) iEYEARECAAYFAk8KzWwACgkQLvW/2gp2pPzBFQCgn5Mnw1bCmcc20VvQh+kQ04Sx nFUAnjEc48w9QGXeY6o+iAXv+kalnsuP =DjNq -----END PGP SIGNATURE----- --Sig_/yZj6IQdWDsGiC2GeS=oaHxP-- From owner-freebsd-fs@FreeBSD.ORG Mon Jan 9 12:09:44 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 8ECD3106566B for ; Mon, 9 Jan 2012 12:09:44 +0000 (UTC) (envelope-from peter.maloney@brockmann-consult.de) Received: from moutng.kundenserver.de (moutng.kundenserver.de [212.227.126.171]) by mx1.freebsd.org (Postfix) with ESMTP id 202AE8FC0C for ; Mon, 9 Jan 2012 12:09:44 +0000 (UTC) Received: from [10.3.0.26] ([141.4.215.32]) by mrelayeu.kundenserver.de (node=mreu3) with ESMTP (Nemesis) id 0M6c40-1SgIsd0XLw-00wWtK; Mon, 09 Jan 2012 13:09:43 +0100 Message-ID: <4F0AD906.5080507@brockmann-consult.de> Date: Mon, 09 
Jan 2012 13:09:42 +0100 From: Peter Maloney User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.23) Gecko/20110922 Thunderbird/3.1.15 MIME-Version: 1.0 To: freebsd-fs@freebsd.org References: <20111205220715.GA36072@freebsdbox.adamsnet> <4EDDE954.9020709@gmail.com> In-Reply-To: X-Enigmail-Version: 1.1.2 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-Provags-ID: V02:K0:X64QXhP0j+GpVJ/vyvW2jPUJQ4mTv0uWD7DtHnJ+8Ro zP2ZP54ghSdGzoq/qhbi9DAMt+vIxwxYreokZ58HZmrqV9qSFv jmc4+08GrcwF8FD3xUShvVNQITl/GN2rBWjAFPPXwFIIwu5ah9 7yrkdATA38iS+Sw6OZ11bHSUF+4FHSW3KpH+ARXulSBMmKpsa4 m1IdsONNXIp1/Yz94sFd3nzoyQNsTx4wC1qqQCFry+f0Z5xZ0A niNBEvoyOXKcQ9lvNX+Rk7c/E4njYvJthkIyJJZGT664DlUlZ9 KgCYlCu51gPwegzOjeiggVNAf9IhE5ticArzGcN7UF2tBN2Vk/ xhBZSmXX/pL56TxbVCokEtExH9FONeHr2AbYwNz2D Subject: Re: weird bug with ZFS and SLOG X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 09 Jan 2012 12:09:44 -0000 On 12/06/2011 03:22 PM, Adam Stylinski wrote: > On Tue, Dec 6, 2011 at 5:07 AM, Volodymyr Kostyrko wrote: > > ... > Actually my pool was created on 7.2 and has been updated every release, is > that a factor? I think the answer is yes. Today I caused the same thing you guys did, with a log that wouldn't go away. I always did the same thing on the main system, with no issues (removing/replacing the log). But today for the first time, I wanted to remove the log from the backup system, and it caused/revealed that problem. I think the relevant difference between the two systems is that I destroyed and recreated the main system's zpool after I upgraded to 8-STABLE and zpool v28 (still zfs version 4). The backup system was upgraded from v15 to v28. old: zpool v15 zfs v4 new: zpool v28 zfs v4 And now to fix it, I am moving and destroying the pool. 
(to zpool v28 zfs v5) It is easy to just offline the device and remove it, but I want to be sure there is nothing else broken in there. For example, if I attach a mirror to the log now, it triggers a resilver that does not show which disk is resilvering in zpool status, and says it will take 25 hours, even though the 8GB SSD log obviously doesn't need that long (and replacing a pool disk doesn't either). And zdb shows more than just the log disk, but all slices with the same physical disk with "resilvering: 1". That is not extremely serious, but undesirable. > I did manage to do this in a VM, so in any case we'll > probably have to do something that involves zdb. > _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" -- -------------------------------------------- Peter Maloney Brockmann Consult Max-Planck-Str. 2 21502 Geesthacht Germany Tel: +49 4152 889 300 Fax: +49 4152 889 333 E-mail: peter.maloney@brockmann-consult.de Internet: http://www.brockmann-consult.de -------------------------------------------- From owner-freebsd-fs@FreeBSD.ORG Mon Jan 9 18:27:06 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id DB96E106566C for ; Mon, 9 Jan 2012 18:27:06 +0000 (UTC) (envelope-from zorbustheknight@gmail.com) Received: from mail-wi0-f182.google.com (mail-wi0-f182.google.com [209.85.212.182]) by mx1.freebsd.org (Postfix) with ESMTP id 7149C8FC0C for ; Mon, 9 Jan 2012 18:27:05 +0000 (UTC) Received: by wibhr1 with SMTP id hr1so4371180wib.13 for ; Mon, 09 Jan 2012 10:27:05 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=mime-version:date:message-id:subject:from:to:content-type; bh=WulSs/+p4zZ5hPJK1WDmtpWT7YO5Yp3s8lZtGuH4lxE=; 
b=plBQ4Lq0mr5dqf1Gt/LtUlK/qsdDbGxq9tDb4Du2Sm+GUqMkTK4GTMTU5kSJMcFGxD YUTk0oXd4cIxRqZmmEgMXlmGPYONQ3ghSHzswovqS4zvBPEgZJzhvsvEx0SSsg+Mkzue LY0OrqMDx5si7+SJvNWuFCp77Ln/wJhkzgfh8= MIME-Version: 1.0 Received: by 10.180.80.162 with SMTP id s2mr6489504wix.10.1326131943531; Mon, 09 Jan 2012 09:59:03 -0800 (PST) Received: by 10.223.124.195 with HTTP; Mon, 9 Jan 2012 09:59:03 -0800 (PST) Date: Mon, 9 Jan 2012 09:59:03 -0800 Message-ID: From: "frank.weed@magi.bounceme.net" To: freebsd-fs@freebsd.org Content-Type: text/plain; charset=ISO-8859-1 Subject: zfs mounting issue. X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 09 Jan 2012 18:27:06 -0000 Hi, I am currently using FreeNAS (9.0-RC2 FreeBSD 9.0-RC2) with ZFS v28. Recently I had an issue with the USB disk that the OS was installed on. I have reinstalled the system and tried to import the pool, but the import hangs the system after some time. If I reboot the system, it shows the pool as online, but it is not mounted. If I issue the command zfs mount datastore, the system hangs once again. I am more or less stumped on the issue. Any advice or ideas on how I can get around this would be helpful. Thanks.
From owner-freebsd-fs@FreeBSD.ORG Mon Jan 9 19:26:55 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 9BAE81065680 for ; Mon, 9 Jan 2012 19:26:55 +0000 (UTC) (envelope-from amdmi3@amdmi3.ru) Received: from smtp.timeweb.ru (smtp.timeweb.ru [92.53.116.57]) by mx1.freebsd.org (Postfix) with ESMTP id 54A618FC1C for ; Mon, 9 Jan 2012 19:26:55 +0000 (UTC) Received: from [213.148.20.85] (helo=hive.panopticon) by smtp.timeweb.ru with esmtpsa (TLSv1:CAMELLIA256-SHA:256) (Exim 4.76) (envelope-from ) id 1RkKRe-0003mH-BY for freebsd-fs@FreeBSD.org; Mon, 09 Jan 2012 22:59:46 +0400 Received: from hades.panopticon (hades.panopticon [192.168.0.32]) by hive.panopticon (Postfix) with ESMTP id 6B223B84D for ; Mon, 9 Jan 2012 22:59:45 +0400 (MSK) Received: by hades.panopticon (Postfix, from userid 1000) id 14DF891; Mon, 9 Jan 2012 22:59:45 +0400 (MSK) Date: Mon, 9 Jan 2012 22:59:45 +0400 From: Dmitry Marakasov To: freebsd-fs@FreeBSD.org Message-ID: <20120109185944.GA8140@hades.panopticon> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline User-Agent: Mutt/1.5.21 (2010-09-15) Cc: Subject: Issues with multiple-vdev ZFS root X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 09 Jan 2012 19:26:55 -0000 Hi! I've recently moved to a ZFS root installation on a 2-way mirror, intending to add another pair of disks later. However, when I actually tried to add a second vdev, I got a "root pool can not have multiple vdevs or separate logs" error. It's possible to get around this error by removing the bootfs property from the pool, adding the second vdev, and then re-adding the property. I decided to experiment in VirtualBox first.
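The bootfs workaround described above could be sketched roughly as the sh function below. It only prints the command sequence for review and runs nothing. The pool, dataset, and disk names in the usage line are hypothetical placeholders, and clearing bootfs by setting it to an empty value is an assumption about how the property was removed, not something stated in the message.

```shell
#!/bin/sh
# print_bootfs_workaround POOL BOOTFS DISK1 DISK2
# Prints (does not execute) the sequence: show and clear bootfs,
# add a second mirror vdev, restore bootfs, show the result.
# All four arguments are placeholders supplied by the caller.
print_bootfs_workaround() {
    pool=$1 bootfs=$2 disk1=$3 disk2=$4
    cat <<EOF
zpool get bootfs $pool
zpool set bootfs= $pool
zpool add $pool mirror $disk1 $disk2
zpool set bootfs=$bootfs $pool
zpool status $pool
EOF
}

# Hypothetical names; substitute the real pool, boot dataset and disks.
print_bootfs_workaround zroot zroot/ROOT ada2 ada3
```

Printing rather than executing keeps the sketch harmless on a live root pool; the output can be inspected and pasted one line at a time.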
It turned out that after adding a second mirror this way, the system was still bootable. However, after copying /boot over (mv /boot /boot.old && cp -RPp /boot.old /boot) the system would not boot, failing with
ZFS: i/o error - all block copies unavailable
Invalid format
(see [1]). It seems that the files landed on the second vdev and gptzfsboot was no longer able to read them. Still, I managed to solve that by increasing the number of copies for the pool to 3 and copying /boot over again, which likely made the files available from the first vdev again. Later I was told that the cause is that the VirtualBox BIOS only reports the first disk, so the whole second vdev is unavailable to gptzfsboot. My question is whether these conclusions are correct:
- FreeBSD supports any configuration for a root ZFS pool, including one with multiple vdevs.
- However, one should ensure that the BIOS reports all disks, or increase the number of copies for the bootable dataset (and this should probably be documented somewhere).
- The 'root pool can not have multiple vdevs' check should be removed.
[1] http://amdmi3.ru/files/zfs-boot-fail.png
-- Dmitry Marakasov .
55B5 0596 FF1E 8D84 5F56 9510 D35A 80DD F9D2 F77D amdmi3@amdmi3.ru ..: jabber: amdmi3@jabber.ru http://www.amdmi3.ru From owner-freebsd-fs@FreeBSD.ORG Mon Jan 9 22:52:08 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 77E98106564A; Mon, 9 Jan 2012 22:52:08 +0000 (UTC) (envelope-from des@des.no) Received: from smtp.des.no (smtp.des.no [194.63.250.102]) by mx1.freebsd.org (Postfix) with ESMTP id 32A208FC16; Mon, 9 Jan 2012 22:52:07 +0000 (UTC) Received: from ds4.des.no (des.no [84.49.246.2]) by smtp.des.no (Postfix) with ESMTP id 047F9632F; Mon, 9 Jan 2012 22:52:06 +0000 (UTC) Received: by ds4.des.no (Postfix, from userid 1001) id B3438816B; Mon, 9 Jan 2012 23:52:06 +0100 (CET) From: =?utf-8?Q?Dag-Erling_Sm=C3=B8rgrav?= To: Adrian Chadd References: <86ty4a8mc3.fsf@ds4.des.no> Date: Mon, 09 Jan 2012 23:52:06 +0100 In-Reply-To: (Adrian Chadd's message of "Fri, 6 Jan 2012 13:30:31 -0800") Message-ID: <86ehv8xze1.fsf@ds4.des.no> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/23.3 (berkeley-unix) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Cc: freebsd-fs@freebsd.org, freebsd-current , freebsd-arch@freebsd.org Subject: Re: Is it possible to make subr_acl_nfs4 and subr_acl_posix1e disabled? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 09 Jan 2012 22:52:08 -0000 Adrian Chadd writes: > Dag-Erling Smørgrav writes: > > I would be very annoyed if it were no longer possible to netboot > > GENERIC... > I don't want to break that. :) I Just don't want to compile it in > unless I'm using NFS/ZFS, and on my 4MB flash boards I'm not booting > w/ NFS compiled in statically..
Sorry, I just realized that I read the text of your message but not the subject; I thought you were proposing to remove NFS from GENERIC. DES -- Dag-Erling Smørgrav - des@des.no From owner-freebsd-fs@FreeBSD.ORG Tue Jan 10 13:28:57 2012 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 60880106566B for ; Tue, 10 Jan 2012 13:28:57 +0000 (UTC) (envelope-from jhs@berklix.com) Received: from tower.berklix.org (tower.berklix.org [83.236.223.114]) by mx1.freebsd.org (Postfix) with ESMTP id CAFB48FC14 for ; Tue, 10 Jan 2012 13:28:56 +0000 (UTC) Received: from mart.js.berklix.net (pD9FBEECD.dip.t-dialin.net [217.251.238.205]) (authenticated bits=0) by tower.berklix.org (8.14.2/8.14.2) with ESMTP id q0ADFhg7021122 for ; Tue, 10 Jan 2012 13:15:44 GMT (envelope-from jhs@berklix.com) Received: from fire.js.berklix.net (fire.js.berklix.net [192.168.91.41]) by mart.js.berklix.net (8.14.3/8.14.3) with ESMTP id q0ADFWfw041833 for ; Tue, 10 Jan 2012 14:15:33 +0100 (CET) (envelope-from jhs@berklix.com) Received: from fire.js.berklix.net (localhost [127.0.0.1]) by fire.js.berklix.net (8.14.4/8.14.4) with ESMTP id q0ADFQ5f017969 for ; Tue, 10 Jan 2012 14:15:32 +0100 (CET) (envelope-from jhs@fire.js.berklix.net) Message-Id: <201201101315.q0ADFQ5f017969@fire.js.berklix.net> To: fs@freebsd.org From: "Julian H.
Stacey" Organization: http://www.berklix.com BSD Linux Unix Consultancy, Munich Germany User-agent: EXMH on FreeBSD http://www.berklix.com/free/ X-URL: http://www.berklix.com/~jhs/cv/ Date: Tue, 10 Jan 2012 14:15:26 +0100 Sender: jhs@berklix.com Cc: Subject: unexpected soft update inconsistency - cannot fix X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 10 Jan 2012 13:28:57 -0000 Hi FS experts, Any thoughts on this repeat failure of fsck ? Detail below. There's nothing on the partition that I can't recreate, no backup needed. So a (one off *) opportunity to fix fsck or test an enhanced fsck ? (* partition is too big to copy so I only get one go at this) Should I - try fsdb. - compile current/ fsck/ & try that - or does anyone have new uncommitted fsck code to compile & try ? ... - some extra fsck in ports maybe ? (None in SEE ALSO of man fsck.) uname -a FreeBSD laph.js.berklix.net 8.2-RELEASE FreeBSD 8.2-RELEASE \ #0: Thu Feb 17 02:41:51 UTC 2011 \ root@mason.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC amd64 disklabel /dev/ad4s4 8 partitions: # size offset fstype [fsize bsize bps/cpg] ... g: 1269373152 69752832 4.2BSD 0 0 0 fsck -y /dev/ad4s4g ** /dev/ad4s4g ** Last Mounted on /usr1 ** Phase 1 - Check Blocks and Sizes ** Phase 2 - Check Pathnames MISSING '..' I=825575 OWNER=mailnull MODE=40755 SIZE=512 MTIME=Dec 29 22:12 2011 DIR=?
UNEXPECTED SOFT UPDATE INCONSISTENCY CANNOT FIX, SECOND ENTRY IN DIRECTORY CONTAINS Makefile,v UNEXPECTED SOFT UPDATE INCONSISTENCY ** Phase 3 - Check Connectivity ** Phase 4 - Check Reference Counts ** Phase 5 - Check Cyl groups 1360879 files, 126366566 used, 180989615 free (1437463 frags, 22444019 blocks, 0.5% fragmentation) ***** FILE SYSTEM MARKED CLEAN ***** fsck -y /dev/ad4s4g ** /dev/ad4s4g ** Last Mounted on /usr1 ** Phase 1 - Check Blocks and Sizes ** Phase 2 - Check Pathnames MISSING '..' I=825575 OWNER=mailnull MODE=40755 SIZE=512 MTIME=Dec 29 22:12 2011 DIR=? UNEXPECTED SOFT UPDATE INCONSISTENCY CANNOT FIX, SECOND ENTRY IN DIRECTORY CONTAINS Makefile,v UNEXPECTED SOFT UPDATE INCONSISTENCY ** Phase 3 - Check Connectivity ** Phase 4 - Check Reference Counts ** Phase 5 - Check Cyl groups 1360879 files, 126366566 used, 180989615 free (1437463 frags, 22444019 blocks, 0.5% fragmentation) ***** FILE SYSTEM IS CLEAN ***** fsck_ufs -b 160 /dev/ad4s4g Alternate super block location: 160 ** /dev/ad4s4g ** Last Mounted on ** Phase 1 - Check Blocks and Sizes ** Phase 2 - Check Pathnames MISSING '..' I=825575 OWNER=mailnull MODE=40755 SIZE=512 MTIME=Dec 29 22:12 2011 DIR=? UNEXPECTED SOFT UPDATE INCONSISTENCY CANNOT FIX, SECOND ENTRY IN DIRECTORY CONTAINS Makefile,v UNEXPECTED SOFT UPDATE INCONSISTENCY ** Phase 3 - Check Connectivity ** Phase 4 - Check Reference Counts ** Phase 5 - Check Cyl groups SUMMARY BLK COUNT(S) WRONG IN SUPERBLK SALVAGE? [yn] SALVAGE? [yn] SALVAGE? [yn] y 1360879 files, 126366566 used, 180989615 free (1437463 frags, 22444019 blocks, 0.5% fragmentation) UPDATE STANDARD SUPERBLOCK? [yn] y ***** FILE SYSTEM IS CLEAN ***** ***** FILE SYSTEM WAS MODIFIED ***** fsck /dev/ad4s4g ** /dev/ad4s4g ** Last Mounted on ** Phase 1 - Check Blocks and Sizes ** Phase 2 - Check Pathnames MISSING '..' I=825575 OWNER=mailnull MODE=40755 SIZE=512 MTIME=Dec 29 22:12 2011 DIR=? 
UNEXPECTED SOFT UPDATE INCONSISTENCY CANNOT FIX, SECOND ENTRY IN DIRECTORY CONTAINS Makefile,v UNEXPECTED SOFT UPDATE INCONSISTENCY ** Phase 3 - Check Connectivity ** Phase 4 - Check Reference Counts ** Phase 5 - Check Cyl groups 1360879 files, 126366566 used, 180989615 free (1437463 frags, 22444019 blocks, 0.5% fragmentation) ***** FILE SYSTEM IS CLEAN ***** fsck_ufs -b 160 /dev/ad4s4g Alternate super block location: 160 ** /dev/ad4s4g ** Last Mounted on ** Phase 1 - Check Blocks and Sizes ** Phase 2 - Check Pathnames MISSING '..' I=825575 OWNER=mailnull MODE=40755 SIZE=512 MTIME=Dec 29 22:12 2011 DIR=? UNEXPECTED SOFT UPDATE INCONSISTENCY CANNOT FIX, SECOND ENTRY IN DIRECTORY CONTAINS Makefile,v UNEXPECTED SOFT UPDATE INCONSISTENCY ** Phase 3 - Check Connectivity ** Phase 4 - Check Reference Counts ** Phase 5 - Check Cyl groups 1360879 files, 126366566 used, 180989615 free (1437463 frags, 22444019 blocks, 0.5% fragmentation) UPDATE STANDARD SUPERBLOCK? [yn] y ***** FILE SYSTEM IS CLEAN ***** ***** FILE SYSTEM WAS MODIFIED ***** fsck_ufs -b 160 /dev/ad4s4g Alternate super block location: 160 ** /dev/ad4s4g ** Last Mounted on ** Phase 1 - Check Blocks and Sizes ** Phase 2 - Check Pathnames MISSING '..' I=825575 OWNER=mailnull MODE=40755 SIZE=512 MTIME=Dec 29 22:12 2011 DIR=? UNEXPECTED SOFT UPDATE INCONSISTENCY CANNOT FIX, SECOND ENTRY IN DIRECTORY CONTAINS Makefile,v UNEXPECTED SOFT UPDATE INCONSISTENCY ** Phase 3 - Check Connectivity ** Phase 4 - Check Reference Counts ** Phase 5 - Check Cyl groups 1360879 files, 126366566 used, 180989615 free (1437463 frags, 22444019 blocks, 0.5% fragmentation) UPDATE STANDARD SUPERBLOCK? [yn] y ***** FILE SYSTEM IS CLEAN ***** ***** FILE SYSTEM WAS MODIFIED ***** fsck_ufs /dev/ad4s4g ** /dev/ad4s4g ** Last Mounted on ** Phase 1 - Check Blocks and Sizes ** Phase 2 - Check Pathnames MISSING '..' I=825575 OWNER=mailnull MODE=40755 SIZE=512 MTIME=Dec 29 22:12 2011 DIR=? 
UNEXPECTED SOFT UPDATE INCONSISTENCY CANNOT FIX, SECOND ENTRY IN DIRECTORY CONTAINS Makefile,v UNEXPECTED SOFT UPDATE INCONSISTENCY ** Phase 3 - Check Connectivity ** Phase 4 - Check Reference Counts ** Phase 5 - Check Cyl groups 1360879 files, 126366566 used, 180989615 free (1437463 frags, 22444019 blocks, 0.5% fragmentation) ***** FILE SYSTEM IS CLEAN ***** To avoid inadvertent access I have changed fstab to ufs ro,noauto It's an internal drive on a notebook PC http://berklix.com/~jhs/hardware/hp/pavilion/dm3-1155ea that has had some overheating problems, (that I'm pursuing on a separate thread), but it's not crashed in days, & busy running other stuff, so probably no problem running fsck. df /dev/ad4s4g Filesystem 1K-blocks Used Avail Capacity Mounted on /dev/ad4s4g 614712362 252733132 312802242 45% tunefs -p /dev/ad4s4g tunefs: POSIX.1e ACLs: (-a) disabled tunefs: NFSv4 ACLs: (-N) disabled tunefs: MAC multilabel: (-l) disabled tunefs: soft updates: (-n) enabled tunefs: gjournal: (-J) disabled tunefs: maximum blocks per file in a cylinder group: (-e) 2048 tunefs: average file size: (-f) 16384 tunefs: average number of files in a directory: (-s) 64 tunefs: minimum percentage of free space: (-m) 8% tunefs: optimization preference: (-o) time tunefs: volume label: (-L) dumpfs -m /dev/ad4s4g newfs -O 2 -U -a 8 -b 16384 -d 16384 -e 2048 -f 2048 -g \ 16384 -h 64 -m 8 -o time -s 1269373152 /dev/ad4s4g dumpfs -f /dev/ad4s4g | wc -l 343954 343954 6261059 ( I could upload that to my web if requested. ) PS I wrote a trivial Makefile to test exit values, to show that fsck fails to return an appropriate non-zero exit value: xxx: fsck_ufs -y /dev/ad4s4g @echo yes1 fsck_ufs -y /dev/ad4s4g @echo yes2 Result: fsck_ufs -y /dev/ad4s4g ** /dev/ad4s4g ** Last Mounted on ** Phase 1 - Check Blocks and Sizes ** Phase 2 - Check Pathnames MISSING '..' I=825575 OWNER=mailnull MODE=40755 SIZE=512 MTIME=Dec 29 22:12 2011 DIR=?
UNEXPECTED SOFT UPDATE INCONSISTENCY CANNOT FIX, SECOND ENTRY IN DIRECTORY CONTAINS Makefile,v UNEXPECTED SOFT UPDATE INCONSISTENCY ** Phase 3 - Check Connectivity ** Phase 4 - Check Reference Counts ** Phase 5 - Check Cyl groups 1360879 files, 126366566 used, 180989615 free (1437463 frags, 22444019 blocks, 0.5% fragmentation) ***** FILE SYSTEM IS CLEAN ***** yes1 fsck_ufs -y /dev/ad4s4g ** /dev/ad4s4g ** Last Mounted on ** Phase 1 - Check Blocks and Sizes ** Phase 2 - Check Pathnames MISSING '..' I=825575 OWNER=mailnull MODE=40755 SIZE=512 MTIME=Dec 29 22:12 2011 DIR=? UNEXPECTED SOFT UPDATE INCONSISTENCY CANNOT FIX, SECOND ENTRY IN DIRECTORY CONTAINS Makefile,v UNEXPECTED SOFT UPDATE INCONSISTENCY ** Phase 3 - Check Connectivity ** Phase 4 - Check Reference Counts ** Phase 5 - Check Cyl groups 1360879 files, 126366566 used, 180989615 free (1437463 frags, 22444019 blocks, 0.5% fragmentation) ***** FILE SYSTEM IS CLEAN ***** yes2 Cheers, Julian -- Julian Stacey, BSD Unix Linux C Sys Eng Consultants Munich http://berklix.com Reply below not above, cumulative like a play script, & indent with "> ". Format: Plain text. Not HTML, multipart/alternative, base64, quoted-printable. From owner-freebsd-fs@FreeBSD.ORG Tue Jan 10 17:50:06 2012 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A929E1065672 for ; Tue, 10 Jan 2012 17:50:06 +0000 (UTC) (envelope-from mckusick@mckusick.com) Received: from chez.mckusick.com (unknown [IPv6:2001:5a8:4:7e72:4a5b:39ff:fe12:452]) by mx1.freebsd.org (Postfix) with ESMTP id 8CC838FC12 for ; Tue, 10 Jan 2012 17:50:06 +0000 (UTC) Received: from chez.mckusick.com (localhost [127.0.0.1]) by chez.mckusick.com (8.14.3/8.14.3) with ESMTP id q0AHnn1s053527; Tue, 10 Jan 2012 09:49:54 -0800 (PST) (envelope-from mckusick@chez.mckusick.com) Message-Id: <201201101749.q0AHnn1s053527@chez.mckusick.com> To: "Julian H. 
Stacey" In-reply-to: <201201101315.q0ADFQ5f017969@fire.js.berklix.net> Date: Tue, 10 Jan 2012 09:49:49 -0800 From: Kirk McKusick X-Spam-Status: No, score=0.0 required=5.0 tests=MISSING_MID, UNPARSEABLE_RELAY autolearn=failed version=3.2.5 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on chez.mckusick.com Cc: fs@freebsd.org Subject: Re: unexpected soft update inconsistency - cannot fix X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 10 Jan 2012 17:50:06 -0000
The problem is that you somehow lost your ".." entry in the directory associated with inode 825575. That entry was then replaced by "Makefile,v". Because ".." is missing, fsck cannot figure out its parent and hence the pathname of the directory.

To fix, do the following:

cd to mountpoint of filesystem
find . -inum 825575 -print
cd to the directory identified by find
mv Makefile,v Makefile,v.sav
cd /
unmount filesystem
run fsck which should now be able to create ".."
mount filesystem
cd to affected directory
mv Makefile,v.sav Makefile,v

Kirk McKusick
From owner-freebsd-fs@FreeBSD.ORG Tue Jan 10 18:18:31 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A9029106564A for ; Tue, 10 Jan 2012 18:18:29 +0000 (UTC) (envelope-from knarf@knarf.de) Received: from mail.server-king.de (mail.server-king.de [IPv6:2a01:4f8:100:41a2::25]) by mx1.freebsd.org (Postfix) with ESMTP id 2F4948FC13 for ; Tue, 10 Jan 2012 18:18:28 +0000 (UTC) Received: from cheese.server-king.de (localhost [127.0.0.1]) by mail.server-king.de (8.14.5/8.14.5) with ESMTP id q0AIIRfS054282 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO) for ; Tue, 10 Jan 2012 19:18:27 +0100 (CET) (envelope-from knarf@knarf.de) DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=knarf.de; s=mail.server-king.de; t=1326219508; bh=q1d0yjbzTDwWIhASQ56u8WrtijCJXmbJpbxxPNPVpdg=; h=Date:From:To:Subject:Message-ID:MIME-Version:Content-Type; b=ZoyLp3WMOtE8Bh1C72FS4xkN9HtmwNljAfKT9zXvTSP7G3h2sJyz5dfoeg0/6xYXl sAK3DuHl7aEPeTb4mCaSZ4fdjGIGwV9J3oBhLicQa1XR7QqcTu5s3CK7x8Qp1ZR8jW RB4d3pqea3ubJIYnf5aRQmsEjkLEODW3DjwsLhMk= Received: (from knarf@localhost) by cheese.server-king.de (8.14.5/8.14.5/Submit) id q0AIIRSm054281 for freebsd-fs@freebsd.org; Tue, 10 Jan 2012 19:18:27 +0100 (CET) (envelope-from knarf@knarf.de) X-Authentication-Warning: cheese.server-king.de: knarf set sender to knarf@knarf.de using -f Date: Tue, 10 Jan 2012 19:18:27 +0100 From: Frank Bartels To: freebsd-fs@freebsd.org Message-ID: <20120110181827.GA7601@server-king.de> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-15 Content-Disposition: inline User-Agent: Mutt/1.5.21 (2010-09-15) X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.2.7 (mail.server-king.de [127.0.0.1]); Tue, 10 Jan 2012 19:18:28 +0100 (CET) Subject: Mounting from zfs:zroot failed with error 6.
X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 10 Jan 2012 18:18:31 -0000
Hi,

I use FreeBSD 9.0-RELEASE. I boot from a USB stick (16 GB) using gptzfsboot. Everything works fine. In the past weeks I've read a lot about 4 KB alignment and ashift=12, but when I first created the stick I had no idea about any of these problems. I still have no clue what the physical sector size of my USB stick is (more than 4 KB for sure, maybe 128 KB or 256 KB?) or why gnop does not allow more than 8 KB, but that's another story.

This is my old stick, created the wrong way:

bkool:/root# gpart show da1
=>       34  30883773  da1  GPT  (14G)
         34       128    1  freebsd-boot  (64k)
        162   8019968    2  freebsd-swap  (3.8G)
    8020130  22863677    3  freebsd-zfs  (10G)

And this is the new one, created the right way:

bkool:/root# gpart show da0
=>       34  30871485  da0  GPT  (14G)
         34         6       - free -  (3.0k)
         40       128    1  freebsd-boot  (64k)
        168      1880       - free -  (940k)
       2048   8388608    2  freebsd-swap  (4.0G)
    8390656  22480863    3  freebsd-zfs  (10G)

So my plan is to create a new zpool with altroot= and cachefile=, rsync the old stick to the new one, copy over the cachefile, and double-check loader.conf vfs.root.mountfrom= and zfs bootfs=. This worked several times before in other environments, but here it does not. If I try to boot, the new stick runs gptzfsboot, loads the kernel, and fails with "Mounting from zfs:bkool9 failed with error 6.". In the past (maybe it was with 8.2-RELEASE) I had the same problem. The solution was to type "zfs:zroot" (the *same* value as in loader.conf) at the prompt and it worked. But this time this "trick" does not help. I've seen this error message several times now. I tried creating the new zpool with ashift=12 (gnop -S 4K) and with ashift=9. I double-checked the gpart bootcode. The kernel is GENERIC with DDB enabled, no other changes. But it won't mount root.
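The alignment difference between the two gpart listings above can be checked with a little arithmetic. The following is a minimal sh sketch; it assumes 512-byte logical sectors and a 4096-byte target alignment, neither of which is confirmed for this particular stick, and check_align is a hypothetical helper name.

```shell
#!/bin/sh
# check_align START_SECTOR [SECTOR_BYTES] [ALIGN_BYTES]
# Prints "aligned" if the partition's byte offset is a multiple of
# ALIGN_BYTES, otherwise "misaligned". The defaults (512-byte sectors,
# 4096-byte alignment) are assumptions, not measured values.
check_align() {
    start=$1
    sector=${2:-512}
    align=${3:-4096}
    if [ $(( (start * sector) % align )) -eq 0 ]; then
        echo aligned
    else
        echo misaligned
    fi
}

# Partition start sectors copied from the two gpart listings:
# old stick (da1): 34 162 8020130; new stick (da0): 40 2048 8390656.
for s in 34 162 8020130 40 2048 8390656; do
    printf '%10s  %s\n' "$s" "$(check_align "$s")"
done
```

Passing a larger third argument (for example 131072) would test the same offsets against a bigger erase-block guess.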
The interesting part is, if I boot from the old stick and inspect the new one, I see this: bkool:/root# zpool import pool: bkool9 id: 2362879880167335458 state: FAULTED status: One or more devices contains corrupted data. action: The pool cannot be imported due to damaged devices or data. The pool may be active on another system, but can be imported using the '-f' flag. see: http://www.sun.com/msg/ZFS-8000-5E config: bkool9 FAULTED corrupted data 5856161150314576551 UNAVAIL corrupted data But, if I tell zpool to look for devices in /dev, I get this: bkool:/root# zpool import -d /dev pool: bkool9 id: 2362879880167335458 state: ONLINE action: The pool can be imported using its name or numeric identifier. config: bkool9 ONLINE gpt/bkool9-disk0 ONLINE What's going on here? Why is the kernel not able to find the zpool during boot (after loading the loader and kernel from it!) even if "/dev/gpt/bkool9-disk0" is part of zpool.cache? And why is looking for devices in /dev or /dev/gpt not default if I use zpool import? The manual page says "/dev/dsk" is default, but does this not only make sense under Solaris? And what does "5856161150314576551" mean? Where does the information about the pool bkool9 come from if the only vdev is UNAVAIL and there are no hints in /boot/zfs/zpool.cache? 
Thanks, Frank From owner-freebsd-fs@FreeBSD.ORG Tue Jan 10 20:11:51 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: by hub.freebsd.org (Postfix, from userid 1233) id B0AD51065672; Tue, 10 Jan 2012 20:11:51 +0000 (UTC) Date: Tue, 10 Jan 2012 20:11:51 +0000 From: Alexander Best To: freebsd-fs@freebsd.org, freebsd-usb@freebsd.org Message-ID: <20120110201151.GA23484@freebsd.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Cc: Subject: issue with usb hdd and SU+J X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 10 Jan 2012 20:11:51 -0000 can somebody help me with this issue? i'm running HEAD on amd64: 1) connect my usb hdd 2) mount it (/mnt/wd) 3) don't access it for a few hours 4) ls /mnt/wd returns nothing; doing ls or pwd from /mnt/wd returns ENXIO 5) unmount /mnt/wd succeeds 6) mount /mnt/wd fails then i did fsck /mnt/wd three times to be sure the drive didn't contain any errors: otaku% fsck /dev/ufs/wd ** /dev/ufs/wd USE JOURNAL? [yn] y ** SU+J Recovering /dev/ufs/wd ** Reading 33554432 byte journal from inode 4. RECOVER? [yn] y ** Building recovery table. ** Resolving unreferenced inode list. ** Processing journal entries. WRITE CHANGES? [yn] y ** 4 journal records in 1024 bytes for 12.50% utilization ** Freed 0 inodes (0 dirs) 0 blocks, and 0 frags. ***** FILE SYSTEM MARKED CLEAN ***** otaku% fsck /dev/ufs/wd ** /dev/ufs/wd USE JOURNAL? 
[yn] y ** SU+J Recovering /dev/ufs/wd Journal timestamp does not match fs mount time ** Skipping journal, falling through to full fsck ** Last Mounted on /mnt/wd ** Phase 1 - Check Blocks and Sizes ** Phase 2 - Check Pathnames ** Phase 3 - Check Connectivity ** Phase 4 - Check Reference Counts ** Phase 5 - Check Cyl groups 1420 files, 170686465 used, 69659704 free (304 frags, 8707425 blocks, 0.0% fragmentation) ***** FILE SYSTEM IS CLEAN ***** ***** FILE SYSTEM WAS MODIFIED ***** otaku% fsck /dev/ufs/wd ** /dev/ufs/wd USE JOURNAL? [yn] y ** SU+J Recovering /dev/ufs/wd Journal timestamp does not match fs mount time ** Skipping journal, falling through to full fsck ** Last Mounted on /mnt/wd ** Phase 1 - Check Blocks and Sizes ** Phase 2 - Check Pathnames ** Phase 3 - Check Connectivity ** Phase 4 - Check Reference Counts ** Phase 5 - Check Cyl groups 1420 files, 170686465 used, 69659704 free (304 frags, 8707425 blocks, 0.0% fragmentation) ***** FILE SYSTEM IS CLEAN ***** ***** FILE SYSTEM WAS MODIFIED ***** ...i found this umass error in my dmesg: (probe0:umass-sim0:0:0:1): TEST UNIT READY. 
CDB: 0 0 0 0 0 0 (probe0:umass-sim0:0:0:1): CAM status: SCSI Status Error (probe0:umass-sim0:0:0:1): SCSI status: Check Condition (probe0:umass-sim0:0:0:1): SCSI sense: ILLEGAL REQUEST asc:20,0 (Invalid command operation code) ...and this lor witness warning: lock order reversal: 1st 0xfffffe006fb4fbd8 ufs (ufs) @ /usr/subversion-src/sys/kern/vfs_mount.c:1209 2nd 0xfffffe000dbaedb8 syncer (syncer) @ /usr/subversion-src/sys/kern/vfs_subr.c:2279 KDB: stack backtrace: db_trace_self_wrapper() at db_trace_self_wrapper+0x2b kdb_backtrace() at kdb_backtrace+0x39 witness_checkorder() at witness_checkorder+0x6c5 __lockmgr_args() at __lockmgr_args+0x382 vop_stdlock() at vop_stdlock+0x3c VOP_LOCK1_APV() at VOP_LOCK1_APV+0x42 _vn_lock() at _vn_lock+0x43 vputx() at vputx+0x140 dounmount() at dounmount+0x27f sys_unmount() at sys_unmount+0x251 amd64_syscall() at amd64_syscall+0x1f8 Xfast_syscall() at Xfast_syscall+0xfb --- syscall (22, FreeBSD ELF64, sys_unmount), rip = 0x80088ab8c, rsp = 0x7fffffffd2b8, rbp = 0x800c09098 --- cheers. alex ps: camcontrol devlist reports: at scbus0 target 0 lun 0 (pass0,ada0) at scbus1 target 0 lun 0 (pass1,ada1) at scbus2 target 0 lun 0 (pass2,cd0) at scbus6 target 0 lun 0 (da0,pass3) at scbus6 target 0 lun 1 (pass4) unfortunately camcontrol identify won't work with {da0,pass3,pass4}. nothing gets output and echo $? returns 1. 
this is the dmesg output when the drive gets attached: ugen3.3: at usbus3 umass0: on usbus3 da0 at umass-sim0 bus 0 scbus6 target 0 lun 0 da0: Fixed Direct Access SCSI-6 device da0: 40.000MB/s transfers da0: 953837MB (1953458176 512 byte sectors: 255H 63S/T 121597C) From owner-freebsd-fs@FreeBSD.ORG Tue Jan 10 21:47:23 2012 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 8E9251065676 for ; Tue, 10 Jan 2012 21:47:23 +0000 (UTC) (envelope-from jhs@berklix.com) Received: from tower.berklix.org (tower.berklix.org [83.236.223.114]) by mx1.freebsd.org (Postfix) with ESMTP id 1F7578FC13 for ; Tue, 10 Jan 2012 21:47:22 +0000 (UTC) Received: from mart.js.berklix.net (p5DCBDD86.dip.t-dialin.net [93.203.221.134]) (authenticated bits=0) by tower.berklix.org (8.14.2/8.14.2) with ESMTP id q0ALlKAP025316; Tue, 10 Jan 2012 21:47:21 GMT (envelope-from jhs@berklix.com) Received: from fire.js.berklix.net (fire.js.berklix.net [192.168.91.41]) by mart.js.berklix.net (8.14.3/8.14.3) with ESMTP id q0ALl9Ye017017; Tue, 10 Jan 2012 22:47:09 +0100 (CET) (envelope-from jhs@berklix.com) Received: from fire.js.berklix.net (localhost [127.0.0.1]) by fire.js.berklix.net (8.14.4/8.14.4) with ESMTP id q0ALkvhW033512; Tue, 10 Jan 2012 22:47:09 +0100 (CET) (envelope-from jhs@fire.js.berklix.net) Message-Id: <201201102147.q0ALkvhW033512@fire.js.berklix.net> To: Kirk McKusick From: "Julian H. Stacey" Organization: http://www.berklix.com BSD Unix Linux Consultancy, Munich Germany User-agent: EXMH on FreeBSD http://www.berklix.com/free/ X-URL: http://www.berklix.com In-reply-to: Your message "Tue, 10 Jan 2012 09:49:49 PST." 
<201201101749.q0AHnn1s053527@chez.mckusick.com> Date: Tue, 10 Jan 2012 22:46:57 +0100 Sender: jhs@berklix.com Cc: fs@freebsd.org Subject: Re: unexpected soft update inconsistency - cannot fix X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 10 Jan 2012 21:47:23 -0000 Hi Kirk cc fs@ Kirk McKusick wrote: > The problem is that you somehow lost your ".." entry in the > directory associated with inode 825575. That entry was then > replaced by "Makefile,v". Because ".." is missing, fsck cannot > figure out its parent and hence the pathname of the directory. > > To fix, do the following: > > cd to mountpoint of filesystem > find . -inum 825575 -print > cd to the directory identified by find > mv Makefile,v Makefile,v.sav > cd / > unmount filesystem > run fsck which should now be able to create ".." > mount filesystem > cd to affected directory > mv Makefile,v.sav Makefile,v > > Kirk McKusick Thanks ! Meantime I'd been looking & made notes below. I don't understand how the above "mv Makefile,v Makefile,v.sav" would create a new slot ? wouldnt' it just call rename(2) ? So there'd still be no slot to re-label as ".." to point at parent inode ? Wouldn't it be better to call fsdb + "ln 79302719 .." as below ? Also as notes below show inode of parent can be deduced, wouldn't it be good if such functionality were built into fsck or fsdb code (or pro tem added as an example to man fsdb ?) --------- fsdb -r /dev/ad4s4g ** /dev/ad4s4g (NO WRITE) Examining file system `/dev/ad4s4g' Last Mounted on current inode: directory I=2 MODE=40755 SIZE=512 OWNER=root GRP=wheel LINKCNT=9 FLAGS=0 BLKCNT=4 GEN=359f3dd1 fsdb (inum: 2)> inode 825575 current inode: directory I=825575 MODE=40755 SIZE=512 OWNER=mailnull GRP=mailnull LINKCNT=2 FLAGS=0 BLKCNT=4 GEN=29c28025 fsdb (inum: 825575)> ls slot 0 ino 825575 reclen 20: directory, `.' 
slot 1 ino 825580 reclen 20: regular, `Makefile,v'
slot 2 ino 825581 reclen 20: regular, `distinfo,v'
slot 3 ino 825582 reclen 20: regular, `pkg-descr,v'
slot 4 ino 825583 reclen 432: regular, `pkg-plist,v'
fsdb (inum: 825575)> print
current inode: directory I=825575 MODE=40755 SIZE=512 OWNER=mailnull GRP=mailnull LINKCNT=2 FLAGS=0 BLKCNT=4 GEN=29c28025
# LINKCNT=2 seems wrong, comparing with a good directory,
# I think should be 4 without the ".." & 5 with "..".
fsdb (inum: 825575)> uplink
fsdb: `uplink' requires write access
fsdb: rval was 1
fsdb (inum: 825575)>

To Do Later With Fsdb ( without -r )
    uplink
    uplink
    ln 79302719 ..

mount /usr1 ; mount | grep /usr1
/dev/ad4s4g on /usr1 (ufs, local, read-only)
cd /usr1 ; find -x . -inum 825575
./ftp/.backup/pri/FreeBSD/development/FreeBSD-CVS/\
ports/net/keepalived/Attic
cd ftp/.backup/pri/FreeBSD/development/FreeBSD-CVS/ports/net/keepalived
ls -lai
total 50
79302719 drwxr-xr-x 3 mailnull mailnull 512 Dec 29 20:09 ./
73700289 drwxr-xr-x 2109 mailnull mailnull 43520 Dec 29 20:08 ../
825575 drwxr-xr-x 2 mailnull mailnull 512 Dec 29 22:12 Attic/
79302722 drwxr-xr-x 3 mailnull mailnull 512 Dec 29 22:12 files/

fsdb (inum: 825575)> inode 79302719
current inode: directory I=79302719 MODE=40755 SIZE=512 OWNER=mailnull GRP=mailnull LINKCNT=3 FLAGS=0 BLKCNT=4 GEN=293a450d
fsdb (inum: 79302719)> ls
slot 0 ino 79302719 reclen 12: directory, `.'
slot 1 ino 73700289 reclen 12: directory, `..'
slot 2 ino 79302722 reclen 16: directory, `files'
slot 3 ino 825575 reclen 472: directory, `Attic'
---------

Cheers,
Julian
--
Julian Stacey, BSD Unix Linux C Sys Eng Consultants Munich http://berklix.com
Reply below not above, cumulative like a play script, & indent with "> ".
Format: Plain text. Not HTML, multipart/alternative, base64, quoted-printable.
From owner-freebsd-fs@FreeBSD.ORG Tue Jan 10 22:21:14 2012 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 399491065673 for ; Tue, 10 Jan 2012 22:21:14 +0000 (UTC) (envelope-from mckusick@mckusick.com) Received: from chez.mckusick.com (unknown [IPv6:2001:5a8:4:7e72:4a5b:39ff:fe12:452]) by mx1.freebsd.org (Postfix) with ESMTP id 0F2A18FC08 for ; Tue, 10 Jan 2012 22:21:14 +0000 (UTC) Received: from chez.mckusick.com (localhost [127.0.0.1]) by chez.mckusick.com (8.14.3/8.14.3) with ESMTP id q0AML8MX012837; Tue, 10 Jan 2012 14:21:08 -0800 (PST) (envelope-from mckusick@chez.mckusick.com) Message-Id: <201201102221.q0AML8MX012837@chez.mckusick.com> To: "Julian H. Stacey" In-reply-to: <201201102147.q0ALkvhW033512@fire.js.berklix.net> Date: Tue, 10 Jan 2012 14:21:08 -0800 From: Kirk McKusick X-Spam-Status: No, score=0.0 required=5.0 tests=MISSING_MID, UNPARSEABLE_RELAY autolearn=failed version=3.2.5 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on chez.mckusick.com Cc: fs@freebsd.org Subject: Re: unexpected soft update inconsistency - cannot fix X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 10 Jan 2012 22:21:14 -0000 > To: Kirk McKusick > cc: fs@freebsd.org > Subject: Re: unexpected soft update inconsistency - cannot fix > From: "Julian H. Stacey" > Date: Tue, 10 Jan 2012 22:46:57 +0100 > > Hi Kirk > cc fs@ > > Thanks ! Meantime I'd been looking & made notes below. > I don't understand how the above "mv Makefile,v Makefile,v.sav" > would create a new slot ? wouldnt' it just call rename(2) ? > So there'd still be no slot to re-label as ".." to point at parent inode ? 

The rename will have to create a new slot for "Makefile,v.sav" which will be at the end of the directory (as that is the only place where there is enough space for it). That will leave slot 0 with a reclen of 40. With a reclen of 40, there will be enough space following the "." entry (which is using the first 12) that fsck will be able to fit the ".." entry into the unused space immediately following the "." entry where it belongs.

> Wouldn't it be better to call fsdb + "ln 79302719 .." as below ?

If you do that the ".." entry will be placed at the end of the directory where it does not belong. Fsck will still bitch and will still be unable to fix it.

> Also as notes below show inode of parent can be deduced,
> wouldn't it be good if such functionality were built into fsck or fsdb code
> (or pro tem added as an example to man fsdb ?)

The path is calculated by getting the inode associated with ".." in the current inode. You then read the ".." inode and find the name associated with the current inode in it. You repeat this process level by level until you reach the root of the filesystem (inode 2) at which point you have the full path.

If the only information you have is the contents of an inode that is missing its ".." entry you cannot figure out its parent or its path. At the point in fsck where this error occurs you do not know what the namespace looks like, you have only a subset of the inodes. So there is no way to figure out the path.

---------
fsdb -r /dev/ad4s4g
** /dev/ad4s4g (NO WRITE)
Examining file system `/dev/ad4s4g'
Last Mounted on
current inode: directory I=2 MODE=40755 SIZE=512 OWNER=root GRP=wheel LINKCNT=9 FLAGS=0 BLKCNT=4 GEN=359f3dd1
fsdb (inum: 2)> inode 825575
current inode: directory I=825575 MODE=40755 SIZE=512 OWNER=mailnull GRP=mailnull LINKCNT=2 FLAGS=0 BLKCNT=4 GEN=29c28025
fsdb (inum: 825575)> ls
slot 0 ino 825575 reclen 20: directory, `.'
slot 1 ino 825580 reclen 20: regular, `Makefile,v'
slot 2 ino 825581 reclen 20: regular, `distinfo,v'
slot 3 ino 825582 reclen 20: regular, `pkg-descr,v'
slot 4 ino 825583 reclen 432: regular, `pkg-plist,v'
fsdb (inum: 825575)> print
current inode: directory I=825575 MODE=40755 SIZE=512 OWNER=mailnull GRP=mailnull LINKCNT=2 FLAGS=0 BLKCNT=4 GEN=29c28025
# LINKCNT=2 seems wrong, comparing with a good directory,
# I think should be 4 without the ".." & 5 with "..".

The ./ftp/.backup/pri/FreeBSD/development/FreeBSD-CVS/ports/net/keepalived/Attic directory contains no directories, so with a ".." entry it should have a link count of 2.

fsdb (inum: 825575)> uplink
fsdb: `uplink' requires write access
fsdb: rval was 1
fsdb (inum: 825575)>

To Do Later With Fsdb ( without -r )
    uplink
    uplink
    ln 79302719 ..

mount /usr1 ; mount | grep /usr1
/dev/ad4s4g on /usr1 (ufs, local, read-only)
cd /usr1 ; find -x . -inum 825575
./ftp/.backup/pri/FreeBSD/development/FreeBSD-CVS/\
ports/net/keepalived/Attic
cd ftp/.backup/pri/FreeBSD/development/FreeBSD-CVS/ports/net/keepalived
ls -lai
total 50
79302719 drwxr-xr-x 3 mailnull mailnull 512 Dec 29 20:09 ./
73700289 drwxr-xr-x 2109 mailnull mailnull 43520 Dec 29 20:08 ../
825575 drwxr-xr-x 2 mailnull mailnull 512 Dec 29 22:12 Attic/
79302722 drwxr-xr-x 3 mailnull mailnull 512 Dec 29 22:12 files/

fsdb (inum: 825575)> inode 79302719
current inode: directory I=79302719 MODE=40755 SIZE=512 OWNER=mailnull GRP=mailnull LINKCNT=3 FLAGS=0 BLKCNT=4 GEN=293a450d
fsdb (inum: 79302719)> ls
slot 0 ino 79302719 reclen 12: directory, `.'
slot 1 ino 73700289 reclen 12: directory, `..'
slot 2 ino 79302722 reclen 16: directory, `files'
slot 3 ino 825575 reclen 472: directory, `Attic'
---------

Cheers,
Julian
--
Julian Stacey, BSD Unix Linux C Sys Eng Consultants Munich http://berklix.com
Reply below not above, cumulative like a play script, & indent with "> ".
Format: Plain text. Not HTML, multipart/alternative, base64, quoted-printable.

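The record-length arithmetic behind Kirk's explanation can be checked with a short sketch. It assumes the classic UFS directory entry layout (an 8-byte fixed header, then the name plus a terminating NUL, padded to a 4-byte boundary); the function names are hypothetical and exist only for this illustration. Under that assumption the results agree with the reclen values in the fsdb listings quoted above.

```python
# Sketch only: assumes the classic UFS directory entry layout
# (8-byte fixed header, then name + NUL, padded to a multiple of 4).
HEADER = 8


def dirsiz(name: str) -> int:
    """Minimum number of bytes a directory entry needs for `name`."""
    return HEADER + ((len(name) + 1 + 3) & ~3)


def free_after(name: str, reclen: int) -> int:
    """Unused bytes at the tail of an entry whose record length is `reclen`."""
    return reclen - dirsiz(name)


# Before the rename: "." has reclen 20, which leaves too little room for "..".
before = free_after(".", 20)

# After the rename, the slot that held "Makefile,v" is merged into ".",
# giving the reclen of 40 that Kirk mentions.
after = free_after(".", 20 + dirsiz("Makefile,v"))

# The renamed file needs more than its old slot offered, so it cannot
# simply be rewritten in place.
renamed = dirsiz("Makefile,v.sav")

print(dirsiz(".."), before, after, renamed)
```

With these assumptions ".." needs 12 bytes, "." has only 8 spare bytes before the rename and 28 afterwards, which is why moving "Makefile,v" out of the way lets fsck put ".." right after ".".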
So, to refine my earlier suggestion:

    cd /usr1/./ftp/.backup/pri/FreeBSD/development/FreeBSD-CVS/ports/net/\
    keepalived/Attic
    ls -fa
    mv Makefile,v Makefile,v.sav
    ls -fa
    cd /
    umount /dev/ad4s4g
    fsck /dev/ad4s4g
    mount /usr1
    cd /usr1/./ftp/.backup/pri/FreeBSD/development/FreeBSD-CVS/ports/net/\
    keepalived/Attic
    ls -fa
    mv Makefile,v.sav Makefile,v
    ls -fa

You can use `ls -fa' to see the order of everything in the directory.

Kirk McKusick

From owner-freebsd-fs@FreeBSD.ORG Tue Jan 10 22:56:58 2012
Return-Path:
Delivered-To: fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 9AC3F106564A for ; Tue, 10 Jan 2012 22:56:58 +0000 (UTC) (envelope-from mi+thun@aldan.algebra.com)
Received: from smtp02.lnh.mail.rcn.net (smtp02.lnh.mail.rcn.net [207.172.157.102]) by mx1.freebsd.org (Postfix) with ESMTP id D9C8A8FC12 for ; Tue, 10 Jan 2012 22:56:56 +0000 (UTC)
Received: from mr16.lnh.mail.rcn.net ([207.172.157.36]) by smtp02.lnh.mail.rcn.net with ESMTP; 10 Jan 2012 17:28:10 -0500
Received: from smtp01.lnh.mail.rcn.net (smtp01.lnh.mail.rcn.net [207.172.4.11]) by mr16.lnh.mail.rcn.net (MOS 4.3.4-GA) with ESMTP id BNF51300; Tue, 10 Jan 2012 17:28:10 -0500
Received-SPF: None identity=pra; client-ip=209.6.61.133; receiver=smtp01.lnh.mail.rcn.net; envelope-from="mi+thun@aldan.algebra.com"; x-sender="mi+thun@aldan.algebra.com"; x-conformance=sidf_compatible
Received-SPF: None identity=mailfrom; client-ip=209.6.61.133; receiver=smtp01.lnh.mail.rcn.net; envelope-from="mi+thun@aldan.algebra.com"; x-sender="mi+thun@aldan.algebra.com"; x-conformance=sidf_compatible
Received-SPF: None identity=helo; client-ip=209.6.61.133; receiver=smtp01.lnh.mail.rcn.net; envelope-from="mi+thun@aldan.algebra.com"; x-sender="postmaster@utka.zajac"; x-conformance=sidf_compatible
X-Auth-ID: anat
Received: from 209-6-61-133.c3-0.sbo-ubr1.sbo.ma.cable.rcn.com (HELO utka.zajac) ([209.6.61.133]) by smtp01.lnh.mail.rcn.net with ESMTP; 10 Jan
2012 17:28:10 -0500 Message-ID: <4F0CBB79.7010704@aldan.algebra.com> Date: Tue, 10 Jan 2012 17:28:09 -0500 From: "Mikhail T." User-Agent: Mozilla/5.0 (X11; FreeBSD i386; rv:7.0.1) Gecko/20111013 Thunderbird/7.0.1 MIME-Version: 1.0 To: fs@freebsd.org Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: Subject: How to best send files over network? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 10 Jan 2012 22:56:58 -0000 Hello! I'm trying to implement a BSD-tuned sending of files. Most of the files are fairly large. I thought, sendfile(2) is the way to go, but, it seems, there are severe performance problems at the moment for the files located on a ZFS. mmap-ing the files and then write-ing them is claimed to be a lot faster. Could somebody comment on this? Should one always mmap/write, or are there situations, when sendfile is advantageous? If, indeed, sendfile is best for UFS, but mmap/write is better over ZFS, what is the best way to determine the underlying FS for each file? statfs(2) is supposed to answer that question -- what should I look for in the struct statfs, that it will return? Do I check, if f_fsid contains a magic number for ZFS, or look for a magic string in f_fstypename? Could someone provide an example? Thank you very much. 
Yours, -mi From owner-freebsd-fs@FreeBSD.ORG Wed Jan 11 00:01:47 2012 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id B24D5106566B for ; Wed, 11 Jan 2012 00:01:47 +0000 (UTC) (envelope-from bfriesen@simple.dallas.tx.us) Received: from blade.simplesystems.org (blade.simplesystems.org [65.66.246.74]) by mx1.freebsd.org (Postfix) with ESMTP id 64A5F8FC0A for ; Wed, 11 Jan 2012 00:01:47 +0000 (UTC) Received: from freddy.simplesystems.org (freddy.simplesystems.org [65.66.246.65]) by blade.simplesystems.org (8.14.4+Sun/8.14.4) with ESMTP id q0ANn1gO022128; Tue, 10 Jan 2012 17:49:01 -0600 (CST) Date: Tue, 10 Jan 2012 17:49:01 -0600 (CST) From: Bob Friesenhahn X-X-Sender: bfriesen@freddy.simplesystems.org To: "Mikhail T." In-Reply-To: <4F0CBB79.7010704@aldan.algebra.com> Message-ID: References: <4F0CBB79.7010704@aldan.algebra.com> User-Agent: Alpine 2.01 (GSO 1266 2009-07-14) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.2.2 (blade.simplesystems.org [65.66.246.90]); Tue, 10 Jan 2012 17:49:01 -0600 (CST) Cc: fs@freebsd.org Subject: Re: How to best send files over network? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 11 Jan 2012 00:01:47 -0000 On Tue, 10 Jan 2012, Mikhail T. wrote: > I'm trying to implement a BSD-tuned sending of files. Most of the files are > fairly large. I thought, sendfile(2) is the way to go, but, it seems, there > are severe performance problems at the moment for the files located on a ZFS. > > mmap-ing the files and then write-ing them is claimed to be a lot faster. > Could somebody comment on this? Should one always mmap/write, or are there > situations, when sendfile is advantageous? 
Don't use mmap on zfs since doing so wastes memory (zfs ARC is not coherent with mmap page cache). Instead do normal file I/O (e.g. write, fwrite) using the filesystem blocksize (e.g. 128K) or a small multiple thereof. Of course if you are doing this over the network then you will need a program on the receiving end to write the data, and a program on the sending end to send it. It is useful to cache several blocks on the receiving end and use a thread to receive data from the network in case zfs temporarily stalls during write (which it periodically does). > If, indeed, sendfile is best for UFS, but mmap/write is better over ZFS, what > is the best way to determine the underlying FS for each file? statfs(2) is > supposed to answer that question -- what should I look for in the struct > statfs, that it will return? Do I check, if f_fsid contains a magic number > for ZFS, or look for a magic string in f_fstypename? Could someone provide an > example? Use it to obtain the filesystem block size. > Thank you very much. Yours, Good to hear from you! 
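The send side of the plain read/write approach described above might look like the following minimal sketch. It is an illustration, not a tuned implementation: the function name and the `multiple` parameter are hypothetical, and using `os.statvfs(...).f_bsize` as "the filesystem block size" is an assumption (whether that value matches the ZFS record size on a given system is not established in this thread).

```python
import os
import socket


def send_file_in_blocks(path: str, sock: socket.socket, multiple: int = 1) -> int:
    """Read `path` in filesystem-block-sized chunks and send each chunk."""
    # Block size as reported by the filesystem, optionally scaled up
    # to "a small multiple thereof".
    chunk = os.statvfs(path).f_bsize * multiple
    sent = 0
    # buffering=0 gives raw reads, so each read() maps to one request
    # of at most `chunk` bytes.
    with open(path, "rb", buffering=0) as src:
        while True:
            block = src.read(chunk)
            if not block:  # empty read means end of file
                break
            sock.sendall(block)
            sent += len(block)
    return sent
```

A caller would connect a stream socket first and then call, for example, `send_file_in_blocks("/tank/data/big.img", sock, multiple=4)`; the path here is made up.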
Bob -- Bob Friesenhahn bfriesen@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/ From owner-freebsd-fs@FreeBSD.ORG Wed Jan 11 07:05:19 2012 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 03EF11065670 for ; Wed, 11 Jan 2012 07:05:19 +0000 (UTC) (envelope-from mi+thun@aldan.algebra.com) Received: from vms173017pub.verizon.net (vms173017pub.verizon.net [206.46.173.17]) by mx1.freebsd.org (Postfix) with ESMTP id D31AA8FC15 for ; Wed, 11 Jan 2012 07:05:18 +0000 (UTC) Received: from [192.168.1.8] ([unknown] [96.242.210.31]) by vms173017.mailsrvcs.net (Sun Java(tm) System Messaging Server 7u2-7.02 32bit (built Apr 16 2009)) with ESMTPA id <0LXM001C1E8B2AK0@vms173017.mailsrvcs.net> for fs@freebsd.org; Wed, 11 Jan 2012 00:05:04 -0600 (CST) Message-id: <4F0D268B.9060908@aldan.algebra.com> Date: Wed, 11 Jan 2012 01:04:59 -0500 From: "Mikhail T." User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:6.0.2) Gecko/20110926 Thunderbird/6.0.2 MIME-version: 1.0 To: Bob Friesenhahn References: <4F0CBB79.7010704@aldan.algebra.com> In-reply-to: Content-type: text/plain; charset=ISO-8859-1; format=flowed Content-transfer-encoding: 7bit X-Mailman-Approved-At: Wed, 11 Jan 2012 11:56:57 +0000 Cc: fs@freebsd.org Subject: Re: How to best send files over network? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 11 Jan 2012 07:05:19 -0000 On 10.01.2012 18:49, Bob Friesenhahn wrote: > Don't use mmap on zfs since doing so wastes memory (zfs ARC is not > coherent with mmap page cache). Instead do normal file I/O (e.g. > write, fwrite) using the filesystem blocksize (e.g. 128K) or a small > multiple thereof. 
Well, that was the reason cited for not using sendfile over ZFS. But mmap/write, it was claimed, was efficient. What's the general opinion of using mmap/write, when the file is on UFS? Is it just as good as sendfile, or can sendfile be better under some circumstances? > It is useful to cache several blocks on the receiving end and use a > thread to receive data from the network in case zfs temporarily stalls > during write (which it periodically does). No, thanks. I'm certainly not doing a read/write loop -- that's just too disgusting in the age of better interfaces (even if those aren't well implemented yet) :-) > >> If, indeed, sendfile is best for UFS, but mmap/write is better over >> ZFS, what is the best way to determine the underlying FS for each >> file? statfs(2) is supposed to answer that question -- what should I >> look for in the struct statfs, that it will return? Do I check, if >> f_fsid contains a magic number for ZFS, or look for a magic string in >> f_fstypename? Could someone provide an example? > > Use it to obtain the filesystem block size. Ok, but still -- how does one determine the filesystem, where a particular file resides? Thanks! 
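For anyone wanting to measure the two send-side approaches weighed above rather than argue about them, here is a hedged sketch of both in Python. The function names are hypothetical. Python's `socket.sendfile()` uses the operating system's sendfile where it can and otherwise falls back to ordinary sends, so it only approximates a direct sendfile(2) call; the mmap variant maps the file read-only and hands the mapping to `sendall()`. Neither function says anything about which is faster on UFS or ZFS.

```python
import mmap
import os
import socket


def send_with_sendfile(path: str, sock: socket.socket) -> int:
    """Send the whole file using the socket's sendfile() method."""
    with open(path, "rb") as f:
        return sock.sendfile(f)


def send_with_mmap(path: str, sock: socket.socket) -> int:
    """Map the file read-only and write the mapping to the socket."""
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        if size == 0:
            return 0  # an empty file cannot be mapped
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            # The memoryview is released before the mapping is closed.
            with memoryview(mapped) as view:
                sock.sendall(view)
        return size
```

Timing each function against the same large file, once on UFS and once on ZFS, would give numbers specific to the machine in question.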
-mi From owner-freebsd-fs@FreeBSD.ORG Wed Jan 11 15:01:36 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 90F961065672 for ; Wed, 11 Jan 2012 15:01:36 +0000 (UTC) (envelope-from phoemix@harmless.hu) Received: from marvin.harmless.hu (marvin.harmless.hu [195.56.55.204]) by mx1.freebsd.org (Postfix) with ESMTP id 55E1A8FC14 for ; Wed, 11 Jan 2012 15:01:36 +0000 (UTC) Received: from gprs4f7a1e2d.pool.t-umts.hu ([79.122.30.45] helo=unknown) by marvin.harmless.hu with esmtpsa (TLSv1:AES128-SHA:128) (Exim 4.75 (FreeBSD)) (envelope-from ) id 1RkzSZ-000K8a-6U for freebsd-fs@freebsd.org; Wed, 11 Jan 2012 15:47:28 +0100 Date: Wed, 11 Jan 2012 15:47:22 +0100 From: Gergely CZUCZY To: freebsd-fs@freebsd.org Message-ID: <20120111154722.000036e4@unknown> Organization: Harmless Digital X-Mailer: Claws Mail 3.7.6 (GTK+ 2.16.0; i586-pc-mingw32msvc) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Subject: Unplugging disk under ZFS yield panic X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 11 Jan 2012 15:01:36 -0000 Dear List, I'd like to ask, whether it is normal behaviour when we're unplugging a disk under a ZFS system then on the first write a kernel panic happened. The hardware is a supermicro X8DTH-i/6/iF/6F board with 2x LSI 2008 fusion MPT SAS-2 controllers, over the mps(4) driver. The disks are accessed over gmultipath, and the multipath'd devices are added to a ZFS mirror: DB mirror-0 multipath/DB01 multipath/DB02 mirror-1 multipath/DB03 multipath/DB04 logs mirror/host1p5 cache multipath/SSD03p1 spares multipath/DB05 System is 9.0-RELEASE I've unplugged DB03 and on the first write we got a kernel panic. 
Should this be normal behaviour or we're missing something here? On a device removal we're expecting it to moving to the spare disk, or using the available redundant disks. Best regards, Gergely From owner-freebsd-fs@FreeBSD.ORG Wed Jan 11 15:42:09 2012 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 276B9106566B for ; Wed, 11 Jan 2012 15:42:09 +0000 (UTC) (envelope-from bfriesen@simple.dallas.tx.us) Received: from blade.simplesystems.org (blade.simplesystems.org [65.66.246.74]) by mx1.freebsd.org (Postfix) with ESMTP id DD7A88FC12 for ; Wed, 11 Jan 2012 15:42:08 +0000 (UTC) Received: from freddy.simplesystems.org (freddy.simplesystems.org [65.66.246.65]) by blade.simplesystems.org (8.14.4+Sun/8.14.4) with ESMTP id q0BFftOq025961; Wed, 11 Jan 2012 09:41:55 -0600 (CST) Date: Wed, 11 Jan 2012 09:41:55 -0600 (CST) From: Bob Friesenhahn X-X-Sender: bfriesen@freddy.simplesystems.org To: "Mikhail T." In-Reply-To: <4F0D268B.9060908@aldan.algebra.com> Message-ID: References: <4F0CBB79.7010704@aldan.algebra.com> <4F0D268B.9060908@aldan.algebra.com> User-Agent: Alpine 2.01 (GSO 1266 2009-07-14) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.2.2 (blade.simplesystems.org [65.66.246.90]); Wed, 11 Jan 2012 09:41:55 -0600 (CST) Cc: fs@freebsd.org Subject: Re: How to best send files over network? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 11 Jan 2012 15:42:09 -0000 On Wed, 11 Jan 2012, Mikhail T. wrote: > On 10.01.2012 18:49, Bob Friesenhahn wrote: >> Don't use mmap on zfs since doing so wastes memory (zfs ARC is not coherent >> with mmap page cache). Instead do normal file I/O (e.g. 
write, fwrite) >> using the filesystem blocksize (e.g. 128K) or a small multiple thereof. > Well, that was the reason cited for not using sendfile over ZFS. But > mmap/write, it was claimed, was efficient. You don't usually want to write files using mmap because it is hard to do so properly and because some operating systems are not very good about writing the dirty pages when they should. You should avoid using mmap on zfs in general because it will double the memory consumption and double the buffer copies. > No, thanks. I'm certainly not doing a read/write loop -- that's just too > disgusting in the age of better interfaces (even if those aren't well > implemented yet) :-) Yes, it seems antique but if you are looking for optimum performance that you can tune and have control over, this is the way to go. You could also try using async I/O. >> Use it to obtain the filesystem block size. > Ok, but still -- how does one determine the filesystem, where a particular > file resides? Thanks! I would tell you to use statvfs and the f_basetype member of the statvfs struct but it seems that FreeBSD only supports a very limited version of this structure. Regardless, if the filesystem is mounted via NFS or SMB then the system is likely to lie about the true underlying filesystem. 
Bob -- Bob Friesenhahn bfriesen@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/ From owner-freebsd-fs@FreeBSD.ORG Wed Jan 11 16:01:02 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 9F93A106566C for ; Wed, 11 Jan 2012 16:01:02 +0000 (UTC) (envelope-from joh.hendriks@gmail.com) Received: from mail-ee0-f54.google.com (mail-ee0-f54.google.com [74.125.83.54]) by mx1.freebsd.org (Postfix) with ESMTP id 307948FC0A for ; Wed, 11 Jan 2012 16:01:01 +0000 (UTC) Received: by eekd49 with SMTP id d49so426688eek.13 for ; Wed, 11 Jan 2012 08:01:01 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=message-id:date:from:user-agent:mime-version:to:subject:references :in-reply-to:content-type:content-transfer-encoding; bh=C4jryCFFBTP288Z7qGm1qLyFUh5o36aihPW7FF18VUU=; b=VIU0r4sl2PlYo/WWr0Sr+tg9/5/CBu+k5FoIkAw9loIv3g8OMIVDDDkttd0xIbtHUz XZ2aTMDyOvudSn0+TxVrcO2a/Us2EcQPCxJEAIT2kDp6MUOYihLCVeSNWzGVH9inxUZG 9t7PM7WI8H8v+4/hNfF/t8PVzHGQ5sUI/Cv58= Received: by 10.14.9.228 with SMTP id 76mr1285311eet.18.1326297658332; Wed, 11 Jan 2012 08:00:58 -0800 (PST) Received: from [192.168.50.103] (double-l.xs4all.nl. 
[80.126.205.144]) by mx.google.com with ESMTPS id 15sm7285333eeu.1.2012.01.11.08.00.38 (version=SSLv3 cipher=OTHER); Wed, 11 Jan 2012 08:00:38 -0800 (PST)
Message-ID: <4F0DB222.3020004@gmail.com>
Date: Wed, 11 Jan 2012 17:00:34 +0100
From: Johan Hendriks
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:9.0) Gecko/20111222 Thunderbird/9.0.1
MIME-Version: 1.0
To: Gergely CZUCZY , freebsd-fs@freebsd.org
References: <20120111154722.000036e4@unknown>
In-Reply-To: <20120111154722.000036e4@unknown>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc:
Subject: Re: Unplugging disk under ZFS yield panic
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems
X-List-Received-Date: Wed, 11 Jan 2012 16:01:02 -0000

Gergely CZUCZY wrote:
> Dear List,
{snip}
>
> On a device removal we're expecting it to moving to the spare disk, or
> using the available redundant disks.
>
> Best regards,
> Gergely
>
>

A system panic is never good, but I cannot help you with that part.

Be aware that spares in FreeBSD are cold: human intervention is needed to replace a drive. I think a lot of people assume that the spare drive is automatically swapped in when a disk dies. This is not the case.

I think the zpool command should print a warning when a spare is added to the pool, saying that the spare is not hot. zpool should not accept the spare without warning the user that it will not automatically replace a dead drive.

regards, Johan From owner-freebsd-fs@FreeBSD.ORG Wed Jan 11 18:29:25 2012 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 7B04B106566B for ; Wed, 11 Jan 2012 18:29:25 +0000 (UTC) (envelope-from bfriesen@simple.dallas.tx.us) Received: from blade.simplesystems.org (blade.simplesystems.org [65.66.246.74]) by mx1.freebsd.org (Postfix) with ESMTP id 417FA8FC0A for ; Wed, 11 Jan 2012 18:29:24 +0000 (UTC) Received: from freddy.simplesystems.org (freddy.simplesystems.org [65.66.246.65]) by blade.simplesystems.org (8.14.4+Sun/8.14.4) with ESMTP id q0BITAGV026875; Wed, 11 Jan 2012 12:29:10 -0600 (CST) Date: Wed, 11 Jan 2012 12:29:10 -0600 (CST) From: Bob Friesenhahn X-X-Sender: bfriesen@freddy.simplesystems.org To: Rick Macklem In-Reply-To: <1235110182.47136.1326298038118.JavaMail.root@erie.cs.uoguelph.ca> Message-ID: References: <1235110182.47136.1326298038118.JavaMail.root@erie.cs.uoguelph.ca> User-Agent: Alpine 2.01 (GSO 1266 2009-07-14) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.2.2 (blade.simplesystems.org [65.66.246.90]); Wed, 11 Jan 2012 12:29:10 -0600 (CST) Cc: "Mikhail T." , fs@freebsd.org Subject: Re: How to best send files over network? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 11 Jan 2012 18:29:25 -0000 On Wed, 11 Jan 2012, Rick Macklem wrote: >> disgusting in the age of better interfaces (even if those aren't well >> implemented yet) :-) > > I think Bob was referring to the receive end and not the send end, which > might be why the answer didn't make sense to you? (For the receive end, > it sounds like a good suggestion to me.) Yes, I am definitely talking about the receive (or the writing end). 
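The receive-side suggestion (keep several blocks buffered and let a separate thread pull data off the network, so a temporary stall in the disk write does not immediately stall the socket) could be sketched as below. The function name, the default of 128K per receive and the buffer depth of 8 are all assumptions chosen for illustration; each buffered item is whatever one `recv()` returned, which may be less than a full block.

```python
import queue
import socket
import threading


def receive_to_file(sock: socket.socket, out_path: str,
                    block_size: int = 128 * 1024, depth: int = 8) -> int:
    """Receive until EOF, buffering up to `depth` chunks ahead of the writer."""
    blocks: "queue.Queue[bytes]" = queue.Queue(maxsize=depth)

    def network_reader() -> None:
        try:
            while True:
                data = sock.recv(block_size)  # returns up to block_size bytes
                if not data:                  # peer closed the connection
                    break
                blocks.put(data)              # waits only when the buffer is full
        finally:
            # Sentinel: tells the writer to stop, on EOF or on a receive error.
            # A real tool would also report the error instead of just stopping.
            blocks.put(b"")

    reader = threading.Thread(target=network_reader, daemon=True)
    reader.start()

    written = 0
    with open(out_path, "wb") as out:
        while True:
            data = blocks.get()
            if not data:
                break
            out.write(data)  # a stall here lets the reader keep filling the queue
            written += len(data)
    reader.join()
    return written
```

The bounded queue is the design point: it caps memory use at roughly `depth * block_size` bytes while still absorbing short pauses in the writer.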

The main advantage of using sendfile() seems to be to keep the file data from needing to transit a user-space program. This is a noble goal but circumstances have changed (i.e. bottlenecks have moved) since sendfile() was invented.

If FreeBSD sendfile is using memory mapping in its implementation, then that is definitely bad for zfs. Besides the double-buffering, double-memcpy, and additional code context switching, added by mmap on top of zfs, there is also the issue that zfs block sizes are usually 128K whereas MMU pages are usually only 4K. If the kernel does its write I/O requests in units of 4K that is not ideal for zfs (but zfs can deal with it), but it is particularly bad if the file is accessed via NFS.

> Well, I think you can strcmp("zfs", sb.fs_fstypename) == 0 to check for
> a zfs file system. However, if the file system type name gets changed
> to something like zfsv32 (just a hypothetical example), the above breaks.
> (There is a case of this in the NFS server code, so hopefully it won't
> break for a while.;-)

I did not see that member mentioned in the manual page. Apparently it does exist anyway.

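On the type check itself: FreeBSD's struct statfs names the member f_fstypename (the spelling used in the original question), so a C check would compare `sb.f_fstypename` against "zfs". Once the type name is in hand, the dispatch decision discussed in this thread can be sketched as below. The function names and the returned labels are hypothetical, and matching on a "zfs" prefix rather than the exact string is an assumption made to tolerate the renamed-type scenario mentioned in the quoted text.

```python
def is_zfs(fstypename: str) -> bool:
    """True for "zfs" and for hypothetical renamed variants such as "zfsv32"."""
    return fstypename.lower().startswith("zfs")


def choose_send_method(fstypename: str) -> str:
    """Pick a send strategy from the filesystem type name.

    Follows the positions taken in this thread, not a measured result:
    a plain read/write loop on ZFS, sendfile elsewhere.
    """
    return "read_write_loop" if is_zfs(fstypename) else "sendfile"
```

As noted earlier in the thread, a file reached over NFS or SMB does not report the server's underlying filesystem, so this check only describes locally mounted filesystems.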
Bob -- Bob Friesenhahn bfriesen@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/ From owner-freebsd-fs@FreeBSD.ORG Wed Jan 11 18:50:36 2012 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 67E62106566B for ; Wed, 11 Jan 2012 18:50:36 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca [131.104.91.44]) by mx1.freebsd.org (Postfix) with ESMTP id 1D6AF8FC17 for ; Wed, 11 Jan 2012 18:50:35 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: Ap0EAH7KDU+DaFvO/2dsb2JhbABChQ+oeoFyAQEFI1YbGAICDXQGrX+RY4Evim4EiDqMUpJa X-IronPort-AV: E=Sophos;i="4.71,493,1320642000"; d="scan'208";a="154569951" Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.206]) by esa-jnhn-pri.mail.uoguelph.ca with ESMTP; 11 Jan 2012 11:07:18 -0500 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 205D3B3EB2; Wed, 11 Jan 2012 11:07:18 -0500 (EST) Date: Wed, 11 Jan 2012 11:07:18 -0500 (EST) From: Rick Macklem To: "Mikhail T." Message-ID: <1235110182.47136.1326298038118.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: <4F0D268B.9060908@aldan.algebra.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.201] X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - FF3.0 (Win)/6.0.10_GA_2692) Cc: fs@freebsd.org Subject: Re: How to best send files over network? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 11 Jan 2012 18:50:36 -0000 Mikhail T. 
wrote:
> On 10.01.2012 18:49, Bob Friesenhahn wrote:
> > Don't use mmap on zfs since doing so wastes memory (zfs ARC is not
> > coherent with mmap page cache). Instead do normal file I/O (e.g.
> > write, fwrite) using the filesystem blocksize (e.g. 128K) or a small
> > multiple thereof.
> Well, that was the reason cited for not using sendfile over ZFS. But
> mmap/write, it was claimed, was efficient.
>
> What's the general opinion of using mmap/write, when the file is on
> UFS? Is it just as good as sendfile, or can sendfile be better under
> some circumstances?
>
> > It is useful to cache several blocks on the receiving end and use a
> > thread to receive data from the network in case zfs temporarily
> > stalls during write (which it periodically does).
> No, thanks. I'm certainly not doing a read/write loop -- that's just
> too disgusting in the age of better interfaces (even if those aren't
> well implemented yet) :-)

I think Bob was referring to the receive end and not the send end, which
might be why the answer didn't make sense to you? (For the receive end,
it sounds like a good suggestion to me.)

> >
> >> If, indeed, sendfile is best for UFS, but mmap/write is better over
> >> ZFS, what is the best way to determine the underlying FS for each
> >> file? statfs(2) is supposed to answer that question -- what should
> >> I look for in the struct statfs, that it will return? Do I check, if
> >> f_fsid contains a magic number for ZFS, or look for a magic string
> >> in f_fstypename? Could someone provide an example?
> >
> > Use it to obtain the filesystem block size.
> Ok, but still -- how does one determine the filesystem, where a
> particular file resides? Thanks!
>

Well, I think you can strcmp("zfs", sb.fs_fstypename) == 0 to check for
a zfs file system. However, if the file system type name gets changed to
something like zfsv32 (just a hypothetical example), the above breaks.
(There is a case of this in the NFS server code, so hopefully it won't break for a while.;-) rick From owner-freebsd-fs@FreeBSD.ORG Wed Jan 11 18:51:01 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 8A58910656D0 for ; Wed, 11 Jan 2012 18:51:01 +0000 (UTC) (envelope-from kraduk@gmail.com) Received: from mail-gy0-f182.google.com (mail-gy0-f182.google.com [209.85.160.182]) by mx1.freebsd.org (Postfix) with ESMTP id 477A48FC20 for ; Wed, 11 Jan 2012 18:51:01 +0000 (UTC) Received: by ghrr16 with SMTP id r16so555988ghr.13 for ; Wed, 11 Jan 2012 10:51:00 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=dcJt3C+DNhnktmEZA0GGXHFf0vGY11VUFc5hB1MKPpc=; b=E9KzMmak0pdclnuFfOcGzpERkwpE/8hpn0gEbNnQauzfcPMalG9brP5dXerbu9Y6NE BK/BpbQqDsVjENsvoQHoQ8yrz8V7O7tbGYayIcKmox9FREc1FZFBHktCfZiOeTMooRNz KF/cr7WQDUhhtgXVZP9n29zmGe317H3AmXvkE= MIME-Version: 1.0 Received: by 10.236.91.84 with SMTP id g60mr50231yhf.90.1326307860645; Wed, 11 Jan 2012 10:51:00 -0800 (PST) Received: by 10.236.139.193 with HTTP; Wed, 11 Jan 2012 10:51:00 -0800 (PST) Received: by 10.236.139.193 with HTTP; Wed, 11 Jan 2012 10:51:00 -0800 (PST) In-Reply-To: <20120110181827.GA7601@server-king.de> References: <20120110181827.GA7601@server-king.de> Date: Wed, 11 Jan 2012 18:51:00 +0000 Message-ID: From: krad To: Frank Bartels Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Cc: freebsd-fs@freebsd.org Subject: Re: Mounting from zfs:zroot failed with error 6. 
X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 11 Jan 2012 18:51:01 -0000 Grab the binary versions of the bootblocks from people.FreeBSD.org/~pjd/zfsboot they generally work for me On Jan 10, 2012 6:18 PM, "Frank Bartels" wrote: > Hi, > > I use FreeBSD 9.0-RELEASE. I boot from USB stick (16 GB) using > gptzfsboot. Everything works fine. > > In the past weeks I've read a lot about the 4 KB alignment and > ashift=12. But at the time I've created the stick the first time I > had no idea about all these problems. > > I still have no clue what the physical sector size of my USB stick > is (more than 4 KB for sure, maybe 128 KB or 256 KB?) and why gnop > does not allow more than 8 KB, but that's another story. > > This is my old stick, created the wrong way: > > bkool:/root# gpart show da1 > => 34 30883773 da1 GPT (14G) > 34 128 1 freebsd-boot (64k) > 162 8019968 2 freebsd-swap (3.8G) > 8020130 22863677 3 freebsd-zfs (10G) > > And this is the new one, created the right way: > > bkool:/root# gpart show da0 > => 34 30871485 da0 GPT (14G) > 34 6 - free - (3.0k) > 40 128 1 freebsd-boot (64k) > 168 1880 - free - (940k) > 2048 8388608 2 freebsd-swap (4.0G) > 8390656 22480863 3 freebsd-zfs (10G) > > So my plan is to create a new zpool with altroot= and cachefile=, > rsync the old stick to the new one, copy over the cachefile, double > check loader.conf vfs.root.mountfrom= and zfs bootfs=. This worked > several times before in other environments. But here it does not. > > If I try to boot, the new stick runs gptzfsboot, loads the kernel > and fails with "Mounting from zfs:bkool9 failed with error 6.". > > In the past (maybe it was with 8.2-RELEASE) I had the same problem. > The solution was to type "zfs:zroot" (*same* value from loader.conf) > at the prompt and it worked. But this time this "trick" does not > help. 
> > I've seen this error message several times now. I tried to create > the new zpool with ashift=12 (gnop -S 4K) and ashift=9. I double > checked gpart bootcode. kernel is GENERIC with DDB enabled, no other > changes. But it won't mount root. > > The interesting part is, if I boot from the old stick and inspect > the new one, I see this: > > bkool:/root# zpool import > pool: bkool9 > id: 2362879880167335458 > state: FAULTED > status: One or more devices contains corrupted data. > action: The pool cannot be imported due to damaged devices or data. > The pool may be active on another system, but can be imported using > the '-f' flag. > see: http://www.sun.com/msg/ZFS-8000-5E > config: > > bkool9 FAULTED corrupted data > 5856161150314576551 UNAVAIL corrupted data > > But, if I tell zpool to look for devices in /dev, I get this: > > bkool:/root# zpool import -d /dev > pool: bkool9 > id: 2362879880167335458 > state: ONLINE > action: The pool can be imported using its name or numeric identifier. > config: > > bkool9 ONLINE > gpt/bkool9-disk0 ONLINE > > What's going on here? > > Why is the kernel not able to find the zpool during boot (after > loading the loader and kernel from it!) even if "/dev/gpt/bkool9-disk0" > is part of zpool.cache? > > And why is looking for devices in /dev or /dev/gpt not default if > I use zpool import? The manual page says "/dev/dsk" is default, but > does this not only make sense under Solaris? > > And what does "5856161150314576551" mean? Where does the information > about the pool bkool9 come from if the only vdev is UNAVAIL and > there are no hints in /boot/zfs/zpool.cache? 
> > Thanks, > Frank > _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" > From owner-freebsd-fs@FreeBSD.ORG Wed Jan 11 19:21:58 2012 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id B66731065672 for ; Wed, 11 Jan 2012 19:21:58 +0000 (UTC) (envelope-from mdf356@gmail.com) Received: from mail-pw0-f54.google.com (mail-pw0-f54.google.com [209.85.160.54]) by mx1.freebsd.org (Postfix) with ESMTP id 8F3128FC19 for ; Wed, 11 Jan 2012 19:21:58 +0000 (UTC) Received: by pbcc3 with SMTP id c3so913679pbc.13 for ; Wed, 11 Jan 2012 11:21:58 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type; bh=GTeJzHUzSAWswDO7GfAXuscv+BdwTIHY9ti0lLEUWMo=; b=Wc451LhnHKRw+qdFbaWM34ElnwVYTqchRnj6JwVSZlcV8oftoFIlxlFm5CpVH7bVOm Nk34AhPLaWYEDl3XrfF0pNIwFSZnXIlmfCrePGL6iu+IImbETDZu3ciajLE5EugkNCvl Un1eyfe98MlYL4JeD8n5vj4jpzWYoPLSXRkWk= MIME-Version: 1.0 Received: by 10.68.213.33 with SMTP id np1mr279865pbc.107.1326308197150; Wed, 11 Jan 2012 10:56:37 -0800 (PST) Sender: mdf356@gmail.com Received: by 10.68.208.167 with HTTP; Wed, 11 Jan 2012 10:56:37 -0800 (PST) In-Reply-To: <1235110182.47136.1326298038118.JavaMail.root@erie.cs.uoguelph.ca> References: <4F0D268B.9060908@aldan.algebra.com> <1235110182.47136.1326298038118.JavaMail.root@erie.cs.uoguelph.ca> Date: Wed, 11 Jan 2012 10:56:37 -0800 X-Google-Sender-Auth: jto9-7IGTklSN903YI2WFJOYLZ0 Message-ID: From: mdf@FreeBSD.org To: Rick Macklem Content-Type: text/plain; charset=ISO-8859-1 Cc: "Mikhail T." , fs@freebsd.org Subject: Re: How to best send files over network? 
X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 11 Jan 2012 19:21:58 -0000 On Wed, Jan 11, 2012 at 8:07 AM, Rick Macklem wrote: > Mikhail T. wrote: >> On 10.01.2012 18:49, Bob Friesenhahn wrote: >> > Don't use mmap on zfs since doing so wastes memory (zfs ARC is not >> > coherent with mmap page cache). Instead do normal file I/O (e.g. >> > write, fwrite) using the filesystem blocksize (e.g. 128K) or a small >> > multiple thereof. >> Well, that was the reason cited for not using sendfile over ZFS. But >> mmap/write, it was claimed, was efficient. >> >> What's the general opinion of using mmap/write, when the file is on >> UFS? >> Is it just as good as sendfile, or can sendfile be better under some >> circumstances? >> >> > It is useful to cache several blocks on the receiving end and use a >> > thread to receive data from the network in case zfs temporarily >> > stalls >> > during write (which it periodically does). >> No, thanks. I'm certainly not doing a read/write loop -- that's just >> too >> disgusting in the age of better interfaces (even if those aren't well >> implemented yet) :-) > > I think Bob was referring to the receive end and not the send end, which > might be why the answer didn't make sense to you? (For the receive end, > it sounds like a good suggestion to me.) Some day, when there's time, I'd like to roll up Isilon's recvfile(2) into a proper patch. It's the obvious analogue to sendfile(2). I think it also requires two new VOPs, for efficiency, VOP_READ_MBUF (for sendfile) and VOP_WRITE_MBUF, which use an mbuf chain rather than uio to do their work. Perhaps at BSDCan, if this is my priority at the time. Or, if someone else wants to work on it I can provide patches that will require work. 
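
Until something like recvfile(2) is available, the receive-side advice quoted
above (buffer in user space and write in filesystem-blocksize units) can be
approximated with ordinary read(2)/write(2) calls. What follows is a minimal
sketch and not code from the thread: the 128K block size is an assumption
taken from the figure mentioned earlier, the names write_all and
copy_in_blocks are hypothetical, and the separate receive thread Bob suggests
is left out.

```c
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

/* Assumed block size (128K); a real program would query the file system. */
#define COPY_BLOCK_SIZE (128 * 1024)

/* Write exactly len bytes, retrying on short writes and EINTR. */
static int
write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);

        if (n == -1) {
            if (errno == EINTR)
                continue;
            return (-1);
        }
        buf += n;
        len -= (size_t)n;
    }
    return (0);
}

/*
 * Copy everything from in_fd (e.g. a socket) to out_fd (a file).
 * Data is accumulated until a full block is available, so every write
 * except possibly the last one is exactly COPY_BLOCK_SIZE bytes.
 * Returns the number of bytes copied, or -1 on error.
 */
long long
copy_in_blocks(int in_fd, int out_fd)
{
    char *buf = malloc(COPY_BLOCK_SIZE);
    size_t fill = 0;
    long long total = 0;

    if (buf == NULL)
        return (-1);
    for (;;) {
        ssize_t n = read(in_fd, buf + fill, COPY_BLOCK_SIZE - fill);

        if (n == -1) {
            if (errno == EINTR)
                continue;
            total = -1;
            break;
        }
        if (n == 0)             /* end of input */
            break;
        fill += (size_t)n;
        if (fill == COPY_BLOCK_SIZE) {
            if (write_all(out_fd, buf, fill) == -1) {
                total = -1;
                break;
            }
            total += (long long)fill;
            fill = 0;
        }
    }
    /* Flush the final partial block, if any. */
    if (total != -1 && fill > 0) {
        if (write_all(out_fd, buf, fill) == -1)
            total = -1;
        else
            total += (long long)fill;
    }
    free(buf);
    return (total);
}
```

Holding several such buffers and filling them from a dedicated network thread
would be the next step toward what Bob describes; that part is not shown here.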
Thanks, matthew From owner-freebsd-fs@FreeBSD.ORG Wed Jan 11 19:30:02 2012 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 02A29106566B for ; Wed, 11 Jan 2012 19:30:01 +0000 (UTC) (envelope-from mi+thun@aldan.algebra.com) Received: from smtp02.lnh.mail.rcn.net (smtp02.lnh.mail.rcn.net [207.172.157.102]) by mx1.freebsd.org (Postfix) with ESMTP id 67C138FC08 for ; Wed, 11 Jan 2012 19:30:01 +0000 (UTC) Received: from mr16.lnh.mail.rcn.net ([207.172.157.36]) by smtp02.lnh.mail.rcn.net with ESMTP; 11 Jan 2012 14:29:57 -0500 Received: from smtp01.lnh.mail.rcn.net (smtp01.lnh.mail.rcn.net [207.172.4.11]) by mr16.lnh.mail.rcn.net (MOS 4.3.4-GA) with ESMTP id BNG69753; Wed, 11 Jan 2012 14:29:57 -0500 Received-SPF: None identity=pra; client-ip=209.6.61.133; receiver=smtp01.lnh.mail.rcn.net; envelope-from="mi+thun@aldan.algebra.com"; x-sender="mi+thun@aldan.algebra.com"; x-conformance=sidf_compatible Received-SPF: None identity=mailfrom; client-ip=209.6.61.133; receiver=smtp01.lnh.mail.rcn.net; envelope-from="mi+thun@aldan.algebra.com"; x-sender="mi+thun@aldan.algebra.com"; x-conformance=sidf_compatible Received-SPF: None identity=helo; client-ip=209.6.61.133; receiver=smtp01.lnh.mail.rcn.net; envelope-from="mi+thun@aldan.algebra.com"; x-sender="postmaster@utka.zajac"; x-conformance=sidf_compatible X-Auth-ID: anat Received: from 209-6-61-133.c3-0.sbo-ubr1.sbo.ma.cable.rcn.com (HELO utka.zajac) ([209.6.61.133]) by smtp01.lnh.mail.rcn.net with ESMTP; 11 Jan 2012 14:29:56 -0500 Message-ID: <4F0DE332.5040804@aldan.algebra.com> Date: Wed, 11 Jan 2012 14:29:54 -0500 From: "Mikhail T." 
User-Agent: Mozilla/5.0 (X11; FreeBSD i386; rv:7.0.1) Gecko/20111013 Thunderbird/7.0.1 MIME-Version: 1.0 To: Bob Friesenhahn References: <1235110182.47136.1326298038118.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: X-Mailman-Approved-At: Wed, 11 Jan 2012 20:03:26 +0000 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Cc: fs@freebsd.org Subject: Re: How to best send files over network? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 11 Jan 2012 19:30:02 -0000 On 11.01.2012 13:29, Bob Friesenhahn wrote: > Yes, I am definitely talking about the receive (or the writing end). Well, this topic is of no concern to me right now -- I'm sending to clients already written (and embedded in firmwares), hence the explicit subject line :-) > The main advantage of using sendfile() seems to be to keep the file data from > needing to transit a user-space program. This is a noble goal but > circumstances have changed (i.e. bottlenecks have moved) since sendfile() was > invented. Even if other things involved in sending a file cost more, memory copying is still not free... > If FreeBSD sendfile is using memory mapping in its implementation, then that > is definitely bad for zfs. In September K. Macy claimed on this mailing list, that sending from mmap is over twice faster than sendfile, if the file is on ZFS: http://freebsd.1045724.n5.nabble.com/ZFS-lighttpd2-sendfile-too-high-IO-td4793886.html Bob replied to that, actually, so he must remember the discussion :) Though I find this sorry state of sendfile() rather disheartening, reading the above thread got me thinking about using a different method for different filesystems. But if mmap/write is never (i.e. 
for both UFS and ZFS) any worse than sendfile, then I can simply always use mmap in all cases. And this is, what I wanted to discuss -- I thought, in some cases, sendfile can arrange for data to go from the disk controller to the network card directly... Yours, -mi From owner-freebsd-fs@FreeBSD.ORG Wed Jan 11 20:32:50 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id B0EA9106566C for ; Wed, 11 Jan 2012 20:32:50 +0000 (UTC) (envelope-from freebsd-listen@fabiankeil.de) Received: from smtprelay04.ispgateway.de (smtprelay04.ispgateway.de [80.67.29.8]) by mx1.freebsd.org (Postfix) with ESMTP id 703C28FC0C for ; Wed, 11 Jan 2012 20:32:50 +0000 (UTC) Received: from [109.46.180.43] (helo=fabiankeil.de) by smtprelay04.ispgateway.de with esmtpsa (TLSv1:AES128-SHA:128) (Exim 4.68) (envelope-from ) id 1Rl4fV-0004UZ-8Y; Wed, 11 Jan 2012 21:21:09 +0100 Date: Wed, 11 Jan 2012 21:07:08 +0100 From: Fabian Keil To: Gergely CZUCZY Message-ID: <20120111210708.1168781e@fabiankeil.de> In-Reply-To: <20120111154722.000036e4@unknown> References: <20120111154722.000036e4@unknown> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=PGP-SHA1; boundary="Sig_/sme5Aas0O1FB6Uj1.oz3E5l"; protocol="application/pgp-signature" X-Df-Sender: Nzc1MDY3 Cc: freebsd-fs@freebsd.org Subject: Re: Unplugging disk under ZFS yield panic X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: freebsd-fs@freebsd.org List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 11 Jan 2012 20:32:50 -0000 --Sig_/sme5Aas0O1FB6Uj1.oz3E5l Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: quoted-printable Gergely CZUCZY wrote: > I'd like to ask, whether it is normal behaviour when we're unplugging a > disk under a ZFS system then on the first write a kernel panic happened. 
Sounds familiar. I currently have two PRs open for
reproducible kernel panics after a vdev gets lost:
http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/162010
http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/162036

Note that the pool layouts are different, though.

> The hardware is a supermicro X8DTH-i/6/iF/6F board with 2x LSI 2008
> fusion MPT SAS-2 controllers, over the mps(4) driver. The disks are
> accessed over gmultipath, and the multipath'd devices are added to a
> ZFS mirror:
> DB
>   mirror-0
>     multipath/DB01
>     multipath/DB02
>   mirror-1
>     multipath/DB03
>     multipath/DB04
> logs
>   mirror/host1p5
> cache
>   multipath/SSD03p1
> spares
>   multipath/DB05
>
> System is 9.0-RELEASE
>
> I've unplugged DB03 and on the first write we got a kernel panic.
> Should this be normal behaviour or we're missing something here?

Without a back trace or at least the panic reason one can only
speculate what's going on, but I think it's rather unlikely
that the panic is the intended behaviour and not a bug.

Maybe you can gather some additional information and file a PR?

> On a device removal we're expecting it to moving to the spare disk, or
> using the available redundant disks.

I agree that this behaviour would be preferable to a panic.
Fabian --Sig_/sme5Aas0O1FB6Uj1.oz3E5l Content-Type: application/pgp-signature; name=signature.asc Content-Disposition: attachment; filename=signature.asc -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.18 (FreeBSD) iEYEARECAAYFAk8N6/EACgkQBYqIVf93VJ3sRwCglgpytX4TKPOWNIHnfCpJgyLT bIYAoJj6AT5p6k2Y06x8gn2HLa7g1Z+F =4h+l -----END PGP SIGNATURE----- --Sig_/sme5Aas0O1FB6Uj1.oz3E5l-- From owner-freebsd-fs@FreeBSD.ORG Wed Jan 11 20:40:43 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id C4B28106566B for ; Wed, 11 Jan 2012 20:40:43 +0000 (UTC) (envelope-from jdc@koitsu.dyndns.org) Received: from qmta03.emeryville.ca.mail.comcast.net (qmta03.emeryville.ca.mail.comcast.net [76.96.30.32]) by mx1.freebsd.org (Postfix) with ESMTP id AB5778FC0C for ; Wed, 11 Jan 2012 20:40:43 +0000 (UTC) Received: from omta16.emeryville.ca.mail.comcast.net ([76.96.30.72]) by qmta03.emeryville.ca.mail.comcast.net with comcast id LJzw1i0031ZMdJ4A3Lgj0D; Wed, 11 Jan 2012 20:40:43 +0000 Received: from koitsu.dyndns.org ([67.180.84.87]) by omta16.emeryville.ca.mail.comcast.net with comcast id LLgh1i01P1t3BNj8cLgiY4; Wed, 11 Jan 2012 20:40:42 +0000 Received: by icarus.home.lan (Postfix, from userid 1000) id 8B371102C1E; Wed, 11 Jan 2012 12:40:41 -0800 (PST) Date: Wed, 11 Jan 2012 12:40:41 -0800 From: Jeremy Chadwick To: freebsd-fs@freebsd.org Message-ID: <20120111204041.GA47175@icarus.home.lan> References: <20120111154722.000036e4@unknown> <20120111210708.1168781e@fabiankeil.de> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20120111210708.1168781e@fabiankeil.de> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: Subject: Re: Unplugging disk under ZFS yield panic X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , 
X-List-Received-Date: Wed, 11 Jan 2012 20:40:43 -0000 On Wed, Jan 11, 2012 at 09:07:08PM +0100, Fabian Keil wrote: > Gergely CZUCZY wrote: > > > I'd like to ask, whether it is normal behaviour when we're unplugging a > > disk under a ZFS system then on the first write a kernel panic happened. > > Sounds familiar. I currently have two PRs open for > reproducible kernel panics after a vdev gets lost: > http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/162010 > http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/162036 > > Note that the pool layouts are different, though. Is this problem truly ZFS-specific? I'd been tracking this problem for years, and was told it was fixed: http://wiki.freebsd.org/BugBusting/Commonly_reported_issues * Panic occurs when a mounted device (USB, SATA, local image file, etc.) is removed Workaround: Be sure to umount all filesystems before removing the physical device Partial fix: Committed to CURRENT (8.0) on/prior to 2008/02/21 There is ongoing work to fully fix this problem, ETA 2009/02 OP, please provide a kernel backtrace. Otherwise, if needed, I can go yank one of the two mirrored disks out of my FreeBSD box at home to try and reproduce the problem. 
pool: data state: ONLINE scan: scrub repaired 0 in 1h17m with 0 errors on Thu Dec 29 12:05:05 2011 config: NAME STATE READ WRITE CKSUM data ONLINE 0 0 0 mirror-0 ONLINE 0 0 0 ada1 ONLINE 0 0 0 ada3 ONLINE 0 0 0 cache ada4 ONLINE 0 0 0 ada1 at ahcich1 bus 0 scbus1 target 0 lun 0 ada1: ATA-8 SATA 3.x device ada1: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes) ada1: Command Queueing enabled ada1: 953869MB (1953525168 512 byte sectors: 16H 63S/T 16383C) ada3 at ahcich3 bus 0 scbus3 target 0 lun 0 ada3: ATA-8 SATA 3.x device ada3: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes) ada3: Command Queueing enabled ada3: 953869MB (1953525168 512 byte sectors: 16H 63S/T 16383C) ahci0: port 0x1c50-0x1c57,0x1c44-0x1c47,0x1c48-0x1c4f,0x1c40-0x1c43,0x18e0-0x18ff mem 0xdc000800-0xdc000fff irq 17 at device 31.2 on pci0 ahci0: [ITHREAD] ahci0: AHCI v1.20 with 6 3Gbps ports, Port Multiplier supported ahcich1: at channel 1 on ahci0 ahcich1: [ITHREAD] ahcich3: at channel 3 on ahci0 ahcich3: [ITHREAD] > > The hardware is a supermicro X8DTH-i/6/iF/6F board with 2x LSI 2008 > > fusion MPT SAS-2 controllers, over the mps(4) driver. The disks are > > accessed over gmultipath, and the multipath'd devices are added to a > > ZFS mirror: > > DB > > mirror-0 > > multipath/DB01 > > multipath/DB02 > > mirror-1 > > multipath/DB03 > > multipath/DB04 > > logs > > mirror/host1p5 > > cache > > multipath/SSD03p1 > > spares > > multipath/DB05 > > > > System is 9.0-RELEASE > > > > I've unplugged DB03 and on the first write we got a kernel panic. > > Should this be normal behaviour or we're missing something here? > > Without a back trace or at least the panic reason one can only > speculate what's going on, but I think it's rather unlikely > that the panic is the intended behaviour and not a bug. > > Maybe you can gather some additional information and file a PR? > > > On a device removal we're expecting it to moving to the spare disk, or > > using the available redundant disks. 
> > I agree that this behaviour would be preferable to a panic. -- | Jeremy Chadwick jdc at parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, US | | Making life hard for others since 1977. PGP 4BD6C0CB | From owner-freebsd-fs@FreeBSD.ORG Wed Jan 11 20:47:44 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 619BA106564A for ; Wed, 11 Jan 2012 20:47:44 +0000 (UTC) (envelope-from feld@feld.me) Received: from mwi1.coffeenet.org (unknown [IPv6:2607:f4e0:100:300::2]) by mx1.freebsd.org (Postfix) with ESMTP id 3BA6E8FC0C for ; Wed, 11 Jan 2012 20:47:44 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=feld.me; s=blargle; h=In-Reply-To:Message-Id:Date:From:Mime-Version:References:Subject:To:Content-Type; bh=HxMKxyCZa6JV0ZKYwvRsZNIO1OEpgW7Gt7XB0yzCRls=; b=dMqRmx5q9Xy/CEjIhRZAgsEkoBH3Ef0xiTfMLW/i5V8dEeb+WdNuHwtzP3ggIzRCtgcAJOAit0Htq0CkbELuKXXs71KuTkQrpS220pwfDufOvylY9H/f/vCsb3LqMYsm; Received: from localhost ([127.0.0.1] helo=mwi1.coffeenet.org) by mwi1.coffeenet.org with esmtp (Exim 4.77 (FreeBSD)) (envelope-from ) id 1Rl55D-000OuZ-6i for freebsd-fs@freebsd.org; Wed, 11 Jan 2012 14:47:43 -0600 Received: from feld@feld.me by mwi1.coffeenet.org (Archiveopteryx 3.1.4) with esmtpsa id 1326314856-88972-88971/5/17; Wed, 11 Jan 2012 20:47:36 +0000 Content-Type: text/plain; charset=utf-8; format=flowed; delsp=yes To: freebsd-fs@freebsd.org References: <20120111154722.000036e4@unknown> Mime-Version: 1.0 From: Mark Felder Date: Wed, 11 Jan 2012 14:47:36 -0600 Message-Id: In-Reply-To: <20120111154722.000036e4@unknown> User-Agent: Opera Mail/11.61 (FreeBSD) X-SA-Score: -1.0 Subject: Re: Unplugging disk under ZFS yield panic X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: 
List-Subscribe: , X-List-Received-Date: Wed, 11 Jan 2012 20:47:44 -0000 This is a known bug in gmultipath. It's 100% fixed in 9-STABLE; the patch just didn't make -RELEASE. gmultipath was near completely rewritten. I've tested it extensively and you gain both Active/Active ability as well as the fact that it never panics anymore. Regards, Mark From owner-freebsd-fs@FreeBSD.ORG Wed Jan 11 20:47:55 2012 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id D3786106564A for ; Wed, 11 Jan 2012 20:47:55 +0000 (UTC) (envelope-from jhs@berklix.com) Received: from tower.berklix.org (tower.berklix.org [83.236.223.114]) by mx1.freebsd.org (Postfix) with ESMTP id 60F378FC15 for ; Wed, 11 Jan 2012 20:47:54 +0000 (UTC) Received: from mart.js.berklix.net (p5DCBF135.dip.t-dialin.net [93.203.241.53]) (authenticated bits=0) by tower.berklix.org (8.14.2/8.14.2) with ESMTP id q0BKlqfH043785; Wed, 11 Jan 2012 20:47:53 GMT (envelope-from jhs@berklix.com) Received: from fire.js.berklix.net (fire.js.berklix.net [192.168.91.41]) by mart.js.berklix.net (8.14.3/8.14.3) with ESMTP id q0BKlfsE048693; Wed, 11 Jan 2012 21:47:41 +0100 (CET) (envelope-from jhs@berklix.com) Received: from fire.js.berklix.net (localhost [127.0.0.1]) by fire.js.berklix.net (8.14.4/8.14.4) with ESMTP id q0BKlTW8042343; Wed, 11 Jan 2012 21:47:35 +0100 (CET) (envelope-from jhs@fire.js.berklix.net) Message-Id: <201201112047.q0BKlTW8042343@fire.js.berklix.net> To: Kirk McKusick From: "Julian H. Stacey" Organization: http://www.berklix.com BSD Unix Linux Consultancy, Munich Germany User-agent: EXMH on FreeBSD http://www.berklix.com/free/ X-URL: http://www.berklix.com In-reply-to: Your message "Tue, 10 Jan 2012 14:21:08 PST." 
<201201102221.q0AML8MX012837@chez.mckusick.com> Date: Wed, 11 Jan 2012 21:47:29 +0100 Sender: jhs@berklix.com Cc: fs@freebsd.org Subject: Re: unexpected soft update inconsistency - cannot fix X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 11 Jan 2012 20:47:56 -0000

Hi Kirk cc fs@,
OK, it all worked as you said. Thanks for the explanations!

Maybe if I or others get some time it'd be worth tweaking the code:
- It's surprising ".." has to be 2nd entry & fsck/fsdb can't find it later.
- fsck &/or fsdb don't have to just give up on no parent inode, they
  could search down from root looking for a parent inode listing the
  damaged child inode, like I did with find. (Possibly first stab might
  be some nasty system("...") kludges to a printf as a hint to user.)

I'll save this mail thread in my personal tree in case I get time
to hack code :-)

Detail of the repair:

cd /mnt/ftp/.backup/pri/FreeBSD/development/FreeBSD-CVS/ports/net/\
keepalived/Attic
ls -fa
  ./ Makefile,v distinfo,v pkg-descr,v pkg-plist,v
mv Makefile,v Makefile,v.sav
ls -fa
  ./ distinfo,v pkg-descr,v pkg-plist,v Makefile,v.sav
umount mnt
fsdb -r /dev/ad4s4g     # just looking
  inode 825575
  current inode: directory
  I=825575 MODE=40755 SIZE=512
  BTIME=Dec 29 20:09:19 2011 [0 nsec]
  MTIME=Jan 11 08:57:58 2012 [0 nsec]
  CTIME=Jan 11 08:57:58 2012 [0 nsec]
  ATIME=Jan 11 08:58:14 2012 [0 nsec]
  OWNER=mailnull GRP=mailnull LINKCNT=2 FLAGS=0 BLKCNT=4 \
  GEN=29c28025
  fsdb (inum: 825575)> ls
  slot 0 ino 825575 reclen 40: directory, `.'
  slot 1 ino 825581 reclen 20: regular, `distinfo,v'
  slot 2 ino 825582 reclen 20: regular, `pkg-descr,v'
  slot 3 ino 825583 reclen 20: regular, `pkg-plist,v'
  slot 4 ino 825580 reclen 412: regular, `Makefile,v.sav'
  ^D                    # end of looking
fsck /dev/ad4s4g
  ** /dev/ad4s4g
  ** Last Mounted on /mnt
  ** Phase 1 - Check Blocks and Sizes
  ** Phase 2 - Check Pathnames
  MISSING '..' I=825575 OWNER=mailnull MODE=40755
  SIZE=512 MTIME=Jan 11 08:57 2012
  DIR=/ftp/.backup/pri/FreeBSD/development/FreeBSD-CVS/ports/\
  net/keepalived/Attic

  FIX? [yn] y

  ** Phase 3 - Check Connectivity
  ** Phase 4 - Check Reference Counts
  LINK COUNT DIR I=79302719 OWNER=mailnull MODE=40755
  SIZE=512 MTIME=Dec 29 20:09 2011 COUNT 3 SHOULD BE 4
  ADJUST? [yn] y

  ** Phase 5 - Check Cyl groups
  1360879 files, 126366566 used, 180989615 free (1437463 frags, \
  22444019 blocks, 0.5% fragmentation)

  ***** FILE SYSTEM IS CLEAN *****

  ***** FILE SYSTEM WAS MODIFIED *****
fsck /dev/ad4s4g        # No errs .. FILE SYSTEM IS CLEAN
mount -t ufs /dev/ad4s4g /mnt
cd /mnt/ftp/.backup/pri/FreeBSD/development/FreeBSD-CVS/ports/net/\
keepalived/Attic
mv Makefile,v.sav Makefile,v

Cheers,
Julian
--
Julian Stacey, BSD Unix Linux C Sys Eng Consultants Munich http://berklix.com
 Reply below not above, cumulative like a play script, & indent with "> ".
 Format: Plain text. Not HTML, multipart/alternative, base64, quoted-printable.
From owner-freebsd-fs@FreeBSD.ORG Wed Jan 11 21:48:02 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id BC9041065675 for ; Wed, 11 Jan 2012 21:48:02 +0000 (UTC) (envelope-from freebsd-listen@fabiankeil.de) Received: from smtprelay02.ispgateway.de (smtprelay02.ispgateway.de [80.67.31.29]) by mx1.freebsd.org (Postfix) with ESMTP id 7AD3F8FC0A for ; Wed, 11 Jan 2012 21:48:02 +0000 (UTC) Received: from [109.41.36.128] (helo=fabiankeil.de) by smtprelay02.ispgateway.de with esmtpsa (TLSv1:AES128-SHA:128) (Exim 4.68) (envelope-from ) id 1Rl61Y-0003IP-RN for freebsd-fs@freebsd.org; Wed, 11 Jan 2012 22:48:01 +0100 Date: Wed, 11 Jan 2012 22:43:40 +0100 From: Fabian Keil To: freebsd-fs@freebsd.org Message-ID: <20120111224340.72f28ef4@fabiankeil.de> In-Reply-To: <20120111204041.GA47175@icarus.home.lan> References: <20120111154722.000036e4@unknown> <20120111210708.1168781e@fabiankeil.de> <20120111204041.GA47175@icarus.home.lan> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=PGP-SHA1; boundary="Sig_/TY/An./0jB88xYb.h.J37YL"; protocol="application/pgp-signature" X-Df-Sender: Nzc1MDY3 Subject: Re: Unplugging disk under ZFS yield panic X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 11 Jan 2012 21:48:02 -0000 --Sig_/TY/An./0jB88xYb.h.J37YL Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: quoted-printable

Jeremy Chadwick wrote:

> On Wed, Jan 11, 2012 at 09:07:08PM +0100, Fabian Keil wrote:
> > Gergely CZUCZY wrote:
> >
> > > I'd like to ask, whether it is normal behaviour when we're unplugging a
> > > disk under a ZFS system then on the first write a kernel panic happened.
> >
> > Sounds familiar. I currently have two PRs open for
> > reproducible kernel panics after a vdev gets lost:
> > http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/162010
> > http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/162036
> >
> > Note that the pool layouts are different, though.
>
> Is this problem truly ZFS-specific? I'd been tracking this problem for
> years, and was told it was fixed:

I'm not saying that my problems are ZFS-specific.
The backtraces mainly contain geom functions and no ZFS code,
so ZFS might be the victim here.

> http://wiki.freebsd.org/BugBusting/Commonly_reported_issues
>
> * Panic occurs when a mounted device (USB, SATA, local image file,
> etc.) is removed
>
> Workaround: Be sure to umount all filesystems before removing the
> physical device
> Partial fix: Committed to CURRENT (8.0) on/prior to 2008/02/21
>
> There is ongoing work to fully fix this problem, ETA 2009/02

This is a different problem. It was even easier to reproduce
and at least I haven't seen it since the fixes went in.
Fabian --Sig_/TY/An./0jB88xYb.h.J37YL Content-Type: application/pgp-signature; name=signature.asc Content-Disposition: attachment; filename=signature.asc -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.18 (FreeBSD) iEYEARECAAYFAk8OApEACgkQBYqIVf93VJ0ApACfRWk3i0jNsLdGzovWgL+tGl8m 2UYAn36v0AQDMO6q+iz3TdiP5AxYnDRB =nzwg -----END PGP SIGNATURE----- --Sig_/TY/An./0jB88xYb.h.J37YL-- From owner-freebsd-fs@FreeBSD.ORG Wed Jan 11 23:58:59 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 5C192106564A for ; Wed, 11 Jan 2012 23:58:59 +0000 (UTC) (envelope-from spork@bway.net) Received: from xena.bway.net (xena.bway.net [216.220.96.26]) by mx1.freebsd.org (Postfix) with ESMTP id 1251E8FC08 for ; Wed, 11 Jan 2012 23:58:58 +0000 (UTC) Received: (qmail 78327 invoked by uid 0); 11 Jan 2012 23:58:58 -0000 Received: from smtp.bway.net (216.220.96.25) by xena.bway.net with (DHE-RSA-AES256-SHA encrypted) SMTP; 11 Jan 2012 23:58:58 -0000 Received: (qmail 78318 invoked by uid 90); 11 Jan 2012 23:58:57 -0000 Received: from unknown (HELO ?10.3.2.40?) 
(spork@96.57.144.66) by smtp.bway.net with (AES128-SHA encrypted) SMTP; 11 Jan 2012 23:58:57 -0000 Mime-Version: 1.0 (Apple Message framework v1084) Content-Type: text/plain; charset=us-ascii From: Charles Sprickman In-Reply-To: <20120111224340.72f28ef4@fabiankeil.de> Date: Wed, 11 Jan 2012 18:58:57 -0500 Content-Transfer-Encoding: 7bit Message-Id: References: <20120111154722.000036e4@unknown> <20120111210708.1168781e@fabiankeil.de> <20120111204041.GA47175@icarus.home.lan> <20120111224340.72f28ef4@fabiankeil.de> To: Fabian Keil X-Mailer: Apple Mail (2.1084) Cc: freebsd-fs@freebsd.org Subject: Re: Unplugging disk under ZFS yield panic X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 11 Jan 2012 23:58:59 -0000 On Jan 11, 2012, at 4:43 PM, Fabian Keil wrote: > Jeremy Chadwick wrote: > >> On Wed, Jan 11, 2012 at 09:07:08PM +0100, Fabian Keil wrote: >>> Gergely CZUCZY wrote: >>> >>>> I'd like to ask, whether it is normal behaviour when we're unplugging a >>>> disk under a ZFS system then on the first write a kernel panic happened. >>> >>> Sounds familiar. I currently have two PRs open for >>> reproducible kernel panics after a vdev gets lost: >>> http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/162010 >>> http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/162036 >>> >>> Note that the pool layouts are different, though. >> >> Is this problem truly ZFS-specific? I'd been tracking this problem for >> years, and was told it was fixed: > > I'm not saying that my problems are ZFS-specific. > The backtraces mainly contain geom functions and no ZFS code, > so ZFS might be the victim here. Is there any relation between this issue and the "log_sysevent: type 19 is not implemented" problem that happens on device insertion/removal on 8.2? 
http://lists.freebsd.org/pipermail/freebsd-fs/2011-June/011855.html I still see that one on 8.2-STABLE from around 6/2011. I initially thought it was triggered by device failure or removal (even with proper hotplug support), but I got hit by it last night when inserting a new drive in a chassis that supports ahci/sata hotplug. IIRC it's not a ZFS issue, ZFS just gets more spammy (well, extremely spammy to the point the system can't do much else) about reporting an issue with a device going away or being inserted. Charles > >> http://wiki.freebsd.org/BugBusting/Commonly_reported_issues >> >> * Panic occurs when a mounted device (USB, SATA, local image file, >> etc.) is removed >> >> Workaround: Be sure to umount all filesystems before removing the >> physical device >> Partial fix: Committed to CURRENT (8.0) on/prior to 2008/02/21 >> >> There is ongoing work to fully fix this problem, ETA 2009/02 > > This is a different problem. It was even easier to reproduce > and at least I haven't seen it since the fixes went in. 
> > Fabian From owner-freebsd-fs@FreeBSD.ORG Thu Jan 12 00:15:13 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 4FE881065676 for ; Thu, 12 Jan 2012 00:15:13 +0000 (UTC) (envelope-from jdc@koitsu.dyndns.org) Received: from qmta06.emeryville.ca.mail.comcast.net (qmta06.emeryville.ca.mail.comcast.net [76.96.30.56]) by mx1.freebsd.org (Postfix) with ESMTP id 4DC208FC0A for ; Thu, 12 Jan 2012 00:15:09 +0000 (UTC) Received: from omta23.emeryville.ca.mail.comcast.net ([76.96.30.90]) by qmta06.emeryville.ca.mail.comcast.net with comcast id LQ0a1i00L1wfjNsA6QF9Wl; Thu, 12 Jan 2012 00:15:09 +0000 Received: from koitsu.dyndns.org ([67.180.84.87]) by omta23.emeryville.ca.mail.comcast.net with comcast id LQF81i01D1t3BNj8jQF9TK; Thu, 12 Jan 2012 00:15:09 +0000 Received: by icarus.home.lan (Postfix, from userid 1000) id AE666102C1E; Wed, 11 Jan 2012 16:15:08 -0800 (PST) Date: Wed, 11 Jan 2012 16:15:08 -0800 From: Jeremy Chadwick To: Charles Sprickman Message-ID: <20120112001508.GA50634@icarus.home.lan> References: <20120111154722.000036e4@unknown> <20120111210708.1168781e@fabiankeil.de> <20120111204041.GA47175@icarus.home.lan> <20120111224340.72f28ef4@fabiankeil.de> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-fs@freebsd.org Subject: Re: Unplugging disk under ZFS yield panic X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 12 Jan 2012 00:15:13 -0000 On Wed, Jan 11, 2012 at 06:58:57PM -0500, Charles Sprickman wrote: > > On Jan 11, 2012, at 4:43 PM, Fabian Keil wrote: > > > Jeremy Chadwick wrote: > > > >> On Wed, Jan 11, 2012 at 09:07:08PM +0100, Fabian Keil wrote: > >>> Gergely CZUCZY wrote: > >>> > >>>> 
I'd like to ask, whether it is normal behaviour when we're unplugging a > >>>> disk under a ZFS system then on the first write a kernel panic happened. > >>> > >>> Sounds familiar. I currently have two PRs open for > >>> reproducible kernel panics after a vdev gets lost: > >>> http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/162010 > >>> http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/162036 > >>> > >>> Note that the pool layouts are different, though. > >> > >> Is this problem truly ZFS-specific? I'd been tracking this problem for > >> years, and was told it was fixed: > > > > I'm not saying that my problems are ZFS-specific. > > The backtraces mainly contain geom functions and no ZFS code, > > so ZFS might be the victim here. > > Is there any relation between this issue and the "log_sysevent: > type 19 is not implemented" problem that happens on device > insertion/removal on 8.2? > > http://lists.freebsd.org/pipermail/freebsd-fs/2011-June/011855.html > > I still see that one on 8.2-STABLE from around 6/2011. I initially > thought it was triggered by device failure or removal (even with > proper hotplug support), but I got hit by it last night when > inserting a new drive in a chassis that supports ahci/sata hotplug. > IIRC it's not a ZFS issue, ZFS just gets more spammy (well, > extremely spammy to the point the system can't do much else) about > reporting an issue with a device going away or being inserted. This was fixed in RELENG_8 on 2011/06/14, r222343: http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/cddl/compat/opensolaris/kern/opensolaris_sysevent.c#rev1.2.2.3 The code that would print that message is #if 0'd out now. I can confirm the message is gone on all our systems running recent RELENG_8. -- | Jeremy Chadwick jdc at parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, US | | Making life hard for others since 1977. 
PGP 4BD6C0CB | From owner-freebsd-fs@FreeBSD.ORG Thu Jan 12 00:25:57 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 8A7A41065672 for ; Thu, 12 Jan 2012 00:25:57 +0000 (UTC) (envelope-from spork@bway.net) Received: from xena.bway.net (xena.bway.net [216.220.96.26]) by mx1.freebsd.org (Postfix) with ESMTP id 3E3388FC08 for ; Thu, 12 Jan 2012 00:25:57 +0000 (UTC) Received: (qmail 2072 invoked by uid 0); 12 Jan 2012 00:25:56 -0000 Received: from smtp.bway.net (216.220.96.25) by xena.bway.net with (DHE-RSA-AES256-SHA encrypted) SMTP; 12 Jan 2012 00:25:56 -0000 Received: (qmail 2061 invoked by uid 90); 12 Jan 2012 00:25:56 -0000 Received: from unknown (HELO ?10.3.2.40?) (spork@96.57.144.66) by smtp.bway.net with (AES128-SHA encrypted) SMTP; 12 Jan 2012 00:25:56 -0000 Mime-Version: 1.0 (Apple Message framework v1084) Content-Type: text/plain; charset=us-ascii From: Charles Sprickman In-Reply-To: <20120112001508.GA50634@icarus.home.lan> Date: Wed, 11 Jan 2012 19:25:55 -0500 Content-Transfer-Encoding: quoted-printable Message-Id: <4F04A2EC-67BE-401C-BF47-8614FAED07E1@bway.net> References: <20120111154722.000036e4@unknown> <20120111210708.1168781e@fabiankeil.de> <20120111204041.GA47175@icarus.home.lan> <20120111224340.72f28ef4@fabiankeil.de> <20120112001508.GA50634@icarus.home.lan> To: Jeremy Chadwick X-Mailer: Apple Mail (2.1084) Cc: freebsd-fs@freebsd.org Subject: Re: Unplugging disk under ZFS yield panic X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 12 Jan 2012 00:25:57 -0000 On Jan 11, 2012, at 7:15 PM, Jeremy Chadwick wrote: > On Wed, Jan 11, 2012 at 06:58:57PM -0500, Charles Sprickman wrote: >> >> On Jan 11, 2012, at 4:43 PM, Fabian Keil wrote: >> >>> Jeremy Chadwick wrote: >>> >>>> On
Wed, Jan 11, 2012 at 09:07:08PM +0100, Fabian Keil wrote: >>>>> Gergely CZUCZY wrote: >>>>> >>>>>> I'd like to ask, whether it is normal behaviour when we're unplugging a >>>>>> disk under a ZFS system then on the first write a kernel panic happened. >>>>> >>>>> Sounds familiar. I currently have two PRs open for >>>>> reproducible kernel panics after a vdev gets lost: >>>>> http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/162010 >>>>> http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/162036 >>>>> >>>>> Note that the pool layouts are different, though. >>>> >>>> Is this problem truly ZFS-specific? I'd been tracking this problem for >>>> years, and was told it was fixed: >>> >>> I'm not saying that my problems are ZFS-specific. >>> The backtraces mainly contain geom functions and no ZFS code, >>> so ZFS might be the victim here. >> >> Is there any relation between this issue and the "log_sysevent: >> type 19 is not implemented" problem that happens on device >> insertion/removal on 8.2? >> >> http://lists.freebsd.org/pipermail/freebsd-fs/2011-June/011855.html >> >> I still see that one on 8.2-STABLE from around 6/2011. I initially >> thought it was triggered by device failure or removal (even with >> proper hotplug support), but I got hit by it last night when >> inserting a new drive in a chassis that supports ahci/sata hotplug. >> IIRC it's not a ZFS issue, ZFS just gets more spammy (well, >> extremely spammy to the point the system can't do much else) about >> reporting an issue with a device going away or being inserted. > > This was fixed in RELENG_8 on 2011/06/14, r222343: > > http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/cddl/compat/opensolaris/kern/opensolaris_sysevent.c#rev1.2.2.3 > > The code that would print that message is #if 0'd out now. I can > confirm the message is gone on all our systems running recent RELENG_8.
Wow, how unlucky: FreeBSD 8.2-STABLE (BL8-64) #0 r222897: Sun Jun 12 16:35:52 EDT 2011 So that's totally cosmetic? There was no underlying GEOM issue? Sorry for the noise on that then... Charles > > -- > | Jeremy Chadwick jdc at parodius.com | > | Parodius Networking http://www.parodius.com/ | > | UNIX Systems Administrator Mountain View, CA, US | > | Making life hard for others since 1977. PGP 4BD6C0CB | > From owner-freebsd-fs@FreeBSD.ORG Thu Jan 12 01:46:29 2012 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 2F8351065676 for ; Thu, 12 Jan 2012 01:46:29 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-annu.mail.uoguelph.ca (esa-annu.mail.uoguelph.ca [131.104.91.36]) by mx1.freebsd.org (Postfix) with ESMTP id B15DC8FC0C for ; Thu, 12 Jan 2012 01:46:28 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: Ap4EAJw6Dk+DaFvO/2dsb2JhbABDhQ6pAoFyAQEEASNWBRYYAgINGQJZGYd6pjKRVoEvh1KCBoEWBIg6jFKSWg X-IronPort-AV: E=Sophos;i="4.71,495,1320642000"; d="scan'208";a="151713084" Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.206]) by esa-annu-pri.mail.uoguelph.ca with ESMTP; 11 Jan 2012 20:45:51 -0500 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 50F6BB3F36; Wed, 11 Jan 2012 20:45:51 -0500 (EST) Date: Wed, 11 Jan 2012 20:45:51 -0500 (EST) From: Rick Macklem To: mdf@FreeBSD.org Message-ID: <450250386.94820.1326332751322.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.202] X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - FF3.0 (Win)/6.0.10_GA_2692) Cc: "Mikhail T." , fs@freebsd.org Subject: Re: How to best send files over network?
X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 12 Jan 2012 01:46:29 -0000 Matthew Fleming wrote: > On Wed, Jan 11, 2012 at 8:07 AM, Rick Macklem > wrote: > > Mikhail T. wrote: > >> On 10.01.2012 18:49, Bob Friesenhahn wrote: > >> > Don't use mmap on zfs since doing so wastes memory (zfs ARC is > >> > not > >> > coherent with mmap page cache). Instead do normal file I/O (e.g. > >> > write, fwrite) using the filesystem blocksize (e.g. 128K) or a > >> > small > >> > multiple thereof. > >> Well, that was the reason cited for not using sendfile over ZFS. > >> But > >> mmap/write, it was claimed, was efficient. > >> > >> What's the general opinion of using mmap/write, when the file is on > >> UFS? > >> Is it just as good as sendfile, or can sendfile be better under > >> some > >> circumstances? > >> > >> > It is useful to cache several blocks on the receiving end and use > >> > a > >> > thread to receive data from the network in case zfs temporarily > >> > stalls > >> > during write (which it periodically does). > >> No, thanks. I'm certainly not doing a read/write loop -- that's > >> just > >> too > >> disgusting in the age of better interfaces (even if those aren't > >> well > >> implemented yet) :-) > > > > I think Bob was referring to the receive end and not the send end, > > which > > might be why the answer didn't make sense to you? (For the receive > > end, > > it sounds like a good suggestion to me.) > > Some day, when there's time, I'd like to roll up Isilon's recvfile(2) > into a proper patch. It's the obvious analogue to sendfile(2). I > think it also requires two new VOPs, for efficiency, VOP_READ_MBUF > (for sendfile) and VOP_WRITE_MBUF, which use an mbuf chain rather than > uio to do their work. 
> The NFS code has worked around this by allocating a bunch of mbufs and creating an iovec that refers to their data areas, pretty well forever. (Just guessing this would avoid adding the VOP_xxx_MBUF()s?) However, don't mistakenly interpret this as me volunteering to do this:-) rick > Perhaps at BSDCan, if this is my priority at the time. Or, if someone > else wants to work on it I can provide patches that will require work. > > Thanks, > matthew From owner-freebsd-fs@FreeBSD.ORG Thu Jan 12 08:09:44 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 7FB641065670 for ; Thu, 12 Jan 2012 08:09:44 +0000 (UTC) (envelope-from peter.maloney@brockmann-consult.de) Received: from moutng.kundenserver.de (moutng.kundenserver.de [212.227.126.186]) by mx1.freebsd.org (Postfix) with ESMTP id 0D7A98FC08 for ; Thu, 12 Jan 2012 08:09:43 +0000 (UTC) Received: from [10.3.0.26] ([141.4.215.32]) by mrelayeu.kundenserver.de (node=mrbap1) with ESMTP (Nemesis) id 0MCwZH-1RtoRx2qDz-0095ll; Thu, 12 Jan 2012 09:09:42 +0100 Message-ID: <4F0E9546.1030405@brockmann-consult.de> Date: Thu, 12 Jan 2012 09:09:42 +0100 From: Peter Maloney User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.23) Gecko/20110922 Thunderbird/3.1.15 MIME-Version: 1.0 To: freebsd-fs@freebsd.org References: <20120111154722.000036e4@unknown> <20120111210708.1168781e@fabiankeil.de> <20120111204041.GA47175@icarus.home.lan> In-Reply-To: <20120111204041.GA47175@icarus.home.lan> X-Enigmail-Version: 1.1.2 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-Provags-ID: V02:K0:oO58sjRLllfCNF/WOZ54vculd8cB7jYV1hBwEMqOTkC MGLryhvVxBvmWu+fKGlNFjkGbyUextABcQ2H36kD9rf4JFMGgj WzvCzR10MWAzWdu9q8EohLubETWSQn/3RFOP6suSWr+OgfAfwN 60gpz+MkOKJxDMEeZCWMsZLKBzngYCA/fwKuBTBHasKqSzKqrn TnWwnXsdL8Wy4JRX7LplSILOKEeBxuavql4i0V1NZHg7YHKwds 45LaspnCxQYHDba2RKpb8tmoCVuaov2mBeMCwuH6vf1ovbFkxf 
uhtarWpAe+ahL6A6/naCuOutQ7dEAg/NE5dwqVPTQ9aToy/1O+ BF1p62JyT5XWM5AbNU0W5TbGQnNqvwvURSVOIzUv8 Subject: Re: Unplugging disk under ZFS yield panic X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 12 Jan 2012 08:09:44 -0000 On 01/11/2012 09:40 PM, Jeremy Chadwick wrote: > On Wed, Jan 11, 2012 at 09:07:08PM +0100, Fabian Keil wrote: >> Gergely CZUCZY wrote: >> >>> I'd like to ask, whether it is normal behaviour when we're unplugging a >>> disk under a ZFS system then on the first write a kernel panic happened. >> Sounds familiar. I currently have two PRs open for >> reproducible kernel panics after a vdev gets lost: >> http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/162010 >> http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/162036 >> >> Note that the pool layouts are different, though. > Is this problem truly ZFS-specific? I'd been tracking this problem for > years, and was told it was fixed: > > http://wiki.freebsd.org/BugBusting/Commonly_reported_issues > > * Panic occurs when a mounted device (USB, SATA, local image file, > etc.) is removed > > Workaround: Be sure to umount all filesystems before removing the > physical device > Partial fix: Committed to CURRENT (8.0) on/prior to 2008/02/21 > > There is ongoing work to fully fix this problem, ETA 2009/02 > > OP, please provide a kernel backtrace. > > Otherwise, if needed, I can go yank one of the two mirrored disks out of > my FreeBSD box at home to try and reproduce the problem. I have pulled root disks (gpt slices), logs (gpt slices), and other disks (whole disk labels) without unmounting, without panic on 8-STABLE. My whole system is pure zfs with no gmirror, multipath, gnop, etc. devices, using the mps or mpslsi driver. I also have an SSD (with bad firmware?) 
that fails horribly when pulled (with SCSI / SMP timeouts instead of "lost device"), but *probably* doesn't affect the rest of the system, until you run "gpart recover" or "camcontrol reset ..." on the device, and then you get a panic. (I think mpslsi handles the bad SSD slightly better... sometimes recovering, and never hanging unless all root disks are gone, but not too sure; no difference in panics caused by "gpart recover" or "camcontrol reset ..." between the 2 drivers.) However, in my experience, when a log with no redundancy is pulled without first doing "zpool remove ", the pool is marked FAULTED, and does not run (unlike when DEGRADED) until you run "zpool clear " (discarding the log data, possibly losing some files) or put the disk back in. Since your root pool has the log, your root pool would then be FAULTED. And any time your root disk is gone, FreeBSD seems to quickly panic. Could that be related to what you did? You said you pulled one of the multipath data disks though, not the log. I would try the same test with the log removed, or in a zfs mirror of slices/disks (instead of gmirror devices / whatever it is now). >>> On a device removal we're expecting it to moving to the spare disk, or >>> using the available redundant disks. >> I agree that this behaviour would be preferable to a panic. I agree as well. Even if the root disk is lost. If the root is gone in Linux (didn't try with mirrors), it just remounts the system read-only (which can be disabled) and runs in an unpredictable state... maybe ssh works, maybe web works, etc. and then when the disk comes back, it is just like a read-only file system until you remount it. -- -------------------------------------------- Peter Maloney Brockmann Consult Max-Planck-Str.
2 21502 Geesthacht Germany Tel: +49 4152 889 300 Fax: +49 4152 889 333 E-mail: peter.maloney@brockmann-consult.de Internet: http://www.brockmann-consult.de -------------------------------------------- From owner-freebsd-fs@FreeBSD.ORG Thu Jan 12 10:06:34 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id BD089106566C for ; Thu, 12 Jan 2012 10:06:34 +0000 (UTC) (envelope-from joh.hendriks@gmail.com) Received: from mail-ee0-f54.google.com (mail-ee0-f54.google.com [74.125.83.54]) by mx1.freebsd.org (Postfix) with ESMTP id 479FF8FC0C for ; Thu, 12 Jan 2012 10:06:33 +0000 (UTC) Received: by eeke53 with SMTP id e53so127770eek.13 for ; Thu, 12 Jan 2012 02:06:32 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=message-id:date:from:user-agent:mime-version:to:subject :content-type; bh=Qkn+g2lQ0JdqmbXIhz08XeZ48c6hOROOtjazT37AILg=; b=f9Jz3CQebujss+zhc01VLQYshBUyyh2u4+UJtoigT5LmaDjeyaYGKxFQCk5ztioI1U Q0E2h1yrvI6j2a5g8iILaj6F9pUrcCW6KzAAPdi0tFPga7nWZcxBHTJu4xuDdeZjawjZ DHb7zVIy+Hq+RSeUdRqcLdkrVktQSxhH19VS8= Received: by 10.213.15.209 with SMTP id l17mr689756eba.11.1326362792287; Thu, 12 Jan 2012 02:06:32 -0800 (PST) Received: from [192.168.1.129] (schavemaker.nl. 
[213.84.84.186]) by mx.google.com with ESMTPS id 76sm17669835eeh.0.2012.01.12.02.06.31 (version=SSLv3 cipher=OTHER); Thu, 12 Jan 2012 02:06:31 -0800 (PST) Message-ID: <4F0EB0A6.9040009@gmail.com> Date: Thu, 12 Jan 2012 11:06:30 +0100 From: Johan Hendriks User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:9.0) Gecko/20111222 Thunderbird/9.0.1 MIME-Version: 1.0 To: freebsd-fs@freebsd.org Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Subject: wiki adjustment regarding FreeBSD ZFS spares X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 12 Jan 2012 10:06:34 -0000 On the FreeBSD wiki page about ZFS, there is no notice that a spare on FreeBSD is not hot, and that human intervention is needed. http://wiki.freebsd.org/ZFS On the page there is a link called Known problems / gotchas for ZFS http://wiki.freebsd.org/ZFSKnownProblems I think the cold spare is a real ZFS gotcha on FreeBSD. In my opinion it would be a good idea to add a warning about spares to that page. On the forum and on this list, several people have assumed that a spare added to the pool is hot. If you read the ZFS admin guide from Solaris, it is hot. The zpool man page also tells you it is hot! And zpool accepts a spare without any warning, so people get a false sense of security from using a spare. Just my thoughts.
regards Johan Hendriks From owner-freebsd-fs@FreeBSD.ORG Fri Jan 13 00:24:55 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 711CB106564A; Fri, 13 Jan 2012 00:24:55 +0000 (UTC) (envelope-from mckusick@mckusick.com) Received: from chez.mckusick.com (unknown [IPv6:2001:5a8:4:7e72:4a5b:39ff:fe12:452]) by mx1.freebsd.org (Postfix) with ESMTP id 52D478FC0A; Fri, 13 Jan 2012 00:24:55 +0000 (UTC) Received: from chez.mckusick.com (localhost [127.0.0.1]) by chez.mckusick.com (8.14.3/8.14.3) with ESMTP id q0D0OflU055874; Thu, 12 Jan 2012 16:24:41 -0800 (PST) (envelope-from mckusick@chez.mckusick.com) Message-Id: <201201130024.q0D0OflU055874@chez.mckusick.com> To: freebsd-fs@freebsd.org In-reply-to: <20120111103039.d342aef4.lists@yamagi.org> Date: Thu, 12 Jan 2012 16:24:41 -0800 From: Kirk McKusick X-Spam-Status: No, score=0.0 required=5.0 tests=MISSING_MID, UNPARSEABLE_RELAY autolearn=failed version=3.2.5 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on chez.mckusick.com Cc: bryce@bryce.net, Gautam Mani Subject: Re: FS hang when creating snapshots on a UFS SU+J setup X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 13 Jan 2012 00:24:55 -0000 Hi all, With a gentle reminder from Peter Holm that he has a test that demonstrates this problem and the additional examples provided by several of you, I can now reliably reproduce the problem. Having spent a day trying to get to the bottom of it, I have decided that it is a `hard problem' which is going to require some time to figure out. I am hopeful that Jeff will come up with some time to help out here, but know that he is rather busy on other projects at the moment. 
My conclusion is that, as an interim solution, we should disable the ability to request snapshots on filesystems that are using SU+J. Specifically, if a snapshot is attempted, the request would fail with a message reporting that snapshots are currently supported only on filesystems running with SU. I think that this would be far preferable to the current situation. Does this seem like a reasonable approach? Kirk McKusick