From owner-freebsd-fs@FreeBSD.ORG Mon Jul 1 11:06:46 2013 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 4B4AB5EE for ; Mon, 1 Jul 2013 11:06:46 +0000 (UTC) (envelope-from owner-bugmaster@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:1900:2254:206c::16:87]) by mx1.freebsd.org (Postfix) with ESMTP id 2D47C10D4 for ; Mon, 1 Jul 2013 11:06:46 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.7/8.14.7) with ESMTP id r61B6kWI085753 for ; Mon, 1 Jul 2013 11:06:46 GMT (envelope-from owner-bugmaster@FreeBSD.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.7/8.14.7/Submit) id r61B6jw1085751 for freebsd-fs@FreeBSD.org; Mon, 1 Jul 2013 11:06:45 GMT (envelope-from owner-bugmaster@FreeBSD.org) Date: Mon, 1 Jul 2013 11:06:45 GMT Message-Id: <201307011106.r61B6jw1085751@freefall.freebsd.org> X-Authentication-Warning: freefall.freebsd.org: gnats set sender to owner-bugmaster@FreeBSD.org using -f From: FreeBSD bugmaster To: freebsd-fs@FreeBSD.org Subject: Current problem reports assigned to freebsd-fs@FreeBSD.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 01 Jul 2013 11:06:46 -0000

Note: to view an individual PR, use:
  http://www.freebsd.org/cgi/query-pr.cgi?pr=(number).

The following is a listing of current problems submitted by FreeBSD
users. These represent problem reports covering all versions including
experimental development code and obsolete releases.

S Tracker      Resp.      Description
--------------------------------------------------------------------------------
o kern/178854 fs [ufs] FreeBSD kernel crash in UFS
o kern/178713 fs [nfs] [patch] Correct WebNFS support in NFS server and
o kern/178412 fs [smbfs] Coredump when smbfs mounted
o kern/178388 fs [zfs] [patch] allow up to 8MB recordsize
o kern/178349 fs [zfs] zfs scrub on deduped data could be much less see
o kern/178329 fs [zfs] extended attributes leak
o kern/178238 fs [nullfs] nullfs don't release i-nodes on unlink.
f kern/178231 fs [nfs] 8.3 nfsv4 client reports "nfsv4 client/server pr
o kern/178103 fs [kernel] [nfs] [patch] Correct support of index files
o kern/177985 fs [zfs] disk usage problem when copying from one zfs dat
o kern/177971 fs [nfs] FreeBSD 9.1 nfs client dirlist problem w/ nfsv3,
o kern/177966 fs [zfs] resilver completes but subsequent scrub reports
o kern/177658 fs [ufs] FreeBSD panics after get full filesystem with uf
o kern/177536 fs [zfs] zfs livelock (deadlock) with high write-to-disk
o kern/177445 fs [hast] HAST panic
o kern/177240 fs [zfs] zpool import failed with state UNAVAIL but all d
o kern/176978 fs [zfs] [panic] zfs send -D causes "panic: System call i
o kern/176857 fs [softupdates] [panic] 9.1-RELEASE/amd64/GENERIC panic
o bin/176253 fs zpool(8): zfs pool indentation is misleading/wrong
o kern/176141 fs [zfs] sharesmb=on makes errors for sharenfs, and still
o kern/175950 fs [zfs] Possible deadlock in zfs after long uptime
o kern/175897 fs [zfs] operations on readonly zpool hang
o kern/175179 fs [zfs] ZFS may attach wrong device on move
o kern/175071 fs [ufs] [panic] softdep_deallocate_dependencies: unrecov
o kern/174372 fs [zfs] Pagefault appears to be related to ZFS
o kern/174315 fs [zfs] chflags uchg not supported
o kern/174310 fs [zfs] root point mounting broken on CURRENT with multi
o kern/174279 fs [ufs] UFS2-SU+J journal and filesystem corruption
o kern/173830 fs [zfs] Brain-dead simple change to ZFS error descriptio
o kern/173718 fs [zfs] phantom directory in zraid2 pool
f kern/173657 fs [nfs] strange UID map with nfsuserd
o kern/173363 fs [zfs] [panic] Panic on 'zpool replace' on readonly poo
o kern/173136 fs [unionfs] mounting above the NFS read-only share panic
o kern/172942 fs [smbfs] Unmounting a smb mount when the server became
o kern/172348 fs [unionfs] umount -f of filesystem in use with readonly
o kern/172334 fs [unionfs] unionfs permits recursive union mounts; caus
o kern/171626 fs [tmpfs] tmpfs should be noisier when the requested siz
o kern/171415 fs [zfs] zfs recv fails with "cannot receive incremental
o kern/170945 fs [gpt] disk layout not portable between direct connect
o bin/170778 fs [zfs] [panic] FreeBSD panics randomly
o kern/170680 fs [nfs] Multiple NFS Client bug in the FreeBSD 7.4-RELEA
o kern/170497 fs [xfs][panic] kernel will panic whenever I ls a mounted
o kern/169945 fs [zfs] [panic] Kernel panic while importing zpool (afte
o kern/169480 fs [zfs] ZFS stalls on heavy I/O
o kern/169398 fs [zfs] Can't remove file with permanent error
o kern/169339 fs panic while " : > /etc/123"
o kern/169319 fs [zfs] zfs resilver can't complete
o kern/168947 fs [nfs] [zfs] .zfs/snapshot directory is messed up when
o kern/168942 fs [nfs] [hang] nfsd hangs after being restarted (not -HU
o kern/168158 fs [zfs] incorrect parsing of sharenfs options in zfs (fs
o kern/167979 fs [ufs] DIOCGDINFO ioctl does not work on 8.2 file syste
o kern/167977 fs [smbfs] mount_smbfs results are differ when utf-8 or U
o kern/167688 fs [fusefs] Incorrect signal handling with direct_io
o kern/167685 fs [zfs] ZFS on USB drive prevents shutdown / reboot
o kern/167612 fs [portalfs] The portal file system gets stuck inside po
o kern/167272 fs [zfs] ZFS Disks reordering causes ZFS to pick the wron
o kern/167260 fs [msdosfs] msdosfs disk was mounted the second time whe
o kern/167109 fs [zfs] [panic] zfs diff kernel panic Fatal trap 9: gene
o kern/167105 fs [nfs] mount_nfs can not handle source exports wiht mor
o kern/167067 fs [zfs] [panic] ZFS panics the server
o kern/167065 fs [zfs] boot fails when a spare is the boot disk
o kern/167048 fs [nfs] [patch] RELEASE-9 crash when using ZFS+NULLFS+NF
o kern/166912 fs [ufs] [panic] Panic after converting Softupdates to jo
o kern/166851 fs [zfs] [hang] Copying directory from the mounted UFS di
o kern/166477 fs [nfs] NFS data corruption.
o kern/165950 fs [ffs] SU+J and fsck problem
o kern/165521 fs [zfs] [hang] livelock on 1 Gig of RAM with zfs when 31
o kern/165392 fs Multiple mkdir/rmdir fails with errno 31
o kern/165087 fs [unionfs] lock violation in unionfs
o kern/164472 fs [ufs] fsck -B panics on particular data inconsistency
o kern/164370 fs [zfs] zfs destroy for snapshot fails on i386 and sparc
o kern/164261 fs [nullfs] [patch] fix panic with NFS served from NULLFS
o kern/164256 fs [zfs] device entry for volume is not created after zfs
o kern/164184 fs [ufs] [panic] Kernel panic with ufs_makeinode
o kern/163801 fs [md] [request] allow mfsBSD legacy installed in 'swap'
o kern/163770 fs [zfs] [hang] LOR between zfs&syncer + vnlru leading to
o kern/163501 fs [nfs] NFS exporting a dir and a subdir in that dir to
o kern/162944 fs [coda] Coda file system module looks broken in 9.0
o kern/162860 fs [zfs] Cannot share ZFS filesystem to hosts with a hyph
o kern/162751 fs [zfs] [panic] kernel panics during file operations
o kern/162591 fs [nullfs] cross-filesystem nullfs does not work as expe
o kern/162519 fs [zfs] "zpool import" relies on buggy realpath() behavi
o kern/161968 fs [zfs] [hang] renaming snapshot with -r including a zvo
o kern/161864 fs [ufs] removing journaling from UFS partition fails on
o kern/161579 fs [smbfs] FreeBSD sometimes panics when an smb share is
o kern/161533 fs [zfs] [panic] zfs receive panic: system ioctl returnin
o kern/161438 fs [zfs] [panic] recursed on non-recursive spa_namespace_
o kern/161424 fs [nullfs] __getcwd() calls fail when used on nullfs mou
o kern/161280 fs [zfs] Stack overflow in gptzfsboot
o kern/161205 fs [nfs] [pfsync] [regression] [build] Bug report freebsd
o kern/161169 fs [zfs] [panic] ZFS causes kernel panic in dbuf_dirty
o kern/161112 fs [ufs] [lor] filesystem LOR in FreeBSD 9.0-BETA3
o kern/160893 fs [zfs] [panic] 9.0-BETA2 kernel panic
f kern/160860 fs [ufs] Random UFS root filesystem corruption with SU+J
o kern/160801 fs [zfs] zfsboot on 8.2-RELEASE fails to boot from root-o
o kern/160790 fs [fusefs] [panic] VPUTX: negative ref count with FUSE
o kern/160777 fs [zfs] [hang] RAID-Z3 causes fatal hang upon scrub/impo
o kern/160706 fs [zfs] zfs bootloader fails when a non-root vdev exists
o kern/160591 fs [zfs] Fail to boot on zfs root with degraded raidz2 [r
o kern/160410 fs [smbfs] [hang] smbfs hangs when transferring large fil
o kern/160283 fs [zfs] [patch] 'zfs list' does abort in make_dataset_ha
o kern/159930 fs [ufs] [panic] kernel core
o kern/159402 fs [zfs][loader] symlinks cause I/O errors
o kern/159357 fs [zfs] ZFS MAXNAMELEN macro has confusing name (off-by-
o kern/159356 fs [zfs] [patch] ZFS NAME_ERR_DISKLIKE check is Solaris-s
o kern/159351 fs [nfs] [patch] - divide by zero in mountnfs()
o kern/159251 fs [zfs] [request]: add FLETCHER4 as DEDUP hash option
o kern/159077 fs [zfs] Can't cd .. with latest zfs version
o kern/159048 fs [smbfs] smb mount corrupts large files
o kern/159045 fs [zfs] [hang] ZFS scrub freezes system
o kern/158839 fs [zfs] ZFS Bootloader Fails if there is a Dead Disk
o kern/158802 fs amd(8) ICMP storm and unkillable process.
o kern/158231 fs [nullfs] panic on unmounting nullfs mounted over ufs o
f kern/157929 fs [nfs] NFS slow read
o kern/157399 fs [zfs] trouble with: mdconfig force delete && zfs strip
o kern/157179 fs [zfs] zfs/dbuf.c: panic: solaris assert: arc_buf_remov
o kern/156797 fs [zfs] [panic] Double panic with FreeBSD 9-CURRENT and
o kern/156781 fs [zfs] zfs is losing the snapshot directory,
p kern/156545 fs [ufs] mv could break UFS on SMP systems
o kern/156193 fs [ufs] [hang] UFS snapshot hangs && deadlocks processes
o kern/156039 fs [nullfs] [unionfs] nullfs + unionfs do not compose, re
o kern/155615 fs [zfs] zfs v28 broken on sparc64 -current
o kern/155587 fs [zfs] [panic] kernel panic with zfs
p kern/155411 fs [regression] [8.2-release] [tmpfs]: mount: tmpfs : No
o kern/155199 fs [ext2fs] ext3fs mounted as ext2fs gives I/O errors
o bin/155104 fs [zfs][patch] use /dev prefix by default when importing
o kern/154930 fs [zfs] cannot delete/unlink file from full volume -> EN
o kern/154828 fs [msdosfs] Unable to create directories on external USB
o kern/154491 fs [smbfs] smb_co_lock: recursive lock for object 1
p kern/154228 fs [md] md getting stuck in wdrain state
o kern/153996 fs [zfs] zfs root mount error while kernel is not located
o kern/153753 fs [zfs] ZFS v15 - grammatical error when attempting to u
o kern/153716 fs [zfs] zpool scrub time remaining is incorrect
o kern/153695 fs [patch] [zfs] Booting from zpool created on 4k-sector
o kern/153680 fs [xfs] 8.1 failing to mount XFS partitions
o kern/153418 fs [zfs] [panic] Kernel Panic occurred writing to zfs vol
o kern/153351 fs [zfs] locking directories/files in ZFS
o bin/153258 fs [patch][zfs] creating ZVOLs requires `refreservation'
s kern/153173 fs [zfs] booting from a gzip-compressed dataset doesn't w
o bin/153142 fs [zfs] ls -l outputs `ls: ./.zfs: Operation not support
o kern/153126 fs [zfs] vdev failure, zpool=peegel type=vdev.too_small
o kern/152022 fs [nfs] nfs service hangs with linux client [regression]
o kern/151942 fs [zfs] panic during ls(1) zfs snapshot directory
o kern/151905 fs [zfs] page fault under load in /sbin/zfs
o bin/151713 fs [patch] Bug in growfs(8) with respect to 32-bit overfl
o kern/151648 fs [zfs] disk wait bug
o kern/151629 fs [fs] [patch] Skip empty directory entries during name
o kern/151330 fs [zfs] will unshare all zfs filesystem after execute a
o kern/151326 fs [nfs] nfs exports fail if netgroups contain duplicate
o kern/151251 fs [ufs] Can not create files on filesystem with heavy us
o kern/151226 fs [zfs] can't delete zfs snapshot
o kern/150503 fs [zfs] ZFS disks are UNAVAIL and corrupted after reboot
o kern/150501 fs [zfs] ZFS vdev failure vdev.bad_label on amd64
o kern/150390 fs [zfs] zfs deadlock when arcmsr reports drive faulted
o kern/150336 fs [nfs] mountd/nfsd became confused; refused to reload n
o kern/149208 fs mksnap_ffs(8) hang/deadlock
o kern/149173 fs [patch] [zfs] make OpenSolaris installa
o kern/149015 fs [zfs] [patch] misc fixes for ZFS code to build on Glib
o kern/149014 fs [zfs] [patch] declarations in ZFS libraries/utilities
o kern/149013 fs [zfs] [patch] make ZFS makefiles use the libraries fro
o kern/148504 fs [zfs] ZFS' zpool does not allow replacing drives to be
o kern/148490 fs [zfs]: zpool attach - resilver bidirectionally, and re
o kern/148368 fs [zfs] ZFS hanging forever on 8.1-PRERELEASE
o kern/148138 fs [zfs] zfs raidz pool commands freeze
o kern/147903 fs [zfs] [panic] Kernel panics on faulty zfs device
o kern/147881 fs [zfs] [patch] ZFS "sharenfs" doesn't allow different "
o kern/147420 fs [ufs] [panic] ufs_dirbad, nullfs, jail panic (corrupt
o kern/146941 fs [zfs] [panic] Kernel Double Fault - Happens constantly
o kern/146786 fs [zfs] zpool import hangs with checksum errors
o kern/146708 fs [ufs] [panic] Kernel panic in softdep_disk_write_compl
o kern/146528 fs [zfs] Severe memory leak in ZFS on i386
o kern/146502 fs [nfs] FreeBSD 8 NFS Client Connection to Server
s kern/145712 fs [zfs] cannot offline two drives in a raidz2 configurat
o kern/145411 fs [xfs] [panic] Kernel panics shortly after mounting an
f bin/145309 fs bsdlabel: Editing disk label invalidates the whole dev
o kern/145272 fs [zfs] [panic] Panic during boot when accessing zfs on
o kern/145246 fs [ufs] dirhash in 7.3 gratuitously frees hashes when it
o kern/145238 fs [zfs] [panic] kernel panic on zpool clear tank
o kern/145229 fs [zfs] Vast differences in ZFS ARC behavior between 8.0
o kern/145189 fs [nfs] nfsd performs abysmally under load
o kern/144929 fs [ufs] [lor] vfs_bio.c + ufs_dirhash.c
p kern/144447 fs [zfs] sharenfs fsunshare() & fsshare_main() non functi
o kern/144416 fs [panic] Kernel panic on online filesystem optimization
s kern/144415 fs [zfs] [panic] kernel panics on boot after zfs crash
o kern/144234 fs [zfs] Cannot boot machine with recent gptzfsboot code
o kern/143825 fs [nfs] [panic] Kernel panic on NFS client
o bin/143572 fs [zfs] zpool(1): [patch] The verbose output from iostat
o kern/143212 fs [nfs] NFSv4 client strange work ...
o kern/143184 fs [zfs] [lor] zfs/bufwait LOR
o kern/142878 fs [zfs] [vfs] lock order reversal
o kern/142597 fs [ext2fs] ext2fs does not work on filesystems with real
o kern/142489 fs [zfs] [lor] allproc/zfs LOR
o kern/142466 fs Update 7.2 -> 8.0 on Raid 1 ends with screwed raid [re
o kern/142306 fs [zfs] [panic] ZFS drive (from OSX Leopard) causes two
o kern/142068 fs [ufs] BSD labels are got deleted spontaneously
o kern/141897 fs [msdosfs] [panic] Kernel panic. msdofs: file name leng
o kern/141463 fs [nfs] [panic] Frequent kernel panics after upgrade fro
o kern/141305 fs [zfs] FreeBSD ZFS+sendfile severe performance issues (
o kern/141091 fs [patch] [nullfs] fix panics with DIAGNOSTIC enabled
o kern/141086 fs [nfs] [panic] panic("nfs: bioread, not dir") on FreeBS
o kern/141010 fs [zfs] "zfs scrub" fails when backed by files in UFS2
o kern/140888 fs [zfs] boot fail from zfs root while the pool resilveri
o kern/140661 fs [zfs] [patch] /boot/loader fails to work on a GPT/ZFS-
o kern/140640 fs [zfs] snapshot crash
o kern/140068 fs [smbfs] [patch] smbfs does not allow semicolon in file
o kern/139725 fs [zfs] zdb(1) dumps core on i386 when examining zpool c
o kern/139715 fs [zfs] vfs.numvnodes leak on busy zfs
p bin/139651 fs [nfs] mount(8): read-only remount of NFS volume does n
o kern/139407 fs [smbfs] [panic] smb mount causes system crash if remot
o kern/138662 fs [panic] ffs_blkfree: freeing free block
o kern/138421 fs [ufs] [patch] remove UFS label limitations
o kern/138202 fs mount_msdosfs(1) see only 2Gb
o kern/136968 fs [ufs] [lor] ufs/bufwait/ufs (open)
o kern/136945 fs [ufs] [lor] filedesc structure/ufs (poll)
o kern/136944 fs [ffs] [lor] bufwait/snaplk (fsync)
o kern/136873 fs [ntfs] Missing directories/files on NTFS volume
o kern/136865 fs [nfs] [patch] NFS exports atomic and on-the-fly atomic
p kern/136470 fs [nfs] Cannot mount / in read-only, over NFS
o kern/135546 fs [zfs] zfs.ko module doesn't ignore zpool.cache filenam
o kern/135469 fs [ufs] [panic] kernel crash on md operation in ufs_dirb
o kern/135050 fs [zfs] ZFS clears/hides disk errors on reboot
o kern/134491 fs [zfs] Hot spares are rather cold...
o kern/133676 fs [smbfs] [panic] umount -f'ing a vnode-based memory dis
p kern/133174 fs [msdosfs] [patch] msdosfs must support multibyte inter
o kern/132960 fs [ufs] [panic] panic:ffs_blkfree: freeing free frag
o kern/132397 fs reboot causes filesystem corruption (failure to sync b
o kern/132331 fs [ufs] [lor] LOR ufs and syncer
o kern/132237 fs [msdosfs] msdosfs has problems to read MSDOS Floppy
o kern/132145 fs [panic] File System Hard Crashes
o kern/131441 fs [unionfs] [nullfs] unionfs and/or nullfs not combineab
o kern/131360 fs [nfs] poor scaling behavior of the NFS server under lo
o kern/131342 fs [nfs] mounting/unmounting of disks causes NFS to fail
o bin/131341 fs makefs: error "Bad file descriptor" on the mount poin
o kern/130920 fs [msdosfs] cp(1) takes 100% CPU time while copying file
o kern/130210 fs [nullfs] Error by check nullfs
o kern/129760 fs [nfs] after 'umount -f' of a stale NFS share FreeBSD l
o kern/129488 fs [smbfs] Kernel "bug" when using smbfs in smbfs_smb.c:
o kern/129231 fs [ufs] [patch] New UFS mount (norandom) option - mostly
o kern/129152 fs [panic] non-userfriendly panic when trying to mount(8)
o kern/127787 fs [lor] [ufs] Three LORs: vfslock/devfs/vfslock, ufs/vfs
o bin/127270 fs fsck_msdosfs(8) may crash if BytesPerSec is zero
o kern/127029 fs [panic] mount(8): trying to mount a write protected zi
o kern/126287 fs [ufs] [panic] Kernel panics while mounting an UFS file
o kern/125895 fs [ffs] [panic] kernel: panic: ffs_blkfree: freeing free
s kern/125738 fs [zfs] [request] SHA256 acceleration in ZFS
o kern/123939 fs [msdosfs] corrupts new files
o kern/122380 fs [ffs] ffs_valloc:dup alloc (Soekris 4801/7.0/USB Flash
o bin/122172 fs [fs]: amd(8) automount daemon dies on 6.3-STABLE i386,
o bin/121898 fs [nullfs] pwd(1)/getcwd(2) fails with Permission denied
o bin/121072 fs [smbfs] mount_smbfs(8) cannot normally convert the cha
o kern/120483 fs [ntfs] [patch] NTFS filesystem locking changes
o kern/120482 fs [ntfs] [patch] Sync style changes between NetBSD and F
o kern/118912 fs [2tb] disk sizing/geometry problem with large array
o kern/118713 fs [minidump] [patch] Display media size required for a k
o kern/118318 fs [nfs] NFS server hangs under special circumstances
o bin/118249 fs [ufs] mv(1): moving a directory changes its mtime
o kern/118126 fs [nfs] [patch] Poor NFS server write performance
o kern/118107 fs [ntfs] [panic] Kernel panic when accessing a file at N
o kern/117954 fs [ufs] dirhash on very large directories blocks the mac
o bin/117315 fs [smbfs] mount_smbfs(8) and related options can't mount
o kern/117158 fs [zfs] zpool scrub causes panic if geli vdevs detach on
o bin/116980 fs [msdosfs] [patch] mount_msdosfs(8) resets some flags f
o conf/116931 fs lack of fsck_cd9660 prevents mounting iso images with
o kern/116583 fs [ffs] [hang] System freezes for short time when using
o bin/115361 fs [zfs] mount(8) gets into a state where it won't set/un
o kern/114955 fs [cd9660] [patch] [request] support for mask,dirmask,ui
o kern/114847 fs [ntfs] [patch] [request] dirmask support for NTFS ala
o kern/114676 fs [ufs] snapshot creation panics: snapacct_ufs2: bad blo
o bin/114468 fs [patch] [request] add -d option to umount(8) to detach
o kern/113852 fs [smbfs] smbfs does not properly implement DFS referral
o bin/113838 fs [patch] [request] mount(8): add support for relative p
o bin/113049 fs [patch] [request] make quot(8) use getopt(3) and show
o kern/112658 fs [smbfs] [patch] smbfs and caching problems (resolves b
o kern/111843 fs [msdosfs] Long Names of files are incorrectly created
o kern/111782 fs [ufs] dump(8) fails horribly for large filesystems
s bin/111146 fs [2tb] fsck(8) fails on 6T filesystem
o bin/107829 fs [2TB] fdisk(8): invalid boundary checking in fdisk / w
o kern/106107 fs [ufs] left-over fsck_snapshot after unfinished backgro
o kern/104406 fs [ufs] Processes get stuck in "ufs" state under persist
o kern/104133 fs [ext2fs] EXT2FS module corrupts EXT2/3 filesystems
o kern/103035 fs [ntfs] Directories in NTFS mounted disc images appear
o kern/101324 fs [smbfs] smbfs sometimes not case sensitive when it's s
o kern/99290 fs [ntfs] mount_ntfs ignorant of cluster sizes
s bin/97498 fs [request] newfs(8) has no option to clear the first 12
o kern/97377 fs [ntfs] [patch] syntax cleanup for ntfs_ihash.c
o kern/95222 fs [cd9660] File sections on ISO9660 level 3 CDs ignored
o kern/94849 fs [ufs] rename on UFS filesystem is not atomic
o bin/94810 fs fsck(8) incorrectly reports 'file system marked clean'
o kern/94769 fs [ufs] Multiple file deletions on multi-snapshotted fil
o kern/94733 fs [smbfs] smbfs may cause double unlock
o kern/93942 fs [vfs] [patch] panic: ufs_dirbad: bad dir (patch from D
o kern/92272 fs [ffs] [hang] Filling a filesystem while creating a sna
o kern/91134 fs [smbfs] [patch] Preserve access and modification time
a kern/90815 fs [smbfs] [patch] SMBFS with character conversions somet
o kern/88657 fs [smbfs] windows client hang when browsing a samba shar
o kern/88555 fs [panic] ffs_blkfree: freeing free frag on AMD 64
o bin/87966 fs [patch] newfs(8): introduce -A flag for newfs to enabl
o kern/87859 fs [smbfs] System reboot while umount smbfs.
o kern/86587 fs [msdosfs] rm -r /PATH fails with lots of small files
o bin/85494 fs fsck_ffs: unchecked use of cg_inosused macro etc.
o kern/80088 fs [smbfs] Incorrect file time setting on NTFS mounted vi
o bin/74779 fs Background-fsck checks one filesystem twice and omits
o kern/73484 fs [ntfs] Kernel panic when doing `ls` from the client si
o bin/73019 fs [ufs] fsck_ufs(8) cannot alloc 607016868 bytes for ino
o kern/71774 fs [ntfs] NTFS cannot "see" files on a WinXP filesystem
o bin/70600 fs fsck(8) throws files away when it can't grow lost+foun
o kern/68978 fs [panic] [ufs] crashes with failing hard disk, loose po
o kern/65920 fs [nwfs] Mounted Netware filesystem behaves strange
o kern/65901 fs [smbfs] [patch] smbfs fails fsx write/truncate-down/tr
o kern/61503 fs [smbfs] mount_smbfs does not work as non-root
o kern/55617 fs [smbfs] Accessing an nsmb-mounted drive via a smb expo
o kern/51685 fs [hang] Unbounded inode allocation causes kernel to loc
o kern/36566 fs [smbfs] System reboot with dead smb mount and umount
o bin/27687 fs fsck(8) wrapper is not properly passing options to fsc
o kern/18874 fs [2TB] 32bit NFS servers export wrong negative values t

315 problems total.
From owner-freebsd-fs@FreeBSD.ORG Mon Jul 1 23:11:53 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 8110A96B for ; Mon, 1 Jul 2013 23:11:53 +0000 (UTC) (envelope-from rmh.aybabtu@gmail.com) Received: from mail-qc0-x230.google.com (mail-qc0-x230.google.com [IPv6:2607:f8b0:400d:c01::230]) by mx1.freebsd.org (Postfix) with ESMTP id 493EC189F for ; Mon, 1 Jul 2013 23:11:53 +0000 (UTC) Received: by mail-qc0-f176.google.com with SMTP id z10so3286453qcx.35 for ; Mon, 01 Jul 2013 16:11:52 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:date:x-google-sender-auth:message-id:subject :from:to:content-type; bh=fEramCY+0KWSebcitGvu8yPwKYvG2cGH5Nqej0ZRw48=; b=mACHfYdVgG93MhQQqAmjwKHrPJwXN59EB22dEe12ko0xGwiMsBRfl2xdmM9LHm3rFO dNJNE6VnPVPYOzOhAlD0tiis14qxK8FhX0NfIKmb8rtSbxAHkAPu3FR28tZJcqMe4j13 c6esOIZS3vivJqjlETUUJFOg2Hl8UQG/A8cBpueHUNR4B86UfepEAZc3kzlED/G2TbQN vIEF1OiWaPsh2bF6Qj4YFhq631veistuGBq0RsJxjLDNTicNv/moG6ntharQHhjdYQS4 QV1SKsNrKvQ1hl0BwygzpuCV/MunawrflfFoF9HEMogG0LXxaZTRokJpZ9IX8uZYFjTX 6B8g== MIME-Version: 1.0 X-Received: by 10.49.58.70 with SMTP id o6mr34316401qeq.1.1372720312541; Mon, 01 Jul 2013 16:11:52 -0700 (PDT) Sender: rmh.aybabtu@gmail.com Received: by 10.49.26.193 with HTTP; Mon, 1 Jul 2013 16:11:52 -0700 (PDT) Date: Tue, 2 Jul 2013 01:11:52 +0200 X-Google-Sender-Auth: saPBD46NP-OB9p5dbtTC2pg6Q0g Message-ID: Subject: Compatibility options for mount(8) From: Robert Millan To: freebsd-fs@freebsd.org Content-Type: multipart/mixed; boundary=047d7b2e52909ae8eb04e07b5aa9 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 01 Jul 2013 23:11:53 -0000 --047d7b2e52909ae8eb04e07b5aa9 Content-Type: text/plain; charset=UTF-8 Hi, On Debian 
GNU/kFreeBSD, we've been using these bits of glue to make
FreeBSD mount a bit more compatible with the Linux version of mount.

We found that this occasionally helps when porting software that needs
to use those features and relies on Linux semantics:

- Ignore "-n" flag, since it requests not to update /etc/mtab, which
  we never do anyway.

- Map "-o remount" to its FreeBSD equivalent, "-o update".

I'd like to check in the attached patch. Please have a look!

Thanks

--
Robert Millan

--047d7b2e52909ae8eb04e07b5aa9
Content-Type: application/octet-stream; name="mount_cli_compat.diff"
Content-Disposition: attachment; filename="mount_cli_compat.diff"
Content-Transfer-Encoding: base64
X-Attachment-Id: f_hima6x510

SW5kZXg6IHNiaW4vbW91bnQvbW91bnQuYwo9PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09
PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09Ci0tLSBzYmluL21vdW50L21vdW50
LmMJKHJldmlzaW9uIDI1MjQ5MCkKKysrIHNiaW4vbW91bnQvbW91bnQuYwkod29ya2luZyBjb3B5
KQpAQCAtMjUzLDcgKzI1Myw3IEBACiAJb3B0aW9ucyA9IE5VTEw7CiAJdmZzbGlzdCA9IE5VTEw7
CiAJdmZzdHlwZSA9ICJ1ZnMiOwotCXdoaWxlICgoY2ggPSBnZXRvcHQoYXJnYywgYXJndiwgImFk
RjpmTGxvOnBydDp1dnciKSkgIT0gLTEpCisJd2hpbGUgKChjaCA9IGdldG9wdChhcmdjLCBhcmd2
LCAiYWRGOmZMbG86cHJ0OnV2d24iKSkgIT0gLTEpCiAJCXN3aXRjaCAoY2gpIHsKIAkJY2FzZSAn
YSc6CiAJCQlhbGwgPSAxOwpAQCAtMjc0LDYgKzI3NCw5IEBACiAJCWNhc2UgJ2wnOgogCQkJbGF0
ZSA9IDE7CiAJCQlicmVhazsKKwkJY2FzZSAnbic6CisJCQkvKiBGb3IgY29tcGF0aWJpbGl0eSB3
aXRoIHRoZSBMaW51eCB2ZXJzaW9uIG9mIG1vdW50LiAqLworCQkJYnJlYWs7CiAJCWNhc2UgJ28n
OgogCQkJaWYgKCpvcHRhcmcpIHsKIAkJCQlvcHRpb25zID0gY2F0b3B0KG9wdGlvbnMsIG9wdGFy
Zyk7CkBAIC03NzEsNiArNzc0LDExIEBACiAJCQl9IGVsc2UgaWYgKHN0cm5jbXAocCwgZ3JvdXBx
dW90YWVxLAogCQkJICAgIHNpemVvZihncm91cHF1b3RhZXEpIC0gMSkgPT0gMCkgewogCQkJCWNv
bnRpbnVlOworCQkJfSBlbHNlIGlmIChzdHJjbXAocCwgInJlbW91bnQiKSA9PSAwKSB7CisJCQkJ
LyogRm9yIGNvbXBhdGliaWxpdHkgd2l0aCB0aGUgTGludXggdmVyc2lvbiBvZiBtb3VudC4gKi8K
KwkJCQlhcHBlbmRfYXJnKGEsIHN0cmR1cCgiLW8iKSk7CisJCQkJYXBwZW5kX2FyZyhhLCBzdHJk
dXAoInVwZGF0ZSIpKTsKKwkJCQljb250aW51ZTsKIAkJCX0gZWxzZSBpZiAoKnAgPT0gJy0nKSB7 CiAJCQkJYXBwZW5kX2FyZyhhLCBwKTsKIAkJCQlwID0gc3RyY2hyKHAsICc9Jyk7Cg== --047d7b2e52909ae8eb04e07b5aa9-- From owner-freebsd-fs@FreeBSD.ORG Tue Jul 2 00:07:47 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 7A8ED477; Tue, 2 Jul 2013 00:07:47 +0000 (UTC) (envelope-from jdc@koitsu.org) Received: from relay3-d.mail.gandi.net (relay3-d.mail.gandi.net [217.70.183.195]) by mx1.freebsd.org (Postfix) with ESMTP id 3A3F01A88; Tue, 2 Jul 2013 00:07:47 +0000 (UTC) Received: from mfilter24-d.gandi.net (mfilter24-d.gandi.net [217.70.178.152]) by relay3-d.mail.gandi.net (Postfix) with ESMTP id 479F0A80B6; Tue, 2 Jul 2013 02:07:36 +0200 (CEST) X-Virus-Scanned: Debian amavisd-new at mfilter24-d.gandi.net Received: from relay3-d.mail.gandi.net ([217.70.183.195]) by mfilter24-d.gandi.net (mfilter24-d.gandi.net [10.0.15.180]) (amavisd-new, port 10024) with ESMTP id 0hYqt874lplK; Tue, 2 Jul 2013 02:07:34 +0200 (CEST) X-Originating-IP: 76.102.14.35 Received: from jdc.koitsu.org (c-76-102-14-35.hsd1.ca.comcast.net [76.102.14.35]) (Authenticated sender: jdc@koitsu.org) by relay3-d.mail.gandi.net (Postfix) with ESMTPSA id 63B1DA80B4; Tue, 2 Jul 2013 02:07:34 +0200 (CEST) Received: by icarus.home.lan (Postfix, from userid 1000) id 8F41C73A1C; Mon, 1 Jul 2013 17:07:32 -0700 (PDT) Date: Mon, 1 Jul 2013 17:07:32 -0700 From: Jeremy Chadwick To: Robert Millan Subject: Re: Compatibility options for mount(8) Message-ID: <20130702000732.GA72587@icarus.home.lan> References: MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="Q68bSM7Ycu6FN28Q" Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: 
List-Subscribe: , X-List-Received-Date: Tue, 02 Jul 2013 00:07:47 -0000

--Q68bSM7Ycu6FN28Q
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

On Tue, Jul 02, 2013 at 01:11:52AM +0200, Robert Millan wrote:
> Hi,
>
> On Debian GNU/kFreeBSD, we've been using these bits of glue to make
> FreeBSD mount a bit more compatible with the Linux version of mount.
>
> We found that this occasionally helps when porting software that needs
> to use those features and relies on Linux semantics:
>
> - Ignore "-n" flag, since it requests not to update /etc/mtab, which
>   we never do anyway.
>
> - Map "-o remount" to its FreeBSD equivalent, "-o update".
>
> I'd like to check in the attached patch. Please have a look!

These are minor points, but they're well-justified given the quality of
the code:

1. Put "n" in alphabetical order (after "l"), not at the end of the
getopt() string, i.e.:

   while ((ch = getopt(argc, argv, "adF:fLlno:prt:uvw")) != -1)

2. Please use strncmp(). I know other parts of the same code use
strcmp() and those should really be improved at some other time, but
while you're already there you might as well use strncmp() (you'll see
others have done the same), i.e.:

   } else if (strncmp(p, "remount", 7) == 0) {

And finally:

3. The mount(8) man page should reflect these changes (IMO). Attached is
the diff for that; wording based off of some other man pages.

--
| Jeremy Chadwick                                   jdc@koitsu.org |
| UNIX Systems Administrator                http://jdc.koitsu.org/ |
| Making life hard for others since 1977.             PGP 4BD6C0CB |

--Q68bSM7Ycu6FN28Q
Content-Type: text/x-diff; charset=us-ascii
Content-Disposition: attachment; filename="mount.8.diff"

Index: mount.8
===================================================================
--- mount.8	(revision 252457)
+++ mount.8	(working copy)
@@ -106,6 +106,9 @@ a file system mount status from read-write to read
 Also forces the R/W mount of an unclean file
 system (dangerous; use with caution).
+.It Fl n
+For compatibility with some other implementations; this flag is
+currently a no-op.
 .It Fl l
 When used in conjunction with the
 .Fl a
@@ -239,6 +242,10 @@ It is set automatically when the user does not hav
 .It Cm nosymfollow
 Do not follow symlinks on the mounted file system.
+.It Cm remount
+The same as
+.Cm update ;
+for compatibility with some other implementations.
 .It Cm ro
 The same as
 .Fl r ;

--Q68bSM7Ycu6FN28Q--

From owner-freebsd-fs@FreeBSD.ORG Tue Jul 2 03:42:28 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id C61F314B; Tue, 2 Jul 2013 03:42:28 +0000 (UTC) (envelope-from mckusick@mckusick.com) Received: from chez.mckusick.com (chez.mckusick.com [IPv6:2001:5a8:4:7e72:4a5b:39ff:fe12:452]) by mx1.freebsd.org (Postfix) with ESMTP id A75141268; Tue, 2 Jul 2013 03:42:28 +0000 (UTC) Received: from chez.mckusick.com (localhost [127.0.0.1]) by chez.mckusick.com (8.14.3/8.14.3) with ESMTP id r623gOTv012017; Mon, 1 Jul 2013 20:42:24 -0700 (PDT) (envelope-from mckusick@chez.mckusick.com) Message-Id: <201307020342.r623gOTv012017@chez.mckusick.com> To: Robert Millan Subject: Re: Compatibility options for mount(8) In-reply-to: Date: Mon, 01 Jul 2013 20:42:24 -0700 From: Kirk McKusick Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 02 Jul 2013 03:42:28 -0000

> Date: Tue, 2 Jul 2013 01:11:52 +0200
> Subject: Compatibility options for mount(8)
> From: Robert Millan
> To: freebsd-fs@freebsd.org
>
> Hi,
>
> On Debian GNU/kFreeBSD, we've been using these bits of glue to make
> FreeBSD mount a bit more compatible with the Linux version of mount.

Your proposed changes look reasonable to me. Some comments below.
Also, you need to update the manual page for mount(8) to document these two changes, lest someone be surprised or confused. > We found that this occasionally helps when porting software that needs > to use those features and relies on Linux semantics: > > - Ignore "-n" flag, since it requests not to update /etc/mtab, which > we never do anyway. > > - Map "-o remount" to its FreeBSD equivalent, "-o update". It is shorter to remap it to "-u", which is shorthand for "-o update". > I'd like to check in the attached patch. Please have a look! > > Thanks > > -- > Robert Millan > > Index: sbin/mount/mount.c > =================================================================== > --- sbin/mount/mount.c (revision 252490) > +++ sbin/mount/mount.c (working copy) > @@ -253,7 +253,7 @@ > options = NULL; > vfslist = NULL; > vfstype = "ufs"; > - while ((ch = getopt(argc, argv, "adF:fLlo:prt:uvw")) != -1) > + while ((ch = getopt(argc, argv, "adF:fLlo:prt:uvwn")) != -1) Our coding style is to sort the options into alphabetical order, so the "n" option should be between "l" and "o". > switch (ch) { > case 'a': > all = 1; > @@ -274,6 +274,9 @@ > case 'l': > late = 1; > break; > + case 'n': > + /* For compatibility with the Linux version of mount. */ > + break; > case 'o': > if (*optarg) { > options = catopt(options, optarg); > @@ -771,6 +774,11 @@ > } else if (strncmp(p, groupquotaeq, > sizeof(groupquotaeq) - 1) == 0) { > continue; > + } else if (strcmp(p, "remount") == 0) { > + /* For compatibility with the Linux version of mount. */ > + append_arg(a, strdup("-o")); > + append_arg(a, strdup("update")); > + continue; As noted above, I would recommend using "-u". 
> } else if (*p == '-') { > append_arg(a, p); > p = strchr(p, '='); Kirk McKusick From owner-freebsd-fs@FreeBSD.ORG Tue Jul 2 06:04:26 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 46766B80 for ; Tue, 2 Jul 2013 06:04:26 +0000 (UTC) (envelope-from lists@eitanadler.com) Received: from mail-pd0-x231.google.com (mail-pd0-x231.google.com [IPv6:2607:f8b0:400e:c02::231]) by mx1.freebsd.org (Postfix) with ESMTP id 2313817F7 for ; Tue, 2 Jul 2013 06:04:26 +0000 (UTC) Received: by mail-pd0-f177.google.com with SMTP id p10so3248846pdj.8 for ; Mon, 01 Jul 2013 23:04:25 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=eitanadler.com; s=0xdeadbeef; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc:content-type; bh=oa+TTRgirSH5UW9FanCQJgt7pk7nU6M+D9bQgjr9dQg=; b=Nuicn9dlB1/homqMupIiqg7AtCX7VJly2g5jkjygWjg0+XST/6D3nfcsmhDo0rQwLD WTCFvp07UXin512RzQnKAk+vxRXzI50BeiF3xpsuLC4paqhajuCD5dGNgyvGVS8VV3H+ Eytiry7Lhy1UhKyn7dRqQ8CrM8JKwAcQQC6Fo= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc:content-type:x-gm-message-state; bh=oa+TTRgirSH5UW9FanCQJgt7pk7nU6M+D9bQgjr9dQg=; b=KhG4wH4Ckf/sLSJssHXxuySKpBwmw0DcMJ+aqI7q/SdwNbhsUcJ8xbH6yNiUejhA1d 5TLwr3PHIEpIvhnCiiroetnmRMcJHmUSaV92iH5n/MJzc3bjh+zgkPdzk93QYGl0J2CQ 9Sk+kYMcpoMBx20hGkt6ypVITRywidJmeo4bzGuk9leAD8Ja3HE5UaDpUsKFUlkPjvp9 Daor3pViVg5579DJyowst1UmPpUG3gjkUmqT+jTYjrp7rGPlh7j8SJ3ThSKkdJblr+xl cWXR8IY4/itpOtlwuwjNvvSQ99wMHImYvvKPBDdCXVxWrs0qzZi+E4ziMVbZFM/5mwHh SrJg== X-Received: by 10.68.231.200 with SMTP id ti8mr27246217pbc.46.1372745065744; Mon, 01 Jul 2013 23:04:25 -0700 (PDT) MIME-Version: 1.0 Received: by 10.70.45.33 with HTTP; Mon, 1 Jul 2013 23:03:55 -0700 (PDT) In-Reply-To: References: From: Eitan Adler Date: Tue, 2 Jul 2013 08:03:55 +0200 
Message-ID: Subject: Re: Compatibility options for mount(8) To: Robert Millan Content-Type: text/plain; charset=UTF-8 X-Gm-Message-State: ALoCoQlxPevFaBvF7PHq4o7nZDq+MnYYbXrWhZrp1wGJAYE0XyCNXCV2NKo5GFQY4aIwYMx/vIZp Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 02 Jul 2013 06:04:26 -0000 On Tue, Jul 2, 2013 at 1:11 AM, Robert Millan wrote: > I'd like to check in the attached patch. Please have a look! Please make sure to update the man page as well. -- Eitan Adler From owner-freebsd-fs@FreeBSD.ORG Tue Jul 2 22:07:40 2013 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 5FB4CF37 for ; Tue, 2 Jul 2013 22:07:40 +0000 (UTC) (envelope-from kob6558@gmail.com) Received: from mail-oa0-x236.google.com (mail-oa0-x236.google.com [IPv6:2607:f8b0:4003:c02::236]) by mx1.freebsd.org (Postfix) with ESMTP id 31CDA193A for ; Tue, 2 Jul 2013 22:07:40 +0000 (UTC) Received: by mail-oa0-f54.google.com with SMTP id o6so7203467oag.27 for ; Tue, 02 Jul 2013 15:07:39 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:date:x-google-sender-auth:message-id:subject :from:to:cc:content-type; bh=YlA02al/8v8JECynBIECl3Rzuu9NXn9tXhkX+hh4DfI=; b=NqSr0RzQKNPY03SZrp0Z23nIHSMhRO2z0sTF5HaqmcC4yQ2pxC/VGQbcGQ77l0bm67 rVwGjdFWLkfE9MjGLYz79qH0jgjkJk42lAhJnI/BxGvJJRGeLaEVvPv+P7AL1yiA8AJo 3wlwUgioA9AaQbbpgaOHtpEdloe9A/E4GnDJZTGE5+aabiQbeeBw1R5Pii2Iptn12XJy A49MAHfMDQRaBMlgDaigQd0D7RB9qyHyKQADPcpLyCH5uYqoi0H5gnGzJtZzA80CQ1sa NsqqTz4VxMMPEJuV3ssH6XG4agRjo0omyUMc4MN/lw4r/QT/KIJsQhsZgTQo3qu0j0Rr tu3A== MIME-Version: 1.0 X-Received: by 10.182.171.7 with SMTP id aq7mr13017251obc.103.1372802859126; Tue, 02 Jul 2013 15:07:39 -0700 (PDT) Sender: kob6558@gmail.com Received: by 
10.76.112.212 with HTTP; Tue, 2 Jul 2013 15:07:39 -0700 (PDT) Date: Tue, 2 Jul 2013 15:07:39 -0700 X-Google-Sender-Auth: V5e2uS_onrtWK_7E6maON3scDiM Message-ID: Subject: New fusefs implementation not usable with multiple fusefs mounts From: Kevin Oberman To: fs@freebsd.org Content-Type: text/plain; charset=UTF-8 X-Content-Filtered-By: Mailman/MimeDel 2.1.14 Cc: Attilio Rao X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 02 Jul 2013 22:07:40 -0000 I have been using the new fusefs for a while and have had to back it out and go back to the old kernel module. I keep getting corrupted NTFS file systems, and I think I understand why. I mount two NTFS file systems: /dev/fuse 184319948 110625056 73694892 60% /media/Media /dev/fuse 110636028 104943584 5692444 95% /media/Windows7_OS Note that both systems are mounted on /dev/fuse, and I am assured that this is by design. Both work fine for reads and seem to work for writes. Then I unmount either of them. Both are unmounted, at least as far as the OS is concerned. There is no way to unmount one and leave the other mounted. It appears that any attempt to unmount either system does a proper unmount of /media/Media, but, while marking /media/Windows7_OS as unmounted, actually does not do so. The device ends up corrupt and the only way I have been able to clean it is to boot Windows and have a disk check run. Media never seems to get corrupted. Any further information I might gather before filing a PR? I am running on 9.1-stable, but have had the problem since the patch set first became available on 9.0-stable. -- R. 
Kevin Oberman, Network Engineer E-mail: rkoberman@gmail.com From owner-freebsd-fs@FreeBSD.ORG Tue Jul 2 22:20:01 2013 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 38A23286; Tue, 2 Jul 2013 22:20:01 +0000 (UTC) (envelope-from asmrookie@gmail.com) Received: from mail-ie0-x231.google.com (mail-ie0-x231.google.com [IPv6:2607:f8b0:4001:c03::231]) by mx1.freebsd.org (Postfix) with ESMTP id 06AC6199B; Tue, 2 Jul 2013 22:20:00 +0000 (UTC) Received: by mail-ie0-f177.google.com with SMTP id aq17so13696165iec.36 for ; Tue, 02 Jul 2013 15:20:00 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:reply-to:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type; bh=hBGo1+pUXsexSuQaA0Vk2o5xHWkcdJDOw3R6r+pQERQ=; b=wSrBj5CZbBlnqLSbh1CGGWs/i6F4EdUN1PfvBy9dxEuHn85f26udIH4uUUAJD5XHrI xX6oL/6jLdE2Rc/5antCQVmUCy54NQRIFT7sjPkAxC/Xa6Y12y0AiiYsjESpW9Aveowz bbq03J25ngbV5hyik7i9P7E8Wt5RQMkSB4tKcDn9SeN4xj/lk/U8qolOPSx6cC7vmBCC y6+YlEBNcwBtkssbo4WB4QH8STYc/7fg5alMcQCOFQHqBorJ+7hrN8/JidBUEE9ORYgr r7i5c7aKIgSnIw8P/ueCAgI1N2ihqX4WKS6JunTSfI26cx0w59nPFc7fCxIGjkr2PApw FFAQ== MIME-Version: 1.0 X-Received: by 10.42.43.4 with SMTP id v4mr516861ice.109.1372803600779; Tue, 02 Jul 2013 15:20:00 -0700 (PDT) Sender: asmrookie@gmail.com Received: by 10.42.253.129 with HTTP; Tue, 2 Jul 2013 15:20:00 -0700 (PDT) In-Reply-To: References: Date: Wed, 3 Jul 2013 00:20:00 +0200 X-Google-Sender-Auth: 8UrOPadts59ZtxuRGA_gi3Ooeok Message-ID: Subject: Re: New fusefs implementation not usable with multiple fusefs mounts From: Attilio Rao To: Kevin Oberman , George Neville-Neil Content-Type: text/plain; charset=UTF-8 Cc: fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list Reply-To: attilio@FreeBSD.org List-Id: Filesystems List-Unsubscribe: , List-Archive: 
List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 02 Jul 2013 22:20:01 -0000 On Wed, Jul 3, 2013 at 12:07 AM, Kevin Oberman wrote: > I have been using the new fusefs for a while and have had to back it out and > go back to the old kernel module. I keep getting corrupted file NTFS systems > and I think I understand why, > > I mount two NTFS systems: > /dev/fuse 184319948 110625056 73694892 60% /media/Media > /dev/fuse 110636028 104943584 5692444 95% /media/Windows7_OS > > Note that both systems are mounted on /dev/fuse and I am assured that this > is by design. Both work fine for reads and seem to work for writes. Then I > unmount either of them. Both are unmounted, at least as far as the OS is > concerned. There is no way to unmount one and leave the other mounted. It > appears that any attempt to unmount either system does a proper unmount > of > /media/Media, but, while marking /media/Windows7_OS as unmounted, > actually > does not do so. The device ends up corrupt and the only way I have been > able > to clean it is to boot Windows and have a disk check run. Media never > seems > to get corrupted. > > Any further information I might gather before filing a PR? I am running > on > 9.1 stable, but havehad the problem since the patch set first became > available on 9.0-stable. I do not understand; the new fusefs implementation was never committed to the stable branch, to my knowledge. Did you backport it manually? BTW, I cc'ed George, who should maintain the module. Attilio -- Peace can only be achieved by understanding - A. 
Einstein From owner-freebsd-fs@FreeBSD.ORG Tue Jul 2 22:46:22 2013 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 7EEA46D5; Tue, 2 Jul 2013 22:46:22 +0000 (UTC) (envelope-from kob6558@gmail.com) Received: from mail-oa0-x22d.google.com (mail-oa0-x22d.google.com [IPv6:2607:f8b0:4003:c02::22d]) by mx1.freebsd.org (Postfix) with ESMTP id 321221A6F; Tue, 2 Jul 2013 22:46:22 +0000 (UTC) Received: by mail-oa0-f45.google.com with SMTP id j1so7195636oag.32 for ; Tue, 02 Jul 2013 15:46:21 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type; bh=DpXzrOj/Li3xoHoVL/lXjDf6X/wC+lzYZligoTxh4V4=; b=PHL1Kf3bUAiFixF+4rycWKtHKYGjfPkx0CBYhMhfoghzTpSnQrBpdap0q3nuP+y0xZ 4G1YL1Qg004QfDPr2Im/vVuvsrMYKXHh/p9PBVqVkijUYsZvJQJ5NHCfwCLwL/NUL9dR XVP9K4QXiJlNQy0TOGviaPJwWOLnZwTrQb8Z10NwtsVGCYcIiwVms04u7hzSPVdYuAtg BZjsuvW62TS9svLcMdu/40rDtXErHEdoUoa3f4u9DSY5XSpWWxkFadnDw6RmFrOwRN4/ rxUApWE5jk8aKDdGF0/Dhtm/HJhY6avsPWiOnwMWHO2lueDUoGmYq7C44tGqoth8A93z K3jw== MIME-Version: 1.0 X-Received: by 10.60.52.165 with SMTP id u5mr13087444oeo.15.1372805181794; Tue, 02 Jul 2013 15:46:21 -0700 (PDT) Sender: kob6558@gmail.com Received: by 10.76.112.212 with HTTP; Tue, 2 Jul 2013 15:46:21 -0700 (PDT) In-Reply-To: References: Date: Tue, 2 Jul 2013 15:46:21 -0700 X-Google-Sender-Auth: y5wPx6dGHTdnsbkeQlieJ9prw_c Message-ID: Subject: Re: New fusefs implementation not usable with multiple fusefs mounts From: Kevin Oberman To: Attilio Rao Content-Type: text/plain; charset=UTF-8 X-Content-Filtered-By: Mailman/MimeDel 2.1.14 Cc: George Neville-Neil , fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , 
X-List-Received-Date: Tue, 02 Jul 2013 22:46:22 -0000 On Tue, Jul 2, 2013 at 3:20 PM, Attilio Rao wrote: > On Wed, Jul 3, 2013 at 12:07 AM, Kevin Oberman > wrote: > > I have been using the new fusefs for a while and have had to back it out > and > > go back to the old kernel module. I keep getting corrupted file NTFS > systems > > and I think I understand why, > > > > I mount two NTFS systems: > > /dev/fuse 184319948 110625056 73694892 60% /media/Media > > /dev/fuse 110636028 104943584 5692444 95% /media/Windows7_OS > > > > Note that both systems are mounted on /dev/fuse and I am assured that > this > > is by design. Both work fine for reads and seem to work for writes. Then > I > > unmount either of them. Both are unmounted, at least as far as the OS is > > concerned. There is no way to unmount one and leave the other mounted. It > > appears that any attempt to unmount either system does a proper unmount > of > > /media/Media, but, while marking /media/Windows7_OS as unmounted, > actually > > does not do so. The device ends up corrupt and the only way I have been > able > > to clean it is to boot Windows and have a disk check run. Media never > seems > > to get corrupted. > > > > Any further information I might gather before filing a PR? I am running > on > > 9.1 stable, but havehad the problem since the patch set first became > > available on 9.0-stable. > > I do not understand, new fusefs implementation was never committed to > stable branch to my knowledge. > Did you backport manually? > > BTW I cc'ed George which should maintain the module. > > Attilio > Attilio, Actually, you provided the patches for 9-Stable way back when you first did them and we had an exchange on current@ about their use on 9-stable and their operation including the mounts all being on /dev/fuse. I also edited the mount_fuse man pages to clarify the awkward wording of the original (which you didn't write). 
They still apply pretty cleanly, and I continued using them until about 3 weeks ago, when I removed them to test whether they were responsible for the issues I was seeing. Since I got corruption most every time I unmounted the file systems after having written to the Windows one, I am now pretty sure that it does not happen when I use the old kernel module. The analysis of the problem is purely speculation, but fits the behavior. If it is correct, I would expect the same issues to occur with head. Thanks for copying George. I didn't realize that he had taken over the code. I won't bug you about it again. -- R. Kevin Oberman, Network Engineer E-mail: rkoberman@gmail.com From owner-freebsd-fs@FreeBSD.ORG Wed Jul 3 00:56:42 2013 Return-Path: Delivered-To: freebsd-fs@smarthost.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 566106A5; Wed, 3 Jul 2013 00:56:42 +0000 (UTC) (envelope-from linimon@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:1900:2254:206c::16:87]) by mx1.freebsd.org (Postfix) with ESMTP id 31E871F0C; Wed, 3 Jul 2013 00:56:42 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.7/8.14.7) with ESMTP id r630ugA9067396; Wed, 3 Jul 2013 00:56:42 GMT (envelope-from linimon@freefall.freebsd.org) Received: (from linimon@localhost) by freefall.freebsd.org (8.14.7/8.14.7/Submit) id r630ubMW067390; Wed, 3 Jul 2013 00:56:37 GMT (envelope-from linimon) Date: Wed, 3 Jul 2013 00:56:37 GMT Message-Id: <201307030056.r630ubMW067390@freefall.freebsd.org> To: rick@wirelessleiden.nl, linimon@FreeBSD.org, daichi@FreeBSD.org, freebsd-fs@FreeBSD.org From: linimon@FreeBSD.org Subject: Re: kern/121385: [unionfs] unionfs cross mount -> kernel panic X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , 
X-List-Received-Date: Wed, 03 Jul 2013 00:56:42 -0000 Synopsis: [unionfs] unionfs cross mount -> kernel panic State-Changed-From-To: open->open State-Changed-By: linimon State-Changed-When: Wed Jul 3 00:50:32 UTC 2013 State-Changed-Why: commit bit has been taken in for safekeeping. Responsible-Changed-From-To: daichi->freebsd-fs Responsible-Changed-By: linimon Responsible-Changed-When: Wed Jul 3 00:50:32 UTC 2013 Responsible-Changed-Why: http://www.freebsd.org/cgi/query-pr.cgi?pr=121385 From owner-freebsd-fs@FreeBSD.ORG Wed Jul 3 00:56:57 2013 Return-Path: Delivered-To: freebsd-fs@smarthost.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id D57AD715; Wed, 3 Jul 2013 00:56:57 +0000 (UTC) (envelope-from linimon@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:1900:2254:206c::16:87]) by mx1.freebsd.org (Postfix) with ESMTP id AF95B1F10; Wed, 3 Jul 2013 00:56:57 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.7/8.14.7) with ESMTP id r630uvvK067434; Wed, 3 Jul 2013 00:56:57 GMT (envelope-from linimon@freefall.freebsd.org) Received: (from linimon@localhost) by freefall.freebsd.org (8.14.7/8.14.7/Submit) id r630uvZw067433; Wed, 3 Jul 2013 00:56:57 GMT (envelope-from linimon) Date: Wed, 3 Jul 2013 00:56:57 GMT Message-Id: <201307030056.r630uvZw067433@freefall.freebsd.org> To: freebsd-pr@cropcirclesystems.com, linimon@FreeBSD.org, daichi@FreeBSD.org, freebsd-fs@FreeBSD.org From: linimon@FreeBSD.org Subject: Re: bin/123574: [unionfs] df(1) -t option destroys info for unionfs (and maybe other) mounts X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 03 Jul 2013 00:56:57 -0000 Synopsis: [unionfs] df(1) -t option destroys info for unionfs (and maybe other) mounts 
State-Changed-From-To: open->open State-Changed-By: linimon State-Changed-When: Wed Jul 3 00:50:32 UTC 2013 State-Changed-Why: commit bit has been taken in for safekeeping. Responsible-Changed-From-To: daichi->freebsd-fs Responsible-Changed-By: linimon Responsible-Changed-When: Wed Jul 3 00:50:32 UTC 2013 Responsible-Changed-Why: http://www.freebsd.org/cgi/query-pr.cgi?pr=123574 From owner-freebsd-fs@FreeBSD.ORG Wed Jul 3 00:57:24 2013 Return-Path: Delivered-To: freebsd-fs@smarthost.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 39965799; Wed, 3 Jul 2013 00:57:24 +0000 (UTC) (envelope-from linimon@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:1900:2254:206c::16:87]) by mx1.freebsd.org (Postfix) with ESMTP id 139CE1F20; Wed, 3 Jul 2013 00:57:24 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.7/8.14.7) with ESMTP id r630vNFl067479; Wed, 3 Jul 2013 00:57:23 GMT (envelope-from linimon@freefall.freebsd.org) Received: (from linimon@localhost) by freefall.freebsd.org (8.14.7/8.14.7/Submit) id r630vIjA067478; Wed, 3 Jul 2013 00:57:18 GMT (envelope-from linimon) Date: Wed, 3 Jul 2013 00:57:18 GMT Message-Id: <201307030057.r630vIjA067478@freefall.freebsd.org> To: gw.freebsd@tnode.com, linimon@FreeBSD.org, daichi@FreeBSD.org, freebsd-fs@FreeBSD.org From: linimon@FreeBSD.org Subject: Re: kern/126553: [unionfs] unionfs move directory problem 2 (files appear) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 03 Jul 2013 00:57:24 -0000 Synopsis: [unionfs] unionfs move directory problem 2 (files appear) State-Changed-From-To: open->open State-Changed-By: linimon State-Changed-When: Wed Jul 3 00:50:32 UTC 2013 State-Changed-Why: commit bit has been taken in for 
safekeeping. Responsible-Changed-From-To: daichi->freebsd-fs Responsible-Changed-By: linimon Responsible-Changed-When: Wed Jul 3 00:50:32 UTC 2013 Responsible-Changed-Why: http://www.freebsd.org/cgi/query-pr.cgi?pr=126553 From owner-freebsd-fs@FreeBSD.ORG Wed Jul 3 00:59:06 2013 Return-Path: Delivered-To: freebsd-fs@smarthost.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 27DE8930; Wed, 3 Jul 2013 00:59:06 +0000 (UTC) (envelope-from linimon@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:1900:2254:206c::16:87]) by mx1.freebsd.org (Postfix) with ESMTP id F07961F3E; Wed, 3 Jul 2013 00:59:05 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.7/8.14.7) with ESMTP id r630x5SG067575; Wed, 3 Jul 2013 00:59:05 GMT (envelope-from linimon@freefall.freebsd.org) Received: (from linimon@localhost) by freefall.freebsd.org (8.14.7/8.14.7/Submit) id r630x5dP067574; Wed, 3 Jul 2013 00:59:05 GMT (envelope-from linimon) Date: Wed, 3 Jul 2013 00:59:05 GMT Message-Id: <201307030059.r630x5dP067574@freefall.freebsd.org> To: naylor.b.david@gmail.com, linimon@FreeBSD.org, daichi@FreeBSD.org, freebsd-fs@FreeBSD.org From: linimon@FreeBSD.org Subject: Re: kern/126973: [unionfs] [hang] System hang with unionfs and init chroot [regression] X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 03 Jul 2013 00:59:06 -0000 Synopsis: [unionfs] [hang] System hang with unionfs and init chroot [regression] State-Changed-From-To: open->open State-Changed-By: linimon State-Changed-When: Wed Jul 3 00:50:32 UTC 2013 State-Changed-Why: commit bit has been taken in for safekeeping. 
Responsible-Changed-From-To: daichi->freebsd-fs Responsible-Changed-By: linimon Responsible-Changed-When: Wed Jul 3 00:50:32 UTC 2013 Responsible-Changed-Why: http://www.freebsd.org/cgi/query-pr.cgi?pr=126973 From owner-freebsd-fs@FreeBSD.ORG Wed Jul 3 00:59:21 2013 Return-Path: Delivered-To: freebsd-fs@smarthost.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 311359A3; Wed, 3 Jul 2013 00:59:21 +0000 (UTC) (envelope-from linimon@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:1900:2254:206c::16:87]) by mx1.freebsd.org (Postfix) with ESMTP id 0DB1C1F43; Wed, 3 Jul 2013 00:59:21 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.7/8.14.7) with ESMTP id r630xKcB067621; Wed, 3 Jul 2013 00:59:20 GMT (envelope-from linimon@freefall.freebsd.org) Received: (from linimon@localhost) by freefall.freebsd.org (8.14.7/8.14.7/Submit) id r630xKue067620; Wed, 3 Jul 2013 00:59:20 GMT (envelope-from linimon) Date: Wed, 3 Jul 2013 00:59:20 GMT Message-Id: <201307030059.r630xKue067620@freefall.freebsd.org> To: danny@cs.huji.ac.il, linimon@FreeBSD.org, daichi@FreeBSD.org, freebsd-fs@FreeBSD.org From: linimon@FreeBSD.org Subject: Re: kern/137588: [unionfs] [lor] LOR nfs/ufs/nfs X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 03 Jul 2013 00:59:21 -0000 Synopsis: [unionfs] [lor] LOR nfs/ufs/nfs State-Changed-From-To: open->open State-Changed-By: linimon State-Changed-When: Wed Jul 3 00:50:32 UTC 2013 State-Changed-Why: commit bit has been taken in for safekeeping. 
Responsible-Changed-From-To: daichi->freebsd-fs Responsible-Changed-By: linimon Responsible-Changed-When: Wed Jul 3 00:50:32 UTC 2013 Responsible-Changed-Why: http://www.freebsd.org/cgi/query-pr.cgi?pr=137588 From owner-freebsd-fs@FreeBSD.ORG Wed Jul 3 00:59:35 2013 Return-Path: Delivered-To: freebsd-fs@smarthost.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id BE396A1B; Wed, 3 Jul 2013 00:59:35 +0000 (UTC) (envelope-from linimon@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:1900:2254:206c::16:87]) by mx1.freebsd.org (Postfix) with ESMTP id 641F91F4B; Wed, 3 Jul 2013 00:59:35 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.7/8.14.7) with ESMTP id r630xZtZ067659; Wed, 3 Jul 2013 00:59:35 GMT (envelope-from linimon@freefall.freebsd.org) Received: (from linimon@localhost) by freefall.freebsd.org (8.14.7/8.14.7/Submit) id r630xZvr067658; Wed, 3 Jul 2013 00:59:35 GMT (envelope-from linimon) Date: Wed, 3 Jul 2013 00:59:35 GMT Message-Id: <201307030059.r630xZvr067658@freefall.freebsd.org> To: naylor.b.david@gmail.com, linimon@FreeBSD.org, daichi@FreeBSD.org, freebsd-fs@FreeBSD.org From: linimon@FreeBSD.org Subject: Re: kern/141950: [unionfs] [lor] ufs/unionfs/ufs Lock order reversal X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 03 Jul 2013 00:59:35 -0000 Synopsis: [unionfs] [lor] ufs/unionfs/ufs Lock order reversal State-Changed-From-To: open->open State-Changed-By: linimon State-Changed-When: Wed Jul 3 00:50:32 UTC 2013 State-Changed-Why: commit bit has been taken in for safekeeping. 
Responsible-Changed-From-To: daichi->freebsd-fs Responsible-Changed-By: linimon Responsible-Changed-When: Wed Jul 3 00:50:32 UTC 2013 Responsible-Changed-Why: http://www.freebsd.org/cgi/query-pr.cgi?pr=141950 From owner-freebsd-fs@FreeBSD.ORG Wed Jul 3 00:59:51 2013 Return-Path: Delivered-To: freebsd-fs@smarthost.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 2AB3CA95; Wed, 3 Jul 2013 00:59:51 +0000 (UTC) (envelope-from linimon@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:1900:2254:206c::16:87]) by mx1.freebsd.org (Postfix) with ESMTP id 032061F5A; Wed, 3 Jul 2013 00:59:51 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.7/8.14.7) with ESMTP id r630xonu067702; Wed, 3 Jul 2013 00:59:50 GMT (envelope-from linimon@freefall.freebsd.org) Received: (from linimon@localhost) by freefall.freebsd.org (8.14.7/8.14.7/Submit) id r630xorM067701; Wed, 3 Jul 2013 00:59:50 GMT (envelope-from linimon) Date: Wed, 3 Jul 2013 00:59:50 GMT Message-Id: <201307030059.r630xorM067701@freefall.freebsd.org> To: freebsd@omnilan.de, linimon@FreeBSD.org, daichi@FreeBSD.org, freebsd-fs@FreeBSD.org From: linimon@FreeBSD.org Subject: Re: kern/145750: [unionfs] [hang] unionfs locks the machine X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 03 Jul 2013 00:59:51 -0000 Synopsis: [unionfs] [hang] unionfs locks the machine State-Changed-From-To: open->open State-Changed-By: linimon State-Changed-When: Wed Jul 3 00:50:32 UTC 2013 State-Changed-Why: commit bit has been taken in for safekeeping. 
Responsible-Changed-From-To: daichi->freebsd-fs Responsible-Changed-By: linimon Responsible-Changed-When: Wed Jul 3 00:50:32 UTC 2013 Responsible-Changed-Why: http://www.freebsd.org/cgi/query-pr.cgi?pr=145750 From owner-freebsd-fs@FreeBSD.ORG Wed Jul 3 01:00:05 2013 Return-Path: Delivered-To: freebsd-fs@smarthost.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id D4B24B0E; Wed, 3 Jul 2013 01:00:05 +0000 (UTC) (envelope-from linimon@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:1900:2254:206c::16:87]) by mx1.freebsd.org (Postfix) with ESMTP id B0D591F66; Wed, 3 Jul 2013 01:00:05 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.7/8.14.7) with ESMTP id r63105Z4067900; Wed, 3 Jul 2013 01:00:05 GMT (envelope-from linimon@freefall.freebsd.org) Received: (from linimon@localhost) by freefall.freebsd.org (8.14.7/8.14.7/Submit) id r631051Q067899; Wed, 3 Jul 2013 01:00:05 GMT (envelope-from linimon) Date: Wed, 3 Jul 2013 01:00:05 GMT Message-Id: <201307030100.r631051Q067899@freefall.freebsd.org> To: naylor.b.david@gmail.com, linimon@FreeBSD.org, daichi@FreeBSD.org, freebsd-fs@FreeBSD.org From: linimon@FreeBSD.org Subject: Re: kern/175449: [unionfs] unionfs and devfs misbehaviour X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 03 Jul 2013 01:00:05 -0000 Synopsis: [unionfs] unionfs and devfs misbehaviour State-Changed-From-To: open->open State-Changed-By: linimon State-Changed-When: Wed Jul 3 00:50:32 UTC 2013 State-Changed-Why: commit bit has been taken in for safekeeping. 
Responsible-Changed-From-To: daichi->freebsd-fs Responsible-Changed-By: linimon Responsible-Changed-When: Wed Jul 3 00:50:32 UTC 2013 Responsible-Changed-Why: http://www.freebsd.org/cgi/query-pr.cgi?pr=175449 From owner-freebsd-fs@FreeBSD.ORG Wed Jul 3 01:11:01 2013 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id E16BD73E for ; Wed, 3 Jul 2013 01:11:00 +0000 (UTC) (envelope-from berend@pobox.com) Received: from smtp.pobox.com (b-pb-sasl-quonix.pobox.com [208.72.237.35]) by mx1.freebsd.org (Postfix) with ESMTP id A082E1097 for ; Wed, 3 Jul 2013 01:10:59 +0000 (UTC) Received: from smtp.pobox.com (unknown [127.0.0.1]) by b-sasl-quonix.pobox.com (Postfix) with ESMTP id D5C761F2A7 for ; Wed, 3 Jul 2013 01:10:28 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=pobox.com; h=date :message-id:from:to:subject:mime-version:content-type :content-transfer-encoding; s=sasl; bh=pNGIL73M6guCWU6hPLeqfzt75 OI=; b=guww5N/vLkqCby9+a9XfpXaM5IvjFhvokS01rF1Pw4qMXEhZC4wIOaWez W71L6Ro2llwFURreCGdTVB2AcEo9DpV8AszRnBLMrnBp++5y41vJ12rY9ovZpbTF CNAR7LRIT6l9INws5pe8i+i51Qy07B58nxJFYrYjSqxxwq/xRo= DomainKey-Signature: a=rsa-sha1; c=nofws; d=pobox.com; h=date:message-id :from:to:subject:mime-version:content-type :content-transfer-encoding; q=dns; s=sasl; b=LQUZqYcim+dQ0Fe30wB vF8zaJ1YOh/7EpbO3HmG3UPz2anef5aafQr176wK5BNeXFlX8/8qIbRvZIbzGDIk TkqAO8zU63Qf7ZwTbB1g9Gj+JUN0R+ByrwFJdIaaRDa6NSQEmOfr3Cm4zAhZIG/w LwhdDXjsTBnKSBQ6l5WexuWI= Received: from b-pb-sasl-quonix.pobox.com (unknown [127.0.0.1]) by b-sasl-quonix.pobox.com (Postfix) with ESMTP id CB1741F2A6 for ; Wed, 3 Jul 2013 01:10:28 +0000 (UTC) Received: from bmach.nederware.nl (unknown [27.252.247.84]) by b-sasl-quonix.pobox.com (Postfix) with ESMTPA id 438F21F2A2 for ; Wed, 3 Jul 2013 01:10:28 +0000 (UTC) Received: from quadrio.nederware.nl (quadrio.nederware.nl [192.168.33.13]) by bmach.nederware.nl (Postfix) with ESMTP id 
52F625C7C for ; Wed, 3 Jul 2013 13:10:21 +1200 (NZST) Received: from quadrio.nederware.nl (quadrio.nederware.nl [127.0.0.1]) by quadrio.nederware.nl (Postfix) with ESMTP id EAAEF4A11BF3 for ; Wed, 3 Jul 2013 13:10:25 +1200 (NZST) Date: Wed, 03 Jul 2013 13:10:25 +1200 Message-ID: <87li5o5tz2.wl%berend@pobox.com> From: Berend de Boer To: freebsd-fs@FreeBSD.org Subject: EBS snapshot backups from a FreeBSD zfs file system: zpool freeze? User-Agent: Wanderlust/2.15.9 (Almost Unreal) SEMI-EPG/1.14.7 (Harue) FLIM/1.14.9 (=?UTF-8?B?R29qxY0=?=) APEL/10.8 EasyPG/1.0.0 Emacs/24.3 (i686-pc-linux-gnu) MULE/6.0 (HANACHIRUSATO) Organization: Xplain Technology Ltd MIME-Version: 1.0 (generated by SEMI-EPG 1.14.7 - "Harue") Content-Type: multipart/signed; boundary="pgp-sign-Multipart_Wed_Jul__3_13:10:25_2013-1"; micalg=pgp-sha256; protocol="application/pgp-signature" Content-Transfer-Encoding: 7bit X-Pobox-Relay-ID: 59484966-E37D-11E2-822C-E84251E3A03C-48001098!b-pb-sasl-quonix.pobox.com X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 03 Jul 2013 01:11:01 -0000 --pgp-sign-Multipart_Wed_Jul__3_13:10:25_2013-1 Content-Type: text/plain; charset=US-ASCII Hi All, I'm experimenting with building a FreeBSD NFS server on Amazon AWS EC2. I've created a zpool with 5 disks in a raidz2 configuration. How can I make a consistent backup of this using EBS? On Linux file systems I can freeze a file system, start the backup of all disks, and unfreeze. This freeze usually only takes 100ms or so. ZFS on FreeBSD does not appear to have such an option. I.e. what I'm looking for is basically a hardware-based snapshot. ZFS should simply be suspended at a recoverable point for a few hundred ms. 
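The nearest thing I can see on the ZFS side is its own snapshot mechanism, which is atomic and near-instant, but it gives filesystem-level rather than device-level consistency. A sketch of what I mean (the pool name and backup path here are made up):

```shell
#!/bin/sh
# Sketch only: "tank" and /backup are hypothetical. This takes a
# ZFS-level consistency point in lieu of a device-level freeze.
POOL=tank
SNAP="$POOL@ebs-$(date +%Y%m%d%H%M%S)"

# With -r the snapshot is taken atomically across all datasets in
# the pool; the pool keeps servicing I/O while it happens.
zfs snapshot -r "$SNAP"

# Replicate the snapshot itself rather than the raw EBS devices.
zfs send -R "$SNAP" | gzip > "/backup/$POOL-ebs.zfs.gz"

# Drop the snapshot once the replica is stored.
zfs destroy -r "$SNAP"
```

But that still doesn't quiesce the underlying devices, which is what a set of EBS volume snapshots would need to be trustworthy.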
A similar question from 2010 is here: http://thr3ads.net/zfs-discuss/2010/11/580781-how-to-quiesce-and-unquiesc-zfs-and-zpool-for-array-hardware-snapshots Absent a "zfs freeze" it seems using FreeBSD/zfs on AWS with EBS is going to be impossible. Unfortunately that means back to Linux sigh. -- All the best, Berend de Boer ------------------------------------------------------ Awesome Drupal hosting: https://www.xplainhosting.com/ --pgp-sign-Multipart_Wed_Jul__3_13:10:25_2013-1-- From owner-freebsd-fs@FreeBSD.ORG Wed Jul 3 01:27:00 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id A469AB74 for ; Wed, 3 Jul 2013 01:27:00 +0000 (UTC) (envelope-from amvandemore@gmail.com) Received: from mail-pb0-x22e.google.com (mail-pb0-x22e.google.com [IPv6:2607:f8b0:400e:c01::22e]) by mx1.freebsd.org (Postfix) with ESMTP id 83CA71154 for ; Wed, 3 Jul 2013 01:27:00 +0000
(UTC) Received: by mail-pb0-f46.google.com with SMTP id rq2so6763220pbb.19 for ; Tue, 02 Jul 2013 18:27:00 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=zOVDq6ypzrixNqQ6V62PG4kKRwoBDvWY2rUik7Hgx4Y=; b=nMU4xJEc0u2DaUCa/lXLYoFQmd29C4nztazsl+p1pSSwhPQ6I8kK9rvp9RCg4bpp+A GrsOtud0t9o91aOagRQr3br9D8n7arlEbRJK7rqXz1XGtZAa3Q9lDyf031KWbh0oC583 iQzeJPk9F2ukImPnaoaWGlI+BM45/7nXfZrc+3/YCkVdph9i4C61WPfZBaD+a3eQwflX uijm/rH1Z527fOhSAPnl59BxuBQzh08d8DnqfGFeWT1PyVsgqj0suW8jVUUuUpShG0lo MRxRGwz+2JqBU/+l1e54RuN0M/36r4CuJVxB6ROwbpT4sHDDGbFqVfe4PhbNAA8Qrvud 0sdQ== MIME-Version: 1.0 X-Received: by 10.66.159.105 with SMTP id xb9mr278972pab.146.1372814819981; Tue, 02 Jul 2013 18:26:59 -0700 (PDT) Received: by 10.70.88.74 with HTTP; Tue, 2 Jul 2013 18:26:59 -0700 (PDT) In-Reply-To: <87li5o5tz2.wl%berend@pobox.com> References: <87li5o5tz2.wl%berend@pobox.com> Date: Tue, 2 Jul 2013 20:26:59 -0500 Message-ID: Subject: Re: EBS snapshot backups from a FreeBSD zfs file system: zpool freeze? From: Adam Vande More To: Berend de Boer Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.14 Cc: freebsd-fs X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 03 Jul 2013 01:27:00 -0000 On Tue, Jul 2, 2013 at 8:10 PM, Berend de Boer wrote: > Hi All, > > I'm experimenting with building a FreeBSD NFS server on Amazon AWS > EC2. I've created a zpool with 5 disks in a raidz2 configuration. > > How can I make a consistent backup of this using EBS? > > On Linux' file systems I can freeze a file system, start the backup of > all disks, and unfreeze. This freeze usually only takes 100ms or so. > > ZFS on FreeBSD does not appear to have such an option. I.e. 
what I'm > looking for is basically a hardware based snapshot. ZFS should simply > be suspended at a recoverable point for a few hundred ms. > > A similar question from 2010 is here: > > http://thr3ads.net/zfs-discuss/2010/11/580781-how-to-quiesce-and-unquiesc-zfs-and-zpool-for-array-hardware-snapshots > > Absent a "zfs freeze" it seems using FreeBSD/zfs on AWS with EBS is > going to be impossible. Unfortunately that means back to Linux sigh. > What is wrong with a simple ZFS snapshot and running the backup against it? I assume that's how most of us are doing it. -- Adam Vande More From owner-freebsd-fs@FreeBSD.ORG Wed Jul 3 01:35:43 2013 Return-Path: Delivered-To: freebsd-fs@smarthost.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id ED907F97; Wed, 3 Jul 2013 01:35:43 +0000 (UTC) (envelope-from linimon@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:1900:2254:206c::16:87]) by mx1.freebsd.org (Postfix) with ESMTP id C7CDB11D5; Wed, 3 Jul 2013 01:35:43 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.7/8.14.7) with ESMTP id r631ZhlC078678; Wed, 3 Jul 2013 01:35:43 GMT (envelope-from linimon@freefall.freebsd.org) Received: (from linimon@localhost) by freefall.freebsd.org (8.14.7/8.14.7/Submit) id r631Zh6j078677; Wed, 3 Jul 2013 01:35:43 GMT (envelope-from linimon) Date: Wed, 3 Jul 2013 01:35:43 GMT Message-Id: <201307030135.r631Zh6j078677@freefall.freebsd.org> To: linimon@FreeBSD.org, freebsd-bugs@FreeBSD.org, freebsd-fs@FreeBSD.org From: linimon@FreeBSD.org Subject: Re: kern/9619: [nfs] Restarting mountd kills existing mounts X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 03 Jul 2013 01:35:44 -0000 Synopsis: [nfs] Restarting mountd kills existing mounts 
Responsible-Changed-From-To: freebsd-bugs->freebsd-fs Responsible-Changed-By: linimon Responsible-Changed-When: Wed Jul 3 01:35:32 UTC 2013 Responsible-Changed-Why: Over to maintainer(s). http://www.freebsd.org/cgi/query-pr.cgi?pr=9619 From owner-freebsd-fs@FreeBSD.ORG Wed Jul 3 01:36:03 2013 Return-Path: Delivered-To: freebsd-fs@smarthost.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id E5C88E9; Wed, 3 Jul 2013 01:36:03 +0000 (UTC) (envelope-from linimon@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:1900:2254:206c::16:87]) by mx1.freebsd.org (Postfix) with ESMTP id C019A11E4; Wed, 3 Jul 2013 01:36:03 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.7/8.14.7) with ESMTP id r631a3ct078718; Wed, 3 Jul 2013 01:36:03 GMT (envelope-from linimon@freefall.freebsd.org) Received: (from linimon@localhost) by freefall.freebsd.org (8.14.7/8.14.7/Submit) id r631a3aG078717; Wed, 3 Jul 2013 01:36:03 GMT (envelope-from linimon) Date: Wed, 3 Jul 2013 01:36:03 GMT Message-Id: <201307030136.r631a3aG078717@freefall.freebsd.org> To: g.gonter@ieee.org, linimon@FreeBSD.org, rodrigc@FreeBSD.org, freebsd-fs@FreeBSD.org From: linimon@FreeBSD.org Subject: Re: kern/67326: [msdosfs] crash after attempt to mount write protected MSDOS fs X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 03 Jul 2013 01:36:04 -0000 Synopsis: [msdosfs] crash after attempt to mount write protected MSDOS fs State-Changed-From-To: open->open State-Changed-By: linimon State-Changed-When: Wed Jul 3 00:50:32 UTC 2013 State-Changed-Why: commit bit has been taken in for safekeeping. 
Responsible-Changed-From-To: rodrigc->freebsd-fs Responsible-Changed-By: linimon Responsible-Changed-When: Wed Jul 3 00:50:32 UTC 2013 Responsible-Changed-Why: http://www.freebsd.org/cgi/query-pr.cgi?pr=67326 From owner-freebsd-fs@FreeBSD.ORG Wed Jul 3 02:08:21 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 923E7B36 for ; Wed, 3 Jul 2013 02:08:21 +0000 (UTC) (envelope-from berend@pobox.com) Received: from smtp.pobox.com (b-pb-sasl-quonix.pobox.com [208.72.237.35]) by mx1.freebsd.org (Postfix) with ESMTP id 4FBE013B8 for ; Wed, 3 Jul 2013 02:08:20 +0000 (UTC) Received: from smtp.pobox.com (unknown [127.0.0.1]) by b-sasl-quonix.pobox.com (Postfix) with ESMTP id 068F6259A6; Wed, 3 Jul 2013 02:08:20 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=pobox.com; h=date :message-id:from:to:cc:subject:in-reply-to:references :mime-version:content-type:content-transfer-encoding; s=sasl; bh=gowTjJNCznOZbw8ahLg/X0QQU/g=; b=re0PPqKJ8cp9z+7QIDh3AaotNJ2y plVSB7GvXWd/68eU4zBQ29C/aJ1m36j8PrtHOhfoiJPfX+Zz7hDcTDvQM874YVM2 NcnsuL/zTvZggFAgg0ydV76QxjQv6fmSP5+eCoO2ztyR+58QxOh8L+NNQ2GfUWGw 0N/Vly9xI4y1FYc= DomainKey-Signature: a=rsa-sha1; c=nofws; d=pobox.com; h=date:message-id :from:to:cc:subject:in-reply-to:references:mime-version :content-type:content-transfer-encoding; q=dns; s=sasl; b=d5BHXS y27FONrA/bUTrlZQXTNGqzxbkDmaAPAZrmJ7yKGt/1SYkpYq74J9RIFcNtjv0Y+S 6PkrcOHErr62bxuBpHQpJzNsH4meCcvGtUbTEToikAGZEig8q2duHaPZHJLb8mSn Ep+xcPLdAk7g5mI2qJ13moHH5X3qwjq2fGUG4= Received: from b-pb-sasl-quonix.pobox.com (unknown [127.0.0.1]) by b-sasl-quonix.pobox.com (Postfix) with ESMTP id F109E259A5; Wed, 3 Jul 2013 02:08:19 +0000 (UTC) Received: from bmach.nederware.nl (unknown [27.252.247.84]) by b-sasl-quonix.pobox.com (Postfix) with ESMTPA id 7559A259A3; Wed, 3 Jul 2013 02:08:19 +0000 (UTC) Received: from quadrio.nederware.nl (quadrio.nederware.nl [192.168.33.13]) by 
bmach.nederware.nl (Postfix) with ESMTP id 9338B5C81; Wed, 3 Jul 2013 14:08:12 +1200 (NZST) Received: from quadrio.nederware.nl (quadrio.nederware.nl [127.0.0.1]) by quadrio.nederware.nl (Postfix) with ESMTP id 37D574A11BF3; Wed, 3 Jul 2013 14:08:17 +1200 (NZST) Date: Wed, 03 Jul 2013 14:08:13 +1200 Message-ID: <87ehbg5raq.wl%berend@pobox.com> From: Berend de Boer To: Adam Vande More Subject: Re: EBS snapshot backups from a FreeBSD zfs file system: zpool freeze? In-Reply-To: References: <87li5o5tz2.wl%berend@pobox.com> User-Agent: Wanderlust/2.15.9 (Almost Unreal) SEMI-EPG/1.14.7 (Harue) FLIM/1.14.9 (=?UTF-8?B?R29qxY0=?=) APEL/10.8 EasyPG/1.0.0 Emacs/24.3 (i686-pc-linux-gnu) MULE/6.0 (HANACHIRUSATO) Organization: Xplain Technology Ltd MIME-Version: 1.0 (generated by SEMI-EPG 1.14.7 - "Harue") Content-Type: multipart/signed; boundary="pgp-sign-Multipart_Wed_Jul__3_14:08:13_2013-1"; micalg=pgp-sha256; protocol="application/pgp-signature" Content-Transfer-Encoding: 7bit X-Pobox-Relay-ID: 6E47B344-E385-11E2-8032-E84251E3A03C-48001098!b-pb-sasl-quonix.pobox.com Cc: freebsd-fs X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 03 Jul 2013 02:08:21 -0000 --pgp-sign-Multipart_Wed_Jul__3_14:08:13_2013-1 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable >>>>> "Adam" == Adam Vande More writes: Adam> What is wrong with a simple ZFS snapshot and running the Adam> backup against it? I assume that's how most of us are doing Adam> it. For starters, I suppose very few people are using FreeBSD on AWS, so "most of us" don't have a choice :-) But this might simply be my understanding: what if I want to use the EBS snapshot of the 5 disks I've taken and attach them to another machine, and mount it? But perhaps you don't know what an EBS snapshot is?
It's not a backup of your file system, it's a hardware-based backup of a disk at a single point in time. I.e. with EBS I can take a snapshot of 5 1TB disks, create new disks from it, and attach it to another machine. In SECONDS. I'm not looking for a way to take a ZFS snapshot, stream that to S3 for hours, and stream it back and write to another disk for hours. But in case I didn't get you: could you please let me know if your approach would allow me to take a consistent backup of my disks and mount them on another server or use them for recovery purposes? -- All the best, Berend de Boer ------------------------------------------------------ Awesome Drupal hosting: https://www.xplainhosting.com/ --pgp-sign-Multipart_Wed_Jul__3_14:08:13_2013-1-- From owner-freebsd-fs@FreeBSD.ORG Wed Jul 3 05:51:10 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id DFBA0F07 for ; Wed, 3 Jul 2013
05:51:10 +0000 (UTC) (envelope-from jdc@koitsu.org) Received: from relay5-d.mail.gandi.net (relay5-d.mail.gandi.net [217.70.183.197]) by mx1.freebsd.org (Postfix) with ESMTP id 6908F1E51 for ; Wed, 3 Jul 2013 05:51:10 +0000 (UTC) Received: from mfilter23-d.gandi.net (mfilter23-d.gandi.net [217.70.178.151]) by relay5-d.mail.gandi.net (Postfix) with ESMTP id C450741C06B; Wed, 3 Jul 2013 07:50:52 +0200 (CEST) X-Virus-Scanned: Debian amavisd-new at mfilter23-d.gandi.net Received: from relay5-d.mail.gandi.net ([217.70.183.197]) by mfilter23-d.gandi.net (mfilter23-d.gandi.net [10.0.15.180]) (amavisd-new, port 10024) with ESMTP id sQtZwEe8XkR7; Wed, 3 Jul 2013 07:50:51 +0200 (CEST) X-Originating-IP: 76.102.14.35 Received: from jdc.koitsu.org (c-76-102-14-35.hsd1.ca.comcast.net [76.102.14.35]) (Authenticated sender: jdc@koitsu.org) by relay5-d.mail.gandi.net (Postfix) with ESMTPSA id A2E6E41C06A; Wed, 3 Jul 2013 07:50:49 +0200 (CEST) Received: by icarus.home.lan (Postfix, from userid 1000) id CD2A673A1D; Tue, 2 Jul 2013 22:50:47 -0700 (PDT) Date: Tue, 2 Jul 2013 22:50:47 -0700 From: Jeremy Chadwick To: Berend de Boer Subject: Re: EBS snapshot backups from a FreeBSD zfs file system: zpool freeze? 
Message-ID: <20130703055047.GA54853@icarus.home.lan> References: <87li5o5tz2.wl%berend@pobox.com> <87ehbg5raq.wl%berend@pobox.com> MIME-Version: 1.0 Content-Type: text/plain; charset=unknown-8bit Content-Disposition: inline In-Reply-To: <87ehbg5raq.wl%berend@pobox.com> User-Agent: Mutt/1.5.21 (2010-09-15) Content-Transfer-Encoding: quoted-printable Cc: freebsd-fs X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 03 Jul 2013 05:51:10 -0000 On Wed, Jul 03, 2013 at 02:08:13PM +1200, Berend de Boer wrote: > >>>>> "Adam" == Adam Vande More writes: > > Adam> What is wrong with a simple ZFS snapshot and running the > Adam> backup against it? I assume that's how most of us are doing > Adam> it. > > For starters, I suppose very few people are using FreeBSD on AWS, so > "most of us" don't have a choice :-) > > But this might simply be my understanding: what if I want to use the > EBS snapshot of the 5 disks I've taken and attach them to another > machine, and mount it? > > But perhaps you don't know what an EBS snapshot is? It's not a backup > of your file system, it's a hardware-based backup of a disk at a > single point in time. > > I.e. with EBS I can take a snapshot of 5 1TB, create new disks from > it, and attach it to another machine. In SECONDS. Okay, I think I understand what you're asking. Please correct me: It sounds to me like the Linux OS images on AWS have utilities or the capability to create EBS images that are snapshots of the "virtual disks" that make up the AWS system, and that you can transfer these to another Linux AWS machine and mount the EBS images, and that this is being done within Linux itself. Correct? If so -- then what you're wanting to ask is: "does FreeBSD have support for EBS images?"
(Meaning this has nothing to do with ZFS) I get the impression an EBS image is a proprietary Amazon thing, so you would need to ask Amazon if they have the same utilities for FreeBSD, or ask the individual(s) responsible for the FreeBSD AWS images if there are such tools. Again: nothing to do with ZFS. The confusion for me lies in this statement (previous mail): >>> On Linux' file systems I can freeze a file system, start the backup of >>> all disks, and unfreeze. You used the word "file system" here, not "disk". Yet you then say: > I'm not looking for a way to take a ZFS snapshot, stream that to S3 > for hours, and stream it back and write to another disk for hours. Except ZFS is a filesystem, yet above you just said "On Linux I can freeze a filesystem..." Understand my confusion now? > But in case I didn't get you: could you please let me know if your > approach would allow me to take a consistent backup of my disks and > mount them on another server or use them for recovery purposes? ZFS snapshots will let you take a snapshot of a pool/filesystem (including incremental, if needed). You can send ZFS snapshots to another system using "zfs send" and "zfs recv". You can use ZFS snapshots for almost-bare-metal recovery depending on how you set up your system, but they do not cover things like boot blocks/bootloaders or other nuances. You would have to handle those on your own (set up the boot blocks, etc.), or do so at a different layer. For example, referring to virtualisation/hypervisors: it is possible to take a "snapshot of a disk used by a VM", and then take that snapshot and move it to another machine, where it's added and seen by the OS as a new disk that can be used. And that has nothing to do with ZFS -- that has to do with the VM software and/or HV software. *Sometimes* OS vendors actually have utilities (running within the guest itself) that can do these tasks, hence my paragraph above starting out with "If so --".
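The "zfs snapshot" plus "zfs send"/"zfs recv" workflow described above can be sketched as a dry run — the commands are echoed rather than executed, and the dataset name "tank/data" and host "backuphost" are illustrative placeholders, not from this thread:

```shell
#!/bin/sh
# Dry-run sketch of snapshot-then-replicate; prints the commands that
# would be run instead of running them.
plan_replication() {
    ds=$1; host=$2
    snap="${ds}@backup-$(date -u +%Y-%m-%d)"
    # zfs snapshot is atomic and near-instant; no freeze step is needed
    # to get a consistent point-in-time copy of the dataset itself.
    echo "zfs snapshot ${snap}"
    # Stream the snapshot to another machine over ssh; an incremental
    # send (zfs send -i) would shorten subsequent runs.
    echo "zfs send ${snap} | ssh ${host} zfs recv -F backup/${ds##*/}"
}

plan_replication tank/data backuphost
```

Unlike an EBS snapshot, this copies the data over the network rather than cloning the underlying block devices, which is the crux of Berend's objection earlier in the thread.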
--=20 | Jeremy Chadwick jdc@koitsu.org | | UNIX Systems Administrator http://jdc.koitsu.org/ | | Making life hard for others since 1977. PGP 4BD6C0CB | From owner-freebsd-fs@FreeBSD.ORG Wed Jul 3 06:21:05 2013 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 45C5B9F7 for ; Wed, 3 Jul 2013 06:21:05 +0000 (UTC) (envelope-from davidxu@freebsd.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:1900:2254:206c::16:87]) by mx1.freebsd.org (Postfix) with ESMTP id 397651F50; Wed, 3 Jul 2013 06:21:05 +0000 (UTC) Received: from xyf.my.dom (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.7/8.14.7) with ESMTP id r636L35g048460; Wed, 3 Jul 2013 06:21:04 GMT (envelope-from davidxu@freebsd.org) Message-ID: <51D3C2F4.8010907@freebsd.org> Date: Wed, 03 Jul 2013 14:21:40 +0800 From: David Xu User-Agent: Mozilla/5.0 (X11; FreeBSD i386; rv:17.0) Gecko/20130416 Thunderbird/17.0.5 MIME-Version: 1.0 To: Berend de Boer Subject: Re: EBS snapshot backups from a FreeBSD zfs file system: zpool freeze? References: <87li5o5tz2.wl%berend@pobox.com> In-Reply-To: <87li5o5tz2.wl%berend@pobox.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-fs@FreeBSD.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 03 Jul 2013 06:21:05 -0000 On 2013/07/03 09:10, Berend de Boer wrote: > Hi All, > > I'm experimenting with building a FreeBSD NFS server on Amazon AWS > EC2. I've created a zpool with 5 disks in a raidz2 configuration. > > How can I make a consistent backup of this using EBS? > > On Linux' file systems I can freeze a file system, start the backup of > all disks, and unfreeze. This freeze usually only takes 100ms or so. 
> > ZFS on FreeBSD does not appear to have such an option. I.e. what I'm > looking for is basically a hardware based snapshot. ZFS should simply > be suspended at a recoverable point for a few hundred ms. > > A similar question from 2010 is here: > http://thr3ads.net/zfs-discuss/2010/11/580781-how-to-quiesce-and-unquiesc-zfs-and-zpool-for-array-hardware-snapshots > > Absent a "zfs freeze" it seems using FreeBSD/zfs on AWS with EBS is > going to be impossible. Unfortunately that means back to Linux sigh. > > -- > All the best, > > Berend de Boer > > > ------------------------------------------------------ > Awesome Drupal hosting: https://www.xplainhosting.com/ > What you need is a tool that creates the snapshot on the EBS server and lets the EBS server transfer the snapshot image to another EBS server; it has nothing to do with FreeBSD. The EBS server, which is based on Linux, can create snapshots at the block level: AFAIK its device mapper is capable of creating a snapshot of a disk volume, something FreeBSD's GEOM lacks a class for. But that does not affect your needs, because the capability is only required on the EBS server.
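Driving the snapshot from the EBS/EC2 API side, as suggested above, might look like the following dry-run sketch — the commands are echoed rather than executed, the volume IDs are placeholders, and the aws(1) CLI is an assumption (any EC2 API client would do):

```shell
#!/bin/sh
# Echo the EC2 API calls that would snapshot each member volume of the
# raidz2 pool at the block level; nothing here runs inside the guest OS.
plan_ebs_snapshots() {
    for vol in "$@"; do
        echo "aws ec2 create-snapshot --volume-id ${vol} --description raidz2-member-${vol}"
    done
}

plan_ebs_snapshots vol-aaaa1111 vol-bbbb2222 vol-cccc3333 vol-dddd4444 vol-eeee5555
```

Note the consistency caveat raised later in the thread still applies: without quiescing the filesystem first, the five snapshots are taken at slightly different instants.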
Regards, David Xu From owner-freebsd-fs@FreeBSD.ORG Wed Jul 3 06:34:40 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id F2C71DD1 for ; Wed, 3 Jul 2013 06:34:40 +0000 (UTC) (envelope-from toasty@dragondata.com) Received: from mail-gg0-x22d.google.com (mail-gg0-x22d.google.com [IPv6:2607:f8b0:4002:c02::22d]) by mx1.freebsd.org (Postfix) with ESMTP id B20181FD0 for ; Wed, 3 Jul 2013 06:34:40 +0000 (UTC) Received: by mail-gg0-f173.google.com with SMTP id k3so143606ggn.18 for ; Tue, 02 Jul 2013 23:34:40 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=dragondata.com; s=google; h=content-type:mime-version:subject:from:in-reply-to:date:cc :message-id:references:to:x-mailer; bh=MTjnFj0GdGwT4e3P8vOY6bkAhgB4aLG712LKXHQ2DAg=; b=la5Nt5eAYHX44aaAdyUIHTKJunlGkUv9O4Zw107IY4VEVU+6zsP0y+IljWpSIJekU/ kLfmHhKO0m6mM3+qh2X13AeBSWM3DvYsdOjV0hR7LHl1hknfR0x2+79UZsf4FcKABLMm jBbpzpwCGB3duRnW6lDjIWTy443/A+j+5dxKA= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=content-type:mime-version:subject:from:in-reply-to:date:cc :message-id:references:to:x-mailer:x-gm-message-state; bh=MTjnFj0GdGwT4e3P8vOY6bkAhgB4aLG712LKXHQ2DAg=; b=Zfrm8/NNuXr2b6T3ldvWNBYkagEzHC2mLGf9gu2uqsj0w8DA61TDTM2ESrodoTBJn+ qSWf3hqCgzvVufKSoDTCZyFCZ1xHkZFVaJs974eEdzav3OZnpIN2FU7YVRL2UaCBBtYu /8KqlQx+kP3kqdrSqVhHZDbpkcAg7e1py7f1rzgHRCpfH0ivx26CeY8o6hMSgIQdCovz mdRi8WdXcIO4Vh0R+Ax8dwW5fj1eOzaQLjMALQV5lO0t2IUSaUaDyoqiYupa+Tgd7TRs 98EAiyQDLYzRH/Z+coPKf/ZK1lSgz4QUuzZtWVsWwaRsVynyF1bpHj70X9ALvoUaK1v5 Lq5Q== X-Received: by 10.236.49.41 with SMTP id w29mr15892363yhb.152.1372833280053; Tue, 02 Jul 2013 23:34:40 -0700 (PDT) Received: from unassigned.v6.your.org ([2001:4978:1:45:501c:ad1e:60c9:cc37]) by mx.google.com with ESMTPSA id e69sm45038558yhl.3.2013.07.02.23.34.38 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Tue, 02 Jul 2013 
23:34:39 -0700 (PDT) Content-Type: multipart/signed; boundary="Apple-Mail=_E9F25EA9-6758-4A9C-A380-FD064D1B1DB1"; protocol="application/pkcs7-signature"; micalg=sha1 Mime-Version: 1.0 (Mac OS X Mail 6.5 \(1508\)) Subject: Re: EBS snapshot backups from a FreeBSD zfs file system: zpool freeze? From: Kevin Day In-Reply-To: <20130703055047.GA54853@icarus.home.lan> Date: Wed, 3 Jul 2013 01:34:36 -0500 Message-Id: <6488DECC-2455-4E92-B432-C39490D18484@dragondata.com> References: <87li5o5tz2.wl%berend@pobox.com> <87ehbg5raq.wl%berend@pobox.com> <20130703055047.GA54853@icarus.home.lan> To: Jeremy Chadwick X-Mailer: Apple Mail (2.1508) X-Gm-Message-State: ALoCoQmAS0t4msbhF6o5njA3th4EeAyLXjpdURpvDLW/OvqDZNLinIi1Z3vpmTfVwT/NZH55lsk1 Cc: freebsd-fs X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 03 Jul 2013 06:34:41 -0000 --Apple-Mail=_E9F25EA9-6758-4A9C-A380-FD064D1B1DB1 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset=us-ascii On Jul 3, 2013, at 12:50 AM, Jeremy Chadwick wrote: > On Wed, Jul 03, 2013 at 02:08:13PM +1200, Berend de Boer wrote: >> >> For starters, I suppose very few people are using FreeBSD on AWS, so >> "most of us" don't have a choice :-) >> >> But this might simply be my understanding: what if I want to use the >> EBS snapshot of the 5 disks I've taken and attach them to another >> machine, and mount it? >> >> But perhaps you don't know what an EBS snapshot is? It's not a backup >> of your file system, it's a hardware-based backup of a disk at a >> single point in time. >> >> I.e. with EBS I can take a snapshot of 5 1TB, create new disks from >> it, and attach it to another machine. In SECONDS. > > Okay, I think I understand what you're asking.
Please correct me: > > It sounds to me like the Linux OS images on AWS have utilities or the > capability to create EBS images that are snapshots of the "virtual > disks" that make up the AWS system, and that you can transfer these > to another Linux AWS machine and mount the EBS images, and that this is > being done within Linux itself. > > Correct? > > If so -- then what you're wanting to ask is: "does FreeBSD have support > for EBS images?" (Meaning this has nothing to do with ZFS) I get the > impression an EBS image is a proprietary Amazon thing, so you would need > to ask Amazon if they have the same utilities for FreeBSD, or ask the > individual(s) responsible for the FreeBSD AWS images if there are such > tools. Again: nothing to do with ZFS. Not quite, we run into something similar with VMware ESXi, so let me try to rephrase it. ESXi provides block storage to virtual machines that appears to be traditional disks. ESXi has no real knowledge of the contents or files; it's just dumb block storage. This is like EBS: the volumes appear to be normal disks to the virtual machine. But, the backend storage system allows you to take snapshots of the virtual disks. You can then roll back to an earlier snapshot, or make copies of the snapshots for other VMs to use. The problem is that unless you quiesce the filesystem before making the snapshot, the contents are potentially "dirty". It could be mid-write, or in the middle of doing something that would be bad to leave that way. If you're unlucky and make a snapshot during one of those inconsistent periods, then if you were to roll back or clone that snapshot you'd get fsck complaining (for UFS), or ZFS may need to do some repairing, etc. This is much less an issue with ZFS than other filesystems, but it's still possible to think you've written something, try to trigger a snapshot, and find it's not actually committed to disk yet. Linux has an ioctl(?)
you can call that says "Quiesce this filesystem", where all dirty buffers are flushed and (I believe) writes are blocked, and the filesystem is temporarily marked clean. You then trigger the backend storage to make a snapshot, then you unfreeze the filesystem so that normal I/O can continue. The closest thing we can do in FreeBSD is to unmount the filesystem, take the snapshot, and remount. This has the side effect of closing all open files, so it's not really an alternative. The other option is to not freeze the filesystem before taking the snapshot, but again you risk leaving things in an inconsistent state, and/or the last few writes you think you made didn't actually get committed to disk yet. For automated systems that create then clone filesystems for new VMs, this can be a big problem. At best, you're going to get a warning that the filesystem wasn't cleanly unmounted.
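The Linux-side freeze/snapshot/thaw cycle described above (the FIFREEZE/FITHAW ioctls, exposed by util-linux's fsfreeze(8)) can be sketched as a dry run — the commands are echoed rather than executed, and the mount point and snapshot command are placeholders:

```shell
#!/bin/sh
# Echo the quiesce/snapshot/unfreeze sequence instead of executing it;
# fsfreeze wraps the FIFREEZE/FITHAW ioctls on Linux, for which FreeBSD
# has no direct equivalent short of unmounting.
plan_frozen_snapshot() {
    mnt=$1
    shift
    echo "fsfreeze -f ${mnt}"   # flush dirty buffers, block new writes
    echo "$@"                   # take the array/EBS snapshot while quiesced
    echo "fsfreeze -u ${mnt}"   # thaw: resume normal I/O
}

plan_frozen_snapshot /srv/data aws ec2 create-snapshot --volume-id vol-aaaa1111
```

The window between the two fsfreeze calls is what keeps the block-level snapshot consistent; on the timescale Berend mentions, it only needs to stay frozen for the instant the snapshot is initiated.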
--Apple-Mail=_E9F25EA9-6758-4A9C-A380-FD064D1B1DB1-- From owner-freebsd-fs@FreeBSD.ORG
Wed Jul 3 06:53:21 2013
Date: Wed, 3 Jul 2013 00:53:13 -0600
Subject: Re: EBS snapshot backups from a FreeBSD zfs file system: zpool freeze?
From: Will Andrews
To: Kevin Day
Cc: freebsd-fs

On Wednesday, July 3, 2013, Kevin Day wrote:

> The closest thing we can do in FreeBSD is to unmount the filesystem,
> take the snapshot, and remount. This has the side effect of closing
> all open files, so it's not really an alternative.
>
> The other option is to not freeze the filesystem before taking the
> snapshot, but again you risk leaving things in an inconsistent state,
> and/or the last few writes you think you made didn't actually get
> committed to disk yet. For automated systems that create then clone
> filesystems for new VMs, this can be a big problem. At best, you're
> going to get a warning that the filesystem wasn't cleanly unmounted.

Actually, sync(2)/sync(8) will do the job on ZFS. It won't stop/pause
I/O running in other contexts, but it does guarantee that any commands
you ran and completed prior to calling sync will make it to disk in
ZFS.

This is because sync in ZFS is implemented as a ZIL commit, so
transactions that haven't yet made it to disk via the normal syncing
context will at least be committed via their ZIL blocks, which can
then be replayed when the pool is imported later, in this case from
the EBS snapshots.

And since the entire tree from the überblock down in ZFS is COW, you
can't get an inconsistent pool simply by doing a virtual disk
snapshot, regardless of how that is implemented.

--Will.
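The sequence Will describes amounts to a very small script. A minimal sketch, assuming the pool sits on an EBS-backed virtual disk; the snapshot command in the comment is illustrative, not something from this thread:

```shell
# Flush, then snapshot: sync(8) on ZFS issues a ZIL commit, so every
# write that completed before this point can be replayed from the ZIL
# when the pool is later imported from the disk-level snapshot.
sync
# Take the EBS / virtual-disk snapshot here, e.g. (hypothetical id):
#   aws ec2 create-snapshot --volume-id vol-XXXX
# Writes issued after sync but before the snapshot may or may not be
# captured; the pool stays consistent either way because ZFS is COW.
```

Note that this quiesces nothing; it only bounds what is guaranteed to be on disk, which is exactly the limitation discussed in the rest of the thread.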
From owner-freebsd-fs@FreeBSD.ORG Wed Jul 3 07:02:15 2013
Subject: Re: EBS snapshot backups from a FreeBSD zfs file system: zpool freeze?
From: Kevin Day
To: Will Andrews
Cc: freebsd-fs
Date: Wed, 3 Jul 2013 02:02:10 -0500

On Jul 3, 2013, at 1:53 AM, Will Andrews wrote:

> On Wednesday, July 3, 2013, Kevin Day wrote:
> The closest thing we can do in FreeBSD is to unmount the filesystem,
> take the snapshot, and remount. This has the side effect of closing
> all open files, so it's not really an alternative.
>
> The other option is to not freeze the filesystem before taking the
> snapshot, but again you risk leaving things in an inconsistent state,
> and/or the last few writes you think you made didn't actually get
> committed to disk yet. For automated systems that create then clone
> filesystems for new VMs, this can be a big problem. At best, you're
> going to get a warning that the filesystem wasn't cleanly unmounted.
>
> Actually, sync(2)/sync(8) will do the job on ZFS.
> It won't stop/pause I/O running in other contexts, but it does
> guarantee that any commands you ran and completed prior to calling
> sync will make it to disk in ZFS.
>
> This is because sync in ZFS is implemented as a ZIL commit, so
> transactions that haven't yet made it to disk via the normal syncing
> context will at least be committed via their ZIL blocks, which can
> then be replayed when the pool is imported later, in this case from
> the EBS snapshots.
>
> And since the entire tree from the überblock down in ZFS is COW, you
> can't get an inconsistent pool simply by doing a virtual disk
> snapshot, regardless of how that is implemented.
>
> --Will.

Sorry, yes, this is true. We're not using ZFS to clone and provision
new VMs, so I was just thinking about UFS here. And ZFS does have a
real advantage here in that it actually seems to respect sync
requests. I think it was here that I reported, a few months ago, that
we were seeing UFS+SUJ not actually doing anything when sync(8) was
called.

But for some workloads this still isn't sufficient if you have
processes running that could be writing at any time. As an example, we
have a database server using ZFS-backed storage. Short of shutting
down the server, there's no way to guarantee it won't try to write
even if we lock all tables, disconnect all clients, etc. mysql has all
sorts of things done on timers that fire lazily in the future,
including periodic checkpoint writes even if there is no activity.

I know this is a somewhat obscure use case, but Linux and Windows both
have this functionality, which VMware will use if present (and the
guest tools know about it). Linux goes a step further and ensures that
it's not in the middle of writing anything to swap during the quiesce
period, too.
I don't think this would be terribly difficult to implement: a hook
somewhere along the write chain that blocks (or queues up) anything
trying to write until the unfreeze comes along. But I'm guessing there
are all sorts of deadlock opportunities here.

Either way, I'm not asking that anyone spend time writing this; I'm
just trying to reword what the original requestor was asking for.

-- Kevin
From owner-freebsd-fs@FreeBSD.ORG Wed Jul 3 07:33:49 2013
Date: Wed, 3 Jul 2013 00:33:33 -0700
From: Jeremy Chadwick
To: Will Andrews
Cc: freebsd-fs
Subject: Re: EBS snapshot backups from a FreeBSD zfs file system: zpool freeze?
Message-ID: <20130703073333.GA57318@icarus.home.lan>

On Wed, Jul 03, 2013 at 12:53:13AM -0600, Will Andrews wrote:

> On Wednesday, July 3, 2013, Kevin Day wrote:
>
> > The closest thing we can do in FreeBSD is to unmount the filesystem,
> > take the snapshot, and remount. This has the side effect of closing
> > all open files, so it's not really an alternative.
> >
> > The other option is to not freeze the filesystem before taking the
> > snapshot, but again you risk leaving things in an inconsistent
> > state, and/or the last few writes you think you made didn't
> > actually get committed to disk yet. For automated systems that
> > create then clone filesystems for new VMs, this can be a big
> > problem. At best, you're going to get a warning that the filesystem
> > wasn't cleanly unmounted.
>
> Actually, sync(2)/sync(8) will do the job on ZFS. It won't stop/pause
> I/O running in other contexts, but it does guarantee that any
> commands you ran and completed prior to calling sync will make it to
> disk in ZFS.
>
> This is because sync in ZFS is implemented as a ZIL commit, so
> transactions that haven't yet made it to disk via the normal syncing
> context will at least be committed via their ZIL blocks. Which can
> then be replayed when the pool is imported later, in this case from
> the EBS snapshots.
>
> And since the entire tree from the überblock down in ZFS is COW, you
> can't get an inconsistent pool simply by doing a virtual disk
> snapshot, regardless of how that is implemented.

I'm a little confused about this statement, particularly as a result
of this thread (read the entire thing, time permitting):

http://lists.freebsd.org/pipermail/freebsd-fs/2013-April/016982.html

UFS is what's being discussed there, but there are some blanket
statements (maybe I'm taking them out of context, not entirely sure)
made by Bruce there that seem to imply that sync(2) may not actually
flush all memory buffers to disk when issued, only that they're
"scheduled" to be flushed.

The part that's confusing to me is this part of your paragraph:

> This is because sync in ZFS is implemented as a ZIL commit, so
> transactions that haven't yet made it to disk via the normal syncing
> context will at least be committed via their ZIL blocks. ...
What confuses me about this is that it implies these "ZIL block commits" (I/O writes of a certain type) are somehow being done outside of a normal I/O write (e.g. "normal syncing context"). To me this indicates ZFS is somehow able to tell the underlying storage subsystem driver to (speaking ATA here because it's what I'm familiar with) issue WRITE DMA EXT (0x35) or the NCQ equivalent, followed immediately by FLUSH CACHE EXT (0xea)? My understanding of the latter was that it was accomplished via BIO_FLUSH. Looking at sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c it seems there are BIO_FLUSH handlers in place (at the GEOM level). So all this makes me wonder: why exactly does sync(2) result in different behaviour on UFS than it does on ZFS? Do both of these filesystems not use BIO_write() and friends? Does sync(2) not simply iterate over all the queued BIO_write()s and BIO_FLUSH them all? Sorry if I'm overthinking this or missing something, but I just don't understand why sync(2) would flush stuff to disk with one filesystem but not another. --=20 | Jeremy Chadwick jdc@koitsu.org | | UNIX Systems Administrator http://jdc.koitsu.org/ | | Making life hard for others since 1977. 
PGP 4BD6C0CB |

From owner-freebsd-fs@FreeBSD.ORG Wed Jul 3 08:26:23 2013
Date: Wed, 3 Jul 2013 10:19:00 +0200
From: Gerrit Kühn
To: freebsd-fs@freebsd.org
Subject: pwd in zfs snapshot
Message-Id: <20130703101900.4191f56e.gerrit.kuehn@aei.mpg.de>
Organization: Max Planck Gesellschaft
Hi all,

Does anyone know if this is an issue that should already be fixed? It
looks like I still see it, even on rather up-to-date systems:

---
root@pt-storage:/root # cd /data/.zfs/snapshot/daily.0/
root@pt-storage:/data/.zfs/snapshot/daily.0 # pwd
pwd: .: No such file or directory
root@pt-storage:/data/.zfs/snapshot/daily.0 # uname -a
FreeBSD pt-storage.pt.rt.aei.uni-hannover.de 9.1-RELEASE-p3 FreeBSD 9.1-RELEASE-p3 #0: Mon Apr 29 18:27:25 UTC 2013 root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC amd64
---

cu
Gerrit

From owner-freebsd-fs@FreeBSD.ORG Wed Jul 3 08:56:50 2013
Date: Wed, 3 Jul 2013 01:56:29 -0700
From: Jeremy Chadwick
To: Gerrit Kühn
Cc: freebsd-fs@freebsd.org
Subject: Re: pwd in zfs snapshot
Message-ID: <20130703085629.GA59068@icarus.home.lan>

On Wed, Jul 03, 2013 at 10:19:00AM +0200, Gerrit Kühn wrote:

> Hi all,
>
> Does anyone know if this is an issue that should already be fixed?
>
> It looks like I still see it, even on rather up-to-date systems:
>
> ---
> root@pt-storage:/root # cd /data/.zfs/snapshot/daily.0/
> root@pt-storage:/data/.zfs/snapshot/daily.0 # pwd
> pwd: .: No such file or directory
> root@pt-storage:/data/.zfs/snapshot/daily.0 # uname -a
> FreeBSD pt-storage.pt.rt.aei.uni-hannover.de 9.1-RELEASE-p3 FreeBSD 9.1-RELEASE-p3 #0: Mon Apr 29 18:27:25 UTC 2013 root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC amd64

If I'm reading this correctly (very likely I'm not):

http://lists.freebsd.org/pipermail/freebsd-fs/2010-February/007805.html

This would imply the issue may be relieved by setting snapdir=visible
on the filesystem. If it works as a workaround, great, but be aware it
may not be ideal for everyone. Please try it and report back.

--
| Jeremy Chadwick jdc@koitsu.org |
| UNIX Systems Administrator http://jdc.koitsu.org/ |
| Making life hard for others since 1977. PGP 4BD6C0CB |

From owner-freebsd-fs@FreeBSD.ORG Wed Jul 3 08:59:52 2013
Subject: Re: EBS snapshot backups from a FreeBSD zfs file system: zpool freeze?
From: Markus Gebert
To: Kevin Day
Cc: freebsd-fs
Date: Wed, 3 Jul 2013 10:59:07 +0200
Message-Id: <14A2336A-969C-4A13-9EFA-C0C42A12039F@hostpoint.ch>

On 03.07.2013, at 09:02, Kevin Day wrote:

>
> On Jul 3, 2013, at 1:53 AM, Will Andrews wrote:
>
>> On Wednesday, July 3, 2013, Kevin Day wrote:
>> The closest thing we can do in FreeBSD is to unmount the filesystem,
>> take the snapshot, and remount. This has the side effect of closing
>> all open files, so it's not really an alternative.
>>
>> The other option is to not freeze the filesystem before taking the
>> snapshot, but again you risk leaving things in an inconsistent
>> state, and/or the last few writes you think you made didn't actually
>> get committed to disk yet. For automated systems that create then
>> clone filesystems for new VMs, this can be a big problem. At best,
>> you're going to get a warning that the filesystem wasn't cleanly
>> unmounted.
>>
>> Actually, sync(2)/sync(8) will do the job on ZFS. It won't
>> stop/pause I/O running in other contexts, but it does guarantee that
>> any commands you ran and completed prior to calling sync will make
>> it to disk in ZFS.
>>
>> This is because sync in ZFS is implemented as a ZIL commit, so
>> transactions that haven't yet made it to disk via the normal syncing
>> context will at least be committed via their ZIL blocks. Which can
>> then be replayed when the pool is imported later, in this case from
>> the EBS snapshots.
>>
>> And since the entire tree from the überblock down in ZFS is COW, you can't get an inconsistent pool simply by doing a virtual disk snapshot, regardless of how that is implemented.
>>
>> --Will.
>
> Sorry, yes, this is true. We're not using ZFS to clone and provision new VMs, so I was just thinking about UFS here. And ZFS does have a good advantage here in that it seems to actually respect sync requests. I think it was here that I reported, a few months ago, that we were seeing UFS+SUJ not actually doing anything when sync(8) was called.
>
> But for some workloads this still isn't sufficient if you have processes running that could be writing at any time. As an example, we have a database server using ZFS-backed storage. Short of shutting down the server, there's no way to guarantee it won't try to write even if we lock all tables, disconnect all clients, etc. mysql has all sorts of things done on timers that occur lazily in the future, including periodic checkpoint writes even if there is no activity.
>
> I know this is a sort of obscure use case, but Linux and Windows both have this functionality, which VMware will use if present (and the guest tools know about it). Linux goes a step further and ensures that it's not in the middle of writing anything to swap during the quiesce period, too. I don't think this would be terribly difficult to implement, a hook somewhere along the write chain that blocks (or queues up) anything trying to write until the unfreeze comes along, but I'm guessing there are all sorts of deadlock opportunities here.

Indeed sync(8) has the disadvantage that you cannot prevent writes between the syscall and the EBS snapshot, so depending on the application, this can make the resulting EBS snapshot useless.

But taking a zfs snapshot is an atomic operation. Why not use that? For example:

1. snapshot the zfs at the same point in time you'd issue that ioctl on Linux
2. take the EBS snapshot at any time
3. clone the EBS snapshot to the new/other VM
4. zpool import the pool there
5. zfs rollback the filesystem to the snapshot taken in step 1 (or clone it and use that)

Any writes issued between the zfs snapshot and the EBS snapshot are discarded, and that way you get exactly the same filesystem data as you would have gotten with the ioctl. Also, taking the zfs snapshot should take much less time, because you don't have to wait for the EBS snapshot to complete before you can resume I/O on the filesystem. So you don't even depend on EBS snapshots being quick when using the zfs approach, a big advantage in my opinion.

Markus

From owner-freebsd-fs@FreeBSD.ORG Wed Jul 3 09:05:30 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 16402712 for ; Wed, 3 Jul 2013 09:05:30 +0000 (UTC) (envelope-from berend@pobox.com) Received: from smtp.pobox.com (b-pb-sasl-quonix.pobox.com [208.72.237.35]) by mx1.freebsd.org (Postfix) with ESMTP id C1DD71777 for ; Wed, 3 Jul 2013 09:05:29 +0000 (UTC) Received: from smtp.pobox.com (unknown [127.0.0.1]) by b-sasl-quonix.pobox.com (Postfix) with ESMTP id 9B5C02B4CC; Wed, 3 Jul 2013 09:05:27 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=pobox.com; h=date :message-id:from:to:cc:subject:in-reply-to:references :mime-version:content-type:content-transfer-encoding; s=sasl; bh=8CILa0zulcrGJG0ovixRmgxvzEg=; b=Z+T9lhGpFmkbOICzYkUYrv656RUA oedzd9HUaSvi5fkTkRPi4hH0yi5aXwIEg2G0I9T9FzNO2XBH6M7t3SFDKFryffTM dq6s2ZOrKD6BWiWMNk+7fNtjdtvPVhwyLvw/2Dm6KK2/RT64eZWdnNLjV6wKNKkb eMvQOnrgXKIXYcs= DomainKey-Signature: a=rsa-sha1; c=nofws; d=pobox.com; h=date:message-id :from:to:cc:subject:in-reply-to:references:mime-version :content-type:content-transfer-encoding; q=dns; s=sasl; b=WRAq9p UYsZlBM1QGKw+nEmbM3AbkvO/m3a4mU5j1xFCVcEkcdNRZRf2cWuqxgGMNHCOS3n
BBXxamYLrLKw3nphAU+1PuZt6YukCrZLa0GTbeZS4Yg4F1KfoS6xC9qtxZ9wv04e ypVlvdALCHsAIHunlHG+cwa104WXZbhzAHbFQ= Received: from b-pb-sasl-quonix.pobox.com (unknown [127.0.0.1]) by b-sasl-quonix.pobox.com (Postfix) with ESMTP id 8FA1A2B4CB; Wed, 3 Jul 2013 09:05:27 +0000 (UTC) Received: from bmach.nederware.nl (unknown [27.252.247.84]) by b-sasl-quonix.pobox.com (Postfix) with ESMTPA id 0815A2B4CA; Wed, 3 Jul 2013 09:05:27 +0000 (UTC) Received: from quadrio.nederware.nl (quadrio.nederware.nl [192.168.33.13]) by bmach.nederware.nl (Postfix) with ESMTP id ED9365C6A; Wed, 3 Jul 2013 21:05:19 +1200 (NZST) Received: from quadrio.nederware.nl (quadrio.nederware.nl [127.0.0.1]) by quadrio.nederware.nl (Postfix) with ESMTP id 8EC064A11BF3; Wed, 3 Jul 2013 21:05:24 +1200 (NZST) Date: Wed, 03 Jul 2013 21:05:03 +1200 Message-ID: <874ncc5800.wl%berend@pobox.com> From: Berend de Boer To: Jeremy Chadwick Subject: Re: EBS snapshot backups from a FreeBSD zfs file system: zpool freeze? In-Reply-To: <20130703055047.GA54853@icarus.home.lan> References: <87li5o5tz2.wl%berend@pobox.com> <87ehbg5raq.wl%berend@pobox.com> <20130703055047.GA54853@icarus.home.lan> User-Agent: Wanderlust/2.15.9 (Almost Unreal) SEMI-EPG/1.14.7 (Harue) FLIM/1.14.9 (=?UTF-8?B?R29qxY0=?=) APEL/10.8 EasyPG/1.0.0 Emacs/24.3 (i686-pc-linux-gnu) MULE/6.0 (HANACHIRUSATO) Organization: Xplain Technology Ltd MIME-Version: 1.0 (generated by SEMI-EPG 1.14.7 - "Harue") Content-Type: multipart/signed; boundary="pgp-sign-Multipart_Wed_Jul__3_21:05:02_2013-1"; micalg=pgp-sha256; protocol="application/pgp-signature" Content-Transfer-Encoding: 7bit X-Pobox-Relay-ID: B3DE0A5E-E3BF-11E2-BC38-E84251E3A03C-48001098!b-pb-sasl-quonix.pobox.com Cc: freebsd-fs X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 03 Jul 2013 09:05:30 -0000 --pgp-sign-Multipart_Wed_Jul__3_21:05:02_2013-1 Content-Type: 
text/plain; charset=US-ASCII

>>>>> "Jeremy" == Jeremy Chadwick writes:

Jeremy> It sounds to me like the Linux OS images on AWS have
Jeremy> utilities or the capability to create EBS images that are
Jeremy> snapshots of the "virtual disks" that make up the AWS
Jeremy> system, and that you can transfer these to another Linux
Jeremy> AWS machine and mount the EBS images, and that this is
Jeremy> being done within Linux itself.

Jeremy> Correct?

Not really. EBS is just block storage. To your OS (including FreeBSD) the volumes simply appear as disks. They are just disks for all intents and purposes; EBS has very little to do with it. Any attached block storage with hardware-based backup will run into the same problem.

-- 
All the best,

Berend de Boer

------------------------------------------------------
Awesome Drupal hosting: https://www.xplainhosting.com/

--pgp-sign-Multipart_Wed_Jul__3_21:05:02_2013-1 Content-Type: application/pgp-signature Content-Transfer-Encoding: 7bit Content-Description: OpenPGP Digital Signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.17 (GNU/Linux) iQIcBAABCAAGBQJR0+k+AAoJEKOfeD48G3g5ltMP/Rfp3gdKfI+rpU2/6vbn14k5 DGAOSuK3asscBQAbLp5ilfXHZKx0rqTGUN3zbT7oolEkUcvvNs552j8G1W+4xwCb 5sF9zvtHXa33tYhz+9DLXYecov1OSAxaYQ7dTBhlA5/bW5ZBrgs9kwJWSm1oOskA Q56Z4vpxkHl0SJ29F9LPR5WGYz7GghhOn3wPuV1fSANlUz03x/06NrbVYO6pIFku y0EaraOJl5GT9RlO9B6zH2L1T3dQIK5uSf2roIccQo1pR3PQ3eiagjVHxoTbYuU5 I9U9iR+w68167BAbiXzH65HeN4khOetmazZogs85qVeXznUaMXNjd9bX+GXxdfX5 pa4l+LUsBxm6Y4/gP2RRyXLVus/Hn8Q78nvED0qaaDjTXja8o/vuizzeBMroWvw+ QIMZ+IVxTfnfSZLATEXBnsSnbq6yvXU9gOECZVGMfUaVt/N6sYPlP82Pg6T2ldsk rrNKJL4O+8QwmPJi6XTU093l6kVDw8rL1fCl8iivHgALkkguO2pS9HinbaptNRap Grq91f1vfgskHkA+sytun++sePUKM9HOpxy49qc+lnUDdv/NgRv20xurJ82RIW0b 0jGbCP1ILXO/8/7NA6S/vPf5JeMXmxCR3+/eyFkUO4NgUNe/LInOAfOz6+y6VBys 08lxPo0tlvewyGDq2x7e =5+La -----END PGP SIGNATURE----- --pgp-sign-Multipart_Wed_Jul__3_21:05:02_2013-1-- From owner-freebsd-fs@FreeBSD.ORG Wed Jul 3 09:07:13 2013 Return-Path: Delivered-To:
freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id E6635799 for ; Wed, 3 Jul 2013 09:07:13 +0000 (UTC) (envelope-from berend@pobox.com) Received: from smtp.pobox.com (b-pb-sasl-quonix.pobox.com [208.72.237.35]) by mx1.freebsd.org (Postfix) with ESMTP id A4BB11788 for ; Wed, 3 Jul 2013 09:07:13 +0000 (UTC) Received: from smtp.pobox.com (unknown [127.0.0.1]) by b-sasl-quonix.pobox.com (Postfix) with ESMTP id BE0952B5B7; Wed, 3 Jul 2013 09:07:12 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=pobox.com; h=date :message-id:from:to:cc:subject:in-reply-to:references :mime-version:content-type:content-transfer-encoding; s=sasl; bh=UYtZTksMegFmts/4itb0e3YqPhA=; b=AcqaiUu6GOvhyzoaFfIlVvDR/pBI dKe5g48ckfpg+3vLeS2iKveJ/+qjObNLVVjlCOeU2mSUzL7js8LWQMm9WYOFvS2j 9vwZ5cV4F7A/ecQAl/WaIra1hZBPyO31ub7uO+znWB0aRIxBQopRwnl4/xAHIxF+ lhDZdMgv4B6VXIg= DomainKey-Signature: a=rsa-sha1; c=nofws; d=pobox.com; h=date:message-id :from:to:cc:subject:in-reply-to:references:mime-version :content-type:content-transfer-encoding; q=dns; s=sasl; b=TuNAlA uhjI4DTbAv9llht5aMvwsfq1bOUS1vw5nOllOi6yxC465q58dMPuGeK6VN9h7upL fRkXJrgDmNkdFSp4JL3v4mj1ZH/O3VFQPMfUqT/Md4wRScjeigYsrkq4zwjPfN2g V9CcTQMFnR/jc1xsNasmGtp7HTroqLRP0JMwo= Received: from b-pb-sasl-quonix.pobox.com (unknown [127.0.0.1]) by b-sasl-quonix.pobox.com (Postfix) with ESMTP id B3D182B5B6; Wed, 3 Jul 2013 09:07:12 +0000 (UTC) Received: from bmach.nederware.nl (unknown [27.252.247.84]) by b-sasl-quonix.pobox.com (Postfix) with ESMTPA id 299E52B5B1; Wed, 3 Jul 2013 09:07:12 +0000 (UTC) Received: from quadrio.nederware.nl (quadrio.nederware.nl [192.168.33.13]) by bmach.nederware.nl (Postfix) with ESMTP id 4441C5C6A; Wed, 3 Jul 2013 21:07:05 +1200 (NZST) Received: from quadrio.nederware.nl (quadrio.nederware.nl [127.0.0.1]) by quadrio.nederware.nl (Postfix) with ESMTP id E05C64A11BF3; Wed, 3 Jul 2013 21:07:09 +1200 (NZST) Date: Wed, 03 Jul 2013 21:07:09 +1200 
Message-ID: <8738rw57wi.wl%berend@pobox.com> From: Berend de Boer To: Kevin Day Subject: Re: EBS snapshot backups from a FreeBSD zfs file system: zpool freeze? In-Reply-To: <6488DECC-2455-4E92-B432-C39490D18484@dragondata.com> References: <87li5o5tz2.wl%berend@pobox.com> <87ehbg5raq.wl%berend@pobox.com> <20130703055047.GA54853@icarus.home.lan> <6488DECC-2455-4E92-B432-C39490D18484@dragondata.com> User-Agent: Wanderlust/2.15.9 (Almost Unreal) SEMI-EPG/1.14.7 (Harue) FLIM/1.14.9 (=?UTF-8?B?R29qxY0=?=) APEL/10.8 EasyPG/1.0.0 Emacs/24.3 (i686-pc-linux-gnu) MULE/6.0 (HANACHIRUSATO) Organization: Xplain Technology Ltd MIME-Version: 1.0 (generated by SEMI-EPG 1.14.7 - "Harue") Content-Type: multipart/signed; boundary="pgp-sign-Multipart_Wed_Jul__3_21:07:09_2013-1"; micalg=pgp-sha256; protocol="application/pgp-signature" Content-Transfer-Encoding: 7bit X-Pobox-Relay-ID: F28864FC-E3BF-11E2-B2CC-E84251E3A03C-48001098!b-pb-sasl-quonix.pobox.com Cc: freebsd-fs X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 03 Jul 2013 09:07:14 -0000 --pgp-sign-Multipart_Wed_Jul__3_21:07:09_2013-1 Content-Type: text/plain; charset=US-ASCII

>>>>> "Kevin" == Kevin Day writes:

Kevin> The other option is to not freeze the filesystem before
Kevin> taking the snapshot, but again you risk leaving things in
Kevin> an inconsistent state, and/or the last few writes you think
Kevin> you made didn't actually get committed to disk yet. For
Kevin> automated systems that create then clone filesystems for
Kevin> new VMs, this can be a big problem. At best, you're going
Kevin> to get a warning that the filesystem wasn't cleanly
Kevin> unmounted.

But I have a couple of disks, and the EBS snapshots are taken one after another, with a few ms between them, so I don't think ZFS will recover from this.

But indeed, you have answered my question very well, thanks!!
-- All the best, Berend de Boer ------------------------------------------------------ Awesome Drupal hosting: https://www.xplainhosting.com/ --pgp-sign-Multipart_Wed_Jul__3_21:07:09_2013-1 Content-Type: application/pgp-signature Content-Transfer-Encoding: 7bit Content-Description: OpenPGP Digital Signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.17 (GNU/Linux) iQIcBAABCAAGBQJR0+m9AAoJEKOfeD48G3g5MxcP/j4X0p4oH4NDKcNk/hgW+7qL Lz9xxHME2sLQfjw8g28KHNTQvQ1nAmNocaupQ/9gL/W5UEw+JYC/fKg64CELdmcV YAejDZydSszktK1tuIWZh5EnZQHR7O3q2rFG2qlD61s96ssRxLKWIKIJD113HCmL jxYR/VXJpaSZZANGEZ31krwVohIZjFFYv0aRGkMf4q5CwcBUdM1pDoTjtSgD7GK1 dkJupi2T0uz1S/TF9tjpL5D5GlrOTL8xh9q0Q+m2FvQ2p+YitbIFJeBfbSKkVYp6 uHPxs3TJxWUBTHngad57Ze36tUpod+E6yPCTq6z4xuonwGRY1c84F9K2IrdYwjqm qelVdKx0cNqdB2jSMDlJ2pTH04zG7prmbUZJtOQJZhTLUp1Z4Lp/KoUko7wdGHii CN1tShBnDmi2oACSuAdcPau3FEzuV6j4PlWUY0z87z0oEZWuuA/jbMJ7Y2BrzOGz vrznHytsodU/rInUIMLDviN2zbBo+59nspSWoc2otfoqVNUi0SFnkw07M5ku/GMK PF4UB1y32oNXVXCvc9LFuDM4wVvpmxEd5/hXBlAxqL3LsThsn56p6tt2WsoKlO/W DkKLTm5RS6EOHBaSLnoG5LwD39NsUPZxNNbbO/1w/IaHri4p+QAny+v03JJSnvb4 0/rL0FSEUrPRe81lGbmd =f58n -----END PGP SIGNATURE----- --pgp-sign-Multipart_Wed_Jul__3_21:07:09_2013-1-- From owner-freebsd-fs@FreeBSD.ORG Wed Jul 3 09:10:11 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 65EB6836 for ; Wed, 3 Jul 2013 09:10:11 +0000 (UTC) (envelope-from berend@pobox.com) Received: from smtp.pobox.com (b-pb-sasl-quonix.pobox.com [208.72.237.35]) by mx1.freebsd.org (Postfix) with ESMTP id 2FBF517A1 for ; Wed, 3 Jul 2013 09:10:10 +0000 (UTC) Received: from smtp.pobox.com (unknown [127.0.0.1]) by b-sasl-quonix.pobox.com (Postfix) with ESMTP id 182BA2B7F5; Wed, 3 Jul 2013 09:10:10 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=pobox.com; h=date :message-id:from:to:cc:subject:in-reply-to:references :mime-version:content-type:content-transfer-encoding; s=sasl; 
bh=QBiUvder4E+pIS+sdk5UQ16DAmY=; b=Qns0jgD+EKIdUUvSHjhUkaOehnDI azOT7Jkyvvn0yxVxFjZStscspD4L+DWw6L/+gh6+kDaSNmTCQtAL9U6uQPJPOvX2 bkDrkAuuCSZz6zMv7F/JLrYxqp2jpV2RFaMCv2d9H1+yyF8rBWWkXWiIklP5JcLM i/nfGexPJaPJHgc= DomainKey-Signature: a=rsa-sha1; c=nofws; d=pobox.com; h=date:message-id :from:to:cc:subject:in-reply-to:references:mime-version :content-type:content-transfer-encoding; q=dns; s=sasl; b=WK7rSi t4OWnK62O5UKygV/sO8lSEqTL73KD5vffC2gjjnmt2LcAggr6AjZIjjGMID517X6 1BNIk1lWfPzSqlul0N0IRsu+cjArdxkr4FW0AwyoCi1ucKYyKC+oitGq31uT8Uvr 0C5O+SyBCGnWPo7Z2Q5DJPDvs29/kPBvOhHe8= Received: from b-pb-sasl-quonix.pobox.com (unknown [127.0.0.1]) by b-sasl-quonix.pobox.com (Postfix) with ESMTP id 0EC5D2B7F2; Wed, 3 Jul 2013 09:10:10 +0000 (UTC) Received: from bmach.nederware.nl (unknown [27.252.247.84]) by b-sasl-quonix.pobox.com (Postfix) with ESMTPA id 2C6C52B7E9; Wed, 3 Jul 2013 09:10:09 +0000 (UTC) Received: from quadrio.nederware.nl (quadrio.nederware.nl [192.168.33.13]) by bmach.nederware.nl (Postfix) with ESMTP id 49BC75C6A; Wed, 3 Jul 2013 21:10:02 +1200 (NZST) Received: from quadrio.nederware.nl (quadrio.nederware.nl [127.0.0.1]) by quadrio.nederware.nl (Postfix) with ESMTP id DB7BF4A11BF3; Wed, 3 Jul 2013 21:10:06 +1200 (NZST) Date: Wed, 03 Jul 2013 21:10:06 +1200 Message-ID: <871u7g57rl.wl%berend@pobox.com> From: Berend de Boer To: Kevin Day Subject: Re: EBS snapshot backups from a FreeBSD zfs file system: zpool freeze? 
In-Reply-To: References: <87li5o5tz2.wl%berend@pobox.com> <87ehbg5raq.wl%berend@pobox.com> <20130703055047.GA54853@icarus.home.lan> <6488DECC-2455-4E92-B432-C39490D18484@dragondata.com> User-Agent: Wanderlust/2.15.9 (Almost Unreal) SEMI-EPG/1.14.7 (Harue) FLIM/1.14.9 (=?UTF-8?B?R29qxY0=?=) APEL/10.8 EasyPG/1.0.0 Emacs/24.3 (i686-pc-linux-gnu) MULE/6.0 (HANACHIRUSATO) Organization: Xplain Technology Ltd MIME-Version: 1.0 (generated by SEMI-EPG 1.14.7 - "Harue") Content-Type: multipart/signed; boundary="pgp-sign-Multipart_Wed_Jul__3_21:10:06_2013-1"; micalg=pgp-sha256; protocol="application/pgp-signature" Content-Transfer-Encoding: 7bit X-Pobox-Relay-ID: 5C09EC20-E3C0-11E2-9738-E84251E3A03C-48001098!b-pb-sasl-quonix.pobox.com Cc: freebsd-fs X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 03 Jul 2013 09:10:11 -0000 --pgp-sign-Multipart_Wed_Jul__3_21:10:06_2013-1 Content-Type: text/plain; charset=US-ASCII

>>>>> "Kevin" == Kevin Day writes:

Kevin> I know this is a sort of obscure use case, but Linux and
Kevin> Windows both have this functionality that VMWare will use
Kevin> if present (and the guest tools know about it).

May I correct you? This is not obscure. This is an extremely common use case on Linux, e.g. with LVM or XFS file systems, and definitely best practice on Amazon AWS. If you're not doing it this way, you're doing it wrong.

Kevin> Linux goes a step further and ensures that it's not in the
Kevin> middle of writing anything to swap during the quiesce
Kevin> period, too. I don't think this would be terribly difficult
Kevin> to implement, a hook somewhere along the write chain that
Kevin> blocks (or queues up) anything trying to write until the
Kevin> unfreeze comes along, but I'm guessing there are all sorts
Kevin> of deadlock opportunities here.
Kevin> Either way, I'm not asking that anyone spend time to write Kevin> this, I'm just trying to reword what the original requestor Kevin> was talking about. Heh, I'm asking that actually :-) Would be great to have this in FreeBSD. Once you have used EBS snapshots, you really don't want to go back. -- All the best, Berend de Boer ------------------------------------------------------ Awesome Drupal hosting: https://www.xplainhosting.com/ --pgp-sign-Multipart_Wed_Jul__3_21:10:06_2013-1 Content-Type: application/pgp-signature Content-Transfer-Encoding: 7bit Content-Description: OpenPGP Digital Signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.17 (GNU/Linux) iQIcBAABCAAGBQJR0+puAAoJEKOfeD48G3g5e64P/j5hU60lS9gxBsXiXfCpbROl sRoFgDxVLrH69Qf8RdoHcyhiS+NnLMfmdExJL/Q9/YkhdjL1oZjy3lDNJR3g+lg4 9RVzkVaYcLLUeXyHojZ1aafVUEOlj5dCl27BV96NNULUQYzgHE2z47kRPaeL6F4v ozf38ZeERtTp67d2iuIG2ftX96BawUH6ow1X2r+aCxZpMYdUigHYKHedgUz7n6Sc A5zVweP5QQMxSfJo+WQ70pjRn7k/MTgHMsnY5i75woI1UOdgvx+FO+Zd5pxANiu8 SaUa2I1N1ysRh1L4aghRS2d1j8B8kDO8IJkVpPOVBGGAM4ovZwGa5ZDiucdAlV3p 7jIrforxug8tnDuBIK7yI3l0N0O1SjpSKcBUCWDJKSYjzDLw4cCJOFzFkhvdkFi3 JbClsTqH8HgbOc1QFrX7qIyKTL60MY+i7cTaViw3NPIhWyvefpi9X5OEORiGfxXF p+ajMyt48dGPsIp6ZGeOGsIa40oPTpmZjNwXGi10etmk2wsHmbmbS0KfEPMPW6Te s2vpMj5sl/vsPGjzKlBWMJ9BWCGCN79oKUUn+93SjbH/Ely2fgjViJQO3IhnLxf6 23uJm4g6oqIWBzbH/T+NkMjIC8RAE5SSK50Q7rN0E+RvX4wrqFNqzLjc+ceOLmMr Mm4O5qWv2aFvHeTvkK2Y =6lk6 -----END PGP SIGNATURE----- --pgp-sign-Multipart_Wed_Jul__3_21:10:06_2013-1-- From owner-freebsd-fs@FreeBSD.ORG Wed Jul 3 09:15:35 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 3308D909 for ; Wed, 3 Jul 2013 09:15:35 +0000 (UTC) (envelope-from gerrit.kuehn@aei.mpg.de) Received: from umail.aei.mpg.de (umail.aei.mpg.de [194.94.224.6]) by mx1.freebsd.org (Postfix) with ESMTP id E0E2717EF for ; Wed, 3 Jul 2013 09:15:34 +0000 (UTC) Received: from 
mailgate.aei.mpg.de (mailgate.aei.mpg.de [194.94.224.5]) by umail.aei.mpg.de (Postfix) with ESMTP id 2843D200A07; Wed, 3 Jul 2013 11:15:34 +0200 (CEST) Received: from mailgate.aei.mpg.de (localhost [127.0.0.1]) by localhost (Postfix) with SMTP id 1BDDC405889; Wed, 3 Jul 2013 11:15:34 +0200 (CEST) Received: from intranet.aei.uni-hannover.de (ahin1.aei.uni-hannover.de [130.75.117.40]) by mailgate.aei.mpg.de (Postfix) with ESMTP id E9792406AF1; Wed, 3 Jul 2013 11:15:33 +0200 (CEST) Received: from cascade.aei.uni-hannover.de ([130.75.117.3]) by intranet.aei.uni-hannover.de (Lotus Domino Release 8.5.3) with ESMTP id 2013070311152379-78180 ; Wed, 3 Jul 2013 11:15:23 +0200 Date: Wed, 3 Jul 2013 11:15:23 +0200 From: Gerrit =?ISO-8859-1?Q?K=FChn?= To: Jeremy Chadwick Subject: Re: pwd in zfs snapshot Message-Id: <20130703111523.a94aa4c1.gerrit.kuehn@aei.mpg.de> In-Reply-To: <20130703085629.GA59068@icarus.home.lan> References: <20130703101900.4191f56e.gerrit.kuehn@aei.mpg.de> <20130703085629.GA59068@icarus.home.lan> Organization: Max Planck Gesellschaft X-Mailer: Sylpheed 3.1.3 (GTK+ 2.24.6; amd64-portbld-freebsd8.2) Mime-Version: 1.0 X-MIMETrack: Itemize by SMTP Server on intranet/aei-hannover(Release 8.5.3|September 15, 2011) at 07/03/2013 11:15:23, Serialize by Router on intranet/aei-hannover(Release 8.5.3|September 15, 2011) at 07/03/2013 11:15:33, Serialize complete at 07/03/2013 11:15:33 Content-Transfer-Encoding: 7bit Content-Type: text/plain; charset=US-ASCII X-PMX-Version: 6.0.2.2308539, Antispam-Engine: 2.7.2.2107409, Antispam-Data: 2013.7.3.90318 X-PerlMx-Spam: Gauge=IIIIIIII, Probability=8%, Report=' HTML_00_01 0.05, HTML_00_10 0.05, MIME_LOWER_CASE 0.05, BODYTEXTP_SIZE_3000_LESS 0, BODY_SIZE_1100_1199 0, BODY_SIZE_2000_LESS 0, BODY_SIZE_5000_LESS 0, BODY_SIZE_7000_LESS 0, __ANY_URI 0, __BOUNCE_CHALLENGE_SUBJ 0, __BOUNCE_NDR_SUBJ_EXEMPT 0, __CT 0, __CTE 0, __CT_TEXT_PLAIN 0, __HAS_FROM 0, __HAS_MSGID 0, __HAS_X_MAILER 0, __IN_REP_TO 0, __MIME_TEXT_ONLY 0, 
__MIME_VERSION 0, __SANE_MSGID 0, __SUBJ_ALPHA_END 0, __SUBJ_ALPHA_NEGATE 0, __TO_MALFORMED_2 0, __URI_NO_PATH 0, __URI_NO_WWW 0, __URI_NS ' Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 03 Jul 2013 09:15:35 -0000

On Wed, 3 Jul 2013 01:56:29 -0700 Jeremy Chadwick wrote about Re: pwd in zfs snapshot:

JC> This would imply the issue may be relieved by setting snapdir=visible
JC> on the filesystem. If it works as a workaround, great, but be aware
JC> this may not be ideal for everyone.

JC> Please try it and report back.

I can confirm that setting snapdir=visible makes the issue go away. On the other hand, I would actually prefer to keep the snapshots hidden. My real problem is with rsync, which insists on calling getcwd() before doing any work:

root@pt-storage:/data/.zfs/snapshot/daily.0 # rsync -Wavn . /
rsync: getcwd(): No such file or directory (2)
rsync error: errors selecting input/output files, dirs (code 3) at util.c (1002) [Receiver=3.0.9]

This even happens when rsync is merely invoked from inside the snapshot directory, without operating on it at all:

root@pt-storage:/data/.zfs/snapshot/daily.0 # rsync -Wavn /root/ /
rsync: getcwd(): No such file or directory (2)
rsync error: errors selecting input/output files, dirs (code 3) at util.c (1002) [Receiver=3.0.9]

cu
Gerrit

From owner-freebsd-fs@FreeBSD.ORG Wed Jul 3 09:15:55 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 9D088978 for ; Wed, 3 Jul 2013 09:15:55 +0000 (UTC) (envelope-from berend@pobox.com) Received: from smtp.pobox.com (b-pb-sasl-quonix.pobox.com [208.72.237.35]) by mx1.freebsd.org (Postfix) with ESMTP id 6109517F4 for ; Wed, 3 Jul 2013 09:15:55 +0000 (UTC) Received: from smtp.pobox.com (unknown [127.0.0.1]) by
b-sasl-quonix.pobox.com (Postfix) with ESMTP id 652F62BBBB; Wed, 3 Jul 2013 09:15:53 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=pobox.com; h=date :message-id:from:to:cc:subject:in-reply-to:references :mime-version:content-type:content-transfer-encoding; s=sasl; bh=qxdmMsMlfpOWjnc9v4/F8GjCju8=; b=Ke8XcB/s9ch50weu/qT6WO+Z4I43 WNQ13tkPv7sC6DJ5fnczqJRseM6v75PgkF8bgmanNMrsL2TRCuQhFR6BmcvTnW/R YMZAiC2+Mrzo/Mr6tjefGVYoa29VNt8JnhzU97iTu8GPWLOBmZIZj8Is2sH6ppyG OQ6qHCKBK+4QgeU= DomainKey-Signature: a=rsa-sha1; c=nofws; d=pobox.com; h=date:message-id :from:to:cc:subject:in-reply-to:references:mime-version :content-type:content-transfer-encoding; q=dns; s=sasl; b=llVeJi Lfp6mcmnRoI6WXeesFaGl/oHy2AZCNAofbx9+MXwO1IwqVMsUqxUsKRN5WMnDxrP QanFHUBzwZWuA5BwtSHFBZ63X1AL2rylWvv6niKFF1UV3E6nTXjiYgM09A6po5nM x9mdLdICoUVZ5biktQMtnI8uxHq/s89JMpOHI= Received: from b-pb-sasl-quonix.pobox.com (unknown [127.0.0.1]) by b-sasl-quonix.pobox.com (Postfix) with ESMTP id 59EB42BBBA; Wed, 3 Jul 2013 09:15:53 +0000 (UTC) Received: from bmach.nederware.nl (unknown [27.252.247.84]) by b-sasl-quonix.pobox.com (Postfix) with ESMTPA id 73A152BBB4; Wed, 3 Jul 2013 09:15:52 +0000 (UTC) Received: from quadrio.nederware.nl (quadrio.nederware.nl [192.168.33.13]) by bmach.nederware.nl (Postfix) with ESMTP id 947055C6A; Wed, 3 Jul 2013 21:15:45 +1200 (NZST) Received: from quadrio.nederware.nl (quadrio.nederware.nl [127.0.0.1]) by quadrio.nederware.nl (Postfix) with ESMTP id 317FA4A11BF3; Wed, 3 Jul 2013 21:15:50 +1200 (NZST) Date: Wed, 03 Jul 2013 21:15:50 +1200 Message-ID: <87zju43sxl.wl%berend@pobox.com> From: Berend de Boer To: Markus Gebert Subject: Re: EBS snapshot backups from a FreeBSD zfs file system: zpool freeze? 
In-Reply-To: <14A2336A-969C-4A13-9EFA-C0C42A12039F@hostpoint.ch> References: <87li5o5tz2.wl%berend@pobox.com> <87ehbg5raq.wl%berend@pobox.com> <20130703055047.GA54853@icarus.home.lan> <6488DECC-2455-4E92-B432-C39490D18484@dragondata.com> <14A2336A-969C-4A13-9EFA-C0C42A12039F@hostpoint.ch> User-Agent: Wanderlust/2.15.9 (Almost Unreal) SEMI-EPG/1.14.7 (Harue) FLIM/1.14.9 (=?UTF-8?B?R29qxY0=?=) APEL/10.8 EasyPG/1.0.0 Emacs/24.3 (i686-pc-linux-gnu) MULE/6.0 (HANACHIRUSATO) Organization: Xplain Technology Ltd MIME-Version: 1.0 (generated by SEMI-EPG 1.14.7 - "Harue") Content-Type: multipart/signed; boundary="pgp-sign-Multipart_Wed_Jul__3_21:15:49_2013-1"; micalg=pgp-sha256; protocol="application/pgp-signature" Content-Transfer-Encoding: 7bit X-Pobox-Relay-ID: 28A8F8B6-E3C1-11E2-8E6E-E84251E3A03C-48001098!b-pb-sasl-quonix.pobox.com Cc: freebsd-fs X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 03 Jul 2013 09:15:55 -0000 --pgp-sign-Multipart_Wed_Jul__3_21:15:49_2013-1 Content-Type: text/plain; charset=US-ASCII

>>>>> "Markus" == Markus Gebert writes:

Markus> 1. snapshot the zfs at the same point in time you'd issue
Markus> that ioctl on Linux
Markus> 2. take the EBS snapshot at any time
Markus> 3. clone the EBS snapshot to the new/other VM
Markus> 4. zfs import the pool there
Markus> 5. zfs rollback the filesystem to the snapshot taken in
Markus> step 1 (or clone it and use that)

That seems like a very good first step! It's unfortunately not automatic, but for recovery purposes it should do.

Do you think (yes, I will definitely test this) that ZFS can mount a file system consisting of a couple of disks (raidz2 setup), and access it even though every disk might be a backup taken at a slightly different time?
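Spelled out as commands, the workflow quoted above might look like the following. Pool, dataset, and snapshot names are placeholders; this is a sketch under those assumptions, not a tested recipe:

```shell
# Step 1, on the source VM: take an atomic ZFS snapshot.
zfs snapshot data@pre-ebs

# Steps 2-3: take EBS snapshots of all backing volumes at any later
# time, clone them, and attach the clones to the recovery VM.

# Step 4, on the recovery VM: import the pool from the attached clones.
# -f is needed because the pool was never exported by the source host.
zpool import -f data

# Step 5: discard everything written after the ZFS snapshot. Note that
# rollback operates per dataset; -r here destroys snapshots newer than
# the target, it does not recurse into child filesystems.
zfs rollback -r data@pre-ebs
```

For a pool with several filesystems, the snapshot in step 1 would be taken with `zfs snapshot -r` and the rollback repeated for each dataset.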
Obviously I'm going to throw away the mounted state and rollback to my snapshot, but it has to be able to mount a set of disks which might be in a terrible state first. Markus> Also, taking the zfs snapshot should take much less time, Markus> because you don't have to wait for the EBS snapshot to Markus> complete before you can resume IO on the filesystem. So Markus> you don't even depend on EBS snapshots being quick when Markus> using the zfs approach, a big advantage in my opinion. You don't have to wait for an EBS snapshot to complete. That can take hours. EBS simply takes the moment in time you give the command, and starts the backup from there. Normal I/O to the disk continues (so uses some kind of COW system I suppose) -- All the best, Berend de Boer ------------------------------------------------------ Awesome Drupal hosting: https://www.xplainhosting.com/ --pgp-sign-Multipart_Wed_Jul__3_21:15:49_2013-1 Content-Type: application/pgp-signature Content-Transfer-Encoding: 7bit Content-Description: OpenPGP Digital Signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.17 (GNU/Linux) iQIcBAABCAAGBQJR0+vFAAoJEKOfeD48G3g5LNoQAJ6TW0/Ui83dhd+vfklNotqA sh4ukK+2dAFmW4C33R0bZSfEHfZN+SMz/UQROAKAjwK/KveeoLDD9nXryrD9Pm/K TN9k4zXoauiKan6XQZShqtxhTKEjG6BnB+mLwLcWkQN1160AP3Da3ykKimrSfOi0 PskaMcgLj0DXCX2e8a/up77zC9ljIWNqhbJmxaggRxKUyPCeMVWQjouB72VXv27D essxiMDLpMHdZpsWnNEFAKYKjqpFWfqsWINO9jifwQqnD1mEa/SZDHClXyanBbBK OfA46Ocq8hyl+ALF6RnfAsY/Ff91roYVcklPs7zySGYf9hZbgxTyUTUDDaGUSaDA DhWT92aMopqLXnNtEfHxPJpAYvDQEspUWwGzVXhd8rFQ2FxQnVCnuplCV7zPBOIM 0RFeg1Q6gewurQvqTCxQa6vhvAP+Aely8jTepYSP9lcpZN5etX7zIxStBjKM7HqB /HCsfuNnDXHK/9q/T14/qL8aqTPTHoV0iiBVIpTF3SDPwxYaiIPEp4nr2Lra+JKp nuYLLff7F/SUoydEjxWP1mjRDFbRbbrXXM+wahWQzzLr8sl9DYV2fq/7D0GEFdAT WNflpL4K8niCJ0uzVTbrYanDcgeX0vPtYt0Y0FLFRteyZ91unhdX21WxzjSu0kt3 y96xgA+EuRNq+I13VIrJ =cXQg -----END PGP SIGNATURE----- --pgp-sign-Multipart_Wed_Jul__3_21:15:49_2013-1-- From owner-freebsd-fs@FreeBSD.ORG Wed Jul 3 09:17:47 2013 Return-Path: 
Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 371F7A12; Wed, 3 Jul 2013 09:17:47 +0000 (UTC) (envelope-from berend@pobox.com) Received: from smtp.pobox.com (b-pb-sasl-quonix.pobox.com [208.72.237.35]) by mx1.freebsd.org (Postfix) with ESMTP id E9493180E; Wed, 3 Jul 2013 09:17:46 +0000 (UTC) Received: from smtp.pobox.com (unknown [127.0.0.1]) by b-sasl-quonix.pobox.com (Postfix) with ESMTP id 534282BCD1; Wed, 3 Jul 2013 09:17:46 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=pobox.com; h=date :message-id:from:to:cc:subject:in-reply-to:references :mime-version:content-type:content-transfer-encoding; s=sasl; bh=14Q4zOr6OS7wtTFd3IBCzmRAcKQ=; b=mo0XOIIOYKF/aTaGBwp9sbIY5MBt 5tTtcNTLk61CtxJ8Y0c/XNqQU+qZD/qtMdnwmIGXQa96srm6LPerssG+orp958rX s/C8RaP1xKcf7es2xPP0GlHJxEP4o1ccZv+gTXzOTZOon47craLnrlOj6J3NC1KA GYrOsOFZFRt7svQ= DomainKey-Signature: a=rsa-sha1; c=nofws; d=pobox.com; h=date:message-id :from:to:cc:subject:in-reply-to:references:mime-version :content-type:content-transfer-encoding; q=dns; s=sasl; b=YEjJFV xT3dxlvTngtXBb6asSMyQHMTkji/sR6FPtBsT4tdeKAdqcoEk8RSzDtPvXTN6b5x eIm7KaWdLpL0/57TJlyPyzVDHlQnK1+1gNqar6hQW3a0980N6uBMANeguma78Anw MtpHDcquJzpe7HLebI4Y7Ru8r3t9Qu6RYcHug= Received: from b-pb-sasl-quonix.pobox.com (unknown [127.0.0.1]) by b-sasl-quonix.pobox.com (Postfix) with ESMTP id 477552BCCF; Wed, 3 Jul 2013 09:17:46 +0000 (UTC) Received: from bmach.nederware.nl (unknown [27.252.247.84]) by b-sasl-quonix.pobox.com (Postfix) with ESMTPA id B83AE2BCCE; Wed, 3 Jul 2013 09:17:45 +0000 (UTC) Received: from quadrio.nederware.nl (quadrio.nederware.nl [192.168.33.13]) by bmach.nederware.nl (Postfix) with ESMTP id D9C6A5C81; Wed, 3 Jul 2013 21:17:38 +1200 (NZST) Received: from quadrio.nederware.nl (quadrio.nederware.nl [127.0.0.1]) by quadrio.nederware.nl (Postfix) with ESMTP id 7D39A4A11BF3; Wed, 3 Jul 2013 21:17:43 +1200 (NZST) Date: Wed, 03 Jul 2013 21:17:43 +1200 
Message-ID: <87y59o3sug.wl%berend@pobox.com> From: Berend de Boer To: David Xu Subject: Re: EBS snapshot backups from a FreeBSD zfs file system: zpool freeze? In-Reply-To: <51D3C2F4.8010907@freebsd.org> References: <87li5o5tz2.wl%berend@pobox.com> <51D3C2F4.8010907@freebsd.org> User-Agent: Wanderlust/2.15.9 (Almost Unreal) SEMI-EPG/1.14.7 (Harue) FLIM/1.14.9 (=?UTF-8?B?R29qxY0=?=) APEL/10.8 EasyPG/1.0.0 Emacs/24.3 (i686-pc-linux-gnu) MULE/6.0 (HANACHIRUSATO) Organization: Xplain Technology Ltd MIME-Version: 1.0 (generated by SEMI-EPG 1.14.7 - "Harue") Content-Type: multipart/signed; boundary="pgp-sign-Multipart_Wed_Jul__3_21:17:43_2013-1"; micalg=pgp-sha256; protocol="application/pgp-signature" Content-Transfer-Encoding: 7bit X-Pobox-Relay-ID: 6C2D6874-E3C1-11E2-B70B-E84251E3A03C-48001098!b-pb-sasl-quonix.pobox.com Cc: freebsd-fs@FreeBSD.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 03 Jul 2013 09:17:47 -0000 --pgp-sign-Multipart_Wed_Jul__3_21:17:43_2013-1 Content-Type: text/plain; charset=US-ASCII

>>>>> "David" == David Xu writes:

David> What you need is a tool to create snapshot on EBS server,

FYI, EBS is not a server; it is network-attached block storage.
-- All the best, Berend de Boer ------------------------------------------------------ Awesome Drupal hosting: https://www.xplainhosting.com/ --pgp-sign-Multipart_Wed_Jul__3_21:17:43_2013-1 Content-Type: application/pgp-signature Content-Transfer-Encoding: 7bit Content-Description: OpenPGP Digital Signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.17 (GNU/Linux) iQIcBAABCAAGBQJR0+w3AAoJEKOfeD48G3g5oUoP/RSPAaiYfHksQG+aArnh92SI rnXoVInUGrjEhasxObP6uQRriSxLUxgBjpmjngAxFn4C3yGuxt2Mya0ReE49JyrE Lx4PsCr3WWkIwYtzmd2zI7iT8N3DkripijboldNTduz4olcr4aCnRtwzBcaPhzpv pklpN32hAaJ+QcGwaJ6JzdFXqscXQuUgIgRZRHHaIR0PcMJvSBjodoT+1gDoBqBO ScgV2eeXxp1Z72ZAhqZJA6njOufxF7+cEh8wol19YsR/hAOm6nJXCYconZoftKC1 m3jRcuUNUenoobjDe9Ln5zdlS+nyQ5s7V+2+S91kx4vwpJ2hLZRA3C7Eld1Ti2F+ sW30FM03Bi6O5CVhr/LdI1U5q+PVAyH80Db7FYLe+CGac1uSn16ZpH4qZ8LQVfGP WZcgIAbhXz3rQrNf+uWD9LZYr5Fi3ZaZRsOSnj6pSE08exJzoUILOSPKkj2trXjR 7gt+5S0vyTrYaKWPuJHD5p+83hhMzev8w1ZLl5ZJmW8zqSmBMlNyspcHEv8LxYRp S2kK7UJaeWx1ifdowDsNtDD/IMmDRkCVXF7fZ3CN7jxG2zOUIi11rGxMJ3RrDowP lyONUym/Ys0IXVb1HUzM3YsFK0v0OzkCV1Q6ePSt6FXsQgOw2MPW0gHcM1JPDHKd zepYry3f4N5VX/3cjCNU =pcfB -----END PGP SIGNATURE----- --pgp-sign-Multipart_Wed_Jul__3_21:17:43_2013-1-- From owner-freebsd-fs@FreeBSD.ORG Wed Jul 3 09:22:31 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 51260ABB for ; Wed, 3 Jul 2013 09:22:31 +0000 (UTC) (envelope-from shuriku@shurik.kiev.ua) Received: from graal.it-profi.org.ua (graal.shurik.kiev.ua [193.239.74.7]) by mx1.freebsd.org (Postfix) with ESMTP id 113601835 for ; Wed, 3 Jul 2013 09:22:30 +0000 (UTC) Received: from [217.76.201.82] (helo=thinkpad.it-profi.org.ua) by graal.it-profi.org.ua with esmtpa (Exim 4.80.1 (FreeBSD)) (envelope-from ) id 1UuJGe-0009G9-RW for freebsd-fs@freebsd.org; Wed, 03 Jul 2013 12:22:29 +0300 Message-ID: <51D3ED4F.5030102@shurik.kiev.ua> Date: Wed, 03 Jul 2013 12:22:23 +0300 From: Alexandr 
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:17.0) Gecko/20130630 Thunderbird/17.0.7 MIME-Version: 1.0 To: freebsd-fs@freebsd.org Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-SA-Exim-Connect-IP: 217.76.201.82 X-SA-Exim-Mail-From: shuriku@shurik.kiev.ua X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on graal.it-profi.org.ua X-Spam-Level: X-Spam-Status: No, score=-1.0 required=5.0 tests=ALL_TRUSTED autolearn=unavailable version=3.3.2 Subject: Whole disk ZFS or -a4k partition X-SA-Exim-Version: 4.2 X-SA-Exim-Scanned: Yes (on graal.it-profi.org.ua) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 03 Jul 2013 09:22:31 -0000 Hello, community! I have a laptop with 2 disks - an mSATA 32 GB SSD and a 500 GB HDD. I plan to use the SSD as a whole-disk ZFS pool with root and /usr, and the 500 GB SATA drive as a ZFS pool for /home and /var. My questions are: 1. Do I need to create a 4k-aligned partition on the HDD for the ZFS pool, or is it best to use the whole disk? 2. Where is the best place for swap - SSD or HDD? The SSD is much faster, but has a limited write life cycle.
From owner-freebsd-fs@FreeBSD.ORG Wed Jul 3 09:39:22 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id D83FF135 for ; Wed, 3 Jul 2013 09:39:22 +0000 (UTC) (envelope-from prvs=189683e577=killing@multiplay.co.uk) Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) by mx1.freebsd.org (Postfix) with ESMTP id 7E6E318D8 for ; Wed, 3 Jul 2013 09:39:22 +0000 (UTC) Received: from r2d2 ([82.69.141.170]) by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (MDaemon PRO v10.0.4) with ESMTP id md50004694576.msg for ; Wed, 03 Jul 2013 10:39:20 +0100 X-Spam-Processed: mail1.multiplay.co.uk, Wed, 03 Jul 2013 10:39:20 +0100 (not processed: message from valid local sender) X-MDDKIM-Result: neutral (mail1.multiplay.co.uk) X-MDRemoteIP: 82.69.141.170 X-Return-Path: prvs=189683e577=killing@multiplay.co.uk X-Envelope-From: killing@multiplay.co.uk X-MDaemon-Deliver-To: freebsd-fs@freebsd.org Message-ID: From: "Steven Hartland" To: "Alexandr" , References: <51D3ED4F.5030102@shurik.kiev.ua> Subject: Re: Whole disk ZFS or -a4k partition Date: Wed, 3 Jul 2013 10:39:28 +0100 MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=original Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.5931 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6157 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 03 Jul 2013 09:39:22 -0000 ----- Original Message ----- From: "Alexandr" > Hello, community! > > I have a laptop with 2 disks - mSATA 32Gb SSD and 500Gb HDD. I plan to > use SSD as whole disk ZFS pool with root and /usr partitions and 500Gb > SATA as ZFS pool for /home and /var. My questions are: > > 1. 
Do I need to create a 4k aligned partition on HDD for zfs pool or > best to use a whole disk? As you need a boot partition you will need to 4k align, and ensure the zpool is created with 4k sectors using the gnop hack. > 2. Where the best place for swap - ssd or hdd? SSD is much faster, but > with limited write life cycle. I'd use the SSD for the performance, as in general you won't see significantly higher wear levelling than from general use unless you're really low on RAM, and then you will want it as quick as possible anyway. If you want ZFS TRIM ensure you use a recent stable/8, stable/9 or head. Regards Steve ================================================ This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmaster@multiplay.co.uk.
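The gnop hack Steve mentions can be sketched as below. The device and pool names are assumptions for illustration, not from this thread; the RUN variable keeps it a dry run (it prints each command instead of executing), so the sequence can be reviewed before running it for real:

```shell
# Dry-run sketch of creating a zpool with 4k sectors via gnop.
# DISK and POOL are hypothetical names; adjust for your system.
RUN=${RUN:-echo}            # set RUN= (empty) to actually execute
DISK=/dev/ada0p3            # assumed 4k-aligned freebsd-zfs partition
POOL=zroot

$RUN gnop create -S 4096 "${DISK}"         # expose a 4k-sector device
$RUN zpool create "${POOL}" "${DISK}.nop"  # pool is created with 4k sectors
$RUN zpool export "${POOL}"
$RUN gnop destroy "${DISK}.nop"            # remove the shim
$RUN zpool import "${POOL}"                # re-import via the real device
```

The temporary .nop device only matters at pool creation time; the sector size is baked into the pool, so the shim can be destroyed afterwards.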
From owner-freebsd-fs@FreeBSD.ORG Wed Jul 3 09:46:33 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 5BE85317 for ; Wed, 3 Jul 2013 09:46:33 +0000 (UTC) (envelope-from ml@my.gd) Received: from mail-wg0-f51.google.com (mail-wg0-f51.google.com [74.125.82.51]) by mx1.freebsd.org (Postfix) with ESMTP id E94051929 for ; Wed, 3 Jul 2013 09:46:32 +0000 (UTC) Received: by mail-wg0-f51.google.com with SMTP id e11so5438430wgh.6 for ; Wed, 03 Jul 2013 02:46:26 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=content-type:mime-version:subject:from:in-reply-to:date:cc :content-transfer-encoding:message-id:references:to:x-mailer :x-gm-message-state; bh=CtdMgi1De1j0SWWNmVAkbl877AVXTZI4c/fpKMS975M=; b=SQMncBFB2GpZGFpVDrVIOtXcCgVS7C2z6GIJEAzFJvBP/8MI2ZBj9BFa1Mng2UotT5 edti7pgz0NzD3X6dUCZ+ArtLsKLXj7OmQiHLNc9nErxjpv32Wh9v5bB2/pS5za904ZMD lDbxiqRrzoFokdJqrmMOUriEll83mdsPEcJKMVpjAKj1q7ZcjDX7a2489UMyWMwR61G3 B6G47OlD336mwfMV1sf6JVRC8ioE+J+whxNcaKFYGhAa+yGIfgMoi3F7RsC1/FxowwvZ 3UKPPiokG+v8ZmuE9AtkkOOum2jYLDJ7UcuCU7D3TZe7kNB6ikYm3Ir5z6QM3gb5Veip dWUQ== X-Received: by 10.180.20.116 with SMTP id m20mr17639181wie.46.1372844786332; Wed, 03 Jul 2013 02:46:26 -0700 (PDT) Received: from dfleuriot.paris.hi-media-techno.com ([83.167.62.196]) by mx.google.com with ESMTPSA id z6sm27553634wiv.11.2013.07.03.02.46.25 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Wed, 03 Jul 2013 02:46:25 -0700 (PDT) Content-Type: text/plain; charset=us-ascii Mime-Version: 1.0 (Mac OS X Mail 6.5 \(1508\)) Subject: Re: Whole disk ZFS or -a4k partition From: Fleuriot Damien In-Reply-To: <51D3ED4F.5030102@shurik.kiev.ua> Date: Wed, 3 Jul 2013 11:46:29 +0200 Content-Transfer-Encoding: quoted-printable Message-Id: References: <51D3ED4F.5030102@shurik.kiev.ua> To: Alexandr X-Mailer: Apple Mail (2.1508) X-Gm-Message-State: 
ALoCoQmnWal/bqBqjX+MX7KIrQzoM5qvlDMY+GnHiD/1Ym6rVNd+ptztZz8wBwPrNkQzAq7IQwhR Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 03 Jul 2013 09:46:33 -0000 On Jul 3, 2013, at 11:22 AM, Alexandr wrote: > Hello, community! > > I have a laptop with 2 disks - mSATA 32Gb SSD and 500Gb HDD. I plan to > use SSD as whole disk ZFS pool with root and /usr partitions and 500Gb > SATA as ZFS pool for /home and /var. My questions are: > > 1. Do I need to create a 4k aligned partition on HDD for zfs pool or > best to use a whole disk? > 2. Where the best place for swap - ssd or hdd? SSD is much faster, but > with limited write life cycle. > I advise against using the whole disk for the following reason: DISK1, manufacturer 1: 3tb (give or take) DISK2, manufacturer 2: 3tb (give or take MINUS 10 BYTES) Et voila, you can't use DISK1 with DISK2 because DISK2 is 10 bytes smaller and won't fit. When building RAID arrays, it's recommended to: - use disks from different manufacturers - use disks ordered at different times (for example half at T1, half at T2) - shave a few MBs off your disks, so that they all present the same size Here's an example from my nas at home: % gpart show /dev/ada6 => 34 5860533101 ada6 GPT (2.7T) 34 6 - free - (3.0k) 40 102400 1 freebsd-ufs (50M) 102440 5860430688 2 freebsd-zfs (2.7T) 5860533128 7 - free - (3.5k) Notice how the disk presents 5860533101 sectors of 512 bytes. Also notice how I've shaved off 50mbytes in the first, unused partition. This way, if I replace a disk later on and the new disk only has, say 5860533095 512-byte sectors, I can still use it in my pool.
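The arithmetic behind "shave a few MBs off" can be sketched as follows, using the sector count from the gpart output above. The 100 MB reserve is an assumption (any comfortable margin works); the point is to hold back enough sectors that a slightly smaller replacement disk still fits, while keeping the partition size a multiple of 8 sectors (4 KiB):

```shell
# Compute an undersized freebsd-zfs partition so a replacement disk
# a few MB smaller can still hold it.  Pure arithmetic, safe to run.
TOTAL=5860533101               # 512-byte sectors reported by gpart
RESERVE=$((100 * 2048))        # ~100 MB expressed in 512-byte sectors
ZFS_SECTORS=$(( (TOTAL - RESERVE) / 8 * 8 ))   # round down to 4 KiB multiple
echo "gpart add -t freebsd-zfs -a 4k -s ${ZFS_SECTORS} ada6"
```

The echoed gpart command is illustrative; on a real system you would also verify the result with gpart show before creating the pool.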
From owner-freebsd-fs@FreeBSD.ORG Wed Jul 3 09:56:40 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 29FD981B for ; Wed, 3 Jul 2013 09:56:40 +0000 (UTC) (envelope-from markus.gebert@hostpoint.ch) Received: from mail.adm.hostpoint.ch (mail.adm.hostpoint.ch [IPv6:2a00:d70:0:a::e0]) by mx1.freebsd.org (Postfix) with ESMTP id C345F19B2 for ; Wed, 3 Jul 2013 09:56:39 +0000 (UTC) Received: from [2001:1620:2013:1:bdf8:1930:3dc:492a] (port=49592) by mail.adm.hostpoint.ch with esmtpsa (TLSv1:AES128-SHA:128) (Exim 4.80.1 (FreeBSD)) (envelope-from ) id 1UuJnd-000JBw-H5; Wed, 03 Jul 2013 11:56:33 +0200 Content-Type: text/plain; charset=us-ascii Mime-Version: 1.0 (Mac OS X Mail 6.5 \(1508\)) Subject: Re: EBS snapshot backups from a FreeBSD zfs file system: zpool freeze? From: Markus Gebert In-Reply-To: <87zju43sxl.wl%berend@pobox.com> Date: Wed, 3 Jul 2013 11:55:51 +0200 Content-Transfer-Encoding: quoted-printable Message-Id: References: <87li5o5tz2.wl%berend@pobox.com> <87ehbg5raq.wl%berend@pobox.com> <20130703055047.GA54853@icarus.home.lan> <6488DECC-2455-4E92-B432-C39490D18484@dragondata.com> <14A2336A-969C-4A13-9EFA-C0C42A12039F@hostpoint.ch> <87zju43sxl.wl%berend@pobox.com> To: Berend de Boer X-Mailer: Apple Mail (2.1508) Cc: freebsd-fs X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 03 Jul 2013 09:56:40 -0000 On 03.07.2013, at 11:15, Berend de Boer wrote: >>>>>> "Markus" == Markus Gebert writes: > > Markus> 1. snapshot the zfs at the same point in time you'd issue > Markus> that ioctl on Linux > Markus> 2. take the EBS snapshot at any time > Markus> 3. clone the EBS snapshot to the new/other VM > Markus> 4. zfs import the pool there > Markus> 5.
zfs rollback the filesystem to > Markus> the snapshot taken in step 1 (or clone it and use that) > > That seems like a very good first step! > > It's unfortunately not automatic, but for recovery purposes it should > do. This is as automatic as you make it to be :-). But yes, the code that does that might not exist yet... > Do you think (yes, I will definitely test this), that ZFS can mount a > file system consisting of a couple of disks (raidz2 setup), and access > it even though every disk might be a backup taken at a slightly > different time? I'm not entirely sure. I've written the scenario above with one disk in mind, which works for sure. I know that zfs keeps around a certain amount of old transactions/uberblocks, so that in case it finds that the newest transaction can't be used on import for some reason, it can roll back to an older transaction (see the -F option of zpool import). This usually means data loss, but I guess that's a non-issue in your scenario, as you'll throw away data newer than your snapshot anyway, and the snapshot should be on disk when you take the EBS snapshot. Then again, please test this; I'm not sure whether the old transactions even help in this scenario. And if the time delta gets too big and you do too many writes in the meantime, zfs might not be able to import the pool if no common transaction can be found anymore. Of course it'd be safest to EBS snapshot all disks at the exact same time, but if I understand you correctly, there is no such functionality and the OS is expected to guarantee some kind of consistency between multiple related disks. > Obviously I'm going to throw away the mounted state and rollback to my > snapshot, but it has to be able to mount a set of disks which might be > in a terrible state first.
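The five-step recovery flow quoted above (zfs snapshot first, EBS snapshot later, then import and rollback on the restore VM) can be sketched as a dry run. Pool, dataset, and snapshot names are illustrative assumptions; the RUN variable prints each command instead of executing it:

```shell
# Dry-run sketch of the zfs-snapshot-then-EBS-snapshot recovery flow.
# Names (tank, tank/data) are hypothetical; RUN=echo keeps this safe.
RUN=${RUN:-echo}
SNAP="tank/data@pre-ebs-$(date +%Y%m%d)"

# On the source VM, before triggering the EBS snapshot:
$RUN zfs snapshot -r "${SNAP}"

# On the restore VM, after attaching the cloned EBS volumes:
$RUN zpool import -f -N tank       # -N: import without mounting datasets
$RUN zfs rollback -r "${SNAP}"     # discard writes newer than the snapshot
$RUN zfs mount -a
```

Importing with -N before the rollback avoids briefly exposing the possibly-inconsistent post-snapshot state to applications on the restore VM.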
> > Markus> Also, taking the zfs snapshot should take much less time, > Markus> because you don't have to wait for the EBS snapshot to > Markus> complete before you can resume IO on the filesystem. So > Markus> you don't even depend on EBS snapshots being quick when > Markus> using the zfs approach, a big advantage in my opinion. > > You don't have to wait for an EBS snapshot to complete. That can take > hours. EBS simply takes the moment in time you give the command, and > starts the backup from there. Normal I/O to the disk continues (so > uses some kind of COW system I suppose) Yes, but a zfs snapshot is near instant. ioctl, wait for sync, mark clean, trigger EBS snapshot, ioctl again to resume IO, sounds like more work. So I wasn't saying EBS snapshots are slow, but the whole process probably isn't as quick as just taking a zfs snapshot. zfs will probably lose some time when importing on the new VM, but at that point you usually don't care. Markus From owner-freebsd-fs@FreeBSD.ORG Wed Jul 3 10:06:36 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 34F9DA61 for ; Wed, 3 Jul 2013 10:06:36 +0000 (UTC) (envelope-from shuriku@shurik.kiev.ua) Received: from graal.it-profi.org.ua (graal.shurik.kiev.ua [193.239.74.7]) by mx1.freebsd.org (Postfix) with ESMTP id E79181A28 for ; Wed, 3 Jul 2013 10:06:35 +0000 (UTC) Received: from [217.76.201.82] (helo=thinkpad.it-profi.org.ua) by graal.it-profi.org.ua with esmtpa (Exim 4.80.1 (FreeBSD)) (envelope-from ) id 1UuJxK-0009R7-5l for freebsd-fs@freebsd.org; Wed, 03 Jul 2013 13:06:34 +0300 Message-ID: <51D3F7A5.5030308@shurik.kiev.ua> Date: Wed, 03 Jul 2013 13:06:29 +0300 From: Alexandr User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:17.0) Gecko/20130630 Thunderbird/17.0.7 MIME-Version: 1.0 To: freebsd-fs@freebsd.org References: <51D3ED4F.5030102@shurik.kiev.ua> In-Reply-To: Content-Type:
text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-SA-Exim-Connect-IP: 217.76.201.82 X-SA-Exim-Mail-From: shuriku@shurik.kiev.ua X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on graal.it-profi.org.ua X-Spam-Level: X-Spam-Status: No, score=-1.0 required=5.0 tests=ALL_TRUSTED autolearn=unavailable version=3.3.2 Subject: Re: Whole disk ZFS or -a4k partition X-SA-Exim-Version: 4.2 X-SA-Exim-Scanned: Yes (on graal.it-profi.org.ua) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 03 Jul 2013 10:06:36 -0000 03.07.2013 12:39, Steven Hartland wrote: > ----- Original Message ----- From: "Alexandr" > > >> Hello, community! >> >> I have a laptop with 2 disks - mSATA 32Gb SSD and 500Gb HDD. I plan to >> use SSD as whole disk ZFS pool with root and /usr partitions and 500Gb >> SATA as ZFS pool for /home and /var. My questions are: >> >> 1. Do I need to create a 4k aligned partition on HDD for zfs pool or >> best to use a whole disk? > > As you need a boot partition you will need to 4k align and ensure > the zpool is created with 4k sectors using gnop hack. On the SSD I have a whole-disk zpool, with the boot code installed manually with dd: dd if=/boot/zfsboot of=/dev/ada1 count=1 dd if=/boot/zfsboot of=/dev/ada1 skip=1 seek=1024 >> 2. Where the best place for swap - ssd or hdd? SSD is much faster, but >> with limited write life cycle. > > I'd use the ssd for the performance as in general you wont see > significanlty higher wear leveling than general use unless you're really > low on RAM and then you will want it as quick as possible anyway. My laptop has 8 GB DDR3 and a Core i5 3210. But when HDD disk I/O is intensive, the laptop becomes very slow and sometimes totally unusable. Using the SSD as a cache or log device for the zpool didn't help, so I decided to move the system to the SSD and leave only the /home and /var partitions on the HDD.
I also noticed that the system becomes slow when using swap. > > If you want ZFS TRIM ensure you use a recent stable/8, stable/9 or head. Yes, I'm using the latest 9.1-STABLE #29 r252340M > > Regards > Steve > > ================================================ > This e.mail is private and confidential between Multiplay (UK) Ltd. > and the person or entity to whom it is addressed. In the event of > misdirection, the recipient is prohibited from using, copying, > printing or otherwise disseminating it or any information contained in > it. > In the event of misdirection, illegible or incomplete transmission > please telephone +44 845 868 1337 > or return the E.mail to postmaster@multiplay.co.uk. > From owner-freebsd-fs@FreeBSD.ORG Wed Jul 3 10:09:14 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 6D141B0D for ; Wed, 3 Jul 2013 10:09:14 +0000 (UTC) (envelope-from shuriku@shurik.kiev.ua) Received: from graal.it-profi.org.ua (graal.shurik.kiev.ua [193.239.74.7]) by mx1.freebsd.org (Postfix) with ESMTP id 283B81A48 for ; Wed, 3 Jul 2013 10:09:13 +0000 (UTC) Received: from [217.76.201.82] (helo=thinkpad.it-profi.org.ua) by graal.it-profi.org.ua with esmtpa (Exim 4.80.1 (FreeBSD)) (envelope-from ) id 1UuJzr-0009Rh-N9; Wed, 03 Jul 2013 13:09:12 +0300 Message-ID: <51D3F842.9050002@shurik.kiev.ua> Date: Wed, 03 Jul 2013 13:09:06 +0300 From: Alexandr User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:17.0) Gecko/20130630 Thunderbird/17.0.7 MIME-Version: 1.0 To: Fleuriot Damien References: <51D3ED4F.5030102@shurik.kiev.ua> In-Reply-To: Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-SA-Exim-Connect-IP: 217.76.201.82 X-SA-Exim-Mail-From: shuriku@shurik.kiev.ua X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on graal.it-profi.org.ua X-Spam-Level: X-Spam-Status: No, score=-1.0 required=5.0 tests=ALL_TRUSTED autolearn=unavailable
version=3.3.2 Subject: Re: Whole disk ZFS or -a4k partition X-SA-Exim-Version: 4.2 X-SA-Exim-Scanned: Yes (on graal.it-profi.org.ua) Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 03 Jul 2013 10:09:14 -0000 03.07.2013 12:46, Fleuriot Damien wrote: > On Jul 3, 2013, at 11:22 AM, Alexandr wrote: > >> Hello, community! >> >> I have a laptop with 2 disks - mSATA 32Gb SSD and 500Gb HDD. I plan to >> use SSD as whole disk ZFS pool with root and /usr partitions and 500Gb >> SATA as ZFS pool for /home and /var. My questions are: >> >> 1. Do I need to create a 4k aligned partition on HDD for zfs pool or >> best to use a whole disk? >> 2. Where the best place for swap - ssd or hdd? SSD is much faster, but >> with limited write life cycle. >> > > I advise against using the whole disk for the following reason: > > DISK1, manufacturer 1: 3tb (give or take) > DISK2, manufacturer 2: 3tb (give or take MINUS 10 BYTES) > > Et voila, you can't use DISK1 with DISK2 because DISK2 is 10 bytes smaller and won't fit. > > > When building RAID arrays, it's recommended to: > - use disks from different manufacturers > - use disks ordered at different times (for example half at T1, half at T2) > - shave a few MBs off your disks, so that they all present the same size > > > Here's an example from my nas at home: > > % gpart show /dev/ada6 > => 34 5860533101 ada6 GPT (2.7T) > 34 6 - free - (3.0k) > 40 102400 1 freebsd-ufs (50M) > 102440 5860430688 2 freebsd-zfs (2.7T) > 5860533128 7 - free - (3.5k) > > > Notice how the disk presents 5860533101 sectors of 512 bytes. > Also notice how I've shaved off 50mbytes in the first, unused partition. > > This way, if I replace a disk later on and the new disk only has, say 5860533095 512-byte sectors, I can still use it in my pool. > Thanks, but I don't have RAID arrays in my setup.
I have two different zpools on ssd and hard disk. From owner-freebsd-fs@FreeBSD.ORG Wed Jul 3 11:21:29 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id DF943949 for ; Wed, 3 Jul 2013 11:21:29 +0000 (UTC) (envelope-from berend@pobox.com) Received: from smtp.pobox.com (b-pb-sasl-quonix.pobox.com [208.72.237.35]) by mx1.freebsd.org (Postfix) with ESMTP id A72691F66 for ; Wed, 3 Jul 2013 11:21:29 +0000 (UTC) Received: from smtp.pobox.com (unknown [127.0.0.1]) by b-sasl-quonix.pobox.com (Postfix) with ESMTP id 3DFCD2DE62; Wed, 3 Jul 2013 11:21:27 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=pobox.com; h=date :message-id:from:to:cc:subject:in-reply-to:references :mime-version:content-type:content-transfer-encoding; s=sasl; bh=bpKPzsR/LfKEZOBbr1eXEmJruzU=; b=MWeoqMMBunJJasvuc4Az7VgxcpT8 kpfmyXOuoE5jMvMtjnKJ94YgbqVCnl4Mw5i21NfKsxuVOVIUSN+N+udkiritvTtS He9+HtgEosZJVUbjgC7gcPJ6q/L3e++yvBxUcYSft1To6BFQZsC/14oGPXW5GMXl TUwUvImjuPcIwjg= DomainKey-Signature: a=rsa-sha1; c=nofws; d=pobox.com; h=date:message-id :from:to:cc:subject:in-reply-to:references:mime-version :content-type:content-transfer-encoding; q=dns; s=sasl; b=kEJLVJ x+/Qg0c0WjTklQ8FcQqGJ9ADb8vTQOZWzsgm/EgOHxEf0QCfZK0NSChokbw3E1jI efOthP2C3vRuMMGmU286yjF3WS/U6+xgnSb8cosUUQ6WJ39UMSs3AlwcN04sbc8L gbhkUUszG+vA6I9oBeNQj1YJplQcSivJQkPno= Received: from b-pb-sasl-quonix.pobox.com (unknown [127.0.0.1]) by b-sasl-quonix.pobox.com (Postfix) with ESMTP id 347072DE61; Wed, 3 Jul 2013 11:21:27 +0000 (UTC) Received: from bmach.nederware.nl (unknown [27.252.247.84]) by b-sasl-quonix.pobox.com (Postfix) with ESMTPA id 583372DE5E; Wed, 3 Jul 2013 11:21:26 +0000 (UTC) Received: from quadrio.nederware.nl (quadrio.nederware.nl [192.168.33.13]) by bmach.nederware.nl (Postfix) with ESMTP id 492CD5C5B; Wed, 3 Jul 2013 23:21:19 +1200 (NZST) Received: from quadrio.nederware.nl (quadrio.nederware.nl [127.0.0.1]) by 
quadrio.nederware.nl (Postfix) with ESMTP id DBA794A11BF3; Wed, 3 Jul 2013 23:21:23 +1200 (NZST) Date: Wed, 03 Jul 2013 23:21:23 +1200 Message-ID: <87ppuz51os.wl%berend@pobox.com> From: Berend de Boer To: Markus Gebert Subject: Re: EBS snapshot backups from a FreeBSD zfs file system: zpool freeze? In-Reply-To: References: <87li5o5tz2.wl%berend@pobox.com> <87ehbg5raq.wl%berend@pobox.com> <20130703055047.GA54853@icarus.home.lan> <6488DECC-2455-4E92-B432-C39490D18484@dragondata.com> <14A2336A-969C-4A13-9EFA-C0C42A12039F@hostpoint.ch> <87zju43sxl.wl%berend@pobox.com> User-Agent: Wanderlust/2.15.9 (Almost Unreal) SEMI-EPG/1.14.7 (Harue) FLIM/1.14.9 (=?UTF-8?B?R29qxY0=?=) APEL/10.8 EasyPG/1.0.0 Emacs/24.3 (i686-pc-linux-gnu) MULE/6.0 (HANACHIRUSATO) Organization: Xplain Technology Ltd MIME-Version: 1.0 (generated by SEMI-EPG 1.14.7 - "Harue") Content-Type: multipart/signed; boundary="pgp-sign-Multipart_Wed_Jul__3_23:21:23_2013-1"; micalg=pgp-sha256; protocol="application/pgp-signature" Content-Transfer-Encoding: 7bit X-Pobox-Relay-ID: B333D710-E3D2-11E2-8D06-E84251E3A03C-48001098!b-pb-sasl-quonix.pobox.com Cc: freebsd-fs X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 03 Jul 2013 11:21:29 -0000 --pgp-sign-Multipart_Wed_Jul__3_23:21:23_2013-1 Content-Type: text/plain; charset=US-ASCII >>>>> "Markus" == Markus Gebert writes: Markus> Of course it'd be safest to EBS snapshot all disk at the Markus> same exact time, but if I understand you correctly, there Markus> is no such functionality and the OS is expected to Markus> guarantee some kind of consistency between multiple Markus> related disks. That's exactly the point. I agree with you that the one disk solution is trivial. It's the multiple disk case that concerns me. Markus> Yes, but a zfs snapshot is near instant. 
ioctl, wait for Markus> sync, mark clean, trigger EBS snapshot, ioctl again to Markus> resume IO, sounds like more work. Definitely true. Could take a few seconds if you have a lot of disks. But a hiccup where you can't write for a few seconds isn't noticeable in most situations. And after you have done your zfs snapshot, you're not done either! You have to transfer it somewhere, probably compressed, so your (p)bzip2 dominates your CPUs, your network bandwidth is gone, etc. So a backup starts to become a really high impact event, while an EBS snapshot today isn't a big deal. Slightly degraded performance perhaps, but not much. -- All the best, Berend de Boer ------------------------------------------------------ Awesome Drupal hosting: https://www.xplainhosting.com/ --pgp-sign-Multipart_Wed_Jul__3_23:21:23_2013-1 Content-Type: application/pgp-signature Content-Transfer-Encoding: 7bit Content-Description: OpenPGP Digital Signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.17 (GNU/Linux) iQIcBAABCAAGBQJR1AkzAAoJEKOfeD48G3g5guQP+wcyXSEU4hWRFqmft5NmzSZq 8O/cAyAbJruyhVViMuEQFT/zgVCAhhDNt/HwQMJWIgNdyiRkX+p+jg6RqwQ4Q7tZ I8jLfMxHQhUYJhE59Vh5wfw6syPvqOjeoDpY04mfLZ8E2JfVNiibJGAWkxCRagIj IBm4HGSbxgrgD3ZR4WIdZpkY8ZeStL4NyeIPcbiu3aJzzAoW21fPkiCXZJ7eQ3Fx g9SkyhhF95+A8CShhsz/QzzYGniGAoPWd1vNJ8P8dL1A3FPRgqh3u+OKj4F3jBvq 0+S1iu4mNBik8o8I7lYa5LxDx9ydHaKNwcLBStkHySGiK4snHCKRWj6bMBdHgTP7 JHe679Hdf70ik2unbWO+tnh/Hh93NsR9OsIDcr6Zep+QUJHANQe6XUV3Or/7N+xD fY1GVnR6yq0YRKtoCk3pAfwNivKNO/9iT0CPvb3eZ3b3dc2zjtKs7hPcff8mou2l BZpRmlusN8u4BQDg799G9y8do01bef0T32E7NJYqNj0pqrbHjg12l7Wq3xBCFwyt F2oU8BT4UXA2rNXnz4+P0sjNm7hKR7paAsZmh/2PJDC+8FBAyiDVbBVrId1MF7tr w+VUBqHITSUlnsO1dctj3M18zc1GypCWgL7F+eWAnptyz9+gHn+AoFWBeyGzdFex dPKQwscCfcIgDENKBUc7 =SrOz -----END PGP SIGNATURE----- --pgp-sign-Multipart_Wed_Jul__3_23:21:23_2013-1-- From owner-freebsd-fs@FreeBSD.ORG Wed Jul 3 11:56:21 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org
[IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 6F96D725 for ; Wed, 3 Jul 2013 11:56:21 +0000 (UTC) (envelope-from Ivailo.Tanusheff@skrill.com) Received: from ch1outboundpool.messaging.microsoft.com (ch1ehsobe001.messaging.microsoft.com [216.32.181.181]) by mx1.freebsd.org (Postfix) with ESMTP id 1E5D312D3 for ; Wed, 3 Jul 2013 11:56:20 +0000 (UTC) Received: from mail1-ch1-R.bigfish.com (10.43.68.233) by CH1EHSOBE013.bigfish.com (10.43.70.63) with Microsoft SMTP Server id 14.1.225.22; Wed, 3 Jul 2013 11:56:19 +0000 Received: from mail1-ch1 (localhost [127.0.0.1]) by mail1-ch1-R.bigfish.com (Postfix) with ESMTP id 9B2BE402B1; Wed, 3 Jul 2013 11:56:19 +0000 (UTC) X-Forefront-Antispam-Report: CIP:157.56.249.213; KIP:(null); UIP:(null); IPV:NLI; H:AM2PRD0710HT004.eurprd07.prod.outlook.com; RD:none; EFVD:NLI X-SpamScore: 20 X-BigFish: PS20(zz9371I542I168aJd88dizz1f42h1ee6h1de0h1fdah2073h1202h1e76h1d1ah1d2ah1fc6hzz17326ah8275dhz2fh2a8h668h839h944hd24hf0ah1220h1288h12a5h12a9h12bdh137ah13b6h1441h1504h1537h153bh162dh1631h1758h18e1h1946h19b5h19ceh1ad9h1b0ah1d07h1d0ch1d2eh1d3fh1de9h1dfeh1dffh1e1dh9a9j1594im1155h) Received-SPF: pass (mail1-ch1: domain of skrill.com designates 157.56.249.213 as permitted sender) client-ip=157.56.249.213; envelope-from=Ivailo.Tanusheff@skrill.com; helo=AM2PRD0710HT004.eurprd07.prod.outlook.com ; .outlook.com ; X-Forefront-Antispam-Report-Untrusted: SFV:NSPM; SFS:(189002)(199002)(57704003)(13464003)(377454003)(56776001)(31966008)(76576001)(74316001)(16406001)(54356001)(33646001)(59766001)(76796001)(74706001)(77982001)(63696002)(65816001)(66066001)(79102001)(80022001)(74876001)(76786001)(77096001)(47446002)(74662001)(81542001)(49866001)(46102001)(4396001)(83072001)(81342001)(69226001)(54316002)(74502001)(47976001)(74366001)(50986001)(53806001)(51856001)(76482001)(56816003)(15202345003)(47736001)(24736002)(554374003); DIR:OUT; SFP:; SCL:1; SRVR:DBXPR07MB063; H:DBXPR07MB064.eurprd07.prod.outlook.com; RD:InfoNoRecords; 
MX:1; A:1; LANG:en; Received: from mail1-ch1 (localhost.localdomain [127.0.0.1]) by mail1-ch1 (MessageSwitch) id 1372852577913855_1468; Wed, 3 Jul 2013 11:56:17 +0000 (UTC) Received: from CH1EHSMHS027.bigfish.com (snatpool1.int.messaging.microsoft.com [10.43.68.242]) by mail1-ch1.bigfish.com (Postfix) with ESMTP id DCD2B1E005C; Wed, 3 Jul 2013 11:56:17 +0000 (UTC) Received: from AM2PRD0710HT004.eurprd07.prod.outlook.com (157.56.249.213) by CH1EHSMHS027.bigfish.com (10.43.70.27) with Microsoft SMTP Server (TLS) id 14.1.225.23; Wed, 3 Jul 2013 11:56:17 +0000 Received: from DBXPR07MB063.eurprd07.prod.outlook.com (10.242.147.22) by AM2PRD0710HT004.eurprd07.prod.outlook.com (10.255.165.39) with Microsoft SMTP Server (TLS) id 14.16.324.0; Wed, 3 Jul 2013 11:56:08 +0000 Received: from DBXPR07MB064.eurprd07.prod.outlook.com (10.242.147.24) by DBXPR07MB063.eurprd07.prod.outlook.com (10.242.147.22) with Microsoft SMTP Server (TLS) id 15.0.702.21; Wed, 3 Jul 2013 11:56:06 +0000 Received: from DBXPR07MB064.eurprd07.prod.outlook.com ([169.254.7.13]) by DBXPR07MB064.eurprd07.prod.outlook.com ([169.254.7.13]) with mapi id 15.00.0702.005; Wed, 3 Jul 2013 11:56:06 +0000 From: Ivailo Tanusheff To: Alexandr , "freebsd-fs@freebsd.org" Subject: RE: Whole disk ZFS or -a4k partition Thread-Topic: Whole disk ZFS or -a4k partition Thread-Index: AQHOd87nvMtnsaGDhUiDqo7kXsjtiplS1DuQ Date: Wed, 3 Jul 2013 11:56:05 +0000 Message-ID: <8afaecbe0f764963b57fac7743f483bc@DBXPR07MB064.eurprd07.prod.outlook.com> References: <51D3ED4F.5030102@shurik.kiev.ua> In-Reply-To: <51D3ED4F.5030102@shurik.kiev.ua> Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [217.18.249.148] x-forefront-prvs: 0896BFCE6C Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-OriginatorOrg: skrill.com X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , 
List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 03 Jul 2013 11:56:21 -0000 Hi, I can give you my point of view: Which zpool do you refer to? Still, I would align both pools - on the SSD because you will have boot code anyway. What I would do on the SSD is to use: gpart add -b 34 -s 94 -t freebsd-boot The second thing I would keep in mind is the swap - I would not use ZFS for that; I had a lot of issues with a similar setup. Taking into account that ZFS uses a lot of RAM, you may end up in a situation where the system wants to swap out data to free RAM for ZFS while the swap volume itself lives on the ZFS subsystem, which is a deadlock scenario. So I would advise you to create a swap partition on the SSD and use that partition in the system. gpart add -t freebsd-swap -s 4G -l ssd-swap gpart add -t freebsd-zfs -l ssd-zfs This will solve the alignment and swap issues. About the second drive - there are two issues: you may need to replace the disk, and you may need additional swap in some cases. Because the future is unsure, I would again recommend 4K alignment, creating a similar SWAP partition (but NOT using it in the system), and using gnop for the 4K sizing: gnop create -S 4096 /dev/gpt/disk0 zpool create zroot /dev/gpt/disk0.nop zpool export zroot gnop destroy /dev/gpt/disk0.nop zpool import zroot This way you will solve: - the 4K issues, if you have any; - the replacement issues, if you get a disk with a smaller sector count; - in case you have some swap issues/needs, you will be able to issue swapon /dev/gpt/... and temporarily increase the swap; by not using it all the time you will not decrease performance of the system on normal days. Best regards, Ivailo Tanusheff -----Original Message----- From: owner-freebsd-fs@freebsd.org [mailto:owner-freebsd-fs@freebsd.org] On Behalf Of Alexandr Sent: Wednesday, July 03, 2013 12:22 PM To: freebsd-fs@freebsd.org Subject: Whole disk ZFS or -a4k partition Hello, community!
I have a laptop with 2 disks - mSATA 32Gb SSD and 500Gb HDD. I plan to use SSD as whole disk ZFS pool with root and /usr partitions and 500Gb SATA as ZFS pool for /home and /var. My questions are: 1. Do I need to create a 4k aligned partition on HDD for zfs pool, or is it best to use a whole disk? 2. Where is the best place for swap - ssd or hdd? SSD is much faster, but with limited write life cycle. _______________________________________________ freebsd-fs@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-fs To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" From owner-freebsd-fs@FreeBSD.ORG Wed Jul 3 12:02:30 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 14414BA7 for ; Wed, 3 Jul 2013 12:02:30 +0000 (UTC) (envelope-from feld@feld.me) Received: from out1-smtp.messagingengine.com (out1-smtp.messagingengine.com [66.111.4.25]) by mx1.freebsd.org (Postfix) with ESMTP id DA3AE1458 for ; Wed, 3 Jul 2013 12:02:29 +0000 (UTC) Received: from compute6.internal (compute6.nyi.mail.srv.osa [10.202.2.46]) by gateway1.nyi.mail.srv.osa (Postfix) with ESMTP id 2D27F2122A for ; Wed, 3 Jul 2013 08:02:22 -0400 (EDT) Received: from frontend1.nyi.mail.srv.osa ([10.202.2.160]) by compute6.internal (MEProxy); Wed, 03 Jul 2013 08:02:23 -0400 DKIM-Signature: v=1; a=rsa-sha1; c=relaxed/relaxed; d=feld.me; h= content-type:to:subject:references:date:mime-version :content-transfer-encoding:from:message-id:in-reply-to; s= mesmtp; bh=F7y6yghZuE/OolerFmKQ4/7Kqxo=; b=MpTyoEZG97E2YfKWpjyDG t0y1jXGQkXdq5l4WAPPiCp9NhaLE8Eb7EfSgC8EH4IQ2cEcJc6dgRxe7/LX/fcMk b/HuQSIvrIwNhhZ0MGYljG3cEYXhBT1OgjddMd0ATw5Nw6Gb9XWSxC/U57iNVW5v Av1VLHYQKk8k03jT8iX17w= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed/relaxed; d= messagingengine.com; h=content-type:to:subject:references:date :mime-version:content-transfer-encoding:from:message-id :in-reply-to; s=smtpout;
bh=F7y6yghZuE/OolerFmKQ4/7Kqxo=; b=nFUs kO9ody8UAu1HUKtcFEyhlwTlEZDoDMZFv5vTY1AyCRAdXGdzj/3wRoSY5wM3d36X Jullt479CENUQgiXsMkE6eii5501JBUIMfS097D7J7T9E1AkJ0Bz0G9cdd+YcRjG g/V2wsk6H4zpvHHDfWYuLLaFfTbX7NHT/AqHvgU= X-Sasl-enc: cfPbhocJd7pP9dXfshRmoCyvWWN5MzCt+u0nGwMINFwY 1372852942 Received: from tech304.office.supranet.net (unknown [66.170.8.18]) by mail.messagingengine.com (Postfix) with ESMTPA id EF20FC00E8A for ; Wed, 3 Jul 2013 08:02:21 -0400 (EDT) Content-Type: text/plain; charset=us-ascii; format=flowed; delsp=yes To: freebsd-fs Subject: Re: EBS snapshot backups from a FreeBSD zfs file system: zpool freeze? References: <87li5o5tz2.wl%berend@pobox.com> <87ehbg5raq.wl%berend@pobox.com> <20130703055047.GA54853@icarus.home.lan> <6488DECC-2455-4E92-B432-C39490D18484@dragondata.com> <871u7g57rl.wl%berend@pobox.com> Date: Wed, 03 Jul 2013 07:02:21 -0500 MIME-Version: 1.0 Content-Transfer-Encoding: 7bit From: "Mark Felder" Message-ID: In-Reply-To: <871u7g57rl.wl%berend@pobox.com> User-Agent: Opera Mail/12.15 (FreeBSD) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 03 Jul 2013 12:02:30 -0000 On Wed, 03 Jul 2013 04:10:06 -0500, Berend de Boer wrote: > > Would be great to have this in FreeBSD. Once you have used EBS > snapshots, you really don't want to go back. This really does sound like Amazon needs to provide whatever mechanism to communicate between the host and the guest so this EBS snapshot can take place. On the other hand, every time I read about "block storage snapshots" -- even if you quiesce the filesystem -- I start to get really itchy thinking about the likeliness a high TPS database is going to end up with corruption and require recovery. 
:) From owner-freebsd-fs@FreeBSD.ORG Wed Jul 3 12:16:04 2013 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 323D5291 for ; Wed, 3 Jul 2013 12:16:04 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail106.syd.optusnet.com.au (mail106.syd.optusnet.com.au [211.29.132.42]) by mx1.freebsd.org (Postfix) with ESMTP id B0A711792 for ; Wed, 3 Jul 2013 12:16:03 +0000 (UTC) Received: from c122-106-156-23.carlnfd1.nsw.optusnet.com.au (c122-106-156-23.carlnfd1.nsw.optusnet.com.au [122.106.156.23]) by mail106.syd.optusnet.com.au (Postfix) with ESMTPS id 81E9E3C1ECF; Wed, 3 Jul 2013 22:15:54 +1000 (EST) Date: Wed, 3 Jul 2013 22:15:50 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Jeremy Chadwick Subject: Re: EBS snapshot backups from a FreeBSD zfs file system: zpool freeze? In-Reply-To: <20130703073333.GA57318@icarus.home.lan> Message-ID: <20130703215557.U26322@besplex.bde.org> References: <87li5o5tz2.wl%berend@pobox.com> <87ehbg5raq.wl%berend@pobox.com> <20130703055047.GA54853@icarus.home.lan> <6488DECC-2455-4E92-B432-C39490D18484@dragondata.com> <20130703073333.GA57318@icarus.home.lan> MIME-Version: 1.0 Content-Type: MULTIPART/MIXED; BOUNDARY="0-1674810795-1372853750=:26322" X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.0 cv=eqSHVfVX c=1 sm=1 a=_L_Mgjhap9MA:10 a=PO7r1zJSAAAA:8 a=JzwRw_2MAAAA:8 a=Xp-57ecbnGcA:10 a=6I5d2MoRAAAA:8 a=088zp9679ETcpNnLNjMA:9 a=45ClL6m2LaAA:10 a=sJYPmuLzIcQA:10 a=ebeQFi2P/qHVC0Yw9JDJ4g==:117 Cc: freebsd-fs X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 03 Jul 2013 12:16:04 -0000 This message is in MIME format. The first part should be readable text, while the remaining parts are likely unreadable without MIME-aware tools. 
--0-1674810795-1372853750=:26322 Content-Type: TEXT/PLAIN; charset=X-UNKNOWN; format=flowed Content-Transfer-Encoding: QUOTED-PRINTABLE On Wed, 3 Jul 2013, Jeremy Chadwick wrote: > On Wed, Jul 03, 2013 at 12:53:13AM -0600, Will Andrews wrote: >> ... >> This is because sync in ZFS is implemented as a ZIL commit, so transactions >> that haven't yet made it to disk via the normal syncing context will at >> least be committed via their ZIL blocks. Which can then be replayed when >> the pool is imported later, in this case from the EBS snapshots. >> >> And since the entire tree from the überblock down in ZFS is COW, you can't >> get an inconsistent pool simply by doing a virtual disk snapshot, >> regardless of how that is implemented. > > I'm a little confused about this statement, particularly as a result of > this thread (read the entire thing time permitting): > > http://lists.freebsd.org/pipermail/freebsd-fs/2013-April/016982.html > > UFS is what's being discussed there, but there are some blanket > statements (maybe I'm taking them out of context, not entirely sure) > made by Bruce there that seem to imply that sync(2) may not actually > flush all memory buffers to disk when issued, only that they're > "scheduled" to be flushed. That was for ffs. > ... > So all this makes me wonder: why exactly does sync(2) result in > different behaviour on UFS than it does on ZFS? Do both of these > filesystems not use BIO_write() and friends? Does sync(2) not simply > iterate over all the queued BIO_write()s and BIO_FLUSH them all? ffs uses the buffer cache, and zfs doesn't go anywhere near the buffer cache (it calls driver i/o routines fairly directly, via geom). This alone gives very different behaviour. But zfs is even more different for sync(2). > Sorry if I'm overthinking this or missing something, but I just don't > understand why sync(2) would flush stuff to disk with one filesystem but > not another.
It is because zfs ignores sync(2)'s request to not wait for the i/o to complete. I don't know much else about zfs. Bruce --0-1674810795-1372853750=:26322-- From owner-freebsd-fs@FreeBSD.ORG Wed Jul 3 12:46:20 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 6C464E33 for ; Wed, 3 Jul 2013 12:46:20 +0000 (UTC) (envelope-from mxb@unixconn.com) Received: from mail-lb0-f171.google.com (mail-lb0-f171.google.com [209.85.217.171]) by mx1.freebsd.org (Postfix) with ESMTP id EB56C19DE for ; Wed, 3 Jul 2013 12:46:19 +0000 (UTC) Received: by mail-lb0-f171.google.com with SMTP id 13so178966lba.16 for ; Wed, 03 Jul 2013 05:46:12 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=from:content-type:content-transfer-encoding:subject:message-id:date :to:mime-version:x-mailer:x-gm-message-state; bh=n0yME+Us0qRM2wRmMuXjUrPHZgMMmjWwoGPdAuGtXl0=; b=XX/DIDeA7T2uXFEyZkW07aTWivnfgvWK5/zT/vo/kqCSeyzKltY/jXjUniJFsLxJRU xTcfTL1ga0feBFtw5T0o3Oxl+JyxWIA6JEaldU8055DfhNfLZpoqgmoTT3+48kM8BTAD eJwp06ix4X29e5ijL18gbWPFmjHZJ7WeNf9+7CwZRTbMjaxiYXl9d/v5pICqwQBYbDFu EoJUjEu5H+lQyEhY8XXUtmCHQYgknsqoq9jhhxKdJ/uwbKQHfGyZPNMpLnjXZDGA6Dcj 7Wrh8t8RmNyoFV9XEzDdQ7lIqIhCHXV/HLth8WnqxGyxvI+StmRIbqlA0QFJ2XSfbZSo H/bg== X-Received: by 10.152.6.4 with SMTP id w4mr434808law.14.1372855571925; Wed, 03 Jul 2013 05:46:11 -0700 (PDT) Received: from grey.home.unixconn.com (h-74-23.a183.priv.bahnhof.se. 
[46.59.74.23]) by mx.google.com with ESMTPSA id 8sm9862885lbn.9.2013.07.03.05.46.10 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Wed, 03 Jul 2013 05:46:11 -0700 (PDT) From: Maxim Bourmistrov Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: quoted-printable Subject: Slow resilvering with mirrored ZIL Message-Id: Date: Wed, 3 Jul 2013 14:46:09 +0200 To: "freebsd-fs@freebsd.org" Mime-Version: 1.0 (Mac OS X Mail 6.5 \(1508\)) X-Mailer: Apple Mail (2.1508) X-Gm-Message-State: ALoCoQkoO9Scg1yF0w0teCZdSV6KlOY0KSQ8WW/KbLaJfFR00azDFRWI33Yu0OtReiLCA8r7uVuF X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 03 Jul 2013 12:46:20 -0000 Hello list@, I'm in the middle of replacing my old hard drives with new ones on my ZFS NAS. There are at least two ways to accomplish this. I'm, however, resilvering onto the new ones. While resilvering the first of the four drives I had 40-60 MB/s resilvering speed. ZIL is located on a 10G partition on an Intel SLC SSD. It took about 24h to resilver about 1.5T. Then I decided to create a 10G partition on another SSD (OCZ AGILITY 3, MLC) and created a mirror of the ZIL. That done, I continued to resilver the second drive out of four and noted a significant drop in resilvering speed: 5MB/s as of now. I understand that a ZIL mirror will carry a performance penalty, but not that much?! The box is lightly loaded, and I tried completely shutting down all services and disconnecting it from the LAN in order to isolate it, without any improvement in speed. Any ideas? Am I completely wrong in assuming that the ZIL is involved in the resilvering process? P.S.
SSDs: ada4: ATA-7 SATA 2.x device ada4: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes) ada4: Command Queueing enabled ada4: 19087MB (39091248 512 byte sectors: 16H 63S/T 16383C) ada4: Previously was known as ad12 ada5 at ahcich5 bus 0 scbus5 target 0 lun 0 ada5: ATA-8 SATA 3.x device ada5: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes) ada5: Command Queueing enabled ada5: 57241MB (117231408 512 byte sectors: 16H 63S/T 16383C) ada5: Previously was known as ad14 From owner-freebsd-fs@FreeBSD.ORG Wed Jul 3 13:03:06 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 69AA0A1C for ; Wed, 3 Jul 2013 13:03:06 +0000 (UTC) (envelope-from daniel@digsys.bg) Received: from smtp-sofia.digsys.bg (smtp-sofia.digsys.bg [193.68.21.123]) by mx1.freebsd.org (Postfix) with ESMTP id CB7EA1B01 for ; Wed, 3 Jul 2013 13:03:05 +0000 (UTC) Received: from dcave.digsys.bg (dcave.digsys.bg [193.68.6.1]) (authenticated bits=0) by smtp-sofia.digsys.bg (8.14.6/8.14.6) with ESMTP id r63D33lY036715 (version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO) for ; Wed, 3 Jul 2013 16:03:03 +0300 (EEST) (envelope-from daniel@digsys.bg) Message-ID: <51D42107.1050107@digsys.bg> Date: Wed, 03 Jul 2013 16:03:03 +0300 From: Daniel Kalchev User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:17.0) Gecko/20130627 Thunderbird/17.0.7 MIME-Version: 1.0 To: freebsd-fs@freebsd.org Subject: Re: Slow resilvering with mirrored ZIL References: In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 03 Jul 2013 13:03:06 -0000 On 03.07.13 15:46, Maxim Bourmistrov wrote: > Any ideas? 
> Am I completely wrong in assuming that ZIL is involved in resilvering process? Yes. :-) Under some conditions, such as using dedup, resilver can be really slow. Nothing to do with ZIL though. You can safely remove ZIL while resilvering; it is used only during sync writes to the pool at file system level. Daniel From owner-freebsd-fs@FreeBSD.ORG Wed Jul 3 13:16:14 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 7F4DFED6 for ; Wed, 3 Jul 2013 13:16:14 +0000 (UTC) (envelope-from shuriku@shurik.kiev.ua) Received: from graal.it-profi.org.ua (graal.shurik.kiev.ua [193.239.74.7]) by mx1.freebsd.org (Postfix) with ESMTP id 3D4841BEE for ; Wed, 3 Jul 2013 13:16:13 +0000 (UTC) Received: from [217.76.201.82] (helo=thinkpad.it-profi.org.ua) by graal.it-profi.org.ua with esmtpa (Exim 4.80.1 (FreeBSD)) (envelope-from ) id 1UuMup-000AN3-FK; Wed, 03 Jul 2013 16:16:11 +0300 Message-ID: <51D42416.8080604@shurik.kiev.ua> Date: Wed, 03 Jul 2013 16:16:06 +0300 From: Alexandr User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:17.0) Gecko/20130630 Thunderbird/17.0.7 MIME-Version: 1.0 To: Ivailo Tanusheff References: <51D3ED4F.5030102@shurik.kiev.ua> <8afaecbe0f764963b57fac7743f483bc@DBXPR07MB064.eurprd07.prod.outlook.com> In-Reply-To: <8afaecbe0f764963b57fac7743f483bc@DBXPR07MB064.eurprd07.prod.outlook.com> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-SA-Exim-Connect-IP: 217.76.201.82 X-SA-Exim-Mail-From: shuriku@shurik.kiev.ua X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on graal.it-profi.org.ua X-Spam-Level: X-Spam-Status: No, score=-1.0 required=5.0 tests=ALL_TRUSTED autolearn=unavailable version=3.3.2 Subject: Re: Whole disk ZFS or -a4k partition X-SA-Exim-Version: 4.2 X-SA-Exim-Scanned: Yes (on graal.it-profi.org.ua) Cc: "freebsd-fs@freebsd.org" X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id:
Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 03 Jul 2013 13:16:14 -0000 03.07.2013 14:56, Ivailo Tanusheff writes: > Hi, > > I can give you my point of view: > > Which zpool do you refer to? Still I would align both pools - on the SSD because you will have boot code anyway. > What I would do on the SSD is: > gpart add -b 34 -s 94 -t freebsd-boot > > The second thing I would keep in mind is the swap - I would not use ZFS for that; I had a lot of issues with a similar setup, and taking into account that ZFS uses a lot of RAM, you may get into a situation where you want to swap data out to free RAM for ZFS functionality related to the swap volume on the ZFS subsystem, which is a deadlock scenario. > So I would advise you to create a swap partition on the SSD and use that partition in the system: > gpart add -t freebsd-swap -s 4G -l ssd-swap > gpart add -t freebsd-zfs -l ssd-zfs > > This will solve the alignment, the swap issues, etc. > > About the second drive - there are two issues: you may need to replace the disk, and you may need additional swap in some case. > Because the future is unsure I would recommend again using 4K alignment, creating a similar swap partition (and NOT using it in the system), and using gnop for the 4K sizing: > gnop create -S 4096 /dev/gpt/disk0 > zpool create zroot /dev/gpt/disk0.nop > zpool export zroot > gnop destroy /dev/gpt/disk0.nop > zpool import zroot > > This way you will solve: > - the 4K issues, if you have any; > - the replacement issues, if you get a disk with a smaller sector count; > - in case you have some swap issues/needs, you will be able to issue swapon /dev/gpt/... and temporarily increase the swap, and by not using it all the time you will not decrease performance of the system on normal days. > > Best regards, > Ivailo Tanusheff > > Thank you for the explanation. I'll try it soon and post my results.
One thing - I can't use GPT disks because my laptop's BIOS (Lenovo ThinkPad E530) cannot boot from them, only MBR-style. From owner-freebsd-fs@FreeBSD.ORG Wed Jul 3 13:18:42 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 24D851C2 for ; Wed, 3 Jul 2013 13:18:42 +0000 (UTC) (envelope-from daniel@digsys.bg) Received: from smtp-sofia.digsys.bg (smtp-sofia.digsys.bg [193.68.21.123]) by mx1.freebsd.org (Postfix) with ESMTP id 81CDE1C21 for ; Wed, 3 Jul 2013 13:18:41 +0000 (UTC) Received: from dcave.digsys.bg (dcave.digsys.bg [193.68.6.1]) (authenticated bits=0) by smtp-sofia.digsys.bg (8.14.6/8.14.6) with ESMTP id r63D0P1V036196 (version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO) for ; Wed, 3 Jul 2013 16:00:26 +0300 (EEST) (envelope-from daniel@digsys.bg) Message-ID: <51D42069.5060409@digsys.bg> Date: Wed, 03 Jul 2013 16:00:25 +0300 From: Daniel Kalchev User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:17.0) Gecko/20130627 Thunderbird/17.0.7 MIME-Version: 1.0 To: freebsd-fs@freebsd.org Subject: Re: EBS snapshot backups from a FreeBSD zfs file system: zpool freeze? References: <87li5o5tz2.wl%berend@pobox.com> <87ehbg5raq.wl%berend@pobox.com> <20130703055047.GA54853@icarus.home.lan> <6488DECC-2455-4E92-B432-C39490D18484@dragondata.com> <14A2336A-969C-4A13-9EFA-C0C42A12039F@hostpoint.ch> <87zju43sxl.wl%berend@pobox.com> <87ppuz51os.wl%berend@pobox.com> In-Reply-To: <87ppuz51os.wl%berend@pobox.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 03 Jul 2013 13:18:42 -0000 On 03.07.13 14:21, Berend de Boer wrote: > And after you have done your zfs snapshot, you're not done either!
You > have to transfer it somewhere, probably compressed, so your (p)bzip2 > dominates your CPUs, your network bandwidth is gone, etc. The idea that was proposed was to create a local ZFS snapshot that is not sent anywhere. Because the ZFS snapshot is a ZIL commit, it can be very fast. Well, how fast depends on some conditions - I have a system that sometimes takes minutes for a snapshot, but... I am really torturing ZFS there. With the local ZFS snapshot in place, you then trigger an EBS snapshot of your disks. That is more or less identical to your server losing power and then coming back -- you are only sure there is a consistent snapshot of the filesystem available. However, whether this suits you or not is another matter. Do you want to essentially emulate power loss/restart of the server when you revert to those snapshots? If so, then you are ok. ZFS has you covered. Perhaps even without making the snapshot in the first place. But, if you want your application data consistent on disk, then temporarily stopping the applications is the only safe way -- FreeBSD/ZFS, Linux or what you have won't make any difference. > So a backup starts to become a really high impact event, while an EBS > snapshot today isn't a big deal. Slightly degraded performance > perhaps, but not much. It seems Amazon uses some sort of ZFS (volume) snapshots in order to implement the functionality of EBS. Why it would take hours to complete is hard to understand; perhaps they are actually backing it up somewhere too, using ZFS send/receive (or equivalents if they don't use ZFS).
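The "local snapshot, then EBS snapshot" sequence described above can be sketched as a dry run. The pool name "tank", the snapshot name, and the volume id "vol-0123" are all made up for illustration, and run() only prints each command instead of executing it, so nothing touches a real pool; substitute your own names and snapshot tooling.

```shell
# Dry-run sketch of the snapshot-then-EBS sequence discussed above.
# All names are hypothetical; run() echoes rather than executes.
run() { echo "+ $*"; }

run zfs snapshot -r tank@pre-ebs                   # cheap: essentially a ZIL commit
run aws ec2 create-snapshot --volume-id vol-0123   # EBS snapshot of the backing disk
run zfs destroy -r tank@pre-ebs                    # drop the marker once EBS completes
```

Remove the run() wrapper to execute for real; the ZFS snapshot is the fast local step, and the EBS call is whatever your cloud tooling provides.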
Daniel From owner-freebsd-fs@FreeBSD.ORG Wed Jul 3 13:36:27 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id DDD07429 for ; Wed, 3 Jul 2013 13:36:27 +0000 (UTC) (envelope-from mxb@alumni.chalmers.se) Received: from mail-la0-f42.google.com (mail-la0-f42.google.com [209.85.215.42]) by mx1.freebsd.org (Postfix) with ESMTP id 63E641CDA for ; Wed, 3 Jul 2013 13:36:26 +0000 (UTC) Received: by mail-la0-f42.google.com with SMTP id eb20so142233lab.15 for ; Wed, 03 Jul 2013 06:36:20 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=content-type:mime-version:subject:from:in-reply-to:date:cc :content-transfer-encoding:message-id:references:to:x-mailer :x-gm-message-state; bh=n7Ilz8wd5SDFuNOLZkNjL6YSm2H4mmLFi79WSqkOBGo=; b=duE41FUVVVOF1zAn9rZiv5gE7l6u0PzwEe3J/va/xxGlc36Gse+P0BPFfm6EAot9Td HOrC+kjSxaLzahWjxnAaC4jx6XB8CqKmPuDqc0txfwJwvtweF0b1vJPX5oex5yuhm3r7 v6qUV2fmbxYUPeepMjb3AlV9KGbKkLv+vERRz4UbhIPfXO/9pfJpM0Q5PG5WcIzRFeZn SFtBHUYn4lXhDF/Il4SfHZfGcczDJPcqXsWxwM1BeYRgQrFwKYv4Hz0kPxxTOY7Q0RqF j+2ZcbLHL641P0kLOvbuRb9uPJZ/d8MMukD+kVsfzVSD+Trk9scpIFkzH+O6/GVOL0R9 JN8w== X-Received: by 10.112.168.132 with SMTP id zw4mr1271492lbb.79.1372858579988; Wed, 03 Jul 2013 06:36:19 -0700 (PDT) Received: from grey.home.unixconn.com (h-74-23.a183.priv.bahnhof.se. 
[46.59.74.23]) by mx.google.com with ESMTPSA id m1sm10885043lag.3.2013.07.03.06.36.18 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Wed, 03 Jul 2013 06:36:19 -0700 (PDT) Content-Type: text/plain; charset=us-ascii Mime-Version: 1.0 (Mac OS X Mail 6.5 \(1508\)) Subject: Re: Slow resilvering with mirrored ZIL From: mxb In-Reply-To: <51D42107.1050107@digsys.bg> Date: Wed, 3 Jul 2013 15:36:17 +0200 Content-Transfer-Encoding: quoted-printable Message-Id: <2EF46A8C-6908-4160-BF99-EC610B3EA771@alumni.chalmers.se> References: <51D42107.1050107@digsys.bg> To: Daniel Kalchev X-Mailer: Apple Mail (2.1508) X-Gm-Message-State: ALoCoQm+OORnKeyGYVI3uVBzvuSvDFCqSqnM/dGl0GIBnEgnF7wQabX0d2wJHVCufKUkRHeiQUFR Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 03 Jul 2013 13:36:27 -0000 Well, then my question persists - why do I get such a significant drop of speed while resilvering the second drive. The only changes to the system are: 1. Second partition for ZIL to create a mirror 2. New disks are 7200rpm, the old ones are 5400rpm. On 3 jul 2013, at 15:03, Daniel Kalchev wrote: > > On 03.07.13 15:46, Maxim Bourmistrov wrote: >> Any ideas? >> Am I completely wrong in assuming that ZIL is involved in the resilvering process? > > Yes. :-) > > Under some conditions, such as using dedup, resilver can be really slow. Nothing to do with ZIL though. You can safely remove ZIL while resilvering; it is used only during sync writes to the pool at file system level.
> > Daniel > _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" From owner-freebsd-fs@FreeBSD.ORG Wed Jul 3 14:40:39 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 08115F75 for ; Wed, 3 Jul 2013 14:40:39 +0000 (UTC) (envelope-from daniel@digsys.bg) Received: from smtp-sofia.digsys.bg (smtp-sofia.digsys.bg [193.68.21.123]) by mx1.freebsd.org (Postfix) with ESMTP id 83A8A103C for ; Wed, 3 Jul 2013 14:40:37 +0000 (UTC) Received: from dcave.digsys.bg (dcave.digsys.bg [193.68.6.1]) (authenticated bits=0) by smtp-sofia.digsys.bg (8.14.6/8.14.6) with ESMTP id r63EeY2S059208 (version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO); Wed, 3 Jul 2013 17:40:34 +0300 (EEST) (envelope-from daniel@digsys.bg) Message-ID: <51D437E2.4060101@digsys.bg> Date: Wed, 03 Jul 2013 17:40:34 +0300 From: Daniel Kalchev User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:17.0) Gecko/20130627 Thunderbird/17.0.7 MIME-Version: 1.0 To: mxb Subject: Re: Slow resilvering with mirrored ZIL References: <51D42107.1050107@digsys.bg> <2EF46A8C-6908-4160-BF99-EC610B3EA771@alumni.chalmers.se> In-Reply-To: <2EF46A8C-6908-4160-BF99-EC610B3EA771@alumni.chalmers.se> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 03 Jul 2013 14:40:39 -0000 On 03.07.13 16:36, mxb wrote: > Well, then my question persists - why do I get such a significant drop of speed while resilvering the second drive. > The only changes to the system are: > > 1.
Second partition for ZIL to create a mirror > 2. New disks are 7200rpm, the old ones are 5400rpm. > I can't see how the ZIL can be involved in resilvering. The ZIL is only used when you write synchronous data to the pool, which is not happening during resilver. As mentioned already, you do not need the ZIL during the resilver, and you can zpool remove the ZIL drives from your pool to verify. What is happening during resilver is reading from the other drives in the vdev and writing to the new drive. You mention four drives, so I assume this is a raidz? Is it possible that the old drives and the zpool vdev(s) are 512-byte sector size and the new drive is 4k sector size? If so, you might experience a severe slowdown. The only way to fix this situation is to recreate the zpool -- copy data out of the pool, create the pool with 4k alignment for vdevs and copy data back. You can check the ashift property in the output of zdb. If ashift=9 you have 512-byte sector vdev(s). If ashift=12 you have 4k sector vdev(s). In any case, you can easily rule out the effects of ZIL on your pool by just removing the log devices.
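The ashift rule of thumb above follows from ashift being the base-2 logarithm of the vdev sector size, so the two values map to sector sizes with simple shell arithmetic:

```shell
# ashift, as reported by zdb, is log2 of the vdev sector size:
# ashift=9 corresponds to 512-byte sectors, ashift=12 to 4096-byte (4k) sectors.
for ashift in 9 12; do
  echo "ashift=$ashift -> $((1 << ashift))-byte sectors"
done
```

This prints "ashift=9 -> 512-byte sectors" and "ashift=12 -> 4096-byte sectors"; in practice you would grep the zdb output of your own pool for the ashift line.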
Daniel From owner-freebsd-fs@FreeBSD.ORG Wed Jul 3 14:53:05 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 0CEDD185 for ; Wed, 3 Jul 2013 14:53:05 +0000 (UTC) (envelope-from feld@feld.me) Received: from out1-smtp.messagingengine.com (out1-smtp.messagingengine.com [66.111.4.25]) by mx1.freebsd.org (Postfix) with ESMTP id D533910C8 for ; Wed, 3 Jul 2013 14:53:04 +0000 (UTC) Received: from compute4.internal (compute4.nyi.mail.srv.osa [10.202.2.44]) by gateway1.nyi.mail.srv.osa (Postfix) with ESMTP id DA85B211D2; Wed, 3 Jul 2013 10:53:03 -0400 (EDT) Received: from frontend2.nyi.mail.srv.osa ([10.202.2.161]) by compute4.internal (MEProxy); Wed, 03 Jul 2013 10:53:03 -0400 DKIM-Signature: v=1; a=rsa-sha1; c=relaxed/relaxed; d=feld.me; h= content-type:to:cc:subject:references:date:mime-version :content-transfer-encoding:from:message-id:in-reply-to; s= mesmtp; bh=WO4IjV/IhYkEcIXFzzcDCjrotuw=; b=OpnANfXPL208eEg+hQPmH LU+aAc1I2k52am/293acCy/EWUebzZ5kB2hHbZ5Zlbexj2MVgexdfVCK7BODhoMT bpW8IKJAYKjuuKDTeikPiPkcbUPGeglVSCLWmx7xCyDEnNxRkDHJ3w5yxf3iAcAv 37qM1PcRXGBuMOXmyS5fpg= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed/relaxed; d= messagingengine.com; h=content-type:to:cc:subject:references :date:mime-version:content-transfer-encoding:from:message-id :in-reply-to; s=smtpout; bh=WO4IjV/IhYkEcIXFzzcDCjrotuw=; b=pzfI E2V98nJEUKfhBKLNJrYfy7NFCrb92o56bgTWykbJSmWO2j5rNqI4qMU2/Xpf0WxO 2DW1r9P/Ib5bcdOej9RRfpS1BX34gkDqpEk/P9rxSA8KNgoNi561Wu5N6CQSopYK pcni4FAmOxUcjJEg+GxmcpGc7+fvMAlwDPrFUug= X-Sasl-enc: EMpGqfnJf7Qp6itTWPgR5bdAtlv3V5n2oTXNTtNxl9AL 1372863183 Received: from tech304.office.supranet.net (unknown [66.170.8.18]) by mail.messagingengine.com (Postfix) with ESMTPA id 89B4D680252; Wed, 3 Jul 2013 10:53:03 -0400 (EDT) Content-Type: text/plain; charset=us-ascii; format=flowed; delsp=yes To: mxb , "Daniel Kalchev" Subject: Re: Slow resilvering with mirrored 
ZIL References: <51D42107.1050107@digsys.bg> <2EF46A8C-6908-4160-BF99-EC610B3EA771@alumni.chalmers.se> <51D437E2.4060101@digsys.bg> Date: Wed, 03 Jul 2013 09:53:03 -0500 MIME-Version: 1.0 Content-Transfer-Encoding: 7bit From: "Mark Felder" Message-ID: In-Reply-To: <51D437E2.4060101@digsys.bg> User-Agent: Opera Mail/12.15 (FreeBSD) Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 03 Jul 2013 14:53:05 -0000 On Wed, 03 Jul 2013 09:40:34 -0500, Daniel Kalchev wrote: > > What is happening during resilver is reading from the other drives in > the vdev and writing to the new drive. You mention four drives so I > assume this is an raidz? Is it possible that the old drives and the > zpool vdev(s) are 512byte sector size and the new drive is 4k sector > size? If so, you might experience severe slowdown. The only way to fix > this situation is to recreate the zpool -- copy data out of the pool, > create the pool with 4k alignment for vdevs and copy data back. This might be what he's experiencing -- I ran into this recently when I knowingly put a 4K drive into a server that used 512b sectors in the zpool. It was an emergency as we ran out of 512b 2TB spares. As soon as we got a proper spare I did replace the drive and performance went back to normal and the resilver was extremely fast too. 
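For scale, the resilver times implied by the speeds reported earlier in the thread (roughly 1.5T to copy, at about 50 MB/s healthy versus the observed 5 MB/s) can be estimated with integer shell arithmetic; these figures are only back-of-the-envelope, taken from the numbers quoted in the messages above:

```shell
# Rough resilver-time estimate: ~1.5T of data at two observed rates.
data_mb=$((1536 * 1024))   # 1.5T expressed in MB
for rate in 50 5; do       # MB/s: healthy vs degraded
  echo "${rate} MB/s -> ~$((data_mb / rate / 3600)) hours"
done
```

At 50 MB/s this works out to roughly 8 hours, while 5 MB/s stretches the same resilver to about 87 hours, which is why a 10x throughput drop matters so much here.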
From owner-freebsd-fs@FreeBSD.ORG Wed Jul 3 14:55:27 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id C3284234 for ; Wed, 3 Jul 2013 14:55:27 +0000 (UTC) (envelope-from prvs=189683e577=killing@multiplay.co.uk) Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) by mx1.freebsd.org (Postfix) with ESMTP id 6760E10EF for ; Wed, 3 Jul 2013 14:55:26 +0000 (UTC) Received: from r2d2 ([82.69.141.170]) by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (MDaemon PRO v10.0.4) with ESMTP id md50004699269.msg for ; Wed, 03 Jul 2013 15:55:24 +0100 X-Spam-Processed: mail1.multiplay.co.uk, Wed, 03 Jul 2013 15:55:24 +0100 (not processed: message from valid local sender) X-MDDKIM-Result: neutral (mail1.multiplay.co.uk) X-MDRemoteIP: 82.69.141.170 X-Return-Path: prvs=189683e577=killing@multiplay.co.uk X-Envelope-From: killing@multiplay.co.uk X-MDaemon-Deliver-To: freebsd-fs@freebsd.org Message-ID: From: "Steven Hartland" To: "Daniel Kalchev" , "mxb" References: <51D42107.1050107@digsys.bg> <2EF46A8C-6908-4160-BF99-EC610B3EA771@alumni.chalmers.se> <51D437E2.4060101@digsys.bg> Subject: Re: Slow resilvering with mirrored ZIL Date: Wed, 3 Jul 2013 15:55:34 +0100 MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=response Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.5931 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6157 Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 03 Jul 2013 14:55:27 -0000 ----- Original Message ----- From: "Daniel Kalchev" To: "mxb" Cc: Sent: Wednesday, July 03, 2013 3:40 PM Subject: Re: Slow resilvering 
with mirrored ZIL > > On 03.07.13 16:36, mxb wrote: >> Well, then my question persists - why I get so significant drop of speed while resilvering second drive. >> The only changes to the system are: >> >> 1. Second partition for ZIL to create a mirror >> 2. New disks are 7200rpm. old ones are 5400rpm. >> Isn't it something like the old disks are 512-byte sectors whereas the new ones are 4k? If this is the case, then having already replaced one disk you've killed performance, as it's having to do lots more work reading non-4k-aligned data. Regards, Steve ================================================ This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmaster@multiplay.co.uk.
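Daniel's earlier suggestion to recreate the pool with 4K alignment was, on FreeBSD of this era, usually done with a gnop(8) shim; a minimal sketch, with example device and pool names:

```shell
# Present one member as a 4K-sector device so zpool create picks
# ashift=12 for the whole vdev.
gnop create -S 4096 /dev/da0
zpool create tank raidz /dev/da0.nop /dev/da1 /dev/da2 /dev/da3

# The shim is only needed at creation time; ashift is then fixed
# for the life of the vdev.
zpool export tank
gnop destroy /dev/da0.nop
zpool import tank
```

After the re-import the pool runs on the plain device nodes, but keeps the 4K alignment it was created with.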
From owner-freebsd-fs@FreeBSD.ORG Wed Jul 3 15:12:27 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 8140A9F1 for ; Wed, 3 Jul 2013 15:12:27 +0000 (UTC) (envelope-from mxb@alumni.chalmers.se) Received: from mail-la0-f45.google.com (mail-la0-f45.google.com [209.85.215.45]) by mx1.freebsd.org (Postfix) with ESMTP id 07D5111E2 for ; Wed, 3 Jul 2013 15:12:26 +0000 (UTC) Received: by mail-la0-f45.google.com with SMTP id fr10so250177lab.32 for ; Wed, 03 Jul 2013 08:12:20 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=subject:mime-version:content-type:from:x-priority:in-reply-to:date :cc:content-transfer-encoding:message-id:references:to:x-mailer :x-gm-message-state; bh=+e810TrZ6e/H52JEj76YPdVzcC76QUo2Xbj8T3xc1jw=; b=j7aDGtF3kChBwjEqsev7+54kt1HHotHVL6yjWQS1Mjq1+1qn7zXwFOW77i0abI1Aro mnEzWDX62ZKNT0lx69uzqEyJdYtQWRvUJVGRP61pxRAOZPDqnIF7+cFKaAvSXnvtDo2W 9Fr+T3XTFgfJ3+VFibpSaoZ9Fbk4fZJCnlCJX6sEmGsaiUiqnmnzW/Q63yu5OaNQ5xTA Gf/BB93UNJOjC1bwfZWdeOYCdF5Fgkbh63EHM37IrRtUVZZEk9CR7g5tm7SoabTZojQn 8PUiflELexdrsmuvC1Tf6RE98l7v+//Ly/thVGINfBS0ltoC/SlbB6avxTHlXIteZXbQ PjhA== X-Received: by 10.152.87.172 with SMTP id az12mr727636lab.24.1372864340284; Wed, 03 Jul 2013 08:12:20 -0700 (PDT) Received: from grey.home.unixconn.com (h-74-23.a183.priv.bahnhof.se. 
[46.59.74.23]) by mx.google.com with ESMTPSA id c4sm11027841lae.7.2013.07.03.08.12.18 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Wed, 03 Jul 2013 08:12:19 -0700 (PDT) Subject: Re: Slow resilvering with mirrored ZIL Mime-Version: 1.0 (Mac OS X Mail 6.5 \(1508\)) Content-Type: text/plain; charset=iso-8859-1 From: mxb X-Priority: 3 In-Reply-To: Date: Wed, 3 Jul 2013 17:12:17 +0200 Content-Transfer-Encoding: quoted-printable Message-Id: References: <51D42107.1050107@digsys.bg> <2EF46A8C-6908-4160-BF99-EC610B3EA771@alumni.chalmers.se> <51D437E2.4060101@digsys.bg> To: Steven Hartland X-Mailer: Apple Mail (2.1508) X-Gm-Message-State: ALoCoQmJDwG9bIQbf4RVDFaezRH6mIKNfQUCFVNYPkVuFQd+kpbOwqdhPJFGMtolmYAmelR75qdV Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 03 Jul 2013 15:12:27 -0000 Not sure if the new ones are 4k. Done nothing about that. But it is the SECOND drive whose resilvering is SLOW, not the first one. As stated below, those are the only changes introduced to the system. ALL new drives ARE identical, except S/N of course :) On 3 jul 2013, at 16:55, Steven Hartland wrote: > > ----- Original Message ----- From: "Daniel Kalchev" > To: "mxb" > Cc: > Sent: Wednesday, July 03, 2013 3:40 PM > Subject: Re: Slow resilvering with mirrored ZIL > > >> On 03.07.13 16:36, mxb wrote: >>> Well, then my question persists - why I get so significant drop of speed while resilvering second drive. >>> The only changes to the system are: >>> >>> 1. Second partition for ZIL to create a mirror >>> 2. New disks are 7200rpm. old ones are 5400rpm. >>> > > Its not something like the old disks are 512byte sectors > where as the new ones are 4k? > > It this is the case having already replaced one disk you've > killed performance as its having to do lots more work reading > none 4k aligned data? 
> > Regards > steve > > ================================================ > This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. > In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 > or return the E.mail to postmaster@multiplay.co.uk. > From owner-freebsd-fs@FreeBSD.ORG Wed Jul 3 16:40:46 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 96114E9C for ; Wed, 3 Jul 2013 16:40:46 +0000 (UTC) (envelope-from bofh@terranova.net) Received: from tog.net (tog.net [IPv6:2605:5a00::5]) by mx1.freebsd.org (Postfix) with ESMTP id 6B87217F4 for ; Wed, 3 Jul 2013 16:40:46 +0000 (UTC) Received: from [IPv6:2605:5a00:ffff::face] (unknown [IPv6:2605:5a00:ffff::face]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by tog.net (Postfix) with ESMTPSA id 3blp3C2fLRz2mT for ; Wed, 3 Jul 2013 12:40:39 -0400 (EDT) Message-ID: <51D45401.5050801@terranova.net> Date: Wed, 03 Jul 2013 12:40:33 -0400 From: Travis Mikalson Organization: TerraNovaNet Internet Services User-Agent: Thunderbird 2.0.0.24 (Windows/20100228) MIME-Version: 1.0 To: freebsd-fs@freebsd.org Subject: Report: ZFS deadlock in 9-STABLE X-Enigmail-Version: 0.96.0 OpenPGP: url=http://www.terranova.net/pgp/bofh Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 03 Jul 2013 
16:40:46 -0000 Hello, To cut to the chase, I have a procstat -kk -a captured during a livelock for you here: http://tog.net/freebsd/zfsdeadlock-storage1-20130703 The other relevant configurations I could think of to show you are available within that http://tog.net/freebsd/ directory. If you want any additional information that I haven't given here please let me know! This is a FreeBSD 9-STABLE AMD64 system currently at: r250777: Sat May 18 17:41:39 EDT 2013 I didn't see too many relevant ZFS-related fixes after that date so am waiting for another round of interesting commits to update again. Unfortunately, this system has been livelocking on average about once every 7-14 days. Its lot in life is a ZFS storage server serving NFS and istgt traffic. It has 32GB of RAM and is an 8-core 2.6GHz Opteron 6212. The zpool looks like this, it has eight 1TB SAS drives and two SSDs being used for log and cache. pool: storage1 state: ONLINE status: The pool is formatted using a legacy on-disk format. The pool can still be used, but some features are unavailable. action: Upgrade the pool using 'zpool upgrade'. Once this is done, the pool will no longer be accessible on software that does not support feature flags. scan: scrub repaired 0 in 6h4m with 0 errors on Sun Jan 6 06:39:38 2013 config: NAME STATE READ WRITE CKSUM storage1 ONLINE 0 0 0 raidz1-0 ONLINE 0 0 0 da0 ONLINE 0 0 0 da2 ONLINE 0 0 0 da4 ONLINE 0 0 0 da6 ONLINE 0 0 0 raidz1-1 ONLINE 0 0 0 da1 ONLINE 0 0 0 da3 ONLINE 0 0 0 da5 ONLINE 0 0 0 da7 ONLINE 0 0 0 logs mirror-2 ONLINE 0 0 0 da8p2 ONLINE 0 0 0 da9p2 ONLINE 0 0 0 cache da8p3 ONLINE 0 0 0 da9p3 ONLINE 0 0 0 errors: No known data errors -- TerraNovaNet Internet Services - Key Largo, FL Voice: (305)453-4011 x101 Fax: (305)451-5991 http://www.terranova.net/ PGP: 50091B3D ---------------------------------------------- Life's not fair, but the root password helps. 
From owner-freebsd-fs@FreeBSD.ORG Wed Jul 3 17:03:09 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 755BB794 for ; Wed, 3 Jul 2013 17:03:09 +0000 (UTC) (envelope-from prvs=189683e577=killing@multiplay.co.uk) Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) by mx1.freebsd.org (Postfix) with ESMTP id 05E2D194C for ; Wed, 3 Jul 2013 17:03:08 +0000 (UTC) Received: from r2d2 ([82.69.141.170]) by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (MDaemon PRO v10.0.4) with ESMTP id md50004701467.msg for ; Wed, 03 Jul 2013 18:03:06 +0100 X-Spam-Processed: mail1.multiplay.co.uk, Wed, 03 Jul 2013 18:03:06 +0100 (not processed: message from valid local sender) X-MDDKIM-Result: neutral (mail1.multiplay.co.uk) X-MDRemoteIP: 82.69.141.170 X-Return-Path: prvs=189683e577=killing@multiplay.co.uk X-Envelope-From: killing@multiplay.co.uk X-MDaemon-Deliver-To: freebsd-fs@freebsd.org Message-ID: <6F014EB7E2A446E5B7FDDF086E77880E@multiplay.co.uk> From: "Steven Hartland" To: "Travis Mikalson" , References: <51D45401.5050801@terranova.net> Subject: Re: Report: ZFS deadlock in 9-STABLE Date: Wed, 3 Jul 2013 18:03:15 +0100 MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=original Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.5931 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6157 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 03 Jul 2013 17:03:09 -0000 Any zvols? 
----- Original Message ----- From: "Travis Mikalson" To: Sent: Wednesday, July 03, 2013 5:40 PM Subject: Report: ZFS deadlock in 9-STABLE > Hello, > > To cut to the chase, I have a procstat -kk -a captured during a livelock > for you here: > http://tog.net/freebsd/zfsdeadlock-storage1-20130703 > > The other relevant configurations I could think of to show you are > available within that http://tog.net/freebsd/ directory. > > If you want any additional information that I haven't given here please > let me know! > > This is a FreeBSD 9-STABLE AMD64 system currently at: > r250777: Sat May 18 17:41:39 EDT 2013 > > I didn't see too many relevant ZFS-related fixes after that date so am > waiting for another round of interesting commits to update again. > > Unfortunately, this system has been livelocking on average about once > every 7-14 days. Its lot in life is a ZFS storage server serving NFS and > istgt traffic. > > It has 32GB of RAM and is an 8-core 2.6GHz Opteron 6212. > The zpool looks like this, it has eight 1TB SAS drives and two SSDs > being used for log and cache. > > pool: storage1 > state: ONLINE > status: The pool is formatted using a legacy on-disk format. The pool can > still be used, but some features are unavailable. > action: Upgrade the pool using 'zpool upgrade'. Once this is done, the > pool will no longer be accessible on software that does not > support feature > flags. 
> scan: scrub repaired 0 in 6h4m with 0 errors on Sun Jan 6 06:39:38 2013 > config: > > NAME STATE READ WRITE CKSUM > storage1 ONLINE 0 0 0 > raidz1-0 ONLINE 0 0 0 > da0 ONLINE 0 0 0 > da2 ONLINE 0 0 0 > da4 ONLINE 0 0 0 > da6 ONLINE 0 0 0 > raidz1-1 ONLINE 0 0 0 > da1 ONLINE 0 0 0 > da3 ONLINE 0 0 0 > da5 ONLINE 0 0 0 > da7 ONLINE 0 0 0 > logs > mirror-2 ONLINE 0 0 0 > da8p2 ONLINE 0 0 0 > da9p2 ONLINE 0 0 0 > cache > da8p3 ONLINE 0 0 0 > da9p3 ONLINE 0 0 0 > > errors: No known data errors > > -- > TerraNovaNet Internet Services - Key Largo, FL > Voice: (305)453-4011 x101 Fax: (305)451-5991 > http://www.terranova.net/ PGP: 50091B3D > ---------------------------------------------- > Life's not fair, but the root password helps. > _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" > ================================================ This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmaster@multiplay.co.uk. 
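For anyone following along, Steven's "Any zvols?" question can be answered with a quick check; the dataset name in the second command is only an example:

```shell
# List any ZFS volumes (zvols) on the system; istgt extents are often
# backed by zvols, which take a different I/O path than plain datasets.
zfs list -t volume

# If any exist, the properties that affect write behaviour are worth noting:
zfs get volblocksize,sync storage1/somevol
```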
From owner-freebsd-fs@FreeBSD.ORG Wed Jul 3 19:05:40 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id C66B57C9 for ; Wed, 3 Jul 2013 19:05:40 +0000 (UTC) (envelope-from zoltan.arnold.nagy@gmail.com) Received: from mail-oa0-x22d.google.com (mail-oa0-x22d.google.com [IPv6:2607:f8b0:4003:c02::22d]) by mx1.freebsd.org (Postfix) with ESMTP id 9A2A01FE8 for ; Wed, 3 Jul 2013 19:05:40 +0000 (UTC) Received: by mail-oa0-f45.google.com with SMTP id j1so699459oag.4 for ; Wed, 03 Jul 2013 12:05:40 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:date:message-id:subject:from:to:content-type; bh=nOdmYPy8hGFfoL/BXAFXzzCOwQ4vuwqol3wLM1vwlQg=; b=QRWBKfcPA7JbuV8kY1CkWFrX9DVcPp876DeVrB+iJ4ebQDLLLPSDiM2rXPbP3/poWe 17JMfv2YZCN9Bp39IM5vhckjBOrUJrLWDKWHvZ/9ZtCtkclSn9cDIwrJNUsB1Ox6DIe3 FO/rWZRW6uiZGMkx89y0SxNOnjqXAqcBqsG4ZQt0Ols6M/Wlste04rOOv8cqNYwPA7Kj eqndZX3QLVHQFJZK9ur9rtL/Sml9ifCow1NohpdEUFHq1pffgzntCNXw4D7yokCThOIG Xf4L+Jfcq5qfb6fSvcsgZYlWbHRqiEoEZgBsRX/RTKpEl3iopuirwfZdQJ9FZwR6KnyB HkJA== MIME-Version: 1.0 X-Received: by 10.182.81.233 with SMTP id d9mr2227342oby.43.1372878340203; Wed, 03 Jul 2013 12:05:40 -0700 (PDT) Received: by 10.76.126.195 with HTTP; Wed, 3 Jul 2013 12:05:40 -0700 (PDT) Date: Wed, 3 Jul 2013 21:05:40 +0200 Message-ID: Subject: O_DIRECT|O_SYNC semantics? 
From: Zoltan Arnold NAGY To: freebsd-fs@freebsd.org Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.14 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 03 Jul 2013 19:05:40 -0000 Hi, Could someone have a look here: http://serverfault.com/questions/520141/please-explain-my-fio-results-is-o-synco-direct-misbehaving-on-linux Basically, I'm seeing vastly different results on Linux and on FreeBSD 9.1. Either FreeBSD's not honoring O_SYNC properly, or Linux does something wicked. I've been at it for a few days, without any real progress. I do realize that since I'm operating at the block-device level, not with any filesystem, it's strange to ask on -fs, but I came to these results while experimenting with the SSD as a ZIL device, and was surprised at the low numbers. Thanks, Zoltan From owner-freebsd-fs@FreeBSD.ORG Wed Jul 3 19:24:17 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 50283F68; Wed, 3 Jul 2013 19:24:17 +0000 (UTC) (envelope-from delphij@delphij.net) Received: from anubis.delphij.net (anubis.delphij.net [IPv6:2001:470:1:117::25]) by mx1.freebsd.org (Postfix) with ESMTP id 2FA351124; Wed, 3 Jul 2013 19:24:17 +0000 (UTC) Received: from zeta.ixsystems.com (drawbridge.ixsystems.com [206.40.55.65]) (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by anubis.delphij.net (Postfix) with ESMTPSA id 6742B9DEB; Wed, 3 Jul 2013 12:24:16 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=delphij.net; s=anubis; t=1372879456; bh=3JD/VofTCjjEyo1dzZYVJQXQiEJ4GXuZ2ey46hQqmiE=; h=Date:From:Reply-To:To:CC:Subject:References:In-Reply-To; b=YO0h6UN9GmraKPzQ62Kf+1rTchWDUueLr79cq6hSoOgQE6U5HlZp9WFtVPaBYPqFa 
5DFWzxunsNr6lKPcNhTLnigKTcX7Org0P1RHFLZXBjKmS57ElfpfkB3BivqWJj/TYN UlIry9moDI+WG9fwl+/Ilula6Xo7WMEk10xO6ibI= Message-ID: <51D47A5F.3030501@delphij.net> Date: Wed, 03 Jul 2013 12:24:15 -0700 From: Xin Li Organization: The FreeBSD Project MIME-Version: 1.0 To: Travis Mikalson Subject: Re: Report: ZFS deadlock in 9-STABLE References: <51D45401.5050801@terranova.net> In-Reply-To: <51D45401.5050801@terranova.net> X-Enigmail-Version: 1.5.1 Content-Type: multipart/mixed; boundary="------------030900050708080305020709" Cc: freebsd-fs@freebsd.org, kib@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list Reply-To: d@delphij.net List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 03 Jul 2013 19:24:17 -0000 This is a multi-part message in MIME format. --------------030900050708080305020709 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA512 Hi, Sorry for the top posting but I am quite convinced that this is a known issue that we have seen with our customer. Please try applying this patch [1] and please report back if that fixes your problem. Note that if you would like to provide more help, we would appreciate that you test Konstantin's patch as well, at: http://lists.freebsd.org/pipermail/freebsd-hackers/2013-May/042876.html [1] See attachment; the commit is https://github.com/trueos/trueos/commit/f678ae7c7f72fba577b00e3d0c237c4f297575c6 Cheers, On 07/03/13 09:40, Travis Mikalson wrote: > Hello, > > To cut to the chase, I have a procstat -kk -a captured during a > livelock for you here: > http://tog.net/freebsd/zfsdeadlock-storage1-20130703 > > The other relevant configurations I could think of to show you are > available within that http://tog.net/freebsd/ directory. > > If you want any additional information that I haven't given here > please let me know! 
> > This is a FreeBSD 9-STABLE AMD64 system currently at: r250777: Sat > May 18 17:41:39 EDT 2013 > > I didn't see too many relevant ZFS-related fixes after that date so > am waiting for another round of interesting commits to update > again. > > Unfortunately, this system has been livelocking on average about > once every 7-14 days. Its lot in life is a ZFS storage server > serving NFS and istgt traffic. > > It has 32GB of RAM and is an 8-core 2.6GHz Opteron 6212. The zpool > looks like this, it has eight 1TB SAS drives and two SSDs being > used for log and cache. > > pool: storage1 state: ONLINE status: The pool is formatted using a > legacy on-disk format. The pool can still be used, but some > features are unavailable. action: Upgrade the pool using 'zpool > upgrade'. Once this is done, the pool will no longer be accessible > on software that does not support feature flags. scan: scrub > repaired 0 in 6h4m with 0 errors on Sun Jan 6 06:39:38 2013 > config: > > NAME STATE READ WRITE CKSUM storage1 ONLINE 0 > 0 0 raidz1-0 ONLINE 0 0 0 da0 ONLINE 0 > 0 0 da2 ONLINE 0 0 0 da4 ONLINE 0 > 0 0 da6 ONLINE 0 0 0 raidz1-1 ONLINE 0 > 0 0 da1 ONLINE 0 0 0 da3 ONLINE 0 > 0 0 da5 ONLINE 0 0 0 da7 ONLINE 0 > 0 0 logs mirror-2 ONLINE 0 0 0 da8p2 ONLINE > 0 0 0 da9p2 ONLINE 0 0 0 cache da8p3 > ONLINE 0 0 0 da9p3 ONLINE 0 0 0 > > errors: No known data errors > - -- Xin LI https://www.delphij.net/ FreeBSD - The Power to Serve! 
Live free or die -----BEGIN PGP SIGNATURE----- iQEcBAEBCgAGBQJR1HpeAAoJEG80Jeu8UPuzxQAH/iwsYlntqDdNt+nLl45KxzKV Zf0Nh1i0OMNJvSlMW/h1N89AChrCEjUQm+YNZ1+1QPR+kR/GiRsCHYeRzEYExfUH 98i0gGefr63/2vOML7+NgBc90Kf+cSdouMV+dOuhWNgD4t/aHbbJktIKR8Ye/T+8 20W89Ts34xr9D0IfcXhZB5JBlcBl9nrtD/vD7IZ2KVP8icjLh1TSKU8kEREka8EZ MGS0EfDF8KjfzekGCaSV/AQTDpUdltcRqxE7bG5IWTu0sRGmemqZjD5ilAPX0ls9 LctLiwp/k7xBJ8cUR9Zq9wBd6ISSb6Cc90Pf8Rm60438sDzUdwk9l5m9+BxPX+U= =ME+s -----END PGP SIGNATURE----- --------------030900050708080305020709 Content-Type: text/plain; charset=UTF-8; name="f678ae7c7f72fba577b00e3d0c237c4f297575c6.diff" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="f678ae7c7f72fba577b00e3d0c237c4f297575c6.diff" diff --git a/sys/kern/kern_intr.c b/sys/kern/kern_intr.c index 33db213..75e0912 100644 --- a/sys/kern/kern_intr.c +++ b/sys/kern/kern_intr.c @@ -841,7 +841,7 @@ static void priv_ithread_execute_handler(struct proc *p, * again and remove this handler if it has already passed * it on the list. */ - ie->ie_thread->it_need = 1; + atomic_store_rel_int(&ie->ie_thread->it_need, 1); } else TAILQ_REMOVE(&ie->ie_handlers, handler, ih_next); thread_unlock(ie->ie_thread->it_thread); @@ -912,7 +912,7 @@ static void priv_ithread_execute_handler(struct proc *p, * running. Then, lock the thread and see if we actually need to * put it on the runqueue. */ - it->it_need = 1; + atomic_store_rel_int(&it->it_need, 1); thread_lock(td); if (TD_AWAITING_INTR(td)) { CTR3(KTR_INTR, "%s: schedule pid %d (%s)", __func__, p->p_pid, @@ -990,7 +990,7 @@ static void priv_ithread_execute_handler(struct proc *p, * again and remove this handler if it has already passed * it on the list. */ - it->it_need = 1; + atomic_store_rel_int(&it->it_need, 1); } else TAILQ_REMOVE(&ie->ie_handlers, handler, ih_next); thread_unlock(it->it_thread); @@ -1066,7 +1066,7 @@ static void priv_ithread_execute_handler(struct proc *p, * running. 
Then, lock the thread and see if we actually need to * put it on the runqueue. */ - it->it_need = 1; + atomic_store_rel_int(&it->it_need, 1); thread_lock(td); if (TD_AWAITING_INTR(td)) { CTR3(KTR_INTR, "%s: schedule pid %d (%s)", __func__, p->p_pid, @@ -1256,7 +1256,7 @@ static void priv_ithread_execute_handler(struct proc *p, * interrupt threads always invoke all of their handlers. */ if (ie->ie_flags & IE_SOFT) { - if (!ih->ih_need) + if (atomic_load_acq_int(&ih->ih_need) == 0) continue; else atomic_store_rel_int(&ih->ih_need, 0); @@ -1358,7 +1358,7 @@ static void priv_ithread_execute_handler(struct proc *p, * we are running, it will set it_need to note that we * should make another pass. */ - while (ithd->it_need) { + while (atomic_load_acq_int(&ithd->it_need) != 0) { /* * This might need a full read and write barrier * to make sure that this write posts before any @@ -1377,7 +1377,8 @@ static void priv_ithread_execute_handler(struct proc *p, * set again, so we have to check it again. */ thread_lock(td); - if (!ithd->it_need && !(ithd->it_flags & (IT_DEAD | IT_WAIT))) { + if ((atomic_load_acq_int(&ithd->it_need) == 0) && + !(ithd->it_flags & (IT_DEAD | IT_WAIT))) { TD_SET_IWAIT(td); ie->ie_count = 0; mi_switch(SW_VOL | SWT_IWAIT, NULL); @@ -1538,7 +1539,7 @@ static void priv_ithread_execute_handler(struct proc *p, * we are running, it will set it_need to note that we * should make another pass. */ - while (ithd->it_need) { + while (atomic_load_acq_int(&ithd->it_need) != 0) { /* * This might need a full read and write barrier * to make sure that this write posts before any @@ -1560,7 +1561,8 @@ static void priv_ithread_execute_handler(struct proc *p, * set again, so we have to check it again. 
*/ thread_lock(td); - if (!ithd->it_need && !(ithd->it_flags & (IT_DEAD | IT_WAIT))) { + if ((atomic_load_acq_int(&ithd->it_need) == 0) && + !(ithd->it_flags & (IT_DEAD | IT_WAIT))) { TD_SET_IWAIT(td); ie->ie_count = 0; mi_switch(SW_VOL | SWT_IWAIT, NULL); --------------030900050708080305020709-- From owner-freebsd-fs@FreeBSD.ORG Wed Jul 3 19:28:35 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id CC6DEC7 for ; Wed, 3 Jul 2013 19:28:35 +0000 (UTC) (envelope-from berend@pobox.com) Received: from smtp.pobox.com (b-pb-sasl-quonix.pobox.com [208.72.237.35]) by mx1.freebsd.org (Postfix) with ESMTP id 89DBD1158 for ; Wed, 3 Jul 2013 19:28:33 +0000 (UTC) Received: from smtp.pobox.com (unknown [127.0.0.1]) by b-sasl-quonix.pobox.com (Postfix) with ESMTP id CB1452DE12; Wed, 3 Jul 2013 19:28:31 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=pobox.com; h=date :message-id:from:to:cc:subject:in-reply-to:references :mime-version:content-type:content-transfer-encoding; s=sasl; bh=PSB//v+n5QSPKwkd8BMwLDup5HQ=; b=IrpcJYMvOTEbQpBGKCsDzJ39gYEi o3bVRkuWb6SxTb9mHw9NBPVRHCKVF4mUWvGpNWX9MGUqeSIwnqE1/nxNOhFQOh/F IPUZEMy+faCdHhxvRspEsuwR7UPdUg6CrdOaBKDDYoZaQsMFAFwp4rmpKKPP+lZ+ /aHNH1CMKVy/AWA= DomainKey-Signature: a=rsa-sha1; c=nofws; d=pobox.com; h=date:message-id :from:to:cc:subject:in-reply-to:references:mime-version :content-type:content-transfer-encoding; q=dns; s=sasl; b=Bkzw6c aodypifgBaajqXPd55UexDVrt4etqhO4N6yYSpmlIR0lYZNjml0pQO6VRxMe3c+R rQHD4BjtqdGQcRFV6KL98GTLOri1ZAkFiEnwOZ9v5/whM/Pyd7yU6XJSlY7XfaPf blI93Ex+kRQzMEIoYOo1p5nGgAX4jw5/9mPlA= Received: from b-pb-sasl-quonix.pobox.com (unknown [127.0.0.1]) by b-sasl-quonix.pobox.com (Postfix) with ESMTP id C0DDA2DE11; Wed, 3 Jul 2013 19:28:31 +0000 (UTC) Received: from bmach.nederware.nl (unknown [27.252.247.84]) by b-sasl-quonix.pobox.com (Postfix) with ESMTPA id 1205C2DE10; Wed, 3 Jul 2013 19:28:31 +0000 (UTC) 
Received: from quadrio.nederware.nl (quadrio.nederware.nl [192.168.33.13]) by bmach.nederware.nl (Postfix) with ESMTP id AC8F25C55; Thu, 4 Jul 2013 07:28:23 +1200 (NZST) Received: from quadrio.nederware.nl (quadrio.nederware.nl [127.0.0.1]) by quadrio.nederware.nl (Postfix) with ESMTP id 3BF304A11BF3; Thu, 4 Jul 2013 07:28:28 +1200 (NZST) Date: Thu, 04 Jul 2013 07:28:28 +1200 Message-ID: <87obaj4f4z.wl%berend@pobox.com> From: Berend de Boer To: Daniel Kalchev Subject: Re: EBS snapshot backups from a FreeBSD zfs file system: zpool freeze? In-Reply-To: <51D42069.5060409@digsys.bg> References: <87li5o5tz2.wl%berend@pobox.com> <87ehbg5raq.wl%berend@pobox.com> <20130703055047.GA54853@icarus.home.lan> <6488DECC-2455-4E92-B432-C39490D18484@dragondata.com> <14A2336A-969C-4A13-9EFA-C0C42A12039F@hostpoint.ch> <87zju43sxl.wl%berend@pobox.com> <87ppuz51os.wl%berend@pobox.com> <51D42069.5060409@digsys.bg> User-Agent: Wanderlust/2.15.9 (Almost Unreal) SEMI-EPG/1.14.7 (Harue) FLIM/1.14.9 (=?UTF-8?B?R29qxY0=?=) APEL/10.8 EasyPG/1.0.0 Emacs/24.3 (i686-pc-linux-gnu) MULE/6.0 (HANACHIRUSATO) Organization: Xplain Technology Ltd MIME-Version: 1.0 (generated by SEMI-EPG 1.14.7 - "Harue") Content-Type: multipart/signed; boundary="pgp-sign-Multipart_Thu_Jul__4_07:28:27_2013-1"; micalg=pgp-sha256; protocol="application/pgp-signature" Content-Transfer-Encoding: 7bit X-Pobox-Relay-ID: BE7D8D16-E416-11E2-B126-E84251E3A03C-48001098!b-pb-sasl-quonix.pobox.com Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 03 Jul 2013 19:28:35 -0000 --pgp-sign-Multipart_Thu_Jul__4_07:28:27_2013-1 Content-Type: text/plain; charset=US-ASCII >>>>> "Daniel" == Daniel Kalchev writes: Daniel> It seems, Amazon uses some sort of ZFS (volume) snapshots Daniel> in order to implement the functionality of EBS. 
Why it Daniel> would take hours to complete is hard to understand, It all depends on how big your disks are :-) First snapshot takes longer, then they take differences. Perhaps 1TB takes 20 minutes or so? But diffs are much faster, usually 2 minutes or so. -- All the best, Berend de Boer ------------------------------------------------------ Awesome Drupal hosting: https://www.xplainhosting.com/ --pgp-sign-Multipart_Thu_Jul__4_07:28:27_2013-1 Content-Type: application/pgp-signature Content-Transfer-Encoding: 7bit Content-Description: OpenPGP Digital Signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.17 (GNU/Linux) iQIcBAABCAAGBQJR1HtbAAoJEKOfeD48G3g5FlIP/iICV3dCIXMMtP8G5WQSPRo2 e780pLr+wfLShagoyfCLg0FAHO1UcodPZGTc3XV2fjZBVZ/vP9R5XghiPdxJKjB2 ZRCZDTptLcWpbGGiVzMa/vq9KnTsHAMnv6b8Xw47vwLrsJItq9eL5prIneT1zvEX OHu2a5760fHcKHVMVU/LiggdN6kPTilbHInVZwJ3MMKidsR0TaKhwxwdS7UiDpMQ ixga19ZxdcmHJDlCK0BnYSc58TzM5QCQMEhmttHjZ/s7Oxh5LLD536YR08bptREm hxeNAcj8640Tir5NLDvJgXZcBZEGcEX57/dCGwAa5uMEs9LouFb/ktYquZ7z8Ats jY9wRyj6l3Y8ivOMzXwt0kKcohey7KrLrWU3NwLUc4cta/8stypynUq8dyaaGBEp nZ83LlrN1WLJn/IhfE2fBo4jjx//qNhhP86EdIaEXc/n1YjgVKEUpp/+wH8QFXL3 OWiA7pYT8XSQxVbCah3qfkiEwVLlkBKB6EwFgSdbs+5DparbEAFebBGmpjBNwaFl lkPOnq3ZSRDvtC+GFeLsuB5PbG835d/Y2WX4TTsv/k9FVeggtw5EizA/3cl4bUQU HBbbG899PyoSJI5v2LYziqnQHHLFpXYb+aMovYH0LSOnvjwnUu1SJuPT81jZJaeE LSE9ITzWYC7VE8eCFzYJ =v+Aw -----END PGP SIGNATURE----- --pgp-sign-Multipart_Thu_Jul__4_07:28:27_2013-1-- From owner-freebsd-fs@FreeBSD.ORG Wed Jul 3 19:39:31 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 27225242 for ; Wed, 3 Jul 2013 19:39:31 +0000 (UTC) (envelope-from berend@pobox.com) Received: from smtp.pobox.com (b-pb-sasl-quonix.pobox.com [208.72.237.35]) by mx1.freebsd.org (Postfix) with ESMTP id E423211B8 for ; Wed, 3 Jul 2013 19:39:30 +0000 (UTC) Received: from smtp.pobox.com (unknown [127.0.0.1]) by b-sasl-quonix.pobox.com 
(Postfix) with ESMTP id C3E6F2D4A4; Wed, 3 Jul 2013 19:39:29 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=pobox.com; h=date :message-id:from:to:cc:subject:in-reply-to:references :mime-version:content-type:content-transfer-encoding; s=sasl; bh=F+9mwqDN+fTkz2QgWiIpvEtrfzQ=; b=o52CRUr1zfNsf4rmu5jEb1ZwSdO8 +tptMpJCsDwiCQAgmIBLMcTPpq4pfPZzCFqsMXliLS5LvtwNd+xRRrTupTpPXJTZ YcoamxyqIXAyyDL8li0aoI2eRlIH8FpWfXOnzCtuqIYFy4jkcy03c/3RVGvCtO8f iPVrlyaHTnaHuBE= DomainKey-Signature: a=rsa-sha1; c=nofws; d=pobox.com; h=date:message-id :from:to:cc:subject:in-reply-to:references:mime-version :content-type:content-transfer-encoding; q=dns; s=sasl; b=Vcx2es cgwvSXzafRe1o/gHqHN9s4BBxnQXthlNTpTC/us+F/FQTpV9tuiEHruhtGAb+KJg sBGZv8U3V/s2Tc/fhaH7K7mIHm7ZRAzmdFb7aAc9k3EbrzSie+ZcuYY/lKILwV6S NYze11LFNouAcemlq39eHVB/cAOo4JcuiiPsA= Received: from b-pb-sasl-quonix.pobox.com (unknown [127.0.0.1]) by b-sasl-quonix.pobox.com (Postfix) with ESMTP id BA8B92D4A3; Wed, 3 Jul 2013 19:39:29 +0000 (UTC) Received: from bmach.nederware.nl (unknown [27.252.247.84]) by b-sasl-quonix.pobox.com (Postfix) with ESMTPA id 749F52D49B; Wed, 3 Jul 2013 19:39:28 +0000 (UTC) Received: from quadrio.nederware.nl (quadrio.nederware.nl [192.168.33.13]) by bmach.nederware.nl (Postfix) with ESMTP id 8D8EF5C55; Thu, 4 Jul 2013 07:39:21 +1200 (NZST) Received: from quadrio.nederware.nl (quadrio.nederware.nl [127.0.0.1]) by quadrio.nederware.nl (Postfix) with ESMTP id 3090E4A11BF3; Thu, 4 Jul 2013 07:39:26 +1200 (NZST) Date: Thu, 04 Jul 2013 07:39:26 +1200 Message-ID: <87mwq34emp.wl%berend@pobox.com> From: Berend de Boer To: "Mark Felder" Subject: Re: EBS snapshot backups from a FreeBSD zfs file system: zpool freeze? 
In-Reply-To: References: <87li5o5tz2.wl%berend@pobox.com> <87ehbg5raq.wl%berend@pobox.com> <20130703055047.GA54853@icarus.home.lan> <6488DECC-2455-4E92-B432-C39490D18484@dragondata.com> <871u7g57rl.wl%berend@pobox.com> User-Agent: Wanderlust/2.15.9 (Almost Unreal) SEMI-EPG/1.14.7 (Harue) FLIM/1.14.9 (=?UTF-8?B?R29qxY0=?=) APEL/10.8 EasyPG/1.0.0 Emacs/24.3 (i686-pc-linux-gnu) MULE/6.0 (HANACHIRUSATO) Organization: Xplain Technology Ltd MIME-Version: 1.0 (generated by SEMI-EPG 1.14.7 - "Harue") Content-Type: multipart/signed; boundary="pgp-sign-Multipart_Thu_Jul__4_07:39:25_2013-1"; micalg=pgp-sha256; protocol="application/pgp-signature" Content-Transfer-Encoding: 7bit X-Pobox-Relay-ID: 46551D48-E418-11E2-9C99-E84251E3A03C-48001098!b-pb-sasl-quonix.pobox.com Cc: freebsd-fs X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 03 Jul 2013 19:39:31 -0000 --pgp-sign-Multipart_Thu_Jul__4_07:39:25_2013-1 Content-Type: text/plain; charset=US-ASCII >>>>> "Mark" == Mark Felder writes: Mark> On the other hand, every time I read about "block storage Mark> snapshots" -- even if you quiesce the filesystem -- I start Mark> to get really itchy thinking about the likeliness a high TPS Mark> database is going to end up with corruption and require Mark> recovery. :) That's not how it works: if you freeze the file system at a consistent point, you can use the roll-forward/backward capabilities of your db to come back clean. You can do this even fancier. Mysql or Mongo allow you to flush their caches as well + put in place a global lock of the database. Then you freeze the file system, take the snapshot, unfreeze file system, then unfreeze mysql. 
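The flush-lock-snapshot-unlock sequence described above can be sketched as follows; the ZFS dataset name is an example, and the key point is that the global read lock must be held by the same client session that takes the snapshot (the mysql client's SYSTEM command runs a shell command without closing the session):

```shell
# Quiesce MySQL, snapshot the dataset it lives on, then release the lock.
mysql <<'SQL'
FLUSH TABLES WITH READ LOCK;
SYSTEM zfs snapshot tank/mysql@backup
UNLOCK TABLES;
SQL
```

Because the snapshot is atomic and the caches were flushed first, the snapshot can be restored without replaying more than the usual crash-recovery log.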
People have been using this for many years; a well-known utility for this, for example, was mylvmbackup: http://www.lenzg.net/mylvmbackup/ ZFS should work really well here, and people probably use zfs snapshots in this manner. If performance is an issue, you do this on the slave, obviously. But I don't get the itchy part: if a disk is just software, I can copy it. I want to copy it. To another data centre (zone) for example. That's a trivial operation in EBS, and you can clone huge disks this way in minutes. Doing a zfs send/recv is just laughably primitive and slow compared to this. It would take me days to send a full snapshot this way. Mark> This really does sound like Amazon needs to provide whatever Mark> mechanism to communicate between the host and the guest so Mark> this EBS snapshot can take place. Again, my request has *nothing* to do with EBS. If you have multiple disks in your pool, how can you make a backup you can restore from, at the hardware level? -- All the best, Berend de Boer ------------------------------------------------------ Awesome Drupal hosting: https://www.xplainhosting.com/ --pgp-sign-Multipart_Thu_Jul__4_07:39:25_2013-1 Content-Type: application/pgp-signature Content-Transfer-Encoding: 7bit Content-Description: OpenPGP Digital Signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.17 (GNU/Linux) iQIcBAABCAAGBQJR1H3tAAoJEKOfeD48G3g5ql8P/3bHdabY6NmGUkoNma2liHqV UcXJaoBzE6CbYtqI/evR/ZNfLLX5MK9wChdRV4dlVbAVMzpo7u/9+i8J68WDyoS8 hyUQjHVWJ8o50eblblRIxWf1hUrimGGQ5T5xVXvrEZTSmyvsMWAsNZAs12gk5YHI McTvXFkDmEefRzPQiIymY0tjRzASDiNX8daBbkYefxcGJU7CMQBbufAdvonlxXdB J0PIwxyFH3YyUJ52YoET0WOWnU3//A9+R8NNYDrISd3gG0bxAD7xLcenK3dqR+lY mt2pJhccltLHtVuC52nUnxkCO+CtuXOWGRdO0pbJO0LIDOoGC0lVCNcl14jQYcF3 bRwC7E7vs0EbNTSJYdjr2RuUIbLUM04Qn83r4xnmAN9zBdTeZzG3a4bFcpjMw+Ij gSdCFmlHokSaYs0m2ccy0FocueIMZjjT2lJFHWSign3GCYrewceHJqkoLrTVDkQY EPPRPS0Fre7v1jLyH2ozURksIpAWfkWmXZtBTvvPhhVLr1XS0hbMEIWZA4kyoPip lRBT4kTwyJWu69h9i3XWNxy6herrb9AUoC2MFMESbTqDo/DFbzTz4X2O4ZOm4F8v
b5UamlMD6odeCrslNywoc8zp86qlK/gBGQ7Xybxa8XEyGLfh/VCwhqsOpXSKBz/j of+vLR4aCPV9BkeTGr73 =5+03 -----END PGP SIGNATURE----- --pgp-sign-Multipart_Thu_Jul__4_07:39:25_2013-1-- From owner-freebsd-fs@FreeBSD.ORG Wed Jul 3 20:02:45 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id D005E80A for ; Wed, 3 Jul 2013 20:02:45 +0000 (UTC) (envelope-from gpalmer@freebsd.org) Received: from mail.in-addr.com (unknown [IPv6:2001:470:8:162::1]) by mx1.freebsd.org (Postfix) with ESMTP id A4E4912BD for ; Wed, 3 Jul 2013 20:02:45 +0000 (UTC) Received: from gjp by mail.in-addr.com with local (Exim 4.80.1 (FreeBSD)) (envelope-from ) id 1UuTGD-000LwI-7Q; Wed, 03 Jul 2013 16:02:41 -0400 Date: Wed, 3 Jul 2013 16:02:41 -0400 From: Gary Palmer To: Berend de Boer Subject: Re: EBS snapshot backups from a FreeBSD zfs file system: zpool freeze? Message-ID: <20130703200241.GB60515@in-addr.com> References: <87li5o5tz2.wl%berend@pobox.com> <87ehbg5raq.wl%berend@pobox.com> <20130703055047.GA54853@icarus.home.lan> <6488DECC-2455-4E92-B432-C39490D18484@dragondata.com> <871u7g57rl.wl%berend@pobox.com> <87mwq34emp.wl%berend@pobox.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <87mwq34emp.wl%berend@pobox.com> X-SA-Exim-Connect-IP: X-SA-Exim-Mail-From: gpalmer@freebsd.org X-SA-Exim-Scanned: No (on mail.in-addr.com); SAEximRunCond expanded to false Cc: freebsd-fs X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 03 Jul 2013 20:02:45 -0000 On Thu, Jul 04, 2013 at 07:39:26AM +1200, Berend de Boer wrote: > Again, my request has *nothing* to do with EBS. If you have multiple > disks in your pool, how can you make a backup you can restore from, at > the hardware level. 
Other than using SAN (FC or iSCSI), I know of no reason to do backups at the raw disk level, nor any real demand. I've worked with people who have done LUN based backups in the past and they have one drawback - they tend to back up the entire LUN, irrespective of whether it is an allocated block or not. Modern systems that implement some kind of TRIM emulation (or cheat and sniff the filesystem block allocation maps) may alleviate that problem. However, in the vast majority of cases, people back up from above the FS, not below. This makes your use case probably more tied to EBS than you may otherwise think. Regards, Gary From owner-freebsd-fs@FreeBSD.ORG Wed Jul 3 20:09:40 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id E9DFDAA8 for ; Wed, 3 Jul 2013 20:09:39 +0000 (UTC) (envelope-from jdc@koitsu.org) Received: from relay5-d.mail.gandi.net (relay5-d.mail.gandi.net [217.70.183.197]) by mx1.freebsd.org (Postfix) with ESMTP id 74B0D134C for ; Wed, 3 Jul 2013 20:09:39 +0000 (UTC) Received: from mfilter24-d.gandi.net (mfilter24-d.gandi.net [217.70.178.152]) by relay5-d.mail.gandi.net (Postfix) with ESMTP id 53AAA41C053; Wed, 3 Jul 2013 22:09:28 +0200 (CEST) X-Virus-Scanned: Debian amavisd-new at mfilter24-d.gandi.net Received: from relay5-d.mail.gandi.net ([217.70.183.197]) by mfilter24-d.gandi.net (mfilter24-d.gandi.net [10.0.15.180]) (amavisd-new, port 10024) with ESMTP id n3o4K+3nbssn; Wed, 3 Jul 2013 22:09:26 +0200 (CEST) X-Originating-IP: 76.102.14.35 Received: from jdc.koitsu.org (c-76-102-14-35.hsd1.ca.comcast.net [76.102.14.35]) (Authenticated sender: jdc@koitsu.org) by relay5-d.mail.gandi.net (Postfix) with ESMTPSA id 66D1841C06C; Wed, 3 Jul 2013 22:09:25 +0200 (CEST) Received: by icarus.home.lan (Postfix, from userid 1000) id 7067973A1C; Wed, 3 Jul 2013 13:09:23 -0700 (PDT) Date: Wed, 3 Jul 2013 13:09:23 -0700 From: Jeremy Chadwick To: Alexandr 
Subject: Re: Whole disk ZFS or -a4k partition Message-ID: <20130703200923.GA70533@icarus.home.lan> References: <51D3ED4F.5030102@shurik.kiev.ua> <8afaecbe0f764963b57fac7743f483bc@DBXPR07MB064.eurprd07.prod.outlook.com> <51D42416.8080604@shurik.kiev.ua> MIME-Version: 1.0 Content-Type: text/plain; charset=unknown-8bit Content-Disposition: inline In-Reply-To: <51D42416.8080604@shurik.kiev.ua> User-Agent: Mutt/1.5.21 (2010-09-15) Content-Transfer-Encoding: quoted-printable Cc: "freebsd-fs@freebsd.org" X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 03 Jul 2013 20:09:40 -0000 On Wed, Jul 03, 2013 at 04:16:06PM +0300, Alexandr wrote: > On 03.07.2013 14:56, Ivailo Tanusheff wrote: > > Hi, > > > > I can give you my point of view: > > > > Which zpool do you refer to? Still, I would align both pools - on the SSD because you will have boot code there anyway. > > What I would do on the SSD is to use: > > gpart add -b 34 -s 94 -t freebsd-boot > > > > The second thing to keep in mind is the swap - I would not use ZFS for that. I had a lot of issues with a similar setup: taking into account that ZFS uses a lot of RAM, you can end up in a situation where the system wants to swap data out to free RAM for ZFS, but the swap volume lives on the ZFS subsystem itself, which is a deadlock scenario. > > So I would advise you to create a swap partition on the SSD and use this partition in the system: > > gpart add -t freebsd-swap -s 4G -l ssd-swap > > gpart add -t freebsd-zfs -l ssd-zfs > > > > This will solve the alignment and swap issues for you. > > > > About the second drive - there are two issues: you may need to replace the disk, and you may need additional swap in some case.
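Assembled into one sequence, the SSD layout in the quoted advice above might look like this (a sketch; the device name ada0, the GPT scheme creation, the bootcode step, and the -a 4k alignment flags are illustrative additions, not from the original advice):

```shell
# Hypothetical device name - adjust to the actual SSD.
gpart create -s gpt ada0
# Boot code area, as in the quoted advice:
gpart add -b 34 -s 94 -t freebsd-boot ada0
gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ada0
# 4K-aligned swap and ZFS partitions, labelled as in the quoted advice:
gpart add -t freebsd-swap -s 4G -a 4k -l ssd-swap ada0
gpart add -t freebsd-zfs -a 4k -l ssd-zfs ada0
# Enable the swap partition by its GPT label:
swapon /dev/gpt/ssd-swap
```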
> > Just because the future is unsure, I would recommend again using 4K alignment, creating a similar swap partition and NOT using it on the system, and using gnop for the 4K sizing: > > gnop create -S 4096 /dev/gpt/disk0 > > zpool create zroot /dev/gpt/disk0.nop > > zpool export zroot > > gnop destroy /dev/gpt/disk0.nop > > zpool import zroot > > > > This way you will solve: > > - the 4K issues, if you have any; > > - the replacement issues, if you get a disk with a smaller sector count; > > - in case you have some swap issues/needs, you will be able to issue swapon /dev/gpt/... and temporarily increase the swap; by not using it all the time you will not decrease performance of the system in normal days. > > > > Best regards, > > Ivailo Tanusheff > > > Thank you for your explanation. I'll try it soon and post my results. One > thing - I can't use gpt disks because my laptop's BIOS (Lenovo > Thinkpad E530) cannot boot them, only mbr-style. You can use GPT with a BIOS that only supports MBR (in other words, you do not need UEFI to boot from GPT). FreeBSD's boot blocks are intelligent in this regard. I know because 1) I use this on my own systems that do not have UEFI, 2) tons of other people here do as well, and 3) even Wikipedia states so. :-) -- | Jeremy Chadwick jdc@koitsu.org | | UNIX Systems Administrator http://jdc.koitsu.org/ | | Making life hard for others since 1977.
PGP 4BD6C0CB | From owner-freebsd-fs@FreeBSD.ORG Wed Jul 3 20:17:52 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 2255CD49 for ; Wed, 3 Jul 2013 20:17:52 +0000 (UTC) (envelope-from jdc@koitsu.org) Received: from relay3-d.mail.gandi.net (relay3-d.mail.gandi.net [217.70.183.195]) by mx1.freebsd.org (Postfix) with ESMTP id D768013BF for ; Wed, 3 Jul 2013 20:17:51 +0000 (UTC) Received: from mfilter21-d.gandi.net (mfilter21-d.gandi.net [217.70.178.149]) by relay3-d.mail.gandi.net (Postfix) with ESMTP id 333B5A80CF; Wed, 3 Jul 2013 22:17:39 +0200 (CEST) X-Virus-Scanned: Debian amavisd-new at mfilter21-d.gandi.net Received: from relay3-d.mail.gandi.net ([217.70.183.195]) by mfilter21-d.gandi.net (mfilter21-d.gandi.net [10.0.15.180]) (amavisd-new, port 10024) with ESMTP id UZEmdKZONIxo; Wed, 3 Jul 2013 22:17:37 +0200 (CEST) X-Originating-IP: 76.102.14.35 Received: from jdc.koitsu.org (c-76-102-14-35.hsd1.ca.comcast.net [76.102.14.35]) (Authenticated sender: jdc@koitsu.org) by relay3-d.mail.gandi.net (Postfix) with ESMTPSA id 3327FA80CD; Wed, 3 Jul 2013 22:17:37 +0200 (CEST) Received: by icarus.home.lan (Postfix, from userid 1000) id 4E68C73A1C; Wed, 3 Jul 2013 13:17:35 -0700 (PDT) Date: Wed, 3 Jul 2013 13:17:35 -0700 From: Jeremy Chadwick To: Zoltan Arnold NAGY Subject: Re: O_DIRECT|O_SYNC semantics? 
Message-ID: <20130703201735.GB70533@icarus.home.lan> References: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 03 Jul 2013 20:17:52 -0000 On Wed, Jul 03, 2013 at 09:05:40PM +0200, Zoltan Arnold NAGY wrote: > Could someone have a look here: > http://serverfault.com/questions/520141/please-explain-my-fio-results-is-o-synco-direct-misbehaving-on-linux > > Basically, I'm seeing vastly different results on Linux and on FreeBSD 9.1. > Either FreeBSD's not honoring O_SYNC properly, or Linux does something > wicked. > > I've been at it for a few days, without any real progress. > > I do realize that since I'm operating at the block device level, not with any > filesystem, it's strange to ask on -fs, but I came to these results while > experimenting with the SSD as a ZIL device, and was surprised at the low > numbers. Block devices on FreeBSD are ***always*** O_DIRECT. There is no "caching mechanism" for them. Block devices on Linux result in caching, unless O_DIRECT is used. Because you're asking about some underlying kernel behaviour, I might recommend this be discussed on the -hackers list, where many of the low-level folks hang out. I can assure you that you're going to be asked to provide "dmesg" (on FreeBSD) from the system you're testing with, so you'd best have that ready. -- | Jeremy Chadwick jdc@koitsu.org | | UNIX Systems Administrator http://jdc.koitsu.org/ | | Making life hard for others since 1977.
PGP 4BD6C0CB | From owner-freebsd-fs@FreeBSD.ORG Wed Jul 3 20:20:06 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 160D6F46 for ; Wed, 3 Jul 2013 20:20:06 +0000 (UTC) (envelope-from wblock@wonkity.com) Received: from wonkity.com (wonkity.com [67.158.26.137]) by mx1.freebsd.org (Postfix) with ESMTP id B05651458 for ; Wed, 3 Jul 2013 20:20:05 +0000 (UTC) Received: from wonkity.com (localhost [127.0.0.1]) by wonkity.com (8.14.7/8.14.7) with ESMTP id r63KK45x096650; Wed, 3 Jul 2013 14:20:04 -0600 (MDT) (envelope-from wblock@wonkity.com) Received: from localhost (wblock@localhost) by wonkity.com (8.14.7/8.14.7/Submit) with ESMTP id r63KK4YB096647; Wed, 3 Jul 2013 14:20:04 -0600 (MDT) (envelope-from wblock@wonkity.com) Date: Wed, 3 Jul 2013 14:20:04 -0600 (MDT) From: Warren Block To: Jeremy Chadwick Subject: Re: Whole disk ZFS or -a4k partition In-Reply-To: <20130703200923.GA70533@icarus.home.lan> Message-ID: References: <51D3ED4F.5030102@shurik.kiev.ua> <8afaecbe0f764963b57fac7743f483bc@DBXPR07MB064.eurprd07.prod.outlook.com> <51D42416.8080604@shurik.kiev.ua> <20130703200923.GA70533@icarus.home.lan> User-Agent: Alpine 2.00 (BSF 1167 2008-08-23) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=UTF-8; format=flowed X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.4.3 (wonkity.com [127.0.0.1]); Wed, 03 Jul 2013 14:20:04 -0600 (MDT) Cc: "freebsd-fs@freebsd.org" X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 03 Jul 2013 20:20:06 -0000 On Wed, 3 Jul 2013, Jeremy Chadwick wrote: > On Wed, Jul 03, 2013 at 04:16:06PM +0300, Alexandr wrote: >>> >> Thank you for your explain. I'll try it soon and post my results. 
One >> thing - I can't use gpt-disks because off my laptop's bios (Lenovo >> Thinkpad E530) cannot boot it, only mbr-style. > > You can use GPT with a BIOS that only supports MBR (in other words, you > do not need UEFI to boot from GPT). FreeBSD's boot blocks are > intelligent in this regard. Yes. However, the Thinkpad BIOS is not intelligent about GPT: http://forums.freebsd.org/showthread.php?t=26759&highlight=UEFI+GPT http://www.dec.sakura.ne.jp/~junchoon/machine/freebsd-e.html From owner-freebsd-fs@FreeBSD.ORG Wed Jul 3 20:37:10 2013 Return-Path: Delivered-To: freebsd-fs@smarthost.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 04AFEEF5; Wed, 3 Jul 2013 20:37:10 +0000 (UTC) (envelope-from linimon@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:1900:2254:206c::16:87]) by mx1.freebsd.org (Postfix) with ESMTP id D25801699; Wed, 3 Jul 2013 20:37:09 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.7/8.14.7) with ESMTP id r63Kb91r037173; Wed, 3 Jul 2013 20:37:09 GMT (envelope-from linimon@freefall.freebsd.org) Received: (from linimon@localhost) by freefall.freebsd.org (8.14.7/8.14.7/Submit) id r63Kb9EW037172; Wed, 3 Jul 2013 20:37:09 GMT (envelope-from linimon) Date: Wed, 3 Jul 2013 20:37:09 GMT Message-Id: <201307032037.r63Kb9EW037172@freefall.freebsd.org> To: linimon@FreeBSD.org, freebsd-bugs@FreeBSD.org, freebsd-fs@FreeBSD.org From: linimon@FreeBSD.org Subject: Re: kern/180236: [zfs] [nullfs] Leakage free space using ZFS with nullfs on 9.1-STABLE X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 03 Jul 2013 20:37:10 -0000 Old Synopsis: Leakage free space using ZFS with nullfs on 9.1-STABLE New Synopsis: [zfs] [nullfs] Leakage free space 
using ZFS with nullfs on 9.1-STABLE Responsible-Changed-From-To: freebsd-bugs->freebsd-fs Responsible-Changed-By: linimon Responsible-Changed-When: Wed Jul 3 20:36:46 UTC 2013 Responsible-Changed-Why: Over to maintainer(s). http://www.freebsd.org/cgi/query-pr.cgi?pr=180236 From owner-freebsd-fs@FreeBSD.ORG Wed Jul 3 21:02:49 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 8D488BC7 for ; Wed, 3 Jul 2013 21:02:49 +0000 (UTC) (envelope-from jdc@koitsu.org) Received: from relay3-d.mail.gandi.net (relay3-d.mail.gandi.net [217.70.183.195]) by mx1.freebsd.org (Postfix) with ESMTP id 4A63A1818 for ; Wed, 3 Jul 2013 21:02:48 +0000 (UTC) Received: from mfilter4-d.gandi.net (mfilter4-d.gandi.net [217.70.178.134]) by relay3-d.mail.gandi.net (Postfix) with ESMTP id 51A0CA80D8; Wed, 3 Jul 2013 23:02:38 +0200 (CEST) X-Virus-Scanned: Debian amavisd-new at mfilter4-d.gandi.net Received: from relay3-d.mail.gandi.net ([217.70.183.195]) by mfilter4-d.gandi.net (mfilter4-d.gandi.net [10.0.15.180]) (amavisd-new, port 10024) with ESMTP id lbj1QjYZ5UOI; Wed, 3 Jul 2013 23:02:36 +0200 (CEST) X-Originating-IP: 76.102.14.35 Received: from jdc.koitsu.org (c-76-102-14-35.hsd1.ca.comcast.net [76.102.14.35]) (Authenticated sender: jdc@koitsu.org) by relay3-d.mail.gandi.net (Postfix) with ESMTPSA id 36527A80B0; Wed, 3 Jul 2013 23:02:36 +0200 (CEST) Received: by icarus.home.lan (Postfix, from userid 1000) id 7FFC173A1C; Wed, 3 Jul 2013 14:02:34 -0700 (PDT) Date: Wed, 3 Jul 2013 14:02:34 -0700 From: Jeremy Chadwick To: Warren Block Subject: Re: Whole disk ZFS or -a4k partition Message-ID: <20130703210234.GA71331@icarus.home.lan> References: <51D3ED4F.5030102@shurik.kiev.ua> <8afaecbe0f764963b57fac7743f483bc@DBXPR07MB064.eurprd07.prod.outlook.com> <51D42416.8080604@shurik.kiev.ua> <20130703200923.GA70533@icarus.home.lan> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii 
Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) Cc: "freebsd-fs@freebsd.org" X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 03 Jul 2013 21:02:49 -0000 On Wed, Jul 03, 2013 at 02:20:04PM -0600, Warren Block wrote: > On Wed, 3 Jul 2013, Jeremy Chadwick wrote: > >On Wed, Jul 03, 2013 at 04:16:06PM +0300, Alexandr wrote: > >>> > >>Thank you for your explain. I'll try it soon and post my results. One > >>thing - I can't use gpt-disks because off my laptop's bios (Lenovo > >>Thinkpad E530) cannot boot it, only mbr-style. > > > >You can use GPT with a BIOS that only supports MBR (in other words, you > >do not need UEFI to boot from GPT). FreeBSD's boot blocks are > >intelligent in this regard. > > Yes. However, the Thinkpad BIOS is not intelligent about GPT: > http://forums.freebsd.org/showthread.php?t=26759&highlight=UEFI+GPT > http://www.dec.sakura.ne.jp/~junchoon/machine/freebsd-e.html Cute. I like how the user expected Lenovo to respond on a web forum; nope, not going to happen. Phone calls + Email work better. And like many discussions about problems on forums, this one fell into the abyss. I couldn't find a PR for this T420 issue (searched PR single- and multi-line text fields for T420). The patch on Junchoon's page even says: "I'm pleased if someone takes over above and send-pr", yet no one did. *I'm* not going to, because I don't have a T420 to test patches on. The only relevant patch is the one listed first, i.e. "Fixes for 9-STABLE after r243243". The other disk-related ones are irrelevant at this point. (I'm intentionally avoiding the ACPI stuff listed, since we're focused on GPT/PMBR bits in this convo.) I did verify that as of stable/9 r252602 the 0xef Protected MBR workarounds for the T420 were not present.
So, simply put, someone who has this hardware needs to step up to the plate and verify that the sys/boot/common/part.c workaround works for them. If someone needs a "more thorough patch" (a single patch that includes the part.c and diskmbr.h modifications (the latter needed since "magic values" are legitimately shunned)), I can do that, but like I said, can't test it. -- | Jeremy Chadwick jdc@koitsu.org | | UNIX Systems Administrator http://jdc.koitsu.org/ | | Making life hard for others since 1977. PGP 4BD6C0CB | From owner-freebsd-fs@FreeBSD.ORG Wed Jul 3 21:26:12 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 12DDCA39 for ; Wed, 3 Jul 2013 21:26:12 +0000 (UTC) (envelope-from delphij@delphij.net) Received: from anubis.delphij.net (anubis.delphij.net [IPv6:2001:470:1:117::25]) by mx1.freebsd.org (Postfix) with ESMTP id EBA21194C for ; Wed, 3 Jul 2013 21:26:11 +0000 (UTC) Received: from zeta.ixsystems.com (drawbridge.ixsystems.com [206.40.55.65]) (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by anubis.delphij.net (Postfix) with ESMTPSA id 6158193D4; Wed, 3 Jul 2013 14:26:11 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=delphij.net; s=anubis; t=1372886771; bh=1I+PIBxXL6iPtF+UpBylMDU6QgJhqM9cxKMTjztb4ps=; h=Date:From:Reply-To:To:CC:Subject:References:In-Reply-To; b=UZi6njle4PRKDqMPL6EquhS8iMnvui+N3WXADvynfNxl94LBk7JmuvA34fE9C1Fg2 l5erxgz0elAg8aA4bSizQmirMXIvXERLk/TTrmcY8hm0/oeharGZpW7rzD5QOuyii2 uC5oyR62i660Do7lenvukPcfmGFwnDcn4kvJn3mQ= Message-ID: <51D496F2.6080000@delphij.net> Date: Wed, 03 Jul 2013 14:26:10 -0700 From: Xin Li Organization: The FreeBSD Project MIME-Version: 1.0 To: Warren Block Subject: Re: Whole disk ZFS or -a4k partition References: <51D3ED4F.5030102@shurik.kiev.ua> <8afaecbe0f764963b57fac7743f483bc@DBXPR07MB064.eurprd07.prod.outlook.com> <51D42416.8080604@shurik.kiev.ua>
<20130703200923.GA70533@icarus.home.lan> In-Reply-To: X-Enigmail-Version: 1.5.1 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: "freebsd-fs@freebsd.org" X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list Reply-To: d@delphij.net List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 03 Jul 2013 21:26:12 -0000 -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA512 On 07/03/13 13:20, Warren Block wrote: > On Wed, 3 Jul 2013, Jeremy Chadwick wrote: >> On Wed, Jul 03, 2013 at 04:16:06PM +0300, Alexandr wrote: >>>> >>> Thank you for your explain. I'll try it soon and post my >>> results. One thing - I can't use gpt-disks because off my >>> laptop's bios (Lenovo Thinkpad E530) cannot boot it, only >>> mbr-style. >> >> You can use GPT with a BIOS that only supports MBR (in other >> words, you do not need UEFI to boot from GPT). FreeBSD's boot >> blocks are intelligent in this regard. > > Yes. However, the Thinkpad BIOS is not intelligent about GPT: > http://forums.freebsd.org/showthread.php?t=26759&highlight=UEFI+GPT > > http://www.dec.sakura.ne.jp/~junchoon/machine/freebsd-e.html Not true. My Lenovo T530 boots fine with GPT after a BIOS upgrade and choosing "legacy" in the EFI boot options. Cheers, - -- Xin LI https://www.delphij.net/ FreeBSD - The Power to Serve! 
Live free or die -----BEGIN PGP SIGNATURE----- iQEcBAEBCgAGBQJR1JbyAAoJEG80Jeu8UPuzeNgH+QENFbLZPruhFeSvTZWza7O0 v0PDXdlT85mkhwEZn6NfUTFIg5s3Nqihq1+jrDFK+2/1GD5PBCs0z4FcZrdNYU0F W4faB8qEP1AaJdkRf8hutmz+Zpg81IluhXwuKwRu780NK+7KssCpdOH+W5iyfBLZ YtzlAuyuNs8CJX6xNAU9vEdgK1OcBE8LGJMzJjQxQTVJd4npOX59oTlY7rqChUQl ptL65gSbrntvoX8qI4Y8tc9Z1h1LPIETtmz/fVfiTHflhucsFkSNf8vPv8FoVs59 tRWI9uWykQX0lduxYS063r5CouHfKq1fQN2wt9aXyzOCA3nUNFxNlSQeodzry2k= =6BiX -----END PGP SIGNATURE----- From owner-freebsd-fs@FreeBSD.ORG Wed Jul 3 21:30:58 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 344DAC78 for ; Wed, 3 Jul 2013 21:30:58 +0000 (UTC) (envelope-from jdc@koitsu.org) Received: from relay3-d.mail.gandi.net (relay3-d.mail.gandi.net [217.70.183.195]) by mx1.freebsd.org (Postfix) with ESMTP id E53811991 for ; Wed, 3 Jul 2013 21:30:57 +0000 (UTC) Received: from mfilter10-d.gandi.net (mfilter10-d.gandi.net [217.70.178.139]) by relay3-d.mail.gandi.net (Postfix) with ESMTP id CFE81A80CA; Wed, 3 Jul 2013 23:30:46 +0200 (CEST) X-Virus-Scanned: Debian amavisd-new at mfilter10-d.gandi.net Received: from relay3-d.mail.gandi.net ([217.70.183.195]) by mfilter10-d.gandi.net (mfilter10-d.gandi.net [10.0.15.180]) (amavisd-new, port 10024) with ESMTP id ZB-+8-eFAUM3; Wed, 3 Jul 2013 23:30:45 +0200 (CEST) X-Originating-IP: 76.102.14.35 Received: from jdc.koitsu.org (c-76-102-14-35.hsd1.ca.comcast.net [76.102.14.35]) (Authenticated sender: jdc@koitsu.org) by relay3-d.mail.gandi.net (Postfix) with ESMTPSA id ED9C2A80BC; Wed, 3 Jul 2013 23:30:43 +0200 (CEST) Received: by icarus.home.lan (Postfix, from userid 1000) id DDBF673A1C; Wed, 3 Jul 2013 14:30:41 -0700 (PDT) Date: Wed, 3 Jul 2013 14:30:41 -0700 From: Jeremy Chadwick To: d@delphij.net Subject: Re: Whole disk ZFS or -a4k partition Message-ID: <20130703213041.GA72100@icarus.home.lan> References: <51D3ED4F.5030102@shurik.kiev.ua> 
<8afaecbe0f764963b57fac7743f483bc@DBXPR07MB064.eurprd07.prod.outlook.com> <51D42416.8080604@shurik.kiev.ua> <20130703200923.GA70533@icarus.home.lan> <51D496F2.6080000@delphij.net> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <51D496F2.6080000@delphij.net> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: "freebsd-fs@freebsd.org" X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 03 Jul 2013 21:30:58 -0000 On Wed, Jul 03, 2013 at 02:26:10PM -0700, Xin Li wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA512 > > On 07/03/13 13:20, Warren Block wrote: > > On Wed, 3 Jul 2013, Jeremy Chadwick wrote: > >> On Wed, Jul 03, 2013 at 04:16:06PM +0300, Alexandr wrote: > >>>> > >>> Thank you for your explain. I'll try it soon and post my > >>> results. One thing - I can't use gpt-disks because off my > >>> laptop's bios (Lenovo Thinkpad E530) cannot boot it, only > >>> mbr-style. > >> > >> You can use GPT with a BIOS that only supports MBR (in other > >> words, you do not need UEFI to boot from GPT). FreeBSD's boot > >> blocks are intelligent in this regard. > > > > Yes. However, the Thinkpad BIOS is not intelligent about GPT: > > http://forums.freebsd.org/showthread.php?t=26759&highlight=UEFI+GPT > > > > > http://www.dec.sakura.ne.jp/~junchoon/machine/freebsd-e.html > > Not true. My Lenovo T530 boots fine with GPT after a BIOS upgrade and > choosing "legacy" in the EFI boot options. The issue Warren listed off is specific to the Lenovo T420. It does not happen on the T430, nor later models (this is further confirmed by the last person posting in the thread). Models prior to the T420 may also have the same problem (speculative / I do not know for certain). I interpreted what Warren was saying to mean "yes it works, but some models of laptops/hardware don't work with it". 
-- | Jeremy Chadwick jdc@koitsu.org | | UNIX Systems Administrator http://jdc.koitsu.org/ | | Making life hard for others since 1977. PGP 4BD6C0CB | From owner-freebsd-fs@FreeBSD.ORG Wed Jul 3 21:37:24 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id A9FCAF4C for ; Wed, 3 Jul 2013 21:37:24 +0000 (UTC) (envelope-from delphij@delphij.net) Received: from anubis.delphij.net (anubis.delphij.net [IPv6:2001:470:1:117::25]) by mx1.freebsd.org (Postfix) with ESMTP id 912C019ED for ; Wed, 3 Jul 2013 21:37:24 +0000 (UTC) Received: from zeta.ixsystems.com (drawbridge.ixsystems.com [206.40.55.65]) (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by anubis.delphij.net (Postfix) with ESMTPSA id 4A2E4946F; Wed, 3 Jul 2013 14:37:24 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=delphij.net; s=anubis; t=1372887444; bh=l+Jmy4E6PMgARVe2g3MHYsKnNURX/NJ/sksBKf/todU=; h=Date:From:Reply-To:To:CC:Subject:References:In-Reply-To; b=Q4/QKRdrcwybbqmGWx7pBYRoxzFiZaZJl922om52LiKN+88HOh23032rdnP+eaWA/ J/g6IR0amqIF4Oe3mfKXFnFiaJNIF9m72ysO4aI/S/B+CdIdeDJFnaaj9xPZq4V1G6 t/72SP8NKeNn7cjrTJHyQBCevHvyGujpx7kdtuEc= Message-ID: <51D49993.1060703@delphij.net> Date: Wed, 03 Jul 2013 14:37:23 -0700 From: Xin Li Organization: The FreeBSD Project MIME-Version: 1.0 To: Alexandr Subject: Re: Whole disk ZFS or -a4k partition References: <51D3ED4F.5030102@shurik.kiev.ua> In-Reply-To: <51D3ED4F.5030102@shurik.kiev.ua> X-Enigmail-Version: 1.5.1 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list Reply-To: d@delphij.net List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 03 Jul 2013 21:37:24 -0000 -----BEGIN PGP SIGNED 
MESSAGE----- Hash: SHA512 On 07/03/13 02:22, Alexandr wrote: > Hello, community! > > I have a laptop with 2 disks - mSATA 32Gb SSD and 500Gb HDD. I plan > to use SSD as whole disk ZFS pool with root and /usr partitions and > 500Gb SATA as ZFS pool for /home and /var. My questions are: Why not use that 32GB SSD as an L2ARC (and/or ZIL) device? (It's generally a good idea to have a dedicated boot zpool, which means you may not be able to take full advantage of the SSD, though.) We do not currently support persistent L2ARC (the original author's patch is still pending Illumos developers' review), but I believe we will have it in the near future, so no "pre-warming" would be needed after reboot. This also gives you more flexibility -- a 32GB device is more likely to run out of space if it's configured as a separate pool, whereas as a cache device the system decides how best to use the SSD. > 1. Do I need to create a 4k aligned partition on HDD for zfs pool > or best to use a whole disk? Not if you don't boot from the pool. I generally create a GPT partition to make it easier to manage. > 2. Where the best place for swap - ssd or hdd? SSD is much faster, > but with limited write life cycle. I'd put swap on the HDD. You do not want your system to swap a lot, and generally it's a good idea to max out memory on a laptop if you run ZFS; in that case, the swap size can be reasonably large (the recommended swap size is no less than physical memory) but rarely used. It would be wasteful to use the SSD for swap in this case. Cheers, - -- Xin LI https://www.delphij.net/ FreeBSD - The Power to Serve!
Live free or die -----BEGIN PGP SIGNATURE----- iQEcBAEBCgAGBQJR1JmTAAoJEG80Jeu8UPuzLjAH/2V1CSGnnwXC8wK6My9Xnw+i bLCaoZvJuDEJKuOZuNaEmZ5YxkbLyvMpdvyoZ56lJpe87N+eH+syH6R7NpfSojut hvIg9fNkTy/fGo4hOTu0zdjvBUyLZ0Cst3lF/AQvQg6YRXV4CncWYqu1HpYvAD2v 4++qRnYNWZwO7oSZKl6FQ2O6l0Jciuw8mlRKjWHdJI+hic5Dli4u39XUqOUStRW6 1L57uzlv3wNh/oJYiF0x3wQJELMYaiteawpZklWTXek6zFKGoLN10NBhTASlhVUS tBL2KAdvITZf7NrvPX2HVkaWYaFkd7DT66z/sKl5Komt6h+siY0ZdGxgQmwbQm4= =EIZ7 -----END PGP SIGNATURE----- From owner-freebsd-fs@FreeBSD.ORG Wed Jul 3 21:39:06 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 44A5FFE0 for ; Wed, 3 Jul 2013 21:39:06 +0000 (UTC) (envelope-from delphij@delphij.net) Received: from anubis.delphij.net (anubis.delphij.net [IPv6:2001:470:1:117::25]) by mx1.freebsd.org (Postfix) with ESMTP id 295F119FC for ; Wed, 3 Jul 2013 21:39:06 +0000 (UTC) Received: from zeta.ixsystems.com (drawbridge.ixsystems.com [206.40.55.65]) (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by anubis.delphij.net (Postfix) with ESMTPSA id E87889484; Wed, 3 Jul 2013 14:39:05 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=delphij.net; s=anubis; t=1372887546; bh=PvqBmq66TByG2AuD2i1zUlT4VcfWI0bPuLyTDU57knU=; h=Date:From:Reply-To:To:Subject:References:In-Reply-To; b=OBT3SUOa7/dUJEclOUS/wnC+PkVQE88oxI+EvhhOLqBJM/NPxx7TH+VIhjfSleetY iiO4cXV+mDdnB5RRWr2TCKm9C2TPlwAtappnxoG+gIcA82dBAY8DZfkv3YNRrioTX2 GLqnK6KR35MIGvrdSDL7CA66iuKkQ5moKVJbECv8= Message-ID: <51D499F9.1020201@delphij.net> Date: Wed, 03 Jul 2013 14:39:05 -0700 From: Xin Li Organization: The FreeBSD Project MIME-Version: 1.0 To: freebsd-fs@freebsd.org Subject: Re: Whole disk ZFS or -a4k partition References: <51D3ED4F.5030102@shurik.kiev.ua> <8afaecbe0f764963b57fac7743f483bc@DBXPR07MB064.eurprd07.prod.outlook.com> <51D42416.8080604@shurik.kiev.ua> 
<20130703200923.GA70533@icarus.home.lan> <51D496F2.6080000@delphij.net> <20130703213041.GA72100@icarus.home.lan> In-Reply-To: <20130703213041.GA72100@icarus.home.lan> X-Enigmail-Version: 1.5.1 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list Reply-To: d@delphij.net List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 03 Jul 2013 21:39:06 -0000 -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA512 On 07/03/13 14:30, Jeremy Chadwick wrote: > On Wed, Jul 03, 2013 at 02:26:10PM -0700, Xin Li wrote: >> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA512 >> >> On 07/03/13 13:20, Warren Block wrote: >>> On Wed, 3 Jul 2013, Jeremy Chadwick wrote: >>>> On Wed, Jul 03, 2013 at 04:16:06PM +0300, Alexandr wrote: >>>>>> >>>>> Thank you for your explain. I'll try it soon and post my >>>>> results. One thing - I can't use gpt-disks because off my >>>>> laptop's bios (Lenovo Thinkpad E530) cannot boot it, only >>>>> mbr-style. >>>> >>>> You can use GPT with a BIOS that only supports MBR (in other >>>> words, you do not need UEFI to boot from GPT). FreeBSD's >>>> boot blocks are intelligent in this regard. >>> >>> Yes. However, the Thinkpad BIOS is not intelligent about GPT: >>> >>> http://forums.freebsd.org/showthread.php?t=26759&highlight=UEFI+GPT >>> >>> >> >>> http://www.dec.sakura.ne.jp/~junchoon/machine/freebsd-e.html >> >> Not true. My Lenovo T530 boots fine with GPT after a BIOS >> upgrade and choosing "legacy" in the EFI boot options. > > The issue Warren listed off is specific to the Lenovo T420. It > does not happen on the T430, nor later models (this is further > confirmed by the last person posting in the thread). Models prior > to the T420 may also have the same problem (speculative / I do not > know for certain). 
> > I interpreted what Warren was saying to mean "yes it works, but > some models of laptops/hardware don't work with it". Ah, OK. Did Lenovo release new BIOS/EFI updates for the older models? IIRC some of their consumer series also share this issue, by the way. Cheers, - -- Xin LI https://www.delphij.net/ FreeBSD - The Power to Serve! Live free or die -----BEGIN PGP SIGNATURE----- iQEcBAEBCgAGBQJR1Jn5AAoJEG80Jeu8UPuzumMH/jieSJ6ntbusCzKzI+yn3XI+ rglZi7Prlcau/Tmb1gKayAqRgm7u8NUwDglAD361hO5Q69MQDoHb10sHxCCmHbPB AbHicUuwbds2F2plhizodqKWP/3ZybWkDaryd1y/mVyaVb1vvvSHLno0smw4mllE gsONRfhR8QPEUgFsi9oIakIOAeCX/Bp/mrwlwEeqQq3nyiGocxvz+UgQqo50ifpC 0ncNOTZSEXj3H3HqehAnalTMG9awIgJ+vxX04CaX8Ks0vSX0H6q9ymjp1GDtq3HD nWgGz2hyhjmSzxrV/7rSvVmOGmR45D9GYeV7WCP49m291dKyTitHdarM4CyAQco= =h4Iw -----END PGP SIGNATURE----- From owner-freebsd-fs@FreeBSD.ORG Wed Jul 3 21:52:57 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 7DD798AF; Wed, 3 Jul 2013 21:52:57 +0000 (UTC) (envelope-from berend@pobox.com) Received: from smtp.pobox.com (b-pb-sasl-quonix.pobox.com [208.72.237.35]) by mx1.freebsd.org (Postfix) with ESMTP id 46F391A95; Wed, 3 Jul 2013 21:52:56 +0000 (UTC) Received: from smtp.pobox.com (unknown [127.0.0.1]) by b-sasl-quonix.pobox.com (Postfix) with ESMTP id E8BAB2DF4C; Wed, 3 Jul 2013 21:52:55 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=pobox.com; h=date :message-id:from:to:cc:subject:in-reply-to:references :mime-version:content-type:content-transfer-encoding; s=sasl; bh=Xwqhdn8NhTFXy2lmjxjhCiDuUXs=; b=fHl3a+hzKPDFZloBe0LXNftEdg2y HH1SL5CQbKE88Rn4FL428v+UPYBnIOCV0Db3XrZO5A+3f0lUXcijFy73SpkuXK5n uVyVWXX4BttWA8Ewoas+GWfOcbBJac7aS9RZmdZVHZZiz2b2UxttEJaSpVVRIBKT 686lrPpVB5aF/wM= DomainKey-Signature: a=rsa-sha1; c=nofws; d=pobox.com; h=date:message-id :from:to:cc:subject:in-reply-to:references:mime-version :content-type:content-transfer-encoding; q=dns; s=sasl;
b=fH5T7z RJIcUOTNt5tbW1/qa7JBt0/+8hOn3hVJ6MNol5hDskJLVTm2AcO52H0w3GC7kegT ehpsnPF1PkyX+pLDoQx0bpfyFu1cXAczERT8ZXlCu1I4ft0/V+qHKO9qAd54zjsd ubHal3g6NwJv9cRytg3zjVCF6Q3RHtr0S82bw= Received: from b-pb-sasl-quonix.pobox.com (unknown [127.0.0.1]) by b-sasl-quonix.pobox.com (Postfix) with ESMTP id DB08E2DF4B; Wed, 3 Jul 2013 21:52:55 +0000 (UTC) Received: from bmach.nederware.nl (unknown [27.252.247.84]) by b-sasl-quonix.pobox.com (Postfix) with ESMTPA id 01DA82DF48; Wed, 3 Jul 2013 21:52:55 +0000 (UTC) Received: from quadrio.nederware.nl (quadrio.nederware.nl [192.168.33.13]) by bmach.nederware.nl (Postfix) with ESMTP id 1D5175C55; Thu, 4 Jul 2013 09:52:48 +1200 (NZST) Received: from quadrio.nederware.nl (quadrio.nederware.nl [127.0.0.1]) by quadrio.nederware.nl (Postfix) with ESMTP id AEB1949FB971; Thu, 4 Jul 2013 09:52:52 +1200 (NZST) Date: Thu, 04 Jul 2013 09:52:52 +1200 Message-ID: <87k3l748gb.wl%berend@pobox.com> From: Berend de Boer To: Gary Palmer Subject: Re: EBS snapshot backups from a FreeBSD zfs file system: zpool freeze? 
In-Reply-To: <20130703200241.GB60515@in-addr.com> References: <87li5o5tz2.wl%berend@pobox.com> <87ehbg5raq.wl%berend@pobox.com> <20130703055047.GA54853@icarus.home.lan> <6488DECC-2455-4E92-B432-C39490D18484@dragondata.com> <871u7g57rl.wl%berend@pobox.com> <87mwq34emp.wl%berend@pobox.com> <20130703200241.GB60515@in-addr.com> User-Agent: Wanderlust/2.15.9 (Almost Unreal) SEMI-EPG/1.14.7 (Harue) FLIM/1.14.9 (=?UTF-8?B?R29qxY0=?=) APEL/10.8 EasyPG/1.0.0 Emacs/24.3 (i686-pc-linux-gnu) MULE/6.0 (HANACHIRUSATO) Organization: Xplain Technology Ltd MIME-Version: 1.0 (generated by SEMI-EPG 1.14.7 - "Harue") Content-Type: multipart/signed; boundary="pgp-sign-Multipart_Thu_Jul__4_09:52:52_2013-1"; micalg=pgp-sha256; protocol="application/pgp-signature" Content-Transfer-Encoding: 7bit X-Pobox-Relay-ID: EA991424-E42A-11E2-B469-E84251E3A03C-48001098!b-pb-sasl-quonix.pobox.com Cc: freebsd-fs X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 03 Jul 2013 21:52:57 -0000 --pgp-sign-Multipart_Thu_Jul__4_09:52:52_2013-1 Content-Type: text/plain; charset=US-ASCII >>>>> "Gary" == Gary Palmer writes: Gary> Other than using SAN (FC or iSCSI), I know of no reason to Gary> do backups at the raw disk level, nor any real demand. Probably the hundreds of thousands of businesses that use Amazon AWS disagree :-) Gary> I've worked with people who have done LUN based backups in Gary> the past and they have one drawback - they tend to back up Gary> the entire LUN, irrespective of whether it is an allocated Gary> block or not. Modern systems that implement some kind of Gary> TRIM emulation (or cheat and sniff the filesystem block Gary> allocation maps) may alleviate that problem. That's not how EBS does a back up. It only backs up allocated blocks for the first time, and for subsequent backups only back up the changed blocks. 
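[Editor's note: a common way to combine the two layers being discussed here is to take a recursive ZFS snapshot first, so the EBS-level snapshot contains a named consistent point. The sketch below assumes a pool named "tank" and a hypothetical volume ID, and uses the `aws` CLI merely as a stand-in for whatever EC2 API tooling is actually in use; since ZFS is transactional and always consistent on disk, an EBS snapshot taken at any instant should be importable, and the ZFS snapshot just provides a known restore point.]

```shell
# Take a recursive ZFS snapshot so the pool contains a named,
# transactionally consistent point in time:
zfs snapshot -r tank@ebs-backup

# Then snapshot the underlying EBS volume. As described above, EBS
# copies only allocated blocks the first time and only changed blocks
# on subsequent snapshots:
aws ec2 create-snapshot --volume-id vol-12345678 --description "tank@ebs-backup"
```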
Gary> However, in the vast majority of cases, people back up from Gary> above the FS, not below. This makes your use case probably Gary> more tied to EBS than you may otherwise think. People generally didn't have a choice I would say. Now millions of servers run on top of block storage. Disks are just software. That's the new world. -- All the best, Berend de Boer ------------------------------------------------------ Awesome Drupal hosting: https://www.xplainhosting.com/ --pgp-sign-Multipart_Thu_Jul__4_09:52:52_2013-1 Content-Type: application/pgp-signature Content-Transfer-Encoding: 7bit Content-Description: OpenPGP Digital Signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.17 (GNU/Linux) iQIcBAABCAAGBQJR1J00AAoJEKOfeD48G3g5bX4P/1+APTOza9rSnUxMeihQSB2m EQAGuUgYJu3YuLKuzPwxIVuSaCVJDOMaPPbTvam0WUNJ4RQtXoBOwOi+/J7ebFlC PAifLVm+/4P7a4QABFwXGFFn9igBfOB9U9yhMIlqoI/vYqcpbnfmr6h9WxfFTyGv 5bIUwQg7CaSMigajvefBl7C6tXWcAkR3qbyNQhGWluCg3HEu4GRhvPfOHLKN8Rcs KvirmL0TaswDoWmHXoOHgGBFKTyht2I0+BOcfxQOETAYz087IzV14sWSNSruRip7 mr/Cap/V7PAfHqmjZZ0N1BuW+P4ETQLlkGySK18cB4swLdk4gRGY6/e1jZLhet3w nr9KafFoo+6bRchn90z6JxFQ6GnzNkmajPJE8B2ISTsu0fDc6And9aArXxgJctV1 N0IlbmUX+o6adCD2uElpU0Nu2tLcR8DEzQCxDGMK74er7QuS3mTAWOinVu9gq76s pAQtBmWRGupalDHv8Mpr0Q88GHEbi2wQVMphIzlduzBN/uNiScIExy+Xw/O9hFM0 SauhFv/Sqe9p/wKerUNH+2v5cuATtj60OMEhpsOzhs/vxquCr+Qyb6xyfrjPynG+ /aM0cQUsddelL942Hd1esUCRbVO82t+YZmRW8tXLDj9w1uXIiaIFm1qrn8UF1N0A TpJMSBL0BgCy5gh6oPGP =IIWr -----END PGP SIGNATURE----- --pgp-sign-Multipart_Thu_Jul__4_09:52:52_2013-1-- From owner-freebsd-fs@FreeBSD.ORG Wed Jul 3 23:09:00 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 84AF98CE; Wed, 3 Jul 2013 23:09:00 +0000 (UTC) (envelope-from amvandemore@gmail.com) Received: from mail-pb0-x235.google.com (mail-pb0-x235.google.com [IPv6:2607:f8b0:400e:c01::235]) by mx1.freebsd.org (Postfix) with ESMTP id 5A0711D6F; Wed, 3 Jul 2013 
23:09:00 +0000 (UTC) Received: by mail-pb0-f53.google.com with SMTP id xb12so593354pbc.26 for ; Wed, 03 Jul 2013 16:09:00 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=GmR4Zz/u7O2uyJkvJcnCDOlYscE8FEP0eKiMtgmdCBQ=; b=uKESS8Bwl3cjyIeAvc1geqsxAMILhdBN9WwdU7dIIP0XpO0Q+HzepW0Em2FHDYgTct BGh2nne0kPTkXfx7hRZklIotyfPNMeOHyoT0LeObT1kSnxtMJYC2C0RD6M3yEMjcwybs C9BO7eBhymiDijqDYkvaQ/wRQeStXcH7BMjI4Cjr/2NYPcySepp/OJcWyBpsk9Tucfsb znUdqclWHUl0fcVw0DgoIAIHIHCRd2CRdV+ItORtJqAfRMhe+qW49AcNBaJpZ4a0Lo17 BOtm4dBOfsHaQzFO7/rXyrdj2gpUOVyk59PdIQzZzPi3zVCrqJ8lhVoqq3zOqH7WvniF vt1g== MIME-Version: 1.0 X-Received: by 10.66.40.136 with SMTP id x8mr4464187pak.33.1372892940183; Wed, 03 Jul 2013 16:09:00 -0700 (PDT) Received: by 10.70.88.74 with HTTP; Wed, 3 Jul 2013 16:09:00 -0700 (PDT) In-Reply-To: <87k3l748gb.wl%berend@pobox.com> References: <87li5o5tz2.wl%berend@pobox.com> <87ehbg5raq.wl%berend@pobox.com> <20130703055047.GA54853@icarus.home.lan> <6488DECC-2455-4E92-B432-C39490D18484@dragondata.com> <871u7g57rl.wl%berend@pobox.com> <87mwq34emp.wl%berend@pobox.com> <20130703200241.GB60515@in-addr.com> <87k3l748gb.wl%berend@pobox.com> Date: Wed, 3 Jul 2013 18:09:00 -0500 Message-ID: Subject: Re: EBS snapshot backups from a FreeBSD zfs file system: zpool freeze? From: Adam Vande More To: Berend de Boer Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.14 Cc: freebsd-fs X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 03 Jul 2013 23:09:00 -0000 On Wed, Jul 3, 2013 at 4:52 PM, Berend de Boer wrote: > >>>>> "Gary" == Gary Palmer writes: > > Gary> Other than using SAN (FC or iSCSI), I know of no reason to > Gary> do backups at the raw disk level, nor any real demand. 
> > Probably the hundreds of thousands of businesses that use Amazon AWS > disagree :-) > Well, that would be a SAN backup wouldn't it :) (Not NAS as you cited earlier) > Gary> However, in the vast majority of cases, people back up from > Gary> above the FS, not below. This makes your use case probably > Gary> more tied to EBS than you may otherwise think. > > People generally didn't have a choice I would say. Now millions of > servers run on top of block storage. > "People generally" don't use Amazon and especially EBS. Servers have pretty much always run on block storage. > Disks are just software. That's the new world. No matter how much you wish hard drive to be software they are still hard drives. -- Adam Vande More From owner-freebsd-fs@FreeBSD.ORG Wed Jul 3 23:36:55 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id B3AAA7AD; Wed, 3 Jul 2013 23:36:55 +0000 (UTC) (envelope-from jdc@koitsu.org) Received: from relay4-d.mail.gandi.net (relay4-d.mail.gandi.net [217.70.183.196]) by mx1.freebsd.org (Postfix) with ESMTP id 5AAF91EAB; Wed, 3 Jul 2013 23:36:55 +0000 (UTC) Received: from mfilter3-d.gandi.net (mfilter3-d.gandi.net [217.70.178.133]) by relay4-d.mail.gandi.net (Postfix) with ESMTP id D8E7517209C; Thu, 4 Jul 2013 01:36:36 +0200 (CEST) X-Virus-Scanned: Debian amavisd-new at mfilter3-d.gandi.net Received: from relay4-d.mail.gandi.net ([217.70.183.196]) by mfilter3-d.gandi.net (mfilter3-d.gandi.net [10.0.15.180]) (amavisd-new, port 10024) with ESMTP id 2PMoYLiZnndG; Thu, 4 Jul 2013 01:36:35 +0200 (CEST) X-Originating-IP: 76.102.14.35 Received: from jdc.koitsu.org (c-76-102-14-35.hsd1.ca.comcast.net [76.102.14.35]) (Authenticated sender: jdc@koitsu.org) by relay4-d.mail.gandi.net (Postfix) with ESMTPSA id C706A17209A; Thu, 4 Jul 2013 01:36:33 +0200 (CEST) Received: by icarus.home.lan (Postfix, from userid 1000) id DC49373A1C; Wed, 3 
Jul 2013 16:36:31 -0700 (PDT) Date: Wed, 3 Jul 2013 16:36:31 -0700 From: Jeremy Chadwick To: Berend de Boer Subject: Re: EBS snapshot backups from a FreeBSD zfs file system: zpool freeze? Message-ID: <20130703233631.GA74698@icarus.home.lan> References: <87ehbg5raq.wl%berend@pobox.com> <20130703055047.GA54853@icarus.home.lan> <6488DECC-2455-4E92-B432-C39490D18484@dragondata.com> <871u7g57rl.wl%berend@pobox.com> <87mwq34emp.wl%berend@pobox.com> <20130703200241.GB60515@in-addr.com> <87k3l748gb.wl%berend@pobox.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <87k3l748gb.wl%berend@pobox.com> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-fs X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 03 Jul 2013 23:36:55 -0000 On Thu, Jul 04, 2013 at 09:52:52AM +1200, Berend de Boer wrote: > >>>>> "Gary" == Gary Palmer writes: > > Gary> Other than using SAN (FC or iSCSI), I know of no reason to > Gary> do backups at the raw disk level, nor any real demand. > > Probably the hundreds of thousands of businesses that use Amazon AWS > disagree :-) > > > Gary> I've worked with people who have done LUN based backups in > Gary> the past and they have one drawback - they tend to back up > Gary> the entire LUN, irrespective of whether it is an allocated > Gary> block or not. Modern systems that implement some kind of > Gary> TRIM emulation (or cheat and sniff the filesystem block > Gary> allocation maps) may alleviate that problem. > > That's not how EBS does a back up. It only backs up allocated blocks > for the first time, and for subsequent backups only back up the > changed blocks. > > > Gary> However, in the vast majority of cases, people back up from > Gary> above the FS, not below. This makes your use case probably > Gary> more tied to EBS than you may otherwise think. 
> > People generally didn't have a choice I would say. Now millions of > servers run on top of block storage. > > Disks are just software. That's the new world. I understand what you're trying to say by this statement, but you're stretching it big time. I was opting to stay out of the thread until I saw your last line. It doesn't matter how many layers of I/O abstraction there are, eventually physical hardware for storage is involved. It doesn't matter what type of disk (mechanical vs. solid-state vs. something custom/proprietary) or what type of controller -- it does eventually end up on bare metal. I say this well-aware of the relationship between software and hardware (ex. disk firmware (software) controlling the underlying hardware (drive motor IC, underlying I/O controller, etc.)). The problem with these "software solutions" (cloud, etc.) -- I'm not sure what to call them because it varies -- are many. One of those problems is that there is a great disconnect between the user of the "solution" and the actual bare metal. And quite often the topology -- meaning the actual innards/how it all works/what transpires even on a protocol level -- is never documented or made public to the user. Hell, my experience in the enterprise world shows that quite often even support personnel don't know how it works. Why this matters: when it breaks -- and it will break, believe me -- that information becomes critical/key to troubleshooting and providing a solution. I've even encountered one "enterprise-grade storage solution" where when the product broke (as in all filesystems inaccessible), multiple levels of support engineers had no idea where the actual problem was because of how much abstraction there was between the appliance itself and the bare metal. How many engineers does it take to turn a light bulb? Apparently too many. 
As politely as I can: It sounds like you may have spent too much time with these types of setups, or believe them to be "magical" in some way, in turn forgetting the realities of bare metal and instead thinking "everything is software". Bzzt. And while generally I don't see eye-to-eye with Richard Stallman, storage **is** the one area where I do: http://www.guardian.co.uk/technology/2008/sep/29/cloud.computing.richard.stallman KISS principle goes a long, long way when applied to storage. And no, I am not saying "get away from this EBS/AWS stuff!" -- I'm simply saying that your statement "disks are software in the new world" is utter nonsense. -- | Jeremy Chadwick jdc@koitsu.org | | UNIX Systems Administrator http://jdc.koitsu.org/ | | Making life hard for others since 1977. PGP 4BD6C0CB | From owner-freebsd-fs@FreeBSD.ORG Wed Jul 3 23:46:07 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id E1D41EF8 for ; Wed, 3 Jul 2013 23:46:06 +0000 (UTC) (envelope-from lkchen@k-state.edu) Received: from ksu-out.merit.edu (ksu-out.merit.edu [207.75.117.133]) by mx1.freebsd.org (Postfix) with ESMTP id AFCD71F00 for ; Wed, 3 Jul 2013 23:46:06 +0000 (UTC) X-Merit-ExtLoop1: 1 X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: Ag4FABa31FHPS3TT/2dsb2JhbABagwl7gwi9NIEJFnSCIwEBBSNiDxEEAQEBAgINGQJRCBmID5ofjnyJEYgGgSaOUoJLgRwDqQ6DLYIM X-IronPort-AV: E=Sophos;i="4.87,991,1363147200"; d="scan'208";a="933934690" X-MERIT-SOURCE: KSU Received: from ksu-sfpop-mailstore02.merit.edu ([207.75.116.211]) by sfpop-ironport05.merit.edu with ESMTP; 03 Jul 2013 19:44:52 -0400 Date: Wed, 3 Jul 2013 19:44:52 -0400 (EDT) From: "Lawrence K. Chen, P.Eng." 
To: freebsd-fs@freebsd.org Message-ID: <526494777.29825981.1372895092282.JavaMail.root@k-state.edu> In-Reply-To: Subject: Re: Slow resilvering with mirrored ZIL MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [70.179.144.108] X-Mailer: Zimbra 7.2.2_GA_2852 (ZimbraWebClient - GC27 ([unknown])/7.2.2_GA_2852) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 03 Jul 2013 23:46:07 -0000

Well, it would kind of make sense that the second drive resilvers slower than the first if it's a 4K alignment issue... because now you have both misaligned reads and writes. Plus it seems misaligned reads are worse.

When I was first doing zpools on AF drives (I didn't know how bad this problem was), they seemed fine at first... where things got really bad was when I did a scrub... which should be all reads. Redoing the zpool aligned made a scrub go more than 10 times faster.

Meanwhile, I've done zpools on non-AF drives both 4k aligned and not 4k aligned. I hope the latter doesn't come around and bite me. Unfortunately, the only way to fix it is to create a new 4k-aligned pool and copy over the old one... you can't add a 4k-aligned disk into a non-4k-aligned pool. Though it's nicer to fix with ZFS than, say, with Linux.

Last system I redid, I split the mirror... created a new zpool on one disk, properly aligned, did zfs send/receive between the zpools, blew away the old zpool, exported the new zpool, and reimported it under the name of the old zpool. Then added the other disk in as a mirror. Only snag was having to update zpool.cache, which I think isn't necessary with 9.1? I was running 9.0 when I was doing this.

FWIW, I ran into this problem with Linux and mdadm as well... had an old raid1 lose a drive, and knowing that the new drive was AF, instead of copying the partition table to the new drive I worked out a new partition table.
I had figured I would just replace resilver and carry on...and it was okay at first, but eventually it wasn't happy. I first tried removing the old disk, repartitioning and readding...but, I guess there's other layers involved. So eventually it got remade...though I opted to get another disk of the same size as the replacement I had gotten. (the old array was 1.5TB, and the replacement disk I had gotten was 2TB.) Next project....turn the system into FreeBSD & ZFS without losing any data, and without having any extra drives or systems to help....hmmm. Suspect that it might not be possible without some intermediate steps/hardware.... ----- Original Message ----- > > Not sure if new are 4k. Done nothing about that. > But the SECOND drive, resilvering is SLOW. Not the first one. > > As stated below. Those changes are introduced to the system. > ALL new driver ARE identical, except S/N of cause :) > > On 3 jul 2013, at 16:55, Steven Hartland > wrote: > > > > > ----- Original Message ----- From: "Daniel Kalchev" > > > > To: "mxb" > > Cc: > > Sent: Wednesday, July 03, 2013 3:40 PM > > Subject: Re: Slow resilvering with mirrored ZIL > > > > > >> On 03.07.13 16:36, mxb wrote: > >>> Well, then my question persists - why I get so significant drop > >>> of speed while resilvering second drive. > >>> The only changes to the system are: > >>> > >>> 1. Second partition for ZIL to create a mirror > >>> 2. New disks are 7200rpm. old ones are 5400rpm. > >>> > > > > Its not something like the old disks are 512byte sectors > > where as the new ones are 4k? > > > > It this is the case having already replaced one disk you've > > killed performance as its having to do lots more work reading > > none 4k aligned data? 
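[Editor's note: the split-mirror migration described above can be sketched as follows. All names (oldpool/newpool, ada0/ada1, the GPT labels) are hypothetical, and the gnop shim is the commonly used trick on FreeBSD of this era to force the new pool's ashift to 12; treat this as an outline under those assumptions, not a tested procedure.]

```shell
# Detach one half of the existing (misaligned) mirror:
zpool detach oldpool ada1

# Partition it 4K-aligned and present it through a 4K-sector gnop
# device so the new pool is created with ashift=12:
gpart create -s gpt ada1
gpart add -t freebsd-zfs -a 4k -l newdisk0 ada1
gnop create -S 4096 /dev/gpt/newdisk0
zpool create newpool gpt/newdisk0.nop

# Copy everything over:
zfs snapshot -r oldpool@migrate
zfs send -R oldpool@migrate | zfs receive -F -d newpool

# Retire the old pool and reimport the new one under the old name:
zpool destroy oldpool
zpool export newpool
zpool import newpool oldpool

# Repartition the freed disk the same way and attach it to re-form
# the mirror:
gpart create -s gpt ada0
gpart add -t freebsd-zfs -a 4k -l newdisk1 ada0
zpool attach oldpool gpt/newdisk0 gpt/newdisk1
```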
> > > > Regards > > steve > > From owner-freebsd-fs@FreeBSD.ORG Wed Jul 3 23:56:42 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 8480DA06; Wed, 3 Jul 2013 23:56:42 +0000 (UTC) (envelope-from berend@pobox.com) Received: from smtp.pobox.com (b-pb-sasl-quonix.pobox.com [208.72.237.35]) by mx1.freebsd.org (Postfix) with ESMTP id A59E81F6C; Wed, 3 Jul 2013 23:56:41 +0000 (UTC) Received: from smtp.pobox.com (unknown [127.0.0.1]) by b-sasl-quonix.pobox.com (Postfix) with ESMTP id C68F92D7AA; Wed, 3 Jul 2013 23:56:39 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=pobox.com; h=date :message-id:from:to:cc:subject:in-reply-to:references :mime-version:content-type:content-transfer-encoding; s=sasl; bh=BKNMTqKqLGxXRvpzxffWu55+Arc=; b=syzx8UaMb3tyufqGYGbJoONDsGKI CHdqQuiYOb5IZV0Vk4AyJyt1E3FsNTPL0F7IN/q0ew7iTas6PBPFKDQ56EO9RAb7 L0VgtQIoeepMZsLvrCClkRV3XRnhMC2geulSYUT4NdUkB5VoStkvkDvdVVo0/fY7 nUlR6Ueaa0+96OQ= DomainKey-Signature: a=rsa-sha1; c=nofws; d=pobox.com; h=date:message-id :from:to:cc:subject:in-reply-to:references:mime-version :content-type:content-transfer-encoding; q=dns; s=sasl; b=PR13j0 oMovKaKi0EE4VQeEVgXpAJrELnoupJh76Paw2JfU6BHSJYv85y8YbfE5l0Ei/itl 3a7frmSUQZmh/6DlKtO43zxemsWDAeQwnagxS0W4E7v9eMJXAtoa8hC4zoV98Dqx RX2EHIqL+KeNCYtfUnnbmSAwacASfutXBKi8Q= Received: from b-pb-sasl-quonix.pobox.com (unknown [127.0.0.1]) by b-sasl-quonix.pobox.com (Postfix) with ESMTP id BC0AC2D7A8; Wed, 3 Jul 2013 23:56:39 +0000 (UTC) Received: from bmach.nederware.nl (unknown [27.252.247.84]) by b-sasl-quonix.pobox.com (Postfix) with ESMTPA id D2E382D7A6; Wed, 3 Jul 2013 23:56:38 +0000 (UTC) Received: from quadrio.nederware.nl (quadrio.nederware.nl [192.168.33.13]) by bmach.nederware.nl (Postfix) with ESMTP id 789055C57; Thu, 4 Jul 2013 11:56:31 +1200 (NZST) Received: from quadrio.nederware.nl (quadrio.nederware.nl [127.0.0.1]) by quadrio.nederware.nl (Postfix) 
with ESMTP id 053F549FB971; Thu, 4 Jul 2013 11:56:36 +1200 (NZST) Date: Thu, 04 Jul 2013 11:56:35 +1200 Message-ID: <87d2qz42q4.wl%berend@pobox.com> From: Berend de Boer To: Jeremy Chadwick Subject: Re: EBS snapshot backups from a FreeBSD zfs file system: zpool freeze? In-Reply-To: <20130703233631.GA74698@icarus.home.lan> References: <87ehbg5raq.wl%berend@pobox.com> <20130703055047.GA54853@icarus.home.lan> <6488DECC-2455-4E92-B432-C39490D18484@dragondata.com> <871u7g57rl.wl%berend@pobox.com> <87mwq34emp.wl%berend@pobox.com> <20130703200241.GB60515@in-addr.com> <87k3l748gb.wl%berend@pobox.com> <20130703233631.GA74698@icarus.home.lan> User-Agent: Wanderlust/2.15.9 (Almost Unreal) SEMI-EPG/1.14.7 (Harue) FLIM/1.14.9 (=?UTF-8?B?R29qxY0=?=) APEL/10.8 EasyPG/1.0.0 Emacs/24.3 (i686-pc-linux-gnu) MULE/6.0 (HANACHIRUSATO) Organization: Xplain Technology Ltd MIME-Version: 1.0 (generated by SEMI-EPG 1.14.7 - "Harue") Content-Type: multipart/signed; boundary="pgp-sign-Multipart_Thu_Jul__4_11:56:35_2013-1"; micalg=pgp-sha256; protocol="application/pgp-signature" Content-Transfer-Encoding: 7bit X-Pobox-Relay-ID: 338F47A0-E43C-11E2-9A28-E84251E3A03C-48001098!b-pb-sasl-quonix.pobox.com Cc: freebsd-fs X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 03 Jul 2013 23:56:42 -0000 --pgp-sign-Multipart_Thu_Jul__4_11:56:35_2013-1 Content-Type: text/plain; charset=US-ASCII >>>>> "Jeremy" == Jeremy Chadwick writes: Jeremy> As politely as I can: It sounds like you may have spent Jeremy> too much time with these types of setups, or believe them Jeremy> to be "magical" in some way, in turn forgetting the Jeremy> realities of bare metal and instead thinking "everything Jeremy> is software". Bzzt. Heh. The solution with Amazon is even worse: if things go wrong, you're screwed. Can't get your disks back. You can't call anyone. 
There's no bare metal to touch, and no, they won't let you into their data centres. So I'm actually trying to avoid the magic. The only guarantee I basically have is that if I have made an EBS snapshot of my disk, I can, one day, restore that, and that this snapshot is stored in some multi-redundancy (magic!) cloud. (And obviously you can try to run a mirror in another data centre using zfs send/recv, yes, will run that too). If you go with AWS, there are no phone calls to make. Disk gone is disk gone. So you need to have working backup strategies in place. -- All the best, Berend de Boer ------------------------------------------------------ Awesome Drupal hosting: https://www.xplainhosting.com/ --pgp-sign-Multipart_Thu_Jul__4_11:56:35_2013-1 Content-Type: application/pgp-signature Content-Transfer-Encoding: 7bit Content-Description: OpenPGP Digital Signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.17 (GNU/Linux) iQIcBAABCAAGBQJR1LozAAoJEKOfeD48G3g5T2EQAJT9fT3fWrsA6BjUOBGZkA7K 5QmtRMRkw6Yb5FBoV/Qk1yJsNKKKvHzUhxinhNngM+FkrqxILVsgj1a78jGErOtc VWVV8eAQdOHqZcYrF/n8qyE+TkLqzGbVouNPpJmAnQzAHiHJp3hs3YVzBe5fI23l CnjPGHBSeXDpbRdIYpT4eWJwCR9JxR3nlRBHPwy0YwakTO1OZT9JUgXUO9OdNafK HavO6g/T00bKpMlGUR6Sg25KdeTjtK3NJjf1l0uffQ98l7ZuSd5bdHJDy8XtpxMT mZ/wKRc6BolyedKtUOmPls1MPB33m6Un2f9smDaDgfAdoo0zs6R+hErfDdK/w7Xl TXsNtMwYmQzGBVNFvZieT4Rtt2wdJu9LOaHn21lWz43p2rZvJ0ovIingy38tolfe obO8EIlTCkjI+QBOy9WMROY5yEOqRraRtWdN8pcruY6glahg44mJTmUEik+6cFnQ e4BNsGxKYqAHReZLlGdXstZfvR9pHizjumvPORpfSpOdHC7Vkzgks/iTO21u6hnj mNMbPgKmQJVGg39f5z69Ze/KEwfU+3b7g5w5/15RahLq351sMLqMq7kLgDHSXH0f saD12Vif6Ipz7WfRhwCSHCz2+BxL98VOsQ/sRc2YwmFvBJi//5dDlik/CJIy/g9a HwM9qPLXGPpx9M4BJZSP =AuXK -----END PGP SIGNATURE----- --pgp-sign-Multipart_Thu_Jul__4_11:56:35_2013-1-- From owner-freebsd-fs@FreeBSD.ORG Thu Jul 4 00:00:02 2013 Return-Path: Delivered-To: freebsd-fs@smarthost.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with 
ESMTP id 2D651E3D for ; Thu, 4 Jul 2013 00:00:02 +0000 (UTC) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:1900:2254:206c::16:87]) by mx1.freebsd.org (Postfix) with ESMTP id 1FB771F91 for ; Thu, 4 Jul 2013 00:00:02 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.7/8.14.7) with ESMTP id r64001Gi076819 for ; Thu, 4 Jul 2013 00:00:01 GMT (envelope-from gnats@freefall.freebsd.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.7/8.14.7/Submit) id r64001v6076818; Thu, 4 Jul 2013 00:00:01 GMT (envelope-from gnats) Date: Thu, 4 Jul 2013 00:00:01 GMT Message-Id: <201307040000.r64001v6076818@freefall.freebsd.org> To: freebsd-fs@FreeBSD.org Cc: From: "Steven Hartland" Subject: Re: kern/180236: [zfs] [nullfs] Leakage free space using ZFS with nullfs on 9.1-STABLE X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list Reply-To: Steven Hartland List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 04 Jul 2013 00:00:02 -0000 The following reply was made to PR kern/180236; it has been noted by GNATS. From: "Steven Hartland" To: , "Ivan Klymenko" Cc: Subject: Re: kern/180236: [zfs] [nullfs] Leakage free space using ZFS with nullfs on 9.1-STABLE Date: Thu, 4 Jul 2013 00:58:13 +0100

Looks like nullfs isn't cleaning up correctly in the case where a rename collides with an existing file and hence results in an implicit remove. This can be seen in the zdb output for the volume: before the unmount all the plain file entries still exist, but after the unmount of nullfs they are gone.

In addition to this there may be an issue with the ZFS delete queue: while the file entries do disappear after unmounting the nullfs, the size of the delete queue doesn't seem to clear down properly.
Dataset testpool/nullfs [ZPL], ID 40, cr_txg 19, 2.41M, 7 objects

ZIL header: claim_txg 0, claim_blk_seq 0, claim_lr_seq 0 replay_seq 0, flags 0x0

    Object  lvl   iblk   dblk  dsize  lsize   %full  type
         0    7    16K    16K  15.0K  32.0M    0.01  DMU dnode
        -1    1    16K    512     1K    512  100.00  ZFS user/group used
        -2    1    16K    512     1K    512  100.00  ZFS user/group used
         1    1    16K     1K     1K     1K  100.00  ZFS master node
         2    1    16K    512     1K    512  100.00  SA master node
         3    3    16K    16K  2.38M  7.45M  100.00  ZFS delete queue
         4    1    16K    512     1K    512  100.00  ZFS directory
         5    1    16K  1.50K     1K  1.50K  100.00  SA attr registration
         6    1    16K    16K  7.00K    32K  100.00  SA attr layouts
         7    1    16K    512     1K    512  100.00  ZFS directory

This is an area I'm not familiar with yet, so it may be this is expected but it should be checked. Regards Steve From owner-freebsd-fs@FreeBSD.ORG Thu Jul 4 00:04:21 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id E4230315 for ; Thu, 4 Jul 2013 00:04:21 +0000 (UTC) (envelope-from jdc@koitsu.org) Received: from relay5-d.mail.gandi.net (relay5-d.mail.gandi.net [217.70.183.197]) by mx1.freebsd.org (Postfix) with ESMTP id A0AFC1FCA for ; Thu, 4 Jul 2013 00:04:21 +0000 (UTC) Received: from mfilter5-d.gandi.net (mfilter5-d.gandi.net [217.70.178.132]) by relay5-d.mail.gandi.net (Postfix) with ESMTP id 67E0441C06D; Thu, 4 Jul 2013 02:04:10 +0200 (CEST) X-Virus-Scanned: Debian amavisd-new at mfilter5-d.gandi.net Received: from relay5-d.mail.gandi.net ([217.70.183.197]) by mfilter5-d.gandi.net (mfilter5-d.gandi.net [10.0.15.180]) (amavisd-new, port 10024) with ESMTP id b1QZSAl7SnQn; Thu, 4 Jul 2013 02:04:08 +0200 (CEST) X-Originating-IP: 76.102.14.35 Received: from jdc.koitsu.org (c-76-102-14-35.hsd1.ca.comcast.net [76.102.14.35]) (Authenticated sender: jdc@koitsu.org) by relay5-d.mail.gandi.net (Postfix) with ESMTPSA id 9388A41C062; Thu, 4 Jul 2013 02:04:07 +0200 (CEST) Received: by icarus.home.lan (Postfix, from userid 1000)
id DCB6473A1D; Wed, 3 Jul 2013 17:04:05 -0700 (PDT) Date: Wed, 3 Jul 2013 17:04:05 -0700 From: Jeremy Chadwick To: mxb Subject: Re: Slow resilvering with mirrored ZIL Message-ID: <20130704000405.GA75529@icarus.home.lan> References: <51D42107.1050107@digsys.bg> <2EF46A8C-6908-4160-BF99-EC610B3EA771@alumni.chalmers.se> <51D437E2.4060101@digsys.bg> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 04 Jul 2013 00:04:22 -0000 On Wed, Jul 03, 2013 at 05:12:17PM +0200, mxb wrote: > > Not sure if new are 4k. Done nothing about that. > But the SECOND drive, resilvering is SLOW. Not the first one. > > As stated below. Those changes are introduced to the system. > ALL new driver ARE identical, except S/N of cause :) > > On 3 jul 2013, at 16:55, Steven Hartland wrote: > > > > > ----- Original Message ----- From: "Daniel Kalchev" > > To: "mxb" > > Cc: > > Sent: Wednesday, July 03, 2013 3:40 PM > > Subject: Re: Slow resilvering with mirrored ZIL > > > > > >> On 03.07.13 16:36, mxb wrote: > >>> Well, then my question persists - why I get so significant drop of speed while resilvering second drive. > >>> The only changes to the system are: > >>> > >>> 1. Second partition for ZIL to create a mirror > >>> 2. New disks are 7200rpm. old ones are 5400rpm. > >>> > > > > Its not something like the old disks are 512byte sectors > > where as the new ones are 4k? > > > > It this is the case having already replaced one disk you've > > killed performance as its having to do lots more work reading > > none 4k aligned data? Not enough hard information to diagnose. What's needed is below; you can XXX out system names if you want, but please do not XXX out anything else. 
- Output from: dmesg
- Output from: zpool status
- Output from: zpool get all
- Output from: zfs get all
- Output from: "gpart show -p" for every disk on the system
- Output from: cat /etc/sysctl.conf
- Output from: cat /boot/loader.conf
- Output from: "smartctl -a" for every disk that's used by ZFS (please use smartmontools 6.1 or newer in this case; install ports/sysutils/smartmontools)

I can think of one tunable related to resilvering that may help with your problem, but I'm not going to mention it until the above can be provided. -- | Jeremy Chadwick jdc@koitsu.org | | UNIX Systems Administrator http://jdc.koitsu.org/ | | Making life hard for others since 1977. PGP 4BD6C0CB | From owner-freebsd-fs@FreeBSD.ORG Thu Jul 4 00:13:36 2013 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 73ABD512 for ; Thu, 4 Jul 2013 00:13:36 +0000 (UTC) (envelope-from prvs=1897fed9ea=killing@multiplay.co.uk) Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) by mx1.freebsd.org (Postfix) with ESMTP id 185821004 for ; Thu, 4 Jul 2013 00:13:35 +0000 (UTC) Received: from r2d2 ([82.69.141.170]) by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (MDaemon PRO v10.0.4) with ESMTP id md50004713854.msg for ; Thu, 04 Jul 2013 01:13:32 +0100 X-Spam-Processed: mail1.multiplay.co.uk, Thu, 04 Jul 2013 01:13:32 +0100 (not processed: message from valid local sender) X-MDDKIM-Result: neutral (mail1.multiplay.co.uk) X-MDRemoteIP: 82.69.141.170 X-Return-Path: prvs=1897fed9ea=killing@multiplay.co.uk X-Envelope-From: killing@multiplay.co.uk X-MDaemon-Deliver-To: freebsd-fs@FreeBSD.org Message-ID: From: "Steven Hartland" To: "Berend de Boer" , References: <87li5o5tz2.wl%berend@pobox.com> Subject: Re: EBS snapshot backups from a FreeBSD zfs file system: zpool freeze?
Date: Thu, 4 Jul 2013 01:13:44 +0100 MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=original Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.5931 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6157 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 04 Jul 2013 00:13:36 -0000 ----- Original Message ----- From: "Berend de Boer" > Hi All, > > I'm experimenting with building a FreeBSD NFS server on Amazon AWS > EC2. I've created a zpool with 5 disks in a raidz2 configuration. > > How can I make a consistent backup of this using EBS? > > On Linux' file systems I can freeze a file system, start the backup of > all disks, and unfreeze. This freeze usually only takes 100ms or so. > > ZFS on FreeBSD does not appear to have such an option. I.e. what I'm > looking for is basically a hardware based snapshot. ZFS should simply > be suspended at a recoverable point for a few hundred ms. > > A similar question from 2010 is here: > http://thr3ads.net/zfs-discuss/2010/11/580781-how-to-quiesce-and-unquiesc-zfs-and-zpool-for-array-hardware-snapshots > > Absent a "zfs freeze" it seems using FreeBSD/zfs on AWS with EBS is > going to be impossible. Unfortunately that means back to Linux sigh. Not been following the thread really so excuse if this has already been mentioned ;-) There is a zpool freeze which stops spa_sync() from doing anything, so that the only way to record changes is on the ZIL. The comment in the zpool_main is: "'freeze' is a vile debugging abomination" so it's evil but might be what you want if you're up to writing some code. For more info have a look at ztest. Regards Steve ================================================ This e.mail is private and confidential between Multiplay (UK) Ltd. 
and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmaster@multiplay.co.uk. From owner-freebsd-fs@FreeBSD.ORG Thu Jul 4 00:28:52 2013 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id B705291C for ; Thu, 4 Jul 2013 00:28:52 +0000 (UTC) (envelope-from spork@bway.net) Received: from smtp2.bway.net (smtp2.bway.net [216.220.96.28]) by mx1.freebsd.org (Postfix) with ESMTP id 75499107E for ; Thu, 4 Jul 2013 00:28:52 +0000 (UTC) Received: from toasty.sporklab.com (foon.sporktines.com [96.57.144.66]) (using TLSv1 with cipher AES128-SHA (128/128 bits)) (No client certificate requested) (Authenticated sender: spork@bway.net) by smtp2.bway.net (Postfix) with ESMTPSA id 6A18195873; Wed, 3 Jul 2013 20:28:45 -0400 (EDT) References: <87li5o5tz2.wl%berend@pobox.com> In-Reply-To: Mime-Version: 1.0 (Apple Message framework v1085) X-Priority: 3 Content-Type: text/plain; charset=us-ascii Message-Id: Content-Transfer-Encoding: quoted-printable From: Charles Sprickman Subject: Re: EBS snapshot backups from a FreeBSD zfs file system: zpool freeze? Date: Wed, 3 Jul 2013 20:28:44 -0400 To: "Steven Hartland" X-Mailer: Apple Mail (2.1085) Cc: freebsd-fs@FreeBSD.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 04 Jul 2013 00:28:52 -0000 On Jul 3, 2013, at 8:13 PM, Steven Hartland wrote: > ----- Original Message ----- From: "Berend de Boer" >> Hi All, >> I'm experimenting with building a FreeBSD NFS server on Amazon AWS >> EC2. 
I've created a zpool with 5 disks in a raidz2 configuration. >> >> How can I make a consistent backup of this using EBS? >> On Linux' file systems I can freeze a file system, start the backup of >> all disks, and unfreeze. This freeze usually only takes 100ms or so. >> ZFS on FreeBSD does not appear to have such an option. I.e. what I'm >> looking for is basically a hardware based snapshot. ZFS should simply >> be suspended at a recoverable point for a few hundred ms. >> A similar question from 2010 is here: >> http://thr3ads.net/zfs-discuss/2010/11/580781-how-to-quiesce-and-unquiesc-zfs-and-zpool-for-array-hardware-snapshots >> Absent a "zfs freeze" it seems using FreeBSD/zfs on AWS with EBS is >> going to be impossible. Unfortunately that means back to Linux sigh. > > Not been following the thread really so excuse if this has already > been mentioned ;-) > > There is a zpool freeze which stops spa_sync() from doing > anything, so that the only way to record changes is on the ZIL. I don't use EC2 or any of the other Amazon "cloud" stuff, but I'd assume you could even have another chunk of block storage as a dedicated ZIL and you could pass on snapshotting that. What effect would that have on the pool while it's "frozen"? Are all writes, sync or not, sent to ZIL while frozen? I do have an interest in this as I do run some ESXi hosts, and in addition to file-level backups, I do take snapshots with vmware. Knowing that the "quiesce writes" option is actually doing something and is not just a no-op when running the FreeBSD guest tools would be nice. Any arguments that claim "people don't do this" I think are a bit dated, not only are the Amazon services widely used, but host-your-own virtualization, be it VMware, Xen, VirtualBox, or Hyper-V is extremely common. > The comment in the zpool_main is: "'freeze' is a vile debugging > abomination" so it's evil but might be what you want if you're up to > writing some code.
Anything going on with this on the Illumos or ZFS on Linux side? Charles > > For more info have a look at ztest. > > Regards > Steve > > ================================================ > This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. > In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 > or return the E.mail to postmaster@multiplay.co.uk. > > _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" From owner-freebsd-fs@FreeBSD.ORG Thu Jul 4 00:55:41 2013 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id C4F6AD4F for ; Thu, 4 Jul 2013 00:55:41 +0000 (UTC) (envelope-from berend@pobox.com) Received: from smtp.pobox.com (b-pb-sasl-quonix.pobox.com [208.72.237.35]) by mx1.freebsd.org (Postfix) with ESMTP id 90C8C112B for ; Thu, 4 Jul 2013 00:55:41 +0000 (UTC) Received: from smtp.pobox.com (unknown [127.0.0.1]) by b-sasl-quonix.pobox.com (Postfix) with ESMTP id 239DB22AD6; Thu, 4 Jul 2013 00:55:40 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=pobox.com; h=date :message-id:from:to:cc:subject:in-reply-to:references :mime-version:content-type:content-transfer-encoding; s=sasl; bh=G2HHEnFxiiBtBC+C13z5tWEJQnQ=; b=Zhn8KSB4pC0NoolKWbINsliIj1MS MoWMqQdvVlCQ2xCy4YLnFZ80KR10PDbqRTEvVsQkC5z/9DZkbTDDG3a8QcudHepn IrqB6sqKESWn5FTF9aKdZrJry+3RgHtOGetVjZMR08GgdlDW+TyKCi7MvtiWbxtU 0C1p06a2zIx1dA0=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=pobox.com; h=date:message-id :from:to:cc:subject:in-reply-to:references:mime-version :content-type:content-transfer-encoding; q=dns; s=sasl; b=h9CTtV lLVBiO5pfsynoJrUOtV+Y9/Xrtyq6LYY7KIV8RmjUy4lgtfuRMf4aWXJjffOuek3 drYwZFA9D5Q0T1JUue8mWcb1wbNZ/GBlZcxkW6z/xzURh2/RIazoBfM8gi6nwMyD uapBTVX8dgZcefpuX1ebIiqYMQYa9g3KxMBdY= Received: from b-pb-sasl-quonix.pobox.com (unknown [127.0.0.1]) by b-sasl-quonix.pobox.com (Postfix) with ESMTP id 19ABE22AD4; Thu, 4 Jul 2013 00:55:40 +0000 (UTC) Received: from bmach.nederware.nl (unknown [27.252.247.84]) by b-sasl-quonix.pobox.com (Postfix) with ESMTPA id 3553922AD2; Thu, 4 Jul 2013 00:55:39 +0000 (UTC) Received: from quadrio.nederware.nl (quadrio.nederware.nl [192.168.33.13]) by bmach.nederware.nl (Postfix) with ESMTP id 096B15C80; Thu, 4 Jul 2013 12:55:32 +1200 (NZST) Received: from quadrio.nederware.nl (quadrio.nederware.nl [127.0.0.1]) by quadrio.nederware.nl (Postfix) with ESMTP id 9696849FB971; Thu, 4 Jul 2013 12:55:36 +1200 (NZST) Date: Thu, 04 Jul 2013 12:55:36 +1200 Message-ID: <87bo6j3zzr.wl%berend@pobox.com> From: Berend de Boer To: "Steven Hartland" Subject: Re: EBS snapshot backups from a FreeBSD zfs file system: zpool freeze? 
In-Reply-To: References: <87li5o5tz2.wl%berend@pobox.com> User-Agent: Wanderlust/2.15.9 (Almost Unreal) SEMI-EPG/1.14.7 (Harue) FLIM/1.14.9 (=?UTF-8?B?R29qxY0=?=) APEL/10.8 EasyPG/1.0.0 Emacs/24.3 (i686-pc-linux-gnu) MULE/6.0 (HANACHIRUSATO) Organization: Xplain Technology Ltd MIME-Version: 1.0 (generated by SEMI-EPG 1.14.7 - "Harue") Content-Type: multipart/signed; boundary="pgp-sign-Multipart_Thu_Jul__4_12:55:36_2013-1"; micalg=pgp-sha256; protocol="application/pgp-signature" Content-Transfer-Encoding: 7bit X-Pobox-Relay-ID: 71C66F64-E444-11E2-8604-E84251E3A03C-48001098!b-pb-sasl-quonix.pobox.com Cc: freebsd-fs@FreeBSD.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 04 Jul 2013 00:55:41 -0000 --pgp-sign-Multipart_Thu_Jul__4_12:55:36_2013-1 Content-Type: text/plain; charset=US-ASCII >>>>> "Steven" == Steven Hartland writes: Steven> Not been following the thread really so excuse if this has Steven> already been mentioned ;-) Not yet. Steven> There is a zpool freeze which stops spa_sync() from Steven> doing anything, so that the only way to record changes is Steven> on the ZIL. Pardon my ignorance, I don't really understand this. First of all, is there a way to recover from a freeze? I.e. do I need to unfreeze? What you're saying is that it is a one way street only? I just did this on my semi-production server to see what is going to happen. Nothing so far. 
I had a look at the code, this is what it calls:

static int
zfs_ioc_pool_freeze(zfs_cmd_t *zc)
{
	spa_t *spa;
	int error;

	error = spa_open(zc->zc_name, &spa, FTAG);
	if (error == 0) {
		spa_freeze(spa);
		spa_close(spa, FTAG);
	}
	return (error);
}

And spa_freeze is this:

void
spa_freeze(spa_t *spa)
{
	uint64_t freeze_txg = 0;

	spa_config_enter(spa, SCL_ALL, FTAG, RW_WRITER);
	if (spa->spa_freeze_txg == UINT64_MAX) {
		freeze_txg = spa_last_synced_txg(spa) + TXG_SIZE;
		spa->spa_freeze_txg = freeze_txg;
	}
	spa_config_exit(spa, SCL_ALL, FTAG);
	if (freeze_txg != 0)
		txg_wait_synced(spa_get_dsl(spa), freeze_txg);
}

All nicely undocumented code. Steven> The comment in the zpool_main is: "'freeze' is a vile Steven> debugging abomination" so it's evil but might be what you Steven> want if you're up to writing some code. Yeah! But thanks for digging this up, I hadn't expected undocumented commands for zpool! Steven> For more info have a look at ztest. Another undocumented tool. How would I use this in this case?
-- All the best, Berend de Boer ------------------------------------------------------ Awesome Drupal hosting: https://www.xplainhosting.com/ --pgp-sign-Multipart_Thu_Jul__4_12:55:36_2013-1 Content-Type: application/pgp-signature Content-Transfer-Encoding: 7bit Content-Description: OpenPGP Digital Signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.17 (GNU/Linux) iQIcBAABCAAGBQJR1MgIAAoJEKOfeD48G3g5opcP/RiBZ7jhoWckQ2cFaz9pCL3l o0LlYgvHGbrkoCVy4tg4CEF2Y+5iKIsVkk728Tehqv4/RG7mQg7ZgsOQwVs7+cX+ 0I/3r6SG8K35/kUKAvJ2z1LAzehZw3tmin2h41vr7B41F7/hSCkuQwIQRLnFGzHp Ic7OYRpWOAGwBzX7AUt4pYr9INH0Wc3ldl4v06l73HswepmwV78GHKZc5XSIqzin qFQengiFScD6d+ms7h5b8M8CAXC7LJdu/pkjiIhS4LDHBvHokYy59siv3nAg5wAw vTRYkhbRhJJHhmK64m9j1ZtyLQINRYLNAHpeSCYR5RzKDkYJ9oHYalFMWQwN26bi 8T+tW57AJw4ppuh1Q2Scu3oonE58gzli/+wH2fzsvZbiWKGG7jpuwse/wjEcLhtg 1rvkt0z1hBe2z0YnjHcPzMab2rvIke7sJPM6Xk/xiIT0SYnCH9Ka2iVSpWi6nMxP R3HlsMNlulSpKuJXafKTk2mwwyveaWnKnKaPY/bI3Mgdp9z3CC5qi0d8glKY2xQH zwumk8pHI9H8v7oH0avMqTM66M2q/AnOpHTFgjJ9ONCmp9Tr+f2s3CWBmKaE7JJW KaQpaIoeBzIKGQCK5JjBQ1edtfWLrZgBn+RpLRQwXRgicVEXGzOIfDoWoTbBc7si 7eL0yxz6n7+YZmoY0piH =iiSX -----END PGP SIGNATURE----- --pgp-sign-Multipart_Thu_Jul__4_12:55:36_2013-1-- From owner-freebsd-fs@FreeBSD.ORG Thu Jul 4 01:08:31 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id E1B59EDB; Thu, 4 Jul 2013 01:08:31 +0000 (UTC) (envelope-from jdc@koitsu.org) Received: from relay5-d.mail.gandi.net (relay5-d.mail.gandi.net [217.70.183.197]) by mx1.freebsd.org (Postfix) with ESMTP id 557E411B5; Thu, 4 Jul 2013 01:08:31 +0000 (UTC) Received: from mfilter10-d.gandi.net (mfilter10-d.gandi.net [217.70.178.139]) by relay5-d.mail.gandi.net (Postfix) with ESMTP id 1BC4E41C05C; Thu, 4 Jul 2013 03:08:20 +0200 (CEST) X-Virus-Scanned: Debian amavisd-new at mfilter10-d.gandi.net Received: from relay5-d.mail.gandi.net ([217.70.183.197]) by 
mfilter10-d.gandi.net (mfilter10-d.gandi.net [10.0.15.180]) (amavisd-new, port 10024) with ESMTP id vfFXh18febxI; Thu, 4 Jul 2013 03:08:18 +0200 (CEST) X-Originating-IP: 76.102.14.35 Received: from jdc.koitsu.org (c-76-102-14-35.hsd1.ca.comcast.net [76.102.14.35]) (Authenticated sender: jdc@koitsu.org) by relay5-d.mail.gandi.net (Postfix) with ESMTPSA id 5361641C07D; Thu, 4 Jul 2013 03:08:17 +0200 (CEST) Received: by icarus.home.lan (Postfix, from userid 1000) id 5DF7B73A1C; Wed, 3 Jul 2013 18:08:15 -0700 (PDT) Date: Wed, 3 Jul 2013 18:08:15 -0700 From: Jeremy Chadwick To: Berend de Boer Subject: Re: EBS snapshot backups from a FreeBSD zfs file system: zpool freeze? Message-ID: <20130704010815.GB75529@icarus.home.lan> References: <6488DECC-2455-4E92-B432-C39490D18484@dragondata.com> <871u7g57rl.wl%berend@pobox.com> <87mwq34emp.wl%berend@pobox.com> <20130703200241.GB60515@in-addr.com> <87k3l748gb.wl%berend@pobox.com> <20130703233631.GA74698@icarus.home.lan> <87d2qz42q4.wl%berend@pobox.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <87d2qz42q4.wl%berend@pobox.com> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-fs X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 04 Jul 2013 01:08:32 -0000 On Thu, Jul 04, 2013 at 11:56:35AM +1200, Berend de Boer wrote: > >>>>> "Jeremy" == Jeremy Chadwick writes: > > Jeremy> As politely as I can: It sounds like you may have spent > Jeremy> too much time with these types of setups, or believe them > Jeremy> to be "magical" in some way, in turn forgetting the > Jeremy> realities of bare metal and instead thinking "everything > Jeremy> is software". Bzzt. > > Heh. The solution with Amazon is even worse: if things go wrong, > you're screwed. Can't get your disks back. You can't call > anyone. 
There's no bare metal to touch, and no, they won't let you > into their data centres. > > So I'm actually trying to avoid the magic. > > The only guarantee I basically have is that if I have made an EBS > snapshot of my disk, I can, one day, restore that, and that this > snapshot is stored in some multi-redundancy (magic!) cloud. > > (And obviously you can try to run a mirror in another data centre > using zfs send/recv, yes, will run that too). > > If you go with AWS, there are no phone calls to make. Disk gone is > disk gone. So you need to have working backup strategies in place. How is being reliant on EBS (for readers: Amazon Elastic Block Store, which is advertised as, and I quote, "a virtualised storage service") "avoiding the magic"? You're still reliant on black-box voodoo. :-) I think the limiting factor here is more related to your need to use AWS and its services than using bare metal. I respect that/understand that, and won't get into a debate about that. So, that said... As I see them, your choices are these: - Keep using EBS, doing all of this at a "higher level" (meaning the Amazon level) by making snapshots of the actual "storage disks" that are referenced/used by the underlying OS -- FreeBSD, as we have stated, does not have a way to do this (AFAIK) from within the OS (meaning "induce an EBS snapshot"). Linux may have that, but no matter what, it's an Amazon proprietary thing. Are we clear? You should still be able to use whatever Amazon's EBS or AWS provides (as a user interface) to make "snapshots" of those disks, at least that's what I'd assume. I have no familiarity with this, etc.. - Within the OS: raw disk dump. It doesn't matter what the "backing store" is (e.g. EBS, something across iSCSI, etc.). Example command: dd if=/dev/da0 of=/some/other/place bs=64k (or you can send it to stdout and pipe that across ssh, netcat, etc.) This will read every LBA on the device -- including unused/untouched space, the partitioning scheme/layout (i.e. 
MBR/GPT), and the boot blocks/bootstrap mechanisms -- and will be the size of the disk itself (e.g. if 1TB, then the resulting file will be 1TB). From what you've said, this does not work for you because the immense size (even if piped through gzip) does not allow for incremental snapshots of changes to the disk, and it takes a long time. There is no way on FreeBSD or Linux, to my knowledge, to accomplish the latter at the disk level -- at the filesystem level yes, disk level no. Most people prefer to do this at the filesystem level (which if done right is also very fast -- you know this already though). - Within the OS: UFS+SU (do not use journalling/SUJ with this feature, it's known to be busted/throws a nastygram) filesystem snapshots. Commonly accomplished using dump(8), and restore using restore(8). dump(8) accomplishes snapshot generation by calling mksnap_ffs(8) (also a utility). Snapshot generation is usually very fast (commonly a few seconds), but depends on lots of things which I will not get into. dump(8) and restore(8) both support incremental snapshots, and, conveniently, restore(8) has an interactive mode where you can navigate a snapshot and extract individual files. These are filesystem snapshots, not disk snapshots, and thus do not include things like the partition table (MBR/GPT) nor the bootstraps. This matters more if you're trying to do a "bare metal restore" of a box (i.e. box #0 broke badly, need to turn box #1 into the role of box #0 in every way/shape/form), so an admin in that case has to recreate the partition table and reinstall bootstraps manually. (There are ways to back these up as well via dd, but I am not going to go into that). And now some real-world experience: what isn't mentioned/discussed aside from mailing lists and what not is that this methodology is unreliable (for example I have avoided it and been a critic of it for several years).
There are problems with the UFS-specific snapshot "stuff" that have existed for years, where sometimes the snapshot generation never ends, sometimes it causes the system to lock up, and lots of other problems. I will not provide all the details -- just go looking at the mailing lists -stable and -fs over the past several years and you'll see what I'm talking about. Likewise real-world experience: these bugs are what drove me away from using UFS snapshots, and I often boycott them for this reason. - Within the OS: ZFS and use "zfs snapshot". These are, of course, ZFS filesystem snapshots. Incrementals are supported, and these are also usually very fast (few seconds). You can also use "zfs {send,recv}" to send/receive the snapshots to another system and have them restored on that system (many admins really REALLY like this feature). Likewise, because this is filesystem-based, again this does not back up the partition tables nor the bootstraps. There are some "gotchas" with ZFS snapshots but those really depend on 1) how you're using them, and 2) your type of data. I won't go into #2, but others here have already mentioned it. For example, one bug that's been around for 3 years now has been if you prefer to navigate the snapshots as a filesystem and use the filesystem attribute visibility=hidden (the default) -- in this case "pwd" will return "No such file or directory" when within a snapshot. There are workarounds for this. Occasionally I see problems reported by people when using "zfs {send,recv}" and on (more rare) occasion issues with snapshot generation entirely. Most of the problems with the latter, however, have been worked out within stable/9 (so if you go the ZFS route, PLEASE PLEASE PLEASE run stable/9, not 9.1-RELEASE or earlier). There are also scripts in ports/sysutils to make management of ZFS snapshots much easier. Some write their own, others use those scripts.
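As an editorial illustration of the "zfs {send,recv}" incremental workflow described above: each incremental send (-i) names the previous snapshot as its base, so the commands chain one snapshot to the next. A hedged sketch (the dataset and snapshot names here are invented, and this only builds the command strings, it does not run zfs):

```python
# Illustrative only: generate the `zfs send` command lines for a chain
# of snapshots. The first is a full send; each later one is sent
# incrementally (-i) relative to its predecessor.

def send_commands(dataset, snapshots):
    cmds = []
    prev = None
    for snap in snapshots:
        if prev is None:
            cmds.append(f"zfs send {dataset}@{snap}")
        else:
            cmds.append(f"zfs send -i {dataset}@{prev} {dataset}@{snap}")
        prev = snap
    return cmds

for cmd in send_commands("tank/data", ["mon", "tue", "wed"]):
    print(cmd)
# zfs send tank/data@mon
# zfs send -i tank/data@mon tank/data@tue
# zfs send -i tank/data@tue tank/data@wed
```

Each of these would normally be piped into `zfs recv` on the receiving system; the receiver must already hold the base snapshot before an incremental can apply.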
Also, because nobody seems to warn others of this: if you go the ZFS route on FreeBSD, please do not use features like dedup or compression. I can expand more on this if asked, as they have separate (and in one case identical/similar) caveats. (I'm always willing to bend on compression as long as the user knows of the one problem that still exists today and feels it's okay/acceptable) - Within the OS: rsync and/or rsnapshot (which uses rsync). These work at the file level (not filesystem, but file) -- think "copying all the files". They are known to be reliable, and can be used in conjunction with systems over a network (to back up from system X to system Y; default is via SSH). Naturally, this doesn't back up partition tables or bootstraps either. rsnapshot provides its "snapshot-like" behaviour using hard links, which allows for incrementals in how it works (read about it on the web for further details -- not rocket science). But be aware "incremental" means "files that have been changed, added, or deleted", it doesn't mean "store/back up only the portions of a file that changed". I.e. if your MySQL table that's 2GB had a write done to it between the last snapshot and now, the incremental is going to back up an entire 2GB. That may be a drawback depending on what you're doing -- this is for your sysadmin to figure out. I have read of some problems relating to rsync when used with ZFS, but that seems to stem more from the amount of I/O being done and the type of data being used on the ZFS pool/filesystem, so rsync just happens to tickle something odd in those cases. I have never personally encountered this however (that's just me though), as I explain here: Real-world experience: rsnapshot is what I used for my hosting organisation of nearly 18 years to back-up 8 or 9 servers, nightly, across a network (gigE LAN).
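As an editorial illustration of the hard-link trick described above: rsnapshot makes each snapshot look like a full copy, but files unchanged since the previous snapshot are hard links to it, so only changed files consume new space. A simplified sketch (flat directory, files only; this is not rsnapshot itself):

```python
# Illustrative only: hard-link-based incremental copy in the spirit of
# rsnapshot. Unchanged files share an inode with the previous snapshot;
# new or changed files get a real copy.
import filecmp
import os
import shutil

def make_snapshot(src, prev_snap, new_snap):
    os.makedirs(new_snap, exist_ok=True)
    for name in os.listdir(src):
        s = os.path.join(src, name)
        d = os.path.join(new_snap, name)
        p = os.path.join(prev_snap, name) if prev_snap else None
        if p and os.path.isfile(p) and filecmp.cmp(s, p, shallow=False):
            os.link(p, d)       # unchanged: hard link, no extra space
        else:
            shutil.copy2(s, d)  # new or changed: store a real copy
```

After two snapshots of an unchanged file, `os.stat(...).st_nlink` on it is 2 -- which is also why a 2GB file that changed at all costs another full 2GB, exactly the drawback noted above.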
Those servers also used ZFS as their filesystems (for everything except root/var/tmp/usr), both the source being copied as well as the filesystem used to store the backups, and I only once had an issue during the early FreeBSD 8.x days (caused by a ZFS bug that has since been fixed). I still use rsync/rsnapshot to this day, even on my local system (which is ZFS-based barring root/var/tmp/usr -- I choose to use rsnapshot rather than ZFS snapshots for reasons I will not go into here as they're irrelevant). However, I would not use this method where ""snapshots"" need to be done very regularly (i.e. every hour), particularly on filesystems where there are either a) lots and lots of files, or b) files of immense size that change often. Filesystem snapshots are a better choice in that case. There are certainly other options available which I have not touched on, but in general the filesystem snapshot choice is probably your best bet. Filesystem snapshots have one other advantage that you might not have thought of: they're done within the OS, which means if Amazon's EBS stuff changes in such a way where you lose backwards compatibility or encounter bugs with it (during EBS snapshot generation), you can still get access to your data in some manner of speaking. I hope this has given you some details, avenues of choice, or at least things to ponder. Choose wisely, and remember: **ALWAYS DO A RESTORE TEST** when choosing a new backup strategy. I cannot tell you how many times I encounter people "doing backups" who never test a restore until that horrible day... only to find their backups were done wrong, or that the restore process (or even software!) is just utterly broken. -- | Jeremy Chadwick jdc@koitsu.org | | UNIX Systems Administrator http://jdc.koitsu.org/ | | Making life hard for others since 1977. 
PGP 4BD6C0CB | From owner-freebsd-fs@FreeBSD.ORG Thu Jul 4 01:40:18 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 0691A609; Thu, 4 Jul 2013 01:40:18 +0000 (UTC) (envelope-from berend@pobox.com) Received: from smtp.pobox.com (b-pb-sasl-quonix.pobox.com [208.72.237.35]) by mx1.freebsd.org (Postfix) with ESMTP id B8FE612A0; Thu, 4 Jul 2013 01:40:16 +0000 (UTC) Received: from smtp.pobox.com (unknown [127.0.0.1]) by b-sasl-quonix.pobox.com (Postfix) with ESMTP id 30A0E225E0; Thu, 4 Jul 2013 01:40:10 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=pobox.com; h=date :message-id:from:to:cc:subject:in-reply-to:references :mime-version:content-type:content-transfer-encoding; s=sasl; bh=MhighYhn0l1vncELBDnv3O0nVT8=; b=rnzFa7DBrAF35HgOelIkPTqvN8A3 ZkI49K+3zJ92WfpX9lbTEHTinkbEK5uj3/8tSmy0Ugm7d+2xzzjH8SG7JGlursES OSlEyDuu5aiUnq+EpHMVhOXPC79P/68mwOJEiaVJoDstOcgDUnPzSv5Iz/EkMbBT yW3dL/Y3surErp8= DomainKey-Signature: a=rsa-sha1; c=nofws; d=pobox.com; h=date:message-id :from:to:cc:subject:in-reply-to:references:mime-version :content-type:content-transfer-encoding; q=dns; s=sasl; b=ZI61u3 F5lwJoczSpgAGBGnUw2DgRtt/VBdSQC/NbMq/Qwc/paO3j1zxbkt0xljqANuuBbQ VSoGls/Tdk9/hs75hsGKcOh8RZHU7tnN8q7cRfdhkDEtUTQFMtKCAYV7dwsbVOVI zSTViS14J658Uhown0cioTPk0FSxaLne1xxXA= Received: from b-pb-sasl-quonix.pobox.com (unknown [127.0.0.1]) by b-sasl-quonix.pobox.com (Postfix) with ESMTP id 2429A225DD; Thu, 4 Jul 2013 01:40:10 +0000 (UTC) Received: from bmach.nederware.nl (unknown [27.252.247.84]) by b-sasl-quonix.pobox.com (Postfix) with ESMTPA id 97D86225DC; Thu, 4 Jul 2013 01:40:09 +0000 (UTC) Received: from quadrio.nederware.nl (quadrio.nederware.nl [192.168.33.13]) by bmach.nederware.nl (Postfix) with ESMTP id B0CD95C57; Thu, 4 Jul 2013 13:40:02 +1200 (NZST) Received: from quadrio.nederware.nl (quadrio.nederware.nl [127.0.0.1]) by quadrio.nederware.nl (Postfix) with ESMTP id 
497A549FB971; Thu, 4 Jul 2013 13:40:07 +1200 (NZST) Date: Thu, 04 Jul 2013 13:40:07 +1200 Message-ID: <8761wr3xxk.wl%berend@pobox.com> From: Berend de Boer To: Jeremy Chadwick Subject: Re: EBS snapshot backups from a FreeBSD zfs file system: zpool freeze? In-Reply-To: <20130704010815.GB75529@icarus.home.lan> References: <6488DECC-2455-4E92-B432-C39490D18484@dragondata.com> <871u7g57rl.wl%berend@pobox.com> <87mwq34emp.wl%berend@pobox.com> <20130703200241.GB60515@in-addr.com> <87k3l748gb.wl%berend@pobox.com> <20130703233631.GA74698@icarus.home.lan> <87d2qz42q4.wl%berend@pobox.com> <20130704010815.GB75529@icarus.home.lan> User-Agent: Wanderlust/2.15.9 (Almost Unreal) SEMI-EPG/1.14.7 (Harue) FLIM/1.14.9 (=?UTF-8?B?R29qxY0=?=) APEL/10.8 EasyPG/1.0.0 Emacs/24.3 (i686-pc-linux-gnu) MULE/6.0 (HANACHIRUSATO) Organization: Xplain Technology Ltd MIME-Version: 1.0 (generated by SEMI-EPG 1.14.7 - "Harue") Content-Type: multipart/signed; boundary="pgp-sign-Multipart_Thu_Jul__4_13:40:06_2013-1"; micalg=pgp-sha256; protocol="application/pgp-signature" Content-Transfer-Encoding: 7bit X-Pobox-Relay-ID: A975EBE6-E44A-11E2-BFFF-E84251E3A03C-48001098!b-pb-sasl-quonix.pobox.com Cc: freebsd-fs X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 04 Jul 2013 01:40:18 -0000 --pgp-sign-Multipart_Thu_Jul__4_13:40:06_2013-1 Content-Type: text/plain; charset=US-ASCII >>>>> "Jeremy" == Jeremy Chadwick writes: Jeremy> Also, because nobody seems to warn others of this: if Jeremy> you go the ZFS route on FreeBSD, please do not use Jeremy> features like dedup or compression. Exactly the two reasons why I'm experimenting with FreeBSD on AWs. Please tell me more. 
-- All the best, Berend de Boer ------------------------------------------------------ Awesome Drupal hosting: https://www.xplainhosting.com/ --pgp-sign-Multipart_Thu_Jul__4_13:40:06_2013-1 Content-Type: application/pgp-signature Content-Transfer-Encoding: 7bit Content-Description: OpenPGP Digital Signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.17 (GNU/Linux) iQIcBAABCAAGBQJR1NJ3AAoJEKOfeD48G3g5zw4QAIJTjD0fjxCP84Z2oWQxDlon aoGrzE14Clqgo68UbdN62se1GTghs+jjGs2b6Hp12huXO2N3QWSfStFMdK2AKuhx SNyaUM0W1PpX1vqGmnuWYqUCFKb66PctuLKIwNdCWEvlBL18a2iuvjQ80P+iwZ5x 9q/UDtaDjX0/m2K7btKojhGp4MSQgWPHf5r21W3EAgZtMiWgdb88OM2Kw0/FGyzI Ii8OFzq+rUC8cr41esgZsrp0PjLDkmRfMcKTO3PnioWmdjxSeTqu8CRKMfN6LAO1 Ny5fODHZpjNJWjWG5mkfBGIB06djuWiA511BG6Y86zsZYqhr0cKaRv2IIv/ByBno KpRobEwsCAP/lK0fHmPfWuLbUKvg+/ERCq+0ZUGOchvdEvdMgwwdnQ0rqLO0I6Az Bvbc9cme5Vnm1JGSSi0MNfhWzv1QZ5iQ8W0BWF7j1/WTc98uqsiv+MjCtZTluaKV Acf/ebxgDbzhWhym2di+GIM/OYAjgD9WDPOs3bPEYTUSVbopxQS2DKjB/cb3iM+F X+UawZM7zwwF/W33NTAL0ZqxqZ9Ixt5XrrmV74jEwq8axJFtqmG1yrrQOIvkW/oi BjGBr0QT6lN+G2OMGM59J0kB8jgWG9CVKbftjjan4XvvSe1NQuFuZAaB+RdNB2Gu jEAy3zcaOmu/YKlukvKr =oHfx -----END PGP SIGNATURE----- --pgp-sign-Multipart_Thu_Jul__4_13:40:06_2013-1-- From owner-freebsd-fs@FreeBSD.ORG Thu Jul 4 02:15:51 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 6A6BFE14; Thu, 4 Jul 2013 02:15:51 +0000 (UTC) (envelope-from jdc@koitsu.org) Received: from relay4-d.mail.gandi.net (relay4-d.mail.gandi.net [217.70.183.196]) by mx1.freebsd.org (Postfix) with ESMTP id 0B2E513BC; Thu, 4 Jul 2013 02:15:50 +0000 (UTC) Received: from mfilter7-d.gandi.net (mfilter7-d.gandi.net [217.70.178.136]) by relay4-d.mail.gandi.net (Postfix) with ESMTP id C9A0F17209D; Thu, 4 Jul 2013 04:15:39 +0200 (CEST) X-Virus-Scanned: Debian amavisd-new at mfilter7-d.gandi.net Received: from relay4-d.mail.gandi.net ([217.70.183.196]) by mfilter7-d.gandi.net 
(mfilter7-d.gandi.net [10.0.15.180]) (amavisd-new, port 10024) with ESMTP id 4V3TvP6cfCUl; Thu, 4 Jul 2013 04:15:38 +0200 (CEST) X-Originating-IP: 76.102.14.35 Received: from jdc.koitsu.org (c-76-102-14-35.hsd1.ca.comcast.net [76.102.14.35]) (Authenticated sender: jdc@koitsu.org) by relay4-d.mail.gandi.net (Postfix) with ESMTPSA id 9973017209F; Thu, 4 Jul 2013 04:15:37 +0200 (CEST) Received: by icarus.home.lan (Postfix, from userid 1000) id B87E973A1C; Wed, 3 Jul 2013 19:15:35 -0700 (PDT) Date: Wed, 3 Jul 2013 19:15:35 -0700 From: Jeremy Chadwick To: Berend de Boer Subject: Re: EBS snapshot backups from a FreeBSD zfs file system: zpool freeze? Message-ID: <20130704021535.GA77546@icarus.home.lan> References: <871u7g57rl.wl%berend@pobox.com> <87mwq34emp.wl%berend@pobox.com> <20130703200241.GB60515@in-addr.com> <87k3l748gb.wl%berend@pobox.com> <20130703233631.GA74698@icarus.home.lan> <87d2qz42q4.wl%berend@pobox.com> <20130704010815.GB75529@icarus.home.lan> <8761wr3xxk.wl%berend@pobox.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <8761wr3xxk.wl%berend@pobox.com> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-fs X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 04 Jul 2013 02:15:51 -0000 On Thu, Jul 04, 2013 at 01:40:07PM +1200, Berend de Boer wrote: > >>>>> "Jeremy" == Jeremy Chadwick writes: > > > Jeremy> Also, because nobody seems to warn others of this: if > Jeremy> you go the ZFS route on FreeBSD, please do not use > Jeremy> features like dedup or compression. > > Exactly the two reasons why I'm experimenting with FreeBSD on AWs. > > Please tell me more. 
dedup has immense and crazy memory requirements; the commonly referenced model (which is in no way precise, it's just a general recommendation) is that for every 1TB of data you need 1GB of RAM just for the DDT (deduplication table) -- understand that ZFS's ARC also eats lots of memory, so when I say 1GB of RAM, I'm talking about that being *purely dedicated* to DDT. But as I said, the need varies depending on the type of data you have. When using dedup, the general attitude is "give ZFS as much memory as possible. Max your DIMM slots out with the biggest DIMMs the MCH can support". Many problems I have seen on the FreeBSD lists -- and one horror story on Solaris -- often pertain to people trying dedup. There have been reported issues with resilvering pools that use dedup, or even simply mounting filesystems using dedup. The situation when dedup is in use becomes significantly more complex in a "something is broken" scenario. The horror story I've heard and retell is this one, and this is me going off of memory: There was supposedly an Oracle customer who had been using dedup for some time, and they began to have problems (I don't remember what; whether it was with ZFS, the controller, disks, or what). Anyway, the situation was such that the client needed to either resilver their pool, or just get their data -- but because they were using dedup, they could not. The system could not be upgraded to have more RAM (which would have alleviated the pains). The solution which was chosen was for Oracle to actually ship the customer an entire bare metal system with a gargantuan amount of RAM (hundreds of gigabytes; I often say 384GB because that's what sticks in my mind for some reason, maybe it was 192GB, doesn't matter), just to recover from the situation. 
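The "1GB of RAM per 1TB of data" rule of thumb above can be turned into a quick back-of-the-envelope estimate. This is a sketch under stated assumptions, not ZFS's actual accounting: the ~320 bytes per in-core DDT entry and the 128K average block size used here are commonly quoted approximations, and real pools vary widely.

```python
# Rough estimate of RAM needed for the ZFS dedup table (DDT).
# Assumptions (approximate, commonly quoted): one DDT entry per
# unique block, ~320 bytes of core memory per entry, 128K blocks.

def ddt_ram_bytes(unique_data_bytes, avg_block_size=128 * 1024,
                  bytes_per_entry=320):
    """Return an approximate in-core DDT size for a pool."""
    unique_blocks = unique_data_bytes // avg_block_size
    return unique_blocks * bytes_per_entry

TB = 1024 ** 4
GB = 1024 ** 3

# 1 TB of unique 128K blocks -> 2.5 GB of DDT under these
# assumptions, which is why "1GB per TB" is only a loose floor.
print(ddt_ram_bytes(1 * TB) / GB)  # -> 2.5
```

Smaller average block sizes make this dramatically worse, since the entry count (and therefore the DDT) grows in inverse proportion to block size.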
compression is generally safe to use on FreeBSD, but there are often surprising changes to certain behaviours that people don't consider: the most common one I see reported is conflicting information between what "df", "du", and "zfs list" show. AFAIK this applies to Solaris/Illumos too, so it's just the nature of the beast. compression doesn't have the crazy memory requirements of dedup, obviously -- two separate things, don't confuse the two. :-) The final item is the one that, still to this day, keeps me from using either dedup or compression on FreeBSD (well actually I'd never consider dedup, only compression): system interactivity is destroyed when using either of these features. The system will regularly stall/lock up (depending on the I/O, for a few seconds), even at the VGA console. This problem is specific to the FreeBSD port of ZFS as of this writing; Solaris/Illumos addressed this long ago. Rather than re-write it, I recommend you read my post from February 2013 which references my convo with Bob Friesenhahn in October 2011 (please read all the quoted material too): http://lists.freebsd.org/pipermail/freebsd-stable/2013-February/072171.html Changing the compression scheme does not solve the issue; the less CPU-intensive schemes (ex. lzjb) help decrease the impact but do not solve it. All that said: there are people (often FreeNAS folks using their systems solely as a dedicated NAS, not as a shell server or desktop or other things) who do use these features happily and do not care about the last issue. Cool/great, I'm glad it works for them. But in my case it's not acceptable. If/when the above issue is addressed (putting the ZFS writer threads into their own priority/scheduling class), I look forward to using compression (but never dedup, I don't have the hardware/memory for that kind of thing). 
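The df/du/zfs-list confusion mentioned above comes down to logical versus physical size: block-counting tools see the compressed (allocated) size, while applications see the logical bytes they wrote. A minimal sketch of the arithmetic, with hypothetical numbers for illustration:

```python
# Why "df", "du", and "zfs list" can disagree under compression:
# tools that count allocated blocks report the compressed
# (physical) size, while the logical size is what was written.
# The 2.5x ratio below is a hypothetical example value, as might
# be reported by "zfs get compressratio".

def physical_size(logical_bytes, compressratio):
    """Approximate on-disk size from ZFS's compressratio
    (logical / physical), ignoring metadata and padding."""
    return int(logical_bytes / compressratio)

logical = 10 * 1024 ** 3   # 10 GiB written by the application
ratio = 2.5                # hypothetical compressratio

# du/df would charge only ~4 GiB against the pool here, while the
# file's apparent (logical) size remains 10 GiB.
print(physical_size(logical, ratio))
```

Neither number is wrong; they answer different questions ("how much did I write?" versus "how much pool space did it consume?").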
Otherwise please spend an afternoon looking through freebsd-fs and freebsd-stable lists over the past 2 years (see web archives) and reading about different stories/situations. I always, *always* advocate this. -- | Jeremy Chadwick jdc@koitsu.org | | UNIX Systems Administrator http://jdc.koitsu.org/ | | Making life hard for others since 1977. PGP 4BD6C0CB | From owner-freebsd-fs@FreeBSD.ORG Thu Jul 4 02:22:03 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id B42FFEE5 for ; Thu, 4 Jul 2013 02:22:03 +0000 (UTC) (envelope-from sean_bruno@yahoo.com) Received: from nm24-vm4.bullet.mail.gq1.yahoo.com (nm24-vm4.bullet.mail.gq1.yahoo.com [98.136.217.99]) by mx1.freebsd.org (Postfix) with ESMTP id 7A702146E for ; Thu, 4 Jul 2013 02:22:03 +0000 (UTC) Received: from [216.39.60.180] by nm24.bullet.mail.gq1.yahoo.com with NNFMP; 04 Jul 2013 02:21:57 -0000 Received: from [208.71.42.212] by tm16.bullet.mail.gq1.yahoo.com with NNFMP; 04 Jul 2013 02:21:57 -0000 Received: from [127.0.0.1] by smtp223.mail.gq1.yahoo.com with NNFMP; 04 Jul 2013 02:21:57 -0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=yahoo.com; s=s1024; t=1372904517; bh=x+uI5BbeiWxqpLYACzbB+nOTz73K8BgbS0sluFdUhAI=; h=X-Yahoo-Newman-Id:X-Yahoo-Newman-Property:X-YMail-OSG:X-Yahoo-SMTP:X-Rocket-Received:Subject:From:Reply-To:To:Cc:In-Reply-To:References:Content-Type:Date:Message-ID:Mime-Version:X-Mailer; b=o8C27jE58Zu8Da0aL5jWivXJ9EVunRjHQue5fEm0XU74jTYsYxstPGB34zHMqzqcBD/3YbbRs1SheLZLBDTAhLrbMNGtbbf5JeDgBUUz074aoyffA6mNW8sVBC5zcd8Pznrtr9ZjBAPjxFKZ9Tn4eYDvXvM51gVYjomJzUBjkDI= X-Yahoo-Newman-Id: 187898.3111.bm@smtp223.mail.gq1.yahoo.com X-Yahoo-Newman-Property: ymail-3 X-YMail-OSG: KQMycekVM1l8DUWEMy8_2G1hZvOAv1xc6d7uPWScIug9tKr wBVLcBYtTqx6M8O.ne_vYx9q8pL5s43iCrA1JmuZF2EjBYJU85ZzbK6IMVI7 66UrSA2JQK5fxYajxWSC4PhkY3ef6VYbNaENb0fy8Zbw6XmxQG_09a46wDOW 
KRYbeynAhvcg169OI.8Zb3yxwZie_Axuvlm9oFb_7_MDU2cXHl0jyywDksuS lr6Xe1MMl_dkyndEin5PUk_.WSGARB3s1eK9y9I4AKMCrCrN6YzrKbW5dtno fO8lR9PGq.vMj1iaZcMro2LOGN_kxsRB4WvbICQLsxRWZQfRZvl_jK_aDozL Astc32g.kMOCYvh1kjZ06cmbvNaoT5qRRO.0BXc.1amrA_.LmHCBGpgNDn3r D.aZTkwIWPDPiRrfYXNzAjLGQTy5eZVhwLVosba7aXobikJBhRYrEeqQZuwp ymb2GxOcPBh39AYqn7d7Utuwi0Ua_xTjKqbvMf_LEvL73p5Uoon8vEErkOxL SmxHXNl68wAnphlrY4g_xiDNbz_IYeq64_oDtVDJyJf1jSuve0zg- X-Yahoo-SMTP: u5BKR6OswBC_iZJVfGRoMkTIpc8pEA4- X-Rocket-Received: from [192.168.1.210] (sean_bruno@71.202.40.63 with ) by smtp223.mail.gq1.yahoo.com with SMTP; 04 Jul 2013 02:21:57 +0000 UTC Subject: Re: Dell R710 with PERC H310. GPT issues and hardware raid or ZFS? From: Sean Bruno To: Steven Hartland In-Reply-To: <12FB82C142E74924B27F4DBB642D0D1F@multiplay.co.uk> References: <51C5B4EE.3010601@growveg.net> <81EDE2A7D67241DAB985C38662DA57DD@multiplay.co.uk> <51C5EAB0.7040009@growveg.net> <20130622191619.GA73246@icarus.home.lan> <12FB82C142E74924B27F4DBB642D0D1F@multiplay.co.uk> Content-Type: multipart/signed; micalg="pgp-sha1"; protocol="application/pgp-signature"; boundary="=-91Ax1Ev7r2UteFQKAvyw" Date: Wed, 03 Jul 2013 19:21:56 -0700 Message-ID: <1372904516.1427.4.camel@localhost> Mime-Version: 1.0 X-Mailer: Evolution 2.32.1 FreeBSD GNOME Team Port Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list Reply-To: sbruno@freebsd.org List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 04 Jul 2013 02:22:03 -0000 --=-91Ax1Ev7r2UteFQKAvyw Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable > Some mfi controllers actually have a JBOD option, if it does you'll > be able to this controller if not watch out as using the RAID 0 option > will almost certainly cause you pain. >=20 > mps as Jeremy has said is better option. 
>=20 > Regards > Steve I note that the H310 in a test R420 that I have does indeed have a JBOD option. This presents a disk as /dev/mfisyspdX which is totally befuddling. On our stable/9, we get crash and burns with kernel panics and data corruption, but I haven't pursued it enough to be productive. I did bump the machine to head and didn't see any difference. Sean --=-91Ax1Ev7r2UteFQKAvyw Content-Type: application/pgp-signature; name="signature.asc" Content-Description: This is a digitally signed message part -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.13 (FreeBSD) iQEcBAABAgAGBQJR1Nw/AAoJEBkJRdwI6BaHONEH/2fy2g6CaQ3QJK/JnqgsO+1S kfBRCSTNe1ayOFrz59m6OfF9HJh5wFEgymNaRZZYFfPJVsNz+KovxofsakLj4lz+ MA4+8QdE9XUjhb4GM7y2jQ4sBuCYD9ghr0wtL0jXO+QsIwdMSCCRA6ilMw/bFms6 Az9ysnMi1hriBB0NFZ4E90ow1ZUl8HjcmfNIcyWSdrNuZtLOTdXJgbFZWZU2im2T ne/S0hNEbLHWxN5rXKnde9DeZe9THsFOSAJGM28tQZTwIMSgXygwrXcRTg0cbo/p bd+Y5rGJS/4ulAM6UB9CWVlqcvA0eJp7o40yQ6El0y2qm6I76lttHAS+73+/v7g= =3EGE -----END PGP SIGNATURE----- --=-91Ax1Ev7r2UteFQKAvyw-- From owner-freebsd-fs@FreeBSD.ORG Thu Jul 4 02:25:00 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id F3F8DF86; Thu, 4 Jul 2013 02:24:59 +0000 (UTC) (envelope-from berend@pobox.com) Received: from smtp.pobox.com (b-pb-sasl-quonix.pobox.com [208.72.237.35]) by mx1.freebsd.org (Postfix) with ESMTP id B07431497; Thu, 4 Jul 2013 02:24:59 +0000 (UTC) Received: from smtp.pobox.com (unknown [127.0.0.1]) by b-sasl-quonix.pobox.com (Postfix) with ESMTP id D105726109; Thu, 4 Jul 2013 02:24:51 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=pobox.com; h=date :message-id:from:to:cc:subject:in-reply-to:references :mime-version:content-type:content-transfer-encoding; s=sasl; bh=m1toXEkjdfRnsASEJp+OS9zNdyo=; b=Fd2FUO6plkiVLBcfVNqzoCir8KhZ Rx392BChKxHd2VVcCa/CGme9CUNLNoP5WosbCo/z1S7EUyEmgSQIB8FAEujVln6g 
xb4FBZY3mqiyAcsS22opgh0wXbVZFtIYbbiOI9fxjftycekNHiBXBGzM5apLrpXO ZS8TA5ZCQ2WIXCQ= DomainKey-Signature: a=rsa-sha1; c=nofws; d=pobox.com; h=date:message-id :from:to:cc:subject:in-reply-to:references:mime-version :content-type:content-transfer-encoding; q=dns; s=sasl; b=HCT79l RiXNHuljZ9aJM1WSCtb3bVXymgH3+Vo2pEYskw9sAxY6AjO52xHK6GxP5a6Gq5YB iYh9LHYUFAIaGtLFqY30lpItmCebdHRnQamIQgufKzLFjF09W+ePJbe8/0lo3hcF NLhmVRHclQXPz7QLago+4vEgJh5wmhwD0DJzQ= Received: from b-pb-sasl-quonix.pobox.com (unknown [127.0.0.1]) by b-sasl-quonix.pobox.com (Postfix) with ESMTP id BE21026108; Thu, 4 Jul 2013 02:24:51 +0000 (UTC) Received: from bmach.nederware.nl (unknown [27.252.247.84]) by b-sasl-quonix.pobox.com (Postfix) with ESMTPA id C4C9B260FB; Thu, 4 Jul 2013 02:24:50 +0000 (UTC) Received: from quadrio.nederware.nl (quadrio.nederware.nl [192.168.33.13]) by bmach.nederware.nl (Postfix) with ESMTP id E179D5C80; Thu, 4 Jul 2013 14:24:43 +1200 (NZST) Received: from quadrio.nederware.nl (quadrio.nederware.nl [127.0.0.1]) by quadrio.nederware.nl (Postfix) with ESMTP id 807EE49FB971; Thu, 4 Jul 2013 14:24:48 +1200 (NZST) Date: Thu, 04 Jul 2013 14:24:48 +1200 Message-ID: <8738rv3vv3.wl%berend@pobox.com> From: Berend de Boer To: Jeremy Chadwick Subject: Re: EBS snapshot backups from a FreeBSD zfs file system: zpool freeze? 
In-Reply-To: <20130704021535.GA77546@icarus.home.lan> References: <871u7g57rl.wl%berend@pobox.com> <87mwq34emp.wl%berend@pobox.com> <20130703200241.GB60515@in-addr.com> <87k3l748gb.wl%berend@pobox.com> <20130703233631.GA74698@icarus.home.lan> <87d2qz42q4.wl%berend@pobox.com> <20130704010815.GB75529@icarus.home.lan> <8761wr3xxk.wl%berend@pobox.com> <20130704021535.GA77546@icarus.home.lan> User-Agent: Wanderlust/2.15.9 (Almost Unreal) SEMI-EPG/1.14.7 (Harue) FLIM/1.14.9 (=?UTF-8?B?R29qxY0=?=) APEL/10.8 EasyPG/1.0.0 Emacs/24.3 (i686-pc-linux-gnu) MULE/6.0 (HANACHIRUSATO) Organization: Xplain Technology Ltd MIME-Version: 1.0 (generated by SEMI-EPG 1.14.7 - "Harue") Content-Type: multipart/signed; boundary="pgp-sign-Multipart_Thu_Jul__4_14:24:48_2013-1"; micalg=pgp-sha256; protocol="application/pgp-signature" Content-Transfer-Encoding: 7bit X-Pobox-Relay-ID: E791C322-E450-11E2-B6C3-E84251E3A03C-48001098!b-pb-sasl-quonix.pobox.com Cc: freebsd-fs X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 04 Jul 2013 02:25:00 -0000 --pgp-sign-Multipart_Thu_Jul__4_14:24:48_2013-1 Content-Type: text/plain; charset=US-ASCII >>>>> "Jeremy" == Jeremy Chadwick writes: Jeremy> The solution which was chosen was for Oracle to actually Jeremy> ship the customer an entire bare metal system with a Jeremy> gargantuan amount of RAM (hundreds of gigabytes; I often Jeremy> say 384GB because that's what sticks in my mind for some Jeremy> reason, maybe it was 192GB, doesn't matter), just to Jeremy> recover from the situation. Yeah, well aware of the memory requirements. No problem. I see a lot of people here who are apparently unaware of AWS and why it is so far ahead of the pack. If I have a memory problem, I make an image of my machine, stop the old one, and boot up on a new machine with more memory. As simple as that. Takes me 5 minutes. 
8GB not enough? 5 minutes later you boot up on 32GB without a sweat from your holiday vacation spot. 32GB not enough? What about 68GB? That's not enough? Why not 244GB? By that time your credit card is sweating, but hardware is trivially upgradable on EC2. No waiting for hardware to arrive, minimum down time, all customers remain happy. -- All the best, Berend de Boer ------------------------------------------------------ Awesome Drupal hosting: https://www.xplainhosting.com/ --pgp-sign-Multipart_Thu_Jul__4_14:24:48_2013-1 Content-Type: application/pgp-signature Content-Transfer-Encoding: 7bit Content-Description: OpenPGP Digital Signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.17 (GNU/Linux) iQIcBAABCAAGBQJR1NzwAAoJEKOfeD48G3g5AwkP/38pcPxPyq67OKIH8vmIJ6vT FQ7FNjHqTuXewJA4bkY2Y6DvefW/1/PWJeBgi9SCABkk5R7lCWuE93w2iqbOBaUT WUrdrKA2qDLnMXbhF2KQva/MvVJmr3oOmdXTMMeCvcWuZDqGSdG3tX1RJRmdXBN9 nW9W9OiGRezdrJCVut4+njZ0kuxYRxaP6f1YgyyCxYQp3YvgQn+1oRvAG2bbrQtl OWZWn1mNhh01QSfBMh8ocwd4Ev4FDS7dMHBYrx+rFzXF8rSFBNd2WgAeGn2ZtW13 eT8V3kB15u5e/XpeX+F5WvwDNbp8aBDM41hdG1F4/EsFVIJ0GTfbIdw7+2cRqH/W tD7gMZhQNNL6bwa6FS3F6G2Nt4tMV+StpBxzA92JCoGTpw15bcgT0eLx31NpIGU5 pHejNj4HIYHssBKtGDO/Cemp3ghG1+lg/cYS1XvYp6QyP3vXc6XO9aX8rHme0Voa XnaPoTm1+2Dj2QtpxaM/KXzxV1fFK3aDQqvBkS01cRhptH58yea0LU7l/TOaItKJ zTEAaaLS98pOMi9CfqTCaaJOeF31wH7bWQF3I0Wru7PBfbhJJ5Zk3lWanV2wOMoY EdKxV2nWt9IwNnvrA/+yfs4ePhlW0j6Hg0UMHX7MeVbVSlCqZOJyNXJ1xTcQvc2z bgSG/d65c2GEIruDl5qf =ybC6 -----END PGP SIGNATURE----- --pgp-sign-Multipart_Thu_Jul__4_14:24:48_2013-1-- From owner-freebsd-fs@FreeBSD.ORG Thu Jul 4 02:25:04 2013 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 531D8F89 for ; Thu, 4 Jul 2013 02:25:04 +0000 (UTC) (envelope-from davidxu@freebsd.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:1900:2254:206c::16:87]) by mx1.freebsd.org (Postfix) with ESMTP id 
455BA1498; Thu, 4 Jul 2013 02:25:04 +0000 (UTC) Received: from xyf.my.dom (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.7/8.14.7) with ESMTP id r642P1AC015182; Thu, 4 Jul 2013 02:25:03 GMT (envelope-from davidxu@freebsd.org) Message-ID: <51D4DD23.5070706@freebsd.org> Date: Thu, 04 Jul 2013 10:25:39 +0800 From: David Xu User-Agent: Mozilla/5.0 (X11; FreeBSD i386; rv:17.0) Gecko/20130416 Thunderbird/17.0.5 MIME-Version: 1.0 To: Berend de Boer Subject: Re: EBS snapshot backups from a FreeBSD zfs file system: zpool freeze? References: <87li5o5tz2.wl%berend@pobox.com> <51D3C2F4.8010907@freebsd.org> <87y59o3sug.wl%berend@pobox.com> In-Reply-To: <87y59o3sug.wl%berend@pobox.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-fs@FreeBSD.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 04 Jul 2013 02:25:04 -0000 On 2013/07/03 17:17, Berend de Boer wrote: >>>>>> "David" == David Xu writes: > > David> What you need is a tool to create snapshot on EBS server, > > FYI, EBS is not a server. It is NAS. It's block storage. > > -- > All the best, > > Berend de Boer > You can call it NAS, but in practice it is implemented as a server farm. We have implemented a local EBS farm ourselves: normally, a command is sent to a master server to request a snapshot on behalf of a client, and the client can continue writing to its iSCSI disk without being suspended by the snapshot. A hardware snapshot is not a problem, because our clients use journaling file systems; the snapshot, even in an inconsistent state, will be recovered by the client OS when it is mounted. So our client does not need to support file system suspension or snapshots; it only needs a journaling file system. 
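The argument above, that a crash-consistent block snapshot is recoverable as long as the file system journals its writes, can be illustrated with a toy write-ahead log. This is a conceptual sketch only; no real journal (UFS SU+J, ext4, ZFS intent log) looks like this, and the record format here is invented for illustration.

```python
# Toy model of journal replay on mount: a snapshot taken mid-write
# is "inconsistent", but replaying committed transactions (and
# discarding torn, uncommitted ones) yields a consistent state.

def replay_journal(data, journal):
    """Apply only transactions whose commit record reached the
    snapshot; drop partial ones, as a journaling FS does on mount."""
    recovered = dict(data)
    pending = {}
    for record in journal:
        if record[0] == "write":        # ("write", txid, key, value)
            _, txid, key, value = record
            pending.setdefault(txid, []).append((key, value))
        elif record[0] == "commit":     # ("commit", txid)
            for key, value in pending.pop(record[1], []):
                recovered[key] = value
    return recovered

# The snapshot caught tx 2 half-done: its write is journaled, but
# its commit record never made it into the disk image.
snapshot_data = {"a": 1}
snapshot_journal = [
    ("write", 1, "b", 2), ("commit", 1),
    ("write", 2, "a", 99),              # no ("commit", 2)
]
print(replay_journal(snapshot_data, snapshot_journal))
# -> {'a': 1, 'b': 2}: tx 1 replayed, torn tx 2 discarded
```

The trade-off David describes falls out directly: the client needs no freeze/suspend hook, but anything not yet committed at snapshot time is simply lost.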
Regards, David Xu From owner-freebsd-fs@FreeBSD.ORG Thu Jul 4 02:39:59 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 21C3C1D4 for ; Thu, 4 Jul 2013 02:39:59 +0000 (UTC) (envelope-from fjwcash@gmail.com) Received: from mail-qe0-x233.google.com (mail-qe0-x233.google.com [IPv6:2607:f8b0:400d:c02::233]) by mx1.freebsd.org (Postfix) with ESMTP id D75F31632 for ; Thu, 4 Jul 2013 02:39:58 +0000 (UTC) Received: by mail-qe0-f51.google.com with SMTP id a11so503262qen.38 for ; Wed, 03 Jul 2013 19:39:58 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=uZbTuagP+qZsykkVAkbkZQvoe/zVlDj0G3kofD4qZVs=; b=BFzSAmSVRtxnLCAZ06NVTaBQ9HFi4EWoyUpL6W//phvW+ZYSAZXGA/lTsqQvvGE2ZJ RqngaM/IUr+fLe9h5Kfn2MjOJ78PfYJ5Lhy2Lv6Cmz7o3G9rKUoLLrpwRIjryF8wgGWl TbGNPrJJN/xpxYP/IU51GLJeh5ey6TDxCE6+KPCV+7k25ZMYze7L1/pnW3bwaaf7Hd1A Ho2Of/f4WNRaE0GIw6BEAsSiXVaTypA9ZK/do6tMvK3MPZlEzRVtG5YOVVA0xGiEa2uj 2lBDdk6gq96TGTsKR4q+DTTbZ0H4R/+cTEG1l8OXuZ8hRqPBE81S+omac4gCIkLuH+yA uw5Q== MIME-Version: 1.0 X-Received: by 10.229.196.73 with SMTP id ef9mr961404qcb.85.1372905598427; Wed, 03 Jul 2013 19:39:58 -0700 (PDT) Received: by 10.49.49.135 with HTTP; Wed, 3 Jul 2013 19:39:58 -0700 (PDT) Received: by 10.49.49.135 with HTTP; Wed, 3 Jul 2013 19:39:58 -0700 (PDT) In-Reply-To: <20130704021535.GA77546@icarus.home.lan> References: <871u7g57rl.wl%berend@pobox.com> <87mwq34emp.wl%berend@pobox.com> <20130703200241.GB60515@in-addr.com> <87k3l748gb.wl%berend@pobox.com> <20130703233631.GA74698@icarus.home.lan> <87d2qz42q4.wl%berend@pobox.com> <20130704010815.GB75529@icarus.home.lan> <8761wr3xxk.wl%berend@pobox.com> <20130704021535.GA77546@icarus.home.lan> Date: Wed, 3 Jul 2013 19:39:58 -0700 Message-ID: Subject: Re: EBS snapshot backups from a FreeBSD zfs file system: zpool freeze? 
From: Freddie Cash To: Jeremy Chadwick Content-Type: text/plain; charset=UTF-8 X-Content-Filtered-By: Mailman/MimeDel 2.1.14 Cc: FreeBSD Filesystems X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 04 Jul 2013 02:39:59 -0000 On 2013-07-03 7:16 PM, "Jeremy Chadwick" wrote: > > On Thu, Jul 04, 2013 at 01:40:07PM +1200, Berend de Boer wrote: > > >>>>> "Jeremy" == Jeremy Chadwick writes: > > > > > > Jeremy> Also, because nobody seems to warn others of this: if > > Jeremy> you go the ZFS route on FreeBSD, please do not use > > Jeremy> features like dedup or compression. > > > > Exactly the two reasons why I'm experimenting with FreeBSD on AWs. > > > > Please tell me more. > > dedup has immense and crazy memory requirements; the commonly referenced > model (which is in no way precise, it's just a general recommendation) > is that for every 1TB of data you need 1GB of RAM just for the DDT > (deduplication table)) -- understand that ZFS's ARC also eats lots of > memory, so when I say 1GB of RAM, I'm talking about that being *purely > dedicated* to DDT. Correction: 1 GB of *ARC* space per TB of *unique* data in the pool. Each unique block in the pool gets an entry in the DDT. You can use L2ARC to store the DDT, although it takes ARC space to track data in L2ARC, so you can't go crazy (512 GB L2 with only 16 GB ARC is a no-no). However, you do need a lot of RAM to make dedupe work, and your I/O does drop through the floor. 
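Freddie's warning that a huge L2ARC paired with a small ARC is a no-no can be sanity-checked: every block cached in L2ARC needs an in-RAM header in the ARC. The header size and average block size below are rough assumptions (the per-header cost has varied across ZFS versions), so treat this as an order-of-magnitude sketch.

```python
# Approximate ARC memory consumed just to index an L2ARC device.
# Assumptions: ~180 bytes of ARC per cached L2ARC block, and an
# 8K average cached block size (small blocks are the worst case).

def l2arc_header_ram(l2arc_bytes, avg_block_size=8 * 1024,
                     header_bytes=180):
    """RAM needed to track every block on an L2ARC device."""
    return (l2arc_bytes // avg_block_size) * header_bytes

GB = 1024 ** 3

# A 512 GB L2ARC full of 8K blocks needs ~11 GB of ARC just for
# headers -- most of a 16 GB ARC, leaving little for actual data.
print(l2arc_header_ram(512 * GB) / GB)  # -> 11.25
```

With 128K blocks the overhead drops by a factor of 16, which is why the penalty depends so heavily on workload.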
From owner-freebsd-fs@FreeBSD.ORG Thu Jul 4 02:51:16 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 07D675B6 for ; Thu, 4 Jul 2013 02:51:16 +0000 (UTC) (envelope-from jdc@koitsu.org) Received: from relay3-d.mail.gandi.net (relay3-d.mail.gandi.net [217.70.183.195]) by mx1.freebsd.org (Postfix) with ESMTP id B92CE169B for ; Thu, 4 Jul 2013 02:51:15 +0000 (UTC) Received: from mfilter3-d.gandi.net (mfilter3-d.gandi.net [217.70.178.133]) by relay3-d.mail.gandi.net (Postfix) with ESMTP id B01C6A80CA; Thu, 4 Jul 2013 04:51:04 +0200 (CEST) X-Virus-Scanned: Debian amavisd-new at mfilter3-d.gandi.net Received: from relay3-d.mail.gandi.net ([217.70.183.195]) by mfilter3-d.gandi.net (mfilter3-d.gandi.net [10.0.15.180]) (amavisd-new, port 10024) with ESMTP id 9vMLkqfAvCKQ; Thu, 4 Jul 2013 04:51:03 +0200 (CEST) X-Originating-IP: 76.102.14.35 Received: from jdc.koitsu.org (c-76-102-14-35.hsd1.ca.comcast.net [76.102.14.35]) (Authenticated sender: jdc@koitsu.org) by relay3-d.mail.gandi.net (Postfix) with ESMTPSA id D3659A80C7; Thu, 4 Jul 2013 04:51:02 +0200 (CEST) Received: by icarus.home.lan (Postfix, from userid 1000) id 0220373A1E; Wed, 3 Jul 2013 19:51:01 -0700 (PDT) Date: Wed, 3 Jul 2013 19:51:00 -0700 From: Jeremy Chadwick To: Freddie Cash Subject: Re: EBS snapshot backups from a FreeBSD zfs file system: zpool freeze? 
Message-ID: <20130704025100.GB78374@icarus.home.lan> References: <87mwq34emp.wl%berend@pobox.com> <20130703200241.GB60515@in-addr.com> <87k3l748gb.wl%berend@pobox.com> <20130703233631.GA74698@icarus.home.lan> <87d2qz42q4.wl%berend@pobox.com> <20130704010815.GB75529@icarus.home.lan> <8761wr3xxk.wl%berend@pobox.com> <20130704021535.GA77546@icarus.home.lan> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) Cc: FreeBSD Filesystems X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 04 Jul 2013 02:51:16 -0000 On Wed, Jul 03, 2013 at 07:39:58PM -0700, Freddie Cash wrote: > On 2013-07-03 7:16 PM, "Jeremy Chadwick" wrote: > > > > On Thu, Jul 04, 2013 at 01:40:07PM +1200, Berend de Boer wrote: > > > >>>>> "Jeremy" == Jeremy Chadwick writes: > > > > > > > > > Jeremy> Also, because nobody seems to warn others of this: if > > > Jeremy> you go the ZFS route on FreeBSD, please do not use > > > Jeremy> features like dedup or compression. > > > > > > Exactly the two reasons why I'm experimenting with FreeBSD on AWs. > > > > > > Please tell me more. > > > > dedup has immense and crazy memory requirements; the commonly referenced > > model (which is in no way precise, it's just a general recommendation) > > is that for every 1TB of data you need 1GB of RAM just for the DDT > > (deduplication table)) -- understand that ZFS's ARC also eats lots of > > memory, so when I say 1GB of RAM, I'm talking about that being *purely > > dedicated* to DDT. > > Correction: 1 GB of *ARC* space per TB of *unique* data in the pool. Each > unique block in the pool gets an entry in the DDT. > > You can use L2ARC to store the DDT, although it takes ARC space to track > data in L2ARC, so you can't go crazy (512 GB L2 with only 16 GB ARC is a > no-no). 
> > However, you do need a lot of RAM to make dedupe work, and your I/O does > drop through the floor. Thanks Freddie -- I didn't know this (re: ARC space per TB of unique data); wasn't aware that's where the DDT got placed. (Actually makes sense now that I think about it...) -- | Jeremy Chadwick jdc@koitsu.org | | UNIX Systems Administrator http://jdc.koitsu.org/ | | Making life hard for others since 1977. PGP 4BD6C0CB | From owner-freebsd-fs@FreeBSD.ORG Thu Jul 4 04:15:48 2013 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 4A29E9FA; Thu, 4 Jul 2013 04:15:48 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) by mx1.freebsd.org (Postfix) with ESMTP id B62EC1C17; Thu, 4 Jul 2013 04:15:47 +0000 (UTC) Received: from tom.home (kostik@localhost [127.0.0.1]) by kib.kiev.ua (8.14.7/8.14.7) with ESMTP id r644FdjJ031978; Thu, 4 Jul 2013 07:15:39 +0300 (EEST) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.8.3 kib.kiev.ua r644FdjJ031978 Received: (from kostik@localhost) by tom.home (8.14.7/8.14.7/Submit) id r644FdGQ031977; Thu, 4 Jul 2013 07:15:39 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Thu, 4 Jul 2013 07:15:39 +0300 From: Konstantin Belousov To: Steven Hartland Subject: Re: kern/180236: [zfs] [nullfs] Leakage free space using ZFS with nullfs on 9.1-STABLE Message-ID: <20130704041539.GE91021@kib.kiev.ua> References: <201307040000.r64001v6076818@freefall.freebsd.org> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="Ra0fXv3AZ0rgmm73" Content-Disposition: inline In-Reply-To: <201307040000.r64001v6076818@freefall.freebsd.org> User-Agent: Mutt/1.5.21 (2010-09-15) X-Spam-Status: No, score=-2.0 required=5.0 
tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no version=3.3.2 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on tom.home Cc: freebsd-fs@FreeBSD.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 04 Jul 2013 04:15:48 -0000 --Ra0fXv3AZ0rgmm73 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Thu, Jul 04, 2013 at 12:00:01AM +0000, Steven Hartland wrote: > The following reply was made to PR kern/180236; it has been noted by GNAT= S. >=20 > From: "Steven Hartland" > To: , > "Ivan Klymenko" > Cc: =20 > Subject: Re: kern/180236: [zfs] [nullfs] Leakage free space using ZFS wit= h nullfs on 9.1-STABLE > Date: Thu, 4 Jul 2013 00:58:13 +0100 >=20 > Looks like nullfs isn't cleaning up correctly in the case > where a rename colides with an existing file hence results > in an implicit remove. > =20 > This can be seen in the zdb output for the volume in that > before the unmount all the plain file entries still exist > but after the unmount of nullfs they are gone. Can you demonstrate the scenario of the problem, e.g. using the basic filesystem commands, like cp(1), mv(1) ? Does the issue reproduce on UFS ? 
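The nullfs leak described in the PR hinges on rename semantics: rename(2) onto an existing name implicitly removes the target, and the report suggests nullfs fails to release the removed file's resources. The implicit-remove behaviour itself is easy to demonstrate portably (this shows the POSIX semantics, not the nullfs bug, which needs a FreeBSD nullfs mount to reproduce):

```python
# rename(2) onto an existing name atomically replaces it; the old
# file's data becomes unreferenced and its space should be freed.
import os
import tempfile

d = tempfile.mkdtemp()
src = os.path.join(d, "incoming")
dst = os.path.join(d, "existing")
with open(src, "w") as f:
    f.write("new contents")
with open(dst, "w") as f:
    f.write("old contents")

os.rename(src, dst)        # collision: "existing" is implicitly removed

print(os.path.exists(src))  # -> False: the source name is gone
with open(dst) as f:
    print(f.read())         # -> new contents: target replaced in place
```

Running the equivalent with mv(1) on a nullfs mount, then comparing df output and zdb's file entries before and after unmount, would be one way to reproduce the scenario Konstantin asks about.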
--Ra0fXv3AZ0rgmm73 Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.20 (FreeBSD) iQIcBAEBAgAGBQJR1PbqAAoJEJDCuSvBvK1BDCYQAJSg0HnLLVtNiNzCsvX38w47 Y3My8bKkunwltQAc3bYF+16tVoeOX1XTeCnlH0WxMaWz/O6jwcB2SJGapeZ/YmjC AXkj0nHUw267vJUQzesgcPrOD3LhXUnuEO5K4ULVV6Oy3MVhKkCEAJ4gcgLn2cHF HxJZilag68kAFgrXzRav5o5muQZM2PVovenYZVN98e/TpA5kojBNILZ6fR7p5fjs G39m2TnHT/OLX5QHUukG08OzEugU/vV37aZoYCuuA7CouLaCwGg5pf4ETHj/JPx+ Eadk0cSIah4IdmKbAu6D1qnwKKBEY/YlbKb0draBOhUBTVWv0KjsjOlRTHedk1vv dlB+tZ0nGDrM1bdLNtahLHF3rgP7Qe1nYngQiIfKfoQkOFzJ1d1U9nA85zbq3wDX 8eoajQbpITaJD2ZX/xKXcgX1yPDQj21EZF3/nP0QCOLhLGVM7YRcNCB/iaNPi+j9 3RqOjwNr3zRZYS7s/yM6kYR0+CF/4/YEqFoqHKGx8pzy5ydiPf7OkleMylDuZT8c T6KBcWIDVQzI5yQhx6v9Bq58gDAeI3lUi48WE7KPfV2Dvkgc4MkyFHqIvod8ylmx dTr/j1bBy0FWc74PKi5Vdv0BF3zNaoLlO3v1p/VynoE7BfRuK3PQKqsd2O4B9tDi DIjsQs3Qfqp/Om+4i3VR =C6b+ -----END PGP SIGNATURE----- --Ra0fXv3AZ0rgmm73-- From owner-freebsd-fs@FreeBSD.ORG Thu Jul 4 07:31:42 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 60278E2 for ; Thu, 4 Jul 2013 07:31:42 +0000 (UTC) (envelope-from julian@freebsd.org) Received: from vps1.elischer.org (vps1.elischer.org [204.109.63.16]) by mx1.freebsd.org (Postfix) with ESMTP id 33F3212FF for ; Thu, 4 Jul 2013 07:31:41 +0000 (UTC) Received: from jre-mbp.elischer.org (ppp121-45-226-51.lns20.per1.internode.on.net [121.45.226.51]) (authenticated bits=0) by vps1.elischer.org (8.14.5/8.14.5) with ESMTP id r647VR2Q093003 (version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO); Thu, 4 Jul 2013 00:31:32 -0700 (PDT) (envelope-from julian@freebsd.org) Message-ID: <51D524C9.1030700@freebsd.org> Date: Thu, 04 Jul 2013 15:31:21 +0800 From: Julian Elischer User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:17.0) Gecko/20130620 Thunderbird/17.0.7 MIME-Version: 1.0 To: Berend de Boer Subject: Re: EBS snapshot 
backups from a FreeBSD zfs file system: zpool freeze? References: <87ehbg5raq.wl%berend@pobox.com> <20130703055047.GA54853@icarus.home.lan> <6488DECC-2455-4E92-B432-C39490D18484@dragondata.com> <871u7g57rl.wl%berend@pobox.com> <87mwq34emp.wl%berend@pobox.com> <20130703200241.GB60515@in-addr.com> <87k3l748gb.wl%berend@pobox.com> <20130703233631.GA74698@icarus.home.lan> <87d2qz42q4.wl%berend@pobox.com> In-Reply-To: <87d2qz42q4.wl%berend@pobox.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-fs X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 04 Jul 2013 07:31:42 -0000 On 7/4/13 7:56 AM, Berend de Boer wrote: >>>>>> "Jeremy" == Jeremy Chadwick writes: > Jeremy> As politely as I can: It sounds like you may have spent > Jeremy> too much time with these types of setups, or believe them > Jeremy> to be "magical" in some way, in turn forgetting the > Jeremy> realities of bare metal and instead thinking "everything > Jeremy> is software". Bzzt. > > Heh. The solution with Amazon is even worse: if things go wrong, > you're screwed. Can't get your disks back. You can't call > anyone. There's no bare metal to touch, and no, they won't let you > into their data centres. > > So I'm actually trying to avoid the magic. > > The only guarantee I basically have is that if I have made an EBS > snapshot of my disk, I can, one day, restore that, and that this > snapshot is stored in some multi-redundancy (magic!) cloud. > > (And obviously you can try to run a mirror in another data centre > using zfs send/recv, yes, will run that too). > > If you go with AWS, there are no phone calls to make. Disk gone is > disk gone. So you need to have working backup strategies in place. 
put your data on multiple data centers using a panzura box > > -- > All the best, > > Berend de Boer > > > ------------------------------------------------------ > Awesome Drupal hosting: https://www.xplainhosting.com/ > From owner-freebsd-fs@FreeBSD.ORG Thu Jul 4 08:02:55 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 07D4A9BD for ; Thu, 4 Jul 2013 08:02:55 +0000 (UTC) (envelope-from daniel@digsys.bg) Received: from smtp-sofia.digsys.bg (smtp-sofia.digsys.bg [193.68.21.123]) by mx1.freebsd.org (Postfix) with ESMTP id 67B5A15DB for ; Thu, 4 Jul 2013 08:02:54 +0000 (UTC) Received: from dcave.digsys.bg (dcave.digsys.bg [193.68.6.1]) (authenticated bits=0) by smtp-sofia.digsys.bg (8.14.6/8.14.6) with ESMTP id r6482p5M004700 (version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO) for ; Thu, 4 Jul 2013 11:02:51 +0300 (EEST) (envelope-from daniel@digsys.bg) Message-ID: <51D52C2B.5050906@digsys.bg> Date: Thu, 04 Jul 2013 11:02:51 +0300 From: Daniel Kalchev User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:17.0) Gecko/20130627 Thunderbird/17.0.7 MIME-Version: 1.0 To: freebsd-fs@freebsd.org Subject: Re: EBS snapshot backups from a FreeBSD zfs file system: zpool freeze? 
References: <87mwq34emp.wl%berend@pobox.com> <20130703200241.GB60515@in-addr.com> <87k3l748gb.wl%berend@pobox.com> <20130703233631.GA74698@icarus.home.lan> <87d2qz42q4.wl%berend@pobox.com> <20130704010815.GB75529@icarus.home.lan> <8761wr3xxk.wl%berend@pobox.com> <20130704021535.GA77546@icarus.home.lan> <20130704025100.GB78374@icarus.home.lan> In-Reply-To: <20130704025100.GB78374@icarus.home.lan> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 04 Jul 2013 08:02:55 -0000 On 04.07.13 05:51, Jeremy Chadwick wrote: > On Wed, Jul 03, 2013 at 07:39:58PM -0700, Freddie Cash wrote: >> On 2013-07-03 7:16 PM, "Jeremy Chadwick" wrote: >>> On Thu, Jul 04, 2013 at 01:40:07PM +1200, Berend de Boer wrote: >>>>>>>>> "Jeremy" == Jeremy Chadwick writes: >>>> >>>> Jeremy> Also, because nobody seems to warn others of this: if >>>> Jeremy> you go the ZFS route on FreeBSD, please do not use >>>> Jeremy> features like dedup or compression. >>>> >>>> Exactly the two reasons why I'm experimenting with FreeBSD on AWS. >>>> >>>> Please tell me more. >>> dedup has immense and crazy memory requirements; the commonly referenced >>> model (which is in no way precise, it's just a general recommendation) >>> is that for every 1TB of data you need 1GB of RAM just for the DDT >>> (deduplication table) -- understand that ZFS's ARC also eats lots of >>> memory, so when I say 1GB of RAM, I'm talking about that being *purely >>> dedicated* to DDT. >> Correction: 1 GB of *ARC* space per TB of *unique* data in the pool. Each >> unique block in the pool gets an entry in the DDT. >> >> You can use L2ARC to store the DDT, although it takes ARC space to track >> data in L2ARC, so you can't go crazy (512 GB L2 with only 16 GB ARC is a >> no-no).
>> >> However, you do need a lot of RAM to make dedupe work, and your I/O does >> drop through the floor. > Thanks Freddie -- I didn't know this (re: ARC space per TB of unique > data); wasn't aware that's where the DDT got placed. (Actually makes > sense now that I think about it...) > The really bad thing about this is that the DDT actually competes with everything else in ARC. You don't want to arrive at the point where you trash the ARC with DDT... ZFS with dedup is really "handy" for a non-interactive storage box, such as an archive server. Mine get over 10x dedup ratio and that means I fit the data in 24 disks instead of 240 disks... Extra RAM and L2ARC is well worth the cost and the drop in performance. If you need higher performance from the storage subsystem though, ignore both dedup and compression, even if they become bug-free some day. Which brings us back to AWS. I believe AWS will charge for CPU time too, which you will happily waste with both dedup and compression. Yet another reason to avoid it, unless block storage (unlikely) is more expensive.
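[Editor's note: the sizing rules of thumb quoted above can be turned into quick arithmetic. The ~320 bytes of ARC per DDT entry used below is a commonly cited community approximation, not a figure from this thread; the real per-entry cost varies by platform and ZFS version.]

```python
# Back-of-envelope estimate of ARC space consumed by the ZFS dedup
# table (DDT). The 320-byte per-entry figure is a rough community
# approximation, not an authoritative number.
DDT_ENTRY_BYTES = 320

def ddt_arc_bytes(unique_data_bytes, avg_block_bytes=128 * 1024):
    """Estimate DDT size in ARC for a given amount of unique
    (post-dedup) data, assuming an average block size."""
    unique_blocks = unique_data_bytes // avg_block_bytes
    return unique_blocks * DDT_ENTRY_BYTES

TIB, GIB = 1 << 40, 1 << 30

# 1 TiB of unique data at the default 128 KiB recordsize:
# 8388608 blocks * 320 bytes = 2.5 GiB of ARC for the DDT alone.
print(ddt_arc_bytes(1 * TIB) / GIB)
```

Smaller average block sizes multiply the entry count, which is why the "1 GB per TB" rule of thumb is only a rough guide and real pools can need considerably more.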
Daniel From owner-freebsd-fs@FreeBSD.ORG Thu Jul 4 08:06:47 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id A4920B51 for ; Thu, 4 Jul 2013 08:06:47 +0000 (UTC) (envelope-from prvs=1897fed9ea=killing@multiplay.co.uk) Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) by mx1.freebsd.org (Postfix) with ESMTP id D30BC1604 for ; Thu, 4 Jul 2013 08:06:46 +0000 (UTC) Received: from r2d2 ([82.69.141.170]) by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (MDaemon PRO v10.0.4) with ESMTP id md50004718274.msg for ; Thu, 04 Jul 2013 09:06:44 +0100 X-Spam-Processed: mail1.multiplay.co.uk, Thu, 04 Jul 2013 09:06:44 +0100 (not processed: message from valid local sender) X-MDDKIM-Result: neutral (mail1.multiplay.co.uk) X-MDRemoteIP: 82.69.141.170 X-Return-Path: prvs=1897fed9ea=killing@multiplay.co.uk X-Envelope-From: killing@multiplay.co.uk X-MDaemon-Deliver-To: freebsd-fs@freebsd.org Message-ID: <36872A46E9BE40688B8F59FD05D4ECE9@multiplay.co.uk> From: "Steven Hartland" To: "Berend de Boer" , "Jeremy Chadwick" References: <6488DECC-2455-4E92-B432-C39490D18484@dragondata.com> <871u7g57rl.wl%berend@pobox.com> <87mwq34emp.wl%berend@pobox.com> <20130703200241.GB60515@in-addr.com> <87k3l748gb.wl%berend@pobox.com> <20130703233631.GA74698@icarus.home.lan> <87d2qz42q4.wl%berend@pobox.com> <20130704010815.GB75529@icarus.home.lan> <8761wr3xxk.wl%berend@pobox.com> Subject: Re: EBS snapshot backups from a FreeBSD zfs file system: zpool freeze? 
Date: Thu, 4 Jul 2013 09:06:57 +0100 MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=original Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.5931 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6157 Cc: freebsd-fs X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 04 Jul 2013 08:06:47 -0000 ----- Original Message ----- From: "Berend de Boer" Jeremy> Also, because nobody seems to warn others of this: if Jeremy> you go the ZFS route on FreeBSD, please do not use Jeremy> features like dedup or compression. While dedup is memory- and sometimes CPU-hungry, so the HW spec should be considered before using it, compression is not, and I've not seen any valid reason not to use it should it fit your use case. We actually use compression extensively here and we've had nothing but positive results from it, so this sounds like FUD to me. Regards Steve ================================================ This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmaster@multiplay.co.uk.
From owner-freebsd-fs@FreeBSD.ORG Thu Jul 4 08:13:12 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 1A442D65; Thu, 4 Jul 2013 08:13:12 +0000 (UTC) (envelope-from prvs=1897fed9ea=killing@multiplay.co.uk) Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) by mx1.freebsd.org (Postfix) with ESMTP id 8FC81164A; Thu, 4 Jul 2013 08:13:11 +0000 (UTC) Received: from r2d2 ([82.69.141.170]) by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (MDaemon PRO v10.0.4) with ESMTP id md50004718338.msg; Thu, 04 Jul 2013 09:13:09 +0100 X-Spam-Processed: mail1.multiplay.co.uk, Thu, 04 Jul 2013 09:13:09 +0100 (not processed: message from valid local sender) X-MDDKIM-Result: neutral (mail1.multiplay.co.uk) X-MDRemoteIP: 82.69.141.170 X-Return-Path: prvs=1897fed9ea=killing@multiplay.co.uk X-Envelope-From: killing@multiplay.co.uk Message-ID: <201D836AD9A843D089DCAF0A09A707F9@multiplay.co.uk> From: "Steven Hartland" To: References: <51C5B4EE.3010601@growveg.net> <81EDE2A7D67241DAB985C38662DA57DD@multiplay.co.uk> <51C5EAB0.7040009@growveg.net> <20130622191619.GA73246@icarus.home.lan> <12FB82C142E74924B27F4DBB642D0D1F@multiplay.co.uk> <1372904516.1427.4.camel@localhost> Subject: Re: Dell R710 with PERC H310. GPT issues and hardware raid or ZFS? 
Date: Thu, 4 Jul 2013 09:13:20 +0100 MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=original Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.5931 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6157 Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 04 Jul 2013 08:13:12 -0000 ----- Original Message ----- From: "Sean Bruno" > > Some mfi controllers actually have a JBOD option; if it does, you'll > > be able to use this controller. If not, watch out, as using the RAID 0 option > will almost certainly cause you pain. > > > > mps, as Jeremy has said, is a better option. > > I note that the H310 in a test R420 that I have does indeed have a JBOD > option. This presents a disk as /dev/mfisyspdX which is totally > befuddling. > > On our stable/9, we get crash and burns with kernel panics and data > corruption, but I haven't pursued it enough to be productive. I did > bump the machine to head and didn't see any difference. I've not used stable/9 myself, but there are known issues with older mfi FW which cause TIMEOUTs, which then result in panics on UFS. The fix is to upgrade your FW. In addition I've backed out the forced timeout on requests which was MFC'ed to stable/9 in r252554: http://svnweb.freebsd.org/base?view=revision&revision=252554 Regards Steve ================================================ This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it.
In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmaster@multiplay.co.uk. From owner-freebsd-fs@FreeBSD.ORG Thu Jul 4 08:22:24 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id EE0B128D for ; Thu, 4 Jul 2013 08:22:24 +0000 (UTC) (envelope-from jdc@koitsu.org) Received: from relay3-d.mail.gandi.net (relay3-d.mail.gandi.net [217.70.183.195]) by mx1.freebsd.org (Postfix) with ESMTP id A9B3516C5 for ; Thu, 4 Jul 2013 08:22:24 +0000 (UTC) Received: from mfilter16-d.gandi.net (mfilter16-d.gandi.net [217.70.178.144]) by relay3-d.mail.gandi.net (Postfix) with ESMTP id 0C031A80FB; Thu, 4 Jul 2013 10:22:13 +0200 (CEST) X-Virus-Scanned: Debian amavisd-new at mfilter16-d.gandi.net Received: from relay3-d.mail.gandi.net ([217.70.183.195]) by mfilter16-d.gandi.net (mfilter16-d.gandi.net [10.0.15.180]) (amavisd-new, port 10024) with ESMTP id J8xzFpILU4Yf; Thu, 4 Jul 2013 10:22:11 +0200 (CEST) X-Originating-IP: 76.102.14.35 Received: from jdc.koitsu.org (c-76-102-14-35.hsd1.ca.comcast.net [76.102.14.35]) (Authenticated sender: jdc@koitsu.org) by relay3-d.mail.gandi.net (Postfix) with ESMTPSA id 2B5C8A80E9; Thu, 4 Jul 2013 10:22:11 +0200 (CEST) Received: by icarus.home.lan (Postfix, from userid 1000) id 576CA73A1C; Thu, 4 Jul 2013 01:22:09 -0700 (PDT) Date: Thu, 4 Jul 2013 01:22:09 -0700 From: Jeremy Chadwick To: Steven Hartland Subject: Re: EBS snapshot backups from a FreeBSD zfs file system: zpool freeze? 
Message-ID: <20130704082209.GB83766@icarus.home.lan> References: <871u7g57rl.wl%berend@pobox.com> <87mwq34emp.wl%berend@pobox.com> <20130703200241.GB60515@in-addr.com> <87k3l748gb.wl%berend@pobox.com> <20130703233631.GA74698@icarus.home.lan> <87d2qz42q4.wl%berend@pobox.com> <20130704010815.GB75529@icarus.home.lan> <8761wr3xxk.wl%berend@pobox.com> <36872A46E9BE40688B8F59FD05D4ECE9@multiplay.co.uk> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <36872A46E9BE40688B8F59FD05D4ECE9@multiplay.co.uk> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-fs X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 04 Jul 2013 08:22:25 -0000 On Thu, Jul 04, 2013 at 09:06:57AM +0100, Steven Hartland wrote: > ----- Original Message ----- From: "Berend de Boer" > > Jeremy> Also, because nobody seems to warn others of this: if > Jeremy> you go the ZFS route on FreeBSD, please do not use > Jeremy> features like dedup or compression. > > While dedup is memory and sometimes cpu hungry, so HW spec > should be considered before using it, compression is not so > and I've not seen any valid reason not to use it should it > fit your uses. > > We actually use compression extensivily here and we've > had nothing but positive results from it so sounds like > FUD to me. The problem with the lack of separate and prioritised write threads for dedup and compression, thus causing interactivity stalls, is not FUD, it's fact. I explained this in the part of my reply to Berend which you omitted, which included the proof and acknowledgement from folks who are in-the-know (Bob Friesenhahn). :/ Nobody has told me "yeah that got fixed", so there is no reason for me to believe anything has changed. If a person considering use of compression on FreeBSD ZFS doesn't mind that problem, then by all means use it. 
It doesn't change the fact that there's an issue, and one that folks should be made aware of up front. It's not spreading FUD: it's spreading knowledge of a certain behaviour that differs between FreeBSD and Solaris/Illumos. The issue is a deal-breaker for me; if it's not for you, great. -- | Jeremy Chadwick jdc@koitsu.org | | UNIX Systems Administrator http://jdc.koitsu.org/ | | Making life hard for others since 1977. PGP 4BD6C0CB | From owner-freebsd-fs@FreeBSD.ORG Thu Jul 4 08:36:49 2013 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 8E0C55E6; Thu, 4 Jul 2013 08:36:49 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) by mx1.freebsd.org (Postfix) with ESMTP id 2C4F71756; Thu, 4 Jul 2013 08:36:48 +0000 (UTC) Received: from tom.home (kostik@localhost [127.0.0.1]) by kib.kiev.ua (8.14.7/8.14.7) with ESMTP id r648afbE089141; Thu, 4 Jul 2013 11:36:41 +0300 (EEST) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.8.3 kib.kiev.ua r648afbE089141 Received: (from kostik@localhost) by tom.home (8.14.7/8.14.7/Submit) id r648afRl089140; Thu, 4 Jul 2013 11:36:41 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Thu, 4 Jul 2013 11:36:41 +0300 From: Konstantin Belousov To: Steven Hartland Subject: Re: kern/180236: [zfs] [nullfs] Leakage free space using ZFS with nullfs on 9.1-STABLE Message-ID: <20130704083641.GK91021@kib.kiev.ua> References: <201307040000.r64001v6076818@freefall.freebsd.org> <20130704041539.GE91021@kib.kiev.ua> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="pSOzL3v+cS6lyd4U" Content-Disposition: inline In-Reply-To: <20130704041539.GE91021@kib.kiev.ua> User-Agent: Mutt/1.5.21 (2010-09-15) 
X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no version=3.3.2 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on tom.home Cc: freebsd-fs@FreeBSD.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 04 Jul 2013 08:36:49 -0000 --pSOzL3v+cS6lyd4U Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Thu, Jul 04, 2013 at 07:15:39AM +0300, Konstantin Belousov wrote: > On Thu, Jul 04, 2013 at 12:00:01AM +0000, Steven Hartland wrote: > > The following reply was made to PR kern/180236; it has been noted by GNATS. > > > > From: "Steven Hartland" > > To: , > > "Ivan Klymenko" > > Cc: > > Subject: Re: kern/180236: [zfs] [nullfs] Leakage free space using ZFS with nullfs on 9.1-STABLE > > Date: Thu, 4 Jul 2013 00:58:13 +0100 > > > > Looks like nullfs isn't cleaning up correctly in the case > > where a rename collides with an existing file and hence results > > in an implicit remove. > > > > This can be seen in the zdb output for the volume in that > > before the unmount all the plain file entries still exist, > > but after the unmount of nullfs they are gone. > > Can you demonstrate the scenario of the problem, e.g. using the basic > filesystem commands, like cp(1) and mv(1)? Does the issue reproduce > on UFS? Ok, the following patch fixed the nullfs leakage for me (I tested over UFS).
diff --git a/sys/fs/nullfs/null_vnops.c b/sys/fs/nullfs/null_vnops.c index 6ff15ee..70402e3 100644 --- a/sys/fs/nullfs/null_vnops.c +++ b/sys/fs/nullfs/null_vnops.c @@ -554,6 +554,7 @@ null_rename(struct vop_rename_args *ap) struct vnode *fvp = ap->a_fvp; struct vnode *fdvp = ap->a_fdvp; struct vnode *tvp = ap->a_tvp; + struct null_node *tnn; /* Check for cross-device rename. */ if ((fvp->v_mount != tdvp->v_mount) || @@ -568,7 +569,11 @@ null_rename(struct vop_rename_args *ap) vrele(fvp); return (EXDEV); } - + + if (tvp != NULL) { + tnn = VTONULL(tvp); + tnn->null_flags |= NULLV_DROP; + } return (null_bypass((struct vop_generic_args *)ap)); } --pSOzL3v+cS6lyd4U Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.20 (FreeBSD) iQIcBAEBAgAGBQJR1TQYAAoJEJDCuSvBvK1BLM0P/RH3l/YrW66st4j+KyukGcn7 6p57DZVhVPjPLNNlDBamlk94rwgSq5dQ4pIl1o7U+NVunjyopv4kobzfrgYC1cdW rOvYfoTC1VPchiETmoCtZeJgNzXGIuGkotR3Kh8ELeN83mJAeAiyCTmyt4S5Axly BIsjEbyghXruoq7NPZUA9Rqm4Rgn2YrF5PDwB0X12AY7u+qtuxALjc80TUyfeD9z 5P0iuHizjVGRmkCYJxHFYPHI/ggiZEZjC+oBQkC7gHuPeUpeT62mkrmL6UXokmdi ScV2ibI87byZu9nC7PEiD8eNsj9tYCmdcuVs9bDZOnrXVvS5lQmDFVmO97WrMZdo FLtfKwapN2lR0lguVuta785wrSkSzyMN+vLFjacJTgZc/vHEHHhPSFrrY/02ZZGJ jOj5DjbWGjobNzkJSpCg6ZDKr13F6TaW9XkTpCI9+w+wmLw19vv0KS5ULGAmBLi5 U9to+opTC0c7fqOYsD7E87ef9P5qm6v8ON/+OT8Q746mlSKDxO5gmflHPo107QHk IsLUEEVIuFksbHP5u1Iob4Uj6OxRgC39eRaw0ePbYMYbE+S5iZpTAUgiqkhNNVLr wQnjWEnzriyehRcW8xGbkd8ju3xq24w76t6bfPN5xTKXAQam5Dzg8KJTOWg97By8 NPMeC4sH3/FlB7GmxqCD =jkkk -----END PGP SIGNATURE----- --pSOzL3v+cS6lyd4U-- From owner-freebsd-fs@FreeBSD.ORG Thu Jul 4 08:47:23 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 49DB08C7 for ; Thu, 4 Jul 2013 08:47:23 +0000 (UTC) (envelope-from prvs=1897fed9ea=killing@multiplay.co.uk) Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) by
mx1.freebsd.org (Postfix) with ESMTP id CE98017D2 for ; Thu, 4 Jul 2013 08:47:22 +0000 (UTC) Received: from r2d2 ([82.69.141.170]) by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (MDaemon PRO v10.0.4) with ESMTP id md50004718670.msg for ; Thu, 04 Jul 2013 09:47:16 +0100 X-Spam-Processed: mail1.multiplay.co.uk, Thu, 04 Jul 2013 09:47:16 +0100 (not processed: message from valid local sender) X-MDDKIM-Result: neutral (mail1.multiplay.co.uk) X-MDRemoteIP: 82.69.141.170 X-Return-Path: prvs=1897fed9ea=killing@multiplay.co.uk X-Envelope-From: killing@multiplay.co.uk X-MDaemon-Deliver-To: freebsd-fs@freebsd.org Message-ID: <98367DD8A8E34B75906FD46655AD74CF@multiplay.co.uk> From: "Steven Hartland" To: "Jeremy Chadwick" References: <871u7g57rl.wl%berend@pobox.com> <87mwq34emp.wl%berend@pobox.com> <20130703200241.GB60515@in-addr.com> <87k3l748gb.wl%berend@pobox.com> <20130703233631.GA74698@icarus.home.lan> <87d2qz42q4.wl%berend@pobox.com> <20130704010815.GB75529@icarus.home.lan> <8761wr3xxk.wl%berend@pobox.com> <36872A46E9BE40688B8F59FD05D4ECE9@multiplay.co.uk> <20130704082209.GB83766@icarus.home.lan> Subject: Re: EBS snapshot backups from a FreeBSD zfs file system: zpool freeze? Date: Thu, 4 Jul 2013 09:47:26 +0100 MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=original Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.5931 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6157 Cc: freebsd-fs X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 04 Jul 2013 08:47:23 -0000 ----- Original Message ----- From: "Jeremy Chadwick" To: "Steven Hartland" Cc: "Berend de Boer" ; "freebsd-fs" Sent: Thursday, July 04, 2013 9:22 AM Subject: Re: EBS snapshot backups from a FreeBSD zfs file system: zpool freeze? 
> On Thu, Jul 04, 2013 at 09:06:57AM +0100, Steven Hartland wrote: >> ----- Original Message ----- From: "Berend de Boer" >> >> Jeremy> Also, because nobody seems to warn others of this: if >> Jeremy> you go the ZFS route on FreeBSD, please do not use >> Jeremy> features like dedup or compression. >> >> While dedup is memory and sometimes cpu hungry, so HW spec >> should be considered before using it, compression is not so >> and I've not seen any valid reason not to use it should it >> fit your uses. >> >> We actually use compression extensively here and we've >> had nothing but positive results from it so sounds like >> FUD to me. > > The problem with the lack of separate and prioritised write threads for > dedup and compression, thus causing interactivity stalls, is not FUD, > it's fact. I explained this in the part of my reply to Berend which you > omitted, which included the proof and acknowledgement from folks who > are in-the-know (Bob Friesenhahn). :/ Nobody has told me "yeah that > got fixed", so there is no reason for me to believe anything has > changed. Do you have any links to the discussion on this, Jeremy, as I'd be interested to read up on this when I have some spare time? > If a person considering use of compression on FreeBSD ZFS doesn't mind > that problem, then by all means use it. It doesn't change the fact that > there's an issue, and one that folks should be made aware of up front. > It's not spreading FUD: it's spreading knowledge of a certain behaviour > that differs between FreeBSD and Solaris/Illumos. The issue is a > deal-breaker for me; if it's not for you, great. Sounds like it could well be use-case based then, as we've not had any problems with compression causing interactivity problems. Quite the opposite in fact: the reduced physical IO that compression brings has improved interactivity.
So I guess it's like everything: one size doesn't fit all, so tempering statements about blanket avoiding these features seems like the way to go :) Regards Steve ================================================ This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmaster@multiplay.co.uk. From owner-freebsd-fs@FreeBSD.ORG Thu Jul 4 09:44:47 2013 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id C9A11AA3 for ; Thu, 4 Jul 2013 09:44:47 +0000 (UTC) (envelope-from prvs=1897fed9ea=killing@multiplay.co.uk) Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) by mx1.freebsd.org (Postfix) with ESMTP id 6F6DD1B20 for ; Thu, 4 Jul 2013 09:44:47 +0000 (UTC) Received: from r2d2 ([82.69.141.170]) by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (MDaemon PRO v10.0.4) with ESMTP id md50004719264.msg for ; Thu, 04 Jul 2013 10:44:44 +0100 X-Spam-Processed: mail1.multiplay.co.uk, Thu, 04 Jul 2013 10:44:44 +0100 (not processed: message from valid local sender) X-MDDKIM-Result: neutral (mail1.multiplay.co.uk) X-MDRemoteIP: 82.69.141.170 X-Return-Path: prvs=1897fed9ea=killing@multiplay.co.uk X-Envelope-From: killing@multiplay.co.uk X-MDaemon-Deliver-To: freebsd-fs@FreeBSD.org Message-ID: <0F8371633F6B496B87285D660FAF205B@multiplay.co.uk> From: "Steven Hartland" To: "Konstantin Belousov" References: <201307040000.r64001v6076818@freefall.freebsd.org> <20130704041539.GE91021@kib.kiev.ua> <20130704083641.GK91021@kib.kiev.ua> Subject: Re: kern/180236: [zfs] [nullfs] Leakage free space using
ZFS with nullfs on 9.1-STABLE Date: Thu, 4 Jul 2013 10:44:54 +0100 MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=original Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.5931 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6157 Cc: freebsd-fs@FreeBSD.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 04 Jul 2013 09:44:47 -0000 ----- Original Message ----- From: "Konstantin Belousov" On Thu, Jul 04, 2013 at 07:15:39AM +0300, Konstantin Belousov wrote: > > On Thu, Jul 04, 2013 at 12:00:01AM +0000, Steven Hartland wrote: > > > The following reply was made to PR kern/180236; it has been noted by GNATS. > > > > > > From: "Steven Hartland" > > > To: , > > > "Ivan Klymenko" > > > Cc: > > > Subject: Re: kern/180236: [zfs] [nullfs] Leakage free space using ZFS with nullfs on 9.1-STABLE > > > Date: Thu, 4 Jul 2013 00:58:13 +0100 > > > > > > Looks like nullfs isn't cleaning up correctly in the case > > > where a rename colides with an existing file hence results > > > in an implicit remove. > > > > > > This can be seen in the zdb output for the volume in that > > > before the unmount all the plain file entries still exist > > > but after the unmount of nullfs they are gone. > > > > Can you demonstrate the scenario of the problem, e.g. using the basic > > filesystem commands, like cp(1), mv(1) ? Does the issue reproduce > > on UFS ? > > > Ok, the following patch fixed the nullfs leakage for me (I tested over > the UFS). 
> > diff --git a/sys/fs/nullfs/null_vnops.c b/sys/fs/nullfs/null_vnops.c > index 6ff15ee..70402e3 100644 > --- a/sys/fs/nullfs/null_vnops.c > +++ b/sys/fs/nullfs/null_vnops.c > @@ -554,6 +554,7 @@ null_rename(struct vop_rename_args *ap) > struct vnode *fvp = ap->a_fvp; > struct vnode *fdvp = ap->a_fdvp; > struct vnode *tvp = ap->a_tvp; > + struct null_node *tnn; > > /* Check for cross-device rename. */ > if ((fvp->v_mount != tdvp->v_mount) || > @@ -568,7 +569,11 @@ null_rename(struct vop_rename_args *ap) > vrele(fvp); > return (EXDEV); > } > - > + > + if (tvp != NULL) { > + tnn = VTONULL(tvp); > + tnn->null_flags |= NULLV_DROP; > + } > return (null_bypass((struct vop_generic_args *)ap)); > } Confirmed this fixes the issue. I believe there's still a ZFS leak being triggered in the delete queue before this nullfs fix, which will result in space not being freed, but with the patch no further leaks occur. Regards Steve ================================================ This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmaster@multiplay.co.uk.
From owner-freebsd-fs@FreeBSD.ORG Thu Jul 4 09:48:44 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id DEF11C25 for ; Thu, 4 Jul 2013 09:48:44 +0000 (UTC) (envelope-from berend@pobox.com) Received: from smtp.pobox.com (b-pb-sasl-quonix.pobox.com [208.72.237.35]) by mx1.freebsd.org (Postfix) with ESMTP id 9D5201B7D for ; Thu, 4 Jul 2013 09:48:44 +0000 (UTC) Received: from smtp.pobox.com (unknown [127.0.0.1]) by b-sasl-quonix.pobox.com (Postfix) with ESMTP id 5F23E2B831; Thu, 4 Jul 2013 09:48:41 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=pobox.com; h=date :message-id:from:to:cc:subject:in-reply-to:references :mime-version:content-type:content-transfer-encoding; s=sasl; bh=OHjhQK/GlXWVUoj1Y+gu8w0VaK4=; b=vptD2kUb8P3AXzlw5+qLo5EYHnOt EwU6/k30nNeD00qDf5AV/LSKvSxW6p6+XvAlysl8BsbwXYDTXPsRd3jQ51DtwkS7 1IDtBig/gQFT+3G3I/PhV9PesruAUMQDrY6SckQ1xbyy3Z09jJ4bVhFzrUs7X9cW vDM7utWcgtBKbK0= DomainKey-Signature: a=rsa-sha1; c=nofws; d=pobox.com; h=date:message-id :from:to:cc:subject:in-reply-to:references:mime-version :content-type:content-transfer-encoding; q=dns; s=sasl; b=yB2fEb DQLtjG4DCFq87I8mNt8s7bcr4/gFC5d3NLs5CBDrQiwpFK0ZB+trzIq9GaMD9seI Ny7oveox2Z5MaggssyaoA4HZYsQHbqCuBfUhhXk9TGGAHUv4NPH1Pw5GAkvi4m/O G5Lp5tx6TbwOMAjfFQ50V7JAuXP85S1tesqQo= Received: from b-pb-sasl-quonix.pobox.com (unknown [127.0.0.1]) by b-sasl-quonix.pobox.com (Postfix) with ESMTP id 5657F2B830; Thu, 4 Jul 2013 09:48:41 +0000 (UTC) Received: from bmach.nederware.nl (unknown [27.252.169.66]) by b-sasl-quonix.pobox.com (Postfix) with ESMTPA id C397D2B82E; Thu, 4 Jul 2013 09:48:40 +0000 (UTC) Received: from quadrio.nederware.nl (quadrio.nederware.nl [192.168.33.13]) by bmach.nederware.nl (Postfix) with ESMTP id E49B85C84; Thu, 4 Jul 2013 21:48:33 +1200 (NZST) Received: from quadrio.nederware.nl (quadrio.nederware.nl [127.0.0.1]) by quadrio.nederware.nl (Postfix) with ESMTP id 
7EB9F49FB971; Thu, 4 Jul 2013 21:48:38 +1200 (NZST) Date: Thu, 04 Jul 2013 21:48:38 +1200 Message-ID: <87vc4q3bbd.wl%berend@pobox.com> From: Berend de Boer To: Markus Gebert Subject: Re: EBS snapshot backups from a FreeBSD zfs file system: zpool freeze? In-Reply-To: <87zju43sxl.wl%berend@pobox.com> References: <87li5o5tz2.wl%berend@pobox.com> <87ehbg5raq.wl%berend@pobox.com> <20130703055047.GA54853@icarus.home.lan> <6488DECC-2455-4E92-B432-C39490D18484@dragondata.com> <14A2336A-969C-4A13-9EFA-C0C42A12039F@hostpoint.ch> <87zju43sxl.wl%berend@pobox.com> User-Agent: Wanderlust/2.15.9 (Almost Unreal) SEMI-EPG/1.14.7 (Harue) FLIM/1.14.9 (=?UTF-8?B?R29qxY0=?=) APEL/10.8 EasyPG/1.0.0 Emacs/24.3 (i686-pc-linux-gnu) MULE/6.0 (HANACHIRUSATO) Organization: Xplain Technology Ltd MIME-Version: 1.0 (generated by SEMI-EPG 1.14.7 - "Harue") Content-Type: multipart/signed; boundary="pgp-sign-Multipart_Thu_Jul__4_21:48:37_2013-1"; micalg=pgp-sha256; protocol="application/pgp-signature" Content-Transfer-Encoding: 7bit X-Pobox-Relay-ID: E8489E60-E48E-11E2-970F-E84251E3A03C-48001098!b-pb-sasl-quonix.pobox.com Cc: freebsd-fs X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 04 Jul 2013 09:48:44 -0000 --pgp-sign-Multipart_Thu_Jul__4_21:48:37_2013-1 Content-Type: text/plain; charset=US-ASCII >>>>> "Berend" == Berend de Boer writes: Berend> Do you think (yes, I will definitely test this), that ZFS Berend> can mount a file system consisting of a couple of disk Berend> (raidz2 setup), and access it even though every disk might Berend> be a backup taken at a slighty different time? Answering my own question: raidz2, four disks, 128GB disks each, no writing happening can be backed up with an EBS snapshot, then turned into volumes (disks) again and mounted on another FreeBSD without apparent ill effects. 
The snapshots of the four disks were about 2 seconds apart each. zfs took perhaps 30 seconds to start on the second server. No snapshot was available (only had done one of a file system, no zfs snapshot -r was ever done). Didn't do any rollback. Next step is to do take the snapshot while writing stuff is going on. Then what you suggested: taking a snapshot first, and rollback after mount. -- All the best, Berend de Boer ------------------------------------------------------ Awesome Drupal hosting: https://www.xplainhosting.com/ --pgp-sign-Multipart_Thu_Jul__4_21:48:37_2013-1 Content-Type: application/pgp-signature Content-Transfer-Encoding: 7bit Content-Description: OpenPGP Digital Signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.17 (GNU/Linux) iQIcBAABCAAGBQJR1UT2AAoJEKOfeD48G3g5Uk4P/RBxKPCF9RfFvAFgacuDRQki RFdv5wuAPGVcKEAqmToypqm/+KLoUAzg8qSfs/xIlE360mO8NAmMScABaVZz/+lr Uhf0DNmFIQPvSEfL6+NP6ZHsR5a/Ik2bODvrd9Lsym+u0BZVRDW/ko4hwz7f++SU OiHeCTLck6s2Xi6QHq8PORVD60tZ126baUKgCJVOvXsNDSr5wvGOpeMQcw/jvfVi ENugU96G7VNsPFx1k0U6umYBS0s1xKjgOVyE3Fhkf26NpdydeXDcfE6P49+ZXdsJ nRGqJ6vMfkUbnPwed+YMIQXLRTjKRbd6cLsfkJ4WJxFjflW7Chisj97fuyxAVhNg A8xhs595C/2DzgBmv3pMaWH+nbcG3nUJrCWHgRShcVDg3j815JNj38OyJ+TdEQ7a DeCaIgfo6V1xloaLDGTrHdB/vUsTtfWn0FacwhcykuoQse9PYJQMsQgL8f00xiCs 0T6nF5pachbehf63HtJln6qdujp0TPe1TV9G00jmdE2KyqAauaB3dWSXHgckaodi o6l2J+hwADAyQUBPNtt+tVLeaybRKD7+p6sWrQx5kw+wFA/9u6eSkfpd8WsvQGK1 47Hr+qyU/9rxe7mPF/TzcFkhi5u0hFv3xA6/xTyXsOweU/RD9EZzUeiZIKibohNL FcJCvI8fP8Ft1o7EOGUg =/hg5 -----END PGP SIGNATURE----- --pgp-sign-Multipart_Thu_Jul__4_21:48:37_2013-1-- From owner-freebsd-fs@FreeBSD.ORG Thu Jul 4 10:32:45 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id B783B432 for ; Thu, 4 Jul 2013 10:32:45 +0000 (UTC) (envelope-from jdc@koitsu.org) Received: from relay4-d.mail.gandi.net (relay4-d.mail.gandi.net [217.70.183.196]) by mx1.freebsd.org (Postfix) with 
ESMTP id 1918B1D75 for ; Thu, 4 Jul 2013 10:32:44 +0000 (UTC) Received: from mfilter27-d.gandi.net (mfilter27-d.gandi.net [217.70.178.155]) by relay4-d.mail.gandi.net (Postfix) with ESMTP id 91815172071; Thu, 4 Jul 2013 12:32:33 +0200 (CEST) X-Virus-Scanned: Debian amavisd-new at mfilter27-d.gandi.net Received: from relay4-d.mail.gandi.net ([217.70.183.196]) by mfilter27-d.gandi.net (mfilter27-d.gandi.net [10.0.15.180]) (amavisd-new, port 10024) with ESMTP id WHpHuDlhGrMN; Thu, 4 Jul 2013 12:32:31 +0200 (CEST) X-Originating-IP: 76.102.14.35 Received: from jdc.koitsu.org (c-76-102-14-35.hsd1.ca.comcast.net [76.102.14.35]) (Authenticated sender: jdc@koitsu.org) by relay4-d.mail.gandi.net (Postfix) with ESMTPSA id 7D9E01720D6; Thu, 4 Jul 2013 12:32:29 +0200 (CEST) Received: by icarus.home.lan (Postfix, from userid 1000) id 5AFB873A1C; Thu, 4 Jul 2013 03:32:27 -0700 (PDT) Date: Thu, 4 Jul 2013 03:32:27 -0700 From: Jeremy Chadwick To: Steven Hartland Subject: Re: EBS snapshot backups from a FreeBSD zfs file system: zpool freeze? 
Message-ID: <20130704103227.GA84901@icarus.home.lan> References: <87mwq34emp.wl%berend@pobox.com> <20130703200241.GB60515@in-addr.com> <87k3l748gb.wl%berend@pobox.com> <20130703233631.GA74698@icarus.home.lan> <87d2qz42q4.wl%berend@pobox.com> <20130704010815.GB75529@icarus.home.lan> <8761wr3xxk.wl%berend@pobox.com> <36872A46E9BE40688B8F59FD05D4ECE9@multiplay.co.uk> <20130704082209.GB83766@icarus.home.lan> <98367DD8A8E34B75906FD46655AD74CF@multiplay.co.uk> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <98367DD8A8E34B75906FD46655AD74CF@multiplay.co.uk> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-fs X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 04 Jul 2013 10:32:45 -0000 On Thu, Jul 04, 2013 at 09:47:26AM +0100, Steven Hartland wrote: > > ----- Original Message ----- From: "Jeremy Chadwick" > > To: "Steven Hartland" > Cc: "Berend de Boer" ; "freebsd-fs" > Sent: Thursday, July 04, 2013 9:22 AM > Subject: Re: EBS snapshot backups from a FreeBSD zfs file system: zpool freeze? > > > >On Thu, Jul 04, 2013 at 09:06:57AM +0100, Steven Hartland wrote: > >>----- Original Message ----- From: "Berend de Boer" > >> > >>Jeremy> Also, because nobody seems to warn others of this: if > >>Jeremy> you go the ZFS route on FreeBSD, please do not use > >>Jeremy> features like dedup or compression. > >> > >>While dedup is memory and sometimes CPU hungry, so HW spec > >>should be considered before using it, compression is not, > >>and I've not seen any valid reason not to use it should it > >>fit your uses. > >> > >>We actually use compression extensively here and we've > >>had nothing but positive results from it, so it sounds like > >>FUD to me. 
> > > >The problem with the lack of separate and prioritised write threads for > >dedup and compression, thus causing interactivity stalls, is not FUD, > >it's fact. I explained this in the part of my reply to Berend which you > >omitted, which included the proof and acknowledgement from folks who > >are in-the-know (Bob Friesenhahn). :/ Nobody has told me "yeah that > >got fixed", so there is no reason for me to believe anything has > >changed. > > Do you have any links to the discussion on this, Jeremy, as I'd be interested > to read up on it when I have some spare time? Warning up front: sorry for the long mail (I did try to keep it terse) but most of it is demonstrating the problem. Useful FreeBSD links, specifically the conversations I've had over the years about this problem, at least the most useful ones. The first one is probably the most relevant, since it's a statement from Bob himself explaining it: http://lists.freebsd.org/pipermail/freebsd-fs/2011-October/012726.html http://lists.freebsd.org/pipermail/freebsd-fs/2011-October/012752.html http://lists.freebsd.org/pipermail/freebsd-stable/2013-February/072171.html http://lists.freebsd.org/pipermail/freebsd-stable/2013-February/072178.html To be clear (note the date and version): as of September 2011 I was able to reproduce the problem on stable/8. While you were writing your mail, I was off actually trying to find out technical details (specifically the source code changes in OpenSolaris or later) which fixed it / what Bob alluded to. I really had to jab at search engines to find anything useful, and wasn't getting anywhere until I found this: http://comments.gmane.org/gmane.os.solaris.opensolaris.zfs/28192 This mentioned the OpenSolaris bug number 6586537. I then poked about svnweb and found that this fix was imported into FreeBSD with the "ZFS version 15" import. 
Commit log entry: 6586537 async zio taskqs can block out userland commands (142901-09) Relevant revisions, dates, and branches for this: r209962: Jul 2010: head: http://svnweb.freebsd.org/base?view=revision&revision=209962 r212668: Sep 2010: stable/8: http://svnweb.freebsd.org/base?view=revision&revision=212668 And that head became stable/9 as of September 2011, I believe. So my testing as of September 2011 would have included the fix for 6586537. This makes me wonder if 6586537 is truly the issue I've been describing or not. It's easy enough to test for on stable/9 today (zfs create, zfs set compression=on, do the dd and in another window do stuff and see what happens, then later zfs destroy). So let's see if it's still there almost 2 years later... Yup, still there, but it seems improved in some way, possibly due to a combination of things. This box is actually a C2Q (more powerful than the one in Sep 2011) too, and is actively doing nothing. Relevant bits: # zpool list NAME SIZE ALLOC FREE CAP DEDUP HEALTH ALTROOT backups 1.81T 463G 1.36T 24% 1.00x ONLINE - data 2.72T 694G 2.04T 24% 1.00x ONLINE - # zdb -C | grep ashift ashift: 12 ashift: 12 # zfs create -o compression=lzjb -o mountpoint=/mnt backups/comptest # zfs get all backups/comptest | grep compression backups/comptest compression lzjb local The "backups" pool is a single disk (WD20EFRX) running at SATA300 with NCQ, backed by an Intel ICH9 in AHCI mode. The disk is a 4K sector drive where the gnop trick was used (proof above). I could have used the "data" pool (raidz1 driven by 3 disks (WD10EFRX) + gnop), but it wouldn't matter -- the problem is consistent no matter what the pool. I can't demonstrate the problem using "while : ; do date ; sleep 1 ; done" because sleep 1 isn't granular enough (yes I'm aware FreeBSD sleep(1) supports more granularity) and because date/strftime doesn't show microseconds. So off into perl + Time::HiRes we go... 
window1# date +%s ; dd if=/dev/zero of=/mnt/bigfile bs=64k 1372932367 ^C123977+0 records in 123977+0 records out 8124956672 bytes transferred in 16.437748 secs (494286486 bytes/sec) window2# perl -e 'use Time::HiRes qw(time sleep); $|=1; while(1) { print time, "\n"; sleep(0.2); }' Now because even 0.2 seconds probably isn't granular enough, I ended up pressing Enter in the middle of the running perl output every time I'd notice that lines weren't coming across at consistent 0.2 second intervals (I guess I have a good eye for this sort of thing). So blank lines are me noticing the pauses/delays I've been talking about: 1372932411.90407 1372932412.10415 1372932412.30513 1372932412.50614 1372932412.70713 1372932412.90813 1372932413.10913 1372932413.31013 1372932413.51112 1372932413.71213 1372932413.91315 1372932414.11413 1372932414.31513 1372932414.51615 1372932414.71714 1372932415.00015 1372932415.27278 1372932415.47316 1372932415.67416 1372932415.87514 1372932416.07615 1372932416.27715 1372932416.48115 1372932416.78215 1372932416.98614 1372932417.18717 1372932417.38814 1372932417.58912 1372932417.79016 1372932417.99115 1372932418.40577 1372932418.60617 1372932418.80715 1372932419.00813 1372932419.20913 1372932419.41013 1372932419.64116 1372932419.85516 1372932420.11614 1372932420.31716 1372932420.51813 1372932420.71913 1372932420.92016 1372932421.12115 1372932421.32216 1372932421.58213 1372932421.78316 1372932421.98416 1372932422.18515 1372932422.38613 1372932422.58713 1372932422.80118 1372932423.05617 1372932423.34016 1372932423.54116 1372932423.74215 1372932423.94314 1372932424.14415 1372932424.43316 1372932424.63417 1372932424.85514 1372932425.05613 1372932425.25715 1372932425.45813 1372932425.65913 1372932425.86017 1372932426.18416 1372932426.51216 1372932426.71312 1372932426.91413 1372932427.11515 1372932427.31613 1372932427.74915 1372932428.00214 1372932428.20315 1372932428.40415 1372932428.60514 1372932428.80613 1372932429.00713 1372932429.38115 
1372932429.58214 1372932429.78316 1372932429.98417 1372932430.18519 1372932430.38614 1372932430.58713 1372932430.92817 1372932431.12914 1372932431.33012 1372932431.53115 1372932431.73214 1372932431.93313 1372932432.13413 1372932432.48115 1372932432.73414 1372932432.93514 1372932433.13616 1372932433.33713 1372932433.53817 1372932433.73915 1372932433.95151 1372932434.28214 1372932434.48316 1372932434.68414 1372932434.88515 1372932435.08614 1372932435.28712 1372932435.48916 1372932435.84146 1372932436.05013 1372932436.25117 ^C There's a quite consistent pattern if you look closely: about every 8 lines of output. Each line = every 0.2 seconds, so about every 1.5 seconds is where I'd see a pause which would last for about 0.5 seconds. And no, the above output *was not* being written to a file on ZFS, only to stdout. :-) What's interesting: I tried compression=gzip-9, which historically was worse (I remember this clearly), but the stalls are about the same. Maybe it's because I'm using /dev/zero rather than /dev/random, but the issue there is that /dev/random would tax the CPU (entropy, etc.) more. We didn't use compression at my previous job on Solaris (available CPU time was very, very important given what the machines did), so I don't have any context for comparison. But: I can do this exact same procedure on the /backups filesystem/pool, without compression of course, and there are no stalls -- just smooth interactivity. Now let me circle back to the convo I had with Fabian in 2013... I have zero experience doing this "sched trace" stuff. I do not speak Python, but looking at /usr/src/tools/sched/schedgraph.py almost implies it has some kind of "visual graphing" (via X? I have no clue from the code) and "borders" and "colour" support -- this is not an X system, so unless this Python script generates image files somehow (I have no image libraries installed on my system)... 
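Incidentally, the manual Enter-pressing above could be replaced with a small filter over the captured timestamps. A sketch (the 0.3-second threshold is an arbitrary choice, picked to sit above the 0.2-second sample interval; `timestamps.txt` is a hypothetical capture file):

```shell
# Print every inter-sample gap larger than 0.3 seconds, together with the
# timestamp at which the stall ended. Input: one epoch timestamp per line,
# e.g. the output of the perl Time::HiRes loop redirected to a file.
awk '{
    if (NR > 1 && $1 - prev > 0.3)
        printf "pause %.3fs before %s\n", $1 - prev, $1
    prev = $1
}' timestamps.txt
```

Piping the perl sampler straight into the awk filter works too, since both flush per line, so stalls can be flagged as they happen.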
My kernel does contain:

options KTR
options KTR_ENTRIES=262144
options KTR_COMPILE=(KTR_SCHED)
options KTR_MASK=(KTR_SCHED)

And I can follow the instructions at the top of the Python script and provide the ktrdump somewhere if needed, but that's about it. I don't know if that would help or be beneficial in any way -- because even though I have some familiarity with userland profiling via *_p.a libs, this is something at a completely different level. So if someone wants this, I need a bit of hand-holding to know what all I'm supposed to be doing. The instructions in the Python script make me a little wary, particularly since it doesn't say to re-set debug.ktr.mask to 536870912 afterward, so I'm not sure what the implications are. > >If a person considering use of compression on FreeBSD ZFS doesn't mind > >that problem, then by all means use it. It doesn't change the fact that > >there's an issue, and one that folks should be made aware of up front. > >It's not spreading FUD: it's spreading knowledge of a certain behaviour > >that differs between FreeBSD and Solaris/Illumos. The issue is a > >deal-breaker for me; if it's not for you, great. > > Sounds like it could well be use-case based then, as we've not had any > problems with compression causing interactivity issues. Quite the opposite, > in fact: the reduced physical IO that compression results in has improved > interactivity. > > So I guess it's like everything, one size doesn't fit all, so tempering > statements about blanket avoidance of these features seems like the > way to go :) While I see the logic in what you're saying, I prefer to publicly disclose the differences in behaviours between Illumos ZFS and FreeBSD ZFS. 
I'm well-aware of the tremendous and positive effort to minimise those differences (code-wise) -- I remember mm@ talking about this some time ago -- but if this is somehow one of them, I do not see the harm in telling people "FYI, there is this quirk/behavioural aspect specific to FreeBSD that you should be aware of". It doesn't mean ZFS on FreeBSD sucks, it doesn't mean it's broken, it just means it's something that would completely surprise someone out of the blue. Imagine the thread: "my system intermittently stalls, even at VGA console... does anyone know what's causing this?" -- I doubt anyone would think to check ZFS. -- | Jeremy Chadwick jdc@koitsu.org | | UNIX Systems Administrator http://jdc.koitsu.org/ | | Making life hard for others since 1977. PGP 4BD6C0CB | From owner-freebsd-fs@FreeBSD.ORG Thu Jul 4 10:42:32 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id CD9957A6 for ; Thu, 4 Jul 2013 10:42:32 +0000 (UTC) (envelope-from shuriku@shurik.kiev.ua) Received: from graal.it-profi.org.ua (graal.shurik.kiev.ua [193.239.74.7]) by mx1.freebsd.org (Postfix) with ESMTP id 8A86E1DCE for ; Thu, 4 Jul 2013 10:42:32 +0000 (UTC) Received: from [217.76.201.82] (helo=thinkpad.it-profi.org.ua) by graal.it-profi.org.ua with esmtpa (Exim 4.80.1 (FreeBSD)) (envelope-from ) id 1UugzY-000HOj-Ua for freebsd-fs@freebsd.org; Thu, 04 Jul 2013 13:42:25 +0300 Message-ID: <51D5518B.9040005@shurik.kiev.ua> Date: Thu, 04 Jul 2013 13:42:19 +0300 From: Alexandr User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:17.0) Gecko/20130630 Thunderbird/17.0.7 MIME-Version: 1.0 To: freebsd-fs@freebsd.org References: <51D3ED4F.5030102@shurik.kiev.ua> <8afaecbe0f764963b57fac7743f483bc@DBXPR07MB064.eurprd07.prod.outlook.com> <51D42416.8080604@shurik.kiev.ua> <20130703200923.GA70533@icarus.home.lan> <51D496F2.6080000@delphij.net> <20130703213041.GA72100@icarus.home.lan> 
<51D499F9.1020201@delphij.net> In-Reply-To: <51D499F9.1020201@delphij.net> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-SA-Exim-Connect-IP: 217.76.201.82 X-SA-Exim-Mail-From: shuriku@shurik.kiev.ua X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on graal.it-profi.org.ua X-Spam-Level: X-Spam-Status: No, score=-1.0 required=5.0 tests=ALL_TRUSTED autolearn=unavailable version=3.3.2 Subject: Re: Whole disk ZFS or -a4k partition X-SA-Exim-Version: 4.2 X-SA-Exim-Scanned: Yes (on graal.it-profi.org.ua) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 04 Jul 2013 10:42:32 -0000 04.07.2013 00:39, Xin Li пишет: > On 07/03/13 14:30, Jeremy Chadwick wrote: > > On Wed, Jul 03, 2013 at 02:26:10PM -0700, Xin Li wrote: > >> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA512 > >> > >> On 07/03/13 13:20, Warren Block wrote: > >>> On Wed, 3 Jul 2013, Jeremy Chadwick wrote: > >>>> On Wed, Jul 03, 2013 at 04:16:06PM +0300, Alexandr wrote: > >>>>>> > >>>>> Thank you for your explain. I'll try it soon and post my > >>>>> results. One thing - I can't use gpt-disks because off my > >>>>> laptop's bios (Lenovo Thinkpad E530) cannot boot it, only > >>>>> mbr-style. > >>>> > >>>> You can use GPT with a BIOS that only supports MBR (in other > >>>> words, you do not need UEFI to boot from GPT). FreeBSD's > >>>> boot blocks are intelligent in this regard. > >>> > >>> Yes. However, the Thinkpad BIOS is not intelligent about GPT: > >>> > >>> http://forums.freebsd.org/showthread.php?t=26759&highlight=UEFI+GPT > >>> > >>> > >> > >>> > http://www.dec.sakura.ne.jp/~junchoon/machine/freebsd-e.html > >> > >> Not true. My Lenovo T530 boots fine with GPT after a BIOS > >> upgrade and choosing "legacy" in the EFI boot options. > > > The issue Warren listed off is specific to the Lenovo T420. 
It > > does not happen on the T430, nor later models (this is further > > confirmed by the last person posting in the thread). Models prior > > to the T420 may also have the same problem (speculative / I do not > > know for certain). > > > I interpreted what Warren was saying to mean "yes it works, but > > some models of laptops/hardware don't work with it". > > Ah OK, did Lenovo release new BIOS/EFI updates for the older models? > IIRC some of their consumer series also share this issue by the way. > I can confirm that my laptop with the latest BIOS (2.52 from 04/23/2013) can boot from GPT. A half-year ago it couldn't. From owner-freebsd-fs@FreeBSD.ORG Thu Jul 4 10:42:51 2013 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id EA70681A for ; Thu, 4 Jul 2013 10:42:51 +0000 (UTC) (envelope-from marck@rinet.ru) Received: from woozle.rinet.ru (woozle.rinet.ru [195.54.192.68]) by mx1.freebsd.org (Postfix) with ESMTP id 7DBE61DD5 for ; Thu, 4 Jul 2013 10:42:51 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by woozle.rinet.ru (8.14.5/8.14.5) with ESMTP id r64Agi0C041833 for ; Thu, 4 Jul 2013 14:42:44 +0400 (MSK) (envelope-from marck@rinet.ru) Date: Thu, 4 Jul 2013 14:42:44 +0400 (MSK) From: Dmitry Morozovsky To: freebsd-fs@FreeBSD.org Subject: boot from ZFS: which pool types use? 
Message-ID: User-Agent: Alpine 2.00 (BSF 1167 2008-08-23) X-NCC-RegID: ru.rinet X-OpenPGP-Key-ID: 6B691B03 MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.4.3 (woozle.rinet.ru [0.0.0.0]); Thu, 04 Jul 2013 14:42:44 +0400 (MSK) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 04 Jul 2013 10:42:52 -0000 Dear colleagues, I'm a bit stuck and possibly my google-fu sleeps somewhere, but I have inconsistent results on which pool types one can use to boot contemporary (read: stable/9) FreeBSD from. For example, I have many older servers with a UFS /bootdisk and ZFS-on-root. While this is usable, it does not seem to be very consistent. On the other hand, I have a couple of servers with a ZFS-only config which uses a complex raid10-like layout on gpart disks, and they boot flawlessly, at least until now. Lastly, I'm now in the process of setting up a new server, trying to do the same, configuring ZFS with 4 pairs of SAS disks, and now get Can't find /boot/zfsloader I suppose from line 619 of sys/boot/i386/zfsboot/zfsboot.c Configs are essentially the same; I double-checked gpart bootcode and zpool.cache (while I still have not found a guide on how to interpret its content; at least one of my ZFS servers successfully runs without it) Any hints? Are stripe-mirror configurations available for booting from (yes, I do remember that all disks, or at least enough of them for degraded use, should be exposed to the BIOS by controller firmware, and that is usually constrained to 6 or 8 disk devices)? Thanks! 
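(For reference, the usual stable/9-era recipe for a bootable stripe of mirrors is sketched below. Device names, partition sizes, and labels are illustrative, not taken from any config in this thread; the key points are that every disk that might be needed at boot carries the same bootcode, and that the pool's bootfs property is set.)

```shell
# Per-disk GPT layout: boot partition + swap + ZFS partition.
for d in da0 da1 da2 da3; do
    gpart create -s gpt $d
    gpart add -t freebsd-boot -s 512k $d
    gpart add -t freebsd-swap -s 4g -l swap-$d $d
    gpart add -t freebsd-zfs -l zfs-$d $d
    # gptzfsboot is the stage that locates and loads /boot/zfsloader
    gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 $d
done

# Two mirror vdevs striped together (raid10-style), bootable as one pool.
zpool create zroot \
    mirror gpt/zfs-da0 gpt/zfs-da1 \
    mirror gpt/zfs-da2 gpt/zfs-da3
zpool set bootfs=zroot zroot
```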
-- Sincerely, D.Marck [DM5020, MCK-RIPE, DM3-RIPN] [ FreeBSD committer: marck@FreeBSD.org ] ------------------------------------------------------------------------ *** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- marck@rinet.ru *** ------------------------------------------------------------------------ From owner-freebsd-fs@FreeBSD.ORG Thu Jul 4 11:00:18 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id F0A97A46 for ; Thu, 4 Jul 2013 11:00:18 +0000 (UTC) (envelope-from daniel@digsys.bg) Received: from smtp-sofia.digsys.bg (smtp-sofia.digsys.bg [193.68.21.123]) by mx1.freebsd.org (Postfix) with ESMTP id 5D5701E8B for ; Thu, 4 Jul 2013 11:00:17 +0000 (UTC) Received: from dcave.digsys.bg (dcave.digsys.bg [193.68.6.1]) (authenticated bits=0) by smtp-sofia.digsys.bg (8.14.6/8.14.6) with ESMTP id r64B0Gcq041466 (version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO) for ; Thu, 4 Jul 2013 14:00:16 +0300 (EEST) (envelope-from daniel@digsys.bg) Message-ID: <51D555C0.9080403@digsys.bg> Date: Thu, 04 Jul 2013 14:00:16 +0300 From: Daniel Kalchev User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:17.0) Gecko/20130627 Thunderbird/17.0.7 MIME-Version: 1.0 To: freebsd-fs@freebsd.org Subject: Re: EBS snapshot backups from a FreeBSD zfs file system: zpool freeze? 
References: <87li5o5tz2.wl%berend@pobox.com> <87ehbg5raq.wl%berend@pobox.com> <20130703055047.GA54853@icarus.home.lan> <6488DECC-2455-4E92-B432-C39490D18484@dragondata.com> <14A2336A-969C-4A13-9EFA-C0C42A12039F@hostpoint.ch> <87zju43sxl.wl%berend@pobox.com> <87vc4q3bbd.wl%berend@pobox.com> In-Reply-To: <87vc4q3bbd.wl%berend@pobox.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 04 Jul 2013 11:00:19 -0000 On 04.07.13 12:48, Berend de Boer wrote: >>>>>> "Berend" == Berend de Boer writes: > Berend> Do you think (yes, I will definitely test this), that ZFS > Berend> can mount a file system consisting of a couple of disk > Berend> (raidz2 setup), and access it even though every disk might > Berend> be a backup taken at a slighty different time? > > Answering my own question: raidz2, four disks, 128GB disks each, no > writing happening can be backed up with an EBS snapshot, then turned > into volumes (disks) again and mounted on another FreeBSD without > apparent ill effects. > > The snapshots of the four disks were about 2 seconds apart each. > > zfs took perhaps 30 seconds to start on the second server. It apparently had to replay the ZIL. > No snapshot was available (only had done one of a file system, no zfs > snapshot -r was ever done). Didn't do any rollback. As mentioned earlier, doing this is merely emulating (graceful) power loss of the system, or a sudden reboot if you will -- without unmounting the file systems. ZFS guarantees that your file system will be always consistent, at the cost of losing some data -- minimized by the ZIL if your applications used synchronous writes. > Next step is to do take the snapshot while writing stuff is going on. The snapshot will 'record' the known file system state. 
This does not necessarily mean it will record what you think it should, because applications might hold data in memory that makes their stored state consistent. ZFS has no way to know any of this (nor does any other filesystem). Exactly what happens when your system suddenly reboots. ZFS will guarantee the file system consistency. > Then what you suggested: taking a snapshot first, and rollback after mount. Taking a snapshot at some point of time will guarantee you have the file system state at that time. There will be difference with not taking a snapshot, only if your applications do not use synchronous writes where they should, because the snapshot will make the pending writes synchronous (I believe this is the case, or it would not be consistent). if your applications always do sync writes, for example an NFS server, whether you do the snapshot or not will not make any difference. But you should test it anyway. :) Daniel From owner-freebsd-fs@FreeBSD.ORG Thu Jul 4 11:03:00 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 48509AE0 for ; Thu, 4 Jul 2013 11:03:00 +0000 (UTC) (envelope-from kpaasial@gmail.com) Received: from mail-qc0-x22c.google.com (mail-qc0-x22c.google.com [IPv6:2607:f8b0:400d:c01::22c]) by mx1.freebsd.org (Postfix) with ESMTP id 125231EA8 for ; Thu, 4 Jul 2013 11:03:00 +0000 (UTC) Received: by mail-qc0-f172.google.com with SMTP id j10so700462qcx.3 for ; Thu, 04 Jul 2013 04:02:59 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=viEU5liQxtC92EO9PNkA704BLAVg3DY7kHizD59FOK4=; b=tCgRi/+mZz/++M42ucNlwbNre689/JQJoTsJRT8CN1yuSSp2tPOe4Ga/rU6o9MX3cY wOQZ2KlJVaN05FfHLeUx/qkjeVcOwwxa9mJlS+AkuQWv5QHbiwGQix1SFdJv0Jrx3dDu ElUdkrFuOJEk9nMS0t1n3adj+wA88HakS1UgqdK5fbl9O5L5nzCODE7bf9MUZwFTCqHM 
chWHOFonoXwIfb5crclRplv/GvZr061P26HvRFvtjprUykx3rmv/Z8dnReitKtk7S4s5 uvFv1yqbiFDKM+Mqt1VOAQHn/8tUvotedMsXpG+92OKD8ARanUE++m2a8Ai7pXkassmN gH7g== MIME-Version: 1.0 X-Received: by 10.49.127.196 with SMTP id ni4mr4959421qeb.5.1372935779662; Thu, 04 Jul 2013 04:02:59 -0700 (PDT) Received: by 10.224.138.78 with HTTP; Thu, 4 Jul 2013 04:02:59 -0700 (PDT) In-Reply-To: References: Date: Thu, 4 Jul 2013 14:02:59 +0300 Message-ID: Subject: Re: boot from ZFS: which pool types use? From: Kimmo Paasiala To: Dmitry Morozovsky Content-Type: text/plain; charset=UTF-8 Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 04 Jul 2013 11:03:00 -0000 On Thu, Jul 4, 2013 at 1:42 PM, Dmitry Morozovsky wrote: > Dear colleagues, > > I'm a bit stuck and possibly my google-fu sleeps somewhere, but I have > inconsistent cases on what pool types can one use to boot contemporary > (read: stable/9) FreeBSD from. > > For example, I have many older servers with UFS /bootdisk and ZFS-on-root. > While this is useable, it does not seem to be very consistent. > > On the other hand, I have a couple of servers with ZFS-only config which uses > complex raid10-like config on gpart disks, and they boot flawlessly, at lest > till now. > > Lastly, I'm now in process of setting up new server, and trying to do the same, > configuring ZFS with 4 pairs of SAS, now have > > Can't find /boot/zfsloader > > I suppose from line 619 from sys/boot/i386/zfsboot/zfsboot.c > > Configs are essentially the same, I double-check gpart bootcode and zpool.cache > (while I still did not found the guide how to interprete its content; at least > one of my ZFS servers successfully runs without it) > > Any hints? 
Are stripe-mirror configuration available for booting from (yes, I > do remember that all, or at last enough for degraded use, disks should be > exposed to BIOS by controller firmware, and it is usually constrained to 6 or 8 > disk devices) > > Thanks! > > > -- > Sincerely, > D.Marck [DM5020, MCK-RIPE, DM3-RIPN] > [ FreeBSD committer: marck@FreeBSD.org ] > ------------------------------------------------------------------------ > *** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- marck@rinet.ru *** > ------------------------------------------------------------------------ > _______________________________________________ I no longer have my FreeBSD 9-STABLE fileserver, but I was booting successfully from a four-disk striped mirror ZFS pool (two 2-disk mirror vdevs, in other words) that was pure ZFS, no UFS /boot involved. It always worked like a charm. -Kimmo From owner-freebsd-fs@FreeBSD.ORG Thu Jul 4 11:09:23 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 107B5BC4 for ; Thu, 4 Jul 2013 11:09:23 +0000 (UTC) (envelope-from feenberg@nber.org) Received: from mail2.nber.org (mail2.nber.org [66.251.72.79]) by mx1.freebsd.org (Postfix) with ESMTP id C37241EE5 for ; Thu, 4 Jul 2013 11:09:22 +0000 (UTC) Received: from nber2.nber.org (nber2.nber.org [66.251.72.72]) by mail2.nber.org (8.14.4/8.14.4) with ESMTP id r64B9DKA049957 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NOT); Thu, 4 Jul 2013 07:09:14 -0400 (EDT) (envelope-from feenberg@nber.org) Date: Thu, 4 Jul 2013 07:09:13 -0400 (EDT) From: Daniel Feenberg To: Jeremy Chadwick Subject: Re: EBS snapshot backups from a FreeBSD zfs file system: zpool freeze? 
In-Reply-To: <20130704010815.GB75529@icarus.home.lan> Message-ID: References: <6488DECC-2455-4E92-B432-C39490D18484@dragondata.com> <871u7g57rl.wl%berend@pobox.com> <87mwq34emp.wl%berend@pobox.com> <20130703200241.GB60515@in-addr.com> <87k3l748gb.wl%berend@pobox.com> <20130703233631.GA74698@icarus.home.lan> <87d2qz42q4.wl%berend@pobox.com> <20130704010815.GB75529@icarus.home.lan> User-Agent: Alpine 2.03 (LRH 1266 2009-07-14) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Anti-Virus: Kaspersky Anti-Virus for Linux Mail Server 5.6.39/RELEASE, bases: 20130704 #10503464, check: 20130704 clean Cc: freebsd-fs X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 04 Jul 2013 11:09:23 -0000 On Wed, 3 Jul 2013, Jeremy Chadwick wrote: > scripts. > > Also, because nobody seems to warn others of this: if you go the > ZFS route on FreeBSD, please do not use features like dedup or > compression. I can expand more on this if asked, as they have > separate (and in one case identical/similar) caveats. (I'm always > willing to bend on compression as long as the user knows of the one > problem that still exists today and feels it's okay/acceptable) > Please expand on the problem with compression - we have a lot of very large, very "fluffy" datasets that compress down about 90% (and are accessed in pure sequential order) so compression is very attractive to us. 
dan feenberg NBER From owner-freebsd-fs@FreeBSD.ORG Thu Jul 4 11:11:35 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 39C92C40 for ; Thu, 4 Jul 2013 11:11:35 +0000 (UTC) (envelope-from jdc@koitsu.org) Received: from relay5-d.mail.gandi.net (relay5-d.mail.gandi.net [217.70.183.197]) by mx1.freebsd.org (Postfix) with ESMTP id EBA2A1EF1 for ; Thu, 4 Jul 2013 11:11:34 +0000 (UTC) Received: from mfilter21-d.gandi.net (mfilter21-d.gandi.net [217.70.178.149]) by relay5-d.mail.gandi.net (Postfix) with ESMTP id 8031641C07F; Thu, 4 Jul 2013 13:11:17 +0200 (CEST) X-Virus-Scanned: Debian amavisd-new at mfilter21-d.gandi.net Received: from relay5-d.mail.gandi.net ([217.70.183.197]) by mfilter21-d.gandi.net (mfilter21-d.gandi.net [10.0.15.180]) (amavisd-new, port 10024) with ESMTP id 3hZ4vSoobLcy; Thu, 4 Jul 2013 13:11:15 +0200 (CEST) X-Originating-IP: 76.102.14.35 Received: from jdc.koitsu.org (c-76-102-14-35.hsd1.ca.comcast.net [76.102.14.35]) (Authenticated sender: jdc@koitsu.org) by relay5-d.mail.gandi.net (Postfix) with ESMTPSA id 4B73B41C089; Thu, 4 Jul 2013 13:11:15 +0200 (CEST) Received: by icarus.home.lan (Postfix, from userid 1000) id 6054973A1E; Thu, 4 Jul 2013 04:11:13 -0700 (PDT) Date: Thu, 4 Jul 2013 04:11:13 -0700 From: Jeremy Chadwick To: Daniel Feenberg Subject: Re: EBS snapshot backups from a FreeBSD zfs file system: zpool freeze? 
Message-ID: <20130704111113.GA87988@icarus.home.lan> References: <871u7g57rl.wl%berend@pobox.com> <87mwq34emp.wl%berend@pobox.com> <20130703200241.GB60515@in-addr.com> <87k3l748gb.wl%berend@pobox.com> <20130703233631.GA74698@icarus.home.lan> <87d2qz42q4.wl%berend@pobox.com> <20130704010815.GB75529@icarus.home.lan> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-fs X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 04 Jul 2013 11:11:35 -0000 On Thu, Jul 04, 2013 at 07:09:13AM -0400, Daniel Feenberg wrote: > On Wed, 3 Jul 2013, Jeremy Chadwick wrote: > > > scripts. > > > > Also, because nobody seems to warn others of this: if you go the > > ZFS route on FreeBSD, please do not use features like dedup or > > compression. I can expand more on this if asked, as they have > > separate (and in one case identical/similar) caveats. (I'm always > > willing to bend on compression as long as the user knows of the one > > problem that still exists today and feels it's okay/acceptable) > > > > Please expand on the problem with compression - we have a lot of > very large, very "fluffy" datasets that compress down about 90% (and > are accessed in pure sequential order) so compression is very > attractive to us. Please see the rest of the thread; I explain the nuance there. :-) (Maybe mail delays of some sort are happening somewhere and you haven't seen the mail yet...) -- | Jeremy Chadwick jdc@koitsu.org | | UNIX Systems Administrator http://jdc.koitsu.org/ | | Making life hard for others since 1977. 
PGP 4BD6C0CB | From owner-freebsd-fs@FreeBSD.ORG Thu Jul 4 11:47:06 2013 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 65AE4150 for ; Thu, 4 Jul 2013 11:47:06 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 950B21013 for ; Thu, 4 Jul 2013 11:47:05 +0000 (UTC) Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id OAA07431; Thu, 04 Jul 2013 14:46:38 +0300 (EEST) (envelope-from avg@FreeBSD.org) Received: from localhost ([127.0.0.1]) by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1Uuhzi-000EYr-FI; Thu, 04 Jul 2013 14:46:38 +0300 Message-ID: <51D56066.1020902@FreeBSD.org> Date: Thu, 04 Jul 2013 14:45:42 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:17.0) Gecko/20130405 Thunderbird/17.0.5 MIME-Version: 1.0 To: Dmitry Morozovsky Subject: Re: boot from ZFS: which pool types use? References: In-Reply-To: X-Enigmail-Version: 1.5.1 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: freebsd-fs@FreeBSD.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 04 Jul 2013 11:47:06 -0000 on 04/07/2013 13:42 Dmitry Morozovsky said the following: > Can't find /boot/zfsloader Does this file exist in the filesystem pointed to by bootfs property (if set)? Are all ZFS disks visible by BIOS/firmware? Are there any other messages before the quoted one? 
-- Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Thu Jul 4 11:56:40 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 9F48035E; Thu, 4 Jul 2013 11:56:40 +0000 (UTC) (envelope-from marck@rinet.ru) Received: from woozle.rinet.ru (woozle.rinet.ru [195.54.192.68]) by mx1.freebsd.org (Postfix) with ESMTP id 3063B1066; Thu, 4 Jul 2013 11:56:39 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by woozle.rinet.ru (8.14.5/8.14.5) with ESMTP id r64BucPG045591; Thu, 4 Jul 2013 15:56:38 +0400 (MSK) (envelope-from marck@rinet.ru) Date: Thu, 4 Jul 2013 15:56:38 +0400 (MSK) From: Dmitry Morozovsky To: Andriy Gapon Subject: Re: boot from ZFS: which pool types use? In-Reply-To: <51D56066.1020902@FreeBSD.org> Message-ID: References: <51D56066.1020902@FreeBSD.org> User-Agent: Alpine 2.00 (BSF 1167 2008-08-23) X-NCC-RegID: ru.rinet X-OpenPGP-Key-ID: 6B691B03 MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.4.3 (woozle.rinet.ru [0.0.0.0]); Thu, 04 Jul 2013 15:56:38 +0400 (MSK) Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 04 Jul 2013 11:56:40 -0000 On Thu, 4 Jul 2013, Andriy Gapon wrote: > on 04/07/2013 13:42 Dmitry Morozovsky said the following: > > Can't find /boot/zfsloader > > Does this file exist in the filesystem pointed to by bootfs property (if set)? Arghh!!! I forgot to set this one (however, this is the only zfs pool on the machine -- shouldn't the loader assume it is safe to boot from it?) Regarding your other questions: Yes, this file exists. > Are all ZFS disks visible by BIOS/firmware? 
Yes, at least that's what I can conclude from BIOS messages (it would be interesting to add debugging printf()s to gptzfsboot, as there is plenty of space in it, but that would be a next stage) > Are there any other messages before the quoted one? Nope, just the first '|' from the rotating spinner routine -- Sincerely, D.Marck [DM5020, MCK-RIPE, DM3-RIPN] [ FreeBSD committer: marck@FreeBSD.org ] ------------------------------------------------------------------------ *** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- marck@rinet.ru *** ------------------------------------------------------------------------ From owner-freebsd-fs@FreeBSD.ORG Thu Jul 4 12:01:41 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 79C2362D; Thu, 4 Jul 2013 12:01:41 +0000 (UTC) (envelope-from marck@rinet.ru) Received: from woozle.rinet.ru (woozle.rinet.ru [195.54.192.68]) by mx1.freebsd.org (Postfix) with ESMTP id 0923B10DC; Thu, 4 Jul 2013 12:01:40 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by woozle.rinet.ru (8.14.5/8.14.5) with ESMTP id r64C1d70045795; Thu, 4 Jul 2013 16:01:39 +0400 (MSK) (envelope-from marck@rinet.ru) Date: Thu, 4 Jul 2013 16:01:39 +0400 (MSK) From: Dmitry Morozovsky To: Andriy Gapon Subject: Re: boot from ZFS: which pool types use? 
In-Reply-To: Message-ID: References: <51D56066.1020902@FreeBSD.org> User-Agent: Alpine 2.00 (BSF 1167 2008-08-23) X-NCC-RegID: ru.rinet X-OpenPGP-Key-ID: 6B691B03 MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.4.3 (woozle.rinet.ru [0.0.0.0]); Thu, 04 Jul 2013 16:01:39 +0400 (MSK) Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 04 Jul 2013 12:01:41 -0000 On Thu, 4 Jul 2013, Dmitry Morozovsky wrote: > > > Can't find /boot/zfsloader > > > > Does this file exist in the filesystem pointed to by bootfs property (if set)? > > Arghh!!! I missed to set this one (however, this is the only zfs pool on the > machine -- shouldn't the loader assume it is safe to try to boot off?) For the record: setting the zpool bootfs property fixes the issue. I suppose all guides should emphasize this so it is not missed, to avoid confusion; also, maybe some reasonable defaults should be implemented to help avoid simple configuration errors. 
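For the archive, the fix described above amounts to a single command. The pool name `zroot` below is illustrative, not taken from the thread; adjust it to the actual pool:

```shell
# Point the pool's bootfs property at the dataset holding /boot,
# then verify the setting took effect.
zpool set bootfs=zroot zroot
zpool get bootfs zroot
```

If the root filesystem lives in a child dataset rather than the pool's root dataset, point bootfs at that dataset instead.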
Thanks Andriy a lot; I really needed a 'reset press' from outside after kinda three days of dances around this machine ;-) -- Sincerely, D.Marck [DM5020, MCK-RIPE, DM3-RIPN] [ FreeBSD committer: marck@FreeBSD.org ] ------------------------------------------------------------------------ *** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- marck@rinet.ru *** ------------------------------------------------------------------------ From owner-freebsd-fs@FreeBSD.ORG Thu Jul 4 12:23:00 2013 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 7D877968; Thu, 4 Jul 2013 12:23:00 +0000 (UTC) (envelope-from marck@rinet.ru) Received: from woozle.rinet.ru (woozle.rinet.ru [195.54.192.68]) by mx1.freebsd.org (Postfix) with ESMTP id EA32E11C4; Thu, 4 Jul 2013 12:22:59 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by woozle.rinet.ru (8.14.5/8.14.5) with ESMTP id r64CMwPC046631; Thu, 4 Jul 2013 16:22:58 +0400 (MSK) (envelope-from marck@rinet.ru) Date: Thu, 4 Jul 2013 16:22:58 +0400 (MSK) From: Dmitry Morozovsky To: freebsd-fs@FreeBSD.org Subject: ZFS default compression algo for contemporary FreeBSD versions Message-ID: User-Agent: Alpine 2.00 (BSF 1167 2008-08-23) X-NCC-RegID: ru.rinet X-OpenPGP-Key-ID: 6B691B03 MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.4.3 (woozle.rinet.ru [0.0.0.0]); Thu, 04 Jul 2013 16:22:58 +0400 (MSK) Cc: avg@FreeBSD.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 04 Jul 2013 12:23:00 -0000 Colleagues, is it sane to just set 'zfs set compression=on dataset' to get the best algorithm on fresh FreeBSD systems (-current and/or stable/9)? The manual page is a bit uncertain about this... 
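The property being asked about is set per dataset; a minimal sketch follows, with an assumed dataset name `tank/data` (not from the thread). On stable/9 of this vintage, `compression=on` selects lzjb, as the replies note:

```shell
# "on" maps to lzjb on stable/9 here; other algorithms (e.g. gzip,
# or lz4 on pools with the lz4_compress feature) must be named explicitly.
zfs set compression=on tank/data
zfs get compression,compressratio tank/data
```

Checking `compressratio` afterwards shows how well a given dataset actually compresses.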
-- Sincerely, D.Marck [DM5020, MCK-RIPE, DM3-RIPN] [ FreeBSD committer: marck@FreeBSD.org ] ------------------------------------------------------------------------ *** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- marck@rinet.ru *** ------------------------------------------------------------------------ From owner-freebsd-fs@FreeBSD.ORG Thu Jul 4 12:36:57 2013 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id AA6BA2B3 for ; Thu, 4 Jul 2013 12:36:57 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 035331260 for ; Thu, 4 Jul 2013 12:36:56 +0000 (UTC) Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id PAA08549; Thu, 04 Jul 2013 15:36:32 +0300 (EEST) (envelope-from avg@FreeBSD.org) Received: from localhost ([127.0.0.1]) by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1Uuim0-000EdS-LD; Thu, 04 Jul 2013 15:36:32 +0300 Message-ID: <51D56C19.8080103@FreeBSD.org> Date: Thu, 04 Jul 2013 15:35:37 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:17.0) Gecko/20130405 Thunderbird/17.0.5 MIME-Version: 1.0 To: Dmitry Morozovsky Subject: Re: boot from ZFS: which pool types use? 
References: <51D56066.1020902@FreeBSD.org> In-Reply-To: X-Enigmail-Version: 1.5.1 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: freebsd-fs@FreeBSD.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 04 Jul 2013 12:36:57 -0000 on 04/07/2013 15:01 Dmitry Morozovsky said the following: > On Thu, 4 Jul 2013, Dmitry Morozovsky wrote: > >>>> Can't find /boot/zfsloader >>> >>> Does this file exist in the filesystem pointed to by bootfs property (if set)? >> >> Arghh!!! I missed to set this one (however, this is the only zfs pool on the >> machine -- shouldn't the loader assume it is safe to try to boot off?) > > For the record: setting up zpool bootfs property fixes the issue. > > I suppose it should be emphasized in all guides to not miss this, to avoid > confusions; also, maybr some reasonable defaults should be implemented to help > avoiding simple configuration errors. > > Thanks Andriy a lot, I really needed a 'reset press' from outside after > kinda-3-days dances around this machine ;-) Setting bootfs should not be required. If your root filesystem is your root dataset (like "tank"), then everything should have just worked. 
-- Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Thu Jul 4 12:40:57 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id DC44E460 for ; Thu, 4 Jul 2013 12:40:57 +0000 (UTC) (envelope-from bofh@terranova.net) Received: from tog.net (tog.net [IPv6:2605:5a00::5]) by mx1.freebsd.org (Postfix) with ESMTP id B34CB1293 for ; Thu, 4 Jul 2013 12:40:57 +0000 (UTC) Received: from [IPv6:2605:5a00:ffff::face] (unknown [IPv6:2605:5a00:ffff::face]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by tog.net (Postfix) with ESMTPSA id 3bmJh83fg4z5Rn for ; Thu, 4 Jul 2013 08:40:56 -0400 (EDT) Message-ID: <51D56D50.6040807@terranova.net> Date: Thu, 04 Jul 2013 08:40:48 -0400 From: Travis Mikalson Organization: TerraNovaNet Internet Services User-Agent: Thunderbird 2.0.0.24 (Windows/20100228) MIME-Version: 1.0 To: freebsd-fs@freebsd.org Subject: Re: Report: ZFS deadlock in 9-STABLE References: <51D45401.5050801@terranova.net> <6F014EB7E2A446E5B7FDDF086E77880E@multiplay.co.uk> In-Reply-To: <6F014EB7E2A446E5B7FDDF086E77880E@multiplay.co.uk> X-Enigmail-Version: 0.96.0 OpenPGP: url=http://www.terranova.net/pgp/bofh Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 04 Jul 2013 12:40:57 -0000 Steven Hartland wrote: > Any zvols? No, none. 
> ----- Original Message ----- From: "Travis Mikalson" > To: > Sent: Wednesday, July 03, 2013 5:40 PM > Subject: Report: ZFS deadlock in 9-STABLE > > >> Hello, >> >> To cut to the chase, I have a procstat -kk -a captured during a livelock >> for you here: >> http://tog.net/freebsd/zfsdeadlock-storage1-20130703 >> >> The other relevant configurations I could think of to show you are >> available within that http://tog.net/freebsd/ directory. >> >> If you want any additional information that I haven't given here please >> let me know! >> >> This is a FreeBSD 9-STABLE AMD64 system currently at: >> r250777: Sat May 18 17:41:39 EDT 2013 >> >> I didn't see too many relevant ZFS-related fixes after that date so am >> waiting for another round of interesting commits to update again. >> >> Unfortunately, this system has been livelocking on average about once >> every 7-14 days. Its lot in life is a ZFS storage server serving NFS and >> istgt traffic. >> >> It has 32GB of RAM and is an 8-core 2.6GHz Opteron 6212. >> The zpool looks like this, it has eight 1TB SAS drives and two SSDs >> being used for log and cache. >> >> pool: storage1 >> state: ONLINE >> status: The pool is formatted using a legacy on-disk format. The pool >> can >> still be used, but some features are unavailable. >> action: Upgrade the pool using 'zpool upgrade'. Once this is done, the >> pool will no longer be accessible on software that does not >> support feature >> flags. 
>> scan: scrub repaired 0 in 6h4m with 0 errors on Sun Jan 6 06:39:38 2013
>> config:
>>
>>         NAME          STATE     READ WRITE CKSUM
>>         storage1      ONLINE       0     0     0
>>           raidz1-0    ONLINE       0     0     0
>>             da0       ONLINE       0     0     0
>>             da2       ONLINE       0     0     0
>>             da4       ONLINE       0     0     0
>>             da6       ONLINE       0     0     0
>>           raidz1-1    ONLINE       0     0     0
>>             da1       ONLINE       0     0     0
>>             da3       ONLINE       0     0     0
>>             da5       ONLINE       0     0     0
>>             da7       ONLINE       0     0     0
>>         logs
>>           mirror-2    ONLINE       0     0     0
>>             da8p2     ONLINE       0     0     0
>>             da9p2     ONLINE       0     0     0
>>         cache
>>           da8p3       ONLINE       0     0     0
>>           da9p3       ONLINE       0     0     0
>>
>> errors: No known data errors
From owner-freebsd-fs@FreeBSD.ORG Thu Jul 4 12:43:53 2013 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 629F4502; Thu, 4 Jul 2013 12:43:53 +0000 (UTC) (envelope-from marck@rinet.ru) Received: from woozle.rinet.ru (woozle.rinet.ru [195.54.192.68]) by mx1.freebsd.org (Postfix) with ESMTP id E3CD412B6; Thu, 4 Jul 2013 12:43:52 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by woozle.rinet.ru (8.14.5/8.14.5) with ESMTP id r64ChpSs047737; Thu, 4 Jul 2013 16:43:51 +0400 (MSK) (envelope-from marck@rinet.ru) Date: Thu, 4 Jul 2013 16:43:51 +0400 (MSK) From: Dmitry Morozovsky To: Andriy Gapon Subject: Re: boot from ZFS: which pool types use? 
In-Reply-To: <51D56C19.8080103@FreeBSD.org> Message-ID: References: <51D56066.1020902@FreeBSD.org> <51D56C19.8080103@FreeBSD.org> User-Agent: Alpine 2.00 (BSF 1167 2008-08-23) X-NCC-RegID: ru.rinet X-OpenPGP-Key-ID: 6B691B03 MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.4.3 (woozle.rinet.ru [0.0.0.0]); Thu, 04 Jul 2013 16:43:51 +0400 (MSK) Cc: freebsd-fs@FreeBSD.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 04 Jul 2013 12:43:53 -0000 On Thu, 4 Jul 2013, Andriy Gapon wrote: > >>>> Can't find /boot/zfsloader > >>> > >>> Does this file exist in the filesystem pointed to by bootfs property (if set)? > >> > >> Arghh!!! I missed to set this one (however, this is the only zfs pool on the > >> machine -- shouldn't the loader assume it is safe to try to boot off?) > > > > For the record: setting up zpool bootfs property fixes the issue. > > > > I suppose it should be emphasized in all guides to not miss this, to avoid > > confusions; also, maybe some reasonable defaults should be implemented to help > > avoid simple configuration errors. > > > > Thanks Andriy a lot, I really needed a 'reset press' from outside after > > kinda-3-days dances around this machine ;-) > > Setting bootfs should not be required. If your root filesystem is your root > dataset (like "tank"), then everything should have just worked. It was/is (and, as I previously stated, it is the only ZFS dataset on the machine), but unfortunately without explicitly setting the bootfs property it does not work :( As this machine is still in its staging phase (although I'm afraid I have to put it into production during the next few days, so timing is tight), I'm more than happy to make debugging boots/runs to narrow down the issue. Thanks again! 
-- Sincerely, D.Marck [DM5020, MCK-RIPE, DM3-RIPN] [ FreeBSD committer: marck@FreeBSD.org ] ------------------------------------------------------------------------ *** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- marck@rinet.ru *** ------------------------------------------------------------------------ From owner-freebsd-fs@FreeBSD.ORG Thu Jul 4 12:59:40 2013 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 46418A16; Thu, 4 Jul 2013 12:59:40 +0000 (UTC) (envelope-from jdc@koitsu.org) Received: from relay5-d.mail.gandi.net (relay5-d.mail.gandi.net [217.70.183.197]) by mx1.freebsd.org (Postfix) with ESMTP id 047D91382; Thu, 4 Jul 2013 12:59:39 +0000 (UTC) Received: from mfilter26-d.gandi.net (mfilter26-d.gandi.net [217.70.178.154]) by relay5-d.mail.gandi.net (Postfix) with ESMTP id B4EC341C090; Thu, 4 Jul 2013 14:59:28 +0200 (CEST) X-Virus-Scanned: Debian amavisd-new at mfilter26-d.gandi.net Received: from relay5-d.mail.gandi.net ([217.70.183.197]) by mfilter26-d.gandi.net (mfilter26-d.gandi.net [10.0.15.180]) (amavisd-new, port 10024) with ESMTP id v+R045U8th-V; Thu, 4 Jul 2013 14:59:27 +0200 (CEST) X-Originating-IP: 76.102.14.35 Received: from jdc.koitsu.org (c-76-102-14-35.hsd1.ca.comcast.net [76.102.14.35]) (Authenticated sender: jdc@koitsu.org) by relay5-d.mail.gandi.net (Postfix) with ESMTPSA id BC99041C096; Thu, 4 Jul 2013 14:59:26 +0200 (CEST) Received: by icarus.home.lan (Postfix, from userid 1000) id CAE4373A1C; Thu, 4 Jul 2013 05:59:24 -0700 (PDT) Date: Thu, 4 Jul 2013 05:59:24 -0700 From: Jeremy Chadwick To: Dmitry Morozovsky Subject: Re: ZFS default compression algo for contemporary FreeBSD versions Message-ID: <20130704125924.GA89675@icarus.home.lan> References: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) Cc: 
freebsd-fs@FreeBSD.org, avg@FreeBSD.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 04 Jul 2013 12:59:40 -0000 On Thu, Jul 04, 2013 at 04:22:58PM +0400, Dmitry Morozovsky wrote: > Collegues, > > is it sane to just set 'zfs compression=on dataset' to achieve best algo on > fresh FreeBSD systems (-current and/or stable/9)? > > The manual page is a bit uncertain about this... The man page on stable/9, for me, says clearly: Setting compression to on uses the lzjb compression algorithm. This is as of r251935. Your catman pages may be out of date. I deal with this problem per a recommendation of dougb back in the day: "rm -fr /usr/share/man/*" right before doing "make installworld". I've just gotten in the habit of remembering to do it every time. Otherwise render what's in /usr/src yourself: nroff -Tman -man /usr/src/cddl/contrib/opensolaris/cmd/zfs/zfs.8 | less -- | Jeremy Chadwick jdc@koitsu.org | | UNIX Systems Administrator http://jdc.koitsu.org/ | | Making life hard for others since 1977. 
PGP 4BD6C0CB | From owner-freebsd-fs@FreeBSD.ORG Thu Jul 4 12:59:54 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id AF568A1A; Thu, 4 Jul 2013 12:59:54 +0000 (UTC) (envelope-from kpaasial@gmail.com) Received: from mail-qe0-x233.google.com (mail-qe0-x233.google.com [IPv6:2607:f8b0:400d:c02::233]) by mx1.freebsd.org (Postfix) with ESMTP id 655C51384; Thu, 4 Jul 2013 12:59:54 +0000 (UTC) Received: by mail-qe0-f51.google.com with SMTP id a11so734607qen.10 for ; Thu, 04 Jul 2013 05:59:54 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=zJOjaEQa9p/Zh+b/8qM9mMbJcNVSoewDpUOUjHrvbts=; b=mHrknBUXwVBvojWtytpe5r0Oe61wkfl+mrSymLoGjAQBb7BJZ1EV1KlSATUngzj6Kz WrQdvl0BRdRxBrVrQnsF/P64WvxoC10wYdBGm1/pj20/V1FBaS4BQoMRlICsH24tqCYe hjo82JNNv/3ezAcQjpofujfXFrivrc2qTSENg5CHuIC1n7unXNO2v7MTOpg4E/jGnSUo c0vpc8VY6VoFzIiLhjxdmwHBw16E8+QkTAwdcsGlhhNpIff0dOAWPcjc33M049KN2s02 wdy5XtdHM1N6XlHvOMf5t0RqF0Tn5Mva7r/gOdAOMV4cBDZOcPI/6GKjzoVBirWCi0q0 lEyA== MIME-Version: 1.0 X-Received: by 10.49.127.4 with SMTP id nc4mr4975140qeb.41.1372942794008; Thu, 04 Jul 2013 05:59:54 -0700 (PDT) Received: by 10.224.138.78 with HTTP; Thu, 4 Jul 2013 05:59:53 -0700 (PDT) In-Reply-To: References: <51D56066.1020902@FreeBSD.org> <51D56C19.8080103@FreeBSD.org> Date: Thu, 4 Jul 2013 15:59:53 +0300 Message-ID: Subject: Re: boot from ZFS: which pool types use? 
From: Kimmo Paasiala To: Dmitry Morozovsky Content-Type: text/plain; charset=UTF-8 Cc: freebsd-fs@freebsd.org, Andriy Gapon X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 04 Jul 2013 12:59:54 -0000 On Thu, Jul 4, 2013 at 3:43 PM, Dmitry Morozovsky wrote: > On Thu, 4 Jul 2013, Andriy Gapon wrote: > >> >>>> Can't find /boot/zfsloader >> >>> >> >>> Does this file exist in the filesystem pointed to by bootfs property (if set)? >> >> >> >> Arghh!!! I missed to set this one (however, this is the only zfs pool on the >> >> machine -- shouldn't the loader assume it is safe to try to boot off?) >> > >> > For the record: setting up zpool bootfs property fixes the issue. >> > >> > I suppose it should be emphasized in all guides to not miss this, to avoid >> > confusions; also, maybr some reasonable defaults should be implemented to help >> > avoiding simple configuration errors. >> > >> > Thanks Andriy a lot, I really needed a 'reset press' from outside after >> > kinda-3-days dances around this machine ;-) >> >> Setting bootfs should not be required. If your root filesystem is your root >> dataset (like "tank"), then everything should have just worked. > > it was/is (and, as I previously stated, is the only ZFS dataset on the > machine), but unfortunately without explicit setting bootfs property does not > work :( > > As this machine is still on staging phase (although I'm afraid I have to put it > in production during next few days, so timing are tight), I'm more than happy > to make debugging boots/runs to tighten the issue. > > Thanks again! > > -- With a recent 9-STABLE setting the bootfs property is actually the only thing you need in addition to installing the boot loader, no need to muck with the zpool.cache anymore. 
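The "installing the boot loader" step mentioned above can be sketched for a GPT-partitioned disk; the device name ada0 and the freebsd-boot partition index 1 are assumptions, not taken from the thread:

```shell
# Write the protective MBR and the ZFS-aware gptzfsboot loader into the
# freebsd-boot partition (assumed to be partition index 1) of disk ada0.
gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ada0
```

In a mirrored or striped-mirror setup, repeat this for every disk the BIOS might boot from, so the system remains bootable with a degraded pool.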
-Kimmo From owner-freebsd-fs@FreeBSD.ORG Thu Jul 4 13:06:10 2013 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 42856CC3 for ; Thu, 4 Jul 2013 13:06:10 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 8B2551453 for ; Thu, 4 Jul 2013 13:06:09 +0000 (UTC) Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id QAA09226; Thu, 04 Jul 2013 16:05:45 +0300 (EEST) (envelope-from avg@FreeBSD.org) Received: from localhost ([127.0.0.1]) by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1UujEH-000EgS-1E; Thu, 04 Jul 2013 16:05:45 +0300 Message-ID: <51D57305.2010709@FreeBSD.org> Date: Thu, 04 Jul 2013 16:05:09 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:17.0) Gecko/20130405 Thunderbird/17.0.5 MIME-Version: 1.0 To: Dmitry Morozovsky Subject: Re: boot from ZFS: which pool types use? References: <51D56066.1020902@FreeBSD.org> <51D56C19.8080103@FreeBSD.org> In-Reply-To: X-Enigmail-Version: 1.5.1 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: freebsd-fs@FreeBSD.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 04 Jul 2013 13:06:10 -0000 on 04/07/2013 15:43 Dmitry Morozovsky said the following: > it was/is (and, as I previously stated, is the only ZFS dataset on the > machine), but unfortunately without explicit setting bootfs property does not > work :( This is some confusing wording. We talk about _pools_ on a machine and we talk about _datasets_ in a pool. So I am not 100% sure what you mean. 
Whether you have a single pool, or whether you have a single dataset/filesystem in a pool, or both. Hint: output of commands is usually better than a bunch of free-form text :) -- Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Thu Jul 4 13:17:05 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id E0E4C18F; Thu, 4 Jul 2013 13:17:05 +0000 (UTC) (envelope-from bofh@terranova.net) Received: from tog.net (tog.net [IPv6:2605:5a00::5]) by mx1.freebsd.org (Postfix) with ESMTP id B8B91162C; Thu, 4 Jul 2013 13:17:05 +0000 (UTC) Received: from [IPv6:2605:5a00:ffff::face] (unknown [IPv6:2605:5a00:ffff::face]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by tog.net (Postfix) with ESMTPSA id 3bmKTs1cv2z2D6; Thu, 4 Jul 2013 09:17:05 -0400 (EDT) Message-ID: <51D575C9.4040402@terranova.net> Date: Thu, 04 Jul 2013 09:16:57 -0400 From: Travis Mikalson Organization: TerraNovaNet Internet Services User-Agent: Thunderbird 2.0.0.24 (Windows/20100228) MIME-Version: 1.0 To: freebsd-fs@freebsd.org Subject: Re: Report: ZFS deadlock in 9-STABLE References: <51D45401.5050801@terranova.net> <51D47A5F.3030501@delphij.net> In-Reply-To: <51D47A5F.3030501@delphij.net> X-Enigmail-Version: 0.96.0 OpenPGP: url=http://www.terranova.net/pgp/bofh Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: kib@freebsd.org, d@delphij.net X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 04 Jul 2013 13:17:05 -0000 Xin Li wrote: > Hi, > > Sorry for the top posting but I am quite convinced that this is a > known issue that we have seen with our customer. Please try applying > this patch [1] and please report back if that fixes your problem. 
>
> Note that if you would like to provide more help, we would appreciate
> it if you test Konstantin's patch as well, at:
> http://lists.freebsd.org/pipermail/freebsd-hackers/2013-May/042876.html

I will apply both patches and see what happens. It will take a couple of
weeks with no deadlocks before we get an idea of whether they were
effective. (Or, god forbid, I come back with another different-looking
deadlock.)

Thanks!

> [1] See attachment; the commit is
> https://github.com/trueos/trueos/commit/f678ae7c7f72fba577b00e3d0c237c4f297575c6
>
> Cheers,
>
> On 07/03/13 09:40, Travis Mikalson wrote:
>> Hello,
>>
>> To cut to the chase, I have a procstat -kk -a captured during a
>> livelock for you here:
>> http://tog.net/freebsd/zfsdeadlock-storage1-20130703
>>
>> The other relevant configurations I could think of to show you are
>> available within that http://tog.net/freebsd/ directory.
>>
>> If you want any additional information that I haven't given here,
>> please let me know!
>>
>> This is a FreeBSD 9-STABLE amd64 system currently at r250777: Sat
>> May 18 17:41:39 EDT 2013.
>>
>> I didn't see too many relevant ZFS-related fixes after that date, so
>> I am waiting for another round of interesting commits before updating
>> again.
>>
>> Unfortunately, this system has been livelocking on average about
>> once every 7-14 days. Its lot in life is a ZFS storage server
>> serving NFS and istgt traffic.
>>
>> It has 32GB of RAM and an 8-core 2.6GHz Opteron 6212. The zpool
>> looks like this; it has eight 1TB SAS drives and two SSDs being
>> used for log and cache:
>>
>>   pool: storage1
>>  state: ONLINE
>> status: The pool is formatted using a legacy on-disk format. The
>>         pool can still be used, but some features are unavailable.
>> action: Upgrade the pool using 'zpool upgrade'. Once this is done,
>>         the pool will no longer be accessible on software that does
>>         not support feature flags.
>>   scan: scrub repaired 0 in 6h4m with 0 errors on Sun Jan  6 06:39:38 2013
>> config:
>>
>>         NAME          STATE     READ WRITE CKSUM
>>         storage1      ONLINE       0     0     0
>>           raidz1-0    ONLINE       0     0     0
>>             da0       ONLINE       0     0     0
>>             da2       ONLINE       0     0     0
>>             da4       ONLINE       0     0     0
>>             da6       ONLINE       0     0     0
>>           raidz1-1    ONLINE       0     0     0
>>             da1       ONLINE       0     0     0
>>             da3       ONLINE       0     0     0
>>             da5       ONLINE       0     0     0
>>             da7       ONLINE       0     0     0
>>         logs
>>           mirror-2    ONLINE       0     0     0
>>             da8p2     ONLINE       0     0     0
>>             da9p2     ONLINE       0     0     0
>>         cache
>>           da8p3       ONLINE       0     0     0
>>           da9p3       ONLINE       0     0     0
>>
>> errors: No known data errors

From: Volodymyr Kostyrko <c.kworr@gmail.com>
Date: Thu, 04 Jul 2013 16:21:37 +0300
To: Dmitry Morozovsky
Cc: freebsd-fs@FreeBSD.org, avg@FreeBSD.org
Subject: Re: ZFS default compression algo for contemporary FreeBSD versions
04.07.2013 15:22, Dmitry Morozovsky wrote:
> Colleagues,
>
> is it sane to just set 'zfs compression=on dataset' to get the best
> algorithm on fresh FreeBSD systems (-current and/or stable/9)?

No, and this is not safe AFAIK. The default compression is still lzjb,
and the boot loader can't boot off datasets compressed with lzjb.
However, on stable/9 you can simply set 'zfs set compression=lz4 pool'
and everything will work fine, provided you have updated the boot
loader.

-- 
Sphinx of black quartz, judge my vow.
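As a sketch of the procedure discussed above (the pool name, device name
`ada0`, and partition index `1` are illustrative assumptions; check
`gpart show` on the actual system):

```shell
# Enable lz4 on the whole pool; child datasets inherit the setting.
zfs set compression=lz4 pool

# Reinstall the GPT boot blocks so the loader can read lz4-compressed
# datasets (device and index are assumptions -- verify with `gpart show`).
gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ada0
```

On mirrored or raidz boot pools the bootcode step would need to be
repeated for every disk the BIOS might boot from.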
From: Andriy Gapon <avg@FreeBSD.org>
Date: Thu, 04 Jul 2013 16:23:59 +0300
To: Travis Mikalson
Cc: freebsd-fs@FreeBSD.org
Subject: Re: Report: ZFS deadlock in 9-STABLE

on 03/07/2013 19:40 Travis Mikalson said the following:
> Hello,
>
> To cut to the chase, I have a procstat -kk -a captured during a livelock
> for you here:
> http://tog.net/freebsd/zfsdeadlock-storage1-20130703

BTW, https://wiki.freebsd.org/AvgZfsDeadlockDebug

"If neither of the above is true.
That is, you do see zio_wait and you don't see either of zio_done or
zio_interrupt, then the problem is most likely with the storage layer..."

-- 
Andriy Gapon

From: Volodymyr Kostyrko <c.kworr@gmail.com>
Date: Thu, 04 Jul 2013 16:24:57 +0300
To: Dmitry Morozovsky
Cc: freebsd-fs@freebsd.org, Andriy Gapon
Subject: Re: boot from ZFS: which pool types use?
04.07.2013 14:56, Dmitry Morozovsky wrote:
> On Thu, 4 Jul 2013, Andriy Gapon wrote:
>
>> on 04/07/2013 13:42 Dmitry Morozovsky said the following:
>>> Can't find /boot/zfsloader
>>
>> Does this file exist in the filesystem pointed to by the bootfs
>> property (if set)?
>
> Arghh!!! I missed setting this one (however, this is the only zfs pool
> on the machine -- shouldn't the loader assume it is safe to try to boot
> off it?)
>
> Regarding your other questions:

That's weird. On all my machines I only set vfs.root.mountfrom in
/boot/loader.conf to point to the dataset containing the root fs.

-- 
Sphinx of black quartz, judge my vow.
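The two mechanisms being compared in this thread can be sketched as
follows (pool and dataset names are illustrative, not taken from any
poster's system):

```shell
# Option 1: record the boot dataset in the pool itself, so the
# loader finds it without any loader.conf help.
zpool set bootfs=pool/ROOT/default pool

# Option 2: tell the kernel where the root filesystem lives via
# loader.conf, leaving bootfs unset.
echo 'vfs.root.mountfrom="zfs:pool/ROOT/default"' >> /boot/loader.conf
```

The thread's disagreement is essentially about which of these should be
required when the pool contains only one filesystem.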
From: Travis Mikalson <bofh@terranova.net>
Date: Thu, 04 Jul 2013 09:43:53 -0400
To: freebsd-fs@FreeBSD.org
Subject: Re: Report: ZFS deadlock in 9-STABLE

Andriy Gapon wrote:
> on 03/07/2013 19:40 Travis Mikalson said the following:
>> Hello,
>>
>> To cut to the chase, I have a procstat -kk -a captured during a livelock
>> for you here:
>> http://tog.net/freebsd/zfsdeadlock-storage1-20130703
>
> BTW, https://wiki.freebsd.org/AvgZfsDeadlockDebug
>
> "If neither of the above is true. That is, you do see zio_wait and you
> don't see either of zio_done or zio_interrupt, then the problem is most
> likely with the storage layer..."
Yes, that helpful article is where I got the run-down on how best to
report what is going on here. I still believe this is an actual deadlock
bug and not a storage layer issue.

I have not seen any indication of problems with my storage layer. You'd
think there would be some scary-looking complaint on the console during
one of these deadlocks if the system had suddenly lost the ability to
communicate with most or all of the disks, but I've deadlocked at least
10 times now in 2013 and never seen anything of the sort. Thanks to
IPMI, I have actually viewed the console each time it has happened.

From: Travis Mikalson <bofh@terranova.net>
Date: Thu, 04 Jul 2013 09:49:56 -0400
To: freebsd-fs@freebsd.org
Cc: d@delphij.net, kib@freebsd.org
Subject: Re: Report: ZFS deadlock in 9-STABLE
Travis Mikalson wrote:
> Xin Li wrote:
>> Hi,
>>
>> Sorry for the top posting but I am quite convinced that this is a
>> known issue that we have seen with our customer. Please try applying
>> this patch [1] and please report back if that fixes your problem.
>>
>> Note that if you would like to provide more help, we would appreciate
>> that you test Konstantin's patch as well, at:
>> http://lists.freebsd.org/pipermail/freebsd-hackers/2013-May/042876.html
>
> I will apply both patches and see what happens. It will be a couple of
> weeks with no deadlocks before we get an idea if it was effective. (Or,
> god forbid, I come back with another different-looking deadlock.)

Actually, it looks like Konstantin's patch is already incorporated into
yours.

Konstantin's diff:

-	while (ithd->it_need) {
+	while (atomic_load_acq_int(&ithd->it_need)) {

Your diff:

-	while (ithd->it_need) {
+	while (atomic_load_acq_int(&ithd->it_need) != 0) {

>> [1] See attachment; the commit is
>> https://github.com/trueos/trueos/commit/f678ae7c7f72fba577b00e3d0c237c4f297575c6
>>
>> On 07/03/13 09:40, Travis Mikalson wrote:
>>> [...]
From: Andriy Gapon <avg@FreeBSD.org>
Date: Thu, 04 Jul 2013 17:01:47 +0300
To: Travis Mikalson
Cc: freebsd-fs@FreeBSD.org
Subject: Re: Report: ZFS deadlock in 9-STABLE

on 04/07/2013 16:43 Travis Mikalson said the following:
> Yes, that helpful article is where I got the run-down on how best to
> report what was going on here. I still believe this is an actual
> deadlock bug and not a storage layer issue.
>
> I have not seen any indications of any problems with my storage layer.
> You'd think there would be some scary-looking complaint on the console
> during one of these deadlocks if it had suddenly lost the capability to
> communicate with most or all the disks, but I've deadlocked at least 10
> times now in 2013 and never anything of the sort. Thanks to IPMI, I have
> actually viewed the console each time it has happened.

Well, I do consider GEOM, CAM, and the drivers to be parts of the
storage layer. In other words, everything below ZFS.
-- 
Andriy Gapon

From: Travis Mikalson <bofh@terranova.net>
Date: Thu, 04 Jul 2013 10:30:17 -0400
To: freebsd-fs@FreeBSD.org
Subject: Re: Report: ZFS deadlock in 9-STABLE

Andriy Gapon wrote:
> on 04/07/2013 16:43 Travis Mikalson said the following:
>> Yes, that helpful article is where I got the run-down on how best to
>> report what was going on here. I still believe this is an actual
>> deadlock bug and not a storage layer issue.
>>
>> I have not seen any indications of any problems with my storage layer.
>> You'd think there would be some scary-looking complaint on the console
>> during one of these deadlocks if it had suddenly lost the capability to
>> communicate with most or all the disks, but I've deadlocked at least 10
>> times now in 2013 and never anything of the sort. Thanks to IPMI, I have
>> actually viewed the console each time it has happened.
>
> Well, I do consider GEOM, CAM, drivers to be parts of the storage layer.
> In other words, everything below ZFS.

Ah, I believe I understand. It's not necessarily a hardware issue (which
is what I took away from the original verbiage); the deadlock may have
occurred in other parts of the storage layer.

FWIW, the simple UFS compact flash that I boot from also becomes
inaccessible during these deadlocks. All UFS and ZFS storage goes dead
simultaneously. If it were purely a ZFS issue, I suppose one might
expect still to be able to read from the UFS filesystem.

From: Jeremy Chadwick <jdc@koitsu.org>
Date: Thu, 4 Jul 2013 07:55:48 -0700
To: Travis Mikalson
Cc: freebsd-fs@FreeBSD.org
Subject: Re: Report: ZFS deadlock in 9-STABLE
On Thu, Jul 04, 2013 at 10:30:17AM -0400, Travis Mikalson wrote:
> Andriy Gapon wrote:
> > on 04/07/2013 16:43 Travis Mikalson said the following:
> >> Yes, that helpful article is where I got the run-down on how best to
> >> report what was going on here. I still believe this is an actual
> >> deadlock bug and not a storage layer issue.
> >>
> >> I have not seen any indications of any problems with my storage layer.
> >> You'd think there would be some scary-looking complaint on the console
> >> during one of these deadlocks if it had suddenly lost the capability to
> >> communicate with most or all the disks, but I've deadlocked at least 10
> >> times now in 2013 and never anything of the sort. Thanks to IPMI, I have
> >> actually viewed the console each time it has happened.
> >
> > Well, I do consider GEOM, CAM, drivers to be parts of the storage layer.
> > In other words, everything below ZFS.
>
> Ah, I believe I understand.
It's not necessarily a hardware issue (which
> is what I took away from the original verbiage); the deadlock may have
> occurred in other parts of the storage layer.
>
> FWIW, my simple UFS compact flash that I boot from also becomes
> inaccessible during these deadlocks. All UFS and ZFS storage goes dead
> simultaneously. If it were purely a ZFS issue, I suppose one might
> expect to still be able to read from their UFS filesystem.

I'd like to get output from all of these commands:

- dmesg (you can hide/XXX out the system name if you want, but please
  don't remove anything else, barring IP addresses/etc.)
- zpool get all
- zfs get all
- "gpart show -p" for every disk on the system
- "vmstat -i" when the system is livelocked (if possible; see below)
- The exact brand and model string of the mps(4) controllers you're using
- The exact firmware version and firmware type (often a 2-letter code)
  you're using on your mps(4) controllers (dmesg might show some of this,
  but possibly not all)
- Is powerd(8) running on this system at all?

Please put these in separate files and upload them to
http://tog.net/freebsd/ if you could. (For the gpart output, you can put
all the output from all the disks in a single file.)

I can see your ZFS disks are probably using those mps(4) controllers. I
also see you have an AHCI controller. I know you can't move all your
disks to the AHCI controller because there aren't enough ports, and the
controller might not even work with SAS disks (it depends; some
newer/higher-end Intel ones do), but:

A "CF drive locking up too" doesn't really tell us anything about the CF
drive, how it's hooked up, etc. But I'd rather not even go into that,
because:

Advice: Hook a SATA disk up to your ahci(4) controller and just leave it
there. No filesystem, just a raw disk sitting on a bus.
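Such a spare raw disk gives a quick GEOM/CAM liveness probe during a
hang. A sketch of the check (the device name ada0 is an assumption; on
FreeBSD, Ctrl-T sends SIGINFO to the foreground process, which makes dd
print its progress):

```shell
# Probe whether the storage layers below ZFS still service I/O.
# /dev/ada0 is an assumption -- check dmesg for the spare disk's name.
dd if=/dev/ada0 of=/dev/null bs=64k &
sleep 2
# Same effect as pressing Ctrl-T on a foreground dd: if GEOM/CAM are
# alive, dd reports records in/out immediately.
kill -INFO $!
```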
When the livelock happens, in another window issue "dd if=/dev/ada0
of=/dev/null bs=64k" (the disk might not be named ada0; again, we need
that dmesg) and after a second or two press Ctrl-T to see if you get any
output (output should be immediate). If you do get output, it means GEOM
and/or CAM are still functional in some manner, and that puts more focus
on the mps(4) side of things. There are still nearly infinite
explanations for what's going on, though. Which leads me to...

Question: If the system is livelocked, how are you running "procstat -kk
-a" in the first place? Or does it "livelock" and then release itself
from the pain (eventually), only to re-lock later? A "livelock" usually
implies the system is alive in some way (hitting NumLock on the keyboard
(hopefully PS/2) still toggles the LED; the kernel does this, and I've
used it for years as a way to tell whether a system is locked up), just
that some layer pertaining to your focus (ZFS I/O) is wonky. If it comes
and goes, there may be some explanations for that, but output from those
commands would greatly help.

Question: What's with the tunings in loader.conf and sysctl.conf for
ZFS? I'm not saying those are the issue, just asking why you're setting
them at all. Is there something we need to know about that you've run
into in the past?

-- 
| Jeremy Chadwick                                   jdc@koitsu.org |
| UNIX Systems Administrator                http://jdc.koitsu.org/ |
| Making life hard for others since 1977.
PGP 4BD6C0CB |

From: Ollivier Robert <roberto@keltia.freenix.fr>
Date: Thu, 4 Jul 2013 17:13:52 +0200
To: freebsd-fs@freebsd.org, Dmitry Morozovsky
Subject: Re: boot from ZFS: which pool types use?

According to Andriy Gapon on Thu, Jul 04, 2013 at 03:35:37PM +0300:
> Setting bootfs should not be required. If your root filesystem is your root
> dataset (like "tank"), then everything should have just worked.

Except that it is a better idea to have a dataset separate from the root
of the pool, in order to be able to switch from one to another when
upgrading, right?
-- 
Ollivier ROBERT -=- FreeBSD: The Power to Serve! -=- roberto@keltia.net
In memoriam to Ondine, our 2nd child: http://ondine.keltia.net/
From: Kimmo Paasiala To: Ollivier Robert Content-Type: text/plain; charset=UTF-8 Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 04 Jul 2013 15:28:22 -0000 On Thu, Jul 4, 2013 at 6:13 PM, Ollivier Robert wrote: > According to Andriy Gapon on Thu, Jul 04, 2013 at 03:35:37PM +0300: >> Setting bootfs should not be required. If your root filesystem is your root >> dataset (like "tank"), then everything should have just worked. > > Except that it is a better idea to have a dataset separated from the root of the pool in order to be able to switch from one to another when upgrading, right? > > -- If you use the Solaris boot environments (the FreeBSD port is called sysutils/beadm) they require that the datasets are structured exactly in a certain way. For example: pool/ROOT/default pool/ROOT/default/usr ... pool/ROOT/test pool/ROOT/test/usr etc. To be able to boot from any of the boot environments you have to set the bootfs property (or set vfs.root.mountfrom in loader.conf but that's terribly clumsy with boot environments). 
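[Editor's note: a minimal sketch of the dataset layout Kimmo describes, as shell commands; the pool name "pool" and the environment names are placeholders, and in practice beadm creates and clones these datasets for you.]

```shell
# Sketch of a beadm-style layout (pool name "pool" is a placeholder).
# Every boot environment is a dataset tree under pool/ROOT/<name>.
zfs create -o mountpoint=none pool/ROOT
zfs create -o mountpoint=/ pool/ROOT/default
zfs create pool/ROOT/default/usr
# The boot blocks locate the loader via the pool's bootfs property:
zpool set bootfs=pool/ROOT/default pool
# Activating another environment is then a single property change:
zpool set bootfs=pool/ROOT/test pool
```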
-Kimmo From owner-freebsd-fs@FreeBSD.ORG Thu Jul 4 15:50:03 2013 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id C3923DAB; Thu, 4 Jul 2013 15:50:03 +0000 (UTC) (envelope-from marck@rinet.ru) Received: from woozle.rinet.ru (woozle.rinet.ru [195.54.192.68]) by mx1.freebsd.org (Postfix) with ESMTP id 37CB71DC7; Thu, 4 Jul 2013 15:50:02 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by woozle.rinet.ru (8.14.5/8.14.5) with ESMTP id r64Fo11U055390; Thu, 4 Jul 2013 19:50:01 +0400 (MSK) (envelope-from marck@rinet.ru) Date: Thu, 4 Jul 2013 19:50:01 +0400 (MSK) From: Dmitry Morozovsky To: Andriy Gapon Subject: Re: boot from ZFS: which pool types use? In-Reply-To: <51D57305.2010709@FreeBSD.org> Message-ID: References: <51D56066.1020902@FreeBSD.org> <51D56C19.8080103@FreeBSD.org> <51D57305.2010709@FreeBSD.org> User-Agent: Alpine 2.00 (BSF 1167 2008-08-23) X-NCC-RegID: ru.rinet X-OpenPGP-Key-ID: 6B691B03 MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.4.3 (woozle.rinet.ru [0.0.0.0]); Thu, 04 Jul 2013 19:50:01 +0400 (MSK) Cc: freebsd-fs@FreeBSD.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 04 Jul 2013 15:50:03 -0000 On Thu, 4 Jul 2013, Andriy Gapon wrote: > on 04/07/2013 15:43 Dmitry Morozovsky said the following: > > it was/is (and, as I previously stated, is the only ZFS dataset on the > > machine), but unfortunately without explicit setting bootfs property does not > > work :( > > This is some confusing wording. We talk about _pools_ on a machine and we talk > about _datasets_ in a pool. So I am not 100% sure what you mean. 
Whether you > have a single pool, or whether you have a single dataset/filesystem in a pool, > or both. > > Hint: output of commands is usually better than a bunch of free-form text :) Ah, I see the point, and the root dataset of the pool is *not* the root in my case: root@briareus:/usr/src# zfs list NAME USED AVAIL REFER MOUNTPOINT br 13.3G 1.05T 31K legacy br/R 13.3G 1.05T 389M / br/R/db 8.30G 1.05T 8.30G /db br/R/usr 4.51G 1.05T 3.44G /usr br/R/usr/local 363M 1.05T 363M /usr/local br/R/usr/ports 416M 1.05T 302M /usr/ports br/R/usr/ports/distfiles 113M 1.05T 113M /usr/ports/distfiles br/R/usr/src 317M 1.05T 317M /usr/src br/R/var 122M 1.05T 122M /var root@briareus:/usr/src# zpool get bootfs NAME PROPERTY VALUE SOURCE br bootfs br/R local /R is to have single mountpoint and possibility to have other root datasets to use as non-mounted (iSCSI volumes, etc) -- Sincerely, D.Marck [DM5020, MCK-RIPE, DM3-RIPN] [ FreeBSD committer: marck@FreeBSD.org ] ------------------------------------------------------------------------ *** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- marck@rinet.ru *** ------------------------------------------------------------------------ From owner-freebsd-fs@FreeBSD.ORG Thu Jul 4 15:51:54 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 062C4E6E; Thu, 4 Jul 2013 15:51:54 +0000 (UTC) (envelope-from marck@rinet.ru) Received: from woozle.rinet.ru (woozle.rinet.ru [195.54.192.68]) by mx1.freebsd.org (Postfix) with ESMTP id 898E21DE0; Thu, 4 Jul 2013 15:51:53 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by woozle.rinet.ru (8.14.5/8.14.5) with ESMTP id r64FpqtI055476; Thu, 4 Jul 2013 19:51:52 +0400 (MSK) (envelope-from marck@rinet.ru) Date: Thu, 4 Jul 2013 19:51:52 +0400 (MSK) From: Dmitry Morozovsky To: Volodymyr Kostyrko Subject: Re: ZFS default compression algo for contemporary FreeBSD versions 
In-Reply-To: <51D576E1.6030803@gmail.com> Message-ID: References: <51D576E1.6030803@gmail.com> User-Agent: Alpine 2.00 (BSF 1167 2008-08-23) X-NCC-RegID: ru.rinet X-OpenPGP-Key-ID: 6B691B03 MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.4.3 (woozle.rinet.ru [0.0.0.0]); Thu, 04 Jul 2013 19:51:52 +0400 (MSK) Cc: freebsd-fs@freebsd.org, avg@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 04 Jul 2013 15:51:54 -0000 On Thu, 4 Jul 2013, Volodymyr Kostyrko wrote: > 04.07.2013 15:22, Dmitry Morozovsky wrote: > > Colleagues, > > > > is it sane to just set 'zfs compression=on dataset' to achieve best algo on > > fresh FreeBSD systems (-current and/or stable/9)? > > No and this is not safe AFAIK. Default compression is still lzjb and > bootloader can't boot off datasets compressed with lzjb. However on stable/9 > you can simply set zfs compression=lz4 pool and everything would work fine if > you updated the boot loader. I did not intend to compress root/boot datasets (and there is not much sense in this AFAICS); my second (and actually more important) question is -- is lz4 in general better than lzjb? 
-- Sincerely, D.Marck [DM5020, MCK-RIPE, DM3-RIPN] [ FreeBSD committer: marck@FreeBSD.org ] ------------------------------------------------------------------------ *** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- marck@rinet.ru *** ------------------------------------------------------------------------ From owner-freebsd-fs@FreeBSD.ORG Thu Jul 4 15:52:19 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 1E6DAEE1; Thu, 4 Jul 2013 15:52:19 +0000 (UTC) (envelope-from marck@rinet.ru) Received: from woozle.rinet.ru (woozle.rinet.ru [195.54.192.68]) by mx1.freebsd.org (Postfix) with ESMTP id A15881DE6; Thu, 4 Jul 2013 15:52:18 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by woozle.rinet.ru (8.14.5/8.14.5) with ESMTP id r64FqHAt055500; Thu, 4 Jul 2013 19:52:17 +0400 (MSK) (envelope-from marck@rinet.ru) Date: Thu, 4 Jul 2013 19:52:17 +0400 (MSK) From: Dmitry Morozovsky To: Volodymyr Kostyrko Subject: Re: boot from ZFS: which pool types use? In-Reply-To: <51D577A9.1030304@gmail.com> Message-ID: References: <51D56066.1020902@FreeBSD.org> <51D577A9.1030304@gmail.com> User-Agent: Alpine 2.00 (BSF 1167 2008-08-23) X-NCC-RegID: ru.rinet X-OpenPGP-Key-ID: 6B691B03 MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.4.3 (woozle.rinet.ru [0.0.0.0]); Thu, 04 Jul 2013 19:52:17 +0400 (MSK) Cc: freebsd-fs@freebsd.org, Andriy Gapon X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 04 Jul 2013 15:52:19 -0000 On Thu, 4 Jul 2013, Volodymyr Kostyrko wrote: > > > > Can't find /boot/zfsloader > > > > > > Does this file exist in the filesystem pointed to by bootfs property (if > > > set)? > > > > Arghh!!! 
I missed to set this one (however, this is the only zfs pool on the > > machine -- shouldn't the loader assume it is safe to try to boot off?) > > > > Regarding your other questions: > > That's weird. On all my machines I always set only vfs.root.mountfrom in > /boot/loader.conf to point to dataset containing root fs. In my last case, this was not enough. -- Sincerely, D.Marck [DM5020, MCK-RIPE, DM3-RIPN] [ FreeBSD committer: marck@FreeBSD.org ] ------------------------------------------------------------------------ *** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- marck@rinet.ru *** ------------------------------------------------------------------------ From owner-freebsd-fs@FreeBSD.ORG Thu Jul 4 15:54:14 2013 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id DD443106 for ; Thu, 4 Jul 2013 15:54:14 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 33D311E0F for ; Thu, 4 Jul 2013 15:54:13 +0000 (UTC) Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id SAA12717; Thu, 04 Jul 2013 18:53:50 +0300 (EEST) (envelope-from avg@FreeBSD.org) Received: from localhost ([127.0.0.1]) by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1Uulqv-000Evg-NZ; Thu, 04 Jul 2013 18:53:49 +0300 Message-ID: <51D59A55.1050807@FreeBSD.org> Date: Thu, 04 Jul 2013 18:52:53 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:17.0) Gecko/20130405 Thunderbird/17.0.5 MIME-Version: 1.0 To: Dmitry Morozovsky Subject: Re: boot from ZFS: which pool types use? 
References: <51D56066.1020902@FreeBSD.org> <51D56C19.8080103@FreeBSD.org> <51D57305.2010709@FreeBSD.org> In-Reply-To: X-Enigmail-Version: 1.5.1 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: freebsd-fs@FreeBSD.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 04 Jul 2013 15:54:14 -0000 on 04/07/2013 18:50 Dmitry Morozovsky said the following: > On Thu, 4 Jul 2013, Andriy Gapon wrote: > >> on 04/07/2013 15:43 Dmitry Morozovsky said the following: >>> it was/is (and, as I previously stated, is the only ZFS dataset on the >>> machine), but unfortunately without explicit setting bootfs property does not >>> work :( >> >> This is some confusing wording. We talk about _pools_ on a machine and we talk >> about _datasets_ in a pool. So I am not 100% sure what you mean. Whether you >> have a single pool, or whether you have a single dataset/filesystem in a pool, >> or both. >> >> Hint: output of commands is usually better than a bunch of free-form text :) > > Ah, I see the point, and the root dataset of the pool is *not* the root in my > case: > > root@briareus:/usr/src# zfs list > NAME USED AVAIL REFER MOUNTPOINT > br 13.3G 1.05T 31K legacy > br/R 13.3G 1.05T 389M / > br/R/db 8.30G 1.05T 8.30G /db > br/R/usr 4.51G 1.05T 3.44G /usr > br/R/usr/local 363M 1.05T 363M /usr/local > br/R/usr/ports 416M 1.05T 302M /usr/ports > br/R/usr/ports/distfiles 113M 1.05T 113M /usr/ports/distfiles > br/R/usr/src 317M 1.05T 317M /usr/src > br/R/var 122M 1.05T 122M /var > root@briareus:/usr/src# zpool get bootfs > NAME PROPERTY VALUE SOURCE > br bootfs br/R local > > /R is to have single mountpoint and possibility to have other root datasets to > use as non-mounted (iSCSI volumes, etc) > Right. So in this case it is purely a pilot error. 
ZFS can not know that among the available datasets br/R is the root fs. Thus bootfs must be set. -- Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Thu Jul 4 15:55:42 2013 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id B087318E for ; Thu, 4 Jul 2013 15:55:42 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id E2B0B1E22 for ; Thu, 4 Jul 2013 15:55:41 +0000 (UTC) Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id SAA12742; Thu, 04 Jul 2013 18:55:17 +0300 (EEST) (envelope-from avg@FreeBSD.org) Received: from localhost ([127.0.0.1]) by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1UulsK-000Evt-OE; Thu, 04 Jul 2013 18:55:16 +0300 Message-ID: <51D59AAD.3030208@FreeBSD.org> Date: Thu, 04 Jul 2013 18:54:21 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:17.0) Gecko/20130405 Thunderbird/17.0.5 MIME-Version: 1.0 To: Dmitry Morozovsky Subject: Re: boot from ZFS: which pool types use? References: <51D56066.1020902@FreeBSD.org> <51D577A9.1030304@gmail.com> In-Reply-To: X-Enigmail-Version: 1.5.1 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: freebsd-fs@FreeBSD.org, Volodymyr Kostyrko X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 04 Jul 2013 15:55:42 -0000 on 04/07/2013 18:52 Dmitry Morozovsky said the following: > On Thu, 4 Jul 2013, Volodymyr Kostyrko wrote: > >>>>> Can't find /boot/zfsloader >>>> >>>> Does this file exist in the filesystem pointed to by bootfs property (if >>>> set)? >>> >>> Arghh!!! 
I missed to set this one (however, this is the only zfs pool on the >>> machine -- shouldn't the loader assume it is safe to try to boot off?) >>> >>> Regarding your other questions: >> >> That's weird. On all my machines I always set only vfs.root.mountfrom in >> /boot/loader.conf to point to dataset containing root fs. > > In my last case, this was not enough. > And this is redundant if bootfs is correctly set. -- Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Thu Jul 4 15:57:35 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 90767226; Thu, 4 Jul 2013 15:57:35 +0000 (UTC) (envelope-from c.kworr@gmail.com) Received: from mail-la0-x22e.google.com (mail-la0-x22e.google.com [IPv6:2a00:1450:4010:c03::22e]) by mx1.freebsd.org (Postfix) with ESMTP id DE2B11E39; Thu, 4 Jul 2013 15:57:34 +0000 (UTC) Received: by mail-la0-f46.google.com with SMTP id eg20so1356643lab.5 for ; Thu, 04 Jul 2013 08:57:33 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=message-id:date:from:user-agent:mime-version:to:cc:subject :references:in-reply-to:content-type:content-transfer-encoding; bh=SPlrlkZLIwdyQJdUQokBF7ClkyVPB8OE9hKZseo860U=; b=0vsho0bChouImFsyMLRUp4hAfqV1p3PRfZO0mbMMC8+Nce15rj14TY4VStEBQTxEcD ivhQ+B1EUkm03pkFUmZTnq0vi2JzYJhgQiqzH9MdM0oxOF2NnWcNq4QY9OPcoZ1Ee47a 8ke6upbNIqj54Bv1HLzbkhp6KLzc1MLX7JStTYbKfnTpnNQnKnz7UiOM+z9+MrLm+SsA TApm8XEXcZodWs3IqreLFVWRd0KcJcTlSGp4w7QXUmACm5BFfpw40cbIoZs2nIl843zu AgAxsDzT92jGrxjfTixxNbBs+1x1nyQWZoDeP/osC5ljYxFcRYPBhElFfdfF1gyz3DJE 8h+g== X-Received: by 10.112.28.48 with SMTP id y16mr3888582lbg.37.1372953453715; Thu, 04 Jul 2013 08:57:33 -0700 (PDT) Received: from [192.168.1.139] (mau.donbass.com. 
[92.242.127.250]) by mx.google.com with ESMTPSA id et10sm1487139lbc.6.2013.07.04.08.57.32 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Thu, 04 Jul 2013 08:57:33 -0700 (PDT) Message-ID: <51D59B6C.5030600@gmail.com> Date: Thu, 04 Jul 2013 18:57:32 +0300 From: Volodymyr Kostyrko User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:17.0) Gecko/20130627 Thunderbird/17.0.7 MIME-Version: 1.0 To: Dmitry Morozovsky Subject: Re: ZFS default compression algo for contemporary FreeBSD versions References: <51D576E1.6030803@gmail.com> In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-fs@freebsd.org, avg@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 04 Jul 2013 15:57:35 -0000 04.07.2013 18:51, Dmitry Morozovsky wrote: > On Thu, 4 Jul 2013, Volodymyr Kostyrko wrote: > >> 04.07.2013 15:22, Dmitry Morozovsky wrote: >>> Collegues, >>> >>> is it sane to just set 'zfs compression=on dataset' to achieve best algo on >>> fresh FreeBSD systems (-current and/or stable/9)? >> >> No and this is not safe AFAIK. Default compression is still lzjb and >> bootloader can't boot oof datasets compressed with lzjb. However on stable/9 >> you can simply set zfs compression=lz4 pool and everything would work fine if >> you updated the boot loader. > > I did not intend to compress root/boot datasets (and there is no much sense in > this AFAICS); > > the second (and actually more important) my question is -- is lz4 in general > better than lzjb? Yes. Much better in terms of speed. -- Sphinx of black quartz, judge my vow. 
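[Editor's note: for reference, switching an existing stable/9 pool to lz4 as discussed in this thread would look roughly like the sketch below. The pool name "tank" is a placeholder; the feature flag assumes the pool has been upgraded to a feature-flags pool version, and the boot blocks must be updated before compressing anything the loader has to read.]

```shell
# Enable the lz4 feature on the pool, then select it as the compression algo.
zpool set feature@lz4_compress=enabled tank
zfs set compression=lz4 tank
# Verify; child datasets inherit the property unless overridden.
zfs get -r compression tank
```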
From owner-freebsd-fs@FreeBSD.ORG Thu Jul 4 16:03:38 2013 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 27379767 for ; Thu, 4 Jul 2013 16:03:38 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 558F91EAC for ; Thu, 4 Jul 2013 16:03:36 +0000 (UTC) Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id TAA12837; Thu, 04 Jul 2013 19:03:12 +0300 (EEST) (envelope-from avg@FreeBSD.org) Received: from localhost ([127.0.0.1]) by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1Uum00-000Ews-1p; Thu, 04 Jul 2013 19:03:12 +0300 Message-ID: <51D59C88.9060403@FreeBSD.org> Date: Thu, 04 Jul 2013 19:02:16 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:17.0) Gecko/20130405 Thunderbird/17.0.5 MIME-Version: 1.0 To: Dmitry Morozovsky Subject: Re: ZFS default compression algo for contemporary FreeBSD versions References: <51D576E1.6030803@gmail.com> <51D59B6C.5030600@gmail.com> In-Reply-To: <51D59B6C.5030600@gmail.com> X-Enigmail-Version: 1.5.1 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Cc: freebsd-fs@FreeBSD.org, Volodymyr Kostyrko X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 04 Jul 2013 16:03:38 -0000 on 04/07/2013 18:57 Volodymyr Kostyrko said the following: > Yes. Much better in terms of speed. And compression too. And the code is much younger. Keep this in mind just in case. 
-- Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Thu Jul 4 16:05:00 2013 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 259B17EF; Thu, 4 Jul 2013 16:05:00 +0000 (UTC) (envelope-from marck@rinet.ru) Received: from woozle.rinet.ru (woozle.rinet.ru [195.54.192.68]) by mx1.freebsd.org (Postfix) with ESMTP id A79621EBF; Thu, 4 Jul 2013 16:04:59 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by woozle.rinet.ru (8.14.5/8.14.5) with ESMTP id r64G4wBK056115; Thu, 4 Jul 2013 20:04:58 +0400 (MSK) (envelope-from marck@rinet.ru) Date: Thu, 4 Jul 2013 20:04:58 +0400 (MSK) From: Dmitry Morozovsky To: Andriy Gapon Subject: Re: boot from ZFS: which pool types use? In-Reply-To: <51D59AAD.3030208@FreeBSD.org> Message-ID: References: <51D56066.1020902@FreeBSD.org> <51D577A9.1030304@gmail.com> <51D59AAD.3030208@FreeBSD.org> User-Agent: Alpine 2.00 (BSF 1167 2008-08-23) X-NCC-RegID: ru.rinet X-OpenPGP-Key-ID: 6B691B03 MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.4.3 (woozle.rinet.ru [0.0.0.0]); Thu, 04 Jul 2013 20:04:58 +0400 (MSK) Cc: freebsd-fs@FreeBSD.org, Volodymyr Kostyrko X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 04 Jul 2013 16:05:00 -0000 On Thu, 4 Jul 2013, Andriy Gapon wrote: > >>>>> Can't find /boot/zfsloader > >>>> > >>>> Does this file exist in the filesystem pointed to by bootfs property (if > >>>> set)? > >>> > >>> Arghh!!! I missed to set this one (however, this is the only zfs pool on the > >>> machine -- shouldn't the loader assume it is safe to try to boot off?) > >>> > >>> Regarding your other questions: > >> > >> That's weird. 
On all my machines I always set only vfs.root.mountfrom in > >> /boot/loader.conf to point to dataset containing root fs. > > > > In my last case, this was not enough. > > > > And this is redundant if bootfs is correctly set. However: setting this in boot.loader (or manually in loader prompt) is not enough to have bootable system And this, I suppose, could be attacked to reduce possibility of user-configuration errors: not having a possibility to point at loader is a bit disappointing, you see ;) -- Sincerely, D.Marck [DM5020, MCK-RIPE, DM3-RIPN] [ FreeBSD committer: marck@FreeBSD.org ] ------------------------------------------------------------------------ *** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- marck@rinet.ru *** ------------------------------------------------------------------------ From owner-freebsd-fs@FreeBSD.ORG Thu Jul 4 16:05:45 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 21A1E86B; Thu, 4 Jul 2013 16:05:45 +0000 (UTC) (envelope-from marck@rinet.ru) Received: from woozle.rinet.ru (woozle.rinet.ru [195.54.192.68]) by mx1.freebsd.org (Postfix) with ESMTP id A37E41ECA; Thu, 4 Jul 2013 16:05:44 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by woozle.rinet.ru (8.14.5/8.14.5) with ESMTP id r64G5hRD056185; Thu, 4 Jul 2013 20:05:43 +0400 (MSK) (envelope-from marck@rinet.ru) Date: Thu, 4 Jul 2013 20:05:43 +0400 (MSK) From: Dmitry Morozovsky To: Volodymyr Kostyrko Subject: Re: ZFS default compression algo for contemporary FreeBSD versions In-Reply-To: <51D59B6C.5030600@gmail.com> Message-ID: References: <51D576E1.6030803@gmail.com> <51D59B6C.5030600@gmail.com> User-Agent: Alpine 2.00 (BSF 1167 2008-08-23) X-NCC-RegID: ru.rinet X-OpenPGP-Key-ID: 6B691B03 MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.4.3 (woozle.rinet.ru 
[0.0.0.0]); Thu, 04 Jul 2013 20:05:43 +0400 (MSK) Cc: freebsd-fs@freebsd.org, avg@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 04 Jul 2013 16:05:45 -0000 On Thu, 4 Jul 2013, Volodymyr Kostyrko wrote: > > > > is it sane to just set 'zfs compression=on dataset' to achieve best algo > > > > on > > > > fresh FreeBSD systems (-current and/or stable/9)? > > > > > > No and this is not safe AFAIK. Default compression is still lzjb and > > > bootloader can't boot oof datasets compressed with lzjb. However on > > > stable/9 > > > you can simply set zfs compression=lz4 pool and everything would work fine > > > if > > > you updated the boot loader. > > > > I did not intend to compress root/boot datasets (and there is no much sense > > in > > this AFAICS); > > > > the second (and actually more important) my question is -- is lz4 in general > > better than lzjb? > > Yes. Much better in terms of speed. 
Then, the next logical step, it seems to me, is to make lz4 the default ;-P -- Sincerely, D.Marck [DM5020, MCK-RIPE, DM3-RIPN] [ FreeBSD committer: marck@FreeBSD.org ] ------------------------------------------------------------------------ *** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- marck@rinet.ru *** ------------------------------------------------------------------------ From owner-freebsd-fs@FreeBSD.ORG Thu Jul 4 16:26:53 2013 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 523E3D13 for ; Thu, 4 Jul 2013 16:26:53 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 9A3B11FC3 for ; Thu, 4 Jul 2013 16:26:52 +0000 (UTC) Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id TAA13227; Thu, 04 Jul 2013 19:26:27 +0300 (EEST) (envelope-from avg@FreeBSD.org) Received: from localhost ([127.0.0.1]) by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1UumMV-000Ez2-Gw; Thu, 04 Jul 2013 19:26:27 +0300 Message-ID: <51D5A20F.4070103@FreeBSD.org> Date: Thu, 04 Jul 2013 19:25:51 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:17.0) Gecko/20130405 Thunderbird/17.0.5 MIME-Version: 1.0 To: Dmitry Morozovsky Subject: Re: boot from ZFS: which pool types use? 
References: <51D56066.1020902@FreeBSD.org> <51D577A9.1030304@gmail.com> <51D59AAD.3030208@FreeBSD.org> In-Reply-To: X-Enigmail-Version: 1.5.1 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: freebsd-fs@FreeBSD.org, Volodymyr Kostyrko X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 04 Jul 2013 16:26:53 -0000 on 04/07/2013 19:04 Dmitry Morozovsky said the following: > However: setting this in boot.loader (or manually in loader prompt) is not > enough to have bootable system > > And this, I suppose, could be attacked to reduce possibility of > user-configuration errors: not having a possibility to point at loader is a bit > disappointing, you see ;) I am confused with this's and that's and unknown file names. What exactly did you try and what exactly did not work? Please note that vfs.root.mountfrom tells loader to tell kernel from where it should mount root fs. It can not tell earlier boot blocks where to find loader itself. 
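[Editor's note: Andriy's point is that the same dataset has to be named in two unrelated places, read at two different boot stages. A small sketch (the helper name is made up) that prints both settings for the pool and dataset from Dmitry's output:]

```shell
#!/bin/sh
# Hypothetical helper: print the two places a ZFS root dataset is named.
# bootfs is read by the boot blocks to locate /boot/zfsloader;
# vfs.root.mountfrom is read by the loader to tell the kernel its root fs.
zfs_boot_settings() {
    pool=$1; dataset=$2
    printf 'zpool set bootfs=%s %s\n' "$dataset" "$pool"
    printf 'vfs.root.mountfrom="zfs:%s"\n' "$dataset"
}

# For the pool "br" with root dataset "br/R" from the zfs list output above:
zfs_boot_settings br br/R
```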
-- Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Thu Jul 4 16:57:14 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 7CED260A for ; Thu, 4 Jul 2013 16:57:14 +0000 (UTC) (envelope-from mxb@alumni.chalmers.se) Received: from mail-la0-f53.google.com (mail-la0-f53.google.com [209.85.215.53]) by mx1.freebsd.org (Postfix) with ESMTP id DFBEB1127 for ; Thu, 4 Jul 2013 16:57:13 +0000 (UTC) Received: by mail-la0-f53.google.com with SMTP id fs12so1412487lab.12 for ; Thu, 04 Jul 2013 09:57:12 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=content-type:mime-version:subject:from:in-reply-to:date:cc :content-transfer-encoding:message-id:references:to:x-mailer :x-gm-message-state; bh=5KIEsliP57LCZ+hb6aLTMl4AupmmPvQIdSbiTgoLkSY=; b=INpAdr/0lbf1/dUueL1R8QJK8EPeD/zCpuyBIeksLO8P5TStqm0CiQBJ5hYmkjbGbC ZDSpNbb2HkmKfWZNGSiAOoK0ZV3N6sEwV7Yypf+XH1yCmb+J1ay+Fkm5dYWP964dnbOQ qDkfnPrWG2ZxJdhizcpHG/H1T2zlQiuwGQhD6lKMbRpRgwUDqeWCyyNl1kTwVV7Uao0Y k8mgbZCNeTQrNT0IMvwTHqaS/AXSz6w4V5Vc8U55sae4aICkzPVdu/YDYDDIdCn6Ls7V unlvinvKzwh5St6LWK7SDFsEFTtYdBfJirvcrmLeELGuXeeuF5qRm0ELVsqg8A4IYEV5 Lfag== X-Received: by 10.152.43.52 with SMTP id t20mr3238389lal.62.1372957032320; Thu, 04 Jul 2013 09:57:12 -0700 (PDT) Received: from grey.home.unixconn.com (h-74-23.a183.priv.bahnhof.se. 
[46.59.74.23]) by mx.google.com with ESMTPSA id p10sm1438834lap.8.2013.07.04.09.57.10 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Thu, 04 Jul 2013 09:57:11 -0700 (PDT) Content-Type: text/plain; charset=us-ascii Mime-Version: 1.0 (Mac OS X Mail 6.5 \(1508\)) Subject: Re: Slow resilvering with mirrored ZIL From: mxb In-Reply-To: <20130704000405.GA75529@icarus.home.lan> Date: Thu, 4 Jul 2013 18:57:09 +0200 Content-Transfer-Encoding: quoted-printable Message-Id: References: <51D42107.1050107@digsys.bg> <2EF46A8C-6908-4160-BF99-EC610B3EA771@alumni.chalmers.se> <51D437E2.4060101@digsys.bg> <20130704000405.GA75529@icarus.home.lan> To: Jeremy Chadwick X-Mailer: Apple Mail (2.1508) X-Gm-Message-State: ALoCoQkohRYV+3ZgJ8VdwOLkGQijKxOUBfFGGbcUBbJIjYM+7A4jMaL/9cV8cmMLQhc/tHJ6qIjs Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 04 Jul 2013 16:57:14 -0000 Well, I'v got a lot of errors in dmesg regarding one of four disks, = which is not currently replaced. After resilvering dropped to 2MB/s ( :O ), it stuck. I had to reboot = system. It came up and now continues resilvering process at 9/s (9 what??? = bytes? ) Here comes requested info: dmesg(plan was to re-new disks and then lift this sys up to STABLE): Copyright (c) 1992-2012 The FreeBSD Project. Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994 The Regents of the University of California. All rights = reserved. FreeBSD is a registered trademark of The FreeBSD Foundation. FreeBSD 9.1-RELEASE-p1 #2: Fri Mar 8 10:30:38 CET 2013 root@nas.home.unixconn.com:/usr/obj/usr/src/sys/GENERIC amd64 module_register: module pci/em already exists! Module pci/em failed to register: 17 module_register: module pci/lem already exists! 
Module pci/lem failed to register: 17 CPU: Intel(R) Xeon(R) CPU X5460 @ 3.16GHz (3166.74-MHz = K8-class CPU) Origin =3D "GenuineIntel" Id =3D 0x10676 Family =3D 6 Model =3D 17 = Stepping =3D 6 = Features=3D0xbfebfbff = Features2=3D0xce3bd AMD Features=3D0x20100800 AMD Features2=3D0x1 TSC: P-state invariant, performance statistics real memory =3D 34359738368 (32768 MB) avail memory =3D 33095909376 (31562 MB) Event timer "LAPIC" quality 400 ACPI APIC Table: FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs FreeBSD/SMP: 1 package(s) x 4 core(s) cpu0 (BSP): APIC ID: 0 cpu1 (AP): APIC ID: 1 cpu2 (AP): APIC ID: 2 cpu3 (AP): APIC ID: 3 ioapic0 irqs 0-23 on motherboard ioapic1 irqs 24-47 on motherboard kbd1 at kbdmux0 ctl: CAM Target Layer loaded acpi0: on motherboard acpi0: Power Button (fixed) cpu0: on acpi0 cpu1: on acpi0 cpu2: on acpi0 cpu3: on acpi0 atrtc0: port 0x70-0x71 irq 8 on acpi0 Event timer "RTC" frequency 32768 Hz quality 0 attimer0: port 0x40-0x43,0x50-0x53 irq 0 on acpi0 Timecounter "i8254" frequency 1193182 Hz quality 0 Event timer "i8254" frequency 1193182 Hz quality 100 Timecounter "ACPI-fast" frequency 3579545 Hz quality 900 acpi_timer0: <24-bit timer at 3.579545MHz> port 0x1008-0x100b on acpi0 pcib0: port 0xcf8-0xcff on acpi0 pci0: on pcib0 pcib1: at device 2.0 on pci0 pci1: on pcib1 pcib2: irq 16 at device 0.0 on pci1 pci2: on pcib2 pcib3: irq 16 at device 0.0 on pci2 pci3: on pcib3 pcib4: at device 0.0 on pci3 pci4: on pcib4 pcib5: at device 0.2 on pci3 pci5: on pcib5 pcib6: irq 18 at device 2.0 on pci2 pci6: on pcib6 em0: port 0x2000-0x201f mem = 0xdc000000-0xdc01ffff irq 18 at device 0.0 on pci6 em0: Using an MSI interrupt em0: Ethernet address: 00:30:48:34:46:36 em1: port 0x2020-0x203f mem = 0xdc020000-0xdc03ffff irq 19 at device 0.1 on pci6 em1: Using an MSI interrupt em1: Ethernet address: 00:30:48:34:46:37 pcib7: at device 0.3 on pci1 pci7: on pcib7 pcib8: at device 4.0 on pci0 pci8: on pcib8 pcib9: at device 0.0 on pci8 pci9: on pcib9 bce0: 
mem 0xd8000000-0xd9ffffff irq 16 at device 4.0 on pci9
miibus0: on bce0
brgphy0: PHY 1 on miibus0
brgphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow
bce0: Ethernet address: 00:1b:78:38:2f:28
bce0: ASIC (0x57060020); Rev (A2); Bus (PCI-X, 64-bit, 100MHz); B/C (1.9.6); Bufs (RX:2;TX:2;PG:8); Flags (SPLT|MSI)
Coal (RX:6,6,18,18; TX:20,20,80,80)
pcib10: at device 0.2 on pci8
pci10: on pcib10
bce1: mem 0xda000000-0xdbffffff irq 17 at device 5.0 on pci10
miibus1: on bce1
brgphy1: PHY 1 on miibus1
brgphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow
bce1: Ethernet address: 00:1b:78:38:2f:2a
bce1: ASIC (0x57060020); Rev (A2); Bus (PCI-X, 64-bit, 100MHz); B/C (1.9.6); Bufs (RX:2;TX:2;PG:8); Flags (SPLT|MSI)
Coal (RX:6,6,18,18; TX:20,20,80,80)
pcib11: at device 6.0 on pci0
pci11: on pcib11
pci0: at device 8.0 (no driver attached)
pcib12: irq 17 at device 28.0 on pci0
pci12: on pcib12
uhci0: port 0x1800-0x181f irq 17 at device 29.0 on pci0
usbus0 on uhci0
uhci1: port 0x1820-0x183f irq 19 at device 29.1 on pci0
usbus1 on uhci1
uhci2: port 0x1840-0x185f irq 18 at device 29.2 on pci0
usbus2 on uhci2
ehci0: mem 0xdc500000-0xdc5003ff irq 17 at device 29.7 on pci0
usbus3: EHCI version 1.0
usbus3 on ehci0
pcib13: at device 30.0 on pci0
pci13: on pcib13
vgapci0: port 0x3000-0x30ff mem 0xd0000000-0xd7ffffff,0xdc200000-0xdc20ffff irq 18 at device 1.0 on pci13
isab0: at device 31.0 on pci0
isa0: on isab0
ahci0: port 0x1890-0x1897,0x1884-0x1887,0x1888-0x188f,0x1880-0x1883,0x1860-0x187f mem 0xdc500400-0xdc5007ff irq 19 at device 31.2 on pci0
ahci0: AHCI v1.10 with 6 3Gbps ports, Port Multiplier supported
ahcich0: at channel 0 on ahci0
ahcich1: at channel 1 on ahci0
ahcich2: at channel 2 on ahci0
ahcich3: at channel 3 on ahci0
ahcich4: at channel 4 on ahci0
ahcich5: at channel 5 on ahci0
pci0: at device 31.3 (no driver attached)
acpi_button0: on acpi0
atkbdc0: port 0x60,0x64 irq 1 on acpi0
atkbd0: irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
uart0: <16550 or compatible> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
uart1: <16550 or compatible> port 0x2f8-0x2ff irq 3 on acpi0
fdc0: port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on acpi0
fdc0: does not respond
device_attach: fdc0 attach returned 6
ppc0: port 0x378-0x37f,0x778-0x77f irq 7 drq 3 on acpi0
ppc0: SMC-like chipset (ECP/EPP/PS2/NIBBLE) in COMPATIBLE mode
ppc0: FIFO with 16/16/9 bytes threshold
ppbus0: on ppc0
plip0: on ppbus0
lpt0: on ppbus0
lpt0: Interrupt-driven port
ppi0: on ppbus0
orm0: at iomem 0xc0000-0xcafff,0xcb000-0xccfff,0xcd000-0xce7ff,0xce800-0xcffff on isa0
sc0: at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
vga0: at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
fdc0: No FDOUT register!
est0: on cpu0
est: CPU supports Enhanced Speedstep, but is not recognized.
est: cpu_vendor GenuineIntel, msr 4921492106004921
device_attach: est0 attach returned 6
p4tcc0: on cpu0
est1: on cpu1
est: CPU supports Enhanced Speedstep, but is not recognized.
est: cpu_vendor GenuineIntel, msr 4921492106004921
device_attach: est1 attach returned 6
p4tcc1: on cpu1
est2: on cpu2
est: CPU supports Enhanced Speedstep, but is not recognized.
est: cpu_vendor GenuineIntel, msr 4921492106004921
device_attach: est2 attach returned 6
p4tcc2: on cpu2
est3: on cpu3
est: CPU supports Enhanced Speedstep, but is not recognized.
est: cpu_vendor GenuineIntel, msr 4921492106004921
device_attach: est3 attach returned 6
p4tcc3: on cpu3
ZFS filesystem version 5
ZFS storage pool version 28
Timecounters tick every 1.000 msec
usbus0: 12Mbps Full Speed USB v1.0
usbus1: 12Mbps Full Speed USB v1.0
usbus2: 12Mbps Full Speed USB v1.0
usbus3: 480Mbps High Speed USB v2.0
ugen0.1: at usbus0
uhub0: on usbus0
ugen1.1: at usbus1
uhub1: on usbus1
ugen2.1: at usbus2
uhub2: on usbus2
ugen3.1: at usbus3
uhub3: on usbus3
uhub0: 2 ports with 2 removable, self powered
uhub1: 2 ports with 2 removable, self powered
uhub2: 2 ports with 2 removable, self powered
bce0: bce_pulse(): Warning: bootcode thinks driver is absent! (bc_state = 0x00000004)
bce1: bce_pulse(): Warning: bootcode thinks driver is absent! (bc_state = 0x00000004)
uhub3: 6 ports with 6 removable, self powered
ada0 at ahcich0 bus 0 scbus0 target 0 lun 0
ada0: ATA-8 SATA 3.x device
ada0: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada0: Command Queueing enabled
ada0: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)
ada0: Previously was known as ad4
ada1 at ahcich1 bus 0 scbus1 target 0 lun 0
ada1: ATA-8 SATA 2.x device
ada1: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada1: Command Queueing enabled
ada1: 953869MB (1953525168 512 byte sectors: 16H 63S/T 16383C)
ada1: Previously was known as ad6
ada2 at ahcich2 bus 0 scbus2 target 0 lun 0
ada2: ATA-8 SATA 2.x device
ada2: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada2: Command Queueing enabled
ada2: 953869MB (1953525168 512 byte sectors: 16H 63S/T 16383C)
ada2: Previously was known as ad8
ada3 at ahcich3 bus 0 scbus3 target 0 lun 0
ada3: ATA-8 SATA 3.x device
ada3: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada3: Command Queueing enabled
ada3: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)
ada3: Previously was known as ad10
ada4 at ahcich4 bus 0 scbus4 target 0 lun 0
ada4: ATA-7 SATA 2.x device
ada4: 300.000MB/s transfers (SATA 2.x,
UDMA6, PIO 8192bytes)
ada4: Command Queueing enabled
ada4: 19087MB (39091248 512 byte sectors: 16H 63S/T 16383C)
ada4: Previously was known as ad12
ada5 at ahcich5 bus 0 scbus5 target 0 lun 0
ada5: ATA-8 SATA 3.x device
ada5: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada5: Command Queueing enabled
ada5: 57241MB (117231408 512 byte sectors: 16H 63S/T 16383C)
ada5: Previously was known as ad14
SMP: AP CPU #3 Launched!
SMP: AP CPU #1 Launched!
SMP: AP CPU #2 Launched!
Timecounter "TSC-low" frequency 12370070 Hz quality 1000
Root mount waiting for: usbus3
ugen0.2: at usbus0
ukbd0: on usbus0
kbd2 at ukbd0
Root mount waiting for: usbus3
ugen3.2: at usbus3
umass0: on usbus3
umass0: SCSI over Bulk-Only; quirks = 0x4100
umass0:7:0:-1: Attached to scbus7
Trying to mount root from zfs:NAS/system/root [rw]...
da0 at umass-sim0 bus 0 scbus7 target 0 lun 0
da0: Removable Direct Access SCSI-0 device
da0: 40.000MB/s transfers
da0: 7580MB (15523840 512 byte sectors: 255H 63S/T 966C)
GEOM_PART: integrity check failed (label/zfs_boot, BSD)
GEOM_PART: integrity check failed (label/zfs_boot, BSD)
bce0: Gigabit link up!
bce1: Gigabit link up!

zpool status:
  pool: NAS
 state: DEGRADED
status: One or more devices is currently being resilvered. The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Tue Jul 2 07:31:29 2013
        483G scanned out of 3.49T at 10/s, (scan is slow, no estimated time)
        121G resilvered, 13.52% done
config:

        NAME                       STATE    READ WRITE CKSUM
        NAS                        DEGRADED    0     0     0
          raidz1-0                 DEGRADED    0     0     0
            replacing-0            DEGRADED    0     0     0
              1160414651350745057  UNAVAIL     0     0     0  was /dev/ada0/old
              ada0                 ONLINE      0     0     0  (resilvering)
            ada3                   ONLINE      0     0     0
            ada1                   ONLINE      0     0     0
            ada2                   ONLINE      0     0     0
        logs
          mirror-1                 ONLINE      0     0     0
            ada4s1                 ONLINE      0     0     0
            ada5s1                 ONLINE      0     0     0
        cache
          ada5s2                   ONLINE      0     0     0

errors: No known data errors

zpool get all:
NAME PROPERTY VALUE SOURCE
NAS size 3.64T -
NAS capacity 95% -
NAS altroot - default
NAS health DEGRADED -
NAS guid 3808946822857359331 default
NAS version 28 default
NAS bootfs - default
NAS delegation on default
NAS autoreplace off default
NAS cachefile - default
NAS failmode wait default
NAS listsnapshots off default
NAS autoexpand off default
NAS dedupditto 0 default
NAS dedupratio 1.00x -
NAS free 154G -
NAS allocated 3.49T -
NAS readonly off -
NAS comment - default
NAS expandsize 0 -

zfs get all:
NAME PROPERTY VALUE SOURCE
NAS type filesystem -
NAS creation Tue Aug 3 20:10 2010 -
NAS used 2.61T -
NAS available 72.0G -
NAS referenced 62.8K -
NAS compressratio 1.00x -
NAS mounted yes -
NAS quota none default
NAS reservation none default
NAS recordsize 128K default
NAS mountpoint /mnt/NAS local
NAS sharenfs off default
NAS checksum on default
NAS compression off default
NAS atime off local
NAS devices on default
NAS exec on default
NAS setuid on default
NAS readonly off default
NAS jailed off default
NAS snapdir hidden default
NAS aclmode discard default
NAS aclinherit restricted default
NAS canmount on default
NAS xattr off temporary
NAS copies 1 default
NAS version 5 -
NAS utf8only off -
NAS normalization none -
NAS casesensitivity sensitive -
NAS vscan off default
NAS nbmand off default
NAS sharesmb off default
NAS refquota none default
NAS refreservation none default
NAS primarycache all default
NAS secondarycache all
default
NAS usedbysnapshots 0 -
NAS usedbydataset 0 -
NAS usedbychildren 0 -
NAS usedbyrefreservation 0 -
NAS logbias latency default
NAS dedup off default
NAS mlslabel -
NAS sync standard local
NAS refcompressratio 1.00x -
NAS written 62.8K -

The rest of the datasets were created with defaults; no changes, or inherited from this one.

gpart show -p:
nas# gpart show -p ada0
gpart: No such geom: ada0.
nas# gpart show -p ada1
gpart: No such geom: ada1.
nas# gpart show -p ada2
gpart: No such geom: ada2.
nas# gpart show -p ada3
gpart: No such geom: ada3.
nas# gpart show -p ada4
=>        63  39091185  ada4  MBR  (18G)
          63  20971503  ada4s1  freebsd  (10G)
    20971566  18119682  ada4s2  freebsd  (8.7G)
nas# gpart show -p ada5
=>        63  117231345  ada5  MBR  (55G)
          63   20971503  ada5s1  freebsd  (10G)
    20971566   96259842  ada5s2  freebsd  (45G)

ada4/ada5 are ZIL/L2ARC/swap

cat /etc/sysctl.conf:
nas# cat /etc/sysctl.conf
# $FreeBSD: src/etc/sysctl.conf,v 1.8.34.1.6.1 2010/12/21 17:09:25 kensmith Exp $
#
# This file is read when going to multi-user and its contents piped thru
# ``sysctl'' to adjust kernel values. ``man 5 sysctl.conf'' for details.
#
# Uncomment this to prevent users from seeing information about processes that
# are being run under another UID.
security.bsd.see_other_uids=0

# Disable power button
hw.acpi.power_button_state=NONE
# Disable core dump
kern.coredump=0

kern.ipc.maxsockbuf=2097152
kern.ipc.nmbclusters=32768
kern.ipc.somaxconn=8192
kern.maxfiles=65536
kern.maxfilesperproc=32768
kern.ipc.nmbjumbo9=12800

# ZFS tuning
kern.maxvnodes=400000
kern.maxfilesperproc=200000
kern.ipc.maxsockets=204800

#http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#File-Level_Prefetching
vfs.zfs.l2arc_noprefetch=0
vfs.zfs.l2arc_write_max=209715200 ## in bytes. 200MB/s
vfs.zfs.l2arc_write_boost=314572800 ## in bytes.
300MB/s
#net.inet.icmp.icmplim=300
#net.inet.icmp.icmplim_output=1
#net.inet.ip.fastforwarding=1
#net.inet.ip.forwarding=1
#net.inet.tcp.delayed_ack=0
#net.inet.tcp.inflight.enable=0
#net.inet.tcp.path_mtu_discovery=0
#net.inet.tcp.recvspace=262144
#net.inet.tcp.rfc1323=1
#net.inet.tcp.sendspace=262144
#net.inet.udp.maxdgram=57344
#net.inet.udp.recvspace=65536
#net.local.stream.recvspace=65536
#net.local.stream.sendspace=65536

# Increase max command-line length showed in `ps` (e.g. for Tomcat/Java)
# Default is PAGE_SIZE / 16 or 256 on x86
# For more info see: http://www.freebsd.org/cgi/query-pr.cgi?pr=120749
kern.ps_arg_cache_limit=4096

net.inet.tcp.sendspace=65535
net.inet.tcp.recvspace=131072
net.inet.tcp.mssdflt=1452
net.inet.tcp.sendbuf_max=16777216
net.inet.tcp.sendbuf_inc=524288
net.inet.tcp.recvbuf_max=16777216
net.inet.tcp.recvbuf_inc=524288
net.inet.udp.recvspace=65535
net.inet.udp.maxdgram=65535
net.local.stream.recvspace=65535
net.local.stream.sendspace=65535
net.inet.tcp.delayed_ack=0

cat /boot/loader.conf:
nas# cat /boot/loader.conf
zfs_load="YES"
kern.maxfiles="20480"
vfs.root.mountfrom="zfs:NAS/system/root"
vfs.root.mountfrom.options="rw"
#I have 8G of Ram
#vfs.zfs.prefetch_disable=0
vfs.zfs.write_limit_override=53687091200
#If Ram = 4GB, set the value to 512M
#If Ram = 8GB, set the value to 1024M
#vfs.zfs.arc_min="4096M" #Ram x 0.5 - 512 MB
#vfs.zfs.arc_max="15488" #Ram x 2
#vm.kmem_size_max="64G" #Ram x 1.5
#vm.kmem_size="48G"
ahci_load="YES"
aio_load="YES"
if_em_load="YES"
if_lagg_load="YES"

THIS IS A ROOT-ON-ZFS SYSTEM.

On 4 jul 2013, at 02:04, Jeremy Chadwick wrote:

> On Wed, Jul 03, 2013 at 05:12:17PM +0200, mxb wrote:
>>
>> Not sure if the new ones are 4k. I have done nothing about that.
>> But it is the SECOND drive whose resilvering is SLOW, not the first one.
>>
>> As stated below, those changes were introduced to the system.
>> ALL new drives ARE identical, except the S/N of course :)
>>
>> On 3 jul 2013, at 16:55, Steven Hartland wrote:
>>
>>>
>>> ----- Original Message ----- From: "Daniel Kalchev"
>>> To: "mxb"
>>> Cc:
>>> Sent: Wednesday, July 03, 2013 3:40 PM
>>> Subject: Re: Slow resilvering with mirrored ZIL
>>>
>>>
>>>> On 03.07.13 16:36, mxb wrote:
>>>>> Well, then my question persists - why do I get such a significant drop in speed while resilvering the second drive?
>>>>> The only changes to the system are:
>>>>>
>>>>> 1. A second partition for the ZIL to create a mirror
>>>>> 2. New disks are 7200rpm; the old ones are 5400rpm.
>>>>>
>>>
>>> It's not something like the old disks having 512-byte sectors
>>> whereas the new ones are 4k?
>>>
>>> If this is the case, then having already replaced one disk you've
>>> killed performance, as it's having to do lots more work reading
>>> non-4k-aligned data.
>
> Not enough hard information to diagnose. What's needed is below; you
> can XXX out system names if you want, but please do not XXX out anything
> else.
>
> - Output from: dmesg
>
> - Output from: zpool status
>
> - Output from: zpool get all
>
> - Output from: zfs get all
>
> - Output from: "gpart show -p" for every disk on the system
>
> - Output from: cat /etc/sysctl.conf
>
> - Output from: cat /boot/loader.conf
>
> - Output from: "smartctl -a" for every disk that's used by ZFS
> (please use smartmontools 6.1 or newer in this case; install
> ports/sysutils/smartmontools)
>
> I can think of one tunable related to resilvering that may/might help
> with your problem, but I'm not going to mention it until the above can
> be provided.
>
> --
> | Jeremy Chadwick jdc@koitsu.org |
> | UNIX Systems Administrator http://jdc.koitsu.org/ |
> | Making life hard for others since 1977.
PGP 4BD6C0CB |
>

From owner-freebsd-fs@FreeBSD.ORG Thu Jul 4 16:58:38 2013
Date: Thu, 04 Jul 2013 09:58:37 -0700
From: Xin Li
To: Dmitry Morozovsky
Cc: freebsd-fs@FreeBSD.org, avg@FreeBSD.org
Subject: Re: ZFS default compression algo for contemporary FreeBSD versions
Message-ID: <51D5A9BD.5010705@delphij.net>

On 7/4/13 5:22 AM, Dmitry Morozovsky wrote:
> Colleagues,
>
> is it sane to just set 'zfs compression=on dataset' to achieve the best
> algo on fresh FreeBSD systems (-current and/or stable/9)?
>
> The manual page is a bit uncertain about this...

No, though we do plan to make LZ4 the default. Both Martin and I have pending patches to make LZ4 the default (also for metadata; see my svn branch), but I got distracted from that recently and have not had a chance to revisit it yet.

Cheers,

From owner-freebsd-fs@FreeBSD.ORG Thu Jul 4 16:59:15 2013
Date: Thu, 4 Jul 2013 18:59:10 +0200
From: mxb
To: Jeremy Chadwick
Cc: freebsd-fs@freebsd.org
Subject: Re: Slow resilvering with mirrored ZIL
Message-Id: <13D8539F-192B-443F-906C-ED09F3AC1BC1@alumni.chalmers.se>

Those are NEW disks.

On 4 jul 2013, at 18:57, mxb wrote:
> ST32000645NS

From owner-freebsd-fs@FreeBSD.ORG Thu Jul 4 17:16:54 2013
Date: Thu, 4 Jul 2013 10:16:37 -0700
From: Jeremy Chadwick
To: mxb
Cc: freebsd-fs@freebsd.org
Subject: Re: Slow resilvering with mirrored ZIL
Message-ID: <20130704171637.GA94539@icarus.home.lan>

On Thu, Jul 04, 2013 at 06:57:09PM +0200, mxb wrote:
>
> Well, I've got a lot of errors in dmesg for one of the four disks, the one which is not currently being replaced.
> After resilvering dropped to 2MB/s ( :O ), it got stuck; I had to reboot the system.
> It came up and now continues the resilvering process at 9/s (9 what??? bytes?)
>
> Here comes the requested info:
>
> dmesg (the plan was to renew the disks and then lift this system up to STABLE):
>
> Copyright (c) 1992-2012 The FreeBSD Project.
> Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994 > The Regents of the University of California. All rights reserved. > FreeBSD is a registered trademark of The FreeBSD Foundation. > FreeBSD 9.1-RELEASE-p1 #2: Fri Mar 8 10:30:38 CET 2013 > root@nas.home.unixconn.com:/usr/obj/usr/src/sys/GENERIC amd64 > module_register: module pci/em already exists! > Module pci/em failed to register: 17 > module_register: module pci/lem already exists! > Module pci/lem failed to register: 17 > CPU: Intel(R) Xeon(R) CPU X5460 @ 3.16GHz (3166.74-MHz K8-class CPU) > Origin = "GenuineIntel" Id = 0x10676 Family = 6 Model = 17 Stepping = 6 > Features=0xbfebfbff > Features2=0xce3bd > AMD Features=0x20100800 > AMD Features2=0x1 > TSC: P-state invariant, performance statistics > real memory = 34359738368 (32768 MB) > avail memory = 33095909376 (31562 MB) > Event timer "LAPIC" quality 400 > ACPI APIC Table: > FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs > FreeBSD/SMP: 1 package(s) x 4 core(s) > cpu0 (BSP): APIC ID: 0 > cpu1 (AP): APIC ID: 1 > cpu2 (AP): APIC ID: 2 > cpu3 (AP): APIC ID: 3 > ioapic0 irqs 0-23 on motherboard > ioapic1 irqs 24-47 on motherboard > kbd1 at kbdmux0 > ctl: CAM Target Layer loaded > acpi0: on motherboard > acpi0: Power Button (fixed) > cpu0: on acpi0 > cpu1: on acpi0 > cpu2: on acpi0 > cpu3: on acpi0 > atrtc0: port 0x70-0x71 irq 8 on acpi0 > Event timer "RTC" frequency 32768 Hz quality 0 > attimer0: port 0x40-0x43,0x50-0x53 irq 0 on acpi0 > Timecounter "i8254" frequency 1193182 Hz quality 0 > Event timer "i8254" frequency 1193182 Hz quality 100 > Timecounter "ACPI-fast" frequency 3579545 Hz quality 900 > acpi_timer0: <24-bit timer at 3.579545MHz> port 0x1008-0x100b on acpi0 > pcib0: port 0xcf8-0xcff on acpi0 > pci0: on pcib0 > pcib1: at device 2.0 on pci0 > pci1: on pcib1 > pcib2: irq 16 at device 0.0 on pci1 > pci2: on pcib2 > pcib3: irq 16 at device 0.0 on pci2 > pci3: on pcib3 > pcib4: at device 0.0 on pci3 > pci4: on pcib4 > pcib5: 
at device 0.2 on pci3 > pci5: on pcib5 > pcib6: irq 18 at device 2.0 on pci2 > pci6: on pcib6 > em0: port 0x2000-0x201f mem 0xdc000000-0xdc01ffff irq 18 at device 0.0 on pci6 > em0: Using an MSI interrupt > em0: Ethernet address: 00:30:48:34:46:36 > em1: port 0x2020-0x203f mem 0xdc020000-0xdc03ffff irq 19 at device 0.1 on pci6 > em1: Using an MSI interrupt > em1: Ethernet address: 00:30:48:34:46:37 > pcib7: at device 0.3 on pci1 > pci7: on pcib7 > pcib8: at device 4.0 on pci0 > pci8: on pcib8 > pcib9: at device 0.0 on pci8 > pci9: on pcib9 > bce0: mem 0xd8000000-0xd9ffffff irq 16 at device 4.0 on pci9 > miibus0: on bce0 > brgphy0: PHY 1 on miibus0 > brgphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow > bce0: Ethernet address: 00:1b:78:38:2f:28 > bce0: ASIC (0x57060020); Rev (A2); Bus (PCI-X, 64-bit, 100MHz); B/C (1.9.6); Bufs (RX:2;TX:2;PG:8); Flags (SPLT|MSI) > Coal (RX:6,6,18,18; TX:20,20,80,80) > pcib10: at device 0.2 on pci8 > pci10: on pcib10 > bce1: mem 0xda000000-0xdbffffff irq 17 at device 5.0 on pci10 > miibus1: on bce1 > brgphy1: PHY 1 on miibus1 > brgphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow > bce1: Ethernet address: 00:1b:78:38:2f:2a > bce1: ASIC (0x57060020); Rev (A2); Bus (PCI-X, 64-bit, 100MHz); B/C (1.9.6); Bufs (RX:2;TX:2;PG:8); Flags (SPLT|MSI) > Coal (RX:6,6,18,18; TX:20,20,80,80) > pcib11: at device 6.0 on pci0 > pci11: on pcib11 > pci0: at device 8.0 (no driver attached) > pcib12: irq 17 at device 28.0 on pci0 > pci12: on pcib12 > uhci0: port 0x1800-0x181f irq 17 at device 29.0 on pci0 > usbus0 on uhci0 > uhci1: port 0x1820-0x183f irq 19 at device 29.1 on pci0 > usbus1 on uhci1 > uhci2: port 0x1840-0x185f irq 18 at device 29.2 on pci0 > usbus2 on uhci2 > ehci0: mem 0xdc500000-0xdc5003ff irq 17 at device 29.7 on pci0 > usbus3: EHCI version 1.0 > usbus3 on ehci0 > 
pcib13: at device 30.0 on pci0 > pci13: on pcib13 > vgapci0: port 0x3000-0x30ff mem 0xd0000000-0xd7ffffff,0xdc200000-0xdc20ffff irq 18 at device 1.0 on pci13 > isab0: at device 31.0 on pci0 > isa0: on isab0 > ahci0: port 0x1890-0x1897,0x1884-0x1887,0x1888-0x188f,0x1880-0x1883,0x1860-0x187f mem 0xdc500400-0xdc5007ff irq 19 at device 31.2 on pci0 > ahci0: AHCI v1.10 with 6 3Gbps ports, Port Multiplier supported > ahcich0: at channel 0 on ahci0 > ahcich1: at channel 1 on ahci0 > ahcich2: at channel 2 on ahci0 > ahcich3: at channel 3 on ahci0 > ahcich4: at channel 4 on ahci0 > ahcich5: at channel 5 on ahci0 > pci0: at device 31.3 (no driver attached) > acpi_button0: on acpi0 > atkbdc0: port 0x60,0x64 irq 1 on acpi0 > atkbd0: irq 1 on atkbdc0 > kbd0 at atkbd0 > atkbd0: [GIANT-LOCKED] > uart0: <16550 or compatible> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0 > uart1: <16550 or compatible> port 0x2f8-0x2ff irq 3 on acpi0 > fdc0: port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on acpi0 > fdc0: does not respond > device_attach: fdc0 attach returned 6 > ppc0: port 0x378-0x37f,0x778-0x77f irq 7 drq 3 on acpi0 > ppc0: SMC-like chipset (ECP/EPP/PS2/NIBBLE) in COMPATIBLE mode > ppc0: FIFO with 16/16/9 bytes threshold > ppbus0: on ppc0 > plip0: on ppbus0 > lpt0: on ppbus0 > lpt0: Interrupt-driven port > ppi0: on ppbus0 > orm0: at iomem 0xc0000-0xcafff,0xcb000-0xccfff,0xcd000-0xce7ff,0xce800-0xcffff on isa0 > sc0: at flags 0x100 on isa0 > sc0: VGA <16 virtual consoles, flags=0x300> > vga0: at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0 > fdc0: No FDOUT register! > est0: on cpu0 > est: CPU supports Enhanced Speedstep, but is not recognized. > est: cpu_vendor GenuineIntel, msr 4921492106004921 > device_attach: est0 attach returned 6 > p4tcc0: on cpu0 > est1: on cpu1 > est: CPU supports Enhanced Speedstep, but is not recognized. 
> est: cpu_vendor GenuineIntel, msr 4921492106004921 > device_attach: est1 attach returned 6 > p4tcc1: on cpu1 > est2: on cpu2 > est: CPU supports Enhanced Speedstep, but is not recognized. > est: cpu_vendor GenuineIntel, msr 4921492106004921 > device_attach: est2 attach returned 6 > p4tcc2: on cpu2 > est3: on cpu3 > est: CPU supports Enhanced Speedstep, but is not recognized. > est: cpu_vendor GenuineIntel, msr 4921492106004921 > device_attach: est3 attach returned 6 > p4tcc3: on cpu3 > ZFS filesystem version 5 > ZFS storage pool version 28 > Timecounters tick every 1.000 msec > usbus0: 12Mbps Full Speed USB v1.0 > usbus1: 12Mbps Full Speed USB v1.0 > usbus2: 12Mbps Full Speed USB v1.0 > usbus3: 480Mbps High Speed USB v2.0 > ugen0.1: at usbus0 > uhub0: on usbus0 > ugen1.1: at usbus1 > uhub1: on usbus1 > ugen2.1: at usbus2 > uhub2: on usbus2 > ugen3.1: at usbus3 > uhub3: on usbus3 > uhub0: 2 ports with 2 removable, self powered > uhub1: 2 ports with 2 removable, self powered > uhub2: 2 ports with 2 removable, self powered > bce0: bce_pulse(): Warning: bootcode thinks driver is absent! (bc_state = 0x00000004) > bce1: bce_pulse(): Warning: bootcode thinks driver is absent! 
(bc_state = 0x00000004) > uhub3: 6 ports with 6 removable, self powered > ada0 at ahcich0 bus 0 scbus0 target 0 lun 0 > ada0: ATA-8 SATA 3.x device > ada0: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes) > ada0: Command Queueing enabled > ada0: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C) > ada0: Previously was known as ad4 > ada1 at ahcich1 bus 0 scbus1 target 0 lun 0 > ada1: ATA-8 SATA 2.x device > ada1: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes) > ada1: Command Queueing enabled > ada1: 953869MB (1953525168 512 byte sectors: 16H 63S/T 16383C) > ada1: Previously was known as ad6 > ada2 at ahcich2 bus 0 scbus2 target 0 lun 0 > ada2: ATA-8 SATA 2.x device > ada2: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes) > ada2: Command Queueing enabled > ada2: 953869MB (1953525168 512 byte sectors: 16H 63S/T 16383C) > ada2: Previously was known as ad8 > ada3 at ahcich3 bus 0 scbus3 target 0 lun 0 > ada3: ATA-8 SATA 3.x device > ada3: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes) > ada3: Command Queueing enabled > ada3: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C) > ada3: Previously was known as ad10 > ada4 at ahcich4 bus 0 scbus4 target 0 lun 0 > ada4: ATA-7 SATA 2.x device > ada4: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes) > ada4: Command Queueing enabled > ada4: 19087MB (39091248 512 byte sectors: 16H 63S/T 16383C) > ada4: Previously was known as ad12 > ada5 at ahcich5 bus 0 scbus5 target 0 lun 0 > ada5: ATA-8 SATA 3.x device > ada5: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes) > ada5: Command Queueing enabled > ada5: 57241MB (117231408 512 byte sectors: 16H 63S/T 16383C) > ada5: Previously was known as ad14 > SMP: AP CPU #3 Launched! > SMP: AP CPU #1 Launched! > SMP: AP CPU #2 Launched! 
> Timecounter "TSC-low" frequency 12370070 Hz quality 1000 > Root mount waiting for: usbus3 > ugen0.2: at usbus0 > ukbd0: on usbus0 > kbd2 at ukbd0 > Root mount waiting for: usbus3 > ugen3.2: at usbus3 > umass0: on usbus3 > umass0: SCSI over Bulk-Only; quirks = 0x4100 > umass0:7:0:-1: Attached to scbus7 > Trying to mount root from zfs:NAS/system/root [rw]... > da0 at umass-sim0 bus 0 scbus7 target 0 lun 0 > da0: Removable Direct Access SCSI-0 device > da0: 40.000MB/s transfers > da0: 7580MB (15523840 512 byte sectors: 255H 63S/T 966C) > GEOM_PART: integrity check failed (label/zfs_boot, BSD) > GEOM_PART: integrity check failed (label/zfs_boot, BSD) > bce0: Gigabit link up! > bce1: Gigabit link up! > > zpool status: > pool: NAS > state: DEGRADED > status: One or more devices is currently being resilvered. The pool will > continue to function, possibly in a degraded state. > action: Wait for the resilver to complete. > scan: resilver in progress since Tue Jul 2 07:31:29 2013 > 483G scanned out of 3.49T at 10/s, (scan is slow, no estimated time) > 121G resilvered, 13.52% done > config: > > NAME STATE READ WRITE CKSUM > NAS DEGRADED 0 0 0 > raidz1-0 DEGRADED 0 0 0 > replacing-0 DEGRADED 0 0 0 > 1160414651350745057 UNAVAIL 0 0 0 was /dev/ada0/old > ada0 ONLINE 0 0 0 (resilvering) > ada3 ONLINE 0 0 0 > ada1 ONLINE 0 0 0 > ada2 ONLINE 0 0 0 > logs > mirror-1 ONLINE 0 0 0 > ada4s1 ONLINE 0 0 0 > ada5s1 ONLINE 0 0 0 > cache > ada5s2 ONLINE 0 0 0 > > errors: No known data errors > > zpool get all: > NAME PROPERTY VALUE SOURCE > NAS size 3.64T - > NAS capacity 95% - > NAS altroot - default > NAS health DEGRADED - > NAS guid 3808946822857359331 default > NAS version 28 default > NAS bootfs - default > NAS delegation on default > NAS autoreplace off default > NAS cachefile - default > NAS failmode wait default > NAS listsnapshots off default > NAS autoexpand off default > NAS dedupditto 0 default > NAS dedupratio 1.00x - > NAS free 154G - > NAS allocated 3.49T - > NAS readonly 
off -
> NAS comment - default
> NAS expandsize 0 -
>
> zfs get all:
> NAME PROPERTY VALUE SOURCE
> NAS type filesystem -
> NAS creation Tue Aug 3 20:10 2010 -
> NAS used 2.61T -
> NAS available 72.0G -
> NAS referenced 62.8K -
> NAS compressratio 1.00x -
> NAS mounted yes -
> NAS quota none default
> NAS reservation none default
> NAS recordsize 128K default
> NAS mountpoint /mnt/NAS local
> NAS sharenfs off default
> NAS checksum on default
> NAS compression off default
> NAS atime off local
> NAS devices on default
> NAS exec on default
> NAS setuid on default
> NAS readonly off default
> NAS jailed off default
> NAS snapdir hidden default
> NAS aclmode discard default
> NAS aclinherit restricted default
> NAS canmount on default
> NAS xattr off temporary
> NAS copies 1 default
> NAS version 5 -
> NAS utf8only off -
> NAS normalization none -
> NAS casesensitivity sensitive -
> NAS vscan off default
> NAS nbmand off default
> NAS sharesmb off default
> NAS refquota none default
> NAS refreservation none default
> NAS primarycache all default
> NAS secondarycache all default
> NAS usedbysnapshots 0 -
> NAS usedbydataset 0 -
> NAS usedbychildren 0 -
> NAS usedbyrefreservation 0 -
> NAS logbias latency default
> NAS dedup off default
> NAS mlslabel -
> NAS sync standard local
> NAS refcompressratio 1.00x -
> NAS written 62.8K -
>
> The rest of the created datasets use the defaults; nothing was changed, or it is inherited from this one.
>
> gpart show -p:
> nas# gpart show -p ada0
> gpart: No such geom: ada0.
> nas# gpart show -p ada1
> gpart: No such geom: ada1.
> nas# gpart show -p ada2
> gpart: No such geom: ada2.
> nas# gpart show -p ada3
> gpart: No such geom: ada3.
> nas# gpart show -p ada4
> => 63 39091185 ada4 MBR (18G)
> 63 20971503 ada4s1 freebsd (10G)
> 20971566 18119682 ada4s2 freebsd (8.7G)
>
> nas# gpart show -p ada5
> => 63 117231345 ada5 MBR (55G)
> 63 20971503 ada5s1 freebsd (10G)
> 20971566 96259842 ada5s2 freebsd (45G)
>
> ada4/ada5 are ZIL/L2ARC/swap
>
> cat /etc/sysctl.conf:
>
> nas# cat /etc/sysctl.conf
> # $FreeBSD: src/etc/sysctl.conf,v 1.8.34.1.6.1 2010/12/21 17:09:25 kensmith Exp $
> #
> # This file is read when going to multi-user and its contents piped thru
> # ``sysctl'' to adjust kernel values. ``man 5 sysctl.conf'' for details.
> #
>
> # Uncomment this to prevent users from seeing information about processes that
> # are being run under another UID.
> security.bsd.see_other_uids=0
>
> # Disable power button
> hw.acpi.power_button_state=NONE
> # Disable core dump
> kern.coredump=0
>
> kern.ipc.maxsockbuf=2097152
> kern.ipc.nmbclusters=32768
> kern.ipc.somaxconn=8192
> kern.maxfiles=65536
> kern.maxfilesperproc=32768
> kern.ipc.nmbjumbo9=12800
>
> # ZFS tuning
> kern.maxvnodes=400000
> kern.maxfilesperproc=200000
> kern.ipc.maxsockets=204800
>
> #http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#File-Level_Prefetching
> vfs.zfs.l2arc_noprefetch=0
> vfs.zfs.l2arc_write_max=209715200 ## in bytes. 200MB/s
> vfs.zfs.l2arc_write_boost=314572800 ## in bytes. 
300MB/s > > #net.inet.icmp.icmplim=300 > #net.inet.icmp.icmplim_output=1 > #net.inet.ip.fastforwarding=1 > #net.inet.ip.forwarding=1 > #net.inet.tcp.delayed_ack=0 > #net.inet.tcp.inflight.enable=0 > #net.inet.tcp.path_mtu_discovery=0 > #net.inet.tcp.recvspace=262144 > #net.inet.tcp.rfc1323=1 > #net.inet.tcp.sendspace=262144 > #net.inet.udp.maxdgram=57344 > #net.inet.udp.recvspace=65536 > #net.local.stream.recvspace=65536 > #net.local.stream.sendspace=65536 > > > # Increase max command-line length showed in `ps` (e.g for Tomcat/Java) > # Default is PAGE_SIZE / 16 or 256 on x86 > # For more info see: http://www.freebsd.org/cgi/query-pr.cgi?pr=120749 > kern.ps_arg_cache_limit=4096 > > net.inet.tcp.sendspace=65535 > net.inet.tcp.recvspace=131072 > net.inet.tcp.mssdflt=1452 > net.inet.tcp.sendbuf_max=16777216 > net.inet.tcp.sendbuf_inc=524288 > net.inet.tcp.recvbuf_max=16777216 > net.inet.tcp.recvbuf_inc=524288 > net.inet.udp.recvspace=65535 > net.inet.udp.maxdgram=65535 > net.local.stream.recvspace=65535 > net.local.stream.sendspace=65535 > net.inet.tcp.delayed_ack=0 > > cat /boot/loader.conf: > > nas# cat /boot/loader.conf > zfs_load="YES" > kern.maxfiles="20480" > > vfs.root.mountfrom="zfs:NAS/system/root" > vfs.root.mountfrom.options="rw" > > #I have 8G of Ram > #vfs.zfs.prefetch_disable=0 > vfs.zfs.write_limit_override=53687091200 > > #If Ram = 4GB, set the value to 512M > #If Ram = 8GB, set the value to 1024M > #vfs.zfs.arc_min="4096M" > > #Ram x 0.5 - 512 MB > #vfs.zfs.arc_max="15488" > > #Ram x 2 > #vm.kmem_size_max="64G" > > #Ram x 1.5 > #vm.kmem_size="48G" > > ahci_load="YES" > aio_load="YES" > if_em_load="YES" > if_lagg_load="YES" > > THIS IS ROOT ON ZFS SYS. Thanks for the info. Given the above, I need even more: - Output from: zdb -C - Output from: smartctl -a ada{0,5} And yes, "9/s" or "10/s" in resilvering means bytes per second. 
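Taking the reported "10/s" literally as 10 bytes per second, a quick back-of-the-envelope sketch (using the 3.49T total and 483G scanned figures from the zpool status output quoted above, and treating T/G as powers of 1024 as zpool does) shows why a resilver at that rate effectively never finishes:

```python
# Back-of-the-envelope check: how long would the remaining resilver take
# if "10/s" in the zpool status output really means 10 bytes per second?
TiB = 2 ** 40
GiB = 2 ** 30

total = 3.49 * TiB   # "3.49T" total to scan
scanned = 483 * GiB  # "483G scanned"
rate = 10            # bytes per second

remaining_seconds = (total - scanned) / rate
remaining_years = remaining_seconds / (365.25 * 24 * 3600)
print(f"~{remaining_years:,.0f} years left at this rate")  # on the order of 10,000 years
```

In other words, the reported rate is pathological rather than merely slow, which is why the diagnostic output below is needed before guessing at a cause.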
I have a couple feelings about what's going on (meaning right now I have 3 separate theories), but I need to see the above first. Also, I swore I remember some Intel ESB2 thing that got committed to stable/9 recently, not sure what I'm remembering or if I'm remembering it right though, will go look it up later. I should be clear however: I do not believe at this time the problem you're experiencing is with the ESB2 southbridge or anything along those lines. If there were issues with that on FreeBSD I'd expect them to be widespread given the popularity of the chipset in the server realm. -- | Jeremy Chadwick jdc@koitsu.org | | UNIX Systems Administrator http://jdc.koitsu.org/ | | Making life hard for others since 1977. PGP 4BD6C0CB | From owner-freebsd-fs@FreeBSD.ORG Thu Jul 4 17:36:39 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 4AEC3E7 for ; Thu, 4 Jul 2013 17:36:39 +0000 (UTC) (envelope-from mxb@alumni.chalmers.se) Received: from mail-la0-f44.google.com (mail-la0-f44.google.com [209.85.215.44]) by mx1.freebsd.org (Postfix) with ESMTP id AE3D71305 for ; Thu, 4 Jul 2013 17:36:38 +0000 (UTC) Received: by mail-la0-f44.google.com with SMTP id er20so1451325lab.31 for ; Thu, 04 Jul 2013 10:36:30 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=content-type:mime-version:subject:from:in-reply-to:date:cc :content-transfer-encoding:message-id:references:to:x-mailer :x-gm-message-state; bh=V3+h3DHqWswKtjjXU7r/RAMxbVgiAk7EddoGDigEMPE=; b=kBw+Xq3VfATHV8BoJWlEOix5FnMGxJQEzOoS9wi/0vYoj2l+f1uoc09PLIG7RNaXLY 2k35cW1fwaCVrDkmqaTOp6h3DM7DpalK7yh27AvXmHE/XHeKGZeIzYVcYrpff2ONABHx b64cKkVJ7WYulLieH3FPkKgv0wyYS/iCI+vRMgG2bKfrKKiXgtSY8W92zJKFBDAoQvr8 AoQFcQGvaq8CQntdobkGAQuWeM9QVPSrR74HNJwU5yUcBK103idafInGOHxAXR2Rf6CX 9GYo+NR0w/eEjOvwln/Fw/W5HRWi9Q5PXcOPsQveAKEeWezQk3nIsOrKsaONkqNlQioW 0lPg== X-Received: by 10.152.44.170 
with SMTP id f10mr3256619lam.68.1372959390591; Thu, 04 Jul 2013 10:36:30 -0700 (PDT) Received: from grey.home.unixconn.com (h-74-23.a183.priv.bahnhof.se. [46.59.74.23]) by mx.google.com with ESMTPSA id y5sm1507202lae.2.2013.07.04.10.36.28 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Thu, 04 Jul 2013 10:36:29 -0700 (PDT) Content-Type: text/plain; charset=us-ascii Mime-Version: 1.0 (Mac OS X Mail 6.5 \(1508\)) Subject: Re: Slow resilvering with mirrored ZIL From: mxb In-Reply-To: <20130704171637.GA94539@icarus.home.lan> Date: Thu, 4 Jul 2013 19:36:27 +0200 Content-Transfer-Encoding: quoted-printable Message-Id: <2A261BEA-4452-4F6A-8EFB-90A54D79CBB9@alumni.chalmers.se> References: <51D42107.1050107@digsys.bg> <2EF46A8C-6908-4160-BF99-EC610B3EA771@alumni.chalmers.se> <51D437E2.4060101@digsys.bg> <20130704000405.GA75529@icarus.home.lan> <20130704171637.GA94539@icarus.home.lan> To: Jeremy Chadwick X-Mailer: Apple Mail (2.1508) X-Gm-Message-State: ALoCoQkox10Nga9IjFZbGRo8ljwuRaYcMrKeCqBGNaC6g0D7IYdsv4pg9/rWpWl1xvDroK5dB4cB Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 04 Jul 2013 17:36:39 -0000 zdb -C: NAS: version: 28 name: 'NAS' state: 0 txg: 19039918 pool_guid: 3808946822857359331 hostid: 516334119 hostname: 'nas.home.unixconn.com' vdev_children: 2 vdev_tree: type: 'root' id: 0 guid: 3808946822857359331 children[0]: type: 'raidz' id: 0 guid: 15126043265564363201 nparity: 1 metaslab_array: 14 metaslab_shift: 32 ashift: 9 asize: 4000799784960 is_log: 0 children[0]: type: 'replacing' id: 0 guid: 15772352161192928927 whole_disk: 0 children[0]: type: 'disk' id: 0 guid: 1160414651350745057 path: '/dev/ada0/old' phys_path: '/dev/ada0' whole_disk: 0 DTL: 788 children[1]: type: 'disk' id: 1 guid: 16857066297968094282 path: '/dev/ada0' phys_path: '/dev/ada0' whole_disk: 1 DTL: 4100 
resilvering: 1 children[1]: type: 'disk' id: 1 guid: 8602024497074155869 path: '/dev/ada3' phys_path: '/dev/ada3' whole_disk: 1 DTL: 4097 children[2]: type: 'disk' id: 2 guid: 15590206669698985709 path: '/dev/ada1' phys_path: '/dev/ada1' whole_disk: 0 DTL: 786 children[3]: type: 'disk' id: 3 guid: 14741760026602900071 path: '/dev/ada2' phys_path: '/dev/ada2' whole_disk: 0 DTL: 785 children[1]: type: 'mirror' id: 1 guid: 6427576469721219307 metaslab_array: 4098 metaslab_shift: 26 ashift: 9 asize: 10732437504 is_log: 1 create_txg: 19039876 children[0]: type: 'disk' id: 0 guid: 14073296989893822073 path: '/dev/ada4s1' phys_path: '/dev/ada4s1' whole_disk: 1 create_txg: 19039876 children[1]: type: 'disk' id: 1 guid: 8631947194818711903 path: '/dev/ada5s1' phys_path: '/dev/ada5s1' whole_disk: 1 create_txg: 19039876

smartctl:

nas# smartctl -a ada3
smartctl 6.0 2012-10-10 r3643 [FreeBSD 9.1-RELEASE-p1 amd64] (local build)
Copyright (C) 2002-12, Bruce Allen, Christian Franke, www.smartmontools.org

ada3: Unable to detect device type
Please specify device type with the -d option.

Use smartctl -h to get a usage summary

THIS IS FOR ALL DISKS I HAVE INSIDE.

On 4 jul 2013, at 19:16, Jeremy Chadwick wrote:

> On Thu, Jul 04, 2013 at 06:57:09PM +0200, mxb wrote:
>> Well, I've got a lot of errors in dmesg regarding one of four disks, which is not currently replaced.
>> After resilvering dropped to 2MB/s ( :O ), it stuck. I had to reboot the system.
>> It came up and now continues the resilvering process at 9/s (9 what??? bytes?)
>>
>> Here comes the requested info:
>>
>> [quoted dmesg, zpool status, zfs/zpool properties, and config listings trimmed; identical to the message quoted in full above]
From owner-freebsd-fs@FreeBSD.ORG Thu Jul 4 18:13:14 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id EDF1E6D0 for ; Thu, 4 Jul 2013 18:13:14 +0000 (UTC) (envelope-from joh.hendriks@gmail.com) Received: from mail-ve0-x22d.google.com (mail-ve0-x22d.google.com [IPv6:2607:f8b0:400c:c01::22d]) by mx1.freebsd.org (Postfix) with ESMTP id A12C91736 for ; Thu, 4 Jul 2013 18:13:14 +0000 (UTC) Received: by mail-ve0-f173.google.com with SMTP id jw11so1211848veb.32 for ; Thu, 04 Jul 2013 11:13:14 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=LutsGvdAbTlsyGktjsoFAS/ccrulQbojjoD7dOm9cN4=; b=sK1aUOE3jDy2fOBcgDKqttAKfC3/MJS7Sv1AJBYKkf6TQQ/a9jLKjp8dwOVTKHJOtT eIFlenJABXHNPewHLgF7DG7jrliE5vrPeUOOazAlL7VQSxjymFPOZC2GXzzaD+88PAha
O1yrWu+z2vwSBjwwuV4Tgg4+D/YQH6Waerr2yyg2T0sjaJ1nvfPUY3x/XdvXKuwhMjcn SXydP5orDXZg5s0txnp4spM5SlMCygLjn3TzTlpcTF8CDadmfE8V6HrOz4hDrwxczpnm Hd27XWjgHo9TNBDjjomFnl5clLhDoSJaf0JjLD91QeyIfvdXhznCXJduyhGhhKLw/N4P HaEA== MIME-Version: 1.0 X-Received: by 10.220.162.200 with SMTP id w8mr3911720vcx.90.1372961594193; Thu, 04 Jul 2013 11:13:14 -0700 (PDT) Received: by 10.58.227.10 with HTTP; Thu, 4 Jul 2013 11:13:14 -0700 (PDT) In-Reply-To: References: <51D42107.1050107@digsys.bg> <2EF46A8C-6908-4160-BF99-EC610B3EA771@alumni.chalmers.se> <51D437E2.4060101@digsys.bg> <20130704000405.GA75529@icarus.home.lan> Date: Thu, 4 Jul 2013 20:13:14 +0200 Message-ID: Subject: Re: Slow resilvering with mirrored ZIL From: Johan Hendriks To: mxb Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.14 Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 04 Jul 2013 18:13:15 -0000

On Thursday 4 July 2013, mxb (mxb@alumni.chalmers.se) wrote the following:

> Well, I've got a lot of errors in dmesg regarding one of four disks, which is not currently replaced.
> After resilvering dropped to 2MB/s ( :O ), it stuck. I had to reboot the system.
> It came up and now continues the resilvering process at 9/s (9 what??? bytes?)
>
> Here comes the requested info:
>
> dmesg (the plan was to renew the disks and then lift this system up to STABLE):
>
> [quoted dmesg trimmed; the full listing appears earlier in the thread]
on pci0 > ahci0: AHCI v1.10 with 6 3Gbps ports, Port Multiplier supported > ahcich0: at channel 0 on ahci0 > ahcich1: at channel 1 on ahci0 > ahcich2: at channel 2 on ahci0 > ahcich3: at channel 3 on ahci0 > ahcich4: at channel 4 on ahci0 > ahcich5: at channel 5 on ahci0 > pci0: at device 31.3 (no driver attached) > acpi_button0: on acpi0 > atkbdc0: port 0x60,0x64 irq 1 on acpi0 > atkbd0: irq 1 on atkbdc0 > kbd0 at atkbd0 > atkbd0: [GIANT-LOCKED] > uart0: <16550 or compatible> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0 > uart1: <16550 or compatible> port 0x2f8-0x2ff irq 3 on acpi0 > fdc0: port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on acpi0 > fdc0: does not respond > device_attach: fdc0 attach returned 6 > ppc0: port 0x378-0x37f,0x778-0x77f irq 7 drq 3 on acpi0 > ppc0: SMC-like chipset (ECP/EPP/PS2/NIBBLE) in COMPATIBLE mode > ppc0: FIFO with 16/16/9 bytes threshold > ppbus0: on ppc0 > plip0: on ppbus0 > lpt0: on ppbus0 > lpt0: Interrupt-driven port > ppi0: on ppbus0 > orm0: at iomem > 0xc0000-0xcafff,0xcb000-0xccfff,0xcd000-0xce7ff,0xce800-0xcffff on isa0 > sc0: at flags 0x100 on isa0 > sc0: VGA <16 virtual consoles, flags=0x300> > vga0: at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0 > fdc0: No FDOUT register! > est0: on cpu0 > est: CPU supports Enhanced Speedstep, but is not recognized. > est: cpu_vendor GenuineIntel, msr 4921492106004921 > device_attach: est0 attach returned 6 > p4tcc0: on cpu0 > est1: on cpu1 > est: CPU supports Enhanced Speedstep, but is not recognized. > est: cpu_vendor GenuineIntel, msr 4921492106004921 > device_attach: est1 attach returned 6 > p4tcc1: on cpu1 > est2: on cpu2 > est: CPU supports Enhanced Speedstep, but is not recognized. > est: cpu_vendor GenuineIntel, msr 4921492106004921 > device_attach: est2 attach returned 6 > p4tcc2: on cpu2 > est3: on cpu3 > est: CPU supports Enhanced Speedstep, but is not recognized. 
> est: cpu_vendor GenuineIntel, msr 4921492106004921 > device_attach: est3 attach returned 6 > p4tcc3: on cpu3 > ZFS filesystem version 5 > ZFS storage pool version 28 > Timecounters tick every 1.000 msec > usbus0: 12Mbps Full Speed USB v1.0 > usbus1: 12Mbps Full Speed USB v1.0 > usbus2: 12Mbps Full Speed USB v1.0 > usbus3: 480Mbps High Speed USB v2.0 > ugen0.1: at usbus0 > uhub0: on usbus0 > ugen1.1: at usbus1 > uhub1: on usbus1 > ugen2.1: at usbus2 > uhub2: on usbus2 > ugen3.1: at usbus3 > uhub3: on usbus3 > uhub0: 2 ports with 2 removable, self powered > uhub1: 2 ports with 2 removable, self powered > uhub2: 2 ports with 2 removable, self powered > bce0: bce_pulse(): Warning: bootcode thinks driver is absent! (bc_state = > 0x00000004) > bce1: bce_pulse(): Warning: bootcode thinks driver is absent! (bc_state = > 0x00000004) > uhub3: 6 ports with 6 removable, self powered > ada0 at ahcich0 bus 0 scbus0 target 0 lun 0 > ada0: ATA-8 SATA 3.x device > ada0: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes) > ada0: Command Queueing enabled > ada0: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C) > ada0: Previously was known as ad4 > ada1 at ahcich1 bus 0 scbus1 target 0 lun 0 > ada1: ATA-8 SATA 2.x device > ada1: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes) > ada1: Command Queueing enabled > ada1: 953869MB (1953525168 512 byte sectors: 16H 63S/T 16383C) > ada1: Previously was known as ad6 > ada2 at ahcich2 bus 0 scbus2 target 0 lun 0 > ada2: ATA-8 SATA 2.x device > ada2: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes) > ada2: Command Queueing enabled > ada2: 953869MB (1953525168 512 byte sectors: 16H 63S/T 16383C) > ada2: Previously was known as ad8 > ada3 at ahcich3 bus 0 scbus3 target 0 lun 0 > ada3: ATA-8 SATA 3.x device > ada3: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes) > ada3: Command Queueing enabled > ada3: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C) > ada3: Previously was known as ad10 > ada4 at ahcich4 
bus 0 scbus4 target 0 lun 0 > ada4: ATA-7 SATA 2.x device > ada4: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes) > ada4: Command Queueing enabled > ada4: 19087MB (39091248 512 byte sectors: 16H 63S/T 16383C) > ada4: Previously was known as ad12 > ada5 at ahcich5 bus 0 scbus5 target 0 lun 0 > ada5: ATA-8 SATA 3.x device > ada5: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes) > ada5: Command Queueing enabled > ada5: 57241MB (117231408 512 byte sectors: 16H 63S/T 16383C) > ada5: Previously was known as ad14 > SMP: AP CPU #3 Launched! > SMP: AP CPU #1 Launched! > SMP: AP CPU #2 Launched! > Timecounter "TSC-low" frequency 12370070 Hz quality 1000 > Root mount waiting for: usbus3 > ugen0.2: at usbus0 > ukbd0: 2.00/5.35, addr 2> on usbus0 > kbd2 at ukbd0 > Root mount waiting for: usbus3 > ugen3.2: at usbus3 > umass0: on usbus3 > umass0: SCSI over Bulk-Only; quirks = 0x4100 > umass0:7:0:-1: Attached to scbus7 > Trying to mount root from zfs:NAS/system/root [rw]... > da0 at umass-sim0 bus 0 scbus7 target 0 lun 0 > da0: Removable Direct Access SCSI-0 device > da0: 40.000MB/s transfers > da0: 7580MB (15523840 512 byte sectors: 255H 63S/T 966C) > GEOM_PART: integrity check failed (label/zfs_boot, BSD) > GEOM_PART: integrity check failed (label/zfs_boot, BSD) > bce0: Gigabit link up! > bce1: Gigabit link up! > > zpool status: > pool: NAS > state: DEGRADED > status: One or more devices is currently being resilvered. The pool will > continue to function, possibly in a degraded state. > action: Wait for the resilver to complete. 
> scan: resilver in progress since Tue Jul 2 07:31:29 2013 > 483G scanned out of 3.49T at 10/s, (scan is slow, no estimated > time) > 121G resilvered, 13.52% done > config: > > NAME STATE READ WRITE CKSUM > NAS DEGRADED 0 0 0 > raidz1-0 DEGRADED 0 0 0 > replacing-0 DEGRADED 0 0 0 > 1160414651350745057 UNAVAIL 0 0 0 was > /dev/ada0/old > ada0 ONLINE 0 0 0 > (resilvering) > ada3 ONLINE 0 0 0 > ada1 ONLINE 0 0 0 > ada2 ONLINE 0 0 0 > logs > mirror-1 ONLINE 0 0 0 > ada4s1 ONLINE 0 0 0 > ada5s1 ONLINE 0 0 0 > cache > ada5s2 ONLINE 0 0 0 > > errors: No known data errors > > zpool get all: > NAME PROPERTY VALUE SOURCE > NAS size 3.64T - > NAS capacity 95% - > NAS altroot - default > NAS health DEGRADED - > NAS guid 3808946822857359331 default > NAS version 28 default > NAS bootfs - default > NAS delegation on default > NAS autoreplace off default > NAS cachefile - default > NAS failmode wait default > NAS listsnapshots off default > NAS autoexpand off default > NAS dedupditto 0 default > NAS dedupratio 1.00x - > NAS free 154G - > NAS allocated 3.49T - > NAS readonly off - > NAS comment - default > NAS expandsize 0 - > > zfs get all: > NAME PROPERTY VALUE SOURCE > NAS type filesystem - > NAS creation Tue Aug 3 20:10 2010 - > NAS used 2.61T - > NAS available 72.0G - > NAS referenced > > > > > On 4 jul 2013, at 02:04, Jeremy Chadwick wrote: > > > On Wed, Jul 03, 2013 at 05:12:17PM +0200, mxb wrote: > >> > >> Not sure if new are 4k. Done nothing about that. > >> But the SECOND drive, resilvering is SLOW. Not the first one. > >> > >> As stated below. Those changes are introduced to the system. 
> >> ALL new drives ARE identical, except S/N of course :) > >> > >> On 3 jul 2013, at 16:55, Steven Hartland > wrote: > >> > >>> > >>> ----- Original Message ----- From: "Daniel Kalchev" > >>> To: "mxb" > >>> Cc: > >>> Sent: Wednesday, July 03, 2013 3:40 PM > >>> Subject: Re: Slow resilvering with mirrored ZIL > >>> > >>> > >>>> On 03.07.13 16:36, mxb wrote: > >>>>> Well, then my question persists - why I get such a significant drop of > speed while resilvering the second drive. > >>>>> The only changes to the system are: > >>>>> > >>>>> 1. Second partition for ZIL to create a mirror > >>>>> 2. New disks are 7200rpm. old ones are 5400rpm. > >>>>> > >>> > >>> It's not something like the old disks are 512byte sectors > >>> whereas the new ones are 4k? > >>> > >>> If this is the case, having already replaced one disk you've > >>> killed performance as it's having to do lots more work reading > >>> non-4k-aligned data? > > > > Not enough hard information to diagnose. What's needed is below; you > > can XXX out system names if you want, but please do not XXX out anything > > else. > > > > - Output from: dmesg > > > > - Output from: zpool status > > > > - Output from: zpool get all > > > > - Output from: zfs get all > > > > - Output from: "gpart show -p" for every disk on the system > > > > - Output from: cat /etc/sysctl.conf > > > > - Output from: cat /boot/loader.conf > > > > - Output from: "smartctl -a" for every disk that's used by ZFS > > (please use smartmontools 6.1 or newer in this case; install > > ports/sysutils/smartmontools) > > > > I can think o Could it be that your pool is at a capacity of 95%? In a lot of cases this slows down a pool. Maybe this is what you see. 
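That 95% figure can be sanity-checked from the "zpool get all" output quoted above (this sketch is not part of the thread; the 3.49T allocated / 3.64T size values are taken from that output):

```shell
# Sketch: derive pool utilization the same way zpool reports it,
# using the allocated/size values from the quoted "zpool get all".
alloc=3.49   # TB allocated (NAS allocated)
size=3.64    # TB total    (NAS size)
awk -v a="$alloc" -v s="$size" \
    'BEGIN { printf "capacity: %d%%\n", int(a / s * 100) }'
# prints "capacity: 95%"
```

ZFS allocation is known to slow down markedly on nearly-full pools, so a pool sitting at 95% is a plausible contributor to the slow resilver quite apart from any alignment issues.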
Regards Johan From owner-freebsd-fs@FreeBSD.ORG Thu Jul 4 18:15:02 2013 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 4DEA77A8; Thu, 4 Jul 2013 18:15:02 +0000 (UTC) (envelope-from marck@rinet.ru) Received: from woozle.rinet.ru (woozle.rinet.ru [195.54.192.68]) by mx1.freebsd.org (Postfix) with ESMTP id CFD941753; Thu, 4 Jul 2013 18:15:01 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by woozle.rinet.ru (8.14.5/8.14.5) with ESMTP id r64IF0gL060489; Thu, 4 Jul 2013 22:15:00 +0400 (MSK) (envelope-from marck@rinet.ru) Date: Thu, 4 Jul 2013 22:15:00 +0400 (MSK) From: Dmitry Morozovsky To: Andriy Gapon Subject: Re: boot from ZFS: which pool types use? In-Reply-To: <51D5A20F.4070103@FreeBSD.org> Message-ID: References: <51D56066.1020902@FreeBSD.org> <51D577A9.1030304@gmail.com> <51D59AAD.3030208@FreeBSD.org> <51D5A20F.4070103@FreeBSD.org> User-Agent: Alpine 2.00 (BSF 1167 2008-08-23) X-NCC-RegID: ru.rinet X-OpenPGP-Key-ID: 6B691B03 MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.4.3 (woozle.rinet.ru [0.0.0.0]); Thu, 04 Jul 2013 22:15:00 +0400 (MSK) Cc: freebsd-fs@FreeBSD.org, Volodymyr Kostyrko X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 04 Jul 2013 18:15:02 -0000 On Thu, 4 Jul 2013, Andriy Gapon wrote: > > However: setting this in boot.loader (or manually in loader prompt) is not > > enough to have bootable system > > > > And this, I suppose, could be attacked to reduce possibility of > > user-configuration errors: not having a possibility to point at loader is a bit > > disappointing, you see ;) > > I am confused with this's and that's and unknown file names. 
Well, sorry for not being explicit enough -- after all, we're both non-native speakers ;P > What exactly did you try and what exactly did not work? Now, examining my setups, I'm not quite sure what we could do in a non-deterministic situation like I've created myself :) > Please note that vfs.root.mountfrom tells loader to tell kernel from where it > should mount root fs. It can not tell earlier boot blocks where to find loader > itself. Understood, and this is fair enough. Now, looking at zfsboot.c, couldn't we (for finding /boot/zfsloader suitable for us) loop over, say, top-level datasets in pools we have, in the absence of a special property? As for me, this should plug the most annoying mistakes... -- Sincerely, D.Marck [DM5020, MCK-RIPE, DM3-RIPN] [ FreeBSD committer: marck@FreeBSD.org ] ------------------------------------------------------------------------ *** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- marck@rinet.ru *** ------------------------------------------------------------------------ From owner-freebsd-fs@FreeBSD.ORG Thu Jul 4 18:18:45 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 629B486C; Thu, 4 Jul 2013 18:18:45 +0000 (UTC) (envelope-from marck@rinet.ru) Received: from woozle.rinet.ru (woozle.rinet.ru [195.54.192.68]) by mx1.freebsd.org (Postfix) with ESMTP id E43351784; Thu, 4 Jul 2013 18:18:44 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by woozle.rinet.ru (8.14.5/8.14.5) with ESMTP id r64IIhL2060729; Thu, 4 Jul 2013 22:18:43 +0400 (MSK) (envelope-from marck@rinet.ru) Date: Thu, 4 Jul 2013 22:18:43 +0400 (MSK) From: Dmitry Morozovsky To: Andriy Gapon Subject: Re: boot from ZFS: which pool types use? 
In-Reply-To: Message-ID: References: <51D56066.1020902@FreeBSD.org> <51D577A9.1030304@gmail.com> <51D59AAD.3030208@FreeBSD.org> <51D5A20F.4070103@FreeBSD.org> User-Agent: Alpine 2.00 (BSF 1167 2008-08-23) X-NCC-RegID: ru.rinet X-OpenPGP-Key-ID: 6B691B03 MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.4.3 (woozle.rinet.ru [0.0.0.0]); Thu, 04 Jul 2013 22:18:43 +0400 (MSK) Cc: freebsd-fs@freebsd.org, Volodymyr Kostyrko X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 04 Jul 2013 18:18:45 -0000 On Thu, 4 Jul 2013, Dmitry Morozovsky wrote: > Now, looking at zfsboot.c, couldn't we (for finding /boot/zfsloader suitable > for us) loop over, say, top-level datasets in pools we have, in the absence of > a special property? As for me, this should plug the most annoying mistakes... And followup: for ufs/fdisk, one could easily drop in the boot process and switch to another disk, slice, partition and even loader. Not that it's (at least easily) achievable with zfsloader... And, no, unfortunately, I have no patches to apply.... 
-- Sincerely, D.Marck [DM5020, MCK-RIPE, DM3-RIPN] [ FreeBSD committer: marck@FreeBSD.org ] ------------------------------------------------------------------------ *** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- marck@rinet.ru *** ------------------------------------------------------------------------ From owner-freebsd-fs@FreeBSD.ORG Thu Jul 4 19:10:01 2013 Return-Path: Delivered-To: freebsd-fs@smarthost.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 6B6BCC3D for ; Thu, 4 Jul 2013 19:10:01 +0000 (UTC) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:1900:2254:206c::16:87]) by mx1.freebsd.org (Postfix) with ESMTP id 5D75E19A2 for ; Thu, 4 Jul 2013 19:10:01 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.7/8.14.7) with ESMTP id r64JA1Un028439 for ; Thu, 4 Jul 2013 19:10:01 GMT (envelope-from gnats@freefall.freebsd.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.7/8.14.7/Submit) id r64JA17V028438; Thu, 4 Jul 2013 19:10:01 GMT (envelope-from gnats) Date: Thu, 4 Jul 2013 19:10:01 GMT Message-Id: <201307041910.r64JA17V028438@freefall.freebsd.org> To: freebsd-fs@FreeBSD.org Cc: From: dfilter@FreeBSD.ORG (dfilter service) Subject: Re: kern/180236: commit references a PR X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list Reply-To: dfilter service List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 04 Jul 2013 19:10:01 -0000 The following reply was made to PR kern/180236; it has been noted by GNATS. 
From: dfilter@FreeBSD.ORG (dfilter service) To: bug-followup@FreeBSD.org Cc: Subject: Re: kern/180236: commit references a PR Date: Thu, 4 Jul 2013 19:01:25 +0000 (UTC) Author: kib Date: Thu Jul 4 19:01:18 2013 New Revision: 252714 URL: http://svnweb.freebsd.org/changeset/base/252714 Log: The tvp vnode on rename is usually unlinked. Drop the cached null vnode for tvp to allow the free of the lower vnode, if needed. PR: kern/180236 Tested by: smh Sponsored by: The FreeBSD Foundation MFC after: 1 week Modified: head/sys/fs/nullfs/null_vnops.c

Modified: head/sys/fs/nullfs/null_vnops.c
==============================================================================
--- head/sys/fs/nullfs/null_vnops.c	Thu Jul 4 18:59:58 2013	(r252713)
+++ head/sys/fs/nullfs/null_vnops.c	Thu Jul 4 19:01:18 2013	(r252714)
@@ -554,6 +554,7 @@ null_rename(struct vop_rename_args *ap)
 	struct vnode *fvp = ap->a_fvp;
 	struct vnode *fdvp = ap->a_fdvp;
 	struct vnode *tvp = ap->a_tvp;
+	struct null_node *tnn;
 	/* Check for cross-device rename. */
 	if ((fvp->v_mount != tdvp->v_mount) ||
@@ -568,7 +569,11 @@ null_rename(struct vop_rename_args *ap)
 		vrele(fvp);
 		return (EXDEV);
 	}
-
+
+	if (tvp != NULL) {
+		tnn = VTONULL(tvp);
+		tnn->null_flags |= NULLV_DROP;
+	}
 	return (null_bypass((struct vop_generic_args *)ap));
 }

_______________________________________________ svn-src-all@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/svn-src-all To unsubscribe, send any mail to "svn-src-all-unsubscribe@freebsd.org" From owner-freebsd-fs@FreeBSD.ORG Thu Jul 4 19:12:19 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 6860CD39 for ; Thu, 4 Jul 2013 19:12:19 +0000 (UTC) (envelope-from jdc@koitsu.org) Received: from relay5-d.mail.gandi.net (relay5-d.mail.gandi.net [217.70.183.197]) by mx1.freebsd.org (Postfix) with ESMTP id 0E06919C7 for ; Thu, 4 Jul 2013 19:12:18 +0000 (UTC) Received: from mfilter1-d.gandi.net (mfilter1-d.gandi.net [217.70.178.130]) by relay5-d.mail.gandi.net (Postfix) with ESMTP id D583441C07E; Thu, 4 Jul 2013 21:12:07 +0200 (CEST) X-Virus-Scanned: Debian amavisd-new at mfilter1-d.gandi.net Received: from relay5-d.mail.gandi.net ([217.70.183.197]) by mfilter1-d.gandi.net (mfilter1-d.gandi.net [10.0.15.180]) (amavisd-new, port 10024) with ESMTP id zN6G9c8c7fBE; Thu, 4 Jul 2013 21:12:06 +0200 (CEST) X-Originating-IP: 76.102.14.35 Received: from jdc.koitsu.org (c-76-102-14-35.hsd1.ca.comcast.net [76.102.14.35]) (Authenticated sender: jdc@koitsu.org) by relay5-d.mail.gandi.net (Postfix) with ESMTPSA id 4EC5841C053; Thu, 4 Jul 2013 21:12:05 +0200 (CEST) Received: by icarus.home.lan (Postfix, from userid 1000) id 49D6673A1C; Thu, 4 Jul 2013 12:12:03 -0700 (PDT) Date: Thu, 4 Jul 2013 12:12:03 -0700 From: Jeremy Chadwick To: mxb Subject: Re: Slow resilvering with mirrored ZIL Message-ID: <20130704191203.GA95642@icarus.home.lan> References: <51D42107.1050107@digsys.bg> 
<2EF46A8C-6908-4160-BF99-EC610B3EA771@alumni.chalmers.se> <51D437E2.4060101@digsys.bg> <20130704000405.GA75529@icarus.home.lan> <20130704171637.GA94539@icarus.home.lan> <2A261BEA-4452-4F6A-8EFB-90A54D79CBB9@alumni.chalmers.se> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <2A261BEA-4452-4F6A-8EFB-90A54D79CBB9@alumni.chalmers.se> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 04 Jul 2013 19:12:19 -0000 On Thu, Jul 04, 2013 at 07:36:27PM +0200, mxb wrote: > zdb -C: > > NAS: > version: 28 > name: 'NAS' > state: 0 > txg: 19039918 > pool_guid: 3808946822857359331 > hostid: 516334119 > hostname: 'nas.home.unixconn.com' > vdev_children: 2 > vdev_tree: > type: 'root' > id: 0 > guid: 3808946822857359331 > children[0]: > type: 'raidz' > id: 0 > guid: 15126043265564363201 > nparity: 1 > metaslab_array: 14 > metaslab_shift: 32 > ashift: 9 Wow, okay, where do I begin... should I just send you a bill for all this? :P I kid, but seriously... Your pool is not correctly 4K-aligned (ashift 12); it is clearly ashift 9, which is aligned to 512 bytes. This means you did not do the 4K gnop procedure which is needed for proper alignment: http://ivoras.net/blog/tree/2011-01-01.freebsd-on-4k-sector-drives.html The performance hit on 4K sector drives (particularly writes) is tremendous. The performance hit on SSDs is borderline catastrophic. The 4K procedure does not hurt/harm/hinder older 512-byte sector drives in any way, so using 4K alignment for all drives regardless of sector size is perfectly safe. 
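For reference, ashift is the base-2 logarithm of the pool's smallest block size, which is why ashift 9 means 512-byte alignment and ashift 12 means 4K. The executable part below only does that arithmetic; the gnop sequence in the comment is an illustrative sketch following the linked article (device and pool names are the ones from this thread), not a command transcript from the email:

```shell
# ashift is log2 of the smallest I/O the pool will issue:
for ashift in 9 12; do
  echo "ashift=$ashift -> $((1 << ashift))-byte blocks"
done
# prints:
#   ashift=9 -> 512-byte blocks
#   ashift=12 -> 4096-byte blocks
#
# The gnop procedure from the linked article, roughly: present one
# member as a fake 4K-sector provider so "zpool create" picks
# ashift=12, then drop the gnop layer (the ashift value is permanent):
#   gnop create -S 4096 ada0
#   zpool create NAS raidz ada0.nop ada1 ada2 ada3
#   zpool export NAS && gnop destroy ada0.nop && zpool import NAS
```

Note that ashift is fixed per vdev at creation time, which is why an ashift-9 pool cannot be "re-aligned" by replacing disks.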
I believe -- but I need someone else to chime in here with confirmation, particularly someone who is familiar with ZFS's internals -- once your pool is ashift 12, you can do a disk replacement ***without*** having to do the gnop procedure (because the pool itself is already using ashift 12). But again, I need someone to confirm that. Next topic. Your disks are as follows:

ada0: ST32000645NS -- Seagate Constellation ES.2, 2TB, 7200rpm, 512-byte sector
ada1: WD10EARS -- Western Digital Green, 1TB, 5400/7200rpm, 4KB sector
ada2: WD10EARS -- Western Digital Green, 1TB, 5400/7200rpm, 4KB sector
ada3: ST32000645NS -- Seagate Constellation ES.2, 2TB, 7200rpm, 512-byte sector
ada4: INTEL SSDSA2VP020G2 -- Intel 311 Series, 20GB, SSD, 4KB sector
ada5: OCZ-AGILITY3 -- OCZ Agility 3, 60GB, SSD, 4KB sector

The WD10EARS are known for excessively parking their heads, which causes massive performance problems with both reads and writes. This is known by PC enthusiasts as the "LCC issue" (LCC = Load Cycle Count, referring to SMART attribute 193). On these drives there are ways to work around this issue -- it specifically involves disabling drive-level APM. To do so, you have to initiate a specific ATA CDB to the drive using "camcontrol cmd", and this has to be done every time the system reboots. There is one drawback to disabling APM as well: the drives run hotter. In general, stay away from any "Green" or "GreenPower" or "EcoGreen" drives from any vendor. Likewise, many of Seagate's drives these days park their heads excessively and **without** any way to disable the behaviour (and in later firmwares, they don't even increment LCC, and they did this solely because customers were noticing the behaviour and complaining about it so they decided to hide it. 
Cute) I tend to recommend WD Red drives these days -- and not because they're "NAS friendly" (marketing drivel), but because they don't have retarded firmware settings, don't act stupid/get in the way, use single 1TB platters (thus fewer heads, thus less heat, fewer parts to fail, less vibration), and perform a little bit better than the Greens. Next topic, sort of circling back up... Your ada4 and ada5 drives -- the slices, I mean -- are ALSO not properly aligned to a 4K boundary. This is destroying their performance and probably destroying the drives every time they write. They are having to do two actual erase-write cycles to make up for this problem, due to misaligned boundaries. Combine this fact with the fact that 9.1-RELEASE does not support TRIM on ZFS, and you now have SSDs which are probably beat to hell and back. You really need to be running stable/9 if you want to use SSDs with ZFS. I cannot stress this enough. I will not bend on this fact. I do not care if what people have are SLC rather than MLC or TLC -- it doesn't matter. TRIM on ZFS is a downright necessity for long-term reliability of an SSD. Anyway... These SSDs need a full Secure Erase done to them. In stable/9 you can do this through camcontrol, otherwise you need to use Linux (there are live CD/DVD distros that can do this for you) or the vendor's native utilities (in Windows usually). UNDERSTAND: THIS IS NOT THE SAME AS A "DISK FORMAT" OR "ZEROING THE DISK". In fact, dd if=/dev/zero to zero an SSD would be the worst possible thing you could do to it. Secure Erase clears the entire FTL and resets the wear levelling matrix (that's just what I call it) back to factory defaults, so you end up with out-of-the-box performance: there's no more LBA-to-NAND-cell map entries in the FTL (which are usually what are responsible for slowdown). 
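As a sketch of what the camcontrol route to ATA Secure Erase looks like: the usual sequence is to set a temporary user password and then issue the erase. The exact subcommand and flag spellings below are an assumption from memory, not taken from this thread -- verify them against camcontrol(8) on your release before running anything; the block below only prints the commands, it never touches a disk:

```shell
# Print (do NOT run) a typical ATA Secure Erase sequence for ada4.
# ASSUMPTION: the "security" subcommand and -U/-s/-e flags below are
# from memory; check "man camcontrol" before use. The password is a
# throwaway that the erase itself clears.
disk=ada4
pass=erase
echo "camcontrol security $disk -U user -s $pass   # set a temporary user password"
echo "camcontrol security $disk -U user -e $pass   # issue SECURITY ERASE UNIT"
```

On a drive in the "frozen" security state you would first need to unfreeze it (often by suspending/resuming or hot-replugging), which is another reason the vendor utilities or a Linux live image are sometimes the path of least resistance.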
But even so, there is a very good possibility you have induced massive wear and tear on the NAND cells given this situation (especially with the Intel drive) and they may be in very bad shape in general. smartctl will tell me (keep reading). You should do the Secure Erase before doing any of the gpart stuff I talk about below. Back to the 4K alignment problem with your SSD slices: You should have partitioned these using "gpart" and the GPT mechanism, and used a 1MByte alignment for both partitions -- it's the easiest way to get proper alignment (with MBR it's possible but it's more of a mess and really not worth it). The procedure is described here: http://www.wonkity.com/~wblock/docs/html/disksetup.html#_the_new_standard_gpt I'll explain what I'd do, but please do not do this yet either (there is another aspect that needs to be covered as well). Ignore the stuff about labels, and ignore the "boot" partition stuff -- it doesn't apply to your setup. To mimic what you currently have, you just need two partitions on each drive, properly aligned via GPT. That should be as easy as this -- except you are going to have to destroy your pool and start over because you're using log devices:

gpart destroy -F ada4
gpart destroy -F ada5
gpart create -s gpt ada4
gpart add -t freebsd-zfs -b 1M -s 10G ada4
gpart add -t freebsd-zfs -a 1M ada4
gpart create -s gpt ada5
gpart add -t freebsd-zfs -b 1M -s 10G ada5
gpart add -t freebsd-zfs -a 1M ada5

You should be left with 4 devices at this point -- note the name change ("p" not "s"):

/dev/ada4p1 -- use for your log device (mirrored)
/dev/ada4p2 -- unused
/dev/ada5p1 -- use for your log device (mirrored)
/dev/ada5p2 -- use for your cache device/L2ARC

And these should be able to be used with ZFS without any issue due to being properly aligned to a 1MByte boundary (which is 4096-byte aligned, but the reason we pick 1MByte is that it often correlates with NAND erase block size on many SSDs; it's sort of a "de-facto social standard" used by other OSes too). 
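The 1MByte figure works because it is a common multiple of every geometry in play: 512-byte sectors, 4K sectors, and typical NAND erase-block sizes. A quick check (a sketch, not from the email; the erase-block sizes are representative values, not measurements of these particular SSDs):

```shell
# A 1 MiB partition offset divides evenly by all the sizes that matter
# here: 512B and 4K sectors, plus common 128K/512K/1M erase blocks.
offset=$((1024 * 1024))
for unit in 512 4096 131072 524288 1048576; do
  [ $((offset % unit)) -eq 0 ] && echo "1MiB offset aligned to ${unit}-byte boundary"
done
```

Any partition that starts on a 1MiB boundary is therefore automatically 4K-aligned, regardless of what the drive reports as its logical sector size.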
But before doing ANY OF THIS... You should probably be made aware of the fact that SSDs need to be kept roughly 30-40% unused to get the most benefits out of wear levelling. Once you hit the 20% remaining mark, performance takes a hit, and the drive begins hurting more and more. Low-capacity SSDs are therefore generally worthless given the capacity limitation need. Your Intel drive is very very small, and in fact I wouldn't even bother to use this drive -- it means you'd only be able to use roughly 14GB of it (at most) for data, and leave the remaining 6GB unallocated/unused solely for wear levelling. So given this information, when using the above gpart commands, you might actually want to adjust the sizes of the partitions so that there is always a guaranteed level of free/untouched space on them. For example:

gpart destroy -F ada4
gpart destroy -F ada5
gpart create -s gpt ada4
gpart add -t freebsd-zfs -b 1M -s 7G ada4
gpart add -t freebsd-zfs -a 1M -s 7G ada4
gpart create -s gpt ada5
gpart add -t freebsd-zfs -b 1M -s 10G ada5
gpart add -t freebsd-zfs -a 1M -s 32G ada5

This would result in the following:

/dev/ada4p1 -- 7GB -- use for your log device (mirrored)
/dev/ada4p2 -- 7GB -- unused
-- 6GB (remaining 30% of SSD) -- for wear levelling
/dev/ada5p1 -- 10GB -- use for your log device (mirrored)
/dev/ada5p2 -- 32GB -- use for your cache device/L2ARC
-- 18GB (remaining 30% of SSD) -- for wear levelling

I feel like I'm missing something, but no, I think that just about does it. Oh, I remember now. Next topic... I would strongly recommend you not use 1 SSD for both log and cache. I understand your thought process here: "if the SSD dies, the log devices are mirrored so I'm okay, and the cache is throw-away anyway". What you're not taking into consideration is how log and cache devices bottleneck ZFS, in addition to the fact that SATA is not like SAS when it comes to simultaneous R/W. That poor OCZ drive... 
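The partition plan above encodes a roughly 70/30 split on each SSD. The arithmetic, as a sketch (drive sizes are the 20GB Intel and 60GB OCZ from this thread; the 30% reserve is the rule of thumb being discussed, not a hard spec):

```shell
# Reproduce the 70/30 split used in the partition plan above:
# keep ~30% of each SSD unallocated for wear levelling.
for drive in ada4:20 ada5:60; do
  name=${drive%%:*}
  size=${drive##*:}
  reserve=$((size * 30 / 100))
  echo "$name: ${size}GB total -> partition $((size - reserve))GB, leave ${reserve}GB free"
done
# prints:
#   ada4: 20GB total -> partition 14GB, leave 6GB free
#   ada5: 60GB total -> partition 42GB, leave 18GB free
```

These match the figures in the email: 7G+7G allocated on the 20GB Intel (6GB free) and 10G+32G on the 60GB OCZ (18GB free).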
I will let someone else talk about this part, because I feel I've already written up enough as is, I'm not sure how much will sink in, you'll probably be angry being told "your setup is a pretty gigantic mess but here's how to fix it, it involves recreating your entire pool and partitions on your SSDs too", but I'm also exhausted as this is the 4th or 5th Email in just the past few days I've had to write to cover multiple bases. I don't understand how this knowledge has not been dispensed into the community by now, common practise, etc.. I feel like yelling "VAFAAAAAAN!!!" :-) > nas# smartctl -a ada3 > ada3: Unable to detect device type My fault -- the syntax here is wrong, I should have been more clear: smartctl -a /dev/ada{0,5} Also, please update your ports tree and install smartmontools 6.1. There are improvements there pertaining to SSDs that are relevant. -- | Jeremy Chadwick jdc@koitsu.org | | UNIX Systems Administrator http://jdc.koitsu.org/ | | Making life hard for others since 1977. 
PGP 4BD6C0CB | From owner-freebsd-fs@FreeBSD.ORG Thu Jul 4 19:14:47 2013 Return-Path: Delivered-To: freebsd-fs@smarthost.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id D6DC9E19; Thu, 4 Jul 2013 19:14:47 +0000 (UTC) (envelope-from smh@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:1900:2254:206c::16:87]) by mx1.freebsd.org (Postfix) with ESMTP id B18981A0F; Thu, 4 Jul 2013 19:14:47 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.7/8.14.7) with ESMTP id r64JEl4e030915; Thu, 4 Jul 2013 19:14:47 GMT (envelope-from smh@freefall.freebsd.org) Received: (from smh@localhost) by freefall.freebsd.org (8.14.7/8.14.7/Submit) id r64JEkfw030914; Thu, 4 Jul 2013 19:14:46 GMT (envelope-from smh) Date: Thu, 4 Jul 2013 19:14:46 GMT Message-Id: <201307041914.r64JEkfw030914@freefall.freebsd.org> To: fidaj@ukr.net, smh@FreeBSD.org, freebsd-fs@FreeBSD.org From: smh@FreeBSD.org Subject: Re: kern/180236: [zfs] [nullfs] Leakage free space using ZFS with nullfs on 9.1-STABLE X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 04 Jul 2013 19:14:47 -0000 Synopsis: [zfs] [nullfs] Leakage free space using ZFS with nullfs on 9.1-STABLE State-Changed-From-To: open->patched State-Changed-By: smh State-Changed-When: Thu Jul 4 19:14:46 UTC 2013 State-Changed-Why: This is now addressed by r252714 (http://svnweb.freebsd.org/changeset/base/252714) in head thanks to kib http://www.freebsd.org/cgi/query-pr.cgi?pr=180236 From owner-freebsd-fs@FreeBSD.ORG Thu Jul 4 19:15:05 2013 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id BECD3E2A for ; Thu, 4 Jul 2013 19:15:05 +0000 
(UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 161F11A16 for ; Thu, 4 Jul 2013 19:15:04 +0000 (UTC) Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id WAA15948; Thu, 04 Jul 2013 22:14:41 +0300 (EEST) (envelope-from avg@FreeBSD.org) Received: from localhost ([127.0.0.1]) by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1UuozI-000FDc-OK; Thu, 04 Jul 2013 22:14:40 +0300 Message-ID: <51D5C968.2000803@FreeBSD.org> Date: Thu, 04 Jul 2013 22:13:44 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:17.0) Gecko/20130405 Thunderbird/17.0.5 MIME-Version: 1.0 To: Dmitry Morozovsky Subject: Re: boot from ZFS: which pool types use? References: <51D56066.1020902@FreeBSD.org> <51D577A9.1030304@gmail.com> <51D59AAD.3030208@FreeBSD.org> <51D5A20F.4070103@FreeBSD.org> In-Reply-To: X-Enigmail-Version: 1.5.1 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: freebsd-fs@FreeBSD.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 04 Jul 2013 19:15:05 -0000 on 04/07/2013 21:18 Dmitry Morozovsky said the following: > On Thu, 4 Jul 2013, Dmitry Morozovsky wrote: > >> Now, looking at zfsboot.c, couldn't we (for finding /boot/zfsloader suitable >> for us) loop over, say, top-level datasets in pools we have, in abcense of >> special property? As for me, this should plug the most annoying mistakes... I don't see any reason to do that. For example I have multiple root filesystems and they are second level from top. And I don't want any code to second-guess me. 
> And followup: > > for ufs/fdisk, one could easily drop in the boot process and switch to other > disk, slice, partitiona and even loader. Not that it's (at least easily) > achievable with zfsloader... Really? http://ru.kyivbsd.org.ua/arhiv/2012/kyivbsd12-gapon-zfs.pdf?attredirects=0&d=1 > And, not, unfortunately, I have no patches to apply.... > -- Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Thu Jul 4 19:34:15 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id AEE7538A for ; Thu, 4 Jul 2013 19:34:15 +0000 (UTC) (envelope-from osharoiko@gmail.com) Received: from mail-ve0-x234.google.com (mail-ve0-x234.google.com [IPv6:2607:f8b0:400c:c01::234]) by mx1.freebsd.org (Postfix) with ESMTP id 72F041B3A for ; Thu, 4 Jul 2013 19:34:15 +0000 (UTC) Received: by mail-ve0-f180.google.com with SMTP id pa12so1244096veb.39 for ; Thu, 04 Jul 2013 12:34:15 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:date:message-id:subject:from:to:content-type; bh=6rIrPpzPmfN2r/GUUWGwmIA5MntbXonCW2dIqzbq2go=; b=feMZd97Vd2V0FJQWfsK7sTY7aIzkA1Xb1bnIc8+JMUJ9YpvnSadGlHbhe3rwJ5CVs7 PQ4vJivMKZAxQcv4onIaMUwZcFMffmXpqxkLOABCFjMvtEvzB3wDey0QQeyxKaLN4bHL qramrKTp+M5SrGxWw7ZSHwqWdqacIna/5HxkXpFwqpPlQbjbo7vz6Zlb3BnOpCzQFujx fpUxPOV1K7QgIDT93dTB+cB7WR2/h0XQRWOxQ42y6yjzuXzuZaxBgrnCgkv3q476nD5w 7Qtc0gq4kEo8Xu6jrokplyvw7utI9Uzko3bg5eo44qmfRgBs7WoqZf+jWeMXPWo8/i05 gf9g== MIME-Version: 1.0 X-Received: by 10.58.50.7 with SMTP id y7mr4121519ven.24.1372966454971; Thu, 04 Jul 2013 12:34:14 -0700 (PDT) Received: by 10.58.28.238 with HTTP; Thu, 4 Jul 2013 12:34:14 -0700 (PDT) Date: Thu, 4 Jul 2013 20:34:14 +0100 Message-ID: Subject: NFSv4 and Kerberos, group permission seem to be ignored From: Oleg Sharoyko To: freebsd-fs@freebsd.org Content-Type: text/plain; charset=UTF-8 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list 
List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 04 Jul 2013 19:34:15 -0000 Hello, I have a small server running FreeBSD 9.1 which I've set up as an NFSv4 server with kerberised NFS access. My clients are Linux machines. It almost works as expected (mounting/accessing files) except for one strange issue: it looks like group permissions on files and directories are being ignored. Here's an example: Server: evendim:~ % id uid=1001(ols) gid=1001(ols) groups=1001(ols),0(wheel),60000(family) evendim:~ % ls -l /data/file1 -rw-rw---- 1 root family 6 4 Jul 18:42 /data/file1 evendim:~ % cat /data/file1 test1 evendim:~ % ls -l /data/file2 -rw------- 1 ols family 6 4 Jul 18:42 /data/file2 evendim:~ % cat /data/file2 test2 evendim:~ % ls -l /data/file3 -rw-r--r-- 1 root family 6 4 Jul 18:42 /data/file3 evendim:~ % cat /data/file3 test3 evendim:~ % cat /etc/exports V4:/ -sec=krb5 /data -sec=krb5 Client: sherlock:~ % id uid=1000(ols) gid=1000(ols) groups=1000(ols),4(adm),20(dialout),24(cdrom),25(floppy),29(audio),30(dip),44(video),46(plugdev),109(netdev),110(bluetooth),113(fuse),116(scanner),118(kismet),60000(family) sherlock:~ % sudo mount -v -t nfs4 -o sec=krb5 evendim.sharoyko.net:/data /mnt mount.nfs4: timeout set for Thu Jul 4 19:52:16 2013 mount.nfs4: trying text-based options 'sec=krb5,addr=192.168.1.3,clientaddr=192.168.1.128' sherlock:~ % ls -l /mnt/file1 -rw-rw---- 1 root family 6 Jul 4 19:42 /mnt/file1 sherlock:~ % cat /mnt/file1 cat: /mnt/file1: Permission denied sherlock:~ % ls -l /mnt/file2 -rw------- 1 ols family 6 Jul 4 19:42 /mnt/file2 sherlock:~ % cat /mnt/file2 test2 sherlock:~ % ls -l /mnt/file3 -rw-r--r-- 1 root family 6 Jul 4 19:42 /mnt/file3 sherlock:~ % cat /mnt/file3 test3 As you can see, file1 is inaccessible even though it has group read/write permissions; user ols belongs to group family on both client and server, and user/group mapping seems to work. 
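For reference, the plain POSIX mode check that should govern /data/file1 (mode 0660, owner root, group family) can be sketched as follows. This is an illustrative sketch, not a diagnosis: the helper name is made up, and the uid/gid numbers are taken from the server-side id/ls output above. One plausible failure mode consistent with the behaviour described is the server evaluating the RPC credential without the caller's supplementary groups, which makes the check fall through to the "other" bits:

```python
# Hedged sketch of classic owner/group/other evaluation of a POSIX mode.
# Function name and uid/gid values are illustrative (server-side numbers).

def may_read(mode: int, file_uid: int, file_gid: int, uid: int, gids: set) -> bool:
    if uid == file_uid:
        return bool(mode & 0o400)       # owner read bit
    if file_gid in gids:                # supplementary groups must be consulted
        return bool(mode & 0o040)       # group read bit
    return bool(mode & 0o004)           # other read bit

# uid 1001 (ols) is not the owner (root), but gid 60000 (family) is in its
# group list, so the group bits of 0660 should grant read access:
assert may_read(0o660, 0, 60000, 1001, {1001, 0, 60000})

# If the server dropped the supplementary groups from the credential,
# the check falls through to "other" and access is denied:
assert not may_read(0o660, 0, 60000, 1001, {1001})
```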
/data on the server is a ZFS filesystem but I've also tried UFS with the same results. I've also tried ACLs and ACLs for users do work while ACLs for groups don't seem to have any effect. Is there something that I'm doing wrong? Is this an expected behaviour? I will greatly appreciate if you can help me debugging this issue. I'll quote below captured packets that are relevant to my attempt to access file1. As you can see access is clearly denied by server but I don't understand why. No. Time Source Destination Protocol Length Info 109 5.649608 192.168.1.128 192.168.1.3 NFS 258 V4 Call (Reply In 110) LOOKUP DH:0x4dcc3776/file1 Frame 109: 258 bytes on wire (2064 bits), 258 bytes captured (2064 bits) Ethernet II, Src: GemtekTe_f6:cf:a1 (00:26:82:f6:cf:a1), Dst: Giga-Byt_db:cd:c4 (90:2b:34:db:cd:c4) Internet Protocol Version 4, Src: 192.168.1.128 (192.168.1.128), Dst: 192.168.1.3 (192.168.1.3) Transmission Control Protocol, Src Port: 726 (726), Dst Port: nfs (2049), Seq: 3337, Ack: 3193, Len: 192 Remote Procedure Call, Type:Call XID:0xba073c52 Network File System [Program Version: 4] [V4 Procedure: COMPOUND (1)] Tag: length: 0 contents: minorversion: 0 Operations (count: 4) Opcode: PUTFH (22) filehandle length: 28 [hash (CRC-32): 0x4dcc3776] decode type as: unknown filehandle: 9a7470c6deedeca50a0004000000000037d80a0000000000... 
Opcode: LOOKUP (15) Filename: file1 length: 5 contents: file1 fill bytes: opaque data Opcode: GETFH (10) Opcode: GETATTR (9) GETATTR4args attr_request bitmap[0] = 0x0010011a [5 attributes requested] mand_attr: FATTR4_TYPE (1) mand_attr: FATTR4_CHANGE (3) mand_attr: FATTR4_SIZE (4) mand_attr: FATTR4_FSID (8) recc_attr: FATTR4_FILEID (20) bitmap[1] = 0x0030a23a [9 attributes requested] recc_attr: FATTR4_MODE (33) recc_attr: FATTR4_NUMLINKS (35) recc_attr: FATTR4_OWNER (36) recc_attr: FATTR4_OWNER_GROUP (37) recc_attr: FATTR4_RAWDEV (41) recc_attr: FATTR4_SPACE_USED (45) recc_attr: FATTR4_TIME_ACCESS (47) recc_attr: FATTR4_TIME_METADATA (52) recc_attr: FATTR4_TIME_MODIFY (53) [Main Opcode: LOOKUP (15)] No. Time Source Destination Protocol Length Info 110 5.649870 192.168.1.3 192.168.1.128 NFS 370 V4 Reply (Call In 109) LOOKUP Frame 110: 370 bytes on wire (2960 bits), 370 bytes captured (2960 bits) Ethernet II, Src: Giga-Byt_db:cd:c4 (90:2b:34:db:cd:c4), Dst: GemtekTe_f6:cf:a1 (00:26:82:f6:cf:a1) Internet Protocol Version 4, Src: 192.168.1.3 (192.168.1.3), Dst: 192.168.1.128 (192.168.1.128) Transmission Control Protocol, Src Port: nfs (2049), Dst Port: 726 (726), Seq: 3193, Ack: 3529, Len: 304 Remote Procedure Call, Type:Reply XID:0xba073c52 Network File System [Program Version: 4] [V4 Procedure: COMPOUND (1)] Status: NFS4_OK (0) Tag: length: 0 contents: Operations (count: 4) Opcode: PUTFH (22) Status: NFS4_OK (0) Opcode: LOOKUP (15) Status: NFS4_OK (0) Opcode: GETFH (10) Status: NFS4_OK (0) Filehandle length: 28 [hash (CRC-32): 0xc0a4eeb4] decode type as: unknown filehandle: 9a7470c6deedeca50a00ed00000000001bb70d0000000000... 
Opcode: GETATTR (9) Status: NFS4_OK (0) GETATTR4res resok4 obj_attributes attrmask bitmap[0] = 0x0010011a [5 attributes requested] mand_attr: FATTR4_TYPE (1) mand_attr: FATTR4_CHANGE (3) mand_attr: FATTR4_SIZE (4) mand_attr: FATTR4_FSID (8) recc_attr: FATTR4_FILEID (20) bitmap[1] = 0x0030a23a [9 attributes requested] recc_attr: FATTR4_MODE (33) recc_attr: FATTR4_NUMLINKS (35) recc_attr: FATTR4_OWNER (36) recc_attr: FATTR4_OWNER_GROUP (37) recc_attr: FATTR4_RAWDEV (41) recc_attr: FATTR4_SPACE_USED (45) recc_attr: FATTR4_TIME_ACCESS (47) recc_attr: FATTR4_TIME_METADATA (52) recc_attr: FATTR4_TIME_MODIFY (53) attr_vals mand_attr: FATTR4_TYPE (1) nfs_ftype4: NF4REG (1) mand_attr: FATTR4_CHANGE (3) changeid: 96 mand_attr: FATTR4_SIZE (4) size: 6 mand_attr: FATTR4_FSID (8) fattr4_fsid fsid4.major: 3329258650 fsid4.minor: 2783768030 recc_attr: FATTR4_FILEID (20) fileid: 237 recc_attr: FATTR4_MODE (33) fattr4_mode: 0660 000. .... .... .... = Unknown .... 0... .... .... = not SUID .... .0.. .... .... = not SGID .... ..0. .... .... = not save swapped text .... ...1 .... .... = Read permission for owner .... .... 1... .... = Write permission for owner .... .... .0.. .... = no Execute permission for owner .... .... ..1. .... = Read permission for group .... .... ...1 .... = Write permission for group .... .... .... 0... = no Execute permission for group .... .... .... .0.. = no Read permission for others .... .... .... ..0. = no Write permission for others .... .... .... 
...0 = no Execute permission for others recc_attr: FATTR4_NUMLINKS (35) numlinks: 1 recc_attr: FATTR4_OWNER (36) fattr4_owner: root@id.sharoyko.net length: 20 contents: root@id.sharoyko.net recc_attr: FATTR4_OWNER_GROUP (37) fattr4_owner_group: family@id.sharoyko.net length: 22 contents: family@id.sharoyko.net fill bytes: opaque data recc_attr: FATTR4_RAWDEV (41) specdata1: 128 specdata2: 123863040 recc_attr: FATTR4_SPACE_USED (45) space_used: 1024 recc_attr: FATTR4_TIME_ACCESS (47) seconds: 1372963326 nseconds: 263434280 recc_attr: FATTR4_TIME_METADATA (52) seconds: 1372963379 nseconds: 804435894 recc_attr: FATTR4_TIME_MODIFY (53) seconds: 1372963326 nseconds: 264422029 [Main Opcode: LOOKUP (15)] No. Time Source Destination Protocol Length Info 117 8.456684 192.168.1.128 192.168.1.3 NFS 322 V4 Call (Reply In 118) OPEN DH:0x4dcc3776/file1 Frame 117: 322 bytes on wire (2576 bits), 322 bytes captured (2576 bits) Ethernet II, Src: GemtekTe_f6:cf:a1 (00:26:82:f6:cf:a1), Dst: Giga-Byt_db:cd:c4 (90:2b:34:db:cd:c4) Internet Protocol Version 4, Src: 192.168.1.128 (192.168.1.128), Dst: 192.168.1.3 (192.168.1.3) Transmission Control Protocol, Src Port: 726 (726), Dst Port: nfs (2049), Seq: 3905, Ack: 3697, Len: 256 Remote Procedure Call, Type:Call XID:0xbd073c52 Network File System [Program Version: 4] [V4 Procedure: COMPOUND (1)] Tag: length: 0 contents: minorversion: 0 Operations (count: 5) Opcode: PUTFH (22) filehandle length: 28 [hash (CRC-32): 0x4dcc3776] decode type as: unknown filehandle: 9a7470c6deedeca50a0004000000000037d80a0000000000... Opcode: OPEN (18) seqid: 0x00000000 share_access: OPEN4_SHARE_ACCESS_READ (1) share_deny: OPEN4_SHARE_DENY_NONE (0) clientid: 0xcd6cc75124000000 owner: length: 24 contents: Open Type: OPEN4_NOCREATE (0) Claim Type: CLAIM_NULL (0) Filename: file1 length: 5 contents: file1 fill bytes: opaque data Opcode: GETFH (10) Opcode: ACCESS (3), [Check: RD MD XT XE] Check access: 0x2d .... ...1 = 0x01 READ: allowed? .... .1.. 
= 0x04 MODIFY: allowed? .... 1... = 0x08 EXTEND: allowed? ..1. .... = 0x20 EXECUTE: allowed? Opcode: GETATTR (9) GETATTR4args attr_request bitmap[0] = 0x0010011a [5 attributes requested] mand_attr: FATTR4_TYPE (1) mand_attr: FATTR4_CHANGE (3) mand_attr: FATTR4_SIZE (4) mand_attr: FATTR4_FSID (8) recc_attr: FATTR4_FILEID (20) bitmap[1] = 0x0030a23a [9 attributes requested] recc_attr: FATTR4_MODE (33) recc_attr: FATTR4_NUMLINKS (35) recc_attr: FATTR4_OWNER (36) recc_attr: FATTR4_OWNER_GROUP (37) recc_attr: FATTR4_RAWDEV (41) recc_attr: FATTR4_SPACE_USED (45) recc_attr: FATTR4_TIME_ACCESS (47) recc_attr: FATTR4_TIME_METADATA (52) recc_attr: FATTR4_TIME_MODIFY (53) [Main Opcode: OPEN (18)] No. Time Source Destination Protocol Length Info 118 8.456811 192.168.1.3 192.168.1.128 NFS 150 V4 Reply (Call In 117) OPEN Status: NFS4ERR_ACCES Frame 118: 150 bytes on wire (1200 bits), 150 bytes captured (1200 bits) Ethernet II, Src: Giga-Byt_db:cd:c4 (90:2b:34:db:cd:c4), Dst: GemtekTe_f6:cf:a1 (00:26:82:f6:cf:a1) Internet Protocol Version 4, Src: 192.168.1.3 (192.168.1.3), Dst: 192.168.1.128 (192.168.1.128) Transmission Control Protocol, Src Port: nfs (2049), Dst Port: 726 (726), Seq: 3697, Ack: 4161, Len: 84 Remote Procedure Call, Type:Reply XID:0xbd073c52 Network File System [Program Version: 4] [V4 Procedure: COMPOUND (1)] Status: NFS4ERR_ACCES (13) Tag: length: 0 contents: Operations (count: 2) Opcode: PUTFH (22) Status: NFS4_OK (0) Opcode: OPEN (18) Status: NFS4ERR_ACCES (13) [Main Opcode: OPEN (18)] Kind regards, -- Oleg From owner-freebsd-fs@FreeBSD.ORG Thu Jul 4 19:41:12 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 20704564 for ; Thu, 4 Jul 2013 19:41:12 +0000 (UTC) (envelope-from prvs=1897fed9ea=killing@multiplay.co.uk) Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) by mx1.freebsd.org (Postfix) with ESMTP id 
A3BFC1B9A for ; Thu, 4 Jul 2013 19:41:11 +0000 (UTC) Received: from r2d2 ([82.69.141.170]) by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (MDaemon PRO v10.0.4) with ESMTP id md50004727068.msg for ; Thu, 04 Jul 2013 20:41:09 +0100 X-Spam-Processed: mail1.multiplay.co.uk, Thu, 04 Jul 2013 20:41:09 +0100 (not processed: message from valid local sender) X-MDDKIM-Result: neutral (mail1.multiplay.co.uk) X-MDRemoteIP: 82.69.141.170 X-Return-Path: prvs=1897fed9ea=killing@multiplay.co.uk X-Envelope-From: killing@multiplay.co.uk X-MDaemon-Deliver-To: freebsd-fs@freebsd.org Message-ID: <43015E9015084CA6BAC6978F39D22E8B@multiplay.co.uk> From: "Steven Hartland" To: "Jeremy Chadwick" , "mxb" References: <51D42107.1050107@digsys.bg> <2EF46A8C-6908-4160-BF99-EC610B3EA771@alumni.chalmers.se> <51D437E2.4060101@digsys.bg> <20130704000405.GA75529@icarus.home.lan> <20130704171637.GA94539@icarus.home.lan> <2A261BEA-4452-4F6A-8EFB-90A54D79CBB9@alumni.chalmers.se> <20130704191203.GA95642@icarus.home.lan> Subject: Re: Slow resilvering with mirrored ZIL Date: Thu, 4 Jul 2013 20:41:18 +0100 MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=original Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.5931 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6157 Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 04 Jul 2013 19:41:12 -0000 ----- Original Message ----- From: "Jeremy Chadwick" ... > I believe -- but I need someone else to chime in here with confirmation, > particularly someone who is familiar with ZFS's internals -- once your > pool is ashift 12, you can do a disk replacement ***without*** having to > do the gnop procedure (because the pool itself is already using ashift > 12). 
But again, I need someone to confirm that. Close: the ashift is a property of the vdev, not the entire pool, so if you're adding a new vdev to the pool, at least one of the devices in that vdev needs to report 4k sectors, either natively or via the gnop workaround. Note that our ZFS code doesn't currently recognise FreeBSD 4K quirks; this is something I have a patch for, but I want to enhance it before committing. > Next topic. ... > Combine this fact with the fact that 9.1-RELEASE does not support TRIM > on ZFS, and you now have SSDs which are probably beat to hell and back. > > You really need to be running stable/9 if you want to use SSDs with ZFS. > I cannot stress this enough. I will not bend on this fact. I do not > care if what people have are SLC rather than MLC or TLC -- it doesn't > matter. TRIM on ZFS is a downright necessity for long-term reliability > of an SSD. Anyway... stable/8 also has TRIM support now. > These SSDs need a full Secure Erase done to them. In stable/9 you can > do this through camcontrol, otherwise you need to use Linux (there are > live CD/DVD distros that can do this for you) or the vendor's native > utilities (in Windows usually). When adding a new device, ZFS will attempt to do a full TRIM, so this isn't 100% necessary; but as some disks still get extra benefit from it, it's still good if you want the best performance. ... > Next topic... > > I would strongly recommend you not use 1 SSD for both log and cache. > I understand your thought process here: "if the SSD dies, the log > devices are mirrored so I'm okay, and the cache is throw-away anyway". While not ideal, it still gives a significant boost over having no SLOG, so if that's the HW you have to work with, don't discount the benefit it will bring. ... >> nas# smartctl -a ada3 >> ada3: Unable to detect device type > > My fault -- the syntax here is wrong, I should have been more clear: > > smartctl -a /dev/ada{0,5} > > Also, please update your ports tree and install smartmontools 6.1. 
> There are improvements there pertaining to SSDs that are relevant. Also don't forget to update the disk DB using update-smart-drivedb. Regards Steve ================================================ This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmaster@multiplay.co.uk. From owner-freebsd-fs@FreeBSD.ORG Thu Jul 4 19:59:04 2013 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 64632CAF; Thu, 4 Jul 2013 19:59:04 +0000 (UTC) (envelope-from marck@rinet.ru) Received: from woozle.rinet.ru (woozle.rinet.ru [195.54.192.68]) by mx1.freebsd.org (Postfix) with ESMTP id E3F8E1C42; Thu, 4 Jul 2013 19:59:03 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by woozle.rinet.ru (8.14.5/8.14.5) with ESMTP id r64Jx28a064294; Thu, 4 Jul 2013 23:59:02 +0400 (MSK) (envelope-from marck@rinet.ru) Date: Thu, 4 Jul 2013 23:59:02 +0400 (MSK) From: Dmitry Morozovsky To: Andriy Gapon Subject: Re: boot from ZFS: which pool types use? 
In-Reply-To: <51D5C968.2000803@FreeBSD.org> Message-ID: References: <51D56066.1020902@FreeBSD.org> <51D577A9.1030304@gmail.com> <51D59AAD.3030208@FreeBSD.org> <51D5A20F.4070103@FreeBSD.org> <51D5C968.2000803@FreeBSD.org> User-Agent: Alpine 2.00 (BSF 1167 2008-08-23) X-NCC-RegID: ru.rinet X-OpenPGP-Key-ID: 6B691B03 MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.4.3 (woozle.rinet.ru [0.0.0.0]); Thu, 04 Jul 2013 23:59:02 +0400 (MSK) Cc: freebsd-fs@FreeBSD.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 04 Jul 2013 19:59:04 -0000 On Thu, 4 Jul 2013, Andriy Gapon wrote: > >> Now, looking at zfsboot.c, couldn't we (for finding /boot/zfsloader suitable > >> for us) loop over, say, top-level datasets in pools we have, in abcense of > >> special property? As for me, this should plug the most annoying mistakes... > > I don't see any reason to do that. For example I have multiple root filesystems > and they are second level from top. And I don't want any code to second-guess me. Yes, it's fair objection, and I had this in mind > > for ufs/fdisk, one could easily drop in the boot process and switch to other > > disk, slice, partitiona and even loader. Not that it's (at least easily) > > achievable with zfsloader... > > Really? > http://ru.kyivbsd.org.ua/arhiv/2012/kyivbsd12-gapon-zfs.pdf?attredirects=0&d=1 Hmm. I do not see (maybe I'm missing something obvious) how can one select at gptzfsboot or similar prompt the pool to load loader (not too much 'load's, heh?) from... 
-- Sincerely, D.Marck [DM5020, MCK-RIPE, DM3-RIPN] [ FreeBSD committer: marck@FreeBSD.org ] ------------------------------------------------------------------------ *** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- marck@rinet.ru *** ------------------------------------------------------------------------ From owner-freebsd-fs@FreeBSD.ORG Thu Jul 4 20:07:02 2013 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id E5BD7DA6 for ; Thu, 4 Jul 2013 20:07:02 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 3BD141C7F for ; Thu, 4 Jul 2013 20:07:01 +0000 (UTC) Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id XAA16718; Thu, 04 Jul 2013 23:06:38 +0300 (EEST) (envelope-from avg@FreeBSD.org) Received: from localhost ([127.0.0.1]) by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1Uupna-000FIv-0d; Thu, 04 Jul 2013 23:06:38 +0300 Message-ID: <51D5D5AA.8070807@FreeBSD.org> Date: Thu, 04 Jul 2013 23:06:02 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:17.0) Gecko/20130405 Thunderbird/17.0.5 MIME-Version: 1.0 To: Dmitry Morozovsky Subject: Re: boot from ZFS: which pool types use? 
References: <51D56066.1020902@FreeBSD.org> <51D577A9.1030304@gmail.com> <51D59AAD.3030208@FreeBSD.org> <51D5A20F.4070103@FreeBSD.org> <51D5C968.2000803@FreeBSD.org> In-Reply-To: X-Enigmail-Version: 1.5.1 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: freebsd-fs@FreeBSD.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 04 Jul 2013 20:07:03 -0000 on 04/07/2013 22:59 Dmitry Morozovsky said the following: > On Thu, 4 Jul 2013, Andriy Gapon wrote: >> Really? >> http://ru.kyivbsd.org.ua/arhiv/2012/kyivbsd12-gapon-zfs.pdf?attredirects=0&d=1 > > Hmm. I do not see (maybe I'm missing something obvious) how can one select at > gptzfsboot or similar prompt the pool to load loader (not too much 'load's, > heh?) from... Page 18. FreeBSD/x86 boot Default: pool1:ROOT/test1:/boot/zfsloader boot: pool2:ROOT/knowngood:/boot/zfsloader.old BTW, the syntax now is changed to use more familiar dataset names, e.g. "pool2/ROOT/knowngood" instead of "pool2:ROOT/knowngood". -- Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Thu Jul 4 20:09:50 2013 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 136C1F71; Thu, 4 Jul 2013 20:09:50 +0000 (UTC) (envelope-from marck@rinet.ru) Received: from woozle.rinet.ru (woozle.rinet.ru [195.54.192.68]) by mx1.freebsd.org (Postfix) with ESMTP id 92B0F1C9A; Thu, 4 Jul 2013 20:09:49 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by woozle.rinet.ru (8.14.5/8.14.5) with ESMTP id r64K9mvt064624; Fri, 5 Jul 2013 00:09:48 +0400 (MSK) (envelope-from marck@rinet.ru) Date: Fri, 5 Jul 2013 00:09:48 +0400 (MSK) From: Dmitry Morozovsky To: Andriy Gapon Subject: Re: boot from ZFS: which pool types use? 
In-Reply-To: <51D5D5AA.8070807@FreeBSD.org> Message-ID: References: <51D56066.1020902@FreeBSD.org> <51D577A9.1030304@gmail.com> <51D59AAD.3030208@FreeBSD.org> <51D5A20F.4070103@FreeBSD.org> <51D5C968.2000803@FreeBSD.org> <51D5D5AA.8070807@FreeBSD.org> User-Agent: Alpine 2.00 (BSF 1167 2008-08-23) X-NCC-RegID: ru.rinet X-OpenPGP-Key-ID: 6B691B03 MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.4.3 (woozle.rinet.ru [0.0.0.0]); Fri, 05 Jul 2013 00:09:48 +0400 (MSK) Cc: freebsd-fs@FreeBSD.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 04 Jul 2013 20:09:50 -0000 On Thu, 4 Jul 2013, Andriy Gapon wrote: > >> Really? > >> http://ru.kyivbsd.org.ua/arhiv/2012/kyivbsd12-gapon-zfs.pdf?attredirects=0&d=1 > > > > Hmm. I do not see (maybe I'm missing something obvious) how can one select at > > gptzfsboot or similar prompt the pool to load loader (not too much 'load's, > > heh?) from... > > Page 18. > > FreeBSD/x86 boot > Default: pool1:ROOT/test1:/boot/zfsloader > boot: pool2:ROOT/knowngood:/boot/zfsloader.old > > BTW, the syntax now is changed to use more familiar dataset names, e.g. > "pool2/ROOT/knowngood" instead of "pool2:ROOT/knowngood". Wow. This should be documented loud, then. Possibly in the prompt too. 
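Spelled out, the page-18 example quoted above would read as follows under the newer dataset-name syntax Andriy describes (assuming the loader-path portion is unchanged):

```
FreeBSD/x86 boot
Default: pool1/ROOT/test1:/boot/zfsloader
boot: pool2/ROOT/knowngood:/boot/zfsloader.old
```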
Thank you very much for the insights :) -- Sincerely, D.Marck [DM5020, MCK-RIPE, DM3-RIPN] [ FreeBSD committer: marck@FreeBSD.org ] ------------------------------------------------------------------------ *** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- marck@rinet.ru *** ------------------------------------------------------------------------ From owner-freebsd-fs@FreeBSD.ORG Thu Jul 4 20:14:48 2013 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 47A4E469; Thu, 4 Jul 2013 20:14:48 +0000 (UTC) (envelope-from Devin.Teske@fisglobal.com) Received: from mx1.fisglobal.com (mx1.fisglobal.com [199.200.24.190]) by mx1.freebsd.org (Postfix) with ESMTP id 0F8541CD7; Thu, 4 Jul 2013 20:14:47 +0000 (UTC) Received: from smtp.fisglobal.com ([10.132.206.16]) by ltcfislmsgpa07.fnfis.com (8.14.5/8.14.5) with ESMTP id r64KEboD026720 (version=TLSv1/SSLv3 cipher=AES128-SHA bits=128 verify=NOT); Thu, 4 Jul 2013 15:14:37 -0500 Received: from LTCFISWMSGMB21.FNFIS.com ([10.132.99.23]) by LTCFISWMSGHT05.FNFIS.com ([10.132.206.16]) with mapi id 14.02.0309.002; Thu, 4 Jul 2013 15:14:37 -0500 From: "Teske, Devin" To: Dmitry Morozovsky Subject: Re: boot from ZFS: which pool types use? Thread-Topic: boot from ZFS: which pool types use? 
Thread-Index: AQHOeKNATnsjWmNSU0ODtAAAUK/mnZlUupEAgAADDgCAABitgIAAKSqAgAAAk4CAAAL4AIAABdaAgAAefwCAAAEJgIAAD18AgAAMqQCAAAH0AIAAAQ4AgAABVwA= Date: Thu, 4 Jul 2013 20:14:37 +0000 Message-ID: <13CA24D6AB415D428143D44749F57D7201FAFE59@ltcfiswmsgmb21> References: <51D56066.1020902@FreeBSD.org> <51D577A9.1030304@gmail.com> <51D59AAD.3030208@FreeBSD.org> <51D5A20F.4070103@FreeBSD.org> <51D5C968.2000803@FreeBSD.org> <51D5D5AA.8070807@FreeBSD.org> In-Reply-To: Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [10.132.253.126] Content-Type: text/plain; charset="us-ascii" Content-ID: Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10432:5.10.8794, 1.0.431, 0.0.0000 definitions=2013-07-04_07:2013-07-04,2013-07-04,1970-01-01 signatures=0 Cc: "freebsd-fs@freebsd.org" , Devin Teske , Andriy Gapon X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list Reply-To: Devin Teske List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 04 Jul 2013 20:14:48 -0000 On Jul 4, 2013, at 1:09 PM, Dmitry Morozovsky wrote: > On Thu, 4 Jul 2013, Andriy Gapon wrote: >=20 >>>> Really? >>>> http://ru.kyivbsd.org.ua/arhiv/2012/kyivbsd12-gapon-zfs.pdf?attredirec= ts=3D0&d=3D1 >>>=20 >>> Hmm. I do not see (maybe I'm missing something obvious) how can one se= lect at=20 >>> gptzfsboot or similar prompt the pool to load loader (not too much 'loa= d's,=20 >>> heh?) from... >>=20 >> Page 18. >>=20 >> FreeBSD/x86 boot >> Default: pool1:ROOT/test1:/boot/zfsloader >> boot: pool2:ROOT/knowngood:/boot/zfsloader.old >>=20 >> BTW, the syntax now is changed to use more familiar dataset names, e.g. >> "pool2/ROOT/knowngood" instead of "pool2:ROOT/knowngood". >=20 > Wow. This should be documented loud, then. Possibly in the prompt too. 
>=20 > Thank you very much for the insights :) >=20 I hope to pick up work again sometime soon on pool enumeration integration = for the beastie menu. Hard work is done. Was going to catch up with avg/mav sometime in the near = future to work on previously-discussed enhancements so that we could (toget= her) achieve simple submenus showing your possible choices for boot pool. --=20 Devin _____________ The information contained in this message is proprietary and/or confidentia= l. If you are not the intended recipient, please: (i) delete the message an= d all copies; (ii) do not disclose, distribute or use the message in any ma= nner; and (iii) notify the sender immediately. In addition, please be aware= that any message addressed to our domain is subject to archiving and revie= w by persons other than the intended recipient. Thank you. From owner-freebsd-fs@FreeBSD.ORG Thu Jul 4 20:27:41 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 242587F7; Thu, 4 Jul 2013 20:27:41 +0000 (UTC) (envelope-from c.kworr@gmail.com) Received: from mail-wg0-x236.google.com (mail-wg0-x236.google.com [IPv6:2a00:1450:400c:c00::236]) by mx1.freebsd.org (Postfix) with ESMTP id 8A9BF1D37; Thu, 4 Jul 2013 20:27:40 +0000 (UTC) Received: by mail-wg0-f54.google.com with SMTP id n11so1474186wgh.9 for ; Thu, 04 Jul 2013 13:27:39 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=message-id:date:from:user-agent:mime-version:to:cc:subject :references:in-reply-to:content-type:content-transfer-encoding; bh=CwYmr+0lLdWqyi1YFVTQG0foje+4yifiOAmomDxmJA8=; b=flw6rynpzbmHp+ZSLbC4B07Rl3IxGHVe9XL2vJ0kNJwp0WvKbT8GhMgcqLSLeZgj5p Ii2Uw9xZx7eMPD0sJxG74BNHlna9fC0tAzDNSwuv2kp3U0zFA3bs/nWcA6Xq0FedJ9fv n6A9a9hzfHdpYxdafcFwNgpeHRRjAsutYQdxqfLn8Xkv8yT41kwADCP9NEp9zj34UHo3 dZ/t/M5nwjphByMe0jYdnuPuTcY2nbQcr5a90buZ7v84LtRNvSsNqkuDGGDSIfpyMijW 
usgFTwSLAm6ho7Ya9NESMI2T3pkdCD7oYKKVBBLypwuB3ldKNUtdrlfzPOflAWwSaiLc 6aWA== X-Received: by 10.194.240.201 with SMTP id wc9mr4498954wjc.1.1372969659700; Thu, 04 Jul 2013 13:27:39 -0700 (PDT) Received: from limbo.xim.bz ([46.150.100.6]) by mx.google.com with ESMTPSA id fs8sm37052176wib.0.2013.07.04.13.27.38 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Thu, 04 Jul 2013 13:27:38 -0700 (PDT) Message-ID: <51D5DAB9.4070507@gmail.com> Date: Thu, 04 Jul 2013 23:27:37 +0300 From: Volodymyr Kostyrko User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:17.0) Gecko/20130627 Thunderbird/17.0.7 MIME-Version: 1.0 To: Andriy Gapon Subject: Re: ZFS default compression algo for contemporary FreeBSD versions References: <51D576E1.6030803@gmail.com> <51D59B6C.5030600@gmail.com> <51D59C88.9060403@FreeBSD.org> In-Reply-To: <51D59C88.9060403@FreeBSD.org> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-fs@FreeBSD.org, Dmitry Morozovsky X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 04 Jul 2013 20:27:41 -0000 04.07.2013 19:02, Andriy Gapon wrote: > on 04/07/2013 18:57 Volodymyr Kostyrko said the following: >> Yes. Much better in terms of speed. > > And compression too. Can't really say. When the code first appeared in stable I moved two of my machines (desktops) to LZ4 recreating each dataset. To my surprise gain at transition from lzjb was fairly minimal and sometimes LZ4 even loses to lzjb in compression size. However better compression/decompression speed and moreover earlier takeoff when data is incompressible clearly makes lz4 a winner. -- Sphinx of black quartz judge my vow. 
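The trade-off described above (lz4 backing off early on incompressible data) interacts with the write policy ZFS applies regardless of algorithm: a block is stored uncompressed unless compression saves at least 12.5% of its size. A minimal sketch of that policy, using zlib purely as a stand-in compressor (ZFS itself uses lzjb/lz4 here, and the helper name is made up for illustration):

```python
import os
import zlib

def maybe_compress(block: bytes, min_saving: float = 0.125):
    """Keep the compressed copy only if it saves at least `min_saving`
    of the block; otherwise store the block as-is (ZFS-like policy)."""
    packed = zlib.compress(block)
    if len(packed) <= len(block) * (1.0 - min_saving):
        return True, packed
    return False, block

stored_compressed, _ = maybe_compress(b"A" * 65536)       # highly compressible
assert stored_compressed

stored_compressed, raw = maybe_compress(os.urandom(65536))  # incompressible
assert not stored_compressed and len(raw) == 65536          # stored verbatim
```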
From owner-freebsd-fs@FreeBSD.ORG Thu Jul 4 20:28:34 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 9EEA5876 for ; Thu, 4 Jul 2013 20:28:34 +0000 (UTC) (envelope-from jdc@koitsu.org) Received: from relay3-d.mail.gandi.net (relay3-d.mail.gandi.net [217.70.183.195]) by mx1.freebsd.org (Postfix) with ESMTP id 45DEA1D42 for ; Thu, 4 Jul 2013 20:28:34 +0000 (UTC) Received: from mfilter10-d.gandi.net (mfilter10-d.gandi.net [217.70.178.139]) by relay3-d.mail.gandi.net (Postfix) with ESMTP id 7E9AAA80BF; Thu, 4 Jul 2013 22:28:23 +0200 (CEST) X-Virus-Scanned: Debian amavisd-new at mfilter10-d.gandi.net Received: from relay3-d.mail.gandi.net ([217.70.183.195]) by mfilter10-d.gandi.net (mfilter10-d.gandi.net [10.0.15.180]) (amavisd-new, port 10024) with ESMTP id BkstVRh3uO85; Thu, 4 Jul 2013 22:28:21 +0200 (CEST) X-Originating-IP: 76.102.14.35 Received: from jdc.koitsu.org (c-76-102-14-35.hsd1.ca.comcast.net [76.102.14.35]) (Authenticated sender: jdc@koitsu.org) by relay3-d.mail.gandi.net (Postfix) with ESMTPSA id 368A0A80C0; Thu, 4 Jul 2013 22:28:20 +0200 (CEST) Received: by icarus.home.lan (Postfix, from userid 1000) id 7395A73A1D; Thu, 4 Jul 2013 13:28:18 -0700 (PDT) Date: Thu, 4 Jul 2013 13:28:18 -0700 From: Jeremy Chadwick To: Steven Hartland Subject: Re: Slow resilvering with mirrored ZIL Message-ID: <20130704202818.GB97119@icarus.home.lan> References: <2EF46A8C-6908-4160-BF99-EC610B3EA771@alumni.chalmers.se> <51D437E2.4060101@digsys.bg> <20130704000405.GA75529@icarus.home.lan> <20130704171637.GA94539@icarus.home.lan> <2A261BEA-4452-4F6A-8EFB-90A54D79CBB9@alumni.chalmers.se> <20130704191203.GA95642@icarus.home.lan> <43015E9015084CA6BAC6978F39D22E8B@multiplay.co.uk> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <43015E9015084CA6BAC6978F39D22E8B@multiplay.co.uk> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: 
freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 04 Jul 2013 20:28:34 -0000 On Thu, Jul 04, 2013 at 08:41:18PM +0100, Steven Hartland wrote: > ----- Original Message ----- From: "Jeremy Chadwick" > > ... > >I believe -- but I need someone else to chime in here with confirmation, > >particularly someone who is familiar with ZFS's internals -- once your > >pool is ashift 12, you can do a disk replacement ***without*** having to > >do the gnop procedure (because the pool itself is already using ashift > >12). But again, I need someone to confirm that. > > Close, the ashift is a property of the vdev and not the entire pool so > if your adding a new vdev to the pool at least one of the devices in > said pool needs to report 4k sectors either natively or via the gnop > work around. I just looked at zdb -C -- you're right. I was visually parsing the ashift line/location as being part of the pool section, not the vdev section. Thank you for correcting me. And if I'm reading what you've said correctly, re: that only one device in the pool needs to have the gnop workaround for the vdev to end up using ashift=12, then that means "zpool replace" doesn't actually need the administrator to do the gnop trick on a disk replacement -- because the more I thought about that, the more I realised that would open up a can of worms that would be painful to deal with if the pool was in use or the system could not be rebooted (I can explain this if asked). > Note our ZFS code doesn't currently recognise FreeBSD 4K quirks, this is > something I have a patch for but want to enhance before committing. This topic has come up before but I'll ask it again: is there some reason ashift still defaults to 9 / can't be increased to 12 ?
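For reference, ashift is the base-2 logarithm of a vdev's sector size, so the two values under discussion correspond to:

```shell
# ashift = log2(sector size): ashift=9 -> 512-byte sectors, ashift=12 -> 4K.
for a in 9 12 13; do
    printf 'ashift=%d -> %d-byte sectors\n' "$a" "$((1 << a))"
done
```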
I'm not sure of the impact in situations like "I had a vdev made long ago (ashift 9), then I added a new vdev to the pool (ashift 12) and now ZFS is threatening to murder my children..." :-) > >Next topic. > ... > >Combine this fact with the fact that 9.1-RELEASE does not support TRIM > >on ZFS, and you now have SSDs which are probably beat to hell and back. > > > >You really need to be running stable/9 if you want to use SSDs with ZFS. > >I cannot stress this enough. I will not bend on this fact. I do not > >care if what people have are SLC rather than MLC or TLC -- it doesn't > >matter. TRIM on ZFS is a downright necessity for long-term reliability > >of an SSD. Anyway... > > stable/8 also has TRIM support too now. Thanks -- didn't know that was MFC'd that far back. And thank you for your work on that, it's something I've been looking forward to for a long time now as you know, and I really do appreciate it. > >These SSDs need a full Secure Erase done to them. In stable/9 you can > >do this through camcontrol, otherwise you need to use Linux (there are > >live CD/DVD distros that can do this for you) or the vendor's native > >utilities (in Windows usually). > > When adding a new device to ZFS it will attempt to do a full TRIM so > this isn't 100% necessary but as some disks still get extra benefits > from this its still good if you want best performance. Ah, I wasn't aware of that, thanks. :-) But it would also be doing a TRIM of the LBA ranges associated with each partition, rather than the entire SSD. Meaning, in the example I gave (re: leaving untouched/unpartitioned space at the end of the drive for wear levelling), this would result in the untouched/unpartitioned space never being TRIM'd (by anything), thus the FTL map would still have references to those LBA ranges. 
That'd be potentially 30% of LBA ranges in the FTL (depending on past I/O of course -- no way to know), and the only thing that would work that out is the SSD's GC (which is known to kill performance if/when it kicks in). The situation would be different if the OP was using the entire SSD for ZFS (i.e. no partitioning), in that case yeah, a full TRIM would do the trick. Overall though, Secure Erase is probably wiser in this situation given that it's a one-time deal before putting the partitioned SSDs into their roles. He's using log devices, so once those are in place you gotta stick with 'em. Hmm, that gives me an idea actually -- if gpart(8) itself had a flag to induce TRIM for the LBA range of whatever was just created (gpart create) or added (gpart add). That way you could actually induce TRIM on those LBA ranges rather than rely on the FS to do it, or have to put faith into the SSD's GC (I rarely do :P). In the OP's case he could then make a freebsd-zfs partition filling up the remaining 30% with the flag to TRIM it, then when that was done immediately delete the partition. Hmm, not sure if what I'm saying makes sense or not, or if that's even a task/role gpart(8) should have... > ... > >Next topic... > > > >I would strongly recommend you not use 1 SSD for both log and cache. > >I understand your thought process here: "if the SSD dies, the log > >devices are mirrored so I'm okay, and the cache is throw-away anyway". > > While not ideal it still gives a significant boost against no SLOG, so > if thats what HW you have to work with, don't discount the benefit it > will bring. Sure, the advantage of no seek times due to NAND plays a big role, but some of these drives don't particularly perform well when used with a larger I/O queue depth. 
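As an aside, a rough version of the gpart-TRIM idea floated above is already achievable by temporarily covering the free space with a partition and erasing it. A sketch only, assuming GPT on ada0 and a newfs(8) that supports the -E (erase-before-newfs) flag; the device, label, and partition index are placeholders:

```sh
# Sketch -- device, label and index are hypothetical.
gpart add -t freebsd-ufs -l tmp-trim ada0  # consume the remaining free space
newfs -E /dev/gpt/tmp-trim                 # -E issues BIO_DELETE (TRIM) first
gpart delete -i 4 ada0                     # then drop the partition again;
                                           # check "gpart show ada0" for the
                                           # real index
```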
The OCZ he has is okay, but the Intel drive -- despite performing well for something of such low capacity (it's on par with that of the older X25-M G2 160GB drives) -- still has that capacity concern aspect, re: wear levelling needing 30% or so. The drive costs US$130. Now consider this: the Samsung 840 256GB (not the Pro) costs US$173 and will give you 2x the performance of that Intel drive -- and more importantly, 12x the capacity (that means 30% for wear levelling is hardly a concern). The 840 also performs significantly better at higher queue depths. I'm just saying that for about US$40 more you get something that is by far better and will last you longer. Low-capacity SSDs, even if SLC, are incredibly niche and I'm still not sure what demographic they're catering to. I'm making a lot of assumptions about his I/O workload too, of course. I myself tend to stay away from cache/log devices for the time being given that my workloads don't necessitate them. Persistent cache (yeah I know it's on the todo list) would interest me since the MCH on my board is maxed out at 8GB. > ... > >>nas# smartctl -a ada3 > >>ada3: Unable to detect device type > > > >My fault -- the syntax here is wrong, I should have been more clear: > > > >smartctl -a /dev/ada{0,5} > > > >Also, please update your ports tree and install smartmontools 6.1. > >There are improvements there pertaining to SSDs that are relevant. > > Also don't forget to update the disk DB using update-smart-drivedb. Yeah, that too. The stock drivedb.h that comes with 6.1 should have both his drive models correctly supported, if my memory serves me right. I don't follow the drivedb.h commits (at one point I did). -- | Jeremy Chadwick jdc@koitsu.org | | UNIX Systems Administrator http://jdc.koitsu.org/ | | Making life hard for others since 1977. 
PGP 4BD6C0CB | From owner-freebsd-fs@FreeBSD.ORG Thu Jul 4 20:32:01 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 0C609A25; Thu, 4 Jul 2013 20:32:01 +0000 (UTC) (envelope-from c.kworr@gmail.com) Received: from mail-wg0-x229.google.com (mail-wg0-x229.google.com [IPv6:2a00:1450:400c:c00::229]) by mx1.freebsd.org (Postfix) with ESMTP id 7238E1D6C; Thu, 4 Jul 2013 20:32:00 +0000 (UTC) Received: by mail-wg0-f41.google.com with SMTP id y10so6731948wgg.0 for ; Thu, 04 Jul 2013 13:31:59 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=message-id:date:from:user-agent:mime-version:to:cc:subject :references:in-reply-to:content-type:content-transfer-encoding; bh=4AGU8E2MqiiEjaqnBhgpZTOItU+qbhUGx+1Amb7981Q=; b=tzc6XnKFxizlCWhbiAQ3zpSrVpjTAwDqnTc5F+5J/FbKs2Vcr3BViekR3uvpwSuCdx /L0TF1Kq7JgH7aYXJYvM+SLPLY1KYgyYgfG56Fmdc4MoOIWz14I/BR0X2NVilOKr8d+S SPsmxf27RtksiGpRSjUGxi2Cxf+CELT+AgvVktmJg7KAjFrvhS0en68A86Qnyh683UxR hUymBdB8FPPhqss0Vy1tl1IyyRv4Meehv4ceTfm68dQiFVJ23UStc2Fto48Uv2fHEZm/ Ml9yrLpsTNJGwhHm9bYaJqN5EUYgk9m0jH8Ks6oQsQuYRihG+v7OHnBahvsUYNMSPMza P5vQ== X-Received: by 10.194.243.226 with SMTP id xb2mr4490174wjc.67.1372969919551; Thu, 04 Jul 2013 13:31:59 -0700 (PDT) Received: from limbo.xim.bz ([46.150.100.6]) by mx.google.com with ESMTPSA id p1sm36988520wix.9.2013.07.04.13.31.58 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Thu, 04 Jul 2013 13:31:58 -0700 (PDT) Message-ID: <51D5DBBD.70702@gmail.com> Date: Thu, 04 Jul 2013 23:31:57 +0300 From: Volodymyr Kostyrko User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:17.0) Gecko/20130627 Thunderbird/17.0.7 MIME-Version: 1.0 To: Dmitry Morozovsky Subject: Re: ZFS default compression algo for contemporary FreeBSD versions References: <51D576E1.6030803@gmail.com> <51D59B6C.5030600@gmail.com> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed 
Content-Transfer-Encoding: 7bit Cc: freebsd-fs@freebsd.org, avg@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 04 Jul 2013 20:32:01 -0000 04.07.2013 19:05, Dmitry Morozovsky wrote: > On Thu, 4 Jul 2013, Volodymyr Kostyrko wrote: > >>>>> is it sane to just set 'zfs compression=on dataset' to achieve best algo >>>>> on >>>>> fresh FreeBSD systems (-current and/or stable/9)? >>>> >>>> No and this is not safe AFAIK. Default compression is still lzjb and >>>> bootloader can't boot off datasets compressed with lzjb. However on >>>> stable/9 >>>> you can simply set zfs compression=lz4 pool and everything would work fine >>>> if >>>> you updated the boot loader. >>> >>> I did not intend to compress root/boot datasets (and there is not much sense >>> in >>> this AFAICS); >>> >>> the second (and actually more important) my question is -- is lz4 in general >>> better than lzjb? >> >> Yes. Much better in terms of speed. > > Then, next logical step seems to me is to make lz4 the default ;-P Since the code is still too young and most other distributions are behind in terms of compatibility, this is a no-go. My naive dream is to see lz4hc in ZFS too. This way I can just give up on compressing logs. -- Sphinx of black quartz judge my vow.
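For anyone following along, the setup discussed in this subthread amounts to something like the following. A sketch, assuming a pool named "tank" on a ZFS with feature flags; as noted above, on a boot pool the boot blocks must be updated first:

```sh
# Assumed pool name "tank".
zpool set feature@lz4_compress=enabled tank  # enable the lz4 feature flag
zfs set compression=lz4 tank                 # inherited by child datasets
zfs get -r -o name,value compression tank    # verify inheritance
```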
From owner-freebsd-fs@FreeBSD.ORG Thu Jul 4 20:36:50 2013 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 6C1B2C08; Thu, 4 Jul 2013 20:36:50 +0000 (UTC) (envelope-from delphij@delphij.net) Received: from anubis.delphij.net (anubis.delphij.net [IPv6:2001:470:1:117::25]) by mx1.freebsd.org (Postfix) with ESMTP id 461AD1D9B; Thu, 4 Jul 2013 20:36:50 +0000 (UTC) Received: from delphij-macbook.local (c-67-188-85-47.hsd1.ca.comcast.net [67.188.85.47]) (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by anubis.delphij.net (Postfix) with ESMTPSA id 0CEBA9897; Thu, 4 Jul 2013 13:36:48 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=delphij.net; s=anubis; t=1372970209; bh=pz1QjjsZvvahfcmdN2qvXIjtvM85bzropJ1k+WNrlgw=; h=Date:From:Reply-To:To:CC:Subject:References:In-Reply-To; b=iMK6lJYsXSK5BUbcQBRRW3rC/4AkZElroawrlyQR9P+IOxMi5cWvhE7X3Uthz6M1S QLf/azlDklAeiwdHq58aHmarhjlJMKnhFUqynbGpEIy+X3bKfNfJw/MtB/8o5aVmai LmAuua66cUPpYUw+pu4OlzNDuNEwp3ZZQ8BaEH04= Message-ID: <51D5DCDF.2030503@delphij.net> Date: Thu, 04 Jul 2013 13:36:47 -0700 From: Xin Li Organization: The FreeBSD Project MIME-Version: 1.0 To: Volodymyr Kostyrko Subject: Re: ZFS default compression algo for contemporary FreeBSD versions References: <51D576E1.6030803@gmail.com> <51D59B6C.5030600@gmail.com> <51D59C88.9060403@FreeBSD.org> <51D5DAB9.4070507@gmail.com> In-Reply-To: <51D5DAB9.4070507@gmail.com> X-Enigmail-Version: 1.5.1 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: freebsd-fs@FreeBSD.org, Dmitry Morozovsky , Andriy Gapon X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list Reply-To: d@delphij.net List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 04 Jul 2013 20:36:50 -0000 -----BEGIN PGP SIGNED 
MESSAGE----- Hash: SHA256 On 7/4/13 1:27 PM, Volodymyr Kostyrko wrote: > 04.07.2013 19:02, Andriy Gapon wrote: >> on 04/07/2013 18:57 Volodymyr Kostyrko said the following: >>> Yes. Much better in terms of speed. >> >> And compression too. > > Can't really say. > > When the code first appeared in stable I moved two of my machines > (desktops) to LZ4 recreating each dataset. To my surprise gain at > transition from lzjb was fairly minimal and sometimes LZ4 even > loses to lzjb in compression size. However better > compression/decompression speed and moreover earlier takeoff when > data is incompressible clearly makes lz4 a winner. I'm interested in this -- what's the nature of data on that dataset (e.g. plain text? binaries? images?) Cheers, -----BEGIN PGP SIGNATURE----- iQEcBAEBCAAGBQJR1dzfAAoJEG80Jeu8UPuz84AIAMp8BT/a/H4tX1AzytmHxf5o zVC7yj59BnfmkgdeKbo49fIiEafg2FHRXNsGGQ/3TvMliDqmNPvTmJQwDApH9Efl oysm/OVcSy7ZtiT/3M2AQqNyzaIB90pidGYwO6oqZ7gwtMi6FJuiwZHsBMiHU92c F6tieTICIWKj8cF60oWBP+kx8oM4cTfdOt1S2SGfcaBySQdmw3B7Yxg7pLHoUZ1+ 6zcaoqFgSEIS8Svnk6pdOCRREUTcAKmE1W7SbEJEgeTbJ7TaCMt64yZoyQqeyayl KBo9v8/9mYwz89L1ljHoYAXInBMUQxqIiHPvN/H1QFSZomERaLjFSYDYaIbMGeQ= =6Zmf -----END PGP SIGNATURE----- From owner-freebsd-fs@FreeBSD.ORG Thu Jul 4 20:44:56 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 971E4F0F; Thu, 4 Jul 2013 20:44:56 +0000 (UTC) (envelope-from c.kworr@gmail.com) Received: from mail-ea0-x235.google.com (mail-ea0-x235.google.com [IPv6:2a00:1450:4013:c01::235]) by mx1.freebsd.org (Postfix) with ESMTP id 055D91DF4; Thu, 4 Jul 2013 20:44:55 +0000 (UTC) Received: by mail-ea0-f181.google.com with SMTP id a15so1073104eae.12 for ; Thu, 04 Jul 2013 13:44:54 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=message-id:date:from:user-agent:mime-version:to:cc:subject 
:references:in-reply-to:content-type:content-transfer-encoding; bh=+sEwFB6EFkD0opZX8QQbeDALsvZPAMfBBlzsoBKid/s=; b=VcEQ8XMTTkxJdjU3V2vSbNbeBREA75woFe08GcJa/MSiplRUgZgqGM9yL1JIe1MvVw 03jhnFKCtY6gLCPsdKMAt5IhfmFuUlXQpPK8+qdwBMzpynGYT2v29VD2elysRmU4N83l g7noXnfnRWBMaPYoSPCnAXHZbYt37PeGr8iE1u7eZrBCdmn70iHpxGjePelKal//OhD+ FDaW/DEVkOWUmmZb8+XuIp1UAkifb+Faav3sjrlc26NvfrveXfwtuMssW2kjbBbELh6V oPHl3wgsXbVB/AbRBB6ZJGK/MRpTe7GFdrfXoXQhmxKP99roeJr2LKkJ7b6EtYCWs/+w DXjg== X-Received: by 10.14.203.194 with SMTP id f42mr8761491eeo.53.1372970694908; Thu, 04 Jul 2013 13:44:54 -0700 (PDT) Received: from limbo.xim.bz ([46.150.100.6]) by mx.google.com with ESMTPSA id b3sm7958963eev.10.2013.07.04.13.44.53 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Thu, 04 Jul 2013 13:44:54 -0700 (PDT) Message-ID: <51D5DEC4.2000101@gmail.com> Date: Thu, 04 Jul 2013 23:44:52 +0300 From: Volodymyr Kostyrko User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:17.0) Gecko/20130627 Thunderbird/17.0.7 MIME-Version: 1.0 To: d@delphij.net Subject: Re: ZFS default compression algo for contemporary FreeBSD versions References: <51D576E1.6030803@gmail.com> <51D59B6C.5030600@gmail.com> <51D59C88.9060403@FreeBSD.org> <51D5DAB9.4070507@gmail.com> <51D5DCDF.2030503@delphij.net> In-Reply-To: <51D5DCDF.2030503@delphij.net> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-fs@FreeBSD.org, Dmitry Morozovsky , Andriy Gapon X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 04 Jul 2013 20:44:56 -0000 04.07.2013 23:36, Xin Li wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA256 > > On 7/4/13 1:27 PM, Volodymyr Kostyrko wrote: >> 04.07.2013 19:02, Andriy Gapon wrote: >>> on 04/07/2013 18:57 Volodymyr Kostyrko said the following: >>>> Yes. Much better in terms of speed. >>> >>> And compression too. 
>> >> Can't really say. >> >> When the code first appeared in stable I moved two of my machines >> (desktops) to LZ4 recreating each dataset. To my surprise gain at >> transition from lzjb was fairly minimal and sometimes LZ4 even >> loses to lzjb in compression size. However better >> compression/decompression speed and moreover earlier takeoff when >> data is incompressible clearly makes lz4 a winner. > > I'm interested in this -- what's the nature of data on that dataset > (e.g. plain text? binaries? images?) Triple no. Biggest difference in lzjb favor was at zvol with Mac OS X Snow Leo. Maybe it's just because recordsize is too small on zvols? Anyway the difference was like a 1% or 2%. Can't remember but can retest. -- Sphinx of black quartz judge my vow. From owner-freebsd-fs@FreeBSD.ORG Thu Jul 4 20:56:45 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id AB20039F for ; Thu, 4 Jul 2013 20:56:45 +0000 (UTC) (envelope-from fjwcash@gmail.com) Received: from mail-qe0-x22a.google.com (mail-qe0-x22a.google.com [IPv6:2607:f8b0:400d:c02::22a]) by mx1.freebsd.org (Postfix) with ESMTP id 6D5311E73 for ; Thu, 4 Jul 2013 20:56:45 +0000 (UTC) Received: by mail-qe0-f42.google.com with SMTP id s14so944442qeb.1 for ; Thu, 04 Jul 2013 13:56:44 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=71zB1ShQL8wMOQ/kMK4DhSN9qy41kWRVxB7T9WZVhk4=; b=n1DgRrpUhaini932NlyH6uCbxGQDFFPPFX7pIbQoUxeOT7IXUslNB2FNWRK5cprpua RLpdzKKs/+JyXyN1G6c7Mwfv2uyrt+By3u6SXgim2TuI24WRlYuSN48JTvEexkDe1J2s BFd6NiVuXl1kYSkYpxhW8KWHLlZQSjRvZQ3Yu2FLG4y9WaxQaNVlitG/8NMlYY1+k6Ee B5mr/PLyiFdmm3dthWheA4OTLRn8mICZDKzsWUfup2VT2Rc9zevMe/mJQQKOHi5pUCuS w9GwivmxpqQrEXRtMteP9YKxEXv1zZ8AqZXvZ5YaBEPtaiWkOH3Gr0xqd6HKnFVyIwkb nddw== MIME-Version: 1.0 X-Received: by 
10.224.79.14 with SMTP id n14mr7862297qak.114.1372971404931; Thu, 04 Jul 2013 13:56:44 -0700 (PDT) Received: by 10.49.49.135 with HTTP; Thu, 4 Jul 2013 13:56:44 -0700 (PDT) In-Reply-To: <20130704191203.GA95642@icarus.home.lan> References: <51D42107.1050107@digsys.bg> <2EF46A8C-6908-4160-BF99-EC610B3EA771@alumni.chalmers.se> <51D437E2.4060101@digsys.bg> <20130704000405.GA75529@icarus.home.lan> <20130704171637.GA94539@icarus.home.lan> <2A261BEA-4452-4F6A-8EFB-90A54D79CBB9@alumni.chalmers.se> <20130704191203.GA95642@icarus.home.lan> Date: Thu, 4 Jul 2013 13:56:44 -0700 Message-ID: Subject: Re: Slow resilvering with mirrored ZIL From: Freddie Cash To: Jeremy Chadwick Content-Type: text/plain; charset=UTF-8 X-Content-Filtered-By: Mailman/MimeDel 2.1.14 Cc: FreeBSD Filesystems X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 04 Jul 2013 20:56:45 -0000 On Thu, Jul 4, 2013 at 12:12 PM, Jeremy Chadwick wrote: > I believe -- but I need someone else to chime in here with confirmation, > particularly someone who is familiar with ZFS's internals -- once your > pool is ashift 12, you can do a disk replacement ***without*** having to > do the gnop procedure (because the pool itself is already using ashift > 12). But again, I need someone to confirm that. > Correct. The ashift property of a vdev is set at creation time and cannot be changed (AFAIK) without destroying/recreating the pool. Thus, you can use gnop to create the vdev with ashift=12, and then just do normal "zpool replace" or "zpool detach/attach" to replace drives in the vdevs (512b or 4K drives) without gnop. Haven't read the code :) but I have done many, many drive replacements on ashift=9 and ashift=12 vdevs and watched what happens via zdb. 
:) The WD10EARS are known for excessively parking their heads, which > causes massive performance problems with both reads and writes. This is > known by PC enthusiasts as the "LCC issue" (LCC = Load Cycle Count, > referring to SMART attribute 193). > > On these drives there are ways to work around this issue -- it > specifically involves disabling drive-level APM. To do so, you have to > initiate a specific ATA CDB to the drive using "camcontrol cmd", and > this has to be done every time the system reboots. There is one > drawback to disabling APM as well: the drives run hotter. > On some WD Green drives, depending on the firmware and manufacturing date, you can use the wdidle3.exe program (via a DOS boot) to set the timeout to either "disabled" or "15 minutes" which is usually enough to prevent most of the head-parking wear-out issues. However, I believe this only worked up until Dec 2011 or Dec 2012? We had the misfortune of using 12 of these in a ZFS storage box when they were first released (2 TB for under $150? Hell Yeah! Ooops, you get what you pay for ...). We quickly replaced them. You really need to be running stable/9 if you want to use SSDs with ZFS. > I cannot stress this enough. I will not bend on this fact. I do not > care if what people have are SLC rather than MLC or TLC -- it doesn't > matter. TRIM on ZFS is a downright necessity for long-term reliability > of an SSD. Anyway... > One can mitigate this a little by leaving 25% of the SSD unpartitioned/unformatted, thus allowing the background GC process to work without impacting performance and providing long-term performance that's close to (but not quite 100%) after-TRIM performance. Takes a lot of will-power to leave 8-16-odd GB free on an SSD that cost close to $200, though. :) It's not perfect, it's not as good as using TRIM, but at least it's doable on FreeBSD pre-9.1-STABLE. 
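The 25-30% overprovisioning advice above is simple arithmetic; for the capacities mentioned in this thread:

```shell
# How much space to leave unpartitioned for wear levelling / background GC,
# reserving 25% of the drive.
for gb in 128 256; do
    reserve=$((gb * 25 / 100))
    printf '%d GB SSD: partition %d GB, leave %d GB untouched\n' \
        "$gb" "$((gb - reserve))" "$reserve"
done
```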
> > You should probably be made aware of the fact that SSDs need to be > kept roughly 30-40% unused to get the most benefits out of wear > levelling. Once you hit the 20% remaining mark, performance takes a > hit, and the drive begins hurting more and more. Low-capacity SSDs > are therefore generally worthless given the capacity limitation need. > Ah, I see you mention what I did above. :) Guess that's what I get for not reading all the way through before starting a reply. :) -- Freddie Cash fjwcash@gmail.com From owner-freebsd-fs@FreeBSD.ORG Thu Jul 4 21:01:11 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 6975C5CA for ; Thu, 4 Jul 2013 21:01:11 +0000 (UTC) (envelope-from fjwcash@gmail.com) Received: from mail-qc0-x22f.google.com (mail-qc0-x22f.google.com [IPv6:2607:f8b0:400d:c01::22f]) by mx1.freebsd.org (Postfix) with ESMTP id 2A0891E9D for ; Thu, 4 Jul 2013 21:01:11 +0000 (UTC) Received: by mail-qc0-f175.google.com with SMTP id k14so945096qcv.20 for ; Thu, 04 Jul 2013 14:01:10 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=ImaN5YtmhW6Y8Zx4qrfiylEE6uLNOxjuWJntA8ZFe5g=; b=jeNo6x5v9EL8ia3BxMCGTXMI4jmTFKCFa9G5bCG9TqIyKZW/9us07wE91Mqdz2FjIt n9lF39bx+k7Vbo/9guGT5SNFMIvchLSquH6Kfsk1vZZ13P8pA2Shq4c3No7GQQNOWB64 nZcpA93WMc4cegogN97+wN/DYKa+hlIsC8Frw/YEPhU9IKJKnxDZ+ixG093O4GOlRRA3 N8SUqkFipOCJ0fOaQ1BEkqk5YNDezvYtSFbKmCqIwtpRsvd8OI8PgFaD7XlLyriYjqt3 fS/xJmE2Ghsf2j+PdW8rTCvqJM1fJWjH4tjQnJRCmYi1Xki66ISXyzC8qJ97gjV0kUzF uSnA== MIME-Version: 1.0 X-Received: by 10.49.83.73 with SMTP id o9mr5313304qey.71.1372971670700; Thu, 04 Jul 2013 14:01:10 -0700 (PDT) Received: by 10.49.49.135 with HTTP; Thu, 4 Jul 2013 14:01:10 -0700 (PDT) In-Reply-To: <43015E9015084CA6BAC6978F39D22E8B@multiplay.co.uk> References: 
<51D42107.1050107@digsys.bg> <2EF46A8C-6908-4160-BF99-EC610B3EA771@alumni.chalmers.se> <51D437E2.4060101@digsys.bg> <20130704000405.GA75529@icarus.home.lan> <20130704171637.GA94539@icarus.home.lan> <2A261BEA-4452-4F6A-8EFB-90A54D79CBB9@alumni.chalmers.se> <20130704191203.GA95642@icarus.home.lan> <43015E9015084CA6BAC6978F39D22E8B@multiplay.co.uk> Date: Thu, 4 Jul 2013 14:01:10 -0700 Message-ID: Subject: Re: Slow resilvering with mirrored ZIL From: Freddie Cash To: Steven Hartland Content-Type: text/plain; charset=UTF-8 X-Content-Filtered-By: Mailman/MimeDel 2.1.14 Cc: FreeBSD Filesystems X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 04 Jul 2013 21:01:11 -0000 On Thu, Jul 4, 2013 at 12:41 PM, Steven Hartland wrote: > ----- Original Message ----- From: "Jeremy Chadwick" > ... > >> I believe -- but I need someone else to chime in here with confirmation, >> particularly someone who is familiar with ZFS's internals -- once your >> pool is ashift 12, you can do a disk replacement ***without*** having to >> do the gnop procedure (because the pool itself is already using ashift >> 12). But again, I need someone to confirm that. >> > > Close, the ashift is a property of the vdev and not the entire pool so > if your adding a new vdev to the pool at least one of the devices in > said pool needs to report 4k sectors either natively or via the gnop > ^^^^^^^^^^^^ > work around. > Typo? "... so if you're adding a new vdev to the pool at least one of the devices in said VDEV needs to report ..." I made the mistake of thinking ashift was a property of the pool and added 3x 6-drive raidz2 vdevs to an existing pool of 4x 6-drive raidz vdevs without using gnop .... and now have 3 vdevs with ashift=9 and 4 vdevs with ashift=12. :( Here's hoping the box gets replaced before 512b drives are discontinued completely ... 
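In sketch form, the gnop workaround referred to throughout this thread looks like this (disk and pool names are placeholders; the .nop device is only needed while the vdev is created):

```sh
gnop create -S 4096 ada0                # ada0.nop reports 4096-byte sectors
zpool create tank mirror ada0.nop ada1  # one 4K device per vdev suffices
zpool export tank
gnop destroy ada0.nop                   # safe: ashift is now baked into the vdev
zpool import tank                       # pool reattaches to the raw device
zdb -C tank | grep ashift               # verify: ashift: 12
```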
I now have it on my "zpool add" checklist to always use gnop devices, regardless of what kind of drive is being used. -- Freddie Cash fjwcash@gmail.com From owner-freebsd-fs@FreeBSD.ORG Thu Jul 4 21:08:07 2013 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 46B5E849; Thu, 4 Jul 2013 21:08:07 +0000 (UTC) (envelope-from delphij@delphij.net) Received: from anubis.delphij.net (anubis.delphij.net [64.62.153.212]) by mx1.freebsd.org (Postfix) with ESMTP id 300CE1EEF; Thu, 4 Jul 2013 21:08:06 +0000 (UTC) Received: from delphij-macbook.local (c-67-188-85-47.hsd1.ca.comcast.net [67.188.85.47]) (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by anubis.delphij.net (Postfix) with ESMTPSA id E35F89A87; Thu, 4 Jul 2013 14:07:57 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=delphij.net; s=anubis; t=1372972080; bh=T4HS+2+bnmvaUlzVHtcvbjYw2mvVl+2CpOqyWTtpwiY=; h=Date:From:Reply-To:To:CC:Subject:References:In-Reply-To; b=e2AwDyBo/o8Cmq0KUNwCjNipIO6H13IYHDNnYgY1OdeXjni4YnRQZVhoGomvOawXF wA8ZowHmCE17Tsgoa8MgHe5DdAoQvSDpN0tu8uUFnRYQEvE8mzURM5ID2lzwxqxShs uGtNQ76Ak60GFXzhfldvrxv/8cfHrbgSaCJnK3Yw= Message-ID: <51D5E42C.5010506@delphij.net> Date: Thu, 04 Jul 2013 14:07:56 -0700 From: Xin Li Organization: The FreeBSD Project MIME-Version: 1.0 To: Volodymyr Kostyrko Subject: Re: ZFS default compression algo for contemporary FreeBSD versions References: <51D576E1.6030803@gmail.com> <51D59B6C.5030600@gmail.com> <51D59C88.9060403@FreeBSD.org> <51D5DAB9.4070507@gmail.com> <51D5DCDF.2030503@delphij.net> <51D5DEC4.2000101@gmail.com> In-Reply-To: <51D5DEC4.2000101@gmail.com> X-Enigmail-Version: 1.5.1 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: freebsd-fs@FreeBSD.org, d@delphij.net, Dmitry Morozovsky , Andriy Gapon X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 
2.1.14 Precedence: list Reply-To: d@delphij.net List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 04 Jul 2013 21:08:07 -0000 -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA256 On 7/4/13 1:44 PM, Volodymyr Kostyrko wrote: > 04.07.2013 23:36, Xin Li wrote: >> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA256 >> >> On 7/4/13 1:27 PM, Volodymyr Kostyrko wrote: >>> 04.07.2013 19:02, Andriy Gapon wrote: >>>> on 04/07/2013 18:57 Volodymyr Kostyrko said the following: >>>>> Yes. Much better in terms of speed. >>>> >>>> And compression too. >>> >>> Can't really say. >>> >>> When the code first appeared in stable I moved two of my >>> machines (desktops) to LZ4 recreating each dataset. To my >>> surprise gain at transition from lzjb was fairly minimal and >>> sometimes LZ4 even loses to lzjb in compression size. However >>> better compression/decompression speed and moreover earlier >>> takeoff when data is incompressible clearly makes lz4 a >>> winner. >> >> I'm interested in this -- what's the nature of data on that >> dataset (e.g. plain text? binaries? images?) > > Triple no. Biggest difference in lzjb favor was at zvol with Mac OS > X Snow Leo. > > Maybe it's just because recordsize is too small on zvols? Anyway > the difference was like a 1% or 2%. Can't remember but can retest. Hmm that's weird. I haven't tried Mac iSCSI volumes but do have tried Windows iSCSI volumes, and lz4 was a win. It may be helpful if you can post your 'zfs get all ' output so we can try to reproduce the problem at lab? 
Cheers, -----BEGIN PGP SIGNATURE----- iQEcBAEBCAAGBQJR1eQsAAoJEG80Jeu8UPuzj4sH/ipcY7uo5tvYFj5YJOpTBZgK CR6LVtSTmdVL9EXWQvLiT6pCSwxQcKhWWlGIhFjyacfVop8r/hGDjuB+HejtM3AT ryebN152Wt/5f15KZg5Wa6ccwIf50bS4H6sIDb6LcSxmHwEFh7U7+FqWbfIcvK/E zuYmmIgLAkpEav0BpfaTJvslL+dc2P11nDkMKe0nlAHeCeXouIQKwG0MleMowguq gZ+j01w2VYNnSOo5O7sBtl8k4J8p5tKwN/ZUDN2rLXLRR+shMqAGFjNAqJOvcqUe uTjskE0yph4LBQ2r0fcjxFrM3q4Cjj0kNi42my+7IjJbHWE9RiiXD/dboAJNhak= =A7yu -----END PGP SIGNATURE----- From owner-freebsd-fs@FreeBSD.ORG Thu Jul 4 23:21:24 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 057854D9 for ; Thu, 4 Jul 2013 23:21:24 +0000 (UTC) (envelope-from prvs=1897fed9ea=killing@multiplay.co.uk) Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) by mx1.freebsd.org (Postfix) with ESMTP id 879C81381 for ; Thu, 4 Jul 2013 23:21:23 +0000 (UTC) Received: from r2d2 ([82.69.141.170]) by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (MDaemon PRO v10.0.4) with ESMTP id md50004731583.msg for ; Fri, 05 Jul 2013 00:21:20 +0100 X-Spam-Processed: mail1.multiplay.co.uk, Fri, 05 Jul 2013 00:21:20 +0100 (not processed: message from valid local sender) X-MDDKIM-Result: neutral (mail1.multiplay.co.uk) X-MDRemoteIP: 82.69.141.170 X-Return-Path: prvs=1897fed9ea=killing@multiplay.co.uk X-Envelope-From: killing@multiplay.co.uk X-MDaemon-Deliver-To: freebsd-fs@freebsd.org Message-ID: <8280798FCEE74CB08536F3BC43C9207F@multiplay.co.uk> From: "Steven Hartland" To: "Jeremy Chadwick" References: <2EF46A8C-6908-4160-BF99-EC610B3EA771@alumni.chalmers.se> <51D437E2.4060101@digsys.bg> <20130704000405.GA75529@icarus.home.lan> <20130704171637.GA94539@icarus.home.lan> <2A261BEA-4452-4F6A-8EFB-90A54D79CBB9@alumni.chalmers.se> <20130704191203.GA95642@icarus.home.lan> <43015E9015084CA6BAC6978F39D22E8B@multiplay.co.uk> <20130704202818.GB97119@icarus.home.lan> Subject: 
Re: Slow resilvering with mirrored ZIL Date: Fri, 5 Jul 2013 00:21:32 +0100 MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=original Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.5931 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6157 Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 04 Jul 2013 23:21:24 -0000 ----- Original Message ----- From: "Jeremy Chadwick" > On Thu, Jul 04, 2013 at 08:41:18PM +0100, Steven Hartland wrote: >> ----- Original Message ----- From: "Jeremy Chadwick" >> >> ... >> >I believe -- but I need someone else to chime in here with confirmation, >> >particularly someone who is familiar with ZFS's internals -- once your >> >pool is ashift 12, you can do a disk replacement ***without*** having to >> >do the gnop procedure (because the pool itself is already using ashift >> >12). But again, I need someone to confirm that. >> >> Close, the ashift is a property of the vdev and not the entire pool so >> if your adding a new vdev to the pool at least one of the devices in >> said pool needs to report 4k sectors either natively or via the gnop >> work around. > > I just looked at zdb -C -- you're right. I was visually parsing the > ashift line/location as being part of the pool section, not the vdev > section. Thank you for correcting me. 
> > And if I'm reading what you've said correctly, re: that only one device > in the pool needs to have the gnop workaround for the vdev to end up > using ashift=12, then that means "zpool replace" doesn't actually need > the administrator to do the gnop trick on a disk replacement -- because > the more I thought about that, the more I realised that would open up a > can of worms that would be painful to deal with if the pool was in use > or the system could not be rebooted (I can explain this if asked). Correct: once the vdev is created its ashift is fixed, so replaces don't need any gnop. The same applies at pool creation: only one of the devices in the vdev needs the gnop treatment, even for, say, a 6-disk RAIDZ. >> Note our ZFS code doesn't currently recognise FreeBSD 4K quirks, this is >> something I have a patch for but want to enhance before committing. > > This topic has come up before but I'll ask it again: is there some > reason ashift still defaults to 9 / can't be increased to 12 ? In my patch you can configure the default "desired" ashift, aka the minimum, and it defaults to 12. There was concern about the overhead this added for non-4k disks, especially when dealing with small files on non-compressed volumes, which is why I'm still working on being able to easily override ashift on creation. > I'm not sure of the impact in situations like "I had a vdev made long > ago (ashift 9), then I added a new vdev to the pool (ashift 12) and now > ZFS is threatening to murder my children..." :-) I don't believe it cares, but it could cause odd performance issues due to the unbalanced nature of the pool, though if you're adding a vdev to the pool you'll already have that ;-) >> >These SSDs need a full Secure Erase done to them. In stable/9 you can >> >do this through camcontrol, otherwise you need to use Linux (there are >> >live CD/DVD distros that can do this for you) or the vendor's native >> >utilities (in Windows usually).
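[Editor's note: for readers wanting to try the camcontrol route on stable/9 mentioned above, the sequence looks roughly like the sketch below. This is from memory, not from the thread -- the device name ada0 and the password are placeholders, and the flags should be verified against camcontrol(8) before use, since a secure erase irreversibly destroys everything on the drive.]

```
# Placeholder device and password; check flags against camcontrol(8) first.
camcontrol security ada0                              # show security state; must not be "frozen"
camcontrol security ada0 -U user -l high -s erasepw   # set a temporary user password
camcontrol security ada0 -U user -e erasepw -y        # issue ATA SECURITY ERASE UNIT
```

If the drive reports "frozen", a suspend/resume cycle or hot-replug is the usual way to unfreeze it before the erase will be accepted.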
>> >> When adding a new device to ZFS it will attempt to do a full TRIM so >> this isn't 100% necessary, but as some disks still get extra benefits >> from this it's still good if you want best performance. > > Ah, I wasn't aware of that, thanks. :-) > > But it would also be doing a TRIM of the LBA ranges associated with each > partition, rather than the entire SSD. > > Meaning, in the example I gave (re: leaving untouched/unpartitioned > space at the end of the drive for wear levelling), this would result in > the untouched/unpartitioned space never being TRIM'd (by anything), thus > the FTL map would still have references to those LBA ranges. That'd be > potentially 30% of LBA ranges in the FTL (depending on past I/O of > course -- no way to know), and the only thing that would work that out > is the SSD's GC (which is known to kill performance if/when it kicks > in). Correct, which is one of the reasons a full secure erase is a good idea :) > The situation would be different if the OP was using the entire SSD for > ZFS (i.e. no partitioning), in that case yeah, a full TRIM would do the > trick. Yes and no, depending on the disk: it's been noted that TRIM, even a full-disk TRIM, doesn't result in the same performance restoration as a secure erase, which is another reason to still do a secure erase if you can. > Overall though, Secure Erase is probably wiser in this situation given > that it's a one-time deal before putting the partitioned SSDs into their > roles. He's using log devices, so once those are in place you gotta > stick with 'em. > > Hmm, that gives me an idea actually -- if gpart(8) itself had a flag to > induce TRIM for the LBA range of whatever was just created (gpart > create) or added (gpart add). That way you could actually induce TRIM > on those LBA ranges rather than rely on the FS to do it, or have to put > faith into the SSD's GC (I rarely do :P).
In the OP's case he could > then make a freebsd-zfs partition filling up the remaining 30% with the > flag to TRIM it, then when that was done immediately delete the > partition. Hmm, not sure if what I'm saying makes sense or not, or if > that's even a task/role gpart(8) should have... You mean like the following PR, which is on my list for when I get some free time: http://www.freebsd.org/cgi/query-pr.cgi?pr=175943 >> ... >> >Next topic... >> > >> >I would strongly recommend you not use 1 SSD for both log and cache. >> >I understand your thought process here: "if the SSD dies, the log >> >devices are mirrored so I'm okay, and the cache is throw-away anyway". >> >> While not ideal it still gives a significant boost against no SLOG, so >> if thats what HW you have to work with, don't discount the benefit it >> will bring. > > Sure, the advantage of no seek times due to NAND plays a big role, but > some of these drives don't particularly perform well when used with a > larger I/O queue depth. The OCZ he has is okay, but the Intel drive -- > despite performing well for something of such low capacity (it's on par > with that of the older X25-M G2 160GB drives) -- still has that capacity > concern aspect, re: wear levelling needing 30% or so. The drive costs > US$130. > > Now consider this: the Samsung 840 256GB (not the Pro) costs US$173 > and will give you 2x the performance of that Intel drive -- and more > importantly, 12x the capacity (that means 30% for wear levelling is > hardly a concern). The 840 also performs significantly better at higher > queue depths. I'm just saying that for about US$40 more you get > something that is by far better and will last you longer. Low-capacity > SSDs, even if SLC, are incredibly niche and I'm still not sure what > demographic they're catering to. 
Absolutely, also factor in that TRIM on Sandforce disks is very slow; so much so that big deletes can easily become a significant performance bottleneck, so TRIM isn't always the silver bullet, so to speak. > I'm making a lot of assumptions about his I/O workload too, of course. > I myself tend to stay away from cache/log devices for the time being > given that my workloads don't necessitate them. Persistent cache (yeah > I know it's on the todo list) would interest me since the MCH on my > board is maxed out at 8GB. To give a concrete example which may well be of use for others: we had a mysql box here with dual 60GB SSD L2ARCs which, after continuous increases in query write traffic, ended up in total IO saturation. As a test we removed the L2ARC and repartitioned the SSDs into a 10GB SLOG and a 40GB L2ARC each, and the machine was utterly transformed: from constant 100% disk IO down to 10%, as the SLOGs soaked up the sync transfers from mysql. Regards Steve ================================================ This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmaster@multiplay.co.uk.
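[Editor's note: a sketch of the SLOG/L2ARC split Steven describes in the message above. The pool name tank, disk names da0/da1, GPT labels, and 4k alignment are assumptions for illustration, not details from the thread.]

```
# Carve each SSD into a 10GB SLOG slice and a 40GB L2ARC slice,
# leaving the remainder unpartitioned for wear levelling.
gpart create -s gpt da0
gpart add -t freebsd-zfs -a 4k -s 10G -l slog0 da0
gpart add -t freebsd-zfs -a 4k -s 40G -l cache0 da0
# ...repeat for da1 with labels slog1/cache1...

zpool add tank log mirror gpt/slog0 gpt/slog1   # mirrored SLOG
zpool add tank cache gpt/cache0 gpt/cache1      # L2ARC devices are never mirrored
```

Using GPT labels (gpt/slog0 etc.) rather than raw device names keeps the vdev identities stable if the disks renumber on the next boot.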
From owner-freebsd-fs@FreeBSD.ORG Thu Jul 4 23:27:58 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id E28EA58F for ; Thu, 4 Jul 2013 23:27:58 +0000 (UTC) (envelope-from prvs=1897fed9ea=killing@multiplay.co.uk) Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) by mx1.freebsd.org (Postfix) with ESMTP id 82D2813B1 for ; Thu, 4 Jul 2013 23:27:58 +0000 (UTC) Received: from r2d2 ([82.69.141.170]) by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (MDaemon PRO v10.0.4) with ESMTP id md50004731645.msg for ; Fri, 05 Jul 2013 00:27:56 +0100 X-Spam-Processed: mail1.multiplay.co.uk, Fri, 05 Jul 2013 00:27:56 +0100 (not processed: message from valid local sender) X-MDDKIM-Result: neutral (mail1.multiplay.co.uk) X-MDRemoteIP: 82.69.141.170 X-Return-Path: prvs=1897fed9ea=killing@multiplay.co.uk X-Envelope-From: killing@multiplay.co.uk X-MDaemon-Deliver-To: freebsd-fs@freebsd.org Message-ID: <3CFB4564D8EB4A6A9BCE2AFCC5B6E400@multiplay.co.uk> From: "Steven Hartland" To: "Freddie Cash" References: <51D42107.1050107@digsys.bg><2EF46A8C-6908-4160-BF99-EC610B3EA771@alumni.chalmers.se><51D437E2.4060101@digsys.bg><20130704000405.GA75529@icarus.home.lan><20130704171637.GA94539@icarus.home.lan><2A261BEA-4452-4F6A-8EFB-90A54D79CBB9@alumni.chalmers.se><20130704191203.GA95642@icarus.home.lan><43015E9015084CA6BAC6978F39D22E8B@multiplay.co.uk> Subject: Re: Slow resilvering with mirrored ZIL Date: Fri, 5 Jul 2013 00:28:07 +0100 MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="utf-8"; reply-type=original Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.5931 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6157 Cc: FreeBSD Filesystems X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , 
List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 04 Jul 2013 23:27:58 -0000 ----- Original Message ----- From: "Freddie Cash" >>> I believe -- but I need someone else to chime in here with confirmation, >>> particularly someone who is familiar with ZFS's internals -- once your >>> pool is ashift 12, you can do a disk replacement ***without*** having to >>> do the gnop procedure (because the pool itself is already using ashift >>> 12). But again, I need someone to confirm that. >>> >> >> Close, the ashift is a property of the vdev and not the entire pool so >> if your adding a new vdev to the pool at least one of the devices in >> said pool needs to report 4k sectors either natively or via the gnop >> > ^^^^^^^^^^^^ > > >> work around. >> > > Typo? This should have read "at least one of the devices you're adding as a new vdev to the pool" > "... so if you're adding a new vdev to the pool at least one of the devices > in > said VDEV needs to report ..." > > I made the mistake of thinking ashift was a property of the pool and added > 3x 6-drive raidz2 vdevs to an existing pool of 4x 6-drive raidz vdevs > without using gnop .... and now have 3 vdevs with ashift=9 and 4 vdevs with > ashift=12. :( Here's hoping the box gets replaced before 512b drives are > discontinued completely ... > > I now have it on my "zpool add" checklist to always use gnop devices, > regardless of what kind of drive is being used. If anyone wants my current patches, which add a switch making 4k ashift the default via a sysctl and also work with QUIRKS, just let me know. They are well tested; we just want more options before putting them in the tree. Regards Steve ================================================ This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it.
In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmaster@multiplay.co.uk. From owner-freebsd-fs@FreeBSD.ORG Thu Jul 4 23:51:50 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id D6AA99AD for ; Thu, 4 Jul 2013 23:51:50 +0000 (UTC) (envelope-from amvandemore@gmail.com) Received: from mail-pd0-x234.google.com (mail-pd0-x234.google.com [IPv6:2607:f8b0:400e:c02::234]) by mx1.freebsd.org (Postfix) with ESMTP id B122715E4 for ; Thu, 4 Jul 2013 23:51:50 +0000 (UTC) Received: by mail-pd0-f180.google.com with SMTP id 10so1454588pdi.25 for ; Thu, 04 Jul 2013 16:51:50 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=XLeypeU15ArMQ50FImCh4eJoqC5p1O9W0uq7SZBM3fw=; b=HahJVqU/kZnPgiJ2riywe/YP35OpmyGYEEH+C3wjetL2vpK4zGDNT6UI7bSDP8sDTw ALLvhqaAbRroj0zABn6bib6rjKAiws9z20Vl784i6Wrloe1g9BP4OnnzhPNe+84Z9bWT dnN+jhVB+17c9Jh99LmR7WhGRhinSKp5K6dUDAKOZM/wpziBqJlKgE6CWLuHfFB115vX 34J4pqVorofKuWFwP+RueqAOEEjVs2mwNZH3S8+RI7rhJijF6hLmASiqUO2NctOFVw64 YJuaANtdPNZAw5ppPTaEG+KZzA0ybSYyz10SFrekVuYEhEsWmE5IUzVPvEqcAh36hWY2 FW7g== MIME-Version: 1.0 X-Received: by 10.68.245.200 with SMTP id xq8mr7265803pbc.32.1372981910529; Thu, 04 Jul 2013 16:51:50 -0700 (PDT) Received: by 10.70.88.74 with HTTP; Thu, 4 Jul 2013 16:51:50 -0700 (PDT) In-Reply-To: <20130704082209.GB83766@icarus.home.lan> References: <871u7g57rl.wl%berend@pobox.com> <87mwq34emp.wl%berend@pobox.com> <20130703200241.GB60515@in-addr.com> <87k3l748gb.wl%berend@pobox.com> <20130703233631.GA74698@icarus.home.lan> <87d2qz42q4.wl%berend@pobox.com> <20130704010815.GB75529@icarus.home.lan> <8761wr3xxk.wl%berend@pobox.com> <36872A46E9BE40688B8F59FD05D4ECE9@multiplay.co.uk> <20130704082209.GB83766@icarus.home.lan> 
Date: Thu, 4 Jul 2013 18:51:50 -0500 Message-ID: Subject: Re: EBS snapshot backups from a FreeBSD zfs file system: zpool freeze? From: Adam Vande More To: Jeremy Chadwick Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.14 Cc: freebsd-fs X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 04 Jul 2013 23:51:50 -0000 On Thu, Jul 4, 2013 at 3:22 AM, Jeremy Chadwick wrote: > The issue is a > deal-breaker for me; if it's not for you, great. I'm not quite clear on why this is a "deal-breaker" for you. The stalls are a blink of an eyelash here, but at least somewhat reproducible. Maybe if you had a good use case scenario demonstrating the problem it would attract more attention from those able to fix it. And a PR, if one doesn't already exist. -- Adam Vande More From owner-freebsd-fs@FreeBSD.ORG Fri Jul 5 00:37:15 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 85A6595 for ; Fri, 5 Jul 2013 00:37:15 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca [131.104.91.44]) by mx1.freebsd.org (Postfix) with ESMTP id F054116F7 for ; Fri, 5 Jul 2013 00:37:14 +0000 (UTC) X-Cloudmark-SP-Filtered: true X-Cloudmark-SP-Result: v=1.1 cv=u+Bwc9JL7tMNtl/i9xObSTPSFclN5AOtXcIZY5dPsHA= c=1 sm=2 a=2CN1efILQXEA:10 a=FKkrIqjQGGEA:10 a=HOrS8CuyQosA:10 a=IkcTkHD0fZMA:10 a=qpnfIXV0AAAA:8 a=6I5d2MoRAAAA:8 a=EuJvPkJV90wfAGCKYGcA:9 a=QEXdDO2ut3YA:10 a=SV7veod9ZcQA:10 a=PQGfWuSqlghChR6S:21 a=jMxOjptq5P4Fa_aM:21 X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AqMEAJIU1lGDaFve/2dsb2JhbABagztJgwi9M4EXdIIjAQEBAwEBAQEgBCcgCwUWDgoCAg0FFAIpAQkmBggHBAEcBIdoBgyoEZB8gSaMeRp+ATMHEg2CMoEcA5UEg26QHIMtIDKBAzc X-IronPort-AV:
E=Sophos;i="4.87,998,1363147200"; d="scan'208";a="38887181" Received: from muskoka.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.222]) by esa-jnhn.mail.uoguelph.ca with ESMTP; 04 Jul 2013 20:37:07 -0400 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 83EDCB3F47; Thu, 4 Jul 2013 20:37:07 -0400 (EDT) Date: Thu, 4 Jul 2013 20:37:07 -0400 (EDT) From: Rick Macklem To: Oleg Sharoyko Message-ID: <1821939739.2131305.1372984627528.JavaMail.root@uoguelph.ca> In-Reply-To: Subject: Re: NFSv4 and Kerberos, group permissions seem to be ignored MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.201] X-Mailer: Zimbra 7.2.1_GA_2790 (ZimbraWebClient - FF3.0 (Win)/7.2.1_GA_2790) Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 05 Jul 2013 00:37:15 -0000 Oleg Sharoyko wrote: > Hello, > > I have a small server which runs FreeBSD 9.1 and I've set it up as an > NFSv4 server with kerberised NFS access. My clients are linux > machines. It almost works as expected (mounting/accessing files) > except for one strange issue: it looks like group permissions on > files > and directories are being ignored.
Here's an example: > > Server: > > evendim:~ % id > uid=1001(ols) gid=1001(ols) groups=1001(ols),0(wheel),60000(family) > evendim:~ % ls -l /data/file1 > -rw-rw---- 1 root family 6 4 Jul 18:42 /data/file1 > evendim:~ % cat /data/file1 > test1 > evendim:~ % ls -l /data/file2 > -rw------- 1 ols family 6 4 Jul 18:42 /data/file2 > evendim:~ % cat /data/file2 > test2 > evendim:~ % ls -l /data/file3 > -rw-r--r-- 1 root family 6 4 Jul 18:42 /data/file3 > evendim:~ % cat /data/file3 > test3 > evendim:~ % cat /etc/exports > V4:/ -sec=krb5 > /data -sec=krb5 > Well, here's the code snippet in gssd.c that does the principal/user name to gid list translation. All I can suggest is putting this in a little test program on the server and then running it as "root", to see if it generates the results you would expect. (Btw, NGRPS is defined as 16 in a .h file in /usr/include/rpc. I suspect that should be increased and the code should check for a -1 return from getgrouplist(). However, you don't seem to exceed 16 groups. It also assumes that sizeof(gid_t) == sizeof(int). A little weird and I'm not sure that is true for all arches? 
Did I remember to mention I wasn't the author of this?;-)

	if (pw) {
		int len = NGRPS;
		int groups[NGRPS];

		result->gid = pw->pw_gid;
		getgrouplist(pw->pw_name, pw->pw_gid, groups, &len);
		result->gidlist.gidlist_len = len;
		result->gidlist.gidlist_val = mem_alloc(len * sizeof(int));
		memcpy(result->gidlist.gidlist_val, groups,
		    len * sizeof(int));
		gssd_verbose_out("gssd_pname_to_uid: mapped"
		    " to uid=%d, gid=%d\n", (int)result->uid,
		    (int)result->gid);
	} else {

> Client: > > sherlock:~ % id > uid=1000(ols) gid=1000(ols) > groups=1000(ols),4(adm),20(dialout),24(cdrom),25(floppy),29(audio),30(dip),44(video),46(plugdev),109(netdev),110(bluetooth),113(fuse),116(scanner),118(kismet),60000(family) > sherlock:~ % sudo mount -v -t nfs4 -o sec=krb5 > evendim.sharoyko.net:/data /mnt > mount.nfs4: timeout set for Thu Jul 4 19:52:16 2013 > mount.nfs4: trying text-based options > 'sec=krb5,addr=192.168.1.3,clientaddr=192.168.1.128' > sherlock:~ % ls -l /mnt/file1 > -rw-rw---- 1 root family 6 Jul 4 19:42 /mnt/file1 > sherlock:~ % cat /mnt/file1 > cat: /mnt/file1: Permission denied > sherlock:~ % ls -l /mnt/file2 > -rw------- 1 ols family 6 Jul 4 19:42 /mnt/file2 > sherlock:~ % cat /mnt/file2 > test2 > sherlock:~ % ls -l /mnt/file3 > -rw-r--r-- 1 root family 6 Jul 4 19:42 /mnt/file3 > sherlock:~ % cat /mnt/file3 > test3 > > As you can see file2 is inaccessible while it has group read/write > permissions, user ols belongs to group family on both client and > server and user/group mapping seems to work. /data on the server is a > ZFS filesystem but I've also tried UFS with the same results. I've > also tried ACLs and ACLs for users do work while ACLs for groups > don't > seem to have any effect. Is there something that I'm doing wrong? Is > this an expected behaviour? I will greatly appreciate if you can help > me debugging this issue. I'll quote below captured packets that are > relevant to my attempt to access file1.
As you can see access is > clearly denied by server but I don't understand why. > > No. Time Source Destination > Protocol Length Info > 109 5.649608 192.168.1.128 192.168.1.3 NFS > 258 V4 Call (Reply In 110) LOOKUP DH:0x4dcc3776/file1 > > Frame 109: 258 bytes on wire (2064 bits), 258 bytes captured (2064 > bits) > Ethernet II, Src: GemtekTe_f6:cf:a1 (00:26:82:f6:cf:a1), Dst: > Giga-Byt_db:cd:c4 (90:2b:34:db:cd:c4) > Internet Protocol Version 4, Src: 192.168.1.128 (192.168.1.128), Dst: > 192.168.1.3 (192.168.1.3) > Transmission Control Protocol, Src Port: 726 (726), Dst Port: nfs > (2049), Seq: 3337, Ack: 3193, Len: 192 > Remote Procedure Call, Type:Call XID:0xba073c52 > Network File System > [Program Version: 4] > [V4 Procedure: COMPOUND (1)] > Tag: > length: 0 > contents: > minorversion: 0 > Operations (count: 4) > Opcode: PUTFH (22) > filehandle > length: 28 > [hash (CRC-32): 0x4dcc3776] > decode type as: unknown > filehandle: > 9a7470c6deedeca50a0004000000000037d80a0000000000... > Opcode: LOOKUP (15) > Filename: file1 > length: 5 > contents: file1 > fill bytes: opaque data > Opcode: GETFH (10) > Opcode: GETATTR (9) > GETATTR4args > attr_request > bitmap[0] = 0x0010011a > [5 attributes requested] > mand_attr: FATTR4_TYPE (1) > mand_attr: FATTR4_CHANGE (3) > mand_attr: FATTR4_SIZE (4) > mand_attr: FATTR4_FSID (8) > recc_attr: FATTR4_FILEID (20) > bitmap[1] = 0x0030a23a > [9 attributes requested] > recc_attr: FATTR4_MODE (33) > recc_attr: FATTR4_NUMLINKS (35) > recc_attr: FATTR4_OWNER (36) > recc_attr: FATTR4_OWNER_GROUP (37) > recc_attr: FATTR4_RAWDEV (41) > recc_attr: FATTR4_SPACE_USED (45) > recc_attr: FATTR4_TIME_ACCESS (47) > recc_attr: FATTR4_TIME_METADATA (52) > recc_attr: FATTR4_TIME_MODIFY (53) > [Main Opcode: LOOKUP (15)] > > No. 
Time Source Destination > Protocol Length Info > 110 5.649870 192.168.1.3 192.168.1.128 NFS > 370 V4 Reply (Call In 109) LOOKUP > > Frame 110: 370 bytes on wire (2960 bits), 370 bytes captured (2960 > bits) > Ethernet II, Src: Giga-Byt_db:cd:c4 (90:2b:34:db:cd:c4), Dst: > GemtekTe_f6:cf:a1 (00:26:82:f6:cf:a1) > Internet Protocol Version 4, Src: 192.168.1.3 (192.168.1.3), Dst: > 192.168.1.128 (192.168.1.128) > Transmission Control Protocol, Src Port: nfs (2049), Dst Port: 726 > (726), Seq: 3193, Ack: 3529, Len: 304 > Remote Procedure Call, Type:Reply XID:0xba073c52 > Network File System > [Program Version: 4] > [V4 Procedure: COMPOUND (1)] > Status: NFS4_OK (0) > Tag: > length: 0 > contents: > Operations (count: 4) > Opcode: PUTFH (22) > Status: NFS4_OK (0) > Opcode: LOOKUP (15) > Status: NFS4_OK (0) > Opcode: GETFH (10) > Status: NFS4_OK (0) > Filehandle > length: 28 > [hash (CRC-32): 0xc0a4eeb4] > decode type as: unknown > filehandle: > 9a7470c6deedeca50a00ed00000000001bb70d0000000000... 
> Opcode: GETATTR (9) > Status: NFS4_OK (0) > GETATTR4res > resok4 > obj_attributes > attrmask > bitmap[0] = 0x0010011a > [5 attributes requested] > mand_attr: FATTR4_TYPE (1) > mand_attr: FATTR4_CHANGE (3) > mand_attr: FATTR4_SIZE (4) > mand_attr: FATTR4_FSID (8) > recc_attr: FATTR4_FILEID (20) > bitmap[1] = 0x0030a23a > [9 attributes requested] > recc_attr: FATTR4_MODE (33) > recc_attr: FATTR4_NUMLINKS (35) > recc_attr: FATTR4_OWNER (36) > recc_attr: FATTR4_OWNER_GROUP (37) > recc_attr: FATTR4_RAWDEV (41) > recc_attr: FATTR4_SPACE_USED (45) > recc_attr: FATTR4_TIME_ACCESS (47) > recc_attr: FATTR4_TIME_METADATA (52) > recc_attr: FATTR4_TIME_MODIFY (53) > attr_vals > mand_attr: FATTR4_TYPE (1) > nfs_ftype4: NF4REG (1) > mand_attr: FATTR4_CHANGE (3) > changeid: 96 > mand_attr: FATTR4_SIZE (4) > size: 6 > mand_attr: FATTR4_FSID (8) > fattr4_fsid > fsid4.major: 3329258650 > fsid4.minor: 2783768030 > recc_attr: FATTR4_FILEID (20) > fileid: 237 > recc_attr: FATTR4_MODE (33) > fattr4_mode: 0660 > 000. .... .... .... = Unknown > .... 0... .... .... = not SUID > .... .0.. .... .... = not SGID > .... ..0. .... .... = not save > swapped text > .... ...1 .... .... = Read > permission for owner > .... .... 1... .... = Write > permission for owner > .... .... .0.. .... = no Execute > permission for owner > .... .... ..1. .... = Read > permission for group > .... .... ...1 .... = Write > permission for group > .... .... .... 0... = no Execute > permission for group > .... .... .... .0.. = no Read > permission for others > .... .... .... ..0. = no Write > permission for others > .... .... .... 
...0 = no Execute > permission for others > recc_attr: FATTR4_NUMLINKS (35) > numlinks: 1 > recc_attr: FATTR4_OWNER (36) > fattr4_owner: root@id.sharoyko.net > length: 20 > contents: root@id.sharoyko.net > recc_attr: FATTR4_OWNER_GROUP (37) > fattr4_owner_group: > family@id.sharoyko.net > length: 22 > contents: family@id.sharoyko.net > fill bytes: opaque data > recc_attr: FATTR4_RAWDEV (41) > specdata1: 128 > specdata2: 123863040 > recc_attr: FATTR4_SPACE_USED (45) > space_used: 1024 > recc_attr: FATTR4_TIME_ACCESS (47) > seconds: 1372963326 > nseconds: 263434280 > recc_attr: FATTR4_TIME_METADATA (52) > seconds: 1372963379 > nseconds: 804435894 > recc_attr: FATTR4_TIME_MODIFY (53) > seconds: 1372963326 > nseconds: 264422029 > [Main Opcode: LOOKUP (15)] > > No. Time Source Destination > Protocol Length Info > 117 8.456684 192.168.1.128 192.168.1.3 NFS > 322 V4 Call (Reply In 118) OPEN DH:0x4dcc3776/file1 > > Frame 117: 322 bytes on wire (2576 bits), 322 bytes captured (2576 > bits) > Ethernet II, Src: GemtekTe_f6:cf:a1 (00:26:82:f6:cf:a1), Dst: > Giga-Byt_db:cd:c4 (90:2b:34:db:cd:c4) > Internet Protocol Version 4, Src: 192.168.1.128 (192.168.1.128), Dst: > 192.168.1.3 (192.168.1.3) > Transmission Control Protocol, Src Port: 726 (726), Dst Port: nfs > (2049), Seq: 3905, Ack: 3697, Len: 256 > Remote Procedure Call, Type:Call XID:0xbd073c52 > Network File System > [Program Version: 4] > [V4 Procedure: COMPOUND (1)] > Tag: > length: 0 > contents: > minorversion: 0 > Operations (count: 5) > Opcode: PUTFH (22) > filehandle > length: 28 > [hash (CRC-32): 0x4dcc3776] > decode type as: unknown > filehandle: > 9a7470c6deedeca50a0004000000000037d80a0000000000... 
> Opcode: OPEN (18) > seqid: 0x00000000 > share_access: OPEN4_SHARE_ACCESS_READ (1) > share_deny: OPEN4_SHARE_DENY_NONE (0) > clientid: 0xcd6cc75124000000 > owner: > length: 24 > contents: > Open Type: OPEN4_NOCREATE (0) > Claim Type: CLAIM_NULL (0) > Filename: file1 > length: 5 > contents: file1 > fill bytes: opaque data > Opcode: GETFH (10) > Opcode: ACCESS (3), [Check: RD MD XT XE] > Check access: 0x2d > .... ...1 = 0x01 READ: allowed? > .... .1.. = 0x04 MODIFY: allowed? > .... 1... = 0x08 EXTEND: allowed? > ..1. .... = 0x20 EXECUTE: allowed? > Opcode: GETATTR (9) > GETATTR4args > attr_request > bitmap[0] = 0x0010011a > [5 attributes requested] > mand_attr: FATTR4_TYPE (1) > mand_attr: FATTR4_CHANGE (3) > mand_attr: FATTR4_SIZE (4) > mand_attr: FATTR4_FSID (8) > recc_attr: FATTR4_FILEID (20) > bitmap[1] = 0x0030a23a > [9 attributes requested] > recc_attr: FATTR4_MODE (33) > recc_attr: FATTR4_NUMLINKS (35) > recc_attr: FATTR4_OWNER (36) > recc_attr: FATTR4_OWNER_GROUP (37) > recc_attr: FATTR4_RAWDEV (41) > recc_attr: FATTR4_SPACE_USED (45) > recc_attr: FATTR4_TIME_ACCESS (47) > recc_attr: FATTR4_TIME_METADATA (52) > recc_attr: FATTR4_TIME_MODIFY (53) > [Main Opcode: OPEN (18)] > > No. 
Time Source Destination > Protocol Length Info > 118 8.456811 192.168.1.3 192.168.1.128 NFS > 150 V4 Reply (Call In 117) OPEN Status: NFS4ERR_ACCES > > Frame 118: 150 bytes on wire (1200 bits), 150 bytes captured (1200 > bits) > Ethernet II, Src: Giga-Byt_db:cd:c4 (90:2b:34:db:cd:c4), Dst: > GemtekTe_f6:cf:a1 (00:26:82:f6:cf:a1) > Internet Protocol Version 4, Src: 192.168.1.3 (192.168.1.3), Dst: > 192.168.1.128 (192.168.1.128) > Transmission Control Protocol, Src Port: nfs (2049), Dst Port: 726 > (726), Seq: 3697, Ack: 4161, Len: 84 > Remote Procedure Call, Type:Reply XID:0xbd073c52 > Network File System > [Program Version: 4] > [V4 Procedure: COMPOUND (1)] > Status: NFS4ERR_ACCES (13) > Tag: > length: 0 > contents: > Operations (count: 2) > Opcode: PUTFH (22) > Status: NFS4_OK (0) > Opcode: OPEN (18) > Status: NFS4ERR_ACCES (13) > [Main Opcode: OPEN (18)] > > Kind regards, > -- > Oleg > _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" > From owner-freebsd-fs@FreeBSD.ORG Fri Jul 5 02:36:46 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id B15B3766 for ; Fri, 5 Jul 2013 02:36:46 +0000 (UTC) (envelope-from will@firepipe.net) Received: from mail-ve0-f174.google.com (mail-ve0-f174.google.com [209.85.128.174]) by mx1.freebsd.org (Postfix) with ESMTP id 756481AB2 for ; Fri, 5 Jul 2013 02:36:46 +0000 (UTC) Received: by mail-ve0-f174.google.com with SMTP id oz10so1446708veb.33 for ; Thu, 04 Jul 2013 19:36:45 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type:x-gm-message-state; bh=E32CYtBMnN2Ntm0MmJez+xXgfBwhNxJ9dn4IGw4g1L4=; 
b=EWIof/OKRCKmANo5kd4u5oDac9baiJuVjm4uHOVNSwKdZKBb97UEc/vMovYUEJj09b 6uA6eQZED/VIFOk+UBijHE6zehKHZ0CNi4GXuNScdNQIzkMfJ5RhsSg5nRZAcD531hYv KJeSUMMg/zzMoau/YzIY6twwyxSl16W7a68vI209rnzTHZ9/CSrtK/fEbA6ObbgzyT4v XyVmv7BFRkjiAA5/WFmNqDNfQBBQOVEb17Bv6OqONmaJA/ATMTBiq1FbYnZ2DBsYrBR6 WWUN9BDrMNVKTltVkRjqFO9+H5j1baTbUQgHfoTSEAwUzm0M68NZc7pHYqf9wV6y7dLK 5BUw== MIME-Version: 1.0 X-Received: by 10.52.65.10 with SMTP id t10mr4222185vds.90.1372991805534; Thu, 04 Jul 2013 19:36:45 -0700 (PDT) Received: by 10.58.226.66 with HTTP; Thu, 4 Jul 2013 19:36:45 -0700 (PDT) In-Reply-To: References: <87li5o5tz2.wl%berend@pobox.com> Date: Thu, 4 Jul 2013 20:36:45 -0600 Message-ID: Subject: Re: EBS snapshot backups from a FreeBSD zfs file system: zpool freeze? From: Will Andrews To: Steven Hartland Content-Type: text/plain; charset=UTF-8 X-Gm-Message-State: ALoCoQmxboSrslQ2ytwh81Pj/JW4Nj4yGZcN6Yqz+Mh7ANxds4GYa6Hjf+4gUb26p7VgeQ/xlYJq Cc: "freebsd-fs@freebsd.org" X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 05 Jul 2013 02:36:46 -0000 On Wed, Jul 3, 2013 at 6:13 PM, Steven Hartland wrote: > Not been following the thread really so excuse if this has already > been mentioned ;-) > > There is a zpool freeze which stops spa_sync() from doing > anything, so that the only way to record changes is on the ZIL. > > The comment in the zpool_main is: "'freeze' is a vile debugging > abomination" so it's evil but might be what you want if you're up to > writing some code. zpool freeze is a debugging-only command, as the comment suggests. It is not really of much use outside of testing changes to ZIL code. Once run, the only thing you can do to get normal I/O running again is to export the pool and import it again. 
The point of the command is to ensure that ZIL blocks exist on a pool when it is exported, so they are guaranteed to have to be replayed on import. It is used in the STF test suite for the express purpose of testing ZIL replay. The tests write some data, freeze the pool, write some more data, export the pool, use zdb to check for ZIL blocks, then import it and check again, both to see that the changes were applied and to see that the ZIL blocks are gone. Most likely, doing what Berend wants requires a slightly different approach. --Will. From owner-freebsd-fs@FreeBSD.ORG Fri Jul 5 07:43:37 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 5A329A8A for ; Fri, 5 Jul 2013 07:43:37 +0000 (UTC) (envelope-from daniel@digsys.bg) Received: from smtp-sofia.digsys.bg (smtp-sofia.digsys.bg [193.68.21.123]) by mx1.freebsd.org (Postfix) with ESMTP id C855B18CF for ; Fri, 5 Jul 2013 07:43:36 +0000 (UTC) Received: from dcave.digsys.bg (dcave.digsys.bg [193.68.6.1]) (authenticated bits=0) by smtp-sofia.digsys.bg (8.14.6/8.14.6) with ESMTP id r657hSYe045323 (version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO) for ; Fri, 5 Jul 2013 10:43:28 +0300 (EEST) (envelope-from daniel@digsys.bg) Message-ID: <51D67920.5030800@digsys.bg> Date: Fri, 05 Jul 2013 10:43:28 +0300 From: Daniel Kalchev User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:17.0) Gecko/20130627 Thunderbird/17.0.7 MIME-Version: 1.0 To: freebsd-fs@freebsd.org Subject: Re: Slow resilvering with mirrored ZIL References: <51D42107.1050107@digsys.bg> <2EF46A8C-6908-4160-BF99-EC610B3EA771@alumni.chalmers.se> <51D437E2.4060101@digsys.bg> <20130704000405.GA75529@icarus.home.lan> <20130704171637.GA94539@icarus.home.lan> <2A261BEA-4452-4F6A-8EFB-90A54D79CBB9@alumni.chalmers.se> <20130704191203.GA95642@icarus.home.lan> In-Reply-To: <20130704191203.GA95642@icarus.home.lan>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 05 Jul 2013 07:43:37 -0000 On 04.07.13 22:12, Jeremy Chadwick wrote: > I believe -- but I need someone else to chime in here with confirmation, > particularly someone who is familiar with ZFS's internals -- once your > pool is ashift 12, you can do a disk replacement ***without*** having to > do the gnop procedure (because the pool itself is already using ashift > 12). But again, I need someone to confirm that. I do not in any way claim to know the ZFS internals well, but I can confirm this: once a ZFS vdev is 4k aligned (ashift=12), you can replace drives in it and the vdev will stay 4k aligned. In ZFS, the alignment is per-vdev, not per-device, not per-zpool. When creating a new vdev, ZFS looks for the largest sector size the underlying storage reports and uses that. This is why you only need to apply the gnop trick to one of the drives. Once the vdev is created, it pretty much does not care what the underlying storage reports. > On these drives there are ways to work around this issue -- it > specifically involves disabling drive-level APM. To do so, you have to > initiate a specific ATA CDB to the drive using "camcontrol cmd", and > this has to be done every time the system reboots. There is one > drawback to disabling APM as well: the drives run hotter. There is a way to do this with smartmontools as well, either with smartctl or smartd (which is a wise thing to run anyway). Look for the -g option and the apm sub-option. Sometimes, for example when you have ATA devices connected through SAS backplanes and HBAs, you cannot send them these commands via camcontrol. > These SSDs need a full Secure Erase done to them.
In stable/9 you can > do this through camcontrol, otherwise you need to use Linux (there are > live CD/DVD distros that can do this for you) or the vendor's native > utilities (in Windows usually). ZFS in stable/9 actually does a full TRIM when you attach a new device, which can be observed/confirmed via the TRIM statistics counters. You don't need to use any external utilities. > UNDERSTAND: THIS IS NOT THE SAME AS A "DISK FORMAT" OR "ZEROING THE > DISK". In fact, dd if=/dev/zero to zero an SSD would be the worst > possible thing you could do to it. Secure Erase clears the entire FTL > and resets the wear levelling matrix (that's just what I call it) back > to factory defaults, so you end up with out-of-the-box performance: > there's no more LBA-to-NAND-cell map entries in the FTL (which are > usually what are responsible for slowdown). I do not believe Secure Erase does what you propose; it more or less just does a full-device TRIM. Resetting things to factory defaults won't make any vendor happy, because they base their SSD warranties on the wear level. Anyway, if you know of a way to trick this, I am all ears :) > Your Intel drive is very very small, and in fact I wouldn't even bother > to use this drive -- it means you'd only be able to use roughly 14GB of > it (at most) for data, and leave the remaining 6GB unallocated/unused > solely for wear levelling. A small SLC flash-based drive might be worth more than a large MLC-based drive... Just saying. The SLOG rarely fills the drive, and if you use TRIM, you should be safe. > What you're not taking into consideration is how log and cache devices > bottleneck ZFS, in addition to the fact that SATA is not like SAS when > it comes to simultaneous R/W. That poor OCZ drive... With a proper setup, there is really no bottleneck. For the cache device, it is advisable to set vfs.zfs.l2arc_norw=0, as otherwise data will not be read from the L2ARC while something is being written there.
This is problematic for metadata; for other data, you just don't get the performance you could from having an SSD. As for mixing SLOG and L2ARC.. I always think it is a bad idea to do so. There are two reasons to have a SLOG: 1. To reduce latency. By combining SLOG and L2ARC on the same device you might not have enough IOPS to get low latency, and consumer-grade SSDs tend not to have consistent latency anyway. Some newer drives are promising, for example the OCZ Vector, or better yet the Intel S3500/S3700. 2. To reduce ZFS pool fragmentation. This is very important and often overlooked. If you want ZFS to perform well, you are better off with a separate log device even if it is on a rotating disk (you only lose the low latency!). ZFS pool fragmentation can be a problem for long-lived pools. Mirroring the SLOG is just a safeguard against losing the last few seconds of really important writes, but if you can afford it, just do it. Considering the small size of this pool, however, I do not believe using one SSD for both SLOG and L2ARC will be a serious bottleneck, unless real-life observation says otherwise.
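If one does decide to put both roles on a single SSD, the partitioning being discussed might be sketched as follows (hypothetical device `ada4`, pool name `tank`, and sizes; an illustrative sketch, not a recommendation):

```sh
# Label the SSD and carve out a small SLOG partition plus a larger L2ARC
# partition, leaving the remainder unallocated for wear levelling.
gpart create -s gpt ada4
gpart add -t freebsd-zfs -a 4k -s 10G -l slog0 ada4
gpart add -t freebsd-zfs -a 4k -s 40G -l cache0 ada4

# Attach the partitions to the pool as log and cache devices.
zpool add tank log gpt/slog0
zpool add tank cache gpt/cache0
```

Using GPT labels (gpt/slog0, gpt/cache0) rather than raw device names keeps the pool importable even if the device renumbers across reboots.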
Daniel From owner-freebsd-fs@FreeBSD.ORG Fri Jul 5 08:02:04 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 92F81FC3 for ; Fri, 5 Jul 2013 08:02:04 +0000 (UTC) (envelope-from daniel@digsys.bg) Received: from smtp-sofia.digsys.bg (smtp-sofia.digsys.bg [193.68.21.123]) by mx1.freebsd.org (Postfix) with ESMTP id 0C2DF1974 for ; Fri, 5 Jul 2013 08:02:03 +0000 (UTC) Received: from dcave.digsys.bg (dcave.digsys.bg [193.68.6.1]) (authenticated bits=0) by smtp-sofia.digsys.bg (8.14.6/8.14.6) with ESMTP id r65821GO053189 (version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO) for ; Fri, 5 Jul 2013 11:02:02 +0300 (EEST) (envelope-from daniel@digsys.bg) Message-ID: <51D67D79.3030403@digsys.bg> Date: Fri, 05 Jul 2013 11:02:01 +0300 From: Daniel Kalchev User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:17.0) Gecko/20130627 Thunderbird/17.0.7 MIME-Version: 1.0 To: freebsd-fs@freebsd.org Subject: Re: Slow resilvering with mirrored ZIL References: <2EF46A8C-6908-4160-BF99-EC610B3EA771@alumni.chalmers.se> <51D437E2.4060101@digsys.bg> <20130704000405.GA75529@icarus.home.lan> <20130704171637.GA94539@icarus.home.lan> <2A261BEA-4452-4F6A-8EFB-90A54D79CBB9@alumni.chalmers.se> <20130704191203.GA95642@icarus.home.lan> <43015E9015084CA6BAC6978F39D22E8B@multiplay.co.uk> <20130704202818.GB97119@icarus.home.lan> In-Reply-To: <20130704202818.GB97119@icarus.home.lan> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 05 Jul 2013 08:02:04 -0000 On 04.07.13 23:28, Jeremy Chadwick wrote: > > I'm not sure of the impact in situations like "I had a vdev made long > ago (ashift 9), then I added a new vdev to the pool 
(ashift 12) and now > ZFS is threatening to murder my children..." :-) Such a situation led me to spend a few months recreating/reshuffling some 40TB of snapshots -- mostly because I was too lazy to build a new system to copy to, and the old one didn't have enough spare slots... To make things more interesting, I had made the ashift=9 vdev on 4k-aligned drives and the ashift=12 vdev on 512b-aligned drives... Which brings up the question of whether it is possible to roll back such a new vdev creation easily -- errors happen... > But it would also be doing a TRIM of the LBA ranges associated with each > partition, rather than the entire SSD. > > Meaning, in the example I gave (re: leaving untouched/unpartitioned > space at the end of the drive for wear levelling), this would result in > the untouched/unpartitioned space never being TRIM'd (by anything), thus > the FTL map would still have references to those LBA ranges. That'd be > potentially 30% of LBA ranges in the FTL (depending on past I/O of > course -- no way to know), and the only thing that would work that out > is the SSD's GC (which is known to kill performance if/when it kicks > in). This assumes some knowledge of how SSDs operate, which might be true for one model/maker and not true for another. No doubt, starting with a clean drive is best. That might be achieved by adding the entire drive to ZFS, then removing it -- a cheap way to get the "Secure Erase" effect on FreeBSD. Then go on with partitioning... > Hmm, that gives me an idea actually -- if gpart(8) itself had a flag to > induce TRIM for the LBA range of whatever was just created (gpart > create) or added (gpart add). That way you could actually induce TRIM > on those LBA ranges rather than rely on the FS to do it, or have to put > faith into the SSD's GC (I rarely do :P). In the OP's case he could > then make a freebsd-zfs partition filling up the remaining 30% with the > flag to TRIM it, then when that was done immediately delete the > partition.
Hmm, not sure if what I'm saying makes sense or not, or if > that's even a task/role gpart(8) should have... Not a bad idea. Really. :) > >> ... >>> Next topic... >>> >>> I would strongly recommend you not use 1 SSD for both log and cache. >>> I understand your thought process here: "if the SSD dies, the log >>> devices are mirrored so I'm okay, and the cache is throw-away anyway". >> While not ideal it still gives a significant boost against no SLOG, so >> if thats what HW you have to work with, don't discount the benefit it >> will bring. > Sure, the advantage of no seek times due to NAND plays a big role, but > some of these drives don't particularly perform well when used with a > larger I/O queue depth. If we talk about the SLOG, there are no seeks: the SLOG is written sequentially. You *can* use a spinning drive for the SLOG and you *will* see a noticeable performance boost in doing so. The L2ARC, on the other hand, is especially designed for no-seek SSDs, as it does many small and scattered reads. Writes are still sequential, I believe.. > Now consider this: the Samsung 840 256GB (not the Pro) costs US$173 > and will give you 2x the performance of that Intel drive -- and more > importantly, 12x the capacity (that means 30% for wear levelling is > hardly a concern). The 840 also performs significantly better at higher > queue depths. I'm just saying that for about US$40 more you get > something that is by far better and will last you longer. Low-capacity > SSDs, even if SLC, are incredibly niche and I'm still not sure what > demographic they're catering to. The non-Pro 840 is hardly a match for any SLC SSD. Remember, SLC is all about endurance: it is order(s) of magnitude more enduring than the TLC flash used in that cheap consumer drive. IOPS and interface speed are different things -- they might not be the concern here. Nevertheless, I have recently begun to view SSDs used for SLOG/L2ARC as consumables...
however, no matter how I calculate, the enterprise drives always win by a big margin... > I'm making a lot of assumptions about his I/O workload too, of course. > I myself tend to stay away from cache/log devices for the time being > given that my workloads don't necessitate them. Persistent cache (yeah > I know it's on the todo list) would interest me since the MCH on my > board is maxed out at 8GB. In short... be careful. :) Don't be tempted to add too large of an L2ARC with only 8GB of RAM. :) Daniel From owner-freebsd-fs@FreeBSD.ORG Fri Jul 5 09:04:38 2013 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 08F32308 for ; Fri, 5 Jul 2013 09:04:38 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 52E4F1D8E for ; Fri, 5 Jul 2013 09:04:36 +0000 (UTC) Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id MAA26697; Fri, 05 Jul 2013 12:04:13 +0300 (EEST) (envelope-from avg@FreeBSD.org) Received: from localhost ([127.0.0.1]) by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1Uv1w5-000JAK-0I; Fri, 05 Jul 2013 12:04:13 +0300 Message-ID: <51D68BD5.5080403@FreeBSD.org> Date: Fri, 05 Jul 2013 12:03:17 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:17.0) Gecko/20130405 Thunderbird/17.0.5 MIME-Version: 1.0 To: Dmitry Morozovsky Subject: Re: boot from ZFS: which pool types use? 
References: <51D56066.1020902@FreeBSD.org> <51D577A9.1030304@gmail.com> <51D59AAD.3030208@FreeBSD.org> <51D5A20F.4070103@FreeBSD.org> <51D5C968.2000803@FreeBSD.org> <51D5D5AA.8070807@FreeBSD.org> In-Reply-To: X-Enigmail-Version: 1.5.1 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: freebsd-fs@FreeBSD.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 05 Jul 2013 09:04:38 -0000 on 04/07/2013 23:09 Dmitry Morozovsky said the following: > On Thu, 4 Jul 2013, Andriy Gapon wrote: > >>>> Really? >>>> http://ru.kyivbsd.org.ua/arhiv/2012/kyivbsd12-gapon-zfs.pdf?attredirects=0&d=1 >>> >>> Hmm. I do not see (maybe I'm missing something obvious) how can one select at >>> gptzfsboot or similar prompt the pool to load loader (not too much 'load's, >>> heh?) from... >> >> Page 18. >> >> FreeBSD/x86 boot >> Default: pool1:ROOT/test1:/boot/zfsloader >> boot: pool2:ROOT/knowngood:/boot/zfsloader.old >> >> BTW, the syntax now is changed to use more familiar dataset names, e.g. >> "pool2/ROOT/knowngood" instead of "pool2:ROOT/knowngood". > > Wow. This should be documented loud, then. Possibly in the prompt too. This should be documented in the manual page(s), yes. 
-- Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Fri Jul 5 10:30:47 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 80893E6F for ; Fri, 5 Jul 2013 10:30:47 +0000 (UTC) (envelope-from daniel@digsys.bg) Received: from smtp-sofia.digsys.bg (smtp-sofia.digsys.bg [193.68.21.123]) by mx1.freebsd.org (Postfix) with ESMTP id 045CC111B for ; Fri, 5 Jul 2013 10:30:46 +0000 (UTC) Received: from dcave.digsys.bg (dcave.digsys.bg [193.68.6.1]) (authenticated bits=0) by smtp-sofia.digsys.bg (8.14.6/8.14.6) with ESMTP id r65AUiOZ023679 (version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO) for ; Fri, 5 Jul 2013 13:30:45 +0300 (EEST) (envelope-from daniel@digsys.bg) Message-ID: <51D6A054.2070704@digsys.bg> Date: Fri, 05 Jul 2013 13:30:44 +0300 From: Daniel Kalchev User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:17.0) Gecko/20130627 Thunderbird/17.0.7 MIME-Version: 1.0 To: freebsd-fs@freebsd.org Subject: Re: Slow resilvering with mirrored ZIL References: <2EF46A8C-6908-4160-BF99-EC610B3EA771@alumni.chalmers.se> <51D437E2.4060101@digsys.bg> <20130704000405.GA75529@icarus.home.lan> <20130704171637.GA94539@icarus.home.lan> <2A261BEA-4452-4F6A-8EFB-90A54D79CBB9@alumni.chalmers.se> <20130704191203.GA95642@icarus.home.lan> <43015E9015084CA6BAC6978F39D22E8B@multiplay.co.uk> <20130704202818.GB97119@icarus.home.lan> <8280798FCEE74CB08536F3BC43C9207F@multiplay.co.uk> In-Reply-To: <8280798FCEE74CB08536F3BC43C9207F@multiplay.co.uk> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 05 Jul 2013 10:30:47 -0000 On 05.07.13 02:21, Steven Hartland wrote: > > To give a concrete example which may well be of use for 
others, we > had a mysql box here with dual 60GB SSD L2ARCs which, after continuous > increases in query write traffic, ended up at total IO saturation. > > As a test we removed the L2ARC, repartitioning them into a 10GB SLOG and > 40GB L2ARC, and the machine was utterly transformed, from constant 100% > disk IO to 10%, as the SLOGs soaked up the sync transfers from mysql. Also, you removed the ZFS fragmentation issue, which can severely impact performance. Unfortunately, this removes fragmentation for newly written data only; old data remains fragmented --- one can only dream of the day when the "block pointer rewrite" promise becomes reality... it could do wonders for ZFS. Alternatively, my proposal is for a background rewrite of ZFS blocks, which could at least achieve: redistributing blocks to newly added vdevs, "fixing" compression/dedup of datasets, cleaning up fragmented data, etc. It should be relatively easy to implement... (I am, unfortunately, not offering coding help due to lack of time) Daniel From owner-freebsd-fs@FreeBSD.ORG Fri Jul 5 10:38:00 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id DA1F4113 for ; Fri, 5 Jul 2013 10:38:00 +0000 (UTC) (envelope-from daniel@digsys.bg) Received: from smtp-sofia.digsys.bg (smtp-sofia.digsys.bg [193.68.21.123]) by mx1.freebsd.org (Postfix) with ESMTP id 5CBAB1182 for ; Fri, 5 Jul 2013 10:38:00 +0000 (UTC) Received: from dcave.digsys.bg (dcave.digsys.bg [193.68.6.1]) (authenticated bits=0) by smtp-sofia.digsys.bg (8.14.6/8.14.6) with ESMTP id r65AbwfO025971 (version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO) for ; Fri, 5 Jul 2013 13:37:59 +0300 (EEST) (envelope-from daniel@digsys.bg) Message-ID: <51D6A206.2020303@digsys.bg> Date: Fri, 05 Jul 2013 13:37:58 +0300 From: Daniel Kalchev User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:17.0) Gecko/20130627
Thunderbird/17.0.7 MIME-Version: 1.0 To: freebsd-fs@freebsd.org Subject: Re: Slow resilvering with mirrored ZIL References: <51D42107.1050107@digsys.bg><2EF46A8C-6908-4160-BF99-EC610B3EA771@alumni.chalmers.se><51D437E2.4060101@digsys.bg><20130704000405.GA75529@icarus.home.lan><20130704171637.GA94539@icarus.home.lan><2A261BEA-4452-4F6A-8EFB-90A54D79CBB9@alumni.chalmers.se><20130704191203.GA95642@icarus.home.lan><43015E9015084CA6BAC6978F39D22E8B@multiplay.co.uk> <3CFB4564D8EB4A6A9BCE2AFCC5B6E400@multiplay.co.uk> In-Reply-To: <3CFB4564D8EB4A6A9BCE2AFCC5B6E400@multiplay.co.uk> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 05 Jul 2013 10:38:00 -0000 On 05.07.13 02:28, Steven Hartland wrote: > > > If anyone wants my current patches, which add a switch to 4k ashift by > default as a sysctl + work with QUIRKS too, just let me know. > > They are well tested; we just want more options before putting them in the > tree. Is it not easier to add this as an option to zpool create, instead of a sysctl? That is, I believe we have two scenarios here: 1. Have a sysctl that instructs ZFS to look at the FreeBSD quirks to decide what the ashift should be, instead of only querying the 'sectorsize' property of the storage. I believe we might not even need a sysctl here; just make it default to obey the quirks --- but a sysctl for the interim period will not hurt (with the proper default). 2. Have an option to zpool create and zpool add that specifies the ashift value. Here my thinking is that it should let you specify an ashift equal to or larger than the computed one, which is based on the largest sector size of all devices in the vdev. I don't know, but I have always wondered: how hard is it to change the ashift value on the fly?
Does it impact reads of data already on the vdev, or does it impact only writes? If only writes, it should be trivial, really.... Daniel From owner-freebsd-fs@FreeBSD.ORG Fri Jul 5 11:14:07 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 6A67F8AB for ; Fri, 5 Jul 2013 11:14:07 +0000 (UTC) (envelope-from prvs=1898728ac9=killing@multiplay.co.uk) Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) by mx1.freebsd.org (Postfix) with ESMTP id EF2FC12FB for ; Fri, 5 Jul 2013 11:14:06 +0000 (UTC) Received: from r2d2 ([82.69.141.170]) by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (MDaemon PRO v10.0.4) with ESMTP id md50004739427.msg for ; Fri, 05 Jul 2013 12:13:59 +0100 X-Spam-Processed: mail1.multiplay.co.uk, Fri, 05 Jul 2013 12:13:59 +0100 (not processed: message from valid local sender) X-MDDKIM-Result: neutral (mail1.multiplay.co.uk) X-MDRemoteIP: 82.69.141.170 X-Return-Path: prvs=1898728ac9=killing@multiplay.co.uk X-Envelope-From: killing@multiplay.co.uk X-MDaemon-Deliver-To: freebsd-fs@freebsd.org Message-ID: <13061C613C1B4FE597DA5984CAFBA14C@multiplay.co.uk> From: "Steven Hartland" To: "Daniel Kalchev" , References: <51D42107.1050107@digsys.bg><2EF46A8C-6908-4160-BF99-EC610B3EA771@alumni.chalmers.se><51D437E2.4060101@digsys.bg><20130704000405.GA75529@icarus.home.lan><20130704171637.GA94539@icarus.home.lan><2A261BEA-4452-4F6A-8EFB-90A54D79CBB9@alumni.chalmers.se><20130704191203.GA95642@icarus.home.lan><43015E9015084CA6BAC6978F39D22E8B@multiplay.co.uk> <3CFB4564D8EB4A6A9BCE2AFCC5B6E400@multiplay.co.uk> <51D6A206.2020303@digsys.bg> Subject: Re: Slow resilvering with mirrored ZIL Date: Fri, 5 Jul 2013 12:14:09 +0100 MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=response Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: 
Microsoft Outlook Express 6.00.2900.5931 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6157 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 05 Jul 2013 11:14:07 -0000 ----- Original Message ----- From: "Daniel Kalchev" To: Sent: Friday, July 05, 2013 11:37 AM Subject: Re: Slow resilvering with mirrored ZIL > > On 05.07.13 02:28, Steven Hartland wrote: >> >> >> If anyone wants my current patches which add switch to 4k ashift by >> default >> as a sysctl + works with QUIRKS too, just let me know. >> >> They are well tested, just we want more options before putting in the >> tree. > > Is it not easier to add this as an option to zpool create, instead of an > sysctl? That wouldn't achieve what I wanted which was to clamp all pools created to a min of XYZ, instead of having to remember to always add the 4K option to every pool command. This doesn't remove the requirement for an option to zpool create, zpool add etc, which is what I still need to implement. > That is, I believe we have two scenarios here: > > 1. Having an sysctl that instructs ZFS to look at the FreeBSD quirks to > decide what the ashift should be, instead of only querying the > 'sectorsize' property of the storage. I believe we might not even need > an sysctl here, just make it default to obey the quirks --- but sysctl > for the interim period will not hurt (with the proper default). While most people will want this behaviour some may not so I currently have: vfs.zfs.min_create_ashift: Minimum ashift used when creating new pools vfs.zfs.vdev.optimal_ashift: Enable/disable optimal ashift usage on initialisation > 2. Have an option to zpool create and zpool add, that specifies the > ashift value. 
Here my thinking is that it should let you specify an > ashift equal or larger than the computed one, which is based on the > largest sector size of all devices in a vdev. Exactly whats planned. > Don't know, but always wondered.. how hard is it to change the ashift > value on the fly? Does it impact reads of data already on the vdev, or > does it impact only writes? If only writes, it should be trivial, really.... I've not looked in depth but considering even adding an device with a larger ashift requirement is prevented I'd say this is going to be none trivial. Regards Steve ================================================ This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmaster@multiplay.co.uk. 
From owner-freebsd-fs@FreeBSD.ORG Fri Jul 5 11:17:52 2013 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id D8C8AABC; Fri, 5 Jul 2013 11:17:52 +0000 (UTC) (envelope-from fidaj@ukr.net) Received: from fsm2.ukr.net (fsm2.ukr.net [195.214.192.121]) by mx1.freebsd.org (Postfix) with ESMTP id 89FDE1326; Fri, 5 Jul 2013 11:17:52 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=ukr.net; s=fsm; h=Content-Transfer-Encoding:Content-Type:Mime-Version:Message-ID:Subject:Cc:To:From:Date; bh=n9Y9GHue6Ba6lwwjDgMaQTv4aXKJmk0LN9jkdKo2Uo4=; b=tD33kqP6mUVRj2xLkrYS6MIkC8m0v3TRjkXU2XgFuDexiIZ3O0iE8JTT3GDDRwweMTFGpR2SXk0FOTJPH2QvO4RS09WmTqpcCKHj4LnHv88J0ImueEjHhF9r3BbE+vt+o+6sFK93XDZrIkI1r3gF6nChqMnACbrRbu+pbgqQuiw=; Received: from [178.137.138.140] (helo=nonamehost.local) by fsm2.ukr.net with esmtpsa ID 1Uv41G-000871-Gm ; Fri, 05 Jul 2013 14:17:42 +0300 Date: Fri, 5 Jul 2013 14:17:40 +0300 From: Ivan Klymenko To: bug-followup@FreeBSD.org, fidaj@ukr.net Subject: Re: kern/180236: [zfs] [nullfs] Leakage free space using ZFS with nullfs on 9.1-STABLE Message-ID: <20130705141741.757c3611@nonamehost.local> X-Mailer: Claws Mail 3.9.2 (GTK+ 2.24.19; amd64-portbld-freebsd10.0) Face: 
iVBORw0KGgoAAAANSUhEUgAAADAAAAAwBAMAAAClLOS0AAAAFVBMVEWpqak/Pz/i4uIfHx8GBwZwcHAQEBA6o92AAAACHElEQVQ4jWWUTY7bMAyF6QzUPSEoa8PFHEBgqwuM4bVVg7MvZOj+R+ijpMTpjIwgkT7z75EKrdfattpXERG6zqvUOtAr2LCRYfEKcB4l/Q+2cc6XjQH7hv+2YZYreIk5nevZEPvuzUzptizHLzgDMnC5Wpbl7ewJlOEqlQF+DlCjgVLki0WV6FMDMsBxjlJiQulIznwZ+DxHiQyDyIg0wN3Oo6o6ZQ5s5AIfar+W2Wlmz+kCcb8tg6j3voMEwNrBQk69dDBDqw/urpqJH+m+Q6u/4QnoAeYpnUXC/s1iup9rhCd6xMgAqdDyAyFegbKkVAHeLCcOulPLawaoUIDos4M88iLNrVkU7uu5ccTDO6naJzWLum51C6Yb7y4HKKbdArLWir0PBiS8glJRBZHeyHl7J9lENpAC6qT9NlNG4u5hsVYDyJP6mlJJtY3oVju4WSUzHal1sDU17NASoBWSk40J2eBLBJhYrVmzC5gVALGpNIAiQgN6eGstOp9Oa6zFbbLTISYi28BGZDRUJKWeroECkCEkzXjUtbmmaKMfAx2RfbT69/cO+tgHcmx6AfyZOmj3NDIah0F0GB66d4CrdIoplNFFGHSpSheRxbo0W4S8azNItEoMWbw3uXAeJgCrmX5joz7CGXqSg6PcryEhnFr/C1C2ntPxBOYbdwY+8dO3+wZJyFlbMX9s8zNnvp/tLwAv03NB4j3HVpn8Awwm+GrlP6MVAAAAAElFTkSuQmCC Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Authentication-Result: IP=178.137.138.140; mail.from=fidaj@ukr.net; dkim=pass; header.d=ukr.net Cc: freebsd-fs@FreeBSD.org, Steven Hartland X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 05 Jul 2013 11:17:52 -0000 I confirm - this patch fixes this problem by 9.1-STABLE Thanks to all. PR kern/180236 may be closed. 
From owner-freebsd-fs@FreeBSD.ORG Fri Jul 5 11:20:02 2013 Return-Path: Delivered-To: freebsd-fs@smarthost.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 7489FBAD for ; Fri, 5 Jul 2013 11:20:02 +0000 (UTC) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:1900:2254:206c::16:87]) by mx1.freebsd.org (Postfix) with ESMTP id 4CF3A1350 for ; Fri, 5 Jul 2013 11:20:02 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.7/8.14.7) with ESMTP id r65BK1B4046657 for ; Fri, 5 Jul 2013 11:20:01 GMT (envelope-from gnats@freefall.freebsd.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.7/8.14.7/Submit) id r65BK1rd046656; Fri, 5 Jul 2013 11:20:01 GMT (envelope-from gnats) Date: Fri, 5 Jul 2013 11:20:01 GMT Message-Id: <201307051120.r65BK1rd046656@freefall.freebsd.org> To: freebsd-fs@FreeBSD.org Cc: From: Ivan Klymenko Subject: Re: kern/180236: [zfs] [nullfs] Leakage free space using ZFS with nullfs on 9.1-STABLE X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list Reply-To: Ivan Klymenko List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 05 Jul 2013 11:20:02 -0000 The following reply was made to PR kern/180236; it has been noted by GNATS. From: Ivan Klymenko To: bug-followup@FreeBSD.org, fidaj@ukr.net Cc: "Steven Hartland" , freebsd-fs@FreeBSD.org, "Konstantin Belousov" Subject: Re: kern/180236: [zfs] [nullfs] Leakage free space using ZFS with nullfs on 9.1-STABLE Date: Fri, 5 Jul 2013 14:17:40 +0300 I confirm - this patch fixes the problem on 9.1-STABLE. Thanks to all. PR kern/180236 may be closed.
From owner-freebsd-fs@FreeBSD.ORG Fri Jul 5 14:53:53 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id AE145F35 for ; Fri, 5 Jul 2013 14:53:53 +0000 (UTC) (envelope-from jdc@koitsu.org) Received: from relay5-d.mail.gandi.net (relay5-d.mail.gandi.net [217.70.183.197]) by mx1.freebsd.org (Postfix) with ESMTP id 5515F10DA for ; Fri, 5 Jul 2013 14:53:53 +0000 (UTC) Received: from mfilter10-d.gandi.net (mfilter10-d.gandi.net [217.70.178.139]) by relay5-d.mail.gandi.net (Postfix) with ESMTP id 9E8A241C074; Fri, 5 Jul 2013 16:53:36 +0200 (CEST) X-Virus-Scanned: Debian amavisd-new at mfilter10-d.gandi.net Received: from relay5-d.mail.gandi.net ([217.70.183.197]) by mfilter10-d.gandi.net (mfilter10-d.gandi.net [10.0.15.180]) (amavisd-new, port 10024) with ESMTP id 3NYKojwuXQkh; Fri, 5 Jul 2013 16:53:35 +0200 (CEST) X-Originating-IP: 76.102.14.35 Received: from jdc.koitsu.org (c-76-102-14-35.hsd1.ca.comcast.net [76.102.14.35]) (Authenticated sender: jdc@koitsu.org) by relay5-d.mail.gandi.net (Postfix) with ESMTPSA id 594CE41C05C; Fri, 5 Jul 2013 16:53:34 +0200 (CEST) Received: by icarus.home.lan (Postfix, from userid 1000) id 5C22173A31; Fri, 5 Jul 2013 07:53:32 -0700 (PDT) Date: Fri, 5 Jul 2013 07:53:32 -0700 From: Jeremy Chadwick To: Daniel Kalchev Subject: Re: Slow resilvering with mirrored ZIL Message-ID: <20130705145332.GA5449@icarus.home.lan> References: <20130704000405.GA75529@icarus.home.lan> <20130704171637.GA94539@icarus.home.lan> <2A261BEA-4452-4F6A-8EFB-90A54D79CBB9@alumni.chalmers.se> <20130704191203.GA95642@icarus.home.lan> <43015E9015084CA6BAC6978F39D22E8B@multiplay.co.uk> <3CFB4564D8EB4A6A9BCE2AFCC5B6E400@multiplay.co.uk> <51D6A206.2020303@digsys.bg> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <51D6A206.2020303@digsys.bg> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: 
freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 05 Jul 2013 14:53:53 -0000 On Fri, Jul 05, 2013 at 01:37:58PM +0300, Daniel Kalchev wrote: > > On 05.07.13 02:28, Steven Hartland wrote: > > > > > >If anyone wants my current patches which add switch to 4k ashift > >by default > >as a sysctl + works with QUIRKS too, just let me know. > > > >They are well tested, just we want more options before putting in > >the tree. > > Is it not easier to add this as an option to zpool create, instead > of an sysctl? > > That is, I believe we have two scenarios here: > > 1. Having an sysctl that instructs ZFS to look at the FreeBSD quirks > to decide what the ashift should be, instead of only querying the > 'sectorsize' property of the storage. I believe we might not even > need an sysctl here, just make it default to obey the quirks --- but > sysctl for the interim period will not hurt (with the proper > default). I can expand on this one (specifically "relying on sectorsize of the media"): no, this will not work reliably, for two reasons. Hear me out: 1. You're operating under the assumption that every disk/device advertises both logical and physical sector sizes separately. That is far from the case. I know you're aware that all devices advertise a logical size of 512 (even if they are 4K physical) to remain fully compatible with legacy OSes, but what a lot of people don't know is that many disks (I'm speaking about ATA here because I don't do much with SAS/SCSI) don't implement the necessary bits of ATA IDENTIFY CDB result that defines separate logical and physical sector sizes (per T13 ATA-8 Working Draft specification). 
The problem is that these vendors see the Working Draft as "beta/alpha" and therefore don't bother honouring some of the more useful (I would say critical in this case) features of it -- like this one. :-( You will find many disks on the market today -- including SSDs -- that are like this. Some are even big-name brands you would expect better from. If I had to guess, I would say probably 30% of them are this way; it's a substantial number. If you want some examples/proof, or want to see the spec yourself, just let me know and I can give you some/point you to the relevant docs. 2. A common rebuttal is "well that's what quirks are for". Absolutely! But how do you think those quirks are added? When someone tells a committer "hey, there's a new disk out which doesn't implement physical sector size in ATA IDENTIFY, here's a patch". Otherwise it never gets added. And sometimes that addition takes months given people's FreeBSD time vs. real life time. So in effect quirks are *always* outdated. I'm not "damning FreeBSD", I'm just stating that this is the reality of the situation. For example -- only recently in stable/9 (maybe stable/8, didn't look) were quirks added for Intel SSDs which came to market over 2 years ago (BTW thanks for adding those Steve). Even if there was a way to rectify that scenario in an efficient manner (the quirks right now are hard-coded in kernel space), it wouldn't change this scenario: - User buys a disk which advertises logical only/lacks 4K quirks; say the disk was RTM 2 years ago - Installs FreeBSD on it / uses it, lots of data on it now - Notices performance problems or "other anomalies" (this thread is an acceptable example, although there are literally 8 or 9 problems going on with this situation that are all compounded) - User posts on FreeBSD forum or mailing list asking for help, not sure what the problem is (very common) - Response from community/devs is: "you get to repartition/reinstall your entire OS". 
I know *I* sure wouldn't want to be told that... I've pondered some solutions to this dilemma, but really none of them are plausible/realistic; all have too many potential risks in exchange. It sort of reminds me of the gmirror/GPT conflict problem**. > 2. Have an option to zpool create and zpool add, that specifies the > ashift value. Here my thinking is that it should let you specify an > ashift equal or larger than the computed one, which is based on the > largest sector size of all devices in a vdev. I'm very much a supporter of the option being added to one of the ZFS commands. I'm not against Steve's sysctl, but the problem with that is more of a social one: features like this (if committed) never end up being announced to the world in a useful manner, so nobody knows they exist until it's too late. It would also just make me wonder "why bother with the sysctl at all, just use 4096 universally going forward, and have whatever code/bits still support cases where existing setups use 512" (last part sounds easier said than done, not sure). As for the "basing things on sector size" -- see my above explanation for why/how this isn't entirely reliable. Manufacturers, argh! :-) But something like "zpool create -a 12 ..." would be a blessing, because I'd just use that all the time. If changing the default from 9 to 12 isn't plausible, then at least offering what I just described would be a good/worthwhile stepping stone. Though, I guess really it's not much different from the gnop approach, just that you now don't have to use gnop. But you still have to be aware of the flag (ex. -a 12), just like you have to be aware of gnop and that ordeal. I'm trying to think about it from the viewpoint of a user not having to know about/do *any* of that. > Don't know, but always wondered.. how hard is it to change the > ashift value on the fly? Does it impact reads of data already on the > vdev, or does it impact only writes? If only writes, it should be > trivial, really.... 
I've wondered this too, but I don't have enough familiarity with the ZFS innards or filesystems at a low level to be able to speak to it. ** -- Linux md had this same problem (though at the beginning of the device, not the end), and they solved it cleanly with md 1.2 (the version number is stored in the superblock/metadata) where they skip the first 4096 bytes on the disk and store the superblock there: https://raid.wiki.kernel.org/index.php/RAID_superblock_formats Sections 1.3 and 1.5 are most relevant/educational. You can see clearly, though, that they've had to change their approach given the same stuff FreeBSD is dealing with. I looked at the gmirror code late last week to see if this was possible to do, and at first glance it appears to be, but I don't think there would be a clean "upgrade path" -- I'm fairly certain it would require a full gmirror recreation (as in fully start over), because otherwise changes would conflict with existing partition sizes and other whatnots. See Section 1.6 in the above document for this situation on Linux -- just remember that Linux md is a bit of a different beast than gmirror (GEOM is more versatile, md is more rigid/static). I'm sure Pawel has thought about all of this many times over though and that it's more of an issue of time than anything else. -- | Jeremy Chadwick jdc@koitsu.org | | UNIX Systems Administrator http://jdc.koitsu.org/ | | Making life hard for others since 1977. 
PGP 4BD6C0CB | From owner-freebsd-fs@FreeBSD.ORG Fri Jul 5 15:13:50 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id A0878610 for ; Fri, 5 Jul 2013 15:13:50 +0000 (UTC) (envelope-from ler@lerctr.org) Received: from thebighonker.lerctr.org (lrosenman-1-pt.tunnel.tserv8.dal1.ipv6.he.net [IPv6:2001:470:1f0e:3ad::2]) by mx1.freebsd.org (Postfix) with ESMTP id 6ADFD11B8 for ; Fri, 5 Jul 2013 15:13:50 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lerctr.org; s=lerami; h=Message-ID:Subject:Cc:To:From:Date:Content-Transfer-Encoding:Content-Type:MIME-Version; bh=hgKKl7f97nn1upWQjrpDgfYzSqFAVSSRtGuvVFgbAEo=; b=Htbi1LEc8ZoLfiFxkNoXtTLdCx3882u1XEKXwJjjkYoOvIzbTbF+axE4dO5rp55OeyqRhji9RRixjPjGjlIQpRpKaiCoY75oW+LZdo5RK3fdhJ9slQ+C8A2sv7E7JvGHsGiZUkLYlDZMfLVJC2jp1nEKa8LuB9XNrNGgz1orLwM=; Received: from localhost.lerctr.org ([127.0.0.1]:36947 helo=webmail.lerctr.org) by thebighonker.lerctr.org with esmtpa (Exim 4.80.1 (FreeBSD)) (envelope-from ) id 1Uv7hj-0000Rd-Sb; Fri, 05 Jul 2013 10:13:49 -0500 Received: from cpe-72-182-19-162.austin.res.rr.com ([72.182.19.162]) by webmail.lerctr.org with HTTP (HTTP/1.1 POST); Fri, 05 Jul 2013 10:13:47 -0500 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Date: Fri, 05 Jul 2013 10:13:47 -0500 From: Larry Rosenman To: Steven Hartland Subject: Still with the invalid backup stream issue Message-ID: <42806c723d2cd4ceccbeb19c576eaeb7@webmail.lerctr.org> X-Sender: ler@lerctr.org User-Agent: Roundcube Webmail/0.9.2 X-Spam-Score: -3.1 (---) X-LERCTR-Spam-Score: -3.1 (---) X-Spam-Report: SpamScore (-3.1/5.0) ALL_TRUSTED=-1, BAYES_00=-1.9, RP_MATCHES_RCVD=-0.227 X-LERCTR-Spam-Report: SpamScore (-3.1/5.0) ALL_TRUSTED=-1, BAYES_00=-1.9, RP_MATCHES_RCVD=-0.227 Cc: Freebsd fs X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: 
Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 05 Jul 2013 15:13:50 -0000 I'm still seeing this: received 176KB stream in 1 seconds (176KB/sec) receiving incremental stream of vault/var@2013-07-05 into zroot/backups/TBH/var@2013-07-05 cannot receive incremental stream: invalid backup stream This is with Stable/8 as of today going to HEAD as of yesterday. How can I get to the bottom of this? -- Larry Rosenman http://www.lerctr.org/~ler Phone: +1 214-642-9640 (c) E-Mail: ler@lerctr.org US Mail: 430 Valona Loop, Round Rock, TX 78681-3893 From owner-freebsd-fs@FreeBSD.ORG Fri Jul 5 15:24:32 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 8181C97F for ; Fri, 5 Jul 2013 15:24:32 +0000 (UTC) (envelope-from BATV+29e3569f53842bc9067f+3603+infradead.org+hch@bombadil.srs.infradead.org) Received: from bombadil.infradead.org (bombadil.infradead.org [IPv6:2001:1868:205::9]) by mx1.freebsd.org (Postfix) with ESMTP id 730B21216 for ; Fri, 5 Jul 2013 15:24:32 +0000 (UTC) Received: from hch by bombadil.infradead.org with local (Exim 4.80.1 #2 (Red Hat Linux)) id 1Uv7s7-0005ZV-ET; Fri, 05 Jul 2013 15:24:31 +0000 Date: Fri, 5 Jul 2013 08:24:31 -0700 From: Christoph Hellwig To: Zoltan Arnold NAGY Subject: Re: O_DIRECT|O_SYNC semantics? 
Message-ID: <20130705152431.GA21283@infradead.org> References: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) X-SRS-Rewrite: SMTP reverse-path rewritten from by bombadil.infradead.org See http://www.infradead.org/rpr.html Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 05 Jul 2013 15:24:32 -0000 On Wed, Jul 03, 2013 at 09:05:40PM +0200, Zoltan Arnold NAGY wrote: > Hi, > > Could someone have a look here: > http://serverfault.com/questions/520141/please-explain-my-fio-results-is-o-synco-direct-misbehaving-on-linux > > Basically, I'm seeing vastly different results on Linux and on FreeBSD 9.1. > Either FreeBSD's not honoring O_SYNC properly, or Linux does something > wicked. > > I've been at it for a few days, without any real progress. For consumer disks, using O_SYNC on Linux does make a huge difference, because it flushes the disk write cache after completion to make sure the O_SYNC guarantees that data has hit physical storage are met. It seems like FreeBSD might be missing that call. 
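[Editor's note: the behavioral difference described above can be sketched from the shell with GNU dd, which exposes O_SYNC via oflag=sync (assumption: GNU coreutils dd; BSD dd has no oflag). On a consumer disk on Linux, the second command should be dramatically slower if the kernel really is issuing a write-cache flush for each synchronous write:]

```shell
# Buffered write: 16 MB in 4 KB blocks through the page cache (fast everywhere).
dd if=/dev/zero of=/tmp/osync_test.dat bs=4096 count=4096 2>/dev/null

# Same write with O_SYNC set on the output file (GNU dd's oflag=sync).
# On Linux this makes each write wait for stable storage, which for
# disks with a volatile write cache includes a cache flush per write.
dd if=/dev/zero of=/tmp/osync_test.dat bs=4096 count=4096 oflag=sync 2>/dev/null

rm -f /tmp/osync_test.dat
```

[Timing both variants (e.g. under time(1)) on Linux and FreeBSD against the same disk would show whether FreeBSD's O_SYNC path is skipping the flush, which is the suspicion raised here.]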
From owner-freebsd-fs@FreeBSD.ORG Fri Jul 5 15:47:22 2013 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id A20DF308; Fri, 5 Jul 2013 15:47:22 +0000 (UTC) (envelope-from gnn@freebsd.org) Received: from vps.hungerhost.com (vps.hungerhost.com [216.38.53.176]) by mx1.freebsd.org (Postfix) with ESMTP id 79141134A; Fri, 5 Jul 2013 15:47:22 +0000 (UTC) Received: from [209.249.190.124] (port=49907 helo=gnnmac.hudson-trading.com) by vps.hungerhost.com with esmtpsa (TLSv1:AES128-SHA:128) (Exim 4.80.1) (envelope-from ) id 1Uv8ED-0002HI-2S; Fri, 05 Jul 2013 11:47:21 -0400 Content-Type: text/plain; charset=us-ascii Mime-Version: 1.0 (Mac OS X Mail 6.5 \(1508\)) Subject: Re: New fusefs implementation not usable with multiple fusefs mounts From: George Neville-Neil In-Reply-To: Date: Fri, 5 Jul 2013 11:47:48 -0400 Content-Transfer-Encoding: quoted-printable Message-Id: <4EF944D8-0D08-433D-BFD7-917631E3264E@freebsd.org> References: To: Kevin Oberman X-Mailer: Apple Mail (2.1508) X-AntiAbuse: This header was added to track abuse, please include it with any abuse report X-AntiAbuse: Primary Hostname - vps.hungerhost.com X-AntiAbuse: Original Domain - freebsd.org X-AntiAbuse: Originator/Caller UID/GID - [47 12] / [47 12] X-AntiAbuse: Sender Address Domain - freebsd.org X-Get-Message-Sender-Via: vps.hungerhost.com: authenticated_id: gnn@neville-neil.com Cc: Attilio Rao , fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 05 Jul 2013 15:47:22 -0000 On Jul 2, 2013, at 18:46 , Kevin Oberman wrote: > On Tue, Jul 2, 2013 at 3:20 PM, Attilio Rao wrote: > On Wed, Jul 3, 2013 at 12:07 AM, Kevin Oberman wrote: > > I have been using the new fusefs for a while and have had to back it out and > > go back to the 
old kernel module. I keep getting corrupted NTFS file systems > > and I think I understand why. > > > > I mount two NTFS systems: > > /dev/fuse 184319948 110625056 73694892 60% /media/Media > > /dev/fuse 110636028 104943584 5692444 95% /media/Windows7_OS > > > > Note that both systems are mounted on /dev/fuse and I am assured that this > > is by design. Both work fine for reads and seem to work for writes. Then I > > unmount either of them. Both are unmounted, at least as far as the OS is > > concerned. There is no way to unmount one and leave the other mounted. It > > appears that any attempt to unmount either system does a proper unmount of > > /media/Media, but, while marking /media/Windows7_OS as unmounted, actually > > does not do so. The device ends up corrupt and the only way I have been able > > to clean it is to boot Windows and have a disk check run. Media never seems > > to get corrupted. > > > > Any further information I might gather before filing a PR? I am running on > > 9.1 stable, but have had the problem since the patch set first became > > available on 9.0-stable. > > I do not understand, the new fusefs implementation was never committed to > the stable branch to my knowledge. > Did you backport manually? > > BTW I cc'ed George, who should maintain the module. > > Attilio > > Attilio, > > Actually, you provided the patches for 9-Stable way back when you first did them and we had an exchange on current@ about their use on 9-stable and their operation including the mounts all being on /dev/fuse. I also edited the mount_fuse man pages to clarify the awkward wording of the original (which you didn't write). > > They still apply pretty cleanly and I continued using them until about 3 weeks ago when I removed them to test whether they were responsible for the issues I was seeing. 
Since I got corruption most every time I unmounted the file systems after having written to the Windows one, I am now pretty sure that it does not happen when I use the old kernel module. > > The analysis of the problem is purely speculation, but fits the behavior. If it is correct, I would expect the same issues to occur with head. > > Thanks for copying George. I didn't realize that he had taken over the code. I won't bug you about it again. Actually I too have no time for this code as other things have come up. It's time to find someone who really needs this on a long term basis. Best, George From owner-freebsd-fs@FreeBSD.ORG Fri Jul 5 15:52:15 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id DB1A7556 for ; Fri, 5 Jul 2013 15:52:15 +0000 (UTC) (envelope-from prvs=1898728ac9=killing@multiplay.co.uk) Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) by mx1.freebsd.org (Postfix) with ESMTP id 697BF137E for ; Fri, 5 Jul 2013 15:52:15 +0000 (UTC) Received: from r2d2 ([82.69.141.170]) by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (MDaemon PRO v10.0.4) with ESMTP id md50004742804.msg for ; Fri, 05 Jul 2013 16:52:13 +0100 X-Spam-Processed: mail1.multiplay.co.uk, Fri, 05 Jul 2013 16:52:13 +0100 (not processed: message from valid local sender) X-MDDKIM-Result: neutral (mail1.multiplay.co.uk) X-MDRemoteIP: 82.69.141.170 X-Return-Path: prvs=1898728ac9=killing@multiplay.co.uk X-Envelope-From: killing@multiplay.co.uk X-MDaemon-Deliver-To: freebsd-fs@freebsd.org Message-ID: From: "Steven Hartland" To: "Jeremy Chadwick" , "Daniel Kalchev" References: <20130704000405.GA75529@icarus.home.lan> <20130704171637.GA94539@icarus.home.lan> <2A261BEA-4452-4F6A-8EFB-90A54D79CBB9@alumni.chalmers.se> <20130704191203.GA95642@icarus.home.lan> <43015E9015084CA6BAC6978F39D22E8B@multiplay.co.uk> 
<3CFB4564D8EB4A6A9BCE2AFCC5B6E400@multiplay.co.uk> <51D6A206.2020303@digsys.bg> <20130705145332.GA5449@icarus.home.lan> Subject: Re: Slow resilvering with mirrored ZIL Date: Fri, 5 Jul 2013 16:52:26 +0100 MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=original Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.5931 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6157 Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 05 Jul 2013 15:52:15 -0000 ----- Original Message ----- From: "Jeremy Chadwick" > I'm not "damning FreeBSD", I'm just stating that this is the reality of > the situation. For example -- only recently in stable/9 (maybe > stable/8, didn't look) were quirks added for Intel SSDs which came to > market over 2 years ago (BTW thanks for adding those Steve). Just to confirm both stable/8 and stable/9 are in sync with head for all the SSD quirks. > Even if there was a way to rectify that scenario in an efficient manner > (the quirks right now are hard-coded in kernel space), it wouldn't > change this scenario: I've thought about that too, it would be really nice not to have to recompile to add a QUIRK. I also know scottl has some ideas on how to improve CAM which may well enable us to config these things much easier. >> 2. Have an option to zpool create and zpool add, that specifies the >> ashift value. Here my thinking is that it should let you specify an >> ashift equal or larger than the computed one, which is based on the >> largest sector size of all devices in a vdev. > > I'm very much a supporter of the option being added to one of the ZFS > commands. 
I'm not against Steve's sysctl, but the problem with that is > more of a social one: features like this (if committed) never end up > being announced to the world in a useful manner, so nobody knows they > exist until it's too late. It would also just make me wonder "why > bother with the sysctl at all, just use 4096 universally going forward, > and have whatever code/bits still support cases where existing setups > use 512" (last part sounds easier than probably done, not sure). The primary reason for not having it hard coded is that while it's good for 99% of use cases there's still that extra 1%. Which is why I think that, from a FreeBSD perspective, having the option to configure the default, but still having the standard as 4k, is where my mind is. > As for the "basing things on sector size" -- see my above explanation > for why/how this isn't entirely reliable. Manufacturers, argh! :-) > > But something like "zpool create -a 12 ..." would be a blessing, because > I'd just use that all the time. If changing the default from 9 to 12 > isn't plausible, then at least offering what I just described would be a > good/worthwhile stepping stone. Indeed. Regards Steve ================================================ This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmaster@multiplay.co.uk. 
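[Editor's note: for readers following along, the gnop approach referred to in this thread, the current stand-in for a hypothetical "zpool create -a 12" option, looks roughly like this. A sketch only: the device name ada0 and pool name tank are examples, and the commands need root on FreeBSD.]

```shell
# Insert a transparent GEOM nop provider that reports a 4K sector size.
gnop create -S 4096 /dev/ada0

# Create the pool on the .nop device; ZFS sees sectorsize=4096 and
# computes ashift=12 for the vdev.
zpool create tank /dev/ada0.nop

# ashift is recorded in the vdev label at creation time, so the nop
# layer can be dropped afterwards.
zpool export tank
gnop destroy /dev/ada0.nop
zpool import tank

# Verify the result: zdb prints the cached pool config, including ashift.
zdb -C tank | grep ashift
```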
From owner-freebsd-fs@FreeBSD.ORG Fri Jul 5 18:11:17 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 64EA5884 for ; Fri, 5 Jul 2013 18:11:17 +0000 (UTC) (envelope-from osharoiko@gmail.com) Received: from mail-ve0-x22b.google.com (mail-ve0-x22b.google.com [IPv6:2607:f8b0:400c:c01::22b]) by mx1.freebsd.org (Postfix) with ESMTP id D41271A6F for ; Fri, 5 Jul 2013 18:11:16 +0000 (UTC) Received: by mail-ve0-f171.google.com with SMTP id b10so2005106vea.30 for ; Fri, 05 Jul 2013 11:11:16 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=lAzfX+OmJryCLfkUQ8K9DpYpbECWKfihaJxhwbfDtas=; b=M/cNRqGYK/JF58qX6lmUjjIrXufDJEUOmVckQJu2d4BDkSnC3CEMMdrPf4Z4d7nplP ZpuDv02g0U6Rm/NF5NxE8OI+PgFX4NDX4glHcjWzxVwu+ZShkfdqugxpL6QO1xhNVZRg lBMNqEwpOogKjndcS8SeHpdWBfL1Xnuqlvcg2uivsmqh9Pcu6BaNPMeSurcE8u0TPLAi Ny+HIhHR3Nrm/8i93y/Q69DLdkHI7eCf4S9vnBm1FENO/1Ihy2EVro4gbBFEOZgQvvdL s+6LEFW+O9WbUmmVOIfWyhXmSxp+h0Q0gAoFduiPDB2+rIpFdwHmSjBA7HAvPuJQvceD 9hBg== MIME-Version: 1.0 X-Received: by 10.58.89.147 with SMTP id bo19mr3939368veb.18.1373047876405; Fri, 05 Jul 2013 11:11:16 -0700 (PDT) Received: by 10.58.28.238 with HTTP; Fri, 5 Jul 2013 11:11:16 -0700 (PDT) In-Reply-To: <1821939739.2131305.1372984627528.JavaMail.root@uoguelph.ca> References: <1821939739.2131305.1372984627528.JavaMail.root@uoguelph.ca> Date: Fri, 5 Jul 2013 19:11:16 +0100 Message-ID: Subject: Re: NFSv4 and Kerberos, group permission seem to be ignored From: Oleg Sharoyko To: Rick Macklem Content-Type: text/plain; charset=UTF-8 Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 05 Jul 2013 18:11:17 -0000 Hi Rick, Thank you 
very much for pointing me to the code that does principal to uid/gids translation. Turns out my problems are caused by a buffer being too small, which you have already fixed in r250176. Thank you once again! Regards, Oleg On 5 July 2013 01:37, Rick Macklem wrote: > Oleg Sharoyko wrote: >> Hello, >> >> I have a small server which runs FreeBSD 9.1 and it is set up as an >> NFSv4 server with kerberised NFS access. My clients are Linux >> machines. It almost works as expected (mounting/accessing files) >> except for one strange issue: it looks like group permissions on >> files >> and directories are being ignored. Here's an example: >> >> Server: >> >> evendim:~ % id >> uid=1001(ols) gid=1001(ols) groups=1001(ols),0(wheel),60000(family) >> evendim:~ % ls -l /data/file1 >> -rw-rw---- 1 root family 6 4 Jul 18:42 /data/file1 >> evendim:~ % cat /data/file1 >> test1 >> evendim:~ % ls -l /data/file2 >> -rw------- 1 ols family 6 4 Jul 18:42 /data/file2 >> evendim:~ % cat /data/file2 >> test2 >> evendim:~ % ls -l /data/file3 >> -rw-r--r-- 1 root family 6 4 Jul 18:42 /data/file3 >> evendim:~ % cat /data/file3 >> test3 >> evendim:~ % cat /etc/exports >> V4:/ -sec=krb5 >> /data -sec=krb5 >> > Well, here's the code snippet in gssd.c that does the principal/user > name to gid list translation. All I can suggest is putting this in a > little test program on the server and then running it as "root", to > see if it generates the results you would expect. > (Btw, NGRPS is defined as 16 in a .h file in /usr/include/rpc. I suspect > that should be increased and the code should check for a -1 return > from getgrouplist(). However, you don't seem to exceed 16 groups. > It also assumes that sizeof(gid_t) == sizeof(int). > A little weird and I'm not sure that is true for all arches? 
> Did I remember to mention I wasn't the author of this?;-) > > if (pw) { > int len = NGRPS; > int groups[NGRPS]; > result->gid = pw->pw_gid; > getgrouplist(pw->pw_name, pw->pw_gid, > groups, &len); > result->gidlist.gidlist_len = len; > result->gidlist.gidlist_val = > mem_alloc(len * sizeof(int)); > memcpy(result->gidlist.gidlist_val, groups, > len * sizeof(int)); > gssd_verbose_out("gssd_pname_to_uid: mapped" > " to uid=%d, gid=%d\n", (int)result->uid, > (int)result->gid); > } else { > >> Client: >> >> sherlock:~ % id >> uid=1000(ols) gid=1000(ols) >> groups=1000(ols),4(adm),20(dialout),24(cdrom),25(floppy),29(audio),30(dip),44(video),46(plugdev),109(netdev),110(bluetooth),113(fuse),116(scanner),118(kismet),60000(family) >> sherlock:~ % sudo mount -v -t nfs4 -o sec=krb5 >> evendim.sharoyko.net:/data /mnt >> mount.nfs4: timeout set for Thu Jul 4 19:52:16 2013 >> mount.nfs4: trying text-based options >> 'sec=krb5,addr=192.168.1.3,clientaddr=192.168.1.128' >> sherlock:~ % ls -l /mnt/file1 >> -rw-rw---- 1 root family 6 Jul 4 19:42 /mnt/file1 >> sherlock:~ % cat /mnt/file1 >> cat: /mnt/file1: Permission denied >> sherlock:~ % ls -l /mnt/file2 >> -rw------- 1 ols family 6 Jul 4 19:42 /mnt/file2 >> sherlock:~ % cat /mnt/file2 >> test2 >> sherlock:~ % ls -l /mnt/file3 >> -rw-r--r-- 1 root family 6 Jul 4 19:42 /mnt/file3 >> sherlock:~ % cat /mnt/file3 >> test3 >> >> As you can see file2 is inaccessible while it has group read/write >> permissions, user ols belongs to group family on both client and >> server and user/group mapping seems to work. /data on the server is a >> ZFS filesystem but I've also tried UFS with the same results. I've >> also tried ACLs and ACLs for users do work while ACLs for groups >> don't >> seem to have any effect. Is there something that I'm doing wrong? Is >> this an expected behaviour? I will greatly appreciate if you can help >> me debugging this issue. I'll quote below captured packets that are >> relevant to my attempt to access file1. 
As you can see access is >> clearly denied by server but I don't understand why. >> >> No. Time Source Destination >> Protocol Length Info >> 109 5.649608 192.168.1.128 192.168.1.3 NFS >> 258 V4 Call (Reply In 110) LOOKUP DH:0x4dcc3776/file1 >> >> Frame 109: 258 bytes on wire (2064 bits), 258 bytes captured (2064 >> bits) >> Ethernet II, Src: GemtekTe_f6:cf:a1 (00:26:82:f6:cf:a1), Dst: >> Giga-Byt_db:cd:c4 (90:2b:34:db:cd:c4) >> Internet Protocol Version 4, Src: 192.168.1.128 (192.168.1.128), Dst: >> 192.168.1.3 (192.168.1.3) >> Transmission Control Protocol, Src Port: 726 (726), Dst Port: nfs >> (2049), Seq: 3337, Ack: 3193, Len: 192 >> Remote Procedure Call, Type:Call XID:0xba073c52 >> Network File System >> [Program Version: 4] >> [V4 Procedure: COMPOUND (1)] >> Tag: >> length: 0 >> contents: >> minorversion: 0 >> Operations (count: 4) >> Opcode: PUTFH (22) >> filehandle >> length: 28 >> [hash (CRC-32): 0x4dcc3776] >> decode type as: unknown >> filehandle: >> 9a7470c6deedeca50a0004000000000037d80a0000000000... >> Opcode: LOOKUP (15) >> Filename: file1 >> length: 5 >> contents: file1 >> fill bytes: opaque data >> Opcode: GETFH (10) >> Opcode: GETATTR (9) >> GETATTR4args >> attr_request >> bitmap[0] = 0x0010011a >> [5 attributes requested] >> mand_attr: FATTR4_TYPE (1) >> mand_attr: FATTR4_CHANGE (3) >> mand_attr: FATTR4_SIZE (4) >> mand_attr: FATTR4_FSID (8) >> recc_attr: FATTR4_FILEID (20) >> bitmap[1] = 0x0030a23a >> [9 attributes requested] >> recc_attr: FATTR4_MODE (33) >> recc_attr: FATTR4_NUMLINKS (35) >> recc_attr: FATTR4_OWNER (36) >> recc_attr: FATTR4_OWNER_GROUP (37) >> recc_attr: FATTR4_RAWDEV (41) >> recc_attr: FATTR4_SPACE_USED (45) >> recc_attr: FATTR4_TIME_ACCESS (47) >> recc_attr: FATTR4_TIME_METADATA (52) >> recc_attr: FATTR4_TIME_MODIFY (53) >> [Main Opcode: LOOKUP (15)] >> >> No. 
Time Source Destination >> Protocol Length Info >> 110 5.649870 192.168.1.3 192.168.1.128 NFS >> 370 V4 Reply (Call In 109) LOOKUP >> >> Frame 110: 370 bytes on wire (2960 bits), 370 bytes captured (2960 >> bits) >> Ethernet II, Src: Giga-Byt_db:cd:c4 (90:2b:34:db:cd:c4), Dst: >> GemtekTe_f6:cf:a1 (00:26:82:f6:cf:a1) >> Internet Protocol Version 4, Src: 192.168.1.3 (192.168.1.3), Dst: >> 192.168.1.128 (192.168.1.128) >> Transmission Control Protocol, Src Port: nfs (2049), Dst Port: 726 >> (726), Seq: 3193, Ack: 3529, Len: 304 >> Remote Procedure Call, Type:Reply XID:0xba073c52 >> Network File System >> [Program Version: 4] >> [V4 Procedure: COMPOUND (1)] >> Status: NFS4_OK (0) >> Tag: >> length: 0 >> contents: >> Operations (count: 4) >> Opcode: PUTFH (22) >> Status: NFS4_OK (0) >> Opcode: LOOKUP (15) >> Status: NFS4_OK (0) >> Opcode: GETFH (10) >> Status: NFS4_OK (0) >> Filehandle >> length: 28 >> [hash (CRC-32): 0xc0a4eeb4] >> decode type as: unknown >> filehandle: >> 9a7470c6deedeca50a00ed00000000001bb70d0000000000... 
>> Opcode: GETATTR (9)
>> Status: NFS4_OK (0)
>> GETATTR4res
>> resok4
>> obj_attributes
>> attrmask
>> bitmap[0] = 0x0010011a
>> [5 attributes requested]
>> mand_attr: FATTR4_TYPE (1)
>> mand_attr: FATTR4_CHANGE (3)
>> mand_attr: FATTR4_SIZE (4)
>> mand_attr: FATTR4_FSID (8)
>> recc_attr: FATTR4_FILEID (20)
>> bitmap[1] = 0x0030a23a
>> [9 attributes requested]
>> recc_attr: FATTR4_MODE (33)
>> recc_attr: FATTR4_NUMLINKS (35)
>> recc_attr: FATTR4_OWNER (36)
>> recc_attr: FATTR4_OWNER_GROUP (37)
>> recc_attr: FATTR4_RAWDEV (41)
>> recc_attr: FATTR4_SPACE_USED (45)
>> recc_attr: FATTR4_TIME_ACCESS (47)
>> recc_attr: FATTR4_TIME_METADATA (52)
>> recc_attr: FATTR4_TIME_MODIFY (53)
>> attr_vals
>> mand_attr: FATTR4_TYPE (1)
>> nfs_ftype4: NF4REG (1)
>> mand_attr: FATTR4_CHANGE (3)
>> changeid: 96
>> mand_attr: FATTR4_SIZE (4)
>> size: 6
>> mand_attr: FATTR4_FSID (8)
>> fattr4_fsid
>> fsid4.major: 3329258650
>> fsid4.minor: 2783768030
>> recc_attr: FATTR4_FILEID (20)
>> fileid: 237
>> recc_attr: FATTR4_MODE (33)
>> fattr4_mode: 0660
>> 000. .... .... .... = Unknown
>> .... 0... .... .... = not SUID
>> .... .0.. .... .... = not SGID
>> .... ..0. .... .... = not save swapped text
>> .... ...1 .... .... = Read permission for owner
>> .... .... 1... .... = Write permission for owner
>> .... .... .0.. .... = no Execute permission for owner
>> .... .... ..1. .... = Read permission for group
>> .... .... ...1 .... = Write permission for group
>> .... .... .... 0... = no Execute permission for group
>> .... .... .... .0.. = no Read permission for others
>> .... .... .... ..0. = no Write permission for others
>> .... .... .... ...0 = no Execute permission for others
>> recc_attr: FATTR4_NUMLINKS (35)
>> numlinks: 1
>> recc_attr: FATTR4_OWNER (36)
>> fattr4_owner: root@id.sharoyko.net
>> length: 20
>> contents: root@id.sharoyko.net
>> recc_attr: FATTR4_OWNER_GROUP (37)
>> fattr4_owner_group: family@id.sharoyko.net
>> length: 22
>> contents: family@id.sharoyko.net
>> fill bytes: opaque data
>> recc_attr: FATTR4_RAWDEV (41)
>> specdata1: 128
>> specdata2: 123863040
>> recc_attr: FATTR4_SPACE_USED (45)
>> space_used: 1024
>> recc_attr: FATTR4_TIME_ACCESS (47)
>> seconds: 1372963326
>> nseconds: 263434280
>> recc_attr: FATTR4_TIME_METADATA (52)
>> seconds: 1372963379
>> nseconds: 804435894
>> recc_attr: FATTR4_TIME_MODIFY (53)
>> seconds: 1372963326
>> nseconds: 264422029
>> [Main Opcode: LOOKUP (15)]
>>
>> No.   Time       Source          Destination     Protocol  Length  Info
>> 117   8.456684   192.168.1.128   192.168.1.3     NFS       322     V4 Call (Reply In 118) OPEN DH:0x4dcc3776/file1
>>
>> Frame 117: 322 bytes on wire (2576 bits), 322 bytes captured (2576 bits)
>> Ethernet II, Src: GemtekTe_f6:cf:a1 (00:26:82:f6:cf:a1), Dst: Giga-Byt_db:cd:c4 (90:2b:34:db:cd:c4)
>> Internet Protocol Version 4, Src: 192.168.1.128 (192.168.1.128), Dst: 192.168.1.3 (192.168.1.3)
>> Transmission Control Protocol, Src Port: 726 (726), Dst Port: nfs (2049), Seq: 3905, Ack: 3697, Len: 256
>> Remote Procedure Call, Type:Call XID:0xbd073c52
>> Network File System
>> [Program Version: 4]
>> [V4 Procedure: COMPOUND (1)]
>> Tag:
>> length: 0
>> contents:
>> minorversion: 0
>> Operations (count: 5)
>> Opcode: PUTFH (22)
>> filehandle
>> length: 28
>> [hash (CRC-32): 0x4dcc3776]
>> decode type as: unknown
>> filehandle: 9a7470c6deedeca50a0004000000000037d80a0000000000...
>> Opcode: OPEN (18)
>> seqid: 0x00000000
>> share_access: OPEN4_SHARE_ACCESS_READ (1)
>> share_deny: OPEN4_SHARE_DENY_NONE (0)
>> clientid: 0xcd6cc75124000000
>> owner:
>> length: 24
>> contents:
>> Open Type: OPEN4_NOCREATE (0)
>> Claim Type: CLAIM_NULL (0)
>> Filename: file1
>> length: 5
>> contents: file1
>> fill bytes: opaque data
>> Opcode: GETFH (10)
>> Opcode: ACCESS (3), [Check: RD MD XT XE]
>> Check access: 0x2d
>> .... ...1 = 0x01 READ: allowed?
>> .... .1.. = 0x04 MODIFY: allowed?
>> .... 1... = 0x08 EXTEND: allowed?
>> ..1. .... = 0x20 EXECUTE: allowed?
>> Opcode: GETATTR (9)
>> GETATTR4args
>> attr_request
>> bitmap[0] = 0x0010011a
>> [5 attributes requested]
>> mand_attr: FATTR4_TYPE (1)
>> mand_attr: FATTR4_CHANGE (3)
>> mand_attr: FATTR4_SIZE (4)
>> mand_attr: FATTR4_FSID (8)
>> recc_attr: FATTR4_FILEID (20)
>> bitmap[1] = 0x0030a23a
>> [9 attributes requested]
>> recc_attr: FATTR4_MODE (33)
>> recc_attr: FATTR4_NUMLINKS (35)
>> recc_attr: FATTR4_OWNER (36)
>> recc_attr: FATTR4_OWNER_GROUP (37)
>> recc_attr: FATTR4_RAWDEV (41)
>> recc_attr: FATTR4_SPACE_USED (45)
>> recc_attr: FATTR4_TIME_ACCESS (47)
>> recc_attr: FATTR4_TIME_METADATA (52)
>> recc_attr: FATTR4_TIME_MODIFY (53)
>> [Main Opcode: OPEN (18)]
>>
>> No.   Time       Source          Destination     Protocol  Length  Info
>> 118   8.456811   192.168.1.3     192.168.1.128   NFS       150     V4 Reply (Call In 117) OPEN Status: NFS4ERR_ACCES
>>
>> Frame 118: 150 bytes on wire (1200 bits), 150 bytes captured (1200 bits)
>> Ethernet II, Src: Giga-Byt_db:cd:c4 (90:2b:34:db:cd:c4), Dst: GemtekTe_f6:cf:a1 (00:26:82:f6:cf:a1)
>> Internet Protocol Version 4, Src: 192.168.1.3 (192.168.1.3), Dst: 192.168.1.128 (192.168.1.128)
>> Transmission Control Protocol, Src Port: nfs (2049), Dst Port: 726 (726), Seq: 3697, Ack: 4161, Len: 84
>> Remote Procedure Call, Type:Reply XID:0xbd073c52
>> Network File System
>> [Program Version: 4]
>> [V4 Procedure: COMPOUND (1)]
>> Status: NFS4ERR_ACCES (13)
>> Tag:
>> length: 0
>> contents:
>> Operations (count: 2)
>> Opcode: PUTFH (22)
>> Status: NFS4_OK (0)
>> Opcode: OPEN (18)
>> Status: NFS4ERR_ACCES (13)
>> [Main Opcode: OPEN (18)]
>>
>> Kind regards,
>> --
>> Oleg
>> _______________________________________________
>> freebsd-fs@freebsd.org mailing list
>> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"

--
Oleg

From owner-freebsd-fs@FreeBSD.ORG Fri Jul 5 19:38:25 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 5EDD43B5 for ; Fri, 5 Jul 2013 19:38:25 +0000 (UTC) (envelope-from mxb@alumni.chalmers.se) Received: from mail-la0-f47.google.com (mail-la0-f47.google.com [209.85.215.47]) by mx1.freebsd.org (Postfix) with ESMTP id D497B1F54 for ; Fri, 5 Jul 2013 19:38:23 +0000 (UTC) Received: by mail-la0-f47.google.com with SMTP id fe20so2299936lab.6 for ; Fri, 05 Jul 2013 12:38:17 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=subject:mime-version:content-type:from:x-priority:in-reply-to:date :cc:content-transfer-encoding:message-id:references:to:x-mailer
:x-gm-message-state; bh=G/la65njL0hOn+VoHkr99xy6E6D+RTz/Znh9pUsmuQ8=; b=agii92WS8uDI/agw9nQeQn81X5yjubZ3ECfubnzj/jmcJSQa0rEVtRmJlkpLyb2iDd XlYQ3Dp5seDnIQGkhR/HQlb8WjXDJhDCxmtvF93G2yrqnS9IgIKhNz9WPU1i17UbuAzv 4WbYRgXXYSSqBbO9pkzqTAp1hiv4JWv7cqDMc+cOqQkVTgXH+rOGXhO/7Zz7Ns2b/4Ga hYPn4zDmKqdNGVH1BGnSlAKf3Erris+8XBmqBHHEbwwf407mlQv1MKYANyA45ZGmBK9r QwoMdpv1O8CLoBIPpDjQZpXgGZnx9aCbCYp+7o+c1guTsZkqvS9mc4HwPLGI5HBlhE+x 1B5A== X-Received: by 10.112.180.164 with SMTP id dp4mr6206009lbc.68.1373053097025; Fri, 05 Jul 2013 12:38:17 -0700 (PDT) Received: from grey.home.unixconn.com (h-74-23.a183.priv.bahnhof.se. [46.59.74.23]) by mx.google.com with ESMTPSA id b8sm3391190lah.0.2013.07.05.12.38.14 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Fri, 05 Jul 2013 12:38:15 -0700 (PDT) Subject: Re: Slow resilvering with mirrored ZIL Mime-Version: 1.0 (Mac OS X Mail 6.5 \(1508\)) Content-Type: text/plain; charset=us-ascii From: mxb X-Priority: 3 In-Reply-To: Date: Fri, 5 Jul 2013 21:38:13 +0200 Content-Transfer-Encoding: quoted-printable Message-Id: <9052B6E6-F742-4C10-87B5-2EFE03FDB31E@alumni.chalmers.se> References: <20130704000405.GA75529@icarus.home.lan> <20130704171637.GA94539@icarus.home.lan> <2A261BEA-4452-4F6A-8EFB-90A54D79CBB9@alumni.chalmers.se> <20130704191203.GA95642@icarus.home.lan> <43015E9015084CA6BAC6978F39D22E8B@multiplay.co.uk> <3CFB4564D8EB4A6A9BCE2AFCC5B6E400@multiplay.co.uk> <51D6A206.2020303@digsys.bg> <20130705145332.GA5449@icarus.home.lan> To: Steven Hartland X-Mailer: Apple Mail (2.1508) X-Gm-Message-State: ALoCoQkYFrlXrzTsbwYa6hYzQf7eSvGATG7yv4KTFc0l510hgLw5xe5WWbK9wiYKwMNeBVJiQyzw Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 05 Jul 2013 19:38:25 -0000 Thanks everyone for a very good info provided in this discussion! 
The question is whether I should wait for the resilvering to finish? It runs at 97B/s now. Do I have any other options in this situation? Put back the old disk? I really don't want to lose all data on this pool.

//mxb

On 5 jul 2013, at 17:52, Steven Hartland wrote:

> ----- Original Message ----- From: "Jeremy Chadwick"
>
>> I'm not "damning FreeBSD", I'm just stating that this is the reality of
>> the situation. For example -- only recently in stable/9 (maybe
>> stable/8, didn't look) were quirks added for Intel SSDs which came to
>> market over 2 years ago (BTW thanks for adding those Steve).
>
> Just to confirm, both stable/8 and stable/9 are in sync with head for all
> the SSD quirks.
>
>> Even if there was a way to rectify that scenario in an efficient manner
>> (the quirks right now are hard-coded in kernel space), it wouldn't
>> change this scenario:
>
> I've thought about that too; it would be really nice not to have to recompile
> to add a QUIRK. I also know scottl has some ideas on how to improve CAM which
> may well enable us to configure these things much more easily.
>
>>> 2. Have an option to zpool create and zpool add that specifies the
>>> ashift value. Here my thinking is that it should let you specify an
>>> ashift equal to or larger than the computed one, which is based on the
>>> largest sector size of all devices in a vdev.
>
>> I'm very much a supporter of the option being added to one of the ZFS
>> commands. I'm not against Steve's sysctl, but the problem with that is
>> more of a social one: features like this (if committed) never end up
>> being announced to the world in a useful manner, so nobody knows they
>> exist until it's too late. It would also just make me wonder "why
>> bother with the sysctl at all, just use 4096 universally going forward,
>> and have whatever code/bits still support cases where existing setups
>> use 512" (last part sounds easier than probably done, not sure).
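A quick aside on the "default from 9 to 12" back-and-forth above: ashift is ZFS's per-vdev allocation shift, the base-2 logarithm of the smallest block ZFS will write to that vdev, so 512-byte sectors give ashift 9 and 4096-byte ("4K" Advanced Format) sectors give ashift 12. A minimal POSIX sh sketch of that relation (the log2 helper is mine, not a ZFS tool):

```shell
#!/bin/sh
# ashift is log2 of the vdev allocation size:
# 2^9 = 512 (the old default), 2^12 = 4096 (4K drives).
log2() {
    n=$1 a=0
    while [ "$n" -gt 1 ]; do
        n=$((n / 2))
        a=$((a + 1))
    done
    echo "$a"
}
echo "sector 512  -> ashift $(log2 512)"    # prints: sector 512  -> ashift 9
echo "sector 4096 -> ashift $(log2 4096)"   # prints: sector 4096 -> ashift 12
```

This is why "zpool create -a 12" in the quoted text would mean "force 4096-byte allocations regardless of what the disk reports".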
>
> The primary reason for not having it hard coded is that while it's good for
> 99% of use cases, there's still that extra 1%. Which is why I think,
> from a FreeBSD perspective, having the option to configure the default,
> but still having the standard as 4k, is where my mind is.
>
>> As for the "basing things on sector size" -- see my above explanation
>> for why/how this isn't entirely reliable. Manufacturers, argh! :-)
>> But something like "zpool create -a 12 ..." would be a blessing, because
>> I'd just use that all the time. If changing the default from 9 to 12
>> isn't plausible, then at least offering what I just described would be a
>> good/worthwhile stepping stone.
>
> Indeed.
>
> Regards
> Steve
>
> ================================================
> This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it.
> In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337
> or return the E.mail to postmaster@multiplay.co.uk.
>=20 > _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" From owner-freebsd-fs@FreeBSD.ORG Sat Jul 6 01:08:12 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 3808DF0B for ; Sat, 6 Jul 2013 01:08:12 +0000 (UTC) (envelope-from fjwcash@gmail.com) Received: from mail-qc0-x236.google.com (mail-qc0-x236.google.com [IPv6:2607:f8b0:400d:c01::236]) by mx1.freebsd.org (Postfix) with ESMTP id ED11F1E39 for ; Sat, 6 Jul 2013 01:08:11 +0000 (UTC) Received: by mail-qc0-f182.google.com with SMTP id e10so1485213qcy.13 for ; Fri, 05 Jul 2013 18:08:11 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=QcNaypJtOShYEaFOLw4IYy08Jzo8xM0wlw3Yzi5iM9I=; b=hiSpNK/WpKrI5LnfdydaZZT5DjizjvFUBx7KHEmcjAheN3hRQiqYjMeTsiKJkZ9zgV Te+ZioD/PQasUManthSCvw5GvWMvkDO+e3q5qR9NOTs28G+g3jIbBArPiZrsiq6QBqju 2fUMEA9L/cruv4CGRN40bP4u2ezq48pp9KiIV9oeMf4PDz1LReO+eaxlmK+XlU61EXnt 0ucL8tarvTBZ+HgHwRjuD3dIsephYn4BPwlJWdcl33oH3LWFPehdQDAzko/cQQQ5MqtS /rXR3wUBBX16nOAEmLCo/OD3YSHPcIEdY1grOv87F/Y0CVhq5oaM/5DC9PTI4puNNtc2 ya5Q== MIME-Version: 1.0 X-Received: by 10.224.151.137 with SMTP id c9mr9919181qaw.107.1373072891481; Fri, 05 Jul 2013 18:08:11 -0700 (PDT) Received: by 10.49.49.135 with HTTP; Fri, 5 Jul 2013 18:08:11 -0700 (PDT) Received: by 10.49.49.135 with HTTP; Fri, 5 Jul 2013 18:08:11 -0700 (PDT) In-Reply-To: <51D6A206.2020303@digsys.bg> References: <51D42107.1050107@digsys.bg> <2EF46A8C-6908-4160-BF99-EC610B3EA771@alumni.chalmers.se> <51D437E2.4060101@digsys.bg> <20130704000405.GA75529@icarus.home.lan> <20130704171637.GA94539@icarus.home.lan> <2A261BEA-4452-4F6A-8EFB-90A54D79CBB9@alumni.chalmers.se> 
<20130704191203.GA95642@icarus.home.lan> <43015E9015084CA6BAC6978F39D22E8B@multiplay.co.uk> <3CFB4564D8EB4A6A9BCE2AFCC5B6E400@multiplay.co.uk> <51D6A206.2020303@digsys.bg> Date: Fri, 5 Jul 2013 18:08:11 -0700 Message-ID: Subject: Re: Slow resilvering with mirrored ZIL From: Freddie Cash To: Daniel Kalchev Content-Type: text/plain; charset=UTF-8 X-Content-Filtered-By: Mailman/MimeDel 2.1.14 Cc: FreeBSD Filesystems X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 06 Jul 2013 01:08:12 -0000

On 2013-07-05 3:38 AM, "Daniel Kalchev" wrote:
>
> On 05.07.13 02:28, Steven Hartland wrote:
>>
>> If anyone wants my current patches which add switch to 4k ashift by default
>> as a sysctl + works with QUIRKS too, just let me know.
>>
>> They are well tested, just we want more options before putting in the tree.
>
> Is it not easier to add this as an option to zpool create, instead of a sysctl?

ZFS-on-Linux has added this as a "-o ashift=" property for zpool create. There's a threat on the illumos list about standardising this across all ZFS-using OSes.
From owner-freebsd-fs@FreeBSD.ORG Sat Jul 6 01:10:53 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id DE177FBD for ; Sat, 6 Jul 2013 01:10:53 +0000 (UTC) (envelope-from fjwcash@gmail.com) Received: from mail-qa0-x22c.google.com (mail-qa0-x22c.google.com [IPv6:2607:f8b0:400d:c00::22c]) by mx1.freebsd.org (Postfix) with ESMTP id 9DCF11E63 for ; Sat, 6 Jul 2013 01:10:53 +0000 (UTC) Received: by mail-qa0-f44.google.com with SMTP id o13so4952206qaj.10 for ; Fri, 05 Jul 2013 18:10:53 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=G2OVd8DdDgDvZK0zN1JI+0w31B/2GLbu34nH6I+yO10=; b=MoRROB6uvOpD/vbssikpuxuWl91wWA4ZemWJVpfSImcZXXymoD8hi1JvJE0LdUpE+l l6RpJ1QTW3c4VDhLMD1SVv89HcUSzkO9IcxgHuYySGwxBxJULXSgBbt2W5AxYrEcUBvA IBTptWoA4SanDDXp1KBab8nkgAZWrljtLVNhUbUV8ZJZQ4mhVB57QHD3B+ADMXKugLgd IzCsahmWlVaX7q0QcG7D5mzw0F0m3iVr2bZqiOY78zQkYlmwo2Hw923XXuYHQg7S23cE HpBp7o68MAFO430lnxkFIGxOg1KDgCy61mYxgwzcD2X0w39lNo7QpiwM723JamWN8uAQ mMMg== MIME-Version: 1.0 X-Received: by 10.224.187.129 with SMTP id cw1mr9594453qab.68.1373073053202; Fri, 05 Jul 2013 18:10:53 -0700 (PDT) Received: by 10.49.49.135 with HTTP; Fri, 5 Jul 2013 18:10:53 -0700 (PDT) Received: by 10.49.49.135 with HTTP; Fri, 5 Jul 2013 18:10:53 -0700 (PDT) In-Reply-To: References: <51D42107.1050107@digsys.bg> <2EF46A8C-6908-4160-BF99-EC610B3EA771@alumni.chalmers.se> <51D437E2.4060101@digsys.bg> <20130704000405.GA75529@icarus.home.lan> <20130704171637.GA94539@icarus.home.lan> <2A261BEA-4452-4F6A-8EFB-90A54D79CBB9@alumni.chalmers.se> <20130704191203.GA95642@icarus.home.lan> <43015E9015084CA6BAC6978F39D22E8B@multiplay.co.uk> <3CFB4564D8EB4A6A9BCE2AFCC5B6E400@multiplay.co.uk> <51D6A206.2020303@digsys.bg> Date: Fri, 5 Jul 2013 18:10:53 -0700 Message-ID: Subject: Re: Slow 
resilvering with mirrored ZIL From: Freddie Cash To: Daniel Kalchev Content-Type: text/plain; charset=UTF-8 X-Content-Filtered-By: Mailman/MimeDel 2.1.14 Cc: FreeBSD Filesystems X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 06 Jul 2013 01:10:53 -0000

On 2013-07-05 6:08 PM, "Freddie Cash" wrote:
>
> On 2013-07-05 3:38 AM, "Daniel Kalchev" wrote:
>>
>> On 05.07.13 02:28, Steven Hartland wrote:
>>>
>>> If anyone wants my current patches which add switch to 4k ashift by default
>>> as a sysctl + works with QUIRKS too, just let me know.
>>>
>>> They are well tested, just we want more options before putting in the tree.
>>
>> Is it not easier to add this as an option to zpool create, instead of a sysctl?
>
> ZFS-on-Linux has added this as a "-o ashift=" property for zpool create.
>
> There's a threat on the illumos list about standardising this across all ZFS-using OSes.

Threat = thread. Lol.

Thinking about this, it would be nice to have a property that can be set via -o to "zpool add" and "zpool create", as well as a pool property for minimum ashift. Obviously, you'd have to set the minimum when the pool is created.
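For context on the option being discussed: the "-o ashift=" form is ZFS-on-Linux syntax; FreeBSD's zpool had no such flag at the time, and the commonly documented workaround was to wrap one disk in a gnop(8) provider that advertises 4096-byte sectors. A sketch under those assumptions (device names are hypothetical, and the gnop/zpool lines, shown as comments, would need root on a FreeBSD box):

```shell
#!/bin/sh
# The relation the thread relies on: ashift=12 means 2^12-byte allocations.
echo $((1 << 12))    # prints: 4096

# ZFS-on-Linux form mentioned above (not in FreeBSD's zpool at the time):
#   zpool create -o ashift=12 tank mirror da0 da1
#
# Period-typical FreeBSD workaround: a gnop provider faking 4K sectors, so
# zpool computes ashift=12 when the vdev is created:
#   gnop create -S 4096 /dev/da0
#   zpool create tank mirror /dev/da0.nop /dev/da1
#   zpool export tank
#   gnop destroy /dev/da0.nop     # ashift is sticky; the pool keeps 12
#   zpool import tank
```

The gnop provider is only needed at vdev-creation time, which is why it can be destroyed after the export.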
From owner-freebsd-fs@FreeBSD.ORG Sat Jul 6 05:10:35 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 15C9C954 for ; Sat, 6 Jul 2013 05:10:35 +0000 (UTC) (envelope-from daniel@digsys.bg) Received: from smtp-sofia.digsys.bg (smtp-sofia.digsys.bg [193.68.21.123]) by mx1.freebsd.org (Postfix) with ESMTP id 77B9017A1 for ; Sat, 6 Jul 2013 05:10:33 +0000 (UTC) Received: from [10.183.196.154] ([213.226.63.185]) (authenticated bits=0) by smtp-sofia.digsys.bg (8.14.6/8.14.6) with ESMTP id r665AUJ3070116 (version=TLSv1/SSLv3 cipher=AES128-SHA bits=128 verify=NO); Sat, 6 Jul 2013 08:10:31 +0300 (EEST) (envelope-from daniel@digsys.bg) References: <20130704000405.GA75529@icarus.home.lan> <20130704171637.GA94539@icarus.home.lan> <2A261BEA-4452-4F6A-8EFB-90A54D79CBB9@alumni.chalmers.se> <20130704191203.GA95642@icarus.home.lan> <43015E9015084CA6BAC6978F39D22E8B@multiplay.co.uk> <3CFB4564D8EB4A6A9BCE2AFCC5B6E400@multiplay.co.uk> <51D6A206.2020303@digsys.bg> <20130705145332.GA5449@icarus.home.lan> Mime-Version: 1.0 (1.0) In-Reply-To: <20130705145332.GA5449@icarus.home.lan> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: quoted-printable Message-Id: <209EBC4A-45C6-4F94-87C3-9679B8A0691E@digsys.bg> X-Mailer: iPhone Mail (10B329) From: Daniel Kalchev Subject: Re: Slow resilvering with mirrored ZIL Date: Sat, 6 Jul 2013 08:10:29 +0300 To: Jeremy Chadwick Cc: "freebsd-fs@freebsd.org" X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 06 Jul 2013 05:10:35 -0000

On 05.07.2013, at 17:53, Jeremy Chadwick wrote:

> On Fri, Jul 05, 2013 at 01:37:58PM +0300, Daniel Kalchev wrote:
>>
>> On 05.07.13 02:28, Steven Hartland wrote:
>>>
>>> If anyone wants my current patches which add switch to 4k ashift by default
>>> as a sysctl + works with QUIRKS too, just let me know.
>>>
>>> They are well tested, just we want more options before putting in
>>> the tree.
>>
>> Is it not easier to add this as an option to zpool create, instead
>> of a sysctl?
>>
>> That is, I believe we have two scenarios here:
>>
>> 1. Having a sysctl that instructs ZFS to look at the FreeBSD quirks
>> to decide what the ashift should be, instead of only querying the
>> 'sectorsize' property of the storage. I believe we might not even
>> need a sysctl here, just make it default to obey the quirks --- but
>> a sysctl for the interim period will not hurt (with the proper
>> default).
>
> I can expand on this one (specifically "relying on sectorsize of the
> media"): no, this will not work reliably, for two reasons.

This is how ZFS works now. While I am well aware that most people use disks with all the quirks manufacturers put there, let's not forget ZFS was designed for the enterprise environment, where "storage device" has a much wider meaning and "sector size" also varies wildly.

Having said this, and because FreeBSD actually goes via geom for disks now, figuring out the correct "sector size" is for geom to do. As it is now, perhaps ZFS should be made to query the "stripe size" property of the geom provider and not the "sector size". A sysctl to that effect makes more sense to me, and will make things more automatic for most users.
Daniel

From owner-freebsd-fs@FreeBSD.ORG Sat Jul 6 05:28:29 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id D4046C83 for ; Sat, 6 Jul 2013 05:28:29 +0000 (UTC) (envelope-from daniel@digsys.bg) Received: from smtp-sofia.digsys.bg (smtp-sofia.digsys.bg [193.68.21.123]) by mx1.freebsd.org (Postfix) with ESMTP id 53D3B1912 for ; Sat, 6 Jul 2013 05:28:28 +0000 (UTC) Received: from [10.183.196.154] ([213.226.63.185]) (authenticated bits=0) by smtp-sofia.digsys.bg (8.14.6/8.14.6) with ESMTP id r665SQa9073116 (version=TLSv1/SSLv3 cipher=AES128-SHA bits=128 verify=NO); Sat, 6 Jul 2013 08:28:27 +0300 (EEST) (envelope-from daniel@digsys.bg) References: <51D42107.1050107@digsys.bg> <2EF46A8C-6908-4160-BF99-EC610B3EA771@alumni.chalmers.se> <51D437E2.4060101@digsys.bg> <20130704000405.GA75529@icarus.home.lan> <20130704171637.GA94539@icarus.home.lan> <2A261BEA-4452-4F6A-8EFB-90A54D79CBB9@alumni.chalmers.se> <20130704191203.GA95642@icarus.home.lan> <43015E9015084CA6BAC6978F39D22E8B@multiplay.co.uk> <3CFB4564D8EB4A6A9BCE2AFCC5B6E400@multiplay.co.uk> <51D6A206.2020303@digsys.bg> Mime-Version: 1.0 (1.0) In-Reply-To: Message-Id: X-Mailer: iPhone Mail (10B329) From: Daniel Kalchev Subject: Re: Slow resilvering with mirrored ZIL Date: Sat, 6 Jul 2013 08:28:25 +0300 To: Freddie Cash Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.14 Cc: FreeBSD Filesystems X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 06 Jul 2013 05:28:29 -0000

> Thinking about this, it would be nice to have a property that can be set via -o to "zpool add" and "zpool create", as well as a pool property for minimum ashift.
> Obviously, you'd have to set the minimum when the pool is created.
>

Is there a performance etc. benefit from having all vdevs with the same ashift?

From owner-freebsd-fs@FreeBSD.ORG Sat Jul 6 18:30:51 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id CC200587 for ; Sat, 6 Jul 2013 18:30:51 +0000 (UTC) (envelope-from bofh@terranova.net) Received: from tog.net (tog.net [IPv6:2605:5a00::5]) by mx1.freebsd.org (Postfix) with ESMTP id 6412A1F8F for ; Sat, 6 Jul 2013 18:30:51 +0000 (UTC) Received: from [IPv6:2605:5a00:ffff::face] (unknown [IPv6:2605:5a00:ffff::face]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by tog.net (Postfix) with ESMTPSA id 3bnhLx74kSz1jW; Sat, 6 Jul 2013 14:30:49 -0400 (EDT) Message-ID: <51D8624D.30100@terranova.net> Date: Sat, 06 Jul 2013 14:30:37 -0400 From: Travis Mikalson Organization: TerraNovaNet Internet Services User-Agent: Thunderbird 2.0.0.24 (Windows/20100228) MIME-Version: 1.0 To: freebsd-fs@freebsd.org Subject: Re: Report: ZFS deadlock in 9-STABLE References: <51D45401.5050801@terranova.net> <51D5776F.5060101@FreeBSD.org> <51D57C19.1080906@terranova.net> <51D5804B.7090702@FreeBSD.org> <51D586F9.7060508@terranova.net> <20130704145548.GA91766@icarus.home.lan> In-Reply-To: <20130704145548.GA91766@icarus.home.lan> X-Enigmail-Version: 0.96.0 OpenPGP: url=http://www.terranova.net/pgp/bofh Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 06 Jul 2013 18:30:51 -0000

Sorry for the late reply; I've been enjoying a bit of a holiday away from my PC for the last couple of days.
Jeremy Chadwick wrote: > I'd like to get output from all of these commands: > > - dmesg (you can hide/XXX out the system name if you want, but please > don't remove anything else, barring IP addresses/etc.) That is now available in file storage1-dmesg under http://tog.net/freebsd/ > - zpool get all File storage1-zpoolgetall > - zfs get all storage1-zfsgetall (big file) > - "gpart show -p" for every disk on the system Available in storage1-gpartshow-p, the SAS disks have no geom since they're being used in their entirety by ZFS. The compact flash and the two SSDs are the only partitioned devices. > - "vmstat -i" when the system is livelocked (if possible; see below) I will try to get that for you if it happens again. Keep in mind if Xin Li's patch is effective we may not get another chance. Very optimistic, I know. I've added vmstat -i > /dev/null to a crontab to keep the vmstat command cached and ready to run during a storage livelock. > - The exact brand and model string of mps(4) controllers you're using These are IBM ServeRAID 1015 controllers flashed with LSI SAS 9211_8i "IT" firmware. They are flashed to be passthrough without RAID functionality or a BIOS you can boot from. If this isn't enough info and is really important, I will need to go pull the server out and open it up to get whatever info the IBM controllers might have on their stickers. > - The exact firmware version and firmware type (often a 2-letter code) > you're using on your mps(4) controllers (dmesg might show some of this > but possibly not all) I've flashed these to LSI firmware type "IT", straight passthrough without RAID functionality. mps0: Firmware: 14.00.00.00, Driver: 14.00.00.01-fbsd mps1: Firmware: 14.00.00.00, Driver: 14.00.00.01-fbsd mps2: Firmware: 14.00.00.00, Driver: 14.00.00.01-fbsd I can confirm that this is accurate, it's 14.00.00.00 I flashed on there from the LSI firmware release notes in the firmware package I used. 
SCS Engineering Release Notice Phase14 GCA Release Version 14.00.00.00 - SAS2FW_Phase14 (SCGCQ00300504) > - Is powerd(8) running on this system at all? It is not. > Please put these in separate files and upload them to > http://tog.net/freebsd/ if you could. (For the gpart output, you can > put all the output from all the disks in a single file) Done, as described above. > I can see your ZFS disks are probably using those mps(4) controllers. I > also see you have an AHCI controller. Right, three mps controllers plugged into the system and the motherboard's onboard controllers. There's a DVD-ROM connected ATA mode: atapci0: port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xff00-0xff0f at device 20.1 on pci0 ata0: at channel 0 on atapci0 ata1: at channel 1 on atapci0 cd0 at ata0 bus 0 scbus7 target 1 lun 0 And the compact flash is connected to an onboard controller that's in AHCI mode: ahci0: port 0x9000-0x9007,0x8000-0x8003,0x7000-0x7007,0x6000-0x6003,0x5000-0x500f mem 0xfebfbc00-0xfebfbfff irq 22 at device 17.0 on pci0 ahci0: AHCI v1.10 with 4 3Gbps ports, Port Multiplier supported ahcich0: at channel 0 on ahci0 ahcich1: at channel 1 on ahci0 ahcich2: at channel 2 on ahci0 ahcich3: at channel 3 on ahci0 ada0 at ahcich2 bus 0 scbus5 target 0 lun 0 > I know you can't move all your disks to the AHCI controller due to there > not being enough ports, and the controller might not even work with SAS > disks (depends, some newer/higher end Intel ones do), but: > > A "CF drive locking up too" doesn't really tell us anything about the CF > drive, how it's hooked up, etc... But I'd rather not even go into that, > because: The compact flash is connected using a SYBA SD-ADA40001 SATA II To Compact Flash Adapter. This and the DVD-ROM drive are the only devices connected to the motherboard's onboard controllers. > Advice: > > Hook a SATA disk up to your ahci(4) controller and just leave it there. > No filesystem, just a raw disk sitting on a bus. 
When the livelock > happens, in another window issue "dd if=/dev/ada0 of=/dev/null bs=64k" > (disk might not be named ada0; again, need that dmesg) and after a > second or two press Ctrl-T to see if you get any output (output should > be immediate). If you do get output, it means GEOM and/or CAM are still > functional in some manner, and that puts more focus on the mps(4) side > of things. There are still nearly infinite explanations for what's > going on though. Which leads me to... Sure, I could do that, but would it be sufficient to try dd'ing from the compact flash device that's already connected via AHCI? I can add a dd command to crontab to run every minute so dd is cached and available for me to run during a loss-of-storage livelock condition. > Question: > > If the system is livelocked, how are you running "procstat -kk -a" in > the first place? Or does it "livelock" and then release itself from the > pain (eventually), only later to re-lock? A "livelock" usually implies > the system is alive in some way (hitting NumLock on the keyboard > (hopefully PS/2) still toggles the LED (kernel does this -- I've used > this as a way to see if a system is locked up or not for years)) just > that some layer pertaining to your focus (ZFS I/O) is wonky. If it > comes and goes, there may be some explanations for that, but output from > those commands would greatly help. This is a permanent livelock, it never recovers on its own. The system requires a hard reset. I have a cron job running every minute that runs procstat -kk -a > /dev/null to ensure that the procstat command is always cached and available to me when I go to use it without any access to storage. During the livelock, I used the ssh session I already had open to run my procstat -kk -a, it was the last thing I could do within that session without resetting the system. After the procstat command completed, I apparently needed an I/O to get my shell prompt back and that never came of course. 
It's a livelock of the storage-related bits only. Numlock does toggle and you can actually get a response in all the normal ways you would expect to get a response if your FreeBSD system had suddenly lost all contact with all of its storage. You can ping it, establish TCP connections with any listening services but get no banner/greeting response, etc. Console is responsive right up to the point that it needs an I/O. > Question: > > What's with the tunings in loader.conf and sysctl.conf for ZFS? Not > saying those are the issue, just asking why you're setting those at all. > Is there something we need to know about that you've run into in the > past? It's all from fairly well-documented situations, including things from https://wiki.freebsd.org/ZFSTuningGuide The sysctl tunings are as follows: Knowing my HPET is trustworthy, chose it as my best timecounter hardware. Have used default and other timecounters with no effect on my livelock issue. Upped my maxvnodes, merely for performance tuning considering this box's role. This is actually mentioned in the FreeBSD handbook. vfs.zfs.l2arc_write_max=250000000 vfs.zfs.l2arc_write_boost=450000000 These increase the speed at which ZFS will write to cache devices. ZFS, by default, seems to throttle down the speed at which it will write to a cache device quite a lot. These SSDs can handle way more than ZFS was giving them. The loader.conf tunings are as follows: Increased the amount of filesystem metadata ZFS allows to be cached in ARC since I've got many small things I'd like to be able to find in both ARC and ~460GB of L2ARC with less need to walk around on the disks looking for metadata. All things considered, I don't mind my RAM ARC being mostly full of metadata and the actual cached data coming off the SSDs. Increased the ZFS TXG write limit to something appropriately large for the RAM I've got to work with. 
The last and most recent tunings in loader.conf: I actually ran out of mbuf clusters a month ago and had to hard reset the system to bring the network back. ifconfig down/up actually gave an error along the lines of cannot allocate memory. I increased all related limits to fairly high values as the defaults were apparently too low. And this was just a single gigabit interface that was at 70-90% utilization at the time of total permanent networking death due to mbuf cluster exhaustion. From owner-freebsd-fs@FreeBSD.ORG Sat Jul 6 19:29:40 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id DF2D62CD for ; Sat, 6 Jul 2013 19:29:40 +0000 (UTC) (envelope-from radiomlodychbandytow@o2.pl) Received: from moh3-ve3.go2.pl (moh3-ve3.go2.pl [193.17.41.87]) by mx1.freebsd.org (Postfix) with ESMTP id A0BF9116D for ; Sat, 6 Jul 2013 19:29:40 +0000 (UTC) Received: from moh3-ve3.go2.pl (unknown [10.0.0.158]) by moh3-ve3.go2.pl (Postfix) with ESMTP id 92354B5A713 for ; Sat, 6 Jul 2013 21:29:39 +0200 (CEST) Received: from unknown (unknown [10.0.0.142]) by moh3-ve3.go2.pl (Postfix) with SMTP for ; Sat, 6 Jul 2013 21:29:38 +0200 (CEST) Received: from unknown [93.175.66.185] by poczta.o2.pl with ESMTP id lvlUWf; Sat, 06 Jul 2013 21:29:38 +0200 Message-ID: <51D8701E.1010209@o2.pl> Date: Sat, 06 Jul 2013 21:29:34 +0200 From: =?UTF-8?B?UmFkaW8gbcWCb2R5Y2ggYmFuZHl0w7N3?= User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:17.0) Gecko/20130611 Thunderbird/17.0.6 MIME-Version: 1.0 To: freebsd-fs@freebsd.org, Xin Li Subject: Re: freebsd-fs Digest, Vol 524, Issue 10 References: In-Reply-To: Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-O2-Trust: 1, 37 X-O2-SPF: neutral Cc: Volodymyr Kostyrko , d@delphij.net, marck@rinet.ru, avg@FreeBSD.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: ,
X-List-Received-Date: Sat, 06 Jul 2013 19:29:40 -0000

On 05/07/2013 12:30, freebsd-fs-request@freebsd.org wrote:
> On 7/4/13 1:44 PM, Volodymyr Kostyrko wrote:
>> 04.07.2013 23:36, Xin Li wrote:
>>> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA256
>>>
>>> On 7/4/13 1:27 PM, Volodymyr Kostyrko wrote:
>>>> 04.07.2013 19:02, Andriy Gapon wrote:
>>>>> on 04/07/2013 18:57 Volodymyr Kostyrko said the following:
>>>>>> Yes. Much better in terms of speed.
>>>>>
>>>>> And compression too.
>>>>
>>>> Can't really say.
>>>>
>>>> When the code first appeared in stable I moved two of my
>>>> machines (desktops) to LZ4, recreating each dataset. To my
>>>> surprise the gain in the transition from lzjb was fairly minimal, and
>>>> sometimes LZ4 even loses to lzjb in compression size. However,
>>>> better compression/decompression speed, and moreover an earlier
>>>> bail-out when data is incompressible, clearly make lz4 a
>>>> winner.
>>>
>>> I'm interested in this -- what's the nature of the data on that
>>> dataset (e.g. plain text? binaries? images?)
>>
>> Triple no. The biggest difference in lzjb's favor was on a zvol with
>> Mac OS X Snow Leo.
>>
>> Maybe it's just because recordsize is too small on zvols? Anyway
>> the difference was like 1% or 2%. Can't remember, but I can retest.
> Hmm, that's weird. I haven't tried Mac iSCSI volumes but I have tried
> Windows iSCSI volumes, and lz4 was a win.
>
> It may be helpful if you can post your 'zfs get all ' output
> so we can try to reproduce the problem in the lab?

I guess this is mostly about record size. LZJB seems to have been
designed for 4K buffers and works well for 8K, but beyond that its
compression ratio basically flattens out, which is why it generally
performs badly.
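A rough way to see this block-size effect is to compress the same data in independently compressed records of different sizes, as ZFS does per record. The sketch below uses Python's zlib as a stand-in, since neither lzjb nor lz4 ships in the standard library; the absolute ratios are not meaningful here, only the trend as the record size grows:

```python
import zlib

def ratio(data: bytes, recordsize: int) -> float:
    """Compress data in independent recordsize-sized blocks (as ZFS
    compresses each record on its own) and return the overall
    compression ratio; higher is better."""
    compressed = sum(
        len(zlib.compress(data[i:i + recordsize]))
        for i in range(0, len(data), recordsize)
    )
    return len(data) / compressed

# Highly repetitive payload, ~176 KiB, so the dictionary-warmup and
# per-block-overhead effects dominate.
payload = b"the quick brown fox jumps over the lazy dog\n" * 4096

for rs in (4096, 8192, 131072):
    print(f"recordsize {rs:6d}: ratio {ratio(payload, rs):.2f}x")
```

Each block restarts the compressor's dictionary, so smaller records pay the warmup and header cost more often; that is why ratios converge at small record sizes and an algorithm tuned for small buffers can occasionally win there.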
LZ4 has been designed for megabytes of data (no more no less) and while it scales well to 128K, they are rather close at 8K and I don't find it surprising that LZJB wins from time to time. https://extrememoderate.wordpress.com/2011/08/14/synthetic-test-of-filesystem-compression-part-1/ -- Twoje radio From owner-freebsd-fs@FreeBSD.ORG Sat Jul 6 19:46:06 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 39C0AAC1 for ; Sat, 6 Jul 2013 19:46:06 +0000 (UTC) (envelope-from prvs=1899a30d0c=killing@multiplay.co.uk) Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) by mx1.freebsd.org (Postfix) with ESMTP id D1E2E1238 for ; Sat, 6 Jul 2013 19:46:05 +0000 (UTC) Received: from r2d2 ([82.69.141.170]) by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (MDaemon PRO v10.0.4) with ESMTP id md50004773479.msg for ; Sat, 06 Jul 2013 20:45:58 +0100 X-Spam-Processed: mail1.multiplay.co.uk, Sat, 06 Jul 2013 20:45:58 +0100 (not processed: message from valid local sender) X-MDDKIM-Result: neutral (mail1.multiplay.co.uk) X-MDRemoteIP: 82.69.141.170 X-Return-Path: prvs=1899a30d0c=killing@multiplay.co.uk X-Envelope-From: killing@multiplay.co.uk X-MDaemon-Deliver-To: freebsd-fs@freebsd.org Message-ID: <04A8AD5DBC794325901908DC17467CA2@multiplay.co.uk> From: "Steven Hartland" To: "Travis Mikalson" , References: <51D45401.5050801@terranova.net> <51D5776F.5060101@FreeBSD.org> <51D57C19.1080906@terranova.net> <51D5804B.7090702@FreeBSD.org> <51D586F9.7060508@terranova.net> <20130704145548.GA91766@icarus.home.lan> <51D8624D.30100@terranova.net> Subject: Re: Report: ZFS deadlock in 9-STABLE Date: Sat, 6 Jul 2013 20:46:10 +0100 MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=original Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 
6.00.2900.5931 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6157 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 06 Jul 2013 19:46:06 -0000 ----- Original Message ----- From: "Travis Mikalson" > I've flashed these to LSI firmware type "IT", straight passthrough > without RAID functionality. > > mps0: Firmware: 14.00.00.00, Driver: 14.00.00.01-fbsd > mps1: Firmware: 14.00.00.00, Driver: 14.00.00.01-fbsd > mps2: Firmware: 14.00.00.00, Driver: 14.00.00.01-fbsd > > I can confirm that this is accurate, it's 14.00.00.00 I flashed on there > from the LSI firmware release notes in the firmware package I used. > > SCS Engineering Release Notice > Phase14 GCA Release Version 14.00.00.00 - SAS2FW_Phase14 (SCGCQ00300504) You should ideally be using at least FW Phase16 for those cards. Regards Steve ================================================ This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmaster@multiplay.co.uk. 
From owner-freebsd-fs@FreeBSD.ORG Sat Jul 6 20:35:47 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 17DA6B92 for ; Sat, 6 Jul 2013 20:35:47 +0000 (UTC) (envelope-from bofh@terranova.net) Received: from tog.net (tog.net [216.89.226.5]) by mx1.freebsd.org (Postfix) with ESMTP id E910B13AA for ; Sat, 6 Jul 2013 20:35:46 +0000 (UTC) Received: from [192.168.5.201] (tog.bb.terranova.net [216.182.250.242]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by tog.net (Postfix) with ESMTPSA id 3bnl700Djpz29J; Sat, 6 Jul 2013 16:35:40 -0400 (EDT) Message-ID: <51D87F8F.6020906@terranova.net> Date: Sat, 06 Jul 2013 16:35:27 -0400 From: Travis Mikalson Organization: TerraNovaNet Internet Services User-Agent: Thunderbird 2.0.0.24 (Windows/20100228) MIME-Version: 1.0 To: freebsd-fs@freebsd.org Subject: Re: Report: ZFS deadlock in 9-STABLE References: <51D45401.5050801@terranova.net> <51D5776F.5060101@FreeBSD.org> <51D57C19.1080906@terranova.net> <51D5804B.7090702@FreeBSD.org> <51D586F9.7060508@terranova.net> <20130704145548.GA91766@icarus.home.lan> <51D8624D.30100@terranova.net> <04A8AD5DBC794325901908DC17467CA2@multiplay.co.uk> In-Reply-To: <04A8AD5DBC794325901908DC17467CA2@multiplay.co.uk> X-Enigmail-Version: 0.96.0 OpenPGP: url=http://www.terranova.net/pgp/bofh Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 06 Jul 2013 20:35:47 -0000 Steven Hartland wrote: > > ----- Original Message ----- From: "Travis Mikalson" > >> I've flashed these to LSI firmware type "IT", straight passthrough >> without RAID functionality. 
>> >> mps0: Firmware: 14.00.00.00, Driver: 14.00.00.01-fbsd >> mps1: Firmware: 14.00.00.00, Driver: 14.00.00.01-fbsd >> mps2: Firmware: 14.00.00.00, Driver: 14.00.00.01-fbsd >> >> I can confirm that this is accurate, it's 14.00.00.00 I flashed on there >> from the LSI firmware release notes in the firmware package I used. >> >> SCS Engineering Release Notice >> Phase14 GCA Release Version 14.00.00.00 - SAS2FW_Phase14 (SCGCQ00300504) > > You should ideally be using at least FW Phase16 for those cards. Phase14 was the latest when I assembled the system of course. I'll look into updating these to Phase16, though I hate doing that on production boxes. I may simply buy three new controllers, flash them elsewhere and then swap them in. Wow, I see LSI has a FreeBSD sas2flash utility. Has anybody tried LSI's FreeBSD driver, mpslsi? If so, do you feel it's more advisable to use for some reason instead of FreeBSD's included mps driver? From owner-freebsd-fs@FreeBSD.ORG Sat Jul 6 22:06:47 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 83C63933 for ; Sat, 6 Jul 2013 22:06:47 +0000 (UTC) (envelope-from prvs=1899a30d0c=killing@multiplay.co.uk) Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) by mx1.freebsd.org (Postfix) with ESMTP id 2618E1777 for ; Sat, 6 Jul 2013 22:06:46 +0000 (UTC) Received: from r2d2 ([82.69.141.170]) by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (MDaemon PRO v10.0.4) with ESMTP id md50004776079.msg for ; Sat, 06 Jul 2013 23:06:45 +0100 X-Spam-Processed: mail1.multiplay.co.uk, Sat, 06 Jul 2013 23:06:45 +0100 (not processed: message from valid local sender) X-MDDKIM-Result: neutral (mail1.multiplay.co.uk) X-MDRemoteIP: 82.69.141.170 X-Return-Path: prvs=1899a30d0c=killing@multiplay.co.uk X-Envelope-From: killing@multiplay.co.uk X-MDaemon-Deliver-To: freebsd-fs@freebsd.org Message-ID: 
<570BB7A0D99B4F11A60492874CDE84AF@multiplay.co.uk> From: "Steven Hartland" To: "Travis Mikalson" , References: <51D45401.5050801@terranova.net> <51D5776F.5060101@FreeBSD.org> <51D57C19.1080906@terranova.net> <51D5804B.7090702@FreeBSD.org> <51D586F9.7060508@terranova.net> <20130704145548.GA91766@icarus.home.lan> <51D8624D.30100@terranova.net> <04A8AD5DBC794325901908DC17467CA2@multiplay.co.uk> <51D87F8F.6020906@terranova.net> Subject: Re: Report: ZFS deadlock in 9-STABLE Date: Sat, 6 Jul 2013 23:06:58 +0100 MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=original Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.5931 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6157 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 06 Jul 2013 22:06:47 -0000 ----- Original Message ----- From: "Travis Mikalson" To: Cc: "Steven Hartland" Sent: Saturday, July 06, 2013 9:35 PM Subject: Re: Report: ZFS deadlock in 9-STABLE > Steven Hartland wrote: >> >> ----- Original Message ----- From: "Travis Mikalson" >> >>> I've flashed these to LSI firmware type "IT", straight passthrough >>> without RAID functionality. >>> >>> mps0: Firmware: 14.00.00.00, Driver: 14.00.00.01-fbsd >>> mps1: Firmware: 14.00.00.00, Driver: 14.00.00.01-fbsd >>> mps2: Firmware: 14.00.00.00, Driver: 14.00.00.01-fbsd >>> >>> I can confirm that this is accurate, it's 14.00.00.00 I flashed on there >>> from the LSI firmware release notes in the firmware package I used. >>> >>> SCS Engineering Release Notice >>> Phase14 GCA Release Version 14.00.00.00 - SAS2FW_Phase14 (SCGCQ00300504) >> >> You should ideally be using at least FW Phase16 for those cards. > > Phase14 was the latest when I assembled the system of course. 
> I'll look into updating these to Phase16, though I hate doing that on
> production boxes. I may simply buy three new controllers, flash them
> elsewhere and then swap them in.
>
> Wow, I see LSI has a FreeBSD sas2flash utility.

Yes, we've used it on loads of servers; it always puts my heart in my
stomach, but it has never failed yet (touch wood).

> Has anybody tried LSI's FreeBSD driver, mpslsi? If so, do you feel it's
> more advisable to use for some reason instead of FreeBSD's included mps
> driver?

All the changes in this (v16), plus others, are being prepared to be
merged into head. I don't believe there are any pressing issues that
would require you to switch to it before it's in the tree.

Regards
Steve