From owner-freebsd-stable@FreeBSD.ORG Fri Jan 16 23:07:55 2015
Message-ID: <54B999C6.2090909@gmail.com>
Date: Sat, 17 Jan 2015 01:07:50 +0200
From: Mihai Vintila
To: freebsd-stable@freebsd.org
Subject: Re: Poor performance on Intel P3600 NVME driver
References: <54B7F769.40605@gmail.com> <20150115175927.GA19071@zxy.spb.ru> <54B8C7E9.3030602@gmail.com> <20150116221344.GA72201@pit.databus.com>
In-Reply-To: <20150116221344.GA72201@pit.databus.com>

I've remade the test with atime=off. The drive reports 512-byte physical sectors, but I created the pool through a 4k gnop anyway (a sketch of that setup is at the end of this message). The results are similar to the run with atime enabled:

Processor cache line size set to 32 bytes.
File stride size set to 17 * record size.
                                                           random   random     bkwd   record   stride
        KB  reclen    write  rewrite     read   reread     read    write     read  rewrite     read   fwrite frewrite    fread  freread
   1048576       4    74427        0   101744        0    93529    47925
   1048576       8    39072        0    64693        0    61104    25452

I've also tried to increase vfs.zfs.vdev.aggregation_limit and ended up with a crash (screenshot attached).

I'm attaching the zfs tunables:

sysctl -a | grep vfs.zfs
vfs.zfs.arc_max: 34359738368
vfs.zfs.arc_min: 4294967296
vfs.zfs.arc_average_blocksize: 8192
vfs.zfs.arc_meta_used: 5732232
vfs.zfs.arc_meta_limit: 8589934592
vfs.zfs.l2arc_write_max: 8388608
vfs.zfs.l2arc_write_boost: 8388608
vfs.zfs.l2arc_headroom: 2
vfs.zfs.l2arc_feed_secs: 1
vfs.zfs.l2arc_feed_min_ms: 200
vfs.zfs.l2arc_noprefetch: 1
vfs.zfs.l2arc_feed_again: 1
vfs.zfs.l2arc_norw: 1
vfs.zfs.anon_size: 32768
vfs.zfs.anon_metadata_lsize: 0
vfs.zfs.anon_data_lsize: 0
vfs.zfs.mru_size: 17841664
vfs.zfs.mru_metadata_lsize: 858624
vfs.zfs.mru_data_lsize: 13968384
vfs.zfs.mru_ghost_size: 0
vfs.zfs.mru_ghost_metadata_lsize: 0
vfs.zfs.mru_ghost_data_lsize: 0
vfs.zfs.mfu_size: 4574208
vfs.zfs.mfu_metadata_lsize: 465408
vfs.zfs.mfu_data_lsize: 4051456
vfs.zfs.mfu_ghost_size: 0
vfs.zfs.mfu_ghost_metadata_lsize: 0
vfs.zfs.mfu_ghost_data_lsize: 0
vfs.zfs.l2c_only_size: 0
vfs.zfs.dedup.prefetch: 1
vfs.zfs.nopwrite_enabled: 1
vfs.zfs.mdcomp_disable: 0
vfs.zfs.dirty_data_max: 4294967296
vfs.zfs.dirty_data_max_max: 4294967296
vfs.zfs.dirty_data_max_percent: 10
vfs.zfs.dirty_data_sync: 67108864
vfs.zfs.delay_min_dirty_percent: 60
vfs.zfs.delay_scale: 500000
vfs.zfs.prefetch_disable: 1
vfs.zfs.zfetch.max_streams: 8
vfs.zfs.zfetch.min_sec_reap: 2
vfs.zfs.zfetch.block_cap: 256
vfs.zfs.zfetch.array_rd_sz: 1048576
vfs.zfs.top_maxinflight: 32
vfs.zfs.resilver_delay: 2
vfs.zfs.scrub_delay: 4
vfs.zfs.scan_idle: 50
vfs.zfs.scan_min_time_ms: 1000
vfs.zfs.free_min_time_ms: 1000
vfs.zfs.resilver_min_time_ms: 3000
vfs.zfs.no_scrub_io: 0
vfs.zfs.no_scrub_prefetch: 0
vfs.zfs.metaslab.gang_bang: 131073
vfs.zfs.metaslab.fragmentation_threshold: 70
vfs.zfs.metaslab.debug_load: 0
vfs.zfs.metaslab.debug_unload: 0
vfs.zfs.metaslab.df_alloc_threshold: 131072
vfs.zfs.metaslab.df_free_pct: 4
vfs.zfs.metaslab.min_alloc_size: 10485760
vfs.zfs.metaslab.load_pct: 50
vfs.zfs.metaslab.unload_delay: 8
vfs.zfs.metaslab.preload_limit: 3
vfs.zfs.metaslab.preload_enabled: 1
vfs.zfs.metaslab.fragmentation_factor_enabled: 1
vfs.zfs.metaslab.lba_weighting_enabled: 1
vfs.zfs.metaslab.bias_enabled: 1
vfs.zfs.condense_pct: 200
vfs.zfs.mg_noalloc_threshold: 0
vfs.zfs.mg_fragmentation_threshold: 85
vfs.zfs.check_hostid: 1
vfs.zfs.spa_load_verify_maxinflight: 10000
vfs.zfs.spa_load_verify_metadata: 1
vfs.zfs.spa_load_verify_data: 1
vfs.zfs.recover: 0
vfs.zfs.deadman_synctime_ms: 1000000
vfs.zfs.deadman_checktime_ms: 5000
vfs.zfs.deadman_enabled: 1
vfs.zfs.spa_asize_inflation: 24
vfs.zfs.txg.timeout: 5
vfs.zfs.vdev.cache.max: 16384
vfs.zfs.vdev.cache.size: 0
vfs.zfs.vdev.cache.bshift: 16
vfs.zfs.vdev.trim_on_init: 0
vfs.zfs.vdev.mirror.rotating_inc: 0
vfs.zfs.vdev.mirror.rotating_seek_inc: 5
vfs.zfs.vdev.mirror.rotating_seek_offset: 1048576
vfs.zfs.vdev.mirror.non_rotating_inc: 0
vfs.zfs.vdev.mirror.non_rotating_seek_inc: 1
vfs.zfs.vdev.max_active: 1000
vfs.zfs.vdev.sync_read_min_active: 32
vfs.zfs.vdev.sync_read_max_active: 32
vfs.zfs.vdev.sync_write_min_active: 32
vfs.zfs.vdev.sync_write_max_active: 32
vfs.zfs.vdev.async_read_min_active: 32
vfs.zfs.vdev.async_read_max_active: 32
vfs.zfs.vdev.async_write_min_active: 32
vfs.zfs.vdev.async_write_max_active: 32
vfs.zfs.vdev.scrub_min_active: 1
vfs.zfs.vdev.scrub_max_active: 2
vfs.zfs.vdev.trim_min_active: 1
vfs.zfs.vdev.trim_max_active: 64
vfs.zfs.vdev.aggregation_limit: 131072
vfs.zfs.vdev.read_gap_limit: 32768
vfs.zfs.vdev.write_gap_limit: 4096
vfs.zfs.vdev.bio_flush_disable: 0
vfs.zfs.vdev.bio_delete_disable: 0
vfs.zfs.vdev.trim_max_bytes: 2147483648
vfs.zfs.vdev.trim_max_pending: 64
vfs.zfs.max_auto_ashift: 13
vfs.zfs.min_auto_ashift: 9
vfs.zfs.zil_replay_disable: 0
vfs.zfs.cache_flush_disable: 0
vfs.zfs.zio.use_uma: 1
vfs.zfs.zio.exclude_metadata: 0
vfs.zfs.sync_pass_deferred_free: 2
vfs.zfs.sync_pass_dont_compress: 5
vfs.zfs.sync_pass_rewrite: 2
vfs.zfs.snapshot_list_prefetch: 0
vfs.zfs.super_owner: 0
vfs.zfs.debug: 0
vfs.zfs.version.ioctl: 4
vfs.zfs.version.acl: 1
vfs.zfs.version.spa: 5000
vfs.zfs.version.zpl: 5
vfs.zfs.vol.mode: 1
vfs.zfs.trim.enabled: 0
vfs.zfs.trim.txg_delay: 32
vfs.zfs.trim.timeout: 30
vfs.zfs.trim.max_interval: 1

And nvme:

dev.nvme.%parent:
dev.nvme.0.%desc: Generic NVMe Device
dev.nvme.0.%driver: nvme
dev.nvme.0.%location: slot=0 function=0 handle=\_SB_.PCI0.BR3A.D08A
dev.nvme.0.%pnpinfo: vendor=0x8086 device=0x0953 subvendor=0x8086 subdevice=0x370a class=0x010802
dev.nvme.0.%parent: pci4
dev.nvme.0.int_coal_time: 0
dev.nvme.0.int_coal_threshold: 0
dev.nvme.0.timeout_period: 30
dev.nvme.0.num_cmds: 811857
dev.nvme.0.num_intr_handler_calls: 485242
dev.nvme.0.reset_stats: 0
dev.nvme.0.adminq.num_entries: 128
dev.nvme.0.adminq.num_trackers: 16
dev.nvme.0.adminq.sq_head: 12
dev.nvme.0.adminq.sq_tail: 12
dev.nvme.0.adminq.cq_head: 8
dev.nvme.0.adminq.num_cmds: 12
dev.nvme.0.adminq.num_intr_handler_calls: 7
dev.nvme.0.adminq.dump_debug: 0
dev.nvme.0.ioq0.num_entries: 256
dev.nvme.0.ioq0.num_trackers: 128
dev.nvme.0.ioq0.sq_head: 69
dev.nvme.0.ioq0.sq_tail: 69
dev.nvme.0.ioq0.cq_head: 69
dev.nvme.0.ioq0.num_cmds: 811845
dev.nvme.0.ioq0.num_intr_handler_calls: 485235
dev.nvme.0.ioq0.dump_debug: 0
dev.nvme.1.%desc: Generic NVMe Device
dev.nvme.1.%driver: nvme
dev.nvme.1.%location: slot=0 function=0 handle=\_SB_.PCI0.BR3B.H000
dev.nvme.1.%pnpinfo: vendor=0x8086 device=0x0953 subvendor=0x8086 subdevice=0x370a class=0x010802
dev.nvme.1.%parent: pci5
dev.nvme.1.int_coal_time: 0
dev.nvme.1.int_coal_threshold: 0
dev.nvme.1.timeout_period: 30
dev.nvme.1.num_cmds: 167
dev.nvme.1.num_intr_handler_calls: 163
dev.nvme.1.reset_stats: 0
dev.nvme.1.adminq.num_entries: 128
dev.nvme.1.adminq.num_trackers: 16
dev.nvme.1.adminq.sq_head: 12
dev.nvme.1.adminq.sq_tail: 12
dev.nvme.1.adminq.cq_head: 8
dev.nvme.1.adminq.num_cmds: 12
dev.nvme.1.adminq.num_intr_handler_calls: 8
dev.nvme.1.adminq.dump_debug: 0
dev.nvme.1.ioq0.num_entries: 256
dev.nvme.1.ioq0.num_trackers: 128
dev.nvme.1.ioq0.sq_head: 155
dev.nvme.1.ioq0.sq_tail: 155
dev.nvme.1.ioq0.cq_head: 155
dev.nvme.1.ioq0.num_cmds: 155
dev.nvme.1.ioq0.num_intr_handler_calls: 155
dev.nvme.1.ioq0.dump_debug: 0

Best regards,
Vintila Mihai Alexandru

On 1/17/2015 12:13 AM, Barney Wolff wrote:
> I suspect Linux defaults to noatime - at least it does on my rpi. I
> believe the FreeBSD default is the other way. That may explain some
> of the difference.
>
> Also, did you use gnop to force the zpool to start on a 4k boundary?
> If not, and the zpool happens to be offset, that's another big hit.
> Same for ufs, especially if the disk has logical sectors of 512 but
> physical of 4096. One can complain that FreeBSD should prevent, or
> at least warn about, this sort of foot-shooting.
>
> On Fri, Jan 16, 2015 at 10:21:07PM +0200, Mihai-Alexandru Vintila wrote:
>> @Barney Wolff it's a new pool with only changes recordsize=4k and
>> compression=lz4 . On linux test is on ext4 with default values. Penalty is
>> pretty high. Also there is a read penalty for read as well between ufs and
>> zfs. Even on nvmecontrol perftest you can see the read penalty it's not
>> normal to have same result for both write and read
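
For reference, below is a minimal sketch of the 4k-alignment setup mentioned at the top of this message. The device and pool names (nvd0, tank) are placeholders, not taken from this thread, and it assumes a FreeBSD 10.x box where vfs.zfs.min_auto_ashift is available (the dump above shows it at 9); adjust to the actual system before running anything.

  # Placeholder names: nvd0 = NVMe block device, tank = pool.
  diskinfo -v /dev/nvd0            # shows sectorsize / stripesize reported by the drive

  # Option 1: temporary 4k gnop provider, removed after pool creation
  gnop create -S 4096 /dev/nvd0    # expose the device with 4096-byte sectors
  zpool create tank /dev/nvd0.nop  # pool inherits ashift=12 from the .nop provider
  zpool export tank
  gnop destroy /dev/nvd0.nop       # drop the shim; the pool keeps its ashift
  zpool import tank

  # Option 2: raise the minimum ashift before creating the pool (no gnop needed)
  sysctl vfs.zfs.min_auto_ashift=12
  zpool create tank /dev/nvd0

  # Verify the alignment and disable atime for the benchmark runs
  zdb -C tank | grep ashift        # expect ashift: 12
  zfs set atime=off tank

Either path should end up with ashift=12 on the vdev; the gnop route is the one described above, while the sysctl route simply avoids the temporary provider.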