Date: Thu, 30 Jun 2016 14:06:25 +1000
From: Paul Koch <paul.koch137@gmail.com>
To: freebsd-hackers@freebsd.org
Subject: ZFS ARC and mmap/page cache coherency question
Message-ID: <20160630140625.3b4aece3@splash.akips.com>

Posted this to -stable on the 15th of June, but got no feedback...

We are trying to understand a performance issue when syncing large
mmap'ed files on ZFS.

Example test box setup:

  FreeBSD 10.3-p5
  Intel i7-5820K 3.30GHz with 64G RAM
  6 * 2 Tbyte Seagate ST2000DM001-1ER164 in a ZFS stripe

Read performance of a sequentially written large file on the pool is
typically around 950 Mbytes/sec using dd.

Our software mmap's some large database files using MAP_NOSYNC, and we
call fsync() every 10 minutes when we know the file system is mostly
idle. In our test setup, the database files are 1.1G, 2G, 1.4G, 12G
and 4.7G, plus ~20 small files (under 10M each). Every minute, all of
the memory pages in the mmap'ed files are updated with new values, so
the entire contents of each file needs to be synced to disk, not just
fragments.
When the 10-minute fsync() occurs, gstat typically shows very little
disk reading and very high write speeds, which is what we expect.

But every 80 minutes we process the data in the large mmap'ed files
and store it in highly compressed blocks of a ~300G file using
pread()/pwrite() (i.e. not mmap'ed). After that, the performance of
the next fsync() of the mmap'ed files falls off a cliff. We assume
this is because the ARC has thrown away the cached data of the
mmap'ed files. gstat shows lots of read/write contention, and lots of
things tend to stall waiting for the disks.

Is this just a lack of coherency between the ZFS ARC and the page
cache? Is there a way to prime the ARC with the mmap'ed files again
before we call fsync()? We've tried cat and read() on the mmap'ed
files, but that doesn't seem to touch the disks at all and the
fsync() performance is still poor, so it looks like the ARC is not
being refilled. msync() doesn't seem to behave much differently.
mincore() stats show that the mmap'ed data is entirely incore and
referenced.

	Paul.