From: Konstantin Belousov
To: arch@freebsd.org
Date: Wed, 19 Dec 2012 15:54:51 +0200
Subject: Unmapped I/O
Message-ID: <20121219135451.GU71906@kib.kiev.ua>

One of the known performance bottlenecks in the FreeBSD I/O path is the necessity to map the pages of each I/O buffer into KVA. On multi-core machines, the mapping must flush the TLB on all cores, because the buffer pages are mapped globally into the kernel. This means that buffer creation and destruction disrupts execution on all other cores, which are interrupted by an IPI to perform the TLB shootdown, and the thread initiating the shootdown must wait for all other cores to execute it and report back.

The patch at http://people.freebsd.org/~kib/misc/unmapped.4.patch implements 'unmapped buffers', i.e. the ability to create a VMIO struct buf that does not point to a KVA mapping of the buffer pages into kernel addresses. Since there is no mapping, the kernel does not need to flush the TLB. Unmapped buffers are marked with the new B_NOTMAPPED flag and must be requested explicitly by passing the GB_NOTMAPPED flag to the buffer allocation routines. If a mapped buffer is requested but an unmapped buffer already exists, the buffer subsystem maps the pages automatically.

The clustering code is also made aware of unmapped buffers, but this required a KPI change, which accounts for the diffs in the non-UFS filesystems. UFS is adapted to request unmapped buffers when the kernel does not need to access the content, i.e. mostly for file data. The new helper function vn_io_fault_pgmove() operates on an unmapped array of pages. It calls the new pmap method pmap_copy_pages() to move the data to and from usermode.

Besides unmapped buffers, unmapped BIOs are introduced, marked with the flag BIO_NOTMAPPED. Unmapped buffers are translated directly into unmapped BIOs. GEOM providers may indicate that they accept unmapped BIOs. If a provider does not handle unmapped I/O requests, GEOM now automatically establishes a transient mapping for the I/O pages. Swap- and malloc-backed md(4) are changed to accept unmapped BIOs. The gpart providers indicate support for unmapped BIOs if the underlying provider can do unmapped I/O.
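To make the GEOM fallback concrete, here is a minimal user-space sketch, assuming a simplified provider that carries a single boolean capability. Only the flag name BIO_NOTMAPPED is taken from the description above; its value, the structures and every model_* name are hypothetical and do not reflect the actual patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * User-space model of the GEOM fallback for unmapped BIOs.
 * Only the flag name comes from the mail; the value is assumed.
 */
#define BIO_NOTMAPPED 0x01

struct model_provider {
    bool accepts_unmapped;   /* provider advertises unmapped BIO support */
};

struct model_bio {
    int flags;
    void *data;              /* kernel address of the data; NULL while unmapped */
    int transient_maps;      /* number of fallback mappings made (model only) */
};

/* Stands in for KVA taken from the transient-mapping submap. */
static char model_transient_kva[4096];

/*
 * Before handing a BIO to a provider: if the BIO is unmapped and the
 * provider cannot handle that, establish a transient mapping first.
 */
static void
model_start_bio(struct model_bio *bp, const struct model_provider *pp)
{
    if ((bp->flags & BIO_NOTMAPPED) != 0 && !pp->accepts_unmapped) {
        bp->data = model_transient_kva;  /* real code would map the BIO's pages here */
        bp->flags &= ~BIO_NOTMAPPED;
        bp->transient_maps++;
    }
    /* ...the BIO would now be passed to the provider's start routine... */
}

/* gpart-style propagation: a partition accepts unmapped BIOs only if its parent does. */
static void
model_part_inherit(struct model_provider *part, const struct model_provider *parent)
{
    part->accepts_unmapped = parent->accepts_unmapped;
}
```

With a disk that accepts unmapped BIOs, a gpart child inherits the capability and BIOs pass through with no KVA at all; only a provider lacking the capability pays for a transient mapping.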
I also hacked ahci(4) to handle unmapped I/O, but this should be changed to use the proper busdma interface after Jeff's physbio patch is committed. In addition, the swap pager does unmapped swapping if the swap partition indicates that it can do unmapped I/O. At Jeff's request, the buffer allocation code may reserve KVA for an unmapped buffer in advance. Unmapped page-in for the vnode pager is also implemented if the filesystem supports it, but page-out is not. Page-out, as well as vnode-backed md(4), currently requires mappings, mostly due to the use of VOP_WRITE().

As is, the patch worked in my test environment, where I used ahci-attached SATA disks with GPT partitions, md(4) and UFS. I see no statistically significant difference in buildworld -j 10 times on a 4-core machine with HT. On the other hand, when computing sha1 over a 5GB file, the system time was reduced by 30%.

Unfinished items:
- Integration with physbio; this will be done after physbio is committed to HEAD.
- The key per-architecture function needed for unmapped I/O is pmap_copy_pages(). I have implemented it for amd64 and i386 so far; it must be done for all other architectures.
- The sizing of the submap used for transient mapping of BIOs is naive. It should be adjusted, especially for KVA-lean architectures.
- Conversion of the other filesystems. Low priority.

I am interested in reviews, tests and suggestions. Note that this currently works only for md(4) and ahci(4); for other drivers the patched kernel should fall back to mapped I/O.
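Since pmap_copy_pages() is the one piece every architecture must supply, here is a hedged user-space sketch of what a copy between two page arrays involves: advancing through both arrays at independent offsets and never crossing a page boundary in either one within a single step. Pages are plain byte arrays, the 4096-byte page size is an assumption, and the model_* names and parameter list are hypothetical; a real pmap would typically reach the pages through a direct map or a short-lived per-CPU mapping instead.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4096        /* assumed page size for the model */

/* A "page" in this model is just a pointer to MODEL_PAGE_SIZE bytes. */
typedef unsigned char *model_page_t;

/*
 * Copy xfersize bytes from the page array ma, starting at byte offset
 * a_offset, to the page array mb, starting at byte offset b_offset.
 */
static void
model_copy_pages(model_page_t ma[], size_t a_offset,
    model_page_t mb[], size_t b_offset, size_t xfersize)
{
    while (xfersize > 0) {
        size_t a_pg = a_offset % MODEL_PAGE_SIZE;  /* offset within source page */
        size_t b_pg = b_offset % MODEL_PAGE_SIZE;  /* offset within destination page */
        size_t cnt = xfersize;

        /* Clamp the chunk so it stays inside both the current source and destination pages. */
        if (cnt > MODEL_PAGE_SIZE - a_pg)
            cnt = MODEL_PAGE_SIZE - a_pg;
        if (cnt > MODEL_PAGE_SIZE - b_pg)
            cnt = MODEL_PAGE_SIZE - b_pg;

        memcpy(mb[b_offset / MODEL_PAGE_SIZE] + b_pg,
            ma[a_offset / MODEL_PAGE_SIZE] + a_pg, cnt);
        a_offset += cnt;
        b_offset += cnt;
        xfersize -= cnt;
    }
}
```

Because the offsets into the two arrays are independent, a single transfer may straddle a page boundary on one side, the other, or both; clamping each chunk to the smaller remainder handles all three cases with one loop.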
 sys/amd64/amd64/pmap.c         |  24 +++
 sys/cam/ata/ata_da.c           |   5 +-
 sys/cam/cam_ccb.h              |  30 ++++
 sys/dev/ahci/ahci.c            |  53 +++++-
 sys/dev/md/md.c                | 255 ++++++++++++++++++++++++-----
 sys/fs/cd9660/cd9660_vnops.c   |   2 +-
 sys/fs/ext2fs/ext2_balloc.c    |   2 +-
 sys/fs/ext2fs/ext2_vnops.c     |   9 +-
 sys/fs/msdosfs/msdosfs_vnops.c |   4 +-
 sys/fs/udf/udf_vnops.c         |   5 +-
 sys/geom/geom.h                |   1 +
 sys/geom/geom_disk.c           |   2 +
 sys/geom/geom_disk.h           |   1 +
 sys/geom/geom_io.c             |  44 ++++-
 sys/geom/geom_vfs.c            |  10 +-
 sys/geom/part/g_part.c         |   1 +
 sys/i386/i386/pmap.c           |  42 +++++
 sys/kern/vfs_bio.c             | 356 +++++++++++++++++++++++++++++++++--------
 sys/kern/vfs_cluster.c         | 118 +++++++-------
 sys/kern/vfs_vnops.c           |  39 +++++
 sys/sys/bio.h                  |   7 +
 sys/sys/buf.h                  |  22 ++-
 sys/sys/mount.h                |   1 +
 sys/sys/vnode.h                |   2 +
 sys/ufs/ffs/ffs_alloc.c        |  10 +-
 sys/ufs/ffs/ffs_balloc.c       |  58 ++++---
 sys/ufs/ffs/ffs_vfsops.c       |   3 +-
 sys/ufs/ffs/ffs_vnops.c        |  35 ++--
 sys/ufs/ufs/ufs_extern.h       |   1 +
 sys/vm/pmap.h                  |   2 +
 sys/vm/swap_pager.c            |  43 +++--
 sys/vm/swap_pager.h            |   1 +
 sys/vm/vm.h                    |   2 +
 sys/vm/vm_init.c               |   6 +-
 sys/vm/vm_kern.c               |   9 +-
 sys/vm/vnode_pager.c           |  30 +++-
 36 files changed, 989 insertions(+), 246 deletions(-)