Date:      Wed, 19 Dec 2012 15:54:51 +0200
From:      Konstantin Belousov <kostikbel@gmail.com>
To:        arch@freebsd.org
Subject:   Unmapped I/O
Message-ID:  <20121219135451.GU71906@kib.kiev.ua>

One of the known FreeBSD I/O path performance bottlenecks is the
necessity to map each I/O buffer's pages into KVA.  The problem is that
on multi-core machines, the mapping must flush the TLB on all cores,
due to the global mapping of the buffer pages into the kernel.  This
means that buffer creation and destruction disrupt execution of all
other cores to perform TLB shootdown through IPI, and the thread
initiating the shootdown must wait for all other cores to execute and
report.

The patch at
http://people.freebsd.org/~kib/misc/unmapped.4.patch
implements 'unmapped buffers'.  This is the ability to create a
VMIO struct buf which does not point to a KVA mapping of the buffer
pages to kernel addresses.  Since there is no mapping, the kernel does
not need to clear the TLB.  Unmapped buffers are marked with the new
B_NOTMAPPED flag, and should be requested explicitly by passing the
GB_NOTMAPPED flag to the buffer allocation routines.  If a mapped
buffer is requested but an unmapped buffer already exists, the buffer
subsystem automatically maps the pages.
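
To illustrate the flag semantics, here is a minimal userspace sketch.
It is not code from the patch: the MODEL_* constants, struct model_buf
and both functions are hypothetical stand-ins, and a static array plays
the role of KVA.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for B_NOTMAPPED and GB_NOTMAPPED. */
#define MODEL_B_NOTMAPPED   0x01  /* buffer state: no KVA mapping */
#define MODEL_GB_NOTMAPPED  0x01  /* request: caller accepts unmapped */

struct model_buf {
	int   b_flags;
	void *b_kva;   /* NULL while the buffer is unmapped */
};

static char model_kva_window[4096];  /* pretend kernel address range */

/* Create a buffer; map it only if the caller did not opt out. */
static void
model_buf_create(struct model_buf *bp, int gbflags)
{
	if (gbflags & MODEL_GB_NOTMAPPED) {
		bp->b_flags = MODEL_B_NOTMAPPED;
		bp->b_kva = NULL;
	} else {
		bp->b_flags = 0;
		bp->b_kva = model_kva_window;
	}
}

/* Look up an existing buffer; map on demand if a mapped one is wanted. */
static void
model_buf_get(struct model_buf *bp, int gbflags)
{
	if ((gbflags & MODEL_GB_NOTMAPPED) == 0 &&
	    (bp->b_flags & MODEL_B_NOTMAPPED) != 0) {
		bp->b_kva = model_kva_window;
		bp->b_flags &= ~MODEL_B_NOTMAPPED;
	}
}
```

A caller that never passes the opt-in flag always sees a mapped buffer
in this model, so existing consumers would not need to change.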

The clustering code is also made aware of the unmapped buffers, but
this required a KPI change, which accounts for the diff in the non-UFS
filesystems.

UFS is adapted to request unmapped buffers when the kernel does not
need to access the content, i.e. mostly for file data.  The new helper
function vn_io_fault_pgmove() operates on an unmapped array of pages.
It calls the new pmap method pmap_copy_pages() to move the data to and
from usermode.
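
The core of such a copy is walking two page arrays at independent byte
offsets, splitting the transfer wherever either side crosses a page
boundary.  A minimal userspace sketch follows, assuming plain pointers
in place of physical pages; model_copy_pages() and MODEL_PAGE_SIZE are
hypothetical names and this is not the pmap_copy_pages() code from the
patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4096  /* assumed page size for the model */

/*
 * Copy len bytes from byte offset src_off in the page array src[] to
 * byte offset dst_off in the page array dst[].  Each chunk is clipped
 * so that it stays inside one source page and one destination page.
 */
static void
model_copy_pages(unsigned char *src[], size_t src_off,
    unsigned char *dst[], size_t dst_off, size_t len)
{
	while (len > 0) {
		size_t s_in = src_off % MODEL_PAGE_SIZE;
		size_t d_in = dst_off % MODEL_PAGE_SIZE;
		size_t cnt = len;

		if (cnt > MODEL_PAGE_SIZE - s_in)
			cnt = MODEL_PAGE_SIZE - s_in;
		if (cnt > MODEL_PAGE_SIZE - d_in)
			cnt = MODEL_PAGE_SIZE - d_in;

		memcpy(dst[dst_off / MODEL_PAGE_SIZE] + d_in,
		    src[src_off / MODEL_PAGE_SIZE] + s_in, cnt);

		src_off += cnt;
		dst_off += cnt;
		len -= cnt;
	}
}
```

In the kernel the per-page access is the architecture-specific part;
the chunking logic shown here is independent of how a page is reached.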

Besides unmapped buffers, unmapped BIOs are introduced, marked
with the flag BIO_NOTMAPPED.  Unmapped buffers are directly translated
to unmapped BIOs.  Geom providers may indicate that they accept
unmapped BIOs.  If a provider does not handle unmapped i/o requests,
geom now automatically establishes a transient mapping for the i/o
pages.
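
The fallback decision can be sketched in userspace as below.  This is
only a model under stated assumptions: struct model_provider, struct
model_bio, MODEL_BIO_NOTMAPPED and model_io_request() are hypothetical
names, and a static array stands in for the transient mapping submap.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_BIO_NOTMAPPED 0x01  /* stand-in for BIO_NOTMAPPED */

struct model_provider {
	int accepts_unmapped;    /* provider advertised unmapped support */
};

struct model_bio {
	int   bio_flags;
	void *bio_data;          /* NULL while the request is unmapped */
	int   transient_mapped;  /* set when the fallback path ran */
};

static char model_transient_window[4096];  /* pretend transient KVA */

/*
 * Hand a request to a provider.  An unmapped request is passed through
 * untouched when the provider accepts it; otherwise it is mapped first.
 */
static void
model_io_request(struct model_bio *bp, const struct model_provider *pp)
{
	if ((bp->bio_flags & MODEL_BIO_NOTMAPPED) != 0 &&
	    !pp->accepts_unmapped) {
		bp->bio_data = model_transient_window;
		bp->bio_flags &= ~MODEL_BIO_NOTMAPPED;
		bp->transient_mapped = 1;
	}
}
```

With this shape, a driver that knows nothing about unmapped requests
only ever sees mapped ones.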

Swap- and malloc-backed md(4) is changed to accept unmapped BIOs.  The
gpart providers indicate unmapped BIO support if the underlying
provider can do unmapped i/o.  I also hacked ahci(4) to handle
unmapped i/o, but this should be changed after Jeff's physbio patch
is committed, to use the proper busdma interface.

Besides, the swap pager does unmapped swapping if the swap partition
indicates that it can do unmapped i/o.  At Jeff's request, the buffer
allocation code may reserve the KVA for an unmapped buffer in advance.
Unmapped page-in for the vnode pager is also implemented if the
filesystem supports it, but page-out is not.  Page-out, as well
as the vnode-backed md(4), currently requires mappings, mostly due to
the use of VOP_WRITE().

As such, the patch worked in my test environment, where I used
ahci-attached SATA disks with gpt partitions, md(4) and UFS.  I see no
statistically significant difference in buildworld -j 10 times on
a 4-core machine with HT.  On the other hand, when doing sha1 over
a 5GB file, the system time was reduced by 30%.

Unfinished items:
- Integration with physbio; this will be done after physbio is
  committed to HEAD.
- The key per-architecture function needed for unmapped i/o is
  pmap_copy_pages().  I have implemented it for amd64 and i386 so far;
  it must be done for all other architectures.
- The sizing of the submap used for transient mapping of the BIOs is
  naive.  It should be adjusted, esp. for KVA-lean architectures.
- Conversion of the other filesystems. Low priority.

I am interested in reviews, tests and suggestions.  Note that this
only works now for md(4) and ahci(4); for other drivers the patched
kernel should fall back to mapped i/o.

 sys/amd64/amd64/pmap.c         |  24 +++
 sys/cam/ata/ata_da.c           |   5 +-
 sys/cam/cam_ccb.h              |  30 ++++
 sys/dev/ahci/ahci.c            |  53 +++++-
 sys/dev/md/md.c                | 255 ++++++++++++++++++++++++-----
 sys/fs/cd9660/cd9660_vnops.c   |   2 +-
 sys/fs/ext2fs/ext2_balloc.c    |   2 +-
 sys/fs/ext2fs/ext2_vnops.c     |   9 +-
 sys/fs/msdosfs/msdosfs_vnops.c |   4 +-
 sys/fs/udf/udf_vnops.c         |   5 +-
 sys/geom/geom.h                |   1 +
 sys/geom/geom_disk.c           |   2 +
 sys/geom/geom_disk.h           |   1 +
 sys/geom/geom_io.c             |  44 ++++-
 sys/geom/geom_vfs.c            |  10 +-
 sys/geom/part/g_part.c         |   1 +
 sys/i386/i386/pmap.c           |  42 +++++
 sys/kern/vfs_bio.c             | 356 +++++++++++++++++++++++++++++++++--------
 sys/kern/vfs_cluster.c         | 118 +++++++-------
 sys/kern/vfs_vnops.c           |  39 +++++
 sys/sys/bio.h                  |   7 +
 sys/sys/buf.h                  |  22 ++-
 sys/sys/mount.h                |   1 +
 sys/sys/vnode.h                |   2 +
 sys/ufs/ffs/ffs_alloc.c        |  10 +-
 sys/ufs/ffs/ffs_balloc.c       |  58 ++++---
 sys/ufs/ffs/ffs_vfsops.c       |   3 +-
 sys/ufs/ffs/ffs_vnops.c        |  35 ++--
 sys/ufs/ufs/ufs_extern.h       |   1 +
 sys/vm/pmap.h                  |   2 +
 sys/vm/swap_pager.c            |  43 +++--
 sys/vm/swap_pager.h            |   1 +
 sys/vm/vm.h                    |   2 +
 sys/vm/vm_init.c               |   6 +-
 sys/vm/vm_kern.c               |   9 +-
 sys/vm/vnode_pager.c           |  30 +++-
 36 files changed, 989 insertions(+), 246 deletions(-)

