Date:      Mon, 11 Jan 2016 14:52:32 -0500
From:      "Kenneth D. Merry" <ken@FreeBSD.ORG>
To:        Roger Pau Monné <royger@FreeBSD.org>
Cc:        src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org, gibbs@freebsd.org
Subject:   Re: svn commit: r291716 - in head: share/man/man4 sys/cam sys/cam/ata sys/cam/scsi sys/dev/md sys/geom sys/kern sys/pc98/include sys/sys usr.sbin usr.sbin/camdd
Message-ID:  <20160111195231.GA57278@mithlond.kdm.org>
In-Reply-To: <5693E672.7080100@FreeBSD.org>
References:  <201512032054.tB3KsuUw037541@repo.freebsd.org> <5693E672.7080100@FreeBSD.org>

On Mon, Jan 11, 2016 at 18:29:22 +0100, Roger Pau Monné wrote:
> On 03/12/15 at 21:54, Kenneth D. Merry wrote:
> > Author: ken
> > Date: Thu Dec  3 20:54:55 2015
> > New Revision: 291716
> > URL: https://svnweb.freebsd.org/changeset/base/291716
> > 
> > Log:
> >   Add asynchronous command support to the pass(4) driver, and the new
> >   camdd(8) utility.
> >   
> >   CCBs may be queued to the driver via the new CAMIOQUEUE ioctl, and
> >   completed CCBs may be retrieved via the CAMIOGET ioctl.  User
> >   processes can use poll(2) or kevent(2) to get notification when
> >   I/O has completed.
> >   
> >   While the existing CAMIOCOMMAND blocking ioctl interface only
> >   supports user virtual data pointers in a CCB (generally only
> >   one per CCB), the new CAMIOQUEUE ioctl supports user virtual and
> >   physical address pointers, as well as user virtual and physical
> >   scatter/gather lists.  This allows user applications to have more
> >   flexibility in their data handling operations.
> >   
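
For anyone following along, the userland side of this ends up looking
roughly like the sketch below.  It is an outline only (error handling is
minimal, the CDB/CCB setup is elided, and the details may differ slightly
from what camdd(8) actually does), but it shows the queue/wait/retrieve
pattern:

    #include <sys/types.h>
    #include <sys/ioctl.h>
    #include <sys/event.h>
    #include <sys/time.h>
    #include <err.h>
    #include <camlib.h>
    #include <cam/scsi/scsi_pass.h>

    /*
     * Queue one CCB (already filled in by the caller, e.g. with
     * cam_fill_csio() and a plain CAM_DATA_VADDR buffer) and wait for
     * its completion with kevent(2).
     */
    static void
    queue_and_wait(struct cam_device *dev, union ccb *ccb)
    {
            struct kevent kev;
            int kq;

            /* CAMIOQUEUE returns once the CCB is accepted, not completed. */
            if (ioctl(dev->fd, CAMIOQUEUE, ccb) == -1)
                    err(1, "CAMIOQUEUE");

            if ((kq = kqueue()) == -1)
                    err(1, "kqueue");
            EV_SET(&kev, dev->fd, EVFILT_READ, EV_ADD, 0, 0, NULL);
            if (kevent(kq, &kev, 1, NULL, 0, NULL) == -1)
                    err(1, "kevent register");

            /* Block until the done queue is non-empty (poll(2) also works). */
            if (kevent(kq, NULL, 0, &kev, 1, NULL) == -1)
                    err(1, "kevent wait");

            /* Retrieve the completed CCB. */
            if (ioctl(dev->fd, CAMIOGET, ccb) == -1)
                    err(1, "CAMIOGET");
    }

camdd(8) keeps a number of CCBs in flight at once, but the per-CCB pattern
is the same.
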
> >   Kernel memory for data transferred via the queued interface is
> >   allocated from the zone allocator in MAXPHYS sized chunks, and user
> >   data is copied in and out.  This is likely faster than the
> >   vmapbuf()/vunmapbuf() method used by the CAMIOCOMMAND ioctl in
> >   configurations with many processors (there are more TLB shootdowns
> >   caused by the mapping/unmapping operation) but may not be as fast
> >   as running with unmapped I/O.
> >   
> >   The new memory handling model for user requests also allows
> >   applications to send CCBs with request sizes that are larger than
> >   MAXPHYS.  The pass(4) driver now limits queued requests to the I/O
> >   size listed by the SIM driver in the maxio field in the Path
> >   Inquiry (XPT_PATH_INQ) CCB.
> >   
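
As an aside, the maxio limit is easy for an application to query before it
sizes its requests.  Roughly (a sketch using the existing synchronous
libcam path, nothing that this commit adds):

    #include <stdint.h>
    #include <strings.h>
    #include <err.h>
    #include <camlib.h>

    /* Ask the SIM how large a single transfer it will accept. */
    static uint32_t
    get_maxio(struct cam_device *dev)
    {
            union ccb *ccb;
            uint32_t maxio;

            if ((ccb = cam_getccb(dev)) == NULL)
                    errx(1, "cam_getccb failed");
            /* Clear everything below the header, as camcontrol(8) does. */
            bzero(&(&ccb->ccb_h)[1],
                sizeof(union ccb) - sizeof(struct ccb_hdr));
            ccb->ccb_h.func_code = XPT_PATH_INQ;
            if (cam_send_ccb(dev, ccb) < 0)
                    err(1, "XPT_PATH_INQ");
            /* Note: a SIM that does not report maxio may leave it at 0. */
            maxio = ccb->cpi.maxio;
            cam_freeccb(ccb);
            return (maxio);
    }
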
> >   There are some things that would be good to add:
> >   
> >   1. Come up with a way to do unmapped I/O on multiple buffers.
> >      Currently the unmapped I/O interface operates on a struct bio,
> >      which includes only one address and length.  It would be nice
> >      to be able to send an unmapped scatter/gather list down to
> >      busdma.  This would allow eliminating the copy we currently do
> >      for data.
> >   
> >   2. Add an ioctl to list currently outstanding CCBs in the various
> >      queues.
> >   
> >   3. Add an ioctl to cancel a request, or use the XPT_ABORT CCB to do
> >      that.
> >   
> >   4. Test physical address support.  Virtual pointers and
> >      scatter/gather lists have been tested, but I have not yet
> >      tested physical addresses or physical scatter/gather lists.
> >   
> >   5. Investigate multiple queue support.  At the moment there is one
> >      queue of commands per pass(4) device.  If multiple processes
> >      open the device, they will submit I/O into the same queue and
> >      get events for the same completions.  This is probably the right
> >      model for most applications, but it is something that could be
> >      changed later on.
> >   
> >   Also, add a new utility, camdd(8) that uses the asynchronous pass(4)
> >   driver interface.
> >   
> >   This utility is intended to be a basic data transfer/copy utility,
> >   a simple benchmark utility, and an example of how to use the
> >   asynchronous pass(4) interface.
> >   
> >   It can copy data to and from pass(4) devices using any target queue
> >   depth, starting offset and blocksize for the input and output devices.
> >   It currently only supports SCSI devices, but could be easily extended
> >   to support ATA devices.
> >   
> >   It can also copy data to and from regular files, block devices, tape
> >   devices, pipes, stdin, and stdout.  It does not support queueing
> >   multiple commands to any of those targets, since it uses the standard
> >   read(2)/write(2)/writev(2)/readv(2) system calls.
> >   
> >   The I/O is done by two threads, one for the reader and one for the
> >   writer.  The reader thread sends completed read requests to the
> >   writer thread in strictly sequential order, even if they complete
> >   out of order.  That could be modified later on for random I/O patterns
> >   or slightly out of order I/O.
> >   
> >   camdd(8) uses kqueue(2)/kevent(2) to get I/O completion events from
> >   the pass(4) driver and also to send request notifications internally.
> >   
> >   For pass(4) devices, camdd(8) uses a single buffer (CAM_DATA_VADDR)
> >   per CAM CCB on the reading side, and a scatter/gather list
> >   (CAM_DATA_SG) on the writing side.  In addition to testing both
> >   interfaces, this makes any potential reblocking of I/O easier.  No
> >   data is copied between the reader and the writer, but rather the
> >   reader's buffers are split into multiple I/O requests or combined
> >   into a single I/O request depending on the input and output blocksize.
> >   
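
For reference, the CAM_DATA_SG side of that is just an array of
bus_dma_segment_t entries whose ds_addr fields hold user virtual
addresses.  Roughly (a sketch, not code lifted from camdd(8); the rest of
the CCB, CDB included, is assumed to be filled in separately):

    #include <sys/types.h>
    #include <stdint.h>
    #include <machine/bus.h>    /* bus_dma_segment_t, as camdd.c pulls it in */
    #include <cam/cam.h>
    #include <cam/cam_ccb.h>

    /*
     * Describe two user buffers as a single scatter/gather write.
     * The segment array must remain valid until the CCB has been queued.
     */
    static void
    fill_sg_write(union ccb *ccb, bus_dma_segment_t *segs,
        void *buf1, size_t len1, void *buf2, size_t len2)
    {
            segs[0].ds_addr = (bus_addr_t)(uintptr_t)buf1;
            segs[0].ds_len = len1;
            segs[1].ds_addr = (bus_addr_t)(uintptr_t)buf2;
            segs[1].ds_len = len2;

            /* Point the CCB at the list instead of a flat buffer. */
            ccb->ccb_h.flags &= ~CAM_DATA_MASK;
            ccb->ccb_h.flags |= CAM_DATA_SG | CAM_DIR_OUT;
            ccb->csio.data_ptr = (uint8_t *)segs;
            ccb->csio.dxfer_len = len1 + len2;
            ccb->csio.sglist_cnt = 2;
    }
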
> >   For the file I/O path, camdd(8) also uses a single buffer (read(2),
> >   write(2), pread(2) or pwrite(2)) on reads, and a scatter/gather list
> >   (readv(2), writev(2), preadv(2), pwritev(2)) on writes.
> >   
> >   Things that would be nice to do for camdd(8) eventually:
> >   
> >   1.  Add support for I/O pattern generation.  Patterns like all
> >       zeros, all ones, LBA-based patterns, random patterns, etc. Right
> >       now you can always use /dev/zero, /dev/random, etc.
> >   
> >   2.  Add support for a "sink" mode, so we do only reads with no
> >       writes.  Right now, you can use /dev/null.
> >   
> >   3.  Add support for automatic queue depth probing, so that we can
> >       figure out the right queue depth on the input and output side
> >       for maximum throughput.  At the moment it defaults to 6.
> >   
> >   4.  Add support for SATA device passthrough I/O.
> >   
> >   5.  Add support for random LBAs and/or lengths on the input and
> >       output sides.
> >   
> >   6.  Track average per-I/O latency and busy time.  The busy time
> >       and latency could also feed in to the automatic queue depth
> >       determination.
> >   
> >   sys/cam/scsi/scsi_pass.h:
> >   	Define two new ioctls, CAMIOQUEUE and CAMIOGET, that queue
> >   	and fetch asynchronous CAM CCBs respectively.
> >   
> >   	Although these ioctls do not have a declared argument, they
> >   	both take a union ccb pointer.  If we declare a size here,
> >   	the ioctl code in sys/kern/sys_generic.c will malloc and free
> >   	a buffer for either the CCB or the CCB pointer (depending on
> >   	how it is declared).  Since we have to keep a copy of the
> >   	CCB (which is fairly large) anyway, having the ioctl malloc
> >   	and free a CCB for each call is wasteful.
> >   
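
To make that concrete, this is roughly the difference between a sized and
an unsized declaration (the command numbers here are illustrative; see the
real definitions in scsi_pass.h):

    /*
     * From <sys/ioccom.h>: a sized ioctl makes the generic ioctl code
     * allocate a temporary kernel buffer and copy the argument in and
     * out around the driver's ioctl handler...
     */
    #define CAMIOCOMMAND    _IOWR(CAM_VERSION, 2, union ccb)

    /*
     * ...while an unsized ioctl hands the driver only the pointer value,
     * and the driver does its own copyin()/copyout() of the CCB.
     */
    #define CAMIOQUEUE      _IO(CAM_VERSION, 4)
    #define CAMIOGET        _IO(CAM_VERSION, 5)
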
> >   sys/cam/scsi/scsi_pass.c:
> >   	Add asynchronous CCB support.
> >   
> >   	Add two new ioctls, CAMIOQUEUE and CAMIOGET.
> >   
> >   	CAMIOQUEUE adds a CCB to the incoming queue.  The CCB is
> >   	executed immediately (and moved to the active queue) if it
> >   	is an immediate CCB, but otherwise it will be executed
> >   	in passstart() when a CCB is available from the transport layer.
> >   
> >   	When CCBs are completed (immediately for immediate CCBs, or in
> >   	passdone() for queued CCBs), they are put on the done queue.
> >   
> >   	If we get the final close on the device before all pending
> >   	I/O is complete, all active I/O is moved to the abandoned
> >   	queue and we increment the peripheral reference count so
> >   	that the peripheral driver instance doesn't go away before
> >   	all pending I/O is done.
> >   
> >   	The new passcreatezone() function is called on the first
> >   	call to the CAMIOQUEUE ioctl on a given device to allocate
> >   	the UMA zones for I/O requests and S/G list buffers.  This
> >   	may be good to move off to a taskqueue at some point.
> >   
> >   	The new passmemsetup() function allocates memory and
> >   	scatter/gather lists to hold the user's data, and copies
> >   	in any data that needs to be written.  For virtual pointers
> >   	(CAM_DATA_VADDR), the kernel buffer is malloced from the
> >   	new pass(4) driver malloc bucket.  For virtual
> >   	scatter/gather lists (CAM_DATA_SG), buffers are allocated
> >   	from a new per-pass(4) UMA zone in MAXPHYS-sized chunks.
> >   	Physical pointers are passed in unchanged.  We have support
> >   	for up to 16 scatter/gather segments (for the user and
> >   	kernel S/G lists) in the default struct pass_io_req, so
> >   	requests with longer S/G lists require an extra kernel malloc.
> >   
> >   	The new passcopysglist() function copies a user scatter/gather
> >   	list to a kernel scatter/gather list.  The number of elements
> >   	in each list may be different, but (obviously) the amount of data
> >   	stored has to be identical.
> >   
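
The walk passcopysglist() does is the classic two-cursor copy between
lists whose segment counts may differ.  Stripped of the copyin()/copyout()
details, and using struct iovec here purely for illustration, it amounts
to something like:

    #include <sys/types.h>
    #include <sys/param.h>
    #include <sys/uio.h>
    #include <string.h>

    /* Copy between two S/G lists that describe the same total length. */
    static void
    copy_sglist(const struct iovec *src, int src_cnt,
        struct iovec *dst, int dst_cnt)
    {
            size_t soff = 0, doff = 0, len;
            int si = 0, di = 0;

            while (si < src_cnt && di < dst_cnt) {
                    /* Copy as much as both current segments allow. */
                    len = MIN(src[si].iov_len - soff, dst[di].iov_len - doff);
                    memcpy((char *)dst[di].iov_base + doff,
                        (const char *)src[si].iov_base + soff, len);
                    soff += len;
                    doff += len;
                    /* Advance whichever segment is exhausted. */
                    if (soff == src[si].iov_len) {
                            si++;
                            soff = 0;
                    }
                    if (doff == dst[di].iov_len) {
                            di++;
                            doff = 0;
                    }
            }
    }
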
> >   	The new passmemdone() function copies data out for the
> >   	CAM_DATA_VADDR and CAM_DATA_SG cases.
> >   
> >   	The new passiocleanup() function restores data pointers in
> >   	user CCBs and frees memory.
> >   
> >   	Add new functions to support kqueue(2)/kevent(2):
> >   
> >   	passreadfilt() tells kevent whether or not the done
> >   	queue is empty.
> >   
> >   	passkqfilter() adds a knote to our list.
> >   
> >   	passreadfiltdetach() removes a knote from our list.
> >   
> >   	Add a new function, passpoll(), for poll(2)/select(2)
> >   	to use.
> >   
> >   	Add devstat(9) support for the queued CCB path.
> >   
> >   sys/cam/ata/ata_da.c:
> >   	Add support for the BIO_VLIST bio type.
> >   
> >   sys/cam/cam_ccb.h:
> >   	Add a new enumeration for the xflags field in the CCB header.
> >   	(This doesn't change the CCB header, just adds an enumeration to
> >   	use.)
> >   
> >   sys/cam/cam_xpt.c:
> >   	Add a new function, xpt_setup_ccb_flags(), that allows specifying
> >   	CCB flags.
> >   
> >   sys/cam/cam_xpt.h:
> >   	Add a prototype for xpt_setup_ccb_flags().
> >   
> >   sys/cam/scsi/scsi_da.c:
> >   	Add support for BIO_VLIST.
> >   
> >   sys/dev/md/md.c:
> >   	Add BIO_VLIST support to md(4).
> >   
> >   sys/geom/geom_disk.c:
> >   	Add BIO_VLIST support to the GEOM disk class.  Re-factor the I/O size
> >   	limiting code in g_disk_start() a bit.
> >   
> >   sys/kern/subr_bus_dma.c:
> >   	Change _bus_dmamap_load_vlist() to take a starting offset and
> >   	length.
> >   
> >   	Add a new function, _bus_dmamap_load_pages(), that will load a list
> >   	of physical pages starting at an offset.
> 
> (cannot comment inline because the diff is truncated, sorry).
> 
> This is bogus, and defeats the purpose of _bus_dmamap_load_ma by 
> reintroducing the mechanics already found in bus_dmamap_load_ma_triv. 
> See r290610, which was the commit that introduced a proper 
> implementation of bus_dmamap_load_ma in x86.
> 
> The problem with the approach taken both in _bus_dmamap_load_pages and 
> bus_dmamap_load_ma_triv is that they split the request buffer into 
> arbitrary chunks based on page boundaries, creating segments that no 
> longer have a size that's a multiple of the sector size. This breaks 
> drivers like blkfront (and probably other stuff).

Sorry, the breakage was an oversight.

As for the history here, gibbs@ originally introduced the load_pages()
function into the Spectra tree in October 2014.  He was doing development
on a tree that did not yet have _bus_dmamap_load_ma(), so he wrote
something similar and simply chose a different name,
_bus_dmamap_load_pages().

I obviously didn't pay enough attention when putting it into the tree.

> The following patch solves the problem AFAICT, and I would like to 
> commit it ASAP:

I think this should be fine.

> Roger.
> ---
> diff --git a/sys/kern/subr_bus_dma.c b/sys/kern/subr_bus_dma.c
> index ae30276..cdf176b 100644
> --- a/sys/kern/subr_bus_dma.c
> +++ b/sys/kern/subr_bus_dma.c
> @@ -131,28 +131,6 @@ _bus_dmamap_load_mbuf_sg(bus_dma_tag_t dmat, bus_dmamap_t map,
>  }
>  
>  /*
> - * Load tlen data starting at offset within a region specified by a list of
> - * physical pages.
> - */
> -static int
> -_bus_dmamap_load_pages(bus_dma_tag_t dmat, bus_dmamap_t map,
> -    vm_page_t *pages, bus_size_t tlen, int offset, int *nsegs, int flags)
> -{
> -	vm_paddr_t paddr;
> -	bus_size_t len;
> -	int error, i;
> - 
> -	for (i = 0, error = 0; error == 0 && tlen > 0; i++, tlen -= len) {
> -		len = min(PAGE_SIZE - offset, tlen);
> -		paddr = VM_PAGE_TO_PHYS(pages[i]) + offset;
> -		error = _bus_dmamap_load_phys(dmat, map, paddr, len,
> -		    flags, NULL, nsegs);
> -		offset = 0;
> -	}
> -	return (error);
> -}
> - 
> -/*
>   * Load from block io.
>   */
>  static int
> @@ -168,8 +146,8 @@ _bus_dmamap_load_bio(bus_dma_tag_t dmat, bus_dmamap_t map, struct bio *bio,
>  	}
>  
>  	if ((bio->bio_flags & BIO_UNMAPPED) != 0)
> -		return (_bus_dmamap_load_pages(dmat, map, bio->bio_ma,
> -		    bio->bio_bcount, bio->bio_ma_offset, nsegs, flags));
> +		return (_bus_dmamap_load_ma(dmat, map, bio->bio_ma,
> +		    bio->bio_bcount, bio->bio_ma_offset, flags, NULL, nsegs));
>  
>  	return (_bus_dmamap_load_buffer(dmat, map, bio->bio_data,
>  	    bio->bio_bcount, kernel_pmap, flags, NULL, nsegs));
> 
> 
> 
> 

Ken
-- 
Kenneth Merry
ken@FreeBSD.ORG


