From owner-svn-src-head@freebsd.org Mon Jan 11 19:52:41 2016 Return-Path: Delivered-To: svn-src-head@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id C3DA3A6B595; Mon, 11 Jan 2016 19:52:41 +0000 (UTC) (envelope-from ken@kdm.org) Received: from mithlond.kdm.org (mithlond.kdm.org [96.89.93.250]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "A1-33714", Issuer "A1-33714" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 7C6CF19D1; Mon, 11 Jan 2016 19:52:41 +0000 (UTC) (envelope-from ken@kdm.org) Received: from mithlond.kdm.org (localhost [127.0.0.1]) by mithlond.kdm.org (8.15.2/8.14.9) with ESMTPS id u0BJqWUY057393 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO); Mon, 11 Jan 2016 14:52:32 -0500 (EST) (envelope-from ken@mithlond.kdm.org) Received: (from ken@localhost) by mithlond.kdm.org (8.15.2/8.14.9/Submit) id u0BJqWRl057392; Mon, 11 Jan 2016 14:52:32 -0500 (EST) (envelope-from ken) Date: Mon, 11 Jan 2016 14:52:32 -0500 From: "Kenneth D. Merry" To: Roger Pau Monn?? Cc: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org, gibbs@freebsd.org Subject: Re: svn commit: r291716 - in head: share/man/man4 sys/cam sys/cam/ata sys/cam/scsi sys/dev/md sys/geom sys/kern sys/pc98/include sys/sys usr.sbin usr.sbin/camdd Message-ID: <20160111195231.GA57278@mithlond.kdm.org> References: <201512032054.tB3KsuUw037541@repo.freebsd.org> <5693E672.7080100@FreeBSD.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <5693E672.7080100@FreeBSD.org> User-Agent: Mutt/1.5.23 (2014-03-12) X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.4.3 (mithlond.kdm.org [127.0.0.1]); Mon, 11 Jan 2016 14:52:32 -0500 (EST) X-Spam-Status: No, score=-2.9 required=5.0 tests=ALL_TRUSTED,BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS autolearn=ham autolearn_force=no version=3.4.1 X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on mithlond.kdm.org X-BeenThere: svn-src-head@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: SVN commit messages for the src tree for head/-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 11 Jan 2016 19:52:41 -0000 On Mon, Jan 11, 2016 at 18:29:22 +0100, Roger Pau Monn?? wrote: > El 03/12/15 a les 21.54, Kenneth D. Merry ha escrit: > > Author: ken > > Date: Thu Dec 3 20:54:55 2015 > > New Revision: 291716 > > URL: https://svnweb.freebsd.org/changeset/base/291716 > > > > Log: > > Add asynchronous command support to the pass(4) driver, and the new > > camdd(8) utility. > > > > CCBs may be queued to the driver via the new CAMIOQUEUE ioctl, and > > completed CCBs may be retrieved via the CAMIOGET ioctl. User > > processes can use poll(2) or kevent(2) to get notification when > > I/O has completed. > > > > While the existing CAMIOCOMMAND blocking ioctl interface only > > supports user virtual data pointers in a CCB (generally only > > one per CCB), the new CAMIOQUEUE ioctl supports user virtual and > > physical address pointers, as well as user virtual and physical > > scatter/gather lists. This allows user applications to have more > > flexibility in their data handling operations. > > > > Kernel memory for data transferred via the queued interface is > > allocated from the zone allocator in MAXPHYS sized chunks, and user > > data is copied in and out. This is likely faster than the > > vmapbuf()/vunmapbuf() method used by the CAMIOCOMMAND ioctl in > > configurations with many processors (there are more TLB shootdowns > > caused by the mapping/unmapping operation) but may not be as fast > > as running with unmapped I/O. > > > > The new memory handling model for user requests also allows > > applications to send CCBs with request sizes that are larger than > > MAXPHYS. The pass(4) driver now limits queued requests to the I/O > > size listed by the SIM driver in the maxio field in the Path > > Inquiry (XPT_PATH_INQ) CCB. > > > > There are some things things would be good to add: > > > > 1. Come up with a way to do unmapped I/O on multiple buffers. > > Currently the unmapped I/O interface operates on a struct bio, > > which includes only one address and length. It would be nice > > to be able to send an unmapped scatter/gather list down to > > busdma. This would allow eliminating the copy we currently do > > for data. > > > > 2. Add an ioctl to list currently outstanding CCBs in the various > > queues. > > > > 3. Add an ioctl to cancel a request, or use the XPT_ABORT CCB to do > > that. > > > > 4. Test physical address support. Virtual pointers and scatter > > gather lists have been tested, but I have not yet tested > > physical addresses or scatter/gather lists. > > > > 5. Investigate multiple queue support. At the moment there is one > > queue of commands per pass(4) device. If multiple processes > > open the device, they will submit I/O into the same queue and > > get events for the same completions. This is probably the right > > model for most applications, but it is something that could be > > changed later on. > > > > Also, add a new utility, camdd(8) that uses the asynchronous pass(4) > > driver interface. > > > > This utility is intended to be a basic data transfer/copy utility, > > a simple benchmark utility, and an example of how to use the > > asynchronous pass(4) interface. > > > > It can copy data to and from pass(4) devices using any target queue > > depth, starting offset and blocksize for the input and ouptut devices. > > It currently only supports SCSI devices, but could be easily extended > > to support ATA devices. > > > > It can also copy data to and from regular files, block devices, tape > > devices, pipes, stdin, and stdout. It does not support queueing > > multiple commands to any of those targets, since it uses the standard > > read(2)/write(2)/writev(2)/readv(2) system calls. > > > > The I/O is done by two threads, one for the reader and one for the > > writer. The reader thread sends completed read requests to the > > writer thread in strictly sequential order, even if they complete > > out of order. That could be modified later on for random I/O patterns > > or slightly out of order I/O. > > > > camdd(8) uses kqueue(2)/kevent(2) to get I/O completion events from > > the pass(4) driver and also to send request notifications internally. > > > > For pass(4) devcies, camdd(8) uses a single buffer (CAM_DATA_VADDR) > > per CAM CCB on the reading side, and a scatter/gather list > > (CAM_DATA_SG) on the writing side. In addition to testing both > > interfaces, this makes any potential reblocking of I/O easier. No > > data is copied between the reader and the writer, but rather the > > reader's buffers are split into multiple I/O requests or combined > > into a single I/O request depending on the input and output blocksize. > > > > For the file I/O path, camdd(8) also uses a single buffer (read(2), > > write(2), pread(2) or pwrite(2)) on reads, and a scatter/gather list > > (readv(2), writev(2), preadv(2), pwritev(2)) on writes. > > > > Things that would be nice to do for camdd(8) eventually: > > > > 1. Add support for I/O pattern generation. Patterns like all > > zeros, all ones, LBA-based patterns, random patterns, etc. Right > > Now you can always use /dev/zero, /dev/random, etc. > > > > 2. Add support for a "sink" mode, so we do only reads with no > > writes. Right now, you can use /dev/null. > > > > 3. Add support for automatic queue depth probing, so that we can > > figure out the right queue depth on the input and output side > > for maximum throughput. At the moment it defaults to 6. > > > > 4. Add support for SATA device passthrough I/O. > > > > 5. Add support for random LBAs and/or lengths on the input and > > output sides. > > > > 6. Track average per-I/O latency and busy time. The busy time > > and latency could also feed in to the automatic queue depth > > determination. > > > > sys/cam/scsi/scsi_pass.h: > > Define two new ioctls, CAMIOQUEUE and CAMIOGET, that queue > > and fetch asynchronous CAM CCBs respectively. > > > > Although these ioctls do not have a declared argument, they > > both take a union ccb pointer. If we declare a size here, > > the ioctl code in sys/kern/sys_generic.c will malloc and free > > a buffer for either the CCB or the CCB pointer (depending on > > how it is declared). Since we have to keep a copy of the > > CCB (which is fairly large) anyway, having the ioctl malloc > > and free a CCB for each call is wasteful. > > > > sys/cam/scsi/scsi_pass.c: > > Add asynchronous CCB support. > > > > Add two new ioctls, CAMIOQUEUE and CAMIOGET. > > > > CAMIOQUEUE adds a CCB to the incoming queue. The CCB is > > executed immediately (and moved to the active queue) if it > > is an immediate CCB, but otherwise it will be executed > > in passstart() when a CCB is available from the transport layer. > > > > When CCBs are completed (because they are immediate or > > passdone() if they are queued), they are put on the done > > queue. > > > > If we get the final close on the device before all pending > > I/O is complete, all active I/O is moved to the abandoned > > queue and we increment the peripheral reference count so > > that the peripheral driver instance doesn't go away before > > all pending I/O is done. > > > > The new passcreatezone() function is called on the first > > call to the CAMIOQUEUE ioctl on a given device to allocate > > the UMA zones for I/O requests and S/G list buffers. This > > may be good to move off to a taskqueue at some point. > > The new passmemsetup() function allocates memory and > > scatter/gather lists to hold the user's data, and copies > > in any data that needs to be written. For virtual pointers > > (CAM_DATA_VADDR), the kernel buffer is malloced from the > > new pass(4) driver malloc bucket. For virtual > > scatter/gather lists (CAM_DATA_SG), buffers are allocated > > from a new per-pass(9) UMA zone in MAXPHYS-sized chunks. > > Physical pointers are passed in unchanged. We have support > > for up to 16 scatter/gather segments (for the user and > > kernel S/G lists) in the default struct pass_io_req, so > > requests with longer S/G lists require an extra kernel malloc. > > > > The new passcopysglist() function copies a user scatter/gather > > list to a kernel scatter/gather list. The number of elements > > in each list may be different, but (obviously) the amount of data > > stored has to be identical. > > > > The new passmemdone() function copies data out for the > > CAM_DATA_VADDR and CAM_DATA_SG cases. > > > > The new passiocleanup() function restores data pointers in > > user CCBs and frees memory. > > > > Add new functions to support kqueue(2)/kevent(2): > > > > passreadfilt() tells kevent whether or not the done > > queue is empty. > > > > passkqfilter() adds a knote to our list. > > > > passreadfiltdetach() removes a knote from our list. > > > > Add a new function, passpoll(), for poll(2)/select(2) > > to use. > > > > Add devstat(9) support for the queued CCB path. > > > > sys/cam/ata/ata_da.c: > > Add support for the BIO_VLIST bio type. > > > > sys/cam/cam_ccb.h: > > Add a new enumeration for the xflags field in the CCB header. > > (This doesn't change the CCB header, just adds an enumeration to > > use.) > > > > sys/cam/cam_xpt.c: > > Add a new function, xpt_setup_ccb_flags(), that allows specifying > > CCB flags. > > > > sys/cam/cam_xpt.h: > > Add a prototype for xpt_setup_ccb_flags(). > > > > sys/cam/scsi/scsi_da.c: > > Add support for BIO_VLIST. > > > > sys/dev/md/md.c: > > Add BIO_VLIST support to md(4). > > > > sys/geom/geom_disk.c: > > Add BIO_VLIST support to the GEOM disk class. Re-factor the I/O size > > limiting code in g_disk_start() a bit. > > > > sys/kern/subr_bus_dma.c: > > Change _bus_dmamap_load_vlist() to take a starting offset and > > length. > > > > Add a new function, _bus_dmamap_load_pages(), that will load a list > > of physical pages starting at an offset. > > (cannot comment inline because the diff is truncated, sorry). > > This is bogus, and defeats the purpose of _bus_dmamap_load_ma by > reintroducing the mechanics already found in bus_dmamap_load_ma_triv. > See r290610, which was the commit that introduced a proper > implementation of bus_dmamap_load_ma in x86. > > The problem with the approach taken both in _bus_dmamap_load_pages and > bus_dmamap_load_ma_triv is that they split the request buffer into > arbitrary chunks based on page boundaries, creating segments that no > longer have a size that's a multiple of the sector size. This breaks > drivers like blkfront (and probably other stuff). Sorry, the breakage was an oversight. As for the history here, gibbs@ originally introduced the load_pages() function into the Spectra tree in October 2014. But he was doing development on a tree that did not yet have _bus_dmamap_load_ma(). He did something similar to that, and just chose a different name -- _bus_dmamap_load_pages(). I obviously didn't pay enough attention when putting into the tree. > The following patch solves the problem AFAICT, and I would like to > commit it ASAP: I think this should be fine. > Roger. > --- > diff --git a/sys/kern/subr_bus_dma.c b/sys/kern/subr_bus_dma.c > index ae30276..cdf176b 100644 > --- a/sys/kern/subr_bus_dma.c > +++ b/sys/kern/subr_bus_dma.c > @@ -131,28 +131,6 @@ _bus_dmamap_load_mbuf_sg(bus_dma_tag_t dmat, bus_dmamap_t map, > } > > /* > - * Load tlen data starting at offset within a region specified by a list of > - * physical pages. > - */ > -static int > -_bus_dmamap_load_pages(bus_dma_tag_t dmat, bus_dmamap_t map, > - vm_page_t *pages, bus_size_t tlen, int offset, int *nsegs, int flags) > -{ > - vm_paddr_t paddr; > - bus_size_t len; > - int error, i; > - > - for (i = 0, error = 0; error == 0 && tlen > 0; i++, tlen -= len) { > - len = min(PAGE_SIZE - offset, tlen); > - paddr = VM_PAGE_TO_PHYS(pages[i]) + offset; > - error = _bus_dmamap_load_phys(dmat, map, paddr, len, > - flags, NULL, nsegs); > - offset = 0; > - } > - return (error); > -} > - > -/* > * Load from block io. > */ > static int > @@ -168,8 +146,8 @@ _bus_dmamap_load_bio(bus_dma_tag_t dmat, bus_dmamap_t map, struct bio *bio, > } > > if ((bio->bio_flags & BIO_UNMAPPED) != 0) > - return (_bus_dmamap_load_pages(dmat, map, bio->bio_ma, > - bio->bio_bcount, bio->bio_ma_offset, nsegs, flags)); > + return (_bus_dmamap_load_ma(dmat, map, bio->bio_ma, > + bio->bio_bcount, bio->bio_ma_offset, flags, NULL, nsegs)); > > return (_bus_dmamap_load_buffer(dmat, map, bio->bio_data, > bio->bio_bcount, kernel_pmap, flags, NULL, nsegs)); > > > > Ken -- Kenneth Merry ken@FreeBSD.ORG