From: "Kenneth D. Merry"
Date: Mon, 29 Dec 2014 23:31:29 -0700
To: Shivaram Upadhyayula
Cc: freebsd-scsi@freebsd.org
Subject: Re: Tape block size greater than MAXPHYS
Message-ID: <20141230063129.GA77314@mithlond.kdm.org>

On Mon, Dec 29, 2014 at 14:52:12 +0530, Shivaram Upadhyayula wrote:
> Hi,
>
> It seems that currently any tape reads/writes greater than MAXPHYS
> will fail. For example
>
> cpi->maxio = 256 * 1024; /* Controller max io size 256K */
>
> root@quadstorvtl # dd if=/dev/zero of=/dev/sa0 bs=256k count=1
> sa0.0: request size=262144 > si_iosize_max=131072; cannot split request
> sa0.0: request size=262144 > MAXPHYS=131072; cannot split request
> dd: /dev/sa0: File too large
> 1+0 records in
> 0+0 records out
> 0 bytes transferred in 0.000390 secs (0 bytes/sec)
>
> The first limitation comes from sys/cam/scsi/scsi_sa.c:saregister
>     /*
>      * If maxio isn't set, we fall back to DFLTPHYS. Otherwise we take
>      * the smaller of cpi.maxio or MAXPHYS.
>      */
>     if (cpi.maxio == 0)
>         softc->maxio = DFLTPHYS;
>     else if (cpi.maxio > MAXPHYS)
>         softc->maxio = MAXPHYS;
>     else
>         softc->maxio = cpi.maxio;
>
> softc limits maxio to MAXPHYS even if the controller supports a higher
> maxio value.
> I tried removing the limitation, which then led me to the actual
> reason for the limitation in sys/kern/kern_physio.c:physio
>
>     /*
>      * If the driver does not want I/O to be split, that means that we
>      * need to reject any requests that will not fit into one buffer.
>      */
>     if (dev->si_flags & SI_NOSPLIT &&
>         (uio->uio_resid > dev->si_iosize_max || uio->uio_resid > MAXPHYS ||
>         uio->uio_iovcnt > 1)) {
>
> To maintain consistency of the block numbers SI_NOSPLIT has to be set,
> but then to issue the entire request in a single bio the request size
> will be limited to MAXPHYS.
>
> Would it be correct to assume that the only way to increase the tape
> block size for writes/reads is to increase MAXPHYS and recompile the
> kernel? (As of now on FreeBSD 10.1)

Your analysis is correct.

The reason I added the SI_NOSPLIT code (and set the flag in the sa(4)
driver) is that the previous situation was bad from the standpoint of a
tape drive user. You could write to a tape with a large blocksize, but
that isn't what would actually make it onto the tape. You wouldn't know
exactly what size blocks were making it onto the tape; that would depend
on the size and alignment of the incoming buffers.

Now at least the application has a clear understanding of what is written
to tape.

One problem that was there before the SI_NOSPLIT changes and is still
present is that we can't by default read tapes with a large blocksize
(e.g. 1MB).

Increasing MAXPHYS will certainly fix it (assuming your controller sets
the maxio field in the path inquiry CCB to something sufficiently large).

I have considered adding a custom read/write routine to the sa(4) driver
that would essentially take the best available path given the requested
block size and the constraints imposed by the controller and MAXPHYS. The
logic would be something like:

- If the I/O is <= MAXPHYS (including alignment constraints) and the
  controller supports it, do unmapped I/O.
- Otherwise, allocate buffers from a sa(4)-specific UMA zone and copy in
  and out.

This would allow for doing I/O up to the controller's limit, without regard
for MAXPHYS. On modern machines, this would also usually be faster than
mapping the memory in and out of the kernel, because you avoid the extra
TLB shootdowns.

Ideally we'll get a scheme in place to allow doing unmapped S/G lists at
some point. But we don't have that yet.

I have some code with logic similar to the above scenario for the pass(4)
driver asynchronous mode that has been in my queue to upstream for about
a year. I also have a very large set of tape driver improvements that I've
been working on (off and on) for about a year and a half.

I haven't done the custom read/write routine yet, but I may do it if I have
some time.

By the way, the mps(4) and mpr(4) drivers can do I/O larger than 256KB.
That limit is somewhat arbitrary. Perhaps Steve (CCed) can take a look at
what we need to do to calculate the true limit (which would be based on the
page size of the machine and maximum number of S/G lists the controller
can handle) so we can pass back a more accurate number.

The isp(4) driver I/O limit is accurate. If you try to use it with a modern
tape drive, you'll likely run into some FC-Tape related bugs. I need to
upstream those fixes too.

Ken
-- 
Kenneth Merry
ken@FreeBSD.ORG
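
For illustration only, the two-branch path selection described in the
message (unmapped I/O when the request fits, otherwise copy through
driver-owned buffers) might be sketched in userland C roughly as follows.
This is not sa(4) code. The names sa_pick_io_path, SKETCH_MAXPHYS, and the
enum values are hypothetical, the 128KB value simply mirrors the MAXPHYS
shown in the dd output, and reading "alignment constraints" as "offset
within the page plus length must fit in MAXPHYS" is an assumption.

```c
#include <stddef.h>

/* Hypothetical stand-in for the kernel's MAXPHYS (131072 in the dd output). */
#define SKETCH_MAXPHYS ((size_t)128 * 1024)

/* Hypothetical result codes for the sketch; not part of any FreeBSD API. */
enum sa_io_path {
    SA_IO_UNMAPPED, /* hand the user pages to the controller directly */
    SA_IO_COPY,     /* bounce through a driver-owned buffer */
    SA_IO_REJECT    /* larger than the controller can take in one request */
};

/*
 * Pick an I/O path for one tape block.
 *
 *   len              requested block size in bytes
 *   page_offset      offset of the user buffer within its first page
 *                    (assumed model of the alignment constraint)
 *   ctrl_maxio       controller limit, e.g. the maxio reported by path inquiry
 *   ctrl_unmapped_ok nonzero if the controller can do unmapped I/O
 */
static enum sa_io_path
sa_pick_io_path(size_t len, size_t page_offset, size_t ctrl_maxio,
    int ctrl_unmapped_ok)
{
    /* A tape block is never split, so it must fit in a single request. */
    if (len == 0 || len > ctrl_maxio)
        return (SA_IO_REJECT);

    /* Fits within MAXPHYS once alignment is accounted for: no copy needed. */
    if (ctrl_unmapped_ok && len <= SKETCH_MAXPHYS &&
        page_offset <= SKETCH_MAXPHYS - len)
        return (SA_IO_UNMAPPED);

    /* Otherwise copy in and out, bounded only by the controller limit. */
    return (SA_IO_COPY);
}
```

With a controller limit of 256KB, a 64KB page-aligned request would take the
unmapped path in this sketch, while the 256KB request from the dd example
would take the copy path instead of failing.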