From: Kern Sibbald
To: cer@mirapoint.com
cc: freebsd-scsi@freebsd.org
Subject: Re: SCSI tape data loss
Date: 03 Jun 2003 15:37:40 +0200
Message-Id: <1054647459.13630.189.camel@rufus>

Thanks for the lesson on how blocks are written to a tape --
especially the example. I'm now leaning strongly toward
aligning my buffers.

However, a couple more questions please.

- When using tar (or say Bacula), how do you know your
  writes are split by the kernel? In the case of Bacula,
  with the buffer size I use, it ALWAYS gets back exactly
  what it wrote. From my userland perspective I see no
  double writes.

- What is the "page" that you are referring to? Paged
  memory? If I am not mistaken, the page size can be
  radically different depending on the OS and hardware,
  e.g. 1024 to 4096 or even more.

- How does one determine what the page size is, preferably
  in a system-independent way?

Thanks,
Kern

On Tue, 2003-06-03 at 15:19, Carl Reisinger wrote:
> >Concerning the maximum buffer size: I have chosen
> >the default maximum buffer size to be 64512 bytes so
> >that it is smaller than 65536. In fact, 64512 bytes is
> >the size (126 blocks) that I used for tar in 1982
> >and never had any problems.
>
> Try using the FreeBSD tar with the multi-volume flag (-M) and
> your record size.
>
> Without the flag the writes are page aligned; with the flag
> the writes are offset some, either 512 or 1536 bytes (I forget
> which), and the writes will be split by the kernel physio
> function into a 60K and a 3K write. (This is with the tar
> shipped with FreeBSD up to at least 4.2. Later ones may also
> do this; I have not tried them.)
>
> >From what I understand the 65536 point at which
> >buffers are always split only applies to devices in
> >fixed block mode, and probably older devices at that.
>
> This magic number has nothing to do with the device. I've only
> used variable block mode and newer technologies, SDLT, LTO.
>
> >Though Bacula can run in fixed block mode, the
> >default is variable block, so I don't see that as
> >an issue here -- unless I am missing something?
> >
> >Can you explain why you mention 61440 bytes? and
> >why it might be a better choice than 64512?
>
> 61440 was mentioned since that is the largest write that can
> be done without the physio function doing some surprising and
> annoying things to your write. 61440 is the size that, no
> matter its address alignment, can always be mapped with one
> page register.
>
> If you are careful to page align all writes then you can write
> up to 65536 and have one record sent to the tape device.
>
> (Actually, with a minor change to scsi_sa.c and limiting
> oneself to newer SCSI HBAs you can go as high as 128KB for
> read/write.)
>
> An example:
>
> Write 64512 bytes with a starting address of 4096. Physio will
> take this, see that the address is page aligned, check that
> it can be mapped with one page register and perform one write.
>
> Now let's write 64512 bytes but with an address of 5632. In
> this case physio will notice it is not page aligned and
> adjust the starting address to be 4096. Now 66048 bytes need
> to be mapped, which exceeds the default size of 65536. In this
> case physio will map the first 60K (64K to him because of the
> starting address change), write that and then map and write
> the remainder.
>
> Now when one goes back to read 64512 bytes, the first read
> returns 61440 bytes and the second 3072 instead of just one
> read returning 64512.
>
> >On aligning the buffers on a page boundary: interesting
> >idea, I'll look into it, but I'm not too keen on the
> >idea.
>
> If your software has no problem with short reads and records
> being split into two, then don't bother page aligning.
> But, if you want to read exactly what you know you wrote then
> alignment is a must.
>
> Carl
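
The page-size question and the split in Carl's example can be sketched in C. This is only an illustration under stated assumptions: first_write() is a hypothetical, simplified model of the behaviour described in the message (it is not taken from the kernel's physio code), the 4096 fallback is an assumed value, and page_size() and alloc_page_aligned() are helper names invented here. sysconf(_SC_PAGESIZE) and posix_memalign() are standard POSIX calls, though an older system may not provide the latter.

```c
#define _POSIX_C_SOURCE 200112L
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* Page size as reported by the running system (POSIX sysconf). */
size_t page_size(void)
{
    long ps = sysconf(_SC_PAGESIZE);
    return ps > 0 ? (size_t)ps : 4096;   /* fallback value is an assumption */
}

/*
 * Simplified model of the split described in the message above,
 * NOT the kernel code. Returns how many bytes go out in the first
 * write for a request of len bytes starting at addr, given a page
 * size and a mapping limit (65536 in the example).
 */
size_t first_write(uintptr_t addr, size_t len, size_t page, size_t limit)
{
    size_t offset = addr % page;         /* distance past the page boundary */

    if (offset + len <= limit)
        return len;                      /* fits one mapping: one record */
    return offset ? limit - page : limit;
}

/* Allocate a page-aligned buffer; NULL on failure. Release with free(). */
void *alloc_page_aligned(size_t len)
{
    void *p = NULL;

    if (posix_memalign(&p, page_size(), len) != 0)
        return NULL;
    return p;
}
```

With a 4096-byte page and a 65536-byte limit, first_write(4096, 64512, 4096, 65536) gives 64512 (one record), while first_write(5632, 64512, 4096, 65536) gives 61440, leaving 3072 bytes for a second write, which matches the two cases in the example. A buffer obtained from alloc_page_aligned(64512) starts on a page boundary, so under this model it would go out as a single record.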