From: Kern Sibbald
To: "Justin T. Gibbs"
Cc: scsi@FreeBSD.org
Date: 02 Jun 2003 10:57:42 +0200
Subject: Re: SCSI tape data loss
Message-Id: <1054544261.1578.1801.camel@rufus>
In-Reply-To: <2897610000.1054507162@aslan.scsiguy.com>
References: <1054490081.1582.1685.camel@rufus> <2846020000.1054498114@aslan.scsiguy.com> <1054503429.1578.1715.camel@rufus> <2897610000.1054507162@aslan.scsiguy.com>

On Mon, 2003-06-02 at 00:39, Justin T. Gibbs wrote:
> >> Perhaps both Linux and Solaris force the tape drives to run in
> >> unbuffered mode?
> >
> > Both of these systems run in synchronous write (unbuffered)
> > mode by default. It is possible to run with asynchronous
> > writes (buffered mode), but I am not aware of any
> > program that does so. The mt program can be used to set
> > synchronous/asynchronous writes, or other modes such
> > as Sys V compatibility rather than BSD style.
>
> Does Solaris have the drvbuffer command that is in Linux?
I'm not 100% sure -- they have just about everything, and their
documentation is very good. All their documentation is online at
http://docs.sun.com -- their AnswerBook. However, if you have not
read their mt documentation, I recommend it -- that is the definition
of what I consider the "correct" driver behavior. See for example:

http://docs.sun.com/db/doc/802-5747-07/6i9g1cn4u?a=view

For me, it is the bible. Unfortunately, not all Unices behave like that.

>
> >> > 2. The SCSI driver is doing asynchronous writes (very bad) and
> >> > the End of Medium is not sent to Bacula until many writes after
> >> > the end of the tape.
> >>
> >> Disabling the tape drive's write buffer kills performance. All
> >> of the information required to handle buffered writes should be
> >> available to you.
> >
> > My personal preference is for data security before performance.
>
> There is no potential for lost data if you handle the status that
> is presented to you.

Could you explain that in more detail? If you mean digging into the
OS/driver-specific details of an MTIOCERRSTAT packet, that *shouldn't*
be necessary -- at least it is not necessary on Solaris/Linux to
guarantee data integrity.

>
> > If you are in fact doing asynchronous writes (buffered mode), then
> > Bacula will not support FreeBSD without essentially duplicating the
> > driver's buffering code inside Bacula -- something I don't plan
> > to do in the near future, if for no other reason than doing so
> > would mean a different driver for every operating system.
>
> The tape driver doesn't have any buffering code (unlike Linux which
> does). The tape drive has a buffer. We are just enabling the use
> of that buffer. If you really want to do this simply, just do a
> write filemarks of 0 marks every time you are about to switch input
> files. The write marks flushes the device's buffer and guarantees
> that any residual will be within the fd that you are currently using.
> This would imply that you only need to explicitly buffer if you support
> backups from stdin.

I don't mind if the tape drive buffers data as long as it writes *all*
of that data to the tape and informs me on the next write that the LEOM
(logical EOM in Solaris parlance, or early EOM) has been hit. If the
drive cannot write *all* the data it has accepted to the tape because
of the EOM or whatever (I/O error), then I *much* prefer to turn that
mode off and write a block at a time. In such a single-write,
non-buffered mode Bacula is faster than Networker, which for the moment
is good enough for me. I think that I can get even more speed by
internally buffering and possibly using asynchronous writes -- but that
is for the pretty far future and will undoubtedly be OS dependent since
there seems to be no standard interface for enabling/disabling such
modes.

>
> > I'm not convinced that there is really much loss in performance,
> > and even if I am wrong (quite possibly)
> > it can be easily compensated by having Bacula
> > buffer itself and using a separate thread dedicated to writing
> > and using synchronous (non-buffered) writes in the OS driver.
>
> You can never recover the round trip time on the SCSI bus unless
> you either have a device that allows you to queue more than one
> command at a time or that buffers. I believe that only FC tape
> devices support queuing more than one command at a time, but few
> programs support this anyway (unless you lie and say that a previous
> write has completed).

I can see that performance concerns you because you wrote the driver,
but for me (and most users I believe) what counts is data integrity
first and then performance. In addition, as a systems applications
writer, I look for the common denominator so that my program will work
on the maximum number of machines. Writing to a specific machine is
very difficult for me since I only have access to Linux and at times
Solaris machines with tape drives.
>
> > How do you support tar? Tar knows nothing about buffering --
> > at least not GNU tar to the best of my knowledge.
>
> I think few people use tar for multi-volume backups unless they
> specify a specific tape length, but I really don't know.

I'm beginning to understand why Amanda doesn't handle multi-volume
backups. I guess I can tell FreeBSD users that they can use the tape
drive *if* they specify a tape length, but that seems a pity.

>
> >> Perhaps we should also implement the MTCACHE/MTNOCACHE opcodes so
> >> that userland apps can control this. It's not clear if this is
> >> exactly what they were created for, but it may be better to use
> >> these than to add some other opcodes.
> >
> > From my experience with Solaris/Linux (absolutely no problems in
> > 3 years), I'd recommend implementing a non-buffered mode (your
> > MTNOCACHE I assume), and it should be the default. In fact,
> > though it is certainly possible and possibly worth the effort,
> > I've never heard of any standard Unix program handling a
> > buffered tape drive. If you know one, I would certainly like to
> > know about it.
>
> Standard program? I don't know about that, but the commercial
> apps have always supported buffered mode.

Well, in the case of Networker on Solaris, that hasn't helped them
much -- in any case, I *will* support buffered mode someday even if it
is my own buffering.

>
> > Exactly what ioctl() does what is not critical for me as I can
> > always code it -- what counts is that it is well documented.
> > Of course, the more things are standard across systems, the
> > easier it is to program.
>
> It's not clear to me that there is a standard.

Yes, it is a pity, isn't it, and I'm certainly not blaming anyone,
least of all you.

>
> > Maybe I missed it, but I didn't see anything that indicated that
> > FreeBSD does asynchronous writes.
>
> From looking at the sa driver, it appears that it always tries to
> do buffered writes unless there is a device quirk indicating that
> mode select doesn't work.

Hmmm. Well, in the short term, it looks like the user must specify the
size -- something almost impossible to do with any precision given
hardware compression on drives these days. In the longer run, I hope
you will consider either turning off buffering by default or at least
letting me (in user land) do so.

Best regards,

Kern
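
For reference, the "write filemarks of 0 marks" idea quoted above could look
roughly like the sketch below. It uses the MTIOCTOP ioctl and the MTWEOF
operation from <sys/mtio.h>. The function name tape_flush is hypothetical,
and the assumption that a zero count flushes the drive's buffer comes only
from the description quoted in this thread; it is not verified here for any
particular driver.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/ioctl.h>
#include <sys/mtio.h>

/*
 * Hypothetical helper: ask the tape driver to write zero filemarks.
 * Per the description in this thread (an assumption, not checked
 * against any driver source), this forces the drive to flush its
 * write buffer, so any error or early-EOM condition is reported
 * before the application moves on to the next input file.
 *
 * Returns 0 on success, -1 with errno set on failure.
 */
int tape_flush(int fd)
{
    struct mtop op;

    op.mt_op = MTWEOF;  /* the "write filemarks" operation */
    op.mt_count = 0;    /* zero marks: no filemark is written */

    return ioctl(fd, MTIOCTOP, &op);
}
```

A caller would invoke tape_flush(fd) just before switching input files and
treat a -1 return as a sign that some previously accepted data may not have
reached the tape.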