Date: Tue, 3 Jun 2003 09:03:36 -0700 (PDT)
From: Matthew Jacob <mjacob@feral.com>
To: Kern Sibbald <kern@sibbald.com>
Cc: freebsd-scsi@freebsd.org
Subject: Re: SCSI tape data loss
Message-ID: <20030603084701.U24586@wonky.in0.lcl>
In-Reply-To: <1054653106.13606.217.camel@rufus>
References: <3EDB31AB.16420.C8964B7D@localhost>
    <3EDB59A4.27599.C93270FB@localhost>
    <577540000.1054579840@aslan.btc.adaptec.com>
    <20030602131225.F71034@beppo>
    <1054645616.13630.161.camel@rufus>
    <1054653106.13606.217.camel@rufus>
>
> This is exactly what it does. *Every* time the requested write
> size does not agree with the returned value, Bacula gives
> up on the tape. My last email has the code that does that.
>
> My email above was not very clear because I was telling you what
> happened in the particular case of loss of data (the -1 and errno=0
> or errno=ENOSPC I don't know which). As noted here, Bacula *will*
> stop writing if the driver returns a short block (assuming my
> code isn't broken), but I have never seen that case on FreeBSD.

That's really weird. I have to look at this closer. I've had some
drives not report LEOT at all, but since tape_pattern_tester didn't
complain on the same drive you were using, I know tape_pattern_tester
is in fact stopping at LEOT.

write(2) isn't necessarily returning -1. It may be returning 0, which
means that no data moved. I think the ENOSPC as you report is a red
herring because you're setting this value, unless you actually *did*
see -1 returned from write(2) and ENOSPC set in errno.

In any case, even if you hit PEOT instead of LEOT, you shouldn't *lose*
data. If you hit PEOT, we have to return -1/ENOSPC. Because this is
Unix or Linux or Solaris instead of a reasonable and modern OS, like
RSX, VMS or NT, which allow you to give realistic details to failures
in I/O requests, this means you have no way of telling the user
application how much was *actually* written when you hit *PEOT* (not
LEOT, note!). As far as the user application is concerned, *no* data
was written at all for this last write. But there may in fact be data
on the tape media.

What is particularly annoying in the PEOT case is that your
application probably asked for the next tape and rewrote all the
blocks from the failed write. This is fine, but you have to make
damned sure then on rereading the data later that you can handle
duplicate blocks because you may read blocks NOPQR on tapeA and then
switch to tapeB and read blocks OPQR again on tapeB.
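The duplicate-block case described above can be sketched roughly as
follows. This assumes that every block carries a strictly increasing
sequence number in its header; that numbering scheme, the struct, and
the function name are all hypothetical and are not taken from Bacula's
source.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical reader state: remembers the last block sequence number
 * that was accepted, across tape changes. */
struct block_reader {
    bool     have_last;  /* false until the first block is accepted */
    uint64_t last_seq;   /* sequence number of the last accepted block */
};

/* Returns true if the block should be restored, false if it repeats a
 * block already read (for example, blocks rewritten on the next tape
 * after a write that failed at PEOT but still reached the media). */
static bool
accept_block(struct block_reader *r, uint64_t seq)
{
    if (r->have_last && seq <= r->last_seq)
        return false;    /* duplicate or older block: skip it */
    r->have_last = true;
    r->last_seq = seq;
    return true;
}
```

With blocks NOPQR read from tapeA and OPQR read again from tapeB, a
reader like this would skip the second copies and resume at the first
block it has not seen yet.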
I don't think this is your problem here, but I thought I'd have a
pre-coffee diatribe about it. Grump.

>
> > Ignoring the short write and waiting until you hit ENOSPC guarantees
> > you will hit PEOM, since the LEOM is only reported once. The tape
> > driver expects that you know what you are doing if you go on writing.
>
> The only additional writing Bacula does (unless I am missing something)
> is the two EOF marks.

This is one of the things that's bothering me. You shouldn't be
writing extra marks if you actually close the device.

I'd like to look over all the current Bacula source, but sourceforge
is offline at the moment.

-matt
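A minimal sketch of how an application might sort out the write(2)
outcomes discussed in this message. The enum and function names are
hypothetical. Treating a short (or zero) return as the early warning
and -1 with ENOSPC as PEOT follows the description in this thread, not
any driver source. errno is passed in explicitly so that the caller
captures it right after write(2) and only trusts it when the return
value is -1.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical classification of one tape write. */
enum tape_write_status {
    TAPE_WRITE_OK,     /* the full block was written */
    TAPE_WRITE_SHORT,  /* 0 <= n < requested: early warning, stop writing */
    TAPE_WRITE_PEOT,   /* -1 with ENOSPC: physical end of tape */
    TAPE_WRITE_ERROR   /* -1 with any other errno */
};

/* n is the value returned by write(2), requested is the byte count
 * that was asked for, and err is errno saved immediately after the
 * call. err is ignored unless n is negative, because errno is not
 * meaningful after a successful or short write. */
static enum tape_write_status
classify_tape_write(ssize_t n, size_t requested, int err)
{
    if (n < 0)
        return (err == ENOSPC) ? TAPE_WRITE_PEOT : TAPE_WRITE_ERROR;
    if ((size_t)n < requested)
        return TAPE_WRITE_SHORT;
    return TAPE_WRITE_OK;
}
```

A caller would do something like n = write(fd, buf, len) followed at
once by classify_tape_write(n, len, errno). On TAPE_WRITE_PEOT the
application cannot know how much of the block reached the media, so it
has to rewrite that block on the next tape and tolerate a duplicate
when reading back.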