Date:      Tue, 3 Jun 2003 09:03:36 -0700 (PDT)
From:      Matthew Jacob <mjacob@feral.com>
To:        Kern Sibbald <kern@sibbald.com>
Cc:        freebsd-scsi@freebsd.org
Subject:   Re: SCSI tape data loss
Message-ID:  <20030603084701.U24586@wonky.in0.lcl>
In-Reply-To: <1054653106.13606.217.camel@rufus>
References:  <3EDB31AB.16420.C8964B7D@localhost> <3EDB59A4.27599.C93270FB@localhost> <577540000.1054579840@aslan.btc.adaptec.com> <20030602131225.F71034@beppo>  <1054645616.13630.161.camel@rufus>  <1054653106.13606.217.camel@rufus>

>
> This is exactly what it does. *Every* time the requested write
> size does not agree with the returned value, Bacula gives
> up on the tape.  My last email has the code that does that.
>
> My email above was not very clear because I was telling you what
> happened in the particular case of loss of data (the -1 and errno=0
> or errno=ENOSPC I don't know which). As noted here, Bacula *will*
> stop writing if the driver returns a short block (assuming my
> code isn't broken), but I have never seen that case on FreeBSD.

That's really weird. I have to look at this closer. I've had some
drives not report LEOT at all, but since tape_pattern_tester didn't
complain on the same drive you were using, I know tape_pattern_tester is
in fact stopping at LEOT.

write(2) isn't necessarily returning -1. It may be returning 0, which
means that no data moved.

I think the ENOSPC as you report is a red herring because you're setting
this value, unless you actually *did* see -1 returned from write(2) and
ENOSPC set in errno.

In any case, even if you hit PEOT instead of LEOT, you shouldn't *lose*
data. If you hit PEOT, we have to return -1/ENOSPC. Because this is Unix
or Linux or Solaris instead of a reasonable and modern OS, like RSX, VMS
or NT, which allow you to give realistic details to failures in I/O
requests, this means you have no way of telling the user application how
much was *actually* written when you hit *PEOT* (not LEOT, note!). As
far as the user application is concerned, *no* data was written at all
for this last write.
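
To make the cases above concrete, here is a rough sketch of how an
application might sort out what write(2) told it. The enum and the
function name classify_write are made up for illustration (they are not
from Bacula or from the sa driver), and treating -1/ENOSPC as
"end of media" is just the reading described above, not a statement
about every driver.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical labels for the outcomes discussed in this thread. */
enum write_outcome {
    WR_FULL,          /* every requested byte was accepted */
    WR_SHORT,         /* fewer bytes than requested were written */
    WR_NO_DATA,       /* write returned 0: no data moved */
    WR_END_OF_MEDIA,  /* write returned -1 with errno == ENOSPC */
    WR_ERROR          /* write returned -1 with any other errno */
};

/*
 * ret is the value write(2) returned, err is errno captured right
 * after the call, requested is the byte count that was asked for.
 * err is consulted only when ret is negative, because errno means
 * nothing after a call that did not fail.
 */
static enum write_outcome
classify_write(ssize_t ret, int err, size_t requested)
{
    assert(requested > 0);  /* a zero-length request is a caller bug */

    if (ret < 0)
        return (err == ENOSPC) ? WR_END_OF_MEDIA : WR_ERROR;
    if (ret == 0)
        return WR_NO_DATA;
    if ((size_t)ret < requested)
        return WR_SHORT;
    return WR_FULL;
}
```

A caller would do something like n = write(fd, buf, len) and then pass
(n, errno, len) straight to classify_write, without assigning errno
itself in between. Anything other than WR_FULL is a reason to stop
writing data to this tape.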

But there may in fact be data on the tape media. What is particularly
annoying in the PEOT case is that your application probably asked for
the next tape and rewrote all the blocks from the failed write. This is
fine, but you have to make damned sure then on rereading the data later
that you can handle duplicate blocks because you may read blocks NOPQR
on tapeA and then switch to tapeB and read blocks OPQR again on tapeB.
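
A minimal sketch of the kind of check I mean on the read side, assuming
every block carries a strictly increasing block number in its header.
That numbering scheme, the struct, and the name check_block are all
assumptions for the sake of the example; I'm not claiming this is how
Bacula lays out its blocks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum block_verdict {
    BLOCK_NEW,        /* the block we expected next: restore it */
    BLOCK_DUPLICATE,  /* already seen, e.g. rewritten on the next tape */
    BLOCK_GAP         /* a block number was skipped: data is missing */
};

/* Hypothetical per-restore state; zero-initialize before first use. */
struct restore_state {
    int      started;        /* nonzero once the first block was seen */
    uint64_t next_expected;  /* block number we expect to read next */
};

/*
 * Decide what to do with a block that was just read from tape.
 * Duplicates leave the state untouched so they can simply be skipped.
 */
static enum block_verdict
check_block(struct restore_state *st, uint64_t block_no)
{
    enum block_verdict verdict = BLOCK_NEW;

    assert(st != NULL);

    if (st->started && block_no < st->next_expected)
        return BLOCK_DUPLICATE;
    if (st->started && block_no > st->next_expected)
        verdict = BLOCK_GAP;

    st->started = 1;
    st->next_expected = block_no + 1;
    return verdict;
}
```

With blocks N..R numbered 14..18, reading NOPQR from tapeA gives
BLOCK_NEW five times, and reading OPQR again from tapeB gives
BLOCK_DUPLICATE four times, so the restore carries on cleanly with the
block after R.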

I don't think this is your problem here, but I thought I'd have a
pre-coffee diatribe about it. Grump.


>
> > Ignoring the short write and waiting until you hit ENOSPC guarantees
> > you will hit PEOM, since the LEOM is only reported once.  The tape
> > driver expects that you know what you are doing if you go on writing.
>
> The only additional writing Bacula does (unless I am missing something)
> is the two EOF marks.

This is one of the things that's bothering me. You shouldn't be writing
extra marks if you actually close the device. I'd like to look over all
the current Bacula source, but sourceforge is offline at the moment.


-matt


