Date:      Tue, 3 Jun 2003 11:39:28 -0700 (PDT)
From:      Matthew Jacob <mjacob@feral.com>
To:        Kern Sibbald <kern@sibbald.com>
Cc:        freebsd-scsi@freebsd.org
Subject:   Re: Differences between Solaris/Linux and FreeBSD
Message-ID:  <20030603111738.X24586@wonky.in0.lcl>
In-Reply-To: <1054550725.1582.1859.camel@rufus>
References:  <1054490081.1582.1685.camel@rufus> <2846020000.1054498114@aslan.scsiguy.com> <1054550725.1582.1859.camel@rufus>

> As promised, in this email, I will try my best to describe
> the differences I found between Solaris/Linux and FreeBSD
> concerning tape handling. There were five separate areas
> where I noticed differences:
>
> 1. On Solaris/Linux, the default behavior for ioctl(MTEOM)
>    is to run in what they call slow mode. In this mode, the
>    tape is positioned to the end of the data, and the driver
>    returns the correct file number in the MTIOCGET packet.
>    It is possible to enable fast-EOM, but no one uses it to
>    my knowledge.
>
>    On FreeBSD, you apparently always use the fast-EOM so that
>    the tape position is unknown after the ioctl().

You *could* read block position. Particularly for h/w blocks this works
very fast when you need to locate.

NB: SCSI-3 changed the layout for h/w block position stuff and I haven't
updated the FreeBSD driver to handle this yet.

>    Bacula always knows how many files are on a tape, and when
>    appending to a tape that is already written and newly opened,
>    it MUST know where it is on the tape. As a consequence, on
>    FreeBSD, I must explicitly use MTFSF with read()s in between
>    to position to the end of the tape -- a fairly slow affair.

Uh, this is how 'slow' EOM works. It's not really faster to do it in the
kernel as opposed to in the application.

I must point out that you cannot, and should not, depend absolutely on
reported position. For tape you can ensure BOT or end of recorded media,
but otherwise you really must use self-referential data on the tape if
tape location is important.
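
One way to make tape data self-referential, as a minimal sketch: each block
starts with a small header carrying a magic number, a file index and a block
sequence number, so a reader can verify its position from the data itself
rather than from the driver's reported position. The layout and the names
`blk_hdr`, `BLK_MAGIC`, `blk_stamp` and `blk_check` are hypothetical and are
not Bacula's actual on-tape format.

```c
#include <stdint.h>
#include <string.h>

#define BLK_MAGIC 0x54415045u  /* hypothetical marker value (ASCII "TAPE") */

/* Hypothetical header placed at the start of every tape block. */
struct blk_hdr {
    uint32_t magic;    /* identifies a block written by this application */
    uint32_t file_no;  /* index of the tape file this block belongs to */
    uint32_t blk_no;   /* sequence number of the block within that file */
};

/* Copy a header into the front of an outgoing block buffer. */
static void blk_stamp(unsigned char *buf, uint32_t file_no, uint32_t blk_no)
{
    struct blk_hdr h = { BLK_MAGIC, file_no, blk_no };
    memcpy(buf, &h, sizeof h);
}

/*
 * Check a block that was just read against the position the application
 * expects. Returns 0 on a match, -1 if the block is too short or was not
 * written by this application, and 1 if it is ours but from elsewhere.
 */
static int blk_check(const unsigned char *buf, size_t len,
                     uint32_t want_file, uint32_t want_blk)
{
    struct blk_hdr h;

    if (len < sizeof h)
        return -1;
    memcpy(&h, buf, sizeof h);
    if (h.magic != BLK_MAGIC)
        return -1;
    return (h.file_no == want_file && h.blk_no == want_blk) ? 0 : 1;
}
```

The header is stored in host byte order here to keep the sketch short; a
format meant to move between machines would fix the byte order.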

> 2. Your handling of EOM differs from Solaris/Linux.  On both of
>    those systems, when Bacula reads the first EOF, the driver
>    returns 0 bytes read. On reading the second EOF, the driver
>    returns 0 bytes read, but before returning backspaces over
>    the EOF, leaving you positioned correctly for appending to the
>    tape and having told you you are at the end of the tape by
>    giving two consecutive 0-byte reads.  Any further read()
>    requests return an I/O error.
>
>    On FreeBSD, reading the first EOF returns 0 bytes, reading
>    the second EOF also returns 0 bytes (sometimes, I apparently
>    get "Illegal operation"). However, the tape is left positioned
>    after the second EOF, so appending from that point effectively
>    "loses" the data.
>
>    To handle this correctly the FreeBSD user must add a configuration
>    statement to Bacula telling him to backspace file at EOM.

Yes. This is a problem.

But part of the problem here is that dual-filemark at EOM is only one
tape convention, and a poorly thought out one at best. It exists
*solely* because a *few* (ancient) tape drives would unwind off the feed
reel if you kept advancing them. For QIC drives, you *cannot* write dual
filemarks (really).

Note that there is a setting that can change the model to single EOM. If
I could have gotten away with it, I would have made this the default.

I think, though, I'd accept that the FreeBSD behaviour is a bug that
should be fixed. If we have a dual fmk EOT model and are advancing along
and hit two in a row, we *probably* should say we're at logical EOT and
backspace over one of them. After all, this is what we do when we're
*writing* to tape and close the no-rewind device.
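
The reader-side rule just described can be sketched as a small tracker that
counts consecutive filemarks. This is only an illustration under
assumptions, not the sa(4) driver code: `eot_tracker`, `eot_action` and
`eot_note_read` are hypothetical names, and the caller is assumed to pass in
the return value of each read().

```c
#include <sys/types.h>  /* ssize_t */

/* Hypothetical tracker: counts filemarks seen back to back while reading. */
struct eot_tracker {
    int filemarks_in_row;  /* consecutive 0-byte reads so far */
};

enum eot_action {
    EOT_CONTINUE,      /* data or a lone filemark: keep going */
    EOT_BACKSPACE_ONE  /* two filemarks in a row: logical EOT, back up one */
};

/*
 * Feed the result of one read() into the tracker. A result of 0 means the
 * read hit a filemark. Any other result (data, or an error that the caller
 * handles separately) resets the count.
 */
static enum eot_action eot_note_read(struct eot_tracker *t, ssize_t nread)
{
    if (nread != 0) {
        t->filemarks_in_row = 0;
        return EOT_CONTINUE;
    }
    if (++t->filemarks_in_row >= 2) {
        t->filemarks_in_row = 0;
        return EOT_BACKSPACE_ONE;
    }
    return EOT_CONTINUE;
}
```

On `EOT_BACKSPACE_ONE` the caller would backspace over one filemark (an
MTBSF with a count of 1) so that the tape is positioned for appending.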

I also would agree that this situation is exacerbated by the 'space to
end of recorded data' model for the MTEOM command. This now leaves us
with a legacy of tapes with spurious dual filemarks in the middle.

Oops. This means that I really can't fix things the way you'd like :-(.

>
> 3. I have previously described this but will do so again for
>    completeness here. On Solaris/Linux when Bacula does:
>
>     write();
>     ioctl(MTEOF);
>     ioctl(MTEOF);
>     ioctl(MTBSF);
>     ioctl(MTBSF);
>     ioctl(MTBSR);
>     read();
>
>    the read() re-reads the last write.  On FreeBSD, the read returns
>    0 bytes (there is also a problem of freezing the tape wrapped into
>    this example if I am not mistaken). Apparently the 0 bytes read is
>    because FreeBSD adds an additional EOF mark (not necessary) and
>    leaves the drive positioned *after* the mark thus re-reading the
>    last record fails when it logically should not.

I don't believe that FreeBSD adds an additional filemark here, but I
should add this as a test case. I have another tester program that I use
for testing block locate, but I haven't really validated it or finished
it yet.

Why, btw, are you issuing two MTEOFs? The mtop has a count field y'know
:-).
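
For reference, a sketch of writing both filemarks in one request through the
count field. It assumes the `MTEOF` in the quoted pseudocode stands for the
`MTWEOF` operation from `<sys/mtio.h>`; `write_filemarks` is a hypothetical
helper name, and `fd` is assumed to be an open tape device.

```c
#include <sys/ioctl.h>
#include <sys/mtio.h>

/*
 * Write `count` filemarks with a single MTIOCTOP request.
 * Returns whatever ioctl() returns: 0 on success, -1 with errno set
 * on failure.
 */
static int write_filemarks(int fd, int count)
{
    struct mtop op;

    op.mt_op = MTWEOF;    /* "write end-of-file mark" operation */
    op.mt_count = count;  /* how many filemarks to write */
    return ioctl(fd, MTIOCTOP, &op);
}
```

With this, the two separate calls in the sequence above would collapse to
`write_filemarks(fd, 2)`.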

>
> 4. Tape freezing: On Solaris/Linux, the tape never "freezes". On
>    FreeBSD it does freeze. As best I can determine, you freeze the
>    drive when you lose track of where you are. Typically, this
>    occurs when I do a MTBSR to re-read the last record. On Solaris/Linux
>    the tape is never frozen, but when they don't know the position,
>    they simply return -s in the MTIOCGET packet, which is fine with
>    me because Bacula only uses that info when initially reading a
>    tape to append to it.
>
>    Freezing the tape causes all sorts of problems because it generates
>    a flood of unexpected errors. Within a large complicated program like
>    Bacula, when a low level routine re-reads a record during writing and
>    the tape freezes, it cannot simply rewind the drive as this could
>    cause chaos and possible overwriting of the beginning of the drive.
>
>    I've attempted to overcome tape freezing by providing the user a
>    means to turn off MTBSR (but they don't always do so), and by issuing
>    ioctl(MTIOCERRSTAT) after every return of -1 from any I/O request.
>
>    I recommend that you do away with freezing the drive -- it seems to
>    me that it only causes more problems.  In saying that, I have to
>    admit that I really do not understand tape freezing or why you do
>    it, since I found no documentation on it, and everything I write
>    above I have deduced from what Dan has reported back to me.

Freezing the drive is precisely what Solaris and Linux *should* do. If
you've lost position, you have to take some action to bring the tape to
a known position. The unaware application should not be allowed to
overwrite in random spots on the tape. If your low level read/write
routines get any kind of error, you have to move to a "what do I have in
my tape drive now?" state anyway.

You know, I was pretty sure I'd documented the freeze option, but I
cannot find it in the man page (sa(4)) now at all.


>
> 5. I am quite fuzzy on this point because I forget exactly what happened
>    and what I did about it.
>
>    It seems to me that on Linux, if I read a block but specify a number
>    of bytes less than the number actually in the block on the tape, the
>    driver returns the data anyway.  I then check if the block is
>    internally complete and if not, increase my record size to the size
>    indicated in the data received, backspace one record, and re-read it.
>
>    If I am not mistaken, on FreeBSD, the first read returns an error,
>    and Bacula just immediately gives up.  Your documentation specifies
>    that one can never read a partial record from a tape, but it does not
>    specify what error code is generated. As a consequence, rather than
>    recovering and re-reading the record, Bacula has to assume it was
>    a fatal error.

The reason Linux 'succeeds' here is that Linux internally reads all
tape data into an oversized buffer in kernel memory anyway. This means
that it doesn't suffer an 'overrun' condition, which is what you get
if you attempt to read *less* than a tape record size. Solaris
will fail the same way as FreeBSD, btw.

What you should always do is start out by reading the largest possible
record size (a pathetic 64KB for FreeBSD) and adjust *downward* (if
desired and you are just autosizing to find a tape record size).
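
A minimal sketch of that autosizing approach, assuming a variable-block tape
device where read() returns exactly one record: ask for the largest record
first and take the returned byte count as the record size.
`probe_record_size` and `MAX_TAPE_RECORD` are hypothetical names; the 64KB
limit is the FreeBSD figure mentioned above.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

#define MAX_TAPE_RECORD (64 * 1024)  /* largest record size to ask for */

/*
 * Read one record into a maximum-size buffer and report how many bytes
 * actually came back. Because the request is never smaller than the
 * record, the read cannot overrun. Returns the record size, 0 at a
 * filemark, or -1 on error.
 */
static ssize_t probe_record_size(int fd)
{
    unsigned char *buf = malloc(MAX_TAPE_RECORD);
    ssize_t n;

    if (buf == NULL)
        return -1;
    n = read(fd, buf, MAX_TAPE_RECORD);
    free(buf);
    return n;
}
```

The sketch discards the data it read. An application that wants the record
as well as its size would keep the buffer, or backspace one record and
re-read it with the size it just learned.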


Thanks for doing the critique. There's definitely food for thought here
and some changes that *should* be made.

-matt


