Date: Mon, 2 Jun 2003 15:45:12 -0700 (PDT)
From: Matthew Jacob
Reply-To: mjacob@feral.com
To: Kern Sibbald
cc: freebsd-scsi@freebsd.org
Subject: Re: SCSI tape data loss
In-Reply-To: <1054593075.13606.28.camel@rufus>
Message-ID: <20030602154021.T71034@beppo>

On Mon, 3 Jun 2003, Kern Sibbald wrote:

> On Mon, 2003-06-02 at 23:55, Matthew Jacob wrote:
> > > I suspect that the problem is something very simple such as
> > > the drive buffering data then hitting the physical EOM and
> > > of course any buffered data goes down the bit bucket.
> >
> > A question to ask then is why tape_pattern_tester stopped at LEOT but
> > Bacula didn't and kept going to PEOT.
> >
> > -matt
>
> This was just a thought, because you or Justin said that
> the driver does not fail writes at the LEOF, which means
> that unless you are doing something special in your
> tpt, it is not stopping at the LEOF.

Yes, it does provide a signifier. At the end of an operation that gets
a check condition indicating early warning:

	} else if (sense->flags & SSD_EOM) {
		softc->flags |= SA_FLAG_EOM_PENDING;

and

	SA_FLAG_ERR_PENDING = (SA_FLAG_EOM_PENDING|SA_FLAG_EIO_PENDING|
				SA_FLAG_EOF_PENDING),

and at the start of an I/O:

	} else if ((softc->flags & SA_FLAG_ERR_PENDING) != 0) {
	....
		bp->b_resid = bp->b_bcount;
	...
		if ((softc->flags & SA_FLAG_EOM_PENDING) != 0) {
			/*
			 * We now just clear errors in this case
			 * and let the residual be the notifier.
			 */
			bp->b_error = 0;

The signifier here back to the user application is a write returning
less than the requested amount.

>
> One thought that I had is: the fact that Bacula backs
> up at the EOM to re-read the last record could cause
> some problems. I've asked Dan if he will re-run the
> Bacula backup/restore test but with the re-read disabled.
> As someone said, this will give one more data point.

Yes.

>
> Another interesting test would be to see if the same
> data loss occurs in a situation where a tape size is
> specified such that Bacula stops writing before the
> EOM on the first tape.

That too.

-matt
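
For reference, here is a minimal sketch of how a user application might pick up that signifier. It assumes the application treats any write(2) that returns fewer bytes than requested (including zero) as the early-warning indication. The enum and both function names are hypothetical and are not part of the sa driver or of Bacula.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical result codes; these names do not come from the sa driver. */
enum tape_write_status {
	TAPE_WRITE_OK,		/* the whole record was written */
	TAPE_WRITE_EOM,		/* short or zero-length write: assume early warning */
	TAPE_WRITE_ERROR	/* write(2) returned -1; errno holds the cause */
};

/*
 * Classify the return value of a write(2) of `requested` bytes.
 * Assumption: anything less than the full count means the driver
 * reported a residual, which is the notifier described above.
 */
static enum tape_write_status
classify_tape_write(ssize_t written, size_t requested)
{
	if (written < 0)
		return (TAPE_WRITE_ERROR);
	if ((size_t)written < requested)
		return (TAPE_WRITE_EOM);
	return (TAPE_WRITE_OK);
}

/* Write one record to fd, retrying only if interrupted by a signal. */
static enum tape_write_status
write_tape_record(int fd, const void *buf, size_t len)
{
	ssize_t n;

	do {
		n = write(fd, buf, len);
	} while (n < 0 && errno == EINTR);

	return (classify_tape_write(n, len));
}
```

A caller that gets TAPE_WRITE_EOM back would stop writing data, write its filemarks, and move to the next volume instead of continuing toward PEOT.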