From owner-freebsd-scsi@FreeBSD.ORG Tue Jun 3 07:34:52 2003
Date: Tue, 3 Jun 2003 07:34:49 -0700 (PDT)
From: Matthew Jacob
Reply-To: mjacob@feral.com
To: Kern Sibbald
cc: freebsd-scsi@freebsd.org
Subject: Re: SCSI tape data loss
In-Reply-To: <1054645616.13630.161.camel@rufus>
Message-ID: <20030603072944.U44880@beppo>

The fact that you're getting ENOSPC means that you're getting to PEOT,
past LEOT. I guess I need to see the Bacula source to see why LEOT is
being missed.

If you can build a kernel with CAMDEBUG and run

    camcontrol debug -I b:t:l

(bus:target:lun for the tape) and rerun the test, you'll get boatloads
of output, but an audit trail of what sastart and saerror are doing
around the PEOT timeframe.

There's other stuff here that I need to collect my thoughts on to mail
about. This will happen later today.

On Tue, 3 Jun 2003, Kern Sibbald wrote:

> Hello,
>
> Dan has now re-run our test of writing to two tapes. In
> this test, he told Bacula not to attempt to re-read the
> last block written, so Bacula wrote until -1 with errno=ENOSPC
> was returned, wrote two EOF marks then put up
> the next volume.
>
> The results were the same (more or less) 12 blocks of
> data were lost, which corresponds to the smaller size
> of the restored file that was split across two tapes.
>
> These 12 blocks were also at the end of the tape.
>
> During the restore, Bacula reported the following:
>
> 03-Jun-2003 05:01 undef-sd: RestoreFiles.2003-06-03_04.36.59 Error:
> Invalid block number. Expected 6060, got 6072
>
> and in Bacula's database, Bacula indicates that blocks
> 0 to 6072 were written to the first tape. In fact, only
> blocks 0 to 6071 were written to the first tape -- I
> see that Bacula has included the failed block in its
> count, which is wrong, but this doesn't change the results
> at all though.
>
> Bottom line:
>
> Even when we eliminate the code that backs
> up and re-reads the last block, we still see
> the last 12 or 13 blocks being lost. They were
> written by the program but are not physically
> on the tape.
>
> Next step:
>
> Dan is now running a test where Bacula will stop
> writing on the first tape before the EOM is reached.
>
> Best regards,
>
> Kern
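
The write loop described in the quoted message, and the LEOT/PEOT
distinction raised in the reply, can be sketched roughly as follows.
This is an illustration only, not Bacula's code and not a model of the
sa driver: write_until_end_of_medium and FakeTape are hypothetical
names, and the convention that a short or zero-length write signals
early warning (LEOT) is an assumption made for the example. The one
point taken from the thread is that a block whose write fails with
ENOSPC must not be counted as written.

```python
import errno
import os


def write_until_end_of_medium(write_block, blocks):
    """Write blocks in order until the device signals end of medium.

    write_block(data) is a hypothetical stand-in for os.write() on a tape
    device: it returns the byte count written or raises OSError.
    Returns (blocks_committed, reason).
    """
    committed = 0
    for data in blocks:
        try:
            written = write_block(data)
        except OSError as exc:
            if exc.errno == errno.ENOSPC:
                # Hard end of tape (PEOT): the failed block is NOT counted.
                return committed, "PEOT"
            raise
        if written < len(data):
            # ASSUMPTION: a short or zero-length write is the early
            # warning (LEOT); stop here and switch volumes.
            return committed, "LEOT"
        committed += 1
    return committed, "DONE"


class FakeTape:
    """Toy in-memory tape used only to exercise the loop above."""

    def __init__(self, capacity_blocks, early_warning_at=None):
        self.capacity = capacity_blocks
        self.early_warning_at = early_warning_at
        self.blocks = []
        self.warned = False

    def write(self, data):
        if (self.early_warning_at is not None and not self.warned
                and len(self.blocks) >= self.early_warning_at):
            self.warned = True
            return 0  # zero-length write as a one-time early warning
        if len(self.blocks) >= self.capacity:
            raise OSError(errno.ENOSPC, os.strerror(errno.ENOSPC))
        self.blocks.append(data)
        return len(data)


if __name__ == "__main__":
    tape = FakeTape(capacity_blocks=3)
    print(write_until_end_of_medium(tape.write, [b"data"] * 5))
```

A caller that sees "LEOT" still has room to write its filemarks before
changing volumes; a caller that only ever sees "PEOT" has run into the
physical end of the tape, which is the case under investigation in
this thread.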