Date: Wed, 1 Oct 2003 19:30:29 -0700
From: jasondic@sbcglobal.net
To: scsi@freebsd.org
Subject: Re: What's an appropriate response to "data overrun detected in Data-in phase"?
Message-ID: <200310011930.29662.jasondic@sbcglobal.net>
In-Reply-To: <200310012240.h91Medl9015195@bunrab.catwhisker.org>
References: <200310012240.h91Medl9015195@bunrab.catwhisker.org>

I don't know what the problem is, but I know what the error means. It
means the target sent more data than the initiator was expecting over
the SCSI bus. That is usually followed by a bus reset.

-Jason

On Wednesday 01 October 2003 15:40, David Wolfskill wrote:
> I am trying to set up a box to act as a "backup" server (i.e., a
> server to perform backups -- not a server to fill the breach left
> when an active server falls over, for example).
>
> I note in passing that I have done this sort of thing previously
> (with some success); but my attempts to date in the current sequence
> have met with decidedly unsatisfactory results. :-(
>
> Some things that are common to the various attempts:
>
> * There is a "combination" AIT-3 tape drive & autoloader as the only
>   devices on the SCSI bus. (The word "combination" in quotes because
>   although it looks like a single box, it contains 2 different SCSI
>   targets -- the drive is target 6; the autoloader/robot/changer is
>   target 0.) Well, the SCSI host adapter is also on the SCSI bus.
>
> * The tape drive/autoloader has no options for internal termination;
>   rather, it has 2 SCSI connectors; one of these is connected to the
>   cable; the other, to a SCSI terminator.
>
>   (Thus, bearing in mind that each 8-bit data channel on each SCSI bus
>   should be terminated precisely once at each of its 2 ends, termination
>   ought not be too complicated. Or so I thought.)
>
> * The tape drive/autoloader claims to be "autosensing SE/LVD".
>
>
> Now some variables, with the most recently-used option last:
>
> * I have tried using both an UltraSPARC 5 (sparc64), running -CURRENT
>   as of a few days ago, a similar UltraSPARC 5 running Solaris 9 (but
>   I had no user-level programs to drive the hardware; unless OpenBoot's
>   "probe-scsi-all" reported the devices, then, I had no assurance that
>   the devices were seen). Most recently, I installed FreeBSD 4.9-RC1
>   on a dual-CPU PII-400 box (then built a slightly-customized kernel,
>   to take advantage of the other CPU and to support the use of the
>   SCSI changer device)). It is this last that I will be discussing
>   in the text below, though it is my recollection that I got similar
>   behavior from the UltraSPARC 5 running -CURRENT.
>
> * SCSI host adaptors... I have been trying a few of these. There was an
>   Antares P-0060 (which, though differential, appears to not be LVD).
>   The others were Adaptec: an AHA-2940UW; a couple of SE AAA-131Bs; a
>   39160 (though the only PCI slots I have available are 32-bit ones) and
>   finally, an SE/LVD AAA-131B.
>
> In the course of working with various SCSI cards, I've become rather
> skeptical of the "automatic termination". I expect it's probably better
> than it was several years ago, but I'm a little more comfortable
> specifying it explicitly. Thus, given the topology of the bus in
> question, I set the termination on the card to "on" (or, in one case
> where the options were "off" or "auto," I left them at "auto"). Note
> that the Antares card has resistor packs.
>
> Now, I'll (finally) get to my symptoms....
>
> frecnocpc6# chio params
> /dev/ch0: 8 slots, 0 drive, 1 picker
> /dev/ch0: current picker: 0
> frecnocpc6#
>
> That works OK. Because of the errors I get below, I recompiled the
> kernel, specifying the CAM debugging options and increasing the kernel
> buffer from 10 pages to 40; here's what pops up on the (serial) console
> when I did that:
>
> (ch0:ahc0:0:6:0): entering cdgetccb
> (ch0:ahc0:0:6:0): xpt_schedule
> (ch0:ahc0:0:6:0): xpt_setup_ccb
> (ch0:ahc0:0:6:0): xpt_action
> (ch0:ahc0:0:6:0): . CDB: 1a 8 1d 0 20 0
> (ch0:ahc0:0:6:0): ahc_action
> (ch0:ahc0:0:6:0): ahc_done - scb 9
> (ch0:ahc0:0:6:0): xpt_done
> (ch0:ahc0:0:6:0): camisr
> (ch0:ahc0:0:6:0): xpt_action
> (ch0:ahc0:0:6:0): . CDB: 1a 8 1f 0 20 0
> (ch0:ahc0:0:6:0): ahc_action
> (ch0:ahc0:0:6:0): ahc_done - scb 2
> (ch0:ahc0:0:6:0): xpt_done
> (ch0:ahc0:0:6:0): camisr
> (ch0:ahc0:0:6:0): entering chioctl
> (ch0:ahc0:0:6:0): trying to do ioctl 0x40086306
> (ch0:ahc0:0:6:0): entering chioctl
> (ch0:ahc0:0:6:0): trying to do ioctl 0x40046304
>
> Now, for an error condition:
>
> frecnocpc6# chio status
> chio: /dev/ch0: CHIOGSTATUS: Input/output error
> frecnocpc6#
>
> and the corresponding console messages:
>
> (ch0:ahc0:0:6:0): entering cdgetccb
> (ch0:ahc0:0:6:0): xpt_schedule
> (ch0:ahc0:0:6:0): xpt_setup_ccb
> (ch0:ahc0:0:6:0): xpt_action
> (ch0:ahc0:0:6:0): . CDB: 1a 8 1d 0 20 0
> (ch0:ahc0:0:6:0): ahc_action
> (ch0:ahc0:0:6:0): ahc_done - scb 9
> (ch0:ahc0:0:6:0): xpt_done
> (ch0:ahc0:0:6:0): camisr
> (ch0:ahc0:0:6:0): xpt_action
> (ch0:ahc0:0:6:0): . CDB: 1a 8 1f 0 20 0
> (ch0:ahc0:0:6:0): ahc_action
> (ch0:ahc0:0:6:0): ahc_done - scb 2
> (ch0:ahc0:0:6:0): xpt_done
> (ch0:ahc0:0:6:0): camisr
> (ch0:ahc0:0:6:0): entering chioctl
> (ch0:ahc0:0:6:0): trying to do ioctl 0x40086306
> (ch0:ahc0:0:6:0): entering chioctl
> (ch0:ahc0:0:6:0): trying to do ioctl 0x800c6308
> (ch0:ahc0:0:6:0): entering cdgetccb
> (ch0:ahc0:0:6:0): xpt_schedule
> (ch0:ahc0:0:6:0): xpt_setup_ccb
> (ch0:ahc0:0:6:0): xpt_action
> (ch0:ahc0:0:6:0): . CDB: b8 0 0 84 0 1 0 0 4 0 0 0
> (ch0:ahc0:0:6:0): ahc_action
> (ch0:ahc0:0:6:0): ahc_done - scb 9
> (ch0:ahc0:0:6:0): xpt_done
> (ch0:ahc0:0:6:0): camisr
> (ch0:ahc0:0:6:0): xpt_action
> (ch0:ahc0:0:6:0): . CDB: b8 0 0 84 0 1 0 0 0 20 0 0
> (ch0:ahc0:0:6:0): ahc_action
> (ch0:ahc0:0:6:0): data overrun detected in Data-in phase. Tag == 0x2.
> (ch0:ahc0:0:6:0): Have seen Data Phase. Length = 32. NumSGs = 1.
> sg[0] - Addr 0x01367c180 : Length 32
> (ch0:ahc0:0:6:0): ahc_done - scb 2
> (ch0:ahc0:0:6:0): xpt_done
> (ch0:ahc0:0:6:0): camisr
> (ch0:ahc0:0:6:0): xpt_action
> (ch0:ahc0:0:6:0): . CDB: b8 0 0 84 0 1 0 0 0 20 0 0
> (ch0:ahc0:0:6:0): xpt_setup_ccb
> (ch0:ahc0:0:6:0): xpt_action
> (ch0:ahc0:0:6:0): ahc_action
> (ch0:ahc0:0:6:0): data overrun detected in Data-in phase. Tag == 0x9.
> (ch0:ahc0:0:6:0): Have seen Data Phase. Length = 32. NumSGs = 1.
> sg[0] - Addr 0x01367c180 : Length 32
> (ch0:ahc0:0:6:0): ahc_done - scb 9
> (ch0:ahc0:0:6:0): xpt_done
> (ch0:ahc0:0:6:0): camisr
> (ch0:ahc0:0:6:0): xpt_setup_ccb
> (ch0:ahc0:0:6:0): xpt_action
>
>
> All of which I find rather perplexifying -- what can I do about it?
>
> (The most recent previous experience I had was with an ADIC 7-slot DLT
> autoloader & drive; no problems anything like this. This one is made
> by "Bason", and is an 8-slot AIT-3 autoloader & drive.)
>
> The dmesg weighs in at about 56 KB, so I'm a little reluctant to post
> it here, but I'll be happy to provide it privately (or put it up on
> my Web server someplace), if that's called for.
>
> Finally, please include me in replies; I'm not subscribed to -scsi@.
>
> Thanks in advance,
> david
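
For anyone reading the trace later: a rough way to see what "more data than the initiator was expecting" refers to is to decode the CDBs that start with b8. The sketch below assumes they are 12-byte READ ELEMENT STATUS commands laid out as in the SCSI Media Changer command set (opcode 0xB8, allocation length in bytes 7-9). The function name is made up for illustration and is not part of any FreeBSD tool. It only shows how many bytes the initiator asked for; it says nothing about why the changer returned more.

```python
def decode_read_element_status(cdb_text):
    """Decode a 12-byte READ ELEMENT STATUS CDB given as space-separated hex.

    Hypothetical helper for illustration; field layout assumed from the
    SCSI Media Changer command set.
    """
    cdb = [int(tok, 16) for tok in cdb_text.split()]
    if len(cdb) != 12 or cdb[0] != 0xB8:
        raise ValueError("not a 12-byte READ ELEMENT STATUS CDB")
    return {
        "voltag": bool(cdb[1] & 0x10),           # volume tag info requested?
        "element_type": cdb[1] & 0x0F,           # 0 = all element types
        "start_address": (cdb[2] << 8) | cdb[3],
        "element_count": (cdb[4] << 8) | cdb[5],
        # Most bytes the initiator is prepared to receive in Data-in.
        "allocation_length": (cdb[7] << 16) | (cdb[8] << 8) | cdb[9],
    }


# The two distinct b8 CDBs from the console trace above.
first = decode_read_element_status("b8 0 0 84 0 1 0 0 4 0 0 0")
second = decode_read_element_status("b8 0 0 84 0 1 0 0 0 20 0 0")

print(first["allocation_length"])   # 1024
print(second["allocation_length"])  # 32
```

Under that assumption, the first request allows 1024 bytes and completes, while the second allows only 32 bytes for the same single element at address 0x84. That matches the "Length = 32" reported alongside the overrun message.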