Skip site navigation (1)Skip section navigation (2)
Date:      Sat, 22 Feb 2003 11:47:41 -0800
From:      Marcel Moolenaar <marcel@xcllnt.net>
To:        Hidetoshi Shimokawa <simokawa@sat.t.u-tokyo.ac.jp>
Cc:        firewire@FreeBSD.org
Subject:   Re: Bad news: bus resets not fixed yet
Message-ID:  <20030222194741.GA579@dhcp01.pn.xcllnt.net>
In-Reply-To: <ybssmugmopp.wl@ett.sat.t.u-tokyo.ac.jp>
References:  <20030222002547.GA1247@dhcp01.pn.xcllnt.net> <ybssmugmopp.wl@ett.sat.t.u-tokyo.ac.jp>

next in thread | previous in thread | raw e-mail | index | archive | help
On Sat, Feb 22, 2003 at 10:25:54PM +0900, Hidetoshi Shimokawa wrote:
> > 
> > I rebooted after Hidetoshi-san's patch has been committed and I
> > also made sure to include ken's cd(4) fixes, but unfortunately
> > the bus resets have returned. We may have timing problems that
> > were hidden by the additional overhead of emitting the debug info.

As a quick update: Enabling debug in sbp "fixed" the problem.
dmesg shows (partial, it should not be much different (if at all)
from the one I gave the URL to before -- now it's archived):

	:
FreeBSD 5.0-CURRENT #10: Sat Feb 22 01:18:29 PST 2003
    marcel@dhcp01.pn.xcllnt.net:/usr/src/sys/i386/compile/VAIO
	:
fwohci0: <Texas Instruments TSB43AB22/A> mem 0xe0200000-0xe0203fff,0xe0205000-0xe02057ff at device 2.0 on pci2
fwohci0: PCI bus latency was changing to 250.
pcib1: slot 2 INTA is routed to irq 9
fwohci0: OHCI version 1.10 (ROM=1)
fwohci0: No. of Isochronous channel is 4.
fwohci0: EUI64 08:00:46:03:01:24:cb:1d
fwohci0: Phy 1394a available S400, 2 ports.
fwohci0: Link S400, max_rec 2048 bytes.
firewire0: <IEEE1394(FireWire) bus> on fwohci0
sbp_identify
sbp_probe
sbp0: <SBP2/SCSI over firewire> on firewire0
sbp_attach (cold=1)
fwohci0: Initiate bus reset
fwohci0: BUS reset
fwohci0: node_id = 0xc000ffc1, CYCLEMASTER mode
firewire0: 2 nodes, maxhop <= 1, cable IRM = 1 (me)
	:
firewire0: New S400 device ID:08004603011eb709
firewire0: Device SBP-II
sbp_post_explore (sbp_cold=2)
sbp_post_explore: EUI:08004603011eb709 spec=1 key=1.
target 0 lun 0 found
sbp0:0:0 LOGIN
sbp0:0:0 ordered:0 type:5 EUI:08004603011eb709 node:0 speed:2 maxrec:10 new!
sbp0:0:0 'Sony' 'PCGA-DSM5' 'ad1830'
fwohci0: BUS reset
fw_xfer_done: pending
fwohci0: node_id = 0xc000ffc1, CYCLEMASTER mode
firewire0: 2 nodes, maxhop <= 1, cable IRM = 1 (me)
ad0: 28615MB <HITACHI_DK23DA-30> [58140/16/63] at ata0-master UDMA100
pcm0: measured ac97 link rate at 5994 Hz
sbp_post_explore (sbp_cold=1)
sbp_post_explore: EUI:08004603011eb709 spec=1 key=1.
sbp0:0:0 RECONNECT
sbp0:0:0 ordered:0 type:5 EUI:08004603011eb709 node:0 speed:2 maxrec:10 new!
sbp0:0:0 'Sony' 'PCGA-DSM5' 'ad1830'
fw_attach_dev: 1 pending handlers called
sbp0:0:0 ORB status src:1 resp:0 dead:0 len:1 stat:4 orb:00080513c
sbp0:0:0 Access denied
sbp0:0:0 reconnect failed
sbp0:0:0 LOGIN
(probe0:umass-sim0:0:0:0): INQUIRY. CDB: 12 0 0 0 24 0 
(probe0:umass-sim0:0:0:0): CAM Status: SCSI Status Error
(probe0:umass-sim0:0:0:0): SCSI Status: Check Condition
(probe0:umass-sim0:0:0:0): UNIT ATTENTION asc:29,0
(probe0:umass-sim0:0:0:0): Power on, reset, or bus device reset occurred
(probe0:umass-sim0:0:0:0): Retrying Command (per Sense Data)
(probe0:umass-sim0:0:0:0): INQUIRY. CDB: 12 0 0 0 24 0 
(probe0:umass-sim0:0:0:0): CAM Status: SCSI Status Error
(probe0:umass-sim0:0:0:0): SCSI Status: Check Condition
(probe0:umass-sim0:0:0:0): NOT READY asc:3a,0
(probe0:umass-sim0:0:0:0): Medium not present
(probe0:umass-sim0:0:0:0): Unretryable error
Mounting root from ufs:/dev/ad0s1a
sbp0:0:0 login: len 12, ID 0, cmd 0000fffff0010100, recon_hold 0
sbp0:0:0 sbp_busy_timeout
sbp0:0:0 sbp_agent_reset
sbp0:0:0 sbp_do_attach
sbp0:0:0 sbp_cam_scan_lun
sbp0:0:0 ORB status src:1 resp:0 dead:0 len:7 stat:c orb:00080560c
sbp0:0:0 Request aborted
sbp0:0:0 XPT_SCSI_IO: cmd: 12 01 80 00 ff 00 ff af f7 f7, flags: 0x40, 6b cmd/255b data/18b sense
sbp0:0:0 SCSI status 2 sfmt 0 valid 0 key 5 code 24 qlfr 0 len 7
sbp0:0:0 ORB status src:1 resp:0 dead:0 len:7 stat:c orb:000805740
sbp0:0:0 Request aborted
sbp0:0:0 XPT_SCSI_IO: cmd: 00 00 00 00 00 00 ff af f7 f7, flags: 0xc0, 6b cmd/0b data/32b sense
sbp0:0:0 SCSI status 2 sfmt 0 valid 0 key 6 code 29 qlfr 0 len 7
sbp0:0:0 sbp_cam_callback
sbp0:0:0 ORB status src:1 resp:0 dead:0 len:7 stat:c orb:000805874
sbp0:0:0 Request aborted
sbp0:0:0 XPT_SCSI_IO: cmd: 25 00 00 00 00 00 00 00 00 00, flags: 0x40, 10b cmd/8b data/32b sense
sbp0:0:0 SCSI status 2 sfmt 0 valid 0 key 2 code 3a qlfr 0 len 7
cd0 at sbp0 bus 0 target 0 lun 0
cd0: <MATSHITA UJDA730 DVD/CDRW 1.00> Removable CD-ROM SCSI-0 device 
cd0: 50.000MB/s transfers
cd0: Attempt to query device size failed: NOT READY, Medium not present

This looks good. Next step: Rebuild with the exact same sources, but
with debug disabled again, just to make sure no other changes have been
made (ie cvs update). If the bus reset loop (I have to give it a name)
reappears we have something that's reproducable, which means we can
try to narrow down on the code...

> > o  Aren't bus resets expensive operations in general/principle?
> 
> It's expensive, at least, for bus operation.
> We cannot do transactions while bus reset phase.

Hmmmm... As Katsushi-san explained, bus resets happen for a couple
of reasons of which one can be unstability. If we know bus resets
are expensive, we may be able to limit the rate of bus resets or
simply bring down the bus (can we keep a device off the bus?) if
we detect a bus reset loop... just a thought...

> > o  My logs show 6 resets per second (unoptimized :-). Isn't this
> >    high no matter if there's a bug or not?
> 
> Too high and unusual.
> Please note the driver doesn't initiate bus reset at all as far
> as dmesg says. Your CD drive seems initiate bus reset.
> (fwcontrol -t shows who initiated bus reset last time)
> phy chip of your laptop may initiate bus reset.

Ah... interesting. With debug enabled it shows:

dhcp01% sudo fwcontrol -t
crc_len: 4 generation:2 node_count:2 sid_count:2
id link gap_cnt speed delay cIRM power port0 port1 port2 ini more
00    1       1  S400     0    0    0W     -     P         1    0
01    1       1  S400     0    1    0W     -     C         0    0

I'm interested to see what it will show when I disable debug
again.

> Even with the success case:
> http://www.xcllnt.net/~marcel/vaio.txt 
<snip>
> Someone initiated bus reset just after login operation. I'd like to
> know why this happens...

Ok. I'll pay special attention to that.

-- 
 Marcel Moolenaar	  USPA: A-39004		 marcel@xcllnt.net

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-firewire" in the body of the message




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20030222194741.GA579>