Date:      Mon, 5 Feb 1996 13:19:22 +0100 (GMT-1:00)
From:      "Jesus A. Mora Marin" <amora@obelix.cica.es>
To:        undisclosed-recipients:;
Message-ID:  <199602051219.NAA11830@obelix.cica.es>

Hi. The weekend came, so it's time to work on what I do like (no more
code-grinding stupid apps in a brain-damaged 4GL five days a week, no more
telling lusers that ttys tend to work faster when plugged in. Barfulous, but I
have to earn a living).

I have recompiled my custom kernel with `config -g', backed up a version
with the debug info (more than 6MB long!), set the dumpdev option, rebooted
with the new stripped kernel and forced the crash. This time everything worked
and I could savecore and run a `gdb -k' session on the crash dump. Here it is:

(kgdb) symbol-file kernel.debug
Reading symbols from kernel.debug...done.
(kgdb) exec-file /usr/crash/kernel.0   # My /var fs is only 10MB -enough for me-
(kgdb) core-file /usr/crash/vmcore.0
IdlePTD 1a1000
current pcb at 195374
panic: page fault
#0  boot (howto=256) at ../../i386/i386/machdep.c:892
892					dumppcb.pcb_ptd = rcr3();
(kgdb) bt
#0  boot (howto=256) at ../../i386/i386/machdep.c:892
#1  0xf0112aa3 in panic (fmt=0xf016b6fc "page fault")
    at ../../kern/subr_prf.c:124
#2  0xf016c1ee in trap_fatal (frame=0xf0189f18) at ../../i386/i386/trap.c:745
#3  0xf016bd60 in trap_pfault (frame=0xf0189f18, usermode=0)
    at ../../i386/i386/trap.c:667
#4  0xf016b9ff in trap (frame={tf_es = 16, tf_ds = -252706800, tf_edi = 0, 
      tf_esi = -266780848, tf_ebp = -266821732, tf_isp = -266876354, 
      tf_ebx = 85, tf_edx = 560, tf_ecx = 561, tf_eax = -236834816, 
      tf_trapno = 12, tf_err = 2, tf_eip = -266876354, tf_cs = -267255800, 
      tf_eflags = 66118, tf_esp = 144, tf_ss = -266876864})
    at ../../i386/i386/trap.c:307
#5  0xf0164c9d in calltrap ()
#6  0xf017ca3e in matcd_blockread (state=144)
    at ../../i386/isa/matcd/matcd.c:2043
#7  0xf0107110 in softclock () at ../../kern/kern_clock.c:654
#8  0xf0165ff7 in doreti_swi ()
#9  0xf016b3ec in cpu_switch ()
(kgdb) up 4
#4  0xf016b9ff in trap (frame={tf_es = 16, tf_ds = -252706800, tf_edi = 0, 
      tf_esi = -266780848, tf_ebp = -266821732, tf_isp = -266876354, 
      tf_ebx = 85, tf_edx = 560, tf_ecx = 561, tf_eax = -236834816, 
      tf_trapno = 12, tf_err = 2, tf_eip = -266876354, tf_cs = -267255800, 
      tf_eflags = 66118, tf_esp = 144, tf_ss = -266876864})
    at ../../i386/i386/trap.c:307
307				(void) trap_pfault(&frame, FALSE);
(kgdb) frame frame->tf_ebp frame->tf_eip
#0  0xf017ca3e in matcd_blockread (state=144)
    at ../../i386/isa/matcd/matcd.c:2043
2043						*addr++=inb(port+DATA);
(kgdb) list   # Modified by hand. Not compiled lines deleted.
2033                            addr=bp->b_un.b_addr + mbx->skip;
2039				if (iftype==0) {	/*<20>Creative host I/F*/
2040					outb(port+PHASE,1);	/*Enable data read*/
2041					while((inb(port+STATUS) &
2042					      (DTEN|STEN))==STEN) {/*<19>*/
2043						*addr++=inb(port+DATA);
                                                ^^^^^^^
2047					}
2048                                    outb(port+PHASE, 0);    /* Disable read */
(kgdb) print addr
$1 = 0xf1e23000 <Address 0xf1e23000 out of bounds>
(kgdb) print bp
$2 = (struct buf *) 0xf0f04858
(kgdb) print bp->b_un.b_addr
$3 = 0xf1e22800   ..... (lot of pretty struct fields)
(kgdb) print mbx->skip
$4 = 0
(kgdb) quit
For the sake of completeness, some other vars inspected in the same session:
mbx->nblk               1
mbx->partition          0
mbx->sz                 2048
cd->partflags[mbx->partition]   1
iftype                  0
i                       85
blknum                  0xf1e23000
ldrive                  0
cdrive                  0
port                    560 (0x230)
cmd                     { 0x00, 0xfc, 0x62, 0xf0, 0x00, 0xfc, 0x62, 0xf0,
                          0x00, 0xf6, 0x62, 0xf0 }
phase                   0
state                   0 (it was 0x90 at the start of function)
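
A rough sketch of how the numbers in the trap frame can be read, assuming the
fields gdb printed really sit in the right slots. gdb shows the registers as
signed decimals, so the first helper reinterprets them as unsigned 32-bit
values; the second splits the i386 page-fault error code into its three low
bits. The PGEX_* and T_PAGEFLT names follow the FreeBSD/i386 headers as far as
I know; the struct and function names are made up for this illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define T_PAGEFLT 12    /* trap number for a page fault (assumed header name) */
#define PGEX_P    0x01  /* set: protection violation; clear: page not present */
#define PGEX_W    0x02  /* set: the faulting access was a write */
#define PGEX_U    0x04  /* set: the access came from user mode */

/* Hypothetical container for the decoded error code. */
struct pf_info {
    bool page_present;
    bool was_write;
    bool from_user;
};

/* Reinterpret a register value that gdb printed as a signed decimal. */
static uint32_t as_u32(int32_t v)
{
    return (uint32_t)v;
}

/* Split a page-fault error code (tf_err) into its three low bits. */
static struct pf_info decode_pf_err(uint32_t err)
{
    struct pf_info info;

    info.page_present = (err & PGEX_P) != 0;
    info.was_write    = (err & PGEX_W) != 0;
    info.from_user    = (err & PGEX_U) != 0;
    return info;
}
```

With the values above, tf_eip = -266876354 comes out as 0xf017ca3e, the same
address gdb reports for matcd_blockread, and tf_err = 2 reads as a kernel-mode
write to a page that is not present.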

That is, the offending source line is exactly the one pointed out by J"org
Wunsch in msg <199601301131.MAA13737@uriah.heep.sax.de>. Further, at the start
of the read loop, `addr' pointed to 0xf1e22800 (bp->b_un.b_addr + mbx->skip),
and at the time of the panic it was pointing to an address 2048 bytes higher,
i.e. the loop was about to store the 2049th byte of the block. The problem seems
to be hardware/firmware related (the mbx values indicate that a 2048-byte block
was to be read, I guess). For some reason, the drive wants to transfer more than
the expected 2KB.
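
The arithmetic behind that claim, as a small sketch. The two addresses are the
ones printed by gdb; the helper names and the 4096-byte page size constant are
assumptions made for the illustration. The faulting address also happens to be
4KB aligned, which would be consistent with the buffer ending at a page
boundary and the next page not being mapped, though the dump alone does not
prove that.

```c
#include <stddef.h>
#include <stdint.h>

#define I386_PAGE_SIZE 4096u   /* assumed page size on i386 */

/* Number of bytes already stored when the pointer reached `fault'. */
static size_t bytes_stored(uint32_t start, uint32_t fault)
{
    return (size_t)(fault - start);
}

/* Nonzero if the address is the first byte of a page. */
static int is_page_aligned(uint32_t addr)
{
    return (addr % I386_PAGE_SIZE) == 0;
}
```

bytes_stored(0xf1e22800, 0xf1e23000) gives 2048, so the store that faulted was
the one for byte number 2049.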

> The only way that more than 2048 bytes could be read (assuming no drive
> malfunction) is if the "c" partition was opened, which causes the drive
> to read 2532 byte sectors.  This is intentional.  Partition "c" must
> never be mounted.

Er... I have NEVER tried to mount a "c" partition. I've read the manpage for
matcd and was warned against it.

> ....  It is possible that the firmware he has in the system
> has a bug and the drive returns more data than it should, which could cause
> a GPF, but such an action would break the Windows driver that also reads
> bytes until the drive says "All Done" as matcd does.

In fact, there are no problems with Windows or Linux.

> I was planning to change the way this loop was done anyway to improve
> speed (the 6x TEAC drive bogs badly here) and simply pull in the expected
> number of bytes and deal with any excess later, but I really can't blame
> the current implementation as the cause of this reported failure since it
> won't fail here.

Following this idea I have patched the read loop to keep a count of the
bytes read in a block and to `break' out of the loop when the count reaches the
expected 2048 bytes, simply ignoring the excess. I am not sure this is
a proper solution, but it has proved to work great. Yes, the panic is gone,
although I wouldn't bet that this crude approach won't raise some unexpected
and `creative' problems later :). For example, something must be done to deal
with raw blocks. Frank, I think that, if you implemented the driver this way,
you had your reasons. So, any clue on how to deal with the nasty behaviour of my
hardware? I tell you, for now this simplistic patch works fine for accessing
data in a mounted fs on CD-ROM -enough for me-, but...
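
A minimal userland sketch of the idea, not the actual patch to matcd.c. The
drive is replaced by a hypothetical fake_drive structure so the loop can run
outside the kernel, the DTEN/STEN bit values are placeholders, and the
function names are invented. Only the shape of the loop follows the driver:
keep reading while the status says data is ready, but stop storing once the
expected count is reached.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder status bits; the real values live in the driver's headers. */
#define DTEN 0x02
#define STEN 0x04

/* Hypothetical stand-in for the drive: bytes it still wants to send. */
struct fake_drive {
    size_t pending;
};

/* Mimics inb(port+STATUS): data is ready while DTEN is clear, STEN set. */
static uint8_t fake_status(const struct fake_drive *d)
{
    return d->pending > 0 ? STEN : (uint8_t)(DTEN | STEN);
}

/* Mimics inb(port+DATA): hands out one byte and consumes it. */
static uint8_t fake_data(struct fake_drive *d)
{
    assert(d->pending > 0);
    d->pending--;
    return 0xAA;
}

/*
 * Store at most `expected' bytes in buf, then break out of the loop even
 * if the drive still signals data. Returns the number of bytes stored.
 */
static size_t read_block_bounded(struct fake_drive *d, uint8_t *buf,
                                 size_t expected)
{
    size_t count = 0;

    while ((fake_status(d) & (DTEN | STEN)) == STEN) {
        if (count >= expected)
            break;              /* excess bytes are left unread */
        buf[count++] = fake_data(d);
    }
    return count;
}
```

With a fake drive that offers 2049 bytes and expected = 2048, the function
stores 2048 bytes and leaves one byte pending in the drive. Whether real
hardware tolerates having that excess left unread is exactly the open question;
a raw-block read would also need a different expected size.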

> Again, I do not have the precise drive firmware revision Jesus has.

I have reviewed the text of my original report, and I see that I stated:

    Creative Lab, Matsushita/Panasonic CR-563-x 0.81.

Oops! Sorry, this is the signature of the MS-DOS CD-ROM driver. More details
about this: my CD-ROM drive is a Creative Labs CR-563-B, manufactured in
January '95. I cannot extract more info from the messy bunch of numbers and
codes on the drive's labels. It is connected to the socket on a Creative
Labs SoundBlaster AWE 32, model CT2760. The board has a signature '02 95' on
it. Draw your own conclusions. (NB: -You CAN skip this comment- I know of at
least two CR-563 drives that died miserably because of a broken design in
the heatsinking of the IC that drives the spindle motor. They simply stopped
working without even some smoke to let you know they had joined their
ancestors. I was told that some changes were introduced in the hardware of
model CR563B, but nobody is sure whether this has been fixed -so you couldn't :)

That's all. Comments, ideas and suggestions about this question will be highly
appreciated. Thanks.

                                        Jesus


