Skip site navigation (1)Skip section navigation (2)
Date:      Thu, 18 Mar 1999 13:31:23 +1030
From:      Greg Lehey <grog@lemis.com>
To:        vallo@matti.ee
Cc:        freebsd-current@freebsd.org
Subject:   Help! (was: Crash while newfs'ing innocent vinum volume on fresh system.)
Message-ID:  <19990318133123.S429@lemis.com>
In-Reply-To: <19990317142107.A78437@matti.ee>; from Vallo Kallaste on Wed, Mar 17, 1999 at 02:21:07PM %2B0200
References:  <19990317142107.A78437@matti.ee>

next in thread | previous in thread | raw e-mail | index | archive | help
On Wednesday, 17 March 1999 at 14:21:07 +0200, Vallo Kallaste wrote:
> I have reinstalled fresh 3.1-RELEASE to the machine in question,
> before swapped out two memory DIMM's to the single new 32MB one, just
> to be sure it's not some kind of memory error.

This is definitely not a memory error.  I wish I knew what it was,
though.  In my last message I said it wasn't appropriate to copy
-CURRENT, but maybe it is: I just can't figure out how this is
happening.

> Then I downloaded your newest archive for vinum installation:
> vinum-4.0-CURRENT.tar.gz...

> Copied the same kernel configuration file as usual to the right place
> from backup. Made some minor changes to kernel configuration, removed
> DMA support for ata disks, I don't trust the support very much yet.
> Made new debug kernel, copied to /var/crash, stripped the original
> with -g and installed. Reboot.

For just testing Vinum, you don't need to build a kernel.

> Newfs -n1 -d0 -v /dev/vinum/rsvol .... two lines newfs output and
> crash:
>
> Script started on Wed Mar 17 13:24:45 1999
> sh-2.02# gdb -k ker=07nel.gdb vmcore.0
> (kgdb) add-symbol-file /modules/vinum.ko 0x00005ba8+
> A syntax error in expression, near `'.

You shouldn't need to do this any more.  The .gdbinit file I put in
this directory did it for you.  When removing dumps, don't remove the
=2Egdbinit.

> #9  0xf01eadba in trap (frame=3D{tf_es =3D -203751408, tf_ds =3D -2037514=
08,
>       tf_edi =3D -559038242, tf_esi =3D -2147483648, tf_ebp =3D -20371514=
4,
>       tf_isp =3D -203715180, tf_ebx =3D -258066920, tf_edx =3D 16384, tf_=
ecx =3D 14,
>       tf_eax =3D -16162, tf_trapno =3D 12, tf_err =3D 0, tf_eip =3D -2576=
39372,
>       tf_cs =3D 8, tf_eflags =3D 66182, tf_esp =3D -257699840, tf_ss =3D =
-245981592})
>     at ../../i386/i386/trap.c:437
> #10 0xf0a4bc34 in ?? ()
> #11 0xf0a4bae1 in ?? ()
> #12 0xf0a4b8b6 in ?? ()
> #13 0xf012ffec in physio (strategy=3D0xf0a4b808 <swbuf+8322816>, bp=3D0x0,
>     dev=3D23296, rw=3D0, minp=3D0xf01300dc <minphys>, uio=3D0xf3db8f34)
>     at ../../kern/kern_physio.c:113
> #14 0xf0a4c97c in ?? ()
> #15 0xf0162f2f in spec_write (ap=3D0xf3db8ef8)
>     at ../../miscfs/specfs/spec_vnops.c:355

These are the critical frames.  With syumbols, they read:

#9  0xf01eadba in trap (frame=3D{tf_es =3D -203751408, tf_ds =3D -203751408=
, tf_edi =3D -559038242, tf_esi =3D -2147483648,
      tf_ebp =3D -203715144, tf_isp =3D -203715180, tf_ebx =3D -258066920, =
tf_edx =3D 16384, tf_ecx =3D 14, tf_eax =3D -16162,
      tf_trapno =3D 12, tf_err =3D 0, tf_eip =3D -257639372, tf_cs =3D 8, t=
f_eflags =3D 66182, tf_esp =3D -257699840,
      tf_ss =3D -245981592}) at ../../i386/i386/trap.c:437
#10 0xf0a4bc34 in launch_requests (rq=3D0xf0a3cb00, reviveok=3D0)
    at /usr/src/sys/modules/vinum/../../dev/vinum/vinumrequest.c:370
#11 0xf0a4bae1 in vinumstart (bp=3D0xf1569e68, reviveok=3D0)
    at /usr/src/sys/modules/vinum/../../dev/vinum/vinumrequest.c:300
#12 0xf0a4b8b6 in vinumstrategy (bp=3D0xf1569e68) at /usr/src/sys/modules/v=
inum/../../dev/vinum/vinumrequest.c:151
#13 0xf012ffec in physio (strategy=3D0xf0a4b808 <vinumstrategy>, bp=3D0x0, =
dev=3D23296, rw=3D0, minp=3D0xf01300dc <minphys>,
    uio=3D0xf3db8f34) at ../../kern/kern_physio.c:113
#14 0xf0a4c97c in vinumwrite (dev=3D23296, uio=3D0xf3db8f34, ioflag=3D1)
    at /usr/src/sys/modules/vinum/../../dev/vinum/vinumrequest.c:995
#15 0xf0162f2f in spec_write (ap=3D0xf3db8ef8) at ../../miscfs/specfs/spec_=
vnops.c:355

This is a problem I've seen before, but it completely baffles me.  The
request passed to launch_requests (frame 10) has been deallocated.
Some of the debug code I put in caught it:

(kgdb) p freeinfo[7]
$2 =3D {
  time =3D {
    tv_sec =3D 921669613,=20
    tv_usec =3D 289712
  },=20
  seq =3D 24,=20
  size =3D 36,=20
  line =3D 174,=20
  address =3D 0xf0a3cb00 "=DE=C0=AD=DEh\235\"=F0=DE=C0=AD=DE=C0=C8=A3=F0=DE=
=C0=AD=DE=DE=C0=AD=DE=DE=C0=AD=DE=DE=C0=AD=DE=DE=C0=AD=DE=DE=C0=AD=DE=DE=C0=
=AD=DE=DE=C0=AD=DE=DE=C0=AD=DE=DE=C0=AD=DE=DE=C0=AD=DE=DE=C0=AD=DE",=20
  file =3D "vinuminterrupt.c"
}

This was called from freerq, which frees the complete request.  freerq
is called from only four places: one on completion of the request
(which in this case is just about to be started), one if the request
is aborted (which also sets bp->b_error, which is not set here), once
in a read context (which is not applicable here: it's a write), and
once just before the call to launch_requests in frame 11:

292             if ((status > REQUEST_RECOVERED)                    /* can'=
t satisfy it */
293             ||(bp->b_flags & B_DONE)) {                         /* XXX =
shouldn't get this without bad status */
294                 if (status =3D=3D REQUEST_DOWN) {                   /* =
not enough subdisks */
295                     bp->b_error =3D EIO;                          /* I/=
O error */
296                     bp->b_flags |=3D B_ERROR;
297                 }
298                 if ((bp->b_flags & B_DONE) =3D=3D 0)
299                     biodone(bp);
300                 freerq(rq);
301                 return -1;
302             }
303             return launch_requests(rq, reviveok);               /* now =
start the requests if we can */

[the line numbers are off by 3, but the code here is unchanged].  If
we go through this code, we will also return.  There's something to
suggest that this code was executed, since the buffer was also marked
done (at line 298/300).  I checked the assembly code and confirmed
that it was, in fact, correct.

So where is this coming from?  I'm completely baffled.  It doesn't
happen to most people, though I have had reports of one or two other
cases.  About the only clue is that the problem didn't occur when I
removed the debug memory allocator, but I don't know whether it went
away or into hiding.  I'd really like to find out what's going on
here.

Here's what I've done:  I've built a version of vinum on your site
with even more debug code, and would like you to run it again.  It'll
crash, but hopefully the dump will give me more information.  If it's
OK for me to run the tests, tell me so and I'll do it during the
night.  When you run this time, do the following:

  # vinum
  vinum -> debug 104
  vinum -> setdaemon 1

Thanks for your patience.

Greg
--
See complete headers for address, home page and phone numbers
finger grog@lemis.com for PGP public key


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-current" in the body of the message




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?19990318133123.S429>