Date:      Thu, 4 Jan 2001 10:34:26 +1030
From:      Greg Lehey <grog@lemis.com>
To:        Roman Shterenzon <roman@jamus.xpert.com>
Cc:        Daniel Lang <dl@leo.org>, freebsd-stable@freebsd.org
Subject:   Re: Vinum saga continues
Message-ID:  <20010104103426.C4336@wantadilla.lemis.com>
In-Reply-To: <20010103141514.A381@jamus.xpert.com>; from roman@jamus.xpert.com on Wed, Jan 03, 2001 at 02:15:14PM +0200
References:  <20010103141514.A381@jamus.xpert.com>

[Format recovered--see http://www.lemis.com/email/email-format.html]

On Wednesday,  3 January 2001 at 14:15:14 +0200, Roman Shterenzon wrote:
> Hi,
>
> Attached is the most valuable information that was in my pr 22103.
> I've read the vinumdebug and the other guy's PR.
> I'm still not getting what is missing.
> You told the other guy to submit the backtrace, but it was in fact submitted!
> It's in my PR as well.
> Your responses are very brief - "please read vinumdebug", but in fact, if
> there's something that is missing, you can be more specific.

OK.  I don't know what's so difficult about this, but here we go.  On
the web page to which I refer, I say:

  If you need to contact me because of problems with Vinum, please send
  me a mail message with the following information:

  - What problems are you having?

    You don't say this, but I suppose it's obvious.

  - Which version of FreeBSD are you running?

    I can't find this in your report.    

  - Have you made any changes to the system sources, including Vinum?

    I can't find this in your report.    

  - Supply the output of the vinum list command.  If you can't start
    Vinum, supply the on-disk configuration, as described below.  If
    you can't start Vinum, then (and only then) send a copy of the
    configuration file.

    I can't find this in your report.    

  - Supply an extract of the Vinum history file.  Unless you have
    explicitly renamed it, it will be /var/log/vinum_history.  This
    file can get very big; please limit it to the time around when you
    have the problems.  Each line contains a timestamp at the
    beginning, so you will have no difficulty in establishing which
    data is of relevance.

    I can't find this in your report.    

  - Supply an extract of the file /var/log/messages.  Restrict the
    extract to the same time frame as the history file.  Again, each
    line contains a timestamp at the beginning, so you will have no
    difficulty in establishing which data is of relevance.

    I can't find this in your report.    

  - If you have a crash, please supply a backtrace from the dump
    analysis as discussed below under Kernel Panics.  Please don't
    delete the crash dump; it may be needed for further analysis.
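
One way to cut /var/log/messages down to the relevant period is
sketched below.  It assumes the traditional syslog timestamp at the
start of each line ("Jan  3 14:15:14"); since that format carries no
year, the year has to be supplied.  The function name and its
parameters are made up for illustration, and the format string would
need adjusting if the Vinum history file uses a different timestamp
layout.

```python
from datetime import datetime


def extract_window(lines, start, end, year):
    """Keep syslog-style lines whose leading timestamp lies in [start, end].

    Hypothetical helper: assumes each line begins with a 15-character
    "Mmm dd hh:mm:ss" timestamp, as written by traditional syslogd.
    """
    kept = []
    for line in lines:
        stamp = line[:15]
        try:
            # Whitespace in the format matches one or more spaces, so the
            # space-padded day in "Jan  3" parses correctly.
            when = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        except ValueError:
            continue  # line does not start with a parsable timestamp
        if start <= when <= end:
            kept.append(line)
    return kept
```

Called with the contents of the log and two datetime values around
the crash, it returns only the lines from that period, in their
original order.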

Basically, all I can see here is the backtrace, which is still wrapped
at 80 characters, despite all my requests.  I've had to manually
reformat it to make it legible.  Have you really read the web page?

> Alfred Perlstein looked at my PR once and he thinks that it's due to
> stack smashing.
> However, he wasn't able to find where it happens.
> It may be in fact interaction with some other driver, like you said, for
> example - fxp. This is why I submitted the dmesg output.

Please, only if I ask for it.

>  #62 0xc023660b in trap (frame={tf_fs = 0xc0270010, tf_es = 0xc0150010, tf_ds = 0x680010, tf_edi = 0xc16e9588,
>        tf_esi = 0xc16e9400, tf_ebp = 0xc02773b0, tf_isp = 0xc0277380, tf_ebx = 0xc208e340, tf_edx = 0x0,
>        tf_ecx = 0x5610001, tf_eax = 0xff9773bf, tf_trapno = 0xc, tf_err = 0x2, tf_eip = 0xc150fc67, tf_cs = 0x8,
>        tf_eflags = 0x10246, tf_esp = 0xc16e9588, tf_ss = 0xc14bd000}) at ../../i386/i386/trap.c:426
>  #63 0xc150fc67 in complete_rqe () at /usr/src/sys/modules/vinum/../../dev/vinum/vinuminterrupt.c:199
>  #64 0xc0178d6b in biodone (bp=0xc16e9588) at ../../kern/vfs_bio.c:2637
>  #65 0xc0126bb9 in dadone (periph=0xc14ca700, done_ccb=0xc1808400) at ../../cam/scsi/scsi_da.c:1246
>  #66 0xc0122aff in camisr (queue=0xc0298690) at ../../cam/cam_xpt.c:6319
>  #67 0xc0122911 in swi_cambio () at ../../cam/cam_xpt.c:6222
>  #68 0xc022d0e0 in splz_swi ()
>  (kgdb) up 63
>  #64 0xc0178d6b in biodone (bp=0xc16e9588) at ../../kern/vfs_bio.c:2637
>  2637                   (*bp->b_iodone) (bp);
>  (kgdb) print bp
>  $1 = (struct buf *) 0xc16e9588
>  (kgdb) print *bp->b_iodone
>  $2 = {void ()} 0xc150f6ac <complete_rqe>
>  (kgdb) down
>  #63 0xc150fc67 in complete_rqe () at /usr/src/sys/modules/vinum/../../dev/vinum/vinuminterrupt.c:199
>  199    }
>  (kgdb) list
>  194                    VOL[rq->volplex.volno].active--;            /* another request finished */
>  195                biodone(ubp);                                   /* top level buffer completed */
>  196                freerq(rq);                                     /* return the request storage */
>  197            }
>  198        }
>  199    }
>  (kgdb) down
>  #62 0xc023660b in trap (frame={tf_fs = 0xc0270010, tf_es = 0xc0150010, tf_ds = 0x680010, tf_edi = 0xc16e9588,
>        tf_esi = 0xc16e9400, tf_ebp = 0xc02773b0, tf_isp = 0xc0277380, tf_ebx = 0xc208e340, tf_edx = 0x0,
>        tf_ecx = 0x5610001, tf_eax = 0xff9773bf, tf_trapno = 0xc, tf_err = 0x2, tf_eip = 0xc150fc67, tf_cs = 0x8,
>        tf_eflags = 0x10246, tf_esp = 0xc16e9588, tf_ss = 0xc14bd000}) at ../../i386/i386/trap.c:426
>  426                            (void) trap_pfault(&frame, FALSE, eva);
>  (kgdb) up 2
>  #64 0xc0178d6b in biodone (bp=0xc16e9588) at ../../kern/vfs_bio.c:2637
>  2637                   (*bp->b_iodone) (bp);
>  (kgdb) up
>  #65 0xc0126bb9 in dadone (periph=0xc14ca700, done_ccb=0xc1808400) at ../../cam/scsi/scsi_da.c:1246
>  1246                   biodone(bp);
>  (kgdb) print bp
>  $3 = (struct buf *) 0xc16e9588
>  (kgdb) print *bp
>    b_flags = 0x204,
>    b_qindex = 0x0,
>    b_xflags = 0x0,
>    b_lock = {
>      lk_interlock = {
>        lock_data = 0x0
>      },
>      lk_flags = 0x400,
>      lk_sharecount = 0x0,
>      lk_waitcount = 0x0,
>      lk_exclusivecount = 0x1,
>      lk_prio = 0x14,
>      lk_wmesg = 0xc0257a24 "bufwait",
>      lk_timo = 0x0,
>      lk_lockholder = 0x5
>    },
>    b_error = 0x0,
>    b_bufsize = 0x2000,
>    b_bcount = 0x2000,
>    b_resid = 0x0,
>    b_dev = 0xc15cd880,
>    b_data = 0xcbdcc000 "jA\002",
>    b_kvabase = 0x0,
>    b_kvasize = 0x0,
>    b_lblkno = 0x0,
>    b_blkno = 0x2b08149,
>    b_offset = 0x0,
>    b_iodone = 0xc150f6ac <complete_rqe>,

OK, this is *not* the buffer header corruption bug, but it's happening
in a very similar position.  With the buffer header corruption, you
wouldn't have got as far as this, because b_iodone would be zeroed
out.  I also can't see any other obvious damage to the buffer header.

What we need to do now is to find out where the trap occurred.  That's
at line 199 of vinuminterrupt.c, which shows as the very end of
complete_rqe.  Could you give me the following information from gdb,
please?

 (gdb) x/20i 0xc150fc60
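
For orientation, the trap frame in frame #62 can be read as in the
sketch below.  It assumes the standard i386 page-fault error code
layout (bit 0: protection violation rather than page not present,
bit 1: write, bit 2: user mode) and that trap number 12 is the page
fault; the constant and function names are made up for illustration.

```python
# Low three bits of the i386 page-fault error code (names hypothetical).
ERR_PROTECTION = 0x1  # set: protection violation; clear: page not present
ERR_WRITE = 0x2       # set: write access; clear: read access
ERR_USER = 0x4        # set: fault taken in user mode; clear: kernel mode

T_PAGEFLT = 12        # page-fault trap number (assumed value)


def describe_page_fault(err):
    """Turn a page-fault error code into a short description."""
    cause = "protection violation" if err & ERR_PROTECTION else "page not present"
    access = "write" if err & ERR_WRITE else "read"
    mode = "user" if err & ERR_USER else "kernel"
    return f"{mode}-mode {access}, {cause}"


# Values copied from the trap frame in the backtrace.
tf_trapno, tf_err, tf_eip = 0xc, 0x2, 0xc150fc67

if tf_trapno == T_PAGEFLT:
    print(describe_page_fault(tf_err))  # kernel-mode write, page not present

# Offset of the faulting instruction from the start of the x/20i dump.
print(hex(tf_eip - 0xc150fc60))  # 0x7
```

So the dump requested above starts 7 bytes before the faulting
address.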

Thanks
Greg
--
When replying to this message, please take care not to mutilate the
original text.  
For more information, see http://www.lemis.com/email.html
Finger grog@lemis.com for PGP public key
See complete headers for address and phone numbers





