Date:      Mon, 28 Nov 2005 10:43:38 +0100 (CET)
From:      Stijn Hoop <stijn@win.tue.nl>
To:        FreeBSD-gnats-submit@FreeBSD.org
Cc:        Lukas Ertl <le@FreeBSD.org>
Subject:   kern/89660: panic due to g_malloc returning null in gv_drive_done
Message-ID:  <20051128094338.3D52DAC823@sandcat.nl>
Resent-Message-ID: <200511280950.jAS9o27a078004@freefall.freebsd.org>


>Number:         89660
>Category:       kern
>Synopsis:       panic due to g_malloc returning null in gv_drive_done
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Mon Nov 28 09:50:02 GMT 2005
>Closed-Date:
>Last-Modified:
>Originator:     Stijn Hoop
>Release:        FreeBSD 6.0-RELEASE i386
>Organization:
>Environment:

System: FreeBSD 6.0-RELEASE #1: Sun Nov 27 14:48:26 CET 2005 stijn@pcwin002.win.tue.nl:/net/freebsd/6.0-SECURITY/obj/net/freebsd/6.0-SECURITY/src/sys/SANDCAT i386

>Description:

- This machine panics every night during the daily maintenance run, with the
  following backtrace (hand-transcribed for lack of a serial cable):

Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0x0
fault code              = supervisor write, page not present
instruction pointer     = 0x20:0xc0711a41
stack pointer           = 0x28:0xe3245ccc
frame pointer           = 0x28:0xe3245cd8
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 3 (g_up)
[thread pid 3 tid 100021 ]
Stopped at      gv_drive_done+0x29:     movl    %ebx,0(%eax)
db> bt
Tracing pid 3 tid 100021 td 0xc1e8f300
gv_drive_done(c6bcc630) at gv_drive_done+0x29
biodone(c6bcc630) at biodone+0x8b
g_io_schedule_up(c1e8f300) at g_io_schedule_up+0x86
g_up_procbody(0,e3245d38) at g_up_procbody+0x6e
fork_exit(c0482c04,0,e3245d38) at fork_exit+0x70
fork_trampoline() at fork_trampoline+0x8
--- trap 0x1, eip = 0, esp = 0xe3245d6c, ebp = 0 ---
db>

  Luckily I was able to obtain a crash dump, which yielded the following
  information (after loading the geom_vinum symbols from a debug module):

#0  doadump () at pcpu.h:165
#1  0xc0431953 in db_fncall (dummy1=-1066867712, dummy2=0, dummy3=-1067549157,
    dummy4=0xe3245af8 "$[$ã\210e^À\020[$ã\024[$ã\220\a")
    at /net/freebsd/6.0-SECURITY/src/sys/ddb/db_command.c:492
#2  0xc0431758 in db_command (last_cmdp=0xc0667c04, cmd_table=0x0,
    aux_cmd_tablep=0xc0638d58, aux_cmd_tablep_end=0xc0638d5c)
    at /net/freebsd/6.0-SECURITY/src/sys/ddb/db_command.c:350
#3  0xc0431820 in db_command_loop ()
    at /net/freebsd/6.0-SECURITY/src/sys/ddb/db_command.c:458
#4  0xc043342d in db_trap (type=12, code=0)
    at /net/freebsd/6.0-SECURITY/src/sys/ddb/db_main.c:221
#5  0xc04d38ff in kdb_trap (type=12, code=0, tf=0xe3245c8c)
    at /net/freebsd/6.0-SECURITY/src/sys/kern/subr_kdb.c:473
#6  0xc05fea4c in trap_fatal (frame=0xe3245c8c, eva=0)
    at /net/freebsd/6.0-SECURITY/src/sys/i386/i386/trap.c:822
#7  0xc05fe7bb in trap_pfault (frame=0xe3245c8c, usermode=0, eva=0)
    at /net/freebsd/6.0-SECURITY/src/sys/i386/i386/trap.c:742
#8  0xc05fe3d5 in trap (frame=
      {tf_fs = 8, tf_es = 40, tf_ds = -484179928, tf_edi = 0, tf_esi = -1038670080, tf_ebp = -484156200, tf_isp = -484156232, tf_ebx = -960707024, tf_edx = 0, tf_ecx = -1041698048, tf_eax = 0, tf_trapno = 12, tf_err = 2, tf_eip = -1066329535, tf_cs = 32, tf_eflags = 590470, tf_esp = -1066329576, tf_ss = -960707024})
    at /net/freebsd/6.0-SECURITY/src/sys/i386/i386/trap.c:432
#9  0xc05f0f8a in calltrap ()
    at /net/freebsd/6.0-SECURITY/src/sys/i386/i386/exception.s:139
#10 0xc0711a41 in gv_drive_done (bp=0xc6bcc630) at geom.h:290
#11 0xc05038a7 in biodone (bp=0xc6bcc630)
    at /net/freebsd/6.0-SECURITY/src/sys/kern/vfs_bio.c:2893
#12 0xc0482a26 in g_io_schedule_up (tp=0xc1e8f300)
    at /net/freebsd/6.0-SECURITY/src/sys/geom/geom_io.c:474
#13 0xc0482c72 in g_up_procbody ()
    at /net/freebsd/6.0-SECURITY/src/sys/geom/geom_kern.c:95
#14 0xc04a5820 in fork_exit (callout=0xc0482c04 <g_up_procbody>, arg=0x0,
    frame=0xe3245d38) at /net/freebsd/6.0-SECURITY/src/sys/kern/kern_fork.c:789
#15 0xc05f0fec in fork_trampoline ()
    at /net/freebsd/6.0-SECURITY/src/sys/i386/i386/exception.s:208
(kgdb) frame 10
#10 0xc0711a41 in gv_drive_done (bp=0xc6bcc630) at geom.h:290
290     geom.h: No such file or directory.
        in geom.h
(kgdb) print/x $eax
$3 = 0x0
(kgdb) disassemble gv_drive_done+0x41
Dump of assembler code for function gv_drive_done:
0xc0711a18 <gv_drive_done+0>:   push   %ebp
0xc0711a19 <gv_drive_done+1>:   mov    %esp,%ebp
0xc0711a1b <gv_drive_done+3>:   push   %edi
0xc0711a1c <gv_drive_done+4>:   push   %esi
0xc0711a1d <gv_drive_done+5>:   push   %ebx
0xc0711a1e <gv_drive_done+6>:   mov    0x8(%ebp),%ebx
0xc0711a21 <gv_drive_done+9>:   mov    0x44(%ebx),%eax
0xc0711a24 <gv_drive_done+12>:  mov    (%eax),%eax
0xc0711a26 <gv_drive_done+14>:  mov    0x3c(%eax),%esi
0xc0711a29 <gv_drive_done+17>:  orb    $0x1,0x2(%ebx)
0xc0711a2d <gv_drive_done+21>:  push   $0x101
0xc0711a32 <gv_drive_done+26>:  push   $0xc0643ca0
0xc0711a37 <gv_drive_done+31>:  push   $0xc
0xc0711a39 <gv_drive_done+33>:  call   0xc04b0a30 <malloc>
0xc0711a3e <gv_drive_done+38>:  add    $0xc,%esp
0xc0711a41 <gv_drive_done+41>:  mov    %ebx,(%eax)
0xc0711a43 <gv_drive_done+43>:  push   $0xf6
0xc0711a48 <gv_drive_done+48>:  push   $0xc071a752
0xc0711a4d <gv_drive_done+53>:  lea    0x7c(%esi),%ebx

  which on my system corresponds to

%%%
static void
gv_drive_done(struct bio *bp)
{
        struct gv_drive *d;
        struct gv_bioq *bq;

        /* Put the BIO on the worker queue again. */
        d = bp->bio_from->geom->softc;
        bp->bio_cflags |= GV_BIO_DONE;
        bq = g_malloc(sizeof(*bq), M_NOWAIT | M_ZERO);  /* <--- g_malloc returns NULL */
        bq->bp = bp;
        mtx_lock(&d->bqueue_mtx);
        TAILQ_INSERT_TAIL(&d->bqueue, bq, queue);
        wakeup(d);
        mtx_unlock(&d->bqueue_mtx);
}
%%%
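
  As a cross-check, the three values pushed before the call to malloc in the
  disassembly above appear to line up with this g_malloc call (g_malloc is
  presumably an inline wrapper in geom.h, which would explain the geom.h:290
  location): 0xc is the size, 0xc0643ca0 the malloc type, and 0x101 the flags.
  A minimal sketch of that decoding follows; the flag values are an assumption
  taken from memory of sys/malloc.h and should be checked against the tree in
  use, and may_return_null is a hypothetical helper, not a kernel function.

```c
#include <assert.h>

/* Assumed flag values from FreeBSD's sys/malloc.h; check your own tree. */
#define M_NOWAIT        0x0001  /* do not sleep; the call may return NULL */
#define M_ZERO          0x0100  /* zero the allocation */

/* The first value pushed before `call malloc` in the disassembly. */
#define PUSHED_FLAGS    0x101

/* Compile-time check that the pushed word equals M_NOWAIT | M_ZERO. */
static_assert(PUSHED_FLAGS == (M_NOWAIT | M_ZERO),
    "pushed flags match the g_malloc call");

/* Hypothetical helper: nonzero if these flags let the allocator fail. */
static int
may_return_null(int flags)
{
        return ((flags & M_NOWAIT) != 0);
}
```

  With M_NOWAIT set, a NULL return is a documented possibility, which matches
  %eax being 0 at the faulting store.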

  Now, I don't know whether this routine runs in an interrupt context, so
  I can't tell whether it is safe to remove the M_NOWAIT.
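
  Independently of that question, the immediate fault is the unchecked
  dereference of the g_malloc result. The following is a userland sketch of
  the failure mode and of a checked variant, not a kernel patch: struct bio
  and struct bioq_entry are simplified stand-ins, stub_malloc, done_checked
  and fail_next_alloc are hypothetical names, and recording ENOMEM on the bio
  is only one conceivable recovery; what gv_drive_done should really do when
  the allocation fails is left open.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

#define M_NOWAIT 0x0001                 /* assumed value; stand-in flag */

struct bio { int bio_error; };          /* stand-in, not the kernel struct */
struct bioq_entry { struct bio *bp; };  /* stand-in for struct gv_bioq */

static int fail_next_alloc;     /* knob: force the next allocation to fail */

/* Stand-in allocator: with M_NOWAIT it may return NULL, like malloc(9). */
static void *
stub_malloc(size_t size, int flags)
{
        if ((flags & M_NOWAIT) != 0 && fail_next_alloc) {
                fail_next_alloc = 0;
                return (NULL);
        }
        return (calloc(1, size));
}

/* Returns the new queue entry, or NULL after recording ENOMEM on the bio. */
static struct bioq_entry *
done_checked(struct bio *bp)
{
        struct bioq_entry *bq;

        assert(bp != NULL);
        bq = stub_malloc(sizeof(*bq), M_NOWAIT);
        if (bq == NULL) {
                /* The original code stores through bq here and faults. */
                bp->bio_error = ENOMEM;
                return (NULL);
        }
        bq->bp = bp;
        return (bq);
}
```

  Setting fail_next_alloc before a call exercises the failure path; the
  caller owns the returned entry and frees it.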

  Any thoughts?

>How-To-Repeat:

I wish I knew. I've tried running all of 'periodic daily', the backup
routines (normally run at 4:00 AM), and generating disk load using
dd if={gvinum drive} of=/dev/null, all at the same time, and the box
just copes (albeit slowly). However, this is the 5th day in a row that
it has crashed during daily maintenance, so something else must also be
triggering it. I'll update the PR if I find a way to reproduce it more
easily, but for now it's "only" once a day...

>Fix:

Would love to have one.
>Release-Note:
>Audit-Trail:
>Unformatted:


