Date: Mon, 28 Nov 2005 10:43:38 +0100 (CET)
From: Stijn Hoop <stijn@win.tue.nl>
To: FreeBSD-gnats-submit@FreeBSD.org
Cc: Lukas Ertl <le@FreeBSD.org>
Subject: kern/89660: panic due to g_malloc returning null in gv_drive_done
Message-ID: <20051128094338.3D52DAC823@sandcat.nl>
Resent-Message-ID: <200511280950.jAS9o27a078004@freefall.freebsd.org>

>Number:         89660
>Category:       kern
>Synopsis:       panic due to g_malloc returning null in gv_drive_done
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          open
>Quarter:
>Keywords:
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Mon Nov 28 09:50:02 GMT 2005
>Closed-Date:
>Last-Modified:
>Originator:     Stijn Hoop
>Release:        FreeBSD 6.0-RELEASE i386
>Organization:
>Environment:
System: FreeBSD 6.0-RELEASE #1: Sun Nov 27 14:48:26 CET 2005 stijn@pcwin002.win.tue.nl:/net/freebsd/6.0-SECURITY/obj/net/freebsd/6.0-SECURITY/src/sys/SANDCAT i386
>Description:
- This machine panics every night on the daily maintenance time, with the
  following backtrace (hand transcribed because of a lack of serial cable):

Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0x0
fault code              = supervisor write, page not present
instruction pointer     = 0x20:0xc0711a41
stack pointer           = 0x28:0xe3245ccc
frame pointer           = 0x28:0xe3245cd8
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 3 (g_up)
[thread pid 3 tid 100021 ]
Stopped at      gv_drive_done+0x29:     movl    %ebx,0(%eax)
db> bt
Tracing pid 3 tid 100021 td 0xc1e8f300
gv_drive_done(c6bcc630) at gv_drive_done+0x29
biodone(c6bcc630) at biodone+0x8b
g_io_schedule_up(c1e8f300) at g_io_schedule_up+0x86
g_up_procbody(0,e3245d38) at g_up_procbody+0x6e
fork_exit(c0482c04,0,e3245d38) at fork_exit+0x70
fork_trampoline() at fork_trampoline+0x8
--- trap 0x1, eip = 0, esp = 0xe3245d6c, ebp = 0 ---
db>

Luckily I was able to obtain a crash dump, which yielded the following
information (after loading the geom_vinum symbols from a debug module):

#0  doadump () at pcpu.h:165
#1  0xc0431953 in db_fncall (dummy1=-1066867712, dummy2=0, dummy3=-1067549157, dummy4=0xe3245af8 "$[$ã\210e^À\020[$ã\024[$ã\220\a")
    at /net/freebsd/6.0-SECURITY/src/sys/ddb/db_command.c:492
#2  0xc0431758 in db_command
    (last_cmdp=0xc0667c04, cmd_table=0x0, aux_cmd_tablep=0xc0638d58, aux_cmd_tablep_end=0xc0638d5c)
    at /net/freebsd/6.0-SECURITY/src/sys/ddb/db_command.c:350
#3  0xc0431820 in db_command_loop () at /net/freebsd/6.0-SECURITY/src/sys/ddb/db_command.c:458
#4  0xc043342d in db_trap (type=12, code=0) at /net/freebsd/6.0-SECURITY/src/sys/ddb/db_main.c:221
#5  0xc04d38ff in kdb_trap (type=12, code=0, tf=0xe3245c8c) at /net/freebsd/6.0-SECURITY/src/sys/kern/subr_kdb.c:473
#6  0xc05fea4c in trap_fatal (frame=0xe3245c8c, eva=0) at /net/freebsd/6.0-SECURITY/src/sys/i386/i386/trap.c:822
#7  0xc05fe7bb in trap_pfault (frame=0xe3245c8c, usermode=0, eva=0) at /net/freebsd/6.0-SECURITY/src/sys/i386/i386/trap.c:742
#8  0xc05fe3d5 in trap (frame=
      {tf_fs = 8, tf_es = 40, tf_ds = -484179928, tf_edi = 0, tf_esi = -1038670080,
       tf_ebp = -484156200, tf_isp = -484156232, tf_ebx = -960707024, tf_edx = 0,
       tf_ecx = -1041698048, tf_eax = 0, tf_trapno = 12, tf_err = 2,
       tf_eip = -1066329535, tf_cs = 32, tf_eflags = 590470, tf_esp = -1066329576,
       tf_ss = -960707024})
    at /net/freebsd/6.0-SECURITY/src/sys/i386/i386/trap.c:432
#9  0xc05f0f8a in calltrap () at /net/freebsd/6.0-SECURITY/src/sys/i386/i386/exception.s:139
#10 0xc0711a41 in gv_drive_done (bp=0xc6bcc630) at geom.h:290
#11 0xc05038a7 in biodone (bp=0xc6bcc630) at /net/freebsd/6.0-SECURITY/src/sys/kern/vfs_bio.c:2893
#12 0xc0482a26 in g_io_schedule_up (tp=0xc1e8f300) at /net/freebsd/6.0-SECURITY/src/sys/geom/geom_io.c:474
#13 0xc0482c72 in g_up_procbody () at /net/freebsd/6.0-SECURITY/src/sys/geom/geom_kern.c:95
#14 0xc04a5820 in fork_exit (callout=0xc0482c04 <g_up_procbody>, arg=0x0, frame=0xe3245d38)
    at /net/freebsd/6.0-SECURITY/src/sys/kern/kern_fork.c:789
#15 0xc05f0fec in fork_trampoline () at /net/freebsd/6.0-SECURITY/src/sys/i386/i386/exception.s:208
(kgdb) frame 10
#10 0xc0711a41 in gv_drive_done (bp=0xc6bcc630) at geom.h:290
290     geom.h: No such file or directory.
        in geom.h
(kgdb) print/x $eax
$3 = 0x0
(kgdb) disassemble gv_drive_done+0x41
Dump of assembler code for function gv_drive_done:
0xc0711a18 <gv_drive_done+0>:   push   %ebp
0xc0711a19 <gv_drive_done+1>:   mov    %esp,%ebp
0xc0711a1b <gv_drive_done+3>:   push   %edi
0xc0711a1c <gv_drive_done+4>:   push   %esi
0xc0711a1d <gv_drive_done+5>:   push   %ebx
0xc0711a1e <gv_drive_done+6>:   mov    0x8(%ebp),%ebx
0xc0711a21 <gv_drive_done+9>:   mov    0x44(%ebx),%eax
0xc0711a24 <gv_drive_done+12>:  mov    (%eax),%eax
0xc0711a26 <gv_drive_done+14>:  mov    0x3c(%eax),%esi
0xc0711a29 <gv_drive_done+17>:  orb    $0x1,0x2(%ebx)
0xc0711a2d <gv_drive_done+21>:  push   $0x101
0xc0711a32 <gv_drive_done+26>:  push   $0xc0643ca0
0xc0711a37 <gv_drive_done+31>:  push   $0xc
0xc0711a39 <gv_drive_done+33>:  call   0xc04b0a30 <malloc>
0xc0711a3e <gv_drive_done+38>:  add    $0xc,%esp
0xc0711a41 <gv_drive_done+41>:  mov    %ebx,(%eax)
0xc0711a43 <gv_drive_done+43>:  push   $0xf6
0xc0711a48 <gv_drive_done+48>:  push   $0xc071a752
0xc0711a4d <gv_drive_done+53>:  lea    0x7c(%esi),%ebx

which on my system corresponds to

%%%
static void
gv_drive_done(struct bio *bp)
{
	struct gv_drive *d;
	struct gv_bioq *bq;

	/* Put the BIO on the worker queue again. */
	d = bp->bio_from->geom->softc;
	bp->bio_cflags |= GV_BIO_DONE;
	bq = g_malloc(sizeof(*bq), M_NOWAIT | M_ZERO);	<--- g_malloc returns NULL
	bq->bp = bp;
	mtx_lock(&d->bqueue_mtx);
	TAILQ_INSERT_TAIL(&d->bqueue, bq, queue);
	wakeup(d);
	mtx_unlock(&d->bqueue_mtx);
}
%%%

Now, I don't know if this routine is run in an interrupt context, so I
really wouldn't know if it is possible to remove the M_NOWAIT. Any thoughts?
>How-To-Repeat:
I wish I knew; I've tried running all of 'periodic daily', the backup
routines (normally run at 4:00 AM), generating disk load using
dd if={gvinum drive} of=/dev/null, all at the same time, and the box just
copes (albeit slowly). However this is the 5th day in a row that it crashes
during daily maintenance, so something else must also be triggering it.
I'll update the PR if I can reproduce it more easily but for now it's "only"
once a day...
>Fix:
Would love to have one.
>Release-Note:
>Audit-Trail:
>Unformatted:
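
For illustration only, here is a minimal userland sketch of the pattern the listing in the Description is missing: checking the result of a no-wait allocation before dereferencing it. It is not the gvinum code and not a proposed kernel patch. The names nowait_alloc, fail_next_alloc, bioq_entry, drive and drive_done are hypothetical stand-ins, the queue is a hand-rolled list instead of TAILQ, and there is no locking or wakeup. Whether the real routine may sleep (and so drop M_NOWAIT) is still the open question above; the sketch only shows the failure being reported to the caller instead of faulting at address 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the kernel structures in the listing. */
struct bio {
	int id;
	int done;			/* plays the role of GV_BIO_DONE */
};

struct bioq_entry {
	struct bio *bp;
	struct bioq_entry *next;
};

struct drive {
	struct bioq_entry *head;
	struct bioq_entry *tail;
	size_t queued;
};

/* Test hook (assumption): when set, the next allocation fails once. */
static int fail_next_alloc;

/* Imitates a zeroing allocation that returns NULL instead of waiting. */
static void *
nowait_alloc(size_t size)
{
	if (fail_next_alloc) {
		fail_next_alloc = 0;
		return (NULL);
	}
	return (calloc(1, size));
}

/*
 * Mark the request done and append it to the drive's queue.
 * Returns 0 on success, or -1 if no queue entry could be allocated;
 * in that case the queue is left untouched and the caller must decide
 * what to do with the request.
 */
static int
drive_done(struct drive *d, struct bio *bp)
{
	struct bioq_entry *bq;

	assert(d != NULL && bp != NULL);

	bp->done = 1;
	bq = nowait_alloc(sizeof(*bq));
	if (bq == NULL)
		return (-1);		/* never dereference a failed allocation */

	bq->bp = bp;
	bq->next = NULL;
	if (d->tail != NULL)
		d->tail->next = bq;
	else
		d->head = bq;
	d->tail = bq;
	d->queued++;
	return (0);
}
```

With an empty drive, a first call to drive_done() returns 0 and queues the request; setting fail_next_alloc before a second call makes that call return -1 and leaves the queue at one entry. What the caller should then do with the unqueued request (retry later, fail the I/O, or use an entry reserved in advance) is a design decision this sketch deliberately leaves open.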