Date:      Tue, 23 Jul 2002 00:07:04 -0700
From:      Peter Wemm <peter@wemm.org>
To:        Yann Berthier <yb@sainte-barbe.org>
Cc:        current@freebsd.org, alfred@freebsd.org
Subject:   Re: Is it just me or has -current suddenly got massively unstable? 
Message-ID:  <20020723070704.7B4CB3925@overcee.wemm.org>
In-Reply-To: <20020722101211.GA442@hsc.fr> 

Yann Berthier wrote:
> On Mon, 22 Jul 2002, Peter Wemm wrote:
> 
> > It might be just me because I swapped an ISA 'si' card for a PCI version, but
> > the problems I've been seeing are pretty spectacular.  I'm regularly seeing
> > the following panics:
> > 
> > - selwakeup() taking fatal traps (always while running postfix/smtpd,
> > presumably this is happening during the traditional 'select collision'
> > window - the locking looks rather suspect there too).  This killed my box
> > 3 times today alone.
> > 
> > eg:
> > Fatal trap 12: page fault while in kernel mode
> > fault virtual address   = 0xc44a01b4
> > fault code              = supervisor write, page not present
> > instruction pointer     = 0x8:0xc027f945
> > current process         = 4078 (smtpd)
> > trap number             = 12
> 
>    Same here: 2 panics with a kernel from today while running
>    postfix/smtpd.
> 
>    Sorry, I have no more info to give for now though

Thanks for the independent confirmation.  Here's a workaround patch
that you might like to try:

--- kern_thread.c       17 Jul 2002 23:43:55 -0000      1.8
+++ kern_thread.c       22 Jul 2002 23:31:06 -0000
@@ -198,7 +198,7 @@
 
        thread_zone = uma_zcreate("THREAD", sizeof (struct thread),
            thread_ctor, thread_dtor, thread_init, thread_fini,
-           UMA_ALIGN_CACHE, 0);
+           UMA_ALIGN_CACHE, UMA_ZONE_NOFREE);
 }
 
 /*

I haven't panicked yet with that change. :-) For some unknown reason,
selwakeup() is dereferencing pointers to threads that are long gone and
whose backing store has been freed.  The patch above is a bandaid, not a
solution.  It basically prevents threads from ever being freed back to the
general pool, even though threads supposedly do not need that
(unlike struct proc and socket, for example).
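
To illustrate why the bandaid hides the panic, here is a minimal userland
sketch of a pool that never hands memory back to the general allocator.
This is not the UMA code; struct obj, struct pool, pool_get() and
pool_put() are made-up names for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct obj {
    struct obj *next_free;   /* links idle objects on the pool's free list */
    int         payload;
};

struct pool {
    struct obj *free_list;   /* idle objects, never handed back to malloc */
    size_t      allocated;   /* objects ever obtained from malloc */
};

/* Take an object from the free list, falling back to malloc when empty. */
static struct obj *pool_get(struct pool *p)
{
    struct obj *o = p->free_list;

    if (o != NULL) {
        p->free_list = o->next_free;
    } else {
        o = malloc(sizeof(*o));
        if (o == NULL)
            return NULL;
        p->allocated++;
    }
    o->next_free = NULL;
    o->payload = 0;
    return o;
}

/*
 * Return an object to the pool.  It is parked on the free list and never
 * passed to free(), so its address remains valid memory of the same type.
 */
static void pool_put(struct pool *p, struct obj *o)
{
    assert(o != NULL);
    o->next_free = p->free_list;
    p->free_list = o;
}
```

With a pool like this, a stale pointer still lands on mapped memory
(possibly an object that has since been recycled), so a write through it
no longer page faults.  The stale reference is still there, which is why
this is only a workaround.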

peter@overcee[11:57pm]/home/crash-105# gdb -k kernel.12 vmcore.12
...
Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0xc29b0634
fault code              = supervisor write, page not present
instruction pointer     = 0x8:0xc0257755
current process         = 1411 (smtpd)
...
(kgdb) l *0xc0257755
0xc0257755 is in selwakeup (../../../kern/sys_generic.c:1186).
1181            }
1182            if (td == NULL) {
1183                    mtx_unlock(&sellock);
1184                    return;
1185            }
1186            TAILQ_REMOVE(&td->td_selq, sip, si_thrlist);
1187            sip->si_thread = NULL;
1188            mtx_lock_spin(&sched_lock);
1189            if (td->td_wchan == (caddr_t)&selwait) {
1190                    if (td->td_state == TDS_SLP)

#5  0xc034c68d in trap (frame=
      {tf_fs = -1069613032, tf_es = 16, tf_ds = -1070006256, tf_edi = 0, tf_esi = -1034848204, tf_ebp = -630072692, tf_isp = -630072736, tf_ebx = -1030027776, tf_edx = -1030911744, tf_ecx = 1, tf_eax = -1030027728, tf_trapno = 12, tf_err = 2, tf_eip = -1071286443, tf_cs = 8, tf_eflags = 66118, tf_esp = -1069571036, tf_ss = 0}) at ../../../i386/i386/trap.c:445
#6  0xc0257755 in selwakeup (sip=0xc2517834)
    at ../../../kern/sys_generic.c:1186
#7  0xc026d249 in sowakeup (so=0xc25177d0, sb=0xc251781c)
    at ../../../kern/uipc_socket2.c:300
#8  0xc026cdb0 in soisconnected (so=0xc2750bb8)
    at ../../../kern/uipc_socket2.c:132
#9  0xc02726fd in unp_connect2 (so=0xc30a3190, so2=0xc2750bb8)
    at ../../../kern/uipc_usrreq.c:769
#10 0xc0272653 in unp_connect (so=0xc30a3190, nam=0xc4359d00, td=0xc30a3190)
    at ../../../kern/uipc_usrreq.c:737
#11 0xc027173e in uipc_connect (so=0x0, nam=0x0, td=0xc28d8900)
    at ../../../kern/uipc_usrreq.c:161
#12 0xc026abda in soconnect (so=0xc263c630, nam=0x0, td=0x0)
    at ../../../kern/uipc_socket.c:429
#13 0xc026eade in connect (td=0xc30a3190, uap=0xc2750bb8)
    at ../../../kern/uipc_syscalls.c:441
#14 0xc034d1c1 in syscall (frame=
      {tf_fs = 47, tf_es = 47, tf_ds = 47, tf_edi = 11, tf_esi = 0, tf_ebp = -1077938236, tf_isp = -630071948, tf_ebx = 134708840, tf_edx = -1077938342, tf_ecx = 0, tf_eax = 98, tf_trapno = 22, tf_err = 2, tf_eip = 671906955, tf_cs = 31, tf_eflags = 663, tf_esp = -1077938408, tf_ss = 47})
    at ../../../i386/i386/trap.c:1049

I've checked the page tables; the faulting address is indeed unmapped.
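
The trap frame in frame #5 prints the registers as signed decimals, which
hides the fact that several of them are kernel addresses.  A small sketch
of the conversion follows; the helper names reg_to_addr() and
format_addr() are made up for illustration and are not part of gdb or the
kernel.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Reinterpret a signed 32-bit register value, as printed in the trap
 * frame, as the unsigned address it represents on i386.
 */
static uint32_t reg_to_addr(int32_t reg)
{
    return (uint32_t)reg;
}

/* Format the value the way the panic message does, e.g. "0xc0257755". */
static void format_addr(int32_t reg, char *buf, size_t len)
{
    assert(len >= 11);   /* "0x" + 8 hex digits + NUL */
    snprintf(buf, len, "0x%08x", (unsigned)reg_to_addr(reg));
}
```

Feeding it tf_eip = -1071286443 gives 0xc0257755, the instruction pointer
from the panic message, and tf_esi = -1034848204 gives 0xc2517834, the sip
argument to selwakeup() in frame #6.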

Also note that this is in the guts of the unix domain socket code. :-]

(kgdb) peter@overcee[11:58pm]/home/crash-110# gdb -k kernel.10 vmcore.10
Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0xc44a01b4
fault code              = supervisor write, page not present
instruction pointer     = 0x8:0xc027f945
current process         = 4078 (smtpd)
[..]
#13 0xc03750dd in trap ()
#14 0xc027f945 in selwakeup ()
#15 0xc02953f9 in sowakeup ()
#16 0xc0294f60 in soisconnected ()
#17 0xc029a8ad in unp_connect2 ()
#18 0xc029a803 in unp_connect ()
#19 0xc02998ee in uipc_connect ()
#20 0xc0292d8a in soconnect ()
#21 0xc0296c8e in connect ()
#22 0xc0375c11 in syscall ()

Interestingly, the stack trace is identical in both of the crash dumps
that I managed to capture.

Cheers,
-Peter
--
Peter Wemm - peter@wemm.org; peter@FreeBSD.org; peter@yahoo-inc.com
"All of this is for nothing if we don't go to the stars" - JMS/B5

