Skip site navigation (1)Skip section navigation (2)
Date:      Tue, 5 Aug 2003 20:50:28 -0400 
From:      Scot Loach <sloach@sandvine.com>
To:        "'freebsd-net@freebsd.org'" <freebsd-net@freebsd.org>
Subject:   RE: TCP socket shutdown race condition
Message-ID:  <FE045D4D9F7AED4CBFF1B3B813C8533701AE86DE@mail.sandvine.com>

next in thread | raw e-mail | index | archive | help
So I instrumented the kernel source, and ui_ref isn't going very high at
all.  Just to be sure I changed it to a 32-bit number though.  maxsockets is
only 12K for my test program.

I managed to reproduce it again, this one is slightly different, my theory
is this is the other side of the race where a timer has first deleted the
socket.

This time the tcpcb state is TCPS_FIN_WAIT_2, last time it was TIME_WAIT.

I'm working on making this easier to reproduce.  Once I've found the formula
to quickly reproduce it I'll post some code.


#0  dumpsys () at /usr/src/sys/kern/kern_shutdown.c:493
#1  0xc01bd148 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:322
#2  0xc01bd671 in panic (fmt=0xc032a6d9 "%s")
    at /usr/src/sys/kern/kern_shutdown.c:608
#3  0xc02d6bfe in trap_fatal (frame=0xff807b5c, eva=48)
    at /usr/src/sys/i386/i386/trap.c:974
#4  0xc02d682d in trap_pfault (frame=0xff807b5c, usermode=0, eva=48)
    at /usr/src/sys/i386/i386/trap.c:867
#5  0xc02d62cf in trap (frame={tf_fs = -1016528872, tf_es = 16,
      tf_ds = -8388592, tf_edi = 0, tf_esi = -1070268932, tf_ebp = -8356956,
      tf_isp = -8356984, tf_ebx = -1, tf_edx = 1778434048,
      tf_ecx = -533147840, tf_eax = 0, tf_trapno = 12, tf_err = 0,
      tf_eip = -1071174995, tf_cs = 8, tf_eflags = 66054,
      tf_esp = -1016487936, tf_ss = -1016487936})
    at /usr/src/sys/i386/i386/trap.c:466
#6  0xc0272aad in acquire_lock (lk=0xc034fdfc) at machine/globals.h:114
#7  0xc0277494 in softdep_update_inodeblock (ip=0xc369a000, bp=0xcfc98fa0,
    waitfor=0) at /usr/src/sys/ufs/ffs/ffs_softdep.c:3813
#8  0xc0271aea in ffs_update (vp=0xe038cf40, waitfor=0)
    at /usr/src/sys/ufs/ffs/ffs_inode.c:106
#9  0xc027aee7 in ffs_sync (mp=0xc35e3e00, waitfor=2, cred=0xc2053700,
    p=0xc037b740) at /usr/src/sys/ufs/ffs/ffs_vfsops.c:1025
#10 0xc01f48cb in sync (p=0xc037b740, uap=0x0)
    at /usr/src/sys/kern/vfs_syscalls.c:577
#11 0xc01bcebb in boot (howto=256) at /usr/src/sys/kern/kern_shutdown.c:241
#12 0xc01bd671 in panic (fmt=0xc032a6d9 "%s")
    at /usr/src/sys/kern/kern_shutdown.c:608
#13 0xc02d6bfe in trap_fatal (frame=0xff807d38, eva=8)
    at /usr/src/sys/i386/i386/trap.c:974
#14 0xc02d682d in trap_pfault (frame=0xff807d38, usermode=0, eva=8)
    at /usr/src/sys/i386/i386/trap.c:867
#15 0xc02d62cf in trap (frame={tf_fs = -1070202856, tf_es = 16,
      tf_ds = -1071906800, tf_edi = 0, tf_esi = 2147483647, tf_ebp =
-8356472,
      tf_isp = -8356508, tf_ebx = -1, tf_edx = 1644167168, tf_ecx = 0,
      tf_eax = 1644167168, tf_trapno = 12, tf_err = 0, tf_eip = -1071920291,
      tf_cs = 8, tf_eflags = 66054, tf_esp = -570891728, tf_ss =
-570891840})
    at /usr/src/sys/i386/i386/trap.c:466
#16 0xc01bcb5d in chgsbsize (uip=0x0, hiwat=0xddf8e234, to=0,
    max=9223372036854775807) at /usr/src/sys/kern/kern_resource.c:780
#17 0xc01e2ba3 in sbrelease (sb=0xddf8e230, so=0xddf8e1c0)
    at /usr/src/sys/kern/uipc_socket2.c:441
#18 0xc01dfdb7 in sofree (so=0xddf8e1c0) at
/usr/src/sys/kern/uipc_socket.c:262
#19 0xc020fe5c in in_pcbdetach (inp=0xde6f0260)
    at /usr/src/sys/netinet/in_pcb.c:567
#20 0xc02213aa in tcp_close (tp=0xde6f0320)
    at /usr/src/sys/netinet/tcp_subr.c:754
#21 0xc021e9be in tcp_input (m=0xc21f3000, off0=20, proto=6)
    at /usr/src/sys/netinet/tcp_input.c:1512
#22 0xc021938a in ip_input (m=0xc21f3000)
    at /usr/src/sys/netinet/ip_input.c:834
#23 0xc0219427 in ipintr () at /usr/src/sys/netinet/ip_input.c:858

-----Original Message-----
From: Mike Silbersack [mailto:silby@silby.com]
Sent: Saturday, August 02, 2003 1:43 PM
To: Scot Loach
Cc: 'freebsd-net@freebsd.org'
Subject: RE: TCP socket shutdown race condition



On Sat, 2 Aug 2003, Scot Loach wrote:

> I don't think that's the problem, although it does seem suspicious.
>
> Here's the struct ucred pointed to by the socket:
>
> (kgdb) p *so.so_cred
> $2 = {cr_ref = 3279453304, cr_uid = 3486088556, cr_ngroups = 1, cr_groups
=
> {
>     0, 3276863080, 3277717504, 21162, 0, 0, 0, 0, 0, 4294967295,
4294967295,
>     0, 0, 0, 0, 3279496516}, cr_uidinfo = 0x0}
>
> This looks like garbage, but the cr_uidinfo pointer is null, and the
cr_ref
> of _this_ structure is 32 bits.
>
> This doesn't look to me like a problem with the uidinfo, it looks to me
like
> the ucred structure has already been freed.
>
> scot.

Well, as ui_ref is the best bet, redoing your tests with it expanded to
ui_int is where we need to start before looking further. :)

I believe that a uidinfo->ui_ref over/underflow could cause random memory
corruption, so maybe the panic you're seeing comes about after a bunch of
memory has already been trashed.

So anyway, promote ui_ref to a u_int and retest.  Tell us what happens.

Mike "Silby" Silbersack



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?FE045D4D9F7AED4CBFF1B3B813C8533701AE86DE>