From owner-freebsd-net@FreeBSD.ORG Tue Aug 5 17:50:32 2003 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 0719D37B401 for ; Tue, 5 Aug 2003 17:50:32 -0700 (PDT) Received: from mail.sandvine.com (sandvine.com [199.243.201.138]) by mx1.FreeBSD.org (Postfix) with ESMTP id 2762D43F93 for ; Tue, 5 Aug 2003 17:50:31 -0700 (PDT) (envelope-from sloach@sandvine.com) Received: by mail.sandvine.com with Internet Mail Service (5.5.2653.19) id <305LHL5D>; Tue, 5 Aug 2003 20:50:30 -0400 Message-ID: From: Scot Loach To: "'freebsd-net@freebsd.org'" Date: Tue, 5 Aug 2003 20:50:28 -0400 MIME-Version: 1.0 X-Mailer: Internet Mail Service (5.5.2653.19) Content-Type: text/plain; charset="iso-8859-1" Subject: RE: TCP socket shutdown race condition X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 06 Aug 2003 00:50:32 -0000 So I instrumented the kernel source, and ui_ref isn't going very high at all. Just to be sure I changed it to a 32-bit number though. maxsockets is only 12K for my test program. I managed to reproduce it again, this one is slightly different, my theory is this is the other side of the race where a timer has first deleted the socket. This time the tcpcb state is TCPS_FIN_WAIT_2, last time it was TIME_WAIT. I'm working on making this easier to reproduce. Once I've found the formula to quickly reproduce it I'll post some code. #0 dumpsys () at /usr/src/sys/kern/kern_shutdown.c:493 #1 0xc01bd148 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:322 #2 0xc01bd671 in panic (fmt=0xc032a6d9 "%s") at /usr/src/sys/kern/kern_shutdown.c:608 #3 0xc02d6bfe in trap_fatal (frame=0xff807b5c, eva=48) at /usr/src/sys/i386/i386/trap.c:974 #4 0xc02d682d in trap_pfault (frame=0xff807b5c, usermode=0, eva=48) at /usr/src/sys/i386/i386/trap.c:867 #5 0xc02d62cf in trap (frame={tf_fs = -1016528872, tf_es = 16, tf_ds = -8388592, tf_edi = 0, tf_esi = -1070268932, tf_ebp = -8356956, tf_isp = -8356984, tf_ebx = -1, tf_edx = 1778434048, tf_ecx = -533147840, tf_eax = 0, tf_trapno = 12, tf_err = 0, tf_eip = -1071174995, tf_cs = 8, tf_eflags = 66054, tf_esp = -1016487936, tf_ss = -1016487936}) at /usr/src/sys/i386/i386/trap.c:466 #6 0xc0272aad in acquire_lock (lk=0xc034fdfc) at machine/globals.h:114 #7 0xc0277494 in softdep_update_inodeblock (ip=0xc369a000, bp=0xcfc98fa0, waitfor=0) at /usr/src/sys/ufs/ffs/ffs_softdep.c:3813 #8 0xc0271aea in ffs_update (vp=0xe038cf40, waitfor=0) at /usr/src/sys/ufs/ffs/ffs_inode.c:106 #9 0xc027aee7 in ffs_sync (mp=0xc35e3e00, waitfor=2, cred=0xc2053700, p=0xc037b740) at /usr/src/sys/ufs/ffs/ffs_vfsops.c:1025 #10 0xc01f48cb in sync (p=0xc037b740, uap=0x0) at /usr/src/sys/kern/vfs_syscalls.c:577 #11 0xc01bcebb in boot (howto=256) at /usr/src/sys/kern/kern_shutdown.c:241 #12 0xc01bd671 in panic (fmt=0xc032a6d9 "%s") at /usr/src/sys/kern/kern_shutdown.c:608 #13 0xc02d6bfe in trap_fatal (frame=0xff807d38, eva=8) at /usr/src/sys/i386/i386/trap.c:974 #14 0xc02d682d in trap_pfault (frame=0xff807d38, usermode=0, eva=8) at /usr/src/sys/i386/i386/trap.c:867 #15 0xc02d62cf in trap (frame={tf_fs = -1070202856, tf_es = 16, tf_ds = -1071906800, tf_edi = 0, tf_esi = 2147483647, tf_ebp = -8356472, tf_isp = -8356508, tf_ebx = -1, tf_edx = 1644167168, tf_ecx = 0, tf_eax = 1644167168, tf_trapno = 12, tf_err = 0, tf_eip = -1071920291, tf_cs = 8, tf_eflags = 66054, tf_esp = -570891728, tf_ss = -570891840}) at /usr/src/sys/i386/i386/trap.c:466 #16 0xc01bcb5d in chgsbsize (uip=0x0, hiwat=0xddf8e234, to=0, max=9223372036854775807) at /usr/src/sys/kern/kern_resource.c:780 #17 0xc01e2ba3 in sbrelease (sb=0xddf8e230, so=0xddf8e1c0) at /usr/src/sys/kern/uipc_socket2.c:441 #18 0xc01dfdb7 in sofree (so=0xddf8e1c0) at /usr/src/sys/kern/uipc_socket.c:262 #19 0xc020fe5c in in_pcbdetach (inp=0xde6f0260) at /usr/src/sys/netinet/in_pcb.c:567 #20 0xc02213aa in tcp_close (tp=0xde6f0320) at /usr/src/sys/netinet/tcp_subr.c:754 #21 0xc021e9be in tcp_input (m=0xc21f3000, off0=20, proto=6) at /usr/src/sys/netinet/tcp_input.c:1512 #22 0xc021938a in ip_input (m=0xc21f3000) at /usr/src/sys/netinet/ip_input.c:834 #23 0xc0219427 in ipintr () at /usr/src/sys/netinet/ip_input.c:858 -----Original Message----- From: Mike Silbersack [mailto:silby@silby.com] Sent: Saturday, August 02, 2003 1:43 PM To: Scot Loach Cc: 'freebsd-net@freebsd.org' Subject: RE: TCP socket shutdown race condition On Sat, 2 Aug 2003, Scot Loach wrote: > I don't think that's the problem, although it does seem suspicious. > > Here's the struct ucred pointed to by the socket: > > (kgdb) p *so.so_cred > $2 = {cr_ref = 3279453304, cr_uid = 3486088556, cr_ngroups = 1, cr_groups = > { > 0, 3276863080, 3277717504, 21162, 0, 0, 0, 0, 0, 4294967295, 4294967295, > 0, 0, 0, 0, 3279496516}, cr_uidinfo = 0x0} > > This looks like garbage, but the cr_uidinfo pointer is null, and the cr_ref > of _this_ structure is 32 bits. > > This doesn't look to me like a problem with the uidinfo, it looks to me like > the ucred structure has already been freed. > > scot. Well, as ui_ref is the best bet, redoing your tests with it expanded to ui_int is where we need to start before looking further. :) I believe that a uidinfo->ui_ref over/underflow could cause random memory corruption, so maybe the panic you're seeing comes about after a bunch of memory has already been trashed. So anyway, promote ui_ref to a u_int and retest. Tell us what happens. Mike "Silby" Silbersack