From owner-freebsd-current  Fri Feb 18  8:27:37 2000
Date: Fri, 18 Feb 2000 11:27:27 -0500 (EST)
From: Luoqi Chen <luoqi@watermarkgroup.com>
Message-Id: <200002181627.LAA15059@lor.watermarkgroup.com>
To: freebsd-current@FreeBSD.ORG, tstromberg@rtci.com
Subject: Re:  repost of procfs crashes in -CURRENT (no html)..

> Kernel: 
> =======
> FreeBSD karma.afterthought.org 4.0-CURRENT FreeBSD 4.0-CURRENT #0: Mon Feb
> 14 23:00:42 GMT 2000
> chenresig@karma.afterthought.org:/usr/src/sys/compile/KARMA  i386
> 
> Background:
> ============ 
> 3 users. One with X running <me>, and two users running breakwidgets
> <binary testing script>, which make use of a minimized version of the
> "killall" perl script which reads procfs. 
> 
> This crash appears to be the old one where ugly things can happen when two
> processes read procfs simultaneously. mdillon described this in more
> depth to me once but I've since lost the e-mail. <I posted similar crash
> reports in late November & early December>. He suggested having my
> programs "lock" procfs reads so only one could do its killall function at
> a time. Unfortunately, the binary testing script is very time sensitive and
> this would slow things down <my current run-through is about 48 hours
> paralleled on 4 machines>
> 
I don't believe that's the cause.

> The kernel is a GENERIC one with ipv6, softupdates, and pcm added to it. 
> 
> Crash #1:
> =========
> (kgdb) bt
> #0  boot (howto=256) at ../../kern/kern_shutdown.c:304
> #1  0xc014e194 in poweroff_wait (junk=0xc02b9480, howto=-871862272) at
> ../../kern/kern_shutdown.c:554
> #2  0xc022d064 in vm_fault (map=0xc031ee28, vaddr=3423105024, fault_type=1
> '\001', fault_flags=0) at ../../vm/vm_fault.c:240
> #3  0xc02810d2 in trap_pfault (frame=0xcc136cc4, usermode=0,
> eva=3423108180) at ../../i386/i386/trap.c:788
> #4  0xc0280d37 in trap (frame={tf_fs = -871170032, tf_es = -871170032,
> tf_ds = 16, tf_edi = -871142055, tf_esi = -871142025,
>       tf_ebp = -871141804, tf_isp = -871142160, tf_ebx = -872323392,
> tf_edx = 0, tf_ecx = -872323392, tf_eax = -871859336,
>       tf_trapno = 12, tf_err = 0, tf_eip = -1072160861, tf_cs = 8,
> tf_eflags = 66118, tf_esp = 0, tf_ss = 0})
>     at ../../i386/i386/trap.c:423
> #5  0xc0181fa3 in procfs_dostatus (curp=0xcc145e00, p=0xcc0166c0,
> pfs=0xc14abf60, uio=0xcc136eec)
>     at ../../miscfs/procfs/procfs_status.c:115

The fault is taken when trying to access the target process' p_stats, which
resides in the u area. What's interesting here is that the code checks the
P_INMEM flag prior to accessing p_stats, so there shouldn't be a fault. My
guess is that this is an embryonic process whose p_stats field is inherited
from the corpse of another process and points nowhere. Would you print out
p->p_stat (not p_stats) and check if it is 1 (SIDL)? That would confirm my
theory.
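
As a cross-check on the trace, the trap frame in #4 prints its registers as
signed decimals, which hides the addresses. Below is a small C sketch that
converts them back, using only values copied from Crash #1. The helper names
are made up for illustration and are not kernel functions, and the 4K page
size is the usual i386 assumption.

```c
#include <stdint.h>

/* Hypothetical helper: reinterpret a signed 32-bit register value, as
 * printed by kgdb in the trap frame, as an unsigned i386 address. */
static uint32_t
reg_to_addr(int32_t reg)
{
	return (uint32_t)reg;
}

/* Hypothetical helper: base of the page containing addr, assuming 4K pages. */
static uint32_t
page_base(uint32_t addr)
{
	return addr & ~(uint32_t)0xfff;
}
```

With the Crash #1 values, reg_to_addr(-872323392) (tf_ebx) gives 0xcc0166c0,
the same as p in frame #5, and page_base(3423108180u) (eva) gives 0xcc087000,
which is the vaddr (3423105024) passed to vm_fault. tf_eax converts to
0xcc087b78, in that same page, which would fit a p_stats pointer into an
unmapped u area, though the trace alone doesn't prove that.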

If this is indeed the case, the fix should be to delay setting the P_INMEM
flag in fork() until after the u area is allocated. It may also be a good
idea to skip embryonic processes in procfs altogether.
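
To make the second suggestion concrete, here is a rough userland sketch of
the kind of check I have in mind. It is not the actual procfs code: the
struct and function names are mock-ups, the P_INMEM value is an assumption,
and only SIDL == 1 comes from the discussion above. The real definitions
are in <sys/proc.h>.

```c
#include <stddef.h>

/* Mock definitions for illustration only; the real ones live in
 * <sys/proc.h> and differ in layout. */
#define SIDL    1        /* process being created by fork */
#define P_INMEM 0x00004  /* assumed flag value: u area loaded in memory */

struct pstats_mock {
	long p_start_sec;            /* stand-in for the real statistics */
};

struct proc_mock {
	int p_stat;                  /* process state, e.g. SIDL */
	int p_flag;                  /* P_* flags */
	struct pstats_mock *p_stats; /* lives in the u area */
};

/* Return nonzero only if it looks safe to dereference p->p_stats. */
static int
stats_readable(const struct proc_mock *p)
{
	if (p->p_stat == SIDL)          /* embryonic: u area may not exist yet */
		return 0;
	if ((p->p_flag & P_INMEM) == 0) /* u area not resident */
		return 0;
	return p->p_stats != NULL;
}
```

The point of testing p_stat first is that an embryonic process is rejected
even when P_INMEM is already set and p_stats holds a stale, non-NULL
pointer, which is the situation I suspect here.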

> #6  0xc0182590 in procfs_rw (ap=0xcc136ea0) at
> ../../miscfs/procfs/procfs_subr.c:277
> #7  0xc017dc0a in vn_read (fp=0xc14431c0, uio=0xcc136eec, cred=0xc1450700,
> flags=0, p=0xcc145e00) at vnode_if.h:334
> #8  0xc015ac50 in dofileread (p=0xcc145e00, fp=0xc14431c0, fd=6,
> buf=0x8235000, nbyte=4096, offset=-1, flags=0)
>     at ../../sys/file.h:140
> #9  0xc015ab57 in read (p=0xcc145e00, uap=0xcc136f80) at
> ../../kern/sys_generic.c:111
> #10 0xc028167e in syscall (frame={tf_fs = 47, tf_es = 47, tf_ds = 47,
> tf_edi = -1077946820, tf_esi = 672915688,
>       tf_ebp = -1077946996, tf_isp = -871141420, tf_ebx = 672858084,
> tf_edx = 672809512, tf_ecx = 136531968, tf_eax = 3,
>       tf_trapno = 0, tf_err = 2, tf_eip = 672818732, tf_cs = 31, tf_eflags
> = 659, tf_esp = -1077947040, tf_ss = 47})
>     at ../../i386/i386/trap.c:1055
> 
> 
> 
> Crash #2:
> =========
> #0  boot (howto=256) at ../../kern/kern_shutdown.c:304
> #1  0xc014e194 in poweroff_wait (junk=0xc02b9480, howto=-873472000) at
> ../../kern/kern_shutdown.c:554
> #2  0xc022d064 in vm_fault (map=0xc031ee28, vaddr=3421495296, fault_type=1
> '\001', fault_flags=0) at ../../vm/vm_fault.c:240
> #3  0xc02810d2 in trap_pfault (frame=0xcbe0ccc4, usermode=0,
> eva=3421498452) at ../../i386/i386/trap.c:788
> #4  0xc0280d37 in trap (frame={tf_fs = -874512368, tf_es = -874512368,
> tf_ds = 16, tf_edi = -874459817, tf_esi = -874459788,
>       tf_ebp = -874459564, tf_isp = -874459920, tf_ebx = -873997056,
> tf_edx = 0, tf_ecx = -873997056, tf_eax = -873469064,
>       tf_trapno = 12, tf_err = 0, tf_eip = -1072160861, tf_cs = 8,
> tf_eflags = 66118, tf_esp = 0, tf_ss = 0})
>     at ../../i386/i386/trap.c:423
> #5  0xc0181fa3 in procfs_dostatus (curp=0xcbd7df20, p=0xcbe7dd00,
> pfs=0xc154ac20, uio=0xcbe0ceec)
>     at ../../miscfs/procfs/procfs_status.c:115
> #6  0xc0182590 in procfs_rw (ap=0xcbe0cea0) at
> ../../miscfs/procfs/procfs_subr.c:277
> #7  0xc017dc0a in vn_read (fp=0xc1469200, uio=0xcbe0ceec, cred=0xc153d180,
> flags=0, p=0xcbd7df20) at vnode_if.h:334
> #8  0xc015ac50 in dofileread (p=0xcbd7df20, fp=0xc1469200, fd=5,
> buf=0x8253000, nbyte=4096, offset=-1, flags=0)
>     at ../../sys/file.h:140
> #9  0xc015ab57 in read (p=0xcbd7df20, uap=0xcbe0cf80) at
> ../../kern/sys_generic.c:111
> #10 0xc028167e in syscall (frame={tf_fs = 47, tf_es = 47, tf_ds = 47,
> tf_edi = -1077945828, tf_esi = 136638564,
>       tf_ebp = -1077946004, tf_isp = -874459180, tf_ebx = 672858084,
> tf_edx = 672809512, tf_ecx = 136654848, tf_eax = 3,
>       tf_trapno = 0, tf_err = 2, tf_eip = 672818732, tf_cs = 31, tf_eflags
> = 663, tf_esp = -1077946048, tf_ss = 47})
>     at ../../i386/i386/trap.c:1055
> #11 0xc0276646 in Xint0x80_syscall ()
> 
-lq

