From owner-freebsd-bugs Wed Sep 20 05:20:11 1995
Return-Path: owner-bugs
Received: (from root@localhost) by freefall.freebsd.org (8.6.12/8.6.6) id FAA00849 for bugs-outgoing; Wed, 20 Sep 1995 05:20:11 -0700
Received: (from gnats@localhost) by freefall.freebsd.org (8.6.12/8.6.6) id FAA00843 ; Wed, 20 Sep 1995 05:20:06 -0700
Resent-Date: Wed, 20 Sep 1995 05:20:06 -0700
Resent-Message-Id: <199509201220.FAA00843@freefall.freebsd.org>
Resent-From: gnats (GNATS Management)
Resent-To: freebsd-bugs
Resent-Reply-To: FreeBSD-gnats@freefall.FreeBSD.org, kato@eclogite.eps.nagoya-u.ac.jp
Received: from mail.barrnet.net (mail.barrnet.net [131.119.246.7]) by freefall.freebsd.org (8.6.12/8.6.6) with ESMTP id FAA00818 for ; Wed, 20 Sep 1995 05:17:46 -0700
Received: from marble.eps.nagoya-u.ac.jp (marble.eps.nagoya-u.ac.jp [133.6.57.68]) by mail.barrnet.net (8.6.10/MAIL-RELAY-LEN) with ESMTP id FAA12434 for ; Wed, 20 Sep 1995 05:17:45 -0700
Received: (from kato@localhost) by marble.eps.nagoya-u.ac.jp (8.6.12+2.4W/3.3W9) id VAA00386; Wed, 20 Sep 1995 21:13:55 +0900
Message-Id: <199509201213.VAA00386@marble.eps.nagoya-u.ac.jp>
Date: Wed, 20 Sep 1995 21:13:55 +0900
From: kato@eclogite.eps.nagoya-u.ac.jp
Reply-To: kato@eclogite.eps.nagoya-u.ac.jp
To: FreeBSD-gnats-submit@freebsd.org
X-Send-Pr-Version: 3.2
Subject: kern/729: unexpected signal 4/10/11
Sender: owner-bugs@freebsd.org
Precedence: bulk

>Number: 729
>Category: kern
>Synopsis: unexpected signal 4/10/11
>Confidential: no
>Severity: critical
>Priority: high
>Responsible: freebsd-bugs
>State: open
>Class: sw-bug
>Submitter-Id: current-users
>Arrival-Date: Wed Sep 20 05:20:00 PDT 1995
>Last-Modified:
>Originator: KATO Takenori
>Organization: Dept. Earth Planet. Sci. Nagoya Univ.
>Release: FreeBSD 2.2-CURRENT i386
>Environment:

FreeBSD-current (after September 5) on i486DX4 box
>Description:

Programs catch signal 11 and are terminated just after they start executing. Once a program has caught signal 11, that program cannot be executed again.
So I have to reboot my box. The signal caught by the program is usually signal 11, but sometimes it is signal 10 and, in rare cases, signal 4. In most cases, the virtual address where the signal occurred is in a shared library (I checked this by running the programs under gdb). This phenomenon has appeared since September 5. Before then, the problem occurred only rarely. (Much VM-related code was changed from September 3 to 5.)
>How-To-Repeat:

I don't know how to repeat this problem on an arbitrary machine. On my box, it happens every day!
>Fix:

I think this problem is due to a VM bug, but I don't know a complete fix. I have found three VM-related problems.

(1) Function splimp doesn't block disk I/O. Even though 4.4BSD-derived code assumes splhigh is higher than or equal to splbio + splnet, net_imask doesn't include bio_imask (cf. isa.c). This may allow kmem to be accessed without the lock held if a disk I/O interrupt occurs. In most of the code, the splimp calls from 4.4BSD have been changed to splhigh (why 'splhigh', which blocks ALL interrupts?), but some have not been changed yet. The next problem is one of them. My quick hack is to add the following code just above spl0() in isa_configure:

    net_imask |= bio_imask;

(2) In function mbinit (/sys/kern/uipc_mbuf.c), function m_clalloc is called at splimp. In m_clalloc, kmem_malloc is called. The comment on kmem_malloc in /sys/vm/vm_kern.c says that kmem_malloc should be called at splhigh. So splhigh and splx should be added before and after the kmem_malloc call in m_clalloc.

(3) splhigh() is misplaced in function vm_map_function (/sys/vm/vm_map.c). I think this splhigh was added to avoid a recursive lock_write call (splhigh doesn't appear in vm_map_function in 4.4BSD). There are two ways to avoid the recursive lock. One is to block interrupts, as FreeBSD does; the other is to create a submap to avoid contention for the map. I think FreeBSD chose the former way.
In this case, splhigh should be placed BEFORE vm_map_lock, because an interrupt may occur between vm_map_lock and splhigh, and kmem_map is not locked. (I heard that combining both approaches makes splhigh unnecessary in NetBSD.)

With the above three fixes applied, the time from reboot until the problem appears becomes longer (but once the problem happens, I still have to reboot).
>Audit-Trail:
>Unformatted:
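
The mask argument behind fix (1) can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not the kernel's actual code: the IRQ bit values, the single `cpl` word, and the helper names `irq_deliverable`, `disk_irq_can_preempt_splimp` and `apply_workaround` are hypothetical and exist only for illustration. The model assumes that splimp() blocks exactly the interrupts in net_imask, as the report describes.

```c
/* Hypothetical IRQ bit assignments, chosen only for illustration. */
#define IRQ_DISK 0x1u
#define IRQ_NET  0x2u

/* Simplified model: cpl holds the set of currently blocked interrupts. */
static unsigned cpl = 0;
static unsigned bio_imask = IRQ_DISK;   /* interrupts blocked by splbio() */
static unsigned net_imask = IRQ_NET;    /* interrupts blocked by splimp() */

/* Model of splimp(): block everything in net_imask, return the old level. */
static unsigned splimp(void)
{
    unsigned old = cpl;
    cpl |= net_imask;
    return old;
}

/* Model of splx(): restore a previously saved level. */
static void splx(unsigned old)
{
    cpl = old;
}

/* Returns 1 if an interrupt on the given IRQ bit would be delivered now. */
static int irq_deliverable(unsigned irq)
{
    return (cpl & irq) == 0;
}

/* Returns 1 if a disk interrupt could arrive inside a splimp() section. */
static int disk_irq_can_preempt_splimp(void)
{
    unsigned s = splimp();
    int deliverable = irq_deliverable(IRQ_DISK);
    splx(s);
    return deliverable;
}

/* The quick hack from fix (1): make splimp() also block disk interrupts. */
static void apply_workaround(void)
{
    net_imask |= bio_imask;
}
```

In this model, disk_irq_can_preempt_splimp() reports that a disk interrupt can still arrive inside a splimp() section until apply_workaround() has been called, which is the window the report suspects of allowing unlocked kmem access.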