From owner-freebsd-bugs Sun May 25 21:10:03 1997
Return-Path:
Received: (from root@localhost) by hub.freebsd.org (8.8.5/8.8.5) id VAA26088 for bugs-outgoing; Sun, 25 May 1997 21:10:03 -0700 (PDT)
Received: (from gnats@localhost) by hub.freebsd.org (8.8.5/8.8.5) id VAA26079; Sun, 25 May 1997 21:10:01 -0700 (PDT)
Resent-Date: Sun, 25 May 1997 21:10:01 -0700 (PDT)
Resent-Message-Id: <199705260410.VAA26079@hub.freebsd.org>
Resent-From: gnats (GNATS Management)
Resent-To: freebsd-bugs
Resent-Reply-To: FreeBSD-gnats@FreeBSD.ORG, Tor.Egge@idi.ntnu.no
Received: from pat.idt.unit.no (0@pat.idt.unit.no [129.241.103.5]) by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id VAA25936 for ; Sun, 25 May 1997 21:07:13 -0700 (PDT)
Received: from ikke.idi.ntnu.no (tegge@ikke.idi.ntnu.no [129.241.111.65]) by pat.idt.unit.no (8.8.5/8.8.5) with ESMTP id GAA29993 for ; Mon, 26 May 1997 06:07:09 +0200 (MET DST)
Received: (from tegge@localhost) by ikke.idi.ntnu.no (8.8.5/8.8.5) id GAA01188; Mon, 26 May 1997 06:07:09 +0200 (MET DST)
Message-Id: <199705260407.GAA01188@ikke.idi.ntnu.no>
Date: Mon, 26 May 1997 06:07:09 +0200 (MET DST)
From: Tor Egge
Reply-To: Tor.Egge@idi.ntnu.no
To: FreeBSD-gnats-submit@FreeBSD.ORG
X-Send-Pr-Version: 3.2
Subject: kern/3688: fsck -p gets transient unexpected inconsistensies
Sender: owner-bugs@FreeBSD.ORG
X-Loop: FreeBSD.org
Precedence: bulk

>Number:         3688
>Category:       kern
>Synopsis:       fsck -p gets transient unexpected inconsistensies
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          open
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Sun May 25 21:10:01 PDT 1997
>Last-Modified:
>Originator:     Tor Egge
>Organization:
Norwegian University of Science and Technology, Trondheim, Norway
>Release:        FreeBSD 3.0-CURRENT i386
>Environment:

FreeBSD 3.0-CURRENT

ahc0 rev 0 int a irq 19 on pci0:9:0
ahc0: aic7880 Wide Channel, SCSI Id=7, 16/255 SCBs
ahc1 rev 0 int a irq 16 on pci0:12:0
ahc1: aic7880 Wide Channel, SCSI Id=7, 16/255 SCBs
sd0: scbus0 target 0 lun 0: type 0 fixed SCSI 2
cd1: scbus0 target 1 lun 0: type 5 removable SCSI 2
sd2: scbus0 target 2 lun 0: type 0 fixed SCSI 2
sd3: scbus0 target 3 lun 0: type 0 fixed SCSI 2
sd6: scbus1 target 2 lun 0: type 0 fixed SCSI 2
sd7: scbus1 target 3 lun 0: type 0 fixed SCSI 2
sd8: scbus1 target 4 lun 0: type 0 fixed SCSI 2
sd9: scbus1 target 5 lun 0: type 0 fixed SCSI 2
sd10: scbus1 target 6 lun 0: type 0 fixed SCSI 2
sd11: scbus1 target 8 lun 0: type 0 fixed SCSI 2
sd12: scbus1 target 9 lun 0: type 0 fixed SCSI 2
sd13: scbus1 target 10 lun 0: type 0 fixed SCSI 2

/etc/ccd.conf:

ccd0    64      0       /dev/sd6d /dev/sd7d
ccd1    64      0       /dev/sd8a /dev/sd9a /dev/sd10a
ccd2    64      0       /dev/sd11a /dev/sd12a /dev/sd13a
ccd3    64      0       /dev/sd2a /dev/sd3a

sd0:
  a:   176715        0    4.2BSD     1024  8192    16   # (Cyl.    0 - 10)
  b:  1413720   176715      swap                        # (Cyl.   11 - 98)
  c:  8883945        0    unused        0     0         # (Cyl.    0 - 552)
  d:   160650  1590435    4.2BSD      512  4096    16   # (Cyl.   99 - 108)
  e:   321300  1751085    4.2BSD      512  4096    16   # (Cyl.  109 - 128)
  g:  6811560  2072385    4.2BSD     1024  8192    16   # (Cyl.  129 - 552)

sd2:
  a:  8385930        0    4.2BSD     1024  8192    16   # (Cyl.    0 - 521)
  c:  8385930        0    unused        0     0         # (Cyl.    0 - 521)

sd3:
  a:  8385930        0    4.2BSD     1024  8192    16   # (Cyl.    0 - 521)
  c:  8385930        0    unused        0     0         # (Cyl.    0 - 521)

sd6 and sd7:
  a:   128520        0    4.2BSD     1024  8192    16   # (Cyl.    0 - 7)
  b:   706860   128520      swap                        # (Cyl.    8 - 51)
  c:  8883945        0    unused        0     0         # (Cyl.    0 - 552)
  d:  8048565   835380    4.2BSD     1024  8192    16   # (Cyl.   52 - 552)

sd8, sd9, sd10, sd11, sd12, sd13:
  a:  8883945        0    4.2BSD     1024  8192    16   # (Cyl.    0 - 552)
  c:  8883945        0    unused        0     0         # (Cyl.    0 - 552)

ccd0:
  c: 16097024        0    4.2BSD        0     0     0   # (Cyl.    0 - 7859*)

ccd1 and ccd2:
  c: 26651712        0    4.2BSD        0     0     0   # (Cyl.    0 - 13013*)

ccd3:
  c: 16771712        0    4.2BSD        0     0     0   # (Cyl.    0 - 8189*)

/etc/fstab:

/dev/sd0b       none                    swap    sw              0 0
/dev/sd6b       none                    swap    sw              0 0
/dev/sd7b       none                    swap    sw              0 0
/dev/sd0a       /                       ufs     rw              1 1
/dev/sd0e       /store                  ufs     rw              1 2
/dev/sd0g       /usr                    ufs     rw              1 2
/dev/sd0d       /var                    ufs     rw              1 2
proc            /proc                   procfs  rw              0 0
/dev/sd0b       /tmp                    mfs     rw,-s=240000    0 0
/dev/sd6a       /resroot1               ufs     rw              1 2
/dev/sd7a       /resroot2               ufs     rw              1 2
/dev/ccd0c      /export/ftpsearch1      ufs     rw              1 2
/dev/ccd1c      /export/ftpsearch2      ufs     rw              1 2
/dev/ccd2c      /export/ftpsearch3      ufs     rw              1 2
/dev/ccd3c      /mirror                 ufs     rw              1 2

>Description:

When recovering from a system crash, `fsck -p' in /etc/rc complained
about unexpected inconsistencies on 3 different filesystems. When
fsck was run manually on each of these filesystems, only the clean
flag needed to be set in the superblock. The three filesystems were
all located on ccd devices.

When recovering from the next system crash, `fsck -p' in /etc/rc
complained that the values in the superblock did not agree with those
in the first alternate, but when fsck was run manually, only the clean
flag needed to be set in the superblock. This was on a small partition
(/resroot2) where no write operations had been performed since the
last boot.

When investigating the probable cause (on a different 3.0-CURRENT
machine), I found that simultaneously opening several partitions on a
disk (where no partitions were open before the attempt) caused
inconsistent behaviour, sometimes with a kernel crash.

sdopen on the different partitions ends up calling dsopen. As long as
the first dsopen on the device has not completed, a new call to dsopen
ends up reading the disklabel from the device again. When the first
call to dsopen returns, the other calls to dsopen might still be
modifying the disk label and slice maps, which can cause reads/writes
to the partition for which the first dsopen call was made to access
the wrong places on the disk.

Writes to freed kernel memory might also occur. This probably
triggered the kernel crash during the investigation.
This bug does not explain my problems with fsck, which must have been
caused by a different bug.

>How-To-Repeat:

Configure a disk (sd1) with several file systems:

  a:   204800        0    4.2BSD     1024  8192    16   # (Cyl.    0 - 99)
  c:  3450880        0    unused        0     0         # (Cyl.    0 - 1684)
  d:   204800   204800    4.2BSD     1024  8192    16   # (Cyl.  100 - 199)
  e:   204800   409600    4.2BSD     1024  8192    16   # (Cyl.  200 - 299)
  f:   204800   614400    4.2BSD     1024  8192    16   # (Cyl.  300 - 399)
  g:   204800   819200    4.2BSD     1024  8192    16   # (Cyl.  400 - 499)
  h:  2426880  1024000    4.2BSD     1024  8192    16   # (Cyl.  500 - 1684)

With no partitions from the disk mounted, open the raw devices in
parallel:

#!/bin/sh
fsck -n /dev/rsd1a &
fsck -n /dev/rsd1d &
fsck -n /dev/rsd1e &
fsck -n /dev/rsd1f &
fsck -n /dev/rsd1g &
fsck -n /dev/rsd1h &

Run this script several times. You should get some error messages
similar to the following:

----
sd1: raw partition size != slice size
sd1: start 0, end 3450901, size 3450902
sd1c: start 0, end 3450879, size 3450880
sd1s4: cannot find label (no disk label)
sd1: raw partition size != slice size
sd1: start 0, end 3450901, size 3450902
sd1c: start 0, end 3450879, size 3450880
sd1: raw partition size != slice size
sd1: start 0, end 3450901, size 3450902
sd1c: start 0, end 3450879, size 3450880
sd1: raw partition size != slice size
sd1: start 0, end 3450901, size 3450902
sd1c: start 0, end 3450879, size 3450880
sd1: raw partition size != slice size
sd1: start 0, end 3450901, size 3450902
sd1c: start 0, end 3450879, size 3450880
sd1s4: cannot find label (no disk label)
sd1: raw partition size != slice size
sd1: start 0, end 3450901, size 3450902
sd1c: start 0, end 3450879, size 3450880
sd1: raw partition size != slice size
sd1: start 0, end 3450901, size 3450902
sd1c: start 0, end 3450879, size 3450880
sd1: raw partition size != slice size
sd1: start 0, end 3450901, size 3450902
sd1c: start 0, end 3450879, size 3450880
sd1s1: raw partition size != slice size
sd1s1: start 0, end 3450901, size 3450902
sd1s1c: start 0, end 3450879, size 3450880
sd1: ILLEGAL REQUEST asc:21,0 Logical block address out of range

Fatal trap 12: page fault while in kernel mode
cpunumber               = 1
fault virtual address   = 0x8
fault code              = supervisor read, page not present
instruction pointer     = 0x8:0xe0150db7
stack pointer           = 0x10:0xe94afde0
frame pointer           = 0x10:0xe94afe00
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 6 (cpuidle1)
interrupt mask          =
---

>Fix:

Protect the critical parts of dsopen (and possibly other routines)
with a lock?

>Audit-Trail:
>Unformatted:
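
The locking idea under >Fix: can be illustrated with a small userland
sketch. This is not kernel code and not the actual dsopen: the names
disk_sketch, read_label_sketch, ds_open_sketch and run_parallel_opens
are hypothetical, and a POSIX mutex stands in for whatever lock the
kernel would use. It only shows the intended effect, namely that when
several openers race on a disk with no open partitions, the label is
read once and later openers reuse it.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for per-disk label/slice state (assumption). */
struct disk_sketch {
    pthread_mutex_t open_lock;  /* serializes the first-open path */
    int label_loaded;           /* nonzero once the label has been read */
    int label_reads;            /* number of times the label was read */
    int open_count;             /* number of completed opens */
};

/* Pretend to read the disklabel from the device. */
static void read_label_sketch(struct disk_sketch *d)
{
    d->label_reads++;
    d->label_loaded = 1;
}

/* Userland analogue of dsopen(): only the first opener reads the label. */
static void ds_open_sketch(struct disk_sketch *d)
{
    pthread_mutex_lock(&d->open_lock);
    if (!d->label_loaded)
        read_label_sketch(d);
    assert(d->label_loaded);    /* every opener sees a complete label */
    d->open_count++;
    pthread_mutex_unlock(&d->open_lock);
}

static void *opener_thread(void *arg)
{
    ds_open_sketch(arg);
    return NULL;
}

/* Open the same disk from up to 16 threads; return the label read count. */
int run_parallel_opens(int nthreads)
{
    struct disk_sketch d = { PTHREAD_MUTEX_INITIALIZER, 0, 0, 0 };
    pthread_t tid[16];
    int i, started = 0;

    if (nthreads > 16)
        nthreads = 16;
    for (i = 0; i < nthreads; i++)
        if (pthread_create(&tid[started], NULL, opener_thread, &d) == 0)
            started++;
    for (i = 0; i < started; i++)
        pthread_join(tid[i], NULL);
    pthread_mutex_destroy(&d.open_lock);
    return d.label_reads;
}
```

Calling run_parallel_opens(6) mirrors the six parallel fsck processes
in the script above. In the kernel, the lock would also have to cover
the updates to the slice maps, and possibly other routines.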