Date:      Mon, 26 May 1997 06:07:09 +0200 (MET DST)
From:      Tor Egge <Tor.Egge@idi.ntnu.no>
To:        FreeBSD-gnats-submit@FreeBSD.ORG
Subject:   kern/3688: fsck -p gets transient unexpected inconsistencies
Message-ID:  <199705260407.GAA01188@ikke.idi.ntnu.no>
Resent-Message-ID: <199705260410.VAA26079@hub.freebsd.org>


>Number:         3688
>Category:       kern
>Synopsis:       fsck -p gets transient unexpected inconsistencies
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          open
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Sun May 25 21:10:01 PDT 1997
>Last-Modified:
>Originator:     Tor Egge
>Organization:
Norwegian University of Science and Technology, Trondheim, Norway
>Release:        FreeBSD 3.0-CURRENT i386
>Environment:

FreeBSD 3.0-CURRENT

ahc0 <Adaptec 2940 Ultra SCSI host adapter> rev 0 int a irq 19 on pci0:9:0
ahc0: aic7880 Wide Channel, SCSI Id=7, 16/255 SCBs
	
ahc1 <Adaptec 2940 Ultra SCSI host adapter> rev 0 int a irq 16 on pci0:12:0
ahc1: aic7880 Wide Channel, SCSI Id=7, 16/255 SCBs

sd0:  scbus0 target 0 lun 0: <QUANTUM XP34550W LXY4> type 0 fixed SCSI 2
cd1:  scbus0 target 1 lun 0: <MATSHITA CD-ROM CR-506 8S05> type 5 removable SCSI 2
sd2:  scbus0 target 2 lun 0: <SEAGATE ST15150N 0905> type 0 fixed SCSI 2
sd3:  scbus0 target 3 lun 0: <Quantum XP34300 L915> type 0 fixed SCSI 2
sd6:  scbus1 target 2 lun 0: <QUANTUM XP34550W LXY4> type 0 fixed SCSI 2
sd7:  scbus1 target 3 lun 0: <QUANTUM XP34550W LXY4> type 0 fixed SCSI 2
sd8:  scbus1 target 4 lun 0: <QUANTUM XP34550W LXY4> type 0 fixed SCSI 2
sd9:  scbus1 target 5 lun 0: <QUANTUM XP34550W LXY4> type 0 fixed SCSI 2
sd10: scbus1 target 6 lun 0: <QUANTUM XP34550W LXY4> type 0 fixed SCSI 2
sd11: scbus1 target 8 lun 0: <QUANTUM XP34550W LXY4> type 0 fixed SCSI 2
sd12: scbus1 target 9 lun 0: <QUANTUM XP34550W LXY4> type 0 fixed SCSI 2
sd13: scbus1 target 10 lun 0: <QUANTUM XP34550W LXY1> type 0 fixed SCSI 2

/etc/ccd.conf:
ccd0    64      0       /dev/sd6d /dev/sd7d
ccd1    64      0       /dev/sd8a /dev/sd9a /dev/sd10a
ccd2    64      0       /dev/sd11a /dev/sd12a /dev/sd13a
ccd3    64      0       /dev/sd2a /dev/sd3a

sd0:
  a:   176715        0    4.2BSD     1024  8192    16   # (Cyl.    0 - 10)
  b:  1413720   176715      swap                        # (Cyl.   11 - 98)
  c:  8883945        0    unused        0     0         # (Cyl.    0 - 552)
  d:   160650  1590435    4.2BSD      512  4096    16   # (Cyl.   99 - 108)
  e:   321300  1751085    4.2BSD      512  4096    16   # (Cyl.  109 - 128)
  g:  6811560  2072385    4.2BSD     1024  8192    16   # (Cyl.  129 - 552)

sd2:
  a:  8385930        0    4.2BSD     1024  8192    16   # (Cyl.    0 - 521)
  c:  8385930        0    unused        0     0         # (Cyl.    0 - 521)

sd3:
  a:  8385930        0    4.2BSD     1024  8192    16   # (Cyl.    0 - 521)
  c:  8385930        0    unused        0     0         # (Cyl.    0 - 521)

sd6 and sd7:
  a:   128520        0    4.2BSD     1024  8192    16   # (Cyl.    0 - 7)
  b:   706860   128520      swap                        # (Cyl.    8 - 51)
  c:  8883945        0    unused        0     0         # (Cyl.    0 - 552)
  d:  8048565   835380    4.2BSD     1024  8192    16   # (Cyl.   52 - 552)

sd8, sd9, sd10, sd11, sd12, sd13:
  a:  8883945        0    4.2BSD     1024  8192    16   # (Cyl.    0 - 552)
  c:  8883945        0    unused        0     0         # (Cyl.    0 - 552)

ccd0:
  c: 16097024        0    4.2BSD        0     0     0   # (Cyl.    0 - 7859*)

ccd1 and ccd2:
  c: 26651712        0    4.2BSD        0     0     0   # (Cyl.    0 - 13013*)

ccd3:
  c: 16771712        0    4.2BSD        0     0     0   # (Cyl.    0 - 8189*)

/etc/fstab:
/dev/sd0b                       none            swap    sw 0 0
/dev/sd6b                       none            swap    sw 0 0
/dev/sd7b                       none            swap    sw 0 0
/dev/sd0a                       /               ufs     rw 1 1
/dev/sd0e                       /store          ufs     rw 1 2
/dev/sd0g                       /usr            ufs     rw 1 2
/dev/sd0d                       /var            ufs     rw 1 2
proc                            /proc           procfs  rw 0 0
/dev/sd0b                       /tmp            mfs     rw,-s=240000 0 0
/dev/sd6a                       /resroot1       ufs     rw 1 2
/dev/sd7a                       /resroot2       ufs     rw 1 2
/dev/ccd0c                      /export/ftpsearch1 ufs  rw 1 2
/dev/ccd1c                      /export/ftpsearch2 ufs  rw 1 2
/dev/ccd2c                      /export/ftpsearch3 ufs  rw 1 2
/dev/ccd3c                      /mirror            ufs  rw 1 2 

>Description:

When recovering from a system crash, `fsck -p' in /etc/rc complained about
unexpected inconsistencies on three different filesystems. When I then ran fsck
manually on each of these filesystems, only the clean flag needed to be set in
the superblock. The three filesystems were all located on ccd devices.

When recovering from the next system crash, `fsck -p' in /etc/rc complained
that the values in the superblock did not agree with those in the first
alternate, but when I ran fsck manually, only the clean flag needed to be set
in the superblock. This was on a small partition (/resroot2) where no write
operations had been performed since the last boot.

When investigating the probable cause (on a different 3.0-CURRENT machine), I
found that simultaneously opening several partitions on a disk (where no
partitions were open before the attempt) caused inconsistent behaviour,
sometimes with a kernel crash.

sdopen on the different partitions ends up calling dsopen. As long as the first
dsopen on the device has not completed, each new call to dsopen repeats the
same reading of the disklabel from the device. When the first call to dsopen
returns, the other calls to dsopen might still be modifying the disk label and
slice maps, which can cause reads and writes to the partition for which the
first dsopen call was made to access the wrong places on the disk.

Writes to freed kernel memory might also occur. This probably triggered
the kernel crash during the investigation.

This bug does not explain my problems with fsck, which must have been
caused by a different bug.

>How-To-Repeat:

Configure a disk (sd1) with several file systems:
  a:   204800        0    4.2BSD     1024  8192    16   # (Cyl.    0 - 99)
  c:  3450880        0    unused        0     0         # (Cyl.    0 - 1684)
  d:   204800   204800    4.2BSD     1024  8192    16   # (Cyl.  100 - 199)
  e:   204800   409600    4.2BSD     1024  8192    16   # (Cyl.  200 - 299)
  f:   204800   614400    4.2BSD     1024  8192    16   # (Cyl.  300 - 399)
  g:   204800   819200    4.2BSD     1024  8192    16   # (Cyl.  400 - 499)
  h:  2426880  1024000    4.2BSD     1024  8192    16   # (Cyl.  500 - 1684)

With no partitions from the disk mounted, open the raw devices in parallel:

#!/bin/sh
fsck -n /dev/rsd1a &
fsck -n /dev/rsd1d &
fsck -n /dev/rsd1e &
fsck -n /dev/rsd1f &
fsck -n /dev/rsd1g &
fsck -n /dev/rsd1h &

Run this script several times. You should get some error messages 
similar to the following:

----
sd1: raw partition size != slice size
sd1: start 0, end 3450901, size 3450902
sd1c: start 0, end 3450879, size 3450880
sd1s4: cannot find label (no disk label)
sd1: raw partition size != slice size
sd1: start 0, end 3450901, size 3450902
sd1c: start 0, end 3450879, size 3450880
sd1: raw partition size != slice size
sd1: start 0, end 3450901, size 3450902
sd1c: start 0, end 3450879, size 3450880
sd1: raw partition size != slice size
sd1: start 0, end 3450901, size 3450902
sd1c: start 0, end 3450879, size 3450880
sd1: raw partition size != slice size
sd1: start 0, end 3450901, size 3450902
sd1c: start 0, end 3450879, size 3450880
sd1s4: cannot find label (no disk label)
sd1: raw partition size != slice size
sd1: start 0, end 3450901, size 3450902
sd1c: start 0, end 3450879, size 3450880
sd1: raw partition size != slice size
sd1: start 0, end 3450901, size 3450902
sd1c: start 0, end 3450879, size 3450880
sd1: raw partition size != slice size
sd1: start 0, end 3450901, size 3450902
sd1c: start 0, end 3450879, size 3450880
sd1s1: raw partition size != slice size
sd1s1: start 0, end 3450901, size 3450902
sd1s1c: start 0, end 3450879, size 3450880
sd1: ILLEGAL REQUEST asc:21,0 Logical block address out of range

Fatal trap 12: page fault while in kernel mode
cpunumber = 1
fault virtual address	= 0x8
fault code		= supervisor read, page not present
instruction pointer	= 0x8:0xe0150db7
stack pointer	        = 0x10:0xe94afde0
frame pointer	        = 0x10:0xe94afe00
code segment		= base 0x0, limit 0xfffff, type 0x1b
			= DPL 0, pres 1, def32 1, gran 1
processor eflags	= interrupt enabled, resume, IOPL = 0
current process		= 6 (cpuidle1)
interrupt mask		= 
----

>Fix:
	
Protect critical parts of dsopen (and other routines?) with a lock?
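
The locking idea above can be illustrated with a small userland model. This
is only a sketch under stated assumptions, not kernel code and not a patch
for dsopen: struct disk_model, model_read_label, model_open and
run_parallel_opens are hypothetical names invented for the illustration, and
a POSIX mutex stands in for whatever locking primitive the kernel would
actually use. The point it shows is that when the "read the label on first
open" step is done under a per-disk lock, concurrent opens of different
partitions read the label once, and later openers never see it half set up.

```c
#include <assert.h>
#include <pthread.h>

#define MAX_OPENERS 16

/* Hypothetical stand-in for per-disk state; not the kernel's structures. */
struct disk_model {
	pthread_mutex_t lock;	/* serialises the first-open path */
	int label_loaded;	/* nonzero once the label has been read */
	int label_reads;	/* number of times the label was read */
	int open_count;		/* number of completed opens */
};

/* Models reading the disklabel from the device. */
static void
model_read_label(struct disk_model *d)
{
	d->label_reads++;
	d->label_loaded = 1;
}

/* Models an open of one partition: only the first caller reads the label. */
static void
model_open(struct disk_model *d)
{
	pthread_mutex_lock(&d->lock);
	if (!d->label_loaded)
		model_read_label(d);
	d->open_count++;
	pthread_mutex_unlock(&d->lock);
}

static void *
opener(void *arg)
{
	model_open((struct disk_model *)arg);
	return NULL;
}

/* Opens nthreads "partitions" in parallel; returns the label read count. */
int
run_parallel_opens(int nthreads)
{
	struct disk_model d = { PTHREAD_MUTEX_INITIALIZER, 0, 0, 0 };
	pthread_t tid[MAX_OPENERS];
	int i, rc;

	assert(nthreads > 0 && nthreads <= MAX_OPENERS);
	for (i = 0; i < nthreads; i++) {
		rc = pthread_create(&tid[i], NULL, opener, &d);
		assert(rc == 0);
	}
	for (i = 0; i < nthreads; i++)
		pthread_join(tid[i], NULL);
	assert(d.open_count == nthreads);
	return d.label_reads;
}
```

Calling run_parallel_opens(6) from a small main (built with -pthread) models
the six parallel opens from the script in How-To-Repeat; in this model the
label is read exactly once regardless of how the threads are scheduled.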
>Audit-Trail:
>Unformatted:


