Date:      Mon, 5 Mar 2012 17:12:40 -0500
From:      Arnaud Lacombe <lacombar@gmail.com>
To:        FreeBSD Stable <freebsd-stable@freebsd.org>
Subject:   Heavy fs corruption with 9.0-RELEASE
Message-ID:  <CACqU3MW734kWjNxH2SWHBb_ERiN9TE0ju6DM%2BAh8zkGmh-j5TA@mail.gmail.com>

Hi,

I've been running a couple of systems with 9.0-RELEASE since it came out.
All the systems were installed through the standard installation
procedure. After an unclean reboot, whether from a crash or a power
failure, I get a huge amount of really bad filesystem corruption (read:
"silent", fs-wide corruption). This happens with both i386 and amd64
builds. The systems involved use CompactFlash as their permanent system
storage medium.

Typical symptoms are:

[during rc startup]
Starting sshd.
/usr/lib/libkrb5.so.10: invalid file format/etc/rc: WARNING: failed to
start sshd
/usr/libexec/sendmail/sendmail: Undefined symbol
"SSL_library_init"/usr/libexec/sendmail/sendmail: Undefined symbol
"SSL_library_init"Starting cron.

[after startup: dropped to single-user mode, remounted / read-only, ran
`fsck -y /', then went back to multi-user]
Starting sshd.
Segmentation fault
Mar  5 18:07:38 test kernel: Failed to write core file for process
sshd (error 14)
/etc/rc: WARNING: failed to start sshd
Segmentation fault
Segmentation fault
Starting cron.
/usr/lib/libgnuregex.so.5: invalid file
format/usr/lib/libgnuregex.so.5: invalid file formatStarting
background file system checks in 60 seconds.

Well, something looks broken, let's investigate...

# file /usr/include/* | tail -20
/usr/include/ulog.h:            broken symbolic link to `liblzma.so.5'
/usr/include/unctrl.h:          ELF 64-bit LSB shared object, x86-64,
version 1 (FreeBSD), dynamically linked, stripped
/usr/include/unistd.h:          ELF 64-bit LSB shared object, x86-64,
version 1 (FreeBSD), dynamically linked, stripped
/usr/include/usb.h:             current ar archive
/usr/include/usbhid.h:          current ar archive
/usr/include/utempter.h:        current ar archive
/usr/include/utime.h:           broken symbolic link to `librt.so.1'
/usr/include/utmpx.h:           current ar archive
/usr/include/uuid.h:            current ar archive
/usr/include/varargs.h:         ELF 64-bit LSB shared object, x86-64,
version 1 (FreeBSD), dynamically linked, stripped
/usr/include/vgl.h:             ELF 64-bit LSB shared object, x86-64,
version 1 (FreeBSD), dynamically linked, stripped
/usr/include/vis.h:             broken symbolic link to `librtld_db.so.2'
/usr/include/vm:                directory
/usr/include/wchar.h:           current ar archive
/usr/include/wctype.h:          current ar archive
/usr/include/wordexp.h:         ELF 64-bit LSB shared object, x86-64,
version 1 (FreeBSD), dynamically linked, stripped
/usr/include/x86:               directory
/usr/include/ypclnt.h:          ELF 64-bit LSB shared object, x86-64,
version 1 (FreeBSD), dynamically linked, stripped
/usr/include/zconf.h:           current ar archive
/usr/include/zlib.h:            current ar archive

"Since when does /usr/include contain a majority of binary files?"
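
A rough way to quantify this symptom is to check the leading magic bytes of every header file instead of reading the file(1) output by eye. The following is only a minimal sketch, not part of the original investigation: it assumes Python is available (for example on another machine with the card mounted read-only), and the helper name `find_binary_headers` is hypothetical.

```python
import os

# Magic bytes for the two binary types that file(1) reported above.
ELF_MAGIC = b"\x7fELF"      # ELF objects and shared libraries
AR_MAGIC = b"!<arch>\n"     # "current ar archive" (static libraries)


def find_binary_headers(root):
    """Return sorted paths of *.h files under root that start with ELF or ar magic.

    Hypothetical helper, for illustration only.
    """
    suspicious = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(".h"):
                continue
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                # Headers turned into symlinks are a separate symptom; skip them.
                continue
            try:
                with open(path, "rb") as fh:
                    head = fh.read(8)
            except OSError:
                # Unreadable files are skipped rather than guessed at.
                continue
            if head.startswith(ELF_MAGIC) or head.startswith(AR_MAGIC):
                suspicious.append(path)
    return sorted(suspicious)
```

Calling `find_binary_headers("/usr/include")` would list the headers whose contents look like ELF objects or ar archives. It only inspects the first bytes of each file, so it says nothing about headers corrupted in other ways.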

# ssh
Undefined symbol "ssh_compat20" referenced from COPY relocation in /usr/bin/ssh

# file /usr/lib/libssh.so.5
/usr/lib/libssh.so.5: symbolic link to `libopie.so.7'

"What?"

# file /usr/lib/snmp*
/usr/lib/snmp_atm.so:        ELF 64-bit LSB shared object, x86-64,
version 1 (FreeBSD), dynamically linked, stripped
/usr/lib/snmp_atm.so.6:      symbolic link to `pam_deny.so.5'
/usr/lib/snmp_bridge.so:     ELF 64-bit LSB shared object, x86-64,
version 1 (FreeBSD), dynamically linked, stripped
/usr/lib/snmp_bridge.so.6:   symbolic link to `pam_echo.so.5'
/usr/lib/snmp_hostres.so:    ELF 64-bit LSB shared object, x86-64,
version 1 (FreeBSD), dynamically linked, stripped
/usr/lib/snmp_hostres.so.6:  symbolic link to `pam_exec.so.5'
/usr/lib/snmp_mibII.so:      ELF 64-bit LSB shared object, x86-64,
version 1 (FreeBSD), dynamically linked, stripped
/usr/lib/snmp_mibII.so.6:    symbolic link to `pam_ftpusers.so.5'
/usr/lib/snmp_netgraph.so:   ELF 64-bit LSB shared object, x86-64,
version 1 (FreeBSD), dynamically linked, stripped
/usr/lib/snmp_netgraph.so.6: symbolic link to `pam_login_access.so.5'
[...]

"Why would `snmp_netgraph.so.6' be linked to `pam_login_access.so.5'?"
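
The cross-linked libraries shown above can be searched for mechanically by comparing the library name of each symlink with the name of its target. This is a hedged sketch under one assumption: on a healthy system a shared-library symlink points at a file of the same library (same name before `.so`). The helper names `lib_stem` and `find_mismatched_symlinks` are hypothetical.

```python
import os


def lib_stem(name):
    """Return the part of a shared-library file name before '.so'."""
    return name.split(".so", 1)[0]


def find_mismatched_symlinks(libdir):
    """Return sorted (link, target) name pairs whose library stems differ.

    Heuristic only: a hit is a candidate for inspection, not proof of
    corruption, since a system may ship legitimate cross-name links.
    """
    mismatched = []
    for name in os.listdir(libdir):
        path = os.path.join(libdir, name)
        # Only symlinks that look like shared libraries are of interest.
        if not os.path.islink(path) or ".so" not in name:
            continue
        # readlink works on dangling links too; compare base names only.
        target = os.path.basename(os.readlink(path))
        if lib_stem(name) != lib_stem(target):
            mismatched.append((name, target))
    return sorted(mismatched)
```

On the listing above, `find_mismatched_symlinks("/usr/lib")` would be expected to report pairs such as `snmp_atm.so.6` and `pam_deny.so.5`.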

Unsurprisingly, fsck (still) detects a lot of inconsistencies:

# fsck -f /
** /dev/ada0p2 (NO WRITE)

USE JOURNAL? no

** Skipping journal, falling through to full fsck
SETTING DIRTY FLAG IN READ_ONLY MODE

UNEXPECTED SOFT UPDATE INCONSISTENCY
** Last Mounted on /
** Root file system
** Phase 1 - Check Blocks and Sizes
124184 DUP I=31488
UNEXPECTED SOFT UPDATE INCONSISTENCY

124185 DUP I=31488
UNEXPECTED SOFT UPDATE INCONSISTENCY

124186 DUP I=31488
UNEXPECTED SOFT UPDATE INCONSISTENCY

124187 DUP I=31488
UNEXPECTED SOFT UPDATE INCONSISTENCY

124188 DUP I=31488
UNEXPECTED SOFT UPDATE INCONSISTENCY

124189 DUP I=31488
UNEXPECTED SOFT UPDATE INCONSISTENCY

[...]

EXCESSIVE DUP BLKS I=31494
CONTINUE? [yn]
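
Each `DUP` line in the Phase 1 output names a block that the given inode claims although another inode already claims it. To see which inodes are worst affected, a saved copy of the fsck output can be tallied per inode. This is a minimal sketch, assuming the output was captured to a string; the name `count_dup_blocks` is hypothetical.

```python
import re
from collections import Counter

# Matches lines such as "124184 DUP I=31488" from the transcript above.
DUP_LINE = re.compile(r"^\s*(\d+) DUP I=(\d+)\s*$")


def count_dup_blocks(fsck_output):
    """Return a Counter mapping inode number -> number of DUP blocks reported."""
    counts = Counter()
    for line in fsck_output.splitlines():
        match = DUP_LINE.match(line)
        if match:
            counts[int(match.group(2))] += 1
    return counts
```

`count_dup_blocks(text).most_common(10)` would then list the ten inodes with the most duplicate-block reports. Lines such as `EXCESSIVE DUP BLKS I=31494` are deliberately not counted, because they do not name an individual block.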

I do not see this behavior when running 9.0-RELEASE on top of a
7.4-RELEASE userland (including FS). I've seen this behavior on
various CF cards, so a single bad card is unlikely to be the culprit.

Here are the filesystems currently mounted on the machine, along with
their mount options:

# mount
/dev/ada0p2 on / (ufs, local, journaled soft-updates)
devfs on /dev (devfs, local, multilabel)

Any hints appreciated.

Thanks,
 - Arnaud


