Date:      Wed, 7 Nov 2007 15:23:28 -0800
From:      Jeremy Chadwick <koitsu@FreeBSD.org>
To:        freebsd-stable@freebsd.org
Subject:   Re: RELENG_6 kernel panic + savecore(8) problem
Message-ID:  <20071107232328.GA1678@eos.sc1.parodius.com>
In-Reply-To: <20071107191611.GA1400@eos.sc1.parodius.com>
References:  <20071107191611.GA1400@eos.sc1.parodius.com>

On Wed, Nov 07, 2007 at 11:16:11AM -0800, Jeremy Chadwick wrote:
> Tracing pid 3 tid 100001 td 0xc7c6ad80
> kdb_enter(3228441820,3228796672,3228487817,3867634632,256,...) at kdb_enter+48
> panic(3228487817,3426817152,256,3228643296,0,...) at panic+206
> handle_written_inodeblock(3459887104,3688934424,3226775710,3228787204,3228175693,...) at handle_written_inodeblock+1503
> softdep_disk_write_complete(3688934424,3227842097,3356275348,3867634836,3226342800,...) at softdep_disk_write_complete+241
> bufdone(3688934424,0,3867634856,3226352850,3356275348,...) at bufdone+126
> g_vfs_done(3356275348,0,0,3352445440,3355957180) at g_vfs_done+198
> biodone(3356275348,3228786984,588,3228423470,100,...) at biodone+178
> g_io_schedule_up(3351686528,76,3351679512,3226344072,3867634980,...) at g_io_schedule_up+137
> g_up_procbody(0,3867635000,0,0,0,...) at g_up_procbody+122
> fork_exit(3226344072,0,3867635000) at fork_exit+122
> fork_trampoline() at fork_trampoline+8

A follow-up to this:

It appears that somehow a few of the filesystems on the disk (it's a
single-disk system) were suffering from some bizarre form of soft update
corruption.

I csup'd + rebuilt/reinstalled kernel + world on the box.  Upon reboot,
I saw that a few of the filesystems were reporting errors on mount
and unmount:

/var: mount pending error: blocks 16 files 2
/home: mount pending error: blocks 3904 files 6
/home: unmount pending error: blocks 848 files 0
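For anyone who wants to pull these counts out of a console log or
dmesg capture, here is a minimal Python sketch.  It assumes the message
format is exactly what the three lines above show; the regular
expression and the name parse_pending_errors are illustrative and are
not taken from the kernel sources.

```python
import re

# Matches lines such as "/home: mount pending error: blocks 3904 files 6".
# The pattern is inferred only from the three messages quoted above.
PENDING_RE = re.compile(
    r"^(?P<fs>\S+): (?P<phase>mount|unmount) pending error: "
    r"blocks (?P<blocks>\d+) files (?P<files>\d+)$"
)


def parse_pending_errors(text):
    """Return a list of (filesystem, phase, blocks, files) tuples."""
    results = []
    for line in text.splitlines():
        match = PENDING_RE.match(line.strip())
        if match:
            results.append(
                (match["fs"], match["phase"],
                 int(match["blocks"]), int(match["files"]))
            )
    return results


if __name__ == "__main__":
    sample = (
        "/var: mount pending error: blocks 16 files 2\n"
        "/home: mount pending error: blocks 3904 files 6\n"
        "/home: unmount pending error: blocks 848 files 0\n"
    )
    for entry in parse_pending_errors(sample):
        print(entry)
```

Lines that do not match the assumed format are silently skipped, so the
function can be fed an entire log without pre-filtering.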

I dropped back into single user and did manual fsck's of all the
filesystems.  /tmp (somehow) and /var were still marked dirty, but had
no other problems.  /home did have problems.

fsck reported numerous reference count problems, along with some
unreferenced files which required dumping partial data into lost+found.
There was also a single instance of an "unexpected soft update
inconsistency", although that may have been induced by the panic.
Thankfully we do backups, so the user
won't lose anything.  The physical disk itself appears OK (looking at
SMART data, and a dd of the full disk had no I/O errors during reading).
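The full-disk read check can be approximated without dd.  The sketch
below is only an illustration of the idea (read the device front to
back and note where reads fail); it is not the command that was run
here.  The function name, the 1 MiB chunk size, and the max_errors
cut-off are all assumptions made for the example.

```python
import os
import sys


def scan_for_read_errors(path, chunk_size=1024 * 1024, max_errors=16):
    """Read path front to back; return (bytes_read_ok, failing_offsets)."""
    bytes_read_ok = 0
    failing_offsets = []
    offset = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        # Stop early if errors pile up (e.g. the device has disappeared).
        while len(failing_offsets) < max_errors:
            try:
                data = os.pread(fd, chunk_size, offset)
            except OSError:
                # Record the failing chunk and skip past it.
                failing_offsets.append(offset)
                offset += chunk_size
                continue
            if not data:
                break  # end of file or device
            bytes_read_ok += len(data)
            offset += len(data)
    finally:
        os.close(fd)
    return bytes_read_ok, failing_offsets


if __name__ == "__main__" and len(sys.argv) > 1:
    ok, bad = scan_for_read_errors(sys.argv[1])
    print(f"{ok} bytes read cleanly, {len(bad)} failing chunk(s): {bad}")
```

It would be run as root against the raw disk device, passing the device
path as the only argument.  A clean pass only shows that every chunk
was readable at that moment; it says nothing about whether the data in
those chunks is correct.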

I don't think any of this could explain the savecore(8) issue, since
savecore claimed there was no core to save.  But I did want to follow
up on this so that it wasn't a mailing list thread left hanging.  :-)

If the issue crops up again, I'll likely be replacing the disk (as a
precaution) and rebuilding all the filesystems from scratch.

-- 
| Jeremy Chadwick                                    jdc at parodius.com |
| Parodius Networking                           http://www.parodius.com/ |
| UNIX Systems Administrator                      Mountain View, CA, USA |
| Making life hard for others since 1977.                  PGP: 4BD6C0CB |



