Date: Wed, 19 Oct 2016 08:52:24 +0200
From: Andrea Venturoli <ml@netfence.it>
To: "freebsd-stable@freebsd.org" <freebsd-stable@freebsd.org>
Subject: Nightly disk-related panic since upgrade to 10.3
Message-ID: <e923a01a-0739-1fc6-32aa-3a1658cd9e7f@netfence.it>
Hello.

Last week I upgraded a 9.3/amd64 box to 10.3. Since then it has crashed and rebooted at least once every night.
The only exception was Friday, when it locked up without rebooting: it still answered ping requests, and logins through HTTP would half-work. My impression is that the disk subsystem was hung. ICMP kept working because it does no I/O, and HTTP worked as long as no disk access was required.

Today I was able to get a couple of (almost identical) dumps:

> cpuid = 1
> KDB: stack backtrace:
> #0 0xffffffff804ee170 at kdb_backtrace+0x60
> #1 0xffffffff804b4576 at vpanic+0x126
> #2 0xffffffff804b4443 at panic+0x43
> #3 0xffffffff8068fd2a at softdep_deallocate_dependencies+0x6a
> #4 0xffffffff805394b5 at brelse+0x145
> #5 0xffffffff8053793c at bufwrite+0x3c
> #6 0xffffffff806ae20f at ffs_write+0x3df
> #7 0xffffffff8076d519 at VOP_WRITE_APV+0x149
> #8 0xffffffff806ec7c9 at vnode_pager_generic_putpages+0x2a9
> #9 0xffffffff8076f3b7 at VOP_PUTPAGES_APV+0xa7
> #10 0xffffffff806ea6f5 at vnode_pager_putpages+0xc5
> #11 0xffffffff806e17f8 at vm_pageout_flush+0xc8
> #12 0xffffffff806db432 at vm_object_page_collect_flush+0x182
> #13 0xffffffff806db1cd at vm_object_page_clean+0x13d
> #14 0xffffffff806dadbe at vm_object_terminate+0x8e
> #15 0xffffffff806eac60 at vnode_destroy_vobject+0x90
> #16 0xffffffff806b4232 at ufs_reclaim+0x22
> #17 0xffffffff8076e5c7 at VOP_RECLAIM_APV+0xa7

Does anyone have better insight into what might be going on?

The disks are all connected to a SAS RAID adapter handled by the mfi driver. I don't think it is a hardware issue, since the machine worked perfectly for years until I did the upgrade. Also, mfiutil says everything is OK, and there is nothing mfi-related in the logs.
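When reading a trace like this one, the first three frames (`kdb_backtrace`, `vpanic`, `panic`) are only the panic and backtrace machinery. The frame that actually raised the panic is #3, `softdep_deallocate_dependencies`, reached from `brelse` via `bufwrite` in `ffs_write`. Below is a minimal Python sketch of that reading. It assumes the `#N 0xADDR at function+0xOFF` frame format shown in this trace, and `parse_backtrace`, `panic_origin` and `PANIC_MACHINERY` are hypothetical names used only for illustration:

```python
import re

# One frame of a "KDB: stack backtrace", assumed to look like
# "#3 0xffffffff8068fd2a at softdep_deallocate_dependencies+0x6a".
FRAME_RE = re.compile(r"#(\d+)\s+(0x[0-9a-f]+)\s+at\s+(\w+)\+(0x[0-9a-f]+)")

# Hypothetical set of frames that belong to the panic path itself
# (covers this trace only; other kernels may add more).
PANIC_MACHINERY = {"kdb_backtrace", "vpanic", "panic"}


def parse_backtrace(text):
    """Return (frame, address, function, offset) tuples found in text."""
    return [
        (int(num), int(addr, 16), func, int(off, 16))
        for num, addr, func, off in FRAME_RE.findall(text)
    ]


def panic_origin(frames):
    """Return the first function that is not part of the panic machinery."""
    for _, _, func, _ in frames:
        if func not in PANIC_MACHINERY:
            return func
    return None


# First frames of the trace above, still in the fused, "> "-quoted form.
trace = (
    "> #0 0xffffffff804ee170 at kdb_backtrace+0x60 "
    "> #1 0xffffffff804b4576 at vpanic+0x126 "
    "> #2 0xffffffff804b4443 at panic+0x43 "
    "> #3 0xffffffff8068fd2a at softdep_deallocate_dependencies+0x6a "
    "> #4 0xffffffff805394b5 at brelse+0x145 "
    "> #5 0xffffffff8053793c at bufwrite+0x3c "
    "> #6 0xffffffff806ae20f at ffs_write+0x3df"
)

frames = parse_backtrace(trace)
print(panic_origin(frames))          # softdep_deallocate_dependencies
print([f[2] for f in frames[3:6]])   # ['softdep_deallocate_dependencies', 'brelse', 'bufwrite']
```

The regular expression is applied with `findall` rather than line by line, so it also works on the trace as the archive flattened it onto a single line.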
Some ideas come to mind on which I could use a second opinion:
_ soft updates are broken: that would really surprise me, since I've been using them for years on this and several other boxes (10.3 too);
_ snapshot creation/deletion is causing this: again, I use snapshots almost everywhere, so I don't think this alone could be the cause; besides, I've been able to run some dumps without trouble, and I don't think anything was touching snapshots at the time of the last two panics;
_ the mfi driver is broken on 10.3: this seems more plausible to me, since this is the only machine I have it on and the only one where I get these panics. I found https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=183618, but I get no "g_vfs_done()..." messages.

Any other hints? I'd really like to find out what's going on; I'll appreciate any help, and I'm willing to provide any useful info. On the other hand, this is a production server, so I have to solve this really soon.

A few options come to mind: disabling soft updates (knowing which file system was having trouble would help here; is there any way to know?), enabling journaling, upgrading to 10-STABLE, or building a kernel with INVARIANTS/WITNESS/etc. Still, I'd appreciate a second opinion before I start shooting in the dark.

bye & Thanks
av.