Date:      Thu, 4 Jul 2013 08:24:40 +0300
From:      Konstantin Belousov <kostikbel@gmail.com>
To:        Andre Albsmeier <Andre.Albsmeier@siemens.com>
Cc:        "freebsd-stable@freebsd.org" <freebsd-stable@freebsd.org>, John Baldwin <jhb@freebsd.org>
Subject:   Re: FreeBSD-9.1: machine reboots during snapshot creation, LORs found
Message-ID:  <20130704052440.GG91021@kib.kiev.ua>
In-Reply-To: <20130704051409.GA22021@bali>
References:  <20130531122611.GA6607@bali> <201305311051.03157.jhb@freebsd.org> <20130616063942.GA72803@bali> <201306171530.31208.jhb@freebsd.org> <20130704051409.GA22021@bali>

On Thu, Jul 04, 2013 at 07:14:09AM +0200, Andre Albsmeier wrote:
> On Mon, 17-Jun-2013 at 21:30:31 +0200, John Baldwin wrote:
> > On Sunday, June 16, 2013 2:39:42 am Andre Albsmeier wrote:
> > > On Fri, 31-May-2013 at 16:51:03 +0200, John Baldwin wrote:
> > > > On Friday, May 31, 2013 8:26:11 am Andre Albsmeier wrote:
> > > > > Each day at 5:15 we are generating snapshots on various machines.
> > > > > This used to work perfectly under 7-STABLE for years but since
> > > > > we started to use 9.1-STABLE the machine reboots in about 10%
> > > > > of all cases.
> > > > >
> > > > > After rebooting we find a new snapshot file which is a bit
> > > > > smaller than the good ones and has different permissions.
> > > > > It does not pass fsck. In this example it is the one
> > > > > whose name begins with s3:
> > > > >
> > > > > -r--r-----   1 root  operator  snapshot 72802894528 29 May 05:15 s2-2013.05.28-03.15.04
> > > > > -r--------   1 root  operator  snapshot 72802893824 29 May 05:15 s3-2013.05.29-03.15.03
> > > > > -r--r-----   1 root  operator  snapshot 72802894528 28 May 14:22 s4-2013.05.23-06.38.44
> > > > > -r--r-----   1 root  operator  snapshot 72802894528 28 May 14:22 s5-2013.05.24-03.15.03
> > > > > -r--r-----   1 root  operator  snapshot 72802894528 28 May 14:22 s6-2013.05.25-03.15.03
> > > > >
> > > > > After enabling DIAGNOSTIC, WITNESS and INVARIANTS in the kernel
> > > > > I see the following LORs (mksnap_ffs starts exactly at 5:15):
> > > > >
> > > > > May 29 05:15:00 <kern.crit> palveli kernel: lock order reversal:
> > > > > May 29 05:15:00 <kern.crit> palveli kernel: 1st 0xc2371da8 ufs (ufs) @ /src/src-9/sys/kern/vfs_mount.c:1240
> > > > > May 29 05:15:00 <kern.crit> palveli kernel: 2nd 0xc2371ec4 devfs (devfs) @ /src/src-9/sys/ufs/ffs/ffs_vfsops.c:1414
> > > > > May 29 05:15:04 <kern.crit> palveli kernel: lock order reversal:
> > > > > May 29 05:15:04 <kern.crit> palveli kernel: 1st 0xc228471c snaplk (snaplk) @ /src/src-9/sys/ufs/ufs/ufs_vnops.c:976
> > > > > May 29 05:15:04 <kern.crit> palveli kernel: 2nd 0xc22f25e4 ufs (ufs) @ /src/src-9/sys/ufs/ffs/ffs_snapshot.c:1626
> > > > >
> > > > > Unfortunately no corefiles are being generated ;-(.
> > > > >
> > > > > I have checked and even rebuilt the (UFS1) fs in question
> > > > > from scratch. I have also seen this happen on a UFS2 fs on
> > > > > another machine and on a third one when running "dump -L"
> > > > > on a root fs.
> > > > >
> > > > > Any hints of how to proceed?
> > > >
> > > > Would it be possible to set up a serial console that is logged on this machine
> > > > to see if it is panic'ing but failing to write out a crashdump?
> > >
> > > Couldn't attach the serial console yet ;-(. But I had people
> > > attach a KVMoverIP switch and enabled the various KDB options
> > > in the kernel. Now we can see a bit more (see below) -- no
> > > crashdump is being generated though.
> >
> > :(  Unfortunately these LORs don't really help with discerning the cause of
> > the reboot.  If you have remote power access (and still wanted to test this)
> > one option would be to change KDB to drop into the debugger on a panic.
> > Then you could connect over the KVM and take images of the original panic
> > along with a stack trace.
>
> After a few days of no problems, the box decided to crash
> during mksnap_ffs today ;-(. But now I have a crashdump,
> see below. Unfortunately, I cannot upload the dump anywhere,
> but if you ask me to check anything I'll be happy to help.
>
> kgdb /usr/obj/src/src-9/sys/palveli/kernel.debug vmcore.4
> GNU gdb 6.1.1 [FreeBSD]
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB.  Type "show warranty" for details.
> This GDB was configured as "i386-marcel-freebsd"...
>
> Unread portion of the kernel message buffer:
>
>
> Fatal trap 12: page fault while in kernel mode
> fault virtual address   = 0xcfb5e000
> fault code              = supervisor write, page not present
> instruction pointer     = 0x20:0xc07cb2fe
> stack pointer           = 0x28:0xd83545d0
> frame pointer           = 0x28:0xd835490c
> code segment            = base 0x0, limit 0xfffff, type 0x1b
>                         = DPL 0, pres 1, def32 1, gran 1
> processor eflags        = interrupt enabled, resume, IOPL = 0
> current process         = 12929 (mksnap_ffs)
> trap number             = 12
> panic: page fault
> KDB: stack backtrace:
> db_trace_self_wrapper(c08207eb,d835441c,c05fdfc9,c081df13,c08a82e0,...) at db_trace_self_wrapper+0x26/frame 0xd83543ec
> kdb_backtrace(c081df13,c08a82e0,c0801bfa,d8354428,d8354428,...) at kdb_backtrace+0x29/frame 0xd83543f8
> panic(c0801bfa,c0845a01,c2bafae4,1,1,...) at panic+0xc9/frame 0xd835441c
> trap_fatal(c0ff6000,cfb5e000,2,0,265abf,...) at trap_fatal+0x353/frame 0xd835445c
> trap_pfault(140da,0,c2baf930,c08b6a40,c282145c,...) at trap_pfault+0x2d7/frame 0xd83544a4
> trap(d8354590) at trap+0x41a/frame 0xd8354584
> calltrap() at calltrap+0x6/frame 0xd8354584
> --- trap 0xc, eip = 0xc07cb2fe, esp = 0xd83545d0, ebp = 0xd835490c ---
> bcopy(c2b36548,c2f194e0,0,0,0,...) at bcopy+0x1a/frame 0xd835490c
> ffs_mount(c2b36548,c2db9000,ff,d8354c08,c2b665e4,...) at ffs_mount+0x15ee/frame 0xd8354a3c

From the crash dump in kgdb, do
list *ffs_mount+0x15ee
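
For reference, the same lookup can be prepared for every frame of the
backtrace at once. What follows is only a minimal sketch in Python and is
not part of the original reply: the names FRAME_RE and kgdb_list_commands
are made up for illustration, and it assumes the frame lines have already
been rejoined into single lines of the form "... at symbol+0xoffset/frame
0x...". It only builds the command strings; it does not run kgdb.

```python
import re

# Hypothetical helper, not from the original thread.
# Matches DDB-style frames such as
#   "ffs_mount(...) at ffs_mount+0x15ee/frame 0xd8354a3c"
# and captures the symbol name and the hexadecimal offset.
FRAME_RE = re.compile(r"\bat\s+([A-Za-z_]\w*)\+(0x[0-9a-fA-F]+)/frame\b")


def kgdb_list_commands(backtrace):
    """Return one kgdb 'list *symbol+offset' command per frame found."""
    return [f"list *{sym}+{off}" for sym, off in FRAME_RE.findall(backtrace)]


# Three frames copied from the backtrace quoted in this message.
sample = """\
bcopy(c2b36548,c2f194e0,0,0,0,...) at bcopy+0x1a/frame 0xd835490c
ffs_mount(c2b36548,c2db9000,ff,d8354c08,c2b665e4,...) at ffs_mount+0x15ee/frame 0xd8354a3c
vfs_donmount(c2baf930,10313108,0,c2b8ba80,c2b8ba80,...) at vfs_donmount+0x196b/frame 0xd8354c2c
"""

for cmd in kgdb_list_commands(sample):
    print(cmd)
# prints:
# list *bcopy+0x1a
# list *ffs_mount+0x15ee
# list *vfs_donmount+0x196b
```

Each printed line can be pasted at the (kgdb) prompt. The ffs_mount entry
is the one asked for above.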

> vfs_donmount(c2baf930,10313108,0,c2b8ba80,c2b8ba80,...) at vfs_donmount+0x196b/frame 0xd8354c2c
> sys_nmount(c2baf930,d8354ccc,c2bafc18,d8354c6c,c0605015,...) at sys_nmount+0x63/frame 0xd8354c50
> syscall(d8354d08) at syscall+0x2ce/frame 0xd8354cfc
> Xint0x80_syscall() at Xint0x80_syscall+0x21/frame 0xd8354cfc
> --- syscall (378, FreeBSD ELF32, sys_nmount), eip = 0x180bdf37, esp = 0xbfbfd65c, ebp = 0xbfbfddd8 ---
> Uptime: 2d21h49m21s
> Physical memory: 503 MB
> Dumping 108 MB: 93 77 61 45 29 13
>
> No symbol "stopped_cpus" in current context.
> No symbol "stoppcbs" in current context.
> #0  doadump (textdump=1) at pcpu.h:249
> 249     pcpu.h: No such file or directory.
>         in pcpu.h
> (kgdb) where
> #0  doadump (textdump=1) at pcpu.h:249
> #1  0xc05fdddd in kern_reboot (howto=260) at /src/src-9/sys/kern/kern_shutdown.c:449
> #2  0xc05fe028 in panic (fmt=<value optimized out>) at /src/src-9/sys/kern/kern_shutdown.c:637
> #3  0xc07cd1d3 in trap_fatal (frame=0xd8354590, eva=3484803072)
>     at /src/src-9/sys/i386/i386/trap.c:1044
> #4  0xc07cd4b7 in trap_pfault (frame=0xd8354590, usermode=0, eva=3484803072)
>     at /src/src-9/sys/i386/i386/trap.c:957
> #5  0xc07ce05a in trap (frame=0xd8354590) at /src/src-9/sys/i386/i386/trap.c:555
> #6  0xc07ba88c in calltrap () at /src/src-9/sys/i386/i386/exception.s:170
> #7  0xc07cb2fe in bcopy () at /src/src-9/sys/i386/i386/support.s:196
> Previous frame inner to this frame (corrupt stack?)
> (kgdb)
>
> 	-Andre
