Date:      Sun, 16 Jun 2013 10:02:39 +0200
From:      Andre Albsmeier <Andre.Albsmeier@siemens.com>
To:        Jeremy Chadwick <jdc@koitsu.org>
Cc:        "freebsd-stable@freebsd.org" <freebsd-stable@freebsd.org>, John Baldwin <jhb@freebsd.org>
Subject:   Re: FreeBSD-9.1: machine reboots during snapshot creation, LORs found
Message-ID:  <20130616080239.GA73100@bali>
In-Reply-To: <20130616065441.GA15175@icarus.home.lan>
References:  <20130531122611.GA6607@bali> <201305311051.03157.jhb@freebsd.org> <20130531172523.GA9188@bali> <20130616065441.GA15175@icarus.home.lan>

On Sun, 16-Jun-2013 at 08:54:41 +0200, Jeremy Chadwick wrote:
> On Fri, May 31, 2013 at 07:25:23PM +0200, Andre Albsmeier wrote:
> > On Fri, 31-May-2013 at 16:51:03 +0200, John Baldwin wrote:
> > > On Friday, May 31, 2013 8:26:11 am Andre Albsmeier wrote:
> > > > Every day at 5:15 we generate snapshots on various machines.
> > > > This used to work perfectly under 7-STABLE for years but since
> > > > we started to use 9.1-STABLE the machine reboots in about 10%
> > > > of all cases.
> > > > 
> > > > After rebooting we find a new snapshot file which is a bit
> > > > smaller than the good ones and has different permissions.
> > > > fsck fails on it. In this example it is the one
> > > > whose name is beginning with s3:
> > > > 
> > > > -r--r-----   1 root  operator  snapshot 72802894528 29 May 05:15 s2-2013.05.28-03.15.04
> > > > -r--------   1 root  operator  snapshot 72802893824 29 May 05:15 s3-2013.05.29-03.15.03
> > > > -r--r-----   1 root  operator  snapshot 72802894528 28 May 14:22 s4-2013.05.23-06.38.44
> > > > -r--r-----   1 root  operator  snapshot 72802894528 28 May 14:22 s5-2013.05.24-03.15.03
> > > > -r--r-----   1 root  operator  snapshot 72802894528 28 May 14:22 s6-2013.05.25-03.15.03
> > > > 
> > > > After enabling DIAGNOSTIC, WITNESS and INVARIANTS in the kernel
> > > > I see the following LORs (mksnap_ffs starts exactly at 5:15):
> > > > 
> > > > May 29 05:15:00 <kern.crit> palveli kernel: lock order reversal:
> > > > May 29 05:15:00 <kern.crit> palveli kernel: 1st 0xc2371da8 ufs (ufs) @ /src/src-9/sys/kern/vfs_mount.c:1240
> > > > May 29 05:15:00 <kern.crit> palveli kernel: 2nd 0xc2371ec4 devfs (devfs) @ /src/src-9/sys/ufs/ffs/ffs_vfsops.c:1414
> > > > May 29 05:15:04 <kern.crit> palveli kernel: lock order reversal:
> > > > May 29 05:15:04 <kern.crit> palveli kernel: 1st 0xc228471c snaplk (snaplk) @ /src/src-9/sys/ufs/ufs/ufs_vnops.c:976
> > > > May 29 05:15:04 <kern.crit> palveli kernel: 2nd 0xc22f25e4 ufs (ufs) @ /src/src-9/sys/ufs/ffs/ffs_snapshot.c:1626
> > > > 
> > > > Unfortunately no corefiles are being generated ;-(.
> > > > 
> > > > I have checked and even rebuilt the (UFS1) fs in question
> > > > from scratch. I have also seen this happen on a UFS2 fs on
> > > > another machine and on a third one when running "dump -L"
> > > > on a root fs.
> > > > 
> > > > Any hints on how to proceed?
> > > 
> > > Would it be possible to set up a serial console that is logged on this machine
> > > to see if it is panicking but failing to write out a crashdump?
> > 
> > I'll try to arrange that. It'll take a bit since this
> > box is 200 km away... 
> > 
> > Maybe I'll find another one nearby to reproduce it...
> 
> SPECIFICALLY regarding "lack of crash dumps": I need to see the
> following:
> 
> * cat /etc/rc.conf
> * cat /etc/fstab
> 
> I may need output from other commands, but shall deal with that when I
> see output from the above.  Thanks.

No problem, see below...

To make a long story short, the machine dumps core perfectly
(I tested that a while ago), but not when dealing with _this_
issue...

The dump goes to da1s1b; savecore fetches it from there and
writes it to /var (on da0), which is faster.

rc.conf (beware, rc.conf.local exists):
---------------------------------------
rcshutdown_timeout=180
tmpmfs=YES
tmpsize="$(( `/sbin/sysctl -n hw.usermem` / 3000000 ))m"
tmpmfs_flags="$tmpmfs_flags -v 1 -n"

background_fsck=NO

nisdomainname=ofw.tld
pflog_flags=-S

syslogd_flags=-svv
inetd_enable=YES
inetd_flags=-l
named_flags="-S 1000"
named_chrootdir=""
rwhod_enable=YES
sshd_enable=YES
amd_enable=YES
amd_flags="-F /etc/amd.conf"
nfs_client_enable=YES
nfs_access_cache=2
mountd_flags=-n
rpcbind_enable=YES

ntpdate_enable=YES
ntpdate_hosts=ntp
ntpd_enable=YES
ntpd_flags="-p /var/run/ntpd.pid"

nis_client_enable=YES
nis_client_flags="-s -S ofw.tld,nis-16-1,nis-16-2"
nis_server_flags=-n
nis_yppasswdd_flags="-t /var/yp/src/master.passwd -f -v"

defaultrouter=192.168.16.2

keyrate=fast

sendmail_flags="-bd -q5m"
sendmail_submit_flags="$sendmail_flags -ODaemonPortOptions=Addr=localhost"
sendmail_msp_queue_flags="-Ac -q30m"
sendmail_rebuild_aliases=NO

lpd_enable=YES
lpd_flags=-s
chkprintcap_enable=YES
dumpdev=AUTO
clear_tmp_X=NO
ldconfig_paths=/usr/local/lib
ldconfig_paths_aout=""
entropy_file=/boot/entropy-file


rc.conf.local:
--------------
hostname=typhon.ofw.tld
ifconfig_msk0="inet 192.168.24.1/21"
ifconfig_msk0_alias0="inet 192.168.24.10/32"

named_enable=YES
nfs_server_enable=YES

nis_client_flags="-s -S ofw.tld,nis-24-1,nis-24-2"
nis_server_enable=YES

defaultrouter=192.168.24.2

lpd_flags=-l
dumpdev=/dev/da1s1b
quota_enable=YES


fstab:
------
/dev/da0s1a	/		ufs	noatime,rw				0 1
/dev/da0s1b	none		swap	sw					0 0
proc		/proc		procfs	rw					0 0
/dev/da0s1d	/usr		ufs	noatime,rw				0 2
/dev/da0s1e	/var		ufs	noatime,nosuid,rw			0 2

/dev/da10p1	/share2		ufs	suiddir,groupquota,noatime,nosuid,rw	0 2
/dev/da10p2	/raid2		ufs	userquota,noatime,nosuid,rw		0 2
