Date:      Wed, 15 Oct 2008 19:24:28 +1100
From:      Peter Jeremy <peterjeremy@optushome.com.au>
To:        freebsd-stable@freebsd.org
Subject:   System hanging during dump
Message-ID:  <20081015082428.GE26536@server.vk2pj.dyndns.org>

Last night, I attempted a full, compressed backup of my 181GB /home
(on a PATA disk) to a remote system.  The backup started at 2159 and
everything appeared normal until about 0040, when the system became
non-responsive; it stayed that way until the dump completed at 1033.  This
is the first full backup of /home I've made for several years (due to
lack of space).

I noticed the non-responsiveness at about 0500 when:
- The dump, gzip and fifo pipeline were running normally.
- A 'systat -v' I had started was running normally (though it
  reported an excessive number of 'D' processes).  Other values
  all appeared normal.
- No response to return key at a zsh prompt
- No response to up/down arrows in mutt
[above all done in pre-existing ssh sessions from another host]
- telnet to port 22 connected but didn't produce a banner.

The duration above is based on system logs - which show nothing
happened during this period.  At the end, there were various anomalous
entries:
Oct 15 10:33:27 server ntpd[750]: too many recvbufs allocated (40)
Oct 15 10:33:30 server sshd[947]: error: accept: Software caused connection abort
Oct 15 10:33:34 server kernel: TCP: [192.168.123.123]:59516 to [192.168.123.200]:25 tcpflags 0x4<RST>; syncache_chkrst: Spurious RST without matching syncache entry (possibly syncookie only), segment ignored

Possibly useful information:
The dump pipeline was:
dump -uaL0 -C 32 -f - /home | reblock | gzip [stdout connected to socket
to remote server]
'reblock' is basically a 200MB FIFO I wrote to desynchronise the (often
I/O bound) dump from the CPU-bound gzip.
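
For illustration only, the idea behind such a buffering stage can be
sketched as below.  This is not the actual 'reblock' source (which is not
shown here); the function name, the 64KB chunk size and the use of a
reader thread are all assumptions made for the sketch.  A reader thread
fills a bounded in-memory FIFO while the main thread drains it, so a
stall on one side does not immediately stall the other.

```python
import queue
import threading


def reblock(src, dst, capacity=200 * 1024 * 1024, chunk=64 * 1024):
    """Copy src to dst through a bounded FIFO of roughly `capacity` bytes.

    Hypothetical sketch: `src` and `dst` are binary file-like objects.
    Returns the number of bytes copied.
    """
    # The queue holds at most capacity/chunk blocks of up to `chunk` bytes.
    fifo = queue.Queue(maxsize=max(1, capacity // chunk))

    def reader():
        while True:
            block = src.read(chunk)
            fifo.put(block)      # blocks while the FIFO is full
            if not block:        # empty read means end of input
                return

    t = threading.Thread(target=reader, daemon=True)
    t.start()

    total = 0
    while True:
        block = fifo.get()       # blocks while the FIFO is empty
        if not block:            # end-of-input marker from the reader
            break
        dst.write(block)
        total += len(block)
    t.join()
    return total
```

In a pipeline, a stage like this would be called with sys.stdin.buffer
and sys.stdout.buffer as src and dst.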

server% uname -a
FreeBSD server.vk2pj.dyndns.org 7.0-STABLE FreeBSD 7.0-STABLE #18: Sun May 18 15:02:39 EST 2008     root@server.vk2pj.dyndns.org:/var/obj/k7/usr/src/sys/server  i386
server% df -ki
Filesystem  1024-blocks      Used   Avail Capacity iused    ifree %iused  Mounted on
/dev/ad0s3d   204648864 181911710 6365246    97% 1703016 11353942   13%   /home

About the only thing that happened at around this time was nightly
updates.  These start at 0005, fetching CTM cvs-cur updates, applying
them to /home/ncvs, then cvs updating /home/ports.  Looking at
timestamps, /home/ports/graphics/icod/CVS/Entries was updated at
0042 and /home/ports/graphics/imlib2_loaders/CVS/Entries (the next
entry) was updated at 1034.

Whilst /home is fairly full, I can't see that the snapshot meta and
rollback data would have occupied the 20GB free (and no 'out-of-space'
messages were generated).  Is there some limit on the number of inodes
that can be updated whilst a snapshot exists?
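
For reference, the 20GB figure is the raw free space (total minus used),
not the 'Avail' column reported by df.  The arithmetic below uses only the
numbers from the 'df -ki' output; reading the difference between the two
as the UFS minfree reserve (8% by default) is an assumption, although the
numbers match it closely.

```python
# Figures copied from the 'df -ki' output (units are 1024-byte blocks).
total_kb = 204648864
used_kb = 181911710
avail_kb = 6365246

# Space not occupied by data, including any reserve df hides from users.
raw_free_kb = total_kb - used_kb

# Difference between raw free space and what df reports as available.
# Assumption: this is the UFS minfree reserve.
reserved_kb = raw_free_kb - avail_kb


def gib(kb):
    """Convert 1024-byte blocks to GiB."""
    return kb / (1024 * 1024)


print(f"raw free : {raw_free_kb} KB ({gib(raw_free_kb):.1f} GiB)")
print(f"available: {avail_kb} KB ({gib(avail_kb):.1f} GiB)")
print(f"reserved : {reserved_kb} KB ({100 * reserved_kb / total_kb:.1f}% of total)")
```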

Has anyone else seen anything similar?
-- 
Peter Jeremy
Please excuse any delays as the result of my ISP's inability to implement
an MTA that is either RFC2821-compliant or matches their claimed behaviour.
