Date:      Tue, 8 Jan 2013 02:12:31 +0200
From:      Konstantin Belousov <kostikbel@gmail.com>
To:        Dmitry Morozovsky <marck@rinet.ru>
Cc:        freebsd-fs@freebsd.org
Subject:   Re: zfs -> ufs rsync: livelock in wdrain state
Message-ID:  <20130108001231.GB82219@kib.kiev.ua>
In-Reply-To: <alpine.BSF.2.00.1301080013520.7949@woozle.rinet.ru>
References:  <alpine.BSF.2.00.1301080013520.7949@woozle.rinet.ru>


On Tue, Jan 08, 2013 at 12:19:15AM +0400, Dmitry Morozovsky wrote:
> Dear colleagues,
>
> I have an archive server with a pretty large ZFS pool (24*2T in a single
> raidz2 raidgroup).
>
> Sometimes we move really old archives to external SATA drives, which are
> formatted with UFS2/SU.  Files are copied via rsync.
>
> The system in question is stable/8; an upgrade to stable/9 is planned, but
> not yet completed.
>
> Now, during the last rsync, the process got stuck as follows:
>
> dump.2012062219.bin.gz
>   3208015437 100%  102.42MB/s    0:00:29 (xfer#66, to-check=196/721)
> dump.2012062220.bin.gz
> load: 0.01  cmd: rsync 47543 [wdrain] 1904.69r 443.01u 241.12s 0% 1736k
> ^C
> rsync error: received SIGINT, SIGTERM, or SIGHUP (code 20) at rsync.c(645) [sender=3.0.9]
>
> As we can see, the rsync writer stops in the wdrain state.
>
> I terminated it with ^C in the terminal session, as it was not an
> autogenerated backup.
>
> Now, ZFS and the rest of the system seem to be working well, but trying to
> sync manually hangs the console forever:
>
> root@moose:/ar# sync
> load: 0.00  cmd: sync 67229 [wdrain] 468.17r 0.00u 0.00s 0% 596k
>
> Any hints? A quick search through the FreeBSD mailing lists and open PRs
> does not reveal much.
>

Are there any kernel messages about the disk system?

The 'wdrain' state means that the amount of dirty buffers accumulated
exceeds the allowed maximum. A transient 'wdrain' state is normal on a
machine doing a lot of writes to a filesystem that uses the buffer cache,
such as UFS. Failure to clean the dirty buffers is usually related to
disk I/O stalling.

I cannot rule out that a bug could cause a stuck 'wdrain' state, but
in the last five or so years all the cases I investigated were due to
disks.
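
To make the difference between a transient and a stuck 'wdrain' concrete,
here is a toy model in Python. It is only a sketch of the idea described
above, not the kernel code: the function name, the tick-based loop, and
the numbers (write_size, limit, stall_at) are all made up for
illustration. A writer queues data faster than the disk retires it and
sleeps whenever the outstanding amount exceeds a limit; the disk
optionally stops completing I/O at some tick.

```python
def longest_wdrain_streak(stall_at=None, ticks=50, write_size=64, limit=256):
    """Toy model: return the longest run of consecutive ticks the writer
    spent blocked because too much written data was still outstanding.

    All parameters are hypothetical illustration values, not kernel tunables.
    stall_at is the tick at which the disk stops completing I/O (None = never).
    """
    pending = 0            # bytes written by the process, not yet on disk
    streak = longest = 0
    for tick in range(ticks):
        disk_ok = stall_at is None or tick < stall_at
        if disk_ok:
            # A working disk retires one write per tick.
            pending = max(0, pending - write_size)
        if pending > limit:
            # Over the limit: the writer sleeps (the 'wdrain' analogue).
            streak += 1
            longest = max(longest, streak)
        else:
            # Under the limit: the writer runs and queues two more writes.
            streak = 0
            pending += 2 * write_size
    return longest


if __name__ == "__main__":
    print("healthy disk, longest blocked streak:", longest_wdrain_streak())
    print("disk stalls at tick 10, longest streak:",
          longest_wdrain_streak(stall_at=10))
```

With a working disk the writer only ever blocks for a tick at a time and
then resumes, which corresponds to the normal transient state. Once the
disk stops completing writes, nothing ever reduces the outstanding amount,
so the writer (and anything else that has to wait for the same data, such
as sync) stays blocked for the rest of the run.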




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20130108001231.GB82219>