Skip site navigation (1)Skip section navigation (2)
Date:      Tue, 12 Dec 2017 19:02:29 +0000
From:      bugzilla-noreply@freebsd.org
To:        freebsd-bugs@FreeBSD.org
Subject:   [Bug 224292] processes are hanging in state ufs / possible deadlock in file system
Message-ID:  <bug-224292-8@https.bugs.freebsd.org/bugzilla/>

next in thread | raw e-mail | index | archive | help
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=3D224292

            Bug ID: 224292
           Summary: processes are hanging in state ufs / possible deadlock
                    in file system
           Product: Base System
           Version: CURRENT
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Only Me
          Priority: ---
         Component: kern
          Assignee: freebsd-bugs@FreeBSD.org
          Reporter: wosch@FreeBSD.org

A friend and I are testing perl modules on FreeBSD, on all the major versio=
ns
(12-current, 11-stable, 11-RELEASE). The tests are done in a cheap VM (1GB =
RAM,
40GB SSD, 1 core), apparently running on virtio/kvm.

While testing we noticed that the VM randomly crashes every 1-3 days, and we
had to run a hard fsck on the console. I first thought it is a resource
problem, but munin doesn=E2=80=99t show any shortage of RAM, disk or file d=
escriptors.

We find out that the crashes occurred shortly after processes hanging in st=
ate
=E2=80=9Cufs=E2=80=9D, as shown in top(1) output:

$ top -b 300 | egrep ufs
73278 root          1  31    0  8364K  2340K ufs      0:02  10.35% find
34548 cpansand      1  52   10 43828K 15348K ufs      0:00   0.00% perl
34824 cpansand      1  20    0 21060K  1960K ufs      0:00   0.00% procstat
34557 cpansand      1  52   10 13180K  2112K ufs      0:00   0.00% sh


$ ps xuawww | egrep 73278=20=20
root       73278   0.0  0.2   8364   2340  -  D    03:01       0:02.25 find=
 -sx
/ /dev/null ( ! -fstype local ) -prune -o -type f ( -perm -u+x -or -perm -g=
+x
-or -perm -o+x ) ( -perm -u+s -or -perm -g+s ) -exec ls -liTd {} +
root       73667   0.0  0.0    412    312  8  R+   03:08       0:00.00 egrep
73278

$ ps xuawww | egrep 34548=20=20
cpansand   34548   0.0  1.5  43828  15348  4  DN+  15:29       0:00.10
/usr/perl5.26.1p/bin/perl t/test_command.t


Another example:

$ lsof  | egrep 12710
perl5.26. 12710   cpansand  cwd     VDIR=20=20=20=20=20=20=20=20=20=20=20=
=20=20=20=20=20=20=20=20
18446735278340920664    0 / -- home/cpansand/.cpan/build/2017112212/Fsdb-2.=
64-1
perl5.26. 12710   cpansand  rtd     VDIR=20=20=20=20=20=20=20=20=20=20=20=
=20=20=20=20=20=20=20=20=20
7507057954083258955    0 / (/dev/vtbd0p3)
perl5.26. 12710   cpansand    0u    VCHR               0,90=20=20=20=20=20=
=20=20=20=20=20=20=20=20
0t20972   90 /dev/pts/0
perl5.26. 12710   cpansand    1u    PIPE 0xfffff8000399f168=20=20=20=20=20=
=20=20=20=20=20=20=20=20=20=20=20=20=20=20
0      ->0xfffff8000399f000
perl5.26. 12710   cpansand    2u    PIPE 0xfffff8001c30da50=20=20=20=20=20=
=20=20=20=20=20=20=20=20=20=20=20=20=20=20
0      ->0xfffff8001c30d8e8
perl5.26. 12710   cpansand    3u    PIPE 0xfffff8001c30ebe0=20=20=20=20=20=
=20=20=20=20=20=20=20=20=20=20
16384      ->0xfffff8001c30ed48
perl5.26. 12710   cpansand    5u    PIPE 0xfffff8000399f168=20=20=20=20=20=
=20=20=20=20=20=20=20=20=20=20=20=20=20=20
0      ->0xfffff8000399f000
perl5.26. 12710   cpansand    6u    PIPE 0xfffff8001c30da50=20=20=20=20=20=
=20=20=20=20=20=20=20=20=20=20=20=20=20=20
0      ->0xfffff8001c30d8e8
perl5.26. 12710   cpansand    7u    PIPE 0xfffff8000399f168=20=20=20=20=20=
=20=20=20=20=20=20=20=20=20=20=20=20=20=20
0      ->0xfffff8000399f000
perl5.26. 12710   cpansand    8u    PIPE 0xfffff8001c30da50=20=20=20=20=20=
=20=20=20=20=20=20=20=20=20=20=20=20=20=20
0      ->0xfffff8001c30d8e8


$ top -b 300 | egrep ufs
12358 cpansand      1  20   10   268M 12980K ufs      0:04   0.00% perl
12710 cpansand      1  20   10 21412K 13808K ufs      0:00   0.00% perl5.26=
.0
56111 root          1  20    0  7752K  3180K ufs      0:00   0.00% bash
56077 root          1  52    0  6332K  2312K ufs      0:00   0.00% find
12719 cpansand      1  20   10  7060K  1472K ufs      0:00   0.00% sh


$ lsof | egrep '12710|12358' | egrep home
perl      12358   cpansand  cwd     VDIR=20=20=20=20=20=20=20=20=20=20=20=
=20=20=20=20=20=20=20=20
18446735278340920664    0 / -- home/cpansand/.cpan/build/2017112212/Fsdb-2.=
64-1
perl5.26. 12710   cpansand  cwd     VDIR=20=20=20=20=20=20=20=20=20=20=20=
=20=20=20=20=20=20=20=20
18446735278340920664    0 / -- home/cpansand/.cpan/build/2017112212/Fsdb-2.=
64-1


Perl often hangs in a directory during a build of a CPAN module. If I try to
switch to this directory, my current shell will hang too:

$ cd /home/cpansand/.cpan/build/2017112212/Fsdb-2.64-1
[hangs]

$ find /home/cpansand/
/home/cpansand/
/home/cpansand/.login_conf
/home/cpansand/.cshrc
/home/cpansand/.rhosts
/home/cpansand/.shrc
/home/cpansand/.mail_aliases
/home/cpansand/.profile
/home/cpansand/.mailrc
/home/cpansand/.login
/home/cpansand/.ssh
/home/cpansand/.ssh/authorized_keys
/home/cpansand/.ssh/known_hosts
/home/cpansand/.zshrc
/home/cpansand/bin
/home/cpansand/src
/home/cpansand/.cpan
/home/cpansand/.cpan/build
/home/cpansand/.cpan/build/2017112215
/home/cpansand/.cpan/build/2017112212
[ hangs]


$ top -b 300 | egrep ufs
12358 cpansand      1  20   10   268M 12980K ufs      0:04   0.00% perl
12710 cpansand      1  20   10 21412K 13808K ufs      0:00   0.00% perl5.26=
.0
56111 root          1  20    0  7752K  3180K ufs      0:00   0.00% bash
56077 root          1  52    0  6332K  2312K ufs      0:00   0.00% find
57160 root          1  20    0  6332K  1792K ufs      0:00   0.00% find
57169 root          1  20    0  6332K  1792K ufs      0:00   0.00% find
12719 cpansand      1  20   10  7060K  1472K ufs      0:00   0.00% sh


$ pstree
|-+=3D 00675 root /usr/sbin/cron -s
 | \-+- 55939 root cron: running job (cron)
 |   \-+=3D 55941 root /bin/sh - /usr/sbin/periodic daily
 |     \-+- 55943 root lockf -t 0 /var/run/periodic.daily.lock /bin/sh
/usr/sbin/periodic LOCKED daily
 |       \-+- 55944 root /bin/sh /usr/sbin/periodic LOCKED daily
 |         |-+- 55953 root /bin/sh /usr/sbin/periodic LOCKED daily
 |         | \-+- 56059 root /bin/sh /etc/periodic/daily/450.status-security
 |         |   \-+- 56060 root /bin/sh - /usr/sbin/periodic security
 |         |     \-+- 56062 root lockf -t 0 /var/run/periodic.security.lock
/bin/sh /usr/sbin/periodic LOCKED security
 |         |       \-+- 56063 root /bin/sh /usr/sbin/periodic LOCKED securi=
ty
 |         |         |-+- 56071 root /bin/sh /usr/sbin/periodic LOCKED secu=
rity
 |         |         | \-+- 56073 root /bin/sh -
/etc/periodic/security/100.chksetuid
 |         |         |   |--- 56077 root find -sx / /dev/null ( ! -fstype l=
ocal
) -prune -o -type f ( -perm -u+x -or -perm -g+x -or -perm -o+x ) ( -perm -u=
+s
-or -perm -g+s ) -exec ls -liTd {} +
 |         |         |   \-+- 56078 root /bin/sh -
/etc/periodic/security/100.chksetuid
 |         |         |     \--- 56080 root cat


     \--=3D 12358 cpansand /usr/perl5.26.0p/bin/perl5.26.0 (CPAN.pm test
J/JO/JOHNH/Fsdb-2.64.tar.gz) (perl)

 |-+- 12710 cpansand /usr/perl5.26.0p/bin/perl5.26.0 t/test_command.t
 | \--- 12719 cpansand sh -c blib/script/cgi_to_db TEST/cgi_to_db_ex.in
TEST/cgi_to_db_ex.in-2  < TEST/cgi_to_db_ex.in 2>&1



root       56077   0.0  0.2   6332   1936  -  D    03:01      0:00.02 find =
-sx
/ /dev/null ( ! -fstype local ) -prune -o -type f ( -perm -u+x -or -perm -g=
+x
-or -perm -o+x ) ( -perm -u+s -or -perm -g+s ) -exec ls -liTd {} +

cpansand   12710   0.0  1.4  21412 13780  0  DN+  14:11      0:00.10
/usr/perl5.26.0p/bin/perl5.26.0 t/test_command.t


$ ps -o wchan -auxgww | grep ufs=20=20=20=20
-        root          19   0.0  0.0      0    16  -  DL   Wed02      0:03.=
12
[bufspacedaemon]
ufs      root       56077   0.0  0.2   6332  1936  -  D    03:01      0:00.=
02
find -sx / /dev/null ( ! -fstype local ) -prune -o -type f ( -perm -u+x -or
-perm -g+x -or -perm -o+x ) ( -perm -u+s -or -perm -g+s ) -exec ls -liTd {}=
 +
ufs      cpansand   12358   0.0  0.7 274408  7280  0  DNs+ 14:11      0:03.=
89
/usr/perl5.26.0p/bin/perl5.26.0 (CPAN.pm test J/JO/JOHNH/Fsdb-2.64.tar.gz)
(perl)
ufs      cpansand   12710   0.0  1.4  21412 13780  0  DN+  14:11      0:00.=
10
/usr/perl5.26.0p/bin/perl5.26.0 t/test_command.t
ufs      cpansand   12719   0.0  0.1   7060  1444  0  DN+  14:11      0:00.=
00
sh -c blib/script/cgi_to_db TEST/cgi_to_db_ex.in TEST/cgi_to_db_ex.in-2  <
TEST/cgi_to_db_ex.in 2>&1
-        root       78756   0.0  0.0    412   304  7  R+   04:18      0:00.=
00
grep ufs
ufs      root       56111   0.0  0.3   7752  2688  8  Ds+  03:03      0:00.=
02
-bash (bash)
ufs      root       57160   0.0  0.1   6332  1416 10  D+   03:15      0:00.=
00
find .
ufs      root       57169   0.0  0.1   6332  1416 11  D+   03:15      0:00.=
00
find /home/cpansand/


The perl interpreter usually crashed while testing the CPAN module
J/JO/JOHNH/Fsdb-2.64.tar.gz in test t/test_command.t
But I think this is a red herring. The module just runs fine if you test on=
ly
this module. It must be a combination of other, earlier test which finally
bring Fsdb-2.64.tar.gz down.

We saw this error on FreeBSD 9.x, 10.3, 11.1 Release, 11-stable, 12-current=
.=20

My gut feeling is that we have a deadlock or forgotten lock of a directory/=
file
in the ufs code. Some kernel thread forgot to release a lock. Any other pro=
cess
which need to open this directory will hang, and soon or later (2-4 hours) =
the
machine crashes hard.

Everytime the machine crashed we had to run fsck twice, repair inodes and t=
he
superblock on the console manually. This is not fun.

We saw the errors at different cloud providers. I=E2=80=99m pretty sure it =
is not a bad
setup, it is a general problem of the FreeBSD kernel/ufs filesystem.

--=20
You are receiving this mail because:
You are the assignee for the bug.=



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?bug-224292-8>