Date:      Wed, 03 Feb 2010 10:48:57 +0100
From:      Attila Nagy <bra@fsn.hu>
To:        freebsd-fs@freebsd.org
Subject:   Machine stops for some seconds with ZFS
Message-ID:  <4B694689.2030704@fsn.hu>

Hello,

After a long time, I've switched back to ZFS on my desktop. It runs 
8-STABLE/amd64 with two SATA disks and a USB pendrive.
One partition from each disk, encrypted with GELI, is used for the 
mirrored zpool, and the pendrive serves as L2ARC:
    NAME            STATE     READ WRITE CKSUM
    data            ONLINE       0     0     0
      mirror        ONLINE       0     0     0
        ad0s1d.eli  ONLINE       0     0     0
        ad1s1d.eli  ONLINE       0     0     0
    cache
      da0           ONLINE       0     0     0

Today, after 12 days of uptime, the machine froze. I could ping it 
from a different machine and could even open a telnet connection to its 
ssh port, but I couldn't get the ssh banner.

Now I'm building a 9-CURRENT kernel and world to see whether the same 
problem persists with it, and during the build I noticed something 
strange.
I build with -j4 (the machine has one dual-core CPU), so the fans are 
screaming during the process. But every few minutes (I couldn't 
find any pattern in it) the machine goes completely silent (even 
quieter than when it is idle), and everything halts.
During this, the top running on the machine can still refresh itself, and I 
can type in pass-through ssh connections (that is, I use the machine in 
question to reach other machines via ssh), but I can't open new ssh 
connections to it, and can't start anything new (for example from an 
open shell).
ping keeps running without interruption, and top shows the following:

last pid: 36503;  load averages:  1.59,  3.04,  3.01    up 0+00:49:53  10:32:10
97 processes:  1 running, 96 sleeping
CPU:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
Mem: 218M Active, 24M Inact, 639M Wired, 40M Cache, 6208K Buf, 1022M Free
Swap: 4096M Total, 4096M Free

  PID USERNAME    THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
 1342 root          1  44    0  3204K   620K select  0   0:02  0.00% make
 1424 root          1  44    0  3204K  1036K select  0   0:01  0.00% make
 1280 root          1  44    0 12540K  1900K select  0   0:01  0.00% hald-addon-storage
 1234 haldaemon     1  44    0 24116K  4464K select  0   0:01  0.00% hald
93600 root          1  44    0  3204K  1028K select  0   0:00  0.00% make
 1260 root          1  44    0 19704K  2688K select  0   0:00  0.00% hald-addon-mouse-sy
15142 bra           1  44    0  9332K  2864K CPU0    0   0:00  0.00% top
 1263 root          1  44    0 12540K  1896K cgticb  0   0:00  0.00% hald-addon-storage
94415 bra           1  44    0 37944K  4992K select  1   0:00  0.00% sshd
35837 root          1  44    0  5252K  2424K select  1   0:00  0.00% make
95361 bra           1  44    0 37944K  4992K select  1   0:00  0.00% sshd
35973 root          1  44    0  3204K  1772K select  0   0:00  0.00% make
  608 root          1  44    0  6892K  1436K select  1   0:00  0.00% syslogd
96928 root          1  44    0  3204K   728K select  0   0:00  0.00% make
94369 root          1  51    0 37944K  4584K sbwait  0   0:00  0.00% sshd
82631 root          1  50    0 37944K  4584K sbwait  0   0:00  0.00% sshd
16304 root          1  44    0 37944K  4576K zio->i  1   0:00  0.00% sshd
  951 _ntp          1  44    0  6876K  1692K select  0   0:00  0.00% ntpd
 1238 root          1  76    0 16768K  2372K select  0   0:00  0.00% hald-runner
 4916 root          1  44    0  3204K   728K select  1   0:00  0.00% make
95338 root          1  49    0 37944K  4584K sbwait  1   0:00  0.00% sshd
 1259 root          1  44    0 10280K  2712K pause   1   0:00  0.00% csh
33357 bra           1  44    0 21596K  4004K select  0   0:00  0.00% ssh
16405 bra           1  44    0 37944K  5012K zio->i  0   0:00  0.00% sshd
 1044 root          1  44    0  9104K  1796K kqread  0   0:00  0.00% master
34765 root          1  76    0  8260K  1764K wait    1   0:00  0.00% sh
82685 bra           1  44    0 37944K  4960K select  1   0:00  0.00% sshd
 1065 postfix       1  44    0  9100K  1872K kqread  0   0:00  0.00% qmgr
 1237 root         17  44    0 27460K  4124K waitvt  0   0:00  0.00% console-kit-daemon
95362 bra           1  44    0 10216K  2612K ttyin   0   0:00  0.00% bash
34764 root          1  44    0  3204K   852K select  0   0:00  0.00% make
 1222 root          1  49    0 21672K  1896K wait    0   0:00  0.00% login
35728 root          1  44    0  3204K   860K select  0   0:00  0.00% make
 1064 postfix       1  44    0  9104K  1772K zio->i  1   0:00  0.00% pickup
82696 bra           1  44    0 10216K  2596K wait    0   0:00  0.00% bash
94417 bra           1  44    0 10216K  2596K wait    1   0:00  0.00% bash
35455 root          1  44    0  3204K   744K select  0   0:00  0.00% make
35774 root          1  44    0  3204K   728K select  1   0:00  0.00% make
16409 bra           1  44    0 10216K  2592K ttyin   0   0:00  0.00% bash
 1155 root          1  44    0  7948K  1604K nanslp  0   0:00  0.00% cron
 1077 messagebus    1  53    0  8092K  2060K select  0   0:00  0.00% dbus-daemon
 1149 root          1  44    0 26012K  3960K select  1   0:00  0.00% sshd
35729 root          1  76    0  8260K  1760K wait    0   0:00  0.00% sh
 4921 root          1  57    0  8260K  1748K wait    0   0:00  0.00% sh
  825 root          1  76    0 39212K  2372K lockf   1   0:00  0.00% saslauthd
35460 root          1  76    0  8260K  1748K wait    0   0:00  0.00% sh
34761 root          1  48    0  8260K  1740K wait    1   0:00  0.00% sh
96923 root          1  50    0  8260K  1740K wait    0   0:00  0.00% sh
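
In the snapshot above, most processes sleep in `select` or `wait`, but a few (two sshd processes and postfix's pickup) are in a state that top shows as `zio->i`, which presumably is a ZFS I/O wait channel truncated to the column width. A minimal sketch for tallying a saved top snapshot by STATE column, assuming the 12-column layout shown here (PID ... STATE C TIME WCPU COMMAND, with wrapped command names rejoined onto their row); the function name `tally_states` is hypothetical:

```python
from collections import Counter, defaultdict


def tally_states(top_text):
    """Count processes per STATE column in a saved top(1) snapshot.

    Assumes the column layout from this report:
    PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND
    """
    counts = Counter()
    by_state = defaultdict(list)
    for line in top_text.splitlines():
        fields = line.split()
        # Process rows start with a numeric PID and have at least 12 columns;
        # this skips the summary lines and the column header.
        if len(fields) < 12 or not fields[0].isdigit():
            continue
        pid, state, command = int(fields[0]), fields[7], fields[11]
        counts[state] += 1
        by_state[state].append((pid, command))
    return counts, by_state
```

Comparing `by_state["zio->i"]` across several snapshots taken during a halt would show whether the same processes stay blocked on that wait channel.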


As you can see, top reports that the machine is 100% idle while a 
make -j4 buildworld is running. This lasts for a few seconds (10-20), then 
everything goes back to normal: the fans start to scream, the build 
continues, and I can use the machine again.
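To put numbers on these halts, one option is a small heartbeat logger that writes and fsyncs a timestamp to a file on the pool once per second and reports gaps much longer than the interval. This is a sketch under assumptions: the script name, the file path and the 2-second slack are hypothetical, and it only detects stalls that also block writes to the chosen file system.

```python
import os
import sys
import time


def find_stalls(stamps, interval, slack):
    """Return (start, gap) pairs where two consecutive heartbeats are more
    than `interval + slack` seconds apart."""
    stalls = []
    for prev, cur in zip(stamps, stamps[1:]):
        gap = cur - prev
        if gap > interval + slack:
            stalls.append((prev, gap))
    return stalls


def heartbeat(path, interval=1.0, count=600):
    """Append a wall-clock timestamp to `path` every `interval` seconds.

    Each write is fsync'ed, so if I/O to the file system holding `path`
    blocks, the loop blocks with it and the gap shows up in the returned
    monotonic timestamps."""
    stamps = []
    with open(path, "a") as f:
        for _ in range(count):
            stamps.append(time.monotonic())
            f.write(f"{time.time():.3f}\n")
            f.flush()
            os.fsync(f.fileno())
            time.sleep(interval)
    return stamps


if __name__ == "__main__" and len(sys.argv) == 2:
    # Hypothetical usage: python3 stallwatch.py /data/heartbeat.log
    stamps = heartbeat(sys.argv[1])
    for start, gap in find_stalls(stamps, interval=1.0, slack=2.0):
        print(f"stall of {gap:.1f}s starting {start - stamps[0]:.1f}s into the run")
```

The wall-clock values written to the file can be lined up with syslog entries, while the gap detection uses the monotonic clock so that ntpd adjustments don't show up as false stalls.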
These occasional halts are new to me (though I've only just switched to 
ZFS on my desktop; on a server they're harder to notice if you don't use 
it for interactive sessions), but I have seen the final freeze on more 
than one server.
How can I help debug both this and the final freeze?

Thanks,


