Date:      Fri, 30 Mar 2001 11:17:24 +0100 (BST)
From:      Andrew Gordon <arg@arg1.demon.co.uk>
To:        freebsd-stable@freebsd.org
Subject:   NFS problems in 4.3-RC (maybe Vinum?)
Message-ID:  <Pine.BSF.4.21.0103301006100.1651-100000@server.arg.sj.co.uk>



On Wednesday, I upgraded an NFS server to 4.3-RC.  This machine has an IDE
drive with the system partitions and a Vinum RAID5 on 5 SCSI drives for
/home which is the main NFS export, plus a single SCSI drive (non-Vinum)
exported as /cd.  Soft updates are enabled everywhere except the Vinum
volume.

The server had been running 4.2-STABLE without problems
since mid-January (at which time there were some Vinum-related panics, but
nothing like the current behaviour).

Since the upgrade, it has failed 4 times:

 1)  Apparently stopped serving NFS to one client - tcpdump showed
     incoming UDP from that client but no replies.  Server rebooted
     cleanly and problem went away.

 2)  Stopped providing NFS service to any clients.  On reboot,
     "syncing disks... 5 1 1 1 1 1 1 1 1 1 1 1 1 1 giving up on 1 buffers"
     The automatic fsck on all the filesystems threw up one error on
     /home (INCORRECT BLOCK COUNT I=12634345 (2 should be 0)),
     suggesting that the un-flushed block was in the Vinum volume.

 3)  Stopped serving NFS.  This time I noticed on ps that the nfsd
     processes were all stuck:

  0  523    1   0   2  0   360  180 accept Is  ??  0:00.00 nfsd: master
  0  525  523   0  -2  0   352  172 getblk D   ??  0:06.24 nfsd: server
  0  526  523   0 -14  0   352  172 inode  D   ??  0:00.07 nfsd: server
  0  527  523   0 -14  0   352  172 inode  D   ??  0:00.01 nfsd: server
  0  528  523   0 -14  0   352  172 inode  D   ??  0:00.01 nfsd: server

     A reboot hung the machine: ctrl-T gave:

load: 0.00  cmd: reboot 62014 [inode] 0.00u 0.00s 0% 252k

     After a hard reset, the fsck gave three "incorrect block count"
     errors on /home (also one unref file in /var), but again came up
     without needing manual fsck.

 4)  As for 2), except that this time the fsck found nothing wrong
     on /home, but a load of unref files on /var.  A 'ps' before
     doing the reboot showed the nfsd processes stuck again:

  0  264     1   0   2  0  360  132 accept Is  ??  0:00.00 nfsd: master
  0  266   264   0 -14  0  352  124 inode  D   ??  0:06.15 nfsd: server
  0  267   264   0 -14  0  352  124 inode  D   ??  0:00.26 nfsd: server
  0  268   264   0 -14  0  352  124 inode  D   ??  0:00.02 nfsd: server
  0  269   264   0 -14  0  352  124 inode  D   ??  0:00.04 nfsd: server
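
A minimal sketch, assuming ps output in the column layout of the listings
above (UID PID PPID CPU PRI NI VSZ RSS WCHAN STAT TT TIME COMMAND), of how
the stuck nfsd processes and their wait channels could be picked out
automatically.  The function name stuck_nfsd and the SAMPLE constant are
made up for illustration; the sample data is copied from failure 3).

```python
# Column order assumed from the listings above:
# UID PID PPID CPU PRI NI VSZ RSS WCHAN STAT TT TIME COMMAND
SAMPLE = """\
  0  523    1   0   2  0   360  180 accept Is  ??  0:00.00 nfsd: master
  0  525  523   0  -2  0   352  172 getblk D   ??  0:06.24 nfsd: server
  0  526  523   0 -14  0   352  172 inode  D   ??  0:00.07 nfsd: server
  0  527  523   0 -14  0   352  172 inode  D   ??  0:00.01 nfsd: server
  0  528  523   0 -14  0   352  172 inode  D   ??  0:00.01 nfsd: server
"""


def stuck_nfsd(ps_text):
    """Return (pid, wchan) for each nfsd process in uninterruptible sleep."""
    stuck = []
    for line in ps_text.splitlines():
        # Split into at most 13 fields so the command keeps its spaces.
        fields = line.split(None, 12)
        if len(fields) < 13 or not fields[1].isdigit():
            continue  # blank line, header, or unexpected layout
        pid, wchan, stat, command = fields[1], fields[8], fields[9], fields[12]
        # A STAT beginning with "D" marks a process in disk (uninterruptible) wait.
        if command.startswith("nfsd") and stat.startswith("D"):
            stuck.append((int(pid), wchan))
    return stuck


if __name__ == "__main__":
    for pid, wchan in stuck_nfsd(SAMPLE):
        print(pid, wchan)
```

Run against the listing from failure 3), this reports the four server
processes and leaves out the master, which is merely idle in accept.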



The load on the machine would have been much lower than usual, since most
of the users are on holiday (which is why I did the upgrade in the first
place).  The only thing that has changed apart from the upgrade is that
the /cd filesystem, while present on the machine for some time and full of
data, would not have been used until this week, when various clients were
re-configured to use it.  However, it doesn't seem particularly involved:
one of the failures happened around 02:00, when all of the machines
mounting /cd were powered off.  At that time there would only have been me
(logged into another machine that mounts /home) and various cron jobs
active.

I say "maybe Vinum?" in the subject since the main NFS export is on a
Vinum RAID5, but there isn't really any evidence to suggest Vinum is to
blame.


I re-cvsuped this morning in case a fix had appeared; I haven't rebuilt
yet, but none of the diffs look at all relevant:

U contrib/sendmail/FREEBSD-upgrade
U lib/libc/gen/glob.c
U release/sysinstall/main.c
U sys/dev/vinum/vinumconfig.c
U sys/net/if.c
U sys/net/if_vlan.c
U sys/netinet/if_ether.c
U sys/netinet/ip_icmp.c
U sys/netinet/tcp_subr.c
U usr.bin/fetch/fetch.c
U usr.bin/netstat/if.c
U usr.sbin/ppp/bundle.c
U usr.sbin/ppp/ether.c
U usr.sbin/ppp/iface.c
U usr.sbin/ppp/iface.h
U usr.sbin/ppp/ppp.8
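
A minimal sketch, assuming update lines of the form "U <path>" as in the
list above, of how such a list could be narrowed to the source areas
worth reading first.  The prefixes in WATCHED_PREFIXES are an assumption
about which parts of the tree matter for this kind of hang, and the name
watched_updates is made up for illustration.  A match only says where to
look; it does not mean the diff itself is relevant.

```python
# Hypothetical list of source areas to watch; adjust to taste.
WATCHED_PREFIXES = ("sys/nfs/", "sys/ufs/", "sys/kern/vfs", "sys/dev/vinum/")

UPDATE_LOG = """\
U contrib/sendmail/FREEBSD-upgrade
U lib/libc/gen/glob.c
U release/sysinstall/main.c
U sys/dev/vinum/vinumconfig.c
U sys/net/if.c
U sys/net/if_vlan.c
U sys/netinet/if_ether.c
U sys/netinet/ip_icmp.c
U sys/netinet/tcp_subr.c
U usr.bin/fetch/fetch.c
U usr.bin/netstat/if.c
U usr.sbin/ppp/bundle.c
U usr.sbin/ppp/ether.c
U usr.sbin/ppp/iface.c
U usr.sbin/ppp/iface.h
U usr.sbin/ppp/ppp.8
"""


def watched_updates(log_text, prefixes=WATCHED_PREFIXES):
    """Return updated paths that fall under one of the watched prefixes."""
    paths = []
    for line in log_text.splitlines():
        parts = line.split()
        # Only plain "U <path>" lines are considered; anything else is skipped.
        if len(parts) == 2 and parts[0] == "U" and parts[1].startswith(prefixes):
            paths.append(parts[1])
    return paths


if __name__ == "__main__":
    for path in watched_updates(UPDATE_LOG):
        print(path)
```

With these assumed prefixes, the only file singled out from the list
above is sys/dev/vinum/vinumconfig.c.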


