Date:      Wed, 9 Jun 2010 11:28:52 -0400 (EDT)
From:      Rick Macklem <rmacklem@uoguelph.ca>
To:        Anders Nordby <anders@FreeBSD.org>
Cc:        freebsd-fs@FreeBSD.org
Subject:   Re: Odd network issues on ZFS based NFS server
Message-ID:  <Pine.GSO.4.63.1006091119410.23896@muncher.cs.uoguelph.ca>
In-Reply-To: <20100609122517.GA16231@fupp.net>
References:  <20100608083649.GA77452@fupp.net> <Pine.GSO.4.63.1006081946040.8742@muncher.cs.uoguelph.ca> <20100609122517.GA16231@fupp.net>



On Wed, 9 Jun 2010, Anders Nordby wrote:

>
> Thanks. The only thing that (temporarily) solves this issue so far is
> rebooting, which helps only for a day or so. I have tried different
> NICs, replacing the physical server, replacing cables, changing and
> resetting switch ports. But it did not help, so I think this is a
> software problem. I will try zio_use_uma = 0 I think, and then try to
> limit vfs.zfs.arc_max to 100 MB or so.
>

When you tried a different NIC, was it a different type (i.e. a different
chipset that uses a different device driver)? I suggested that not
because I thought the hardware was broken, but because I thought the
problem might be related to the network interface's device driver, and
switching to a different driver would isolate that possibility.

> On the ZFS+NFS server while having these issues:
>
> root@unixfile:~# netstat -m
> 1293/4602/5895 mbufs in use (current/cache/total)
> 1109/3619/4728/65536 mbuf clusters in use (current/cache/total/max)
> 257/1023 mbuf+clusters out of packet secondary zone in use (current/cache)
> 0/104/104/12800 4k (page size) jumbo clusters in use (current/cache/total/max)
> 0/0/0/6400 9k jumbo clusters in use (current/cache/total/max)
> 0/0/0/3200 16k jumbo clusters in use (current/cache/total/max)
> 2541K/8804K/11345K bytes allocated to network (current/cache/total)
> 0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
> 0/0/0 requests for jumbo clusters denied (4k/9k/16k)
> 0/0/0 sfbufs in use (current/peak/max)
> 0 requests for sfbufs denied
> 0 requests for sfbufs delayed
> 0 requests for I/O initiated by sendfile
> 0 calls to protocol drain routines
>
> Packet loss seen from my workstation:
>
> anders@noname:~$ ping unixfile
> PING unixfile.aftenposten.no (192.168.120.33) 56(84) bytes of data.
> 64 bytes from unixfile.aftenposten.no (192.168.120.33): icmp_seq=1 ttl=63 time=0.230 ms
> 64 bytes from unixfile.aftenposten.no (192.168.120.33): icmp_seq=3 ttl=63 time=0.262 ms
> 64 bytes from unixfile.aftenposten.no (192.168.120.33): icmp_seq=5 ttl=63 time=0.272 ms
> 64 bytes from unixfile.aftenposten.no (192.168.120.33): icmp_seq=6 ttl=63 time=0.203 ms
> 64 bytes from unixfile.aftenposten.no (192.168.120.33): icmp_seq=7 ttl=63 time=0.306 ms
> 64 bytes from unixfile.aftenposten.no (192.168.120.33): icmp_seq=9 ttl=63 time=0.309 ms
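
The gaps in icmp_seq in the ping output quoted above can be tallied with a
short script. This is only a minimal sketch in Python, assuming the
"icmp_seq=N" field format shown in the quote; the function name
missing_sequences is made up for illustration, and probes lost after the
last reply cannot be detected this way.

```python
import re

def missing_sequences(ping_output):
    """Return the icmp_seq numbers absent between the first and last reply seen."""
    seen = {int(n) for n in re.findall(r"icmp_seq=(\d+)", ping_output)}
    if not seen:
        return []
    return [n for n in range(min(seen), max(seen) + 1) if n not in seen]

# Replies rebuilt from the quoted output; only the icmp_seq fields matter here.
sample = "\n".join(
    "64 bytes from unixfile.aftenposten.no (192.168.120.33): icmp_seq=%d ttl=63" % n
    for n in (1, 3, 5, 6, 7, 9)
)

lost = missing_sequences(sample)
print(lost)  # [2, 4, 8]
```

For the quoted run that means sequence numbers 2, 4 and 8 went unanswered,
i.e. three of the first nine probes.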

Well, it doesn't seem to be mbuf exhaustion (I don't know what
"out of packet secondary zone" means, I'll have to look at that) and
if it doesn't handle pings it seems really hosed. Have you done a
"vmstat 5" + "ps axlH" (or similar) to try and see what it's doing?
("top" and "netstat" might also help?)
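
To put numbers on the mbuf reading, here is a minimal Python sketch that
pulls the cluster usage and the denied-request counters out of "netstat -m"
text. It assumes the line layout quoted above; the function name
mbuf_summary and the dictionary keys are invented for illustration and are
not part of any FreeBSD tool.

```python
import re

def mbuf_summary(netstat_m):
    """Extract mbuf cluster usage and denied-request totals from netstat -m text."""
    clusters = re.search(r"(\d+)/(\d+)/(\d+)/(\d+) mbuf clusters in use", netstat_m)
    denied = re.search(r"(\d+)/(\d+)/(\d+) requests for mbufs denied", netstat_m)
    if clusters is None or denied is None:
        raise ValueError("unrecognised netstat -m output")
    # Fields are current/cache/total/max; total and max are the 3rd and 4th.
    total, limit = int(clusters.group(3)), int(clusters.group(4))
    return {
        "cluster_total": total,
        "cluster_max": limit,
        "cluster_usage": total / limit,
        "denied": sum(int(g) for g in denied.groups()),
    }

# The two relevant lines from the quoted output.
sample = """\
1109/3619/4728/65536 mbuf clusters in use (current/cache/total/max)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
"""

summary = mbuf_summary(sample)
print(summary)
```

With the quoted figures, the clusters in use are roughly 7% of the maximum
and no requests were denied, which is why exhaustion looks unlikely.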

If you can figure out where it's spinning its wheels, that might
at least give us a hint w.r.t. the problem.

Good luck with it, rick



