Date:      Wed, 02 Jul 2014 11:11:53 -0400
From:      Bob Healey <healer@rpi.edu>
To:        freebsd-stable@freebsd.org
Subject:   Interactions with mxge, pf, nfsd, and the kernel
Message-ID:  <53B42139.302@rpi.edu>

Hello.

I've been wrestling with this on and off for a few months now.  I have
an assortment of systems (Dell PowerEdge R515 and R610, and IBM
x3630 M3) with 10-gigabit Myricom Ethernet cards acting as NFS servers
to Linux HPC compute clusters (12-36 nodes, 384-480 cores) connected
via gigabit Ethernet.  They are also connected to the outside world via
onboard bce (Dell) or igb (IBM).  After a variable length of time, a
host loses all network access.  When I connect via the console, the
machine is usually fully responsive.  A reboot clears the problem, but
I have yet to find any sysctls or loader.conf tunables that clear it
and keep it away.  PF is in use to restrict access to the host to a
pair of public /24s and to 10/8.  If there is a way to make that
restriction in ZFS's sharenfs property, I'd be happy to change, but I
really don't like leaving NFS open to the university's quartet of /16s,
so PF it is.  The vlan2 interface has mxge0 as its parent.

Thanks for any help.

This host is getting ready to crash soon, based on netstat.
root@husker:~ # netstat -i
Name    Mtu Network       Address                Ipkts Ierrs Idrop      Opkts Oerrs  Coll
mxge0  9000 <Link#1>      00:60:dd:44:d2:0a    6358280   262     0    4061637     0     0
mxge0  9000 fe80::260:ddf fe80::260:ddff:fe          0     -     -          2     -     -
bce0   1500 <Link#2>      08:9e:01:50:a1:ac     276391     0     0          0     0     0
bce0   1500 fe80::a9e:1ff fe80::a9e:1ff:fe5          0     -     -          3     -     -
bce1   1500 <Link#3>      08:9e:01:50:a1:ad 2229709391 16921     0 1182942116     0     0
bce1   1500 128.113.12.0  husker            2226254093     -     - 1183962005     -     -
bce1   1500 fe80::a9e:1ff fe80::a9e:1ff:fe5          0     -     -          3     -     -
lo0   16384 <Link#4>                              2030     0     0       2030     0     0
lo0   16384 localhost     ::1                        4     -     -          4     -     -
lo0   16384 fe80::1%lo0   fe80::1                    0     -     -          0     -     -
lo0   16384 your-net      localhost               2026     -     -       2026     -     -
vlan2  9000 <Link#5>      00:60:dd:44:d2:0a    4387250     0     0    3060586     0     0
vlan2  9000 10.2.3.0      husker.galactica.    4370309     -     -    3963931     -     -
vlan2  9000 fe80::260:ddf fe80::260:ddff:fe          0     -     -          2     -     -
vlan2  9000 <Link#6>      00:60:dd:44:d2:0a    1971034     0     0    1001061     0     0
vlan2  9000 10.2.4.0      husker.enterprise    1700742     -     -    1961891     -     -
vlan2  9000 fe80::260:ddf fe80::260:ddff:fe          0     -     -          4     -     -
root@husker:~ # netstat -im
6157/3233/9390 mbufs in use (current/cache/total)
4081/1883/5964/1018800 mbuf clusters in use (current/cache/total/max)
4080/795 mbuf+clusters out of packet secondary zone in use (current/cache)
0/5/5/509399 4k (page size) jumbo clusters in use (current/cache/total/max)
512/23/535/150933 9k jumbo clusters in use (current/cache/total/max)
0/0/0/84899 16k jumbo clusters in use (current/cache/total/max)
14309K/4801K/19110K bytes allocated to network (current/cache/total)
10/1883/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
2/1736/0 requests for jumbo clusters denied (4k/9k/16k)
0 requests for sfbufs denied
0 requests for sfbufs delayed
0 requests for I/O initiated by sendfile
root@husker:~ # uptime
11:07AM  up 23 days, 19:27, 1 user, load averages: 0.14, 0.17, 0.13
root@husker:~ # sysctl -a | grep nmb
kern.ipc.nmbclusters: 1018800
kern.ipc.nmbjumbop: 509399
kern.ipc.nmbjumbo9: 452799
kern.ipc.nmbjumbo16: 339596
kern.ipc.nmbufs: 6520320
root@husker:~ # cat /boot/loader.conf
zfs_load="YES"
amdtemp_load="YES"
if_mxge_load="YES"
mxge_ethp_z8e_load="YES"
mxge_eth_z8e_load="YES"
mxge_rss_ethp_z8e_load="YES"
mxge_rss_eth_z8e_load="YES"
vfs.zfs.arc_max="12288M"
root@husker:~ # cat /var/run/dmesg.boot | head -16
Copyright (c) 1992-2014 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
         The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 10.0-RELEASE-p4 #0: Tue Jun  3 13:14:57 UTC 2014
     root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC amd64
FreeBSD clang version 3.3 (tags/RELEASE_33/final 183502) 20130610
CPU: AMD Opteron(tm) Processor 4122 (2200.07-MHz K8-class CPU)
   Origin = "AuthenticAMD"  Id = 0x100f80  Family = 0x10  Model = 0x8  Stepping = 0
   Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
   Features2=0x802009<SSE3,MON,CX16,POPCNT>
   AMD Features=0xee500800<SYSCALL,NX,MMX+,FFXSR,Page1GB,RDTSCP,LM,3DNow!+,3DNow!>
   AMD Features2=0x837ff<LAHF,CMP,SVM,ExtAPIC,CR8,ABM,SSE4A,MAS,Prefetch,OSVW,IBS,SKINIT,WDT,NodeId>
   TSC: P-state invariant
real memory  = 17179869184 (16384 MB)
avail memory = 16588054528 (15819 MB)
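
The "denied" counters in the mbuf statistics above appear to be the figures to watch between reboots. Below is a minimal Python sketch that pulls every nonzero "denied" counter out of that output, assuming the line format shown above. The names parse_denied and nonzero_denials are hypothetical, and nothing here confirms that mbuf denials are the cause of the hang.

```python
import re

# Matches lines such as:
#   10/1883/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
#   0 requests for sfbufs denied
# "delayed" lines are deliberately not matched.
DENIED_RE = re.compile(r"^([\d/]+) requests for (.+?) denied(?: \(([^)]*)\))?$")


def parse_denied(netstat_m_output):
    """Return {(resource, label): count} for every 'denied' line."""
    counters = {}
    for line in netstat_m_output.splitlines():
        m = DENIED_RE.match(line.strip())
        if not m:
            continue
        values = [int(v) for v in m.group(1).split("/")]
        # Lines without a parenthesised legend get a single empty label.
        labels = m.group(3).split("/") if m.group(3) else [""]
        for label, value in zip(labels, values):
            counters[(m.group(2), label)] = value
    return counters


def nonzero_denials(netstat_m_output):
    """Keep only the counters that have recorded at least one denial."""
    return {k: v for k, v in parse_denied(netstat_m_output).items() if v > 0}


# Sample text copied from the output earlier in this message.
SAMPLE = """\
10/1883/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
2/1736/0 requests for jumbo clusters denied (4k/9k/16k)
0 requests for sfbufs denied
"""

if __name__ == "__main__":
    for (resource, label), count in sorted(nonzero_denials(SAMPLE).items()):
        print(f"{resource} [{label}]: {count} denied")
```

On a live host the same functions could be fed the text of `netstat -m` captured with `subprocess.run(["netstat", "-m"], capture_output=True, text=True).stdout`, run periodically so that growth in the counters is logged before the network drops.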


-- 
Bob Healey
Systems Administrator
Biocomputation and Bioinformatics Constellation
and Molecularium
healer@rpi.edu
(518) 276-4407



