Date:      Mon, 28 Feb 2011 04:02:25 -0500
From:      Michael Powell <nightrecon@hotmail.com>
To:        freebsd-questions@freebsd.org
Subject:   Re: server  drop network connections
Message-ID:  <ikfo6d$slr$1@dough.gmane.org>
References:  <410789.77875.qm@web120601.mail.ne1.yahoo.com>

Lep Names wrote:

> Hello. I have a strange problem: every week my server drops all network
> connections (ssh, ping, etc.), but it keeps running. Tech support can
> access it over KVM. After a reboot everything works fine for a week. It
> seems to me that the trouble is with mbufs.
> 
> FreeBSD  8.1-RELEASE-p2
> 
> sysctl.conf:
> security.bsd.see_other_uids=0
> kern.ipc.somaxconn=2048
> net.inet.icmp.drop_redirect=1
> #net.inet.icmp.log_redirect=1
> net.inet.tcp.blackhole=2
> net.inet.tcp.drop_synfin=1
> net.inet.tcp.sendspace=131072
> net.inet.tcp.recvspace=65536
> net.inet.udp.recvspace=32768
> kern.fallback_elf_brand=-1
> net.inet.ip.maxfragpackets=1024
> kern.sync_on_panic=1
> vfs.ufs.dirhash_maxmem=100000000
> kern.polling.burst_max=1000
> kern.polling.each_burst=1000
> kern.polling.reg_frac=100
> kern.polling.user_frac=1
> kern.maxvnodes=256000
> net.inet.ip.intr_queue_maxlen=256
> #dev.em.0.rx_processing_limit=1000
> #dev.em.1.rx_processing_limit=1000
> net.inet.tcp.recvbuf_auto=0
> net.inet.tcp.sendbuf_auto=0
> net.inet.tcp.tso=0
> net.isr.direct=1
> net.route.netisr_maxqlen=1024
> #net.inet.flowtable.nmbflows=8192
> kern.ipc.nmbclusters=65536
> net.inet.ip.portrange.first=1024
> net.inet.ip.portrange.hifirst=1024
> net.inet.tcp.hostcache.expire=1200
> net.inet.tcp.fast_finwait2_recycle=1
> net.inet.tcp.finwait2_timeout=3000
> net.inet.tcp.keepinit=5000
> net.inet.tcp.nolocaltimewait=1
> net.inet.tcp.maxtcptw=65536
> net.inet.tcp.msl=3000
> kern.coredump=1
> kern.random.sys.harvest.interrupt=0
> kern.random.sys.harvest.ethernet=0
> net.inet.udp.blackhole=1
> 
> netstat -m
> 868/1052/1920 mbufs in use (current/cache/total)
> 715/923/1638/65536 mbuf clusters in use (current/cache/total/max)
> 709/443 mbuf+clusters out of packet secondary zone in use (current/cache)
> 0/35/35/12800 4k (page size) jumbo clusters in use (current/cache/total/max)
> 0/0/0/6400 9k jumbo clusters in use (current/cache/total/max)
> 0/0/0/3200 16k jumbo clusters in use (current/cache/total/max)
> 1647K/2249K/3896K bytes allocated to network (current/cache/total)
> 0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
> 0/0/0 requests for jumbo clusters denied (4k/9k/16k)
> 139/313/6656 sfbufs in use (current/peak/max)
> 0 requests for sfbufs denied
> 0 requests for sfbufs delayed
> 4031 requests for I/O initiated by sendfile
> 0 calls to protocol drain routines
> 
> It seems to me that I must enlarge the 1920 value, but I do not know how.
>          Thanks

You may wish to try adding these to /etc/sysctl.conf (or changing them if 
they are already present) and rebooting:

kern.ipc.nmbclusters=32768
kern.ipc.somaxconn=4096               
kern.ipc.shmmax=67108864
kern.ipc.shmall=32768
kern.ipc.maxsockbuf=4194304

I see your nmbclusters is already larger than mine. I don't remember the 
exact relationship off the top of my head (it's in the docs), but there is 
a ratio relationship between nmbclusters and some of the other parameters. 
IIRC, increasing nmbclusters means increasing those others in proportion as 
well.
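
For a rough sense of headroom before changing kern.ipc.nmbclusters, the 
cluster line of netstat -m can be turned into a percentage. This is only a 
sketch: cluster_usage is a made-up helper name, not a system tool, and it 
assumes the "current/cache/total/max" layout shown in the quoted output. 
Note that the 1920 in the mbufs line is the total currently allocated 
(current/cache/total), not a limit.

```sh
#!/bin/sh
# cluster_usage (hypothetical helper): read `netstat -m` output on stdin and
# print total allocated mbuf clusters as a whole-number percentage of the max.
cluster_usage() {
    awk '/mbuf clusters in use/ {
        split($1, f, "/")                  # f[3] = total, f[4] = max
        if (f[4] > 0) printf "%d\n", (f[3] * 100) / f[4]
    }'
}

# Typical use on the server itself:
#   netstat -m | cluster_usage
```

Fed the cluster line from the quoted output (1638 of 65536), this prints 2, 
which suggests the cluster limit was not close to being reached at the time 
that snapshot was taken.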

You might also consider these:

net.inet.tcp.recvbuf_max=16777216
net.inet.tcp.recvspace=131072        
net.inet.tcp.sendbuf_max=16777216 
net.inet.tcp.sendspace=131072  
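
Since the quoted sysctl.conf already sets several of these names 
(kern.ipc.somaxconn, net.inet.tcp.recvspace, and so on), appending the new 
lines would leave duplicates behind. A small check like the one below can 
list them so the old entries can be edited instead. dup_sysctl_keys is a 
hypothetical helper name, and the sketch assumes plain name=value lines 
with # comments. As far as I know the file is applied top to bottom, so a 
later duplicate would win, but it is cleaner not to rely on that.

```sh
#!/bin/sh
# dup_sysctl_keys (hypothetical helper): print each sysctl name that is
# assigned more than once in the file given as $1.
dup_sysctl_keys() {
    awk -F= '
        /^[ \t]*#/ || !/=/ { next }        # skip comments and lines without "="
        {
            key = $1
            gsub(/[ \t]/, "", key)         # drop stray whitespace around the name
            if (seen[key]++ == 1) print key  # report on the second occurrence only
        }
    ' "$1"
}

# Typical use:
#   dup_sysctl_keys /etc/sysctl.conf
```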

If these kinds of changes only shift the interval at which the problem 
surfaces, you might want to search the lists (-stable and -current, as well 
as the bug tracker) for similar troubles experienced by others. I believe I 
have seen a couple of reports which sound similar to what you're describing.
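
To tell whether tuning is really moving the failure or just coinciding with 
it, it may help to record the "denied" counters from netstat -m over the 
week. The sketch below sums them; denied_total is a hypothetical helper 
name, and the line format is assumed to match the quoted netstat -m output.

```sh
#!/bin/sh
# denied_total (hypothetical helper): read `netstat -m` output on stdin and
# print the sum of every "requests for ... denied" counter.
denied_total() {
    awk '/requests for .* denied/ {
        n = split($1, f, "/")              # e.g. "0/0/0" -> three counters
        for (i = 1; i <= n; i++) total += f[i]
    }
    END { print total + 0 }'
}

# Typical use, e.g. from a cron job every few minutes:
#   echo "$(date) denied=$(netstat -m | denied_total)" >> /var/log/mbuf-denied.log
```

The log path in the comment is only an example. A count that stays at zero 
right up to the hang would point away from mbuf exhaustion and toward the 
NIC or driver.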

If you locate such a report, pay particular attention to the specific NIC 
hardware and driver combination. If it is exactly the same as yours and a 
patch has been created which resolves the problem, check whether it has been 
MFC'd to -stable. In such a case (where you have _exactly_ the same problem) 
a possible solution is to upgrade your box to -stable. I don't necessarily 
recommend blindly 'trying' -stable on a production box just to see what 
happens; it is possible to create new problems as a result. But if a fix 
exists for exactly your problem, that's where you'll likely find it.
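
When comparing against other reports, the driver name is simply the 
interface name without its unit number. A minimal sketch follows; driver_of 
is a hypothetical helper name. The commented-out dev.em.* lines in the 
quoted sysctl.conf hint at the em driver, but that is an assumption worth 
confirming on the box.

```sh
#!/bin/sh
# driver_of (hypothetical helper): strip the trailing unit number from an
# interface name, e.g. em0 -> em, bge1 -> bge.
driver_of() {
    printf '%s\n' "$1" | sed 's/[0-9]*$//'
}

# Typical use on the server (em0 is an assumed interface name):
#   for ifn in $(ifconfig -l); do echo "$ifn: $(driver_of "$ifn")"; done
#   pciconf -lv | grep -A4 '^em0@'     # vendor/device strings for that NIC
```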

-Mike
      





