Date:      Thu, 30 May 2002 18:58:37 +0200
From:      Andre Oppermann <oppermann@pipeline.ch>
To:        freebsd-net@freebsd.org
Cc:        freebsd-stable@freebsd.org
Subject:   FreeBSD kernel routing table, need statistics, please install this patch
Message-ID:  <3CF65A3C.915493B8@pipeline.ch>

This is a multi-part message in MIME format.
--------------FF908FDE6605971918FB41DB
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

Hi all,

while working on a design overhaul of the kernel routing table I was
inspecting the rt_metrics stuff a little more closely. Then I checked
some busier web servers to see how much effect the rt_metrics
caching actually has. The result was not very clear. Some connections
never got cached.

The attached patch collects some statistics about the usage of the
rt_metrics on a system. Specifically, it counts how many times a new
tcp session has been established, how many of those sessions found
useful cached rt_metrics, how many times the metrics were updated,
and how many times the update was skipped. The counters look like
this (on a freshly booted system):

 # sysctl -a | grep tcp.rmx
 net.inet.tcp.rmxcachelookup: 3
 net.inet.tcp.rmxcachehit: 1
 net.inet.tcp.rmxcacheupdate: 2
 net.inet.tcp.rmxcachenoupdate: 0
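
As a rough illustration of how the first two counters could be read
together, here is a small userland sketch that turns them into a hit
ratio. It is not part of the patch; the function names are made up
for this example, and the clamp is only there because the two values
are read separately and may be slightly out of step.

```c
#include <stdio.h>

/*
 * Hypothetical helper (not in the patch): percentage of lookups that
 * found cached metrics. Returns 0 when there were no lookups yet.
 */
double
rmx_hit_percent(unsigned long lookups, unsigned long hits)
{
	if (lookups == 0)
		return (0.0);
	if (hits > lookups)	/* counters were sampled at different times */
		hits = lookups;
	return (100.0 * (double)hits / (double)lookups);
}

/* Print a one-line summary of the lookup and hit counters. */
void
rmx_print_summary(unsigned long lookups, unsigned long hits)
{
	printf("rmx cache: %lu lookups, %lu hits (%.1f%%)\n",
	    lookups, hits, rmx_hit_percent(lookups, hits));
}
```

With the sample values above (3 lookups, 1 hit) the ratio comes out
at roughly one third.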

Please apply the attached patch (against 4-STABLE) and after a couple
of hours/days please send me the output of:

 # uname -a
 # sysctl -a | grep tcp.rmx
 # netstat -m
 # netstat -ran | wc -l
 # description of the main usage of your system (webserver/workstation/whatever)


I don't want to nuke the rt_metrics caching, but I'd like to see how
much it helps overall. Then, because it's TCP-specific, I'd like to
move it out of the main routing table (only MTU remains) and transform
it into a hash table. The rt_metrics are host-specific, so they only
ever get used on host routes and waste an enormous amount of space in
the main routing table.
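
To make the hash table idea a bit more concrete, here is a minimal
userland sketch of a per-host metrics table keyed by IPv4 address.
Everything in it (the hc_ names, the bucket count, the hash function,
the choice of fields) is an assumption for illustration only and not
a finished design; locking, expiry and IPv6 are left out.

```c
#include <stdint.h>
#include <stdlib.h>

#define HC_BUCKETS	256	/* hypothetical size, a power of two */

/* Hypothetical per-host entry holding the TCP-specific metrics. */
struct hc_entry {
	struct hc_entry	*next;		/* chain within one bucket */
	uint32_t	 addr;		/* IPv4 address of the remote host */
	unsigned long	 rtt;		/* cached smoothed RTT */
	unsigned long	 rttvar;	/* cached RTT variance */
	unsigned long	 ssthresh;	/* cached slow start threshold */
};

struct hc_table {
	struct hc_entry	*bucket[HC_BUCKETS];
};

/* Multiplicative hash; any reasonable hash function would do here. */
unsigned int
hc_hash(uint32_t addr)
{
	return (((uint32_t)(addr * 2654435761u)) >> 24) & (HC_BUCKETS - 1);
}

/* Return the entry for addr, or NULL if the host is not cached. */
struct hc_entry *
hc_lookup(struct hc_table *t, uint32_t addr)
{
	struct hc_entry *e;

	for (e = t->bucket[hc_hash(addr)]; e != NULL; e = e->next)
		if (e->addr == addr)
			return (e);
	return (NULL);
}

/* Return the entry for addr, creating a zeroed one if needed. */
struct hc_entry *
hc_get(struct hc_table *t, uint32_t addr)
{
	struct hc_entry *e = hc_lookup(t, addr);
	unsigned int h;

	if (e != NULL)
		return (e);
	e = calloc(1, sizeof(*e));
	if (e == NULL)
		return (NULL);
	h = hc_hash(addr);
	e->addr = addr;
	e->next = t->bucket[h];		/* insert at the head of the chain */
	t->bucket[h] = e;
	return (e);
}
```

The point of the sketch is only that such a table grows with the
number of hosts actually talked to, instead of adding the metrics
fields to every entry in the main routing table.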

Also the strategy of the rt_metrics caching is probably inappropriate
for today's world with many web servers. The problem is that the
rt_metrics only get updated when a tcp session to/from that host closes
and a sufficient number of packets have been exchanged to make a mostly
accurate measurement of those parameters. Unfortunately, today's web
browsers open a number of connections in very rapid succession, so
there is no chance of having any cached values for the connections
after the first unless one of them has already closed. The benefit is
only seen when the user loads the next page and opens new tcp sessions.
Even that benefit is reduced by HTTP/1.1 keepalive and pipelining,
since sessions are no longer closed. A possible solution is to update
the rt_metrics cache for the first time after a sufficient number of
packets have been exchanged to make a mostly accurate measurement, and
then to update it after every n packets thereafter.
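
A minimal sketch of that update schedule, as plain C outside the
kernel: the first update fires once enough packets have been seen,
later ones every n packets. The names and both threshold values are
placeholders I picked for the example; good values are exactly what
the statistics should help to find.

```c
/* Hypothetical thresholds, chosen only for illustration. */
#define RMX_MIN_PKTS		16	/* packets before the first update */
#define RMX_UPDATE_EVERY	64	/* the "n" between later updates */

/* Hypothetical per-connection bookkeeping. */
struct rmx_conn {
	unsigned long	pkts;		/* packets exchanged so far */
	unsigned long	next_update;	/* packet count of the next update */
};

/* Start a new connection: first update is due after RMX_MIN_PKTS. */
void
rmx_conn_init(struct rmx_conn *c)
{
	c->pkts = 0;
	c->next_update = RMX_MIN_PKTS;
}

/*
 * Account for one exchanged packet. Returns 1 if the cached metrics
 * should be refreshed now, 0 otherwise.
 */
int
rmx_conn_packet(struct rmx_conn *c)
{
	c->pkts++;
	if (c->pkts < c->next_update)
		return (0);
	c->next_update = c->pkts + RMX_UPDATE_EVERY;
	return (1);
}
```

With this schedule a long-lived keepalive session feeds the cache
while it is still open, so parallel and later connections to the same
host can already profit from it.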

The statistics and numbers collected here will greatly help to
determine the best way to adjust the rt_metrics caching to be most
effective.

The patch applies against /usr/src/sys/netinet/tcp_[input.c|subr.c].

"Profile, don't speculate"

Many thanks for your cooperation!
-- 
Andre
--------------FF908FDE6605971918FB41DB
Content-Type: text/plain; charset=us-ascii;
 name="tcp_input.c.patch"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
 filename="tcp_input.c.patch"

--- tcp_input.c	Sun Apr 28 07:40:26 2002
+++ tcp_input.c.new	Thu May 30 18:18:20 2002
@@ -31,7 +31,7 @@
  * SUCH DAMAGE.
  *
  *	@(#)tcp_input.c	8.12 (Berkeley) 5/24/95
- * $FreeBSD: src/sys/netinet/tcp_input.c,v 1.107.2.23 2002/04/28 05:40:26 suz Exp $
+ * $FreeBSD: src/sys/netinet/tcp_input.c,v 1.107.2.23 2002/05/30 16:12:00 andre Exp $
  */
 
 #include "opt_ipfw.h"		/* for ipfw_fwd		*/
@@ -126,6 +126,16 @@
     &drop_synfin, 0, "Drop TCP packets with SYN+FIN set");
 #endif
 
+
+int rmxcachelookup = 0;
+SYSCTL_INT(_net_inet_tcp, OID_AUTO, rmxcachelookup,
+	    CTLFLAG_RD, &rmxcachelookup, 0, "RMX cache lookups");
+
+int rmxcachehit = 0;
+SYSCTL_INT(_net_inet_tcp, OID_AUTO, rmxcachehit,
+	    CTLFLAG_RD, &rmxcachehit, 0, "RMX cache hits");
+
+
 struct inpcbhead tcb;
 #define	tcb6	tcb  /* for KAME src sync over BSD*'s */
 struct inpcbinfo tcbinfo;
@@ -2521,7 +2531,13 @@
 	 * or rttvar.  Convert from the route-table units
 	 * to scaled multiples of the slow timeout timer.
 	 */
+
+	++rmxcachelookup;
+
 	if (tp->t_srtt == 0 && (rtt = rt->rt_rmx.rmx_rtt)) {
+
+		++rmxcachehit;
+
 		/*
 		 * XXX the lock bit for RTT indicates that the value
 		 * is also a minimum value; this is subject to time.

--------------FF908FDE6605971918FB41DB
Content-Type: text/plain; charset=us-ascii;
 name="tcp_subr.c.patch"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
 filename="tcp_subr.c.patch"

--- tcp_subr.c	Sun Apr 14 06:02:30 2002
+++ tcp_subr.c.new	Thu May 30 18:19:12 2002
@@ -31,7 +31,7 @@
  * SUCH DAMAGE.
  *
  *	@(#)tcp_subr.c	8.2 (Berkeley) 5/24/95
- * $FreeBSD: src/sys/netinet/tcp_subr.c,v 1.73.2.25 2002/04/14 04:02:30 silby Exp $
+ * $FreeBSD: src/sys/netinet/tcp_subr.c,v 1.73.2.25 2002/05/30 16:12:00 andre Exp $
  */
 
 #include "opt_compat.h"
@@ -144,6 +144,16 @@
 SYSCTL_INT(_net_inet_tcp, OID_AUTO, isn_reseed_interval, CTLFLAG_RW,
     &tcp_isn_reseed_interval, 0, "Seconds between reseeding of ISN secret");
 
+
+int rmxcacheupdate = 0;
+SYSCTL_INT(_net_inet_tcp, OID_AUTO, rmxcacheupdate,
+	    CTLFLAG_RD, &rmxcacheupdate, 0, "RMX cache update");
+
+int rmxcachenoupdate = 0;
+SYSCTL_INT(_net_inet_tcp, OID_AUTO, rmxcachenoupdate,
+	    CTLFLAG_RD, &rmxcachenoupdate, 0, "RMX cache no update");
+
+
 static void	tcp_cleartaocache __P((void));
 static void	tcp_notify __P((struct inpcb *, int));
 
@@ -638,6 +648,8 @@
 		    == INADDR_ANY)
 			goto no_valid_rt;
 
+		++rmxcacheupdate;
+
 		if ((rt->rt_rmx.rmx_locks & RTV_RTT) == 0) {
 			i = tp->t_srtt *
 			    (RTM_RTTUNIT / (hz * TCP_RTT_SCALE));
@@ -710,7 +722,9 @@
 				rt->rt_rmx.rmx_ssthresh = i;
 			tcpstat.tcps_cachedssthresh++;
 		}
-	}
+	} else
+		++rmxcachenoupdate;
+
     no_valid_rt:
 	/* free the reassembly queue, if any */
 	while((q = LIST_FIRST(&tp->t_segq)) != NULL) {

--------------FF908FDE6605971918FB41DB--





