Message-ID: <3CF65A3C.915493B8@pipeline.ch>
Date: Thu, 30 May 2002 18:58:37 +0200
From: Andre Oppermann
To: freebsd-net@freebsd.org
Cc: freebsd-stable@freebsd.org
Subject: FreeBSD kernel routing table, need statistics, please install this patch

Hi all,

while working on a design overhaul of the kernel routing table I took a closer look at the rt_metrics. Then I checked some busier web servers to see how much effect the rt_metrics caching actually has. The result was not very clear: some connections never got cached.

The attached patch collects statistics about the usage of the rt_metrics on a system. Specifically, it counts how many times a new TCP session has been established, how many of those found useful cached rt_metrics, and how many times those metrics were updated (or not) when a session closed.
The counters look like this (on a freshly booted system):

# sysctl -a | grep tcp.rmx
net.inet.tcp.rmxcachelookup: 3
net.inet.tcp.rmxcachehit: 1
net.inet.tcp.rmxcacheupdate: 2
net.inet.tcp.rmxcachenoupdate: 0

Please apply the attached patch (against 4-STABLE) and, after a couple of hours or days, send me the output of:

# uname -a
# sysctl -a | grep tcp.rmx
# netstat -m
# netstat -ran | wc -l

together with a short description of the main use of your system (web server, workstation, whatever).

I don't want to nuke it, but I'd like to see how much it helps overall. Then, because it's TCP specific, I'd like to move it out of the main routing table (only the MTU would remain there) and turn it into a hash table. The rt_metrics are host specific, so they are only ever used on host routes, yet they take up an enormous amount of space in the main routing table.

Also, the rt_metrics caching strategy is probably inappropriate for today's world with its many web servers. The problem is that the rt_metrics only get updated when a TCP session to/from that host closes and enough packets have been exchanged to make a reasonably accurate measurement of those parameters. Unfortunately, today's web browsers open a number of connections in very rapid succession, so the connections after the first have no chance of finding cached values unless one of the earlier ones has already closed. The benefit is only seen when the user loads the next page and opens new TCP sessions. Even that benefit is reduced by HTTP/1.1 keepalive and pipelining, since sessions are no longer closed.

A possible solution is to update the cached rt_metrics for the first time once enough packets have been exchanged to make a reasonably accurate measurement, and then again after every n packets thereafter. The statistics collected here will greatly help to determine how best to adjust the rt_metrics caching to be most effective.

The patch applies to /usr/src/sys/netinet/tcp_input.c and /usr/src/sys/netinet/tcp_subr.c.
"Profile, don't speculate"

Many thanks for your cooperation!

-- 
Andre

[Attachment: tcp_input.c.patch]

--- tcp_input.c	Sun Apr 28 07:40:26 2002
+++ tcp_input.c.new	Thu May 30 18:18:20 2002
@@ -31,7 +31,7 @@
  * SUCH DAMAGE.
  *
  *	@(#)tcp_input.c	8.12 (Berkeley) 5/24/95
- * $FreeBSD: src/sys/netinet/tcp_input.c,v 1.107.2.23 2002/04/28 05:40:26 suz Exp $
+ * $FreeBSD: src/sys/netinet/tcp_input.c,v 1.107.2.23 2002/05/30 16:12:00 andre Exp $
  */
 
 #include "opt_ipfw.h"		/* for ipfw_fwd */
@@ -126,6 +126,16 @@
     &drop_synfin, 0, "Drop TCP packets with SYN+FIN set");
 #endif
 
+
+int rmxcachelookup = 0;
+SYSCTL_INT(_net_inet_tcp, OID_AUTO, rmxcachelookup,
+    CTLFLAG_RD, &rmxcachelookup, 0, "RMX cache lookups");
+
+int rmxcachehit = 0;
+SYSCTL_INT(_net_inet_tcp, OID_AUTO, rmxcachehit,
+    CTLFLAG_RD, &rmxcachehit, 0, "RMX cache hits");
+
+
 struct inpcbhead tcb;
 #define	tcb6	tcb  /* for KAME src sync over BSD*'s */
 struct inpcbinfo tcbinfo;
@@ -2521,7 +2531,13 @@
 	 * or rttvar. Convert from the route-table units
 	 * to scaled multiples of the slow timeout timer.
 	 */
+
+	++rmxcachelookup;
+
 	if (tp->t_srtt == 0 && (rtt = rt->rt_rmx.rmx_rtt)) {
+
+		++rmxcachehit;
+
 		/*
 		 * XXX the lock bit for RTT indicates that the value
 		 * is also a minimum value; this is subject to time.

[Attachment: tcp_subr.c.patch]

--- tcp_subr.c	Sun Apr 14 06:02:30 2002
+++ tcp_subr.c.new	Thu May 30 18:19:12 2002
@@ -31,7 +31,7 @@
  * SUCH DAMAGE.
  *
  *	@(#)tcp_subr.c	8.2 (Berkeley) 5/24/95
- * $FreeBSD: src/sys/netinet/tcp_subr.c,v 1.73.2.25 2002/04/14 04:02:30 silby Exp $
+ * $FreeBSD: src/sys/netinet/tcp_subr.c,v 1.73.2.25 2002/05/30 16:12:00 andre Exp $
  */
 
 #include "opt_compat.h"
@@ -144,6 +144,16 @@
 SYSCTL_INT(_net_inet_tcp, OID_AUTO, isn_reseed_interval, CTLFLAG_RW,
     &tcp_isn_reseed_interval, 0, "Seconds between reseeding of ISN secret");
 
+
+int rmxcacheupdate = 0;
+SYSCTL_INT(_net_inet_tcp, OID_AUTO, rmxcacheupdate,
+    CTLFLAG_RD, &rmxcacheupdate, 0, "RMX cache update");
+
+int rmxcachenoupdate = 0;
+SYSCTL_INT(_net_inet_tcp, OID_AUTO, rmxcachenoupdate,
+    CTLFLAG_RD, &rmxcachenoupdate, 0, "RMX cache no update");
+
+
 static void	tcp_cleartaocache __P((void));
 static void	tcp_notify __P((struct inpcb *, int));
 
@@ -638,6 +648,8 @@
 		    == INADDR_ANY)
 			goto no_valid_rt;
 
+		++rmxcacheupdate;
+
 		if ((rt->rt_rmx.rmx_locks & RTV_RTT) == 0) {
 			i = tp->t_srtt *
 			    (RTM_RTTUNIT / (hz * TCP_RTT_SCALE));
@@ -710,7 +722,9 @@
 				rt->rt_rmx.rmx_ssthresh = i;
 			tcpstat.tcps_cachedssthresh++;
 		}
-	}
+	} else
+		++rmxcachenoupdate;
+
     no_valid_rt:
 	/* free the reassembly queue, if any */
 	while((q = LIST_FIRST(&tp->t_segq)) != NULL) {