From owner-freebsd-net Tue Jun 4 1:45:24 2002 Delivered-To: freebsd-net@freebsd.org Received: from whale.sunbay.crimea.ua (whale.sunbay.crimea.ua [212.110.138.65]) by hub.freebsd.org (Postfix) with ESMTP id 4524337B408 for ; Tue, 4 Jun 2002 01:44:48 -0700 (PDT) Received: (from ru@localhost) by whale.sunbay.crimea.ua (8.11.6/8.11.2) id g548hXp96791; Tue, 4 Jun 2002 11:43:33 +0300 (EEST) (envelope-from ru) Date: Tue, 4 Jun 2002 11:43:33 +0300 From: Ruslan Ermilov To: Andre Oppermann Cc: Garrett Wollman , net@FreeBSD.org Subject: Re: Bug in net/route.c function rtredirect() Message-ID: <20020604084333.GA95488@sunbay.com> References: <20020602125310.GD58857@sunbay.com> <3CFBE83F.3E7025A4@pipeline.ch> <200206032237.g53Mb0h1018954@khavrinen.lcs.mit.edu> <3CF81BC5.EAF0ACC@pipeline.ch> <20020601095707.GC50435@sunbay.com> <3CF8A6CF.6033679A@pipeline.ch> <20020602125310.GD58857@sunbay.com> <3CFBE83F.3E7025A4@pipeline.ch> <20020604070053.GA83399@sunbay.com> <3CFC7951.57285201@pipeline.ch> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="gBBFr7Ir9EOA20Yy" Content-Disposition: inline In-Reply-To: <3CFC7951.57285201@pipeline.ch> User-Agent: Mutt/1.3.99i Sender: owner-freebsd-net@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.org --gBBFr7Ir9EOA20Yy Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Tue, Jun 04, 2002 at 10:24:49AM +0200, Andre Oppermann wrote: > Ruslan Ermilov wrote: > >=20 > > On Tue, Jun 04, 2002 at 12:05:51AM +0200, Andre Oppermann wrote: > > > After reading this whole redirect stuff a couple of time I've come to > > > the conclusion that the function is right as it is there. There is no > > > such bug as I described it. The rtalloc1() in rtredirect() is incre- > > > menting the refcount on the route found, the two rtfree() after > > > "create:" and "done:" will decrement it as it is must happen (we don't > > > have a reference to that route because we don't clone here). > > > > > Right. There's just no point in calling rtfree() just to decrement > > the route's rt_refcnt (we were only looking up the route, we didn't > > need a reference to it). rtfree() does not free the route because > > it's still RTF_UP, and calling rtfree() also needlessly calls > > rnh_close() (in_clsroute() in the PF_INET case). Hence this would > > still be a slight optimization (tested this time): >=20 > This breaks if someone hacks up the routing table. >=20 What do you mean by "hacks"? > The vast majority > is using rtfree() (actually the RTFREE macro which checks =3D=3D0 first). > To be entirely correct and clean the macro RTFREE would be the right > thing to do here. >=20 Well, rtinit() does the same decrementing. > > %%% > > Index: net/route.c > > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > > RCS file: /home/ncvs/src/sys/net/route.c,v > > retrieving revision 1.69 > > diff -u -p -r1.69 route.c > > --- net/route.c 19 Mar 2002 21:54:18 -0000 1.69 > > +++ net/route.c 4 Jun 2002 06:41:42 -0000 > > @@ -345,7 +345,7 @@ rtredirect(dst, gateway, netmask, flags, > > */ > > create: > > if (rt) > > - rtfree(rt); > > + rt->rt_refcnt--; > > flags |=3D RTF_GATEWAY | RTF_DYNAMIC; > > bzero((caddr_t)&info, sizeof(info)); > > info.rti_info[RTAX_DST] =3D dst; > > %%% > >=20 > > > A bug is that host routes created by redirect are never being purged. > > > But that one has been present for a long (?) time. > > > > > > I'm still try to track a problem with the following situation: 1. I > > > get a tcp syn from somewhere, 2. a host route is created by tcp, 3. > > > the synack is sent back, 4. we get a redirect from the router to use > > > a different router, 5. host route created from tcp is updated and > > > replaced by icmp redirect route, 6. I see a RTM_LOSING message and > > > the redirect route is being purged. > > > > > This is handled by in_losing(). in_losing() has all the necessary > > comments explaining what's going on here. >=20 > Yes, I see what and why in_losing() is doing it. What I'm wondering is > why are the packets lost so in_losing() is being triggered. In my net- > work such packet loss is not supposed to happen. Otherwise I could only > imagine getting a bugus redirect. That would be a problem on the router > (which is also FreeBSD based). Anyway, I'll investigate my problem here > further and see where that takes me... >=20 OK, you're the best person to try realize that. :-) > > > This happens in, I think, some 5% of the cases. What I'm tracking > > > is why the rtfree() after "create:" is de-refcounting the default > > > route when we should have updated the host route created by tcp > > > before. Maybe this is a side-effect of the tcp syncache and the > > > flow has changed? I'll track what happens there. > > > > > There may be only one reason: there's no (yet) route created by tcp. > > I can't reproduce it here. >=20 > In tcp_input.c is a route lookup (via tcp_rtlookup) in the function > tcp_mssopt(). tcp_mssopt is supposed to look up the outgoing inter- > face to find out the MSS this host supports. tcp_rtlookup rtalloc's > a route and clones it if neccessary. This is before any redirect can > happen because it's before we've sent out the packet which triggers > the redirect. syncache_respond is doing the same. So there should be > a route existing prior to the redirect?! >=20 This TCP-cloned route is only installed after a connection is established. There's the packet flow taking place before it is established. Could it be that you're seeing these RTM_LOOSING during the connection establishment phase? What does tcpdump(1) show you? What does ``route -vn monitor'' tell you? > (UDP is not doing this, it does not request a host route but simply > takes whichever routes matches, usually default route). >=20 I know, there's no per-destination metrics for UDP. :-) > > > > Heh, so you in fact tried to rtfree() "rt" in "done:" by adding > > > > "rtn". And how *rtp (if rtp !=3D NULL) will become "rtn" then? > > > > What about this? > > > > > > No. No bug as I said, no need to patch. Sorry for this touble. > > > > > > For the expiration of redirects I'll port/adapt the NetBSD solution > > > and post the patch here. > > > > > We could treat RTF_DYNAMIC routes just like RTF_WASCLONED ones. > > Seems to work just fine here: >=20 > Ah, an even smarter solution! I'll try this on my box this evening! >=20 OK, I will also wait for what others think about it. > > %%% > > Index: netinet/in_rmx.c > > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > > RCS file: /home/ncvs/src/sys/netinet/in_rmx.c,v > > retrieving revision 1.42 > > diff -u -p -r1.42 in_rmx.c > > --- netinet/in_rmx.c 19 Mar 2002 21:25:46 -0000 1.42 > > +++ netinet/in_rmx.c 4 Jun 2002 06:41:42 -0000 > > @@ -202,8 +202,10 @@ in_clsroute(struct radix_node *rn, struc > > if((rt->rt_flags & (RTF_LLINFO | RTF_HOST)) !=3D RTF_HOST) > > return; > >=20 > > - if((rt->rt_flags & (RTF_WASCLONED | RTPRF_OURS)) > > - !=3D RTF_WASCLONED) > > + if((rt->rt_flags & RTPRF_OURS) !=3D 0) > > + return; > > + > > + if((rt->rt_flags & (RTF_WASCLONED | RTF_DYNAMIC)) =3D=3D 0) > > return; > >=20 > > /* > > %%% > >=20 > > On Mon, Jun 03, 2002 at 06:37:00PM -0400, Garrett Wollman wrote: > > > < said: > > > > > > > A bug is that host routes created by redirect are never being purge= d. > > > > But that one has been present for a long (?) time. > > > > > > You are expected to be running a routing process (like `routed' in > > > router-discovery mode) which will delete them for you. I agree that > > > this is not a reasonable expectation, but that was the design intent. > > > > > I think there's nothing wrong if dynamically created routes (by redirec= t) > > will be treated as "temporary" routes, just like protocol-cloned routes. > > The above patch for in_rmx.c implements this. Let me know what do you > > think. >=20 > /me thinks this is fine. This reminds of a problem I had in August 2001 > with the .CH secondary nameserver ccTLD.tix.ch where I've got more than > 303'000 redirects in the routing table. The thread was named "303,000 > routes in kernel". Thankfully the network code is quite robust and > didn't fall over when it could no longer allocate memory for new routes. >=20 And if you think about it, dynamically learned routes are just another way of "cloning". Your host learns them from a remote gateway, and that gateway's configuration may change as time goes by, and this route would become stale. Cheers, --=20 Ruslan Ermilov Sysadmin and DBA, ru@sunbay.com Sunbay Software AG, ru@FreeBSD.org FreeBSD committer, +380.652.512.251 Simferopol, Ukraine http://www.FreeBSD.org The Power To Serve http://www.oracle.com Enabling The Information Age --gBBFr7Ir9EOA20Yy Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.0.7 (FreeBSD) iD8DBQE8/H21Ukv4P6juNwoRApXsAJ9wZRT5DK3CdYaLBRPsR619k+dKiwCdEFDq 9n4gZ+nM3QD5hZ39OIHrdpo= =mXdq -----END PGP SIGNATURE----- --gBBFr7Ir9EOA20Yy-- To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-net" in the body of the message