From: Andre Oppermann
Date: Tue, 15 Apr 2008 17:53:12 +0200
To: Qing Li
Cc: 'Qing Li', src-committers@FreeBSD.org, claudio@openbsd.org, cvs-all@FreeBSD.org, cvs-src@FreeBSD.org
Subject: Re: cvs commit: src/sys/conf files options src/sys/net radix.c radix.h route.c route.h rtsock.c src/sys/netinet in_proto.c ip_output.c src/sys/netinet6 in6_proto.c in6_src.c nd6_nbr.c
Message-ID: <4804CF68.9060109@freebsd.org>
In-Reply-To: <000201c89eae$d4dcfe10$b1335140@SAINTS>
References: <200804130545.m3D5jEtd081771@repoman.freebsd.org> <4803D7E2.80000@freebsd.org> <000201c89eae$d4dcfe10$b1335140@SAINTS>

Qing Li wrote:
> Hi Andre,
>>> is disallowed.
>>> For example,
>>>
>>>   route add -net 192.103.54.0/24 10.9.44.1
>>>   route add -net 192.103.54.0/24 10.9.44.2
>>>
>>> The second route insertion will trigger an error message of
>>> "add net 192.103.54.0/24: gateway 10.2.5.2: route already in table"
>>
>> Would it make sense to retain this behavior by default (POLA)
>> and have multi-path enabled via sysctl like packet
>> forwarding in general?
>> Just adding the same route twice with different next-hops can
>> lead to very confusing situations for the users who are not
>> used to multi-path.
>>
>
> I think that is possible. Were you thinking more along the
> line of accidental route insertion ... Because users who
> are not familiar with ecmp probably won't ever bother
> with more than one route per destination.

If there is no error message when adding a second route, it easily
happens. Because of the hash-based balancing some connections work
and some do not. Very confusing.

>>> "route: writing to routing socket: No such process"
>>> "delete net default: not in table"
>> Can this be made more descriptive? These messages are about
>> as confusing and non-descript as possible.
>>
>
> We should fix the above error message in general.
>
>> Not being aware of the multipath functionality I would pull
>> out my last hair trying to get rid of a route.
>>
>
> I think updating the manpage would be a necessary
> next step.
>
>> How does this behave with common routing daemons;
>> Quagga/Zebra, OpenBGPD, OpenOSPFD?
>>
>
> Hmm... Good question, I haven't tried them but
> I will. Is this something you could help me
> with ?

I've chatted with Claudio Jeker (claudio@openbsd.org). He's the
author of OpenBGPD and OpenOSPFD and has done some work on the OpenBSD
multipath support. He says the implicit multipath doesn't work out
right and is very difficult to manage from the routing daemons. In
OpenBSD they had to change it to explicitly mark multipath routes with
the RTF_MPATH flag in the table, during creation and removal.
The problem is that many daemons and programs (dhclient, ppp, ...) do
not properly remove routes and simply re-add a new one with different
parameters. With implicit multipath the old route then stays in the
table next to the new one, which obviously leads to chaos.

In OpenBSD one has to install a multipath route explicitly: with the
-mpath modifier to route(8), and for daemons with RTF_MPATH in the
routing message. Multipath routes also retain this flag during their
lifetime. If it is not set, the normal one-route-only behavior is
kept. This allows all non-mpath aware programs to continue to work.
I think this is the model to follow, also for inter-BSD compatibility.

>> Do they have to be aware
>> of the multipath functionality? Will it confuse them?
>>
>
> I don't believe these routing protocols necessarily
> have to know about the multipath functionality.
> The routing protocols should continue to function
> wrt route insertion/deletion.

It's easy to throw them into disarray as they do not expect routes to
persist when they delete (one of) them.

> You do bring up a good question about whether
> we should associate ownership with a route entry
> if multiple routing protocols are running
> in parallel. Is this a common practice from your
> experience ? And should we allow multiple routes
> with the same next-hop but different owners in
> the FIB ??

Yes. Let me explain. There are two approaches here:

The Quagga/Zebra approach, where all routing protocol daemons
communicate with a central daemon that is the single point of contact
to the kernel.

The other approach is the OpenBGPD/OpenOSPFD approach, where each
daemon runs on its own (because most of the time there is little to no
overlap) and does its own routing table manipulations. The second
approach is a bit tricky at the moment as the routing socket is not
really intended for operating in this way and the daemons have to be
aware of each other in certain ways.
Ideally, and this is what Claudio says as well, we should end up with
the following functionality:

 - equal cost multipath where one prefix can have multiple next-hops.
 - ecmp should be explicit with the RTF_MPATH flag.
 - a hierarchy of multiple entries for the same prefix where the one
   with the highest priority carries the traffic (possibly with ecmp).
 - the hierarchy should have a number of precedence levels (interface
   route, static route, IGP route, EGP route, other).
 - within those precedence levels it should have further subdivision
   to prefer OSPF over RIP in the IGP category for example.
 - a change/delete applies to a specific precedence level if
   specified.
 - routing socket filters on reading so that routing daemons can
   select which precedence levels they want to track (IGP doesn't
   have to track EGP route changes for example).

With this functionality a number of independent but complementary
routing daemons can work together in a useful and, more importantly,
standardized way.

An example: the ospfd inserts a multipath route for 10.0.0.0/8 via
192.168.1.1 and 192.168.1.2 with precedence 4. The bgpd inserts a
single route for 10.0.0.0/8 via 192.168.1.3 with precedence 8. All
traffic goes through 192.168.1.1 and .2. If the ospfd removes both
routes, .3 becomes active right away. Normally (without precedence
levels) bgpd would have to notice the removal and then insert the new
prefix itself, and if ospfd then wanted to insert its routes again it
would have to remove or modify the route bgpd installed. With
precedence multipath these problems go away.

>> What about the other big missing piece; new-arp? ;-)
>>
>
> That's on its way. Julian is helping me testing the
> patch and reviewing the code etc. I am still
> debugging a locking/reference count issue and
> I hope to make good progress in the coming week.

May I have a look too before it goes into CVS?

-- 
Andre
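
P.S. A rough sketch of the precedence plus explicit-ecmp model from
the wish list above, in Python. This is not the FreeBSD or OpenBSD
implementation. The Table class, its methods and the CRC-based flow
hash are made up for illustration. It matches prefixes exactly
instead of doing a longest-prefix match. It also assumes that a lower
precedence number wins (as in the example, where 4 beats 8) and that
every next-hop of a multipath entry has to be added with the mpath
flag.

```python
import zlib


class RouteExists(Exception):
    """A second next-hop was added without the explicit multipath flag."""


class Table:
    """Toy routing table: prefix -> precedence -> {mpath flag, next-hops}."""

    def __init__(self):
        self.routes = {}

    def add(self, prefix, nexthop, precedence, mpath=False):
        levels = self.routes.setdefault(prefix, {})
        entry = levels.get(precedence)
        if entry is None:
            # First route at this precedence level; remember its mpath flag.
            levels[precedence] = {"mpath": mpath, "hops": [nexthop]}
            return
        # Assumption: both the existing entry and the new route must be
        # marked multipath, otherwise keep the one-route-only behavior.
        if not (mpath and entry["mpath"]) or nexthop in entry["hops"]:
            raise RouteExists(f"add net {prefix}: route already in table")
        entry["hops"].append(nexthop)

    def delete(self, prefix, precedence):
        # A delete applies to one precedence level only.
        levels = self.routes.get(prefix, {})
        if precedence not in levels:
            raise KeyError(f"delete net {prefix}: not in table")
        del levels[precedence]
        if not levels:
            del self.routes[prefix]

    def lookup(self, prefix, flow):
        levels = self.routes.get(prefix)
        if not levels:
            return None
        # Lowest precedence number carries the traffic.
        hops = levels[min(levels)]["hops"]
        # Hash the flow so that one connection always uses the same next-hop.
        return hops[zlib.crc32(flow.encode()) % len(hops)]


table = Table()
table.add("10.0.0.0/8", "192.168.1.1", precedence=4, mpath=True)  # ospfd
table.add("10.0.0.0/8", "192.168.1.2", precedence=4, mpath=True)  # ospfd
table.add("10.0.0.0/8", "192.168.1.3", precedence=8)              # bgpd
print(table.lookup("10.0.0.0/8", "10.1.2.3:80"))  # .1 or .2, picked by hash
table.delete("10.0.0.0/8", precedence=4)          # ospfd withdraws its routes
print(table.lookup("10.0.0.0/8", "10.1.2.3:80"))  # 192.168.1.3
```

The point of the sketch is that bgpd never has to touch the table when
ospfd withdraws or re-adds its routes, and that a program which re-adds
a route without the mpath flag gets the familiar "already in table"
error instead of silently creating a second path.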