From owner-freebsd-net@FreeBSD.ORG Fri Aug 22 21:40:05 2008 Return-Path: Delivered-To: net@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 495A3106564A; Fri, 22 Aug 2008 21:40:05 +0000 (UTC) (envelope-from gnn@neville-neil.com) Received: from proxy.meer.net (proxy.meer.net [64.13.141.13]) by mx1.freebsd.org (Postfix) with ESMTP id 278F28FC22; Fri, 22 Aug 2008 21:40:04 +0000 (UTC) (envelope-from gnn@neville-neil.com) Received: from mail.meer.net (mail.meer.net [64.13.141.3]) by proxy.meer.net (8.14.2/8.14.2) with ESMTP id m7MLe2Xj008278; Fri, 22 Aug 2008 14:40:04 -0700 (PDT) (envelope-from gnn@neville-neil.com) Received: from mail2.meer.net (mail2.meer.net [64.13.141.16]) by mail.meer.net (8.13.3/8.13.3/meer) with ESMTP id m7MLdTjt052740; Fri, 22 Aug 2008 14:39:29 -0700 (PDT) (envelope-from gnn@neville-neil.com) Received: from minion.local.neville-neil.com (ppp-71-139-4-101.dsl.snfc21.pacbell.net [71.139.4.101]) (authenticated bits=0) by mail2.meer.net (8.14.1/8.14.1) with ESMTP id m7MLdSWT074357; Fri, 22 Aug 2008 14:39:29 -0700 (PDT) (envelope-from gnn@neville-neil.com) Date: Fri, 22 Aug 2008 14:39:27 -0700 Message-ID: From: gnn@FreeBSD.org To: "Bruce M. Simpson" In-Reply-To: <48AF08B7.4090804@FreeBSD.org> References: <20080821203519.GA51534@onelab2.iet.unipi.it> <48AE23FF.9070009@FreeBSD.org> <48AF08B7.4090804@FreeBSD.org> User-Agent: Wanderlust/2.15.5 (Almost Unreal) SEMI/1.14.6 (Maruoka) FLIM/1.14.9 (=?ISO-8859-4?Q?Goj=F2?=) APEL/10.7 Emacs/22.2.50 (i386-apple-darwin9.4.0) MULE/5.0 (SAKAKI) MIME-Version: 1.0 (generated by SEMI 1.14.6 - "Maruoka") Content-Type: text/plain; charset=US-ASCII X-Canit-CHI2: 0.50 X-Bayes-Prob: 0.5 (Score 0, tokens from: ) X-Spam-Score: 0.10 () [Tag at 5.00] COMBINED_FROM X-CanItPRO-Stream: default X-Canit-Stats-ID: 1317801 - 8a2c231cbb4a X-Scanned-By: CanIt (www . roaringpenguin . com) on 64.13.141.13 Cc: Luigi Rizzo , net@FreeBSD.org Subject: Re: Small patch to multicast code... X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 22 Aug 2008 21:40:05 -0000 At Fri, 22 Aug 2008 19:43:03 +0100, Bruce M. Simpson wrote: > > We end up calling if_simloop() from a few "interesting" places, in > particular the kernel PIM packet handler. > > In this particular case we're going to take a full mbuf chain copy every > time we send a packet which needs to be looped back to userland. Right, I know the penalty. > It's been a while since I've done any in-depth FreeBSD work other > than hacking on the IGMPv3 snap, and my time is largely tied up with > other work these days, sadly. > > It doesn't seem right to my mind that we need to make a full copy of > an mbuf chain with m_dup() to workaround this kind of problem. > > Whilst it may suffice for a band-aid workaround, we may see mbuf > pool fragmentation as packet rates go up. > > However we are now in a "new world order" where mbuf chains may be > very tied to the device where they've originated or to where they're > going. It isn't clear to me where this kind of intrusion is > happening. > > In the case of ip_mloopback(), somehow we are stomping on a > read-only copy of an mbuf chain. The use of m_copy() with m_pullup() > there is fine according to the documented uses of mbuf(9), although > as Luigi pointed out, most likely we need to look at the upper-layer > protocol too, e.g. where UDP checksums are also being offloaded. > > Some of the code in the IGMPv3 branch actually reworks how loopback > happens i.e. the preference is not to loop back wherever possible > because of the locking implications. Check the bms_netdev branch > history for more info. Well, what I suspect is the problem are these bits: udp_output(): /* * Set up checksum and output datagram. */ if (udp_cksum) { if (inp->inp_flags & INP_ONESBCAST) faddr.s_addr = INADDR_BROADCAST; ui->ui_sum = in_pseudo(ui->ui_src.s_addr, faddr.s_addr, htons((u_short)len + sizeof(struct udphdr) + IPPROTO_UDP)); m->m_pkthdr.csum_flags = CSUM_UDP; m->m_pkthdr.csum_data = offsetof(struct udphdr, uh_sum); } else ip_mloopback(): copym = m_copy(m, 0, M_COPYALL); if (copym != NULL && (copym->m_flags & M_EXT || copym->m_len < hlen)) copym = m_pullup(copym, hlen); if (copym != NULL) { /* If needed, compute the checksum and mark it as valid. */ if (copym->m_pkthdr.csum_flags & CSUM_DELAY_DATA) { in_delayed_cksum(copym); copym->m_pkthdr.csum_flags &= ~CSUM_DELAY_DATA; copym->m_pkthdr.csum_flags |= CSUM_DATA_VALID | CSUM_PSEUDO_HDR; copym->m_pkthdr.csum_data = 0xffff; } and: in_delayed_cksum(struct mbuf *m) { struct ip *ip; u_short csum, offset; ip = mtod(m, struct ip *); offset = ip->ip_hl << 2 ; csum = in_cksum_skip(m, ip->ip_len, offset); if (m->m_pkthdr.csum_flags & CSUM_UDP && csum == 0) csum = 0xffff; offset += m->m_pkthdr.csum_data; /* checksum offset */ Somehow the data that the device needs to do the proper checksum offload is getting trashed here. Now, since it's clear we need a writable packet structure so that we don't trash the original, I'm wondering if the m_pullup() will be sufficient. Best, George