From owner-freebsd-net@FreeBSD.ORG Wed Oct 13 11:04:57 2010 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 247CD1065670; Wed, 13 Oct 2010 11:04:57 +0000 (UTC) (envelope-from asmrookie@gmail.com) Received: from mail-gy0-f182.google.com (mail-gy0-f182.google.com [209.85.160.182]) by mx1.freebsd.org (Postfix) with ESMTP id 7CB3A8FC16; Wed, 13 Oct 2010 11:04:56 +0000 (UTC) Received: by gyf3 with SMTP id 3so1585862gyf.13 for ; Wed, 13 Oct 2010 04:04:55 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:received:sender:received :in-reply-to:references:date:x-google-sender-auth:message-id:subject :from:to:cc:content-type:content-transfer-encoding; bh=FmnrUex4CjnM75QJfA42JdvQTvm8kn9MiFt1JhEptUw=; b=WbAeTyOjyFs8YEucVk80GK59Dj5IgJ2AbhAmrpZp1HFgrDh0iRiBX7tqLqn1QwFMZq anJlB/0OrJYpyswC6iuChSY35VSlwyEF2fMk0JO4j+ut0s3IwaEVuZviDtkqmQa4vkZg Y4DscHfTMDKy9RNcAVRhAKHsbzs3humynHT2I= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type :content-transfer-encoding; b=AZviO6avwzfw5akY1JCjmspEyk+ISzogsT/houLAuWx+Ux/neQa3IAZTOBFf2RDMLs BXzmIJFVPyhqYN1F/xOuUMJXtZXT4+q0e4DK+udCsJ9ouoIP4akEDRysnC3rwiexmzI7 /v6bWGWQD17+SFAxpKU9cQ1EfKntob3LTNSsE= MIME-Version: 1.0 Received: by 10.151.42.5 with SMTP id u5mr819214ybj.259.1286967895490; Wed, 13 Oct 2010 04:04:55 -0700 (PDT) Sender: asmrookie@gmail.com Received: by 10.150.143.19 with HTTP; Wed, 13 Oct 2010 04:04:54 -0700 (PDT) In-Reply-To: References: Date: Wed, 13 Oct 2010 13:04:54 +0200 X-Google-Sender-Auth: HVuO12zbGvEvfCtL0I3E8Q5_WfM Message-ID: From: Attilio Rao To: Robert Watson Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Cc: FreeBSD Current , freebsd-net@freebsd.org, Nima Misaghian , Sergey Kandaurov , Jack F Vogel , Ryan Stone , Ed Maste Subject: Re: [PATCH] Netdump for review and testing -- preliminary version X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 13 Oct 2010 11:04:57 -0000 2010/10/9 Robert Watson : > On Fri, 8 Oct 2010, Attilio Rao wrote: > >>> GENERAL FRAMEWORK ARCHITECTURE >>> >>> Netdump is composed, right now, by an userland "server" and a kernel >>> "client". The former is run on the target machine (where the dump will >>> phisically happen) and it is responsible for receiving =C2=A0the packet= s >>> containing coredumps frame and for correctly writing them on-disk. The >>> latter is part of the kernel installed on the source machine (where the= dump >>> is initiated) and is responsible for building correctly UDP packets >>> containing the coredump frames, pushing through the network interface a= nd >>> routing them appropriately. > > Hi Attilio: > > Network dumps would be a great addition to the FreeBSD debugging suite! = =C2=A0A > few casual comments and questions, I need to spend some more time ponderi= ng > the implications of the current netdump design later in the week. > > (1) Did you consider using tftp as the network dump protocol, rather than= a > custom protocol? =C2=A0It's also a simple UDP-based, ACKed file transfer > protocol, with the advantage that it's widely supported by exiting tftp > daemons, packet sniffers, security devices, etc. =C2=A0This wouldn't requ= ire > using a stock tftpd, although that would certainly be a benefit as well. > =C2=A0The post-processing of crashdumps that seems to happen in the netdu= mp > server could, presumably, happen "offline" as easily...? I don't understand the "offline" question but, really, I don't know why tftp wasn't considered in the first place. Let me do some small search and see how much we could benefit from it. > (2) Have you thought about adding a checksum to the dump format, since > packets are going over the wire on UDP, etc? =C2=A0Even an MD5 sum wouldn= 't be > too hard to arrange: all the code is in the kernel already, requires > relatively little state storage, and is designed for streamed data. Someone else already brought this point and I agree, we could use a checksum here. > (3) As the bounds checking/etc behavior in dumping grows more complex, it > seems a shame to replicate it in architecture-specific code. =C2=A0Could = we pull > the mediaoffset/mediasize checking into common code? =C2=A0Also, a reserv= ed > size/offset of "0" meaning "no limit" sounds slightly worrying; it might = be > slightly more conservative to add a flags field to dumperinfo and have a > flag indicating "size is no object" for netdump, rather than overloading = the > existing fields. I don't agree with you here. The real problem is that dumpsys is highly disk-specific (I've commented about it somewhere, maybe the e-mail or maybe the commit logs). What we'd need is to have something like netdumpsys() which tries to use network, but it is not short to make and the mediasize+mediaoffset concept really rappresents an higher bound which can't be 0. I think it is a reasonable compromise so far but it is subjected to further improvements for sure. > Some specific patch comments: > > + * XXX: This should be split into machdep and non-machdep parts > > What MD parts are in the file? This is just a stale comment. > The sysctl parts of the patch have a number of issues: Sysctl are still not overhauled just because I'm not sure we want to keep them. That is one of the points I raised in the main e-mail and I'd really would like to hear opinions about if we should keep them rather than having a setup userland process, etc. I'm sorry about this, but please keep in mind that the file still needs a lot of cleanup so some comments are a bit out of date and other parts may not be still perfectly overhauled. > +sysctl_force_crash(SYSCTL_HANDLER_ARGS) > > Does this belong in the netdump code? =C2=A0We already have some of these= options > in debug.kdb.*, but perhaps others should be added there as well. For this specific case, I'd really axe it out rather. > + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 /* > + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0* get and fill a= header mbuf, then chain data as an > extended > + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0* mbuf. > + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0*/ > + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 MGETHDR(m, M_DONTWAIT,= MT_DATA); > > The idea of calling into the mbuf allocator in this context is just freak= y, > and may have some truly awful side effects. =C2=A0I suppose this is the c= ost of > trying to combine code paths in the network device driver rather than hav= e > an independent path in the netdump case, but it's quite unfortunate and w= ill > significantly reduce the robustness of netdumps in the face of, for examp= le, > mbuf starvation. I'm not sure in which other way we could fix that actually. Maybe a pre-allocated pool of mbufs? > + =C2=A0 =C2=A0 =C2=A0 if (ntohs(ah->ar_hrd) !=3D ARPHRD_ETHER && > + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 ntohs(ah->ar_hrd) !=3D ARPHRD_IEEE80= 2 && > + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 ntohs(ah->ar_hrd) !=3D ARPHRD_ARCNET= && > + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 ntohs(ah->ar_hrd) !=3D ARPHRD_IEEE13= 94) { > + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 NETDDEBUG("nd_handle_a= rp: unknown hardware address fmt " > + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 "0x%2D)\= n", (unsigned char *)&ah->ar_hrd, ""); > + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 return; > + =C2=A0 =C2=A0 =C2=A0 } > > Are you sure you don't want to just check for ETHER here? I'd really like to hear Ryan's or Ed's idea here. > + =C2=A0 =C2=A0 =C2=A0 /* XXX: Probably should check if we're the recipie= nt MAC address */ > + =C2=A0 =C2=A0 =C2=A0 /* Done ethernet processing. Strip off the etherne= t header */ > > Yes, quite probably. =C2=A0What if you panic in promiscuous mode? > > + =C2=A0 =C2=A0 =C2=A0 /* > + =C2=A0 =C2=A0 =C2=A0 =C2=A0* The first write (at offset 0) is the kerne= l dump header. =C2=A0Flag it > + =C2=A0 =C2=A0 =C2=A0 =C2=A0* for the server to treat specially. =C2=A0X= XX: This doesn't strip out > the > + =C2=A0 =C2=A0 =C2=A0 =C2=A0* footer KDH, although it shouldn't hurt any= thing. > + =C2=A0 =C2=A0 =C2=A0 =C2=A0*/ > > The footer allows us to confirm that the tail end of a dump matches the > beginning; while probably not strictly necessary in this scenario, it's n= ot > a bad idea given the lack of a checksum. So I assume you are in favor of it, right? > + =C2=A0 =C2=A0 =C2=A0 /* Default the nic to the first available interfac= e */ > + =C2=A0 =C2=A0 =C2=A0 if ((ifn =3D TAILQ_FIRST(&ifnet)) !=3D NULL) do { > + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 if ((ifn->if_flags & I= FF_UP) =3D=3D 0) > + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 continue; > > More locking needed. Please refer to the second patch I posted. It should have proper locking and actually this code is just trimmed (more locking in V_ifnet accessings, but not in this codepath). > > - =C2=A0 =C2=A0 =C2=A0 void =C2=A0 =C2=A0*if_pspare[7]; > + =C2=A0 =C2=A0 =C2=A0 struct =C2=A0netdump_methods *if_ndumpfuncs; /* ne= tdump virtual methods > */ > + =C2=A0 =C2=A0 =C2=A0 void =C2=A0 =C2=A0*if_pspare[6]; > =C2=A0 =C2=A0 =C2=A0 =C2=A0int =C2=A0 =C2=A0 if_ispare[4]; > > In HEAD, at least, you should add your field and not use the spare. =C2= =A0The > spare should only be used on a merge to a KBI-stable branch (such as 8.x)= . Thanks, for picking this. > I need to ponder your changes to ifnet and to the drivers some more; I > recognize the benefits of maximal code alignment, but I worry a lot about > these code paths having side effects (not least, calls into memory > allocators, which in turn can enter VM, etc). =C2=A0I wonder if, given th= at, we > need to teach the mbuf allocator to be much more conservative if netdumpi= ng > is active... Sorry, which calls from drivers should get in the memory allocator? Are you just referring to the mbuf headers allocation? Changes to the drivers are mostly stright-forward -- they just try to do polling in locked or unlocked mode, following DDB entering or not (basically how other DDB-agnostic routines already do in FreeBSD for the known locking problems we discussed several times in the past). Thanks for your valuable feedback, looking forward to hear more. Attilio --=20 Peace can only be achieved by understanding - A. Einstein