From owner-freebsd-net@FreeBSD.ORG  Wed Oct 13 11:04:57 2010
Return-Path: <owner-freebsd-net@FreeBSD.ORG>
Delivered-To: freebsd-net@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 247CD1065670;
	Wed, 13 Oct 2010 11:04:57 +0000 (UTC)
	(envelope-from asmrookie@gmail.com)
Received: from mail-gy0-f182.google.com (mail-gy0-f182.google.com
	[209.85.160.182])
	by mx1.freebsd.org (Postfix) with ESMTP id 7CB3A8FC16;
	Wed, 13 Oct 2010 11:04:56 +0000 (UTC)
Received: by gyf3 with SMTP id 3so1585862gyf.13
	for <multiple recipients>; Wed, 13 Oct 2010 04:04:55 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=domainkey-signature:mime-version:received:sender:received
	:in-reply-to:references:date:x-google-sender-auth:message-id:subject
	:from:to:cc:content-type:content-transfer-encoding;
	bh=FmnrUex4CjnM75QJfA42JdvQTvm8kn9MiFt1JhEptUw=;
	b=WbAeTyOjyFs8YEucVk80GK59Dj5IgJ2AbhAmrpZp1HFgrDh0iRiBX7tqLqn1QwFMZq
	anJlB/0OrJYpyswC6iuChSY35VSlwyEF2fMk0JO4j+ut0s3IwaEVuZviDtkqmQa4vkZg
	Y4DscHfTMDKy9RNcAVRhAKHsbzs3humynHT2I=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma;
	h=mime-version:sender:in-reply-to:references:date
	:x-google-sender-auth:message-id:subject:from:to:cc:content-type
	:content-transfer-encoding;
	b=AZviO6avwzfw5akY1JCjmspEyk+ISzogsT/houLAuWx+Ux/neQa3IAZTOBFf2RDMLs
	BXzmIJFVPyhqYN1F/xOuUMJXtZXT4+q0e4DK+udCsJ9ouoIP4akEDRysnC3rwiexmzI7
	/v6bWGWQD17+SFAxpKU9cQ1EfKntob3LTNSsE=
MIME-Version: 1.0
Received: by 10.151.42.5 with SMTP id u5mr819214ybj.259.1286967895490; Wed, 13
	Oct 2010 04:04:55 -0700 (PDT)
Sender: asmrookie@gmail.com
Received: by 10.150.143.19 with HTTP; Wed, 13 Oct 2010 04:04:54 -0700 (PDT)
In-Reply-To: <alpine.BSF.2.00.1010090121310.1232@fledge.watson.org>
References: <AANLkTikA5OUYD1A9pqCqVEZ5qk+VECq8x-fnRXnpp0KE@mail.gmail.com>
	<AANLkTikau6omhWrXVM13zonFEPCxXM+8EqJauovDu0OU@mail.gmail.com>
	<alpine.BSF.2.00.1010090121310.1232@fledge.watson.org>
Date: Wed, 13 Oct 2010 13:04:54 +0200
X-Google-Sender-Auth: HVuO12zbGvEvfCtL0I3E8Q5_WfM
Message-ID: <AANLkTikaUL5g5ViaG763yq2=NGw9V-+6JL_+Wxum8iAB@mail.gmail.com>
From: Attilio Rao <attilio@freebsd.org>
To: Robert Watson <rwatson@freebsd.org>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
Cc: FreeBSD Current <current@freebsd.org>, freebsd-net@freebsd.org,
	Nima Misaghian <nmisaghian@sandvine.com>,
	Sergey Kandaurov <pluknet@freebsd.org>,
	Jack F Vogel <jfv@freebsd.org>, Ryan Stone <rstone@sandvine.com>,
	Ed Maste <emaste@sandvine.com>
Subject: Re: [PATCH] Netdump for review and testing -- preliminary version
X-BeenThere: freebsd-net@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Networking and TCP/IP with FreeBSD <freebsd-net.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-net>,
	<mailto:freebsd-net-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-net>
List-Post: <mailto:freebsd-net@freebsd.org>
List-Help: <mailto:freebsd-net-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-net>,
	<mailto:freebsd-net-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 13 Oct 2010 11:04:57 -0000

2010/10/9 Robert Watson <rwatson@freebsd.org>:
> On Fri, 8 Oct 2010, Attilio Rao wrote:
>
>>> GENERAL FRAMEWORK ARCHITECTURE
>>>
>>> Netdump is composed, right now, by an userland "server" and a kernel
>>> "client". The former is run on the target machine (where the dump will
>>> phisically happen) and it is responsible for receiving =C2=A0the packet=
s
>>> containing coredumps frame and for correctly writing them on-disk. The
>>> latter is part of the kernel installed on the source machine (where the=
 dump
>>> is initiated) and is responsible for building correctly UDP packets
>>> containing the coredump frames, pushing through the network interface a=
nd
>>> routing them appropriately.
>
> Hi Attilio:
>
> Network dumps would be a great addition to the FreeBSD debugging suite! =
=C2=A0A
> few casual comments and questions, I need to spend some more time ponderi=
ng
> the implications of the current netdump design later in the week.
>
> (1) Did you consider using tftp as the network dump protocol, rather than=
 a
> custom protocol? =C2=A0It's also a simple UDP-based, ACKed file transfer
> protocol, with the advantage that it's widely supported by exiting tftp
> daemons, packet sniffers, security devices, etc. =C2=A0This wouldn't requ=
ire
> using a stock tftpd, although that would certainly be a benefit as well.
> =C2=A0The post-processing of crashdumps that seems to happen in the netdu=
mp
> server could, presumably, happen "offline" as easily...?

I don't understand the "offline" question but, really, I don't know
why tftp wasn't considered in the first place.
Let me do some small search and see how much we could benefit from it.

> (2) Have you thought about adding a checksum to the dump format, since
> packets are going over the wire on UDP, etc? =C2=A0Even an MD5 sum wouldn=
't be
> too hard to arrange: all the code is in the kernel already, requires
> relatively little state storage, and is designed for streamed data.

Someone else already brought this point and I agree, we could use a
checksum here.

> (3) As the bounds checking/etc behavior in dumping grows more complex, it
> seems a shame to replicate it in architecture-specific code. =C2=A0Could =
we pull
> the mediaoffset/mediasize checking into common code? =C2=A0Also, a reserv=
ed
> size/offset of "0" meaning "no limit" sounds slightly worrying; it might =
be
> slightly more conservative to add a flags field to dumperinfo and have a
> flag indicating "size is no object" for netdump, rather than overloading =
the
> existing fields.

I don't agree with you here.
The real problem is that dumpsys is highly disk-specific (I've
commented about it somewhere, maybe the e-mail or maybe the commit
logs).
What we'd need is to have something like netdumpsys() which tries to
use network, but it is not short to make and the mediasize+mediaoffset
concept really rappresents an higher bound which can't be 0. I think
it is a reasonable compromise so far but it is subjected to further
improvements for sure.

> Some specific patch comments:
>
> + * XXX: This should be split into machdep and non-machdep parts
>
> What MD parts are in the file?

This is just a stale comment.

> The sysctl parts of the patch have a number of issues:

Sysctl are still not overhauled just because I'm not sure we want to
keep them. That is one of the points I raised in the main e-mail and
I'd really would like to hear opinions about if we should keep them
rather than having a setup userland process, etc.
I'm sorry about this, but please keep in mind that the file still
needs a lot of cleanup so some comments are a bit out of date and
other parts may not be still perfectly overhauled.

> +sysctl_force_crash(SYSCTL_HANDLER_ARGS)
>
> Does this belong in the netdump code? =C2=A0We already have some of these=
 options
> in debug.kdb.*, but perhaps others should be added there as well.

For this specific case, I'd really axe it out rather.

> + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 /*
> + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0* get and fill a=
 header mbuf, then chain data as an
> extended
> + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0* mbuf.
> + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0*/
> + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 MGETHDR(m, M_DONTWAIT,=
 MT_DATA);
>
> The idea of calling into the mbuf allocator in this context is just freak=
y,
> and may have some truly awful side effects. =C2=A0I suppose this is the c=
ost of
> trying to combine code paths in the network device driver rather than hav=
e
> an independent path in the netdump case, but it's quite unfortunate and w=
ill
> significantly reduce the robustness of netdumps in the face of, for examp=
le,
> mbuf starvation.

I'm not sure in which other way we could fix that actually. Maybe a
pre-allocated pool of mbufs?

> + =C2=A0 =C2=A0 =C2=A0 if (ntohs(ah->ar_hrd) !=3D ARPHRD_ETHER &&
> + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 ntohs(ah->ar_hrd) !=3D ARPHRD_IEEE80=
2 &&
> + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 ntohs(ah->ar_hrd) !=3D ARPHRD_ARCNET=
 &&
> + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 ntohs(ah->ar_hrd) !=3D ARPHRD_IEEE13=
94) {
> + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 NETDDEBUG("nd_handle_a=
rp: unknown hardware address fmt "
> + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 "0x%2D)\=
n", (unsigned char *)&ah->ar_hrd, "");
> + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 return;
> + =C2=A0 =C2=A0 =C2=A0 }
>
> Are you sure you don't want to just check for ETHER here?

I'd really like to hear Ryan's or Ed's idea here.

> + =C2=A0 =C2=A0 =C2=A0 /* XXX: Probably should check if we're the recipie=
nt MAC address */
> + =C2=A0 =C2=A0 =C2=A0 /* Done ethernet processing. Strip off the etherne=
t header */
>
> Yes, quite probably. =C2=A0What if you panic in promiscuous mode?
>
> + =C2=A0 =C2=A0 =C2=A0 /*
> + =C2=A0 =C2=A0 =C2=A0 =C2=A0* The first write (at offset 0) is the kerne=
l dump header. =C2=A0Flag it
> + =C2=A0 =C2=A0 =C2=A0 =C2=A0* for the server to treat specially. =C2=A0X=
XX: This doesn't strip out
> the
> + =C2=A0 =C2=A0 =C2=A0 =C2=A0* footer KDH, although it shouldn't hurt any=
thing.
> + =C2=A0 =C2=A0 =C2=A0 =C2=A0*/
>
> The footer allows us to confirm that the tail end of a dump matches the
> beginning; while probably not strictly necessary in this scenario, it's n=
ot
> a bad idea given the lack of a checksum.

So I assume you are in favor of it, right?

> + =C2=A0 =C2=A0 =C2=A0 /* Default the nic to the first available interfac=
e */
> + =C2=A0 =C2=A0 =C2=A0 if ((ifn =3D TAILQ_FIRST(&ifnet)) !=3D NULL) do {
> + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 if ((ifn->if_flags & I=
FF_UP) =3D=3D 0)
> + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 continue;
>
> More locking needed.

Please refer to the second patch I posted.
It should have proper locking and actually this code is just trimmed
(more locking in V_ifnet accessings, but not in this codepath).

>
> - =C2=A0 =C2=A0 =C2=A0 void =C2=A0 =C2=A0*if_pspare[7];
> + =C2=A0 =C2=A0 =C2=A0 struct =C2=A0netdump_methods *if_ndumpfuncs; /* ne=
tdump virtual methods
> */
> + =C2=A0 =C2=A0 =C2=A0 void =C2=A0 =C2=A0*if_pspare[6];
> =C2=A0 =C2=A0 =C2=A0 =C2=A0int =C2=A0 =C2=A0 if_ispare[4];
>
> In HEAD, at least, you should add your field and not use the spare. =C2=
=A0The
> spare should only be used on a merge to a KBI-stable branch (such as 8.x)=
.

Thanks, for picking this.

> I need to ponder your changes to ifnet and to the drivers some more; I
> recognize the benefits of maximal code alignment, but I worry a lot about
> these code paths having side effects (not least, calls into memory
> allocators, which in turn can enter VM, etc). =C2=A0I wonder if, given th=
at, we
> need to teach the mbuf allocator to be much more conservative if netdumpi=
ng
> is active...

Sorry, which calls from drivers should get in the memory allocator?
Are you just referring to the mbuf headers allocation?
Changes to the drivers are mostly stright-forward -- they just try to
do polling in locked or unlocked mode, following DDB entering or not
(basically how other DDB-agnostic routines already do in FreeBSD for
the known locking problems we discussed several times in the past).

Thanks for your valuable feedback, looking forward to hear more.

Attilio


--=20
Peace can only be achieved by understanding - A. Einstein