Date: Fri, 30 Dec 2016 18:37:47 +0100
From: Domagoj Stolfa
To: freebsd-dtrace@freebsd.org
Subject: RFC: Changes in DTrace to allow for distributed operation

Hello,

I have been working on extending DTrace to allow for a natural way of tracing
in a distributed environment. This would consist of being able to trace events
on different virtual machines, remote servers with access, cluster nodes and so
on. I will summarize the changes I have made and those I have thought of
making, and outline the design tradeoffs I have considered along with the flaws
and merits of each, in hopes of getting feedback from others interested in
distributed tracing.

The following abbreviations will be used:

  instance -> Operating system instance, running on a VM or bare metal.
  UUIDv1   -> Universally unique identifier version 1 as per RFC 4122
  UUIDv5   -> Universally unique identifier version 5 as per RFC 4122
  host     -> the DTrace instance running on the machine that issued the
              DTrace script.
  DDAG     -> Distributed directed acyclic graph

Starting off with an added struct in the kernel as a part of the DTrace
framework:

  typedef struct dtrace_instance {
          char                    *dtis_name;
          struct dtrace_provider  *dtis_provhead;
          struct dtrace_instance  *dtis_next;
          struct dtrace_instance  *dtis_prev;
  } dtrace_instance_t;

where:

  dtis_name            -> instance name
  dtis_provhead        -> first provider in the instance
  dtis_next, dtis_prev -> doubly linked list nodes

 - Each instance is identified by its name, which implies that once an
   instance with a given name is created, all other instances with that name
   will be identified as the same instance on the host.
 - Each new instance is added at the start of the list and becomes the new
   list head.

Merits:
 - Identifying instances by their name allows for an easy transition between
   the framework and the scripts one would be writing.
 - There is no redundancy in the list, which means less memory is used and
   fewer indirections are needed when traversing the list and when looking up
   probes in the hash to identify which instance they belong to.

Flaws:
 - This does not identify the instance that fired the probe in a unique way.
   In order to get this information, the provider needs to be known (however,
   this is known from the dtrace_probe struct). The problem with this approach
   arises when we want to send the appropriate information one level up
   (towards the host). What needs to be sent is the probe ID, which then needs
   to be mapped to the appropriate ID on the host.
 - Using just the instance name is not enough to identify which instance the
   provider/probe belongs to.

Possible resolution:
 - A probe ID could be sent over to the host, with the DTrace framework
   changed so that the dtrace_probes array is no longer kept globally.
   Instead, it would be kept in the dtrace_instance struct. This would make it
   easy to identify the instance in which the probe needs to be fired, and
   would eliminate the need for the additional hash table.
 - In order to identify the instance that a provider belongs to, a UUID could
   be kept in the way explained below. Additionally, the dtpv_next pointer
   could be used differently, so that it is no longer a list of all providers
   but a list of the providers in an instance. This could be accomplished by
   keeping a list of providers of each instance in the dtrace_instance struct,
   or alternatively, by implementing the semantics of the provider list
   differently, so that it can easily be identified which providers belong to
   which instance.

Another thing that needs to be changed is the way providers are identified.
In a distributed setting, it is not sufficient to identify a provider based on
its memory address, which is what DTrace currently does. Instead, providers
can be identified through the combined use of UUIDv1 and UUIDv5.

 - Each provider would have a corresponding UUID assigned to it. This would be
   done starting at the endpoint, which would advertise its namespace-local
   UUID (a UUIDv1 in this case) one level up. That instance would then
   generate a namespace-local UUID for the providers that originate from the
   instance that has just advertised its UUID. This UUID would be a UUIDv5,
   combining the UUIDv1 generated on the endpoint with the name of the
   instance. The UUIDv5 generated on the node would be kept as a
   namespace-local UUID on each provider that originated from the endpoint.
   It would then be advertised one more level up, again generating a UUIDv5.

Using this, two DDAGs would be built implicitly.
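A minimal C sketch of the per-level derivation described above follows. It
assumes one reading of the scheme: each node computes a provider's
namespace-local UUID as a name-based UUIDv5 (RFC 4122, section 4.3) whose
namespace is the UUID advertised from one level down and whose name is the
advertising instance's name. The type dt_uuid_t and the functions sha1(),
uuid_v5() and uuid_format() are hypothetical, not existing DTrace or FreeBSD
interfaces, and the bundled SHA-1 only keeps the example self-contained;
kernel code would presumably reuse an existing SHA-1 implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 16-byte UUID in network byte order (RFC 4122 layout). */
typedef struct { uint8_t b[16]; } dt_uuid_t;

static uint32_t rol32(uint32_t x, int n) { return (x << n) | (x >> (32 - n)); }

/* Process one 64-byte block of SHA-1 (FIPS 180-4). */
static void sha1_block(uint32_t h[5], const uint8_t blk[64])
{
	uint32_t w[80], a, b, c, d, e, f, k, t;
	int i;

	for (i = 0; i < 16; i++)
		w[i] = (uint32_t)blk[4 * i] << 24 | (uint32_t)blk[4 * i + 1] << 16 |
		    (uint32_t)blk[4 * i + 2] << 8 | (uint32_t)blk[4 * i + 3];
	for (i = 16; i < 80; i++)
		w[i] = rol32(w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16], 1);
	a = h[0]; b = h[1]; c = h[2]; d = h[3]; e = h[4];
	for (i = 0; i < 80; i++) {
		if (i < 20)      { f = (b & c) | (~b & d);          k = 0x5A827999; }
		else if (i < 40) { f = b ^ c ^ d;                   k = 0x6ED9EBA1; }
		else if (i < 60) { f = (b & c) | (b & d) | (c & d); k = 0x8F1BBCDC; }
		else             { f = b ^ c ^ d;                   k = 0xCA62C1D6; }
		t = rol32(a, 5) + f + e + k + w[i];
		e = d; d = c; c = rol32(b, 30); b = a; a = t;
	}
	h[0] += a; h[1] += b; h[2] += c; h[3] += d; h[4] += e;
}

/* SHA-1 of an in-memory message; out receives the 20-byte digest. */
static void sha1(const uint8_t *msg, size_t len, uint8_t out[20])
{
	uint32_t h[5] = { 0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0 };
	uint64_t bits = (uint64_t)len * 8;
	uint8_t blk[64];
	size_t i, rem;

	for (i = 0; i + 64 <= len; i += 64)
		sha1_block(h, msg + i);
	rem = len - i;
	memset(blk, 0, sizeof(blk));
	memcpy(blk, msg + i, rem);
	blk[rem] = 0x80;
	if (rem >= 56) {	/* no room for the 64-bit length: use one more block */
		sha1_block(h, blk);
		memset(blk, 0, sizeof(blk));
	}
	for (i = 0; i < 8; i++)
		blk[63 - i] = (uint8_t)(bits >> (8 * i));
	sha1_block(h, blk);
	for (i = 0; i < 20; i++)
		out[i] = (uint8_t)(h[i / 4] >> (24 - 8 * (i % 4)));
}

/*
 * RFC 4122 section 4.3, version 5: SHA-1 over (namespace UUID || name).
 * Assumed here: the namespace is the UUID advertised from one level down and
 * the name identifies the advertising instance (VM name, hostname:port, ...).
 */
static dt_uuid_t uuid_v5(dt_uuid_t ns, const char *name)
{
	uint8_t buf[16 + 256], md[20];
	size_t n = strlen(name);
	dt_uuid_t u;

	assert(n <= 256);		/* keeps the sketch free of allocation */
	memcpy(buf, ns.b, 16);
	memcpy(buf + 16, name, n);
	sha1(buf, 16 + n, md);
	memcpy(u.b, md, 16);		/* truncate the digest to 128 bits */
	u.b[6] = (uint8_t)((u.b[6] & 0x0F) | 0x50);	/* version 5 */
	u.b[8] = (uint8_t)((u.b[8] & 0x3F) | 0x80);	/* RFC 4122 variant */
	return u;
}

/* Format as the canonical 36-character string; out must hold 37 bytes. */
static void uuid_format(dt_uuid_t u, char out[37])
{
	static const char hex[] = "0123456789abcdef";
	int i, j = 0;

	for (i = 0; i < 16; i++) {
		if (i == 4 || i == 6 || i == 8 || i == 10)
			out[j++] = '-';
		out[j++] = hex[u.b[i] >> 4];
		out[j++] = hex[u.b[i] & 0x0F];
	}
	out[j] = '\0';
}
```

For instance, a provider whose UUIDv1 on VM{i, j} is u would be known on VM{i}
as uuid_v5(u, <name of VM{i, j}>), and that value would in turn serve as the
namespace when P{x} derives its own UUID. Because SHA-1 is deterministic, any
node that knows the advertised UUID and the child's name can recompute the
same value.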
The two DDAGs can be demonstrated on the following topology:

                           VM{0...n}{0...m}
                          /
                 VM{0...n}
                /
              P1
              |
              |
             /
            H ----- P2 - VM{0...n} - VM{0...n}{0...m}
             \
              |
              |
              Pk
                \
                 VM{0...n}
                          \
                           VM{0...n}{0...m}

where P{1}, ..., P{k} are bare-metal machines, VM{0}, ..., VM{n} are top-level
virtual machines, and VM{i}{0}, ..., VM{i}{m} are nested virtual machines in
the i-th top-level virtual machine.

The nested virtual machines VM{i, j} (that is, VM{i}{j}) would generate their
own UUIDv1 for all their providers. These are guaranteed to be unique due to
the fact that DTrace takes a lock every time it creates a new provider.

Following that, each of the providers from VM{i, j} would be advertised to its
corresponding virtualization host, VM{i}. VM{i} would then generate a UUIDv5
for each of the providers advertised from VM{i, j}, using the name of the VM
as the namespace name. This guarantees the uniqueness of each UUIDv5
generated on VM{i}.

Furthermore, each VM{i} would then advertise its providers (including the
providers advertised from the nested VMs, VM{i, j}) to P{x}. P{x} would
generate UUIDv5s in the same fashion and finally advertise them to H, which
would then have all the providers from the different machines. The difference
when P{x} advertises to H is that a VM name cannot be used, because P{x} is a
bare-metal machine connected to H through the network (one that H has access
to). One could use the public IP address (assuming no anycast) or hostname,
and/or the port here.

In order to identify these different machines, two UUIDs would need to be
stored in the dtrace_provider struct: a namespace-local UUID generated on the
host machine, and the provider UUID generated on the machine that advertised
the provider, so that the graph can then be traversed. This would form a DDAG
in the direction of tracing information flow from the perspective of
VM{i, j}.
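To illustrate how storing both UUIDs lets the graph be traversed, here is a
small C sketch. It assumes, hypothetically, that each node keeps a table
recording, for every provider, the namespace-local UUID it assigned, the UUID
used by the child that advertised it, and a pointer to that child (NULL for
providers created locally). The names dt_uuid_t, dt_prov_ident_t, dt_node_t,
dt_lookup() and dt_resolve() are made up for illustration; in the proposal the
two UUIDs would live in struct dtrace_provider, and a real implementation
would use a hash table and network or hypervisor channels rather than
in-memory pointers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DT_MAX_PROVS	8	/* fixed-size table, enough for a sketch */
#define DT_MAX_HOPS	16	/* safeguard against malformed (cyclic) tables */

/* Hypothetical 16-byte UUID. */
typedef struct { uint8_t b[16]; } dt_uuid_t;

struct dt_node;

/* The two UUIDs the proposal would keep per provider, plus the advertiser. */
typedef struct {
	dt_uuid_t	 local;		/* UUID this node assigned */
	dt_uuid_t	 origin;	/* UUID the advertising child used */
	struct dt_node	*child;		/* advertising child; NULL if created here */
} dt_prov_ident_t;

/* One DTrace instance (VM, bare-metal machine or the host H). */
typedef struct dt_node {
	const char	*name;
	dt_prov_ident_t	 provs[DT_MAX_PROVS];
	int		 nprovs;
} dt_node_t;

/* Linear scan by namespace-local UUID; a real table would be hashed. */
static const dt_prov_ident_t *
dt_lookup(const dt_node_t *n, dt_uuid_t local)
{
	int i;

	assert(n != NULL);
	for (i = 0; i < n->nprovs; i++)
		if (memcmp(n->provs[i].local.b, local.b, sizeof(local.b)) == 0)
			return (&n->provs[i]);
	return (NULL);
}

/*
 * Starting at node n, follow the (local -> origin, child) links down the
 * graph until reaching the instance where the provider was created.
 * Returns that instance, or NULL if the UUID is unknown along the way.
 */
static const dt_node_t *
dt_resolve(const dt_node_t *n, dt_uuid_t local)
{
	const dt_prov_ident_t *p;
	int hops;

	for (hops = 0; hops < DT_MAX_HOPS; hops++) {
		if ((p = dt_lookup(n, local)) == NULL)
			return (NULL);
		if (p->child == NULL)
			return (n);	/* provider originates on this node */
		local = p->origin;	/* switch to the child's namespace */
		n = p->child;
	}
	return (NULL);
}
```

Starting at H with the UUID H assigned, each hop replaces the UUID with the
one the child used, until it reaches the instance that created the provider
(the one that generated the UUIDv1). The hop limit is only a safeguard; in the
proposed design the acyclic structure of the DDAG is what guarantees
termination.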
Along this DDAG, H would get information from VM{i, j}, but there should be no
way for VM{i, j} to get any information from H in terms of data local to H.
H could identify exactly which instance has fired the probe.

Another DDAG would be formed in the opposite direction, which would be used to
instruct other instances what to do. These actions could be DTrace destructive
actions, requests for identification of a certain machine and similar things.
It is important that this indeed is a DDAG, as there should be no possibility
for such a request to circle back around to the host.

Additionally, in case of conflicts, UUID pocketing could be employed to simply
store the identifying information in that form.

This approach requires restructuring the DTrace Provider-to-Framework API.
Namely, there needs to be a way to tell DTrace what instance is being
registered, what instance a probe is firing in, and a way to index them. This
can be made backwards-compatible. For example, to ensure that no changes need
to be made to existing providers for DTrace to operate correctly,
dtrace_register() could be implemented in terms of a new
dtrace_distributed_register(), simply passing in "host" as the instance.

Merits:
 - Allows for a concise way of storing the identifying information on the
   host, allowing DTrace operations such as dtrace_register() and
   dtrace_probe() to operate much as they do now, with instance-awareness
   included. These operations could be implemented very efficiently.
 - Easily scalable to an arbitrary number of nodes.

Flaws:
 - The instances need to be trusted. There is room for malicious operation of
   these instances in the proposed approach if the deployment is arbitrary.
 - While the existing DTrace operations can be performed efficiently, extra
   instructions accumulate in them, resulting in a larger probe effect.
   This might prove problematic for some critical tasks and adds complexity to
   DTrace.

Possible resolution:
 - For virtual machines, VMI (virtual machine introspection) could be
   employed. This could help verify whether or not the virtual machines are
   operating in a non-malicious manner.

Many of these things are subject to change. This approach has mainly evolved
from the goal of tracing virtual machines with DTrace through bhyve. The
details of how the interoperability between the DTrace instances would be
implemented have intentionally been left out, as they are beyond the scope of
this RFC email (though I am more than willing to provide information on the
side of virtual machines should it be needed).

-- 
Best regards,
Domagoj Stolfa.