Date:      Fri, 30 Dec 2016 18:37:47 +0100
From:      Domagoj Stolfa <domagoj.stolfa@gmail.com>
To:        freebsd-dtrace@freebsd.org
Subject:   RFC: Changes in DTrace to allow for distributed operation
Message-ID:  <20161230173747.GB46006@freebsd-laptop>

Hello,

I have been working on extending DTrace to allow for a natural way of tracing in
a distributed environment. This would consist of being able to trace events on
different virtual machines, remote servers with access, cluster nodes and so on.
I will summarize the changes I have made and those I am considering, and outline
the tradeoffs, flaws and merits of each design in hopes of getting feedback from
others interested in distributed tracing.

The following abbreviations will be used:
 instance -> Operating system instance, running on a VM or bare metal.
 UUIDv1 -> Universally unique identifier version 1 as per RFC4122
 UUIDv5 -> Universally unique identifier version 5 as per RFC4122
 host -> the DTrace instance running on the machine that issued the DTrace
         script.
 DDAG -> Distributed directed acyclic graph

Starting off with an added struct in the kernel as a part of the DTrace
framework:

typedef struct dtrace_instance {
	char *dtis_name;
	struct dtrace_provider *dtis_provhead;
	struct dtrace_instance *dtis_next;
	struct dtrace_instance *dtis_prev;
} dtrace_instance_t;

where:
 dtis_name -> instance name
 dtis_provhead -> first provider in the instance
 dtis_next, dtis_prev -> doubly linked list nodes

- Each instance is identified by its name, which implies that once an instance
  with a given name is created, all other instances with that name will be
  identified equally on the host.
- Each new instance is added at the start of the list and becomes the new list
  head.
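
As a rough illustration of the two points above, here is a user-space sketch of
name-based lookup and head insertion. The helper names instance_lookup() and
instance_get() are hypothetical, and the locking the kernel would need is left
out entirely.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct dtrace_provider;		/* opaque in this sketch */

typedef struct dtrace_instance {
	char *dtis_name;
	struct dtrace_provider *dtis_provhead;
	struct dtrace_instance *dtis_next;
	struct dtrace_instance *dtis_prev;
} dtrace_instance_t;

/* Hypothetical helper: find an instance by name, or return NULL. */
static dtrace_instance_t *
instance_lookup(dtrace_instance_t *head, const char *name)
{
	assert(name != NULL);
	for (dtrace_instance_t *i = head; i != NULL; i = i->dtis_next)
		if (strcmp(i->dtis_name, name) == 0)
			return (i);
	return (NULL);
}

/*
 * Hypothetical helper: return the instance with the given name, creating
 * it at the head of the list if it does not exist yet.
 */
static dtrace_instance_t *
instance_get(dtrace_instance_t **headp, const char *name)
{
	dtrace_instance_t *inst = instance_lookup(*headp, name);

	if (inst != NULL)
		return (inst);		/* same name, same instance */
	inst = calloc(1, sizeof(*inst));
	if (inst == NULL)
		return (NULL);
	inst->dtis_name = malloc(strlen(name) + 1);
	if (inst->dtis_name == NULL) {
		free(inst);
		return (NULL);
	}
	strcpy(inst->dtis_name, name);
	inst->dtis_next = *headp;	/* new node becomes the list head */
	if (*headp != NULL)
		(*headp)->dtis_prev = inst;
	*headp = inst;
	return (inst);
}
```

Calling instance_get() twice with the same name returns the same node, which is
what makes the name the identifying key on the host.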

Merits:

- The instances being identified by their name allows for an easy transition
  between the framework and the scripts one would be writing.
- There is no redundancy in the list, which allows for both less memory being
  used and fewer indirections in traversing and looking up probes in the hash in
  order to identify which instance they belong to.

Flaws:
- This does not identify the instance that fired the probe in a unique way. In
  order to get this information the provider needs to be known (however, this is
  known from the dtrace_probe struct). The problem with this approach comes when
  we want to send the appropriate information one level up (towards the host).
  What needs to be sent is the probe ID, which then needs to be mapped to the
  appropriate ID on the host.
- Using just the instance name is not enough to identify which instance the
  provider/probe belongs to.

Possible resolution:
- A probe ID could be sent over to the host, with the DTrace framework changed
  so that the dtrace_probes array is no longer kept globally. Instead, it would
  be kept in the dtrace_instance struct. This would make it easy to identify the
  instance where the probe needs to be fired, and would eliminate the need for
  the additional hash table.
- In order to be able to identify the instance that the provider belongs to, a
  UUID could be kept in the way that will be explained further below.
  Additionally, the dtpv_next pointer could be used differently, so that it no
  longer links a global list of providers, but a list of providers in an
  instance. This could be accomplished by keeping a list of providers of each
  instance in the dtrace_instance struct, or alternatively, by implementing the
  semantics of the provider list differently, so that it can easily be
  identified which providers belong to which instance.
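
A minimal sketch of the first resolution, assuming the probe array moves into
the instance. The types probe_t and instance_t and both helpers are hypothetical
stand-ins written for user space; they are not the existing kernel structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in; the real probe struct carries far more state. */
typedef struct probe {
	unsigned int pr_id;		/* instance-local probe ID */
	const char *pr_name;
} probe_t;

typedef struct instance {
	const char *in_name;
	probe_t **in_probes;		/* per-instance probe array */
	unsigned int in_nprobes;
} instance_t;

/* Append a probe; returns its instance-local ID (from 1), or 0 on failure. */
static unsigned int
instance_probe_create(instance_t *in, const char *name)
{
	probe_t *p;
	probe_t **grown;

	assert(in != NULL && name != NULL);
	grown = realloc(in->in_probes, (in->in_nprobes + 1) * sizeof(*grown));
	if (grown == NULL)
		return (0);
	in->in_probes = grown;
	p = malloc(sizeof(*p));
	if (p == NULL)
		return (0);
	p->pr_id = in->in_nprobes + 1;
	p->pr_name = name;
	in->in_probes[in->in_nprobes++] = p;
	return (p->pr_id);
}

/* Resolve (instance, local ID) to a probe without any global table. */
static probe_t *
instance_probe_lookup(const instance_t *in, unsigned int id)
{
	assert(in != NULL);
	if (id == 0 || id > in->in_nprobes)
		return (NULL);
	return (in->in_probes[id - 1]);
}
```

With this layout the same numeric ID can exist in several instances, so a
probe ID sent up to the host only has meaning together with its instance.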

Another thing that needs to be changed is the way that providers are identified.
In a distributed setting, it is not sufficient to identify a provider based on
its memory address, which is what DTrace currently does. This can be done
through combined use of UUIDv1 and UUIDv5.

- Each provider would have a corresponding UUID assigned to it. This would be
  done starting at the endpoint, which would advertise its namespace-local UUID
  (UUIDv1 in this case) one level up. That instance would then generate a
  namespace-local UUID for the providers that originate from the instance that
  has just advertised its UUID. The UUID in this case would be a UUIDv5,
  combining the UUIDv1 generated in the endpoint with the name of the instance.
  The UUIDv5 generated on the node would be kept as a namespace-local UUID on
  each provider that originated from the endpoint. This would then be
  advertised one more level up, again generating a UUIDv5. Using this, two
  DDAGs would be built implicitly. This can be demonstrated on the following
  topology:

                VM{0...n}{0...m}
               /
      VM{0...n}
     /
   P1
   |
   |
  /
H  ----- P2 - VM{0...n} - VM{0...n}{0...m}
  \
   |
   |
   Pk
     \
      VM{0...n}
               \
                 VM{0...n}{0...m}

where P{1}, ..., P{k} are bare-metal machines, VM{0}, ..., VM{n} top level
virtual machines and VM{i}{0}, ..., VM{i}{m} nested virtual machines in the i-th
top level virtual machine.

The nested virtual machines, VM{i, j}, would generate their own UUIDv1 for all
their providers. This is guaranteed to be unique due to the fact that DTrace
locks every time it creates a new provider.

Following that, each of the providers from VM{i, j} would get advertised to its
corresponding virtualization host, VM{i}. VM{i} would then generate a UUIDv5 for
each of the providers that were advertised from VM{i, j}. The namespace name
that could be used is the name of the VM. This guarantees the uniqueness of each
UUIDv5 generated on VM{i}.

Furthermore, each of the VMs, VM{i}, would then advertise its providers
(including the providers that were advertised from the nested VMs, VM{i, j}) to
P{x}. P{x} would in the same fashion generate the UUIDv5 and finally advertise
to H, which would then have all the providers from different machines. The
difference in the case of P{x} advertising to H is that the VM name could not be
used, because in this case P{x} is a bare metal machine connected through the
network to H, to which H has access. One could use the public IP address
(assuming no anycast)/hostname and/or port here.
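
To make the derivation step concrete, here is a sketch of one hop of the chain
in user-space C. It assumes the advertised UUID is used as the RFC 4122
namespace and the instance name (or hostname) as the name, which is one reading
of the description above. The helpers sha1_small() and uuid_v5() are
hypothetical, and the compact SHA-1 is there only to keep the example
self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static uint32_t
rol(uint32_t x, int n)
{
	return ((x << n) | (x >> (32 - n)));
}

/* SHA-1 for short inputs only (len <= 119, so padding fits in two blocks). */
static void
sha1_small(const uint8_t *msg, size_t len, uint8_t out[20])
{
	uint8_t buf[128] = { 0 };
	uint32_t h[5] = { 0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476,
	    0xC3D2E1F0 };
	size_t total = (len + 9 <= 64) ? 64 : 128;
	uint64_t bits = (uint64_t)len * 8;

	assert(len <= 119);
	memcpy(buf, msg, len);
	buf[len] = 0x80;			/* mandatory padding bit */
	for (int i = 0; i < 8; i++)		/* big-endian bit length */
		buf[total - 1 - i] = (uint8_t)(bits >> (8 * i));

	for (size_t off = 0; off < total; off += 64) {
		uint32_t w[80], a = h[0], b = h[1], c = h[2], d = h[3], e = h[4];

		for (int t = 0; t < 16; t++)
			w[t] = (uint32_t)buf[off + 4 * t] << 24 |
			    (uint32_t)buf[off + 4 * t + 1] << 16 |
			    (uint32_t)buf[off + 4 * t + 2] << 8 |
			    (uint32_t)buf[off + 4 * t + 3];
		for (int t = 16; t < 80; t++)
			w[t] = rol(w[t - 3] ^ w[t - 8] ^ w[t - 14] ^ w[t - 16], 1);
		for (int t = 0; t < 80; t++) {
			uint32_t f, k, tmp;

			if (t < 20) {
				f = (b & c) | (~b & d); k = 0x5A827999;
			} else if (t < 40) {
				f = b ^ c ^ d; k = 0x6ED9EBA1;
			} else if (t < 60) {
				f = (b & c) | (b & d) | (c & d); k = 0x8F1BBCDC;
			} else {
				f = b ^ c ^ d; k = 0xCA62C1D6;
			}
			tmp = rol(a, 5) + f + e + k + w[t];
			e = d; d = c; c = rol(b, 30); b = a; a = tmp;
		}
		h[0] += a; h[1] += b; h[2] += c; h[3] += d; h[4] += e;
	}
	for (int i = 0; i < 5; i++)
		for (int j = 0; j < 4; j++)
			out[4 * i + j] = (uint8_t)(h[i] >> (24 - 8 * j));
}

/*
 * One hop: derive the UUID a node keeps for a provider advertised to it.
 * ns is the UUID received from below, name identifies the advertiser.
 */
static void
uuid_v5(const uint8_t ns[16], const char *name, uint8_t out[16])
{
	uint8_t msg[119], digest[20];
	size_t n = strlen(name);

	assert(n <= sizeof(msg) - 16);
	memcpy(msg, ns, 16);
	memcpy(msg + 16, name, n);
	sha1_small(msg, 16 + n, digest);
	memcpy(out, digest, 16);
	out[6] = (out[6] & 0x0F) | 0x50;	/* version 5 */
	out[8] = (out[8] & 0x3F) | 0x80;	/* RFC 4122 variant */
}
```

Applying uuid_v5() once per level (VM{i, j} to VM{i}, VM{i} to P{x}, P{x} to H)
gives each node a deterministic UUID for a remote provider. Two advertisers
with different names yield different results even if they report the same UUID.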

In order to be able to identify these different machines, two UUIDs would need
to be stored in the dtrace_provider struct. Namely, a namespace-local UUID
generated on the host machine and the provider UUID that was generated on the
machine that advertised the provider, so that the graph could then be traversed.

This would form a DDAG in the direction of tracing information flow from the
perspective of VM{i, j}. That means that H would get information from VM{i, j},
but there should be no way that VM{i, j} gets any information from H in terms of
data that is disclosed local to H. H could identify exactly which instance has
fired the probe.

Another DDAG would be formed in the opposite direction, which would be used to
instruct other instances what to do. These actions could be DTrace destructive
actions, asking for identification of a certain machine and similar things. It
is important that this indeed is a DDAG, as there should be no possibility for
this request to circle back around to the host.
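
How acyclicity would be enforced is not specified here. One simple option,
sketched below under the assumption that nodes are numbered and the graph is
small, is to refuse any new edge whose destination can already reach its
source. The type ddag_t and both functions are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define DDAG_MAXN 16

typedef struct ddag {
	bool edge[DDAG_MAXN][DDAG_MAXN];	/* edge[a][b]: a instructs b */
} ddag_t;

/* Depth-first search: is there a path from 'from' to 'to'? */
static bool
ddag_reaches(const ddag_t *g, int from, int to, bool *seen)
{
	if (from == to)
		return (true);
	seen[from] = true;
	for (int n = 0; n < DDAG_MAXN; n++)
		if (g->edge[from][n] && !seen[n] &&
		    ddag_reaches(g, n, to, seen))
			return (true);
	return (false);
}

/* Add from -> to unless that would let a request circle back. */
static bool
ddag_add_edge(ddag_t *g, int from, int to)
{
	bool seen[DDAG_MAXN] = { false };

	assert(from >= 0 && from < DDAG_MAXN);
	assert(to >= 0 && to < DDAG_MAXN);
	if (ddag_reaches(g, to, from, seen))
		return (false);		/* would close a cycle */
	g->edge[from][to] = true;
	return (true);
}
```

With node 0 as the host, edges 0 -> 1 and 1 -> 2 are accepted, while a later
2 -> 0 is rejected because it would route a request back to the host.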

Additionally, in case of conflicts, UUID pocketing could be employed to simply
store the identifying information in that form.

This approach requires the restructuring of the DTrace Provider-to-Framework
API. Namely, there needs to be a way to tell DTrace what instance is being
registered, what instance a probe is firing in and a way to index them. This can
be made backwards-compatible. Consider the following example of ensuring that
there are no changes that need to be made in the existing providers for correct
operation of DTrace:

dtrace_register() becomes dtrace_distributed_register(), where the former is
implemented with the latter by simply passing in the instance as "host".
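
A sketch of that wrapper, written as a user-space mock. The signatures are
simplified assumptions: the real dtrace_register() takes more arguments than
shown, and here they are collapsed into a single opaque provider_args_t. The
bookkeeping array and provider_is_host() exist only for demonstration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-ins for the real argument types (attributes, ops, credentials). */
typedef struct provider_args provider_args_t;
typedef unsigned long provider_id_t;

#define INSTANCE_HOST	"host"
#define MAXPROV		32

static const char *prov_instance[MAXPROV];	/* instance per provider */
static provider_id_t prov_count;

/* Hypothetical instance-aware registration entry point. */
static int
dtrace_distributed_register(const char *instance, const char *name,
    const provider_args_t *args, provider_id_t *idp)
{
	assert(instance != NULL && name != NULL && idp != NULL);
	(void)args;
	if (prov_count == MAXPROV)
		return (-1);
	prov_instance[prov_count] = instance;
	*idp = ++prov_count;
	return (0);
}

/* Existing entry point: unchanged for callers, always the host instance. */
static int
dtrace_register(const char *name, const provider_args_t *args,
    provider_id_t *idp)
{
	return (dtrace_distributed_register(INSTANCE_HOST, name, args, idp));
}

/* Demonstration helper: was this provider registered for the host? */
static int
provider_is_host(provider_id_t id)
{
	assert(id >= 1 && id <= prov_count);
	return (strcmp(prov_instance[id - 1], INSTANCE_HOST) == 0);
}
```

Existing providers keep calling dtrace_register() and land in the "host"
instance, while only code that knows about remote instances needs the new entry
point.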

Merits:

- Allows for a concise way of storing the identifying information on the host,
  allowing for DTrace operations such as dtrace_register(), dtrace_probe() to
  operate in a similar fashion as they do now with instance-awareness included.
  These operations could be implemented very efficiently.
- Easily scalable to an arbitrary number of nodes.

Flaws:

- The instances need to be trusted. There is room for malicious operation of
  these instances in the proposed approach if the deployment is arbitrary.
- While the existing DTrace operations can be performed efficiently, there is
  an accumulation of the instructions in the operations, resulting in a larger
  probe effect. This might prove problematic for some critical tasks and add
  complexity to DTrace.

Possible resolution:

- For virtual machines, VMI (virtual machine introspection) could be employed.
  This could help verify whether or not the virtual machines are operating in a
  non-malicious manner.


Many of these things are subject to change. This approach has mainly evolved
from the goal of tracing virtual machines with DTrace through bhyve. The details
on how the interoperability between the DTrace instances would be implemented
have been intentionally left out, as it is not the scope of this RFC email
(though I am more than willing to provide the information on the side of
virtual machines should it be needed).

-- 
Best regards,
Domagoj Stolfa.



