Date:      Mon, 12 Jul 1999 04:41:47 -0400 (EDT)
From:      Robert Watson <robert@cyrus.watson.org>
To:        Nate Williams <nate@mt.sri.com>
Cc:        Darren Reed <avalon@coombs.anu.edu.au>, Ben Gras <ben@nl.euro.net>, freebsd-security@FreeBSD.ORG
Subject:   Re: how to keep track of root users?
Message-ID:  <Pine.BSF.3.96.990712042257.8908B-100000@fledge.watson.org>
In-Reply-To: <199907091711.LAA07208@mt.sri.com>

On Fri, 9 Jul 1999, Nate Williams wrote:

> My thinking is that we 'pre-allocate' an AUDIT_RECORD_FAILED record, and
> use it to inform the system that a record was unable to be generated.
> Therefore, you have an idea that something is missing, but you don't
> slow down the system or cause deadlock.

Sounds great.  And the existence of such a record in the record stream
would cause an appropriate IDS module to start flailing.
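
As a rough illustration of the pre-allocated record idea, here is a
user-space sketch in C.  Only the name AUDIT_RECORD_FAILED comes from the
discussion above; the struct layout, audit_alloc(), audit_free(), and the
'simulate_failure' switch are assumptions made up for this example and do
not correspond to existing kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical record types; only AUDIT_RECORD_FAILED is named in the thread. */
enum audit_type { AUDIT_RECORD_SYSCALL, AUDIT_RECORD_FAILED };

struct audit_record {
	enum audit_type	type;
	int		syscall;	/* syscall number the record describes */
	unsigned	lost;		/* for AUDIT_RECORD_FAILED: records lost so far */
};

/* Reserved up front, so reporting a failure never needs to allocate. */
static struct audit_record audit_failed = { AUDIT_RECORD_FAILED, -1, 0 };

/*
 * Allocate a record for a syscall.  'simulate_failure' stands in for the
 * allocator running out of memory.  On failure the shared pre-allocated
 * record is returned instead of sleeping or blocking.
 */
struct audit_record *
audit_alloc(int syscall, int simulate_failure)
{
	struct audit_record *ar;

	ar = simulate_failure ? NULL : malloc(sizeof(*ar));
	if (ar == NULL) {
		audit_failed.lost++;
		audit_failed.syscall = syscall;
		return (&audit_failed);
	}
	ar->type = AUDIT_RECORD_SYSCALL;
	ar->syscall = syscall;
	ar->lost = 0;
	return (ar);
}

/* Release a record; the pre-allocated one is never freed. */
void
audit_free(struct audit_record *ar)
{
	assert(ar != NULL);
	if (ar != &audit_failed)
		free(ar);
}
```

A consumer that sees a record of type AUDIT_RECORD_FAILED knows the stream
has a gap.  A real kernel version would also need locking around the
shared record on SMP, which this sketch leaves out.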

> Ahh, I understand now.  You are worried about one or the other of
> namei/audit copyin being redundant.  I misunderstood both you and
> Garrett.  Would it be possible to copy the string from the namei buffer,
> thus avoiding the issue of modifying namei?

I haven't looked closely at the namei implementation -- at some point the
entire string is dumped into a KTRACE record by namei(), I assume, and
that might be an appropriate place to deal with this.  I don't have
bundles of source code in front of me, but I seem to recall that namei()
acts on a name lookup descriptor of some kind, perhaps allocated on the
stack for the process?

> > I did something like this to add speculative process execution to
> > FreeBSD/i386 a few months ago (that is, generating disk prefetch hints
> > based on speculatively executing a sandboxed process copy), and it proved
> > quite straightforward.  However, I believe the architecture-dependent code
> > is what sits directly below the syscall code: we should perhaps insert
> > another architecture-independent layer that wraps the syscall, where
> > things like this can be placed.
> 
> However, in the 'generic' code, it may not be obvious why the error
> occurred, and this makes it more difficult to generate an audit record
> 'atomically' since the creation of the record happens in a completely
> different code-base from the 'end' of the record.  We'd need to design
> some sort of event model in the audit record generation code, as well as
> pass in information in each sub-record to identify which record the
> sub-record belongs to.

My thought was to arrange the code as follows.  Currently, the FreeBSD
kernel code does something like the following:

                   kern_exec.c:execve()
                   +------------------+
archspecific:trap.c|                  |
-------------------+                  +---------

Again, without source in front of me I'm not sure I have this exactly
right but... I am suggesting moving some of the syscall switch code
out of the architecture-specific section.  I.e.,

                         kern_exec.c:execve()
                         +-------------------
      kern/kern_syscall.c|
      +------------------+
archsp|
------+

And the middle syscall layer would accept a struct proc and information on
the syscall request.  Its responsibility would be to create an audit
record, tag it with the syscall number, pid, credentials, and any other
data consistent across syscalls that needs to be in the audit record, then
call the appropriate syscall code through the switch/sysvec array.  On
return from the syscall, it would tag the audit record with the return
code, error condition, and a timestamp, and commit the audit record.  It would
then be the responsibility of the syscall code itself to submit a list of
arguments as appropriate, as well as any more detailed subject/object
information, etc. 

This gives us an architecture-independent location to automatically manage
audit records for syscalls, and allows the syscall-specific code to manage
its own syscall-specific audit data as it sees fit.  Presumably the
current audit record reference would either be passed in as an argument to
the syscall or be accessible via curproc, whichever proves most
appropriate.  Signals could probably be handled the same way.
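
The middle layer described above might look roughly like the following
user-space sketch.  Everything here is an assumption for illustration:
the struct proc, struct sysent, and sy_call_t definitions are simplified
stand-ins rather than the kernel's real ones, and syscall_audited(),
p_audit, and the demo handler are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Simplified stand-ins; not the real kernel definitions. */
struct audit_record {
	int	syscall;	/* syscall number */
	int	pid;		/* subject process */
	int	uid;		/* subject credential */
	int	error;		/* error returned by the handler */
	int	retval;		/* value returned to the caller */
	time_t	when;		/* completion time */
	int	committed;	/* set once the record is handed off */
};

struct proc {
	int			p_pid;
	int			p_uid;
	struct audit_record	*p_audit;	/* record in progress, if any */
};

typedef int (*sy_call_t)(struct proc *p, void *uap, int *retval);
struct sysent { sy_call_t sy_call; };

/* Example handler standing in for a real syscall implementation. */
static int
sys_getpid_demo(struct proc *p, void *uap, int *retval)
{
	(void)uap;
	*retval = p->p_pid;
	return (0);
}

static struct sysent demo_sysent[] = { { sys_getpid_demo } };
#define DEMO_NSYSCALLS 1

/*
 * Architecture-independent middle layer: open a record, dispatch through
 * the table, then complete and commit the record.
 */
int
syscall_audited(struct proc *p, int code, void *uap, int *retval,
    struct audit_record *ar)
{
	int error;

	assert(code >= 0 && code < DEMO_NSYSCALLS);

	/* Data common to every syscall, filled in before dispatch. */
	ar->syscall = code;
	ar->pid = p->p_pid;
	ar->uid = p->p_uid;
	ar->committed = 0;
	p->p_audit = ar;	/* the handler can reach it through p */

	*retval = 0;
	error = demo_sysent[code].sy_call(p, uap, retval);

	/* Data known only after the handler returns. */
	ar->error = error;
	ar->retval = *retval;
	ar->when = time(NULL);
	ar->committed = 1;	/* a real commit would queue the record */
	p->p_audit = NULL;
	return (error);
}
```

Hanging the in-progress record off the process structure is only one of
the options mentioned above; passing it as an extra argument to the
handler would work equally well in this sketch.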

> > Similarly, auditing signal delivery would
> > need to happen the same way: currently signal delivery lives in
> > architecture-dependent-land, and we'd want the auditing wrapper to sit
> > somewhere independent of architecture, I suspect.
> 
> Are signals required for IDS?  (Showing my ignorance here...)

A useful IDS module might consist of:

   Watch for a process that receives data from a network socket or local
   IPC pipe and sigsegv's within two (or n) syscalls of receiving the
   data. 

Or more specifically:

   Watch for a process started by inetd that receives data from a network
   socket or local IPC pipe and sigsegv's within n syscalls of receiving
   the data.

Etc, etc.

Clearly also you'd want to be able to ignore uninteresting signals, as
things like timer delivery might be quite boring in most circumstances.
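
A rule of this kind could be sketched as a small per-process state
machine.  The event kinds, struct names, and functions below are all
hypothetical; the sketch assumes the audit stream can be reduced to
"received data", "made a syscall", and "got a signal" events for a single
process.

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>

/* Hypothetical event kinds an audit stream might carry. */
enum ev_kind { EV_SYSCALL, EV_RECV, EV_SIGNAL };

struct audit_event {
	enum ev_kind	kind;
	int		signo;		/* valid only for EV_SIGNAL */
};

/* Per-process state for the rule. */
struct segv_rule {
	int	window;		/* n: syscalls allowed after a receive */
	int	since_recv;	/* syscalls seen since the last receive */
	bool	armed;		/* true once input has been received */
};

void
segv_rule_init(struct segv_rule *r, int window)
{
	assert(window >= 0);
	r->window = window;
	r->since_recv = 0;
	r->armed = false;
}

/* Feed one event; returns true if the rule fires. */
bool
segv_rule_feed(struct segv_rule *r, const struct audit_event *ev)
{
	switch (ev->kind) {
	case EV_RECV:
		r->armed = true;
		r->since_recv = 0;
		return (false);
	case EV_SYSCALL:
		if (r->armed)
			r->since_recv++;
		return (false);
	case EV_SIGNAL:
		/* Signals other than SIGSEGV are ignored as uninteresting. */
		return (r->armed && ev->signo == SIGSEGV &&
		    r->since_recv <= r->window);
	}
	return (false);
}
```

Restricting the rule to processes started by inetd would then be a matter
of only creating a segv_rule for those processes.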

> > As Garrett mentions, there will still be a context record from somewhere
> > that could be extended to carry an active audit record for the activities
> > of that context.  Presumably that is the place to put it?
> 
> How is this record 'identified' from the other records being generated
> in parallel by the other CPUs?  (In other words, what identifies this
> process from other processes in the above code.  We're not passing the
> proc structure around....)

Well, presumably proc p is passed into the syscall, and could be passed
elsewhere.  I would imagine that p is passed into namei() as it would be
required to work from the cwd and process root, etc.  What probably does
require some more thought would be various thread models that might be
implemented later.  Really we want to be able to have one audit record in
process at any time for each possible active process context, I believe.

> In /sys/kern/kern_exec.c, the syscall 'entry/exit' point is the
> routine:
> 
> /*
>  * execve() system call.
>  */
> int
> execve(p, uap, retval)
>         struct proc *p;
>         register struct execve_args *uap;
>         int *retval;
> {
> 
> This is the code that needs to be instrumented, otherwise we have a
> nightmare on our hands.  We need to know that kind of information
> anyway, so why not put it in the most likely place.  This also buys us
> the cross-platform compatibility (not MD code), and makes it *very*
> obvious what information is gathered.
> 
> Unfortunately, it means changing lots of kernel files, but to do this
> correctly and in a way that is understandable, I don't see a better
> solution.
> 
> Trying to 'sniff' what the syscall is at a lower layer and generating
> the necessary information means we may end up doing the same sort of
> information gathering that already exists in the real system call
> implementation.
> 
> In other words, I think we're in violent agreement, but I'm not sure. ;)

Largely.  The nice thing about adding the audit record creation/committal
outside of this syscall code though is that we get at least introductory
auditing on all syscalls without hassle: we know the syscall number,
information on the process, and the return value.  We can also take this
opportunity to determine if the syscall actually did tag it at all: we can
flag audit records before executing the syscall and observe if they
cleared the flag by submitting data.  If not, we can raise a warning
message that a syscall without auditing instrumented for it has been
called, warning the sysadmin or syscall implementer that all is not well
from an auditing standpoint.
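
The flag check could be as simple as the following sketch.  The names
audit_wrap(), audit_submit_arg(), the 'untouched' field, and the two demo
handlers are hypothetical, and the warning simply goes to stderr where a
kernel would log it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

struct audit_record {
	int	syscall;
	bool	untouched;	/* set by the wrapper, cleared on first submission */
	int	nargs;		/* number of arguments submitted so far */
};

/* Called by instrumented syscall code to attach an argument. */
void
audit_submit_arg(struct audit_record *ar)
{
	ar->nargs++;
	ar->untouched = false;
}

typedef void (*handler_t)(struct audit_record *ar);

/* Demo handlers: one submits audit data, the other does not. */
void demo_instrumented(struct audit_record *ar) { audit_submit_arg(ar); }
void demo_uninstrumented(struct audit_record *ar) { (void)ar; }

/*
 * Flag the record, run the handler, and check whether the flag was
 * cleared.  Returns true if the handler never submitted audit data.
 */
bool
audit_wrap(int code, handler_t fn, struct audit_record *ar)
{
	assert(fn != NULL);
	ar->syscall = code;
	ar->nargs = 0;
	ar->untouched = true;
	fn(ar);
	if (ar->untouched)
		fprintf(stderr, "audit: syscall %d submitted no audit data\n",
		    code);
	return (ar->untouched);
}
```

Even for an uninstrumented handler, the wrapper still has the syscall
number and could record the return value, so the record is not empty.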

We indeed both violently agree that argument auditing, etc, must happen
inside the syscall.  But my temptation is to push a little of the
general-purpose work out of the syscall and into the handler.

> > > > POSIX.1E only defines a way to tell whether auditing is turned on or off
> > > > for a specific process, and to toggle that (so that, for example, the
> > > > audit daemon can turn off auditing so as to prevent feedback on audit
> > > > record delivery).  This seems too broad to me.  Suppose active IDS modules
> > > > only require fork(), exec() and exit() tracing--then delivering the
> > > > 20,000 calls to gettimeofday() is a waste of resources.
> > > 
> > > See above.  However, building a truly generic filtering mechanism would
> > > be 'hard to do', so for now I think we can live with no filtering, or a
> > > very simple filtering scheme.  But whether the FreeBSD kernel maintainers
> > > will allow this is another story. :(
> > 
> > See above: simple stuff in kernel may be the optimum approach, and I
> > suspect a little bit of simple goes a long way.
> 
> Agreed, although a mechanism similar to BPF may allow for more 'complex'
> filtering mechanisms and still be quite efficient in the kernel.

My concern is not that a language and programming model couldn't be
devised, but that seeking through arguments to syscalls and managing types
may be fairly costly: with packets in BPF you just look at offsets and
values, and no looping.  Clearly this is something we'll need to
experiment with quite a bit: having a flexible query front end in the
user-land audit daemon could allow us to experiment with various backends
with various degrees of complexity.
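
At the simple end of that range, an in-kernel filter could be nothing
more than a bitmap indexed by syscall number, with anything more
elaborate left to the user-land daemon.  The sketch below assumes an
upper bound of 512 syscall numbers; that bound and all of the names are
made up for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <string.h>

#define AUDIT_MAXSYSCALL 512	/* assumed upper bound on syscall numbers */
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

struct audit_filter {
	unsigned long mask[(AUDIT_MAXSYSCALL + BITS_PER_WORD - 1) /
	    BITS_PER_WORD];
};

/* Start with auditing disabled for every syscall. */
void
audit_filter_clear(struct audit_filter *f)
{
	memset(f, 0, sizeof(*f));
}

/* Request audit records for one syscall number. */
void
audit_filter_enable(struct audit_filter *f, int code)
{
	unsigned c;

	assert(code >= 0 && code < AUDIT_MAXSYSCALL);
	c = (unsigned)code;
	f->mask[c / BITS_PER_WORD] |= 1UL << (c % BITS_PER_WORD);
}

/* Constant-time check on the syscall path: one index, one shift, no loop. */
bool
audit_filter_match(const struct audit_filter *f, int code)
{
	unsigned c;

	if (code < 0 || code >= AUDIT_MAXSYSCALL)
		return (false);
	c = (unsigned)code;
	return (((f->mask[c / BITS_PER_WORD] >> (c % BITS_PER_WORD)) & 1UL)
	    != 0);
}
```

A filter like this cannot look at arguments at all, which is exactly the
part that may be costly; it only decides whether a record is worth
generating in the first place.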

  Robert N M Watson 

robert@fledge.watson.org              http://www.watson.org/~robert/
PGP key fingerprint: AF B5 5F FF A6 4A 79 37  ED 5F 55 E9 58 04 6A B1
TIS Labs at Network Associates, Computing Laboratory at Cambridge University
Safeport Network Services



To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-security" in the body of the message



