From owner-freebsd-security  Fri Jul  9 10:11:56 1999
Delivered-To: freebsd-security@freebsd.org
Received: from ns.mt.sri.com (unknown [206.127.79.91])
	by hub.freebsd.org (Postfix) with ESMTP id 39DA415645
	for <freebsd-security@FreeBSD.ORG>; Fri,  9 Jul 1999 10:11:51 -0700 (PDT)
	(envelope-from nate@mt.sri.com)
Received: from mt.sri.com (rocky.mt.sri.com [206.127.76.100])
	by ns.mt.sri.com (8.8.8/8.8.8) with SMTP id LAA11951;
	Fri, 9 Jul 1999 11:11:33 -0600 (MDT)
	(envelope-from nate@rocky.mt.sri.com)
Received: by mt.sri.com (SMI-8.6/SMI-SVR4)
	id LAA07208; Fri, 9 Jul 1999 11:11:32 -0600
Date: Fri, 9 Jul 1999 11:11:32 -0600
Message-Id: <199907091711.LAA07208@mt.sri.com>
From: Nate Williams <nate@mt.sri.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
To: Robert Watson <robert+freebsd@cyrus.watson.org>
Cc: Nate Williams <nate@mt.sri.com>,
	Darren Reed <avalon@coombs.anu.edu.au>, Ben Gras <ben@nl.euro.net>,
	freebsd-security@FreeBSD.ORG
Subject: Re: how to keep track of root users?
In-Reply-To: <Pine.BSF.3.96.990709123354.24202J-100000@fledge.watson.org>
References: <199907091609.KAA06341@mt.sri.com>
	<Pine.BSF.3.96.990709123354.24202J-100000@fledge.watson.org>
X-Mailer: VM 6.34 under 19.16 "Lille" XEmacs Lucid
Sender: owner-freebsd-security@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

> > > #ifdef POSIX_AUD
> > > 	/* allocate the record */
> > > 	if ((audrec = k_aud_new_record()) == NULL)
> > > 		return(...appropriate error...)
> > 
> > Or not.  Not being able to create an audit record should not cause the
> > syscall to fail, but that's another discussion in itself. :)
> 
> Indeed it is--offhand I see two choices: 1) you don't let a syscall
> succeed unless you can audit it, as otherwise you won't set off your IDS
> as they can overload the system, or 2) an IDS module recognizes that a
> congested audit system may be an attack.  Blocking for an available record
> might be a problem if it results in deadlock...

My thinking is that we 'pre-allocate' an AUDIT_RECORD_FAILED record, and
use it to inform the system that a record could not be generated.
That way you know that something is missing, but you don't slow down
the system or cause deadlock.
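
A rough userland sketch of that idea follows.  Every name in it
(aud_new_record, aud_pool, the 'lost' counter) is invented for
illustration and is not an existing kernel interface; the fixed-size
pool simply stands in for whatever allocator the kernel would use.

```c
#include <assert.h>
#include <stddef.h>

/* All names here are hypothetical; none come from an existing kernel API. */
enum aud_type { AUDIT_RECORD_NORMAL, AUDIT_RECORD_FAILED };

struct aud_record {
	enum aud_type type;
	int in_use;
	unsigned lost;		/* records dropped so far (failure marker only) */
};

#define AUD_POOL_SIZE 4

static struct aud_record aud_pool[AUD_POOL_SIZE];

/* Reserved once, up front, so reporting a failure never allocates. */
static struct aud_record aud_failed = { AUDIT_RECORD_FAILED, 0, 0 };

/* Never blocks and never returns NULL. */
struct aud_record *
aud_new_record(void)
{
	int i;

	for (i = 0; i < AUD_POOL_SIZE; i++) {
		if (!aud_pool[i].in_use) {
			aud_pool[i].in_use = 1;
			aud_pool[i].type = AUDIT_RECORD_NORMAL;
			aud_pool[i].lost = 0;
			return (&aud_pool[i]);
		}
	}
	/* Pool exhausted: count the loss and hand back the shared marker. */
	aud_failed.lost++;
	return (&aud_failed);
}

void
aud_free_record(struct aud_record *r)
{
	assert(r != NULL);
	if (r != &aud_failed)	/* the marker is never returned to the pool */
		r->in_use = 0;
}
```

The syscall never fails and never sleeps for lack of a record; the
audit consumer sees an AUDIT_RECORD_FAILED marker with a count, which
an IDS could treat as a sign of congestion or attack.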

> > > The problem raised here again, of course, is the copyin of string
> > > arguments.
> > 
> > I don't see any way around this, given the audit record needs to exist
> > as a discrete record that has a lifetime outside of the syscall, so the
> > information must be copied in.  Yes, it does mean that it will have to
> > be copied in to be stored in the kernel, and then copied out, but given
> > that the time difference between in/out could be long (in terms of
> > computer time) I can't think of another solution.
> > 
> > Does anyone else have any ideas?
> 
> My concern was that it was being copied in twice, as opposed to that it
> was being copied in.  I'm tempted to pull the copyin out of namei and
> instead pass in a string buffer to namei, stored in kernel space.

Ahh, I understand now.  You are worried about one or the other of
namei/audit copyin being redundant.  I misunderstood both you and
Garrett.  Would it be possible to copy the string from the namei buffer,
thus avoiding the issue of modifying namei?
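
To make the question concrete, here is a small userland sketch of
copying from the lookup's kernel buffer instead of doing a second
copyin.  The struct and function names are made up, and fake_copyinstr
is only a stand-in for the real user-to-kernel copy; how long the real
namei buffer stays valid is something I have not checked.

```c
#include <assert.h>
#include <string.h>

#define PATH_BUF 128

static int copyin_calls;	/* counts simulated user-to-kernel copies */

/* Stand-in for the kernel's user-to-kernel string copy. */
static int
fake_copyinstr(const char *uaddr, char *kaddr, size_t len)
{
	size_t n = strlen(uaddr) + 1;

	copyin_calls++;
	if (n > len)
		return (-1);
	memcpy(kaddr, uaddr, n);
	return (0);
}

/* Hypothetical lookup state: owns the kernel copy of the pathname. */
struct lookup {
	char pathbuf[PATH_BUF];
};

/* Hypothetical audit record with room for one pathname. */
struct aud_rec {
	char path[PATH_BUF];
};

/* The lookup does the one and only copy from user space. */
int
lookup_start(struct lookup *lk, const char *upath)
{
	return (fake_copyinstr(upath, lk->pathbuf, sizeof(lk->pathbuf)));
}

/* The audit code copies kernel-to-kernel; user space is not read again. */
void
aud_add_path(struct aud_rec *ar, const struct lookup *lk)
{
	assert(strlen(lk->pathbuf) < sizeof(ar->path));
	strcpy(ar->path, lk->pathbuf);
}
```

A side benefit of a single copy is that the audit record holds exactly
the string the lookup used, even if the process rewrites its buffer in
the meantime.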

> > > Another problem is error-handling: at any possible exit point
> > > from the syscall, we need to commit an audit record describing the exit
> > > (in error, success, etc).
> > 
> > Again, I am in total agreement with you.  Especially given that we have
> > already agreed that the type of information gathered is already
> > syscall specific.  Adding exit hooks isn't that much more difficult.
> 
> I did something like this to add speculative process execution to
> FreeBSD/i386 a few months ago (that is, generating disk prefetch hints
> based on speculatively executing a sandboxed process copy), and it proved
> quite straightforward.  However, I believe the architecture-dependent code
> is what sits directly below the syscall code: we should perhaps insert
> another architecture-independent layer that wraps the syscall, where
> things like this can be placed.

However, in the 'generic' code, it may not be obvious why the error
occurred, and this makes it more difficult to generate an audit record
'atomically', since the creation of the record happens in a completely
different code-base from the 'end' of the record.  We'd need to design
some sort of event model in the audit record generation code, as well as
pass in information in each sub-record to identify which record the
sub-record belongs to.
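
As a sketch of what such an event model might look like (all names
hypothetical, and a flat in-memory log standing in for the real audit
stream): each sub-record carries the id of the record it belongs to,
so the syscall code and the generic exit code can emit their pieces
independently.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout; this only illustrates the tagging idea. */
enum aud_event { AUD_EV_BEGIN, AUD_EV_ARG, AUD_EV_END };

struct aud_subrec {
	unsigned long rec_id;	/* which audit record this piece belongs to */
	enum aud_event event;
	int value;		/* argument value, or error for AUD_EV_END */
};

#define AUD_LOG_SIZE 32

static struct aud_subrec aud_log[AUD_LOG_SIZE];
static size_t aud_log_len;
static unsigned long aud_next_id = 1;

/* Append one sub-record; returns 0 on success, -1 if the log is full. */
static int
aud_emit(unsigned long rec_id, enum aud_event event, int value)
{
	assert(rec_id != 0);	/* ids start at 1 */
	if (aud_log_len == AUD_LOG_SIZE)
		return (-1);
	aud_log[aud_log_len].rec_id = rec_id;
	aud_log[aud_log_len].event = event;
	aud_log[aud_log_len].value = value;
	aud_log_len++;
	return (0);
}

/* Called by the syscall-specific code, which knows the arguments. */
unsigned long
aud_begin(void)
{
	unsigned long id = aud_next_id++;

	(void)aud_emit(id, AUD_EV_BEGIN, 0);
	return (id);
}

int
aud_arg(unsigned long id, int value)
{
	return (aud_emit(id, AUD_EV_ARG, value));
}

/* Called by the generic exit code, which knows only the id and the error. */
int
aud_end(unsigned long id, int error)
{
	return (aud_emit(id, AUD_EV_END, error));
}

/* A record is complete once both its BEGIN and its END have been seen. */
int
aud_is_complete(unsigned long id)
{
	int begun = 0, ended = 0;
	size_t i;

	for (i = 0; i < aud_log_len; i++) {
		if (aud_log[i].rec_id != id)
			continue;
		if (aud_log[i].event == AUD_EV_BEGIN)
			begun = 1;
		if (aud_log[i].event == AUD_EV_END)
			ended = 1;
	}
	return (begun && ended);
}
```

Sub-records from different syscalls can interleave freely in the log;
the reader reassembles them by rec_id.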

> Similarly, auditing signal delivery would
> need to happen the same way: currently signal delivery lives in
> architecture-dependent-land, and we'd want the auditing wrapper to sit
> somewhere independent of architecture, I suspect.

Are signals required for IDS?  (Showing my ignorance here...)

> > > This suggests instead making auditing to some
> > > extent implicit to the syscall: a record is created associated with the
> > > process structure (or thread or whatever) when entering kernel mode, and
> > > committed when returning to userland (or explicitly committed if we are
> > > never going to return, i.e., the process called _exit).  Kernel code in
> > > the syscall may optionally add additional information about the kernel
> > > entry point using a set of calls that automatically modify the implicit
> > > audit record state associated with the proc, meaning no need to allocate
> > > an audrec or pass it into all the routines, as it might be in
> > > p->p_curaudrec.
> > > 
> > > #ifdef POSIX_AUD
> > > 	AUD_SET_SYSCALL(AUD_AEV_CHMOD);
> > > 	AUD_ADD_ARG(AUD_PATHNAME, ...);
> > > 	AUD_ADD_ARG(AUD_MODE, SC(args, mode));
> > > 	...
> > > #endif
> > 
> > I don't think this will work, simply because how do we differentiate
> > between different syscalls that will eventually be running in parallel in
> > the kernel?
> 
> As Garrett mentions, there will still be a context record from somewhere
> that could be extended to carry an active audit record for the activities
> of that context.  Presumably that is the place to put it?

How is this record 'identified' from the other records being generated
in parallel by the other CPUs?  (In other words, what distinguishes this
process from the other processes in the above code?  We're not passing
the proc structure around....)

> > > Information like credentials, pid, return code, syscall number would
> > > automatically be inserted when available by the syscall handler.
> > 
> > Why not just add it at the entry point to the syscall?  We're going to
> > have to instrument them all anyway, so why not make things consistent by
> > instrumenting at the syscall entry/exit points?
> 
> Yes.
> 
> > > Syscall
> > > number could be overridden by an explicit call.  This still leaves us with
> > > dealing with the arguments, especially pathnames and arrays of strings
> > > (e.g., argv[] or env[]).
> > 
> > See above.  We can properly deal with the arguments if we have the
> > context of these arguments, which of course we do at a particular syscall
> > entry point (execve, chmod, link, etc...)
> 
> But we do have to know what the syscall is, otherwise we don't understand
> the arguments.

We have a miscommunication.  When I say syscall entry/exit points, I'm
*NOT* talking about the machine dependent points, I'm talking about the
machine independent points.

In /sys/kern/kern_exec.c, the syscall 'entry/exit' point is the
routine:

/*
 * execve() system call.
 */
int
execve(p, uap, retval)
        struct proc *p;
        register struct execve_args *uap;
        int *retval;
{

This is the code that needs to be instrumented, otherwise we have a
nightmare on our hands.  We need to know that kind of information
anyway, so why not put it in the most likely place?  This also buys us
cross-platform compatibility (no MD code), and makes it *very*
obvious what information is gathered.

Unfortunately, it means changing lots of kernel files, but to do this
correctly and in a way that is understandable, I don't see a better
solution.

Trying to 'sniff' what the syscall is at a lower layer and generating
the necessary information means we may end up doing the same sort of
information gathering that already exists in the real system call
implementation.
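
Something like the following is what I have in mind for the shape of
an instrumented machine-independent routine.  It is a userland sketch
only: demo_execve, aud_begin and aud_commit are made-up names, and the
body is a stand-in rather than anything from kern_exec.c.  The point
is that every exit path funnels through one commit.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical audit record and helpers; not an existing kernel interface. */
struct aud_rec {
	const char *syscall;
	char path[64];
	int error;
	int committed;
};

static void
aud_begin(struct aud_rec *ar, const char *name)
{
	memset(ar, 0, sizeof(*ar));
	ar->syscall = name;
}

static void
aud_commit(struct aud_rec *ar, int error)
{
	assert(ar != NULL);
	ar->error = error;
	ar->committed = 1;
}

/*
 * Stand-in for a machine-independent syscall routine such as execve().
 * Every exit path goes through "out", so the record is always committed.
 */
int
demo_execve(struct aud_rec *ar, const char *path)
{
	int error = 0;

	aud_begin(ar, "execve");
	if (path == NULL) {
		error = EFAULT;
		goto out;
	}
	if (strlen(path) >= sizeof(ar->path)) {
		error = ENAMETOOLONG;
		goto out;
	}
	strcpy(ar->path, path);		/* length checked above */
	if (path[0] == '\0')
		error = ENOENT;
out:
	aud_commit(ar, error);
	return (error);
}
```

Because the routine already knows which syscall it is and what its
arguments mean, nothing has to be re-derived at a lower layer.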

In other words, I think we're in violent agreement, but I'm not sure. ;)

[ Kernel filtering ]

> > However, this is probably not needed for IDS-V1. :)
> 
> I agree.  Currently I have a userland matching mechanism, but it's not
> very efficient as this is really just an initial exploration.  My feeling
> is that some very simple limiting mechanisms would be quite sufficient to
> block the majority of the unneeded record generation.  For example,
> per-syscall-number and per-pid, per-uid, etc, plus a per-process enable
> flag on auditing.  This can be caught quite early, and all submissions
> become no-ops on the record.
> 
> > > POSIX.1E only defines a way to tell whether auditing is turned on or off
> > > for a specific process, and to toggle that (so that, for example, the
> > > audit daemon can turn off auditing so as to prevent feedback on audit
> > > record delivery).  This seems too broad to me.  Suppose active IDS modules
> > > only require fork(), exec() and exit() tracing--then delivering the
> > > 20,000 calls to gettimeofday() is a waste of resources.
> > 
> > See above.  However, building a truly generic filtering mechanism would
> > be 'hard to do', so for now I think we can live with no filtering, or a
> > very simple filtering scheme.  But whether the FreeBSD kernel maintainers
> > will allow this is another story. :(
> 
> See above: simple stuff in kernel may be the optimum approach, and I
> suspect a little bit of simple goes a long way.

Agreed, although a mechanism similar to BPF may allow for more 'complex'
filtering mechanisms and still be quite efficient in the kernel.
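
To give a feel for it, here is a toy filter interpreter in the spirit
of BPF.  It is not BPF's actual instruction set; the record layout,
opcodes and field names are all invented for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record layout and opcodes, loosely modeled on the BPF idea. */
struct aud_rec {
	int syscall;
	int pid;
	int uid;
};

enum aud_field { F_SYSCALL, F_PID, F_UID };
enum aud_op { OP_JEQ, OP_ACCEPT, OP_REJECT };

struct aud_insn {
	enum aud_op op;
	enum aud_field field;	/* used by OP_JEQ only */
	int k;			/* constant to compare against */
	unsigned char jt, jf;	/* instructions to skip if equal / not equal */
};

static int
aud_field_value(const struct aud_rec *r, enum aud_field f)
{
	switch (f) {
	case F_SYSCALL:
		return (r->syscall);
	case F_PID:
		return (r->pid);
	default:
		return (r->uid);
	}
}

/*
 * Returns 1 to keep the record, 0 to drop it.  Jumps only go forward,
 * so every program terminates.
 */
int
aud_filter(const struct aud_insn *prog, size_t len, const struct aud_rec *r)
{
	size_t pc = 0;

	assert(prog != NULL && r != NULL);
	while (pc < len) {
		const struct aud_insn *in = &prog[pc];

		switch (in->op) {
		case OP_ACCEPT:
			return (1);
		case OP_REJECT:
			return (0);
		case OP_JEQ:
			pc += 1 + (aud_field_value(r, in->field) == in->k ?
			    in->jt : in->jf);
			break;
		}
	}
	return (0);	/* fell off the end: drop */
}
```

A four-instruction program (two OP_JEQ tests on F_SYSCALL, then
OP_ACCEPT and OP_REJECT) is enough to keep only fork and exec records
and drop the gettimeofday() noise, and the same interpreter handles
per-pid or per-uid rules without new kernel code.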


Nate

