Date:      Sun, 7 Jan 2007 11:49:56 +0000 (GMT)
From:      Robert Watson <rwatson@FreeBSD.org>
To:        Ceri Davies <ceri@submonkey.net>
Cc:        stable@FreeBSD.org
Subject:   Re: (audit?) Panic in 6.2-PRERELEASE
Message-ID:  <20070107114243.K41371@fledge.watson.org>
In-Reply-To: <20070106132540.GG7088@submonkey.net>
References:  <20070105111954.GA51511@submonkey.net> <20070105120539.H46119@fledge.watson.org> <20070105131528.GB7088@submonkey.net> <20070105133028.F98541@fledge.watson.org> <20070105150857.GC7088@submonkey.net> <20070106120040.N46119@fledge.watson.org> <20070106132540.GG7088@submonkey.net>

On Sat, 6 Jan 2007, Ceri Davies wrote:

>>> So far it's happened this morning and yesterday morning.  I haven't seen 
>>> it before that.  I don't know the cause so I can't reproduce it at will, 
>>> but the logs don't give any indication.  Chances are that it will happen 
>>> again tomorrow, but we'll see.
>>
>> Hmm.  It looks like you printed *(td->td_proc->p_fd->fd_ofiles) without the 
>> array index.  Could you repeat that, but with the array index -- i.e., 
>> td->td_proc->p_fd->fd_ofiles[uap->fd]?  Also, it would probably be useful 
>> to print uap->fd.  Right now you're printing stdin (index 0), but if the 
>> index is non-0, we want a different file.
>
> Very tactfully put :)  Sorry about that.
>
> None of the uap->fd's seem to be valid. In the first case, uap->fd is way 
> too high for the length of fd_ofiles, which only has 21 elements:
>
> (kgdb) up 8
> #8  0xc04c470d in fstat (td=0xc2eeb180, uap=0xd610dc74) at /usr/src/sys/kern/kern_descrip.c:1075
> 1075            error = kern_fstat(td, uap->fd, &ub);
> (kgdb) p uap->fd
> $1 = 89
> (kgdb) p *td->td_proc->p_fd->fd_ofiles[uap->fd]
> Cannot access memory at address 0x0
>
> In the second, uap->fd is nonsense:
>
> (kgdb) up 8
> #8  0xc04c470d in fstat (td=0xc3109300, uap=0xd617ec74) at /usr/src/sys/kern/kern_descrip.c:1075
> 1075            error = kern_fstat(td, uap->fd, &ub);
> (kgdb) p uap->fd
> $1 = -1023449232
> (kgdb)

Hmm.  So, I reviewed audit_arg_file() closely, and after staring at the code a 
lot, couldn't see anything obvious in either the socket or the vnode/fifo 
case.  I did fix one other bug there, however, which can never actually be 
exercised in 7-CURRENT and is fairly unlikely in 6-STABLE; I will MFC that 
fix in a week.

Could you try printing *td->td_ar?  Maybe this will give us a clue as to how 
far it got.  In particular, this may be able to more reliably give us the file 
descriptor number, which is audited early in the system call.  You might find 
that 'td' is corrupted in many layers of the stack; keep going up until you 
find one where it's good.  It may well be that td->td_ar->k_ar.ar_arg_fd is 
correct, and might confirm that uap->fd is still correct.  We'd also like to 
know if ARG_SOCKINFO, ARG_VNODE1, or ARG_VNODE2 is set in the 
k_ar.ar_valid_arg field.  This may tell us some more about the file descriptor 
even though it appears to have vanished.
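
To be concrete about the flag check: ar_valid_arg is a bit field, so each 
ARG_* flag is tested with a bitwise AND against the printed value.  The 
following is only a standalone sketch of that test.  The EX_* names and their 
bit values are placeholders made up for illustration, not the kernel's real 
ARG_* definitions, and the mapping from flag to file type is just my reading 
of the socket and vnode/fifo cases mentioned above.

```c
#include <stdint.h>

/*
 * Placeholder flag values for illustration only.  These are NOT the
 * kernel's real ARG_* values; take those from the audit headers.
 */
#define EX_ARG_FD       (UINT64_C(1) << 0)
#define EX_ARG_VNODE1   (UINT64_C(1) << 1)
#define EX_ARG_VNODE2   (UINT64_C(1) << 2)
#define EX_ARG_SOCKINFO (UINT64_C(1) << 3)

/* Return 1 if 'flag' is set in the valid-argument mask, 0 otherwise. */
static int
ex_arg_is_set(uint64_t valid_arg, uint64_t flag)
{
	return ((valid_arg & flag) != 0);
}

/*
 * Guess which case of the file-argument code ran, based on which flags
 * were recorded (assumed mapping: socket info vs. vnode attributes).
 */
static const char *
ex_file_kind(uint64_t valid_arg)
{
	if (ex_arg_is_set(valid_arg, EX_ARG_SOCKINFO))
		return ("socket");
	if (ex_arg_is_set(valid_arg, EX_ARG_VNODE1) ||
	    ex_arg_is_set(valid_arg, EX_ARG_VNODE2))
		return ("vnode");
	return ("unknown");
}
```

In kgdb the same test can be done by eye: print the field in hex with p/x and 
compare it against the real flag values.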

I'm quite worried by the fact that the file descriptor seems not to be present 
any more -- this suggests a file descriptor related race of the sort that is 
both quite difficult to figure out and also quite a risk.  It's strange that 
it would only trigger with audit, however--perhaps audit stretches out the 
race.  Is this an SMP box?

Could you print the entire contents of *td->td_proc->p_fd?
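
The reason for wanting the whole structure is to compare the descriptor 
number against the size of the table and to see whether the slot is empty. 
Roughly, the check looks like the sketch below.  It uses stand-in types 
(struct ex_filedesc and struct ex_file are hypothetical, not the kernel's 
definitions) and ignores locking entirely, so treat it as an illustration of 
the comparison rather than the actual lookup code.

```c
#include <stddef.h>

/* Stand-in for an open file entry; hypothetical, for illustration only. */
struct ex_file {
	int ex_type;
};

/* Stand-in for a per-process descriptor table; hypothetical. */
struct ex_filedesc {
	struct ex_file **ex_ofiles;	/* array of open file pointers */
	int ex_nfiles;			/* number of slots in ex_ofiles */
};

/*
 * Return the file for 'fd', or NULL if 'fd' is negative, beyond the end
 * of the table, or refers to an empty slot.
 */
static struct ex_file *
ex_fd_lookup(const struct ex_filedesc *fdp, int fd)
{
	if (fd < 0 || fd >= fdp->ex_nfiles)
		return (NULL);
	return (fdp->ex_ofiles[fd]);
}
```

With a 21-slot table, both 89 and -1023449232 fail the range check before the 
array is ever indexed, which is why values like that reaching fstat() look so 
suspicious.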

Thanks,

Robert N M Watson
Computer Laboratory
University of Cambridge


