Date:      Tue, 11 Feb 2014 13:25:24 -0500
From:      John Baldwin <jhb@freebsd.org>
To:        freebsd-hackers@freebsd.org
Cc:        Jens Krieg <jkrieg@mailbox.tu-berlin.de>
Subject:   Re: ULE locking mechanism
Message-ID:  <201402111325.24523.jhb@freebsd.org>
In-Reply-To: <FD4193F4-FA47-4D77-BC1F-23749D9B7E5F@mailbox.tu-berlin.de>
References:  <FD4193F4-FA47-4D77-BC1F-23749D9B7E5F@mailbox.tu-berlin.de>

On Tuesday, January 28, 2014 8:07:08 am Jens Krieg wrote:
> Hello,
>
> We are currently working on a project for our university. Our goal is to
> implement a simple round-robin scheduler for FreeBSD 9.2 on a single-core
> machine.
> So far we have removed most of the functionality of the ULE scheduler except
> for the functions that are called from outside. The system successfully boots
> to userland with our RR scheduler managing threads in a list-based run queue.
> Furthermore, it is possible to interact with the system using the shell.
>
> The next step is to replace the locking mechanism of the ULE scheduler.
> Therefore, we replaced the scheduler-dependent thread_lock/thread_unlock
> functions by simply disabling/enabling interrupts. With this modification
> the kernel works fine until we reach userland; then the system crashes.
> The error occurs in the init user process (init_main.c:start_init:685). We
> found out that the page fault is triggered while executing the subyte
> function for the first time. See the error description below (unfortunately
> not shown in the backtrace).
> We compared the ULE scheduler with our RR implementation, and it appears
> that the parameters passed to subyte as well as the register values are
> identical. We assume that whatever caused the error is related to the
> thread locking replacement.
>
> Every time the kernel wants to modify thread data, the corresponding thread
> is locked to prevent any interference by other threads. Since we are using a
> single-core machine, why isn't it sufficient to simply disable interrupts
> while modifying thread data? Could you provide us with detailed information
> about the locking mechanism in FreeBSD and also answer the following
> questions, please?
>
> What is the purpose of thread_lock/thread_unlock besides protecting thread
> data?
> How does the TDQ LOCK work, and how is it related to a thread LOCK?
> 	- all thread LOCKs of the threads located in the run queue point to the
> 	  TDQ LOCK, and
> 	- the TDQ LOCK points to the currently running thread
> 	- on a context switch the current thread passes the TDQ LOCK to the newly
> 	  chosen thread
> 	- Could you explain the idea behind that locking concept, please?
> Any suggestions we should take care of in our own lock implementation?
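The hand-off described in the question can be pictured with a toy model. This is a hedged sketch, not kernel code: lock_model, thread_model, yield_model() and unlock_model() are hypothetical names, the "lock" merely records its owner, and only the case of a thread that yields and stays on the run queue is covered (ULE's real sched_switch() also handles threads that are going to sleep, which this omits). What it shows is the idea behind the question: the run-queue lock is never dropped in the middle of the switch; its ownership passes directly to the incoming thread, which is then the one responsible for releasing it.

```c
#include <assert.h>
#include <stddef.h>

struct thread_model;

/* Hypothetical toy lock: it only records which thread owns it. */
struct lock_model {
    const struct thread_model *owner;
};

/* Hypothetical toy thread: td_lock says which lock currently covers it. */
struct thread_model {
    struct lock_model *td_lock;
};

/*
 * Toy "yield": the outgoing thread td enters holding the run-queue lock
 * through its own td_lock and stays on the run queue.  The incoming
 * thread newtd is on the same run queue, so its td_lock already points
 * at that lock.  Instead of unlocking and relocking, ownership is handed
 * over, so there is no window in which the run queue is unlocked.
 */
static struct thread_model *yield_model(struct lock_model *tdq_lock,
                                        struct thread_model *td,
                                        struct thread_model *newtd)
{
    assert(td->td_lock == tdq_lock && tdq_lock->owner == td);
    assert(newtd->td_lock == tdq_lock);

    tdq_lock->owner = newtd;    /* hand off; the lock is never released */
    return newtd;               /* newtd now runs holding tdq_lock */
}

/* The thread that ends up owning the lock is the one that releases it. */
static void unlock_model(struct thread_model *td)
{
    assert(td->td_lock->owner == td);
    td->td_lock->owner = NULL;
}
```

On a uniprocessor kernel, where a spin lock largely amounts to disabled interrupts, the same reasoning suggests that interrupts stay disabled across the switch and that the incoming thread is the one that eventually re-enables them.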

thread_lock is quite intertwined with other locks.  E.g. when a thread is
blocked on a turnstile, thread_lock() for that thread locks the 'ts_lock'
spin mutex for that turnstile.  If you want to replace thread_lock, you need
to change all the locks that td_lock can point to so that they use your new
primitive.  You'd probably have an easier time just changing how
mtx_lock_spin() works.  (In fact, if you just disable 'options SMP', the
stock kernel turns mtx_lock_spin() into a function that just disables
interrupts.)
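To make the td_lock point concrete, here is a user-space sketch, assuming C11 atomics, of a thread lock that follows a td_lock pointer. It is loosely modeled on the retry loop in FreeBSD's thread_lock() and on thread_lock_set(); the names spin_mtx, thread_lock_model(), thread_lock_set_model() and the other helpers are hypothetical, and this is not kernel code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for a kernel spin mutex. */
struct spin_mtx {
    atomic_bool held;
    const char *name;
};

static void spin_init(struct spin_mtx *m, const char *name)
{
    atomic_init(&m->held, false);
    m->name = name;
}

static void spin_lock(struct spin_mtx *m)
{
    /* Busy-wait until this caller flips held from false to true. */
    while (atomic_exchange_explicit(&m->held, true, memory_order_acquire))
        ;
}

static void spin_unlock(struct spin_mtx *m)
{
    atomic_store_explicit(&m->held, false, memory_order_release);
}

static bool spin_held(struct spin_mtx *m)
{
    return atomic_load(&m->held);
}

/* Hypothetical model thread: td_lock names whichever lock protects it now
 * (a run-queue lock, a turnstile's ts_lock, ...). */
struct thread {
    _Atomic(struct spin_mtx *) td_lock;
};

static void thread_init(struct thread *td, struct spin_mtx *lock)
{
    atomic_init(&td->td_lock, lock);
}

/* Lock whatever td_lock points to, then re-check that it still points
 * there; if the thread was moved to another lock meanwhile, retry. */
static struct spin_mtx *thread_lock_model(struct thread *td)
{
    for (;;) {
        struct spin_mtx *m = atomic_load(&td->td_lock);
        spin_lock(m);
        if (m == atomic_load(&td->td_lock))
            return m;
        spin_unlock(m);
    }
}

static void thread_unlock_model(struct thread *td)
{
    spin_unlock(atomic_load(&td->td_lock));
}

/* Move the thread under another lock.  The caller must hold both the
 * current td_lock and new_lock; the old lock is released on return. */
static void thread_lock_set_model(struct thread *td, struct spin_mtx *new_lock)
{
    struct spin_mtx *old = atomic_load(&td->td_lock);

    assert(spin_held(old) && spin_held(new_lock));
    atomic_store(&td->td_lock, new_lock);
    spin_unlock(old);
}
```

In this model a thread on a run queue is locked through the queue's lock; to park it on a turnstile, the caller holds both that lock and the turnstile's ts_lock and repoints td_lock, after which thread_lock_model() on that thread takes ts_lock instead. That is why replacing thread_lock alone is not enough: every lock td_lock can point to has to use the same primitive.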

For your core dump, the first step would be to use gdb to map that address to
a source file and line.  For example, you can just do 'l *fork_exit+0x9d', or
you can do 'l *<instruction pointer>' where you use the value from the trap
message.  Looking at that can probably tell you why you panicked.

-- 
John Baldwin


