Date:      Mon, 14 Apr 2003 12:00:33 +0800
From:      "David Xu" <davidxu@freebsd.org>
To:        "Daniel Eischen" <eischen@pcnet1.pcnet.com>
Cc:        freebsd-threads@freebsd.org
Subject:   libpthread patch
Message-ID:  <006001c3023a$65fe01d0$f001a8c0@davidw2k>
References:  <Pine.BSF.4.21.0304102351360.94222-100000@InterJet.elischer.org>

After testing Daniel's pthread patch, I found that 50% of the ACE test
programs core dumped on my machine. So I studied the libpthread
source code with the patch applied, and found that the main
problem is that thread state transitions are not atomic;
for example, in thr_mutex:

    mutex_queue_enq(*m, curthread);
    curthread->data.mutex = *m;
    /*
     This thread is active and is in a critical
     region (holding the mutex lock); we should
     be able to safely set the state.
    */
    THR_SET_STATE(curthread, PS_MUTEX_WAIT);

    /* Unlock the mutex structure: */
    THR_LOCK_RELEASE(curthread, &(*m)->m_lock);
    /* Schedule the next thread: */
    _thr_sched_switch(curthread);


The thread sets its state to PS_MUTEX_WAIT and then calls
_thr_sched_switch, but it does not hold the scheduler lock in
between, so there is a race between THR_SET_STATE and
_thr_sched_switch. I have inserted _kse_critical_enter() before
THR_SET_STATE; the code now looks as follows:

    mutex_queue_enq(*m, curthread);
    curthread->data.mutex = *m;
    _kse_critical_enter();
    /*
     This thread is active and is in a critical
     region (holding the mutex lock); we should
     be able to safely set the state.
    */
    THR_SET_STATE(curthread, PS_MUTEX_WAIT);

    /* Unlock the mutex structure: */
    THR_LOCK_RELEASE(curthread, &(*m)->m_lock);
    /* Schedule the next thread: */
    _thr_sched_switch(curthread);
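To make the race concrete, here is a deterministic replay of the two orderings. This is a hypothetical, heavily simplified model, not libpthread code: the struct, `set_wait_and_unlock()`, `wakeup()` and the rule that `sched_switch()` requeues a thread still marked running are all assumptions made for illustration. It also assumes, as on UP, that the critical region keeps any other context from running until the switch has completed. Without it, a wakeup can land between THR_SET_STATE and the switch and the thread is queued twice; with it, the wakeup is deferred and the thread is queued once.

```c
#include <assert.h>

/* Hypothetical model of the thr_mutex blocking sequence; names only
 * mirror libpthread, none of this is the real implementation. */
enum thr_state { PS_RUNNING, PS_MUTEX_WAIT };

struct thr {
    enum thr_state state;
    int runq_inserts;      /* times the thread was put on the run queue */
};

static void runq_insert(struct thr *t) { t->runq_inserts++; }

/* Step 1: publish the wait state (THR_SET_STATE), then drop the
 * mutex structure lock (THR_LOCK_RELEASE). */
static void set_wait_and_unlock(struct thr *t) {
    assert(t->state == PS_RUNNING);   /* only a running thread can block */
    t->state = PS_MUTEX_WAIT;
}

/* Another context (e.g. the mutex owner unlocking) makes the waiter runnable. */
static void wakeup(struct thr *t) {
    if (t->state == PS_MUTEX_WAIT) {
        t->state = PS_RUNNING;
        runq_insert(t);
    }
}

/* Step 2: the switch.  Assumed rule: a thread still marked running is
 * treated as yielding and is put back on the run queue. */
static void sched_switch(struct thr *t) {
    if (t->state == PS_RUNNING)
        runq_insert(t);
}

/* No critical region: the wakeup slips in between the two steps. */
static int run_racy(void) {
    struct thr t = { PS_RUNNING, 0 };
    set_wait_and_unlock(&t);
    wakeup(&t);
    sched_switch(&t);
    return t.runq_inserts;             /* queued by wakeup AND by the switch */
}

/* Critical region held across both steps: the wakeup runs afterwards. */
static int run_critical(void) {
    struct thr t = { PS_RUNNING, 0 };
    set_wait_and_unlock(&t);
    sched_switch(&t);
    wakeup(&t);
    return t.runq_inserts;             /* queued once, by wakeup */
}
```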

I also commented out most of the code in thr_lock_wait() and
thr_lock_wakeup(). I think that without a better scheduler lock
this code has a race condition, and in most cases the race shows
up as a thread being reinserted into the runq while it is already
on that queue.
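Until the locking is reworked, the double insertion can at least be detected at the queue itself. A minimal sketch, assuming a hypothetical intrusive singly linked run queue with an `on_runq` flag (these names are illustrative, not the libpthread runq interface): a second insert is refused instead of silently corrupting the links. This only catches the symptom; the actual fix is still making the state transition atomic.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical intrusive run queue; not the libpthread data structure. */
struct thr {
    struct thr *next;
    int on_runq;           /* set while the thread is linked into the queue */
};

struct runq {
    struct thr *head, *tail;
};

/* Returns 0 on success, -1 if the thread is already queued.  Without
 * the flag, reinserting the tail would set t->next = t (a cycle), and
 * reinserting a middle element would cut off everything after it. */
static int runq_insert(struct runq *q, struct thr *t) {
    if (t->on_runq)
        return -1;
    t->next = NULL;
    if (q->tail != NULL)
        q->tail->next = t;
    else
        q->head = t;
    q->tail = t;
    t->on_runq = 1;
    return 0;
}

/* Removes and returns the first thread, or NULL if the queue is empty. */
static struct thr *runq_remove_head(struct runq *q) {
    struct thr *t = q->head;
    if (t == NULL)
        return NULL;
    assert(t->on_runq);
    q->head = t->next;
    if (q->head == NULL)
        q->tail = NULL;
    t->next = NULL;
    t->on_runq = 0;
    return t;
}
```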

Now I can run the ACE test programs without any core dumps,
and only the following programs fail:

Cached_Conn_Test
Conn_Test
MT_Reactor_Timer_Test
Malloc_Test
Process_Strategy_Test
Thread_Pool_Test

A complete log file is at:
http://people.freebsd.org/~davidxu/run_test.log

The libpthread package I modified is at:
http://people.freebsd.org/~davidxu/libpthread.tgz

The crew test program also runs without any problems.

I think the whole scheduler lock should be reworked so that
state transitions are atomic. My change is not SMP safe; it only
works on UP, because _kse_critical_enter() only works on UP
systems. If we fix this scheduler lock problem, I think
libpthread will be stable enough.
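For the SMP case, one possible direction is a real lock shared by the blocking thread and the waker, so that setting the wait state and the waker's check-and-enqueue cannot interleave on different CPUs. Below is a minimal sketch using a C11 `atomic_flag` spinlock; `struct sched`, `thr_block()` and `thr_wakeup()` are hypothetical names, and a real design would also have to keep the lock held across the context switch itself, which this model does not show.

```c
#include <assert.h>
#include <stdatomic.h>

enum thr_state { PS_RUNNING, PS_MUTEX_WAIT };

/* Hypothetical thread and scheduler records; not the libpthread layout. */
struct thr {
    enum thr_state state;
    int on_runq;
};

struct sched {
    atomic_flag lock;      /* protects every thread's state and on_runq */
    int runq_len;
};

static void sched_init(struct sched *s) {
    atomic_flag_clear(&s->lock);
    s->runq_len = 0;
}

static void sched_lock(struct sched *s) {
    while (atomic_flag_test_and_set_explicit(&s->lock, memory_order_acquire))
        ;                  /* spin until the holder releases it */
}

static void sched_unlock(struct sched *s) {
    atomic_flag_clear_explicit(&s->lock, memory_order_release);
}

/* Blocking side: the state change happens under the scheduler lock.
 * In a real implementation the switch to another thread would also
 * happen here, before the lock is released. */
static void thr_block(struct sched *s, struct thr *t) {
    sched_lock(s);
    assert(t->state == PS_RUNNING && !t->on_runq);
    t->state = PS_MUTEX_WAIT;
    sched_unlock(s);
}

/* Waking side: check-and-enqueue under the same lock, so a waiting
 * thread is made runnable at most once.  Returns 1 if it was woken. */
static int thr_wakeup(struct sched *s, struct thr *t) {
    int woke = 0;
    sched_lock(s);
    if (t->state == PS_MUTEX_WAIT) {
        t->state = PS_RUNNING;
        t->on_runq = 1;
        s->runq_len++;
        woke = 1;
    }
    sched_unlock(s);
    return woke;
}
```

A spinlock rather than a sleeping lock is the natural fit here, since the code holding it is the scheduler itself and has nothing else to switch to while waiting.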

David Xu