From owner-freebsd-threads@freebsd.org Mon Dec 5 15:25:24 2016
Return-Path:
Delivered-To: freebsd-threads@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 67DA2C6713C for ; Mon, 5 Dec 2016 15:25:24 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org)
Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2001:1900:2254:206a::16:76]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 57B0F1F30 for ; Mon, 5 Dec 2016 15:25:24 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org)
Received: from bugs.freebsd.org ([127.0.1.118]) by kenobi.freebsd.org (8.15.2/8.15.2) with ESMTP id uB5FPO3k009391 for ; Mon, 5 Dec 2016 15:25:24 GMT (envelope-from bugzilla-noreply@freebsd.org)
From: bugzilla-noreply@freebsd.org
To: freebsd-threads@FreeBSD.org
Subject: [Bug 214540] pam_exec isn't multithreading save
Date: Mon, 05 Dec 2016 15:25:24 +0000
X-Bugzilla-Reason: AssignedTo
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: Base System
X-Bugzilla-Component: threads
X-Bugzilla-Version: 11.0-RELEASE
X-Bugzilla-Keywords:
X-Bugzilla-Severity: Affects Some People
X-Bugzilla-Who: crest@bultmann.eu
X-Bugzilla-Status: New
X-Bugzilla-Resolution:
X-Bugzilla-Priority: ---
X-Bugzilla-Assigned-To: freebsd-threads@FreeBSD.org
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields:
Message-ID:
In-Reply-To:
References:
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-BeenThere: freebsd-threads@freebsd.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: Threading on FreeBSD
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Mon, 05 Dec 2016 15:25:24 -0000
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=214540

--- Comment #2 from Jan Bramkamp ---
I found one case where the same issue in the Linux PAM pam_exec module broke vsftpd, and vsftpd had to work around the problem because, as far as I know, Linux lacks the required API to avoid this problem completely. I know that pam_exec is a hack and should only be used for testing or after very careful analysis. On the other hand, the documentation doesn't warn users about the problem, and it's a nasty layering violation that blows up in the system administrator's face. I don't want to be the poor bastard who has to debug this under time pressure. The PAM policy isn't supposed to inject race conditions into otherwise "working" applications. Pointing to a "non-normative recommendation" won't help users bitten by this problem.

My problem with this is that it's an accident waiting to happen, and FreeBSD has the APIs to avoid this whole bug class. To make it worse, the ones who will run into the problem (system admins) are often incapable of debugging and patching applications complex enough to use pthreads and PAM.
--
You are receiving this mail because:
You are the assignee for the bug.

From owner-freebsd-threads@freebsd.org Tue Dec 6 10:59:25 2016
From: Dimitri Staessens
Subject: Unlocking a robust mutex in a cleanup handler
To: freebsd-threads@freebsd.org
Message-ID: <119e59d4-6125-f313-e6e6-67055a15d224@intec.ugent.be>
Date: Tue, 6 Dec 2016 11:49:24 +0100

Dear devs,

first of all, thank you for supporting robust mutexes in FreeBSD 11.

I'm having some issues with a thread that holds a robust mutex (residing in a POSIX shared memory (shm) segment) in conjunction with a condition variable (also in that POSIX shm) on which that thread is blocked via a pthread_cond_wait() call. pthread_cond_wait() tries to retake the mutex after the thread cancellation signal is received, and there is a pthread_mutex_unlock pushed to the cleanup stack to unlock that robust mutex in case of a cancellation.

Cancelling that thread works fine on Linux. However, on FreeBSD 11.0-RELEASE, if I pthread_cancel that thread I can't get past the following check and resulting PANIC call:
https://github.com/freebsd/freebsd/blob/master/lib/libthr/thread/thr_mutex.c#L187

After removing the check and recompiling libthr everything seems to work fine.

Could this be a bug in libthr, or am I missing a nuance in the use of shared robust mutexes in conjunction with condition variables?

Kind regards,

Dimitri

--
Dimitri Staessens
Ghent University - imec
Dept.
of Information Technology (INTEC)
Internet Based Communication Networks and Services
Technologiepark 15
9052 Zwijnaarde
T: +32 9 331 48 70
F: +32 9 331 48 99

From owner-freebsd-threads@freebsd.org Tue Dec 6 11:26:04 2016
Date: Tue, 6 Dec 2016 13:25:58 +0200
From: Konstantin Belousov
To: Dimitri Staessens
Cc: freebsd-threads@freebsd.org
Subject: Re: Unlocking a robust mutex in a cleanup handler
Message-ID: <20161206112558.GN54029@kib.kiev.ua>
References: <119e59d4-6125-f313-e6e6-67055a15d224@intec.ugent.be>
In-Reply-To: <119e59d4-6125-f313-e6e6-67055a15d224@intec.ugent.be>

On Tue, Dec 06, 2016 at 11:49:24AM +0100, Dimitri Staessens wrote:
> Dear devs,
>
> first of all, thank you for supporting robust mutexes in FreeBSD 11.
>
> I'm having some issues with a thread that holds a robust mutex (residing
> in a POSIX shared memory (shm) segment) in conjunction with a condition
> variable (also in that POSIX shm) on which that thread is blocked via a
> pthread_cond_wait() call. pthread_cond_wait tries to retake the mutex
> after the thread cancellation signal is received, and there is a
> pthread_mutex_unlock pushed to the cleanup stack to unlock that robust
> mutex in case of a cancellation.
>
> Cancelling that thread works fine on Linux, however, on FreeBSD
> 11.0-RELEASE, if I pthread_cancel that thread I can't get past the
> following check and resulting PANIC call:
> https://github.com/freebsd/freebsd/blob/master/lib/libthr/thread/thr_mutex.c#L187
>
> After removing the check and recompiling libthr everything seems to work
> fine.
>
> Could this be a bug in libthr or am I missing a nuance in the use of
> shared robust mutexes in conjunction with condition variables?

Most likely, this is a bug in libthr. But please extract the minimal reproduction case and send it to me.
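
For readers following the thread, the pattern described in the report (a robust mutex, a condition wait, and a cleanup handler that unlocks the mutex on cancellation) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the reporter's test program: the names run_cancel_demo, waiter and unlock_cleanup are made up here, and the mutex is process-private in static storage rather than in a POSIX shm segment. According to the report, libthr in FreeBSD 11.0-RELEASE aborted in this kind of situation, so the sketch shows the behaviour the reporter expected rather than what that release did.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

static pthread_mutex_t mtx;                    /* robust, process-private */
static pthread_cond_t cnd = PTHREAD_COND_INITIALIZER;
static int ready;                              /* predicate; never set in this sketch */

/* Cleanup handler with the void (*)(void *) signature that
 * pthread_cleanup_push() expects, so no function-pointer cast is needed. */
static void unlock_cleanup(void *arg)
{
    pthread_mutex_unlock(arg);
}

static void *waiter(void *arg)
{
    (void)arg;
    /* A robust mutex may report that its previous owner died. */
    if (pthread_mutex_lock(&mtx) == EOWNERDEAD)
        pthread_mutex_consistent(&mtx);

    pthread_cleanup_push(unlock_cleanup, &mtx);
    while (!ready)
        pthread_cond_wait(&cnd, &mtx);  /* cancellation point; POSIX re-acquires
                                           the mutex before cleanup handlers run */
    pthread_cleanup_pop(1);             /* normal path: run the handler too */
    return NULL;
}

/* Hypothetical helper: returns 0 if the cancelled thread left the mutex
 * unlocked, so that the caller can take it without seeing EOWNERDEAD. */
int run_cancel_demo(void)
{
    pthread_mutexattr_t attr;
    pthread_t thr;
    void *res = NULL;
    int err;

    if (pthread_mutexattr_init(&attr) != 0)
        return -1;
    pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST);
    err = pthread_mutex_init(&mtx, &attr);
    pthread_mutexattr_destroy(&attr);
    if (err != 0)
        return -1;

    if (pthread_create(&thr, NULL, waiter, NULL) != 0)
        return -1;
    pthread_cancel(thr);                /* deferred: acted on in pthread_cond_wait() */
    pthread_join(thr, &res);
    assert(res == PTHREAD_CANCELED);

    /* Had the handler not run, a robust mutex would be expected to
     * report EOWNERDEAD here instead of 0. */
    err = pthread_mutex_lock(&mtx);
    if (err == 0)
        pthread_mutex_unlock(&mtx);
    pthread_mutex_destroy(&mtx);
    return err;
}
```

A main() that simply returns run_cancel_demo(), built with something like cc -pthread, should exit with status 0 wherever cancellation inside pthread_cond_wait() behaves as POSIX describes. Using a dedicated handler function instead of casting pthread_mutex_unlock is a stylistic choice of this sketch; the test program later in the thread uses the cast.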

From owner-freebsd-threads@freebsd.org Tue Dec 6 14:20:16 2016
Subject: Re: Unlocking a robust mutex in a cleanup handler
To: Konstantin Belousov
References: <119e59d4-6125-f313-e6e6-67055a15d224@intec.ugent.be> <20161206112558.GN54029@kib.kiev.ua>
Cc: freebsd-threads@freebsd.org
From: Dimitri Staessens
Message-ID: <6a7139cd-b6db-d078-ee5e-b7c590eb13d1@intec.ugent.be>
Date: Tue, 6 Dec 2016 15:17:41 +0100
In-Reply-To: <20161206112558.GN54029@kib.kiev.ua>

Dear Konstantin,

thanks for your immediate response. Please find attached a minimal code example. I do hope I'm sending it in the correct format; I'm new to the community.

The test creates an integer, a robust mutex and a condition variable in shared memory. A thread blocks on that condition variable with the associated mutex. After one second, the main thread cancels the blocking thread.

I compile as follows:

gcc robust_test.c -lpthread -lrt -o robust_test

and run

./robust_test

On Linux it gives the following output:
[dstaesse@phoneutria]$ ./robust_test
Initializing...
Starting thread...
Sleeping for one second...
Thread started...
Cancelling thread...
Thread finished.
Bye.

On FreeBSD I get the following:
$ ./robust_test
Initializing...
Starting thread...
Sleeping for one second...
Thread started...
Cancelling thread...
Fatal error 'inact_mtx enter' at line 188 in file /usr/src/lib/libthr/thread/thr_mutex.c (errno = 0)
Abort trap (core dumped)

Thanks again for your time,

Dimitri

--
Dimitri Staessens
Ghent University - imec
Dept.
of Information Technology (INTEC)
Internet Based Communication Networks and Services
Technologiepark 15
9052 Zwijnaarde
T: +32 9 331 48 70
F: +32 9 331 48 99

From owner-freebsd-threads@freebsd.org Tue Dec 6 14:48:17 2016
Date: Tue, 6 Dec 2016 16:48:12 +0200
From: Konstantin Belousov
To: Dimitri Staessens
Cc: freebsd-threads@freebsd.org
Subject: Re: Unlocking a robust mutex in a cleanup handler
Message-ID: <20161206144812.GS54029@kib.kiev.ua>
References: <119e59d4-6125-f313-e6e6-67055a15d224@intec.ugent.be> <20161206112558.GN54029@kib.kiev.ua> <6a7139cd-b6db-d078-ee5e-b7c590eb13d1@intec.ugent.be>
In-Reply-To: <6a7139cd-b6db-d078-ee5e-b7c590eb13d1@intec.ugent.be>

On Tue, Dec 06, 2016 at 03:17:41PM +0100, Dimitri Staessens wrote:
> Dear Konstantin,
>
> thanks for your immediate response. Please find attached a minimal code
> example. I do hope I'm sending in the correct format, I'm new to the
> community.
>
> The test creates an integer, robust mutex and condition variable in
> shared memory. A thread blocks on that condition variable with the
> associated mutex. After one second, the main thread cancels the blocking
> thread.
>
> I compile as follows:
>
> gcc robust_test.c -lpthread -lrt -o robust_test
>
> and run
>
> ./robust_test
>
> On linux it gives the following output:
> [dstaesse@phoneutria]$ ./robust_test
> Initializing...
> Starting thread...
> Sleeping for one second...
> Thread started...
> Cancelling thread...
> Thread finished.
> Bye.
>
> On FreeBSD I get the following:
> $ ./robust_test
> Initializing...
> Starting thread...
> Sleeping for one second...
> Thread started...
> Cancelling thread...
> Fatal error 'inact_mtx enter' at line 188 in file
> /usr/src/lib/libthr/thread/thr_mutex.c (errno = 0)
> Abort trap (core dumped)

Try this patch. It worked for me.
It is enough to patch and then rebuild only libthr:
cd /usr/src
patch -p1

Subject: Re: Unlocking a robust mutex in a cleanup handler
To: Konstantin Belousov
References: <119e59d4-6125-f313-e6e6-67055a15d224@intec.ugent.be> <20161206112558.GN54029@kib.kiev.ua> <6a7139cd-b6db-d078-ee5e-b7c590eb13d1@intec.ugent.be> <20161206144812.GS54029@kib.kiev.ua>
Cc: freebsd-threads@freebsd.org
From: Dimitri Staessens
Message-ID: <35726dbb-75f7-682d-ad41-c78b96675485@intec.ugent.be>
Date: Tue, 6 Dec 2016 16:30:14 +0100
In-Reply-To: <20161206144812.GS54029@kib.kiev.ua>

Dear Konstantin,

I didn't get the error, but on my machine the thread never exits the condwait when pthread_cancel is called.

gdb output:

$ sudo gdb ./robust_test 3246
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...(no debugging symbols found)...
Attaching to program: /usr/home/dstaesse/robust_test, process 3246
Reading symbols from /lib/libthr.so.3...Reading symbols from /usr/lib/debug//lib/libthr.so.3.debug...done.
[New Thread 801416500 (LWP 100404/robust_test)]
[New Thread 801416000 (LWP 100313/robust_test)]
done.
Loaded symbols for /lib/libthr.so.3
Reading symbols from /usr/lib/librt.so.1...done.
Loaded symbols for /usr/lib/librt.so.1
Reading symbols from /lib/libc.so.7...done.
Loaded symbols for /lib/libc.so.7
Reading symbols from /libexec/ld-elf.so.1...done.
Loaded symbols for /libexec/ld-elf.so.1
[Switching to Thread 801416000 (LWP 100313/robust_test)]
0x00000008008386ac in _umtx_op_err () from /lib/libthr.so.3
(gdb) info threads
* 2 Thread 801416000 (LWP 100313/robust_test) 0x00000008008386ac in _umtx_op_err () from /lib/libthr.so.3
  1 Thread 801416500 (LWP 100404/robust_test) _thr_ast (curthread=0x801416500) at /usr/src/lib/libthr/thread/thr_sig.c:271
Current language: auto; currently minimal
(gdb) bt
#0 0x00000008008386ac in _umtx_op_err () from /lib/libthr.so.3
#1 0x0000000800834df6 in join_common (pthread=, thread_return=, abstime=) at /usr/src/lib/libthr/thread/thr_join.c:125
#2 0x0000000000401186 in main ()
(gdb) thread 1
[Switching to thread 1 (Thread 801416500 (LWP 100404/robust_test))]#0 _thr_ast (curthread=0x801416500) at /usr/src/lib/libthr/thread/thr_sig.c:271
271 check_suspend(curthread);
(gdb) bt
#0 _thr_ast (curthread=0x801416500) at /usr/src/lib/libthr/thread/thr_sig.c:271
#1 0x0000000800837a5b in __thr_pshared_offpage (key=, doalloc=) at /usr/src/lib/libthr/thread/thr_pshared.c:86
#2 0x00000008008363cb in cond_wait_common (cond=, mutex=0x800643004, abstime=0x0, cancel=1) at /usr/src/lib/libthr/thread/thr_cond.c:349
#3 0x0000000000400ff2 in blockfunc ()
#4 0x000000080082ab55 in thread_start (curthread=) at /usr/src/lib/libthr/thread/thr_create.c:289
#5 0x0000000000000000 in ?? ()
(gdb)

cheers,

Dimitri

On 12/06/16 15:48, Konstantin Belousov wrote:
> Try this patch. It worked for me.
> It is enough to patch and then rebuild only libthr:
> cd /usr/src
> patch -p1
> (cd lib/libthr && make WITHOUT_TESTS=yes all install)
>
> diff --git a/lib/libthr/thread/thr_cond.c b/lib/libthr/thread/thr_cond.c
> index 506b8eca9e7..64d075ca06f 100644
> --- a/lib/libthr/thread/thr_cond.c
> +++ b/lib/libthr/thread/thr_cond.c
> @@ -224,16 +224,26 @@ cond_wait_kernel(struct pthread_cond *cvp, struct pthread_mutex *mp,
>  		 * state and unlock the mutex without making the state
>  		 * consistent and the state will be unrecoverable.
>  		 */
> -		if (error2 == 0 && cancel)
> +		if (error2 == 0 && cancel) {
> +			if (robust) {
> +				_mutex_leave_robust(curthread, mp);
> +				robust = false;
> +			}
>  			_thr_testcancel(curthread);
> +		}
>
>  		if (error == EINTR)
>  			error = 0;
>  	} else {
>  		/* We know that it didn't unlock the mutex. */
>  		_mutex_cv_attach(mp, recurse);
> -		if (cancel)
> +		if (cancel) {
> +			if (robust) {
> +				_mutex_leave_robust(curthread, mp);
> +				robust = false;
> +			}
>  			_thr_testcancel(curthread);
> +		}
>  		error2 = 0;
>  	}
>  	if (robust)

--
Dimitri Staessens
Ghent University - imec
Dept.
of Information Technology (INTEC)
Internet Based Communication Networks and Services
Technologiepark 15
9052 Zwijnaarde
T: +32 9 331 48 70
F: +32 9 331 48 99

From owner-freebsd-threads@freebsd.org Tue Dec 6 16:38:18 2016
Date: Tue, 6 Dec 2016 18:38:07 +0200
From: Konstantin Belousov
To: Dimitri Staessens
Cc: freebsd-threads@freebsd.org
Subject: Re: Unlocking a robust mutex in a cleanup handler
Message-ID: <20161206163807.GT54029@kib.kiev.ua>
References: <119e59d4-6125-f313-e6e6-67055a15d224@intec.ugent.be> <20161206112558.GN54029@kib.kiev.ua> <6a7139cd-b6db-d078-ee5e-b7c590eb13d1@intec.ugent.be> <20161206144812.GS54029@kib.kiev.ua> <35726dbb-75f7-682d-ad41-c78b96675485@intec.ugent.be>
In-Reply-To: <35726dbb-75f7-682d-ad41-c78b96675485@intec.ugent.be>

On Tue, Dec 06, 2016 at 04:30:14PM +0100, Dimitri Staessens wrote:
> Dear Konstantin,
>
> I didn't get the error, but on my machine the thread never exits the
> condwait when the pthread_cancel is called.
>
> gdb output:
> [...]

I suspect that there is an issue with the test program itself. If you terminate your program, e.g. with SIGINT/Ctrl-C, then the shm_unlink() call is not performed at the end, and an orphaned locked robust mutex is kept associated with that memory segment. Then, since you have the loop around the pthread_cond_wait() call, it seems reasonable to assume that the next instance of the program gets errors from pthread_mutex_lock() and pthread_cond_wait(), which it ignores.
This is explicitly allowed by POSIX, which states that "Attempting to
initialize an already initialized mutex results in undefined behavior."

Can you try the following modification of your test program, without
rebooting the machine, so that the shared segment and mutex are kept
around?

#define _POSIX_C_SOURCE 200809L
#define __XSI_VISIBLE 500

#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <fcntl.h>
#include <unistd.h>

#define FN "/robust"
#define FS (sizeof(int) + sizeof(pthread_mutex_t) + sizeof(pthread_cond_t))

/* contents of the shm segment */
int * shm_int;
pthread_mutex_t * shm_mtx;
pthread_cond_t * shm_cnd;

/* function for thread */
void * blockfunc(void * o)
{
        int error;

        printf("Thread started...\n");

        error = pthread_mutex_lock(shm_mtx);
        if (error != 0)
                printf("mutex_lock err %d %s\n", error, strerror(error));

        pthread_cleanup_push((void (*)(void *)) pthread_mutex_unlock,
            (void *) shm_mtx);

        error = 0;
        while (*shm_int == 0 && error == 0)
                error = pthread_cond_wait(shm_cnd, shm_mtx);
        if (error != 0)
                printf("cond_wait err %d %s\n", error, strerror(error));

        pthread_cleanup_pop(1);

        return (void *) 0;
}

int
main(void)
{
        /* file descriptor for shm_open */
        int fd;

        /* mutex and condvar attributes */
        pthread_mutexattr_t mattr;
        pthread_condattr_t cattr;

        /* thread that will block on the condvar in shm */
        pthread_t thr;

        printf("Initializing...\n");

        /* create shm segment containing an int, a mutex, and a condvar */
        fd = shm_open(FN, O_CREAT | O_RDWR, 0666);
        ftruncate(fd, FS - 1);
        shm_int = mmap(NULL, FS, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        shm_mtx = (pthread_mutex_t *) (shm_int + 1);
        shm_cnd = (pthread_cond_t *) (shm_mtx + 1);

        close(fd);

        /* initialize the contents */

        pthread_mutexattr_init(&mattr);
        pthread_mutexattr_setpshared(&mattr, PTHREAD_PROCESS_SHARED);
        pthread_mutexattr_setrobust(&mattr, PTHREAD_MUTEX_ROBUST);

        pthread_condattr_init(&cattr);
        pthread_condattr_setpshared(&cattr, PTHREAD_PROCESS_SHARED);

        pthread_mutex_init(shm_mtx, &mattr);
        pthread_cond_init(shm_cnd, &cattr);

        *shm_int = 0;

        /* start the thread */
        printf("Starting thread...\n");
        pthread_create(&thr, NULL, blockfunc, NULL);

        /* sleep for a second */
        printf("Sleeping for one second...\n");
        sleep(1);

        /* cancel the thread */
        printf("Cancelling thread...\n");
        pthread_cancel(thr);

        /* wait for the thread to join */
        pthread_join(thr, NULL);

        printf("Thread finished.\n");
        pthread_mutex_destroy(shm_mtx);
        pthread_cond_destroy(shm_cnd);

        /* cleanup shared memory */
        munmap(shm_int, FS);
        shm_unlink(FN);

        printf("Bye.\n");
        return (0);
}

From owner-freebsd-threads@freebsd.org Tue Dec 6 17:01:10 2016 Return-Path: Delivered-To: freebsd-threads@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 48FB5C6ACEE for ; Tue, 6 Dec 2016 17:01:10 +0000 (UTC) (envelope-from dimitri.staessens@intec.ugent.be) Received: from smtp2.ugent.be (smtp2.ugent.be [157.193.49.126]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 04FC369A for ; Tue, 6 Dec 2016 17:01:09 +0000 (UTC) (envelope-from dimitri.staessens@intec.ugent.be) Received: from localhost (mcheck3.ugent.be [157.193.71.89]) by smtp2.ugent.be (Postfix) with ESMTP id 7FBF8B2223; Tue, 6 Dec 2016 18:01:05 +0100 (CET) X-Virus-Scanned: by UGent DICT Received: from smtp2.ugent.be ([157.193.49.126]) by localhost (mcheck3.ugent.be [157.193.43.11]) (amavisd-new, port 10024) with ESMTP id l8Q3ZEKEbg2Z; Tue, 6 Dec 2016 18:01:05 +0100 (CET) Received: from mail2.intec.ugent.be (mail2.intec.ugent.be [157.193.214.245]) by smtp2.ugent.be (Postfix) with ESMTP id E1B90B2220; Tue, 6 Dec 2016 18:01:04 +0100 (CET) Received: from [192.168.66.170] (78-22-161-144.access.telenet.be [78.22.161.144]) (using TLSv1 with cipher ECDHE-RSA-AES128-SHA (128/128 bits)) (No client certificate requested) (Authenticated sender: dstaesse) by mail2.intec.ugent.be (Postfix) with ESMTPSA
id AC38B2C; Tue, 6 Dec 2016 18:01:03 +0100 (CET) Subject: Re: Unlocking a robust mutex in a cleanup handler To: Konstantin Belousov References: <119e59d4-6125-f313-e6e6-67055a15d224@intec.ugent.be> <20161206112558.GN54029@kib.kiev.ua> <6a7139cd-b6db-d078-ee5e-b7c590eb13d1@intec.ugent.be> <20161206144812.GS54029@kib.kiev.ua> <35726dbb-75f7-682d-ad41-c78b96675485@intec.ugent.be> <20161206163807.GT54029@kib.kiev.ua> Cc: freebsd-threads@freebsd.org From: Dimitri Staessens Message-ID: <1c235c8f-b1db-f107-63e2-28e099e17667@intec.ugent.be> Date: Tue, 6 Dec 2016 17:58:32 +0100 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.5.1 MIME-Version: 1.0 In-Reply-To: <20161206163807.GT54029@kib.kiev.ua> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit X-Miltered: at jchkm3 with ID 5846EED0.001 by Joe's j-chkmail (http://helpdesk.ugent.be/email/)! X-j-chkmail-Enveloppe: 5846EED0.001 from mail2.intec.ugent.be/mail2.intec.ugent.be/157.193.214.245/mail2.intec.ugent.be/ X-j-chkmail-Score: MSGID : 5846EED0.001 on smtp2.ugent.be : j-chkmail score : . : R=. U=. O=. B=0.000 -> S=0.000 X-j-chkmail-Status: Ham X-BeenThere: freebsd-threads@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Threading on FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 06 Dec 2016 17:01:10 -0000

Hi Konstantin,

you're right. I went home from work, so the machine was rebooted. Your
patch works.

It must have choked on the already initialized mutex in the shm segment,
as you suggest; the small test wasn't handling that case. I can't
reproduce the situation by killing the process while it is running (it was
probably the specific location of the PANIC that left the mutex in the
particular state that caused the lock-up).

Patch confirmed as working, thanks for your help!
Dimitri

-- 
Dimitri Staessens
Ghent University - imec
Dept. of Information Technology (INTEC)
Internet Based Communication Networks and Services
Technologiepark 15
9052 Zwijnaarde
T: +32 9 331 48 70
F: +32 9 331 48 99
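
The failure discussed in this thread comes from a second run of the test
re-initializing a mutex that an earlier run left behind in the shared
segment. One way to avoid that might look like the sketch below. It is only
a sketch under stated assumptions: the struct shared layout and the helpers
shared_attach() and shared_lock() are hypothetical names that do not appear
in the thread. The idea is to create the segment with O_EXCL so that only
the creating process initializes the mutex and condvar, and to treat
EOWNERDEAD from the robust mutex as recoverable via
pthread_mutex_consistent().

```c
#define _POSIX_C_SOURCE 200809L

#include <errno.h>
#include <fcntl.h>
#include <pthread.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical segment layout; not the layout used in the thread. */
struct shared {
        pthread_mutex_t mtx;
        pthread_cond_t  cnd;
        int             flag;
};

/*
 * Map the named segment. Only the process for which O_EXCL creation
 * succeeds initializes the mutex and condvar; later callers just attach.
 * Stores 1 in *created for the creator, 0 otherwise. Returns NULL on error.
 */
struct shared *
shared_attach(const char *name, int *created)
{
        pthread_mutexattr_t mattr;
        pthread_condattr_t cattr;
        struct shared *s;
        int fd;

        *created = 1;
        fd = shm_open(name, O_CREAT | O_EXCL | O_RDWR, 0600);
        if (fd == -1 && errno == EEXIST) {
                /* Segment survived an earlier run: do not re-initialize. */
                *created = 0;
                fd = shm_open(name, O_RDWR, 0600);
        }
        if (fd == -1)
                return (NULL);
        if (*created && ftruncate(fd, sizeof(*s)) == -1) {
                close(fd);
                shm_unlink(name);
                return (NULL);
        }
        s = mmap(NULL, sizeof(*s), PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        close(fd);
        if (s == MAP_FAILED)
                return (NULL);

        if (*created) {
                pthread_mutexattr_init(&mattr);
                pthread_mutexattr_setpshared(&mattr, PTHREAD_PROCESS_SHARED);
                pthread_mutexattr_setrobust(&mattr, PTHREAD_MUTEX_ROBUST);
                pthread_mutex_init(&s->mtx, &mattr);
                pthread_mutexattr_destroy(&mattr);

                pthread_condattr_init(&cattr);
                pthread_condattr_setpshared(&cattr, PTHREAD_PROCESS_SHARED);
                pthread_cond_init(&s->cnd, &cattr);
                pthread_condattr_destroy(&cattr);

                s->flag = 0;
        }
        return (s);
}

/*
 * Lock the robust mutex. If the previous owner died while holding it,
 * mark the mutex consistent so it stays usable; the caller should then
 * revalidate whatever state the mutex protects.
 */
int
shared_lock(struct shared *s)
{
        int error;

        error = pthread_mutex_lock(&s->mtx);
        if (error == EOWNERDEAD)
                error = pthread_mutex_consistent(&s->mtx);
        return (error);
}
```

A caller would use the value stored in *created to decide whether it owns
setup and teardown of the segment. The sketch does not close the window in
which a second process attaches after creation but before initialization
has finished; a real program needs its own handshake for that.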