From owner-freebsd-current@freebsd.org Mon Mar 21 11:23:04 2016
Date: Mon, 21 Mar 2016 13:22:58 +0200
From: Konstantin Belousov
To: "Oleg V. Nauman"
Cc: freebsd-current@freebsd.org
Subject: Re: Fatal error 'mutex is on list' at line 139 in file /usr/src/lib/libthr/thread/thr_mutex.c (errno = 35)
Message-ID: <20160321112258.GM1741@kib.kiev.ua>
References: <5093647.qxI0C33PyG@asus.theweb.org.ua> <20160321052102.GJ1741@kib.kiev.ua> <20160321070710.GK1741@kib.kiev.ua> <1541955.eeyoXZYkvP@asus.theweb.org.ua>
In-Reply-To: <1541955.eeyoXZYkvP@asus.theweb.org.ua>
List-Id: Discussions about the use of FreeBSD-current

On Mon, Mar 21, 2016 at 12:15:15PM +0200, Oleg V. Nauman wrote:
> OK, but please take a look what I have found ( it makes me thinking that
> problem is within the compiled KDE code ):
> The failure point within the KDE code is the same ( at least it is true for
> coredumps generated today ):
>
> #7 0x0000000805a2f6be in __pthread_mutex_timedlock (mutex=0x81b200008,
> abstime=0x7fffffffd458) at /usr/src/lib/libthr/thread/thr_mutex.c:583
> #8 0x000000080443c4b0 in pthreadTimedLock::lock (this=0x81777b680)
> at
> /usr/ports/x11/kdelibs4/work/kdelibs-4.14.3/kdecore/util/kshareddatacache_p.h:252
> ....
> (gdb) f 8
> #8 0x000000080443c4b0 in pthreadTimedLock::lock (this=0x81777b680)
> at
> /usr/ports/x11/kdelibs4/work/kdelibs-4.14.3/kdecore/util/kshareddatacache_p.h:252
> 252 return pthread_mutex_timedlock(&m_mutex, &timeout) == 0;
> (gdb) p &m_mutex
> $1 = (pthread_mutex_t *) 0x81b200008
> (gdb) p m_mutex
> $2 = (pthread_mutex_t &) @0x81b200008: 0x8000000000000001

This is correct. The value is the special cookie set for process-shared
locks; the actual lock exists elsewhere.

> (gdb) p &timeout
> $3 = (timespec *) 0x6

This might be some gdb issue. Anyway, the timeout value is not the problem.

> (gdb) p timeout
> Cannot access memory at address 0x6
> (gdb)
>
> It seems that both m_mutex and timeout are wrong

m_mutex is fine, as I noted above.

>
> The class which generates coredumps looks like:
>
> #if defined(KSDC_THREAD_PROCESS_SHARED_SUPPORTED) && defined(KSDC_TIMEOUTS_SUPPORTED)
> class pthreadTimedLock : public pthreadLock
> {
> public:
>     pthreadTimedLock(pthread_mutex_t &mutex)
>         : pthreadLock(mutex)
>     {
>     }
>
>     virtual bool lock()
>     {
>         struct timespec timeout;
>
>         // Long timeout, but if we fail to meet this timeout it's probably a cache
>         // corruption (and if we take 8 seconds then it should be much much quicker
>         // the next time anyways since we'd be paged back in from disk)
>         timeout.tv_sec = 10 + ::time(NULL); // Absolute time, so 10 seconds from now
>         timeout.tv_nsec = 0;
>
>         return pthread_mutex_timedlock(&m_mutex, &timeout) == 0;
>     }
> };
> #endif
>
> It is called by:
>
> (gdb) f 9
> #9 0x000000080443c8a8 in KSharedDataCache::Private::CacheLocker::cautiousLock
> (
> this=0x7fffffffd5f0)
> at
> /usr/ports/x11/kdelibs4/work/kdelibs-4.14.3/kdecore/util/kshareddatacache.cpp:1259
> 1259 while (!d->lock() && !isLockedCacheSafe()) {
> (gdb) p *d
> $4 = {m_cacheName = {static null = {}, static shared_null =
> {ref = {
> _q_value = 2731}, alloc = 0, size = 0, data = 0x6192ca
> ,
> clean = 0, simpletext = 0, righttoleft = 0, asciiCache = 0, capacity =
> 0, reserved = 0,
> array = {0}}, static shared_empty = {ref = {_q_value = 50}, alloc = 0,
> size = 0,
> data = 0x805105c3a , clean = 0, simpletext =
> 0,
> righttoleft = 0, asciiCache = 0, capacity = 0, reserved = 0, array =
> {0}},
> d = 0x8176e8180, static codecForCStrings = 0x0}, shm = 0x81b200000,
> m_lock = {> =
> {> = {value = 0x81777b680}, d = 0x81777b6c0},
> }, m_mapSize = 10547304,
> m_defaultCacheSize = 10485760, m_expectedItemSize = 0, m_expectedType =
> LOCKTYPE_MUTEX}
> (gdb) p d
> $5 = (KSharedDataCache::Private *) 0x8176d2030
>
> Well I understand that unwinding the KDE code it is a task not for humans..
>
> The hardware is ASUS X552C notebook, Ivybridge, amd64
> I noticed massive coredumps after x11/kdelibs4 recompilation with clang 3.8.0
> so it is possible that it is a problem with code generation.
> It is does not depend on optimization level ( at least it exhibits the same
> behavior for both -O2 and -O0 )
> The only CPU/optimization/code generation specific setting is
> CPUTYPE?=nehalem
> in make.conf

In other words, there is no virtualization involved.

I think that the problem at hand is not related to the clang update. You
recently rebuilt the KDE libs, which probably triggered detection of the
new feature, process-shared locks in our libthr. Before that, older HEAD
did not expose process-shared locks as an implemented option. Somehow the
implementation and KDE's expectations do not match, and the asserts in
libthr catch that.

Anyway, please apply the debugging patch I posted in the previous mail.
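
For readers following the thread, the pattern under discussion (a
process-shared mutex living in shared memory, locked with an absolute
timeout) can be sketched roughly as below. This is only an illustration
under stated assumptions, not the kdelibs or libthr code: the helper names
create_shared_mutex, timed_lock and destroy_shared_mutex are hypothetical,
and an anonymous shared mapping stands in for the cache file that
KSharedDataCache maps.

```cpp
#include <pthread.h>
#include <sys/mman.h>
#include <time.h>
#include <cassert>
#include <cerrno>

// Hypothetical helper: place a mutex in an anonymous shared mapping and
// initialise it with the process-shared attribute. Returns nullptr on failure.
pthread_mutex_t *create_shared_mutex()
{
    void *mem = mmap(nullptr, sizeof(pthread_mutex_t), PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANON, -1, 0);
    if (mem == MAP_FAILED)
        return nullptr;

    pthread_mutex_t *mutex = static_cast<pthread_mutex_t *>(mem);
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc == 0) {
        // Without this attribute the mutex is private to one process.
        rc = pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
        if (rc == 0)
            rc = pthread_mutex_init(mutex, &attr);
        pthread_mutexattr_destroy(&attr);
    }
    if (rc != 0) {
        munmap(mem, sizeof(pthread_mutex_t));
        return nullptr;
    }
    return mutex;
}

// Hypothetical helper: try to take the lock, giving up `seconds` from now.
// pthread_mutex_timedlock() expects an absolute CLOCK_REALTIME deadline.
// Returns 0 on success or an errno value such as ETIMEDOUT.
int timed_lock(pthread_mutex_t *mutex, time_t seconds)
{
    assert(mutex != nullptr);
    struct timespec deadline;
    if (clock_gettime(CLOCK_REALTIME, &deadline) != 0)
        return errno;
    deadline.tv_sec += seconds;
    return pthread_mutex_timedlock(mutex, &deadline);
}

// Hypothetical helper: destroy the mutex and release its mapping.
void destroy_shared_mutex(pthread_mutex_t *mutex)
{
    pthread_mutex_destroy(mutex);
    munmap(mutex, sizeof(pthread_mutex_t));
}
```

A caller would check the return value of timed_lock() against 0 and
ETIMEDOUT, much as the quoted pthreadTimedLock::lock() compares the result
of pthread_mutex_timedlock() with 0.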