Skip site navigation (1)Skip section navigation (2)
Date:      Tue, 1 Jan 2019 19:37:35 -0800
From:      Mark Millard <marklmi@yahoo.com>
To:        freebsd-emulation@freebsd.org, ports-list freebsd <freebsd-ports@freebsd.org>
Cc:        freebsd-arm <freebsd-arm@freebsd.org>
Subject:   Re: qemu-arm-static under amd64: example of stuck looping atomic_cmpset_int while building graphics/poppler-qt5 [problem possibly found]
Message-ID:  <B68A2CBC-7B81-47B9-A997-2A78C24F67FC@yahoo.com>
In-Reply-To: <51834F45-1EB7-449F-A0EC-7F5EC7CBB12F@yahoo.com>
References:  <51834F45-1EB7-449F-A0EC-7F5EC7CBB12F@yahoo.com>

next in thread | previous in thread | raw e-mail | index | archive | help
[It looks to me like the assembler code has some code moved out of the
loop that it should not be (by intent). It appears that calling
tcmpset_32 does not prevent code motion to before the call and
the variable involved was not declared volatile.]

On 2019-Jan-1, at 18:43, Mark Millard <marklmi at yahoo.com> wrote:

> The below showed up for poudiere-devel bulk getting stuck using one =
FreeBSD
> cpu while building graphics/poppler-qt5 . This is not a kevent =
hang-up, unlike
> the last hang-up that I analyzed. I do not yet know how repeatable =
this is
> but the original hang-up and the one experiment the below is from.
>=20
> =46rom top:
>=20
>  PID USERNAME    THR PRI NICE   SIZE    RES SWAP STATE    C   TIME     =
CPU COMMAND
> 12789 root          4  52    0   166M    33M    0 uwait    6  36:06  =
97.22% /usr/local/bin/qemu-arm-static /usr/local/bin/cmake -E =
cmake_autogen /wrkdirs/usr/ports/graphics/poppler-qt5/work/poppler-0
>=20
> Note: The vast margority of the 36:06 has been stuck in the uwait loop =
involved.
>=20
> =46rom ps -auxd:
>=20
> root       94075    0.0  0.0  12932  3552  1  S+   10:42        =
0:01.21 |           `-- sh -e /usr/local/share/poudriere/bulk.sh =
-jFBSDFSSDjailArmV7 -w graphics/poppler-qt5
> root        1944    0.0  0.0  12932  3540  1  I+   10:42        =
0:00.00 |             |-- sh -e /usr/local/share/poudriere/bulk.sh =
-jFBSDFSSDjailArmV7 -w graphics/poppler-qt5
> root        1957    0.0  0.0  12932  3556  1  I    10:42        =
0:00.04 |             |-- sh: poudriere[FBSDFSSDjailArmV7-default][01]: =
build_pkg (poppler-qt5-0.72.0) (sh)
> root       12328    0.0  0.0  12932  3548  1  I    10:49        =
0:00.00 |             | `-- sh: =
poudriere[FBSDFSSDjailArmV7-default][01]: build_pkg (poppler-qt5-0.72.0) =
(sh)
> root       12329    0.0  0.0  10328  1756  1  IJ   10:49        =
0:00.01 |             |   `-- /usr/bin/make -C =
/usr/ports/graphics/poppler-qt5 stage
> root       12350    0.0  0.0   9860  1248  1  IJ   10:49        =
0:00.00 |             |     `-- /usr/bin/make -f Makefile =
DESTDIR=3D/wrkdirs/usr/ports/graphics/poppler-qt5/work/stage install
> root       12353    0.0  0.0  10236  1664  1  IJ   10:49        =
0:00.05 |             |       `-- /nxb-bin/usr/bin/make -f =
CMakeFiles/Makefile2 qt5/all
> root       12787    0.0  0.0   9856  1236  1  IJ   10:50        =
0:00.00 |             |         `-- /nxb-bin/usr/bin/make -f =
qt5/tests/CMakeFiles/check_qt5_attachments_autogen.dir/build.make =
qt5/test
> root       12789  100.0  0.0 169868 33528  1  IJ   10:50       =
36:35.26 |             |           `-- /usr/local/bin/qemu-arm-static =
/usr/local/bin/cmake -E cmake_autogen /wrkdirs/usr/ports/graphics/
> root       94423    0.0  0.0  12932  3484  1  S+   10:42        =
0:12.91 |             `-- sh -e /usr/local/share/poudriere/bulk.sh =
-jFBSDFSSDjailArmV7 -w graphics/poppler-qt5
>=20
>=20
> (gdb) attach 12789
> Attaching to process 12789
> Reading symbols from =
/usr/local/poudriere/data/.m/FBSDFSSDjailArmV7-default/01/usr/local/bin/qe=
mu-arm-static...done.
> [New LWP 101168 of process 12789]
> [New LWP 101178 of process 12789]
> [New LWP 101499 of process 12789]
> [Switching to LWP 100304 of process 12789]
> _umtx_op () at _umtx_op.S:3
> 3	RSYSCALL(_umtx_op)
> (gdb) info threads
>  Id   Target Id                   Frame=20
> * 1    LWP 100304 of process 12789 _umtx_op () at _umtx_op.S:3
>  2    LWP 101168 of process 12789 _umtx_op_err () at =
/usr/src/lib/libthr/arch/amd64/amd64/_umtx_op_err.S:37
>  3    LWP 101178 of process 12789 _umtx_op () at _umtx_op.S:3
>  4    LWP 101499 of process 12789 0x0000000060051c26 in =
atomic_cmpset_int (dst=3D<optimized out>, expect=3D<optimized out>, =
src=3D536870912) at /usr/include/machine/atomic.h:220
> (gdb) thread 4
> [Switching to thread 4 (LWP 101499 of process 12789)]
> #0  0x0000000060051c26 in atomic_cmpset_int (dst=3D<optimized out>, =
expect=3D<optimized out>, src=3D536870912) at =
/usr/include/machine/atomic.h:220
> 220	ATOMIC_CMPSET(int);
>=20
> (gdb) bt
> #0  0x0000000060051c26 in atomic_cmpset_int (dst=3D<optimized out>, =
expect=3D<optimized out>, src=3D536870912) at =
/usr/include/machine/atomic.h:220
> #1  tcmpset_32 (addr=3D<optimized out>, a=3D<optimized out>, =
b=3D536870912) at =
/wrkdirs/usr/ports/emulators/qemu-user-static/work/qemu-bsd-user-4ef7d07/b=
sd-user/freebsd/os-thread.c:178
> #2  freebsd_rw_unlock (target_addr=3D4108246528) at =
/wrkdirs/usr/ports/emulators/qemu-user-static/work/qemu-bsd-user-4ef7d07/b=
sd-user/freebsd/os-thread.c:1264
> #3  0x000000006004ab33 in do_freebsd__umtx_op (obj=3D<optimized out>, =
op=3D536870912, val=3D<optimized out>, uaddr=3D<optimized out>, =
target_time=3D<optimized out>)
>    at =
/wrkdirs/usr/ports/emulators/qemu-user-static/work/qemu-bsd-user-4ef7d07/b=
sd-user/freebsd/os-thread.h:474
> #4  0x0000000060041b83 in do_freebsd_syscall (cpu_env=3D0x86159b118, =
num=3D454, arg1=3D<optimized out>, arg2=3D<optimized out>, =
arg3=3D<optimized out>, arg4=3D0, arg5=3D0, arg6=3D-184411592, =
arg7=3D-199471616,=20
>    arg8=3D-1622188640) at =
/wrkdirs/usr/ports/emulators/qemu-user-static/work/qemu-bsd-user-4ef7d07/b=
sd-user/syscall.c:1364
> #5  0x00000000600392f0 in target_cpu_loop (env=3D0x86159b118) at =
/wrkdirs/usr/ports/emulators/qemu-user-static/work/qemu-bsd-user-4ef7d07/b=
sd-user/arm/target_arch_cpu.h:207
> #6  0x0000000060038c99 in cpu_loop (env=3D0xf4dede80) at =
/wrkdirs/usr/ports/emulators/qemu-user-static/work/qemu-bsd-user-4ef7d07/b=
sd-user/main.c:121
> #7  0x0000000060050c1a in new_freebsd_thread_start (arg=3D<optimized =
out>) at =
/wrkdirs/usr/ports/emulators/qemu-user-static/work/qemu-bsd-user-4ef7d07/b=
sd-user/freebsd/os-thread.c:152
> #8  0x00000000601ad5f6 in thread_start (curthread=3D0x861571300) at =
/usr/src/lib/libthr/thread/thr_create.c:291
> #9  0x0000000000000000 in ?? ()
> Backtrace stopped: Cannot access memory at address 0x7fffdfbb3000
>=20
> 181	#define	ATOMIC_CMPSET(TYPE)				\
> 182	static __inline int					\
> 183	atomic_cmpset_##TYPE(volatile u_##TYPE *dst, u_##TYPE expect, =
u_##TYPE src) \
> 184	{							\
> (gdb)=20
> 185		u_char res;					\
> 186								\
> 187		__asm __volatile(				\
> 188		"	" MPLOCKED "		"		\
> 189		"	cmpxchg %3,%1 ;	"			\
> 190		"	sete	%0 ;		"		\
> 191		"# atomic_cmpset_" #TYPE "	"		\
> 192		: "=3Dq" (res),			/* 0 */		\
> 193		  "+m" (*dst),			/* 1 */		\
> 194		  "+a" (expect)			/* 2 */		\
> (gdb)=20
> 195		: "r" (src)			/* 3 */		\
> 196		: "memory", "cc");				\
> 197		return (res);					\
> 198	}							\
> . . .
> 218	ATOMIC_CMPSET(char);
> 219	ATOMIC_CMPSET(short);
> 220	ATOMIC_CMPSET(int);
> 221	ATOMIC_CMPSET(long);
>=20
> (gdb) up
> #1  tcmpset_32 (addr=3D<optimized out>, a=3D<optimized out>, =
b=3D536870912) at =
/wrkdirs/usr/ports/emulators/qemu-user-static/work/qemu-bsd-user-4ef7d07/b=
sd-user/freebsd/os-thread.c:178
> 178	    return atomic_cmpset_acq_32(addr, current, new);
> (gdb) list
> 173	static int tcmpset_32(uint32_t *addr, uint32_t a, uint32_t b)
> 174	{
> 175	    uint32_t current =3D tswap32(a);
> 176	    uint32_t new =3D tswap32(b);
> 177=09
> 178	    return atomic_cmpset_acq_32(addr, current, new);
> 179	}
>=20
> (gdb) up
> #2  freebsd_rw_unlock (target_addr=3D4108246528) at =
/wrkdirs/usr/ports/emulators/qemu-user-static/work/qemu-bsd-user-4ef7d07/b=
sd-user/freebsd/os-thread.c:1264
> 1264	            if (!tcmpset_32(&target_urwlock->rw_state, state, =
(state  - 1))) {
> (gdb) list
> 1259	            }
> 1260	        }
> 1261	    } else if (TARGET_URWLOCK_READER_COUNT(state) !=3D 0) {
> 1262	        /* decrement reader count */
> 1263	        for (;;) {
> 1264	            if (!tcmpset_32(&target_urwlock->rw_state, state, =
(state  - 1))) {
> 1265	                if (TARGET_URWLOCK_READER_COUNT(state) =3D=3D 0) =
{
> 1266	                    unlock_user_struct(target_urwlock,
> 1267	                        target_addr, 1);
> 1268	                        return -TARGET_EPERM;
> (gdb)=20
> 1269	                 }
> 1270	            } else {
> 1271	                break;
> 1272	            }
> 1273	        }
> 1274	    } else {
> 1275	        unlock_user_struct(target_urwlock, target_addr, 1);
> 1276	        return -TARGET_EPERM;
> 1277	    }
> 1278=09
>=20
> The above loop is where the code is stuck looping, using most of the =
capacity of
> one cpu (as FreeBSD classifies cpus). The value of state is stuck at:
>=20
> (gdb) print state
> $2 =3D 536870913
>=20
> I'll note that the loop is inside qemu-arm-atatic, not in the cmake =
that it
> is running.
>=20
>=20
>=20
> To show a little more context before the assembler for the routine =
that ends up
> containing that loop:
>=20
> (gdb) up
> #3  0x000000006004ab33 in do_freebsd__umtx_op (obj=3D<optimized out>, =
op=3D536870912, val=3D<optimized out>, uaddr=3D<optimized out>, =
target_time=3D<optimized out>)
>    at =
/wrkdirs/usr/ports/emulators/qemu-user-static/work/qemu-bsd-user-4ef7d07/b=
sd-user/freebsd/os-thread.h:474
> 474	        ret =3D freebsd_rw_unlock(obj);
> (gdb) list
> 469	            ret =3D freebsd_rw_wrlock(obj, val, 0, NULL);
> 470	        }
> 471	        break;
> 472=09
> 473	    case TARGET_UMTX_OP_RW_UNLOCK:
> 474	        ret =3D freebsd_rw_unlock(obj);
> 475	        break;
> 476=09
> 477	#if defined(__FreeBSD_version) && __FreeBSD_version > 900000
> 478	#ifdef UMTX_OP_MUTEX_WAKE2
>=20
> (gdb) up
> #4  0x0000000060041b83 in do_freebsd_syscall (cpu_env=3D0x86159b118, =
num=3D454, arg1=3D<optimized out>, arg2=3D<optimized out>, =
arg3=3D<optimized out>, arg4=3D0, arg5=3D0, arg6=3D-184411592, =
arg7=3D-199471616,=20
>    arg8=3D-1622188640) at =
/wrkdirs/usr/ports/emulators/qemu-user-static/work/qemu-bsd-user-4ef7d07/b=
sd-user/syscall.c:1364
> 1364	        ret =3D do_freebsd__umtx_op(arg1, arg2, arg3, arg4, =
arg5);
> (gdb) list
> 1359	        ret =3D do_freebsd__umtx_unlock(arg1);
> 1360	        break;
> 1361	#endif
> 1362=09
> 1363	    case TARGET_FREEBSD_NR__umtx_op: /* undocumented */
> 1364	        ret =3D do_freebsd__umtx_op(arg1, arg2, arg3, arg4, =
arg5);
> 1365	        break;
> 1366=09
> 1367	        /*
> 1368	         * ioctl(2)
>=20
> As for the assembler level view for the routine containing the
> loop (via inlining and such):
>=20
> Dump of assembler code for function freebsd_rw_unlock:
>   0x0000000060051bd0 <+0>:	push   %rbp
>   0x0000000060051bd1 <+1>:	mov    %rsp,%rbp
>   0x0000000060051bd4 <+4>:	push   %rbx
>   0x0000000060051bd5 <+5>:	push   %rax
>   0x0000000060051bd6 <+6>:	mov    %edi,%ebx
>   0x0000000060051bd8 <+8>:	mov    $0x20,%esi
>   0x0000000060051bdd <+13>:	mov    $0x3,%edx
>   0x0000000060051be2 <+18>:	callq  0x60036b70 <page_check_range>
>   0x0000000060051be7 <+23>:	mov    %eax,%ecx
>   0x0000000060051be9 <+25>:	mov    $0xfffffff2,%eax
>   0x0000000060051bee <+30>:	test   %ecx,%ecx
>   0x0000000060051bf0 <+32>:	jne    0x60051c50 =
<freebsd_rw_unlock+128>
>   0x0000000060051bf2 <+34>:	mov    %ebx,%edi
>   0x0000000060051bf4 <+36>:	add    0x26a6355(%rip),%rdi        # =
0x626f7f50 <guest_base>
>   0x0000000060051bfb <+43>:	je     0x60051c50 =
<freebsd_rw_unlock+128>
>   0x0000000060051bfd <+45>:	mov    (%rdi),%ecx
>   0x0000000060051bff <+47>:	mov    0x4(%rdi),%edx
>   0x0000000060051c02 <+50>:	test   %ecx,%ecx
>   0x0000000060051c04 <+52>:	js     0x60051c30 <freebsd_rw_unlock+96>
>   0x0000000060051c06 <+54>:	test   $0x1fffffff,%ecx
>   0x0000000060051c0c <+60>:	je     0x60051c4b =
<freebsd_rw_unlock+123>
>   0x0000000060051c0e <+62>:	lea    -0x1(%rcx),%esi
>   0x0000000060051c11 <+65>:	nopw   %cs:0x0(%rax,%rax,1)
>   0x0000000060051c1b <+75>:	nopl   0x0(%rax,%rax,1)
> NOTE: loop is below:
>   0x0000000060051c20 <+80>:	mov    %ecx,%eax
> =3D> 0x0000000060051c22 <+82>:	lock cmpxchg %esi,(%rdi)
>   0x0000000060051c26 <+86>:	sete   %bl
>   0x0000000060051c29 <+89>:	test   %bl,%bl
>   0x0000000060051c2b <+91>:	je     0x60051c20 <freebsd_rw_unlock+80>
> Note: loop is above
>   0x0000000060051c2d <+93>:	jmp    0x60051c57 =
<freebsd_rw_unlock+135>
>   0x0000000060051c2f <+95>:	nop
>   0x0000000060051c30 <+96>:	mov    %ecx,%esi
>   0x0000000060051c32 <+98>:	and    $0x7fffffff,%esi
>   0x0000000060051c38 <+104>:	mov    %ecx,%eax
>   0x0000000060051c3a <+106>:	lock cmpxchg %esi,(%rdi)
>   0x0000000060051c3e <+110>:	sete   %bl
>   0x0000000060051c41 <+113>:	test   %bl,%bl
>   0x0000000060051c43 <+115>:	jne    0x60051c57 =
<freebsd_rw_unlock+135>
>   0x0000000060051c45 <+117>:	mov    (%rdi),%ecx
>   0x0000000060051c47 <+119>:	test   %ecx,%ecx
>   0x0000000060051c49 <+121>:	js     0x60051c30 <freebsd_rw_unlock+96>
>   0x0000000060051c4b <+123>:	mov    $0xffffffff,%eax
>   0x0000000060051c50 <+128>:	add    $0x8,%rsp
>   0x0000000060051c54 <+132>:	pop    %rbx
>   0x0000000060051c55 <+133>:	pop    %rbp
>   0x0000000060051c56 <+134>:	retq  =20
>   0x0000000060051c57 <+135>:	test   $0x2,%dl
>   0x0000000060051c5a <+138>:	jne    0x60051c70 =
<freebsd_rw_unlock+160>
>   0x0000000060051c5c <+140>:	test   $0x40000000,%ecx
>   0x0000000060051c62 <+146>:	jne    0x60051c82 =
<freebsd_rw_unlock+178>
>   0x0000000060051c64 <+148>:	xor    %eax,%eax
>   0x0000000060051c66 <+150>:	test   $0x20000000,%ecx
>   0x0000000060051c6c <+156>:	je     0x60051c50 =
<freebsd_rw_unlock+128>
>   0x0000000060051c6e <+158>:	jmp    0x60051c89 =
<freebsd_rw_unlock+185>
>   0x0000000060051c70 <+160>:	test   $0x20000000,%ecx
>   0x0000000060051c76 <+166>:	jne    0x60051c89 =
<freebsd_rw_unlock+185>
>   0x0000000060051c78 <+168>:	xor    %eax,%eax
>   0x0000000060051c7a <+170>:	test   $0x40000000,%ecx
>   0x0000000060051c80 <+176>:	je     0x60051c50 =
<freebsd_rw_unlock+128>
>   0x0000000060051c82 <+178>:	mov    $0x1,%edx
>   0x0000000060051c87 <+183>:	jmp    0x60051c8e =
<freebsd_rw_unlock+190>
>   0x0000000060051c89 <+185>:	mov    $0x7fffffff,%edx
>   0x0000000060051c8e <+190>:	mov    $0x3,%esi
>   0x0000000060051c93 <+195>:	xor    %ecx,%ecx
>   0x0000000060051c95 <+197>:	xor    %r8d,%r8d
>   0x0000000060051c98 <+200>:	callq  0x601b5970 <_umtx_op>
>   0x0000000060051c9d <+205>:	mov    %eax,%edi
>   0x0000000060051c9f <+207>:	add    $0x8,%rsp
>   0x0000000060051ca3 <+211>:	pop    %rbx
>   0x0000000060051ca4 <+212>:	pop    %rbp
> --Type <RET> for more, q to quit, c to continue without paging--
>   0x0000000060051ca5 <+213>:	jmpq   0x60040a40 <get_errno>
> End of assembler dump.
>=20
> For just before the "lock cmpxchg %esi,(%rdi)" in the loop:
>=20
> rax            0x20000001          536870913
> rbx            0xf4dede00          4108246528
> rcx            0x20000001          536870913
> rdx            0x2                 2
> rsi            0x20000000          536870912
> rdi            0xf4dede80          4108246656
>=20
> (gdb) x 0xf4dede80
> 0xf4dede80:	0x40000001
>=20
>   0x0000000060051c20 <+80>:	mov    %ecx,%eax
> =3D> 0x0000000060051c22 <+82>:	lock cmpxchg %esi,(%rdi)
>   0x0000000060051c26 <+86>:	sete   %bl
>   0x0000000060051c29 <+89>:	test   %bl,%bl
>   0x0000000060051c2b <+91>:	je     0x60051c20 <freebsd_rw_unlock+80>
>=20
>=20
> At this point I do not have interpretation of the details, not even
> a comparison to the source code.


In the source code there is:

1261	    } else if (TARGET_URWLOCK_READER_COUNT(state) !=3D 0) {
1262	        /* decrement reader count */
1263	        for (;;) {
1264	            if (!tcmpset_32(&target_urwlock->rw_state, state, =
(state  - 1))) {
1265	                if (TARGET_URWLOCK_READER_COUNT(state) =3D=3D 0) =
{
1266	                    unlock_user_struct(target_urwlock,
1267	                        target_addr, 1);
1268	                        return -TARGET_EPERM;
(gdb)=20
1269	                 }
1270	            } else {
1271	                break;
1272	            }
1273	        }

But in the assembler code it appears to me that the code:

1265	                if (TARGET_URWLOCK_READER_COUNT(state) =3D=3D 0) =
{
1266	                    unlock_user_struct(target_urwlock,
1267	                        target_addr, 1);
1268	                        return -TARGET_EPERM;
(gdb)=20
1269	                 }

was factored out of the loop, with the test moved to before the loop.
The TARGET_URWLOCK_READER_COUNT use masks off based on 0x1fffffff .
But the code for the conditional test is before the loop:

  0x0000000060051c06 <+54>:	test   $0x1fffffff,%ecx
  0x0000000060051c0c <+60>:	je     0x60051c4b =
<freebsd_rw_unlock+123>
. . .
NOTE: loop is below:
  0x0000000060051c20 <+80>:	mov    %ecx,%eax
=3D> 0x0000000060051c22 <+82>:	lock cmpxchg %esi,(%rdi)
  0x0000000060051c26 <+86>:	sete   %bl
  0x0000000060051c29 <+89>:	test   %bl,%bl
  0x0000000060051c2b <+91>:	je     0x60051c20 <freebsd_rw_unlock+80>
Note: loop is above

I may try declaring state volatile in an attempt to prevent the
code motion. (But what of other uses of tcmpset_32 controlling
subordinate code in other places?)


=3D=3D=3D
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?B68A2CBC-7B81-47B9-A997-2A78C24F67FC>