Date: Tue, 2 Jan 2018 12:27:47 -0800 From: Mark Millard <markmi@dsl-only.net> To: Allan Jude <allanjude@freebsd.org> Cc: FreeBSD Hackers <freebsd-hackers@freebsd.org> Subject: Re: rpi2 head -r327485 (e.g.): rpi2 leaves one "CPU n" always idle for some boots Message-ID: <E6B45C90-5A0C-4AE0-AAFA-C9753FD5E908@dsl-only.net> In-Reply-To: <85268695-3440-4B94-BF3D-68A38E204F50@dsl-only.net> References: <3258F809-F179-4EB7-857C-74667BECD360@dsl-only.net> <9da67685-d1c2-b6e0-cc32-efb49e67f0b1@freebsd.org> <85268695-3440-4B94-BF3D-68A38E204F50@dsl-only.net>
next in thread | previous in thread | raw e-mail | index | archive | help
[It looks like the opensll speed on problem cpuN (2) can not get ^C and the like. On 2018-Jan-2, at 12:13 PM, Mark Millard <markmi at dsl-only.net> wrote: > On 2018-Jan-2, at 12:01 PM, Allan Jude <allanjude at freebsd.org> = wrote: >=20 >> On 2018-01-02 14:48, Mark Millard wrote: >>> I've seen this over many versions of head for months >>> but have never managed to find a way to force it to >>> happen. It just shows up once and a while. >>>=20 >>> Thus, I'm just dumping out some top and kernel information >>> here for reference. I've used: >>>=20 >>> openssl speed 1>/dev/null 2>&1 & >>> openssl speed 1>/dev/null 2>&1 & >>> openssl speed 1>/dev/null 2>&1 & >>> openssl speed 1>/dev/null 2>&1 & >>>=20 >>> to give the rpi2 4 active processes. Various outputs >>> are from different times without a reboot between. >>>=20 >>> top -CaePores shows the likes of: >>>=20 >>> PID USERNAME THR PRI NICE SIZE RES SWAP STATE C = TIME CPU COMMAND >>> 614 root 1 20 0 10452K 10480K 0K select 1 = 0:00 0.03% /usr/sbin/ntpd -g -c /etc/ntp.conf -p /var/run/ntpd.pid -f = /var/db/ntpd.drift >>> 661 root 1 52 0 9984K 6132K 0K select 1 = 0:00 0.00% /usr/sbin/sshd >>> 751 root 1 101 0 7256K 4276K 0K RUN 1 = 0:28 99.57% openssl speed >>> 750 root 1 100 0 7256K 4276K 0K CPU0 0 = 0:32 94.83% openssl speed >>> 753 root 1 86 0 7256K 4276K 0K RUN 3 = 0:13 52.36% openssl speed >>> 752 root 1 86 0 7256K 4276K 0K CPU3 3 = 0:14 46.54% openssl speed >>> 363 root 1 20 0 6428K 3840K 0K select 3 = 0:00 0.00% /sbin/devd >>> . . . >>>=20 >>> and: >>>=20 >>> last pid: 754; load averages: 3.70, 2.38, 1.58 = = up 0+00:16:50 01:59:37 >>> 21 processes: 5 running, 16 sleeping >>> CPU 0: 94.9% user, 0.0% nice, 0.0% system, 5.1% interrupt, 0.0% = idle >>> CPU 1: 99.6% user, 0.0% nice, 0.0% system, 0.4% interrupt, 0.0% = idle >>> CPU 2: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% = idle >>> CPU 3: 100% user, 0.0% nice, 0.0% system, 0.0% interrupt, 0.0% = idle >>> Mem: 12M Active, 1136K Inact, 56M Wired, 30M Buf, 722M Free >>> Swap: 1536M Total, 6M Free >>>=20 >>> =46rom problem boot to problem boot, the CPU that stays >>> idle has varied but usually has been CPU 2. I've never >>> seen 2 or more stuck in idle. >>>=20 >>> show allpcpu shows the likes of: >>>=20 >>> db> show allpcpu >>> Current CPU: 0 >>>=20 >>> cpuid =3D 0 >>> dynamic pcpu =3D 0x3d2540 >>> curthread =3D 0xd8478ae0: pid 2032 tid 100150 "openssl" >>> curpcb =3D 0xd852ae98 >>> fpcurthread =3D 0xd8478ae0: pid 2032 "openssl" >>> idlethread =3D 0xc376fae0: tid 100002 "idle: cpu0" >>> curpmap =3D 0xd8e43bf4 >>> curvnet =3D 0 >>>=20 >>> cpuid =3D 1 >>> dynamic pcpu =3D 0x3998540 >>> curthread =3D 0xd7e5b3a0: pid 2031 tid 100173 "openssl" >>> curpcb =3D 0xda7e0e98 >>> fpcurthread =3D 0xd7e5b3a0: pid 2031 "openssl" >>> idlethread =3D 0xc376f740: tid 100003 "idle: cpu1" >>> curpmap =3D 0xd8e43ec4 >>> curvnet =3D 0 >>>=20 >>> cpuid =3D 2 >>> dynamic pcpu =3D 0x3999540 >>> curthread =3D 0xc376f3a0: pid 10 tid 100004 "idle: cpu2" >>> curpcb =3D 0xc378ae98 >>> fpcurthread =3D none >>> idlethread =3D 0xc376f3a0: tid 100004 "idle: cpu2" >>> curpmap =3D 0 >>> curvnet =3D 0 >>>=20 >>> cpuid =3D 3 >>> dynamic pcpu =3D 0x399a540 >>> curthread =3D 0xd8477000: pid 2034 tid 100167 "openssl" >>> curpcb =3D 0xd876de98 >>> fpcurthread =3D 0xd8477000: pid 2034 "openssl" >>> idlethread =3D 0xc376f000: tid 100005 "idle: cpu3" >>> curpmap =3D 0xc377ab04 >>> curvnet =3D 0 >>>=20 >>> In other words: it appears that the cpuN (here cpu2) is >>> left with idle scheduled all the time for some reason. >>>=20 >>> ps from db> shows things like: >>>=20 >>>=20 >>> db> ps >>> pid ppid pgrp uid state wmesg wchan cmd >>> 2034 714 2034 0 R+ openssl >>> 2033 714 2033 0 R+ CPU 3 openssl >>> 2032 714 2032 0 R+ CPU 0 openssl >>> 2031 714 2031 0 R+ CPU 1 openssl >>>=20 >>> (then later:) >>>=20 >>> db> ps >>> pid ppid pgrp uid state wmesg wchan cmd >>> 2034 714 2034 0 R+ CPU 3 openssl >>> 2033 714 2033 0 R+ openssl >>> 2032 714 2032 0 R+ CPU 0 openssl >>> 2031 714 2031 0 R+ CPU 1 openssl >>>=20 >>> There is also: >>>=20 >>> 10 0 0 0 RL (threaded) [idle] >>> 100002 CanRun [idle: cpu0] >>> 100003 CanRun [idle: cpu1] >>> 100004 CanRun [idle: cpu2] >>> 100005 CanRun [idle: cpu3] >>>=20 >>>=20 >>> These are from: >>>=20 >>> # uname -apKU >>> FreeBSD rpi2 12.0-CURRENT FreeBSD 12.0-CURRENT r327485M arm armv7 = 1200054 1200054 >>>=20 >>>=20 >>> =3D=3D=3D >>> Mark Millard >>> markmi at dsl-only.net >>>=20 >>> _______________________________________________ >>> freebsd-hackers@freebsd.org mailing list >>> https://lists.freebsd.org/mailman/listinfo/freebsd-hackers >>> To unsubscribe, send any mail to = "freebsd-hackers-unsubscribe@freebsd.org" >>>=20 >>=20 >> Have you tried using 'cpuset' to force each of the 4 to run on a >> specific CPU? What happens if you force it to run on cpu #2? >=20 > # cpuset -c -l 2 openssl speed 1>/dev/null 2>&1 & >=20 > results in: >=20 > last pid: 2120; load averages: 2.95, 2.44, 2.25 = = up 0+10:26:07 12:08:54 > 18 processes: 2 running, 16 sleeping > CPU 0: 0.0% user, 0.0% nice, 0.0% system, 4.7% interrupt, 95.3% = idle > CPU 1: 0.4% user, 0.0% nice, 0.0% system, 0.0% interrupt, 99.6% = idle > CPU 2: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% = idle > CPU 3: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% = idle > Mem: 1852K Active, 587M Inact, 5368K Laundry, 145M Wired, 86M Buf, 53M = Free > Swap: 1536M Total, 1536M Free >=20 > PID USERNAME THR PRI NICE SIZE RES SWAP STATE C TIME = CPU COMMAND > 2120 root 1 20 0 6432K 2872K 0K CPU1 1 0:00 = 0.22% top -CawPopid > 2118 root 1 29 0 7256K 4432K 0K RUN 1 0:00 = 0.00% openssl speed > . . . >=20 >=20 > With the other 3 also assigned: >=20 > last pid: 2138; load averages: 4.45, 3.04, 2.51 = = up 0+10:28:48 12:11:35 > 21 processes: 5 running, 16 sleeping > CPU 0: 93.7% user, 0.0% nice, 1.6% system, 4.7% interrupt, 0.0% = idle > CPU 1: 100% user, 0.0% nice, 0.0% system, 0.0% interrupt, 0.0% = idle > CPU 2: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% = idle > CPU 3: 100% user, 0.0% nice, 0.0% system, 0.0% interrupt, 0.0% = idle > Mem: 3632K Active, 587M Inact, 5368K Laundry, 145M Wired, 86M Buf, 51M = Free > Swap: 1536M Total, 1536M Free >=20 > PID USERNAME THR PRI NICE SIZE RES SWAP STATE C TIME = CPU COMMAND > 2138 root 1 20 0 6472K 2888K 0K CPU1 1 0:00 = 0.24% top -CawPopid > 2137 root 1 100 0 7256K 4436K 0K CPU3 3 0:25 = 96.79% openssl speed > 2124 root 1 102 0 7256K 4436K 0K RUN 1 0:37 = 101.66% openssl speed > 2123 root 1 101 0 7256K 4436K 0K CPU0 0 0:44 = 95.07% openssl speed > 2118 root 1 29 0 7256K 4432K 0K RUN 1 0:00 = 0.00% openssl speed > . . . >=20 > Thus the difference is that the one instance > of openssl speed does not run at all. I tried to fg and ^C each of the 4 openssl's, with cpu2's instance being last. All worked but for cpu2's. ^C and ^Z do not work after the fg. But ^T reports: # fg cpuset -c -l 3 openssl speed >/dev/null 2>&1 ^C # fg cpuset -c -l 1 openssl speed >/dev/null 2>&1 ^C # fg cpuset -c -l 0 openssl speed >/dev/null 2>&1 ^C# fg cpuset -c -l 2 openssl speed >/dev/null 2>&1 ^C ^Z load: 3.03 cmd: openssl 2118 [runnable] 768.91r 0.06u 0.02s 0% 4424k load: 3.07 cmd: openssl 2118 [runnable] 825.03r 0.06u 0.02s 0% 4424k load: 3.06 cmd: openssl 2118 [runnable] 1177.78r 0.06u 0.02s 0% 4424k Using an ssh sessions as the console is stuck in the above state, it appears that the load average is now stuck at around 3 despite the basically idle state: last pid: 2172; load averages: 3.03, 3.29, 3.08 = = up 0+10:39:36 12:22:23 23 processes: 2 running, 21 sleeping CPU 0: 0.0% user, 0.0% nice, 0.0% system, 3.1% interrupt, 96.9% idle CPU 1: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle CPU 2: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle CPU 3: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle Mem: 4480K Active, 590M Inact, 5364K Laundry, 145M Wired, 86M Buf, 46M = Free Swap: 1536M Total, 1536M Free PID USERNAME THR PRI NICE SIZE RES SWAP STATE C TIME = CPU COMMAND 2172 root 1 20 0 6344K 2888K 0K CPU3 3 0:00 = 0.24% top -CawPopid 2155 root 1 22 0 5484K 2660K 0K wait 3 0:00 = 0.00% su (sh) 2154 markmi 1 23 0 5260K 2648K 0K wait 0 0:00 = 0.00% su 2148 markmi 1 28 0 5524K 2724K 0K wait 3 0:00 = 0.00% -sh (sh) 2147 markmi 1 20 0 11024K 6208K 0K select 0 0:00 = 0.03% sshd: markmi@pts/0 (sshd) 2143 root 1 26 0 10644K 6128K 0K select 3 0:00 = 0.00% sshd: markmi [priv] (sshd) 2118 root 1 20 0 7256K 4432K 0K RUN 1 0:00 = 0.00% openssl speed 714 root 1 20 0 5568K 2828K 0K wait 3 0:00 = 0.00% -sh (sh) 713 root 1 22 0 5300K 2600K 0K wait 1 0:00 = 0.00% login [pam] (login) 712 root 1 52 0 4356K 2004K 0K ttyin 0 0:00 = 0.00% /usr/libexec/getty Pc ttyv0 665 root 1 52 0 4816K 2332K 0K nanslp 3 0:01 = 0.00% /usr/sbin/cron -s 661 root 1 20 0 9984K 4832K 0K select 0 0:00 = 0.00% /usr/sbin/sshd 617 root 1 20 0 4308K 1860K 0K select 0 0:06 = 0.01% /usr/sbin/powerd 614 root 1 20 0 10452K 10480K 0K select 3 0:04 = 0.01% /usr/sbin/ntpd -g -c /etc/ntp.conf -p /var/run/ntpd.pid -f = /var/db/ntpd.drift 576 root 32 52 0 4388K 2160K 0K rpcsvc 0 0:00 = 0.00% nfsd: server (nfsd) 575 root 1 52 0 4720K 2044K 0K select 3 0:00 = 0.00% nfsd: master (nfsd) 573 root 1 52 0 5052K 2216K 0K select 3 0:00 = 0.00% /usr/sbin/mountd -r 443 root 1 20 0 4724K 2088K 0K select 0 0:00 = 0.00% /usr/sbin/rpcbind 434 root 1 20 0 4772K 2280K 0K select 1 0:01 = 0.00% /usr/sbin/syslogd -s 363 root 1 20 0 6428K 3576K 0K select 0 0:00 = 0.00% /sbin/devd 362 _dhcp 1 42 0 5128K 2388K 0K select 3 0:00 = 0.00% dhclient: ue0 (dhclient) 316 root 1 52 0 5032K 2292K 0K select 0 0:00 = 0.00% dhclient: ue0 [priv] (dhclient) 313 root 1 52 0 4800K 2148K 0K select 3 0:00 = 0.00% dhclient: system.syslog (dhclient) Any more experiments before a reboot (and possibly not seeing the problem at all)? =3D=3D=3D Mark Millard markmi at dsl-only.net
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?E6B45C90-5A0C-4AE0-AAFA-C9753FD5E908>