Skip site navigation (1)Skip section navigation (2)
Date:      Tue, 2 Jan 2018 12:27:47 -0800
From:      Mark Millard <markmi@dsl-only.net>
To:        Allan Jude <allanjude@freebsd.org>
Cc:        FreeBSD Hackers <freebsd-hackers@freebsd.org>
Subject:   Re: rpi2 head -r327485 (e.g.): rpi2 leaves one "CPU n" always idle for some boots
Message-ID:  <E6B45C90-5A0C-4AE0-AAFA-C9753FD5E908@dsl-only.net>
In-Reply-To: <85268695-3440-4B94-BF3D-68A38E204F50@dsl-only.net>
References:  <3258F809-F179-4EB7-857C-74667BECD360@dsl-only.net> <9da67685-d1c2-b6e0-cc32-efb49e67f0b1@freebsd.org> <85268695-3440-4B94-BF3D-68A38E204F50@dsl-only.net>

next in thread | previous in thread | raw e-mail | index | archive | help
[It looks like the opensll speed on problem cpuN (2)
can not get ^C and the like.

On 2018-Jan-2, at 12:13 PM, Mark Millard <markmi at dsl-only.net> wrote:

> On 2018-Jan-2, at 12:01 PM, Allan Jude <allanjude at freebsd.org> =
wrote:
>=20
>> On 2018-01-02 14:48, Mark Millard wrote:
>>> I've seen this over many versions of head for months
>>> but have never managed to find a way to force it to
>>> happen. It just shows up once and a while.
>>>=20
>>> Thus, I'm just dumping out some top and kernel information
>>> here for reference. I've used:
>>>=20
>>> openssl speed 1>/dev/null 2>&1 &
>>> openssl speed 1>/dev/null 2>&1 &
>>> openssl speed 1>/dev/null 2>&1 &
>>> openssl speed 1>/dev/null 2>&1 &
>>>=20
>>> to give the rpi2 4 active processes. Various outputs
>>> are from different times without a reboot between.
>>>=20
>>> top -CaePores shows the likes of:
>>>=20
>>> PID USERNAME       THR PRI NICE   SIZE    RES   SWAP STATE   C   =
TIME     CPU COMMAND
>>> 614 root             1  20    0 10452K 10480K     0K select  1   =
0:00   0.03% /usr/sbin/ntpd -g -c /etc/ntp.conf -p /var/run/ntpd.pid -f =
/var/db/ntpd.drift
>>> 661 root             1  52    0  9984K  6132K     0K select  1   =
0:00   0.00% /usr/sbin/sshd
>>> 751 root             1 101    0  7256K  4276K     0K RUN     1   =
0:28  99.57% openssl speed
>>> 750 root             1 100    0  7256K  4276K     0K CPU0    0   =
0:32  94.83% openssl speed
>>> 753 root             1  86    0  7256K  4276K     0K RUN     3   =
0:13  52.36% openssl speed
>>> 752 root             1  86    0  7256K  4276K     0K CPU3    3   =
0:14  46.54% openssl speed
>>> 363 root             1  20    0  6428K  3840K     0K select  3   =
0:00   0.00% /sbin/devd
>>> . . .
>>>=20
>>> and:
>>>=20
>>> last pid:   754;  load averages:  3.70,  2.38,  1.58                 =
                                                                         =
                                  up 0+00:16:50  01:59:37
>>> 21 processes:  5 running, 16 sleeping
>>> CPU 0: 94.9% user,  0.0% nice,  0.0% system,  5.1% interrupt,  0.0% =
idle
>>> CPU 1: 99.6% user,  0.0% nice,  0.0% system,  0.4% interrupt,  0.0% =
idle
>>> CPU 2:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% =
idle
>>> CPU 3:  100% user,  0.0% nice,  0.0% system,  0.0% interrupt,  0.0% =
idle
>>> Mem: 12M Active, 1136K Inact, 56M Wired, 30M Buf, 722M Free
>>> Swap: 1536M Total, 6M Free
>>>=20
>>> =46rom problem boot to problem boot, the CPU that stays
>>> idle has varied but usually has been CPU 2. I've never
>>> seen 2 or more stuck in idle.
>>>=20
>>> show allpcpu shows the likes of:
>>>=20
>>> db> show allpcpu
>>> Current CPU: 0
>>>=20
>>> cpuid        =3D 0
>>> dynamic pcpu =3D 0x3d2540
>>> curthread    =3D 0xd8478ae0: pid 2032 tid 100150 "openssl"
>>> curpcb       =3D 0xd852ae98
>>> fpcurthread  =3D 0xd8478ae0: pid 2032 "openssl"
>>> idlethread   =3D 0xc376fae0: tid 100002 "idle: cpu0"
>>> curpmap      =3D 0xd8e43bf4
>>> curvnet      =3D 0
>>>=20
>>> cpuid        =3D 1
>>> dynamic pcpu =3D 0x3998540
>>> curthread    =3D 0xd7e5b3a0: pid 2031 tid 100173 "openssl"
>>> curpcb       =3D 0xda7e0e98
>>> fpcurthread  =3D 0xd7e5b3a0: pid 2031 "openssl"
>>> idlethread   =3D 0xc376f740: tid 100003 "idle: cpu1"
>>> curpmap      =3D 0xd8e43ec4
>>> curvnet      =3D 0
>>>=20
>>> cpuid        =3D 2
>>> dynamic pcpu =3D 0x3999540
>>> curthread    =3D 0xc376f3a0: pid 10 tid 100004 "idle: cpu2"
>>> curpcb       =3D 0xc378ae98
>>> fpcurthread  =3D none
>>> idlethread   =3D 0xc376f3a0: tid 100004 "idle: cpu2"
>>> curpmap      =3D 0
>>> curvnet      =3D 0
>>>=20
>>> cpuid        =3D 3
>>> dynamic pcpu =3D 0x399a540
>>> curthread    =3D 0xd8477000: pid 2034 tid 100167 "openssl"
>>> curpcb       =3D 0xd876de98
>>> fpcurthread  =3D 0xd8477000: pid 2034 "openssl"
>>> idlethread   =3D 0xc376f000: tid 100005 "idle: cpu3"
>>> curpmap      =3D 0xc377ab04
>>> curvnet      =3D 0
>>>=20
>>> In other words: it appears that the cpuN (here cpu2) is
>>> left with idle scheduled all the time for some reason.
>>>=20
>>> ps from db> shows things like:
>>>=20
>>>=20
>>> db> ps
>>> pid  ppid  pgrp   uid  state   wmesg   wchan       cmd
>>> 2034   714  2034     0  R+                          openssl
>>> 2033   714  2033     0  R+      CPU 3               openssl
>>> 2032   714  2032     0  R+      CPU 0               openssl
>>> 2031   714  2031     0  R+      CPU 1               openssl
>>>=20
>>> (then later:)
>>>=20
>>> db> ps
>>> pid  ppid  pgrp   uid  state   wmesg   wchan       cmd
>>> 2034   714  2034     0  R+      CPU 3               openssl
>>> 2033   714  2033     0  R+                          openssl
>>> 2032   714  2032     0  R+      CPU 0               openssl
>>> 2031   714  2031     0  R+      CPU 1               openssl
>>>=20
>>> There is also:
>>>=20
>>>  10     0     0     0  RL      (threaded)          [idle]
>>> 100002                   CanRun                      [idle: cpu0]
>>> 100003                   CanRun                      [idle: cpu1]
>>> 100004                   CanRun                      [idle: cpu2]
>>> 100005                   CanRun                      [idle: cpu3]
>>>=20
>>>=20
>>> These are from:
>>>=20
>>> # uname -apKU
>>> FreeBSD rpi2 12.0-CURRENT FreeBSD 12.0-CURRENT  r327485M  arm armv7 =
1200054 1200054
>>>=20
>>>=20
>>> =3D=3D=3D
>>> Mark Millard
>>> markmi at dsl-only.net
>>>=20
>>> _______________________________________________
>>> freebsd-hackers@freebsd.org mailing list
>>> https://lists.freebsd.org/mailman/listinfo/freebsd-hackers
>>> To unsubscribe, send any mail to =
"freebsd-hackers-unsubscribe@freebsd.org"
>>>=20
>>=20
>> Have you tried using 'cpuset' to force each of the 4 to run on a
>> specific CPU? What happens if you force it to run on cpu #2?
>=20
> # cpuset -c -l 2 openssl speed 1>/dev/null 2>&1 &
>=20
> results in:
>=20
> last pid:  2120;  load averages:  2.95,  2.44,  2.25                   =
                                                                         =
                                up 0+10:26:07  12:08:54
> 18 processes:  2 running, 16 sleeping
> CPU 0:  0.0% user,  0.0% nice,  0.0% system,  4.7% interrupt, 95.3% =
idle
> CPU 1:  0.4% user,  0.0% nice,  0.0% system,  0.0% interrupt, 99.6% =
idle
> CPU 2:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% =
idle
> CPU 3:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% =
idle
> Mem: 1852K Active, 587M Inact, 5368K Laundry, 145M Wired, 86M Buf, 53M =
Free
> Swap: 1536M Total, 1536M Free
>=20
>  PID USERNAME       THR PRI NICE   SIZE    RES   SWAP STATE   C   TIME =
    CPU COMMAND
> 2120 root             1  20    0  6432K  2872K     0K CPU1    1   0:00 =
  0.22% top -CawPopid
> 2118 root             1  29    0  7256K  4432K     0K RUN     1   0:00 =
  0.00% openssl speed
> . . .
>=20
>=20
> With the other 3 also assigned:
>=20
> last pid:  2138;  load averages:  4.45,  3.04,  2.51                   =
                                                                         =
                                up 0+10:28:48  12:11:35
> 21 processes:  5 running, 16 sleeping
> CPU 0: 93.7% user,  0.0% nice,  1.6% system,  4.7% interrupt,  0.0% =
idle
> CPU 1:  100% user,  0.0% nice,  0.0% system,  0.0% interrupt,  0.0% =
idle
> CPU 2:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% =
idle
> CPU 3:  100% user,  0.0% nice,  0.0% system,  0.0% interrupt,  0.0% =
idle
> Mem: 3632K Active, 587M Inact, 5368K Laundry, 145M Wired, 86M Buf, 51M =
Free
> Swap: 1536M Total, 1536M Free
>=20
>  PID USERNAME       THR PRI NICE   SIZE    RES   SWAP STATE   C   TIME =
    CPU COMMAND
> 2138 root             1  20    0  6472K  2888K     0K CPU1    1   0:00 =
  0.24% top -CawPopid
> 2137 root             1 100    0  7256K  4436K     0K CPU3    3   0:25 =
 96.79% openssl speed
> 2124 root             1 102    0  7256K  4436K     0K RUN     1   0:37 =
101.66% openssl speed
> 2123 root             1 101    0  7256K  4436K     0K CPU0    0   0:44 =
 95.07% openssl speed
> 2118 root             1  29    0  7256K  4432K     0K RUN     1   0:00 =
  0.00% openssl speed
> . . .
>=20
> Thus the difference is that the one instance
> of openssl speed does not run at all.

I tried to fg and ^C each of the 4 openssl's,
with cpu2's instance being last.

All worked but for cpu2's.

^C and ^Z do not work after the fg. But ^T reports:

# fg
cpuset -c -l 3 openssl speed >/dev/null 2>&1
^C
# fg
cpuset -c -l 1 openssl speed >/dev/null 2>&1
^C
# fg
cpuset -c -l 0 openssl speed >/dev/null 2>&1
^C# fg
cpuset -c -l 2 openssl speed >/dev/null 2>&1
^C
^Z
load: 3.03  cmd: openssl 2118 [runnable] 768.91r 0.06u 0.02s 0% 4424k
load: 3.07  cmd: openssl 2118 [runnable] 825.03r 0.06u 0.02s 0% 4424k
load: 3.06  cmd: openssl 2118 [runnable] 1177.78r 0.06u 0.02s 0% 4424k


Using an ssh sessions as the console is stuck
in the above state, it appears that the load
average is now stuck at around 3 despite the
basically idle state:

last pid:  2172;  load averages:  3.03,  3.29,  3.08                     =
                                                                         =
                              up 0+10:39:36  12:22:23
23 processes:  2 running, 21 sleeping
CPU 0:  0.0% user,  0.0% nice,  0.0% system,  3.1% interrupt, 96.9% idle
CPU 1:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
CPU 2:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
CPU 3:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
Mem: 4480K Active, 590M Inact, 5364K Laundry, 145M Wired, 86M Buf, 46M =
Free
Swap: 1536M Total, 1536M Free

  PID USERNAME       THR PRI NICE   SIZE    RES   SWAP STATE   C   TIME  =
   CPU COMMAND
 2172 root             1  20    0  6344K  2888K     0K CPU3    3   0:00  =
 0.24% top -CawPopid
 2155 root             1  22    0  5484K  2660K     0K wait    3   0:00  =
 0.00% su (sh)
 2154 markmi           1  23    0  5260K  2648K     0K wait    0   0:00  =
 0.00% su
 2148 markmi           1  28    0  5524K  2724K     0K wait    3   0:00  =
 0.00% -sh (sh)
 2147 markmi           1  20    0 11024K  6208K     0K select  0   0:00  =
 0.03% sshd: markmi@pts/0 (sshd)
 2143 root             1  26    0 10644K  6128K     0K select  3   0:00  =
 0.00% sshd: markmi [priv] (sshd)
 2118 root             1  20    0  7256K  4432K     0K RUN     1   0:00  =
 0.00% openssl speed
  714 root             1  20    0  5568K  2828K     0K wait    3   0:00  =
 0.00% -sh (sh)
  713 root             1  22    0  5300K  2600K     0K wait    1   0:00  =
 0.00% login [pam] (login)
  712 root             1  52    0  4356K  2004K     0K ttyin   0   0:00  =
 0.00% /usr/libexec/getty Pc ttyv0
  665 root             1  52    0  4816K  2332K     0K nanslp  3   0:01  =
 0.00% /usr/sbin/cron -s
  661 root             1  20    0  9984K  4832K     0K select  0   0:00  =
 0.00% /usr/sbin/sshd
  617 root             1  20    0  4308K  1860K     0K select  0   0:06  =
 0.01% /usr/sbin/powerd
  614 root             1  20    0 10452K 10480K     0K select  3   0:04  =
 0.01% /usr/sbin/ntpd -g -c /etc/ntp.conf -p /var/run/ntpd.pid -f =
/var/db/ntpd.drift
  576 root            32  52    0  4388K  2160K     0K rpcsvc  0   0:00  =
 0.00% nfsd: server (nfsd)
  575 root             1  52    0  4720K  2044K     0K select  3   0:00  =
 0.00% nfsd: master (nfsd)
  573 root             1  52    0  5052K  2216K     0K select  3   0:00  =
 0.00% /usr/sbin/mountd -r
  443 root             1  20    0  4724K  2088K     0K select  0   0:00  =
 0.00% /usr/sbin/rpcbind
  434 root             1  20    0  4772K  2280K     0K select  1   0:01  =
 0.00% /usr/sbin/syslogd -s
  363 root             1  20    0  6428K  3576K     0K select  0   0:00  =
 0.00% /sbin/devd
  362 _dhcp            1  42    0  5128K  2388K     0K select  3   0:00  =
 0.00% dhclient: ue0 (dhclient)
  316 root             1  52    0  5032K  2292K     0K select  0   0:00  =
 0.00% dhclient: ue0 [priv] (dhclient)
  313 root             1  52    0  4800K  2148K     0K select  3   0:00  =
 0.00% dhclient: system.syslog (dhclient)


Any more experiments before a reboot (and possibly not seeing
the problem at all)?

=3D=3D=3D
Mark Millard
markmi at dsl-only.net





Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?E6B45C90-5A0C-4AE0-AAFA-C9753FD5E908>