Date:      Mon, 20 Aug 2018 11:53:03 -0700
From:      John-Mark Gurney <jmg@funkthat.com>
To:        Ian Lepore <ian@freebsd.org>, freebsd-arm@FreeBSD.org
Cc:        manu@FreeBSD.org
Subject:   Re: sx_sleep not waking up when timo expires
Message-ID:  <20180820185303.GH97145@funkthat.com>
In-Reply-To: <20180811020136.GD97145@funkthat.com>
References:  <20180729010157.GC2884@funkthat.com> <1532874944.61594.110.camel@freebsd.org> <20180811020136.GD97145@funkthat.com>

John-Mark Gurney wrote this message on Fri, Aug 10, 2018 at 19:01 -0700:
> Also, I've had the shell command sleep hang as well. I figure that's
> expected, but it made me realize that a good test program could be to
> fire up a bunch of threads and sleep in them, to find the
> problem more quickly.
> 
> Anything I can do to help debug/fix it?
> 
> I have a couple spare LTS boards specifically to do stuff like this.

I wrote a program to trigger the issue.  It triggered the issue in only
an hour or two on both the A64-LTS boards that I've tried it on.

Hopefully this can help others debug it.

On my firewall board, which handles a lot of interrupts, it happens much
more frequently.  In the last 4 hours or so of running the program, I've
had 6 threads hang in sleep.

# vmstat -i
interrupt                                             total       rate
gic0,p11: +                                       664239597        676
gic0,s0: uart0                                        10659          0
gic0,s60: aw_mmc0                                     70294          0
gic0,s82: awg0                                    452027133        460
cpu0:ast                                                511          0
cpu1:ast                                                 48          0
cpu2:ast                                                 34          0
cpu3:ast                                                 35          0
cpu0:preempt                                       15682717         16
cpu1:preempt                                       14384242         15
cpu2:preempt                                       16722306         17
cpu3:preempt                                       16798837         17
cpu0:rendezvous                                      300161          0
cpu1:rendezvous                                        8545          0
cpu2:rendezvous                                      300183          0
cpu3:rendezvous                                      300115          0
cpu0:hardclock                                        35093          0
Total                                            1180880510       1201
# uptime
11:50AM  up 11 days,  9:03, 8 users, load averages: 0.68, 0.83, 0.93

The other box, which has had only two threads freeze, has a total
interrupt rate of 77.
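
As a rough cross-check of the numbers above, the rate column looks like
the total count divided by the uptime in seconds.  The sketch below
assumes that relationship; the helper names uptime_seconds and
interrupt_rate are made up for illustration, and uptime only has minute
resolution, so the result is approximate.

```python
def uptime_seconds(days, hours, minutes):
    """Convert an uptime of days, hours and minutes into seconds."""
    return days * 86400 + hours * 3600 + minutes * 60


def interrupt_rate(total, seconds):
    """Average interrupts per second over the whole uptime (assumed formula)."""
    return total / float(seconds)


if __name__ == '__main__':
    # Values copied from the vmstat -i and uptime output above.
    secs = uptime_seconds(11, 9, 3)
    print('uptime seconds:', secs)
    print('total rate:', interrupt_rate(1180880510, secs))
    print('awg0 rate:', interrupt_rate(452027133, secs))
```

The same two helpers can be pointed at the other box's totals to compare
the two machines on equal terms.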

---- sleeptest.py ----
import Queue
import threading
import time
import random

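def sleepfun(q, lngth, extlst, idobj):
	# Sleep for a jittered interval (1x to 1.5x lngth), then report a
	# check-in time to the main thread.  Runs until extlst is non-empty.
	while not extlst:
		#factor = (random.random() + 1) * 4
		factor = (random.random() * .5 + 1)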
		time.sleep(lngth * factor)
		q.put((idobj, time.time()))

def run():
	sleeplength = .5
	exitlist = []
	nthreads = 20

	q = Queue.Queue()

	thds = {}
	lastcheck = {}

	for i in xrange(nthreads):
		obj = object()
		thr = threading.Thread(target=sleepfun, args=(q, sleeplength, exitlist, obj))
		thds[obj] = thr
		lastcheck[obj] = time.time()
		thr.start()

	try:
		while True:
			for i in xrange(nthreads*3):
				obj, tm = q.get()
				lastcheck[obj] = tm

			cur = time.time()
			for i in lastcheck.keys():
				if not thds[i].isAlive():
					print 'thread died.'
					del thds[i]
					del lastcheck[i]
					continue

				print 'last checkin:', cur - lastcheck[i]
				if cur - lastcheck[i] > 2 * sleeplength:
					print 'thread is stuck:', repr(i), 'since:', time.ctime(lastcheck[i])
	except KeyboardInterrupt:
		print 'trying to exit...'
		print time.ctime(time.time())
		exitlist.append(True)
		for i in thds:
			thds[i].join()

if __name__ == '__main__':
	run()
---- sleeptest.py ----
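
The script above is Python 2.  For anyone trying this under Python 3, the
stuck-thread check can be pulled out into a small function that is easy
to test without waiting for a real hang.  This is only a sketch and not
part of the original script: find_stuck and its parameter names are
hypothetical, and the threshold of 2 * sleeplength is the one the script
uses.

```python
import time


def find_stuck(lastcheck, now, threshold):
    """Return the ids whose last check-in is more than threshold seconds old."""
    return [ident for ident, seen in lastcheck.items() if now - seen > threshold]


if __name__ == '__main__':
    sleeplength = .5
    now = time.time()
    # Two fake workers: one checked in just now, one 5 seconds ago.
    lastcheck = {'worker-a': now, 'worker-b': now - 5}
    for ident in find_stuck(lastcheck, now, 2 * sleeplength):
        print('thread is stuck:', ident, 'since:', time.ctime(lastcheck[ident]))
```

Keeping the check separate from the queue loop also makes it possible to
run it on a timer, so the monitor still reports if every worker hangs
and q.get() never returns.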

-- 
  John-Mark Gurney				Voice: +1 415 225 5579

     "All that I will do, has been done, All that I have, has not."


