From owner-freebsd-hackers@FreeBSD.ORG  Mon Apr  2 17:55:39 2012
Date: Mon, 2 Apr 2012 10:55:31 -0700
Message-ID: <CADC0LV=-e+7PshRQdc69e2-Vktf6XFpVrqiMpx=QL4m_+9hSnw@mail.gmail.com>
From: Jerry Toung <jrytoung@gmail.com>
To: freebsd-hackers <freebsd-hackers@freebsd.org>
Content-Type: text/plain; charset=ISO-8859-1
Subject: CAM disk I/O starvation

Hello list,
I am convinced that there is a bug in the CAM code that leads to I/O starvation.
I have already discussed this privately with a few people. I am now bringing it
to the wider list to get more feedback.

My setup is one RAID controller with two arrays connected to it, da0 and
da1. The controller supports 252 tags. After boot, camcontrol tags on da0
and da1 shows that both devices have 252 openings each. A process P0
writing to da0 is dormant most of the time, but wakes up with bursts of
I/O, 5000-6000 ops as reported by gstat. A process P1 writing to da1 has
a fixed data rate, as reported by gstat.
The issue: when P0 generates that burst of 5000-6000 ops, the write rate
of P1 on da1 drops to 0 MB/s for up to 8-9 seconds, vfs.hirunningspace
starts climbing, and we end up sleeping in waithirunning() or on the
getblk() sleep channel. BTW, raising hirunningspace has no effect on the
0 MB/s behavior.

The first problem I see here is that if the sim's devq has 252 openings
for its alloc_queue and send_queue, the struct cam_ed representing da0
and da1 should each have 126 openings, not 252. The second problem is
that there is clearly no I/O fairness in CAM, as seen in the gstat
output: da0 takes exclusive hold of the sim/controller until it has
processed all its I/Os (8-9 seconds). The code that does this is at

cam/cam_xpt.c:3030
3030             && (devq->alloc_openings > 0)

and

cam/cam_xpt.c:3091
3091             && (devq->send_openings > 0)

After you've split the openings to 126 each, the tests above will always be true.
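
To make the starvation argument concrete, here is a toy model in C. It is
only a sketch: struct toy_devq, struct toy_dev and the toy_* functions are
made up for illustration and are not the real CAM structures or code. It
assumes the only gate on dispatch is a shared-pool check shaped like the
(devq->send_openings > 0) test quoted above, and that da0's burst is queued
before da1's next request arrives.

```c
#include <assert.h>

/* Toy model, not the real CAM structures: one shared pool of tags. */
struct toy_devq {
    int send_openings;   /* free tags left on the controller */
};

struct toy_dev {
    int queued;          /* requests waiting to be dispatched */
    int active;          /* requests currently holding a tag */
};

/*
 * Dispatch as many requests as the shared gate allows. The only check
 * is "are there controller openings left?"; there is no per-device cap.
 * Returns the number of requests dispatched.
 */
static int
toy_dispatch_shared(struct toy_devq *devq, struct toy_dev *dev)
{
    int sent = 0;

    assert(devq->send_openings >= 0);
    while (dev->queued > 0 && devq->send_openings > 0) {
        dev->queued--;
        dev->active++;
        devq->send_openings--;
        sent++;
    }
    return (sent);
}

/*
 * Scenario: da0 dispatches its burst first, then da1 tries to send a
 * single request. Returns how many requests da1 managed to dispatch.
 */
static int
toy_shared_scenario(int total_tags, int da0_burst)
{
    struct toy_devq devq = { total_tags };
    struct toy_dev da0 = { da0_burst, 0 };
    struct toy_dev da1 = { 1, 0 };

    toy_dispatch_shared(&devq, &da0);   /* da0 may take every free tag */
    return (toy_dispatch_shared(&devq, &da1));
}
```

In this model, toy_shared_scenario(252, 6000) returns 0: da0's burst takes
all 252 tags and da1 cannot dispatch anything until da0 completions free
some up. With a small burst, e.g. toy_shared_scenario(252, 100), da1 gets
its tag.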

I have a patch that fixes both problems. I can share it with the list on
request. With it, da0 and da1 both automatically get 126 openings and,
based on that, extra logic implements fairness in cam/cam_xpt.c. No more
0 MB/s on da1. This is on FreeBSD 8.1-RELEASE.

Any comments welcome.

Thanks,
Jerry