From: Jerry Toung
To: freebsd-hackers
Date: Mon, 2 Apr 2012 10:55:31 -0700
Subject: CAM disk I/O starvation

Hello list,

I am convinced that there is a bug in the CAM code that leads to I/O starvation. I have already discussed this privately with some. I am now bringing this up to the general audience to get more feedback.
My setup: one RAID controller with two arrays attached, da0 and da1. The controller supports 252 tags. After boot, "camcontrol tags" on da0 and da1 shows that each device has 252 openings.

A process P0 writing to da0 is dormant most of the time but wakes up with bursts of I/O, 5000-6000 ops as reported by gstat. A process P1 writing to da1 has a fixed data rate, as reported by gstat.

The issue: when P0 generates a burst of 5000-6000 ops, P1's write rate on da1 drops to 0 MB/s for up to 8-9 seconds, vfs.hirunningspace starts climbing, and we end up in the waithirunning() or getblk() sleep channel. BTW, raising hirunningspace has no effect on the 0 MB/s behavior.

The first problem I see is that if the SIM's devq has 252 openings for its alloc_queue and send_queue, the struct cam_ed representing da0 and da1 should each have 126 openings, not 252.

The second problem is that there is clearly no I/O fairness in CAM, as the gstat output shows: da0 takes exclusive hold of the SIM/controller until it has processed all its I/Os (8-9 seconds). The code that does this is at

    cam/cam_xpt.c:3030
        && (devq->alloc_openings > 0)

and

    cam/cam_xpt.c:3091
        && (devq->send_openings > 0)

Once the openings are split to 126 each, the tests above will always be true.

I have a patch that fixes both problems, and I can share it with the list on request. With it, da0 and da1 each automatically get 126 openings, and extra logic built on that split implements fairness in cam/cam_xpt.c. No more 0 MB/s on da1.

This is on FreeBSD 8.1-RELEASE. Any comments welcome.

Thanks,
Jerry
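
To make the idea above concrete, here is a minimal user-space sketch of splitting a controller's tags evenly between devices and dispatching round-robin under per-device quotas. It is not the patch mentioned in this message and it is not code from cam/cam_xpt.c: struct model_dev, split_openings() and dispatch_round_robin() are hypothetical names invented for illustration, and the model ignores everything CAM does beyond counting tags.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one device behind a shared controller.
 * These fields are illustrative and do not mirror struct cam_ed. */
struct model_dev {
    int queued;    /* requests waiting to be dispatched */
    int in_flight; /* requests currently holding a controller tag */
    int quota;     /* this device's share of the controller's tags */
};

/* Split total_tags evenly across ndevs devices.
 * Any remainder goes to the lowest-numbered devices, one tag each. */
static void split_openings(struct model_dev *devs, size_t ndevs, int total_tags)
{
    assert(ndevs > 0 && total_tags >= 0);
    int base = total_tags / (int)ndevs;
    int extra = total_tags % (int)ndevs;
    for (size_t i = 0; i < ndevs; i++)
        devs[i].quota = base + ((int)i < extra ? 1 : 0);
}

/* Dispatch queued requests round-robin, one request per device per turn.
 * A device is skipped once it has nothing queued or has used its quota.
 * Stops when the controller has no free tags or no device can send.
 * Returns the number of requests dispatched. */
static int dispatch_round_robin(struct model_dev *devs, size_t ndevs, int free_tags)
{
    int sent = 0;
    int progress = 1;
    while (free_tags > 0 && progress) {
        progress = 0;
        for (size_t i = 0; i < ndevs && free_tags > 0; i++) {
            struct model_dev *d = &devs[i];
            if (d->queued > 0 && d->in_flight < d->quota) {
                d->queued--;
                d->in_flight++;
                free_tags--;
                sent++;
                progress = 1;
            }
        }
    }
    return sent;
}
```

With two devices and 252 tags, split_openings() gives each device a quota of 126. If the first device then queues a 6000-request burst while the second has only a handful queued, the burst can occupy at most 126 tags, so the second device still finds free tags on every pass instead of waiting for the burst to drain.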