From: Warner Losh
Date: Thu, 14 Apr 2016 22:13:44 +0000 (UTC)
To: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org
Subject: svn commit: r298010 - head/sys/cam
Message-Id: <201604142213.u3EMDiGn061020@repo.freebsd.org>

Author: imp
Date: Thu Apr 14 22:13:44 2016
New Revision: 298010
URL: https://svnweb.freebsd.org/changeset/base/298010

Log:
  Add in missing files from r298002.
Added: head/sys/cam/cam_iosched.c (contents, props changed) head/sys/cam/cam_iosched.h (contents, props changed) Added: head/sys/cam/cam_iosched.c ============================================================================== --- /dev/null 00:00:00 1970 (empty, because file is newly added) +++ head/sys/cam/cam_iosched.c Thu Apr 14 22:13:44 2016 (r298010) @@ -0,0 +1,1599 @@ +/*- + * CAM IO Scheduler Interface + * + * Copyright (c) 2015 Netflix, Inc. + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions, and the following disclaimer, + * without modification, immediately at the beginning of the file. + * 2. The name of the author may not be used to endorse or promote products + * derived from this software without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE FOR + * ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. 
+ * + * $FreeBSD$ + */ + +#include "opt_cam.h" +#include "opt_ddb.h" + +#include +__FBSDID("$FreeBSD$"); + +#include + +#include +#include +#include +#include +#include +#include +#include + +#include +#include +#include +#include +#include + +#include + +static MALLOC_DEFINE(M_CAMSCHED, "CAM I/O Scheduler", + "CAM I/O Scheduler buffers"); + +/* + * Default I/O scheduler for FreeBSD. This implementation is just a thin veneer + * over the bioq_* interface, with notions of separate calls for normal I/O and + * for trims. + */ + +#ifdef CAM_NETFLIX_IOSCHED + +SYSCTL_DECL(_kern_cam); +static int do_netflix_iosched = 1; +TUNABLE_INT("kern.cam.do_netflix_iosched", &do_netflix_iosched); +SYSCTL_INT(_kern_cam, OID_AUTO, do_netflix_iosched, CTLFLAG_RD, + &do_netflix_iosched, 1, + "Enable Netflix I/O scheduler optimizations."); + +static int alpha_bits = 9; +TUNABLE_INT("kern.cam.iosched_alpha_bits", &alpha_bits); +SYSCTL_INT(_kern_cam, OID_AUTO, iosched_alpha_bits, CTLFLAG_RW, + &alpha_bits, 1, + "Bits in EMA's alpha."); + + + +struct iop_stats; +struct cam_iosched_softc; + +int iosched_debug = 0; + +typedef enum { + none = 0, /* No limits */ + queue_depth, /* Limit how many ops we queue to SIM */ + iops, /* Limit # of IOPS to the drive */ + bandwidth, /* Limit bandwidth to the drive */ + limiter_max +} io_limiter; + +static const char *cam_iosched_limiter_names[] = + { "none", "queue_depth", "iops", "bandwidth" }; + +/* + * Called to initialize the bits of the iop_stats structure relevant to the + * limiter. Called just after the limiter is set. + */ +typedef int l_init_t(struct iop_stats *); + +/* + * Called every tick. + */ +typedef int l_tick_t(struct iop_stats *); + +/* + * Called to see if the limiter thinks this IOP can be allowed to + * proceed. If so, the limiter assumes that the whole IOP proceeded + * and makes any accounting of it that's needed. 
+ */ +typedef int l_iop_t(struct iop_stats *, struct bio *); + +/* + * Called when an I/O completes so the limiter can update its + * accounting. Pending I/Os may complete in any order (even when + * sent to the hardware at the same time), so the limiter may not + * make any assumptions other than that this I/O has completed. If it + * returns 1, then xpt_schedule() needs to be called again. + */ +typedef int l_iodone_t(struct iop_stats *, struct bio *); + +static l_iop_t cam_iosched_qd_iop; +static l_iop_t cam_iosched_qd_caniop; +static l_iodone_t cam_iosched_qd_iodone; + +static l_init_t cam_iosched_iops_init; +static l_tick_t cam_iosched_iops_tick; +static l_iop_t cam_iosched_iops_caniop; +static l_iop_t cam_iosched_iops_iop; + +static l_init_t cam_iosched_bw_init; +static l_tick_t cam_iosched_bw_tick; +static l_iop_t cam_iosched_bw_caniop; +static l_iop_t cam_iosched_bw_iop; + +struct limswitch +{ + l_init_t *l_init; + l_tick_t *l_tick; + l_iop_t *l_iop; + l_iop_t *l_caniop; + l_iodone_t *l_iodone; +} limsw[] = +{ + { /* none */ + .l_init = NULL, + .l_tick = NULL, + .l_iop = NULL, + .l_iodone= NULL, + }, + { /* queue_depth */ + .l_init = NULL, + .l_tick = NULL, + .l_caniop = cam_iosched_qd_caniop, + .l_iop = cam_iosched_qd_iop, + .l_iodone= cam_iosched_qd_iodone, + }, + { /* iops */ + .l_init = cam_iosched_iops_init, + .l_tick = cam_iosched_iops_tick, + .l_caniop = cam_iosched_iops_caniop, + .l_iop = cam_iosched_iops_iop, + .l_iodone= NULL, + }, + { /* bandwidth */ + .l_init = cam_iosched_bw_init, + .l_tick = cam_iosched_bw_tick, + .l_caniop = cam_iosched_bw_caniop, + .l_iop = cam_iosched_bw_iop, + .l_iodone= NULL, + }, +}; + +struct iop_stats +{ + /* + * sysctl state for this subnode. 
+ */ + struct sysctl_ctx_list sysctl_ctx; + struct sysctl_oid *sysctl_tree; + + /* + * Information about the current rate limiters, if any + */ + io_limiter limiter; /* How are I/Os being limited */ + int min; /* Low range of limit */ + int max; /* High range of limit */ + int current; /* Current rate limiter */ + int l_value1; /* per-limiter scratch value 1. */ + int l_value2; /* per-limiter scratch value 2. */ + + + /* + * Debug information about counts of I/Os that have gone through the + * scheduler. + */ + int pending; /* I/Os pending in the hardware */ + int queued; /* number currently in the queue */ + int total; /* Total for all time -- wraps */ + int in; /* number queued all time -- wraps */ + int out; /* number completed all time -- wraps */ + + /* + * Statistics on different bits of the process. + */ + /* Exp Moving Average, alpha = 1 / (1 << alpha_bits) */ + sbintime_t ema; + sbintime_t emss; /* Exp Moving sum of the squares */ + sbintime_t sd; /* Last computed sd */ + + struct cam_iosched_softc *softc; +}; + + +typedef enum { + set_max = 0, /* current = max */ + read_latency, /* Steer read latency by throttling writes */ + cl_max /* Keep last */ +} control_type; + +static const char *cam_iosched_control_type_names[] = + { "set_max", "read_latency" }; + +struct control_loop +{ + /* + * sysctl state for this subnode. + */ + struct sysctl_ctx_list sysctl_ctx; + struct sysctl_oid *sysctl_tree; + + sbintime_t next_steer; /* Time of next steer */ + sbintime_t steer_interval; /* How often do we steer? */ + sbintime_t lolat; + sbintime_t hilat; + int alpha; + control_type type; /* What type of control? 
*/ + int last_count; /* Last I/O count */ + + struct cam_iosched_softc *softc; +}; + +#endif + +struct cam_iosched_softc +{ + struct bio_queue_head bio_queue; + struct bio_queue_head trim_queue; + /* scheduler flags < 16, user flags >= 16 */ + uint32_t flags; + int sort_io_queue; +#ifdef CAM_NETFLIX_IOSCHED + int read_bias; /* Read bias setting */ + int current_read_bias; /* Current read bias state */ + int total_ticks; + + struct bio_queue_head write_queue; + struct iop_stats read_stats, write_stats, trim_stats; + struct sysctl_ctx_list sysctl_ctx; + struct sysctl_oid *sysctl_tree; + + int quanta; /* Number of quanta per second */ + struct callout ticker; /* Callout for our quota system */ + struct cam_periph *periph; /* cam periph associated with this device */ + uint32_t this_frac; /* Fraction of a second (1024ths) for this tick */ + sbintime_t last_time; /* Last time we ticked */ + struct control_loop cl; +#endif +}; + +#ifdef CAM_NETFLIX_IOSCHED +/* + * helper functions to call the limsw functions. 
+ */ +static int +cam_iosched_limiter_init(struct iop_stats *ios) +{ + int lim = ios->limiter; + + /* maybe this should be a kassert */ + if (lim < none || lim >= limiter_max) + return EINVAL; + + if (limsw[lim].l_init) + return limsw[lim].l_init(ios); + + return 0; +} + +static int +cam_iosched_limiter_tick(struct iop_stats *ios) +{ + int lim = ios->limiter; + + /* maybe this should be a kassert */ + if (lim < none || lim >= limiter_max) + return EINVAL; + + if (limsw[lim].l_tick) + return limsw[lim].l_tick(ios); + + return 0; +} + +static int +cam_iosched_limiter_iop(struct iop_stats *ios, struct bio *bp) +{ + int lim = ios->limiter; + + /* maybe this should be a kassert */ + if (lim < none || lim >= limiter_max) + return EINVAL; + + if (limsw[lim].l_iop) + return limsw[lim].l_iop(ios, bp); + + return 0; +} + +static int +cam_iosched_limiter_caniop(struct iop_stats *ios, struct bio *bp) +{ + int lim = ios->limiter; + + /* maybe this should be a kassert */ + if (lim < none || lim >= limiter_max) + return EINVAL; + + if (limsw[lim].l_caniop) + return limsw[lim].l_caniop(ios, bp); + + return 0; +} + +static int +cam_iosched_limiter_iodone(struct iop_stats *ios, struct bio *bp) +{ + int lim = ios->limiter; + + /* maybe this should be a kassert */ + if (lim < none || lim >= limiter_max) + return 0; + + if (limsw[lim].l_iodone) + return limsw[lim].l_iodone(ios, bp); + + return 0; +} + +/* + * Functions to implement the different kinds of limiters + */ + +static int +cam_iosched_qd_iop(struct iop_stats *ios, struct bio *bp) +{ + + if (ios->current <= 0 || ios->pending < ios->current) + return 0; + + return EAGAIN; +} + +static int +cam_iosched_qd_caniop(struct iop_stats *ios, struct bio *bp) +{ + + if (ios->current <= 0 || ios->pending < ios->current) + return 0; + + return EAGAIN; +} + +static int +cam_iosched_qd_iodone(struct iop_stats *ios, struct bio *bp) +{ + + if (ios->current <= 0 || ios->pending != ios->current) + return 0; + + return 1; +} + +static int 
+cam_iosched_iops_init(struct iop_stats *ios) +{ + + ios->l_value1 = ios->current / ios->softc->quanta; + if (ios->l_value1 <= 0) + ios->l_value1 = 1; + + return 0; +} + +static int +cam_iosched_iops_tick(struct iop_stats *ios) +{ + + ios->l_value1 = (int)((ios->current * (uint64_t)ios->softc->this_frac) >> 16); + if (ios->l_value1 <= 0) + ios->l_value1 = 1; + + return 0; +} + +static int +cam_iosched_iops_caniop(struct iop_stats *ios, struct bio *bp) +{ + + /* + * So if we have any more IOPs left, allow it, + * otherwise wait. + */ + if (ios->l_value1 <= 0) + return EAGAIN; + return 0; +} + +static int +cam_iosched_iops_iop(struct iop_stats *ios, struct bio *bp) +{ + int rv; + + rv = cam_iosched_limiter_caniop(ios, bp); + if (rv == 0) + ios->l_value1--; + + return rv; +} + +static int +cam_iosched_bw_init(struct iop_stats *ios) +{ + + /* ios->current is in kB/s, so scale to bytes */ + ios->l_value1 = ios->current * 1000 / ios->softc->quanta; + + return 0; +} + +static int +cam_iosched_bw_tick(struct iop_stats *ios) +{ + int bw; + + /* + * If we're in the hole for available quota from + * the last time, then add the quantum for this. + * If we have any left over from last quantum, + * then too bad, that's lost. Also, ios->current + * is in kB/s, so scale. + * + * We also allow up to 4 quanta of credits to + * accumulate to deal with burstiness. 4 is extremely + * arbitrary. + */ + bw = (int)((ios->current * 1000ull * (uint64_t)ios->softc->this_frac) >> 16); + if (ios->l_value1 < bw * 4) + ios->l_value1 += bw; + + return 0; +} + +static int +cam_iosched_bw_caniop(struct iop_stats *ios, struct bio *bp) +{ + /* + * So if we have any more bw quota left, allow it, + * otherwise wait. Note, we'll go negative and that's + * OK. We'll just get a little less next quota. + * + * Note on going negative: that allows us to process + * requests in order better, since we won't allow + * shorter reads to get around the long one that we + * don't have the quota to do just yet. 
It also prevents + * starvation by being a little more permissive about + * what we let through this quantum (to prevent the + * starvation), at the cost of getting a little less + * next quantum. + */ + if (ios->l_value1 <= 0) + return EAGAIN; + + + return 0; +} + +static int +cam_iosched_bw_iop(struct iop_stats *ios, struct bio *bp) +{ + int rv; + + rv = cam_iosched_limiter_caniop(ios, bp); + if (rv == 0) + ios->l_value1 -= bp->bio_length; + + return rv; +} + +static void cam_iosched_cl_maybe_steer(struct control_loop *clp); + +static void +cam_iosched_ticker(void *arg) +{ + struct cam_iosched_softc *isc = arg; + sbintime_t now, delta; + + callout_reset(&isc->ticker, hz / isc->quanta - 1, cam_iosched_ticker, isc); + + now = sbinuptime(); + delta = now - isc->last_time; + isc->this_frac = (uint32_t)delta >> 16; /* Note: discards seconds -- should be 0 harmless if not */ + isc->last_time = now; + + cam_iosched_cl_maybe_steer(&isc->cl); + + cam_iosched_limiter_tick(&isc->read_stats); + cam_iosched_limiter_tick(&isc->write_stats); + cam_iosched_limiter_tick(&isc->trim_stats); + + cam_iosched_schedule(isc, isc->periph); + + isc->total_ticks++; +} + + +static void +cam_iosched_cl_init(struct control_loop *clp, struct cam_iosched_softc *isc) +{ + + clp->next_steer = sbinuptime(); + clp->softc = isc; + clp->steer_interval = SBT_1S * 5; /* Let's start out steering every 5s */ + clp->lolat = 5 * SBT_1MS; + clp->hilat = 15 * SBT_1MS; + clp->alpha = 20; /* Alpha == gain. 
20 = .2 */ + clp->type = set_max; +} + +static void +cam_iosched_cl_maybe_steer(struct control_loop *clp) +{ + struct cam_iosched_softc *isc; + sbintime_t now, lat; + int old; + + isc = clp->softc; + now = isc->last_time; + if (now < clp->next_steer) + return; + + clp->next_steer = now + clp->steer_interval; + switch (clp->type) { + case set_max: + if (isc->write_stats.current != isc->write_stats.max) + printf("Steering write from %d kBps to %d kBps\n", + isc->write_stats.current, isc->write_stats.max); + isc->read_stats.current = isc->read_stats.max; + isc->write_stats.current = isc->write_stats.max; + isc->trim_stats.current = isc->trim_stats.max; + break; + case read_latency: + old = isc->write_stats.current; + lat = isc->read_stats.ema; + /* + * Simple PLL-like engine. Since we're steering to a range for + * the SP (set point) that makes things a little more + * complicated. In addition, we're not directly controlling our + * PV (process variable), the read latency, but instead are + * manipulating the write bandwidth limit for our MV + * (manipulation variable), analysis of this code gets a bit + * messy. Also, the MV is a very noisy control surface for read + * latency since it is affected by many hidden processes inside + * the device which change how responsive read latency will be + * in reaction to changes in write bandwidth. Unlike the classic + * boiler control PLL, this may result in over-steering while + * the SSD takes its time to react to the new, lower load. This + * is why we use a relatively low alpha of between .1 and .25 to + * compensate for this effect. At .1, it takes ~22 steering + * intervals to back off by a factor of 10. At .2 it only takes + * ~10. At .25 it only takes ~8. However some preliminary data + * from the SSD drives suggests a response time in 10's of + * seconds before latency drops regardless of the new write + * rate. Careful observation will be required to tune this + * effectively. 
+ * + * Also, when there's no read traffic, we jack up the write + * limit too regardless of the last read latency. 10 is + * somewhat arbitrary. + */ + if (lat < clp->lolat || isc->read_stats.total - clp->last_count < 10) + isc->write_stats.current = isc->write_stats.current * + (100 + clp->alpha) / 100; /* Scale up */ + else if (lat > clp->hilat) + isc->write_stats.current = isc->write_stats.current * + (100 - clp->alpha) / 100; /* Scale down */ + clp->last_count = isc->read_stats.total; + + /* + * Even if we don't steer, per se, enforce the min/max limits as + * those may have changed. + */ + if (isc->write_stats.current < isc->write_stats.min) + isc->write_stats.current = isc->write_stats.min; + if (isc->write_stats.current > isc->write_stats.max) + isc->write_stats.current = isc->write_stats.max; + if (old != isc->write_stats.current) + printf("Steering write from %d kBps to %d kBps due to latency of %ldus\n", + old, isc->write_stats.current, + ((uint64_t)1000000 * (uint32_t)lat) >> 32); + break; + case cl_max: + break; + } +} +#endif + + /* Trim or similar currently pending completion */ +#define CAM_IOSCHED_FLAG_TRIM_ACTIVE (1ul << 0) + /* Callout active, and needs to be torn down */ +#define CAM_IOSCHED_FLAG_CALLOUT_ACTIVE (1ul << 1) + + /* Periph drivers set these flags to indicate work */ +#define CAM_IOSCHED_FLAG_WORK_FLAGS ((0xffffu) << 16) + +static void +cam_iosched_io_metric_update(struct cam_iosched_softc *isc, + sbintime_t sim_latency, int cmd, size_t size); + +static inline int +cam_iosched_has_flagged_work(struct cam_iosched_softc *isc) +{ + return !!(isc->flags & CAM_IOSCHED_FLAG_WORK_FLAGS); +} + +static inline int +cam_iosched_has_io(struct cam_iosched_softc *isc) +{ +#ifdef CAM_NETFLIX_IOSCHED + if (do_netflix_iosched) { + struct bio *rbp = bioq_first(&isc->bio_queue); + struct bio *wbp = bioq_first(&isc->write_queue); + int can_write = wbp != NULL && + cam_iosched_limiter_caniop(&isc->write_stats, wbp) == 0; + int can_read = rbp != NULL && + 
cam_iosched_limiter_caniop(&isc->read_stats, rbp) == 0; + if (iosched_debug > 2) { + printf("can write %d: pending_writes %d max_writes %d\n", can_write, isc->write_stats.pending, isc->write_stats.max); + printf("can read %d: read_stats.pending %d max_reads %d\n", can_read, isc->read_stats.pending, isc->read_stats.max); + printf("Queued reads %d writes %d\n", isc->read_stats.queued, isc->write_stats.queued); + } + return can_read || can_write; + } +#endif + return bioq_first(&isc->bio_queue) != NULL; +} + +static inline int +cam_iosched_has_more_trim(struct cam_iosched_softc *isc) +{ + return !(isc->flags & CAM_IOSCHED_FLAG_TRIM_ACTIVE) && + bioq_first(&isc->trim_queue); +} + +#define cam_iosched_sort_queue(isc) ((isc)->sort_io_queue >= 0 ? \ + (isc)->sort_io_queue : cam_sort_io_queues) + + +static inline int +cam_iosched_has_work(struct cam_iosched_softc *isc) +{ +#ifdef CAM_NETFLIX_IOSCHED + if (iosched_debug > 2) + printf("has work: %d %d %d\n", cam_iosched_has_io(isc), + cam_iosched_has_more_trim(isc), + cam_iosched_has_flagged_work(isc)); +#endif + + return cam_iosched_has_io(isc) || + cam_iosched_has_more_trim(isc) || + cam_iosched_has_flagged_work(isc); +} + +#ifdef CAM_NETFLIX_IOSCHED +static void +cam_iosched_iop_stats_init(struct cam_iosched_softc *isc, struct iop_stats *ios) +{ + + ios->limiter = none; + cam_iosched_limiter_init(ios); + ios->in = 0; + ios->max = 300000; + ios->min = 1; + ios->out = 0; + ios->pending = 0; + ios->queued = 0; + ios->total = 0; + ios->ema = 0; + ios->emss = 0; + ios->sd = 0; + ios->softc = isc; +} + +static int +cam_iosched_limiter_sysctl(SYSCTL_HANDLER_ARGS) +{ + char buf[16]; + struct iop_stats *ios; + struct cam_iosched_softc *isc; + int value, i, error, cantick; + const char *p; + + ios = arg1; + isc = ios->softc; + value = ios->limiter; + if (value < none || value >= limiter_max) + p = "UNKNOWN"; + else + p = cam_iosched_limiter_names[value]; + + strlcpy(buf, p, sizeof(buf)); + error = sysctl_handle_string(oidp, buf, 
sizeof(buf), req); + if (error != 0 || req->newptr == NULL) + return error; + + cam_periph_lock(isc->periph); + + for (i = none; i < limiter_max; i++) { + if (strcmp(buf, cam_iosched_limiter_names[i]) != 0) + continue; + ios->limiter = i; + error = cam_iosched_limiter_init(ios); + if (error != 0) { + ios->limiter = value; + cam_periph_unlock(isc->periph); + return error; + } + cantick = !!limsw[isc->read_stats.limiter].l_tick + + !!limsw[isc->write_stats.limiter].l_tick + + !!limsw[isc->trim_stats.limiter].l_tick + + 1; /* Control loop requires it */ + if (isc->flags & CAM_IOSCHED_FLAG_CALLOUT_ACTIVE) { + if (cantick == 0) { + callout_stop(&isc->ticker); + isc->flags &= ~CAM_IOSCHED_FLAG_CALLOUT_ACTIVE; + } + } else { + if (cantick != 0) { + callout_reset(&isc->ticker, hz / isc->quanta - 1, cam_iosched_ticker, isc); + isc->flags |= CAM_IOSCHED_FLAG_CALLOUT_ACTIVE; + } + } + + cam_periph_unlock(isc->periph); + return 0; + } + + cam_periph_unlock(isc->periph); + return EINVAL; +} + +static int +cam_iosched_control_type_sysctl(SYSCTL_HANDLER_ARGS) +{ + char buf[16]; + struct control_loop *clp; + struct cam_iosched_softc *isc; + int value, i, error; + const char *p; + + clp = arg1; + isc = clp->softc; + value = clp->type; + if (value < none || value >= cl_max) + p = "UNKNOWN"; + else + p = cam_iosched_control_type_names[value]; + + strlcpy(buf, p, sizeof(buf)); + error = sysctl_handle_string(oidp, buf, sizeof(buf), req); + if (error != 0 || req->newptr == NULL) + return error; + + for (i = set_max; i < cl_max; i++) { + if (strcmp(buf, cam_iosched_control_type_names[i]) != 0) + continue; + cam_periph_lock(isc->periph); + clp->type = i; + cam_periph_unlock(isc->periph); + return 0; + } + + return EINVAL; +} + +static int +cam_iosched_sbintime_sysctl(SYSCTL_HANDLER_ARGS) +{ + char buf[16]; + sbintime_t value; + int error; + uint64_t us; + + value = *(sbintime_t *)arg1; + us = (uint64_t)value / SBT_1US; + snprintf(buf, sizeof(buf), "%ju", (intmax_t)us); + error = 
sysctl_handle_string(oidp, buf, sizeof(buf), req); + if (error != 0 || req->newptr == NULL) + return error; + us = strtoul(buf, NULL, 10); + if (us == 0) + return EINVAL; + *(sbintime_t *)arg1 = us * SBT_1US; + return 0; +} + +static void +cam_iosched_iop_stats_sysctl_init(struct cam_iosched_softc *isc, struct iop_stats *ios, char *name) +{ + struct sysctl_oid_list *n; + struct sysctl_ctx_list *ctx; + + ios->sysctl_tree = SYSCTL_ADD_NODE(&isc->sysctl_ctx, + SYSCTL_CHILDREN(isc->sysctl_tree), OID_AUTO, name, + CTLFLAG_RD, 0, name); + n = SYSCTL_CHILDREN(ios->sysctl_tree); + ctx = &ios->sysctl_ctx; + + SYSCTL_ADD_UQUAD(ctx, n, + OID_AUTO, "ema", CTLFLAG_RD, + &ios->ema, + "Fast Exponentially Weighted Moving Average"); + SYSCTL_ADD_UQUAD(ctx, n, + OID_AUTO, "emss", CTLFLAG_RD, + &ios->emss, + "Fast Exponentially Weighted Moving Sum of Squares (maybe wrong)"); + SYSCTL_ADD_UQUAD(ctx, n, + OID_AUTO, "sd", CTLFLAG_RD, + &ios->sd, + "Estimated SD for fast ema (may be wrong)"); + + SYSCTL_ADD_INT(ctx, n, + OID_AUTO, "pending", CTLFLAG_RD, + &ios->pending, 0, + "Instantaneous # of pending transactions"); + SYSCTL_ADD_INT(ctx, n, + OID_AUTO, "count", CTLFLAG_RD, + &ios->total, 0, + "# of transactions submitted to hardware"); + SYSCTL_ADD_INT(ctx, n, + OID_AUTO, "queued", CTLFLAG_RD, + &ios->queued, 0, + "# of transactions in the queue"); + SYSCTL_ADD_INT(ctx, n, + OID_AUTO, "in", CTLFLAG_RD, + &ios->in, 0, + "# of transactions queued to driver"); + SYSCTL_ADD_INT(ctx, n, + OID_AUTO, "out", CTLFLAG_RD, + &ios->out, 0, + "# of transactions completed"); + + SYSCTL_ADD_PROC(ctx, n, + OID_AUTO, "limiter", CTLTYPE_STRING | CTLFLAG_RW, + ios, 0, cam_iosched_limiter_sysctl, "A", + "Current limiting type."); + SYSCTL_ADD_INT(ctx, n, + OID_AUTO, "min", CTLFLAG_RW, + &ios->min, 0, + "min resource"); + SYSCTL_ADD_INT(ctx, n, + OID_AUTO, "max", CTLFLAG_RW, + &ios->max, 0, + "max resource"); + SYSCTL_ADD_INT(ctx, n, + OID_AUTO, "current", CTLFLAG_RW, + &ios->current, 0, + "current 
resource"); + +} + +static void +cam_iosched_iop_stats_fini(struct iop_stats *ios) +{ + if (ios->sysctl_tree) + if (sysctl_ctx_free(&ios->sysctl_ctx) != 0) + printf("can't remove iosched sysctl stats context\n"); +} + +static void +cam_iosched_cl_sysctl_init(struct cam_iosched_softc *isc) +{ + struct sysctl_oid_list *n; + struct sysctl_ctx_list *ctx; + struct control_loop *clp; + + clp = &isc->cl; + clp->sysctl_tree = SYSCTL_ADD_NODE(&isc->sysctl_ctx, + SYSCTL_CHILDREN(isc->sysctl_tree), OID_AUTO, "control", + CTLFLAG_RD, 0, "Control loop info"); + n = SYSCTL_CHILDREN(clp->sysctl_tree); + ctx = &clp->sysctl_ctx; + + SYSCTL_ADD_PROC(ctx, n, + OID_AUTO, "type", CTLTYPE_STRING | CTLFLAG_RW, + clp, 0, cam_iosched_control_type_sysctl, "A", + "Control loop algorithm"); + SYSCTL_ADD_PROC(ctx, n, + OID_AUTO, "steer_interval", CTLTYPE_STRING | CTLFLAG_RW, + &clp->steer_interval, 0, cam_iosched_sbintime_sysctl, "A", + "How often to steer (in us)"); + SYSCTL_ADD_PROC(ctx, n, + OID_AUTO, "lolat", CTLTYPE_STRING | CTLFLAG_RW, + &clp->lolat, 0, cam_iosched_sbintime_sysctl, "A", + "Low water mark for Latency (in us)"); + SYSCTL_ADD_PROC(ctx, n, + OID_AUTO, "hilat", CTLTYPE_STRING | CTLFLAG_RW, + &clp->hilat, 0, cam_iosched_sbintime_sysctl, "A", + "Hi water mark for Latency (in us)"); + SYSCTL_ADD_INT(ctx, n, + OID_AUTO, "alpha", CTLFLAG_RW, + &clp->alpha, 0, + "Alpha for PLL (x100) aka gain"); +} + +static void +cam_iosched_cl_sysctl_fini(struct control_loop *clp) +{ + if (clp->sysctl_tree) + if (sysctl_ctx_free(&clp->sysctl_ctx) != 0) + printf("can't remove iosched sysctl control loop context\n"); +} +#endif + +/* + * Allocate the iosched structure. This also insulates callers from knowing + * sizeof struct cam_iosched_softc. 
+ */ +int +cam_iosched_init(struct cam_iosched_softc **iscp, struct cam_periph *periph) +{ + + *iscp = malloc(sizeof(**iscp), M_CAMSCHED, M_NOWAIT | M_ZERO); + if (*iscp == NULL) + return ENOMEM; +#ifdef CAM_NETFLIX_IOSCHED + if (iosched_debug) + printf("CAM IOSCHEDULER Allocating entry at %p\n", *iscp); +#endif + (*iscp)->sort_io_queue = -1; + bioq_init(&(*iscp)->bio_queue); + bioq_init(&(*iscp)->trim_queue); +#ifdef CAM_NETFLIX_IOSCHED + if (do_netflix_iosched) { + bioq_init(&(*iscp)->write_queue); + (*iscp)->read_bias = 100; + (*iscp)->current_read_bias = 100; + (*iscp)->quanta = 200; + cam_iosched_iop_stats_init(*iscp, &(*iscp)->read_stats); + cam_iosched_iop_stats_init(*iscp, &(*iscp)->write_stats); + cam_iosched_iop_stats_init(*iscp, &(*iscp)->trim_stats); + (*iscp)->trim_stats.max = 1; /* Trims are special: one at a time for now */ + (*iscp)->last_time = sbinuptime(); + callout_init_mtx(&(*iscp)->ticker, cam_periph_mtx(periph), 0); + (*iscp)->periph = periph; + cam_iosched_cl_init(&(*iscp)->cl, *iscp); + callout_reset(&(*iscp)->ticker, hz / (*iscp)->quanta - 1, cam_iosched_ticker, *iscp); + (*iscp)->flags |= CAM_IOSCHED_FLAG_CALLOUT_ACTIVE; + } +#endif + + return 0; +} + +/* + * Reclaim all used resources. This assumes that other folks have + * drained the requests in the hardware. Maybe an unwise assumption. + */ +void +cam_iosched_fini(struct cam_iosched_softc *isc) +{ + if (isc) { + cam_iosched_flush(isc, NULL, ENXIO); +#ifdef CAM_NETFLIX_IOSCHED + cam_iosched_iop_stats_fini(&isc->read_stats); + cam_iosched_iop_stats_fini(&isc->write_stats); + cam_iosched_iop_stats_fini(&isc->trim_stats); + cam_iosched_cl_sysctl_fini(&isc->cl); + if (isc->sysctl_tree) + if (sysctl_ctx_free(&isc->sysctl_ctx) != 0) + printf("can't remove iosched sysctl stats context\n"); + if (isc->flags & CAM_IOSCHED_FLAG_CALLOUT_ACTIVE) { + callout_drain(&isc->ticker); + isc->flags &= ~ CAM_IOSCHED_FLAG_CALLOUT_ACTIVE; + } *** DIFF OUTPUT TRUNCATED AT 1000 LINES ***
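
The bandwidth limiter in the diff above (cam_iosched_bw_tick, cam_iosched_bw_caniop, cam_iosched_bw_iop) behaves like a token bucket whose credit is allowed to go negative. What follows is a minimal user-space sketch of that accounting, not the kernel code. struct bw_limiter, bw_tick() and bw_try_io() are hypothetical names introduced for illustration. As a simplifying assumption, the per-tick quantum is the fixed value rate * 1000 / quanta (the same scaling cam_iosched_bw_init uses) rather than being scaled by the measured tick length, and bw_try_io() returns 1 or 0 where the kernel functions return 0 or EAGAIN.

```c
#include <assert.h>

/*
 * Hypothetical user-space model of the bandwidth limiter's accounting.
 * The field names are illustrative; in the diff the corresponding state
 * lives in struct iop_stats (current, l_value1) and the softc (quanta).
 */
struct bw_limiter {
	int rate_kBps;	/* configured limit in kB/s (cf. ios->current) */
	int quanta;	/* ticks per second (cf. isc->quanta) */
	int credit;	/* bytes available; may be negative (cf. l_value1) */
};

/*
 * Per-tick refill. Assumption: every tick is exactly 1/quanta seconds,
 * so the quantum is constant. Credit stops accumulating once it has
 * reached 4 quanta, mirroring the "4 is extremely arbitrary" burst cap.
 */
static void
bw_tick(struct bw_limiter *l)
{
	int bw;

	assert(l->quanta > 0);
	bw = l->rate_kBps * 1000 / l->quanta;
	if (l->credit < bw * 4)
		l->credit += bw;
}

/*
 * Admit a request of 'bytes' if any credit is left. The request is
 * charged in full even when it exceeds the remaining credit, so the
 * balance may go negative and the deficit is repaid by later ticks.
 * Returns 1 if admitted, 0 if the caller should wait.
 */
static int
bw_try_io(struct bw_limiter *l, int bytes)
{
	if (l->credit <= 0)
		return 0;
	l->credit -= bytes;
	return 1;
}
```

With a rate of 1000 kB/s and 200 quanta per second, each tick adds 5000 bytes. An 8192-byte request admitted against 5000 bytes of credit leaves the balance at -3192, so the next request is held until one more tick brings the balance back above zero. Letting the large request through keeps requests in order and avoids starving it behind smaller ones, at the cost of less quota in the following quantum.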