From owner-freebsd-arch@FreeBSD.ORG Fri Feb 14 03:06:09 2014 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id BED17444; Fri, 14 Feb 2014 03:06:09 +0000 (UTC) Received: from mail-qc0-x231.google.com (mail-qc0-x231.google.com [IPv6:2607:f8b0:400d:c01::231]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 6AD0B15A7; Fri, 14 Feb 2014 03:06:09 +0000 (UTC) Received: by mail-qc0-f177.google.com with SMTP id i8so19073631qcq.36 for ; Thu, 13 Feb 2014 19:06:07 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:date:message-id:subject :from:to:cc:content-type; bh=+jioV4xyVCq1T00uG72ww/G+bXecll9niyHinUnbWKk=; b=Vj8r3VYM1y7MtT/BiWRwTs6KQJ1zda90xdUJb0isC8K/6gKnYXyY2XWekbn74LIXOv huLHFOt4ntEpHHwFqL6UE4YaEaji8wPlWnaj6bR0sb0obnHmWQEBMKYAUxHZmgiUpzH8 3cr/7oZGkIcEkI4aBdYIKR+YiuBh4Y841GKfslmjBNIcZONFnv9mmfxGsMLh8ry1Ho97 X7OGZg4F3uRIzmgv783PtCGn0QrVmHxA2iQt7ECG4vcXJ2PkcmWJ+iF8RqqcOM9O3Re+ VGqypWDzsDnGcB9p99v5+7gJ82qgnJhbTnQS7JTvd/kPlRSBBB92DtafDn2PS+CMJJ84 37Fw== MIME-Version: 1.0 X-Received: by 10.224.11.136 with SMTP id t8mr8929293qat.26.1392347167800; Thu, 13 Feb 2014 19:06:07 -0800 (PST) Sender: adrian.chadd@gmail.com Received: by 10.224.16.10 with HTTP; Thu, 13 Feb 2014 19:06:07 -0800 (PST) In-Reply-To: References: Date: Thu, 13 Feb 2014 19:06:07 -0800 X-Google-Sender-Auth: Mev_E25aX-jq1TG6UrCYnr52XZM Message-ID: Subject: Re: can the scheduler decide to schedule an interrupted but runnable thread on another CPU core? What are the implications for code? From: Adrian Chadd To: Ryan Stone Content-Type: text/plain; charset=ISO-8859-1 Cc: "freebsd-hackers@freebsd.org" , "freebsd-arch@freebsd.org" X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 14 Feb 2014 03:06:09 -0000 On 13 February 2014 18:37, Ryan Stone wrote: > On Thu, Feb 13, 2014 at 6:57 PM, Adrian Chadd wrote: >> sequentually: >> >> * lookup occurs on CPU A; >> * lookup succeeds on CPU A for some almost-expired entry; >> * preemption occurs, and it gets scheduled to CPU B; >> >> then simultaneously: >> >> * CPU A's flowtable purge code runs, and decides to purge entries >> including the current one; >> * the code now running on CPU B has an item from the CPU A flowtable, >> and dereferences it as it's being freed, leading to potential badness. > > This kind of scenario is definitely possible. All of the FreeBSD > kernel code that deals with lockless per-cpu data structures that I > have seen (e.g. uma) has used critical_enter()/critical_exit() to > prevent preemption, and have been careful to invalidate their > references to the per-cpu data if they have to drop the critical > section. > > I don't believe that sched_pin() is good enough because I don't > believe that it handles the scenario when thread A gets a reference > and then is preempted, thread B frees the entry, and then A is > scheduled and uses the now-freed entry. However I'm really not > familiar at all with flowtable so maybe there's something preventing > that. Oh man, I hate you for bringing that up. So, both flowtable_lookup_common() and flowtable_insert() does do a critical_enter() / critical_exit() during the list operations, so that's ok. It won't be pre-empted. If I wrap flowtable_lookup() with sched_pin() so it covers both the lookup path and the insert path, then it won't get scheduled on another CPU. Then the critical sections around the list operations guarantee it won't end up with a freed entry. The fl entry uptime is updated in the critical section so it shouldn't end up overlapping with the flow purger in a way that has it removed. So, I think that's okay. I think once i add sched_pin() in the right spots, this crap won't use a stale entry. Well, as long as the entry get used before the flow cleaner picks it up. Phew! -adrian