From owner-freebsd-arch@FreeBSD.ORG Sun Apr 19 08:52:55 2015 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 5AB4ED9C; Sun, 19 Apr 2015 08:52:55 +0000 (UTC) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id D6490CE8; Sun, 19 Apr 2015 08:52:54 +0000 (UTC) Received: from tom.home (kostik@localhost [127.0.0.1]) by kib.kiev.ua (8.14.9/8.14.9) with ESMTP id t3J8qn2N056851 (version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO); Sun, 19 Apr 2015 11:52:49 +0300 (EEST) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.9.2 kib.kiev.ua t3J8qn2N056851 Received: (from kostik@localhost) by tom.home (8.14.9/8.14.9/Submit) id t3J8qnSl056850; Sun, 19 Apr 2015 11:52:49 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Sun, 19 Apr 2015 11:52:48 +0300 From: Konstantin Belousov To: Oliver Pinter Cc: "freebsd-arch@freebsd.org" , peter@freebsd.org Subject: setproctitle [was: Re: Removal of the 6.x kernel compat code from libc] Message-ID: <20150419085248.GB2390@kib.kiev.ua> References: <20150417075942.GI2390@kib.kiev.ua> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.23 (2014-03-12) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.0 X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on tom.home X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: 
Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 19 Apr 2015 08:52:55 -0000

On Fri, Apr 17, 2015 at 01:39:04PM +0200, Oliver Pinter wrote:
> Is there any chanche to get ride of the very old (FreeBSD 2.x) compat
> hacks like these:
> https://github.com/freebsd/freebsd/blob/master/lib/libc/gen/setproctitle.c#L40 ?
>

Below is the promised cleanup of libc/gen/setproctitle.c.

I see value in keeping track of the historical interfaces. I considered moving the explanation about old_ps_strings to sys/exec.h, where the current ps_strings is defined and explained, but this would make the safety checks in the setproctitle() code incomprehensible.

diff --git a/lib/libc/gen/setproctitle.c b/lib/libc/gen/setproctitle.c index cd705fb..9dff328 100644 --- a/lib/libc/gen/setproctitle.c +++ b/lib/libc/gen/setproctitle.c @@ -42,9 +42,10 @@ __FBSDID("$FreeBSD$"); * 1: old_ps_strings at the very top of the stack. * 2: old_ps_strings at SPARE_USRSPACE below the top of the stack. * 3: ps_strings at the very top of the stack. - * This attempts to support a kernel built in the #2 and #3 era. - */ - + * We only support a kernel providing #3 style ps_strings. + * + * For historical purposes, a definition of the old ps_strings structure + * and location is preserved below: struct old_ps_strings { char *old_ps_argvstr; int old_ps_nargvstr; @@ -53,6 +54,7 @@ struct old_ps_strings { }; #define OLD_PS_STRINGS ((struct old_ps_strings *) \ (USRSTACK - SPARE_USRSPACE - sizeof(struct old_ps_strings))) + */ #include @@ -136,41 +138,38 @@ setproctitle(const char *fmt, ...) 
ps_strings = (struct ps_strings *)ul_ps_strings; } - /* PS_STRINGS points to zeroed memory on a style #2 kernel */ - if (ps_strings->ps_argvstr) { - /* style #3 */ - if (oargc == -1) { - /* Record our original args */ - oargc = ps_strings->ps_nargvstr; - oargv = ps_strings->ps_argvstr; - for (i = len = 0; i < oargc; i++) { - /* - * The program may have scribbled into its - * argv array, e.g., to remove some arguments. - * If that has happened, break out before - * trying to call strlen on a NULL pointer. - */ - if (oargv[i] == NULL) { - oargc = i; - break; - } - snprintf(obuf + len, SPT_BUFSIZE - len, "%s%s", - len ? " " : "", oargv[i]); - if (len) - len++; - len += strlen(oargv[i]); - if (len >= SPT_BUFSIZE) - break; + /* + * PS_STRINGS points to zeroed memory on a style #2 kernel. + * Should not happen. + */ + if (ps_strings->ps_argvstr == NULL) + return; + + /* style #3 */ + if (oargc == -1) { + /* Record our original args */ + oargc = ps_strings->ps_nargvstr; + oargv = ps_strings->ps_argvstr; + for (i = len = 0; i < oargc; i++) { + /* + * The program may have scribbled into its + * argv array, e.g., to remove some arguments. + * If that has happened, break out before + * trying to call strlen on a NULL pointer. + */ + if (oargv[i] == NULL) { + oargc = i; + break; } + snprintf(obuf + len, SPT_BUFSIZE - len, "%s%s", + len != 0 ? 
" " : "", oargv[i]); + if (len != 0) + len++; + len += strlen(oargv[i]); + if (len >= SPT_BUFSIZE) + break; } - ps_strings->ps_nargvstr = nargc; - ps_strings->ps_argvstr = nargvp; - } else { - /* style #2 - we can only restore our first arg :-( */ - if (*obuf == '\0') - strncpy(obuf, OLD_PS_STRINGS->old_ps_argvstr, - SPT_BUFSIZE - 1); - OLD_PS_STRINGS->old_ps_nargvstr = 1; - OLD_PS_STRINGS->old_ps_argvstr = nargvp[0]; } + ps_strings->ps_nargvstr = nargc; + ps_strings->ps_argvstr = nargvp; } From owner-freebsd-arch@FreeBSD.ORG Sun Apr 19 19:05:46 2015 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id C1FC6A2A; Sun, 19 Apr 2015 19:05:46 +0000 (UTC) Received: from mail.soaustin.net (mail.soaustin.net [66.135.54.68]) by mx1.freebsd.org (Postfix) with ESMTP id A3AF2FA2; Sun, 19 Apr 2015 19:05:46 +0000 (UTC) Received: by mail.soaustin.net (Postfix, from userid 502) id 19D895607A; Sun, 19 Apr 2015 13:56:10 -0500 (CDT) Date: Sun, 19 Apr 2015 13:56:10 -0500 From: Mark Linimon To: Stefan Esser Cc: Konstantin Belousov , "freebsd-arch@freebsd.org" , peter@freebsd.org, Oliver Pinter Subject: Re: Removal of the 6.x kernel compat code from libc Message-ID: <20150419185609.GA14639@lonesome.com> References: <20150417075942.GI2390@kib.kiev.ua> <20150417121034.GN2390@kib.kiev.ua> <5531059F.4060500@freebsd.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <5531059F.4060500@freebsd.org> User-Agent: Mutt/1.5.21 (2010-09-15) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 19 Apr 2015 19:05:46 -0000 On Fri, Apr 17, 2015 at 03:07:43PM +0200, Stefan Esser 
wrote: > I doubt that anybody relies on non-POSIX behaviour that has been > deprecated for some 15 years ... Any sin that's ever been committed is probably still referenced by some damned port or other :-) That's still no reason to keep them, of course; I'm just pointing out that you're taunting Murphy. mcl From owner-freebsd-arch@FreeBSD.ORG Sun Apr 19 19:59:48 2015 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 3B010900 for ; Sun, 19 Apr 2015 19:59:48 +0000 (UTC) Received: from mail-ob0-f182.google.com (mail-ob0-f182.google.com [209.85.214.182]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 095DD773 for ; Sun, 19 Apr 2015 19:59:47 +0000 (UTC) Received: by obfe9 with SMTP id e9so103871546obf.1 for ; Sun, 19 Apr 2015 12:59:47 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:date :message-id:subject:from:to:cc:content-type; bh=4BjroyI9ShcGr/mhIqVgoERDHW05GuzQ91f8Cc8RFt8=; b=bbd/tYZYn//M/WYn/cy9PS9SOyOoR1/Xw7dcUAf5koaKIL1iW/F6pgPT0A3WHmRItI jNEn8N2cGQi/lLKAdASYk3LvK53e9DzBy1787Wql7bt/Ehfra2aPxjW8wHHCW/ImAE7r Jp5ujOx19kktKK5m5kVxxeyQXFlbA2dQb3HrhHIFVZhSbvd/Mf3btboJpQkMBLflAfGU nFCC/pHTdDDtdorbmV3bspncMD5VP/Oq4coJL6luzeJ6ZfEBKSPOPC1newHGE6wmDKi/ Q0H1VXHoayGQ4Orf2jQobwx12uxHX11/U0xAk2LWaUGb0WAMaiilf44fkB4dmp9nVAf7 xcyw== X-Gm-Message-State: ALoCoQmHADs6up91PJ1i0SeCiQKkE9qgIozCrbpVGjXNOsMqtO5EYrhBYYhLHoAxaLDd6FggPiBn MIME-Version: 1.0 X-Received: by 10.202.84.135 with SMTP id i129mr10730751oib.114.1429473587050; Sun, 19 Apr 2015 12:59:47 -0700 (PDT) Received: by 10.202.80.6 with HTTP; Sun, 19 Apr 2015 12:59:46 -0700 (PDT) 
In-Reply-To: <20150419085248.GB2390@kib.kiev.ua> References: <20150417075942.GI2390@kib.kiev.ua> <20150419085248.GB2390@kib.kiev.ua> Date: Sun, 19 Apr 2015 21:59:46 +0200 Message-ID: Subject: Re: setproctitle [was: Re: Removal of the 6.x kernel compat code from libc] From: Oliver Pinter To: Konstantin Belousov Cc: "freebsd-arch@freebsd.org" , peter@freebsd.org Content-Type: text/plain; charset=UTF-8 X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 19 Apr 2015 19:59:48 -0000 On Sun, Apr 19, 2015 at 10:52 AM, Konstantin Belousov wrote: > On Fri, Apr 17, 2015 at 01:39:04PM +0200, Oliver Pinter wrote: >> Is there any chanche to get ride of the very old (FreeBSD 2.x) compat >> hacks like these: >> https://github.com/freebsd/freebsd/blob/master/lib/libc/gen/setproctitle.c#L40 ? >> > > Below is the promised cleanup of the libc/gen/setproctitle.c. Seems fine to me, and I started a jenkins build with this patch: http://jenkins.hardenedbsd.org:8180/jenkins/job/HardenedBSD-unstable-amd64/changes > > I see a value in keeping the track of the historical interfaces. I > considered moving the explanation about old_ps_strings to sys/exec.h, > where the current ps_strings is defined and explained, but this would > make the safety checks in the setproctitle() code uncomprehensive. Fine. > > diff --git a/lib/libc/gen/setproctitle.c b/lib/libc/gen/setproctitle.c > index cd705fb..9dff328 100644 > --- a/lib/libc/gen/setproctitle.c > +++ b/lib/libc/gen/setproctitle.c > @@ -42,9 +42,10 @@ __FBSDID("$FreeBSD$"); > * 1: old_ps_strings at the very top of the stack. > * 2: old_ps_strings at SPARE_USRSPACE below the top of the stack. > * 3: ps_strings at the very top of the stack. > - * This attempts to support a kernel built in the #2 and #3 era. 
> - */ > - > + * We only support a kernel providing #3 style ps_strings. > + * > + * For historical purposes, a definition of the old ps_strings structure > + * and location is preserved below: > struct old_ps_strings { > char *old_ps_argvstr; > int old_ps_nargvstr; > @@ -53,6 +54,7 @@ struct old_ps_strings { > }; > #define OLD_PS_STRINGS ((struct old_ps_strings *) \ > (USRSTACK - SPARE_USRSPACE - sizeof(struct old_ps_strings))) > + */ > > #include > > @@ -136,41 +138,38 @@ setproctitle(const char *fmt, ...) > ps_strings = (struct ps_strings *)ul_ps_strings; > } > > - /* PS_STRINGS points to zeroed memory on a style #2 kernel */ > - if (ps_strings->ps_argvstr) { > - /* style #3 */ > - if (oargc == -1) { > - /* Record our original args */ > - oargc = ps_strings->ps_nargvstr; > - oargv = ps_strings->ps_argvstr; > - for (i = len = 0; i < oargc; i++) { > - /* > - * The program may have scribbled into its > - * argv array, e.g., to remove some arguments. > - * If that has happened, break out before > - * trying to call strlen on a NULL pointer. > - */ > - if (oargv[i] == NULL) { > - oargc = i; > - break; > - } > - snprintf(obuf + len, SPT_BUFSIZE - len, "%s%s", > - len ? " " : "", oargv[i]); > - if (len) > - len++; > - len += strlen(oargv[i]); > - if (len >= SPT_BUFSIZE) > - break; > + /* > + * PS_STRINGS points to zeroed memory on a style #2 kernel. > + * Should not happen. > + */ > + if (ps_strings->ps_argvstr == NULL) > + return; > + > + /* style #3 */ > + if (oargc == -1) { > + /* Record our original args */ > + oargc = ps_strings->ps_nargvstr; > + oargv = ps_strings->ps_argvstr; > + for (i = len = 0; i < oargc; i++) { > + /* > + * The program may have scribbled into its > + * argv array, e.g., to remove some arguments. > + * If that has happened, break out before > + * trying to call strlen on a NULL pointer. > + */ > + if (oargv[i] == NULL) { > + oargc = i; > + break; > } > + snprintf(obuf + len, SPT_BUFSIZE - len, "%s%s", > + len != 0 ? 
" " : "", oargv[i]); > + if (len != 0) > + len++; > + len += strlen(oargv[i]); > + if (len >= SPT_BUFSIZE) > + break; > } > - ps_strings->ps_nargvstr = nargc; > - ps_strings->ps_argvstr = nargvp; > - } else { > - /* style #2 - we can only restore our first arg :-( */ > - if (*obuf == '\0') > - strncpy(obuf, OLD_PS_STRINGS->old_ps_argvstr, > - SPT_BUFSIZE - 1); > - OLD_PS_STRINGS->old_ps_nargvstr = 1; > - OLD_PS_STRINGS->old_ps_argvstr = nargvp[0]; > } > + ps_strings->ps_nargvstr = nargc; > + ps_strings->ps_argvstr = nargvp; > }
From owner-freebsd-arch@FreeBSD.ORG Mon Apr 20 15:18:23 2015 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id A5BDF44E for ; Mon, 20 Apr 2015 15:18:23 +0000 (UTC) Received: from bigwig.baldwin.cx (bigwig.baldwin.cx [IPv6:2001:470:1f11:75::1]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 7D57D7C for ; Mon, 20 Apr 2015 15:18:23 +0000 (UTC) Received: from ralph.baldwin.cx (pool-173-54-116-245.nwrknj.fios.verizon.net [173.54.116.245]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id 0A299B91F; Mon, 20 Apr 2015 11:18:22 -0400 (EDT) From: John Baldwin To: Yue Chen Cc: freebsd-arch@freebsd.org Subject: Re: Situations about PC values in kernel data segments Date: Mon, 20 Apr 2015 11:07:05 -0400 Message-ID: <2404384.sKCn9g0TDD@ralph.baldwin.cx> User-Agent: KMail/4.14.2 (FreeBSD/10.1-STABLE; KDE/4.14.2; amd64; ; ) In-Reply-To: References: <6048769.xVxqkDkTGK@ralph.baldwin.cx> MIME-Version: 1.0 Content-Transfer-Encoding: 7Bit Content-Type: text/plain; charset="us-ascii" X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7 
(bigwig.baldwin.cx); Mon, 20 Apr 2015 11:18:22 -0400 (EDT) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 20 Apr 2015 15:18:23 -0000

On Friday, April 17, 2015 04:19:54 PM Yue Chen wrote:
> I mean, the PC values in non-.text segments like .data, .rodata, stack,
> heap, etc. Usually this is for comparison purposes. E.g., compare the
> faulting PC against some range already stored in a table/handler.
>
> > When pcb_onfault is used it is set to point to code in a .text segment,
> > not anywhere else.
>
> The pointer value stored in non-.text segments is a PC value (instruction
> address in .text), like 0xffffffff12345678, and may not be a function entry
> point address, right?

I think I do not follow your question. Are you asking if you can figure out if a given PC value used as the value of $rip for an arbitrary instruction is valid, or are you trying to enumerate all the words in memory that hold a pointer to a .text value (like pcb_onfault)? I assumed the former.

AFAIK, the kernel is not going to execute any code from .data, .rodata, or the stack. For things like pcb_onfault, the value stored is in .text, like this:

ENTRY(copyout)
        PUSH_FRAME_POINTER
        movq    PCPU(CURPCB),%rax
        movq    $copyout_fault,PCB_ONFAULT(%rax)
        testq   %rdx,%rdx                       /* anything to do? */
        jz      done_copyout

        ...

done_copyout:
        xorl    %eax,%eax
        movq    PCPU(CURPCB),%rdx
        movq    %rax,PCB_ONFAULT(%rdx)
        POP_FRAME_POINTER
        ret

        ALIGN_TEXT
copyout_fault:
        movq    PCPU(CURPCB),%rdx
        movq    $0,PCB_ONFAULT(%rdx)
        movq    $EFAULT,%rax
        POP_FRAME_POINTER
        ret
END(copyout)

Here 'copyout_fault' is in .text, not in a different section.

>
>
> On Fri, Apr 17, 2015 at 9:22 AM, John Baldwin wrote:
>
> > On Saturday, April 11, 2015 05:18:28 AM Yue Chen wrote:
> > > Dear all,
> > >
> > > We are working on a project about OS security.
> > > We wonder in which situations the program counter (PC) value (e.g., the > > > value in %RIP on x86_64, i.e, instruction address) could be in kernel > > > (module) data segments (including stack, heap, etc.). > > > > > > Here we mainly care about the address/value that are NOT function entry > > > points since there exist a number of function pointers. Also, we only > > > consider the normal cases because one can write arbitrary values into a > > > variable/pointer. And we mainly consider i386, AMD64 and ARM. > > > > > > Here are some situations I can think about: > > > function/interrupt/exception/syscall return address on stack; switch/case > > > jump table target; page fault handler (pcb_onfault on *BSD); restartable > > > atomic sequences (RAS) registry; thread/process context structure like > > Task > > > state segment (TSS), process control block (PCB) and thread control block > > > (TCB); situations for debugging purposes (e.g., like those in ``segment > > not > > > present'' exception handler). > > > > > > Additionally, does any of these addresses have offset formats or special > > > encodings? For example, on x86_64, we may use 32-bit RIP-relative > > > (addressing) offset to represent a 64-bit full address. In glibc's > > > setjmp/longjmp jmp_buf, they use a special encoding (PTR_MANGLE) for > > saved > > > register values. > > > > For i386 and amd64, I think all of the code that is executed does live in a > > .text segment. When pcb_onfault is used it is set to point to code in a > > .text > > segment, not anywhere else. Similarly, fault and exception handlers as > > well > > as the stub for new threads/processes after fork/thread_create is in .text > > as well. There are multiple text segments present when modules are loaded > > of course, but you should be able to enumerate all of those in the linker. 
> > > > -- > > John Baldwin > > -- John Baldwin From owner-freebsd-arch@FreeBSD.ORG Mon Apr 20 15:18:24 2015 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 6B78C44F for ; Mon, 20 Apr 2015 15:18:24 +0000 (UTC) Received: from bigwig.baldwin.cx (bigwig.baldwin.cx [IPv6:2001:470:1f11:75::1]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 4598B7D for ; Mon, 20 Apr 2015 15:18:24 +0000 (UTC) Received: from ralph.baldwin.cx (pool-173-54-116-245.nwrknj.fios.verizon.net [173.54.116.245]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id 3A531B926; Mon, 20 Apr 2015 11:18:23 -0400 (EDT) From: John Baldwin To: Konstantin Belousov Cc: freebsd-arch@freebsd.org, Yue Chen Subject: Re: Situations about PC values in kernel data segments Date: Mon, 20 Apr 2015 11:00:27 -0400 Message-ID: <2177000.nIlZYR4khO@ralph.baldwin.cx> User-Agent: KMail/4.14.2 (FreeBSD/10.1-STABLE; KDE/4.14.2; amd64; ; ) In-Reply-To: <20150417134348.GR2390@kib.kiev.ua> References: <6048769.xVxqkDkTGK@ralph.baldwin.cx> <20150417134348.GR2390@kib.kiev.ua> MIME-Version: 1.0 Content-Transfer-Encoding: 7Bit Content-Type: text/plain; charset="us-ascii" X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7 (bigwig.baldwin.cx); Mon, 20 Apr 2015 11:18:23 -0400 (EDT) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 20 Apr 2015 15:18:24 -0000 On Friday, April 17, 2015 04:43:48 PM Konstantin Belousov wrote: > On Fri, Apr 17, 2015 at 09:22:43AM -0400, John Baldwin wrote: > > On Saturday, April 11, 2015 05:18:28 AM Yue Chen 
wrote:
> > > Dear all,
> > >
> > > We are working on a project about OS security.
> > > We wonder in which situations the program counter (PC) value (e.g., the
> > > value in %RIP on x86_64, i.e, instruction address) could be in kernel
> > > (module) data segments (including stack, heap, etc.).
> > >
> > > Here we mainly care about the address/value that are NOT function entry
> > > points since there exist a number of function pointers. Also, we only
> > > consider the normal cases because one can write arbitrary values into a
> > > variable/pointer. And we mainly consider i386, AMD64 and ARM.
> > >
> > > Here are some situations I can think about:
> > > function/interrupt/exception/syscall return address on stack; switch/case
> > > jump table target; page fault handler (pcb_onfault on *BSD); restartable
> > > atomic sequences (RAS) registry; thread/process context structure like Task
> > > state segment (TSS), process control block (PCB) and thread control block
> > > (TCB); situations for debugging purposes (e.g., like those in ``segment not
> > > present'' exception handler).
> > >
> > > Additionally, does any of these addresses have offset formats or special
> > > encodings? For example, on x86_64, we may use 32-bit RIP-relative
> > > (addressing) offset to represent a 64-bit full address. In glibc's
> > > setjmp/longjmp jmp_buf, they use a special encoding (PTR_MANGLE) for saved
> > > register values.
> >
> > For i386 and amd64, I think all of the code that is executed does live in a
> > .text segment. When pcb_onfault is used it is set to point to code in a .text
> > segment, not anywhere else. Similarly, fault and exception handlers as well
> > as the stub for new threads/processes after fork/thread_create is in .text
> > as well. There are multiple text segments present when modules are loaded
> > of course, but you should be able to enumerate all of those in the linker.
>
> Wasn't bpf enhanced to compile filters to the native code, on x86 ?
> Also, what about BIOS code ? Esp. since the spread of UEFI and hope that
> our kernel starts using UEFI runtime services one day. My point is that
> _relying_ on enumeration of the text segments for kernel and modules to
> determine all executable memory is not correct.

It depends on the scope. If this is for a graduate research project to build a prototype to see if this is feasible, then some caveats are acceptable if they are known. One could be to disallow the bpf JIT option (I believe it is not in GENERIC)? EFI is actually fairly easily handled since the EFI memory map gives you the bounds of the executable code and you can just treat that as an additional .text segment.

-- 
John Baldwin

From owner-freebsd-arch@FreeBSD.ORG Mon Apr 20 16:21:59 2015 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id AE60665E; Mon, 20 Apr 2015 16:21:59 +0000 (UTC) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 21809B3A; Mon, 20 Apr 2015 16:21:58 +0000 (UTC) Received: from tom.home (kostik@localhost [127.0.0.1]) by kib.kiev.ua (8.14.9/8.14.9) with ESMTP id t3KGLo1S072553 (version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO); Mon, 20 Apr 2015 19:21:50 +0300 (EEST) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.9.2 kib.kiev.ua t3KGLo1S072553 Received: (from kostik@localhost) by tom.home (8.14.9/8.14.9/Submit) id t3KGLoJ6072552; Mon, 20 Apr 2015 19:21:50 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Mon, 20 Apr 2015 19:21:50 +0300 From: Konstantin Belousov To: arch@freebsd.org, amd64@freebsd.org 
Subject: Move x86 idle code to the x86/ common place. Message-ID: <20150420162149.GE2390@kib.kiev.ua> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.5.23 (2014-03-12) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.0 X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on tom.home X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 20 Apr 2015 16:21:59 -0000

Below is the patch which unifies some code from sys/{amd64/amd64,i386/i386}/machdep.c into the new shared file sys/x86/x86/cpu_machdep.c. Most of the code is related to handling the idle CPU state, but there are some additional trivialities like cpu_boot() etc.

The move is mostly a preparation for some other changes to the idle infrastructure. I did not want to make the same changes twice.

Make universe passed with the patch, and I successfully booted a debug amd64 kernel and UP i386.

Comments ?

diff --git a/sys/amd64/amd64/machdep.c b/sys/amd64/amd64/machdep.c index 4c20e4f..3230937 100644 --- a/sys/amd64/amd64/machdep.c +++ b/sys/amd64/amd64/machdep.c @@ -578,375 +578,6 @@ freebsd4_sigreturn(struct thread *td, struct freebsd4_sigreturn_args *uap) } #endif - -/* - * Machine dependent boot() routine - * - * I haven't seen anything to put here yet - * Possibly some stuff might be grafted back here from boot() - */ -void -cpu_boot(int howto) -{ -} - -/* - * Flush the D-cache for non-DMA I/O so that the I-cache can - * be made coherent later. - */ -void -cpu_flush_dcache(void *ptr, size_t len) -{ - /* Not applicable */ -} - -/* Get current clock frequency for the given cpu id. 
*/ -int -cpu_est_clockrate(int cpu_id, uint64_t *rate) -{ - uint64_t tsc1, tsc2; - uint64_t acnt, mcnt, perf; - register_t reg; - - if (pcpu_find(cpu_id) == NULL || rate == NULL) - return (EINVAL); - - /* - * If TSC is P-state invariant and APERF/MPERF MSRs do not exist, - * DELAY(9) based logic fails. - */ - if (tsc_is_invariant && !tsc_perf_stat) - return (EOPNOTSUPP); - -#ifdef SMP - if (smp_cpus > 1) { - /* Schedule ourselves on the indicated cpu. */ - thread_lock(curthread); - sched_bind(curthread, cpu_id); - thread_unlock(curthread); - } -#endif - - /* Calibrate by measuring a short delay. */ - reg = intr_disable(); - if (tsc_is_invariant) { - wrmsr(MSR_MPERF, 0); - wrmsr(MSR_APERF, 0); - tsc1 = rdtsc(); - DELAY(1000); - mcnt = rdmsr(MSR_MPERF); - acnt = rdmsr(MSR_APERF); - tsc2 = rdtsc(); - intr_restore(reg); - perf = 1000 * acnt / mcnt; - *rate = (tsc2 - tsc1) * perf; - } else { - tsc1 = rdtsc(); - DELAY(1000); - tsc2 = rdtsc(); - intr_restore(reg); - *rate = (tsc2 - tsc1) * 1000; - } - -#ifdef SMP - if (smp_cpus > 1) { - thread_lock(curthread); - sched_unbind(curthread); - thread_unlock(curthread); - } -#endif - - return (0); -} - -/* - * Shutdown the CPU as much as possible - */ -void -cpu_halt(void) -{ - for (;;) - halt(); -} - -void (*cpu_idle_hook)(sbintime_t) = NULL; /* ACPI idle hook. */ -static int cpu_ident_amdc1e = 0; /* AMD C1E supported. */ -static int idle_mwait = 1; /* Use MONITOR/MWAIT for short idle. */ -SYSCTL_INT(_machdep, OID_AUTO, idle_mwait, CTLFLAG_RWTUN, &idle_mwait, - 0, "Use MONITOR/MWAIT for short idle"); - -#define STATE_RUNNING 0x0 -#define STATE_MWAIT 0x1 -#define STATE_SLEEPING 0x2 - -static void -cpu_idle_acpi(sbintime_t sbt) -{ - int *state; - - state = (int *)PCPU_PTR(monitorbuf); - *state = STATE_SLEEPING; - - /* See comments in cpu_idle_hlt(). 
*/ - disable_intr(); - if (sched_runnable()) - enable_intr(); - else if (cpu_idle_hook) - cpu_idle_hook(sbt); - else - __asm __volatile("sti; hlt"); - *state = STATE_RUNNING; -} - -static void -cpu_idle_hlt(sbintime_t sbt) -{ - int *state; - - state = (int *)PCPU_PTR(monitorbuf); - *state = STATE_SLEEPING; - - /* - * Since we may be in a critical section from cpu_idle(), if - * an interrupt fires during that critical section we may have - * a pending preemption. If the CPU halts, then that thread - * may not execute until a later interrupt awakens the CPU. - * To handle this race, check for a runnable thread after - * disabling interrupts and immediately return if one is - * found. Also, we must absolutely guarentee that hlt is - * the next instruction after sti. This ensures that any - * interrupt that fires after the call to disable_intr() will - * immediately awaken the CPU from hlt. Finally, please note - * that on x86 this works fine because of interrupts enabled only - * after the instruction following sti takes place, while IF is set - * to 1 immediately, allowing hlt instruction to acknowledge the - * interrupt. - */ - disable_intr(); - if (sched_runnable()) - enable_intr(); - else - __asm __volatile("sti; hlt"); - *state = STATE_RUNNING; -} - -static void -cpu_idle_mwait(sbintime_t sbt) -{ - int *state; - - state = (int *)PCPU_PTR(monitorbuf); - *state = STATE_MWAIT; - - /* See comments in cpu_idle_hlt(). 
*/ - disable_intr(); - if (sched_runnable()) { - enable_intr(); - *state = STATE_RUNNING; - return; - } - cpu_monitor(state, 0, 0); - if (*state == STATE_MWAIT) - __asm __volatile("sti; mwait" : : "a" (MWAIT_C1), "c" (0)); - else - enable_intr(); - *state = STATE_RUNNING; -} - -static void -cpu_idle_spin(sbintime_t sbt) -{ - int *state; - int i; - - state = (int *)PCPU_PTR(monitorbuf); - *state = STATE_RUNNING; - - /* - * The sched_runnable() call is racy but as long as there is - * a loop missing it one time will have just a little impact if any - * (and it is much better than missing the check at all). - */ - for (i = 0; i < 1000; i++) { - if (sched_runnable()) - return; - cpu_spinwait(); - } -} - -/* - * C1E renders the local APIC timer dead, so we disable it by - * reading the Interrupt Pending Message register and clearing - * both C1eOnCmpHalt (bit 28) and SmiOnCmpHalt (bit 27). - * - * Reference: - * "BIOS and Kernel Developer's Guide for AMD NPT Family 0Fh Processors" - * #32559 revision 3.00+ - */ -#define MSR_AMDK8_IPM 0xc0010055 -#define AMDK8_SMIONCMPHALT (1ULL << 27) -#define AMDK8_C1EONCMPHALT (1ULL << 28) -#define AMDK8_CMPHALT (AMDK8_SMIONCMPHALT | AMDK8_C1EONCMPHALT) - -static void -cpu_probe_amdc1e(void) -{ - - /* - * Detect the presence of C1E capability mostly on latest - * dual-cores (or future) k8 family. - */ - if (cpu_vendor_id == CPU_VENDOR_AMD && - (cpu_id & 0x00000f00) == 0x00000f00 && - (cpu_id & 0x0fff0000) >= 0x00040000) { - cpu_ident_amdc1e = 1; - } -} - -void (*cpu_idle_fn)(sbintime_t) = cpu_idle_acpi; - -void -cpu_idle(int busy) -{ - uint64_t msr; - sbintime_t sbt = -1; - - CTR2(KTR_SPARE2, "cpu_idle(%d) at %d", - busy, curcpu); -#ifdef MP_WATCHDOG - ap_watchdog(PCPU_GET(cpuid)); -#endif - /* If we are busy - try to use fast methods. */ - if (busy) { - if ((cpu_feature2 & CPUID2_MON) && idle_mwait) { - cpu_idle_mwait(busy); - goto out; - } - } - - /* If we have time - switch timers into idle mode. 
*/ - if (!busy) { - critical_enter(); - sbt = cpu_idleclock(); - } - - /* Apply AMD APIC timer C1E workaround. */ - if (cpu_ident_amdc1e && cpu_disable_c3_sleep) { - msr = rdmsr(MSR_AMDK8_IPM); - if (msr & AMDK8_CMPHALT) - wrmsr(MSR_AMDK8_IPM, msr & ~AMDK8_CMPHALT); - } - - /* Call main idle method. */ - cpu_idle_fn(sbt); - - /* Switch timers back into active mode. */ - if (!busy) { - cpu_activeclock(); - critical_exit(); - } -out: - CTR2(KTR_SPARE2, "cpu_idle(%d) at %d done", - busy, curcpu); -} - -int -cpu_idle_wakeup(int cpu) -{ - struct pcpu *pcpu; - int *state; - - pcpu = pcpu_find(cpu); - state = (int *)pcpu->pc_monitorbuf; - /* - * This doesn't need to be atomic since missing the race will - * simply result in unnecessary IPIs. - */ - if (*state == STATE_SLEEPING) - return (0); - if (*state == STATE_MWAIT) - *state = STATE_RUNNING; - return (1); -} - -/* - * Ordered by speed/power consumption. - */ -struct { - void *id_fn; - char *id_name; -} idle_tbl[] = { - { cpu_idle_spin, "spin" }, - { cpu_idle_mwait, "mwait" }, - { cpu_idle_hlt, "hlt" }, - { cpu_idle_acpi, "acpi" }, - { NULL, NULL } -}; - -static int -idle_sysctl_available(SYSCTL_HANDLER_ARGS) -{ - char *avail, *p; - int error; - int i; - - avail = malloc(256, M_TEMP, M_WAITOK); - p = avail; - for (i = 0; idle_tbl[i].id_name != NULL; i++) { - if (strstr(idle_tbl[i].id_name, "mwait") && - (cpu_feature2 & CPUID2_MON) == 0) - continue; - if (strcmp(idle_tbl[i].id_name, "acpi") == 0 && - cpu_idle_hook == NULL) - continue; - p += sprintf(p, "%s%s", p != avail ? 
", " : "", - idle_tbl[i].id_name); - } - error = sysctl_handle_string(oidp, avail, 0, req); - free(avail, M_TEMP); - return (error); -} - -SYSCTL_PROC(_machdep, OID_AUTO, idle_available, CTLTYPE_STRING | CTLFLAG_RD, - 0, 0, idle_sysctl_available, "A", "list of available idle functions"); - -static int -idle_sysctl(SYSCTL_HANDLER_ARGS) -{ - char buf[16]; - int error; - char *p; - int i; - - p = "unknown"; - for (i = 0; idle_tbl[i].id_name != NULL; i++) { - if (idle_tbl[i].id_fn == cpu_idle_fn) { - p = idle_tbl[i].id_name; - break; - } - } - strncpy(buf, p, sizeof(buf)); - error = sysctl_handle_string(oidp, buf, sizeof(buf), req); - if (error != 0 || req->newptr == NULL) - return (error); - for (i = 0; idle_tbl[i].id_name != NULL; i++) { - if (strstr(idle_tbl[i].id_name, "mwait") && - (cpu_feature2 & CPUID2_MON) == 0) - continue; - if (strcmp(idle_tbl[i].id_name, "acpi") == 0 && - cpu_idle_hook == NULL) - continue; - if (strcmp(idle_tbl[i].id_name, buf)) - continue; - cpu_idle_fn = idle_tbl[i].id_fn; - return (0); - } - return (EINVAL); -} - -SYSCTL_PROC(_machdep, OID_AUTO, idle, CTLTYPE_STRING | CTLFLAG_RW, 0, 0, - idle_sysctl, "A", "currently selected idle function"); - /* * Reset registers to default values on exec. 
*/ diff --git a/sys/amd64/include/md_var.h b/sys/amd64/include/md_var.h index ccde0e3..9083421 100644 --- a/sys/amd64/include/md_var.h +++ b/sys/amd64/include/md_var.h @@ -91,6 +91,7 @@ struct dumperinfo; void *alloc_fpusave(int flags); void amd64_syscall(struct thread *td, int traced); void busdma_swi(void); +void cpu_probe_amdc1e(void); void cpu_setregs(void); void doreti_iret(void) __asm(__STRING(doreti_iret)); void doreti_iret_fault(void) __asm(__STRING(doreti_iret_fault)); diff --git a/sys/conf/files.amd64 b/sys/conf/files.amd64 index 4910903..ae71c39 100644 --- a/sys/conf/files.amd64 +++ b/sys/conf/files.amd64 @@ -558,6 +558,7 @@ x86/pci/pci_bus.c optional pci x86/pci/qpi.c optional pci x86/x86/busdma_bounce.c standard x86/x86/busdma_machdep.c standard +x86/x86/cpu_machdep.c standard x86/x86/dump_machdep.c standard x86/x86/fdt_machdep.c optional fdt x86/x86/identcpu.c standard diff --git a/sys/conf/files.i386 b/sys/conf/files.i386 index 1873514..f072247 100644 --- a/sys/conf/files.i386 +++ b/sys/conf/files.i386 @@ -576,6 +576,7 @@ x86/pci/pci_bus.c optional pci x86/pci/qpi.c optional pci x86/x86/busdma_bounce.c standard x86/x86/busdma_machdep.c standard +x86/x86/cpu_machdep.c standard x86/x86/dump_machdep.c standard x86/x86/fdt_machdep.c optional fdt x86/x86/identcpu.c standard diff --git a/sys/conf/files.pc98 b/sys/conf/files.pc98 index be67ce4..f95d0bb 100644 --- a/sys/conf/files.pc98 +++ b/sys/conf/files.pc98 @@ -248,6 +248,7 @@ x86/isa/isa.c optional isa x86/pci/pci_bus.c optional pci x86/x86/busdma_bounce.c standard x86/x86/busdma_machdep.c standard +x86/x86/cpu_machdep.c standard x86/x86/dump_machdep.c standard x86/x86/identcpu.c standard x86/x86/intr_machdep.c standard diff --git a/sys/i386/i386/machdep.c b/sys/i386/i386/machdep.c index 123db4e..72f7685 100644 --- a/sys/i386/i386/machdep.c +++ b/sys/i386/i386/machdep.c @@ -1176,427 +1176,6 @@ sys_sigreturn(td, uap) } /* - * Machine dependent boot() routine - * - * I haven't seen anything to put here 
yet - * Possibly some stuff might be grafted back here from boot() - */ -void -cpu_boot(int howto) -{ -} - -/* - * Flush the D-cache for non-DMA I/O so that the I-cache can - * be made coherent later. - */ -void -cpu_flush_dcache(void *ptr, size_t len) -{ - /* Not applicable */ -} - -/* Get current clock frequency for the given cpu id. */ -int -cpu_est_clockrate(int cpu_id, uint64_t *rate) -{ - uint64_t tsc1, tsc2; - uint64_t acnt, mcnt, perf; - register_t reg; - - if (pcpu_find(cpu_id) == NULL || rate == NULL) - return (EINVAL); - if ((cpu_feature & CPUID_TSC) == 0) - return (EOPNOTSUPP); - - /* - * If TSC is P-state invariant and APERF/MPERF MSRs do not exist, - * DELAY(9) based logic fails. - */ - if (tsc_is_invariant && !tsc_perf_stat) - return (EOPNOTSUPP); - -#ifdef SMP - if (smp_cpus > 1) { - /* Schedule ourselves on the indicated cpu. */ - thread_lock(curthread); - sched_bind(curthread, cpu_id); - thread_unlock(curthread); - } -#endif - - /* Calibrate by measuring a short delay. 
*/ - reg = intr_disable(); - if (tsc_is_invariant) { - wrmsr(MSR_MPERF, 0); - wrmsr(MSR_APERF, 0); - tsc1 = rdtsc(); - DELAY(1000); - mcnt = rdmsr(MSR_MPERF); - acnt = rdmsr(MSR_APERF); - tsc2 = rdtsc(); - intr_restore(reg); - perf = 1000 * acnt / mcnt; - *rate = (tsc2 - tsc1) * perf; - } else { - tsc1 = rdtsc(); - DELAY(1000); - tsc2 = rdtsc(); - intr_restore(reg); - *rate = (tsc2 - tsc1) * 1000; - } - -#ifdef SMP - if (smp_cpus > 1) { - thread_lock(curthread); - sched_unbind(curthread); - thread_unlock(curthread); - } -#endif - - return (0); -} - -#ifdef XEN - -static void -idle_block(void) -{ - - HYPERVISOR_sched_op(SCHEDOP_block, 0); -} - -void -cpu_halt(void) -{ - HYPERVISOR_shutdown(SHUTDOWN_poweroff); -} - -int scheduler_running; - -static void -cpu_idle_hlt(sbintime_t sbt) -{ - - scheduler_running = 1; - enable_intr(); - idle_block(); -} - -#else -/* - * Shutdown the CPU as much as possible - */ -void -cpu_halt(void) -{ - for (;;) - halt(); -} - -#endif - -void (*cpu_idle_hook)(sbintime_t) = NULL; /* ACPI idle hook. */ -static int cpu_ident_amdc1e = 0; /* AMD C1E supported. */ -static int idle_mwait = 1; /* Use MONITOR/MWAIT for short idle. */ -SYSCTL_INT(_machdep, OID_AUTO, idle_mwait, CTLFLAG_RWTUN, &idle_mwait, - 0, "Use MONITOR/MWAIT for short idle"); - -#define STATE_RUNNING 0x0 -#define STATE_MWAIT 0x1 -#define STATE_SLEEPING 0x2 - -#ifndef PC98 -static void -cpu_idle_acpi(sbintime_t sbt) -{ - int *state; - - state = (int *)PCPU_PTR(monitorbuf); - *state = STATE_SLEEPING; - - /* See comments in cpu_idle_hlt(). 
*/ - disable_intr(); - if (sched_runnable()) - enable_intr(); - else if (cpu_idle_hook) - cpu_idle_hook(sbt); - else - __asm __volatile("sti; hlt"); - *state = STATE_RUNNING; -} -#endif /* !PC98 */ - -#ifndef XEN -static void -cpu_idle_hlt(sbintime_t sbt) -{ - int *state; - - state = (int *)PCPU_PTR(monitorbuf); - *state = STATE_SLEEPING; - - /* - * Since we may be in a critical section from cpu_idle(), if - * an interrupt fires during that critical section we may have - * a pending preemption. If the CPU halts, then that thread - * may not execute until a later interrupt awakens the CPU. - * To handle this race, check for a runnable thread after - * disabling interrupts and immediately return if one is - * found. Also, we must absolutely guarentee that hlt is - * the next instruction after sti. This ensures that any - * interrupt that fires after the call to disable_intr() will - * immediately awaken the CPU from hlt. Finally, please note - * that on x86 this works fine because of interrupts enabled only - * after the instruction following sti takes place, while IF is set - * to 1 immediately, allowing hlt instruction to acknowledge the - * interrupt. - */ - disable_intr(); - if (sched_runnable()) - enable_intr(); - else - __asm __volatile("sti; hlt"); - *state = STATE_RUNNING; -} -#endif - -static void -cpu_idle_mwait(sbintime_t sbt) -{ - int *state; - - state = (int *)PCPU_PTR(monitorbuf); - *state = STATE_MWAIT; - - /* See comments in cpu_idle_hlt(). 
*/ - disable_intr(); - if (sched_runnable()) { - enable_intr(); - *state = STATE_RUNNING; - return; - } - cpu_monitor(state, 0, 0); - if (*state == STATE_MWAIT) - __asm __volatile("sti; mwait" : : "a" (MWAIT_C1), "c" (0)); - else - enable_intr(); - *state = STATE_RUNNING; -} - -static void -cpu_idle_spin(sbintime_t sbt) -{ - int *state; - int i; - - state = (int *)PCPU_PTR(monitorbuf); - *state = STATE_RUNNING; - - /* - * The sched_runnable() call is racy but as long as there is - * a loop missing it one time will have just a little impact if any - * (and it is much better than missing the check at all). - */ - for (i = 0; i < 1000; i++) { - if (sched_runnable()) - return; - cpu_spinwait(); - } -} - -/* - * C1E renders the local APIC timer dead, so we disable it by - * reading the Interrupt Pending Message register and clearing - * both C1eOnCmpHalt (bit 28) and SmiOnCmpHalt (bit 27). - * - * Reference: - * "BIOS and Kernel Developer's Guide for AMD NPT Family 0Fh Processors" - * #32559 revision 3.00+ - */ -#define MSR_AMDK8_IPM 0xc0010055 -#define AMDK8_SMIONCMPHALT (1ULL << 27) -#define AMDK8_C1EONCMPHALT (1ULL << 28) -#define AMDK8_CMPHALT (AMDK8_SMIONCMPHALT | AMDK8_C1EONCMPHALT) - -static void -cpu_probe_amdc1e(void) -{ - - /* - * Detect the presence of C1E capability mostly on latest - * dual-cores (or future) k8 family. - */ - if (cpu_vendor_id == CPU_VENDOR_AMD && - (cpu_id & 0x00000f00) == 0x00000f00 && - (cpu_id & 0x0fff0000) >= 0x00040000) { - cpu_ident_amdc1e = 1; - } -} - -#if defined(PC98) || defined(XEN) -void (*cpu_idle_fn)(sbintime_t) = cpu_idle_hlt; -#else -void (*cpu_idle_fn)(sbintime_t) = cpu_idle_acpi; -#endif - -void -cpu_idle(int busy) -{ -#ifndef XEN - uint64_t msr; -#endif - sbintime_t sbt = -1; - - CTR2(KTR_SPARE2, "cpu_idle(%d) at %d", - busy, curcpu); -#if defined(MP_WATCHDOG) && !defined(XEN) - ap_watchdog(PCPU_GET(cpuid)); -#endif -#ifndef XEN - /* If we are busy - try to use fast methods. 
*/ - if (busy) { - if ((cpu_feature2 & CPUID2_MON) && idle_mwait) { - cpu_idle_mwait(busy); - goto out; - } - } -#endif - - /* If we have time - switch timers into idle mode. */ - if (!busy) { - critical_enter(); - sbt = cpu_idleclock(); - } - -#ifndef XEN - /* Apply AMD APIC timer C1E workaround. */ - if (cpu_ident_amdc1e && cpu_disable_c3_sleep) { - msr = rdmsr(MSR_AMDK8_IPM); - if (msr & AMDK8_CMPHALT) - wrmsr(MSR_AMDK8_IPM, msr & ~AMDK8_CMPHALT); - } -#endif - - /* Call main idle method. */ - cpu_idle_fn(sbt); - - /* Switch timers back into active mode. */ - if (!busy) { - cpu_activeclock(); - critical_exit(); - } -#ifndef XEN -out: -#endif - CTR2(KTR_SPARE2, "cpu_idle(%d) at %d done", - busy, curcpu); -} - -int -cpu_idle_wakeup(int cpu) -{ - struct pcpu *pcpu; - int *state; - - pcpu = pcpu_find(cpu); - state = (int *)pcpu->pc_monitorbuf; - /* - * This doesn't need to be atomic since missing the race will - * simply result in unnecessary IPIs. - */ - if (*state == STATE_SLEEPING) - return (0); - if (*state == STATE_MWAIT) - *state = STATE_RUNNING; - return (1); -} - -/* - * Ordered by speed/power consumption. - */ -struct { - void *id_fn; - char *id_name; -} idle_tbl[] = { - { cpu_idle_spin, "spin" }, - { cpu_idle_mwait, "mwait" }, - { cpu_idle_hlt, "hlt" }, -#ifndef PC98 - { cpu_idle_acpi, "acpi" }, -#endif - { NULL, NULL } -}; - -static int -idle_sysctl_available(SYSCTL_HANDLER_ARGS) -{ - char *avail, *p; - int error; - int i; - - avail = malloc(256, M_TEMP, M_WAITOK); - p = avail; - for (i = 0; idle_tbl[i].id_name != NULL; i++) { - if (strstr(idle_tbl[i].id_name, "mwait") && - (cpu_feature2 & CPUID2_MON) == 0) - continue; -#ifndef PC98 - if (strcmp(idle_tbl[i].id_name, "acpi") == 0 && - cpu_idle_hook == NULL) - continue; -#endif - p += sprintf(p, "%s%s", p != avail ? 
", " : "", - idle_tbl[i].id_name); - } - error = sysctl_handle_string(oidp, avail, 0, req); - free(avail, M_TEMP); - return (error); -} - -SYSCTL_PROC(_machdep, OID_AUTO, idle_available, CTLTYPE_STRING | CTLFLAG_RD, - 0, 0, idle_sysctl_available, "A", "list of available idle functions"); - -static int -idle_sysctl(SYSCTL_HANDLER_ARGS) -{ - char buf[16]; - int error; - char *p; - int i; - - p = "unknown"; - for (i = 0; idle_tbl[i].id_name != NULL; i++) { - if (idle_tbl[i].id_fn == cpu_idle_fn) { - p = idle_tbl[i].id_name; - break; - } - } - strncpy(buf, p, sizeof(buf)); - error = sysctl_handle_string(oidp, buf, sizeof(buf), req); - if (error != 0 || req->newptr == NULL) - return (error); - for (i = 0; idle_tbl[i].id_name != NULL; i++) { - if (strstr(idle_tbl[i].id_name, "mwait") && - (cpu_feature2 & CPUID2_MON) == 0) - continue; -#ifndef PC98 - if (strcmp(idle_tbl[i].id_name, "acpi") == 0 && - cpu_idle_hook == NULL) - continue; -#endif - if (strcmp(idle_tbl[i].id_name, buf)) - continue; - cpu_idle_fn = idle_tbl[i].id_fn; - return (0); - } - return (EINVAL); -} - -SYSCTL_PROC(_machdep, OID_AUTO, idle, CTLTYPE_STRING | CTLFLAG_RW, 0, 0, - idle_sysctl, "A", "currently selected idle function"); - -/* * Reset registers to default values on exec. */ void diff --git a/sys/i386/include/md_var.h b/sys/i386/include/md_var.h index 339dff3..bffdd57 100644 --- a/sys/i386/include/md_var.h +++ b/sys/i386/include/md_var.h @@ -97,6 +97,7 @@ struct dumperinfo; void *alloc_fpusave(int flags); void bcopyb(const void *from, void *to, size_t len); void busdma_swi(void); +void cpu_probe_amdc1e(void); void cpu_setregs(void); void cpu_switch_load_gs(void) __asm(__STRING(cpu_switch_load_gs)); void doreti_iret(void) __asm(__STRING(doreti_iret)); diff --git a/sys/x86/x86/cpu_machdep.c b/sys/x86/x86/cpu_machdep.c new file mode 100644 index 0000000..846a123 --- /dev/null +++ b/sys/x86/x86/cpu_machdep.c @@ -0,0 +1,533 @@ +/*- + * Copyright (c) 2003 Peter Wemm. + * Copyright (c) 1992 Terrence R. 
Lambert. + * Copyright (c) 1982, 1987, 1990 The Regents of the University of California. + * All rights reserved. + * + * This code is derived from software contributed to Berkeley by + * William Jolitz. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * 3. All advertising materials mentioning features or use of this software + * must display the following acknowledgement: + * This product includes software developed by the University of + * California, Berkeley and its contributors. + * 4. Neither the name of the University nor the names of its contributors + * may be used to endorse or promote products derived from this software + * without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. 
+ * + * from: @(#)machdep.c 7.4 (Berkeley) 6/3/91 + */ + +#include +__FBSDID("$FreeBSD$"); + +#include "opt_atpic.h" +#include "opt_compat.h" +#include "opt_cpu.h" +#include "opt_ddb.h" +#include "opt_inet.h" +#include "opt_isa.h" +#include "opt_kstack_pages.h" +#include "opt_maxmem.h" +#include "opt_mp_watchdog.h" +#include "opt_perfmon.h" +#include "opt_platform.h" +#ifdef __i386__ +#include "opt_npx.h" +#include "opt_apic.h" +#include "opt_xbox.h" +#endif + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#ifdef SMP +#include +#endif +#include + +#include +#include +#include +#include +#include +#include +#ifdef PERFMON +#include +#endif +#include +#ifdef SMP +#include +#endif + +#include +#include +#include +#include +#include +#include +#include +#include + +#ifdef XEN +/* XEN includes */ +#include +#include +#include +#include +#include +#endif + +/* + * Machine dependent boot() routine + * + * I haven't seen anything to put here yet + * Possibly some stuff might be grafted back here from boot() + */ +void +cpu_boot(int howto) +{ +} + +/* + * Flush the D-cache for non-DMA I/O so that the I-cache can + * be made coherent later. + */ +void +cpu_flush_dcache(void *ptr, size_t len) +{ + /* Not applicable */ +} + +/* Get current clock frequency for the given cpu id. */ +int +cpu_est_clockrate(int cpu_id, uint64_t *rate) +{ + uint64_t tsc1, tsc2; + uint64_t acnt, mcnt, perf; + register_t reg; + + if (pcpu_find(cpu_id) == NULL || rate == NULL) + return (EINVAL); +#ifdef __i386__ + if ((cpu_feature & CPUID_TSC) == 0) + return (EOPNOTSUPP); +#endif + + /* + * If TSC is P-state invariant and APERF/MPERF MSRs do not exist, + * DELAY(9) based logic fails. + */ + if (tsc_is_invariant && !tsc_perf_stat) + return (EOPNOTSUPP); + +#ifdef SMP + if (smp_cpus > 1) { + /* Schedule ourselves on the indicated cpu. 
*/ + thread_lock(curthread); + sched_bind(curthread, cpu_id); + thread_unlock(curthread); + } +#endif + + /* Calibrate by measuring a short delay. */ + reg = intr_disable(); + if (tsc_is_invariant) { + wrmsr(MSR_MPERF, 0); + wrmsr(MSR_APERF, 0); + tsc1 = rdtsc(); + DELAY(1000); + mcnt = rdmsr(MSR_MPERF); + acnt = rdmsr(MSR_APERF); + tsc2 = rdtsc(); + intr_restore(reg); + perf = 1000 * acnt / mcnt; + *rate = (tsc2 - tsc1) * perf; + } else { + tsc1 = rdtsc(); + DELAY(1000); + tsc2 = rdtsc(); + intr_restore(reg); + *rate = (tsc2 - tsc1) * 1000; + } + +#ifdef SMP + if (smp_cpus > 1) { + thread_lock(curthread); + sched_unbind(curthread); + thread_unlock(curthread); + } +#endif + + return (0); +} + +#if defined(__i386__) && defined(XEN) + +static void +idle_block(void) +{ + + HYPERVISOR_sched_op(SCHEDOP_block, 0); +} + +void +cpu_halt(void) +{ + HYPERVISOR_shutdown(SHUTDOWN_poweroff); +} + +int scheduler_running; + +static void +cpu_idle_hlt(sbintime_t sbt) +{ + + scheduler_running = 1; + enable_intr(); + idle_block(); +} + +#else +/* + * Shutdown the CPU as much as possible + */ +void +cpu_halt(void) +{ + for (;;) + halt(); +} + +#endif + +void (*cpu_idle_hook)(sbintime_t) = NULL; /* ACPI idle hook. */ +static int cpu_ident_amdc1e = 0; /* AMD C1E supported. */ +static int idle_mwait = 1; /* Use MONITOR/MWAIT for short idle. */ +SYSCTL_INT(_machdep, OID_AUTO, idle_mwait, CTLFLAG_RWTUN, &idle_mwait, + 0, "Use MONITOR/MWAIT for short idle"); + +#define STATE_RUNNING 0x0 +#define STATE_MWAIT 0x1 +#define STATE_SLEEPING 0x2 + +#ifndef PC98 +static void +cpu_idle_acpi(sbintime_t sbt) +{ + int *state; + + state = (int *)PCPU_PTR(monitorbuf); + *state = STATE_SLEEPING; + + /* See comments in cpu_idle_hlt(). 
*/ + disable_intr(); + if (sched_runnable()) + enable_intr(); + else if (cpu_idle_hook) + cpu_idle_hook(sbt); + else + __asm __volatile("sti; hlt"); + *state = STATE_RUNNING; +} +#endif /* !PC98 */ + +#if !defined(__i386__) || !defined(XEN) +static void +cpu_idle_hlt(sbintime_t sbt) +{ + int *state; + + state = (int *)PCPU_PTR(monitorbuf); + *state = STATE_SLEEPING; + + /* + * Since we may be in a critical section from cpu_idle(), if + * an interrupt fires during that critical section we may have + * a pending preemption. If the CPU halts, then that thread + * may not execute until a later interrupt awakens the CPU. + * To handle this race, check for a runnable thread after + * disabling interrupts and immediately return if one is + * found. Also, we must absolutely guarentee that hlt is + * the next instruction after sti. This ensures that any + * interrupt that fires after the call to disable_intr() will + * immediately awaken the CPU from hlt. Finally, please note + * that on x86 this works fine because of interrupts enabled only + * after the instruction following sti takes place, while IF is set + * to 1 immediately, allowing hlt instruction to acknowledge the + * interrupt. + */ + disable_intr(); + if (sched_runnable()) + enable_intr(); + else + __asm __volatile("sti; hlt"); + *state = STATE_RUNNING; +} +#endif + +static void +cpu_idle_mwait(sbintime_t sbt) +{ + int *state; + + state = (int *)PCPU_PTR(monitorbuf); + *state = STATE_MWAIT; + + /* See comments in cpu_idle_hlt(). 
*/ + disable_intr(); + if (sched_runnable()) { + enable_intr(); + *state = STATE_RUNNING; + return; + } + cpu_monitor(state, 0, 0); + if (*state == STATE_MWAIT) + __asm __volatile("sti; mwait" : : "a" (MWAIT_C1), "c" (0)); + else + enable_intr(); + *state = STATE_RUNNING; +} + +static void +cpu_idle_spin(sbintime_t sbt) +{ + int *state; + int i; + + state = (int *)PCPU_PTR(monitorbuf); + *state = STATE_RUNNING; + + /* + * The sched_runnable() call is racy but as long as there is + * a loop missing it one time will have just a little impact if any + * (and it is much better than missing the check at all). + */ + for (i = 0; i < 1000; i++) { + if (sched_runnable()) + return; + cpu_spinwait(); + } +} + +/* + * C1E renders the local APIC timer dead, so we disable it by + * reading the Interrupt Pending Message register and clearing + * both C1eOnCmpHalt (bit 28) and SmiOnCmpHalt (bit 27). + * + * Reference: + * "BIOS and Kernel Developer's Guide for AMD NPT Family 0Fh Processors" + * #32559 revision 3.00+ + */ +#define MSR_AMDK8_IPM 0xc0010055 +#define AMDK8_SMIONCMPHALT (1ULL << 27) +#define AMDK8_C1EONCMPHALT (1ULL << 28) +#define AMDK8_CMPHALT (AMDK8_SMIONCMPHALT | AMDK8_C1EONCMPHALT) + +void +cpu_probe_amdc1e(void) +{ + + /* + * Detect the presence of C1E capability mostly on latest + * dual-cores (or future) k8 family. 
+ */ + if (cpu_vendor_id == CPU_VENDOR_AMD && + (cpu_id & 0x00000f00) == 0x00000f00 && + (cpu_id & 0x0fff0000) >= 0x00040000) { + cpu_ident_amdc1e = 1; + } +} + +#if defined(__i386__) && (defined(PC98) || defined(XEN)) +void (*cpu_idle_fn)(sbintime_t) = cpu_idle_hlt; +#else +void (*cpu_idle_fn)(sbintime_t) = cpu_idle_acpi; +#endif + +void +cpu_idle(int busy) +{ +#if !defined(__i386__) || !defined(XEN) + uint64_t msr; +#endif + sbintime_t sbt = -1; + + CTR2(KTR_SPARE2, "cpu_idle(%d) at %d", + busy, curcpu); +#if defined(MP_WATCHDOG) && (!defined(__i386__) || !defined(XEN)) + ap_watchdog(PCPU_GET(cpuid)); +#endif +#if !defined(__i386__) || !defined(XEN) + /* If we are busy - try to use fast methods. */ + if (busy) { + if ((cpu_feature2 & CPUID2_MON) && idle_mwait) { + cpu_idle_mwait(busy); + goto out; + } + } +#endif + + /* If we have time - switch timers into idle mode. */ + if (!busy) { + critical_enter(); + sbt = cpu_idleclock(); + } + +#if !defined(__i386__) || !defined(XEN) + /* Apply AMD APIC timer C1E workaround. */ + if (cpu_ident_amdc1e && cpu_disable_c3_sleep) { + msr = rdmsr(MSR_AMDK8_IPM); + if (msr & AMDK8_CMPHALT) + wrmsr(MSR_AMDK8_IPM, msr & ~AMDK8_CMPHALT); + } +#endif + + /* Call main idle method. */ + cpu_idle_fn(sbt); + + /* Switch timers back into active mode. */ + if (!busy) { + cpu_activeclock(); + critical_exit(); + } +#if !defined(__i386__) || !defined(XEN) +out: +#endif + CTR2(KTR_SPARE2, "cpu_idle(%d) at %d done", + busy, curcpu); +} + +int +cpu_idle_wakeup(int cpu) +{ + struct pcpu *pcpu; + int *state; + + pcpu = pcpu_find(cpu); + state = (int *)pcpu->pc_monitorbuf; + /* + * This doesn't need to be atomic since missing the race will + * simply result in unnecessary IPIs. + */ + if (*state == STATE_SLEEPING) + return (0); + if (*state == STATE_MWAIT) + *state = STATE_RUNNING; + return (1); +} + +/* + * Ordered by speed/power consumption. 
+ */ +struct { + void *id_fn; + char *id_name; +} idle_tbl[] = { + { cpu_idle_spin, "spin" }, + { cpu_idle_mwait, "mwait" }, + { cpu_idle_hlt, "hlt" }, +#if !defined(__i386__) || !defined(PC98) + { cpu_idle_acpi, "acpi" }, +#endif + { NULL, NULL } +}; + +static int +idle_sysctl_available(SYSCTL_HANDLER_ARGS) +{ + char *avail, *p; + int error; + int i; + + avail = malloc(256, M_TEMP, M_WAITOK); + p = avail; + for (i = 0; idle_tbl[i].id_name != NULL; i++) { + if (strstr(idle_tbl[i].id_name, "mwait") && + (cpu_feature2 & CPUID2_MON) == 0) + continue; +#if !defined(__i386__) || !defined(PC98) + if (strcmp(idle_tbl[i].id_name, "acpi") == 0 && + cpu_idle_hook == NULL) + continue; +#endif + p += sprintf(p, "%s%s", p != avail ? ", " : "", + idle_tbl[i].id_name); + } + error = sysctl_handle_string(oidp, avail, 0, req); + free(avail, M_TEMP); + return (error); +} + +SYSCTL_PROC(_machdep, OID_AUTO, idle_available, CTLTYPE_STRING | CTLFLAG_RD, + 0, 0, idle_sysctl_available, "A", "list of available idle functions"); + +static int +idle_sysctl(SYSCTL_HANDLER_ARGS) +{ + char buf[16]; + int error; + char *p; + int i; + + p = "unknown"; + for (i = 0; idle_tbl[i].id_name != NULL; i++) { + if (idle_tbl[i].id_fn == cpu_idle_fn) { + p = idle_tbl[i].id_name; + break; + } + } + strncpy(buf, p, sizeof(buf)); + error = sysctl_handle_string(oidp, buf, sizeof(buf), req); + if (error != 0 || req->newptr == NULL) + return (error); + for (i = 0; idle_tbl[i].id_name != NULL; i++) { + if (strstr(idle_tbl[i].id_name, "mwait") && + (cpu_feature2 & CPUID2_MON) == 0) + continue; +#if !defined(__i386__) || !defined(PC98) + if (strcmp(idle_tbl[i].id_name, "acpi") == 0 && + cpu_idle_hook == NULL) + continue; +#endif + if (strcmp(idle_tbl[i].id_name, buf)) + continue; + cpu_idle_fn = idle_tbl[i].id_fn; + return (0); + } + return (EINVAL); +} + +SYSCTL_PROC(_machdep, OID_AUTO, idle, CTLTYPE_STRING | CTLFLAG_RW, 0, 0, + idle_sysctl, "A", "currently selected idle function"); From 
owner-freebsd-arch@FreeBSD.ORG Mon Apr 20 17:54:59 2015 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id BC82E492; Mon, 20 Apr 2015 17:54:59 +0000 (UTC) Received: from mail-pd0-x235.google.com (mail-pd0-x235.google.com [IPv6:2607:f8b0:400e:c02::235]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 8C5A08CB; Mon, 20 Apr 2015 17:54:59 +0000 (UTC) Received: by pdbnk13 with SMTP id nk13so214586174pdb.0; Mon, 20 Apr 2015 10:54:59 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:from:date:message-id :subject:to:cc:content-type; bh=BzQ2EDGwWtpvze1UGBSVKQ79bAvFKTqxACeNzxpg2/0=; b=Cg4SKavCUPdNXM1FvlvClp1tjVxitQvI1xvwPd7fs0MWpKrF+FILLytgMczYhUV5ih JRTyTBUtm3STGTV3HElSWFFZ+v0ZwdbVR1Lb+EF+EtcufN49W9kgqGf1iSWfm42ZpCP0 9qDbgO2zsZxS3AGaMiUxE1863OzOn+YnqPZujUJqIfZzCqxCydA7IdvUHji3Jvm/7LQK sDM0PjKBf0dz05i56CiwUk/UB9WudUX0Y6UG7kPB2EJ/ziS2Y4uyf+DvNcCYbX1A0ZIT ZbbDNSqUUSN7UqDh3JeWzm5N87+Hw45bNSA88tPA0plJ+S64pU2TTM+w38cYn6sr1K19 ZM8A== X-Received: by 10.68.198.36 with SMTP id iz4mr30326080pbc.167.1429552499005; Mon, 20 Apr 2015 10:54:59 -0700 (PDT) MIME-Version: 1.0 Sender: ycyc321@gmail.com Received: by 10.67.2.42 with HTTP; Mon, 20 Apr 2015 10:54:28 -0700 (PDT) In-Reply-To: <2177000.nIlZYR4khO@ralph.baldwin.cx> References: <6048769.xVxqkDkTGK@ralph.baldwin.cx> <20150417134348.GR2390@kib.kiev.ua> <2177000.nIlZYR4khO@ralph.baldwin.cx> From: Yue Chen Date: Mon, 20 Apr 2015 13:54:28 -0400 X-Google-Sender-Auth: isyFUP2m2B1Ueftn4cDC44aZPs0 Message-ID: Subject: Re: Situations about PC values in kernel data segments To: John Baldwin Cc: Konstantin Belousov , 
freebsd-arch@freebsd.org Content-Type: text/plain; charset=UTF-8 X-Content-Filtered-By: Mailman/MimeDel 2.1.20 X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 20 Apr 2015 17:54:59 -0000 > Are you asking if you can figure out if a given PC value used as the value > of $rip for an arbitrary instruction is valid, or are you trying to enumerate > all the words in memory that hold a pointer to a .text value (like > pcb_onfault)? > I assumed the former. Sorry for the confusion. I meant the *latter*, *excluding* function pointers. The value does not have to be a full-word pointer; it can also be a half-word displacement/offset to an address in .text, or a special encoding of the address. On Mon, Apr 20, 2015 at 11:00 AM, John Baldwin wrote: > On Friday, April 17, 2015 04:43:48 PM Konstantin Belousov wrote: > > On Fri, Apr 17, 2015 at 09:22:43AM -0400, John Baldwin wrote: > > > On Saturday, April 11, 2015 05:18:28 AM Yue Chen wrote: > > > > Dear all, > > > > > > > > We are working on a project about OS security. > > > > We wonder in which situations the program counter (PC) value (e.g., the > > > > value in %RIP on x86_64, i.e., instruction address) could be in kernel > > > > (module) data segments (including stack, heap, etc.). > > > > > > > > Here we mainly care about the address/value that are NOT function entry > > > > points since there exist a number of function pointers. Also, we only > > > > consider the normal cases because one can write arbitrary values into a > > > > variable/pointer. And we mainly consider i386, AMD64 and ARM.
> > > > > > > > Here are some situations I can think about: > > > > function/interrupt/exception/syscall return address on stack; switch/case > > > > jump table target; page fault handler (pcb_onfault on *BSD); restartable > > > > atomic sequences (RAS) registry; thread/process context structure like Task > > > > state segment (TSS), process control block (PCB) and thread control block > > > > (TCB); situations for debugging purposes (e.g., like those in ``segment not > > > > present'' exception handler). > > > > > > > > Additionally, does any of these addresses have offset formats or special > > > > encodings? For example, on x86_64, we may use 32-bit RIP-relative > > > > (addressing) offset to represent a 64-bit full address. In glibc's > > > > setjmp/longjmp jmp_buf, they use a special encoding (PTR_MANGLE) for saved > > > > register values. > > > > > > For i386 and amd64, I think all of the code that is executed does live in a > > > .text segment. When pcb_onfault is used it is set to point to code in a .text > > > segment, not anywhere else. Similarly, fault and exception handlers as well > > > as the stub for new threads/processes after fork/thread_create is in .text > > > as well. There are multiple text segments present when modules are loaded > > > of course, but you should be able to enumerate all of those in the linker. > > > > Wasn't bpf enhanced to compile filters to the native code, on x86? > > Also, what about BIOS code? Esp. since the spread of UEFI and hope that > > our kernel starts using UEFI runtime services one day. My point is that > > _relying_ on enumeration of the text segments for kernel and modules to > > determine all executable memory is not correct. > > It depends on the scope. If this is for a graduate research project to build > a prototype to see if this is feasible, then some caveats are acceptable if > they are known. One could be to disallow the bpf JIT option (I believe it is > not in GENERIC)?
EFI is actually fairly easily handled since the EFI > memory > map gives you the bounds of the executable code and you can just treat > that as > an additional .text segment. > > -- > John Baldwin > From owner-freebsd-arch@FreeBSD.ORG Mon Apr 20 21:25:31 2015 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id DA8A321C; Mon, 20 Apr 2015 21:25:31 +0000 (UTC) Received: from mail-wi0-x22e.google.com (mail-wi0-x22e.google.com [IPv6:2a00:1450:400c:c05::22e]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 7698D662; Mon, 20 Apr 2015 21:25:31 +0000 (UTC) Received: by wiun10 with SMTP id n10so107234242wiu.1; Mon, 20 Apr 2015 14:25:29 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=G5NBfn+u03Zdf81gPvgH17MBIThAdSC9cgGtECLYaeQ=; b=l11sTyXhlRj2sSHwEYcCnrOkpLNuK2JiyFKuKwrRP4EpbGhHkUhnJfBDKNrESD+lr1 CjN/Czj0QNCOadQlr60G96AcvHGH67r1oIr7FsRSttyiFSWerpx2BF3YLxRWUVioF1rq RLIExQuBO861sh9X+m9i9J4QEv0G+Uz0/uq2Dl8ft/GtERsf3cTjwJxqIN8/wMa2p7nI Y/QASXSCpYJbwE0so1g4SmExwx606fnE66iQbmgBK5W2YlWV4jGMyFh+Tyl+WEwQt/A/ 7Y4NW+v3sKYUUZjDbeAr7/nSC0nGwiugl8fuMUYhazHp6oHxluWk81O2zSv+2rUqGeUd PaxA== MIME-Version: 1.0 X-Received: by 10.180.104.137 with SMTP id ge9mr29625940wib.24.1429565129111; Mon, 20 Apr 2015 14:25:29 -0700 (PDT) Received: by 10.27.80.202 with HTTP; Mon, 20 Apr 2015 14:25:29 -0700 (PDT) In-Reply-To: <20150420162149.GE2390@kib.kiev.ua> References: <20150420162149.GE2390@kib.kiev.ua> Date: Tue, 21 Apr 2015 00:25:29 +0300 Message-ID: Subject: Re: Move x86 idle code to the x86/ common place. 
From: Sergey Kandaurov To: Konstantin Belousov Cc: arch@freebsd.org, amd64@freebsd.org Content-Type: text/plain; charset=ISO-8859-1 X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 20 Apr 2015 21:25:32 -0000 On 20 April 2015 at 19:21, Konstantin Belousov wrote: [..] > +struct { > + void *id_fn; > + char *id_name; > +} idle_tbl[] = { > + { cpu_idle_spin, "spin" }, > + { cpu_idle_mwait, "mwait" }, > + { cpu_idle_hlt, "hlt" }, > +#if !defined(__i386__) || !defined(PC98) > + { cpu_idle_acpi, "acpi" }, > +#endif > + { NULL, NULL } > +}; > + I believe this conditional could be left unchanged as #ifndef PC98 (also in several other places), given that pc98 may not be present other than under i386. Otherwise, looks good. -- wbr, pluknet From owner-freebsd-arch@FreeBSD.ORG Tue Apr 21 06:05:20 2015 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 20718233; Tue, 21 Apr 2015 06:05:20 +0000 (UTC) Received: from mout.gmx.net (mout.gmx.net [212.227.17.22]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "mout.gmx.net", Issuer "TeleSec ServerPass DE-1" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id A76E915E7; Tue, 21 Apr 2015 06:05:19 +0000 (UTC) Received: from [10.182.183.249] ([109.42.2.3]) by mail.gmx.com (mrgmx103) with ESMTPSA (Nemesis) id 0MFLmC-1YW7xw4BwS-00EQ2T; Tue, 21 Apr 2015 08:05:09 +0200 User-Agent: K-9 Mail for Android In-Reply-To: <20150419185609.GA14639@lonesome.com> References: <20150417075942.GI2390@kib.kiev.ua> <20150417121034.GN2390@kib.kiev.ua> <5531059F.4060500@freebsd.org> <20150419185609.GA14639@lonesome.com> 
MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Subject: Re: Removal of the 6.x kernel compat code from libc From: olli hauer Date: Tue, 21 Apr 2015 08:05:08 +0200 To: Mark Linimon ,Stefan Esser CC: Konstantin Belousov , "freebsd-arch@freebsd.org" , Oliver Pinter , peter@freebsd.org Message-ID: <5C660531-4E03-45C5-BB54-0FC679D3C170@gmx.de> X-Provags-ID: V03:K0:MQZYeOqh6+weLOfbqbUlXtheEhsKN0l1m89nHXMWPy3Je+xS1UH mI/fQxmFI+vdndw+TQyzz1Xxm+vEd6ZJh1mFp0AGLmfOwaMYZCZNTNWQI91U4IkvLJSDF2E vlfBc5XB0WjrM1tOS5DPsTFlFj6F91IkREbtEyojiKow0PIyZXS1CAzaT32mmFfML61ZaA5 rp/5InzRgld0Qbk/PN/cw== X-UI-Out-Filterresults: notjunk:1; X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 21 Apr 2015 06:05:20 -0000 On April 19, 2015 8:56:10 PM CEST, Mark Linimon wrote: > On Fri, Apr 17, 2015 at 03:07:43PM +0200, Stefan Esser wrote: > > I doubt that anybody relies on non-POSIX behaviour that has been > > deprecated for some 15 years ... > > Any sin that's ever been committed is probably still referenced by > some > damned port or other :-) > > That's still no reason to keep them, of course; I'm just pointing out > that you're taunting Murphy. > > mcl > _______________________________________________ > freebsd-arch@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-arch > To unsubscribe, send any mail to > "freebsd-arch-unsubscribe@freebsd.org" One potential issue is the official vmware tools; they still require compat6 ... From owner-freebsd-arch@FreeBSD.ORG Tue Apr 21 06:07:59 2015 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested)
by hub.freebsd.org (Postfix) with ESMTPS id DFE1E311 for ; Tue, 21 Apr 2015 06:07:58 +0000 (UTC) Received: from mail-pa0-f54.google.com (mail-pa0-f54.google.com [209.85.220.54]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id B0DDF1603 for ; Tue, 21 Apr 2015 06:07:58 +0000 (UTC) Received: by pacyx8 with SMTP id yx8so230912999pac.1 for ; Mon, 20 Apr 2015 23:07:52 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:sender:subject:mime-version:content-type:from :in-reply-to:date:cc:message-id:references:to; bh=fkknLubCxzjANePCQgKx63nyXqQXBzgcOSu+Ql7qkY8=; b=j3tr+QUGwfDfpjKQOjnRayjdOAZm8whagRzsq2IbPs/Zn39rUXyg1zsjZ2//eGm4pQ EmBzlqyFN7lPsMjC1oebv/PW9tFEq82VZ/FDBxqR9DT4kUKMSvUxxxjDPPLNFDmMzDqz 6UemNVFlUPQ+ijjhpkFz2PLXiA8gF/eFWNT5irwpRrypMdGTNwf/FBVnY0rCMPpegMx8 3YSRK6OIe26MepaD3anx92f9hBLxL/bedIerHj3jSNmBsb6r85KbFKAY2lB8K5dfdsI3 FKYBqdQGvSqieYg9Y6zkFH/rasHVRX8xbk24c/VwBDSHUkm2NFX4pf36ChQfyOfq2wFJ RPrw== X-Gm-Message-State: ALoCoQm4he9NH4BqBwCS3EVhftOBu0liOry43B5GjpyfQgoEUe+ozYJ18/8VrH+ivK0a/4QTpLbj X-Received: by 10.70.98.197 with SMTP id ek5mr34360126pdb.109.1429596471998; Mon, 20 Apr 2015 23:07:51 -0700 (PDT) Received: from [10.64.27.48] ([69.53.236.236]) by mx.google.com with ESMTPSA id bo2sm815931pbb.1.2015.04.20.23.07.49 (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Mon, 20 Apr 2015 23:07:50 -0700 (PDT) Sender: Warner Losh Subject: Re: Removal of the 6.x kernel compat code from libc Mime-Version: 1.0 (Mac OS X Mail 8.2 \(2098\)) Content-Type: multipart/signed; boundary="Apple-Mail=_D5074B6E-E014-44B3-82DE-5E11C8C0A602"; protocol="application/pgp-signature"; micalg=pgp-sha512 X-Pgp-Agent: GPGMail 2.5b6 From: Warner Losh In-Reply-To: <5C660531-4E03-45C5-BB54-0FC679D3C170@gmx.de> Date: Tue, 21 Apr 2015 00:07:47 -0600 Cc: Mark Linimon , Stefan 
Esser , Konstantin Belousov , "freebsd-arch@freebsd.org" , peter@freebsd.org, Oliver Pinter Message-Id: <9514285C-30A8-4ED2-8571-6E1492364741@bsdimp.com> References: <20150417075942.GI2390@kib.kiev.ua> <20150417121034.GN2390@kib.kiev.ua> <5531059F.4060500@freebsd.org> <20150419185609.GA14639@lonesome.com> <5C660531-4E03-45C5-BB54-0FC679D3C170@gmx.de> To: olli hauer X-Mailer: Apple Mail (2.2098) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 21 Apr 2015 06:07:59 -0000 --Apple-Mail=_D5074B6E-E014-44B3-82DE-5E11C8C0A602 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset=utf-8 > On Apr 21, 2015, at 12:05 AM, olli hauer wrote: > > On April 19, 2015 8:56:10 PM CEST, Mark Linimon wrote: >> On Fri, Apr 17, 2015 at 03:07:43PM +0200, Stefan Esser wrote: >>> I doubt that anybody relies on non-POSIX behaviour that has been >>> deprecated for some 15 years ... >> >> Any sin that's ever been committed is probably still referenced by >> some >> damned port or other :-) >> >> That's still no reason to keep them, of course; I'm just pointing out >> that you're taunting Murphy. >> >> mcl >> _______________________________________________ >> freebsd-arch@freebsd.org mailing list >> http://lists.freebsd.org/mailman/listinfo/freebsd-arch >> To unsubscribe, send any mail to >> "freebsd-arch-unsubscribe@freebsd.org" > > One potential issue is the official vmware tools; they still require compat6 …
This is the libc side (running FreeBSD 11 binaries on a FreeBSD 6 kernel), not the kernel side, which is what vmware tools need… Warner --Apple-Mail=_D5074B6E-E014-44B3-82DE-5E11C8C0A602 Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename=signature.asc Content-Type: application/pgp-signature; name=signature.asc Content-Description: Message signed with OpenPGP using GPGMail -----BEGIN PGP SIGNATURE----- Comment: GPGTools - https://gpgtools.org iQIcBAEBCgAGBQJVNekzAAoJEGwc0Sh9sBEAAucP/0LzSXwUvZYPht4tBj5RiIkm E7R8uxo39KubAI7IGtozKrEUZOw0VvItKyojuutau5sH2nM7U9XefWOuAqcIYY9n E1IpzYmj1zD4bmLV+6aGHfqKIhfDZRgSoQNJsfT2rzWNIo72xwQX0kIu/3mWcxYE CehuP/MwWT3YftE/9TJpLnYDhqt8MiNw96U96p9yMTqoUFFNRi5+tSWUD1BrpYSL Hu7uKLg7EcOHHyIuWMV/v1AaDuvWvbEDwdW+OzV8fflaJk0XRc+jaSSbM03Gz989 fW5e87VET1j2q26eawgjtWctzRISlDVtzu7E9RJViQL4RNkoAIDOgYnXVDaAgddh +0vPVkFyPw4clAFTH0qjauSqwc6BiRNzZgm5chJrwuKbkzSEJptFABVFQ1VPcf6U wJgMGcWNx2+lOP6jjHDfTXMmMOEDN73bHGmtsaBe+QjOjnecoHQOk9NnmWPwP6ip mOPmARcxLxT95stUEzF2t1FLsPZGF/be82z5STs4oHsbkXSqGhuqshgc+DwVfvK5 zUjo8djnW5+rzYVLGEkqf/yWV6ZYLcGXc3QXIWN9XQFHKDAqm+caXht0r683DzxB fYY1XLxGx5dXCVyabJfOl0JGgDBRecQUG1FRWUTLPJGY4KKlQ9/ptcv3sgoW7RF2 krDpqM+y7ErC5zp1zBdk =HnYf -----END PGP SIGNATURE----- --Apple-Mail=_D5074B6E-E014-44B3-82DE-5E11C8C0A602-- From owner-freebsd-arch@FreeBSD.ORG Tue Apr 21 06:21:32 2015 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 3A4CE8C7; Tue, 21 Apr 2015 06:21:32 +0000 (UTC) Received: from mout.gmx.net (mout.gmx.net [212.227.17.22]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "mout.gmx.net", Issuer "TeleSec ServerPass DE-1" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id BCC4517EB; Tue, 21 Apr 2015 06:21:31 +0000 (UTC) Received: from
[10.182.183.249] ([109.42.2.3]) by mail.gmx.com (mrgmx102) with ESMTPSA (Nemesis) id 0Meutp-1YzVZn245Z-00OTdP; Tue, 21 Apr 2015 08:21:26 +0200 User-Agent: K-9 Mail for Android In-Reply-To: <9514285C-30A8-4ED2-8571-6E1492364741@bsdimp.com> References: <20150417075942.GI2390@kib.kiev.ua> <20150417121034.GN2390@kib.kiev.ua> <5531059F.4060500@freebsd.org> <20150419185609.GA14639@lonesome.com> <5C660531-4E03-45C5-BB54-0FC679D3C170@gmx.de> <9514285C-30A8-4ED2-8571-6E1492364741@bsdimp.com> MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Subject: Re: Removal of the 6.x kernel compat code from libc From: olli hauer Date: Tue, 21 Apr 2015 08:21:27 +0200 To: Warner Losh CC: Mark Linimon , Stefan Esser , Konstantin Belousov , "freebsd-arch@freebsd.org" , peter@freebsd.org, Oliver Pinter Message-ID: <02C44E56-DCF3-4F51-A81B-B89BBF37D8CC@gmx.de> X-Provags-ID: V03:K0:s4X3nVRvZ/6N2/YhUyDkU6wplvErd1aE9lkSCCHEuDzzeqz+XBO 8AeGL2unr5l2wXX7Glu+pC6WyB7OS8pdKGJrFAhA7xkcCsxiBs+J/1mcFm/zQ/owSJaUiSN oVCe2g9QKIoMjF+K4YEIKXY2qSyAa6lHxQNge5qazVjfgV/OpFxs5EpfZd1JLoIbxCQ/PkM HFlaZK6Y+3jpAUUmTe62A== X-UI-Out-Filterresults: notjunk:1; X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 21 Apr 2015 06:21:32 -0000 On April 21, 2015 8:07:47 AM CEST, Warner Losh wrote: > > > On Apr 21, 2015, at 12:05 AM, olli hauer wrote: > > > > On April 19, 2015 8:56:10 PM CEST, Mark Linimon > wrote: > >> On Fri, Apr 17, 2015 at 03:07:43PM +0200, Stefan Esser wrote: > >>> I doubt that anybody relies on non-POSIX behaviour that has been > >>> deprecated for some 15 years ... > >> > >> Any sin that's ever been committed is probably still referenced by > >> some > >> damned port or other :-) > >> > >> That's still no reason to keep them, of course; I'm just pointing >
out > >> that you're taunting Murphy. > >> > >> mcl > >> _______________________________________________ > >> freebsd-arch@freebsd.org mailing list > >> http://lists.freebsd.org/mailman/listinfo/freebsd-arch > >> To unsubscribe, send any mail to > >> "freebsd-arch-unsubscribe@freebsd.org" > > > > One potential issue is the official vmware tools; they still > require compat6 … > > This is the libc side (running FreeBSD 11 binaries on a FreeBSD 6 > kernel), > not the kernel side, which is what vmware tools need… > > Warner Ah, OK. Thanks for the clarification! From owner-freebsd-arch@FreeBSD.ORG Tue Apr 21 08:31:42 2015 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id E5B9C289; Tue, 21 Apr 2015 08:31:42 +0000 (UTC) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 5AA691640; Tue, 21 Apr 2015 08:31:42 +0000 (UTC) Received: from tom.home (kostik@localhost [127.0.0.1]) by kib.kiev.ua (8.14.9/8.14.9) with ESMTP id t3L8VXQG014636 (version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO); Tue, 21 Apr 2015 11:31:33 +0300 (EEST) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.9.2 kib.kiev.ua t3L8VXQG014636 Received: (from kostik@localhost) by tom.home (8.14.9/8.14.9/Submit) id t3L8VXAc014634; Tue, 21 Apr 2015 11:31:33 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Tue, 21 Apr 2015 11:31:33 +0300 From: Konstantin Belousov To: Sergey Kandaurov Cc: arch@freebsd.org, amd64@freebsd.org Subject: Re: Move x86 idle code to the x86/ common place.
Message-ID: <20150421083133.GI2390@kib.kiev.ua> References: <20150420162149.GE2390@kib.kiev.ua> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.23 (2014-03-12) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.0 X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on tom.home X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 21 Apr 2015 08:31:43 -0000 On Tue, Apr 21, 2015 at 12:25:29AM +0300, Sergey Kandaurov wrote: > On 20 April 2015 at 19:21, Konstantin Belousov wrote: > [..] > > +struct { > > + void *id_fn; > > + char *id_name; > > +} idle_tbl[] = { > > + { cpu_idle_spin, "spin" }, > > + { cpu_idle_mwait, "mwait" }, > > + { cpu_idle_hlt, "hlt" }, > > +#if !defined(__i386__) || !defined(PC98) > > + { cpu_idle_acpi, "acpi" }, > > +#endif > > + { NULL, NULL } > > +}; > > + > > I believe this conditional could be left unchanged as #ifndef PC98 > (also in several other places), given that pc98 may not be present > other than under i386. Otherwise, looks good. Sure, you are correct. I know that PC98 is i386 only, and I considered both approaches when I did the merge. My decision to add explicit __i386__ check was to make it clearer for reader who might be interested in the __amd64__ flow. That said, I do not mind doing the pass to revert the #if !defined(__i386__) || !defined(PC98) to #ifndef PC98 if people consider this preferable. 
From owner-freebsd-arch@FreeBSD.ORG Wed Apr 22 02:42:53 2015 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id C5AA360F for ; Wed, 22 Apr 2015 02:42:53 +0000 (UTC) Received: from mail-ig0-x231.google.com (mail-ig0-x231.google.com [IPv6:2607:f8b0:4001:c05::231]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 92EC11E4B for ; Wed, 22 Apr 2015 02:42:53 +0000 (UTC) Received: by igbpi8 with SMTP id pi8so96598973igb.0 for ; Tue, 21 Apr 2015 19:42:53 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:date:message-id:subject:from:to:content-type; bh=yhLe4AaieiRv6y25Y4x3zW1RLwIKzDdD0NBEG8EMQIM=; b=aN7efnrKhIDKcmtpH8Sss8PJnFDPR5TApyzdj+SZeG4tkoKsgA/SZq20DF8Kyk9Fyv LRdi33X2VlLiuSJd+C+rsDlU6nwp8ZfMZJHjSrCRY5m6J9BMrvB861mfIAPn+X6bWA6x bhbIbo5+XqmeTIzA9acYUiG8BqmjjswsCOQzhMooSb/z/dlWu0DGs5ue96bD4tm7g1S0 0/CypF0oaqhgE0DjslyyG5rxWdMYBk/K3lp6vISh5YsG0nFwZsrRvilocl4PGk9d78qx SqM1xpuIuA9pXEL75HgV6febjGzJZbHRGciRNHy07GnlcCk2d3H/Fw2mKbuWCIGbdtE9 NNcg== MIME-Version: 1.0 X-Received: by 10.107.46.39 with SMTP id i39mr25407317ioo.8.1429670572562; Tue, 21 Apr 2015 19:42:52
-0700 (PDT) Sender: adrian.chadd@gmail.com Received: by 10.36.17.194 with HTTP; Tue, 21 Apr 2015 19:42:52 -0700 (PDT) Date: Tue, 21 Apr 2015 19:42:52 -0700 X-Google-Sender-Auth: expZw_WdR1gzF2xocJohes0TJNI Message-ID: Subject: RFT: numa policy branch From: Adrian Chadd To: "freebsd-arch@freebsd.org" Content-Type: text/plain; charset=UTF-8 X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 22 Apr 2015 02:42:54 -0000 Hi! I have a branch off of -HEAD that implements the bare minimum for default, per-thread, per-process NUMA allocation policies and associated syscalls / tool to manipulate it. You can all thank Norse for providing me with kit to test this on (including a Dell R910, which is a quad-socket 40-core, 80-thread westmere-EX box with ~1TB of RAM) and time to do the work, and Dell for loaning me way too much hardware to make this happen. It's not ready for formal review for commit (hence why this is a "RFT") but it works well enough in my local test setup that I think it's worth sharing. What it does: * adds VM domain policy and iterator types; * the system default policy is "first-touch-round-robin", which is "first-touch, and if fail, round-robin to other domains"; * there's per-proc and per-thread policy entries in struct proc / struct thread - enough to play with, but certainly not in its final form; * two syscalls - numa_setaffinity() and numa_getaffinity(); * a very basic numactl program, complete with adrian-standard "MAN=". This doesn't teach ULE or the proc/thread stuff anything about NUMA /scheduling/. That's a whole different ballgame. It also has nothing to do with kernel memory allocation - no ULE, no contigmalloc, no driver affinity, etc. This is purely for controlling the initial page allocation for processes - which for a lot of NUMA workloads is all it needs. 
How to use:

* look at the NUMA config file. You have to add in memory domain support or you won't get the domains set up;
* sysctl vm.default_domain controls the default policy. "rr", "first-touch-rr" and "first-touch" are supported here.
* numactl (--tid=tid or --pid=pid) --policy=policy, --domain=domain, (--get or --set) (optional command) - like cpuset

So, some examples:

numactl --pid=1 --get

Get the current policy for the given PID:

# ./numactl --pid=1 --get
Policy: none; domain: -1

Run a job with a fixed-domain allocation from domain 1, but pinned to CPU 0 (which on my system is in domain 0, so it's 100% remote memory access):

$ cpuset -l 0 ./numactl --policy=fixed-domain --domain=1 ~/himenobmtxpa xl 0

Run a job with round-robin:

$ cpuset -l 0 ./numactl --policy=rr ~/himenobmtxpa xl 0

I'm using the 'pcm-numa.x' tool from the intel-pcm package to ensure that memory accesses are correctly local/remote/round-robin as appropriate.

I'd appreciate feedback and any improvements (yes, including a manpage) that people have.

Thanks!
-adrian From owner-freebsd-arch@FreeBSD.ORG Wed Apr 22 03:04:00 2015 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 74AE39FF for ; Wed, 22 Apr 2015 03:04:00 +0000 (UTC) Received: from mail-ig0-x22c.google.com (mail-ig0-x22c.google.com [IPv6:2607:f8b0:4001:c05::22c]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 48265112E for ; Wed, 22 Apr 2015 03:04:00 +0000 (UTC) Received: by igblo3 with SMTP id lo3so31556766igb.0 for ; Tue, 21 Apr 2015 20:03:59 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:date:message-id:subject :from:to:content-type; bh=2h2jNGfX5vu6DkwSV2SZVFnXedfqJwkCoGEaACgAxLI=; b=DJ4AKevkgG8itb+yie+gGnWZEHnYMMJlv1MGf1sh2CpEyOI8569j2zx9m0IE9JCWxY q0jREnPfbX/Jgh/xc2a2N4QTVFE44gxR1EAfxqruE/s4X4Pedj11JHXFb0XWxHvvcMvP nMfnhdekDBjHWStixatfiCdjNb6FW/RD6Vp2tbzfUV2jh0NDkfJETg+KCjgy9c0OJTEz +3ADm3yGndpNXALqQUj3U3+D3FVB77e9v8SYZ8JdZcCesHRCp0M8lfqyCVQtH48SSL6J y/pLo/WqWvieDL3xMhG6B617A/1PS/uunDMQ5AAHK9AiHxHwTmpg+49gEYpivIsSm3OD ZsRw== MIME-Version: 1.0 X-Received: by 10.50.57.36 with SMTP id f4mr1500608igq.6.1429671839476; Tue, 21 Apr 2015 20:03:59 -0700 (PDT) Sender: adrian.chadd@gmail.com Received: by 10.36.17.194 with HTTP; Tue, 21 Apr 2015 20:03:59 -0700 (PDT) In-Reply-To: References: Date: Tue, 21 Apr 2015 20:03:59 -0700 X-Google-Sender-Auth: 5gT-IrvEKq0GU6xNRGA067FdxpE Message-ID: Subject: Re: RFT: numa policy branch From: Adrian Chadd To: "freebsd-arch@freebsd.org" Content-Type: text/plain; charset=UTF-8 X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussion 
related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 22 Apr 2015 03:04:00 -0000 OH, and the branch: https://github.com/erikarn/freebsd/tree/local/adrian_numa_policy On 21 April 2015 at 19:42, Adrian Chadd wrote: > Hi! > > I have a branch off of -HEAD that implements the bare minimum for > default, per-thread, per-process NUMA allocation policies and > associated syscalls / tool to manipulate it. > > You can all thank Norse for providing me with kit to test this on > (including a Dell R910, which is a quad-socket 40-core, 80-thread > westmere-EX box with ~1TB of RAM) and time to do the work, and Dell > for loaning me way too much hardware to make this happen. > > It's not ready for formal review for commit (hence why this is a > "RFT") but it works well enough in my local test setup that I think > it's worth sharing. > > What it does: > > * adds VM domain policy and iterator types; > * the system default policy is "first-touch-round-robin", which is > "first-touch, and if fail, round-robin to other domains"; > * there's per-proc and per-thread policy entries in struct proc / > struct thread - enough to play with, but certainly not in its final > form; > * two syscalls - numa_setaffinity() and numa_getaffinity(); > * a very basic numactl program, complete with adrian-standard "MAN=". > > This doesn't teach ULE or the proc/thread stuff anything about NUMA > /scheduling/. That's a whole different ballgame. It also has nothing > to do with kernel memory allocation - no ULE, no contigmalloc, no > driver affinity, etc. This is purely for controlling the initial page > allocation for processes - which for a lot of NUMA workloads is all it > needs. > > How to use: > > * look at the NUMA config file. You have to add in memory domain > support or you won't get the domains setup; > * sysctl vm.default_domain controls the default policy. "rr", > "first-touch-rr" and "first-touch" are supported here. 
> * numactl (--tid=tid or --pid=pid) --policy=policy, --domain=domain, > (--get or --set) (optional command) - like cpuset > > So, some examples: > > numactl --pid=1 --get > > Get the current policy for the given PID: > > # ./numactl --pid=1 --get > Policy: none; domain: -1 > > Run a job with a fixed-domain allocation from domain 1, but pinned to > CPU 0 (which on my system is in domain 0, so it's 100% remote memory > access): > > $ cpuset -l 0 ./numactl --policy=fixed-domain --domain=1 ~/himenobmtxpa xl 0 > > Run a job with round-robin: > > $ cpuset -l 0 ./numactl --policy=rr ~/himenobmtxpa xl 0 > > I'm using the 'pcm-numa.x' tool from the intel-pcm package to ensure > that memory accesses are correctly local/remote/round-robin as > appropriate. > > I'd appreciate feedback and any improvements (yes, including a > manpage) that people have. > > Thanks! > > > > -adrian From owner-freebsd-arch@FreeBSD.ORG Wed Apr 22 12:55:34 2015 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 1DC5917B; Wed, 22 Apr 2015 12:55:34 +0000 (UTC) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 8A7041D92; Wed, 22 Apr 2015 12:55:33 +0000 (UTC) Received: from tom.home (kostik@localhost [127.0.0.1]) by kib.kiev.ua (8.14.9/8.14.9) with ESMTP id t3MCtLrV060133 (version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO); Wed, 22 Apr 2015 15:55:21 +0300 (EEST) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.9.2 kib.kiev.ua t3MCtLrV060133 Received: (from kostik@localhost) by tom.home (8.14.9/8.14.9/Submit) id t3MCtLnd060132; Wed, 22 Apr 2015 15:55:21 +0300 (EEST) (envelope-from 
kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Wed, 22 Apr 2015 15:55:21 +0300 From: Konstantin Belousov To: arch@freebsd.org, amd64@freebsd.org Subject: Cx MWAIT Message-ID: <20150422125521.GQ2390@kib.kiev.ua> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.5.23 (2014-03-12) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.0 X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on tom.home X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 22 Apr 2015 12:55:34 -0000 Below is a patch to start using mwait instead of the 'legacy' port read to enter the higher Cx states when idle. This is Intel's recommended way of entering Cx, using hints provided by the vendor-specific fixed function hardware GAS encoding. See the "Intel(R) Processor Vendor-Specific ACPI Interface Specification", revision 007. The patch was written after I became interested in why my Haswell desktop test box does not report any C-states besides C1. It turned out to be due to a combination of BIOS misconfiguration and FreeBSD code lacking mwait support. Also, an enhanced C1 entry sequence, "I/O then halt", for coordination of C1 entry with the PCH, is supported. The "sti;hlt" sequence usage was consolidated by calling acpi_cpu_c1(). Intel hardware automatically handles per-core and per-package state aggregated from the thread-local C-states, which is indicated as "hardware-coordinated" C-state entry. It is theoretically possible that the OS must handle software-coordinated package C-entry, but I am not aware of real processors which need this mode.
Intel is hw-coordinated, and it seems that AMD does not advertise an mwait sequence for C-states at all. I know that BIOS _CST tables are believed to be buggy. In particular, for Linux, Intel wrote a driver which has hard-coded per-model tables with the encoding of supported C-states, latencies and cache/busmastering behaviour. I agree with avg that we cannot support this approach. I tried to keep dev/acpica/acpi_cpu.c as MI as possible. At least, all mwait-specific code is put under #ifdef x86. The acpi_PkgFFH_IntelCpu() helper to parse Intel FFH GAS is MI, but only usable on x86; I believe this is fine. Note that currently ACPI is only used on x86: we lost ia64, but it might be used on arm shortly. diff --git a/sys/amd64/acpica/acpi_machdep.c b/sys/amd64/acpica/acpi_machdep.c index 049b51bb4e..8f88a00 100644 --- a/sys/amd64/acpica/acpi_machdep.c +++ b/sys/amd64/acpica/acpi_machdep.c @@ -87,13 +87,6 @@ acpi_machdep_quirks(int *quirks) return (0); } -void -acpi_cpu_c1() -{ - - __asm __volatile("sti; hlt"); -} - /* * Support for mapping ACPI tables during early boot. Currently this * uses the crashdump map to map each table. However, the crashdump diff --git a/sys/amd64/include/md_var.h b/sys/amd64/include/md_var.h index 9083421..0813e5f 100644 --- a/sys/amd64/include/md_var.h +++ b/sys/amd64/include/md_var.h @@ -91,6 +91,7 @@ struct dumperinfo; void *alloc_fpusave(int flags); void amd64_syscall(struct thread *td, int traced); void busdma_swi(void); +bool cpu_mwait_usable(void); void cpu_probe_amdc1e(void); void cpu_setregs(void); void doreti_iret(void) __asm(__STRING(doreti_iret)); diff --git a/sys/dev/acpica/acpi_cpu.c b/sys/dev/acpica/acpi_cpu.c index 8df2782..3fb21a6 100644 --- a/sys/dev/acpica/acpi_cpu.c +++ b/sys/dev/acpica/acpi_cpu.c @@ -47,6 +47,8 @@ __FBSDID("$FreeBSD$"); #include #if defined(__amd64__) || defined(__i386__) #include +#include +#include #endif #include @@ -70,6 +72,10 @@ struct acpi_cx { uint32_t power; /* Power consumed (mW).
*/ int res_type; /* Resource type for p_lvlx. */ int res_rid; /* Resource ID for p_lvlx. */ + bool do_mwait; + uint32_t mwait_hint; + bool mwait_hw_coord; + bool mwait_bm_avoidance; }; #define MAX_CX_STATES 8 @@ -128,6 +134,12 @@ struct acpi_cpu_device { #define PIIX4_STOP_BREAK_MASK (PIIX4_BRLD_EN_IRQ0 | PIIX4_BRLD_EN_IRQ | PIIX4_BRLD_EN_IRQ8) #define PIIX4_PCNTRL_BST_EN (1<<10) +#define CST_FFH_VENDOR_INTEL 1 +#define CST_FFH_INTEL_CL_C1IO 1 +#define CST_FFH_INTEL_CL_MWAIT 2 +#define CST_FFH_MWAIT_HW_COORD 0x0001 +#define CST_FFH_MWAIT_BM_AVOID 0x0002 + /* Allow users to ignore processor orders in MADT. */ static int cpu_unordered; SYSCTL_INT(_debug_acpi, OID_AUTO, cpu_unordered, CTLFLAG_RDTUN, @@ -348,7 +360,17 @@ acpi_cpu_attach(device_t dev) * so advertise this ourselves. Note this is not the same as independent * SMP control where each CPU can have different settings. */ - sc->cpu_features = ACPI_CAP_SMP_SAME | ACPI_CAP_SMP_SAME_C3; + sc->cpu_features = ACPI_CAP_SMP_SAME | ACPI_CAP_SMP_SAME_C3 | + ACPI_CAP_C1_IO_HALT; + +#if defined(__i386__) || defined(__amd64__) + /* + * Ask for MWAIT modes if interrupts work reasonable with MWAIT. 
+ */ + if (cpu_mwait_usable()) + sc->cpu_features |= ACPI_CAP_SMP_C1_NATIVE | ACPI_CAP_SMP_C3_NATIVE; +#endif + if (devclass_get_drivers(acpi_cpu_devclass, &drivers, &drv_count) == 0) { for (i = 0; i < drv_count; i++) { if (ACPI_GET_FEATURES(drivers[i], &features) == 0) @@ -720,6 +742,27 @@ acpi_cpu_generic_cx_probe(struct acpi_cpu_softc *sc) } } +static void +acpi_cpu_cx_cst_mwait(struct acpi_cx *cx_ptr, uint64_t address, int accsize) +{ + + cx_ptr->do_mwait = true; + cx_ptr->mwait_hint = address & 0xffffffff; + cx_ptr->mwait_hw_coord = (accsize & CST_FFH_MWAIT_HW_COORD) != 0; + cx_ptr->mwait_bm_avoidance = (accsize & CST_FFH_MWAIT_BM_AVOID) != 0; +} + +static void +acpi_cpu_cx_cst_free_plvlx(device_t cpu_dev, struct acpi_cx *cx_ptr) +{ + + if (cx_ptr->p_lvlx == NULL) + return; + bus_release_resource(cpu_dev, cx_ptr->res_type, cx_ptr->res_rid, + cx_ptr->p_lvlx); + cx_ptr->p_lvlx = NULL; +} + /* * Parse a _CST package and set up its Cx states. Since the _CST object * can change dynamically, our notify handler may call this function @@ -734,7 +777,8 @@ acpi_cpu_cx_cst(struct acpi_cpu_softc *sc) ACPI_OBJECT *top; ACPI_OBJECT *pkg; uint32_t count; - int i; + uint64_t address; + int i, vendor, class, accsize; ACPI_FUNCTION_TRACE((char *)(uintptr_t)__func__); @@ -790,6 +834,30 @@ acpi_cpu_cx_cst(struct acpi_cpu_softc *sc) /* Validate the state to see if we should use it. 
*/ switch (cx_ptr->type) { case ACPI_STATE_C1: + acpi_cpu_cx_cst_free_plvlx(sc->cpu_dev, cx_ptr); +#if defined(__i386__) || defined(__amd64__) + if (acpi_PkgFFH_IntelCpu(pkg, 0, &vendor, &class, &address, + &accsize) == 0 && vendor == CST_FFH_VENDOR_INTEL) { + if (class == CST_FFH_INTEL_CL_C1IO) { + /* C1 I/O then Halt */ + cx_ptr->res_rid = sc->cpu_cx_count; + bus_set_resource(sc->cpu_dev, SYS_RES_IOPORT, + cx_ptr->res_rid, address, 1); + cx_ptr->p_lvlx = bus_alloc_resource_any(sc->cpu_dev, + SYS_RES_IOPORT, &cx_ptr->res_rid, RF_ACTIVE | + RF_SHAREABLE); + if (cx_ptr->p_lvlx == NULL) { + bus_delete_resource(sc->cpu_dev, SYS_RES_IOPORT, + cx_ptr->res_rid); + device_printf(sc->cpu_dev, + "C1 I/O failed to allocate port %d, " + "degrading to C1 Halt", (int)address); + } + } else if (class == CST_FFH_INTEL_CL_MWAIT) { + acpi_cpu_cx_cst_mwait(cx_ptr, address, accsize); + } + } +#endif if (sc->cpu_cx_states[0].type == ACPI_STATE_C0) { /* This is the first C1 state. Use the reserved slot. */ sc->cpu_cx_states[0] = *cx_ptr; @@ -818,23 +886,34 @@ acpi_cpu_cx_cst(struct acpi_cpu_softc *sc) } /* Free up any previous register. */ - if (cx_ptr->p_lvlx != NULL) { - bus_release_resource(sc->cpu_dev, cx_ptr->res_type, cx_ptr->res_rid, - cx_ptr->p_lvlx); - cx_ptr->p_lvlx = NULL; - } + acpi_cpu_cx_cst_free_plvlx(sc->cpu_dev, cx_ptr); /* Allocate the control register for C2 or C3. 
*/ - cx_ptr->res_rid = sc->cpu_cx_count; - acpi_PkgGas(sc->cpu_dev, pkg, 0, &cx_ptr->res_type, &cx_ptr->res_rid, - &cx_ptr->p_lvlx, RF_SHAREABLE); - if (cx_ptr->p_lvlx) { +#if defined(__i386__) || defined(__amd64__) + if (acpi_PkgFFH_IntelCpu(pkg, 0, &vendor, &class, &address, + &accsize) == 0 && vendor == CST_FFH_VENDOR_INTEL && + class == CST_FFH_INTEL_CL_MWAIT) { + /* Native C State Instruction use (mwait) */ + acpi_cpu_cx_cst_mwait(cx_ptr, address, accsize); ACPI_DEBUG_PRINT((ACPI_DB_INFO, - "acpi_cpu%d: Got C%d - %d latency\n", - device_get_unit(sc->cpu_dev), cx_ptr->type, - cx_ptr->trans_lat)); + "acpi_cpu%d: Got C%d/mwait - %d latency\n", + device_get_unit(sc->cpu_dev), cx_ptr->type, cx_ptr->trans_lat)); cx_ptr++; sc->cpu_cx_count++; + } else +#endif + { + cx_ptr->res_rid = sc->cpu_cx_count; + acpi_PkgGas(sc->cpu_dev, pkg, 0, &cx_ptr->res_type, + &cx_ptr->res_rid, &cx_ptr->p_lvlx, RF_SHAREABLE); + if (cx_ptr->p_lvlx) { + ACPI_DEBUG_PRINT((ACPI_DB_INFO, + "acpi_cpu%d: Got C%d - %d latency\n", + device_get_unit(sc->cpu_dev), cx_ptr->type, + cx_ptr->trans_lat)); + cx_ptr++; + sc->cpu_cx_count++; + } } } AcpiOsFree(buf.Pointer); @@ -1043,7 +1122,14 @@ acpi_cpu_idle(sbintime_t sbt) */ if (cx_next->type == ACPI_STATE_C1) { cputicks = cpu_ticks(); - acpi_cpu_c1(); + if (cx_next->p_lvlx != NULL) { + /* C1 I/O then Halt */ + CPU_GET_REG(cx_next->p_lvlx, 1); + } + if (cx_next->do_mwait) + acpi_cpu_idle_mwait(cx_next->mwait_hint); + else + acpi_cpu_c1(); end_time = ((cpu_ticks() - cputicks) << 20) / cpu_tickrate(); if (curthread->td_critnest == 0) end_time = min(end_time, 500000 / hz); @@ -1055,7 +1141,7 @@ acpi_cpu_idle(sbintime_t sbt) * For C3, disable bus master arbitration and enable bus master wake * if BM control is available, otherwise flush the CPU cache. 
*/ - if (cx_next->type == ACPI_STATE_C3) { + if (cx_next->type == ACPI_STATE_C3 || cx_next->mwait_bm_avoidance) { if ((cpu_quirks & CPU_QUIRK_NO_BM_CTRL) == 0) { AcpiWriteBitRegister(ACPI_BITREG_ARB_DISABLE, 1); AcpiWriteBitRegister(ACPI_BITREG_BUS_MASTER_RLD, 1); @@ -1076,7 +1162,10 @@ acpi_cpu_idle(sbintime_t sbt) start_time = 0; cputicks = cpu_ticks(); } - CPU_GET_REG(cx_next->p_lvlx, 1); + if (cx_next->do_mwait) + acpi_cpu_idle_mwait(cx_next->mwait_hint); + else + CPU_GET_REG(cx_next->p_lvlx, 1); /* * Read the end time twice. Since it may take an arbitrary time @@ -1092,8 +1181,8 @@ acpi_cpu_idle(sbintime_t sbt) end_time = ((cpu_ticks() - cputicks) << 20) / cpu_tickrate(); /* Enable bus master arbitration and disable bus master wakeup. */ - if (cx_next->type == ACPI_STATE_C3 && - (cpu_quirks & CPU_QUIRK_NO_BM_CTRL) == 0) { + if ((cx_next->type == ACPI_STATE_C3 || cx_next->mwait_bm_avoidance) && + (cpu_quirks & CPU_QUIRK_NO_BM_CTRL) == 0) { AcpiWriteBitRegister(ACPI_BITREG_ARB_DISABLE, 0); AcpiWriteBitRegister(ACPI_BITREG_BUS_MASTER_RLD, 0); } diff --git a/sys/dev/acpica/acpi_package.c b/sys/dev/acpica/acpi_package.c index e38fea5..c1070cb 100644 --- a/sys/dev/acpica/acpi_package.c +++ b/sys/dev/acpica/acpi_package.c @@ -120,6 +120,28 @@ acpi_PkgGas(device_t dev, ACPI_OBJECT *res, int idx, int *type, int *rid, return (acpi_bus_alloc_gas(dev, type, rid, &gas, dst, flags)); } +int +acpi_PkgFFH_IntelCpu(ACPI_OBJECT *res, int idx, int *vendor, int *class, + uint64_t *address, int *accsize) +{ + ACPI_GENERIC_ADDRESS gas; + ACPI_OBJECT *obj; + + obj = &res->Package.Elements[idx]; + if (obj == NULL || obj->Type != ACPI_TYPE_BUFFER || + obj->Buffer.Length < sizeof(ACPI_GENERIC_ADDRESS) + 3) + return (EINVAL); + + memcpy(&gas, obj->Buffer.Pointer + 3, sizeof(gas)); + if (gas.SpaceId != ACPI_ADR_SPACE_FIXED_HARDWARE) + return (ERESTART); + *vendor = gas.BitWidth; + *class = gas.BitOffset; + *address = gas.Address; + *accsize = gas.AccessWidth; + return (0); +} + 
ACPI_HANDLE acpi_GetReference(ACPI_HANDLE scope, ACPI_OBJECT *obj) { diff --git a/sys/dev/acpica/acpivar.h b/sys/dev/acpica/acpivar.h index 2e2b96d..cbd4bd9 100644 --- a/sys/dev/acpica/acpivar.h +++ b/sys/dev/acpica/acpivar.h @@ -467,6 +467,8 @@ int acpi_PkgInt32(ACPI_OBJECT *res, int idx, uint32_t *dst); int acpi_PkgStr(ACPI_OBJECT *res, int idx, void *dst, size_t size); int acpi_PkgGas(device_t dev, ACPI_OBJECT *res, int idx, int *type, int *rid, struct resource **dst, u_int flags); +int acpi_PkgFFH_IntelCpu(ACPI_OBJECT *res, int idx, int *vendor, + int *class, uint64_t *address, int *accsize); ACPI_HANDLE acpi_GetReference(ACPI_HANDLE scope, ACPI_OBJECT *obj); /* diff --git a/sys/i386/acpica/acpi_machdep.c b/sys/i386/acpica/acpi_machdep.c index 049354b..4c79691 100644 --- a/sys/i386/acpica/acpi_machdep.c +++ b/sys/i386/acpica/acpi_machdep.c @@ -106,13 +106,6 @@ acpi_machdep_quirks(int *quirks) return (0); } -void -acpi_cpu_c1() -{ - - __asm __volatile("sti; hlt"); -} - /* * Support for mapping ACPI tables during early boot. 
This abuses the * crashdump map because the kernel cannot allocate KVA in diff --git a/sys/i386/include/md_var.h b/sys/i386/include/md_var.h index bffdd57..b5bd35e 100644 --- a/sys/i386/include/md_var.h +++ b/sys/i386/include/md_var.h @@ -97,6 +97,7 @@ struct dumperinfo; void *alloc_fpusave(int flags); void bcopyb(const void *from, void *to, size_t len); void busdma_swi(void); +bool cpu_mwait_usable(void); void cpu_probe_amdc1e(void); void cpu_setregs(void); void cpu_switch_load_gs(void) __asm(__STRING(cpu_switch_load_gs)); diff --git a/sys/x86/include/acpica_machdep.h b/sys/x86/include/acpica_machdep.h index 46080c0..136285c 100644 --- a/sys/x86/include/acpica_machdep.h +++ b/sys/x86/include/acpica_machdep.h @@ -74,6 +74,7 @@ enum intr_polarity; void acpi_SetDefaultIntrModel(int model); void acpi_cpu_c1(void); +void acpi_cpu_idle_mwait(uint32_t mwait_hint); void *acpi_map_table(vm_paddr_t pa, const char *sig); void acpi_unmap_table(void *table); vm_paddr_t acpi_find_table(const char *sig); diff --git a/sys/x86/x86/cpu_machdep.c b/sys/x86/x86/cpu_machdep.c index 846a123..d1d49f4 100644 --- a/sys/x86/x86/cpu_machdep.c +++ b/sys/x86/x86/cpu_machdep.c @@ -90,6 +90,7 @@ __FBSDID("$FreeBSD$"); #ifdef SMP #include #endif +#include #include #include @@ -130,6 +131,27 @@ cpu_flush_dcache(void *ptr, size_t len) /* Not applicable */ } +void +acpi_cpu_c1(void) +{ + + __asm __volatile("sti; hlt"); +} + +void +acpi_cpu_idle_mwait(uint32_t mwait_hint) +{ + int *state; + + state = (int *)PCPU_PTR(monitorbuf); + /* + * XXXKIB. Software coordination mode should be supported, + * but all Intel CPUs provide hardware coordination. + */ + cpu_monitor(state, 0, 0); + cpu_mwait(MWAIT_INTRBREAK, mwait_hint); +} + /* Get current clock frequency for the given cpu id. 
*/ int cpu_est_clockrate(int cpu_id, uint64_t *rate) @@ -232,6 +254,15 @@ cpu_halt(void) #endif +bool +cpu_mwait_usable(void) +{ + + return ((cpu_feature2 & CPUID2_MON) != 0 && ((cpu_mon_mwait_flags & + (CPUID5_MON_MWAIT_EXT | CPUID5_MWAIT_INTRBREAK)) == + (CPUID5_MON_MWAIT_EXT | CPUID5_MWAIT_INTRBREAK))); +} + void (*cpu_idle_hook)(sbintime_t) = NULL; /* ACPI idle hook. */ static int cpu_ident_amdc1e = 0; /* AMD C1E supported. */ static int idle_mwait = 1; /* Use MONITOR/MWAIT for short idle. */ @@ -258,7 +289,7 @@ cpu_idle_acpi(sbintime_t sbt) else if (cpu_idle_hook) cpu_idle_hook(sbt); else - __asm __volatile("sti; hlt"); + acpi_cpu_c1(); *state = STATE_RUNNING; } #endif /* !PC98 */ @@ -292,7 +323,7 @@ cpu_idle_hlt(sbintime_t sbt) if (sched_runnable()) enable_intr(); else - __asm __volatile("sti; hlt"); + acpi_cpu_c1(); *state = STATE_RUNNING; } #endif From owner-freebsd-arch@FreeBSD.ORG Thu Apr 23 00:52:04 2015 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id F1489BE3 for ; Thu, 23 Apr 2015 00:52:04 +0000 (UTC) Received: from mail-ig0-x230.google.com (mail-ig0-x230.google.com [IPv6:2607:f8b0:4001:c05::230]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id BBE811E67 for ; Thu, 23 Apr 2015 00:52:04 +0000 (UTC) Received: by igbyr2 with SMTP id yr2so11819249igb.0 for ; Wed, 22 Apr 2015 17:52:04 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:date:message-id:subject :from:to:content-type; bh=/e+eac/P4aSITWkT6uSi7ovfQA0Syu4bqFCNYivTmLI=; b=uxJ3gd46nWsrOUf2vfHmq+gv8MANvTByZ0IiEbO7jUrtOPTMofwSljzocV0DhhagOF 
fIiteqzdBzpdcd9mnDSdDz3rH/zNYXVbB+tZx+LaagwTjIb1OVelFYwQVYFuez02diMO AxqfCxO48vCEspP7cWnfhEb5xgFgm/aYWeKuRDXESLbzwHyU6mXjdQ8VHpIBrP8JMLFO yovXpfTTQXg2CHchDclqMiFFs1wxzs54kODgwvLNmXHoZ6ZeeI6qDnQbWpDAQC1eKORq Zvi8zldlEjEc4O679Dye6OZa4hL1e3st6qYPZ0L01CZE/4XzfxVIYKBiEnAZylQNKWNV ZBhQ== MIME-Version: 1.0 X-Received: by 10.42.20.197 with SMTP id h5mr847583icb.22.1429750324099; Wed, 22 Apr 2015 17:52:04 -0700 (PDT) Sender: adrian.chadd@gmail.com Received: by 10.36.38.133 with HTTP; Wed, 22 Apr 2015 17:52:04 -0700 (PDT) In-Reply-To: References: Date: Wed, 22 Apr 2015 17:52:04 -0700 X-Google-Sender-Auth: 8XhKYjuj1kEGTErFWT4wzo6n1p4 Message-ID: Subject: Re: RFT: numa policy branch From: Adrian Chadd To: "freebsd-arch@freebsd.org" Content-Type: text/plain; charset=UTF-8 X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 23 Apr 2015 00:52:05 -0000 On 21 April 2015 at 20:03, Adrian Chadd wrote: > OH, and the branch: > > https://github.com/erikarn/freebsd/tree/local/adrian_numa_policy Hi! Update: * the whole setup/copy process for thread and proc domain policies is slightly less dirty now; * the phys layer now checks domain policy in this order: thread -> proc -> default; so now setting a proc policy will take effect for all threads in that proc that don't have a more specific domain policy; * numactl is slightly less terrible to use. Todo: * for correctness, I should call the free methods on the domain policy whenever a thread/proc is destroyed. * write manpages for all of this. * Test on AMD NUMA systems - who has one I can poke at?
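The thread -> proc -> default lookup order described in the update can be sketched roughly as follows. This is an illustration only, not code from the branch: the types and the function name (dom_policy, proc_s, thread_s, effective_policy) are hypothetical, and it assumes that a policy of "none" at a given level means "nothing set here, fall through to the next level".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical policy kinds, loosely following the numactl policy names. */
enum dom_policy_kind {
	DOM_POLICY_NONE,         /* assumption: nothing set at this level */
	DOM_POLICY_ROUND_ROBIN,
	DOM_POLICY_FIXED_DOMAIN
};

struct dom_policy {
	enum dom_policy_kind kind;
	int domain;              /* -1 when no specific domain applies */
};

/* Hypothetical stand-ins for the kernel's proc and thread structures. */
struct proc_s {
	struct dom_policy policy;
};

struct thread_s {
	struct dom_policy policy;
	struct proc_s *proc;
};

/*
 * Pick the policy to apply to an allocation made by td: the thread's own
 * policy wins, then the owning process's policy, then the system default.
 */
static const struct dom_policy *
effective_policy(const struct thread_s *td, const struct dom_policy *sys_default)
{
	assert(td != NULL && sys_default != NULL);
	if (td->policy.kind != DOM_POLICY_NONE)
		return (&td->policy);
	if (td->proc != NULL && td->proc->policy.kind != DOM_POLICY_NONE)
		return (&td->proc->policy);
	return (sys_default);
}
```

With this shape, setting only a process policy affects every thread in the process that has not been given its own policy, which is the behaviour the update describes.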
-adrian From owner-freebsd-arch@FreeBSD.ORG Thu Apr 23 06:38:23 2015 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 5B8A5E1A; Thu, 23 Apr 2015 06:38:23 +0000 (UTC) Received: from mail-ie0-x233.google.com (mail-ie0-x233.google.com [IPv6:2607:f8b0:4001:c03::233]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 283151FFF; Thu, 23 Apr 2015 06:38:23 +0000 (UTC) Received: by iedfl3 with SMTP id fl3so61077885ied.1; Wed, 22 Apr 2015 23:38:22 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:date:message-id:subject:from:to:content-type; bh=L0c+XTabjEjeV5H3tQ9ZAUEkFJCSRkU+J2u56xSBlSY=; b=AN9ldd3geq6w+a472mnISrkQ/TVOl+xDw7erspFglTUB9+0zLpaN8ouO4hdo2+HrRl J5P77ouU54zkBHJv6KRD5yE1j0VhjgwquW6JygEaAhhFhYM+kOI21kFDHypXwgWl30g0 EkrY5+NlAZku8s26JO4aL3a5535iJz609a4ni3TlY06/SDfiygnVl0DKU4nax7izQCst 7vpu4ua6jD6WiXjb0hVjG/ssNldcLuQiITXRdWlNwtk0u5Sbcu4gJyAaQ1jWvjJx4uRy rEKvkZ/uiIjvUGDd2Axk2v5DqQLDWYUoenKYJjsFHEf/maZrJzkBeJ63ANpT1qUpKOXU LJ1w== MIME-Version: 1.0 X-Received: by 10.107.155.13 with SMTP id d13mr1666838ioe.29.1429771102395; Wed, 22 Apr
2015 23:38:22 -0700 (PDT) Sender: adrian.chadd@gmail.com Received: by 10.36.38.133 with HTTP; Wed, 22 Apr 2015 23:38:22 -0700 (PDT) Date: Wed, 22 Apr 2015 23:38:22 -0700 X-Google-Sender-Auth: qxH_SXhU5_oShWVXOneeDKWlJc0 Message-ID: Subject: help with sandybridge/ivybridge hwpmc NUMA DRAM counters From: Adrian Chadd To: "freebsd-arch@freebsd.org" , freebsd-current Content-Type: text/plain; charset=UTF-8 X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 23 Apr 2015 06:38:23 -0000 Hi all, I'm having a spot of trouble trying to get the local/remote DRAM counters working on the NUMA sandybridge and ivybridge systems here. Things work fine using intel-pcm, but those same counters don't work with hwpmc. Sandybridge - there's apparently an MSR that needs to be fiddled if the counters are active. Ivybridge - the v1 and v2 chips have different local/remote DRAM counters, and on my v2 setup there are actually /two/ LOCAL_DRAM entries: adrian@testbox1:~/git/github/erikarn/freebsd/sys/dev/hwpmc % pmccontrol -L | grep DRAM MEM_LOAD_UOPS_LLC_MISS_RETIRED.LOCAL_DRAM MEM_LOAD_UOPS_LLC_MISS_RETIRED.LOCAL_DRAM MEM_LOAD_UOPS_LLC_MISS_RETIRED.REMOTE_DRAM Now, I may be able to get to figuring this out at some point in the distant future, but I'd really appreciate any help I can get now. I'm at the point with the NUMA affinity API stuff where I'm chasing down whether things are correctly working with local/remote RAM, and I'd really like to use hwpmc in sampling mode to work on it. Thanks for any help!
-adrian From owner-freebsd-arch@FreeBSD.ORG Thu Apr 23 06:50:10 2015 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id B1C3C254; Thu, 23 Apr 2015 06:50:10 +0000 (UTC) Received: from zxy.spb.ru (zxy.spb.ru [195.70.199.98]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 6BDC61107; Thu, 23 Apr 2015 06:50:10 +0000 (UTC) Received: from slw by zxy.spb.ru with local (Exim 4.84 (FreeBSD)) (envelope-from ) id 1YlAxb-000PNf-1J; Thu, 23 Apr 2015 09:50:07 +0300 Date: Thu, 23 Apr 2015 09:50:06 +0300 From: Slawa Olhovchenkov To: Adrian Chadd Cc: "freebsd-arch@freebsd.org" , freebsd-current Subject: Re: help with sandybridge/ivybridge hwpmc NUMA DRAM counters Message-ID: <20150423065006.GV1394@zxy.spb.ru> References: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.23 (2014-03-12) X-SA-Exim-Connect-IP: X-SA-Exim-Mail-From: slw@zxy.spb.ru X-SA-Exim-Scanned: No (on zxy.spb.ru); SAEximRunCond expanded to false X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 23 Apr 2015 06:50:10 -0000 On Wed, Apr 22, 2015 at 11:38:22PM -0700, Adrian Chadd wrote: > hi all, > > I'm having a spot of problem trying to get the local/remote dram > counters working on the NUMA sandybridge and ivybridge systems here. > > Things work fine using intel-pcm, but those same counters don't work with hwpmc. > > Sandybridge - there's apparently an MSR that needs to be fiddled if > the counters are active. 
> > Ivybridge - the v1 and v2 chips have different local/remote dram > counters, and on my v2 setup there's actually /two/ LOCAL_DRAM: # pmccontrol -L | grep DRAM MEM_LOAD_UOPS_LLC_MISS_RETIRED.LOCAL_DRAM MEM_LOAD_UOPS_LLC_MISS_RETIRED.REMOTE_DRAM CPU: Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz (2600.05-MHz K8-class CPU) > adrian@testbox1:~/git/github/erikarn/freebsd/sys/dev/hwpmc % > pmccontrol -L | grep DRAM > MEM_LOAD_UOPS_LLC_MISS_RETIRED.LOCAL_DRAM > MEM_LOAD_UOPS_LLC_MISS_RETIRED.LOCAL_DRAM > MEM_LOAD_UOPS_LLC_MISS_RETIRED.REMOTE_DRAM > > Now, i may be able to get to figuring this out at some point in the > distant future, but I'd really appreciate any help I can get now. I'm > now at the point with the NUMA affinity API stuff where I'm now > chasing down when things are correctly working with local/remote RAM, > and I'd really like to use hwpmc in sampling mode to work on it. > > Thanks for any help! > > > > -adrian > _______________________________________________ > freebsd-current@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-current > To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org" From owner-freebsd-arch@FreeBSD.ORG Thu Apr 23 07:20:14 2015 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id EA2E6846; Thu, 23 Apr 2015 07:20:14 +0000 (UTC) Received: from mail-ig0-x230.google.com (mail-ig0-x230.google.com [IPv6:2607:f8b0:4001:c05::230]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id B4D6513BE; Thu, 23 Apr 2015 07:20:14 +0000 (UTC) Received: by igblo3 with SMTP id lo3so16305795igb.1; Thu, 23 Apr 2015 00:20:14 -0700 (PDT) DKIM-Signature: 
v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:date:message-id:subject :from:to:cc:content-type; bh=YX5dm31C1BPGuJ0Fam/OYg2v9+BFx3SI9Km9M5LPQCw=; b=gfch/y7e4OvCr8pRIajahZRmcAusTZM7wGbvcEXXMvOqWO9iVYGbQz+0evnbsy5EQH 4coDUW3fSCmx+Sph64AXCH50XfkfTlMZMv+KOdieGQOpJ+oUl8wyGA5Vq7jNVMz0CStU SL3BQdfckiCeiFPP5t+xSXMtwN1P56g2F+tQrkALI7ka7OjQxoG+2da5OXNI+MJb2TUZ mdmfV43gW/hQjJLv6mhlGwqtxl26kWbklBgyutTQUwtZ1FhVRUV1VrsXsu+FwslvxmkZ p68ZuMzvQG9d4L89SQqQNmQjD0NVInOkFVL/p0QqiujRZUryvzgRsetTLzT7cg03JRIc 4WEA== MIME-Version: 1.0 X-Received: by 10.50.57.51 with SMTP id f19mr2945060igq.6.1429773614170; Thu, 23 Apr 2015 00:20:14 -0700 (PDT) Sender: adrian.chadd@gmail.com Received: by 10.36.38.133 with HTTP; Thu, 23 Apr 2015 00:20:14 -0700 (PDT) In-Reply-To: <20150423065006.GV1394@zxy.spb.ru> References: <20150423065006.GV1394@zxy.spb.ru> Date: Thu, 23 Apr 2015 00:20:14 -0700 X-Google-Sender-Auth: 7bptOHC1A3nAiio0P4Po5Gils9I Message-ID: Subject: Re: help with sandybridge/ivybridge hwpmc NUMA DRAM counters From: Adrian Chadd To: Slawa Olhovchenkov Cc: "freebsd-arch@freebsd.org" , freebsd-current Content-Type: text/plain; charset=UTF-8 X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 23 Apr 2015 07:20:15 -0000 Yeah, on stable/10. But on -HEAD it's different. There's two entries - one for d3_01 and one for d3_03. 
-adrian From owner-freebsd-arch@FreeBSD.ORG Thu Apr 23 07:22:46 2015 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 9243BA1C; Thu, 23 Apr 2015 07:22:46 +0000 (UTC) Received: from zxy.spb.ru (zxy.spb.ru [195.70.199.98]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 4BA2C14C4; Thu, 23 Apr 2015 07:22:46 +0000 (UTC) Received: from slw by zxy.spb.ru with local (Exim 4.84 (FreeBSD)) (envelope-from ) id 1YlBT9-000Pvy-Ty; Thu, 23 Apr 2015 10:22:44 +0300 Date: Thu, 23 Apr 2015 10:22:43 +0300 From: Slawa Olhovchenkov To: Adrian Chadd Cc: "freebsd-arch@freebsd.org" , freebsd-current Subject: Re: help with sandybridge/ivybridge hwpmc NUMA DRAM counters Message-ID: <20150423072243.GM9114@zxy.spb.ru> References: <20150423065006.GV1394@zxy.spb.ru> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.23 (2014-03-12) X-SA-Exim-Connect-IP: X-SA-Exim-Mail-From: slw@zxy.spb.ru X-SA-Exim-Scanned: No (on zxy.spb.ru); SAEximRunCond expanded to false X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 23 Apr 2015 07:22:46 -0000 On Thu, Apr 23, 2015 at 12:20:14AM -0700, Adrian Chadd wrote: > Yeah, on stable/10. But on -HEAD it's different. There's two entries - > one for d3_01 and one for d3_03. What CPU model? 
From owner-freebsd-arch@FreeBSD.ORG Thu Apr 23 07:24:07 2015 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id B8207C44; Thu, 23 Apr 2015 07:24:07 +0000 (UTC) Received: from mail-ig0-x236.google.com (mail-ig0-x236.google.com [IPv6:2607:f8b0:4001:c05::236]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 8163014E1; Thu, 23 Apr 2015 07:24:07 +0000 (UTC) Received: by igbyr2 with SMTP id yr2so17012827igb.0; Thu, 23 Apr 2015 00:24:07 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:date:message-id:subject :from:to:cc:content-type; bh=01BgYjGT48qk4Tx5h6A+0C72kbWzEZb4n3orkUPxsHw=; b=n+e4f+7wEtskkCsjzoFh1IuYlcHoSgvzSYXwJ9aoMRTGTChjtcyyVy+0M0OPoFBug7 lcUn0g6fS0zAYQQ4kDqk/IxrQexkC+GPgeisrNHx5d4dQ4ax1ocXQRE2vEhCRlS/bGK1 H7OIGFm4jjHHk8QMiqaW4Hi7u3MSYmXYHVMvoumy39W3eFR93uUud4wU/O6N3Bohw4eL cEEyoXyz74BNKm5LXERVnjFfCDnN2Fmce41WkWF+Qm08Ns1UuXmi0h80F9mP85W4ECjs 2O3+UtEkeB7sn8CmsoU6Uy/rgh0RYlIHLvS+AOTTxbeyEZV58qIZRlRuHWtiKyzUhReU HG3g== MIME-Version: 1.0 X-Received: by 10.42.137.202 with SMTP id z10mr2071751ict.37.1429773846956; Thu, 23 Apr 2015 00:24:06 -0700 (PDT) Sender: adrian.chadd@gmail.com Received: by 10.36.38.133 with HTTP; Thu, 23 Apr 2015 00:24:06 -0700 (PDT) In-Reply-To: <20150423072243.GM9114@zxy.spb.ru> References: <20150423065006.GV1394@zxy.spb.ru> <20150423072243.GM9114@zxy.spb.ru> Date: Thu, 23 Apr 2015 00:24:06 -0700 X-Google-Sender-Auth: EsAAgMHyHEbS3kYjMI0OE96FVJw Message-ID: Subject: Re: help with sandybridge/ivybridge hwpmc NUMA DRAM counters From: Adrian Chadd To: Slawa Olhovchenkov Cc: "freebsd-arch@freebsd.org" , 
freebsd-current Content-Type: text/plain; charset=UTF-8 X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 23 Apr 2015 07:24:07 -0000 On 23 April 2015 at 00:22, Slawa Olhovchenkov wrote: > On Thu, Apr 23, 2015 at 12:20:14AM -0700, Adrian Chadd wrote: > >> Yeah, on stable/10. But on -HEAD it's different. There's two entries - >> one for d3_01 and one for d3_03. > > What CPU model? CPU: Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz (2600.06-MHz K8-class CPU) Origin="GenuineIntel" Id=0x306e4 Family=0x6 Model=0x3e Stepping=4 -adrian From owner-freebsd-arch@FreeBSD.ORG Thu Apr 23 07:24:57 2015 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id D843FDF1; Thu, 23 Apr 2015 07:24:57 +0000 (UTC) Received: from zxy.spb.ru (zxy.spb.ru [195.70.199.98]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 901881500; Thu, 23 Apr 2015 07:24:57 +0000 (UTC) Received: from slw by zxy.spb.ru with local (Exim 4.84 (FreeBSD)) (envelope-from ) id 1YlBVH-000PzS-Hc; Thu, 23 Apr 2015 10:24:55 +0300 Date: Thu, 23 Apr 2015 10:24:55 +0300 From: Slawa Olhovchenkov To: Adrian Chadd Cc: "freebsd-arch@freebsd.org" , freebsd-current Subject: Re: help with sandybridge/ivybridge hwpmc NUMA DRAM counters Message-ID: <20150423072455.GN9114@zxy.spb.ru> References: <20150423065006.GV1394@zxy.spb.ru> <20150423072243.GM9114@zxy.spb.ru> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.23 (2014-03-12) X-SA-Exim-Connect-IP: X-SA-Exim-Mail-From: 
slw@zxy.spb.ru X-SA-Exim-Scanned: No (on zxy.spb.ru); SAEximRunCond expanded to false X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 23 Apr 2015 07:24:58 -0000 On Thu, Apr 23, 2015 at 12:24:06AM -0700, Adrian Chadd wrote: > On 23 April 2015 at 00:22, Slawa Olhovchenkov wrote: > > On Thu, Apr 23, 2015 at 12:20:14AM -0700, Adrian Chadd wrote: > > > >> Yeah, on stable/10. But on -HEAD it's different. There's two entries - > >> one for d3_01 and one for d3_03. > > > > What CPU model? > > CPU: Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz (2600.06-MHz K8-class CPU) > Origin="GenuineIntel" Id=0x306e4 Family=0x6 Model=0x3e Stepping=4 Same with me? From owner-freebsd-arch@FreeBSD.ORG Thu Apr 23 08:07:09 2015 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 6989C3B9; Thu, 23 Apr 2015 08:07:09 +0000 (UTC) Received: from zxy.spb.ru (zxy.spb.ru [195.70.199.98]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 20D1519AE; Thu, 23 Apr 2015 08:07:09 +0000 (UTC) Received: from slw by zxy.spb.ru with local (Exim 4.84 (FreeBSD)) (envelope-from ) id 1YlCA5-0000lr-Ht; Thu, 23 Apr 2015 11:07:05 +0300 Date: Thu, 23 Apr 2015 11:07:05 +0300 From: Slawa Olhovchenkov To: Adrian Chadd Cc: "freebsd-arch@freebsd.org" , freebsd-current Subject: Re: help with sandybridge/ivybridge hwpmc NUMA DRAM counters Message-ID: <20150423080705.GO9114@zxy.spb.ru> References: <20150423065006.GV1394@zxy.spb.ru> <20150423072243.GM9114@zxy.spb.ru> <20150423072455.GN9114@zxy.spb.ru> MIME-Version: 1.0 Content-Type: text/plain; 
charset=us-ascii Content-Disposition: inline In-Reply-To: <20150423072455.GN9114@zxy.spb.ru> User-Agent: Mutt/1.5.23 (2014-03-12) X-SA-Exim-Connect-IP: X-SA-Exim-Mail-From: slw@zxy.spb.ru X-SA-Exim-Scanned: No (on zxy.spb.ru); SAEximRunCond expanded to false X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 23 Apr 2015 08:07:09 -0000

On Thu, Apr 23, 2015 at 10:24:55AM +0300, Slawa Olhovchenkov wrote:
> On Thu, Apr 23, 2015 at 12:24:06AM -0700, Adrian Chadd wrote:
>
> > On 23 April 2015 at 00:22, Slawa Olhovchenkov wrote:
> > > On Thu, Apr 23, 2015 at 12:20:14AM -0700, Adrian Chadd wrote:
> > >
> > >> Yeah, on stable/10. But on -HEAD it's different. There's two entries -
> > >> one for d3_01 and one for d3_03.
> > >
> > > What CPU model?
> >
> > CPU: Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz (2600.06-MHz K8-class CPU)
> > Origin="GenuineIntel" Id=0x306e4 Family=0x6 Model=0x3e Stepping=4
>
> Same with me?

Maybe in your case it is an E5-269x?
From owner-freebsd-arch@FreeBSD.ORG Fri Apr 24 13:13:06 2015 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 58CBDC35 for ; Fri, 24 Apr 2015 13:13:06 +0000 (UTC) Received: from mail-ie0-x22d.google.com (mail-ie0-x22d.google.com [IPv6:2607:f8b0:4001:c03::22d]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 27C941890 for ; Fri, 24 Apr 2015 13:13:06 +0000 (UTC) Received: by iedfl3 with SMTP id fl3so93501106ied.1 for ; Fri, 24 Apr 2015 06:13:05 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:date:message-id:subject:from:to:content-type; bh=+2ZMCbbMUrrG+EeTnUUOVcxqc55aEdO4iE5JtF+AHBk=; b=EKwOs2Z7A8PuL60En4rR2oRHqvAZTgc9Iwx0tlGBv2tfhftsFlQvW7KheB0Gy7P2jo q7qIqAopb8d3kQZAVDs+yUptjdINwh+4X9Nlkz+x627kMUulEJCTwWtJpSsghWuqSrMX N1DxelZd4CAisvbMexjJ/F609PEKxkXwOlm0cvGEu9CRA7C5pePYR1sRJ5j4SCkg8jsM mimwAZqkUA8ECoIxIhgufBzPKiWHZqfglIcY4d0VPvhZI11qV94+uI+ZVk6BpzXuds0a w4/NZKsrkPeFqT0D+T2bd4XiSLwkUhYeZrTWlF+QzC9TURCNgjCCKbzEU9MHcR8ks0JG Cykg== MIME-Version: 1.0 X-Received: by 10.107.18.65 with SMTP id a62mr11034041ioj.67.1429881185685; Fri, 24 Apr 2015 06:13:05 -0700 (PDT) Received: by 10.64.13.81 with HTTP; Fri, 24 Apr 2015 06:13:05 -0700 (PDT) Date: Fri, 24 Apr 2015 15:13:05 +0200 Message-ID: Subject: bus_dmamap_sync() for bounced client buffers from user address space From: Svatopluk Kraus To: FreeBSD Arch Content-Type: text/plain; charset=UTF-8 X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 
24 Apr 2015 13:13:06 -0000

DMA can be done on a client buffer from user address space, for example
through bus_dmamap_load_uio() when uio->uio_segflg is UIO_USERSPACE. Such a
client buffer can bounce, and then it must be copied to and from the
bounce buffer in bus_dmamap_sync().

Current implementations (in all archs) do not take into account that
bus_dmamap_sync() is asynchronous for POSTWRITE and POSTREAD in
general. It can be asynchronous for PREWRITE and PREREAD too, for
example in some driver implementations where operations on DMA client
buffers are buffered. In those cases, a simple bcopy() is not correct.

A demonstration of the current implementation (x86) is the following:

-----------------------------
struct bounce_page {
        vm_offset_t vaddr;      /* kva of bounce buffer */
        bus_addr_t busaddr;     /* Physical address */
        vm_offset_t datavaddr;  /* kva of client data */
        bus_addr_t dataaddr;    /* client physical address */
        bus_size_t datacount;   /* client data count */
        STAILQ_ENTRY(bounce_page) links;
};


if ((op & BUS_DMASYNC_PREWRITE) != 0) {
        while (bpage != NULL) {
                if (bpage->datavaddr != 0) {
                        bcopy((void *)bpage->datavaddr,
                            (void *)bpage->vaddr,
                            bpage->datacount);
                } else {
                        physcopyout(bpage->dataaddr,
                            (void *)bpage->vaddr,
                            bpage->datacount);
                }
                bpage = STAILQ_NEXT(bpage, links);
        }
        dmat->bounce_zone->total_bounced++;
}
-----------------------------

There are two things:

(1) datavaddr is not always the kva of the client data; sometimes it is
the uva of the client data.
(2) bcopy() can be used only if datavaddr is a kva, or when map->pmap is
the current pmap.

Note that bus_dmamap_load_uio() with uio->uio_segflg set to
UIO_USERSPACE implies that the physical pages used are in-core and
wired. See "man bus_dma".

There is no public interface to check that map->pmap is the current pmap.
So one solution is the following:

if (bpage->datavaddr >= VM_MIN_KERNEL_ADDRESS) {
        bcopy();
} else {
        physcopy();
}

If there were a public pmap_is_current(), then another solution would be
the following:

if (bpage->datavaddr != 0 && pmap_is_current(map->pmap)) {
        bcopy();
} else {
        physcopy();
}

The second solution implies that a context switch must not happen during
bus_dmamap_sync() called from an interrupt routine. However, IMO, that
can be taken for granted.

Note that map->pmap should always be kernel_pmap for datavaddr >=
VM_MIN_KERNEL_ADDRESS.

Comments, different solutions, or objections?

Svatopluk Kraus

From owner-freebsd-arch@FreeBSD.ORG Fri Apr 24 18:57:50 2015 Return-Path: Delivered-To: arch@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 8A85EB84 for ; Fri, 24 Apr 2015 18:57:50 +0000 (UTC) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:1900:2254:206c::16:87]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 706F013C0 for ; Fri, 24 Apr 2015 18:57:50 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.9/8.14.9) with ESMTP id t3OIvoJn033424 for ; Fri, 24 Apr 2015 18:57:50 GMT (envelope-from bdrewery@freefall.freebsd.org) Received: (from bdrewery@localhost) by freefall.freebsd.org (8.14.9/8.14.9/Submit) id t3OIvoAC033421 for arch@FreeBSD.org; Fri, 24 Apr 2015 18:57:50 GMT (envelope-from bdrewery) Received: (qmail 9218 invoked from network); 24 Apr 2015 13:57:45 -0500 Received: from unknown (HELO ?10.10.1.139?)
(freebsd@shatow.net@10.10.1.139) by sweb.xzibition.com with ESMTPA; 24 Apr 2015 13:57:45 -0500 Message-ID: <553A922F.3000303@FreeBSD.org> Date: Fri, 24 Apr 2015 13:57:51 -0500 From: Bryan Drewery Organization: FreeBSD User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:31.0) Gecko/20100101 Thunderbird/31.6.0 MIME-Version: 1.0 To: Baptiste Daroussin , arch@FreeBSD.org Subject: Re: RFC: Alternative to PRIVATELIB References: <20150411142835.GE65320@ivaldir.etoilebsd.net> In-Reply-To: <20150411142835.GE65320@ivaldir.etoilebsd.net> OpenPGP: id=F9173CB2C3AAEA7A5C8A1F0935D771BB6E4697CF; url=http://www.shatow.net/bryan/bryan2.asc Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="69KlI9bmva1VqQBJnaMDv3pG2B3bUDEIV" X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 24 Apr 2015 18:57:50 -0000 This is an OpenPGP/MIME signed message (RFC 4880 and 3156) --69KlI9bmva1VqQBJnaMDv3pG2B3bUDEIV Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: quoted-printable

On 4/11/2015 9:28 AM, Baptiste Daroussin wrote:
> Hi,
>
> I would like to propose to replace PRIVATELIB with something more convenient.
>
> First what is PRIVATELIB is trying to solve:
> We are maintaining stable ABI over branches but some third parties sources
> are not really good at maintaining stable ABI, so we do hide them into private
> where nothing can use it. We do not provide headers for that and we add rpath to
> every binaries that needs to link to those.
>
> What is the issues of PRIVATELIB:
> any application linking to a library from base (a regular one) that does itself
> links to a PRIVATELIB cannot anymore statically link to the said application.
>
> The is no mechanism to handle PRIVATELIBS in compat*x ports which can be a
> problem if one of our regular lib is linked to a privatelib and ends up into
> compat one day.
>
> It prevents easy linking for 3rd party application using those privatelibs on
> purpose (aka with the knowledge abi can break) like libbsdstat.
>
> What I would like to propose is the following:
>
> Create in bsd.lib.mk support for PRIVATE knobs (what ever name you do prefer)
>
> It will just prefix the name of the library with "private" but install it in the
> regular place
>
> It will automatically decide to install the headers into /usr/include/private/${LIB}/

This is basically a requirement to even make some libraries private. (I
may be misremembering details) Anything built against readline wants to
find a readline/*.h but the source in the tree is not stored in a
readline/*.h structure. By installing the actual headers into
/usr/include/private the problem goes away.

> Each private library headers in a custom place to avoid an application that
> deliberatly use a given privatelib to find another one
>
> Prefix all manpage with private_ so that we can provide the documentation for
> the said libs for the version we ship but if another version is shipped by ports
> then we can easily access both documentation.
>

This sounds useful for debugging and developing base applications which
need these private libs.

> As a result bsd.lib.mk will be simpler, we could again static link against
> everything we ship in base, we can provide documentation for those libs and we
> can easily isolate them anyway from the ports.
>
> I plan to start working on this in a week.
>

+1

-- 
Regards,
Bryan Drewery

--69KlI9bmva1VqQBJnaMDv3pG2B3bUDEIV Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQEcBAEBAgAGBQJVOpIvAAoJEDXXcbtuRpfPAfYH/1XuFHv9dws0wgWUiEr3lgp2 zf9pgoqn21YUm0jPY1662P1dHpLLEAE3rTbCVpgCErtF/pA5Afu9O3jomRYtzUP3 RIrUP7/qj03kwvZVvgQIJAgqX5Fj/0I4LMvorW9WDbM9JIEUyqVmNGS/sUs4ljRp si6MSELNOWQIBuvHFB/qwbHgBwR2x5GXWUkBFdp8InRjSr0qGNvWasb+WV10YuDt W9QYfnoqG+XouWZZdTIrGm2BSy31kheqX5WzVhFDqW7JwA5O8GIHkTVnay0pxlCw brNoD0lulyPYzr6neRQwTktofGIqt+KeQ9eARWol5mZmv+rg8WE483IITXKiVoU= =on6E -----END PGP SIGNATURE----- --69KlI9bmva1VqQBJnaMDv3pG2B3bUDEIV--

From owner-freebsd-arch@FreeBSD.ORG Fri Apr 24 19:00:41 2015 Return-Path: Delivered-To: arch@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 4B201C5E for ; Fri, 24 Apr 2015 19:00:41 +0000 (UTC) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:1900:2254:206c::16:87]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 16FA113DC for ; Fri, 24 Apr 2015 19:00:41 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.9/8.14.9) with ESMTP id t3OJ0eX4033832 for ; Fri, 24 Apr 2015 19:00:40 GMT (envelope-from bdrewery@freefall.freebsd.org) Received: (from bdrewery@localhost) by freefall.freebsd.org (8.14.9/8.14.9/Submit) id t3OJ0eD5033828 for arch@FreeBSD.org; Fri, 24 Apr 2015 19:00:40 GMT (envelope-from bdrewery) Received: (qmail 23719 invoked from network); 24 Apr 2015 14:00:39 -0500 Received: from unknown (HELO ?10.10.1.139?)
(freebsd@shatow.net@10.10.1.139) by sweb.xzibition.com with ESMTPA; 24 Apr 2015 14:00:39 -0500 Message-ID: <553A92DD.1020404@FreeBSD.org> Date: Fri, 24 Apr 2015 14:00:45 -0500 From: Bryan Drewery Organization: FreeBSD User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:31.0) Gecko/20100101 Thunderbird/31.6.0 MIME-Version: 1.0 To: Baptiste Daroussin , arch@FreeBSD.org Subject: Re: RFC: Alternative to PRIVATELIB References: <20150411142835.GE65320@ivaldir.etoilebsd.net> <553A922F.3000303@FreeBSD.org> In-Reply-To: <553A922F.3000303@FreeBSD.org> OpenPGP: id=F9173CB2C3AAEA7A5C8A1F0935D771BB6E4697CF; url=http://www.shatow.net/bryan/bryan2.asc Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="PEvkNjAgal3pEvFMNn5p4qWHpia6RLHId" X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 24 Apr 2015 19:00:41 -0000 This is an OpenPGP/MIME signed message (RFC 4880 and 3156) --PEvkNjAgal3pEvFMNn5p4qWHpia6RLHId Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: quoted-printable

On 4/24/2015 1:57 PM, Bryan Drewery wrote:
> On 4/11/2015 9:28 AM, Baptiste Daroussin wrote:
>> Prefix all manpage with private_ so that we can provide the documentation for
>> the said libs for the version we ship but if another version is shipped by ports
>> then we can easily access both documentation.
>>
>
> This sounds useful for debugging and developing base applications which
> need these private libs.

However I would suggest a /usr/share/man/private that you can access by
setting MANDIR. Prefixing all manpages with 'private_' will get silly
with cross-references and function_name.3 files.

-- 
Regards,
Bryan Drewery

--PEvkNjAgal3pEvFMNn5p4qWHpia6RLHId Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQEcBAEBAgAGBQJVOpLdAAoJEDXXcbtuRpfP3zUH/jLjG8J4kAWgfuM7CXyoxZcI ix8+2AnAYTH0hAq+8hxm3zqbaeP6fttuL4nJPg4XGdwPmXrBoFRmPAcWezloc8gx 2lwWXWaW8lKygeDbLxOEknSU706ajlOXgq+Qks1Sl+WP3D5/yQsMyTYsmtm4cOk4 JgzWOOcXKeemtWrhjtoAkBKJRvblfxE/YNx4YvokbIX4lddva06wuyVpMWg8UAjh +RhobfWzAWkLSF03wLDnh1MIKGF4EjRAHTiLtwkdynz/PP1fQI+u+9v8hTS7FZ5S wJBqeYgqPi3EiUyP2F9irt2UJcarGoywxp+Zk4AU/qlDk0MIj9jSYpVvtlK8eTs= =V980 -----END PGP SIGNATURE----- --PEvkNjAgal3pEvFMNn5p4qWHpia6RLHId--

From owner-freebsd-arch@FreeBSD.ORG Fri Apr 24 19:02:23 2015 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 122F9D2F; Fri, 24 Apr 2015 19:02:23 +0000 (UTC) Received: from mail-wi0-x22b.google.com (mail-wi0-x22b.google.com [IPv6:2a00:1450:400c:c05::22b]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id AC2CB14ED; Fri, 24 Apr 2015 19:02:22 +0000 (UTC) Received: by wizk4 with SMTP id k4so32780874wiz.1; Fri, 24 Apr 2015 12:02:21 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=sender:date:from:to:cc:subject:message-id:references:mime-version :content-type:content-disposition:in-reply-to:user-agent; bh=oTH5af/cafG3AeYTd2yW0musUhcxgAxrqrANvwzFhQM=; b=f8zbloL9Kd5TAWtef9btzRt7qEQvaS9jb+CaBomA7hgaU6t9JftXjtA1WmjIzraBk9 79hhW/BIW4ZrRMRWRcoV766g/E1qKxdOabcxtufRApF7WUkFUcLhVpG78TFxgbwk3SCZ WNX/cUhq0d1AMX5ssybPaPnp8ABRX1u0tQBDGQSgzozB25T0tHcYTx+qze2LaoHIEv8v
wXQxwfCVPmUUPocWSKGjmMH5MVBbGS/yeVz9U/cirDl3oUhMmpj8cI2CU190GZkdMA+a jle9acHEbPr2TNxeh/+OHaX0NuhbCE3x5MYL1Ofi6cuHzSUX/oh2NPbIkEogxj7WZvwG SdmQ== X-Received: by 10.180.107.38 with SMTP id gz6mr713821wib.63.1429902141192; Fri, 24 Apr 2015 12:02:21 -0700 (PDT) Received: from ivaldir.etoilebsd.net ([2001:41d0:8:db4c::1]) by mx.google.com with ESMTPSA id g14sm18071677wjs.47.2015.04.24.12.02.19 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Fri, 24 Apr 2015 12:02:20 -0700 (PDT) Sender: Baptiste Daroussin Date: Fri, 24 Apr 2015 21:02:18 +0200 From: Baptiste Daroussin To: Bryan Drewery Cc: arch@FreeBSD.org Subject: Re: RFC: Alternative to PRIVATELIB Message-ID: <20150424190218.GA13141@ivaldir.etoilebsd.net> References: <20150411142835.GE65320@ivaldir.etoilebsd.net> <553A922F.3000303@FreeBSD.org> <553A92DD.1020404@FreeBSD.org> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="envbJBWh7q8WU6mo" Content-Disposition: inline In-Reply-To: <553A92DD.1020404@FreeBSD.org> User-Agent: Mutt/1.5.23 (2014-03-12) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 24 Apr 2015 19:02:23 -0000 --envbJBWh7q8WU6mo Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable

On Fri, Apr 24, 2015 at 02:00:45PM -0500, Bryan Drewery wrote:
> On 4/24/2015 1:57 PM, Bryan Drewery wrote:
> > On 4/11/2015 9:28 AM, Baptiste Daroussin wrote:
> >> Prefix all manpage with private_ so that we can provide the documentation for
> >> the said libs for the version we ship but if another version is shipped by ports
> >> then we can easily access both documentation.
> >>
> >
> > This sounds useful for debugging and developing base applications which
> > need these private libs.
>
> However I would suggest a /usr/share/man/private that you can access by
> setting MANDIR. Prefixing all manpages with 'private_' will get silly
> with cross-references and function_name.3 files.

Actually, manpages are the complicated part of that proposal, so I may do
manpages in a second attempt :)

Best regards,
Bapt

--envbJBWh7q8WU6mo Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iEYEARECAAYFAlU6kzoACgkQ8kTtMUmk6Eyb8wCfeScAle14ah+Xpx/LeM8lHUKT eyIAoLndSPyxLSMpISizRgtxK9B6pqkS =pF6w -----END PGP SIGNATURE----- --envbJBWh7q8WU6mo--

From owner-freebsd-arch@FreeBSD.ORG Fri Apr 24 22:50:17 2015 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 9CFB1E97 for ; Fri, 24 Apr 2015 22:50:17 +0000 (UTC) Received: from mail-ig0-x22a.google.com (mail-ig0-x22a.google.com [IPv6:2607:f8b0:4001:c05::22a]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 6D06E1DB7 for ; Fri, 24 Apr 2015 22:50:17 +0000 (UTC) Received: by igblo3 with SMTP id lo3so26314346igb.1 for ; Fri, 24 Apr 2015 15:50:16 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=+6oZdQd1ywzWxCvca3iMRpGGsKaLnEXcKMTmIeoLY9o=; b=M8nljuxr0nsgKx96usLmWtRW34WeR0npjC58+NEOpNUOfMuOY/KI5hD8h2684zeTzH TDjRBvACEomuKh6N0x7xqEcCWgs6AWmsxoRxT/A0sPgmW37btAqWb1OpjU2IF6Rda/Ya 3yXE+SvqM23maG9+xMwXPJV/JTpbuBE9CptFC9LHPT2dRq1dvhLNYL8z/HubWkou9K1J 9pZiGgol6V3vYog2iSceK7SatyI+7qMaaSO30dTWqp2RwYnJL68y5mgO9V1HVOM6xEtl AFvcsk7zMSenU0y42YBLzF2ibq4FgHkZObXeM5W8VDhge831cCCCL4e2ZLDFoWxenR6Z NWBg== MIME-Version: 1.0 X-Received: by
10.50.36.66 with SMTP id o2mr354712igj.16.1429915815629; Fri, 24 Apr 2015 15:50:15 -0700 (PDT) Received: by 10.36.106.70 with HTTP; Fri, 24 Apr 2015 15:50:15 -0700 (PDT) In-Reply-To: References: Date: Fri, 24 Apr 2015 17:50:15 -0500 Message-ID: Subject: Re: bus_dmamap_sync() for bounced client buffers from user address space From: Jason Harmening To: Svatopluk Kraus Cc: FreeBSD Arch Content-Type: text/plain; charset=UTF-8 X-Content-Filtered-By: Mailman/MimeDel 2.1.20 X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 24 Apr 2015 22:50:17 -0000 A couple of comments: --POSTWRITE and POSTREAD are only asynchronous if you call them from an asynchronous context. For a driver that's already performing DMA operations on usermode memory, it seems likely that there's going to be *some* place where you can call bus_dmamap_sync() and be guaranteed to be executing in the context of the process that owns the memory. Then a regular bcopy will be safe and inexpensive, assuming the pages have been properly vslock-ed/vm_map_wire-d. That's usually whatever read/write/ioctl operation spawned the DMA transfer in the first place. So, in those cases can you not just move the POSTREAD/POSTWRITE sync from the "DMA-finished" interrupt to the d_read/d_write/d_ioctl that waits on the "DMA-finished" interrupt? --physcopyin/physcopyout aren't trivial. They go through uiomove_fromphys, which often uses sfbufs to create temporary KVA mappings for the physical pages. sf_buf_alloc() can sleep (unless SFB_NOWAIT is specified, which means it can fail and which uiomove_fromphys does not specify for good reason); that makes it unsafe for use in either a threaded interrupt or a filter. 
Perhaps the physcopyout path could be changed to use pmap_qenter directly in this case, but that can still be expensive in terms of TLB shootdowns. Checking against VM_MIN_KERNEL_ADDRESS seems sketchy; it eliminates the chance to use a much-less-expensive bcopy in cases where the sync is happening in correct process context. Context-switching during bus_dmamap_sync() shouldn't be an issue. In a filter interrupt, curproc will be completely arbitrary but none of this stuff should be called in a filter anyway. Otherwise, if you're in a kernel thread (including an ithread), curproc should be whatever proc was supplied when the thread was created. That's usually proc0, which only has kernel address space. IOW, even if a context-switch happens sometime during bus_dmamap_sync, any pmap check or copy should have a consistent and non-arbitrary process context. I think something like your second solution would be workable to make UIO_USERSPACE syncs work in non-interrupt kernel threads, but given all the restrictions and extra cost of physcopy, I'm not sure how useful that would be. I do think busdma.9 could at least use a note that bus_dmamap_sync() is only safe to call in the context of the owning process for user buffers. On Fri, Apr 24, 2015 at 8:13 AM, Svatopluk Kraus wrote: > DMA can be done on client buffer from user address space. For example, > thru bus_dmamap_load_uio() when uio->uio_segflg is UIO_USERSPACE. Such > client buffer can bounce and then, it must be copied to and from > bounce buffer in bus_dmamap_sync(). > > Current implementations (in all archs) do not take into account that > bus_dmamap_sync() is asynchronous for POSTWRITE and POSTREAD in > general. It can be asynchronous for PREWRITE and PREREAD too. For > example, in some driver implementations where DMA client buffers > operations are buffered. In those cases, simple bcopy() is not > correct. 
> > Demonstration of current implementation (x86) is the following: > > ----------------------------- > struct bounce_page { > vm_offset_t vaddr; /* kva of bounce buffer */ > bus_addr_t busaddr; /* Physical address */ > vm_offset_t datavaddr; /* kva of client data */ > bus_addr_t dataaddr; /* client physical address */ > bus_size_t datacount; /* client data count */ > STAILQ_ENTRY(bounce_page) links; > }; > > > if ((op & BUS_DMASYNC_PREWRITE) != 0) { > while (bpage != NULL) { > if (bpage->datavaddr != 0) { > bcopy((void *)bpage->datavaddr, > (void *)bpage->vaddr, > bpage->datacount); > } else { > physcopyout(bpage->dataaddr, > (void *)bpage->vaddr, > bpage->datacount); > } > bpage = STAILQ_NEXT(bpage, links); > } > dmat->bounce_zone->total_bounced++; > } > ----------------------------- > > There are two things: > > (1) datavaddr is not always kva of client data, but sometimes it can > be uva of client data. > (2) bcopy() can be used only if datavaddr is kva or when map->pmap is > current pmap. > > Note that there is an implication for bus_dmamap_load_uio() with > uio->uio_segflg set to UIO_USERSPACE that used physical pages are > in-core and wired. See "man bus_dma". > > There is not public interface to check that map->pmap is current pmap. > So one solution is the following: > > if (bpage->datavaddr >= VM_MIN_KERNEL_ADDRESS) { > bcopy(); > } else { > physcopy(); > } > > If there will be public pmap_is_current() then another solution is the > following: > > if (bpage->datavaddr != 0) && pmap_is_current(map->pmap)) { > bcopy(); > } else { > physcopy(); > } > > The second solution implies that context switch must not happen during > bus_dmamap_sync() called from an interrupt routine. However, IMO, it's > granted. > > Note that map->pmap should be always kernel_pmap for datavaddr >= > VM_MIN_KERNEL_ADDRESS. > > Comments, different solutions, or objections? 
> > Svatopluk Kraus > _______________________________________________ > freebsd-arch@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-arch > To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org" > From owner-freebsd-arch@FreeBSD.ORG Sat Apr 25 09:41:59 2015 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 8F0C253E for ; Sat, 25 Apr 2015 09:41:59 +0000 (UTC) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 193F91BA0 for ; Sat, 25 Apr 2015 09:41:58 +0000 (UTC) Received: from tom.home (kostik@localhost [127.0.0.1]) by kib.kiev.ua (8.14.9/8.14.9) with ESMTP id t3P9frQR000982 (version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO); Sat, 25 Apr 2015 12:41:53 +0300 (EEST) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.9.2 kib.kiev.ua t3P9frQR000982 Received: (from kostik@localhost) by tom.home (8.14.9/8.14.9/Submit) id t3P9fqVn000981; Sat, 25 Apr 2015 12:41:52 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Sat, 25 Apr 2015 12:41:52 +0300 From: Konstantin Belousov To: Jason Harmening Cc: Svatopluk Kraus , FreeBSD Arch Subject: Re: bus_dmamap_sync() for bounced client buffers from user address space Message-ID: <20150425094152.GE2390@kib.kiev.ua> References: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.23 (2014-03-12) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED 
autolearn=no autolearn_force=no version=3.4.0 X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on tom.home X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 25 Apr 2015 09:41:59 -0000 On Fri, Apr 24, 2015 at 05:50:15PM -0500, Jason Harmening wrote: > A couple of comments: > > --POSTWRITE and POSTREAD are only asynchronous if you call them from an > asynchronous context. > For a driver that's already performing DMA operations on usermode memory, > it seems likely that there's going to be *some* place where you can call > bus_dmamap_sync() and be guaranteed to be executing in the context of the > process that owns the memory. Then a regular bcopy will be safe and > inexpensive, assuming the pages have been properly vslock-ed/vm_map_wire-d. > That's usually whatever read/write/ioctl operation spawned the DMA transfer > in the first place. So, in those cases can you not just move the > POSTREAD/POSTWRITE sync from the "DMA-finished" interrupt to the > d_read/d_write/d_ioctl that waits on the "DMA-finished" interrupt? > > --physcopyin/physcopyout aren't trivial. They go through uiomove_fromphys, > which often uses sfbufs to create temporary KVA mappings for the physical > pages. sf_buf_alloc() can sleep (unless SFB_NOWAIT is specified, which > means it can fail and which uiomove_fromphys does not specify for good > reason); that makes it unsafe for use in either a threaded interrupt or a > filter. Perhaps the physcopyout path could be changed to use pmap_qenter > directly in this case, but that can still be expensive in terms of TLB > shootdowns. > > Checking against VM_MIN_KERNEL_ADDRESS seems sketchy; it eliminates the > chance to use a much-less-expensive bcopy in cases where the sync is > happening in correct process context. 
>
> Context-switching during bus_dmamap_sync() shouldn't be an issue. In a
> filter interrupt, curproc will be completely arbitrary but none of this
> stuff should be called in a filter anyway. Otherwise, if you're in a
> kernel thread (including an ithread), curproc should be whatever proc was
> supplied when the thread was created. That's usually proc0, which only has
> kernel address space. IOW, even if a context-switch happens sometime
> during bus_dmamap_sync, any pmap check or copy should have a consistent and
> non-arbitrary process context.
>
> I think something like your second solution would be workable to make
> UIO_USERSPACE syncs work in non-interrupt kernel threads, but given all the
> restrictions and extra cost of physcopy, I'm not sure how useful that would
> be.
>
> I do think busdma.9 could at least use a note that bus_dmamap_sync() is
> only safe to call in the context of the owning process for user buffers.

UIO_USERSPACE for busdma is absolutely unsafe and cannot be used without
risking a kernel panic. Even if you wire the userspace buffer, there is
nothing which could prevent another thread in the user process, or another
process sharing the same address space, from calling munmap(2) on the range.

The only safe method to work with userspace regions is to
vm_fault_quick_hold_pages() them to get a hold on the pages, and then either
pass the pages array down, or remap them into KVA with pmap_qenter().

>
>
> On Fri, Apr 24, 2015 at 8:13 AM, Svatopluk Kraus wrote:
>
> > DMA can be done on client buffer from user address space. For example,
> > thru bus_dmamap_load_uio() when uio->uio_segflg is UIO_USERSPACE. Such
> > client buffer can bounce and then, it must be copied to and from
> > bounce buffer in bus_dmamap_sync().
> >
> > Current implementations (in all archs) do not take into account that
> > bus_dmamap_sync() is asynchronous for POSTWRITE and POSTREAD in
> > general. It can be asynchronous for PREWRITE and PREREAD too.
For
> > example, in some driver implementations where DMA client buffer
> > operations are buffered. In those cases, simple bcopy() is not
> > correct.
> >
> > Demonstration of current implementation (x86) is the following:
> >
> > -----------------------------
> > struct bounce_page {
> >         vm_offset_t vaddr;      /* kva of bounce buffer */
> >         bus_addr_t  busaddr;    /* Physical address */
> >         vm_offset_t datavaddr;  /* kva of client data */
> >         bus_addr_t  dataaddr;   /* client physical address */
> >         bus_size_t  datacount;  /* client data count */
> >         STAILQ_ENTRY(bounce_page) links;
> > };
> >
> >
> > if ((op & BUS_DMASYNC_PREWRITE) != 0) {
> >         while (bpage != NULL) {
> >                 if (bpage->datavaddr != 0) {
> >                         bcopy((void *)bpage->datavaddr,
> >                             (void *)bpage->vaddr,
> >                             bpage->datacount);
> >                 } else {
> >                         physcopyout(bpage->dataaddr,
> >                             (void *)bpage->vaddr,
> >                             bpage->datacount);
> >                 }
> >                 bpage = STAILQ_NEXT(bpage, links);
> >         }
> >         dmat->bounce_zone->total_bounced++;
> > }
> > -----------------------------
> >
> > There are two things:
> >
> > (1) datavaddr is not always kva of client data, but sometimes it can
> >     be uva of client data.
> > (2) bcopy() can be used only if datavaddr is kva or when map->pmap is
> >     current pmap.
> >
> > Note that there is an implication for bus_dmamap_load_uio() with
> > uio->uio_segflg set to UIO_USERSPACE that the physical pages used are
> > in-core and wired. See "man bus_dma".
> >
> > There is no public interface to check that map->pmap is current pmap.
> > So one solution is the following:
> >
> > if (bpage->datavaddr >= VM_MIN_KERNEL_ADDRESS) {
> >         bcopy();
> > } else {
> >         physcopy();
> > }
> >
> > If there were a public pmap_is_current(), then another solution would be
> > the following:
> >
> > if ((bpage->datavaddr != 0) && pmap_is_current(map->pmap)) {
> >         bcopy();
> > } else {
> >         physcopy();
> > }
> >
> > The second solution implies that context switch must not happen during
> > bus_dmamap_sync() called from an interrupt routine. However, IMO, it's
However, IMO, it's > > granted. > > > > Note that map->pmap should be always kernel_pmap for datavaddr >= > > VM_MIN_KERNEL_ADDRESS. > > > > Comments, different solutions, or objections? > > > > Svatopluk Kraus > > _______________________________________________ > > freebsd-arch@freebsd.org mailing list > > http://lists.freebsd.org/mailman/listinfo/freebsd-arch > > To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org" > > > _______________________________________________ > freebsd-arch@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-arch > To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org" From owner-freebsd-arch@FreeBSD.ORG Sat Apr 25 14:01:04 2015 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 7C83CA8D for ; Sat, 25 Apr 2015 14:01:04 +0000 (UTC) Received: from mail-oi0-x232.google.com (mail-oi0-x232.google.com [IPv6:2607:f8b0:4003:c06::232]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 3C7D011EF for ; Sat, 25 Apr 2015 14:01:04 +0000 (UTC) Received: by oiko83 with SMTP id o83so60560485oik.1 for ; Sat, 25 Apr 2015 07:01:03 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=message-id:date:from:user-agent:mime-version:to:cc:subject :references:in-reply-to:content-type; bh=rYOl//E1cbPFP0QYFLbhatfqNckX0MMueJe4B5ZFGVU=; b=KMIw2gM1LUPI072xwd81W9vAL5/xAHTegdU0sAN65lo1CFK85XKqsoCb7WoO/Byw0R AH4WOBCSxZxHx0wBWht2Ayq/Jjdr82c9hlWTgbzGrQkITKJmbwk+cW8uqEvbmfghEQcN KWhaGatrm2S/Clq957NDU9TY01RR9iEX/09jN31aMJNgLhNs7Q/97p0IZvZGqWMIPn1d D0fvUXV4Hv2YEL9KRT6sC9/4JK83Olm9XLG4eU56WTdGmGJIdUMiyW9YPDN6Q0uWfJ6L 
Re0hUN5KelXzu8IOrjXEZyW41VQKzwDBCd68rgzn74lb+ml/Obin8Vxj2k32Dtdw4O59 3mdQ==
X-Received: by 10.60.148.225 with SMTP id tv1mr2893115oeb.14.1429970463468; Sat, 25 Apr 2015 07:01:03 -0700 (PDT)
Received: from corona.austin.rr.com (cpe-72-177-6-10.austin.res.rr.com. [72.177.6.10]) by mx.google.com with ESMTPSA id z133sm8230852oif.14.2015.04.25.07.01.01 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Sat, 25 Apr 2015 07:01:02 -0700 (PDT)
Message-ID: <553B9E64.8030907@gmail.com>
Date: Sat, 25 Apr 2015 09:02:12 -0500
From: Jason Harmening
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:31.0) Gecko/20100101 Thunderbird/31.3.0
MIME-Version: 1.0
To: Konstantin Belousov
CC: Svatopluk Kraus , FreeBSD Arch
Subject: Re: bus_dmamap_sync() for bounced client buffers from user address space
References: <20150425094152.GE2390@kib.kiev.ua>
In-Reply-To: <20150425094152.GE2390@kib.kiev.ua>
Content-Type: multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="eiGxCA8j3t9W15b4RvfXexTIWMNHp0Eiw"
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.20
Precedence: list
List-Id: Discussion related to FreeBSD architecture
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Sat, 25 Apr 2015 14:01:04 -0000

This is an OpenPGP/MIME signed message (RFC 4880 and 3156)
--eiGxCA8j3t9W15b4RvfXexTIWMNHp0Eiw
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: quoted-printable

On 04/25/15 04:41, Konstantin Belousov wrote:
> On Fri, Apr 24, 2015 at 05:50:15PM -0500, Jason Harmening wrote:
>> A couple of comments:
>>
>> --POSTWRITE and POSTREAD are only asynchronous if you call them from an
>> asynchronous context.
>> For a driver that's already performing DMA operations on usermode memory,
>> it seems likely that there's going to be *some* place where you can call
>> bus_dmamap_sync() and be guaranteed to be executing in the context of the
>> process that owns the memory.
Then a regular bcopy will be safe and
>> inexpensive, assuming the pages have been properly vslock-ed/vm_map_wire-d.
>> That's usually whatever read/write/ioctl operation spawned the DMA transfer
>> in the first place. So, in those cases can you not just move the
>> POSTREAD/POSTWRITE sync from the "DMA-finished" interrupt to the
>> d_read/d_write/d_ioctl that waits on the "DMA-finished" interrupt?
>>
>> --physcopyin/physcopyout aren't trivial. They go through uiomove_fromphys,
>> which often uses sfbufs to create temporary KVA mappings for the physical
>> pages. sf_buf_alloc() can sleep (unless SFB_NOWAIT is specified, which
>> means it can fail and which uiomove_fromphys does not specify for good
>> reason); that makes it unsafe for use in either a threaded interrupt or a
>> filter. Perhaps the physcopyout path could be changed to use pmap_qenter
>> directly in this case, but that can still be expensive in terms of TLB
>> shootdowns.
>>
>> Checking against VM_MIN_KERNEL_ADDRESS seems sketchy; it eliminates the
>> chance to use a much-less-expensive bcopy in cases where the sync is
>> happening in correct process context.
>>
>> Context-switching during bus_dmamap_sync() shouldn't be an issue. In a
>> filter interrupt, curproc will be completely arbitrary but none of this
>> stuff should be called in a filter anyway. Otherwise, if you're in a
>> kernel thread (including an ithread), curproc should be whatever proc was
>> supplied when the thread was created. That's usually proc0, which only has
>> kernel address space. IOW, even if a context-switch happens sometime
>> during bus_dmamap_sync, any pmap check or copy should have a consistent and
>> non-arbitrary process context.
>>
>> I think something like your second solution would be workable to make
>> UIO_USERSPACE syncs work in non-interrupt kernel threads, but given all the
>> restrictions and extra cost of physcopy, I'm not sure how useful that would
>> be.
>>
>> I do think busdma.9 could at least use a note that bus_dmamap_sync() is
>> only safe to call in the context of the owning process for user buffers.
> UIO_USERSPACE for busdma is absolutely unsafe and cannot be used without
> risking a kernel panic. Even if you wire the userspace buffer, there is
> nothing which could prevent another thread in the user process, or another
> process sharing the same address space, from calling munmap(2) on the range.
>
> The only safe method to work with userspace regions is to
> vm_fault_quick_hold_pages() them to get a hold on the pages, and then either
> pass the page array down, or remap them into KVA with pmap_qenter().

I was under the impression that any attempt to free or munmap the
virtual range would block waiting for the underlying pages to be unwired.
That's certainly the behavior I've seen with free; is it not the case
with munmap? Either way, I haven't walked the code to see where the
blocking is implemented.

If UIO_USERSPACE truly is unsafe to use with busdma, then we need to
either make it safe or stop documenting it in the manpage.
Perhaps the bounce buffer logic could use copyin/copyout for userspace,
if in the right process? Or just always use physcopy for non-KVA as in
the first suggestion?

It seems like in general it is too hard for drivers using busdma to deal
with usermode memory in a way that's both safe and efficient:
--bus_dmamap_load_uio + UIO_USERSPACE is apparently really unsafe
--if they do things the other way and allocate in the kernel, then
they had better either be willing to do extra copying, or create and
refcount their own vm_objects and use d_mmap_single (I still haven't
seen a good example of that), or leak a bunch of memory (if they use
d_mmap), because the old device pager is also really unsafe.

>
>>
>> On Fri, Apr 24, 2015 at 8:13 AM, Svatopluk Kraus wrote:
>>
>>> DMA can be done on client buffer from user address space.
For example,
>>> thru bus_dmamap_load_uio() when uio->uio_segflg is UIO_USERSPACE. Such
>>> client buffer can bounce and then, it must be copied to and from
>>> bounce buffer in bus_dmamap_sync().
>>>
>>> Current implementations (in all archs) do not take into account that
>>> bus_dmamap_sync() is asynchronous for POSTWRITE and POSTREAD in
>>> general. It can be asynchronous for PREWRITE and PREREAD too. For
>>> example, in some driver implementations where DMA client buffers
>>> operations are buffered. In those cases, simple bcopy() is not
>>> correct.
>>>
>>> Demonstration of current implementation (x86) is the following:
>>>
>>> -----------------------------
>>> struct bounce_page {
>>>         vm_offset_t vaddr;      /* kva of bounce buffer */
>>>         bus_addr_t  busaddr;    /* Physical address */
>>>         vm_offset_t datavaddr;  /* kva of client data */
>>>         bus_addr_t  dataaddr;   /* client physical address */
>>>         bus_size_t  datacount;  /* client data count */
>>>         STAILQ_ENTRY(bounce_page) links;
>>> };
>>>
>>>
>>> if ((op & BUS_DMASYNC_PREWRITE) != 0) {
>>>         while (bpage != NULL) {
>>>                 if (bpage->datavaddr != 0) {
>>>                         bcopy((void *)bpage->datavaddr,
>>>                             (void *)bpage->vaddr,
>>>                             bpage->datacount);
>>>                 } else {
>>>                         physcopyout(bpage->dataaddr,
>>>                             (void *)bpage->vaddr,
>>>                             bpage->datacount);
>>>                 }
>>>                 bpage = STAILQ_NEXT(bpage, links);
>>>         }
>>>         dmat->bounce_zone->total_bounced++;
>>> }
>>> -----------------------------
>>>
>>> There are two things:
>>>
>>> (1) datavaddr is not always kva of client data, but sometimes it can
>>>     be uva of client data.
>>> (2) bcopy() can be used only if datavaddr is kva or when map->pmap is
>>>     current pmap.
>>>
>>> Note that there is an implication for bus_dmamap_load_uio() with
>>> uio->uio_segflg set to UIO_USERSPACE that used physical pages are
>>> in-core and wired. See "man bus_dma".
>>>
>>> There is no public interface to check that map->pmap is current pmap.
>>> So one solution is the following:
>>>
>>> if (bpage->datavaddr >= VM_MIN_KERNEL_ADDRESS) {
>>>         bcopy();
>>> } else {
>>>         physcopy();
>>> }
>>>
>>> If there were a public pmap_is_current(), then another solution would be
>>> the following:
>>>
>>> if ((bpage->datavaddr != 0) && pmap_is_current(map->pmap)) {
>>>         bcopy();
>>> } else {
>>>         physcopy();
>>> }
>>>
>>> The second solution implies that context switch must not happen during
>>> bus_dmamap_sync() called from an interrupt routine. However, IMO, it's
>>> granted.
>>>
>>> Note that map->pmap should be always kernel_pmap for datavaddr >=
>>> VM_MIN_KERNEL_ADDRESS.
>>>
>>> Comments, different solutions, or objections?
>>>
>>> Svatopluk Kraus
>>> _______________________________________________
>>> freebsd-arch@freebsd.org mailing list
>>> http://lists.freebsd.org/mailman/listinfo/freebsd-arch
>>> To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org"
>>>
>> _______________________________________________
>> freebsd-arch@freebsd.org mailing list
>> http://lists.freebsd.org/mailman/listinfo/freebsd-arch
>> To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org"

--eiGxCA8j3t9W15b4RvfXexTIWMNHp0Eiw
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: OpenPGP digital signature
Content-Disposition: attachment; filename="signature.asc"

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2

iQF8BAEBCgBmBQJVO55kXxSAAAAAAC4AKGlzc3Vlci1mcHJAbm90YXRpb25zLm9w
ZW5wZ3AuZmlmdGhob3JzZW1hbi5uZXQwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAw
MDAwMDAwMDAwMDAwMDAwAAoJELufi/mShB0bH3AH/iynIZttCSfpAjLURApDguQB
rss1/oNNIfDpigv4BC2AYLs144GyIfn8k8mAMzpiNQK1p+6cI3e5YC5Luo4KEyeB
aenxiu6TuvEEPRVVsJaDjAt6J0ZTM4KQpiwokE4hHlib/hT2YQ9gQKVjYMPKHDQI
D0jc0lEQbbDBZmLRMeFzk9gVy8/YmJTt6dGxkS0mttZrjuni05lELdPgIb250KZW
rvs1irjEcztCbMUj6pGoaenhD/8arlSehxjKOmEP3HhYx7L6z/S+vOz5/rPXxYd/
6/qpJwmuX14as0qUabcOHBl0Yd1nww0iUSwHlQhD2/0cTLN5NrbhIbIKkQIBbq0= =4s0y -----END PGP SIGNATURE----- --eiGxCA8j3t9W15b4RvfXexTIWMNHp0Eiw-- From owner-freebsd-arch@FreeBSD.ORG Sat Apr 25 16:31:32 2015 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 54790EBF for ; Sat, 25 Apr 2015 16:31:32 +0000 (UTC) Received: from mail-ig0-x230.google.com (mail-ig0-x230.google.com [IPv6:2607:f8b0:4001:c05::230]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 21BE71080 for ; Sat, 25 Apr 2015 16:31:32 +0000 (UTC) Received: by igbpi8 with SMTP id pi8so45466962igb.0 for ; Sat, 25 Apr 2015 09:31:31 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:date:message-id:subject:from:to:content-type; bh=tIa+mu/cO/iNmtzmsjft++fAa/aSgJUkEFjwJR6LJKU=; b=n9STZDhqTfUE6KleAF3oPHiQOpUVhXfdiS30V4Z19iXuDJnnwZlUjlKx8n+dCuWc6w lNtrJjY/IEQqr8588dNxfqlyQW5xwX9pZipolvbvIjvHRdANINvq8Z/azIGxBoYblffU kwnH5FkRoxOb71zULxZBWtnAO8AdXzl8Yb8RcR2vwkekUR/ZmatE/j+KDT+kq+Sz9d2l ph/lzLnB1DgHbquiC5sKfB/NioS3Fpuh5x926x0GHK26Saupbo8w5qkjJgFnouUDVS3t PbHJPVcZPtDsIgeT9NG0SpoyMQNMWKYvpz2VsrtGnGJSP2WDvf9IJFkLuQ2ryWTm7RjN hYXA== MIME-Version: 1.0 X-Received: by 10.107.155.13 with SMTP id d13mr4564178ioe.29.1429979491562; Sat, 25 Apr 2015 09:31:31 -0700 (PDT) Sender: adrian.chadd@gmail.com Received: by 10.36.38.133 with HTTP; Sat, 25 Apr 2015 09:31:31 -0700 (PDT) Date: Sat, 25 Apr 2015 09:31:31 -0700 X-Google-Sender-Auth: lVzMIvQwk5kpojJj4C_FbI7La4g Message-ID: Subject: RFC: setting performance_cx_lowest=C2 in -HEAD to avoid lock contention on many-CPU boxes From: Adrian Chadd To: "freebsd-arch@freebsd.org" Content-Type: 
text/plain; charset=UTF-8
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.20
Precedence: list
List-Id: Discussion related to FreeBSD architecture
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Sat, 25 Apr 2015 16:31:32 -0000

Hi!

I've been doing some NUMA testing on large boxes and I've found that
there's lock contention in the ACPI path. It's due to my change a
while ago to start using sleep states above ACPI C1 by default. The
ACPI C3 state involves a bunch of register fiddling in the ACPI sleep
path that grabs a serialiser lock, and on an 80 thread box this is
costly.

I'd like to drop performance_cx_lowest to C2 in -HEAD. ACPI C2 state
doesn't require the same register fiddling (to disable bus mastering,
if I'm reading it right) and so it doesn't enter that particular
serialised path. I've verified on Westmere-EX, Sandybridge, Ivybridge
and Haswell boxes that ACPI C2 does let one drop down into a deeper
CPU sleep state (C6 on each of these). I think this is still a good
default for both servers and desktops.

If no-one has a problem with this then I'll do it after the weekend.

Thanks!
-adrian From owner-freebsd-arch@FreeBSD.ORG Sat Apr 25 16:34:50 2015 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 0AC6310C for ; Sat, 25 Apr 2015 16:34:50 +0000 (UTC) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 72EDE114E for ; Sat, 25 Apr 2015 16:34:49 +0000 (UTC) Received: from tom.home (kostik@localhost [127.0.0.1]) by kib.kiev.ua (8.14.9/8.14.9) with ESMTP id t3PGYipw005993 (version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO); Sat, 25 Apr 2015 19:34:44 +0300 (EEST) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.9.2 kib.kiev.ua t3PGYipw005993 Received: (from kostik@localhost) by tom.home (8.14.9/8.14.9/Submit) id t3PGYiLE005992; Sat, 25 Apr 2015 19:34:44 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Sat, 25 Apr 2015 19:34:44 +0300 From: Konstantin Belousov To: Jason Harmening Cc: Svatopluk Kraus , FreeBSD Arch Subject: Re: bus_dmamap_sync() for bounced client buffers from user address space Message-ID: <20150425163444.GL2390@kib.kiev.ua> References: <20150425094152.GE2390@kib.kiev.ua> <553B9E64.8030907@gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <553B9E64.8030907@gmail.com> User-Agent: Mutt/1.5.23 (2014-03-12) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.0 X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on tom.home X-BeenThere: freebsd-arch@freebsd.org 
X-Mailman-Version: 2.1.20
Precedence: list
List-Id: Discussion related to FreeBSD architecture
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Sat, 25 Apr 2015 16:34:50 -0000

On Sat, Apr 25, 2015 at 09:02:12AM -0500, Jason Harmening wrote:
> It seems like in general it is too hard for drivers using busdma to deal
> with usermode memory in a way that's both safe and efficient:
> --bus_dmamap_load_uio + UIO_USERSPACE is apparently really unsafe
> --if they do things the other way and allocate in the kernel, then
> they better either be willing to do extra copying, or create and
> refcount their own vm_objects and use d_mmap_single (I still haven't
> seen a good example of that), or leak a bunch of memory (if they use
> d_mmap), because the old device pager is also really unsafe.

munmap(2) does not free the pages; it removes the mapping and dereferences
the backing vm object. If the region was wired, munmap would decrement
the wiring count for the pages. So if kernel code wired the region's
pages, they are kept wired, but are no longer mapped into userspace.
So bcopy() still does not work.

d_mmap_single() is used by GPU drivers, definitely by the GEM and TTM code,
and possibly by the proprietary nvidia driver.

I believe UIO_USERSPACE is almost unused; it might be there for some
obscure (and buggy) driver.
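
A user-space analogy of the munmap(2) point above, as a minimal sketch rather than kernel code. Here mlock(2) stands in for wiring, and msync(2) is used only as a probe, since it is specified to fail with ENOMEM on a range that is not mapped. The helper name unmap_wired_region() is hypothetical, and the sketch says nothing about kernel-side wirings such as vslock; it only shows that the user mapping disappears whether or not the range was locked.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1	/* expose MAP_ANON on glibc in strict modes */
#endif

#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: map one anonymous page, try to wire it with
 * mlock(2), unmap it, then probe the old range with msync(2).
 * Returns 0 if the range is no longer mapped afterwards, -1 otherwise.
 */
static int
unmap_wired_region(void)
{
	size_t len = (size_t)sysconf(_SC_PAGESIZE);
	char *p;

	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
	    MAP_PRIVATE | MAP_ANON, -1, 0);
	if (p == MAP_FAILED)
		return (-1);
	p[0] = 1;		/* fault the page in */

	/* Locking may fail under RLIMIT_MEMLOCK; the outcome is the same. */
	(void)mlock(p, len);

	/* Locked or not, the unmap request is honoured. */
	if (munmap(p, len) != 0)
		return (-1);

	/* The probe must now fail with ENOMEM: nothing is mapped there. */
	if (msync(p, len, MS_ASYNC) == 0 || errno != ENOMEM)
		return (-1);
	return (0);
}
```

Calling unmap_wired_region() from a single-threaded test program is enough to see the effect; any later access through the old pointer would fault.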
From owner-freebsd-arch@FreeBSD.ORG Sat Apr 25 17:06:15 2015 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id E45188F1 for ; Sat, 25 Apr 2015 17:06:15 +0000 (UTC) Received: from mail-oi0-x22a.google.com (mail-oi0-x22a.google.com [IPv6:2607:f8b0:4003:c06::22a]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id A24F7147F for ; Sat, 25 Apr 2015 17:06:15 +0000 (UTC) Received: by oign205 with SMTP id n205so62281210oig.2 for ; Sat, 25 Apr 2015 10:06:15 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=message-id:date:from:user-agent:mime-version:to:cc:subject :references:in-reply-to:content-type; bh=cRCJ9EJdcbnDyFfuMWj4qrJE7bt6pI8VpSchY501Ncw=; b=LwUq8XXB2OeEFY5wu+UT848MFeDeXaR3FX9oNgBsTzpiaOKFX1fsOxTgxLrBmBxDUq Xe1Zle7uCQIycLsPgrSBMuEqfs+v0SPzTf2U7+uVV9p9tQUZiEUVT+ZkdJVGjx8ha1Bc kC4iSfxey/rOaaeRtLhvqmUqdmouJ0we7a0W088NpEOmbipgHNiNAlEX+HKYlZjNSi8y dIwX5BTPQwa9u21YuAtSUmNrZsCjmFDjtbeRRIztCON88dulZNBXivd9F8M9fdGhUHX1 Jn+eSXkYmf2aF+9dtgZp/MZqjBUo5HadTnTOlF1nHeAHGOfJxeqv8Vp78qREOAuGQy+4 +2mQ== X-Received: by 10.60.161.242 with SMTP id xv18mr3393087oeb.51.1429981574994; Sat, 25 Apr 2015 10:06:14 -0700 (PDT) Received: from corona.austin.rr.com (cpe-72-177-6-10.austin.res.rr.com. 
[72.177.6.10]) by mx.google.com with ESMTPSA id a7sm8330853oez.17.2015.04.25.10.06.13 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Sat, 25 Apr 2015 10:06:14 -0700 (PDT)
Message-ID: <553BC9D1.1070502@gmail.com>
Date: Sat, 25 Apr 2015 12:07:29 -0500
From: Jason Harmening
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:31.0) Gecko/20100101 Thunderbird/31.3.0
MIME-Version: 1.0
To: Konstantin Belousov
CC: Svatopluk Kraus , FreeBSD Arch
Subject: Re: bus_dmamap_sync() for bounced client buffers from user address space
References: <20150425094152.GE2390@kib.kiev.ua> <553B9E64.8030907@gmail.com> <20150425163444.GL2390@kib.kiev.ua>
In-Reply-To: <20150425163444.GL2390@kib.kiev.ua>
Content-Type: multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="o69MO6Bb4S2i7LQq19KNltmn3TMAc9FfI"
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.20
Precedence: list
List-Id: Discussion related to FreeBSD architecture
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Sat, 25 Apr 2015 17:06:16 -0000

This is an OpenPGP/MIME signed message (RFC 4880 and 3156)
--o69MO6Bb4S2i7LQq19KNltmn3TMAc9FfI
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: quoted-printable

On 04/25/15 11:34, Konstantin Belousov wrote:
> On Sat, Apr 25, 2015 at 09:02:12AM -0500, Jason Harmening wrote:
>> It seems like in general it is too hard for drivers using busdma to deal
>> with usermode memory in a way that's both safe and efficient:
>> --bus_dmamap_load_uio + UIO_USERSPACE is apparently really unsafe
>> --if they do things the other way and allocate in the kernel, then then
>> they better either be willing to do extra copying, or create and
>> refcount their own vm_objects and use d_mmap_single (I still haven't
>> seen a good example of that), or leak a bunch of memory (if they use
>> d_mmap), because the old device pager is also really unsafe.
> munmap(2) does not free the pages, it removes the mapping and dereferences
> the backing vm object. If the region was wired, munmap would decrement
> the wiring count for the pages. So if a kernel code wired the region's
> pages, they are kept wired, but no longer mapped into the userspace.
> So bcopy() still does not work.

Ok, my question wasn't whether munmap frees the pages, but whether it
accounts for wire count when attempting to remove the mapping. It
doesn't, so yes bcopy will be unsafe.

>
> d_mmap_single() is used by GPU, definitely by GEM and TTM code, and possibly
> by the proprietary nvidia driver.
>
> I believe UIO_USERSPACE is almost unused, it might be there for some
> obscure (and buggy) driver.

It may be nearly unused, but we still document it in busdma.9, and we
still explicitly check for it when setting the pmap in
_bus_dmamap_load_uio. If it's not safe to use, then it's not OK for us
to do that.
We need to either a) remove support for it by adding a failure/KASSERT
on UIO_USERSPACE in _bus_dmamap_load_uio() and remove the paragraph on it
from busdma.9, or b) make it safe.

I'd be in favor of b), because I think it is still valid to support some
non-painful way of using DMA with userspace buffers. Right now, the
only safe way to do that seems to be:
1) vm_fault_quick_hold_pages
2) kva_alloc
3) pmap_qenter
4) bus_dmamap_load

That seems both unnecessarily complex and wasteful of KVA space.
--o69MO6Bb4S2i7LQq19KNltmn3TMAc9FfI Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQF8BAEBCgBmBQJVO8nRXxSAAAAAAC4AKGlzc3Vlci1mcHJAbm90YXRpb25zLm9w ZW5wZ3AuZmlmdGhob3JzZW1hbi5uZXQwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAw MDAwMDAwMDAwMDAwMDAwAAoJELufi/mShB0bLo8IAIcGMrX09DjmKnhFf0k4ujDE iPoxNlN20mT9Ai1GlmSG4roVTHgBR4F2JSESZO/Mu14bDzmfCpzM0qbnojT1Ig2d CK6/WCmE8RzkjdOHJvWKjZrX7bBdbscizXlL+NW4enpaCkMAHxcSeQDEZrdQmJ5q Dce2fzwxDNdKRpr8rGt0t52+ikuaiYTKb+nxOOH5xUus5VlLEky5aBSD/Zbm2YQx yiZd4URYYmeY2f6Gcku7OExeRqLnNkmGIgYHxoHX1Qdw4PJy60LAQztzG792wnEa YLYTzEXN10ro2mlDF+179hceGqYKGBrPBSva2qLX5s1mCCHSx//HuKzkQdMs7i8= =KqVS -----END PGP SIGNATURE----- --o69MO6Bb4S2i7LQq19KNltmn3TMAc9FfI-- From owner-freebsd-arch@FreeBSD.ORG Sat Apr 25 17:18:52 2015 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 087FBB5D; Sat, 25 Apr 2015 17:18:52 +0000 (UTC) Received: from mail-ob0-x234.google.com (mail-ob0-x234.google.com [IPv6:2607:f8b0:4003:c01::234]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id C420415BC; Sat, 25 Apr 2015 17:18:51 +0000 (UTC) Received: by oblw8 with SMTP id w8so58303781obl.0; Sat, 25 Apr 2015 10:18:51 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:date:message-id:subject :from:to:cc:content-type; bh=CTYhs4OypYpVi3j7BMK9kWple/ceHM1m2PNQPZg8kYk=; b=FV9ddbafZRJEZVykE+HP6mZQkFtr6UwNKt5RJ7cqoz3/bu7I/JwnGlIXdF7POzGru1 Zu3wCRmdzVaTnAvrCIgfpqHOj+lBTpKxgXG2SuT0YGe5GRg2G7Wz28B1QVo2PwjkNYMR 
WPaqttoxYwA7rxLRyf7FUsheiR9KWg+0QfvO1camilvIj9/c9AxHdwvmiYJ832DDqyr+ Az022rzZqILsdLOHd2ETm2dwwyjfFH+UdqaoQpA6wNm8js1by6jTLH75BqXmOdB56xaO 6/iFA1yo4FDnpJU14Ez3vSVgOc9jV+Ip7F/BEFSG4Jc4L3en0sWIA8oPqMPNusZMY/D1 j0MQ== MIME-Version: 1.0 X-Received: by 10.202.186.214 with SMTP id k205mr3329740oif.10.1429982330879; Sat, 25 Apr 2015 10:18:50 -0700 (PDT) Sender: kmacybsd@gmail.com Received: by 10.202.11.82 with HTTP; Sat, 25 Apr 2015 10:18:50 -0700 (PDT) In-Reply-To: References: Date: Sat, 25 Apr 2015 10:18:50 -0700 X-Google-Sender-Auth: AFCAk4WZK3ZObL_e4cORbLR6tFs Message-ID: Subject: Re: RFC: setting performance_cx_lowest=C2 in -HEAD to avoid lock contention on many-CPU boxes From: "K. Macy" To: Adrian Chadd Cc: "freebsd-arch@freebsd.org" Content-Type: text/plain; charset=UTF-8 X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 25 Apr 2015 17:18:52 -0000 Perhaps use an arbitrary cutoff - say <= 8 cores - where the cx_lowest=C3. This serialization isn't going to hurt on systems with more modest core counts. On Sat, Apr 25, 2015 at 9:31 AM, Adrian Chadd wrote: > Hi! > > I've been doing some NUMA testing on large boxes and I've found that > there's lock contention in the ACPI path. It's due to my change a > while ago to start using sleep states above ACPI C1 by default. The > ACPI C3 state involves a bunch of register fiddling in the ACPI sleep > path that grabs a serialiser lock, and on an 80 thread box this is > costly. > > I'd like to drop performance_cx_lowest to C2 in -HEAD. ACPI C2 state > doesn't require the same register fiddling (to disable bus mastering, > if I'm reading it right) and so it doesn't enter that particular > serialised path. 
I've verified on Westmere-EX, Sandybridge, Ivybridge > and Haswell boxes that ACPI C2 does let one drop down into a deeper > CPU sleep state (C6 on each of these). I think is still a good default > for both servers and desktops. > > If no-one has a problem with this then I'll do it after the weekend. > > Thanks! > > > > -adrian > _______________________________________________ > freebsd-arch@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-arch > To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org" From owner-freebsd-arch@FreeBSD.ORG Sat Apr 25 17:28:40 2015 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 8D256C5B for ; Sat, 25 Apr 2015 17:28:40 +0000 (UTC) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id F34A91691 for ; Sat, 25 Apr 2015 17:28:39 +0000 (UTC) Received: from tom.home (kostik@localhost [127.0.0.1]) by kib.kiev.ua (8.14.9/8.14.9) with ESMTP id t3PHSYbL018846 (version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO); Sat, 25 Apr 2015 20:28:34 +0300 (EEST) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.9.2 kib.kiev.ua t3PHSYbL018846 Received: (from kostik@localhost) by tom.home (8.14.9/8.14.9/Submit) id t3PHSXJn018845; Sat, 25 Apr 2015 20:28:33 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Sat, 25 Apr 2015 20:28:33 +0300 From: Konstantin Belousov To: Jason Harmening Cc: Svatopluk Kraus , FreeBSD Arch Subject: Re: bus_dmamap_sync() for bounced client buffers from user address space Message-ID: 
<20150425172833.GM2390@kib.kiev.ua>
References: <20150425094152.GE2390@kib.kiev.ua> <553B9E64.8030907@gmail.com> <20150425163444.GL2390@kib.kiev.ua> <553BC9D1.1070502@gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <553BC9D1.1070502@gmail.com>
User-Agent: Mutt/1.5.23 (2014-03-12)
X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.0
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on tom.home
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.20
Precedence: list
List-Id: Discussion related to FreeBSD architecture
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Sat, 25 Apr 2015 17:28:40 -0000

On Sat, Apr 25, 2015 at 12:07:29PM -0500, Jason Harmening wrote:
>
> On 04/25/15 11:34, Konstantin Belousov wrote:
> > I believe UIO_USERSPACE is almost unused, it might be there for some
> > obscure (and buggy) driver.
> It may be nearly unused, but we still document it in busdma.9, and we
> still explicitly check for it when setting the pmap in
> _bus_dmamap_load_uio. If it's not safe to use, then it's not OK for us
> to do that.
> We need to either a) remove support for it by adding a failure/KASSERT
> on UIO_USERSPACE in _bus_dmamap_load_uio() and remove the paragraph on it
> from busdma.9, or b) make it safe.
>
> I'd be in favor of b), because I think it is still valid to support some
> non-painful way of using DMA with userspace buffers. Right now, the
> only safe way to do that seems to be:
> 1) vm_fault_quick_hold_pages
> 2) kva_alloc
> 3) pmap_qenter
> 4) bus_dmamap_load

1. vm_fault_quick_hold_pages
2. bus_dmamap_load_ma

>
> That seems both unnecessarily complex and wasteful of KVA space.
>

The above sequence does not need a KVA allocation.
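
To make the hold-then-pass-the-page-array idea concrete, here is a toy user-space model, not kernel code and not the busdma implementation. Every name in it (toy_page, toy_hold, toy_munmap, toy_dma_fill, toy_scenario) is hypothetical. It models just one simplified assumption: a hold pins the page itself, independent of whether a user mapping still exists, so a consumer that works from the page array is unaffected by a concurrent unmap and needs no virtual address for the data.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a physical page; not the kernel's vm_page. */
struct toy_page {
	int		hold_count;	/* holds taken by in-kernel users */
	bool		mapped;		/* still mapped in the user address space */
	bool		freed;		/* handed back to the allocator */
	unsigned char	data[16];
};

/* A page can go away only when nothing maps it and nothing holds it. */
static void
toy_try_free(struct toy_page *pg)
{
	if (!pg->mapped && pg->hold_count == 0)
		pg->freed = true;
}

static void
toy_hold(struct toy_page *pg)
{
	pg->hold_count++;
}

static void
toy_unhold(struct toy_page *pg)
{
	pg->hold_count--;
	toy_try_free(pg);
}

/* Models another thread unmapping the range: only the mapping goes away. */
static void
toy_munmap(struct toy_page *pg)
{
	pg->mapped = false;
	toy_try_free(pg);
}

/* "DMA" consumer that works from the page array, never from a user VA. */
static size_t
toy_dma_fill(struct toy_page **ma, size_t npages, unsigned char v)
{
	size_t i;

	for (i = 0; i < npages; i++) {
		if (ma[i]->freed)
			break;		/* would be a use-after-free */
		memset(ma[i]->data, v, sizeof(ma[i]->data));
	}
	return (i);
}

/* Returns 0 if the held page survives the unmap and is freed only later. */
static int
toy_scenario(void)
{
	struct toy_page pg = { .hold_count = 0, .mapped = true, .freed = false };
	struct toy_page *ma[1] = { &pg };

	toy_hold(&pg);			/* step 1: hold the pages */
	toy_munmap(&pg);		/* user unmaps behind our back */
	if (pg.freed)
		return (-1);
	if (toy_dma_fill(ma, 1, 0xab) != 1)	/* step 2: use the page array */
		return (-1);
	toy_unhold(&pg);		/* transfer done, drop the hold */
	return (pg.freed ? 0 : -1);
}
```

In this model the only thing the consumer ever dereferences is the page array, which is why no kva_alloc/pmap_qenter step appears.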
From owner-freebsd-arch@FreeBSD.ORG Sat Apr 25 17:37:41 2015 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id B2BF8E30; Sat, 25 Apr 2015 17:37:41 +0000 (UTC) Received: from mail-ig0-x230.google.com (mail-ig0-x230.google.com [IPv6:2607:f8b0:4001:c05::230]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 7CE371766; Sat, 25 Apr 2015 17:37:41 +0000 (UTC) Received: by igbpi8 with SMTP id pi8so46004550igb.0; Sat, 25 Apr 2015 10:37:41 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:date:message-id:subject :from:to:cc:content-type; bh=Phb/NQoXTkr9L15luDeyDMGmHX6GVt6+Cca+aMGtFik=; b=BW1PBGeSmAO0UxLJtjFQkWYWilcq58ZJ1xgNqJ60O7nAMh0T55Ylt+sfIdhCRYuH2d kVbBmFrk9tv6I0mAzQZ+Dpi02rtdzTz/NdH222DpeVLcJMML0g+Dgc7atyq87s0/M6lO Bi47DERk7vYyxxxJE03dL+Mcx/JbHw5O/rVyeDywO7XepC6SluFmPF6zjTYDVcU25H6m RF9cN1dZ19jIhuBD/cI0Kb4enrKMi4i5Z3oRb2qRCp9AcuTOGeYTa5qGnKYxXtnIfBbn oUghd37pOwg3lAbCOPtZl0CmWlIVQ3DY+ceAe2oXL/j/heWbS83fP0Us5UT8W2ypGQqi lzzg== MIME-Version: 1.0 X-Received: by 10.50.141.198 with SMTP id rq6mr370239igb.6.1429983460954; Sat, 25 Apr 2015 10:37:40 -0700 (PDT) Sender: adrian.chadd@gmail.com Received: by 10.36.38.133 with HTTP; Sat, 25 Apr 2015 10:37:40 -0700 (PDT) In-Reply-To: References: Date: Sat, 25 Apr 2015 10:37:40 -0700 X-Google-Sender-Auth: DL0iYIbYI4Qj7SHIPjRU1Ltqndo Message-ID: Subject: Re: RFC: setting performance_cx_lowest=C2 in -HEAD to avoid lock contention on many-CPU boxes From: Adrian Chadd To: "K. 
Macy" Cc: "freebsd-arch@freebsd.org" Content-Type: text/plain; charset=UTF-8 X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 25 Apr 2015 17:37:41 -0000 On 25 April 2015 at 10:18, K. Macy wrote: > Perhaps use an arbitrary cutoff - say <= 8 cores - where the > cx_lowest=C3. This serialization isn't going to hurt on systems with > more modest core counts. Maybe. I bet it's a function of the idle state entry rate and core count - so maybe at 8 cores it'll hurt but only if it's entering idle at a high rate. Eg, if it's taking a hell of a lot of interrupts but not maxing out the CPU. -adrian From owner-freebsd-arch@FreeBSD.ORG Sat Apr 25 17:44:17 2015 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 97889F11; Sat, 25 Apr 2015 17:44:17 +0000 (UTC) Received: from mail-ig0-x230.google.com (mail-ig0-x230.google.com [IPv6:2607:f8b0:4001:c05::230]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 619AA1849; Sat, 25 Apr 2015 17:44:17 +0000 (UTC) Received: by igblo3 with SMTP id lo3so36093201igb.0; Sat, 25 Apr 2015 10:44:16 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:date:message-id:subject :from:to:cc:content-type; bh=xFkK/f3mDTM1zymQ6tfahXg54M8WwkSTK4vCVHK2eCA=; b=d/ujyJKGrxU70ZiaBQTIxjdY8CnefVEQKthwZD3u3a1ookR6IgJrbyCZ9VQjZfJRkS 1aeSD+MsfzA+VSQ6VanRoMvh82dwMk2CuasecjNXqXZYxR381obCM4F/WvSpGqCWWMEj 
Br+c0PKOFlWPU+8oKEYKy0WOiKXEP5fLESGi3mHGFQqCk7xrIdpp9UBjI2HYsuUJYeYS LiiBLn/BxzAXCZSAytWpTXuapJOWct8GtmkWIPK5cQFDryaloA3isJuaMSidHpYRRGar W3KpA01WklF9euXYwhbvEOU6l+uFQnCCUshA9YoHaiPu9QI1VWjSM1flxpE9wArmH5iX OC0A== MIME-Version: 1.0 X-Received: by 10.107.155.13 with SMTP id d13mr4820287ioe.29.1429983856820; Sat, 25 Apr 2015 10:44:16 -0700 (PDT) Sender: adrian.chadd@gmail.com Received: by 10.36.38.133 with HTTP; Sat, 25 Apr 2015 10:44:16 -0700 (PDT) In-Reply-To: References: Date: Sat, 25 Apr 2015 10:44:16 -0700 X-Google-Sender-Auth: xit-XnkEPFT_yEq0wAFn6x8LPYc Message-ID: Subject: Re: RFC: setting performance_cx_lowest=C2 in -HEAD to avoid lock contention on many-CPU boxes From: Adrian Chadd To: "K. Macy" Cc: "freebsd-arch@freebsd.org" Content-Type: text/plain; charset=UTF-8 X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 25 Apr 2015 17:44:17 -0000

Oh, the other thing, which I just mentioned to kip in IRC - all of the
Intel laptops I've tested (and that's a long list) don't enter CPU C7
if the power is plugged in.

I.e.:

* power in, ACPI C2 -> CPU C6
* power in, ACPI C3 -> CPU C6
* battery, ACPI C2 -> CPU C6
* battery, ACPI C3 -> CPU C7

So having performance_cx_lowest=C3 is effectively a no-op on the
devices that it'd matter on, so it's okay to just flip it to C2.
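
[Editorial sketch, not part of the original mail.] The observation above
can be written down as a small lookup. This is only an illustration of
the reported behaviour on the Intel laptops tested; the enum and the
function name deepest_cpu_cstate() are hypothetical and do not exist in
the FreeBSD tree, and the numbers are taken from the mail, not measured
here.

```c
/*
 * Hypothetical model of the reported mapping from the ACPI Cx limit
 * to the deepest CPU C-state actually entered. Names are made up
 * for illustration only.
 */
enum power_source {
	ON_AC,		/* "performance" profile, performance_cx_lowest */
	ON_BATTERY	/* "economy" profile, economy_cx_lowest */
};

/*
 * Return the deepest CPU C-state reported in the mail for the given
 * power source and ACPI Cx limit (2 or 3). Returns 0 for any ACPI
 * state the mail does not cover.
 */
static int
deepest_cpu_cstate(enum power_source src, int acpi_cx)
{
	if (acpi_cx == 2)
		return (6);	/* C2 reaches CPU C6 on AC and on battery */
	if (acpi_cx == 3)
		return (src == ON_BATTERY ? 7 : 6);	/* C7 only on battery */
	return (0);		/* not covered by the observations */
}
```

Under this model deepest_cpu_cstate(ON_AC, 2) and
deepest_cpu_cstate(ON_AC, 3) return the same value, which is the sense
in which lowering the AC-power limit from C3 to C2 loses nothing on
these machines.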
-adrian From owner-freebsd-arch@FreeBSD.ORG Sat Apr 25 17:53:58 2015 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id B159D1CD for ; Sat, 25 Apr 2015 17:53:58 +0000 (UTC) Received: from mail-oi0-x232.google.com (mail-oi0-x232.google.com [IPv6:2607:f8b0:4003:c06::232]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 6ED65194A for ; Sat, 25 Apr 2015 17:53:58 +0000 (UTC) Received: by oign205 with SMTP id n205so62710532oig.2 for ; Sat, 25 Apr 2015 10:53:57 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=message-id:date:from:user-agent:mime-version:to:cc:subject :references:in-reply-to:content-type; bh=zfrSQAiZztFEWS4GNzxodqQ+QpYS/QjEVLYB8Iel53s=; b=z+MMXalU8XqNmc6VGvvqN0uBl/OBq8rc9fCP3QqQ7jErUf2DJLFkoeb7QCR5y4f+nn K5C6QnJsSkZ43o1qIBEJlPl/2UMmTWGb6Snm8oDYYrfmq563prw2CXcdLkaHnBmZXZdn 02kbUYtXLEyef6Juw83jR8aDNLFXgL+gWk6NOd8BYC3cL2lr9yE9oDevwM1tECyrEXBx UuZBBSuENGvDeMGEsGL/zX9YCUgQE7CSqYUxSQPSC4SnunVZbqA+eT+7CyEarBLUctfb oKzfXrYatZAt2XB1LJ2FqlEkCwia4wIMPW63mgeHlV5LJALZy4CFSizscm3Agj75EKCh 4pGQ== X-Received: by 10.202.46.81 with SMTP id u78mr3363538oiu.54.1429984437895; Sat, 25 Apr 2015 10:53:57 -0700 (PDT) Received: from corona.austin.rr.com (cpe-72-177-6-10.austin.res.rr.com. 
[72.177.6.10]) by mx.google.com with ESMTPSA id c3sm7652186obo.5.2015.04.25.10.53.56 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Sat, 25 Apr 2015 10:53:57 -0700 (PDT) Message-ID: <553BD501.4010109@gmail.com> Date: Sat, 25 Apr 2015 12:55:13 -0500 From: Jason Harmening User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:31.0) Gecko/20100101 Thunderbird/31.3.0 MIME-Version: 1.0 To: Konstantin Belousov CC: Svatopluk Kraus , FreeBSD Arch Subject: Re: bus_dmamap_sync() for bounced client buffers from user address space References: <20150425094152.GE2390@kib.kiev.ua> <553B9E64.8030907@gmail.com> <20150425163444.GL2390@kib.kiev.ua> <553BC9D1.1070502@gmail.com> <20150425172833.GM2390@kib.kiev.ua> In-Reply-To: <20150425172833.GM2390@kib.kiev.ua> Content-Type: multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="UoMawm07iWViiabtCn5TNOBAjbLNEqJQ7" X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 25 Apr 2015 17:53:58 -0000 This is an OpenPGP/MIME signed message (RFC 4880 and 3156) --UoMawm07iWViiabtCn5TNOBAjbLNEqJQ7 Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: quoted-printable On 04/25/15 12:28, Konstantin Belousov wrote: > On Sat, Apr 25, 2015 at 12:07:29PM -0500, Jason Harmening wrote: >> On 04/25/15 11:34, Konstantin Belousov wrote: >>> I believe UIO_USERSPACE is almost unused, it might be there for some >>> obscure (and buggy) driver. >> It may be nearly unused, but we still document it in busdma.9, and we >> still explicitly check for it when setting the pmap in >> _bus_dmamap_load_uio. If it's not safe to use, then it's not OK for u= s >> to do that. 
>> We need to either a) remove support for it by adding a failure/KASSERT
>> on UIO_USERSPACE in _busdmamap_load_uio() and remove the paragraph on it
>> from busdma.9, or b) make it safe.
>>
>> I'd be in favor of b), because I think it is still valid to support some
>> non-painful way of using DMA with userspace buffers. Right now, the
>> only safe way to do that seems to be:
>> 1) vm_fault_quick_hold_pages
>> 2) kva_alloc
>> 3) pmap_qenter
>> 4) bus_dmamap_load
> 1. vm_fault_quick_hold
> 2. bus_dmamap_load_ma
>
>> That seems both unnecessarily complex and wasteful of KVA space.
>>
> The above sequence does not need a KVA allocation.

Ah, that looks much better. A few things though:

1) _bus_dmamap_load_ma (note the underscore) is still part of the MI/MD
interface, which we tell drivers not to use. It looks like it's
implemented for every arch though. Should there be a public and
documented bus_dmamap_load_ma ?

2) There is a bus_dmamap_load_ma_triv that's part of the MI interface,
but it's not documented, and it seems like it would be suboptimal in
certain cases, such as when dmar is enabled.

3) Using bus_dmamap_load_ma would mean always using physcopy for bounce
buffers...seems like the sfbufs would slow things down ?
--UoMawm07iWViiabtCn5TNOBAjbLNEqJQ7 Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQF8BAEBCgBmBQJVO9UBXxSAAAAAAC4AKGlzc3Vlci1mcHJAbm90YXRpb25zLm9w ZW5wZ3AuZmlmdGhob3JzZW1hbi5uZXQwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAw MDAwMDAwMDAwMDAwMDAwAAoJELufi/mShB0bd1cH/RxJiZ/8/3BR+A1ZUFzUXISN lygdgF/0ZYfHKk0LsRgbWcWKCWgThdgRBFn31714DiI2ZqZfkrfVzEHkeL6XoCFF Tp3Zzil4fvMqmnN/RKtD3vapw5WNZR4Qi6AQuz+vIGZ7hw2Dv71uj4MugbyU5w9t LjAcZEX8M6173itq1JtegBVCqmgQYl0stYtF5nu3u6HmuEMl2m2o9Lyo8eVpoSe3 P/MFtMHwYWaW9kcs0Z0x2hW2bHkLYYZ8QgSs93DaEo6UEtkznKCforPEE4lZvezV yZ8wlzqK73zRDpwjrTqgTsUF7bF6P/CPZSyZENMXlupsSZiahVLy1z2VliwV0c4= =6S8y -----END PGP SIGNATURE----- --UoMawm07iWViiabtCn5TNOBAjbLNEqJQ7-- From owner-freebsd-arch@FreeBSD.ORG Sat Apr 25 18:02:10 2015 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id A69BD3D9; Sat, 25 Apr 2015 18:02:10 +0000 (UTC) Received: from mail-oi0-x234.google.com (mail-oi0-x234.google.com [IPv6:2607:f8b0:4003:c06::234]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 69B701A38; Sat, 25 Apr 2015 18:02:10 +0000 (UTC) Received: by oift201 with SMTP id t201so62813931oif.3; Sat, 25 Apr 2015 11:02:09 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:date:message-id:subject :from:to:cc:content-type; bh=ps1MnpiD6yqkprLgjXo0q+tWyWhmaz1QtuLNxnW6OSU=; b=JRKAiD3uWhKng4wD4yWRK469NNzLSAMuxh1E3vLwVwXZvgw6jm6LTCgYDOfG6kEzzr 
WtXPKlooPSsV/kTciAIOzr7OispvCPxF8gePPAQVfHpYaZAvuMt/qR0HpgE2kjqVBp4b MAjX8MBQQN07EBbExMRHkqEmYz/GrxfYc8OVpXg6aCjSq2s8FKQ9VdA6PCUI7F44PWK1 Vlw479IZsAeY1BSlsBQb4ahaxkspbMxBfJ9HOICbhhY+m2mGcbQHshUJN9MYH88hdXb5 T4ssAxJLOMehiThz+9nh31dm3IxLEPjQMdvs2A4kw7cTPedVI0aQ/ttCHENgN5WhDtLP 94Gg== MIME-Version: 1.0 X-Received: by 10.202.186.214 with SMTP id k205mr3429481oif.10.1429984929591; Sat, 25 Apr 2015 11:02:09 -0700 (PDT) Sender: kmacybsd@gmail.com Received: by 10.202.11.82 with HTTP; Sat, 25 Apr 2015 11:02:09 -0700 (PDT) Received: by 10.202.11.82 with HTTP; Sat, 25 Apr 2015 11:02:09 -0700 (PDT) In-Reply-To: References: Date: Sat, 25 Apr 2015 11:02:09 -0700 X-Google-Sender-Auth: Ma3CYOdNjVrs94XGXTJEV-5Ep4Y Message-ID: Subject: Re: RFC: setting performance_cx_lowest=C2 in -HEAD to avoid lock contention on many-CPU boxes From: "K. Macy" To: Adrian Chadd Cc: freebsd-arch@freebsd.org Content-Type: text/plain; charset=UTF-8 X-Content-Filtered-By: Mailman/MimeDel 2.1.20 X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 25 Apr 2015 18:02:10 -0000 Um, don't you care more about power savings when it's on battery? On Apr 25, 2015 10:44 AM, "Adrian Chadd" wrote: > Oh the other thing, which I just mentioned to kip in IRC - all of the > intel laptops I've tested (and that's a long list) don't enter CPU C7 > if the power is plugged in. > > Ie: > > * power in, ACPI C2 -> CPU C6 > * power in, ACPI C3 - CPU C6 > * battery - ACPI C2 -> CPU C6 > * battery -> ACPI C3 -> CPU C7 > > So having performance_cx_lowest=C3 is effectively a no-op on the > devices that it'd matter on, so it's okay to just flip it to C2. 
> > > > -adrian > From owner-freebsd-arch@FreeBSD.ORG Sat Apr 25 18:05:46 2015 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 50FBE4BB; Sat, 25 Apr 2015 18:05:46 +0000 (UTC) Received: from mail-ig0-x22b.google.com (mail-ig0-x22b.google.com [IPv6:2607:f8b0:4001:c05::22b]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 18EB61A50; Sat, 25 Apr 2015 18:05:46 +0000 (UTC) Received: by igblo3 with SMTP id lo3so36274362igb.0; Sat, 25 Apr 2015 11:05:45 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:date:message-id:subject :from:to:cc:content-type; bh=h+05ZEZlT9/F2+otprg63CB97Wtuy2EcKrwy+4jDiPI=; b=dEV2E4ZRCvqS4Gp4fNz5HTCUBZ5TDGsgHbbdfc3GTnpefxiqP0bluqwmc/r79FnQ2j z3YuLtIHlOCBSwLquXJccLjm9AAWcLzyfGlSg79XwNFcAv1fxvqeaJOJSLjcqGFQs+ii Rq6biHnd/FPUa1jHx+gRmH+BgJYBKFsvrWXBKVybBUQrZAbJQtHDsdWwE8nybPYdsGIP O5T7T6YfHrSaU4PTm9UhZ4l4VV0RpMIB5IelguKZyWP9f3gM7fi1HhScvKk86vkH4Hst NSuEFHKJHFxJk9LkVDDOwZqXVoA2fyQa+uFAWMGbELTcAuESuOGDGU8gB2Dmg711nA3W eBxg== MIME-Version: 1.0 X-Received: by 10.107.136.25 with SMTP id k25mr4804256iod.88.1429985145555; Sat, 25 Apr 2015 11:05:45 -0700 (PDT) Sender: adrian.chadd@gmail.com Received: by 10.36.38.133 with HTTP; Sat, 25 Apr 2015 11:05:45 -0700 (PDT) In-Reply-To: References: Date: Sat, 25 Apr 2015 11:05:45 -0700 X-Google-Sender-Auth: ra7VqadAn-FIXCu5UsmIFmCsH1Y Message-ID: Subject: Re: RFC: setting performance_cx_lowest=C2 in -HEAD to avoid lock contention on many-CPU boxes From: Adrian Chadd To: "K. 
Macy" Cc: "freebsd-arch@freebsd.org" Content-Type: text/plain; charset=UTF-8 X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 25 Apr 2015 18:05:46 -0000

On 25 April 2015 at 11:02, K. Macy wrote:
> Um, don't you care more about power savings when it's on battery?

Right, that's why I'm changing performance_cx_lowest, and not
economy_cx_lowest.

performance == AC power
economy == battery

> On Apr 25, 2015 10:44 AM, "Adrian Chadd" wrote:
>>
>> Oh the other thing, which I just mentioned to kip in IRC - all of the
>> intel laptops I've tested (and that's a long list) don't enter CPU C7
>> if the power is plugged in.
>>
>> Ie:
>>
>> * power in, ACPI C2 -> CPU C6
>> * power in, ACPI C3 - CPU C6
>> * battery - ACPI C2 -> CPU C6
>> * battery -> ACPI C3 -> CPU C7
>>
>> So having performance_cx_lowest=C3 is effectively a no-op on the
>> devices that it'd matter on, so it's okay to just flip it to C2.
>> >> >> >> -adrian From owner-freebsd-arch@FreeBSD.ORG Sat Apr 25 18:18:08 2015 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 05181750; Sat, 25 Apr 2015 18:18:08 +0000 (UTC) Received: from mail-lb0-x229.google.com (mail-lb0-x229.google.com [IPv6:2a00:1450:4010:c04::229]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 815AA1B4A; Sat, 25 Apr 2015 18:18:07 +0000 (UTC) Received: by lbbqq2 with SMTP id qq2so57013677lbb.3; Sat, 25 Apr 2015 11:18:05 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:date:message-id:subject :from:to:cc:content-type; bh=GHeXNbqzfUuv+fBZ5fmKo4vyza8SH9sFKTYoUEekyzs=; b=zxeuEuh+zjP7zQ3o90/ipSnKAmy4q3khcj6Jk/D9or3gpI0ee/aa2r4MP04HebRqkC Sk6EkjJiJ86LWLwqPG4Bgqq3Qjr8rMEqUIIIqqr759wl+4MAX26MJljAuaUbKRFWXExH s1eVt6H/hGq6aAcIpUgtdboXUcuasFAaopRPK/0qj2P4WYwUIlP0eJfFUhrjM104XtbP 9EQI5dAtU/RHn6Iiqik7DQQSAmuuAwUS3nM01GFc2j+Z1Yg1SWoQJ5tV6TLHHPaOqL6D hGHKe71n+/k6JXz6OHApEiRVO2I18seTg6ZXlmwKl0PkE2MCT5aTtZ8tqc458dxfqe1I LkxQ== MIME-Version: 1.0 X-Received: by 10.152.234.139 with SMTP id ue11mr3597627lac.28.1429985885670; Sat, 25 Apr 2015 11:18:05 -0700 (PDT) Sender: davide.italiano@gmail.com Received: by 10.25.88.77 with HTTP; Sat, 25 Apr 2015 11:18:05 -0700 (PDT) In-Reply-To: References: Date: Sat, 25 Apr 2015 11:18:05 -0700 X-Google-Sender-Auth: 4sDDFijODMBR-lRmfxQUMCV5F4M Message-ID: Subject: Re: RFC: setting performance_cx_lowest=C2 in -HEAD to avoid lock contention on many-CPU boxes From: Davide Italiano To: Adrian Chadd Cc: "freebsd-arch@freebsd.org" Content-Type: text/plain; charset=UTF-8 X-BeenThere: 
freebsd-arch@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 25 Apr 2015 18:18:08 -0000

On Sat, Apr 25, 2015 at 9:31 AM, Adrian Chadd wrote:
> Hi!
>
> I've been doing some NUMA testing on large boxes and I've found that
> there's lock contention in the ACPI path. It's due to my change a
> while ago to start using sleep states above ACPI C1 by default. The
> ACPI C3 state involves a bunch of register fiddling in the ACPI sleep
> path that grabs a serialiser lock, and on an 80 thread box this is
> costly.
>
> I'd like to drop performance_cx_lowest to C2 in -HEAD. ACPI C2 state
> doesn't require the same register fiddling (to disable bus mastering,
> if I'm reading it right) and so it doesn't enter that particular
> serialised path. I've verified on Westmere-EX, Sandybridge, Ivybridge
> and Haswell boxes that ACPI C2 does let one drop down into a deeper
> CPU sleep state (C6 on each of these). I think is still a good default
> for both servers and desktops.
>
> If no-one has a problem with this then I'll do it after the weekend.
>

This sounds to me like just a way to hide a problem.
Very few people nowadays run on NUMA, and they can tune the machine as
they like when they do testing.
If there's a lock contention problem, it needs to be fixed and not
hidden under another default.

Also, as already noted, this is a problem on 80-core machines but
probably not on a 2-core Atom. I think you need to understand the factors
better and come up with a more sensible relation. In other words, your
bet needs to be proven before changing a default in a way that is useful
for a few but can impact many.
-- Davide "There are no solved problems; there are only problems that are more or less solved" -- Henri Poincare From owner-freebsd-arch@FreeBSD.ORG Sat Apr 25 18:18:52 2015 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 6A82C806 for ; Sat, 25 Apr 2015 18:18:52 +0000 (UTC) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id D12BB1B5E for ; Sat, 25 Apr 2015 18:18:51 +0000 (UTC) Received: from tom.home (kostik@localhost [127.0.0.1]) by kib.kiev.ua (8.14.9/8.14.9) with ESMTP id t3PIIlCj030557 (version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO); Sat, 25 Apr 2015 21:18:47 +0300 (EEST) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.9.2 kib.kiev.ua t3PIIlCj030557 Received: (from kostik@localhost) by tom.home (8.14.9/8.14.9/Submit) id t3PIIkiN030556; Sat, 25 Apr 2015 21:18:46 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Sat, 25 Apr 2015 21:18:46 +0300 From: Konstantin Belousov To: Jason Harmening Cc: Svatopluk Kraus , FreeBSD Arch Subject: Re: bus_dmamap_sync() for bounced client buffers from user address space Message-ID: <20150425181846.GN2390@kib.kiev.ua> References: <20150425094152.GE2390@kib.kiev.ua> <553B9E64.8030907@gmail.com> <20150425163444.GL2390@kib.kiev.ua> <553BC9D1.1070502@gmail.com> <20150425172833.GM2390@kib.kiev.ua> <553BD501.4010109@gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <553BD501.4010109@gmail.com> User-Agent: Mutt/1.5.23 (2014-03-12) X-Spam-Status: No, score=-2.0 required=5.0 
tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.0 X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on tom.home X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 25 Apr 2015 18:18:52 -0000

On Sat, Apr 25, 2015 at 12:55:13PM -0500, Jason Harmening wrote:
> Ah, that looks much better. A few things though:
> 1) _bus_dmamap_load_ma (note the underscore) is still part of the MI/MD
> interface, which we tell drivers not to use. It looks like it's
> implemented for every arch though. Should there be a public and
> documented bus_dmamap_load_ma ?
Maybe, yes. But at least one consumer of the KPI must appear before
the facility is introduced.

> 2) There is a bus_dmamap_load_ma_triv that's part of the MI interface,
> but it's not documented, and it seems like it would be suboptimal in
> certain cases, such as when dmar is enabled.
When DMAR is enabled, bus_dmamap_load_ma_triv() should not be used.
It should not be used directly even when DMAR is not enabled. Drivers
should use bus_dmamap_load_ma(), and the implementation redirects to
_triv() if needed.

The _triv() is a helper that allows bus_dmamap_load_ma() to exist
on architectures which cannot implement, or have not yet implemented,
a proper page array load op.

> 3) Using bus_dmamap_load_ma would mean always using physcopy for bounce
> buffers...seems like the sfbufs would slow things down ?

For amd64, sfbufs are a nop, due to the direct map. But I doubt that
we can combine bounce buffers and performance in a single sentence.
From owner-freebsd-arch@FreeBSD.ORG Sat Apr 25 18:45:11 2015 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id EE80EE74; Sat, 25 Apr 2015 18:45:10 +0000 (UTC) Received: from mail-ig0-x22f.google.com (mail-ig0-x22f.google.com [IPv6:2607:f8b0:4001:c05::22f]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id B84921E1F; Sat, 25 Apr 2015 18:45:10 +0000 (UTC) Received: by igblo3 with SMTP id lo3so36602877igb.0; Sat, 25 Apr 2015 11:45:10 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:date:message-id:subject :from:to:cc:content-type; bh=s03rWzS3VQoPm0+hax9cAlGUqh/2tjRN9inL+1Q1cwA=; b=beQbViBu02LpsXfAZS/ZZqpjVe4tmw5VNyYtPpfvctoZBNrszgeg/7BH1VsPI46WD7 1jSPqndD155BCLzN/JpHqNWRsI94XpdvxiTT9acfKClXRpkWbkQumB795+vJsWFT2pFr Kn5+N28ij0xDa++MSvyAWjgtieK13XxiTj6F/TquJA6NafMRVvZSwvXheSlqtNn67X85 YkZJbplEq4vCalvkAr/1FAhlChtQ7//PU7WeYFcTsWA3lfNMXaumNfmxbZDyd+5AE3o1 lAuE0sWWQdJBQITC+TqfOjuxUcJTe6zdX403f2aDwrwTlm2QjxducCBVcddlRW5q34K7 1TQw== MIME-Version: 1.0 X-Received: by 10.107.136.25 with SMTP id k25mr4927011iod.88.1429987510146; Sat, 25 Apr 2015 11:45:10 -0700 (PDT) Sender: adrian.chadd@gmail.com Received: by 10.36.38.133 with HTTP; Sat, 25 Apr 2015 11:45:10 -0700 (PDT) In-Reply-To: References: Date: Sat, 25 Apr 2015 11:45:10 -0700 X-Google-Sender-Auth: J7cjSiXLCSOaYEM2YxuBtayScPE Message-ID: Subject: Re: RFC: setting performance_cx_lowest=C2 in -HEAD to avoid lock contention on many-CPU boxes From: Adrian Chadd To: Davide Italiano Cc: "freebsd-arch@freebsd.org" Content-Type: text/plain; charset=UTF-8 X-BeenThere: freebsd-arch@freebsd.org 
X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 25 Apr 2015 18:45:11 -0000 On 25 April 2015 at 11:18, Davide Italiano wrote: > On Sat, Apr 25, 2015 at 9:31 AM, Adrian Chadd wrote: >> Hi! >> >> I've been doing some NUMA testing on large boxes and I've found that >> there's lock contention in the ACPI path. It's due to my change a >> while ago to start using sleep states above ACPI C1 by default. The >> ACPI C3 state involves a bunch of register fiddling in the ACPI sleep >> path that grabs a serialiser lock, and on an 80 thread box this is >> costly. >> >> I'd like to drop performance_cx_lowest to C2 in -HEAD. ACPI C2 state >> doesn't require the same register fiddling (to disable bus mastering, >> if I'm reading it right) and so it doesn't enter that particular >> serialised path. I've verified on Westmere-EX, Sandybridge, Ivybridge >> and Haswell boxes that ACPI C2 does let one drop down into a deeper >> CPU sleep state (C6 on each of these). I think is still a good default >> for both servers and desktops. >> >> If no-one has a problem with this then I'll do it after the weekend. >> > > This sounds to me just a way to hide a problem. > Very few people nowaday run on NUMA and they can tune the machine as > they like when they do testing. > If there's a lock contention problem, it needs to be fixed and not > hidden under another default. The lock contention problem is inside ACPI and how it's designed/implemented. We're not going to easily be able to make ACPI lock "better" as we're constrained by how ACPI implements things in the shared ACPICA code. > Also, as already noted this is a problem on 80-core machines but > probably not on a 2-core Atom. I think you need to understand factors > better and come up with a more sensible relation. 
In other words, your > bet needs to be proven before changing a default useful for frew that > can impact many. I've just described the differences in behaviour. I've checked the C states on all the intel servers too - with power plugged in, ACPI C2 and ACPI C3 still result in entering CPU C6 state, not CPU C7 state - so it's not going to result in worse behaviour. For reference, "all" being the following list: * westmere-EX * nehalem * sandybridge * sandybridge mobile * sandybridge xeon * ivybridge mobile * ivybridge xeon * haswell mobile * haswell * haswell xeon * haswell xeon v3 -adrian From owner-freebsd-arch@FreeBSD.ORG Sat Apr 25 18:45:52 2015 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 93EC1F1C for ; Sat, 25 Apr 2015 18:45:52 +0000 (UTC) Received: from mail-oi0-x233.google.com (mail-oi0-x233.google.com [IPv6:2607:f8b0:4003:c06::233]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 4FF1F1E25 for ; Sat, 25 Apr 2015 18:45:52 +0000 (UTC) Received: by oift201 with SMTP id t201so63202551oif.3 for ; Sat, 25 Apr 2015 11:45:51 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=message-id:date:from:user-agent:mime-version:to:cc:subject :references:in-reply-to:content-type; bh=KfCJrTiLG662wbxdgA798kUQP/T/rKzTiw5UnyjTcDw=; b=FmR1X8zMw5FTUz+e0IT8rk+eHEWClUdCFI3Yjtah0B3cT3Hrms2HFQ2PCvXajJjxjX adaSTtIb0gK/aZgYidn3qBR/Ngm3oZzMF6Dxuvhbb765j8ybM6eimYAyZL2wlM1T0hL5 Jt4C7JUxWNx7p6AXQGTNiab0Zwx5dHoiSWJQI1u3I3g9e/QMIYoBrQvvGBAs01UV+Y17 9uIFa4Hx/vQAU34h6nfTjymEnDPLRGsi/FDyycCoAgEOS/XXT9DaioqK8PznzAVeYFmq xa+k8WQ4tf6axuiQeuVg0rQBPVoGWXIUi5cBsDvf0zISXRSTeCVbd51AT01ktKUeBiAt 6bnw== 
X-Received: by 10.182.142.137 with SMTP id rw9mr3650023obb.83.1429987551648; Sat, 25 Apr 2015 11:45:51 -0700 (PDT) Received: from corona.austin.rr.com (cpe-72-177-6-10.austin.res.rr.com. [72.177.6.10]) by mx.google.com with ESMTPSA id gc7sm8539138obb.26.2015.04.25.11.45.50 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Sat, 25 Apr 2015 11:45:51 -0700 (PDT) Message-ID: <553BE12B.4000105@gmail.com> Date: Sat, 25 Apr 2015 13:47:07 -0500 From: Jason Harmening User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:31.0) Gecko/20100101 Thunderbird/31.3.0 MIME-Version: 1.0 To: Konstantin Belousov CC: Svatopluk Kraus , FreeBSD Arch Subject: Re: bus_dmamap_sync() for bounced client buffers from user address space References: <20150425094152.GE2390@kib.kiev.ua> <553B9E64.8030907@gmail.com> <20150425163444.GL2390@kib.kiev.ua> <553BC9D1.1070502@gmail.com> <20150425172833.GM2390@kib.kiev.ua> <553BD501.4010109@gmail.com> <20150425181846.GN2390@kib.kiev.ua> In-Reply-To: <20150425181846.GN2390@kib.kiev.ua> Content-Type: multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="nL5iEtE9e7JmsrptGRdbAV4CnLqkeTER2" X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 25 Apr 2015 18:45:52 -0000 This is an OpenPGP/MIME signed message (RFC 4880 and 3156) --nL5iEtE9e7JmsrptGRdbAV4CnLqkeTER2 Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: quoted-printable On 04/25/15 13:18, Konstantin Belousov wrote: > On Sat, Apr 25, 2015 at 12:55:13PM -0500, Jason Harmening wrote: >> Ah, that looks much better. A few things though: >> 1) _bus_dmamap_load_ma (note the underscore) is still part of the MI/M= D >> interface, which we tell drivers not to use. It looks like it's >> implemented for every arch though. 
>> Should there be a public and
>> documented bus_dmamap_load_ma ?
> Might be yes. But at least one consumer of the KPI must appear before
> the facility is introduced.

Could some of the GART/GTT code consume that?

>> 2) There is a bus_dmamap_load_ma_triv that's part of the MI interface,
>> but it's not documented, and it seems like it would be suboptimal in
>> certain cases, such as when dmar is enabled.
> When DMAR is enabled, bus_dmamap_load_triv() should not be used.
> It should not be used directly even when not. Drivers should use
> bus_dmamap_load_ma(), and implementation redirects to _triv() if
> needed.
>
> The _triv() is the helper to allow bus_dmamap_load_ma() to exists
> on architectures which cannot implement, on not yet implemented,
> proper page array load op.

Yes, I noticed the same thing. I'm not sure why _triv() is treated as
part of the public API and not prefixed with an underscore and a comment
not to use it in drivers.

>> 3) Using bus_dmamap_load_ma would mean always using physcopy for bounce
>> buffers...seems like the sfbufs would slow things down ?
> For amd64, sfbufs are nop, due to the direct map. But, I doubt that
> we can combine bounce buffers and performance in the single sentence.

In fact the amd64 implementation of uiomove_fromphys doesn't use sfbufs
at all thanks to the direct map. sparc64 seems to avoid sfbufs as much
as possible too. I don't know what arm64/aarch64 will be able to use.
Those seem like the platforms where bounce buffering would be the most
likely, along with i386 + PAE. They might still be used on 32-bit
platforms for alignment or devices with < 32-bit address width, but then
those are likely to be old and slow anyway.

I'm still a bit worried about the slowness of waiting for an sfbuf if
one is needed, but in practice that might not be a big issue.
--nL5iEtE9e7JmsrptGRdbAV4CnLqkeTER2 Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQF8BAEBCgBmBQJVO+ErXxSAAAAAAC4AKGlzc3Vlci1mcHJAbm90YXRpb25zLm9w ZW5wZ3AuZmlmdGhob3JzZW1hbi5uZXQwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAw MDAwMDAwMDAwMDAwMDAwAAoJELufi/mShB0b4MQIALHE02H9F1KBw7E6d75Q03Nk H7XaCvETTyNOQfxPLQ4Zs3FjRD3QbjsXavyqQA9XsAO8a9vMeDkUbKFaqIKWen4T zPAvEZ5RhWRnxRs0NNo/uvPCob7Or1GeE/kq9j0sxcVcvVVGa+xa4gCf6R+3aQ4Y rpEt72V4YyaPmr7e0nV6TTAjbJtbY4RxwbeSGuIlKcH+hHkfooRDWusEn8urOCVa bLzlrOZniwig6qG5s3fZ1oIRgWA9ngYy8jmh9RkEI/uhQeibJr1zJMCCu1Fg5ZQt V9STVSF2kyFlFqZVz+ENPGDEasjqAmgOwjHI4PF8WvujCLbO0h4OJZBkCMWLLD8= =aJCh -----END PGP SIGNATURE----- --nL5iEtE9e7JmsrptGRdbAV4CnLqkeTER2-- From owner-freebsd-arch@FreeBSD.ORG Sat Apr 25 20:14:16 2015 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 3FA82CB4 for ; Sat, 25 Apr 2015 20:14:16 +0000 (UTC) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id D4743164A for ; Sat, 25 Apr 2015 20:14:15 +0000 (UTC) Received: from tom.home (kostik@localhost [127.0.0.1]) by kib.kiev.ua (8.14.9/8.14.9) with ESMTP id t3PKEADd057498 (version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO); Sat, 25 Apr 2015 23:14:10 +0300 (EEST) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.9.2 kib.kiev.ua t3PKEADd057498 Received: (from kostik@localhost) by tom.home (8.14.9/8.14.9/Submit) id t3PKEAXF057497; Sat, 25 Apr 2015 23:14:10 +0300 (EEST) (envelope-from kostikbel@gmail.com) 
X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Sat, 25 Apr 2015 23:14:10 +0300 From: Konstantin Belousov To: Jason Harmening Cc: Svatopluk Kraus , FreeBSD Arch Subject: Re: bus_dmamap_sync() for bounced client buffers from user address space Message-ID: <20150425201410.GP2390@kib.kiev.ua> References: <20150425094152.GE2390@kib.kiev.ua> <553B9E64.8030907@gmail.com> <20150425163444.GL2390@kib.kiev.ua> <553BC9D1.1070502@gmail.com> <20150425172833.GM2390@kib.kiev.ua> <553BD501.4010109@gmail.com> <20150425181846.GN2390@kib.kiev.ua> <553BE12B.4000105@gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <553BE12B.4000105@gmail.com> User-Agent: Mutt/1.5.23 (2014-03-12) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.0 X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on tom.home X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 25 Apr 2015 20:14:16 -0000 On Sat, Apr 25, 2015 at 01:47:07PM -0500, Jason Harmening wrote: > > On 04/25/15 13:18, Konstantin Belousov wrote: > > On Sat, Apr 25, 2015 at 12:55:13PM -0500, Jason Harmening wrote: > >> Ah, that looks much better. A few things though: > >> 1) _bus_dmamap_load_ma (note the underscore) is still part of the MI/MD > >> interface, which we tell drivers not to use. It looks like it's > >> implemented for every arch though. Should there be a public and > >> documented bus_dmamap_load_ma ? > > Might be yes. But at least one consumer of the KPI must appear before > > the facility is introduced. > > Could some of the GART/GTT code consume that? Do you mean by the GEM/GTT code? 
Indeed, this is an interesting and probably workable suggestion. I thought that I would need to provide a special interface from DMAR for the GEM, but your proposal seems to fit. Still, an issue is that the Linux code is structured significantly differently, and this code, although isolated, diverges significantly from upstream. The special DMAR interface is still needed for bhyve; I am slowly working on it. > > >> 2) There is a bus_dmamap_load_ma_triv that's part of the MI interface, > >> but it's not documented, and it seems like it would be suboptimal in > >> certain cases, such as when dmar is enabled. > > When DMAR is enabled, bus_dmamap_load_triv() should not be used. > > It should not be used directly even when not. Drivers should use > > bus_dmamap_load_ma(), and the implementation redirects to _triv() if > > needed. > > > > The _triv() is the helper to allow bus_dmamap_load_ma() to exist > > on architectures which cannot implement, or have not yet implemented, > > a proper page array load op. > Yes, I noticed the same thing. I'm not sure why _triv() is treated as > part of the public API and not prefixed with an underscore and a comment > not to use it in drivers. It is not. We do not claim that a function is part of the driver KPI merely because its name does not start with '_'. A comment would be nice, indeed. > >> 3) Using bus_dmamap_load_ma would mean always using physcopy for bounce > >> buffers...seems like the sfbufs would slow things down ? > > For amd64, sfbufs are nop, due to the direct map. But, I doubt that > > we can combine bounce buffers and performance in a single sentence. > In fact the amd64 implementation of uiomove_fromphys doesn't use sfbufs > at all thanks to the direct map. sparc64 seems to avoid sfbufs as much > as possible too. I don't know what arm64/aarch64 will be able to use. > Those seem like the platforms where bounce buffering would be the most > likely, along with i386 + PAE. 
They might still be used on 32-bit > platforms for alignment or devices with < 32-bit address width, but then > those are likely to be old and slow anyway. > > I'm still a bit worried about the slowness of waiting for an sfbuf if > one is needed, but in practice that might not be a big issue. >
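To illustrate the redirection to _triv() described in this thread, here is a minimal userland sketch in C of a page-array load that falls back to a trivial per-page helper when no native page-array op is available. It is not the FreeBSD implementation: every sk_* name, the fixed limit of 8 segments, and the function-pointer dispatch are assumptions made up for the example, and the real busdma code chooses its implementation per architecture rather than at run time.

```c
#include <stddef.h>

#define SK_PAGE_SIZE 4096u
#define SK_MAX_SEGS  8

/* Hypothetical stand-ins for a physical page and a DMA segment. */
struct sk_page { unsigned long phys; };
struct sk_seg  { unsigned long addr; size_t len; };

struct sk_map {
    struct sk_seg segs[SK_MAX_SEGS];
    int nsegs;
    /* Optional native page-array load op; NULL if the platform has none. */
    int (*load_ma_native)(struct sk_map *, struct sk_page **, size_t, size_t);
};

/* Add one physical range, merging it into the previous segment if adjacent. */
static int
sk_load_phys(struct sk_map *m, unsigned long pa, size_t len)
{
    if (m->nsegs > 0) {
        struct sk_seg *last = &m->segs[m->nsegs - 1];
        if (last->addr + last->len == pa) {
            last->len += len;
            return (0);
        }
    }
    if (m->nsegs == SK_MAX_SEGS)
        return (-1);            /* out of segments */
    m->segs[m->nsegs].addr = pa;
    m->segs[m->nsegs].len = len;
    m->nsegs++;
    return (0);
}

/* Trivial fallback: walk the page array and load it one page at a time. */
static int
sk_load_ma_triv(struct sk_map *m, struct sk_page **ma, size_t tlen, size_t offs)
{
    for (size_t i = 0; tlen > 0; i++) {
        size_t len = SK_PAGE_SIZE - offs;
        if (len > tlen)
            len = tlen;
        if (sk_load_phys(m, ma[i]->phys + offs, len) != 0)
            return (-1);
        tlen -= len;
        offs = 0;               /* only the first page starts mid-page */
    }
    return (0);
}

/* Entry point a caller would use; redirects to the helper when needed. */
static int
sk_load_ma(struct sk_map *m, struct sk_page **ma, size_t tlen, size_t offs)
{
    if (m->load_ma_native != NULL)
        return (m->load_ma_native(m, ma, tlen, offs));
    return (sk_load_ma_triv(m, ma, tlen, offs));
}
```

A caller only ever invokes sk_load_ma(); with load_ma_native left NULL, a three-page array whose first two pages are physically adjacent ends up as two segments. A platform with an IOMMU-aware op would set the pointer and never reach the helper, which is the reason given above for not calling _triv() directly.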