From: Randall Stewart
To: Juli Mallett
Cc: freebsd-mips@FreeBSD.org, Neel Natu
Subject: Re: Code review: groundwork for SMP
Date: Thu, 28 Jan 2010 21:28:51 -0800
Message-Id: <85D9D383-29A3-4F09-A2FE-61E4EA85CE9B@lakerest.net>
List-Id: Porting FreeBSD to MIPS

On Jan 28, 2010, at 8:40 PM, Juli Mallett wrote:

> On Thu, Jan 28, 2010 at 20:26, Randall Stewart wrote:
>> It burns up TLB entries. Ok, that does not sound so bad on the
>> surface, but wait, let's think about this.. and I am going to
>> speak in terms of XLR... but other MIPS processors may have
>> the same issue.
>>
>> 1) I have 8 cores per cpu pack.
>> 2) Each core has 4 "threads" which are kinda hyper threads: their
>> own register set, their own everything, except they share a pipeline
>> and get scheduled when one of the others is blocked.
>> 3) This means I still need a pcpup per thread.
>> 4) Now I have 64 TLB entries for every CPU complex. I can have them
>> 16 per thread OR 64 shared amongst all threads.
>> 5) This means I dedicate 4 of my 64 TLB entries for your pcpup
>> entries.
>
> So on your systems threads share the TLB? Wired TLB entries can't be
> pulled out (in the case of the kernel stack it's basically
> catastrophic for that to happen.) A compromise if your TLB entries
> are really at a premium is to use a single large entry (using, say, a
> single 32k page) that contains both PCPU and the kernel stack, or a
> page which has pointers to pcpu data, the kernel stack, etc. I seem
> to recall seeing a port of FreeBSD that used the same storage for the
> kernel stack and PCPU data, but I could be mistaken.

Which means you have a big array that you are offsetting. I was even
thinking get a LARGE entry.. one that is say 8 Meg for the kernel..
covering all text/data etc... with this new super page stuff. Of course
I have never looked into how it's implemented..

Going back to your idea: is that not the same thing as having an
index into an array? Yes, you pay an index reference for every access..
or at least one to set up a pointer.. but I think that is much cheaper
than a TLB miss is... (words for imp to think about)...

>
> There are other trade-offs available, of course. If we don't use the
> gp for accessing small data, we can keep a pointer to the pcpu data of
> a CPU in gp whenever the kernel is running, and then PCPU accesses are
> just a matter of loading from offset+gp, which is very quick, faster
> than the wired TLB entry mechanism, unless you use a virtual address
> for the pcpu in which case it can be painful.
> As there are more things like VIMAGE, the amount of small global data
> in the kernel is going to fall and making gp a pcpu pointer makes more
> sense. My old port used -G0 and I still disable use of the gp in my
> non-FreeBSD MIPS work; I think NetBSD used to, but I haven't noticed
> what FreeBSD does.
>

This is an interesting idea... need to think about it more ;-)

> More curiosity than anything (since I don't seem to be able to get an
> RMI system to develop on): if the threads are sharing the TLB, how are
> updates to TLB-related fields synchronized? How do you atomically
> increase the wired count of the TLB? How does 'tlbwr' work? Do you
> have to use special instructions when you're sharing the TLB that are
> XLR-specific?

I can't tell you how the hardware works.. I can either have the TLB
divided into 16 entries per thread OR enable a special global register
and get 64 entries that all threads see.

In any case I just don't see that this has that much gain... it's sexy..
but I think it's a tradeoff like everything..

R

>
> Juli.
>

------------------------------
Randall Stewart
803-317-4952 (cell)
803-345-0391(direct)
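
For reference, a minimal sketch of the two PCPU access schemes argued over in this thread: indexing a global array on every access, versus resolving a base pointer once and then loading at a fixed offset from it (the role gp would play in Juli's proposal). It also works out the "4 of 64" wired-entry figure from the numbers quoted above. Everything here is hypothetical and for illustration only: `struct pcpu_sketch`, `pcpu_table`, and the function names are assumptions, not FreeBSD's actual pcpu code, and an ordinary C pointer stands in for the gp register.

```c
#include <assert.h>

/* Figures quoted in the thread for XLR: 8 cores per package, 4 hardware
 * threads per core, 64 TLB entries per core when shared by all threads. */
#define CORES            8
#define THREADS_PER_CORE 4
#define TLB_PER_CORE     64
#define MAXCPU           (CORES * THREADS_PER_CORE)

/* Hypothetical per-CPU record; NOT FreeBSD's struct pcpu. */
struct pcpu_sketch {
    unsigned int  cpuid;
    unsigned long counter;
};

/* One slot per hardware thread, living in ordinary kernel data. */
static struct pcpu_sketch pcpu_table[MAXCPU];

/* Wired TLB entries a core gives up if every hardware thread wires
 * one entry for its own pcpu page (the scheme being objected to). */
static int wired_pcpu_entries_per_core(void)
{
    return THREADS_PER_CORE * 1;
}

/* Scheme A: pay an array index on every access. */
static struct pcpu_sketch *pcpu_by_index(unsigned int cpuid)
{
    assert(cpuid < MAXCPU);
    return &pcpu_table[cpuid];
}

/* Scheme B: resolve the base once (e.g. at CPU startup) and keep it.
 * In the gp proposal this pointer would live in the gp register; here
 * it is just a returned pointer (an assumption for illustration). */
static struct pcpu_sketch *pcpu_base_for(unsigned int cpuid)
{
    struct pcpu_sketch *base = pcpu_by_index(cpuid);
    base->cpuid = cpuid;
    return base;
}

/* With a base pointer in hand, each access is a single load/store at a
 * fixed offset from the base, with no wired TLB entry involved. */
static unsigned long pcpu_bump(struct pcpu_sketch *base)
{
    return ++base->counter;
}
```

Both schemes reach the same storage; the difference is only whether the index is paid on every access or once when the pointer is set up. Neither touches the wired TLB budget, which is the trade-off Randall raises.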