Date:      Thu, 18 Jan 2007 14:47:55 -0800
From:      Chuck Swiger <cswiger@mac.com>
To:        Maxim Sobolev <sobomax@FreeBSD.org>
Cc:        freebsd-current@FreeBSD.org, freebsd-arch@FreeBSD.org
Subject:   Re: [PATCH] Mantaining turnstile aligned to 128 bytes in i386 CPUs
Message-ID:  <0B6D259B-618B-466C-844E-3F79FDE272BB@mac.com>
In-Reply-To: <45AFF47E.3020905@FreeBSD.org>
References:  <3bbf2fe10607250813w8ff9e34pc505bf290e71758@mail.gmail.com> <3bbf2fe10607281004o6727e976h19ee7e054876f914@mail.gmail.com> <3bbf2fe10701160851r79b04464m2cbdbb7f644b22b6@mail.gmail.com> <20070116154258.568e1aaf@pleiades.nextvenue.com> <b1fa29170701161355lc021b90o35fa5f9acb5749d@mail.gmail.com> <eoji7s$cit$2@sea.gmane.org> <b1fa29170701161425n7bcfe1e5m1b8c671caf3758db@mail.gmail.com> <eojlnb$qje$1@sea.gmane.org> <3bbf2fe10701161525j6ad9292y93502b8df0f67aa9@mail.gmail.com> <45AD6DFA.6030808@FreeBSD.org> <3bbf2fe10701161655p5e686b52n7340b3100ecfab93@mail.gmail.com> <200701172022.l0HKMYV8053837@apollo.backplane.com> <20070118113831.A11834@delplex.bde.org> <200701181948.l0IJmdfn061671@apollo.backplane.com> <45AFED63.7020009@FreeBSD.org> <25EB3FED-71A9-4AE1-9A38-5D2DC27D0DF7@mac.com> <45AFF47E.3020905@FreeBSD.org>

On Jan 18, 2007, at 2:28 PM, Maxim Sobolev wrote:
>> Unfortunately, there are simply different tradeoffs between  
>> mechanisms for copying depending on whether you want to use or  
>> avoid using/thrashing the L1/L2 caches, whether the data is cache- 
>> aligned, and so forth; the CPU can't infer what you want to  
>> occur-- you have to tell it.  I find it interesting that some of  
>> the architectures (PA-RISC,
>
> Well, of course there are some special cases, but in general there  
> should be some baseline suitable for most of uses. That's why we  
> (and most other operating systems) only provide single version for  
> the mem*(3) APIs.

Well, a truly generic version is in lib/libc/string/bcopy.c; it's
architecture-neutral (i.e., it's pure C code) and it handles
overlapping source and destination addresses, unaligned access,
and so forth.  The downside is that it's slower than using
movl/movsl, let alone some of the fancier variants that Bruce and
Matt have been discussing (in considerable, interesting detail)
earlier:

   http://now.cs.berkeley.edu/Td/bcopy.html
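
To make the "handles overlapping source and destination" point concrete, here is a minimal sketch of an overlap-safe copy in portable C. It is not the code in lib/libc/string/bcopy.c; the name generic_copy is hypothetical, and it copies one byte at a time purely for clarity, so it shows the direction-selection idea rather than any word-at-a-time optimization.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical illustration only; this is NOT the libc bcopy.c source.
 * Copies len bytes from src to dst and tolerates overlapping regions
 * by choosing the copy direction from the relative addresses.
 */
void *generic_copy(void *dst, const void *src, size_t len)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    if (d == s || len == 0)
        return dst;

    if ((uintptr_t)d < (uintptr_t)s) {
        /* Destination starts below source: copy forward. */
        while (len--)
            *d++ = *s++;
    } else {
        /*
         * Destination starts above source: copy backward so that
         * overlapping bytes are read before they are overwritten.
         */
        d += len;
        s += len;
        while (len--)
            *--d = *--s;
    }
    return dst;
}
```

A byte loop like this is about as portable as it gets, which is also why it tends to lose to movsl-style or unrolled variants on larger buffers.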

If you're only moving, say, 5 bytes, the overhead of fancy loop  
unrolling and prefetching and so forth isn't going to help compared  
with a simple movb/movl combination, so it really depends.
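
The size tradeoff can be sketched as a dispatch on length: tiny copies take a plain byte loop, and only larger ones pay for a wider loop. This is an illustration under stated assumptions, not a tuned routine. The names copy_by_size and SMALL_COPY_MAX are hypothetical, the threshold of 16 is an arbitrary placeholder rather than a measured value, and the regions are assumed not to overlap.

```c
#include <stddef.h>
#include <string.h>

/* Assumed cutoff for "small" copies; a real value would be measured. */
#define SMALL_COPY_MAX 16

/*
 * Hypothetical sketch: copy len bytes between NON-overlapping regions,
 * picking the strategy by size.
 */
void *copy_by_size(void *dst, const void *src, size_t len)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    if (len <= SMALL_COPY_MAX) {
        /* Small copy: a simple byte loop, no setup cost. */
        while (len--)
            *d++ = *s++;
        return dst;
    }

    /* Larger copy: move one machine word per iteration. */
    while (len >= sizeof(unsigned long)) {
        unsigned long w;
        /*
         * Fixed-size memcpy is the portable way to express an
         * unaligned word load/store; compilers typically turn each
         * call into a single move instruction.
         */
        memcpy(&w, s, sizeof w);
        memcpy(d, &w, sizeof w);
        s += sizeof w;
        d += sizeof w;
        len -= sizeof w;
    }

    /* Copy the remaining tail bytes. */
    while (len--)
        *d++ = *s++;
    return dst;
}
```

For a 5-byte copy the first branch is all that runs, so none of the wider-loop machinery gets in the way.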

-- 
-Chuck



