Date: Mon, 22 Jan 1996 11:39:47 -0700 (MST)
From: Terry Lambert <terry@lambert.org>
To: davidg@root.com
Cc: hasty@rah.star-gate.com, rmallory@wiley.csusb.edu, freebsd-hackers@freefall.freebsd.org
Subject: Re: stanford benchmark/usenix
Message-ID: <199601221839.LAA15576@phaeton.artisoft.com>
In-Reply-To: <199601221021.CAA14236@Root.COM> from "David Greenman" at Jan 22, 96 02:21:28 am

[ ... CPU specific bzero/bcopy/other ... ]

> The function vector can then be changed to an optimized function for specific
> CPU types. This would happen at some convenient place before program startup,
> or perhaps in the generic function (which could, perhaps, be a stub whose sole
> purpose is to select the appropriate routine, or fall back to a generic one).
> I really don't want to get into this in more detail right now - I don't have
> the time and in the end it would be easier to just sit down and code it. If
> you think you know how to implement this correctly, then by all means, go for
> it!

We did this as well with the BSD kernel environment emulation under
Windows95 for the file system framework (we have UFS running as a native
FS under Win95 after making some of the changes I've been suggesting,
after isolating the BSD'isms and optimizing performance from the
non-statistical profiling data).

Do you remember Bruce's message regarding reordering the cache line
loads in the P5 optimized bcopy?  He said:

| On my 486DX2/66 with an unknown writing strategy, copy() is about 20%
| faster than memcpy() (*) but can be improved another 20% by changing the
| cache line allocation strategy slightly: replace the load of 28(%edi) by
| a load of 12(%edi) and add a load of 28(%edi) in the middle of the loop.
| The pairing stuff and the nops make little difference. cache-line
| alignment of the source and target made little difference.
|
| (*) When memcpy() is run a second time, it is as fast as the fastest
| version as copy()!

I didn't quite follow the reasoning, since it would write the contents
of 12(%edi) into 28(%edi)?!?

I mailed Bruce about this directly, but haven't seen a response yet...


Terry Lambert
terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.
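
For reference, the function-vector scheme David describes in the quoted
text could be sketched in C roughly as follows.  This is only a minimal
illustration under stated assumptions, not the code that was discussed
or committed: every name here (sketch_bzero, bzero_vec, detect_cpu, the
cpu_class values) is hypothetical, the CPU probe is a placeholder that
always reports a generic CPU, and the "Pentium" routine is a stand-in
built on memset() rather than real tuned assembly.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical CPU classes; a real kernel would use its own cpu id. */
enum cpu_class { CPU_GENERIC, CPU_I486, CPU_PENTIUM };

typedef void (*bzero_fn)(void *, size_t);

/* Placeholder probe (assumption): always reports a generic CPU. */
static enum cpu_class
detect_cpu(void)
{
	return CPU_GENERIC;
}

/* Portable fallback: clear one byte at a time. */
static void
bzero_generic(void *p, size_t n)
{
	unsigned char *b = p;

	while (n-- > 0)
		*b++ = 0;
}

/* Stand-in for a CPU-tuned routine; not actual P5-optimized code. */
static void
bzero_pentium(void *p, size_t n)
{
	memset(p, 0, n);
}

static void bzero_resolve(void *p, size_t n);

/* The function vector.  It starts out pointing at the resolver stub. */
static bzero_fn bzero_vec = bzero_resolve;

/*
 * Stub whose sole purpose is to select the appropriate routine (or
 * fall back to the generic one), patch the vector, and then forward
 * the first call.  Later calls go straight to the selected routine.
 */
static void
bzero_resolve(void *p, size_t n)
{
	switch (detect_cpu()) {
	case CPU_PENTIUM:
		bzero_vec = bzero_pentium;
		break;
	default:
		bzero_vec = bzero_generic;
		break;
	}
	bzero_vec(p, n);
}

/* Public entry point: one indirect call through the vector. */
void
sketch_bzero(void *p, size_t n)
{
	bzero_vec(p, n);
}
```

Resolving lazily in the stub corresponds to the "perhaps in the generic
function" option from the quoted text; the alternative of patching the
vector once at some convenient place before startup would avoid the
first-call detour at the cost of needing an explicit init hook.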