Date:      Wed, 17 Feb 1999 14:05:40 -0600 (CST)
From:      Kevin Day <toasty@home.dragondata.com>
To:        dillon@apollo.backplane.com (Matthew Dillon)
Cc:        dyson@iquest.net, tlambert@primenet.com, mike@smith.net.au, hackers@FreeBSD.ORG
Subject:   Re: vm_page_zero_fill
Message-ID:  <199902172005.OAA29252@home.dragondata.com>
In-Reply-To: <199902171951.LAA10456@apollo.backplane.com> from Matthew Dillon at "Feb 17, 1999 11:51:18 am"

> :Currently, the time spent loading/preparing the new application is a bit
> :long, so I was looking at ways to shrink that down. That's where this
> 
>      Ahh.  A couple of things.  First, I presume that the amount of memory
>      in the machine is not an issue... that you have enough to hold all
>      the programs pretty much resident.

Right. We did have a problem with the vm system deciding to swap out nearly
every executable while trying to cache all the data we were sending to the
graphics system, but that largely went away when we switched to a 3.0 system.

> 
>      In that case, simply preload the executables.  That is, rather than
>      take the latency hit when the user hits a button, take the latency
>      hit when the user is idle and just tell the program to 'go' ( through
>      a pipe ) when the user hits the button.

Not really an option. We don't know which of the 20ish applications they're
going to pick, and it's just one button to start them.
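
For reference, the preload-and-'go' idea quoted above could be sketched
roughly like this. It is only an illustration under my own assumptions:
preload(), release(), struct preloaded and demo_app() are hypothetical
names, and a real program would do its expensive start-up work where
warm_up() is called. A command byte is used rather than EOF because later
children inherit the write ends of earlier pipes.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical handle for a child that is loaded but waiting for 'go'. */
struct preloaded {
    pid_t pid;
    int   go_fd;   /* write end of the control pipe */
};

/* Hypothetical stand-in for a real application's main loop. */
static int demo_app(void) { return 42; }

/* Fork a child that warms up at once, then blocks on the pipe. */
static int preload(struct preloaded *p, void (*warm_up)(void), int (*run)(void))
{
    int fds[2];
    pid_t pid;

    assert(p != NULL && run != NULL);
    if (pipe(fds) == -1)
        return -1;
    pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        char cmd = 0;
        ssize_t n;

        close(fds[1]);
        if (warm_up != NULL)
            warm_up();                 /* pay the latency while idle */
        n = read(fds[0], &cmd, 1);     /* block until told what to do */
        close(fds[0]);
        _exit((n == 1 && cmd == 'g') ? run() : 0);
    }
    close(fds[0]);
    p->pid = pid;
    p->go_fd = fds[1];
    return 0;
}

/* Send 'g' (go) or 'q' (quit), then return the child's exit status. */
static int release(struct preloaded *p, int go)
{
    int status;
    ssize_t n = write(p->go_fd, go ? "g" : "q", 1);

    close(p->go_fd);
    if (n != 1 || waitpid(p->pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

With 20ish applications this would mean keeping 20ish idle children
around, one per candidate, and sending 'q' to the ones not picked.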

> 
>      Second, if you aren't already using a Xeon with its largest L2
>      cache configuration, you should probably be using a Xeon with its
>      largest L2 cache configuration.  Intel CPUs tend to fall on their
>      face with DATA-memory-intensive applications due to their 
>      undersized caches.   The undersized cache works ok for instructions
>      because instructions are pretty compact, but it does not work
>      well for data.

Because of cost concerns, we're forced into a 586 series CPU.

> 
>      If the box you are using does not have a 100MHz memory bus, you need
>      to get one that does.

Again, not possible. :)

> 
> :While I don't want to get accused of not trying to figure this one out on my
> :own.... Suppose I mmap a large (2MB or more) file. Should any zeroing be
> :going on when I touch those pages for the first time? From the analyzer, it
> :looks like it's zeroing pages before putting what it read from the disk into
> :them, but as you know, figuring out what's really going on by watching a
> :logic analyzer is a form of witchcraft... If this is the case, turning this
> :off would greatly help me. :)
> 
>     It should not be zeroing pages before doing full reads into them.
>     That is pretty well optimized, usually.

What if I'm doing a partial read? Is a partial read even possible if I'm
using the mmap method?

> 
>     Third, Memory->PCI transfers are best done with DMA ( as you
>     already know ).  For a frame store, you can eke out additional
>     PCI bus speed by messing with the burst transfer length ( 
>     especially if the cpu is not heavily involved and can afford 
>     to stall a little more ).  You should be able to push 
>     120 MBytes/sec on a PCI bus by tuning the DMA burst.  
>     The PCI card should have a FIFO big enough to accommodate the
>     burst, too.  If you do a large transfer to a PCI card's frame
>     buffer with memcpy() ( or equivalent ), you eat double the 
>     memory bandwidth plus blow away the data cache on the cpu.

I tried this, but ran into a few problems. 

First, I had to somehow convince the vm system to bring the pages in from
disk before doing the DMA, make sure they were contiguously mapped to
physical RAM, and then force it not to dump them later. I never quite got
past this hurdle. :) Is there a driver somewhere that does this? In my
system, userland code mmap()s data and does memcpy()s to an mmap()ed device
that corresponds directly to the physical frame buffer. I wasn't really sure
how to make sure the data ended up in a nice contiguous buffer to DMA it out
of without doing another copy.

Also, just in a non-working test case, it actually seemed slower doing it
this way, and I lost interest at this point. :)
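
For the "don't dump them later" part only, a userland sketch along these
lines might work: map the file and ask the kernel to keep it resident
with mlock(). This is an assumption-laden illustration, not a fix for the
DMA problem. map_and_pin(), unpin() and struct pinned are hypothetical
names, mlock() can fail without enough privilege or locked-memory limit,
and pinning says nothing about physical contiguity, which userland cannot
request.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical record of a mapped, possibly pinned, file. */
struct pinned {
    void  *addr;
    size_t len;
    int    locked;   /* 1 if mlock() succeeded, 0 if only mapped */
};

/* Map `path` read-only and try to pin it in RAM.
 * Returns 0 on success (even if pinning was refused), -1 on error. */
static int map_and_pin(const char *path, struct pinned *out)
{
    struct stat st;
    int fd;

    assert(path != NULL && out != NULL);
    fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;
    if (fstat(fd, &st) == -1 || st.st_size <= 0) {
        close(fd);
        return -1;
    }
    out->len = (size_t)st.st_size;
    out->addr = mmap(NULL, out->len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);
    if (out->addr == MAP_FAILED)
        return -1;

    /* Faults the pages in and keeps them resident, if permitted. */
    out->locked = (mlock(out->addr, out->len) == 0);
    return 0;
}

/* Undo map_and_pin(). */
static void unpin(struct pinned *p)
{
    if (p->locked)
        munlock(p->addr, p->len);
    munmap(p->addr, p->len);
}
```

The pinned pages can still be scattered anywhere in physical memory, so
the contiguity question above remains open with this approach.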

> 
>     Fourth, if you are doing direct frame store from disk to a
>     PCI card, you may wish to consider building a custom piece
>     of hardware / firmware to actually use the SCSI bus to 
>     transfer the data directly ( i.e. put the frame store *on*
>     the SCSI bus and have it master the data directly from the
>     drives without host intervention ).  This is a rather more
>     complex solution.

We're using IDE too. :)



Thanks,

Kevin





