Date:      Sun, 5 Oct 1997 11:39:56 +0600 (ESD)
From:      "Serge A. Babkin" <babkin@hq.icb.chel.su>
To:        se@freebsd.org (Stefan Esser)
Cc:        hackers@freebsd.org
Subject:   Re: PCI slowness ?
Message-ID:  <199710050539.LAA09048@hq.icb.chel.su>
In-Reply-To: <19971004221205.63474@mi.uni-koeln.de> from "Stefan Esser" at Oct 4, 97 10:12:05 pm



> 
> On 1997-10-04 20:05 +0600, "Serge A. Babkin" <babkin@hq.icb.chel.su> wrote:
> > Hi!
> > 
> > I've made a simple driver to test the SCSI throughput. It takes
> > 2 NCR53c810A SCSI cards and starts to transfer data between them
> > at (theoretically) 10MBps synchronous rate. But in fact I get at most
> > 8.5MBps ! I was able to raise it from 7.5MBps to 8.5MBps
> > by changing the memory access options in NCRs from simplest
> > to maximal optimization so probably the PCI or memory
> > bus limits the throughput. Can anyone suggest what
> > the problem is ?
> 
> You don't tell about your actual setup:
> 
> 1) transfer length ?

1024 bytes

> 2) did disconnects occur ?

No, the phase never changes, the cards just send and receive
those 1K blocks in SCRIPTS loop.

The exact loop is:

loop:
	SCR_NOP,
		0,
	SCR_MOVE_INIT_ABS(1024) ^ SCR_DATA_OUT, /* or MOVE_TARG_ABS */
		&initbf, /* or targbf */
	SCR_COPY(4),
		&initcnt, /* or targcnt */
		RADDR(scr0),
	SCR_CALL,
		PADDR(increase),
	SCR_COPY(4),
		RADDR(scr0),
		&initcnt,
	SCR_JUMP,
		PADDR(loop),

The `increase' subroutine increments scr0..3, treated as one 4-byte
register, by 1.

> 3) does the time measurement include the time to prepare 
>    the transfer (which may be hidden in most cases) ?

It just takes a timer interrupt every 3 seconds, prints out `initcnt' and
`targcnt', and prints the computed transfer rate (the difference
between the current and previous `*cnt', divided by 3).

> 
> Did you take into account, that FAST SCSI uses a 100ns
> cycle length, which allows for 10mio B/s, or 9.5MB/s ?

Sorry, I was wrong. In fact it was 7500 to
8250 1Kbyte transfers per second. I was caught in this trap.
That makes 7.68 to 8.45 mio B/s.
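
For reference, the conversion from block counts to the figures above can be sketched as follows. This is only an illustration: the function names are made up for this note and are not part of the driver, while the 3-second interval and the 1024-byte block size are the ones stated above.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes per second from two samples of a block counter (hypothetical helper). */
uint64_t
rate_bytes_per_sec(uint32_t prev_cnt, uint32_t cur_cnt,
    uint32_t block_bytes, uint32_t interval_s)
{
	assert(interval_s > 0);
	/* Unsigned subtraction stays correct across a single counter wrap. */
	uint32_t blocks = cur_cnt - prev_cnt;

	return ((uint64_t)blocks * block_bytes / interval_s);
}

/* The same rate expressed in power-of-2 megabytes (2^20 bytes). */
double
bytes_to_mb(uint64_t bytes)
{
	return ((double)bytes / (1024.0 * 1024.0));
}
```

With 7500 and 8250 blocks per second this gives 7680000 and 8448000 bytes/s. By the same conversion, 10 mio B/s is about 9.54 MB/s when MB means 2^20 bytes.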

> 
> If you transfer 64KB blocks (which takes 6.7ms at 10MHz)
> and the SCSI overhead is 1ms, then you'll see an actual
> transfer rate of 8.3MB/s.

There should be no SCSI overhead because I do not
reestablish the connections and do not change phases. But I never
thought that SCRIPTS overhead could be so high.
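
To put a number on that overhead: at 10 mio B/s a 1024-byte block occupies the bus for 102.4 us, so anything beyond that per loop iteration is time in which the SCSI bus sits idle. A rough sketch of the arithmetic follows. The function names are invented here, and it assumes the 10 MHz 8-bit synchronous rate and the block rates quoted in this message.

```c
#include <assert.h>

/* Time one block occupies the bus, in microseconds. */
double
wire_time_us(double block_bytes, double bus_bytes_per_sec)
{
	assert(bus_bytes_per_sec > 0.0);
	return (block_bytes / bus_bytes_per_sec * 1e6);
}

/* Idle time per block implied by a measured block rate, in microseconds. */
double
overhead_per_block_us(double blocks_per_sec, double block_bytes,
    double bus_bytes_per_sec)
{
	assert(blocks_per_sec > 0.0);
	return (1e6 / blocks_per_sec -
	    wire_time_us(block_bytes, bus_bytes_per_sec));
}

/* Effective byte rate when a fixed overhead is added to every transfer. */
double
effective_rate(double xfer_bytes, double bus_bytes_per_sec, double overhead_s)
{
	assert(bus_bytes_per_sec > 0.0);
	return (xfer_bytes / (xfer_bytes / bus_bytes_per_sec + overhead_s));
}
```

For 8250 blocks/s this gives about 18.8 us of overhead per iteration, and about 30.9 us for 7500 blocks/s. The same formula applied to a 64KB transfer with 1ms of overhead gives about 8.27 MB/s (2^20-based), in line with the 8.3MB/s figure quoted above.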

> 
> > The chipset is Intel Triton on some Chinese motherboard
> > with 75MHz Pentium, memory is 60ns EDO. 
> > Theoretical PCI throughput is 33M
> > of 4-byte transfers per second (the card claims to work
> > in burst mode). Theoretical memory throughput is at least
> > 10M of 4-byte transfers per second if we suppose that
> > the memory cycle with all overhead is 100ns and the
> > card reads by 4 bytes at a time. But the experiment
> > shows throughput of only 17MBps or 4.25M of 4 byte
> > transfers. Does the processor eat all the remaining 
> > throughput (although I think it must load most of the
> > code it runs at idling into the cache) ?
> 
> The PCI accesses don't limit throughput in your case.
> But you should allow for large bursts and should make
> sure, that read-multiple and write-and-invalidate PCI
> commands are used.

How can I do that ? Should I look at the PCI bus driver ?

> 
> What's the setting of the master latency-timer and 
> the NCR latency timers ?

scntl3=0x13
sxfer=0x04

Do you mean them ?

> Is the cache line size register set correctly ?
> What's the burst length limit chosen in DMODE ?

dmode=0xce;
dcntl=0x20|NOCOM;

I can describe the history:

Initial speed: 7.5 mio

Using EXT flag in SCNTL2: no difference

Enabled ERL, ERMP, BOF in DMODE. Got speed around 8.1 mio

Enabled CLSE, PFEN in DCNTL. Got speed around 8.25 mio

Removed CLSE in DCNTL : no difference
This seems strange, because it should reduce the granularity of
memory accesses from 16 to 4 bytes. Maybe EDO memory
is the reason it had no effect at all.

Reduced SCSI offset in SXFER from 8 to 4: no difference
This must show that SCSI is not limiting the speed.
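
The register values reported above can be decoded with a few masks. This is a sketch only: the X_* names are invented for this note, and the bit positions (burst length in DMODE bits 7-6, ERL 0x08, ERMP 0x04, BOF 0x02, CLSE 0x80, PFEN 0x20, NOCOM 0x01) are assumptions that should be checked against the 53C810A data manual and ncrreg.h before relying on them.

```c
#include <stdint.h>

/* DMODE bits (assumed values, verify against ncrreg.h). */
#define X_DMODE_BL_MASK	0xc0	/* burst length select, bits 7-6 */
#define X_DMODE_ERL	0x08	/* enable read line */
#define X_DMODE_ERMP	0x04	/* enable read multiple */
#define X_DMODE_BOF	0x02	/* burst op code fetch */

/* DCNTL bits (assumed values, verify against ncrreg.h). */
#define X_DCNTL_CLSE	0x80	/* cache line size enable */
#define X_DCNTL_PFEN	0x20	/* prefetch enable */
#define X_DCNTL_NOCOM	0x01	/* no 53C700 compatibility */

/* Burst length in transfers, assuming the encoding 00=2, 01=4, 10=8, 11=16. */
unsigned
dmode_burst_len(uint8_t dmode)
{
	return (2u << ((dmode & X_DMODE_BL_MASK) >> 6));
}

/* True if every bit of mask is set in reg. */
int
reg_has(uint8_t reg, uint8_t mask)
{
	return ((reg & mask) == mask);
}
```

Under these assumptions dmode=0xce selects a burst length of 16 transfers with ERL, ERMP and BOF set, and dcntl=0x20|NOCOM has PFEN set and CLSE clear, which matches the last steps of the history above.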


> 
> > And another thing. I know that expensive machines like
> > DEC Prioris have possibilities to change some PCI
> > timing parameters for PCI cards. My cheap box does
> > not have anything like that. Maybe that's the problem ?
> 
> Which PCI timing parameters ???

Per-card latency timers, if I remember correctly. Increasing
them led to lower throughput of the Adaptec under SCO.
We had problems because 2 of 4 machines had only one
interrupt used for the Adaptec instead of two. Before
we discovered the cause we tried increasing the latencies.
This made things slower, but the machines panicked less frequently.

> 
> You can modify PCI chip set parameters, but most should
> already be set to best values (i.e. all optimisations
> should be enabled by default).
> 
> Increasing the value of the latency timer may improve PCI 
> bus throughput, but its effect is often over-estimated.

Hm. Maybe I decreased them on the Prioris, not increased them as
I wrote earlier. I can't remember now.

> 
> You should make sure, that the NCR cards are configured 
> correctly, and you should take into account, that 
> the SCSI standard uses powers of 10, while throughput 
> is measured in MB/s which are based on powers of 2 ...

I've checked the Symbios examples and it seems to me that
I do everything the way they do.

> 
> You want to overlap the preparation of the next transfer
> with the execution of the previous one, and you want to
> use transfer lengths long enough to hide the SCSI command
> overhead (which may be as low as 50us, but it takes some
> effort to get it that low ...).

I tried to get rid of it entirely. The only possible
reason left for the slowdown is SCRIPTS overhead.

Thanks!

-SB


