Date:      Sun, 05 Jul 2009 20:12:08 +0300
From:      Alexander Motin <mav@FreeBSD.org>
To:        Bruce Evans <brde@optusnet.com.au>
Cc:        freebsd-arch@FreeBSD.org
Subject:   Re: DFLTPHYS vs MAXPHYS
Message-ID:  <4A50DEE8.6080406@FreeBSD.org>
In-Reply-To: <20090706005851.L1439@besplex.bde.org>
References:  <4A4FAA2D.3020409@FreeBSD.org> <20090705100044.4053e2f9@ernst.jennejohn.org> <4A50667F.7080608@FreeBSD.org> <20090705223126.I42918@delplex.bde.org> <4A50BA9A.9080005@FreeBSD.org> <20090706005851.L1439@besplex.bde.org>

Bruce Evans wrote:
> On Sun, 5 Jul 2009, Alexander Motin wrote:
>> Bruce Evans wrote:
>>> On Sun, 5 Jul 2009, Alexander Motin wrote:
>>> 64K is large enough to bust modern L1 caches and old L2 caches.  Make 
>>> the
>>> size bigger to bust modern L2 caches too.  Interrupt rates don't matter
>>> when you are transfering 64K items per interrupt.
>>
>> How cache size related to it, if DMA transfers data directly to RAM? 
>> Sure, CPU will invalidate related cache lines, but why it should 
>> invalidate everything?
> 
> I was thinking more of transfers to userland.  Increasing user buffer
> sizes above about half the L2 cache size guarantees busting the L2
> cache, if the application actually looks at all of its data.  If the
> data is read using read(), then the L2 cache will be busted twice (or
> a bit less with nontemporal copying), first by copying out the data
> and then by looking at it.  If the data is read using mmap(), then the
> L2 cache will only be busted once.  This effect has always been very
> noticeable using dd.  Larger buffer sizes are also bad for latency.
> 
>> Small transfers give more work to all levels from GEOM down to 
>> CAM/ATA, controllers and drives. It is not just a context switching.
> 
> Yes, I can't see any cache busting below the level of copyout().  Also,
> after you convert all applications to use mmap() instead of read(),
> the cache busting should become per-CPU.

Since file data usually passes through the buffer cache, it will be read 
into separate memory areas and copied out from there in any case. So I 
don't see much difference there between doing one big transaction and 
several small ones. Cache thrashing at user level will also depend only 
on the user application's buffer size, not on the kernel's.

How to reproduce that dd experiment? I have my system running with 
MAXPHYS of 512K and here is what I have:

# dd if=/dev/ada0 of=/dev/null bs=512k count=1000
1000+0 records in
1000+0 records out
524288000 bytes transferred in 2.471564 secs (212128024 bytes/sec)
# dd if=/dev/ada0 of=/dev/null bs=256k count=2000
2000+0 records in
2000+0 records out
524288000 bytes transferred in 2.666643 secs (196609752 bytes/sec)
# dd if=/dev/ada0 of=/dev/null bs=128k count=4000
4000+0 records in
4000+0 records out
524288000 bytes transferred in 2.759498 secs (189993969 bytes/sec)
# dd if=/dev/ada0 of=/dev/null bs=64k count=8000
8000+0 records in
8000+0 records out
524288000 bytes transferred in 2.718900 secs (192830927 bytes/sec)

CPU load instead grows from 10% at 512K to 15% at 64K. Maybe the 
thrashing effect will only be noticeable at block sizes comparable to 
the cache size, but modern CPUs have megabytes of cache.

-- 
Alexander Motin
