Date: Sun, 05 Jul 2009 20:12:08 +0300
From: Alexander Motin <mav@FreeBSD.org>
To: Bruce Evans <brde@optusnet.com.au>
Cc: freebsd-arch@FreeBSD.org
Subject: Re: DFLTPHYS vs MAXPHYS
Message-ID: <4A50DEE8.6080406@FreeBSD.org>
In-Reply-To: <20090706005851.L1439@besplex.bde.org>
References: <4A4FAA2D.3020409@FreeBSD.org> <20090705100044.4053e2f9@ernst.jennejohn.org> <4A50667F.7080608@FreeBSD.org> <20090705223126.I42918@delplex.bde.org> <4A50BA9A.9080005@FreeBSD.org> <20090706005851.L1439@besplex.bde.org>
Bruce Evans wrote:
> On Sun, 5 Jul 2009, Alexander Motin wrote:
>> Bruce Evans wrote:
>>> On Sun, 5 Jul 2009, Alexander Motin wrote:
>>> 64K is large enough to bust modern L1 caches and old L2 caches. Make the
>>> size bigger to bust modern L2 caches too. Interrupt rates don't matter
>>> when you are transferring 64K items per interrupt.
>>
>> How is cache size related to this, if DMA transfers the data directly to
>> RAM? Sure, the CPU will invalidate the related cache lines, but why
>> should it invalidate everything?
>
> I was thinking more of transfers to userland. Increasing user buffer
> sizes above about half the L2 cache size guarantees busting the L2
> cache, if the application actually looks at all of its data. If the
> data is read using read(), then the L2 cache will be busted twice (or
> a bit less with nontemporal copying), first by copying out the data
> and then by looking at it. If the data is read using mmap(), then the
> L2 cache will only be busted once. This effect has always been very
> noticeable using dd. Larger buffer sizes are also bad for latency.
>
>> Small transfers give more work to all levels from GEOM down to
>> CAM/ATA, controllers and drives. It is not just context switching.
>
> Yes, I can't see any cache busting below the level of copyout(). Also,
> after you convert all applications to use mmap() instead of read(),
> the cache busting should become per-CPU.

Since file data usually passes through the buffer cache, it will in any
case be read into different memory areas and copied out from them. So I
don't see much difference there between doing a single big transaction
and several small ones. Cache thrashing at user level will also depend
only on the user-level application's buffer size, not on the kernel's.
How can I reproduce that dd experiment?
I have my system running with MAXPHYS of 512K and here is what I have:

# dd if=/dev/ada0 of=/dev/null bs=512k count=1000
1000+0 records in
1000+0 records out
524288000 bytes transferred in 2.471564 secs (212128024 bytes/sec)
# dd if=/dev/ada0 of=/dev/null bs=256k count=2000
2000+0 records in
2000+0 records out
524288000 bytes transferred in 2.666643 secs (196609752 bytes/sec)
# dd if=/dev/ada0 of=/dev/null bs=128k count=4000
4000+0 records in
4000+0 records out
524288000 bytes transferred in 2.759498 secs (189993969 bytes/sec)
# dd if=/dev/ada0 of=/dev/null bs=64k count=8000
8000+0 records in
8000+0 records out
524288000 bytes transferred in 2.718900 secs (192830927 bytes/sec)

CPU load instead grows from 10% at 512K to 15% at 64K. Maybe the
thrashing effect will only be noticeable at block sizes comparable to
the cache size, but modern CPUs have megabytes of cache.

-- 
Alexander Motin