Date:      Mon, 6 Jul 2009 02:46:37 +1000 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        Alexander Motin <mav@FreeBSD.org>
Cc:        gary.jennejohn@freenet.de, freebsd-arch@FreeBSD.org
Subject:   Re: DFLTPHYS vs MAXPHYS
Message-ID:  <20090706005851.L1439@besplex.bde.org>
In-Reply-To: <4A50BA9A.9080005@FreeBSD.org>
References:  <4A4FAA2D.3020409@FreeBSD.org> <20090705100044.4053e2f9@ernst.jennejohn.org> <4A50667F.7080608@FreeBSD.org> <20090705223126.I42918@delplex.bde.org> <4A50BA9A.9080005@FreeBSD.org>

On Sun, 5 Jul 2009, Alexander Motin wrote:

> Bruce Evans wrote:
>> On Sun, 5 Jul 2009, Alexander Motin wrote:
>>>>> Isn't it time to review their values and increase them? 64KB looks funny
>>>>> compared to modern memory sizes and data rates. It just increases
>>>>> interrupt rates, but I don't think it really needs to be so small to
>>>>> improve interactivity now.
>> 
>> 64K is large enough to bust modern L1 caches and old L2 caches.  Make the
>> size bigger to bust modern L2 caches too.  Interrupt rates don't matter
>> when you are transferring 64K items per interrupt.
>
> How is cache size related to it, if DMA transfers data directly to RAM? Sure,
> the CPU will invalidate the related cache lines, but why should it invalidate
> everything?

I was thinking more of transfers to userland.  Increasing user buffer
sizes above about half the L2 cache size guarantees busting the L2
cache, if the application actually looks at all of its data.  If the
data is read using read(), then the L2 cache will be busted twice (or
a bit less with nontemporal copying), first by copying out the data
and then by looking at it.  If the data is read using mmap(), then the
L2 cache will only be busted once.  This effect has always been very
noticeable using dd.  Larger buffer sizes are also bad for latency.
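A minimal sketch of the two access patterns described above, in C using only POSIX calls. sum_with_read() touches the data twice (once when the kernel copies it out into a user buffer, once in the summing loop), while sum_with_mmap() maps the file and touches each byte once. The function names are hypothetical, bufsize is the knob that should stay well below half the L2 cache size, and nothing here measures cache behaviour by itself.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Sum every byte of an open file using read().  The kernel copies each
 * chunk into buf (first pass through the cache), then the loop reads buf
 * again (second pass).  Keeping bufsize well under half the L2 cache size
 * should let the second pass hit data the copy just brought into cache.
 */
static int
sum_with_read(int fd, size_t bufsize, uint64_t *sum)
{
	unsigned char *buf;
	ssize_t n;

	*sum = 0;
	if (bufsize == 0 || lseek(fd, 0, SEEK_SET) == -1)
		return (-1);
	if ((buf = malloc(bufsize)) == NULL)
		return (-1);
	while ((n = read(fd, buf, bufsize)) > 0) {
		for (ssize_t i = 0; i < n; i++)
			*sum += buf[i];
	}
	free(buf);
	return (n == 0 ? 0 : -1);
}

/*
 * Sum every byte using mmap().  The file's pages are mapped rather than
 * copied, so the data passes through the cache only when the loop reads it.
 */
static int
sum_with_mmap(int fd, uint64_t *sum)
{
	struct stat st;
	unsigned char *p;

	*sum = 0;
	if (fstat(fd, &st) == -1)
		return (-1);
	if (st.st_size == 0)
		return (0);	/* mmap() rejects a zero length */
	p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED)
		return (-1);
	for (off_t i = 0; i < st.st_size; i++)
		*sum += p[i];
	munmap(p, (size_t)st.st_size);
	return (0);
}
```

To observe the effect, one would time both functions on a cached file larger than the L2 cache while varying bufsize.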

> Small transfers give more work to all levels from GEOM down to CAM/ATA, 
> controllers and drives. It is not just a context switching.

Yes, I can't see any cache busting below the level of copyout().  Also,
after you convert all applications to use mmap() instead of read(),
the cache busting should become per-CPU.
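The extra work from small transfers that Alexander mentions comes down to request counts: a request larger than the transfer limit has to be cut into pieces, and each piece is handled separately by GEOM, CAM/ATA, the controller and the drive. A minimal sketch of such a splitting loop follows; it is not GEOM's code, and struct io_chunk and split_request() are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor for one piece of a split request. */
struct io_chunk {
	uint64_t offset;	/* byte offset on the device */
	uint64_t length;	/* bytes in this piece, <= maxio */
};

/*
 * Split [offset, offset + length) into pieces of at most maxio bytes.
 * Fills up to max_chunks entries of out and returns the total number of
 * pieces needed (which may exceed max_chunks), or 0 if maxio is 0.
 */
static size_t
split_request(uint64_t offset, uint64_t length, uint64_t maxio,
    struct io_chunk *out, size_t max_chunks)
{
	size_t n = 0;

	if (maxio == 0)
		return (0);
	while (length > 0) {
		uint64_t len = length < maxio ? length : maxio;

		if (n < max_chunks) {
			out[n].offset = offset;
			out[n].length = len;
		}
		n++;
		offset += len;
		length -= len;
	}
	return (n);
}
```

With a 64K limit a 1 MiB read becomes 16 requests; with a 1 MiB limit it stays one. At a sustained 100 MB/s (10^8 bytes/s), 64K pieces mean about 1526 requests per second and 1 MiB pieces about 95, which is also the completion-interrupt rate if each request raises exactly one interrupt.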

>>>> I wonder whether all drivers can correctly handle larger values for
>>>> DFLTPHYS.
>> 
>> Most can't, since their hardware can't.  They can fake it (ata used to)
>> but there is negative point in this for most drivers, since geom already
>> reblocks for disk devices and reblocking would be wrong for devices like
>> tapes.
>
> I am not talking about reblocking. I am talking about the best possible
> hardware usage. I can't speak for most of them, but at least AHCI and modern
> SiI SATA chips, which I have worked with closely, have practically no limits
> on transaction size, except for the amount of memory their drivers allocate
> for the S/G table. My new drivers are able to self-tune for any MAXPHYS value.

The main limit above ata seems to be only MAXPHYS and its use in pbufs.
DFLTPHYS seems to only be used in buggy unimportant cases.
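A driver that "self-tunes for any MAXPHYS value", as Alexander describes, can be sketched as sizing its per-command scatter/gather table from MAXPHYS at attach time. This is a minimal sketch under stated assumptions, not code from the AHCI or SiI drivers: one S/G entry per page with no merging of physically contiguous pages, a buffer that may start mid-page, and an entry size supplied by the caller (16 bytes, the size of an AHCI PRD entry, is used only as an example). The helper names are hypothetical.

```c
#include <stddef.h>

/*
 * Hypothetical sizing helper: S/G entries needed for a transfer of up to
 * maxphys bytes, assuming one entry per page.  A buffer that does not
 * start on a page boundary can span one more page than an aligned one,
 * so reserve one extra entry; this is exact when maxphys is a multiple
 * of page_size and may over-reserve by one otherwise.
 */
static size_t
sg_entries_for(size_t maxphys, size_t page_size)
{
	return ((maxphys + page_size - 1) / page_size + 1);
}

/* Bytes of S/G table to allocate per command slot. */
static size_t
sg_table_bytes(size_t maxphys, size_t page_size, size_t entry_size)
{
	return (sg_entries_for(maxphys, page_size) * entry_size);
}
```

With 4 KiB pages this gives 17 entries for a 64K MAXPHYS and 257 for 1 MiB, i.e. 4112 bytes of table per command slot at 16 bytes per entry, so the table memory grows linearly with MAXPHYS.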

Bruce


