Skip site navigation (1)Skip section navigation (2)
Date:      Fri, 10 Apr 2009 02:46:49 +1000 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        Christoph Mallon <christoph.mallon@gmx.de>
Cc:        Ed Schouten <ed@FreeBSD.org>, freebsd-i386@FreeBSD.org, freebsd-amd64@FreeBSD.org
Subject:   Re: [PATCH] Simplify in*() and out*() functions of AMD64 and i386
Message-ID:  <20090410001919.S1567@besplex.bde.org>
In-Reply-To: <49DD01CA.3040900@gmx.de>
References:  <49DD01CA.3040900@gmx.de>

next in thread | previous in thread | raw e-mail | index | archive | help
On Wed, 8 Apr 2009, Christoph Mallon wrote:

> attached is a patch which simplifies the in*() and out*() functions for I/O 
> port access of AMD64 and i386. It removes an unnecessary distinction of cases 
> for inb() and outb(), which was used to generate better code for ports < 256.

In gcc >= 2.  (gcc == 1 didn't even have __builtin_const_p(), so the
optimization was too hard to do).

> This is unnecessary, because GCC supports an asm input constraint to handle 
> this ("N"). The stale comment, which states there is no constraint for this, 
> is removed, too.

This was necessary, but is stale like its comment.  There is of course no
need to support gcc 1 or 2.

> Also the {in,out}{w,l}() get treated with this constraint. 
> They had no special case before, so now better code is generated for them.

Actually, worse code is generated for them on most cases since the
optimization of loading 32 bits for the usual case of a non-constant
operand is lost in your version -- see below.  The case of a constant
port address was rare except for 8-bit isa ports "on the motherboard"
and has become rarer with FreeBSD's runtime configuration becoming
even more excessive (so instead of "inb $N,%al", the code is normally
about 10 instructions for bus_space_read_1(sc->bst, sc->bsh, N1)).

> Further, the unnecessary "cld" is removed from {in,out}s{b,w,l}(), because it 
> is guaranteed by the ABI that the direction flag is cleared. All in all the 
> code for in/out gets a bit simpler.

kib@ already pointed out that the direction flag is not guaranteed to be
cleared in the kernel.  For amd64, this may be a bug in the kernel, but
for i386 it is a bug in gcc say that the ABI specifies that the direction
flag is cleared, since there is no standard i386 ABI so gcc must not
assume that the direction flag is clear, like it has done until recently.

There are lots more similar cld's in bus.h.

% Index: sys/i386/include/cpufunc.h
% ===================================================================
% --- sys/i386/include/cpufunc.h	(Revision 190841)
% +++ sys/i386/include/cpufunc.h	(Arbeitskopie)
% @@ -170,177 +170,97 @@
%  	__asm __volatile("hlt");
%  }
% 
% -#if !defined(__GNUCLIKE_BUILTIN_CONSTANT_P) || __GNUCLIKE_ASM < 3
% -
% -#define	inb(port)		inbv(port)
% -#define	outb(port, data)	outbv(port, data)
% -
% -#else /* __GNUCLIKE_BUILTIN_CONSTANT_P && __GNUCLIKE_ASM >= 3 */

The ifdefs are ugly, and this one seems to have been slightly wrong, but
they provide documentation aboout the features used:
- __GNUCLIKE_BUILTIN_CONSTANT_P: no longer needed
- __GNUCLIKE_ASM >= 3: to ensure that the "i" constraint is available.
   This used to be __GNUC__ >= 2.  If that was correct then >= 3 is
   stricter than necessary; otherwise >= 2 was not as strict as it
   should have been, since some versions of gcc-2 certainly supported
   the "i" constraint.

A precise translation of the above would be __GNU_PREREQ__(maj, min) ||
defined(__INTEL_COMPILER), where (maj, min) is the version of gcc that
implemented the "N" constraint and you should check that the Intel
compiler (all version?) support the "N" constraint.

% - * The unnecessary test `(port) < 0x10000' is to generate a warning if
% - * the `port' has type u_short or smaller.  Such types are pessimal.
% - * This actually only works for signed types.  The range check is
% - * careful to avoid generating warnings.

This won't work any more, if it ever did.  I think it only worked for
the constant case, but for the constant case the width doesn't really
matter.

% -static __inline u_char
% -inbc(u_int port)
% +static inline u_char
% +inb(u_short port)

__inline should be globabally changed to inline someday, not starting here.
Plain inline has been Standard for 10 years now, but it still isn't
standard -- we're still waiting for gcc to implement C99, and haven't
switched to gcc-4.3(?) or later which will implement C99 extern inline.

Changing the type changes the API and breaks an optimization.  Callers
are supposed to provide an arg of width 32 so that it can be loaded
as efficiently as possible (instruction prefixes and/or sign extension
waste code space and time, mainly on old CPUs).  The 0x10000 check
referred to in the above comment attempts to enforce this.  Casting
down in the API prevents the 32-bit loads.

%  {
% -	u_char	data;
% -
% -	__asm __volatile("inb %1,%0" : "=a" (data) : "id" ((u_short)(port)));

The port address was already cast down here in inbc().  This seems to
have been just because I didn't know gnu asm very well when I wrote
the above.  The cast has no effect in the usual case where the "i"
constraint is used, but when the "d" constraint is used (only when
compiling with -O0?) %1 would be %edx without the cast.  In inbv(), I
hard-code %dx, but that does't work here "i" case.  The correct method
seems to be to use %w1 in both places (keep u_int port everywhere and
use the same method for the "N" constraint).

% -	return (data);
% +	u_char data;
% +	__asm volatile("inb %1, %0" : "=a" (data) : "Nd" (port));
% +	return data;
%  }

Style breakages:
- lost blank line after declarations.  This file followed the rule about
   this except for previous breakage.  It even followed it in some cases
   where the declarations are null.
- lost parentheses around return value.  This file followed the rule about
   this in all cases.
Style not-quite-fix:
- lost the tab after the type in the declaration.  In KNF, such a tab is
   not used for auto declarations.  However, this file was fairly consistent
   about breaking the rule.

Why change __volatile to volatile?  This is similar to changing __inline
to inline, except plain volatile is more standard.  <sys/cdefs.h> has
ifdef tangles which are supposed to allow users or cdefs.h itself to
define away volatile so as to support old compilers, and we use __volatile
instead of volatile a lot to support this.  This is probably not needed
for kernel headers.  However, cpufunc.h is sometimes abused in userland.

Fixed version (change nothing except the function name, %1, and the
constraint):

%%%
static __inline u_char
inb(u_int port)
{
 	u_char	data;

 	__asm __volatile("inb %w1, %0" : "=a" (data) : "Nd" (port));
 	return (data);
}
%%%

Similarly for the other functions.

BTW, I sometimes missing having the inverse of %[bwl...] -- a way to
generate an operand size letter from the type.  gcc-3 and gcc-4 have
some horribly incomplete support for this.

% -#endif /* __GNUCLIKE_BUILTIN_CONSTANT_P  && __GNUCLIKE_ASM >= 3*/

Removing this accidentally fixes >= 2 style bugs in it.

% -static __inline u_char
% -inbv(u_int port)
% -{
% -	u_char	data;
% -	/*
% -	 * We use %%dx and not %1 here because i/o is done at %dx and not at
% -	 * %edx, while gcc generates inferior code (movw instead of movl)
% -	 * if we tell it to load (u_short) port.
% -	 */

Lost this documentation.  Now the magic in my version is in %w in my version.
The comment shouldn't blame gcc for generating movw -- that is the
programmer's fault.

gcc on modern machines now generates movswl for loading a u_short port
to the u_int port specified by the old API, and movswl is fast on modern
machines, so there is no penalty in using the old API/implementation
on modern machines even if the caller neglected to start with a u_int
port.

% ...
% -static __inline void
% -insb(u_int port, void *addr, size_t cnt)
% +static inline void
% +insw(u_short port, void *addr, size_t cnt)
%  {
% -	__asm __volatile("cld; rep; insb"
% -			 : "+D" (addr), "+c" (cnt)
% -			 : "d" (port)
% -			 : "memory");
% +	__asm volatile("rep; insw"
% +	    : "+D" (addr), "+c" (cnt)
% +	    : "d" (port)
% +	    : "memory");
%  }

The diffs are hard to read, apparently due to diff getting confused
by your API and style changes and comparing unrelated functions.
Without the API changes, I think it would have matched insb with insb,
etc.

% -static __inline void
% -outbv(u_int port, u_char data)
% +static inline void
% +outb(u_short port, u_char data)
%  {
% -	u_char	al;
% -	/*
% -	 * Use an unnecessary assignment to help gcc's register allocator.
% -	 * This make a large difference for gcc-1.40 and a tiny difference
% -	 * for gcc-2.6.0.  For gcc-1.40, al had to be ``asm("ax")'' for
% -	 * best results.  gcc-2.6.0 can't handle this.
% -	 */
% -	al = data;

This can go of course.  I saw dummy assignments bogusly helping the
allocator in more complicated asms for a version of atomic_cmpset()
as late as gcc-3, but that seemed to be fixed by gcc-3.3.

% Index: sys/i386/i386/machdep.c
% ===================================================================
% --- sys/i386/i386/machdep.c	(Revision 190841)
% +++ sys/i386/i386/machdep.c	(Arbeitskopie)
% @@ -3555,45 +3555,24 @@
%  #ifdef KDB
% 
%  /*
% - * Provide inb() and outb() as functions.  They are normally only
% - * available as macros calling inlined functions, thus cannot be
% - * called from the debugger.
% - *
% - * The actual code is stolen from <machine/cpufunc.h>, and de-inlined.
% + * Provide inb() and outb() as functions.  They are normally only available as
% + * inline functions, thus cannot be called from the debugger.
%   */

This should have been implemented by using cpufunc.h instead of repeating
it, and for everything in cpufunc.h not just inb and outb, e.g., as is
done for everything in atomic.h.  This gives a technical reason for
using __inline -- we want to #undef it and shouldn't #undef a standard
keyword.

% 
% -#undef inb
% -#undef outb
% -
%  /* silence compiler warnings */
% -u_char inb(u_int);
% -void outb(u_int, u_char);
% +u_char inb_(u_short);
% +void outb_(u_short, u_char);
% 
%  u_char
% -inb(u_int port)
% +inb_(u_short port)
%  {
% -	u_char	data;
% -	/*
% -	 * We use %%dx and not %1 here because i/o is done at %dx and not at
% -	 * %edx, while gcc generates inferior code (movw instead of movl)
% -	 * if we tell it to load (u_short) port.
% -	 */
% -	__asm __volatile("inb %%dx,%0" : "=a" (data) : "d" (port));
% -	return (data);
% +	return inb(port);
%  }
% 
%  void
% -outb(u_int port, u_char data)
% +outb_(u_short port, u_char data)
%  {
% -	u_char	al;
% -	/*
% -	 * Use an unnecessary assignment to help gcc's register allocator.
% -	 * This make a large difference for gcc-1.40 and a tiny difference
% -	 * for gcc-2.6.0.  For gcc-1.40, al had to be ``asm("ax")'' for
% -	 * best results.  gcc-2.6.0 can't handle this.
% -	 */
% -	al = data;
% -	__asm __volatile("outb %0,%%dx" : : "a" (al), "d" (port));
% +	outb(port, data);
%  }
% 
%  #endif /* KDB */

This changes the (ddb undocumented) API and ABI to worse ones.  I
already don't line having to type "call inb(foo)" (in my x86 debugger,
inb is a command named "i", with certain space insensitivity that
allows typing "ifoo").

Bruce



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20090410001919.S1567>