Date:      Mon, 12 May 2014 03:43:24 +1000 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        Nathan Whitehorn <nwhitehorn@freebsd.org>
Cc:        svn-src-head@freebsd.org, svn-src-all@freebsd.org, src-committers@freebsd.org, Bruce Evans <brde@optusnet.com.au>
Subject:   Re: svn commit: r265864 - head/sys/dev/vt/hw/ofwfb
Message-ID:  <20140512015015.G1959@besplex.bde.org>
In-Reply-To: <536F9864.9080606@freebsd.org>
References:  <201405110158.s4B1wvFA072381@svn.freebsd.org> <20140511133517.N1100@besplex.bde.org> <536F9864.9080606@freebsd.org>

On Sun, 11 May 2014, Nathan Whitehorn wrote:

> On 05/10/14 23:51, Bruce Evans wrote:
>> On Sun, 11 May 2014, Nathan Whitehorn wrote:
>> Only 10% slower?  Bitmapped mode with 256 colors is inherently 4 times
>> slower for an 8x8 font (8 bytes/char instead of 2) and 8 times slower for
>> an 8x16 font.  That's without any I/O pathology.  Perhaps you are comparing
>> with a syscons that is already very slow due to the hardware not supporting
>> text mode.
>> 
>> However, syscons has buffering that should limit this problem.
>
> This is indeed comparison to syscons in bitmap mode. PowerPC has no VGA text 
> mode, so that's the best we could do. That using newcons's bitmap console 
> instead of syscons's bitmap console almost tripled my boot time, however, was 
> totally unreasonable and needed fixing. Whatever buffering syscons may have 
> beyond what newcons has is at most a 10% thing.

Really?  The slowdown would need to be a factor of several hundred to
be noticeable and several thousand to be painful.

Syscons actually uses a slow mode with non-delayed update for kernel
messages.  This means that it must be many times slower than
the best case.  Calculations below.

>> A correctly-implemented console driver doesn't have itty-bitty hardware
>> i/o like the old version of this or itty-bitty buffering like the changed
>> version.
>
> There are many deficiencies in the general approach being used here. I'm 
> trying to patch it just to work for the time being so that it isn't a huge 
> regression in console performance compared to syscons. Hopefully, the general 
> architectural issues -- which you outline well below -- get solved in due 
> course. This patch at least fixes the immediate problem.

Some more details on the timing...

>> Some old screen benchmarks.  The benchmark is basically to write lines
>> of the screen width and scroll.  I stopped updating this often about 15
>> years ago when frame buffers and CPUs became fast enough.  But it appears
>> that software bloat and design errors have caught up.

It is difficult to generate data fast enough for syscons to be the
bottleneck.  My simple test program does 1-char writes so the syscall
overhead dominates.  I must have used a variation of it in the old
tests, but can't remember what.  So I reran some tests using:

   dd if=/dev/zero bs=1000000 count=many | tr '\000' c | time dd bs=10000000

(or with bs reduced to time slow cases).  Here tr is barely fast enough
to not be a bottleneck.  The final dd was needed to reblock, else
tr sometimes does 1-char writes.  c is either 'p' to test normal output,
or '\012' to test scrolling.
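
The pipeline above can also be approximated by a single program that writes large blocks directly, so that neither tr nor 1-char writes get in the way.  A minimal sketch, assuming Python is available on the test machine; the function name time_writes, the block size and the byte counts are hypothetical and not part of the original test:

```python
import os
import time


def time_writes(fd, byte, total_bytes, block_size=1_000_000):
    """Write total_bytes copies of a single byte to fd in large blocks.

    Returns (elapsed_seconds, bytes_per_second).
    """
    block = byte * block_size
    remaining = total_bytes
    start = time.perf_counter()
    while remaining > 0:
        # os.write may accept less than requested, so count what was written.
        written = os.write(fd, block[:min(block_size, remaining)])
        remaining -= written
    elapsed = time.perf_counter() - start
    rate = total_bytes / elapsed if elapsed > 0 else float("inf")
    return elapsed, rate
```

Run on the console under test, something like time_writes(1, b"p", 100_000_000) would correspond to the normal-output case and time_writes(1, b"\n", 1_000_000) to the scrolling case.  The interpreter overhead is small relative to the writes only because the blocks are large.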

>> % machine     video        O/S              where       real  user  sys   speed
>> % ---------   -------      --------------   ----------- ----  ----  ----  -----
>> % A/2223 PCI  R9200SE      FreeBSD-5.2m     onscreen-o  .026  0.00  .026  76.9
>> % A/2223 PCI  R9200SE      FreeBSD-5.2m     offscreen-o .026  0.00  .026  76.9
>> % A/2223 PCI  R9200SE      FreeBSD-5.2m     onscreen    .031  0.00  .031  64.5
>> % A/2223 PCI  R9200SE      FreeBSD-5.2m     offscreen   .031  0.00  .031  64.5
>> 
>> An 11 year old system.
>> ...
>> I forget the units for these measurements, except that the speed column
>> gives a bandwidth in MB/sec (I don't remember if this is for write(2)
>> bandwidth or is related to frame buffer bandwidth).  Interpret them as
>> relative.

The speed is for write(2) bandwidth, times 2 for character+attribute.  It
is for writing p's.  It is close to the frame buffer bandwidth.

You can pessimize this speed by a factor of 1000 and still have a usable
console.  More than 30 thousand characters/sec instead of more than 30
million.  I find 11520 for a serial tty noticeably slow, but useable.
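
As a rough check of these figures, the speed column can be converted to characters/sec and then pessimized.  A minimal sketch, assuming MB means 10^6 bytes and that the 11520 figure corresponds to a 115200 bps line with 10 bits per character (both are assumptions, not stated above):

```python
# Speed column values from the table, in MB/sec (assumed to mean 10**6 bytes).
speeds_mb_per_sec = {"-opost": 76.9, "opost": 64.5}
BYTES_PER_CHAR = 2      # character + attribute, as stated in the text
PESSIMIZATION = 1000    # the factor of 1000 mentioned in the text

for mode, speed in speeds_mb_per_sec.items():
    chars_per_sec = speed * 1e6 / BYTES_PER_CHAR
    slowed = chars_per_sec / PESSIMIZATION
    print(f"{mode}: {chars_per_sec / 1e6:.2f} million chars/sec, "
          f"{slowed / 1e3:.2f} thousand after pessimizing")

# Assumption: 115200 bps serial line, 8N1 framing (10 bits per character).
serial_chars_per_sec = 115200 // 10
print(f"serial tty: {serial_chars_per_sec} chars/sec")
```

Both rows stay above 30 thousand characters/sec after the factor of 1000, which is still roughly 3 times the serial tty rate.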

>> On a system similar to the above, syscons scrolls at 50000 lines/sec.
>> Non-virtually, this would require a frame buffer bandwidth of 200MB/sec,
>> which is several times faster than possible.  Since syscons only does
>> a direct update for bytes written, it needs only about 1/25 of this
>> bandwidth or 800KB/sec.  This is not quite in the noise compared with
>> a frame buffer bandwidth of 60.2MB/sec.

Actually, on a similar system, syscons scrolls at 1.04 million lines/sec
with -opost and at 0.94 million lines/sec with opost (for printing
lots of newlines).  This must be mostly virtual, with most steps not done
in the frame buffer.  If it were physical, then the frame buffer bandwidth
for the -opost case would be 8.3GB/sec.  The main memory bandwidth for
this is relatively trivial, since writing 1 newline only involves clearing
1 line in the history buffer and not moving a screenful to the frame buffer.
It is 166MB/sec.
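
The two bandwidth figures can be reproduced approximately as follows.  This is a sketch under stated assumptions: an 80-column line at 2 bytes/char gives the 166MB/sec figure directly, while the 8.3GB/sec figure matches about 8000 bytes of frame buffer traffic per scrolled line, which would fit reading and writing a 4000-byte 80x25 screen.  That breakdown is an assumption, not something stated above.

```python
LINES_PER_SEC = 1.04e6          # measured scroll rate with -opost
COLUMNS, ROWS = 80, 25          # assumed text mode geometry
BYTES_PER_CHAR = 2              # character + attribute

line_bytes = COLUMNS * BYTES_PER_CHAR            # 160 bytes cleared per newline
screen_bytes = COLUMNS * ROWS * BYTES_PER_CHAR   # 4000 bytes per screenful

# Virtual scrolling: only one line of the history buffer is cleared.
main_memory_bw = LINES_PER_SEC * line_bytes

# Physical scrolling, assuming the screen is both read and written
# for every newline (this read+write factor of 2 is an assumption).
frame_buffer_bw = LINES_PER_SEC * 2 * screen_bytes

print(f"main memory:  {main_memory_bw / 1e6:.1f} MB/sec")
print(f"frame buffer: {frame_buffer_bw / 1e9:.2f} GB/sec")
```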

>> % K6/233 PCI  S3/Virge     linux-2.1.63     offscreen-o 0.97  0.00  0.97  2.06
>> % K6/233 PCI  S3/Virge     linux-2.1.63     onscreen-o  1.03  0.00  1.03  1.93
>> % K6/233 PCI  S3/Virge     linux-2.1.63     offscreen   1.18  0.00  1.18  1.69

I tried a newer Linux (ttylinux 2.6.30.5) console on newer hardware
similar to the above.  For normal output, its speed was 2.7 million
characters/sec (double this to compare with the speed column above).
-opost didn't make much difference for normal output.  For scrolling,
its speed was 22 thousand lines/sec with opost and 83 thousand lines/sec
with -opost.  I think it writes every line to the frame buffer, but
reduces the slowness of this by using hardware scrolling.

Calculations for direct updates in kernel mode: at best they go at the
frame buffer bandwidth, with the whole screen copied for each scroll
(hardware scrolling would help here).  For 80x25, this gives 4KB to
move per newline and relatively few other slow accesses.  So the scrolling
speed is about 20 thousand lines/sec for an 80MB/sec frame buffer.
Almost the same as Linux with opost.  Divide by 8 for pixel mode with
8x16 256 colors.  Still plenty for kernel messages, but there is no
longer a factor of hundreds to spare.
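
The direct-update estimate works out as below.  A minimal sketch, assuming "4KB" means the 4000 bytes of an 80x25 screen at 2 bytes/char and that MB means 10^6 bytes:

```python
FRAME_BUFFER_BW = 80e6          # bytes/sec, the 80MB/sec figure from the text
COLUMNS, ROWS = 80, 25
BYTES_PER_CHAR = 2              # character + attribute in text mode
PIXEL_MODE_FACTOR = 8           # slowdown quoted for 8x16 font, 256 colors

screen_bytes = COLUMNS * ROWS * BYTES_PER_CHAR   # 4000 bytes moved per newline

# Whole screen copied for each scroll, no hardware scrolling.
text_lines_per_sec = FRAME_BUFFER_BW / screen_bytes
pixel_lines_per_sec = text_lines_per_sec / PIXEL_MODE_FACTOR

print(f"text mode:  {text_lines_per_sec:.0f} lines/sec")
print(f"pixel mode: {pixel_lines_per_sec:.0f} lines/sec")
```

At a few thousand lines/sec in pixel mode, the margin over what kernel messages need is a factor of tens rather than hundreds.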

Bruce


