Skip site navigation (1)Skip section navigation (2)
Date:      Sun, 29 Dec 2013 19:14:32 +1100 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        Warner Losh <imp@bsdimp.com>
Cc:        "freebsd-arch@freebsd.org Arch" <freebsd-arch@FreeBSD.org>
Subject:   Re: UART grab patch
Message-ID:  <20131229173041.E979@besplex.bde.org>
In-Reply-To: <79C659B9-F402-42B6-8240-C4AB0A0EB92B@bsdimp.com>
References:  <79C659B9-F402-42B6-8240-C4AB0A0EB92B@bsdimp.com>

next in thread | previous in thread | raw e-mail | index | archive | help
On Sat, 28 Dec 2013, Warner Losh wrote:

> OK. Looks like my initial assessment of mountroot was too optimistic. With the exception of the imx driver, all the uart drivers in the tree fell victim to this bug. I hit a second one with the Raspberry Pi, and went looking.

Please use the Unix newline character in mail.

> I've uploaded a patch to http://people.freebsd.org/~imp/uart-grab.diff that should solve the problem. I don't have all these systems to test on, so I'm hoping people can report back to me what works and what doesn't.  Boot -a with and without the patch for all serial consoles will tell you if it is working. Type stuff. If it appears exactly as you type it, then the patch is working. If not, please let me know.
>
> The basic strategy for all these is to disable RX interrupts when the console is grabbed, and enable them when it is ungrabbed. This has to be done in the hardware, since masking the interrupt at the CPU or PIC level will cause many UARTs to misbehave and/or prevent other interrupts from happening that are shared which can cause problems on some platforms.

This is very broken on xx50 (3 copies which differ only in style).
There it doesn't actually disable interrupts using the interrupt enable
register, but hacks on the fifo control register to try to reduce the
rx fifo trigger level to 1.  It has to hard-code the latter register
since it is not supposed to be available at this level.  You previously
pointed out that the bug is smaller when the rx fifo is large, so this
change makes the problem worse when its attempt succeeds.  It is not
so bad when either the fifo doesn't exist (then it has no effect) or
when the trigger level doesn't get reduced all the way to 1 (this
happens in some non-default xx50 modes).

This is broken or fragile on most or all device types.  It "restores" to a
hard-coded state instead of to the previous state.  Maybe uart fixates
the state of the console device sufficently for the final state to be
constant, but I doubt this.  Indeed, just for xx50, buggy hardware needs
special handling for tx interrupts, and uart has this.  It sets the tx
interrupt enable bit while transmitting and keeps it clear it otherwise.
Thus the initial state of the interrupt enable register varies and the
state restored should vary to match.  The correct state to restore
for this register can possibly be determined by looking at the global
state of the driver, but this would be complicated (it needs transient
locking of software state all over; only hardware state is locked).

In general, hardware state should be saved in *sc by grab and restored
by ungrab.  Locking only this part of *sc across grab/ungrab is still
difficult.  Parts of the driver that might invalidate the saved state
must be locked out.

The patch also "sets" the state to a hard-coded state instead of modifying
the current state.  If this is done right, then it is correct since it
amounts to an initialization of the driver to console mode.  But the
parts of the patch that set the correct (interrupt control) register
enable all the non-tx interrupts even if none were enabled before (these
are later "restored").  This asks for interrupts that might cause problems.
On xx50, this has a chance of being harmless only due to the implementation
detail that the hardware has an interrupt identification register and the
software uses this register so as to not see hardware state changes unless
their interrupt has happened.

As I mentioned previously, grab/ungrab were supposed to handle both
directions of i/o, but are currently only called from higher levels
for input.  Forcing rx interrupts off and tx interrupts on in grab
is very input-centric.

The locking is certainly incomplete.  I don't remember even locking
against recursion in earlier changes in uart grab/ungrab.  (Note that
there is no anti-recursion code at the top level, except for some
deadlocking in cnputs()).  syscons grab has the buggy locking:

 	if (scp->grabbed++ > 0)
 		return;

Bugs in this start with the access being non-atomic.  With atomic
accesses, locking like this can work, but you might have to sprinkle
too many atomic tests of the variable, using code like:

 	uart_lock(...);
 	if (scp->grabbed == 0)
 		normal_operation();
 	else
 		operation_reduced_to_avoid_confict_with_grabbed_mode();
 	uart_unlock(...);

I changed this to use a mutex because atomic accesses are too hard to
use.  They would have to be spammed through normal_operation().  The
lock must be held throughout.  The !normal_operation() case may still
be complicated.  E.g., suppose you have a buggy grab routine that
somehow allows an interrupt.  Normal operation is to service the
interrupt so that it doesn't repeat.  Abnormal operation can't be to
simply return from the interrupt handler, since then the interrupt
might repeat endlessly or never occur again.  Here it would be better
to do the normal operation and break the console operation.  Make this
a feature and let invalid (but not very harmful) states occur
occasionally and fix up the invalid states using polling.

Bruce



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20131229173041.E979>