Skip site navigation (1)Skip section navigation (2)
Date:      Sat, 3 Sep 2011 11:19:17 -0600 (MDT)
From:      Ian Lepore <freebsd@damnhippie.dyndns.org>
To:        FreeBSD-gnats-submit@FreeBSD.org
Subject:   arm/160431: [patch] Disable interrupts during busdma cache sync operations.
Message-ID:  <201109031719.p83HJHHO098751@revolution.hippie.lan>
Resent-Message-ID: <201109031720.p83HKCZl076304@freefall.freebsd.org>

next in thread | raw e-mail | index | archive | help

>Number:         160431
>Category:       arm
>Synopsis:       [patch] Disable interrupts during busdma cache sync operations.
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    freebsd-arm
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Sat Sep 03 17:20:12 UTC 2011
>Closed-Date:
>Last-Modified:
>Originator:     Ian Lepore <freebsd@damnhippie.dyndns.org>
>Release:        FreeBSD 8.2-RC3 arm
>Organization:
none
>Environment:
FreeBSD dvb 8.2-RC3 FreeBSD 8.2-RC3 #49: Tue Feb 15 22:52:14 UTC 2011     root@revolution.hippie.lan:/usr/obj/arm/usr/src/sys/DVB  arm

>Description:
Data can be corrupted when an interrupt occurs while busdma_sync_buf() is 
handling a buffer that partially overlaps a cache line.  One scenario, seen
in the real world, was a small IO buffer allocated in the same cache line
as an instance of a struct intr_handler.  The dma sync code copied the non-DMA
data (the intr_handler struct) to a temporary buffer prior to the cache sync,
then an interrupt occurs that results in setting the it_need flag in the 
struct.  When control returns to the dma sync code it finishes by copying 
the saved partial cache line from the temporary buffer back to the 
intr_handler struct, restoring the it_need flag to zero, and resulting in
a threaded interrupt handler not running as needed.

Similar sequences can be imagined that would lead to corruption of either
the DMA'd data or non-DMA data sharing the same cache line, depending on the
timing of the interrupt, and I can't quite convince myself that the problem
only occurs in this partial-cacheline-overlap scenario.  For example, what
happens if execution is in the middle of a cpu_dcache_wbinv_range() operation
and an interrupt leads to a context switch wherein the pmap code decides to
call cpu_dcache_inv_range()?  So to be conservatively safe, this patch 
disables interrupts for the entire duration bus_dmamap_sync_buf(), not just
when partial cache lines are being handled.

>How-To-Repeat:
It would be very difficult to set up a repeatable test of this condition.  We
were lucky (!) enough to have it happen repeatably enough to diagnose.

>Fix:
Problem was discovered in an 8.2 environment, but this diff is to -current.

--- diff.tmp begins here ---
--- busdma_machdep.c.orig	2010-03-11 14:16:54.000000000 -0700
+++ busdma_machdep.c	2011-09-03 10:15:16.000000000 -0600
@@ -1091,6 +1091,14 @@ static void
 bus_dmamap_sync_buf(void *buf, int len, bus_dmasync_op_t op)
 {
 	char _tmp_cl[arm_dcache_align], _tmp_clend[arm_dcache_align];
+	uint32_t intr;
+
+	/* Interrupts MUST be disabled when handling partial cacheline flush
+	 * and most likely should be disabled for all flushes.  (I know for
+	 * certain interrupts can cause failures on partial flushes, and suspect
+	 * problems could also happen in other scenarios.)
+	 */
+	intr = intr_disable();
 
 	if ((op & BUS_DMASYNC_PREWRITE) && !(op & BUS_DMASYNC_PREREAD)) {
 		cpu_dcache_wb_range((vm_offset_t)buf, len);
@@ -1129,6 +1137,8 @@ bus_dmamap_sync_buf(void *buf, int len, 
 			    arm_dcache_align - (((vm_offset_t)(buf) + len) &
 			   arm_dcache_align_mask));
 	}
+
+	intr_restore(intr);
 }
 
 static void
--- diff.tmp ends here ---

>Release-Note:
>Audit-Trail:
>Unformatted:



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?201109031719.p83HJHHO098751>