Date:      Fri, 23 Oct 2009 10:22:47 -0500 (CDT)
From:      Mark Tinguely <tinguely@casselton.net>
To:        freebsd-arm@freebsd.org, ray@dlink.ua, tinguely@casselton.net
Subject:   Re: [ARM+NFS] panic while copying across NFS
Message-ID:  <200910231522.n9NFMlE3002301@casselton.net>
In-Reply-To: <20091023155825.381728f4.ray@dlink.ua>


>  Hi Mark!
>  With your patch it works fine.
>
>  # dd if=/swap.file of=/mnt/swap.file bs=1M
>  1024+0 records in
>  1024+0 records out
>  1073741824 bytes transferred in 231.294150 secs (4642322 bytes/sec)
>
>  But it is still slow. Maybe someone knows why it is slow? (Marvell 88F5182 rev A2)

Here is what I think is the complete update to the revision 181296 and
revision 195779 cache fixes.

1) vm_machdep.c: remove the dangling allocations so they do not
   unnecessarily turn off the cache in the future.
	(this is the patch that worked for you; 2 and 3 are two more)
2) busdma_machdep.c: remove the same amount as was shadow mapped
   (a small worked example follows this list).
3) pmap.c: PVF_REF is used to decide when to invalidate the cache and
   flush the TLB. PVF_REF is normally set by a trap when the page is
   actually used; kernel pages should be assumed to be in use immediately.
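To make item 2 concrete, here is a minimal userland sketch (not kernel
code; PAGE_SIZE, PAGE_MASK and round_page() are redefined locally, and the
buffer address is made up) of the page accounting the busdma patch fixes.
Assuming the allocation path shadow-mapped round_page(maxsize + offset),
as the patch's reasoning implies, the unmap must cover the same span or a
page of the nocache window stays allocated and uncached:

/*
 * Worked example of why bus_dmamem_free() should pass maxsize plus the
 * sub-page offset of the original buffer to arm_unmap_nocache().
 */
#include <stdio.h>
#include <stdint.h>

#define	PAGE_SIZE	4096UL
#define	PAGE_MASK	(PAGE_SIZE - 1)
#define	round_page(x)	(((x) + PAGE_MASK) & ~PAGE_MASK)

int
main(void)
{
	uintptr_t origbuffer = 0x20000234;	/* original buffer, not page aligned */
	size_t maxsize = 0x1000;		/* dmat->maxsize */

	/* Pages consumed when the uncached shadow mapping was created:
	 * the mapping starts at the page boundary below origbuffer, so it
	 * also covers the offset within that first page. */
	size_t mapped = round_page(maxsize + (origbuffer & PAGE_MASK));

	/* What the old code unmapped versus what the patched code unmaps. */
	size_t freed_old = round_page(maxsize);
	size_t freed_new = round_page(maxsize + (origbuffer & PAGE_MASK));

	printf("shadow-mapped: %zu pages\n", mapped / PAGE_SIZE);	/* 2 */
	printf("old unmap:     %zu pages\n", freed_old / PAGE_SIZE);	/* 1: leaks a page */
	printf("new unmap:     %zu pages\n", freed_new / PAGE_SIZE);	/* 2: matches */
	return (0);
}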

In the ARMv5 pmap, we should manage every physical RAM page. Without
profiling the kernel, it would be tough to say where the performance
issues are originating (device driver, filesystem code, or machine level).

Ideas about the machine level code:

 I think freeing the memory used by the level 1 page table descriptors
for general use should improve things. More usable RAM is always a good
thing. There is some code in trap and other places that looks to see if
the level 1 PDE is for this memory space or for the shared memory space.
We can keep a few level 1 PDEs around for forks; the downside is that a
fork could fail to get the 16 KB contiguous buffer, which it can on other
architectures too. This is a pretty big change.
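As a rough illustration of the "keep a few around for forks" idea, here is
a hedged sketch of a tiny reserve of 16 KB L1 tables. Every name in it
(l1_reserve, l1_reserve_entry, l1_reserve_get/put) is made up for the
example and locking is omitted, so this is the shape of the idea rather
than a patch:

/*
 * Hedged sketch only: keep a handful of already-allocated 16 KB L1
 * translation tables so that fork() does not have to win a fresh 16 KB
 * contiguous, 16 KB aligned allocation every time, even after the bulk
 * of the L1 memory has been returned for general use.
 */
#include <sys/queue.h>

#define	L1_RESERVE_MAX	4	/* keep at most this many spare L1 tables */

struct l1_reserve_entry {
	void	*le_kva;			/* KVA of a 16 KB aligned L1 table */
	SLIST_ENTRY(l1_reserve_entry) le_link;
};

static SLIST_HEAD(, l1_reserve_entry) l1_reserve =
    SLIST_HEAD_INITIALIZER(l1_reserve);
static int l1_reserve_cnt;

/* Grab a cached L1 table; NULL means the caller falls back to a fresh
 * 16 KB contiguous allocation (which can still fail, as on other archs). */
static struct l1_reserve_entry *
l1_reserve_get(void)
{
	struct l1_reserve_entry *le;

	le = SLIST_FIRST(&l1_reserve);
	if (le != NULL) {
		SLIST_REMOVE_HEAD(&l1_reserve, le_link);
		l1_reserve_cnt--;
	}
	return (le);
}

/* On process teardown, park the table in the reserve instead of freeing
 * it, up to the cap; return 0 to tell the caller to free the 16 KB. */
static int
l1_reserve_put(struct l1_reserve_entry *le)
{

	if (l1_reserve_cnt >= L1_RESERVE_MAX)
		return (0);
	SLIST_INSERT_HEAD(&l1_reserve, le, le_link);
	l1_reserve_cnt++;
	return (1);
}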

 There are tests/fixes (switch/pmap) for the low vector page that can be
removed with a define statement for high-vector kernels. In fact, if we
are not sharing the level 1 page directory, this is set only in pmap
initialization. A simple "#ifdef LOW_VECTOR" change, for a minor savings.
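Purely as a hypothetical sketch of the shape of such a change (the option
name LOW_VECTOR, the helper pmap_fixup_vector_page() and the test itself
are made up for the example; the real checks live in the switch/pmap
paths):

/*
 * Hypothetical sketch only: a high-vector kernel could compile the
 * low-vector-page tests away entirely.
 */
static __inline void
pmap_check_vector_page(pmap_t pm, vm_offset_t vector_page)
{
#ifdef LOW_VECTOR
	/* Low vectors (VA 0) live in each process's address space, so the
	 * vector page has to be re-checked when this pmap is activated. */
	if (vector_page != ARM_VECTORS_HIGH)
		pmap_fixup_vector_page(pm);
#else
	/* High-vector kernels set the vector page up once, from the kernel
	 * pmap, during pmap initialization; nothing to do per process. */
	(void)pm;
	(void)vector_page;
#endif
}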

 Are we cleaning caches too much?

ARMv6/7 will be a big game changer. Should we put a ton of effort into
ARMv5, put the effort into optimizing, or do both?

Index: arm/arm/vm_machdep.c
===================================================================
--- arm/arm/vm_machdep.c	(revision 198246)
+++ arm/arm/vm_machdep.c	(working copy)
@@ -169,6 +169,9 @@ sf_buf_free(struct sf_buf *sf)
 	 if (sf->ref_count == 0) {
 		 TAILQ_INSERT_TAIL(&sf_buf_freelist, sf, free_entry);
 		 nsfbufsused--;
+		 pmap_kremove(sf->kva);
+		 sf->m = NULL;
+		 LIST_REMOVE(sf, list_entry);
 		 if (sf_buf_alloc_want > 0)
 			 wakeup_one(&sf_buf_freelist);
 	 }
@@ -449,9 +452,12 @@ arm_unmap_nocache(void *addr, vm_size_t size)
 
 	size = round_page(size);
 	i = (raddr - arm_nocache_startaddr) / (PAGE_SIZE);
-	for (; size > 0; size -= PAGE_SIZE, i++)
+	for (; size > 0; size -= PAGE_SIZE, i++) {
 		arm_nocache_allocated[i / BITS_PER_INT] &= ~(1 << (i % 
 		    BITS_PER_INT));
+		pmap_kremove(raddr);
+		raddr += PAGE_SIZE;
+	}
 }
 
 #ifdef ARM_USE_SMALL_ALLOC
Index: arm/arm/busdma_machdep.c
===================================================================
--- arm/arm/busdma_machdep.c	(revision 198246)
+++ arm/arm/busdma_machdep.c	(working copy)
@@ -649,7 +649,8 @@ bus_dmamem_free(bus_dma_tag_t dmat, void *vaddr, b
 		KASSERT(map->allocbuffer == vaddr,
 		    ("Trying to freeing the wrong DMA buffer"));
 		vaddr = map->origbuffer;
-		arm_unmap_nocache(map->allocbuffer, dmat->maxsize);
+		arm_unmap_nocache(map->allocbuffer,
+			dmat->maxsize + ((vm_offset_t)vaddr & PAGE_MASK));
 	}
         if (dmat->maxsize <= PAGE_SIZE &&
 	   dmat->alignment < dmat->maxsize &&
Index: arm/arm/pmap.c
===================================================================
--- arm/arm/pmap.c	(revision 198246)
+++ arm/arm/pmap.c	(working copy)
@@ -1643,7 +1643,7 @@ pmap_enter_pv(struct vm_page *pg, struct pv_entry
 		/* PMAP_ASSERT_LOCKED(pmap_kernel()); */
 		pve->pv_pmap = pmap_kernel();
 		pve->pv_va = pg->md.pv_kva;
-		pve->pv_flags = PVF_WRITE | PVF_UNMAN;
+		pve->pv_flags = PVF_WRITE | PVF_UNMAN | PVF_REF;
 		pg->md.pv_kva = 0;
 
 		TAILQ_INSERT_HEAD(&pg->md.pv_list, pve, pv_list);
@@ -2870,7 +2870,7 @@ pmap_kenter_internal(vm_offset_t va, vm_offset_t p
 			vm_page_lock_queues();
 			PMAP_LOCK(pmap_kernel());
 			pmap_enter_pv(m, pve, pmap_kernel(), va,
-					 PVF_WRITE | PVF_UNMAN);
+					 PVF_WRITE | PVF_UNMAN | PVF_REF);
 			pmap_fix_cache(m, pmap_kernel(), va);
 			PMAP_UNLOCK(pmap_kernel());
 		} else {
@@ -3538,7 +3538,7 @@ do_l2b_alloc:
 				if (!TAILQ_EMPTY(&m->md.pv_list) ||
 				     m->md.pv_kva) {
 					KASSERT(pve != NULL, ("No pv"));
-					nflags |= PVF_UNMAN;
+					nflags |= PVF_UNMAN | PVF_REF;
 					pmap_enter_pv(m, pve, pmap, va, nflags);
 				} else
 					m->md.pv_kva = va;


