From owner-freebsd-arch@FreeBSD.ORG  Mon Feb  5 00:00:33 2007
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
X-Original-To: arch@freebsd.org
Delivered-To: freebsd-arch@FreeBSD.ORG
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id C605116A400;
	Mon,  5 Feb 2007 00:00:32 +0000 (UTC) (envelope-from alc@cs.rice.edu)
Received: from mail.cs.rice.edu (mail.cs.rice.edu [128.42.1.31])
	by mx1.freebsd.org (Postfix) with ESMTP id 7B43113C491;
	Mon,  5 Feb 2007 00:00:32 +0000 (UTC) (envelope-from alc@cs.rice.edu)
Received: from mail.cs.rice.edu (localhost.localdomain [127.0.0.1])
	by mail.cs.rice.edu (Postfix) with ESMTP id 833BA2C2AC9;
	Sun,  4 Feb 2007 17:41:10 -0600 (CST)
X-Virus-Scanned: by amavis-2.4.0 at mail.cs.rice.edu
Received: from mail.cs.rice.edu ([127.0.0.1])
	by mail.cs.rice.edu (mail.cs.rice.edu [127.0.0.1]) (amavisd-new,
	port 10024)
	with LMTP id GXjMwLHzpR6a; Sun,  4 Feb 2007 17:41:09 -0600 (CST)
Received: from [216.63.78.18] (adsl-216-63-78-18.dsl.hstntx.swbell.net
	[216.63.78.18])
	(using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
	(No client certificate requested)
	by mail.cs.rice.edu (Postfix) with ESMTP id A0B822C2ABD;
	Sun,  4 Feb 2007 17:41:09 -0600 (CST)
Message-ID: <45C66F14.2020907@cs.rice.edu>
Date: Sun, 04 Feb 2007 17:41:08 -0600
From: Alan Cox <alc@cs.rice.edu>
User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.7.13) Gecko/20061115
X-Accept-Language: en-us, en
MIME-Version: 1.0
To: Suleiman Souhlal <ssouhlal@FreeBSD.org>
References: <25784.67.188.127.3.1170310494.squirrel@ssl.mu.org>
In-Reply-To: <25784.67.188.127.3.1170310494.squirrel@ssl.mu.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: alc@freebsd.org, arch@freebsd.org
Subject: Re: Reducing vm page queue mutex contention
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 05 Feb 2007 00:00:33 -0000

Suleiman Souhlal wrote:

>Hello Alan,
>
>Profiling shows that the vm page queue mutex is the most contended lock in
>the kernel, maybe apart from sched_lock. It seems that this is in part
>because this lock protects a lot of things: page queues, pv entries, page
>flags, page hold count, page wired count..
>
>  
>

To start, I would concentrate on entirely eliminating the use of the 
page queues lock from pmap_enter() and pmap_removes_pages().  I've 
inlined specific comments below.

>I came up with a possible plan to reduce contention on this lock,
>concentrating on the amd64 pmap (although these should be applicable to
>the other architectures as well):
>
>- Make vm_page_flag_set/clear() just use atomic operations to get rid of
>  the page queues lock dependency.
>  I'm still not entirely convinced this is entirely safe.
>
>  
>

I would not do this.  Instead, I would create a separate 
machine-dependent flags field that is synchronized by the same lock as 
the pv lists.  This field would then be used for flags such as  
PG_REFERENCED and PG_WRITEABLE.  (In fact, I believe that there is 
wasted space due to alignment in amd64's page structure that could be 
used for this.)

[snip]

>- Add a mutex pool for vm pages to protect the pv entries lists.
>  I'm currently working on this.
>  My current approach makes struct pv_entry larger because it needs to store
>  a pointer to the pte in each pv_entry.
>
>  
>

While a mutex pool may ultimately be needed, I would start with a 
simpler approach and then reevaluate what should be the next step.  
Specifically, I would start with a single lock for the pv lists and 
machine-dependent flags.  Then, you won't need the pointer.  Moreover, 
the use of a mutex pool is going to add overhead to the ranged 
operations like pmap_remove_pages() and pmap_remove().

>  Another way that might be better is to move to per-object pv entries,
>  which is what Linux does. It would greatly reduce memory usage when
>  mapping large objects in a lot of processes, although it might be slower
>  for sparsely faulted objects mapped in a large number of processes.
>  This approach would be a lot of work, which is why I'm leaning towards
>  keeping per-page pv entries.
>
>  
>

I have addressed this problem in a different way.  With the superpages 
support in perforce, I create a single pv entry per superpage mapping.

>- It should be possible to make vm_page->wired_count use atomic operations
>  instead of needing a lock, similarly to what I did for the hold_count.
>  This might be a bit tricky, but hopefully possible.
>  Alternatively, we could use the mutex pool described above to protect it.
>
>  
>

Page table page's (ab)use their wire_count field as a reference count.  
The pmap lock is already held whenever this count is changed, or more 
generally when the page table is changed.  So, at least for page table 
pages nothing needs to change.

>- We can change pmap_unuse_pt and free_pv_entry to just mark the pages they
>  want to free in an array allocated by the caller.
>  The caller will then free those pages after it drops the pmap lock.
>
>  For example:
>
>  struct pages_to_free {
>          vm_page_t *page[MAX_PAGES];
>          int num_pages;
>  };
>
>  void pmap_remove(...)
>  {
>      struct pages_to_free pages;
>
>      PMAP_LOCK(pmap);
>      ...
>      pmap_unuse_pt(..., &pages);
>      ...
>      PMAP_UNLOCK(pmap);
>      vm_page_lock_queues();
>      for (i = 0; i < pages.num_pages; i++)
>          vm_page_free(pages.page[i]);
>      vm_page_unlock_queues();
>  }
>
>  This way, pmap_remove can be mostly without queues lock.
>
>  
>

I can make vm_page_free() callable without the page queues lock for page 
table and pv entry pages, i.e., pages that don't belong to a vm object.  
Then, you don't need to do this.

However, there is another issue that you don't touch on here.   The page 
queues lock is being used to synchronize changes to the page's dirty 
field and the PTE's PG_M bit against testing for dirty pages in the 
machine-independent (MI) code.  There are ordering issues in both the 
pmap and MI code, e.g., pmap_enter() clears PG_M on an old mapping 
before setting the page's dirty field and vm_page_dontneed() reads the 
page's dirty field and then conditionally call pmap_is_modified().

>- Once the above are done, it should be possible to make pmap_enter() run
>  mostly queue lock free by:
>  - Pre-allocating a pv chunk early in pmap_enter, if there are no free
>    ones, so that we never have to allocate new chunks in pmap_insert_entry.
>  - Dropping the page queues lock immediately after the pmap_allocpte in
>    pmap_enter.
>  
>

Actually, you shouldn't need to acquire the page queues lock at all or 
move the allocation of a pv_chunk.

Regards,
Alan


From owner-freebsd-arch@FreeBSD.ORG  Tue Feb  6 13:13:04 2007
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
X-Original-To: freebsd-arch@freebsd.org
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id D79B716A400
	for <freebsd-arch@freebsd.org>; Tue,  6 Feb 2007 13:13:02 +0000 (UTC)
	(envelope-from antinvidia@gmail.com)
Received: from nz-out-0506.google.com (nz-out-0506.google.com [64.233.162.237])
	by mx1.freebsd.org (Postfix) with ESMTP id 9CA9B13C428
	for <freebsd-arch@freebsd.org>; Tue,  6 Feb 2007 13:13:02 +0000 (UTC)
	(envelope-from antinvidia@gmail.com)
Received: by nz-out-0506.google.com with SMTP id i11so1933822nzh
	for <freebsd-arch@freebsd.org>; Tue, 06 Feb 2007 05:13:02 -0800 (PST)
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=beta;
	h=received:message-id:date:from:to:subject:mime-version:content-type;
	b=faAfCsasl4ohMZpwLlIDKBH5gXRBDdfMDqCpLwQM7mqXPQM2KbKy6sZ2JdAL55I+ZnmM7gZK0YtVbybmzdXqyiIxuVSvzppMI6bleCRbfD2eCirxV1L6KBNK40+cTH4Smc/pI3byhVWbRO16+Disj52Nmdn15aJ/YxPJk1Fg/g4=
Received: by 10.35.111.17 with SMTP id o17mr16215835pym.1170765929243;
	Tue, 06 Feb 2007 04:45:29 -0800 (PST)
Received: by 10.35.58.13 with HTTP; Tue, 6 Feb 2007 04:45:29 -0800 (PST)
Message-ID: <be0088ce0702060445t729e14dase836541dd4970456@mail.gmail.com>
Date: Tue, 6 Feb 2007 12:45:29 +0000
From: MQ <antinvidia@gmail.com>
To: freebsd-arch@freebsd.org
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
X-Content-Filtered-By: Mailman/MimeDel 2.1.5
Subject: a question about prison_ip
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 06 Feb 2007 13:13:04 -0000

Hi everyone,
    I was reading jail(8) related codes in the file kern_jail.c recently. I
found that prison_ip provided three parameters, the second of which is a
flag that indicate whether the third argument ip is host byte order or
network byte order. I skimmed the codes that invoke prison_ip function, all
of them calls prison_ip with flag=0 to indicate that the ip argument is in
network byte order.
    I'm considering if we can remove the flag parameter, and modify
corresponding source codes, say, userland program jail.c and jls.c to keep
consistent with the kernel. This can avoid some useless action taken by
current kernel. But I don't know if current implementation of the prison_ip
has some special reasons to act like this? Does anyone have opinions with
this problem?

Best regards.
MQ

From owner-freebsd-arch@FreeBSD.ORG  Tue Feb  6 20:37:33 2007
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
X-Original-To: arch@freebsd.org
Delivered-To: freebsd-arch@FreeBSD.ORG
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id D88C116A403;
	Tue,  6 Feb 2007 20:37:33 +0000 (UTC) (envelope-from jhb@freebsd.org)
Received: from server.baldwin.cx (66-23-211-162.clients.speedfactory.net
	[66.23.211.162])
	by mx1.freebsd.org (Postfix) with ESMTP id 644AE13C46B;
	Tue,  6 Feb 2007 20:37:33 +0000 (UTC) (envelope-from jhb@freebsd.org)
Received: from localhost.corp.yahoo.com (john@localhost [127.0.0.1])
	(authenticated bits=0)
	by server.baldwin.cx (8.13.6/8.13.6) with ESMTP id l16KbUGW072639;
	Tue, 6 Feb 2007 15:37:31 -0500 (EST) (envelope-from jhb@freebsd.org)
From: John Baldwin <jhb@freebsd.org>
To: arch@freebsd.org
Date: Tue, 6 Feb 2007 15:37:53 -0500
User-Agent: KMail/1.9.1
MIME-Version: 1.0
Content-Type: text/plain;
  charset="us-ascii"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Message-Id: <200702061537.54056.jhb@freebsd.org>
X-Greylist: Sender succeeded SMTP AUTH authentication, not delayed by
	milter-greylist-2.0.2 (server.baldwin.cx [127.0.0.1]);
	Tue, 06 Feb 2007 15:37:31 -0500 (EST)
X-Virus-Scanned: ClamAV 0.88.3/2529/Tue Feb 6 14:25:02 2007 on
	server.baldwin.cx
X-Virus-Status: Clean
X-Spam-Status: No, score=-4.4 required=4.2 tests=ALL_TRUSTED,AWL,BAYES_00 
	autolearn=ham version=3.1.3
X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on server.baldwin.cx
Cc: 
Subject: Restoring crash dump ability for kvm_getswapinfo(3)
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 06 Feb 2007 20:37:34 -0000

The patch below fixes kvm_getswapinfo(3) (and thus pstat -T/pstat -s/swapinfo) 
on crash dumps.  The one ugliness in it is that since 'struct swdevt' was 
moved into swap_pager.c, there is no longer a header that I can include to 
get it (used to be in vm/swap_pager.h) so I have a copy of the structure in 
kvm_getswapinfo.c.  I'd prefer to move the structures back into 
vm/swap_pager.h.

Index: kvm_getswapinfo.c
===================================================================
RCS file: /usr/cvs/src/lib/libkvm/kvm_getswapinfo.c,v
retrieving revision 1.26
diff -u -r1.26 kvm_getswapinfo.c
--- kvm_getswapinfo.c	31 Jul 2004 18:49:53 -0000	1.26
+++ kvm_getswapinfo.c	6 Feb 2007 20:31:09 -0000
@@ -49,18 +49,59 @@
 
 #include "kvm_private.h"
 
-#define NL_SWAPBLIST	0
-#define NL_SWDEVT	1
-#define NL_NSWDEV	2
-#define NL_DMMAX	3
+/* Grrr, this is hidden in swap_pager.c now, how annoying. */
+typedef	int32_t	swblk_t;
+
+struct buf;
+struct blist;
+
+struct swdevt;
+typedef void sw_strategy_t(struct buf *bp, struct swdevt *sw);
+typedef void sw_close_t(struct thread *td, struct swdevt *sw);
+
+struct swdevt {
+	int	sw_flags;
+	int	sw_nblks;
+	int     sw_used;
+	dev_t	sw_dev;
+	struct vnode *sw_vp;
+	void	*sw_id;
+	swblk_t	sw_first;
+	swblk_t	sw_end;
+	struct blist *sw_blist;
+	TAILQ_ENTRY(swdevt)	sw_list;
+	sw_strategy_t		*sw_strategy;
+	sw_close_t		*sw_close;
+};
+
+static struct nlist kvm_swap_nl[] = {
+	{ "_swtailq" },		/* list of swap devices and sizes */
+	{ "_dmmax" },		/* maximum size of a swap block */
+	{ NULL }
+};
+
+#define NL_SWTAILQ	0
+#define NL_DMMAX	1
 
 static int kvm_swap_nl_cached = 0;
 static int unswdev;  /* number of found swap dev's */
 static int dmmax;
 
+static int  kvm_getswapinfo_kvm(kvm_t *, struct kvm_swap *, int, int);
 static int  kvm_getswapinfo_sysctl(kvm_t *, struct kvm_swap *, int, int);
+static int  nlist_init(kvm_t *);
 static int  getsysctl(kvm_t *, char *, void *, size_t);
 
+#define KREAD(kd, addr, obj) \
+	(kvm_read(kd, addr, (char *)(obj), sizeof(*obj)) != sizeof(*obj))
+#define	KGET(idx, var)							\
+	KGET2(kvm_swap_nl[(idx)].n_value, var, kvm_swap_nl[(idx)].n_name)
+#define KGET2(addr, var, msg)						\
+	if (KREAD(kd, (u_long)(addr), (var))) {				\
+		_kvm_err(kd, kd->program, "cannot read %s", msg);	\
+		return (-1);						\
+	}
+	
 #define GETSWDEVNAME(dev, str, flags)					\
 	if (dev == NODEV) {						\
 		strlcpy(str, "[NFS swap]", sizeof(str));		\
@@ -91,8 +132,50 @@
 	if (ISALIVE(kd)) {
 		return kvm_getswapinfo_sysctl(kd, swap_ary, swap_max, flags);
 	} else {
-		return -1;
+		return kvm_getswapinfo_kvm(kd, swap_ary, swap_max, flags);
+	}
+}
+
+int
+kvm_getswapinfo_kvm(
+	kvm_t *kd,
+	struct kvm_swap *swap_ary,
+	int swap_max,
+	int flags
+) {
+	int i, ttl;
+	TAILQ_HEAD(, swdevt) swtailq;
+	struct swdevt *sp, swinfo;
+	struct kvm_swap tot;
+
+	if (!nlist_init(kd))
+		return (-1);
+
+	bzero(&tot, sizeof(tot));
+	KGET(NL_SWTAILQ, &swtailq);
+	sp = TAILQ_FIRST(&swtailq);
+	for (i = 0; sp != NULL; i++) {
+		KGET2(sp, &swinfo, "swinfo");
+		ttl = swinfo.sw_nblks - dmmax;
+		if (i < swap_max - 1) {
+			bzero(&swap_ary[i], sizeof(swap_ary[i]));
+			swap_ary[i].ksw_total = ttl;
+			swap_ary[i].ksw_used = swinfo.sw_used;
+			swap_ary[i].ksw_flags = swinfo.sw_flags;
+			GETSWDEVNAME(swinfo.sw_dev, swap_ary[i].ksw_devname,
+			     flags);
+		}
+		tot.ksw_total += ttl;
+		tot.ksw_used += swinfo.sw_used;
+		sp = TAILQ_NEXT(&swinfo, sw_list);
 	}
+
+	if (i >= swap_max)
+		i = swap_max - 1;
+	if (i >= 0)
+		swap_ary[i] = tot;
+
+        return(i);
 }
 
 #define	GETSYSCTL(kd, name, var)					\
@@ -168,6 +251,36 @@
 }
 
 static int
+nlist_init(kvm_t *kd)
+{
+	TAILQ_HEAD(, swdevt) swtailq;
+	struct swdevt *sp, swinfo;
+
+	if (kvm_swap_nl_cached)
+		return (1);
+
+	if (kvm_nlist(kd, kvm_swap_nl) < 0)
+		return (0);
+
+	/* Required entries */
+	if (kvm_swap_nl[NL_SWTAILQ].n_value == 0) {
+		_kvm_err(kd, kd->program, "unable to find swtailq");
+		return (0);
+	}
+		
+	if (kvm_swap_nl[NL_DMMAX].n_value == 0) {
+		_kvm_err(kd, kd->program, "unable to find dmmax");
+		return (0);
+	}
+
+	/* Get globals, type of swap */
+	KGET(NL_DMMAX, &dmmax);
+
+	kvm_swap_nl_cached = 1;
+	return (1);
+}
+
+static int
 getsysctl (
 	kvm_t *kd,
 	char *name,

-- 
John Baldwin

From owner-freebsd-arch@FreeBSD.ORG  Tue Feb  6 20:57:02 2007
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
X-Original-To: arch@freebsd.org
Delivered-To: freebsd-arch@FreeBSD.ORG
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id 5D80E16A400;
	Tue,  6 Feb 2007 20:57:02 +0000 (UTC) (envelope-from jhb@freebsd.org)
Received: from server.baldwin.cx (66-23-211-162.clients.speedfactory.net
	[66.23.211.162])
	by mx1.freebsd.org (Postfix) with ESMTP id B90CD13C48E;
	Tue,  6 Feb 2007 20:57:01 +0000 (UTC) (envelope-from jhb@freebsd.org)
Received: from localhost.corp.yahoo.com (john@localhost [127.0.0.1])
	(authenticated bits=0)
	by server.baldwin.cx (8.13.6/8.13.6) with ESMTP id l16KuwUI072784;
	Tue, 6 Feb 2007 15:56:58 -0500 (EST) (envelope-from jhb@freebsd.org)
From: John Baldwin <jhb@freebsd.org>
To: "Attilio Rao" <attilio@freebsd.org>
Date: Tue, 6 Feb 2007 15:57:24 -0500
User-Agent: KMail/1.9.1
References: <200702061537.54056.jhb@freebsd.org>
	<3bbf2fe10702061246ofdc00a6kc9ba5366aea4e2e2@mail.gmail.com>
In-Reply-To: <3bbf2fe10702061246ofdc00a6kc9ba5366aea4e2e2@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain;
  charset="utf-8"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Message-Id: <200702061557.24646.jhb@freebsd.org>
X-Greylist: Sender succeeded SMTP AUTH authentication, not delayed by
	milter-greylist-2.0.2 (server.baldwin.cx [127.0.0.1]);
	Tue, 06 Feb 2007 15:56:58 -0500 (EST)
X-Virus-Scanned: ClamAV 0.88.3/2529/Tue Feb 6 14:25:02 2007 on
	server.baldwin.cx
X-Virus-Status: Clean
X-Spam-Status: No, score=-4.4 required=4.2 tests=ALL_TRUSTED,AWL,BAYES_00 
	autolearn=ham version=3.1.3
X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on server.baldwin.cx
Cc: arch@freebsd.org
Subject: Re: Restoring crash dump ability for kvm_getswapinfo(3)
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 06 Feb 2007 20:57:02 -0000

On Tuesday 06 February 2007 15:46, Attilio Rao wrote:
> 2007/2/6, John Baldwin <jhb@freebsd.org>:
> > The patch below fixes kvm_getswapinfo(3) (and thus 
pstat -T/pstat -s/swapinfo)
> > on crash dumps.  The one ugliness in it is that since 'struct swdevt' was
> > moved into swap_pager.c, there is no longer a header that I can include to
> > get it (used to be in vm/swap_pager.h) so I have a copy of the structure 
in
> > kvm_getswapinfo.c.  I'd prefer to move the structures back into
> > vm/swap_pager.h.
> >
> > Index: kvm_getswapinfo.c
> > ===================================================================
> > RCS file: /usr/cvs/src/lib/libkvm/kvm_getswapinfo.c,v
> > retrieving revision 1.26
> > diff -u -r1.26 kvm_getswapinfo.c
> > --- kvm_getswapinfo.c   31 Jul 2004 18:49:53 -0000      1.26
> > +++ kvm_getswapinfo.c   6 Feb 2007 20:31:09 -0000
> > @@ -49,18 +49,59 @@
> >
> >  #include "kvm_private.h"
> >
> > -#define NL_SWAPBLIST   0
> > -#define NL_SWDEVT      1
> > -#define NL_NSWDEV      2
> > -#define NL_DMMAX       3
> > +/* Grrr, this is hidden in swap_pager.c now, how annoying. */
> > +typedef        int32_t swblk_t;
> 
> It would not be better exporting swblk_t definition into swap_pager.h,
> while you are here?

Yes, as I mentioned, I would prefer that, but they were explicitly moved from 
swap_pager.h to swap_pager.c previously, so I'd like to make sure it is ok to 
move them back.

-- 
John Baldwin

From owner-freebsd-arch@FreeBSD.ORG  Tue Feb  6 21:02:03 2007
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
X-Original-To: arch@freebsd.org
Delivered-To: freebsd-arch@FreeBSD.ORG
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id 25C7316A400;
	Tue,  6 Feb 2007 21:02:03 +0000 (UTC)
	(envelope-from phk@critter.freebsd.dk)
Received: from phk.freebsd.dk (phk.freebsd.dk [130.225.244.222])
	by mx1.freebsd.org (Postfix) with ESMTP id C211B13C461;
	Tue,  6 Feb 2007 21:02:02 +0000 (UTC)
	(envelope-from phk@critter.freebsd.dk)
Received: from critter.freebsd.dk (critter.freebsd.dk [192.168.48.2])
	by phk.freebsd.dk (Postfix) with ESMTP id 6F2C91747C;
	Tue,  6 Feb 2007 21:02:01 +0000 (UTC)
Received: from critter.freebsd.dk (localhost [127.0.0.1])
	by critter.freebsd.dk (8.13.8/8.13.8) with ESMTP id l16L20Fl094788;
	Tue, 6 Feb 2007 21:02:01 GMT (envelope-from phk@critter.freebsd.dk)
To: John Baldwin <jhb@freebsd.org>
From: "Poul-Henning Kamp" <phk@phk.freebsd.dk>
In-Reply-To: Your message of "Tue, 06 Feb 2007 15:57:24 EST."
	<200702061557.24646.jhb@freebsd.org> 
Date: Tue, 06 Feb 2007 21:02:00 +0000
Message-ID: <94787.1170795720@critter.freebsd.dk>
Sender: phk@critter.freebsd.dk
Cc: Attilio Rao <attilio@freebsd.org>, arch@freebsd.org
Subject: Re: Restoring crash dump ability for kvm_getswapinfo(3) 
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 06 Feb 2007 21:02:03 -0000

In message <200702061557.24646.jhb@freebsd.org>, John Baldwin writes:

>Yes, as I mentioned, I would prefer that, but they were explicitly moved from 
>swap_pager.h to swap_pager.c previously, so I'd like to make sure it is ok to 
>move them back.

No objection from me.

-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe    
Never attribute to malice what can adequately be explained by incompetence.

From owner-freebsd-arch@FreeBSD.ORG  Tue Feb  6 21:13:16 2007
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
X-Original-To: arch@freebsd.org
Delivered-To: freebsd-arch@FreeBSD.ORG
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id 8989016A402
	for <arch@freebsd.org>; Tue,  6 Feb 2007 21:13:16 +0000 (UTC)
	(envelope-from asmrookie@gmail.com)
Received: from nf-out-0910.google.com (nf-out-0910.google.com [64.233.182.188])
	by mx1.freebsd.org (Postfix) with ESMTP id 26E5B13C494
	for <arch@freebsd.org>; Tue,  6 Feb 2007 21:13:15 +0000 (UTC)
	(envelope-from asmrookie@gmail.com)
Received: by nf-out-0910.google.com with SMTP id m19so278780nfc
	for <arch@freebsd.org>; Tue, 06 Feb 2007 13:13:15 -0800 (PST)
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=beta;
	h=received:message-id:date:from:sender:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references:x-google-sender-auth;
	b=crQrQREhBSOq22Oe0TcHP9Qc0g41ybbdR7sapJViDQ+4HtYKaUxc6ioSZ/C1AkyqmUWtxaSPlN10fpfJqADJTe9kHtrqCZg+NI8YRU7JjkDo6GcjSn+fCNre4CHhVkb8M2nR1yTK0MMCrzAmBxJ7JSAtnYwd3o7zljJHx9TnI/Q=
Received: by 10.82.136.4 with SMTP id j4mr5192053bud.1170794774248;
	Tue, 06 Feb 2007 12:46:14 -0800 (PST)
Received: by 10.48.238.9 with HTTP; Tue, 6 Feb 2007 12:46:14 -0800 (PST)
Message-ID: <3bbf2fe10702061246ofdc00a6kc9ba5366aea4e2e2@mail.gmail.com>
Date: Tue, 6 Feb 2007 21:46:14 +0100
From: "Attilio Rao" <attilio@freebsd.org>
Sender: asmrookie@gmail.com
To: "John Baldwin" <jhb@freebsd.org>
In-Reply-To: <200702061537.54056.jhb@freebsd.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
References: <200702061537.54056.jhb@freebsd.org>
X-Google-Sender-Auth: 3b650e85f0c74580
Cc: arch@freebsd.org
Subject: Re: Restoring crash dump ability for kvm_getswapinfo(3)
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 06 Feb 2007 21:13:16 -0000

2007/2/6, John Baldwin <jhb@freebsd.org>:
> The patch below fixes kvm_getswapinfo(3) (and thus pstat -T/pstat -s/swapinfo)
> on crash dumps.  The one ugliness in it is that since 'struct swdevt' was
> moved into swap_pager.c, there is no longer a header that I can include to
> get it (used to be in vm/swap_pager.h) so I have a copy of the structure in
> kvm_getswapinfo.c.  I'd prefer to move the structures back into
> vm/swap_pager.h.
>
> Index: kvm_getswapinfo.c
> ===================================================================
> RCS file: /usr/cvs/src/lib/libkvm/kvm_getswapinfo.c,v
> retrieving revision 1.26
> diff -u -r1.26 kvm_getswapinfo.c
> --- kvm_getswapinfo.c   31 Jul 2004 18:49:53 -0000      1.26
> +++ kvm_getswapinfo.c   6 Feb 2007 20:31:09 -0000
> @@ -49,18 +49,59 @@
>
>  #include "kvm_private.h"
>
> -#define NL_SWAPBLIST   0
> -#define NL_SWDEVT      1
> -#define NL_NSWDEV      2
> -#define NL_DMMAX       3
> +/* Grrr, this is hidden in swap_pager.c now, how annoying. */
> +typedef        int32_t swblk_t;

It would not be better exporting swblk_t definition into swap_pager.h,
while you are here?

Attilio


-- 
Peace can only be achieved by understanding - A. Einstein

From owner-freebsd-arch@FreeBSD.ORG  Tue Feb  6 21:31:34 2007
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
X-Original-To: arch@freebsd.org
Delivered-To: freebsd-arch@FreeBSD.ORG
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id 2251416A401
	for <arch@freebsd.org>; Tue,  6 Feb 2007 21:31:34 +0000 (UTC)
	(envelope-from julian@elischer.org)
Received: from outY.internet-mail-service.net (outY.internet-mail-service.net
	[216.240.47.248])
	by mx1.freebsd.org (Postfix) with ESMTP id 0D0E813C4B5
	for <arch@freebsd.org>; Tue,  6 Feb 2007 21:31:33 +0000 (UTC)
	(envelope-from julian@elischer.org)
Received: from shell.idiom.com (HELO idiom.com) (216.240.47.20)
	by out.internet-mail-service.net (qpsmtpd/0.32) with ESMTP;
	Tue, 06 Feb 2007 13:09:12 -0800
Received: from [192.168.2.5] (home.elischer.org [216.240.48.38])
	by idiom.com (Postfix) with ESMTP id 91EAF125B31;
	Tue,  6 Feb 2007 13:31:32 -0800 (PST)
Message-ID: <45C8F3B4.8060400@elischer.org>
Date: Tue, 06 Feb 2007 13:31:32 -0800
From: Julian Elischer <julian@elischer.org>
User-Agent: Thunderbird 1.5.0.9 (Macintosh/20061207)
MIME-Version: 1.0
To: John Baldwin <jhb@freebsd.org>
References: <200702061537.54056.jhb@freebsd.org>
In-Reply-To: <200702061537.54056.jhb@freebsd.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: arch@freebsd.org
Subject: Re: Restoring crash dump ability for kvm_getswapinfo(3)
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 06 Feb 2007 21:31:34 -0000

John Baldwin wrote:
> The patch below fixes kvm_getswapinfo(3) (and thus pstat -T/pstat -s/swapinfo) 
> on crash dumps.  The one ugliness in it is that since 'struct swdevt' was 
> moved into swap_pager.c, there is no longer a header that I can include to 
> get it (used to be in vm/swap_pager.h) so I have a copy of the structure in 
> kvm_getswapinfo.c.  I'd prefer to move the structures back into 
> vm/swap_pager.h.

I agree


From owner-freebsd-arch@FreeBSD.ORG  Tue Feb  6 22:36:33 2007
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
X-Original-To: arch@freebsd.org
Delivered-To: freebsd-arch@FreeBSD.ORG
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id 8D4B616A414;
	Tue,  6 Feb 2007 22:36:33 +0000 (UTC) (envelope-from jhb@freebsd.org)
Received: from server.baldwin.cx (66-23-211-162.clients.speedfactory.net
	[66.23.211.162])
	by mx1.freebsd.org (Postfix) with ESMTP id 1A73B13C467;
	Tue,  6 Feb 2007 22:36:32 +0000 (UTC) (envelope-from jhb@freebsd.org)
Received: from localhost.corp.yahoo.com (john@localhost [127.0.0.1])
	(authenticated bits=0)
	by server.baldwin.cx (8.13.6/8.13.6) with ESMTP id l16MaU7U073514;
	Tue, 6 Feb 2007 17:36:31 -0500 (EST) (envelope-from jhb@freebsd.org)
From: John Baldwin <jhb@freebsd.org>
To: "Poul-Henning Kamp" <phk@phk.freebsd.dk>
Date: Tue, 6 Feb 2007 17:36:54 -0500
User-Agent: KMail/1.9.1
References: <94787.1170795720@critter.freebsd.dk>
In-Reply-To: <94787.1170795720@critter.freebsd.dk>
MIME-Version: 1.0
Content-Type: text/plain;
  charset="iso-8859-15"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Message-Id: <200702061736.54926.jhb@freebsd.org>
X-Greylist: Sender succeeded SMTP AUTH authentication, not delayed by
	milter-greylist-2.0.2 (server.baldwin.cx [127.0.0.1]);
	Tue, 06 Feb 2007 17:36:31 -0500 (EST)
X-Virus-Scanned: ClamAV 0.88.3/2529/Tue Feb 6 14:25:02 2007 on
	server.baldwin.cx
X-Virus-Status: Clean
X-Spam-Status: No, score=-4.4 required=4.2 tests=ALL_TRUSTED,AWL,BAYES_00 
	autolearn=ham version=3.1.3
X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on server.baldwin.cx
Cc: Attilio Rao <attilio@freebsd.org>, arch@freebsd.org
Subject: Re: Restoring crash dump ability for kvm_getswapinfo(3)
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 06 Feb 2007 22:36:33 -0000

On Tuesday 06 February 2007 16:02, Poul-Henning Kamp wrote:
> In message <200702061557.24646.jhb@freebsd.org>, John Baldwin writes:
> 
> >Yes, as I mentioned, I would prefer that, but they were explicitly moved from 
> >swap_pager.h to swap_pager.c previously, so I'd like to make sure it is ok to 
> >move them back.
> 
> No objection from me.

Ok, cool.  This is the updated patch:

Index: lib/libkvm/kvm_getswapinfo.c
===================================================================
RCS file: /usr/cvs/src/lib/libkvm/kvm_getswapinfo.c,v
retrieving revision 1.26
diff -u -r1.26 kvm_getswapinfo.c
--- lib/libkvm/kvm_getswapinfo.c	31 Jul 2004 18:49:53 -0000	1.26
+++ lib/libkvm/kvm_getswapinfo.c	6 Feb 2007 21:27:52 -0000
@@ -33,6 +33,7 @@
 #include <sys/blist.h>
 #include <sys/sysctl.h>
 
+#include <vm/swap_pager.h>
 #include <vm/vm_param.h>
 
 #include <err.h>
@@ -49,18 +50,34 @@
 
 #include "kvm_private.h"
 
-#define NL_SWAPBLIST	0
-#define NL_SWDEVT	1
-#define NL_NSWDEV	2
-#define NL_DMMAX	3
+static struct nlist kvm_swap_nl[] = {
+	{ "_swtailq" },		/* list of swap devices and sizes */
+	{ "_dmmax" },		/* maximum size of a swap block */
+	{ NULL }
+};
+
+#define NL_SWTAILQ	0
+#define NL_DMMAX	1
 
 static int kvm_swap_nl_cached = 0;
 static int unswdev;  /* number of found swap dev's */
 static int dmmax;
 
+static int  kvm_getswapinfo_kvm(kvm_t *, struct kvm_swap *, int, int);
 static int  kvm_getswapinfo_sysctl(kvm_t *, struct kvm_swap *, int, int);
+static int  nlist_init(kvm_t *);
 static int  getsysctl(kvm_t *, char *, void *, size_t);
 
+#define KREAD(kd, addr, obj) \
+	(kvm_read(kd, addr, (char *)(obj), sizeof(*obj)) != sizeof(*obj))
+#define	KGET(idx, var)							\
+	KGET2(kvm_swap_nl[(idx)].n_value, var, kvm_swap_nl[(idx)].n_name)
+#define KGET2(addr, var, msg)						\
+	if (KREAD(kd, (u_long)(addr), (var))) {				\
+		_kvm_err(kd, kd->program, "cannot read %s", msg);	\
+		return (-1);						\
+	}
+	
 #define GETSWDEVNAME(dev, str, flags)					\
 	if (dev == NODEV) {						\
 		strlcpy(str, "[NFS swap]", sizeof(str));		\
@@ -91,8 +108,50 @@
 	if (ISALIVE(kd)) {
 		return kvm_getswapinfo_sysctl(kd, swap_ary, swap_max, flags);
 	} else {
-		return -1;
+		return kvm_getswapinfo_kvm(kd, swap_ary, swap_max, flags);
+	}
+}
+
+int
+kvm_getswapinfo_kvm(
+	kvm_t *kd,
+	struct kvm_swap *swap_ary,
+	int swap_max,
+	int flags
+) {
+	int i, ttl;
+	TAILQ_HEAD(, swdevt) swtailq;
+	struct swdevt *sp, swinfo;
+	struct kvm_swap tot;
+
+	if (!nlist_init(kd))
+		return (-1);
+
+	bzero(&tot, sizeof(tot));
+	KGET(NL_SWTAILQ, &swtailq);
+	sp = TAILQ_FIRST(&swtailq);
+	for (i = 0; sp != NULL; i++) {
+		KGET2(sp, &swinfo, "swinfo");
+		ttl = swinfo.sw_nblks - dmmax;
+		if (i < swap_max - 1) {
+			bzero(&swap_ary[i], sizeof(swap_ary[i]));
+			swap_ary[i].ksw_total = ttl;
+			swap_ary[i].ksw_used = swinfo.sw_used;
+			swap_ary[i].ksw_flags = swinfo.sw_flags;
+			GETSWDEVNAME(swinfo.sw_dev, swap_ary[i].ksw_devname,
+			     flags);
+		}
+		tot.ksw_total += ttl;
+		tot.ksw_used += swinfo.sw_used;
+		sp = TAILQ_NEXT(&swinfo, sw_list);
 	}
+
+	if (i >= swap_max)
+		i = swap_max - 1;
+	if (i >= 0)
+		swap_ary[i] = tot;
+
+        return(i);
 }
 
 #define	GETSYSCTL(kd, name, var)					\
@@ -168,6 +227,36 @@
 }
 
 static int
+nlist_init(kvm_t *kd)
+{
+	TAILQ_HEAD(, swdevt) swtailq;
+	struct swdevt *sp, swinfo;
+
+	if (kvm_swap_nl_cached)
+		return (1);
+
+	if (kvm_nlist(kd, kvm_swap_nl) < 0)
+		return (0);
+
+	/* Required entries */
+	if (kvm_swap_nl[NL_SWTAILQ].n_value == 0) {
+		_kvm_err(kd, kd->program, "unable to find swtailq");
+		return (0);
+	}
+		
+	if (kvm_swap_nl[NL_DMMAX].n_value == 0) {
+		_kvm_err(kd, kd->program, "unable to find dmmax");
+		return (0);
+	}
+
+	/* Get globals, type of swap */
+	KGET(NL_DMMAX, &dmmax);
+
+	kvm_swap_nl_cached = 1;
+	return (1);
+}
+
+static int
 getsysctl (
 	kvm_t *kd,
 	char *name,
Index: sys/vm/swap_pager.c
===================================================================
RCS file: /usr/cvs/src/sys/vm/swap_pager.c,v
retrieving revision 1.286
diff -u -r1.286 swap_pager.c
--- sys/vm/swap_pager.c	5 Jan 2007 19:09:01 -0000	1.286
+++ sys/vm/swap_pager.c	6 Feb 2007 21:30:28 -0000
@@ -138,37 +138,6 @@
 #define SWAP_META_PAGES		(SWB_NPAGES * 2)
 #define SWAP_META_MASK		(SWAP_META_PAGES - 1)
 
-typedef	int32_t	swblk_t;	/*
-				 * swap offset.  This is the type used to
-				 * address the "virtual swap device" and
-				 * therefore the maximum swap space is
-				 * 2^32 pages.
-				 */
-
-struct swdevt;
-typedef void sw_strategy_t(struct buf *bp, struct swdevt *sw);
-typedef void sw_close_t(struct thread *td, struct swdevt *sw);
-
-/*
- * Swap device table
- */
-struct swdevt {
-	int	sw_flags;
-	int	sw_nblks;
-	int     sw_used;
-	dev_t	sw_dev;
-	struct vnode *sw_vp;
-	void	*sw_id;
-	swblk_t	sw_first;
-	swblk_t	sw_end;
-	struct blist *sw_blist;
-	TAILQ_ENTRY(swdevt)	sw_list;
-	sw_strategy_t		*sw_strategy;
-	sw_close_t		*sw_close;
-};
-
-#define	SW_CLOSING	0x04
-
 struct swblock {
 	struct swblock	*swb_hnext;
 	vm_object_t	swb_object;
Index: sys/vm/swap_pager.h
===================================================================
RCS file: /usr/cvs/src/sys/vm/swap_pager.h,v
retrieving revision 1.51
diff -u -r1.51 swap_pager.h
--- sys/vm/swap_pager.h	10 Apr 2006 10:03:41 -0000	1.51
+++ sys/vm/swap_pager.h	6 Feb 2007 21:27:33 -0000
@@ -38,6 +38,38 @@
 #ifndef	_VM_SWAP_PAGER_H_
 #define	_VM_SWAP_PAGER_H_ 1
 
+typedef	int32_t	swblk_t;	/*
+				 * swap offset.  This is the type used to
+				 * address the "virtual swap device" and
+				 * therefore the maximum swap space is
+				 * 2^32 pages.
+				 */
+
+struct buf;
+struct swdevt;
+typedef void sw_strategy_t(struct buf *, struct swdevt *);
+typedef void sw_close_t(struct thread *, struct swdevt *);
+
+/*
+ * Swap device table
+ */
+struct swdevt {
+	int	sw_flags;
+	int	sw_nblks;
+	int     sw_used;
+	dev_t	sw_dev;
+	struct vnode *sw_vp;
+	void	*sw_id;
+	swblk_t	sw_first;
+	swblk_t	sw_end;
+	struct blist *sw_blist;
+	TAILQ_ENTRY(swdevt)	sw_list;
+	sw_strategy_t		*sw_strategy;
+	sw_close_t		*sw_close;
+};
+
+#define	SW_CLOSING	0x04
+
 #ifdef _KERNEL
 
 extern int swap_pager_full;

-- 
John Baldwin

From owner-freebsd-arch@FreeBSD.ORG  Fri Feb  9 00:00:47 2007
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
X-Original-To: arch@freebsd.org
Delivered-To: freebsd-arch@FreeBSD.ORG
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id E042816A402
	for <arch@freebsd.org>; Fri,  9 Feb 2007 00:00:47 +0000 (UTC)
	(envelope-from gnn@neville-neil.com)
Received: from mrout3.yahoo.com (mrout3.yahoo.com [216.145.54.173])
	by mx1.freebsd.org (Postfix) with ESMTP id C423813C428
	for <arch@freebsd.org>; Fri,  9 Feb 2007 00:00:47 +0000 (UTC)
	(envelope-from gnn@neville-neil.com)
Received: from minion.neville-neil.com (proxy8.corp.yahoo.com [216.145.48.13])
	by mrout3.yahoo.com (8.13.6/8.13.6/y.out) with ESMTP id
	l18NoH6T064239; Thu, 8 Feb 2007 15:50:18 -0800 (PST)
Date: Thu, 08 Feb 2007 15:50:10 -0800
Message-ID: <m2fy9gw6od.wl%gnn@neville-neil.com>
From: "George V. Neville-Neil" <gnn@neville-neil.com>
To: Julian Elischer <julian@elischer.org>
In-Reply-To: <45BFC246.2000005@elischer.org>
References: <45BFC246.2000005@elischer.org>
User-Agent: Wanderlust/2.14.0 (Africa) SEMI/1.14.6 (Maruoka) FLIM/1.14.8
	(=?ISO-8859-4?Q?Shij=F2?=) APEL/10.6 Emacs/22.0.92
	(i386-apple-darwin8.8.2) MULE/5.0 (SAKAKI)
MIME-Version: 1.0 (generated by SEMI 1.14.6 - "Maruoka")
Content-Type: text/plain; charset=US-ASCII
Cc: arch@freebsd.org, Marko Zec <zec@icir.org>,
	Robert Watson <rwatson@freebsd.org>
Subject: Re: vimage and modules
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 09 Feb 2007 00:00:48 -0000

At Tue, 30 Jan 2007 14:10:14 -0800,
julian wrote:
> 
> I've looked at the current vimage code and I'm impressed.
> 
> The question remains to me that we need to have vimage and modules
> interact well.
> 
> One question is, "should existing vimages suddenly get
> new capabilities when new kernel modules are loaded?"
> One alternative is to only  allow them to have access to modules that 
> were loaded before the creation of the vimage.
> 
> An analogy in the TLS (thread-local storage) world is that
> when a new shared library is loaded, TLS variables are
> immediately available to it. However this may not be needed
> in a virtual machine.
> 
> I heard someone mention the following idea in passing and the
> more I think about it, the more I like it..
> 
> Virtual machines are 'booted without loaded modules.'
> They have however, available to them all the modules loaded
> into the base system at the time that they are looking, and
> can 'load them' just as the base system can load kernel
> modules.
> 
> The difference is that they are not able to load new modules,
> but rather, only to do the 'vm_linkage' stage required on already
> loaded modules.
> 
> The vm linkage would be a separate subset of what needs to
> be done when a module is loaded. It would be a separate
> entry-point that would only be present on modules that
> supported vimages.
> 
> An example would be some module 'x'.
> 
> It would have a function that would set up any
> per-vimage linkages needed (mallocing and linking
> its own vimage-specific variables structure into the
> list of modules for that vimage.
> 
> Currently we have a load and unload method. This would
> suggest adding a vm_load method as well. creating a new
> vimage could run through all the existing modules, and
> call that functions, or we could do it as part of the booting of the 
> vimage from (say) /etc/rc or similar.
> 
> 
> If a module had no vimage-specific behaviour it would
> not have the extra entry-point.
> 
> It does mean that we need to look at the inter-module dependencies
> as if you wanted to have one module call into another, you'd need to
> have dependencies well documented. In the current code you get a
> linkage failure, if there are global variables accessed between them
> but with a 'per vmimage' structure of variables this becomes a
> runtime problem, unless we explicitly have the dependencies
> registered.

I suspect that what we need a diagram of how we think modules and
vimages relate.  Each vimage may wind up being what was called in
another system, a "protection domain" for lack of a better word.  If a
full vimage is really a copy of the whole kernel, modules and all,
then we probably need to think about a fork/exec model for kernels.
That is, creating a new vimage is the equivalent of a fork() where you
get copies of everything and some stuff is reset.  What exactly is
kept or reset is very important to work out.  I would suspect that we
would clear a lot of stuff, such as fd's, sockets etc. but keep a lot
of other state, like the RIB and FIB.

Best,
George


From owner-freebsd-arch@FreeBSD.ORG  Fri Feb  9 06:04:04 2007
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
X-Original-To: freebsd-arch@freebsd.org
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id 664B316A400
	for <freebsd-arch@freebsd.org>; Fri,  9 Feb 2007 06:04:04 +0000 (UTC)
	(envelope-from delphij@delphij.net)
Received: from tarsier.geekcn.org (tarsier.geekcn.org [210.51.165.229])
	by mx1.freebsd.org (Postfix) with ESMTP id AC1B913C47E
	for <freebsd-arch@freebsd.org>; Fri,  9 Feb 2007 06:04:03 +0000 (UTC)
	(envelope-from delphij@delphij.net)
Received: from localhost (tarsier.geekcn.org [210.51.165.229])
	by tarsier.geekcn.org (Postfix) with ESMTP id C3EABEB205A
	for <freebsd-arch@freebsd.org>; Fri,  9 Feb 2007 14:03:59 +0800 (CST)
X-Virus-Scanned: amavisd-new at geekcn.org
Received: from tarsier.geekcn.org ([210.51.165.229])
	by localhost (mail.geekcn.org [210.51.165.229]) (amavisd-new,
	port 10024)
	with ESMTP id xJBxxOtPVYDo for <freebsd-arch@freebsd.org>;
	Fri,  9 Feb 2007 14:03:49 +0800 (CST)
Received: from [10.217.12.154] (sina152-194.staff.sina.com.cn [61.135.152.194])
	(using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
	(No client certificate requested)
	by tarsier.geekcn.org (Postfix) with ESMTP id 76571EB08C6
	for <freebsd-arch@freebsd.org>; Fri,  9 Feb 2007 14:03:48 +0800 (CST)
DomainKey-Signature: a=rsa-sha1; s=default; d=delphij.net; c=nofws; q=dns;
	h=message-id:date:from:organization:user-agent:mime-version:to:
	subject:x-enigmail-version:content-type;
	b=Cr6FDuo8r5EzVlAyTDsB+mR7PIeiQhnAfWDq809MGGLZmBype4sGkE6UVJOP3bHVY
	pqMxMKJxztiKk0zHYc5pg==
Message-ID: <45CC0EB9.7030400@delphij.net>
Date: Fri, 09 Feb 2007 14:03:37 +0800
From: LI Xin <delphij@delphij.net>
Organization: The FreeBSD Project
User-Agent: Thunderbird 1.5.0.9 (Macintosh/20061207)
MIME-Version: 1.0
To: freebsd-arch@freebsd.org
X-Enigmail-Version: 0.94.1.0
Content-Type: multipart/signed; micalg=pgp-ripemd160;
	protocol="application/pgp-signature";
	boundary="------------enig1C02D17C056105FC2139BFB9"
Subject: Patch for review: resolve a race condition in [sg]etpriority()
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 09 Feb 2007 06:04:04 -0000

This is an OpenPGP/MIME signed message (RFC 2440 and 3156)
--------------enig1C02D17C056105FC2139BFB9
Content-Type: multipart/mixed; boundary="------------070209090508050504040207"

This is a multi-part message in MIME format.
--------------070209090508050504040207
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

Hi,

Here is a patch which resolves a race condition in [sg]etpriority(), but
I think there might be some better idea to resolve the problem, so I
would appreciate if someone would shed me some light :-)

Description of the problem:

In PRIO_USER of both [sg]etpriority(), p_ucred is expected to be a valid
reference to a valid user credential.  However, for PRS_NEW processes,
there is chances where it is not properly initialized, causing a fatal
trap 12 [1].

On the other hand, just determining whether a process is in PRS_NEW
state and skip those who are newly born is not enough for semantical
correctness.  A process could either have p_{start,end}copy section
copied or not, which needs to be handled with care.

The attached solution:

What I did in the attached patch is basically:

 - Before allproc_lock is sx_xunlock'ed in fork1(), lock both the parent
and the child.
 - After releasing allproc_lock, do process p_{start,end}copy,
p_{start,end}zero and crhold() immediately, and release parent and
child's p_mtx as soon as possible.
 - In getpriority(), simply skip PRS_NEW processes because they does not
affect PRIO_USER path's result.  This part is not really necessary for
correctness, though.

The pros for this is that it does not require an extensive API change,
and in theory it does not require that the allproc lock to be held for a
extended period; the cons is that this adds some overhead because it
adds two pairs of mutex lock/unlock.

In a private discussion, there was also some other ideas.  One is to
just move the unlock to where the process is completely initialized,
another is to introduce an event handler and msleep() p_state when
necessary, and do a wakeup once we bumped the state, but I think these
solution have their own limitations just like mine.

[1] Reproducing script can be obtained from kern/108071.

Cheers,
--=20
Xin LI <delphij@delphij.net>	http://www.delphij.net/
FreeBSD - The Power to Serve!

--------------070209090508050504040207
Content-Type: text/plain; x-mac-type="0"; x-mac-creator="0";
	name="patch-priority.20070124"
Content-Transfer-Encoding: quoted-printable
Content-Disposition: inline;
 filename="patch-priority.20070124"

Index: kern_fork.c
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
RCS file: /home/ncvs/src/sys/kern/kern_fork.c,v
retrieving revision 1.266
diff -u -p -r1.266 kern_fork.c
--- kern_fork.c	23 Jan 2007 08:46:50 -0000	1.266
+++ kern_fork.c	24 Jan 2007 08:35:31 -0000
@@ -423,8 +423,22 @@ again:
 	AUDIT_ARG(pid, p2->p_pid);
 	LIST_INSERT_HEAD(&allproc, p2, p_list);
 	LIST_INSERT_HEAD(PIDHASH(p2->p_pid), p2, p_hash);
+
+	PROC_LOCK(p2);
+	PROC_LOCK(p1);
+
 	sx_xunlock(&allproc_lock);
=20
+	bcopy(&p1->p_startcopy, &p2->p_startcopy,
+	    __rangeof(struct proc, p_startcopy, p_endcopy));
+	PROC_UNLOCK(p1);
+
+	bzero(&p2->p_startzero,
+	    __rangeof(struct proc, p_startzero, p_endzero));
+
+	p2->p_ucred =3D crhold(td->td_ucred);
+	PROC_UNLOCK(p2);
+
 	/*
 	 * Malloc things while we don't hold any locks.
 	 */
@@ -482,13 +496,9 @@ again:
 	PROC_LOCK(p2);
 	PROC_LOCK(p1);
=20
-	bzero(&p2->p_startzero,
-	    __rangeof(struct proc, p_startzero, p_endzero));
 	bzero(&td2->td_startzero,
 	    __rangeof(struct thread, td_startzero, td_endzero));
=20
-	bcopy(&p1->p_startcopy, &p2->p_startcopy,
-	    __rangeof(struct proc, p_startcopy, p_endcopy));
 	bcopy(&td->td_startcopy, &td2->td_startcopy,
 	    __rangeof(struct thread, td_startcopy, td_endcopy));
=20
@@ -511,7 +521,6 @@ again:
 	sched_fork(td, td2);
=20
 	mtx_unlock_spin(&sched_lock);
-	p2->p_ucred =3D crhold(td->td_ucred);
 	td2->td_ucred =3D crhold(p2->p_ucred);
 #ifdef AUDIT
 	audit_proc_fork(p1, p2);
Index: kern_resource.c
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
RCS file: /home/ncvs/src/sys/kern/kern_resource.c,v
retrieving revision 1.165
diff -u -p -r1.165 kern_resource.c
--- kern_resource.c	17 Jan 2007 14:58:53 -0000	1.165
+++ kern_resource.c	24 Jan 2007 02:06:13 -0000
@@ -143,6 +143,9 @@ getpriority(td, uap)
 			uap->who =3D td->td_ucred->cr_uid;
 		sx_slock(&allproc_lock);
 		FOREACH_PROC_IN_SYSTEM(p) {
+			/* Do not bother to check PRS_NEW processes */
+			if (p->p_state =3D=3D PRS_NEW)
+				continue;
 			PROC_LOCK(p);
 			if (!p_cansee(td, p) &&
 			    p->p_ucred->cr_uid =3D=3D uap->who) {

--------------070209090508050504040207--

--------------enig1C02D17C056105FC2139BFB9
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: OpenPGP digital signature
Content-Disposition: attachment; filename="signature.asc"

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (Darwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFFzA65OfuToMruuMARA7cHAJsGWVZlMcoepLM7h1+zT014Cj6u7gCgge6e
RL6f6wX1VMZHkO8l84269xQ=
=fkK6
-----END PGP SIGNATURE-----

--------------enig1C02D17C056105FC2139BFB9--

From owner-freebsd-arch@FreeBSD.ORG  Fri Feb  9 14:04:17 2007
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
X-Original-To: freebsd-arch@freebsd.org
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id 8587916A541
	for <freebsd-arch@freebsd.org>; Fri,  9 Feb 2007 14:04:15 +0000 (UTC)
	(envelope-from jhb@freebsd.org)
Received: from server.baldwin.cx (66-23-211-162.clients.speedfactory.net
	[66.23.211.162])
	by mx1.freebsd.org (Postfix) with ESMTP id DE69C13C4A6
	for <freebsd-arch@freebsd.org>; Fri,  9 Feb 2007 14:04:14 +0000 (UTC)
	(envelope-from jhb@freebsd.org)
Received: from zion.baldwin.cx (zion.baldwin.cx [192.168.0.7])
	(authenticated bits=0)
	by server.baldwin.cx (8.13.6/8.13.6) with ESMTP id l19E3ple096573;
	Fri, 9 Feb 2007 09:03:54 -0500 (EST) (envelope-from jhb@freebsd.org)
From: John Baldwin <jhb@freebsd.org>
To: freebsd-arch@freebsd.org
Date: Fri, 9 Feb 2007 08:37:04 -0500
User-Agent: KMail/1.9.4
References: <45CC0EB9.7030400@delphij.net>
In-Reply-To: <45CC0EB9.7030400@delphij.net>
MIME-Version: 1.0
Content-Type: text/plain;
  charset="iso-8859-15"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Message-Id: <200702090837.04495.jhb@freebsd.org>
X-Greylist: Sender succeeded SMTP AUTH authentication, not delayed by
	milter-greylist-2.0.2 (server.baldwin.cx [192.168.0.1]);
	Fri, 09 Feb 2007 09:03:55 -0500 (EST)
X-Virus-Scanned: ClamAV 0.88.3/2544/Fri Feb 9 03:44:48 2007 on
	server.baldwin.cx
X-Virus-Status: Clean
X-Spam-Status: No, score=-4.4 required=4.2 tests=ALL_TRUSTED,AWL,BAYES_00 
	autolearn=ham version=3.1.3
X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on server.baldwin.cx
Cc: LI Xin <delphij@delphij.net>
Subject: Re: Patch for review: resolve a race condition in [sg]etpriority()
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 09 Feb 2007 14:04:17 -0000

On Friday 09 February 2007 01:03, LI Xin wrote:
> Hi,
> 
> Here is a patch which resolves a race condition in [sg]etpriority(), but
> I think there might be some better idea to resolve the problem, so I
> would appreciate if someone would shed me some light :-)
> 
> Description of the problem:
> 
> In PRIO_USER of both [sg]etpriority(), p_ucred is expected to be a valid
> reference to a valid user credential.  However, for PRS_NEW processes,
> there is chances where it is not properly initialized, causing a fatal
> trap 12 [1].
> 
> On the other hand, just determining whether a process is in PRS_NEW
> state and skip those who are newly born is not enough for semantical
> correctness.  A process could either have p_{start,end}copy section
> copied or not, which needs to be handled with care.
> 
> The attached solution:
> 
> What I did in the attached patch is basically:
> 
>  - Before allproc_lock is sx_xunlock'ed in fork1(), lock both the parent
> and the child.
>  - After releasing allproc_lock, do process p_{start,end}copy,
> p_{start,end}zero and crhold() immediately, and release parent and
> child's p_mtx as soon as possible.
>  - In getpriority(), simply skip PRS_NEW processes because they does not
> affect PRIO_USER path's result.  This part is not really necessary for
> correctness, though.
> 
> The pros for this is that it does not require an extensive API change,
> and in theory it does not require that the allproc lock to be held for a
> extended period; the cons is that this adds some overhead because it
> adds two pairs of mutex lock/unlock.
> 
> In a private discussion, there was also some other ideas.  One is to
> just move the unlock to where the process is completely initialized,
> another is to introduce an event handler and msleep() p_state when
> necessary, and do a wakeup once we bumped the state, but I think these
> solution have their own limitations just like mine.

My only reason for favoring the wakeup for complete initialization is that
while this patch may solve the getprio/setprio race, it doesn't solve all
PRS_NEW-related races, which the sleep/wakeup proposal did.

-- 
John Baldwin