Skip site navigation (1)Skip section navigation (2)
Date:      Wed, 23 May 2018 20:37:13 -0600
From:      Nathan Friess <nathan.friess@gmail.com>
To:        =?UTF-8?Q?Roger_Pau_Monn=c3=a9?= <royger@FreeBSD.org>
Cc:        freebsd-xen@freebsd.org
Subject:   Re: Linux domU only works with xen_platform_pci=0 ?
Message-ID:  <3ee7f3e1-e46d-5fde-e2aa-37947ea25f0d@gmail.com>
In-Reply-To: <20180522090124.onimqynage4hys53@MBP-de-Roger>
References:  <a5a066b7-625f-d18f-6ea6-663256c09e59@duckster.net> <20180513151649.4ls73myegkhm3cep@MacBook-Pro-de-Roger.local> <0749df4b-1614-dcdf-1bf2-1bbad1ae5743@duckster.net> <20180514130445.ahqk5ol3kdhriqju@MacBook-Pro-de-Roger.local> <6c0e1f5a-3e7d-054e-298c-5ec3d97e6141@gmail.com> <20180515080836.kufr3q3mk5mxwx4q@MacBook-Pro-de-Roger.local> <c834ba9c-251a-ed00-dfaa-f3b012e3c302@gmail.com> <20180519080250.67fl66gqbew7xfzp@MacBook-Pro-de-Roger.local> <723f10d0-62c0-e48e-e8f4-b137378f0a25@gmail.com> <20180522090124.onimqynage4hys53@MBP-de-Roger>

next in thread | previous in thread | raw e-mail | index | archive | help
This is a multi-part message in MIME format.
--------------FC2B5AB8B9B6ED6E154A90A5
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit

On 2018-05-22 03:01 AM, Roger Pau Monné wrote:
> On Mon, May 21, 2018 at 01:51:13PM -0600, Nathan Friess wrote:
>> On 2018-05-19 02:02 AM, Roger Pau Monné wrote:
>> On a related note, did these patches ever make it into 11-stable?  I don't
>> see them at svn.freebsd.org/base/stable/11 but I may have missed something.
>>
>> https://lists.freebsd.org/pipermail/freebsd-xen/2016-December/002918.html
> 
> I don't think they are on HEAD either?
> 
> I don't have a lot of time ATM, so if you want to pick the patches,
> rebase them onto HEAD, test them and give me a branch I'm more than
> happy to review and commit them.
> 

Attached are 4 patches from your git repository, applied to HEAD with 
some testing on an up-to-date 12-CURRENT install.  I did see one kernel 
message that I assume is not fatal (more on that later).  I'm not able 
to test running as dom0 right now because I don't have any newish 
hardware available.  My main use case is FreeBSD as domU serving disks 
to other domUs.

The 0001-0003 patches applied cleanly to HEAD.  0004 required moving a 
function around to fix a compile error but otherwise is unchanged.  I 
also copied your commit messages from git into the patches for reference.


I was going to attach a patch to fix xen-tools with iasl in 12-CURRENT, 
but as I type this I see that you just committed a fix yesterday. 
Thanks! :)


The kernel message that I'm seeing appears when a domU shuts down or a 
disk is detached from a running domU, while the backend for the disk is 
my 12-CURRENT test system.  The backend spits this out:


lock order reversal: (sleepable after non-sleepable)
  1st 0xfffff8000802fe90 xbbd1 (xbbd1) @ 
/usr/src/sys/dev/xen/blkback/blkback.c:3423
  2nd 0xffffffff81fdf890 intrsrc (intrsrc) @ 
/usr/src/sys/x86/x86/intr_machdep.c:224
stack backtrace:
#0 0xffffffff80bdd993 at witness_debugger+0x73
#1 0xffffffff80bdd814 at witness_checkorder+0xe34
#2 0xffffffff80b7d798 at _sx_xlock+0x68
#3 0xffffffff811b3913 at intr_remove_handler+0x43
#4 0xffffffff811c63ef at xen_intr_unbind+0x10f
#5 0xffffffff80a12ecf at xbb_disconnect+0x2f
#6 0xffffffff80a12e54 at xbb_shutdown+0x1e4
#7 0xffffffff80a10be4 at xbb_frontend_changed+0x54
#8 0xffffffff80ed66a4 at xenbusb_back_otherend_changed+0x14
#9 0xffffffff80a2a382 at xenwatch_thread+0x182
#10 0xffffffff80b34164 at fork_exit+0x84
#11 0xffffffff8101ec9e at fork_trampoline+0xe
lock order reversal: (sleepable after non-sleepable)
  1st 0xfffff80008030690 xbbd0 (xbbd0) @ 
/usr/src/sys/dev/xen/blkback/blkback.c:3423
  2nd 0xffffffff81fdf890 intrsrc (intrsrc) @ 
/usr/src/sys/x86/x86/intr_machdep.c:224
stack backtrace:
#0 0xffffffff80bdd993 at witness_debugger+0x73
#1 0xffffffff80bdd814 at witness_checkorder+0xe34
#2 0xffffffff80b7d798 at _sx_xlock+0x68
#3 0xffffffff811b3913 at intr_remove_handler+0x43
#4 0xffffffff811c63ef at xen_intr_unbind+0x10f
#5 0xffffffff80a12ecf at xbb_disconnect+0x2f
#6 0xffffffff80a12e54 at xbb_shutdown+0x1e4
#7 0xffffffff80a10be4 at xbb_frontend_changed+0x54
#8 0xffffffff80ed66a4 at xenbusb_back_otherend_changed+0x14
#9 0xffffffff80a2a382 at xenwatch_thread+0x182
#10 0xffffffff80b34164 at fork_exit+0x84
#11 0xffffffff8101ec9e at fork_trampoline+0xe


Otherwise disk backends seem to work fine.

Nathan

--------------FC2B5AB8B9B6ED6E154A90A5
Content-Type: text/x-patch;
 name="0001-xenstore-remove-the-suspend-sx-lock.patch"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
 filename="0001-xenstore-remove-the-suspend-sx-lock.patch"

>From 9f9ca8f244c773ee5e6a08efa6874890a8217f05 Mon Sep 17 00:00:00 2001
From: Roger Pau Monne <roger.pau@citrix.com>
Date: Thu, 15 Dec 2016 16:52:50 +0000
Subject: [PATCH 1/4] xenstore: remove the suspend sx lock

There's no need to prevent suspend while doing xenstore transactions, callers of
transactions are supposed to be prepared for a transaction to fail.

This fixes a bug that could be triggered from the xenstore user-space device,
since starting a transaction from user-space would result in returning there
with a sx lock held, that causes a WITNESS check to trigger.

Sponsored by: Citrix Systems R&D
---
 sys/dev/xen/xenstore/xenstore.c | 81 ++---------------------------------------
 1 file changed, 4 insertions(+), 77 deletions(-)

diff --git a/sys/dev/xen/xenstore/xenstore.c b/sys/dev/xen/xenstore/xenstore.c
index 4f89b74..14e4ebd 100644
===================================================================
--- a/sys/dev/xen/xenstore/xenstore.c	(revision 334101)
+++ b/sys/dev/xen/xenstore/xenstore.c	(working copy)
@@ -202,18 +202,6 @@
 	struct mtx watch_events_lock;
 
 	/**
-	 * Sleepable lock used to prevent VM suspension while a
-	 * xenstore transaction is outstanding.
-	 *
-	 * Each active transaction holds a shared lock on the
-	 * suspend mutex.  Our suspend method blocks waiting
-	 * to acquire an exclusive lock.  This guarantees that
-	 * suspend processing will only proceed once all active
-	 * transactions have been retired.
-	 */
-	struct sx suspend_mutex;
-
-	/**
 	 * The processid of the xenwatch thread.
 	 */
 	pid_t xenwatch_pid;
@@ -710,50 +698,6 @@
 }
 
 /*---------------- XenStore Message Request/Reply Processing -----------------*/
-/**
- * Filter invoked before transmitting any message to the XenStore service.
- *
- * The role of the filter may expand, but currently serves to manage
- * the interactions of messages with transaction state.
- *
- * \param request_msg_type  The message type for the request.
- */
-static inline void
-xs_request_filter(uint32_t request_msg_type)
-{
-	if (request_msg_type == XS_TRANSACTION_START)
-		sx_slock(&xs.suspend_mutex);
-}
-
-/**
- * Filter invoked after transmitting any message to the XenStore service.
- *
- * The role of the filter may expand, but currently serves to manage
- * the interactions of messages with transaction state.
- *
- * \param request_msg_type     The message type for the original request.
- * \param reply_msg_type       The message type for any received reply.
- * \param request_reply_error  The error status from the attempt to send
- *                             the request or retrieve the reply.
- */
-static inline void
-xs_reply_filter(uint32_t request_msg_type,
-    uint32_t reply_msg_type, int request_reply_error)
-{
-	/*
-	 * The count of transactions drops if we attempted
-	 * to end a transaction (even if that attempt fails
-	 * in error), we receive a transaction end acknowledgement,
-	 * or if our attempt to begin a transaction fails.
-	 */
-	if (request_msg_type == XS_TRANSACTION_END
-	 || (request_reply_error == 0 && reply_msg_type == XS_TRANSACTION_END)
-	 || (request_msg_type == XS_TRANSACTION_START
-	  && (request_reply_error != 0 || reply_msg_type == XS_ERROR)))
-		sx_sunlock(&xs.suspend_mutex);
-
-}
-
 #define xsd_error_count	(sizeof(xsd_errors) / sizeof(xsd_errors[0]))
 
 /**
@@ -843,7 +787,6 @@
 	int error;
 
 	request_type = msg->type;
-	xs_request_filter(request_type);
 
 	sx_xlock(&xs.request_mutex);
 	if ((error = xs_write_store(msg, sizeof(*msg) + msg->len)) == 0)
@@ -850,8 +793,6 @@
 		error = xs_read_reply(&msg->type, &msg->len, result);
 	sx_xunlock(&xs.request_mutex);
 
-	xs_reply_filter(request_type, msg->type, error);
-
 	return (error);
 }
 
@@ -887,8 +828,6 @@
 	for (i = 0; i < num_vecs; i++)
 		msg.len += iovec[i].iov_len;
 
-	xs_request_filter(request_type);
-
 	sx_xlock(&xs.request_mutex);
 	error = xs_write_store(&msg, sizeof(msg));
 	if (error) {
@@ -908,7 +847,6 @@
 
 error_lock_held:
 	sx_xunlock(&xs.request_mutex);
-	xs_reply_filter(request_type, msg.type, error);
 	if (error)
 		return (error);
 
@@ -1206,7 +1144,6 @@
 	mtx_init(&xs.reply_lock, "reply lock", NULL, MTX_DEF);
 	sx_init(&xs.xenwatch_mutex, "xenwatch");
 	sx_init(&xs.request_mutex, "xenstore request");
-	sx_init(&xs.suspend_mutex, "xenstore suspend");
 	mtx_init(&xs.registered_watches_lock, "watches", NULL, MTX_DEF);
 	mtx_init(&xs.watch_events_lock, "watch events", NULL, MTX_DEF);
 
@@ -1249,7 +1186,6 @@
 	if (error != 0)
 		return (error);
 
-	sx_xlock(&xs.suspend_mutex);
 	sx_xlock(&xs.request_mutex);
 
 	return (0);
@@ -1269,8 +1205,10 @@
 	sx_xunlock(&xs.request_mutex);
 
 	/*
-	 * No need for registered_watches_lock: the suspend_mutex
-	 * is sufficient.
+	 * NB: since xenstore childs have not been resumed yet, there's
+	 * no need to hold any watch mutex. Having clients try to add or
+	 * remove watches at this point (before xenstore is resumed) is
+	 * clearly a violantion of the resume order.
 	 */
 	LIST_FOREACH(watch, &xs.registered_watches, list) {
 		sprintf(token, "%lX", (long)watch);
@@ -1277,8 +1215,6 @@
 		xs_watch(watch->node, token);
 	}
 
-	sx_xunlock(&xs.suspend_mutex);
-
 	/* Resume child Xen devices. */
 	bus_generic_resume(dev);
 
@@ -1631,8 +1567,6 @@
 
 	sprintf(token, "%lX", (long)watch);
 
-	sx_slock(&xs.suspend_mutex);
-
 	mtx_lock(&xs.registered_watches_lock);
 	KASSERT(find_watch(token) == NULL, ("watch already registered"));
 	LIST_INSERT_HEAD(&xs.registered_watches, watch, list);
@@ -1650,8 +1584,6 @@
 		mtx_unlock(&xs.registered_watches_lock);
 	}
 
-	sx_sunlock(&xs.suspend_mutex);
-
 	return (error);
 }
 
@@ -1664,12 +1596,9 @@
 
 	sprintf(token, "%lX", (long)watch);
 
-	sx_slock(&xs.suspend_mutex);
-
 	mtx_lock(&xs.registered_watches_lock);
 	if (find_watch(token) == NULL) {
 		mtx_unlock(&xs.registered_watches_lock);
-		sx_sunlock(&xs.suspend_mutex);
 		return;
 	}
 	LIST_REMOVE(watch, list);
@@ -1680,8 +1609,6 @@
 		log(LOG_WARNING, "XENSTORE Failed to release watch %s: %i\n",
 		    watch->node, error);
 
-	sx_sunlock(&xs.suspend_mutex);
-
 	/* Cancel pending watch events. */
 	mtx_lock(&xs.watch_events_lock);
 	TAILQ_FOREACH_SAFE(msg, &xs.watch_events, list, tmp) {

--------------FC2B5AB8B9B6ED6E154A90A5
Content-Type: text/x-patch;
 name="0002-xenstore-don-t-wait-with-the-PCATCH-flag.patch"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
 filename="0002-xenstore-don-t-wait-with-the-PCATCH-flag.patch"

>From c451da6a962f53b3892c1ece1c3e0473d5c93bc4 Mon Sep 17 00:00:00 2001
From: Roger Pau Monne <roger.pau@citrix.com>
Date: Thu, 15 Dec 2016 16:54:40 +0000
Subject: [PATCH 2/4] xenstore: don't wait with the PCATCH flag

Due to the current synchronous xenstore implementation in FreeBSD, we cannot
return from xs_read_reply without reading a reply, or else the ring gets out of
sync and the next request will read the previous reply and crash due to the
type mismatch. A proper solution involves making use of the req_id field in the
message and allowing multiple in-flight messages at the same time on the ring.

Sponsored by: Citrix Systems R&D
---
 sys/dev/xen/xenstore/xenstore.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/sys/dev/xen/xenstore/xenstore.c b/sys/dev/xen/xenstore/xenstore.c
index 14e4ebd..1c2e21f 100644
===================================================================
--- a/sys/dev/xen/xenstore/xenstore.c	(revision 334101)
+++ b/sys/dev/xen/xenstore/xenstore.c	(working copy)
@@ -798,8 +798,7 @@
 
 	mtx_lock(&xs.reply_lock);
 	while (TAILQ_EMPTY(&xs.reply_list)) {
-		error = mtx_sleep(&xs.reply_list, &xs.reply_lock,
-		    PCATCH, "xswait", hz/10);
+		error = mtx_sleep(&xs.reply_list, &xs.reply_lock, 0, "xswait", hz/10);
 		if (error && error != EWOULDBLOCK) {
 			mtx_unlock(&xs.reply_lock);
 			return (error);

--------------FC2B5AB8B9B6ED6E154A90A5
Content-Type: text/x-patch;
 name="0003-dev-xenstore-add-support-for-watches.patch"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
 filename="0003-dev-xenstore-add-support-for-watches.patch"

>From debde7d62ebcb2ead12f3ff9708870dc329273d4 Mon Sep 17 00:00:00 2001
From: Roger Pau Monne <roger.pau@cirix.com>
Date: Thu, 15 Dec 2016 16:54:39 +0000
Subject: [PATCH 3/4] dev/xenstore: add support for watches

Allow user-space applications to register watches using the xenstore device.
This is needed in order to run toolstack operations on domains different than
the one where xenstore is running (in which case the device is not used, since
the connection to xenstore is done using a plain socket).

Sponsored by:   Citrix Systems R&D
---
 sys/dev/xen/xenstore/xenstore_dev.c | 267 +++++++++++++++++++++++++++++++++---
 1 file changed, 247 insertions(+), 20 deletions(-)

diff --git a/sys/dev/xen/xenstore/xenstore_dev.c b/sys/dev/xen/xenstore/xenstore_dev.c
index ce62140..18f86b8 100644
===================================================================
--- a/sys/dev/xen/xenstore/xenstore_dev.c	(revision 334101)
+++ b/sys/dev/xen/xenstore/xenstore_dev.c	(working copy)
@@ -44,6 +44,8 @@
 #include <sys/malloc.h>
 #include <sys/conf.h>
 #include <sys/module.h>
+#include <sys/selinfo.h>
+#include <sys/poll.h>
 
 #include <xen/xen-os.h>
 
@@ -56,10 +58,20 @@
 	struct xs_transaction handle;
 };
 
+struct xs_dev_watch {
+	LIST_ENTRY(xs_dev_watch) list;
+	struct xs_watch watch;
+	char *token;
+	struct xs_dev_data *user;
+};
+
 struct xs_dev_data {
 	/* In-progress transaction. */
-	LIST_HEAD(xdd_list_head, xs_dev_transaction) transactions;
+	LIST_HEAD(, xs_dev_transaction) transactions;
 
+	/* Active watches. */
+	LIST_HEAD(, xs_dev_watch) watches;
+
 	/* Partial request. */
 	unsigned int len;
 	union {
@@ -71,8 +83,136 @@
 #define MASK_READ_IDX(idx) ((idx)&(PAGE_SIZE-1))
 	char read_buffer[PAGE_SIZE];
 	unsigned int read_cons, read_prod;
+
+	/* Serializes writes to the read buffer. */
+	struct mtx lock;
+
+	/* Polling structure (for reads only ATM). */
+	struct selinfo ev_rsel;
 };
 
+static void
+xs_queue_reply(struct xs_dev_data *u, const char *data, unsigned int len)
+{
+	int i;
+
+	for (i = 0; i < len; i++, u->read_prod++)
+		u->read_buffer[MASK_READ_IDX(u->read_prod)] = data[i];
+
+	KASSERT((u->read_prod - u->read_cons) <= sizeof(u->read_buffer),
+	    ("xenstore reply too big"));
+
+	wakeup(u);
+	selwakeup(&u->ev_rsel);
+}
+
+static const char *
+xs_dev_error_to_string(int error)
+{
+	int i;
+
+	for (i = 0; i < nitems(xsd_errors); i++)
+		if (xsd_errors[i].errnum == error)
+			return (xsd_errors[i].errstring);
+
+	return (NULL);
+}
+
+static void
+xs_dev_return_error(struct xs_dev_data *u, int error, int req_id, int tx_id)
+{
+	struct xsd_sockmsg msg;
+	const char *payload;
+
+	msg.type = XS_ERROR;
+	msg.req_id = req_id;
+	msg.tx_id = tx_id;
+	payload = NULL;
+
+
+	payload = xs_dev_error_to_string(error);
+	if (payload == NULL)
+		payload = xs_dev_error_to_string(EINVAL);
+	KASSERT(payload != NULL, ("Unable to find string for EINVAL errno"));
+
+	msg.len = strlen(payload) + 1;
+
+	mtx_lock(&u->lock);
+	xs_queue_reply(u, (char *)&msg, sizeof(msg));
+	xs_queue_reply(u, payload, msg.len);
+	mtx_unlock(&u->lock);
+}
+
+static int
+xs_dev_watch_message_parse_string(const char **p, const char *end,
+    const char **string_r)
+{
+	const char *nul = memchr(*p, 0, end - *p);
+	if (!nul)
+		return -EINVAL;
+
+	*string_r = *p;
+	*p = nul+1;
+
+	return 0;
+}
+
+static int
+xs_dev_watch_message_parse(const struct xsd_sockmsg *msg, const char **path_r,
+    const char **token_r)
+{
+	const char *begin = (const char*)msg;
+	const char *p = begin + sizeof(*msg);
+	const char *end = p + msg->len;
+	int error;
+
+	KASSERT(p <= end, ("payload overflow"));
+
+	error = xs_dev_watch_message_parse_string(&p, end, path_r);
+	if (error)
+		return error;
+	error = xs_dev_watch_message_parse_string(&p, end, token_r);
+	if (error)
+		return error;
+
+	return 0;
+}
+
+static struct xs_dev_watch *
+xs_dev_find_watch(struct xs_dev_data *u, const char *token)
+{
+	struct xs_dev_watch *watch;
+
+	LIST_FOREACH(watch, &u->watches, list) {
+		if (strcmp(watch->token, token) == 0)
+			return watch;
+	}
+
+	return NULL;
+}
+
+static void
+xs_dev_watch_cb(struct xs_watch *watch, const char **vec, unsigned int len)
+{
+	struct xs_dev_watch *dwatch;
+	struct xsd_sockmsg msg;
+	char *payload;
+
+	dwatch = (struct xs_dev_watch *)watch->callback_data;
+	msg.type = XS_WATCH_EVENT;
+	msg.req_id = msg.tx_id = 0;
+	msg.len = strlen(vec[XS_WATCH_PATH]) + strlen(dwatch->token) + 2;
+
+	payload = malloc(msg.len, M_XENSTORE, M_WAITOK);
+	strcpy(payload, vec[XS_WATCH_PATH]);
+	strcpy(&payload[strlen(vec[XS_WATCH_PATH]) + 1], dwatch->token);
+	mtx_lock(&dwatch->user->lock);
+	xs_queue_reply(dwatch->user, (char *)&msg, sizeof(msg));
+	xs_queue_reply(dwatch->user, payload, msg.len);
+	mtx_unlock(&dwatch->user->lock);
+	free(payload, M_XENSTORE);
+}
+
 static int 
 xs_dev_read(struct cdev *dev, struct uio *uio, int ioflag)
 {
@@ -101,27 +241,16 @@
 	return (0);
 }
 
-static void
-xs_queue_reply(struct xs_dev_data *u, char *data, unsigned int len)
-{
-	int i;
-
-	for (i = 0; i < len; i++, u->read_prod++)
-		u->read_buffer[MASK_READ_IDX(u->read_prod)] = data[i];
-
-	KASSERT((u->read_prod - u->read_cons) <= sizeof(u->read_buffer),
-	    ("xenstore reply too big"));
-
-	wakeup(u);
-}
-
 static int 
 xs_dev_write(struct cdev *dev, struct uio *uio, int ioflag)
 {
 	int error;
+	const char *wpath, *wtoken;
 	struct xs_dev_data *u;
 	struct xs_dev_transaction *trans;
+	struct xs_dev_watch *watch;
 	void *reply;
+	static const char *ok = "OK";
 	int len = uio->uio_resid;
 
 	error = devfs_get_cdevpriv((void **)&u);
@@ -168,35 +297,130 @@
 				LIST_REMOVE(trans, list);
 				free(trans, M_XENSTORE);
 			}
+			mtx_lock(&u->lock);
 			xs_queue_reply(u, (char *)&u->u.msg, sizeof(u->u.msg));
 			xs_queue_reply(u, (char *)reply, u->u.msg.len);
+			mtx_unlock(&u->lock);
 			free(reply, M_XENSTORE);
 		}
 		break;
+	case XS_WATCH:
+		u->u.msg.tx_id = 0;
+		error = xs_dev_watch_message_parse(&u->u.msg, &wpath, &wtoken);
+		if (error)
+			break;
+		if (xs_dev_find_watch(u, wtoken) != NULL) {
+			error = EINVAL;
+			break;
+		}
 
+		watch = malloc(sizeof(*watch), M_XENSTORE, M_WAITOK);
+		watch->watch.node = strdup(wpath, M_XENSTORE);
+		watch->watch.callback = xs_dev_watch_cb;
+		watch->watch.callback_data = (uintptr_t)watch;
+		watch->token = strdup(wtoken, M_XENSTORE);
+		watch->user = u;
+
+		error = xs_register_watch(&watch->watch);
+		if (error != 0) {
+			free(watch->token, M_XENSTORE);
+			free(watch->watch.node, M_XENSTORE);
+			free(watch, M_XENSTORE);
+			break;
+		}
+
+		LIST_INSERT_HEAD(&u->watches, watch, list);
+		u->u.msg.len = sizeof(ok);
+		mtx_lock(&u->lock);
+		xs_queue_reply(u, (char *)&u->u.msg, sizeof(u->u.msg));
+		xs_queue_reply(u, ok, sizeof(ok));
+		mtx_unlock(&u->lock);
+		break;
+	case XS_UNWATCH:
+		u->u.msg.tx_id = 0;
+		error = xs_dev_watch_message_parse(&u->u.msg, &wpath, &wtoken);
+		if (error)
+			break;
+		watch = xs_dev_find_watch(u, wtoken);
+		if (watch == NULL) {
+			error = EINVAL;
+			break;
+		}
+
+		LIST_REMOVE(watch, list);
+		xs_unregister_watch(&watch->watch);
+		free(watch->watch.node, M_XENSTORE);
+		free(watch->token, M_XENSTORE);
+		free(watch, M_XENSTORE);
+		u->u.msg.len = sizeof(ok);
+		mtx_lock(&u->lock);
+		xs_queue_reply(u, (char *)&u->u.msg, sizeof(u->u.msg));
+		xs_queue_reply(u, ok, sizeof(ok));
+		mtx_unlock(&u->lock);
+		break;
 	default:
 		error = EINVAL;
 		break;
 	}
 
-	if (error == 0)
-		u->len = 0;
+	if (error != 0)
+		xs_dev_return_error(u, error, u->u.msg.req_id, u->u.msg.tx_id);
 
-	return (error);
+	/* Reset the write buffer. */
+	u->len = 0;
+
+	return (0);
 }
 
+static int
+xs_dev_poll(struct cdev *dev, int events, struct thread *td)
+{
+	struct xs_dev_data *u;
+	int error, mask;
+
+	error = devfs_get_cdevpriv((void **)&u);
+	if (error != 0)
+		return (POLLERR);
+
+	/* we can always write */
+	mask = events & (POLLOUT | POLLWRNORM);
+
+	if (events & (POLLIN | POLLRDNORM)) {
+		if (u->read_cons != u->read_prod) {
+			mask |= events & (POLLIN | POLLRDNORM);
+		} else {
+			/* Record that someone is waiting */
+			selrecord(td, &u->ev_rsel);
+		}
+	}
+
+	return (mask);
+}
+
 static void
 xs_dev_dtor(void *arg)
 {
 	struct xs_dev_data *u = arg;
-	struct xs_dev_transaction *trans, *tmp;
+	struct xs_dev_transaction *trans, *tmpt;
+	struct xs_dev_watch *watch, *tmpw;
 
-	LIST_FOREACH_SAFE(trans, &u->transactions, list, tmp) {
+	seldrain(&u->ev_rsel);
+
+	LIST_FOREACH_SAFE(trans, &u->transactions, list, tmpt) {
 		xs_transaction_end(trans->handle, 1);
 		LIST_REMOVE(trans, list);
 		free(trans, M_XENSTORE);
 	}
 
+	LIST_FOREACH_SAFE(watch, &u->watches, list, tmpw) {
+		LIST_REMOVE(watch, list);
+		xs_unregister_watch(&watch->watch);
+		free(watch->watch.node, M_XENSTORE);
+		free(watch->token, M_XENSTORE);
+		free(watch, M_XENSTORE);
+	}
+	mtx_destroy(&u->lock);
+
 	free(u, M_XENSTORE);
 }
 
@@ -207,7 +431,9 @@
 	int error;
 
 	u = malloc(sizeof(*u), M_XENSTORE, M_WAITOK|M_ZERO);
+	mtx_init(&u->lock, "xsdev_lock", NULL, MTX_DEF);
 	LIST_INIT(&u->transactions);
+	LIST_INIT(&u->watches);
 	error = devfs_set_cdevpriv(u, xs_dev_dtor);
 	if (error != 0)
 		free(u, M_XENSTORE);
@@ -220,6 +446,7 @@
 	.d_read = xs_dev_read,
 	.d_write = xs_dev_write,
 	.d_open = xs_dev_open,
+	.d_poll = xs_dev_poll,
 	.d_name = "xs_dev",
 };
 

--------------FC2B5AB8B9B6ED6E154A90A5
Content-Type: text/x-patch;
 name="0004-xenstore-dev-prevent-user-space-xenstore-device-from.patch"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
 filename*0="0004-xenstore-dev-prevent-user-space-xenstore-device-from.pa";
 filename*1="tch"

>From 751095963c026876152c8a9c2cbfd9ff76ae6329 Mon Sep 17 00:00:00 2001
From: Roger Pau Monne <roger.pau@citrix.com>
Date: Thu, 15 Dec 2016 16:54:40 +0000
Subject: [PATCH 4/4] xenstore/dev: prevent user-space xenstore device from
 hijacking transactions

The user-space xenstore device is currently lacking a check to make sure that
the caller is only using transaction ids currently assigned to it. This allows
users of the xenstore device to hijack transactions not started by them,
although the scope is limited to transactions started by the same domain.

Sponsored by: Citrix Systems R&D
---
 sys/dev/xen/xenstore/xenstore_dev.c | 28 ++++++++++++++++++++++------
 1 file changed, 22 insertions(+), 6 deletions(-)

diff --git a/sys/dev/xen/xenstore/xenstore_dev.c b/sys/dev/xen/xenstore/xenstore_dev.c
index 18f86b8..de5d5e6 100644
===================================================================
--- a/sys/dev/xen/xenstore/xenstore_dev.c	(revision 334101)
+++ b/sys/dev/xen/xenstore/xenstore_dev.c	(working copy)
@@ -73,6 +73,18 @@
 	unsigned int read_cons, read_prod;
 };
 
+static struct xs_dev_transaction *
+xs_dev_find_transaction(struct xs_dev_data *u, uint32_t tx_id)
+{
+	struct xs_dev_transaction *trans;
+
+	LIST_FOREACH(trans, &u->transactions, list)
+		if (trans->handle.id == tx_id)
+			return trans;
+
+	return NULL;
+}
+
 static int 
 xs_dev_read(struct cdev *dev, struct uio *uio, int ioflag)
 {
@@ -151,6 +163,12 @@
 	case XS_MKDIR:
 	case XS_RM:
 	case XS_SET_PERMS:
+		/* Check that this transaction id is not hijacked. */
+		if (u->u.msg.tx_id != 0 &&
+			xs_dev_find_transaction(u, u->u.msg.tx_id) == NULL) {
+			error = EINVAL;
+			break;
+		}
 		error = xs_dev_request_and_reply(&u->u.msg, &reply);
 		if (!error) {
 			if (u->u.msg.type == XS_TRANSACTION_START) {
@@ -159,12 +177,10 @@
 				trans->handle.id = strtoul(reply, NULL, 0);
 				LIST_INSERT_HEAD(&u->transactions, trans, list);
 			} else if (u->u.msg.type == XS_TRANSACTION_END) {
-				LIST_FOREACH(trans, &u->transactions, list)
-					if (trans->handle.id == u->u.msg.tx_id)
-						break;
-#if 0 /* XXX does this mean the list is empty? */
-				BUG_ON(&trans->list == &u->transactions);
-#endif
+				trans = xs_dev_find_transaction(u,
+				    u->u.msg.tx_id);
+				KASSERT(trans != NULL,
+				    ("Unable to find transaction"));
 				LIST_REMOVE(trans, list);
 				free(trans, M_XENSTORE);
 			}

--------------FC2B5AB8B9B6ED6E154A90A5--



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?3ee7f3e1-e46d-5fde-e2aa-37947ea25f0d>