From owner-freebsd-xen@freebsd.org Thu May 24 02:37:18 2018 Return-Path: Delivered-To: freebsd-xen@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id B372CEF5CA8 for ; Thu, 24 May 2018 02:37:17 +0000 (UTC) (envelope-from nathan.friess@gmail.com) Received: from mail-pg0-x244.google.com (mail-pg0-x244.google.com [IPv6:2607:f8b0:400e:c05::244]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 408806D2A0; Thu, 24 May 2018 02:37:17 +0000 (UTC) (envelope-from nathan.friess@gmail.com) Received: by mail-pg0-x244.google.com with SMTP id p9-v6so62604pgc.9; Wed, 23 May 2018 19:37:17 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=subject:to:cc:references:from:message-id:date:user-agent :mime-version:in-reply-to:content-language; bh=5QjU72cZKDM0BzAqPELHtRd5NEoE1lwmFgD0Lecr8fI=; b=P7EKUM3Z/Yc4mqfzkQTWaNftwEAoxTyg6MSwuSoiwuKXD2ZPP09080MrkbT7LF2m7E YV9jJD5yPBloYXD0sUYAUHrfQWO8IbJFinf9SEN1oYwaUyArNc/p9AyscyIqaxEkEYWg AasQxBtr9YVNN3h7rPnd4eOiDMs3g5KkH/prmAT0N66S2IM87iKSxYZXpJUtlp7ATnA/ vPtNXzYifzWIjMsSOV2YqkRDoirf8ByVjIcvP1p3o7cjg4TJ12ZdSuD3xbenw+/J4Xr7 KYnKe3Ga7z4zQ3X2KrXJ3/X5WZpwkMIRIn5a9svTTjBs4rZ76IQTM+3nzPA0RSDU6aQU Kh9g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:to:cc:references:from:message-id:date :user-agent:mime-version:in-reply-to:content-language; bh=5QjU72cZKDM0BzAqPELHtRd5NEoE1lwmFgD0Lecr8fI=; b=qk2HR0tp0VM1bUtOrK+XNRjeSNKLAwI9HAn7lpvmBUwCE6qraXRKFWjjSEmV/AzkQz 9kbSb++VYJyIFqdkuPNj1BDI2SEZC2BxG1mDQAn8d4GxhqAEocbsrm4FprO7jjqYQqNc 4ENY9znpuvT5A2kk54K7beygTmRZIqQfQuCYGB0X6oUL9NGtAmoEeBRpd8sHGW5wcexz OKszbF/NyRu1SGII+c2VuZgbu2hRhbN6QFKMMYfuUlCJBxm9gBKiHX6r+B6hR+Et/yma Exc72LAn3Hvn96k/zIC1/DYrduRekdYT8H9AiAsdl19RbmX6aEorM2+0gy0M/40YTCq4 h7wQ== X-Gm-Message-State: ALKqPwf2oeDV9gA8/VocrHqtWNtQ5PBOLeEfFEAlen96wD5qbuWmpOOW AmZp002zDoa4sfwtA4b1j+WxI7W3 X-Google-Smtp-Source: AB8JxZqtQyU2PWQttOb7kHLZfgR2AdL2jPnJKs50UWliXIo44+K1Yo4A+piDbG2a08r5Uyo7RAu5cA== X-Received: by 2002:a63:b601:: with SMTP id j1-v6mr4292142pgf.335.1527129435427; Wed, 23 May 2018 19:37:15 -0700 (PDT) Received: from [10.2.1.1] (S01060018e7c4b870.cg.shawcable.net. [70.72.182.108]) by smtp.gmail.com with ESMTPSA id z131-v6sm29964319pgz.86.2018.05.23.19.37.14 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 23 May 2018 19:37:14 -0700 (PDT) Subject: Re: Linux domU only works with xen_platform_pci=0 ? To: =?UTF-8?Q?Roger_Pau_Monn=c3=a9?= Cc: freebsd-xen@freebsd.org References: <20180513151649.4ls73myegkhm3cep@MacBook-Pro-de-Roger.local> <0749df4b-1614-dcdf-1bf2-1bbad1ae5743@duckster.net> <20180514130445.ahqk5ol3kdhriqju@MacBook-Pro-de-Roger.local> <6c0e1f5a-3e7d-054e-298c-5ec3d97e6141@gmail.com> <20180515080836.kufr3q3mk5mxwx4q@MacBook-Pro-de-Roger.local> <20180519080250.67fl66gqbew7xfzp@MacBook-Pro-de-Roger.local> <723f10d0-62c0-e48e-e8f4-b137378f0a25@gmail.com> <20180522090124.onimqynage4hys53@MBP-de-Roger> From: Nathan Friess Message-ID: <3ee7f3e1-e46d-5fde-e2aa-37947ea25f0d@gmail.com> Date: Wed, 23 May 2018 20:37:13 -0600 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.6.0 MIME-Version: 1.0 In-Reply-To: <20180522090124.onimqynage4hys53@MBP-de-Roger> Content-Type: multipart/mixed; boundary="------------FC2B5AB8B9B6ED6E154A90A5" Content-Language: en-US X-BeenThere: freebsd-xen@freebsd.org X-Mailman-Version: 2.1.26 Precedence: list List-Id: Discussion of the freebsd port to xen - implementation and usage List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 24 May 2018 02:37:18 -0000 This is a multi-part message in MIME format. --------------FC2B5AB8B9B6ED6E154A90A5 Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 8bit On 2018-05-22 03:01 AM, Roger Pau Monné wrote: > On Mon, May 21, 2018 at 01:51:13PM -0600, Nathan Friess wrote: >> On 2018-05-19 02:02 AM, Roger Pau Monné wrote: >> On a related note, did these patches ever make it into 11-stable? I don't >> see them at svn.freebsd.org/base/stable/11 but I may have missed something. >> >> https://lists.freebsd.org/pipermail/freebsd-xen/2016-December/002918.html > > I don't think they are on HEAD either? > > I don't have a lot of time ATM, so if you want to pick the patches, > rebase them onto HEAD, test them and give me a branch I'm more than > happy to review and commit them. > Attached are 4 patches from your git repository, applied to HEAD with some testing on an up-to-date 12-CURRENT install. I did see one kernel message that I assume is not fatal (more on that later). I'm not able to test running as dom0 right now because I don't have any newish hardware available. My main use case is FreeBSD as domU serving disks to other domUs. The 0001-0003 patches applied cleanly to HEAD. 0004 required moving a function around to fix a compile error but otherwise is unchanged. I also copied your commit messages from git into the patches for reference. I was going to attach a patch to fix xen-tools with iasl in 12-CURRENT, but as I type this I see that you just committed a fix yesterday. Thanks! :) The kernel message that I'm seeing appears when a domU shuts down or a disk is detached from a running domU, while the backend for the disk is my 12-CURRENT test system. The backend spits this out: lock order reversal: (sleepable after non-sleepable) 1st 0xfffff8000802fe90 xbbd1 (xbbd1) @ /usr/src/sys/dev/xen/blkback/blkback.c:3423 2nd 0xffffffff81fdf890 intrsrc (intrsrc) @ /usr/src/sys/x86/x86/intr_machdep.c:224 stack backtrace: #0 0xffffffff80bdd993 at witness_debugger+0x73 #1 0xffffffff80bdd814 at witness_checkorder+0xe34 #2 0xffffffff80b7d798 at _sx_xlock+0x68 #3 0xffffffff811b3913 at intr_remove_handler+0x43 #4 0xffffffff811c63ef at xen_intr_unbind+0x10f #5 0xffffffff80a12ecf at xbb_disconnect+0x2f #6 0xffffffff80a12e54 at xbb_shutdown+0x1e4 #7 0xffffffff80a10be4 at xbb_frontend_changed+0x54 #8 0xffffffff80ed66a4 at xenbusb_back_otherend_changed+0x14 #9 0xffffffff80a2a382 at xenwatch_thread+0x182 #10 0xffffffff80b34164 at fork_exit+0x84 #11 0xffffffff8101ec9e at fork_trampoline+0xe lock order reversal: (sleepable after non-sleepable) 1st 0xfffff80008030690 xbbd0 (xbbd0) @ /usr/src/sys/dev/xen/blkback/blkback.c:3423 2nd 0xffffffff81fdf890 intrsrc (intrsrc) @ /usr/src/sys/x86/x86/intr_machdep.c:224 stack backtrace: #0 0xffffffff80bdd993 at witness_debugger+0x73 #1 0xffffffff80bdd814 at witness_checkorder+0xe34 #2 0xffffffff80b7d798 at _sx_xlock+0x68 #3 0xffffffff811b3913 at intr_remove_handler+0x43 #4 0xffffffff811c63ef at xen_intr_unbind+0x10f #5 0xffffffff80a12ecf at xbb_disconnect+0x2f #6 0xffffffff80a12e54 at xbb_shutdown+0x1e4 #7 0xffffffff80a10be4 at xbb_frontend_changed+0x54 #8 0xffffffff80ed66a4 at xenbusb_back_otherend_changed+0x14 #9 0xffffffff80a2a382 at xenwatch_thread+0x182 #10 0xffffffff80b34164 at fork_exit+0x84 #11 0xffffffff8101ec9e at fork_trampoline+0xe Otherwise disk backends seem to work fine. Nathan --------------FC2B5AB8B9B6ED6E154A90A5 Content-Type: text/x-patch; name="0001-xenstore-remove-the-suspend-sx-lock.patch" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="0001-xenstore-remove-the-suspend-sx-lock.patch" >From 9f9ca8f244c773ee5e6a08efa6874890a8217f05 Mon Sep 17 00:00:00 2001 From: Roger Pau Monne Date: Thu, 15 Dec 2016 16:52:50 +0000 Subject: [PATCH 1/4] xenstore: remove the suspend sx lock There's no need to prevent suspend while doing xenstore transactions, callers of transactions are supposed to be prepared for a transaction to fail. This fixes a bug that could be triggered from the xenstore user-space device, since starting a transaction from user-space would result in returning there with a sx lock held, that causes a WITNESS check to trigger. Sponsored by: Citrix Systems R&D --- sys/dev/xen/xenstore/xenstore.c | 81 ++--------------------------------------- 1 file changed, 4 insertions(+), 77 deletions(-) diff --git a/sys/dev/xen/xenstore/xenstore.c b/sys/dev/xen/xenstore/xenstore.c index 4f89b74..14e4ebd 100644 =================================================================== --- a/sys/dev/xen/xenstore/xenstore.c (revision 334101) +++ b/sys/dev/xen/xenstore/xenstore.c (working copy) @@ -202,18 +202,6 @@ struct mtx watch_events_lock; /** - * Sleepable lock used to prevent VM suspension while a - * xenstore transaction is outstanding. - * - * Each active transaction holds a shared lock on the - * suspend mutex. Our suspend method blocks waiting - * to acquire an exclusive lock. This guarantees that - * suspend processing will only proceed once all active - * transactions have been retired. - */ - struct sx suspend_mutex; - - /** * The processid of the xenwatch thread. */ pid_t xenwatch_pid; @@ -710,50 +698,6 @@ } /*---------------- XenStore Message Request/Reply Processing -----------------*/ -/** - * Filter invoked before transmitting any message to the XenStore service. - * - * The role of the filter may expand, but currently serves to manage - * the interactions of messages with transaction state. - * - * \param request_msg_type The message type for the request. - */ -static inline void -xs_request_filter(uint32_t request_msg_type) -{ - if (request_msg_type == XS_TRANSACTION_START) - sx_slock(&xs.suspend_mutex); -} - -/** - * Filter invoked after transmitting any message to the XenStore service. - * - * The role of the filter may expand, but currently serves to manage - * the interactions of messages with transaction state. - * - * \param request_msg_type The message type for the original request. - * \param reply_msg_type The message type for any received reply. - * \param request_reply_error The error status from the attempt to send - * the request or retrieve the reply. - */ -static inline void -xs_reply_filter(uint32_t request_msg_type, - uint32_t reply_msg_type, int request_reply_error) -{ - /* - * The count of transactions drops if we attempted - * to end a transaction (even if that attempt fails - * in error), we receive a transaction end acknowledgement, - * or if our attempt to begin a transaction fails. - */ - if (request_msg_type == XS_TRANSACTION_END - || (request_reply_error == 0 && reply_msg_type == XS_TRANSACTION_END) - || (request_msg_type == XS_TRANSACTION_START - && (request_reply_error != 0 || reply_msg_type == XS_ERROR))) - sx_sunlock(&xs.suspend_mutex); - -} - #define xsd_error_count (sizeof(xsd_errors) / sizeof(xsd_errors[0])) /** @@ -843,7 +787,6 @@ int error; request_type = msg->type; - xs_request_filter(request_type); sx_xlock(&xs.request_mutex); if ((error = xs_write_store(msg, sizeof(*msg) + msg->len)) == 0) @@ -850,8 +793,6 @@ error = xs_read_reply(&msg->type, &msg->len, result); sx_xunlock(&xs.request_mutex); - xs_reply_filter(request_type, msg->type, error); - return (error); } @@ -887,8 +828,6 @@ for (i = 0; i < num_vecs; i++) msg.len += iovec[i].iov_len; - xs_request_filter(request_type); - sx_xlock(&xs.request_mutex); error = xs_write_store(&msg, sizeof(msg)); if (error) { @@ -908,7 +847,6 @@ error_lock_held: sx_xunlock(&xs.request_mutex); - xs_reply_filter(request_type, msg.type, error); if (error) return (error); @@ -1206,7 +1144,6 @@ mtx_init(&xs.reply_lock, "reply lock", NULL, MTX_DEF); sx_init(&xs.xenwatch_mutex, "xenwatch"); sx_init(&xs.request_mutex, "xenstore request"); - sx_init(&xs.suspend_mutex, "xenstore suspend"); mtx_init(&xs.registered_watches_lock, "watches", NULL, MTX_DEF); mtx_init(&xs.watch_events_lock, "watch events", NULL, MTX_DEF); @@ -1249,7 +1186,6 @@ if (error != 0) return (error); - sx_xlock(&xs.suspend_mutex); sx_xlock(&xs.request_mutex); return (0); @@ -1269,8 +1205,10 @@ sx_xunlock(&xs.request_mutex); /* - * No need for registered_watches_lock: the suspend_mutex - * is sufficient. + * NB: since xenstore childs have not been resumed yet, there's + * no need to hold any watch mutex. Having clients try to add or + * remove watches at this point (before xenstore is resumed) is + * clearly a violantion of the resume order. */ LIST_FOREACH(watch, &xs.registered_watches, list) { sprintf(token, "%lX", (long)watch); @@ -1277,8 +1215,6 @@ xs_watch(watch->node, token); } - sx_xunlock(&xs.suspend_mutex); - /* Resume child Xen devices. */ bus_generic_resume(dev); @@ -1631,8 +1567,6 @@ sprintf(token, "%lX", (long)watch); - sx_slock(&xs.suspend_mutex); - mtx_lock(&xs.registered_watches_lock); KASSERT(find_watch(token) == NULL, ("watch already registered")); LIST_INSERT_HEAD(&xs.registered_watches, watch, list); @@ -1650,8 +1584,6 @@ mtx_unlock(&xs.registered_watches_lock); } - sx_sunlock(&xs.suspend_mutex); - return (error); } @@ -1664,12 +1596,9 @@ sprintf(token, "%lX", (long)watch); - sx_slock(&xs.suspend_mutex); - mtx_lock(&xs.registered_watches_lock); if (find_watch(token) == NULL) { mtx_unlock(&xs.registered_watches_lock); - sx_sunlock(&xs.suspend_mutex); return; } LIST_REMOVE(watch, list); @@ -1680,8 +1609,6 @@ log(LOG_WARNING, "XENSTORE Failed to release watch %s: %i\n", watch->node, error); - sx_sunlock(&xs.suspend_mutex); - /* Cancel pending watch events. */ mtx_lock(&xs.watch_events_lock); TAILQ_FOREACH_SAFE(msg, &xs.watch_events, list, tmp) { --------------FC2B5AB8B9B6ED6E154A90A5 Content-Type: text/x-patch; name="0002-xenstore-don-t-wait-with-the-PCATCH-flag.patch" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="0002-xenstore-don-t-wait-with-the-PCATCH-flag.patch" >From c451da6a962f53b3892c1ece1c3e0473d5c93bc4 Mon Sep 17 00:00:00 2001 From: Roger Pau Monne Date: Thu, 15 Dec 2016 16:54:40 +0000 Subject: [PATCH 2/4] xenstore: don't wait with the PCATCH flag Due to the current synchronous xenstore implementation in FreeBSD, we cannot return from xs_read_reply without reading a reply, or else the ring gets out of sync and the next request will read the previous reply and crash due to the type mismatch. A proper solution involves making use of the req_id field in the message and allowing multiple in-flight messages at the same time on the ring. Sponsored by: Citrix Systems R&D --- sys/dev/xen/xenstore/xenstore.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/sys/dev/xen/xenstore/xenstore.c b/sys/dev/xen/xenstore/xenstore.c index 14e4ebd..1c2e21f 100644 =================================================================== --- a/sys/dev/xen/xenstore/xenstore.c (revision 334101) +++ b/sys/dev/xen/xenstore/xenstore.c (working copy) @@ -798,8 +798,7 @@ mtx_lock(&xs.reply_lock); while (TAILQ_EMPTY(&xs.reply_list)) { - error = mtx_sleep(&xs.reply_list, &xs.reply_lock, - PCATCH, "xswait", hz/10); + error = mtx_sleep(&xs.reply_list, &xs.reply_lock, 0, "xswait", hz/10); if (error && error != EWOULDBLOCK) { mtx_unlock(&xs.reply_lock); return (error); --------------FC2B5AB8B9B6ED6E154A90A5 Content-Type: text/x-patch; name="0003-dev-xenstore-add-support-for-watches.patch" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="0003-dev-xenstore-add-support-for-watches.patch" >From debde7d62ebcb2ead12f3ff9708870dc329273d4 Mon Sep 17 00:00:00 2001 From: Roger Pau Monne Date: Thu, 15 Dec 2016 16:54:39 +0000 Subject: [PATCH 3/4] dev/xenstore: add support for watches Allow user-space applications to register watches using the xenstore device. This is needed in order to run toolstack operations on domains different than the one where xenstore is running (in which case the device is not used, since the connection to xenstore is done using a plain socket). Sponsored by: Citrix Systems R&D --- sys/dev/xen/xenstore/xenstore_dev.c | 267 +++++++++++++++++++++++++++++++++--- 1 file changed, 247 insertions(+), 20 deletions(-) diff --git a/sys/dev/xen/xenstore/xenstore_dev.c b/sys/dev/xen/xenstore/xenstore_dev.c index ce62140..18f86b8 100644 =================================================================== --- a/sys/dev/xen/xenstore/xenstore_dev.c (revision 334101) +++ b/sys/dev/xen/xenstore/xenstore_dev.c (working copy) @@ -44,6 +44,8 @@ #include #include #include +#include +#include #include @@ -56,10 +58,20 @@ struct xs_transaction handle; }; +struct xs_dev_watch { + LIST_ENTRY(xs_dev_watch) list; + struct xs_watch watch; + char *token; + struct xs_dev_data *user; +}; + struct xs_dev_data { /* In-progress transaction. */ - LIST_HEAD(xdd_list_head, xs_dev_transaction) transactions; + LIST_HEAD(, xs_dev_transaction) transactions; + /* Active watches. */ + LIST_HEAD(, xs_dev_watch) watches; + /* Partial request. */ unsigned int len; union { @@ -71,8 +83,136 @@ #define MASK_READ_IDX(idx) ((idx)&(PAGE_SIZE-1)) char read_buffer[PAGE_SIZE]; unsigned int read_cons, read_prod; + + /* Serializes writes to the read buffer. */ + struct mtx lock; + + /* Polling structure (for reads only ATM). */ + struct selinfo ev_rsel; }; +static void +xs_queue_reply(struct xs_dev_data *u, const char *data, unsigned int len) +{ + int i; + + for (i = 0; i < len; i++, u->read_prod++) + u->read_buffer[MASK_READ_IDX(u->read_prod)] = data[i]; + + KASSERT((u->read_prod - u->read_cons) <= sizeof(u->read_buffer), + ("xenstore reply too big")); + + wakeup(u); + selwakeup(&u->ev_rsel); +} + +static const char * +xs_dev_error_to_string(int error) +{ + int i; + + for (i = 0; i < nitems(xsd_errors); i++) + if (xsd_errors[i].errnum == error) + return (xsd_errors[i].errstring); + + return (NULL); +} + +static void +xs_dev_return_error(struct xs_dev_data *u, int error, int req_id, int tx_id) +{ + struct xsd_sockmsg msg; + const char *payload; + + msg.type = XS_ERROR; + msg.req_id = req_id; + msg.tx_id = tx_id; + payload = NULL; + + + payload = xs_dev_error_to_string(error); + if (payload == NULL) + payload = xs_dev_error_to_string(EINVAL); + KASSERT(payload != NULL, ("Unable to find string for EINVAL errno")); + + msg.len = strlen(payload) + 1; + + mtx_lock(&u->lock); + xs_queue_reply(u, (char *)&msg, sizeof(msg)); + xs_queue_reply(u, payload, msg.len); + mtx_unlock(&u->lock); +} + +static int +xs_dev_watch_message_parse_string(const char **p, const char *end, + const char **string_r) +{ + const char *nul = memchr(*p, 0, end - *p); + if (!nul) + return -EINVAL; + + *string_r = *p; + *p = nul+1; + + return 0; +} + +static int +xs_dev_watch_message_parse(const struct xsd_sockmsg *msg, const char **path_r, + const char **token_r) +{ + const char *begin = (const char*)msg; + const char *p = begin + sizeof(*msg); + const char *end = p + msg->len; + int error; + + KASSERT(p <= end, ("payload overflow")); + + error = xs_dev_watch_message_parse_string(&p, end, path_r); + if (error) + return error; + error = xs_dev_watch_message_parse_string(&p, end, token_r); + if (error) + return error; + + return 0; +} + +static struct xs_dev_watch * +xs_dev_find_watch(struct xs_dev_data *u, const char *token) +{ + struct xs_dev_watch *watch; + + LIST_FOREACH(watch, &u->watches, list) { + if (strcmp(watch->token, token) == 0) + return watch; + } + + return NULL; +} + +static void +xs_dev_watch_cb(struct xs_watch *watch, const char **vec, unsigned int len) +{ + struct xs_dev_watch *dwatch; + struct xsd_sockmsg msg; + char *payload; + + dwatch = (struct xs_dev_watch *)watch->callback_data; + msg.type = XS_WATCH_EVENT; + msg.req_id = msg.tx_id = 0; + msg.len = strlen(vec[XS_WATCH_PATH]) + strlen(dwatch->token) + 2; + + payload = malloc(msg.len, M_XENSTORE, M_WAITOK); + strcpy(payload, vec[XS_WATCH_PATH]); + strcpy(&payload[strlen(vec[XS_WATCH_PATH]) + 1], dwatch->token); + mtx_lock(&dwatch->user->lock); + xs_queue_reply(dwatch->user, (char *)&msg, sizeof(msg)); + xs_queue_reply(dwatch->user, payload, msg.len); + mtx_unlock(&dwatch->user->lock); + free(payload, M_XENSTORE); +} + static int xs_dev_read(struct cdev *dev, struct uio *uio, int ioflag) { @@ -101,27 +241,16 @@ return (0); } -static void -xs_queue_reply(struct xs_dev_data *u, char *data, unsigned int len) -{ - int i; - - for (i = 0; i < len; i++, u->read_prod++) - u->read_buffer[MASK_READ_IDX(u->read_prod)] = data[i]; - - KASSERT((u->read_prod - u->read_cons) <= sizeof(u->read_buffer), - ("xenstore reply too big")); - - wakeup(u); -} - static int xs_dev_write(struct cdev *dev, struct uio *uio, int ioflag) { int error; + const char *wpath, *wtoken; struct xs_dev_data *u; struct xs_dev_transaction *trans; + struct xs_dev_watch *watch; void *reply; + static const char *ok = "OK"; int len = uio->uio_resid; error = devfs_get_cdevpriv((void **)&u); @@ -168,35 +297,130 @@ LIST_REMOVE(trans, list); free(trans, M_XENSTORE); } + mtx_lock(&u->lock); xs_queue_reply(u, (char *)&u->u.msg, sizeof(u->u.msg)); xs_queue_reply(u, (char *)reply, u->u.msg.len); + mtx_unlock(&u->lock); free(reply, M_XENSTORE); } break; + case XS_WATCH: + u->u.msg.tx_id = 0; + error = xs_dev_watch_message_parse(&u->u.msg, &wpath, &wtoken); + if (error) + break; + if (xs_dev_find_watch(u, wtoken) != NULL) { + error = EINVAL; + break; + } + watch = malloc(sizeof(*watch), M_XENSTORE, M_WAITOK); + watch->watch.node = strdup(wpath, M_XENSTORE); + watch->watch.callback = xs_dev_watch_cb; + watch->watch.callback_data = (uintptr_t)watch; + watch->token = strdup(wtoken, M_XENSTORE); + watch->user = u; + + error = xs_register_watch(&watch->watch); + if (error != 0) { + free(watch->token, M_XENSTORE); + free(watch->watch.node, M_XENSTORE); + free(watch, M_XENSTORE); + break; + } + + LIST_INSERT_HEAD(&u->watches, watch, list); + u->u.msg.len = sizeof(ok); + mtx_lock(&u->lock); + xs_queue_reply(u, (char *)&u->u.msg, sizeof(u->u.msg)); + xs_queue_reply(u, ok, sizeof(ok)); + mtx_unlock(&u->lock); + break; + case XS_UNWATCH: + u->u.msg.tx_id = 0; + error = xs_dev_watch_message_parse(&u->u.msg, &wpath, &wtoken); + if (error) + break; + watch = xs_dev_find_watch(u, wtoken); + if (watch == NULL) { + error = EINVAL; + break; + } + + LIST_REMOVE(watch, list); + xs_unregister_watch(&watch->watch); + free(watch->watch.node, M_XENSTORE); + free(watch->token, M_XENSTORE); + free(watch, M_XENSTORE); + u->u.msg.len = sizeof(ok); + mtx_lock(&u->lock); + xs_queue_reply(u, (char *)&u->u.msg, sizeof(u->u.msg)); + xs_queue_reply(u, ok, sizeof(ok)); + mtx_unlock(&u->lock); + break; default: error = EINVAL; break; } - if (error == 0) - u->len = 0; + if (error != 0) + xs_dev_return_error(u, error, u->u.msg.req_id, u->u.msg.tx_id); - return (error); + /* Reset the write buffer. */ + u->len = 0; + + return (0); } +static int +xs_dev_poll(struct cdev *dev, int events, struct thread *td) +{ + struct xs_dev_data *u; + int error, mask; + + error = devfs_get_cdevpriv((void **)&u); + if (error != 0) + return (POLLERR); + + /* we can always write */ + mask = events & (POLLOUT | POLLWRNORM); + + if (events & (POLLIN | POLLRDNORM)) { + if (u->read_cons != u->read_prod) { + mask |= events & (POLLIN | POLLRDNORM); + } else { + /* Record that someone is waiting */ + selrecord(td, &u->ev_rsel); + } + } + + return (mask); +} + static void xs_dev_dtor(void *arg) { struct xs_dev_data *u = arg; - struct xs_dev_transaction *trans, *tmp; + struct xs_dev_transaction *trans, *tmpt; + struct xs_dev_watch *watch, *tmpw; - LIST_FOREACH_SAFE(trans, &u->transactions, list, tmp) { + seldrain(&u->ev_rsel); + + LIST_FOREACH_SAFE(trans, &u->transactions, list, tmpt) { xs_transaction_end(trans->handle, 1); LIST_REMOVE(trans, list); free(trans, M_XENSTORE); } + LIST_FOREACH_SAFE(watch, &u->watches, list, tmpw) { + LIST_REMOVE(watch, list); + xs_unregister_watch(&watch->watch); + free(watch->watch.node, M_XENSTORE); + free(watch->token, M_XENSTORE); + free(watch, M_XENSTORE); + } + mtx_destroy(&u->lock); + free(u, M_XENSTORE); } @@ -207,7 +431,9 @@ int error; u = malloc(sizeof(*u), M_XENSTORE, M_WAITOK|M_ZERO); + mtx_init(&u->lock, "xsdev_lock", NULL, MTX_DEF); LIST_INIT(&u->transactions); + LIST_INIT(&u->watches); error = devfs_set_cdevpriv(u, xs_dev_dtor); if (error != 0) free(u, M_XENSTORE); @@ -220,6 +446,7 @@ .d_read = xs_dev_read, .d_write = xs_dev_write, .d_open = xs_dev_open, + .d_poll = xs_dev_poll, .d_name = "xs_dev", }; --------------FC2B5AB8B9B6ED6E154A90A5 Content-Type: text/x-patch; name="0004-xenstore-dev-prevent-user-space-xenstore-device-from.patch" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename*0="0004-xenstore-dev-prevent-user-space-xenstore-device-from.pa"; filename*1="tch" >From 751095963c026876152c8a9c2cbfd9ff76ae6329 Mon Sep 17 00:00:00 2001 From: Roger Pau Monne Date: Thu, 15 Dec 2016 16:54:40 +0000 Subject: [PATCH 4/4] xenstore/dev: prevent user-space xenstore device from hijacking transactions The user-space xenstore device is currently lacking a check to make sure that the caller is only using transaction ids currently assigned to it. This allows users of the xenstore device to hijack transactions not started by them, although the scope is limited to transactions started by the same domain. Sponsored by: Citrix Systems R&D --- sys/dev/xen/xenstore/xenstore_dev.c | 28 ++++++++++++++++++++++------ 1 file changed, 22 insertions(+), 6 deletions(-) diff --git a/sys/dev/xen/xenstore/xenstore_dev.c b/sys/dev/xen/xenstore/xenstore_dev.c index 18f86b8..de5d5e6 100644 =================================================================== --- a/sys/dev/xen/xenstore/xenstore_dev.c (revision 334101) +++ b/sys/dev/xen/xenstore/xenstore_dev.c (working copy) @@ -73,6 +73,18 @@ unsigned int read_cons, read_prod; }; +static struct xs_dev_transaction * +xs_dev_find_transaction(struct xs_dev_data *u, uint32_t tx_id) +{ + struct xs_dev_transaction *trans; + + LIST_FOREACH(trans, &u->transactions, list) + if (trans->handle.id == tx_id) + return trans; + + return NULL; +} + static int xs_dev_read(struct cdev *dev, struct uio *uio, int ioflag) { @@ -151,6 +163,12 @@ case XS_MKDIR: case XS_RM: case XS_SET_PERMS: + /* Check that this transaction id is not hijacked. */ + if (u->u.msg.tx_id != 0 && + xs_dev_find_transaction(u, u->u.msg.tx_id) == NULL) { + error = EINVAL; + break; + } error = xs_dev_request_and_reply(&u->u.msg, &reply); if (!error) { if (u->u.msg.type == XS_TRANSACTION_START) { @@ -159,12 +177,10 @@ trans->handle.id = strtoul(reply, NULL, 0); LIST_INSERT_HEAD(&u->transactions, trans, list); } else if (u->u.msg.type == XS_TRANSACTION_END) { - LIST_FOREACH(trans, &u->transactions, list) - if (trans->handle.id == u->u.msg.tx_id) - break; -#if 0 /* XXX does this mean the list is empty? */ - BUG_ON(&trans->list == &u->transactions); -#endif + trans = xs_dev_find_transaction(u, + u->u.msg.tx_id); + KASSERT(trans != NULL, + ("Unable to find transaction")); LIST_REMOVE(trans, list); free(trans, M_XENSTORE); } --------------FC2B5AB8B9B6ED6E154A90A5--