FreeBSD Mail Archives

Date:      Wed, 15 Feb 2017 14:06:36 +0200
From:      liuyingdong <liuyingdong@huawei.com>
To:        <roger.pau@citrix.com>, <freebsd-xen@freebsd.org>, <"suoben@huawei.com; zhao.zhaojun@huawei.com; wanglinkai@huawei.com; leo.gaoxiaodong@huawei.com; chuzhaosong@huawei.com; chunfeng.wang"@huawei.com>
Subject:   =?UTF-8?Q?Re:_Multiple_patch_review_=28was:_Re:_=e7=ad=94=e5=a4=8d:?= =?UTF-8?Q?_=e7=ad=94=e5=a4=8d:_[PATCH]netfront:_need_release_all_resources?= =?UTF-8?Q?=29_after_adding_and_removing_NICs_time_and_again?=
Message-ID:  <af1ffb76-b9c4-8397-5f41-a9759b11d932@huawei.com>
In-Reply-To: <1676c942-84ec-e573-42e9-8debd47aeb1b@huawei.com>
References:  <1676c942-84ec-e573-42e9-8debd47aeb1b@huawei.com>

On 2017/2/13 12:05, liuyingdong wrote:
> 发件人: roger.pau@citrix.com [mailto:roger.pau@citrix.com]
> 发送时间: 2017年2月10日 17:12
> 收件人: Liuyingdong <liuyingdong@huawei.com>
> 抄送: freebsd-xen@freebsd.org; Suoben <suoben@huawei.com>; Zhaojun (Euler) <zhao.zhaojun@huawei.com>; Wanglinkai <wanglinkai@huawei.com>; chuzhaosong <chuzhaosong@huawei.com>; Wangchunfeng (Ivan) <chunfeng.wang@huawei.com>; Gaoxiaodong (Leo) <leo.gaoxiaodong@huawei.com>
> 主题: Multiple patch review (was: Re: 答复: 答复: [PATCH]netfront: need release all resources) after adding and removing NICs time and again
> 
> On Tue, Feb 07, 2017 at 04:55:34PM +0000, Liuyingdong wrote:
>> Hi Roger,
>>          I am so sorry and please review tne below URL:
>> https://lists.freebsd.org/pipermail/freebsd-xen/2017-February/002957.h
>> tml
> 
> Hello,
> 
> Thanks for the patches, and sorry for the delay. It seems like you have not applied some of my comments, so I will have to re-post them here. If some of the comments don't apply for whatever reason, please reply back and explain why, or else this is not going to progress in an useful way for any of us.
> 
> I would also request you to look into using `git send-email`, reviewing your patches as attachments is not very comfortable. Or else, you could create an account to https://reviews.freebsd.org/ and upload the patches there assigning me as a reviewer.
> 
I have created an account to https://reviews.freebsd.org/ and uploaded the patches but I cann't assign you as a reviewer so I assign visible to All Users.
The three modified patches are as follows:
1.https://reviews.freebsd.org/differential/diff/25207/
2.https://reviews.freebsd.org/differential/diff/25208/
3.https://reviews.freebsd.org/differential/diff/25209/
> When replying, can you also make sure that you reply in-line to my comments, or else it's very hard to follow the conversation.
> 
ok
> ---
>> From 0af05ac01be853340b3b53862de9853d8d44c0c6 Mon Sep 17 00:00:00 2001
>> From: liuyingdong <liuyingdong@huawei.com>
>> Date: Thu, 2 Feb 2017 19:40:48 +0200
>> Subject: introduce suspend_cancel mechanism for frontend devices.
>> 1.1 On a cancelled suspend, xen frontend devices need know their state is invariant.
>> 1.2 On a cancelled suspend the vcpu_info location does not change
>> (it's still in the per-cpu area registered by xen_hvm_cpu_init()).
>> So do not call xen_hvm_init_shared_info_page() which would make the kernel think its back in the shared info.
>> With the wrong vcpu_info, events cannot be received and the domain will hang after a cancelled suspend.
>>
>> ---
>>  dev/xen/blkfront/blkfront.c |  5 +++++
>>  dev/xen/control/control.c   | 11 +++++++++--
>>  dev/xen/netfront/netfront.c | 21 +++++++++++++++++++++
>>  x86/xen/hvm.c               | 16 ++++++++++------
>>  xen/xenbus/xenbusb.c        | 35 +++++++++++++++++++----------------
>>  xen/xenbus/xenbusvar.h      |  2 ++
>>  6 files changed, 66 insertions(+), 24 deletions(-)
>>
>> diff --git a/dev/xen/blkfront/blkfront.c b/dev/xen/blkfront/blkfront.c
>> index 9eca220..47cd83f 100644
>> --- a/dev/xen/blkfront/blkfront.c
>> +++ b/dev/xen/blkfront/blkfront.c
>> @@ -1537,6 +1537,11 @@ xbd_resume(device_t dev)  {
>>  	struct xbd_softc *sc = device_get_softc(dev);
>>
>> +	if(g_suspend_cancelled) {
>> +		sc->xbd_state = XBD_STATE_CONNECTED;
>> +		return (0);
>> +	}
>> +
>>  	DPRINTK("xbd_resume: %s\n", xenbus_get_node(dev));
>>
>>  	xbd_free(sc);
>> diff --git a/dev/xen/control/control.c b/dev/xen/control/control.c
>> index 58fefcc..4f8471f 100644
>> --- a/dev/xen/control/control.c
>> +++ b/dev/xen/control/control.c
>> @@ -190,6 +190,11 @@ xctrl_reboot()
>>  	shutdown_nice(0);
>>  }
>>
>> +static void _set_suspend_cancelled(bool suspend_cancelled) {
>> +	g_suspend_cancelled = suspend_cancelled; }
>> +
>>  static void
>>  xctrl_suspend()
>>  {
>> @@ -278,7 +283,9 @@ xctrl_suspend()
>>  	/*
>>  	 * Reset grant table info.
>>  	 */
>> -	gnttab_resume(NULL);
>> +	if(suspend_cancelled == 0)  {
>> +		gnttab_resume(NULL);
>> +	}
>>
>>  #ifdef SMP
>>  	if (!CPU_EMPTY(&cpu_suspend_map)) {
>> @@ -296,6 +303,7 @@ xctrl_suspend()
>>  	 * FreeBSD really needs to add DEVICE_SUSPEND_CANCEL or
>>  	 * similar.
>>  	 */
>> +	_set_suspend_cancelled(suspend_cancelled != 0);
> 
> No need for this helper, you can just s/suspend_cancelled/xen_suspend_called/
> and make it global.
> 
ok, I agree with you.
>>  	DEVICE_RESUME(root_bus);
>>  	mtx_unlock(&Giant);
>>
>> @@ -319,7 +327,6 @@ xctrl_suspend()
>>  #endif
>>
>>  	resume_all_proc();
>> -
> 
> Stray change.
> 
ok, I agree with you.
>>  	EVENTHANDLER_INVOKE(power_resume);
>>
>>  	if (bootverbose)
>> diff --git a/dev/xen/netfront/netfront.c b/dev/xen/netfront/netfront.c
>> index 459712a..537018a 100644
>> --- a/dev/xen/netfront/netfront.c
>> +++ b/dev/xen/netfront/netfront.c
>> @@ -153,6 +153,10 @@ static int xn_get_responses(struct netfront_rxq *,
>>      struct netfront_rx_info *, RING_IDX, RING_IDX *,
>>      struct mbuf **);
>>
>> +#ifdef INET
>> +static void netfront_send_fake_arp(device_t dev, struct netfront_info
>> +*info); #endif
>> +
>>  #define virt_to_mfn(x) (vtophys(x) >> PAGE_SHIFT)
>>
>>  #define INVALID_P2M_ENTRY (~0UL)
>> @@ -440,6 +444,23 @@ netfront_resume(device_t dev)  {
>>  	struct netfront_info *info = device_get_softc(dev);
>>
>> +	if(g_suspend_cancelled) {
>> +		u_int i;
>> +		for (i = 0; i < info->num_queues; i++) {
>> +		    XN_RX_LOCK(&info->rxq[i]);
>> +		    XN_TX_LOCK(&info->txq[i]);
> 
> Bad indentation, you should use tabs (not spaces).
> 
ok, I agree with you.
>> +		}
>> +		netfront_carrier_on(info);
>> +		for (i = 0; i < info->num_queues; i++) {
>> +		    XN_RX_UNLOCK(&info->rxq[i]);
>> +		    XN_TX_UNLOCK(&info->txq[i]);
> 
> Bad indentation, you should use tabs (not spaces).
> 
ok, I agree with you.
>> +		}
>> +#ifdef INET
>> +		netfront_send_fake_arp(dev, info);
>> +#endif
> 
> I don't think you need to send an ARP on a cancelled suspend, the bridge should already have the mac address of the virtual interface in it's cache right?
> 
For  FreeBSD 10.X and FreeBSD 11.0, I have tested and found ping was not available if I don't send an ARP on a cancelled suspend.
For 12.0-CURRENT,I don't think I need to send an ARP on a cancelled suspend. But I don't know why.
>> +		return (0);
>> +	}
>> +
>>  	netif_disconnect_backend(info);
>>  	return (0);
>>  }
>> diff --git a/x86/xen/hvm.c b/x86/xen/hvm.c index e10659e..c96b7be
>> 100644
>> --- a/x86/xen/hvm.c
>> +++ b/x86/xen/hvm.c
>> @@ -297,10 +297,9 @@ xen_hvm_init(enum xen_hvm_init_type init_type)
>>  	int error;
>>  	int i;
>>
>> -	if (init_type == XEN_HVM_INIT_CANCELLED_SUSPEND)
>> -		return;
>> -
>> -	error = xen_hvm_init_hypercall_stubs(init_type);
>> +	if (init_type != XEN_HVM_INIT_CANCELLED_SUSPEND) {
>> +		error = xen_hvm_init_hypercall_stubs(init_type);
>> +	}
>>
>>  	switch (init_type) {
>>  	case XEN_HVM_INIT_COLD:
>> @@ -331,6 +330,8 @@ xen_hvm_init(enum xen_hvm_init_type init_type)
>>  		CPU_FOREACH(i)
>>  			DPCPU_ID_SET(i, vcpu_info, NULL);
>>  		break;
>> +	case XEN_HVM_INIT_CANCELLED_SUSPEND:
>> +		break;
>>  	default:
>>  		panic("Unsupported HVM initialization type");
>>  	}
>> @@ -344,7 +345,9 @@ xen_hvm_init(enum xen_hvm_init_type init_type)
>>  	 * is passed inside the start_info struct and is already set, so this
>>  	 * functions are no-ops.
>>  	 */
>> -	xen_hvm_init_shared_info_page();
>> +	if (init_type != XEN_HVM_INIT_CANCELLED_SUSPEND) {
>> +		xen_hvm_init_shared_info_page();
>> +	}
>>  	xen_hvm_disable_emulated_devices();
> 
> I'm not sure I follow why this change to xen_hvm_init is needed, on a cancelled suspend you shouldn't need to re-set the callback vector or unplug any emulated devices.
> 
you are right,I need not reset the callback vector or unplug any emulated devices on a cancelled suspend.
>>
>> @@ -361,7 +364,8 @@ xen_hvm_resume(bool suspend_cancelled)
>>  	    XEN_HVM_INIT_CANCELLED_SUSPEND : XEN_HVM_INIT_RESUME);
>>
>>  	/* Register vcpu_info area for CPU#0. */
>> -	xen_hvm_cpu_init();
>> +	if(!suspend_cancelled)
>> +		xen_hvm_cpu_init();
> 
> Seeing this here, why don't we just avoid calling xen_hvm_resume from xctrl_suspend on a cancelled suspend?
> 
I have avoided calling xen_hvm_resume on a cancelled suspend.
>>
>>  static void
>> diff --git a/xen/xenbus/xenbusb.c b/xen/xenbus/xenbusb.c index
>> 3a0e543..fa9ba61 100644
>> --- a/xen/xenbus/xenbusb.c
>> +++ b/xen/xenbus/xenbusb.c
>> @@ -791,29 +791,32 @@ xenbusb_resume(device_t dev)
>>  			if (device_get_state(kids[i]) == DS_NOTPRESENT)
>>  				continue;
>>
>> -			ivars = device_get_ivars(kids[i]);
>> +			if(!g_suspend_cancelled) {
>> +				ivars = device_get_ivars(kids[i]);
>>
>> -			xs_unregister_watch(&ivars->xd_otherend_watch);
>> -			xenbus_set_state(kids[i], XenbusStateInitialising);
>> +				xs_unregister_watch(&ivars->xd_otherend_watch);
>> +				xenbus_set_state(kids[i], XenbusStateInitialising);
>>
>> -			/*
>> -			 * Find the new backend details and
>> -			 * re-register our watch.
>> -			 */
>> -			error = XENBUSB_GET_OTHEREND_NODE(dev, ivars);
>> -			if (error)
>> -				return (error);
>> +				/*
>> +				 * Find the new backend details and
>> +				 * re-register our watch.
>> +				 */
>> +				error = XENBUSB_GET_OTHEREND_NODE(dev, ivars);
>> +				if (error)
>> +					return (error);
>>
>> -			statepath = malloc(ivars->xd_otherend_path_len
>> -			    + strlen("/state") + 1, M_XENBUS, M_WAITOK);
>> -			sprintf(statepath, "%s/state", ivars->xd_otherend_path);
>> +				statepath = malloc(ivars->xd_otherend_path_len
>> +				    + strlen("/state") + 1, M_XENBUS, M_WAITOK);
>> +				sprintf(statepath, "%s/state", ivars->xd_otherend_path);
>>
>> -			free(ivars->xd_otherend_watch.node, M_XENBUS);
>> -			ivars->xd_otherend_watch.node = statepath;
>> +				free(ivars->xd_otherend_watch.node, M_XENBUS);
>> +				ivars->xd_otherend_watch.node = statepath;
>> +			}
>>
>>  			DEVICE_RESUME(kids[i]);
>>
>> -			xs_register_watch(&ivars->xd_otherend_watch);
>> +			if(!g_suspend_cancelled)
>> +				xs_register_watch(&ivars->xd_otherend_watch);
>>  #if 0
> 
> Why don't you just add:
> 
> if (xenbusb_suspend_cancelled == 1) {
> 	DEVICE_RESUME(kids[i]);
> 	continue;
> }
> 
> To the top of the loop? This would prevent adding a bunch of nested conditionals.
> 
I don't think so. DEVICE_RESUME(kids[i]) need be called whether the xenbusb_suspend_cancelled is 0 or 1.
>>  			/*
>>  			 * Can't do this yet since we are running in diff --git
>> a/xen/xenbus/xenbusvar.h b/xen/xenbus/xenbusvar.h index
>> 377d60c..02a5bfa 100644
>> --- a/xen/xenbus/xenbusvar.h
>> +++ b/xen/xenbus/xenbusvar.h
>> @@ -99,6 +99,8 @@ XENBUS_ACCESSOR(state,		STATE,			enum xenbus_state)
>>  XENBUS_ACCESSOR(otherend_id,	OTHEREND_ID,		int)
>>  XENBUS_ACCESSOR(otherend_path,	OTHEREND_PATH,		const char *)
>>
>> +bool g_suspend_cancelled;
> 
> This should be called xen_suspend_cancelled, and it would be better to place it in xen/xen-os.h instead. Also I'm not sure how that even compiles, isn't this missing an "extern" keyword in front?
> 
defining a bool xen_suspend_cancelled need add an "extern" keyword in the front of xen/xen-os.h.
>> +
>>  /**
>>   * Return the state of a XenBus device.
>>   *
>> --
>> 2.11.0.windows.3
> ---
>> From f4a0660ce77f72e0d0c08f3316ed4d8f78f1b8ab Mon Sep 17 00:00:00 2001
>> From: liuyingdong <liuyingdong@huawei.com>
>> Date: Thu, 2 Feb 2017 18:51:08 +0200
>> Subject: If there is a user process which maybe often reads and writes
>> xenstore, the application has been suspended after stop_all_proc is
>> called. It held xs.request_mutex lock but in the following functions xs_write and xs_suspend will not get the lock. So the VM will hang.
>> In order to prevent this from happening, this process need wait until
>> stop_all_proc has finished during live migration.
>>
>> ---
>>  dev/xen/control/control.c        |  2 ++
>>  dev/xen/xenstore/xenstore.c      | 31 +++++++++++++++++++++++++++++++
>>  dev/xen/xenstore/xenstore_dev.c  |  7 +++++++
>> xen/xenstore/xenstore_internal.h |  4 ++++
>>  4 files changed, 44 insertions(+)
>>
>> diff --git a/dev/xen/control/control.c b/dev/xen/control/control.c
>> index ae13c6c..58fefcc 100644
>> --- a/dev/xen/control/control.c
>> +++ b/dev/xen/control/control.c
>> @@ -147,6 +147,7 @@ __FBSDID("$FreeBSD$");  #include
>> <xen/interface/grant_table.h>
>>
>>  #include <xen/xenbus/xenbusvar.h>
>> +#include <xen/xenstore/xenstore_internal.h>
>>
>>  /*--------------------------- Forward Declarations
>> --------------------------*/
>>  /** Function signature for shutdown event handlers. */ @@ -199,7
>> +199,9 @@ xctrl_suspend()
>>  	int suspend_cancelled;
>>
>>  	EVENTHANDLER_INVOKE(power_suspend_early);
>> +	xs_lock();
>>  	stop_all_proc();
>> +	xs_unlock();
> 
> This seems fine, although I think that a better solution would be to simply don't use such kind of locks, 
> and allow concurrent access to the ring by properly using the xenstore ids, but that involves a non-trivial amount of code.
> 
I think so but this solution is more simple.
>>  	EVENTHANDLER_INVOKE(power_suspend);
>>
>>  #ifdef EARLY_AP_STARTUP
>> diff --git a/dev/xen/xenstore/xenstore.c b/dev/xen/xenstore/xenstore.c
>> index 4f89b74..d44f064 100644
>> --- a/dev/xen/xenstore/xenstore.c
>> +++ b/dev/xen/xenstore/xenstore.c
>> @@ -1255,6 +1255,37 @@ xs_suspend(device_t dev)
>>  	return (0);
>>  }
>>
>> +int
>> +xs_try_lock(void)
>> +{
>> +	/*
>> +	sx_try_slock() and sx_try_xlock() will return 0 if the shared/exclusive
>> +		 lock cannot be acquired immediately; otherwise the shared/exclusive lock
>> +		 will be acquired and a non-zero value will be returned.
>> +	*/
>> +	return sx_try_xlock(&xs.request_mutex); }
> 
> I don't think you need this at all, see below...
> 
> [...]
>> diff --git a/dev/xen/xenstore/xenstore_dev.c
>> b/dev/xen/xenstore/xenstore_dev.c index ce62140..d6b97c0 100644
>> --- a/dev/xen/xenstore/xenstore_dev.c
>> +++ b/dev/xen/xenstore/xenstore_dev.c
>> @@ -128,6 +128,13 @@ xs_dev_write(struct cdev *dev, struct uio *uio, int ioflag)
>>  	if (error != 0)
>>  		return (error);
>>
>> +	while(!xs_try_lock()) {
>> +		error = tsleep(u, PCATCH, "xsdwrite", hz/10);
>> +		if (error && error != EWOULDBLOCK)
>> +			return (error);
>> +	}
>> +	xs_unlock();
> 
> Why do you need this try_lock/unlock construction here? It is not protecting you against anything, the more that you don't perform any actual work with the lock held.
> 
yes,I think so. I have replaced xs_try_lock with xs_is_lock:
int
xs_is_lock(void)
{
	return sx_xlocked(&xs.request_mutex);
}
>>  	if ((len + u->len) > sizeof(u->u.buffer))
>>  		return (EINVAL);
>>
>> diff --git a/xen/xenstore/xenstore_internal.h
>> b/xen/xenstore/xenstore_internal.h
>> index 3355c27..31db935 100644
>> --- a/xen/xenstore/xenstore_internal.h
>> +++ b/xen/xenstore/xenstore_internal.h
>> @@ -34,3 +34,7 @@
>>
>>  /* Used by the XenStore character device to borrow kernel's store
>> connection. */  int xs_dev_request_and_reply(struct xsd_sockmsg *msg,
>> void **result);
>> +
>> +int xs_try_lock(void);
>> +void xs_lock(void);
>> +void xs_unlock(void);
>> --
>> 2.11.0.windows.3
> ---
>> From 78215c0b117fb7b651cf7e57a6a323407413ddde Mon Sep 17 00:00:00 2001
>> From: liuyingdong <liuyingdong@huawei.com>
>> Date: Thu, 2 Feb 2017 19:52:33 +0200
>> Subject: Because wrong order of device resume, VM will hang after live migration.
>> attach the Xen PV timer to the nexus directly as it was done before.
>>
>> ---
>>  dev/xen/timer/timer.c | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/dev/xen/timer/timer.c b/dev/xen/timer/timer.c index
>> 0b26847..ae8f603 100644
>> --- a/dev/xen/timer/timer.c
>> +++ b/dev/xen/timer/timer.c
>> @@ -546,5 +546,5 @@ static driver_t xentimer_driver = {
>>  	sizeof(struct xentimer_softc),
>>  };
>>
>> -DRIVER_MODULE(xentimer, xenpv, xentimer_driver, xentimer_devclass, 0,
>> 0); -MODULE_DEPEND(xentimer, xenpv, 1, 1, 1);
>> +DRIVER_MODULE(xentimer, nexus, xentimer_driver, xentimer_devclass, 0,
>> +0); MODULE_DEPEND(xentimer, nexus, 1, 1, 1);
> 
> No, this is not how this should be solved. I understand that you might not want to implement the full solution proposed in freebsd-arch [0], but going back to Xen devices attaching directly to the nexus is also not an option IMHO.
> 
> What I would accept is marking the Xen time-counter as not safe for suspension, so that FreeBSD itself will switch to a different timer when suspending. This simply means removing the TC_FLAGS_SUSPEND_SAFE from the timecounter flags.
> Please add a "FIXME" comment, explaining why this is disabled, and when it could be enabled again. Could you please test that, and see if it solves your issues?
> 
> Thanks, Roger
> 
> [0] https://lists.freebsd.org/pipermail/freebsd-arch/2016-December/018041.html
> 
I have tested that and find it may solve my issue.

Thanks, Terry.

Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?af1ffb76-b9c4-8397-5f41-a9759b11d932>

Header And Logo

Peripheral Links

Site Navigation

Header And Logo

Peripheral Links

Search

Site Navigation