From owner-freebsd-arch@FreeBSD.ORG Sun Jul 6 15:04:21 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id AEFA41065676; Sun, 6 Jul 2008 15:04:21 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from fallbackmx09.syd.optusnet.com.au (fallbackmx09.syd.optusnet.com.au [211.29.132.242]) by mx1.freebsd.org (Postfix) with ESMTP id B90C68FC12; Sun, 6 Jul 2008 15:04:20 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail02.syd.optusnet.com.au (mail02.syd.optusnet.com.au [211.29.132.183]) by fallbackmx09.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id m651x1wY009059; Sat, 5 Jul 2008 11:59:01 +1000 Received: from c220-239-252-11.carlnfd3.nsw.optusnet.com.au (c220-239-252-11.carlnfd3.nsw.optusnet.com.au [220.239.252.11]) by mail02.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id m651wYda023278 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Sat, 5 Jul 2008 11:58:37 +1000 Date: Sat, 5 Jul 2008 11:58:34 +1000 (EST) From: Bruce Evans X-X-Sender: bde@delplex.bde.org To: Ed Schouten In-Reply-To: <20080704092244.GY14567@hoeg.nl> Message-ID: <20080705095912.M12433@delplex.bde.org> References: <20080702190901.GS14567@hoeg.nl> <20080703193406.GS29380@server.vk2pj.dyndns.org> <20080703205220.GW14567@hoeg.nl> <20080704022125.GA32475@server.vk2pj.dyndns.org> <20080704092244.GY14567@hoeg.nl> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: FreeBSD Current , FreeBSD Arch Subject: Re: MPSAFE TTY schedule X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 06 Jul 2008 15:04:21 -0000 On Fri, 4 Jul 2008, Ed Schouten wrote: >>>> cy(4), digi(4), rp(4), rc(4), si(4). >>> >>> Who actually owns one of these devices? 
If you do, please contact me. If
>>> I didn't make myself clear enough: I *am* willing to (assist in
>>> porting|port) these drivers.

I have 24 ports on cy devices, but don't use them except for testing.

>> I have access to a Digi Xem boards at work and have poked around
>> inside the digi(4) code in the past. My difficulty is that the cards
>> are all in use and upgrading to a FreeBSD-current that doesn't support
>> them and then porting the driver is probably not an option (whereas
>> converting it from using shims to access the TTY layer to doing so
>> directly would probably be acceptable - because I can get the board
>> going again in a hurry if needed).
>
> The problem with the old TTY layer, is that drivers tend to access the
> internals of the TTY structure very often. A good example of this is the
> clists, where TTY drivers tamper around inside the clist and cblock
> structures. There is not much room to implement a compatibility layer
> there.

This is a very bad example. Clist accesses are, or should be, a
non-problem. No (non-broken) tty drivers access the internals of clists
directly, except for read-only accesses to the character count (c_cc),
which isn't clist-specific. All of them use the old KPI functions
(putc/getc/b_to_q/q_to_b, etc.) for accessing the tty queues, and the
implementation of the tty queues can be changed to anything without any
KPI changes except possibly to the spelling of c_cc.

Non-drivers like slip and ppp have slightly more clist-specific
knowledge, but again their interface is limited mainly to the KPI
(clist_alloc_cblocks, ...) and a read-only character count (cfreecount).
if_sl.c still has a lot of comments about its knowledge of clists, but
these barely apply since its implementation only depends on an adequate
buffering mechanism for characters (not sure if the characters need to
be quotable).

Driver-specific locking for clists is even less of a problem.
No driver-specific locking is needed for the calls, since they are locked
(using spl or Giant) in clist internals. The direct accesses to c_cc
should be locked in the same way, but missing locking for these is almost
harmless since the accesses are read-only and reading a stale value is
usually harmless. Internal locking at each entry point of the KPI
encourages unlocked accesses to c_cc, since c_cc becomes volatile
immediately after releasing the internal lock. In practice, things like
t_oproc() routines use higher-level locking so that there is no race and
the internal locking in getc/q_to_b is bogus:

xxxstart:
	/*
	 * Lock whole loop. Without this, getc()'s access to c_cc would
	 * give the same race as our direct access after getc() unlocks.
	 * With this, getc()'s locking is just a waste of time (it's
	 * recursive so that this isn't fatal, so the waste of time
	 * hopefully isn't very large, but this requires recursive locking
	 * to be used all over, so there are time and robustness costs all
	 * over).
	 */
	spltty();		/* Or Giant in bad drivers. */
	/*
	 * Non-Giant locking at a higher level than here might be OK, or
	 * might not, depending on whether the driver wants very fine-grained
	 * locking. If this is done, then we can remove all these fine-
	 * grained spl lock calls in drivers instead of replacing them by
	 * a tty lock. Meanwhile, the spls serve as placeholders to remind
	 * us where to put the tty locks.
	 */
	while (tp->t_outq.c_cc != 0 &&	/* XXX efficiency hack. */
	    (ch = getc(&tp->t_outq)) != -1)
		move_ch_to_driver_buffer(...);
	splx();

A possibly better way to handle this is to accept losing races on the
read-only variable but ensure that the driver is woken up without much
delay if the variable changes. This already happens in most or all cases,
since changing c_cc requires action to process the new state.

The number of KPI and c_cc accesses is also very small. In most drivers
it consists of a whole 1 getc or q_to_b call in t_oproc() and a whole 1
c_cc read to avoid this call.
A few drivers implement a bulk input routine that bypasses t_rint() in
the TS_CAN_BYPASS_L_RINT case. This requires 1 b_to_q call and 1 c_cc
read to implement flow control. This should be in the tty layer. It
doesn't break the clist layering but it breaks the tty layering. slip and
ppp make much heavier use of the KPI by count of the number of calls
(about 5 putc's, 2 unputc's, ... each).

By churning the KPI, you create a lot of work.

Bruce

From owner-freebsd-arch@FreeBSD.ORG Sun Jul 6 22:30:21 2008
Return-Path:
Delivered-To: arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 80ED31065670 for ; Sun, 6 Jul 2008 22:30:21 +0000 (UTC) (envelope-from babkin@verizon.net)
Received: from vms042pub.verizon.net (vms042pub.verizon.net [206.46.252.42]) by mx1.freebsd.org (Postfix) with ESMTP id 2C7208FC14 for ; Sun, 6 Jul 2008 22:30:21 +0000 (UTC) (envelope-from babkin@verizon.net)
Received: from verizon.net ([63.24.199.79]) by vms042.mailsrvcs.net (Sun Java System Messaging Server 6.2-6.01 (built Apr 3 2006)) with ESMTPA id <0K3L0013UVU09X5I@vms042.mailsrvcs.net> for arch@freebsd.org; Sun, 06 Jul 2008 17:30:05 -0500 (CDT)
Date: Sun, 06 Jul 2008 18:34:14 -0400
From: Sergey Babkin
Sender: root
To: arch@freebsd.org
Message-id: <48714866.906912CC@verizon.net>
MIME-version: 1.0
X-Mailer: Mozilla 4.7 [en] (X11; U; FreeBSD 4.7-RELEASE i386)
Content-type: text/plain; charset=us-ascii
Content-transfer-encoding: 7bit
X-Accept-Language: en, ru
Cc:
Subject: Proposal: a revoke() system call
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Sun, 06 Jul 2008 22:30:21 -0000

Hi all,

I want to propose a system call with the following functionality:

Syntax:

	int revoke(int fd, int flags)

Revoke a file descriptor from this process.
For all practical purposes, it's equivalent to close(), except that the
descriptor (fd) is not freed. Any further calls (except close()) on this
fd would return an error. Close() would free the file descriptor as
usual. If any calls were in progress and sleeping (such as read() waiting
for data), they would be interrupted and return an error.

Flags could contain a bitmap that would modify the meaning of the call.
I can think of at least one such modification: REVOKE_EOF, that if set,
would make any further read() calls return 0 (EOF indication) instead of
an error.

Rationale:

In multithreaded programs, multiple threads often work with the same
file descriptor. A particularly typical situation is a reader thread and
a writer thread. The reader thread calls read(), gets blocked until it
gets more data, then processes the data and continues the loop. Another
example of a "reader thread" would be the main thread of a daemon that
accepts the incoming connections and starts new per-connection threads.

If the application decides that it wants to close this file descriptor
abruptly, getting the reader thread to wake up and exit is not easy.
It's fraught with synchronisation issues. Things get even more
complicated if there are multiple layers of library wrappers.

The proposed system call makes it easy to pretend that the file
descriptor has experienced an error (or that a socket connection has
been closed by the other side). The library layers should already be
able to handle errors, so the problem would be solved transparently for
them. For sockets a similar functionality can already be achieved with
shutdown(fd, SHUT_RDWR). But it works only for connected sockets, not
for other file types nor for sockets running accept(). A new system call
would apply it to all kinds of file descriptors. Another option is to
extend the shutdown() call to non-socket file descriptors.

Any comments? Would anyone mind if I implement it?
-SB From owner-freebsd-arch@FreeBSD.ORG Sun Jul 6 23:05:30 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id F41061065680 for ; Sun, 6 Jul 2008 23:05:29 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from cyrus.watson.org (cyrus.watson.org [209.31.154.42]) by mx1.freebsd.org (Postfix) with ESMTP id C8C248FC36 for ; Sun, 6 Jul 2008 23:05:29 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from fledge.watson.org (fledge.watson.org [209.31.154.41]) by cyrus.watson.org (Postfix) with ESMTP id 54DCB46C64; Sun, 6 Jul 2008 19:05:29 -0400 (EDT) Date: Mon, 7 Jul 2008 00:05:29 +0100 (BST) From: Robert Watson X-X-Sender: robert@fledge.watson.org To: Sergey Babkin In-Reply-To: <48714866.906912CC@verizon.net> Message-ID: <20080707000313.P56885@fledge.watson.org> References: <48714866.906912CC@verizon.net> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: arch@freebsd.org Subject: Re: Proposal: a revoke() system call X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 06 Jul 2008 23:05:30 -0000 On Sun, 6 Jul 2008, Sergey Babkin wrote: > int revoke(int fd, int flags) Seems like that conflicts with our existing revoke(2) system call. You could achieve something of the same end by opening /dev/null and then dup2()'ing to the file descriptor you want to revoke, perhaps? Right now there's a known issue that calling close(2) on a socket from one thread doesn't interrupt a socket in a blocking I/O call from another thread -- you first have to call shutdown(2), and then close(2). This has caused problems for Java in the past, but I'm not sure that it's really a bug given that it's not unreasonable behavior not rejected by the spec :-). 
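[Editorial sketch] The /dev/null plus dup2() suggestion above could look
roughly like the following in userland. This is a minimal sketch, not an
existing libc or libutil interface: the name fd_neuter() is hypothetical,
and opening /dev/null read-only is an assumption chosen so that later
read() calls return 0 (similar to the proposed REVOKE_EOF) while write()
calls fail. It only changes what later calls on the descriptor see; it
makes no claim about waking a thread that is already blocked on the old
file.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper (not an existing library call): make `fd` refer to
 * /dev/null while keeping the descriptor number allocated.
 * Returns 0 on success, -1 on error with errno set.
 */
int
fd_neuter(int fd)
{
	int nullfd, saved_errno;

	/*
	 * Reject descriptors that are not open, so that the open() below
	 * cannot hand back the same number.
	 */
	if (fcntl(fd, F_GETFD) == -1)
		return (-1);

	/* Assumption: read-only, so read() gives EOF and write() fails. */
	nullfd = open("/dev/null", O_RDONLY);
	if (nullfd == -1)
		return (-1);

	/* dup2() closes the old file behind `fd` and installs /dev/null. */
	if (dup2(nullfd, fd) == -1) {
		saved_errno = errno;
		(void)close(nullfd);
		errno = saved_errno;
		return (-1);
	}
	(void)close(nullfd);
	return (0);
}
```

Because the descriptor number stays allocated until the caller finally
close()s it, a concurrent open() in another thread cannot be handed the
same number in the meantime.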
Robert N M Watson Computer Laboratory University of Cambridge > > Revoke a file desriptor from this proces. For all practical > purposes, it's equivalent to close(), except that the descriptor > (fd) is not freed. Any further calls (except close()) on this fd > would return an error. Close() would free the file descriptor > as usual. If any calls were in progress sleeping (such as read() > waiting for data), they would be interrupted and return an error. > > Flags could contain a bitmap that would modify the meaning of the > call. I can think of at least one such modification: REVOKE_EOF, > that if set, would make any further read() calls return 0 (EOF > indication) instead of an error. > > Rationale: > > In the multithreaded programs often multiple threads work with the > same file descriptor. A particularly typical situation is a reader > thread and a writer thread. The reader thread calls read(), gets > blocked until it gets more data, then processes the data and > continues the loop. Another example of a "reader thread" would be > the main thread of a daemon that accepts the incoming connections > and starts new per-connection threads. > > If the application decides that it wants to close this file > descriptor abruptly, getting the reader thread to wake up and exit > is not easy. It's fraught with synchronisation issues. > Things get even more complicated if there are multiple layers > of library wrappers. > > The proposed system call makes it easy to pretend that the file > descriptor has experienced an error (or that a socket connection > has been closed by the other side). The library layers should be > already able to handle errors, so the problem would be solved > transparently for them. For sockets a similar > functionality can already be achieved with shutdown(fd, SHUT_RDWR). > But it works only for connected sockets, not for other file types > nor sockets runnig accept(). A new system call would apply it > to all the kinds of file descriptors. 
Another option is > to extend the shutdown() call to the non-socket file descriptors. > > Any comments? Would anyone mind if I implement it? > > -SB > _______________________________________________ > freebsd-arch@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-arch > To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org" > From owner-freebsd-arch@FreeBSD.ORG Mon Jul 7 07:51:15 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id AA9A9106566C for ; Mon, 7 Jul 2008 07:51:15 +0000 (UTC) (envelope-from phk@critter.freebsd.dk) Received: from phk.freebsd.dk (phk.freebsd.dk [130.225.244.222]) by mx1.freebsd.org (Postfix) with ESMTP id 6CD4E8FC1E for ; Mon, 7 Jul 2008 07:51:15 +0000 (UTC) (envelope-from phk@critter.freebsd.dk) Received: from critter.freebsd.dk (unknown [192.168.64.3]) by phk.freebsd.dk (Postfix) with ESMTP id 6FE8A170E3; Mon, 7 Jul 2008 07:51:13 +0000 (UTC) Received: from critter.freebsd.dk (localhost [127.0.0.1]) by critter.freebsd.dk (8.14.2/8.14.2) with ESMTP id m677pCCD004220; Mon, 7 Jul 2008 07:51:12 GMT (envelope-from phk@critter.freebsd.dk) To: Sergey Babkin From: "Poul-Henning Kamp" In-Reply-To: Your message of "Sun, 06 Jul 2008 18:34:14 -0400." 
<48714866.906912CC@verizon.net> Date: Mon, 07 Jul 2008 07:51:12 +0000 Message-ID: <4219.1215417072@critter.freebsd.dk> Sender: phk@critter.freebsd.dk Cc: arch@freebsd.org Subject: Re: Proposal: a revoke() system call X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 07 Jul 2008 07:51:15 -0000 In message <48714866.906912CC@verizon.net>, Sergey Babkin writes: >Hi all, > >I want to propose a system call with the following functionality: > >Syntax: > > int revoke(int fd, int flags) We already have a revoke(2) system call, so the name will have to be something different. >Rationale: > >In the multithreaded programs often multiple threads work with the >same file descriptor. A particularly typical situation is a reader >thread and a writer thread. The reader thread calls read(), gets >blocked until it gets more data, then processes the data and >continues the loop. Another example of a "reader thread" would be >the main thread of a daemon that accepts the incoming connections >and starts new per-connection threads. Have you tried to implement the functionality you're asking for ? You'll have to hunt down into all sorts of protocols, drivers and other code to find the threads sleeping on your fd so you can wake them. It may be quite a piece of work. -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 phk@FreeBSD.ORG | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. 
From owner-freebsd-arch@FreeBSD.ORG Mon Jul 7 08:43:34 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 5ADC71065673; Mon, 7 Jul 2008 08:43:34 +0000 (UTC) (envelope-from peterjeremy@optushome.com.au) Received: from fallbackmx06.syd.optusnet.com.au (fallbackmx06.syd.optusnet.com.au [211.29.132.8]) by mx1.freebsd.org (Postfix) with ESMTP id 2457D8FC14; Mon, 7 Jul 2008 08:43:11 +0000 (UTC) (envelope-from peterjeremy@optushome.com.au) Received: from mail18.syd.optusnet.com.au (mail18.syd.optusnet.com.au [211.29.132.199]) by fallbackmx06.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id m63JYbSg017475; Fri, 4 Jul 2008 05:34:37 +1000 Received: from server.vk2pj.dyndns.org (c122-106-215-175.belrs3.nsw.optusnet.com.au [122.106.215.175]) by mail18.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id m63JY7mu006019 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Fri, 4 Jul 2008 05:34:09 +1000 Received: from server.vk2pj.dyndns.org (localhost.vk2pj.dyndns.org [127.0.0.1]) by server.vk2pj.dyndns.org (8.14.2/8.14.2) with ESMTP id m63JY77W031093; Fri, 4 Jul 2008 05:34:07 +1000 (EST) (envelope-from peter@server.vk2pj.dyndns.org) Received: (from peter@localhost) by server.vk2pj.dyndns.org (8.14.2/8.14.2/Submit) id m63JY6ow031092; Fri, 4 Jul 2008 05:34:07 +1000 (EST) (envelope-from peter) Date: Fri, 4 Jul 2008 05:34:06 +1000 From: Peter Jeremy To: Ed Schouten Message-ID: <20080703193406.GS29380@server.vk2pj.dyndns.org> References: <20080702190901.GS14567@hoeg.nl> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="5L6AZ1aJH5mDrqCQ" Content-Disposition: inline In-Reply-To: <20080702190901.GS14567@hoeg.nl> X-PGP-Key: http://members.optusnet.com.au/peterjeremy/pubkey.asc User-Agent: Mutt/1.5.18 (2008-05-17) Cc: FreeBSD Arch , FreeBSD Current Subject: Re: MPSAFE TTY schedule X-BeenThere: 
freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 07 Jul 2008 08:43:34 -0000 --5L6AZ1aJH5mDrqCQ Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On 2008-Jul-02 21:09:01 +0200, Ed Schouten wrote: >+ The new pseudo-terminal driver is capable of garbage collecting unused > PTY's. Because PTY's are never recycled, they are a lot more robust > (they are always initialized the same, no need to revoke() them before > usage, etc). When you say 'never recycled', does this include the PTY number? If so, long running busy systems are going to get some fairly large numbers. When will the PTY number wrap? What is the impact on tools (eg ps, w) that assume they can represent a PTY in a small number of digits? What about utmp(5) which uses PTY number in the index? >- Not all drivers have been ported to the new TTY layer yet. These > drivers still need to be ported: sio(4), cy(4), digi(4), ubser(4), > uftdi(4), nmdm(4), ng_h4(4), ng_tty(4), snp(4), rp(4), rc(4), si(4), > umodem(4), dcons(4). > >Even though drivers are very important to have, I am convinced we can >get these working not long after the code as been integrated. ... > If you really care about one of these drivers, >please port it to the new TTY layer as soon as possible! IMHO, this is not a reasonable approach: "Hi everyone. I'm going to break infrastructure that a whole bunch of drivers depend on. If you don't fix your drivers in the next few weeks then I'll disconnect them". Either you need to provide compatibility shims (possibly temporary and not MPSAFE) or you need to be far more pro-active in assisting with porting existing consumers of the TTY layer. >TTY layer into our kernel. 
I would really appreciate it if I could get
>this code in before the end of the summer break, because I've got heaps
>of spare time to fix any problems then.

That's all very nice but what about the maintainers of all the other
drivers that you are impacting?

> sio(4) has not been ported to the new TTY layer and is very hard
> to do so.

This is the only mention of how much effort is involved in porting a
driver to use the MPSAFE TTY layer and "very hard" is not a good start.
I can't quickly find any documentation on how to go about porting an
existing driver - definitely there are no section 9 man pages describing
the new API in your patchset.

IMHO, if you can't commit fixed drivers along with the MPSAFE TTY layer,
a more reasonable schedule is to replace the existing TTY layer with an
MPSAFE TTY layer that includes compatibility shims. If the shims make
things non-MPSAFE (which is likely) then warn that they will be going
away in (say) six months. This gives developers a more reasonable
timeframe in which to update, as well as working drivers whilst they
adapt them.

--
Peter Jeremy
Please excuse any delays as the result of my ISP's inability to implement
an MTA that is either RFC2821-compliant or matches their claimed behaviour.
--5L6AZ1aJH5mDrqCQ Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.9 (FreeBSD) iEYEARECAAYFAkhtKa4ACgkQ/opHv/APuIcUoQCgvbCXHvJ4XbBUfRb9scImg/D7 dmMAoIHMk6BMCiTO5ZVyVuLFEIZpEhFX =2AQg -----END PGP SIGNATURE----- --5L6AZ1aJH5mDrqCQ-- From owner-freebsd-arch@FreeBSD.ORG Mon Jul 7 11:06:56 2008 Return-Path: Delivered-To: freebsd-arch@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 2AD30106566C for ; Mon, 7 Jul 2008 11:06:56 +0000 (UTC) (envelope-from owner-bugmaster@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id F08328FC1B for ; Mon, 7 Jul 2008 11:06:55 +0000 (UTC) (envelope-from owner-bugmaster@FreeBSD.org) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.2/8.14.2) with ESMTP id m67B6tds061977 for ; Mon, 7 Jul 2008 11:06:55 GMT (envelope-from owner-bugmaster@FreeBSD.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.2/8.14.1/Submit) id m67B6tZE061933 for freebsd-arch@FreeBSD.org; Mon, 7 Jul 2008 11:06:55 GMT (envelope-from owner-bugmaster@FreeBSD.org) Date: Mon, 7 Jul 2008 11:06:55 GMT Message-Id: <200807071106.m67B6tZE061933@freefall.freebsd.org> X-Authentication-Warning: freefall.freebsd.org: gnats set sender to owner-bugmaster@FreeBSD.org using -f From: FreeBSD bugmaster To: freebsd-arch@FreeBSD.org Cc: Subject: Current problem reports assigned to freebsd-arch@FreeBSD.org X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 07 Jul 2008 11:06:56 -0000 Current FreeBSD problem reports Critical problems Serious problems Non-critical problems S Tracker Resp. 
Description -------------------------------------------------------------------------------- o kern/120749 arch [request] Suggest upping the default kern.ps_arg_cache 1 problem total. From owner-freebsd-arch@FreeBSD.ORG Mon Jul 7 15:05:39 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id B5F85106564A; Mon, 7 Jul 2008 15:05:39 +0000 (UTC) (envelope-from babkin@verizon.net) Received: from vms046pub.verizon.net (vms046pub.verizon.net [206.46.252.46]) by mx1.freebsd.org (Postfix) with ESMTP id 952E88FC15; Mon, 7 Jul 2008 15:05:39 +0000 (UTC) (envelope-from babkin@verizon.net) Received: from vms074.mailsrvcs.net ([172.18.12.133]) by vms046.mailsrvcs.net (Sun Java System Messaging Server 6.2-6.01 (built Apr 3 2006)) with ESMTPA id <0K3N00DBY5WPMVKD@vms046.mailsrvcs.net>; Mon, 07 Jul 2008 10:05:14 -0500 (CDT) Received: from 65.242.108.162 ([65.242.108.162]) by vms074.mailsrvcs.net (Verizon Webmail) with HTTP; Mon, 07 Jul 2008 10:05:13 -0500 (CDT) Date: Mon, 07 Jul 2008 10:05:13 -0500 (CDT) From: Sergey Babkin X-Originating-IP: [65.242.108.162] To: Sergey Babkin , Robert Watson Message-id: <7100389.65001215443113960.JavaMail.root@vms074.mailsrvcs.net> MIME-version: 1.0 Content-type: text/plain; charset=UTF-8 Content-transfer-encoding: 7bit Cc: arch@freebsd.org Subject: Re: Re: Proposal: a revoke() system call X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 07 Jul 2008 15:05:39 -0000 >On Sun, 6 Jul 2008, Sergey Babkin wrote: > >> int revoke(int fd, int flags) > >Seems like that conflicts with our existing revoke(2) system call. You could Aha, I guess when I've checked, I've looked at a real old version of FreeBSD. Sure, the name can be changed. 
>achieve something of the same end by opening /dev/null and then dup2()'ing to >the file descriptor you want to revoke, perhaps? Right now there's a known That's a great idea. I haven't thought about it. It should do everything. >issue that calling close(2) on a socket from one thread doesn't interrupt a >socket in a blocking I/O call from another thread -- you first have to call >shutdown(2), and then close(2). This has caused problems for Java in the >past, but I'm not sure that it's really a bug given that it's not unreasonable >behavior not rejected by the spec :-). Maybe I'll see if I can fix that. -SB From owner-freebsd-arch@FreeBSD.ORG Mon Jul 7 15:12:43 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 228A31065676 for ; Mon, 7 Jul 2008 15:12:43 +0000 (UTC) (envelope-from babkin@verizon.net) Received: from vms046pub.verizon.net (vms046pub.verizon.net [206.46.252.46]) by mx1.freebsd.org (Postfix) with ESMTP id 022548FC29 for ; Mon, 7 Jul 2008 15:12:43 +0000 (UTC) (envelope-from babkin@verizon.net) Received: from vms074.mailsrvcs.net ([172.18.12.133]) by vms046.mailsrvcs.net (Sun Java System Messaging Server 6.2-6.01 (built Apr 3 2006)) with ESMTPA id <0K3N00CU868T8LA0@vms046.mailsrvcs.net> for arch@freebsd.org; Mon, 07 Jul 2008 10:12:29 -0500 (CDT) Received: from 65.242.108.162 ([65.242.108.162]) by vms074.mailsrvcs.net (Verizon Webmail) with HTTP; Mon, 07 Jul 2008 10:12:29 -0500 (CDT) Date: Mon, 07 Jul 2008 10:12:29 -0500 (CDT) From: Sergey Babkin X-Originating-IP: [65.242.108.162] To: Sergey Babkin , Poul-Henning Kamp Message-id: <1878557.67061215443549669.JavaMail.root@vms074.mailsrvcs.net> MIME-version: 1.0 Content-type: text/plain; charset=UTF-8 Content-transfer-encoding: 7bit Cc: arch@freebsd.org Subject: Re: Re: Proposal: a revoke() system call X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: 
Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 07 Jul 2008 15:12:43 -0000 >>Rationale: >> >>In the multithreaded programs often multiple threads work with the >>same file descriptor. A particularly typical situation is a reader >>thread and a writer thread. The reader thread calls read(), gets >>blocked until it gets more data, then processes the data and >>continues the loop. Another example of a "reader thread" would be >>the main thread of a daemon that accepts the incoming connections >>and starts new per-connection threads. > >Have you tried to implement the functionality you're asking for ? > >You'll have to hunt down into all sorts of protocols, drivers >and other code to find the threads sleeping on your fd so you can >wake them. My thinking has been that if close() wakes them up, then things would be inherited from there. The thing I didn't know is that apparently in many cases close() doesn't wake them up. 
-SB From owner-freebsd-arch@FreeBSD.ORG Mon Jul 7 15:30:07 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 960341065683 for ; Mon, 7 Jul 2008 15:30:07 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from cyrus.watson.org (cyrus.watson.org [209.31.154.42]) by mx1.freebsd.org (Postfix) with ESMTP id 6CBB08FC1D for ; Mon, 7 Jul 2008 15:30:07 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from fledge.watson.org (fledge.watson.org [209.31.154.41]) by cyrus.watson.org (Postfix) with ESMTP id B4A2746C7D; Mon, 7 Jul 2008 11:30:06 -0400 (EDT) Date: Mon, 7 Jul 2008 16:30:06 +0100 (BST) From: Robert Watson X-X-Sender: robert@fledge.watson.org To: Sergey Babkin In-Reply-To: <7100389.65001215443113960.JavaMail.root@vms074.mailsrvcs.net> Message-ID: <20080707162733.V63144@fledge.watson.org> References: <7100389.65001215443113960.JavaMail.root@vms074.mailsrvcs.net> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: arch@freebsd.org Subject: Re: Re: Proposal: a revoke() system call X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 07 Jul 2008 15:30:07 -0000 On Mon, 7 Jul 2008, Sergey Babkin wrote: >> On Sun, 6 Jul 2008, Sergey Babkin wrote: >> >>> int revoke(int fd, int flags) >> >> Seems like that conflicts with our existing revoke(2) system call. You >> could > > Aha, I guess when I've checked, I've looked at a real old version of > FreeBSD. Sure, the name can be changed. I won't point you at the HISTORY section of the revoke(2) system call then :-). >> achieve something of the same end by opening /dev/null and then dup2()'ing >> to the file descriptor you want to revoke, perhaps? Right now there's a >> known > > That's a great idea. 
I haven't thought about it. It should do everything. Right, and possibly this means that no additional kernel support is required -- we just make it a libc or libutil interface. >> issue that calling close(2) on a socket from one thread doesn't interrupt a >> socket in a blocking I/O call from another thread -- you first have to call >> shutdown(2), and then close(2). This has caused problems for Java in the >> past, but I'm not sure that it's really a bug given that it's not >> unreasonable behavior not rejected by the spec :-). > > Maybe I'll see if I can fix that. Well, fixing this is easy -- instead of holding a reference to the file descriptor over the system call, hold a reference to the socket. The problem with that is that it creates a lot more contention on the socket locks when the reference count is dropped, not to mention more locking operations. This can be fixed but requires quite a lot of work, whereas this rather minor semantic issue is a non-problem in practice. I do have dealing with this reference issue on my todo list, but it's very low on the list because there are lots of other areas where we can significantly improve performance or semantics more easily and more quickly :-). 
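[Editorial sketch] The shutdown(2)-then-close(2) ordering described above
can be wrapped in a small helper. This is only an illustration under
stated assumptions: the name sock_shutdown_close() is hypothetical, and
the sketch ignores a failing shutdown() (for example on an unconnected
socket) and closes anyway. As noted earlier in the thread, shutdown()
only helps for connected sockets, so this does not cover a thread blocked
in accept().

```c
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: shut the socket down in both directions first, so
 * that I/O on the connection sees end-of-file or an error, then release
 * the descriptor. Returns the result of close().
 */
int
sock_shutdown_close(int s)
{
	/*
	 * Assumption: a shutdown() failure (e.g. ENOTCONN) is not fatal
	 * here; the descriptor is closed regardless.
	 */
	(void)shutdown(s, SHUT_RDWR);
	return (close(s));
}
```

A caller would use this in place of a bare close() on a connected socket
that another thread may still be reading from.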
Robert N M Watson Computer Laboratory University of Cambridge From owner-freebsd-arch@FreeBSD.ORG Mon Jul 7 15:39:50 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 640C1106564A for ; Mon, 7 Jul 2008 15:39:50 +0000 (UTC) (envelope-from cokane@freebsd.org) Received: from QMTA06.westchester.pa.mail.comcast.net (qmta06.westchester.pa.mail.comcast.net [76.96.62.56]) by mx1.freebsd.org (Postfix) with ESMTP id F227A8FC1F for ; Mon, 7 Jul 2008 15:39:49 +0000 (UTC) (envelope-from cokane@freebsd.org) Received: from OMTA13.westchester.pa.mail.comcast.net ([76.96.62.52]) by QMTA06.westchester.pa.mail.comcast.net with comcast id myiQ1Z01717dt5G560SE00; Mon, 07 Jul 2008 15:39:49 +0000 Received: from mail.cokane.org ([24.60.133.163]) by OMTA13.westchester.pa.mail.comcast.net with comcast id n3fX1Z0013Xh0XL3Z3fXnp; Mon, 07 Jul 2008 15:39:31 +0000 X-Authority-Analysis: v=1.0 c=1 a=Li5Y0fHWCEIA:10 a=WHdRpAi2Uv8A:10 a=2unGPHi5edsetU3uhJcA:9 a=aPG21ZidV1ZJvFhAlioA:7 a=FyUPjD5X6zzcZFH1qrdFFafRYN0A:4 a=LY0hPdMaydYA:10 a=NlNGMweHOcKhGC9rj5MA:9 a=sg4nvr2fZj3WaDBFfOS0kcQGSwgA:4 a=rPt6xJ-oxjAA:10 Received: by mail.cokane.org (Postfix, from userid 103) id 94B1316B55B; Mon, 7 Jul 2008 11:39:38 -0400 (EDT) X-Spam-Checker-Version: SpamAssassin 3.1.8-gr1 (2007-02-13) on discordia X-Spam-Level: X-Spam-Status: No, score=-4.4 required=5.0 tests=ALL_TRUSTED,BAYES_00 autolearn=ham version=3.1.8-gr1 Received: from [172.20.1.3] (erwin.int.cokane.org [172.20.1.3]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mail.cokane.org (Postfix) with ESMTP id 10C5316B55D; Mon, 7 Jul 2008 11:39:22 -0400 (EDT) From: Coleman Kane To: Sergey Babkin In-Reply-To: <1878557.67061215443549669.JavaMail.root@vms074.mailsrvcs.net> References: <1878557.67061215443549669.JavaMail.root@vms074.mailsrvcs.net> Content-Type: multipart/signed; micalg=pgp-sha1; 
protocol="application/pgp-signature"; boundary="=-u+nzY9h2taZhoVYsMhkp" Organization: FreeBSD Project Date: Mon, 07 Jul 2008 11:37:01 -0400 Message-Id: <1215445021.2033.13.camel@localhost> Mime-Version: 1.0 X-Mailer: Evolution 2.22.3.1 FreeBSD GNOME Team Port Cc: arch@freebsd.org, Poul-Henning Kamp Subject: Re: Re: Proposal: a revoke() system call X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 07 Jul 2008 15:39:50 -0000 --=-u+nzY9h2taZhoVYsMhkp Content-Type: text/plain Content-Transfer-Encoding: quoted-printable On Mon, 2008-07-07 at 10:12 -0500, Sergey Babkin wrote: > >>Rationale: > >> > >>In the multithreaded programs often multiple threads work with the > >>same file descriptor. A particularly typical situation is a reader > >>thread and a writer thread. The reader thread calls read(), gets > >>blocked until it gets more data, then processes the data and > >>continues the loop. Another example of a "reader thread" would be > >>the main thread of a daemon that accepts the incoming connections > >>and starts new per-connection threads. > > > >Have you tried to implement the functionality you're asking for ? > > > >You'll have to hunt down into all sorts of protocols, drivers > >and other code to find the threads sleeping on your fd so you can > >wake them. > > My thinking has been that if close() wakes them up, then things would be > inherited from there. The thing I didn't know is that apparently in many cases close() > doesn't wake them up. > > -SB > In cases where I need to wake the select() up immediately for cases such as this, I've implemented a "trigger pipe" that I include on the select list.
This is a simple pipe, that is written to by the application in the cases where I want to design multi-threaded blocking I/O mechanisms such as this to wake up immediately. The "master thread" writes to the pipe, and the blocking thread gets notified of the readability of that pipe's fd through my blocking select() call. Then, it attempts to read (non-blocking-read) from the fd that I had closed, and gets an error returned. The reader thread then knows to exit the read-loop and return to the caller (which hopefully cleans it up later with pthread_join). That method seems to work really well in a relatively cross-platform manner. I typically shy away from these types of designs, however. I attempt to ensure that my "reader threads" use select() calls with a reasonable timeout (100ms or even 250ms is usually decent for non-realtime software), and have an external trigger variable such as a bool or similar (named stop_threads) that is part of the struct pointer that I have passed in the void* argument to the thread function when calling pthread_create(). Basically, my master thread would have a cleanup routine that runs at shut-down and sets the variable to true. It then proceeds to pthread_join() all of the threads that trigger their exits on that variable. Only once all threads are joined do I close() the file descriptor. The sequence of events can easily be applied to non-shutdown events where such behavior is desired, however. The key point here is the use of select() to determine when a descriptor is readable, and then using non-blocking I/O to perform the actual read/write calls. 
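The "trigger pipe" arrangement described above might look roughly like this in C. It is a simplified sketch under stated assumptions: the function names are made up for illustration, the trigger itself is treated as the stop request (rather than following it with a non-blocking read on the closed descriptor), and EINTR handling is left to the caller.

```c
#include <sys/select.h>
#include <unistd.h>

/*
 * Block until datafd is readable or a byte arrives on wakefd, the read
 * end of the trigger pipe.  Returns 1 if datafd is readable, 0 if the
 * trigger fired, -1 on error.  (Illustrative name, not a real API.)
 */
int
wait_for_data_or_trigger(int datafd, int wakefd)
{
	fd_set rset;
	int maxfd = datafd > wakefd ? datafd : wakefd;

	FD_ZERO(&rset);
	FD_SET(datafd, &rset);
	FD_SET(wakefd, &rset);
	if (select(maxfd + 1, &rset, NULL, NULL, NULL) < 0)
		return (-1);
	if (FD_ISSET(wakefd, &rset))
		return (0);	/* the master thread asked us to stop */
	return (1);
}

/* Called by the master thread; one byte is enough to wake select(). */
int
fire_trigger(int trigger_write_end)
{
	return (write(trigger_write_end, "x", 1) == 1 ? 0 : -1);
}
```

A reader thread would loop on wait_for_data_or_trigger(), do a non-blocking read when it returns 1, and leave its loop when it returns 0; the master thread calls fire_trigger() and then pthread_join()s the reader before closing the data descriptor.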
--=20 Coleman Kane --=-u+nzY9h2taZhoVYsMhkp Content-Type: application/pgp-signature; name=signature.asc Content-Description: This is a digitally signed message part -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (FreeBSD) iEYEABECAAYFAkhyOBcACgkQcMSxQcXat5eKwgCffWDRgJVTWe54K3FXJRe06YVP V5MAn2b2IrlH51vQ2NyiaNPxHbTGLoA2 =Pqi4 -----END PGP SIGNATURE----- --=-u+nzY9h2taZhoVYsMhkp-- From owner-freebsd-arch@FreeBSD.ORG Mon Jul 7 16:04:12 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id D820A106568D for ; Mon, 7 Jul 2008 16:04:12 +0000 (UTC) (envelope-from phk@critter.freebsd.dk) Received: from phk.freebsd.dk (phk.freebsd.dk [130.225.244.222]) by mx1.freebsd.org (Postfix) with ESMTP id 9A4798FC1C for ; Mon, 7 Jul 2008 16:04:12 +0000 (UTC) (envelope-from phk@critter.freebsd.dk) Received: from critter.freebsd.dk (unknown [192.168.64.3]) by phk.freebsd.dk (Postfix) with ESMTP id 194A3170EB; Mon, 7 Jul 2008 16:04:11 +0000 (UTC) Received: from critter.freebsd.dk (localhost [127.0.0.1]) by critter.freebsd.dk (8.14.2/8.14.2) with ESMTP id m67G4A47006861; Mon, 7 Jul 2008 16:04:10 GMT (envelope-from phk@critter.freebsd.dk) To: Sergey Babkin From: "Poul-Henning Kamp" In-Reply-To: Your message of "Mon, 07 Jul 2008 10:12:29 EST." 
<1878557.67061215443549669.JavaMail.root@vms074.mailsrvcs.net> Date: Mon, 07 Jul 2008 16:04:10 +0000 Message-ID: <6860.1215446650@critter.freebsd.dk> Sender: phk@critter.freebsd.dk Cc: arch@freebsd.org Subject: Re: Proposal: a revoke() system call X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 07 Jul 2008 16:04:12 -0000 In message <1878557.67061215443549669.JavaMail.root@vms074.mailsrvcs.net>, Sergey Babkin writes: >My thinking has been that if close() wakes them up, then things would be >inherited from there. The thing I didn't know is that apparently in many cases close() >doesn't wake them up. It's a novel idea, seen with POSIX eyes, that a thread can close a fd it is sleeping on, so the semantics, however obvious they might be, are not described in the standards, and more importantly, not described in the code either. The device driver problem has more angles to it and should be thought out separately, since the same basic functionality is required for hardware removal, only more draconian. I'm not saying that such a systemcall is not a good idea, I'm merely very cautious about what it takes to implement it. -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 phk@FreeBSD.ORG | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence.
From owner-freebsd-arch@FreeBSD.ORG Mon Jul 7 16:05:56 2008 Return-Path: Delivered-To: arch@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 04A621065676; Mon, 7 Jul 2008 16:05:56 +0000 (UTC) (envelope-from phk@critter.freebsd.dk) Received: from phk.freebsd.dk (phk.freebsd.dk [130.225.244.222]) by mx1.freebsd.org (Postfix) with ESMTP id BA94E8FC15; Mon, 7 Jul 2008 16:05:55 +0000 (UTC) (envelope-from phk@critter.freebsd.dk) Received: from critter.freebsd.dk (unknown [192.168.64.3]) by phk.freebsd.dk (Postfix) with ESMTP id AB910170EB; Mon, 7 Jul 2008 16:05:54 +0000 (UTC) Received: from critter.freebsd.dk (localhost [127.0.0.1]) by critter.freebsd.dk (8.14.2/8.14.2) with ESMTP id m67G5scU006883; Mon, 7 Jul 2008 16:05:54 GMT (envelope-from phk@critter.freebsd.dk) To: Robert Watson From: "Poul-Henning Kamp" In-Reply-To: Your message of "Mon, 07 Jul 2008 16:30:06 +0100." <20080707162733.V63144@fledge.watson.org> Date: Mon, 07 Jul 2008 16:05:54 +0000 Message-ID: <6882.1215446754@critter.freebsd.dk> Sender: phk@critter.freebsd.dk Cc: arch@FreeBSD.org, Sergey Babkin Subject: Re: Proposal: a revoke() system call X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 07 Jul 2008 16:05:56 -0000 In message <20080707162733.V63144@fledge.watson.org>, Robert Watson writes: >>> achieve something of the same end by opening /dev/null and then dup2()'ing >>> to the file descriptor you want to revoke, perhaps? Right now there's a >>> known >> >> That's a great idea. I haven't thought about it. It should do everything. > >Right, and possibly this means that no additional kernel support is required >-- we just make it a libc or libutil interface. I can't see how that could possibly work... 
If you do a dup2(), the original fd is closed, and that still does not release all threads that may be sleeping on it in device drivers. -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 phk@FreeBSD.ORG | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. From owner-freebsd-arch@FreeBSD.ORG Mon Jul 7 16:33:23 2008 Return-Path: Delivered-To: arch@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 44E3A1065674 for ; Mon, 7 Jul 2008 16:33:23 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from cyrus.watson.org (cyrus.watson.org [209.31.154.42]) by mx1.freebsd.org (Postfix) with ESMTP id 1C0AC8FC23 for ; Mon, 7 Jul 2008 16:33:22 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from fledge.watson.org (fledge.watson.org [209.31.154.41]) by cyrus.watson.org (Postfix) with ESMTP id 75DA146C16; Mon, 7 Jul 2008 12:33:22 -0400 (EDT) Date: Mon, 7 Jul 2008 17:33:22 +0100 (BST) From: Robert Watson X-X-Sender: robert@fledge.watson.org To: Poul-Henning Kamp In-Reply-To: <6882.1215446754@critter.freebsd.dk> Message-ID: <20080707173102.L63144@fledge.watson.org> References: <6882.1215446754@critter.freebsd.dk> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: arch@FreeBSD.org, Sergey Babkin Subject: Re: Proposal: a revoke() system call X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 07 Jul 2008 16:33:23 -0000 On Mon, 7 Jul 2008, Poul-Henning Kamp wrote: > In message <20080707162733.V63144@fledge.watson.org>, Robert Watson writes: > >>>> achieve something of the same end by opening /dev/null and then >>>> dup2()'ing to the file descriptor you want to revoke, perhaps?
Right now >>>> there's a known >>> >>> That's a great idea. I haven't thought about it. It should do everything. >> >> Right, and possibly this means that no additional kernel support is >> required -- we just make it a libc or libutil interface. > > I can't see how that could possibly work... > > If you do a dup2(), the original fd is closed, and that still does not > release all threads that may be sleeping on it in device drivers. I see interrupting current consumers as a separable issue from invalidating the file descriptor for future users. I'm not convinced there's a good general solution for interrupting current consumers of a file descriptor -- we can improve the semantics for a few objects (i.e., sockets) if required, but I'm not sure it generalizes well. For sockets, generally speaking, calling shutdown(2) is the approved way to initiate a disconnect, which will lead to other consumers being kicked out of operations on the file descriptor, rather than close(2), which in general doesn't initiate a disconnect because it's a reference count operation on the underlying object.
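The shutdown(2)-before-close(2) ordering described above could be wrapped as in the sketch below. The helper name shutdown_then_close() is made up for illustration. The point it tries to show is that shutdown() acts on the socket itself, so the disconnect happens even while other descriptors still reference the socket, whereas close() alone only drops one reference.

```c
#include <sys/socket.h>
#include <errno.h>
#include <unistd.h>

/*
 * Illustrative helper: disconnect socket s in both directions, then
 * release the descriptor.  Returns 0 on success, -1 if either step
 * failed (ENOTCONN from shutdown() is not treated as a failure).
 */
int
shutdown_then_close(int s)
{
	int rc = 0;

	/* Acts on the underlying socket, not just this descriptor. */
	if (shutdown(s, SHUT_RDWR) < 0 && errno != ENOTCONN)
		rc = -1;
	/* Drops this descriptor's reference to the socket. */
	if (close(s) < 0)
		rc = -1;
	return (rc);
}
```

With a socketpair(2), for example, the peer sees end-of-file after this call even if a dup()ed copy of s is still open elsewhere in the process.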
Robert N M Watson Computer Laboratory University of Cambridge From owner-freebsd-arch@FreeBSD.ORG Mon Jul 7 17:28:26 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 52291106566C; Mon, 7 Jul 2008 17:28:26 +0000 (UTC) (envelope-from babkin@verizon.net) Received: from vms042pub.verizon.net (vms042pub.verizon.net [206.46.252.42]) by mx1.freebsd.org (Postfix) with ESMTP id 3149A8FC23; Mon, 7 Jul 2008 17:28:26 +0000 (UTC) (envelope-from babkin@verizon.net) Received: from vms227.mailsrvcs.net ([172.18.12.133]) by vms042.mailsrvcs.net (Sun Java System Messaging Server 6.2-6.01 (built Apr 3 2006)) with ESMTPA id <0K3N003O0CITD522@vms042.mailsrvcs.net>; Mon, 07 Jul 2008 12:28:05 -0500 (CDT) Received: from 65.242.108.162 ([65.242.108.162]) by vms227.mailsrvcs.net (Verizon Webmail) with HTTP; Mon, 07 Jul 2008 12:28:05 -0500 (CDT) Date: Mon, 07 Jul 2008 12:28:05 -0500 (CDT) From: Sergey Babkin X-Originating-IP: [65.242.108.162] To: Sergey Babkin , Coleman Kane Message-id: <22302744.211651215451685258.JavaMail.root@vms227.mailsrvcs.net> MIME-version: 1.0 Content-type: text/plain; charset=UTF-8 Content-transfer-encoding: 7bit Cc: arch@freebsd.org, Poul-Henning Kamp Subject: Re: Re: Re: Proposal: a revoke() system call X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 07 Jul 2008 17:28:26 -0000 >> My thinking has been that if close() wakes them up, then things would be >> inherited from there. The thing I didn't know is that apparently in many cases close() >> doesn't wake them up. >> >> -SB >> > >In cases where I need to wake the select() up immediately for cases such >as this, I've implemented a "trigger pipe" that I include on the select >list. 
This is a simple pipe, that is written to by the application in Yep, This is the design I'm trying to avoid :-) -SB From owner-freebsd-arch@FreeBSD.ORG Mon Jul 7 17:33:06 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id CD14D106568F; Mon, 7 Jul 2008 17:33:06 +0000 (UTC) (envelope-from babkin@verizon.net) Received: from vms044pub.verizon.net (vms044pub.verizon.net [206.46.252.44]) by mx1.freebsd.org (Postfix) with ESMTP id ACE438FC0C; Mon, 7 Jul 2008 17:33:06 +0000 (UTC) (envelope-from babkin@verizon.net) Received: from vms227.mailsrvcs.net ([172.18.12.133]) by vms044.mailsrvcs.net (Sun Java System Messaging Server 6.2-6.01 (built Apr 3 2006)) with ESMTPA id <0K3N00B47CQO99L4@vms044.mailsrvcs.net>; Mon, 07 Jul 2008 12:32:48 -0500 (CDT) Received: from 65.242.108.162 ([65.242.108.162]) by vms227.mailsrvcs.net (Verizon Webmail) with HTTP; Mon, 07 Jul 2008 12:32:48 -0500 (CDT) Date: Mon, 07 Jul 2008 12:32:48 -0500 (CDT) From: Sergey Babkin X-Originating-IP: [65.242.108.162] To: Sergey Babkin , Robert Watson Message-id: <22395548.214801215451968131.JavaMail.root@vms227.mailsrvcs.net> MIME-version: 1.0 Content-type: text/plain; charset=UTF-8 Content-transfer-encoding: 7bit Cc: arch@freebsd.org Subject: Re: Re: Re: Proposal: a revoke() system call X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 07 Jul 2008 17:33:06 -0000 >>> issue that calling close(2) on a socket from one thread doesn't interrupt a >>> socket in a blocking I/O call from another thread -- you first have to call >>> shutdown(2), and then close(2). This has caused problems for Java in the >>> past, but I'm not sure that it's really a bug given that it's not >>> unreasonable behavior not rejected by the spec :-). 
>> >> Maybe I'll see if I can fix that. > >Well, fixing this is easy -- instead of holding a reference to the file >descriptor over the system call, hold a reference to the socket. The problem >with that is that it creates a lot more contention on the socket locks when >the reference count is dropped, not to mention more locking operations. This >can be fixed but requires quite a lot of work, whereas this rather minor >semantic issue is a non-problem in practice. I do have dealing with this I can't comment much without actually looking at the code, but why would the contention on close() be such an issue? Close() is not called that often, compared for example to read(), so there should not be much contention to start with. And why not just call the shutdown() logic from inside close() implementation? -SB From owner-freebsd-arch@FreeBSD.ORG Mon Jul 7 18:49:37 2008 Return-Path: Delivered-To: arch@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 9E2FF1065690; Mon, 7 Jul 2008 18:49:37 +0000 (UTC) (envelope-from babkin@verizon.net) Received: from vms173003pub.verizon.net (vms173003pub.verizon.net [206.46.173.3]) by mx1.freebsd.org (Postfix) with ESMTP id 7C2A48FC1E; Mon, 7 Jul 2008 18:49:37 +0000 (UTC) (envelope-from babkin@verizon.net) Received: from vms227.mailsrvcs.net ([172.18.12.133]) by vms173003.mailsrvcs.net (Sun Java System Messaging Server 6.2-6.01 (built Apr 3 2006)) with ESMTPA id <0K3N008IPDI5RWAA@vms173003.mailsrvcs.net>; Mon, 07 Jul 2008 12:49:18 -0500 (CDT) Received: from 65.242.108.162 ([65.242.108.162]) by vms227.mailsrvcs.net (Verizon Webmail) with HTTP; Mon, 07 Jul 2008 12:49:17 -0500 (CDT) Date: Mon, 07 Jul 2008 12:49:17 -0500 (CDT) From: Sergey Babkin X-Originating-IP: [65.242.108.162] To: Robert Watson , Poul-Henning Kamp Message-id: <9820978.224231215452958017.JavaMail.root@vms227.mailsrvcs.net> MIME-version: 1.0 Content-type: text/plain; charset=UTF-8 
Content-transfer-encoding: 7bit Cc: arch@FreeBSD.org, Sergey Babkin Subject: Re: Re: Proposal: a revoke() system call X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 07 Jul 2008 18:49:37 -0000 >From: Poul-Henning Kamp >In message <20080707162733.V63144@fledge.watson.org>, Robert Watson writes: > >>>> achieve something of the same end by opening /dev/null and then dup2()'ing >>>> to the file descriptor you want to revoke, perhaps? Right now there's a >>>> known >>> >>> That's a great idea. I haven't thought about it. It should do everything. >> >>Right, and possibly this means that no additional kernel support is required >>-- we just make it a libc or libutil interface. > >I can't see how that could possibly work... > >If you do a dup2(), the original fd is closed, and that still does not >release all threads that may be sleeping on it in device drivers. Device drivers definitely would be a pain. I guess it depends on the semantics of the driver close() routine in cdevsw. Even if it's called every time a process does close() - well, assuming that a process didn't share it through fork(), and if it did then the last process's close() - then the driver might still not be handling correctly the wake-up of all the threads coming from this file descriptor. But I guess it's again connected to whether they can handle close() from a multithreaded application, with other threads still trying to read. Maybe make a new entry in devsw, requesting the driver to wake up all the sleepers (either coming from a particular file descriptor, or all the sleepers at all) and return EINTR for them.
Then if we replace the entry in the file table first, the interrupted threads would handle the signal as usual, come back and find that when they try to restart the call on this descriptor, they get a hard error. Hm, maybe even a new devsw entry is not needed. Just pretend delivering a signal to all the threads, skipping the ones that aren't currently sleeping on a file I/O (i.e. running or sleeping on a synchronization primitive). Then just don't call any signal handler for this pretend-signal, return an EINTR and let it be handled in a usual way. -SB From owner-freebsd-arch@FreeBSD.ORG Mon Jul 7 18:52:34 2008 Return-Path: Delivered-To: arch@FreeBSD.ORG Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A9997106567B for ; Mon, 7 Jul 2008 18:52:34 +0000 (UTC) (envelope-from das@FreeBSD.ORG) Received: from zim.MIT.EDU (ZIM.MIT.EDU [18.95.3.101]) by mx1.freebsd.org (Postfix) with ESMTP id 68FA08FC17 for ; Mon, 7 Jul 2008 18:52:34 +0000 (UTC) (envelope-from das@FreeBSD.ORG) Received: from zim.MIT.EDU (localhost [127.0.0.1]) by zim.MIT.EDU (8.14.2/8.14.2) with ESMTP id m67IN2pL034901; Mon, 7 Jul 2008 14:23:02 -0400 (EDT) (envelope-from das@FreeBSD.ORG) Received: (from das@localhost) by zim.MIT.EDU (8.14.2/8.14.2/Submit) id m67IN29O034900; Mon, 7 Jul 2008 14:23:02 -0400 (EDT) (envelope-from das@FreeBSD.ORG) Date: Mon, 7 Jul 2008 14:23:02 -0400 From: David Schultz To: Sergey Babkin Message-ID: <20080707182302.GA34751@zim.MIT.EDU> Mail-Followup-To: Sergey Babkin , Poul-Henning Kamp , arch@FreeBSD.ORG References: <1878557.67061215443549669.JavaMail.root@vms074.mailsrvcs.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1878557.67061215443549669.JavaMail.root@vms074.mailsrvcs.net> Cc: arch@FreeBSD.ORG, Poul-Henning Kamp Subject: Re: Re: Proposal: a revoke() system call X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list
List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 07 Jul 2008 18:52:34 -0000 On Mon, Jul 07, 2008, Sergey Babkin wrote: > >>Rationale: > >> > >>In the multithreaded programs often multiple threads work with the > >>same file descriptor. A particularly typical situation is a reader > >>thread and a writer thread. The reader thread calls read(), gets > >>blocked until it gets more data, then processes the data and > >>continues the loop. Another example of a "reader thread" would be > >>the main thread of a daemon that accepts the incoming connections > >>and starts new per-connection threads. > > > >Have you tried to implement the functionality you're asking for ? > > > >You'll have to hunt down into all sorts of protocols, drivers > >and other code to find the threads sleeping on your fd so you can > >wake them. > > My thinking has been that if close() wakes them up, then things would be > inherited from there. The thing I didn't know is that apparently in many cases close() > doesn't wake them up. In Solaris, if you close a file descriptor that has blocked readers, the readers wake up and read() returns 0 bytes (EOF). (At least this is true if you close the local end of a pipe.) It seems like implementing the same behavior in FreeBSD would address your problem without introducing a new system call. Is there a good reason why this might not be the right thing to do? 
From owner-freebsd-arch@FreeBSD.ORG Mon Jul 7 19:32:53 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 164861065671 for ; Mon, 7 Jul 2008 19:32:53 +0000 (UTC) (envelope-from kensmith@cse.Buffalo.EDU) Received: from phoebe.cse.buffalo.edu (phoebe.cse.buffalo.edu [128.205.32.89]) by mx1.freebsd.org (Postfix) with ESMTP id CE8A08FC13 for ; Mon, 7 Jul 2008 19:32:52 +0000 (UTC) (envelope-from kensmith@cse.Buffalo.EDU) Received: from [192.168.1.101] (cpe-74-77-179-53.buffalo.res.rr.com [74.77.179.53]) (authenticated bits=0) by phoebe.cse.buffalo.edu (8.14.1/8.13.7) with ESMTP id m67J06bB020002 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Mon, 7 Jul 2008 15:00:07 -0400 (EDT) (envelope-from kensmith@cse.buffalo.edu) From: Ken Smith To: Robert Watson In-Reply-To: <20080707000313.P56885@fledge.watson.org> References: <48714866.906912CC@verizon.net> <20080707000313.P56885@fledge.watson.org> Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="=-g9ENhPw2RXZKY97cVWJq" Date: Mon, 07 Jul 2008 15:00:01 -0400 Message-Id: <1215457201.89956.11.camel@neo.cse.buffalo.edu> Mime-Version: 1.0 X-Mailer: Evolution 2.12.1 FreeBSD GNOME Team Port X-DCC-Buffalo.EDU-Metrics: phoebe.cse.buffalo.edu 1336; Body=0 Fuz1=0 Fuz2=0 Cc: arch@freebsd.org, Sergey Babkin Subject: Re: Proposal: a revoke() system call X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 07 Jul 2008 19:32:53 -0000 --=-g9ENhPw2RXZKY97cVWJq Content-Type: text/plain Content-Transfer-Encoding: quoted-printable On Mon, 2008-07-07 at 00:05 +0100, Robert Watson wrote: > You could > achieve something of the same end by opening /dev/null and then dup2()'ing to > the
file descriptor you want to revoke, perhaps? I might be missing something but isn't this what the deadfs vnodeops are for? --=20 Ken Smith - From there to here, from here to | kensmith@cse.buffalo.edu there, funny things are everywhere. | - Theodore Geisel | --=-g9ENhPw2RXZKY97cVWJq Content-Type: application/pgp-signature; name=signature.asc Content-Description: This is a digitally signed message part -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.8 (FreeBSD) iEYEABECAAYFAkhyZ6YACgkQ/G14VSmup/bX1ACgmKgpwcQSw/35muWpvuLZ+ktB 00cAnR9FEn0QpIjbdtnEXhjA4aidGDeo =Shxv -----END PGP SIGNATURE----- --=-g9ENhPw2RXZKY97cVWJq-- From owner-freebsd-arch@FreeBSD.ORG Mon Jul 7 19:56:15 2008 Return-Path: Delivered-To: arch@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 0B2DA106567E; Mon, 7 Jul 2008 19:56:15 +0000 (UTC) (envelope-from phk@critter.freebsd.dk) Received: from phk.freebsd.dk (phk.freebsd.dk [130.225.244.222]) by mx1.freebsd.org (Postfix) with ESMTP id C10EC8FC19; Mon, 7 Jul 2008 19:56:14 +0000 (UTC) (envelope-from phk@critter.freebsd.dk) Received: from critter.freebsd.dk (unknown [192.168.64.3]) by phk.freebsd.dk (Postfix) with ESMTP id E20DC170E5; Mon, 7 Jul 2008 19:56:12 +0000 (UTC) Received: from critter.freebsd.dk (localhost [127.0.0.1]) by critter.freebsd.dk (8.14.2/8.14.2) with ESMTP id m67JuCDU059734; Mon, 7 Jul 2008 19:56:12 GMT (envelope-from phk@critter.freebsd.dk) To: Sergey Babkin From: "Poul-Henning Kamp" In-Reply-To: Your message of "Mon, 07 Jul 2008 12:49:17 EST." 
<9820978.224231215452958017.JavaMail.root@vms227.mailsrvcs.net> Date: Mon, 07 Jul 2008 19:56:12 +0000 Message-ID: <59733.1215460572@critter.freebsd.dk> Sender: phk@critter.freebsd.dk Cc: arch@FreeBSD.org, Robert Watson Subject: Re: Proposal: a revoke() system call X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 07 Jul 2008 19:56:15 -0000 In message <9820978.224231215452958017.JavaMail.root@vms227.mailsrvcs.net>, Sergey Babkin writes: >Maybe make a new entry in devsw, requesting the driver to wake up >all the sleepers (either coming from a particular file descriptor, or all >the sleepers at all) and return EINTR for them. We have that, it's called ->d_purge(), but the semantics are "driver or hardware going away", not "get rid of this thread/fd". As I said earlier, this requires careful thought. If the only reason we are discussing this is that people find the magic pipe to select ugly, then I would even argue that we have not reached critical mass for even thinking. -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 phk@FreeBSD.ORG | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence.
From owner-freebsd-arch@FreeBSD.ORG Mon Jul 7 20:03:23 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 271C5106567B; Mon, 7 Jul 2008 20:03:23 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail09.syd.optusnet.com.au (mail09.syd.optusnet.com.au [211.29.132.190]) by mx1.freebsd.org (Postfix) with ESMTP id B74338FC12; Mon, 7 Jul 2008 20:03:22 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from besplex.bde.org (c220-239-252-11.carlnfd3.nsw.optusnet.com.au [220.239.252.11]) by mail09.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id m67K3Hkc011434 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Tue, 8 Jul 2008 06:03:18 +1000 Date: Tue, 8 Jul 2008 06:03:17 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: David Schultz In-Reply-To: <20080707182302.GA34751@zim.MIT.EDU> Message-ID: <20080708051956.L1122@besplex.bde.org> References: <1878557.67061215443549669.JavaMail.root@vms074.mailsrvcs.net> <20080707182302.GA34751@zim.MIT.EDU> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: arch@freebsd.org, Poul-Henning Kamp , Sergey Babkin Subject: Re: Re: Proposal: a revoke() system call X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 07 Jul 2008 20:03:23 -0000 On Mon, 7 Jul 2008, David Schultz wrote: > On Mon, Jul 07, 2008, Sergey Babkin wrote: >>>> Rationale: >>>> >>>> In the multithreaded programs often multiple threads work with the >>>> same file descriptor. A particularly typical situation is a reader >>>> thread and a writer thread. The reader thread calls read(), gets >>>> blocked until it gets more data, then processes the data and >>>> continues the loop. 
Another example of a "reader thread" would be >>>> the main thread of a daemon that accepts the incoming connections >>>> and starts new per-connection threads. >>> >>> Have you tried to implement the functionality you're asking for ? >>> >>> You'll have to hunt down into all sorts of protocols, drivers >>> and other code to find the threads sleeping on your fd so you can >>> wake them. >> >> My thinking has been that if close() wakes them up, then things would be >> inherited from there. The thing I didn't know is that apparently in many cases close() >> doesn't wake them up. > > In Solaris, if you close a file descriptor that has blocked > readers, the readers wake up and read() returns 0 bytes (EOF). > (At least this is true if you close the local end of a pipe.) > It seems like implementing the same behavior in FreeBSD would > address your problem without introducing a new system call. > Is there a good reason why this might not be the right thing to do? Does this happen even for non-last closes of all file types? Pipes are too simple :-). Under FreeBSD, ordinary revoke(2) needs to wake up all readers and synchronize with them (preferably without waiting for them), but it has never done this. The kernel has no mechanism for finding threads sleeping or doing i/o on an fd short of what fstat does (searching half of kmem for hints). Only a small amount of progress has been made in fixing this in the 20 years that revoke() has existed. Most of the necessary wakeups don't occur. A few occur accidentally. So it is normal for threads to be left active after revoke() completes, and the progress is mainly that the devfs and conf layers try harder to prevent deallocation of active data structures for devices in this state. The active threads may do some damage when they wake up with a closed or a new generation of open device, but usually don't. Tty drivers use a generation count to prevent some uses of new generations of opens, but don't check it in enough places.
I haven't noticed any other class of drivers doing even this much. Since revoke() is used mainly on tty devices and the generation count almost works for these, these bugs are rarely noticed. Bruce From owner-freebsd-arch@FreeBSD.ORG Mon Jul 7 21:51:01 2008 Return-Path: Delivered-To: arch@FreeBSD.ORG Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 3BDD31065678 for ; Mon, 7 Jul 2008 21:51:01 +0000 (UTC) (envelope-from babkin@verizon.net) Received: from vms173001pub.verizon.net (vms173001pub.verizon.net [206.46.173.1]) by mx1.freebsd.org (Postfix) with ESMTP id 1A2868FC14 for ; Mon, 7 Jul 2008 21:51:00 +0000 (UTC) (envelope-from babkin@verizon.net) Received: from vms126.mailsrvcs.net ([172.18.12.131]) by vms173001.mailsrvcs.net (Sun Java System Messaging Server 6.2-6.01 (built Apr 3 2006)) with ESMTPA id <0K3N00E5ZOON5K06@vms173001.mailsrvcs.net>; Mon, 07 Jul 2008 16:50:48 -0500 (CDT) Received: from 65.242.108.162 ([65.242.108.162]) by vms126.mailsrvcs.net (Verizon Webmail) with HTTP; Mon, 07 Jul 2008 16:50:47 -0500 (CDT) Date: Mon, 07 Jul 2008 16:50:47 -0500 (CDT) From: Sergey Babkin X-Originating-IP: [65.242.108.162] To: Sergey Babkin , David Schultz Message-id: <9484951.340521215467447990.JavaMail.root@vms126.mailsrvcs.net> MIME-version: 1.0 Content-type: text/plain; charset=UTF-8 Content-transfer-encoding: 7bit Cc: arch@FreeBSD.ORG, Poul-Henning Kamp Subject: Re: Re: Re: Proposal: a revoke() system call X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 07 Jul 2008 21:51:01 -0000 >From: David Schultz >On Mon, Jul 07, 2008, Sergey Babkin wrote: >> >>Rationale: >> >> >> >>In the multithreaded programs often multiple threads work with the >> >>same file descriptor. 
A particularly typical situation is a reader
>> >>thread and a writer thread. The reader thread calls read(), gets
>> >>blocked until it gets more data, then processes the data and
>> >>continues the loop. Another example of a "reader thread" would be
>> >>the main thread of a daemon that accepts the incoming connections
>> >>and starts new per-connection threads.
>> >
>> >Have you tried to implement the functionality you're asking for ?
>> >
>> >You'll have to hunt down into all sorts of protocols, drivers
>> >and other code to find the threads sleeping on your fd so you can
>> >wake them.
>>
>> My thinking has been that if close() wakes them up, then things would be
>> inherited from there. The thing I didn't know is that apparently in many cases close()
>> doesn't wake them up.
>
>In Solaris, if you close a file descriptor that has blocked
>readers, the readers wake up and read() returns 0 bytes (EOF).
>(At least this is true if you close the local end of a pipe.)
>It seems like implementing the same behavior in FreeBSD would
>address your problem without introducing a new system call.
>Is there a good reason why this might not be the right thing to do?

No, actually I didn't realize that FreeBSD has this issue at all :-)
My experience comes from Linux and Solaris implementations.

The issue is that close() introduces a race between setting the fd number
in the application data and closing the socket. The reader works like this
pseudocode:

	int fd;

	fd = mystructure.fd;
	if (fd < 0)
		return -1;
	return read(fd, ...);

This leaves a small race window between the check of fd and the call to
read(). If in the meantime another thread does close() (and sets
mystructure.fd to -1), and a third thread does open(), then the result of
this open() would use the same fd number as our old fd (since it is now
likely to be the lowest available number), and read() would happen on a
completely wrong file. And yes, it does happen in the real world.
The best workaround I've come up with is a small pause between setting mystructure.fd = -1 and calling close(). The point of the proposal is to do a close() without freeing the file descriptor. -SB From owner-freebsd-arch@FreeBSD.ORG Mon Jul 7 21:57:01 2008 Return-Path: Delivered-To: arch@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id B11601065691; Mon, 7 Jul 2008 21:57:01 +0000 (UTC) (envelope-from babkin@verizon.net) Received: from vms046pub.verizon.net (vms046pub.verizon.net [206.46.252.46]) by mx1.freebsd.org (Postfix) with ESMTP id 91E7F8FC0C; Mon, 7 Jul 2008 21:57:01 +0000 (UTC) (envelope-from babkin@verizon.net) Received: from vms126.mailsrvcs.net ([172.18.12.131]) by vms046.mailsrvcs.net (Sun Java System Messaging Server 6.2-6.01 (built Apr 3 2006)) with ESMTPA id <0K3N00EI6OY9EH70@vms046.mailsrvcs.net>; Mon, 07 Jul 2008 16:56:33 -0500 (CDT) Received: from 65.242.108.162 ([65.242.108.162]) by vms126.mailsrvcs.net (Verizon Webmail) with HTTP; Mon, 07 Jul 2008 16:56:33 -0500 (CDT) Date: Mon, 07 Jul 2008 16:56:33 -0500 (CDT) From: Sergey Babkin X-Originating-IP: [65.242.108.162] To: Sergey Babkin , Poul-Henning Kamp Message-id: <29793635.342951215467793639.JavaMail.root@vms126.mailsrvcs.net> MIME-version: 1.0 Content-type: text/plain; charset=UTF-8 Content-transfer-encoding: 7bit Cc: arch@FreeBSD.org, Robert Watson Subject: Re: Re: Proposal: a revoke() system call X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 07 Jul 2008 21:57:01 -0000 >From: Poul-Henning Kamp >If the only reason we are discussing this, is that people find the >magic pipe to select ugly, then I would even argue that we have not >reached critical mass for even thinking. Well, there are a couple of problems with magical pipes: 1.
It means using 3 times as many file descriptors. (One for the original socket, and 2 for the ends of the pipe). 2. When working with the 3rd-party libraries, it requires a substantial rework of these libraries. Getting a file descriptor from inside the library's implementation and forcing it to close is a lot less invasive and can be done with a simple API wrapper. -SB From owner-freebsd-arch@FreeBSD.ORG Mon Jul 7 23:06:42 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 8E1CA106567E for ; Mon, 7 Jul 2008 23:06:42 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from cyrus.watson.org (cyrus.watson.org [209.31.154.42]) by mx1.freebsd.org (Postfix) with ESMTP id 26C4F8FC24 for ; Mon, 7 Jul 2008 23:06:41 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from fledge.watson.org (fledge.watson.org [209.31.154.41]) by cyrus.watson.org (Postfix) with ESMTP id E5E6146C7C; Mon, 7 Jul 2008 19:06:36 -0400 (EDT) Date: Tue, 8 Jul 2008 00:06:36 +0100 (BST) From: Robert Watson X-X-Sender: robert@fledge.watson.org To: Ken Smith In-Reply-To: <1215457201.89956.11.camel@neo.cse.buffalo.edu> Message-ID: <20080708000132.K63144@fledge.watson.org> References: <48714866.906912CC@verizon.net> <20080707000313.P56885@fledge.watson.org> <1215457201.89956.11.camel@neo.cse.buffalo.edu> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: arch@freebsd.org, Sergey Babkin Subject: Re: Proposal: a revoke() system call X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 07 Jul 2008 23:06:42 -0000 On Mon, 7 Jul 2008, Ken Smith wrote: > On Mon, 2008-07-07 at 00:05 +0100, Robert Watson wrote: >> You could achieve something of the same end by opening /dev/null and then
>> dup2()'ing to the file descriptor you want to revoke, perhaps?
>
> I might be missing something but isn't this what the deadfs vnodeops are
> for?

It's a little different, although similar. When a vnode is deadfs'd, such
as after a call to revoke(2)'s historic implementation, all open file
descriptors on the file are invalidated. I think that Sergey is suggesting
semantics in which only the current file descriptor referring to the object
is invalidated -- other independently acquired file descriptors in other
processes would remain valid.

BTW, this does show up one of the potential semantic conflicts in the
proposed new revoke behavior: suppose a TCP connection is opened, and two
processes have references to the file descriptor for the connection. One
of those processes is multi-threaded, and has a blocking read(2) on the
file descriptor in one thread, and calls close(2) from another thread. Is
the proposal to cancel in-progress I/O's against the file descriptor even
though the connection isn't closing due to the further reference to the
same descriptor in another process? Solaris has a pretty complex
infrastructure to support that sort of in-kernel cancellation -- the
shutdown(2) behavior we have is fairly different in that it manipulates
connection state to cancel outstanding I/O's, and would also affect the
second process, rather than simply consumers on the one file descriptor.
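For reference, the dup2(2) idea quoted above can be sketched in userland C
roughly as follows. This is only an illustration under stated assumptions:
neutralize_fd() is a hypothetical helper name (it is not from this thread or
from libc), and the sketch does nothing to wake threads that are already
blocked in read(2) on the old object.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper, for illustration only: make fd refer to /dev/null
 * while keeping the descriptor number itself allocated, so that a
 * concurrent open(2) cannot be handed the same number.
 * Returns 0 on success, -1 on failure (errno set by open/dup2).
 */
int
neutralize_fd(int fd)
{
	int nullfd;

	nullfd = open("/dev/null", O_RDWR);
	if (nullfd < 0)
		return (-1);
	if (nullfd == fd)
		return (0);	/* fd was not open; it now is /dev/null */
	if (dup2(nullfd, fd) < 0) {
		close(nullfd);
		return (-1);
	}
	close(nullfd);		/* drop the temporary descriptor */
	return (0);
}
```

After a successful call, later read(2) calls on fd see EOF from /dev/null
rather than data from an unrelated file. Whether the replacement is safe
against concurrent descriptor allocation depends on dup2(2) doing the
close-and-replace atomically, which is worth verifying in the implementation.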
Robert N M Watson Computer Laboratory University of Cambridge From owner-freebsd-arch@FreeBSD.ORG Mon Jul 7 23:15:26 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 32FB91065683 for ; Mon, 7 Jul 2008 23:15:26 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from cyrus.watson.org (cyrus.watson.org [209.31.154.42]) by mx1.freebsd.org (Postfix) with ESMTP id 0BEF38FC0C for ; Mon, 7 Jul 2008 23:15:25 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from fledge.watson.org (fledge.watson.org [209.31.154.41]) by cyrus.watson.org (Postfix) with ESMTP id 9869D46C81; Mon, 7 Jul 2008 19:15:24 -0400 (EDT) Date: Tue, 8 Jul 2008 00:15:24 +0100 (BST) From: Robert Watson X-X-Sender: robert@fledge.watson.org To: Sergey Babkin In-Reply-To: <22395548.214801215451968131.JavaMail.root@vms227.mailsrvcs.net> Message-ID: <20080708000701.R63144@fledge.watson.org> References: <22395548.214801215451968131.JavaMail.root@vms227.mailsrvcs.net> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: arch@freebsd.org Subject: Re: Re: Re: Proposal: a revoke() system call X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 07 Jul 2008 23:15:26 -0000 On Mon, 7 Jul 2008, Sergey Babkin wrote: >> Well, fixing this is easy -- instead of holding a reference to the file >> descriptor over the system call, hold a reference to the socket. The >> problem with that is that it creates a lot more contention on the socket >> locks when the reference count is dropped, not to mention more locking >> operations. This can be fixed but requires quite a lot of work, whereas >> this rather minor semantic issue is a non-problem in practice. 
I do have >> dealing with this > > I can't comment much without actually looking at the code, but why would the > contention on close() be such an issue? Close() is not called that often, > compared for example to read(), so there should not be much contention to > start with. And why not just call the shutdown() logic from inside close() > implementation? This is a fairly complex issue, and one that doesn't lend itself to in-depth discussion without first looking at the code. To direct your reading, I recommend starting with the socket reference model -- you can find a high-level summary in the comments at the head of uipc_socket.c, and the comments on sofree(9). The question you're getting at indirectly has to do with the differences between fdrop(9), which drops a reference to a file descriptor, and fputsock(9), which drops a reference to a socket. You'll also find it useful to do a bit of reading regarding the difference between close(2), which releases a reference to a file descriptor from userspace, and fo_close(9), which is invoked in-kernel when the last reference to a file descriptor goes away. 
Robert N M Watson Computer Laboratory University of Cambridge From owner-freebsd-arch@FreeBSD.ORG Mon Jul 7 23:23:02 2008 Return-Path: Delivered-To: arch@FreeBSD.ORG Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 2EAED1065671; Mon, 7 Jul 2008 23:23:02 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from cyrus.watson.org (cyrus.watson.org [209.31.154.42]) by mx1.freebsd.org (Postfix) with ESMTP id 04AF48FC21; Mon, 7 Jul 2008 23:23:01 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from fledge.watson.org (fledge.watson.org [209.31.154.41]) by cyrus.watson.org (Postfix) with ESMTP id 7D74546C81; Mon, 7 Jul 2008 19:23:01 -0400 (EDT) Date: Tue, 8 Jul 2008 00:23:01 +0100 (BST) From: Robert Watson X-X-Sender: robert@fledge.watson.org To: Sergey Babkin In-Reply-To: <9484951.340521215467447990.JavaMail.root@vms126.mailsrvcs.net> Message-ID: <20080708001929.E63144@fledge.watson.org> References: <9484951.340521215467447990.JavaMail.root@vms126.mailsrvcs.net> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: arch@FreeBSD.ORG, David Schultz , Poul-Henning Kamp Subject: Re: Re: Re: Proposal: a revoke() system call X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 07 Jul 2008 23:23:02 -0000 On Mon, 7 Jul 2008, Sergey Babkin wrote: > This leaves a small race window between fd is checked and read() is > executed. If in the meantime another thread does close() (and sets > mystructure.fd to -1), and the third thread does open() then the result of > this open would use the same fd number as our old fd (since now it's likely > to be the lowest available number), then read() would happen on a completely > wrong file. And yes, it does happen in real world. 
The best workaround I've > come up with is a small pause between setting mystructure.fd = -1 and > calling close(). > > The point of proposal is to do a close() without freeing the file > descriptor. Which can be accomplished by calling dup2(2) to replace the file descriptor with another file descriptor, perhaps one to /dev/null. It would be worth carefully reviewing the implementation of dup2(2) to make sure that the close->replace there is atomic with respect to other threads simultaneously allocating file descriptors, such as with pipe(2). This won't cancel existing I/Os, but per discussion, I/O cancelation is a very complicated issue. Robert N M Watson Computer Laboratory University of Cambridge From owner-freebsd-arch@FreeBSD.ORG Tue Jul 8 06:09:48 2008 Return-Path: Delivered-To: arch@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id C01B11065676; Tue, 8 Jul 2008 06:09:48 +0000 (UTC) (envelope-from phk@critter.freebsd.dk) Received: from phk.freebsd.dk (phk.freebsd.dk [130.225.244.222]) by mx1.freebsd.org (Postfix) with ESMTP id 83D438FC1E; Tue, 8 Jul 2008 06:09:48 +0000 (UTC) (envelope-from phk@critter.freebsd.dk) Received: from critter.freebsd.dk (unknown [192.168.64.3]) by phk.freebsd.dk (Postfix) with ESMTP id 12EF3170E4; Tue, 8 Jul 2008 06:09:47 +0000 (UTC) Received: from critter.freebsd.dk (localhost [127.0.0.1]) by critter.freebsd.dk (8.14.2/8.14.2) with ESMTP id m6869kFw000702; Tue, 8 Jul 2008 06:09:46 GMT (envelope-from phk@critter.freebsd.dk) To: Sergey Babkin From: "Poul-Henning Kamp" In-Reply-To: Your message of "Mon, 07 Jul 2008 16:56:33 EST." 
<29793635.342951215467793639.JavaMail.root@vms126.mailsrvcs.net> Date: Tue, 08 Jul 2008 06:09:46 +0000 Message-ID: <701.1215497386@critter.freebsd.dk> Sender: phk@critter.freebsd.dk Cc: arch@FreeBSD.org, Robert Watson Subject: Re: Proposal: a revoke() system call X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 08 Jul 2008 06:09:48 -0000 In message <29793635.342951215467793639.JavaMail.root@vms126.mailsrvcs.net>, Sergey Babkin writes: >>If the only reason we are discussing this, is that people find the >>magic pipe to select ugly, then I would even argue that we have not >>reached critical mass for even thinking. > >Well, there are a couple of problems with magical pipes: > >1. It means using 3 times as many file descriptors. (One for the original >socket, and 2 for the ends of the pipe). > >2. When working with the 3rd-party libraries, it requires a substantial rework of these >libraries. Getting a file descriptor from inside the library's implementation >and forcing it to close is a lot less invasive and can be done with a simple >API wrapper. What you're proposing to do in the kernel isn't any less complicated :-) -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 phk@FreeBSD.ORG | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence.
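For readers unfamiliar with the "magic pipe" pattern debated in the message
above, a minimal sketch follows. It is an illustration only, not code from
any of the posters: wait_readable_or_cancel() is a hypothetical name, and
poll(2) is used here where the thread mentions select(2). The reader waits
on both the data descriptor and the read end of a pipe; another thread
cancels the wait by writing a byte to the pipe's write end.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/*
 * Hypothetical helper, for illustration only.
 * Blocks until datafd has an event (readable, EOF or error) or until
 * cancelfd, the read end of the "magic pipe", has one.
 * Returns 1 if the caller should go on to read(2) from datafd,
 * 0 if the wait was cancelled, -1 on poll(2) failure.
 */
int
wait_readable_or_cancel(int datafd, int cancelfd)
{
	struct pollfd pfd[2];

	pfd[0].fd = datafd;
	pfd[0].events = POLLIN;
	pfd[1].fd = cancelfd;
	pfd[1].events = POLLIN;
	for (;;) {
		pfd[0].revents = pfd[1].revents = 0;
		if (poll(pfd, 2, -1) < 0) {
			if (errno == EINTR)
				continue;	/* interrupted by a signal; retry */
			return (-1);
		}
		if (pfd[1].revents != 0)
			return (0);	/* cancellation takes priority */
		if (pfd[0].revents != 0)
			return (1);
	}
}
```

The cost Sergey points out is visible here: every cancellable descriptor
needs two extra descriptors for the pipe, and the reader loop has to be
written around the helper, which is hard to retrofit into a third-party
library.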
From owner-freebsd-arch@FreeBSD.ORG Tue Jul 8 11:16:52 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 7DA921065676; Tue, 8 Jul 2008 11:16:52 +0000 (UTC) (envelope-from des@des.no) Received: from tim.des.no (tim.des.no [194.63.250.121]) by mx1.freebsd.org (Postfix) with ESMTP id 3E03B8FC1D; Tue, 8 Jul 2008 11:16:52 +0000 (UTC) (envelope-from des@des.no) Received: from ds4.des.no (des.no [84.49.246.2]) by smtp.des.no (Postfix) with ESMTP id AF6552086; Tue, 8 Jul 2008 12:57:20 +0200 (CEST) From: =?utf-8?Q?Dag-Erling_Sm=C3=B8rgrav?= To: Sergey Babkin References: <7100389.65001215443113960.JavaMail.root@vms074.mailsrvcs.net> Date: Tue, 08 Jul 2008 12:57:20 +0200 In-Reply-To: <7100389.65001215443113960.JavaMail.root@vms074.mailsrvcs.net> (Sergey Babkin's message of "Mon\, 07 Jul 2008 10\:05\:13 -0500 \(CDT\)") Message-ID: <86ej647ien.fsf@ds4.des.no> User-Agent: Gnus/5.110006 (No Gnus v0.6) Emacs/23.0.60 (berkeley-unix) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Cc: arch@freebsd.org, Robert Watson Subject: Re: Proposal: a revoke() system call X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 08 Jul 2008 11:16:52 -0000 Sergey Babkin writes: > Robert Watson writes: > > Seems like that conflicts with our existing revoke(2) system call. > Aha, I guess when I've checked, I've looked at a real old version of > FreeBSD. "real old", as in "three years before FreeBSD even existed"? revoke(2) was introduced in 4.3BSD Reno in 1990. BTW, could you please switch to a MUA that correctly inserts In-Reply-To: and / or References: headers? 
DES -- Dag-Erling Smørgrav - des@des.no From owner-freebsd-arch@FreeBSD.ORG Tue Jul 8 12:28:06 2008 Return-Path: Delivered-To: arch@FreeBSD.ORG Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A16E41065684; Tue, 8 Jul 2008 12:28:06 +0000 (UTC) (envelope-from babkin@verizon.net) Received: from vms046pub.verizon.net (vms046pub.verizon.net [206.46.252.46]) by mx1.freebsd.org (Postfix) with ESMTP id 805648FC15; Tue, 8 Jul 2008 12:28:06 +0000 (UTC) (envelope-from babkin@verizon.net) Received: from verizon.net ([63.24.211.81]) by vms046.mailsrvcs.net (Sun Java System Messaging Server 6.2-6.01 (built Apr 3 2006)) with ESMTPA id <0K3O00E7UTAJEDW1@vms046.mailsrvcs.net>; Tue, 08 Jul 2008 07:27:57 -0500 (CDT) Date: Tue, 08 Jul 2008 08:32:18 -0400 From: Sergey Babkin Sender: root To: Robert Watson Message-id: <48735E52.65BE464B@verizon.net> MIME-version: 1.0 X-Mailer: Mozilla 4.7 [en] (X11; U; FreeBSD 4.7-RELEASE i386) Content-type: text/plain; charset=koi8-r Content-transfer-encoding: 7bit X-Accept-Language: en, ru References: <9484951.340521215467447990.JavaMail.root@vms126.mailsrvcs.net> <20080708001929.E63144@fledge.watson.org> Cc: arch@FreeBSD.ORG, David Schultz , Poul-Henning Kamp Subject: Re: Proposal: a revoke() system call X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 08 Jul 2008 12:28:06 -0000 Robert Watson wrote: > > On Mon, 7 Jul 2008, Sergey Babkin wrote: > > > This leaves a small race window between fd is checked and read() is > > executed.
If in the meantime another thread does close() (and sets > > mystructure.fd to -1), and the third thread does open() then the result of > > this open would use the same fd number as our old fd (since now it's likely > > to be the lowest available number), then read() would happen on a completely > > wrong file. And yes, it does happen in real world. The best workaround I've > > come up with is a small pause between setting mystructure.fd = -1 and > > calling close(). > > > > The point of proposal is to do a close() without freeing the file > > descriptor. > > Which can be accomplished by calling dup2(2) to replace the file descriptor > with another file descriptor, perhaps one to /dev/null. It would be worth Yes, dup2() is certainly a better idea than a separate call. I've just assumed that David is following the discussion so far :-) -SB From owner-freebsd-arch@FreeBSD.ORG Tue Jul 8 14:16:21 2008 Return-Path: Delivered-To: arch@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 6A6F9106566C; Tue, 8 Jul 2008 14:16:21 +0000 (UTC) (envelope-from ed@hoeg.nl) Received: from palm.hoeg.nl (mx0.hoeg.nl [IPv6:2001:610:652::211]) by mx1.freebsd.org (Postfix) with ESMTP id 2B9308FC14; Tue, 8 Jul 2008 14:16:21 +0000 (UTC) (envelope-from ed@hoeg.nl) Received: by palm.hoeg.nl (Postfix, from userid 1000) id 3AEB41CE27; Tue, 8 Jul 2008 16:16:20 +0200 (CEST) Date: Tue, 8 Jul 2008 16:16:20 +0200 From: Ed Schouten To: FreeBSD Current , FreeBSD Arch Message-ID: <20080708141620.GG14567@hoeg.nl> References: <20080702190901.GS14567@hoeg.nl> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="wQkw7DhpL9hyPo7K" Content-Disposition: inline In-Reply-To: <20080702190901.GS14567@hoeg.nl> User-Agent: Mutt/1.5.18 (2008-05-17) Cc: Philip Paeps Subject: MPSAFE TTY schedule - update X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: 
list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 08 Jul 2008 14:16:21 -0000 --wQkw7DhpL9hyPo7K Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable

Hello everyone,

First of all, I am really impressed by the number of people that have
shown interest in helping me drive this project forward:

- kris@ and some other people have been testing the patches. So far they
  found some small bugs, which should all be fixed now. Thanks!

- marcel@ and nyan@ have already been working on uart(4). Last time I
  heard, they may have already gotten it working on certain pieces of
  PC98 hardware.

- kan@ has committed a patch in the mpsafetty P4 branch to make dcons(4)
  work again. Thank you!

I think I'll continue using this schedule, with some very small changes,
based on previous discussions:

* Ed Schouten wrote:
> - Not all drivers have been ported to the new TTY layer yet. These
> drivers still need to be ported: sio(4), cy(4), digi(4), ubser(4),
> uftdi(4), nmdm(4), ng_h4(4), ng_tty(4), snp(4), rp(4), rc(4), si(4),
> umodem(4), dcons(4).

This should now read:

- Not all drivers have been ported to the new TTY layer yet. These
  drivers still need to be ported: sio(4), cy(4), digi(4), ubser(4),
  nmdm(4), ng_h4(4), ng_tty(4), snp(4), rp(4), rc(4), si(4).

If time permits, I'll fix nmdm(4). I've also received some messages about
si(4) and digi(4), so I'll contact those people to see what we can do
here.

Someone emailed me about ng_h4(4), ng_tty(4) and snp(4). There are no
short term plans to make these drivers work. I am going to implement a
new hooks interface into the TTY layer after we've integrated this
patchset. I don't want to bring in too much code in a single run.

> July 13 2008:
> Make uart(4) the default serial port driver, instead of sio(4).
> sio(4) has not been ported to the new TTY layer and is very hard
> to do so. uart(4) has been proven to be more portable than
> sio(4) and already supports the hardware we need.

It looks like we can do this on i386 and amd64. pc98 still needs some
polishing, but I've got full confidence this will be sorted out in time.
I'll closely track marcel@'s work the next couple of days to make sure we
can do this without breaking too much.

> August 3 2008:
> Disconnect drivers from the build that haven't been patched in
> the MPSAFE TTY branch.

I won't disconnect drivers here which are in the process of being ported.
If foo@ sends me an email to say he's working on rc(4) for example, I will
leave that driver alone.

Again, I should bug people with:

> Please, make sure we can make this a smooth transition by
> testing/reviewing my code. I tend to generate diffs very often. They can
> be downloaded here:
>
> http://www.il.fontys.nl/~ed/projects/mpsafetty/patches/

Thanks!

--
Ed Schouten WWW: http://80386.nl/
--wQkw7DhpL9hyPo7K Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (FreeBSD) iEYEARECAAYFAkhzdrQACgkQ52SDGA2eCwWOnACeLI6jHIyYBGaOOKIYXZIPPEXD W+MAn1BFYUrTZTmYjphD4gL3fnnixZwq =Qe1e -----END PGP SIGNATURE----- --wQkw7DhpL9hyPo7K-- From owner-freebsd-arch@FreeBSD.ORG Tue Jul 8 14:45:50 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 3521F106566B; Tue, 8 Jul 2008 14:45:50 +0000 (UTC) (envelope-from babkin@verizon.net) Received: from vms173005pub.verizon.net (vms173005pub.verizon.net [206.46.173.5]) by mx1.freebsd.org (Postfix) with ESMTP id 1324F8FC1C; Tue, 8 Jul 2008 14:45:50 +0000 (UTC) (envelope-from babkin@verizon.net) Received: from vms075.mailsrvcs.net ([172.18.12.131]) by vms173005.mailsrvcs.net (Sun Java System Messaging Server 6.2-6.01 (built Apr 3 2006)) with ESMTPA id
<0K3O0004YZC2QFA3@vms173005.mailsrvcs.net>; Tue, 08 Jul 2008 09:38:26 -0500 (CDT) Received: from 65.242.108.162 ([65.242.108.162]) by vms075.mailsrvcs.net (Verizon Webmail) with HTTP; Tue, 08 Jul 2008 09:45:23 -0500 (CDT) Date: Tue, 08 Jul 2008 09:45:23 -0500 (CDT) From: Sergey Babkin X-Originating-IP: [65.242.108.162] To: Sergey Babkin , =?ISO8859-1?Q?Dag-Erling_Sm=F8rgrav?= Message-id: <14092723.330161215528323681.JavaMail.root@vms075.mailsrvcs.net> MIME-version: 1.0 Content-type: text/plain; charset=UTF-8 Content-transfer-encoding: 7bit Cc: arch@freebsd.org, Robert Watson Subject: Re: Re: Proposal: a revoke() system call X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 08 Jul 2008 14:45:50 -0000 >From: =?ISO8859-1?Q?Dag-Erling_Sm=F8rgrav?= >Date: 2008/07/08 Tue AM 06:57:20 EDT >To: Sergey Babkin >Cc: Robert Watson , arch@freebsd.org >Subject: Re: Proposal: a revoke() system call >Sergey Babkin writes: >> Robert Watson writes: >> > Seems like that conflicts with our existing revoke(2) system call. >> Aha, I guess when I've checked, I've looked at a real old version of >> FreeBSD. > >"real old", as in "three years before FreeBSD even existed"? revoke(2) >was introduced in 4.3BSD Reno in 1990. I'm looking real stupid right now :-) Maybe I've misspelled it when looked first time. >BTW, could you please switch to a MUA that correctly inserts In-Reply-To: >and / or References: headers? That's my provider's web interface. 
-SB From owner-freebsd-arch@FreeBSD.ORG Tue Jul 8 15:26:22 2008 Return-Path: Delivered-To: arch@FreeBSD.ORG Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id C231A106564A; Tue, 8 Jul 2008 15:26:20 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from cyrus.watson.org (cyrus.watson.org [209.31.154.42]) by mx1.freebsd.org (Postfix) with ESMTP id 4A3858FC0C; Tue, 8 Jul 2008 15:26:20 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from fledge.watson.org (fledge.watson.org [209.31.154.41]) by cyrus.watson.org (Postfix) with ESMTP id 636D346BA5; Tue, 8 Jul 2008 11:26:19 -0400 (EDT) Date: Tue, 8 Jul 2008 16:26:19 +0100 (BST) From: Robert Watson X-X-Sender: robert@fledge.watson.org To: Sergey Babkin In-Reply-To: <20080708001929.E63144@fledge.watson.org> Message-ID: <20080708161802.N89342@fledge.watson.org> References: <9484951.340521215467447990.JavaMail.root@vms126.mailsrvcs.net> <20080708001929.E63144@fledge.watson.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: arch@FreeBSD.ORG, David Schultz , Poul-Henning Kamp Subject: Re: Re: Re: Proposal: a revoke() system call X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 08 Jul 2008 15:26:23 -0000 On Tue, 8 Jul 2008, Robert Watson wrote: > Which can be accomplished by calling dup2(2) to replace the file descriptor > with another file descriptor, perhaps one to /dev/null. It would be worth > carefully reviewing the implementation of dup2(2) to make sure that the > close->replace there is atomic with respect to other threads simultaneously > allocating file descriptors, such as with pipe(2). 
BTW, on a similar note to the above: I've noticed there are several spots
of relative non-atomicity in the Linux emulation code, where rather than
just wrapping existing system calls with binary conversion of arguments
and return values, we do a semantic wrapping that is necessarily
non-atomic with respect to the native code. For example, consider the
Linuxulator open code in linux_common_open():

	error = kern_openat(td, dirfd, path, UIO_SYSSPACE, bsd_flags, mode);

	if (!error) {
		fd = td->td_retval[0];
		/*
		 * XXX In between kern_open() and fget(), another process
		 * having the same filedesc could use that fd without
		 * checking below.
		 */
		error = fget(td, fd, &fp);
		if (!error) {
			sx_slock(&proctree_lock);
			PROC_LOCK(p);
			if (!(bsd_flags & O_NOCTTY) &&
			    SESS_LEADER(p) && !(p->p_flag & P_CONTROLT)) {
				PROC_UNLOCK(p);
				sx_unlock(&proctree_lock);
				if (fp->f_type == DTYPE_VNODE)
					(void) fo_ioctl(fp, TIOCSCTTY, (caddr_t) 0,
					    td->td_ucred, td);
			} else {
				PROC_UNLOCK(p);
				sx_sunlock(&proctree_lock);
			}
			if (l_flags & LINUX_O_DIRECTORY) {
				if (fp->f_type != DTYPE_VNODE ||
				    fp->f_vnode->v_type != VDIR) {
					error = ENOTDIR;
				}
			}
			fdrop(fp, td);
			/*
			 * XXX as above, fdrop()/kern_close() pair is racy.
			 */
			if (error)
				kern_close(td, fd);
		}
	}

I think that comment is mine, or at least, got there because of a comment
I made to Roman or the like. The fd has not yet been explicitly returned
to userspace, since the open system call hasn't actually returned, but
other threads could use the file descriptor in a system call that could
lead to unexpected races. For example, if you dup2() on top of the file
descriptor between the return of kern_openat() and the invocation of
fget(), fo_ioctl() might be called on the wrong file, or the kern_close()
in the error case might get invoked on the "wrong" file descriptor.
In these cases, the races are mostly harmless since they involve incorrectly using a file descriptor from a second thread -- since it hasn't been returned yet, it isn't valid yet and the results will be undefined. However, there may well be cases where similar races exist that do affect the semantics of multi-threaded Linux applications, such as having a main event (open()) and an associated event (fo_ioctl()) be non-atomic and allowing a race between them that does have a semantically problematic result. These sorts of edge cases, btw, are one reason why I would *strongly* discourage application writers from doing things like calling close(2) on a file descriptor while still using it from another thread. :-) Robert N M Watson Computer Laboratory University of Cambridge From owner-freebsd-arch@FreeBSD.ORG Tue Jul 8 15:36:33 2008 Return-Path: Delivered-To: arch@FreeBSD.ORG Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A4CA3106567E; Tue, 8 Jul 2008 15:36:33 +0000 (UTC) (envelope-from ed@hoeg.nl) Received: from palm.hoeg.nl (mx0.hoeg.nl [IPv6:2001:610:652::211]) by mx1.freebsd.org (Postfix) with ESMTP id 64F338FC0C; Tue, 8 Jul 2008 15:36:33 +0000 (UTC) (envelope-from ed@hoeg.nl) Received: by palm.hoeg.nl (Postfix, from userid 1000) id BAAFE1CE27; Tue, 8 Jul 2008 17:36:32 +0200 (CEST) Date: Tue, 8 Jul 2008 17:36:32 +0200 From: Ed Schouten To: Robert Watson Message-ID: <20080708153632.GI14567@hoeg.nl> References: <9484951.340521215467447990.JavaMail.root@vms126.mailsrvcs.net> <20080708001929.E63144@fledge.watson.org> <20080708161802.N89342@fledge.watson.org> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="wwX5Nmi7feudBrEr" Content-Disposition: inline In-Reply-To: <20080708161802.N89342@fledge.watson.org> User-Agent: Mutt/1.5.18 (2008-05-17) Cc: arch@FreeBSD.ORG Subject: Re: Proposal: a revoke() system call X-BeenThere: 
freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 08 Jul 2008 15:36:33 -0000

--wwX5Nmi7feudBrEr
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

Hello Robert,

* Robert Watson wrote:
> BTW, on a similar note to the above: I've noticed there are several spots
> of relative non-atomicity in the Linux emulation code, where rather than
> just wrapping existing system calls with binary conversion of arguments
> and return values, we do a semantic wrapping that is necessarily
> non-atomic with respect to the native code. For example, consider the
> Linuxulator open code in linux_common_open():

I also noticed similar constructs inside the stat() calls, to translate device major/minor numbers. As you can see, some stat() routines call translate_path_major_minor_at() after performing the regular stat() operation. The translate_path_major_minor_at() is implemented by calling kern_openat(). This has three disadvantages:

- It is non-atomic.

- It can only perform the translation on nodes it has O_RDONLY access to. This shouldn't be a big problem, but may cause inconsistencies when users look around in devfs.

- The translation may not always work when the calling process is out of file descriptors.

Yours, --=20 Ed Schouten WWW: http://80386.nl/ --wwX5Nmi7feudBrEr Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (FreeBSD) iEYEARECAAYFAkhziYAACgkQ52SDGA2eCwUufgCdFABNXxHgwgRGa466AlktkxDv 7AgAnjCTLlEcDeG+6WL7bruGhIwE7on7 =meaN -----END PGP SIGNATURE----- --wwX5Nmi7feudBrEr-- From owner-freebsd-arch@FreeBSD.ORG Tue Jul 8 16:46:16 2008 Return-Path: Delivered-To: arch@FreeBSD.ORG Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id C9CB6106564A for ; Tue, 8 Jul 2008 16:46:16 +0000 (UTC) (envelope-from das@FreeBSD.ORG) Received: from zim.MIT.EDU (ZIM.MIT.EDU [18.95.3.101]) by mx1.freebsd.org (Postfix) with ESMTP id 881FB8FC1E for ; Tue, 8 Jul 2008 16:46:16 +0000 (UTC) (envelope-from das@FreeBSD.ORG) Received: from zim.MIT.EDU (localhost [127.0.0.1]) by zim.MIT.EDU (8.14.2/8.14.2) with ESMTP id m68Gmswu040786; Tue, 8 Jul 2008 12:48:54 -0400 (EDT) (envelope-from das@FreeBSD.ORG) Received: (from das@localhost) by zim.MIT.EDU (8.14.2/8.14.2/Submit) id m68Gmr08040785; Tue, 8 Jul 2008 12:48:53 -0400 (EDT) (envelope-from das@FreeBSD.ORG) Date: Tue, 8 Jul 2008 12:48:53 -0400 From: David Schultz To: Robert Watson Message-ID: <20080708164853.GA40704@zim.MIT.EDU> Mail-Followup-To: Robert Watson , Sergey Babkin , arch@FreeBSD.ORG, Poul-Henning Kamp References: <9484951.340521215467447990.JavaMail.root@vms126.mailsrvcs.net> <20080708001929.E63144@fledge.watson.org> <20080708161802.N89342@fledge.watson.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20080708161802.N89342@fledge.watson.org> Cc: arch@FreeBSD.ORG, Poul-Henning Kamp , Sergey Babkin Subject: Re: Re: Re: Proposal: a revoke() system call X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: 
List-Subscribe: , X-List-Received-Date: Tue, 08 Jul 2008 16:46:16 -0000

On Tue, Jul 08, 2008, Robert Watson wrote:
> These sorts of edge cases, btw, are one reason why I would *strongly*
> discourage application writers from doing things like calling close(2) on a
> file descriptor while still using it from another thread. :-)

My reaction is that apps should use standard concurrency control primitives, e.g., pthreads primitives or message queues, to coordinate the activities of multiple threads. There are scads of ways to introduce race conditions when updating various aspects of the process state (the fd table, in this case). Once we start adding special-purpose APIs to facilitate clever lock-free tricks in very specific cases, when will it stop? Next we'll want a special version of exit(), a special version of sigaction(), a special version of free(), and so forth.

That said, POSIX does require open() and close() to be atomic, so the Linux emulation layer should be fixed in that regard:

    2.9.7 Thread Interactions with Regular File Operations

    All of the functions chmod(), close(), fchmod(), fcntl(), fstat(),
    ftruncate(), lseek(), open(), read(), readlink(), stat(), symlink(),
    and write() shall be atomic with respect to each other in the effects
    specified in IEEE Std 1003.1-2001 when they operate on regular files.
    If two threads each call one of these functions, each call shall
    either see all of the specified effects of the other call, or none
    of them.

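As an illustration of the "standard concurrency control primitives" point above, here is a minimal sketch (not code from this thread) of a descriptor guarded by a pthread mutex, so that one thread can never close it while another is in the middle of using it. The names guarded_fd, guarded_init(), guarded_write() and guarded_close() are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <unistd.h>

/* Hypothetical wrapper: a descriptor plus the lock that guards it. */
struct guarded_fd {
    pthread_mutex_t lock;
    int fd;                 /* set to -1 once the descriptor is closed */
};

static void guarded_init(struct guarded_fd *g, int fd)
{
    pthread_mutex_init(&g->lock, NULL);
    g->fd = fd;
}

/* Write under the lock; fails with EBADF if the descriptor is closed. */
static ssize_t guarded_write(struct guarded_fd *g, const void *buf, size_t len)
{
    ssize_t n;
    int saved_errno;

    pthread_mutex_lock(&g->lock);
    if (g->fd < 0) {
        n = -1;
        saved_errno = EBADF;
    } else {
        n = write(g->fd, buf, len);
        saved_errno = errno;
    }
    pthread_mutex_unlock(&g->lock);
    errno = saved_errno;    /* report the error seen under the lock */
    return n;
}

/* Close under the lock; a second close is a harmless no-op. */
static int guarded_close(struct guarded_fd *g)
{
    int rc = 0;

    pthread_mutex_lock(&g->lock);
    if (g->fd >= 0) {
        rc = close(g->fd);
        g->fd = -1;
    }
    pthread_mutex_unlock(&g->lock);
    return rc;
}
```

Holding the mutex across a write() that may block is a simplification; a real application would more likely pair a reference count with a condition variable, or hand the close request to the owning thread over a pipe or message queue.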
From owner-freebsd-arch@FreeBSD.ORG Tue Jul 8 16:54:51 2008 Return-Path: Delivered-To: arch@FreeBSD.ORG Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E16671065689 for ; Tue, 8 Jul 2008 16:54:51 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from cyrus.watson.org (cyrus.watson.org [209.31.154.42]) by mx1.freebsd.org (Postfix) with ESMTP id B9A118FC2D for ; Tue, 8 Jul 2008 16:54:51 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from fledge.watson.org (fledge.watson.org [209.31.154.41]) by cyrus.watson.org (Postfix) with ESMTP id 55A8C46BB5; Tue, 8 Jul 2008 12:54:51 -0400 (EDT) Date: Tue, 8 Jul 2008 17:54:51 +0100 (BST) From: Robert Watson X-X-Sender: robert@fledge.watson.org To: Ed Schouten In-Reply-To: <20080708153632.GI14567@hoeg.nl> Message-ID: <20080708174957.M41405@fledge.watson.org> References: <9484951.340521215467447990.JavaMail.root@vms126.mailsrvcs.net> <20080708001929.E63144@fledge.watson.org> <20080708161802.N89342@fledge.watson.org> <20080708153632.GI14567@hoeg.nl> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: arch@FreeBSD.ORG Subject: Re: Proposal: a revoke() system call X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 08 Jul 2008 16:54:52 -0000 On Tue, 8 Jul 2008, Ed Schouten wrote: > I also noticed similar constructs inside the stat() calls, to translate > device major/minor numbers. As you can see, some stat() routines call > translate_path_major_minor_at() after performing the regular stat() > operation. The translate_path_major_minor_at() is implemented by calling > kern_openat(). This has three disadvantages: > > - It is non-atomic. > > - It can only perform the translation on nodes it has O_RDONLY access > to. 
This shouldn't be a big problem, but may cause inconsistencies > when users look around in devfs. > > - The translation may not always work when the calling process is out of > file descriptors. - Opening a device node can have side effects, such as rewinding tapes, raising DTR on serial lines, triggering errors, or denying access to other consumers due to exclusive access requirements. Robert N M Watson Computer Laboratory University of Cambridge From owner-freebsd-arch@FreeBSD.ORG Wed Jul 9 14:21:26 2008 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id AEE7D106567D for ; Wed, 9 Jul 2008 14:21:26 +0000 (UTC) (envelope-from stefan.lambrev@moneybookers.com) Received: from blah.sun-fish.com (blah.sun-fish.com [217.18.249.150]) by mx1.freebsd.org (Postfix) with ESMTP id 681508FC13 for ; Wed, 9 Jul 2008 14:21:26 +0000 (UTC) (envelope-from stefan.lambrev@moneybookers.com) Received: by blah.sun-fish.com (Postfix, from userid 1002) id B8D451B10E4E; Wed, 9 Jul 2008 16:06:03 +0200 (CEST) X-Spam-Checker-Version: SpamAssassin 3.2.4 (2008-01-01) on malcho.cmotd.com X-Spam-Level: X-Spam-Status: No, score=-10.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, J_CHICKENPOX_21 autolearn=no version=3.2.4 Received: from hater.haters.org (hater.cmotd.com [192.168.3.125]) by blah.sun-fish.com (Postfix) with ESMTP id CEC531B10CAA for ; Wed, 9 Jul 2008 16:05:56 +0200 (CEST) Message-ID: <4874C5C4.6080605@moneybookers.com> Date: Wed, 09 Jul 2008 17:05:56 +0300 From: Stefan Lambrev User-Agent: Thunderbird 2.0.0.14 (X11/20080616) MIME-Version: 1.0 To: freebsd-arch@freebsd.org Content-Type: text/plain; charset=windows-1251; format=flowed Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV version 0.93, clamav-milter version 0.93 on blah.cmotd.com X-Virus-Status: Clean Subject: Socket not ready problem. 
X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 09 Jul 2008 14:21:26 -0000

Greetings,

I have a few apps installed from ports (clamav-milter, spamassassin-milter, etc.) which should chown/chmod their socket to allow apps running as a different user to connect to them. The problem is that I see a race here. On an SMP machine, very often the app finishes its execution and chown/chmod is called next, but the socket is not yet opened/created at this point. This is very annoying because I have to change my rc.d scripts by hand every time I update/upgrade. And sometimes I forget ...

Can you consider the following patch (or something similar, if you prefer) for inclusion? It would allow an rc.d shell script to call a function that waits for the socket and returns once it is ready, so that chown/chmod can then run. The function should be called with 2 parameters - socket path and timeout (and we can manage them from rc.conf).

--- /usr/src/etc/rc.subr	2008-05-20 11:00:14.000000000 +0200
+++ /etc/rc.subr	2008-05-26 17:59:08.000000000 +0200
@@ -1569,4 +1569,22 @@
 fi

+wait_for_socket()
+{
+	_socketpath=$1
+	_timeout=$2
+	if [ -z "${_socketpath}" -o -z "${_timeout}" ]; then
+		err 3 'USAGE: wait_for_socket socketpath timeout'
+	fi
+
+	while [ ${_timeout} -gt 0 ]
+	do
+		[ -S "${_socketpath}" ] && break
+		echo -n "."
+		sleep 1
+		_timeout=$((${_timeout}-1))
+	done
+	echo
+}
+
 _rc_subr_loaded=:

--
Best Wishes,
Stefan Lambrev
ICQ# 24134177

From owner-freebsd-arch@FreeBSD.ORG Thu Jul 10 02:25:13 2008 Return-Path: Delivered-To: freebsd-arch@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id C0551106566C; Thu, 10 Jul 2008 02:25:13 +0000 (UTC) (envelope-from jhb@FreeBSD.org) Received: from server.baldwin.cx (bigknife-pt.tunnel.tserv9.chi1.ipv6.he.net [IPv6:2001:470:1f10:75::2]) by mx1.freebsd.org (Postfix) with ESMTP id 317638FC1D; Thu, 10 Jul 2008 02:25:13 +0000 (UTC) (envelope-from jhb@FreeBSD.org) Received: from localhost.corp.yahoo.com (john@localhost [IPv6:::1]) (authenticated bits=0) by server.baldwin.cx (8.14.2/8.14.2) with ESMTP id m6A2OvAu028341; Wed, 9 Jul 2008 22:25:04 -0400 (EDT) (envelope-from jhb@FreeBSD.org) From: John Baldwin To: Sergey Babkin Date: Wed, 9 Jul 2008 20:54:48 -0400 User-Agent: KMail/1.9.7 References: <9484951.340521215467447990.JavaMail.root@vms126.mailsrvcs.net> <20080708161802.N89342@fledge.watson.org> <20080708164853.GA40704@zim.MIT.EDU> In-Reply-To: <20080708164853.GA40704@zim.MIT.EDU> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <200807092054.48748.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH authentication, not delayed by milter-greylist-2.0.2 (server.baldwin.cx [IPv6:::1]); Wed, 09 Jul 2008 22:25:05 -0400 (EDT) X-Virus-Scanned: ClamAV 0.93.1/7680/Wed Jul 9 19:31:16 2008 on server.baldwin.cx X-Virus-Status: Clean X-Spam-Status: No, score=-2.5 required=4.2 tests=AWL,BAYES_00,NO_RELAYS autolearn=ham version=3.1.3 X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on server.baldwin.cx Cc: David Schultz , Robert Watson , Poul-Henning Kamp , freebsd-arch@FreeBSD.org Subject: Re: Proposal: a revoke() system call X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version:
2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 10 Jul 2008 02:25:13 -0000

On Tuesday 08 July 2008 12:48:53 pm David Schultz wrote:
> On Tue, Jul 08, 2008, Robert Watson wrote:
> > These sorts of edge cases, btw, are one reason why I would *strongly*
> > discourage application writers from doing things like calling close(2) on a
> > file descriptor while still using it from another thread. :-)
>
> My reaction is that apps should use standard concurrency control
> primitives, e.g., pthreads primitives or message queues, to
> coordinate the activities of multiple threads. There are scads of
> ways to introduce race conditions when updating various aspects of
> the process state (the fd table, in this case). Once we start
> adding special-purpose APIs to facilitate clever lock-free tricks
> in very specific cases, when will it stop? Next we'll want a
> special version of exit(), a special version of sigaction(), a
> special version of free(), and so forth.

I agree, this just sounds like an application bug. Plus, even if we add a new system call that rescues drowning file descriptors, it won't really help with writing a portable application anyway unless you get other OSes to adopt a similar API. Just use the extra pipe for messages and/or real locking (in your original example you have an obvious race with the use of 'mystructure' and the solution is Don't Do That(tm)).

-- John Baldwin From owner-freebsd-arch@FreeBSD.ORG Thu Jul 10 06:22:02 2008 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 18254106568E for ; Thu, 10 Jul 2008 06:22:02 +0000 (UTC) (envelope-from sson@freebsd.org) Received: from www.son.org (son.org [199.239.233.23]) by mx1.freebsd.org (Postfix) with ESMTP id C08408FC1B for ; Thu, 10 Jul 2008 06:22:01 +0000 (UTC) (envelope-from sson@freebsd.org) Received: from mactel.local (ppp-68-90-8-92.dsl.rcsntx.swbell.net [68.90.8.92]) (authenticated bits=0) by www.son.org (8.13.6.20060614/8.13.6) with ESMTP id m6A5xLCt052832 for ; Thu, 10 Jul 2008 00:59:21 -0500 (CDT) Message-ID: <4875A5D2.8030902@freebsd.org> Date: Thu, 10 Jul 2008 01:01:54 -0500 From: Stacey Son User-Agent: Thunderbird/3.0a2pre (Macintosh; 2008070503) MIME-Version: 1.0 To: freebsd-arch@freebsd.org Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Subject: ksyms pseudo driver X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 10 Jul 2008 06:22:02 -0000 Hi, I have created a ksyms pseudo driver for FreeBSD. Included below is the man page. The diff's to kernel source, the main source files, etc. can be found at: http://people.FreeBSD.org/~sson/ksyms/ The reason I created this driver is for dtrace and the port of the opensolaris lockstat(1M) command to FreeBSD. The ksyms driver allows a process to get a quick snapshot of the kernel symbol table including the symbols from any loaded modules. Unlike most other implementations, this ksyms driver maps memory in the process space to store the snapshot at the time /dev/ksyms is opened. 
It also checks whether the process already has a snapshot open and won't allow it to open /dev/ksyms again until it closes (and unmaps) its existing snapshot. Of course, this requires the read() handler to bounce the buffer into the kernel first before it is written back out to userspace. (Maybe there is a simple way to do a userspace-to-userspace copy instead?) The reason I went to all this trouble is to keep /dev/ksyms from turning into an easy way to exhaust all the kernel memory (unintentionally or intentionally).

Let me know if you have any questions, comments, suggestions, and/or reasons why something like this should never be included in FreeBSD.

Best Regards,

-stacey.

-----------------------------------------------------------------------------------

KSYMS(4)               FreeBSD Kernel Interfaces Manual               KSYMS(4)

NAME
     ksyms -- kernel symbol table interface

SYNOPSIS
     device ksyms

DESCRIPTION
     The /dev/ksyms character device provides a read-only interface to a
     snapshot of the kernel symbol table. The in-kernel symbol manager is
     designed to be able to handle many types of symbol tables; however,
     only elf(5) symbol tables are supported by this device. The ELF format
     image contains two sections: a symbol table and a corresponding string
     table.

     Symbol Table
          The SYMTAB section contains the symbol table entries present in
          the current running kernel, including the symbol table entries of
          any loaded modules. The symbols are ordered by the kernel module
          load time starting with kernel file symbols first, followed by
          the first loaded module's symbols and so on.

     String Table
          The STRTAB section contains the symbol name strings from the
          kernel and any loaded modules that the symbol table entries
          reference.

     ELF-formatted symbol table data read from the /dev/ksyms file
     represents the state of the kernel at the time when the device is
     opened. Since /dev/ksyms has no text or data, most of the fields are
     initialized to NULL. The ksyms driver does not block the loading or
     unloading of modules into the kernel while the /dev/ksyms file is
     open, so the snapshot may contain stale data.

IOCTLS
     The ioctl(2) command codes below are defined in . The (third) argument
     to the ioctl(2) should be a pointer to the type indicated.

     KIOCGSIZE (size_t)
          Returns the total size of the current symbol table. This can be
          used when allocating a buffer to make a copy of the kernel symbol
          table.

     KIOCGADDR (void *)
          Returns the address of the kernel symbol table mapped in the
          process memory.

FILES
     /dev/ksyms

ERRORS
     An open(2) of /dev/ksyms will fail if:

     [EBUSY]   The device is already open. A process must close /dev/ksyms
               before it can be opened again.

     [ENOMEM]  There is a resource shortage in the kernel.

     [ENXIO]   The driver was unsuccessful in creating a snapshot of the
               kernel symbol table. This may occur if the kernel was in
               the process of loading or unloading a module.

SEE ALSO
     ioctl(2), nlist(3), elf(5), kldload(8)

HISTORY
     A ksyms device exists in many different operating systems. This
     implementation is similar in function to the Solaris and NetBSD ksyms
     driver.

     The ksyms driver first appeared in FreeBSD 8.0 to support lockstat(1).

BUGS
     Because files can be dynamically linked into the kernel at any time
     the symbol information can vary. When you open the /dev/ksyms file,
     you have access to an ELF image which represents a snapshot of the
     state of the kernel symbol information at that instant in time.
     Keeping the device open does not block the loading or unloading of
     kernel modules. To get a new snapshot you must close and re-open the
     device.

     A process is only allowed to open the /dev/ksyms file once at a time.
     The process must close the /dev/ksyms before it is allowed to open it
     again.

     The ksyms driver uses the calling process' memory address space to
     store the snapshot. ioctl(2) can be used to get the memory address
     where the symbol table is stored to save kernel memory. mmap(2) may
     also be used but it will map it to another address.

AUTHORS
     The ksyms driver was written by Stacey Son under the direction of John
     Birrell.

FreeBSD 8.0                      April 5, 2008                     FreeBSD 8.0

From owner-freebsd-arch@FreeBSD.ORG Fri Jul 11 19:53:02 2008 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 380561065671; Fri, 11 Jul 2008 19:53:02 +0000 (UTC) (envelope-from gallatin@cs.duke.edu) Received: from duke.cs.duke.edu (duke.cs.duke.edu [152.3.140.1]) by mx1.freebsd.org (Postfix) with ESMTP id 0E08A8FC19; Fri, 11 Jul 2008 19:53:01 +0000 (UTC) (envelope-from gallatin@cs.duke.edu) Received: from grasshopper.cs.duke.edu (grasshopper.cs.duke.edu [152.3.145.30]) by duke.cs.duke.edu (8.14.2/8.14.2) with ESMTP id m6BJr0kn000072 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Fri, 11 Jul 2008 15:53:01 -0400 (EDT) X-DKIM: Sendmail DKIM Filter v2.5.3 duke.cs.duke.edu m6BJr0kn000072 Received: (from gallatin@localhost) by grasshopper.cs.duke.edu (8.12.9p2/8.12.9/Submit) id m6BJqWUS096410; Fri, 11 Jul 2008 15:52:32 -0400 (EDT) (envelope-from gallatin) Date: Fri, 11 Jul 2008 15:52:32 -0400 From: Andrew Gallatin To: Stacey Son Message-ID: <20080711155232.A96384@grasshopper.cs.duke.edu> References: <4875A5D2.8030902@freebsd.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: <4875A5D2.8030902@freebsd.org>; from sson@freebsd.org on Thu, Jul 10, 2008 at 01:01:31AM -0500 X-Operating-System: FreeBSD 4.9-RELEASE-p1 on an i386 Cc: freebsd-arch@freebsd.org Subject: Re: ksyms pseudo driver X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 11 Jul 2008 19:53:02 -0000

Stacey Son [sson@freebsd.org]
wrote: > > The reason I created this driver is for dtrace and the port of the > opensolaris lockstat(1M) command to FreeBSD. The ksyms driver allows a > process to get a quick > snapshot of the kernel symbol table including the symbols from any > loaded modules. Very cool! After doing some Solaris work, I've really missed lockstat! This would also be useful for hwpmc. > its already opened snapshot first. Of course, this requires the read() > handler to bounce the buffer into the kernel first before it is written > back out to userspace. (Maybe there is a simple way to do an userspace > to userspace copy instead?) The reason I went to all this trouble is to > keep /dev/ksyms from turning into an easy way to exhaust all the kernel > memory (unintentionally or intentionally). Instead of doing the copy in the kernel, can you just have a simple ioctl which returns the address and size of the snapshot? Then the userspace side can do the copy itself. Drew From owner-freebsd-arch@FreeBSD.ORG Sat Jul 12 01:16:04 2008 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 1CFA3106564A for ; Sat, 12 Jul 2008 01:16:04 +0000 (UTC) (envelope-from sson@freebsd.org) Received: from www.son.org (son.org [199.239.233.23]) by mx1.freebsd.org (Postfix) with ESMTP id DF2538FC22 for ; Sat, 12 Jul 2008 01:16:03 +0000 (UTC) (envelope-from sson@freebsd.org) Received: from 10net.scnet.net ([63.99.110.163]) (authenticated bits=0) by www.son.org (8.13.6.20060614/8.13.6) with ESMTP id m6C1FvnL061153; Fri, 11 Jul 2008 20:16:00 -0500 (CDT) Message-ID: <48780661.5050002@freebsd.org> Date: Fri, 11 Jul 2008 20:18:25 -0500 From: Stacey Son User-Agent: Thunderbird/3.0a2pre (Macintosh; 2008070703) MIME-Version: 1.0 To: Andrew Gallatin References: <4875A5D2.8030902@freebsd.org> <20080711155232.A96384@grasshopper.cs.duke.edu> In-Reply-To: <20080711155232.A96384@grasshopper.cs.duke.edu> 
Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-arch@freebsd.org Subject: Re: ksyms pseudo driver X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 12 Jul 2008 01:16:04 -0000 Andrew Gallatin wrote: >> its already opened snapshot first. Of course, this requires the read() >> handler to bounce the buffer into the kernel first before it is written >> back out to userspace. (Maybe there is a simple way to do an userspace >> to userspace copy instead?) The reason I went to all this trouble is to >> keep /dev/ksyms from turning into an easy way to exhaust all the kernel >> memory (unintentionally or intentionally). >> > > Instead of doing the copy in the kernel, can you just have a simple > ioctl which returns the address and size of the snapshot? Then the > userspace side can do the copy itself. > Actually that is what the ioctls do now... You can just open /dev/ksyms to create the snapshot and then use ioctl() to get the size and address where the buffer is mapped. Or you can use mmap(). IOCTLS The ioctl(2) command codes below are defined in . The (third) argument to the ioctl(2) should be a pointer to the type indicated. KIOCGSIZE (size_t) Returns the total size of the current symbol table. KIOCGADDR (void *) Returns the address of the kernel symbol table mapped in the process memory. -stacey. 
From owner-freebsd-arch@FreeBSD.ORG Sat Jul 12 05:34:31 2008 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id DCA4F1065674; Sat, 12 Jul 2008 05:34:31 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from mail.zoral.com.ua (skuns.zoral.com.ua [91.193.166.194]) by mx1.freebsd.org (Postfix) with ESMTP id 4FF808FC08; Sat, 12 Jul 2008 05:34:31 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from deviant.kiev.zoral.com.ua (root@deviant.kiev.zoral.com.ua [10.1.1.148]) by mail.zoral.com.ua (8.14.2/8.14.2) with ESMTP id m6C4wbel060427 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Sat, 12 Jul 2008 07:58:37 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1]) by deviant.kiev.zoral.com.ua (8.14.2/8.14.2) with ESMTP id m6C4wb5b086151; Sat, 12 Jul 2008 07:58:37 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: (from kostik@localhost) by deviant.kiev.zoral.com.ua (8.14.2/8.14.2/Submit) id m6C4wb6E086150; Sat, 12 Jul 2008 07:58:37 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to kostikbel@gmail.com using -f Date: Sat, 12 Jul 2008 07:58:37 +0300 From: Kostik Belousov To: Stacey Son Message-ID: <20080712045837.GD17123@deviant.kiev.zoral.com.ua> References: <4875A5D2.8030902@freebsd.org> <20080711155232.A96384@grasshopper.cs.duke.edu> <48780661.5050002@freebsd.org> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="GAc89RRdceylj63L" Content-Disposition: inline In-Reply-To: <48780661.5050002@freebsd.org> User-Agent: Mutt/1.4.2.3i X-Virus-Scanned: ClamAV version 0.91.2, clamav-milter version 0.91.2 on skuns.kiev.zoral.com.ua X-Virus-Status: Clean X-Spam-Status: No, score=-4.4 required=5.0 
tests=ALL_TRUSTED,AWL,BAYES_00 autolearn=ham version=3.2.4 X-Spam-Checker-Version: SpamAssassin 3.2.4 (2008-01-01) on skuns.kiev.zoral.com.ua Cc: Andrew Gallatin , freebsd-arch@freebsd.org Subject: Re: ksyms pseudo driver X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 12 Jul 2008 05:34:31 -0000

--GAc89RRdceylj63L
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Fri, Jul 11, 2008 at 08:18:25PM -0500, Stacey Son wrote:
> Andrew Gallatin wrote:
> >>its already opened snapshot first. Of course, this requires the read()
> >>handler to bounce the buffer into the kernel first before it is written
> >>back out to userspace. (Maybe there is a simple way to do an userspace
> >>to userspace copy instead?) The reason I went to all this trouble is to
> >>keep /dev/ksyms from turning into an easy way to exhaust all the kernel
> >>memory (unintentionally or intentionally).
> >>
> >
> >Instead of doing the copy in the kernel, can you just have a simple
> >ioctl which returns the address and size of the snapshot? Then the
> >userspace side can do the copy itself.
> >
> Actually that is what the ioctls do now... You can just open
> /dev/ksyms to create the snapshot and then use ioctl() to get the size
> and address where the buffer is mapped. Or you can use mmap().

Most likely I am missing some obvious reason here, but to me it looks like you are doing it in reverse. The natural setup would be to require userspace to supply allocated memory to the driver, and then have the driver fill that memory with the symbol table. This solves the problem of exhausting the kernel address space. As usual, when the user-supplied region is too small, the driver should return both an error and the new required size.
Understandably, the size is volatile and may be too small on the next call as well. But in fact the kernel symbol table does not change very often, so I think even a single iteration will succeed most of the time.

--GAc89RRdceylj63L
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (FreeBSD)

iEYEARECAAYFAkh4OfwACgkQC3+MBN1Mb4jxtgCgofnRqwzq8QzlqE6jtIHXOI3Q
cCYAmwS9jsXBz9CuvdmwtyqXRsdyRTkC
=roYt
-----END PGP SIGNATURE-----

--GAc89RRdceylj63L--
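
The caller-supplied-buffer scheme suggested in the last message can be sketched in userspace roughly as follows. This is only an illustration under stated assumptions: fake_fill() is a stand-in for the driver side and is NOT a real ksyms interface, and fetch_snapshot(), the ENOMEM convention and the retry limit are likewise hypothetical names and choices made for this example.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for the kernel symbol table; purely illustrative data. */
static const char fake_symtab[] = "kernel_symbol_table_snapshot";

/*
 * Hypothetical "driver" side: always reports the required size, and
 * fills the caller's buffer only if it is large enough.
 */
static int fake_fill(void *buf, size_t buflen, size_t *needed)
{
    *needed = sizeof(fake_symtab);
    if (buf == NULL || buflen < sizeof(fake_symtab))
        return ENOMEM;          /* too small: caller should retry */
    memcpy(buf, fake_symtab, sizeof(fake_symtab));
    return 0;
}

/*
 * Caller side: start with a guess, then grow to the reported size and
 * retry. The retry count is bounded because the required size could in
 * principle change between calls.
 */
static void *fetch_snapshot(size_t *lenp)
{
    size_t len = 8, needed = 0;
    void *buf = NULL;

    for (int tries = 0; tries < 8; tries++) {
        void *nbuf = realloc(buf, len);
        if (nbuf == NULL)
            break;
        buf = nbuf;
        int error = fake_fill(buf, len, &needed);
        if (error == 0) {
            *lenp = needed;
            return buf;         /* caller frees with free() */
        }
        if (error != ENOMEM)
            break;
        len = needed;           /* grow to the size the driver asked for */
    }
    free(buf);
    return NULL;
}
```

With this shape the kernel never has to hold a per-process copy of the table; the cost is that the caller may need a second pass when its first guess is too small, which matches the expectation above that one retry is usually enough.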