From owner-freebsd-arch@freebsd.org Tue Dec 27 17:52:30 2016
From: Xin LI
Date: Tue, 27 Dec 2016 09:52:27 -0800
Subject: Re: question about fopen fd limit
To: Alfred Perlstein
Cc: 盛慧华, Hongjiang Zhang, freebsd-arch@freebsd.org, Jilles Tjoelker
References: <2016122223570929089978@corp.netease.com>
 <2016122311014089280414@corp.netease.com>
 <2016122316484066524625@corp.netease.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
List-Id: Discussion related to FreeBSD architecture

-freebsd-net (bcc), +freebsd-arch

This would work but also comes with a lot of pain.

(But really - why do we implement these accessors, like fileno() and
friends as macros with knowledge of the sFILE layout?  Will it be
reasonable to start converting these to functions as a first step, which
would not break ABI but allow us to do it in the future?  Otherwise we
would be just kicking the same can along the street forever...)

FILE is supposed to be fully opaque to application writers in my opinion.

On Sat, Dec 24, 2016 at 5:19 PM, Alfred Perlstein wrote:

> Hello 盛慧华
>
> Here's another trick that may work.
>
> Use funopen(3) and provide your own read/write/seek and close functions
> for the high fds.
>
> You can basically make "cookie" a struct that contains your "int sized"
> fds.
>
>     FILE *
>     funopen(const void *cookie,
>         int (*readfn)(void *, char *, int),
>         int (*writefn)(void *, const char *, int),
>         fpos_t (*seekfn)(void *, fpos_t, int),
>         int (*closefn)(void *));
>
> If you need more help please make sure to email me directly so I can see
> your question.
>
> -Alfred
>
> On 12/23/16 12:48 AM, 盛慧华 wrote:
>
>> hi all,
>>
>> Thank you for your advice.
>> Solution 2 definitely broadened my horizons, but it may not be a good
>> choice for my project, LoL.
>> I will try mailing the freebsd-current list; if libc is as you
>> describe, maybe I should modify it myself.
>> Thank you so much.
>> Are you KingSoft's Dr. Zhang?  Nice to meet you!
>>
>> winson sheng
>>
>> From: Hongjiang Zhang
>> Date: 2016-12-23 11:44
>> To: 盛慧华; freebsd-net
>> Subject: RE: RE: question about fopen fd limit
>>
>> Ok. I know.
>> There are two possible solutions:
>>
>> 1. Quick solution for the short term: change short to int in libc
>>    yourself, then buildworld and installworld.  Pushing a change into
>>    libc may take a long time, especially since only a few people
>>    encounter this issue.  You'd better send email to freebsd-current
>>    to confirm whether they accept your suggestion.
>>
>> 2. Workaround: first reserve a series of fds before opening TCP
>>    connections.  For example, invoke open("/dev/null") 10000 times to
>>    get 10000 fds.  Those fd values are small enough to be held by a
>>    "short".  After that, start the TCP connections.  Whenever you need
>>    to fopen a file, call open("xxx") instead, and then use
>>    dup2(old_fd, new_fd) to copy old_fd onto new_fd.  The old_fd value
>>    is the one returned by open("xxx"), and new_fd is one of your
>>    reserved fds; next, use fdopen(new_fd, mode).  Here, you have to
>>    manage the reserved fds yourself, including open/close.
>> In my eyes:
>>
>> 1. is the quick method, and it requires no modification to your logic.
>> 2. needs you to maintain the reserved range of fds yourself, which
>>    increases the complexity of your logic.
>>
>> Thanks
>> Hongjiang Zhang
>>
>> From: 盛慧华 [mailto:hhsheng@corp.netease.com]
>> Sent: Friday, December 23, 2016 11:02 AM
>> To: Hongjiang Zhang; freebsd-net <freebsd-net@freebsd.org>
>> Subject: Re: RE: question about fopen fd limit
>>
>> hi all,
>> I am not mapping TCP to FILE; you misunderstood my meaning.
>> For example, if my server already holds 32000 TCP connections,
>> fopen only has 767 fds left to use.
>> The problem has nothing to do with the TCP fds, but with fopen...
>> In some particular situations my server will open 1k+ FILEs, which
>> will exceed the fileno limit, and overflow occurs.
>> My server can't open any more files; that's the problem.
>> So I felt that if FreeBSD could officially change the FILE struct's
>> fileno to an UNSIGNED SHORT, that might be an efficient and convenient
>> solution, at least for my case?
>> An UNSIGNED SHORT fileno is enough for me, and I don't want to change
>> a lot of functions that take a FILE * as their argument.
>> Thank you
>> winson sheng
>>
>> From: Hongjiang Zhang
>> Date: 2016-12-23 10:17
>> To: 盛慧华; freebsd-net
>> Subject: RE: question about fopen fd limit
>>
>> Why do you need to map TCP fd to FILE?
>> It is difficult to modify the FILE structure.  If it is possible, let
>> us figure out some new design to meet your requirement.
>>
>> -----Original Message-----
>> From: owner-freebsd-net@freebsd.org [mailto:owner-freebsd-net@freebsd.org]
>> On Behalf Of ???
>> Sent: Thursday, December 22, 2016 11:57 PM
>> To: freebsd-net
>> Subject: question about fopen fd limit
>>
>> hi all,
>> We are from NetEase, a Chinese game development company,
>> and one of our products uses FreeBSD as its OS platform.
>> This game has millions of players online, and each server may hold
>> 25000+ TCP connections at the same time.  Thanks to BSD and kqueue :)
>> For example, on one of our servers, counting connections with netstat
>> (13396 is our listening port):
>>
>>     netstat -an | grep 13396 | wc -l
>>     23221
>>
>> Recently we did some performance optimization and raised this
>> connection limit to 28000+ or 30000+.
>> But we found that FreeBSD has a limit: this many online players will
>> take 28000+ fds, and the fd in BSD's FILE struct is only a short.
>> For example:
>>
>>     struct __sFILE {
>>     ...
>>     short _file; /* (*) fileno, if Unix descriptor, else -1 */
>>     ...
>>
>> So if our server wants to fopen some file while it still holds this
>> many connections, the fd number may easily exceed 32767, and fopen
>> will definitely return an error.  Then the server hits a fatal error.
>> We did a simple test and confirmed this situation.
>> Then, in fopen's code, we noticed that we can use open, which returns
>> an fd, instead of fopen to avoid this overflow, as below:
>>
>>     /*
>>      * File descriptors are a full int, but _file is only a short.
>>      * If we get a valid file descriptor that is greater than
>>      * SHRT_MAX, then the fd will get sign-extended into an
>>      * invalid file descriptor.  Handle this case by failing the
>>      * open.
>>      */
>>
>> BUT... so many C library FILE functions need a FILE * pointer as an
>> input argument; we can't convert all of them to fds, or it would be
>> rather painful for us.
>> Even in FreeBSD 10 it seems this short limit is still there, but in
>> other OSes such as Debian, the FILE struct's fileno is an int.
>> We found an unofficial patch that simply changes this fileno to
>> unsigned, but ours is a very steady project and we can't afford the
>> risk of using an unofficial patch.
>> So, do you have any plan to change this fopen fd limit to UNSIGNED
>> SHORT or int in the future?  ushort is enough for us.
>> If you do, we will be really glad and excited; if you don't, it
>> doesn't matter, but please give us a reply so that we can
>> find some other plan to resolve this problem.
>> LoL, thank you!
>> Yours sincerely,
>> winson sheng
>> _______________________________________________
>> freebsd-net@freebsd.org mailing list
>> https://lists.freebsd.org/mailman/listinfo/freebsd-net
>> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"

From owner-freebsd-arch@freebsd.org Tue Dec 27 18:38:28 2016
Subject: Re: question about fopen fd limit
From: Lewis Donzis
Date: Tue, 27 Dec 2016 13:19:01 -0500
Cc: Xin LI
To: freebsd-arch@freebsd.org
Message-Id: <7A472344-E4DA-452A-AC81-9CA67CD0B26C@perftech.com>
References: <2016122223570929089978@corp.netease.com>
 <2016122311014089280414@corp.netease.com>
 <2016122316484066524625@corp.netease.com>
Mime-Version: 1.0 (Mac OS X Mail 9.3 \(3124\))
Content-Type: text/plain; charset=utf-8
List-Id: Discussion related to FreeBSD architecture

I agree, and, in my humble opinion, the FILE structure should never have
made any assumptions about the value of an fd, especially that it would
fit in a short when an fd is defined as an int.  For that matter,
fileno() is defined as returning an int, but the implementation is
technically returning a short in a non-threaded environment.

I would support changing the __sFILE structure's _file member to an int
and probably also changing _flags to int or unsigned, since it would end
up getting padded for alignment anyway.  It would also be more
efficient.

> On Dec 27, 2016, at 12:52 PM, Xin LI wrote:
>
> -freebsd-net (bcc), +freebsd-arch
>
> This would work but also comes with a lot of pain.
>
> (But really - why do we implement these accessors, like fileno() and
> friends as macros with knowledge of the sFILE layout?  Will it be
> reasonable to start converting these to functions as a first step, which
> would not break ABI but allow us to do it in the future?
> Otherwise we would be just kicking the same can along the street
> forever...)
>
> FILE is supposed to be fully opaque to application writers in my opinion.
>
> _______________________________________________
> freebsd-arch@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-arch
> To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org"

From owner-freebsd-arch@freebsd.org Tue Dec 27 19:10:01 2016
Subject: Re: question about fopen fd limit
From: Alfred Perlstein
Date: Tue, 27 Dec 2016 11:09:47 -0800
To: Xin LI
Cc: 盛慧华, Hongjiang Zhang, freebsd-arch@freebsd.org, Jilles Tjoelker
Message-Id: <5461BE82-34CF-4B46-B7A1-6F8E1B3246C8@freebsd.org>
References: <2016122223570929089978@corp.netease.com>
 <2016122311014089280414@corp.netease.com>
 <2016122316484066524625@corp.netease.com>
Mime-Version: 1.0 (1.0)
Content-Type: text/plain; charset=utf-8
List-Id: Discussion related to FreeBSD architecture

Makes sense; at the same time, if we go to 4/2 billion descriptors it
shouldn't matter.  Another trick would be to augment the inline function
to check a version flag and, if it's incorrect, call the libc function.
This will cause some performance problems, but only for really old
software.

> On Dec 27, 2016, at 9:52 AM, Xin LI wrote:
>
> -freebsd-net (bcc), +freebsd-arch
>
> This would work but also comes with a lot of pain.
>
> (But really - why do we implement these accessors, like fileno() and
> friends as macros with knowledge of the sFILE layout?  Will it be
> reasonable to start converting these to functions as a first step, which
> would not break ABI but allow us to do it in the future?  Otherwise we
> would be just kicking the same can along the street forever...)
>
> FILE is supposed to be fully opaque to application writers in my opinion.
>
>> On Sat, Dec 24, 2016 at 5:19 PM, Alfred Perlstein wrote:
>> Hello 盛慧华
>>
>> Here's another trick that may work.
>>=20 >> Use funopen(3) and provide your own read/write/seek and close functions f= or the high fds. >>=20 >> You can basically make "cookie" a struct that contains your "int sized" f= ds. >>=20 >> FILE * >> funopen(const void *cookie, int (*readfn)(void *, char *, int), int (= *writefn)(void *, const char *, int), >> fpos_t (*seekfn)(void *, fpos_t, int), int (*closefn)(void *)); >>=20 >>=20 >> If you need more help please make sure to email me directly so I can see y= our question. >>=20 >> -Alfred >>=20 >>=20 >>=20 >>> On 12/23/16 12:48 AM, =E7=9B=9B=E6=85=A7=E5=8D=8E wrote: >>> hi all, >>>=20 >>> Thank you for your advice ~ >>> solution 2 definitly broaden my horizons ~~but may be not a good choic= e for my project ~~LoL >>> i will try to mail freebsd-current mail list, if libc is as your descri= ption , may be i should modify it by myself ~~ >>> Thank you so much~ >>> Are u KingSoft's Dr Zhang ? nice to meet you !!!!! >>>=20 >>> = winson sheng >>> =20 >>>=20 >>> winson sheng >>> From: Hongjiang Zhang >>> Date: 2016-12-23 11:44 >>> To: =E7=9B=9B=E6=85=A7=E5=8D=8E; freebsd-net >>> Subject: RE: RE: question about fopen fd limit >>> Ok. I know. >>> There are two possible solutions: >>> Quick solution for short term: modify short to int in libc by yourself, b= uildworld and installworld. Pushing to modify libc may take a long time, esp= ecially only few people encounter this issue. You=E2=80=99d better send emai= l to freebsd-current to confirm whether they accept your suggestion. >>> Work around: You can first reserve a series of fd before opening TCP con= nections. For example, invoke open(=E2=80=9C/dev/null=E2=80=9D) for 10000 ti= mes to get 10000 fds. Those fd values are small enough to be held by =E2=80=9C= short=E2=80=9D. After that, start TCP connections. Once you need to fopen a f= ile, please call open(=E2=80=9Cxxx=E2=80=9D) instead, and then use dup2(old_= fd, new_fd) to exchange the two fd. 
The old_fd value is the one obtained by o= pen(=E2=80=9Cxxx=E2=80=9D), and new_fd is one in your reserved fd fields, an= d next please use fdopen(fd, mode). Here, you have to manage the reserved fd= s by yourself including open/close. >>> In my eyes: >>> is the quick method, and there is no modifications in your logic. >>> Needs you to maintain the reserved consecutive fields for fd by yourself= , which increased the complexity of your logic. >>> Thanks >>> Hongjiang Zhang >>> From: =E7=9B=9B=E6=85=A7=E5=8D=8E [mailto:hhsheng@corp.netease.com] >>> Sent: Friday, December 23, 2016 11:02 AM >>> To: Hongjiang Zhang ; freebsd-net >>> Subject: Re: RE: question about fopen fd limit >>> hi all, >>> not map TCP to FILE, you misunderstanding my meaning~ >>> for example, if my server tcp already holds 32000 connection >>> fopen only has 767 fd to use >>> the problem has no bussiness with tcp fd, BUT fopen ... >>> in some particular situlations , my server will open 1k+ FILE , tha= t will exceed the fileno limit, and overflow occur >>> my server can't open any file more ,that's the problem ~ >>> so i felt if bsd official could change FILE struct's fileno to a UN= SIGNED SHORT that may be an effecient and convenient solution just for my ca= se ? >>> UNSIGNED SHORT fileno is enough for me, and i don't wanna change a lo= t of FILE function that take FILE * as its argument ~ >>> Thank you ~~~ >>> winson sheng >>> =20 >>>=20 >>> winson sheng >>> From: Hongjiang Zhang >>> Date: 2016-12-23 10:17 >>> To: =E7=9B=9B=E6=85=A7=E5=8D=8E; freebsd-net >>> Subject: RE: question about fopen fd limit >>> Why do you need to map TCP fd to FILE? >>> It is difficult to modify FILE structure. If it is possible, let us fi= gure out some new designs to meet your requirement. >>> -----Original Message----- >>> From: owner-freebsd-net@freebsd.org [mailto:owner-freebsd-net@freebsd.or= g] On Behalf Of ??? 
>>> Sent: Thursday, December 22, 2016 11:57 PM >>> To: freebsd-net >>> Subject: question about fopen fd limit >>> hi all, >>> hi~ >>> we are from Chinese Game Develop Corp, Netease. >>> and One of our product using FreeBsd as its OS platform. >>> This Game has Millions of players online , and Each Server may holds= 25000+ tcp connection at the same time.Thanks to BSD and kqueue :) >>> for example, it's one of our server , netstat cmd to list connecti= ons overall... >>> netstat -an | grep 13396 (it's our listening port) | wc -l >>> 23221 >>> recently we do some performance optimize and promote this connect= limit to 28000+ or 30000+. >>> But we find Freebsd has a limit that this huge online number will tak= e 28000+ fd, and bsd FILE * struct's >>> fd only support to SHORT . such as .. >>> struct __sFILE { >>> ... >>> short _file; /* (*) fileno, if Unix descriptor, else -1 */ ... >>> so if our server want to fopen some file when we still hold this on= line number, the fd amount may easily exceed 32767, and fopen definitely ret= urn a err code. then the server will appear some fataly ERROR. >>> we do a simple test and confirm this situation. >>> then in fopen's code , we notice that we can use open to return a f= d instread of fopen to avoid this overflow, >>> as below >>> 68 /* >>> 1 * File descriptors are a full int, but _file is only a short. >>> 2 * If we get a valid file descriptor that is greater than >>> 3 * SHRT_MAX, then the fd will get sign-extended into an >>> 4 * invalid file descriptor. Handle this case by failing the >>> 5 * open. >>> 6 */ >>> BUT ... so many c lib FILE series function needs a FILE * pointer= as input argument, we can't convert all of them to fd, or it will be a rath= er suffering things to us. >>> and even in BSD 10 , it seems this short limit still there , but ot= her OS as debian , FILE strucnt's fileno is a int . 
>>> we found an unoffical patch easily change this fileno to unsigned ,= but we are a very stready project, we can't afford the risk to use an unoff= ical patch. >>> so, do you have any plan to change this fopen fd limit to UNSIGNED S= HORT or int in the future ? ushort is enough for us . >>> if you do , we are really glad and excited~~~~~~~if you don't ,it don= en't matter, plz give us a reply so that we may need to >>> find some other plan to resolved this suffering thing. >>> LoL, thank you !!!!! >>> yours sincerely >>> winson sheng >>> winson sheng >>> _______________________________________________ >>> freebsd-net@freebsd.org mailing list >>> https://na01.safelinks.protection.outlook.com/?url=3Dhttps%3A%2F%2Flists= .freebsd.org%2Fmailman%2Flistinfo%2Ffreebsd-net&data=3D02%7C01%7Chonzhan%40m= icrosoft.com%7C4a9dfccbccd446be2f4a08d42a833fb0%7C72f988bf86f141af91ab2d7cd0= 11db47%7C1%7C0%7C636180190584478890&sdata=3DPAwJP5IXHy0WJwxbV7MB%2B8zvKheZUY= jhHx3ohFRSPZM%3D&reserved=3D0 >>> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org" >>> _______________________________________________ >>> freebsd-net@freebsd.org mailing list >>> https://lists.freebsd.org/mailman/listinfo/freebsd-net >>> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org" >>=20 >> _______________________________________________ >> freebsd-net@freebsd.org mailing list >> https://lists.freebsd.org/mailman/listinfo/freebsd-net >> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org" >=20 From owner-freebsd-arch@freebsd.org Tue Dec 27 19:48:09 2016 Return-Path: Delivered-To: freebsd-arch@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id F1B5DC932A9 for ; Tue, 27 Dec 2016 19:48:09 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from mail.baldwin.cx (bigwig.baldwin.cx [96.47.65.170]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 
bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 5D95B1C6E; Tue, 27 Dec 2016 19:48:09 +0000 (UTC) (envelope-from jhb@freebsd.org)
Received: from ralph.baldwin.cx (c-73-231-226-104.hsd1.ca.comcast.net [73.231.226.104]) by mail.baldwin.cx (Postfix) with ESMTPSA id 3D3F210AA56; Tue, 27 Dec 2016 14:48:02 -0500 (EST)
From: John Baldwin
To: freebsd-arch@freebsd.org
Cc: Xin LI , Alfred Perlstein , Hongjiang Zhang , =?utf-8?B?55ub5oWn5Y2O?= , Jilles Tjoelker
Subject: Re: question about fopen fd limit
Date: Tue, 27 Dec 2016 11:27:36 -0800
Message-ID: <2792667.4rqir5N98G@ralph.baldwin.cx>
User-Agent: KMail/4.14.10 (FreeBSD/11.0-STABLE; KDE/4.14.10; amd64; ; )
In-Reply-To:
References: <2016122223570929089978@corp.netease.com>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"
X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.4.3 (mail.baldwin.cx); Tue, 27 Dec 2016 14:48:02 -0500 (EST)
X-Virus-Scanned: clamav-milter 0.99.2 at mail.baldwin.cx
X-Virus-Status: Clean
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: Discussion related to FreeBSD architecture
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Tue, 27 Dec 2016 19:48:10 -0000

On Tuesday, December 27, 2016 09:52:27 AM Xin LI wrote:
> -freebsd-net (bcc), +freebsd-arch
>
> This would work but also comes with a lot of pain.
>
> (But really - why do we implement these accessors, like fileno() and
> friends as macros with knowledge of the sFILE layout? Will it be
> reasonable to start converting these to functions as a first step, which
> would not break ABI but allow us to do it in the future? Otherwise we
> would be just kicking the same can along the street forever...)
>
> FILE is supposed to be fully opaque to application writers in my opinion.

It should be, but it's a pain.
There are various things that know about FILE internals to manipulate
the ungetc() state, etc. That is in things like the gzip code and
copied into umpteen different places in various ports.

I have an older set of branches that attempt to make FILE partially
opaque and then add a new int-sized _file. However, I haven't fixed the
last round of fallout from trying to make FILE more opaque.

The PR for this (hiding most of FILE) is here:

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=205029

The associated GIT branch is here. I have done some updates to it since
the build failures in the PR, but haven't verified that I've fixed all
of the reported bugs via my own port builds before submitting an updated
patch to the PR to try for a new exp-run.

https://github.com/freebsd/freebsd/compare/master...bsdjhb:stdio_hide

The other PR related to exp-runs for expanding _file is:

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=172332

The GIT branch that includes the fixes for this (including some compat
stubs to account for old perl binaries that predate fdclose()) is here:

https://github.com/freebsd/freebsd/compare/master...bsdjhb:stdio_file

I still think trying to finish the stdio_hide branch and that approach
first is the right path before returning to _file itself.

> On Sat, Dec 24, 2016 at 5:19 PM, Alfred Perlstein
> wrote:
>
> > Hello 盛慧华
> >
> > Here's another trick that may work.
> >
> > Use funopen(3) and provide your own read/write/seek and close functions
> > for the high fds.
> >
> > You can basically make "cookie" a struct that contains your "int sized"
> > fds.
> >
> > FILE *
> > funopen(const void *cookie, int (*readfn)(void *, char *, int), int
> > (*writefn)(void *, const char *, int),
> > fpos_t (*seekfn)(void *, fpos_t, int), int (*closefn)(void *));
> >
> >
> > If you need more help please make sure to email me directly so I can see
> > your question.
> >
> > -Alfred
> >
> >
> >
> > On 12/23/16 12:48 AM, 盛慧华 wrote:
> >
> >> hi all,
> >>
> >> Thank you for your advice ~
> >> solution 2 definitly broaden my horizons ~~but may be not a good choice
> >> for my project ~~LoL
> >> i will try to mail freebsd-current mail list, if libc is as your
> >> description , may be i should modify it by myself ~~
> >> Thank you so much~
> >> Are u KingSoft's Dr Zhang ? nice to meet you !!!!!
> >>
> >>
> >> winson sheng
> >>
> >>
> >> winson sheng
> >> From: Hongjiang Zhang
> >> Date: 2016-12-23 11:44
> >> To: 盛慧华; freebsd-net
> >> Subject: RE: RE: question about fopen fd limit
> >> Ok. I know.
> >> There are two possible solutions:
> >> Quick solution for short term: modify short to int in libc by yourself,
> >> buildworld and installworld. Pushing to modify libc may take a long time,
> >> especially only few people encounter this issue. You’d better send email to
> >> freebsd-current to confirm whether they accept your suggestion.
> >> Work around: You can first reserve a series of fd before opening TCP
> >> connections. For example, invoke open(“/dev/null”) for 10000 times to get
> >> 10000 fds. Those fd values are small enough to be held by “short”. After
> >> that, start TCP connections. Once you need to fopen a file, please call
> >> open(“xxx”) instead, and then use dup2(old_fd, new_fd) to exchange the two
> >> fd. The old_fd value is the one obtained by open(“xxx”), and new_fd is one
> >> in your reserved fd fields, and next please use fdopen(fd, mode). Here, you
> >> have to manage the reserved fds by yourself including open/close.
> >> In my eyes:
> >> is the quick method, and there is no modifications in your logic.
> >> Needs you to maintain the reserved consecutive fields for fd by yourself,
> >> which increased the complexity of your logic.
> >> Thanks
> >> Hongjiang Zhang
> >> From: 盛慧华 [mailto:hhsheng@corp.netease.com]
> >> Sent: Friday, December 23, 2016 11:02 AM
> >> To: Hongjiang Zhang ; freebsd-net <
> >> freebsd-net@freebsd.org>
> >> Subject: Re: RE: question about fopen fd limit
> >> hi all,
> >> not map TCP to FILE, you misunderstanding my meaning~
> >> for example, if my server tcp already holds 32000 connection
> >> fopen only has 767 fd to use
> >> the problem has no bussiness with tcp fd, BUT fopen ...
> >> in some particular situlations , my server will open 1k+ FILE, that
> >> will exceed the fileno limit, and overflow occur
> >> my server can't open any file more ,that's the problem ~
> >> so i felt if bsd official could change FILE struct's fileno to a
> >> UNSIGNED SHORT that may be an effecient and convenient solution just for my
> >> case ?
> >> UNSIGNED SHORT fileno is enough for me, and i don't wanna change a lot
> >> of FILE function that take FILE * as its argument ~
> >> Thank you ~~~
> >> winson sheng
> >>
> >>
> >> winson sheng
> >> From: Hongjiang Zhang
> >> Date: 2016-12-23 10:17
> >> To: 盛慧华; freebsd-net
> >> Subject: RE: question about fopen fd limit
> >> Why do you need to map TCP fd to FILE?
> >> It is difficult to modify FILE structure. If it is possible, let us
> >> figure out some new designs to meet your requirement.
> >> -----Original Message-----
> >> From: owner-freebsd-net@freebsd.org [mailto:owner-freebsd-net@freebsd.org]
> >> On Behalf Of ???
> >> Sent: Thursday, December 22, 2016 11:57 PM
> >> To: freebsd-net
> >> Subject: question about fopen fd limit
> >> hi all,
> >> hi~
> >> we are from Chinese Game Develop Corp, Netease.
> >> and One of our product using FreeBsd as its OS platform.
> >> This Game has Millions of players online , and Each Server may holds
> >> 25000+ tcp connection at the same time.Thanks to BSD and kqueue :)
> >> for example, it's one of our server , netstat cmd to list
> >> connections overall...
> >> netstat -an | grep 13396 (it's our listening port) | wc -l
> >> 23221
> >> recently we do some performance optimize and promote this connect
> >> limit to 28000+ or 30000+.
> >> But we find Freebsd has a limit that this huge online number will take
> >> 28000+ fd, and bsd FILE * struct's
> >> fd only support to SHORT . such as ..
> >> struct __sFILE {
> >> ...
> >> short _file; /* (*) fileno, if Unix descriptor, else -1 */ ...
> >> so if our server want to fopen some file when we still hold this
> >> online number, the fd amount may easily exceed 32767, and fopen definitely
> >> return a err code. then the server will appear some fataly ERROR.
> >> we do a simple test and confirm this situation.
> >> then in fopen's code , we notice that we can use open to return a fd
> >> instread of fopen to avoid this overflow,
> >> as below
> >> /*
> >>  * File descriptors are a full int, but _file is only a short.
> >>  * If we get a valid file descriptor that is greater than
> >>  * SHRT_MAX, then the fd will get sign-extended into an
> >>  * invalid file descriptor. Handle this case by failing the
> >>  * open.
> >>  */
> >> BUT ... so many c lib FILE series function needs a FILE * pointer
> >> as input argument, we can't convert all of them to fd, or it will be a
> >> rather suffering things to us.
> >> and even in BSD 10 , it seems this short limit still there , but
> >> other OS as debian , FILE strucnt's fileno is a int .
> >> we found an unoffical patch easily change this fileno to unsigned ,
> >> but we are a very stready project, we can't afford the risk to use an
> >> unoffical patch.
> >> so, do you have any plan to change this fopen fd limit to UNSIGNED
> >> SHORT or int in the future ?
ushort is enough for us .
> >> if you do , we are really glad and excited~~~~~~~if you don't , it
> >> donen't matter, plz give us a reply so that we may need to
> >> find some other plan to resolved this suffering thing.
> >> LoL, thank you !!!!!
> >> yours sincerely
> >> winson sheng
> >> winson sheng
> >> _______________________________________________
> >> freebsd-net@freebsd.org mailing list
> >> https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.freebsd.org%2Fmailman%2Flistinfo%2Ffreebsd-net&data=02%7C01%7Chonzhan%40microsoft.com%7C4a9dfccbccd446be2f4a08d42a833fb0%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C636180190584478890&sdata=PAwJP5IXHy0WJwxbV7MB%2B8zvKheZUYjhHx3ohFRSPZM%3D&reserved=0
> >> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"
> >> _______________________________________________
> >> freebsd-net@freebsd.org mailing list
> >> https://lists.freebsd.org/mailman/listinfo/freebsd-net
> >> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"
> >>
> >
> > _______________________________________________
> > freebsd-net@freebsd.org mailing list
> > https://lists.freebsd.org/mailman/listinfo/freebsd-net
> > To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"
> >
> _______________________________________________
> freebsd-arch@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-arch
> To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org"

-- 
John Baldwin

From owner-freebsd-arch@freebsd.org Wed Dec 28 09:30:25 2016
Return-Path:
Delivered-To: freebsd-arch@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 0AD1EC94C58; Wed, 28 Dec 2016 09:30:25 +0000 (UTC) (envelope-from avg@FreeBSD.org)
Received: from citapm.icyb.net.ua (citapm.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org
(Postfix) with ESMTP id 3266013D9; Wed, 28 Dec 2016 09:30:23 +0000 (UTC) (envelope-from avg@FreeBSD.org)
Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citapm.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id LAA05639; Wed, 28 Dec 2016 11:30:15 +0200 (EET) (envelope-from avg@FreeBSD.org)
Received: from localhost ([127.0.0.1]) by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1cMAYp-000LnF-F1; Wed, 28 Dec 2016 11:30:15 +0200
To: "freebsd-arch@freebsd.org" , freebsd-fs
From: Andriy Gapon
Subject: INVARIANTS vs DIAGNOSTIC % lf_advlockasync
Message-ID: <2225968a-7bce-b100-f3fa-a5e2eb8b9f47@FreeBSD.org>
Date: Wed, 28 Dec 2016 11:28:54 +0200
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:45.0) Gecko/20100101 Thunderbird/45.5.1
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: Discussion related to FreeBSD architecture
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Wed, 28 Dec 2016 09:30:25 -0000

I wonder if there are any guidelines on when to use INVARIANTS vs DIAGNOSTIC vs
something else. Should the amount of output be taken into account? Or the
performance impact? Or just common sense?

What I really mean is that if some sanity check could be rather expensive (e.g.
it needs to iterate over a potentially long list), what option should be used to
enable it?

I ask this question with one particular case in mind.
lf_advlockasync() has a block of code under INVARIANTS with a loop over a list
that has a nested loop over the same list for pair-wise checks.
What's worse is that that code is executed with a lock held and that lock can
potentially be highly contended (ls_lock).
In our test environment we can observe the lock being held for as much as 125
milliseconds resulting in a huge backlog on the lock.
(Even though the
requested advisory locks are all shared locks and unlocks.)

So, we would like to reduce the performance hit in that code, but still have the
benefits of INVARIANTS enabled overall.

Any suggestions are welcome.
Thank you.
--
Andriy Gapon

From owner-freebsd-arch@freebsd.org Wed Dec 28 11:56:53 2016
Return-Path:
Delivered-To: freebsd-arch@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id D7A7BC94DF4; Wed, 28 Dec 2016 11:56:53 +0000 (UTC) (envelope-from kostikbel@gmail.com)
Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 58D7B1250; Wed, 28 Dec 2016 11:56:53 +0000 (UTC) (envelope-from kostikbel@gmail.com)
Received: from tom.home (kib@localhost [127.0.0.1]) by kib.kiev.ua (8.15.2/8.15.2) with ESMTPS id uBSBukm6036421 (version=TLSv1 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO); Wed, 28 Dec 2016 13:56:46 +0200 (EET) (envelope-from kostikbel@gmail.com)
DKIM-Filter: OpenDKIM Filter v2.10.3 kib.kiev.ua uBSBukm6036421
Received: (from kostik@localhost) by tom.home (8.15.2/8.15.2/Submit) id uBSBuk7e036420; Wed, 28 Dec 2016 13:56:46 +0200 (EET) (envelope-from kostikbel@gmail.com)
X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f
Date: Wed, 28 Dec 2016 13:56:46 +0200
From: Konstantin Belousov
To: Andriy Gapon
Cc: "freebsd-arch@freebsd.org" , freebsd-fs
Subject: Re: INVARIANTS vs DIAGNOSTIC % lf_advlockasync
Message-ID: <20161228115646.GU94325@kib.kiev.ua>
References: <2225968a-7bce-b100-f3fa-a5e2eb8b9f47@FreeBSD.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <2225968a-7bce-b100-f3fa-a5e2eb8b9f47@FreeBSD.org>
User-Agent: Mutt/1.7.2 (2016-11-26)
X-Spam-Status: No, score=-2.0 required=5.0
tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.1 X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on tom.home X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 28 Dec 2016 11:56:53 -0000 On Wed, Dec 28, 2016 at 11:28:54AM +0200, Andriy Gapon wrote: > > I wonder if there are any guidelines on when to use INVARIANTS vs DIAGNOSTIC vs > something else. Should the amount of output be taken into account? Or the > performance impact? Or just the common sense? > > What I really mean is that if some sanity check could be rather expensive (e.g. > it needs to iterate over a potentially long list), what option should be used to > enabled it? > > I ask this question with one particular case in mind. > lf_advlockasync() has a block of code under INVARIANTS with a loop over a list > that has a nested loop over the same list for pair-wise checks. > What's worse is that that code is executed with a lock held and that lock can > potentially be highly contended (ls_lock). > In our test environment we can observe the lock being held for as much as 125 > milliseconds resulting in a huge backlog on the lock. (Even though the > requested advisory locks are all shared locks and unlocks.) There are at least two blocks of code in kern_lockf.c that you might want to conditionally enable, one is lf_advlockasync(), another in the beginning of lf_add_edge(). > > So, we would like to reduce the performance hit in that code, but still have the > benefits of INVARIANTS enabled over all. > We have a precedent with DIAGNOSTIC enabling very heavy weight checks, see kern/subr_vmem.c enable_vmem_check sysctl. 
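
[Editorial illustration, not part of the thread.] The approach suggested in the reply above (compile the heavy check in under a diagnostic option, then gate it on a runtime knob) can be sketched in userland C roughly as follows. This is only a sketch under stated assumptions: `EXPENSIVE_CHECKS`, `enable_pairwise_check`, `struct range`, `ranges_disjoint()` and `check_ranges()` are hypothetical names invented for the example, the overlap test merely stands in for whatever pair-wise invariant the real code verifies, and a plain variable stands in for the sysctl.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a kernel build option such as DIAGNOSTIC. */
#ifndef EXPENSIVE_CHECKS
#define EXPENSIVE_CHECKS 1
#endif

/* Hypothetical runtime knob; in a kernel this would be a sysctl variable. */
static bool enable_pairwise_check = false;

struct range {
    long start;
    long end;       /* inclusive */
};

/* O(n^2) pair-wise check: true if no two ranges overlap. */
static bool
ranges_disjoint(const struct range *r, size_t n)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = i + 1; j < n; j++)
            if (r[i].start <= r[j].end && r[j].start <= r[i].end)
                return (false);
    return (true);
}

/*
 * The cheap O(n) check always runs.  The quadratic check runs only when
 * it was compiled in and the runtime knob has been switched on.
 */
static void
check_ranges(const struct range *r, size_t n)
{
    for (size_t i = 0; i < n; i++)
        assert(r[i].start <= r[i].end);
#if EXPENSIVE_CHECKS
    if (enable_pairwise_check)
        assert(ranges_disjoint(r, n));
#endif
}
```

With the knob off, the caller pays only for the linear pass, so the time spent under a contended lock stays bounded; turning the knob on restores the full check when a problem is actually being chased.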
From owner-freebsd-arch@freebsd.org Thu Dec 29 11:46:00 2016 Return-Path: Delivered-To: freebsd-arch@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id F3076C96021 for ; Thu, 29 Dec 2016 11:46:00 +0000 (UTC) (envelope-from mjguzik@gmail.com) Received: from mail-wm0-x244.google.com (mail-wm0-x244.google.com [IPv6:2a00:1450:400c:c09::244]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 87D62144E for ; Thu, 29 Dec 2016 11:46:00 +0000 (UTC) (envelope-from mjguzik@gmail.com) Received: by mail-wm0-x244.google.com with SMTP id c85so27941170wmi.1 for ; Thu, 29 Dec 2016 03:46:00 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to:user-agent; bh=SV8LlheFMZZI9Tm7RatZ+AcuQi/sHsOqI0qrEjVLff4=; b=qAT/xydr+oOvkZO0YWmIYGsYS4/vZbNzrvJczJpWZ7FY0oYYCsLz+GsdrnlHG0TbjM cVYfFrezjANJY/0AkQthe1GoOnK/BvVjdeCPxRol/Ey5hVAhDGMcQCvDojEGBGB9EDp+ XO21d6jJl0v2FXsMd/G48g1YD/TYbihvMyqmJMPbcrHsZilaIGRQM+1ySk9+tyH0Vcy7 nK1EURVrMN2kUkO5YKlo3hsKDukDc1AMp5PT8L9OuIGI0fRM2bia8hmeDSJVU5MUpnDK 5VRplIZYUfK6xIa5+v9idw66i5FYc8Up80+Jyk8kLMgq4y7umQZl0cLIYNgKhj8+jajo uKZA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to:user-agent; bh=SV8LlheFMZZI9Tm7RatZ+AcuQi/sHsOqI0qrEjVLff4=; b=At1v1RTcR4eC6fx0h09XBNnTaSmB05BsmqIR4bhjOOCTgZ9r+rZ/eDuJpSX2SoFwpP SlVXfe91GCHLnmqTVmYbWts9e/Zzm0b487/2vxTt1t+7J7Dz0mbX8XsrIiVde1ca08ky Df6bKim95+omb5qx3UAAPmDw0S1BpahANNwgiK0PYcJQC2e4ZdUmKxjaZLTZKmFKamjB jW67T9btuMpCs39aP3nxyZ/ApRz7m+ytgEOlmTqRR2dfnMF5T+mOeWSGxuPAtX1F228g 
chj78VOBOcAphD78STxCp/vcOsnIcVOgLhcHsaPbX3Q868fDGkSaGjqDkCrQLRM2RQ5n +nHg== X-Gm-Message-State: AIkVDXK89dgah76UV/o5Sy8RViUKgb3ak4CGhD/vsqzX26i9bxKJkbuBxaL66519Ei9PWg== X-Received: by 10.28.131.72 with SMTP id f69mr34605417wmd.135.1483011958632; Thu, 29 Dec 2016 03:45:58 -0800 (PST) Received: from dft-labs.eu (n1x0n-1-pt.tunnel.tserv5.lon1.ipv6.he.net. [2001:470:1f08:1f7::2]) by smtp.gmail.com with ESMTPSA id w18sm65172277wme.9.2016.12.29.03.45.57 (version=TLS1_2 cipher=AES128-SHA bits=128/128); Thu, 29 Dec 2016 03:45:57 -0800 (PST) Date: Thu, 29 Dec 2016 12:45:55 +0100 From: Mateusz Guzik To: Bruce Evans Cc: freebsd-arch@freebsd.org Subject: Re: __read_only in the kernel Message-ID: <20161229114554.GA29676@dft-labs.eu> References: <20161127212503.GA23218@dft-labs.eu> <20161130011033.GA20999@dft-labs.eu> <20161201070246.S1023@besplex.bde.org> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <20161201070246.S1023@besplex.bde.org> User-Agent: Mutt/1.5.21 (2010-09-15) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 29 Dec 2016 11:46:01 -0000 On Thu, Dec 01, 2016 at 08:05:54AM +1100, Bruce Evans wrote: > On Wed, 30 Nov 2016, Mateusz Guzik wrote: > >>diff --git a/sys/amd64/include/param.h b/sys/amd64/include/param.h > >>index a619e395..ab66e79 100644 > >>--- a/sys/amd64/include/param.h > >>+++ b/sys/amd64/include/param.h > >>@@ -152,4 +152,6 @@ > >> #define INKERNEL(va) (((va) >= DMAP_MIN_ADDRESS && (va) < DMAP_MAX_ADDRESS) \ > >> || ((va) >= VM_MIN_KERNEL_ADDRESS && (va) < VM_MAX_KERNEL_ADDRESS)) > >> > >>+#define __read_mostly __attribute__((__section__(".data.read_mostly"))) > >>+ > >> #endif /* !_AMD64_INCLUDE_PARAM_H_ */ > > 1. Hard-coded gccism: __attribute__(). > 1a. 
Non-use of the FreeBSD macro __section() whose purpose is to make it > easy to avoid using __attribute__() for precisely what it is used for > here. > 2. Misplaced definition. Such definitions go in . This one > has no dependencies on amd64 except possibly for bugs elsewhere, but > is only in an amd64 header. > [..] > According to userland tests, section statements like the above and the > ones in don't need any linker support to work, since they > create sections as necessary. > > So the above definition in should almost perfectly for > all arches, even without linker support. Style bug (1) is smaller if > it is there. > I wanted to avoid providing the definition for archs which don't have the linker script bit and this was the only header I found which is md and effectively always included. Indeed it seems the section is harmless even without the explicit support. > >>diff --git a/sys/conf/ldscript.amd64 b/sys/conf/ldscript.amd64 > >>index 5d86b03..ae98447 100644 > >>--- a/sys/conf/ldscript.amd64 > >>+++ b/sys/conf/ldscript.amd64 > >>@@ -151,6 +151,11 @@ SECTIONS > >> KEEP (*(.gnu.linkonce.d.*personality*)) > >> } > >> .data1 : { *(.data1) } > >>+ .data_read_mostly : > >>+ { > >>+ *(.data.read_mostly) > >>+ . = ALIGN(64); > >>+ } > >> _edata = .; PROVIDE (edata = .); > >> __bss_start = .; > >> .bss : > > For arches without this linker support, the variables would be grouped > but not aligned so much. > > Aligning the subsection seems to be useless anyway. This only aligns > the first variable in the subsection. Most variables are still only > aligned according to their natural or specified alignment. This is > rarely as large as 64. But I think variables in the subsection can > be more aligned than the subsection. If they had to be (as in a.out), > then it is the responsibility of the linker to align the subsection > to more than the default if a single variable in the subsection needs > more than the default. 
> With the indended use grouping side-by-side is beneficial - as the vars in question are supposed to be rarely modified, there is no problem with them sharing a cache line. Making them all use dedicated lines would only waste memory. That said, what about the patch below. I also grepped the tree and found 2 surprises, handled in the patch. diff --git a/sys/compat/linuxkpi/common/include/linux/compiler.h b/sys/compat/linuxkpi/common/include/linux/compiler.h index c780abc..c32f1fa 100644 --- a/sys/compat/linuxkpi/common/include/linux/compiler.h +++ b/sys/compat/linuxkpi/common/include/linux/compiler.h @@ -67,7 +67,6 @@ #define typeof(x) __typeof(x) #define uninitialized_var(x) x = x -#define __read_mostly __attribute__((__section__(".data.read_mostly"))) #define __always_unused __unused #define __must_check __result_use_check diff --git a/sys/conf/ldscript.amd64 b/sys/conf/ldscript.amd64 index 5d86b03..d87d607 100644 --- a/sys/conf/ldscript.amd64 +++ b/sys/conf/ldscript.amd64 @@ -151,6 +151,11 @@ SECTIONS KEEP (*(.gnu.linkonce.d.*personality*)) } .data1 : { *(.data1) } + .data.read_mostly : + { + *(.data.read_mostly) + } + . 
= ALIGN(64); _edata = .; PROVIDE (edata = .); __bss_start = .; .bss : diff --git a/sys/dev/drm2/drm_os_freebsd.h b/sys/dev/drm2/drm_os_freebsd.h index dc01c6a..11c9feb 100644 --- a/sys/dev/drm2/drm_os_freebsd.h +++ b/sys/dev/drm2/drm_os_freebsd.h @@ -80,7 +80,6 @@ typedef void irqreturn_t; #define __init #define __exit -#define __read_mostly #define BUILD_BUG_ON(x) CTASSERT(!(x)) #define BUILD_BUG_ON_NOT_POWER_OF_2(x) diff --git a/sys/sys/systm.h b/sys/sys/systm.h index a1ce9b4..5f646ff 100644 --- a/sys/sys/systm.h +++ b/sys/sys/systm.h @@ -445,4 +445,6 @@ extern void (*softdep_ast_cleanup)(void); void counted_warning(unsigned *counter, const char *msg); +#define __read_mostly __section(".data.read_mostly") + #endif /* !_SYS_SYSTM_H_ */ -- Mateusz Guzik From owner-freebsd-arch@freebsd.org Thu Dec 29 15:31:43 2016 Return-Path: Delivered-To: freebsd-arch@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id A5F2CC963E1 for ; Thu, 29 Dec 2016 15:31:43 +0000 (UTC) (envelope-from mjguzik@gmail.com) Received: from mail-wj0-x244.google.com (mail-wj0-x244.google.com [IPv6:2a00:1450:400c:c01::244]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 34F9016B7 for ; Thu, 29 Dec 2016 15:31:43 +0000 (UTC) (envelope-from mjguzik@gmail.com) Received: by mail-wj0-x244.google.com with SMTP id qs7so20334959wjc.1 for ; Thu, 29 Dec 2016 07:31:43 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to:user-agent; bh=BCc3pYoQktWBW4mNkJhejVp3BfLObhJBmxU1Oeodw+U=; b=VnooW3XXuGJSZxsUMMo5HHLCgdod2dUSxW93Y/5KvN41Z5TZY7gExiLmz6DFECwzYf tXeLyL4sZyWrarXIKLWiHyKP1vyVI7fwgJ1/4vFtGrXNAcNmJQg7kDSxxRNslsCV1BNM 
h7V9LvMO59SddyxX8HMi1e3toVqcazgLv68jHXuiXs/FjRKYTei4bzCM5UkvWoZ6G1qN 2DqZjh26SPya3AwUzoUqoLLetRasc7fHgRRshsTduLy05IKH8oaNsyIKo/CAB3ohLTaa JTuNP89WmmxsC5SXhzDsmMjM3Via9J34z79MxxcM5TbvPqvUAZMpbbuniH7NbNVf5QT5 V+SA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to:user-agent; bh=BCc3pYoQktWBW4mNkJhejVp3BfLObhJBmxU1Oeodw+U=; b=RNksOxHVQeScmy54UIUQGEsKCwWcmhJ+zCcksMJUtM6R2NkfVAqc0aqOJ1bNHMa5Pd oF68KqCDO3aW6IYvXbt8YoP9dsKQIUpGmB+wLbiJtO8GIUZop53xCr7FDuQaM5lDy7rd N2aKHme8MxrhysteVI0ma1gTpKa7Vj+UAQiIkWaXCHQXXQ5I5MESsfDLGpheLYch7VEW NLOVldBadO+YGlbKxFWSQnaCzO3xzqGYaxPirAvqZIt+RxNIjmKGPq2QivgSKnOhMEoo nIsY5LrsqlNF5SSBrV5FD+VBGANDTwVgs09YnIbmbs9Gus6o77W3hX16rERmwPBV5VYV IVjg== X-Gm-Message-State: AIkVDXL4/62jSLicot4QUeyDK1XjCd+uGiL+lsZZWjxI3JPmCX7k6sM0pjbqef+qTn3qyw== X-Received: by 10.194.93.104 with SMTP id ct8mr43685498wjb.87.1483025501314; Thu, 29 Dec 2016 07:31:41 -0800 (PST) Received: from dft-labs.eu (n1x0n-1-pt.tunnel.tserv5.lon1.ipv6.he.net. 
[2001:470:1f08:1f7::2]) by smtp.gmail.com with ESMTPSA id f76sm65954587wmd.15.2016.12.29.07.31.40 (version=TLS1_2 cipher=AES128-SHA bits=128/128); Thu, 29 Dec 2016 07:31:40 -0800 (PST) Date: Thu, 29 Dec 2016 16:31:38 +0100 From: Mateusz Guzik To: Bruce Evans Cc: freebsd-arch@freebsd.org Subject: Re: __read_only in the kernel Message-ID: <20161229153138.GC29676@dft-labs.eu> References: <20161127212503.GA23218@dft-labs.eu> <20161130011033.GA20999@dft-labs.eu> <20161201070246.S1023@besplex.bde.org> <20161229114554.GA29676@dft-labs.eu> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <20161229114554.GA29676@dft-labs.eu> User-Agent: Mutt/1.5.21 (2010-09-15) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 29 Dec 2016 15:31:43 -0000 On Thu, Dec 29, 2016 at 12:45:55PM +0100, Mateusz Guzik wrote: > On Thu, Dec 01, 2016 at 08:05:54AM +1100, Bruce Evans wrote: > > On Wed, 30 Nov 2016, Mateusz Guzik wrote: > > >>diff --git a/sys/amd64/include/param.h b/sys/amd64/include/param.h > > >>index a619e395..ab66e79 100644 > > >>--- a/sys/amd64/include/param.h > > >>+++ b/sys/amd64/include/param.h > > >>@@ -152,4 +152,6 @@ > > >> #define INKERNEL(va) (((va) >= DMAP_MIN_ADDRESS && (va) < DMAP_MAX_ADDRESS) \ > > >> || ((va) >= VM_MIN_KERNEL_ADDRESS && (va) < VM_MAX_KERNEL_ADDRESS)) > > >> > > >>+#define __read_mostly __attribute__((__section__(".data.read_mostly"))) > > >>+ > > >> #endif /* !_AMD64_INCLUDE_PARAM_H_ */ > > > > 1. Hard-coded gccism: __attribute__(). > > 1a. Non-use of the FreeBSD macro __section() whose purpose is to make it > > easy to avoid using __attribute__() for precisely what it is used for > > here. > > 2. Misplaced definition. Such definitions go in . 
This one > > has no dependencies on amd64 except possibly for bugs elsewhere, but > > is only in an amd64 header. > > > [..] > > According to userland tests, section statements like the above and the > > ones in don't need any linker support to work, since they > > create sections as necessary. > > > > So the above definition in should almost perfectly for > > all arches, even without linker support. Style bug (1) is smaller if > > it is there. > > > > I wanted to avoid providing the definition for archs which don't have > the linker script bit and this was the only header I found which is md > and effectively always included. > > Indeed it seems the section is harmless even without the explicit > support. > > > >>diff --git a/sys/conf/ldscript.amd64 b/sys/conf/ldscript.amd64 > > >>index 5d86b03..ae98447 100644 > > >>--- a/sys/conf/ldscript.amd64 > > >>+++ b/sys/conf/ldscript.amd64 > > >>@@ -151,6 +151,11 @@ SECTIONS > > >> KEEP (*(.gnu.linkonce.d.*personality*)) > > >> } > > >> .data1 : { *(.data1) } > > >>+ .data_read_mostly : > > >>+ { > > >>+ *(.data.read_mostly) > > >>+ . = ALIGN(64); > > >>+ } > > >> _edata = .; PROVIDE (edata = .); > > >> __bss_start = .; > > >> .bss : > > > > For arches without this linker support, the variables would be grouped > > but not aligned so much. > > > > Aligning the subsection seems to be useless anyway. This only aligns > > the first variable in the subsection. Most variables are still only > > aligned according to their natural or specified alignment. This is > > rarely as large as 64. But I think variables in the subsection can > > be more aligned than the subsection. If they had to be (as in a.out), > > then it is the responsibility of the linker to align the subsection > > to more than the default if a single variable in the subsection needs > > more than the default. 
> >
>
> With the indended use grouping side-by-side is beneficial - as the vars
> in question are supposed to be rarely modified, there is no problem with
> them sharing a cache line. Making them all use dedicated lines would
> only waste memory.
>
> That said, what about the patch below. I also grepped the tree and found
> 2 surprises, handled in the patch.
>

Scratch the previous patch. I extended it with __exclusive_cache_line
(happy with ideas for a better name) - to be used for something which has
to be alone in the cacheline, e.g. an atomic counter.

diff --git a/sys/compat/linuxkpi/common/include/linux/compiler.h b/sys/compat/linuxkpi/common/include/linux/compiler.h
index c780abc..c32f1fa 100644
--- a/sys/compat/linuxkpi/common/include/linux/compiler.h
+++ b/sys/compat/linuxkpi/common/include/linux/compiler.h
@@ -67,7 +67,6 @@
 #define typeof(x) __typeof(x)
 #define uninitialized_var(x) x = x
-#define __read_mostly __attribute__((__section__(".data.read_mostly")))
 #define __always_unused __unused
 #define __must_check __result_use_check
diff --git a/sys/conf/ldscript.amd64 b/sys/conf/ldscript.amd64
index 5d86b03..45685a4 100644
--- a/sys/conf/ldscript.amd64
+++ b/sys/conf/ldscript.amd64
@@ -150,6 +150,17 @@ SECTIONS
     *(.data .data.* .gnu.linkonce.d.*)
     KEEP (*(.gnu.linkonce.d.*personality*))
   }
+  . = ALIGN(64);
+  .data.read_mostly :
+  {
+    *(.data.read_mostly)
+  }
+  . = ALIGN(64);
+  .data.exclusive_cache_line :
+  {
+    *(.data.exclusive_cache_line)
+  }
+  . = ALIGN(64);
   .data1 : { *(.data1) }
   _edata = .; PROVIDE (edata = .);
   __bss_start = .;
diff --git a/sys/dev/drm2/drm_os_freebsd.h b/sys/dev/drm2/drm_os_freebsd.h
index dc01c6a..11c9feb 100644
--- a/sys/dev/drm2/drm_os_freebsd.h
+++ b/sys/dev/drm2/drm_os_freebsd.h
@@ -80,7 +80,6 @@ typedef void irqreturn_t;
 #define __init
 #define __exit
-#define __read_mostly
 #define BUILD_BUG_ON(x) CTASSERT(!(x))
 #define BUILD_BUG_ON_NOT_POWER_OF_2(x)
diff --git a/sys/sys/systm.h b/sys/sys/systm.h
index a1ce9b4..719e063 100644
--- a/sys/sys/systm.h
+++ b/sys/sys/systm.h
@@ -445,4 +445,8 @@ extern void (*softdep_ast_cleanup)(void);
 
 void counted_warning(unsigned *counter, const char *msg);
 
+#define __read_mostly __section(".data.read_mostly")
+#define __exclusive_cache_line __aligned(CACHE_LINE_SIZE) \
+    __section(".data.exclusive_cache_line")
+
 #endif /* !_SYS_SYSTM_H_ */

--
Mateusz Guzik
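
[Editorial illustration, not part of the thread.] For readers who want to see how the two annotations from the patch would be used, here is a rough userland analogue in C. It is a sketch under assumptions rather than the kernel code: `READ_MOSTLY`, `EXCLUSIVE_CACHE_LINE` and every variable and function name below are hypothetical, a 64-byte cache line is assumed to match the `ALIGN(64)` in the linker script, and the section attributes are only applied on ELF targets with a GCC-compatible compiler. As noted earlier in the thread, the compiler and linker create the named sections on their own, so no linker script change is needed just to try this.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines, matching ALIGN(64) in the patch. */
#define CACHE_LINE_SIZE 64

/*
 * Hypothetical userland analogues of the kernel macros; upper-case names
 * keep them out of the reserved-identifier namespace.
 */
#if defined(__ELF__) && defined(__GNUC__)
#define READ_MOSTLY \
    __attribute__((__section__(".data.read_mostly")))
#define EXCLUSIVE_CACHE_LINE \
    __attribute__((__aligned__(CACHE_LINE_SIZE), \
        __section__(".data.exclusive_cache_line")))
#else
/* Fallback: no section grouping, alignment only. */
#define READ_MOSTLY
#define EXCLUSIVE_CACHE_LINE _Alignas(CACHE_LINE_SIZE)
#endif

/* Set once, then only read: these may happily share a cache line. */
static READ_MOSTLY int max_open_files = 1024;
static READ_MOSTLY int reserved_files = 16;

/* Written often: each counter starts on its own cache line. */
static EXCLUSIVE_CACHE_LINE uint64_t rx_packets;
static EXCLUSIVE_CACHE_LINE uint64_t tx_packets;

static int
is_line_aligned(const void *p)
{
    return ((uintptr_t)p % CACHE_LINE_SIZE == 0);
}

/* Readers only touch the read-mostly variables. */
static int
may_open(int nopen)
{
    return (nopen < max_open_files - reserved_files);
}

static uint64_t
count_rx(void)
{
    assert(is_line_aligned(&rx_packets));
    return (++rx_packets);
}

static uint64_t
count_tx(void)
{
    assert(is_line_aligned(&tx_packets));
    return (++tx_packets);
}
```

The alignment attribute only guarantees where each counter starts. In the patch it is the trailing `. = ALIGN(64);` after the section that keeps unrelated data from landing behind the last counter on the same line; this userland sketch has no such guarantee for the final object in the section.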