From owner-freebsd-hackers@FreeBSD.ORG Sun Jul 7 10:33:27 2013 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id B5AE8CD1 for ; Sun, 7 Jul 2013 10:33:27 +0000 (UTC) (envelope-from kaushalgoa@gmail.com) Received: from mail-ea0-x22a.google.com (mail-ea0-x22a.google.com [IPv6:2a00:1450:4013:c01::22a]) by mx1.freebsd.org (Postfix) with ESMTP id 506F71EF0 for ; Sun, 7 Jul 2013 10:33:27 +0000 (UTC) Received: by mail-ea0-f170.google.com with SMTP id h10so2379540eaj.1 for ; Sun, 07 Jul 2013 03:33:26 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :content-type; bh=ovYkycN8QPZXRjvoBzmVRGhVP1VpWyqRt3F8zimbgcM=; b=QVpiQM3PTDQhNo6TXmNAdNpFRhSKi6ZVszWJh9tE7pQ2KBeaCSNvow6Xkih/GaaOv9 0PUTNgKmZh6KbPL4EN2fBFVci158s2SrkEktbYCm2rAlyRHieaL6QbfbwtgckJV9GjTO WAJ9A6kRfLhV96sy1fPgMr2q8FMODcp7ij6U3mFfL95S36L7G9ND49Ii9MWohLxhipRe QefdC9kVWb9JPGqNDhsDQ+p4DNztbHZeBprZLy4eU7ueKXSWXK+H4ke3UzIBDXzvYeOB ZgwbiGk7ZfHTl4w2vlJo9k4XGPibsDHEQw1I46oTk97EvmCewwFPDZQ7F30DnRMgN39Z maBg== X-Received: by 10.14.110.194 with SMTP id u42mr19778735eeg.128.1373193206255; Sun, 07 Jul 2013 03:33:26 -0700 (PDT) MIME-Version: 1.0 Received: by 10.14.138.141 with HTTP; Sun, 7 Jul 2013 03:33:06 -0700 (PDT) In-Reply-To: References: From: Kaushal Bhandankar Date: Sun, 7 Jul 2013 16:03:06 +0530 Message-ID: Subject: Fwd: ixgbe Jumbo race condition leading to Deadlock To: freebsd-hackers@freebsd.org Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.14 X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 07 Jul 2013 10:33:27 -0000 In 82599, for a Jumbo packet of 9.5 K ( 
which consumes 5 descriptors of 2048 bytes each), when does the descriptor write-back happen? Does it happen per descriptor, or once for the whole group of descriptors? Is it possible for all descriptors except the last one to be written back, so that when I read the RDH register the last descriptor is still pending inside the 82599?

We are using srrctl |= IXGBE_SRRCTL_DESCTYPE_ADV_ONEBUF;

In my setup, I don't see EOP set even after I have read 5 descriptors. Checking DD then gives me an incomplete packet. What should I do in such a case?

Reference from the data sheet:
-> Checking through DD bits eliminates a potential race condition: all descriptor data is posted internally prior to incrementing the head register and a read of the head register could potentially pass the descriptor waiting inside the 82599.

Regards,
Kaushal

From owner-freebsd-hackers@FreeBSD.ORG Sun Jul 7 11:45:33 2013 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 10D591FC for ; Sun, 7 Jul 2013 11:45:33 +0000 (UTC) (envelope-from mngesh1@yahoo.com) Received: from nm43.bullet.mail.ne1.yahoo.com (nm43.bullet.mail.ne1.yahoo.com [98.138.120.50]) by mx1.freebsd.org (Postfix) with ESMTP id A7D141100 for ; Sun, 7 Jul 2013 11:45:32 +0000 (UTC) Received: from [127.0.0.1] by nm43.bullet.mail.ne1.yahoo.com with NNFMP; 07 Jul 2013 11:45:26 -0000 Received: from [98.139.214.32] by tm15.bullet.mail.ne1.yahoo.com with NNFMP; 07 Jul 2013 11:41:43 -0000 Received: from [98.139.212.229] by tm15.bullet.mail.bf1.yahoo.com with NNFMP; 07 Jul 2013 11:41:43 -0000 Received: from [127.0.0.1] by omp1038.mail.bf1.yahoo.com with NNFMP; 07 Jul 2013 11:41:43 -0000 X-Yahoo-Newman-Property: ymail-4 X-Yahoo-Newman-Id: 463095.10658.bm@omp1038.mail.bf1.yahoo.com Received: (qmail 7420 invoked by uid 60001); 7 Jul 2013 11:41:43 -0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=yahoo.com; s=s1024; t=1373197303; 
bh=dFIqwPmRRVrAgW1ZIUZLqMGqtb+B61koimZpNNZnRXA=; h=X-YMail-OSG:Received:X-Rocket-MIMEInfo:X-Mailer:Message-ID:Date:From:Reply-To:Subject:To:MIME-Version:Content-Type; b=bzfcNUq17LEmKjPEsRELiFXnzKCcW6RC42HS4aqKn/+1fIsksxH9TrqXg5OfGy9Pi256wlBLlgtOo6YOd8hyi3SsqvC30Tqe1l9byLTbTR/taGaVZ84Eyeqd2usIvUD+TEtatpWsiCKD5tuhaUsgEp9v1ojBGlWrXpnnhQj+5+o= DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=s1024; d=yahoo.com; h=X-YMail-OSG:Received:X-Rocket-MIMEInfo:X-Mailer:Message-ID:Date:From:Reply-To:Subject:To:MIME-Version:Content-Type; b=FsMQH1zdqoS3ncaLU5CKXWIOHYpTpUIg7cThM6ngRyEMoGh2KBQChIzEoOSmDOg5KYx2ruJCvuTxjKt2970QhM1bez1nFhHDc0ozvcFjwWP5wNPbFQK7BId2EtYcNT8erMD1h9KNGWomMi4E5fq0BQ7m4IkxeCxFOmBl4wymSsc= ; X-YMail-OSG: wZZuB08VM1kh9czDlnotZzIosQUfo._j10SSukMweBjFirF ZX6CwJyG9pxeNNxAgQOquuenY0CFVdaH4C1GjQFOF3l.5Cmn8CbYBj8Uydxl iMgA7Kg2dfmp6N8.xCco7Oe3kEM2IPiJz82mi3ROsU1SlsFw5Do4tXRQhIza zXQWUA9VLAmqxQqUVqDdjgMJgTjcp8y.wFlN2KYdQXVy9kEQqXcsShBzTxIE qbqLm14J4r7T9TT5357qG1sn2YLWNEOKYxpJG1WNx5.aC0x9mXv4hH4W.N5U LL99yi4fh7M1_DXmHiroZ5lhYn6LPm51gebR1iPmChx3qqY_GWHoaFQlH5QG DM_Q2mbrTQLYj2x1qFf2fpbgD1.XZpE68oHEewpymcVieDHuo._RkO9X_Z4M E2U6Ox6Q_Qd.9aT0sjaKD.SUC24ZkAcSJNXxWXLW6nb_VnyBappeJlzJaBWy Ko.lo5BIHxOBJYq_yrKoh6Q0ZmZa8aB4KCFt0pVth7LUdKmzSRH.OK0gOKS1 F243qyEFd0jhel_G38mCh20wMGVIhhzMaVwYHMVCMxmv3FJTjyY2c7vbaGrL ZyRoNdZriUG25lwyjQjvGagm22s7V_Odwy7s8ZFr4Fvw- Received: from [180.215.1.137] by web160703.mail.bf1.yahoo.com via HTTP; Sun, 07 Jul 2013 04:41:43 PDT X-Rocket-MIMEInfo: 002.001, SGksCgpXaGF0IGlzIHRoZcKgbWVtbWFwwqBlcXVpdmFsZW50IG9mIExpbnV4IGluwqBGcmVlQlNEPwoKSW4gTGludXjCoG1lbW1hcMKgaXMgdXNlZCB0byByZXNlcnZlIGEgcG9ydGlvbiBvZiBwaHlzaWNhbCBtZW1vcnkuIFRoaXMgaXMgdXNlZCBhcyBhIGtlcm5lbCBib290IGFyZ3VtZW50LiBFLmcuOiBtZW1tYXA9MkckMUcgd2lsbCByZXNlcnZlIDFHQsKgbWVtb3J5IGFib3ZlIDJHQizCoMKgaW5jYXNlwqBJwqBoYXZlIDNHQsKgUkFNLiBUaGlzIDFHQsKgcmVzZXJ2ZWQgbWVtb3J5IGlzIG5vdCB2aXNpYmwBMAEBAQE- X-Mailer: YahooMailWebService/0.8.148.557 Message-ID: 
<1373197303.40304.YahooMailNeo@web160703.mail.bf1.yahoo.com> Date: Sun, 7 Jul 2013 04:41:43 -0700 (PDT) From: mangesh chitnis Subject: memmap in FreeBSD To: "freebsd-hackers@freebsd.org" MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.14 X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list Reply-To: mangesh chitnis List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 07 Jul 2013 11:45:33 -0000

Hi,

What is the FreeBSD equivalent of Linux's memmap?

In Linux, memmap is used to reserve a portion of physical memory. It is passed as a kernel boot argument. E.g.: memmap=1G$2G will reserve 1GB of memory above 2GB, in case I have 3GB of RAM. This 1GB of reserved memory is not visible to the OS, but it can still be used through ioremap.
How can I reserve memory in FreeBSD and use it later, i.e. what are the memmap and ioremap equivalents?

I have tried using the hw.physmem loader parameter.
I have 3 GB of system memory and I have set hw.physmem=2G.

sysctl -a shows:
hw.physmem: 2.12G
hw.usermem: 1.9G
hw.realmem: 2.15G

devinfo -rv shows:
ram0:
0x00-0x9f3ff
0x10000000-0xbfedffff
0xbff00000-0xbfffffff

Here, it looks like it is showing the full 3 GB mapping.
Now, how do I know which region is the 1 GB that is left available (in Linux, this memory is shown as reserved in /proc/iomem under System RAM)? Also, which function (similar to ioremap) should I call to map the physical address to a virtual address?

Thanks.
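As a rough cross-check of the devinfo output quoted in the message above, the three ram0 ranges can be summed by hand. The following is only a sketch: it assumes the ranges are inclusive and were transcribed correctly in the message, and that the shell evaluates arithmetic with at least 64-bit integers (the values exceed 2^31).

```shell
#!/bin/sh
# Sum the sizes of the ram0 ranges reported by "devinfo -rv" in the message.
# Each range is treated as inclusive, so its size is end - start + 1.
total=0
for range in 0x00-0x9f3ff 0x10000000-0xbfedffff 0xbff00000-0xbfffffff; do
    start=${range%-*}    # text before the dash
    end=${range#*-}      # text after the dash
    size=$(( $end - $start + 1 ))
    total=$(( $total + $size ))
    echo "$range: $size bytes"
done
echo "total: $total bytes ($(( $total / 1048576 )) MiB)"
```

With the ranges as quoted, the total works out to about 2816 MiB, roughly 2.75 GiB, which is consistent with the observation that devinfo is describing the physical layout of the machine rather than the hw.physmem limit.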
From owner-freebsd-hackers@FreeBSD.ORG Sun Jul 7 11:45:47 2013 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 0A7742EF for ; Sun, 7 Jul 2013 11:45:47 +0000 (UTC) (envelope-from aryeh.friedman@gmail.com) Received: from mail-pd0-x235.google.com (mail-pd0-x235.google.com [IPv6:2607:f8b0:400e:c02::235]) by mx1.freebsd.org (Postfix) with ESMTP id E266E110D for ; Sun, 7 Jul 2013 11:45:46 +0000 (UTC) Received: by mail-pd0-f181.google.com with SMTP id 14so3222351pdj.12 for ; Sun, 07 Jul 2013 04:45:46 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:date:message-id:subject:from:to:content-type; bh=ismV/tJkUX0EPav8HreOl7EIay15veTSOBTw/GG6Fto=; b=skmb0Eo0FyTNeMTiCtvVQQcoX0+hFpleQ0O9MYXsprWHYKY4YlSDWXEipzvuJ4hnzm c/q6AbGclL0mVmbd4v50nIOVK/aAdxBqoXF3KY3RTesZ6HaOL1oPtdbqLO/x6e/6xNjt EYjA8rPwb114113KywD0axdsbolVlokbhf26QfskBXQTZbH8nGH5Zw2Q422lzxuuFEzK KijA/GHEB9+HKZ/3oggrSAAMu5XNNsFUxcL7qErP+AYJi8qAvNHAic3gNFqK/cMdOzpO zK87t9mZEXz84IT3snI45/RxP0dp6Zzqohuy/sGxxjnJ5bFhVeS+q5deEnMJlyd9zdR3 FnzA== MIME-Version: 1.0 X-Received: by 10.68.197.98 with SMTP id it2mr16669859pbc.200.1373197546682; Sun, 07 Jul 2013 04:45:46 -0700 (PDT) Received: by 10.68.80.231 with HTTP; Sun, 7 Jul 2013 04:45:46 -0700 (PDT) Date: Sun, 7 Jul 2013 07:45:46 -0400 Message-ID: Subject: writing a rc.d script From: Aryeh Friedman To: FreeBSD Mailing List Content-Type: text/plain; charset=ISO-8859-1 X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 07 Jul 2013 11:45:47 -0000 I have a program I am making a port for that also requires a /usr/local/etc/rc.d script is there anywhere I can find documentation on how write one and/or a template file to follow? 
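For reference, a minimal sketch of what such a script usually looks like under the rc.subr(8) conventions. The service name myprog, the binary path, and the pidfile location are placeholders and are not taken from the program being ported; the sketch also assumes the program daemonizes itself and writes its own pidfile.

```shell
#!/bin/sh

# PROVIDE: myprog
# REQUIRE: LOGIN
# KEYWORD: shutdown
#
# Add the following line to /etc/rc.conf to enable this service:
#
# myprog_enable="YES"

# Pull in the rc.d framework from the base system.
. /etc/rc.subr

# "myprog" is a hypothetical service name used for illustration only.
name="myprog"
rcvar="myprog_enable"

# Read any myprog_* settings from rc.conf, then apply defaults.
load_rc_config $name
: ${myprog_enable:="NO"}

# Assumed install location and pidfile; adjust to the real program.
command="/usr/local/sbin/${name}"
pidfile="/var/run/${name}.pid"

# Dispatch start/stop/status/etc. to the framework.
run_rc_command "$1"
```

Once installed as /usr/local/etc/rc.d/myprog and made executable, the service should be startable with `service myprog start` after setting myprog_enable="YES" in /etc/rc.conf. In a port, the hard-coded /usr/local prefix would normally be replaced by the ports framework's prefix substitution.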
From owner-freebsd-hackers@FreeBSD.ORG Sun Jul 7 11:48:13 2013 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 66145650 for ; Sun, 7 Jul 2013 11:48:13 +0000 (UTC) (envelope-from lists@eitanadler.com) Received: from mail-vc0-x229.google.com (mail-vc0-x229.google.com [IPv6:2607:f8b0:400c:c03::229]) by mx1.freebsd.org (Postfix) with ESMTP id 2793C1135 for ; Sun, 7 Jul 2013 11:48:13 +0000 (UTC) Received: by mail-vc0-f169.google.com with SMTP id ia10so2615601vcb.14 for ; Sun, 07 Jul 2013 04:48:12 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=eitanadler.com; s=0xdeadbeef; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc:content-type; bh=2AsamYxVFGqzK4iOBKYGYjqJ2GJpt05bZwWMrTtLf34=; b=jreMOB2oVxgiU4t9V92f49WK63ycGA94DXlo4e+sMGc2Zl9adwsyblj7ODnLZCuA9h cQdCf7Z//BXrzIwbrngKnr/EZEqAZu7zSopcJZ9V2ppuzNxvJwuIfrkq+Sox8WIrk9kn NQRewZ8mQ7DHN9qzioWn9XbXoHAFG/9mtKz+c= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc:content-type:x-gm-message-state; bh=2AsamYxVFGqzK4iOBKYGYjqJ2GJpt05bZwWMrTtLf34=; b=DWiT1rjHGETpCsbR+591GzYCcDe4wcUSu+1Mlr8XnxLB+McKZzqCgBPsQbbThUlpoA BB+K6/GST6w4EMm1OmVEPy17k2soOmeIOXdJYRUs5maAhIgOlznWwPinvLIJK+mg86xk /5RHA9fiUrz8c2epiGoNZOlUu92zg7gzhGzO/Nae1nxb4CGPovLV4SphCZnQjq23sydQ S9fv8xIElk3axAqPSULSyOvhPw7hEFRJRqliyNlL409rrgi4dddd8e3DirYy4/DAbv4i SdE/rNHTsYeYtGH5kp/qAhPN6HET0zSoHxSEvyKEuMKCRG+4oxCthGq/1lrridVFhC8U aSHg== X-Received: by 10.52.34.196 with SMTP id b4mr9718363vdj.70.1373197692682; Sun, 07 Jul 2013 04:48:12 -0700 (PDT) MIME-Version: 1.0 Received: by 10.220.80.77 with HTTP; Sun, 7 Jul 2013 04:47:42 -0700 (PDT) In-Reply-To: References: From: Eitan Adler Date: Sun, 7 Jul 2013 13:47:42 +0200 Message-ID: Subject: Re: writing a rc.d script To: Aryeh Friedman Content-Type: text/plain; 
charset=UTF-8 X-Gm-Message-State: ALoCoQmNA9w4WGHhTcAlpAxrQmAgHOiDF5b6e/xw6VWpe0TfBx1Kr0QUHGHw27oMYX1khFLtiweK Cc: FreeBSD Mailing List X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 07 Jul 2013 11:48:13 -0000 On Sun, Jul 7, 2013 at 1:45 PM, Aryeh Friedman wrote: > I have a program I am making a port for that also requires a > /usr/local/etc/rc.d script is there anywhere I can find documentation > on how write one and/or a template file to follow? http://www.freebsd.org/doc/en_US.ISO8859-1/articles/rc-scripting/ http://www.freebsd.org/doc/en/books/porters-handbook/rc-scripts.html -- Eitan Adler From owner-freebsd-hackers@FreeBSD.ORG Sun Jul 7 14:49:18 2013 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id BD0C3E22 for ; Sun, 7 Jul 2013 14:49:18 +0000 (UTC) (envelope-from mad@madpilot.net) Received: from winston.madpilot.net (winston.madpilot.net [78.47.75.155]) by mx1.freebsd.org (Postfix) with ESMTP id 7DCCC17AE for ; Sun, 7 Jul 2013 14:49:17 +0000 (UTC) Received: from winston.madpilot.net (localhost [127.0.0.1]) by winston.madpilot.net (Postfix) with ESMTP id 3bpCB24NS9zFTxW; Sun, 7 Jul 2013 16:39:54 +0200 (CEST) X-Virus-Scanned: amavisd-new at madpilot.net Received: from winston.madpilot.net ([127.0.0.1]) by winston.madpilot.net (winston.madpilot.net [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id qqvtZ5DTeVTg; Sun, 7 Jul 2013 16:39:52 +0200 (CEST) Received: from tommy.madpilot.net (micro.madpilot.net [88.149.173.206]) by winston.madpilot.net (Postfix) with ESMTPSA; Sun, 7 Jul 2013 16:39:52 +0200 (CEST) Message-ID: <51D97DB8.3020808@madpilot.net> Date: Sun, 07 Jul 2013 16:39:52 +0200 From: Guido Falsi User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:17.0) 
Gecko/20130627 Thunderbird/17.0.7 MIME-Version: 1.0 To: Eitan Adler Subject: Re: writing a rc.d script References: In-Reply-To: Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit Cc: FreeBSD Mailing List , Aryeh Friedman X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 07 Jul 2013 14:49:18 -0000 On 07/07/13 13:47, Eitan Adler wrote: > On Sun, Jul 7, 2013 at 1:45 PM, Aryeh Friedman wrote: >> I have a program I am making a port for that also requires a >> /usr/local/etc/rc.d script is there anywhere I can find documentation >> on how write one and/or a template file to follow? > > http://www.freebsd.org/doc/en_US.ISO8859-1/articles/rc-scripting/ > http://www.freebsd.org/doc/en/books/porters-handbook/rc-scripts.html > > There is also the port devel/rclint which is quite useful to check rc scripts for correctness. -- Guido Falsi From owner-freebsd-hackers@FreeBSD.ORG Sun Jul 7 21:58:54 2013 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 93ABB509; Sun, 7 Jul 2013 21:58:54 +0000 (UTC) (envelope-from freebsd-list@nuos.org) Received: from cargobay.net (cargobay.net [174.136.100.98]) by mx1.freebsd.org (Postfix) with ESMTP id 69E07186E; Sun, 7 Jul 2013 21:58:54 +0000 (UTC) Received: from leonidas.ccsys.com (unknown [65.35.151.3]) by cargobay.net (Postfix) with ESMTPSA id 66532D1F; Sun, 7 Jul 2013 21:57:50 +0000 (UTC) Message-ID: <51D9E499.103@nuos.org> Date: Sun, 07 Jul 2013 21:58:49 +0000 From: "Chad J. 
Milios" User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:17.0) Gecko/20130624 Thunderbird/17.0.6 MIME-Version: 1.0 To: freebsd-chat@freebsd.org, freebsd-hackers@freebsd.org, freebsd-advocacy@freebsd.org, freebsd-user-groups@freebsd.org, freebsd-rc@freebsd.org Subject: Announcing: nuOS 0.0.9.1b1 - a whole NEW FreeBSD distro, NOT a fork Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 07 Jul 2013 21:58:54 -0000 PLEASE reply to this only in freebsd-chat. I have posted this announcement to five freebsd mailing lists, I hope I am not overstepping. Hello everybody. My name's Chad J. Milios. Long-time lurker, sparse rare sporadic poster. TL;DR? -- Skip below to our summary of features in an outline format then grab it at http://nuos.org . I would like to enthusiastically announce the release of an open-source project of much pride and passion of my good friend Scott C. Ziegler and myself which we have brought forth thanks to the support and contributions of about 15 others. I believe it is solidly ready to be shared with the world in the hopes that others may help build out the software and community in a way that promotes quality, innovation and collaboration much like FreeBSD has led the open-source community at doing. The nuOS project ( http://nuos.org ) is about bringing back the power to the people! Currently, technical software, hardware and networking power. Ultimately, the power of personal communication and community self-organization. 
Currently made by geeks/nerds/hackers for geeks/nerds/hackers, our intent is to create an entirely new software ecosystem that promotes quality, easy to use software that is for any-and-every man woman and child yet without lassoing us all into one herd of sheeple. ;) Simple, common things should always be EASY. Complex, amazing or never-before imagined things should always be POSSIBLE. We have a live image for download from our site. (Fully functional at 189 MB, just cat or dd to your 4 GB or larger usb drive or select it as a flat-file virtual disk in your hypervisor of choice. It is not an ISO and nuOS does not work well from optical media.) Or grab our source (currently hosted by GitHub at https://github.com/CropCircleSys/nuOS ) and build the entire system from any FreeBSD 9.1 system with one simple yet deeply customizable command. (We only build/test on amd64 and would like that to change in the future.) It is my belief that our software is PRODUCTION READY with our new beta release. It might just be the answer to the management headaches you may be having. Take the plunge tonight and find yourself breezing through your day-job with "nu"-found ease tomorrow morning. If you're the comfortable yet cautious type, watch the discussion for a week or two first instead. Either way, we intend to cause a positive large and lasting motion in the FreeBSD community. I hope you will give nuOS a look and offer your assessments and ask any questions you have. Please tear it and us apart in discussion with the goal of a better FreeBSD for us all! Documentation is one area that is sorely lacking though it is mostly because Scott and I consider most of our code clear enough to have been pretty self-documenting [for our purposes we've had until now]. It is our hope that with the community's help we will bring more and more of this platform to the high standard of quality that FreeBSD is known for. We aren't trying to create our own new garden. 
We offer this code with hopes that it, in part or in whole, might some day be included in canonical FreeBSD releases. We have NO intention of forking FreeBSD and are instead developing a very lightweight suite of tools which hopefully capture and collect modern best practices while providing a testing and proving ground for advanced FreeBSD features. We want to bring computing to more people, bring more computer users to open source, bring more high-value and responsible open-source users to FreeBSD and bring more current FreeBSD users guidance and enlightenment regarding advanced features in the face of FreeBSD's typical adherence to maximal backward compatibility, legacy support and solid ground yet sometimes daunting array of intimately detailed configuration choices. We do not seek to limit those choices or to shift the ground beneath current FreeBSD users' feet. We seek to offer an alternative flavor of default system for those interested in taking a step back from their current perspective in order to take a giant flying leap forward. This doesn't mean giving up anything in terms of compatibility or configurability; quite the contrary. Throughout our evolution, we seek to always maintain the environment that FreeBSD users have come to know and love while reducing the issues that sometimes irk them. We simply seek to provide a better way to structure, provision and maintain production systems and development processes. Outline of features: Extends plain old FreeBSD 9.1 (RELEASE or STABLE) and maintains total compatibility We seek to remain nimble Expect a production-ready seal of approval to lag behind releases by no more than a week or two and prebuilt images and packages e.g. 
releases like 9.2 and 10.0, et al Someone should be able to build it and use all applicable features on 8.4 with ease we simply haven't the time or inclination to even try Default full ZFS filesystem layout, completely legacy-free Boot from ZFS, boot to ZFS If you'd like use all 100.0% of all your drives for one large zpool Use one large zpool for all of your filesystems block volumes alternate boot environments, including one called "rescue" which is included NO partitions, not some tiny /, not even a /boot Just ZFS datasets in their infinite flexibility /etc is now a ZFS dataset of its own How did we do it? Decades of conventional wisdom says /etc must be on /. Check it out, discuss the whys and the trade-offs. nu_jail - provision all sorts of jails No guesswork Yet no cookie-cutter limitations Clean-room jails provisioned almost instantly ZFS clone of /etc and /var give you almost no storage overhead nullfs and/or unionfs mounts of /, /usr, /usr/local give you almost no memory overhead Run 1,000 jails and 10,000 Apache instances they safely access the same executable memory pages they securely know not of one-another's existence Advanced intra-host networking with VIMAGE kernel by default, simplified Made for developers who want robustness, power and flexibility streamlined for Unlimited development, testing, staging and production environments Uses all of the new jail and vnet features of FreeBSD 9.1 We cleaned out all of the cruft left over from earlier versions That is just a taste of the features that we consider complete enough for use in your PRODUCTION systems. There are many more features production ready, our approach to package management for instance is in the early stages and provides simple functionality but does so in a way that is predictable, reliable and SOLID. It is also our strong commitment that we will never cram any of these features down your throat. 
You may take some a la carte without penalty and you may bring your own tools like pkg-ng, portupgrade or portmaster. We never store data in strange places or formats; we use the standard editable text configuration files and other sanctioned FreeBSD ways-of-doing-things as a single source of truth. ALL of the nuOS system is manageable from the command line and those utilities have no external dependencies, just sh, sed, awk and make from the base FreeBSD system. APIs still being built atop our core utilities and being packaged for open-source release expose interfaces such as HTTP REST, SNMPv3 and Mailman and may do so using advanced software packages from the ports collection. Functionality will NOT be introduced in APIs, web-apps or GUIs that is not equally usable, first-class, from the command line. Not even curses GUIs. Curse curses! All that being said, the project is in its infancy. Just breaching the birth-canal, quite literally, with this announcement. It's not going to do your work for you or cook you dinner just yet. What it offers is clean and complete. Incomplete areas will be clearly marked with orange cones and yellow tape. They will not impede your path should you decide to avoid them. It should be noted that the nuOS project is a loose not-for-profit association currently sponsored by a for-profit corporation, Crop Circle Systems, Inc. ( http://ccsys.com ) of which I am a founder. (A corporation with a market cap of about that of a used Yugo, but a for-profit corporation nonetheless.) All code released from the project is and shall be covered by either the Simplified BSD license or Mozilla Public License v2.0 if it is not simply placed into the public domain. WARNING: It should be noted that the live boot image includes three user accounts with default names and passwords. "joe": He's your normal barely-privileged user, employee of business or all-around troublemaker; this would be your boss. 
"ninja": That's you, technical sword for honor and/or for hire. "sumyungai": That's me, your distributor. (Or you, when you disseminate nuOS to other ninjas along with your value-added contribution/support.) All of this is easily customizable with a few command line options when you stage a real deployment. On the live boot image the root account has no password and the local ttys are assumed physically secured, as per FreeBSD default, so you can just log in as root from the local console and change the account passwords and/or add one for root if you like. sshd is the one service already enabled but the network is not configured by default. Uncommenting a line in /etc/rc.conf.local is all it takes to enable auto-dhcp on every interface though most admins will want to add an appropriate line for their preferred interface. Thank you from Scott and myself for reading. Hopefully I'll be thanking you for trying, discussing and contributing! --Chad J. Milios From owner-freebsd-hackers@FreeBSD.ORG Sun Jul 7 22:09:25 2013 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 5CC0B87B for ; Sun, 7 Jul 2013 22:09:25 +0000 (UTC) (envelope-from cbergstrom@pathscale.com) Received: from mail-pb0-f43.google.com (mail-pb0-f43.google.com [209.85.160.43]) by mx1.freebsd.org (Postfix) with ESMTP id 377E318E3 for ; Sun, 7 Jul 2013 22:09:25 +0000 (UTC) Received: by mail-pb0-f43.google.com with SMTP id md12so3619126pbc.30 for ; Sun, 07 Jul 2013 15:09:24 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=message-id:date:from:user-agent:mime-version:to:cc:subject :references:in-reply-to:content-type:content-transfer-encoding :x-gm-message-state; bh=5WPoTh1XwPx89kr5KC/2BkpO0h1QtPJNwYCnUpW6KJE=; b=oP3eJuyIpV4s51bXARsJG9N+tPnugo+qcgN97i8+CySZK2oEOjEAdxasapXcMlniI0 
m75iSF/qKWbdXf/j/EIA0lguFoN7hMN9rzGdMfdY9j7KD/vQGpO1ntdJRTspAS+xteBO KEGOzpdW/Olg/Q8MO5BAJQ4slKU/GW3/snSMkgvGIVQLEkxoWW+KyUCU1wjjOTJxTzsG F8tMdRV0+x5Ac8NcwKYGofJm1rILCQDcHLVe6JU1rzQnEwYDJGIu1Ai8AIoiAPESpAha PKGVszpArEnsttxykNBoR9VEoumcuXMu0KQniKLDr0bXqyChF/ye5Jezv1P85GQHoqpr 4iwg== X-Received: by 10.68.170.37 with SMTP id aj5mr13871516pbc.79.1373234964749; Sun, 07 Jul 2013 15:09:24 -0700 (PDT) Received: from [192.168.1.36] (ppp-124-121-208-199.revip2.asianet.co.th. [124.121.208.199]) by mx.google.com with ESMTPSA id wg6sm18796772pbc.3.2013.07.07.15.09.21 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Sun, 07 Jul 2013 15:09:22 -0700 (PDT) Message-ID: <51D9E641.5020905@pathscale.com> Date: Mon, 08 Jul 2013 05:05:53 +0700 From: =?ISO-8859-1?Q?=22C=2E_Bergstr=F6m=22?= User-Agent: Mozilla/5.0 (X11; SunOS i86pc; rv:10.0.6esrpre) Gecko/20120731 Thunderbird/10.0.6 MIME-Version: 1.0 To: "Chad J. Milios" Subject: Re: [SPAM] Announcing: nuOS 0.0.9.1b1 - a whole NEW FreeBSD distro, NOT a fork References: <51D9E499.103@nuos.org> In-Reply-To: <51D9E499.103@nuos.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Gm-Message-State: ALoCoQlbcAAk0o1vy0ehBD7JvvXFoREh2ChhVjS3LOozehkjfuJ1QhOZ7de21FHxq3IH8aJu9QyF Cc: freebsd-hackers@freebsd.org X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 07 Jul 2013 22:09:25 -0000 On 07/ 8/13 04:58 AM, Chad J. Milios wrote: > Outline of features: > > Extends plain old FreeBSD 9.1 (RELEASE or STABLE) and maintains total > compatibility > We seek to remain nimble > Expect a production-ready seal of approval to lag behind releases > by no more than a week or two > and prebuilt images and packages > e.g. 
releases like 9.2 and 10.0, et al > Someone should be able to build it and use all applicable > features on 8.4 with ease > we simply haven't the time or inclination to even try > Default full ZFS filesystem layout, completely legacy-free > Boot from ZFS, boot to ZFS > If you'd like use all 100.0% of all your drives for one large > zpool > Use one large zpool for all of your > filesystems > block volumes > alternate boot environments, including one called "rescue" > which is included > NO partitions, not some tiny /, not even a /boot > Just ZFS datasets in their infinite flexibility > /etc is now a ZFS dataset of its own > How did we do it? > Decades of conventional wisdom says /etc must be > on /. > Check it out, discuss the whys and the trade-offs. > nu_jail - provision all sorts of jails > No guesswork > Yet no cookie-cutter limitations > Clean-room jails provisioned almost instantly > ZFS clone of /etc and /var give you almost no storage overhead > nullfs and/or unionfs mounts of /, /usr, /usr/local give you > almost no memory overhead > Run 1,000 jails and 10,000 Apache instances > they safely access the same executable memory pages > they securely know not of one-another's existence > Advanced intra-host networking with VIMAGE kernel by default, > simplified > Made for developers who want robustness, power and flexibility > streamlined for > Unlimited development, testing, staging and production > environments > Uses all of the new jail and vnet features of FreeBSD 9.1 > We cleaned out all of the cruft left over from earlier versions omg you've created Solaris ------------ If you're going to spam commercial stuff with absolutely no technically interesting details - please keep it brief at the least. Generally people will be curious about What are you actually adding to the ISO which FBSD-current can't do? If it's not upstream already - will it be contributed upstream? 
From owner-freebsd-hackers@FreeBSD.ORG Sun Jul 7 22:28:52 2013 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 8559B657 for ; Sun, 7 Jul 2013 22:28:52 +0000 (UTC) (envelope-from bright@mu.org) Received: from elvis.mu.org (elvis.mu.org [192.203.228.196]) by mx1.freebsd.org (Postfix) with ESMTP id 7725A19B7 for ; Sun, 7 Jul 2013 22:28:52 +0000 (UTC) Received: from Alfreds-MacBook-Pro-9.local (c-67-180-208-218.hsd1.ca.comcast.net [67.180.208.218]) by elvis.mu.org (Postfix) with ESMTPSA id 55E061A3C61 for ; Sun, 7 Jul 2013 15:28:52 -0700 (PDT) Message-ID: <51D9EBA4.2060207@mu.org> Date: Sun, 07 Jul 2013 15:28:52 -0700 From: Alfred Perlstein User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:17.0) Gecko/20130620 Thunderbird/17.0.7 MIME-Version: 1.0 To: freebsd-hackers@freebsd.org Subject: Re: [SPAM] Announcing: nuOS 0.0.9.1b1 - a whole NEW FreeBSD distro, NOT a fork References: <51D9E499.103@nuos.org> <51D9E641.5020905@pathscale.com> In-Reply-To: <51D9E641.5020905@pathscale.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 8bit X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 07 Jul 2013 22:28:52 -0000 On 7/7/13 3:05 PM, "C. Bergstrm" wrote: > On 07/ 8/13 04:58 AM, Chad J. Milios wrote: > > > >> Outline of features: >> >> Extends plain old FreeBSD 9.1 (RELEASE or STABLE) and maintains total >> compatibility >> We seek to remain nimble >> Expect a production-ready seal of approval to lag behind releases >> by no more than a week or two >> and prebuilt images and packages >> e.g. 
releases like 9.2 and 10.0, et al >> Someone should be able to build it and use all applicable >> features on 8.4 with ease >> we simply haven't the time or inclination to even try >> Default full ZFS filesystem layout, completely legacy-free >> Boot from ZFS, boot to ZFS >> If you'd like use all 100.0% of all your drives for one large >> zpool >> Use one large zpool for all of your >> filesystems >> block volumes >> alternate boot environments, including one called >> "rescue" which is included >> NO partitions, not some tiny /, not even a /boot >> Just ZFS datasets in their infinite flexibility >> /etc is now a ZFS dataset of its own >> How did we do it? >> Decades of conventional wisdom says /etc must be >> on /. >> Check it out, discuss the whys and the trade-offs. >> nu_jail - provision all sorts of jails >> No guesswork >> Yet no cookie-cutter limitations >> Clean-room jails provisioned almost instantly >> ZFS clone of /etc and /var give you almost no storage overhead >> nullfs and/or unionfs mounts of /, /usr, /usr/local give you >> almost no memory overhead >> Run 1,000 jails and 10,000 Apache instances >> they safely access the same executable memory pages >> they securely know not of one-another's existence >> Advanced intra-host networking with VIMAGE kernel by default, >> simplified >> Made for developers who want robustness, power and flexibility >> streamlined for >> Unlimited development, testing, staging and production >> environments >> Uses all of the new jail and vnet features of FreeBSD 9.1 >> We cleaned out all of the cruft left over from earlier versions > > omg you've created Solaris > > ------------ > If you're going to spam commercial stuff with absolutely no > technically interesting details - please keep it brief at the least. > > Generally people will be curious about > What are you actually adding to the ISO which FBSD-current can't do? > If it's not upstream already - will it be contributed upstream? 
It seems pretty obvious to me that the contribution is that all this stuff works out of the box. That is pretty nice. -- Alfred Perlstein VP Software Engineering, iXsystems From owner-freebsd-hackers@FreeBSD.ORG Sun Jul 7 22:35:48 2013 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id B452692B for ; Sun, 7 Jul 2013 22:35:48 +0000 (UTC) (envelope-from cbergstrom@pathscale.com) Received: from mail-pa0-f44.google.com (mail-pa0-f44.google.com [209.85.220.44]) by mx1.freebsd.org (Postfix) with ESMTP id 8F3BF1A07 for ; Sun, 7 Jul 2013 22:35:48 +0000 (UTC) Received: by mail-pa0-f44.google.com with SMTP id lj1so3728577pab.3 for ; Sun, 07 Jul 2013 15:35:42 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=message-id:date:from:user-agent:mime-version:to:cc:subject :references:in-reply-to:content-type:content-transfer-encoding :x-gm-message-state; bh=I6uaTsNgvGAgao6M6PW4fissdK4jW1a9G+qspt5rwaE=; b=mXaUgqLqfUzyT5YHi1LiMU9kAYaaKypFAUT6n/B4y63FL8bGqYfXWBRNU+FUb25FzR aCR2fest8qYqDqp3llxm9mGgk/2bQ1ejFmxt+ZTiE23mS6Tt4JFMVHyRELVeVTkbyJI7 A0aPQGilfpQ0C49RoDEUf/H07n233cetttV0SVtGD/3K3hlya8/HfYwZXIbYy1yaQSCa m5DI5heT2ovRrJ57lqVr2AJq4tOp8g/8zQf0Y97Bf+qWnDCq3nucAT1AVjgXGofmvBJi eEu3qtHXbrPMT421BczbMwK7HSB0JMVV1wvnKHjA4U3cvoSVnX36S7mjKSdlUwHOAt4i 8JDA== X-Received: by 10.66.241.1 with SMTP id we1mr19857344pac.83.1373236542499; Sun, 07 Jul 2013 15:35:42 -0700 (PDT) Received: from [192.168.1.36] (ppp-124-121-208-199.revip2.asianet.co.th. 
[124.121.208.199]) by mx.google.com with ESMTPSA id vb8sm18854212pbc.11.2013.07.07.15.35.37 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Sun, 07 Jul 2013 15:35:41 -0700 (PDT) Message-ID: <51D9EC66.5010907@pathscale.com> Date: Mon, 08 Jul 2013 05:32:06 +0700 From: =?ISO-8859-1?Q?=22C=2E_Bergstr=F6m=22?= User-Agent: Mozilla/5.0 (X11; SunOS i86pc; rv:10.0.6esrpre) Gecko/20120731 Thunderbird/10.0.6 MIME-Version: 1.0 To: Alfred Perlstein Subject: Re: [SPAM] Announcing: nuOS 0.0.9.1b1 - a whole NEW FreeBSD distro, NOT a fork References: <51D9E499.103@nuos.org> <51D9E641.5020905@pathscale.com> <51D9EBA4.2060207@mu.org> In-Reply-To: <51D9EBA4.2060207@mu.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 8bit X-Gm-Message-State: ALoCoQlx/e8MgUny5qoDb+ym3sFiz+kbpuksVCglU5gO+DR4MuAw0VQF4ATv3d98AylTbGKOh4y5 Cc: freebsd-hackers@freebsd.org X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 07 Jul 2013 22:35:48 -0000 On 07/ 8/13 05:28 AM, Alfred Perlstein wrote: > On 7/7/13 3:05 PM, "C. Bergstrm" wrote: >> On 07/ 8/13 04:58 AM, Chad J. Milios wrote: >> >> >> >>> Outline of features: >>> >>> Extends plain old FreeBSD 9.1 (RELEASE or STABLE) and maintains >>> total compatibility >>> We seek to remain nimble >>> Expect a production-ready seal of approval to lag behind >>> releases by no more than a week or two >>> and prebuilt images and packages >>> e.g. 
releases like 9.2 and 10.0, et al >>> Someone should be able to build it and use all >>> applicable features on 8.4 with ease >>> we simply haven't the time or inclination to even try >>> Default full ZFS filesystem layout, completely legacy-free >>> Boot from ZFS, boot to ZFS >>> If you'd like use all 100.0% of all your drives for one >>> large zpool >>> Use one large zpool for all of your >>> filesystems >>> block volumes >>> alternate boot environments, including one called >>> "rescue" which is included >>> NO partitions, not some tiny /, not even a /boot >>> Just ZFS datasets in their infinite flexibility >>> /etc is now a ZFS dataset of its own >>> How did we do it? >>> Decades of conventional wisdom says /etc must be >>> on /. >>> Check it out, discuss the whys and the trade-offs. >>> nu_jail - provision all sorts of jails >>> No guesswork >>> Yet no cookie-cutter limitations >>> Clean-room jails provisioned almost instantly >>> ZFS clone of /etc and /var give you almost no storage overhead >>> nullfs and/or unionfs mounts of /, /usr, /usr/local give you >>> almost no memory overhead >>> Run 1,000 jails and 10,000 Apache instances >>> they safely access the same executable memory pages >>> they securely know not of one-another's existence >>> Advanced intra-host networking with VIMAGE kernel by default, >>> simplified >>> Made for developers who want robustness, power and flexibility >>> streamlined for >>> Unlimited development, testing, staging and production >>> environments >>> Uses all of the new jail and vnet features of FreeBSD 9.1 >>> We cleaned out all of the cruft left over from earlier versions >> >> omg you've created Solaris >> >> ------------ >> If you're going to spam commercial stuff with absolutely no >> technically interesting details - please keep it brief at the least. >> >> Generally people will be curious about >> What are you actually adding to the ISO which FBSD-current can't do? 
>> If it's not upstream already - will it be contributed upstream? > > It seems pretty obvious to me that the contribution is that all this > stuff works out of the box. That is pretty nice. Ok so I repeat my question ---- If the current ISO doesn't do this - why not? (bugs fixed, different configuration. etc) If it's not upstream already - will it be contributed upstream? (clearly someone is interested in this) From owner-freebsd-hackers@FreeBSD.ORG Sun Jul 7 23:06:11 2013 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 13ED434F; Sun, 7 Jul 2013 23:06:11 +0000 (UTC) (envelope-from freebsd-list@nuos.org) Received: from cargobay.net (cargobay.net [174.136.100.98]) by mx1.freebsd.org (Postfix) with ESMTP id E394C1AE5; Sun, 7 Jul 2013 23:06:10 +0000 (UTC) Received: from leonidas.ccsys.com (unknown [65.35.151.3]) by cargobay.net (Postfix) with ESMTPSA id 1CCEAD58; Sun, 7 Jul 2013 23:05:07 +0000 (UTC) Message-ID: <51D9F45E.2050000@nuos.org> Date: Sun, 07 Jul 2013 23:06:06 +0000 From: "Chad J. Milios" User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:17.0) Gecko/20130624 Thunderbird/17.0.6 MIME-Version: 1.0 To: =?ISO-8859-1?Q?=22C=2E_Bergstr=F6m=22?= Subject: Re: Announcing: nuOS 0.0.9.1b1 - a whole NEW FreeBSD distro, NOT a fork References: <51D9E499.103@nuos.org> <51D9E641.5020905@pathscale.com> In-Reply-To: <51D9E641.5020905@pathscale.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 8bit Cc: freebsd-hackers@freebsd.org, freebsd-chat@freebsd.org X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 07 Jul 2013 23:06:11 -0000 On 07/07/13 22:05, "C. 
Bergström" wrote:
>
> omg you've created Solaris
>
> ------------
> If you're going to spam commercial stuff with absolutely no
> technically interesting details - please keep it brief at the least.
>
> Generally people will be curious about
> What are you actually adding to the ISO which FBSD-current can't do?
> If it's not upstream already - will it be contributed upstream?
>

Please reply further on freebsd-chat; I'd like to consolidate any discussion this may garner.

This doesn't provide anything to the core OS that can't already be done, albeit with many more keystrokes and the peril of possible confusion and misconfiguration. The main thing here is a collection of what we consider best practices, consolidating the more useful configurations into consistent recipes with useful simplification of parameters. We don't mean to add yet another layer in the name of simplicity that obscures or hides the real nuts and bolts beneath and limits your options. We want to make things more flexible and easier at the same time by using the sanctioned FreeBSD ways of doing things, simply allowing the ones with most merit to rise to the top, hopefully through community involvement.

We've had a lot of success using this in our production deployments and hope that we don't have to be the only ones to maintain it forever. It is an open offer of contribution to The FreeBSD Project but it probably doesn't exactly belong there yet. It's a layer above, so to speak, and we think we have a place in the community working side by side. It's a distro around FreeBSD; think PicoBSD or maybe FreeNAS. It's not going to be a fork like PC-BSD or DragonFly. I'm hoping we can be a proving ground for the more advanced features of FreeBSD, by allowing more users to jump on board with them sooner, and then offer the applicable bits and pieces back upstream while continually pushing the innovation envelope in a way that more people and companies can participate in.
The tool nu_install is basically sysinstall on steroids. It doesn't do all the things that sysinstall does, and you may still use sysinstall to configure a system or a jail you've provisioned with nu_install or nu_jail. nu_install automates the process of building a ZFS-only FreeBSD system and offers a default dataset layout featuring best practices we've deduced from using ZFS on FreeBSD since its infancy and from reading and weighing many differing and conflicting ZFS-on-root how-tos. For instance, many ZFS-on-root tutorials use a UFS /boot partition and/or mountpoint=legacy and entries in /etc/fstab. We suffer neither of those holdovers. Another feature I've not yet found in any tutorial is /etc having its own dataset.

nu_jail creates cloned datasets and jail.conf entries following the school of thought set out by our nu_install base system. Jails in FreeBSD allow many use cases that were never dreamed of on other platforms, and we don't seek to force any particular cookie-cutter way of provisioning a jail, just to simplify the uses that we've found most common. We wanted ease and simplicity but refused to give up less-common possibilities, or to give up the simplicity just to tweak something a little differently to do something that's never been done.

Thank you for reading and offering your thoughts. LOL @ the Solaris comment, as I am a long-time Solaris user and fan but have always been a bigger fan of the BSDs, and of FreeBSD in particular for the last decade. In short, we seek to do with FreeBSD something like what Joyent has done with illumos in their SmartOS, but then continue further with that idea.
From owner-freebsd-hackers@FreeBSD.ORG Mon Jul 8 00:12:58 2013 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 94025B58; Mon, 8 Jul 2013 00:12:58 +0000 (UTC) (envelope-from Devin.Teske@fisglobal.com) Received: from mx1.fisglobal.com (mx1.fisglobal.com [199.200.24.190]) by mx1.freebsd.org (Postfix) with ESMTP id 63D411C9F; Mon, 8 Jul 2013 00:12:58 +0000 (UTC) Received: from smtp.fisglobal.com ([10.132.206.31]) by ltcfislmsgpa01.fnfis.com (8.14.5/8.14.5) with ESMTP id r680Cv9d025879 (version=TLSv1/SSLv3 cipher=AES128-SHA bits=128 verify=NOT); Sun, 7 Jul 2013 19:12:57 -0500 Received: from LTCFISWMSGMB21.FNFIS.com ([10.132.99.23]) by LTCFISWMSGHT03.FNFIS.com ([10.132.206.31]) with mapi id 14.02.0309.002; Sun, 7 Jul 2013 19:12:57 -0500 From: "Teske, Devin" To: "Chad J. Milios" Subject: Re: Announcing: nuOS 0.0.9.1b1 - a whole NEW FreeBSD distro, NOT a fork Thread-Topic: Announcing: nuOS 0.0.9.1b1 - a whole NEW FreeBSD distro, NOT a fork Thread-Index: AQHOe10+p/Zf2qC+20mYyH00Dv0xCplaPOCA Date: Mon, 8 Jul 2013 00:12:56 +0000 Message-ID: <13CA24D6AB415D428143D44749F57D7201FB6177@ltcfiswmsgmb21> References: <51D9E499.103@nuos.org> In-Reply-To: <51D9E499.103@nuos.org> Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [10.132.253.126] Content-Type: text/plain; charset="us-ascii" Content-ID: <15939EF16295654FBA8F994607B8F0AF@fisglobal.com> Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10432:5.10.8794, 1.0.431, 0.0.0000 definitions=2013-07-07_08:2013-07-05,2013-07-07,1970-01-01 signatures=0 Cc: FreeBSD Hackers , Devin Teske X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list Reply-To: Devin Teske List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: 
List-Help: List-Subscribe: , X-List-Received-Date: Mon, 08 Jul 2013 00:12:58 -0000

On Jul 7, 2013, at 2:58 PM, Chad J. Milios wrote:

[snip]
> /etc is now a ZFS dataset of its own
> How did we do it?
> Decades of conventional wisdom says /etc must be on /.
> Check it out, discuss the whys and the trade-offs.

Well, I see in nu_install on GitHub how you're doing it:

You added:

init_script="/boot/init.sh"

to /boot/loader.conf, which -- among other things -- does these two interesting things (variable names changed to make things more clear):

zfs rollback -r $zfs/swap/host@blank
NOTE: $zfs is equal to $( /bin/kenv vfs.root.mountfrom ) minus the leading "zfs:"

and

zfs mount $zpool/etc
NOTE: $zpool is equal to $zfs from above, leading up to (but not including) the first slash (/).

Cute. Have to say I wasn't aware of the init_script feature of loader.conf(5). Not bad.
-- 
Devin

NOTE: Paring down on the cross-posting (bad OP). Posting only to -hackers@ (as it seems to be the right place to post disseminating analysis).

_____________
The information contained in this message is proprietary and/or confidential. If you are not the intended recipient, please: (i) delete the message and all copies; (ii) do not disclose, distribute or use the message in any manner; and (iii) notify the sender immediately. In addition, please be aware that any message addressed to our domain is subject to archiving and review by persons other than the intended recipient. Thank you.
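
The two NOTEs above (how $zfs and $zpool are derived from vfs.root.mountfrom) can be sketched with plain sh parameter expansion. This is only a minimal illustration, not the actual /boot/init.sh from nu_install: the sample value "zfs:tank/os/root" and the variable name mountfrom are assumptions made up for the example, and on a running system the value would come from /bin/kenv instead.

```shell
#!/bin/sh
# Hypothetical sample value; on a live FreeBSD system one would use:
#   mountfrom=$(/bin/kenv vfs.root.mountfrom)
mountfrom="zfs:tank/os/root"

# Strip the leading "zfs:" prefix to get the root dataset name.
zfs=${mountfrom#zfs:}

# Keep everything up to (but not including) the first slash: the pool name.
zpool=${zfs%%/*}

echo "root dataset: $zfs"
echo "pool:         $zpool"
```

With those two values in hand, the rollback and mount commands quoted above become "zfs rollback -r $zfs/swap/host@blank" and "zfs mount $zpool/etc"; whether the real script computes them exactly this way is not stated in the thread.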
From owner-freebsd-hackers@FreeBSD.ORG Mon Jul 8 06:42:45 2013 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id C7F7F9B7; Mon, 8 Jul 2013 06:42:45 +0000 (UTC) (envelope-from torek@torek.net) Received: from elf.torek.net (50-73-42-1-utah.hfc.comcastbusiness.net [50.73.42.1]) by mx1.freebsd.org (Postfix) with ESMTP id 971E81B4E; Mon, 8 Jul 2013 06:42:45 +0000 (UTC) Received: from elf.torek.net (localhost [127.0.0.1]) by elf.torek.net (8.14.5/8.14.5) with ESMTP id r686gbDQ089570; Mon, 8 Jul 2013 00:42:37 -0600 (MDT) (envelope-from torek@torek.net) Message-Id: <201307080642.r686gbDQ089570@elf.torek.net> From: Chris Torek To: freebsd-hackers@freebsd.org Subject: Re: expanding amd64 past the 1TB limit In-reply-to: Your message of "Fri, 28 Jun 2013 14:33:55 -0600." <201306282033.r5SKXtYK053022@elf.torek.net> Date: Mon, 08 Jul 2013 00:42:37 -0600 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.4.3 (elf.torek.net [127.0.0.1]); Mon, 08 Jul 2013 00:42:37 -0600 (MDT) Cc: Konstantin Belousov , alc@freebsd.org, kib@freebsd.org X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 08 Jul 2013 06:42:45 -0000 Here is a final (I hope) version of the patch. I dropped the config option, but I added code to limit the "real" size of the direct map PDEs. The end result is that on small systems, this ties up 14 more pages (15 from increasing NKPML4E, but one regained because the new static variable ndmpdpphys is 1 instead of 2). (I fixed the comment errors I spotted earlier, too.) 
Chris amd64/amd64/pmap.c | 100 +++++++++++++++++++++++++++++------------------- amd64/include/pmap.h | 36 +++++++++++++---- amd64/include/vmparam.h | 13 ++++--- 3 files changed, 97 insertions(+), 52 deletions(-) Author: Chris Torek Date: Thu Jun 27 18:49:29 2013 -0600 increase physical and virtual memory limits Increase kernel VM space: go from .5 TB of KVA and 1 TB of direct map, to 8 TB of KVA and 16 TB of direct map. However, we allocate less direct map space for small physical-memory systems. Also, if Maxmem is so large that there is not enough direct map space, reduce Maxmem to fit, so that the system can boot unassisted. diff --git a/amd64/amd64/pmap.c b/amd64/amd64/pmap.c index 8dcf232..7368c96 100644 --- a/amd64/amd64/pmap.c +++ b/amd64/amd64/pmap.c @@ -232,6 +232,7 @@ u_int64_t KPML4phys; /* phys addr of kernel level 4 */ static u_int64_t DMPDphys; /* phys addr of direct mapped level 2 */ static u_int64_t DMPDPphys; /* phys addr of direct mapped level 3 */ +static int ndmpdpphys; /* number of DMPDPphys pages */ static struct rwlock_padalign pvh_global_lock; @@ -531,12 +532,27 @@ static void create_pagetables(vm_paddr_t *firstaddr) { int i, j, ndm1g, nkpdpe; + pt_entry_t *pt_p; + pd_entry_t *pd_p; + pdp_entry_t *pdp_p; + pml4_entry_t *p4_p; /* Allocate page table pages for the direct map */ ndmpdp = (ptoa(Maxmem) + NBPDP - 1) >> PDPSHIFT; if (ndmpdp < 4) /* Minimum 4GB of dirmap */ ndmpdp = 4; - DMPDPphys = allocpages(firstaddr, NDMPML4E); + ndmpdpphys = howmany(ndmpdp, NPML4EPG); + if (ndmpdpphys > NDMPML4E) { + /* + * Each NDMPML4E allows 512 GB, so limit to that, + * and then readjust ndmpdp and ndmpdpphys. 
+ */ + printf("NDMPML4E limits system to %d GB\n", NDMPML4E * 512); + Maxmem = atop(NDMPML4E * NBPML4); + ndmpdpphys = NDMPML4E; + ndmpdp = NDMPML4E * NPDEPG; + } + DMPDPphys = allocpages(firstaddr, ndmpdpphys); ndm1g = 0; if ((amd_feature & AMDID_PAGE1GB) != 0) ndm1g = ptoa(Maxmem) >> PDPSHIFT; @@ -553,6 +569,10 @@ create_pagetables(vm_paddr_t *firstaddr) * bootstrap. We defer this until after all memory-size dependent * allocations are done (e.g. direct map), so that we don't have to * build in too much slop in our estimate. + * + * Note that when NKPML4E > 1, we have an empty page underneath + * all but the KPML4I'th one, so we need NKPML4E-1 extra (zeroed) + * pages. (pmap_enter requires a PD page to exist for each KPML4E.) */ nkpt_init(*firstaddr); nkpdpe = NKPDPE(nkpt); @@ -561,32 +581,26 @@ create_pagetables(vm_paddr_t *firstaddr) KPDphys = allocpages(firstaddr, nkpdpe); /* Fill in the underlying page table pages */ - /* Read-only from zero to physfree */ + /* Nominally read-only (but really R/W) from zero to physfree */ /* XXX not fully used, underneath 2M pages */ - for (i = 0; (i << PAGE_SHIFT) < *firstaddr; i++) { - ((pt_entry_t *)KPTphys)[i] = i << PAGE_SHIFT; - ((pt_entry_t *)KPTphys)[i] |= PG_RW | PG_V | PG_G; - } + pt_p = (pt_entry_t *)KPTphys; + for (i = 0; ptoa(i) < *firstaddr; i++) + pt_p[i] = ptoa(i) | PG_RW | PG_V | PG_G; /* Now map the page tables at their location within PTmap */ - for (i = 0; i < nkpt; i++) { - ((pd_entry_t *)KPDphys)[i] = KPTphys + (i << PAGE_SHIFT); - ((pd_entry_t *)KPDphys)[i] |= PG_RW | PG_V; - } + pd_p = (pd_entry_t *)KPDphys; + for (i = 0; i < nkpt; i++) + pd_p[i] = (KPTphys + ptoa(i)) | PG_RW | PG_V; /* Map from zero to end of allocations under 2M pages */ /* This replaces some of the KPTphys entries above */ - for (i = 0; (i << PDRSHIFT) < *firstaddr; i++) { - ((pd_entry_t *)KPDphys)[i] = i << PDRSHIFT; - ((pd_entry_t *)KPDphys)[i] |= PG_RW | PG_V | PG_PS | PG_G; - } + for (i = 0; (i << PDRSHIFT) < *firstaddr; i++) + 
pd_p[i] = (i << PDRSHIFT) | PG_RW | PG_V | PG_PS | PG_G; - /* And connect up the PD to the PDP */ - for (i = 0; i < nkpdpe; i++) { - ((pdp_entry_t *)KPDPphys)[i + KPDPI] = KPDphys + - (i << PAGE_SHIFT); - ((pdp_entry_t *)KPDPphys)[i + KPDPI] |= PG_RW | PG_V | PG_U; - } + /* And connect up the PD to the PDP (leaving room for L4 pages) */ + pdp_p = (pdp_entry_t *)(KPDPphys + ptoa(KPML4I - KPML4BASE)); + for (i = 0; i < nkpdpe; i++) + pdp_p[i + KPDPI] = (KPDphys + ptoa(i)) | PG_RW | PG_V | PG_U; /* * Now, set up the direct map region using 2MB and/or 1GB pages. If @@ -596,37 +610,41 @@ create_pagetables(vm_paddr_t *firstaddr) * memory, pmap_change_attr() will demote any 2MB or 1GB page mappings * that are partially used. */ + pd_p = (pd_entry_t *)DMPDphys; for (i = NPDEPG * ndm1g, j = 0; i < NPDEPG * ndmpdp; i++, j++) { - ((pd_entry_t *)DMPDphys)[j] = (vm_paddr_t)i << PDRSHIFT; + pd_p[j] = (vm_paddr_t)i << PDRSHIFT; /* Preset PG_M and PG_A because demotion expects it. */ - ((pd_entry_t *)DMPDphys)[j] |= PG_RW | PG_V | PG_PS | PG_G | + pd_p[j] |= PG_RW | PG_V | PG_PS | PG_G | PG_M | PG_A; } + pdp_p = (pdp_entry_t *)DMPDPphys; for (i = 0; i < ndm1g; i++) { - ((pdp_entry_t *)DMPDPphys)[i] = (vm_paddr_t)i << PDPSHIFT; + pdp_p[i] = (vm_paddr_t)i << PDPSHIFT; /* Preset PG_M and PG_A because demotion expects it. 
*/ - ((pdp_entry_t *)DMPDPphys)[i] |= PG_RW | PG_V | PG_PS | PG_G | + pdp_p[i] |= PG_RW | PG_V | PG_PS | PG_G | PG_M | PG_A; } for (j = 0; i < ndmpdp; i++, j++) { - ((pdp_entry_t *)DMPDPphys)[i] = DMPDphys + (j << PAGE_SHIFT); - ((pdp_entry_t *)DMPDPphys)[i] |= PG_RW | PG_V | PG_U; + pdp_p[i] = DMPDphys + ptoa(j); + pdp_p[i] |= PG_RW | PG_V | PG_U; } /* And recursively map PML4 to itself in order to get PTmap */ - ((pdp_entry_t *)KPML4phys)[PML4PML4I] = KPML4phys; - ((pdp_entry_t *)KPML4phys)[PML4PML4I] |= PG_RW | PG_V | PG_U; + p4_p = (pml4_entry_t *)KPML4phys; + p4_p[PML4PML4I] = KPML4phys; + p4_p[PML4PML4I] |= PG_RW | PG_V | PG_U; /* Connect the Direct Map slot(s) up to the PML4. */ - for (i = 0; i < NDMPML4E; i++) { - ((pdp_entry_t *)KPML4phys)[DMPML4I + i] = DMPDPphys + - (i << PAGE_SHIFT); - ((pdp_entry_t *)KPML4phys)[DMPML4I + i] |= PG_RW | PG_V | PG_U; + for (i = 0; i < ndmpdpphys; i++) { + p4_p[DMPML4I + i] = DMPDPphys + ptoa(i); + p4_p[DMPML4I + i] |= PG_RW | PG_V | PG_U; } - /* Connect the KVA slot up to the PML4 */ - ((pdp_entry_t *)KPML4phys)[KPML4I] = KPDPphys; - ((pdp_entry_t *)KPML4phys)[KPML4I] |= PG_RW | PG_V | PG_U; + /* Connect the KVA slots up to the PML4 */ + for (i = 0; i < NKPML4E; i++) { + p4_p[KPML4BASE + i] = KPDPphys + ptoa(i); + p4_p[KPML4BASE + i] |= PG_RW | PG_V | PG_U; + } } /* @@ -1685,8 +1703,11 @@ pmap_pinit(pmap_t pmap) pagezero(pmap->pm_pml4); /* Wire in kernel global address entries. 
*/ - pmap->pm_pml4[KPML4I] = KPDPphys | PG_RW | PG_V | PG_U; - for (i = 0; i < NDMPML4E; i++) { + for (i = 0; i < NKPML4E; i++) { + pmap->pm_pml4[KPML4BASE + i] = (KPDPphys + (i << PAGE_SHIFT)) | + PG_RW | PG_V | PG_U; + } + for (i = 0; i < ndmpdpphys; i++) { pmap->pm_pml4[DMPML4I + i] = (DMPDPphys + (i << PAGE_SHIFT)) | PG_RW | PG_V | PG_U; } @@ -1941,8 +1962,9 @@ pmap_release(pmap_t pmap) m = PHYS_TO_VM_PAGE(pmap->pm_pml4[PML4PML4I] & PG_FRAME); - pmap->pm_pml4[KPML4I] = 0; /* KVA */ - for (i = 0; i < NDMPML4E; i++) /* Direct Map */ + for (i = 0; i < NKPML4E; i++) /* KVA */ + pmap->pm_pml4[KPML4BASE + i] = 0; + for (i = 0; i < ndmpdpphys; i++)/* Direct Map */ pmap->pm_pml4[DMPML4I + i] = 0; pmap->pm_pml4[PML4PML4I] = 0; /* Recursive Mapping */ diff --git a/amd64/include/pmap.h b/amd64/include/pmap.h index dc02e49..da80241 100644 --- a/amd64/include/pmap.h +++ b/amd64/include/pmap.h @@ -113,28 +113,50 @@ ((unsigned long)(l2) << PDRSHIFT) | \ ((unsigned long)(l1) << PAGE_SHIFT)) -#define NKPML4E 1 /* number of kernel PML4 slots */ +/* + * Number of kernel PML4 slots. Can be anywhere from 1 to 64 or so, + * but setting it larger than NDMPML4E makes no sense. + * + * Each slot provides .5 TB of kernel virtual space. + */ +#define NKPML4E 16 #define NUPML4E (NPML4EPG/2) /* number of userland PML4 pages */ #define NUPDPE (NUPML4E*NPDPEPG)/* number of userland PDP pages */ #define NUPDE (NUPDPE*NPDEPG) /* number of userland PD entries */ /* - * NDMPML4E is the number of PML4 entries that are used to implement the - * direct map. It must be a power of two. + * NDMPML4E is the maximum number of PML4 entries that will be + * used to implement the direct map. It must be a power of two, + * and should generally exceed NKPML4E. The maximum possible + * value is 64; using 128 will make the direct map intrude into + * the recursive page table map. */ -#define NDMPML4E 2 +#define NDMPML4E 32 /* - * The *PDI values control the layout of virtual memory. 
The starting address + * These values control the layout of virtual memory. The starting address * of the direct map, which is controlled by DMPML4I, must be a multiple of * its size. (See the PHYS_TO_DMAP() and DMAP_TO_PHYS() macros.) + * + * Note: KPML4I is the index of the (single) level 4 page that maps + * the KVA that holds KERNBASE, while KPML4BASE is the index of the + * first level 4 page that maps VM_MIN_KERNEL_ADDRESS. If NKPML4E + * is 1, these are the same, otherwise KPML4BASE < KPML4I and extra + * level 4 PDEs are needed to map from VM_MIN_KERNEL_ADDRESS up to + * KERNBASE. Similarly, if KMPL4I < (base+N), extra level 4 PDEs are + * needed to map from somewhere-above-KERNBASE to VM_MAX_KERNEL_ADDRESS. + * + * (KPML4I combines with KPDPI to choose where KERNBASE starts. + * Or, in other words, KPML4I provides bits 39..46 of KERNBASE, + * and KPDPI provides bits 30..38.) */ #define PML4PML4I (NPML4EPG/2) /* Index of recursive pml4 mapping */ -#define KPML4I (NPML4EPG-1) /* Top 512GB for KVM */ -#define DMPML4I rounddown(KPML4I - NDMPML4E, NDMPML4E) /* Below KVM */ +#define KPML4BASE (NPML4EPG-NKPML4E) /* KVM at highest addresses */ +#define DMPML4I rounddown(KPML4BASE-NDMPML4E, NDMPML4E) /* Below KVM */ +#define KPML4I (NPML4EPG-1) #define KPDPI (NPDPEPG-2) /* kernbase at -2GB */ /* diff --git a/amd64/include/vmparam.h b/amd64/include/vmparam.h index 33f62bd..cff2558 100644 --- a/amd64/include/vmparam.h +++ b/amd64/include/vmparam.h @@ -145,18 +145,19 @@ * 0x0000000000000000 - 0x00007fffffffffff user map * 0x0000800000000000 - 0xffff7fffffffffff does not exist (hole) * 0xffff800000000000 - 0xffff804020100fff recursive page table (512GB slot) - * 0xffff804020101000 - 0xfffffdffffffffff unused - * 0xfffffe0000000000 - 0xfffffeffffffffff 1TB direct map - * 0xffffff0000000000 - 0xffffff7fffffffff unused - * 0xffffff8000000000 - 0xffffffffffffffff 512GB kernel map + * 0xffff804020101000 - 0xffffdfffffffffff unused + * 0xffffe00000000000 - 0xffffefffffffffff 
16TB direct map + * 0xfffff00000000000 - 0xfffff7ffffffffff unused + * 0xfffff80000000000 - 0xffffffffffffffff 8TB kernel map * * Within the kernel map: * * 0xffffffff80000000 KERNBASE */ -#define VM_MAX_KERNEL_ADDRESS KVADDR(KPML4I, NPDPEPG-1, NPDEPG-1, NPTEPG-1) -#define VM_MIN_KERNEL_ADDRESS KVADDR(KPML4I, NPDPEPG-512, 0, 0) +#define VM_MIN_KERNEL_ADDRESS KVADDR(KPML4BASE, 0, 0, 0) +#define VM_MAX_KERNEL_ADDRESS KVADDR(KPML4BASE + NKPML4E - 1, \ + NPDPEPG-1, NPDEPG-1, NPTEPG-1) #define DMAP_MIN_ADDRESS KVADDR(DMPML4I, 0, 0, 0) #define DMAP_MAX_ADDRESS KVADDR(DMPML4I + NDMPML4E, 0, 0, 0) From owner-freebsd-hackers@FreeBSD.ORG Mon Jul 8 11:22:53 2013 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id D7101F62 for ; Mon, 8 Jul 2013 11:22:53 +0000 (UTC) (envelope-from mailist@yandex.com) Received: from forward13.mail.yandex.net (forward13.mail.yandex.net [IPv6:2a02:6b8:0:801::3]) by mx1.freebsd.org (Postfix) with ESMTP id 8ED6A1B22 for ; Mon, 8 Jul 2013 11:22:53 +0000 (UTC) Received: from web25f.yandex.ru (web25f.yandex.ru [95.108.131.159]) by forward13.mail.yandex.net (Yandex) with ESMTP id 5A6921425DB for ; Mon, 8 Jul 2013 12:35:34 +0400 (MSK) Received: from 127.0.0.1 (localhost.localdomain [127.0.0.1]) by web25f.yandex.ru (Yandex) with ESMTP id 12111436007C; Mon, 8 Jul 2013 12:35:34 +0400 (MSK) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=yandex.com; s=mail; t=1373272534; bh=+N2oWPWBsi22ub5Qixn9mMB46UZznAcqviVrQyk2xZ4=; h=From:To:In-Reply-To:References:Subject:Date; b=LsECY3/UEflYeFyTuakbSV3MBuSeZ9dc3+28U3qGXkpD8o/0FJFaLBIkXDRwxTmkT b0g2t/aujnr9RROQDZeP7q7hXylliFcb/H8Be8QX9c1TvcS2Uu9tarARFybCeDC0x8 1Nt5TbMLdiR3tr0Fn+sgsWeZ20jAYN2ZK9KR6IFA= Received: from 85.98.186.174.dynamic.ttnet.com.tr (85.98.186.174.dynamic.ttnet.com.tr [85.98.186.174]) by web25f.yandex.ru with HTTP; Mon, 08 Jul 2013 12:35:33 +0400 From: =?utf-8?B?RW1yZSDDh2FtYWxhbg==?= 
Envelope-From: mailist@yandex.com.tr To: "freebsd-hackers@freebsd.org" In-Reply-To: <201711372949905@web8g.yandex.ru> References: <201711372949905@web8g.yandex.ru> Subject: Re: HP ILO FreeBSD 8.3 Installation problem MIME-Version: 1.0 Message-Id: <584911373272533@web25f.yandex.ru> X-Mailer: Yamail [ http://yandex.ru ] 5.0 Date: Mon, 08 Jul 2013 11:35:33 +0300 Content-Transfer-Encoding: 8bit Content-Type: text/plain; charset=utf-8 X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 08 Jul 2013 11:22:53 -0000

SOLUTION:

Hi,
The USB memstick image file was the solution for me. I used FreeBSD-8.3-RELEASE-amd64-memstick.img. I downloaded this image file and copied it to USB (I did not burn it to the USB). Then I attached it to iLO as a virtual USB image and sysinstall started, BUT I selected installation from USB, NOT CD/DVD installation.
Thanks for your answers.

04.07.2013, 17:58, "Emre Çamalan" :
> Hi,
> I'm trying to install FreeBSD with an HP ILO 4 advanced, web interface. I tried to install FreeBSD 8.2, FreeBSD 8.3 and FreeBSD 8.4. I tried to use acd0 and cd0 as media. I got the same result. Details about the problem I attach pictures.
>
> ERROR: I'm trying to add freebsd8.3iso from ILO such as virtual drive not from cd or dvd.
>
> Error: mounting /dev/acd0 on /dist: Input/output error (5)
>
> other ERROR:
> Unable to initialize selected media. Would you like to adjust your media configuration and try again?
> > , > _______________________________________________ > freebsd-hackers@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-hackers > To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org" From owner-freebsd-hackers@FreeBSD.ORG Mon Jul 8 18:43:54 2013 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id B36F5A8F; Mon, 8 Jul 2013 18:43:54 +0000 (UTC) (envelope-from freebsd-list@nuos.org) Received: from cargobay.net (cargobay.net [174.136.100.98]) by mx1.freebsd.org (Postfix) with ESMTP id 8FC9011E3; Mon, 8 Jul 2013 18:43:54 +0000 (UTC) Received: from leonidas.ccsys.com (unknown [65.35.151.3]) by cargobay.net (Postfix) with ESMTPSA id DC7F7F42; Mon, 8 Jul 2013 18:42:49 +0000 (UTC) Message-ID: <51DB085A.9040701@nuos.org> Date: Mon, 08 Jul 2013 18:43:38 +0000 From: "Chad J. Milios" User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:17.0) Gecko/20130624 Thunderbird/17.0.6 MIME-Version: 1.0 To: Devin Teske Subject: Re: Announcing: nuOS 0.0.9.1b1 - a whole NEW FreeBSD distro, NOT a fork References: <51D9E499.103@nuos.org> <13CA24D6AB415D428143D44749F57D7201FB6177@ltcfiswmsgmb21> In-Reply-To: <13CA24D6AB415D428143D44749F57D7201FB6177@ltcfiswmsgmb21> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: FreeBSD Hackers , "Teske, Devin" X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 08 Jul 2013 18:43:54 -0000 On 07/08/13 00:12, Teske, Devin wrote: > On Jul 7, 2013, at 2:58 PM, Chad J. Milios wrote: > [snip] > >> /etc is now a ZFS dataset of its own >> How did we do it? >> Decades of conventional wisdom says /etc must be on /. >> Check it out, discuss the whys and the trade-offs. 
> Well, I see in nu_install on GitHub how you're doing it: > > You added: > > init_script="/boot/init.sh" > > to /boot/loader.conf, wich -- among other things -- does these two interesting things (variable names changed to make things more clear): > > zfs rollback -r $zfs/swap/host@blank > NOTE: $zfs is equal to $( /bin/kenv vfs.root.mountfrom ) minus the leading "zfs:" > > and > > zfs mount $zpool/etc > NOTE: $zpool is equal to $zfs from above, leading up to (but not including) the first slash (/). > > Cute. Have to say I wasn't aware of the init_script feature of loader.conf(5). Not bad. We also had to put one file into the etc directory on the / "beneath" the /etc mount so that /sbin/init can read it before /etc is mounted. There were two or three ways we could do that and each has a tradeoff. What we did (mv /etc/login.conf.db /boot/etc; ln -s ../boot/etc/login.conf.db /etc/login.conf.db) has the undesirable effect that one must remember to (or be reminded/automated) run cap_mkdb anytime /etc is rolled to a different snapshot or a backup is restored to it (if that changes login.conf). With our customers at ccsys.com we have a proprietary management thing in userland (and you could lose out on that important event hook if you used anything other than our interface for zfs rollbacks and restoring backups, which we forbid). Since our goals at nuos.org are different, i'd like to implement that trigger somewhere better, ideally as-needed and immediate as possible. if anyone with more intimate knowledge of how and exactly when login.conf.db gets accessed has any thoughts... It could be a disaster for an admin to think their /etc is in a certain state and have that one file be out of sync. 
If better minds could chip in, I'm wondering if we're better off editing /sbin/init to run init_script _before_ loading the daemon class from login.conf.db (or explain why thats a bad idea) or if i should just add some sort of hook to run cap_mkdb right when needed using a DTrace script or auditd? Does anyone think this issue is moot? (Can't we just document this particular specific "gotcha" instance? I don't think so, I abhor any "gotcha" that deviates from behavior people expect from "upstream" fbsd.) Does anyone agree it's important we come as close to perfect a solution as we can? Is a separate /etc even worth it to people? Should i scrap that feature because of this issue? I think we can tighten this up so theres no twisted ankles and no one falling in this rare case but certainly potential manhole. (the manhole i'm talking about is login.conf and login.conf.db being out of sync because the later is a symlink to /boot/etc and someone might rollback to a more restrictive login.conf and think they're covered without running cap_mkdb again but their login.conf.db is actually out of sync and less restrictive in a way that burns them) Devin, thank you IMMENSELY for bsdinstall and especially bsdconfig. I use them both at work and they make life so much better. And thank you for the simplification using kenv. 
I was unaware of it From owner-freebsd-hackers@FreeBSD.ORG Mon Jul 8 22:44:09 2013 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 1BA02DBB; Mon, 8 Jul 2013 22:44:09 +0000 (UTC) (envelope-from Devin.Teske@fisglobal.com) Received: from mx1.fisglobal.com (mx1.fisglobal.com [199.200.24.190]) by mx1.freebsd.org (Postfix) with ESMTP id D545F1D5E; Mon, 8 Jul 2013 22:44:08 +0000 (UTC) Received: from smtp.fisglobal.com ([10.132.206.17]) by ltcfislmsgpa05.fnfis.com (8.14.5/8.14.5) with ESMTP id r68MCe4f032179 (version=TLSv1/SSLv3 cipher=AES128-SHA bits=128 verify=NOT); Mon, 8 Jul 2013 17:12:40 -0500 Received: from LTCFISWMSGMB21.FNFIS.com ([10.132.99.23]) by LTCFISWMSGHT06.FNFIS.com ([10.132.206.17]) with mapi id 14.02.0309.002; Mon, 8 Jul 2013 17:12:40 -0500 From: "Teske, Devin" To: "Chad J. Milios" Subject: Re: Announcing: nuOS 0.0.9.1b1 - a whole NEW FreeBSD distro, NOT a fork Thread-Topic: Announcing: nuOS 0.0.9.1b1 - a whole NEW FreeBSD distro, NOT a fork Thread-Index: AQHOe10+p/Zf2qC+20mYyH00Dv0xCplaPOCAgAE2VQCAADpnAA== Date: Mon, 8 Jul 2013 22:12:39 +0000 Message-ID: <13CA24D6AB415D428143D44749F57D7201FB7302@ltcfiswmsgmb21> References: <51D9E499.103@nuos.org> <13CA24D6AB415D428143D44749F57D7201FB6177@ltcfiswmsgmb21> <51DB085A.9040701@nuos.org> In-Reply-To: <51DB085A.9040701@nuos.org> Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [10.132.253.126] Content-Type: text/plain; charset="iso-8859-1" Content-ID: Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10432:5.10.8794, 1.0.431, 0.0.0000 definitions=2013-07-08_03:2013-07-08,2013-07-08,1970-01-01 signatures=0 Cc: FreeBSD Hackers , Devin Teske X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list Reply-To: Devin Teske List-Id: Technical Discussions relating 
to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 08 Jul 2013 22:44:09 -0000 On Jul 8, 2013, at 11:43 AM, Chad J. Milios wrote: > On 07/08/13 00:12, Teske, Devin wrote: >> On Jul 7, 2013, at 2:58 PM, Chad J. Milios wrote: >> [snip] >> >>> /etc is now a ZFS dataset of its own >>> How did we do it? >>> Decades of conventional wisdom says /etc must be on /. >>> Check it out, discuss the whys and the trade-offs. >> Well, I see in nu_install on GitHub how you're doing it: >> >> You added: >> >> init_script="/boot/init.sh" >> >> to /boot/loader.conf, wich -- among other things -- does these two interesting things (variable names changed to make things more clear): >> >> zfs rollback -r $zfs/swap/host@blank >> NOTE: $zfs is equal to $( /bin/kenv vfs.root.mountfrom ) minus the leading "zfs:" >> >> and >> >> zfs mount $zpool/etc >> NOTE: $zpool is equal to $zfs from above, leading up to (but not including) the first slash (/). >> >> Cute. Have to say I wasn't aware of the init_script feature of loader.conf(5). Not bad. > > We also had to put one file into the etc directory on the / "beneath" the /etc mount so that /sbin/init can read it before /etc is mounted. There were two or three ways we could do that and each has a tradeoff. > I've been bitten by that. Getting access to that file that's "beneath" once you've booted the system can be ... less than easy. I'm interested in your cost/benefit points of having /etc a separate filesystem. On the face of it, I want to say that "/etc" is (or at least contains) the "core identity" of the machine (and to a lesser extent -- because this is BSD after-all -- /usr/local/etc). In my mind, /etc and /usr/local/etc *are* the machine (metaphorically speaking), so the merits of having it as a separate filesystem are weighed against your desired topology. 
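A minimal sketch of the variable derivation quoted above, using plain sh parameter expansion. The sample value "zfs:tank/os/root" is hypothetical; on a live host it would presumably come from /bin/kenv vfs.root.mountfrom, and this is not necessarily how /boot/init.sh itself is written.

```shell
#!/bin/sh
# Sketch only: split a vfs.root.mountfrom value into root dataset and pool.
# On a running FreeBSD host the value would be read with:
#   mountfrom=$(/bin/kenv vfs.root.mountfrom)
# The value below is a made-up example so the snippet runs anywhere.
mountfrom="zfs:tank/os/root"

zfs="${mountfrom#zfs:}"   # drop the leading "zfs:" prefix
zpool="${zfs%%/*}"        # keep everything before the first slash

echo "root dataset: $zfs"
echo "pool:         $zpool"
```

With those two values in hand, the quoted commands become "zfs rollback -r $zfs/swap/host@blank" and "zfs mount $zpool/etc".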
If you want to bunch of machines to look and/or act differently, then a shared /etc is precisely what you want. However, without allowing minor changes (ala ZFS clone/snapshot or by way of UnionFS), you'll quickly find that the only way to cope is with role-based scripting in /etc/rc.conf (it is after-all a shell script) or complicated abstraction layers (for example, using netgraph eiface devices with the jail-name inside them so that rc.conf have have jail-specific ifconfig_* lines). But I digress. I think the better solution to your loading of files "beneath" the eventual /etc filesystem is to throw away the ZFS snapshot/clone method and instead move to a UnionFS approach for /etc. If you use UnionFS for your /etc, then what you do is for each of the machines that you want *that* /etc to appear, you do something like: (as root) mount_unionfs -o below /etc /other/etc Now /other/etc (assuming it was empty before) looks exactly like /etc. Pros: With "rm -f ; rm -W " (in /other/etc) you can reclaim a file from the underlying /etc. ZFS does not allow you to revert a single file (you can revert the entire volume or filesystem, but not a single file). Cons: The advantage of having /etc as a ZFS filesystem is probably going to be the compressratio. Using something like lzjb compression on your /etc directory is beneficial (not as beneficial has say /var/log, but by means of having mostly text files, /etc should compress nicely). But... if you *really* need to compress your /etc (that is to say, you're hard-up enough for space that you need the little-savings that you'll gain from compressing /etc), then you're also hard-up enough that you should just set compression on the entire filesystem (nullifying your need to make /etc a separate filesystem -- /etc would get the compression feature from the underlying root filesystem; whatever that is -- zfs filesystem, zpool, zvol, etc.). 
So again, UnionFS looks like a win unless you *really* want to set separate filesystem features for /etc that you don't set elsewhere. Were you perhaps after a zfs-/etc for some other reason? because there are other reasons that I'm not getting into. For example, using sysutils/zxfer to make backups of the /etc directory of an entire cloud of machines to a single host. If you don't have /etc as a separate filesystem (and all you want is /etc) then a ZFS stream is of course out of the question and you'll have to resort to rsync. I personally think zxfer is more efficient than rsync but I haven't done the calculations yet to prove it (but it feels like it -- incremental snapshot transfers are pretty darned quick). > What we did (mv /etc/login.conf.db /boot/etc; ln -s ../boot/etc/login.conf.db /etc/login.conf.db) has the undesirable effect that one must remember to (or be reminded/automated) run cap_mkdb anytime /etc is rolled to a different snapshot or a backup is restored to it (if that changes login.conf). > *nods* > With our customers at ccsys.com we have a proprietary management thing in userland (and you could lose out on that important event hook if you used anything other than our interface for zfs rollbacks and restoring backups, which we forbid). Interesting. > Since our goals at nuos.org are different, i'd like to implement that trigger somewhere better, ideally as-needed and immediate as possible. > > if anyone with more intimate knowledge of how and exactly when login.conf.db gets accessed has any thoughts... It could be a disaster for an admin to think their /etc is in a certain state and have that one file be out of sync. 
If better minds could chip in, I'm wondering if we're better off editing /sbin/init to run init_script _before_ loading the daemon class from login.conf.db (or explain why thats a bad idea) or if i should just add some sort of hook to run cap_mkdb right when needed using a DTrace script or auditd? > That's an interesting aspect of the boot process I hadn't noticed before (having not used init_script before). I would think that this should be filed as a PR. Seems to me that the init_script should fire first -- but (and this is a guess) it may need to bootstrap the user that the init_script runs as (presumably needing to load the daemon class for said user). While there may be good reason, it certainly violates a principle (that one might be astonished to learn that init_script is not run in a fashion that only the dependencies thereof are required). > Does anyone think this issue is moot? (Can't we just document this particular specific "gotcha" instance? I don't think so, I abhor any "gotcha" that deviates from behavior people expect from "upstream" fbsd.) Does anyone agree it's important we come as close to perfect a solution as we can? Thanks for bringing up the issue with init_script. We should look to fix it to make its use capable of handling the use-case you identified (using it to bootstrap a separate /etc). > Is a separate /etc even worth it to people? Depends. Everybody? certainly not. Some? Sure. See above example-cases. > Should i scrap that feature because of this issue? It sounds like you contorted yourself working around a deficiency in it (a POLA violation in that it has unforeseen dependencies). At the very least, I would think that init could have a fall-back if the file can't be loaded. Are you putting anything beside the default daemon-class definition in your login.conf "beneath" your true /etc? 
> I think we can tighten this up so theres no twisted ankles and no one falling in this rare case but certainly potential manhole. (the manhole i'm talking about is login.conf and login.conf.db being out of sync because the later is a symlink to /boot/etc and someone might rollback to a more restrictive login.conf and think they're covered without running cap_mkdb again but their login.conf.db is actually out of sync and less restrictive in a way that burns them) > Sorry you had to work around that -- you should have filed a PR. > Devin, thank you IMMENSELY for bsdinstall and especially bsdconfig. I use them both at work and they make life so much better. And thank you for the simplification using kenv. I was unaware of it On a side-note, I didn't write bsdinstall -- I'm going to maintain it, but I wrote bsdconfig ^_^ (smiles) Thank you very much for your appreciation. Certainly a labor of love and I'm happy that others have kicked the wheels at least. -- Devin _____________ The information contained in this message is proprietary and/or confidential. If you are not the intended recipient, please: (i) delete the message and all copies; (ii) do not disclose, distribute or use the message in any manner; and (iii) notify the sender immediately. In addition, please be aware that any message addressed to our domain is subject to archiving and review by persons other than the intended recipient. Thank you. 
From owner-freebsd-hackers@FreeBSD.ORG Tue Jul 9 03:17:12 2013 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 39723AF5; Tue, 9 Jul 2013 03:17:12 +0000 (UTC) (envelope-from julian@freebsd.org) Received: from vps1.elischer.org (vps1.elischer.org [204.109.63.16]) by mx1.freebsd.org (Postfix) with ESMTP id 103F512D6; Tue, 9 Jul 2013 03:17:11 +0000 (UTC) Received: from jre-mbp.elischer.org (ppp121-45-226-51.lns20.per1.internode.on.net [121.45.226.51]) (authenticated bits=0) by vps1.elischer.org (8.14.5/8.14.5) with ESMTP id r693Gx5Q009074 (version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO); Mon, 8 Jul 2013 20:17:02 -0700 (PDT) (envelope-from julian@freebsd.org) Message-ID: <51DB80A6.9080609@freebsd.org> Date: Tue, 09 Jul 2013 11:16:54 +0800 From: Julian Elischer User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:17.0) Gecko/20130620 Thunderbird/17.0.7 MIME-Version: 1.0 To: Devin Teske Subject: Re: Announcing: nuOS 0.0.9.1b1 - a whole NEW FreeBSD distro, NOT a fork References: <51D9E499.103@nuos.org> <13CA24D6AB415D428143D44749F57D7201FB6177@ltcfiswmsgmb21> <51DB085A.9040701@nuos.org> <13CA24D6AB415D428143D44749F57D7201FB7302@ltcfiswmsgmb21> In-Reply-To: <13CA24D6AB415D428143D44749F57D7201FB7302@ltcfiswmsgmb21> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: FreeBSD Hackers , "Chad J. Milios" , "Teske, Devin" X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 09 Jul 2013 03:17:12 -0000 On 7/9/13 6:12 AM, Teske, Devin wrote: > On Jul 8, 2013, at 11:43 AM, Chad J. 
Milios wrote: > > > We also had to put one file into the etc directory on the / "beneath" the /etc mount so that /sbin/init can read it before /etc is mounted. There were two or three ways we could do that and each has a tradeoff. > > I've been bitten by that. > > Getting access to that file that's "beneath" once you've booted the system can be ... less than easy. if it's hardlinked to another copy that is not "beneath" then you can just edit it. I once had a system at vicor where I had a temporary "beneath" /etc that had all its files linked to files of the same name in /etc.boot/ > > From owner-freebsd-hackers@FreeBSD.ORG Tue Jul 9 07:38:01 2013 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 9A48627F; Tue, 9 Jul 2013 07:38:01 +0000 (UTC) (envelope-from freebsd-list@nuos.org) Received: from cargobay.net (cargobay.net [174.136.100.98]) by mx1.freebsd.org (Postfix) with ESMTP id 739651DDA; Tue, 9 Jul 2013 07:38:01 +0000 (UTC) Received: from leonidas.ccsys.com (unknown [65.35.151.3]) by cargobay.net (Postfix) with ESMTPSA id 05FA3145; Tue, 9 Jul 2013 07:36:55 +0000 (UTC) Message-ID: <51DBBDCC.7060800@nuos.org> Date: Tue, 09 Jul 2013 07:37:48 +0000 From: "Chad J. 
Milios" User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:17.0) Gecko/20130624 Thunderbird/17.0.6 MIME-Version: 1.0 To: Devin Teske , FreeBSD Hackers Subject: login.conf.db, /sbin/init, separate /etc, and configs around "thin provisioning" WAS: Re: nuOS References: <51D9E499.103@nuos.org> <13CA24D6AB415D428143D44749F57D7201FB6177@ltcfiswmsgmb21> <51DB085A.9040701@nuos.org> <13CA24D6AB415D428143D44749F57D7201FB7302@ltcfiswmsgmb21> In-Reply-To: <13CA24D6AB415D428143D44749F57D7201FB7302@ltcfiswmsgmb21> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 09 Jul 2013 07:38:01 -0000 On 07/08/13 22:12, Teske, Devin wrote: > > We also had to put one file into the etc directory on the / "beneath" the /etc mount so that /sbin/init can read it before /etc is mounted. There were two or three ways we could do that and each has a tradeoff. > > I've been bitten by that. > > Getting access to that file that's "beneath" once you've booted the system can be ... less than easy. yeah i prefer resorting to trickery or "magic" as little as possible only as a last resort and i try to clutter up the standard tree of files as little as possible. in this case we only needed the one file, just a symlink actually. the "under" has only the following: lrwxr-xr-x 1 root wheel 25 Jun 25 17:59 /login.conf.db@ -> ../boot/etc/login.conf.db and in the "over" /etc we still place an identical symlink so that the real file is in /boot/etc/. cap_mkdb doesnt clobber the symlink, it writes through to /boot/etc/login.conf.db for you. so in the normal usual case, a user edits login.conf and runs cap_mkdb like they're supposed to and everything is fine. 
its only if they rollback or restore a backup to /etc that things potentially can end up out of sync. i don't want anyone to get confused by me talking about jails in the same email. The above snag we are working around involves /sbin/init ONLY WHEN booting the host FreeBSD. Our jailed customers don't have to worry about this because /etc is already in the right spot by the time jail runs /etc/rc. /sbin/init isn't even involved in a jail, is it? Not even in some "hooked-in" way? At any rate we dont have to do anything special for a separate /etc dataset for jails. We could just forgo the /etc dataset on the host but i am glad that we can manage our bare metal customers using the same methods and tools. Handling this symlink hack is less differentiation than giving up separate /etc on the host i think. > > I'm interested in your cost/benefit points of having /etc a separate filesystem. > > On the face of it, I want to say that "/etc" is (or at least contains) the "core identity" of the machine (and to a lesser extent -- because this is BSD after-all -- /usr/local/etc). In my mind, /etc and /usr/local/etc *are* the machine (metaphorically speaking), so the merits of having it as a separate filesystem are weighed against your desired topology. i agree. myself i like having such a lightweight "identity" and keeping /, /usr and /usr/local (which are all just on sitting on / in my case) mounted read-only. the "prototype" for a host is handled by a completely different department than the people/customers who sysadmin their deployments and instances. Early in the building/installing, before any ports/packages, /usr/local/etc is made a symlink to /etc/local, so the symlink is in the readonly / and every time you write or cd to /usr/local/etc you end up in /etc/local. An /etc dataset ends up under a MB zfs compressed and /var on a fresh instance is basically also nothing. all-in-all a new jail costs you under a MB of zpool. 
we jail stop/start and zfs send/receive instances in a blink of an eye and its "almost" as good as having live migration. We could get the same storage efficiency by simply cloning /, and having no sub-datasets. some customers feel like they want to be able to write anywhere and we give them those options but then they are on their own and we don't manage the software updates for those guys and some like it that way. we then bill each for all the storage they reference because a year down the road they may be the only one still holding a reference to the outdated prototype they're on even though they overwrote every file twice with make world or freebsd-update. their memory usage is also way higher than most because when executables are launched on the jails with the read-only nullfs mounted /, those all access the same memory pages but zfs isnt smart enough yet to let the virtual memory system maintain those pointers through the indirection of zfs clones and snapshots. so zfs separate /etc and /var give us great storage efficiency while nullfs gives us great memory performance and efficiency. > > If you want to bunch of machines to look and/or act differently, then a shared /etc is precisely what you want. However, without allowing minor changes (ala ZFS clone/snapshot or by way of UnionFS), you'll quickly find that the only way to cope is with role-based scripting in /etc/rc.conf (it is after-all a shell script) or complicated abstraction layers (for example, using netgraph eiface devices with the jail-name inside them so that rc.conf have have jail-specific ifconfig_* lines). But I digress. > > I think the better solution to your loading of files "beneath" the eventual /etc filesystem is to throw away the ZFS snapshot/clone method and instead move to a UnionFS approach for /etc. 
> > If you use UnionFS for your /etc, then what you do is for each of the machines that you want *that* /etc to appear, you do something like: > > (as root) mount_unionfs -o below /etc /other/etc > > Now /other/etc (assuming it was empty before) looks exactly like /etc. In theory, i love the concept of unionfs and it gives far more flexibility than zfs especially if the two can be combined effectively. For us, its semantics were just never well established enough and there are too many corner cases and combinations of possibilities that, while exciting, were never conceived of and cant be nailed down in a simple VFS or POSIX filesystem mindset for obvious reasons. When i have the time to really dig in again i'd love to see where unionfs is at today and if i can be using it to do some very cool things again (but now with less headaches legwork and sleepless nights). For the reasons stated though, i have to admit i'm simply just _afraid_ of unionfs. Your suggestion is simple enough though, i'm sure i wouldnt need a month of research and testing. :) It's probably overkill for our needs in this case. > > Pros: With "rm -f ; rm -W " (in /other/etc) you can reclaim a file from the underlying /etc. ZFS does not allow you to revert a single file (you can revert the entire volume or filesystem, but not a single file). I really liked the idea of removing whiteout and having a lower file appear but thats just me. :) You're right that ZFS doesn't let you do anything nearly as selective but it does allow you cherry pick files out of .zfs/snapshot. Like you said, that's not rolling a file back you're just copying an old version to a new version atop the top "layer". > > > > if anyone with more intimate knowledge of how and exactly when login.conf.db gets accessed has any thoughts... It could be a disaster for an admin to think their /etc is in a certain state and have that one file be out of sync. 
If better minds could chip in, I'm wondering if we're better off editing /sbin/init to run init_script _before_ loading the daemon class from login.conf.db (or explain why thats a bad idea) or if i should just add some sort of hook to run cap_mkdb right when needed using a DTrace script or auditd? > > That's an interesting aspect of the boot process I hadn't noticed before (having not used init_script before). I would think that this should be filed as a PR. Seems to me that the init_script should fire first -- but (and this is a guess) it may need to bootstrap the user that the init_script runs as (presumably needing to load the daemon class for said user). While there may be good reason, it certainly violates a principle (that one might be astonished to learn that init_script is not run in a fashion that only the dependencies thereof are required). > > I thought so too initially, init_script is documented as being for [init]ialization BEFORE /etc/rc itself. It's obviously run as root and early enough the machine ought to obey init_script as if it were commandments handed down by God. Why init needs to know anything about the daemon class beforehand is beyond me. Quite literally "beyond me". I don't have a strong enough opinion either way though to be filing a PR yet. I thought it's worth bringing up so brighter minds might take a look if they find it peculiar. I have it back-burnered on one of a full screen-border of post-it notes and i'll learn more about what's going on in /sbin/init soon if no one else steps forward. >> Does anyone think this issue is moot? (Can't we just document this particular specific "gotcha" instance? I don't think so, I abhor any "gotcha" that deviates from behavior people expect from "upstream" fbsd.) Does anyone agree it's important we come as close to perfect a solution as we can? > Thanks for bringing up the issue with init_script. 
We should look to fix it to make its use capable of handling the use-case you identified (using it to bootstrap a separate /etc). Good, see, this is why FreeBSD is awesome. People care about parameters and configurations and having a stable system even in the face of overwhelming combinatorics. Not to speak ill of Linux or sling mud with vague accusations and no specific instances (but i'm going to haha) but you have no idea how many times i've been using Linux in a project, usually to do something a little cutting edge or off-the-reservation, and i say "Hey i think i should be able to combine X with Y, can someone help me?" and all too often i get the attitude like "man, we're all doing Z now, havent ya heard? Z is here to end all our sorrows" and i'll be like "but Z doesn't do X+Y" and to that i'm shamed and ridiculed like "dude, if Z doesn't do everything you want and you don't worship Z with us, youre stupid" hahaha does anyone else feel similarly about any experience theyve had on the LKML? I can name almost 10 values for X, Y and flavor-of-the-week Z. > > >> Is a separate /etc even worth it to people? > Depends. Everybody? certainly not. Some? Sure. See above example-cases. > > >> Should i scrap that feature because of this issue? > It sounds like you contorted yourself working around a deficiency in it (a POLA violation in that it has unforeseen dependencies). At the very least, I would think that init could have a fall-back if the file can't be loaded. > > Are you putting anything beside the default daemon-class definition in your login.conf "beneath" your true /etc? Init does have a compiled in default class == the initial system default "default" class. login.conf remains the source of truth on the true "upper" /etc but things read login.conf.db to get their answers. At the very outset of a system build, i move the plain old default login.conf.db to /boot/etc and it contains all the classes. 
99.9% of our users keep the default login.conf and maybe actually 100% are using it just that way on any given day. I'm just that anal-retentive that I think if i ignore this someone will suffer for their astonishment (or unknowing lack thereof) when their db ends up out of sync because they didnt know we introduced another event where cap_mkdb should get run (post rollback/restore of /etc). I would simply run cap_mkdb every time we mount /etc but i don't think thats good enough because i dont know when and what else accesses it, I'm assuming more than just /sbin/init at boot, right? Am I overthinking this because nothing else reads login.conf.db ever? /usr/bin/login accesses it every user login, no? Do i misunderstand totally? > > >> I think we can tighten this up so theres no twisted ankles and no one falling in this rare case but certainly potential manhole. (the manhole i'm talking about is login.conf and login.conf.db being out of sync because the later is a symlink to /boot/etc and someone might rollback to a more restrictive login.conf and think they're covered without running cap_mkdb again but their login.conf.db is actually out of sync and less restrictive in a way that burns them) >> > Sorry you had to work around that -- you should have filed a PR. > I will file a PR if i look at the problem more in depth if someone doesn't chime in and save me with already-expert knowledge that i don't have to dig for. (one can hope, right?) > >> Devin, thank you IMMENSELY for bsdinstall and especially bsdconfig. I use them both at work and they make life so much better. And thank you for the simplification using kenv. I was unaware of it > On a side-note, I didn't write bsdinstall -- I'm going to maintain it, but I wrote bsdconfig ^_^ (smiles) > > Thank you very much for your appreciation. Certainly a labor of love and I'm happy that others have kicked the wheels at least. Yeah i've more than kicked the tires. It's excellent work keep it up. 
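One way the out-of-sync case described above might be detected, as a rough sketch only. It assumes a checksum of login.conf is recorded whenever the database is rebuilt; the stamp file path and both function names are hypothetical, and nothing here answers the open question of when login.conf.db is actually read. A checksum is used rather than a timestamp comparison because a rollback makes login.conf older, not newer, than the database.

```shell
#!/bin/sh
# Sketch: notice when login.conf no longer matches the copy that
# login.conf.db was last built from. Names and paths are placeholders.

# needs_rebuild CONF STAMP
# Returns 0 when STAMP is missing or CONF's checksum differs from it.
needs_rebuild() {
    conf=$1
    stamp=$2
    [ -r "$stamp" ] || return 0
    [ "$(cksum < "$conf")" != "$(cat "$stamp")" ]
}

# record_build CONF STAMP
# Stores CONF's current checksum in STAMP after a successful rebuild.
record_build() {
    cksum < "$1" > "$2"
}

# Intended use on FreeBSD, for example right after /etc is mounted or
# rolled back (not executed here):
#   if needs_rebuild /etc/login.conf /boot/etc/login.conf.cksum; then
#       cap_mkdb /etc/login.conf &&
#           record_build /etc/login.conf /boot/etc/login.conf.cksum
#   fi
```

Keeping the stamp next to the real database in /boot/etc means it is not rolled back together with the /etc dataset, which is the point of the comparison.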
From owner-freebsd-hackers@FreeBSD.ORG Tue Jul 9 09:42:50 2013 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 67DC8456 for ; Tue, 9 Jul 2013 09:42:50 +0000 (UTC) (envelope-from wojtek@wojtek.tensor.gdynia.pl) Received: from wojtek.tensor.gdynia.pl (wojtek.tensor.gdynia.pl [188.252.31.196]) by mx1.freebsd.org (Postfix) with ESMTP id E8428148A for ; Tue, 9 Jul 2013 09:42:49 +0000 (UTC) Received: from wojtek.tensor.gdynia.pl (localhost [127.0.0.1]) by wojtek.tensor.gdynia.pl (8.14.7/8.14.7) with ESMTP id r699gYeY007362; Tue, 9 Jul 2013 11:42:34 +0200 (CEST) (envelope-from wojtek@wojtek.tensor.gdynia.pl) Received: from localhost (wojtek@localhost) by wojtek.tensor.gdynia.pl (8.14.7/8.14.7/Submit) with ESMTP id r699gYeL007359; Tue, 9 Jul 2013 11:42:34 +0200 (CEST) (envelope-from wojtek@wojtek.tensor.gdynia.pl) Date: Tue, 9 Jul 2013 11:42:33 +0200 (CEST) From: Wojciech Puchar To: Aryeh Friedman Subject: Re: writing a rc.d script In-Reply-To: Message-ID: References: User-Agent: Alpine 2.00 (BSF 1167 2008-08-23) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.4.3 (wojtek.tensor.gdynia.pl [127.0.0.1]); Tue, 09 Jul 2013 11:42:34 +0200 (CEST) Cc: FreeBSD Mailing List X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 09 Jul 2013 09:42:50 -0000 any other script is a template. man rc is your documentation On Sun, 7 Jul 2013, Aryeh Friedman wrote: > I have a program I am making a port for that also requires a > /usr/local/etc/rc.d script is there anywhere I can find documentation > on how write one and/or a template file to follow? 
> _______________________________________________ > freebsd-hackers@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-hackers > To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org" > > From owner-freebsd-hackers@FreeBSD.ORG Tue Jul 9 11:36:30 2013 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 2B2C0B0D for ; Tue, 9 Jul 2013 11:36:30 +0000 (UTC) (envelope-from feld@feld.me) Received: from out1-smtp.messagingengine.com (out1-smtp.messagingengine.com [66.111.4.25]) by mx1.freebsd.org (Postfix) with ESMTP id F39A61A37 for ; Tue, 9 Jul 2013 11:36:29 +0000 (UTC) Received: from compute6.internal (compute6.nyi.mail.srv.osa [10.202.2.46]) by gateway1.nyi.mail.srv.osa (Postfix) with ESMTP id F05B520D61 for ; Tue, 9 Jul 2013 07:36:28 -0400 (EDT) Received: from frontend2.nyi.mail.srv.osa ([10.202.2.161]) by compute6.internal (MEProxy); Tue, 09 Jul 2013 07:36:28 -0400 DKIM-Signature: v=1; a=rsa-sha1; c=relaxed/relaxed; d=feld.me; h= content-type:to:subject:references:date:mime-version :content-transfer-encoding:from:message-id:in-reply-to; s= mesmtp; bh=fRH7T9+7mjLCX9577yREQN6CLI0=; b=mD+nVZGoQnU94cGCfdPX0 5py0O2M43UIiCIL/OUvTs6AOPPF2VC41usT58WZ73SjXuCMbdoasWZ2ntFytkUDj HuyoHCRxJnqSYel24o1gcVL0DyJ0py+qWl9KPNFD9kcgxLyj8QO5oGDsJS88JhhP d5cZb0jbzqYz/DhuAZqYEQ= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed/relaxed; d= messagingengine.com; h=content-type:to:subject:references:date :mime-version:content-transfer-encoding:from:message-id :in-reply-to; s=smtpout; bh=fRH7T9+7mjLCX9577yREQN6CLI0=; b=YBdb SFNfozZvv3ObFd9NEeSTP9T9nVZOzE3qin/jhtzNZ1BWYkNuGrd8EUehjA0gmzVj Y5+iR3CEOQ5xWUGJ5xImUBB7Fg9uW1Pt/kPAncQU9wr8pcrxH6DDVkt7mn7rOOcu ka77U6MMBzQv9K8KVbN5U3teKxDJkunP1VTvCNE= X-Sasl-enc: TW7e2alK/c3xPdF25+pCo4meXBqAXcGH9YTrkaXp57Xj 1373369788 Received: from markf.office.supranet.net (unknown [66.170.8.18]) by 
mail.messagingengine.com (Postfix) with ESMTPA id BDE186804D5 for ; Tue, 9 Jul 2013 07:36:28 -0400 (EDT) Content-Type: text/plain; charset=us-ascii; format=flowed; delsp=yes To: freebsd-hackers@freebsd.org Subject: Re: writing a rc.d script References: Date: Tue, 09 Jul 2013 06:36:28 -0500 MIME-Version: 1.0 Content-Transfer-Encoding: 7bit From: "Mark Felder" Message-ID: In-Reply-To: User-Agent: Opera Mail/12.15 (FreeBSD) X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 09 Jul 2013 11:36:30 -0000 On Sun, 07 Jul 2013 06:45:46 -0500, Aryeh Friedman wrote: > I have a program I am making a port for that also requires a > /usr/local/etc/rc.d script is there anywhere I can find documentation > on how write one and/or a template file to follow? > Start with another similar port if possible. Make sure you use rclint to verify its correctness. There is a ton of information inside /etc/rc.subr and /usr/ports/Mk/* which may help you write your port where the documentation falls short. 
From owner-freebsd-hackers@FreeBSD.ORG Wed Jul 10 09:02:24 2013 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 05561558; Wed, 10 Jul 2013 09:02:24 +0000 (UTC) (envelope-from des@des.no) Received: from smtp.des.no (smtp.des.no [194.63.250.102]) by mx1.freebsd.org (Postfix) with ESMTP id BBCE41D11; Wed, 10 Jul 2013 09:02:23 +0000 (UTC) Received: from nine.des.no (smtp.des.no [194.63.250.102]) by smtp-int.des.no (Postfix) with ESMTP id 885E342F1; Wed, 10 Jul 2013 09:02:16 +0000 (UTC) Received: by nine.des.no (Postfix, from userid 1001) id D7E0E352BD; Wed, 10 Jul 2013 11:02:00 +0200 (CEST) From: =?utf-8?Q?Dag-Erling_Sm=C3=B8rgrav?= To: freebsd-fs@freebsd.org, freebsd-hackers@freebsd.org Subject: Make ZFS use the physical sector size when computing initial ashift Date: Wed, 10 Jul 2013 11:02:00 +0200 Message-ID: <86zjtupz3r.fsf@nine.des.no> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.3 (berkeley-unix) MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="=-=-=" Cc: ivoras@freebsd.org X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 10 Jul 2013 09:02:24 -0000 --=-=-= Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable The attached patch causes ZFS to base the minimum transfer size for a new vdev on the GEOM provider's stripesize (physical sector size) rather than sectorsize (logical sector size), provided that stripesize is a power of two larger than sectorsize and smaller than or equal to VDEV_PAD_SIZE. This should eliminate the need for ivoras@'s gnop trick when creating ZFS pools on Advanced Format drives. 
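For readers who want to see the selection rule described above in isolation, here is a minimal userland sketch. It is not the patch itself: pick_ashift(), highbit_sketch(), is_pow2(), MIN_BLOCK_SIZE and MAX_PAD_SIZE are hypothetical names introduced for illustration, and the 8192-byte limit is an assumed stand-in for VDEV_PAD_SIZE.

```c
#include <stdint.h>

#define MIN_BLOCK_SIZE 512u   /* stands in for SPA_MINBLOCKSIZE */
#define MAX_PAD_SIZE   8192u  /* assumed stand-in for VDEV_PAD_SIZE */

/* 1-based index of the highest set bit, 0 for an input of 0. */
static int
highbit_sketch(uint64_t v)
{
	int bit = 0;

	while (v != 0) {
		bit++;
		v >>= 1;
	}
	return (bit);
}

/* Nonzero when v is a power of two. */
static int
is_pow2(uint64_t v)
{
	return (v != 0 && (v & (v - 1)) == 0);
}

/*
 * Choose an ashift from the logical (sectorsize) and physical
 * (stripesize) sector sizes.  new_vdev is nonzero when the vdev is
 * not yet part of a pool; existing vdevs keep using sectorsize.
 */
int
pick_ashift(uint64_t sectorsize, uint64_t stripesize, int new_vdev)
{
	uint64_t size = sectorsize;

	if (new_vdev && stripesize > sectorsize && is_pow2(stripesize) &&
	    stripesize <= MAX_PAD_SIZE)
		size = stripesize;
	if (size < MIN_BLOCK_SIZE)
		size = MIN_BLOCK_SIZE;
	return (highbit_sketch(size) - 1);
}
```

With these assumptions, a new vdev on a drive reporting 512-byte logical and 4096-byte physical sectors gets ashift 12, while the same drive in an existing pool stays at ashift 9.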
DES
-- 
Dag-Erling Smørgrav - des@des.no
--=-=-=
Content-Type: text/x-patch
Content-Disposition: attachment; filename=zfs-vdev-stripesize.diff

Index: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
===================================================================
--- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c	(revision 253138)
+++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c	(working copy)
@@ -578,6 +578,7 @@
 {
 	struct g_provider *pp;
 	struct g_consumer *cp;
+	u_int sectorsize;
 	size_t bufsize;
 	int error;
 
@@ -661,8 +662,21 @@
 
 	/*
 	 * Determine the device's minimum transfer size.
+	 *
+	 * This is a bit of a hack.  For performance reasons, we would
+	 * prefer to use the physical sector size (reported by GEOM as
+	 * stripesize) as minimum transfer size.  However, doing so
+	 * unconditionally would break existing vdevs.  Therefore, we
+	 * compute ashift based on stripesize when the vdev isn't already
+	 * part of a pool (vdev_asize == 0), and sectorsize otherwise.
 	 */
-	*ashift = highbit(MAX(pp->sectorsize, SPA_MINBLOCKSIZE)) - 1;
+	if (vd->vdev_asize == 0 && pp->stripesize > pp->sectorsize &&
+	    ISP2(pp->stripesize) && pp->stripesize <= VDEV_PAD_SIZE) {
+		sectorsize = pp->stripesize;
+	} else {
+		sectorsize = pp->sectorsize;
+	}
+	*ashift = highbit(MAX(sectorsize, SPA_MINBLOCKSIZE)) - 1;
 
 	/*
 	 * Clear the nowritecache settings, so that on a vdev_reopen()
--=-=-=--
From owner-freebsd-hackers@FreeBSD.ORG Wed Jul 10 09:25:27 2013 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id D675FEB5; Wed, 10 Jul 2013 09:25:27 +0000 (UTC) (envelope-from prvs=1903808b5b=killing@multiplay.co.uk) Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) by mx1.freebsd.org (Postfix) with ESMTP id BDCB21E65; Wed, 10 Jul 2013 09:25:26 +0000 (UTC) Received: from r2d2 ([82.69.141.170]) by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (MDaemon PRO v10.0.4) with ESMTP id md50004902843.msg; Wed, 10 Jul 2013 10:25:24 +0100 X-Spam-Processed: mail1.multiplay.co.uk, Wed, 10 Jul 2013 10:25:24 +0100 (not processed: message from valid local sender) X-MDDKIM-Result: neutral (mail1.multiplay.co.uk) X-MDRemoteIP: 82.69.141.170 X-Return-Path: prvs=1903808b5b=killing@multiplay.co.uk X-Envelope-From: killing@multiplay.co.uk Message-ID: <628C5D1AF6044488B708484203D70B7A@multiplay.co.uk> From: "Steven Hartland" To: =?utf-8?Q?Dag-Erling_Sm=C3=B8rgrav?= , , , References: <86zjtupz3r.fsf@nine.des.no> Subject: Re: Make ZFS use the physical sector size when computing initial ashift Date: Wed, 10 Jul 2013 10:25:36 +0100 MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="----=_NextPart_000_1133_01CE7D57.D127DB40" X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.5931 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6157 Cc: ivoras@freebsd.org X-BeenThere: 
freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 10 Jul 2013 09:25:28 -0000 This is a multi-part message in MIME format. ------=_NextPart_000_1133_01CE7D57.D127DB40 Content-Type: text/plain; format=flowed; charset="utf-8"; reply-type=original Content-Transfer-Encoding: 8bit Hi DES, unfortunately you need quite a bit more than this to work compatibly. I've had a patch here that does just this for quite some time, but there's been some discussion on how we want additional control over this, so it's not been committed. If others are interested, I've attached it, as it achieves what we needed here and so may be of use to others too. There's also a big discussion on illumos about this very subject ATM, so I'm monitoring that too. Hopefully a nice conclusion will come from that on how people want to proceed, and we'll be able to get a change in that works for everyone. Regards Steve ----- Original Message ----- From: "Dag-Erling Smørgrav" To: ; Cc: Sent: Wednesday, July 10, 2013 10:02 AM Subject: Make ZFS use the physical sector size when computing initial ashift The attached patch causes ZFS to base the minimum transfer size for a new vdev on the GEOM provider's stripesize (physical sector size) rather than sectorsize (logical sector size), provided that stripesize is a power of two larger than sectorsize and smaller than or equal to VDEV_PAD_SIZE. This should eliminate the need for ivoras@'s gnop trick when creating ZFS pools on Advanced Format drives. 
DES -- Dag-Erling Smørgrav - des@des.no -------------------------------------------------------------------------------- > _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" ================================================ This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmaster@multiplay.co.uk. ------=_NextPart_000_1133_01CE7D57.D127DB40 Content-Type: application/octet-stream; name="zzz-zfs-ashift-fix.patch" Content-Transfer-Encoding: quoted-printable Content-Disposition: attachment; filename="zzz-zfs-ashift-fix.patch" Changes zfs zpool initial / desired ashift to be based off stripesize=0A= instead of sectorsize making it compatible with drives marked with=0A= the 4k sector size quirk.=0A= =0A= Without the correct min block size BIO_DELETE requests passed to=0A= a large number of current SSD's via TRIM don't actually perform=0A= any LBA TRIM so its vital for the correct operation of TRIM to get=0A= the correct min block size.=0A= =0A= To do this we added the additional dashift (desired ashift) to=0A= vdev_open_func_t calls. This was needed as just updating ashift to=0A= be based off stripesize would mean that a devices reported minimum=0A= transfer size (ashift) could increase and that in turn would cause=0A= member devices to be unusable and hence break pools with error=0A= ZFS-8000-5E.=0A= =0A= The global minimum ashift used for new zpools can now also be=0A= tuned using the vfs.zfs.min_create_ashift sysctl. 
This defaults=0A= to 12 (4096 byte blocks) in order to optimise for newer disks which=0A= are migrating from 512 to 4096 byte sectors.=0A= =0A= The value of vfs.zfs.min_create_ashift is limited to min of=0A= SPA_MINBLOCKSHIFT (9) and a max of SPA_MAXBLOCKSHIFT (17).=0A= --- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_disk.c.orig = 2011-06-06 09:36:46.000000000 +0000=0A= +++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_disk.c = 2012-11-02 14:47:55.293668071 +0000=0A= @@ -32,6 +32,8 @@=0A= #include =0A= #include =0A= =0A= +extern int zfs_min_ashift;=0A= +=0A= /*=0A= * Virtual device vector for disks.=0A= */=0A= @@ -103,7 +105,7 @@=0A= }=0A= =0A= static int=0A= -vdev_disk_open(vdev_t *vd, uint64_t *psize, uint64_t *ashift)=0A= +vdev_disk_open(vdev_t *vd, uint64_t *psize, uint64_t *ashift, uint64_t = *dashift)=0A= {=0A= spa_t *spa =3D vd->vdev_spa;=0A= vdev_disk_t *dvd;=0A= @@ -284,7 +286,7 @@=0A= }=0A= =0A= /*=0A= - * Determine the device's minimum transfer size.=0A= + * Determine the device's minimum and desired transfer size.=0A= * If the ioctl isn't supported, assume DEV_BSIZE.=0A= */=0A= if (ldi_ioctl(dvd->vd_lh, DKIOCGMEDIAINFOEXT, (intptr_t)&dkmext,=0A= @@ -292,6 +294,7 @@=0A= dkmext.dki_pbsize =3D DEV_BSIZE;=0A= =0A= *ashift =3D highbit(MAX(dkmext.dki_pbsize, SPA_MINBLOCKSIZE)) - 1;=0A= + *dashift =3D highbit(MAX(dkmext.dki_pbsize, (1ULL << zfs_min_ashift))) = - 1;=0A= =0A= /*=0A= * Clear the nowritecache bit, so that on a vdev_reopen() we will=0A= --- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_file.c.orig = 2012-01-05 22:31:25.000000000 +0000=0A= +++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_file.c = 2012-11-02 14:47:38.252107541 +0000=0A= @@ -30,6 +30,8 @@=0A= #include =0A= #include =0A= =0A= +extern int zfs_min_ashift;=0A= +=0A= /*=0A= * Virtual device vector for files.=0A= */=0A= @@ -47,7 +49,7 @@=0A= }=0A= =0A= static int=0A= -vdev_file_open(vdev_t *vd, uint64_t *psize, uint64_t *ashift)=0A= +vdev_file_open(vdev_t *vd, 
uint64_t *psize, uint64_t *ashift, uint64_t = *dashift)=0A= {=0A= vdev_file_t *vf;=0A= vnode_t *vp;=0A= @@ -127,6 +129,7 @@=0A= =0A= *psize =3D vattr.va_size;=0A= *ashift =3D SPA_MINBLOCKSHIFT;=0A= + *dashift =3D zfs_min_ashift;=0A= =0A= return (0);=0A= }=0A= --- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c.orig = 2012-11-02 12:20:15.918986181 +0000=0A= +++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c = 2012-11-02 14:47:48.135273692 +0000=0A= @@ -36,6 +36,8 @@=0A= #include =0A= #include =0A= =0A= +extern int zfs_min_ashift;=0A= +=0A= /*=0A= * Virtual device vector for GEOM.=0A= */=0A= @@ -408,7 +410,7 @@=0A= }=0A= =0A= static int=0A= -vdev_geom_open(vdev_t *vd, uint64_t *psize, uint64_t *ashift)=0A= +vdev_geom_open(vdev_t *vd, uint64_t *psize, uint64_t *ashift, uint64_t = *dashift)=0A= {=0A= struct g_provider *pp;=0A= struct g_consumer *cp;=0A= @@ -494,9 +496,10 @@=0A= *psize =3D pp->mediasize;=0A= =0A= /*=0A= - * Determine the device's minimum transfer size.=0A= + * Determine the device's minimum and desired transfer size.=0A= */=0A= *ashift =3D highbit(MAX(pp->sectorsize, SPA_MINBLOCKSIZE)) - 1;=0A= + *dashift =3D highbit(MAX(pp->stripesize, (1ULL << zfs_min_ashift))) - = 1;=0A= =0A= /*=0A= * Clear the nowritecache settings, so that on a vdev_reopen()=0A= --- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_mirror.c.orig = 2012-07-03 11:49:22.342245151 +0000=0A= +++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_mirror.c = 2012-07-03 11:58:02.161948585 +0000=0A= @@ -127,7 +127,7 @@=0A= }=0A= =0A= static int=0A= -vdev_mirror_open(vdev_t *vd, uint64_t *asize, uint64_t *ashift)=0A= +vdev_mirror_open(vdev_t *vd, uint64_t *asize, uint64_t *ashift, = uint64_t *dashift)=0A= {=0A= int numerrors =3D 0;=0A= int lasterror =3D 0;=0A= @@ -150,6 +150,7 @@=0A= =0A= *asize =3D MIN(*asize - 1, cvd->vdev_asize - 1) + 1;=0A= *ashift =3D MAX(*ashift, cvd->vdev_ashift);=0A= + *dashift =3D MAX(*dashift, cvd->vdev_dashift);=0A= }=0A= =0A= if 
(numerrors =3D=3D vd->vdev_children) {=0A= --- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_missing.c.orig = 2012-07-03 11:49:10.545275865 +0000=0A= +++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_missing.c = 2012-07-03 11:58:07.670470640 +0000=0A= @@ -40,7 +40,7 @@=0A= =0A= /* ARGSUSED */=0A= static int=0A= -vdev_missing_open(vdev_t *vd, uint64_t *psize, uint64_t *ashift)=0A= +vdev_missing_open(vdev_t *vd, uint64_t *psize, uint64_t *ashift, = uint64_t *dashift)=0A= {=0A= /*=0A= * Really this should just fail. But then the root vdev will be in the=0A= @@ -50,6 +50,7 @@=0A= */=0A= *psize =3D 0;=0A= *ashift =3D 0;=0A= + *dashift =3D 0;=0A= return (0);=0A= }=0A= =0A= --- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_raidz.c.orig = 2012-07-03 11:49:03.675875505 +0000=0A= +++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_raidz.c = 2012-07-03 11:58:15.334806334 +0000=0A= @@ -1447,7 +1447,7 @@=0A= }=0A= =0A= static int=0A= -vdev_raidz_open(vdev_t *vd, uint64_t *asize, uint64_t *ashift)=0A= +vdev_raidz_open(vdev_t *vd, uint64_t *asize, uint64_t *ashift, uint64_t = *dashift)=0A= {=0A= vdev_t *cvd;=0A= uint64_t nparity =3D vd->vdev_nparity;=0A= @@ -1476,6 +1476,7 @@=0A= =0A= *asize =3D MIN(*asize - 1, cvd->vdev_asize - 1) + 1;=0A= *ashift =3D MAX(*ashift, cvd->vdev_ashift);=0A= + *dashift =3D MAX(*dashift, cvd->vdev_dashift);=0A= }=0A= =0A= *asize *=3D vd->vdev_children;=0A= --- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_root.c.orig = 2012-07-03 11:49:27.901760380 +0000=0A= +++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_root.c = 2012-07-03 11:58:19.704427068 +0000=0A= @@ -50,7 +50,7 @@=0A= }=0A= =0A= static int=0A= -vdev_root_open(vdev_t *vd, uint64_t *asize, uint64_t *ashift)=0A= +vdev_root_open(vdev_t *vd, uint64_t *asize, uint64_t *ashift, uint64_t = *dashift)=0A= {=0A= int lasterror =3D 0;=0A= int numerrors =3D 0;=0A= @@ -78,6 +78,7 @@=0A= =0A= *asize =3D 0;=0A= *ashift =3D 0;=0A= + *dashift =3D 0;=0A= =0A= return (0);=0A= 
}=0A= --- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev.c.orig = 2012-10-22 20:41:50.234005351 +0000=0A= +++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev.c 2012-10-22 = 20:42:16.355805894 +0000=0A= @@ -1125,6 +1125,7 @@=0A= uint64_t osize =3D 0;=0A= uint64_t asize, psize;=0A= uint64_t ashift =3D 0;=0A= + uint64_t dashift =3D 0;=0A= =0A= ASSERT(vd->vdev_open_thread =3D=3D curthread ||=0A= spa_config_held(spa, SCL_STATE_ALL, RW_WRITER) =3D=3D = SCL_STATE_ALL);=0A= @@ -1154,7 +1155,7 @@=0A= return (ENXIO);=0A= }=0A= =0A= - error =3D vd->vdev_ops->vdev_op_open(vd, &osize, &ashift);=0A= + error =3D vd->vdev_ops->vdev_op_open(vd, &osize, &ashift, &dashift);=0A= =0A= /*=0A= * Reset the vdev_reopening flag so that we actually close=0A= @@ -1255,14 +1256,16 @@=0A= */=0A= vd->vdev_asize =3D asize;=0A= vd->vdev_ashift =3D MAX(ashift, vd->vdev_ashift);=0A= + vd->vdev_dashift =3D MAX(dashift, vd->vdev_dashift);=0A= } else {=0A= /*=0A= * Make sure the alignment requirement hasn't increased.=0A= */=0A= if (ashift > vd->vdev_top->vdev_ashift) {=0A= + printf("ZFS ashift open failure of %s (%ld > %ld)\n", vd->vdev_path, = ashift, vd->vdev_top->vdev_ashift);=0A= vdev_set_state(vd, B_TRUE, VDEV_STATE_CANT_OPEN,=0A= VDEV_AUX_BAD_LABEL);=0A= return (EINVAL);=0A= }=0A= }=0A= =0A= --- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_label.c.orig = 2012-11-05 15:27:52.092194343 +0000=0A= +++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_label.c = 2012-11-05 15:53:26.449021023 +0000=0A= @@ -145,9 +145,12 @@=0A= #include =0A= =0A= static boolean_t vdev_trim_on_init =3D B_TRUE;=0A= +static boolean_t vdev_dashift_enable =3D B_TRUE;=0A= SYSCTL_DECL(_vfs_zfs_vdev);=0A= SYSCTL_INT(_vfs_zfs_vdev, OID_AUTO, trim_on_init, CTLFLAG_RW,=0A= &vdev_trim_on_init, 0, "Enable/disable full vdev trim on = initialisation");=0A= +SYSCTL_INT(_vfs_zfs_vdev, OID_AUTO, optimal_ashift, CTLFLAG_RW,=0A= + &vdev_dashift_enable, 0, "Enable/disable optimal ashift usage on = initialisation");=0A= 
=0A= /*=0A= * Basic routines to read and write from a vdev label.=0A= @@ -282,6 +285,16 @@=0A= vd->vdev_ms_array) =3D=3D 0);=0A= VERIFY(nvlist_add_uint64(nv, ZPOOL_CONFIG_METASLAB_SHIFT,=0A= vd->vdev_ms_shift) =3D=3D 0);=0A= + /*=0A= + * We use the max of ashift and dashift (the desired/optimal=0A= + * ashift), which is typically the stripesize of a device, to=0A= + * ensure we get the best performance from underlying devices.=0A= + * =0A= + * Its done here as it should only ever have an effect on new=0A= + * zpool creation.=0A= + */=0A= + if (vdev_dashift_enable)=0A= + vd->vdev_ashift =3D MAX(vd->vdev_ashift, vd->vdev_dashift);=0A= VERIFY(nvlist_add_uint64(nv, ZPOOL_CONFIG_ASHIFT,=0A= vd->vdev_ashift) =3D=3D 0);=0A= VERIFY(nvlist_add_uint64(nv, ZPOOL_CONFIG_ASIZE,=0A= --- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/vdev_impl.h.orig = 2012-10-22 20:40:08.361577293 +0000=0A= +++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/vdev_impl.h = 2012-10-22 21:02:52.447781800 +0000=0A= @@ -55,7 +55,7 @@=0A= /*=0A= * Virtual device operations=0A= */=0A= -typedef int vdev_open_func_t(vdev_t *vd, uint64_t *size, uint64_t = *ashift);=0A= +typedef int vdev_open_func_t(vdev_t *vd, uint64_t *size, uint64_t = *ashift, uint64_t *dashift);=0A= typedef void vdev_close_func_t(vdev_t *vd);=0A= typedef uint64_t vdev_asize_func_t(vdev_t *vd, uint64_t psize);=0A= typedef int vdev_io_start_func_t(zio_t *zio);=0A= @@ -119,6 +119,7 @@=0A= uint64_t vdev_asize; /* allocatable device capacity */=0A= uint64_t vdev_min_asize; /* min acceptable asize */=0A= uint64_t vdev_ashift; /* block alignment shift */=0A= + uint64_t vdev_dashift; /* desired blk alignment shift */=0A= uint64_t vdev_state; /* see VDEV_STATE_* #defines */=0A= uint64_t vdev_prevstate; /* used when reopening a vdev */=0A= vdev_ops_t *vdev_ops; /* vdev operations */=0A= --- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_pool.c.orig = 2012-11-02 14:56:29.474248887 +0000=0A= +++ 
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_pool.c 2012-11-03 = 01:27:28.066912403 +0000=0A= @@ -41,6 +41,30 @@=0A= #include =0A= #include =0A= =0A= +#define ZFS_MIN_ASHIFT SPA_MINBLOCKSHIFT=0A= +/*=0A= + * Max ashift - limited by how labels are accessed by zio_read_phys = using offsets=0A= + * within vdev_label_t=0A= + *=0A= + * If label access is fixed to work with ashift properly then the max = should be=0A= + * set to SPA_MAXBLOCKSHIFT=0A= + */=0A= +#define ZFS_MAX_ASHIFT 13=0A= +/*=0A= + * Optimum ashift - defaults to 12 which results in a min block size of = 4096 as=0A= + * this is the optimum value for newer disks which are migrating from = 512 to 4096=0A= + * byte sectors=0A= + */=0A= +#define ZFS_OPTIMUM_ASHIFT 12 =0A= +=0A= +/*=0A= + * Minimum ashift used when creating new pools=0A= + *=0A= + * This can be tuned using the sysctl vfs.zfs.min_create_ashift but is = limited=0A= + * to a min of ZFS_MIN_ASHIFT and a max of ZFS_MAX_ASHIFT=0A= + * =0A= + */=0A= +int zfs_min_ashift =3D MAX(SPA_MINBLOCKSHIFT, ZFS_OPTIMUM_ASHIFT);=0A= int zfs_no_write_throttle =3D 0;=0A= int zfs_write_limit_shift =3D 3; /* 1/8th of physical memory */=0A= int zfs_txg_synctime_ms =3D 1000; /* target millisecs to sync a txg */=0A= @@ -54,6 +78,9 @@=0A= =0A= static pgcnt_t old_physmem =3D 0;=0A= =0A= +#ifdef _KERNEL=0A= +static int min_ashift_sysctl(SYSCTL_HANDLER_ARGS);=0A= +=0A= SYSCTL_DECL(_vfs_zfs);=0A= TUNABLE_INT("vfs.zfs.no_write_throttle", &zfs_no_write_throttle);=0A= SYSCTL_INT(_vfs_zfs, OID_AUTO, no_write_throttle, CTLFLAG_RDTUN,=0A= @@ -78,6 +105,32 @@=0A= TUNABLE_QUAD("vfs.zfs.write_limit_override", &zfs_write_limit_override);=0A= SYSCTL_QUAD(_vfs_zfs, OID_AUTO, write_limit_override, CTLFLAG_RDTUN,=0A= &zfs_write_limit_override, 0, "");=0A= +SYSCTL_PROC(_vfs_zfs, OID_AUTO, min_create_ashift, CTLTYPE_INT | = CTLFLAG_RW,=0A= + &zfs_min_ashift, 0, min_ashift_sysctl, "I",=0A= + "Minimum ashift used when creating new pools");=0A= +=0A= +static int=0A= 
+min_ashift_sysctl(SYSCTL_HANDLER_ARGS)=0A= +{=0A= + int error, value;=0A= +=0A= + value =3D *(int *)arg1;=0A= +=0A= + error =3D sysctl_handle_int(oidp, &value, 0, req);=0A= +=0A= + if ((error !=3D 0) || (req->newptr =3D=3D NULL))=0A= + return (error);=0A= +=0A= + if (value < ZFS_MIN_ASHIFT)=0A= + value =3D ZFS_MIN_ASHIFT;=0A= + else if (value > ZFS_MAX_ASHIFT)=0A= + value =3D ZFS_MAX_ASHIFT;=0A= +=0A= + *(int *)arg1 =3D value;=0A= +=0A= + return (0);=0A= +}=0A= +#endif=0A= =0A= int=0A= dsl_pool_open_special_dir(dsl_pool_t *dp, const char *name, dsl_dir_t = **ddp)=0A= ------=_NextPart_000_1133_01CE7D57.D127DB40-- From owner-freebsd-hackers@FreeBSD.ORG Wed Jul 10 09:50:53 2013 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id DFC706C6 for ; Wed, 10 Jul 2013 09:50:53 +0000 (UTC) (envelope-from pluknet@gmail.com) Received: from mail-ve0-x234.google.com (mail-ve0-x234.google.com [IPv6:2607:f8b0:400c:c01::234]) by mx1.freebsd.org (Postfix) with ESMTP id A650C1FAA for ; Wed, 10 Jul 2013 09:50:53 +0000 (UTC) Received: by mail-ve0-f180.google.com with SMTP id pa12so5563113veb.11 for ; Wed, 10 Jul 2013 02:50:53 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=Z9LF2PRxy4Jp/FzpMpmCCVqw/+uFBJspmGGNXfsz1OY=; b=m5tdu/0an4VmU9zQufT79LxBa1GhHyDZoxN1wP8lT7goqLD7ztLJQ6IApTMfDYD1E3 g2mjWoXUjjvsNR5L7awpGfcskiw90/2JaVIEQgpljf4FAdQ3UzR2qzA0ypgdXPSF3WC9 dwHtK+kuv/5eyR39p8/vMUFrlz0d9GVoTkWJ31adUIvbd5EdvcTltwwif5b533WNJh7R 5W1Suaj1vrrz+Kg/8UqE2lrgAPJEs5Ug2fRUFd+8VlE4OmzVzAbde2U8p2esiWastLvp ahYM/b/30A1hIbLDCfeoa/0A7PyVLYvOtZEmkfEcebYhKKgYI88EdqnOmYB49EF450cP N1tA== MIME-Version: 1.0 X-Received: by 10.58.86.70 with SMTP id n6mr18816675vez.8.1373449853180; Wed, 10 Jul 2013 02:50:53 -0700 (PDT) Received: by 10.52.70.39 with HTTP; Wed, 10 Jul 2013 
02:50:52 -0700 (PDT) In-Reply-To: References: Date: Wed, 10 Jul 2013 13:50:52 +0400 Message-ID: Subject: Re: hw.physmem/hw.realmem question From: Sergey Kandaurov To: Wojciech Puchar Content-Type: text/plain; charset=ISO-8859-1 Cc: freebsd-hackers X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 10 Jul 2013 09:50:53 -0000 On 3 July 2013 01:45, Wojciech Puchar wrote: >> AMD Features2=0x1 >> TSC: P-state invariant, performance statistics >> real memory = 34359738368 (32768 MB) >> avail memory = 32191340544 (30700 MB) > > > 2GB memory "disappears" too even when you don't set anything. > > i asked such a question for other machine some time ago without much answer. > > > in your laptop it may be shared graphics memory reserved by chipset > > still on my dell server > > > real memory = 34359738368 (32768 MB) > avail memory = 33166921728 (31630 MB) > > i have over 1GB unavailable and it doesn't have shared graphics memory. > > it would be nice to be able to look exactly how memory is used. On amd64 about 3% is cut on startup for page structures, see vm_page_startup(). 
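As a quick sanity check of that figure against the dmesg numbers quoted above, the gap between "real memory" and "avail memory" can be worked out with integer math. This is only an illustrative sketch; unavailable_bytes() and unavailable_centipercent() are hypothetical helper names, not anything from the kernel, and the calculation says nothing about where the memory actually goes.

```c
#include <stdint.h>

/* Bytes counted as "real memory" but missing from "avail memory". */
uint64_t
unavailable_bytes(uint64_t real_mem, uint64_t avail_mem)
{
	if (avail_mem >= real_mem)
		return (0);
	return (real_mem - avail_mem);
}

/*
 * The same gap in hundredths of a percent of real memory
 * (347 means 3.47%), truncated, using integer math only.
 */
unsigned
unavailable_centipercent(uint64_t real_mem, uint64_t avail_mem)
{
	if (real_mem == 0)
		return (0);
	return ((unsigned)(unavailable_bytes(real_mem, avail_mem) *
	    10000 / real_mem));
}
```

For the Dell server (34359738368 real, 33166921728 avail) this gives 1192816640 bytes, about 3.47%, which is in the range of the roughly 3% mentioned. For the first machine (32191340544 avail) it gives about 6.31%, so something beyond page structures is presumably involved there.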
-- wbr, pluknet From owner-freebsd-hackers@FreeBSD.ORG Wed Jul 10 10:46:27 2013 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 98129BB2; Wed, 10 Jul 2013 10:46:27 +0000 (UTC) (envelope-from des@des.no) Received: from smtp.des.no (smtp.des.no [194.63.250.102]) by mx1.freebsd.org (Postfix) with ESMTP id 5A20012B2; Wed, 10 Jul 2013 10:46:27 +0000 (UTC) Received: from nine.des.no (smtp.des.no [194.63.250.102]) by smtp-int.des.no (Postfix) with ESMTP id 2FD32447F; Wed, 10 Jul 2013 10:46:26 +0000 (UTC) Received: by nine.des.no (Postfix, from userid 1001) id 85689352D6; Wed, 10 Jul 2013 12:46:10 +0200 (CEST) From: =?utf-8?Q?Dag-Erling_Sm=C3=B8rgrav?= To: "Steven Hartland" Subject: Re: Make ZFS use the physical sector size when computing initial ashift References: <86zjtupz3r.fsf@nine.des.no> <628C5D1AF6044488B708484203D70B7A@multiplay.co.uk> Date: Wed, 10 Jul 2013 12:46:10 +0200 In-Reply-To: <628C5D1AF6044488B708484203D70B7A@multiplay.co.uk> (Steven Hartland's message of "Wed, 10 Jul 2013 10:25:36 +0100") Message-ID: <86vc4ipua5.fsf@nine.des.no> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.3 (berkeley-unix) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Cc: freebsd-fs@freebsd.org, freebsd-hackers@freebsd.org, zfs-devel@FreeBSD.org, ivoras@freebsd.org X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 10 Jul 2013 10:46:27 -0000 "Steven Hartland" writes: > Hi DES, unfortunately you need a quite bit more than this to work > compatibly. 
*chirp* *chirp* *chirp* DES -- Dag-Erling Smørgrav - des@des.no From owner-freebsd-hackers@FreeBSD.ORG Wed Jul 10 11:03:20 2013 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id AD306958; Wed, 10 Jul 2013 11:03:20 +0000 (UTC) (envelope-from borjam@sarenet.es) Received: from proxypop04.sare.net (proxypop04.sare.net [194.30.0.65]) by mx1.freebsd.org (Postfix) with ESMTP id 6FF721453; Wed, 10 Jul 2013 11:03:20 +0000 (UTC) Received: from [172.16.2.2] (izaro.sarenet.es [192.148.167.11]) by proxypop04.sare.net (Postfix) with ESMTPSA id 0E5DE9DC4EE; Wed, 10 Jul 2013 13:03:13 +0200 (CEST) Subject: Re: Make ZFS use the physical sector size when computing initial ashift Mime-Version: 1.0 (Apple Message framework v1283) Content-Type: text/plain; charset=us-ascii From: Borja Marcos X-Priority: 3 In-Reply-To: <628C5D1AF6044488B708484203D70B7A@multiplay.co.uk> Date: Wed, 10 Jul 2013 13:03:09 +0200 Content-Transfer-Encoding: quoted-printable Message-Id: <774B60E8-19C2-4A3A-880D-0D8726DC6727@sarenet.es> References: <86zjtupz3r.fsf@nine.des.no> <628C5D1AF6044488B708484203D70B7A@multiplay.co.uk> To: Steven Hartland X-Mailer: Apple Mail (2.1283) Cc: freebsd-fs@freebsd.org, =?iso-8859-1?Q?Dag-Erling_Sm=F8rgrav?= , zfs-devel@FreeBSD.org, ivoras@freebsd.org, freebsd-hackers@freebsd.org X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 10 Jul 2013 11:03:20 -0000 On Jul 10, 2013, at 11:25 AM, Steven Hartland wrote: > If others are interested I've attached this as it achieves what we needed here so > may also be of use for others too. > > There's also a big discussion on illumos about this very subject ATM so I'm > monitoring that too. 
> > Hopefully there will be a nice conclusion come from that how people want to > proceed and we'll be able to get a change in that works for everyone. Hmm. I wonder if the simplest approach would be the better. I mean, adding a flag to zpool. At home I have a playground FreeBSD machine with a ZFS zmirror, and, you guessed it, I was careless when I purchased the components, I asked for two "1 TB drives" and that I got, but different models, one of them "advanced format" and the other one "classic". I don't think it's that bad to create a pool on a classic disk using 4 KB blocks, and it's quite likely that replacement disks will be 4 KB in the near future. Also, if you use SSDs the situation is similar. Borja. From owner-freebsd-hackers@FreeBSD.ORG Wed Jul 10 11:10:44 2013 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 2686ED3A; Wed, 10 Jul 2013 11:10:44 +0000 (UTC) (envelope-from prvs=1903808b5b=killing@multiplay.co.uk) Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) by mx1.freebsd.org (Postfix) with ESMTP id 5096715E8; Wed, 10 Jul 2013 11:10:43 +0000 (UTC) Received: from r2d2 ([82.69.141.170]) by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (MDaemon PRO v10.0.4) with ESMTP id md50004904388.msg; Wed, 10 Jul 2013 12:10:35 +0100 X-Spam-Processed: mail1.multiplay.co.uk, Wed, 10 Jul 2013 12:10:35 +0100 (not processed: message from valid local sender) X-MDDKIM-Result: neutral (mail1.multiplay.co.uk) X-MDRemoteIP: 82.69.141.170 X-Return-Path: prvs=1903808b5b=killing@multiplay.co.uk X-Envelope-From: killing@multiplay.co.uk Message-ID: <197F9EAB64AD4A1FBC4DD75F7255D55D@multiplay.co.uk> From: "Steven Hartland" To: "Borja Marcos" References: <86zjtupz3r.fsf@nine.des.no> <628C5D1AF6044488B708484203D70B7A@multiplay.co.uk> <774B60E8-19C2-4A3A-880D-0D8726DC6727@sarenet.es> Subject: Re: 
Make ZFS use the physical sector size when computing initial ashift Date: Wed, 10 Jul 2013 12:10:54 +0100 MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=original Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.5931 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6157 Cc: freebsd-fs@freebsd.org, =?iso-8859-1?Q?Dag-Erling_Sm=F8rgrav?= , zfs-devel@FreeBSD.org, ivoras@freebsd.org, freebsd-hackers@freebsd.org X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 10 Jul 2013 11:10:44 -0000

There's a lot more to consider when deciding on a way forward, not least that ashift isn't a zpool configuration option but is set per top-level vdev, and the space cost of moving from 512b to 4k. See previous and current discussions on zfs-devel@freebsd.org and zfs@lists.illumos.org for details.

Regards
Steve

----- Original Message ----- From: "Borja Marcos"

On Jul 10, 2013, at 11:25 AM, Steven Hartland wrote:

> If others are interested I've attached this as it achieves what we needed here so
> may also be of use for others too.
>
> There's also a big discussion on illumos about this very subject ATM so I'm
> monitoring that too.
>
> Hopefully there will be a nice conclusion come from that how people want to
> proceed and we'll be able to get a change in that works for everyone.

Hmm. I wonder if the simplest approach would be the better. I mean, adding a flag to zpool.

At home I have a playground FreeBSD machine with a ZFS zmirror, and, you guessed it, I was careless when I purchased the components, I asked for two "1 TB drives" and that I got, but different models, one of them "advanced format" and the other one "classic".
I don't think it's that bad to create a pool on a classic disk using 4 KB blocks, and it's quite likely that replacement disks will be 4 KB in the near future. Also, if you use SSDs the situation is similar. ================================================ This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmaster@multiplay.co.uk. From owner-freebsd-hackers@FreeBSD.ORG Wed Jul 10 17:21:09 2013 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 2077B847; Wed, 10 Jul 2013 17:21:09 +0000 (UTC) (envelope-from delphij@delphij.net) Received: from anubis.delphij.net (anubis.delphij.net [64.62.153.212]) by mx1.freebsd.org (Postfix) with ESMTP id 0C6201EF6; Wed, 10 Jul 2013 17:21:08 +0000 (UTC) Received: from zeta.ixsystems.com (drawbridge.ixsystems.com [206.40.55.65]) (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by anubis.delphij.net (Postfix) with ESMTPSA id 6BE4499BE; Wed, 10 Jul 2013 10:21:06 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=delphij.net; s=anubis; t=1373476867; bh=QWvdSJ8fOuO5/5aNWRb6P76tenHC3kB+m/UvBmfWclI=; h=Date:From:Reply-To:To:CC:Subject:References:In-Reply-To; b=kT7QZTcEcKXKd/5RzfQJW5lU3NogRHfy2Rzb06m9QqvqizHRMaBWxS604Vzh372ww QJth8LvJ0u13P7HRHcKwqChUG+qLJ5yIY80fRB5W3L90TPC4R0ocrX7aNkjflni6tu Bs+17tlBb5ffPo+6t/pREZzieHwHv/V9KFj7szkw= Message-ID: <51DD9801.4090808@delphij.net> Date: Wed, 10 Jul 2013 10:21:05 -0700 From: Xin Li Organization: The FreeBSD Project MIME-Version: 1.0 To: =?UTF-8?B?RGFnLUVybGluZyBTbcO4cmdyYXY=?= Subject: 
Re: Make ZFS use the physical sector size when computing initial ashift References: <86zjtupz3r.fsf@nine.des.no> In-Reply-To: <86zjtupz3r.fsf@nine.des.no> X-Enigmail-Version: 1.5.1 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Cc: freebsd-fs@freebsd.org, freebsd-hackers@freebsd.org, ivoras@freebsd.org X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list Reply-To: d@delphij.net List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 10 Jul 2013 17:21:09 -0000

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

On 07/10/13 02:02, Dag-Erling Smørgrav wrote:
> The attached patch causes ZFS to base the minimum transfer size for
> a new vdev on the GEOM provider's stripesize (physical sector size)
> rather than sectorsize (logical sector size), provided that
> stripesize is a power of two larger than sectorsize and smaller
> than or equal to VDEV_PAD_SIZE. This should eliminate the need for
> ivoras@'s gnop trick when creating ZFS pools on Advanced Format
> drives.

I think there are multiple versions of this (I also have one[1]) but the concern is that if one created a pool with ashift=9 and the code now computes ashift=12, the pool becomes unimportable. So there needs to be a way to disable this behavior.

Another thing (not really related to the automatic detection) is that we need a way to manually override this setting from the command line when creating the pool; this is under active discussion on the Illumos mailing list right now.

[1] https://github.com/trueos/trueos/commit/3d2e3a38faad8df4acf442b055c5e98ab873fb26

Cheers,
- -- Xin LI https://www.delphij.net/ FreeBSD - The Power to Serve!
Live free or die -----BEGIN PGP SIGNATURE----- iQEcBAEBCgAGBQJR3ZgAAAoJEG80Jeu8UPuzM6kIALu3Ud4uu+kdcsp+zNS54iw6 Etx2xWOjbHhJ1PZ0BKJ4R5/BOfpW4b1DrarPtpZLxoyg55GwlEVCH8Cia9ucznfP KgFGwzztQlsiI5hcWD6RVNkAx/2o7sSynbprxxP1UdEdmH7f5MWVpNwjGE2KiIpA 0TxfTu8Sg0/QB7h3pGWt5sJSuwyogewvHIfTAgHEqnQdYPXxpadH7PS7shSJVdim z2C9GoyLVQ6BMxXzQDcmA+fllgMZVKXROG7SxDFNDTWPnZ9HMZp2OJKELLtuZB1y Iaq/gd3uPR2ZzPxw2OjdYKe7khWtmuU5Ox6+natsOKCqfoAfCjArA8zJZYsZoMI= =Nd1V -----END PGP SIGNATURE----- From owner-freebsd-hackers@FreeBSD.ORG Wed Jul 10 17:38:58 2013 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 7B7BA101; Wed, 10 Jul 2013 17:38:58 +0000 (UTC) (envelope-from gibbs@FreeBSD.org) Received: from aslan.scsiguy.com (aslan.scsiguy.com [70.89.174.89]) by mx1.freebsd.org (Postfix) with ESMTP id 58F54101B; Wed, 10 Jul 2013 17:38:58 +0000 (UTC) Received: from [192.168.6.139] (207-225-98-3.dia.static.qwest.net [207.225.98.3]) (authenticated bits=0) by aslan.scsiguy.com (8.14.7/8.14.5) with ESMTP id r6AHcoCY097797 (version=TLSv1/SSLv3 cipher=AES128-SHA bits=128 verify=NO); Wed, 10 Jul 2013 17:38:50 GMT (envelope-from gibbs@FreeBSD.org) Content-Type: multipart/signed; boundary="Apple-Mail=_4D9E9496-59FD-423A-B74B-D55D497C0941"; protocol="application/pgp-signature"; micalg=pgp-sha1 Mime-Version: 1.0 (Mac OS X Mail 6.5 \(1508\)) Subject: Re: Make ZFS use the physical sector size when computing initial ashift From: "Justin T. 
Gibbs" In-Reply-To: <51DD9801.4090808@delphij.net> Date: Wed, 10 Jul 2013 11:38:45 -0600 Message-Id: <2B9367B6-8759-45C9-B120-9D00A381228F@FreeBSD.org> References: <86zjtupz3r.fsf@nine.des.no> <51DD9801.4090808@delphij.net> To: d@delphij.net X-Mailer: Apple Mail (2.1508) X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.4.3 (aslan.scsiguy.com [70.89.174.89]); Wed, 10 Jul 2013 17:38:50 +0000 (UTC) Cc: freebsd-fs@freebsd.org, =?iso-8859-1?Q?Dag-Erling_Sm=F8rgrav?= , ivoras@freebsd.org, freebsd-hackers@freebsd.org X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 10 Jul 2013 17:38:58 -0000

--Apple-Mail=_4D9E9496-59FD-423A-B74B-D55D497C0941 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset=utf-8

On Jul 10, 2013, at 11:21 AM, Xin Li wrote:

> Signed PGP part
> On 07/10/13 02:02, Dag-Erling Smørgrav wrote:
> > The attached patch causes ZFS to base the minimum transfer size for
> > a new vdev on the GEOM provider's stripesize (physical sector size)
> > rather than sectorsize (logical sector size), provided that
> > stripesize is a power of two larger than sectorsize and smaller
> > than or equal to VDEV_PAD_SIZE. This should eliminate the need for
> > ivoras@'s gnop trick when creating ZFS pools on Advanced Format
> > drives.
>
> I think there are multiple versions of this (I also have one[1]) but
> the concern is that if one creates a pool with ashift=9, and now
> ashift=12, the pool gets unimportable. So there need a way to disable
> this behavior.
>
> Another thing (not really related to the automatic detection) is that
> we need a way to manually override this setting from command line when
> creating the pool, this is under active discussion at Illumos mailing
> list right now.
>
> [1]
> https://github.com/trueos/trueos/commit/3d2e3a38faad8df4acf442b055c5e98ab873fb26
>
> Cheers,
> - --
> Xin LI https://www.delphij.net/
> FreeBSD - The Power to Serve! Live free or die
>
> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"

I'm sure lots of folks have "some solution" to this. Here is an old version of what we use at Spectra:

http://people.freebsd.org/~gibbs/zfs_patches/zfs_auto_ashift.diff

The above patch is missing some cleanup that was motivated by my discussions with George Wilson about this change in April. I'll dig that up later tonight. Even if you don't read the full diff, please read the included checkin comment since it explains the motivation behind this particular solution.

This is on my list of things to upstream in the next week or so after I add logic to the userspace tools to report whether or not the TLVs in a pool are using an optimal allocation size. This is only possible if you actually make ZFS fully aware of logical, physical, and the configured allocation size. All of the other patches I've seen just treat physical as logical.
-- Justin --Apple-Mail=_4D9E9496-59FD-423A-B74B-D55D497C0941 Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename=signature.asc Content-Type: application/pgp-signature; name=signature.asc Content-Description: Message signed with OpenPGP using GPGMail -----BEGIN PGP SIGNATURE----- Version: GnuPG/MacGPG2 v2.0.19 (Darwin) iQEcBAEBAgAGBQJR3ZwlAAoJED9n8CuvaSf4Aj0H/AgxokI9bUkCTo2Krp0PG6qJ BLPugsux3zOTmOoaChH41M9xEiPRu7wlzc7aHNqZQC8MDpk1LTTI81sfJ9M5e1UH DwSCvfRTp5NIBC4sgXt/z9mMogvI3HU1cn2TQp4AfCoKprBBiSnOSPXfp1tujxr6 LZWB0vAAQOlviBS/c4upPn5/gN8VC5qkudu2cLnS+XVxq/udkttjHnLXxV87Lh8/ Dw+R5wAKlAGUMlXTmSc4mJmMxi5jsqxgQ7izNPOwZqZooETSNIOfT9E6Ppl4n+DW CZYHjorTFUCmXiXWCNAmUox00LJcYcrWZZA9sOaGj5FIQ5iMeUYkAbml8PaKQyU= =Znt+ -----END PGP SIGNATURE----- --Apple-Mail=_4D9E9496-59FD-423A-B74B-D55D497C0941-- From owner-freebsd-hackers@FreeBSD.ORG Wed Jul 10 18:05:48 2013 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 952D9C13; Wed, 10 Jul 2013 18:05:48 +0000 (UTC) (envelope-from prvs=1903808b5b=killing@multiplay.co.uk) Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) by mx1.freebsd.org (Postfix) with ESMTP id CF4D01179; Wed, 10 Jul 2013 18:05:47 +0000 (UTC) Received: from r2d2 ([82.69.141.170]) by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (MDaemon PRO v10.0.4) with ESMTP id md50004909991.msg; Wed, 10 Jul 2013 19:05:44 +0100 X-Spam-Processed: mail1.multiplay.co.uk, Wed, 10 Jul 2013 19:05:44 +0100 (not processed: message from valid local sender) X-MDDKIM-Result: neutral (mail1.multiplay.co.uk) X-MDRemoteIP: 82.69.141.170 X-Return-Path: prvs=1903808b5b=killing@multiplay.co.uk X-Envelope-From: killing@multiplay.co.uk Message-ID: From: "Steven Hartland" To: , =?utf-8?Q?Dag-Erling_Sm=C3=B8rgrav?= References: <86zjtupz3r.fsf@nine.des.no> <51DD9801.4090808@delphij.net> Subject: Re: Make ZFS use the 
physical sector size when computing initial ashift Date: Wed, 10 Jul 2013 19:06:00 +0100 MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="utf-8"; reply-type=original Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.5931 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6157 Cc: freebsd-fs@freebsd.org, freebsd-hackers@freebsd.org, ivoras@freebsd.org X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 10 Jul 2013 18:05:48 -0000

----- Original Message ----- From: "Xin Li"

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA512
>
> On 07/10/13 02:02, Dag-Erling Smørgrav wrote:
>> The attached patch causes ZFS to base the minimum transfer size for
>> a new vdev on the GEOM provider's stripesize (physical sector size)
>> rather than sectorsize (logical sector size), provided that
>> stripesize is a power of two larger than sectorsize and smaller
>> than or equal to VDEV_PAD_SIZE. This should eliminate the need for
>> ivoras@'s gnop trick when creating ZFS pools on Advanced Format
>> drives.
>
> I think there are multiple versions of this (I also have one[1]) but
> the concern is that if one creates a pool with ashift=9, and now
> ashift=12, the pool gets unimportable. So there need a way to disable
> this behavior.

I've tested my patch in all the configurations I can think of, including importing exported ashift=9 pools, all with no issues. For your example e.g.
# Create a 4K pool (min_create_ashift=4K, dev=512)
test:src> sysctl vfs.zfs.min_create_ashift
vfs.zfs.min_create_ashift: 12
test:src> mdconfig -a -t swap -s 128m -S 512 -u 0
test:src> zpool create mdpool md0
test:src> zdb mdpool | grep ashift
ashift: 12
ashift: 12

# Create a 512b pool (min_create_ashift=512, dev=512)
test:src> zpool destroy mdpool
test:src> sysctl vfs.zfs.min_create_ashift=9
vfs.zfs.min_create_ashift: 12 -> 9
test:src> zpool create mdpool md0
test:src> zdb mdpool | grep ashift
ashift: 9
ashift: 9

# Import a 512b pool (min_create_ashift=4K, dev=512)
test:src> zpool export mdpool
test:src> sysctl vfs.zfs.min_create_ashift=12
vfs.zfs.min_create_ashift: 9 -> 12
test:src> zpool import mdpool
test:src> zdb mdpool | grep ashift
ashift: 9
ashift: 9

# Create a 4K pool (min_create_ashift=512, dev=4K)
test:src> zpool destroy mdpool
test:src> mdconfig -d -u 0
test:src> mdconfig -a -t swap -s 128m -S 4096 -u 0
test:src> sysctl vfs.zfs.min_create_ashift=9
vfs.zfs.min_create_ashift: 12 -> 9
test:src> zpool create mdpool md0
test:src> zdb mdpool | grep ashift
ashift: 12
ashift: 12

# Import a 4K pool (min_create_ashift=4K, dev=4K)
test:src> zpool export mdpool
test:src> sysctl vfs.zfs.min_create_ashift=12
vfs.zfs.min_create_ashift: 9 -> 12
test:src> zpool import mdpool
test:src> zdb mdpool | grep ashift
ashift: 12
ashift: 12

> Another thing (not really related to the automatic detection) is that
> we need a way to manually override this setting from command line when
> creating the pool, this is under active discussion at Illumos mailing
> list right now.
>
> [1]
> https://github.com/trueos/trueos/commit/3d2e3a38faad8df4acf442b055c5e98ab873fb26

Yep has been on my list for a while, based on previous discussions on zfs-devel@. I've not had any time recently but I'm following the illumos thread to see what conclusions they come to.

Regards
Steve

================================================
This e.mail is private and confidential between Multiplay (UK) Ltd.
and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmaster@multiplay.co.uk. From owner-freebsd-hackers@FreeBSD.ORG Wed Jul 10 18:13:14 2013 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 29E1C31D; Wed, 10 Jul 2013 18:13:14 +0000 (UTC) (envelope-from delphij@delphij.net) Received: from anubis.delphij.net (anubis.delphij.net [64.62.153.212]) by mx1.freebsd.org (Postfix) with ESMTP id 12FF4120B; Wed, 10 Jul 2013 18:13:13 +0000 (UTC) Received: from zeta.ixsystems.com (drawbridge.ixsystems.com [206.40.55.65]) (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by anubis.delphij.net (Postfix) with ESMTPSA id 76ECB9DEA; Wed, 10 Jul 2013 11:13:13 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=delphij.net; s=anubis; t=1373479993; bh=FjIMYhZ22un7bkY8tLUyfXF6Kw82/IJxBZuDu/11fGg=; h=Date:From:Reply-To:To:CC:Subject:References:In-Reply-To; b=cI/vIr2mBorhCFAMuqfi4qLnifvdXjcGZNUtC8yUr40f6siCjnr0d5ICNur5B3ErI qWIrspOqjADKHiBsismW2/z0wb/kXm/EkZLNF+mU4y8UDSWUwuDeyqY3nVq5UpiH4L tJbID7e7paQLNlS9mByFjRD/QBC9vPt6/7mZim1o= Message-ID: <51DDA433.7040707@delphij.net> Date: Wed, 10 Jul 2013 11:13:07 -0700 From: Xin Li Organization: The FreeBSD Project MIME-Version: 1.0 To: "Justin T. 
Gibbs" Subject: Re: Make ZFS use the physical sector size when computing initial ashift References: <86zjtupz3r.fsf@nine.des.no> <51DD9801.4090808@delphij.net> <2B9367B6-8759-45C9-B120-9D00A381228F@FreeBSD.org> In-Reply-To: <2B9367B6-8759-45C9-B120-9D00A381228F@FreeBSD.org> X-Enigmail-Version: 1.5.1 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Cc: freebsd-fs@freebsd.org, =?UTF-8?B?RGFnLUVybGluZyBTbcO4cmdyYXY=?= , d@delphij.net, ivoras@freebsd.org, freebsd-hackers@freebsd.org X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list Reply-To: d@delphij.net List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 10 Jul 2013 18:13:14 -0000 -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA512 On 07/10/13 10:38, Justin T. Gibbs wrote: [snip] > I'm sure lots of folks have "some solution" to this. Here is an > old version of what we use at Spectra: > > http://people.freebsd.org/~gibbs/zfs_patches/zfs_auto_ashift.diff > > The above patch is missing some cleanup that was motivated by my > discussions with George Wilson about this change in April. I'll > dig that up later tonight. Even if you don't read the full diff, > please read the included checkin comment since it explains the > motivation behind this particular solution. > > This is on my list of things to upstream in the next week or so > after I add logic to the userspace tools to report whether or not > the TLVs in a pool are using an optimal allocation size. This is > only possible if you actually make ZFS fully aware of logical, > physical, and the configured allocation size. All of the other > patches I've seen just treat physical as logical. Yes, me too. Your version is superior. Cheers, - -- Xin LI https://www.delphij.net/ FreeBSD - The Power to Serve! 
Live free or die -----BEGIN PGP SIGNATURE----- iQEcBAEBCgAGBQJR3aQzAAoJEG80Jeu8UPuzHn8H/1ZpoTqAQ4+mgQOttOwXgBcr 2Fgh52ztW8fCEQSeIosxXKO06hP7HxFfTPvmeeWyjT8zIpSUSFV6G0NclebKDncP huGFofvx3BKPRmfzZp4iZx1wWQUxSHTmv6ceDwvP7P8GJ0mON+SrZxmmwUjKrf7V W9Sazl0p8e0nxSQykLyjjrkaBx5Iv+aUxu8Alomwy9BmpM8+gd2yutvzghW5L36L 0CvAtIMXdlc+eUdAqa/2rOk/nMOA9sfWVW0gkKYCZk6wvj2DMzjii05UechZ4Z+l 6nEU3UdVsbTX73CABZv4my4JAWc5Yk1s/cWrxtn68AfK8LMPFJCJcVXXOSckMWI= =351W -----END PGP SIGNATURE----- From owner-freebsd-hackers@FreeBSD.ORG Wed Jul 10 18:39:29 2013 Return-Path: Delivered-To: hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 54457B01 for ; Wed, 10 Jul 2013 18:39:29 +0000 (UTC) (envelope-from julian@elischer.org) Received: from vps1.elischer.org (vps1.elischer.org [204.109.63.16]) by mx1.freebsd.org (Postfix) with ESMTP id 0EB2B1333 for ; Wed, 10 Jul 2013 18:39:28 +0000 (UTC) Received: from jre-mbp.elischer.org (ppp121-45-226-51.lns20.per1.internode.on.net [121.45.226.51]) (authenticated bits=0) by vps1.elischer.org (8.14.5/8.14.5) with ESMTP id r6AIGM1B019789 (version=TLSv1/SSLv3 cipher=AES128-SHA bits=128 verify=NO) for ; Wed, 10 Jul 2013 11:16:26 -0700 (PDT) (envelope-from julian@elischer.org) From: Julian Elischer Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: quoted-printable Subject: possible changes from Panzura Message-Id: Date: Thu, 11 Jul 2013 02:16:17 +0800 To: hackers@freebsd.org Mime-Version: 1.0 (Mac OS X Mail 6.5 \(1508\)) X-Mailer: Apple Mail (2.1508) X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 10 Jul 2013 18:39:29 -0000 I'm going through all the internal changes my current employer has made, = categorizing them into "proprietary" and "can feed back to FreeBSD". 
I will probably send out emails like this several times seeking feedback on whether a particular patch is considered useful or not. These are versus 8.0 at the moment (this is part of our effort to upgrade).

My first candidates are:

-----internal commit message----
Add support for dumping kernel dumps in addition to text dumps for kernel panics. Add a new version of savecore to the tree, which knows how to retrieve and save both dumps. Control the new dump behavior via the debug.kerneldump_requested sysctl - disabling this will go back to the old text dump-only behavior.
------ part 2 -----
Have savecore be more optimistic about saving compressed cores - always try, and only bail if we actually run out of space. The pessimistic "only try saving if we've got enough free space to handle the entire dump uncompressed" made it too easy for us to run out of space on our /var/crash partition
-------

Julian

From owner-freebsd-hackers@FreeBSD.ORG Wed Jul 10 19:06:14 2013
Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 6BFEF3ED; Wed, 10 Jul 2013 19:06:14 +0000 (UTC) (envelope-from prvs=1903808b5b=killing@multiplay.co.uk) Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) by mx1.freebsd.org (Postfix) with ESMTP id 8F3B41619; Wed, 10 Jul 2013 19:06:13 +0000 (UTC) Received: from r2d2 ([82.69.141.170]) by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (MDaemon PRO v10.0.4) with ESMTP id md50004910770.msg; Wed, 10 Jul 2013 20:06:11 +0100 X-Spam-Processed: mail1.multiplay.co.uk, Wed, 10 Jul 2013 20:06:11 +0100 (not processed: message from valid local sender) X-MDDKIM-Result: neutral (mail1.multiplay.co.uk) X-MDRemoteIP: 82.69.141.170 X-Return-Path: prvs=1903808b5b=killing@multiplay.co.uk X-Envelope-From: killing@multiplay.co.uk Message-ID: <97E5A0A8DFBF4F75AAE8EDEFDF849EB0@multiplay.co.uk> From: "Steven
Hartland" To: "Justin T. Gibbs" , References: <86zjtupz3r.fsf@nine.des.no> <51DD9801.4090808@delphij.net> <2B9367B6-8759-45C9-B120-9D00A381228F@FreeBSD.org> Subject: Re: Make ZFS use the physical sector size when computing initial ashift Date: Wed, 10 Jul 2013 20:06:26 +0100 MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="UTF-8"; reply-type=original Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.5931 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6157 Cc: freebsd-fs@freebsd.org, =?UTF-8?Q?Dag-Erling_Sm=C3=B8rgrav?= , ivoras@freebsd.org, freebsd-hackers@freebsd.org X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 10 Jul 2013 19:06:14 -0000 ----- Original Message ----- From: "Justin T. Gibbs" > I'm sure lots of folks have "some solution" to this. Here is an > old version of what we use at Spectra: > > http://people.freebsd.org/~gibbs/zfs_patches/zfs_auto_ashift.diff > > The above patch is missing some cleanup that was motivated by my > discussions with George Wilson about this change in April. I'll > dig that up later tonight. Even if you don't read the full diff, > please read the included checkin comment since it explains the > motivation behind this particular solution. > > This is on my list of things to upstream in the next week or so after > I add logic to the userspace tools to report whether or not the > TLVs in a pool are using an optimal allocation size. This is only > possible if you actually make ZFS fully aware of logical, physical, > and the configured allocation size. All of the other patches I've seen > just treat physical as logical. 
Reading through your patch it seems that your logical_ashift equates to the current ashift value, which for geom devices is based on sectorsize, and your physical_ashift is based on stripesize.

This is almost identical to the approach I used, adding a "desired ashift", which equates to your physical_ashift, alongside the standard ashift, i.e. the required aka logical_ashift value :)

One issue I did spot in your patch is that you currently expose zfs_max_auto_ashift as a sysctl but don't clamp its value, which would cause problems should a user configure values > 13.

If you're interested in the reason for this, it's explained in the comments in my version, which does a very similar thing with validation.

Regards
Steve
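The clamping concern raised in the message above can be sketched roughly as follows. This is only an illustration in Python, not code from any of the patches discussed: the helper names, the MIN_ASHIFT and ASHIFT_MAX constants, and the fallback rules are assumptions, with the upper bound of 13 taken from the "> 13" remark and the power-of-two and "larger than sectorsize" conditions taken from the original patch description.

```python
MIN_ASHIFT = 9    # assumed floor: 512-byte sectors, the smallest case in the thread
ASHIFT_MAX = 13   # assumed ceiling, taken from the "> 13" remark above


def size_to_ashift(size):
    """Return log2(size) for a power-of-two size, else raise ValueError."""
    if size <= 0 or size & (size - 1):
        raise ValueError(f"size {size} is not a power of two")
    return size.bit_length() - 1


def choose_ashift(sectorsize, stripesize, max_auto_ashift=ASHIFT_MAX):
    """Pick an ashift for a new vdev (hypothetical helper, not a ZFS API).

    sectorsize is the logical sector size and stripesize the physical one,
    mirroring the GEOM terms used in the thread.
    """
    logical = size_to_ashift(sectorsize)
    # Clamp the tunable itself so a bad setting cannot exceed the ceiling.
    limit = max(MIN_ASHIFT, min(max_auto_ashift, ASHIFT_MAX))
    # Ignore stripesize unless it is a power of two larger than sectorsize.
    if stripesize <= sectorsize or stripesize & (stripesize - 1):
        return logical
    physical = size_to_ashift(stripesize)
    # Prefer the physical size up to the limit, but never go below logical.
    return max(logical, min(physical, limit))
```

With these assumptions, a 512-byte logical device reporting a 4096-byte stripesize gets ashift 12, while a controller reporting a 64 KB stripesize is held at 13 even if the tunable is set higher.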
From owner-freebsd-hackers@FreeBSD.ORG Wed Jul 10 19:24:40 2013
Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id B6BDD948; Wed, 10 Jul 2013 19:24:40 +0000 (UTC) (envelope-from gibbs@FreeBSD.org) Received: from aslan.scsiguy.com (mail.scsiguy.com [70.89.174.89]) by mx1.freebsd.org (Postfix) with ESMTP id 72B8916E5; Wed, 10 Jul 2013 19:24:39 +0000 (UTC) Received: from [192.168.6.139] (207-225-98-3.dia.static.qwest.net [207.225.98.3]) (authenticated bits=0) by aslan.scsiguy.com (8.14.7/8.14.5) with ESMTP id r6AJOWr5098398 (version=TLSv1/SSLv3 cipher=AES128-SHA bits=128 verify=NO); Wed, 10 Jul 2013 19:24:32 GMT (envelope-from gibbs@FreeBSD.org) Subject: Re: Make ZFS use the physical sector size when computing initial ashift Mime-Version: 1.0 (Mac OS X Mail 6.5 \(1508\)) Content-Type: text/plain; charset=us-ascii From: "Justin T. Gibbs" X-Priority: 3 In-Reply-To: <97E5A0A8DFBF4F75AAE8EDEFDF849EB0@multiplay.co.uk> Date: Wed, 10 Jul 2013 13:24:26 -0600 Content-Transfer-Encoding: quoted-printable Message-Id: <0A3A05F7-7859-4285-B15A-5E7DDB751062@FreeBSD.org> References: <86zjtupz3r.fsf@nine.des.no> <51DD9801.4090808@delphij.net> <2B9367B6-8759-45C9-B120-9D00A381228F@FreeBSD.org> <97E5A0A8DFBF4F75AAE8EDEFDF849EB0@multiplay.co.uk> To: "Steven Hartland" X-Mailer: Apple Mail (2.1508) X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.4.3 (aslan.scsiguy.com [70.89.174.89]); Wed, 10 Jul 2013 19:24:32 +0000 (UTC) Cc: freebsd-fs@freebsd.org, =?iso-8859-1?Q?Dag-Erling_Sm=F8rgrav?= , d@delphij.net, ivoras@freebsd.org, freebsd-hackers@freebsd.org X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 10 Jul 2013 19:24:40 -0000

On Jul 10, 2013, at 1:06 PM, "Steven Hartland" wrote:

> ----- Original Message ----- From: "Justin T. Gibbs"
>> I'm sure lots of folks have "some solution" to this. Here is an
>> old version of what we use at Spectra:
>> http://people.freebsd.org/~gibbs/zfs_patches/zfs_auto_ashift.diff
>> The above patch is missing some cleanup that was motivated by my
>> discussions with George Wilson about this change in April. I'll
>> dig that up later tonight. Even if you don't read the full diff,
>> please read the included checkin comment since it explains the
>> motivation behind this particular solution.
>>
>> This is on my list of things to upstream in the next week or so after
>> I add logic to the userspace tools to report whether or not the
>> TLVs in a pool are using an optimal allocation size. This is only
>> possible if you actually make ZFS fully aware of logical, physical,
>> and the configured allocation size. All of the other patches I've seen
>> just treat physical as logical.
>
> Reading through your patch it seems that your logical_ashift equates to
> the current ashift values which for geom devices is based off sectorsize
> and your physical_ashift is based stripesize.
>
> This is almost identical to the approach I used adding a "desired ashift",
> which equates to your physical_ashift, along side the standard ashift
> i.e. required aka logical_ashift value :)

Yes, the approaches are similar. Our current version records the logical access size in the vdev structure too, which might relate to the issue below.

> One issue I did spot in your patch is that you currently expose
> zfs_max_auto_ashift as a sysctl but don't clamp its value which would
> cause problems should a user configure values > 13.

I would expect the zio pipeline to simply insert an ashift aligned thunking buffer for these operations, but I haven't tried going past an ashift of 13 in my tests. If it is an issue, it seems the restriction should be based on logical access size, not optimal access size.

--
Justin

From owner-freebsd-hackers@FreeBSD.ORG Wed Jul 10 19:41:50 2013
Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 200F2DF7; Wed, 10 Jul 2013 19:41:50 +0000 (UTC) (envelope-from prvs=1903808b5b=killing@multiplay.co.uk) Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) by mx1.freebsd.org (Postfix) with ESMTP id 34B9A1796; Wed, 10 Jul 2013 19:41:48 +0000 (UTC) Received: from r2d2 ([82.69.141.170]) by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (MDaemon PRO v10.0.4) with ESMTP id md50004911185.msg; Wed, 10 Jul 2013 20:41:47 +0100 X-Spam-Processed: mail1.multiplay.co.uk, Wed, 10 Jul 2013 20:41:47 +0100 (not processed: message from valid local sender) X-MDDKIM-Result: neutral (mail1.multiplay.co.uk) X-MDRemoteIP: 82.69.141.170 X-Return-Path: prvs=1903808b5b=killing@multiplay.co.uk X-Envelope-From: killing@multiplay.co.uk Message-ID: <7BB4167807A4434A9CD5FB0F1600439F@multiplay.co.uk> From: "Steven Hartland" To: "Justin T.
Gibbs" References: <86zjtupz3r.fsf@nine.des.no> <51DD9801.4090808@delphij.net> <2B9367B6-8759-45C9-B120-9D00A381228F@FreeBSD.org> <97E5A0A8DFBF4F75AAE8EDEFDF849EB0@multiplay.co.uk> <0A3A05F7-7859-4285-B15A-5E7DDB751062@FreeBSD.org> Subject: Re: Make ZFS use the physical sector size when computing initial ashift Date: Wed, 10 Jul 2013 20:42:01 +0100 MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=original Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.5931 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6157 Cc: freebsd-fs@freebsd.org, =?iso-8859-1?Q?Dag-Erling_Sm=F8rgrav?= , d@delphij.net, ivoras@freebsd.org, freebsd-hackers@freebsd.org X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 10 Jul 2013 19:41:50 -0000 ----- Original Message ----- From: "Justin T. Gibbs" > On Jul 10, 2013, at 1:06 PM, "Steven Hartland" wrote: >> ----- Original Message ----- From: "Justin T. Gibbs" >>> I'm sure lots of folks have "some solution" to this. Here is an >>> old version of what we use at Spectra: >>> http://people.freebsd.org/~gibbs/zfs_patches/zfs_auto_ashift.diff >>> The above patch is missing some cleanup that was motivated by my >>> discussions with George Wilson about this change in April. I'll >>> dig that up later tonight. Even if you don't read the full diff, >>> please read the included checkin comment since it explains the >>> motivation behind this particular solution. >>> >>> This is on my list of things to upstream in the next week or so after >>> I add logic to the userspace tools to report whether or not the >>> TLVs in a pool are using an optimal allocation size. 
This is only
>>> possible if you actually make ZFS fully aware of logical, physical,
>>> and the configured allocation size. All of the other patches I've seen
>>> just treat physical as logical.
>>
>> Reading through your patch it seems that your logical_ashift equates to
>> the current ashift values which for geom devices is based off sectorsize
>> and your physical_ashift is based on stripesize.
>>
>> This is almost identical to the approach I used adding a "desired ashift",
>> which equates to your physical_ashift, alongside the standard ashift
>> i.e. required aka logical_ashift value :)
>
> Yes, the approaches are similar. Our current version records the logical
> access size in the vdev structure too, which might relate to the issue
> below.
>
> > One issue I did spot in your patch is that you currently expose
> > zfs_max_auto_ashift as a sysctl but don't clamp its value which would
> > cause problems should a user configure values > 13.
>
> I would expect the zio pipeline to simply insert an ashift aligned thunking
> buffer for these operations, but I haven't tried going past an ashift of 13 in
> my tests. If it is an issue, it seems the restriction should be based on
> logical access size, not optimal access size.

Yes, with your methodology you'll only see the issue if zfs_max_auto_ashift
and physical_ashift are both > 13, but this can be the case, for example,
on a RAID controller with a large stripesize.

Looking back at my old patch, it suffers from the same issue, as does
the current code base, but there it would only happen if the logical sector
size resulted in an ashift > 13, which is going to be much less common ;-)

Regards
Steve

================================================
This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it.
In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmaster@multiplay.co.uk. From owner-freebsd-hackers@FreeBSD.ORG Wed Jul 10 19:50:49 2013 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 74AF9353; Wed, 10 Jul 2013 19:50:49 +0000 (UTC) (envelope-from gibbs@FreeBSD.org) Received: from aslan.scsiguy.com (mail.scsiguy.com [70.89.174.89]) by mx1.freebsd.org (Postfix) with ESMTP id 3F9C21828; Wed, 10 Jul 2013 19:50:48 +0000 (UTC) Received: from [192.168.6.139] (207-225-98-3.dia.static.qwest.net [207.225.98.3]) (authenticated bits=0) by aslan.scsiguy.com (8.14.7/8.14.5) with ESMTP id r6AJoccA098537 (version=TLSv1/SSLv3 cipher=AES128-SHA bits=128 verify=NO); Wed, 10 Jul 2013 19:50:39 GMT (envelope-from gibbs@FreeBSD.org) Subject: Re: Make ZFS use the physical sector size when computing initial ashift Mime-Version: 1.0 (Mac OS X Mail 6.5 \(1508\)) Content-Type: text/plain; charset=iso-8859-1 From: "Justin T. 
Gibbs" X-Priority: 3 In-Reply-To: <7BB4167807A4434A9CD5FB0F1600439F@multiplay.co.uk> Date: Wed, 10 Jul 2013 13:50:33 -0600 Content-Transfer-Encoding: quoted-printable Message-Id: <00205B20-742F-44F6-B538-3B809D8BC03F@FreeBSD.org> References: <86zjtupz3r.fsf@nine.des.no> <51DD9801.4090808@delphij.net> <2B9367B6-8759-45C9-B120-9D00A381228F@FreeBSD.org> <97E5A0A8DFBF4F75AAE8EDEFDF849EB0@multiplay.co.uk> <0A3A05F7-7859-4285-B15A-5E7DDB751062@FreeBSD.org> <7BB4167807A4434A9CD5FB0F1600439F@multiplay.co.uk> To: "Steven Hartland" X-Mailer: Apple Mail (2.1508) X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.4.3 (aslan.scsiguy.com [70.89.174.89]); Wed, 10 Jul 2013 19:50:39 +0000 (UTC) Cc: freebsd-fs@freebsd.org, =?iso-8859-1?Q?Dag-Erling_Sm=F8rgrav?= , d@delphij.net, ivoras@freebsd.org, freebsd-hackers@freebsd.org X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 10 Jul 2013 19:50:49 -0000 On Jul 10, 2013, at 1:42 PM, "Steven Hartland" = wrote: >=20 > ----- Original Message ----- From: "Justin T. Gibbs" >> On Jul 10, 2013, at 1:06 PM, "Steven Hartland" wrote: >>> ----- Original Message ----- From: "Justin T. Gibbs"=20 >>>> I'm sure lots of folks have "some solution" to this. Here is an >>>> old version of what we use at Spectra: >>>> http://people.freebsd.org/~gibbs/zfs_patches/zfs_auto_ashift.diff >>>> The above patch is missing some cleanup that was motivated by my >>>> discussions with George Wilson about this change in April. I'll >>>> dig that up later tonight. Even if you don't read the full diff, >>>> please read the included checkin comment since it explains the >>>> motivation behind this particular solution. 
>>>> This is on my list of things to upstream in the next week or so after
>>>> I add logic to the userspace tools to report whether or not the
>>>> TLVs in a pool are using an optimal allocation size. This is only
>>>> possible if you actually make ZFS fully aware of logical, physical,
>>>> and the configured allocation size. All of the other patches I've seen
>>>> just treat physical as logical.
>>> Reading through your patch it seems that your logical_ashift equates to
>>> the current ashift values which for geom devices is based off sectorsize
>>> and your physical_ashift is based stripesize.
>>> This is almost identical to the approach I used adding a "desired ashift",
>>> which equates to your physical_ashift, along side the standard ashift
>>> i.e. required aka logical_ashift value :)
>>
>> Yes, the approaches are similar. Our current version records the logical
>> access size in the vdev structure too, which might relate to the issue
>> below.
>>
>> > One issue I did spot in your patch is that you currently expose
>> > zfs_max_auto_ashift as a sysctl but don't clamp its value which would
>> > cause problems should a user configure values > 13.
>>
>> I would expect the zio pipeline to simply insert an ashift aligned thunking
>> buffer for these operations, but I haven't tried going past an ashift of 13 in
>> my tests. If it is an issue, it seems the restriction should be based on
>> logical access size, not optimal access size.
>
> Yes with your methodology you'll only see the issue if zfs_max_auto_ashift
> and physical_ashift are both > 13, but this can be the case for example
> on a RAID controller with large stripsize.

I'm not sure I follow. logical_ashift is available in our latest code, as is the
physical_ashift. But even without the logical_ashift, why doesn't the zio
pipeline properly thunk zio_phys_read() access based on the configured ashift?
-- Justin From owner-freebsd-hackers@FreeBSD.ORG Wed Jul 10 18:57:19 2013 Return-Path: Delivered-To: hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 15E95E4 for ; Wed, 10 Jul 2013 18:57:19 +0000 (UTC) (envelope-from jkh@mail.turbofuzz.com) Received: from mail.crittercasa.com (mail.turbofuzz.com [208.87.221.144]) by mx1.freebsd.org (Postfix) with ESMTP id EADA9147C for ; Wed, 10 Jul 2013 18:57:18 +0000 (UTC) Received: from [10.20.30.145] (75-101-82-48.static.sonic.net [75.101.82.48]) (using TLSv1 with cipher AES128-SHA (128/128 bits)) (No client certificate requested) by mail.crittercasa.com (Postfix) with ESMTPS id 55C61164896; Wed, 10 Jul 2013 11:55:11 -0700 (PDT) Mime-Version: 1.0 (Mac OS X Mail 6.5 \(1508\)) Subject: Kernel dumps [was Re: possible changes from Panzura] From: Jordan Hubbard In-Reply-To: Date: Wed, 10 Jul 2013 11:57:10 -0700 Message-Id: <9890DFF1-892A-4DCA-9E33-B70681154F43@mail.turbofuzz.com> References: To: Julian Elischer X-Mailer: Apple Mail (2.1508) X-Mailman-Approved-At: Wed, 10 Jul 2013 19:54:52 +0000 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.14 Cc: hackers@freebsd.org X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 10 Jul 2013 18:57:19 -0000 On Jul 10, 2013, at 11:16 AM, Julian Elischer = wrote: > My first candidates are: Those sound useful. Just out of curiosity, however, since we're on the = topic of kernel dumps: Has anyone even looked into the notion of an = emergency fall-back network stack to enable remote kernel panic (or = system hang) debugging, the way OS X lets you do? 
I can't tell you the = number of times I've NMI'd a Mac and connected to it remotely in a = scenario where everything was totally wedged and just a couple of = minutes in kgdb (or now lldb) quickly showed that everything was waiting = on a specific lock and the problem became manifestly clear. The feature also lets you scrape a panic'd machine with automation, = running some kgdb scripts against it to glean useful information for = later analysis vs having to have someone schlep the dump image manually = to triage. It's going to be damn hard to live without this now, and if = someone else isn't working on it, that's good to know too! - Jordan From owner-freebsd-hackers@FreeBSD.ORG Wed Jul 10 20:16:58 2013 Return-Path: Delivered-To: hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id E1936CA7 for ; Wed, 10 Jul 2013 20:16:58 +0000 (UTC) (envelope-from eric@vangyzen.net) Received: from aussmtpmrkps320.us.dell.com (aussmtpmrkps320.us.dell.com [143.166.224.254]) by mx1.freebsd.org (Postfix) with ESMTP id B7AFB199A for ; Wed, 10 Jul 2013 20:16:58 +0000 (UTC) X-Loopcount0: from 64.238.244.148 X-IronPort-AV: E=Sophos;i="4.87,1038,1363150800"; d="scan'208";a="34508446" Message-ID: <51DDC133.7010401@vangyzen.net> Date: Wed, 10 Jul 2013 15:16:51 -0500 From: Eric van Gyzen User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:17.0) Gecko/20130702 Thunderbird/17.0.7 MIME-Version: 1.0 To: Julian Elischer Subject: Re: possible changes from Panzura References: In-Reply-To: Content-Type: text/plain; charset="ISO-8859-1" Content-Transfer-Encoding: 7bit Cc: hackers@freebsd.org X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 10 Jul 2013 20:16:58 -0000 On 07/10/2013 13:16, Julian Elischer wrote: > I'm going through 
all the internal changes my current employer has made, categorizing them
> into "proprietary" and "can feed back to FreeBSD".
>
> I will probably send out emails like this several times seeking feedback on whether a particular patch is considered useful or not.
> These are versus 8.0 at the moment. (This is part of our effort to upgrade.)
>
> My first candidates are:
>
> -----internal commit message----
> Add support for dumping kernel dumps in addition to text dumps for
> kernel panics. Add a new version of savecore to the tree, which knows
> how to retrieve and save both dumps. Control the new dump behavior via the
> debug.kerneldump_requested sysctl - disabling this will go back to the
> old text dump-only behavior.

I wonder which would be more useful: this, or just writing the full dump and using crashinfo to create a text summary after reboot. Of course, crashinfo could be enhanced to show anything it's currently missing (relative to the text dump). This would have the advantage of doing less work at dump time. Yours would have the advantage that it exists and works. :) Thoughts?

> ------ part 2 -----
> Have savecore be more optimistic about
> saving compressed cores - always try, and only bail if we actually run
> out of space. The pessimistic "only try saving if we've got enough free
> space to handle the entire dump uncompressed" made it too easy for us to
> run out of space on our /var/crash partition

Yes, please. I've run into this occasionally, but it never annoyed me enough to fix it. Procrastination pays off yet again.
;) Eric From owner-freebsd-hackers@FreeBSD.ORG Wed Jul 10 20:37:54 2013 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id EF3286B7; Wed, 10 Jul 2013 20:37:54 +0000 (UTC) (envelope-from prvs=1903808b5b=killing@multiplay.co.uk) Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) by mx1.freebsd.org (Postfix) with ESMTP id 1FA4D1A84; Wed, 10 Jul 2013 20:37:53 +0000 (UTC) Received: from r2d2 ([82.69.141.170]) by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (MDaemon PRO v10.0.4) with ESMTP id md50004911799.msg; Wed, 10 Jul 2013 21:37:51 +0100 X-Spam-Processed: mail1.multiplay.co.uk, Wed, 10 Jul 2013 21:37:51 +0100 (not processed: message from valid local sender) X-MDDKIM-Result: neutral (mail1.multiplay.co.uk) X-MDRemoteIP: 82.69.141.170 X-Return-Path: prvs=1903808b5b=killing@multiplay.co.uk X-Envelope-From: killing@multiplay.co.uk Message-ID: From: "Steven Hartland" To: "Justin T. 
Gibbs" References: <86zjtupz3r.fsf@nine.des.no> <51DD9801.4090808@delphij.net> <2B9367B6-8759-45C9-B120-9D00A381228F@FreeBSD.org> <97E5A0A8DFBF4F75AAE8EDEFDF849EB0@multiplay.co.uk> <0A3A05F7-7859-4285-B15A-5E7DDB751062@FreeBSD.org> <7BB4167807A4434A9CD5FB0F1600439F@multiplay.co.uk> <00205B20-742F-44F6-B538-3B809D8BC03F@FreeBSD.org> Subject: Re: Make ZFS use the physical sector size when computing initial ashift Date: Wed, 10 Jul 2013 21:38:07 +0100 MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=original Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.5931 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6157 Cc: freebsd-fs@freebsd.org, =?iso-8859-1?Q?Dag-Erling_Sm=F8rgrav?= , d@delphij.net, ivoras@freebsd.org, freebsd-hackers@freebsd.org X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 10 Jul 2013 20:37:55 -0000 ----- Original Message ----- From: "Justin T. Gibbs" ... >>> > One issue I did spot in your patch is that you currently expose >>> > zfs_max_auto_ashift as a sysctl but don't clamp its value which would >>> > cause problems should a user configure values > 13. >>> >>> I would expect the zio pipeline to simply insert an ashift aligned thunking >>> buffer for these operations, but I haven't tried going past an ashift of 13 in >>> my tests. If it is an issue, it seems the restriction should be based on >>> logical access size, not optimal access size. >> >> Yes with your methodology you'll only see the issue if zfs_max_auto_ashift >> and physical_ashift are both > 13, but this can be the case for example >> on a RAID controller with large stripsize. > > I'm not sure I follow. logical_ashift is available in our latest code, as is the > physical_ashift. 
But even without the logical_ashift, why doesn't the zio
> pipeline properly thunk zio_phys_read() access based on the configured ashift?

When I looked at it, which was a long time ago now so please excuse me if
I'm a little rusty on the details, zio_phys_read() was working more by luck
than judgement: the offsets passed in were calculated from a valid start
plus an increment based on the size of a structure within
vdev_label_offset(), with no ashift logic applied that I could find.

The result was that pools created with large ashifts were unstable when I
tested.

Regards
Steve
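[Illustration of the alignment concern discussed above. This is a minimal sketch in Python, not kernel C and not code from either patch. It checks whether a set of fixed on-disk regions stays aligned to the 1 << ashift allocation size, and clamps a tunable in the way the thread suggests zfs_max_auto_ashift should be clamped. The region table, the function names, and the lower bound of 9 (512-byte sectors) are assumptions made for the example; only the upper bound of 13 comes from the discussion.]

```python
# Hypothetical fixed-layout regions as (offset, size) in bytes.
# These values are assumptions for illustration, not taken from the patch.
REGIONS = {
    "pad":    (0,         8 << 10),
    "header": (8 << 10,   8 << 10),
    "config": (16 << 10,  112 << 10),
    "ring":   (128 << 10, 128 << 10),
}


def is_aligned(value, ashift):
    """Return True if value is a multiple of the 1 << ashift allocation size."""
    return value & ((1 << ashift) - 1) == 0


def clamp_ashift(requested, lowest=9, highest=13):
    """Clamp a user-supplied ashift into [lowest, highest].

    highest=13 is the limit mentioned in the thread; lowest=9 is assumed.
    """
    return max(lowest, min(requested, highest))


def misaligned(regions, ashift):
    """List the regions whose offset or size is not ashift-aligned."""
    return [
        name
        for name, (offset, size) in regions.items()
        if not (is_aligned(offset, ashift) and is_aligned(size, ashift))
    ]


if __name__ == "__main__":
    for shift in (12, 13, 14, 15):
        print(shift, misaligned(REGIONS, shift))
```

With a layout like this one, every region is addressable at ashift 13, but reads computed from fixed structure sizes stop being aligned once the ashift goes higher, which is the kind of failure a clamp on the tunable would avoid.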
From owner-freebsd-hackers@FreeBSD.ORG Wed Jul 10 21:50:52 2013 Return-Path: Delivered-To: hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id C8469F96 for ; Wed, 10 Jul 2013 21:50:52 +0000 (UTC) (envelope-from jkh@mail.turbofuzz.com) Received: from mail.crittercasa.com (mail.turbofuzz.com [208.87.221.144]) by mx1.freebsd.org (Postfix) with ESMTP id B6C981D9B for ; Wed, 10 Jul 2013 21:50:52 +0000 (UTC) Received: from [10.20.30.11] (75-101-82-48.static.sonic.net [75.101.82.48]) (using TLSv1 with cipher AES128-SHA (128/128 bits)) (No client certificate requested) by mail.crittercasa.com (Postfix) with ESMTPS id D8B88164893; Wed, 10 Jul 2013 14:47:58 -0700 (PDT) Content-Type: text/plain; charset=iso-8859-1 Mime-Version: 1.0 (Mac OS X Mail 7.0 \(1784.1\)) Subject: Re: Kernel dumps [was Re: possible changes from Panzura] From: Jordan Hubbard In-Reply-To: Date: Wed, 10 Jul 2013 14:50:19 -0700 Content-Transfer-Encoding: quoted-printable Message-Id: <3592BFB7-0663-4381-AFF5-C7DE0AE16858@mail.turbofuzz.com> References: <9890DFF1-892A-4DCA-9E33-B70681154F43@mail.turbofuzz.com> To: asomers@gmail.com X-Mailer: Apple Mail (2.1784.1) Cc: hackers@freebsd.org, Julian Elischer X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 10 Jul 2013 21:50:52 -0000 On Jul 10, 2013, at 1:04 PM, asomers@gmail.com wrote: > I don't doubt that it would be useful to have an emergency network > stack. But have you ever looked into debugging over firewire? Absolutely. In fact, before the advent of remote network debugging, FW = was totally the debugging method of choice since firewire target DMA = lets you do all kinds of useful things (as well as a few things that = simply scare the security guys to death ;-) ). 
My point was more that actually being able to debug a machine over the network is such a step up in terms of convenience/awesomeness that if anyone is thinking of putting any time and attention into this area at all, that's definitely the target to go for.

Looking at http://www.opensource.apple.com/tarballs/xnu/xnu-2050.22.13.tar.gz there's even reasonable "documentation" on the kernel debugging protocol in xnu/osfmk/kdp. Folks could do worse than try to clone it. The gdb debugger macros in support of it are also in xnu/kgmacros. None of it is going to be 'drop in' for FreeBSD by any stretch of the imagination, but it's always easier to get to a destination when you have a map. :-) Anyone with a Mac can also "nvram boot-args="debug=0x144"" and test-drive it around, just to see how it works in actual practice. See also: https://developer.apple.com/library/mac/#documentation/Darwin/Conceptual/KEXTConcept/KEXTConceptDebugger/debug_tutorial.html

- Jordan

From owner-freebsd-hackers@FreeBSD.ORG Wed Jul 10 20:04:18 2013 Return-Path: Delivered-To: hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id F3341872 for ; Wed, 10 Jul 2013 20:04:17 +0000 (UTC) (envelope-from asomers@gmail.com) Received: from mail-qc0-x231.google.com (mail-qc0-x231.google.com [IPv6:2607:f8b0:400d:c01::231]) by mx1.freebsd.org (Postfix) with ESMTP id BA63C18F2 for ; Wed, 10 Jul 2013 20:04:17 +0000 (UTC) Received: by mail-qc0-f177.google.com with SMTP id n1so3831445qcx.8 for ; Wed, 10 Jul 2013 13:04:17 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type:content-transfer-encoding; bh=MZWmvWr2a3/X1UuxMv/Lk93L8/fMxRRqcByrt3uBtu8=; b=RtgfcUsZz8OI+xQC4WksxxQ6zo+gfwM0mfeQaC6DpH2l0KPUpI1sUvGh1w6bBLo82P
abGzDjGJZ7YjUJqCH+QrBWDiXjVGxxnzAkwC4ZwTnu+7oDJ8jbpvbqkSd2Q0iNmmIkkM Js6dwhleq1K6RU4uHbozqlDlOvQ+yEu3mQtSPKO5DwhRfVdkXqmvp8Wqs3J1j0zFu81B rAeVyqxyljEnDVpDR3lGSsIRaboRX1E4XKnPu59gV/p7CAgwk869goIPfPNf+iPoJoQo niBPYOXQdWXENitvPzDNTE6OZd+Gf21g2l2SKg9XrfHV9+axyLifou3ln7sUw6nYy8/B Ws6Q== MIME-Version: 1.0 X-Received: by 10.224.8.130 with SMTP id h2mr29434930qah.9.1373486657154; Wed, 10 Jul 2013 13:04:17 -0700 (PDT) Received: by 10.49.37.226 with HTTP; Wed, 10 Jul 2013 13:04:17 -0700 (PDT) In-Reply-To: <9890DFF1-892A-4DCA-9E33-B70681154F43@mail.turbofuzz.com> References: <9890DFF1-892A-4DCA-9E33-B70681154F43@mail.turbofuzz.com> Date: Wed, 10 Jul 2013 14:04:17 -0600 Message-ID: Subject: Re: Kernel dumps [was Re: possible changes from Panzura] From: asomers@gmail.com To: Jordan Hubbard Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable X-Mailman-Approved-At: Wed, 10 Jul 2013 21:55:58 +0000 Cc: hackers@freebsd.org, Julian Elischer X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 10 Jul 2013 20:04:18 -0000 On Wed, Jul 10, 2013 at 12:57 PM, Jordan Hubbard w= rote: > > On Jul 10, 2013, at 11:16 AM, Julian Elischer wrote= : > >> My first candidates are: > > Those sound useful. Just out of curiosity, however, since we're on the = topic of kernel dumps: Has anyone even looked into the notion of an emerge= ncy fall-back network stack to enable remote kernel panic (or system hang) = debugging, the way OS X lets you do? I can't tell you the number of times = I've NMI'd a Mac and connected to it remotely in a scenario where everythin= g was totally wedged and just a couple of minutes in kgdb (or now lldb) qui= ckly showed that everything was waiting on a specific lock and the problem = became manifestly clear. 
> > The feature also lets you scrape a panic'd machine with automation, runni= ng some kgdb scripts against it to glean useful information for later analy= sis vs having to have someone schlep the dump image manually to triage. It= 's going to be damn hard to live without this now, and if someone else isn'= t working on it, that's good to know too! I don't doubt that it would be useful to have an emergency network stack. But have you ever looked into debugging over firewire? We've had success with it. All of our development machines are connected to a single firewire bus. When one panics, we can remotely debug it with both kdb and ddb. It's not ethernet , but it's still much faster than a serial port. https://wiki.freebsd.org/DebugWithDcons > > - Jordan > > _______________________________________________ > freebsd-hackers@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-hackers > To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org= " From owner-freebsd-hackers@FreeBSD.ORG Wed Jul 10 22:09:24 2013 Return-Path: Delivered-To: hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id E460D4FA for ; Wed, 10 Jul 2013 22:09:24 +0000 (UTC) (envelope-from toasty@dragondata.com) Received: from mail-gh0-x22e.google.com (mail-gh0-x22e.google.com [IPv6:2607:f8b0:4002:c05::22e]) by mx1.freebsd.org (Postfix) with ESMTP id A24781E5F for ; Wed, 10 Jul 2013 22:09:24 +0000 (UTC) Received: by mail-gh0-f174.google.com with SMTP id r17so2551349ghr.33 for ; Wed, 10 Jul 2013 15:09:24 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=dragondata.com; s=google; h=content-type:mime-version:subject:from:in-reply-to:date:cc :message-id:references:to:x-mailer; bh=AGggicbba6vYC+SFopoJGDiCQEqkGH6D1c9NBCkhrXI=; b=HjsCc+4UUev8AKJLIzriEBjArgiXzHri9IM6nX3dZXirJ7kY1CEI5t4cp2pTIH5tzy +J1iDebwraimR7CvGjNzzOmR7tawYKLudOikfDDw/sNand09+WsXUpWGJiLcdTkNi7JX 
+MqUwosMjB4TKHi06HWMDqUEybpAfbeyiLsTI= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=content-type:mime-version:subject:from:in-reply-to:date:cc :message-id:references:to:x-mailer:x-gm-message-state; bh=AGggicbba6vYC+SFopoJGDiCQEqkGH6D1c9NBCkhrXI=; b=FEbzc1aOAYpiDy0Ar+ioAosOqc/Q3bXJBApZfok8oP6Ltj6Z9Y4Z7UR+RbtnkNXx8d fzRZILJF+KQDG/TU0p/MQJBh8vDaFA5v0Qks46iazFZL4THOypr3Mm5wqK0vV340Y4l9 PknCCASVagvJ7xZ/ygQr2bzkqlVYV9RS/9FAd+yxPfDoFnK8vLqi+hg0O0WlF9L6LQ6+ OQH07mimWg1uyZGlkPUEGLNTNXTZoPKigmqOviwtqsRdkx7HcArAdLCuSB74uYlUjCzQ QCbvps3p+nu2AImaL7e4r8lpVWBfHoiHT9IQVClX8QwlR07h/JjyULarYvXHxj7KJCZ4 xYOQ== X-Received: by 10.236.31.202 with SMTP id m50mr19275029yha.19.1373494163958; Wed, 10 Jul 2013 15:09:23 -0700 (PDT) Received: from vpn155.rw1.your.org (vpn155.rw1.your.org. [204.9.51.155]) by mx.google.com with ESMTPSA id m5sm55398020yha.23.2013.07.10.15.09.22 for (version=TLSv1.2 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Wed, 10 Jul 2013 15:09:23 -0700 (PDT) Mime-Version: 1.0 (Mac OS X Mail 7.0 \(1784.1\)) Subject: Re: Kernel dumps [was Re: possible changes from Panzura] From: Kevin Day In-Reply-To: <9890DFF1-892A-4DCA-9E33-B70681154F43@mail.turbofuzz.com> Date: Wed, 10 Jul 2013 17:09:10 -0500 Message-Id: <4F0DFAB7-D6D5-4068-A543-C9DF885D1A7D@dragondata.com> References: <9890DFF1-892A-4DCA-9E33-B70681154F43@mail.turbofuzz.com> To: Jordan Hubbard X-Mailer: Apple Mail (2.1784.1) X-Gm-Message-State: ALoCoQmrlP4jfHLSoTVrC/IK3L/q/BBBPFjuJbjS9zLXW+YWOi1NOmDsxjZbM8cT79xD4XDD1hxy Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.14 Cc: hackers@freebsd.org X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 10 Jul 2013 22:09:25 -0000 >=20 >=20 > Those sound useful. 
Just out of curiosity, however, since we're on the topic of kernel dumps: Has anyone even looked into the notion of an emergency fall-back network stack to enable remote kernel panic (or system hang) debugging, the way OS X lets you do? I can't tell you the number of times I've NMI'd a Mac and connected to it remotely in a scenario where everything was totally wedged and just a couple of minutes in kgdb (or now lldb) quickly showed that everything was waiting on a specific lock and the problem became manifestly clear.
>
> The feature also lets you scrape a panic'd machine with automation, running some kgdb scripts against it to glean useful information for later analysis vs having to have someone schlep the dump image manually to triage. It's going to be damn hard to live without this now, and if someone else isn't working on it, that's good to know too!

At a previous employer, we had a system where on a panic it had a totally separate stack capable of just IP/UDP/TFTP and would save its core via TFTP to a server. This isn't as nice as full remote debugging, but it was a whole lot easier to develop. The caveats I remember were:

1) We didn't want to implement ARP, so you had to write the mac address of the "dump server" to the kernel via sysctl before crashing.

2) We also didn't want to have to deal with routing tables, so you had to manually specify what interface to blast packets out to, also via sysctl.

3) After a panic we didn't want to rely on interrupt processing working, so it polled the network interface and blocked whenever it needed to. Since this was an embedded system, it wasn't too big of a deal - only one network driver had to be hacked to support this. Basically a flag that would switch to "disable normal processing, switch to polled fifos for input and output" until reboot.
4) The whole system used only preallocated buffers and its own stack (carved out from memory on boot) so even if the kernel's malloc was trashed, we could still dump.

I'm not sure this really would scratch your itch, but I believe this took me no more than a day or two to implement. Parts #1 and #2 would be pretty easy, but I'm not sure how generically the kernel could support an emergency network mode that doesn't require interrupts for every network card out there. Maybe that isn't as important to you as it was to us.

The whole exercise is much easier if you don't use TFTP but a custom protocol that doesn't require the crashing system to receive any packets: if it can just blast away at some random host, oblivious to whether it's working or not, it's a lot less code to write.

From owner-freebsd-hackers@FreeBSD.ORG Wed Jul 10 22:09:55 2013 Return-Path: Delivered-To: hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 025085E0 for ; Wed, 10 Jul 2013 22:09:55 +0000 (UTC) (envelope-from will@firepipe.net) Received: from mail-ve0-f176.google.com (mail-ve0-f176.google.com [209.85.128.176]) by mx1.freebsd.org (Postfix) with ESMTP id B9AEF1E6A for ; Wed, 10 Jul 2013 22:09:54 +0000 (UTC) Received: by mail-ve0-f176.google.com with SMTP id c13so6685387vea.35 for ; Wed, 10 Jul 2013 15:09:47 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type:content-transfer-encoding:x-gm-message-state; bh=P0hgntb69lkjdLa3xZcY1eJomyKef+w5S1ZlbOZLz14=; b=eVugtxSOK2AHaTq2v2m0FEHQ85QsCj0jh4uohMpKsVyLsdjFBuWAOZ1U6BNDSHtSMc TqQutw5mms8GJP7MwkNVPw/yEZvmnb5YD4nsagFoavObpeS6woT40xJ11MzbNtILdZw4 xYbOQTWn/a0gZuvWqNKDjtiG6EtqqgoljOOWa41BkQA9VG04JfmSMW9Gj2wcrferksCd bj4DHiYEngYee3wLJbOKSuPuo5Yc1G3UcgFzlnr9dLF4+osW62Q1jpdQyAoy7u+H15tI
IV3sB+UnFd//F1PamboSV17g/TKOkt05+NEix2aXoIr7RmzCdGNP/SDcUagDJ6xPNxeE utOA== MIME-Version: 1.0 X-Received: by 10.58.234.161 with SMTP id uf1mr19886913vec.57.1373494187427; Wed, 10 Jul 2013 15:09:47 -0700 (PDT) Received: by 10.58.226.66 with HTTP; Wed, 10 Jul 2013 15:09:47 -0700 (PDT) In-Reply-To: <3592BFB7-0663-4381-AFF5-C7DE0AE16858@mail.turbofuzz.com> References: <9890DFF1-892A-4DCA-9E33-B70681154F43@mail.turbofuzz.com> <3592BFB7-0663-4381-AFF5-C7DE0AE16858@mail.turbofuzz.com> Date: Wed, 10 Jul 2013 16:09:47 -0600 Message-ID: Subject: Re: Kernel dumps [was Re: possible changes from Panzura] From: Will Andrews To: Jordan Hubbard Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable X-Gm-Message-State: ALoCoQmK1ohdUOSlHEMS/Rv2QMzGIIXGWgoPai9R+PxzrJEzAhfRhd9OxZSh3dXQdNgBHHAXPTnR Cc: hackers@freebsd.org X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 10 Jul 2013 22:09:55 -0000 On Wed, Jul 10, 2013 at 3:50 PM, Jordan Hubbard wr= ote: > Absolutely. In fact, before the advent of remote network debugging, FW w= as totally the debugging method of choice since firewire target DMA lets yo= u do all kinds of useful things (as well as a few things that simply scare = the security guys to death ;-) ). > > My point was more that actually being able to debug a machine over the ne= twork is such a step up in terms of convenience/awesomeness that if anyone = is thinking of putting any time and attention into this area at all, that's= definitely the target to go for. > > Looking at http://www.opensource.apple.com/tarballs/xnu/xnu-2050.22.13.ta= r.gz there's even reasonable "documentation" on the kernel debugging protoc= ol in xnu/osfmk/kdp. Folks could do worse than try to clone it. The gdb d= ebugger macros in support of it are also in xnu/kgmacros. 
None of it is going to be 'drop in' for FreeBSD by any stretch of the imagination, but it's always easier to get to a destination when you have a map. :-) Anyone with a Mac can also "nvram boot-args="debug=0x144"" and test-drive it around, just to see how it works in actual practice. See also: https://developer.apple.com/library/mac/#documentation/Darwin/Conceptual/KEXTConcept/KEXTConceptDebugger/debug_tutorial.html

Speaking of Apple solutions, I've recently used Apple's kgdb with the kernel debug kit & kdp remote debugging, to debug a panic'd OS X host. It's really quite nice, because the debug kit comes with a ton of macros, similar to kdb, and you also get the benefit of source debugging.

I think FreeBSD would benefit massively from finding some way to share macros between kdb and kgdb, in addition to having an "emergency network stack" like you suggest. As Alan says, until then, there's firewire, and also gdbsx if your FreeBSD system is running as a Xen guest.

--Will.

From owner-freebsd-hackers@FreeBSD.ORG Wed Jul 10 22:50:27 2013 Return-Path: Delivered-To: hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id A8EBF305 for ; Wed, 10 Jul 2013 22:50:27 +0000 (UTC) (envelope-from bakul@bitblocks.com) Received: from mail.bitblocks.com (ns1.bitblocks.com [173.228.5.8]) by mx1.freebsd.org (Postfix) with ESMTP id 8DF881FE2 for ; Wed, 10 Jul 2013 22:50:27 +0000 (UTC) Received: from bitblocks.com (localhost [127.0.0.1]) by mail.bitblocks.com (Postfix) with ESMTP id 621EFB827; Wed, 10 Jul 2013 15:42:29 -0700 (PDT) To: Jordan Hubbard Subject: Re: Kernel dumps [was Re: possible changes from Panzura] In-reply-to: Your message of "Wed, 10 Jul 2013 14:50:19 PDT."
<3592BFB7-0663-4381-AFF5-C7DE0AE16858@mail.turbofuzz.com> References: <9890DFF1-892A-4DCA-9E33-B70681154F43@mail.turbofuzz.com> <3592BFB7-0663-4381-AFF5-C7DE0AE16858@mail.turbofuzz.com> Comments: In-reply-to Jordan Hubbard message dated "Wed, 10 Jul 2013 14:50:19 -0700." Date: Wed, 10 Jul 2013 15:42:29 -0700 From: Bakul Shah Message-Id: <20130710224229.621EFB827@mail.bitblocks.com> Cc: asomers@gmail.com, hackers@freebsd.org, Julian Elischer X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 10 Jul 2013 22:50:27 -0000

On Wed, 10 Jul 2013 14:50:19 PDT Jordan Hubbard wrote:
>
> On Jul 10, 2013, at 1:04 PM, asomers@gmail.com wrote:
>
> > I don't doubt that it would be useful to have an emergency network
> > stack. But have you ever looked into debugging over firewire?
>
> My point was more that actually being able to debug a machine over the network is such a step up in terms of convenience/awesomeness that if anyone is thinking of putting any time and attention into this area at all, that's definitely the target to go for.

You have to use this just once to see how convenient it is! At a previous company, James Da Silva did this in 1997 by adding a network console (IIRC in a day or two). A new ethernet type was used + a host specific ethernet multicast address so you could connect from any machine on the same ethernet segment, either as a remote console for the usual console IO & ddb, or to run remote gdb. Quite insecure but that didn't matter as this was used in a test network. There was no emergency network stack; just a polling function added to an ethernet driver since this had to work even when the kernel was on the operating table under anaesthetic! No new gdb hacks were necessary since the invoking program set things up for it.
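A minimal sketch of the kind of framing described above, assuming a private EtherType and a multicast address derived from a host id. Every name here (rcons_mcast_addr, rcons_build_hdr, ETHERTYPE_RCONS) is hypothetical, the address layout is a guess, and 0x88b5 (one of the IEEE local experimental EtherTypes) only stands in for whatever value was really used:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETHER_ADDR_LEN	6
#define ETHER_HDR_LEN	14
#define ETHERTYPE_RCONS	0x88b5	/* stand-in: IEEE local experimental type */

/*
 * Hypothetical per-host multicast address: 0x01 in the first octet is
 * the group (multicast) bit, 0x02 marks it locally administered, and
 * the low four octets carry a host id.
 */
static void
rcons_mcast_addr(uint8_t addr[ETHER_ADDR_LEN], uint32_t host_id)
{
	addr[0] = 0x03;
	addr[1] = 0x00;
	addr[2] = (uint8_t)(host_id >> 24);
	addr[3] = (uint8_t)(host_id >> 16);
	addr[4] = (uint8_t)(host_id >> 8);
	addr[5] = (uint8_t)host_id;
}

/*
 * Write the 14-byte Ethernet header for a frame sent to the target
 * host's multicast address.  Returns the header length, or 0 if the
 * buffer is too small.
 */
static size_t
rcons_build_hdr(uint8_t *frame, size_t framesz,
    const uint8_t src[ETHER_ADDR_LEN], uint32_t host_id)
{
	assert(frame != NULL && src != NULL);
	if (framesz < ETHER_HDR_LEN)
		return (0);
	rcons_mcast_addr(frame, host_id);		/* destination */
	memcpy(frame + ETHER_ADDR_LEN, src, ETHER_ADDR_LEN);	/* source */
	frame[12] = (uint8_t)(ETHERTYPE_RCONS >> 8);
	frame[13] = (uint8_t)(ETHERTYPE_RCONS & 0xff);
	return (ETHER_HDR_LEN);
}
```

Because the header is built by hand into a caller-supplied buffer, nothing here depends on the normal network stack, which is the property a polled debug path needs.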
If I was doing this today, I'd probably still do the same and make sure that the interface used for remote debugging is on an isolated network. From owner-freebsd-hackers@FreeBSD.ORG Wed Jul 10 23:07:15 2013 Return-Path: Delivered-To: hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 17DFA6A7 for ; Wed, 10 Jul 2013 23:07:15 +0000 (UTC) (envelope-from vince@unsane.co.uk) Received: from unsane.co.uk (unsane-pt.tunnel.tserv5.lon1.ipv6.he.net [IPv6:2001:470:1f08:110::2]) by mx1.freebsd.org (Postfix) with ESMTP id B0F791096 for ; Wed, 10 Jul 2013 23:07:14 +0000 (UTC) Received: from vincemacbook.unsane.co.uk (vincemacbook.unsane.co.uk [10.10.10.20]) (authenticated bits=0) by unsane.co.uk (8.14.7/8.14.6) with ESMTP id r6AN7AF9013535 (version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO); Thu, 11 Jul 2013 00:07:10 +0100 (BST) (envelope-from vince@unsane.co.uk) Message-ID: <51DDE91E.4000400@unsane.co.uk> Date: Thu, 11 Jul 2013 00:07:10 +0100 From: Vincent Hoffman User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:17.0) Gecko/20130620 Thunderbird/17.0.7 MIME-Version: 1.0 To: Kevin Day Subject: Re: Kernel dumps [was Re: possible changes from Panzura] References: <9890DFF1-892A-4DCA-9E33-B70681154F43@mail.turbofuzz.com> <4F0DFAB7-D6D5-4068-A543-C9DF885D1A7D@dragondata.com> In-Reply-To: <4F0DFAB7-D6D5-4068-A543-C9DF885D1A7D@dragondata.com> X-Enigmail-Version: 1.5.1 Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 8bit Cc: hackers@freebsd.org, Jordan Hubbard X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 10 Jul 2013 23:07:15 -0000 On 10/07/2013 23:09, Kevin Day wrote: >> >> Those sound useful. 
Just out of curiosity, however, since we're on the topic of kernel dumps: Has anyone even looked into the notion of an emergency fall-back network stack to enable remote kernel panic (or system hang) debugging, the way OS X lets you do? I can't tell you the number of times I've NMI'd a Mac and connected to it remotely in a scenario where everything was totally wedged and just a couple of minutes in kgdb (or now lldb) quickly showed that everything was waiting on a specific lock and the problem became manifestly clear.
>>
>> The feature also lets you scrape a panic'd machine with automation, running some kgdb scripts against it to glean useful information for later analysis vs having to have someone schlep the dump image manually to triage. It's going to be damn hard to live without this now, and if someone else isn't working on it, that's good to know too!
>
> At a previous employer, we had a system where on a panic it had a totally separate stack capable of just IP/UDP/TFTP and would save its core via TFTP to a server. This isn't as nice as full remote debugging, but it was a whole lot easier to develop. The caveats I remember were:
>
> 1) We didn't want to implement ARP, so you had to write the mac address of the dump server to the kernel via sysctl before crashing.
> 2) We also didn't want to have to deal with routing tables, so you had to manually specify what interface to blast packets out to, also via sysctl.
> 3) After a panic we didn't want to rely on interrupt processing working, so it polled the network interface and blocked whenever it needed to. Since this was an embedded system, it wasn't too big of a deal - only one network driver had to be hacked to support this. Basically a flag that would switch to disable normal processing, switch to polled fifos for input and output until reboot.
> 4) The whole system used only preallocated buffers and its own stack (carved out from memory on boot) so even if the kernel's malloc was trashed, we could still dump.
>
> I'm not sure this really would scratch your itch, but I believe this took me no more than a day or two to implement. Parts #1 and #2 would be pretty easy, but I'm not sure how generic the kernel could support an emergency network mode that doesn't require interrupts for every network card out there. Maybe that isn't as important to you as it was to us.
>
> The whole exercise is much easier if you don't use TFTP but a custom protocol that doesn't require the crashing system to receive any packets, if it can just blast away at some random host oblivious if it's working or not, it's a lot less code to write.
>

There was some work on something similar at one point, not sure what came of it.
http://lists.freebsd.org/pipermail/freebsd-current/2010-September/020164.html

Vince

> _______________________________________________
> freebsd-hackers@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
> To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org"
>

From owner-freebsd-hackers@FreeBSD.ORG Thu Jul 11 07:41:39 2013 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 613FBD6; Thu, 11 Jul 2013 07:41:39 +0000 (UTC) (envelope-from neelnatu@gmail.com) Received: from mail-ie0-x22a.google.com (mail-ie0-x22a.google.com [IPv6:2607:f8b0:4001:c03::22a]) by mx1.freebsd.org (Postfix) with ESMTP id 25B5E17FE; Thu, 11 Jul 2013 07:41:39 +0000 (UTC) Received: by mail-ie0-f170.google.com with SMTP id e11so17764926iej.29 for ; Thu, 11 Jul 2013 00:41:38 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=1rscPNEtpVAwuswi+eJiB/2al35tuycmgGBcSyArLp0=; b=drzD+uipZsYaCDCOwy3b0EE/RltJ6XdFfNpmOt7DWphBFoTOoKXMKS9lHwAeKg+Knb G520Ha8zPbf4WBwj/VA0LMxbaHZXQKh03eW8OQJJI1Esnl4vXew+Ab890o/X96At+fJw
kakguM6HjROe0fbCBG3tCrrdfLR4o8jihD68IzTDh/GAeQa/fHEY70Lp02qLIaC9jLrr dqEO5WnX6YsSRAcNTv2esrHQpYqXlGFtaiw9SgYhdWRWC0l2oD7WYHX0UBxuNBf09anm YKG2LbK7W3E2kU7ouSDQPuccfSZqjX/Ql6W6mWH3Up4A2BBva/YXq5zwO3S1Y03TE/mA XJaw== MIME-Version: 1.0 X-Received: by 10.43.91.10 with SMTP id bk10mr11049014icc.86.1373528498803; Thu, 11 Jul 2013 00:41:38 -0700 (PDT) Received: by 10.42.151.74 with HTTP; Thu, 11 Jul 2013 00:41:38 -0700 (PDT) In-Reply-To: <201307080642.r686gbDQ089570@elf.torek.net> References: <201306282033.r5SKXtYK053022@elf.torek.net> <201307080642.r686gbDQ089570@elf.torek.net> Date: Thu, 11 Jul 2013 00:41:38 -0700 Message-ID: Subject: Re: expanding amd64 past the 1TB limit From: Neel Natu To: Chris Torek Content-Type: text/plain; charset=ISO-8859-1 Cc: Konstantin Belousov , freebsd-hackers@freebsd.org, kib@freebsd.org, alc@freebsd.org X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 11 Jul 2013 07:41:39 -0000 Hi Chris, On Sun, Jul 7, 2013 at 11:42 PM, Chris Torek wrote: > Here is a final (I hope) version of the patch. I dropped the > config option, but I added code to limit the "real" size of the > direct map PDEs. The end result is that on small systems, this > ties up 14 more pages (15 from increasing NKPML4E, but one > regained because the new static variable ndmpdpphys is 1 instead > of 2). > The patch looks good. I have a couple of comments inline: > (I fixed the comment errors I spotted earlier, too.) 
> > Chris > > amd64/amd64/pmap.c | 100 +++++++++++++++++++++++++++++------------------- > amd64/include/pmap.h | 36 +++++++++++++---- > amd64/include/vmparam.h | 13 ++++--- > 3 files changed, 97 insertions(+), 52 deletions(-) > > Author: Chris Torek > Date: Thu Jun 27 18:49:29 2013 -0600 > > increase physical and virtual memory limits > > Increase kernel VM space: go from .5 TB of KVA and 1 TB of direct > map, to 8 TB of KVA and 16 TB of direct map. However, we allocate > less direct map space for small physical-memory systems. Also, if > Maxmem is so large that there is not enough direct map space, > reduce Maxmem to fit, so that the system can boot unassisted. > > diff --git a/amd64/amd64/pmap.c b/amd64/amd64/pmap.c > index 8dcf232..7368c96 100644 > --- a/amd64/amd64/pmap.c > +++ b/amd64/amd64/pmap.c > @@ -232,6 +232,7 @@ u_int64_t KPML4phys; /* phys addr of kernel level 4 */ > > static u_int64_t DMPDphys; /* phys addr of direct mapped level 2 */ > static u_int64_t DMPDPphys; /* phys addr of direct mapped level 3 */ > +static int ndmpdpphys; /* number of DMPDPphys pages */ > > static struct rwlock_padalign pvh_global_lock; > > @@ -531,12 +532,27 @@ static void > create_pagetables(vm_paddr_t *firstaddr) > { > int i, j, ndm1g, nkpdpe; > + pt_entry_t *pt_p; > + pd_entry_t *pd_p; > + pdp_entry_t *pdp_p; > + pml4_entry_t *p4_p; The changes associated with pt_p, pd_p and p4_p are cosmetic and IMHO detract from the meat of the change. My preference would be for the cosmetic changes to be committed separately from the changes that rearrange the KVA. > > /* Allocate page table pages for the direct map */ > ndmpdp = (ptoa(Maxmem) + NBPDP - 1) >> PDPSHIFT; > if (ndmpdp < 4) /* Minimum 4GB of dirmap */ > ndmpdp = 4; > - DMPDPphys = allocpages(firstaddr, NDMPML4E); > + ndmpdpphys = howmany(ndmpdp, NPML4EPG); NPDPEPG should be used here instead of NPML4EPG even though they are numerically identical. 
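To make the sizing arithmetic in this hunk concrete, here is a rough userland model of it, written with NPDPEPG as suggested. It is only a sketch: dmap_pdp_pages() is a made-up helper, it takes memory in bytes where the kernel uses ptoa(Maxmem), and the constants are copied from the patch rather than pulled from the real headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Constants mirroring the patch (amd64, 4-level paging). */
#define PDPSHIFT	30			/* one PDP entry maps 1 GB */
#define NBPDP		(1ULL << PDPSHIFT)
#define NPDPEPG		512			/* PDP entries per page */
#define NDMPML4E	32			/* max PML4 slots for the direct map */
#define howmany(x, y)	(((x) + ((y) - 1)) / (y))

/*
 * Userland model (not kernel code) of the sizing logic in
 * create_pagetables(): returns the number of DMPDPphys pages and
 * stores the possibly clamped PDP entry count in *ndmpdp_out.
 */
static uint64_t
dmap_pdp_pages(uint64_t maxmem_bytes, uint64_t *ndmpdp_out)
{
	uint64_t ndmpdp, ndmpdpphys;

	assert(ndmpdp_out != NULL);
	ndmpdp = (maxmem_bytes + NBPDP - 1) >> PDPSHIFT;
	if (ndmpdp < 4)			/* minimum 4 GB of direct map */
		ndmpdp = 4;
	ndmpdpphys = howmany(ndmpdp, NPDPEPG);
	if (ndmpdpphys > NDMPML4E) {	/* more RAM than the map can cover */
		ndmpdpphys = NDMPML4E;
		ndmpdp = (uint64_t)NDMPML4E * NPDPEPG;
	}
	*ndmpdp_out = ndmpdp;
	return (ndmpdpphys);
}
```

With these constants a 1 TB machine needs 2 DMPDPphys pages, and anything past 16 TB is clamped to 32 pages (16384 PDP entries).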
> + if (ndmpdpphys > NDMPML4E) { > + /* > + * Each NDMPML4E allows 512 GB, so limit to that, > + * and then readjust ndmpdp and ndmpdpphys. > + */ > + printf("NDMPML4E limits system to %d GB\n", NDMPML4E * 512); > + Maxmem = atop(NDMPML4E * NBPML4); > + ndmpdpphys = NDMPML4E; > + ndmpdp = NDMPML4E * NPDEPG; > + } > + DMPDPphys = allocpages(firstaddr, ndmpdpphys); > ndm1g = 0; > if ((amd_feature & AMDID_PAGE1GB) != 0) > ndm1g = ptoa(Maxmem) >> PDPSHIFT; > @@ -553,6 +569,10 @@ create_pagetables(vm_paddr_t *firstaddr) > * bootstrap. We defer this until after all memory-size dependent > * allocations are done (e.g. direct map), so that we don't have to > * build in too much slop in our estimate. > + * > + * Note that when NKPML4E > 1, we have an empty page underneath > + * all but the KPML4I'th one, so we need NKPML4E-1 extra (zeroed) > + * pages. (pmap_enter requires a PD page to exist for each KPML4E.) > */ > nkpt_init(*firstaddr); > nkpdpe = NKPDPE(nkpt); > @@ -561,32 +581,26 @@ create_pagetables(vm_paddr_t *firstaddr) > KPDphys = allocpages(firstaddr, nkpdpe); > > /* Fill in the underlying page table pages */ > - /* Read-only from zero to physfree */ > + /* Nominally read-only (but really R/W) from zero to physfree */ > /* XXX not fully used, underneath 2M pages */ > - for (i = 0; (i << PAGE_SHIFT) < *firstaddr; i++) { > - ((pt_entry_t *)KPTphys)[i] = i << PAGE_SHIFT; > - ((pt_entry_t *)KPTphys)[i] |= PG_RW | PG_V | PG_G; > - } > + pt_p = (pt_entry_t *)KPTphys; > + for (i = 0; ptoa(i) < *firstaddr; i++) > + pt_p[i] = ptoa(i) | PG_RW | PG_V | PG_G; > > /* Now map the page tables at their location within PTmap */ > - for (i = 0; i < nkpt; i++) { > - ((pd_entry_t *)KPDphys)[i] = KPTphys + (i << PAGE_SHIFT); > - ((pd_entry_t *)KPDphys)[i] |= PG_RW | PG_V; > - } > + pd_p = (pd_entry_t *)KPDphys; > + for (i = 0; i < nkpt; i++) > + pd_p[i] = (KPTphys + ptoa(i)) | PG_RW | PG_V; > > /* Map from zero to end of allocations under 2M pages */ > /* This replaces some of the 
KPTphys entries above */ > - for (i = 0; (i << PDRSHIFT) < *firstaddr; i++) { > - ((pd_entry_t *)KPDphys)[i] = i << PDRSHIFT; > - ((pd_entry_t *)KPDphys)[i] |= PG_RW | PG_V | PG_PS | PG_G; > - } > + for (i = 0; (i << PDRSHIFT) < *firstaddr; i++) > + pd_p[i] = (i << PDRSHIFT) | PG_RW | PG_V | PG_PS | PG_G; > > - /* And connect up the PD to the PDP */ > - for (i = 0; i < nkpdpe; i++) { > - ((pdp_entry_t *)KPDPphys)[i + KPDPI] = KPDphys + > - (i << PAGE_SHIFT); > - ((pdp_entry_t *)KPDPphys)[i + KPDPI] |= PG_RW | PG_V | PG_U; > - } > + /* And connect up the PD to the PDP (leaving room for L4 pages) */ > + pdp_p = (pdp_entry_t *)(KPDPphys + ptoa(KPML4I - KPML4BASE)); > + for (i = 0; i < nkpdpe; i++) > + pdp_p[i + KPDPI] = (KPDphys + ptoa(i)) | PG_RW | PG_V | PG_U; > > /* > * Now, set up the direct map region using 2MB and/or 1GB pages. If > @@ -596,37 +610,41 @@ create_pagetables(vm_paddr_t *firstaddr) > * memory, pmap_change_attr() will demote any 2MB or 1GB page mappings > * that are partially used. > */ > + pd_p = (pd_entry_t *)DMPDphys; > for (i = NPDEPG * ndm1g, j = 0; i < NPDEPG * ndmpdp; i++, j++) { > - ((pd_entry_t *)DMPDphys)[j] = (vm_paddr_t)i << PDRSHIFT; > + pd_p[j] = (vm_paddr_t)i << PDRSHIFT; > /* Preset PG_M and PG_A because demotion expects it. */ > - ((pd_entry_t *)DMPDphys)[j] |= PG_RW | PG_V | PG_PS | PG_G | > + pd_p[j] |= PG_RW | PG_V | PG_PS | PG_G | > PG_M | PG_A; > } > + pdp_p = (pdp_entry_t *)DMPDPphys; > for (i = 0; i < ndm1g; i++) { > - ((pdp_entry_t *)DMPDPphys)[i] = (vm_paddr_t)i << PDPSHIFT; > + pdp_p[i] = (vm_paddr_t)i << PDPSHIFT; > /* Preset PG_M and PG_A because demotion expects it. 
*/ > - ((pdp_entry_t *)DMPDPphys)[i] |= PG_RW | PG_V | PG_PS | PG_G | > + pdp_p[i] |= PG_RW | PG_V | PG_PS | PG_G | > PG_M | PG_A; > } > for (j = 0; i < ndmpdp; i++, j++) { > - ((pdp_entry_t *)DMPDPphys)[i] = DMPDphys + (j << PAGE_SHIFT); > - ((pdp_entry_t *)DMPDPphys)[i] |= PG_RW | PG_V | PG_U; > + pdp_p[i] = DMPDphys + ptoa(j); > + pdp_p[i] |= PG_RW | PG_V | PG_U; > } > > /* And recursively map PML4 to itself in order to get PTmap */ > - ((pdp_entry_t *)KPML4phys)[PML4PML4I] = KPML4phys; > - ((pdp_entry_t *)KPML4phys)[PML4PML4I] |= PG_RW | PG_V | PG_U; > + p4_p = (pml4_entry_t *)KPML4phys; > + p4_p[PML4PML4I] = KPML4phys; > + p4_p[PML4PML4I] |= PG_RW | PG_V | PG_U; > > /* Connect the Direct Map slot(s) up to the PML4. */ > - for (i = 0; i < NDMPML4E; i++) { > - ((pdp_entry_t *)KPML4phys)[DMPML4I + i] = DMPDPphys + > - (i << PAGE_SHIFT); > - ((pdp_entry_t *)KPML4phys)[DMPML4I + i] |= PG_RW | PG_V | PG_U; > + for (i = 0; i < ndmpdpphys; i++) { > + p4_p[DMPML4I + i] = DMPDPphys + ptoa(i); > + p4_p[DMPML4I + i] |= PG_RW | PG_V | PG_U; > } > > - /* Connect the KVA slot up to the PML4 */ > - ((pdp_entry_t *)KPML4phys)[KPML4I] = KPDPphys; > - ((pdp_entry_t *)KPML4phys)[KPML4I] |= PG_RW | PG_V | PG_U; > + /* Connect the KVA slots up to the PML4 */ > + for (i = 0; i < NKPML4E; i++) { > + p4_p[KPML4BASE + i] = KPDPphys + ptoa(i); > + p4_p[KPML4BASE + i] |= PG_RW | PG_V | PG_U; > + } > } > > /* > @@ -1685,8 +1703,11 @@ pmap_pinit(pmap_t pmap) > pagezero(pmap->pm_pml4); > > /* Wire in kernel global address entries. 
*/ > - pmap->pm_pml4[KPML4I] = KPDPphys | PG_RW | PG_V | PG_U; > - for (i = 0; i < NDMPML4E; i++) { > + for (i = 0; i < NKPML4E; i++) { > + pmap->pm_pml4[KPML4BASE + i] = (KPDPphys + (i << PAGE_SHIFT)) | > + PG_RW | PG_V | PG_U; > + } > + for (i = 0; i < ndmpdpphys; i++) { > pmap->pm_pml4[DMPML4I + i] = (DMPDPphys + (i << PAGE_SHIFT)) | > PG_RW | PG_V | PG_U; > } > @@ -1941,8 +1962,9 @@ pmap_release(pmap_t pmap) > > m = PHYS_TO_VM_PAGE(pmap->pm_pml4[PML4PML4I] & PG_FRAME); > > - pmap->pm_pml4[KPML4I] = 0; /* KVA */ > - for (i = 0; i < NDMPML4E; i++) /* Direct Map */ > + for (i = 0; i < NKPML4E; i++) /* KVA */ > + pmap->pm_pml4[KPML4BASE + i] = 0; > + for (i = 0; i < ndmpdpphys; i++)/* Direct Map */ > pmap->pm_pml4[DMPML4I + i] = 0; > pmap->pm_pml4[PML4PML4I] = 0; /* Recursive Mapping */ > > diff --git a/amd64/include/pmap.h b/amd64/include/pmap.h > index dc02e49..da80241 100644 > --- a/amd64/include/pmap.h > +++ b/amd64/include/pmap.h > @@ -113,28 +113,50 @@ > ((unsigned long)(l2) << PDRSHIFT) | \ > ((unsigned long)(l1) << PAGE_SHIFT)) > > -#define NKPML4E 1 /* number of kernel PML4 slots */ > +/* > + * Number of kernel PML4 slots. Can be anywhere from 1 to 64 or so, > + * but setting it larger than NDMPML4E makes no sense. > + * > + * Each slot provides .5 TB of kernel virtual space. > + */ > +#define NKPML4E 16 > > #define NUPML4E (NPML4EPG/2) /* number of userland PML4 pages */ > #define NUPDPE (NUPML4E*NPDPEPG)/* number of userland PDP pages */ > #define NUPDE (NUPDPE*NPDEPG) /* number of userland PD entries */ > > /* > - * NDMPML4E is the number of PML4 entries that are used to implement the > - * direct map. It must be a power of two. > + * NDMPML4E is the maximum number of PML4 entries that will be > + * used to implement the direct map. It must be a power of two, > + * and should generally exceed NKPML4E. The maximum possible > + * value is 64; using 128 will make the direct map intrude into > + * the recursive page table map. 
> */ > -#define NDMPML4E 2 > +#define NDMPML4E 32 > > /* > - * The *PDI values control the layout of virtual memory. The starting address > + * These values control the layout of virtual memory. The starting address > * of the direct map, which is controlled by DMPML4I, must be a multiple of > * its size. (See the PHYS_TO_DMAP() and DMAP_TO_PHYS() macros.) > + * > + * Note: KPML4I is the index of the (single) level 4 page that maps > + * the KVA that holds KERNBASE, while KPML4BASE is the index of the > + * first level 4 page that maps VM_MIN_KERNEL_ADDRESS. If NKPML4E > + * is 1, these are the same, otherwise KPML4BASE < KPML4I and extra > + * level 4 PDEs are needed to map from VM_MIN_KERNEL_ADDRESS up to > + * KERNBASE. Similarly, if KMPL4I < (base+N), extra level 4 PDEs are level 2 PDE's, right? > + * needed to map from somewhere-above-KERNBASE to VM_MAX_KERNEL_ADDRESS. > + * > + * (KPML4I combines with KPDPI to choose where KERNBASE starts. > + * Or, in other words, KPML4I provides bits 39..46 of KERNBASE, > + * and KPDPI provides bits 30..38.) 
> */ > #define PML4PML4I (NPML4EPG/2) /* Index of recursive pml4 mapping */ > > -#define KPML4I (NPML4EPG-1) /* Top 512GB for KVM */ > -#define DMPML4I rounddown(KPML4I - NDMPML4E, NDMPML4E) /* Below KVM */ > +#define KPML4BASE (NPML4EPG-NKPML4E) /* KVM at highest addresses */ > +#define DMPML4I rounddown(KPML4BASE-NDMPML4E, NDMPML4E) /* Below KVM */ > > +#define KPML4I (NPML4EPG-1) > #define KPDPI (NPDPEPG-2) /* kernbase at -2GB */ > > /* > diff --git a/amd64/include/vmparam.h b/amd64/include/vmparam.h > index 33f62bd..cff2558 100644 > --- a/amd64/include/vmparam.h > +++ b/amd64/include/vmparam.h > @@ -145,18 +145,19 @@ > * 0x0000000000000000 - 0x00007fffffffffff user map > * 0x0000800000000000 - 0xffff7fffffffffff does not exist (hole) > * 0xffff800000000000 - 0xffff804020100fff recursive page table (512GB slot) > - * 0xffff804020101000 - 0xfffffdffffffffff unused > - * 0xfffffe0000000000 - 0xfffffeffffffffff 1TB direct map > - * 0xffffff0000000000 - 0xffffff7fffffffff unused > - * 0xffffff8000000000 - 0xffffffffffffffff 512GB kernel map > + * 0xffff804020101000 - 0xffffdfffffffffff unused > + * 0xffffe00000000000 - 0xffffefffffffffff 16TB direct map > + * 0xfffff00000000000 - 0xfffff7ffffffffff unused > + * 0xfffff80000000000 - 0xffffffffffffffff 8TB kernel map > * > * Within the kernel map: > * > * 0xffffffff80000000 KERNBASE > */ > > -#define VM_MAX_KERNEL_ADDRESS KVADDR(KPML4I, NPDPEPG-1, NPDEPG-1, NPTEPG-1) > -#define VM_MIN_KERNEL_ADDRESS KVADDR(KPML4I, NPDPEPG-512, 0, 0) > +#define VM_MIN_KERNEL_ADDRESS KVADDR(KPML4BASE, 0, 0, 0) > +#define VM_MAX_KERNEL_ADDRESS KVADDR(KPML4BASE + NKPML4E - 1, \ > + NPDPEPG-1, NPDEPG-1, NPTEPG-1) > > #define DMAP_MIN_ADDRESS KVADDR(DMPML4I, 0, 0, 0) > #define DMAP_MAX_ADDRESS KVADDR(DMPML4I + NDMPML4E, 0, 0, 0) best Neel From owner-freebsd-hackers@FreeBSD.ORG Thu Jul 11 13:30:44 2013 Return-Path: Delivered-To: hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) 
with ESMTP id ACBB45D2 for ; Thu, 11 Jul 2013 13:30:44 +0000 (UTC) (envelope-from lars@e-new.0x20.net) Received: from mail.0x20.net (mail.0x20.net [IPv6:2001:aa8:fffb:1::3]) by mx1.freebsd.org (Postfix) with ESMTP id 707B11D72 for ; Thu, 11 Jul 2013 13:30:44 +0000 (UTC) Received: from e-new.0x20.net (mail.0x20.net [IPv6:2001:aa8:fffb:1::3]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mail.0x20.net (Postfix) with ESMTPS id 640086A6004; Thu, 11 Jul 2013 15:30:42 +0200 (CEST) Received: from e-new.0x20.net (localhost [127.0.0.1]) by e-new.0x20.net (8.14.7/8.14.7) with ESMTP id r6BDUgYK098739; Thu, 11 Jul 2013 15:30:42 +0200 (CEST) (envelope-from lars@e-new.0x20.net) Received: (from lars@localhost) by e-new.0x20.net (8.14.7/8.14.7/Submit) id r6BDUetl098592; Thu, 11 Jul 2013 15:30:40 +0200 (CEST) (envelope-from lars) Date: Thu, 11 Jul 2013 15:30:40 +0200 From: Lars Engels To: asomers@gmail.com Subject: Re: Kernel dumps [was Re: possible changes from Panzura] Message-ID: <20130711133040.GO88288@e-new.0x20.net> References: <9890DFF1-892A-4DCA-9E33-B70681154F43@mail.turbofuzz.com> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="SDZNRjzUoAX9KAL/" Content-Disposition: inline In-Reply-To: X-Editor: VIM - Vi IMproved 7.3 X-Operation-System: FreeBSD 8.4-RELEASE User-Agent: Mutt/1.5.21 (2010-09-15) Cc: hackers@freebsd.org, Julian Elischer , Jordan Hubbard X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 11 Jul 2013 13:30:44 -0000 --SDZNRjzUoAX9KAL/ Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Wed, Jul 10, 2013 at 02:04:17PM -0600, asomers@gmail.com wrote: > On Wed, Jul 10, 2013 at 12:57 PM, Jordan Hubbard = 
wrote:
> >
> > On Jul 10, 2013, at 11:16 AM, Julian Elischer wrote:
> >
> >> My first candidates are:
> >
> > Those sound useful. Just out of curiosity, however, since we're on the topic of kernel dumps: Has anyone even looked into the notion of an emergency fall-back network stack to enable remote kernel panic (or system hang) debugging, the way OS X lets you do? I can't tell you the number of times I've NMI'd a Mac and connected to it remotely in a scenario where everything was totally wedged and just a couple of minutes in kgdb (or now lldb) quickly showed that everything was waiting on a specific lock and the problem became manifestly clear.
> >
> > The feature also lets you scrape a panic'd machine with automation, running some kgdb scripts against it to glean useful information for later analysis vs having to have someone schlep the dump image manually to triage. It's going to be damn hard to live without this now, and if someone else isn't working on it, that's good to know too!
>
> I don't doubt that it would be useful to have an emergency network stack. But have you ever looked into debugging over firewire? We've had success with it. All of our development machines are connected to a single firewire bus. When one panics, we can remotely debug it with both kdb and ddb. It's not ethernet, but it's still much faster than a serial port.
> https://wiki.freebsd.org/DebugWithDcons

Debugging over Firewire may be very nice to use, but Firewire is dead while every single device nowadays has a network interface, admittedly it's often wireless.
--SDZNRjzUoAX9KAL/ Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.20 (FreeBSD) iEYEARECAAYFAlHes4AACgkQKc512sD3afg29QCdGEG7bQhxyoI6W8TyyOYvg0Wx 0HcAnj3R3pX1GLuO6zZW3XmLQVQOO5G9 =olYf -----END PGP SIGNATURE----- --SDZNRjzUoAX9KAL/-- From owner-freebsd-hackers@FreeBSD.ORG Thu Jul 11 14:28:04 2013 Return-Path: Delivered-To: hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 2482B22B for ; Thu, 11 Jul 2013 14:28:04 +0000 (UTC) (envelope-from julian@freebsd.org) Received: from vps1.elischer.org (vps1.elischer.org [204.109.63.16]) by mx1.freebsd.org (Postfix) with ESMTP id 06A3D1053 for ; Thu, 11 Jul 2013 14:28:03 +0000 (UTC) Received: from Julian-MBP3.local (ppp121-45-226-51.lns20.per1.internode.on.net [121.45.226.51]) (authenticated bits=0) by vps1.elischer.org (8.14.5/8.14.5) with ESMTP id r6BERv7E030579 (version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO); Thu, 11 Jul 2013 07:28:00 -0700 (PDT) (envelope-from julian@freebsd.org) Message-ID: <51DEC0E8.7010305@freebsd.org> Date: Thu, 11 Jul 2013 22:27:52 +0800 From: Julian Elischer User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:17.0) Gecko/20130620 Thunderbird/17.0.7 MIME-Version: 1.0 To: Kevin Day Subject: Re: Kernel dumps [was Re: possible changes from Panzura] References: <9890DFF1-892A-4DCA-9E33-B70681154F43@mail.turbofuzz.com> <4F0DFAB7-D6D5-4068-A543-C9DF885D1A7D@dragondata.com> In-Reply-To: <4F0DFAB7-D6D5-4068-A543-C9DF885D1A7D@dragondata.com> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 8bit Cc: hackers@freebsd.org, Jordan Hubbard X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 11 Jul 2013 14:28:04 -0000 On 7/11/13 
6:09 AM, Kevin Day wrote:
>>
>> Those sound useful. Just out of curiosity, however, since we're on the topic of kernel dumps: Has anyone even looked into the notion of an emergency fall-back network stack to enable remote kernel panic (or system hang) debugging, the way OS X lets you do? I can't tell you the number of times I've NMI'd a Mac and connected to it remotely in a scenario where everything was totally wedged and just a couple of minutes in kgdb (or now lldb) quickly showed that everything was waiting on a specific lock and the problem became manifestly clear.
>>
>> The feature also lets you scrape a panic'd machine with automation, running some kgdb scripts against it to glean useful information for later analysis vs having to have someone schlep the dump image manually to triage. It's going to be damn hard to live without this now, and if someone else isn't working on it, that's good to know too!

I could imagine that we could stash away a vimage stack just for this purpose. You'd set it up on boot and leave it detached until you need it. You'd just need to switch the interfaces over to the new stack on panic and put them into 'poll' mode. Or maybe you'd need more (like pre-allocating mbufs for it to use). Just an idea.

>
> At a previous employer, we had a system where on a panic it had a totally separate stack capable of just IP/UDP/TFTP and would save its core via TFTP to a server. This isn't as nice as full remote debugging, but it was a whole lot easier to develop. The caveats I remember were:
>
> 1) We didn't want to implement ARP, so you had to write the mac address of the dump server to the kernel via sysctl before crashing.
> 2) We also didn't want to have to deal with routing tables, so you had to manually specify what interface to blast packets out to, also via sysctl.
> 3) After a panic we didn't want to rely on interrupt processing working, so it polled the network interface and blocked whenever it needed to.
Since this was an embedded system, it wasn't too big of a deal - only one network driver had to be hacked to support this. Basically a flag that would switch to disable normal processing, switch to polled fifos for input and output until reboot.
> 4) The whole system used only preallocated buffers and its own stack (carved out from memory on boot) so even if the kernel's malloc was trashed, we could still dump.
>
> I'm not sure this really would scratch your itch, but I believe this took me no more than a day or two to implement. Parts #1 and #2 would be pretty easy, but I'm not sure how generic the kernel could support an emergency network mode that doesn't require interrupts for every network card out there. Maybe that isn't as important to you as it was to us.
>
> The whole exercise is much easier if you don't use TFTP but a custom protocol that doesn't require the crashing system to receive any packets, if it can just blast away at some random host oblivious if it's working or not, it's a lot less code to write.
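As a rough illustration of the send-only idea in the last quoted paragraph, the crashing side could tag each chunk of memory with a sequence number and offset and write it into a preallocated buffer, never waiting for a reply. The header layout, DUMP_MAGIC and dump_build_chunk() are invented for this sketch and are not taken from the system Kevin describes:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DUMP_MAGIC	0x44554d50u	/* "DUMP"; hypothetical value */
#define DUMP_HDR_LEN	20u		/* magic + seq + offset + length */

/* Store a 32-bit value in network (big-endian) byte order. */
static void
put_be32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)(v >> 24);
	p[1] = (uint8_t)(v >> 16);
	p[2] = (uint8_t)(v >> 8);
	p[3] = (uint8_t)v;
}

/* Store a 64-bit value in network (big-endian) byte order. */
static void
put_be64(uint8_t *p, uint64_t v)
{
	put_be32(p, (uint32_t)(v >> 32));
	put_be32(p + 4, (uint32_t)v);
}

/*
 * Fill a caller-supplied (preallocated) buffer with one dump chunk.
 * Returns the payload size in bytes, or 0 if the buffer is too small.
 * No allocation and no receive path: the caller just transmits the
 * result and moves on to the next chunk.
 */
static size_t
dump_build_chunk(uint8_t *pkt, size_t pktsz, uint32_t seq,
    uint64_t offset, const uint8_t *mem, uint32_t len)
{
	assert(pkt != NULL && mem != NULL);
	if (pktsz < DUMP_HDR_LEN || len > pktsz - DUMP_HDR_LEN)
		return (0);
	put_be32(pkt, DUMP_MAGIC);
	put_be32(pkt + 4, seq);
	put_be64(pkt + 8, offset);
	put_be32(pkt + 16, len);
	memcpy(pkt + DUMP_HDR_LEN, mem, len);
	return (DUMP_HDR_LEN + len);
}
```

Since nothing is acknowledged, the receiver would have to use seq and offset to reassemble the image and to notice any chunks that went missing.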
> > > _______________________________________________ > freebsd-hackers@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-hackers > To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org" > > From owner-freebsd-hackers@FreeBSD.ORG Thu Jul 11 14:43:35 2013 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 9E85966A for ; Thu, 11 Jul 2013 14:43:35 +0000 (UTC) (envelope-from linnemannr@gmail.com) Received: from mail-oa0-x230.google.com (mail-oa0-x230.google.com [IPv6:2607:f8b0:4003:c02::230]) by mx1.freebsd.org (Postfix) with ESMTP id 6D5FF112C for ; Thu, 11 Jul 2013 14:43:35 +0000 (UTC) Received: by mail-oa0-f48.google.com with SMTP id f4so11430491oah.35 for ; Thu, 11 Jul 2013 07:43:35 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:date:message-id:subject:from:to:content-type; bh=15VGR0WadWBlBafFi6XWdi5Y4yixGpLKv/IBYDN997M=; b=Jb6fJu/yQruEohz2ZxG/PvJybcXx9yexV8+jHvGSorPsu/2569ni4kJJiB0wjMTsi3 4r5vfm8dtptCfezXd5yigsCyj8wFAXlCcBcmp1nRycwdzMbIN0+R3YmPtFmX/EkYGF2H n/FshoDO8slSrovyVrjDa5C2vH3+SSxb8IaKb4+bquNSfom9K6DVmgwIvf4QToO8/WVZ ZnkqU6imfptMkZREX4nLwaoo6iIH3BmVjz43SoKPjpK5QD5lAXeYXCMp4V8ioUO4KO92 /wiVdY0Lyzrl/Giv8fYsyztvw+KkiuK9lZ5S5EON3YVdTa8629IFGL5a5WBw673DRi/g QVZw== MIME-Version: 1.0 X-Received: by 10.182.232.225 with SMTP id tr1mr32127178obc.69.1373553815068; Thu, 11 Jul 2013 07:43:35 -0700 (PDT) Received: by 10.182.122.97 with HTTP; Thu, 11 Jul 2013 07:43:34 -0700 (PDT) Date: Thu, 11 Jul 2013 09:43:34 -0500 Message-ID: Subject: Attempting to roll back zfs transactions on a disk to recover a destroyed ZFS filesystem From: Reid Linnemann To: freebsd-hackers@freebsd.org Content-Type: multipart/mixed; boundary=001a11c312423a26da04e13d6b2b X-Content-Filtered-By: Mailman/MimeDel 2.1.14 X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 
Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 11 Jul 2013 14:43:35 -0000 --001a11c312423a26da04e13d6b2b Content-Type: text/plain; charset=ISO-8859-1 So recently I was trying to transfer a root-on-ZFS zpool from one pair of disks to a single, larger disk. As I am wont to do, I botched the transfer and decided to destroy the ZFS filesystems on the destination and start again. Naturally I was up late working on this, being sloppy and drowsy without any coffee, and lo and behold I issued my 'zfs destroy -R' and immediately realized after pressing [ENTER] that I had given it the source's zpool name. Oops. Fortunately I was able to interrupt the procedure with only /usr being destroyed from the pool, and I was able to send/receive the truly vital data in my /var partition to the new disk and re-deploy the base system to /usr on the new disk. The only thing I'm really missing at this point is all of the third-party software configuration I had in /usr/local/etc and my Apache data in /usr/local/www. After a few minutes on Google I came across this wonderful page: http://www.solarisinternals.com/wiki/index.php/ZFS_forensics_scrollback_script where the author has published information about his Python script, which locates the uberblocks on the raw disk, shows the user the most recent transaction IDs, prompts the user for a transaction ID to roll back to, and zeroes out all uberblocks beyond that point. Theoretically, I should be able to use this script to get back to the transaction prior to my dreaded 'zfs destroy -R', then be able to recover the data I need (since no further writes have been done to the source disks). First, I know there's a problem in the script on FreeBSD in which the grep pattern for the od output expects a single space between the output elements. 
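The spacing problem can be illustrated with a small, whitespace-tolerant match, sketched here in Python. This is not the attached patch, and the helper name is hypothetical. The sample od lines in the usage note are made up for illustration. The magic value itself is the ZFS uberblock magic 0x00bab10c, which od -x prints as the two 16-bit words "b10c 00ba" on a little-endian machine.

```python
import re

# Match the uberblock magic as two od -x words separated by one or more
# whitespace characters, instead of exactly one space.
UBERBLOCK_MAGIC = re.compile(r"\bb10c\s+00ba\b")


def uberblock_lines(od_output):
    """Return the od output lines that contain the uberblock magic."""
    return [line for line in od_output.splitlines()
            if UBERBLOCK_MAGIC.search(line)]
```

For example, both "0020000 b10c 00ba 0000 0000" and "0020000    b10c    00ba    0000    0000" are selected, while a line without the magic is skipped.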
I've attached a patch that allows the output to be properly grepped in FreeBSD, so we can actually get to the transaction log. But now we come to my current problem. When attempting to roll back with this script, it tries to dd zeroed bytes to offsets into the disk device (/dev/ada1p3 in my case) where the uberblocks are located. But even with kern.geom.debugflags set to 0x10 (and I am running this as root) I get 'Operation not permitted' when the script tries to zero out the unwanted transactions. I'm fairly certain this is because the geom is in use by the ZFS subsystem, as it is still recognized as a part of the original pool. I'm hesitant to zpool export the pool, as I don't know if that wipes the transaction history on the pool. Does anyone have any ideas? Thanks, -Reid --001a11c312423a26da04e13d6b2b Content-Type: application/octet-stream; name="zfs_revert-0.1.py.patch" Content-Disposition: attachment; filename="zfs_revert-0.1.py.patch" Content-Transfer-Encoding: base64 X-Attachment-Id: f_hj01fgkc0 LS0tIHpmc19yZXZlcnQtMC4xLnB5Lm9yaWcJMjAxMy0wNy0xMSAwOToxMTozOS41NTAyODQ5OTUg LTA1MDAKKysrIHpmc19yZXZlcnQtMC4xLnB5CTIwMTMtMDctMTEgMDk6MTI6MDkuMzA3Mjc2MTYx IC0wNTAwCkBAIC02NSw4ICs2NSw4IEBACiBwcmludCAnUmVhZGluZyBmcm9tICVzIGJsb2NrcyB0 byB0aGUgZW5kJyVsX3NraXAKIAogI2dldCB0aGUgdWJlcmJsb2NrcyBmcm9tIHRoZSBiZWdpbm5p bmcgYW5kIGVuZAoteWJlcmJsb2Nrc19hPWZvcm1hdHN0ZChzdWJwcm9jZXNzLlBvcGVuKCdzeW5j ICYmIGRkIGJzPSVzIGlmPSVzIGNvdW50PSVzIHwgb2QgLUEgeCAteCB8ICVzIC1BIDIgImIxMGMg MDBiYSIgfCAlcyAtdiAiXC1cLSInJShicyxmaWxlLCBhX2NvdW50LGdyZXBfY21kLGdyZXBfY21k KSwgc2hlbGw9VHJ1ZSwgc3Rkb3V0PXN1YnByb2Nlc3MuUElQRSkuY29tbXVuaWNhdGUoKVswXSkK LXliZXJibG9ja3NfbD1mb3JtYXRzdGQoc3VicHJvY2Vzcy5Qb3Blbignc3luYyAmJiBkZCBicz0l cyBpZj0lcyBza2lwPSVzIHwgb2QgLUEgeCAteCB8ICVzIC1BIDIgImIxMGMgMDBiYSIgfCAlcyAt diAiXC1cLSInJShicyxmaWxlLCBsX3NraXAsZ3JlcF9jbWQsZ3JlcF9jbWQpLCBzaGVsbD1UcnVl LCBzdGRvdXQ9c3VicHJvY2Vzcy5QSVBFKS5jb21tdW5pY2F0ZSgpWzBdKQoreWJlcmJsb2Nrc19h 
PWZvcm1hdHN0ZChzdWJwcm9jZXNzLlBvcGVuKCdzeW5jICYmIGRkIGJzPSVzIGlmPSVzIGNvdW50 PSVzIHwgb2QgLUEgeCAteCB8ICVzIC1BIDIgImIxMGMgXCswMGJhIiB8ICVzIC12ICJcLVwtIicl KGJzLGZpbGUsIGFfY291bnQsZ3JlcF9jbWQsZ3JlcF9jbWQpLCBzaGVsbD1UcnVlLCBzdGRvdXQ9 c3VicHJvY2Vzcy5QSVBFKS5jb21tdW5pY2F0ZSgpWzBdKQoreWJlcmJsb2Nrc19sPWZvcm1hdHN0 ZChzdWJwcm9jZXNzLlBvcGVuKCdzeW5jICYmIGRkIGJzPSVzIGlmPSVzIHNraXA9JXMgfCBvZCAt QSB4IC14IHwgJXMgLUEgMiAiYjEwY1wrIDAwYmEiIHwgJXMgLXYgIlwtXC0iJyUoYnMsZmlsZSwg bF9za2lwLGdyZXBfY21kLGdyZXBfY21kKSwgc2hlbGw9VHJ1ZSwgc3Rkb3V0PXN1YnByb2Nlc3Mu UElQRSkuY29tbXVuaWNhdGUoKVswXSkKIAogCiB5YmVyYmxvY2tzPVtdCg== --001a11c312423a26da04e13d6b2b-- From owner-freebsd-hackers@FreeBSD.ORG Thu Jul 11 15:04:34 2013 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 3995AFB4 for ; Thu, 11 Jul 2013 15:04:34 +0000 (UTC) (envelope-from asomers@gmail.com) Received: from mail-qa0-x22b.google.com (mail-qa0-x22b.google.com [IPv6:2607:f8b0:400d:c00::22b]) by mx1.freebsd.org (Postfix) with ESMTP id 006A212A5 for ; Thu, 11 Jul 2013 15:04:33 +0000 (UTC) Received: by mail-qa0-f43.google.com with SMTP id d13so7708863qak.16 for ; Thu, 11 Jul 2013 08:04:33 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type; bh=QJ+Frq+C4KnGWwyrIqZWXklnzkAnbcnMhnaeTF/ju2M=; b=pUc2zpSjIiRjHf7SE/NdxfPPOPEOTwKUbyYuc6E1F3XBWpedeSgB4NqyVdQPae0+eU Jso70TWSk8Fhui3wr0osTZJ7W4bLLgx90QCQDZvBznx8usv08c/N9U2YS0fj36N9B2yf enT5TnQlkJSMY9+ln7OEGjPYm2SfWhH7TVYw2S0q38uefqzyC/XFKat4sjO4cFfV+U62 IZHq2GjFvwE6zi7jRv3Z/iwXn3uK8YB/8Zx78YedFs2pc4/puwkiRcgs1uAUd974nUaO OkgfT28n1K7f6P4DzQfCR6bdj09dRcsxiotFH2kM0elfFGY82lkSN9w/8IhstEW8rqmY SW6g== MIME-Version: 1.0 X-Received: by 10.49.6.6 with SMTP id w6mr30070293qew.44.1373555073539; Thu, 11 Jul 2013 08:04:33 -0700 (PDT) 
Sender: asomers@gmail.com Received: by 10.49.37.226 with HTTP; Thu, 11 Jul 2013 08:04:33 -0700 (PDT) In-Reply-To: References: Date: Thu, 11 Jul 2013 09:04:33 -0600 X-Google-Sender-Auth: 6k1dPWUEMptEa_mhQMLlP2jKH1k Message-ID: Subject: Re: Attempting to roll back zfs transactions on a disk to recover a destroyed ZFS filesystem From: Alan Somers To: Reid Linnemann Content-Type: text/plain; charset=ISO-8859-1 Cc: freebsd-hackers@freebsd.org X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 11 Jul 2013 15:04:34 -0000 "zpool export" does not wipe the transaction history. It does, however, write new labels and some metadata, so there is a very slight chance that it might overwrite some of the blocks that you're trying to recover. But it's probably safe. An alternative, much more complicated, solution would be to have ZFS open the device non-exclusively. This patch will do that. Caveat programmer: I haven't tested this patch in isolation. Change 624068 by willa@willa_SpectraBSD on 2012/08/09 09:28:38 Allow multiple opens of geoms used by vdev_geom. Also ignore the pool guid for spares when checking to decide whether it's ok to attach a vdev. This enables using hotspares to replace other devices, as well as using a given hotspare in multiple pools. We need to investigate alternative solutions in order to allow opening the geoms exclusive. Affected files ... ... //SpectraBSD/stable/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c#2 edit Differences ... 
==== //SpectraBSD/stable/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c#2 (text) ==== @@ -179,49 +179,23 @@ gp = g_new_geomf(&zfs_vdev_class, "zfs::vdev"); gp->orphan = vdev_geom_orphan; gp->attrchanged = vdev_geom_attrchanged; - cp = g_new_consumer(gp); - error = g_attach(cp, pp); - if (error != 0) { - printf("%s(%d): g_attach failed: %d\n", __func__, - __LINE__, error); - g_wither_geom(gp, ENXIO); - return (NULL); - } - error = g_access(cp, 1, 0, 1); - if (error != 0) { - printf("%s(%d): g_access failed: %d\n", __func__, - __LINE__, error); - g_wither_geom(gp, ENXIO); - return (NULL); - } - ZFS_LOG(1, "Created geom and consumer for %s.", pp->name); - } else { - /* Check if we are already connected to this provider. */ - LIST_FOREACH(cp, &gp->consumer, consumer) { - if (cp->provider == pp) { - ZFS_LOG(1, "Provider %s already in use by ZFS. " - "Failing attach.", pp->name); - return (NULL); - } - } - cp = g_new_consumer(gp); - error = g_attach(cp, pp); - if (error != 0) { - printf("%s(%d): g_attach failed: %d\n", - __func__, __LINE__, error); - g_destroy_consumer(cp); - return (NULL); - } - error = g_access(cp, 1, 0, 1); - if (error != 0) { - printf("%s(%d): g_access failed: %d\n", - __func__, __LINE__, error); - g_detach(cp); - g_destroy_consumer(cp); - return (NULL); - } - ZFS_LOG(1, "Created consumer for %s.", pp->name); + } + cp = g_new_consumer(gp); + error = g_attach(cp, pp); + if (error != 0) { + printf("%s(%d): g_attach failed: %d\n", __func__, + __LINE__, error); + g_wither_geom(gp, ENXIO); + return (NULL); + } + error = g_access(cp, /*r*/1, /*w*/0, /*e*/0); + if (error != 0) { + printf("%s(%d): g_access failed: %d\n", __func__, + __LINE__, error); + g_wither_geom(gp, ENXIO); + return (NULL); } + ZFS_LOG(1, "Created consumer for %s.", pp->name); cp->private = vd; vd->vdev_tsd = cp; @@ -251,7 +225,7 @@ cp->private = NULL; gp = cp->geom; - g_access(cp, -1, 0, -1); + g_access(cp, -1, 0, 0); /* Destroy consumer on last close. 
*/ if (cp->acr == 0 && cp->ace == 0) { ZFS_LOG(1, "Destroyed consumer to %s.", cp->provider->name); @@ -384,6 +358,18 @@ cp->provider->name); } +static inline boolean_t +vdev_attach_ok(vdev_t *vd, uint64_t pool_guid, uint64_t vdev_guid) +{ + boolean_t pool_ok; + boolean_t vdev_ok; + + /* Spares can be assigned to multiple pools. */ + pool_ok = vd->vdev_isspare || pool_guid == spa_guid(vd->vdev_spa); + vdev_ok = vdev_guid == vd->vdev_guid; + return (pool_ok && vdev_ok); +} + static struct g_consumer * vdev_geom_attach_by_guids(vdev_t *vd) { @@ -420,8 +406,7 @@ g_topology_lock(); g_access(zcp, -1, 0, 0); g_detach(zcp); - if (pguid != spa_guid(vd->vdev_spa) || - vguid != vd->vdev_guid) + if (!vdev_attach_ok(vd, pguid, vguid)) continue; cp = vdev_geom_attach(pp, vd); if (cp == NULL) { @@ -498,8 +483,10 @@ g_topology_unlock(); vdev_geom_read_guids(cp, &pguid, &vguid); g_topology_lock(); - if (pguid != spa_guid(vd->vdev_spa) || - vguid != vd->vdev_guid) { + if (vdev_attach_ok(vd, pguid, vguid)) { + ZFS_LOG(1, "guids match for provider %s.", + vd->vdev_path); + } else { vdev_geom_close_locked(vd); cp = NULL; ZFS_LOG(1, "guid mismatch for provider %s: " @@ -507,9 +494,6 @@ (uintmax_t)spa_guid(vd->vdev_spa), (uintmax_t)vd->vdev_guid, (uintmax_t)pguid, (uintmax_t)vguid); - } else { - ZFS_LOG(1, "guids match for provider %s.", - vd->vdev_path); } } } @@ -601,8 +585,8 @@ g_topology_lock(); } if (error != 0) { - printf("ZFS WARNING: Unable to open %s for writing (error=%d).\n", - vd->vdev_path, error); + printf("ZFS WARNING: Error %d opening %s for write.\n", + error, vd->vdev_path); vdev_geom_close_locked(vd); cp = NULL; On Thu, Jul 11, 2013 at 8:43 AM, Reid Linnemann wrote: > So recently I was trying to transfer a root-on-ZFS zpool from one pair of > disks to a single, larger disk. As I am wont to do, I botched the transfer > up and decided to destroy the ZFS filesystems on the destination and start > again. 
Naturally I was up late working on this, being sloppy and drowsy > without any coffee, and lo and behold I issued my 'zfs destroy -R' and > immediately realized after pressing [ENTER] that I had given it the > source's zpool name. Oops. Fortunately I was able to interrupt the > procedure with only /usr being destroyed from the pool, and I was able to > send/receive the truly vital data in my /var partition to the new disk and > re-deploy the base system to /usr on the new disk. The only thing I'm > really missing at this point is all of the third-party software > configuration I had in /usr/local/etc and my Apache data in /usr/local/www. > > After a few minutes on Google I came across this wonderful page: > > http://www.solarisinternals.com/wiki/index.php/ZFS_forensics_scrollback_script > > where the author has published information about his Python script, which > locates the uberblocks on the raw disk, shows the user the most recent > transaction IDs, prompts the user for a transaction ID to roll back to, and > zeroes out all uberblocks beyond that point. Theoretically, I should be > able to use this script to get back to the transaction prior to my dreaded > 'zfs destroy -R', then be able to recover the data I need (since no further > writes have been done to the source disks). > > First, I know there's a problem in the script on FreeBSD in which the grep > pattern for the od output expects a single space between the output > elements. I've attached a patch that allows the output to be properly > grepped in FreeBSD, so we can actually get to the transaction log. > > But now we come to my current problem. When attempting to roll back with > this script, it tries to dd zeroed bytes to offsets into the disk device > (/dev/ada1p3 in my case) where the uberblocks are located. But even > with kern.geom.debugflags set to 0x10 (and I am running this as root) > I get 'Operation not permitted' > when the script tries to zero out the unwanted transactions. 
I'm fairly > certain this is because the geom is in use by the ZFS subsystem, as it is > still recognized as a part of the original pool. I'm hesitant to zfs export > the pool, as I don't know if that wipes the transaction history on the > pool. Does anyone have any ideas? > > Thanks, > -Reid > > _______________________________________________ > freebsd-hackers@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-hackers > To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org" From owner-freebsd-hackers@FreeBSD.ORG Thu Jul 11 15:59:06 2013 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id B9DB4258 for ; Thu, 11 Jul 2013 15:59:06 +0000 (UTC) (envelope-from will@firepipe.net) Received: from mail-vc0-f175.google.com (mail-vc0-f175.google.com [209.85.220.175]) by mx1.freebsd.org (Postfix) with ESMTP id 7F4B71741 for ; Thu, 11 Jul 2013 15:59:06 +0000 (UTC) Received: by mail-vc0-f175.google.com with SMTP id hr11so6773769vcb.20 for ; Thu, 11 Jul 2013 08:59:05 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :content-type:x-gm-message-state; bh=SqVPF4dc5Jwr7mo0azieQVCu5ZhQ/ZWhqLBq0yZ0dV0=; b=NzfAHReMie4pZVu5oXsJYWATM/anPcENPup0qf6a9bGA58SM5dgHVd9ciSxQFiJXAi /Cw4bS8pYcpTXwlex3N/HDRYtyZxKdVxZ6GDj1nTZtCM03XCLBZ2nl70fgD48mYS+txO PG7BY106MihbxQzFcPzS+rQK1sWEDsRggAHlaSS95vYDbrS3YRttiSf0YJf9qjwO8aWM e6woiiAtKPisQjgr3aVK22Dp7TKD5qZfOwMSVOOlu03TmRH/NxpwQY37N81wROsxIpTu 4X3AkvPTRVLE9ZlvqDRxLhL2s1ftYHmjyjDj2v5oTdj2bZZX1WT5bTUCiOJ/6G8X+ttk k/NQ== MIME-Version: 1.0 X-Received: by 10.221.64.18 with SMTP id xg18mr22250420vcb.57.1373558345703; Thu, 11 Jul 2013 08:59:05 -0700 (PDT) Received: by 10.58.226.66 with HTTP; Thu, 11 Jul 2013 08:59:05 -0700 (PDT) In-Reply-To: References: Date: Thu, 11 Jul 2013 09:59:05 -0600 
Message-ID: Subject: Re: Attempting to roll back zfs transactions on a disk to recover a destroyed ZFS filesystem From: Will Andrews To: freebsd-hackers@freebsd.org Content-Type: text/plain; charset=UTF-8 X-Gm-Message-State: ALoCoQnLLlZx8sC6MACTxN9OLfFah1rt9UpvAKkWEGyGd/7HAAilf7QR83A6zZENCQDPbYY6pYKF X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 11 Jul 2013 15:59:06 -0000 On Thu, Jul 11, 2013 at 9:04 AM, Alan Somers wrote: > "zpool export" does not wipe the transaction history. It does, > however, write new labels and some metadata, so there is a very slight > chance that it might overwrite some of the blocks that you're trying > to recover. But it's probably safe. An alternative, much more > complicated, solution would be to have ZFS open the device > non-exclusively. This patch will do that. Caveat programmer: I > haven't tested this patch in isolation. This change is quite a bit more than necessary, and probably wouldn't apply to FreeBSD given the other changes in the code. Really, to get non-exclusive opens you just have to change the g_access() calls in vdev_geom.c so that the third count argument (the exclusive count) is always 0. However, see below. > On Thu, Jul 11, 2013 at 8:43 AM, Reid Linnemann wrote: >> But now we are to my current problem. When attempting to roll back with >> this script, it tries to dd zero'd bytes to offsets into the disk device >> (/dev/ada1p3 in my case) where the uberblocks are located. But even >> with kern.geom.debugflags >> set to 0x10 (and I am running this as root) I get 'Operation not permitted' >> when the script tries to zero out the unwanted transactions. I'm fairly >> certain this is because the geom is in use by the ZFS subsystem, as it is >> still recognized as a part of the original pool. 
I'm hesitant to zfs export >> the pool, as I don't know if that wipes the transaction history on the >> pool. Does anyone have any ideas? You do not have a choice. Changing the on-disk state does not mean the in-core state will update to match, and the pool could get into a really bad state if you try to modify the transactions on disk while it's online, since it may write additional transactions (which rely on state you're about to destroy), before you export. Also, rolling back transactions in this manner assumes that the original blocks (that were COW'd) are still in their original state. If you're using TRIM or have a pretty full pool, the odds are not in your favor. It's a roll of the dice, in any case. --Will. From owner-freebsd-hackers@FreeBSD.ORG Thu Jul 11 16:05:37 2013 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id ABFD65BA for ; Thu, 11 Jul 2013 16:05:37 +0000 (UTC) (envelope-from linnemannr@gmail.com) Received: from mail-ob0-x22f.google.com (mail-ob0-x22f.google.com [IPv6:2607:f8b0:4003:c01::22f]) by mx1.freebsd.org (Postfix) with ESMTP id 77C2F1784 for ; Thu, 11 Jul 2013 16:05:37 +0000 (UTC) Received: by mail-ob0-f175.google.com with SMTP id xn12so10002822obc.20 for ; Thu, 11 Jul 2013 09:05:37 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=Y6AEZ09ejIzhT0gpGN483fXjXQKoEkJ45Pz5TLPS7yg=; b=wZdBbVs93VgDqlRvNGxLpJQ1a6fOtiuFr2sTr53M5Jrv7/coj8tp2kSJS9bLdzdwne 72AXs729hDlFxNP8wLXVKH0HvoLaQBCk/so68kuPgLNNC0j8T7DaRfO/WCPkK9OyAwGL K8txmLgrAM52vZXfdNweb7o3loD8OhxBCN9Ht8w2zXAOHbhsrs3kxGqRH82Nf65/0ODF aYcZCmUlZUkT4JoIBLWg+1XrcnPc3n68so/KEg9oegERh9AhHau/k51b6kiYpUMwr6RY LsG30GfQQKCwwfASgH3r4cPkcBJTSCVaj37zr9cz+F00U7K0izjOVm0c1EpRSetRNsWF KlLA== MIME-Version: 1.0 X-Received: by 10.60.103.211 with SMTP id 
fy19mr32069300oeb.103.1373558737106; Thu, 11 Jul 2013 09:05:37 -0700 (PDT) Received: by 10.182.122.97 with HTTP; Thu, 11 Jul 2013 09:05:37 -0700 (PDT) In-Reply-To: References: Date: Thu, 11 Jul 2013 11:05:37 -0500 Message-ID: Subject: Re: Attempting to roll back zfs transactions on a disk to recover a destroyed ZFS filesystem From: Reid Linnemann To: Will Andrews Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.14 Cc: freebsd-hackers@freebsd.org X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 11 Jul 2013 16:05:37 -0000 Will, Thanks, that makes sense. I know this is all a crap shoot, but I've really got nothing to lose at this point, so this is just a good opportunity to rummage around the internals of ZFS and learn a few things. I might even get lucky and recover some data! On Thu, Jul 11, 2013 at 10:59 AM, Will Andrews wrote: > On Thu, Jul 11, 2013 at 9:04 AM, Alan Somers wrote: > > "zpool export" does not wipe the transaction history. It does, > > however, write new labels and some metadata, so there is a very slight > > chance that it might overwrite some of the blocks that you're trying > > to recover. But it's probably safe. An alternative, much more > > complicated, solution would be to have ZFS open the device > > non-exclusively. This patch will do that. Caveat programmer: I > > haven't tested this patch in isolation. > > This change is quite a bit more than necessary, and probably wouldn't > apply to FreeBSD given the other changes in the code. Really, to make > non-exclusive opens you just have to change the g_access() calls in > vdev_geom.c so the third argument is always 0. > > However, see below. > > > On Thu, Jul 11, 2013 at 8:43 AM, Reid Linnemann > wrote: > >> But now we are to my current problem. 
When attempting to roll back with > >> this script, it tries to dd zero'd bytes to offsets into the disk device > >> (/dev/ada1p3 in my case) where the uberblocks are located. But even > >> with kern.geom.debugflags > >> set to 0x10 (and I am running this as root) I get 'Operation not permitted' > >> when the script tries to zero out the unwanted transactions. I'm fairly > >> certain this is because the geom is in use by the ZFS subsystem, as it is > >> still recognized as a part of the original pool. I'm hesitant to zpool export > >> the pool, as I don't know if that wipes the transaction history on the > >> pool. Does anyone have any ideas? > > You do not have a choice. Changing the on-disk state does not mean > the in-core state will update to match, and the pool could get into a > really bad state if you try to modify the transactions on disk while > it's online, since it may write additional transactions (which rely on > state you're about to destroy), before you export. > > Also, rolling back transactions in this manner assumes that the > original blocks (that were COW'd) are still in their original state. > If you're using TRIM or have a pretty full pool, the odds are not in > your favor. It's a roll of the dice, in any case. > > --Will. 
From owner-freebsd-hackers@FreeBSD.ORG Thu Jul 11 17:41:26 2013 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 6A555872 for ; Thu, 11 Jul 2013 17:41:26 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from bigwig.baldwin.cx (bigwig.baldwin.cx [IPv6:2001:470:1f11:75::1]) by mx1.freebsd.org (Postfix) with ESMTP id 420D81C2D for ; Thu, 11 Jul 2013 17:41:26 +0000 (UTC) Received: from jhbbsd.localnet (unknown [209.249.190.124]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id 8B569B999; Thu, 11 Jul 2013 13:41:25 -0400 (EDT) From: John Baldwin To: freebsd-hackers@freebsd.org Subject: Re: Intel D2500CC serial ports Date: Thu, 11 Jul 2013 10:14:42 -0400 User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110714-p25; KDE/4.5.5; amd64; ; ) References: In-Reply-To: MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Message-Id: <201307111014.42903.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7 (bigwig.baldwin.cx); Thu, 11 Jul 2013 13:41:25 -0400 (EDT) Cc: Robert Ames X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 11 Jul 2013 17:41:26 -0000 On Sunday, June 30, 2013 1:24:27 pm Robert Ames wrote: > I just picked up an Intel D2500CCE motherboard and was disappointed > to find the serial ports didn't work. 
There has been discussion > about this problem here: > > http://lists.freebsd.org/pipermail/freebsd-current/2013-April/040897.html > http://lists.freebsd.org/pipermail/freebsd-current/2013-May/042088.html > > As seen in the second link, Juergen Weiss was able to work around > the problem. This patch (for 8.4-RELEASE amd64) makes all 4 serial > ports functional. > > --- /usr/src/sys/amd64/amd64/io_apic.c.orig 2013-06-02 13:23:05.000000000 -0500 > +++ /usr/src/sys/amd64/amd64/io_apic.c 2013-06-28 18:52:03.000000000 -0500 > @@ -452,6 +452,10 @@ > KASSERT(!(trig == INTR_TRIGGER_CONFORM || pol == INTR_POLARITY_CONFORM), > ("%s: Conforming trigger or polarity\n", __func__)); > > + if (trig == INTR_TRIGGER_EDGE && pol == INTR_POLARITY_LOW) { > + pol = INTR_POLARITY_HIGH; > + } > + Hmm, so this is your BIOS doing the wrong thing in its ASL. Maybe try this: --- //depot/user/jhb/acpipci/dev/acpica/acpi_resource.c 2011-07-22 17:59:31.000000000 0000 +++ /home/jhb/work/p4/acpipci/dev/acpica/acpi_resource.c 2011-07-22 17:59:31.000000000 0000 @@ -141,6 +141,10 @@ default: panic("%s: bad resource type %u", __func__, res->Type); } +#if defined(__amd64__) || defined(__i386__) + if (irq < 16 && trig == ACPI_EDGE_SENSITIVE && pol == ACPI_ACTIVE_LOW) + pol = ACPI_ACTIVE_HIGH; +#endif BUS_CONFIG_INTR(dev, irq, (trig == ACPI_EDGE_SENSITIVE) ? INTR_TRIGGER_EDGE : INTR_TRIGGER_LEVEL, (pol == ACPI_ACTIVE_HIGH) ? 
INTR_POLARITY_HIGH : INTR_POLARITY_LOW); -- John Baldwin From owner-freebsd-hackers@FreeBSD.ORG Thu Jul 11 17:41:29 2013 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 0693C873 for ; Thu, 11 Jul 2013 17:41:29 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from bigwig.baldwin.cx (bigwig.baldwin.cx [IPv6:2001:470:1f11:75::1]) by mx1.freebsd.org (Postfix) with ESMTP id D6F081C2E for ; Thu, 11 Jul 2013 17:41:28 +0000 (UTC) Received: from jhbbsd.localnet (unknown [209.249.190.124]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id E99EFB953; Thu, 11 Jul 2013 13:41:27 -0400 (EDT) From: John Baldwin To: freebsd-hackers@freebsd.org, mangesh chitnis Subject: Re: memmap in FreeBSD Date: Thu, 11 Jul 2013 10:23:12 -0400 User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110714-p25; KDE/4.5.5; amd64; ; ) References: <1373197303.40304.YahooMailNeo@web160703.mail.bf1.yahoo.com> In-Reply-To: <1373197303.40304.YahooMailNeo@web160703.mail.bf1.yahoo.com> MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Message-Id: <201307111023.12908.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7 (bigwig.baldwin.cx); Thu, 11 Jul 2013 13:41:28 -0400 (EDT) X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 11 Jul 2013 17:41:29 -0000 On Sunday, July 07, 2013 7:41:43 am mangesh chitnis wrote: > Hi, > > What is the FreeBSD equivalent of Linux's memmap? > > In Linux, memmap is used to reserve a portion of physical memory. It is passed as a kernel boot argument. E.g.: memmap=1G$2G (size$start) will reserve 1GB of memory starting at 2GB, in case I have 3GB of RAM. 
This 1GB of reserved memory is not visible to the OS; however, it can be used via ioremap. > How can I reserve memory in FreeBSD and later use it, i.e. what are the memmap and ioremap equivalents? > > I have tried using the hw.physmem loader parameter. > I have 3 GB of system memory and I have set hw.physmem=2G. > > > sysctl -a shows: > hw.physmem: 2.12G Note that 'hw.physmem=2G' is using power of 2 units (so 2 * 2^30), not power of 10. > hw.usermem: 1.9G > hw.realmem: 2.15G > > devinfo -rv shows: > ram0: > > 0x00-0x9f3ff > 0x10000000-0xbfedffff > 0xbff00000-0xbfffffff > > Here, it looks like it is showing the full 3 GB mapping. ram0 is reserving address space, so it always claims all of the memory installed. > Now, how do I know which is the 1 GB of available memory? (In Linux, this memory is shown as reserved in /proc/iomem under System RAM.) Also, which function (similar to ioremap) should I call to map the physical address to a virtual address? There is currently no way to see the memory above the cap you set. In the kernel you could perhaps fetch the SMAP metadata and walk the list to see if there is memory above Maxmem (and if so it is presumably available for use). However, to map it you would need to use pmap_*() routines directly. Alternatively, you could abuse OBJT_SG by creating an sglist that describes the unused memory range and then creating an OBJT_SG VM object backed by that sglist. You could then insert that VM object into the kernel's address space to map it into the kernel, or even make it available to userland via d_mmap_single(), or direct manipulation of a process' address space via an ioctl, etc. 
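The units remark earlier in this reply ('hw.physmem=2G' means 2 * 2^30 bytes) can be checked with a little arithmetic, sketched here in Python. The helper name and the set of suffixes handled (K, M, G) are assumptions made for this illustration; this is not the loader's actual parser, and whether the difference in units fully accounts for the sysctl figures quoted above is not established in the thread.

```python
# Power-of-2 multipliers for size suffixes (assumed here: K, M and G only).
SUFFIXES = {"K": 2 ** 10, "M": 2 ** 20, "G": 2 ** 30}


def size_to_bytes(value):
    """Convert a string such as '2G' to bytes using power-of-2 suffixes."""
    suffix = value[-1].upper()
    if suffix not in SUFFIXES:
        return int(value)  # no suffix: treat as a plain byte count
    return int(value[:-1]) * SUFFIXES[suffix]


cap = size_to_bytes("2G")
print(cap)                  # 2147483648, i.e. 2 * 2^30 bytes
print(round(cap / 1e9, 2))  # 2.15 when expressed in units of 10^9 bytes
```

So a cap written as 2G is roughly 7% larger than 2 * 10^9 bytes.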
-- John Baldwin From owner-freebsd-hackers@FreeBSD.ORG Thu Jul 11 17:41:31 2013 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 1FF808C2 for ; Thu, 11 Jul 2013 17:41:31 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from bigwig.baldwin.cx (bigwig.baldwin.cx [IPv6:2001:470:1f11:75::1]) by mx1.freebsd.org (Postfix) with ESMTP id F24DB1C2F for ; Thu, 11 Jul 2013 17:41:30 +0000 (UTC) Received: from jhbbsd.localnet (unknown [209.249.190.124]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id 4DD7DB949; Thu, 11 Jul 2013 13:41:30 -0400 (EDT) From: John Baldwin To: freebsd-hackers@freebsd.org, Jordan Hubbard Subject: Re: Kernel dumps [was Re: possible changes from Panzura] Date: Thu, 11 Jul 2013 10:28:37 -0400 User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110714-p25; KDE/4.5.5; amd64; ; ) References: <3592BFB7-0663-4381-AFF5-C7DE0AE16858@mail.turbofuzz.com> In-Reply-To: MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Message-Id: <201307111028.37815.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7 (bigwig.baldwin.cx); Thu, 11 Jul 2013 13:41:30 -0400 (EDT) Cc: Will Andrews X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 11 Jul 2013 17:41:31 -0000 > Speaking of Apple solutions, I've recently used Apple's kgdb with the > kernel debug kit & kdp remote debugging, to debug a panic'd OS X host. > It's really quite nice, because the debug kit comes with a ton of > macros, similar to kdb, and you also get the benefit of source > debugging. 
I think FreeBSD would benefit massively from finding some > way to share macros between kdb and kgdb, in addition to having an > "emergency network stack" like you suggest. I have a set of macros I maintain that implement many ddb commands in kgdb including 'sleepchain' and 'lockchain'. http://www.freebsd.org/~jhb/gdb/ -- John Baldwin From owner-freebsd-hackers@FreeBSD.ORG Thu Jul 11 19:52:34 2013 Return-Path: Delivered-To: hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 055E7E4B; Thu, 11 Jul 2013 19:52:34 +0000 (UTC) (envelope-from jordan.hubbard@gmail.com) Received: from mail-pb0-x22e.google.com (mail-pb0-x22e.google.com [IPv6:2607:f8b0:400e:c01::22e]) by mx1.freebsd.org (Postfix) with ESMTP id CCBFA1157; Thu, 11 Jul 2013 19:52:33 +0000 (UTC) Received: by mail-pb0-f46.google.com with SMTP id rq2so8286426pbb.33 for ; Thu, 11 Jul 2013 12:52:33 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=content-type:mime-version:subject:from:in-reply-to:date:cc :message-id:references:to:x-mailer; bh=ILI4KwZBzh+9i4sy36eP3P9yx3d9dUKb4HZg6P7fftY=; b=SlCPbUVT4XPjXG6+80Zq/A5qa4mz/E9BiTdPnbefBfPrMeuSDToyU33+tFb5OI5GfG ZtoX20sJxrRy1bJNL2n1ByeGA11RBOArle6E4nyjGCgAAcj/8rBrVT6fkRHrBRU1hJlD oY24vyTQorekm/Iu8+8yuZEuVR/DGFhv7Re8DtjG5GFras55uKKjJ1TxVlU6SetUEzub paaIjYTS2Bz4RoyYTnKTwqj2FMiwGNMDsHPKTAprjP/olnNLF5JJOFAt5dMQJflSAaS1 ZJ0FtIH08taCporMZwg6Et6k+mRo4suTQmUgIJmLrfTYEuZkdGdAZbH++Yx8MEgj3hMW 7Yfw== X-Received: by 10.68.98.33 with SMTP id ef1mr38452716pbb.59.1373572353545; Thu, 11 Jul 2013 12:52:33 -0700 (PDT) Received: from [10.20.30.70] (75-101-82-48.static.sonic.net. 
[75.101.82.48]) by mx.google.com with ESMTPSA id vi8sm32660302pbc.31.2013.07.11.12.52.31 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Thu, 11 Jul 2013 12:52:32 -0700 (PDT) Mime-Version: 1.0 (Mac OS X Mail 6.6 \(1510\)) Subject: Re: Kernel dumps [was Re: possible changes from Panzura] From: "Jordan K. Hubbard" In-Reply-To: <51DEC0E8.7010305@freebsd.org> Date: Thu, 11 Jul 2013 12:52:29 -0700 Message-Id: References: <9890DFF1-892A-4DCA-9E33-B70681154F43@mail.turbofuzz.com> <4F0DFAB7-D6D5-4068-A543-C9DF885D1A7D@dragondata.com> <51DEC0E8.7010305@freebsd.org> To: Julian Elischer X-Mailer: Apple Mail (2.1510) X-Mailman-Approved-At: Thu, 11 Jul 2013 19:54:49 +0000 Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.14 Cc: hackers@freebsd.org, Kevin Day X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 11 Jul 2013 19:52:34 -0000

On Jul 11, 2013, at 7:27 AM, Julian Elischer wrote:

> I could imagine that we could stash away a vimage stack just for this purpose.
> You'd set it up on boot and leave it detached until you need it.
>
> you just need to switch the interfaces over to the new stack on panic and put them into 'poll' mode.

That sounds like a rather clever solution to this problem (OS X doesn't support vimage, despite repeated attempts on my part to change that).

How much work do you think it would take to bang out a proof of concept? Is anyone up to the challenge? Any incentives I can provide? This would be really useful.
:-) - Jordan From owner-freebsd-hackers@FreeBSD.ORG Thu Jul 11 21:05:55 2013 Return-Path: Delivered-To: hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 11CD778B; Thu, 11 Jul 2013 21:05:55 +0000 (UTC) (envelope-from artemb@gmail.com) Received: from mail-vb0-x233.google.com (mail-vb0-x233.google.com [IPv6:2607:f8b0:400c:c02::233]) by mx1.freebsd.org (Postfix) with ESMTP id B3E021615; Thu, 11 Jul 2013 21:05:54 +0000 (UTC) Received: by mail-vb0-f51.google.com with SMTP id x17so972156vbf.10 for ; Thu, 11 Jul 2013 14:05:54 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type; bh=6DJDIHcM4e+E+/ldB0EKEZdKpy6sbN3lbYxzYjRxIBQ=; b=FynDpLjnB8QzTL3Q7rt4Olhz7dbg65Pox4et4kt0iYYfTyZsuBTUeMXnmmO1fTHp+h EJkFWk37qlm0W4L4LVlwW4z7d1weDYcPm6bZr3XhqmBTdNjDAFgvFBzQImz4cNxzdX6q FsMe4vGdVg9tTEitQtp+nCZTcrqq2egBDRF/mAl1hNwxkd5xFTl2+H4BgWdEjmC/p0CN 1tfJQ1JpEQqYUSAF0AExSdvH8Av/FbJUY2NDul8iCrTTuTsJGpY+pN4KKvqITcY27+wY rbUN2Kh94FcHZromGU5Hpv+WViQdMeNIHVPEbkEw1FDl9pnNuu3XHuwxvTOzV+19Bwoa J+tg== MIME-Version: 1.0 X-Received: by 10.58.109.5 with SMTP id ho5mr23114592veb.10.1373576754201; Thu, 11 Jul 2013 14:05:54 -0700 (PDT) Sender: artemb@gmail.com Received: by 10.221.41.6 with HTTP; Thu, 11 Jul 2013 14:05:54 -0700 (PDT) In-Reply-To: References: <9890DFF1-892A-4DCA-9E33-B70681154F43@mail.turbofuzz.com> <4F0DFAB7-D6D5-4068-A543-C9DF885D1A7D@dragondata.com> <51DEC0E8.7010305@freebsd.org> Date: Thu, 11 Jul 2013 14:05:54 -0700 X-Google-Sender-Auth: 6ymocqv92c92wVSetc9aLt1d6GM Message-ID: Subject: Re: Kernel dumps [was Re: possible changes from Panzura] From: Artem Belevich To: "Jordan K. 
Hubbard" Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.14 Cc: hackers@freebsd.org, Kevin Day X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 11 Jul 2013 21:05:55 -0000

On Thu, Jul 11, 2013 at 12:52 PM, Jordan K. Hubbard <jordan.hubbard@gmail.com> wrote:

> On Jul 11, 2013, at 7:27 AM, Julian Elischer wrote:
>
> > I could imagine that we could stash away a vimage stack just for this purpose.
> > You'd set it up on boot and leave it detached until you need it.
> >
> > you just need to switch the interfaces over to the new stack on panic and put them into 'poll' mode.
>
> That sounds like a rather clever solution to this problem (OS X doesn't support vimage, despite repeated attempts on my part to change that).

It would probably work for most crashes, but it will not work in a few interesting classes of failure. Using an in-kernel stack implicitly assumes that your memory allocator still works, as both the stack and the interface driver will need to get their mbufs and other data somewhere. Alas, it's those unusual cases that are hardest to debug and where you really do want the debugger or coredump to work.

At my previous job we did it the 'embedded system way'. The interface driver provided dumb functions to re-init the device, send a frame, and poll for a received frame, all without using the system malloc. There was a dumb malloc that gave a few chunks of memory from a static buffer to gzip, but the rest of the code was independent of any kernel facilities. We had a simple ARP/IP/UDP/TFTP(+gzip) implementation to upload a compressed image of physical memory to a specified server. Overall it worked pretty well.
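
The "dumb malloc" idea can be sketched as a bump allocator over a static buffer. This is only an illustration under stated assumptions, not the code described in the message: the names `dump_arena`, `dump_alloc` and `dump_alloc_reset`, the 64 KB arena size, and the 8-byte alignment are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* All names and sizes here are hypothetical, chosen for illustration. */
#define DUMP_ARENA_SIZE	(64 * 1024)
#define DUMP_ALIGN	sizeof(uint64_t)

/*
 * Storage is reserved at link time, so nothing is requested from the
 * system allocator after a panic. uint64_t elements give 8-byte alignment.
 */
static uint64_t dump_arena[DUMP_ARENA_SIZE / sizeof(uint64_t)];
static size_t dump_arena_used;

/* Hand out the next aligned chunk of the arena, or NULL if it will not fit. */
static void *
dump_alloc(size_t size)
{
	size_t rounded;
	void *p;

	assert((DUMP_ALIGN & (DUMP_ALIGN - 1)) == 0);	/* power of two */
	if (size == 0 || size > DUMP_ARENA_SIZE)
		return (NULL);
	rounded = (size + DUMP_ALIGN - 1) & ~(DUMP_ALIGN - 1);
	if (rounded > DUMP_ARENA_SIZE - dump_arena_used)
		return (NULL);
	p = (unsigned char *)dump_arena + dump_arena_used;
	dump_arena_used += rounded;
	return (p);
}

/* Chunks are never freed one by one; the whole arena is recycled at once. */
static void
dump_alloc_reset(void)
{
	dump_arena_used = 0;
}
```

A compressor that only needs a handful of work buffers could be pointed at `dump_alloc` and the arena reset before each dump. The trade-off is that individual frees are impossible, which is usually acceptable for a one-shot crash path.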
Considering that this approach pretty much puts the core dump outside of the kernel, I wonder if we could start some sort of reverse BTX loader on crash. Instead of downloading the kernel it would upload the core. This way we should be able to produce the core in a fairly generic way on any system where we can use PXE for network I/O. The idea may be a non-starter, as I have no clue whether it's possible to use PXE once the kernel has booted and taken control of the NIC hardware.

--Artem
From owner-freebsd-hackers@FreeBSD.ORG Thu Jul 11 21:20:36 2013 Return-Path: Delivered-To: hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 51C06A8E for ; Thu, 11 Jul 2013 21:20:36 +0000 (UTC) (envelope-from toasty@dragondata.com) Received: from mail-gg0-x232.google.com (mail-gg0-x232.google.com [IPv6:2607:f8b0:4002:c02::232]) by mx1.freebsd.org (Postfix) with ESMTP id 0E52416B9 for ; Thu, 11 Jul 2013 21:20:35 +0000 (UTC) Received: by mail-gg0-f178.google.com with SMTP id l12so2889538gge.9 for ; Thu, 11 Jul 2013 14:20:35 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=dragondata.com; s=google; h=content-type:mime-version:subject:from:in-reply-to:date:cc :message-id:references:to:x-mailer; bh=G+GmsOc2/l1TQLXLGb6+KWXNcM7sWN7uKeMQeQoTOjw=; b=PEZ8nn0OsE5YQTS5G1zOjiWGOQxYW4SRLUUCr0uBc2r04aw0VmZTI0i6mCxdeRUq9B KQuCp5J2dGldlFuuB4m+yO3xyZIauBGraWHyv7ij1Teosdej0wan5wPWwPvfKJFFL2Ti 0J3FVsoiiFUmNSBVEf20OQ4S1v+LA8nTIhlUs= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=content-type:mime-version:subject:from:in-reply-to:date:cc :message-id:references:to:x-mailer:x-gm-message-state; bh=G+GmsOc2/l1TQLXLGb6+KWXNcM7sWN7uKeMQeQoTOjw=; b=Pi37AU/fxv+Cyy3AZQD09LouBFD71RzkTkDH3T87ZIWonCbRXxhwF01nPkcEeE4wNn n4vi0OYgu67GM/eQwMNXIBFKWFuocGKAqtZBzrZVd+A0R2rDPCe1Q6mtLT8OCXJ2+ZLq r8bC8zabPOM9Ca/He+kzOhLjEDbfMV70IvW3yNLuhIlXQUT59Z1ai+kUul5lzw8sdL4m
nlnMLEyMFefclj5xsaYmRlytC8SCpWiTelOPh9GaZn9cbum5b00DCgmlSiIBqUmkvQCW EEUMlApxFSGBx1qGt9xBFs6l9kMuEPQbeSGFdgUt+NGjLgjlNtrzDAS6IFokeQ6BjxZ5 RobQ== X-Received: by 10.236.161.3 with SMTP id v3mr21982090yhk.3.1373577635544; Thu, 11 Jul 2013 14:20:35 -0700 (PDT) Received: from unassigned.v6.your.org ([2001:4978:1:45:c937:933:fe02:226a]) by mx.google.com with ESMTPSA id h26sm62849823yhb.21.2013.07.11.14.20.33 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Thu, 11 Jul 2013 14:20:34 -0700 (PDT) Content-Type: multipart/signed; boundary="Apple-Mail=_17B5C70A-9C60-4DFA-A1CF-40315A6880DB"; protocol="application/pkcs7-signature"; micalg=sha1 Mime-Version: 1.0 (Mac OS X Mail 6.5 \(1508\)) Subject: Re: Kernel dumps [was Re: possible changes from Panzura] From: Kevin Day In-Reply-To: Date: Thu, 11 Jul 2013 16:20:32 -0500 Message-Id: References: <9890DFF1-892A-4DCA-9E33-B70681154F43@mail.turbofuzz.com> <4F0DFAB7-D6D5-4068-A543-C9DF885D1A7D@dragondata.com> <51DEC0E8.7010305@freebsd.org> To: Artem Belevich X-Mailer: Apple Mail (2.1508) X-Gm-Message-State: ALoCoQl2GR7aoNnrsh6H3WydmrMzMCaKqHI8rbMAsdoJ8Wo6NQgm5syu5vHexO2+iBUQ5H4BHWoO Cc: hackers@freebsd.org, "Jordan K. Hubbard" X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 11 Jul 2013 21:20:36 -0000 --Apple-Mail=_17B5C70A-9C60-4DFA-A1CF-40315A6880DB Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset=iso-8859-1

On Jul 11, 2013, at 4:05 PM, Artem Belevich wrote:

> It would probably work for most of the crashes, but will not work in few interesting classes of failure. Using in-kernel stack implicitly assumes that your memory allocator still works as both the stack and the interface driver will need to get their mbufs and other data somewhere.
> Alas it's those unusual cases that are hardest to debug and where you really do want debugger or coredump to work.
>
> Back at my previous work we did it 'embedded system way'. Interface driver provided dumb functions to re-init device, send a frame and poll for received frame. All that without using system malloc. There was a dumb malloc that gave few chunks of memory from static buffer to gzip, but the rest of the code was independent of any kernel facilities. We had simple ARP/IP/UDP/TFTP(+gzip) implementation to upload compressed image of physical memory to a specified server. Overall it worked pretty well.

That's the exact reason why we invented our own mini stack and hooks into the network driver. After many failure cases, you can no longer rely on malloc, interrupts, routing tables or other goodies to be working correctly. It's too easy for the rest of the system to be broken enough that touching any of those pieces is enough to crash again.

It really depends on the scope of the problem you're trying to debug, but at minimum I think you need to revert to polled networking, disable all interrupts, and use your own stack/memory pool. Even then it's still not foolproof, but at least then you spend less time trying to debug your debugger.
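
The polled, interrupt-free driver hooks mentioned in this thread (re-init, send a frame, poll for a received frame) could be expressed as a small table of function pointers plus a bounded busy-wait loop. What follows is a rough sketch under assumptions: `struct dump_nic_ops`, `dump_exchange` and the `fake_*` stand-in device are hypothetical names, and no FreeBSD driver is known to export this exact interface.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical hook table a NIC driver might provide for crash-time use. */
struct dump_nic_ops {
	int (*reinit)(void *sc);	/* reset device, interrupts left off */
	int (*send_frame)(void *sc, const void *frame, size_t len);
	/* >0: bytes received, 0: nothing yet, <0: error */
	int (*poll_rx)(void *sc, void *buf, size_t buflen);
};

/* Send one frame, then busy-poll for a reply a bounded number of times. */
static int
dump_exchange(const struct dump_nic_ops *ops, void *sc, const void *req,
    size_t reqlen, void *reply, size_t replylen, unsigned int max_polls)
{
	int n;

	assert(ops != NULL);
	if (ops->send_frame(sc, req, reqlen) != 0)
		return (-1);
	while (max_polls-- > 0) {
		n = ops->poll_rx(sc, reply, replylen);
		if (n != 0)
			return (n);	/* data or error */
	}
	return (0);			/* gave up: no reply */
}

/*
 * Stand-in "device" for trying the loop in user space: it echoes the
 * last frame sent after a configurable number of empty polls.
 */
struct fake_nic {
	unsigned char frame[64];
	size_t len;
	unsigned int empty_polls;
};

static int
fake_reinit(void *sc)
{
	struct fake_nic *nic = sc;

	nic->len = 0;
	return (0);
}

static int
fake_send(void *sc, const void *frame, size_t len)
{
	struct fake_nic *nic = sc;

	if (len > sizeof(nic->frame))
		return (-1);
	memcpy(nic->frame, frame, len);
	nic->len = len;
	return (0);
}

static int
fake_poll(void *sc, void *buf, size_t buflen)
{
	struct fake_nic *nic = sc;
	size_t n;

	if (nic->len == 0)
		return (0);
	if (nic->empty_polls > 0) {
		nic->empty_polls--;
		return (0);
	}
	n = nic->len < buflen ? nic->len : buflen;
	memcpy(buf, nic->frame, n);
	nic->len = 0;
	return ((int)n);
}

static const struct dump_nic_ops fake_ops = {
	.reinit = fake_reinit,
	.send_frame = fake_send,
	.poll_rx = fake_poll,
};
```

Bounding the poll count rather than waiting on a timer keeps the loop independent of clock interrupts, which may no longer be running after a panic.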
--Apple-Mail=_17B5C70A-9C60-4DFA-A1CF-40315A6880DB-- From owner-freebsd-hackers@FreeBSD.ORG Fri Jul 12 03:06:41 2013 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id B7B80C66 for ; Fri, 12 Jul 2013 03:06:41 +0000 (UTC) (envelope-from otacilio.neto@bsd.com.br) Received: from mail-gg0-x22f.google.com (mail-gg0-x22f.google.com [IPv6:2607:f8b0:4002:c02::22f]) by mx1.freebsd.org (Postfix) with ESMTP id 7B31515EA for ; Fri, 12 Jul 2013 03:06:41 +0000 (UTC) Received: by mail-gg0-f175.google.com with SMTP id q1so2944913ggm.20 for ; Thu, 11 Jul 2013 20:06:41 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bsd.com.br; s=capeta; h=message-id:date:from:user-agent:mime-version:to:subject :x-enigmail-version:content-type:content-transfer-encoding; bh=VkLYzNAXVRjrY7doPmE77KizxA9qxz5kQIwEWdfC6d0=; b=Wf/uUrlzmgxEnsnScm8Dj7q8Q3WTTHAgHXa/Yr82IsFaz474S/pKuNeB4nIZa+6vpB
943K935FKhbaS95B8qrfzQq9iRLoYjiK87+MZzY2xhw1PVJKC47AD28tjhd/B/nX1lmS qrXYkW6e4UIjJMwoFb6XwqgaCTrVjcMvcOJ70= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=message-id:date:from:user-agent:mime-version:to:subject :x-enigmail-version:content-type:content-transfer-encoding :x-gm-message-state; bh=VkLYzNAXVRjrY7doPmE77KizxA9qxz5kQIwEWdfC6d0=; b=Jaip9zYF3Kxb0m+82xYu3lxoukQAvamZMxVTViDBzhgb5H9buBpr6fHCaCmrn8AFVN U7Gm7BPlkLcxBayotVB9od5hgikDGo1OkRP6mOsUjFkZDuS3xb5tNxZsQFzqU2AOFXfV 6sNa76Szb5Ufq3oroJ6nAdTyrwad5L9sd/Jx+3cttzTC7HfC6ZG5HhpYt6NiMfgUFCiC k2VrnkPyN5uph9plVbo5viCRt4OOgjmi3cy5BrnKw8MOSYvhXDKsACcuro9Yc05Nwd++ 4EOuklOyXx5JIQf+bwP8tOmPrwxbGSbSoD/kHcRWsQRzEwgl7Hb161AbOX2VqmaBjIdm z6kg== X-Received: by 10.236.37.201 with SMTP id y49mr311253yha.127.1373598401034; Thu, 11 Jul 2013 20:06:41 -0700 (PDT) Received: from [192.168.2.105] (177.207.87.201.dynamic.adsl.gvt.net.br. [177.207.87.201]) by mx.google.com with ESMTPSA id j63sm65024296yhh.17.2013.07.11.20.06.38 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Thu, 11 Jul 2013 20:06:40 -0700 (PDT) Message-ID: <51DF72B8.40502@bsd.com.br> Date: Fri, 12 Jul 2013 00:06:32 -0300 From: Otacílio User-Agent: Mozilla/5.0 (X11; FreeBSD i386; rv:17.0) Gecko/20130406 Thunderbird/17.0.5 MIME-Version: 1.0 To: freebsd-hackers@freebsd.org Subject: Error on building cross-gcc X-Enigmail-Version: 1.5.1 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-Gm-Message-State: ALoCoQk7/iUocigyYMh0zPXSrZUvaIzkrZuiP0CaYaUyUApO+0xq9y7ez2ZPnMA2clpSW17H8k/J X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 12 Jul 2013 03:06:41 -0000

Dear all,

I'm trying to build cross-gcc with this command line

make TGTARCH=arm TGTABI=freebsd10

or

make TGTARCH=arm TGTABI=freebsd8

on a
FreeBSD squitch 8.4-RELEASE FreeBSD 8.4-RELEASE #27: Mon Jun 10 08:52:47 BRT 2013 ota@squitch:/usr/obj/usr/src/sys/SQUITCH i386

but every time I get

/usr/ports/devel/cross-gcc/work/build/./gcc/xgcc -B/usr/ports/devel/cross-gcc/work/build/./gcc/ -B/usr/local/arm-freebsd10/bin/ -B/usr/local/arm-freebsd10/lib/ -isystem /usr/ports/devel/cross-gcc/work/build/./gcc -isystem /usr/local/arm-freebsd10/include -isystem /usr/local/arm-freebsd10/sys-include -g -O2 -pipe -fno-strict-aliasing -mbig-endian -O2 -g -O2 -pipe -fno-strict-aliasing -DIN_GCC -DCROSS_DIRECTORY_STRUCTURE -W -Wall -Wwrite-strings -Wcast-qual -Wstrict-prototypes -Wmissing-prototypes -Wold-style-definition -isystem ./include -fno-inline -g -DHAVE_GTHR_DEFAULT -DIN_LIBGCC2 -D__GCC_FLOAT_NOT_NEEDED -Dinhibit_libc -I. -I. -I../../.././gcc -I../../.././../gcc-4.5.4/libgcc -I../../.././../gcc-4.5.4/libgcc/. -I../../.././../gcc-4.5.4/libgcc/../gcc -I../../.././../gcc-4.5.4/libgcc/../include -DHAVE_CC_TLS -o _muldi3.o -MT _muldi3.o -MD -MP -MF _muldi3.dep -DL_muldi3 -c ../../.././../gcc-4.5.4/libgcc/../gcc/libgcc2.c \

In file included from ../../.././../gcc-4.5.4/libgcc/../gcc/tsystem.h:44:0,
from ../../.././../gcc-4.5.4/libgcc/../gcc/libgcc2.c:29:
/usr/ports/devel/cross-gcc/work/build/./gcc/include/stddef.h:59:24: fatal error: sys/_types.h: No such file or directory
compilation terminated.
gmake[4]: ** [_muldi3.o] Erro 1

Can someone give me a hint about what is happening?
Thanks a lot -Otacilio From owner-freebsd-hackers@FreeBSD.ORG Fri Jul 12 06:18:12 2013 Return-Path: Delivered-To: hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id BB861322; Fri, 12 Jul 2013 06:18:12 +0000 (UTC) (envelope-from jordan.hubbard@gmail.com) Received: from mail-pd0-x22a.google.com (mail-pd0-x22a.google.com [IPv6:2607:f8b0:400e:c02::22a]) by mx1.freebsd.org (Postfix) with ESMTP id 8F5511C78; Fri, 12 Jul 2013 06:18:12 +0000 (UTC) Received: by mail-pd0-f170.google.com with SMTP id x11so8325999pdj.1 for ; Thu, 11 Jul 2013 23:18:11 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=content-type:mime-version:subject:from:in-reply-to:date:cc :message-id:references:to:x-mailer; bh=iPf/+XL61vSTjr49EQjXhjjs5Npdg6xxZzduXyChdJA=; b=PgJ5ZQbI3cE681X6No/+3X6NgiEC+kZ2TzoUG8CEsvjOxpHHwOf+S/X9vKKl9VGM7v O4qdgeUMlDhc0gNUwKlRBKdylbgbxYWyIIhUiXlt22c8zPJRicJBmLe1mQxJ8QtJwHEv D0DZFyVIEYvQHtuX4N2lN5OAkTn9UC1m+eGOlCUZRr3jFF6TFfTB+G3zsmBG2qpcdpuj 80VIM1UIYTRuCWAhsElMUaFT8ww4NxCbOaYyu1AHwXMZxPI5R4puknd6an5oMxsO9jOr maPgbiwUZqpzhgCReCoXmHyFGTQCorz4xtGCaXhxj3m8aXWMd94P1o2hgsHgMSfBS0Ne WKag== X-Received: by 10.68.217.137 with SMTP id oy9mr40072694pbc.130.1373609891891; Thu, 11 Jul 2013 23:18:11 -0700 (PDT) Received: from [10.20.30.70] (75-101-82-48.static.sonic.net. 
[75.101.82.48]) by mx.google.com with ESMTPSA id eq5sm43538520pbc.15.2013.07.11.23.18.09 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Thu, 11 Jul 2013 23:18:11 -0700 (PDT) Mime-Version: 1.0 (Mac OS X Mail 6.6 \(1510\)) Subject: Re: Kernel dumps [was Re: possible changes from Panzura] From: Hubbard Jordan In-Reply-To: Date: Thu, 11 Jul 2013 23:18:08 -0700 Message-Id: <90871AB4-6615-4571-A0F6-B605141497FA@gmail.com> References: <9890DFF1-892A-4DCA-9E33-B70681154F43@mail.turbofuzz.com> <4F0DFAB7-D6D5-4068-A543-C9DF885D1A7D@dragondata.com> <51DEC0E8.7010305@freebsd.org> To: Artem Belevich X-Mailer: Apple Mail (2.1510) Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.14 Cc: hackers@freebsd.org, Kevin Day X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 12 Jul 2013 06:18:12 -0000

On Jul 11, 2013, at 2:05 PM, Artem Belevich wrote:

> It would probably work for most of the crashes, but will not work in few
> interesting classes of failure. Using in-kernel stack implicitly assumes
> that your memory allocator still works as both the stack and the interface
> driver will need to get their mbufs and other data somewhere. Alas it's
> those unusual cases that are hardest to debug and where you really do want
> debugger or coredump to work.

This is all true, though I think Julian was also suggesting that this "fall-back vimage" would be somehow preallocated, perhaps not from the kernel allocator pool, ahead of time. The mbufs needed for it at "connection time" would also presumably come from a special pool, though I don't know how much conditional logic in the existing stack would be necessary to make sure it didn't try to allocate any data from the generic allocator.
It might indeed be easier to simply bake in a much simpler, UDP-only stack and polled device driver combo.

- Jordan
From owner-freebsd-hackers@FreeBSD.ORG Fri Jul 12 12:33:35 2013 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id B673F250 for ; Fri, 12 Jul 2013 12:33:35 +0000 (UTC) (envelope-from c.kworr@gmail.com) Received: from mail-la0-x22c.google.com (mail-la0-x22c.google.com [IPv6:2a00:1450:4010:c03::22c]) by mx1.freebsd.org (Postfix) with ESMTP id 3E68511E3 for ; Fri, 12 Jul 2013 12:33:35 +0000 (UTC) Received: by mail-la0-f44.google.com with SMTP id er20so7703418lab.3 for ; Fri, 12 Jul 2013 05:33:34 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=message-id:date:from:user-agent:mime-version:to:cc:subject :references:in-reply-to:content-type:content-transfer-encoding; bh=NHeR5yvhDvwA3tn61zBM7ZVRIhGMq+SMu39cnSreaRY=; b=FwtzdW/QsA8x4irvHfdmdki9lArEYgvORCUt8gzRvWWgroUNAPm723aMaB0fmV5uC6 ZAqqQ7J/9GJwz78kJjyl95bTg52jO/3B5iNnyOPW20KQpE0GbjWGyzktiL+4bUPK/VWT Xgd0LhVTzbIoRWeFg4QpZ0hwfkF7ECdLM6bWTd9qjNUCrEPwTum3TKwxINgzWvKbruh3 d3ne3tgot0jJ1GbT6Aj/bkTI84JZW3Gze0qh7kRWOd9OxX43BVSuPprJPcaeocOHuZCF K50q2N0+9n78NmGxFI5HYvolrdMXVjp/2S5B0Z4QrV4OTiG4KyF5TQcHGKr4m7EWLlY/ kbqA== X-Received: by 10.152.42.236 with SMTP id r12mr19253762lal.46.1373632414264; Fri, 12 Jul 2013 05:33:34 -0700 (PDT) Received: from [192.168.1.139] (mau.donbass.com.
[92.242.127.250]) by mx.google.com with ESMTPSA id e5sm13926010lbw.3.2013.07.12.05.33.33 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Fri, 12 Jul 2013 05:33:33 -0700 (PDT) Message-ID: <51DFF79C.905@gmail.com> Date: Fri, 12 Jul 2013 15:33:32 +0300 From: Volodymyr Kostyrko User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:17.0) Gecko/20130710 Thunderbird/17.0.7 MIME-Version: 1.0 To: Reid Linnemann Subject: Re: Attempting to roll back zfs transactions on a disk to recover a destroyed ZFS filesystem References: In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit Cc: freebsd-hackers@freebsd.org X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 12 Jul 2013 12:33:35 -0000

11.07.2013 17:43, Reid Linnemann wrote:

> So recently I was trying to transfer a root-on-ZFS zpool from one pair of
> disks to a single, larger disk. As I am wont to do, I botched the transfer
> up and decided to destroy the ZFS filesystems on the destination and start
> again. Naturally I was up late working on this, being sloppy and drowsy
> without any coffee, and lo and behold I issued my 'zfs destroy -R' and
> immediately realized after pressing [ENTER] that I had given it the
> source's zpool name. oops. Fortunately I was able to interrupt the
> procedure with only /usr being destroyed from the pool and I was able to
> send/receive the truly vital data in my /var partition to the new disk and
> re-deploy the base system to /usr on the new disk. The only thing I'm
> really missing at this point is all of the third-party software
> configuration I had in /usr/local/etc and my apache data in /usr/local/www.

You can try to experiment with zpool hidden flags.
Look at this command:

zpool import -N -o readonly=on -f -R /pool

It will try to import the pool in read-only mode, so no data will be written to it. It also doesn't mount anything on import, so if any filesystem is damaged you are less likely to trigger a coredump. zpool import also has a hidden -T switch that lets you select the transaction you want to try to restore. You'll need a list of available transactions though:

zdb -ul

When given a vdev, this lists all uberblocks with their respective transaction IDs. You can take the highest one (it's not the last one) and try to mount the pool with:

zpool import -N -o readonly=on -f -R /pool -F -T

Then check the available filesystems.

-- 
Sphinx of black quartz, judge my vow.
From owner-freebsd-hackers@FreeBSD.ORG Fri Jul 12 13:14:49 2013 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 7BDA12C9 for ; Fri, 12 Jul 2013 13:14:49 +0000 (UTC) (envelope-from db@db.net) Received: from diana.db.net (diana.db.net [66.113.102.10]) by mx1.freebsd.org (Postfix) with ESMTP id 65D1F15E5 for ; Fri, 12 Jul 2013 13:14:49 +0000 (UTC) Received: from night.db.net (localhost [127.0.0.1]) by diana.db.net (Postfix) with ESMTP id EABE12AA348; Fri, 12 Jul 2013 07:14:41 -0600 (MDT) Received: by night.db.net (Postfix, from userid 1000) id 2A56D1CC0E; Fri, 12 Jul 2013 08:14:34 -0500 (EST) Date: Fri, 12 Jul 2013 08:14:34 -0500 From: Diane Bruce To: Otacílio Subject: Re: Error on building cross-gcc Message-ID: <20130712131434.GA56841@night.db.net> References: <51DF72B8.40502@bsd.com.br> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <51DF72B8.40502@bsd.com.br> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-hackers@freebsd.org X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: ,
List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 12 Jul 2013 13:14:49 -0000

On Fri, Jul 12, 2013 at 12:06:32AM -0300, Otacílio wrote:

> Dears
>
> I'm tryning to build cross-gcc with this command line
>
> make TGTARCH=arm TGTABI=freebsd10
>
> or
>
> make TGTARCH=arm TGTABI=freebsd8
>
> on a
>
> FreeBSD squitch 8.4-RELEASE FreeBSD 8.4-RELEASE #27: Mon Jun 10 08:52:47
> BRT 2013 ota@squitch:/usr/obj/usr/src/sys/SQUITCH i386
>
>
> but all times I got
>
> /usr/ports/devel/cross-gcc/work/build/./gcc/xgcc
> -B/usr/ports/devel/cross-gcc/work/build/./gcc/
> -B/usr/local/arm-freebsd10/bin/ -B/usr/local/arm-freebsd10/lib/ -isystem
> /usr/ports/devel/cross-gcc/work/build/./gcc -isystem
> /usr/local/arm-freebsd10/include -isystem
> /usr/local/arm-freebsd10/sys-include -g -O2 -pipe
> -fno-strict-aliasing -mbig-endian -O2 -g -O2 -pipe -fno-strict-aliasing
> -DIN_GCC -DCROSS_DIRECTORY_STRUCTURE -W -Wall -Wwrite-strings
> -Wcast-qual -Wstrict-prototypes -Wmissing-prototypes
> -Wold-style-definition -isystem ./include -fno-inline -g
> -DHAVE_GTHR_DEFAULT -DIN_LIBGCC2 -D__GCC_FLOAT_NOT_NEEDED -Dinhibit_libc
> -I. -I. -I../../.././gcc -I../../.././../gcc-4.5.4/libgcc
> -I../../.././../gcc-4.5.4/libgcc/.
> -I../../.././../gcc-4.5.4/libgcc/../gcc
> -I../../.././../gcc-4.5.4/libgcc/../include -DHAVE_CC_TLS -o _muldi3.o
> -MT _muldi3.o -MD -MP -MF _muldi3.dep -DL_muldi3 -c
> ../../.././../gcc-4.5.4/libgcc/../gcc/libgcc2.c \
>
> In file included from ../../.././../gcc-4.5.4/libgcc/../gcc/tsystem.h:44:0,
> from ../../.././../gcc-4.5.4/libgcc/../gcc/libgcc2.c:29:
> /usr/ports/devel/cross-gcc/work/build/./gcc/include/stddef.h:59:24:
> fatal error: sys/_types.h: No such file or directory
> compilation terminated.
> gmake[4]: ** [_muldi3.o] Erro 1

Did you compile cross-binutils first? Check back next week. Work is being done on this port.

>
>
> Someone can give me a hint about what is happen?
> > Thanks a lot > -Otacilio > _______________________________________________ > freebsd-hackers@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-hackers > To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org" -- - db@FreeBSD.org db@db.net http://www.db.net/~db From owner-freebsd-hackers@FreeBSD.ORG Fri Jul 12 14:01:08 2013 Return-Path: Delivered-To: hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 17B8E21A for ; Fri, 12 Jul 2013 14:01:08 +0000 (UTC) (envelope-from carpeddiem@gmail.com) Received: from mail-qa0-x229.google.com (mail-qa0-x229.google.com [IPv6:2607:f8b0:400d:c00::229]) by mx1.freebsd.org (Postfix) with ESMTP id D2F2C1831 for ; Fri, 12 Jul 2013 14:01:07 +0000 (UTC) Received: by mail-qa0-f41.google.com with SMTP id f14so345153qak.14 for ; Fri, 12 Jul 2013 07:01:07 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type; bh=6SGIF+3GSrn1IsUouWAQK/95Sg0EY9yyStvHi6dpK1g=; b=AhqVX4Tr5KDWhMOJGN2C79TFg4gLwbGmWTjfNxuEATjCQsLcZjr5Qa8d6FhwBMMHoD BH3LfYu+Csw1qjrsfGcdb4hvmmFGww4f2wAbzSe8U7cgEjHQ3bhiecDAAq4P6B0ioB/v gvQwbV/Ffr+iG4QqOMba9/llUektxfP8NkwUnNAs7g4uQBxPwRdDr3vcGMbOhWTeHgQ7 k+qRoreU0LUJallKMVULY4fKGXkb59+27qrYv8FLaginhEjtqQAs/r0TP50ZPNQhikEA nfh1lZMPa2zeZRQ8q49l0+TZNQFr9y7wJruxySXMo3tXBp6z3Wlaskc5Ul2Cq22R/rg8 zDcA== MIME-Version: 1.0 X-Received: by 10.229.134.2 with SMTP id h2mr6963130qct.94.1373637667445; Fri, 12 Jul 2013 07:01:07 -0700 (PDT) Sender: carpeddiem@gmail.com Received: by 10.224.209.6 with HTTP; Fri, 12 Jul 2013 07:01:07 -0700 (PDT) In-Reply-To: <51DDE91E.4000400@unsane.co.uk> References: <9890DFF1-892A-4DCA-9E33-B70681154F43@mail.turbofuzz.com> <4F0DFAB7-D6D5-4068-A543-C9DF885D1A7D@dragondata.com> <51DDE91E.4000400@unsane.co.uk> Date: Fri, 12 Jul 2013 10:01:07 -0400 
X-Google-Sender-Auth: l8AkiZuHTcfYX7nG0VvkD0U59Gw Message-ID: Subject: Re: Kernel dumps [was Re: possible changes from Panzura] From: Ed Maste To: Vincent Hoffman Content-Type: text/plain; charset=ISO-8859-1 Cc: hackers@freebsd.org, Jordan Hubbard , Kevin Day X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 12 Jul 2013 14:01:08 -0000 On 10 July 2013 19:07, Vincent Hoffman wrote: > > There was some work on something similar at one point, not sure what > came of it. > http://lists.freebsd.org/pipermail/freebsd-current/2010-September/020164.html The code referenced there has been used in production since 2005 or so, and is based on an earlier implementation for FreeBSD 4.x that dates to 2000. Despite some shortcomings in the implementation it has proved quite reliable in practice. It hasn't made it into the tree yet for reasons raised in this thread. The primary issue is that it allocates mbufs in the packet sending path, and so relies on a number of kernel subsystems to be in a consistent state. It doesn't use the stack, routing table, or driver interrupt interfaces though. It could likely be made committable with a relatively small effort to switch to preallocating an mbuf+cluster or two at configuration time. 
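The preallocation idea in the paragraph above can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not the netdump code: `PreallocatedPool` and `send_dump_chunk` are hypothetical names, a `bytearray` stands in for an mbuf+cluster, and `transmit` stands in for whatever hands the buffer to the driver. The point it shows is that all allocation happens at configuration time, so the send path only takes and returns reserved buffers.

```python
class PreallocatedPool:
    """Hypothetical stand-in for buffers reserved at configuration time."""

    def __init__(self, count, size):
        # Every allocation happens here, while the system is still healthy.
        self.buf_size = size
        self._free = [bytearray(size) for _ in range(count)]

    def get(self):
        # Send path: hand out a reserved buffer, or fail without allocating.
        return self._free.pop() if self._free else None

    def put(self, buf):
        # Return the buffer once the transmit step is done with it.
        self._free.append(buf)


def send_dump_chunk(pool, chunk, transmit):
    """Copy one chunk into a reserved buffer and pass it to transmit().

    Returns False when no reserved buffer is available instead of
    falling back to a fresh allocation.
    """
    if len(chunk) > pool.buf_size:
        raise ValueError("chunk larger than a reserved buffer")
    buf = pool.get()
    if buf is None:
        return False
    n = len(chunk)
    buf[:n] = chunk                      # same-length slice assignment, no resize
    transmit(memoryview(buf)[:n])        # view into the reserved buffer
    pool.put(buf)
    return True
```

With a pool of one or two buffers, a caller that finds the pool empty has to wait for a buffer to come back rather than allocate, which is the property that removes the dependency on the allocator being in a consistent state.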
More information is on the FreeBSD wiki, at https://wiki.freebsd.org/Netdump From owner-freebsd-hackers@FreeBSD.ORG Fri Jul 12 14:41:56 2013 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id B637D187 for ; Fri, 12 Jul 2013 14:41:56 +0000 (UTC) (envelope-from linnemannr@gmail.com) Received: from mail-oa0-x22a.google.com (mail-oa0-x22a.google.com [IPv6:2607:f8b0:4003:c02::22a]) by mx1.freebsd.org (Postfix) with ESMTP id 8423A1A2C for ; Fri, 12 Jul 2013 14:41:56 +0000 (UTC) Received: by mail-oa0-f42.google.com with SMTP id j6so13054952oag.1 for ; Fri, 12 Jul 2013 07:41:56 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=EqftKSIizxWWGqMIl1x8IUfcA+JH0B/oezBaC2bP3m8=; b=VfMYTkUr0gkkdSLCMoP5Dmthjk7+Z0fPHsdSxo8EgUdFINfvkFdd1yo6IGWdJwNHC5 /pUMx2MlmFiGdIsWPSe92OmyjKxjB42FRLIsPcDCmEpKfHTKGaATJKRzxGDv5sS2p25J eZoExcUC1U7WLJSP+RqilwOpbRXkYbdwUdOCe5/r6Ky0JHSBBMVoQiQg+rRY6K8x3/Fz 3L+9/B/tOkbaAwtTraf0Yufuyq3mcffXOvsWeXsvEfKebiNq4xqwwLAbRtTG65ODD3Gf 0RgQjRFC+3EecAxxPoyU6iYvcI4jDmNkL6EnCSmeoTsNpCxQTR54eKjSocZlo2s/GbVj S5Ag== MIME-Version: 1.0 X-Received: by 10.60.103.211 with SMTP id fy19mr35599737oeb.103.1373640116097; Fri, 12 Jul 2013 07:41:56 -0700 (PDT) Received: by 10.182.122.97 with HTTP; Fri, 12 Jul 2013 07:41:56 -0700 (PDT) In-Reply-To: <51DFF79C.905@gmail.com> References: <51DFF79C.905@gmail.com> Date: Fri, 12 Jul 2013 09:41:56 -0500 Message-ID: Subject: Re: Attempting to roll back zfs transactions on a disk to recover a destroyed ZFS filesystem From: Reid Linnemann To: Volodymyr Kostyrko Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.14 Cc: freebsd-hackers@freebsd.org X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list 
List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 12 Jul 2013 14:41:56 -0000

Hey presto!

/> zfs list
NAME             USED  AVAIL  REFER  MOUNTPOINT
bucket           485G  1.30T   549M  legacy
bucket/tmp        21K  1.30T    21K  legacy
bucket/usr      29.6G  1.30T  29.6G  /mnt/usr
bucket/var       455G  1.30T  17.7G  /mnt/var
bucket/var/srv   437G  1.30T   437G  /mnt/var/srv

There's my old bucket! Thanks much for the hidden -T argument, Volodymyr! Now I can get back the remainder of my missing configuration.

On Fri, Jul 12, 2013 at 7:33 AM, Volodymyr Kostyrko wrote:

> 11.07.2013 17:43, Reid Linnemann wrote:
>
>> So recently I was trying to transfer a root-on-ZFS zpool from one pair of
>> disks to a single, larger disk. As I am wont to do, I botched the transfer
>> up and decided to destroy the ZFS filesystems on the destination and start
>> again. Naturally I was up late working on this, being sloppy and drowsy
>> without any coffee, and lo and behold I issued my 'zfs destroy -R' and
>> immediately realized after pressing [ENTER] that I had given it the
>> source's zpool name. oops. Fortunately I was able to interrupt the
>> procedure with only /usr being destroyed from the pool and I was able to
>> send/receive the truly vital data in my /var partition to the new disk and
>> re-deploy the base system to /usr on the new disk. The only thing I'm
>> really missing at this point is all of the third-party software
>> configuration I had in /usr/local/etc and my apache data in
>> /usr/local/www.
>>
>
> You can try to experiment with zpool hidden flags. Look at this command:
>
> zpool import -N -o readonly=on -f -R /pool
>
> It will try to import pool in readonly mode so no data would be written on
> it. It also doesn't mount anything on import so if any fs is damaged you
> have less chances triggering a coredump. 
Also zpool import has a hidden -T
> switch that gives you ability to select transaction that you want to try to
> restore. You'll need a list of available transaction though:
>
> zdb -ul
>
> This one when given a vdev lists all uberblocks with their respective
> transaction ids. You can take the highest one (it's not the last one) and
> try to mount pool with:
>
> zpool import -N -o readonly=on -f -R /pool -F -T
>
> Then check available filesystems.
>
> --
> Sphinx of black quartz, judge my vow.
>

From owner-freebsd-hackers@FreeBSD.ORG Fri Jul 12 16:06:00 2013 Return-Path: Delivered-To: hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id CD839E06; Fri, 12 Jul 2013 16:06:00 +0000 (UTC) (envelope-from julian@freebsd.org) Received: from vps1.elischer.org (vps1.elischer.org [204.109.63.16]) by mx1.freebsd.org (Postfix) with ESMTP id A3A341DE5; Fri, 12 Jul 2013 16:06:00 +0000 (UTC) Received: from jre-mbp.elischer.org (ppp121-45-226-51.lns20.per1.internode.on.net [121.45.226.51]) (authenticated bits=0) by vps1.elischer.org (8.14.5/8.14.5) with ESMTP id r6CG5qVc037450 (version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO); Fri, 12 Jul 2013 09:05:56 -0700 (PDT) (envelope-from julian@freebsd.org) Message-ID: <51E0295A.4050200@freebsd.org> Date: Sat, 13 Jul 2013 00:05:46 +0800 From: Julian Elischer User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:17.0) Gecko/20130620 Thunderbird/17.0.7 MIME-Version: 1.0 To: Ed Maste Subject: Re: Kernel dumps [was Re: possible changes from Panzura] References: <9890DFF1-892A-4DCA-9E33-B70681154F43@mail.turbofuzz.com> <4F0DFAB7-D6D5-4068-A543-C9DF885D1A7D@dragondata.com> <51DDE91E.4000400@unsane.co.uk> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: hackers@freebsd.org, Kevin Day , Jordan Hubbard , Vincent Hoffman X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 
2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 12 Jul 2013 16:06:00 -0000 On 7/12/13 10:01 PM, Ed Maste wrote: > On 10 July 2013 19:07, Vincent Hoffman wrote: >> There was some work on something similar at one point, not sure what >> came of it. >> http://lists.freebsd.org/pipermail/freebsd-current/2010-September/020164.html > The code referenced there has been used in production since 2005 or > so, and is based on an earlier implementation for FreeBSD 4.x that > dates to 2000. Despite some shortcomings in the implementation it has > proved quite reliable in practice. > > It hasn't made it into the tree yet for reasons raised in this thread. > The primary issue is that it allocates mbufs in the packet sending > path, and so relies on a number of kernel subsystems to be in a > consistent state. It doesn't use the stack, routing table, or driver > interrupt interfaces though. It could likely be made committable with > a relatively small effort to switch to preallocating an mbuf+cluster > or two at configuration time. > > More information is on the FreeBSD wiki, at https://wiki.freebsd.org/Netdump I would say this is one of the features I've looked for a LOT over the last 20 years. 
> _______________________________________________ > freebsd-hackers@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-hackers > To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org" > > From owner-freebsd-hackers@FreeBSD.ORG Fri Jul 12 16:26:16 2013 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 6A439672 for ; Fri, 12 Jul 2013 16:26:16 +0000 (UTC) (envelope-from otacilio.neto@bsd.com.br) Received: from mail-yh0-x232.google.com (mail-yh0-x232.google.com [IPv6:2607:f8b0:4002:c01::232]) by mx1.freebsd.org (Postfix) with ESMTP id 2C14E1ED6 for ; Fri, 12 Jul 2013 16:26:16 +0000 (UTC) Received: by mail-yh0-f50.google.com with SMTP id i72so3770653yha.9 for ; Fri, 12 Jul 2013 09:26:15 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bsd.com.br; s=capeta; h=message-id:date:from:user-agent:mime-version:to:cc:subject :references:in-reply-to:x-enigmail-version:content-type :content-transfer-encoding; bh=M3vCJcLHWIDvx+JlxzCkqGsLutuDrEmUy5hwj/gTAyY=; b=RiwznauFEK5N/XGj2u79v+c2N3OaKUjMq3MwrALUhIUrfs86cWCNubQ5eZIAIX8Iuy KwtC0jfBL9NVA+zqiM7PaoCIFHaD47p1L/19V+qqnqeVpCbuePUFAEhUoDwUuBmbEv4M 6WsilFwQyZfcb/4y5ZLN1qyiXOV3N+bys5DAM= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=message-id:date:from:user-agent:mime-version:to:cc:subject :references:in-reply-to:x-enigmail-version:content-type :content-transfer-encoding:x-gm-message-state; bh=M3vCJcLHWIDvx+JlxzCkqGsLutuDrEmUy5hwj/gTAyY=; b=HnqStFAPsp4lgA174oCRiGjfcUfO0wDUnPxj4SZ+R2F6sxgLcPR+z9zPuzdnQyYbT+ rWFLUO7dIaeRM8WqJiwqBTRP+ZSw3EQpiOBzRiQmnr4aQDrKXfL6M5393oR2l7Dt4Bmq rzgW669lV4W8AHB6TIzG4/T3Lwklk20RJeOMdkY/6ujmS3b+L4mnFUm1EUw2LW9HKi/T VPs2n/bO8KyuOlOKCwo7IjGWkEVBmjSCnVHN5sojYYRdpphJ/AUUUagkfAS2JqVEF/qj sIDtKaBBqwu2lrE3uApxd6tBXOauuDVq3aRkY0wVUEaxZOsb6k5N6bK4VWniAdbZdouo AOSg== X-Received: by 
10.236.5.142 with SMTP id 14mr24534861yhl.207.1373646375102; Fri, 12 Jul 2013 09:26:15 -0700 (PDT) Received: from [192.168.2.105] (177.207.87.201.dynamic.adsl.gvt.net.br. [177.207.87.201]) by mx.google.com with ESMTPSA id o1sm68995745yho.2.2013.07.12.09.26.13 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Fri, 12 Jul 2013 09:26:14 -0700 (PDT) Message-ID: <51E02E21.5030608@bsd.com.br> Date: Fri, 12 Jul 2013 13:26:09 -0300 From: =?ISO-8859-1?Q?Otac=EDlio?= User-Agent: Mozilla/5.0 (X11; FreeBSD i386; rv:17.0) Gecko/20130406 Thunderbird/17.0.5 MIME-Version: 1.0 To: Diane Bruce Subject: Re: Error on building cross-gcc References: <51DF72B8.40502@bsd.com.br> <20130712131434.GA56841@night.db.net> In-Reply-To: <20130712131434.GA56841@night.db.net> X-Enigmail-Version: 1.5.1 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-Gm-Message-State: ALoCoQn53FH29LWnX8CjkMttxOJS9jZzR92K1OhgGZwDxnWwjfivtnCO6yzMr2zpI6mZtsyVBLj5 Cc: freebsd-hackers@freebsd.org X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 12 Jul 2013 16:26:16 -0000 On 12/07/2013 10:14, Diane Bruce wrote: > On Fri, Jul 12, 2013 at 12:06:32AM -0300, Otac?lio wrote: >> Dears >> >> I'm tryning to build cross-gcc with this command line >> >> make TGTARCH=arm TGTABI=freebsd10 >> >> or >> >> make TGTARCH=arm TGTABI=freebsd8 >> >> on a >> >> FreeBSD squitch 8.4-RELEASE FreeBSD 8.4-RELEASE #27: Mon Jun 10 08:52:47 >> BRT 2013 ota@squitch:/usr/obj/usr/src/sys/SQUITCH i386 >> >> >> but all times I got >> >> /usr/ports/devel/cross-gcc/work/build/./gcc/xgcc >> -B/usr/ports/devel/cross-gcc/work/build/./gcc/ >> -B/usr/local/arm-freebsd10/bin/ -B/usr/local/arm-freebsd10/lib/ -isystem >> /usr/ports/devel/cross-gcc/work/build/./gcc -isystem >> /usr/local/arm-freebsd10/include -isystem >> 
/usr/local/arm-freebsd10/sys-include -g -O2 -pipe >> -fno-strict-aliasing -mbig-endian -O2 -g -O2 -pipe -fno-strict-aliasing >> -DIN_GCC -DCROSS_DIRECTORY_STRUCTURE -W -Wall -Wwrite-strings >> -Wcast-qual -Wstrict-prototypes -Wmissing-prototypes >> -Wold-style-definition -isystem ./include -fno-inline -g >> -DHAVE_GTHR_DEFAULT -DIN_LIBGCC2 -D__GCC_FLOAT_NOT_NEEDED -Dinhibit_libc >> -I. -I. -I../../.././gcc -I../../.././../gcc-4.5.4/libgcc >> -I../../.././../gcc-4.5.4/libgcc/. >> -I../../.././../gcc-4.5.4/libgcc/../gcc >> -I../../.././../gcc-4.5.4/libgcc/../include -DHAVE_CC_TLS -o _muldi3.o >> -MT _muldi3.o -MD -MP -MF _muldi3.dep -DL_muldi3 -c >> ../../.././../gcc-4.5.4/libgcc/../gcc/libgcc2.c \ >> >> In file included from ../../.././../gcc-4.5.4/libgcc/../gcc/tsystem.h:44:0, >> from ../../.././../gcc-4.5.4/libgcc/../gcc/libgcc2.c:29: >> /usr/ports/devel/cross-gcc/work/build/./gcc/include/stddef.h:59:24: >> fatal error: sys/_types.h: No such file or directory >> compilation terminated. >> gmake[4]: ** [_muldi3.o] Erro 1 > > Did you compile cross-binutils first? > > Check back next week. Work is being done on this port. >> >> >> Someone can give me a hint about what is happen? >> >> Thanks a lot >> -Otacilio >> _______________________________________________

Yes, I compiled cross-binutils first; it is a dependency of the port. Thank you!