Chapter 17. Storage

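17.1. Synopsis

This chapter covers the use of disks and storage media in FreeBSD. This includes SCSI and IDE disks, CD and DVD media, memory-backed disks, and USB storage devices.

After reading this chapter, you will know:

  • How to add additional hard disks to a FreeBSD system.

  • How to grow the size of a disk partition on FreeBSD.

  • How to configure FreeBSD to use USB storage devices.

  • How to use CD and DVD media on a FreeBSD system.

  • How to use the backup programs available under FreeBSD.

  • How to set up memory disks.

  • What file system snapshots are and how to use them efficiently.

  • How to use quotas to limit disk space usage.

  • How to encrypt disks and swap space to secure them against attackers.

  • How to configure a highly available storage network.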

Before reading this chapter, you should:

  • Know how to configure and install a new FreeBSD kernel.

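17.2. Adding Disks

This section describes how to add a new SATA disk to a machine that currently has only a single drive. First, turn off the computer and install the drive following the instructions of the computer, controller, and drive manufacturers. Reboot the system and become root.

Inspect /var/run/dmesg.boot to ensure the new disk was found. In this example, the newly added SATA drive appears as ada1.

For this example, a single large partition will be created on the new disk. The GPT partitioning scheme is used in preference to the older and less versatile MBR scheme.

If the disk to be added is not blank, old partition information can be removed with gpart delete. See gpart(8) for details.

The partition scheme is created first, and then a single partition is added. To improve performance on newer disks with larger hardware block sizes, the partition is aligned to one megabyte boundaries: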

# gpart create -s GPT ada1
# gpart add -t freebsd-ufs -a 1M ada1

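Depending on use, a smaller partition may be desired. See gpart(8) for options to create partitions smaller than a whole disk.

The disk partition information can be viewed with gpart show: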

% gpart show ada1
=>        34  1465146988  ada1  GPT  (699G)
          34        2014        - free -  (1.0M)
        2048  1465143296     1  freebsd-ufs  (699G)
  1465145344        1678        - free -  (839K)
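
The sector numbers in this listing can be checked by hand. The following is a minimal sketch, assuming 512-byte sectors (common, but an assumption about the disk in this example). The function names are illustrative and are not part of gpart:

```sh
#!/bin/sh
# Illustrative helpers; the sector size defaults to 512 bytes (an assumption).

# Print the size in whole GiB (rounded down) of a partition given in sectors.
sectors_to_gib() {
    sectors=$1
    sector_size=${2:-512}
    echo $(( sectors * sector_size / 1024 / 1024 / 1024 ))
}

# Succeed if a starting sector falls on the given byte boundary.
is_aligned() {
    start=$1
    boundary=$2
    sector_size=${3:-512}
    [ $(( start * sector_size % boundary )) -eq 0 ]
}

# Size of ada1p1 in GiB, rounded down.
sectors_to_gib 1465143296

# Sector 2048 * 512 bytes = 1048576 bytes, which is exactly 1 MB.
is_aligned 2048 1048576 && echo "ada1p1 starts on a 1 MB boundary"
```

The helper rounds down, while gpart show prints a rounded figure, so a result one below the 699G in the listing is expected.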

A file system is created in the new partition on the new disk:

# newfs -U /dev/ada1p1

An empty directory is created as a mountpoint, a location in the original disk's file system where the new disk will be mounted:

# mkdir /newdisk

Finally, an entry is added to /etc/fstab so the new disk is mounted automatically at startup:

/dev/ada1p1	/newdisk	ufs	rw	2	2

The new disk can also be mounted manually, without restarting the system:

# mount /newdisk
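
Each /etc/fstab entry has six whitespace-separated fields: device, mount point, file system type, options, dump frequency, and fsck pass number (see fstab(5)). As a rough sketch, the entry can be split into named fields before it is added to the file. The function name is illustrative:

```sh
#!/bin/sh
# Print the six fstab(5) fields of one entry with their names.
# Fails if the entry does not have exactly six fields.
describe_fstab_entry() {
    echo "$1" | awk 'NF != 6 { print "expected 6 fields, got " NF; exit 1 }
        { printf "device=%s mountpoint=%s fstype=%s options=%s dump=%s passno=%s\n",
                 $1, $2, $3, $4, $5, $6 }'
}

describe_fstab_entry "/dev/ada1p1 /newdisk ufs rw 2 2"
```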

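17.3. Resizing and Growing Disks

A disk's capacity can increase without any changes to the data already present. This happens commonly with virtual machines, when the virtual disk turns out to be too small and is enlarged. Sometimes a disk image is written to a USB memory stick but does not use the full capacity. This section describes how to resize or grow disk contents to take advantage of the increased capacity.

Determine the device name of the disk to be resized by inspecting /var/run/dmesg.boot. In this example, there is only one SATA disk in the system, so the drive appears as ada0.

List the partitions on the disk to see the current configuration: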

# gpart show ada0
=>      34  83886013  ada0  GPT  (48G) [CORRUPT]
        34       128     1  freebsd-boot  (64k)
       162  79691648     2  freebsd-ufs  (38G)
  79691810   4194236     3  freebsd-swap  (2G)
  83886046         1        - free -  (512B)

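If the disk was formatted with the GPT partitioning scheme, it may show as "corrupted" because the GPT backup partition table is no longer at the end of the drive. Fix the backup partition table with gpart: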

# gpart recover ada0
ada0 recovered

The additional space on the disk is now available for use by a new partition, or an existing partition can be expanded:

# gpart show ada0
=>       34  102399933  ada0  GPT  (48G)
         34        128     1  freebsd-boot  (64k)
        162   79691648     2  freebsd-ufs  (38G)
   79691810    4194236     3  freebsd-swap  (2G)
   83886046   18513921        - free -  (8.8G)

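Partitions can only be resized into contiguous free space. Here, the last partition on the disk is the swap partition, but the second partition is the one that needs to be resized. Swap partitions only contain temporary data, so the swap partition can safely be deactivated, deleted, and then recreated as the last partition after the second partition is resized.

Disable the swap partition: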

# swapoff /dev/ada0p3

Delete the third partition, specified by the -i flag, from the disk ada0:

# gpart delete -i 3 ada0
ada0p3 deleted
# gpart show ada0
=>       34  102399933  ada0  GPT  (48G)
         34        128     1  freebsd-boot  (64k)
        162   79691648     2  freebsd-ufs  (38G)
   79691810   22708157        - free -  (10G)

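There is a risk of data loss when modifying the partition table of a mounted file system. It is best to perform the following steps on an unmounted file system while running from a live CD-ROM or USB device. However, if absolutely necessary, the partition table of a mounted file system can be modified after disabling GEOM safety features: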

# sysctl kern.geom.debugflags=16

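Resize the partition, leaving room to recreate the swap partition. The partition to resize is specified with -i, and the new desired size with -s. Optionally, alignment of the partition is controlled with -a. This only modifies the size of the partition. The file system in the partition is expanded in a separate step.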

# gpart resize -i 2 -s 47G -a 4k ada0
ada0p2 resized
# gpart show ada0
=>       34  102399933  ada0  GPT  (48G)
         34        128     1  freebsd-boot  (64k)
        162   98566144     2  freebsd-ufs  (47G)
   98566306    3833661        - free -  (1.8G)
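
The sector counts in these listings can be reproduced with a little arithmetic. The following is a sketch assuming 512-byte sectors. The variable names are illustrative:

```sh
#!/bin/sh
# Values copied from the gpart show listings above; 512-byte sectors assumed.
sector_size=512
disk_start=34            # first usable sector
disk_sectors=102399933   # usable sectors after gpart recover
p2_start=162             # first sector of ada0p2
p2_old=79691648          # size of ada0p2 before the resize, in sectors

# Free space once the swap partition is deleted: everything after ada0p2.
free_after_delete=$(( disk_start + disk_sectors - (p2_start + p2_old) ))

# A 47G partition expressed in sectors.
p2_new=$(( 47 * 1024 * 1024 * 1024 / sector_size ))

# Space left for the new swap partition once ada0p2 has grown.
left_for_swap=$(( free_after_delete - (p2_new - p2_old) ))

echo "$free_after_delete $p2_new $left_for_swap"
```

The three values correspond to the free space shown after the swap partition is deleted, the new size of ada0p2, and the free space remaining after the resize.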

Recreate the swap partition and activate it. If no size is specified with -s, all remaining space is used:

# gpart add -t freebsd-swap -a 4k ada0
ada0p3 added
# gpart show ada0
=>       34  102399933  ada0  GPT  (48G)
         34        128     1  freebsd-boot  (64k)
        162   98566144     2  freebsd-ufs  (47G)
   98566306    3833661     3  freebsd-swap  (1.8G)
# swapon /dev/ada0p3

Grow the UFS file system to use the new capacity of the resized partition:

# growfs /dev/ada0p2
Device is mounted read-write; resizing will result in temporary write suspension for /.
It's strongly recommended to make a backup before growing the file system.
OK to grow file system on /dev/ada0p2, mounted on /, from 38GB to 47GB? [Yes/No] Yes
super-block backups (for fsck -b #) at:
 80781312, 82063552, 83345792, 84628032, 85910272, 87192512, 88474752,
 89756992, 91039232, 92321472, 93603712, 94885952, 96168192, 97450432

If the file system is ZFS, the resize is triggered by running the online subcommand with -e:

# zpool online -e zroot /dev/ada0p2

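Both the partition and the file system on it have now been resized to use the newly available disk space.

17.4. USB Storage Devices

Many external storage solutions, such as hard drives, USB thumbdrives, and CD and DVD burners, use the Universal Serial Bus (USB). FreeBSD provides support for USB 1.x, 2.0, and 3.0 devices.

USB 3.0 support is not compatible with some hardware, including Haswell (Lynx point) chipsets. If FreeBSD boots with a failed with error 19 message, disable xHCI/USB3 in the system BIOS.

Support for USB storage devices is built into the GENERIC kernel. For a custom kernel, be sure that the following lines are present in the kernel configuration file: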

device scbus	# SCSI bus (required for ATA/SCSI)
device da	# Direct Access (disks)
device pass	# Passthrough device (direct ATA/SCSI access)
device uhci	# provides USB 1.x support
device ohci	# provides USB 1.x support
device ehci	# provides USB 2.0 support
device xhci	# provides USB 3.0 support
device usb	# USB Bus (required)
device umass	# Disks/Mass storage - Requires scbus and da
device cd	# needed for CD and DVD burners

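FreeBSD uses the umass(4) driver, which uses the SCSI subsystem to access USB storage devices. Since any USB device is seen as a SCSI device by the system, if the USB device is a CD or DVD burner, do not include device atapicam in a custom kernel configuration file.

The rest of this section demonstrates how to verify that a USB storage device is recognized by FreeBSD and how to configure the device so that it can be used.

17.4.1. Device Configuration

To test the USB configuration, plug in the USB device. Use dmesg to confirm that the drive appears in the system message buffer. It should look something like this: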

umass0: <STECH Simple Drive, class 0/0, rev 2.00/1.04, addr 3> on usbus0
umass0:  SCSI over Bulk-Only; quirks = 0x0100
umass0:4:0:-1: Attached to scbus4
da0 at umass-sim0 bus 0 scbus4 target 0 lun 0
da0: <STECH Simple Drive 1.04> Fixed Direct Access SCSI-4 device
da0: Serial Number WD-WXE508CAN263
da0: 40.000MB/s transfers
da0: 152627MB (312581808 512 byte sectors: 255H 63S/T 19457C)
da0: quirks=0x2<NO_6_BYTE>

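The brand, device node (da0), speed, and size will differ according to the device.

Since the USB device is seen as a SCSI one, camcontrol can be used to list the USB storage devices attached to the system: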

# camcontrol devlist
<STECH Simple Drive 1.04>          at scbus4 target 0 lun 0 (pass3,da0)

Alternately, usbconfig can be used to list the device. Refer to usbconfig(8) for more information about this command.

# usbconfig
ugen0.3: <Simple Drive STECH> at usbus0, cfg=0 md=HOST spd=HIGH (480Mbps) pwr=ON (2mA)

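If the device has not been formatted, refer to Adding Disks for instructions on how to format and create partitions on the USB drive. If the drive comes with a file system, it can be mounted by root using the instructions in Mounting and Unmounting File Systems.

Allowing untrusted users to mount arbitrary media, by enabling vfs.usermount as described below, should not be considered safe from a security point of view. Most file systems were not built to safeguard against malicious devices.

To make the device mountable as a normal user, one solution is to make all users of the device members of the operator group using pw(8). Next, ensure that operator is able to read and write the device by adding these lines to /etc/devfs.rules: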

[localrules=5]
add path 'da*' mode 0660 group operator

If internal SCSI disks are also installed in the system, change the second line as follows:

add path 'da[3-9]*' mode 0660 group operator

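This excludes the first three SCSI disks (da0 to da2) from belonging to the operator group. Replace 3 with the number of internal SCSI disks. Refer to devfs.rules(5) for more information about this file.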
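
devfs path patterns are glob-style, so the effect of da[3-9]* can be previewed with a shell case statement. This is only a sketch of the matching, under the assumption that shell globbing and devfs matching agree for a pattern this simple. The function name is illustrative:

```sh
#!/bin/sh
# Report whether a device name would be matched by the pattern da[3-9]*.
matches_rule() {
    case "$1" in
        da[3-9]*) echo "$1: matched" ;;
        *)        echo "$1: not matched" ;;
    esac
}

for dev in da0 da2 da3 da3s1 da9 da10; do
    matches_rule "$dev"
done
```

Note that da10 and higher are not matched, because the character after "da" must be a digit from 3 to 9. Systems with many disks may need an additional rule.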

Next, enable the ruleset in /etc/rc.conf:

devfs_system_ruleset="localrules"

Then, instruct the system to allow regular users to mount file systems by adding the following line to /etc/sysctl.conf:

vfs.usermount=1

Since this only takes effect after the next reboot, use sysctl to set this variable now:

# sysctl vfs.usermount=1
vfs.usermount: 0 -> 1

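The final step is to create a directory where the file system is to be mounted. This directory needs to be owned by the user who is to mount the file system. One way to do that is for root to create a subdirectory owned by that user as /mnt/username. In the following example, replace username with the login name of the user and usergroup with the user's primary group: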

# mkdir /mnt/username
# chown username:usergroup /mnt/username

Suppose a USB thumbdrive is plugged in and a device /dev/da0s1 appears. If the device is formatted with a FAT file system, the user can mount it using:

% mount -t msdosfs -o -m=644,-M=755 /dev/da0s1 /mnt/username

Before the device can be unplugged, it must be unmounted:

% umount /mnt/username

After device removal, the system message buffer shows messages similar to the following:

umass0: at uhub3, port 2, addr 3 (disconnected)
da0 at umass-sim0 bus 0 scbus4 target 0 lun 0
da0: <STECH Simple Drive 1.04> s/n WD-WXE508CAN263          detached
(da0:umass-sim0:0:0:0): Periph destroyed

17.4.2. Automounting Removable Media

USB devices can be automatically mounted by uncommenting this line in /etc/auto_master:

/media		-media		-nosuid

Then add these lines to /etc/devd.conf:

notify 100 {
	match "system" "GEOM";
	match "subsystem" "DEV";
	action "/usr/sbin/automount -c";
};

If autofs(5) and devd(8) are already running, reload the configuration:

# service automount restart
# service devd restart

autofs(5) can be set to start at boot by adding this line to /etc/rc.conf:

autofs_enable="YES"

autofs(5) requires devd(8) to be enabled, as it is by default.

Start the services immediately with:

# service automount start
# service automountd start
# service autounmountd start
# service devd start

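Each file system that can be automatically mounted appears as a directory in /media/. The directory is named after the file system label. If the label is missing, the directory is named after the device node.

The file system is transparently mounted on first access and unmounted after a period of inactivity. Automounted drives can also be unmounted manually: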

# automount -fu

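This mechanism is typically used for memory cards and USB memory sticks. It can be used with any block device, including optical drives or iSCSI LUNs.

17.5. Creating and Using CD Media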

Compact Disc (CD) media provide a number of features that differentiate them from conventional disks. They are designed so that they can be read continuously without delays to move the head between tracks. While CD media do have tracks, these refer to a section of data to be read continuously, and not a physical property of the disk. The ISO 9660 file system was designed to deal with these differences.

The FreeBSD Ports Collection provides several utilities for burning and duplicating audio and data CDs. This chapter demonstrates the use of several command line utilities. For CD burning software with a graphical utility, consider installing the sysutils/xcdroast or sysutils/k3b packages or ports.

17.5.1. Supported Devices

The GENERIC kernel provides support for SCSI, USB, and ATAPI CD readers and burners. If a custom kernel is used, the options that need to be present in the kernel configuration file vary by the type of device.

For a SCSI burner, make sure these options are present:

device scbus	# SCSI bus (required for ATA/SCSI)
device da	# Direct Access (disks)
device pass	# Passthrough device (direct ATA/SCSI access)
device cd	# needed for CD and DVD burners

For a USB burner, make sure these options are present:

device scbus	# SCSI bus (required for ATA/SCSI)
device da	# Direct Access (disks)
device pass	# Passthrough device (direct ATA/SCSI access)
device cd	# needed for CD and DVD burners
device uhci	# provides USB 1.x support
device ohci	# provides USB 1.x support
device ehci	# provides USB 2.0 support
device xhci	# provides USB 3.0 support
device usb	# USB Bus (required)
device umass	# Disks/Mass storage - Requires scbus and da

For an ATAPI burner, make sure these options are present:

device ata	# Legacy ATA/SATA controllers
device scbus	# SCSI bus (required for ATA/SCSI)
device pass	# Passthrough device (direct ATA/SCSI access)
device cd	# needed for CD and DVD burners

On FreeBSD versions prior to 10.x, this line is also needed in the kernel configuration file if the burner is an ATAPI device:

device atapicam

Alternately, this driver can be loaded at boot time by adding the following line to /boot/loader.conf:

atapicam_load="YES"

This will require a reboot of the system as this driver can only be loaded at boot time.

To verify that FreeBSD recognizes the device, run dmesg and look for an entry for the device. On systems prior to 10.x, the device name in the first line of the output will be acd0 instead of cd0.

% dmesg | grep cd
cd0 at ahcich1 bus 0 scbus1 target 0 lun 0
cd0: <HL-DT-ST DVDRAM GU70N LT20> Removable CD-ROM SCSI-0 device
cd0: Serial Number M3OD3S34152
cd0: 150.000MB/s transfers (SATA 1.x, UDMA6, ATAPI 12bytes, PIO 8192bytes)
cd0: Attempt to query device size failed: NOT READY, Medium not present - tray closed

17.5.2. Burning a CD

In FreeBSD, cdrecord can be used to burn CDs. This command is installed with the sysutils/cdrtools package or port.

While cdrecord has many options, basic usage is simple. Specify the name of the ISO file to burn and, if the system has multiple burner devices, specify the name of the device to use:

# cdrecord dev=device imagefile.iso

To determine the device name of the burner, use -scanbus which might produce results like this:

# cdrecord -scanbus
ProDVD-ProBD-Clone 3.00 (amd64-unknown-freebsd10.0) Copyright (C) 1995-2010 Jörg Schilling
Using libscg version 'schily-0.9'
scsibus0:
        0,0,0     0) 'SEAGATE ' 'ST39236LW       ' '0004' Disk
        0,1,0     1) 'SEAGATE ' 'ST39173W        ' '5958' Disk
        0,2,0     2) *
        0,3,0     3) 'iomega  ' 'jaz 1GB         ' 'J.86' Removable Disk
        0,4,0     4) 'NEC     ' 'CD-ROM DRIVE:466' '1.26' Removable CD-ROM
        0,5,0     5) *
        0,6,0     6) *
        0,7,0     7) *
scsibus1:
        1,0,0   100) *
        1,1,0   101) *
        1,2,0   102) *
        1,3,0   103) *
        1,4,0   104) *
        1,5,0   105) 'YAMAHA  ' 'CRW4260         ' '1.0q' Removable CD-ROM
        1,6,0   106) 'ARTEC   ' 'AM12S           ' '1.06' Scanner
        1,7,0   107) *

Locate the entry for the CD burner and use the three numbers separated by commas as the value for dev. In this case, the Yamaha burner device is 1,5,0, so the appropriate input to specify that device is dev=1,5,0. Refer to the manual page for cdrecord for other ways to specify this value and for information on writing audio tracks and controlling the write speed.

Alternately, run the following command to get the device address of the burner:

# camcontrol devlist
<MATSHITA CDRW/DVD UJDA740 1.00>   at scbus1 target 0 lun 0 (cd0,pass0)

Use the numeric values for scbus, target, and lun. For this example, 1,0,0 is the device name to use.

17.5.3. Writing Data to an ISO File System

In order to produce a data CD, the data files that are going to make up the tracks on the CD must be prepared before they can be burned to the CD. In FreeBSD, sysutils/cdrtools installs mkisofs, which can be used to produce an ISO 9660 file system that is an image of a directory tree within a UNIX™ file system. The simplest usage is to specify the name of the ISO file to create and the path to the files to place into the ISO 9660 file system:

# mkisofs -o imagefile.iso /path/to/tree

This command maps the file names in the specified path to names that fit the limitations of the standard ISO 9660 file system, and will exclude files that do not meet the standard for ISO file systems.

A number of options are available to overcome the restrictions imposed by the standard. In particular, -R enables the Rock Ridge extensions common to UNIX™ systems and -J enables Joliet extensions used by Microsoft™ systems.

For CDs that are going to be used only on FreeBSD systems, -U can be used to disable all filename restrictions. When used with -R, it produces a file system image that is identical to the specified FreeBSD tree, even if it violates the ISO 9660 standard.

The last option of general use is -b. This is used to specify the location of a boot image for use in producing an "El Torito" bootable CD. This option takes an argument which is the path to a boot image from the top of the tree being written to the CD. By default, mkisofs creates an ISO image in "floppy disk emulation" mode, and thus expects the boot image to be exactly 1200, 1440 or 2880 KB in size. Some boot loaders, like the one used by the FreeBSD distribution media, do not use emulation mode. In this case, -no-emul-boot should be used. So, if /tmp/myboot holds a bootable FreeBSD system with the boot image in /tmp/myboot/boot/cdboot, this command would produce /tmp/bootable.iso:

# mkisofs -R -no-emul-boot -b boot/cdboot -o /tmp/bootable.iso /tmp/myboot

The resulting ISO image can be mounted as a memory disk with:

# mdconfig -a -t vnode -f /tmp/bootable.iso -u 0
# mount -t cd9660 /dev/md0 /mnt

One can then verify that /mnt and /tmp/myboot are identical.
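
One way to perform this comparison is a recursive diff. The following is a minimal sketch with an illustrative function name. It assumes the image is still mounted on /mnt as shown above:

```sh
#!/bin/sh
# Succeed if two directory trees contain the same files with the same contents.
trees_identical() {
    diff -r "$1" "$2" >/dev/null
}

# Example with the paths from the text (requires the image mounted on /mnt):
# trees_identical /mnt /tmp/myboot && echo "image matches the source tree"
```

Files that mkisofs itself adds to the image, such as a boot catalog, may show up as differences. Running diff -r without the redirection lists them.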

There are many other options available for mkisofs to fine-tune its behavior. Refer to mkisofs(8) for details.

It is possible to copy a data CD to an image file that is functionally equivalent to the image file created with mkisofs. To do so, use dd with the device name as the input file and the name of the ISO to create as the output file:

# dd if=/dev/cd0 of=file.iso bs=2048

The resulting image file can be burned to CD as described in Burning a CD.

17.5.4. Using Data CDs

Once an ISO has been burned to a CD, it can be mounted by specifying the file system type, the name of the device containing the CD, and an existing mount point:

# mount -t cd9660 /dev/cd0 /mnt

Since mount assumes that a file system is of type ufs, an Incorrect super block error will occur if -t cd9660 is not included when mounting a data CD.

While any data CD can be mounted this way, disks with certain ISO 9660 extensions might behave oddly. For example, Joliet disks store all filenames in two-byte Unicode characters. If some non-English characters show up as question marks, specify the local charset with -C. For more information, refer to mount_cd9660(8).

In order to do this character conversion with the help of -C, the kernel requires the cd9660_iconv.ko module to be loaded. This can be done either by adding this line to loader.conf:

cd9660_iconv_load="YES"

and then rebooting the machine, or by directly loading the module with kldload.

Occasionally, Device not configured will be displayed when trying to mount a data CD. This usually means that the CD drive has not detected a disk in the tray, or that the drive is not visible on the bus. It can take a couple of seconds for a CD drive to detect media, so be patient.

Sometimes, a SCSI CD drive may be missed because it did not have enough time to answer the bus reset. To resolve this, a custom kernel can be created which increases the default SCSI delay. Add the following option to the custom kernel configuration file and rebuild the kernel using the instructions in Building and Installing a Custom Kernel:

options SCSI_DELAY=15000

This tells the SCSI bus to pause 15 seconds during boot, to give the CD drive every possible chance to answer the bus reset.

It is possible to burn a file directly to CD, without creating an ISO 9660 file system. This is known as burning a raw data CD and some people do this for backup purposes.

This type of disk can not be mounted as a normal data CD. In order to retrieve the data burned to such a CD, the data must be read from the raw device node. For example, this command will extract a compressed tar file located on the second CD device into the current working directory:

# tar xzvf /dev/cd1

In order to mount a data CD, the data must be written using mkisofs.

17.5.5. Duplicating Audio CDs

To duplicate an audio CD, extract the audio data from the CD to a series of files, then write these files to a blank CD.

Duplicating an Audio CD describes how to duplicate and burn an audio CD. If the FreeBSD version is less than 10.0 and the device is ATAPI, the atapicam module must first be loaded using the instructions in Supported Devices.

Procedure: Duplicating an Audio CD

  1. The sysutils/cdrtools package or port installs cdda2wav. This command can be used to extract all of the audio tracks, with each track written to a separate WAV file in the current working directory:

    % cdda2wav -vall -B -Owav

    A device name does not need to be specified if there is only one CD device on the system. Refer to the cdda2wav manual page for instructions on how to specify a device and to learn more about the other options available for this command.

  2. Use cdrecord to write the .wav files:

    % cdrecord -v dev=2,0 -dao -useinfo  *.wav

    Make sure that 2,0 is set appropriately, as described in Burning a CD.

17.6. Creating and Using DVD Media

Compared to the CD, the DVD is the next generation of optical media storage technology. The DVD can hold more data than any CD and is the standard for video publishing.

Five physical recordable formats can be defined for a recordable DVD:

  • DVD-R: This was the first DVD recordable format available. The DVD-R standard is defined by the DVD Forum. This format is write once.

  • DVD-RW: This is the rewritable version of the DVD-R standard. A DVD-RW can be rewritten about 1000 times.

  • DVD-RAM: This is a rewritable format which can be seen as a removable hard drive. However, this media is not compatible with most DVD-ROM drives and DVD-Video players as only a few DVD writers support the DVD-RAM format. Refer to Using a DVD-RAM for more information on DVD-RAM use.

  • DVD+RW: This is a rewritable format defined by the DVD+RW Alliance. A DVD+RW can be rewritten about 1000 times.

  • DVD+R: This format is the write once variation of the DVD+RW format.

A single layer recordable DVD can hold up to 4,700,000,000 bytes, which is about 4.38 GB or 4482 MB when 1 kilobyte is counted as 1024 bytes.
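
The conversion can be reproduced with awk. This is a small sketch with an illustrative function name. It treats 1 KB as 1024 bytes, as the text does:

```sh
#!/bin/sh
# Convert a byte count to binary units (1 KB = 1024 bytes).
bytes_to_binary_units() {
    awk -v b="$1" 'BEGIN {
        gb = 1024 * 1024 * 1024
        mb = 1024 * 1024
        printf "%.2f GB, %.0f MB\n", b / gb, b / mb
    }'
}

bytes_to_binary_units 4700000000
```

The MB figure comes out slightly below 4.38 × 1024 because 4.38 is itself a rounded value.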

A distinction must be made between the physical media and the application. For example, a DVD-Video is a specific file layout that can be written on any recordable DVD physical media such as DVD-R, DVD+R, or DVD-RW. Before choosing the type of media, ensure that both the burner and the DVD-Video player are compatible with the media under consideration.

17.6.1. Configuration

To perform DVD recording, use growisofs(1). This command is part of the sysutils/dvd+rw-tools utilities which support all DVD media types.

These tools use the SCSI subsystem to access the devices, therefore ATAPI/CAM support must be loaded or statically compiled into the kernel. This support is not needed if the burner uses the USB interface. Refer to USB Storage Devices for more details on USB device configuration.

DMA access must also be enabled for ATAPI devices, by adding the following line to /boot/loader.conf:

hw.ata.atapi_dma="1"

Before attempting to use dvd+rw-tools, consult the Hardware Compatibility Notes.

For a graphical user interface, consider using sysutils/k3b which provides a user friendly interface to growisofs(1) and many other burning tools.

17.6.2. Burning Data DVDs

Since growisofs(1) is a front-end to mkisofs, it will invoke mkisofs(8) to create the file system layout and perform the write on the DVD. This means that an image of the data does not need to be created before the burning process.

To burn the data in /path/to/data to a DVD+R or a DVD-R, use the following command:

# growisofs -dvd-compat -Z /dev/cd0 -J -R /path/to/data

In this example, -J -R is passed to mkisofs(8) to create an ISO 9660 file system with Joliet and Rock Ridge extensions. Refer to mkisofs(8) for more details.

For the initial session recording, -Z is used for both single and multiple sessions. Replace /dev/cd0 with the name of the DVD device. Using -dvd-compat indicates that the disk will be closed and that the recording will be unappendable. This should also provide better media compatibility with DVD-ROM drives.

To burn a pre-mastered image, such as imagefile.iso, use:

# growisofs -dvd-compat -Z /dev/cd0=imagefile.iso

The write speed should be detected and automatically set according to the media and the drive being used. To force the write speed, use -speed=. Refer to growisofs(1) for example usage.

In order to support working files larger than 4.38GB, a UDF/ISO-9660 hybrid file system must be created by passing -udf -iso-level 3 to mkisofs(8) and all related programs, such as growisofs(1). This is required only when creating an ISO image file or when writing files directly to a disk. Since a disk created this way must be mounted as a UDF file system with mount_udf(8), it will be usable only on a UDF aware operating system. Otherwise it will look as if it contains corrupted files.

To create this type of ISO file:

% mkisofs -R -J -udf -iso-level 3 -o imagefile.iso /path/to/data

To burn files directly to a disk:

# growisofs -dvd-compat -udf -iso-level 3 -Z /dev/cd0 -J -R /path/to/data

When an ISO image already contains large files, no additional options are required for growisofs(1) to burn that image on a disk.

Be sure to use an up-to-date version of sysutils/cdrtools, which contains mkisofs(8), as an older version may not contain large files support. If the latest version does not work, install sysutils/cdrtools-devel and read its mkisofs(8).

17.6.3. Burning a DVD-Video

A DVD-Video is a specific file layout based on the ISO 9660 and micro-UDF (M-UDF) specifications. Since DVD-Video presents a specific data structure hierarchy, a particular program such as multimedia/dvdauthor is needed to author the DVD.

If an image of the DVD-Video file system already exists, it can be burned in the same way as any other image. If dvdauthor was used to make the DVD and the result is in /path/to/video, the following command should be used to burn the DVD-Video:

# growisofs -Z /dev/cd0 -dvd-video /path/to/video

-dvd-video is passed to mkisofs(8) to instruct it to create a DVD-Video file system layout. This option implies the -dvd-compat growisofs(1) option.

17.6.4. Using a DVD+RW

Unlike CD-RW, a virgin DVD+RW needs to be formatted before first use. It is recommended to let growisofs(1) take care of this automatically whenever appropriate. However, it is possible to use dvd+rw-format to format the DVD+RW:

# dvd+rw-format /dev/cd0

Only perform this operation once and keep in mind that only virgin DVD+RW media need to be formatted. Once formatted, the DVD+RW can be burned as usual.

To burn a totally new file system and not just append some data onto a DVD+RW, the media does not need to be blanked first. Instead, write over the previous recording like this:

# growisofs -Z /dev/cd0 -J -R /path/to/newdata

The DVD+RW format supports appending data to a previous recording. This operation consists of merging a new session to the existing one as it is not considered to be multi-session writing. growisofs(1) will grow the ISO 9660 file system present on the media.

For example, to append data to a DVD+RW, use the following:

# growisofs -M /dev/cd0 -J -R /path/to/nextdata

The same mkisofs(8) options used to burn the initial session should be used during next writes.

Use -dvd-compat for better media compatibility with DVD-ROM drives. When using DVD+RW, this option will not prevent the addition of data.

To blank the media, use:

# growisofs -Z /dev/cd0=/dev/zero

17.6.5. Using a DVD-RW

A DVD-RW accepts two disc formats: incremental sequential and restricted overwrite. By default, DVD-RW discs are in sequential format.

A virgin DVD-RW can be directly written without being formatted. However, a non-virgin DVD-RW in sequential format needs to be blanked before writing a new initial session.

To blank a DVD-RW in sequential mode:

# dvd+rw-format -blank=full /dev/cd0

A full blanking using -blank=full will take about one hour on a 1x media. A fast blanking can be performed using -blank, if the DVD-RW will be recorded in Disk-At-Once (DAO) mode. To burn the DVD-RW in DAO mode, use the command:

# growisofs -use-the-force-luke=dao -Z /dev/cd0=imagefile.iso

Since growisofs(1) automatically attempts to detect fast blanked media and engage DAO write, -use-the-force-luke=dao should not be required.

One should instead use restricted overwrite mode with any DVD-RW as this format is more flexible than the default of incremental sequential.

To write data on a sequential DVD-RW, use the same instructions as for the other DVD formats:

# growisofs -Z /dev/cd0 -J -R /path/to/data

To append some data to a previous recording, use -M with growisofs(1). However, if data is appended on a DVD-RW in incremental sequential mode, a new session will be created on the disc and the result will be a multi-session disc.

A DVD-RW in restricted overwrite format does not need to be blanked before a new initial session. Instead, overwrite the disc with -Z. It is also possible to grow an existing ISO 9660 file system written on the disc with -M. The result will be a one-session DVD.

To put a DVD-RW in restricted overwrite format, the following command must be used:

# dvd+rw-format /dev/cd0

To change back to sequential format, use:

# dvd+rw-format -blank=full /dev/cd0

17.6.6. Multi-Session

Few DVD-ROM drives support multi-session DVDs and most of the time only read the first session. DVD+R, DVD-R and DVD-RW in sequential format can accept multiple sessions. The notion of multiple sessions does not exist for the DVD+RW and the DVD-RW restricted overwrite formats.

Using the following command after an initial non-closed session on a DVD+R, DVD-R, or DVD-RW in sequential format, will add a new session to the disc:

# growisofs -M /dev/cd0 -J -R /path/to/nextdata

Using this command with a DVD+RW or a DVD-RW in restricted overwrite mode will append data while merging the new session to the existing one. The result will be a single-session disc. Use this method to add data after an initial write on these types of media.

Since some space on the media is used between each session to mark the end and start of sessions, one should add sessions with a large amount of data to optimize media space. The number of sessions is limited to 154 for a DVD+R, about 2000 for a DVD-R, and 127 for a DVD+R Double Layer.

17.6.7. For More Information

To obtain more information about a DVD, use dvd+rw-mediainfo /dev/cd0 while the disc is in the specified drive.

More information about dvd+rw-tools can be found in growisofs(1), on the dvd+rw-tools web site, and in the cdwrite mailing list archives.

When creating a problem report related to the use of dvd+rw-tools, always include the output of dvd+rw-mediainfo.

17.6.8. Using a DVD-RAM

DVD-RAM writers can use either a SCSI or ATAPI interface. For ATAPI devices, DMA access has to be enabled by adding the following line to /boot/loader.conf:

hw.ata.atapi_dma="1"

A DVD-RAM can be seen as a removable hard drive. Like any other hard drive, the DVD-RAM must be formatted before it can be used. In this example, the whole disk space will be formatted with a standard UFS2 file system:

# dd if=/dev/zero of=/dev/acd0 bs=2k count=1
# bsdlabel -Bw acd0
# newfs /dev/acd0

The DVD device, acd0, must be changed according to the configuration.

Once the DVD-RAM has been formatted, it can be mounted as a normal hard drive:

# mount /dev/acd0 /mnt

Once mounted, the DVD-RAM will be both readable and writeable.

17.7. Creating and Using Floppy Disks

This section explains how to format a 3.5 inch floppy disk in FreeBSD.

Procedure: Steps to Format a Floppy

A floppy disk needs to be low-level formatted before it can be used. This is usually done by the vendor, but formatting is a good way to check media integrity. To low-level format the floppy disk on FreeBSD, use fdformat(1). When using this utility, make note of any error messages, as these can help determine if the disk is good or bad.

  1. To format the floppy, insert a new 3.5 inch floppy disk into the first floppy drive and issue:

    # /usr/sbin/fdformat -f 1440 /dev/fd0

  2. After low-level formatting the disk, create a disk label as it is needed by the system to determine the size of the disk and its geometry. The supported geometry values are listed in /etc/disktab.

    To write the disk label, use bsdlabel(8):

    # /sbin/bsdlabel -B -w /dev/fd0 fd1440

  3. The floppy is now ready to be high-level formatted with a file system. The floppy’s file system can be either UFS or FAT, where FAT is generally a better choice for floppies.

    To format the floppy with FAT, issue:

    # /sbin/newfs_msdos /dev/fd0

The disk is now ready for use. To use the floppy, mount it with mount_msdosfs(8). One can also install and use emulators/mtools from the Ports Collection.

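17.8. Backup Basics

Implementing a backup plan is essential in order to be able to recover from disk failure, accidental file deletion, random file corruption, or complete machine destruction, including destruction of on-site backups.

The backup type and schedule will vary, depending upon the importance of the data, the granularity needed for file restores, and the amount of acceptable downtime. Some possible backup techniques include:

  • Archives of the whole system, backed up onto permanent, off-site media. This provides protection against all of the problems listed above, but is slow and inconvenient to restore from, especially for non-privileged users.

  • File system snapshots, which are useful for restoring deleted files or previous versions of files.

  • Copies of whole file systems or disks which are synchronized with another system on the network using a scheduled net/rsync.

  • Hardware or software RAID, which minimizes or avoids downtime when a disk fails.

Typically, a mix of backup techniques is used. For example, one could create a schedule to automate a weekly, full system backup that is stored off-site and supplement this backup with hourly ZFS snapshots. In addition, one could make a manual backup of individual directories or files before editing or deleting files.

This section describes some of the utilities which can be used to create and manage backups on a FreeBSD system.

17.8.1. File System Backups

The traditional UNIX™ programs for backing up a file system are dump(8), which creates the backup, and restore(8), which restores the backup. These utilities work at the disk block level, below the abstractions of the files, links, and directories that are created by file systems. Unlike other backup software, dump backs up an entire file system and is unable to back up only part of a file system or a directory tree that spans multiple file systems. Instead of writing files and directories, dump writes the raw data blocks that comprise files and directories.

If dump is used on the root directory, it will not back up /home, /usr, or many other directories, since these are typically mount points for other file systems or symbolic links into those file systems.

When used to restore data, restore stores temporary files in /tmp/ by default. When using a recovery disk with a small /tmp, set TMPDIR to a directory with more free space for the restore to succeed.

When using dump, be aware that some quirks remain from its early days in Version 6 of AT&T UNIX™, circa 1975. The default parameters assume a backup to a 9-track tape, rather than to another type of media or to the high-density tapes available today. These defaults must be overridden on the command line.

It is possible to back up a file system across the network to another system or to a tape drive attached to another computer. While the rdump(8) and rrestore(8) utilities can be used for this purpose, they are not considered to be secure.

Instead, one can use dump and restore in a more secure fashion over an SSH connection. This example creates a full, compressed backup of /usr and sends the backup file to the specified host over an SSH connection.

Example 1. Using dump over ssh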
# /sbin/dump -0uan -f - /usr | gzip -2 | ssh \
          targetuser@targetmachine.example.com dd of=/mybigfiles/dump-usr-l0.gz

This example sets RSH in order to write the backup to a tape drive on a remote system over an SSH connection:

Example 2. Using dump over ssh with RSH Set
# env RSH=/usr/bin/ssh /sbin/dump -0uan -f targetuser@targetmachine.example.com:/dev/sa0 /usr

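17.8.2. Directory Backups

Several built-in utilities are available for backing up and restoring specified files and directories as needed.

A good choice for making a backup of all of the files in a directory is tar(1). This utility dates back to Version 6 of AT&T UNIX™ and by default assumes a recursive backup to a local tape device. Switches can be used to instead specify the name of a backup file.

This example creates a compressed backup of the current directory and saves it to /tmp/mybackup.tgz. When creating a backup file, make sure that the backup is not saved to the same directory that is being backed up.

Example 3. Backing Up the Current Directory with tar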
# tar czvf /tmp/mybackup.tgz .

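To restore the entire backup, cd into the directory to restore into and specify the name of the backup. Note that this will overwrite any newer versions of files in the restore directory. When in doubt, restore to a temporary directory or specify the name of the file within the backup to restore.

Example 4. Restoring the Current Directory with tar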
# tar xzvf /tmp/mybackup.tgz

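There are dozens of available switches which are described in tar(1). This utility also supports the use of exclude patterns to specify which files should not be included when backing up the specified directory or restoring files from a backup.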
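
The exclude patterns and the advice to restore into a temporary directory can be tried out safely in a scratch area. The following is a sketch only: the file names and the *.log pattern are made up for illustration, and everything happens under a directory created by mktemp(1).

```shell
#!/bin/sh
# Work in a scratch directory so that no real data is touched.
work=$(mktemp -d)
mkdir -p "$work/src/logs" "$work/restore"
echo "config" > "$work/src/app.conf"
echo "noise"  > "$work/src/logs/debug.log"

# Back up src, skipping anything that matches *.log.
tar -czf "$work/backup.tgz" --exclude='*.log' -C "$work/src" .

# List the archive to confirm what was stored.
tar -tzf "$work/backup.tgz"

# Restore a single named file into a temporary directory instead of
# extracting over the live copy.
tar -xzf "$work/backup.tgz" -C "$work/restore" ./app.conf
```

Listing the archive before extracting shows the exact member names, such as ./app.conf, that can be passed to tar when restoring individual files.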

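To create a backup using a specified list of files and directories, cpio(1) is a good choice. Unlike tar, cpio does not know how to walk the directory tree, so it must be provided the list of files to back up.

For example, a list of files can be created using ls or find. This example creates a recursive listing of the current directory which is then piped to cpio in order to create a backup file named /tmp/mybackup.cpio.

Example 5. Using ls and cpio to Make a Recursive Backup of the Current Directory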
# ls -R | cpio -ovF /tmp/mybackup.cpio

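A backup utility which tries to bridge the features provided by tar and cpio is pax(1). Over the years, the various versions of tar and cpio became slightly incompatible. POSIX™ created pax, which attempts to read and write many of the various cpio and tar formats, plus new formats of its own.

The pax equivalent to the previous examples would be:

Example 6. Backing Up the Current Directory with pax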
# pax -wf /tmp/mybackup.pax .

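17.8.3. Using Data Tapes for Backups

While tape technology has continued to evolve, modern backup systems tend to combine off-site backups with local removable media. FreeBSD supports any tape drive that uses SCSI, such as LTO or DAT. There is limited support for SATA and USB tape drives.

For SCSI tape devices, FreeBSD uses the sa(4) driver and the /dev/sa0, /dev/nsa0, and /dev/esa0 devices. The physical device name is /dev/sa0. When /dev/nsa0 is used, the backup application will not rewind the tape after writing a file, which allows writing more than one file to a tape. Using /dev/esa0 ejects the tape after the device is closed.

In FreeBSD, mt is used to control operations of the tape drive, such as seeking through files on a tape or writing tape control marks to the tape. For example, the first three files on a tape can be preserved by skipping past them before writing a new file: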

# mt -f /dev/nsa0 fsf 3

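This utility supports many operations. Refer to mt(1) for details.

To write a single file to tape using tar, specify the name of the tape device and the file to back up: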

# tar cvf /dev/sa0 file

To recover files from a tar archive on tape into the current directory:

# tar xvf /dev/sa0

To back up a UFS file system, use dump. This example backs up /usr without rewinding the tape when finished:

# dump -0aL -b64 -f /dev/nsa0 /usr

To interactively restore files from a dump file on tape into the current directory:

# restore -i -f /dev/nsa0

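17.8.4. Third-Party Backup Utilities

The FreeBSD Ports Collection provides many third-party utilities which can be used to schedule the creation of backups, simplify tape backup, and make backups easier and more convenient. Many of these applications are client/server based and can be used to automate the backups of a single system or all of the computers in a network.

Popular utilities include Amanda, Bacula, rsync, and duplicity.

17.8.5. Emergency Recovery

In addition to regular backups, it is recommended to perform the following steps as part of an emergency preparedness plan.

Create a print copy of the output of the following commands: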

  • gpart show

  • more /etc/fstab

  • dmesg

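Store this printout and a copy of the installation media in a secure location. Should an emergency restore be needed, boot into the installation media and select Live CD to access a rescue shell. This rescue mode can be used to view the current state of the system and, if needed, to reformat disks and restore data from backups.

The installation media for FreeBSD/i386 11.2-RELEASE does not include a rescue shell. For this version, instead download and burn a Livefs CD image from ftp://ftp.FreeBSD.org/pub/FreeBSD/releases/i386/ISO-IMAGES/11.2/FreeBSD-11.2-RELEASE-i386-livefs.iso.

Next, test the rescue shell and the backups. Make notes of the procedure and store these notes with the media, the printouts, and the backups. These notes may prevent the inadvertent destruction of the backups while under the stress of performing an emergency recovery.

For an added measure of security, store the latest backup at a remote location which is physically separated from the computers and disk drives by a significant distance.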
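
Collecting the command output listed above can be scripted so that the printout is easy to refresh. The sketch below is one possible approach: the function name save_recovery_info and the report layout are hypothetical, and cat is used in place of more because the output goes to a file. Commands that are missing on the current system are noted rather than treated as fatal.

```shell
#!/bin/sh
# save_recovery_info FILE: write the output of gpart show, /etc/fstab and
# dmesg to FILE so that it can be printed and stored with the backups.
# (The function name and the report layout are hypothetical.)
save_recovery_info() {
    report=$1
    : > "$report" || return 1
    for cmd in "gpart show" "cat /etc/fstab" "dmesg"; do
        printf '===== %s =====\n' "$cmd" >> "$report"
        # ${cmd%% *} is the command name without its arguments.
        if command -v "${cmd%% *}" >/dev/null 2>&1; then
            # $cmd is left unquoted on purpose so it splits into words.
            $cmd >> "$report" 2>&1 || echo "(command failed)" >> "$report"
        else
            echo "(command not available on this system)" >> "$report"
        fi
        echo >> "$report"
    done
}
```

Running save_recovery_info /root/recovery-info.txt as root and printing the resulting file gives a single document to keep with the installation media.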

17.9. Memory Disks

In addition to physical disks, FreeBSD also supports the creation and use of memory disks. One possible use for a memory disk is to access the contents of an ISO file system without the overhead of first burning it to a CD or DVD, then mounting the CD/DVD media.

In FreeBSD, the md(4) driver is used to provide support for memory disks. The GENERIC kernel includes this driver. When using a custom kernel configuration file, ensure it includes this line:

device md

17.9.1. Attaching and Detaching Existing Images

To mount an existing file system image, use mdconfig to specify the name of the ISO file and a free unit number. Then, refer to that unit number to mount it on an existing mount point. Once mounted, the files in the ISO will appear in the mount point. This example attaches diskimage.iso to the memory device /dev/md0 then mounts that memory device on /mnt:

# mdconfig -f diskimage.iso -u 0
# mount -t cd9660 /dev/md0 /mnt

Notice that -t cd9660 was used to mount an ISO format. If a unit number is not specified with -u, mdconfig will automatically allocate an unused memory device and output the name of the allocated unit, such as md4. Refer to mdconfig(8) for more details about this command and its options.

When a memory disk is no longer in use, its resources should be released back to the system. First, unmount the file system, then use mdconfig to detach the disk from the system and release its resources. To continue this example:

# umount /mnt
# mdconfig -d -u 0

To determine if any memory disks are still attached to the system, type mdconfig -l.
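
The attach and detach steps above always come in pairs, so they can be wrapped in two small functions. This is a sketch under stated assumptions: the function names attach_iso and detach_iso and the RUN variable are hypothetical. RUN defaults to echo, which makes the script a dry run that only prints the commands; the mdconfig, mount, and umount invocations themselves are the ones shown in this section.

```shell
#!/bin/sh
# RUN is prepended to every command. It defaults to "echo", so the script
# only prints what it would do. Set RUN="" as root on FreeBSD to execute.
RUN=${RUN-echo}

# attach_iso IMAGE UNIT MOUNTPOINT: attach IMAGE to /dev/mdUNIT, then mount it.
attach_iso() {
    $RUN mdconfig -f "$1" -u "$2" &&
    $RUN mount -t cd9660 "/dev/md$2" "$3"
}

# detach_iso UNIT MOUNTPOINT: unmount first, then release the memory disk.
detach_iso() {
    $RUN umount "$2" &&
    $RUN mdconfig -d -u "$1"
}

attach_iso diskimage.iso 0 /mnt
detach_iso 0 /mnt
```

Keeping the unmount before the detach in one function makes it harder to release a memory disk that is still mounted.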

17.9.2. Creating a File- or Memory-Backed Memory Disk

FreeBSD also supports memory disks where the storage to use is allocated from either a hard disk or an area of memory. The first method is commonly referred to as a file-backed file system and the second method as a memory-backed file system. Both types can be created using mdconfig.

To create a new memory-backed file system, specify a type of swap and the size of the memory disk to create. Then, format the memory disk with a file system and mount as usual. This example creates a 5M memory disk on unit 1. That memory disk is then formatted with the UFS file system before it is mounted:

# mdconfig -a -t swap -s 5m -u 1
# newfs -U md1
/dev/md1: 5.0MB (10240 sectors) block size 16384, fragment size 2048
        using 4 cylinder groups of 1.27MB, 81 blks, 192 inodes.
        with soft updates
super-block backups (for fsck -b #) at:
 160, 2752, 5344, 7936
# mount /dev/md1 /mnt
# df /mnt
Filesystem 1K-blocks Used Avail Capacity  Mounted on
/dev/md1        4718    4  4338     0%    /mnt

To create a new file-backed memory disk, first allocate an area of disk to use. This example creates an empty 5MB file named newimage:

# dd if=/dev/zero of=newimage bs=1k count=5k
5120+0 records in
5120+0 records out

Next, attach that file to a memory disk, label the memory disk and format it with the UFS file system, mount the memory disk, and verify the size of the file-backed disk:

# mdconfig -f newimage -u 0
# bsdlabel -w md0 auto
# newfs -U md0a
/dev/md0a: 5.0MB (10224 sectors) block size 16384, fragment size 2048
        using 4 cylinder groups of 1.25MB, 80 blks, 192 inodes.
super-block backups (for fsck -b #) at:
 160, 2720, 5280, 7840
# mount /dev/md0a /mnt
# df /mnt
Filesystem 1K-blocks Used Avail Capacity  Mounted on
/dev/md0a       4710    4  4330     0%    /mnt

It takes several commands to create a file- or memory-backed file system using mdconfig. FreeBSD also comes with mdmfs which automatically configures a memory disk, formats it with the UFS file system, and mounts it. For example, after creating newimage with dd, this one command is equivalent to running the bsdlabel, newfs, and mount commands shown above:

# mdmfs -F newimage -s 5m md0 /mnt

To instead create a new memory-based memory disk with mdmfs, use this one command:

# mdmfs -s 5m md1 /mnt

If the unit number is not specified, mdmfs will automatically select an unused memory device. For more details about mdmfs, refer to mdmfs(8).

17.10. File System Snapshots

FreeBSD offers a feature in conjunction with Soft Updates: file system snapshots.

UFS snapshots allow a user to create images of specified file systems, and treat them as a file. Snapshot files must be created in the file system that the action is performed on, and a user may create no more than 20 snapshots per file system. Active snapshots are recorded in the superblock so they are persistent across unmount and remount operations along with system reboots. When a snapshot is no longer required, it can be removed using rm(1). While snapshots may be removed in any order, all the used space may not be acquired because another snapshot will possibly claim some of the released blocks.

The un-alterable snapshot file flag is set by mksnap_ffs(8) after initial creation of a snapshot file. unlink(1) makes an exception for snapshot files since it allows them to be removed.

Snapshots are created using mount(8). To place a snapshot of /var in the file /var/snapshot/snap, use the following command:

# mount -u -o snapshot /var/snapshot/snap /var

Alternatively, use mksnap_ffs(8) to create the snapshot:

# mksnap_ffs /var /var/snapshot/snap

One can find snapshot files on a file system, such as /var, using find(1):

# find /var -flags snapshot

Once a snapshot has been created, it has several uses:

  • Some administrators will use a snapshot file for backup purposes, because the snapshot can be transferred to CDs or tape.

  • The file system integrity checker, fsck(8), may be run on the snapshot. Assuming that the file system was clean when it was mounted, this should always provide a clean and unchanging result.

  • Running dump(8) on the snapshot will produce a dump file that is consistent with the file system and the timestamp of the snapshot. dump(8) can also take a snapshot, create a dump image, and then remove the snapshot in one command by using -L.

  • The snapshot can be mounted as a frozen image of the file system. To mount(8) the snapshot /var/snapshot/snap run:

    # mdconfig -a -t vnode -o readonly -f /var/snapshot/snap -u 4
    # mount -r /dev/md4 /mnt

The frozen /var is now available through /mnt. Everything will initially be in the same state it was during the snapshot creation time. The only exception is that any earlier snapshots will appear as zero length files. To unmount the snapshot, use:

# umount /mnt
# mdconfig -d -u 4

For more information about softupdates and file system snapshots, including technical papers, visit Marshall Kirk McKusick’s website at http://www.mckusick.com/.

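17.11. Disk Quotas

Disk quotas can be used to limit the amount of disk space or the number of files a user or members of a group may allocate on a per-file system basis. This prevents one user or group of users from consuming all of the available disk space.

This section describes how to configure disk quotas for the UFS file system. To configure quotas on the ZFS file system, refer to Dataset, User, and Group Quotas.

17.11.1. Enabling Disk Quotas

To determine if the FreeBSD kernel provides support for disk quotas: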

% sysctl kern.features.ufs_quota
kern.features.ufs_quota: 1

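In this example, the 1 indicates quota support. If the value is instead 0, add the following line to a custom kernel configuration file and rebuild the kernel using the instructions in Configuring the FreeBSD Kernel: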

options QUOTA

Next, enable disk quotas in /etc/rc.conf:

quota_enable="YES"

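Normally on bootup, the quota integrity of each file system is checked by quotacheck(8). This program ensures that the data in the quota database properly reflects the data on the file system. This is a time-consuming process that will significantly affect the time the system takes to boot. To skip this step, add this variable to /etc/rc.conf: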

check_quotas="NO"

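Finally, edit /etc/fstab to enable disk quotas on a per-file system basis. To enable per-user quotas on a file system, add userquota to the options field in the /etc/fstab entry for that file system. For example: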

/dev/da1s2g   /home    ufs rw,userquota 1 2

To enable group quotas, use groupquota instead. To enable both user and group quotas, separate the options with a comma:

/dev/da1s2g    /home    ufs rw,userquota,groupquota 1 2

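By default, quota files are stored in the root directory of the file system as quota.user and quota.group. Refer to fstab(5) for more information. Specifying an alternate location for the quota files is not recommended.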
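
Before rebooting, it can be useful to confirm which /etc/fstab entries actually carry a quota option. The following sketch uses awk(1) to list them. The function name quota_filesystems is hypothetical, and the script only reads the file passed to it.

```shell
#!/bin/sh
# quota_filesystems FSTAB: print "mountpoint<TAB>quota-options" for every
# fstab entry whose options field contains userquota or groupquota.
# (quota_filesystems is a hypothetical helper name.)
quota_filesystems() {
    awk '
        /^[[:space:]]*#/ || NF < 4 { next }   # skip comments and short lines
        {
            n = split($4, opts, ",")
            found = ""
            for (i = 1; i <= n; i++)
                if (opts[i] ~ /^(userquota|groupquota)(=|$)/)
                    found = found (found ? "," : "") opts[i]
            if (found != "")
                printf "%s\t%s\n", $2, found
        }
    ' "$1"
}
```

Calling quota_filesystems /etc/fstab with the second example entry above would print /home followed by userquota,groupquota.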

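Once the configuration is complete, reboot the system and /etc/rc will automatically run the appropriate commands to create the initial quota files for all of the file systems that have quotas enabled in /etc/fstab.

In the normal course of operations, there should be no need to manually run quotacheck(8), quotaon(8), or quotaoff(8). However, one should read these manual pages to be familiar with their operation.

17.11.2. Setting Quota Limits

To verify that quotas are enabled, run: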

# quota -v

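There should be a one-line summary of disk usage and current quota limits for each file system that has quotas enabled.

The system is now ready to be assigned quota limits with edquota.

Several options are available to enforce limits on the amount of disk space a user or group may allocate, and how many files they may create. Allocations can be limited based on disk space (block quotas), number of files (inode quotas), or a combination of both. Each limit is further broken down into two categories: hard and soft limits.

A hard limit may not be exceeded. Once a user reaches a hard limit, no further allocations can be made on that file system by that user. For example, if the user has a hard limit of 500 KB on a file system and is currently using 490 KB, the user can only allocate an additional 10 KB. Attempting to allocate an additional 11 KB will fail.

Soft limits can be exceeded for a limited amount of time, known as the grace period, which is one week by default. If a user stays over their limit longer than the grace period, the soft limit turns into a hard limit and no further allocations are allowed. When the user drops back below the soft limit, the grace period is reset.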
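
The interaction of the two limits can be modelled in a few lines of shell arithmetic. This is a simplified illustration, not the kernel's algorithm: the function name check_write is hypothetical, sizes are in KB, and the soft limit of 400 KB in the sample calls is an assumed value added to the 500 KB hard limit example above.

```shell
#!/bin/sh
# check_write USED REQUEST SOFT HARD GRACE_EXPIRED
# Sizes are in KB. GRACE_EXPIRED is 1 once the grace period has run out.
# Prints "denied", "allowed (over soft limit)" or "allowed".
# (A simplified model for illustration only.)
check_write() {
    total=$(( $1 + $2 ))
    if [ "$total" -gt "$4" ]; then
        echo "denied"                      # the hard limit is never exceeded
    elif [ "$total" -gt "$3" ] && [ "$5" -eq 1 ]; then
        echo "denied"                      # soft limit enforced after grace
    elif [ "$total" -gt "$3" ]; then
        echo "allowed (over soft limit)"   # grace period is still running
    else
        echo "allowed"
    fi
}

check_write 490 10 400 500 0   # prints "allowed (over soft limit)"
check_write 490 11 400 500 0   # prints "denied"
```

The first call lands exactly on the hard limit and succeeds, while the second would need 501 KB and is refused, matching the 10 KB and 11 KB cases described above.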

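In the following example, the quota for the test account is being edited. When edquota is invoked, the editor specified by EDITOR is opened in order to edit the quota limits. The default editor is vi.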

# edquota -u test
Quotas for user test:
/usr: kbytes in use: 65, limits (soft = 50, hard = 75)
        inodes in use: 7, limits (soft = 50, hard = 60)
/usr/var: kbytes in use: 0, limits (soft = 50, hard = 75)
        inodes in use: 0, limits (soft = 50, hard = 60)

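There are normally two lines for each file system that has quotas enabled. One line represents the block limits and the other represents the inode limits. Change the values to modify the quota limits. For example, to raise the block limit on /usr to a soft limit of 500 and a hard limit of 600, change the values in that line as follows: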

/usr: kbytes in use: 65, limits (soft = 500, hard = 600)

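The new quota limits take effect upon exiting the editor.

Sometimes it is desirable to set quota limits on a range of users. This can be done by first assigning the desired quota limit to one user, then using -p to duplicate that quota to a specified range of user IDs (UIDs). The following command will duplicate those quota limits for UIDs 10,000 through 19,999: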

# edquota -p test 10000-19999

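For more information, refer to edquota(8).

17.11.3. Checking Quota Limits and Disk Usage

To check individual user or group quotas and disk usage, use quota(1). A user may only examine their own quota and the quota of a group they are a member of. Only the superuser may view all user and group quotas. To get a summary of all quotas and disk usage for file systems with quotas enabled, use repquota(8).

Normally, file systems that the user is not using any disk space on will not show in the output of quota, even if the user has a quota limit assigned for that file system. Use -v to display those file systems. The following is sample output from quota -v for a user that has quota limits on two file systems.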

Disk quotas for user test (uid 1002):
     Filesystem  usage    quota   limit   grace   files   quota   limit   grace
           /usr      65*     50      75   5days       7      50      60
       /usr/var       0      50      75               0      50      60

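In this example, the user is currently 15 KB over the soft limit of 50 KB on /usr and has 5 days of the grace period left. The asterisk * indicates that the user is currently over the quota limit.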
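
The asterisk makes this output easy to scan with a script. The sketch below reads quota -v output on standard input and reports block usage that is over the soft limit. It assumes the column layout of the sample above, and the function name over_quota is hypothetical. Only the block columns are examined, since the position of the file columns shifts when the grace field is empty.

```shell
#!/bin/sh
# over_quota: read `quota -v` output on standard input and report each
# file system whose block usage carries the "*" over-quota marker.
# (over_quota is a hypothetical helper name.)
over_quota() {
    awk '
        NR > 2 && $2 ~ /\*$/ {
            used = $2 + 0        # a value such as "65*" converts to 65
            printf "%s: %d KB over the soft limit of %d KB\n", $1, used - $3, $3
        }
    '
}
```

Used as quota -v | over_quota, the sample output above would produce a single line for /usr.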

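17.11.4. Quotas over NFS

Quotas are enforced by the quota subsystem on the NFS server. The rpc.rquotad(8) daemon makes quota information available to quota on NFS clients, allowing users on those machines to see their quota statistics.

On the NFS server, enable rpc.rquotad by removing the # from this line in /etc/inetd.conf: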

rquotad/1      dgram rpc/udp wait root /usr/libexec/rpc.rquotad rpc.rquotad

Then, restart inetd:

# service inetd restart

17.12. Encrypting Disk Partitions

FreeBSD offers excellent online protections against unauthorized data access. File permissions and Mandatory Access Control (MAC) help prevent unauthorized users from accessing data while the operating system is active and the computer is powered up. However, the permissions enforced by the operating system are irrelevant if an attacker has physical access to a computer and can move the computer’s hard drive to another system to copy and analyze the data.

Regardless of how an attacker may have come into possession of a hard drive or powered-down computer, the GEOM-based cryptographic subsystems built into FreeBSD are able to protect the data on the computer’s file systems against even highly-motivated attackers with significant resources. Unlike encryption methods that encrypt individual files, the built-in gbde and geli utilities can be used to transparently encrypt entire file systems. No cleartext ever touches the hard drive’s platter.

This chapter demonstrates how to create an encrypted file system on FreeBSD. It first demonstrates the process using gbde and then demonstrates the same example using geli.

17.12.1. Disk Encryption with gbde

The objective of the gbde(4) facility is to provide a formidable challenge for an attacker to gain access to the contents of a cold storage device. However, if the computer is compromised while up and running and the storage device is actively attached, or the attacker has access to a valid passphrase, it offers no protection to the contents of the storage device. Thus, it is important to provide physical security while the system is running and to protect the passphrase used by the encryption mechanism.

This facility provides several barriers to protect the data stored in each disk sector. It encrypts the contents of a disk sector using 128-bit AES in CBC mode. Each sector on the disk is encrypted with a different AES key. For more information on the cryptographic design, including how the sector keys are derived from the user-supplied passphrase, refer to gbde(4).

FreeBSD provides a kernel module for gbde which can be loaded with this command:

# kldload geom_bde

If using a custom kernel configuration file, ensure it contains this line:

options GEOM_BDE

The following example demonstrates adding a new hard drive to a system that will hold a single encrypted partition that will be mounted as /private.

Procedure: Encrypting a Partition with gbde

  1. Add the New Hard Drive

    Install the new drive to the system as explained in Adding Disks. For the purposes of this example, a new hard drive partition has been added as /dev/ad4s1c and /dev/ad0s1* represents the existing standard FreeBSD partitions.

    # ls /dev/ad*
    /dev/ad0        /dev/ad0s1b     /dev/ad0s1e     /dev/ad4s1
    /dev/ad0s1      /dev/ad0s1c     /dev/ad0s1f     /dev/ad4s1c
    /dev/ad0s1a     /dev/ad0s1d     /dev/ad4
  2. Create a Directory to Hold gbde Lock Files

    # mkdir /etc/gbde

    The gbde lock file contains information that gbde requires to access encrypted partitions. Without access to the lock file, gbde will not be able to decrypt the data contained in the encrypted partition without significant manual intervention which is not supported by the software. Each encrypted partition uses a separate lock file.

  3. Initialize the gbde Partition

    A gbde partition must be initialized before it can be used. This initialization needs to be performed only once. This command will open the default editor, in order to set various configuration options in a template. For use with the UFS file system, set the sector_size to 2048:

    # gbde init /dev/ad4s1c -i -L /etc/gbde/ad4s1c.lock

    Once the edit is saved, the user will be asked twice to type the passphrase used to secure the data. The passphrase must be the same both times. The ability of gbde to protect data depends entirely on the quality of the passphrase. For tips on how to select a secure passphrase that is easy to remember, see http://world.std.com/~reinhold/diceware.htm.

    This initialization creates a lock file for the gbde partition. In this example, it is stored as /etc/gbde/ad4s1c.lock. Lock files must end in ".lock" in order to be correctly detected by the /etc/rc.d/gbde start up script.

    Lock files must be backed up together with the contents of any encrypted partitions. Without the lock file, the legitimate owner will be unable to access the data on the encrypted partition.

  4. Attach the Encrypted Partition to the Kernel

    # gbde attach /dev/ad4s1c -l /etc/gbde/ad4s1c.lock

    This command will prompt to input the passphrase that was selected during the initialization of the encrypted partition. The new encrypted device will appear in /dev as /dev/device_name.bde:

    # ls /dev/ad*
    /dev/ad0        /dev/ad0s1b     /dev/ad0s1e     /dev/ad4s1
    /dev/ad0s1      /dev/ad0s1c     /dev/ad0s1f     /dev/ad4s1c
    /dev/ad0s1a     /dev/ad0s1d     /dev/ad4        /dev/ad4s1c.bde
  5. Create a File System on the Encrypted Device

    Once the encrypted device has been attached to the kernel, a file system can be created on the device. This example creates a UFS file system with soft updates enabled. Be sure to specify the partition which has a *.bde extension:

    # newfs -U /dev/ad4s1c.bde
  6. Mount the Encrypted Partition

    Create a mount point and mount the encrypted file system:

    # mkdir /private
    # mount /dev/ad4s1c.bde /private
  7. Verify That the Encrypted File System is Available

    The encrypted file system should now be visible and available for use:

    % df -H
    Filesystem        Size   Used  Avail Capacity  Mounted on
    /dev/ad0s1a      1037M    72M   883M     8%    /
    /devfs            1.0K   1.0K     0B   100%    /dev
    /dev/ad0s1f       8.1G    55K   7.5G     0%    /home
    /dev/ad0s1e      1037M   1.1M   953M     0%    /tmp
    /dev/ad0s1d       6.1G   1.9G   3.7G    35%    /usr
    /dev/ad4s1c.bde   150G   4.1K   138G     0%    /private

After each boot, any encrypted file systems must be manually re-attached to the kernel, checked for errors, and mounted, before the file systems can be used. To configure these steps, add the following lines to /etc/rc.conf:

gbde_autoattach_all="YES"
gbde_devices="ad4s1c"
gbde_lockdir="/etc/gbde"

This requires that the passphrase be entered at the console at boot time. After typing the correct passphrase, the encrypted partition will be mounted automatically. Additional gbde boot options are available and listed in rc.conf(5).

sysinstall is incompatible with gbde-encrypted devices. All *.bde devices must be detached from the kernel before starting sysinstall or it will crash during its initial probing for devices. To detach the encrypted device used in the example, use the following command:

# gbde detach /dev/ad4s1c

17.12.2. Disk Encryption with geli

An alternative cryptographic GEOM class is available using geli. This control utility adds some features and uses a different scheme for doing cryptographic work. It provides the following features:

  • Utilizes the crypto(9) framework and automatically uses cryptographic hardware when it is available.

  • Supports multiple cryptographic algorithms such as AES, Blowfish, and 3DES.

  • Allows the root partition to be encrypted. The passphrase used to access the encrypted root partition will be requested during system boot.

  • Allows the use of two independent keys.

  • It is fast as it performs simple sector-to-sector encryption.

  • Allows backup and restore of master keys. If a user destroys their keys, it is still possible to get access to the data by restoring keys from the backup.

  • Allows a disk to attach with a random, one-time key which is useful for swap partitions and temporary file systems.

More features and usage examples can be found in geli(8).

The following example describes how to generate a key file which will be used as part of the master key for the encrypted provider mounted under /private. The key file will provide some random data used to encrypt the master key. The master key will also be protected by a passphrase. The provider’s sector size will be 4kB. The example describes how to attach to the geli provider, create a file system on it, mount it, work with it, and finally, how to detach it.

Procedure: Encrypting a Partition with geli

  1. Load geli Support

    Support for geli is available as a loadable kernel module. To configure the system to automatically load the module at boot time, add the following line to /boot/loader.conf:

    geom_eli_load="YES"

    To load the kernel module now:

    # kldload geom_eli

    For a custom kernel, ensure the kernel configuration file contains these lines:

    options GEOM_ELI
    device crypto
  2. Generate the Master Key

    The following commands generate a master key (/root/da2.key) that is protected with a passphrase. The data source for the key file is /dev/random and the sector size of the provider (/dev/da2.eli) is 4kB as a bigger sector size provides better performance:

    # dd if=/dev/random of=/root/da2.key bs=64 count=1
    # geli init -s 4096 -K /root/da2.key /dev/da2
    Enter new passphrase:
    Reenter new passphrase:

    It is not mandatory to use both a passphrase and a key file as either method of securing the master key can be used in isolation.

    If the key file is given as "-", standard input will be used. For example, this command combines three key files by concatenating them and passing the result to geli on standard input:

    # cat keyfile1 keyfile2 keyfile3 | geli init -K - /dev/da2
  3. Attach the Provider with the Generated Key

    To attach the provider, specify the key file, the name of the disk, and the passphrase:

    # geli attach -k /root/da2.key /dev/da2
    Enter passphrase:

    This creates a new device with an .eli extension:

    # ls /dev/da2*
    /dev/da2  /dev/da2.eli
  4. Create the New File System

    Next, format the device with the UFS file system and mount it on an existing mount point:

    # dd if=/dev/random of=/dev/da2.eli bs=1m
    # newfs /dev/da2.eli
    # mount /dev/da2.eli /private

    The encrypted file system should now be available for use:

    # df -H
    Filesystem     Size   Used  Avail Capacity  Mounted on
    /dev/ad0s1a    248M    89M   139M    38%    /
    /devfs         1.0K   1.0K     0B   100%    /dev
    /dev/ad0s1f    7.7G   2.3G   4.9G    32%    /usr
    /dev/ad0s1d    989M   1.5M   909M     0%    /tmp
    /dev/ad0s1e    3.9G   1.3G   2.3G    35%    /var
    /dev/da2.eli   150G   4.1K   138G     0%    /private

Once the work on the encrypted partition is done, and the /private partition is no longer needed, it is prudent to put the device into cold storage by unmounting and detaching the geli encrypted partition from the kernel:

# umount /private
# geli detach da2.eli
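
The key file created in step 2 of the procedure is as sensitive as the passphrase, so it is worth creating it with restrictive permissions and checking its size. The following is a minimal sketch: the function name make_keyfile is hypothetical, the 64-byte size and /dev/random source follow the dd command shown in the procedure, and the optional second argument is an assumption added so that another random source can be used.

```shell
#!/bin/sh
# make_keyfile PATH [SOURCE]: write 64 random bytes to PATH, readable only
# by its owner. SOURCE defaults to /dev/random as in the procedure above.
# (make_keyfile is a hypothetical helper name.)
make_keyfile() {
    key=$1
    src=${2:-/dev/random}
    (
        umask 077                 # a newly created file gets mode 0600
        dd if="$src" of="$key" bs=64 count=1 2>/dev/null
    ) || return 1
    # Succeed only if the key file has the expected length.
    [ "$(wc -c < "$key" | tr -d ' ')" -eq 64 ]
}
```

For example, make_keyfile /root/da2.key could replace the dd invocation before running geli init. Note that umask only affects files that do not exist yet, so an existing key file keeps its old permissions.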

A rc.d script is provided to simplify the mounting of geli-encrypted devices at boot time. For this example, add these lines to /etc/rc.conf:

geli_devices="da2"
geli_da2_flags="-k /root/da2.key"

This configures /dev/da2 as a geli provider with a master key of /root/da2.key. The system will automatically detach the provider from the kernel before the system shuts down. During the startup process, the script will prompt for the passphrase before attaching the provider. Other kernel messages might be shown before and after the password prompt. If the boot process seems to stall, look carefully for the password prompt among the other messages. Once the correct passphrase is entered, the provider is attached. The file system is then mounted, typically by an entry in /etc/fstab. Refer to Mounting and Unmounting File Systems for instructions on how to configure a file system to mount at boot time.

17.13. Encrypting Swap

Like the encryption of disk partitions, encryption of swap space is used to protect sensitive information. Consider an application that deals with passwords. As long as these passwords stay in physical memory, they are not written to disk and will be cleared after a reboot. However, if FreeBSD starts swapping out memory pages to free space, the passwords may be written to the disk unencrypted. Encrypting swap space can be a solution for this scenario.

This section demonstrates how to configure an encrypted swap partition using gbde(8) or geli(8) encryption. It assumes that /dev/ada0s1b is the swap partition.

17.13.1. Configuring Encrypted Swap

Swap partitions are not encrypted by default and should be cleared of any sensitive data before continuing. To overwrite the current swap partition with random garbage, execute the following command:

# dd if=/dev/random of=/dev/ada0s1b bs=1m

To encrypt the swap partition using gbde(8), add the .bde suffix to the swap line in /etc/fstab:

# Device		Mountpoint	FStype	Options		Dump	Pass#
/dev/ada0s1b.bde	none		swap	sw		0	0

To instead encrypt the swap partition using geli(8), use the .eli suffix:

# Device		Mountpoint	FStype	Options		Dump	Pass#
/dev/ada0s1b.eli	none		swap	sw		0	0

By default, geli(8) uses the AES algorithm with a key length of 128 bits. Normally the default settings will suffice. If desired, these defaults can be altered in the options field in /etc/fstab. The possible flags are:

aalgo

Data integrity verification algorithm used to ensure that the encrypted data has not been tampered with. See geli(8) for a list of supported algorithms.

ealgo

Encryption algorithm used to protect the data. See geli(8) for a list of supported algorithms.

keylen

The length of the key used for the encryption algorithm. See geli(8) for the key lengths that are supported by each encryption algorithm.

sectorsize

The size of the blocks data is broken into before it is encrypted. Larger sector sizes increase performance at the cost of higher storage overhead. The recommended size is 4096 bytes.

This example configures an encrypted swap partition using the Blowfish algorithm with a key length of 128 bits and a sectorsize of 4 kilobytes:

# Device		Mountpoint	FStype	Options				Dump	Pass#
/dev/ada0s1b.eli	none		swap	sw,ealgo=blowfish,keylen=128,sectorsize=4096	0	0

17.13.2. Encrypted Swap Verification

Once the system has rebooted, proper operation of the encrypted swap can be verified using swapinfo.

If gbde(8) is being used:

% swapinfo
Device          1K-blocks     Used    Avail Capacity
/dev/ada0s1b.bde   542720        0   542720     0%

If geli(8) is being used:

% swapinfo
Device          1K-blocks     Used    Avail Capacity
/dev/ada0s1b.eli   542720        0   542720     0%
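
This check can also be automated. The sketch below reads swapinfo output on standard input and lists any swap device whose name does not end in .eli or .bde. The function name unencrypted_swap is hypothetical, and the script assumes the column layout shown above, with the device name in the first column.

```shell
#!/bin/sh
# unencrypted_swap: read `swapinfo` output on standard input and print every
# swap device whose name does not end in .eli or .bde.
# The exit status is 0 only when all listed devices are encrypted.
# (unencrypted_swap is a hypothetical helper name.)
unencrypted_swap() {
    awk '
        NR == 1 || $1 == "Total" { next }     # header and summary lines
        $1 !~ /\.(eli|bde)$/ { print $1; bad = 1 }
        END { exit bad }
    '
}
```

A typical use would be swapinfo | unencrypted_swap && echo "all swap is encrypted".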

17.14. Highly Available Storage (HAST)

High availability is one of the main requirements in serious business applications and highly-available storage is a key component in such environments. In FreeBSD, the Highly Available STorage (HAST) framework allows transparent storage of the same data across several physically separated machines connected by a TCP/IP network. HAST can be understood as a network-based RAID1 (mirror), and is similar to the DRBD® storage system used in the GNU/Linux™ platform. In combination with other high-availability features of FreeBSD like CARP, HAST makes it possible to build a highly-available storage cluster that is resistant to hardware failures.

The following are the main features of HAST:

  • Can be used to mask I/O errors on local hard drives.

  • File system agnostic as it works with any file system supported by FreeBSD.

  • Efficient and quick resynchronization as only the blocks that were modified during the downtime of a node are synchronized.

  • Can be used in an already deployed environment to add additional redundancy.

  • Together with CARP, Heartbeat, or other tools, it can be used to build a robust and durable storage system.

After reading this section, you will know:

  • What HAST is, how it works, and which features it provides.

  • How to set up and use HAST on FreeBSD.

  • How to integrate CARP and devd(8) to build a robust storage system.

Before reading this section, you should:

  • Understand UNIX™ and FreeBSD basics (FreeBSD Basics).

  • Know how to configure network interfaces and other core FreeBSD subsystems (Configuration and Tuning).

  • Have a good understanding of FreeBSD networking (Network Communication).

The HAST project was sponsored by The FreeBSD Foundation with support from http://www.omc.net/ and http://www.transip.nl/.

17.14.1. HAST Operation

HAST provides synchronous block-level replication between two physical machines: the primary, also known as the master node, and the secondary, or slave node. These two machines together are referred to as a cluster.

Since HAST works in a primary-secondary configuration, it allows only one of the cluster nodes to be active at any given time. The primary node, also called active, is the one which will handle all the I/O requests to HAST-managed devices. The secondary node is automatically synchronized from the primary node.

The physical components of the HAST system are the local disk on primary node, and the disk on the remote, secondary node.

HAST operates synchronously on a block level, making it transparent to file systems and applications. HAST provides regular GEOM providers in /dev/hast/ for use by other tools or applications. There is no difference between using HAST-provided devices and raw disks or partitions.

Each write, delete, or flush operation is sent to both the local disk and to the remote disk over TCP/IP. Each read operation is served from the local disk, unless the local disk is not up-to-date or an I/O error occurs. In such cases, the read operation is sent to the secondary node.

HAST tries to provide fast failure recovery. For this reason, it is important to reduce synchronization time after a node’s outage. To provide fast synchronization, HAST manages an on-disk bitmap of dirty extents and only synchronizes those during a regular synchronization, with an exception of the initial sync.

There are many ways to handle synchronization. HAST implements several replication modes to handle different synchronization methods:

  • memsync: This mode reports a write operation as completed when the local write operation is finished and when the remote node acknowledges data arrival, but before actually storing the data. The data on the remote node will be stored directly after sending the acknowledgement. This mode is intended to reduce latency, but still provides good reliability. This mode is the default.

  • fullsync: This mode reports a write operation as completed when both the local write and the remote write complete. This is the safest and the slowest replication mode.

  • async: This mode reports a write operation as completed when the local write completes. This is the fastest and the most dangerous replication mode. It should only be used when replicating to a distant node where latency is too high for other modes.

17.14.2. HAST Configuration

The HAST framework consists of several components:

  • The hastd(8) daemon which provides data synchronization. When this daemon is started, it will automatically load geom_gate.ko.

  • The userland management utility, hastctl(8).

  • The hast.conf(5) configuration file. This file must exist before starting hastd.

Users who prefer to statically build GEOM_GATE support into the kernel should add this line to the custom kernel configuration file, then rebuild the kernel using the instructions in Configuring the FreeBSD Kernel:

options	GEOM_GATE

The following example describes how to configure two nodes in master-slave/primary-secondary operation using HAST to replicate the data between the two. The nodes will be called hasta, with an IP address of 172.16.0.1, and hastb, with an IP address of 172.16.0.2. Both nodes will have a dedicated hard drive /dev/ad6 of the same size for HAST operation. The HAST pool, sometimes referred to as a resource or the GEOM provider in /dev/hast/, will be called test.

Configuration of HAST is done using /etc/hast.conf. This file should be identical on both nodes. The simplest configuration is:

resource test {
	on hasta {
		local /dev/ad6
		remote 172.16.0.2
	}
	on hastb {
		local /dev/ad6
		remote 172.16.0.1
	}
}

For more advanced configuration, refer to hast.conf(5).

It is also possible to use host names in the remote statements if the hosts are resolvable and defined either in /etc/hosts or in the local DNS.
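
Because the file should be identical on both nodes, one way to avoid typing mistakes is to generate it once and copy the result to the other node. The following is a minimal sketch, not part of HAST itself: hast_two_node_conf is a hypothetical helper name, and its argument order is an arbitrary choice made for this example.

```shell
#!/bin/sh
# hast_two_node_conf RESOURCE DEVICE NODE_A ADDR_A NODE_B ADDR_B
# (hypothetical helper) Print a minimal two-node resource definition
# on standard output. Each node's "remote" is the other node's address.
hast_two_node_conf() {
	cat <<EOF
resource $1 {
	on $3 {
		local $2
		remote $6
	}
	on $5 {
		local $2
		remote $4
	}
}
EOF
}
```

For the example nodes, the helper could be called as hast_two_node_conf test /dev/ad6 hasta 172.16.0.1 hastb 172.16.0.2 with the output redirected to /etc/hast.conf, and the resulting file then copied unchanged to the second node.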

Once the configuration exists on both nodes, the HAST pool can be created. Run these commands on both nodes to place the initial metadata onto the local disk and to start hastd(8):

# hastctl create test
# service hastd onestart

It is not possible to use GEOM providers that contain an existing file system, or to convert existing storage to a HAST-managed pool. This procedure needs to store some metadata on the provider, and an existing provider will not have enough free space for it.

A HAST node’s primary or secondary role is selected by an administrator, or software like Heartbeat, using hastctl(8). On the primary node, hasta, issue this command:

# hastctl role primary test

Run this command on the secondary node, hastb:

# hastctl role secondary test

Verify the result by running hastctl on each node:

# hastctl status test

Check the status line in the output. If it says degraded, something is wrong with the configuration file. It should say complete on each node, meaning that the synchronization between the nodes has started. The synchronization completes when hastctl status reports 0 bytes of dirty extents.
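
The check described above can be scripted. The sketch below assumes the report is printed as one "key: value" pair per line, including a status line and a dirty line whose first word is a byte count. The exact layout can differ between hastctl(8) versions, so treat the field names as assumptions and adjust them to the local output. The function names hast_field and hast_in_sync are hypothetical.

```shell
#!/bin/sh
# hast_field KEY (hypothetical helper)
# Print the value of the first "KEY: value" line read from stdin.
# Assumes a one-field-per-line report; adjust for other layouts.
hast_field() {
	awk -v key="$1" '
		{
			line = $0
			sub(/^[ \t]+/, "", line)
			if (index(line, key ":") == 1) {
				sub(/^[^:]*:[ \t]*/, "", line)
				print line
				exit
			}
		}'
}

# hast_in_sync (hypothetical helper)
# Read a report from stdin and succeed only if the status is
# "complete" and the dirty byte count is 0.
hast_in_sync() {
	report=$(cat)
	status=$(printf '%s\n' "$report" | hast_field status)
	dirty=$(printf '%s\n' "$report" | hast_field dirty | awk '{ print $1 }')
	[ "$status" = "complete" ] && [ "$dirty" = "0" ]
}
```

A possible use is hastctl status test | hast_in_sync && echo "test is synchronized", run on either node after the roles have been assigned.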

The next step is to create a file system on the GEOM provider and mount it. This must be done on the primary node. Creating the file system can take a few minutes, depending on the size of the hard drive. This example creates a UFS file system on /dev/hast/test:

# newfs -U /dev/hast/test
# mkdir -p /hast/test
# mount /dev/hast/test /hast/test

Once the HAST framework is configured properly, the final step is to make sure that HAST is started automatically during system boot. Add this line to /etc/rc.conf:

hastd_enable="YES"

17.14.2.1. Failover Configuration

The goal of this example is to build a robust storage system which is resistant to the failure of any given node. If the primary node fails, the secondary node is there to take over seamlessly, check and mount the file system, and continue to work without missing a single bit of data.

To accomplish this task, the Common Address Redundancy Protocol (CARP) is used to provide automatic failover at the IP layer. CARP allows multiple hosts on the same network segment to share an IP address. Set up CARP on both nodes of the cluster according to the documentation available in Common Address Redundancy Protocol (CARP). In this example, each node will have its own management IP address and a shared IP address of 172.16.0.254. The primary HAST node of the cluster must be the master CARP node.

The HAST pool created in the previous section is now ready to be exported to the other hosts on the network. This can be accomplished by exporting it through NFS or Samba, using the shared IP address 172.16.0.254. The only problem which remains unresolved is an automatic failover should the primary node fail.

In the event of CARP interfaces going up or down, the FreeBSD operating system generates a devd(8) event, making it possible to watch for state changes on the CARP interfaces. A state change on the CARP interface is an indication that one of the nodes failed or came back online. These state change events make it possible to run a script which will automatically handle the HAST failover.

To catch state changes on the CARP interfaces, add this configuration to /etc/devd.conf on each node:

notify 30 {
	match "system" "IFNET";
	match "subsystem" "carp0";
	match "type" "LINK_UP";
	action "/usr/local/sbin/carp-hast-switch master";
};

notify 30 {
	match "system" "IFNET";
	match "subsystem" "carp0";
	match "type" "LINK_DOWN";
	action "/usr/local/sbin/carp-hast-switch slave";
};

If the systems are running FreeBSD 10 or higher, replace carp0 with the name of the CARP-configured interface.

Restart devd(8) on both nodes to put the new configuration into effect:

# service devd restart

When the specified interface changes state by going up or down, the system generates a notification, allowing the devd(8) subsystem to run the specified automatic failover script, /usr/local/sbin/carp-hast-switch. For further clarification about this configuration, refer to devd.conf(5).

Here is an example of an automated failover script:

#!/bin/sh

# Original script by Freddie Cash <fjwcash@gmail.com>
# Modified by Michael W. Lucas <mwlucas@BlackHelicopters.org>
# and Viktor Petersson <vpetersson@wireload.net>

# The names of the HAST resources, as listed in /etc/hast.conf
resources="test"

# delay in mounting HAST resource after becoming master
# make your best guess
delay=3

# logging
log="local0.debug"
name="carp-hast"

# end of user configurable stuff

case "$1" in
	master)
		logger -p $log -t $name "Switching to primary provider for ${resources}."
		sleep ${delay}

		# Wait for any "hastd secondary" processes to stop
		for disk in ${resources}; do
			while pgrep -lf "hastd: ${disk} \(secondary\)" > /dev/null 2>&1; do
				sleep 1
			done

			# Switch role for each disk
			hastctl role primary ${disk}
			if [ $? -ne 0 ]; then
				logger -p $log -t $name "Unable to change role to primary for resource ${disk}."
				exit 1
			fi
		done

		# Wait for the /dev/hast/* devices to appear
		for disk in ${resources}; do
			for I in $( jot 60 ); do
				[ -c "/dev/hast/${disk}" ] && break
				sleep 0.5
			done

			if [ ! -c "/dev/hast/${disk}" ]; then
				logger -p $log -t $name "GEOM provider /dev/hast/${disk} did not appear."
				exit 1
			fi
		done

		logger -p $log -t $name "Role for HAST resources ${resources} switched to primary."

		logger -p $log -t $name "Mounting disks."
		for disk in ${resources}; do
			mkdir -p /hast/${disk}
			fsck -p -y -t ufs /dev/hast/${disk}
			mount /dev/hast/${disk} /hast/${disk}
		done

	;;

	slave)
		logger -p $log -t $name "Switching to secondary provider for ${resources}."

		# Switch roles for the HAST resources
		for disk in ${resources}; do
			# Unmount the file system only if it is currently mounted
			if mount | grep -q "^/dev/hast/${disk} on "; then
				umount -f /hast/${disk}
			fi
			sleep $delay
			hastctl role secondary ${disk} 2>&1
			if [ $? -ne 0 ]; then
				logger -p $log -t $name "Unable to switch role to secondary for resource ${disk}."
				exit 1
			fi
			logger -p $log -t $name "Role switched to secondary for resource ${disk}."
		done
	;;
esac

In a nutshell, the script takes these actions when a node becomes master:

  • Promotes the HAST pool to primary on the node that has become master.

  • Checks the file system under the HAST pool.

  • Mounts the pool.

When a node becomes secondary:

  • Unmounts the HAST pool.

  • Demotes the HAST pool to secondary.

This is just an example script which serves as a proof of concept. It does not handle all the possible scenarios and can be extended or altered in any way, for example, to start or stop required services.
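
One possible extension, shown here only as a sketch, is a bounded retry helper. The example script waits for the secondary hastd process to exit without any upper limit, so a helper like the hypothetical wait_until below could replace that loop and the device-wait loop with a single, time-limited construct. The name and argument order are assumptions made for this example.

```shell
#!/bin/sh
# wait_until TRIES INTERVAL COMMAND [ARGS...] (hypothetical helper)
# Run COMMAND up to TRIES times, sleeping INTERVAL seconds between
# attempts. Returns 0 as soon as COMMAND succeeds, or 1 if every
# attempt fails.
wait_until() {
	tries=$1
	interval=$2
	shift 2
	while [ "$tries" -gt 0 ]; do
		"$@" && return 0
		tries=$((tries - 1))
		# Do not sleep after the final failed attempt
		[ "$tries" -gt 0 ] && sleep "$interval"
	done
	return 1
}
```

In the failover script this might be used as wait_until 30 1 test -c "/dev/hast/${disk}", followed by the existing error handling when the helper returns a non-zero status.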

For this example, a standard UFS file system was used. To reduce the time needed for recovery, a journal-enabled UFS or ZFS file system can be used instead.

More detailed information with additional examples can be found at http://wiki.FreeBSD.org/HAST.

17.14.3. Troubleshooting

HAST should generally work without issues. However, as with any other software product, there may be times when it does not work as expected. Problems can have many causes, but a good first step is to ensure that the time is synchronized between the nodes of the cluster.

When troubleshooting HAST, the debugging level of hastd(8) should be increased by starting hastd with -d. This argument may be specified multiple times to further increase the debugging level. Consider also using -F, which starts hastd in the foreground.

17.14.3.1. Recovering from the Split-brain Condition

Split-brain occurs when the nodes of the cluster are unable to communicate with each other, and both are configured as primary. This is a dangerous condition because it allows both nodes to make incompatible changes to the data. This problem must be corrected manually by the system administrator.

The administrator must either decide which node has more important changes, or perform the merge manually. Then, let HAST perform full synchronization of the node which has the broken data. To do this, issue these commands on the node which needs to be resynchronized:

# hastctl role init test
# hastctl create test
# hastctl role secondary test

Last modified on: March 9, 2024 by Danilo G. Baio