Common Problems with Linux Software RAID

RAID0 zones

Linux md RAID0 can build an array out of disks of different sizes by organizing them into multiple zones, each zone spanning a different number of disks. For example, two 2TB disks, three 4TB disks, and one 1TB disk form a software RAID0 device with a capacity of 17TB (made up of 3 zones).

In contrast, RAID1, RAID4/5/6, and RAID10 can only use the capacity of the smallest member disk as the common size when the disks differ; the extra space on the larger disks is wasted.
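Returning to the mixed-size example above, such an array is created like any other RAID0; the md driver lays out the zones automatically, and mdadm -D (--detail) then reports the combined capacity. A sketch, with placeholder device names standing in for the six disks:

[root@fc5 mdadm-2.6.3]# ./mdadm -Cv /dev/md0 -l0 -n6 /dev/sd[b-g]
[root@fc5 mdadm-2.6.3]# ./mdadm -D /dev/md0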

Configuring shared hot-spare disks

The mdadm program allows multiple RAID arrays to share spare disks. For example, suppose there are two arrays, /dev/md0 and /dev/md1; /dev/md0 was created with one hot-spare disk, while /dev/md1 has none. All we need to do is configure both arrays in /etc/mdadm.conf with the same spare-group.

[root@fc5 mdadm-2.6.3]# cat /etc/mdadm.conf
DEVICE /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk
ARRAY /dev/md1 level=raid0 num-devices=3 spare-group=sparedisks
    UUID=dcff6ec9:53c4c668:58b81af9:ef71989d
ARRAY /dev/md0 level=raid10 num-devices=6 spare-group=sparedisks 
    UUID=0cabc5e5:842d4baa:e3f6261b:a17a477a

Then run mdadm in monitor mode. When a disk in the /dev/md1 array, say /dev/sdi, fails, mdadm will automatically move the spare disk out of /dev/md0 and add it to /dev/md1.
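One way to start the monitor (a sketch; the mail address and polling interval are placeholders). With --scan, monitor mode watches the arrays listed in /etc/mdadm.conf, which is exactly the configuration shown above:

[root@fc5 mdadm-2.6.3]# ./mdadm --monitor --scan --daemonise --mail=root --delay=300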

Partitionable RAID devices

The MD driver supports two kinds of block devices: the non-partitionable kind, named md, with major number 9; and the partitionable kind, named mdp, whose major number is allocated dynamically. A single mdp device supports at most 63 partitions. The major number of mdp can be seen in /proc/devices, and the major and minor numbers of both md and mdp devices can be seen in /proc/partitions.

[root@fc5 mdadm-2.6.3]# cat /proc/devices | grep md
1 ramdisk
9 md
253 mdp
[root@fc5 mdadm-2.6.3]# cat /proc/partitions | grep md
   9     1    2096896 md1
 253     0    5242560 md_d0
 253     1    1000002 md_d0p1
 253     2    1000002 md_d0p2

To partition a partitionable MD device (a partitionable RAID array), use /dev/md_d0 instead of the /dev/md0 used above, and specify the --auto=mdp option (or its shorthand -ap) when creating the array.

[root@fc5 mdadm-2.6.3]# ./mdadm -Cv --auto=mdp /dev/md_d0 -l5 -n6 /dev/sd[b-g] -x1 /dev/sdh
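Once the array exists, the mdp device can be partitioned and used like an ordinary disk (a sketch; the partition layout, filesystem, and mount point are illustrative):

[root@fc5 mdadm-2.6.3]# fdisk /dev/md_d0
[root@fc5 mdadm-2.6.3]# mkfs.ext3 /dev/md_d0p1
[root@fc5 mdadm-2.6.3]# mount /dev/md_d0p1 /mnt/raid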

Creating a degraded array

To create a "degraded" array in which some devices are missing, simply give the word "missing" in place of a device name. This will cause mdadm to leave the corresponding slot in the array empty.

For a RAID4 or RAID5 array, at most one slot can be "missing"; for a RAID6 array, at most two. For a RAID1 array, only one real device needs to be given; all of the others can be "missing".
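For example (a sketch; device names are placeholders), a RAID1 with one member absent, or a six-slot RAID5 with one slot left empty:

[root@fc5 mdadm-2.6.3]# ./mdadm -Cv /dev/md0 -l1 -n2 /dev/sdb missing
[root@fc5 mdadm-2.6.3]# ./mdadm -Cv /dev/md1 -l5 -n6 /dev/sd[b-f] missing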

The "ignoring sdX as it reports sdY as failed" error

What happened is that some disks of the RAID failed and were ejected from the array; that by itself is normal. But the array will no longer be assembled if the failed disks come first on the mdadm --assemble command line, because mdadm does not go by what the majority of the disks report, but by what the first disks report.
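A hedged workaround sketch (device names are placeholders, with /dev/sdb assumed to be the ejected disk): list the members that stayed healthy first, or leave the ejected disk out entirely, so that its stale superblock is not the one consulted:

[root@fc5 mdadm-2.6.3]# ./mdadm --assemble /dev/md0 /dev/sdc /dev/sdd /dev/sde /dev/sdb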

Arrays being assembled as md127 after a reboot

For version 1.2 superblocks, the preferred way to create arrays is by using a name instead of a number.

For example, if the array is your home partition, then creating it with the option --name=home will cause it to be assembled with an arbitrary device number (when an array doesn't have an assigned number, mdadm starts at 127 and counts backwards, which is why md127 shows up), but there will be a symlink in /dev/md/ that points to whatever number was used to assemble the array.

The symlink in /dev/md/ will be whatever is in the name field of the superblock. So in this example, you would have /dev/md/home that would point to /dev/md127 and the preferred method of use would be to access the device via the /dev/md/home entry.
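A minimal sketch of creating a named array and inspecting the symlink (device names are placeholders):

[root@fc5 mdadm-2.6.3]# ./mdadm -Cv /dev/md/home -l1 -n2 --name=home /dev/sdb /dev/sdc
[root@fc5 mdadm-2.6.3]# ls -l /dev/md/home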

A final note

mdadm will also check the homehost entry in the superblock, and if it matches neither the system hostname nor the HOMEHOST entry in mdadm.conf, the array will be assembled with a numeric suffix, so /dev/md/home might get assembled as /dev/md/home_0.

To turn this behavior off, you either need a HOMEHOST entry in mdadm.conf that matches the homehost portion of the RAID superblock, or you need a HOMEHOST line in mdadm.conf that disables the check (recent mdadm versions accept HOMEHOST <ignore> for this).
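A minimal sketch of the matching-entry approach, assuming the host is named fc5 as in the prompts above:

[root@fc5 mdadm-2.6.3]# grep HOMEHOST /etc/mdadm.conf
HOMEHOST fc5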