zpool reports 16E expandsize on disks with oddball number of sectors

The issue is caused by a small discrepancy in how userland creates the
partition layout and the kernel estimates available space:

 * zpool command: subtract 9M from the usable device size, then align
   to 1M boundary. 9M is the sum of 1M "start" partition alignment + 8M
   EFI "reserved" partition.

 * kernel module: subtract 10M from the device size. 10M is the sum of
   1M "start" partition alignment + 1m "end" partition alignment + 8M
   EFI "reserved" partition.

For devices where the number of sectors is not a multiple of the
alignment size the zpool command will create a partition layout which
reserves less than 1M after the 8M EFI "reserved" partition:

  Disk /dev/sda: 1024 MiB, 1073739776 bytes, 2097148 sectors
  Units: sectors of 1 * 512 = 512 bytes
  Sector size (logical/physical): 512 bytes / 512 bytes
  I/O size (minimum/optimal): 512 bytes / 512 bytes
  Disklabel type: gpt
  Disk identifier: 49811D40-16F4-4E41-84A9-387703950D7F

  Device       Start     End Sectors  Size Type
  /dev/sda1     2048 2078719 2076672 1014M Solaris /usr & Apple ZFS
  /dev/sda9  2078720 2095103   16384    8M Solaris reserved 1

When the kernel module vdev_open() the device its max_asize ends up
being slightly smaller than asize: this results in a huge number (16E)
reported by metaslab_class_expandable_space().

This change prevents bdev_max_capacity() from returing a size smaller
than bdev_capacity().

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed by: Sara Hartse <sara.hartse@delphix.com>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #1468 
Closes #8391
This commit is contained in:
loli10K 2019-02-23 00:36:34 +01:00 committed by Brian Behlendorf
parent 8d9e51c084
commit 0c637f3100
2 changed files with 22 additions and 4 deletions

View File

@ -1645,6 +1645,17 @@ vdev_open(vdev_t *vd)
error = vd->vdev_ops->vdev_op_open(vd, &osize, &max_osize, &ashift); error = vd->vdev_ops->vdev_op_open(vd, &osize, &max_osize, &ashift);
/*
* Physical volume size should never be larger than its max size, unless
* the disk has shrunk while we were reading it or the device is buggy
* or damaged: either way it's not safe for use, bail out of the open.
*/
if (osize > max_osize) {
vdev_set_state(vd, B_TRUE, VDEV_STATE_CANT_OPEN,
VDEV_AUX_OPEN_FAILED);
return (SET_ERROR(ENXIO));
}
/* /*
* Reset the vdev_reopening flag so that we actually close * Reset the vdev_reopening flag so that we actually close
* the vdev on error. * the vdev on error.

View File

@ -108,6 +108,10 @@ bdev_capacity(struct block_device *bdev)
* case, and updating the partition table if appropriate. Once the partition * case, and updating the partition table if appropriate. Once the partition
* size has been increased the additional capacity will be visible using * size has been increased the additional capacity will be visible using
* bdev_capacity(). * bdev_capacity().
*
* The returned maximum expansion capacity is always expected to be larger, or
* at the very least equal, to its usable capacity to prevent overestimating
* the pool expandsize.
*/ */
static uint64_t static uint64_t
bdev_max_capacity(struct block_device *bdev, uint64_t wholedisk) bdev_max_capacity(struct block_device *bdev, uint64_t wholedisk)
@ -122,14 +126,17 @@ bdev_max_capacity(struct block_device *bdev, uint64_t wholedisk)
* alignment restrictions. Over reporting this value isn't * alignment restrictions. Over reporting this value isn't
* harmful and would only result in slightly less capacity * harmful and would only result in slightly less capacity
* than expected post expansion. * than expected post expansion.
* The estimated available space may be slightly smaller than
* bdev_capacity() for devices where the number of sectors is
* not a multiple of the alignment size and the partition layout
* is keeping less than PARTITION_END_ALIGNMENT bytes after the
* "reserved" EFI partition: in such cases return the device
* usable capacity.
*/ */
available = i_size_read(bdev->bd_contains->bd_inode) - available = i_size_read(bdev->bd_contains->bd_inode) -
((EFI_MIN_RESV_SIZE + NEW_START_BLOCK + ((EFI_MIN_RESV_SIZE + NEW_START_BLOCK +
PARTITION_END_ALIGNMENT) << SECTOR_BITS); PARTITION_END_ALIGNMENT) << SECTOR_BITS);
if (available > 0) psize = MAX(available, bdev_capacity(bdev));
psize = available;
else
psize = bdev_capacity(bdev);
} else { } else {
psize = bdev_capacity(bdev); psize = bdev_capacity(bdev);
} }