• HW assisted DM integrity

      Acceptance Criteria:

      1) Appropriate HW needs to be purchased for development and validation

      Non-power-of-2 (NPo2) drives have various modes/capabilities.

      The Samsung PM1735 is known to have 4 modes (512+0, 512+8, 4k+0, 4k+8) and can already be found in a number of beaker machines (beaker search).  This drive does not have a 5th mode (4k+64) and is only suitable for RAID testing.

      Some drives from Western Digital support all 5 modes (e.g. WDC Ultrastar DC SN840), but they are problematic: we've had trouble booting systems while these are connected.

      You can test whether an NVMe drive has NPo2 capability by running:

      nvme id-ns -H /dev/nvme0n1

      and looking for:

      LBA Format  0 : Metadata Size: 0   bytes - Data Size: 512 bytes - Relative Performance: 0x1 Better (in use)
      LBA Format  1 : Metadata Size: 8   bytes - Data Size: 512 bytes - Relative Performance: 0x3 Degraded 
      LBA Format  2 : Metadata Size: 0   bytes - Data Size: 4096 bytes - Relative Performance: 0 Best 
      LBA Format  3 : Metadata Size: 8   bytes - Data Size: 4096 bytes - Relative Performance: 0x2 Good 
      LBA Format  4 : Metadata Size: 64  bytes - Data Size: 4096 bytes - Relative Performance: 0x3 Degraded
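
      The check above can be scripted. The following is a minimal sketch, assuming the output lines look exactly like the sample shown here (other nvme-cli versions may format them differently). The helper names `parse_lba_formats` and `npo2_formats` are hypothetical and only for illustration.

```python
import re

# Matches LBA format lines in the style of the `nvme id-ns -H` sample above.
# Assumption: field order and labels are as in that sample.
LBAF_RE = re.compile(
    r"LBA Format\s+(\d+)\s*:\s*Metadata Size:\s*(\d+)\s+bytes\s*-\s*"
    r"Data Size:\s*(\d+)\s+bytes"
)

def parse_lba_formats(text):
    """Return a list of (index, data_bytes, metadata_bytes, in_use) tuples."""
    formats = []
    for line in text.splitlines():
        m = LBAF_RE.search(line)
        if m:
            idx, meta, data = (int(g) for g in m.groups())
            formats.append((idx, data, meta, "(in use)" in line))
    return formats

def npo2_formats(formats):
    """Keep only formats that carry per-sector metadata (the NPo2 modes)."""
    return [(idx, data, meta) for idx, data, meta, _ in formats if meta > 0]

SAMPLE = """\
LBA Format  0 : Metadata Size: 0   bytes - Data Size: 512 bytes - Relative Performance: 0x1 Better (in use)
LBA Format  1 : Metadata Size: 8   bytes - Data Size: 512 bytes - Relative Performance: 0x3 Degraded
LBA Format  2 : Metadata Size: 0   bytes - Data Size: 4096 bytes - Relative Performance: 0 Best
LBA Format  3 : Metadata Size: 8   bytes - Data Size: 4096 bytes - Relative Performance: 0x2 Good
LBA Format  4 : Metadata Size: 64  bytes - Data Size: 4096 bytes - Relative Performance: 0x3 Degraded
"""

if __name__ == "__main__":
    for idx, data, meta in npo2_formats(parse_lba_formats(SAMPLE)):
        print(f"format {idx}: {data}+{meta}")
```

      A drive that reports a 4096+64 format would cover all three test plans; one that only reports the +8 formats would cover the RAID case.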

      2) Issues will be created and test plans should be written for the following:

      • dm-integrity testing with NPo2 storage
      • dm-integrity + RAID with NPo2 storage
      • dm-integrity + authenticated encryption with NPo2 storage

       

    • rhel-sst-platform-storage
    • False
      None


      Description

      There are new NVMe drives emerging that have the capability to do non-power-of-2 (NPo2) sector sizes, usually 512+8, 4k+8, or 4k+64 bytes.  We can leverage this capability to improve the performance of dm-integrity substantially (> 2x).
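
      To make the sector sizes concrete, here is a small arithmetic sketch. It assumes "4k" means 4096 data bytes; the function names are hypothetical.

```python
# (data_bytes, metadata_bytes) for the three NPo2 formats named above.
# Assumption: "4k" means 4096 bytes.
FORMATS = {"512+8": (512, 8), "4k+8": (4096, 8), "4k+64": (4096, 64)}

def is_power_of_two(n):
    """True when n is a positive power of two."""
    return n > 0 and (n & (n - 1)) == 0

def sector_overhead(data_bytes, metadata_bytes):
    """Return (total bytes per sector, metadata as a fraction of data)."""
    return data_bytes + metadata_bytes, metadata_bytes / data_bytes

if __name__ == "__main__":
    for name, (data, meta) in FORMATS.items():
        total, ratio = sector_overhead(data, meta)
        print(f"{name}: {total} bytes per sector, "
              f"power of two: {is_power_of_two(total)}, "
              f"metadata overhead: {ratio:.2%}")
```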

      dm-integrity is a device-mapper target that allows us to store a small amount of information about each sector - usually a checksum or CRC.  On read, the checksum is recomputed and compared with the stored one.  If they match, we know that the data has not changed since it was written; a mismatch indicates a change, perhaps due to tampering or bit rot.  In effect, we are checking the "integrity" of the data, hence the name.
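
      The per-sector check can be illustrated with a toy model. This is only a sketch of the idea, not how dm-integrity is implemented: the dict-based store, the CRC32 choice, and the function names are all assumptions made for illustration.

```python
import zlib

SECTOR_SIZE = 512  # assumption: 512-byte data sectors

def make_tag(data):
    """Checksum stored alongside a sector (CRC32 chosen for illustration)."""
    return zlib.crc32(data).to_bytes(4, "little")

def write_sector(store, lba, data):
    """Store the data together with its checksum."""
    if len(data) != SECTOR_SIZE:
        raise ValueError("data must be exactly one sector")
    store[lba] = (bytes(data), make_tag(data))

def read_sector(store, lba):
    """Return the data, or raise if it no longer matches its checksum."""
    data, tag = store[lba]
    if make_tag(data) != tag:
        raise OSError(f"integrity error at sector {lba}")
    return data

if __name__ == "__main__":
    store = {}
    write_sector(store, 0, b"A" * SECTOR_SIZE)
    print(read_sector(store, 0) == b"A" * SECTOR_SIZE)
    # Simulate bit rot: change the data without updating the checksum.
    data, tag = store[0]
    store[0] = (b"B" + data[1:], tag)
    try:
        read_sector(store, 0)
    except OSError as exc:
        print(exc)
```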

      We can leverage this target by putting it under RAID.  If a sector is read and found to be bad, an error is returned.  This error will trigger RAID to re-read the data from a redundant source and attempt a rewrite to the original bad sector.  This often corrects the error (especially if due to bit rot) and is a form of self-healing.  The extra amount of data required to store the checksum for this type of use case is small - 8 bytes will do.
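
      The self-healing flow described above can be sketched as a toy two-way mirror. This is an illustration under stated assumptions, not the actual RAID code: `Leg`, `mirror_read`, and `IntegrityError` are hypothetical names, and only legs tried before the first good copy are repaired.

```python
import zlib

class IntegrityError(Exception):
    """Raised when stored data does not match its checksum."""

class Leg:
    """One mirror leg: maps sector number -> (data, crc32)."""
    def __init__(self):
        self.sectors = {}

    def write(self, lba, data):
        self.sectors[lba] = (data, zlib.crc32(data))

    def read(self, lba):
        data, crc = self.sectors[lba]
        if zlib.crc32(data) != crc:
            raise IntegrityError(lba)
        return data

def mirror_read(legs, lba):
    """Read from the first good leg and rewrite any bad legs seen so far."""
    bad = []
    for leg in legs:
        try:
            data = leg.read(lba)
        except IntegrityError:
            bad.append(leg)
            continue
        for failed in bad:
            failed.write(lba, data)  # repair from the redundant copy
        return data
    raise IntegrityError(lba)  # no leg holds a valid copy

if __name__ == "__main__":
    legs = [Leg(), Leg()]
    for leg in legs:
        leg.write(0, b"payload")
    # Simulate bit rot on the first leg only.
    legs[0].sectors[0] = (b"pay1oad", legs[0].sectors[0][1])
    print(mirror_read(legs, 0))
    print(legs[0].read(0))  # the bad sector has been rewritten
```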

      We can also put dm-integrity under dm-crypt to enable authenticated encryption (ensuring the data was written by a holder of the key).  This use case requires a stronger per-sector tag than a simple CRC, and therefore a bit more space per sector - 64 bytes will do.
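
      To show why a keyed tag needs more room than a plain checksum, here is a minimal sketch using HMAC-SHA256 from the Python standard library. It is an assumption chosen for illustration and does not claim to be the scheme dm-crypt uses; `sector_tag` and `verify_sector` are hypothetical names.

```python
import hashlib
import hmac

METADATA_BYTES = 64  # per-sector metadata size of the 4k+64 format

def sector_tag(key, lba, ciphertext):
    """Keyed tag over the sector number and its (already encrypted) data."""
    msg = lba.to_bytes(8, "little") + ciphertext
    return hmac.new(key, msg, hashlib.sha256).digest()

def verify_sector(key, lba, ciphertext, tag):
    """True only if the tag was produced with this key for this sector."""
    return hmac.compare_digest(sector_tag(key, lba, ciphertext), tag)

if __name__ == "__main__":
    key = b"k" * 32           # placeholder key, for illustration only
    sector = b"\x00" * 4096   # stand-in for one encrypted sector
    tag = sector_tag(key, 5, sector)
    print(len(tag), "byte tag; fits in metadata:", len(tag) <= METADATA_BYTES)
    print(verify_sector(key, 5, sector, tag))
    print(verify_sector(key, 6, sector, tag))  # same data, wrong sector
```

      Binding the sector number into the tag means a valid sector copied to a different location fails verification, which an unkeyed checksum over the data alone would not detect.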

      Before NPo2 hardware became available, dm-integrity had to store sector checksums on disk separately from the associated sectors.  To do this atomically, a journal was required (or a bitmap), which causes a lot of write amplification: a single sector write from an application requires writing to the journal, writing the data, and separately writing the checksum.  NPo2 HW allows us to do this in one step.
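
      The write amplification can be counted with a deliberately simple model. It only tallies the device writes listed in the paragraph above and is not a performance measurement; the mode names "journal" and "inline" and the function names are labels invented for this sketch.

```python
def device_writes(mode):
    """Device writes needed for one application sector write."""
    if mode == "journal":
        # Checksums live apart from the data, so a journal keeps them in sync.
        return ["journal entry", "data sector", "checksum"]
    if mode == "inline":
        # NPo2 format: data and checksum travel in the same sector.
        return ["data sector + checksum"]
    raise ValueError(f"unknown mode: {mode}")

def write_amplification(mode):
    """Number of device writes per application write."""
    return len(device_writes(mode))

if __name__ == "__main__":
    for mode in ("journal", "inline"):
        print(mode, write_amplification(mode), device_writes(mode))
```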

      What SSTs and Layered Product teams should review this?

      sst_logical_storage
       

              Jonathan Brassow (jbrassow@redhat.com)
              Mikulas Patocka
              Guangwu Zhang