Commits
Tony Nguyen committed ff26449e415
PCI: Restore ARI Capable Hierarchy before setting numVFs In the restore path, we previously read PCI_SRIOV_VF_OFFSET and PCI_SRIOV_VF_STRIDE before restoring PCI_SRIOV_CTRL_ARI: pci_restore_state pci_restore_iov_state sriov_restore_state pci_iov_set_numvfs pci_read_config_word(... PCI_SRIOV_VF_OFFSET, &iov->offset) pci_read_config_word(... PCI_SRIOV_VF_STRIDE, &iov->stride) pci_write_config_word(... PCI_SRIOV_CTRL, iov->ctrl) But per SR-IOV r1.1, sec 3.3.3.5, the device can use PCI_SRIOV_CTRL_ARI to determine PCI_SRIOV_VF_OFFSET and PCI_SRIOV_VF_STRIDE. Therefore, this path, which is used for suspend/resume and AER recovery, can corrupt iov->offset and iov->stride. Since the iov state is associated with the device, not the driver, if we reload the driver, it will use the the corrupted data, which may cause crashes like this: kernel BUG at drivers/pci/iov.c:157! RIP: 0010:pci_iov_add_virtfn+0x2eb/0x350 Call Trace: pci_enable_sriov+0x353/0x440 ixgbe_pci_sriov_configure+0xd5/0x1f0 [ixgbe] sriov_numvfs_store+0xf7/0x170 dev_attr_store+0x18/0x30 sysfs_kf_write+0x37/0x40 kernfs_fop_write+0x120/0x1b0 vfs_write+0xb5/0x1a0 SyS_write+0x55/0xc0 Restore PCI_SRIOV_CTRL_ARI before calling pci_iov_set_numvfs(), then restore the rest of PCI_SRIOV_CTRL (which may set PCI_SRIOV_CTRL_VFE) afterwards. Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> [bhelgaas: changelog, add comment, also clear ARI if necessary] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Alexander Duyck <alexander.h.duyck@intel.com> CC: Emil Tantilov <emil.s.tantilov@intel.com>