I’ve been using a Proxmox home server for quite some time now without many problems.
Recently I got an AMD Navi 10 RX 5700 XT and tried to pass it through to a Windows VM.
I mainly followed the official Proxmox guide, but I needed some other tutorials as well to get it running.
For now, it works once after I reboot the host: the VM starts without problems. After the VM itself is restarted, though, it no longer starts and shows this error:
swtpm_setup: Not overwriting existing state file.
kvm: ../hw/pci/pci.c:1637: pci_irq_handler: Assertion `0 <= irq_num && irq_num < PCI_NUM_PINS' failed.
stopping swtpm instance (pid 98348) due to QEMU startup error
TASK ERROR: start failed: QEMU exited with code -1
I tried fixing it using this, but it didn't change much.
EDIT: the link was not shown.
Maybe this?
https://github.com/gnif/vendor-reset
Although I've been passing through a Vega 64 without needing this.
Yeah, I tried that. The link was just not shown in the original post. It didn't really fix it.
Try journalctl to get more details from when it fails?
This is the output from journalctl, covering the time since I stopped and restarted the VM. The main error seems to occur at 16:41:43:
`Dec 19 16:40:45 pve pvedaemon[1590]: end task UPID:pve:00030675:000E7952:6581B96F:vncshell::root@pam: OK
Dec 19 16:40:47 pve kernel: vfio-pci 0000:03:00.0: not ready 16383ms after bus reset; waiting
Dec 19 16:41:03 pve pvedaemon[1590]: starting task UPID:pve:000308EE:000E85EB:6581B98F:qmstart:195:root@pam:
Dec 19 16:41:03 pve pvedaemon[198894]: start VM 195: UPID:pve:000308EE:000E85EB:6581B98F:qmstart:195:root@pam:
Dec 19 16:41:06 pve kernel: vfio-pci 0000:03:00.0: not ready 32767ms after bus reset; waiting
Dec 19 16:41:40 pve kernel: vfio-pci 0000:03:00.0: not ready 65535ms after bus reset; giving up
Dec 19 16:41:41 pve kernel: vfio-pci 0000:03:00.1: Unable to change power state from D0 to D3hot, device inaccessible
Dec 19 16:41:41 pve kernel: vfio-pci 0000:03:00.0: Unable to change power state from D0 to D3hot, device inaccessible
Dec 19 16:41:41 pve systemd[1]: 195.scope: Deactivated successfully.
Dec 19 16:41:41 pve systemd[1]: 195.scope: Consumed 54min 2.778s CPU time.
Dec 19 16:41:41 pve systemd[1]: Started 195.scope.
Dec 19 16:41:41 pve kernel: tap195i0: entered promiscuous mode
Dec 19 16:41:41 pve kernel: vmbr0: port 4(fwpr195p0) entered blocking state
Dec 19 16:41:41 pve kernel: vmbr0: port 4(fwpr195p0) entered disabled state
Dec 19 16:41:41 pve kernel: fwpr195p0: entered allmulticast mode
Dec 19 16:41:41 pve kernel: fwpr195p0: entered promiscuous mode
Dec 19 16:41:41 pve kernel: vmbr0: port 4(fwpr195p0) entered blocking state
Dec 19 16:41:41 pve kernel: vmbr0: port 4(fwpr195p0) entered forwarding state
Dec 19 16:41:41 pve kernel: fwbr195i0: port 1(fwln195i0) entered blocking state
Dec 19 16:41:41 pve kernel: fwbr195i0: port 1(fwln195i0) entered disabled state
Dec 19 16:41:41 pve kernel: fwln195i0: entered allmulticast mode
Dec 19 16:41:41 pve kernel: fwln195i0: entered promiscuous mode
Dec 19 16:41:41 pve kernel: fwbr195i0: port 1(fwln195i0) entered blocking state
Dec 19 16:41:41 pve kernel: fwbr195i0: port 1(fwln195i0) entered forwarding state
Dec 19 16:41:41 pve kernel: fwbr195i0: port 2(tap195i0) entered blocking state
Dec 19 16:41:41 pve kernel: fwbr195i0: port 2(tap195i0) entered disabled state
Dec 19 16:41:41 pve kernel: tap195i0: entered allmulticast mode
Dec 19 16:41:41 pve kernel: fwbr195i0: port 2(tap195i0) entered blocking state
Dec 19 16:41:41 pve kernel: fwbr195i0: port 2(tap195i0) entered forwarding state
Dec 19 16:41:43 pve kernel: vfio-pci 0000:03:00.0: Unable to change power state from D3cold to D0, device inaccessible
Dec 19 16:41:43 pve kernel: vfio-pci 0000:03:00.0: Unable to change power state from D3cold to D0, device inaccessible
Dec 19 16:41:43 pve kernel: vfio-pci 0000:03:00.0: Unable to change power state from D3cold to D0, device inaccessible
Dec 19 16:41:43 pve kernel: vfio-pci 0000:03:00.0: Unable to change power state from D3cold to D0, device inaccessible
Dec 19 16:41:43 pve kernel: vfio-pci 0000:03:00.1: Unable to change power state from D3cold to D0, device inaccessible
Dec 19 16:41:43 pve kernel: vfio-pci 0000:03:00.0: Unable to change power state from D3cold to D0, device inaccessible
Dec 19 16:41:43 pve kernel: vfio-pci 0000:03:00.1: Unable to change power state from D3cold to D0, device inaccessible
Dec 19 16:41:44 pve kernel: pcieport 0000:02:00.0: broken device, retraining non-functional downstream link at 2.5GT/s
Dec 19 16:41:44 pve pvedaemon[1592]: VM 195 qmp command failed - VM 195 not running
Dec 19 16:41:45 pve kernel: pcieport 0000:02:00.0: retraining failed
Dec 19 16:41:46 pve kernel: pcieport 0000:02:00.0: broken device, retraining non-functional downstream link at 2.5GT/s
Dec 19 16:41:47 pve kernel: pcieport 0000:02:00.0: retraining failed
Dec 19 16:41:47 pve kernel: vfio-pci 0000:03:00.0: not ready 1023ms after bus reset; waiting
Dec 19 16:41:48 pve kernel: vfio-pci 0000:03:00.0: not ready 2047ms after bus reset; waiting
Dec 19 16:41:50 pve kernel: vfio-pci 0000:03:00.0: not ready 4095ms after bus reset; waiting
Dec 19 16:41:54 pve kernel: vfio-pci 0000:03:00.0: not ready 8191ms after bus reset; waiting
Dec 19 16:42:03 pve kernel: vfio-pci 0000:03:00.0: not ready 16383ms after bus reset; waiting
Dec 19 16:42:21 pve kernel: vfio-pci 0000:03:00.0: not ready 32767ms after bus reset; waiting
Dec 19 16:42:56 pve kernel: vfio-pci 0000:03:00.0: not ready 65535ms after bus reset; giving up
Dec 19 16:42:56 pve kernel: vfio-pci 0000:03:00.1: Unable to change power state from D3cold to D0, device inaccessible
Dec 19 16:42:56 pve kernel: vfio-pci 0000:03:00.0: Unable to change power state from D3cold to D0, device inaccessible
Dec 19 16:42:56 pve kernel: fwbr195i0: port 2(tap195i0) entered disabled state
Dec 19 16:42:56 pve kernel: tap195i0 (unregistering): left allmulticast mode
Dec 19 16:42:56 pve kernel: fwbr195i0: port 2(tap195i0) entered disabled state
Dec 19 16:42:56 pve pvedaemon[199553]: stopping swtpm instance (pid 199561) due to QEMU startup error
Dec 19 16:42:56 pve pvedaemon[198894]: start failed: QEMU exited with code 1
Dec 19 16:42:56 pve pvedaemon[1590]: end task UPID:pve:000308EE:000E85EB:6581B98F:qmstart:195:root@pam: start failed: QEMU exit>
Dec 19 16:42:56 pve systemd[1]: 195.scope: Deactivated successfully.
Dec 19 16:42:56 pve systemd[1]: 195.scope: Consumed 1.736s CPU time.`
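The useful signal in a log like this is the run of `not ready ...ms after bus reset` lines, where the wait roughly doubles until the kernel gives up. A minimal Python sketch for pulling just those lines out of journal text follows. The function names and the `SAMPLE` excerpt are made up for illustration, and the regex only covers the message wording seen in this log.

```python
import re

# Matches kernel lines such as:
#   "... kernel: vfio-pci 0000:03:00.0: not ready 16383ms after bus reset; waiting"
# Only the wording that appears in the log above is covered.
NOT_READY = re.compile(
    r"vfio-pci (?P<addr>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"not ready (?P<ms>\d+)ms after bus reset; (?P<outcome>waiting|giving up)"
)


def reset_timeouts(log_text):
    """Return (pci_address, waited_ms, outcome) for every 'not ready' line."""
    return [
        (m["addr"], int(m["ms"]), m["outcome"])
        for m in NOT_READY.finditer(log_text)
    ]


def gave_up(log_text):
    """Return the PCI addresses for which the kernel stopped waiting."""
    return {
        addr
        for addr, _, outcome in reset_timeouts(log_text)
        if outcome == "giving up"
    }


# Three lines copied from the log above, used as sample input.
SAMPLE = """\
Dec 19 16:41:06 pve kernel: vfio-pci 0000:03:00.0: not ready 32767ms after bus reset; waiting
Dec 19 16:41:40 pve kernel: vfio-pci 0000:03:00.0: not ready 65535ms after bus reset; giving up
Dec 19 16:41:41 pve kernel: tap195i0: entered promiscuous mode
"""

for entry in reset_timeouts(SAMPLE):
    print(entry)
```

On a real host the input could be the text of `journalctl -k -b` saved to a file. An address returned by `gave_up` means the card never came back after the reset, which is consistent with the later `device inaccessible` lines.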
It does seem a lot like the reset bug, but then you already tried that. :/ Kernel modules aren't that easy to install, and if your kernel is missing the required config flags, the module might just do nothing.
Should show the 6 flags =y
Or maybe some variation of manual reset…
https://forum.proxmox.com/threads/issues-with-intel-arc-a770m-gpu-passthrough-on-nuc12snki72-vfio-pci-not-ready-after-flr-or-bus-reset.130667/
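One common variation of a manual reset is to remove the PCI function through sysfs and then rescan the bus. This is not necessarily what the linked thread does, and it may not help once the card is in the "device inaccessible" state shown in the log. A hedged Python sketch follows. The function name `remove_and_rescan` and its `sysfs_root` parameter are hypothetical. Only the sysfs files `devices/<address>/remove` and `rescan` under `/sys/bus/pci` are real kernel interfaces.

```python
from pathlib import Path


def remove_and_rescan(pci_address, sysfs_root="/sys"):
    """Ask the kernel to drop one PCI function, then re-enumerate the bus.

    Needs root on a real host. `sysfs_root` exists only so the function
    can be tried against a scratch directory first (an assumption made
    for this sketch, not part of any tool).
    """
    bus = Path(sysfs_root) / "bus" / "pci"
    # Writing "1" to <device>/remove detaches the device from the kernel.
    (bus / "devices" / pci_address / "remove").write_text("1")
    # Writing "1" to bus/pci/rescan makes the kernel look for devices again.
    (bus / "rescan").write_text("1")
```

For the card in this thread the call would be `remove_and_rescan("0000:03:00.0")`, run with the VM stopped. The audio function `0000:03:00.1` would likely need the same treatment.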
Just FYI, all 6 flags were shown as =y.
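For anyone repeating the flag check, here is a minimal Python sketch that reports which required options are not set to `=y` in a kernel config. Two assumptions: the six flag names are the ones I recall from the vendor-reset README, so verify them there, and the config is expected at the Debian-style path `/boot/config-<kernel release>`. The function names are made up for illustration.

```python
import platform
import re
from pathlib import Path

# ASSUMPTION: flag names as recalled from the vendor-reset README;
# treat the README as the authoritative list.
REQUIRED_FLAGS = [
    "CONFIG_FTRACE",
    "CONFIG_KPROBES",
    "CONFIG_PCI_QUIRKS",
    "CONFIG_KALLSYMS",
    "CONFIG_KALLSYMS_ALL",
    "CONFIG_FUNCTION_TRACER",
]


def missing_flags(config_text, required=REQUIRED_FLAGS):
    """Return the flags from `required` that are not set to =y."""
    enabled = set(re.findall(r"^(CONFIG_\w+)=y$", config_text, flags=re.MULTILINE))
    return [flag for flag in required if flag not in enabled]


def running_kernel_config_path():
    """Config path for the running kernel.

    ASSUMPTION: Debian-style layout, /boot/config-<release>.
    """
    return Path("/boot") / f"config-{platform.release()}"
```

Usage would be `missing_flags(running_kernel_config_path().read_text())`. An empty list matches the "6 flags =y" result reported here.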
It was intended to be a code block, but somehow it ended up as a bunch of text without newlines.
dmesg also reported:
vendor_reset: module verification failed: signature and/or required key missing - tainting kernel
However, according to https://github.com/gnif/vendor-reset/issues/46#issuecomment-983087796 this error is not that important…
To everyone else encountering this error, I finally fixed it this way: This forum entry sent me here, which then helped me resolve the issue. Huge thanks to you, InEnduringGrowStrong, for pushing me in the right direction.
Ah nice, you got it working.
Once it works it’s great.
I've been running mine for a while now, but I've purposely avoided kernel upgrades so far.
Haha, I already started worrying about that :) But you're right, it's great.