u/CruxOfTheIssue

Hi. I don't think it needs to be said, but I'm new to Proxmox.

I have a PC with a 3060, and the original plan was to make a full VM and pass the 3060 through to it, since the host doesn't need it. When I was testing Proxmox on my old gaming laptop, this worked flawlessly with the laptop's 1070. On this machine it has been plagued with issues: the GPU dropped out randomly and stopped working.

Now I'm using an LXC running Ollama and just sharing the graphics card from the host. It was working well at first, but now I have the same issue. I'm on an older kernel (6.14) because the newer NVIDIA drivers were not working and I couldn't get the old drivers working on the newer kernel. The weirdest thing is that the GPU always shows up in lspci, but nvidia-smi either doesn't work from boot or drops once the GPU idles for a while. I've tried using nvidia-persistenced, but that doesn't help either.
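To pin down when nvidia-smi stops responding, a small polling loop could log a timestamp on every check. This is only a rough sketch: the function names `gpu_state` and `watch_gpu`, the 60-second default interval, and the log path are illustrative placeholders, not part of any NVIDIA or Proxmox tooling.

```shell
#!/usr/bin/env bash
# Sketch: record when a GPU health check starts failing.
# All names, the interval, and the log path are hypothetical placeholders.

# Print "ok" if the given command succeeds, "FAIL" otherwise.
gpu_state() {
    if "$@" >/dev/null 2>&1; then
        echo "ok"
    else
        echo "FAIL"
    fi
}

# Poll forever, appending one timestamped line per check.
# $1 = seconds between checks, $2 = log file (both optional).
watch_gpu() {
    local interval="${1:-60}" log="${2:-/var/log/gpu-watch.log}"
    while true; do
        printf '%s %s\n' "$(date -Is)" "$(gpu_state nvidia-smi)" >> "$log"
        sleep "$interval"
    done
}

# Example: watch_gpu 60 /var/log/gpu-watch.log
```

One caveat: polling touches the driver on every check, so it may change the idle behavior it is trying to observe. A longer interval is probably less intrusive.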

~# dmesg | grep -iE "nvidia|xid|pcie|aer" | head -30

[ 0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-6.14.11-7-bpo12-pve root=/dev/mapper/pve-root ro pci=noaer quiet pcie_aspm=off pcie_port_pm=off intel_iommu=on

[ 0.085393] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-6.14.11-7-bpo12-pve root=/dev/mapper/pve-root ro pci=noaer quiet pcie_aspm=off pcie_port_pm=off intel_iommu=on

[ 0.085446] PCIe ASPM is disabled

[ 0.303407] ACPI FADT declares the system doesn't support PCIe ASPM, so disable it

[ 0.398653] pci 0000:00:01.0: [8086:4c01] type 01 class 0x060400 PCIe Root Port

[ 0.399552] pci 0000:00:06.0: [8086:4c09] type 01 class 0x060400 PCIe Root Port

[ 0.401997] pci 0000:00:14.3: [8086:43f0] type 00 class 0x028000 PCIe Root Complex Integrated Endpoint

[ 0.413603] pci 0000:00:1c.0: [8086:43bc] type 01 class 0x060400 PCIe Root Port

[ 0.414300] pci 0000:00:1d.0: [8086:43b0] type 01 class 0x060400 PCIe Root Port

[ 0.416709] pci 0000:01:00.0: [10de:2504] type 00 class 0x030000 PCIe Legacy Endpoint

[ 0.417097] pci 0000:01:00.1: [10de:228e] type 00 class 0x040300 PCIe Endpoint

[ 0.417255] pci 0000:02:00.0: [15b7:5009] type 00 class 0x010802 PCIe Endpoint

[ 3.366975] input: HDA NVidia HDMI/DP,pcm=3 as /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card1/input7

[ 3.435963] nvidia: loading out-of-tree module taints kernel.

[ 3.435972] nvidia: module license 'NVIDIA' taints kernel.

[ 3.435976] nvidia: module verification failed: signature and/or required key missing - tainting kernel

[ 3.435976] nvidia: module license taints kernel.

[ 3.497639] nvidia-nvlink: Nvlink Core is being initialized, major device number 235

[ 3.498715] nvidia 0000:01:00.0: vgaarb: VGA decodes changed: olddecodes=io+mem,decodes=none:owns=io+mem

[ 3.504842] input: HDA NVidia HDMI/DP,pcm=7 as /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card1/input8

[ 3.505232] input: HDA NVidia HDMI/DP,pcm=8 as /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card1/input9

[ 3.505452] input: HDA NVidia HDMI/DP,pcm=9 as /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card1/input10

[ 3.545555] NVRM: loading NVIDIA UNIX x86_64 Kernel Module 550.163.01 Tue Apr 8 12:41:17 UTC 2025

[ 3.554547] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms 550.163.01 Tue Apr 8 12:09:34 UTC 2025

[ 3.557497] [drm] [nvidia-drm] [GPU ID 0x00000100] Loading driver

[ 3.557499] [drm] Initialized nvidia-drm 0.0.0 for 0000:01:00.0 on minor 1

[ 3.560653] nvidia_uvm: module uses symbols nvUvmInterfaceDisableAccessCntr from proprietary module nvidia, inheriting taint.

[ 3.591190] nvidia-uvm: Loaded the UVM driver, major device number 511.
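Everything in the output above is from the first few seconds of boot, and `head -30` keeps only the earliest matches. If the card drops after idling, any related messages would be at the end of the log instead. Below is a sketch of a filter that reads from the tail. The function name `late_gpu_errors` is hypothetical, and the patterns are just message fragments the NVIDIA kernel module is commonly seen to log on failures, not an exhaustive list.

```shell
#!/usr/bin/env bash
# Sketch: keep only the most recent GPU error lines from a kernel log on stdin.
# The function name and the pattern list are illustrative assumptions.

late_gpu_errors() {
    local count="${1:-20}"   # how many of the latest matching lines to keep
    grep -iE 'NVRM: Xid|fallen off the bus|RmInitAdapter failed' | tail -n "$count"
}

# Example: dmesg -T | late_gpu_errors 20
```

Empty output would only mean none of these particular patterns matched, not that the log is clean.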

I am basically completely out of ideas on how to make this work.

More weirdness: the VBIOS shows as ??.??.??.??.?? and the link speed is "downgraded".

cat /proc/driver/nvidia/gpus/0000:01:00.0/information

`Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-`

`Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-`

`LnkCap: Port #0, Speed 16GT/s, Width x16, ASPM L0s L1, Exit Latency L0s <512ns, L1 <16us`

`LnkCtl: ASPM Disabled; RCB 64 bytes, LnkDisable- CommClk-`

`LnkSta: Speed 2.5GT/s (downgraded), Width x16`

`LnkCap2: Supported Link Speeds: 2.5-16GT/s, Crosslink- Retimer+ 2Retimers+ DRS-`

`LnkCtl2: Target Link Speed: 16GT/s, EnterCompliance- SpeedDis-`

`LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete- EqualizationPhase1-`

`Status: InProgress-`

`Status: NegoPending- InProgress-`

`LnkCtl3: LnkEquIntrruptEn- PerformEqu-`

Model: NVIDIA GeForce RTX 3060

IRQ: 170

GPU UUID: GPU-4dc155a4-ca48-7ab4-6be2-b53d299666e3

Video BIOS: ??.??.??.??.??

Bus Type: PCIe

DMA Size: 47 bits

DMA Mask: 0x7fffffffffff

Bus Location: 0000:01:00.0

Device Minor: 0

GPU Excluded: No
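For comparing the `LnkCap` and `LnkSta` lines above without reading them by eye, here is a minimal sketch that pulls both speeds out of `lspci -vv` output. The function name `link_speeds` is made up for illustration, and it assumes the `Speed <n>GT/s` wording shown in the dump above.

```shell
#!/usr/bin/env bash
# Sketch: compare maximum and current PCIe link speed from `lspci -vv` on stdin.
# The function name is hypothetical; the patterns assume the lspci wording above.

link_speeds() {
    awk '
        # "LnkCap:" and "LnkSta:" include the colon so LnkCap2/LnkSta2 are skipped.
        /LnkCap:/ && match($0, /Speed [0-9.]+GT\/s/) { cap = substr($0, RSTART + 6, RLENGTH - 6) }
        /LnkSta:/ && match($0, /Speed [0-9.]+GT\/s/) { sta = substr($0, RSTART + 6, RLENGTH - 6) }
        END {
            if (cap == "" || sta == "") { print "link info not found"; exit 1 }
            printf "max=%s current=%s%s\n", cap, sta, (cap != sta ? " (below max)" : "")
        }
    '
}

# Example (the capability fields usually need root):
#   lspci -vv -s 01:00.0 | link_speeds
```

GPUs commonly drop the link to a lower speed at idle to save power, so a reading taken under load is probably more meaningful than one taken at idle.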

u/CruxOfTheIssue — 10 days ago