Hoping someone can help me make sense of some metrics and confirm whether our planned upgrade will actually fix the problem.
**Our setup:**
- Azure VM: Standard B2als_v2 (2 vCPUs, 4GB RAM)
- OS: Ubuntu 24.04
- Disk: 64GB Premium SSD LRS, Max IOPS: 240
- Moodle with H5P, SCORM, and standard activities
- ~70 active users currently, planning to onboard 350-500 more by October
**The problem:**
Duplicating an activity, even a lightweight text-only one, is extremely slow and can take over a minute. Today, duplicating a single activity made the entire site inaccessible for more than 10 minutes.
**What the Azure metrics showed during the outage:**
- CPU: peaked at only 31%
- Disk Write Operations: spiked to 630+/sec against a Max IOPS of 240 (more than 2.6x the disk limit)
- Disk Write throughput: peaked at 1.59GiB
- Disk Read throughput: peaked at 555KiB
- Network Out: peaked at 77MiB — not sure if this is relevant or abnormal
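
For anyone checking the arithmetic, here's a rough sketch of how the numbers above compare. The IOPS ratio comes straight from the figures in this post. The throughput conversion assumes the 1.59GiB figure is a total over a 60-second metric interval, which I have not confirmed, so treat the MiB/s result as illustrative only.

```python
# Figures taken from the Azure metrics listed above.
MAX_IOPS = 240          # disk limit
PEAK_WRITE_OPS = 630    # observed write operations per second

def iops_overrun(observed, limit):
    """How many times over the disk's IOPS limit the observed rate is."""
    return observed / limit

def gib_to_mib_per_sec(gib, interval_seconds):
    """Convert a total in GiB over an interval to an average MiB/s.

    interval_seconds is an assumption: the metric's aggregation window.
    """
    return gib * 1024 / interval_seconds

print(f"IOPS overrun: {iops_overrun(PEAK_WRITE_OPS, MAX_IOPS):.3f}x")
print(f"Avg write rate if 1.59GiB is per minute: {gib_to_mib_per_sec(1.59, 60):.1f} MiB/s")
```

If the aggregation window is different (say 5 minutes), pass that instead of 60 and the average rate drops accordingly.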
**Our interpretation:**
Duplication appears to saturate the disk: writes exceed the IOPS limit, so I/O requests queue up and the site becomes unresponsive. CPU and RAM don't appear to be the bottleneck.
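
To cross-check this from inside the VM rather than relying only on the Azure portal, something like the sketch below could sample write IOPS and throughput while an activity is being duplicated. It reads the standard Linux `/proc/diskstats` counters. The device name `sda` is an assumption (check with `lsblk`), and the helper names are just illustrative.

```python
import time

SECTOR_BYTES = 512  # /proc/diskstats counts sectors in 512-byte units

def parse_diskstats(text, device):
    """Return (writes_completed, sectors_written) for one device."""
    for line in text.splitlines():
        fields = line.split()
        # fields: major, minor, name, then the I/O counters
        if len(fields) >= 10 and fields[2] == device:
            return int(fields[7]), int(fields[9])
    raise ValueError(f"device {device!r} not found")

def write_rates(before, after, seconds):
    """Turn two counter snapshots into (write IOPS, write MiB/s)."""
    iops = (after[0] - before[0]) / seconds
    mib_per_sec = (after[1] - before[1]) * SECTOR_BYTES / (1024 * 1024) / seconds
    return iops, mib_per_sec

def sample(device="sda", interval=1.0):
    """Take two snapshots `interval` seconds apart and return the rates."""
    with open("/proc/diskstats") as f:
        before = parse_diskstats(f.read(), device)
    time.sleep(interval)
    with open("/proc/diskstats") as f:
        after = parse_diskstats(f.read(), device)
    return write_rates(before, after, interval)
```

Calling `sample("sda")` in a loop during a duplication would show whether the in-guest write rate really sits above the 240 IOPS limit for the whole outage.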
**Planned upgrade:**
We're considering moving to a D4as_v4 (4 vCPUs, 16GB RAM, local SCSI storage, higher IOPS), primarily because it addresses the disk I/O limitation. But we're not confident it can handle a 1.59GiB write spike. Is that volume of writes normal for Moodle activity duplication, or does it suggest something else is going on?
**Specific questions:**
- Is there anything else we should be checking or tuning on the Moodle side before upgrading the VM?
Thanks in advance — any help is appreciated!
***Update: Thank you, Redditors, for being more timely and helpful than Azure customer support!***