Polars data pipeline runs slower on 128-core EC2 instance
The problem:
I have a relatively complex data pipeline written in Polars. On my local machine with 12 cores, the pipeline finishes in about 1200 ms. On my 128-core EC2 instance (c8i.32xlarge), it takes 13000 ms to complete. I have tried setting the POLARS_MAX_THREADS environment variable to 12 on the EC2 instance, and it's still slower.
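For reference, here is a minimal sketch of how the thread cap could be compared across several values. As far as I understand from the Polars docs, POLARS_MAX_THREADS has to be in the environment before the process starts, so the sketch launches a fresh child process per setting. The helper name `run_with_thread_cap` and the script name `pipeline.py` are placeholders, not part of my actual pipeline, and the child command here only echoes the variable it sees.

```python
import os
import subprocess
import sys
import time


def run_with_thread_cap(cmd, n_threads):
    """Run cmd in a fresh process with POLARS_MAX_THREADS set.

    Returns (elapsed_seconds, stdout). The timing includes process startup.
    """
    env = dict(os.environ)
    # Set the variable in the child's environment so it is present
    # before Polars is imported there.
    env["POLARS_MAX_THREADS"] = str(n_threads)
    start = time.perf_counter()
    result = subprocess.run(cmd, env=env, capture_output=True, text=True, check=True)
    return time.perf_counter() - start, result.stdout


if __name__ == "__main__":
    # Stand-in child that only prints the variable it received.
    # Replace with e.g. [sys.executable, "pipeline.py"] (hypothetical script name).
    child = [sys.executable, "-c", "import os; print(os.environ['POLARS_MAX_THREADS'])"]
    for n in (12, 32, 128):
        elapsed, out = run_with_thread_cap(child, n)
        print(f"threads={n} seen_by_child={out.strip()} elapsed={elapsed * 1000:.0f} ms")
```

Running the real pipeline this way on both machines would at least confirm that the cap is actually being picked up on the EC2 instance.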
I am using a tmpfs partition on both machines so that the pipeline reads its input data directly from RAM. Both my machine and the EC2 instance have DDR5 RAM, so I expected memory performance to be comparable.
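To rule out a mistake on the storage side, a small check like the following could confirm that the input path really sits on tmpfs on each machine. This is only a sketch: it assumes Linux (it parses /proc/mounts), the function name `fs_type_for` is made up, the path `/mnt/ramdisk/data.parquet` is a placeholder, and mount points containing spaces are not handled.

```python
import os

# Sample in the same format as /proc/mounts: device, mount point, fs type, options, ...
SAMPLE_MOUNTS = """\
/dev/nvme0n1p1 / ext4 rw,relatime 0 0
tmpfs /mnt/ramdisk tmpfs rw,size=16g 0 0
"""


def fs_type_for(path, mounts_text):
    """Return the filesystem type of the mount that contains path, or None."""
    path = os.path.abspath(path)
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        # Keep the longest matching mount point; on ties the later entry wins.
        if (path + "/").startswith(prefix) and len(mount_point) >= len(best_mount):
            best_mount, best_type = mount_point, fs_type
    return best_type


if __name__ == "__main__":
    data_path = "/mnt/ramdisk/data.parquet"  # placeholder path
    print("sample:", fs_type_for(data_path, SAMPLE_MOUNTS))
    if os.path.exists("/proc/mounts"):
        with open("/proc/mounts") as f:
            print("this machine:", fs_type_for(data_path, f.read()))
```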
Does anyone have any ideas why the pipeline would run so much slower on the EC2 instance?