u/Popular-Sand-3185

Polars data pipeline run slower on 128-core EC2

The problem:

I have a relatively complex data pipeline that is written in Polars. On my local machine with 12 cores, the pipeline finishes in about 1200ms. On my 128-core EC2 (c8i.32xlarge), it takes 13000ms to complete. I have tried setting the POLARS_MAX_THREADS parameter to 12 on the EC2, and it's still slower.

I am using a TMPFS partition on both machines to read the data into the pipeline directly from RAM. Both my machine and the EC2 have DDR5 RAM so I think they should be comparable.

Anyone have any ideas why the pipeline would run much slower on the EC2?

reddit.com

u/Popular-Sand-3185 — 6 days ago

▲ 44 r/Python

Polars code runs slower on 128-core EC2

Disclaimer: I am not sure this post is appropriate for r/LearnPython since it's not a question of "how to do something in Python", rather I am looking for a lower-level discussion for why my Python application performs poorly on a significantly more powerful server. Hence I'm posting it here.

The problem:

I have a relatively complex data pipeline that is written in Polars. On my local machine with 12 cores, the pipeline finishes in about 1200ms. On my 128-core EC2, it takes 13000ms to complete. I have tried setting the POLARS_MAX_THREADS parameter to 12 on the EC2, and it's still slower.

I am using a TMPFS partition on both machines to read the data into the pipeline directly from RAM. Both my machine and the EC2 have DDR5 RAM so I think they should be comparable.

Anyone have any ideas why the pipeline would run much slower on the EC2?

reddit.com

u/Popular-Sand-3185 — 6 days ago