CPU-intensive Flask app can only support 1 user per VM?
I continue to battle a lot of server config with a Flask app. Just reaching out once again in case anyone has any ideas.
The current status:
The primary endpoint is CPU-intensive, taking 6 seconds to complete. Gunicorn runs with the gthread worker type, 1 worker, 1 thread. This appeared to be the most performant and stable setup.
I host the Flask app on one of the main cloud providers, which spins up extra replicas of the app as needed.
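For reference, a setup like the one described would look roughly like this as a Gunicorn config file. This is only a sketch: the file name `gunicorn.conf.py` and the `bind` address are placeholders I'm assuming for illustration, while `worker_class`, `workers` and `threads` mirror the values above.

```python
# gunicorn.conf.py (hypothetical file name; values mirror the setup described above)

# Placeholder address, an assumption for illustration only.
bind = "0.0.0.0:8000"

# Threaded worker type: each worker process serves requests from a thread pool.
worker_class = "gthread"

# One worker process with one thread, so only one request is handled at a time.
workers = 1
threads = 1
```

It would be started with something like `gunicorn -c gunicorn.conf.py app:app`, where `app:app` is again a placeholder for the real module and Flask object.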
Essentially, with some of the nuances of Python, the GIL in particular, I don't seem to be able to support more than 1 concurrent user on a single machine. With more than one, either the response time doubles or requests queue behind the scenes, with threads competing for resources.
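To put rough numbers on the "response time doubles" behaviour, here is a back-of-the-envelope model. It is a sketch under idealized assumptions (purely CPU-bound requests, CPU time shared evenly, no other overhead), and `estimated_latency` is a hypothetical helper, not something from Gunicorn or Flask.

```python
def estimated_latency(concurrent_requests: int, cores: int,
                      cpu_seconds: float = 6.0) -> float:
    """Estimate per-request latency when CPU-bound requests share cores evenly.

    Assumption: each request needs `cpu_seconds` of pure CPU time and the
    available cores are split fairly between the requests in flight.
    """
    if concurrent_requests < 1 or cores < 1:
        raise ValueError("concurrent_requests and cores must be >= 1")
    # Requests only slow down once there are more of them than usable cores.
    slowdown = max(1.0, concurrent_requests / cores)
    return cpu_seconds * slowdown


if __name__ == "__main__":
    # Threads in one Python process share the GIL, so for pure-Python
    # CPU work the usable core count is effectively 1.
    for users in (1, 2, 3, 4):
        print(users, "concurrent ->", estimated_latency(users, cores=1), "s")
```

Under this model, two concurrent requests on one usable core each take 12 seconds, which matches the doubling I see. Whether the work is served concurrently or queued, the total CPU time needed stays the same.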
Is there anything I'm missing, or is this just a hard computation problem where the machine gets exhausted?
I know the Gunicorn docs say 1 worker can support thousands of users, but does that only apply if the task is lightweight?
Thanks for any ideas. I feel like I've exhausted all the Gunicorn worker types, worker counts, thread counts, etc.