r/FastAPI

▲ 15 r/FastAPI

Monitoring and Observability in FastAPI

I am trying to understand best practices for monitoring and observability in FastAPI. Does it ship with metrics and OTel support out of the box? Also, how are you using other tools and libraries along with it to make it production ready?

reddit.com
u/sohtw — 1 day ago
▲ 26 r/FastAPI

You Probably Don't Need Celery in Your FastAPI App

A lot of FastAPI developers end up with Celery not because they need a distributed task queue, but because BackgroundTasks stopped being enough and Celery was the first thing that came up when they searched for a solution.

This is about that gap, and a library that fills it without the overhead.

What BackgroundTasks does not give you

FastAPI's built-in BackgroundTasks is straightforward. You attach a function to the response and Starlette calls it after the response is sent.

from fastapi import BackgroundTasks, FastAPI

app = FastAPI()

@app.post("/signup")
def signup(email: str, background_tasks: BackgroundTasks):
    background_tasks.add_task(send_welcome_email, email)
    return {"ok": True}

That covers fire-and-forget. But in a real application you quickly hit walls:

No retries. If send_welcome_email fails because the SMTP server returned a 503, the task is gone. There is no retry, no backoff, no record of what happened.

No persistence. If the app restarts during a deploy, every queued task disappears. Tasks that were waiting to run simply never run.

No visibility. You cannot see what is running, what has run, what failed, or how long things took. The only way to know a task failed is to catch it in your logs, if you are logging at all.

No scheduling. BackgroundTasks runs things once, after the current request. There is no built-in way to run something on a schedule.

These are not edge cases. They are the baseline requirements for any background job in production.
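What hand-rolling just the first requirement looks like, as a rough sketch (the `run_with_retries` helper is illustrative, not from any library; it still blocks the caller and loses all state on restart):

```python
import time

def run_with_retries(task, *args, retries=3, delay=1.0, backoff=2.0):
    """Naive retry with exponential backoff: call task, and on any
    exception wait delay, delay*backoff, delay*backoff**2, ... seconds."""
    attempt = 0
    while True:
        try:
            return task(*args)
        except Exception:
            attempt += 1
            if attempt > retries:
                raise  # out of retries: the failure is still unrecorded
            time.sleep(delay)
            delay *= backoff
```

Even this covers only retries; persistence, scheduling, and visibility each need comparable machinery of their own.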

Why Celery is the wrong answer for most of these

When developers hit these limitations, the standard advice is: add Celery. And Celery does solve all four problems. But it solves them by giving you a distributed task queue, which comes with the full infrastructure that entails.

To use Celery with FastAPI you need:

  • A message broker. Usually Redis or RabbitMQ. A separate service to run, configure, monitor, and back up.
  • A worker process. A separate process that consumes from the broker. Needs to be deployed, restarted on failure, and kept in sync with the app on every deploy.
  • Celery Beat for scheduling. Another separate process.
  • Flower or similar if you want visibility. Yet another service.

Celery was built for teams running tasks on dedicated workers across multiple machines at high volume. If that describes your situation, it is the right tool. But most FastAPI apps sending emails, processing uploads, running nightly reports, and syncing data are not in that category. They just needed BackgroundTasks to grow up a little.

What actually fills the gap

fastapi-taskflow is built specifically for this problem: FastAPI apps that have outgrown BackgroundTasks but do not need a distributed task queue.

It runs inside your FastAPI process. No broker. No separate worker. Tasks execute the same way they do today, after the response, but now with retries, persistence, scheduling, and a live dashboard.

Setup:

from fastapi import BackgroundTasks, FastAPI
from fastapi_taskflow import TaskAdmin, TaskManager

task_manager = TaskManager(snapshot_db="tasks.db", requeue_pending=True)
app = FastAPI()
TaskAdmin(app, task_manager, auto_install=True)

Retries:

@task_manager.task(retries=3, delay=60.0, backoff=2.0)
def send_welcome_email(email: str):
    _send(email)  # raise any exception — the retry handles it

The function stays a plain function. Raise an exception on failure and it retries automatically with exponential backoff.

Your routes stay the same:

@app.post("/signup")
def signup(email: str, background_tasks: BackgroundTasks):
    background_tasks.add_task(send_welcome_email, email=email)
    return {"ok": True}

Same annotation. Same calling convention. Nothing changes in your routes.

Persistence across restarts:

requeue_pending=True saves tasks that were queued at shutdown and re-dispatches them on the next startup. Tasks no longer disappear on deploy.

Scheduling:

@task_manager.schedule(cron="0 2 * * *")
def nightly_cleanup():
    _run_cleanup()

No Beat process. No separate service. The scheduler runs inside the app.

Eager dispatch:

BackgroundTasks always runs after the response is sent. If you need a task to start immediately, before the response goes out, set eager=True:

@task_manager.task(retries=3, eager=True)
async def notify_user(user_id: int):
    await push_service.send(user_id, "Your request is processing")

The task starts via asyncio.create_task the moment add_task() is called. It is still tracked, still retried on failure, still visible in the dashboard. You can also set it per call:

background_tasks.add_task(notify_user, user_id, eager=True)

Visibility:

/tasks/dashboard is a live dashboard that shows every task, its current status, duration, logs, and the full stack trace on failure. It updates over SSE in real time. No Flower setup, no external monitoring service.

The honest trade-offs

This is not a Celery replacement. If your tasks are CPU-intensive and need isolation from request handlers, if you need to route different task types to dedicated worker machines, or if you are processing thousands of tasks per minute, you need a proper task queue.

What fastapi-taskflow covers is the case where you reached for Celery because BackgroundTasks gave you nothing, not because you genuinely needed distributed workers.

For a single-host deployment, multiple instances on the same host share a SQLite file. For multiple hosts, swap the backend to Redis or PostgreSQL, and idempotency, requeue claiming, and task history all work across instances without extra coordination overhead.

What you skip entirely

No broker to run or monitor. No worker process to deploy or restart. No Celery app instance or separate tasks module. No Beat process for scheduling. No Flower for visibility.

Local development stays at uvicorn app.main:app. New developers on the project do not need to learn a separate system.

The four things that pushed you toward Celery in the first place (retries, persistence, scheduling, and visibility) are covered.

Dashboard View

Error stacktrace View

reddit.com
u/Educational-Hope960 — 3 days ago

What “production-ready FastAPI” actually means beyond making the route work

A lot of beginner FastAPI projects stop at:

@app.post("/login")
def login():
    ...

But in real apps, “it works” is not the same as “it’s safe to ship.”

Some things I think every FastAPI route should be checked for:

  • Does the route verify the current user owns the resource?
  • Does it return only safe response fields?
  • Are expired / invalid tokens tested?
  • Are duplicate emails handled properly?
  • Are async DB sessions used correctly?
  • Are errors consistent and not leaking internals?
  • Are tests covering failure cases, not only happy paths?

The biggest jump for me was realizing that backend quality is mostly about edge cases.

Curious what other FastAPI devs here check before shipping a route?

reddit.com
u/Mysterious-Aerie4808 — 2 days ago
▲ 37 r/FastAPI+1 crossposts

Implementing OpenTelemetry in FastAPI Projects

Hi Pythonistas, I recently revamped our article on Implementing OpenTelemetry in FastAPI Projects in a practical manner, which was originally written in 2024 and needed a fresh coat of paint.

The article covers auto-instrumentation, manual spans, visualizing metrics, and how observability lets you understand how your web apps behave.
I've also included some advanced tips, such as selective error tracking and wrapping dependency functions to capture operations within the `yield` scope.

Since a lot of the concepts discussed here are independent of the FastAPI framework, any developer working with Python can probably find something of use here.

Finally, I hope this write-up helps some folks get familiar with OpenTelemetry and observability.
Any feedback would be much appreciated. I'm also curious what problems you face with monitoring your web apps, be it FastAPI or any other web framework.

---

On a personal note, when implementing OpenTelemetry at my previous job, I went in semi-blind and relied on agents to guide me, and then spent a good week dealing with the various issues that popped up along the way...

u/silksong_when — 3 days ago

Which is the best tutorial or crash course to learn FastAPI one day before an interview?

I work at a firm as an Odoo/Python intern and I have an interview the day after tomorrow. I got shortlisted because of a vibe-coded project using FastAPI, but I'm only familiar with the basics of FastAPI up to passing parameters in the API URL. I need to cover FastAPI before my interview; it's the first round, so there might be only basic questions. Please suggest something.

reddit.com
u/istiyak23 — 4 days ago

FastAPI with SvelteKit

Hello, I am building an inventory system. I build with Django for my other projects, but I decided to learn FastAPI and a JS frontend. I've learned DRF but never really went all the way to a full implementation of it, especially dealing with authentication. So any help would be much appreciated.

Here is the stack that I'm going for:
- SvelteKit (because of remote functions? or is it better to go with plain Svelte?)
- FastAPI
- Postgresql, SQLAlchemy, Alembic
- pyjwt (for authentication? or is there a better library?)
- S3 for file storage

Maybe Zod for data validation on the client? Do I need axios? Is there anything else I'm missing, like remembering to set up CORSMiddleware in FastAPI?

Also, is there any GitHub repo with a setup similar to this one that I could take a look at?

reddit.com
u/beast_b0iii — 5 days ago

how do you actually handle prod bugs. do you write a repro test or just fix and deploy?

honest question because i've gone back and forth on this myself.

when sentry fires do you actually reproduce it locally as a failing test before touching anything, or do you just read the trace, understand what broke and push the fix?

i always end up spending like 30-45 mins just getting the repro right. reconstructing the state, getting deps working in the test, running it, realizing the inputs are slightly off, running it again. by the time it actually reproduces i've lost the whole debugging flow.

got annoyed enough that i started building something to automate it. grabs the frame locals from sentry, generates a pytest, runs it in docker against your branch. still figuring out if this is actually useful to other people or just my own problem.

how long does it take you to write a repro test from a sentry trace? do you even bother or just push and monitor? has skipping it ever come back to bite you?
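For what it's worth, the core of what you're describing (frame locals in, pytest out) can be sketched in a few lines; the event shape here is simplified and hypothetical, not Sentry's actual payload:

```python
import json
import textwrap

def generate_repro_test(event: dict) -> str:
    """Turn a simplified error event (module, function name, frame locals,
    exception type) into a pytest that replays the failing call."""
    func = event["function"]
    module = event["module"]
    args = ", ".join(
        f"{k}={json.dumps(v)}" for k, v in event["frame_locals"].items()
    )
    exc = event["exception_type"]
    return textwrap.dedent(f"""\
        import pytest
        from {module} import {func}

        def test_repro_{func}():
            # Reconstructed from captured frame locals
            with pytest.raises({exc}):
                {func}({args})
        """)
```

The hard parts you mention (reconstructing non-JSON state, getting deps working in the test environment) are exactly what this sketch leaves out.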

reddit.com
u/sszz01 — 5 days ago
▲ 11 r/FastAPI+6 crossposts

ArchUnit for Python: visualize + enforce dependencies. I've added your requested features!

A week ago I posted about ArchUnitPython, my library for enforcing architecture rules in Python projects as unit tests.

A few of you pointed out two very practical gaps for real Python codebases:
external dependencies and type-only imports. I've now added both.

------

First a mini recap of what ArchUnitPython does:

  • Most tools catch style issues, formatting issues, or generic smells.
  • ArchUnitPython focuses on structural rules: wrong dependency directions, circular dependencies, naming convention drift, architecture/diagram mismatch, and so on.
  • You define those rules as tests, run them in pytest/unittest, and they automatically become part of CI/CD.

In other words: ArchUnitPython allows you to enforce your architectural decisions by writing them as simple unit tests.

That matters more than ever in Claude Code / Codex times, because LLMs are great at generating code but they love to violate architectural boundaries, especially when they get stuck.

Repo: https://github.com/LukasNiessen/ArchUnitPython

------

Now what’s new

1. External Dependency Rules

Before, ArchUnitPython could already enforce internal dependency rules like:

“presentation must not depend on database” or “services must not import api”

Now it can also enforce rules about imports to modules outside your project, for example:

  • domain code must not import requests
  • core logic must not import sqlalchemy
  • only certain layers may use pandas, boto3, etc.

So you can now guard not just folder-to-folder boundaries, but also framework / SDK usage boundaries.

Example:

rule = (
    project_files("src/")
    .in_folder("**/domain/**")
    .should_not()
    .depend_on_external_modules()
    .matching("requests")
)
assert_passes(rule)

This is especially useful in layered or hexagonal architectures where the real problem is often not “wrong local file import”, but “core code now directly depends on infrastructure/framework code”.

2. TYPE_CHECKING-aware dependency analysis

Python has a common pattern for type-only imports:

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    from my_app.models import User

Those imports are used for static typing, but they are not real runtime coupling in the same way normal imports are.

Previously, architecture analysis would still count them as ordinary dependencies.
Now you can choose to ignore them when checking architecture rules.

Example:

assert_passes(
    rule,
    CheckOptions(ignore_type_checking_imports=True),
)

This matters because modern Python codebases use type hints heavily, and otherwise architecture checks can become noisy or overly strict for relationships that only exist for typing.
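The distinction rests on the fact that `TYPE_CHECKING` is `False` at runtime, so guarded imports never execute. A quick self-contained demonstration (the module name is deliberately nonexistent):

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Only static type checkers see this branch; it never runs.
    from nonexistent_heavy_module import HeavyModel

def describe(model: "HeavyModel") -> str:
    # String annotation: no runtime dependency on the guarded import.
    return f"model={model!r}"
```

This is why counting such imports as ordinary dependencies overstates the real runtime coupling between modules.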

------

Very curious for any type of feedback! PRs are also highly welcome.

u/trolleid — 6 days ago

Issue while processing multiple doc at a time in a limited memory space

I have an API that tags documents based on names from a predefined standard name list. Before tagging, I receive a list of file URLs (public Firebase URLs), typically around 40–80 files, which need to be processed using an LLM (currently OpenAI).

The issue I’m facing is with memory usage. When I download these documents for processing, the application sometimes crashes due to out-of-memory (OOM) errors. The instance I’m using has only 2GB RAM, and I’d prefer not to increase the instance size until I’ve fully optimized the code.

The problem seems to occur because multiple PDFs are being processed asynchronously, and at some point, many of them are held in memory simultaneously. I also perform additional operations like base64 encoding for images, which further increases memory usage. Since I need to return all document tags within about a minute, I’m using parallel processing.

Current approach:

  • There are 10–15 document types that I send directly to OpenAI.
  • For images (JPG, PNG, JPEG): I download them, base64 encode them, and send them to OpenAI.
  • For PDFs: I download them, upload them via OpenAI’s file API, and then send the file ID for processing.
  • All of this is done in parallel using semaphores:
    • OPENAI_SEMAPHORE = 30
    • DOWNLOAD_SEMAPHORE = 15

Problem:

Even with semaphores, memory usage spikes because multiple large files are downloaded and processed at the same time. This leads to OOM crashes.

Questions:

  • How can I reduce memory usage in this workflow?
  • Is there a better architectural approach to handle this kind of workload?
  • How can I avoid having too many documents in memory at once while still maintaining performance constraints?


async def _stream_download_file(url: str, ext: str) -> str:
    """
    Stream-download a file to disk in 64KB chunks.
    Never holds the full file in memory; writes directly to disk.
    Returns the path to the temp file.
    """
    async with DOWNLOAD_SEMAPHORE:
        temp_path = None
        try:
            temp = tempfile.NamedTemporaryFile(delete=False, suffix=ext or ".tmp")
            temp_path = temp.name
            temp.close()

            async with http_client.stream("GET", url, follow_redirects=True) as response:
                response.raise_for_status()
                with open(temp_path, "wb") as f:
                    async for chunk in response.aiter_bytes(chunk_size=65536):
                        f.write(chunk)

            return temp_path

        except httpx.TimeoutException as e:
            # httpx raises its own TimeoutException, not asyncio.TimeoutError
            _cleanup_local_file(temp_path, "failed download")
            raise RuntimeError(f"Download timed out after {DOWNLOAD_TIMEOUT}s") from e
        except Exception as e:
            _cleanup_local_file(temp_path, "failed download")
            raise RuntimeError(f"Download failed: {e}") from e
reddit.com
u/BumblebeeTight9348 — 6 days ago

dbwarden: Everything you loved about Django migrations, in FastAPI.

I released DBWarden 0.6 yesterday. Here is what it does.

https://github.com/emiliano-gandini-outeda/DBWarden

DBWarden is a database toolkit for FastAPI and SQLAlchemy. Migrations, async sessions, startup validation, and health checks. One config call. Zero boilerplate.

Most migration setups spread config across multiple files, multiple abstractions, and multiple sources of truth. DBWarden collapses all of that into a single database_config() call. That one call drives your sessions, your health checks, and your migration state. Nothing else to configure.

Your migrations are plain SQL files. No DSL to learn. No auto-generated Python to decode. You write the SQL, you read the SQL, and that is exactly what runs against your database.

What you get:

  • One config call for everything
  • Plain SQL migrations with rollback included by default
  • Async session dependency ready to inject with get_session()
  • A mountable health router with DBWardenHealthRouter()
  • A lifespan helper with migration_context()
  • Dev mode: SQLite locally, PostgreSQL in production, no changes to your migration files

Supported databases: PostgreSQL, MySQL, MariaDB, SQLite, ClickHouse.

Three commands to get started:

dbwarden init
dbwarden make-migrations "create users table"
dbwarden migrate

Done. Your schema is versioned, reviewable, and reversible.

No wrappers, no hidden state.

MIT licensed. Actively maintained. Source in this repo.

reddit.com
u/ReputationCautious77 — 7 days ago

[UPDATE] Major update to Violit (my FastAPI-powered Streamlit alternative): Signal-based reactivity, now with SQLModel ORM and Auth built-in.

Hey r/FastAPI!!!

https://preview.redd.it/ot03m2by8kxg1.png?width=1219&format=png&auto=webp&s=0f71d9bb7cb34b2d79465c5daa919931101ba3e4

A while back, I shared the very early alpha of Violit here. As a backend/AI dev, I loved the simplicity of Streamlit for building quick UIs, but I absolutely hated the "full script rerun" bottleneck. So, I built a framework using FastAPI as the core engine to deliver that top-down scripting experience, but with signal-based fine-grained reactivity.

Instead of rerunning the whole script on every click, FastAPI maintains a persistent WebSocket connection. When a state changes, only the exact dependent widget updates.

Since that first post, I've pushed a massive update that turns Violit from a simple UI tool into a full-stack Python framework. Because it’s built natively on FastAPI and Uvicorn, I was able to seamlessly bake in the tools our ecosystem already uses:

  • SQLModel ORM Built-in: Perfect for FastAPI users. Just pass a DB path (vl.App(db=...)) and start querying immediately.
  • Auth out of the box: Session auth, hashing, and page protection are natively supported.
  • Async Background Jobs: Need to run heavy AI inference or DB queries? Use app.background() to offload tasks via FastAPI's async capabilities without freezing the frontend.
  • Tailwind & Web Awesome: Style components directly using a simple cls parameter.
  • 90% Streamlit API compatibility: The syntax feels familiar, but the architecture is completely different.

It writes like a simple script, but runs like a modern reactive app over FastAPI.

It’s completely open-source (MIT). I’d love for my fellow FastAPI devs to try out the new update, roast the architecture, or let me know if you'd use this for your next data app or internal tool!

Thanks for reading!

reddit.com
u/Puzzleheaded_Clerk68 — 6 days ago
▲ 10 r/FastAPI

[UPDATE] I got tired of rebuilding OAuth for FastAPI projects, so I made a small CLI for it

Update on this -- I got tired of rebuilding OAuth for FastAPI projects, so I made a small CLI for it
by u/theRealSachinSpk in FastAPI

https://i.redd.it/dsous9vidjxg1.gif

Shipped v1.1.0 based on some of the feedback here and conversations I had after posting.

What changed:

  • Added Discord, Spotify, Microsoft, and LinkedIn as providers (6 total now)
  • Added PKCE support (OAuth 2.1) -- the thing I mentioned in the original post. You can enable it on any provider with one line
  • The oauth-init CLI now scaffolds all 6 providers with PKCE out of the box
  • Built an interactive OAuth debugger (Learn Mode) into the tutorial app -- it pauses at each step of the flow and shows you the actual HTTP requests, the token exchange body, the raw provider response, everything

That last one came from thinking about what u/ar_tyom2000 mentioned about fastapi-oauth2. There are great libraries that handle OAuth as middleware (please check them out); what this project adds is letting you see what's happening. The debugger shows the authorization URL parameters, the callback code, the token exchange POST, and the raw userinfo JSON. Useful if you're learning, or if something breaks and you need to figure out why.

Also wrote up a longer walkthrough on Medium if anyone wants the full picture: Medium Article

GitHub: REPO
PyPI: pip install oauth-for-dummies

Thanks for the feedback last time -- it shaped where this went.

reddit.com
u/theRealSachinSpk — 6 days ago
▲ 3 r/FastAPI+1 crossposts

Benchmarking API agents vs vision agents on the same task - 40x fewer tokens, 44x faster

Hey r/FastAPI! I'm the creator of Reflex, an open-source Python web framework. We just released v0.9 and wanted to share something relevant with this community.

We ran a benchmark comparing two approaches to letting AI agents interact with a web app:

  1. A vision agent (browser/computer use) that screenshots the UI and clicks around
  2. An API agent that calls HTTP endpoints directly

The task for both agents was to find a "Smith" customer with the most orders, accept their pending reviews, and mark their most recent order as delivered. We chose this task because it resembles the automation work a typical internal tool handles.

The vision agent took 550k tokens and 17 minutes on average; the API agent took 12k tokens and 19.7 seconds. Of course, API agents are faster and more token-efficient since they don't need to take screenshots or drive the UI click by click. The problem is that many apps don't have APIs for every action, since building and maintaining each separate API codebase takes engineering overhead.

We built a plugin for Reflex that auto-generates FastAPI-compatible HTTP endpoints from your app's existing event handlers. For example, if your app has a button with an on_click handler, the plugin exposes that handler as an endpoint. An agent can call the same function a human click triggers. No separate API to build or maintain.

Reflex compiles to React on the frontend and Python on the backend, with full FastAPI compatibility.

benchmark link: Vision Agents vs API Calls
our repo: reflex-dev/reflex: 🕸️ Web apps in pure Python 🐍

u/Boordman — 4 days ago

[FOR HIRE] Python Developer | FastAPI REST APIs | Authentication | CRUD | Docker | Remote

I build clean, production-ready REST APIs with FastAPI. JWT auth, CRUD, PostgreSQL, Dockerized and ready to deploy.

Portfolio: https://github.com/IlkinCavadov

Open to small and medium projects. Feel free to DM me!

u/Dependent-Band-3289 — 1 day ago