Best small local LLMs and libraries for mobile apps?
Hey everyone,
I’m researching small local LLMs for mobile apps and trying to decide which model/runtime stack is worth testing first.
The use case is not general chat. I need basic local text processing: summarization, rewriting, extracting structured fields, and generating JSON- or Markdown-style output.
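To make that concrete, here’s a made-up example of the kind of extraction I mean (field names are purely illustrative):

```
Input:  "Lunch with Dana next Friday at 1pm to go over the Q2 budget."
Output: {"person": "Dana", "when": "next Friday 1pm", "topic": "Q2 budget"}
```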
I’m mostly interested in what is actually practical on iOS and Android.
Models I’m considering:
- Qwen small models (Qwen2.5 0.5B / 1.5B, Qwen3 0.6B)
- Gemma small models (e.g. Gemma 3 1B or Gemma 3n)
- Phi small models (e.g. Phi-3-mini)
- any other mobile-friendly model you would recommend
Libraries/runtimes I’m considering:
- llama.cpp / GGUF
- MLC LLM
- MediaPipe GenAI
- ExecuTorch
- ONNX Runtime
- llama.rn (I know this is a React Native binding, so probably not directly usable from Flutter)
- native wrapper exposed to Flutter via platform channels (rough sketch of what I mean after this list)
- any Flutter-friendly package if it is actually usable
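For the "native wrapper" option, this is roughly what I’m picturing on the Android side. A minimal sketch only: the MethodChannel plumbing is the standard Flutter API, but `LlamaBridge` is a hypothetical placeholder for whatever llama.cpp JNI wrapper I’d end up with, not a real library:

```kotlin
// MainActivity.kt — sketch only. "LlamaBridge" is a hypothetical JNI wrapper
// around llama.cpp; the Flutter plumbing is the standard MethodChannel API.
package com.example.app

import io.flutter.embedding.android.FlutterActivity
import io.flutter.embedding.engine.FlutterEngine
import io.flutter.plugin.common.MethodChannel

class MainActivity : FlutterActivity() {
    override fun configureFlutterEngine(flutterEngine: FlutterEngine) {
        super.configureFlutterEngine(flutterEngine)
        MethodChannel(flutterEngine.dartExecutor.binaryMessenger, "local_llm")
            .setMethodCallHandler { call, result ->
                when (call.method) {
                    "complete" -> {
                        val prompt = call.argument<String>("prompt") ?: ""
                        // Hypothetical native call; a real app would run this
                        // off the UI thread and stream tokens back instead.
                        result.success(LlamaBridge.complete(prompt))
                    }
                    else -> result.notImplemented()
                }
            }
    }
}
```

The Dart side would then just be a `MethodChannel('local_llm').invokeMethod('complete', ...)` call. What I can’t tell is whether one of the existing packages already does this well enough that writing my own bindings is a waste of time.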
My main questions:
- Which small model would you test first for mobile?
- Which runtime/library would you pair it with?
- Is GGUF + llama.cpp still the most practical default choice?
- Are Qwen 0.6B / 1.5B good enough for structured output on-device? (see my grammar-sampling note right after this list)
- Is Gemma or Phi better for this kind of use case?
- What quantization level gives the best balance of size, RAM, speed, and quality? (my back-of-envelope size math is below the constraints list)
- Are there libraries that work well from Flutter, or should I expect to write native bindings?
- What stack would you avoid based on real-world experience?
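One piece of context on the structured-output questions above: llama.cpp supports GBNF grammar-constrained sampling, which can force syntactically valid JSON even out of a tiny model. My plan was to sanity-check quality on desktop first with something like this (the model filename is just an example; the JSON grammar does ship in the repo):

```
./llama-cli -m qwen2.5-0.5b-instruct-q4_k_m.gguf \
  --grammar-file grammars/json.gbnf \
  -p "Extract person and topic as JSON: Lunch with Dana to go over the Q2 budget."
```

My understanding is that the grammar only guarantees the output parses, not that the extracted fields are correct, which is exactly why I’m asking whether the 0.5B–1.5B range is good enough in practice.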
Main constraints:
- iOS and Android
- Flutter app
- Offline/local inference preferred
- Structured output matters more than open-ended chat quality
- Reasonable app size
- Acceptable speed on mid-range devices
- Native integration is okay if needed
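For the quantization question above, my back-of-envelope sizing so far (rough numbers, please correct me):

```
file size ≈ params × bits-per-weight ÷ 8
0.6B @ ~4.8 bpw (Q4_K_M) → 0.6e9 × 4.8 ÷ 8 ≈ 360 MB
1.5B @ ~4.8 bpw (Q4_K_M) → 1.5e9 × 4.8 ÷ 8 ≈ 900 MB
```

Runtime RAM will be higher than the file size once the KV cache and runtime overhead are added, which is why I’m nervous about 1.5B models on mid-range Android phones.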
I’m mainly looking for practical recommendations: model + runtime/library combinations that are worth trying first, and any examples or repos that helped you.
Thanks!