u/AkamazZz

Best small local LLMs and libraries for mobile apps?

Hey everyone,

I’m researching small local LLMs for mobile apps and trying to choose what model/runtime stack is worth testing first.

The use case is not general chat. I need basic local text processing: summarization, rewriting, extracting structured fields, generating JSON/Markdown-like output, etc.

I’m mostly interested in what is actually practical on iOS and Android.

Models I’m considering:

  • Qwen 0.5B / 0.6B / 1.5B
  • Gemma small models
  • Phi small models
  • any other mobile-friendly model you would recommend

Libraries/runtimes I’m considering:

  • llama.cpp / GGUF
  • MLC LLM
  • MediaPipe GenAI
  • ExecuTorch
  • ONNX Runtime
  • llama.rn
  • native wrapper exposed to Flutter
  • any Flutter-friendly package if it is actually usable

My main questions:

  • Which small model would you test first for mobile?
  • Which runtime/library would you pair it with?
  • Is GGUF + llama.cpp still the most practical default choice?
  • Are Qwen 0.6B / 1.5B good enough for structured output on-device?
  • Is Gemma or Phi better for this kind of use case?
  • What quantization level gives the best balance between size, RAM, speed, and quality?
  • Are there libraries that work well from Flutter, or should I expect to write native bindings?
  • What stack would you avoid based on real-world experience?

Main constraints:

  • iOS and Android
  • Flutter app
  • Offline/local inference preferred
  • Structured output matters more than open-ended chat quality
  • Reasonable app size
  • Acceptable speed on mid-range devices
  • Native integration is okay if needed

I’m mainly looking for practical recommendations: model + runtime/library combinations that are worth trying first, and any examples or repos that helped you.

Thanks!

reddit.com
u/AkamazZz — 1 day ago

Best local/offline speech transcription options for Flutter mobile apps?

I’m researching speech transcription options for a Flutter mobile app and trying to understand what is currently practical on iOS and Android.

The main use case is simple: record audio and transcribe it locally or semi-locally. It does not have to be real-time — file-based transcription is completely fine.

I’m currently looking at:

  • Whisper / whisper.cpp
  • ONNX-based Whisper models
  • sherpa-onnx
  • native iOS Speech APIs
  • Android SpeechRecognizer / related APIs
  • other offline ASR models or libraries

My main questions:

  • What is currently the most practical option for local/offline transcription on mobile?
  • Is Whisper still the default choice, or are there better alternatives for mobile?
  • For Flutter, would you recommend an existing package, FFI, or native platform channels?
  • How realistic is word-level timestamp support on iOS and Android?
  • Are there good examples of file-based transcription pipelines in Flutter?
  • What are the main issues with performance, battery usage, app size, and model size?

Main constraints:

  • Flutter app
  • iOS and Android
  • Preferably offline/local
  • File-based transcription is okay
  • Real-time is optional
  • Word-level timestamps would be a plus
  • Should work reasonably well on mid-range devices

I’m mainly interested in real-world experience: what actually works, what is too slow, what breaks on mobile, and which libraries are worth testing first.
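On the word-level timestamps question: even when the engine emits per-word times, a file-based pipeline usually still needs a post-processing step that groups words into display segments. A small sketch of that step (the timestamp tuples are invented sample data, not real engine output; the gap/length thresholds are arbitrary):

```python
# Group per-word timestamps (as an offline ASR engine might emit them,
# times in seconds) into display segments, splitting on pauses.

def group_words(words, max_gap=0.6, max_len=8):
    """Group (word, start, end) tuples into segments, splitting when the
    pause exceeds max_gap seconds or a segment reaches max_len words."""
    segments, current = [], []
    for word, start, end in words:
        if current and (start - current[-1][2] > max_gap or len(current) >= max_len):
            segments.append(current)
            current = []
        current.append((word, start, end))
    if current:
        segments.append(current)
    return [
        {"text": " ".join(w for w, _, _ in seg),
         "start": seg[0][1], "end": seg[-1][2]}
        for seg in segments
    ]

words = [("note", 0.0, 0.3), ("to", 0.35, 0.45), ("self", 0.5, 0.8),
         ("buy", 1.9, 2.1), ("milk", 2.15, 2.5)]
print(group_words(words))
```

The same logic ports straightforwardly to Dart on the Flutter side once the native layer hands back raw word timestamps.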

Thanks!

u/AkamazZz — 1 day ago

How do you turn voice thoughts into actual notes, not just transcripts?

I’m trying to understand how people handle note writing when the original thought starts out as voice.


Obsidian works really well when the thought is already written down as text. But a lot of my ideas start differently: I’m walking, thinking out loud, recording a quick voice note, or just dumping a messy chain of thoughts.

I know there are audio recorders and transcription/Whisper plugins, but transcription alone still gives me a raw block of text. The annoying part is what comes after:

  • separating mixed topics
  • extracting actual tasks
  • turning vague thoughts into proper notes
  • splitting one long brain dump into smaller notes
  • adding tags/categories
  • deciding what should go into Daily Notes vs project notes
  • keeping the raw transcript without making the vault messy
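For the "extracting actual tasks" part of the list above, even a crude heuristic pass can triage a raw transcript before (or instead of) handing it to an AI prompt. A sketch of that idea; the trigger phrases are my own guesses, and the sample transcript is invented:

```python
import re

# Crude heuristic task extractor for raw voice transcripts. The trigger
# phrases are guesses; in practice you would tune them or delegate this
# step to an LLM prompt instead.

TASK_TRIGGERS = re.compile(
    r"\b(i need to|i should|remember to|don't forget to|todo)\b", re.IGNORECASE
)

def split_tasks(transcript: str):
    """Split a transcript into sentences, then sort them into tasks
    (Obsidian-style checkboxes) and remaining note text."""
    sentences = re.split(r"(?<=[.!?])\s+", transcript.strip())
    tasks = [s for s in sentences if TASK_TRIGGERS.search(s)]
    notes = [s for s in sentences if not TASK_TRIGGERS.search(s)]
    md_tasks = [f"- [ ] {t}" for t in tasks]
    return md_tasks, notes

transcript = ("Had an idea about the onboarding flow. "
              "I need to email the designer tomorrow. "
              "Also the pricing page feels cluttered.")
tasks, notes = split_tasks(transcript)
print("\n".join(tasks))
```

The checkbox lines drop straight into a Daily Note, while the leftover sentences stay with the raw transcript for later cleanup.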

So I’m curious: how do you currently turn voice notes into useful Obsidian/PKM notes?

Do you manually clean up transcripts, use templates, use AI prompts, use plugins, or just avoid voice capture completely?

u/AkamazZz — 2 days ago