u/iMakeSense

Breakup Rumination

I've had a variety of relationships.

Every time I break up, I enter this massive depression. I find myself replaying conversations and arguments and other such things over and over, trying to fill in information I can't know or understand about my partners.

A lot of the time I find that I'm by myself talking to myself, or walking somewhere and feeling some sort of anguish or angst that makes me twitch. It's been so hard to catch. I still have dreams about them. It makes waking up hard.

This last one has fucked me up quite a lot. Poly. It seemed like they were monkey-branching to me and then boomeranging. I keep trying to understand. To find a narrative. I keep wondering whether it was me or them, what I did wrong or what they did wrong. We associated a lot of things with each other, so a lot of things throughout the day remind me of them. I've broken my OCD rules and tried to message them through other means even though she blocked me; I just keep finding other channels. I messaged her in anticipation of my problems so she could block me in certain places. Sometimes I fail. I've failed recently.

I feel like it's a bunch of things, a bunch of things I need to do ERP with. If someone else has gone through this, I'd appreciate knowing how you got out. It's been 6 months since we broke up; we dated for 3.

reddit.com
u/iMakeSense — 1 day ago

I've been on a binge finding uses for local AI on my machine beyond general LLM usage, since I'm not sure which other sub discoveries like these should go to. Here's a collection of my findings.

I'd appreciate other contributions that are off the beaten path, or other collections.

Somewhat "common" apps / models

Applio

An invaluable voice-to-voice conversion app. It was quite easy to find a voice online and map one voice to another. I used it to clean up some crappy lecture recordings. It's what you use if you want to make a recording sound like Obama.

Ultimate-TTS-Studio

Great for converting any sort of text into audio using a variety of locally running models, things like transcripts and ebooks. Comes with good tools to parse certain upload types. I used it to make an audiobook out of an EPUB.

Open Web UI

I know lots of people use this, but there's also a desktop version in beta. I hate running containers or servers or what have you, so this eases a lot of the headache.

There are also settings that allow you to use TTS models and STT models so you can have a vocal conversational experience.

Pinokio

A good hosting program for a bunch of AI apps. Good if you want to just click, try something out, and then dip. Irritating, though, as lots of apps crash; look for something with a high number of check-ins. Also a good interface for running Open Web UI.

Handy

Easy speech-to-text for voice transcription.

Apps / Models I've seen less mentioned

ComfyUI

Seems like a model pipeline manager, but I can't understand the ecosystem well enough to use it with local models. I'm not sure whether I have to do a lot of installation myself, or how its plugin architecture works. Whenever I look at external plugins, they seem to mostly be in Chinese with English translations and have fewer stars than normal, so I'm never sure if I'm doing the right thing. Spent an hour on it.

Ultimate Vocal Remover

This one is good but a PITA. You have to watch your system monitor to confirm it's actually using the GPU, and you have to install the latest beta from the site. The settings are also convoluted, and it fails silently a lot.

Meetily - Closed-captioning models are oddly hard to find.

You'd think this would be the first thing people would use STT for, but oddly it's hard to find something realtime. Handy is more for text input than for closed captioning.
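
For what it's worth, the closest thing I've found for DIY realtime captions is whisper.cpp's stream example, which re-transcribes a sliding mic window every few hundred milliseconds. A sketch, assuming you built whisper.cpp with SDL2 (needed for mic capture; older builds name the binary just `stream`):

# build with: cmake -B build -DWHISPER_SDL2=ON && cmake --build build
# --step = update interval (ms), --length = sliding window (ms)
./build/bin/whisper-stream -m models/ggml-base.en.bin -t 6 --step 500 --length 5000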

Voice Upscaling

Neat package for voice upscaling, but I feel like something better ought to exist.

Long Form Speech Transcription

Parakeet 0.6b / VibeVoice / CohereTranscribe
I don't know why people keep touting Whisper. These are more accurate, hallucinate less, run faster, and/or provide more features (speaker tagging and voice-activity detection). Feels like GIMP vs. Krita. Whisper hallucinates because it's trained on YouTube data.

It's odd that more of the leaderboards on Hugging Face aren't posted here. Oddly, I feel as though most ASR frontends are geared towards smaller jobs.

Obscure Examples

Audio to Midi

Takes music and generates a MIDI file.

Goon tagging

Porn classification.

Speakr - Seems to require a lot of config as well.

Might need a separate compose setup to spin it up with the corresponding models and take it down, as sketched below. Essentially for OCD note-taking.
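
A hypothetical compose wrapper, just to show the spin-up / tear-down shape; the image name, port, and mount below are placeholders, check the Speakr repo for the real ones:

cat > speakr-compose.yml <<'EOF'
services:
  speakr:
    image: example/speakr:latest   # placeholder, not the real image name
    ports:
      - "8899:8899"                # placeholder port
    volumes:
      - ./notes:/data              # placeholder data mount
EOF
docker compose -f speakr-compose.yml up -d   # spin it up
docker compose -f speakr-compose.yml down    # take it down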

Things I've been looking for

Gallery to slideshow

I've found this feature a lot in Google Photos and Samsung Gallery. Something like an AMV generator, like the old 2000s YouTube channels would make.
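
The dumb half of this is already an ffmpeg one-liner; it's the smart half (picking, ordering, syncing to music) that I can't find locally. For reference, assuming a folder of JPEGs:

# 3 seconds per image; add a second -i for music and -shortest to trim to it
ffmpeg -framerate 1/3 -pattern_type glob -i './gallery/*.jpg' \
    -c:v libx264 -r 30 -pix_fmt yuv420p slideshow.mp4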

AI video editing

Something where I can put in clips and it gives me processing options: things like action tagging, topic transitions, silence and vocal-activity detection, etc.
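
The silence / vocal-activity piece at least is doable today with ffmpeg's silencedetect filter; it's the tagging and transitions I can't find:

# prints silence_start / silence_end timestamps; tune the noise floor and duration
ffmpeg -i input.mp4 -af silencedetect=noise=-30dB:d=0.5 -f null - 2>&1 | grep silence_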

Voice Cloning -> singing:

Applio seems great for that, but I'm still figuring out how to "train" a voice in the format it requires. It'd be nice to have a tool that uses 30-second one-shots like other tools do, but I don't know if that would reduce quality.

Speech editing

I've had lots of recorded audio where I'd like to get a transcript, retype part of my speech, and have it regenerated so it sounds natural, without having to re-record.

Good image / video / text search front-end

I just want to tag and organize things, ideally through embeddings where possible. Just something I can double-click, configure, and point at a folder.

Spoken Audio Cleanup

Also oddly hard to find? There are stem separation tools, but it feels like this needs its own unique pipeline. Not sure which models are best for this.

Batch transcription front-end with cleanup pipeline

Something that can ideally go audio cleanup -> voice-activity detection -> ASR -> transcription -> output format, but anything with batch transcription would be great. Odd that this doesn't exist.
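
Until then, a shell loop gets partway there. A rough sketch; whisper-cli is just a stand-in for whatever ASR CLI you prefer:

#!/usr/bin/env bash
# denoise + loudness-normalize with ffmpeg, then batch-transcribe
for f in ./recordings/*.mp3; do
  base="${f%.*}"
  # afftdn = FFT denoiser, loudnorm = EBU R128; 16 kHz mono for the ASR step
  ffmpeg -y -i "$f" -af "afftdn,loudnorm" -ar 16000 -ac 1 "${base}_clean.wav"
  # -otxt writes a plain-text transcript next to the input
  whisper-cli -m models/ggml-large-v3.bin -f "${base}_clean.wav" -otxt
done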

Generally, the "Ollama" of other domains

General AI packages and pipelines for things like audio production, conversation analysis, etc.

Discovery Methods

GitHub Topics

Searching through AI-related repository topics and stats; see the sketch below.

  • local-ai, speech-to-text, semantic-search, speech-enhancement
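
A quick way to work through those topics, assuming you have the GitHub CLI (gh) installed:

# top-starred repos per topic; the plain REST search API works too if you don't use gh
gh search repos --topic=speech-to-text --sort=stars --limit=20
gh search repos --topic=local-ai --topic=semantic-search --sort=stars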

AlternativeTo

https://alternativeto.net/ Used to find open-source alternatives to popular software.

If you have any suggestions for discovery methods, obscure models, or other comprehensive model-packaging tools, I'd appreciate you sharing them! Ideally things with:

  • decent communities
  • more recent / capable models
  • alternatives to popular paid tools.
u/iMakeSense — 8 days ago

I have a 5070 and a 5060 Ti in my machine. I know both are usable in Windows, even concurrently. I'm trying to figure out why only one is showing up.

Both PCIe slots are showing. Got this from the logs:

[    5.752828] nvidia 0000:04:00.0: vgaarb: VGA decodes changed: olddecodes=io+mem,decodes=none:owns=io+mem
[    5.765243] nvidia 0000:2b:00.0: enabling device (0000 -> 0003)
[    5.765367] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
[    5.765428] nvidia 0000:2b:00.0: probe with driver nvidia failed with error -1
[    5.765470] NVRM: The NVIDIA probe routine failed for 1 device(s).
[    5.765474] NVRM: loading NVIDIA UNIX Open Kernel Module for x86_64  580.159.03  Release Build  (dvs-builder@U22-I3-AM27-29-6)  Fri Apr 24 06:03:03 UTC 2026
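
Edit: from what I've read, the "This PCI I/O region assigned to your NVIDIA device is invalid" line usually means the BIOS/UEFI didn't assign resources to the second card. Candidate fixes seem to be enabling Above 4G Decoding / Resizable BAR in the BIOS, or the pci=realloc kernel parameter, roughly like this on a Debian/Ubuntu-style GRUB setup:

# see what resources the failing card actually got (2b:00.0 from the log above)
sudo lspci -vv -s 2b:00.0 | grep Region
# add pci=realloc to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, then:
sudo update-grub && sudo reboot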
reddit.com
u/iMakeSense — 9 days ago

It sucks being the first in your family to go to therapy.
It sucks accumulating label after label.
It sucks to be able to power through something with no explanation, then to become aware of said thing and hypersensitive to it.
It sucks to have empathy for your parents, who didn't have the means that you did but still suffer from the issues you're aware of.
It sucks that you can't convince them.
It sucks that they can't comfort you.
It sucks that they don't understand when you're upset.
It sucks. It sucks. It sucks. It sucks. It sucks. It sucks. It sucks. It sucks. It sucks. It sucks. It sucks. It sucks. It sucks. It sucks. It sucks.

I'm tired.

reddit.com
u/iMakeSense — 16 days ago
▲ 427 · r/LocalLLaMA · +1 crosspost

For those who want to run the latest dense ~30B models and only have 16GB of VRAM: if you have an old card with 6GB of VRAM or more, plug it in.

What matters is that everything fits in VRAM, even across 2 cards, and even if one of them is quite weak.

I have a 5070 Ti 16GB and an old 2060 6GB. The common idea is that you need 2 of the same GPU to maximize performance. But one day I was struck by the idea: why not give it a try?

Let's see: if you didn't buy a motherboard just for LLMs, it's very possible you have one true PCIe x16 slot and a couple that look like x16 but are actually wired as x4, just like me. That's a perfect slot for an old card.
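
You can sanity-check how many lanes each card actually negotiated with nvidia-smi (works on Windows too):

nvidia-smi --query-gpu=name,pcie.link.width.current,pcie.link.width.max --format=csv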

16GB + 6GB = 22GB, which gets close to a 24GB-class card. If you have a better old card, lucky you!

Then you run llama-server with a config like this:

[*]
jinja = true
cache-prompt = true
n-gpu-layers = 999
no-mmap = true
mlock = false
np = 1
t = 0

[qwen/qwen3.6-27b]
model = ./Qwen3.6-27B-GGUF/Qwen3.6-27B-Q4_K_M.gguf
mmproj = ./Qwen3.6-27B-GGUF/mmproj-Qwen3.6-27B-BF16.gguf
reasoning = on
dev = Vulkan1,Vulkan2
c = 128000
no-mmproj-offload = true
cache-type-k = q8_0
cache-type-v = q8_0

A couple of specific points:
- dev=Vulkan1,Vulkan2 enables the two GPUs; run `llama-server.exe --list-devices` to see what you should set.
- no-mmap and mlock=false keep the model out of your RAM.
- np=1, no-mmproj-offload (or just don't supply an mmproj model), cache-type-k, and cache-type-v minimize the VRAM needed.
- n-gpu-layers=999 prefers GPU offloading; this may be unnecessary, but I'd keep it.
- split-mode=layer splits the layers asymmetrically across the devices; "layer" is the default, though, so you don't see it above.
- c=128000 could be a little bit of a stretch, but it works well enough for me.
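
If your build doesn't take a config file, the roughly equivalent command line looks like this (flag names as of recent llama.cpp builds; check `llama-server --help` if yours differs):

llama-server.exe -m ./Qwen3.6-27B-GGUF/Qwen3.6-27B-Q4_K_M.gguf \
    --jinja -ngl 999 --no-mmap -np 1 -c 128000 \
    --device Vulkan1,Vulkan2 -sm layer \
    -ctk q8_0 -ctv q8_0 --no-mmproj-offload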

BTW, I also have an Intel integrated GPU that I plug the monitors into, which is Vulkan0.

Some numbers: basically, at 128k max context with 71k of context actually used, pp=186 t/s and tg=19 t/s, quite a usable speed compared to the 4 t/s on a single card.

[56288] prompt eval time =    5761.53 ms /  1076 tokens (    5.35 ms per token,   186.76 tokens per second)
[56288]        eval time =   58000.15 ms /  1114 tokens (   52.06 ms per token,    19.21 tokens per second)
[56288]       total time =   63761.69 ms /  2190 tokens
[56288] slot      release: id  0 | task 654 | stop processing: n_tokens = 71703, truncated = 0

Edit:

Some folks wanted numbers, so here is llama-bench. This is with CUDA instead. Runs with --device CUDA0 are on the single GPU; runs without it use all GPUs. It's fairly clear that fitting on GPU, even partly on a second, weaker one, matters a lot for tg speed, especially at long context.

llama-b8948-bin-win-cuda-12.4-x64/llama-bench.exe \
    --model ./lmstudio-community/Qwen3.6-27B-GGUF/Qwen3.6-27B-Q4_K_M.gguf \
    --device CUDA0 --fit-target 64  -d 8192,16384

| model                          |       size |     params | backend    | ngl | dev          |       fitt |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------------ | ---------: | --------------: | -------------------: |
| qwen35 27B Q4_K - Medium       |  15.40 GiB |    26.90 B | CUDA       |  99 | CUDA0        |         64 |   pp512 @ d8192 |       903.13 ± 26.25 |
| qwen35 27B Q4_K - Medium       |  15.40 GiB |    26.90 B | CUDA       |  99 | CUDA0        |         64 |   tg128 @ d8192 |         16.54 ± 0.14 |
| qwen35 27B Q4_K - Medium       |  15.40 GiB |    26.90 B | CUDA       |  99 | CUDA0        |         64 |  pp512 @ d16384 |        663.60 ± 9.22 |
| qwen35 27B Q4_K - Medium       |  15.40 GiB |    26.90 B | CUDA       |  99 | CUDA0        |         64 |  tg128 @ d16384 |         12.03 ± 0.08 |


llama-b8948-bin-win-cuda-12.4-x64/llama-bench.exe \
    --model ./lmstudio-community/Qwen3.6-27B-GGUF/Qwen3.6-27B-Q4_K_M.gguf \
    --fit-target 64 -d 8192,16384

| model                          |       size |     params | backend    | ngl |       fitt |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ---------: | --------------: | -------------------: |
| qwen35 27B Q4_K - Medium       |  15.40 GiB |    26.90 B | CUDA       |  99 |         64 |   pp512 @ d8192 |        769.00 ± 4.50 |
| qwen35 27B Q4_K - Medium       |  15.40 GiB |    26.90 B | CUDA       |  99 |         64 |   tg128 @ d8192 |         25.40 ± 0.30 |
| qwen35 27B Q4_K - Medium       |  15.40 GiB |    26.90 B | CUDA       |  99 |         64 |  pp512 @ d16384 |        668.83 ± 2.83 |
| qwen35 27B Q4_K - Medium       |  15.40 GiB |    26.90 B | CUDA       |  99 |         64 |  tg128 @ d16384 |         24.31 ± 0.09 |


llama-b8948-bin-win-cuda-13.1-x64/llama-bench.exe \
    --model ./lmstudio-community/Qwen3.6-27B-GGUF/Qwen3.6-27B-Q4_K_M.gguf \
    --device CUDA0 --fit-target 64 -d 8192,16384

| model                          |       size |     params | backend    | ngl | dev          |       fitt |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------------ | ---------: | --------------: | -------------------: |
| qwen35 27B Q4_K - Medium       |  15.40 GiB |    26.90 B | CUDA       |  99 | CUDA0        |         64 |   pp512 @ d8192 |       981.43 ± 27.91 |
| qwen35 27B Q4_K - Medium       |  15.40 GiB |    26.90 B | CUDA       |  99 | CUDA0        |         64 |   tg128 @ d8192 |         16.87 ± 0.17 |
| qwen35 27B Q4_K - Medium       |  15.40 GiB |    26.90 B | CUDA       |  99 | CUDA0        |         64 |  pp512 @ d16384 |       751.15 ± 16.03 |
| qwen35 27B Q4_K - Medium       |  15.40 GiB |    26.90 B | CUDA       |  99 | CUDA0        |         64 |  tg128 @ d16384 |         12.08 ± 0.12 |


llama-b8948-bin-win-cuda-13.1-x64/llama-bench.exe \
    --model ./lmstudio-community/Qwen3.6-27B-GGUF/Qwen3.6-27B-Q4_K_M.gguf \
    --fit-target 64 -d 8192,16384

| model                          |       size |     params | backend    | ngl |       fitt |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ---------: | --------------: | -------------------: |
| qwen35 27B Q4_K - Medium       |  15.40 GiB |    26.90 B | CUDA       |  99 |         64 |   pp512 @ d8192 |        807.61 ± 7.40 |
| qwen35 27B Q4_K - Medium       |  15.40 GiB |    26.90 B | CUDA       |  99 |         64 |   tg128 @ d8192 |         24.85 ± 1.57 |
| qwen35 27B Q4_K - Medium       |  15.40 GiB |    26.90 B | CUDA       |  99 |         64 |  pp512 @ d16384 |        732.96 ± 3.86 |
| qwen35 27B Q4_K - Medium       |  15.40 GiB |    26.90 B | CUDA       |  99 |         64 |  tg128 @ d16384 |         24.40 ± 0.07 |
reddit.com
u/akira3weet — 17 days ago