r/PiCodingAgent

▲ 146 r/PiCodingAgent+4 crossposts

Llama.cpp is getting better with every update

Last night I updated llama.cpp after like 2 or 3 weeks. The results were really exciting for someone running a 35B model on a 6GB RTX 3050.

Today I was able to get stable token speeds that didn't drop to 9 t/s while writing 1000+ lines of code.

Now I can increase my context window to the 64k range and I'm still getting 19 t/s minimum. Before, it would drop drastically to 4 t/s.

But now it gives a solid 26 t/s. In high-context-window workflows it only drops by 5-7 t/s. This means I can do $1,000 worth of coding work on my laptop for free.

Yes, the AI bubble will pop for sure once people realize they can get nearly the same quality locally as from their cloud subscriptions.

reddit.com
u/Low-Alarm272 — 2 days ago
▲ 74 r/PiCodingAgent+1 crossposts

Markdown is not the agent god format anymore... HTML is back.

A Claude Code team member just wrote a piece arguing that their team now prefers HTML over Markdown for agent outputs.

Funny timing, because not long ago everyone was saying Markdown was the perfect agent-era format: simple, portable, easy for both humans and models.

His argument is basically that agent outputs are getting too complex now. Specs, plans, PR reviews, research reports, design explorations… at some point a giant Markdown file just becomes a wall of text nobody reads.

HTML gives agents much higher information density. It can include layout, tables, CSS, SVG diagrams, images, code snippets, visual flows, colors, interactions, and even small one-off tools. Basically, if Claude can understand it, it can probably represent it more clearly in HTML.

The bigger shift is that humans are not manually editing these files as much anymore. They use them as specs, reference docs, brainstorming outputs, or review surfaces, then ask Claude to edit them again. So Markdown's biggest advantage, easy human editing, matters less than before.

He does admit HTML is slower, more token-heavy, and worse for version control diffs. But his argument is that better readability, sharing, visualization, and interaction are worth it.

So maybe Markdown is still great for agent memory and plain docs. But for human-in-the-loop agent work, HTML starts looking less like a document format and more like a temporary UI.

Funny little format war to me lol. Markdown was supposed to be the agent-native format. Now HTML is coming back like, “actually I was the operating surface all along.”

Original post link: https://x.com/trq212/status/2052809885763747935

u/DoctorKhru — 2 days ago

How do you prevent your agents from getting stuck in an infinite review loop?

I've used a simple review loop before: after the main agent makes some changes, a reviewer with new context is called, and the results are fed back to the main agent, repeating this cycle.

However, AI tends to always find problems when you ask it to, so every additional review round wastes a lot of time. I've also tried skipping the cycle and doing just one round of review, but that feels like I'm kidding myself.

How do you strike a balance between accuracy and efficiency?
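One pragmatic middle ground is to cap the number of rounds and only act on findings above a severity threshold, so nitpicks don't trigger another cycle. A minimal sketch in Python, where `apply_changes` and `review` are hypothetical stand-ins for the main-agent and reviewer calls:

```python
# A bounded review loop: stop when the reviewer finds nothing above
# `min_severity`, or after `max_rounds`, whichever comes first.

def run_review_loop(task, apply_changes, review, max_rounds=2, min_severity=7):
    result = apply_changes(task, feedback=None)
    for _ in range(max_rounds):
        findings = review(result)
        # Reviewers asked to find problems always will, so only act on
        # findings above a severity threshold (here, 7 on a 1-10 scale).
        blocking = [f for f in findings if f["severity"] >= min_severity]
        if not blocking:
            break
        result = apply_changes(task, feedback=blocking)
    return result
```

The severity gate is the knob that trades accuracy for efficiency: raise it and you converge faster but let more nitpicks through.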

reddit.com
u/bsa-saa — 9 hours ago

What are your essential Pi extensions?

Hi everyone,

I'm new to the Pi coding agent, and there are so many extensions. I've tried some but don't know which ones are essential to install.
I come from Claude Code. Could you guys please recommend the extensions that work best for you?

reddit.com
u/54tribes — 3 days ago

The problem with Pi is its extension system

Honestly, I love Pi, and I'm going to keep using it. But the extension system is painful when it comes to using multiple different extensions that conflict with each other when they really don't have to conflict. They only conflict because of how the extension system is designed.

The only way to have a smooth experience using extensions is to write your own or to carefully choose one over another and accept the tradeoff when you really shouldn't have to.

Prime example: want nice edit tool rendering? Use pi-tool-display. But you can't if you also want to use a hashline edit extension.

I feel like one of two things needs to happen for Pi to really take off and become the neovim of harnesses (because, at least to me, that's what it feels like it wants to be).

Either:

  1. The extension system is overhauled to allow coexistence. For example: separate the tool rendering layer from the tool execution layer, and allow request/response-style communication between extensions (not just an event bus).

  2. Extension writers stop focusing on extensions that register things like tools, and instead export APIs that others can install and compose in their own extensions. Then you could, for example, combine hashline editing with nice edit tool rendering.

Thoughts?

PS: Maybe this has already been discussed a lot, but I haven't seen much of it. I'm kinda new here.

reddit.com
u/TheSaasDev — 1 day ago

Which provider are you using with Pi?

Hey y'all, I'm mostly doing Excel work via Python, plus web development. Which models do you use with Pi? I currently use Claude Code on the $100 plan and Codex on the $20 plan.

I understand I can't use my Claude sub with Pi.

What do you recommend?

Thanks

reddit.com
u/ishay_al — 3 days ago

Pi coding agent is amazing (or how I learned to stop worrying and leave OpenCode)

Warning: long post ahead. On the plus side, it’s completely human-written. No AI slop was used in writing this post. I’m old school that way, I like to actually write my own Reddit posts. Thought you all would appreciate something written entirely by a human for a change. ;)

Disclaimer: this post says nice things about Pi. I am not associated with the dev team of Pi coding agent in any way.

Yesterday I tried Pi coding agent on my local LLM rig for the first time. I had been using OpenCode as my daily driver agentic harness, and I had been intimidated by Pi’s stripped down, minimalist approach.

My rig, by the way, is an M4 MacBook Pro with 64GB of RAM. oMLX is the backend, serving up jundot’s quant of qwen3.6:35b-a3b-oQ6. I average around 60 tokens/second at around 80 percent RAM usage.

My coding needs are fairly modest. I run around eight static websites for my hobby board gaming group, hosted on GitHub pages. So the daily tasks usually involve updating sites with user submissions, implementing feature requests, squashing minor bugs, things of that sort.

I had gotten used to the security blanket of OpenCode, with its set of built-in tools. I had come to accept that sometimes OpenCode will take a little longer to answer a request, and had gotten used to its sometimes dumb little oversights and charmingly stupid mistakes.

For example, I often ask OpenCode to make a 3x3 image collage of board game cover images using ImageMagick command line tools. It would usually take several revisions, as OpenCode would first render them in a straight row instead of a 3x3 grid. Then, after feedback, render a 3x3 grid, but with each image a different size. Then, after even more feedback, it would finally output a 3x3 grid of equally sized images.
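For what it's worth, the grid described above is a one-shot job for ImageMagick's `montage` tool. A small Python sketch (filenames hypothetical) of the command the agent ultimately needs to run:

```python
import subprocess

# Hypothetical cover filenames; `montage` is ImageMagick's grid tool.
covers = [f"cover{i}.jpg" for i in range(1, 10)]

def montage_cmd(images, out="collage.png", tile="3x3", size=400):
    # -tile 3x3 lays the images out in a grid; -geometry forces each
    # cell to size x size with no spacing, so every image is equal-sized.
    return ["montage", *images, "-tile", tile,
            "-geometry", f"{size}x{size}+0+0", out]

# Requires ImageMagick to be installed:
# subprocess.run(montage_cmd(covers), check=True)
```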

You know the old saying about LLMs acting like green interns? In my case, OpenCode often acts like an intern who needs the instructions explained multiple times before they get the task right.

But at least OpenCode was the evil intern that I was familiar with. As I said, I had gotten used to working within its limitations and quirks.

Anyway, yesterday I decided to overcome my nervousness about leaving the security blanket of OpenCode and dive into the unknown depths of Pi coding agent. I gave Pi the exact same task using a similar prompt: create a 3x3 grid of the cover images of these specified board games, each image 400x400 pixels.

Pi methodically went about the task. First it identified which images were available locally and which were not. Then it web searched the websites to grab the missing images and download them locally. Then it created the 3x3 grid, to my desired specs, right the first time. I was blown away at how much better, faster, more accurate, and more capable it felt working with Pi vs. OpenCode. I didn’t change the local model, I just changed the agentic harness. If OpenCode felt like working with an inexperienced intern, Pi felt more like working with a trustworthy and reliable teammate.

With OpenCode I had assumed it would be capable of only routine maintenance and updates, and that if ever I needed to do some heavier lifting, I would have to bust out a cloud frontier model like Codex. But I decided to give Pi a more challenging test to uncover its true capabilities. I asked Pi to plan step-by-step the addition of a search feature to one of my sites, with live filtering as the user types, a dropdown menu overlay matching the site’s existing CSS, etc.

Guess what, Pi made the plan, checked with me for my go-ahead, then started implementing the plan, task by task. It wasn’t perfect. There were a couple of points where functions were called in the wrong order. But I dutifully fed the web inspector errors to Pi, it quickly and correctly figured out the issues, and fixed them. Within a few minutes, my search feature was working, pretty much exactly as I had envisioned it.

Even more impressive: following Pi’s philosophy of “if you need extra features, ask Pi to build them”, I asked Pi to reflect on our coding session, then based on that suggest some enhancements to itself to address the main pain points. Pi identified that it needs a better auto-compact feature, and a better way to seamlessly pick up in context where it left off; and built those features into itself. It also added a JS script to mitigate those function calling timing issues we had encountered. So as one works with Pi, one gradually customizes and improves Pi to become more optimized for the actual coding work that you do.

Man, I was so impressed. Pi takes this local LLM thing from “works well enough for routine tasks” to “works well enough that I don’t think I need to fire up a cloud model”. I now have the confidence to leave OpenCode behind.

TL; DR: I overcame my fears and tried Pi instead of OpenCode, and had a great experience.

reddit.com
u/Konamicoder — 2 days ago

Best GUI, in your opinion?

Hello guys, I know this is a common thread in Pi: people are desperately looking for a GUI for Pi. To be honest, I never wanted one, but now I need it too.

I usually use the Zed IDE for my work, but I feel Zed is lacking a lot. So I'm curious: if I build a GUI for Pi, what would you need in it? Which functions? What kind of simplicity? Please help me figure out how I can improve what I do, and I'll open source it as soon as I have a polished, solid GUI.

reddit.com
u/SalimMalibari — 2 days ago

How do you use Pi without running out of usage

TLDR

How tf do people use this as a daily driver without smashing caps? I love this tool but I feel like I’m throwing money at the wall.

I have come from using 2 Claude Code subscriptions (1 personal & 1 with work) and a Cursor subscription.

I love Pi and the idea behind it: being able to completely control the harness. After the recent regressions of Claude Code I was looking for an alternative (I didn’t want to fall into the same trap of letting someone else control my harness).

I started using Pi and loved it at first. I have a Z.ai coding plan; however, I'm constantly hitting the 5-hour cap.

Then I decided to try the Codex Pro plan and hit the 5 hour cap after one hour of intense coding.

I set reasoning effort to medium, then tried low. It helped a bit, but not amazingly.

Other things I’ve tried are Semble and Caveman mode to reduce token usage.

However, I’m starting to wonder: have I not optimised my setup enough, or is this normal?

Is this only viable with a local model or a high-end coding plan?

How do you guys use this as a main driver and what advice do you have?

I’ve been trying the packages (however the page keeps timing out for me lol, so I can’t use it).

I’ve been playing with my system prompt and trying to keep it short & concise to reduce tokens. I removed all MCPs.

It’s starting to make me question whether I’m missing some kind of caching and optimisations most harnesses have built in.

reddit.com
u/alexdunlop_ — 4 days ago

Do subagents really matter?

I’ve been running Pi for two weeks without any subagents, agent teams, or anything like that. Compared to my previous workflow, I haven’t noticed any difference at all. In fact, the AI even seems a bit more accurate without subagents.

Something I think is really important: Is there a good way to automatically start/stop another agent client and let them send messages to each other?

Would love to hear any methods or tools that can do this!

reddit.com
u/bsa-saa — 6 days ago

These are the packages i use

These are the packages I use. Any additions or removals you'd suggest? I'm thinking I've installed too many.

{
  "packages": [
    "npm:pi-mcp-adapter",
    "npm:@tintinweb/pi-subagents",
    "npm:@plannotator/pi-extension",
    "npm:@juicesharp/rpiv-todo",
    "npm:@juicesharp/rpiv-ask-user-question",
    "npm:pi-lens",
    "npm:@juicesharp/rpiv-advisor",
    "npm:pi-btw",
    "npm:pi-rewind-hook",
    "npm:@gotgenes/pi-permission-system",
    "git:github.com/leblancfg/pi-ansi-themes",
    "npm:pi-caveman",
    "npm:@juicesharp/rpiv-pi",
    "npm:@juicesharp/rpiv-args",
    "npm:pi-simplify",
    "npm:pi-studio",
    "npm:@ff-labs/pi-fff",
    "npm:pi-gsd",
    "npm:@aliou/pi-processes",
    "npm:@juicesharp/rpiv-web-tools",
    "git:github.com/ferologics/pi-notify",
    "git:github.com/jayshah5696/pi-agent-extensions",
    "npm:context-mode",
    "npm:pi-agent-browser-native",
    "npm:taskplane",
    "npm:pi-hermes-memory",
    "npm:@apmantza/greedysearch-pi",
    "npm:@feniix/pi-specdocs",
    "npm:@kaiserlich-dev/pi-session-search",
    "npm:pi-interactive-shell"
  ]
}
reddit.com
u/Prometheus4059 — 1 day ago

0% cache hit!

What's the problem? I'm getting a 0% cache hit rate. I have zero extensions installed, just the context cache extension!

https://preview.redd.it/m54dhvnc9w0h1.png?width=1081&format=png&auto=webp&s=7cec0395bd316543b1c9f23198818bd07d32fe6b

Am I missing something?

here is the prompt for all messages:

read this file /home/user/my_project/packages/cli-alias/index.js 10 times in raw

That makes the local model take a very long time. I'm using LM Studio.

https://preview.redd.it/jzunl7q9aw0h1.png?width=747&format=png&auto=webp&s=06283dbac9f107ecfdd647d2f632049e6391d929

https://preview.redd.it/92x9qxpibw0h1.png?width=278&format=png&auto=webp&s=c879f8971195f0b12259c5e74efe87b2801e2781

Edit:
It's an LM Studio bug: https://github.com/lmstudio-ai/lmstudio-bug-tracker/issues/1563. I tried llama.cpp and everything works perfectly.

reddit.com
u/IslamNofl — 13 hours ago

Hello guys!

Today I want to share my Pi agent setup. I think I have something here that can benefit the community: a really powerful agent that holds its own against Claude or Codex. What I want to share is my list of extensions and the value each one adds to the build.

I want to start with a basic one: pi-fork. This is a basic, minimalistic subagents extension focused on one single thing: giving the main agent the capability to spawn forks of itself to do work on its behalf. It's quite straightforward; you can achieve the same with any other subagents extension. The only difference is that this one is simpler and has prompts that optimize the communication between the forks and the main agent. It brings a single thing to the table: great context management. The main agent's context will only contain relevant information, it will be richer and denser per token, and all the noise stays out of it.

Ok, now I want to share the core of this Pi build: pi-observational-memory. This one is special: it's a custom compaction algorithm inspired by (copied from) Mastra's article. It enables Pi sessions to last forever without maxing out the context window, and keeps the agent focused. Combined with the rich context window from the pi-fork extension, this creates a rich, recallable memory system that stays relevant no matter how many weeks you've been using the same session or how many compactions it has withstood.

If you install only the two extensions above, you will take your Pi agent to the next level. Now, a couple more extensions that give my build some extra perks:

pi-minimal-subagent: like any other subagents extension, just simpler, without BS. I use this to enable two subagents: the "advisor" (a concept copied from Claude Code) and the "reviewer". The forks from pi-fork are extensions of the main agent; they are basically the same agent and share the same context. These two subagents, in contrast, give the main agent access to different, less biased points of view with clean context windows. The reviewer takes care of code quality, security, and UX of the changes introduced by the main agent. The advisor is for strategic decisions around architecture and product.

pi-codemapper: a wrapper around codemapper that enables efficient codebase exploration. The codemapper repo is really bad and unmaintained; it had a cache bug I had to patch myself. I'm looking forward to switching to cymbal when I get some free time.

pi-rtk-optimizer: a classic, not much to say here; it saves some tokens.

Conclusion:

I'd describe this setup with a single phrase: a personal agent that never forgets and stays useful for weeks before the context window maxes out.

I hope you get value from some of the extensions I shared. My own words aren't good enough to describe the power I feel when working with this agent setup, so I beg you to try it yourself and really experience what I'm saying.

u/elpapi42 — 9 days ago

How are you handling Web Searches? I can't migrate away from Claude without it

Most of my time is spent on doing web searches and comparisons.

Claude has a WebSearch tool that runs a "Google" like web search and returns results with the source links.

I usually ask for:

  • How does tool x compare to y?
  • Are there any blog posts or articles talking about X?
  • Can you find A in github/stackoverflow/reddit?

How are you doing web searches?

Are there free options?

Which plugins/extensions do you recommend?

EDIT: Given that 2026 is already full of supply chain attacks...

I followed the suggestions and built my own extension with 4 different backends.

My extension queries 2 backends in parallel and gets 6 different results (3 from each), falls back to another backend if one is rate-limited or exhausted, then pipes the response through Defuddle and exposes Markdown to the LLM.
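For the curious, the fan-out-with-fallback part of a setup like this can be sketched in a few lines of Python. The backend callables here are hypothetical stand-ins (each takes a query and a result count, returns a list, and raises when rate-limited or exhausted); the Defuddle step is omitted:

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_search(query, backends, per_backend=3):
    """Query the first two backends in parallel; if one fails,
    fall back to the next unused spare backend."""
    primary, spares = backends[:2], iter(backends[2:])
    results = []
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(b, query, per_backend) for b in primary]
        for fut in futures:
            try:
                results.extend(fut.result())
            except Exception:
                # This backend is rate-limited or exhausted:
                # try each remaining spare until one succeeds.
                for spare in spares:
                    try:
                        results.extend(spare(query, per_backend))
                        break
                    except Exception:
                        continue
    return results
```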

I'm quite happy, thanks for all the comments so far!

Great community!! 💪🏻

reddit.com
u/carlos-algms — 2 days ago

You can do basic web-search with just two simple cli tools

Hi! I was looking at the web search options available in the Pi ecosystem, and most of them wrap some API or require config.

I just want my tool to be able to

  1. Run a search query via a search provider
  2. Fetch pages preferably as markdown

For this I found that there exist two boring tools that work well together:

  1. The duckduckgo commandline tool ddgr. This is just one sudo apt install ddgr away
  2. The super weirdly named trafilatura tool. This is a Python tool that extracts text content from a URL. It has lots of options for presentation and what to include/exclude. pip install trafilatura, I suppose? I use NixOS so I dunno how to install this globally with Python. Python is hell.

What is trafilatura?

It's a command-line tool that extracts meaningful content from a web page. It's been actively maintained for over 9 years (probably longer?), and its primary use case is helping with academic research, where scraping is routinely useful.

Anyway, it is rich, mature, old, and just a CLI tool. It supports Markdown output, regular output, a mode that shows very little content, and a mode that shows more. You can choose to include/exclude links, etc.


Anyway. If you wrap these in a simple extension you get 100% local search that works for the common use case of "just quickly look something up on a forum, documentation, Wikipedia, or GitHub".

I haven't looked into how to publish this as an extension, but if people like it I could package it up.

This is the extension as a gist if anyone wants to try it.

https://gist.github.com/Azeirah/9375fb67c5aee6ca1b7e046f8b7cf0cd

Trafilatura has been configured to do:

  1. Show links
  2. Show markdown
  3. Show the concise output, not the verbose output. I did that to save tokens.
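A minimal sketch of gluing the two tools together from Python, assuming the flag names I remember from the ddgr and trafilatura CLIs (double-check with `--help`; both tools must be installed for `lookup` to work):

```python
import json
import subprocess

def search_cmd(query, n=5):
    # ddgr: --np runs non-interactively, --json emits machine-readable results
    return ["ddgr", "--np", "--json", "-n", str(n), query]

def fetch_cmd(url):
    # trafilatura: markdown output, keep links, favor precision (concise mode)
    return ["trafilatura", "-u", url, "--output-format", "markdown",
            "--links", "--precision"]

def lookup(query):
    # Search, then fetch the top three hits as markdown pages.
    hits = json.loads(subprocess.check_output(search_cmd(query), text=True))
    return [subprocess.check_output(fetch_cmd(h["url"]), text=True)
            for h in hits[:3]]
```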
u/Combinatorilliance — 2 days ago

pi-emote extension: an avatar for your pi agent

Hi! I have been using Pi for a while and really like the harness. I wanted a small visual indicator while using it, like in old-school JRPGs. Not sure if there is something similar already, but it was fun making this extension as a learning project. It kinda feels like I'm playing SNES while coding.

The avatar was made using nano banana as a placeholder, but I would like to change it to something better and more consistent.

It would be nice to have different avatars per model, or customize it yourself. For now, it kinda works with a single avatar.

I only tested it in Ghostty. I wanted to run it through tmux or Zellij, but apparently rendering images through them is not as straightforward.

Let me know what you think!

https://github.com/cgxeiji/pi-emote

u/CGx-Reddit — 8 days ago
▲ 6 r/PiCodingAgent+1 crossposts

Pi coding agent makes Reaper project: "Cold Machine Wakes"

WARNING: This "creation" gets VERY loud. Please be careful with listening. The AI does not seem to be very good at mixing yet.

The goal:

I thought it would be nice to see if LLMs could actually work with Reaper. So I made a Reaper skill for the Pi coding agent and asked it to create anything. It came up with this. I wouldn't call it a fantastic song, but it is quite interesting to see what LLMs create when you give them access to a tool like Reaper.

It's also fascinating to see that the LLM also just creates any JSFX plugins it needs. So yeah, you can easily create the weirdest JSFX in seconds using LLMs. I'm quite impressed.

If you want to experiment with something like this yourself, you can use the Reaper web interface API. I made a custom action that just executes a ReaScript file, and that ReaScript file is written by the LLM. The skill then triggers the written script by calling the web API with the command id.
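A sketch of that trigger call, assuming REAPER's web remote is enabled on its default port and that `command_id` is whatever id REAPER assigned to your custom action (both are placeholders here):

```python
from urllib.request import urlopen

# Placeholder base URL: use the port you configured in REAPER's
# web remote settings.
REAPER = "http://localhost:8080"

def trigger_url(command_id):
    # The web interface runs an action via a GET request to /_/<command_id>.
    return f"{REAPER}/_/{command_id}"

# Needs REAPER running with the web remote enabled:
# urlopen(trigger_url("_RS_my_custom_action"))
```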

And of course you can take this all waaaay further. Love to see what you come up with!

Check the skill out on Github:

https://github.com/michielpapenhove/reaper-audio

One note: there might be paths used in the skill that will not work for you.

u/MichettGodot — 4 days ago