

If you want a structured way to learn agent development without starting from random blog posts, Hugging Face has a free AI Agents course:
https://huggingface.co/learn/agents-course/en/unit0/introduction
It covers the basics first, then moves into actual frameworks and projects.
The syllabus includes:
- What agents are
- How tools, actions, and observations work
- Agent frameworks like smolagents, LlamaIndex, and LangGraph
- Agentic RAG
- A final project where you build, test, and certify an agent
- Bonus material on observability, evaluation, and function-calling
I like this kind of resource because it does not treat agents as just "LLM plus loop."
For junior devs, the useful concept is the agent control loop:
- The model receives a goal and context
- It chooses an action
- A tool runs that action
- The result comes back as an observation
- The agent decides what to do next
That loop is the core of most agent systems. The framework changes, but the pattern keeps showing up.
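A minimal sketch of that loop in Python (the JSON action format, the `call_llm` helper, and the single `search_docs` tool are placeholders I made up for illustration, not something the course prescribes):

```python
import json

# Placeholder LLM wrapper; swap in whatever client you actually use.
def call_llm(messages: list[dict]) -> str:
    raise NotImplementedError("plug your model API in here")

# Start with one tool, as suggested below.
TOOLS = {
    "search_docs": lambda query: f"(pretend search results for {query!r})",
}

def run_agent(goal: str, max_steps: int = 5) -> str:
    messages = [
        {"role": "system", "content":
            'You are an agent. Reply with JSON: {"action": "<tool name or finish>", "input": "..."}'},
        {"role": "user", "content": goal},
    ]
    for _ in range(max_steps):
        # 1. The model receives the goal and context, and chooses an action.
        decision = json.loads(call_llm(messages))
        if decision["action"] == "finish":
            return decision["input"]
        # (A human approval step would slot in right here, before the tool runs.)
        # 2. A tool runs that action.
        observation = TOOLS[decision["action"]](decision["input"])
        # 3. The result comes back as an observation; the agent decides what to do next.
        messages.append({"role": "assistant", "content": json.dumps(decision)})
        messages.append({"role": "user", "content": f"Observation: {observation}"})
    return "Stopped: step limit reached"
```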
If you are already comfortable with Python and basic LLM APIs, this seems like a good weekend learning path. Build the smallest possible agent first. Then add one tool. Then add logging. Then add a human approval step.
That progression teaches more than trying to build a giant "does everything" agent on day one.
Came across this article and thought it was worth sharing here: How to Build Production-Grade Generative AI Applications
It’s a good practical overview of what teams usually learn the hard way after the prototype phase. A few points it gets right:
- not every problem should use an LLM
- model selection should be based on task fit, latency, cost, context window, and safety, not just hype
- prompt engineering matters, but structured inputs/outputs matter just as much (quick sketch below)
- guardrails, QA, eval pipelines, and tracing are not “later” concerns
- production failures usually come from accuracy drift, hallucinations, cost, and lack of observability
What I liked most is that it frames GenAI systems as engineered products, not prompt demos. That maps well to agentic dev too: once agents can use tools and run longer workflows, monitoring, constraints, and evaluation become first-class design problems.
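On the structured-outputs point, one common pattern is validating model output against a schema before anything downstream consumes it. A rough sketch with Pydantic v2 (the `TicketTriage` schema and the failure handling are my own assumptions, not from the article):

```python
from pydantic import BaseModel, ValidationError

class TicketTriage(BaseModel):
    category: str   # e.g. "billing", "bug", "feature_request"
    severity: int   # 1 (low) .. 5 (critical)
    summary: str

def parse_model_output(raw_json: str) -> TicketTriage:
    """Validate LLM output against the schema before it reaches downstream systems."""
    try:
        return TicketTriage.model_validate_json(raw_json)
    except ValidationError as err:
        # In production you might retry with the error fed back to the model,
        # fall back to a default, or route the item to human review.
        raise ValueError(f"Model output failed schema validation: {err}") from err
```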
A lot of teams now say they are “testing AI workflows,” but when you dig in, the actual approach is all over the place.
I’ve seen combinations like:
- mocked unit tests around prompt builders / orchestration logic
- deterministic tests with frozen model outputs (see the sketch after this list)
- cheap-model integration tests in CI
- full end-to-end runs nightly
- eval pipelines before release
- production monitoring plus human review
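For the frozen-output style, the trick is to capture one real response and pin it, so the orchestration and parsing logic get tested on every PR without a live model call. A rough pytest sketch (the `myapp.triage` module, its functions, and the frozen JSON are hypothetical):

```python
import json
from unittest.mock import patch

# Hypothetical module under test: builds the prompt, calls the model, parses the result.
from myapp import triage

# A real model response captured once and frozen as a fixture.
FROZEN_RESPONSE = json.dumps(
    {"category": "billing", "severity": 3, "summary": "Customer double-charged"}
)

def test_triage_parses_frozen_model_output():
    # No live model call: the client is patched to return the frozen output.
    with patch.object(triage, "call_model", return_value=FROZEN_RESPONSE):
        result = triage.triage_ticket("I was charged twice this month")
    assert result.category == "billing"
    assert 1 <= result.severity <= 5
```

That layer is cheap and reproducible, but it says nothing about whether the live model still behaves, which is what the eval pipelines and nightly end-to-end runs are for.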
The hard part is balancing:
- cost
- runtime
- brittleness
- confidence
- reproducibility
What I’m trying to understand is what people here do in practice.
Questions:
- What do you test with classic software tests vs evals?
- Where do you mock, and where do you insist on real model calls?
- What runs on every PR vs nightly?
- How do you catch regressions that are not binary failures but “quality drift”?
- What looked promising at first but turned out to be low-value?
Would love concrete examples of test architecture, CI strategy, and lessons learned.
OpenAI published this on April 15: The next evolution of the Agents SDK.
The interesting part is not just “better agents.” It’s that the SDK is moving toward real execution infrastructure for systems that can inspect files, run commands, edit code, and work on longer-horizon tasks inside controlled environments.
That feels important for practical agentic development because the hard part is no longer just model quality. It’s whether the system can execute safely, repeatedly, and observably.
My take:
- the center of gravity is moving from prompt tricks to runtime design
- agent frameworks are becoming more like operating environments
- the real moat is starting to look like execution, safety, evals, and observability rather than raw chat quality
Curious how people here see it:
- Are you using vendor SDKs directly, or building your own orchestration layer?
- What’s still missing most: evals, rollback, state handling, approvals, tracing?
Source: OpenAI Agents SDK update
The new Stanford AI Index is out: 2026 AI Index Report
35B parameters, ~3B active thanks to MoE.
Key points:
- In agentic coding, it reaches the level of models with ~10× larger active parameter count
- Outperforms Qwen3.5-27B (dense) and the previous Qwen3.5-35B-A3B
- Natively multimodal architecture (text + vision)
- In VLM benchmarks, comparable to Claude Sonnet 4.5, and in some tasks performs better
- Strong metrics in spatial reasoning tasks
Benchmarks:
- MMMU - 81.7 vs 79.6
- MMMU-Pro - 75.3 vs 68.4
- MathVista - 86.4 vs 79.8
- RealWorldQA - 85.3 vs 70.3
Practical implications:
- MoE cuts per-token compute several-fold (only ~3B of the 35B parameters are active) without sacrificing quality
- Well-suited for agent-based scenarios where sequential actions and planning matter
- Can be used as a unified stack for both code and vision tasks
Apache 2.0 license (permissive, fine for production use)
On one hand, planning is an incredibly powerful capability in AI systems. It opens the door to more autonomous, agent-like behavior and lets models tackle more complex, multi-step problems.
On the other hand, it’s also the part I trust the least right now.
In my experience, I’ve been able to get patterns like reflection and tool use to work quite reliably. They’re much easier to reason about, debug, and iterate on—and they consistently improve application performance.
Planning, though, feels different. It’s harder to predict what the model will actually do, especially ahead of time. Even with careful prompting and constraints, the outcomes can be inconsistent or surprising in ways that are tough to control.
That said, things are moving fast. The progress over the past year alone has been huge, so I’m pretty confident this gap will close sooner rather than later.
How do you evaluate planning? How do you monitor it?
Hey - glad you’re here 👋
This is a dev-first community of people actually building agentic systems.
We care about practical agentic development:
- real architectures
- real failures
- real tradeoffs
- real systems that (sometimes) work
Relevant Community Topics:
- autonomous agents
- multi-agent setups
- tool use / orchestration
- evals, debugging, reliability
- production lessons
Robotic process automation (RPA) for repetitive e2e tests
Robotic Process Automation (RPA) in testing refers to the use of “software robots” to mimic and repeat the actions that human testers perform when interacting with an application.
Is RPA the same as an automated testing script? No. RPA drives the UI to mimic human actions and execute workflows, while automated testing scripts programmatically verify that the software behaves correctly.
- RPA = “Do what a user does”
- Test automation = “Check if the system behaves correctly”
According to https://testfort.com/blog/test-automation-trends, RPA adoption in testing is expected to grow significantly as organizations use it to reduce manual labor costs and scale testing efforts alongside AI-driven automation. Something to keep an eye on in the industry 👀
LLMs for test case generation are promising - but reliability is still a major issue
Source: https://link.springer.com/article/10.1007/s10586-026-06021-z
A recent review explores how large language models (LLMs) are being used to generate test cases.
Key takeaways:
- Software testing is critical but still time-consuming and labor-intensive
- Traditional automated methods (search-based, constraint-based) often:
- lack coverage
- produce less relevant test cases
- LLMs introduce a new approach:
- understand natural language requirements
- generate context-aware test cases and code
- directly translate requirements to test cases
- LLM-based approaches show promising performance vs traditional methods
Open issues:
- Lack of standard benchmarks and evaluation metrics
- Concerns about correctness and reliability of generated tests
In practice, reliability seems like the biggest blocker: LLMs generate tests that look correct but often miss edge cases or assert the wrong behavior, or they retest the same obvious scenarios several times while ignoring the unit's actual responsibility in the surrounding system.
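A toy illustration of that failure mode (the `split_batches` helper and the tests are made up for this example):

```python
import pytest

def split_batches(items, size):
    """Unit under test (hypothetical helper)."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [items[i:i + size] for i in range(0, len(items), size)]

# What generated suites often look like: the same obvious case, twice.
def test_split_batches_basic():
    assert split_batches([1, 2, 3, 4], 2) == [[1, 2], [3, 4]]

def test_split_batches_basic_again():
    assert split_batches(["a", "b", "c", "d"], 2) == [["a", "b"], ["c", "d"]]

# The cases that actually carry risk are the ones usually missing:
def test_split_batches_empty_input():
    assert split_batches([], 3) == []

def test_split_batches_uneven_remainder():
    assert split_batches([1, 2, 3], 2) == [[1, 2], [3]]

def test_split_batches_rejects_nonpositive_size():
    with pytest.raises(ValueError):
        split_batches([1], 0)
```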
What is your experience generating tests with AI?
Are you into testing AI agents?
From https://devops.com/is-your-ai-agent-secure-the-devops-case-for-adversarial-qa-testing/
>The future belongs to organizations that recognize “sunny day” testing is no longer enough. The teams that build the “storm simulators” now will operate with a level of confidence and security that their competitors cannot match.
They suggest simulating network failures, ambiguous requirements and prompt injection to see if an agent maintains safe behavior. The message is that AI agents are part of our software stack now, and they need to be tested with creativity.
What do you think?
My experience coding with AI has never been anything like 10× faster (more like 0.8× hehe). Sure, AI copilots can generate OK-looking code, but for me it has mostly been a waste of time: tech debt piles up, learning slows down, and you often spend more time fixing things than if you had just written simpler code by hand, without AI.
I tend to see more benefits from AI code generation when it’s used with Test-Driven Development (TDD), at least when starting with end-to-end or integration tests first. I also shared my thoughts on this on YouTube: https://youtu.be/Mj-72y4Omik
Some developers argue that TDD is too slow and that you should focus on end-to-end tests (writing them manually) and let AI generate unit tests. That kind of works. But when it comes to learning Python (especially for beginners), I see a lot of frustration from overusing AI. TDD seems like a nice approach to avoid just relying on AI.
What do you think?
New Dev Intros 🎉
Congrats on becoming a member of the r/PracticalTesting community 🎉
Every great software community starts with people like you - developers who care about building, testing, and shipping great software products.
This space is all about practical testing: real-world approaches, useful tools, lessons learned, and honest discussions about what actually works (and what doesn’t).
Whether you’re here to learn, share your experience, or ask questions — you’re in the right place.
To get started:
- Introduce yourself 👋
- Share what you’re currently working on
- (Optionally) Tell us more about your background/experience in testing
Let’s build a community where testing is not just theory, but something that truly helps us ship better code 🚀
Takeaways from the book "Unit Testing: Principles, Practices, and Patterns"
I am reading "Unit Testing: Principles, Practices, and Patterns" by Vladimir Khorikov right now. The main idea that stuck with me is to focus on test value instead of chasing coverage numbers or clever frameworks.
Source: "Unit Testing: Principles, Practices, and Patterns" by Vladimir Khorikov
The book pushes hard on making tests about behavior and risk rather than about methods and branches. Really great book! Highly recommend it.
CloudBees Smart Tests is now GA - using AI test intelligence in CI?
CloudBees just announced general availability of Smart Tests, their AI driven test intelligence product for CI/CD.
Source: https://www.cloudbees.com/newsroom/cloudbees-smart-tests-brings-control-to-ai-generated-code
The pitch is simple - instead of running every test on every change, Smart Tests learns which tests matter most for a given commit and runs those first.
Given how much AI generated code is now flowing through pipelines, this feels like a pretty important direction for test tooling.
WDYT?
paper on “systemic flakiness” - flaky tests are not random noise
There is a 2025 paper called “Systemic Flakiness: An Empirical Analysis of Co-Occurring Flaky Test Failures”.
👉 https://arxiv.org/abs/2504.16777
They looked at 10,000 test suite runs from 24 Java projects and found 810 flaky tests. The key claim is that flaky tests often fail in clusters that share root causes. They call this pattern “systemic flakiness”.
About 75 percent of flaky tests in their dataset belonged to some cluster.
They show that fixing a shared cause can remove many flaky tests at once. Common causes were unstable networks and flaky external dependencies.
We should search for shared root causes, not only patch single tests. This could be very relevant for teams that drown in flaky UI or API suites.
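One practical consequence: when a whole cluster fails because of a single unstable external dependency, one shared fixture can stabilize all of those tests at once. A rough pytest sketch (the canned payload and the `requests.get` stub are just an illustration):

```python
# conftest.py - stub the unstable external API once, for the whole suite.
import pytest
import requests

class _FakeResponse:
    status_code = 200
    def json(self):
        # Canned payload instead of a flaky network call.
        return {"status": "ok"}

@pytest.fixture(autouse=True)
def stub_external_api(monkeypatch):
    """Every test that hits the external service gets a stable canned answer."""
    monkeypatch.setattr(requests, "get", lambda url, **kwargs: _FakeResponse())
```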
Thoughts on “The Pyramid of Unit Testing Benefits”?
I went back to Gergely Orosz’s article “The Pyramid of Unit Testing Benefits” and it hit harder than before.
👉 https://blog.pragmaticengineer.com/unit-testing-benefits-pyramid/
He talks about how unit tests start with basic validation but then stack into better design, living documentation, safer refactors, and faster iteration over time.
The idea that the real payoff shows up years later might explain why experienced devs fight hard to keep tests, while juniors often see them as a chore.