u/pvatokahu

▲ 59 r/EngineeringManagers

Cognitive load shift from doing work to checking AI work product

I found this WSJ article by Katherine Blunt quite useful.

Gist - AI Is Getting Smarter. Catching Its Mistakes Is Getting Harder.

As chatbots and agents grow more powerful and ubiquitous, recognizing the moments when they go rogue can be tricky.

One of the comments on the article stood out to me -

… AI displaces the cognitive load from the actual doing of work to checking AI generated output …

Does that mean teams are putting more effort and focus into QA, or increasing how much testing IC devs do?

wsj.com

u/pvatokahu — 7 days ago

▲ 2 r/LargeLanguageModels

NYT article on accuracy of Google's AI overviews

Interesting article from Cade Metz et al. at NYT, who have been writing about the accuracy of AI models for a few years now.

We got to compare notes, and my key takeaway was to make sure evaluations run as part of regular testing for any agents or LLM-based apps.
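For anyone wondering what "evaluations as part of regular testing" might look like in practice, here is a minimal sketch, not a description of any particular product. Everything in it is an assumption for illustration: `fake_agent` is a stand-in for a real agent or LLM call, `EVAL_CASES` is a made-up set of prompts with expected substrings, and the 0.9 pass-rate threshold is arbitrary.

```python
from typing import Callable, List, Tuple

# Hypothetical evaluation cases: (prompt, substring the answer must contain).
EVAL_CASES: List[Tuple[str, str]] = [
    ("What is the capital of France?", "paris"),
    ("What is 2 + 2?", "4"),
]


def fake_agent(prompt: str) -> str:
    """Stand-in for a real agent or LLM call (an assumption for this sketch)."""
    canned = {
        "What is the capital of France?": "The capital of France is Paris.",
        "What is 2 + 2?": "2 + 2 equals 4.",
    }
    return canned.get(prompt, "I don't know.")


def run_evals(
    agent: Callable[[str], str],
    cases: List[Tuple[str, str]],
    threshold: float = 0.9,
) -> Tuple[float, bool]:
    """Return the pass rate and whether it meets the threshold."""
    passed = sum(
        1 for prompt, expected in cases if expected in agent(prompt).lower()
    )
    rate = passed / len(cases)
    return rate, rate >= threshold


def test_agent_meets_eval_threshold() -> None:
    """A test a runner such as pytest could collect alongside unit tests."""
    rate, ok = run_evals(fake_agent, EVAL_CASES)
    assert ok, f"eval pass rate {rate:.0%} is below the threshold"
```

Substring matching is a deliberately crude scorer. The point is only that the evaluation fails the build the same way a unit test would, so a regression in answer quality gets caught in the regular test run.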

We are quite diligent about it at Okahu with our debug, testing and observability agents. Ping me if you are building agents and would like to compare notes.

nytimes.com

u/pvatokahu — 13 days ago