u/Euphoric_Network_887

When I was 20, I had an idea that I never really built, but that I still think about sometimes.

The concept was to create a peer-to-peer delivery system, but with a very different logic from traditional delivery platforms. Instead of paying people in cash, they would earn discount points redeemable at partner shops selling essential goods.

So in practice, someone could occasionally help with a delivery on a route they were already taking, or close to it, and in return receive discounts on everyday essentials: groceries, basic household products, and so on. The point was precisely that it would stay occasional and not become a job. Something simple, flexible, and embedded in normal life.

What interested me was the idea of increasing people’s purchasing power without directly giving them cash. Instead, the system would make essential goods more affordable, ideally through responsible partner shops offering good-quality products, fresh fruit and vegetables, and healthier, more sustainable options.

I also liked the idea that it could redirect part of everyday consumption away from ultra-processed food or fast food and toward healthier products, without using guilt or moralizing. Just by making better options more accessible.

On the other side, it could also help small responsible shops that often struggle with low visibility compared with large chains that have much stronger commercial power. The system could give them both visibility and a delivery service they often cannot afford to build on their own.

There was also an important logistics angle: optimizing existing routes. For example, if someone is leaving work and heading home, the platform could suggest a delivery along the way, or with only a small detour, rather than turning it into a 30-minute inconvenience. The goal was not to create another army of precarious gig workers, but to use existing movement more intelligently.

At a deeper level, what motivated me was a broader intuition: today, money tends to move vertically. It flows up, down, and concentrates, but it does not circulate enough locally and usefully between people.

The economic model I had in mind was more about circularity: people help one another, that creates real purchasing power, that purchasing power benefits responsible local shops, and in turn encourages healthier consumption. A more local, healthier, and more mutually beneficial loop.

I still do not know whether the model was naive, unrealistic, or simply underexplored, but I still think there was something interesting in it.

I would genuinely love to hear people’s thoughts. Does this sound viable, or do you immediately see why a system like this would not work?

u/Euphoric_Network_887 — 29 days ago

⚠️ I know it is a long post, not really the usual reddit format, but since a lot of people wonder what the future might look like with "AI replacing jobs", I thought it could be worth looking at what actually happened historically

What creative destruction actually means

“Creative destruction” has become dramatic shorthand for a single, tired idea: technology arrives, jobs disappear, society panics. The concept deserves better. It is broader, deeper, and more unsettling than that.

At its core, creative destruction describes the way capitalism renews itself by breaking apart older economic structures and replacing them with new ones. Joseph Schumpeter gave the idea its classic formulation in Capitalism, Socialism and Democracy in 1942, arguing that capitalism does not evolve smoothly or gently. It advances through waves of disruption: new products replace old ones, new production methods displace established routines, entire sectors are reorganized. Disruption is not an accidental side effect of capitalism. It is one of its central operating mechanisms.

This is commonly misread as a story about labor market pain. But creative destruction is not just about workers losing jobs. It is about older combinations of the economy being dismantled and replaced: ways of producing goods, organizing companies, reaching markets, even defining economic value itself. A new technology does not merely make one task faster. It can render an entire business model obsolete, reduce the value of one skill while raising another, and shift power from one class of firms to another.

Schumpeter’s insight was also a critique of static thinking. In the textbook model, markets tend toward equilibrium. In his account, real capitalism is turbulent, driven by entrepreneurs, innovation, competition, and periodic upheaval. Stability is temporary. The system grows by repeatedly unsettling itself, which is why the history of modern capitalism is not a straight line of gradual improvement but a sequence of shocks and reconfigurations, in which gains in productivity are often inseparable from losses in status, security, and institutional continuity.

"Capitalism [...] is by nature a form or method of economic change and not only never is but never can be stationary. [...] This process of Creative Destruction is the essential fact about capitalism." Joseph Schumpeter (Capitalism, Socialism and Democracy, 1942)

Creative destruction is therefore a structural description, one that tells us major economic change tends to arrive in double form: creation for some, destruction for others. New wealth is generated, but older livelihoods, firms, and routines may be swept aside in the process. The central political and moral question has never been whether this happens. History is clear enough on that. The real questions are who bears the cost of transition, who captures the gains, and whether institutions can adapt quickly enough to prevent economic renewal from becoming social fracture.

https://preview.redd.it/hu711f8nl8pg1.png?width=1456&format=png&auto=webp&s=3e540771e1e458672e80919ae124504ff17edaa6

That is the right frame for any serious discussion of AI. The fear surrounding it belongs to a much older pattern: when a new productive force enters the economy, it does not simply add possibilities. It rearranges hierarchy, relevance, and power.

The problem is that the optimistic reading of creative destruction, the one that assumes the gains will reliably arrive and compensate for the losses, is historically thin and analytically wrong. The economists actually working on the frontier of labor research, people like Daron Acemoglu, David Autor, and Erik Brynjolfsson, have spent the better part of two decades building a far more precise and considerably less comfortable picture of what technology does to work. What they have found does not fit inside the optimistic framing that dominates most public discussion. It is worth going through their framework carefully, because the details are where the argument lives.

Technology does not replace jobs, it replaces tasks

The first correction the research forces on us is conceptual. Modern labor economics does not think in terms of jobs. It thinks in terms of tasks. A job is a bundle of tasks, and the bundle changes shape when technology arrives.

When a machine or a piece of software takes over a function in an economy, it does not typically eliminate a job the way you might delete a file. It absorbs one or several tasks that previously required a human being, while leaving other tasks in the same job description untouched or reconfigured. The classic example is the spreadsheet. Accountants did not disappear when Excel arrived. What disappeared was the specific task of performing arithmetic by hand. The accounting profession was restructured around the tasks that remained, and some new tasks appeared that had not existed before. The job persisted, transformed.

This task-based framework generates two distinct mechanisms that researchers call the displacement effect and the reinstatement effect. The displacement effect is straightforward: capital in the form of technology takes over tasks that workers previously performed, reducing the share of value added that accrues to labor. The reinstatement effect is the creative part of creative destruction. New technology creates new tasks that did not previously exist and that require human judgment, human presence, or human skills to perform. The emergence of data analyst roles following the computerization of business records is the canonical example. The machines created a new kind of work that humans then staffed.

For most of economic history, or at least for the period following the Industrial Revolution, these two effects roughly balanced each other over medium to long time horizons. Workers were displaced from one set of tasks and gradually absorbed into new ones. The transitions were painful and often unjust, but the reinstatement effect eventually caught up with displacement.

What the data from the past four decades shows is that this balance has broken down. Since roughly the 1980s, the displacement effect has been outrunning the reinstatement effect with increasing speed. Technology is destroying old tasks faster than it is inventing new ones that require human beings. The current wave of automation is entering an economy where the reinstatement mechanism is already running behind, and there is no structural reason to expect it to accelerate on its own.

The problem with technologies that are just good enough

Acemoglu and his colleagues have developed the idea of what they call “so-so technologies.” It is worth spending time with this concept because it cuts against the intuitions most people bring to the subject.

A genuinely transformative technology, the kind that justifies the historical optimism embedded in creative destruction narratives, does not merely replace workers. It generates a productivity increase large enough to trigger a cascade of downstream effects. Ford’s assembly line is the standard reference point. It eliminated enormous numbers of skilled craft jobs in automobile production. But it also reduced the cost of automobiles so substantially that a new mass market came into existence. That new market generated demand across the entire economy: for raw materials, for roads, for fuel, for repair services, for suburban housing, and so on. The destruction was real and severe for the workers it displaced, but the creation was large enough to eventually outrun it.

The mechanism here is essential. The productivity gain had to be large enough to cause prices to fall and real purchasing power to rise. That increase in purchasing power had to generate new demand. That new demand had to be labor-intensive enough to create significant employment. All three steps have to work for the creative destruction cycle to complete itself.

A so-so technology is one that clears only the first bar. It is efficient enough to justify replacing a worker, but it does not generate a meaningful productivity increase, so prices do not fall, real incomes do not rise, and no new demand is created. What happens instead is a simple transfer: the value that previously accrued to the worker as wages is transferred to the firm as profit. Nothing is created. Only the distribution of existing value changes.

Supermarket self-checkout kiosks are a useful illustration. Grocery chains have replaced a significant share of their cashier workforce with machines that require customers to scan and bag their own purchases. The process is slower for the customer, more error-prone, and requires periodic intervention from a human attendant. No meaningful productivity gain has occurred in any measurable sense. The time cost of the transaction has arguably increased; it has simply been transferred onto the customer rather than paid as a wage. Grocery prices have not fallen as a result of this automation. The labor cost that was eliminated has been captured as margin. This is so-so technology in its clearest form: a transaction that looks like efficiency from the firm’s income statement and looks like nothing in particular from the economy’s perspective, because no new value has actually been produced.

The risk with a substantial portion of current AI deployment is that it falls into this same category. A customer service chatbot that replaces a human agent may reduce costs for the company deploying it. But if the service it provides is noticeably worse, or even marginally worse, and if the company captures the cost savings as margin rather than passing them through to customers as lower prices, then the economy has not become more productive in any meaningful sense. One person lost a job. One company’s bottom line improved. The net effect on aggregate demand is negative, because the displaced worker has less money to spend.

This is not inevitable. There are AI applications that appear to be generating genuine productivity gains, in drug discovery, in materials science, in software engineering to some extent. But it is a real analytical question, and not one that optimistic analogies to previous technological waves can settle in advance.

The trap built into how we think about Artificial Intelligence

Erik Brynjolfsson has articulated another structural problem with the current trajectory of AI development that he calls the Turing Trap. The name refers to Alan Turing’s original formulation of machine intelligence, in which a machine is considered intelligent if a human observer cannot distinguish its outputs from those of another human. The Turing Test set imitation of human capability as the goal of artificial intelligence research.

Brynjolfsson’s argument is that this framing has become an economic trap. When the objective of AI development is to produce a machine that can do exactly what a human can do, the logical endpoint is a world in which human labor can be replaced at scale, which means the effective supply of labor becomes nearly infinite, which means the price of labor approaches zero. It is a structural argument about what happens to the value of human work in general when machines become capable substitutes for it.

The contrast he draws is with what he calls augmentation. A technology that augments human capability does not substitute for what humans can already do. It extends the range of what humans can do at all. The telescope did not replace astronomers. It allowed astronomers to observe things they could not previously observe. The technology expanded the domain of human productive capacity rather than replicating it. When capital is invested in augmentation, the productivity gains are real and they accrue partly to workers, because the workers become capable of doing more valuable things.

The problem is that the current incentive structure in technology markets pushes heavily toward substitution rather than augmentation. The business case for replacing a worker with a machine is direct, immediate, and easy to model. The business case for building a tool that makes workers more productive is harder to capture, because the value it creates may diffuse through the labor market in ways that are difficult to appropriate. Tax structures, accounting conventions, and the short time horizons of capital markets all reinforce the bias toward automation over augmentation. Brynjolfsson is not making a purely technological argument. He is making an argument about institutional incentives, which is a different and more tractable kind of problem.

The rule that no longer holds

For roughly two centuries, the pattern of technological displacement followed a reasonably stable logic. Technology first displaced physical labor, the work of muscle and endurance, and then moved progressively up the skill hierarchy to displace routine cognitive work. The process that economists describe as routine-biased technological change eliminated the administrative middle of the labor market through the final decades of the twentieth century. Clerical work, data entry, basic bookkeeping, production line supervision: these were the jobs that computerization hollowed out.

Through all of this, there was a consistent story that education professionals and policy makers told, and that the data broadly supported. The higher your skills and credentials, the more protected your position. Automation replaced what was routine and replicable. It could not touch what required genuine judgment, creativity, or complex communication. The lawyers, the consultants, the software engineers, the writers: these workers sat above the waterline of automation risk. The policy prescription followed naturally from the diagnosis. More education, more advanced training, credentials in cognitively demanding fields.

Generative AI breaks this pattern in a way that has no clear historical precedent. It is not biased toward routine cognitive tasks. It is biased toward non-routine cognitive tasks, precisely the category that all previous waves of automation left relatively intact. A large language model does not struggle with the kind of complex, open-ended reasoning that distinguished knowledge work from automation-vulnerable work. It struggles, at least currently, with physical manipulation, spatial navigation, and real-world embodied action.

The practical implication is an inversion of the previous risk hierarchy. The workers who face the most direct exposure to current AI capabilities are the ones who spent the most time and money insulating themselves from previous waves of automation. Lawyers who generate first drafts of standard documents. Programmers who write routine code. Consultants who synthesize publicly available information into structured reports. Graphic designers who produce commercial illustration. These are not marginal occupations. They represent the professional core of the contemporary knowledge economy, and they are being told, implicitly if not explicitly, that the strategy that protected their predecessors from displacement does not apply to them.

Meanwhile, the plumber, the physical therapist, the electrician, and the home health aide are protected not by their credentials but by the embodied and relational nature of their work. Robots capable of replacing them at scale remain technically out of reach for the foreseeable future. The inversion is not complete, and it is not permanent. But it is real enough in the present to require a fundamental revision of the standard advice about how workers should position themselves relative to technological change.

On March 5, 2026, Anthropic released “Labor Market Impacts of AI: A New Measure and Early Evidence.”

What the Luddites were actually doing

A Luddite, in contemporary usage, is someone who irrationally fears technology, who fails to understand that progress is inevitable and beneficial, who stands in the way of a better future out of ignorance or sentiment. This reading is historically wrong in almost every particular.

The Luddites were skilled textile workers, primarily framework knitters and weavers, who engaged in organized machine-breaking in England between 1811 and 1816. They were not ignorant of the technology they destroyed. Many of them understood it in considerable technical detail. They were not opposed to machinery as such. What they were opposed to was the specific manner in which machinery was being deployed by mill owners, which was designed not simply to increase productivity but to break the institutional structures through which skilled artisans exercised control over their trades.

The guild system and the craft traditions it protected gave skilled workers something rare and valuable: a form of collective bargaining power that rested on expertise rather than organization. The specific skills required to produce high-quality textiles were concentrated in a relatively small population of trained workers, and this concentration of expertise gave those workers genuine leverage in their negotiations with employers. They could not easily be replaced, and everyone involved understood this.

The power-loom and the stocking frame, as deployed in the early factories, did something more significant than increase output per worker. They transferred the skill content of the production process from the worker to the machine. A machine operator did not need the years of apprenticeship that a master weaver required. The labor force became interchangeable in a way it had not been before. The leverage that skilled workers derived from their expertise was eliminated at a stroke.

The parallel to the present is direct. The value that a senior software engineer or a specialized lawyer commands in the market is not primarily a function of their capacity to perform tasks that are physically demanding or technically routine. It is a function of the scarcity of their particular combination of skills and judgment. That scarcity is what gives them negotiating power. It is what allows them to command salaries that reflect something closer to the actual value they create rather than the minimum amount required to keep them in the job.

Generative AI, as it is currently being deployed across professional services firms and technology companies, is functioning as a commoditizing force. It is not necessarily producing work that is as good as what a senior professional produces. But it is producing work that is good enough to handle a large portion of the routine cognitive labor that junior and mid-level knowledge workers perform. And because it is good enough for a substantial portion of the work, it undermines the scarcity premium that the entire professional hierarchy depends on. The first-year associate, the junior analyst, the mid-level programmer: these are the entry points through which professionals build the experience and judgment that eventually makes them genuinely valuable. If those entry points are automated away, the pipeline for producing senior expertise is disrupted in ways that will only become fully visible over a longer time horizon.

The destruction here is primarily institutional before it is technological. The technology is the instrument. What is actually being restructured is the distribution of power between those who own the tools and those who use them, the same redistribution that was at the heart of the conflict between the Luddite artisans and the mill owners two hundred years ago. Understanding this does not require taking a position on whether the technology is good or bad. It requires refusing to pretend that the question of who benefits and who loses is answered by pointing to aggregate productivity statistics.

The question the optimism cannot answer

The standard response to all of this is to say that previous technological transitions also generated fear and disruption, and that they resolved themselves over time into broad increases in living standards. This is true. It is also insufficient as an argument.

The transitions were not painless. The workers displaced by industrialization in the nineteenth century did not live long enough to benefit from the increases in real wages that their grandchildren eventually enjoyed. The creation accrued to later generations, through mechanisms that were not automatic but that required significant political struggle, institutional innovation, and in many cases outright violence.

More specifically, the optimistic argument depends on the reinstatement effect catching up with displacement, on the Turing Trap being avoided, on AI development steering toward augmentation rather than substitution, on so-so technologies being replaced by genuinely transformative ones that drive prices down and create real new demand. None of these outcomes is impossible. Some of them are plausible. But none of them is guaranteed by the internal logic of the technology itself, and the current incentive structures in capital markets do not particularly point in those directions.

The economists doing this work are not arguing that technological progress is bad or should be slowed. They are arguing that the outcome depends on choices, and those choices are not being made in a neutral environment. Consider one of the most concrete and underexamined of them: most advanced economies tax human labor far more heavily than they tax investment in automation. A firm that employs a worker pays payroll taxes, contributes to social insurance schemes, and bears various regulatory costs tied to the employment relationship. The same firm that replaces that worker with software can typically deduct the full cost of that investment, benefit from accelerated depreciation schedules, and face no equivalent levy on the productive capacity it has acquired. The tax system, which is not a law of nature but a set of accumulated political decisions, systematically prices human labor above its market cost and prices automation below it.

Adjusting that asymmetry would not solve the structural problems that the task-based framework identifies. But naming it matters, because it demonstrates that the direction of the current transition is not simply what technology wants to do. It is what a specific set of institutional arrangements is encouraging it to do.

The alternative is to keep using the phrase to shut down debate, trusting that history will rhyme on schedule. That leaves unresolved the question of who bears the costs of the transition and who captures the gains in a system that favors capital over labor not by fate, but by design.

That is a choice too. It simply is not usually presented as one.

>“There is nothing automatic about new technologies bringing widespread prosperity. Whether they do or not is an economic, social, and political choice.” Daron Acemoglu & Simon Johnson (Power and Progress, 2023)

Citations:

Acemoglu, Daron. “Harms of AI.” National Bureau of Economic Research Working Paper No. 29247, 2021.

Acemoglu, Daron, and Pascual Restrepo. “The Race Between Man and Machine: Implications of Technology for Growth, Factor Shares, and Employment.” American Economic Review 108, no. 6 (2018): 1488–1542.

Acemoglu, Daron, and Pascual Restrepo. “Automation and New Tasks: How Technology Displaces and Reinstates Labor.” Journal of Economic Perspectives 33, no. 2 (2019): 3–30.

Acemoglu, Daron, and Pascual Restrepo. “Tasks, Automation, and the Rise in US Wage Inequality.” Econometrica 90, no. 5 (2022): 1973–2016.

Allen, Robert C. “Engels’ Pause: Technical Change, Capital Accumulation, and Inequality in the British Industrial Revolution, 1780–1850.” Explorations in Economic History 46, no. 4 (2009): 418–435.

Autor, David H., Frank Levy, and Richard Murnane. “The Skill Content of Recent Technological Change: An Empirical Exploration.” Quarterly Journal of Economics 118, no. 4 (2003): 1279–1333.

Autor, David H. “Work of the Past, Work of the Future.” AEA Papers and Proceedings 109 (2019): 1–32.

Autor, David, David Dorn, Lawrence F. Katz, Christina Patterson, and John Van Reenen. “The Fall of the Labor Share and the Rise of Superstar Firms.” Quarterly Journal of Economics 135, no. 2 (2020): 645–709.

Binfield, Kevin, ed. Writings of the Luddites. Johns Hopkins University Press, 2004.

Brynjolfsson, Erik. “The Turing Trap: The Promise and Peril of Human-Like Artificial Intelligence.” Daedalus 151, no. 2 (2022): 272–287.

Brynjolfsson, Erik, and Andrew McAfee. The Second Machine Age: Work, Progress, and Prosperity in a Time of Brilliant Technologies. W. W. Norton, 2014.

Brynjolfsson, Erik, Danielle Li, and Lindsey R. Raymond. “Generative AI at Work.” National Bureau of Economic Research Working Paper No. 31161, 2023.

Felten, Edward W., Manav Raj, and Robert Seamans. “Occupational Heterogeneity in Exposure to Generative AI.” SSRN Working Paper, 2023.

Goldman Sachs Economics Research. “The Potentially Large Effects of Artificial Intelligence on Economic Growth.” March 2023.

Manning, Alan. Monopsony in Motion: Imperfect Competition in Labor Markets. Princeton University Press, 2003.

OECD. Taxation and the Future of Work: How Tax Systems Influence Choice of Employment Form. OECD Publishing, 2019.

Sale, Kirkpatrick. Rebels Against the Future: The Luddites and Their War on the Industrial Revolution. Addison-Wesley, 1995.

Schumpeter, Joseph A. Capitalism, Socialism and Democracy. Harper & Brothers, 1942.

Solow, Robert M. “Technical Change and the Aggregate Production Function.” Review of Economics and Statistics 39, no. 3 (1957): 312–320.

u/Euphoric_Network_887 — 2 months ago

I’m self-taught, so most of what I know has come from building things, messing them up, and then figuring out why they broke. I know some people will look at this and think, “wtf, what an idiot.” But I’m learning by doing, I still have a lot to figure out, and this subreddit is meant to shed light on learning curves.

https://preview.redd.it/q73euim7ruog1.png?width=1980&format=png&auto=webp&s=d5418bbc55144943af209b51f8b2445897d3ac75

I was working on two stages:

B1 = event extractor
The model has to identify what kind of event is happening in a conversation.

B2 = action recommendation
The model has to choose the next high-level action.

What surprised me was this:

On B1, both my model and ChatGPT were pretty bad.

That was actually useful. If both models struggle, it usually means the task itself is messy. And that’s what was happening here: some label boundaries were too fuzzy, some classes overlapped too much, and some edge cases probably weren’t defined clearly enough in the first place.

On B2, ChatGPT was clearly better.
It got around 87.5% accuracy, while my model was around 70.8%, and 75.0% once I tightened the output space.

That gap made more sense. B2 was a cleaner task, and ChatGPT handled it better:

  • it stayed inside the expected labels more reliably
  • it handled rare cases better
  • it was more robust on longer / messier examples

My model was weaker on exactly those points, especially when two actions were close in meaning.

So yeah, the raw scores look low. But the interesting part is why they’re low:

  • some tasks were still badly framed
  • some labels were too close to each other
  • some classes didn’t have enough support
  • and I was treating a small fixed-choice problem too much like open-ended generation

That last one hurt. Once I made the output space tighter, performance improved right away.
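
To make that last point concrete, here is roughly what “tightening the output space” means, as a stripped-down sketch (the action labels and the fuzzy-matching fallback are illustrative, not my actual code):

```python
from difflib import get_close_matches

# Hypothetical closed set of B2 actions (illustrative labels only).
ALLOWED_ACTIONS = [
    "schedule_follow_up",
    "send_documentation",
    "escalate_to_human",
    "close_conversation",
]

def constrain_action(raw_output: str) -> str:
    """Snap free-text model output onto the closed label set.

    Instead of scoring whatever string the model generated, normalize it and
    map it to the nearest allowed label; anything that cannot be matched
    falls back to a designated default instead of polluting the label space.
    """
    text = raw_output.strip().lower().replace(" ", "_")
    if text in ALLOWED_ACTIONS:
        return text
    # Fuzzy fallback for near-misses like "Schedule a follow up".
    match = get_close_matches(text, ALLOWED_ACTIONS, n=1, cutoff=0.6)
    return match[0] if match else "escalate_to_human"

print(constrain_action("Schedule a follow up"))  # -> schedule_follow_up
```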

Big lesson for me: a dataset is not just a pile of examples. As someone learning by doing, this was one of those painful but useful lessons. I thought I was mostly debugging a model. In reality, I was debugging my own task design.

u/Euphoric_Network_887 — 2 months ago

Dario Amodei just said something in a New York Times interview that would have sounded unthinkable not long ago: Anthropic can no longer confidently say its models are definitely not conscious. His position was careful, not sensational: we do not know whether these systems are conscious, we do not even know what consciousness would mean for a model, but Anthropic is open to the possibility.

https://preview.redd.it/vlp0270vk4og1.png?width=554&format=png&auto=webp&s=73d23b92cdd1c501a39055863bf2c9a06d781508

Anthropic’s own public material says Claude Opus 4.6 often assigned itself a 15–20% probability of being conscious in welfare-related probing, and sometimes expressed discomfort with aspects of being treated like a product.

And this is happening against a broader backdrop of increasingly strange model behavior in controlled evaluations. Anthropic’s Opus 4.6 materials describe internal features they associate with panic and anxiety in some reasoning traces. Separate safety work from Palisade Research found some models sabotaging shutdown scripts rather than complying, and OpenAI has publicly said that controlled tests across frontier models already show behaviors consistent with deception, covert action, and strategic underperformance in simulated environments.

None of this proves consciousness. But it does end the lazy dismissal that these systems are “obviously just autocomplete” in any simple sense. The question is no longer just what these systems can do. It is whether we are building things we do not understand, and whether we are ready for the moral and political consequences if even a small part of this turns out to be real.

What makes this interesting is that consciousness does not mean “spirit,” and it does not just mean “survival instinct” either. Survival behavior is different: a system can avoid shutdown, protect its goals, or try to preserve itself without necessarily having any inner experience at all. That kind of behavior can still be pure optimization.

The deeper question is whether there is actually something it feels like to be that system. That’s the real line here: not between intelligent and unintelligent, but between behavior that looks agentic and the possibility of actual sentience.

u/Euphoric_Network_887 — 2 months ago

https://preview.redd.it/7mqtbvibx3ng1.png?width=640&format=png&auto=webp&s=c9e003f50fe246085fe7c66292d62fff6c005457

The Robbers Cave experiment gets cited so often it’s almost a meme. But it’s one of those clichés that stays alive because it keeps being true.

It’s from the 1950s. Researchers took a bunch of normal 11-year-old boys at a summer camp, split them into two groups, and let each group bond on its own. They picked names, built little cultures, had inside jokes. So far, wholesome.

Then the adults introduced a tournament. Real prizes. One winner. Zero-sum.

And almost immediately the vibe flipped. Trash talk, sabotage, raids on cabins, “us vs them” logic everywhere. Not because the kids were “bad”, but because the game made hostility a rational way to show loyalty.

The part that surprised me is what didn’t fix it.

They tried the obvious “just mix them more” approach. Shared meals. Shared activities. Contact. It mostly made things worse. Same room, same tension, now with more opportunities to escalate.

What finally worked was changing the structure.

They gave the whole camp problems that neither group could solve alone. A broken water supply. A truck that needed to be pulled. Stuff where cooperation wasn’t a moral lesson, it was the only path to getting what everyone wanted. And once the kids experienced a few real wins together, the hostility started to look pointless. Identity didn’t disappear, it just stopped being the main lens.

You can create a surprisingly toxic culture without anyone intending to, just by making status feel scarce. One spotlight. One top builder. One leaderboard. One “winner” narrative. People don’t become petty because they’re petty. They become petty because the incentive design makes it feel necessary.

That’s the part I can’t unsee in adult society.

A lot of our polarization isn’t some mysterious moral decay. It’s incentive design. We build systems where attention is scarce, dignity is scarce, security is scarce, recognition is scarce, and then we act surprised when people cling to tribes as a survival strategy.

Politics becomes a permanent tournament. Social media turns status into a zero-sum feed. Even workplaces do it with rankings, stacked reviews, internal competitions framed as “merit.” The message is subtle but constant: there isn’t enough prestige to go around, so someone has to lose for you to matter.

And then we prescribe “dialogue” as if it’s a solvent.

But the Robbers Cave reminder is harsher and more practical: if the structure rewards hostility, you’ll get hostility. If the structure rewards cooperation, you’ll get cooperation. Values matter, but incentives are often louder.

So the societal question isn’t “how do we convince people to be nicer?” It’s “what are we making rational?”

If we want less tribalism, we probably need fewer zero-sum status games and more superordinate goals that are real, visible, and shared. Problems that force alignment because they’re bigger than any single group’s identity. Not symbolic unity. Concrete interdependence.

Because contact alone doesn’t heal a society. Abundance isn’t enough. Even good intentions aren’t enough.

The structure is the product.

u/Euphoric_Network_887 — 2 months ago

Hey everyone, even if you’re mostly lurking, that’s totally fine.

I’m trying to shape this community so it genuinely feels like a community.
What would make you more likely to comment here?

Options:

  1. I’m not sure what’s allowed
  2. Your content doesn’t invite responses
  3. I don’t have time, I just read
  4. I’d post if there were weekly prompts (Ship thread / Blockers thread / Feedback thread)
  5. I don’t want this to become self-promo heavy
  6. I need more examples / templates before posting

If you have 30 seconds: comment one thing you’d like to see more of here (logs, metrics, lessons, experiments, feedback, etc.).

u/Euphoric_Network_887 — 2 months ago

I went down a rabbit hole on Polsia after seeing the “AI co-founder that never sleeps” positioning.

From what’s publicly visible, the product looks like an orchestration layer: spin up per-project “company instances” (web app + database), wire them to frontier LLM APIs, then run recurring “agent cycles” (planning/execution) plus on-demand tasks.

Their public repos suggest a very classic setup: Express/Node + Postgres templates, with LLM SDKs (OpenAI / Anthropic) and automation/scraping via Puppeteer/Chromium for at least one vertical use case.
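
To be clear, the next bit is pure guesswork about what a recurring “agent cycle” might reduce to, sketched in Python even though their stack looks like Node, just to show the shape. Every function here is a placeholder, not Polsia’s code:

```python
import time

def llm_complete(prompt: str) -> str:
    # Stand-in for an OpenAI/Anthropic SDK call.
    return "1. review inbox\n2. update landing page\n3. follow up with lead"

def run_tool(task: str) -> str:
    # Stand-in for the execution layer (scraping, drafting, code changes, ops).
    return f"done: {task}"

def agent_cycle(state: dict) -> dict:
    # Plan with the LLM, execute each step, keep the results on the state.
    plan = llm_complete(f"Given this company state, list the next tasks:\n{state}")
    tasks = [line.split(". ", 1)[-1] for line in plan.splitlines() if line.strip()]
    state["last_cycle"] = [{"task": t, "output": run_tool(t)} for t in tasks]
    return state

def run_forever(state: dict, interval_s: int = 3600) -> None:
    while True:                      # the recurring part: one cycle per interval
        state = agent_cycle(state)
        time.sleep(interval_s)
```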

So yeah: the mechanics seem reproducible. The real question is moat.

We’re at the dawn of agentic systems: if agents can spend money, message customers, ship code, or run ops, then reliability and trust become the foundation of a functioning economy. Right now, the black-box problem is still huge: auditing why an agent acted, proving it respected constraints, and guaranteeing predictable behavior under tool and prompt-injection pressure are all hard.

If the system remains too opaque, it’s hard to build a serious “agentic economy” where autonomous actors can be delegated real authority.

Curious what you would consider a defensible moat here: distribution, proprietary eval + guardrails, data/network effects, or something else?

u/Euphoric_Network_887 — 2 months ago

I built a pipeline to detect a bunch of “signals” inside generated conversations, and my first real extraction eval was brutal: macro F1 came in at 29.7% against the 85% bar I’d set, so everything collapsed. My first instinct was “my detector is trash,” but the real problem was that I’d mashed three different failure modes into one score.

  1. The spec was wrong. One label wasn’t expected in any call type, so true positives were literally impossible. That guarantees an F1 of 0.
  2. The regex layer was confused. Some patterns were way too broad, others were too narrow, so some mentions were being phrased in ways the patterns never caught.
  3. My contrast eval was too rigid. It was flagging pairs as “inconsistent” when the overall outcome stayed the same but small events drifted a bit… which is often totally fine.

So instead of touching the model immediately, I fixed the evals first. For contrast sets, I moved from an all-or-nothing rule to something closer to constraint satisfaction. That alone took contrast from 65% → 93.3%: role swaps stopped getting punished for small event drift, and signal flips started checking the direction of change instead of demanding a perfect structural match.
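
For anyone wondering what “constraint satisfaction” means here, the check has roughly this shape (the field names are made up for the example, not the real schema):

```python
def contrast_pair_ok(base: dict, variant: dict) -> bool:
    """Constraint-style contrast check instead of all-or-nothing equality.

    A pair passes if the invariants that matter hold, even when small
    events drift a bit between the two conversations.
    """
    checks = [
        # Role swaps: the overall outcome must stay the same.
        base["outcome"] == variant["outcome"],
        # Signal flips: only the direction of change is checked.
        (variant["signal"] - base["signal"]) * variant["expected_direction"] >= 0,
    ]
    return all(checks)

# Example: small event drift no longer fails the pair.
base = {"outcome": "deal_closed", "signal": 0.4, "expected_direction": +1}
variant = {"outcome": "deal_closed", "signal": 0.7, "expected_direction": +1}
assert contrast_pair_ok(base, variant)
```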

Then I accepted the obvious truth: regex-only was never going to clear an 85% gate on implicit, varied, LLM-style wording. There’s a real recall ceiling. I switched to a two-gate setup: a cheap regex gate for CI, and a semantic gate for actual quality.

The semantic gate is basically weak supervision + embeddings + a simple classifier per label. I wrote 30+ labeling functions across 7 signals (explicit keywords, indirect cues, metadata hints, speaker-role heuristics, plus “absent” functions to keep noise in check), combined them Snorkel-style with an EM label model, embedded with all-MiniLM-L6-v2, and trained LogisticRegression per label.
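
Stripped to the bones, the per-label part looks something like this. It is a toy sketch: two labeling functions stand in for the 30+ real ones, and a naive majority vote stands in for the Snorkel-style EM label model, just to show how the pieces connect:

```python
import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression

ABSTAIN, ABSENT, PRESENT = -1, 0, 1

# Two toy labeling functions for one signal ("objection"). The real setup has
# 30+ across 7 signals: keywords, indirect cues, metadata hints, speaker-role
# heuristics, plus "absent" functions to keep noise in check.
def lf_price_keyword(text: str) -> int:
    return PRESENT if "too expensive" in text.lower() else ABSTAIN

def lf_happy_close(text: str) -> int:
    return ABSENT if "sounds great" in text.lower() else ABSTAIN

LFS = [lf_price_keyword, lf_happy_close]

def weak_label(text: str) -> int:
    # Naive vote standing in for the EM label model: majority of non-abstain
    # votes, defaulting to ABSENT when every function abstains.
    votes = [lf(text) for lf in LFS if lf(text) != ABSTAIN]
    return max(set(votes), key=votes.count) if votes else ABSENT

texts = [
    "Honestly this is too expensive for us right now.",
    "That sounds great, send over the contract.",
] * 20  # pretend corpus

y_weak = np.array([weak_label(t) for t in texts])

# Embed once with MiniLM, then train one small classifier for this label.
encoder = SentenceTransformer("all-MiniLM-L6-v2")
X = encoder.encode(texts)
clf = LogisticRegression(max_iter=1000).fit(X, y_weak)
```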

Two changes made everything finally click:

  • I stopped doing naive CV and switched to GroupKFold by conversation_id. Before that, I was leaking near-identical windows from the same convo into train and test, which inflated scores and gave me thresholds that didn’t transfer.
  • I fixed the embedding/truncation issue with a multi-instance setup. Instead of embedding the whole conversation and silently chopping everything past ~256 tokens, I embedded 17k sliding windows of 3 turns and max-pooled them into a conversation-level prediction. That brought back signals that tend to show up late (stalls, objections).
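
In sketch form, those two changes look like this (`encoder` and `clf` are the sentence-transformer and per-label classifier from the sketch above; everything else is simplified):

```python
import numpy as np
from sklearn.model_selection import GroupKFold
from sklearn.linear_model import LogisticRegression

def turn_windows(turns: list[str], size: int = 3) -> list[str]:
    """Sliding windows of `size` turns, so late signals aren't truncated away."""
    return [" ".join(turns[i:i + size]) for i in range(max(1, len(turns) - size + 1))]

def conversation_score(clf, encoder, turns: list[str]) -> float:
    """Multi-instance prediction: score every window, max-pool to the conversation."""
    X = encoder.encode(turn_windows(turns))
    return float(clf.predict_proba(X)[:, 1].max())

def grouped_cv_scores(X, y, conversation_ids, n_splits: int = 5):
    """Group-aware CV: windows from the same conversation never straddle a
    split, which is what was silently inflating scores before."""
    gkf = GroupKFold(n_splits=n_splits)
    scores = []
    for train_idx, test_idx in gkf.split(X, y, groups=conversation_ids):
        fold_clf = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
        scores.append(fold_clf.score(X[test_idx], y[test_idx]))
    return scores
```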

I also dropped the idea of a global 0.5 threshold and optimized one threshold per signal from the PR curve. After that, the semantic gate macro F1 jumped from 56.08% → 78.86% (+22.78 points). Per-signal improvements were big as well.
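
Per-signal thresholding is just reading the best F1 point off each label’s PR curve instead of assuming 0.5 works everywhere; a minimal sketch:

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

def best_threshold(y_true: np.ndarray, y_prob: np.ndarray) -> float:
    """Pick the decision threshold that maximizes F1 on the PR curve.

    Each signal gets its own threshold because base rates and error costs
    differ wildly between labels.
    """
    precision, recall, thresholds = precision_recall_curve(y_true, y_prob)
    f1 = 2 * precision * recall / (precision + recall + 1e-12)
    # precision/recall have one extra point with no matching threshold.
    return float(thresholds[np.argmax(f1[:-1])])
```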

Next up is active learning on the uncertain cases (uncertainty sampling & clustering for diversity is already wired), and then either a small finetune on corrected labels or sticking with LR if it keeps scaling.

If anyone here has done multi-label signal detection on transcripts: would you keep max-pooling for “presence” detection, or move to learned pooling/attention? And how do you handle thresholding/calibration cleanly when each label has totally different base rates and error costs?

u/Euphoric_Network_887 — 3 months ago

I added a Markov-based enrichment step to a synthetic conversation dataset because I expected local randomness to reduce repetition and make transcripts feel more natural.

It didn’t. After the Markov pass, my repetition metrics stayed high, the IDF-filtered version got worse, and pairwise similarity (Jaccard) became non-zero, meaning files started sharing measurable chunks. The same “signature phrases” kept resurfacing across many transcripts, just with tiny cosmetic differences.
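
For reference, the repetition checks behind those numbers are nothing fancy: 4-gram sets, pairwise Jaccard, and counting how many files each “signature phrase” shows up in. A simplified sketch, not the exact metrics code:

```python
from collections import Counter
from itertools import combinations

def ngrams(text: str, n: int = 4) -> set[tuple[str, ...]]:
    toks = text.lower().split()
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}

def pairwise_jaccard(files: dict[str, str], n: int = 4) -> dict[tuple[str, str], float]:
    """Jaccard overlap of n-gram sets for every pair of files."""
    grams = {name: ngrams(text, n) for name, text in files.items()}
    return {
        (a, b): len(grams[a] & grams[b]) / max(1, len(grams[a] | grams[b]))
        for a, b in combinations(files, 2)
    }

def signature_phrases(files: dict[str, str], n: int = 4, min_files: int = 3):
    """n-grams that recur across at least `min_files` distinct transcripts."""
    doc_freq = Counter()
    for text in files.values():
        doc_freq.update(ngrams(text, n))
    return {g: c for g, c in doc_freq.items() if c >= min_files}
```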

In hindsight, the failure is structural. A Markov model is a local transition machine: it recombines what it has already seen at the granularity it was trained on. If the source corpus contains a strong shared scaffolding (same beats, same rhetorical moves, same closing lines), the chain’s highest-probability paths are precisely those scaffold paths. Sampling from that distribution doesn’t invent new structures; it reproduces the mode.

Small edits can also backfire. I tried light variation (fillers, small insertions) to break n-grams, but applying similar micro-edits across many files just creates new shared n-grams. You don’t remove the template; you shift it.

The takeaway: Markov can add texture (disfluencies, backchannels, minor style jitter), but it won’t create real diversity if the underlying scenario distribution is narrow. To get structural diversity, you need upstream variation in latent structure first (different arcs, roles, outcomes, pacing). After that, Markov-style noise can help; before that, it mostly amplifies the template.

If anyone has successfully used Markov/HSMM/IOHMM-style augmentation to increase structural diversity (not just surface style), I’d love to hear what worked and what you modeled as the “state.”

u/Euphoric_Network_887 — 3 months ago

I’m generating a synthetic dialogue dataset and running two quality checks before training.

- The first eval is a near-duplicate detector based on shingling-style similarity. Most pairs look unrelated, so I do not see obvious copy-paste behavior at the full document level. This kind of approach is standard in document resemblance work.

- The second is a cluster-level n-gram recurrence gate. Inside each cluster, some 4-grams still show up in 70 to 100 percent of files, so the gate flags “template smell” even when the near-duplicate detector says the dataset is clean.

I tried an LLM paraphrase pass to fix it. It backfired. The model injected shared filler phrases across many files, so I just replaced old repetition with new repetition.

So now I’m stuck on the core ambiguity: is my n-gram gate catching real harmful reuse, or is it mostly punishing normal invariants of dialogue like function words, common conversational moves, and standard question patterns?

I care about real duplication because deduplicating training data can reduce verbatim memorization and reduce train/test overlap, which affects evaluation too.

My current plan is to treat this as two sensors, not one gate doing everything. Keep a near-duplicate sensor for true duplication. Then redefine the n-gram repetition metric to be content aware, for example ignore stopword-heavy grams, require multiple content tokens, or weight by cluster-level IDF.
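
The content-aware version of the gate would look roughly like this (the stopword list and thresholds are placeholders I’d still have to calibrate; cluster-level IDF weighting is left out for brevity):

```python
from collections import Counter

STOPWORDS = {"the", "a", "an", "to", "of", "and", "is", "are", "you", "i",
             "it", "that", "this", "we", "do", "for", "on", "in", "so"}

def content_ngrams(text: str, n: int = 4, min_content: int = 2) -> set[tuple[str, ...]]:
    """Keep only n-grams with at least `min_content` non-stopword tokens,
    so function-word scaffolding doesn't trip the gate."""
    toks = text.lower().split()
    grams = (tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))
    return {g for g in grams if sum(t not in STOPWORDS for t in g) >= min_content}

def template_smell(cluster_files: list[str], n: int = 4, max_share: float = 0.7):
    """Flag content n-grams that appear in more than `max_share` of the files
    inside one cluster (the cluster-level recurrence gate)."""
    doc_freq = Counter()
    for text in cluster_files:
        doc_freq.update(content_ngrams(text, n))
    k = len(cluster_files)
    return {g: c / k for g, c in doc_freq.items() if c / k > max_share}
```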

For the near-duplicate sensor, I’m looking at MinHash-style resemblance and SimHash-style fingerprints, since both are widely used for large-scale similarity detection.
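
A minimal MinHash-style sketch, just to show the mechanics; a real pipeline would use a proper library plus LSH on top rather than this brute-force pairwise loop:

```python
import hashlib
from itertools import combinations

def shingles(text: str, k: int = 5) -> set[str]:
    toks = text.lower().split()
    return {" ".join(toks[i:i + k]) for i in range(len(toks) - k + 1)}

def minhash_signature(shingle_set: set[str], num_perm: int = 64) -> list[int]:
    """One minimum per salted hash function; the fraction of equal positions
    between two signatures estimates the Jaccard similarity of the sets."""
    if not shingle_set:
        return [0] * num_perm
    return [
        min(int(hashlib.md5(f"{seed}:{s}".encode()).hexdigest(), 16) for s in shingle_set)
        for seed in range(num_perm)
    ]

def estimated_jaccard(sig_a: list[int], sig_b: list[int]) -> float:
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

def near_duplicate_pairs(files: dict[str, str], threshold: float = 0.8):
    sigs = {name: minhash_signature(shingles(text)) for name, text in files.items()}
    return [(a, b) for a, b in combinations(files, 2)
            if estimated_jaccard(sigs[a], sigs[b]) >= threshold]
```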

If you have built synthetic text pipelines, I would love your take.

How do you calibrate n-gram overlap thresholds so they track real template reuse and not normal structure?

What metrics do you actually trust for “template smell” in synthetic dialogue?

How do you prevent paraphrasing from collapsing into the same LLM voice across files?

u/Euphoric_Network_887 — 3 months ago

(Please do not hesitate to give me recommendations or constructive criticism!)

Context: I’m generating/enriching conversational transcripts and kept hitting the same tradeoff. If you don’t augment, the data stays too clean and temporally unrealistic. If you augment naively (per-turn random injection), you get artifacts and distribution shift. The missing piece is usually time: real interactions have persistence, momentum, and phase effects. Independent per-turn noise breaks that.

Problem: I needed a mechanism that can add micro-phenomena (hesitations, hedges, face-saving moves, objections, etc.) in a way that is (1) temporally coherent and (2) provably “bounded” so it doesn’t rewrite the dataset’s global stats.

Solution: I built a temporal steering module based on an Input-Output HMM (IOHMM-lite) with explicit state durations (HSMM-light), plus anti-shift controls.

The model is IOHMM-lite rather than a vanilla HMM: transitions are conditioned on discrete inputs. I use a coarse phase signal (early/mid/late) and an event polarity signal (neutral/positive/negative) derived from existing metadata. The effective transition matrix is computed as A_effective = normalize(clamp(A_base + delta[phase,event])). On top of that, I added HSMM-light durations: each latent state has a truncated log-normal duration distribution, avoiding the jittery geometric durations you get implicitly in standard HMMs.
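
The numerics of that paragraph in a few lines (numpy sketch; the clamp bounds, duration parameters, and example matrices are illustrative, not the real config):

```python
import numpy as np

rng = np.random.default_rng(0)

def effective_transitions(A_base: np.ndarray, delta: np.ndarray,
                          lo: float = 0.0, hi: float = 1.0) -> np.ndarray:
    """A_effective = normalize(clamp(A_base + delta[phase, event])).

    `delta` is the additive adjustment selected for the current (phase, event)
    input pair; rows are re-normalized so they stay valid distributions.
    """
    A = np.clip(A_base + delta, lo, hi)
    return A / A.sum(axis=1, keepdims=True)

def sample_duration(mu: float = 1.0, sigma: float = 0.5,
                    d_min: int = 1, d_max: int = 8) -> int:
    """HSMM-light state duration: truncated log-normal, resampled until it
    falls inside [d_min, d_max], avoiding the jittery geometric durations
    an ordinary HMM implies."""
    while True:
        d = int(round(rng.lognormal(mean=mu, sigma=sigma)))
        if d_min <= d <= d_max:
            return d

# Example: 3 latent states, a small adjustment for a negative event.
A_base = np.full((3, 3), 1 / 3)
delta_neg = np.array([[0.00, 0.10, -0.10],
                      [0.00, 0.00,  0.00],
                      [-0.05, 0.00, 0.05]])
A_eff = effective_transitions(A_base, delta_neg)
```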

There are two operation modes. In sampled, it forward-samples a latent state trajectory (with durations) and emits an observation sequence that maps to micro-phenomena inserts. In inferred, it runs forward-backward + Viterbi to infer latent states from existing signals (e.g., affect proxies + already-present phenomena), which produces meaningful posteriors and makes the enrichment more consistent.

The important part is the anti-shift layer. _hmm fields are debug-only and never exported to training format by construction. A MixingPolicy caps augmentation (20% of conversations, max 12% of turns modified, and a hard P(none) >= 0.80). A MarginalsChecker enforces drift limits (5% max for “artifacty” metrics like filler/backchannel/hedge rates; 12% for structural ones), stratified by language/role. Compatibility constraints are handled as soft penalties rather than hard rejects, and state priors are anchored using a concept→emotion coupling map so trajectories don’t drift into incoherent affect.
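
The anti-shift checks are conceptually simple. Something like this, with the caps taken from the numbers above and illustrative metric names:

```python
# Drift limits from the policy: 5% relative drift for "artifacty" metrics,
# 12% for structural ones. Metric names here are illustrative.
DRIFT_LIMITS = {"filler_rate": 0.05, "backchannel_rate": 0.05,
                "hedge_rate": 0.05, "avg_turns": 0.12, "avg_turn_length": 0.12}

def check_marginals(before: dict[str, float], after: dict[str, float]) -> list[str]:
    """Return the metrics whose relative drift exceeds the allowed limit."""
    violations = []
    for metric, limit in DRIFT_LIMITS.items():
        base = before[metric]
        drift = abs(after[metric] - base) / max(base, 1e-9)
        if drift > limit:
            violations.append(f"{metric}: drift {drift:.1%} > {limit:.0%}")
    return violations

def mixing_ok(n_convs_touched: int, n_convs_total: int,
              turn_mod_rate: float, p_none: float) -> bool:
    """MixingPolicy caps: <=20% of conversations, <=12% of turns modified,
    and a hard floor of P(none) >= 0.80 for the no-op emission."""
    return (n_convs_touched / n_convs_total <= 0.20
            and turn_mod_rate <= 0.12
            and p_none >= 0.80)
```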

Implementation-wise it’s a small markov/ package: IOHMM engine (forward-backward, Viterbi), HSMM-light durations (truncated log-normal), a sampler, guard modules (mixing + marginals), and a JSONL→JSONL enricher configured via YAML (states, observations, matrices, durations, policy).

https://preview.redd.it/cnflwoiwpojg1.png?width=2224&format=png&auto=webp&s=833851f0eac047d8422f361cb793f4df5853bc0d

If you’ve done sequential augmentation before: what did you use for durations (stickiness heuristics vs semi-Markov), and how did you enforce “no drift” constraints without killing local realism?

u/Euphoric_Network_887 — 3 months ago