u/Critical-Elephant630

r/semanticweb

How to turn a messy SQL schema into a domain ontology — the 4-step process I use

Our schema had 47 tables. Our Confluence had 200 pages. Neither told us what the business actually did. A column named status appeared in 11 different tables. In 3 of them it meant completely different things. Nobody caught it for 4 years because the documentation was written by whoever built the table, never reconciled, and last updated in 2021.

We fixed it by building a domain ontology directly from the schema. Not a data dictionary. Not an ER diagram. An actual ontology: every concept has a formal definition, every relationship has a direction, and every uncertainty is explicitly labeled instead of silently papered over. Here's the process, because I've never seen it written down clearly.

Step 1: Classify what your tables actually are

Before you touch any columns, you need to decide what role each table plays. Four categories cover almost everything:

  • Entity table → a thing that persists (Customer, Order, Product)
  • Event/audit table → something that happened (OrderStatusChange, LoginAttempt)
  • Junction/bridge table → a many-to-many relationship between entities
  • Lookup/code table → a controlled vocabulary (StatusCodes, CountryCodes)

Most schemas are a mix, and the confusion comes from tables that look like entities but are actually event logs — or vice versa. In our case, three tables we'd been treating as entities were actually event logs with no primary entity attached. That was hiding half our business process from our data model.
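To make Step 1 concrete, here's a rough heuristic sketch. It assumes you've already dumped table metadata into plain column-name lists from information_schema; all the names below are illustrative, and every guess still needs a human pass, especially the entity-vs-event call.

```python
# Heuristic table classifier for Step 1. Inputs are plain column-name
# lists pulled from information_schema; all names here are examples.

def classify_table(columns, fk_columns, pk_columns):
    non_key = [c for c in columns if c not in fk_columns and c not in pk_columns]
    # Junction: little more than two or more FKs forming the identity.
    if len(fk_columns) >= 2 and len(non_key) <= 1:
        return "junction"
    # Lookup: tiny, no outgoing FKs, typically a code plus a label.
    if not fk_columns and len(columns) <= 3:
        return "lookup"
    # Event: has an occurrence timestamp and is never updated in place.
    # Weakest heuristic of the three -- confirm append-only in the code paths.
    if any(c.endswith("_at") for c in columns) and "updated_at" not in columns:
        return "event (verify append-only)"
    return "entity"

# The kind of table that fooled us: looks like an entity, is an event log.
print(classify_table(
    columns=["id", "order_id", "old_status", "new_status", "occurred_at"],
    fk_columns=["order_id"],
    pk_columns=["id"],
))  # -> event (verify append-only)
```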

Step 2: Classify your columns as properties or relations

Two types:

  • Data property → a value attached to the entity (name, amount, timestamp)
  • Object property → a link to another entity (foreign key)

The interesting column is status. If status is a FK into a lookup table, it's an object property — your entity has a relationship to a state. If it's a plain string like 'active'/'cancelled', you now need to decide: is that a value partition (enum) or are these actually instances of a State class with their own logic? That distinction changes your downstream queries, your event modeling, and whether your ML features are leaking state information they shouldn't have.
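Mechanically, the data/object split is a small lookup. A minimal sketch; fk_map here is a hypothetical dict you'd build from information_schema.key_column_usage or your ORM's reflection, not tied to any specific library:

```python
# Step 2 sketch: a column is an object property iff it's a foreign key.
# fk_map maps (table, column) -> referenced table (illustrative data below).

def classify_column(table, column, fk_map):
    target = fk_map.get((table, column))
    if target is not None:
        return f"object property -> {target}"
    # Plain values (including a bare varchar 'status') land here; the
    # enum-vs-State-class decision is a modeling call, not a schema fact.
    return "data property"

fk_map = {
    ("orders", "customer_id"): "customers",
    ("orders", "status"): "status_codes",  # FK into a lookup table
}
print(classify_column("orders", "customer_id", fk_map))  # object property -> customers
print(classify_column("orders", "status", fk_map))       # object property -> status_codes
print(classify_column("payments", "status", fk_map))     # data property (bare string)
```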

Step 3: Tag everything as Evidence, Hypothesis, or Gap

This is the step nobody does, and the reason data models drift.

  • Evidence: directly confirmed from the schema or from code (orders.customer_id is a FK → confirmed relation)
  • Hypothesis: inferred but not confirmed ("the cancelled_at timestamp implies a Cancellation event class")
  • Gap: explicitly missing ("no timestamp exists for the Approval transition; we cannot reconstruct approval history")

The Gaps are the most valuable output. They tell you exactly what your schema can't answer. Before we ran this process, we thought our schema had full order lifecycle coverage. After: we found 6 state transitions with no timestamp, meaning we had been silently reporting incorrect cycle times for 2 years.
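One way to make the tags first-class instead of prose comments: give every assertion in the ontology an explicit epistemic label. A minimal stdlib sketch; the field names are mine for illustration, not from any ontology standard.

```python
from dataclasses import dataclass
from typing import Literal

@dataclass
class Assertion:
    subject: str                                   # e.g. "orders.customer_id"
    claim: str                                     # what we believe about it
    tag: Literal["evidence", "hypothesis", "gap"]  # epistemic status
    source: str                                    # why we believe it (or can't)

ontology = [
    Assertion("orders.customer_id", "object property -> customers",
              "evidence", "FK constraint in DDL"),
    Assertion("orders.cancelled_at", "implies a Cancellation event class",
              "hypothesis", "column naming only; no confirming code path"),
    Assertion("orders approval transition", "no timestamp recorded",
              "gap", "approval history cannot be reconstructed"),
]

# The gaps double as a can't-answer report for the whole schema.
for a in ontology:
    if a.tag == "gap":
        print(f"GAP: {a.subject}: {a.source}")
```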

Step 4: Reconcile the inconsistencies explicitly

The status problem I mentioned? Once you've typed every table and classified every column, you run a simple check: any column that shares a name across tables but maps to a different type in your ontology is an inconsistency that needs a formal resolution. In our case:

  • orders.status → State (current condition of an entity)
  • payments.status → Event outcome (result of a completed process)
  • users.status → Role flag (operational classification, not a state machine)

Three different semantic meanings. Same column name. One fix: rename them and add the reconciliation note to the ontology as a documented decision, not a silent rename in a migration script.
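The check itself is a few lines once the types exist. A sketch; column_types stands in for the output of Steps 2 and 3, and the same grouping works on raw SQL types from information_schema.columns if you want a cruder first pass:

```python
from collections import defaultdict

# Output of Steps 2-3: (table, column) -> assigned ontology type.
column_types = {
    ("orders", "status"): "State",
    ("payments", "status"): "EventOutcome",
    ("users", "status"): "RoleFlag",
    ("orders", "amount"): "xsd:decimal",
}

usages = defaultdict(set)
for (table, column), typ in column_types.items():
    usages[column].add((table, typ))

for column, sites in usages.items():
    types = {t for _, t in sites}
    if len(types) > 1:
        # Same name, different meanings: needs a documented reconciliation
        # decision in the ontology, not a silent rename in a migration.
        print(f"INCONSISTENT '{column}': {sorted(sites)}")
```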

What changed after doing this

Our data contracts got sharper because the ontology is the schema documentation, not a separate artifact that drifts. New engineers onboard to the domain model, not 200 Confluence pages. And when we get a question like "how long does an order stay in approval?", we can immediately tell them whether our schema can answer it or not, rather than spending a week on a query that returns wrong data. The process takes longer upfront. It's worth it.

What's the worst case of documentation-reality drift you've hit in a schema you inherited?

u/Critical-Elephant630 — 2 days ago

TL;DR:

  • "Prompt engineering" is on track to become a joke label in the same way "growth hacking" did.
  • Inside serious orgs, the real work looks like eval suites, CI, regression testing, safety, and governance – not “10 insane ChatGPT prompts.”
  • Unless practitioners push for standards (metrics, versioning, regression tests, security hygiene), hiring will stay misaligned and the reputation of the field will keep eroding.

Why I wrote this

Over the last two years, I’ve noticed a huge gap between how “prompt engineering” is portrayed on social media and what it actually looks like in production teams. In LinkedIn carousels and TikTok posts, prompt engineering is basically framed as clever copywriting plus “act as” tricks and screenshots. Inside real products, it has quietly turned into something much closer to software engineering: designing evaluation suites, wiring prompts into CI pipelines, and keeping quality and safety stable as everything around the model changes.

At the same time, job titles and media coverage haven’t caught up. We still see “prompt engineer” roles advertised as quasi-copywriting jobs, while teams that actually ship LLM systems expect people who understand eval tooling, regression testing, and LLM security risks. That mismatch creates bad hires, failed projects, and growing skepticism about whether “prompt engineering” was just hype. This post is my attempt to articulate what I think the discipline should mean — and to ask this sub whether we should defend the label, redefine it, or let it die.


The hype vs the real job

Most of the public narrative around prompt engineering still treats it as a shallow skill: “the new programming is English,” “you just need to be good with words,” “here are 10 magic prompts that will change your life.” That framing attracts a lot of people who are great at aesthetics and storytelling, but who have never built or maintained a production LLM workflow.

In mature teams, the work looks very different. Prompt engineering is tightly coupled to evaluation and experimentation:

  • Designing test suites that cover real user journeys, edge cases, and failure modes.
  • Using tools like PromptFoo, LangSmith, Braintrust, OpenAI Evals, etc. to run controlled experiments across hundreds or thousands of examples, not just a couple of cherry‑picked prompts that look good in a screenshot.
  • Treating prompts as first‑class artifacts with versioning, baselines, and automated regression tests that flag when a new variant underperforms.
  • Integrating prompt changes into CI/CD so they go through gates, reviews, and rollbacks like any other code change.

In that world, “one weird trick” prompts that worked once in a playground are basically noise. The job is less about inventing cute phrasing and more about making model behavior predictable and robust under change.
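To ground the regression point: the machinery doesn't have to be fancy. A minimal framework-agnostic sketch, where run_model and score are hypothetical stand-ins for your model call and your grading logic (exact match, rubric, LLM-as-judge, whatever you use):

```python
import json

BASELINE_FILE = "baseline_scores.json"  # hypothetical; checked into the repo

def run_suite(prompt_template, cases, run_model, score):
    """Score every case: render the prompt, call the model, grade the output."""
    return {
        case["id"]: score(run_model(prompt_template.format(**case["vars"])),
                          case["expected"])
        for case in cases
    }

def check_regressions(new_scores, tolerance=0.02):
    """Return every case that dropped below its stored baseline score."""
    with open(BASELINE_FILE) as f:
        baseline = json.load(f)
    return {
        case_id: {"baseline": baseline[case_id], "new": new_score}
        for case_id, new_score in new_scores.items()
        if case_id in baseline and new_score < baseline[case_id] - tolerance
    }
```

Wired into CI, a non-empty result from check_regressions blocks the merge, the same way a failing unit test would.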


Safety and the security blast radius

The safety dimension makes the gap even sharper. OWASP now ranks prompt injection as the #1 LLM security risk (LLM01:2025), and a lot of security research frames prompts and system messages as part of the attack surface, not just UX sugar. When your model can call tools, write to databases, or trigger workflows, a sloppy prompt isn’t just “less accurate” — it’s a potential entry point for an attacker.

In that context, prompt engineering cannot be just about creativity or persuasion. It has to include basic threat modeling: how untrusted input can flow into prompts, how to enforce contextual guardrails, how to scope tools and outputs, and how to detect abuse. “TikTok-style” prompting doesn’t prepare anyone for that responsibility, but production systems have to deal with it every day.
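As one concrete example of what “scope tools and outputs” can mean in code: deny side-effecting tools whenever untrusted content is in the context window. Everything below (ToolCall, the source names) is invented for the sketch; it's a design pattern, not any library's API.

```python
from dataclasses import dataclass

# Context sources we treat as attacker-controlled.
UNTRUSTED_SOURCES = {"web_page", "email_body", "user_upload"}

# Tools allowed while untrusted text is in the context window.
READ_ONLY_TOOLS = {"search_docs", "summarize"}

@dataclass
class ToolCall:
    name: str
    args: dict

def authorize(call: ToolCall, context_sources: set) -> bool:
    """Prompt injection can't be reliably detected, so instead scope what
    an injected instruction could actually do."""
    if context_sources & UNTRUSTED_SOURCES:
        return call.name in READ_ONLY_TOOLS
    return True

# A model that just read an email may search, but not send or write:
assert authorize(ToolCall("search_docs", {}), {"email_body"})
assert not authorize(ToolCall("send_email", {"to": "..."}), {"email_body"})
```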


Hiring, titles, and the growth hacking analogy

We’ve seen this movie before with “growth hacking.” Originally, it described a serious, data‑driven discipline at the intersection of product, engineering, and marketing: funnels, experiments, SQL, referral loops, retention cohorts. Over time, the term got hijacked by listicles and courses that reduced it to “clever marketing tricks.” Eventually, serious teams rebranded around “product‑led growth,” and “growth hacker” became something you side‑eye on a résumé.

Prompt engineering feels like it’s on the same trajectory, just in fast‑forward. Right now we have:

  • Candidates who are excellent at prompt aesthetics but have never designed an eval suite or touched a CI pipeline.
  • Companies hiring “prompt engineers” as if they were copywriters, then pushing them into production‑adjacent work they’re not equipped for.
  • Projects that quietly fail or underperform, and people concluding that “prompt engineering was just hype” instead of admitting the hiring criteria were wrong.

If this continues, “prompt engineer” will lose informational value. It will become one of those titles that experienced hiring managers treat as a red flag, precisely because it has been diluted by low‑bar content and misaligned expectations.


What I think “prompt engineering” should mean

If we want “prompt engineering” to remain a credible discipline (or a credible skill inside broader roles), I think we need at least a shared baseline. Something like:

  • Familiarity with eval tooling: has actually used at least one evaluation framework or platform to compare prompt variants on real datasets.
  • Ability to design and maintain test suites: can turn product requirements into representative examples, edge cases, and regression tests, not just ad‑hoc test prompts.
  • Regression mindset: understands that prompt changes can silently break behavior and knows how to guard against that with baselines and automated checks.
  • Basic LLM security literacy: knows what prompt injection and data exfiltration look like in practice, and how to reduce risk with context design, tool scoping, and input/output controls.
  • Governance and versioning: treats prompts and system messages as reviewable artifacts with owners, history, and approval workflows — not just private notes in someone’s playground.
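For that last bullet, the lightest-weight version I can sketch: keep each prompt as a versioned record in the repo, with an owner and the eval suite that gates it. Field names are illustrative; in practice this is often a YAML file next to the tests rather than code.

```python
from dataclasses import dataclass, field

@dataclass
class PromptArtifact:
    name: str
    version: str          # bumped on every change, like a package version
    owner: str            # who reviews and signs off
    system_message: str
    eval_suite: str       # path to the regression suite that gates changes
    changelog: list = field(default_factory=list)

support_triage = PromptArtifact(
    name="support-triage",
    version="2.3.0",
    owner="platform-team",
    system_message="You are a support triage assistant. ...",
    eval_suite="evals/support_triage.yaml",
    changelog=["2.3.0: tightened refund phrasing; passed baseline suite"],
)
```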

If someone is making slick carousels with “10 insane ChatGPT prompts,” that’s content creation. If someone has shipped LLM systems with eval suites, telemetry, safety reviews, and prompt governance, that’s closer to what I’d call prompt engineering — or AI programming, if you prefer. The label matters because it’s how outsiders decide whether this field is serious.


Question for this sub

This is where I’d love to hear from people actually building and shipping things in 2025/2026:

  • If you’re hiring for “prompt engineer” (or something adjacent), what does that title mean in your org today?
  • What minimum bar would you expect before you trust someone with production‑critical prompts?
  • Do you think we should defend the term “prompt engineer,” let it die and fold into “AI engineer / AI programmer,” or something else entirely?

Curious to see how people here are thinking about standards, titles, and where this discipline is heading. Thank you for reading :)

u/Critical-Elephant630 — 12 days ago