r/semanticweb

▲ 9 r/semanticweb+3 crossposts

Using knowledge graphs to tackle the problem of searching code and documentation again and again, with the help of Mnemo

This is what your codebase actually looks like.

2032 nodes. 2878 edges. 7 relationship types.

Every service. Every dependency. Every API. Every owner. Every connection your team built over years — visualised in one graph.

Most AI coding assistants see none of this.

They see the file you have open.
Maybe the files you paste in.
Nothing else.

So when they generate code, they generate it blind.
No knowledge of what depends on what.
No knowledge of what breaks if you change something.
No knowledge of the relationships your team spent years building.

This is the real problem with AI in enterprise development.
It's not capability. The models are powerful.

It's context. AI operates on a fraction of the knowledge your senior engineers carry in their heads.

Mnemo builds this knowledge graph automatically from your codebase.

Services and their boundaries.
APIs and their consumers.
Dependencies and their blast radius.
Files and their owners.
Decisions and their history.

And then makes all of it available to your AI assistant — automatically, on every session.
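The "blast radius" idea is easy to make concrete. Here is a minimal sketch in plain Python (the service names and dependency map are hypothetical, and this is not Mnemo's actual graph model): given a map of which service depends on which, the blast radius of a change is everything that transitively depends on the changed node.

```python
from collections import defaultdict

def blast_radius(deps, changed):
    """Return every service that transitively depends on `changed`.

    `deps` maps a service to the services it depends on, so an edge
    A -> B means "A breaks if B changes". We walk the reversed edges.
    """
    rev = defaultdict(set)
    for svc, targets in deps.items():
        for target in targets:
            rev[target].add(svc)

    affected, stack = set(), [changed]
    while stack:
        node = stack.pop()
        for dependant in rev[node]:
            if dependant not in affected:
                affected.add(dependant)
                stack.append(dependant)
    return affected

# Hypothetical dependency map: checkout calls payments-api, and so on.
deps = {
    "checkout": ["payments-api"],
    "payments-api": ["auth-service"],
    "reports": ["auth-service"],
}
print(blast_radius(deps, "auth-service"))  # everything downstream of auth
```

A real code knowledge graph has typed edges (owns, calls, publishes-to), but the query shape is the same: reverse traversal from the thing you are about to change.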

No more blind generation.
No more code that compiles but breaks something downstream.
No more AI that doesn't know why things are the way they are.

This is what AI-assisted development should actually look like.

🔗 github.com/Mnemo-mcp/Mnemo

Drop a comment if you've ever had AI break something it didn't know existed.

u/killerexelon — 1 day ago
▲ 3 r/semanticweb+1 crossposts

How to turn a messy SQL schema into a domain ontology — the 4-step process I use

Our schema had 47 tables. Our Confluence had 200 pages. Neither told us what the business actually did.

A column named status appeared in 11 different tables. In 3 of them it meant completely different things. Nobody caught it for 4 years because the documentation was written by whoever built the table, never reconciled, and last updated in 2021.

We fixed it by building a domain ontology directly from the schema. Not a data dictionary. Not an ER diagram. An actual ontology — where every concept has a formal definition, every relationship has a direction, and every uncertainty is explicitly labeled instead of silently papered over. Here's the process, because I've never seen it written down clearly.

Step 1: Classify what your tables actually are

Before you touch any columns, you need to decide what role each table plays. Four categories cover almost everything:

  • Entity table → a thing that persists (Customer, Order, Product)
  • Event/audit table → something that happened (OrderStatusChange, LoginAttempt)
  • Junction/bridge table → a many-to-many relationship between entities
  • Lookup/code table → a controlled vocabulary (StatusCodes, CountryCodes)

Most schemas are a mix, and the confusion comes from tables that look like entities but are actually event logs — or vice versa. In our case, three tables we'd been treating as entities were actually event logs with no primary entity attached. That was hiding half our business process from our data model.
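The first cut of this classification can be done mechanically. A heuristic sketch (the table names, naming conventions, and thresholds are my assumptions, not a general rule — a human pass is still needed, which is the point of the post):

```python
def classify_table(name, columns, pk, fks):
    """Rough-cut table classification.

    `columns` is the set of column names, `pk` the primary-key columns,
    `fks` the foreign-key columns.
    """
    # Junction: composite key made entirely of foreign keys
    if len(pk) >= 2 and pk <= fks:
        return "junction"
    # Lookup: a small code/label vocabulary
    if {"code", "label"} <= columns or name.endswith("_codes"):
        return "lookup"
    # Event: append-only naming plus an occurrence timestamp
    if name.endswith(("_log", "_history", "_events")) and any(
        c.endswith("_at") or c == "timestamp" for c in columns
    ):
        return "event"
    # Default: a persistent thing
    return "entity"

print(classify_table("order_items", {"order_id", "product_id", "qty"},
                     {"order_id", "product_id"}, {"order_id", "product_id"}))
print(classify_table("status_codes", {"code", "label"}, {"code"}, set()))
print(classify_table("login_history", {"user_id", "occurred_at"},
                     {"id"}, {"user_id"}))
```

The heuristics deliberately bias toward "entity" as the default, so the interesting output is whatever the event and junction rules catch that you had been treating as an entity.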

Step 2: Classify your columns as properties or relations

Two types:

  • Data property — a value attached to the entity (name, amount, timestamp)
  • Object property — a link to another entity (foreign key)

The interesting column is status. If status is a FK into a lookup table, it's an object property — your entity has a relationship to a state. If it's a plain string like 'active'/'cancelled', you now need to decide: is that a value partition (enum) or are these actually instances of a State class with their own logic? That distinction changes your downstream queries, your event modeling, and whether your ML features are leaking state information they shouldn't have.

Step 3: Tag everything as Evidence, Hypothesis, or Gap

This is the step nobody does and the reason data models drift.

  • Evidence: directly confirmed from the schema or from code (orders.customer_id is a FK → confirmed relation)
  • Hypothesis: inferred but not confirmed ("the cancelled_at timestamp implies a Cancellation event class")
  • Gap: explicitly missing ("no timestamp exists for the Approval transition — we cannot reconstruct approval history")

The Gaps are the most valuable output. They tell you exactly what your schema can't answer. Before we ran this process, we thought our schema had full order lifecycle coverage. After: we found 6 state transitions with no timestamp, meaning we had been silently reporting incorrect cycle times for 2 years.
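The tagging itself needs almost no machinery — what matters is that every assertion in the ontology carries its tag and its basis. A minimal sketch (the example assertions mirror the ones above; the structure is my assumption, not a standard):

```python
from dataclasses import dataclass

@dataclass
class Assertion:
    statement: str
    tag: str    # "evidence" | "hypothesis" | "gap"
    basis: str  # why we believe it, or why we cannot know it

model = [
    Assertion("orders.customer_id -> Customer", "evidence",
              "FK constraint in the schema"),
    Assertion("cancelled_at implies a Cancellation event class", "hypothesis",
              "inferred from the column name, not confirmed in code"),
    Assertion("no timestamp for the Approval transition", "gap",
              "approval history cannot be reconstructed"),
]

# The gaps are the report: exactly what the schema cannot answer.
gaps = [a.statement for a in model if a.tag == "gap"]
print(gaps)
```

Dumping the gap list first is what catches the "we thought we had full lifecycle coverage" class of surprise before it becomes two years of wrong cycle times.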

Step 4: Reconcile the inconsistencies explicitly

The status problem I mentioned? Once you've typed every table and classified every column, you run a simple check: any column with the same name that maps to a different primitive type across tables is an inconsistency that needs a formal resolution. In our case:

  • orders.status → State (current condition of an entity)
  • payments.status → Event outcome (result of a completed process)
  • users.status → Role flag (operational classification, not a state machine)

Three different semantic meanings. Same column name. One fix: rename them and add the reconciliation note to the ontology as a documented decision, not a silent rename in a migration script.
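The same-name/different-type check is a few lines once you have the typed schema. A sketch (the schema fragment is hypothetical):

```python
from collections import defaultdict

def name_type_inconsistencies(schema):
    """Flag column names that map to different primitive types
    in different tables. `schema` maps table -> {column: type}."""
    seen = defaultdict(dict)  # column name -> {table: type}
    for table, columns in schema.items():
        for column, primitive in columns.items():
            seen[column][table] = primitive
    return {
        column: tables
        for column, tables in seen.items()
        if len(set(tables.values())) > 1
    }

# Hypothetical fragment of a multi-table schema
schema = {
    "orders":   {"status": "fk:order_status_codes", "total": "numeric"},
    "payments": {"status": "varchar", "amount": "numeric"},
    "users":    {"status": "boolean", "email": "varchar"},
}
print(name_type_inconsistencies(schema))  # flags `status` across 3 tables
```

Note that this is only a tripwire: same-name columns with the same primitive type but different meanings won't trip it, which is why the resolution still ends in a documented decision rather than a script.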

What changed after doing this

Our data contracts got sharper because the ontology is the schema documentation — not a separate artifact that drifts. New engineers onboard to the domain model, not 200 Confluence pages. And when we get a question like "how long does an order stay in approval?" we can immediately tell them whether our schema can answer it or not, rather than spending a week on a query that returns wrong data. The process takes longer upfront. It's worth it.

What's the worst case of documentation-reality drift you've hit in a schema you inherited?

▲ 3 r/semanticweb+1 crossposts

The content of public domain works is widely findable and retrievable, but the metadata is surprisingly difficult to access and understand. In this post, I dive into:

  • The Current State of Open Data on Public Domain Works
  • Exploring Public Domain Works in Wikidata
  • Querying Wikidata for Public Domain Works data
  • Public Domain Works data as a Knowledge Commons
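For the querying step, a minimal sketch of building a query for the public Wikidata SPARQL endpoint (the property IDs are real: P50 = author, P570 = date of death; treating "author died before a cutoff year" as a public-domain proxy is my simplification, since PD status is jurisdiction-specific):

```python
def public_domain_query(cutoff_year, limit=10):
    """SPARQL for works whose author died before `cutoff_year`.
    P50 = author, P570 = date of death."""
    return f"""
    SELECT ?work ?workLabel ?authorLabel ?death WHERE {{
      ?work wdt:P50 ?author .
      ?author wdt:P570 ?death .
      FILTER(YEAR(?death) < {cutoff_year})
      SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
    }}
    LIMIT {limit}
    """

print(public_domain_query(1955))

# To actually run it (needs network), GET it from
# https://query.wikidata.org/sparql with parameters
# query=<the string above> and format=json.
```

The label service line is what turns opaque Q-numbers into readable names; without it you get entity URIs only.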

To understand the data in Wikidata better, I generated an interactive knowledge graph of public domain data using the Open Data Explorer.

These weekly Open Data explorations are a project of The Knowledge Commons: https://theknowledgecommons.org

#opendata #wikidata #publicdomain #semanticweb

u/shellybelle — 10 days ago

Background

UK and European heritage archives hold roughly 50 million aerial photographs: RAF wartime reconnaissance, post-war urban surveys, US-transferred imagery, satellite holdings. They're digitised (scanned, on the web, browsable as thumbnails). They're not computable: free-text dates in eight different formats, free-text rights statements, point coordinates instead of footprint geometries, ISAD(G) metadata that doesn't survive a SPARQL query.

I've been building a focused, vertical digitisation standard that closes that specific gap. Sharing it now because the design is stable enough that pushback is more useful than more polish.

What's in it

  • Ontology — 30 classes, 29 properties, reusing PROV-O / GeoSPARQL / SKOS / Dublin Core / FOAF / DCAT (synthesis, not invention)
  • SHACL shapes for three tiers (Baseline / Enhanced / Aspirational), incrementally adoptable
  • End-to-end CSV → Turtle ingest pipeline (~200 LOC, runs)
  • IIIF Presentation 3.0 bridge so any IIIF viewer can consume it
  • Footprint derivation from flight metadata (altitude + focal length → vertical FOV polygon)
  • Stereo pair detection from overlap geometry
  • Sub-profiles for reconnaissance, satellite, UAV, photogrammetric, and aerial archaeology imagery
  • Governance proposal, partner clinic playbook, 9 ADRs, 40+ SPARQL queries, investment case
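The footprint-derivation item above is standard vertical-photo geometry, so it can be sketched independently of the ontology (the camera numbers and the flat-terrain, nadir-pointing assumptions are mine, not the standard's): ground coverage of a frame is frame_size × altitude / focal_length, and the footprint polygon is that square centred on the photo's point coordinate.

```python
import math

def ground_coverage_m(altitude_m, focal_length_m, frame_side_m):
    """Side length of the ground footprint for a vertical (nadir)
    photo over flat terrain: image scale is focal_length / altitude."""
    return frame_side_m * altitude_m / focal_length_m

def footprint_wgs84(lat, lon, altitude_m, focal_length_m, frame_side_m):
    """Approximate square footprint as WGS84 corner coordinates,
    using the small-area metres-per-degree approximation."""
    half = ground_coverage_m(altitude_m, focal_length_m, frame_side_m) / 2
    dlat = half / 111_320                       # metres per degree latitude
    dlon = half / (111_320 * math.cos(math.radians(lat)))
    return [(lat - dlat, lon - dlon), (lat - dlat, lon + dlon),
            (lat + dlat, lon + dlon), (lat + dlat, lon - dlon)]

# Typical survey-camera numbers: 152 mm lens, 230 mm frame, 3000 m altitude
side = ground_coverage_m(3000, 0.152, 0.230)
print(round(side))  # roughly 4.5 km on a side
print(footprint_wgs84(51.5, -0.12, 3000, 0.152, 0.230))
```

Oblique reconnaissance frames and rough terrain break both assumptions, which is presumably where the sub-profiles and the stereo-overlap detection earn their keep.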

Aligned with Towards a National Collection (AHRC/UKRI) and the N-RICH Prototype. Licensed CC BY 4.0 / CC0 / MIT.

Where I'd appreciate feedback

  • Three tiers (Baseline/Enhanced/Aspirational) — right call, or would two tiers be cleaner?
  • I attach naph:capturedOn directly to the photograph rather than via a prov:Activity. Pragmatic shortcut or anti-pattern given that the rest of the model is PROV-aligned?
  • Footprint geometry in WGS84 only — should I model multi-CRS natively?
  • IIIF Presentation 3.0 mapping — anything important I'm missing?

https://github.com/fabio-rovai/open-ontologies/tree/main/case-studies/heritage-aerial

u/Successful-Farm5339 — 9 days ago