GraphRAG - Entity deduplication
Hi everyone,
I have a question related to GraphRAG. I have some experience applying it in the legal domain, and one recurring problem I face is entity duplication after the LLM extracts entities and relationships.
For example, the same person may appear in slightly different forms across documents, such as “jack,” “Dr. Jack,” “Jack Abbot,” or other variations. As a result, the graph ends up with multiple nodes that actually refer to the same real-world entity.
Have you encountered this issue before? If so, what approaches have worked best for resolving it?
I have tried several unification methods based on embedding similarity, but they have not fully solved the problem. I would be especially interested in practical strategies for entity canonicalization, entity resolution, or graph-level deduplication in a GraphRAG pipeline.
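To make the problem concrete, a minimal rule-based sketch of graph-level deduplication might look like the following. It is not the embedding-based approach mentioned above, just a baseline for discussion. It assumes entity names are plain strings, strips a small hand-picked list of honorifics, and merges names whose token sets are subsets of one another using union-find. The names `TITLES`, `normalize`, and `merge_aliases` are hypothetical and do not come from any GraphRAG library.

```python
import re
from collections import defaultdict

# Hypothetical honorific list; a real pipeline would need a domain-specific one.
TITLES = {"dr", "mr", "mrs", "ms", "prof"}

def normalize(name):
    """Lowercase, drop punctuation and honorifics, return a tuple of tokens."""
    tokens = re.findall(r"[a-z]+", name.lower())
    return tuple(t for t in tokens if t not in TITLES)

def merge_aliases(names):
    """Group names whose normalized token sets are subsets of one another."""
    parent = {n: n for n in names}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    keys = {n: set(normalize(n)) for n in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            ka, kb = keys[a], keys[b]
            # Skip empty keys (e.g. a name that was only an honorific).
            if ka and kb and (ka <= kb or kb <= ka):
                parent[find(a)] = find(b)

    groups = defaultdict(list)
    for n in names:
        groups[find(n)].append(n)
    return list(groups.values())

names = ["jack", "Dr. Jack", "Jack Abbot", "Jane Abbot"]
clusters = merge_aliases(names)
print(clusters)
```

With the sample names, the three "Jack" variants end up in one cluster and "Jane Abbot" stays separate. The weakness is easy to see: a bare "jack" would also link "Jack Abbot" to any other "Jack X" through transitive merging, which is exactly the kind of over-merging that makes this problem hard and why additional signals (entity type, shared relationships, document context) seem necessary.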