Open Source Excel Parser
I tested excel-parser today. It had much better recall than Docling, it preserves bounding boxes, and it reached 99.95% accuracy on Excel files.
https://github.com/knowledgestack/excel-parser
It's significantly faster than Docling, and it needs no VLLMs to chunk the file.
It's MIT-licensed, so anyone is free to use it.
I would appreciate two things from anyone who does:
Could you please open issues for any problems you see? I am working on making this the best Excel parser.
If you see accuracy improvements, I would love to hear about them. I am investing a lot of time and energy in this because I believe parsing large Excel files is a real problem, and feeding an entire workbook to an agent is not a good use of time or money.
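To make the chunking idea concrete, here is a minimal sketch in plain Python. It is not excel-parser's API or its actual algorithm; `column_letter`, `Chunk`, and `chunk_rows` are hypothetical names, and the sketch assumes the sheet is a single table that starts at A1 with its header in row 1. It shows how a sheet can be split into small row blocks that each carry an A1-style cell range as a bounding box, so an agent only needs the relevant block rather than the whole workbook.

```python
from dataclasses import dataclass
from typing import Any, List


def column_letter(index: int) -> str:
    """Convert a 1-based column index to Excel letters (1 -> A, 27 -> AA)."""
    letters = ""
    while index > 0:
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


@dataclass
class Chunk:
    cell_range: str        # A1-style bounding box of the data rows in this chunk
    header: List[Any]      # header row, repeated so each chunk is self-describing
    rows: List[List[Any]]  # the data rows covered by cell_range


def chunk_rows(grid: List[List[Any]], rows_per_chunk: int = 2) -> List[Chunk]:
    """Split a sheet (first row is the header) into fixed-size row blocks."""
    if not grid:
        return []
    header, body = grid[0], grid[1:]
    last_col = column_letter(len(header))
    chunks = []
    for start in range(0, len(body), rows_per_chunk):
        block = body[start:start + rows_per_chunk]
        first_row = start + 2  # sheet rows are 1-based and row 1 is the header
        last_row = first_row + len(block) - 1
        chunks.append(Chunk(f"A{first_row}:{last_col}{last_row}", header, block))
    return chunks


# Hypothetical sample data, only for illustration.
sheet = [
    ["region", "units", "revenue"],
    ["north", 10, 250.0],
    ["south", 7, 175.0],
    ["west", 12, 300.0],
]
chunks = chunk_rows(sheet, rows_per_chunk=2)
for chunk in chunks:
    print(chunk.cell_range, chunk.rows)
```

Fixed-size row blocks are the simplest possible choice here. A real parser would also have to deal with merged cells, multiple tables per sheet, and formulas, which this sketch ignores.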
Also, I think that if we can do this reasonably well, agents will be able to generate Excel files with formulas much better. I hope to add support for older Excel formats in the future, and to extend this from just a parser to an Excel generator as well.
If this is helpful and you think it would be useful, please star it as well. I would really appreciate it!