u/Cryptogrowthbox

Most workforce datasets are built for analysts.

Ours is built for models.

We’ve spent years assembling a longitudinal company intelligence dataset:

•	4M+ companies across 100+ countries

•	48M+ company-year records spanning 1950–2020

•	Three intelligence layers joined into a single flat file

•	Signal flags renamed for neutral, AI-readable language

•	Pre-COVID window (2018–2020) is the densest and most immediately useful

We call it the AI Foundation Layer.

The insight that changed how we pitch it: we fed the data to a language model and asked it to answer questions about specific companies. Without the dataset, the model gave narrative guesses. With it, the model gave precise, structured, verifiable answers about headcount trajectories, revenue bands, geographic expansion, and sector pivots going back decades.

That’s the delta. The model doesn’t need to hallucinate history. It already has it.
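
For anyone wondering what "feeding the data to a model" can look like, here is a rough sketch and not our actual pipeline. It assumes each company-year record has been parsed into a small object, turns one company's history into plain text, and puts that text in front of the question. The field names (`company`, `year`, `headcount`, `revenue_band`, `country`), the helper names, and the sample rows are all hypothetical and chosen only for illustration; the real column names may differ.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class CompanyYear:
    # Hypothetical fields for illustration; not the dataset's real column names.
    company: str
    year: int
    headcount: int
    revenue_band: str
    country: str


def build_context(records: Iterable[CompanyYear], company: str) -> str:
    """Render one company's records as chronological, one-line-per-year text."""
    rows: List[CompanyYear] = sorted(
        (r for r in records if r.company == company),
        key=lambda r: r.year,
    )
    return "\n".join(
        f"{r.year}: headcount={r.headcount}, "
        f"revenue_band={r.revenue_band}, country={r.country}"
        for r in rows
    )


def build_prompt(records: Iterable[CompanyYear], company: str, question: str) -> str:
    """Prepend the company's history to the question so the model can cite it."""
    context = build_context(records, company)
    if not context:
        # No history to ground on, so say that explicitly in the prompt.
        return f"No records are available for {company}. Question: {question}"
    return (
        f"Answer using only the records below for {company}.\n"
        f"{context}\n"
        f"Question: {question}"
    )


# Made-up rows, only to show the shape of the prompt.
sample = [
    CompanyYear("Acme", 2019, 120, "1M-5M", "DE"),
    CompanyYear("Acme", 2018, 90, "1M-5M", "DE"),
    CompanyYear("Other", 2019, 10, "<1M", "US"),
]
print(build_prompt(sample, "Acme", "How did headcount change from 2018 to 2019?"))
```

The resulting string can be passed to whatever model client you use. The point is only that the history arrives as structured rows the model can quote back, instead of something it has to recall.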

A sample of the dataset is available on Hugging Face (search for Vivameda).

Would love feedback from builders here: what signals matter most to you when working with company-level longitudinal data?

We built a 70-year longitudinal dataset covering 4M+ companies and structured it specifically for AI ingestion.