u/Trick-Praline6688

Indian language speech datasets available (explicit consent from contributors)

Hi all,

I’m part of a team collecting speech datasets in several Indian languages. All recordings are collected directly from contributors who provide explicit consent for their audio to be used and licensed.

The datasets can be licensed with either exclusive or non-exclusive rights, depending on your requirements.

If you’re working on speech recognition, text-to-speech, voice AI, or other audio-related ML projects and are looking for Indian language data, feel free to get in touch. Happy to share more information about availability and languages covered.

— Divyam Bhatia
Founder, DataCatalyst

reddit.com
u/Trick-Praline6688 — 17 hours ago

[D] Offering licensed Indian language speech datasets (with explicit contributor consent)

Hi everyone,

I run a small data initiative where we collect speech datasets in multiple Indian languages directly from contributors who provide explicit consent for their recordings to be used and licensed.

We can provide datasets with either exclusive or non-exclusive rights depending on the use case. The goal is to make ethically sourced speech data available for teams working on ASR, TTS, voice AI, or related research.

If anyone here is working on speech models and might be looking for Indian language audio data, feel free to reach out. Happy to share more details about the datasets and collection process.

— Divyam
Founder, DataCatalyst
datacatalyst.in
