
Hey everyone!
I’m a final-year Data Science/Software Engineering student. For my final-year project, I wanted to tackle learning Japanese pitch accent.
So I built a web app from scratch (UI and backend) powered by a custom deep learning model that grades your accent by comparing it to a native pronunciation.
Here is a breakdown of how it works and what I learned building the AI behind it.
Link: https://pitchaccentapp.web.app/
How to use it (The UI)
I designed the interface to feel like an Anki deck. As you can see in the screenshot, you get a standard flashcard layout with the word (like 有力 / yuuryoku), the meaning, and an example sentence.
- Listen: You click the audio button to hear the native pronunciation.
- Visualize: You can see the intended pitch accent mapped out in the colored text: black is low, red is high.
- Test Yourself: You tap the Mic button and say the word into your browser.
- Get Graded: My AI compares your audio to the native speaker's audio and gives you a similarity score to let you know if you nailed the pitch. The final score is a weighted combination: 40% from the AI model and 60% from a DTW (dynamic time warping) algorithm.
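To give a rough idea of how the combined score could be computed, here is a minimal Python sketch. It is not the app's actual code. The 40/60 weights come from the description above, but everything else is an assumption for illustration: that DTW runs on 1-D pitch contours, the absolute-difference local cost, and the hypothetical `dtw_similarity` mapping from a DTW distance to a 0..1 score.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping between two 1-D sequences.

    Local cost is the absolute difference (an assumption for this sketch).
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a advances alone
                                 cost[i][j - 1],      # b advances alone
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]


def dtw_similarity(a, b):
    """Hypothetical mapping from DTW distance to a score in (0, 1]."""
    normalized = dtw_distance(a, b) / (len(a) + len(b))
    return 1.0 / (1.0 + normalized)


def combined_score(ai_score, dtw_score, ai_weight=0.4, dtw_weight=0.6):
    """Weighted blend of the two scores, both assumed to lie in 0..1."""
    return ai_weight * ai_score + dtw_weight * dtw_score


# Made-up pitch contours: the learner holds the second mora a bit longer.
native = [1.0, 2.0, 3.0]
learner = [1.0, 2.0, 2.0, 3.0]
print(combined_score(ai_score=0.5, dtw_score=dtw_similarity(native, learner)))
```

Because DTW is allowed to stretch one sequence against the other, saying the word slower or faster than the native speaker does not by itself lower the DTW part of the score.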
How I built it
- Data Scraping: I couldn't find a clean dataset, so I wrote a custom scraper to pull thousands of native audio files directly from OJAD (Online Japanese Accent Dictionary). I then had to write scripts to clean, resample, and standardize the audio so the neural network could process it.
- The AI: I built a Siamese Neural Network trained with contrastive loss. Instead of categorizing words, it runs the native OJAD audio and your microphone input through twin networks and measures the distance between the two outputs.
- Odaka: I trained the model on 900 audio samples, 300 each of heiban, atamadaka, and nakadaka. Odaka sounds the same as heiban unless a particle follows the word, so it would have confused the model and I removed it. Since the deck consists of individual words, particles never come up.
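For readers who haven't met contrastive loss before, here is a minimal Python sketch of the standard formulation. It is not the app's training code. The margin value, the Euclidean distance, and the idea that a "similar" pair means two recordings with the same accent pattern are all assumptions for illustration, and the embeddings are made-up numbers standing in for the twin networks' outputs.

```python
import math


def euclidean_distance(u, v):
    """Distance between two embedding vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def contrastive_loss(emb_a, emb_b, same_class, margin=1.0):
    """Standard contrastive loss for one pair of embeddings.

    Similar pairs are penalized by their squared distance, pulling them
    together. Dissimilar pairs are penalized only while they sit closer
    than `margin`, pushing them apart until the margin is reached.
    (Some formulations also multiply both terms by 1/2.)
    """
    d = euclidean_distance(emb_a, emb_b)
    if same_class:
        return d ** 2
    return max(0.0, margin - d) ** 2


# Made-up 2-D embeddings standing in for the twin networks' outputs.
print(contrastive_loss([0.0, 0.0], [0.3, 0.4], same_class=True))   # small distance, small loss
print(contrastive_loss([0.0, 0.0], [0.3, 0.4], same_class=False))  # too close for a dissimilar pair
```

At inference time only the distance matters: a small distance between the native embedding and the learner's embedding suggests a matching pitch pattern.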
Mel spectrograms the model is trained on
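As a rough illustration of what turning audio into a mel spectrogram involves, here is a NumPy-only sketch. It is not the app's preprocessing pipeline. The 16 kHz sample rate, 512-sample FFT, 160-sample hop, 40 mel bands, and peak normalization are all assumed values picked for the example.

```python
import numpy as np


def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + np.asarray(hz, dtype=float) / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters spaced evenly on the mel scale from 0 Hz to Nyquist."""
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    bank = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank


def log_mel_spectrogram(signal, sr=16000, n_fft=512, hop=160, n_mels=40):
    """Return a log-mel spectrogram of shape (n_mels, n_frames)."""
    signal = np.asarray(signal, dtype=float)
    peak = np.max(np.abs(signal))
    if peak > 0:
        signal = signal / peak  # peak-normalize so loudness differences shrink
    if signal.size < n_fft:
        signal = np.pad(signal, (0, n_fft - signal.size))
    n_frames = 1 + (signal.size - n_fft) // hop
    window = np.hanning(n_fft)
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2   # (n_frames, n_fft // 2 + 1)
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T  # (n_frames, n_mels)
    return np.log(mel + 1e-10).T


# One second of a 220 Hz tone as a stand-in for a recorded word.
t = np.arange(16000) / 16000.0
spec = log_mel_spectrogram(np.sin(2 * np.pi * 220.0 * t))
print(spec.shape)
```

In practice a library such as librosa or torchaudio would usually handle this step. The sketch only shows the moving parts: framing, windowing, FFT, mel filtering, and log scaling.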
Disclaimer: The AI definitely isn't perfect yet; it's accurate 80% of the time. It's still a work in progress, so I'm really looking for your feedback on the UI, the grading accuracy, or any other suggestions.