Feeling stuck in Data Cleaning & Visualization despite knowing ML theory — any advice?
I’ve been learning Machine Learning for the past few months and I’m comfortable with the theory side of things now. I understand statistics, calculus, and how most ML algorithms work.
I’ve also learned libraries like Pandas, NumPy, Matplotlib, and Seaborn, but the problem is that I still can’t confidently use them on real-world datasets. Either I get confused about what to do next, or I feel like my knowledge isn’t enough for practical projects.
I recently realized that in real-world Machine Learning, a huge amount of the work (probably 60%+) is actually:
- data cleaning
- preprocessing
- EDA
- feature engineering
- visualization
And this is exactly where I’m struggling badly.
When I get a messy real-world dataset, I often feel completely stuck:
- I don’t know how to clean it properly
- I don’t know what visualizations to create
- I can’t remember the syntax of any function
- I freeze up just looking at the data
At this point I honestly feel helpless and stuck because I don’t know how to bridge the gap between “understanding ML theory” and actually working with messy datasets confidently.
Has anyone else been through this stage before?
What resources, projects, courses, or practice methods helped you improve in data cleaning, EDA, and visualization?
Even small suggestions or personal experiences would really help.