u/Comprehensive-Tie992

Hey everyone, I'm participating in a competition where the goal is to predict PM2.5 concentration using Sentinel-5P satellite data (things like NO2, CO, and ozone levels) and weather data across hundreds of cities. The competition starts in 4 days, so I'm preparing ahead of time.

I want to make sure I'm thinking about the problem the right way before the data drops. Here's what I'd love input on:

When you look at a brand new dataset for the first time, what are you actually looking for? What's your thought process before writing any code?
How do you decide which features are worth building vs which ones are a waste of time?
For tabular data with both location and time dimensions (multiple cities, daily readings), what validation strategy keeps local scores trustworthy?
What's the most common mistake in competitions like this that silently kills your score without you realising?
What would you prioritise in the first 48 hours after the data drops?

Any advice appreciated, even on just one question. Thanks.

Title: First ML competition — predicting air quality from satellite data, looking for advice from people who've done this before