u/Only-Economist1887

How do I optimize SQL queries for large datasets as a beginner data analyst?

I'm a beginner data analyst (about 6 months into learning SQL) and I'm working with a retail sales dataset that has around 500,000 rows. My queries are running quite slowly and I'm not sure where to start with optimization.

Here's a typical query I'm running:

SELECT product_category, SUM(sales_amount) AS total_sales
FROM sales_data
WHERE sale_date BETWEEN '2024-01-01' AND '2024-12-31'
GROUP BY product_category
ORDER BY total_sales DESC;

This takes about 8-10 seconds to run. I've heard about indexing but I'm not sure how and where to apply it. Any tips or resources would be really helpful!
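
For context, here is a minimal sketch of what "applying an index" to the query above might look like. It rests on several assumptions that are not stated in the post: SQLite (through Python's built-in sqlite3 module) stands in for whatever database is actually in use, sale_date is stored as ISO-8601 text, the five sample rows are made up, and the index name idx_sales_date_cat is hypothetical. The idea is to index the column used in the WHERE filter and to compare the query plan before and after, rather than to promise any particular speedup.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the real one.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales_data ("
    "sale_date TEXT, product_category TEXT, sales_amount REAL)"
)

# Made-up sample rows, only there so the query has something to aggregate.
rows = [
    ("2023-12-31", "Toys", 10.0),
    ("2024-01-15", "Toys", 20.0),
    ("2024-06-01", "Books", 50.0),
    ("2024-12-31", "Books", 5.0),
    ("2025-01-01", "Toys", 99.0),
]
conn.executemany("INSERT INTO sales_data VALUES (?, ?, ?)", rows)

# The query from the post, unchanged apart from the trailing semicolon.
QUERY = """
SELECT product_category, SUM(sales_amount) AS total_sales
FROM sales_data
WHERE sale_date BETWEEN '2024-01-01' AND '2024-12-31'
GROUP BY product_category
ORDER BY total_sales DESC
"""


def query_plan(connection):
    """Return SQLite's plan for QUERY as one string (4th column is the detail)."""
    plan_rows = connection.execute("EXPLAIN QUERY PLAN " + QUERY)
    return " | ".join(row[3] for row in plan_rows)


plan_before = query_plan(conn)
result_before = conn.execute(QUERY).fetchall()

# Hypothetical index: the filtered column comes first, followed by the other
# two columns the query reads, so the index alone can answer the query.
conn.execute(
    "CREATE INDEX idx_sales_date_cat "
    "ON sales_data (sale_date, product_category, sales_amount)"
)

plan_after = query_plan(conn)
result_after = conn.execute(QUERY).fetchall()

print("before:", plan_before)
print("after: ", plan_after)
```

The index should change only how the rows are found, not which rows come back, so result_before and result_after are expected to match. Other databases have their own plan-inspection command (for example EXPLAIN in PostgreSQL and MySQL), and the exact plan wording differs between them.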

u/Only-Economist1887 — 14 hours ago