Real estate scraping project
Hello everyone! I am heading to college this fall and I want to start building projects that are actually valuable.
I am planning to build a couple of pipelines that load raw listing data and then clean it through a medallion architecture (bronze, silver, gold).
My scripts would be orchestrated, containerized, and run through CI/CD (I think CI/CD is just version control, please correct me if I am wrong),
and then I would load the gold layer into a visualization tool. I would also like to apply some methodologies I have learned, such as Kimball's bottom-up four-step approach, and to use slowly changing dimensions (SCD) with start and end dates to track how long each listing stays on the market.
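To make the SCD part concrete, here is a rough sketch of what I have in mind, written in plain Python just to show the logic. The names (`ListingVersion`, `apply_snapshot`, `days_on_market`), the fields, and the status values are placeholders I made up; in the real project this would more likely live in a dbt snapshot or a SQL model.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class ListingVersion:
    """One SCD Type 2 row: a listing's attributes over a validity window."""
    listing_id: str
    price: int
    status: str                      # hypothetical values: "active", "sold"
    valid_from: date
    valid_to: Optional[date] = None  # None marks the current (open) row


def apply_snapshot(history: List[ListingVersion], listing_id: str,
                   price: int, status: str, seen_on: date) -> None:
    """End-date the open row and add a new one, but only if something changed."""
    current = next(
        (r for r in history
         if r.listing_id == listing_id and r.valid_to is None),
        None,
    )
    if current is not None:
        if (current.price, current.status) == (price, status):
            return  # nothing changed, keep the open row as-is
        current.valid_to = seen_on  # close the old version
    history.append(ListingVersion(listing_id, price, status, seen_on))


def days_on_market(history: List[ListingVersion], listing_id: str,
                   as_of: date) -> int:
    """Sum the days covered by 'active' rows; open rows count up to as_of."""
    total = 0
    for r in history:
        if r.listing_id == listing_id and r.status == "active":
            end = r.valid_to if r.valid_to is not None else as_of
            total += (end - r.valid_from).days
    return total


# Made-up scrape results for one listing, purely for illustration.
history: List[ListingVersion] = []
apply_snapshot(history, "A1", 500_000, "active", date(2024, 1, 1))
apply_snapshot(history, "A1", 500_000, "active", date(2024, 1, 8))   # no change
apply_snapshot(history, "A1", 480_000, "active", date(2024, 1, 15))  # price cut
apply_snapshot(history, "A1", 480_000, "sold", date(2024, 2, 1))

print(len(history))                                     # 3 versions
print(days_on_market(history, "A1", date(2024, 3, 1)))  # 31 days active
```

The idea is that each scrape only adds a row when the price or status changes, so time on the market falls out of the start and end dates instead of needing a separate counter.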
Overall, here would be my stack:
Python, PostgreSQL, dbt, Airflow, Power BI, Docker, and GitHub of course
My questions are: am I doing too much? Am I not doing enough? How can I improve this project so that I get the most out of it?
Thanks!