r/WGU_MSDA

Anyone working on this right now. My experience is in chemistry and math. I'm strong in statistics but all the SQL is new to me and I'm learning it. The course materials seem to be very fragmented for someone trying to learn the background knowledge necessary for this task. Any resources that people found were helpful? I've watch a lot of Datacamp and LinkedIn learning lessons and they've been very helpful as opposed to the course materials. I'm wondering if people have other resources they have liked. I'm specifically looking for help migrating data from the staging table to my normalized tables. Do I just use the Insert Distinct commands? I think some examples would help. Aside from feeling like I'm fumbling through this course I think Ive done an OK job up through task c, but the migrating data to the new tables is tripping me up and I can't seem to find a good resource for learning this. Thanks, and other tips are appreciated.

Received my graduation today. Thanks again to this community.

Experience: 20 years of IT experience. And 10 years as a Data Engineer, working in AWS and GCP-related platforms. (Tech stack SQL, Python, Airflow , GCP , AWS , Spark ETL and Big Data Engineer)

I accelerated and planned to take two courses a month, completing them in a term. I wouldn't say it was easy; it took a lot of time, and I occasionally burned out with my office work and WGU tasks. Weekly average around 16 hours.

Setup: Macbook, Jupyter, Intellij , Gitlab

(localhost path of Jupyter is the same as the home folder of Local IntelliJ Gitlab home. Much easier to demonstrate code for videos, and code check-in is also easier. I work in a Jupyter notebook, and it automatically reflects in IntelliJ, and I commit the code there. ipynb files can be submitted as code; they don't need to be exclusively .py.

Acceleration tips learned through the instructors:

1) Make it easy for the Evaluator. Label the papers using the rubrics and clearly highlight the key answers. Even if it is just one or two-line answers that meet the rubrics, evaluators will pass them. My first few papers were like essays, but later were very crisp and still passed.

2) Video preparation. After about 3 or 4 tasks, I learned that the videos can only be 7 or 8 minutes long (per the Instructor's suggestion). My first few videos were almost 30 minutes long, explaining everything in detail. Again, videos can be crisp, and explaining the key points per the rubrics should be sufficient. A line-by-line explanation of the Python code was not necessary. Localhost Jupyter Notebook executions by modules make it easier to make crisp videos where we can explain block by block.

3) Git History: Courses that ask for two versions of code, just showing the commit history of logical revisioning is sufficient from GitLab. Take a screenshot from GitLab history for the file.

For Acceleration, I kept submitting tasks as soon as they were completed, without fine-tuning. I just wanted initial feedback from the evaluators on whether the direction I was taking was correct. If the first few parts are not good, they will not evaluate the rest of the rubrics. To my surprise, some tasks were passed directly, and others returned with comments. An instructor suggested this strategy, so I followed through with the rest.

Brief key points. I have not repeated the points already discussed in this forum.

General Tasks: The first 3 Tasks should be doable for any new students, as they involve basic ideas/documentation, SQL, and basic Python analytics.

D596: Basic documentation. CliftonStrength is part of one lab exercise; do not skip it, as it is required for Task 2.

D597: DB-based. I installed both Postgres and MongoDB locally on my MacBook. Complete demonstrations from local. No specific incentive for doing so in the VDI Sandbox.

MongoDB: Should be scripts, not UI steps. The challenge was figuring out the performance difference before and after optimization. Write a bad query that scans everything first, and then write a good query. There was a 20% reduction, which was sufficient.

D598: The flowchart had a few revisions, and it was basic. Also, it is basic Python programming; even if you are new to data analytics, with some preparation, you should be able to get through it. Justifications for outliers would be fine.

Data Science Tasks: The next courses are Python-heavy, so there will be a lot of data Analytics.

D599: For a new student, this will likely take time, but if you are already familiar with Python and analytics, it should be fine.

Each task should have a different dataset. My only challenge was classifying variables, which got a couple of revisions. Even the course instructor agreed on my classification, but I updated my documents as the evaluators expected. No time to waste or argue. A large pool of evaluators is at work; each time it goes to a different evaluator. Added a comment in each submission on why the revision was made, so that even if it goes to a different evaluator, they can look back at the comments.

D600: Each task should have a different dataset. Continuation of D500, More analytics and visual graphs. Do not mind the accuracy of the final outcome, as long as it meets the rubrics. My final values were not ideal and way off, but there is nothing in the rubric to say it should be a specific value. If we provide justification and explanation of the outcomes and demonstrate the understanding, it should still be good to go.

D602: More into Data Science. This is where the fun begins. For a new student, this will likely be overwhelming. Maybe the course was likely designed that way. For me, it looked like how I work with my managers and product team. Requirements are unclear, and rubrics lack clear instructions you can connect back to. I attended the webinars, got my questions clarified, and then made corrections. Instructors and webinars are very helpful (They are your stakeholders, assume you got to collaborate and then get the work done). If you attend all these, you can learn the directions and make corrections as you go. Again, meeting the rubrics would be key. Understanding the flow is key. (main.py is just a workflow pipeline. B) import C) filter D) ml_experiment (Incorporate the fixed poly_regressor_Python_1.0.0). Everything should be runnable through main.py either individually or as a whole. So nothing major happens in main.py.

Data Visualization Task

D601: Tableau. Nothing challenging. I personally struggled because I was not familiar with the UI and Tableau. The idea and implementation were simple, but getting that visual done in Tableau was what I struggled with. I went through YouTube videos when I was stuck. But if you have used Tableau before, it should be easy enough.

DE specialization Tasks:

D607: I did not take this course at WGU. I completed the GCP PCDBE certification before enrolling and received 3 credits before I started the program. It was the only certification in this Program worth transferring credits in terms of time and effort, and it gives a head start before the program begins.

D608: Udacity Nanodegree. The Task itself is simple. Getting the AWS setup correct is important. If you are new to AWS and Airflow, it's likely to take some time. With a $25 AWS credit limit, I deployed only after my local Airflow testing was complete. I set up the local Docker environment as suggested by this community. Do not be discouraged by negative feedback for D608; if you prep and do not skip the steps (Order of step-by-step execution in Udacity is very important in this exercise, missing the order will mess up the AWS setup, and don't forget to turn off the Public IP (for cost saving) as mentioned in other posts, you should be able to complete it. Watch out for prices: the prices shown in the AWS console aren't live, so if you work overnight, it might show $5, and the next day it might show $15 because charges aren't updated in real time.

D609: Udacity Nanodegree. AWS, Glue. I followed each step (Do not miss the order). Glue Jobs were new to me, though I was familiar with Spark. During testing, my Glue jobs were failing. I kept resubmitting and quickly burned $15. Glue clusters are costly. Had to change strategy and test locally first. Used DuckDB SQL and tested all the steps in the localhost Jupyter notebook, which met the test requisites, and then converted to Spark SQL to test that in an AWS Glue Job. So, yes, a complete SQL-based solution is possible, and for local development, you can use this strategy without burning credits in AWS. Try to gain expertise in the technical stack for this course; that will help you pass through these two courses.

D610 Capstone. I was anxious about this. I got Dr. Sewell as an instructor based on all the feedback about this process in this community. But to my surprise, it was a breeze. He was very specific about what he needed. He gave a model template that has passed in the past, topics to be mentioned, and the sections he is looking for. If you follow the same template and submit your proposal, email him and schedule a call. Propose at least 2 or 3 ideas; he will likely suggest which one works best and what corrections are needed. The only issue was that this feedback call was very fast (he covered the feedback on the document and all the corrections needed in 5 minutes). Luckily, during this call, I was at my computer, quickly taking notes on all the comments and making corrections; he signed and approved the next day. It took a week to get his approval after I submitted the first proposal. Following the model template is a much easier path.

The next two tasks were completing the analysis and multimedia presentation. That was like any other course. And there was no need for any Data Engineering demonstration in this course. Complete Data Science analytics and presentation.

Thanks Again. If you are a new student, it only scratches the surface of DE; try to expand into other areas. (Certifications like Spark Databricks Developer, GCP Professional Data Engineer, AWS solutions architect will expand your knowledge. Topics covered in these certifications will cover the depth of DE specialization.)

All the best to future graduates.

D597

Finally got my confetti!

Done!!

Finally Completed. Thanks. My inputs!!!

D601 Task 2 - Don't forget to include your background!

Advice for D210 and D211