Maybe I am not cut out to be a DE
I've had roles that span Management Consultant, Data Analyst, Data Scientist, and, as of a couple of years ago, Data Engineer. Of the four titles I've held, DE is both the easiest and the hardest of them all. Let me explain.
Consulting is mainly interacting with clients. It can be hard at times, but it's a relationship thing. The hard part is trying to find meaning in the work, and instead finding yourself staring into the abyss.
DA work was intense: fielding questions left and right, running statistical analyses, hoping you're sampling right and using the right kinds of models and tests. It's fun; you form a deep understanding of the datasets you work with. I also liked explaining things in plain English for the stakeholders.
DS work was a bit of... busy fancy work that was going nowhere. Leadership had just heard at some conference that data was the big thing and machine learning would solve all efficiency problems. It was fun in some sense, and it was the title I wanted, but it was like consulting, just with a different set of jargon and playbooks.
Now the DE title... yeah, moving TBs of data and combining a bunch of different data using a bit of ML is all well and good, but I think with LLMs, the expectations are expanding.

One day I am load balancing OpenSearch clusters, the next I am tweaking parameters for an ML model because the data distribution is different now. Then a client is asking how to find shortest paths in a graph database, in between my trying to rejig the CI/CD pipeline and update the friggin' certificates and access tokens using Terraform or OpenTofu or whatever the eff is set up to create them. Another week, you're wrestling with Kubernetes and Datadog to check why the heck authentication to the app is failing.

Completely unrelated, Databricks decides to change default settings, and the data you thought was going to be retained for a few more weeks has been wiped. Oh no, why is that ML job failing? What? AWS ran out of compute in the region? Crap, the dev team changed the API; let me update my Python package, together with the unit tests. New data source, yay. What? An undocumented XML structure? I guess I'll just navigate that tree.

Stakeholders want the same data, but aggregated differently. Sure, that's easy enough. Just after I help out with this other business proposal. Oh no, somebody else is trying to push a different change to the same codebase as that PR I'm waiting to get reviewed. Rebase, resolve conflict. Argh, I missed the release window; let me cherry-pick. I always forget how to do that. Why are the dbt checks passing in staging but failing in prod with identical data? Hmm, I should really review the blocking strategy for this record-linkage algorithm. Wait, this dashboard looks different in Chrome compared to Edge. Lemme take a look at dev tools. Why ain't that JavaScript thing loading? Ah crap. Postgres migration failed... and I crashed GitHub.
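For what it's worth, that client question about shortest paths has a tidy answer, at least for unweighted graphs: a plain breadth-first search finds them by hop count, which is the same idea the shortest-path functions in graph query languages are built on. A minimal sketch in plain Python (the toy adjacency dict is made up, not anyone's real data):

```python
from collections import deque

def shortest_path(graph, start, goal):
    """BFS: shortest path by hop count in an unweighted directed graph."""
    if start == goal:
        return [start]
    queue = deque([start])
    parent = {start: None}  # also serves as the visited set
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in parent:
                parent[neighbour] = node
                if neighbour == goal:
                    # Walk the parent chain back to the start.
                    path = [goal]
                    while parent[path[-1]] is not None:
                        path.append(parent[path[-1]])
                    return path[::-1]
                queue.append(neighbour)
    return None  # no path exists

# Hypothetical toy graph.
g = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["e"]}
```

In a real graph database you'd let the engine do this server-side rather than pulling the graph out, but it's handy to remember the underlying algorithm is this small.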
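And the blocking strategy I keep meaning to review is, at its core, simple: instead of comparing every pair of records (quadratic in the dataset size), you only compare records that share a blocking key. A minimal sketch, with a made-up key of first initial plus postcode; the records here are hypothetical toy data:

```python
from collections import defaultdict
from itertools import combinations

def block_records(records, key_fn):
    """Group records by their blocking key."""
    blocks = defaultdict(list)
    for rec in records:
        blocks[key_fn(rec)].append(rec)
    return blocks

def candidate_pairs(records, key_fn):
    """Yield only the record pairs that share a blocking key."""
    for block in block_records(records, key_fn).values():
        yield from combinations(block, 2)

# Hypothetical toy data: (name, postcode).
people = [
    ("Alice Smith", "2000"),
    ("Alyce Smith", "2000"),
    ("Bob Jones", "3000"),
    ("Alice Smyth", "2000"),
]

# Blocking key: first initial + postcode. 4 records make 6 possible
# pairs; this key cuts the candidates down to 3.
key = lambda r: (r[0][0], r[1])
pairs = list(candidate_pairs(people, key))
```

The trade-off to review is exactly the one that bites in production: a key that's too strict drops true matches into different blocks ("Alyce" only survives here because the initial still matches), while one that's too loose gives the quadratic blow-up back.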
It's tool overload, and with the range of tasks... I don't know if I am a stats guy, a software guy, or a cloud guy. Didn't realise so much was expected.