r/datascience

Hello! I know there are people here with PhDs, working at FAANG, on top of the newest tech, who are absolutely brilliant Data Scientists.

I'm not one of them.

I've worked in medium to small companies with outdated technology, companies where I'm the only Analyst/Scientist, and places you've most likely never heard of. I don't do anything extraordinary, don't consider myself smart/brilliant, and I wouldn't pass a current day FAANG interview.

But I have still had an amazing experience being a Data Scientist, and I have made real impact at the companies I've worked in. I still interview at companies and have no issues getting job offers (although it's much more difficult right now). I've always had a hunger and drive to learn new things, and I've found that I have a knack for translating complicated information in a way anyone can understand.

I make sure I'm kind, compassionate, and show anyone that data can be interesting and fun. I don't live to make myself look smarter, especially at the expense of other people, so I love breaking down complicated concepts in a way anyone can understand!

I love showing insight from data and directions we can go. I enjoy building models - even if a lot of them go nowhere. Some of the biggest impacts and decisions companies have made have come from bar charts and basic KPIs.

And I plan to keep doing it. I'm so average, maybe even below average, but I love what I do and I lean into what I'm good with. I have seen such a drastic change in the field, especially with AI, and I'm currently adapting to those changes too.

Anyway, I just wanted to share my positive experience from someone who is painfully average lol!! I wanted to show people, especially new grads and/or people pivoting into the field, that you don't have to be the smartest person in the room to get hired. You need to drill into the solid foundations and have a drive to make change/bring value to a company.

reddit.com
u/tits_mcgee_92 — 11 days ago

So here's the story: another team in my company opened an associate-level DS role last week, we got 300+ applications, and somehow there were 30+ senior-level guys applying for it. Not fake senior either. Like actually senior, all with 10+ YoE. One of them even had a master's from Harvard.

I knew the market was bad, but seeing those kinds of applicants pile up for an associate-level role was still kind of unbelievable.

Feels like a lot of experienced people are applying down-level after being laid off now just to stay employed. Which is fair enough, but also DAMN.

Curious whether other people & teams are seeing the same thing, or is this just a weird sample on our side?

reddit.com
u/Alarming-Wish207 — 12 days ago

Interviewing with hedge funds has been the worst experience of my career

Over the last year, I interviewed with two well-known hedge funds and one investment firm, and the experiences were strangely similar.

The first hedge fund dragged the process out for months, hinted at an offer, never turned the verbal discussions into anything official, and then sent a generic rejection email. If I wrote out the full experience, people would probably think I made it up.

The second hedge fund had me do an LLM case study and an IQ test, then completely ghosted me.

The third company, an investment firm, put me through multiple rounds ranging from hand-solved probability questions to LLM case studies. I do not mind a tough onsite process, but what bothered me was the sheer breadth of the interviews and the fact that they eventually stopped responding to my follow-ups altogether.

It feels weird that I have had such similar experiences across companies in the same space. Does this say something about the industry, or am I doing something wrong?

Edit: Best part is that for 2 out of these 3, I never even applied. They reached out on LinkedIn.

reddit.com
u/Fig_Towel_379 — 6 days ago

Looking for advice: Online Master's in Applied Math for ML while working full-time

Hi everyone,

I'm looking for some honest input from people who've been down this road or know the landscape well.

My background:

  • B.Com in Finance & Accounting from Delhi University (2019)
  • During Covid I made my way into machine learning through self-study at home.
  • Currently a Senior ML Engineer at a large financial data/tech company in Bengaluru
  • Day-to-day work spans NLP/LLM systems, real-time ML pipelines, distributed data infra, and AWS.

What I'm trying to do: I want to seriously deepen my foundations in applied mathematics for ML — think probability, linear algebra, optimization, statistical learning theory, the actual mathematical machinery behind modern ML rather than just the engineering side. I've been doing ML professionally for a few years now and I keep hitting the ceiling where deeper math intuition would make me significantly better at my job (and at research-leaning problems).

My constraints:

  • Can't leave my job. I need a fully online / part-time / WILP-style program.
  • Based in India, so an Indian program is ideal (IISc, IIT online degrees, CMI, ISI, BITS, etc.). I know getting into a top-tier college is very, very hard for someone whose background isn't in engineering, but if there's any way they accept non-technical degree holders, I would like to know how one can enrol in such programmes.
  • Open to foreign universities too if the program is genuinely online and the time zones work out

What I'd love input on:

  1. Programs you'd actually recommend (and ones to avoid) for applied math / mathematical ML at the master's level, fully online
  2. If anyone has done IIT/IISc online degrees coming from non-technical background in math/stats/ML while working full-time, how was the experience and workload?

Not looking for career change advice; I'm happy in my role. Just trying to build deeper foundations the right way. Any pointers appreciated.

reddit.com
u/Lamba_ghoda — 18 hours ago

Preface: This is a burner account for ... reasons.

About Me: DS hiring manager for a F500 company. My company hires a combination of on site, hybrid and remote roles.

Overview: Over the past 1.5 years, hiring has become untenable due to lying, cheating, and now fake candidates. If you are unaware of what I mean by fake candidates, read this article. I'll briefly touch on the lying, then focus the rest on the cheating / fake candidates.

Lying: For roles where we cannot provide sponsorship, we have a survey during the application process that asks if you require sponsorship or will require sponsorship in the future. Those who hit "Yes" are immediately filtered out. The problem comes from those who are either lying or confused when they hit "No".

90% of the people who submit "No" while either lying or confused are on OPT visas. These are post-Master's visas that allow you to work for 12 months in your field, with an additional 24 months added if you are in a STEM field (so 3 years total). When assessing someone's profile for 30 seconds, it is immediately obvious:

  1. Last work experience outside the US

In these situations the candidates either are lying or don't quite understand that when we say "or will require sponsorship in the future", it applies to people who are only cleared to work for 3 years. While these candidates pretty much exclusively originate from one country, please do not disparage my post with racial insults. These are people who simply want to work a job, the same as you and I. It also does not make one more prone to lying. For every dishonest applicant we get, there are 2 others who apply honestly and are filtered out.

How does this impact you? Well, we are getting 1,000s of applicants for these jobs. Because I do not discriminate on candidate name before opening a profile/resume, I spend a lot of my time (30s to 1 min) on candidates who are ultimately ineligible. Because I do not have all day to do this, I do not look at every candidate profile. Due to that, there is a chance that I will never see the profile of an eligible, qualified candidate.

That is all I will say on this. Again, do not post racial insults in the comment section.

Fake Candidates: Okay, so let's now say I found a "candidate" who on paper appears eligible for our job. That is roughly 60% of the total applicants we get. Of that 60%, 90%+ are absolutely fake candidates/people.

Below is a list of the key things that identify fake candidates. (EDIT: One bullet does not mean fake, but the lion's share or all of them DEFINITELY DOES):

  • Resume is an LLM-generated recycling of our job description, with no details, just buzzwords and bold lettering
  • Phone area code also has no connection to education or work experience (appears a lot of bot farms are in Florida, Texas or Kansas)
  • They will say they work remote for companies that are notoriously in office or had a big RTO within the timeframe of their current work experience
  • Home addresses are non-residential or PO Boxes (someone applied with an address that I google street viewed was a highway overpass)

EDIT: Forgot email addresses like John.Doe.Dev@gmail

So if the resume isn't a dead giveaway, here are the next stages:

  • LinkedIn profile URL is legit, not a name plus an alphanumeric string, but there are slight discrepancies between resume and profile

Assuming I have not filtered you out from the above and the profile looks good, I will pass you to our recruiter to screen you. In these cases 50% of people I pass will still end up being fake! Our internal recruiter will catch things that are fishy, most often that it's clear the person talking is not the one we saw on LinkedIn. In these cases, the fake candidate is piggybacking off a real person's profile.

Cheating: Okay, so now you are a real person at least and you're interviewing with us. Well, unfortunately 50% of these candidates are using AI to cheat. We are very explicit at the start of an interview: we ask you not to use AI because we want to assess your education and experience. It's not that we don't use Windsurf or Codex ourselves, but I need to know you'll understand what the LLM spits out and you aren't just a vibe code hero.

About a year ago cheating was more straightforward. A candidate would screen share only a tab, not their whole window. They would have a second monitor and would type or copy some code into an LLM to generate a response.

Now the thing is voice-to-text or voice-to-voice technology. We will ask questions that are robust to copy-paste LLM cheating, but the candidate has an app on their phone in their lap which will capture our question, then show a response in text or send voice to their headphones. Dead giveaways here are long pauses between our question and their response, in a manner where it's clear they are not actually thinking, or looking down at their crotches a lot.

What can you do to stand out?

  • As much as I hate it, you need a LinkedIn, you need it to have pictures of you (do not use any AI program to touch them up), and you need to genuinely engage in your industry and with old or new coworkers. This is the easiest way to confirm you are real
  • Create a unique URL for your LinkedIn page. Do not keep it as the base name/alphanumeric string
  • Do not use generic resume formatting. Create something that looks professional and is nice but unique to you.
  • Do not use LLMs to clean up your resume. Focus details on very specific pieces of work you did that used a technology; don't just say you have CI/CD experience
  • If you fear discrimination based on your name, I would recommend putting that you are legally authorized to work in the US (though it sucks I have to say that)
  • Add something unique to your resume. If you wrote a Medium post while working at an old job, add it. Anything to stand out from fakes
  • Within the interview stage, always share your full screen and try not to wear headphones. That will help us not suspect you are cheating.

EDIT: A few folks seem angry about my opinion on LLM resume writing help. If it’s working for you, use it!

EDIT 2: Thanks for all the engagement! I'm going to take a break from responding. Just wanted to give one view into what's going on; hope it's been insightful!

To all those leaving frustrated comments, I’m sorry if this has been disappointing to you all. My hope was this post would show there are still actual humans taking time to review your applications and dealing with the headaches that a manual process is causing. Guess it didn’t come across that way.

u/OtterFox365 — 11 days ago

I've been looking for a new job lately (brutal market, btw), and a lot of the ML/AI engineering work now seems pretty LLM-dominated.

I still see a few jobs that seem to be doing more "classical", pre-ChatGPT-era work with PyTorch or TensorFlow, but it seems that a lot of the work now is with LLMs: doing RAG, prompt engineering, etc. with LangChain or what have you, and calling Anthropic or OpenAI model endpoints.

Is this an accurate take on the market? And if so, what happened to all the PyTorch/TensorFlow work? Why did it shift so heavily towards just using LLM providers via some package/endpoint?

reddit.com
u/Illustrious-Pound266 — 10 days ago

Healthcare (insurance, pop health, VBC) - actual AI use cases?

Pretty open ended here. I work in population health for a VBC organization. Goals are improving patient outcomes and reducing cost of care, particularly for Medicaid population.

Can anyone share actual AI use cases that are valuable? Outside of AI coding agents (huge value for some) nothing has really taken off.

Example: AI-generated patient summaries from medical claims and operational data. Super rich context about risk factors, gaps in care, recent conversations, etc. Providers loved the idea but zero adoption because they value autonomy and their judgement.

Example: Natural language chat interface to various operations and staff performance datasets. No uptake because nobody knew what to ask. Dashboards are just easier.

Example: Natural language interface to program outcomes via causal analytics. Literally ask about any market/program/subgroup and outcomes attributable to program. Zero adoption among executives because they either want 1) a quick verbal explanation or 2) a spreadsheet and slide deck.

reddit.com
u/dmorris87 — 1 day ago

Thoughts on DS I worked with inside vs outside FAANG

I get asked this question online and in person: what does it take to get into a good FAANG company?

I spent the last year working at Google as a DS, and spent the previous 3 working in random industries (pharma, supply chain, large buy-side banks, etc.).

I genuinely think that the DS I worked with in FAANG were higher caliber, for the following reasons:

All my teammates weren't necessarily experts at a lot of things, but they had a very good grasp of the fundamentals. If you take the DS skill tree divided into categories (ML/coding, communication, business/product sense, etc.), my teammates were at least a 7-8/10 on all of these, while being expert level at some of the things the team was responsible for.

While doing mock interviews, what stood out the most is how badly some people communicate. I understand that a lot of people working in STEM have English as a second language, but that's not taken into consideration when evaluating whether they want to work with you. I also worked with a lot of DS who scored very low on some aspect of what I would consider 'fundamentals'. Some knew how to code and develop, but never took a probability class. Others had a heavy math background and had no idea what to do outside a notebook. Others had good industry experience but weren't sure how to quantify their ideas and turn them into a stats problem.

At Google everyone could reliably do everything to an acceptable level, learn how to do it better if they needed to, and everyone had a good 'vibe' that made them fun to talk to and work with. Honestly, the best part of the job was the coworkers, while the work itself was pretty boring.

I think I was picked for the role since it was a communication-heavy role and I had a lot of experience coaching people and public speaking.

To land a job at these companies, I don't think you need to be an expert specialist for the large majority of positions. I think what you get evaluated on is: if a DS problem is thrown at you, or you are in a discussion about a problem, do you know what is being discussed, how the problem is generally solved, or what to look up to solve it? If you have extensive knowledge and experience plus the things listed above, you'll likely get promoted to Staff level pretty quickly, or hired there directly.

So, my final thought is: if you are studying for these positions, don't spend your time deep diving into niche topics or doing quant-style problems. Instead, have a very good baseline understanding of the fundamentals of what DS does, and be able to communicate well and demonstrate that you can contribute.

For companies that can be highly picky (FAANG, MBB, etc) you also need to pass the airport test: How would I feel if I was stuck at an airport with you waiting for my next flight?

reddit.com
u/LeaguePrototype — 6 days ago

I did a physical onsite recently where they asked me to travel to their office, about 1.5 hours each way. The interviewers were nice and the interviews went pretty well, so I was hoping to hear back from them. The opposite happened. It has been two weeks since the onsite and I have not heard anything.
The recruiter was very polite before the onsite, but after it they completely stopped responding.

I had to take a day off work and make arrangements in my personal life, and the company can't even be bothered to send a rejection email? I have never had a job search this difficult before.

reddit.com
u/Lamp_Shade_Head — 12 days ago

Job search was massively easier than just a year ago

ML Engineer in UK, senior level.

In 2024-25 I must have applied to 60 jobs over a 14-month period, and it was a shitty experience overall. This year it took one month and about 8 applications, from which I got 2 offers! So I am vibing.

Incidentally, since January I've been getting LinkedIn messages like it was 2021, so maybe (hopefully) things are looking up for this field; the last 4 years have been unnerving.

End of communiqué.

reddit.com
u/autisticmice — 6 days ago

I got an interview invitation for a Machine Learning Engineer role at a FAANG company. There are two issues. I am not an MLE, so preparing for it feels nearly impossible. Also, I have never even interviewed for an MLE role, let alone at FAANG.

I am currently a Data Scientist and have been interviewing, so I feel good about my preparation for DS roles. Can I tell the recruiter that I believe I am a better fit for a DS role than MLE? Do you have any other suggestions?

reddit.com
u/Lamp_Shade_Head — 8 days ago

Two rounds: 1. Statistical Knowledge 2. Data Analytics and Intuition

For statistical knowledge, it was a complex question that actually had a simple answer.

It required you to have thorough knowledge of distributions, expectations, and confidence intervals.

The key challenge was to identify the distribution of the data from a sample, generalize it to the population, and find the confidence interval.

Looking back, it was an easy question, but I definitely took wayyyy too much time to get to the answer. They for sure test for Googleyness. I would assume the interviewer had multiple questions in mind, but I never got to the next one. Soo, no hire.
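For anyone prepping for this kind of round, the workflow described above (inspect a sample, assume a distribution, then build a confidence interval for the population mean) can be sketched in a few lines. This is purely my own illustration with made-up numbers, since the actual question isn't shared:

```python
import math
import random

random.seed(42)

# Hypothetical sample, assumed to come from a roughly normal population
sample = [random.gauss(100, 15) for _ in range(50)]

n = len(sample)
mean = sum(sample) / n
# Sample variance with Bessel's correction (divide by n-1)
var = sum((x - mean) ** 2 for x in sample) / (n - 1)
se = math.sqrt(var / n)  # standard error of the mean

# 95% CI using the normal approximation (z = 1.96);
# with n=50 the t critical value (~2.01) is very close
lo, hi = mean - 1.96 * se, mean + 1.96 * se
print(f"mean={mean:.1f}, 95% CI=({lo:.1f}, {hi:.1f})")
```

The hard interview part is usually justifying the distributional assumption, not the arithmetic.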

For the data analysis and intuition round, I was expecting a case study on experimentation or ML. It was kind of a hybrid. It involved diagnosing a flawed model, how to improve it, and what other methods would work better. This part was fine, not too bad.

What caught me off guard was that they asked me to write the MLE equations for 2 models, one general and one niche. Honestly, I didn't know, lol.

Well, learnings? Practice your stats and ML like you are writing a school exam.

reddit.com
u/saagggssss — 12 days ago

What to take away from failed interviews when you don’t really know why you failed?

After every interview and hiring decision, I keep notes on what went wrong, what I could improve, and why I either moved forward or got rejected. I recently finished two onsite interviews where I walked away feeling genuinely good about my performance and how I handled the conversations. For one of them, I was honestly pretty confident I would get an offer.

Instead, both ended in rejection, or at least that is how I see it since one company completely ghosted me afterward.

What I am struggling with now is figuring out what I am supposed to learn from experiences like this. If I prepared well, communicated well, and left feeling positive, then what exactly caused the rejection? More importantly, how do you improve when you cannot even identify what went wrong?

reddit.com
u/quite--average — 5 days ago

Went down a rabbit hole on causal reasoning and came back up having learned about DAGs, mediators, and why predictive accuracy shouldn’t always be the target.

The past few months, I've been teaching myself Bayesian stats from the Statistical Rethinking textbook (highly recommend, btw) and went down a rabbit hole on causal reasoning, which I found really compelling! It's a completely different framework from the "maximize predictive accuracy, throw everything in" approach I learned in bootcamps; instead it calls for thinking deliberately about the causal mechanisms generating your data.

Anyways, I thought it might be useful to write up an article summarizing some key ideas of causal reasoning, like DAGs, mediators, and confounders, for those who haven't come across them yet. I also made a case for why adding more predictors may actually make your models worse if you don't think carefully about the relationships your predictors have with one another. And to make these concepts more practical, I applied them to a wildfire dataset to form a hypothesis about the data generating process behind total hectares burnt in a wildfire.
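To make the "more predictors can make your models worse" point concrete, here is a tiny toy simulation (my own, not from the article) of collider bias: conditioning on a common effect of two independent variables manufactures a correlation between them:

```python
import random

random.seed(0)

# x and y are independent causes; z is a collider (caused by both)
n = 20_000
x = [random.gauss(0, 1) for _ in range(n)]
y = [random.gauss(0, 1) for _ in range(n)]
z = [xi + yi + random.gauss(0, 1) for xi, yi in zip(x, y)]

def corr(a, b):
    # Pearson correlation, no dependencies needed
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / len(a)
    sa = (sum((ai - ma) ** 2 for ai in a) / len(a)) ** 0.5
    sb = (sum((bi - mb) ** 2 for bi in b) / len(b)) ** 0.5
    return cov / (sa * sb)

# Marginally, x and y are uncorrelated
print(round(corr(x, y), 2))  # near 0

# "Controlling for" z by stratifying on it induces a negative
# x-y correlation out of thin air -- collider bias
stratum = [(xi, yi) for xi, yi, zi in zip(x, y, z) if abs(zi) < 0.5]
xs, ys = zip(*stratum)
print(round(corr(xs, ys), 2))  # clearly negative
```

Throwing z into a regression of y on x does the same damage, which is exactly why "add every predictor" can backfire.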

This is Part 1 (theory + DAG construction) of a two-part series. Part 2 will test the causal model with regression.

If you find this stuff interesting, useful, or even just inaccurate, I’d love to hear your feedback! Has anyone else gone down the causal inference rabbit hole? It feels like a whole different lens on ML that doesn't get talked about much but definitely needs more attention.

https://medium.com/towards-artificial-intelligence/rethinking-predictors-why-causal-reasoning-matters-in-data-science-part-1-f1d4c1e08068


reddit.com
u/vanisle_kahuna — 5 days ago

My interview experience has been massively varied at this point, but what I've noticed is the massive difference between big companies like FAANG and smaller orgs, like DS in banking or random small companies.

At FAANG it's kind of like an IQ + knowledge test (what Google calls Role-Related Knowledge), while smaller companies do assessments for very specific types of modeling or use cases, like building a model that gets evaluated on a certain metric.

So at FAANG I was asked questions like "why is the formula for s.d. different for pop. vs sample" or "what happens to the bias/variance in x,y,z situation", meanwhile companies that are smaller and pay less sent me a random 30-60 minute assessment and asked me to directly clean data and code up a model with sklearn/pandas.
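For anyone wondering, the pop.-vs-sample s.d. question comes down to Bessel's correction: dividing by n-1 instead of n makes the sample variance an unbiased estimator of the population variance. A quick self-contained check (my own illustration, not the interview answer verbatim):

```python
import random

random.seed(1)

def var(xs, ddof=0):
    # Population variance (ddof=0) vs sample variance (ddof=1)
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - ddof)

# Draw many small samples from a population with true variance 1.0
# and average each estimator
true_var, n, trials = 1.0, 5, 50_000
pop_est = sample_est = 0.0
for _ in range(trials):
    xs = [random.gauss(0, 1) for _ in range(n)]
    pop_est += var(xs) / trials             # divides by n   -> biased low
    sample_est += var(xs, ddof=1) / trials  # divides by n-1 -> unbiased

print(round(pop_est, 2), round(sample_est, 2))  # ~0.80 vs ~1.00
```

The n-divisor underestimates by a factor of (n-1)/n because the sample mean is fit to the same data, which is the intuition interviewers are fishing for.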

Is this what everyone else has experienced? It does seem like smaller or traditional companies test whether you will be a good code monkey, while others look for actual understanding.

reddit.com
u/LeaguePrototype — 9 days ago

Match-day Airbnb premiums across all 16 World Cup 2026 host cities, vs same DOW 2025 [OC]

Each cell = one city × one day, World Cup 2026 group stage. Color shows premium over same day-of-week in 2025 (controls for the Sat>Tue weekly rhythm in lodging pricing). White-outlined cells are match days.

A few findings:

  • The Final isn't the spike. MetLife on July 19 only hits +131% over baseline. The biggest single-match premium in the US is a Round of 32 game in Kansas City: +313% (June 29). Dallas peaks at +310% the same weekend.
  • Kansas City and Dallas dominate — not the "tier 1" cities. LA peaks at +54%, SF at +47%, Boston +67%, Miami +102%. Mid-market metros pop harder because they have less inventory to absorb demand. Kansas City averages +279% across its 6 match-days; Dallas averages +269%.
  • The "+109% YoY" headline buries the tail. Across 76 US match-days, the median is +109% — but P75 = +166%, and 1 in 10 match-days clears +268%. The arithmetic mean is hiding a very long right tail.
  • Round of 32 and R16 outpace QF, SF, and the Final. R32 averages +161% match-day lift, R16 +160%, QF only +87%, Final +131%. The narrowing-field rounds concentrate demand into single dates more than the marquee games do.
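For those curious about the mechanics, the same-day-of-week control described above boils down to comparing each 2026 date against the matching weekday baseline from 2025. A rough pandas sketch with made-up rates and assumed column names (not the actual AirROI pipeline):

```python
import pandas as pd

# Hypothetical nightly-rate data: one row per city per date
df = pd.DataFrame({
    "city": ["KC"] * 14,
    "date": pd.date_range("2025-06-23", periods=7).tolist()    # baseline week
          + pd.date_range("2026-06-22", periods=7).tolist(),   # match week
    "rate": [100, 100, 100, 100, 120, 150, 150,
             410, 300, 280, 260, 300, 420, 380],
})
df["dow"] = df["date"].dt.dayofweek
df["year"] = df["date"].dt.year

# Baseline: median 2025 rate per city x day-of-week,
# which controls for the Sat > Tue weekly rhythm
base = (df[df["year"] == 2025]
        .groupby(["city", "dow"])["rate"].median()
        .rename("base_rate"))

wc = df[df["year"] == 2026].join(base, on=["city", "dow"])
wc["premium_pct"] = (wc["rate"] / wc["base_rate"] - 1) * 100
print(wc[["date", "premium_pct"]])
```

Matching on day-of-week rather than calendar date is what keeps a Sunday final from being compared against a cheap Tuesday.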

Source: AirROI's database — ~16,000 active Airbnb listings, 1,000 closest to each of the 16 FIFA host stadiums (radius 4–32 km depending on local density).

Tools: Python pandas pipeline, SVG built directly in Node, rasterized via sharp.

Full Report at https://www.airroi.com/world-cup-2026-airbnb-data

Free for editorial use (attribution: "Source: AirROI" + link): master CSV (16 cities × ~20 metrics × 2 years), methodology PDF, chart pack at https://www.airroi.com/world-cup-2026-airbnb-data/press-kit

airroi.com
u/jason-airroi — 6 days ago

Steam Recommender using similarity! pt 2 (Student Project)

I just made a sequel to my Steam game recommender website!

Last year I made a post about my Steam recommender. The last one was great and served its purpose of showing many people new games, but this new version is much more functional!

I love making recommendation systems that tell the user WHY they got the recommendation.

During a Steam sale event, I always find myself looking for new video games to play. If I wanted to find a new game, I would try to whittle it down using Steam tags, but the Steam tag system is very broad: "action" could apply to many, many games.

That got me thinking, what aspects do I like about my favorite games?

Well I like Persona 4 because of the city vibes and jazz fusion,

Spore because of the unique character creation and whimsical theme.

Balatro for its unique deck building synergies.

What if I could capture unique tags that identify a game, ones that aren't just "action", and put them into vectors to show the focus of a game?

For example, I could break Persona 4 into something like:

Gameplay Focus vector:
 Day cycle 20%
 Dungeon crawling 20%
 Social sim 20%

Tags:
Music: jazz fusion
Vibe: Small rural town

I find that this system makes searching for games more "fun". Now I can see why I like Balatro: the card synergies, not so much its rogue-like nature.

I also find that this helps surface new underrated games, and it beats the trap that collaborative filtering algorithms fall into, where it "feels" like you get recommended the same things.
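The focus-vector idea above is essentially cosine similarity over tag-weight vectors. A minimal sketch with hypothetical games and weights (not the site's actual data or code), including the "why" via shared tags:

```python
import math

# Hypothetical gameplay-focus vectors: tag -> weight
games = {
    "Persona 4": {"day_cycle": 0.2, "dungeon_crawl": 0.2, "social_sim": 0.2, "jrpg": 0.4},
    "Stardew":   {"day_cycle": 0.3, "social_sim": 0.3, "farming": 0.4},
    "Balatro":   {"deck_building": 0.6, "roguelike": 0.4},
}

def cosine(a, b):
    # Cosine similarity over sparse tag-weight dicts
    keys = set(a) | set(b)
    dot = sum(a.get(k, 0) * b.get(k, 0) for k in keys)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb)

query = "Persona 4"
ranked = sorted(
    ((cosine(games[query], v), name) for name, v in games.items() if name != query),
    reverse=True,
)
for score, name in ranked:
    # The overlapping tags are what let you explain WHY it was recommended
    shared = set(games[query]) & set(games[name])
    print(f"{name}: {score:.2f} via {sorted(shared)}")
```

Because the similarity decomposes over tags, the recommendation comes with its own explanation, which is the part collaborative filtering can't give you.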

Find your next favorite game: https://nextsteamgame.com/ Open a PR: https://github.com/BakedSoups/NextSteamGame

(I actually made some GitHub issues myself for problems I can't fix)

If anyone has any criticism, I would love to hear it! This is probably my favorite passion project.

Hope this website helps people find new games! Also, I have an advanced mode for people who don't mind messing with sliders and weird data terms.

u/Expensive-Ad8916 — 5 days ago

RussellSB/pytrendy: Trend Detection in Python. Applicable for real-world industry use cases in time series.

For the past year, I've been building PyTrendy, an open-source Python package that fills a specific, often overlooked gap in time series analysis: automated trend detection.

Why PyTrendy?

Most tools either give you a "trend component" (via decomposition) or "changepoints" (the moments of shift). PyTrendy is built for labelled segment analysis. I built this out of a direct need to improve on existing methods:

- Beyond Step Changes: While ruptures is the gold standard for abrupt shifts, I needed to also handle gradual slope changes - the kind often seen in digital marketing activity, stock trends, and energy time series.

- The Flat/Noise Problem: Previous tools such as pytrendseries, trendet, & tstrends are closest in function to what PyTrendy targets. But I found that they often over-fit trends on flat or noisy periods, expecting users to set up their own labour-intensive workarounds to avoid this. My approach uses signal-processing and post-processing logic under the hood to ensure the algorithm identifies trends that are precise and valid.
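To make "gradual slope changes" and the flat/noise problem concrete, here is a toy rolling-slope labeller in plain Python. This is not PyTrendy's API, just my own illustration of labelled segment analysis, with a flatness threshold so noisy-but-flat stretches don't get over-fit as trends:

```python
# Label each point of a series as up / down / flat using the slope of a
# local least-squares line fit -- a toy version of labelled segment analysis.

def rolling_slopes(y, w=5):
    # Ordinary least-squares slope over a centered window of width w
    xs = list(range(w))
    xbar = sum(xs) / w
    denom = sum((x - xbar) ** 2 for x in xs)
    half = w // 2
    slopes = []
    for i in range(half, len(y) - half):
        win = y[i - half:i + half + 1]
        ybar = sum(win) / w
        slopes.append(sum((x - xbar) * (v - ybar) for x, v in zip(xs, win)) / denom)
    return slopes

def label(slopes, eps=0.05):
    # Slopes within +/- eps count as flat, so noise isn't labelled a trend
    return ["up" if s > eps else "down" if s < -eps else "flat" for s in slopes]

# Synthetic series: flat, then a gradual rise, then a decline
y = [1.0] * 10 + [1.0 + 0.2 * i for i in range(10)] + [3.0 - 0.3 * i for i in range(10)]
labels = label(rolling_slopes(y))
print(labels)
```

Gradual slope changes show up here as runs of "up"/"down" labels rather than a single abrupt changepoint, which is the gap relative to step-change detectors like ruptures.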

In a complex business ecosystem where dozens of time series interact, knowing exactly how they align or confound each other at specific points in time is invaluable, especially for experiment design. Without understanding the DGP well enough and how it varies across time, experiments can fly blind and generate misleading indications.

Explore the project

Let me know what you think! Hope other practitioners benefit from this for their own time series use cases.

- Documentation: https://russellsb.github.io/pytrendy/
- GitHub Repository: https://github.com/RussellSB/pytrendy

github.com
u/devrus123 — 4 days ago