r/DataScientist

How are you keeping up with AI updates these days?

I’ve been running into the same issue recently—too many sources (research blogs, company updates, media), and a lot of overlap or noise.

I built a small pipeline to experiment with this:

  • ingestion from curated sources
  • deterministic filtering + deduplication
  • LLM-based scoring (relevance, importance, novelty)
  • clustering of related content
  • structured digest output
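For anyone wanting to try the deterministic filtering + deduplication step, here is a minimal sketch. It is not the poster's implementation: the `Item` fields, the blocked-word list, and the choice of hashing a normalized title are all assumptions made for illustration.

```python
import hashlib
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    # Hypothetical record shape for one ingested article.
    title: str
    url: str
    source: str


def normalize_title(title: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", title.lower())
    return " ".join(cleaned.split())


def fingerprint(item: Item) -> str:
    """Stable dedup key: SHA-256 of the normalized title (an assumed choice)."""
    return hashlib.sha256(normalize_title(item.title).encode("utf-8")).hexdigest()


def filter_and_dedupe(items, blocked_words=("sponsored", "webinar")):
    """Drop items whose title contains a blocked word, then keep the
    first item seen for each fingerprint."""
    blocked = set(blocked_words)
    seen = set()
    kept = []
    for item in items:
        if set(normalize_title(item.title).split()) & blocked:
            continue  # deterministic filter: no model call involved
        key = fingerprint(item)
        if key in seen:
            continue  # same normalized title already kept
        seen.add(key)
        kept.append(item)
    return kept


items = [
    Item("New Model Released!", "https://example.com/a", "blog"),
    Item("new model released", "https://example.org/b", "media"),
    Item("Sponsored: AI webinar", "https://example.com/c", "media"),
]
digest_candidates = filter_and_dedupe(items)
```

Running the cheap deterministic pass first means only the surviving items would need to go on to LLM-based scoring and clustering. Exact-title hashing only catches near-identical headlines, so paraphrased duplicates would still have to be handled at the clustering stage.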

Main goal was to reduce context switching and make it easier to focus on what actually matters.

Curious how others here approach this—tools, workflows, or habits?

reddit.com
u/Elinova_3911 — 14 hours ago
▲ 7 r/DataScientist+2 crossposts

[FOR HIRE] Data Scientist / ML Engineer / AI Engineer | 4 YOE | Python, XGBoost, LightGBM, LLMs, MLflow, Spark | Remote | Full-time or Contract

---

**Who I am**

Hi! I'm Keshav, a Data Scientist, ML Engineer, and AI Engineer with ~4 years of experience building production ML and AI systems — from raw feature engineering to model deployment and monitoring. I specialize in taking models from experimentation all the way to production.

---

**What I do well**

🔹 Supervised/Unsupervised ML — XGBoost, LightGBM, scikit-learn, PyTorch

🔹 LLM & GenAI pipelines — RAG, prompt engineering, fine-tuning, agentic workflows

🔹 MLOps — MLflow, Docker, Kubernetes, Airflow, CI/CD for model deployment

🔹 Data Engineering — BigQuery, Snowflake, Spark, dbt, SQL at scale

🔹 FastAPI-based ML services & REST API productionization

---

**Recent portfolio highlights**

📌 **CreditSense AI** — End-to-end credit risk scoring product built with FastAPI, XGBoost/LightGBM, deployed on Railway. Targets the Indian fintech market.

→ github.com/keshavloma1081-ctrl/Creditsense-ai

📌 AI evaluation & annotation tasks including agentic coding evals comparing LLM model responses (Labelbox).

---

**Stack at a glance**

Python | SQL | XGBoost | LightGBM | PyTorch | scikit-learn | FastAPI | LLMs | MLflow | Airflow | dbt | Spark | BigQuery | Snowflake | Docker | Kubernetes

---

**Availability**

📍 Based in Delhi NCR · Open to fully remote roles globally

🕐 Available for: Full-time employment, long-term contracts, or project-based freelance

💬 DM me — happy to share CV, portfolio, or jump on a call.

---

**Best fit for**

Fintech · Healthcare AI · Analytics platforms · LLM-powered products · Fraud/Risk modeling · Data pipelines · AI/ML startups

u/New_Conclusion_2211 — 1 day ago

Postcode is one of the most underrated features in modelling

One thing that has consistently surprised me across different companies is how strong postcode features tend to be in models.

At first glance, it's surprising that it's so predictive (it's "just geography"), but then it clicks: people tend to live in areas with somewhat like-minded people, and the (visible) area-level behaviours often correlate well with the individual behaviours that we're interested in.

The features that are captured for each postcode,

  • demographics
  • deprivation
  • housing characteristics
  • crime exposure
  • transport access
  • general behaviour patterns

are proxies for behaviours that are hard to observe directly: renewal propensities, fraud, risk.

The other issue is that postcode data is rarely "done properly". It's often:

  • built once and never updated
  • very incomplete
  • or treated as a static lookup rather than something that evolves over time
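One way to avoid the "static lookup" problem is to store dated snapshots per postcode and look up the snapshot that was current at the time of each event. The sketch below is only an illustration under assumptions: the class name, the feature name `deprivation_decile`, and the postcodes are made up, and a real setup would more likely live in a database or dataframe.

```python
from bisect import bisect_right
from collections import defaultdict
from datetime import date


class PostcodeFeatureStore:
    """Hypothetical store of dated area-level feature snapshots per postcode."""

    def __init__(self):
        self._dates = defaultdict(list)     # postcode -> sorted snapshot dates
        self._features = defaultdict(list)  # postcode -> features, same order

    def add_snapshot(self, postcode, as_of, features):
        """Insert a snapshot while keeping each postcode's dates sorted."""
        idx = bisect_right(self._dates[postcode], as_of)
        self._dates[postcode].insert(idx, as_of)
        self._features[postcode].insert(idx, features)

    def lookup(self, postcode, event_date):
        """Return the latest snapshot dated on or before event_date,
        or None if the postcode is unknown or has no snapshot that early."""
        dates = self._dates.get(postcode)
        if not dates:
            return None
        idx = bisect_right(dates, event_date)
        if idx == 0:
            return None
        return self._features[postcode][idx - 1]


store = PostcodeFeatureStore()
store.add_snapshot("AB1 2CD", date(2023, 1, 1), {"deprivation_decile": 4})
store.add_snapshot("AB1 2CD", date(2024, 1, 1), {"deprivation_decile": 5})

early = store.lookup("AB1 2CD", date(2023, 6, 30))
late = store.lookup("AB1 2CD", date(2024, 3, 1))
before_any = store.lookup("AB1 2CD", date(2022, 12, 31))
unknown = store.lookup("ZZ9 9ZZ", date(2024, 1, 1))
```

Looking up by event date also keeps training data honest: a policy written in mid-2023 gets the features known at that time, not a later refresh. Returning `None` for missing postcodes makes the incompleteness explicit instead of silently imputing.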

Of course, there are important considerations around fairness and bias here, since geographic features can correlate with socio-economic factors. In practice, how these features are used depends heavily on the application and regulatory context.

Curious how others are handling this -- do you tend to use postcode features, or is it something that gets deprioritised?

u/Sweaty-Stop6057 — 17 hours ago

How are you benchmarking forecasting models across classical, ML, and deep learning approaches?

u/Ankur_Packt — 4 days ago

Anyone else tired of babysitting Colab notebooks? I built a way to run them like jobs

u/jerronl — 2 days ago

Is Statistics a good major to pick if I want to pursue Data Science?

So I've gotten the chance to study statistics at one of the best universities in my country. It's almost free of cost. I've also got the opportunity to study computer science at another university, but it'll be too expensive for me.

So I guess my question is: can I still become a data scientist by studying statistics?

u/Peasent_in_Yellow28 — 8 days ago
▲ 4 r/DataScientist+1 crossposts

Quant researcher → Data Scientist pivot - worth it?

Hi all, I'm making a huge life decision and deciding between 2 job offers, so I would really appreciate perspectives from people in the DS field.

For some background, I’m currently a quantitative researcher working in corporate bond trading at a large bank in NYC. My work is fairly modeling-heavy (pricing, analytics) so I have strong research skills but not as much experience with the more formal DS workflow or software (Spark, Hadoop, AWS, etc).

Offer 1 (NYC) - Quant researcher role at a company that builds fixed income pricing models (company is a vendor to trading firms, so more product-focused, not actually trading)

  • Higher compensation
  • Stronger alignment with my current skillset
  • Similar to 'Applied Scientist' roles at some tech firms, with a strong data science component (tech stack, release cycles, product focus)
  • I'm really excited about this role as it marries my experience with my desire to get away from the day-to-day stress of trading.

Offer 2 (Chicago) - Data scientist at a consumer credit agency. Role would focus on credit risk modeling for clients.

  • More traditional DS role.
  • Located in Chicago (my family and I would ideally like to live there long-term)
  • However, I do like the idea of a role in consumer credit risk. It's practical, there will always be demand for it and there are lots of companies to transition to (PayPal, Stripe, Capital One, etc).

Goals / concerns:

  • Chicago is a preferred long-term location for personal/lifestyle reasons.
  • In a perfect world, I could do the quant job in Chicago but there are no companies like that there.
  • I also wouldn't mind staying in NYC for a few more years before looking in DS again, but my concern is that I'm missing a golden opportunity to relocate and break into DS that I might not get again, even though the role itself is suboptimal.
  • I really want to get away from the day-to-day aspect and PnL pressure of trading so I wouldn't want to transition to a pricing role at a Chicago prop shop

How I’m thinking about it:

  • The DS role is a more direct path into the field (especially for credit/lending/fintech roles later) but it comes with a pay cut and potentially weaker long-term growth at that specific company
  • The quant role keeps me on a strong comp/skill trajectory, but makes the DS pivot less direct and requires more intentional repositioning. It also means I would face the friction of changing cities as well as jobs down the line.

Questions:

  1. Does starting in the credit-focused DS role meaningfully improve long-term opportunities vs transitioning later or would my more unique background from the pricing role help me stand out?
  2. Am I underestimating how competitive DS roles are for someone without direct experience?
  3. Would taking a pay cut now for a “cleaner” transition path be worth it in your view?

Appreciate any thoughts, especially from people who’ve made similar transitions or hired for DS roles.

Thanks!

Edit: to be clear, the options I'm considering are either to take the Chicago DS job now, or to take the NYC quant job now and look for a better-paying DS job in Chicago in a few years.

u/Grouchy-Load562 — 6 days ago

The correlation between subtitle entry errors in real-time data streams and initial response processes

u/mattkahnn — 4 days ago
▲ 2 r/DataScientist+1 crossposts

MacBook Pro vs Asus G14

I'm unsure which laptop is better for data science: the MacBook Pro M5 or the Asus G14 with an RTX 5070 Ti. Both have 32 GB of RAM. I want a laptop for a data science master's.

u/NeedleworkerWeak6192 — 7 days ago

Testing a New Product for Data Science Beginners

I am building a platform for beginner data science students.

The goal is to help students build projects on their own without depending completely on long project tutorials.

Instead of giving the full project directly, the platform breaks the project into small tasks so students can think, build, and learn step by step.

I want to understand:

  • Whether this approach feels useful
  • Which parts feel confusing
  • Where students get stuck
  • Whether it feels better than watching full tutorials

I am not selling anything right now. I only want honest feedback from people who are learning data science.

Website - https://sted.co.in/

u/Jealous_Parfait_6457 — 7 days ago

[Selling] German Job Market Dataset - 150K Indeed.de listings (April 2026) - 38 fields including salary data

Fresh scrape from Indeed.de (April 2026). Perfect for ML, research, or HR analytics.

📊 What you get:
- 150,936 unique jobs
- 38 fields: title, company, description, location, salary flags, apply counts, ratings
- CSV format (~455MB)
- 100% valid data, no duplicates

📥 Free sample (5,000 jobs): IN COMMENTS

💰 Price: 200 USD  
📦 Delivery: 2h

🎯 Use for:
- Job market research
- ML training data
- Salary benchmarking
- Competitive intelligence

Tg: @ gdataxxx

u/dracariz — 5 hours ago