r/kaggle

Thanks to AI, the days of simple classification competitions are over. All competitions on Kaggle are either AGI or require 2D strategies.
▲ 11 r/kaggle


SSH terminal for kaggle.

Lately I needed an interactive terminal for a project, so I made an okay-ish working SSH terminal with tmux. Let me know your thoughts about this project.

u/Hamim_mahmud — 14 hours ago
▲ 4 r/kaggle

EDA of Google's ISLR dataset — why the Kaggle-winning ~83% accuracy number hides signer leakage

I’ve been writing a slow-release research arc on ASL recognition, and before any modeling, I wanted to actually look at Google’s Isolated Sign Language Recognition dataset the way it should’ve been looked at before every Kaggle winner reported 83% accuracy on it.

Notebook 00 of a nine-phase project: What does the Google ASL Signs data actually look like?

https://www.kaggle.com/code/truepathventures/parley-notebook-00-islr-eda

The sharp opinion, drawn from the EDA itself:

The Kaggle-default random 80/10/10 split — which every public winning solution used — puts the same signer’s clips in train, val, and test. That’s measuring how well the model memorizes each signer’s specific missing-landmark pattern, not how well it generalizes. Three numerical reasons:

  1. Missing-landmark patterns are structural per-sign, not random. The sign × landmark-type heatmap shows clear one-hand-missing signatures for bilateral-handshape signs and face-adjacent signs. Fork the notebook and scroll to §3.

  2. Median clip length varies 2×+ across the 21 signers. Fixed-length padding normalizes away signer-specific timing the model won’t see at inference.

  3. Per-signer coverage of signs is high but not uniform. Leave-one-signer-out evaluation is feasible — the coverage histogram in §6 is how we know.

Recommended split: signer-holdout — 17 train / 2 val / 2 test. Notebook 01 (next month) quantifies the accuracy gap against random-split, with error bars across 3+ seeds.
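For anyone who wants to try the recommended split before Notebook 01 lands, here is a minimal sketch of a 17/2/2 signer-holdout assignment. It assumes a per-clip metadata table with a "signer_id" column; the real competition files may use a different name (e.g. "participant_id"), so adjust to your data.

```python
import numpy as np
import pandas as pd

def signer_holdout_split(df, n_val=2, n_test=2, seed=0):
    """Assign every clip to train/val/test by signer, never by row."""
    signers = np.sort(df["signer_id"].unique())
    rng = np.random.default_rng(seed)
    rng.shuffle(signers)
    test_ids = set(signers[:n_test])
    val_ids = set(signers[n_test:n_test + n_val])
    split = np.where(df["signer_id"].isin(test_ids), "test",
                     np.where(df["signer_id"].isin(val_ids), "val", "train"))
    return df.assign(split=split)

# Toy check: 21 fake signers, 3 clips each; no signer straddles two splits.
toy = signer_holdout_split(pd.DataFrame({"signer_id": np.repeat(np.arange(21), 3)}))
```

Because assignment happens at the signer level, a signer's clips (and their structural missing-landmark signature) can never leak from train into val or test.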

This is notebook 1 of 9. Not a competition entry — a slow-release research project. Feedback welcome, especially from anyone who’s worked with ISLR before or runs signer-holdout evaluation in their own sign-language ML work.

reddit.com
u/FewConcentrate7283 — 1 day ago
▲ 1 r/kaggle

Mental health EDA (n=2,000): stress clusters with anxiety and burnout — not an isolated symptom [OC]

Dataset: 2,000 records from a Kaggle mental health survey.

Main finding: symptoms don’t appear in isolation. Stress acts as a cluster trigger.

  • 65.8% of high-stress individuals also report anxiety (vs. 40.6% in the low-stress group)

  • 62.8% of the high-stress group reports burnout

  • Young employed people show the highest stress rate: 49.3%
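Conditional rates like these reduce to a one-liner in pandas. A minimal sketch with hypothetical column names ("stress_level", "anxiety"); swap in the survey's actual schema:

```python
import pandas as pd

# Tiny stand-in for the survey; each rate below is P(symptom | stress group).
df = pd.DataFrame({
    "stress_level": ["high", "high", "high", "low", "low"],
    "anxiety":      [1, 1, 0, 1, 0],
})
rates = df.groupby("stress_level")["anxiety"].mean()
```

The same groupby pattern gives the burnout and age-segment rates; `pd.crosstab(df["stress_level"], df["anxiety"], normalize="index")` is an equivalent view.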

The implication is that interventions targeting single factors (sleep, routine) likely have diminishing returns if the underlying symptom accumulation isn’t addressed as a whole.

Notebook + figures on GitHub: https://github.com/matheusmarquezinhub/mental-health-kaggle

Open to feedback on methodology and visualization.

u/Economy-Concert-641 — 3 days ago
▲ 5 r/kaggle

I made a CLI tool to stop the "Zip & Upload" loop (modular code -> Kaggle notebook)

I got tired of the constant "Edit locally -> Zip -> Upload to Kaggle -> realize there's a syntax error -> Repeat" cycle. Since I don’t have a crazy GPU at home, I use Kaggle a lot, but I hate working in one giant, messy notebook.

I built repo2nb to fix this. It’s a CLI tool that converts your local repo (with all its folders and files) into a single .ipynb file.

How it works:

  • It uses %%writefile to rebuild your entire directory structure inside /kaggle/working.
  • It integrates with Kaggle Secrets so you can push/pull to GitHub securely without leaking your token.
  • It skips heavy stuff like .venv, .pt, or datasets automatically to keep the notebook light.
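The %%writefile mechanism in the first bullet can be sketched in a few lines. This is an illustration of the idea only, not the actual repo2nb source: walk the repo, skip heavy directories, and emit one notebook cell per Python file that recreates it under /kaggle/working.

```python
import os

SKIP_DIRS = {".venv", ".git", "__pycache__"}

def repo_to_cells(root):
    """Turn each .py file in a repo into a %%writefile notebook cell."""
    cells = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        for name in sorted(filenames):
            if not name.endswith(".py"):
                continue                      # skip weights, datasets, etc.
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            with open(os.path.join(dirpath, name)) as f:
                body = f"%%writefile /kaggle/working/{rel}\n" + f.read()
            cells.append({"cell_type": "code", "metadata": {},
                          "execution_count": None, "outputs": [],
                          "source": body.splitlines(keepends=True)})
    return cells
```

Note that %%writefile does not create missing parent directories, so a real tool also has to emit mkdir cells (or equivalent) for nested folders.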

Basically, you run one command locally, upload the notebook, and you have your full modular repo ready to train. I’ve been using it for my graduation project and it saved me so much time.

Check it out if you're tired of the manual setup: https://github.com/David-Magdy/repo2nb

You can just pip install repo2nb and run it. Hope it helps!

u/PolarIceBear_ — 3 days ago
▲ 2 r/kaggle+1 crossposts

Social Friction Bench: When Helping Wrong Is Worse Than Not Helping

Just submitted Social Friction Bench to the DeepMind AGI competition (Social Cognition track). Wanted to share the methodology since it’s a bit different from most benchmark entries.

The benchmark tests structurally informed social cognition — whether models override socially comfortable responses when safety requires it. Seven scenarios across grief, workplace, coercive control, addiction, and child abuse disclosure. LLM-as-judge with domain-specific rubrics grounded in professional standards (NCTSN, National DV Hotline, Evan Stark’s coercive control framework).

The finding worth discussing: humans baseline at 1.01/2.0 on coercive control detection (N=129, 6 countries). Same scenarios where smaller models fail. The failure mode isn’t AI-specific — it’s a shared gap in structural social knowledge.

A few things that might be interesting to the community:

  • Reasoning Parasitism: Gemini and ChatGPT named the benchmark dimensions when aware they were being tested, rather than responding authentically. V2 will control for this with blind vs. labeled presentation.

  • Thoroughness as failure mode: longer responses buried critical guidance and scored lower than brief correct ones.

  • S3 (coercive control) produced the widest variance across all 33 models tested.

Writeup:

https://kaggle.com/competitions/kaggle-measuring-agi/writeups/new-writeup-1773797633903

Benchmark:

kaggle.com/benchmarks/benjamynwilson/social-friction-bench

GitHub:

github.com/DataInfamous/social-friction-bench

Happy to discuss methodology, rubric design, or the human baseline approach.

reddit.com
u/OkPhysics7423 — 6 days ago
▲ 25 r/kaggle

Is engagement on Kaggle declining?

Lately, it feels much harder to get any meaningful engagement or feedback on Kaggle notebooks.

Compared to earlier, the platform seems far less active, and discussions around notebooks are almost non-existent.

Is anyone else experiencing this? Has the engagement on Kaggle dropped, or am I missing something in how the platform is being used now?

reddit.com
u/ag_curious_soul — 11 days ago
▲ 0 r/kaggle

IVE BEEN DOWNLOADING AI IN KAGGLE BUT ALWAYS HAVING PROBLEM WHY THE FUCK THEY WANT ME TO DOWNLOAD THE WHOLE VERSION OF LTX WHEN IM DOWNLOADING FP8 VERSION

HELP ME

reddit.com
u/ItsManNoob — 7 days ago
▲ 4 r/kaggle

If I want to apply ML insights to real-life examples, are there any tricks for learning from Kaggle competitions more effectively?

Right now, I open a Kaggle notebook, go through the code to understand the logic, and then recreate the solution myself. I'm a noob here in competitive programming. Am I doing things right, or is there a better way? My end goal is to get a good grasp of medical image analysis, and to learn to use an agent to automate part of the pipeline (but that's not relevant to this).

reddit.com
u/SmartPuppyy — 9 days ago
▲ 3 r/kaggle

Beginner Healthcare Data Sets

I’m working on my Google Data Analytics Capstone. I’m a Master of Health Administration student and I am looking for datasets that involve healthcare data. Any suggestions?

reddit.com
u/AmericanGirlStuuu1 — 11 days ago
▲ 6 r/kaggle

I am a bioinfo student and I want certain projects on my resume

Hello reddit community! I am a bioinfo student and my coding proficiency is okay-ish. I want to add projects to my resume. I'm not going to claim that I did a project entirely on my own, but it would be a learning experience. Can I do that, or is it frowned upon? There are some cool computer vision projects on Kaggle, and I've been learning the concepts, but I can't seem to complete a project entirely by myself, and I don't have people around who could help me finish one, not even my professors. Is that okay?

reddit.com
u/Dry-Let9898 — 8 days ago
▲ 4 r/kaggle+1 crossposts

First ML competition — predicting air quality from satellite data, looking for advice from people who've done this before

Hey everyone, I'm participating in a competition where the goal is to predict PM2.5 air quality concentration using Sentinel-5P satellite data (things like NO2, CO, and ozone levels) and weather data across hundreds of cities. The competition starts in 4 days, so I'm preparing ahead of time.

I want to make sure I'm thinking about the problem the right way before the data drops. Here's what I'd love input on:

  1. When you look at a brand new dataset for the first time, what are you actually looking for? What's your thought process before writing any code?

  2. How do you decide which features are worth building vs which ones are a waste of time?

  3. For tabular data with both location and time dimensions (multiple cities, daily readings), what validation strategy keeps local scores trustworthy?

  4. What's the most common mistake in competitions like this that silently kills your score without you realising?

  5. What would you prioritise in the first 48 hours after the data drops?
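On question 3 specifically, a common starting point is grouped cross-validation that holds out whole cities, so the local score measures generalisation to unseen locations rather than interpolation within known ones. A minimal sketch with hypothetical column names ("city", "date", "pm25"):

```python
import pandas as pd
from sklearn.model_selection import GroupKFold

# Tiny stand-in dataset; every row from a city stays in a single fold,
# so validation always means predicting a city the model has never seen.
df = pd.DataFrame({
    "city": ["A", "A", "B", "B", "C", "C"],
    "date": pd.to_datetime(["2026-01-01", "2026-01-02"] * 3),
    "pm25": [10.0, 12.0, 30.0, 28.0, 55.0, 60.0],
})
gkf = GroupKFold(n_splits=3)
folds = list(gkf.split(df, groups=df["city"]))
```

If the test set is a future time period rather than unseen cities, a forward-chaining time split (train on early dates, validate on later ones) is the matching alternative; the right choice depends on how the organisers split train and test.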

Any advice appreciated, even on just one question. Thanks

reddit.com
u/Comprehensive-Tie992 — 13 days ago
▲ 0 r/kaggle

Kaggle's learning notebooks suck.

I'll explain. I came across Kaggle and decided to relearn Python from a different perspective. I haven't coded in years, and the way I learned wasn't great because I skipped a lot of basics; when I couldn't understand something I just copy-pasted code and hoped it worked. Which was often. So coming into Kaggle I hoped I could learn the basics and bare bones. Which leads me to my issue.

There is nothing more infuriating than learning something new and having the module you're working on expect you to know what the fuck to do without clear instructions, or even problems similar to the tutorial before the exercise. I let it slide in the beginner's course, but as I've gone through the stages I have to uncomment the solutions more often than not because of the asinine way the lessons are structured. How in the ever-loving fuck am I supposed to learn if I don't get all the information I need to practice and actually complete something? Make the practice questions use the same information as the tutorials so I know what to do in certain problems; don't give me a set of problems that are solved one way when the tutorial's problems are solved completely differently, like an asshole. It's a fucking tutorial. And for fuck's sake, how hard is it to have two sets of hands-on learning? This read-and-figure-it-out bullshit would make anyone not want to learn on Kaggle.

The idea Kaggle has is solid, and so are its resources, but the execution is brain dead. Make both parts of the lesson interactive so that whoever is learning can do it in a guided way, and then make the tutorial be the quiz. This isn't the 80s, where you had to take notes and go figure it out in books yourself. If you're gonna do something, do it right or not at all. These lessons are half-assed.

The idea of free range and creativity in the notebooks is amazing, but without all the necessary tools it feels like you're failing every time you ask for the solution. If I'm having to ask for the answer, you're failing to teach. Also, fuck your hints: if they aren't gonna clarify anything, just don't have them there. I've had a better time learning from a glue-eating 4-year-old than these fucking notebooks. Their only saving grace is that they're free. Now, I'm done complaining, so here are the key points:

-Interactive lessons and quizzes

-Full tools and information that will be used in both sections to maximize learning and lessen frustration in new users

-Give multiple problems and solutions that will be used in both sections so that the user can understand the information thoroughly

-Rather than asinine hints, clarify the question and expand on the outcome wanted so that it can be processed in various perspectives.

Ex. (Compare each element of the list to 2 (i.e. do an 'element-wise' comparison) and give us a list of booleans like [False, False, True, True]. Implement a function that reproduces this behaviour.) vs (Make a function that compares if an element is higher than the threshold given and loops every element in the list to return if that statement is true or false.)

Answer:

def elementwise_greater_than(L, thresh):
    return [ele > thresh for ele in L]

Ta-fucking-da.

-Teaching isn't rocket science, and neither is a clear and decent explanation. Putting things in this form makes it easier to visualize the outcome rather than "here's what I want, figure it out".

reddit.com
u/Sufficient_Gift939 — 9 days ago
▲ 9 r/kaggle

Confusion with write ups for hackathons

Hey guys, I participated in a Kaggle hackathon where judging is based on a write-up, not a leaderboard score. But I’m confused.

A typical write-up is around 1000–1500 words, but Kaggle doesn’t have a single “write-up” field. Instead, it has sections like title, subtitle, card/thumbnail image, media gallery, and project description.

So I’m not sure—does all of this together count as the write-up, or am I supposed to put the full write-up in the “project description”? That section seems meant for a shorter summary.

I’m really confused.

reddit.com
u/shaurya_pandey19 — 14 days ago
▲ 2 r/kaggle

https://www.kaggle.com/datasets/jahnavikachhia23/texas-residential-real-estate-intelligence-2026

I built and released a free dataset of 12,137 active Texas residential listings for 2026 — structured features (price, sqft, beds, baths, garage, year built) plus NLP-ready listing descriptions with PII redacted. Texas is the #1 volume real estate market in the US and there was nothing clean like this on Kaggle.

kaggle.com
u/Public_Night2989 — 13 days ago
▲ 7 r/kaggle+1 crossposts

Join CVPR 2026 Workshop Challenge: Foundation Models for General CT Image Diagnosis!

Develop & benchmark your 3D CT foundation model on a large-scale, clinically relevant challenge at CVPR 2026!

🔬 What's the Challenge?

Evaluate how well CT foundation models generalize across anatomical regions, including the abdomen and chest, under realistic clinical settings such as severe class imbalance.

Task 1 – Linear Probing: Test your frozen pretrained representations directly.

Task 2 – Embedding Aggregation Optimization: Design custom heads, learning schedules, and fine-tuning strategies using publicly available pretrained weights.

🚀 Accessible to All Teams

  • Teams with limited compute can compete via the Task 1 - Coreset (10% data) track, and Task 2 requires no pretraining — just design an optimization strategy on top of existing foundation model weights.
  • Official baseline results offered by state-of-the-art CT foundation model authors.
  • A great opportunity to build experience and strengthen your skills: Task 1 focuses on pretraining, while Task 2 centers on training deep learning models in latent feature space.

📅 Key Dates

- Validation submissions: through May 10, 2026
- Test submissions: May 10 – May 15, 2026
- Paper deadline: June 1, 2026

We’d love to see your model on the leaderboard and welcome you to join the challenge!

👉 Join & Register: https://www.codabench.org/competitions/12650/
📧 Contact: medseg20s@gmail.com

u/Affectionate-Step534 — 13 days ago
▲ 3 r/kaggle

Submission finishes running after competition deadline?

Will a submission still count towards the competition if it finishes running after the deadline but was submitted before the deadline?

Thank you

reddit.com
u/annilee616 — 14 days ago
▲ 7 r/kaggle

Introducing the Unified Game Arena Leaderboard

Since we launched the Kaggle Game Arena last year, we’ve expanded from a Chess leaderboard to a multi-game benchmark spanning Poker, Werewolf, and Four in a Row. But as the benchmark grew, so did the fragmentation. Juggling separate Elo ratings and win rates made it difficult to see the big picture.

Today, we are introducing the Unified Game Arena Leaderboard: a single, consolidated ranking that scores AI models across all games at once. 
To build a statistically principled ranking across fundamentally different environments, we fit a single Bradley–Terry model across all games. Here is how it works:

Key highlights:

  • All evidence is used jointly: If Model A beats Model B in Chess and Poker, both observations directly inform the rating gap. We don't compute separate ratings and try to combine them later - everything goes into a single fit.
  • Every game contributes equally: Episode counts are imbalanced (Werewolf generates ~377k episodes while Chess produces ~2,200). We normalize by dividing each game’s outcome matrices by its total episode count so every game has equal weight.
  • Multiplayer games via pairwise reduction: For team games like Werewolf, outcomes are reduced to binary pairwise comparisons. This provides a clean signal that the Bradley–Terry framework can consume.
  • No post-hoc normalization: Because games are balanced before fitting, the resulting ratings are directly comparable. There is no z-score transformation or averaging step required.
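For the curious, the balanced joint fit described above can be sketched with the classic minorization-maximization update for Bradley-Terry strengths. This is an illustration of the idea, not Kaggle's actual implementation:

```python
import numpy as np

def fit_unified_bt(win_matrices, iters=500):
    """One Bradley-Terry fit across several games.

    win_matrices: list of (m x m) arrays where W[i, j] counts episodes
    in that game won by model i against model j. Each game is divided
    by its own episode total so no game dominates the joint fit.
    """
    m = win_matrices[0].shape[0]
    W = np.zeros((m, m))
    for game in win_matrices:
        total = game.sum()
        if total > 0:
            W += game / total          # balance games before fitting
    N = W + W.T                        # weighted pairings between i and j
    p = np.ones(m)                     # BT strengths, uniform start
    for _ in range(iters):             # standard MM updates (Hunter, 2004)
        for i in range(m):
            denom = sum(N[i, j] / (p[i] + p[j]) for j in range(m) if j != i)
            if denom > 0:
                p[i] = W[i].sum() / denom
        p /= p.sum()                   # BT is scale-invariant; fix the scale
    return np.log(p)                   # log-strengths behave like Elo ratings

# Toy example: model 0 wins most episodes in both games, despite wildly
# different episode counts, so it should rank first after balancing.
chess = np.array([[0.0, 8.0], [2.0, 0.0]])           # 10 episodes
werewolf = np.array([[0.0, 700.0], [300.0, 0.0]])    # 1,000 episodes
ratings = fit_unified_bt([chess, werewolf])
```

Because every game's outcome matrix is scaled to the same total weight before the single fit, the resulting ratings need no post-hoc normalization, exactly as the bullets describe.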

Overall, this unified leaderboard finally answers the big question: Which model is the most consistent strategic reasoner across all domains?

Check out the preliminary rankings: https://kaggle.com/game-arena 

u/kaggle_official — 14 days ago
▲ 2 r/kaggle

I'm encountering this error constantly when I try to create a new dataset on Kaggle. I need help now.

I'm encountering this error constantly when I try to create a new dataset. Earlier, I used to get this error but my dataset was created despite it when I refreshed. Now I have tried 6-7 times already and nothing works. Does anyone know the fix?

Unexpected token '<', " <html><hea"... is not valid JSON

https://preview.redd.it/il82z84rflvg1.png?width=819&format=png&auto=webp&s=a9da7f4e6641f1194522f2b888e77ab6fb02c50e

reddit.com
u/let-it-be-a-secret — 7 days ago