![Image 1 — My AI found a planet 2,000 light years away using just brightness data - here's how it works [OC]](https://preview.redd.it/8x4i1cidh21h1.jpg?width=1947&format=pjpg&auto=webp&s=bd40253deeadc26e5c324955c286885565c9e1c7)
![Image 2](https://preview.redd.it/anxwxupdh21h1.jpg?width=1963&format=pjpg&auto=webp&s=1730c2075254e7e42a5bb9e0b186ed931603de72)
![Image 3](https://preview.redd.it/bxfqm9xdh21h1.jpg?width=1952&format=pjpg&auto=webp&s=072e090f5990bb1aaa23b98562872d38afd73081)
![Image 4](https://preview.redd.it/xfh88f3eh21h1.jpg?width=1940&format=pjpg&auto=webp&s=a9ef64c685d5ccbb6055848af4758a9c049f417d)
My AI found a planet 2,000 light years away using just brightness data - here's how it works [OC]
Started this 10 weeks ago knowing almost nothing about
astronomy. Just wanted to see if a neural network could
find planets from raw telescope data.
Here's what the app actually does:
You type any Kepler star ID → it downloads the real
light curve live from NASA's archive → runs a 6-step
preprocessing pipeline → a 1D-CNN scores it from 0 to 1
→ above 0.6914 means planet candidate.
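Roughly, the decision step looks like this. This is a simplified sketch: `preprocess` here is a toy stand-in (normalize, fill gaps, clip outliers), not my actual 6-step pipeline, and the score would come from the CNN — only the 0.6914 threshold is real:

```python
import numpy as np

CANDIDATE_THRESHOLD = 0.6914  # calibrated decision threshold

def preprocess(flux):
    """Toy stand-in for the 6-step pipeline: median-normalize,
    fill gaps, clip gross outliers."""
    flux = np.asarray(flux, dtype=float)
    flux = flux / np.nanmedian(flux)      # normalize baseline to 1.0
    flux = np.nan_to_num(flux, nan=1.0)   # fill missing cadences
    return np.clip(flux, 0.95, 1.05)      # clip outliers

def classify(score):
    """Map a CNN probability (0 to 1) to the post's decision rule."""
    return "planet candidate" if score >= CANDIDATE_THRESHOLD else "no detection"

flux = preprocess([1.001, 0.990, np.nan, 1.002, 0.989])
print(classify(0.81))  # → planet candidate
print(classify(0.42))  # → no detection
```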
The science behind it: when a planet crosses its star,
it blocks ~1% of the light. That tiny dip, repeating
every few days, is what the CNN learns to find.
Real results (no cherry picking):
• AUC 0.9628 on the competition benchmark
• 93% detection on hot Jupiters (high SNR)
• False positive rate dropped from 28% → 0%
after building an eclipsing binary filter
• Precision hit 1.000: zero false planets reported
• Caught 6/6 eclipsing binaries (100%)
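For anyone wanting to sanity-check what those metrics mean, here's a minimal sketch of precision and false positive rate on toy labels (the arrays below are made up for illustration, not my actual predictions):

```python
import numpy as np

def precision(y_true, y_pred):
    """Of everything flagged as a planet, what fraction really is one."""
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    return tp / (tp + fp) if (tp + fp) else 0.0

def false_positive_rate(y_true, y_pred):
    """Of everything that is NOT a planet, what fraction got flagged anyway."""
    fp = np.sum((y_pred == 1) & (y_true == 0))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    return fp / (fp + tn) if (fp + tn) else 0.0

# Toy labels: 1 = planet, 0 = not; predictions after a veto like the EB filter
y_true = np.array([1, 1, 0, 0, 0, 1])
y_pred = np.array([1, 1, 0, 0, 0, 0])
print(precision(y_true, y_pred))            # → 1.0 (no false planets)
print(false_positive_rate(y_true, y_pred))  # → 0.0
```

Note the trade-off visible even in the toy data: precision 1.0 with a missed planet (recall < 1), which is the conservative direction I tuned for.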
The part I'm most proud of: the EB rejection filter.
Eclipsing binaries look exactly like planets to the CNN.
Built a phase-folding pipeline that checks for secondary
eclipses and flags them before reporting a detection.
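Conceptually the check works like this (a simplified sketch on synthetic data, not my exact filter): fold the light curve at the detected period so the primary eclipse sits at phase 0, then test whether there's a statistically significant second dip near phase 0.5. EBs show one; planets generally don't.

```python
import numpy as np

def is_eclipsing_binary(time, flux, period, n_sigma=5.0):
    """Fold at the detected period and test for a significant
    secondary dip near phase 0.5. Simplified illustration."""
    phase = (np.asarray(time) % period) / period
    flux = np.asarray(flux, dtype=float)
    # Out-of-eclipse baseline: away from both phase 0 and phase 0.5
    quiet = ((phase > 0.1) & (phase < 0.4)) | ((phase > 0.6) & (phase < 0.9))
    baseline = np.median(flux[quiet])
    noise = np.std(flux[quiet])
    # Median flux in the secondary-eclipse window
    secondary = np.median(flux[(phase > 0.45) & (phase < 0.55)])
    return (baseline - secondary) > n_sigma * noise

# Synthetic EB: deep primary at phase 0, shallower secondary at phase 0.5
rng = np.random.default_rng(0)
t = np.linspace(0, 30, 3000)
period = 3.0
phase = (t % period) / period
flux = 1.0 + rng.normal(0, 1e-4, t.size)
flux[(phase < 0.02) | (phase > 0.98)] -= 0.010   # primary eclipse
flux[np.abs(phase - 0.5) < 0.05] -= 0.004        # secondary eclipse

print(is_eclipsing_binary(t, flux, period))  # → True: flag it, don't report a planet
```

A planet-only curve (primary dip, no secondary) comes back False, which is exactly the asymmetry the filter exploits.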
The honest failure:
Model scores near zero on active/variable stars.
Starspots create brightness variations that completely
drown out the planet signal. Spent Week 9 figuring out
why, then documented it fully rather than hiding it.
Wild-data AUC dropped from 0.9628 → 0.6933 on real
stars. Competition data is cleaner than reality.
That gap is the most important thing I learned.
Week by week:
1 → Dataset exploration (150k+ light curves)
2 → Preprocessing pipeline
3 → Baseline models (logistic regression, MLP)
4 → First 1D-CNN
5 → Data augmentation
6 → Final model - AUC 0.9628
7 → Wild data evaluation - found the 28% FPR problem
8 → Threshold calibration + EB filter → FPR 0%
9 → Broader catalog - found the variability wall
10 → Built and deployed the Streamlit app
Stack: TensorFlow · lightkurve · NumPy · SciPy · Streamlit
Links in first comment. Happy to answer anything about
the architecture, preprocessing, or EB rejection pipeline!