u/Mann-Bhatt

My AI found a planet 2,000 light years away using just brightness data - here's how it works [OC]

Started this 10 weeks ago knowing almost nothing about astronomy. Just wanted to see if a neural network could find planets from raw telescope data.

Here's what the app actually does:

You type any Kepler star ID → it downloads the real light curve live from NASA's archive → runs a 6-step preprocessing pipeline → a 1D-CNN scores it from 0 to 1 → above 0.6914 means planet candidate.
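Not the author's actual 6-step pipeline, but a minimal NumPy/SciPy sketch of the detrend → normalize → threshold idea, with the CNN score stubbed out (`preprocess` and `classify` are illustrative names, not from the app):

```python
import numpy as np
from scipy.signal import medfilt

THRESHOLD = 0.6914  # decision threshold quoted in the post

def preprocess(flux, kernel=101):
    """Simplified stand-in for the preprocessing pipeline: detrend with a
    running median, then normalize to zero mean / unit variance."""
    flux = np.asarray(flux, dtype=float)
    trend = medfilt(flux, kernel_size=kernel)   # slow stellar trend
    detrended = flux / trend - 1.0              # relative flux around trend
    return (detrended - detrended.mean()) / detrended.std()

def classify(score, threshold=THRESHOLD):
    """Map a CNN score in [0, 1] to a verdict."""
    return "planet candidate" if score >= threshold else "no detection"
```

In the real app the normalized curve would be fed to the trained 1D-CNN; here `classify` just applies the cut to whatever score the model returns.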

The science behind it: when a planet crosses its star, it blocks ~1% of the light. That tiny dip, repeating every few days, is what the CNN learns to find.
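Where the ~1% figure comes from: the transit depth is roughly the fraction of the stellar disk the planet covers, (R_planet / R_star)². A quick sanity check for a Jupiter-sized planet in front of a Sun-like star:

```python
# Transit depth ≈ (R_planet / R_star)^2 — the covered fraction of the disk.
R_JUPITER_KM = 69_911   # Jupiter equatorial radius
R_SUN_KM = 695_700      # solar radius

depth = (R_JUPITER_KM / R_SUN_KM) ** 2
print(f"{depth:.2%}")   # roughly 1% — the dip the CNN has to spot
```

An Earth-sized planet would give a depth ~100× smaller, which is why hot Jupiters (next section) are the easy, high-SNR cases.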

Real results (no cherry-picking):

• AUC 0.9628 on the competition benchmark

• 93% detection on hot Jupiters (high SNR)

• False positive rate dropped from 28% → 0% after building an eclipsing binary filter

• Precision hit 1.000: zero false planets reported

• Caught 6/6 eclipsing binaries (100%)

The part I'm most proud of: the EB rejection filter. Eclipsing binaries look exactly like planets to the CNN. Built a phase-folding pipeline that checks for secondary eclipses and flags them before reporting a detection.
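A hedged sketch of the phase-folding idea (not the author's actual filter): fold the light curve at the detected period with the primary eclipse at phase 0, then test whether there is a statistically significant dip half an orbit later. EBs usually show that secondary eclipse; a transiting planet usually doesn't at Kepler noise levels. All names and the window/sigma choices here are assumptions.

```python
import numpy as np

def has_secondary_eclipse(time, flux, period, t0, window=0.05, n_sigma=5.0):
    """Phase-fold at the detected period and look for a dip near phase 0.5.
    Returns True if a secondary eclipse is detected (likely an EB)."""
    phase = ((time - t0) / period) % 1.0
    in_sec = np.abs(phase - 0.5) < window                     # around phase 0.5
    out = (np.abs(phase - 0.25) < window) | (np.abs(phase - 0.75) < window)
    baseline, scatter = flux[out].mean(), flux[out].std()     # out-of-eclipse level
    dip = baseline - flux[in_sec].mean()
    # significant dip at phase 0.5 -> flag as eclipsing binary
    return dip > n_sigma * scatter / np.sqrt(in_sec.sum())
```

Comparing against windows at phases 0.25 and 0.75 keeps both the primary eclipse and the candidate secondary out of the baseline estimate.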

The honest failure:

Model scores near zero on active/variable stars. Starspots create brightness variations that completely drown out the planet signal. Spent Week 9 figuring out why, then documented it fully rather than hiding it.

Wild-data AUC dropped from 0.9628 → 0.6933 on real stars. Competition data is cleaner than reality. That gap is the most important thing I learned.
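For anyone unfamiliar with the headline metric: ROC AUC is the probability that a randomly chosen planet gets a higher score than a randomly chosen non-planet, which is why 0.69 on wild data means the ranking barely beats chance. A minimal rank-based implementation (ties not rank-averaged, for brevity; `sklearn.metrics.roc_auc_score` is the usual tool):

```python
import numpy as np

def auc(scores, labels):
    """ROC AUC via the Mann-Whitney U statistic: the probability that a
    random positive outranks a random negative. Assumes distinct scores."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    order = scores.argsort()
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)      # rank 1 = lowest score
    n_pos, n_neg = labels.sum(), (~labels).sum()
    return (ranks[labels].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
```

AUC = 0.5 is a coin flip, 1.0 is perfect separation, so 0.9628 → 0.6933 is a large real-world drop.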

Week by week:

1 → Dataset exploration (150k+ light curves)
2 → Preprocessing pipeline
3 → Baseline models (logistic regression, MLP)
4 → First 1D-CNN
5 → Data augmentation
6 → Final model - AUC 0.9628
7 → Wild data evaluation - found the 28% FPR problem
8 → Threshold calibration + EB filter → FPR 0%
9 → Broader catalog - found the variability wall
10 → Built and deployed the Streamlit app

Stack: TensorFlow · lightkurve · NumPy · SciPy · Streamlit

Links in first comment. Happy to answer anything about the architecture, preprocessing, or EB rejection pipeline!

u/Mann-Bhatt — 23 hours ago