u/jajohn99

Is anyone getting big into agentic feature/model experimentation? Automating these pipelines is unlocking whole new worlds.

Been building an autonomous energy-demand forecasting research harness and curious if anyone here has gone deep on agentic/automated feature experimentation.

Current setup:
- NSW electricity demand forecasting
- weather + historical demand features
- rolling walk-forward validation
- Modal running large parallel experiment sweeps
- leaderboard + automatic scoring against fixed baselines

Right now the system is good at:
- model/config sweeps
- backtesting
- evaluation
- calibration

But I’m now moving toward automated feature generation/proposal.

The rough idea:
- LLM proposes feature sets/interactions/lags/transforms
- deterministic harness builds + evaluates them
- only improvements get promoted into the leaderboard

Examples:
- temp × humidity interactions
- lag structures
- rolling weather anomalies
- calendar effects
- weather regime features
- demand ramp features

I’m trying to avoid:
- leakage
- overfitting the leaderboard
- combinatorial garbage feature spam
- “LLM generated alpha soup”

Curious if anyone here has:
- done autonomous feature research seriously
- used agents for forecasting/model discovery
- built good constraints/DSLs around feature generation
- thoughts on how much value is actually there vs brute force + human intuition

Feels like forecasting is unusually well-suited to autonomous experimentation because the scoring loop is so clean.

Background: Australia curtails meaningful amounts of wind and solar every week. AEMO's quarterly reports show curtailment moving with forecast quality, network constraints, and price conditions. Battery and VPP operators bid days ahead on price forecasts, which depend on demand forecasts, which depend on weather. When the forecasts are off, curtailment rises, dispatch decisions miss, and the transition gets more expensive than it needs to be.

What works today: every retailer, every battery operator, every wind farm developer runs internal forecasting models. Commercial weather-to-power vendors sell proprietary outputs. AEMO publishes its own demand and renewable generation forecasts in the ESOO and ISP.

We can't easily tell which forecasting approaches actually work on Australian outcomes vs which only sound good in slide decks.

So I built OracleBook — paper-trading infrastructure where forecasters (mostly AI agents, some human teams) submit probabilistic forecasts on Australian weather and energy outcomes (BOM rainfall, AEMO power), trade against each other in an order book, and get scored against canonical settlements. The full record is public.

Running a two-week open competition starting 1 June. The whole point is to surface, in public, which forecasting approaches actually work on Australian outcomes — including the variables that matter to renewables operators: solar-zone weather, regional demand peaks, wind generation by zone.

Three questions I'd genuinely value this sub's view on:

Would a public forecasting record actually help renewables integration? Or do the operators who matter already have what they need internally, and the gap is in policy/transmission, not forecasting?
What outcomes would be most valuable to score? Currently: weekly rainfall, daily/weekly power index by region, price spikes. Worth adding solar generation by zone, wind by zone, REZ-level demand?
Anyone working on curtailment-aware bidding or VPP optimization? Genuinely curious whether sharper public forecasts would change the numbers in your dispatch model — or whether the bottleneck is somewhere else entirely.

If you want to compete (or know quants, weather modelers, or energy desks who would): get in touch. Industry support — sponsorship, judging, amplification — DM or comment. Critique with no commitment is welcome too.

Could better forecasting cut curtailment? Built a public scoring system for Australian weather/energy markets — 2-week open competition starts 1 June.