Is anyone getting seriously into agentic feature/model experimentation? Automating these pipelines is unlocking whole new worlds.
Been building an autonomous energy-demand forecasting research harness, and I'm curious whether anyone here has gone deep on agentic/automated feature experimentation.
Current setup:
- NSW electricity demand forecasting
- weather + historical demand features
- rolling walk-forward validation
- large parallel experiment sweeps running on Modal
- leaderboard + automatic scoring against fixed baselines
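For context, the walk-forward piece is roughly this shape (a minimal sketch; the function and parameter names are mine, not the actual harness):

```python
import numpy as np

def walk_forward_splits(n_obs, train_window, horizon, step):
    """Yield (train_idx, test_idx) pairs for rolling walk-forward validation.

    Each split trains on a fixed-length trailing window and scores on the
    `horizon` observations immediately after it, then rolls forward by `step`,
    so the model is only ever evaluated on data it hasn't seen.
    """
    start = 0
    while start + train_window + horizon <= n_obs:
        train_idx = np.arange(start, start + train_window)
        test_idx = np.arange(start + train_window, start + train_window + horizon)
        yield train_idx, test_idx
        start += step

# e.g. hourly data: train on one year, score the next week, roll forward weekly
splits = list(walk_forward_splits(n_obs=24 * 365 * 2,
                                  train_window=24 * 365,
                                  horizon=24 * 7,
                                  step=24 * 7))
```

Every candidate feature set gets scored through the same splits, which is what makes the leaderboard comparisons apples-to-apples.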
Right now the system is good at:
- model/config sweeps
- backtesting
- evaluation
- calibration
But I’m now moving toward automated feature generation/proposal.
The rough idea:
- LLM proposes feature sets/interactions/lags/transforms
- deterministic harness builds + evaluates them
- only improvements get promoted into the leaderboard
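The loop I have in mind is basically this (a sketch with hypothetical names; `propose_features` stands in for the LLM call and `evaluate` for the deterministic backtest):

```python
def run_round(propose_features, evaluate, leaderboard, baseline_score, margin=0.002):
    """One round: LLM proposes feature specs, the deterministic harness
    evaluates each one, and only candidates that beat the current best
    by a relative margin get promoted onto the leaderboard.

    Scores are errors (e.g. walk-forward MAE), so lower is better.
    """
    best = min(leaderboard.values(), default=baseline_score)
    for spec in propose_features():          # LLM output: a list of feature specs
        score = evaluate(spec)               # deterministic walk-forward backtest
        if score < best * (1 - margin):      # require a real improvement, not noise
            leaderboard[spec] = score
            best = score
    return leaderboard

# toy usage: two real improvements promoted, one regression rejected
scores = {"lag_24": 0.95, "temp_x_humidity": 0.90, "junk": 1.2}
board = run_round(lambda: list(scores), scores.get, {}, baseline_score=1.0)
```

The margin threshold is doing real work here: without it, the LLM can grind out hundreds of statistically meaningless "wins" that are pure backtest noise.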
Examples:
- temp × humidity interactions
- lag structures
- rolling weather anomalies
- calendar effects
- weather regime features
- demand ramp features
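Concretely, given a pandas frame of hourly demand + weather, those families look something like this (column names are assumed; everything uses shifts, diffs, or trailing windows, so only past data leaks into each row):

```python
import pandas as pd

def build_features(df):
    """df: DataFrame on an hourly DatetimeIndex with 'demand', 'temp', 'humidity'.

    Returns a copy with one example feature from each family above.
    """
    out = df.copy()
    # interaction: hot + humid hours drive AC load nonlinearly
    out["temp_x_humidity"] = out["temp"] * out["humidity"]
    # lag structure: same hour yesterday and same hour last week
    for lag in (24, 168):
        out[f"demand_lag_{lag}"] = out["demand"].shift(lag)
    # rolling weather anomaly: deviation from the trailing 7-day mean
    out["temp_anomaly_7d"] = out["temp"] - out["temp"].rolling(168).mean()
    # calendar effects
    out["hour"] = out.index.hour
    out["dow"] = out.index.dayofweek
    # demand ramp: hour-over-hour change
    out["demand_ramp_1h"] = out["demand"].diff(1)
    return out
```

The point of constraining the LLM to compose from primitives like `shift`, `diff`, and trailing `rolling` is that leakage-safety becomes a property of the primitives rather than something you have to audit per feature.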
I’m trying to avoid:
- leakage
- overfitting the leaderboard
- combinatorial garbage feature spam
- “LLM-generated alpha soup”
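For leaderboard overfitting specifically, the direction I'm leaning is a two-stage gate: candidates compete on the validation folds, but promotion is only confirmed on a holdout whose scores the proposer never sees. A sketch (hypothetical names; `val_score`/`holdout_score` stand in for the two backtests):

```python
def gated_promotion(candidates, val_score, holdout_score,
                    val_best, holdout_best, margin=0.002):
    """Promote only candidates that beat validation by a margin AND confirm
    on a never-reported holdout, so the proposer can't hill-climb the
    leaderboard. Scores are errors: lower is better.
    """
    promoted = []
    for spec in candidates:
        if val_score(spec) >= val_best * (1 - margin):
            continue                              # didn't beat validation: reject early
        if holdout_score(spec) < holdout_best:    # confirm on untouched data
            promoted.append(spec)
    return promoted
```

Since the holdout numbers are never fed back to the LLM, the only way a candidate gets through both gates is by capturing genuine structure rather than fold-specific noise.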
Curious if anyone here has:
- done autonomous feature research seriously
- used agents for forecasting/model discovery
- built good constraints/DSLs around feature generation
- thoughts on how much value is actually there vs brute force + human intuition
Feels like forecasting is unusually well-suited to autonomous experimentation because the scoring loop is so clean.