u/Sweaty-Stop6057

Postcode is one of the most underrated features in modelling

One thing that has consistently surprised me across different companies is how strong postcode features tend to be in models.

At first glance it seems odd that postcode is so predictive (it's "just geography"), but then it clicks: people tend to live among somewhat like-minded people, and visible area-level behaviours often correlate well with the individual behaviours we're interested in.

The features captured for each postcode, such as

  • demographics
  • deprivation
  • housing characteristics
  • crime exposure
  • transport access
  • general behaviour patterns

are proxies for behaviours that are hard to observe directly: renewal propensities, fraud, risk.
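As a minimal sketch of how these area-level features typically reach a model, the snippet below joins a postcode-level lookup onto individual records. It assumes UK-style postcodes (an outward code, a space, then a three-character inward code). The feature names, values, and the `enrich` helper are hypothetical. Falling back to the outward code (postcode district) when the full postcode isn't in the lookup is one common choice, not the only one.

```python
from typing import Dict

# Hypothetical feature names and illustrative (made-up) values.
FEATURE_NAMES = ("deprivation_decile", "crime_rate")
POSTCODE_FEATURES: Dict[str, Dict[str, float]] = {
    "SW1A 1AA": {"deprivation_decile": 9.0, "crime_rate": 0.12},
}
# Hypothetical district-level (outward code) aggregates used as a fallback.
DISTRICT_FEATURES: Dict[str, Dict[str, float]] = {
    "SW1A": {"deprivation_decile": 8.0, "crime_rate": 0.15},
}


def normalise_postcode(raw: str) -> str:
    """Uppercase, drop all whitespace, then re-insert one space before the
    three-character inward code (assumes UK-style postcodes)."""
    compact = "".join(raw.split()).upper()
    return f"{compact[:-3]} {compact[-3:]}"


def enrich(record: dict) -> dict:
    """Return a copy of `record` with area-level features attached.

    Tries the full postcode first, then the district. If neither is found,
    features are set to None so missingness stays explicit."""
    pc = normalise_postcode(record["postcode"])
    district = pc.split(" ")[0]
    if pc in POSTCODE_FEATURES:
        feats, level = POSTCODE_FEATURES[pc], "postcode"
    elif district in DISTRICT_FEATURES:
        feats, level = DISTRICT_FEATURES[district], "district"
    else:
        feats, level = {}, None
    out = dict(record)
    out["geo_level"] = level  # which resolution the features came from
    for name in FEATURE_NAMES:
        out[name] = feats.get(name)
    return out
```

Keeping the `geo_level` flag lets the model (and whoever audits it) distinguish exact postcode matches from coarser fallbacks.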

Another reason it's underrated is that postcode data is rarely "done properly". It's often:

  • built once and never updated
  • very incomplete
  • or treated as a static lookup rather than something that evolves over time
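One way to avoid the static-lookup trap is to keep dated snapshots of each postcode's features and join them as of each record's date. That way a model trained on older records doesn't see area statistics published later. Below is a minimal sketch, assuming snapshots are stored per postcode in a list sorted by effective date. The data and the `features_as_of` helper are hypothetical.

```python
import bisect
from datetime import date
from typing import Dict, List, Optional, Tuple

# Hypothetical versioned store: postcode -> list of (effective_date, features),
# sorted by effective_date ascending. Values are illustrative only.
Snapshot = Tuple[date, Dict[str, float]]
HISTORY: Dict[str, List[Snapshot]] = {
    "SW1A 1AA": [
        (date(2019, 1, 1), {"crime_rate": 0.10}),
        (date(2021, 1, 1), {"crime_rate": 0.14}),
        (date(2023, 1, 1), {"crime_rate": 0.12}),
    ],
}


def features_as_of(postcode: str, as_of: date) -> Optional[Dict[str, float]]:
    """Return the latest snapshot whose effective date is on or before `as_of`.

    Only snapshots available at `as_of` are considered, which avoids leaking
    future area-level information into training rows."""
    snapshots = HISTORY.get(postcode)
    if not snapshots:
        return None  # postcode not covered at all
    dates = [d for d, _ in snapshots]
    # bisect_right returns the position after any equal dates, so idx - 1 is
    # the last snapshot with effective_date <= as_of.
    idx = bisect.bisect_right(dates, as_of)
    if idx == 0:
        return None  # record predates every snapshot
    return snapshots[idx - 1][1]
```

The same as-of logic applies whether the store is a Python dict, a database table, or a dataframe join. The key design choice is that each training row only sees what was known at its own date.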

Of course, there are important considerations around fairness and bias here, since geographic features can correlate with socio-economic factors. In practice, how these features are used depends heavily on the application and regulatory context.

Curious how others are handling this. Do you tend to use postcode features, or do they get deprioritised?

reddit.com