
Optimizing Chained strcmp Calls for Speed and Clarity - From memcmp and bloom filters to 4CC encoding for small fixed-length string comparisons
I've been working on an article describing a small performance issue with a pattern I've seen multiple times - a long chain of if statements based on strcmp. This is the equivalent of a switch/case on strings (which C does not support).
    bool model_ccy_lookup(const char *s, int asof, struct model_param *param)
    {
        // Major Currencies
        if ( strcmp(s, "USD") == 0 || strcmp(s, "EUR") == 0 || ...) {
            ...
        // Asia-Core
        } else if ( strcmp(s, "CNY") == 0 || strcmp(s, "HKD") == 0 || ... ) {
            ...
        } else if ( ... ) {
            ...
        } else {
            ...
        }
    }
The code couldn’t be refactored into a different structure (for non-technical reasons), so I had to explore a few approaches that keep the existing structure, without rewriting or reshaping the logic. I tried a few things - memcmp, small filters, and eventually packing the strings into 32-bit values (“4CC”-style) and letting the compiler work with integer compares.
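To give a rough idea of what the 32-bit packing looks like, here is a minimal sketch. It is not the code from the article: the names FOURCC, pack4, ccy_group_of and the enum are made up for illustration, and it assumes the keys are short codes such as the 3-letter currencies in the snippet above. Each string is packed into a uint32_t once, and the compiler is then free to turn the switch into integer compares.

```c
#include <stdint.h>

/* Hypothetical helper: pack four bytes into one 32-bit value.
 * Built from shifts only, so the result does not depend on endianness,
 * and it is an integer constant expression usable in case labels. */
#define FOURCC(a, b, c, d)                          \
    ((uint32_t)(unsigned char)(a)         |         \
     ((uint32_t)(unsigned char)(b) << 8)  |         \
     ((uint32_t)(unsigned char)(c) << 16) |         \
     ((uint32_t)(unsigned char)(d) << 24))

/* Hypothetical helper: pack the first (up to) four characters of a
 * NUL-terminated string, stopping at the terminator. Unused high bytes
 * stay zero, so "USD" packs to the same value as FOURCC('U','S','D',0).
 * A longer input such as "USDX" has a non-zero fourth byte and therefore
 * never matches a 3-letter code. */
static uint32_t pack4(const char *s)
{
    uint32_t v = 0;
    for (int i = 0; i < 4 && s[i] != '\0'; i++)
        v |= (uint32_t)(unsigned char)s[i] << (8 * i);
    return v;
}

/* Illustrative grouping only; the real function fills a model_param. */
enum ccy_group { CCY_UNKNOWN = 0, CCY_MAJOR, CCY_ASIA_CORE };

static enum ccy_group ccy_group_of(const char *s)
{
    switch (pack4(s)) {
    /* Major Currencies */
    case FOURCC('U', 'S', 'D', 0):
    case FOURCC('E', 'U', 'R', 0):
        return CCY_MAJOR;
    /* Asia-Core */
    case FOURCC('C', 'N', 'Y', 0):
    case FOURCC('H', 'K', 'D', 0):
        return CCY_ASIA_CORE;
    default:
        return CCY_UNKNOWN;
    }
}
```

The switch keeps the same grouping as the if/else chain, so the shape of the original logic is preserved. One caveat of this sketch: pack4 only looks at the first four bytes, so it is only safe when every key in the table is at most three characters long (or exactly four with a separate length check).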
Sharing in the hope that other readers may find the ideas/process useful.
The article is on Medium (no paywall): Optimizing Chained strcmp Calls for Speed and Clarity.
I’m also trying a slightly different writing style than usual - a bit more narrative, focusing on the path (including the dead ends), not just the final result.
If you have a few minutes, I’d really appreciate feedback on two things:
- Does the technical content hold up?
- Is the presentation clear, or does it feel too long / indirect?
Also interested to hear other ideas/approaches for this problem.