u/Yairlenga

Optimizing Chained strcmp Calls for Speed and Clarity - From memcmp and bloom filters to 4CC encoding for small fixed-length string comparisons

Optimizing Chained strcmp Calls for Speed and Clarity - From memcmp and bloom filters to 4CC encoding for small fixed-length string comparisons

I've been working on an article to describe a small performance issues with a pattern I've seen multiple times - long chain of if statements based on strcmp. This is the equivalent of switch/case on string (which is not supported in C).

bool model_ccy_lookup(const char *s, int asof, struct model_param *param)
{
    // Major Currencies
    if ( strcmp(s, "USD") == 0 || strcmp(s, "EUR") == 0 || ...) {
        ...
    // Asia-Core
    } else if ( strcmp(s, "CNY") == 0 || strcmp(s, "HKD") == 0 || ... ) {
        ...
    } else if ( ... ) {
        ...
    } else {
        ...
    }
} 

The code couldn’t be refactored into a different structure (for non-technical reasons), so I had to explore few approaches to keep the existing structure - without rewrite/reshape of the logic. I tried few tings - like memcmp, small filters, and eventually packing the strings into 32-bit values (“4CC”-style) and letting the compiler work with integer compares.

Sharing in the hope that other readers may find the ideas/process useful.

The article is on Medium (no paywall): Optimizing Chained strcmp Calls for Speed and Clarity.

I’m also trying a slightly different writing style than usual - a bit more narrative, focusing on the path (including the dead ends), not just the final result.

If you have a few minutes, I’d really appreciate feedback on two things:

  • Does the technical content hold up?
  • Is the presentation clear, or does it feel too long / indirect?

Interested to hear on other ideas/approach for this problem as well.

medium.com
u/Yairlenga — 9 days ago