The problem. You have a dataset of B rows. Each row is an input x (n numbers) and a measured response y (one number). You want to find the rows with the smallest and largest y.
Step 1: Lift. Take each row (x, y) and place it as a point p in (n+1)-dimensional space, choosing the position so that the distance from the origin equals R + y (where R is a large positive offset, big enough that R + y > 0 for every row). After this step:
- big y → point is far from origin
- small y → point is close to origin
- "compare two y values" is now the same as "compare two distances to origin"
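The lift step can be sketched in a few lines. The repo doesn't specify its exact embedding, so this uses one simple choice (an assumption): append a constant coordinate to x, then rescale the whole vector to the target radius R + y.

```python
import numpy as np

def lift(x, y, R=100.0):
    """Lift a row (x, y) to a point p in (n+1)-dim space with ||p|| = R + y.
    Embedding is an assumption: append 1 to x (so the direction is never the
    zero vector), then rescale to radius R + y."""
    p = np.append(x, 1.0)
    return p * (R + y) / np.linalg.norm(p)

x = np.array([3.0, 4.0])
p = lift(x, y=2.5, R=100.0)
print(np.linalg.norm(p))  # ≈ 102.5, i.e. R + y
```

Any embedding works as long as the resulting norm is exactly R + y; the choice above just keeps the original x recoverable by rescaling back.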
Step 2: Fold. Push every point through a sequence of Householder reflections (mirror-flips across hyperplanes). The key property: a reflection preserves the distance to the origin. So no matter how many times you fold, each point stays on its own hypersphere of radius R + y. The fold sequence is precomputed once for each input dimension n and stored in data_householder/dim_NNN.npz.
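The key property of Step 2 is easy to verify directly: a Householder reflection H = I − 2vvᵀ (for unit v) is orthogonal, so it preserves every point's distance to the origin. A minimal sketch (the actual fold sequence lives in `data_householder/dim_NNN.npz`; the reflections below are random stand-ins):

```python
import numpy as np

def householder(v):
    """Householder reflection matrix H = I - 2 vv^T / (v^T v).
    H is orthogonal, so ||H p|| = ||p|| for every point p."""
    v = v / np.linalg.norm(v)
    return np.eye(len(v)) - 2.0 * np.outer(v, v)

rng = np.random.default_rng(0)
p = rng.normal(size=3)
# Two successive folds (stand-ins for the precomputed sequence)
H1 = householder(rng.normal(size=3))
H2 = householder(rng.normal(size=3))
folded = H2 @ (H1 @ p)
print(np.linalg.norm(p), np.linalg.norm(folded))  # equal up to float error
```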
Step 3: Read off. Compute ||folded_p|| for each row. The row with the smallest norm is argmin(y); the row with the largest norm is argmax(y). Since the folds preserve the norm, ||folded_p|| = R + y for every row.
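Putting Steps 1–3 together end to end: lift every row, fold, take norms, and the argmin/argmax of the norms matches the argmin/argmax of y. A self-contained sketch (the lift embedding and the single random fold are assumptions; the repo precomputes its fold sequence):

```python
import numpy as np

rng = np.random.default_rng(1)
R = 100.0
X = rng.normal(size=(5, 3))                    # 5 rows, n = 3 inputs each
y = rng.normal(size=5)                         # 5 measured responses

# Lift: scale (x, 1) so each point's norm is R + y (one possible embedding)
P = np.hstack([X, np.ones((5, 1))])
P *= ((R + y) / np.linalg.norm(P, axis=1))[:, None]

# Fold: one Householder reflection applied to every point
v = rng.normal(size=4)
v /= np.linalg.norm(v)
H = np.eye(4) - 2.0 * np.outer(v, v)
folded = P @ H.T

# Read off: norms survive the fold, so they equal R + y per row
norms = np.linalg.norm(folded, axis=1)
assert np.argmin(norms) == np.argmin(y)
assert np.argmax(norms) == np.argmax(y)
```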
Step 4: Recover the original row. You now know the value of R + y for the optimum, but you still need which row it came from. A hashmap built ahead of time maps quantized R + y values back to row indices.
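The recovery map can be sketched as a plain dict keyed on the quantized radius. The quantization step `q` below is an assumption (the repo's choice isn't stated), and note that rows with equal y values would collide, which is one reason the fallback in Step 5 exists:

```python
import numpy as np

R = 100.0
y = np.array([0.31, -1.24, 4.73, 2.18])
q = 1e-6  # quantization step -- an assumption, not the repo's value

# Built ahead of time: quantized (R + y) -> row index
lookup = {round((R + yi) / q): i for i, yi in enumerate(y)}

best_norm = R + y.max()            # the value read off after the folds
idx = lookup.get(round(best_norm / q))
print(idx)                         # 2, the argmax row
```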
Step 5: Fallback. Floating-point drift can occasionally make the hash lookup miss. When that happens, fall back to a recursive split: chop the dataset in half and recurse until the optimum row is unambiguously located.
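The fallback can be sketched as a textbook divide-and-conquer: split the index range in half, recurse into each half, and keep the better of the two winners. This is an illustrative sketch, not the repo's implementation:

```python
def argmin_fallback(y, lo=0, hi=None):
    """Recursive-split fallback: halve the index range and recurse until a
    single row remains, then compare the two halves' winners on the way up."""
    if hi is None:
        hi = len(y)
    if hi - lo == 1:
        return lo                      # one row left: unambiguous
    mid = (lo + hi) // 2
    left = argmin_fallback(y, lo, mid)
    right = argmin_fallback(y, mid, hi)
    return left if y[left] <= y[right] else right

print(argmin_fallback([0.3, -1.2, 4.7, 2.1]))  # → 1
```

Unlike the hash lookup, this path compares y values directly, so floating-point drift in the folded norms can't make it miss.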
That's the whole pipeline: lift → fold → norm → recover (→ fallback if needed).
Repository: https://github.com/BoolHak/pizza-fold