How to balance shader work vs bandwidth?
I am working on a small scale voxel engine and currently just trying to push rendering distance to its absolute limits.
One of the optimisations I hear often is reducing the amount of data sent to the GPU. So I reduced my vertex buffer 7x to 4 bytes (32 bits) by storing local chunk coordinates instead of float global coord, packing normal vector into first 3 bits of a byte (as it can only ever have 6 values) and using the rest for block type.
But the work I had to do in a shader to decode those values ended up resulting in (slightly but still) worse performance than when sending all the data raw, at least on my high end GPU.
Is there a rule of thumb somewhere about how much to send vs what to delegate to a shader? Is less bandwidth always better or does it only start to become an issue once you reach certain amount of data sent? Is this balance any different on lower end GPUs, and I will feel the optimisation if I benchmark on a different machine?
Sorry if the question is stipud, I’m just a beginner.