
Basically, the database is full of the most common phrases, each paired with a unique ID. On average it seems like I can compress a message to about half its size. I wasn't really aiming for this. I was just trying to make a code book, and this was a byproduct that I thought might be interesting to share.
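To make the idea concrete, here is a minimal sketch of phrase-to-ID substitution. It is not my actual database: the three phrases, the `CODEBOOK` name, and the `ESC` marker byte are all made up for illustration, and it assumes the marker character never shows up in a normal message.

```python
# Hypothetical codebook: a stand-in for the real phrase database.
CODEBOOK = {
    "thank you very much": 0,
    "as soon as possible": 1,
    "let me know": 2,
}
PHRASES = {phrase_id: phrase for phrase, phrase_id in CODEBOOK.items()}

# Assumed marker that never appears in ordinary message text.
ESC = "\x00"


def encode(message: str) -> str:
    """Replace every known phrase with ESC followed by a one-character ID."""
    # Longest phrases first, so a short phrase cannot break up a longer one.
    for phrase in sorted(CODEBOOK, key=len, reverse=True):
        # IDs are shifted by 1 so the ID character is never ESC itself.
        message = message.replace(phrase, ESC + chr(CODEBOOK[phrase] + 1))
    return message


def decode(encoded: str) -> str:
    """Expand each ESC + ID pair back into its phrase."""
    out = []
    i = 0
    while i < len(encoded):
        if encoded[i] == ESC:
            out.append(PHRASES[ord(encoded[i + 1]) - 1])
            i += 2  # skip the marker and the ID character
        else:
            out.append(encoded[i])
            i += 1
    return "".join(out)


if __name__ == "__main__":
    msg = "thank you very much, let me know as soon as possible"
    enc = encode(msg)
    print(len(msg), "->", len(enc), "characters")
    print(decode(enc) == msg)
```

How much this saves depends entirely on how often the codebook phrases actually appear in the message, and both sides need the same codebook to decode.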
But it got me thinking: what's the highest compression ratio we can currently get on text?