Getting the length of a string seems simple and is something we do in our code every day. Limiting the length of a string is also extremely common in both frontend and backend code. But both of those actions – especially length limiting – hide a lot of complexity, bug risk, and even security vulnerabilities. In this post, we’re going to examine string length limiting deeply enough to help us fully grok what it means when we do it and how best to do it… and discover that the best still isn’t great.
A TL;DR misses the “fully grok” part, but not everyone has time to read everything, so here are the key takeaways:
- Be aware that there are different ways of measuring string length. (See the example after this list.)
- Really understand how your programming language stores strings in memory, exposes them to you, and determines string length.
- Make an intentional decision about how you’re going to count characters when limiting string length.
- Look carefully at how the “max length” features provided by your language (framework, etc.) actually work. There’s a very good chance that they do not match the limiting method you chose.
- Make sure you use that same counting method across all the layers of your architecture.
- Probably limit by counting normalized Unicode code points, like Google recommends. (A sketch of that approach follows below.)
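
To make that first point concrete, here is one way the "different ways of measuring" problem shows up, illustrated in TypeScript. The exact counts assume a UTF-16 string representation and a runtime that supports `Intl.Segmenter` (modern browsers, Node 16+); the emoji is just a convenient worst case.

```typescript
const s = "🤦🏼‍♂️"; // facepalm + skin tone + ZWJ + male sign + variation selector

// UTF-16 code units — what JavaScript's .length reports
console.log(s.length);                            // 7

// Unicode code points — what iterating/spreading a string yields
console.log([...s].length);                       // 5

// UTF-8 bytes — what many databases and wire formats count
console.log(new TextEncoder().encode(s).length);  // 17

// Grapheme clusters — what a user would call "one character"
const seg = new Intl.Segmenter("en", { granularity: "grapheme" });
console.log([...seg.segment(s)].length);          // 1
```

Four different measurements of the same string, and any one of them might be what a given "max length" check is actually counting.

And for the last point, here is a minimal sketch of what "limit by counting normalized Unicode code points" could look like. `limitLength` is an illustrative name, not an API from any library, and this sketch deliberately ignores the separate question of whether truncation may split a grapheme cluster.

```typescript
// Hypothetical helper: cap a string at `max` Unicode code points,
// counted after NFC normalization.
function limitLength(input: string, max: number): string {
  const normalized = input.normalize("NFC"); // canonical composition first
  const codePoints = [...normalized];        // iterate by code point, not code unit
  if (codePoints.length <= max) return normalized;
  return codePoints.slice(0, max).join("");  // caveat: may still cut inside a grapheme cluster
}

// "café" passes a 4-code-point limit whether the input used the precomposed
// é or e plus a combining accent, because we normalize before counting.
console.log(limitLength("cafe\u0301", 4));   // "café"
```

The point of normalizing first is that two inputs a user would consider identical get the same count, and the same truncation result, regardless of how they happened to be encoded.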