
Localization at Reddit: Developing for a Global Audience
Written by Cláudio Ribeiro u/EmeraldMacaw
TL;DR: Considering localization (L10n) at the inception of an online product isn't just a “nice to have”: it helps beyond translations by keeping the code cleaner, improving the UI's flexibility, and making sure the text content is top-notch.
Oftentimes, when a new online product is released, translation is treated like a future problem. It seems logical to say “I'll come back and fix it once we've scaled.” This happens often with software created by companies focused on a local market. But including localization from the beginning helps beyond reaching more users: it makes the code more readable and helps ensure text displays as intended everywhere.
Localizing after a product is out can be compared to making a fuel car electric, or trying to restyle a subreddit after millions of users have already gotten used to it. The effort required to retroactively localize is the most compelling reason to not leave it as an afterthought. Take Reddit, for example: our first attempt at localizing Old Reddit was crowdsourced and loosely supervised, which created an inconsistent experience and incomplete translations. Contributors also lacked the necessary context and visual aids to get the work done. In the end, few people used the localized versions and Reddit remained an English-first platform. (Though I must recognize the Pirate English version was pure gold.)
Once it became more noticeable that more and more people from different backgrounds and origins were browsing, contributing and creating, Reddit began working on a localized, globalized heart of the internet. Our first attempts were timid (volunteer translators commenting their suggestions in threads that contained the source strings), but we’ve matured our approach. We’ve implemented a translation management system (TMS) and are developing code in ways that keep localization in mind. Reddit now offers translations into 35 languages from 33 countries and supports 7 different alphabets that are used by millions of users.
Not surprisingly, we faced some setbacks before we got to where we are today: alphabets that wouldn't render, translations that weren't 100% adequate (as reviewers couldn't edit them), truncated text where the UI lacked room, untranslatable content (try translating the Tragedy of Darth Plagueis the Wise…), a mess with genders, plurals, and syntax, etc. These were difficult challenges to overcome, and we learned lessons along the way.
On that note, I’d like to share some of them with you. Below we’ll focus on some key aspects that illustrate the pros of pre-planning and how to get the house in order: accessibility, design, content review, time-to-market, data analysis, quality assurance, and code maintenance.
Accessibility
One of the first places where localization proves its worth is adding descriptions to images, buttons, and options (they even have their own writing style to be most useful to the end-user) to make a platform more accessible. Localizing the website is still relevant even when there are no ambitions to expand the brand abroad, as it's essentially “localizing” for users with impaired vision.
By making sure accessibility is implemented, a company can access a market within their own domain, and it becomes easier to localize into other languages. Accessibility is a way of “localizing” for users who need it; it can involve other writing and communication systems, such as braille and sign language.
At Reddit, focusing on accessibility was a game changer. We improved our apps to include those with impaired vision, which allowed us to better serve our existing users and to remain inclusive when we entered new markets with new languages.
Content descriptions can provide translations to screen readers, too
When it comes to a product's design, localization can also help in less obvious ways. English is a “short” language, meaning it doesn't take much space and can express a lot of information without a lot of characters. This makes it easy to fit into tiny spaces, but other languages can take up way more space (up to 40% more than English in some cases) and that can often break the UI for users of longer languages.
This is where pseudo-localization comes in: it can run in design tools (Figma, Sketch, Penpot) and in the code, artificially expanding each English word by 20 to 40% of its length in a random distribution, allowing designers to account for the most expansive languages without compromising the original content. It's like using “banana for scale” for buttons. Using pseudo-localization to design products improves the overall experience by preemptively ensuring the UI is comfortable to use in any resolution and language.
Pseudo-localized text in English
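To make the idea concrete, here is a minimal pseudo-localization sketch in Python. It is not the tooling Reddit uses: the accent map, the `~` padding character, the bracket markers, and the fixed `expansion` ratio (instead of a random 20 to 40% per word) are all assumptions chosen to keep the example small and deterministic. Placeholders such as `{count}` are left untouched so the string can still be formatted.

```python
import math
import re

# Hypothetical accent map; real tools use much larger tables.
ACCENTS = str.maketrans("aeiouAEIOUcn", "áéíóúÁÉÍÓÚçñ")

# Matches simple placeholders such as {count}. The capture group makes
# re.split() keep the placeholders in its result.
PLACEHOLDER = re.compile(r"(\{[^{}]*\})")


def pseudo_localize(text: str, expansion: float = 0.3) -> str:
    """Return an accented, padded copy of `text` wrapped in brackets.

    `expansion` is the share of extra characters to add (0.3 = 30%).
    """
    parts = PLACEHOLDER.split(text)
    # With one capture group, odd indexes hold the placeholders.
    accented = "".join(
        part if index % 2 else part.translate(ACCENTS)
        for index, part in enumerate(parts)
    )
    padding = "~" * math.ceil(len(text) * expansion)
    # Brackets make truncation visible: if "]" is missing, the UI cut the text.
    return f"[{accented}{padding}]"


print(pseudo_localize("Save"))                # [Sávé~~]
print(pseudo_localize("Save {count} posts"))  # [Sávé {count} pósts~~~~~~]
```

The accents also act as a quick rendering test for non-ASCII characters, and any string that shows up in the UI without brackets was probably hardcoded.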
Cross-Functional Collaboration
When localization is introduced into a product's development lifecycle, it can mean that dozens, if not hundreds, of extra specialized eyes will carefully inspect each word being written by different teams, for different projects, at different times. In order to do their best work, translators interpret the context and the intention, so they can preserve the tone in their language. Their reading is way more focused than the general audience's, especially on a platform like Reddit, which has a truly unique personality and tone.
Translators are professional linguists who need to read, interpret, and fully understand each string we publish, be it in our product itself, in the Reddit Help Center, or even in marketing material. This greatly amplifies Reddit's capacity to fix typos, outdated content, inconsistent experiences, and so on, as translators need to pay more attention to the source and often find errors a casual reader might miss.
A set of corrections spotted by linguists in Reddit's Help Center
Linguist eyes bring an extra level of polish to what we write and make our content even more slick and “together,” which translates into trust with our users. A translated product name can sometimes even serve as inspiration for a company's naming conventions, and since each culture has its unique way to express itself, different perspectives can make what we say more universal and human, which is kind of our thing.
Linguists will navigate through the entire UI/UX to see the localized product in practice. This allows them to help engineering teams by finding issues that might have been overlooked, or that the regular user wouldn't bother to report. Any new feature release gets extra pairs of hands play testing the content. This, allied with the content review component, adds an extra layer of polish and results in a better overall user experience.
Sometimes an L10n bug will also help to improve existing English content
Localization Infrastructure
Caring for localization infrastructure helps keep content consistent, but also makes us more nimble when it comes to market expansion. Even if expanding into new markets seems like a distant dream, getting the structure ready from the get-go gives a company much more speed when deciding to launch in a new region.
Implementation of plurals in a string
Properly implementing plurals is important because many languages have more plural categories than just “one” and “other.” Arabic, for example, uses six: zero, one, two, few, many, and other.
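As a rough illustration of why “one, other” is not enough, the sketch below picks a plural category for whole numbers in three languages. The rules are transcribed by hand from the CLDR plural rules and only cover non-negative integers; the function names and the message table are hypothetical. In production code, a CLDR-backed library should supply these rules rather than a hand-written function.

```python
def plural_category(locale: str, n: int) -> str:
    """Return the CLDR plural category for a non-negative integer `n`.

    Hand-written sketch for three locales only; decimals are not handled.
    """
    if locale == "en":
        return "one" if n == 1 else "other"
    if locale == "ru":
        if n % 10 == 1 and n % 100 != 11:
            return "one"
        if 2 <= n % 10 <= 4 and not 12 <= n % 100 <= 14:
            return "few"
        return "many"
    if locale == "ar":
        if n == 0:
            return "zero"
        if n == 1:
            return "one"
        if n == 2:
            return "two"
        if 3 <= n % 100 <= 10:
            return "few"
        if 11 <= n % 100 <= 99:
            return "many"
        return "other"
    raise ValueError(f"No plural rules defined for locale {locale!r}")


# Hypothetical message table: one template per plural category.
MESSAGES = {
    "en": {"one": "{n} comment", "other": "{n} comments"},
    "ru": {
        "one": "{n} комментарий",
        "few": "{n} комментария",
        "many": "{n} комментариев",
    },
}


def format_count(locale: str, n: int) -> str:
    """Pick the template that matches the plural category and fill it in."""
    return MESSAGES[locale][plural_category(locale, n)].format(n=n)


print(format_count("en", 1))      # 1 comment
print(format_count("ru", 21))     # 21 комментарий
print(plural_category("ar", 11))  # many
```

Note that code built around `if n == 1` would get Russian wrong at 21 and has no way to express the Arabic “two” or “few” forms at all, which is the kind of redundant logic that has to be removed later.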
Having this infrastructure in place ensures that code is ready to connect to translation management tools when needed, and dramatically reduces the cost and time spent to get things in place for translators and engineers. The effort required to go back and finish an incomplete structure and remove redundant code when more challenging markets come aboard (e.g. multiple plurals, different characters, right-to-left orientation, etc.) will inevitably delay your go-to-market timeline in those markets. When Reddit introduced Arabic, addressing these concerns was critical to how we shaped our approach and launch strategy.
Reddit in Modern Standard Arabic
By creating strings with localization in mind, the code also becomes cleaner and string drift is avoided (i.e. we don't have the same word being spelled in three different ways in three different files). Centralizing all the product's strings means normalizing the storage of site content, which is a core tenet of good database and software design. We decoupled the management of translations from logic and reduced complexity and overhead in our code.
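A centralized catalog can be sketched in a few lines. This is an assumption about shape, not a description of Reddit's actual storage: the keys, the `CATALOG` dictionary, and the `translate` helper are hypothetical. The point is that UI code refers to a stable key, every locale's text lives in one place, and a missing translation falls back to the source language instead of failing.

```python
# Hypothetical in-memory catalog; a real one would be loaded from files
# or from a translation management system.
CATALOG = {
    "en": {
        "post.save": "Save",
        "post.share": "Share",
    },
    "pt-BR": {
        "post.save": "Salvar",
        # "post.share" is not translated yet.
    },
}


def translate(key: str, locale: str, default_locale: str = "en") -> str:
    """Look up `key` in `locale`, falling back to `default_locale`."""
    for candidate in (locale, default_locale):
        message = CATALOG.get(candidate, {}).get(key)
        if message is not None:
            return message
    # An unknown key is a programming error, so fail loudly.
    raise KeyError(f"Unknown string key: {key}")


print(translate("post.save", "pt-BR"))   # Salvar
print(translate("post.share", "pt-BR"))  # Share
```

Because the word “Save” exists exactly once in the source catalog, it cannot drift into three spellings across three files.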
Engineering with L10n in mind helps make the code cleaner, more readable, and robustly documented. It's easier to understand where any string gets inserted, makes changes simpler and safer (ensuring no hardcoded strings ever reach production), and paves the way for automated tests that can enforce best practices.
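One way such an automated test could look is a small static check that flags string literals passed straight to a UI function. The sketch below uses Python's standard `ast` module; `render_label` and `t` are hypothetical names standing in for whatever rendering and lookup functions a codebase really has, and a production linter would need to cover many more cases (f-strings, keyword arguments, methods).

```python
import ast


def find_hardcoded_labels(source: str, ui_function: str = "render_label") -> list[int]:
    """Return the line numbers where `ui_function` receives a string literal."""
    offenders = []
    for node in ast.walk(ast.parse(source)):
        is_ui_call = (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == ui_function
        )
        if not is_ui_call:
            continue
        for arg in node.args:
            # A literal string argument means the text bypassed the catalog.
            if isinstance(arg, ast.Constant) and isinstance(arg.value, str):
                offenders.append(node.lineno)
    return sorted(offenders)


# Line 1 hardcodes the text; line 2 goes through a lookup function.
SAMPLE = 'render_label("Save")\nrender_label(t("post.save"))\n'

print(find_hardcoded_labels(SAMPLE))  # [1]
```

Run in continuous integration, a check along these lines fails the build before a hardcoded string can reach production.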
After an online product is localized, bugs are squashed, and continuous rounds of testing are carried out to ensure nothing is broken and that translations follow the context, are easy to understand, and don't conflict with the UI. That’s a huge win for users and development teams.
Strings before descriptions have been added
Descriptions can also be helpful for engineers who might need to update a piece of code related to a specific string.
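Here is a minimal sketch of what a string with a description might look like in code, along with a check that flags strings still missing one. The `SourceString` class, its field names, and the JSON export are assumptions for illustration; translation management systems each define their own file formats.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class SourceString:
    key: str               # stable identifier used by the code
    text: str              # source (English) text
    description: str = ""  # context for translators and engineers


# Hypothetical strings; the second one has no description yet.
STRINGS = [
    SourceString(
        key="post.save",
        text="Save",
        description="Button label. Bookmarks a post so the user can find it later.",
    ),
    SourceString(key="comment.count", text="{count} comments"),
]


def missing_descriptions(strings: list[SourceString]) -> list[str]:
    """Return the keys of strings that have no description."""
    return [s.key for s in strings if not s.description.strip()]


def export_for_translators(strings: list[SourceString]) -> str:
    """Serialize strings, descriptions included, as JSON for a handoff."""
    return json.dumps([asdict(s) for s in strings], ensure_ascii=False, indent=2)


print(missing_descriptions(STRINGS))  # ['comment.count']
```

A word like “Save” is a good example of why context matters: without the description, a translator cannot tell whether it means bookmarking a post or storing a draft.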
L10n should be woven into every aspect
Localization ties together linguistics with development, influences marketing strategy, provides data for a coordinated expansion, encourages best practices, and is intimately intertwined with product development, whether it has been activated or not. That is why it can't be seen as a “plug-in” you can add at a later stage, but as a foundational layer that must be taken into account at the ideation stage. It will most certainly save you a headache in the future.
I'm new to L10n. Where should I start?
Check out the Unicode CLDR Project and find out how implementing a repository that takes care of dates, currencies, patterns, and measurements can also help in preventing bugs related to date, time, and locale.
Read about the ICU Message Format to learn how your strings can contain logic, plurals, and gender variants (this can be used even in English to personalize the user experience with “Mr.” and “Mrs.,” for example).
See the first steps to create localization-ready code in Python and Go.
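For a first taste in Python, the standard library's `gettext` module is enough to make code localization-ready before any translation exists. The sketch below is a minimal example under assumptions: the `messages` domain, the `locale` directory, and the `comment_label` helper are hypothetical names. With `fallback=True`, a missing catalog simply returns the source strings, so the code runs as-is and picks up translations once compiled catalogs are added.

```python
import gettext


def load_translations(language: str, localedir: str = "locale"):
    """Load the catalog for `language`, or fall back to source strings."""
    return gettext.translation(
        "messages",            # hypothetical domain name
        localedir=localedir,   # hypothetical directory of compiled catalogs
        languages=[language],
        fallback=True,         # no catalog found: return the source text
    )


translations = load_translations("pt_BR")
_ = translations.gettext


def comment_label(count: int) -> str:
    """Build a label whose plural form is chosen by the catalog."""
    template = translations.ngettext(
        "%(count)d comment", "%(count)d comments", count
    )
    return template % {"count": count}


print(_("Save"))
print(comment_label(1))
print(comment_label(3))
```

Wrapping strings in `_()` and `ngettext()` from day one costs very little, and it marks every user-facing string so extraction tools can find it later.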