
Adding Welsh language support to UKHPI

The UK House Price Index, UKHPI, is a set of statistics that records information on the change in house prices across England, Wales, Scotland and Northern Ireland over time. Produced collaboratively by HM Land Registry (HMLR), Registers of Scotland, Land and Property Services in Northern Ireland, and the Office for National Statistics, the figures are published each month on a website owned by HMLR. Epimorphics helped HMLR develop the user-facing applications that allow people to find house price statistics for their area, and we continue to run the website and the underlying data publishing platform. The underlying data is also available via SPARQL.

Because the UKHPI includes house price information for properties in Wales, it is of direct interest to Welsh speakers as well as English speakers. The Welsh Language Act (1993) requires public bodies to treat Welsh speakers and English speakers equally. Welsh localisation was a feature we had long wanted to add to the site, and recently we had the opportunity to do so.

UKHPI is primarily a Ruby on Rails application, with a JavaScript front end that handles the interactive generation of house price graphs, together with map-based selection of the locations the user wants to see data for. The front end uses the Vue.js application framework. Both Rails and Vue support “internationalisation” (often abbreviated to ‘i18n’): the process of arranging for a website to display in one or more alternative languages. However, for historical reasons, neither the Rails code nor the Vue code had been written to take advantage of these facilities, so this project was essentially one of retrofitting Welsh language support onto an existing service.

Of Message Catalogues

The basic mechanism that any programming framework uses to handle multiple languages is the message catalogue. This is essentially a lookup table: rather than writing a piece of text directly into the code, we look up the right value for it in the table. By changing the table, we change what the user sees.

So the non-internationalised code:

<h1>UK House Price Index</h1>

becomes:

<h1><%= I18n.t('common.header.app_title') %></h1>

The construct common.header.app_title is a key, specifying an entry in the lookup tables, which are different for English and Welsh:

# In config/locales/en.yml
en:
  common:
    header:
      app_title: "UK House Price Index"
# In config/locales/cy.yml
cy:
  common:
    header:
      app_title: "Mynegai Prisiau Tai y DU"

This is another application of the fundamental theorem of software engineering: “We can solve any problem by introducing an extra level of indirection.”
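To make the indirection concrete, here is a minimal plain-Ruby sketch of what a lookup by dotted key amounts to. It is not how Rails’ I18n module is implemented (that also handles fallbacks, pluralisation and interpolation); the `translate` helper and the inline catalogue are hypothetical, with the entries copied from the YAML above.

```ruby
require 'yaml'

# Hypothetical in-memory catalogue holding both locales, as in the YAML above.
CATALOGUES = YAML.safe_load(<<~CATALOGUE)
  en:
    common:
      header:
        app_title: "UK House Price Index"
  cy:
    common:
      header:
        app_title: "Mynegai Prisiau Tai y DU"
CATALOGUE

# Resolve a dotted key such as "common.header.app_title": Hash#dig walks one
# nesting level per key segment and returns nil if any segment is missing.
def translate(locale, key)
  CATALOGUES.dig(locale.to_s, *key.split('.')) ||
    "translation missing: #{locale}.#{key}"
end

puts translate(:en, 'common.header.app_title') # UK House Price Index
puts translate(:cy, 'common.header.app_title') # Mynegai Prisiau Tai y DU
```

The calling code stays identical for both languages; only the locale argument changes which table is consulted.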

So a large part of the labour of adding internationalisation to an existing application is finding everywhere a piece of text is rendered in the user interface, and replacing it with a lookup into the message catalogue. This is laborious and easy to get wrong, especially for seldom-displayed content such as error messages and validation warnings!

My approach was to break the work into two separate steps: first, replace all of the English text with message catalogue lookups; then generate a Welsh message catalogue in which every key had a deliberately nonsensical value such as ****. This makes it much easier to spot an obscure bit of content you have missed, because on the Welsh version of the page it still appears in English among the placeholders.
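The placeholder step is easy to script. The sketch below is a hypothetical helper, not the script used on UKHPI: it loads an English catalogue, replaces every leaf value with a marker, and re-roots the tree under `cy`. It assumes the catalogue contains only plain YAML mappings and strings (no anchors, aliases or Ruby symbols, which `YAML.safe_load` rejects by default).

```ruby
require 'yaml'

# Recursively replace every leaf value with the marker, keeping the key
# structure intact so every lookup still resolves.
def placeholder_tree(node, marker)
  case node
  when Hash  then node.transform_values { |child| placeholder_tree(child, marker) }
  when Array then node.map { |child| placeholder_tree(child, marker) }
  else marker
  end
end

# Build a dummy Welsh catalogue from the text of an English one.
def dummy_catalogue(english_yaml, marker: '****')
  en = YAML.safe_load(english_yaml)
  { 'cy' => placeholder_tree(en.fetch('en'), marker) }
end

# Usage (paths follow the Rails convention shown earlier):
# File.write('config/locales/cy.yml',
#            dummy_catalogue(File.read('config/locales/en.yml')).to_yaml)
```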

Of string composition

Often, the text we want to generate depends on a user selection or on a feature of the data. For example, if the user has selected the average price of houses in the Forest of Dean, the graph title in English and in Welsh respectively is:

  • Average price by type of property in Forest of Dean
  • Pris cyfartalog yn ôl math o eiddo yn Fforest Y Ddena

Here, the location (Forest of Dean), the statistic (average price) and the topic (by type of property, such as detached or semi-detached houses) are all user choices. Constructing that composite text can be quite complicated, and it is not a detail we want to expose to the translators. The basic approach is simply to leave placeholders in the text, as shown below, though we do run into complications later!

# en
browse:
  print:
    unavailable: "We're sorry, #{indicator_name} is not available"
# cy
browse:
  print:
    unavailable: "Mae’n flin gennym, nid yw #{indicator_name} ar gael"

We then have to substitute the right value for indicator_name, in English or Welsh to match the message, to get a correct, fully formed string.
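A minimal sketch of that rule using only the Ruby standard library: the message and the value substituted into it are both looked up in the same locale. The `fill` and `unavailable_message` helpers and the `indicators.avg_price` key are hypothetical (the Welsh name is taken from the graph title above), and this does not show how UKHPI itself performs the substitution; here the `#{indicator_name}` text is stored literally in the YAML and replaced with `gsub`.

```ruby
require 'yaml'

# Quoted heredoc ('YAML_TEXT') keeps the #{...} placeholders as literal text.
CATALOGUE = YAML.safe_load(<<~'YAML_TEXT')
  en:
    unavailable: "We're sorry, #{indicator_name} is not available"
    indicators:
      avg_price: "average price"
  cy:
    unavailable: "Mae’n flin gennym, nid yw #{indicator_name} ar gael"
    indicators:
      avg_price: "pris cyfartalog"
YAML_TEXT

# Replace each literal #{name} placeholder with its value;
# Hash#fetch raises KeyError if a placeholder has no value supplied.
def fill(template, values)
  template.gsub(/#\{(\w+)\}/) { values.fetch(Regexp.last_match(1)) }
end

# Take both the template and the indicator name from the same locale.
def unavailable_message(locale, indicator_key)
  messages = CATALOGUE.fetch(locale)
  name = messages.fetch('indicators').fetch(indicator_key)
  fill(messages.fetch('unavailable'), { 'indicator_name' => name })
end

puts unavailable_message('en', 'avg_price')
# We're sorry, average price is not available
puts unavailable_message('cy', 'avg_price')
# Mae’n flin gennym, nid yw pris cyfartalog ar gael
```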

Working with translators

It’s rare for the dev team to have the language skills in-house to complete internationalisation correctly. Even if someone is a Welsh speaker (which I am not!), technical statistical information needs the care and attention to detail of a professional translator. Typically, professional translators are used to working with source material in Microsoft Word or similar tools, which also offer spelling and grammar checking, something mostly absent from developer tools.

To generate a document we could exchange with the translators, I wrote a small script that read in the contents of en.yml and generated a series of tables, one per key:


Screenshot of the generated Welsh translations document
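The script itself isn’t reproduced here. As a hypothetical sketch of its core, the code below flattens a nested catalogue into dotted keys, the same keys the application uses, and emits one row per message with an empty column for the Welsh text. It writes tab-separated text rather than a .docx file, since generating Word documents needs a third-party gem, and it assumes the messages contain no tabs or newlines.

```ruby
require 'yaml'

# Flatten nested hashes into [dotted_key, text] pairs,
# e.g. {"common" => {"header" => {...}}} -> [["common.header.app_title", "..."]].
def flatten_catalogue(node, prefix = nil)
  node.flat_map do |key, value|
    path = prefix ? "#{prefix}.#{key}" : key.to_s
    value.is_a?(Hash) ? flatten_catalogue(value, path) : [[path, value.to_s]]
  end
end

# One tab-separated row per message; the third column is left empty
# for the translators to fill in.
def translation_sheet(english_yaml)
  rows = flatten_catalogue(YAML.safe_load(english_yaml).fetch('en'))
  header = %w[key english welsh].join("\t")
  [header, *rows.map { |key, text| [key, text, ''].join("\t") }].join("\n")
end

# Usage:
# File.write('translations.tsv', translation_sheet(File.read('config/locales/en.yml')))
```

Keeping the catalogue key on every row is what makes the translated text easy to map back to cy.yml afterwards.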

To give context for each fragment of text, we supplied annotated screenshots showing where the text would appear on the rendered page, since this can affect the correct choice of vocabulary.

Once we had generated the .docx file and annotated it with the screenshots, the translation team could get to work. Having the message catalogue keys embedded in the document made it straightforward to copy the translated text into the cy.yml file. This step could in principle also be automated, but it didn’t take long to copy and paste by hand.
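Had we automated it, the reverse transformation is short. This hypothetical sketch assumes the translated text has already been extracted from the document as `[dotted_key, welsh_text]` pairs; it rebuilds the nested structure and serialises it under the `cy` root that Rails expects in `config/locales/cy.yml`.

```ruby
require 'yaml'

# Rebuild nested hashes from [dotted_key, text] pairs.
def unflatten(pairs)
  pairs.each_with_object({}) do |(dotted_key, text), tree|
    *parents, leaf = dotted_key.split('.')
    # Create intermediate hashes as needed, then set the leaf value
    node = parents.reduce(tree) { |acc, part| acc[part] ||= {} }
    node[leaf] = text
  end
end

# YAML text for a Welsh locale file.
def welsh_catalogue_yaml(pairs)
  { 'cy' => unflatten(pairs) }.to_yaml
end

# Usage:
# File.write('config/locales/cy.yml', welsh_catalogue_yaml(translated_pairs))
```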

And then there’s the JavaScript

For full accessibility, the UKHPI app is designed to work fully with JavaScript disabled. With JavaScript turned on, however, the user gets a richer, more interactive experience, including the data visualisations. We needed translations to work within the Vue app too, with as little duplication as possible.

Fortunately, there is a yaml-loader, which allows Webpack to load the message catalogue files directly when the app is built. This gives the JS code access to the same set of Welsh translations as the Ruby code (with one small wrinkle; see below). Vue has a vue-i18n plugin to help with rendering strings in the current language. But this raises a question: what is the current language?

Selecting a language

Once the user selects a presentation language (currently English or Welsh, though we may add others in future), we want to remember that choice. A common way to keep state in a web application is to add a parameter to the URL: ?lang=en or ?lang=cy. However, a Welsh-speaking user may also have set Welsh as the preferred language in their browser, in which case sites that can present in Welsh should do so even if the lang URL parameter is not set. Under the covers, the browser sends this preference in the Accept-Language HTTP header. To parse the value of the Accept-Language header when it is present, we use the HttpAcceptLanguage gem:

user_locale =
  params['lang'] ||
  http_accept_language.compatible_language_from(I18n.available_locales)
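To show the decision logic without Rails, here is a hypothetical plain-Ruby sketch. It adds one refinement the snippet above omits: a ?lang= value is only honoured if it names a supported locale. Its Accept-Language parsing is deliberately simplified (q-values and primary subtags only) and stands in for, rather than reproduces, the HttpAcceptLanguage gem; `SUPPORTED_LOCALES`, `DEFAULT_LOCALE` and both methods are illustrative names.

```ruby
SUPPORTED_LOCALES = %w[en cy].freeze
DEFAULT_LOCALE = 'en'

# Language tags from an Accept-Language header, most preferred first.
# Simplified: a missing q-value means 1.0, q=0 excludes the tag, and
# ties keep their original order.
def ranked_languages(header)
  entries = header.to_s.split(',').map(&:strip).reject(&:empty?)
  weighted = entries.each_with_index.map do |entry, index|
    tag, *params = entry.split(';').map(&:strip)
    q_param = params.find { |param| param.start_with?('q=') }
    weight = q_param ? q_param.delete_prefix('q=').to_f : 1.0
    [tag.downcase, weight, index]
  end
  weighted.reject { |_, weight, _| weight <= 0 }
          .sort_by { |_, weight, index| [-weight, index] }
          .map(&:first)
end

# An explicit, supported ?lang= wins; then the best supported header
# language; then the default.
def resolve_locale(lang_param, accept_language)
  return lang_param if SUPPORTED_LOCALES.include?(lang_param)

  ranked_languages(accept_language).each do |tag|
    primary = tag.split('-').first # "cy-GB" is served by "cy"
    return primary if SUPPORTED_LOCALES.include?(primary)
  end
  DEFAULT_LOCALE
end
```

In a Rails controller, the result would then be assigned to `I18n.locale`.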

This work to determine the user’s locale happens in Rails on the server. JavaScript also needs to know the user’s locale, but we don’t want to re-parse the URL parameters, and in any case JavaScript can’t see the request headers, so it cannot inspect Accept-Language. Our solution was a small snippet of code in the main page template that passes the current locale to the client via a global variable on window:

:javascript
  window.ukhpi = window.ukhpi || {};
  window.ukhpi.version = '#{Version::VERSION}';
  window.ukhpi.locale = '#{I18n.locale}';

Having done this, Ruby and JavaScript can share the same message catalogues. The one problem that we ran into is that Rails’ and Vue’s i18n frameworks use different conventions for embedding values:

# For Ruby to process, use #{}
print:
  unavailable: "We're sorry, #{indicator_name} is not available"
# For JavaScript to process, use %{}
graph:
  no_data: "Sorry, there is no %{label} data available for %{location}."

We didn’t find a convenient way to harmonise these conventions, but ultimately it didn’t matter, because none of the messages that needed embedded values was used by both Ruby and JavaScript.
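If the two conventions ever do need to share a catalogue, a quick audit helps keep them apart. This hypothetical sketch classifies a message by which placeholder form it contains and lists any message that mixes both; it works on a flat hash of key to text, such as one produced by flattening the nested YAML.

```ruby
# Placeholder forms used on each side, as described above.
RUBY_STYLE = /#\{\w+\}/
JS_STYLE = /%\{\w+\}/

# Classify a message by the placeholder convention(s) it uses.
def placeholder_style(text)
  ruby = text.match?(RUBY_STYLE)
  js = text.match?(JS_STYLE)
  if ruby && js then :mixed
  elsif ruby then :ruby
  elsif js then :javascript
  else :none
  end
end

# Keys of messages that use both conventions at once.
def mixed_messages(flat_messages)
  flat_messages.select { |_key, text| placeholder_style(text) == :mixed }.keys
end
```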

Consonant mutations

One feature of the Welsh language did not fit conveniently into the i18n frameworks of either Rails or Vue: consonant mutation. In Welsh, under certain circumstances, the initial consonant of a word changes in both sound and written form. For example, “in Pembrokeshire” translates to “yn Sir Benfro”, but “in Gwynedd” becomes “yng Ngwynedd” under the mutation rules. Because we construct phrases like “in Gwynedd” dynamically from user choices, we can’t rely on the translators having already applied the correct mutations for us. Our solution was to code up a set of custom rules, based on regular expressions, for the mutations we needed to apply. There wasn’t a convenient way to reuse the same code in Rails and in Vue, so unfortunately there is some duplication.
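The UKHPI rules aren’t reproduced here, but the idea can be sketched. The hypothetical code below handles only one case, the nasal mutation that follows the preposition yn (“in”), where both the preposition and the initial letter of the place name may change. It assumes a capitalised Welsh place name and ignores exceptions and any special handling for non-Welsh names.

```ruby
# Hypothetical rules for nasal mutation after "yn". Each rule gives a
# pattern for the start of the place name, the form the preposition takes,
# and the text that replaces the matched initial letter.
# Ch, Ph and Th do not mutate, hence the (?!h) lookaheads.
NASAL_MUTATION_RULES = [
  [/\AP(?!h)/, 'ym',  'Mh'],  # Powys    -> ym Mhowys
  [/\AT(?!h)/, 'yn',  'Nh'],  # Torfaen  -> yn Nhorfaen
  [/\AC(?!h)/, 'yng', 'Ngh'], # Caerdydd -> yng Nghaerdydd
  [/\AB/,      'ym',  'M'],   # Bangor   -> ym Mangor
  [/\AD/,      'yn',  'N'],   # Dinbych  -> yn Ninbych
  [/\AG/,      'yng', 'Ng'],  # Gwynedd  -> yng Ngwynedd
  [/\AM/,      'ym',  'M']    # Merthyr  -> ym Merthyr (name unchanged)
].freeze

# Welsh phrase for "in <place>"; names with no matching rule
# (e.g. "Sir Benfro") keep the plain "yn".
def yn_place(place)
  NASAL_MUTATION_RULES.each do |pattern, preposition, replacement|
    return "#{preposition} #{place.sub(pattern, replacement)}" if pattern.match?(place)
  end
  "yn #{place}"
end

puts yn_place('Gwynedd')    # yng Ngwynedd
puts yn_place('Sir Benfro') # yn Sir Benfro
```

Keeping the rules as data (pattern, preposition, replacement) makes them straightforward to port to JavaScript, though the two copies still have to be kept in step by hand.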

Lessons learned

Overall, the project went more smoothly than I had expected. Extracting text from code and templates into message catalogues is laborious. But it’s also true that when building an app for the first time, putting all presentation text into a message catalogue is an overhead, so there’s something to be said for prototyping without worrying about a second language.

Automating the extraction of messages from the message catalogue into a Word file was definitely worthwhile, as was adding the extra context of the annotated screenshots. In hindsight, we should probably have scripted the copying of translated text back into the message catalogue too.

Final lesson: make sure the translators are on board for QA testing. The Welsh speakers caught some typos and grammatical errors that the dev team would not have noticed on its own!

You can try the Welsh language version of the UK House Price Index for yourself.

Take a look at some of our other #TechTalk posts on other topics.

Diolch yn fawr (thank you very much)