Multilingual Content in a Knowledge Base



In our previous blog post, we showed that a knowledge base comprises individual pieces of knowledge, called assets, that are related to each other. When a knowledge asset is displayed, its related assets should be displayed as well. So how do multiple languages come in? What does it mean for a knowledge base to be multilingual?


Well, one way to look at this is that, to a large extent, knowledge is expressed by the relationships between assets. Consider the company BMW, headquartered in beautiful Munich, Germany. This would be expressed as a knowledge asset “BMW” that has a relationship “headquarters” to another knowledge asset “Munich” (the relationship is also a knowledge asset). Normally, knowledge assets represent the real world, so neither their existence nor the way they are related depends on the language. What does depend on the language is the name under which each knowledge asset and each relationship is known. An English-speaking user, for example, may wish to see information that looks like this:




A German user would see this instead:





And a user from Taiwan would perhaps see this:






How do we solve this?


We express each knowledge asset as an entity in our system, and each entity can have different names or labels (“designations”) in different languages. We can just provide designations for BMW, for the relationship Headquarters, and for the city Munich, each in multiple languages, and – presto! – problem solved.
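To make this concrete, here is a minimal sketch in Python. The `Entity` class, the `render` function, the language codes, and the sample translations are all illustrative assumptions, not the data model of any particular product. The point is only that the statement itself is stored once, independent of language, while the labels vary.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """A knowledge asset (hypothetical model): one id, many designations."""
    entity_id: str
    # Maps a language code to the designation in that language.
    designations: dict = field(default_factory=dict)


# Illustrative designations; the language codes are assumed conventions.
bmw = Entity("bmw", {"en": "BMW", "de": "BMW", "zh-TW": "BMW"})
headquarters = Entity(
    "headquarters", {"en": "Headquarters", "de": "Hauptsitz", "zh-TW": "總部"}
)
munich = Entity("munich", {"en": "Munich", "de": "München", "zh-TW": "慕尼黑"})

# The statement is language-independent: (subject, relationship, object).
# Note that the relationship is an entity too.
statement = (bmw, headquarters, munich)


def render(stmt, language):
    """Render a statement using the designations of one language."""
    subject, relationship, obj = stmt
    return (
        f"{subject.designations[language]} | "
        f"{relationship.designations[language]}: "
        f"{obj.designations[language]}"
    )


for language in ("en", "de", "zh-TW"):
    print(render(statement, language))
```

This sketch assumes every entity has a designation in the requested language; a missing one would raise a `KeyError`, which is exactly the gap discussed next.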


Almost, at least. We also need a way to show different languages to different users. Not every knowledge asset will have a designation in every language, so users can specify a list of the languages they understand, in order of preference. I, for example, would be happiest with German, but I also understand English, so if a German designation is not present, I’m satisfied with an English one. I would therefore specify German and English, in that order.
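The lookup described above could be sketched as follows. The function name `pick_designation` and the sample data are hypothetical; the logic is simply "walk the user's list and return the first designation that exists".

```python
from typing import Optional


def pick_designation(designations: dict, preferred: list) -> Optional[str]:
    """Return the designation for the first preferred language that exists.

    `designations` maps language codes to labels; `preferred` is the user's
    list of language codes, most preferred first. Returns None if the asset
    has no designation in any of the user's languages.
    """
    for language in preferred:
        if language in designations:
            return designations[language]
    return None


# Illustrative data: one asset has a German label, the other does not.
munich = {"en": "Munich", "de": "München"}
english_only = {"en": "Headquarters"}

my_languages = ["de", "en"]  # German first, then English

print(pick_designation(munich, my_languages))
print(pick_designation(english_only, my_languages))
```

What to do when the function returns `None` is a design decision the sketch leaves open; showing the entity's id or a designation in any available language are two possible choices.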


This approach can take us pretty far. That said, languages are complicated. Perhaps it is necessary to provide US English (“Truck”) and British English (“Lorry”) designations for the same knowledge asset, so that users from both sides of the pond feel at home. Users from Down Under would still feel left out, though. One way to approach this is to offer a default variety of English plus regional varieties; the application then looks for the regional variety first and falls back to the default variety. For example, if a user specifies English (US) and German, in that order, the system searches for a US English designation first, then a default English one, then a (default) German one. When adding a regional variety of a language, it suffices to specify designations only for those knowledge assets where the regional variety diverges from the default, which helps to reduce effort.
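The search order just described can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the codes `en`, `en-US`, `en-GB`, and `de`, the choice of “Truck” as the default English designation, and the helper names `expand_preferences` and `resolve`.

```python
def expand_preferences(preferred):
    """Insert the default variety after each regional variety.

    Assumes codes of the form "en-US", where the part before the hyphen
    names the default variety ("en"). Duplicates are dropped.
    """
    chain = []
    for code in preferred:
        base = code.split("-")[0]
        for candidate in (code, base):
            if candidate not in chain:
                chain.append(candidate)
    return chain


def resolve(designations, preferred):
    """Return the first designation found along the expanded chain."""
    for code in expand_preferences(preferred):
        if code in designations:
            return designations[code]
    return None


# Only the divergent regional designation is stored explicitly.
truck = {"en": "Truck", "en-GB": "Lorry", "de": "Lastwagen"}
munich = {"en": "Munich", "de": "München"}

print(expand_preferences(["en-US", "de"]))
print(resolve(truck, ["en-GB", "de"]))   # regional designation exists
print(resolve(truck, ["en-AU", "de"]))   # falls back to default English
print(resolve(munich, ["en-GB", "de"]))  # no regional entry needed
```

Because the fallback is computed from the user's list rather than stored per asset, adding a new regional variety requires no changes to assets whose designation matches the default.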


Conclusion


We have shown that combining three fairly simple ideas takes us a long way toward representing multilingual content. The first idea is to split our knowledge into knowledge assets that are related to each other. The second idea is to provide names or labels (“designations”) for each knowledge asset in multiple languages. The third idea is to let users specify the languages they understand, in order of preference.



We love to hear from you! For any questions about knowledge management, please reach out to info@semedy.com.