Tag Archives: Languages

Preserving cultural heritage one language at a time – Microsoft on the Issues

There are close to 7,000 languages spoken around the world today. Yet, sadly, every two weeks a language dies with its last speaker, and it is predicted that between 50% and 90% of endangered languages will disappear by the next century. When a community loses a language, it loses its connection to the past – and part of its present. It loses a piece of its identity. As we think about protecting this heritage and the importance of preserving language, we believe that new technology can help.

More than many nations, the people of New Zealand are acutely aware of this phenomenon. Centuries ago, the Māori people arrived on the islands to settle in and create a new civilization. Through the centuries and in the isolation of the South Pacific, the Māori developed their own unique culture and language. Today, in New Zealand, 15% of the population is Māori yet only a quarter of the Māori people speak their native language, and only 3% of all people living in New Zealand speak te reo Māori. Statistically, fluency in the language is extremely low.

New Zealand and its institutions have taken notice and are actively taking steps to promote the use of te reo Māori in meaningful ways. More and more schools are teaching te reo Māori, and city councils are revitalizing the country’s indigenous culture by giving new, non-colonial names to sites around their cities. Prime Minister Jacinda Ardern has promoted the learning of te reo Māori, calling for 1 million new speakers by 2040. In a simple, yet profound, statement Ardern said, “Māori language is a part of who we are.” Despite all these efforts, fluency in te reo Māori remains low.

For the past 14 years, Microsoft has been collaborating with te reo Māori experts and Te Taura Whiri i te Reo Māori (the Māori Language Commission) to weave te reo Māori into the technology that thousands of Kiwis use every day with the goal of ensuring it remains a living language with a strong future. Our collaboration has already resulted in translations of Minecraft educational resources and we recently commissioned a game immersed entirely in the traditional Māori world, Ngā Motu (The Islands).

To focus only on shaping the future ignores the value of the past, as well as our responsibility to preserve and celebrate te reo Māori heritage. This is why we are proud to announce the inclusion of te reo Māori as a language officially recognized in our free Microsoft Translator app. Microsoft Translator supports more than 60 languages, and this means that the free application can translate te reo Māori text into English text and vice versa. It will also support Māori into and from all other languages supported by Microsoft Translator. This is really all about breaking the language barrier at home, at work, anywhere you need it.

Dr. Te Taka Keegan, senior lecturer of computer science at the University of Waikato and one of the many local experts who have helped guide the project from its inception, says: “The language we speak is the heart of our culture. The development of this Māori language tool would not have been possible without many people working towards a common goal over many years. We hope our work doesn’t simply help revitalize and normalize te reo Māori for future generations of New Zealanders, but enables it to be shared, learned and valued around the world. It’s very important for me that the technology we use reflects and reinforces our cultural heritage, and language is the heart of that.”

Te reo Māori will employ Microsoft’s Neural Machine Translation (NMT) techniques, which can be more accurate than statistical translation models. We recently achieved human parity in translating news from Chinese to English, and the advanced machine learning used for te reo Māori will continue to become better and better as even more documents are used to “teach” it every nuance of the language. This technology will be leveraged across all our M365 products and services.

But while the technology is exciting, it’s not the heart of this story. This is about collaborating to develop the tools that boost our collective well-being. New Zealand’s government is also spearheading a “well-being” framework for measuring a nation’s progress in ways that don’t solely reflect economic growth. We need to look at cultural heritage the same way. Preserving our cultural heritage isn’t just a “nice thing to do” – according to the U.N., it’s vital to our resilience, social cohesion and sense of belonging, celebrating the values and stories we have in common.

I was fortunate to visit New Zealand this year, and it is a country that is genuinely working to achieve a delicate cultural balance, one that keeps in mind growth as well as guardianship, which maintains innovation and a future focus whilst preserving a deep reverence for its past. This kind of balance is something all nations should be striving for.

Globally, as part of our AI for Cultural Heritage program, Microsoft has committed $10 million over five years to support projects dedicated to the preservation and enrichment of cultural heritage that leverage the power of artificial intelligence. The ultimate role of technology is to serve humankind, not to replace it. We can harness the latest tools in ways that support an environment rich in diversity, perspectives and learnings from the past. And when we enable that knowledge and experience to be shared with the rest of the world, every society benefits.

For more information on Microsoft Translator please visit: https://www.microsoft.com/translator

Author: Microsoft News Center

Developers favor JVM languages for mobile, enterprise

Languages that run on the Java Virtual Machine have lined up well with mobile app developers, alongside the usual code suspects.

JavaScript, Java, Python, PHP and C# top RedMonk’s latest list of programming languages, ranked by code usage (GitHub pull requests) and discussions (Stack Overflow Q&As). C++, CSS, Ruby, C and Objective-C round out the top 10. But a host of JVM languages rank in the middle of the pack and are on the move up the list.

The JVM supports a host of programming languages, such as Kotlin, Groovy, Scala and Clojure, along with JRuby and Jython, as well as more obscure languages such as BeanShell, Pizza, Pnuts and Xtend. Scala (ranked 12th), Clojure and Groovy (tied at 21st) advanced in the RedMonk rankings, while Kotlin — one of the hottest languages around — fell back one spot to 28th.

Bright future for Kotlin, Swift for mobile OS development

Kotlin is especially popular with mobile app developers as a preferred language for Android application development due to its clean, modern design, wrote Stephen O’Grady, analyst at RedMonk, based in Portland, Maine, in a blog post.

Scala had dropped for three consecutive quarters prior to this latest ranking, although the drops were rather small. The causative factors behind Scala’s past declines are unclear, but likely involve competition not only from Java but from other JVM languages such as Clojure, Groovy and even Kotlin.

“Scala had its day in the sun, but it seems to be suffering from growing pains and unable to move under the resistance of its own considerable weight,” said Cameron Purdy, CEO of Xqiz.it, a Lexington, Mass., software startup in stealth mode, and formerly senior vice president of development at Oracle.

Swift, a newer language to build iOS applications, also slid one slot out of a tie with Objective-C, but still enjoys increased attention from developers. IBM and others have pushed Swift as a server-side language.

Like Kotlin, Swift appeals to developers as a language that hides the ugliness of a legacy platform, although it drags a ton of luggage from various legacy Apple technologies that feel less clean, Purdy said.

“If I were a developer starting out today, I’d prioritize Kotlin and Swift for Android and iOS development, with JavaScript or TypeScript for the browser,” he said. “Kotlin should also suffice for the back end.”

Reading the tea leaves

Other industry experts suggest the ebbs and flows of such language popularity rankings are nothing more than periodic changes in the schemes of software development.

As programmers change development projects, they’ll shift from “vanilla Java” to Kotlin if they’re doing Android development, or to Groovy for development with Grails, or to Clojure or Scala for various functional programming work, said Ted Neward, director of developer relations at Smartsheet, Bellevue, Wash.

“This is much like trying to read the tides by marking the waves on the side of the pier over a five-minute period,” he said. JVM languages in general have carved out a niche within the broader Java world, which is viable because that world is so large. “If anything, it signals that these languages are reaching a level of maturity and acceptance within the ecosystem,” he said.

Meanwhile, recent improvements in the Java language, such as lambdas in Java 8 and local variable type inference in Java 10, take some steam away from JVM alternatives, said Charles Nutter, co-lead of the JRuby open source project and a senior principal software engineer at Red Hat.

“The more Java improves, the less these other ‘Java++’ languages have compelling enough differences to justify the overhead of using something other than Java,” Nutter said.

Learn the basics of PowerShell for Azure Functions

Azure Functions isn’t just for developers; several scripting languages open up new opportunities for admins and systems analysts as well.

Scripting options for Azure Functions

Azure Functions is a collection of event-driven application components that can interact with other Azure services. It’s useful for asynchronous tasks, such as data ingestion and processing, extract, transform and load processes or other data pipelines, as well as microservices or cloud service integration.

In general, functions are well-suited as integration and scripting tools for legacy enterprise applications due to their event-driven, lightweight and infrastructure-free nature. The ability to use familiar languages, such as PowerShell, Python and Node.js, makes that case even stronger. Since PowerShell is popular with Windows IT shops and Azure users, the best practices below focus on that particular scripting language but apply to others as well.

PowerShell for Azure Functions

The initial implementation of PowerShell for Azure Functions uses PowerShell version 4 and only supports scripts (PS1 files), not modules (PSM1 files), which makes it best for simpler tasks and rapid development. To use PowerShell modules in Azure Functions, users can update the PSModulePath environment variable to point to a folder that contains custom modules and connect to that folder through FTP.

When you use scripts, pass data to PowerShell functions through files or environment variables, because a function won’t store or cache the runtime environment. Incoming data to a function, via an event trigger or input binding, is passed using files that are accessed in PowerShell through environment variables. The same scheme works for data output. Since the input data is just a raw file, users must know what to expect and parse accordingly. Functions itself won’t format data but will support most formats, including:

  • string;
  • int;
  • bool;
  • object/JavaScript Object Notation;
  • binary/buffer;
  • stream; and
  • HTTP
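
The file-and-environment-variable handoff described above can be sketched as follows. This is a minimal illustration written in Python rather than PowerShell, and it is not the Functions runtime itself: the variable names req and res, the helper names and the JSON payload shape are all assumptions made for the example.

```python
import json
import os


def read_input(var_name):
    """Read the raw text of the file whose path is stored in an environment variable."""
    path = os.environ[var_name]
    with open(path, encoding="utf-8") as handle:
        return handle.read()


def write_output(var_name, text):
    """Write text to the file whose path is stored in an environment variable."""
    path = os.environ[var_name]
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(text)


def handle_event(in_var="req", out_var="res"):
    """Parse the input file (names are hypothetical) and write a JSON reply."""
    raw = read_input(in_var)
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        # The input is just a raw file, so fall back to treating it as plain text.
        payload = {"text": raw}
    name = payload.get("name", "world") if isinstance(payload, dict) else "world"
    write_output(out_var, json.dumps({"greeting": f"Hello, {name}"}))
```

Because the script only sees file paths, it has to decide for itself how to parse the content, which is why the sketch falls back to plain text when the input is not valid JSON.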

PowerShell functions can be triggered by HTTP requests, an Azure service queue, such as when a message is added to a specified storage queue, or a timer (see Figure 1). Developers can create Azure Functions with the Azure portal, Visual Studio — C# functions only — or a local code editor and integrated development environment, although the portal is the easiest option.

Figure 1. Triggers for PowerShell functions

Recommendations

Azure Functions works the same whether the code is in C#, PowerShell or Python, which enables teams to use a language with which they have expertise or can easily master. The power of Functions stems from its integration with other Azure services and built-in runtime environments. Writing code as a function is more efficient than creating a standalone app for simple tasks, such as triggering a webhook from an HTTP request.

While PowerShell is an attractive option for Windows teams, they need to proceed with caution since PowerShell support in Azure Functions is still a work in progress. The implementation details will likely change, although for the better.

Windows shapes the world’s languages

For Windows to be a truly global product, anyone in the world should be able to type in their language. The first step to unlocking text input for the world is to be able to display any of the world’s languages. This is a challenging task, one which most people don’t need to worry about because their language is already supported, but for millions of people around the world getting basic text support has been a problem. The stumbling block in most such cases is a little-known component called a “shaping engine”. A shaping engine is used for so-called complex text layout, which is needed for about half of the world’s writing systems. For many years, Windows customers have been able to install their own fonts and keyboards but before Windows 10, if there was no shaping engine for your script things wouldn’t look right.

Incorrect: N’Ko script without a shaping engine.

Correct: N’Ko script with a shaping engine.

In order to get things to look right, that is, to get a complex script to render correctly on Windows, linguists and software engineers had to take time to study all of the features and requirements of that script and craft a shaping engine that would provide the necessary support. This meant that just a small number of new writing systems could be added in any release of Windows. By Windows 8.1, it had taken around 15 years to build shaping engines for 27 of the most widely used complex writing systems. But if your script wasn’t one of these 27, you were out of luck.

“The Universal Shaping Engine is the proverbial ‘game changer’ for complex script font development, especially for new Unicode scripts that might otherwise languish unsupported in software and fonts for years.” – John Hudson, CEO, Tiro Typeworks Ltd.

In recent years the Unicode Standard has made amazing progress in defining and setting standards as to how all of the world’s writing systems should be supported in the digital era. The most recent version of Unicode includes 125 different writing systems, 56 of which need shaping support. The Script Encoding Initiative at the University of California, Berkeley estimates there are almost 50 more complex scripts yet to be added to Unicode. So for some language communities and scholars it looked like it would be many years before they would be able to type their language on Windows, if that day would ever come.

This problem led a small team of engineers in Microsoft’s Operating Systems Group to think about how to design a shaping engine so that any script defined in Unicode could be displayed correctly without the time and effort required to create a dedicated shaping engine. The result is a new kind of shaping engine, a “universal shaping engine”, that is capable of supporting any complex script when provided with a suitable font.

There are four parts to this engine that make it universal:

  1. It consumes data directly from the Unicode Standard
  2. It uses a “universal cluster model” that models the superset of human writing systems
  3. It enables OpenType font features to support cutting edge typography
  4. The specification is publicly available

Consuming data directly from The Unicode Standard

The team worked with experts from the Unicode Technical Committee to make sure that all of the necessary research to shape each script is made available in a machine-readable format and that this data will be kept up to date when new scripts are added to the standard. As a result, the burden of script research is contained within Unicode’s script encoding process, which means that Microsoft doesn’t need to do additional research that would delay adding support for a new script. This approach makes it easy for Windows to keep current with the latest version of Unicode.
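
As a rough illustration of what machine-readable script data can look like, the sketch below parses lines in the "range ; value # comment" layout used by Unicode Character Database property files. The sample lines, the function name parse_property_lines and the choice of Python are assumptions for this example; which data files the engine actually consumes is a matter for the specification, and the sample values should be checked against the published Unicode data before being relied on.

```python
def parse_property_lines(lines):
    """Map each code point to its property value from UCD-style lines."""
    table = {}
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop trailing comments
        if not data:
            continue
        span, value = (part.strip() for part in data.split(";", 1))
        first, _, last = span.partition("..")
        start = int(first, 16)
        end = int(last, 16) if last else start
        for code_point in range(start, end + 1):
            table[code_point] = value
    return table


# Illustrative sample in the UCD layout (an assumption, not quoted from a file).
SAMPLE = [
    "# sample property data",
    "0915..0939    ; Consonant  # DEVANAGARI LETTER KA..DEVANAGARI LETTER HA",
    "094D          ; Virama     # DEVANAGARI SIGN VIRAMA",
]

PROPERTIES = parse_property_lines(SAMPLE)
print(PROPERTIES[0x0915], PROPERTIES[0x094D])
```

With data in this form, supporting a newly encoded script is mostly a matter of loading an updated table rather than writing new code.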

The Universal cluster model

The team did original research into the unsupported scripts of the world to determine formulas that describe how letter forms and other signs can combine. The result is a generalized “cluster model” that is applicable to any known writing system. Here is an example of the formula for the standard cluster in the universal cluster model:

< B | GB > [VS] (CMAbv)* (CMBlw)* (< H B | SUB > [VS] (CMAbv)* (CMBlw)*)* [R]
  [MPre] [MAbv] [MBlw] [MPst]
  (VPre)* (VAbv)* (VBlw)* (VPst)*
  (VMPre)* (VMAbv)* (VMBlw)* (VMPst)*
  (FAbv)* (FBlw)* (FPst)* [FM]

For an explanation of this formula you can check out the full specification.
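
One way to get a feel for the formula is to read it as a regular expression over class labels. The Python sketch below assumes that < A | B > means a choice, [X] means optional and (X)* means zero or more; that reading, the helper names and the space-separated token encoding are assumptions made for illustration and are not taken from the specification.

```python
import re


def tok(name):
    """Match one class label followed by a single space."""
    return f"(?:{re.escape(name)} )"


def seq(*parts):
    return "".join(parts)


def alt(*parts):
    return "(?:" + "|".join(parts) + ")"


def opt(part):
    return f"(?:{part})?"


def star(part):
    return f"(?:{part})*"


def stars(*names):
    return seq(*(star(tok(n)) for n in names))


def opts(*names):
    return seq(*(opt(tok(n)) for n in names))


# [VS] (CMAbv)* (CMBlw)* appears after the base and after each conjunct.
TAIL = seq(opt(tok("VS")), stars("CMAbv", "CMBlw"))

STANDARD_CLUSTER = re.compile(seq(
    alt(tok("B"), tok("GB")), TAIL,
    star(seq(alt(seq(tok("H"), tok("B")), tok("SUB")), TAIL)),
    opt(tok("R")),
    opts("MPre", "MAbv", "MBlw", "MPst"),
    stars("VPre", "VAbv", "VBlw", "VPst"),
    stars("VMPre", "VMAbv", "VMBlw", "VMPst"),
    stars("FAbv", "FBlw", "FPst"),
    opt(tok("FM")),
))


def is_standard_cluster(classes):
    """Return True if the list of class labels fits the standard cluster formula."""
    text = "".join(label + " " for label in classes)
    return STANDARD_CLUSTER.fullmatch(text) is not None


print(is_standard_cluster(["B", "H", "B", "VAbv", "FM"]))
print(is_standard_cluster(["B", "VPre", "MPre"]))
```

The first call fits the formula (a base, one conjunct, a vowel and a final modifier), while the second does not, because under this reading the M classes must come before the V classes.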

Enable the full set of font features to support cutting edge typography

The team consulted with a leading font designer and other OpenType font experts to determine the complete set of features that would enable font developers to create fonts that will meet the orthographic requirements for the newly enabled scripts as well as making it possible to do cutting edge typography. For example, Soyombo has a unique property among writing systems in that the length of a vertical bar that is part of each letter must match the longest bar in the line. We were able to use features of the Universal Shaping Engine to show that this script, which is still in the process of being encoded in Unicode, would be supported by the Universal Shaping Engine once it is published in the standard.

Sample of Soyombo script showing the consistent vertical descenders.

“The Universal Shaping Engine enabled us to work out how to encode Soyombo, a Mongolian script with very different clustering and typographic requirements.” – Anshuman Pandey, Post-Doctoral Researcher, University of California, Berkeley

Publish the specification

Microsoft Typography has been publishing specifications for its shaping engines for years so that font developers and other platforms can build compatible systems. By publishing the technical details for the Universal Shaping Engine we enable font developers to understand how to create fonts for the world’s complex scripts so that they will display correctly on Windows 10. We hope that other platforms and text layout software will create compatible systems so that documents and fonts produced on Windows will display correctly on other systems and vice versa. That way language communities, enthusiasts and scholars can share documents in any of the world’s more than 7,000 languages using one of the 125 writing systems in Unicode (and counting).

“The Universal Shaping Engine … is of great importance for all language groups to be able to communicate on computers and through the internet.” – Lorna Evans, Script Technologist

This new engine is part of Windows 10, so if you want to type in Balinese or Tirhuta, or any of the other complex scripts included in Unicode 7.0, the shaping support is there and will keep up with Unicode as each new writing system is added to the standard. Now it’s over to font developers and language communities to take advantage of this support!

What scripts are covered?

The following writing systems are now supported using the Universal Shaping Engine on Windows 10. Some of these were supported on previous versions of Windows using different technology.

Balinese, Batak, Brahmi, Buginese, Buhid, Chakma, Cham, Duployan, Egyptian Hieroglyphs, Grantha, Hanunoo, Javanese, Kaithi, Kayah Li, Kharoshthi, Khojki, Khudawadi, Lepcha, Limbu, Mahajani, Mandaic, Manichaean, Meitei Mayek, Modi, Mongolian, N’Ko, Pahawh Hmong, Phags-pa, Psalter Pahlavi, Rejang, Saurashtra, Sharada, Siddham, Sinhala, Sundanese, Syloti Nagri, Tagalog, Tagbanwa, Tai Le, Tai Tham, Tai Viet, Takri, Tibetan, Tifinagh, and Tirhuta.

The examples below illustrate some of the Universal Shaping Engine’s supported scripts showing various degrees of shaping. All of these examples have been rendered using Unicode sequences in Notepad on Windows 10 with the Universal Shaping Engine. For Balinese, Batak, Lepcha, Sundanese, and Tai Viet, we used fonts from Google’s Noto Project (thanks for the great fonts!). The remaining fonts are Microsoft’s.

Balinese script.

Batak script.

Lepcha.

Brahmi script.

Buhid.

Buginese.

Javanese.

Hanunoo.

Tagbanwa.

Egyptian Hieroglyphs.

Kharoshthi.

N’Ko.

Sundanese.

Tai Viet.

Phags-pa.

Tai Le.

Tifinagh.

What is a shaping engine?

A shaping engine enables scripts with contextual and non-linear typographic requirements to be displayed on a computer. Common types of complex text layout include:

Ligatures

In Latin script, the combination of f and i may form the ligature ﬁ.

Such ligatures are optional in Latin script, but are required in other writing systems. In Devanagari, the sign क (ka) must form a ligature when combined with the sign ष (ṣa), producing क्ष (kṣa).
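
In stored text, a required ligature like this one is not a single character. The Python sketch below uses the standard unicodedata module to list the code points behind the Devanagari ka and ṣa conjunct, which are joined by a virama, and contrasts that with the Latin fi ligature, which also exists as a compatibility character. The helper name describe is hypothetical.

```python
import unicodedata


def describe(text):
    """Return (code point, Unicode name) pairs for each character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


# ka + virama + ssa: three code points that a shaping engine and font
# must turn into one ligature glyph.
KSSA = "\u0915\u094D\u0937"
for code, name in describe(KSSA):
    print(code, name)

# The Latin ligature is optional, so plain text normally stores "f" and "i";
# the compatibility character U+FB01 normalizes back to those two letters.
print(unicodedata.normalize("NFKC", "\uFB01"))
```

The text itself only records the sequence of signs; forming the ligature is left to the shaping engine and the font.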

Reordering

In some scripts, vowels may be written in front of a letter that they follow in pronunciation. In Sinhala, the vowel sign ෙ (e) is written to the left of a consonant, such as ක (ka), so the sound ke is written කෙ.
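
The Sinhala example can be sketched in a few lines of Python. Text is stored in spoken order (consonant first, then vowel sign), and the display order is worked out afterwards. The one-entry table of pre-base vowel signs and the function visual_order are simplifications invented for this illustration; a real shaping engine takes positions from Unicode data and the font and works on whole clusters.

```python
# Vowel signs drawn to the left of their consonant (hand-picked for this example).
PRE_BASE_VOWEL_SIGNS = {"\u0DD9"}  # Sinhala vowel sign e


def visual_order(logical):
    """Return characters in display order, moving pre-base vowel signs
    in front of the character they follow in the stored text."""
    visual = []
    for ch in logical:
        if ch in PRE_BASE_VOWEL_SIGNS and visual:
            visual.insert(len(visual) - 1, ch)
        else:
            visual.append(ch)
    return visual


KE = "\u0D9A\u0DD9"  # stored order: consonant ka, then vowel sign e
print([f"U+{ord(c):04X}" for c in KE])
print([f"U+{ord(c):04X}" for c in visual_order(KE)])
```

The stored string never changes; only the order in which glyphs are laid out differs from the order in which the characters are typed and pronounced.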

Joining

Cursive scripts such as Arabic and Mongolian connect letters so that words may be written as a single stroke. This means that letters must change shape depending on whether they occur at the beginning, middle or end of a word. For example, the word “Mongolia” written in Mongolian script is written like this:

Joining
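
Unicode retains a set of compatibility characters for the positional shapes of Arabic letters. Shaping engines normally choose contextual glyphs from the font rather than using these code points, but they offer a convenient way to list the shapes a single letter can take. The Python sketch below, with a hypothetical helper named positional_forms, looks up those shapes for the Arabic letter beh.

```python
import unicodedata


def positional_forms(base):
    """Collect the compatibility presentation forms of an Arabic letter,
    keyed by position (isolated, final, initial, medial)."""
    target = f"{ord(base):04X}"
    forms = {}
    # Scan the Arabic Presentation Forms-B block.
    for code_point in range(0xFE70, 0xFF00):
        parts = unicodedata.decomposition(chr(code_point)).split()
        if len(parts) == 2 and parts[0].startswith("<") and parts[1] == target:
            forms[parts[0].strip("<>")] = chr(code_point)
    return forms


BEH = "\u0628"
for position, glyph in positional_forms(BEH).items():
    print(position, f"U+{ord(glyph):04X}")
```

One stored letter therefore corresponds to several shapes, and it is the shaping engine that decides which one applies from the letter's neighbours.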

To learn more about International Mother Language Day, and what Microsoft is doing to support technology on this front, please visit the Official Microsoft Blog.

Andrew Glass
Program Manager, Operating Systems Group