NEW HAVEN — Looking at Chinese script, you might empathize with the words of an 18th-century Jesuit missionary: “One can only endure the pain of learning it for the love of God.” The piety may be gone, but the Chinese have heard this kind of complaint for over four centuries and are finally doing something about it.
This month, the Chinese government plans to introduce codes for some 3,000 Chinese characters as part of a grand project, known as the China Font Bank, to digitize 500,000 characters previously unavailable in electronic form. Until now, only 80,388 characters have been encoded in the international computing standard, Unicode.
The project highlights 100,000 characters from the country’s 56 ethnic minorities, and another 100,000 rare and ancient characters from China’s written corpus. Deploying almost 30 companies, institutions and universities, it’s the largest state-funded digitization project ever undertaken.
Characters that have long resided in the dusty pages of old manuscripts will come to life in the digital medium. The online expansion will give people in China and around the world more access to the script, thereby helping spread the Chinese language and culture.
China has struggled with the global information architecture that favors the Western alphabet. None of the significant innovations in modern communications (Morse code, typewriters and the ASCII encoding standard, short for American Standard Code for Information Interchange) was built with the Chinese script in mind.
Chinese scientists toiled for decades to break into the alphabetic media. In 1974, the government directed Chinese engineers and mathematicians to develop a way to piggyback onto the American alphabetic keyboard. Eventually, hundreds of keystrokes were reconfigured to allow tens of thousands of characters to be typed into a computer on the standard keyboard.
The Chinese have long believed in the superiority of their written language. Beijing thinks the current number of encoded characters in Unicode inadequately represents the richness of China’s cultural past. Through the Font Bank, the Chinese will unlock their written treasures, from oracle bone scripts to ancient writings in minority languages.
The spread of Chinese language and culture through Confucius Institutes and other efforts around the world has been part of Beijing’s soft-power strategy for the past decade. The Font Bank takes this mission into the digital realm.
Anything from scholarly papers to tweets will help extend the reach of Chinese through its sheer availability. As more of the language enters cyberspace, more people will use it, and its status will rise with its visibility.
The digitization project will also hit close to home for many Chinese people, who have been ill-served by the incomplete digitization of their language.
Last year a local Chinese media outlet reported the story of a 10-year-old boy whose auspicious name contained a rare character made up of “dragon” and “sky.” School authorities could not find the character in the computer system, and after he passed an important exam, the rare character was replaced with a common, less colorful one — meaning “white” — on his certificate. He was left with inadequate proof of his achievement, upsetting his father.
There are many other personal examples with graver consequences: Some people can’t access health insurance or their money because the correct character for their name cannot be displayed on identification papers. In the old days, one could get away with filling in a rarely used character by hand. Today, if your proper name doesn’t have an electronic form, it might as well not exist.
There were enough cases like this that in the early 2000s China began to designate the characters people could use in their names. Authorities mandated that any name containing a character outside the 1,605 specified had to be changed. The newly available characters will solve these headaches without restricting parents’ naming rights.
With all the benefits of a richer digital presence for Chinese, there is reason to be wary. The same state agency that controls censorship and communications is overseeing the effort, whose aim, according to a spokesman for the project, is to reshape the digital content of the Western-dominated internet. Netizens who have been using obscure characters for secret or playful language to avoid government scrutiny can expect to have fewer words to hide behind.
As the state’s online monitoring apparatus has grown in recent years, netizens have found ways to take jabs at the government through wordplay, the use of mutated or ancient characters, and nonstandardized electronic scripts developed in places like Taiwan. The Font Bank project will standardize the language, and as the scripts for secret usage enter into an official database, subversive language will be more easily detected. The newly digitized characters will help China to better track people’s movements, finances, and public and private speech.
But the project will do so much more. Putting the largest vocabulary online has been described as “sailing out on a borrowed ship” — a strategy that makes use of other countries’ networks, infrastructure and resources to take China’s agenda global. Adding a half million more characters may not be what the Jesuits prayed for, but it marks a new form of smart power for a nation still on the rise.
Jing Tsu, a professor at Yale, is writing a book on how China has transformed the Chinese language into a global technology.