San José State University |
---|
applet-magic.com Thayer Watkins Silicon Valley & Tornado Alley USA |
---|
(a.k.a. Machine Translation) |
Before computers, attempting a translation without a fluent translator involved the creation of glosses: words written between the lines of the text to be translated. A gloss could thus be prepared mechanically by someone with a proper bilingual dictionary but without a knowledge of the language to be translated. With this word-for-word translation of the material in hand, a person with knowledge of the subject might be able to create a useful translation.
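
The mechanical nature of glossing can be illustrated with a short Python sketch. It is only a toy: the four-word Russian-English dictionary and the function names `gloss` and `format_gloss` are invented here for illustration, and a real gloss would draw on a full bilingual dictionary.

```python
# Hypothetical toy dictionary; a real gloss would use a full bilingual dictionary.
RUSSIAN_ENGLISH = {
    "спутник": "satellite",
    "летит": "flies",
    "над": "over",
    "землёй": "earth",
}


def gloss(text, dictionary):
    """Pair each source word with its dictionary entry, keeping source word order."""
    pairs = []
    for word in text.split():
        key = word.lower().strip(".,;:!?")  # look up the bare, lower-cased word
        pairs.append((word, dictionary.get(key, "?")))  # "?" marks a missing entry
    return pairs


def format_gloss(pairs):
    """Lay the gloss out as two aligned lines, the translation under the source."""
    widths = [max(len(src), len(tgt)) for src, tgt in pairs]
    top = " ".join(src.ljust(w) for (src, _), w in zip(pairs, widths))
    bottom = " ".join(tgt.ljust(w) for (_, tgt), w in zip(pairs, widths))
    return top.rstrip() + "\n" + bottom.rstrip()


if __name__ == "__main__":
    print(format_gloss(gloss("Спутник летит над землёй.", RUSSIAN_ENGLISH)))
```

Nothing in the sketch knows any grammar. The result keeps the word order of the source, which is why a reader who knows the subject matter is still needed to turn it into a usable translation.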
The impulse for computer translation came from the launching of Sputnik in 1957. There was a widespread perception that Russian technology was ahead of the West. I was at M.I.T. at the time and I remember a professor holding up a monograph in Russian on some technical subject and saying that he judged that the Russians were six years ahead of us.
If we were to catch up, the logical place to start was reading what was available in Russian. But not many working scientists could afford to stop what they were doing and start learning Russian. Translation of technical material was obviously labor intensive, and many translators who were fluent in Russian had neither the interest nor the expertise in the hard science and technology needed to do the translation. The wondrous machines called computers seemed to be the answer to the problem.
In 1958, the year after the launching of Sputnik, Congress passed the National Defense Education Act. This provided funding for education and research, but it also provided for a technical scientific information service. This information service was given the task of helping American scientific and engineering organizations get access to information from throughout the world. The legislation specifically mentioned that use should be made of new and improved methods of translation such as mechanized systems. Such mechanized systems became known as machine translation, a term now archaic and replaced by computer translation.
There had been moves toward computer translation years before Sputnik. In 1946 Warren Weaver of the Rockefeller Foundation proposed that the successful use of computers to break codes during World War II be extended to the translation of languages. Andrew D. Booth opined that such computer translation only required a large enough memory capacity. In July of 1949 Weaver distributed a memorandum on the topic to people in the computing community. As a result of that memorandum and funding from the Rockefeller Foundation, research on the topic was begun at M.I.T., U.C.L.A. and the University of Washington.
A conference on computer translation was held in the spring of 1952. In 1954 Georgetown University and IBM revealed that a successful experiment in translation of Russian to English had been completed. This was for a narrow range of text that involved a vocabulary of only 250 words and six rules of syntax. This was an artificially limited range of text, but its success prompted the Soviet leadership to initiate a program of computer translation. Work in computer translation in Britain began in 1955 at Cambridge University, and in France a program of research in computer translation was set up at Grenoble University. In Japan the program was established at Kyushu University in 1955.
By 1960 some researchers had doubts as to whether high quality computer translation was possible. A prominent researcher in the program at M.I.T., Yehoshua Bar-Hillel, published a paper declaring that high quality computer translation is impossible. He left the field.
In 1964 the U.S. National Academy of Sciences set up a committee to investigate the feasibility of computer translation. After a year of study the Committee published its report as the Automatic Language Processing Advisory Committee (ALPAC) Report. After studying the programs in America and Europe and comparing computer translations with translations done by humans, it concluded that computer translation was inferior to human translation not only in quality but also in cost. It recommended that further effort in the field be devoted to making human translation faster and cheaper. As a result of the ALPAC Report, funding for computer translation by the U.S. government dropped to virtually zero. The researchers in the field left and started new careers. This was a tragic loss of expertise. Research in computer translation was virtually abandoned from 1965 to 1975.
Fortunately there were a few researchers who did not leave the field. In 1965 the system of Russian-English translation that was developed at Georgetown University was transferred to the Air Defence Center at Rome, New York. There it was used to translate Russian scientific material. Its output required post-editing, as do all computer translations, but it saved time for the human translators by giving them a first draft. They did not have to do all of the typing.
One of the researchers at Georgetown, Peter Toma, a Hungarian-born linguistics expert, developed a Russian-English translation system which, in 1973, he used as the basis for a private company, SYSTRAN. The company went on to develop translation software for French, Italian, Spanish and Portuguese to and from English. There will be more on SYSTRAN later. Another system, LOGOS, was developed to translate between English and Vietnamese. In France, Bernard Vauquois of the Centre National de la Recherche Scientifique developed a computer program to translate between Russian and French.
In Canada both English and French are national languages. This means that all government documents must be published in both English and French. Computer translation is generally not of suitable quality to be used in fulfilling this language requirement. However, there are numerous weather stations throughout Canada that communicate weather reports and short-term forecasts. The range of text is narrow and the sentences are generally very simple in structure. Therefore the Canadian government created a system, which became operational in 1978, in which the English language reports from the weather stations are sent to the University of Montreal, where, as a rule, they are translated into French automatically. The system recognizes when it is confronted by an unusual sentence and presents its translation to a human translator for verification or correction. It thus achieves a high success rate of 95 percent. The system operates 24 hours a day, seven days a week. The name for the system is TAUM METEO, from Traduction Automatique de l'Université de Montréal.
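
The division of labor just described, automatic translation for routine sentences and referral to a human for unusual ones, can be sketched in Python. This is an illustration of the idea only and says nothing about how TAUM METEO was actually built; the phrase table, the sentence patterns and the function `translate_report` are all hypothetical.

```python
import re

# Hypothetical phrase table for a narrow weather sublanguage.
PHRASES = {
    "cloudy": "nuageux",
    "sunny": "ensoleillé",
    "rain": "pluie",
    "snow": "neige",
}

# Hypothetical sentence patterns, each paired with a function that builds the French.
PATTERNS = [
    (re.compile(r"^(cloudy|sunny) today$"),
     lambda m: f"{PHRASES[m.group(1)]} aujourd'hui"),
    (re.compile(r"^(rain|snow) tonight$"),
     lambda m: f"{PHRASES[m.group(1)]} ce soir"),
    (re.compile(r"^high (\d+)$"),
     lambda m: f"maximum {m.group(1)}"),
]


def translate_report(sentence):
    """Return (french_text, needs_review) for one weather sentence.

    A sentence that fits a known pattern is translated automatically.
    Anything else gets a rough word-for-word draft and is flagged so
    that a human translator can verify or correct it.
    """
    text = sentence.strip().lower().rstrip(".")
    for pattern, build in PATTERNS:
        match = pattern.match(text)
        if match:
            return build(match), False
    draft = " ".join(PHRASES.get(word, word) for word in text.split())
    return draft, True


if __name__ == "__main__":
    for line in ["Cloudy today.", "High 12.", "Funnel cloud sighted near the airport."]:
        print(line, "->", translate_report(line))
```

The narrower the range of text, the larger the share of sentences that fit a known pattern and never need to reach the human translator.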
As a result of the success of TAUM METEO there was an attempt to create a TAUM AVIATION for the automatic translation of messages concerning the maintenance of aircraft. This was much more difficult because of the broader range of topics and was not considered successful.
Completely separate from the field of computer translation, there were developments going on at the time in fundamental research in linguistic theory. Noam Chomsky published his revolutionary work Syntactic Structures in 1957. This was brought to fruition with his Aspects of the Theory of Syntax in 1965. Chomsky had a foundation in the mathematical theory of languages and automata, which cast his theories into a form suitable for use with computers. The phrase structure of sentences is one such fruitful line of analysis.
Chomsky's approach emphasized the formal structure of language text. Other linguists thought that a consideration of meaning, the semantics of text, was essential to the translation process. Charles Fillmore developed case grammar for describing the relationship between the words of a sentence, taking semantics into account. This was published in 1968 in Universals in Linguistic Theory (edited by E. Bach and R. T. Harms). Terry Winograd of M.I.T. constructed a system that allowed for a conversation between a human and a computer, provided that a knowledge of the world under discussion is incorporated in the system. There are some sentences that have alternate linguistically legitimate interpretations. For example, the sentence Time flies like an arrow has a philosophical meaning, but a computer could also interpret it as a statement about creatures called time flies. Knowing that the subject of discussion is philosophy rather than biology disambiguates the sentence.
In Japan further exploration in computer translation continued but with more limited scope. For example, there was a project to create a system for the automatic translation of the titles of scientific articles in English into Japanese. About ten thousand titles were collected for linguistic analysis. The analysis concluded that there were only 18 different linguistic structures used in those titles.
Businesses involved in exporting technical products such as computers became aware that the cost of translating support documents amounted to a substantial portion of their marketing costs. While computer translation could not eliminate the need for human translators, it could reduce the time and costs by limiting the human component of the translation to post-editing. Human translators could also expedite the translation process by pre-editing; i.e., by making judgments on which material is suitable for computer translation and which is not.
The European Community required the translation of official documents into all the languages of the member states. This led to delays in the holding of meetings. In 1976 the European Community started using SYSTRAN for English to French translation, with, of course, post-editing by human translators. By 1981 this was extended to the use of SYSTRAN for German and Italian.
For multilingual organizations like the European Community there is an economy of scale. A good computer translation system involves three phases. First, the possible phrase structures of a sentence are generated. Second, a knowledge of the topic area is used to select the most probable phrase structure from among the alternatives. Third, the selected phrase structure is converted into the target language through the use of a dictionary and syntactic rules. Once the more difficult first two phases are carried out, it is relatively easy to produce translations in a number of different languages.
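
The three phases can be sketched in Python using the ambiguous sentence Time flies like an arrow. Everything in the sketch is a toy assumption made for illustration: the lexicon, the two phrase structures, the topic preferences, the French and Spanish dictionaries and all of the function names. It is not a description of how SYSTRAN or any other real system works.

```python
# Hypothetical toy lexicon: the roles each English word may play.
LEXICON = {
    "time": {"NOUN", "MOD"},
    "flies": {"VERB", "NOUN"},
    "like": {"PREP", "VERB"},
    "an": {"DET"},
    "arrow": {"NOUN"},
}

# Hypothetical phrase structures: name -> (tag sequence, target word order).
STRUCTURES = {
    "comparison": (["NOUN", "VERB", "PREP", "DET", "NOUN"], [0, 1, 2, 3, 4]),
    "insect": (["MOD", "NOUN", "VERB", "DET", "NOUN"], [1, 0, 2, 3, 4]),
}

# Hypothetical topic knowledge: which structure each topic makes more probable.
TOPIC_PREFERENCE = {"philosophy": "comparison", "biology": "insect"}

# Hypothetical target dictionaries keyed by (English word, tag).
TARGETS = {
    "french": {
        ("time", "NOUN"): "le temps", ("time", "MOD"): "du temps",
        ("flies", "VERB"): "vole", ("flies", "NOUN"): "les mouches",
        ("like", "PREP"): "comme", ("like", "VERB"): "aiment",
        ("an", "DET"): "une", ("arrow", "NOUN"): "flèche",
    },
    "spanish": {
        ("time", "NOUN"): "el tiempo", ("time", "MOD"): "del tiempo",
        ("flies", "VERB"): "vuela", ("flies", "NOUN"): "las moscas",
        ("like", "PREP"): "como", ("like", "VERB"): "aman",
        ("an", "DET"): "una", ("arrow", "NOUN"): "flecha",
    },
}


def candidate_structures(words):
    """Phase 1: list every phrase structure that the lexicon allows."""
    return [
        name
        for name, (tags, _) in STRUCTURES.items()
        if len(tags) == len(words)
        and all(tag in LEXICON.get(word, set()) for word, tag in zip(words, tags))
    ]


def select_structure(candidates, topic):
    """Phase 2: use knowledge of the topic to choose among the candidates."""
    if not candidates:
        raise ValueError("no phrase structure fits the sentence")
    preferred = TOPIC_PREFERENCE.get(topic)
    return preferred if preferred in candidates else candidates[0]


def generate(words, structure, language):
    """Phase 3: convert the chosen structure with a dictionary and a word-order rule."""
    tags, order = STRUCTURES[structure]
    dictionary = TARGETS[language]
    return " ".join(dictionary[(words[i], tags[i])] for i in order)


def translate(sentence, topic, languages):
    """Run phases 1 and 2 once, then phase 3 once per target language."""
    words = sentence.lower().rstrip(".").split()
    structure = select_structure(candidate_structures(words), topic)
    return {language: generate(words, structure, language) for language in languages}


if __name__ == "__main__":
    print(translate("Time flies like an arrow.", "philosophy", ["french", "spanish"]))
    print(translate("Time flies like an arrow.", "biology", ["french"]))
```

The economy of scale shows up in `translate`: the analysis is done once, and each additional target language costs only one more dictionary and one more call to `generate`.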
SYSTRAN now offers computer translation between English and thirteen other languages. These include four Romance languages: French, Italian, Spanish and Portuguese; three Germanic languages: German, Dutch and Swedish; two Slavic languages: Russian and Polish; three Asian languages: Chinese (Mandarin in simplified characters), Japanese and Korean; and one Semitic language: Arabic.
I was fortunate enough to be acquainted with someone who is an expert in computer translation software. He recommended SYSTRAN to me as being the best. I immediately bought SYSTRAN's entry level package of nine European and three Asian languages. Over the years I upgraded to the top level version, which costs approximately one thousand dollars. My use of SYSTRAN was purely for fun.
As a professor at San José State University I had started delivering class handouts for my classes in economics as webpages on the internet. The University initially told me that I could have ten megabytes of space on the University server for my webpages. I recognized that I would very quickly fill up that space, particularly with diagrams. Image diagrams require a lot of space, often ten times as much as the accompanying text. I learned to create Java applets to draw the diagrams instead of using image files. At first that meant the diagram displayed more quickly from an applet than from an image file. Over the years that has changed.
Even with the economy of the Java applets I was soon using more than the ten megabytes of server space. The University did not restrict my use, but I was worried that this was only because the other faculty members were not using their ten megabytes. I was also putting up webpages on topics outside of economics. There were some inquiries as to whether all of my webpages were connected with my teaching in economics. I therefore secured a website from Yahoo! with the domain name applet-magic.com. Initially the applet-magic site was just a duplicate of my University site. There was an advantage to having two sites: I could tell my students that if the University site was down they could find the same material at the applet-magic site.
When I bought the SYSTRAN software I started using it to translate my webpages. I was well aware that while SYSTRAN might be the very best, that did not mean that it was perfect. Therefore, out of deference to the language departments at the University, I did not put up my imperfect translations on the University website. To my surprise I often got ten times as many accesses to the Spanish translations as I got for the original English webpages. I assumed this was because of a paucity of Spanish webpages on the technical topics.
I became aware of the problem of quality when I used the software to translate material from Spanish into English. The results were terrible English but I could, with a little effort, edit them into suitable English. With computer translation I always knew what was being said, whereas with only the original that was not true.
Computer translations, even the best, often offend someone fluent in the target language. Over the years I told my students about the translations and some of the international students looked at them. Their remarks were often quite candid. One Russian student said, "You know your Russian language translation program just doesn't work!" A Polish student said of the Polish translations that "They are so bad that they are almost useless." I perceive that Slavic languages are difficult for the computer translation programs to handle. The translations into the other European languages, on the other hand, often appeared in the top ten results of Google and Bing searches on their topics.
(To be continued.)
For the later history of computer translation see Computational Translation.