A new level in artificial intelligence-based research for the Hungarian language


Using neural-network-based machine learning algorithms, researchers at the ELKH Hungarian Research Centre for Linguistics (NYTK) have developed two world-class language models on supercomputers obtained through an ELKH infrastructure development tender.

HILANCO-GPTX, the first GPT-3-type artificial intelligence for the Hungarian language, was created as a joint development by NYTK and the University of Pécs. The system, suitable for fluent communication and text production in both English and Hungarian, can also generate program code. To train this bilingual system, the developers used an English text corpus of 102 billion words and a Hungarian text corpus of 25 billion words. Trained over three months, the model can produce well-formed sentences in both languages and can even translate between them.

Using the same technology and supercomputer system, NYTK staff also created PULI GPT-3SX, a language model trained solely on Hungarian material. Its training corpus was even larger than the Hungarian corpus used for HILANCO-GPTX: 32 billion words of exclusively Hungarian text.

Both language models are available free of charge for non-profit research and development purposes, and their demo versions can be accessed:

  • here – HILANCO-GPTX;
  • and here – PULI GPT-3SX.

As part of the Hungarian Science Festival, NYTK researchers are organizing an event titled "Artificial Intelligence and the Hungarian Language" to present these two systems and their other new research results. The event will take place on November 23, 2022, at 4 pm in the Ceremonial Hall of the Headquarters of the Hungarian Academy of Sciences.