
New social and data science findings using the KRTK Databank


The Centre for Economic & Regional Studies[1] (KRTK) Databank has created Admin3, the third version of Admin, its research database of linked public administration records for the social sciences. The Admin database offers a further opportunity for independent, ethical and professional scientific research on Hungarian society based on administrative data. The previous waves of linked data will also be made available for wider scientific use.

The database created by the KRTK Databank marks the first time in the Central and Eastern European region that a database combining public administration records has been created expressly for research purposes. Because the database draws on such a wide range of records, its importance to social science is enormous: almost any topic that can be examined with administrative data can be studied in one place. The database is being cleaned with scientific care by the staff of the Databank, in partnership with other researchers who have been using the records for an extended period.

KRTK researchers have been working for many years on data-driven research into important policy issues. The database is currently being used for research in health science, health policy, regional sciences, labor economics, business research, migration research, educational research, agronomy and social policy. Results of international significance have also been achieved on the employment of people with disabilities, the impact of pensions on health care expenditure and changes in unemployment benefits. The figures below display the main findings of Admin-based research on the labor market situation of prisoners.

Employment and career paths of prisoners

[Figure: two charts on the employment and career paths of prisoners]

Source: Admin2 database, which tracks the labor market careers of nearly 40,000 people who were in prison at least once between 2003 and 2011. Study: Demand constraints in the reintegration of people released from prison (István Boza, Anikó Csáki, Virág Ilyés, János Köllő, Zsófia Kőműves, Lili Márk, Mercedes Mészáros). Publication pending.

The latest results of domestic relevance concern the consequences of lowering the compulsory school age, the later labor market effects of 10th graders' competency test results and the effectiveness of the vocational training reform.

The following figures show the effect of 10th graders' competency test outcomes on their earnings at the age of 25. Admin3-based research found that students who achieved better competency test scores had higher adult earnings and were less likely to become unemployed.

Relationship between the 2008 test results of 10th graders and their October 2017 earnings and chances of unemployment, across 20 groups formed on the basis of the test results:

[Figure: Earnings (log) by test score. Panels: Mathematics; Reading comprehension]

[Figure: Unemployment by test score. Panels: Mathematics; Reading comprehension]

Averages for the 20 groups according to test score

Source: Admin3 database. Zoltán Hermann, Dániel Horn, János Köllő, Anna Sebők, András Semjén & Júlia Varga: The impact of reading and mathematics test results on future earnings and employment. In: Fazekas, K.; Csillag, M.; Hermann, Z.; Scharle, Á. (eds.). Munkaerőpiaci Tükör 2018, 45-53. English edition in: Fazekas, K.; Csillag, M.; Hermann, Z.; Scharle, Á. (eds.). The Hungarian Labour Market – Review and Analysis, 2019, 45-52.
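The figures above report averages over 20 groups formed from test scores. As a rough illustration of that kind of summary, the sketch below sorts individuals by score, splits them into 20 groups of equal size and averages an outcome within each group. The source does not describe how the groups were actually formed, so the equal-sized, rank-based grouping is an assumption, the function name `group_means` is hypothetical, and the numbers are synthetic placeholders rather than Admin3 data.

```python
from statistics import mean


def group_means(scores, outcomes, n_groups=20):
    """Average an outcome within rank-based groups of (nearly) equal size."""
    # Sort individuals by test score, keeping each outcome with its score.
    pairs = sorted(zip(scores, outcomes), key=lambda pair: pair[0])
    n = len(pairs)
    groups = [[] for _ in range(n_groups)]
    for rank, (_, outcome) in enumerate(pairs):
        # Map the rank 0..n-1 onto a group index 0..n_groups-1.
        groups[rank * n_groups // n].append(outcome)
    # Skip empty groups, which can occur when n < n_groups.
    return [mean(group) for group in groups if group]


# Synthetic placeholder data, not taken from the Admin3 database.
scores = list(range(100))
outcomes = [2.0 * s for s in scores]
averages = group_means(scores, outcomes)
```

With 100 synthetic observations, each of the 20 groups holds five individuals, and `averages` contains one mean per group in ascending order of test score.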

Admin

The records in the Admin database are linked at the individual level, anonymously. This means that 5 million people can be tracked on a monthly basis for 15 years without being directly identifiable. The database contains school and study records as well as competency test results. Later, as the individuals get older, their labor market patterns become visible, and their jobs, occupations, peer groups and earnings can be characterized. The database can be used to see when individuals receive sick pay, when they leave a job, and even when they retire. It also shows whether a person who is not currently working is receiving some kind of state benefit or has been registered as unemployed. In addition, it provides data on their state of health.

Protection of personal data

Data is linked using a secure hashing algorithm unknown to all registrars. The hashing procedure replaces the unique identifiers (such as the TAJ number, the Hungarian social security identifier) in the source records with hash values. As a result, information on a given individual from different registers can be matched unambiguously across registers, but the data cannot be traced back to specific persons on the basis of an identifier.
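The idea of linking on hashed identifiers can be sketched as follows. The source does not name the algorithm used, so HMAC-SHA256 with a secret key is an assumption chosen for illustration, as are the function names `pseudonymize` and `link_records`, the key and the made-up identifiers. The point is only that the same identifier always yields the same hash value, so records can be joined without the identifier itself appearing in the linked data.

```python
import hashlib
import hmac


def pseudonymize(identifier: str, secret_key: bytes) -> str:
    """Return a keyed SHA-256 digest of an identifier as a hex string."""
    # Drop separators so "123 456 789" and "123-456-789" hash identically.
    normalized = "".join(ch for ch in identifier if ch.isalnum())
    return hmac.new(secret_key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


def link_records(register_a: dict, register_b: dict, secret_key: bytes) -> dict:
    """Join two registers on the hashed identifier instead of the raw one."""
    hashed_a = {pseudonymize(k, secret_key): v for k, v in register_a.items()}
    hashed_b = {pseudonymize(k, secret_key): v for k, v in register_b.items()}
    common = hashed_a.keys() & hashed_b.keys()
    return {h: {**hashed_a[h], **hashed_b[h]} for h in common}


# Hypothetical key and made-up records, for illustration only.
SECRET_KEY = b"known-only-to-the-linking-party"
education = {"123 456 789": {"math_score": 1650}}
employment = {
    "123-456-789": {"employed": True},
    "987 654 321": {"employed": False},
}
linked = link_records(education, employment, SECRET_KEY)
```

Here `linked` holds a single merged record whose key is a 64-character hash value. Without the secret key, the hash cannot be recomputed from a known identifier, which is why the key stays with the party performing the linkage.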

To further reduce the disclosure risk, the liaison body (Nemzeti Infokommunikációs Szolgáltató Zrt.) also performs anonymization. As part of the process, it consolidates categories with a critically low number of cases (e.g. codes for various illnesses, educational qualifications). In addition, to protect individual data, the KRTK Databank only allows the Admin database collection to be accessed in a secure and closed server environment, and solely by researchers and professionals commissioned by a scientific institute or with an appropriate and verifiable scientific objective.
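Consolidating categories with few cases can be illustrated with a minimal sketch. The source does not say which threshold or grouping rules the liaison body applies, so the `min_count` value, the `"Other"` label and the function name `consolidate_rare_categories` are assumptions made for illustration, and the codes are made up.

```python
from collections import Counter


def consolidate_rare_categories(values, min_count, other_label="Other"):
    """Replace categories that occur fewer than min_count times with one label."""
    counts = Counter(values)
    # Note: the merged category can itself stay below min_count; a real
    # procedure would need a further rule for that case.
    return [v if counts[v] >= min_count else other_label for v in values]


# Made-up category codes, for illustration only.
codes = ["A", "A", "A", "B", "B", "B", "C", "D"]
safe_codes = consolidate_rare_categories(codes, min_count=3)
```

In this example the codes `C` and `D`, each occurring once, are merged into `Other`, while `A` and `B` are kept because they meet the threshold.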

Professional protection of individual data and scientific ethical criteria provide mutual protection against disclosure risk. By contrast, the correct and ethical use of individual data is more doubtful on the open market, where it is not possible to provide the level of control found in scientific work.

Independent, cross-disciplinary, data-driven social research practices that pay particular attention to verifiable data use are needed to develop policies, to create an effective scientific and public administration sphere, and to develop data ethics guidelines applicable to all areas of life.

KRTK Databank

This year marks the 15th anniversary of the KRTK Databank, which is at the forefront of developing the scientific infrastructure for empirical social science research in Hungary. In doing so, it carries out five main empirical activities. First, it produces large-scale databases built on administrative records that track people and companies in detail over an extended period.

Second, it acquires the most important household and corporate surveys and prepares them for research (for example, by harmonizing them, putting them in chronological order, or cleaning them). Third, for the last six years it has developed and managed the KSH-KRTK research room, which is available to all KRTK researchers and co-authors. Fourth, the KRTK Databank operates an open experimental laboratory, which enables researchers to conduct social science experiments. Fifth, the Databank serves as a place for internships, is involved in university education by holding courses, and makes a number of databases available to graduates and doctoral students.

The significance of the Databank is indicated by the fact that all five Momentum groups of KRTK have now used the infrastructure. A total of 580 publications (theses, dissertations, domestic and international studies) have been written using the data. A number of domestic and foreign policy impact assessments have been carried out based on data from the Databank, including under programs such as ERC, H2020, Momentum, the Cooperation Program of Excellence and OTKA. Researchers can use almost all of the services and databases of the KRTK Databank free of charge.

Detailed information is available on the KRTK Databank website.

You can read more about the Admin database in the article below.

[1] Hungarian Academy of Sciences Centre of Excellence