================
TETIS lab (Cirad and Inrae) invites applications for a fully funded 3-year PhD position on data mining and data ingestion in the context of the H2020 MOOD Project (Montpellier, France)
================

-- Starting date: around October 1st, 2020
-- Subject: Generic methods for epidemiological monitoring based on the integration of heterogeneous textual data

BACKGROUND AND PROBLEM

The timely detection of (re)emerging animal infectious diseases worldwide is a keystone of risk assessment and risk management for both human and animal health. Several surveillance systems have been designed to automate the monitoring of online sources covering a wide range of health threats, such as MedISys (Mantero et al., 2011), HealthMap (Freifeld et al., 2008), GPHIN (Blench, 2008), ProMED (Madoff, 2004) and PADI-web (Valentin et al., 2020).

PADI-web (Platform for Automated extraction of Disease Information from the web) is an automated system dedicated to monitoring online news sources for the detection of animal infectious diseases. PADI-web was developed to meet the needs of the French Epidemic Intelligence System (FEIS, or Veille sanitaire internationale in French), which is part of the animal health epidemiological surveillance Platform (ESA Platform). The tool automatically collects news with customised multilingual queries, classifies them and extracts epidemiological information.

SUMMARY OF PROPOSED WORK

In addition to improving the current platform (taking into account the heterogeneity of the sources used in a multilingual context), the PhD work deals with the development of generic methods for the monitoring and ingestion of textual data in epidemiology, by proposing a new system called PADI-Web One Health. Three main aspects will be studied.

- Proposal of a generalized framework which integrates different types of surveillance.
  Since 2014, the PADI-web system has focused on epidemiological monitoring related to animal health. The objective is to extend these approaches into a generic framework that integrates plant and food health surveillance by taking into account spatio-temporal issues (extraction and disambiguation of spatial information in texts) and the genericity of certain concepts (e.g., symptoms).

- Identification of fine-grained epidemiological events in multi-source data.
  The proposed task consists of identifying information in multilingual data (news, scientific articles, etc.) and qualifying the extracted information according to criteria such as data quality and type of source. In this context, we plan to focus on the identification of weak signals in heterogeneous textual data. The proposed methods will combine machine learning approaches, rule-based systems and word embedding methods.

- Fusion of epidemiological information from heterogeneous data.
  The last expected contribution seeks to combine information from official data (e.g., OIE) with unofficial textual data obtained by text mining in order to propose a generic, robust and complete method.

PROJECT

The proposed thesis is part of the H2020 MOOD project "Monitoring Outbreak events for Disease surveillance in a data science context" (https://mood-h2020.eu/). This project, which brings together 25 partners from 10 countries, aims to improve the detection, surveillance and evaluation of emerging infectious diseases in Europe using techniques for exploring and analysing massive data from multiple sources. It is led by CIRAD (UMR ASTRE) with significant participation from UMR TETIS in WP2 and WP3.

APPLICATION

Candidates must have a solid background in Computer Science, biostatistics or quantitative epidemiology, and a passion for applying this knowledge to One Health topics. Applicants should hold, or be in the process of completing, a Master's degree in Computer Science.
Application documents (*=mandatory) to send before the 26th of June 2020:
* detailed CV *
* motivation letter *
* grades of the last two academic years (with ranking) *
* contacts for recommendation *
* master thesis report

to send by e-mail to:
* Mathieu Roche (CIRAD, UMR TETIS): mathieu.roche@cirad.fr
* Maguelonne Teisseire (Inrae, UMR TETIS): maguelonne.teisseire@inrae.fr
* Renaud Lancelot (Cirad, UMR ASTRE): renaud.lancelot@cirad.fr

Research lab: https://www.umr-tetis.fr

BIBLIOGRAPHY

- Valentin S, Arsevska E, Mercier A, Falala S, Rabatel J, Lancelot R, Roche M. PADI-web: an event-based surveillance system for detecting, classifying and processing online news. In Post-Proceedings of 8th Language & Technology Conference, LTC 2017, Lecture Notes in Computer Science / Lecture Notes in Artificial Intelligence, Springer, 2020, to appear.
- Valentin S, Arsevska E, Falala S, de Goër J, Lancelot R, Mercier A, Rabatel J, Roche M. PADI-web: a multilingual web-based biosurveillance system for the monitoring of animal infectious diseases. Computers and Electronics in Agriculture, Elsevier, 169: 105163, 2020.
- Arsevska E, Valentin S, Rabatel J, de Herve JG, Falala S, Lancelot R, Roche M. Web Monitoring of Emerging Animal Infectious Diseases Integrated in the French Epidemic Intelligence System in Animal Health. PLOS One, 13: e0199960, 2018. Selected as one of the "Best papers" of the IMIA Yearbook of Medical Informatics 2019.
- Drury B, Roche M. A survey of the applications of text mining for agriculture. Computers and Electronics in Agriculture, 163: 104864, 13 p., 2019.
- Lossio-Ventura JA, Bian J, Jonquet C, Roche M, Teisseire M. A novel framework for biomedical entity sense induction. Journal of Biomedical Informatics, 84: 31-41, 2018.