Disease informatics

Disease informatics (also infectious disease informatics) studies the knowledge production, sharing, modeling, and management of infectious diseases. It became a more widely studied field as a by-product of the rapid increase in the amount of widely available biomedical and clinical data, and in response to the demand for useful analyses of such data. Because infectious diseases contribute to millions of deaths every year, the ability to identify and understand disease diffusion is crucial for society to apply control and prevention measures. The knowledge gained by researchers in the field can aid policymakers' decisions on issues such as spreading public awareness, updating the training of health professionals, and buying vaccines. Beyond aiding policymakers, the goals of disease informatics include increased identification of biomarkers for transmissibility, improved vaccine design, a deeper understanding of host-pathogen interactions, and the optimization of antimicrobial development.


Methods


Artificial intelligence

The use of artificial intelligence (AI) tools, such as machine learning and natural language processing (NLP), in disease informatics increases efficiency by automating and speeding up several data analysis processes. Advances in AI and the increased accessibility of data aid predictive modeling and public health surveillance. Predictive models examine vast data sets and forecast future outcomes, which improves the ability to predict disease outbreaks and helps guide public health treatments. AI also provides a valuable avenue by combining spatial modeling with geographic information system (GIS) data to uncover geographical patterns (for example, disease clusters), supporting data-driven decision-making for local-level predictions of disease diffusion. As AI continues to grow, more advances in its use in disease informatics are expected.
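
As a rough illustration of how geographic case data can surface a disease cluster, the sketch below bins case coordinates into a regular grid and reports cells that hold several cases. It is a simplified stand-in for GIS-based spatial analysis, not a statistical cluster test; the function names, the 0.1-degree cell size, the three-case threshold, and the coordinates are all assumptions made for the example.

```python
import math
from collections import Counter

def grid_cell(lat, lon, cell_deg=0.1):
    """Map a coordinate to the integer index of its grid cell."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))

def find_clusters(cases, min_cases=3, cell_deg=0.1):
    """Return {cell: count} for grid cells holding at least min_cases reports."""
    counts = Counter(grid_cell(lat, lon, cell_deg) for lat, lon in cases)
    return {cell: n for cell, n in counts.items() if n >= min_cases}

# Hypothetical case locations as (latitude, longitude) pairs.
CASES = [(40.71, -74.01), (40.72, -74.02), (40.73, -74.03), (40.95, -73.55)]
print(find_clusters(CASES))
```

A fixed grid is easy to reason about but sensitive to where the cell boundaries fall, so a cluster that straddles two cells can be missed.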


Machine learning

Machine learning (ML) techniques aid disease informatics through their capability to predict, spatially and temporally, the progression and transmission of infectious diseases. ML algorithms are used to analyze extensive, complex data sets and to identify patterns across varying types of data, such as demographics, electronic health records, and environmental conditions. Commonly used ML techniques include decision trees, random forests, support vector machines, and deep learning networks. Researchers apply these tools to data sets (for example, genomic data, social media posts, and health records) to predict the potential sources of an outbreak, the likelihood of an individual contracting a certain disease, and the number of cases of a disease in a given region. According to numerous studies, ML models have proven to be just as accurate as traditional statistical methods at predicting the spread and onset of diseases, especially when multiple ML models are used concurrently.
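
To make the idea of a decision tree concrete, the following sketch trains a decision stump, the one-level case of a decision tree, on a single feature. The data set (weekly rainfall in millimetres paired with an outbreak flag) and the function names are hypothetical; a practical model would use many features and an established ML library.

```python
def train_stump(samples):
    """Find the threshold on one feature that best separates labels 0 and 1.

    samples is a list of (feature_value, label) pairs. Returns a tuple
    (threshold, training_errors), or None if fewer than two distinct
    feature values are present. The stump predicts 1 when x > threshold.
    """
    values = sorted({x for x, _ in samples})
    # Candidate thresholds are midpoints between consecutive distinct values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    best = None
    for t in candidates:
        errors = sum((x > t) != bool(y) for x, y in samples)
        if best is None or errors < best[1]:
            best = (t, errors)
    return best

def predict(threshold, x):
    """Predict 1 (outbreak) when the feature exceeds the learned threshold."""
    return int(x > threshold)

# Hypothetical training data: (weekly rainfall in mm, outbreak observed?).
SAMPLES = [(5, 0), (12, 0), (20, 0), (35, 1), (50, 1), (41, 1)]
threshold, errors = train_stump(SAMPLES)
print(threshold, errors, predict(threshold, 30))
```

Random forests extend this idea by training many deeper trees on resampled data and combining their votes.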


Text mining

Text mining has become a beneficial avenue for querying large amounts of data to aid gene mapping and the analysis of genomes. It allows medical databases to be queried for processes such as genomic mapping, integrating genomic and proteomic data to map genes and highlight their interrelationships with various diseases. Data on targeted sequences can be retrieved in two ways: by similarity search or by keyword search. A similarity search (using software such as BLAST) is performed by entering a known sequence as a query to find sequences that resemble it. A keyword search (public tools include SRS, Entrez, and ACNUC) uses annotations that define the features of genes, such as sequence positions, to retrieve the desired gene sequences.
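
The two retrieval modes can be contrasted with a toy example. The sketch below ranks records by the overlap of their k-mer sets with a query sequence (similarity search) and filters records by annotation text (keyword search). It is an illustration only: BLAST uses heuristic local alignment rather than k-mer set overlap, and the record layout, sequences, and annotations here are invented for the example.

```python
def kmers(seq, k=3):
    """Return the set of all length-k substrings of a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def similarity_search(query, records, k=3):
    """Rank records by Jaccard similarity of their k-mer sets to the query."""
    q = kmers(query, k)
    scored = []
    for rec in records:
        r = kmers(rec["sequence"], k)
        union = q | r
        score = len(q & r) / len(union) if union else 0.0
        scored.append((score, rec["id"]))
    # Highest score first; ties are broken by record id.
    return sorted(scored, key=lambda item: (-item[0], item[1]))

def keyword_search(term, records):
    """Return ids of records whose annotation contains the term."""
    term = term.lower()
    return [rec["id"] for rec in records if term in rec["annotation"].lower()]

# Hypothetical records; real databases hold far richer annotations.
RECORDS = [
    {"id": "seq1", "sequence": "ATGGCGTACGTTAGC", "annotation": "surface protein"},
    {"id": "seq2", "sequence": "TTTTTTCCCCCCGGG", "annotation": "polymerase subunit"},
    {"id": "seq3", "sequence": "ATGGCGTACGTAAGC", "annotation": "surface protein variant"},
]
print(similarity_search("ATGGCGTACGTTAGC", RECORDS))
print(keyword_search("surface", RECORDS))
```

The similarity search needs only the sequence itself, while the keyword search depends entirely on the quality of the annotations attached to each record.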


Syndromic surveillance

Through a process called syndromic surveillance (related to public health surveillance), data analysis methods can be used to predict potential disease outbreaks by detecting timely, pre-diagnosis health indicators. Syndromic surveillance combines demographic data (age, gender, ethnicity, etc.) with patient visit data (admission status, chief complaint, type of office visit, etc.), which can be put through natural language processing to highlight potential predictors of an outbreak. Because predicting possible outbreaks is time-sensitive, chief complaint data is valuable: it is available much more quickly than formal diagnosis data from physicians' offices. The key to successfully harnessing surveillance data for disease informatics is to use more than one source. Other important sources that are commonly used in combination include the following:
* Over-the-counter (OTC) drug sales
* Hospital admissions
* Absenteeism rates from schools and workplaces
* Lab test orders
* Poison control centers' communications
* Case report numbers
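
The pipeline described above can be sketched in two steps: free-text chief complaints are mapped to a syndrome category, and daily counts for a category are compared against a recent baseline. The keyword lists, the seven-day baseline, and the two-standard-deviation threshold are assumptions chosen for illustration; operational systems use richer language processing and more careful statistics.

```python
from statistics import mean, stdev

# Hypothetical keyword lists; matching is by simple substring.
SYNDROME_KEYWORDS = {
    "respiratory": ("cough", "shortness of breath", "wheez"),
    "gastrointestinal": ("vomit", "diarrhea", "nausea"),
}

def classify_complaint(text):
    """Assign a chief complaint to the first syndrome whose keyword it contains."""
    lowered = text.lower()
    for syndrome, keywords in SYNDROME_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return syndrome
    return "other"

def flag_alerts(daily_counts, baseline_days=7, threshold=2.0):
    """Return indices of days whose count exceeds the baseline mean by more
    than threshold standard deviations of the preceding baseline window."""
    alerts = []
    for day in range(baseline_days, len(daily_counts)):
        window = daily_counts[day - baseline_days:day]
        mu, sigma = mean(window), stdev(window)
        # Floor sigma at 1.0 so a perfectly flat baseline does not alert on noise.
        limit = mu + threshold * max(sigma, 1.0)
        if daily_counts[day] > limit:
            alerts.append(day)
    return alerts

print(classify_complaint("Cough and fever for 3 days"))
print(flag_alerts([10, 12, 11, 9, 10, 11, 12, 30]))
```

Running the same check on several sources, such as OTC sales and school absenteeism, and looking for agreement between them is one way to act on the advice to use more than one source.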


Limitations and future prospects


Accessibility concerns

The accuracy of these AI tools and techniques relies on high-quality, comprehensive data. Accessing and collecting such data is an ongoing challenge because most of the data pulled is incomplete, noisy, and contains human errors (e.g., in grammar, abbreviations, and spelling), which means the data must undergo thorough cleaning (data cleansing) before it can be used. The data also comes from numerous sources (due to differences in data availability and governance) that use varying formats and software, creating a need for some form of standardized infrastructure to better integrate and manage data. A standardized taxonomy for data analysis and predictive modeling would facilitate research collaboration, accelerate decisions, and help select the right predictive models. One method being used is federated learning, which allows a model to be trained across multiple centers without sharing raw data, keeping the data safe within its source. However, differences in formatting and software across sources still affect this approach and can hinder model convergence, so algorithmic improvements are needed. Another concern is the potential for bias and overfitting of the predictive models, which could lead to inaccurate predictions. Human error can persist even when these tools are used to automate tasks: if the AI tools are trained incorrectly, they will produce inaccurate results. A relevant study suggests that combining AI with wearable devices and other emerging technology could ease some of these challenges by providing real-time data for the models, which could increase the accuracy of the data in its raw form, reduce the time spent cleaning it, and allow the models to make more accurate predictions.
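
A minimal sketch of the federated learning idea, assuming a simple linear model and the federated averaging scheme: each site runs gradient descent on its own private data, and only the resulting model weights are sent back to be averaged. The site data, learning rate, and round counts are invented for the example, and the sketch omits protections such as secure aggregation that a real deployment would need.

```python
def local_update(weights, data, lr=0.05, epochs=20):
    """Run gradient descent on mean squared error using one site's private data."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)

def federated_average(site_weights, site_sizes):
    """Average site models, weighting each by its number of samples."""
    total = sum(site_sizes)
    w = sum(wt[0] * n for wt, n in zip(site_weights, site_sizes)) / total
    b = sum(wt[1] * n for wt, n in zip(site_weights, site_sizes)) / total
    return (w, b)

def train_federated(sites, rounds=50):
    """Alternate local training and server-side averaging; raw data never leaves a site."""
    global_weights = (0.0, 0.0)
    for _ in range(rounds):
        updates = [local_update(global_weights, data) for data in sites]
        global_weights = federated_average(updates, [len(d) for d in sites])
    return global_weights

# Hypothetical data from two centers, both following y = 2x + 1.
SITE_A = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]
SITE_B = [(3.0, 7.0), (4.0, 9.0)]
print(train_federated([SITE_A, SITE_B]))
```

In this example both sites share one format and one underlying relationship, which is why averaging converges cleanly; the heterogeneity described above is exactly what makes convergence harder in practice.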


Ethical concerns

A critical concern in using AI and predictive modeling for disease informatics is data security and privacy. The data sources used (electronic health records, demographics, etc.) contain highly sensitive information that must be protected for all parties involved. Any models or techniques used need to comply with local governmental regulations and laws, such as HIPAA in the United States. The data must also undergo rigorous anonymization and de-identification protocols to protect patient privacy. Through the further use and growth of explainable AI (XAI, or explainable artificial intelligence), researchers and all parties involved can ensure transparency and accountability when using data analysis and computational methods in disease informatics. XAI provides explanations of how the algorithms being used work, why they were chosen, what knowledge they produce, and so on.
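
As a simple illustration of de-identification, the sketch below drops direct identifiers, replaces the patient identifier with a keyed hash, and generalizes exact age into a ten-year band. The field names, the key, and the band width are hypothetical, and the sketch is not presented as sufficient to satisfy HIPAA or any other regulation.

```python
import hashlib
import hmac

# Hypothetical names of fields that identify a person directly.
DIRECT_IDENTIFIERS = {"name", "phone", "address"}

def pseudonymize(patient_id, secret_key):
    """Replace an identifier with a keyed hash so records can still be linked."""
    digest = hmac.new(secret_key, patient_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]

def age_band(age, width=10):
    """Generalize an exact age into a band such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

def deidentify(record, secret_key):
    """Return a copy of the record with identifiers removed or generalized."""
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    cleaned["patient_id"] = pseudonymize(record["patient_id"], secret_key)
    cleaned["age"] = age_band(record["age"])
    return cleaned

# Hypothetical record and key, for demonstration only.
RECORD = {"patient_id": "P-001", "name": "Jane Doe", "age": 34,
          "phone": "555-0100", "chief_complaint": "fever"}
print(deidentify(RECORD, b"demo-key"))
```

A keyed hash is used rather than a plain hash so that someone without the key cannot recompute pseudonyms from a list of known identifiers.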

