Distributional Semantics to Characterize Cognitive Patterns in HIV/AIDS Online Community
In 2015, Louisiana ranked second in HIV infection rate (24.2%). Shared stories and conversations in HIV/AIDS online communities offer valuable information that can extend public health knowledge and support HIV prevention. However, exploiting such information presents significant challenges. For example, a large portion of the available data is highly unstructured, i.e., free text, which brings notable linguistic complexity. In addition, the extracted data may be better understood if they can be translated into cognitive and behavioral patterns. Building on distributional semantics, we correlate semantics from stories and comments posted in HIV/AIDS online communities. Further, we portray users’ cognitive and behavioral patterns, e.g., false beliefs about HIV/AIDS and behavioral change, by inferring the nearest neighbors of the retrieved semantics. These cognitive and behavioral patterns are then discussed with a view to enhancing health promotion.
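The nearest-neighbor idea can be illustrated with a minimal sketch. It assumes a simple count-based model in which each word is represented by the words that co-occur with it inside a small window, and neighbors are ranked by cosine similarity. The posts, the window size, and all function names are invented for illustration; they are not the project's data or its actual model.

```python
from collections import Counter
from math import sqrt

# Invented toy posts for illustration only; not data from the study.
POSTS = [
    "i was afraid to get tested for hiv",
    "i was scared to get tested for hiv",
    "my doctor said treatment works well",
    "my nurse said treatment works well",
]

def cooccurrence_vectors(posts, window=2):
    """Map each word to a Counter of the words seen within `window` tokens of it."""
    vectors = {}
    for post in posts:
        tokens = post.split()
        for i, word in enumerate(tokens):
            context = vectors.setdefault(word, Counter())
            start, stop = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(start, stop):
                if j != i:
                    context[tokens[j]] += 1
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[term] for term, count in u.items() if term in v)
    norm = sqrt(sum(c * c for c in u.values())) * sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0

def nearest_neighbors(word, vectors, k=3):
    """Return the k words whose context vectors are most similar to `word`."""
    target = vectors.get(word, Counter())
    scores = [(other, cosine(target, vec)) for other, vec in vectors.items() if other != word]
    # Sort by descending similarity, breaking ties alphabetically.
    return sorted(scores, key=lambda pair: (-pair[1], pair[0]))[:k]

if __name__ == "__main__":
    vectors = cooccurrence_vectors(POSTS)
    print(nearest_neighbors("afraid", vectors))
```

In this toy corpus, "afraid" and "scared" appear in identical contexts, so each is the other's closest neighbor. On real community posts, such neighborhoods are the kind of evidence from which recurring beliefs or behaviors might be read off.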
This project is sponsored by a grant from the College of Applied and Natural Sciences at Louisiana Tech University.
Understanding Patient Safety Reports via Multi-label Text Classification and Semantic Representation
Medical errors result from problems in health care delivery. One key step toward eliminating errors and improving patient safety is patient safety event reporting. A patient safety report may record a number of critical factors that are involved when incidents, near misses, and unsafe conditions occur. Clinicians and risk management can therefore generate actionable knowledge by harnessing useful information from the reports. To date, efforts have been made to establish a nationwide reporting and error analysis mechanism. The increasing volume of reports has driven improvement in quantity measures of patient safety. For example, statistical distributions of errors across error types and health care settings have been well documented. Nevertheless, a shift toward quality measures is much needed. In a health care system, errors are likely to occur if one or more intrinsically associated components (e.g., procedures, equipment, etc.) go wrong. However, our understanding of which components are connected, and how, is limited for at least two reasons. Firstly, patient safety reports are difficult to analyze in aggregate because they are large in volume and complicated in semantic representation. Secondly, an efficient and clinically valuable mechanism to identify and categorize these components is absent.
We strive to contribute by investigating the multi-labeled nature of patient safety reports. To facilitate clinical implementation, we proposed that machine learning and the semantics of reports, e.g., semantic similarity between terms, can be used jointly to perform automated multi-label classification. Specifically, we enhanced the semantic representation of patient safety reports by developing a patient safety ontology. The ontology supports a number of applications, including automated text classification, semantic reasoning, and information retrieval. To better disclose health care system vulnerabilities, we applied multi-label text classification algorithms to patient safety reports. To improve the performance of machine learning, we developed a framework that incorporates semantic similarities into kernel-based multi-label text classification.
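One way to picture how term similarity can enter a kernel-based multi-label classifier is the sketch below. It is not the project's framework. It assumes a bag-of-words kernel weighted by a term-similarity table, and it swaps in a kernel perceptron per label (binary relevance) as a simple stand-in learner. The reports, labels, similarity values, and function names are all hypothetical.

```python
from collections import Counter

# Hypothetical term similarities; in practice these might be derived from an ontology.
TERM_SIM = {
    frozenset(("fall", "slip")): 0.8,
    frozenset(("dose", "medication")): 0.8,
}

# Invented toy reports with invented labels; a report may carry several labels.
REPORTS = [
    ("patient fall near bed", {"fall"}),
    ("nurse gave wrong dose", {"medication"}),
    ("patient fall after wrong dose", {"fall", "medication"}),
    ("lab sample was delayed", set()),
]
LABELS = ["fall", "medication"]

def term_similarity(a, b):
    """Similarity of two terms: 1 for identical terms, else a table lookup."""
    return 1.0 if a == b else TERM_SIM.get(frozenset((a, b)), 0.0)

def exact_match(a, b):
    """Baseline similarity that ignores semantics (plain bag-of-words kernel)."""
    return 1.0 if a == b else 0.0

def semantic_kernel(doc_a, doc_b, sim=term_similarity):
    """Sum of term-pair similarities weighted by term counts in each document."""
    bag_a, bag_b = Counter(doc_a.split()), Counter(doc_b.split())
    return sum(ca * cb * sim(a, b) for a, ca in bag_a.items() for b, cb in bag_b.items())

def decision_score(text, texts, alphas, targets, sim):
    """Kernel perceptron score of `text` for one label."""
    return sum(a * t * semantic_kernel(x, text, sim) for a, t, x in zip(alphas, targets, texts))

def train(reports, labels, sim=term_similarity, epochs=10):
    """Binary relevance: train one kernel perceptron per label."""
    texts = [text for text, _ in reports]
    model = {}
    for label in labels:
        targets = [1 if label in tags else -1 for _, tags in reports]
        alphas = [0] * len(texts)
        for _ in range(epochs):
            mistakes = 0
            for i, text in enumerate(texts):
                predicted = 1 if decision_score(text, texts, alphas, targets, sim) > 0 else -1
                if predicted != targets[i]:
                    alphas[i] += 1  # remember this example as a support point
                    mistakes += 1
            if mistakes == 0:
                break
        model[label] = (alphas, targets)
    return texts, model

def predict(text, trained, sim=term_similarity):
    """Return the set of labels whose perceptron score is positive."""
    texts, model = trained
    return {
        label
        for label, (alphas, targets) in model.items()
        if decision_score(text, texts, alphas, targets, sim) > 0
    }

if __name__ == "__main__":
    new_report = "visitor slip after medication"
    print(predict(new_report, train(REPORTS, LABELS)))
    print(predict(new_report, train(REPORTS, LABELS, sim=exact_match), sim=exact_match))
```

The new report shares no label-bearing words with the training reports, so the exact-match baseline assigns it no labels. The similarity-weighted kernel links "slip" to "fall" and "medication" to "dose", so both labels are recovered.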
Our work provides insights into the nature of patient safety reports: a report can be labeled by the multiple components (e.g., different procedures, settings, error types, and contributing factors) it contains. Multi-labeled reports hold promise for disclosing system vulnerabilities because they reveal the intrinsically correlated components of health care systems. We demonstrated the effectiveness and efficiency of automated multi-label text classification embedded with semantic similarity information on patient safety reports. The proposed solution has the potential to be integrated with existing reporting systems, significantly reducing the workload of aggregate report analysis.
This project is partially sponsored by a grant from NIH and a grant from the University of Texas System.