ABSTRACT: Improving performance of natural language processing part-of-speech tagging on clinical narratives through domain adaptation

Posted on 14 Mar 2013
By Brian S McGowan, PhD
In Abstract, Education Technology, Informatics & Analysis, Resources

Abstract:

Objective: Natural language processing (NLP) tasks are commonly decomposed into subtasks, chained together to form processing pipelines. The residual error produced in these subtasks propagates, adversely affecting the end objectives. Limited availability of annotated clinical data remains a barrier to reaching state-of-the-art operating characteristics using statistically based NLP tools in the clinical domain. Here we explore the unique linguistic constructions of clinical texts and demonstrate the loss in operating characteristics when out-of-the-box part-of-speech (POS) tagging tools are applied to the clinical domain. We test a domain adaptation approach integrating a novel lexical-generation probability rule used in a transformation-based learner to boost POS performance on clinical narratives.

Methods: Two target corpora from independent healthcare institutions were constructed from high frequency clinical narratives. Four leading POS taggers with their out-of-the-box models trained from general English and biomedical abstracts were evaluated against these clinical corpora. A high performing domain adaptation method, Easy Adapt, was compared to our newly proposed method ClinAdapt.

Results: The evaluated POS taggers drop in accuracy by 8.5–15% when tested on clinical narratives. The highest performing tagger reports an accuracy of 88.6%. Domain adaptation with Easy Adapt reports accuracies of 88.3–91.0% on clinical texts. ClinAdapt reports 93.2–93.9%.

Conclusions: ClinAdapt successfully boosts POS tagging performance through domain adaptation requiring a modest amount of annotated clinical data. Improving the performance of critical NLP subtasks is expected to reduce pipeline error propagation leading to better overall results on complex processing tasks.
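For readers unfamiliar with Easy Adapt, the baseline named in the abstract, the core idea is feature augmentation: every feature is copied into a shared "general" version and a domain-specific version, so a standard classifier can learn which cues transfer across domains and which do not. The sketch below is only an illustration of that idea under assumed names; the function `easy_adapt`, the feature strings, and the domain labels are hypothetical and not taken from the paper. It does not show ClinAdapt, whose details the abstract does not give.

```python
from typing import Dict


def easy_adapt(features: Dict[str, float], domain: str) -> Dict[str, float]:
    """Return an augmented feature dict with a shared and a domain-specific copy.

    Hypothetical helper illustrating Easy Adapt-style feature augmentation;
    the domain labels "source" and "target" are assumptions for this sketch.
    """
    if domain not in ("source", "target"):
        raise ValueError("domain must be 'source' or 'target'")
    augmented: Dict[str, float] = {}
    for name, value in features.items():
        # Shared copy: active for examples from either domain.
        augmented[f"general:{name}"] = value
        # Domain copy: active only for examples from this domain.
        augmented[f"{domain}:{name}"] = value
    return augmented


# Illustrative token features for one word in a clinical note (made up).
token_features = {"word=pt": 1.0, "prev_word=the": 1.0}
clinical = easy_adapt(token_features, "target")
newswire = easy_adapt(token_features, "source")
```

The augmented dictionaries from both domains would then be passed to an ordinary supervised tagger, which sees three times the original feature space in total: general, source-only, and target-only.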

via Improving performance of natural language processing part-of-speech tagging on clinical narratives through domain adaptation — Ferraro et al. — Journal of the American Medical Informatics Association.

Post Tags - natural language processing

Written by Brian S McGowan, PhD

Dr. McGowan has served in leadership positions in numerous medical educational organizations and commercial supporters and is a Fellow of the Alliance (FACEhp). He founded the Outcomes Standardization Project, launched and hosted the Alliance Podcast, and most recently launched and hosts the JCEHP Emerging Best Practices in CPD podcast. In 2012 he co-founded ArcheMedX, Inc., a healthcare informatics and e-learning company, to apply his research in practice.
