ABSTRACT: Automatic keyphrase annotation of scientific documents using Wikipedia and genetic algorithms

Posted on 26 Feb 2013
By Brian S McGowan, PhD
In Abstract, Informatics & Analysis, Resources

Abstract

Topical annotation of documents with keyphrases is a proven method for revealing the subject of scientific and research documents to both human readers and information retrieval systems. This article describes a machine learning-based keyphrase annotation method for scientific documents that utilizes Wikipedia as a thesaurus for candidate selection from documents’ content. We have devised a set of 20 statistical, positional and semantical features for candidate phrases to capture and reflect various properties of those candidates that have the highest keyphraseness probability. We first introduce a simple unsupervised method for ranking and filtering the most probable keyphrases, and then evolve it into a novel supervised method using genetic algorithms. We have evaluated the performance of both methods on a third-party dataset of research papers. Reported experimental results show that the performance of our proposed methods, measured in terms of consistency with human annotators, is on a par with that achieved by humans and outperforms rival supervised and unsupervised methods.

via Automatic keyphrase annotation of scientific documents using Wikipedia and genetic algorithms.
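
As a rough illustration of the pipeline the abstract outlines (thesaurus-based candidate selection followed by feature-based ranking), here is a minimal Python sketch. It is not the authors' implementation: `TOY_THESAURUS` is a hypothetical stand-in for Wikipedia article titles, only two illustrative features (relative frequency and position of first occurrence) replace the 20 features the paper mentions, and the weights `w_freq` and `w_pos` are set by hand. The supervised, genetic-algorithm stage is not shown.

```python
import re

# Hypothetical stand-in for Wikipedia article titles; a real system would
# look candidates up in a full Wikipedia title index instead.
TOY_THESAURUS = {
    "genetic algorithm",
    "machine learning",
    "information retrieval",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def select_candidates(tokens, thesaurus, max_len=3):
    """Return {phrase: [start positions]} for n-grams found in the thesaurus."""
    positions = {}
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in thesaurus:
                positions.setdefault(phrase, []).append(i)
    return positions


def rank_candidates(positions, n_tokens, w_freq=0.5, w_pos=0.5):
    """Rank candidates by a weighted sum of two illustrative features."""
    if not positions:
        return []
    max_count = max(len(starts) for starts in positions.values())
    scored = []
    for phrase, starts in positions.items():
        freq = len(starts) / max_count           # frequency relative to the top candidate
        earliness = 1.0 - starts[0] / n_tokens   # earlier first mention scores higher
        scored.append((w_freq * freq + w_pos * earliness, phrase))
    return [phrase for _, phrase in sorted(scored, reverse=True)]


def keyphrases(text, thesaurus=TOY_THESAURUS):
    """Run candidate selection and ranking on a piece of text."""
    tokens = tokenize(text)
    return rank_candidates(select_candidates(tokens, thesaurus), len(tokens))


sample = ("Machine learning helps information retrieval. "
          "Machine learning with a genetic algorithm.")
print(keyphrases(sample))
```

In this toy sample, "machine learning" ranks first because it occurs most often and appears earliest. Restricting candidates to thesaurus entries keeps arbitrary word sequences out of the ranking, which is the role the abstract assigns to Wikipedia.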

Written by Brian S McGowan, PhD

Dr. McGowan has served in leadership positions in numerous medical educational organizations and commercial supporters and is a Fellow of the Alliance (FACEhp). He founded the Outcomes Standardization Project, launched and hosted the Alliance Podcast, and most recently launched and hosts the JCEHP Emerging Best Practices in CPD podcast. In 2012 he co-founded ArcheMedX, Inc., a healthcare informatics and e-learning company, to apply his research in practice.
