ABSTRACT: Automatic keyphrase annotation of scientific documents using Wikipedia and genetic algorithms

Posted on 26 Feb 2013
By Brian S McGowan, PhD
In Abstract, Informatics & Analysis, Resources

Abstract

Topical annotation of documents with keyphrases is a proven method for revealing the subject of scientific and research documents to both human readers and information retrieval systems. This article describes a machine learning-based keyphrase annotation method for scientific documents that utilizes Wikipedia as a thesaurus for candidate selection from documents’ content. We have devised a set of 20 statistical, positional and semantical features for candidate phrases to capture and reflect various properties of those candidates that have the highest keyphraseness probability. We first introduce a simple unsupervised method for ranking and filtering the most probable keyphrases, and then evolve it into a novel supervised method using genetic algorithms. We have evaluated the performance of both methods on a third-party dataset of research papers. Reported experimental results show that the performance of our proposed methods, measured in terms of consistency with human annotators, is on a par with that achieved by humans and outperforms rival supervised and unsupervised methods.

via Automatic keyphrase annotation of scientific documents using Wikipedia and genetic algorithms.
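
The abstract does not list the 20 features or say how the genetic algorithm is configured, so the following is only a rough sketch of the general idea, not the authors' implementation. It assumes a tiny hypothetical phrase set (`THESAURUS`) standing in for Wikipedia article titles, two toy features (frequency and earliness of first occurrence), and a deliberately simple genetic algorithm that tunes the feature weights against human-assigned keyphrases. All names and parameters here are illustrative.

```python
import random
import re

# Hypothetical stand-in for Wikipedia article titles used as a thesaurus.
THESAURUS = {"genetic algorithm", "machine learning", "information retrieval"}

def candidates(text, thesaurus=THESAURUS, max_len=3):
    """Return {phrase: [word positions]} for n-grams found in the thesaurus."""
    words = re.findall(r"[a-z]+", text.lower())
    found = {}
    for n in range(1, max_len + 1):
        for i in range(len(words) - n + 1):
            phrase = " ".join(words[i:i + n])
            if phrase in thesaurus:
                found.setdefault(phrase, []).append(i)
    return found, len(words)

def features(positions, n_words):
    """Two toy features (assumed, not from the paper): frequency and earliness."""
    return [len(positions) / n_words, 1.0 - positions[0] / n_words]

def rank(text, weights, top_k=2):
    """Unsupervised step: score candidates by a weighted feature sum."""
    found, n_words = candidates(text)
    scored = {
        p: sum(w * f for w, f in zip(weights, features(pos, n_words)))
        for p, pos in found.items()
    }
    # Sort by descending score, breaking ties alphabetically.
    return sorted(scored, key=lambda p: (-scored[p], p))[:top_k]

def fitness(weights, training):
    """Mean overlap between predicted and human-assigned keyphrases."""
    total = 0.0
    for text, gold in training:
        predicted = set(rank(text, weights, top_k=len(gold)))
        total += len(predicted & gold) / len(gold)
    return total / len(training)

def evolve(training, pop_size=20, generations=30, seed=0):
    """Supervised step: a simple elitist genetic algorithm over the weights."""
    rng = random.Random(seed)
    pop = [[rng.random(), rng.random()] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda w: fitness(w, training), reverse=True)
        parents = pop[:pop_size // 2]  # keep the fitter half unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]  # uniform crossover
            child = [max(0.0, g + rng.gauss(0, 0.1)) for g in child]  # mutation
            children.append(child)
        pop = parents + children
    return max(pop, key=lambda w: fitness(w, training))

doc = "Machine learning helps information retrieval. Machine learning is popular."
training = [(doc, {"machine learning"})]
best_weights = evolve(training)
print(rank(doc, best_weights, top_k=1))
```

In this sketch the fitness function plays the role of "consistency with human annotators" from the abstract; a real system would use many more features and a held-out evaluation set.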

Written by Brian S McGowan, PhD

Dr. McGowan has served in leadership positions at numerous medical education organizations and commercial supporters and is a Fellow of the Alliance (FACEhp). He founded the Outcomes Standardization Project, launched and hosted the Alliance Podcast, and most recently launched and hosts the JCEHP Emerging Best Practices in CPD podcast. In 2012 he co-founded ArcheMedX, Inc., a healthcare informatics and e-learning company, to apply his research in practice.
