MENUCLOSE

 

Connect with us

Resource Center

ABSTRACT: Automatic keyphrase annotation of scientific documents using Wikipedia and genetic algorithms

Abstract

Topical annotation of documents with keyphrases is a proven method for revealing the subject of scientific and research documents to both human readers and information retrieval systems. This article describes a machine learning-based keyphrase annotation method for scientific documents that utilizes Wikipedia as a thesaurus for candidate selection from documents’ content. We have devised a set of 20 statistical, positional and semantical features for candidate phrases to capture and reflect various properties of those candidates that have the highest keyphraseness probability. We first introduce a simple unsupervised method for ranking and filtering the most probable keyphrases, and then evolve it into a novel supervised method using genetic algorithms. We have evaluated the performance of both methods on a third-party dataset of research papers. Reported experimental results show that the performance of our proposed methods, measured in terms of consistency with human annotators, is on a par with that achieved by humans and outperforms rival supervised and unsupervised methods.

via Automatic keyphrase annotation of scientific documents using Wikipedia and genetic algorithms.

Written by

Dr. McGowan has served in leadership positions in numerous medical educational organizations and commercial supporters and is a Fellow of the Alliance (FACEhp). He founded the Outcomes Standardization Project, launched and hosted the Alliance Podcast, and most recently launched and hosts the JCEHP Emerging Best Practices in CPD podcast. In 2012 he Co-Founded ArcheMedX, Inc, a healthcare informatics and e-learning company to apply his research in practice.

Leave a Comment