ABSTRACT: Probability-based text clustering algorithm by alternately repeating two operations

Posted on 8 Feb 2013
By Brian S McGowan, PhD
In Abstract, Informatics & Analysis, Resources

Abstract

Owing to the rapid advance of internet technology, users face a large amount of raw data from the World Wide Web every day, most of which is displayed in text format. This situation creates great demand among internet users for efficient text analysis techniques. Since clustering is unsupervised and requires no prior knowledge, it is extensively adopted to help analyse textual data. Unfortunately, as far as I know, almost all the clustering algorithms proposed so far fail to deal with large-scale text collections. To classify large-scale text collections precisely, this paper proposes a novel probability-based text clustering algorithm by alternately repeating two operations (abbreviated as PTCART). The algorithm simply repeats two operations, (a) feature set construction and (b) text partition, until the optimal partition is reached. Its convergence is also validated. Experimental results demonstrate that, compared with several popular text clustering algorithms, PTCART has excellent performance.

via Probability-based text clustering algorithm by alternately repeating two operations.
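
The abstract describes only the overall shape of the method: alternate between (a) feature set construction and (b) text partition until the partition stops changing. It does not give the probability model, the feature selection rule, or the initialisation, so the following is a minimal sketch of that alternating loop under assumptions of my own, not the PTCART algorithm itself. The assumptions are: whitespace tokenisation, a feature set made of each cluster's `top_n` most frequent terms, a Laplace-smoothed multinomial score for assigning texts, and a round-robin starting partition. All function and parameter names are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenisation stands in for real preprocessing.
    return text.lower().split()


def build_feature_sets(docs, labels, k, top_n):
    """Operation (a): keep each cluster's top_n most frequent terms with their counts."""
    counts = [Counter() for _ in range(k)]
    for tokens, label in zip(docs, labels):
        counts[label].update(tokens)
    return [dict(c.most_common(top_n)) for c in counts]


def log_score(tokens, features, vocab_size):
    """Laplace-smoothed log-probability of a text under one cluster's feature set.

    Assumption: a multinomial term model; the abstract does not name the model.
    """
    total = sum(features.values())
    return sum(
        math.log((features.get(t, 0) + 1) / (total + vocab_size)) for t in tokens
    )


def partition(docs, feature_sets, vocab_size):
    """Operation (b): assign each text to the cluster with the highest score."""
    clusters = range(len(feature_sets))
    return [
        max(clusters, key=lambda c: log_score(tokens, feature_sets[c], vocab_size))
        for tokens in docs
    ]


def alternate_cluster(texts, k, top_n=20, max_iter=50):
    """Repeat operations (a) and (b) until the partition no longer changes."""
    docs = [tokenize(t) for t in texts]
    vocab_size = len({t for d in docs for t in d})
    # Assumption: deterministic round-robin start; any initial partition would do.
    labels = [i % k for i in range(len(docs))]
    for _ in range(max_iter):
        feature_sets = build_feature_sets(docs, labels, k, top_n)
        new_labels = partition(docs, feature_sets, vocab_size)
        if new_labels == labels:  # stable partition: stop iterating
            break
        labels = new_labels
    return labels
```

Calling `alternate_cluster(texts, k=2)` on a list of strings returns one cluster index per text. A loop of this kind can settle on a stable but poor partition depending on where it starts, so the convergence argument mentioned in the abstract should be read from the paper rather than inferred from this sketch.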

Written by Brian S McGowan, PhD

Dr. McGowan has served in leadership positions in numerous medical educational organizations and commercial supporters and is a Fellow of the Alliance (FACEhp). He founded the Outcomes Standardization Project, launched and hosted the Alliance Podcast, and most recently launched and hosts the JCEHP Emerging Best Practices in CPD podcast. In 2012 he co-founded ArcheMedX, Inc., a healthcare informatics and e-learning company, to apply his research in practice.
