University of Bahrain
Scientific Journals

From Data to Insight: Topic Modeling and Automatic Labeling Strategies


dc.contributor.author F. Najeeb, Rana
dc.contributor.author N. Dhannoon, Ban
dc.contributor.author Qais Alkhalidi, Farah
dc.date.accessioned 2024-04-26T16:21:19Z
dc.date.available 2024-04-26T16:21:19Z
dc.date.issued 2024-04-26
dc.identifier.issn 2210-142X
dc.identifier.uri https://journal.uob.edu.bh:443/handle/123456789/5627
dc.description.abstract Researchers usually present and synthesize their findings in scientific publications, so analyzing the content of these publications is essential to understanding a subject. This study proposes an improved topic-modeling approach for a collection of Neural Information Processing Systems (NIPS) conference papers published between 1987 and 2017. The study achieved two goals: producing more coherent topics and labeling topics automatically. The first goal was achieved through five phases: (1) text pre-processing; (2) reduction, using a new method called RS-LW (Reduced Sentences Based on Length and Weight), which removes shorter sentences, then calculates a weight for each remaining sentence and removes approximately 25% of the lowest-weight sentences; (3) sentence embedding using S-BERT (Sentence-Bidirectional Encoder Representations from Transformers); (4) dimensionality reduction of the sentence embeddings using UMAP (Uniform Manifold Approximation and Projection); and (5) clustering of similar documents using HDBSCAN (Hierarchical Density-Based Spatial Clustering of Applications with Noise). The experimental findings show that the proposed RS-LW phase produced more cohesive topics, yielding improved topic coherence (0.593) and topic diversity (0.96). Although topic modeling extracts the most salient sentences describing the latent topics in a text collection, it does not assign each topic an appropriate label. The second goal was achieved with a new method that generates keywords by accessing the authors' Google Scholar profiles and extracting their listed interests, which are then used to label the topics automatically. en_US
dc.language.iso en en_US
dc.publisher University of Bahrain en_US
dc.subject Deep Learning, Topic Modelling, Automatic Topic Labeling, S-BERT, Pre-trained Language Model. en_US
dc.title From Data to Insight: Topic Modeling and Automatic Labeling Strategies en_US
dc.identifier.doi http://dx.doi.org/10.12785/ijcds/XXXXXX
dc.volume 16 en_US
dc.issue 1 en_US
dc.pagestart 1 en_US
dc.pageend 10 en_US
dc.contributor.authorcountry Iraq en_US
dc.contributor.authorcountry Iraq en_US
dc.contributor.authorcountry Iraq en_US
dc.contributor.authoraffiliation Department of Computer Science, Mustansiriyah University en_US
dc.contributor.authoraffiliation Department of Computer Science, Al-Nahrain University en_US
dc.contributor.authoraffiliation Department of Computer Science, Mustansiriyah University en_US
dc.source.title International Journal of Computing and Digital Systems en_US
dc.abbreviatedsourcetitle IJCDS en_US
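The abstract describes RS-LW only at a high level: drop shorter sentences, weight the remaining ones, then remove roughly 25% of the lowest-weight sentences. The Python sketch below illustrates that two-stage filter under stated assumptions. The length threshold (`min_words=5`) and the weighting scheme (mean normalized word frequency, a common extractive-summarization heuristic) are illustrative choices, not details taken from the paper, and the names `rs_lw` and `tokenize` are hypothetical.

```python
import re
from collections import Counter


def tokenize(sentence):
    """Lower-case alphabetic tokens; a deliberately simple tokenizer."""
    return re.findall(r"[a-z]+", sentence.lower())


def rs_lw(sentences, min_words=5, drop_fraction=0.25):
    """Hypothetical RS-LW sketch: a length filter followed by a weight filter.

    min_words and the frequency-based weight are assumptions for
    illustration; the paper's actual threshold and weighting may differ.
    """
    # Stage 1 (length): discard sentences with fewer than min_words tokens.
    kept = [s for s in sentences if len(tokenize(s)) >= min_words]
    if not kept:
        return []

    # Stage 2 (weight): score each sentence by the mean normalized
    # frequency of its words across the surviving sentences.
    freq = Counter(w for s in kept for w in tokenize(s))
    top = max(freq.values(), default=1)

    def weight(sentence):
        words = tokenize(sentence)
        return sum(freq[w] / top for w in words) / max(len(words), 1)

    # Drop roughly drop_fraction of the lowest-weight sentences,
    # then return the survivors in their original order.
    n_drop = round(len(kept) * drop_fraction)
    ranked = sorted(range(len(kept)), key=lambda i: weight(kept[i]))
    dropped = set(ranked[:n_drop])
    return [s for i, s in enumerate(kept) if i not in dropped]


# Hypothetical toy input.
sentences = [
    "Too short.",
    "Topic models learn topics from documents and words.",
    "Neural topic models learn topics from documents.",
    "The weather was pleasant during the long conference week.",
    "Topic models cluster documents into coherent topics.",
]
for s in rs_lw(sentences):
    print(s)
```

With this toy input, the first sentence fails the length filter, and the off-topic weather sentence shares few words with the others, so it receives the lowest weight and is the one removed; three sentences remain. Because weights are computed only over sentences that survive the length filter, the two stages interact, which is one reason the ordering (length first, then weight) matters.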

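The abstract reports a topic diversity of 0.96 but does not define the measure. A widely used definition is the proportion of unique words among the top-k words of all topics, where 1.0 means no topic shares a top word with another. The sketch below assumes that definition; `top_k=25` is a conventional default and an assumption here, and `topic_diversity` is a hypothetical name.

```python
def topic_diversity(topics, top_k=25):
    """Proportion of unique words among the top_k words of all topics.

    Assumes each topic is a list of words ranked by importance.
    Returns a value in [0, 1]; 1.0 means no word is shared between topics.
    """
    top_words = [word for topic in topics for word in topic[:top_k]]
    if not top_words:
        return 0.0
    return len(set(top_words)) / len(top_words)


# Hypothetical toy topics (word lists ranked by importance).
topics = [
    ["neural", "network", "training"],
    ["topic", "model", "network"],
]
print(topic_diversity(topics, top_k=3))  # 5 unique words out of 6
```

Here "network" appears in both topics, so 5 of the 6 top words are unique. Redundant topics push the score down, which is why diversity is usually reported alongside coherence.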

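The record states that topic labels come from interests listed in the authors' Google Scholar profiles, but not how the profiles were retrieved or how interests were matched to topics. The sketch below skips retrieval entirely: a plain dictionary of hypothetical author interests stands in for profile data, and a simple majority vote over the interests of the authors in a topic's documents is assumed as the labeling rule. The function `label_topic` and all data are hypothetical.

```python
from collections import Counter


def label_topic(doc_ids, doc_authors, author_interests, n_labels=1):
    """Hypothetical sketch: label a topic by its authors' most common interests.

    doc_ids: documents assigned to the topic (e.g. by the clustering step).
    doc_authors: maps each document to its author names.
    author_interests: maps each author to profile interests; a plain dict
        stands in here for data taken from Google Scholar profiles.
    """
    votes = Counter()
    for doc in doc_ids:
        for author in doc_authors.get(doc, []):
            # Each author votes once per document for each distinct interest;
            # lower-casing merges "Deep Learning" and "deep learning".
            votes.update({i.lower() for i in author_interests.get(author, [])})
    return [interest for interest, _ in votes.most_common(n_labels)]


# Hypothetical toy data; no real authors or profiles are implied.
doc_authors = {"d1": ["A", "B"], "d2": ["B", "C"]}
author_interests = {
    "A": ["Topic Modeling", "NLP"],
    "B": ["topic modeling", "Deep Learning"],
    "C": ["Computer Vision"],
}
print(label_topic(["d1", "d2"], doc_authors, author_interests, n_labels=2))
```

In this toy example "topic modeling" collects three votes and "deep learning" two, so those become the topic's two labels. A real system would also need to handle authors without public profiles and interests phrased in different ways, which simple case-folding does not resolve.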