dc.contributor.author: Voskergian, Daniel
dc.contributor.author: Bakir-Gungor, Burcu
dc.contributor.author: Yousef, Malik
dc.date.accessioned: 2024-02-01T13:43:28Z
dc.date.available: 2024-02-01T13:43:28Z
dc.date.issued: 2023 [en_US]
dc.identifier.issn: 1664-8021
dc.identifier.other: WOS:001086438900001
dc.identifier.uri: https://doi.org/10.3389/fgene.2023.1243874
dc.identifier.uri: https://hdl.handle.net/20.500.12573/1916
dc.description.abstract: With the exponential growth in the daily publication of scientific articles, automatic classification and categorization can assist in assigning articles to a predefined category. Article titles are concise descriptions of the articles’ content with valuable information that can be useful in document classification and categorization. However, the shortness, data sparseness, limited word occurrences, and inadequate contextual information of scientific document titles hinder the direct application of conventional text mining and machine learning algorithms to these short texts, making their classification a challenging task. This study first explores the performance of our earlier approach, TextNetTopics, on short text. Second, we propose an advanced version called TextNetTopics Pro, a novel short-text classification framework that combines lexical features organized in topics of words with the topic distribution extracted by a topic model to alleviate the data-sparseness problem when classifying short texts. We evaluate our proposed approach using nine state-of-the-art short-text topic models on two publicly available datasets of scientific article titles as short-text documents. The first dataset is related to the Biomedical field, and the other is related to Computer Science publications. Additionally, we comparatively evaluate the predictive performance of the models generated with and without using the abstracts. Finally, we demonstrate the robustness and effectiveness of the proposed approach in handling imbalanced data, particularly in the classification of Drug-Induced Liver Injury articles as part of the CAMDA challenge. Taking advantage of the semantic information detected by topic models proved to be a reliable way to improve the overall performance of ML classifiers. [en_US]
dc.language.iso: eng [en_US]
dc.publisher: FRONTIERS MEDIA SA [en_US]
dc.relation.isversionof: 10.3389/fgene.2023.1243874 [en_US]
dc.rights: info:eu-repo/semantics/openAccess [en_US]
dc.subject: text classification [en_US]
dc.subject: feature selection [en_US]
dc.subject: topic selection [en_US]
dc.subject: topic projection [en_US]
dc.subject: topic modeling [en_US]
dc.subject: short text [en_US]
dc.subject: sparse data [en_US]
dc.title: TextNetTopics Pro, a topic model-based text classification for short text by integration of semantic and document-topic distribution information [en_US]
dc.type: article [en_US]
dc.contributor.department: AGÜ, Faculty of Engineering, Department of Computer Engineering [en_US]
dc.contributor.authorID: 0000-0002-2272-6270 [en_US]
dc.contributor.institutionauthor: Bakir-Gungor, Burcu
dc.identifier.volume: 14 [en_US]
dc.identifier.startpage: 1 [en_US]
dc.identifier.endpage: 23 [en_US]
dc.relation.journal: FRONTIERS IN GENETICS [en_US]
dc.relation.publicationcategory: Article - International Refereed Journal - Institutional Faculty Member [en_US]

