dc.contributor.author: Bozkurt, Berat
dc.contributor.author: Coskun, Kerem
dc.contributor.author: Bakal, Gokhan
dc.date.accessioned: 2024-08-20T12:11:10Z
dc.date.available: 2024-08-20T12:11:10Z
dc.date.issued: 2024 [en_US]
dc.identifier.issn: 00104825
dc.identifier.uri: https://doi.org/10.1016/j.compbiomed.2024.108721
dc.identifier.uri: https://hdl.handle.net/20.500.12573/2339
dc.description.abstract: Since the 2000s, digitalization has been a crucial transformation in our lives. Nevertheless, digitalization brings a bulk of unstructured textual data to be processed, including articles, clinical records, web pages, and shared social media posts. As a critical analysis, the classification task classifies the given textual entities into correct categories. Categorizing documents from different domains is straightforward since the instances are unlikely to contain similar contexts. However, document classification in a single domain is more complicated due to sharing the same context. Thus, we aim to classify medical articles about four common cancer types (Leukemia, Non-Hodgkin Lymphoma, Bladder Cancer, and Thyroid Cancer) by constructing machine learning and deep learning models. We used 383,914 medical articles about four common cancer types collected by the PubMed API. To build classification models, we split the dataset into 70% as training, 20% as testing, and 10% as validation. We built widely used machine-learning (Logistic Regression, XGBoost, CatBoost, and Random Forest Classifiers) and modern deep-learning (convolutional neural networks - CNN, long short-term memory - LSTM, and gated recurrent unit - GRU) models. We computed the average classification performances (precision, recall, F-score) to evaluate the models over ten distinct dataset splits. The best-performing deep learning model(s) yielded a superior F1 score of 98%. However, traditional machine learning models also achieved reasonably high F1 scores, 95% for the worst-performing case. Ultimately, we constructed multiple models to classify articles, which compose a hard-to-classify dataset in the medical domain. [en_US]
dc.language.iso: eng [en_US]
dc.publisher: ELSEVIER [en_US]
dc.relation.isversionof: 10.1016/j.compbiomed.2024.108721 [en_US]
dc.rights: info:eu-repo/semantics/closedAccess [en_US]
dc.subject: Text mining [en_US]
dc.subject: Classification [en_US]
dc.subject: Machine learning [en_US]
dc.subject: Deep learning [en_US]
dc.title: Building a challenging medical dataset for comparative evaluation of classifier capabilities [en_US]
dc.type: article [en_US]
dc.contributor.department: AGÜ, Faculty of Engineering, Department of Computer Engineering [en_US]
dc.contributor.authorID: 0000-0003-2897-3894 [en_US]
dc.contributor.institutionauthor: Bozkurt, Berat
dc.contributor.institutionauthor: Coskun, Kerem
dc.contributor.institutionauthor: Bakal, Gokhan
dc.identifier.volume: 178 [en_US]
dc.identifier.startpage: 1 [en_US]
dc.identifier.endpage: 8 [en_US]
dc.relation.journal: Computers in Biology and Medicine [en_US]
dc.relation.publicationcategory: Article - International Peer-Reviewed Journal - Institutional Faculty Member [en_US]

