dc.contributor.author: KABORE, KADER MONHAMADY
dc.date.accessioned: 2020-07-21T13:47:40Z
dc.date.available: 2020-07-21T13:47:40Z
dc.date.issued: 2018 [en_US]
dc.identifier.other: Thesis No. 541338
dc.identifier.uri: https://hdl.handle.net/20.500.12573/323
dc.description.abstract: Detection of key attributes in text is an active area of research that has attracted attention owing to the growth of data and the availability of massive document collections. Key attributes serve as metadata for documents, and discovering accurate characteristics makes it possible to capture significant pieces of information from a lengthy text. They enable faster and more efficient information retrieval on the web, where the number of websites keeps increasing. In this thesis, a novel two-stage machine learning method is developed to identify the company name from web page text. The problem is reduced to a classification task at the token (i.e., word) level, followed by a post-processing phase that predicts the company name. Features are extracted using natural language processing techniques and by observing patterns in the textual data, so that they reflect the properties and significance of the words in context. The derived features are given as input to classification algorithms such as naive Bayes, decision tree, and random forest. In addition to the token-based classifier, a rule-based method is designed that also considers tokens from the domain name and the page title and ranks tokens by computing similarity metrics. The results show that the machine learning model achieves high precision but leaves a large number of cases undefined, whereas the rule-based approach achieves high accuracy with lower precision than the token-based model. When the two classification strategies are combined into a two-stage classifier, both high accuracy and high precision are obtained. [en_US]
dc.language.iso: eng [en_US]
dc.publisher: Abdullah Gül Üniversitesi [en_US]
dc.rights: info:eu-repo/semantics/openAccess [en_US]
dc.subject: Named Entity Recognition [en_US]
dc.subject: Company Name Detection [en_US]
dc.subject: Natural Language Processing [en_US]
dc.subject: Web Mining [en_US]
dc.subject: Feature Extraction [en_US]
dc.subject: Machine Learning [en_US]
dc.title: Developing machine learning methods for business intelligence [en_US]
dc.title.alternative: İş zekası için makine öğrenmesi yöntemlerinin geliştirilmesi (Turkish title; English: Developing machine learning methods for business intelligence) [en_US]
dc.type: masterThesis [en_US]
dc.contributor.department: AGÜ, Graduate School of Natural and Applied Sciences, Department of Electrical and Computer Engineering [en_US]
dc.contributor.institutionauthor: KABORE, KADER MONHAMADY
dc.relation.publicationcategory: Thesis [en_US]
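
The abstract describes the two-stage method only at a high level. The following is a minimal, stdlib-only Python sketch of the idea under stated assumptions, not the thesis's implementation. Every concrete detail is hypothetical: the four boolean features, the `LEGAL_SUFFIXES` set, the 0.5 threshold, the toy training rows, and the use of `difflib.SequenceMatcher` as the similarity metric. A hand-rolled Bernoulli naive Bayes stands in for the token classifier (the thesis also evaluated decision trees and random forests). The post-processing step here simply returns the highest-scoring token above the threshold and otherwise reports the case as undefined, at which point the rule-based stage ranks title and page tokens by similarity to the domain name.

```python
import math
import re
from difflib import SequenceMatcher

# Hypothetical choices for illustration; the thesis's real feature set is not given in the abstract.
LEGAL_SUFFIXES = {"inc", "ltd", "llc", "gmbh", "corp"}
WORD = re.compile(r"[A-Za-z][A-Za-z&'-]*")


def tokenize(text):
    return WORD.findall(text)


def domain_label(domain):
    # "www.globex.com" -> "globex"
    return re.sub(r"^www\.", "", domain.lower()).split(".")[0]


def token_features(tokens, i, title, domain):
    """Boolean features for tokens[i]: capitalized, in title, in domain, followed by a legal suffix."""
    low = tokens[i].lower()
    nxt = tokens[i + 1].lower() if i + 1 < len(tokens) else ""
    return (
        tokens[i][:1].isupper(),
        low in {t.lower() for t in tokenize(title)},
        len(low) > 2 and low in domain_label(domain),
        nxt in LEGAL_SUFFIXES,
    )


class BernoulliNB:
    """Minimal Bernoulli naive Bayes with Laplace smoothing, standing in for the token classifier."""

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.log_prior, self.p = {}, {}
        for c in self.classes:
            rows = [x for x, label in zip(X, y) if label == c]
            self.log_prior[c] = math.log(len(rows) / len(X))
            # P(feature j is true | class c), smoothed with add-one counts.
            self.p[c] = [(sum(r[j] for r in rows) + 1) / (len(rows) + 2) for j in range(len(X[0]))]
        return self

    def prob(self, x, c=1):
        # Log joint score per class (prior + feature likelihoods), normalized with a stable softmax.
        scores = {
            k: self.log_prior[k] + sum(math.log(p if xj else 1 - p) for xj, p in zip(x, self.p[k]))
            for k in self.classes
        }
        m = max(scores.values())
        z = sum(math.exp(s - m) for s in scores.values())
        return math.exp(scores[c] - m) / z


def stage1(model, text, title, domain, threshold=0.5):
    """Token classification + post-processing: best token above threshold, else None (undefined)."""
    tokens = tokenize(text)
    scored = [(model.prob(token_features(tokens, i, title, domain)), tok) for i, tok in enumerate(tokens)]
    best_prob, best_tok = max(scored, default=(0.0, None))
    return best_tok if best_prob >= threshold else None


def stage2(text, title, domain):
    """Rule-based ranking: title tokens and capitalized page tokens, scored by similarity to the domain."""
    label = domain_label(domain)
    candidates = tokenize(title) + [t for t in tokenize(text) if t[:1].isupper()]
    if not candidates:
        return None
    return max(candidates, key=lambda t: SequenceMatcher(None, t.lower(), label).ratio())


def detect_company(model, text, title, domain):
    """Two-stage classifier: use the precise token model, fall back to rules when it abstains."""
    name = stage1(model, text, title, domain)
    if name is not None:
        return name, "token-model"
    return stage2(text, title, domain), "rule-based"


# Toy, hand-made training rows in token_features order; label 1 = company-name token.
X = [(1, 1, 1, 1), (1, 1, 1, 0), (1, 0, 1, 1), (1, 1, 0, 0),
     (1, 0, 0, 0), (0, 0, 0, 0), (0, 1, 0, 0), (0, 0, 0, 0)]
y = [1, 1, 1, 0, 0, 0, 0, 0]
model = BernoulliNB().fit(X, y)

title, domain = "Globex | Industrial Solutions", "www.globex.com"
print(detect_company(model, "Welcome to Globex Ltd. We build industrial tools.", title, domain))
# -> ('Globex', 'token-model')
print(detect_company(model, "Contact us for industrial tools and support.", title, domain))
# -> ('Globex', 'rule-based')
```

The ordering mirrors the trade-off reported in the abstract: the token model is trusted whenever it commits to an answer, and the higher-coverage rule-based ranker fills in the cases it leaves undefined.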