Thesis / ROMDOC-THESIS-2017-1089

Classification and information retrieval techniques for heterogeneous data in the context of semantic applications

Cernian, Alexandra Suzana
2011-09-21

Abstract:
Faculty of Automatic Control and Computer Science
Thesis title: Classification and information retrieval techniques for heterogeneous data in the context of semantic applications
Author: eng. Alexandra Suzana Cernian
Scientific coordinator: prof. dr. ing. Valentin Sgârciu

This PhD thesis presents the research conducted to define clustering methods based on compression algorithms in the context of semantic applications. The thesis integrates, both conceptually and practically, several research areas (data mining, data clustering, semantic indexing), modelling spaces and technological spaces. Its main objective is to define and validate a compression-based clustering approach, and to integrate this approach into a semantic software system that can be used in an IT company.

The work was carried out in two main phases. The first phase consisted of an in-depth analysis and synthesis of the data clustering and compression algorithm domains, together with a proposal for integrating compression algorithms into semantic data clustering. The second phase consisted of designing and implementing an original cluster analysis and validation test platform, called EasyClustering, which contains 3 compression algorithms, 4 distance metrics and 3 clustering algorithms. This cluster analysis component is integrated, through the XML technological space, with an FScore-based cluster validation expert that lets the user validate clusters automatically. The experimental results come from clustering and validating several data sets representative of the semantic context of the thesis, with FScore values that reach the maximum value. A significant contribution of the experimental validation phase is a strategy for improving clustering performance by using metadata and increasing the weight of keywords.
Building on these results, the final part of the thesis presents the design, UML modelling and implementation of a semantics-based system for automatically clustering emails, which integrates the compression-based clustering approach. The application uses the semantic content of the message subject and body. It can support quality assurance activities in a software company by automatically clustering bug reports and organizing them by priority.
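The abstract does not name the three compression algorithms or four distance metrics in EasyClustering, nor does it give the exact FScore formula used by the validation expert. The sketch below illustrates the two ideas under stated assumptions. It uses the widely used Normalized Compression Distance (NCD), NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), with zlib as the compressor C. It also uses the standard class-weighted clustering F-measure, F = sum over classes i of (n_i / n) * max over clusters j of F(i, j). The function names `compressed_size`, `ncd` and `fscore` are hypothetical and are not taken from the thesis.

```python
import zlib
from collections import Counter


def compressed_size(data: bytes) -> int:
    # Assumption: zlib at maximum level stands in for "a compression algorithm".
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    # Normalized Compression Distance: close to 0 when y adds little new
    # information to x (similar texts), close to 1 for unrelated texts.
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def fscore(classes: list[str], clusters: list[int]) -> float:
    # Class-weighted clustering F-measure. For each true class, find the cluster
    # with the best F value (harmonic mean of precision and recall), then
    # weight that value by the class size. The maximum value is 1.
    n = len(classes)
    class_sizes = Counter(classes)
    cluster_sizes = Counter(clusters)
    joint = Counter(zip(classes, clusters))  # n_ij: items of class i in cluster j
    total = 0.0
    for cls, n_i in class_sizes.items():
        best = 0.0
        for clu, n_j in cluster_sizes.items():
            n_ij = joint[(cls, clu)]
            if n_ij == 0:
                continue
            precision = n_ij / n_j
            recall = n_ij / n_i
            best = max(best, 2 * precision * recall / (precision + recall))
        total += (n_i / n) * best
    return total
```

In this sketch, two near-duplicate bug reports get a smaller NCD than a bug report paired with an unrelated message. A clustering that reproduces the true classes exactly scores an FScore of 1, which is the maximum value the abstract refers to.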

Keyword(s): Data mining -- Doctoral thesis ; Compression algorithms -- Doctoral thesis ; Data classification -- Doctoral thesis ; Web -- Information retrieval -- Doctoral thesis
OPAC: See record in BC-UPB Web OPAC
Full Text: see files

Record created 2017-03-11, last modified 2017-03-11

People who viewed this page also viewed:
(292)  Optimizarea conceptuală şi operaţională a instalaţiilor chimice multiscop [Conceptual and operational optimization of multipurpose chemical plants] - Voinescu, Sorin - ROMDOC-BC_UPB-THESIS-2003-000000054
(289)  Managementul congestiilor în sistemele electroenergetice în prezenţa surselor regenerabile [Congestion management in power systems in the presence of renewable sources] - Boambă, Claudia-Elena - ROMDOC-THESIS-2021-2325
(281)  Tehnologiile informării şi comunicării : suport de curs [Information and communication technologies: course material] - Curta, Olimpia - ROMDOC-BOOK-2007-005
(277)  Simularea numerică a echipamentelor hidropneumatice de aviaţie [Numerical simulation of aviation hydropneumatic equipment] - Dincă, Liviu - ROMDOC-BC_UPB-THESIS-2005-000013939
(276)  The effect of microwave radiation on catalase extracted from Taraxacum roots - Popet, Laura et al. - ROMDOC-BCUT-ARTICLE-2007-001

 