UPB-CTTIP Romdoc: Record#1299: Contribuţii privind descoperirea cunoştinţelor în baze de date

Thesis /

ROMDOC-THESIS-2017-784

Contribuţii privind descoperirea cunoştinţelor în baze de date

Pupezescu, Valentin
2011-02-29

Abstract: Faculty of Electronics, Telecommunications and Information Technology. Advances in Knowledge Discovery in Databases (Contribuţii privind descoperirea cunoştinţelor în baze de date). Author: Ing. Valentin PUPEZESCU. Scientific coordinator: Prof. Dr. Ing. Rodica Strungaru.

Knowledge Discovery in Databases (KDD) is the overall process of identifying valid, novel and potentially useful data in databases. Data Mining (DM) is the step in which specific methods and algorithms are applied to extract patterns from the data. The state of the art in this field is advanced as far as data mining algorithms for knowledge extraction are concerned. This thesis links one DM task (classification) to the practical implementation of distributed databases. The structure used in the experiments is the Distributed Committee Machine (D-CM). The thesis studies, for the first time in the KDD field, the interaction between D-CM structures composed of several multilayer perceptrons and the most widely used data replication technologies of today.

Chapter 1 presents the entire KDD process: the primary tasks and methods of the Data Mining step, and the challenges of knowledge discovery in databases. Chapter 2 presents the main neural structures that can be used in distributed structures. Chapter 3 is dedicated to distributed DM architectures and closes by proposing the working architectures used in the experimental tests. Chapter 4 highlights the advantage of distributed computing (several multilayer perceptrons running concurrently) for achieving better classification performance. It also points out the disadvantages and demonstrates experimentally that the knowledge discovery process slows down in distributed databases implemented in a master-slave replication topology. These problems are addressed in Chapter 5, which proposes an optimized operating mode for a D-CM.

Chapter 6 demonstrates experimentally that, in the MySQL database management system, the MyISAM storage engine is the recommended choice for the D-CM architecture. Because the result tables are accessed concurrently, the chapter also studies the influence of transaction isolation levels on the performance of a D-CM architecture. Chapter 7 studies the influence of SQL Server replication types on the knowledge discovery process in distributed databases. Chapter 8 also studies the performance of SQL Server replication types, together with the influence of isolation levels on the performance of D-CM architectures. The tests show that the proposed optimized architecture is the recommended choice for solving the classification task.

Chapter 9 presents a new operating mode for a single perceptron, called the pulsating perceptron, which mimics the behaviour of a committee machine operating sequentially. The last chapter presents the personal contributions and directions for further study in this field.

Keyword(s): Data mining -- Doctoral thesis ; Databases -- Doctoral thesis ; Distributed systems -- Doctoral thesis
OPAC: See record in BC-UPB Web OPAC
Full Text: see files

Record created 2017-01-05, last modified 2017-01-05

