Issue Date: 2016
Title: Design and implementation of a component-based distributed system for text mining in social networks
Authors: Huang, Yu
Publisher: University of Ontario Institute of Technology
Degree: Master of Engineering (MEng)
Department: Electrical and Computer Engineering
Supervisor: Mahmoud, Qusay H.
Keywords: Text mining; Social networks; Distributed system; Apache Storm; Apache Kafka
Abstract: This report presents the design and implementation of a component-based distributed system for text mining in social networks. The system consists of three main types of components: data collection, data processing, and data visualization. The report explores three candidate architectures (a simple linear architecture, a message feedback architecture, and a Kafka-centric architecture) and provides implementations of them. The final system adopts the Kafka-centric architecture, in which all components are connected through Kafka brokers. In terms of functionality, the data collection components collect data from Twitter and produce messages to the Kafka brokers. The data processing components contain a series of basic text mining topologies. The data visualization components, built on JavaScript libraries, present results on web pages and allow users to interact with graphs and charts. To improve the scalability and performance of text mining, the project uses the Apache Storm framework to implement the data processing components. In this report, we evaluate the availability of Kafka and Storm, the data collection rates of the collection components, and the performance of the data processing components. The experimental results demonstrate that the system is available and scalable, and that its component-based structure allows it to be extended easily.
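In the Kafka-centric architecture the abstract describes, components never call each other directly: collectors publish to topics, processing components read from those topics and publish results, and visualization reads the results. The following is a minimal sketch of that data flow using only the Python standard library. It does not use the real Kafka or Storm APIs. The `InMemoryBroker` class, the topic names `tweets` and `word_counts`, the word-count step (standing in for a "basic text mining topology"), and the sample texts are all hypothetical illustrations, not the report's implementation.

```python
import re
from collections import Counter, defaultdict


class InMemoryBroker:
    """Hypothetical stand-in for a Kafka broker: named topics holding ordered message logs."""

    def __init__(self):
        self.topics = defaultdict(list)

    def produce(self, topic, message):
        # Append the message to the end of the topic's log.
        self.topics[topic].append(message)

    def consume(self, topic, offset=0):
        # Return all messages from the given offset onward, like reading a log.
        return self.topics[topic][offset:]


def collection_component(broker, texts, topic="tweets"):
    # Data collection: publish raw texts (hypothetical sample posts) to a topic.
    for text in texts:
        broker.produce(topic, text)


def tokenize(text):
    # "Split" step: lowercase the text and extract word-like tokens.
    return re.findall(r"[a-z0-9#@']+", text.lower())


def word_count_component(broker, in_topic="tweets", out_topic="word_counts"):
    # Data processing: a word count in the spirit of a basic Storm topology
    # (read -> split -> count), publishing its result to another topic.
    counts = Counter()
    for text in broker.consume(in_topic):
        counts.update(tokenize(text))
    broker.produce(out_topic, dict(counts))
    return counts


def visualization_component(broker, topic="word_counts", top_n=3):
    # Data visualization: read the latest result and return the top-N words,
    # ordered by descending count, then alphabetically.
    latest = broker.consume(topic)[-1]
    return sorted(latest.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


if __name__ == "__main__":
    broker = InMemoryBroker()
    collection_component(broker, [
        "Kafka connects components",
        "Storm processes streams",
        "Kafka and Storm scale",
    ])
    word_count_component(broker)
    print(visualization_component(broker))
    # prints [('kafka', 2), ('storm', 2), ('and', 1)]
```

Because each component depends only on topic names, a new collector or processing step can be added by producing to, or consuming from, a topic without changing the other components. This is the kind of extensibility the component-based structure is meant to provide.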
Appears in Collections: Master's Projects and Major Papers
Faculty of Engineering and Applied Science - Master Projects

Files in This Item:
File: Huang_Yu.pdf; Size: 1.7 MB; Format: Adobe PDF

Items in e-scholar@UOIT are protected by copyright, with all rights reserved, unless otherwise indicated.