Research Details

Title: Improvement of CluStream Algorithm Using Sliding Window for the Clustering of Data Streams
Research type: Conference paper
Keywords: data stream, clustering of data streams, window models, sliding window
Abstract: Today, data are produced in large amounts, mostly in the form of data streams. A data stream is a potentially unbounded flow of data produced in large volumes and at high speed; it can therefore be defined as a sequence of data objects arriving at specified time intervals. One of the most common operations performed on data streams is clustering, which aims to divide the data items into homogeneous groups. A well-known stream clustering algorithm is CluStream, a version of which has been implemented for the distributed environment of Apache Spark. This algorithm uses a tilted time window. This paper proposes a modified version of the algorithm that uses a sliding window for clustering. In the proposed method, called CluStreamSW, only the most recent data are used to update the model and old data are discarded, which leads to faster execution and better results. CluStreamSW was implemented in Apache Spark. The results of multiple runs on real-world datasets, compared with the tilted-window CluStream algorithm, show that on average our algorithm achieves up to 30 percent better precision on the CoverType dataset and up to 17 percent better precision on the PowerSupply dataset. In some cases, the improvement in precision exceeds 90 percent.
Researchers: Sahar Ahsani (first author), Morteza Yousef Sanati (second author), Moharram Mansoorizadeh (third author)
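The abstract states that CluStreamSW updates its model using only the most recent data and discards old data. The following is a minimal, hypothetical Python sketch of that sliding-window idea, not the authors' Spark implementation. It relies on the fact that CluStream-style micro-clusters are additive summaries (point count N, linear sum LS, squared sum SS), so the contribution of an expired batch can be removed by subtraction. Assumptions (all illustrative): the window holds the last `window_size` batches, the number of micro-clusters is fixed and seeded with given centers, each point goes to its nearest micro-cluster, and CluStream's offline macro-clustering phase is omitted. The names `ClusterFeature` and `SlidingWindowMicroClusters` are invented for this example.

```python
import math
from collections import deque


class ClusterFeature:
    """Hypothetical additive summary (N, LS, SS) of a set of points."""

    def __init__(self, dim):
        self.n = 0
        self.ls = [0.0] * dim  # per-dimension linear sum
        self.ss = [0.0] * dim  # per-dimension sum of squares

    def add(self, x):
        self.n += 1
        for i, v in enumerate(x):
            self.ls[i] += v
            self.ss[i] += v * v

    def merge(self, other):
        self.n += other.n
        for i in range(len(self.ls)):
            self.ls[i] += other.ls[i]
            self.ss[i] += other.ss[i]

    def subtract(self, other):
        # Additivity lets us remove an expired batch's contribution exactly.
        self.n -= other.n
        for i in range(len(self.ls)):
            self.ls[i] -= other.ls[i]
            self.ss[i] -= other.ss[i]

    def centroid(self):
        return [s / self.n for s in self.ls]

    def radius(self):
        # RMS distance of the summarized points from their centroid.
        c = self.centroid()
        var = sum(self.ss[i] / self.n - c[i] ** 2 for i in range(len(c)))
        return math.sqrt(max(0.0, var))


class SlidingWindowMicroClusters:
    """Hypothetical micro-cluster model that only reflects the last window_size batches."""

    def __init__(self, centers, window_size):
        self.dim = len(centers[0])
        self.centers = [list(map(float, c)) for c in centers]
        self.window_size = window_size
        self.window = deque()  # one list of per-cluster deltas per batch
        self.totals = [ClusterFeature(self.dim) for _ in centers]

    def _nearest(self, x):
        return min(range(len(self.centers)),
                   key=lambda j: sum((a - b) ** 2 for a, b in zip(x, self.centers[j])))

    def process_batch(self, points):
        deltas = [ClusterFeature(self.dim) for _ in self.centers]
        for x in points:
            deltas[self._nearest(x)].add(x)
        for total, delta in zip(self.totals, deltas):
            total.merge(delta)
        self.window.append(deltas)
        # Expire the oldest batch once the window is full.
        if len(self.window) > self.window_size:
            expired = self.window.popleft()
            for total, delta in zip(self.totals, expired):
                total.subtract(delta)
        # Move each center to its current centroid; keep the old center if empty.
        for j, total in enumerate(self.totals):
            if total.n > 0:
                self.centers[j] = total.centroid()


model = SlidingWindowMicroClusters([[0, 0], [10, 10]], window_size=2)
for batch in ([[1, 1], [9, 9]], [[3, 3]], [[5, 5]]):
    model.process_batch(batch)
print(model.centers)  # [[4.0, 4.0], [9.0, 9.0]]
```

In the usage example, the third batch pushes the first batch out of the window, so the point `[1, 1]` no longer influences the first micro-cluster and the second micro-cluster becomes empty (it keeps its last center). A tilted window would instead retain older data at coarser granularity; the sliding window trades that history for a model that tracks only recent behavior.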