Clustering Streaming Data Using Apache Spark

Data is being generated at an ever faster rate. Because individuals can now create and publish data easily, the model of data generation has shifted from one producer with many consumers to many producers with many consumers. Traditional databases cannot process data at this volume, which is commonly known as big data. One of the most important big data frameworks is Hadoop. Hadoop has two parts, one for storage and one for processing, and the main idea behind its processing part is MapReduce. In this model each job is divided into sub-tasks, and their results are combined at the end. Spark follows a similar programming model, with the difference that it keeps working data in main memory wherever possible to speed up processing. Synchrotron facilities are one important source of big data. Stream clustering is a procedure for grouping data in real time as it arrives. Here, we apply stream clustering to synchrotron data using Spark.
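The map-and-combine idea and the notion of clustering data as it arrives can be illustrated with a small sketch. This is plain Python, not Spark code, and it is not the method used in this work: the function names (`assign`, `merge`, `update`), the two-dimensional sample points, and the choice to count each initial center as one observation are all assumptions made for illustration. Each mini-batch is mapped to (nearest center, partial sum) pairs, the pairs are merged per cluster, and the running centers are updated as weighted means.

```python
from math import dist  # Python 3.8+


def assign(point, centers):
    """Map step: emit (index of nearest center, (point, 1))."""
    nearest = min(range(len(centers)), key=lambda i: dist(point, centers[i]))
    return nearest, (point, 1)


def merge(a, b):
    """Reduce step: add two (vector_sum, count) pairs."""
    (sum_a, n_a), (sum_b, n_b) = a, b
    return tuple(x + y for x, y in zip(sum_a, sum_b)), n_a + n_b


def update(centers, counts, batch):
    """Fold one mini-batch into the running cluster centers."""
    partial = {}
    for key, value in (assign(p, centers) for p in batch):
        partial[key] = merge(partial[key], value) if key in partial else value

    new_centers, new_counts = list(centers), list(counts)
    for k, (vec_sum, m) in partial.items():
        n = counts[k]
        # Weighted mean of the old center (n points) and the batch (m points).
        new_centers[k] = tuple(
            (c * n + s) / (n + m) for c, s in zip(centers[k], vec_sum)
        )
        new_counts[k] = n + m
    return new_centers, new_counts


# Hypothetical starting centers, each treated as one observed point.
centers, counts = [(0.0, 0.0), (10.0, 10.0)], [1, 1]

# A stand-in for a data stream: two mini-batches of 2-D points.
stream = [
    [(1.0, 1.0), (9.0, 9.0)],
    [(2.0, 0.0), (11.0, 12.0)],
]
for batch in stream:
    centers, counts = update(centers, counts, batch)

print(centers, counts)
```

In a real Spark job the map and merge steps would run in parallel across partitions of each mini-batch; here they run sequentially so the logic stays visible.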
