K-Means Clustering and its use in the security domain

Sri Raviteja
3 min read · Sep 8, 2021

Task 10

Task Description 📄

📌 Create a blog/article/video explaining k-means clustering and its real use case in the security domain

Clustering is one of the most common exploratory data analysis techniques used to get an intuition about the structure of the data. It can be defined as the task of identifying subgroups in the data such that data points in the same cluster are very similar, while data points in different clusters are very different.

Put another way, clustering divides a population of data points into a number of groups so that points in the same group are more similar to each other than to points in other groups. In simple words, the aim is to find groups with similar traits and assign them to clusters.

Unlike supervised learning, clustering is an unsupervised learning method: there are no ground-truth labels against which we could compare the output of the clustering algorithm to evaluate its performance. We only want to investigate the structure of the data by grouping the data points into distinct subgroups.

K-Means Clustering

The K-Means algorithm finds K clusters in the given input data, where K is a number we have to choose ourselves. There are a couple of ways to choose it. We can use trial and error: specify a value of K, look at the resulting clusters, and keep changing the value until we get the best clusters.

Another method is to use the Elbow technique to determine the value of K: run K-Means for a range of K values, plot the within-cluster sum of squared distances against K, and pick the K at which the curve bends and stops dropping sharply. Once we have a value for K, the algorithm places that many centroids at random and measures the distance from each data point to each of these centroids. Each data point is then assigned to the centroid closest to it. This gives us K initial clusters.
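
The quantity usually inspected by the Elbow technique can be written down explicitly. The notation below is an assumption added for illustration, not taken from the original task: C_k is the set of points assigned to cluster k, and \mu_k is that cluster's centroid.

```latex
\mathrm{WCSS}(K) = \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^2
```

This value can only shrink or stay the same as K grows, so we do not look for its minimum. We look for the K after which adding more clusters buys only a small further reduction.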

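For each newly formed cluster, the algorithm calculates a new centroid position as the mean of the points assigned to it, so the centroid moves away from its randomly allocated starting position. The assignment and update steps are then repeated until the centroids stop moving.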
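
The steps described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a production implementation: the function names (`squared_distance`, `assign`, `kmeans`), the toy data, the fixed seed, and the choice to pick the initial centroids from the data points themselves are all made up for this example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centroids):
    """Assignment step: index of the nearest centroid for every point."""
    return [
        min(range(len(centroids)), key=lambda j: squared_distance(p, centroids[j]))
        for p in points
    ]


def kmeans(points, k, max_iter=100, seed=0):
    rng = random.Random(seed)
    # Assumption: the random initial centroids are k of the data points.
    centroids = rng.sample(points, k)
    for _ in range(max_iter):
        labels = assign(points, centroids)
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                mean = tuple(sum(axis) / len(members) for axis in zip(*members))
                new_centroids.append(mean)
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[j])
        if new_centroids == centroids:
            break  # centroids stopped moving
        centroids = new_centroids
    labels = assign(points, centroids)
    # Within-cluster sum of squared distances, the value used by the Elbow technique.
    inertia = sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, inertia


# Toy 2-D data with three visually separate groups (invented for this sketch).
points = [
    (1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
    (8.0, 8.0), (8.5, 9.0), (9.0, 8.0),
    (1.0, 9.0), (1.5, 8.5), (2.0, 9.5),
]

# Elbow technique: best inertia over a few random starts for each K.
for k in range(1, 6):
    best = min(kmeans(points, k, seed=s)[2] for s in range(10))
    print(k, round(best, 2))
```

Because the result depends on the random starting centroids, the loop at the end tries several seeds for each K and keeps the lowest value. On data like this, the printed values should drop steeply up to the number of real groups and flatten afterwards.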

Its use in the Security Domain

Computer network security has become a chief problem of the information society.

With the continuous development of technology, network intrusions have become harder to spot and their methods of attack more complex. Because a network is not restricted by time or place, intrusions can do great harm to network security. Network security is therefore one of the most important concerns of today’s society, and detecting and preventing intrusions has become a primary problem that we need to solve. Research on intrusion detection systems has accordingly become extremely important. Data mining based on the k-means clustering algorithm is one way to approach this problem: it can be applied to network traffic to study network security and to help create a secure and harmonious network environment.
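
One common way to use k-means for intrusion detection is to cluster traffic believed to be normal and then flag new records that lie far from every centroid. The sketch below illustrates only that scoring step and is not necessarily the method used in any particular intrusion detection system. The centroids, the two features, the threshold, and the function names are all hypothetical values chosen for illustration.

```python
import math

# Hypothetical centroids, as if k-means had already been run on normal traffic.
# Each vector is (connections per minute, mean packet size in hundreds of bytes).
# Both features and all numbers are invented for this sketch.
centroids = [(5.0, 4.0), (20.0, 12.0)]


def nearest_centroid_distance(point, centroids):
    """Euclidean distance from a point to its closest centroid."""
    return min(math.dist(point, c) for c in centroids)


def flag_anomalies(points, centroids, threshold):
    """Return the points that are farther than `threshold` from every centroid."""
    return [p for p in points if nearest_centroid_distance(p, centroids) > threshold]


# Two records close to the normal clusters and one far from both.
traffic = [(5.5, 4.2), (19.0, 11.5), (90.0, 0.5)]
suspicious = flag_anomalies(traffic, centroids, threshold=10.0)
print(suspicious)  # [(90.0, 0.5)]
```

In practice the threshold would have to be tuned on real data, and features on different scales would normally be standardized first so that no single feature dominates the distance.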
