Pattern Recognition Letters 2009; 830–836
Large datasets with interactions between objects are common to numerous scientific fields including the social sciences and biology, as well as being a feature of specific phenomena such as the internet. The interactions naturally define a graph, and a common way of exploring and summarizing such datasets is graph clustering. Most techniques for clustering graph vertices use only the topology of connections, while ignoring information about the vertices’ features. In this paper we provide a clustering algorithm that harnesses both types of data, based on a statistical model with a latent structure characterizing each vertex both by a vector of features and by its connectivity. We perform simulations to compare our algorithm with existing approaches, and also evaluate our method using real datasets based on hypertext documents. We find that our algorithm successfully exploits whatever information is found both in the connectivity pattern and in the features.
