Tag Spam Creates Large Non-Giant Connected Components

Authors: Nicolas Neubauer, Robert Wetzker, Klaus Obermayer
Source: Airweb 2009: Fifth International Workshop on Adversarial Information Retrieval on the Web | 18th Int. World Wide Web Conference, Madrid, Spain, 2009 | to appear

Spammers in social bookmarking systems try to mimic the bookmarking behaviour of real users to gain the attention of other users or search engines. Several methods have been proposed for the detection of such spam, relying on domain-specific features (like URL terms) or on the similarity of users to previously identified spammers. However, as shown in our previous work, it is possible to identify a large fraction of spam users based on purely structural features. The hypergraph connecting documents, users, and tags can be decomposed into connected components, and all large but non-giant components turned out to be almost entirely inhabited by spam users in the examined dataset. Here, we test to what degree the decomposition of the complete hypergraph is really necessary by examining the component structure of the induced user/document and user/tag graphs. While the user/tag graph's connectivity does not help in classifying spammers, the user/document graph's connectivity is already highly informative. It can, however, be augmented with connectivity information from the hypergraph. Spam detection based on structural features, like the one proposed here, requires complex adaptation strategies from spammers and is well suited to complement other, more traditional detection approaches.
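
The structural idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes bookmarks are available as (user, document) pairs, measures component size by the number of users, treats the single largest component as the giant one, and uses a hypothetical threshold `min_size` to decide what counts as "large". The function names are likewise chosen for illustration only.

```python
from collections import defaultdict


def find(parent, node):
    """Return the root of node's set, shortening the path along the way."""
    while parent[node] != node:
        parent[node] = parent[parent[node]]
        node = parent[node]
    return node


def user_components(bookmarks):
    """Group users by connected component of the bipartite user/document graph.

    bookmarks: list of (user, document) pairs (assumed input format).
    Returns a list of user sets, largest first.
    """
    parent = {}
    for user, doc in bookmarks:
        # Prefix nodes so that a user and a document with equal ids stay distinct.
        u, d = ("user", user), ("doc", doc)
        parent.setdefault(u, u)
        parent.setdefault(d, d)
        root_u, root_d = find(parent, u), find(parent, d)
        if root_u != root_d:
            parent[root_u] = root_d  # merge the two components

    groups = defaultdict(set)
    for node in parent:
        if node[0] == "user":
            groups[find(parent, node)].add(node[1])
    return sorted(groups.values(), key=len, reverse=True)


def suspicious_users(bookmarks, min_size=3):
    """Flag users in large, non-giant components.

    min_size is a hypothetical threshold, not a value taken from the paper.
    """
    components = user_components(bookmarks)
    flagged = set()
    # components[0] is assumed to be the giant component and is skipped.
    for component in components[1:]:
        if len(component) >= min_size:
            flagged |= component
    return flagged
```

Calling `suspicious_users` on a list of bookmark pairs returns the set of users that sit in components of at least `min_size` users outside the largest component. Repeating the same procedure on (user, tag) pairs would give the user/tag variant that the abstract reports as uninformative for spam classification.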