splogs

My blog search engine, SproutSearch, now indexes over 8 million blogs. I am working on changing the way the blogs are ranked. For now, they are sorted by the sheer amount of content they contain. A big problem with this method is that many spam blogs contain masses of content. I don’t like SproutSearch linking to so much spam, so I need to find a way to remove a lot of these listings.

It is not practical for me to read 8 million blogs, so I need an automated method to detect spam. Many spam blogs use the same words over and over, so I wrote a program to count the number of repeated words. Most spam blogs also seem to use a similar number of words in every post, so I made another program that computes the standard deviation of the number of words per post; a low standard deviation would then be a warning sign. Using these metrics, I will make a program that flags potential spam so I can review and delete it.
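To make the two metrics concrete, here is a rough Python sketch of how they could be computed and combined. It is not the actual SproutSearch code: the function names, the tokenizer, and both thresholds (`max_stddev`, `min_repeat_ratio`) are assumptions picked for illustration and would need tuning against real blogs.

```python
import re
import statistics
from collections import Counter

# Assumption: a "word" is a run of lowercase letters, digits, or apostrophes.
WORD_RE = re.compile(r"[a-z0-9']+")


def tokenize(text):
    """Split a post into lowercase words."""
    return WORD_RE.findall(text.lower())


def repeated_word_count(posts):
    """Count word occurrences beyond the first use of each word in a blog."""
    counts = Counter(word for post in posts for word in tokenize(post))
    return sum(count - 1 for count in counts.values())


def post_length_stddev(posts):
    """Population standard deviation of the number of words per post."""
    lengths = [len(tokenize(post)) for post in posts]
    return statistics.pstdev(lengths) if lengths else 0.0


def looks_like_spam(posts, max_stddev=2.0, min_repeat_ratio=0.5):
    """Flag a blog for manual review; both thresholds are hypothetical."""
    total_words = sum(len(tokenize(post)) for post in posts)
    if total_words == 0 or len(posts) < 2:
        return False  # not enough data to judge
    repeat_ratio = repeated_word_count(posts) / total_words
    return (repeat_ratio >= min_repeat_ratio
            and post_length_stddev(posts) <= max_stddev)
```

A blog is passed in as a list of post texts, for example `looks_like_spam(["buy cheap pills now", "buy cheap pills today"])`. The function only flags candidates; the final decision to delete a listing still comes from a manual review.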






