updated blog spider

I’ve just updated SproutSearch’s blog spider. It now tries to fetch a blog’s RSS feed before resorting to my HTML parser. This should save some CPU time and give me better results. It also records the date and time of the blog’s latest post, along with some statistics about the number of words used. This new data will allow me to generate better pages in the future. I am also brainstorming data mining methods I can use to make SproutSearch a bit more interesting. With over 8 million blogs in the database, there are lots of possibilities.

http://www.sproutsearch.com
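To make the feed-first idea concrete, here is a minimal sketch in Python. It is not SproutSearch’s actual code: every function name is hypothetical, it assumes the feed and page have already been downloaded as strings, it only understands RSS 2.0 `item` elements, and the word statistics are a guess at what such a spider might record.

```python
import re
import xml.etree.ElementTree as ET
from datetime import timezone
from email.utils import parsedate_to_datetime
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of script and style tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)


def html_to_text(html):
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def word_stats(text):
    # Assumed statistics: total and distinct word counts.
    words = re.findall(r"[A-Za-z0-9']+", text.lower())
    return {"total_words": len(words), "unique_words": len(set(words))}


def summarize_feed(rss_xml):
    """Latest post time and word statistics from an RSS 2.0 document."""
    root = ET.fromstring(rss_xml)
    items = list(root.iter("item"))
    if not items:
        raise ValueError("no RSS items found")
    dates, texts = [], []
    for item in items:
        try:
            stamp = parsedate_to_datetime(item.findtext("pubDate") or "")
        except (TypeError, ValueError):
            stamp = None  # missing or malformed date: skip it
        if stamp is not None:
            if stamp.tzinfo is None:
                # Treat dates without a zone as UTC so they can be compared.
                stamp = stamp.replace(tzinfo=timezone.utc)
            dates.append(stamp)
        texts.append(item.findtext("title") or "")
        texts.append(html_to_text(item.findtext("description") or ""))
    stats = word_stats(" ".join(texts))
    stats["latest_post"] = max(dates) if dates else None
    stats["source"] = "rss"
    return stats


def summarize_blog(rss_xml, html):
    """Prefer the feed; use the HTML page if the feed is missing or unusable."""
    if rss_xml:
        try:
            return summarize_feed(rss_xml)
        except (ET.ParseError, ValueError):
            pass
    stats = word_stats(html_to_text(html))
    stats["latest_post"] = None  # this sketch does not guess dates from HTML
    stats["source"] = "html"
    return stats
```

Calling `summarize_blog(feed_text, page_text)` returns a small dictionary with `total_words`, `unique_words`, `latest_post`, and `source`. The feed path never runs the HTML parser over the whole page, which is where the CPU savings would come from.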
