Automatic classification of feeds by similarity in content.


jnmrno13
New php-forum User
Posts: 1
Joined: Sat Oct 13, 2012 2:18 pm

Postby jnmrno13 » Sat Oct 13, 2012 2:30 pm

Hi, can anyone point me in the right direction on how to automatically classify RSS feeds from different websites, let's say newspapers, by similarity of content? E.g. so that feed items from different sites that refer to the exact same news event are grouped together, under a <fieldset> for instance, by recognizing similarities in their keywords.
I am not yet an expert in PHP, so I would like some help on what is essential to consider for this script.
Thank you!

seandisanti
php-forum Fan User
Posts: 838
Joined: Mon Oct 01, 2012 12:32 pm

Re: Automatic classification of feeds by similarity in conte

Postby seandisanti » Mon Oct 15, 2012 8:29 am

There's a GREAT book out that demonstrates collecting and analyzing data like this, except it's in Python. The concepts are pretty straightforward and very portable though. Check out "Programming Collective Intelligence" by Toby Segaran: http://www.amazon.com/s/ref=nb_sb_ss_i_ ... Caps%2C222. I don't like the naming conventions in his code (one- and two-letter identifiers that mean nothing and cloud the purpose), and there are a couple of typos that stop the example code from working if you follow the book directly, but most errata are documented on the official site, along with working example code. Beyond those two caveats though, this is a GREAT book, chock full of huge dataset analytic goodness.
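
To give a rough idea of what "recognizing similarities in keywords" can look like, here is a minimal sketch in Python (the book's language; the same steps port to PHP arrays fairly directly). It is not taken from the book. It assumes the feed items have already been fetched and parsed into titles, and everything in it (the `keywords`, `jaccard` and `group_items` names, the tiny stop-word list, the 0.3 threshold, the sample headlines) is made up for illustration. It reduces each title to a set of keywords, measures the overlap between two sets with the Jaccard index, and greedily puts each item into the first group that is similar enough.

```python
import re

# Tiny illustrative stop-word list (an assumption; a real list would be much longer).
STOP_WORDS = {"the", "a", "an", "of", "in", "on", "to", "for",
              "and", "is", "at", "by", "after", "with"}


def keywords(text):
    """Lowercase the text, split it into words, drop stop words and very short words."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {w for w in words if w not in STOP_WORDS and len(w) > 2}


def jaccard(a, b):
    """Jaccard index: size of the intersection divided by size of the union."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def group_items(items, threshold=0.3):
    """Greedy grouping: compare each item with the first item of every existing
    group and join the first group that is similar enough, else start a new one."""
    groups = []
    for item in items:
        kw = keywords(item["title"])
        for group in groups:
            if jaccard(kw, group["keywords"]) >= threshold:
                group["items"].append(item)
                break
        else:
            # No existing group matched, so this item starts its own group.
            groups.append({"keywords": kw, "items": [item]})
    return groups


# Hypothetical, hand-written sample data standing in for parsed RSS items.
items = [
    {"source": "Paper A", "title": "Earthquake strikes northern Japan, tsunami warning issued"},
    {"source": "Paper B", "title": "Tsunami warning issued after strong earthquake in Japan"},
    {"source": "Paper C", "title": "Central bank cuts interest rates again"},
]

groups = group_items(items)
for group in groups:
    print([item["source"] for item in group["items"]])
```

With these sample headlines the two earthquake stories end up in one group and the bank story in another. The threshold is a guess you would have to tune on real feeds, and comparing titles only is the crudest option; using the item descriptions as well, or weighting rare words more heavily, are the usual next steps.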

