similar_text() function

Post by andyr » Wed Nov 07, 2012 11:49 am

Hi,

I am trying to use PHP's similar_text() function, and I have some example code below. The first example gives 100%, as I hoped. The second seems pretty reasonable too. For the third I would have expected 0%, but I get 26% even though none of the words are similar. Does anyone know why this is? How exactly does similar_text() calculate this value?

Also, if I wanted to round the results of the second and third examples to zero decimal places, can anyone tell me how I would write the code?

Code:
<?php
//Example 1
$text1 = "This is some text to be checked for plagerism";
$text2 = "This is some text to be checked for plagerism";
similar_text($text1, $text2, $similar);
print "These texts are $similar% similar";
print "<br /><br />";

//Example 2
$text1 = "This is some text to be checked for plagerism";
$text2 = "This is a sentence ready to be checked";
similar_text($text1, $text2, $similar);
print "These texts are $similar% similar";
print "<br /><br />";

//Example 3
$text1 = "This is some text to be checked for plagerism";
$text2 = "Nothing here relates with text1";
similar_text($text1, $text2, $similar);
print "These texts are $similar% similar";
print "<br /><br />";
?>



Thanks,

Andy ;-)

Re: similar_text() function

Post by seandisanti » Wed Nov 07, 2012 12:47 pm

From php.net:

Description
int similar_text ( string $first , string $second [, float &$percent ] )

This calculates the similarity between two strings as described in Programming Classics: Implementing the World's Best Algorithms by Oliver (ISBN 0-131-00413-1). Note that this implementation does not use a stack as in Oliver's pseudo code, but recursive calls which may or may not speed up the whole process. Note also that the complexity of this algorithm is O(N**3) where N is the length of the longest string.

I haven't found a better explanation of the algorithm in use, but I only spent about a minute researching it. According to the comments on php.net, changing the order of the arguments can affect the result.
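
For what it's worth, here is a rough sketch of how I read that description: find the longest run of characters the two strings share, then repeat on whatever is left of it and right of it, and add up the lengths. The function names (sketch_longest_common, sketch_similar_chars, sketch_similar_percent) are made up for illustration, and this is not PHP's actual source, just an approximation of the idea. It would also explain the third example: the comparison works on characters, not words, so shared letters and spaces still count.

```php
<?php
// Hypothetical helpers; a sketch of the idea, not PHP's real implementation.

// Find the first longest run of characters common to $a and $b.
// Returns [position in $a, position in $b, length].
function sketch_longest_common(string $a, string $b): array
{
    $best = [0, 0, 0];
    $lenA = strlen($a);
    $lenB = strlen($b);
    for ($i = 0; $i < $lenA; $i++) {
        for ($j = 0; $j < $lenB; $j++) {
            $l = 0;
            while ($i + $l < $lenA && $j + $l < $lenB && $a[$i + $l] === $b[$j + $l]) {
                $l++;
            }
            if ($l > $best[2]) { // strictly greater, so ties keep the earliest match
                $best = [$i, $j, $l];
            }
        }
    }
    return $best;
}

// Count matching characters: longest common run, then recurse left and right.
function sketch_similar_chars(string $a, string $b): int
{
    [$posA, $posB, $len] = sketch_longest_common($a, $b);
    if ($len === 0) {
        return 0;
    }
    $left  = sketch_similar_chars(substr($a, 0, $posA), substr($b, 0, $posB));
    $right = sketch_similar_chars(substr($a, $posA + $len), substr($b, $posB + $len));
    return $len + $left + $right;
}

// Percentage: matched characters counted once per string, over the combined length.
function sketch_similar_percent(string $a, string $b): float
{
    $total = strlen($a) + strlen($b);
    return $total === 0 ? 0.0 : sketch_similar_chars($a, $b) * 2 * 100 / $total;
}

// Argument order matters when several common runs tie for longest:
// the first tie found decides how the rest of the strings get split.
echo sketch_similar_chars('PHP IS GREAT', 'WITH MYSQL'), "\n"; // 3
echo sketch_similar_chars('WITH MYSQL', 'PHP IS GREAT'), "\n"; // 2

// Rounding the built-in percentage to zero decimal places with round().
$text1 = "This is some text to be checked for plagerism";
$text2 = "Nothing here relates with text1";
similar_text($text1, $text2, $similar);
print "These texts are " . round($similar) . "% similar\n";
```

round() also takes a precision as its second argument if you want to keep a decimal or two, e.g. round($similar, 1).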

Re: similar_text() function

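Post by MeroD » Wed Nov 07, 2012 10:32 pm

As an alternative, you can try the Levenshtein distance: http://php.net/manual/en/function.levenshtein.php

It's apparently faster than similar_text(), and you can set separate costs for insertions, replacements, and deletions. Note that it measures difference rather than similarity, so a lower number means the strings are closer.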
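
A minimal sketch of how that might look with the strings from the first post. levenshtein() only returns an edit distance, so the percentage helper below (sketch_levenshtein_percent, a made-up name) is my own assumption: dividing by the longer string's length is one common way to normalise, not something the function does for you.

```php
<?php
// Hypothetical helper: turn an edit distance into a rough 0-100 score.
// Dividing by the longer length is just one convention, assumed here.
function sketch_levenshtein_percent(string $a, string $b): float
{
    $longest = max(strlen($a), strlen($b));
    if ($longest === 0) {
        return 100.0; // two empty strings are identical
    }
    return (1 - levenshtein($a, $b) / $longest) * 100;
}

$text1 = "This is some text to be checked for plagerism";
$text2 = "Nothing here relates with text1";

// Identical strings have distance 0, so the score is 100.
print "Same text: " . round(sketch_levenshtein_percent($text1, $text1)) . "%\n";
print "Different text: " . round(sketch_levenshtein_percent($text1, $text2)) . "%\n";

// The optional arguments are the insertion, replacement and deletion costs.
// Here a replacement is made twice as expensive as an insert or a delete.
print levenshtein("kitten", "sitting", 1, 2, 1) . "\n";
```

Keep in mind this is also character-based, so unrelated sentences will still not score exactly 0%. For a word-level comparison you would have to split the text into words first.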
