
Problem with Decaster a big data

Postby trungtan » Sat Jul 07, 2012 7:42 pm

Hi everyone,

I have a table Companies(id, name) which contains about 100,000 records. I want to calculate the Levenshtein distance between every pair of companies to detect duplicates, so I created a new table Matrix(id1, id2, distance).

My problem is that the table Matrix is too big, because it contains 100,000 * 99,999 / 2 = 4,999,950,000 records. Each record takes at least 2 + 2 + 2 = 6 bytes, so table Matrix takes 4,999,950,000 * 6 / 1024^3 ≈ 28 GB. It's too big.

Please suggest some solutions to reduce the size of table Matrix! Thank you!
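
For reference, a minimal sketch of the all-pairs computation described above, assuming PHP's built-in levenshtein() and PDO; the connection details are placeholders, while the Companies and Matrix layouts come from the post. One caveat on the size estimate: ids up to 100,000 do not fit in 2 bytes (MySQL's SMALLINT UNSIGNED tops out at 65,535), so each id column needs at least a 3-byte MEDIUMINT and the real table would be somewhat larger.

Code: Select all
<?php
// Naive all-pairs computation, as described in the post above.
// Connection details are assumed placeholders. Note that levenshtein()
// requires strings of 255 bytes or less, which company names usually satisfy.
$pdo = new PDO('mysql:host=localhost;dbname=test', 'user', 'password');

// id => name map of all companies
$companies = $pdo->query('SELECT id, name FROM Companies')
                 ->fetchAll(PDO::FETCH_KEY_PAIR);

$insert = $pdo->prepare(
    'INSERT INTO Matrix (id1, id2, distance) VALUES (?, ?, ?)'
);

$ids = array_keys($companies);
$n   = count($ids);                  // ~100,000
for ($i = 0; $i < $n; $i++) {        // n * (n - 1) / 2 pairs in total
    for ($j = $i + 1; $j < $n; $j++) {
        $d = levenshtein($companies[$ids[$i]], $companies[$ids[$j]]);
        $insert->execute(array($ids[$i], $ids[$j], $d));
    }
}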
trungtan
New php-forum User
 
Posts: 12
Joined: Mon Nov 28, 2011 3:53 am

Re: Problem with Decaster a big data

Postby johnj » Sun Jul 08, 2012 5:25 am

One way out, which may not be easy, is to reduce the number of candidates each company is compared against from 99,999 to something smaller.
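
A minimal sketch of one way to do that is "blocking": only compare companies that share a cheap key, so each name is checked against a small bucket instead of all 99,999 others. The bucket key below (lowercased first letter plus a coarse length band) and the connection details are illustrative assumptions, not from the thread.

Code: Select all
<?php
// "Blocking" sketch: group names by a cheap key, then only compare
// within each bucket. The key choice here is just an illustration.
$pdo = new PDO('mysql:host=localhost;dbname=test', 'user', 'password');

$buckets = array();
foreach ($pdo->query('SELECT id, name FROM Companies') as $row) {
    $key = strtolower(substr($row['name'], 0, 1))
         . '|' . (int) (strlen($row['name']) / 4);
    $buckets[$key][] = $row;
}

foreach ($buckets as $rows) {
    $m = count($rows);               // bucket size, far below 100,000
    for ($i = 0; $i < $m; $i++) {
        for ($j = $i + 1; $j < $m; $j++) {
            $d = levenshtein($rows[$i]['name'], $rows[$j]['name']);
            // store or report ($rows[$i]['id'], $rows[$j]['id'], $d) here
        }
    }
}

The trade-off is that pairs which do not share a key are never compared, so for example "Acme Inc" and "The Acme Inc" would be missed; the key has to be chosen so that real duplicates usually collide.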
johnj
php-forum Super User
 
Posts: 1805
Joined: Thu Mar 10, 2011 5:07 pm

Re: Problem with Decaster a big data

Postby johnj » Sun Jul 08, 2012 5:32 am

If the Levenshtein distance is too big, then what is the point in saving that information? Only storing pairs whose distance falls below some threshold should cut down the number of records saved.
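
A minimal sketch of that idea, assuming PHP's built-in levenshtein() and the Companies and Matrix tables from the original post; the threshold of 3 and the connection details are illustrative, not from the thread.

Code: Select all
<?php
// Threshold sketch: compute each distance, but only store pairs that
// are close enough to plausibly be duplicates.
$pdo = new PDO('mysql:host=localhost;dbname=test', 'user', 'password');

$companies = $pdo->query('SELECT id, name FROM Companies')
                 ->fetchAll(PDO::FETCH_KEY_PAIR);   // id => name

$insert = $pdo->prepare(
    'INSERT INTO Matrix (id1, id2, distance) VALUES (?, ?, ?)'
);
$threshold = 3;                      // tune on a sample of known duplicates

$ids = array_keys($companies);
$n   = count($ids);
for ($i = 0; $i < $n; $i++) {
    for ($j = $i + 1; $j < $n; $j++) {
        $d = levenshtein($companies[$ids[$i]], $companies[$ids[$j]]);
        if ($d <= $threshold) {      // most pairs are skipped, not stored
            $insert->execute(array($ids[$i], $ids[$j], $d));
        }
    }
}

The comparison work is still quadratic, but Matrix then holds only the near-duplicate pairs, typically a tiny fraction of the ~5 billion.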
johnj
php-forum Super User
 
Posts: 1805
Joined: Thu Mar 10, 2011 5:07 pm

Re: Problem with Decaster a big data

Postby trungtan » Mon Jul 09, 2012 1:05 am

Thanks, maybe that is the only way.
trungtan
New php-forum User
 
Posts: 12
Joined: Mon Nov 28, 2011 3:53 am

