Hi everyone,
I have a table Companies(id, name) which contains about 100,000 records. To find duplicate companies, I want to calculate the Levenshtein distance between every pair of companies, so I created a new table Matrix(id1, id2, distance).
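For reference, here is a minimal Python sketch of the pairwise approach described above, assuming company names are plain strings and ids are unique. The helper names `levenshtein` and `build_matrix` are hypothetical and not from any library. It uses the standard dynamic-programming edit distance and emits one (id1, id2, distance) row per unordered pair, which is why the row count grows quadratically with the number of companies.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Standard DP edit distance, keeping only two rows in memory."""
    if len(a) < len(b):
        a, b = b, a  # make b the shorter string so each row stays small
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            insert_cost = current[j - 1] + 1
            delete_cost = previous[j] + 1
            substitute_cost = previous[j - 1] + (ca != cb)
            current.append(min(insert_cost, delete_cost, substitute_cost))
        previous = current
    return previous[-1]


def build_matrix(companies):
    """Return (id1, id2, distance) for every unordered pair, with id1 < id2.

    companies: list of (id, name) tuples; sorting by id guarantees id1 < id2.
    """
    return [
        (id1, id2, levenshtein(name1, name2))
        for (id1, name1), (id2, name2) in combinations(sorted(companies), 2)
    ]


rows = build_matrix([(1, "Acme Corp"), (2, "Acme Corp."), (3, "Globex")])
print(rows[0])  # (1, 2, 1)
```

With n companies this produces n(n-1)/2 rows, so 3 companies give 3 rows but 100,000 companies give billions.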
My problem is that table Matrix is too big, because it contains 100,000 × 99,999 / 2 = 4,999,950,000 records.
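Even assuming only 2 + 2 + 2 = 6 bytes per record (optimistic, since a 2-byte id can hold at most 65,536 distinct values, fewer than 100,000 companies), table Matrix takes 4,999,950,000 × 6 / 1024^3 ≈ 28 GB. That's too big.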
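A quick way to double-check these numbers, as a sketch: the 6-byte row is the estimate from above, while the 10-byte row is an assumption (hypothetical 4-byte integer ids plus a 2-byte distance), shown only for comparison.

```python
def pair_count(n: int) -> int:
    """Number of unordered pairs among n items: n * (n - 1) / 2."""
    return n * (n - 1) // 2


def table_size_gib(n: int, bytes_per_row: int) -> float:
    """Raw data size in GiB, ignoring row overhead and indexes."""
    return pair_count(n) * bytes_per_row / 1024**3


n = 100_000
print(pair_count(n))                    # 4999950000
print(round(table_size_gib(n, 6), 2))   # 27.94  (2-byte ids, 2-byte distance)
print(round(table_size_gib(n, 10), 2))  # 46.57  (assumed 4-byte ids)
```

These figures cover raw column data only; a real database adds per-row overhead and index storage on top.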
Could anyone suggest some ways to reduce the size of table Matrix? Thank you!


