Thursday, April 29, 2010

Interesting set intersection by using 'uniq'

I was asked to find the intersection of two IP lists yesterday. I write a simple program with Perl. The program iterates the two lists to find the repeated IPs, which is a O(n^2) algorithm.

foreach $a (@A) {
foreach $b (@B) {
if ($a eq $b) {
...
}
}
}
When I wake up this morning, another way to do this pops onto my mind.

cat file-A | sort | uniq > tmp
cat file-B | sort | uniq >> tmp
cat tmp | sort | uniq -d

Basically, it is a O(nlog(n)) algorithm. (the sort operation).