Facebook Careers Puzzle—smallworld solution in Ruby

From Facebook’s Job Puzzles page:

As a popular engineer, you know many people in your home city. While traveling around town, visiting your friends, you realize it would be really handy to have a program that tells you which of your friends are closest based upon which friend you are currently visiting.

Being an engineer who is interested in writing software that is useful to everyone, you decide to write a general solution to your quandary. Each of your friends lives at a unique latitude and longitude. For the purposes of this program, the world is flat, and the latitude and longitude are for all intents and purposes Cartesian coordinates on a flat plane. For example, in our flat world, lat 45, long -179 is not as close to lat 45, long 179 when compared to lat 45, long 170.

Write a program that takes a single argument on the command line. This argument must be a file name which contains the input data. Your program should output the nearest three other friends for each friend in the list. You are virtually a celebrity and your friend list can be astoundingly huge; your program must have a running time of faster than O(n2) and be robust and resource efficient.

The naive algorithm is extremely simple but won’t get you anywhere as Facebook will run your code on a huge dataset and an O(n2) solution won’t cut it. I did some research and concluded that the best solution would use a kd-tree. The time complexity with this data structure is O(n log n). Implementing it was not too hard, although most reference implementations I found did not support querying for more than one nearest neighbor (we need 3). I really liked the data structure, so I generalized it and plopped it into my Algorithms and Containers library.

For testing, I used this randomly generated file with 10000 points. It took less than 5 seconds to solve.

The following solution in Ruby passed the Facebook puzzle robot:

(smallworld.rb) download
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
#!/usr/bin/env ruby
# Small world puzzle
# Kanwei Li, 2009

# Make sure input file exists and read from it
filename = ARGV[0]
unless filename && File.exist?(filename)
  puts "error: must specify a valid input file"
  exit
end
input = File.open(filename)

$points = [] # Store array of points as a global so we don't have to pass it around

# Populate points array
while line = input.gets do
  break if line.empty?
  n, x, y = line.split(/\s+/)
  $points << [n.to_i, x.to_f, y.to_f]
end

Node = Struct.new(:id, :coords, :left, :right)

# Build a kd-tree
def build_tree(points, depth=0)
  return if points.empty?

  axis = depth % 2 + 1

  points.sort! { |a, b| a[axis] <=> b[axis] }
  median = points.size / 2

  node = Node.new(points[median][0], points[median][1..-1], nil, nil)
  node.left = build_tree(points[0...median], depth+1)
  node.right = build_tree(points[median+1..-1], depth+1)
  return node
end

# Euclidian distanced, squared, between a node and target coords
def distance2(node, target)
  return nil if node.nil? or target.nil?
  c = (node.coords[0] - target[0])
  d = (node.coords[1] - target[1])
  c * c + d * d
end

# Update array of nearest elements if necessary
def check_nearest(nearest, node, target)
  d = distance2(node, target)
  if nearest.size < 4 || d < nearest.last[0]
    nearest.pop if nearest.size >= 4
    nearest << [d, node.id]
    nearest.sort! { |a, b| a[0] <=> b[0] }
  end
  nearest
end

$nearest = []
def nearest(node, target, depth=0)
  axis = depth % 2

  if node.left.nil? && node.right.nil? # Leaf node
    $nearest = check_nearest($nearest, node, target)
    return
  end

  # Go down the nearest split
  if node.right.nil? || (node.left && target[axis] <= node.coords[axis])
    nearer = node.left
    further = node.right
  else
    nearer = node.right
    further = node.left
  end
  nearest(nearer, target, depth+1)

  # See if we have to check other side
  if further
    if $nearest.size < 4 || (target[axis] - node.coords[axis])**2 < $nearest.last[0]
      nearest(further, target, depth+1)
    end
  end

  $nearest = check_nearest($nearest, node, target)
  # puts "end: #{node.id}\t#{distance2(node, target)}"
end
root = build_tree($points)

$points.sort { |a, b| a[0] <=> b[0] }.each do |point|
  $nearest = []
  nearest_3 = nearest(root, point[1..-1])
  puts "#{point[0]} #{nearest_3[1..-1].collect{ |n| n[1] }.join(',')}\n"
end
Posted on