The Kanwei Decrees…

Posted
22 April 2008 @ 5pm

Tagged
Ramblings

Accepted to Google Summer of Code

F88AD820-36C4-48C7-B26E-0F4B77131D50.jpg

I got accepted into the 2008 Google Summer of Code program! It’s a Google initiative to pay programmers to work on open source software projects. I’ll be updating my blog more this summer.

I will be writing a Ruby library to implement various algorithms and data structures so that they do not have to be re-implemented in other projects. My submitted proposal is as follows:

Using the right data structure or algorithm for the situation is an important aspect of programming. In computer science literature, many data structures and algorithms have been researched and extensively documented. However, there is still no standard library in Ruby implementing useful structures and algorithms like Splay Trees, Red/Black Trees, tries, graphs, different sorting algorithms, etc. This project will create such a library with documentation on when to use a particular structure/algorithm. It will also come with a benchmark suite to compare performance in different situations.

By leveraging the flexibility of Ruby’s duck typing, the structures and algorithms can be used on all classes of objects as long as a few methods are defined, much like the Enumerable mix-in. The library will be written in Ruby so that it can be used by all the different Ruby implementations, but will also be implemented in C and Java for performance.

Possible data structures and algorithms:

  • Trees (AVLTree, SplayTree, Red/Black Tree)
  • Heaps
  • Tries (Radix Tree, Suffix Tree)
  • Graphs (Adjacency List, Kruskal’s Algorithm, Djikstra’s Algorithm)
  • Sorting (Mergesort, Quicksort, Radix Sort, Insertion Sort)
  • Searching (Knuth-Morris-Pratt, Breadth-first tree search, depth-first tree search)

Motivation:

To create a fast ruby library of common data structures and algorithms so that they do not have to be re-implemented for other projects.

Organization:

  • The data structures part of the library will be implemented as classes to be instantiated.
  • The sorting algorithms will be implemented as mix-ins, much like the Enumerable and Comparable libraries that require the user to define #each and #<=> respectively in order to get additional methods. The search methods could be added to built-in structures like arrays and strings, but this may cause conflicts with existing code and this must be considered.
  • Algorithms related to certain data structures, such as Graph searches, will be implemented for the appropriate structure.

Reference:

The primary literature referenced will be Robert Sedgewick’s Algorithms (Parts 1 to 5), as it has both code examples and proofs for code complexity. I have used this book in many classes and it is a definitive resource for algorithms and data structures. For homework, I have implemented many of the structures and algorithms in Java, and have experience in writing benchmarks to test for different cases.

Implementation:

I plan to first write the entire library in Ruby and then implement it in C and Java for performance. Data structures and algorithms are low enough level that this is necessary. I have experience in both of the latter languages and should have no problem finishing over summer.

A benchmark suite will be created in Ruby (probably using RSpec) to demonstrate the best, average, and worst case running times for each algorithm and structure.

Documentation:

A key aspect of the project will be to provide documentation with real-life code examples to demonstrate the usage of the algorithms and structures. For example, the documentation entry for LSD radix sort will say that it quickly sorts numbers and strings, and will list the Big O complexity to be linear. A benchmark comparing the results with that of the Enumerable#sort will be shown, and hopefully the radix sort will be faster as it is O(n) and not O(n log n).

Explaining data structures will be trickier, as experience and theoretical background is needed to choose the right data structure for the problem. For example, many real world problems can be solved in terms of graphs. I will write up some realistic examples to demonstrate the use of trees, heaps and graphs, the main structures to be implemented. As with the algorithms, the documentation will provide benchmarks and Big O complexity.


Ralph Nader Speaks at Emory Physical Activity