Sano

Sano Collections Framework

Paul Chiusano
paul.chiusano@gmail.com
Last updated: February 28, 2006

Contents:

1. About the framework
1.1 Getting Started
1.2 Basics
1.3 More Examples
2. API Documentation
3. Download
4. Repository access
5. Getting Help
6. Roadmap

1. About the framework

The Sano framework contains pure-Lua implementations of several common (and a few not-so-common) data structures and related algorithms. It runs under both Lua 5.0 and 5.1, and is MIT licensed.

Among the highlights are: Several sequence types, including Tuple (a lightweight, hashable table wrapper), Vector (slightly more industrial strength table wrapper, including negative indexing), QueueVector (random access + efficient inserts at head OR tail), a basic LinkedList, and SkipVector (a skip-list based random access sequence with efficient inserts or removals at ANY position).

The library also includes several map implementations: among them the basic HashMap (a table-based map with plenty of convenience methods and support for user-defined hashing functions), an ordered, mergeable, spliceable SkipMap, and a LinkedMap (iteration order same as insertion order). For each of these map types, Sano also has the parallel set type - HashSet, SkipSet, LinkedSet.

In addition, the library contains a PairingHeap (constant-time mergeable heap), Multimap (implements one to many mappings), Multiset (ordered or unordered), and numerous algorithms and utility functions for working with all the data structures in the library.

All the data structures in the library are at the very least lightly unit-tested; some I've used extensively for my own projects and are probably pretty stable. Bug reports are appreciated, as are suggestions and comments.

1.1 Getting Started

Sano is packaged as a single 200KB Lua file with all modules and documentation embedded. To get started with Sano, download the sano.lua file linked to on the Sano Luaforge page and place it somewhere Lua's 'require' mechanism knows to look. With that done, you can access the Sano library in the obvious way:

local sano = require 'sano'
...
local v = sano.Vector:new()

One minor caveat is that the root module returned via the call require 'sano' doesn't add anything to the global namespace; you'll need to store the result in a local (or global!) variable.

1.2 Basics

To start off, we'll look at two data structures which are not much more than lightweight wrappers around vanilla Lua tables: HashMap and Vector.

A Vector is not much more than a Lua table with a few handy metamethods. Vectors support negative indexing and can handle nil elements (Vector explicitly tracks the size of the table).

> Vector = sano.Vector
> iter = sano.iter
> vec = Vector:new()
> vec:addAll(iter.count(5)) -- see below
> print(vec)
[1, 2, 3, 4, 5]

All Sano data structures have two construction methods: new and make. The make method is a value constructor; the new method always creates an empty object, and for some classes, new accepts parameters which affect the behavior of the class (for instance, PairingHeap accepts an ordering function which determines which objects will appear at the top of the heap). We could have created the above Vector a little more succinctly using the make method:

> vec = Vector:make(iter.count(5))
> print(vec)
[1, 2, 3, 4, 5]

The make method, and most all Sano functions which handle plaural objects (Lua tables, iterators, strings, and other Sano collections), will work without trouble for any type of plaural object. In the above example, we constructed a Vector by passing in iter.count(5), which just returns an iterator that counts off the integers 1 through 5:

> for int in iter.count(5) do print(int) end
1
2
3
4
5

We could just as easily have supplied a Lua table or some Sano collection (including, but not limited to, Vector) as the argument to make. For example:

> v1 = Vector:make(iter.count(5))
> v2 = Vector:make{1,2,3,4,5}
> v3 = Vector:make(v1)
> print(v1==v2, v2==v3)
true    true

As the last line shows, the __eq metamethod for Vector performs a pairwise comparison of elements to determine equality. All the Sano data structures do something sensible for their equality comparison, and where it makes sense, most data structures have metamethods for other operators (for instance, set union can be expressed via +, and so on).

Here are some more examples of Vector:

> v = Vector:make(1,2,3,4,5)
> print(v)
[1, 2, 3, 4, 5]
> v:shuffle(); print(v)
[1, 5, 3, 2, 4]
> v:sort(); print(v)
[1, 2, 3, 4, 5]
> print(v:get(1), v:get(-1), v(-1)) -- v:get(x)==v(x)
1       5       5
> for e in v:iter() do print(e) end
1
2
3
4
5
> vSub = Vector:make(v:iter(2,-1)); print(vSub)
[2, 3, 4, 5]
> vSub:add(1,0); vSub:add(6); vSub:add(2.2,2)
> print(vSub)
[1, 2, 2.2, 3, 4, 5, 6]
> print(vSub:remove(3), vSub)
2.2     [1, 2, 3, 4, 5, 6]

HashMap is the basic Sano map/table implementation. The table below shows the mapping between Lua table syntax and Sano HashMap syntax:

table

HashMap

t = {}

t = HashMap:new()

t[k]

t:get(k), t(k)

t[k] = v

t:add(k,v)

t[k] = nil

t:remove(k)

HashMap supports user-defined hash functions. Thus, it's possible to use objects other than numbers and strings as keys in a HashMap without resorting to memoization or converting the keys to strings. For instance, the Tuple class overrides its __hash and __eq metamethods, so HashMap can be used as a multi-key map by making the keys tuples:

> h = HashMap:new()
> tuple = sano.makers.tuple -- see explanation below
> h:add(tuple(1,1,2), tuple(3,4))
...
> print( h:get(tuple(1,1,2)) )
(3, 4)

If h were a regular Lua table, the two instances of tuple(1,1,2) would not be recognized as being equivalent:

> h = {}
> h[tuple(1,1,2)] = tuple(3,4)
...
> print( h[tuple(1,1,2)] )
nil

As it can get a little verbose typing ClassName:make, especially for commonly used data structures, Sano has functions in the sano.makers table for constructing Sano data structures with less typing, for example:

> vector = sano.makers.vector -- Vector maker
> list = sano.makers.list -- SkipVector maker
> map = sano.makers.map -- HashMap maker
> omap = sano.makers.omap -- SkipMap maker
> set = sano.makers.set -- HashSet maker
> oset = sano.makers.oset -- SkipSet maker
> heap = sano.makers.heap -- PairingHeap maker
> tuple = sano.makers.tuple -- Tuple maker
...
> v = vector(5,4,3,2,1); print(v)
[5, 4, 3, 2, 1]
> l = list(5,4,3,2,1); print(l)
[5, 4, 3, 2, 1]
> m = map("abcde",{1,2,3,4,5}); print(m)
{a=1, c=3, b=2, e=5, d=4}
> om = omap("abcde",{1,2,3,4,5}); print(om)
{a=1, b=2, c=3, d=4, e=5}
> s = set(1,1,2,3,4); print(s)
{1, 2, 3, 4}
> s = set("abcdee"); print(s)
{"a", "c", "b", "e", "d"}
> so = oset("abcdee"); print(so)
{"a", "b", "c", "d", "e"}
> h = heap(l:shuffle()); print(h)
/1, 2, 3, 5, 4\

1.3 More Examples

> h = map{a=1,b=2,c=3,d=4} -- h = HashMap:make{...}
> print(h)
{a=1, d=4, c=3, b=2}
> for k,v in h:iter() do print(k,v) end
a       1
d       4
c       3
b       2
> -- iterate over the entries in sorted order
> for k,v in h:orderedIter() do print(k,v) end
a       1
b       2
c       3
d       4

SkipMap is an ordered map implementation:

> SkipMap = sano.SkipMap
> s = SkipMap:make(h); print(s)
{a=1, b=2, c=3, d=4}
> -- iterate over a sorted range of entries
> for k,v in s:iter('b','c') do print(k,v) end
b       2
c       3
> -- remove a sorted range of entries
> cut = s:remove('b','c'); print(cut)
{b=2, c=3} 
> print(s)
{a=1, d=4}
> -- merge the two maps back together
> s:merge(cut); print(s)
{a=1, b=2, c=3, d=4}
> -- create another SkipMap with the reverse ordering
> srev = SkipMap:new(sano.ordering.DECREASING)
> print(srev:addMappings(s))
{d=4, c=3, b=2, a=1}

Here are some example uses of different set implementations:

> HashSet, SkipSet = sano.HashSet, sano.SkipSet
> s1 = HashSet:make(iter.count(10))
> s2 = set(iter.count(10))
> print(s1 == s2)
true
> print(set(s1))
{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
> print(s1 - set(iter.count(5,10)))
{1, 2, 3, 4}
> so1 = SkipSet:make("ace"); so2 = oset("ace")
> print(so1 == so2)
true
> so3 = oset("bd"); print(so1, so3)
{"a", "c", "e"} {"b", "d"}
> -- efficiently merge two ordered sets:
> so1:merge(so3); print(so1)
{"a", "b", "c", "d", "e"}
> print(vector(so1:iter('b','d')))
["b", "c", "d"]

Multiset is a collection in which duplicate elements are not explicitly stored:

> m = Multiset:make(iter.repeatCall(100, math.random, 1, 5)); print(m)
{1^27, 2^14, 3^22, 4^17, 5^20}
> m2 = mset(m); print(m == m2)
true
> m3 = mset(iter.repeatCall(100, math.random, 1, 5)); print(m3)
{1^18, 2^28, 3^24, 4^14, 5^16}
> -- computing set operations is efficient
> m:addAll(m3); print(m)
{1^45, 2^42, 3^46, 4^31, 5^36}

Multimap is handy for working with one-to-many mappings. When Multimap:add(key,val) is called for some key which already exists in the map, val is added to a collection stored against key.

> Multimap = sano.Multimap
> m = Multimap:new()
> m:add('a',1); m:add('a',2); m:add('b',3); print(m)
{a=[1, 2], b=[3]}
> m:addValues('c',{1,2,3,4,5}); print(m)
{a=[1, 2], c=[1, 2, 3, 4, 5], b=[3]}

When a one-to-many mapping is built up incrementally, Multimap is often more convenient than inserting checks such as:

local vals = mmap:get(k)
if vals then 
    vals:add(val)
else 
    vals = Vector:new()
    vals:add(val) 
end
mmap:add(k, vals)

PairingHeap is a basic multi-pass pairing-heap implementation:

> h = PairingHeap:make(iter.repeatCall(10, math.random, 1, 5))
> print(h)
/1, 1, 4, 5, 4, 1, 2, 2, 2, 3\
> v = vector(h:extractFirstK(6)) -- partial sort
> print(v)
[1, 1, 1, 2, 2, 2]
> print(h)
/3, 4, 4, 5\
> h:merge(heap(iter.count(10))); print(h)
/1, 3, 10, 9, 8, 7, 6, 5, 4, 3, 2, 4, 4, 5\
> print(h:get())        
1
> print(h:remove(), h)
1       /2, 4, 3, 3, 8, 10, 4, 4, 5, 9, 6, 5, 7\
> print(vector(h:extractFirstK())) -- full heapsort
[2, 3, 3, 4, 4, 4, 5, 5, 6, 7, 8, 9, 10]

The sano.ordering module contains utility functions for working with ordered data. Here are just a few example uses:

> v = list(iter.count(10)):shuffle()
> print(v)
[5, 4, 10, 3, 7, 8, 6, 2, 9, 1]
> print(ordering.first(v))       
1
> print(ordering.last(v))
10
> print(ordering.first(v, ordering.DECREASING))
10
> kth = ordering.kth(v, 6); print(kth) -- runs in linear time
6
> vectors = vector(vector(1,4,2), vector(1,1), vector{1}, vector(1,2,3,4))
> print(vectors)
[[1, 4, 2], [1, 1], [1], [1, 2, 3, 4]]
> -- order these vectors by size
> vectors:sort(ordering.byTransform(Vector.size))
> print(vectors)
[[1], [1, 1], [1, 4, 2], [1, 2, 3, 4]]

2. API Documentation

Online API documentation is available here. The same documentation can also be accessed from Lua: to get help on any object, function, or module, sano.summary(obj) prints a brief one sentance description of obj, and, if obj is an object or module, a brief one sentance description of each of the methods of the object or functions of the module. sano.doc(obj) prints full documentation for obj.

For example, sano.summary(sano.PairingHeap) prints:

PairingHeap - Heap implementation with constant-time merge and get-first
  operations.

PairingHeap:new(orderFn) - Constructs a new PairingHeap using the
  supplied ordering function (default is natural increasing order).

collections.make(self, ...) - Creates and returns a new instance of self
  containing the elements passed in as vararg params.

PairingHeap:__eq(otherheap) - Returns true if both heaps have the same
  size and both self:iter() and otherheap:iter() return the same
  sequence of values.

PairingHeap:add(val) - Adds val to this PairingHeap.

PairingHeap:addNode(node) - Adds this node to the heap and returns it.

PairingHeap:get() - Returns the first/topmost element in the heap.

PairingHeap:get() - Returns the first/topmost element in the heap.

PairingHeap:iter(node) - Returns an iterator over the values in the
  heap, starting from node, which defaults to the root.

PairingHeap:merge(pheap) - Merges another PairingHeap, pheap, in-place
  into this heap.

PairingHeap:nodeIter(node) - Returns an iterator over the NODES in the 
  heap, starting from node (default is root).

PairingHeap:remove(node) - Removes the node from this PairingHeap and
  returns the value that was stored at that node.

PairingHeap:set(node, newVal) - Adjusts the value stored at node to be
  newVal and reestablishes the heap property.

PairingHeap:size() - Returns the number of elements in the PairingHeap.

PairingHeap:test() - Unit test.

PairingHeap:update(node) - Removes and reinserts node.

heaps.toString(heap) - Returns a string representation of heap.

collections.addAll(collection, iterable) - Calls collection:add(e) for
  each e in iter(iterable).

heaps.extractFirstK(heap [, k]) - Returns an iterator which 
  destructively removes and returns the first k elements from this 
  heap.

Likewise, calling sano.doc(PairingHeap.merge) prints:

PairingHeap:merge(pheap) - Merges another PairingHeap, pheap, in-place
  into this heap.

This heap is modified and pheap is invalidated as a result of this call.

returns: this PairingHeap (to allow chaining)
time complexity: O(1)

Note that a lot of modules "mix-in" methods from other modules so it's not uncommon to see, for example, collections.addAll in the summary for PairingHeap or SkipSet.

3. Download

The latest release can be downloaded from the Luaforge project page.

4. Repository access

The development version can be accessed via the Darcs repository:

darcs get http://www.cs.brandeis.edu/~pchiusano/repos/sano

5. Getting Help

I'll be monitoring the Luaforge forum (all messages get routed to my inbox), so if you have questions, comments, suggestions, or want help, you can post there and I'll respond as soon as I can. Of course, you can also email me directly.

6. Roadmap

Where is this all going? Well, I'm not sure. It really depends on what other people are interested in seeing. If you like the framework and have some idea for where you think it should go, or some addition you'd like to see, or if you'd like to contribute some new data structure or algorithm, post to the forum or email me.

MakeDoc2 by REBOL - 28-Feb-2006