Multiset

Overview

Collection in which duplicate elements are not explicitly stored.

A Multiset is implemented as a map in which the keys are the elements of the Multiset and the value stored against key is a number denoting the quantity (or multiplicity) of that element. When an element is added to a multiset, its quantity is incremented by 1.

Multisets are handy for efficiently processing collections of objects which are expected to contain a large number of duplicates. For instance, to sort a list of one million integers in the range 1..5:

> m = Multiset:new(SkipMap) 
> m:addAll(iter.repeatCall(1e6, math.random, 1, 5))
> print(m)
{1^199981, 2^199613, 3^200520, 4^200379, 5^199507}
> sorted = vector(m:iter())

The constructor takes an optional map implementation, which defaults to SkipMap (thus it wasn't neccessary to supply SkipMap to the constructor above).

Computing the sum of the numbers in a Multiset is also efficient:

> sum = 0
>  for e, quantity in m:uniqueIter() do
>>   sum = sum + (e*quantity)
>> end

Other operations, such as union and intersection, can be computed more efficiently by Multisets by taking advantage of implicit storage of duplicates.

Summary

newReturns a new Multiset, using map as the backing map, or Multiset.cDefaultMapType (initially SkipMap) if map is unspecified.
makeCreates and returns a new instance of self containing the elements passed in as vararg params.
__eqReturns true if both Multisets have the same unique set of elements and the same quantities for each of these elements.
__tostringReturns a string representation of this Multiset.
addAdds quantity copies of element to this Multiset, default 1.
addAllAdds all elements in collection to this Multiset.
containsReturns whether there are > 0 copies of element in this Multiset.
getQuantityRetrieves the number of copies of element in this Multiset, or 0 if the element does not exist.
intersectionCountEfficiently computes the number of overlapping elements between this Multiset and otherMultiset.
iterReturns an 'uncompressed' iterator over the elements in this Multiset, taking quanitity into account.
removeRemove quantity copies of element from this Multiset, default 1.
removeAllRemoves all elements in iterable from this Multiset, respecting quantity.
removeAllEntirelyRemove all copies of elements in iterable that exist in this Multiset.
retainOnlyModifies this Multiset so that it is the intersection (respecting quantity) of itself and iterable.
setQuantitySets the number of copies of element that are stored in this Multiset and returns the old quantity.
sizeReturns the total number of elements in this Multiset.
testUnit test.
uniqueIterReturns an iterator over the unique set of elements in this Multiset.
uniqueSizeReturns the total number of unique elements in this Multiset.
xorCountReturns the total number of unshared elements between this Multiset and otherMultiset.
unionReturns the union of set1 and set2 as a new set.
hashReturns the sum of utils.hash(i) for i in set.
differenceReturns the set that would be created by removing from set1 all elements in intersection(set1,set2).
cDefaultMapTypeThe default backing map for this Multiset, initially SkipMap.
containsAllReturns true iff collection:contains(e) for e in iter(iterable).
containsAnyReturns true if for collection:contains(e) for some e in iter(iterable).
differenceReturns the set that would be created by removing from set1 all elements in intersection(set1,set2).
enumDecorates another iteration to return the index of each element returned from the decorated iteration.
intersectionComputes and returns the intersection of set1 and set2.
differenceReturns the set that would be created by removing from set1 all elements in intersection(set1,set2).
removeElementEquivalent to set:remove(element).
unionReturns the union of set1 and set2 as a new set.
uniqueFilterCreate and return a uniqueness filter based on set.
xorReturns the set of elements which appear in either set1 or set2, but not in both.

Detail

Multiset:new([map])

Returns a new Multiset, using map as the backing map, or Multiset.cDefaultMapType (initially SkipMap) if map is unspecified.

There are not separate 'classes' for each different backing map type; instead, the newly returned multiset can be used to create other multisets with the same default backing map, for instance:

> HashMultiset = Multiset:new(HashMap)
> multi = HashMultiset:make(1,2,2,3,4)

collections.make(self, ...)

Creates and returns a new instance of self containing the elements passed in as vararg params.

If #args is 1, the argument is assumed to be iterable and its elements are added to the Collection.

Example, assuming Collection is a sequence type:

> print( Collection:make(1,2,3,4) )
[1, 2, 3, 4]
> print( Collection:make(iter.count(4) )
[1, 2, 3, 4]
> v = Collection:make(1,2,3,4); print( Collection:make(v) )
[1, 2, 3, 4]

This method is used as the make method for LinkedList, SkipVector, HashSet, SkipSet, QueueVector, and PairingHeap. It assumes that self has new() and addAll() methods.

Multiset:__eq(other)

Returns true if both Multisets have the same unique set of elements and the same quantities for each of these elements.

Multiset:__tostring()

Returns a string representation of this Multiset.

{a^3, b^1, e^12} signifies a multiset with 3 copies of element a, 1 copy of element b, and 12 copies of element e.

Multiset:add(element [,quantity])

Adds quantity copies of element to this Multiset, default 1.

param element: the element to add param quantity: the number of copies to add, defaults to 1 returns: the updated total number of copies of element in this multiset

Multiset:addAll(iterable)

Adds all elements in collection to this Multiset.

If iterable is also a Multiset, this takes time O(u), where u is the number of unique elements in iterable. Otherwise, this takes O(n), where n is the total number of elements in iterable.

Multiset:contains(element)

Returns whether there are > 0 copies of element in this Multiset.

Multiset:getQuantity(element)

Retrieves the number of copies of element in this Multiset, or 0 if the element does not exist.

Multiset:intersectionCount(otherMultiset)

Efficiently computes the number of overlapping elements between this Multiset and otherMultiset.

Multiset:iter([start, stop])

Returns an 'uncompressed' iterator over the elements in this Multiset, taking quanitity into account.

The optional start and stop parameters are passed to the backing map.

Multiset:remove(element, quantity)

Remove quantity copies of element from this Multiset, default 1.

param element: the element to remove param quantity: the number of copies to remove, defaults to 1 returns: element and the actual number of copies that were removed example: {a^3, b^6}:remove(b, 32) will return b, 6

Multiset:removeAll(iterable)

Removes all elements in iterable from this Multiset, respecting quantity.

Thus, if self={a^4, b^5, e^11} and iterable={a^2, b^3}, after a call to self:removeAll(iterable), self would be {a^2, b^2, e^11}. To remove all elements entirely, ignoring multiplicity, use removeAllEntirely().

If collection is also a Multiset, this runs in O(u), where u is the number of unique elements in iterable. Otherwise, this takes time O(n), where n is the total number of elements in iterable.

Multiset:removeAllEntirely(iterable)

Remove all copies of elements in iterable that exist in this Multiset.

Multiset:retainOnly(iterable)

Modifies this Multiset so that it is the intersection (respecting quantity) of itself and iterable. If iterable is also a Multiset, this takes time O(u*c), where u is the number of unique items in this Multiset and c is the time for a lookup in the backing map. If not, then it takes O(n + u*c), where n is the number of elements in iterable.

Multiset:setQuantity(element, quantity)

Sets the number of copies of element that are stored in this Multiset and returns the old quantity.

Multiset:size()

Returns the total number of elements in this Multiset.

This is also the total number of elements that would be returned by a call to iter().

Example: {a^3, b^7, c^8} has size 3+7+8=18 and uniqueSize=3

Multiset:test()

Unit test.

Multiset:uniqueIter([start, stop])

Returns an iterator over the unique set of elements in this Multiset.

The iteration order is determined by the iteration order of the map used to implement this Multiset. Each call to the iterator return a pair e,quantity, where e is a unique element in this Multiset, and quantity is the number of copies of e in the Multiset.

param start, stop: these are passed as params to the iterator of the underlying map

Multiset:uniqueSize()

Returns the total number of unique elements in this Multiset. This is also the total number of elements that would be returned by a call to uniqueIter().

Example: {a^3, b^7, c^8} has uniqueSize of 3 (and size of 3+7+8=18)

Multiset:xorCount(other)

Returns the total number of unshared elements between this Multiset and otherMultiset.

sets.union(set1, set2 [, setType])

Returns the union of set1 and set2 as a new set.

The returned set is created by a call to:

(setType or getmetatable(set1)):make(...)

synonyms: __add

sets.hash(set)

Returns the sum of utils.hash(i) for i in set.

This is the hash function used by all set implementations.

sets.difference(set1, set2 [,setType])

Returns the set that would be created by removing from set1 all elements in intersection(set1,set2).

The returned set is created by a call to:

(setType or getmetatable(set1)):new()

synonyms: __sub, minus

Multiset.cDefaultMapType

The default backing map for this Multiset, initially SkipMap.

collections.containsAll(collection, elements)

Returns true iff collection:contains(e) for e in iter(iterable).

collections.containsAny(collection, iterable)

Returns true if for collection:contains(e) for some e in iter(iterable).

sets.difference(set1, set2 [,setType])

Returns the set that would be created by removing from set1 all elements in intersection(set1,set2).

The returned set is created by a call to:

(setType or getmetatable(set1)):new()

synonyms: __sub, minus

iter.enum(iterable [, size, stop, step])

Decorates another iteration to return the index of each element returned from the decorated iteration.

If iterable is an object with a size() method, or if size is explicitly specified, the iteration will halt after iterable:size() steps. Otherwise, the iteration halts as soon as iter(iterable) halts (this may be undesireable if the iteration is expected to contain nil values)

If 2 args are given the indices will range from 1 to arg[2]. If 3 args are given the second and third arguments are interpreted as a counting range for the enumeration; If a 4th argument is specified, args 2 and 3 specify the counting range, and arg 4 specifies the step increment.

Examples:

> for ind,v in vector("abcd"):enum() do print(ind,v) end
1       a
2       b
3       c
4       d
> vec = vector("abc"); vec:add(nil)
> for ind,v in vec:enum() do print(ind, v) end
1       a
2       b
3       c
4       nil

sets.intersection(set1, set2 [, setType])

Computes and returns the intersection of set1 and set2.

The returned set is created by a call to:

(setType or getmetatable(set1)):new()

sets.difference(set1, set2 [,setType])

Returns the set that would be created by removing from set1 all elements in intersection(set1,set2).

The returned set is created by a call to:

(setType or getmetatable(set1)):new()

synonyms: __sub, minus

sets.removeElement(set, element)

Equivalent to set:remove(element).

In a sequence, the remove() method takes as input an index while the removeElement method takes as input the element to be removed. This method has the same semantics across both set and sequence implementations.

sets.union(set1, set2 [, setType])

Returns the union of set1 and set2 as a new set.

The returned set is created by a call to:

(setType or getmetatable(set1)):make(...)

synonyms: __add

sets.uniqueFilter(set)

Create and return a uniqueness filter based on set.

The filter is a function which returns true only for objects it has not yet seen. This can be used in conjuction with iter.filter. For example:

> v = vector(1,1,1,2,3,3,4,1,5,5)
> print(vector(iter.filter(v, HashSet:uniqueFilter())))
[1, 2, 3, 4, 5]

sets.xor(set1, set2 [,setType])

Returns the set of elements which appear in either set1 or set2, but not in both.

This operation is also often called the symmetric set difference.