SZaru: A port of the excellent Sawzall aggregators.

0.1.0

Overview

SZaru is a library for using Sawzall aggregators from pure C++, Ruby, and Python. Currently, I have implemented the following three aggregators:

Top

Statistical sampling that records the 'top N' data items, based on the CountSketch algorithm from "Finding Frequent Items in Data Streams", Moses Charikar, Kevin Chen and Martin Farach-Colton, 2002.
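
To make the idea behind CountSketch concrete, here is a minimal standalone sketch of the data structure. It is not SZaru's implementation: the class name CountSketch, the depth and width parameters, and the salted std::hash functions are all assumptions chosen for illustration. Each row hashes a key to one counter and to a sign of +1 or -1, and the estimate is the median of the signed counters across rows.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Hypothetical minimal Count Sketch: `depth` rows of `width` signed counters.
// Illustration only; not the data structure used inside SZaru.
class CountSketch {
public:
    CountSketch(std::size_t depth, std::size_t width)
        : depth_(depth), width_(width), table_(depth * width, 0) {
        assert(depth > 0 && width > 0);
    }

    // In every row, add sign * weight to the counter the key hashes to.
    void add(const std::string& key, std::int64_t weight) {
        for (std::size_t row = 0; row < depth_; ++row) {
            table_[row * width_ + bucket(key, row)] += sign(key, row) * weight;
        }
    }

    // Estimated total weight of `key`: median of the per-row estimates.
    std::int64_t estimate(const std::string& key) const {
        std::vector<std::int64_t> rows(depth_);
        for (std::size_t row = 0; row < depth_; ++row) {
            rows[row] = sign(key, row) * table_[row * width_ + bucket(key, row)];
        }
        std::nth_element(rows.begin(), rows.begin() + depth_ / 2, rows.end());
        return rows[depth_ / 2];
    }

private:
    // Assumed hashing scheme: salt the key with a tag and the row index.
    static std::size_t mix(const std::string& key, std::size_t row, char tag) {
        return std::hash<std::string>()(key + tag + std::to_string(row));
    }
    std::size_t bucket(const std::string& key, std::size_t row) const {
        return mix(key, row, 'b') % width_;
    }
    std::int64_t sign(const std::string& key, std::size_t row) const {
        return (mix(key, row, 's') & 1u) ? 1 : -1;
    }

    std::size_t depth_;
    std::size_t width_;
    std::vector<std::int64_t> table_;
};
```

Calling add("def", 2) and then add("def", 4) on an otherwise empty sketch makes estimate("def") return 6, because no other key can collide with it. A top-N aggregator would typically pair such a sketch with a small set of candidate keys, keeping the N keys with the largest estimates.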

Unique

Statistical estimators for the total number of unique data items.
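
One well-known way to estimate the number of unique items in bounded memory is to keep only the k smallest hash values seen so far. The sketch below illustrates that idea; it is an assumption-laden example, not a description of SZaru's internals. The names KmvCounter and fnv1a are hypothetical, and the FNV-1a hash is used only so that results do not depend on the platform's std::hash.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <set>
#include <string>

// 64-bit FNV-1a hash (illustrative choice of hash function).
inline std::uint64_t fnv1a(const std::string& s) {
    std::uint64_t h = 0xcbf29ce484222325ULL;  // FNV offset basis
    for (unsigned char c : s) {
        h ^= c;
        h *= 0x100000001b3ULL;                // FNV prime
    }
    return h;
}

// Hypothetical "k minimum values" counter: stores at most k hash values.
class KmvCounter {
public:
    explicit KmvCounter(std::size_t k) : k_(k) { assert(k >= 2); }

    void add(const std::string& item) {
        smallest_.insert(fnv1a(item));        // duplicates hash identically
        if (smallest_.size() > k_) {
            smallest_.erase(std::prev(smallest_.end()));  // drop the largest
        }
    }

    // Exact while fewer than k distinct hashes were seen; otherwise
    // (k - 1) divided by the k-th smallest hash scaled into [0, 1).
    double estimate() const {
        if (smallest_.size() < k_) {
            return static_cast<double>(smallest_.size());
        }
        const double two_pow_64 = 18446744073709551616.0;
        const double kth = static_cast<double>(*smallest_.rbegin()) / two_pow_64;
        return static_cast<double>(k_ - 1) / kth;
    }

private:
    std::size_t k_;
    std::set<std::uint64_t> smallest_;
};
```

With k = 16, adding "a", "b", "a", "c" yields an estimate of exactly 3, since the counter has not yet filled up. Larger k trades memory for a tighter estimate once the number of distinct items exceeds k.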

Quantile

Approximate N-tiles for data items from an ordered domain based on the following paper: Munro & Paterson, "Selection and Sorting with Limited Storage", Theoretical Computer Science, Vol 12, pp 315-323, 1980.
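
For reference, the values that an approximate quantile aggregator tries to approximate can be computed exactly by sorting the whole input. The sketch below does just that. It is a baseline for comparison, not the limited-storage algorithm from the paper, and both the function name exact_ntiles and the rank convention (index i * (size - 1) / n) are assumptions made for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical exact baseline: returns n + 1 boundaries (minimum, the
// n - 1 interior cut points, maximum) by sorting a copy of the input.
// Unlike an approximate aggregator, this keeps every value in memory.
template <typename T>
std::vector<T> exact_ntiles(std::vector<T> values, std::size_t n) {
    assert(n > 0 && !values.empty());
    std::sort(values.begin(), values.end());
    std::vector<T> boundaries;
    boundaries.reserve(n + 1);
    for (std::size_t i = 0; i <= n; ++i) {
        // Evenly spaced ranks between the first and the last element.
        boundaries.push_back(values[i * (values.size() - 1) / n]);
    }
    return boundaries;
}
```

For the nine values 9, 1, 5, 3, 7, 2, 8, 4, 6 and n = 4, this returns 1, 3, 5, 7, 9. An approximate aggregator aims to return values close to these while storing only a small summary of the stream.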

Example

#include <stdint.h>
#include <iostream>
#include <vector>
#include <szaru.h>

using namespace std;
using namespace SZaru;

int main() {
  // Estimator for the top 3 elements.
  TopEstimator<int32_t> *topEst = TopEstimator<int32_t>::Create(3);
  topEst->AddWeightedElem("abc", 1);
  topEst->AddWeightedElem("def", 2);
  topEst->AddWeightedElem("ghi", 3);
  topEst->AddWeightedElem("def", 4);  // weights for "def" add up: 2 + 4 = 6
  topEst->AddWeightedElem("jkl", 5);

  vector< TopEstimator<int32_t>::Elem > topElems;
  topEst->Estimate(topElems);
  cout << topElems[0].value << ", " << topElems[0].weight << endl; // => def, 6
  cout << topElems[1].value << ", " << topElems[1].weight << endl; // => jkl, 5
  cout << topElems[2].value << ", " << topElems[2].weight << endl; // => ghi, 3

  delete topEst;
  return 0;
}

License

Apache License Version 2.0

Generated on Sat Nov 13 18:14:59 2010 for SZaru by  doxygen 1.6.3