In-memory MapReduce implementation of Random Decision Forests
Each mapper is responsible for growing a number of trees, with a complete copy of the dataset loaded in memory. It uses the reference implementation's code to build each tree and to estimate the out-of-bag (oob) error.
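The sketch below illustrates the per-tree work each mapper performs, under stated assumptions: Bagging is a hypothetical helper (not Mahout's actual code), and the reference implementation's tree builder is treated as an opaque function.

    import java.util.ArrayList;
    import java.util.BitSet;
    import java.util.List;
    import java.util.Random;
    import java.util.function.Function;

    // Illustrative sketch (not Mahout's actual code) of the per-tree work done
    // inside a mapper: draw a bootstrap sample, build a tree on it with the
    // reference builder, and record which instances ended up out-of-bag.
    final class Bagging {
        static <T, M> M growTree(List<T> data, Random rng,
                                 Function<List<T>, M> buildTree, BitSet oob) {
            int n = data.size();
            List<T> bag = new ArrayList<>(n);
            boolean[] inBag = new boolean[n];
            for (int i = 0; i < n; i++) {      // sample n instances with replacement
                int k = rng.nextInt(n);
                inBag[k] = true;
                bag.add(data.get(k));
            }
            oob.clear();
            for (int i = 0; i < n; i++) {      // rows never drawn are out-of-bag
                if (!inBag[i]) {
                    oob.set(i);
                }
            }
            return buildTree.apply(bag);       // delegate the actual tree growing
        }
    }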
The dataset is distributed to the slave nodes using the DistributedCache. A custom InputFormat (InMemInputFormat) is configured with the desired number of trees and generates as many InputSplits (InMemInputSplit) as the configured number of map tasks (mapred.map.tasks).
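A hypothetical getSplits() along these lines shows how the trees could be divided among the map tasks; TreeSplit is an illustrative stand-in for InMemInputSplit, and its fields are assumptions, not Mahout's actual layout. Note that a split only needs a tree range and a seed, since every node already holds the full dataset from the DistributedCache.

    // Hypothetical stand-in for InMemInputSplit: each split carries only a
    // range of tree ids and an RNG seed, because every mapper already has
    // the whole dataset in memory. Field names are illustrative.
    final class TreeSplit {
        final int firstTreeId;  // id of the first tree this map task grows
        final int numTrees;     // how many trees this map task grows
        final long seed;        // per-split seed, keeps runs reproducible

        TreeSplit(int firstTreeId, int numTrees, long seed) {
            this.firstTreeId = firstTreeId;
            this.numTrees = numTrees;
            this.seed = seed;
        }

        // Divide totalTrees as evenly as possible over numMaps splits.
        static TreeSplit[] getSplits(int totalTrees, int numMaps, long baseSeed) {
            TreeSplit[] splits = new TreeSplit[numMaps];
            int perMap = totalTrees / numMaps;
            int extra = totalTrees % numMaps;  // first 'extra' maps get one more tree
            int next = 0;
            for (int m = 0; m < numMaps; m++) {
                int count = perMap + (m < extra ? 1 : 0);
                splits[m] = new TreeSplit(next, count, baseSeed + m);
                next += count;
            }
            return splits;
        }
    }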
There is no need for reducers: each map outputs (as MapredOutput) the trees it built and, for each tree, the labels it predicted for its out-of-bag instances. This step has to be done in the mapper because only the mapper knows which instances are out-of-bag.
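Putting the sketches above together, the map-side loop might look like the following; Instance, Tree and Emitter are illustrative stand-ins for the dataset rows, the grown trees and the MapredOutput plumbing, not Mahout's real API. The point is that the oob predictions are computed here, while the mapper still knows which rows each tree left out.

    import java.util.Arrays;
    import java.util.BitSet;
    import java.util.List;
    import java.util.Random;
    import java.util.function.Function;

    // Illustrative map-side loop (Hadoop plumbing omitted), reusing the
    // Bagging and TreeSplit sketches above. Emitting the oob predictions
    // next to each tree is what makes reducers unnecessary.
    interface Instance { }
    interface Tree { int classify(Instance x); }
    interface Emitter { void emit(int treeId, Tree tree, int[] oobIds, int[] oobPreds); }

    final class MapSideLoop {
        static void run(List<Instance> data, TreeSplit split,
                        Function<List<Instance>, Tree> buildTree, Emitter out) {
            Random rng = new Random(split.seed);
            for (int t = 0; t < split.numTrees; t++) {
                BitSet oob = new BitSet(data.size());
                Tree tree = Bagging.growTree(data, rng, buildTree, oob);
                int[] oobIds = oob.stream().toArray();       // indices of oob rows
                int[] preds = Arrays.stream(oobIds)          // classify each of them
                                    .map(i -> tree.classify(data.get(i)))
                                    .toArray();
                out.emit(split.firstTreeId + t, tree, oobIds, preds);
            }
        }
    }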
The forest builder (InMemBuilder) is responsible for configuring and launching the job. When the job completes, it parses the output files and builds the corresponding DecisionForest; for each tree prediction it also invokes, if one was supplied, a PredictionCallback that lets the caller compute whatever error estimate it needs.
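The post-job step could look roughly like this sketch. The PredictionCallback signature here is an assumption (Mahout's actual interface may differ), and TreeRecord is a hypothetical holder for one parsed map-output entry.

    import java.util.ArrayList;
    import java.util.List;

    // Illustrative post-job step: gather every tree the mappers emitted into
    // one forest, and replay each tree's oob predictions through an optional
    // callback so the caller can compute whatever error estimate it wants.
    interface PredictionCallback {
        void prediction(int treeId, int instanceId, int prediction);
    }

    final class ForestAssembler {
        // Hypothetical parsed map-output entry: a tree plus its oob predictions.
        record TreeRecord(int treeId, Tree tree, int[] oobIds, int[] oobPreds) { }

        static List<Tree> assemble(List<TreeRecord> records, PredictionCallback callback) {
            List<Tree> forest = new ArrayList<>(records.size());
            for (TreeRecord r : records) {
                forest.add(r.tree());
                if (callback != null) {                    // the callback is optional
                    for (int i = 0; i < r.oobIds().length; i++) {
                        callback.prediction(r.treeId(), r.oobIds()[i], r.oobPreds()[i]);
                    }
                }
            }
            return forest;
        }
    }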