The Bayes example package provides helper classes for training the Naive Bayes classifier on the Twenty Newsgroups
data. See {@link org.apache.mahout.classifier.bayes.PrepareTwentyNewsgroups} for details on running the trainer and on
formatting the Twenty Newsgroups data properly for training.
The easiest way to prepare the data is to run the Ant target defined in core/build.xml:
ant extract-20news-18828
This runs the data preparation step with the following argument line:
-p ${working.dir}/20news-18828/ -o ${working.dir}/20news-18828-collapse -a ${analyzer} -c UTF-8
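To see what the argument line looks like once the Ant properties are substituted, the following minimal sketch assembles it in plain Java. The class name NewsgroupsArgs, the build method, the working directory /tmp/mahout-work, and the analyzer class name are all assumptions chosen for illustration; none of them come from build.xml, and the sketch does not call into Mahout itself.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Mahout: it only expands the two Ant
// properties (${working.dir} and ${analyzer}) into the argument list above.
public class NewsgroupsArgs {

    static List<String> build(String workingDir, String analyzer) {
        return Arrays.asList(
            "-p", workingDir + "/20news-18828/",
            "-o", workingDir + "/20news-18828-collapse",
            "-a", analyzer,
            "-c", "UTF-8");
    }

    public static void main(String[] args) {
        // Both values below are assumed; use whatever your build defines.
        List<String> argLine = build(
            "/tmp/mahout-work",
            "org.apache.lucene.analysis.standard.StandardAnalyzer");
        System.out.println(String.join(" ", argLine));
    }
}
```

The resulting list could be converted to a String[] and handed to a main method if you prefer to launch the preparation step from Java rather than from Ant; check the class's own documentation for the arguments it actually accepts.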