| // Licensed to the Apache Software Foundation (ASF) under one or more |
| // contributor license agreements. See the NOTICE file distributed with |
| // this work for additional information regarding copyright ownership. |
| // The ASF licenses this file to You under the Apache License, Version 2.0 |
| // (the "License"); you may not use this file except in compliance with |
| // the License. You may obtain a copy of the License at |
| // |
| // http://www.apache.org/licenses/LICENSE-2.0 |
| // |
| // Unless required by applicable law or agreed to in writing, software |
| // distributed under the License is distributed on an "AS IS" BASIS, |
| // WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| // See the License for the specific language governing permissions and |
| // limitations under the License. |
| = Bagging |
| |
| Bagging stands for bootstrap aggregation. One way to reduce the variance of an estimate is to average together multiple estimates. For example, we can train M different trees on different subsets of the data (chosen randomly with replacement) and compute the ensemble: |
| |
| image::images/bagging.png[] |
| |
| Bagging uses bootstrap sampling to obtain the data subsets for training the base learners. For aggregating the outputs of base learners, bagging uses voting for classification and averaging for regression. |
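
The two steps described above, bootstrap sampling and aggregation, can be sketched in plain Java. This is only an illustration of the idea, not Ignite's implementation: the class `BaggingSketch` and its methods `bootstrapIndices`, `majorityVote`, and `average` are hypothetical names introduced here, and the sketch assumes labels and predictions are encoded as `double` values and that each prediction array is non-empty.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

/** Illustrative helpers only; hypothetical names, not part of the Apache Ignite ML API. */
final class BaggingSketch {
    /** Bootstrap sampling: draws sampleSize row indices from [0, datasetSize) with replacement. */
    static int[] bootstrapIndices(int datasetSize, int sampleSize, Random rnd) {
        int[] idx = new int[sampleSize];
        for (int i = 0; i < sampleSize; i++)
            idx[i] = rnd.nextInt(datasetSize); // The same row may be drawn more than once.
        return idx;
    }

    /**
     * Aggregation for classification: returns the label predicted by the most base learners.
     * On a tie, the label that reached the top count first wins. Assumes a non-empty array.
     */
    static double majorityVote(double[] predictions) {
        Map<Double, Integer> counts = new HashMap<>();
        double best = predictions[0];
        int bestCnt = 0;
        for (double p : predictions) {
            int cnt = counts.merge(p, 1, Integer::sum);
            if (cnt > bestCnt) {
                bestCnt = cnt;
                best = p;
            }
        }
        return best;
    }

    /** Aggregation for regression: returns the mean of the base learners' predictions. */
    static double average(double[] predictions) {
        double sum = 0.0;
        for (double p : predictions)
            sum += p;
        return sum / predictions.length;
    }
}
```

For instance, with 10 rows and a subsample ratio of 0.6, `BaggingSketch.bootstrapIndices(10, 6, new Random(1))` yields six row indices, some of which may repeat. Each base learner would be trained on its own such sample, and the per-learner predictions for a given input would then be passed to `majorityVote` or `average`.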
| |
| |
| [source, java] |
| ---- |
| // Define the base (weak) classifier: a decision tree. |
| DecisionTreeClassificationTrainer trainer = new DecisionTreeClassificationTrainer(5, 0); |
| |
| // Set up the bagging process. |
| BaggedTrainer<Double> baggedTrainer = TrainerTransformers.makeBagged( |
|     trainer, // Base trainer to wrap in the bagged ensemble |
|     10, // Size of the ensemble |
|     0.6, // Subsample size as a fraction of the whole dataset |
|     4, // Feature vector dimensionality |
|     3, // Feature subspace dimensionality |
|     new OnMajorityPredictionsAggregator()) // Aggregate predictions by majority vote |
|     .withEnvironmentBuilder(LearningEnvironmentBuilder |
|         .defaultBuilder() |
|         .withRNGSeed(1) // Fixed seed for reproducible sampling |
|     ); |
| |
| // Train the bagged model. |
| BaggedModel mdl = baggedTrainer.fit( |
|     ignite, |
|     dataCache, |
|     vectorizer |
| ); |
| ---- |
| |
| |
| TIP: Forests of randomized trees are a commonly used class of ensemble algorithms. |
| |
| == Example |
| |
| The full example can be found in the Titanic tutorial: https://github.com/apache/ignite/blob/master/examples/src/main/java/org/apache/ignite/examples/ml/tutorial/Step_10_Bagging.java[Step_10_Bagging.java]. |
| |