 // Licensed to the Apache Software Foundation (ASF) under one or more
 // contributor license agreements.  See the NOTICE file distributed with
 // this work for additional information regarding copyright ownership.
 // The ASF licenses this file to You under the Apache License, Version 2.0
 // (the "License"); you may not use this file except in compliance with
 // the License.  You may obtain a copy of the License at
 //
 // http://www.apache.org/licenses/LICENSE-2.0
 //
 // Unless required by applicable law or agreed to in writing, software
 // distributed under the License is distributed on an "AS IS" BASIS,
 // WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 // See the License for the specific language governing permissions and
 // limitations under the License.
 = Bagging

 Bagging stands for bootstrap aggregation. One way to reduce the variance of an estimate is to average multiple estimates. For example, we can train M different trees on different subsets of the data (sampled randomly with replacement) and combine their predictions into an ensemble:

 image::images/bagging.png[]

 Bagging uses bootstrap sampling to obtain the data subsets for training the base learners. For aggregating the outputs of base learners, bagging uses voting for classification and averaging for regression.
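
The two steps described above can be sketched in plain Java, independent of Ignite. This is a minimal illustration under stated assumptions, not the library's implementation: the class `BaggingSketch` and its methods `bootstrapSample`, `majorityVote`, and `average` are hypothetical names introduced here only for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical helper class, for illustration only (not part of Apache Ignite).
public class BaggingSketch {
    /** Draws sampleSize items from data uniformly at random, with replacement. */
    public static <T> List<T> bootstrapSample(List<T> data, int sampleSize, Random rnd) {
        List<T> sample = new ArrayList<>(sampleSize);
        for (int i = 0; i < sampleSize; i++)
            sample.add(data.get(rnd.nextInt(data.size())));
        return sample;
    }

    /**
     * Classification: returns the label predicted by the most base learners.
     * On a tie, the label that reaches the winning count first is returned.
     */
    public static double majorityVote(double[] predictions) {
        Map<Double, Integer> counts = new HashMap<>();
        double best = predictions[0];
        int bestCnt = 0;
        for (double p : predictions) {
            int cnt = counts.merge(p, 1, Integer::sum);
            if (cnt > bestCnt) {
                bestCnt = cnt;
                best = p;
            }
        }
        return best;
    }

    /** Regression: returns the mean of the base learners' predictions. */
    public static double average(double[] predictions) {
        double sum = 0;
        for (double p : predictions)
            sum += p;
        return sum / predictions.length;
    }

    public static void main(String[] args) {
        List<Integer> rowIds = List.of(0, 1, 2, 3, 4, 5, 6, 7, 8, 9);
        Random rnd = new Random(1);

        // Each base learner would be trained on its own bootstrap sample.
        // Some rows may repeat and others may be left out.
        System.out.println(bootstrapSample(rowIds, 6, rnd));

        // Aggregating five base-learner outputs.
        System.out.println(majorityVote(new double[] {1, 0, 1, 1, 0})); // 1.0
        System.out.println(average(new double[] {2.0, 4.0, 6.0}));      // 4.0
    }
}
```

Because sampling is done with replacement, each base learner sees a slightly different view of the data, which is what makes the aggregated prediction less sensitive to any single training subset.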


 [source, java]
 ----
 // Define the weak classifier: a decision tree with max depth 5 and min impurity decrease 0.
 DecisionTreeClassificationTrainer trainer = new DecisionTreeClassificationTrainer(5, 0);

 // Set up the bagging process.
 BaggedTrainer<Double> baggedTrainer = TrainerTransformers.makeBagged(
   trainer, // Base trainer to bag
   10,      // Size of the ensemble
   0.6,     // Subsample size as a fraction of the whole dataset
   4,       // Feature vector dimensionality
   3,       // Feature subspace dimensionality
   new OnMajorityPredictionsAggregator())
   .withEnvironmentBuilder(LearningEnvironmentBuilder
     .defaultBuilder()
     .withRNGSeed(1) // Fixed seed for reproducible sampling
   );

 // Train the Bagged Model.
 BaggedModel mdl = baggedTrainer.fit(
   ignite,
   dataCache,
   vectorizer
 );
 ----
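
The "feature vector dimensionality" (4) and "feature subspace dimensionality" (3) arguments above suggest that each ensemble member works with a subset of the features. The following plain-Java sketch illustrates the general random subspace idea under that assumption. It is not Ignite's internal implementation, and the class `FeatureSubspaceSketch` with its methods `randomSubspace` and `project` is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical helper class, for illustration only (not part of Apache Ignite).
public class FeatureSubspaceSketch {
    /** Picks subspaceDim distinct feature indices from [0, featureDim), sorted ascending. */
    public static int[] randomSubspace(int featureDim, int subspaceDim, Random rnd) {
        if (subspaceDim < 0 || subspaceDim > featureDim)
            throw new IllegalArgumentException("subspaceDim must be in [0, featureDim]");

        List<Integer> indices = new ArrayList<>(featureDim);
        for (int i = 0; i < featureDim; i++)
            indices.add(i);

        // Shuffle all indices and keep the first subspaceDim of them.
        Collections.shuffle(indices, rnd);

        int[] chosen = new int[subspaceDim];
        for (int i = 0; i < subspaceDim; i++)
            chosen[i] = indices.get(i);

        Arrays.sort(chosen);
        return chosen;
    }

    /** Projects a full feature vector onto the chosen feature indices. */
    public static double[] project(double[] features, int[] subspace) {
        double[] projected = new double[subspace.length];
        for (int i = 0; i < subspace.length; i++)
            projected[i] = features[subspace[i]];
        return projected;
    }

    public static void main(String[] args) {
        Random rnd = new Random(1);
        double[] row = {10.0, 20.0, 30.0, 40.0};

        // One subspace per ensemble member: 3 features out of 4.
        for (int member = 0; member < 3; member++) {
            int[] subspace = randomSubspace(4, 3, rnd);
            System.out.println(Arrays.toString(subspace) + " -> "
                + Arrays.toString(project(row, subspace)));
        }
    }
}
```

Combining row subsampling with feature subspaces decorrelates the base learners further, which is the same idea that random forests build on.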


 TIP: Forests of randomized trees are a commonly used class of ensemble algorithms.

 == Example

 The full example is available as part of the Titanic tutorial https://github.com/apache/ignite/blob/master/examples/src/main/java/org/apache/ignite/examples/ml/tutorial/Step_10_Bagging.java[here].