[SPARK-33111][ML] AFT transform optimization
### What changes were proposed in this pull request?
1. When `predictionCol` and `quantilesCol` are both set, each row needs only one prediction: the prediction is exactly the variable `lambda` in `predictQuantiles`, so the quantiles can be derived from the prediction column instead of recomputing it from the features.
2. In `predictQuantiles`, the computation of `quantiles` contains a factor that does not depend on the row, so the vector `val baseQuantiles = $(quantileProbabilities).map(q => math.exp(math.log(-math.log1p(-q)) * scale))` can be computed once and reused for every row.
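
Both points follow from the closed form of the quantile. As a sketch, write $\beta$ for `coefficients`, $b$ for `intercept`, $\sigma$ for `scale`, and $q$ for one entry of `quantileProbabilities` (the symbols are notation introduced here, not names from the code), and assume `predict` returns $\exp(\beta^\top x + b)$:

```latex
\begin{aligned}
\lambda(x) &= \exp\left(\beta^\top x + b\right) \\
Q_q(x) &= \lambda(x)\,\exp\bigl(\sigma \log(-\log(1-q))\bigr)
        = \lambda(x)\,\bigl(-\log(1-q)\bigr)^{\sigma}
\end{aligned}
```

Only $\lambda(x)$ depends on the row. The second factor depends on $q$ and $\sigma$ alone, which is what `baseQuantiles` holds, so each quantile is `baseQuantiles(i) * lambda`.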
### Why are the changes needed?
To avoid redundant computation in `transform`, as was already done for `ProbabilisticClassificationModel`, `GaussianMixtureModel`, etc.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Existing test suites.
Closes #30000 from zhengruifeng/aft_predict_transform_opt.
Authored-by: zhengruifeng <ruifengz@foxmail.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
diff --git a/mllib/src/main/scala/org/apache/spark/ml/regression/AFTSurvivalRegression.scala b/mllib/src/main/scala/org/apache/spark/ml/regression/AFTSurvivalRegression.scala
index f301c34..595a2f0 100644
--- a/mllib/src/main/scala/org/apache/spark/ml/regression/AFTSurvivalRegression.scala
+++ b/mllib/src/main/scala/org/apache/spark/ml/regression/AFTSurvivalRegression.scala
@@ -421,9 +421,17 @@
}
if (hasQuantilesCol) {
- val predictQuantilesUDF = udf { features: Vector => predictQuantiles(features)}
+ val baseQuantiles = $(quantileProbabilities)
+ .map(q => math.exp(math.log(-math.log1p(-q)) * scale))
+ val lambdaCol = if ($(predictionCol).nonEmpty) {
+ predictionColumns.head
+ } else {
+ udf { features: Vector => predict(features) }.apply(col($(featuresCol)))
+ }
+ val predictQuantilesUDF =
+ udf { lambda: Double => Vectors.dense(baseQuantiles.map(q => q * lambda)) }
predictionColNames :+= $(quantilesCol)
- predictionColumns :+= predictQuantilesUDF(col($(featuresCol)))
+ predictionColumns :+= predictQuantilesUDF(lambdaCol)
.as($(quantilesCol), outputSchema($(quantilesCol)).metadata)
}