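Using the E2006 tfidf regression example, we explain how to evaluate the prediction model on Hive. First, compute the means of the actual and predicted values; the mean of the actual values is used in the R^2 formula further down.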
select avg(actual), avg(predicted) from e2006tfidf_pa2a_submit;
-3.8200363760415414 -3.9124877451612488
set hivevar:mean_actual=-3.8200363760415414;

select
  -- Root Mean Squared Error
  rmse(predicted, actual) as RMSE,
  -- sqrt(sum(pow(predicted - actual,2.0))/count(1)) as RMSE,
  -- Mean Squared Error
  mse(predicted, actual) as MSE,
  -- sum(pow(predicted - actual,2.0))/count(1) as MSE,
  -- Mean Absolute Error
  mae(predicted, actual) as MAE,
  -- sum(abs(predicted - actual))/count(1) as MAE,
  -- coefficient of determination (R^2)
  -- 1 - sum(pow(actual - predicted,2.0)) / sum(pow(actual - ${mean_actual},2.0)) as R2
  r2(actual, predicted) as R2 -- supported since Hivemall v0.4.1-alpha.5
from
  e2006tfidf_pa2a_submit;
0.38538660838804495 0.14852283792484033 0.2466732002711477 0.48623913673053565
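To sanity-check these definitions outside Hive, the commented-out formulas in the query can be reproduced in a few lines of Python. This is a minimal sketch, not Hivemall's implementation: the function names simply mirror the UDF names, and the toy values are made up for illustration rather than taken from `e2006tfidf_pa2a_submit`.

```python
import math

def mse(predicted, actual):
    # sum(pow(predicted - actual, 2.0)) / count(1)
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    # sqrt of the mean squared error
    return math.sqrt(mse(predicted, actual))

def mae(predicted, actual):
    # sum(abs(predicted - actual)) / count(1)
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def r2(actual, predicted):
    # 1 - sum(pow(actual - predicted, 2.0)) / sum(pow(actual - mean_actual, 2.0))
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

# Hypothetical toy data (an assumption for illustration only).
toy_actual = [1.0, 2.0, 3.0]
toy_predicted = [1.0, 2.0, 4.0]

print(rmse(toy_predicted, toy_actual))
print(mse(toy_predicted, toy_actual))
print(mae(toy_predicted, toy_actual))
print(r2(toy_actual, toy_predicted))
```

The same relationship is visible in the Hive output above: the reported RMSE is the square root of the reported MSE.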
Logarithmic Loss can be computed as follows:
WITH t as (
  select 0 as actual, 0.01 as predicted
  union all
  select 1 as actual, 0.02 as predicted
)
select
  -SUM(actual*LN(predicted)+(1-actual)*LN(1-predicted))/count(1) as logloss1,
  logloss(predicted, actual) as logloss2 -- supported since Hivemall v0.4.2-rc.1
from
  t;
1.9610366706408238 1.9610366706408238
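The explicit `logloss1` expression can be cross-checked by hand. The following Python sketch applies the same formula to the two rows of the `t` table from the query; the function name `logloss` only mirrors the UDF name and is not Hivemall code.

```python
import math

def logloss(predicted, actual):
    # -SUM(actual*LN(predicted) + (1-actual)*LN(1-predicted)) / count(1)
    total = sum(
        a * math.log(p) + (1 - a) * math.log(1 - p)
        for p, a in zip(predicted, actual)
    )
    return -total / len(actual)

# The two rows of the inline table t used in the query.
t_actual = [0, 1]
t_predicted = [0.01, 0.02]

# Approximately 1.96104, in line with the value reported by Hive.
print(logloss(t_predicted, t_actual))
```

Most of the loss comes from the second row, where the true label is 1 but the predicted probability is only 0.02.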