A trainer learns a function f(x) = y, that is, a set of weights W, of the following form to predict a label y, where x is a feature vector:

y = f(x) = Wx

Without a bias clause, f(x) cannot form a hyperplane that separates (1,1) and (2,2), because the hyperplane f(x) = 0 always passes through the origin (0,0).
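
The arithmetic behind this can be written out for arbitrary weights. In the sketch below, W = (w_1, w_2) is notation introduced here for illustration; the two points are the ones from the text.

```latex
\begin{aligned}
f\big((1,1)\big) &= w_1 \cdot 1 + w_2 \cdot 1 = w_1 + w_2, \\
f\big((2,2)\big) &= w_1 \cdot 2 + w_2 \cdot 2 = 2\,(w_1 + w_2) = 2\, f\big((1,1)\big).
\end{aligned}
```

The two values always have the same sign (or are both zero), so no choice of W puts the two points on opposite sides of f(x) = 0.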

With a bias clause b, a trainer learns the following f(x):

f(x) = Wx + b

The learned model then accounts for the bias present in the dataset, and the learned hyperplane does not have to pass through the origin.
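
As a worked example, the values W = (1, 1) and b = -3 (chosen here purely for illustration, not taken from any trained model) place the points (1,1) and (2,2) on opposite sides of the hyperplane:

```latex
\begin{aligned}
f\big((1,1)\big) &= 1 \cdot 1 + 1 \cdot 1 - 3 = -1 < 0, \\
f\big((2,2)\big) &= 1 \cdot 2 + 1 \cdot 2 - 3 = 1 > 0.
\end{aligned}
```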

Hivemall's add_bias() adds a bias to a feature vector. To enable a bias clause, use add_bias() for both (important!) training and test data, as follows. By default, the bias b is the feature named “0” (“-1” before v0.3). See AddBiasUDF for details.

Note that the bias is expressed as a feature that is present in all training and test examples.
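
The idea of a bias expressed as an always-present feature can be sketched in plain Python. This is an illustration only, not Hivemall code: the helper names with_bias and predict, the example weights, and the constant value 1.0 attached to the bias feature are all assumptions made for the sketch. Only the feature name "0" comes from the text above.

```python
# Illustration only: plain Python, not Hivemall code.
BIAS_FEATURE = "0"  # default bias feature name stated in the text


def with_bias(features):
    """Return a copy of a "name:value" list with a constant bias feature appended.

    The value 1.0 is an assumption made for this sketch.
    """
    return features + [BIAS_FEATURE + ":1.0"]


def predict(weights, features):
    """Compute the dot product of a weight map and "name:value" features."""
    total = 0.0
    for item in features:
        name, value = item.split(":")
        total += weights.get(name, 0.0) * float(value)
    return total


# Hypothetical model: W = (1, 1) on features "1" and "2", bias b = -3 stored under "0".
weights = {"1": 1.0, "2": 1.0, BIAS_FEATURE: -3.0}

low = predict(weights, with_bias(["1:1.0", "2:1.0"]))   # point (1,1)
high = predict(weights, with_bias(["1:2.0", "2:2.0"]))  # point (2,2)
no_bias = predict(weights, ["1:2.0", "2:2.0"])          # bias feature missing at test time

print(low, high, no_bias)  # prints: -1.0 1.0 4.0
```

Because the bias weight is stored like any other feature weight, it only contributes to a prediction when the bias feature is present in the example. In this sketch, omitting it at test time shifts the score for (2,2) from 1.0 to 4.0, which is why the bias has to be added to both training and test data.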

Adding a bias clause to test data

create table e2006tfidf_test_exploded as
select 
  rowid,
  target,
  split(feature,":")[0] as feature,
  cast(split(feature,":")[1] as float) as value
  -- extract_feature(feature) as feature, -- hivemall v0.3.1 or later
  -- extract_weight(feature) as value     -- hivemall v0.3.1 or later
from 
  -- add_bias() appends the bias feature to each vector before it is exploded into rows
  e2006tfidf_test LATERAL VIEW explode(add_bias(features)) t AS feature;

Adding a bias clause to training data

create table e2006tfidf_pa1a_model as
select 
 feature,
 avg(weight) as weight
from 
 (select 
     -- add_bias() is applied here as well, so training sees the same bias feature as the test data
     pa1a_regress(add_bias(features),target) as (feature,weight)
  from 
     e2006tfidf_train_x3
 ) t 
group by feature;