This package has two parts:

• The first part provides tools to manipulate formulas.
• The second part provides functions to evaluate and check the marginal impacts of a linear model.

# 1 First Part: Manipulate formulas

### 1.0.1 different forms of x

Variables in R's linear formula/model can take different forms:

1. Model variables: the terms that appear directly in the formula, separated by the '+' sign.
2. Raw variables: the underlying variables used.
3. Coefficient variables: the coefficient names; note that un-evaluated formulas don't have these variables.

### 1.0.2 model variables: `get_x(formula/model,'model')`

``````data = ggplot2::diamonds
diamond_lm  =  lm(log(price)~  I(carat^   2) + cut  + carat + table + carat:table, data)``````

At first sight, the linear model above contains 5 variables:

• I(carat^ 2)
• cut
• carat
• table
• carat:table

In linear.tools we call them model variables and can access them using function `get_x(.,'model')`:

``get_x(diamond_lm,'model')``
``## [1] "I(carat^2)"  "cut"         "carat"       "table"       "carat:table"``

Note that the original formula contains redundant spaces, 'I(carat^ 2)'; `get_x(.,'model')` removes them.
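For comparison, the same list of model variables can be recovered with base R alone. The sketch below does not use linear.tools and is not necessarily how `get_x` is implemented; it only shows that the term labels of a formula are already stored without the redundant spaces.

```r
# Base-R sketch (not the linear.tools implementation).
# The formula deliberately contains redundant spaces inside I().
f <- log(price) ~ I(carat^   2) + cut + carat + table + carat:table

# terms() parses the formula; "term.labels" holds one string per model term.
model_vars <- attr(terms(f), "term.labels")
print(model_vars)
# "I(carat^2)" "cut" "carat" "table" "carat:table"
```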

### 1.0.3 raw variables: `get_x(formula/model,'raw')`

Sometimes you want to get the underlying raw variables used in the formula, which are

• carat (the underlying variable for I(carat^ 2))
• cut
• carat
• table

In linear.tools we call them raw variables and can access them using function `get_x(.,'raw')`:

``get_x(diamond_lm,'raw')``
``## [1] "carat" "cut"   "table"``

`get_model_pair(., data, 'raw')` shows the link between model variables and raw variables: it returns a list whose names are the model variables and whose elements are the corresponding raw variables.

``get_model_pair(diamond_lm, data, 'raw')``
``````## $`I(carat^2)`
## [1] "carat"
##
## $cut
## [1] "cut"
##
## $carat
## [1] "carat"
##
## $table
## [1] "table"
##
## $`carat:table`
## [1] "carat" "table"``````
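A similar model-to-raw mapping can be approximated in base R by asking which variable names occur inside each term. This is only an illustrative sketch under that assumption, not the package's code; `raw_pairs` and `raw_vars` are names introduced here.

```r
# Base-R sketch of a model-variable -> raw-variable mapping.
f <- log(price) ~ I(carat^2) + cut + carat + table + carat:table
model_vars <- attr(terms(f), "term.labels")

# all.vars() lists the variable names that occur in a parsed expression.
raw_pairs <- lapply(model_vars, function(term) all.vars(parse(text = term)))
names(raw_pairs) <- model_vars

# The raw variables are the unique names across all terms.
raw_vars <- unique(unlist(raw_pairs, use.names = FALSE))
print(raw_vars)
# "carat" "cut" "table"
```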

### 1.0.4 coefficient variables: `get_x(model,'coeff')`

Sometimes you want the coefficient names of the model:

``get_x(diamond_lm,'coeff')``
``````## [1] "I(carat^2)"  "cut.L"       "cut.Q"       "cut.C"       "cut^4"
## [6] "carat"       "table"       "carat:table"``````

You may also want to see how 'model' variables are linked with 'coeff' variables: `get_model_pair(., data, 'coeff')` returns a list whose names are the model variables and whose elements are the corresponding coeff variables.

``get_model_pair(diamond_lm, data, 'coeff')``
``````## $`I(carat^2)`
## [1] "I(carat^2)"
##
## $cut
## [1] "cut.L" "cut.Q" "cut.C" "cut^4"
##
## $carat
## [1] "carat"
##
## $table
## [1] "table"
##
## $`carat:table`
## [1] "carat:table"``````
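The link between model terms and coefficient names is also recorded by base R in the `assign` attribute of the model matrix. The sketch below uses a small simulated data set (`toy`, invented for illustration) and R's default contrast options; it is an assumption-laden illustration rather than the package's implementation.

```r
# Base-R sketch: group coefficient names by the model term that produced them.
set.seed(1)
toy <- data.frame(
  x = rnorm(60),
  g = factor(rep(c("low", "mid", "high"), 20),
             levels = c("low", "mid", "high"), ordered = TRUE)
)
toy$y <- toy$x + rnorm(60)
fit <- lm(y ~ x + g, data = toy)

mm <- model.matrix(fit)
assign_idx  <- attr(mm, "assign")            # 0 = intercept, k = k-th term
term_labels <- attr(terms(fit), "term.labels")
keep <- assign_idx > 0                       # drop the intercept column

coeff_pairs <- split(
  colnames(mm)[keep],
  factor(term_labels[assign_idx[keep]], levels = term_labels)
)
print(coeff_pairs)
```

With default contrasts, the ordered factor `g` expands into the polynomial columns `g.L` and `g.Q`, while `x` maps to itself.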

### 1.0.6 get y : `get_y(formula/model)`

``get_y(diamond_lm,'raw')``
``## [1] "price"``
``get_y(diamond_lm,'model')``
``## [1] "log(price)"``
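The two forms of the response can likewise be read off the left-hand side of the formula in base R. This is a minimal sketch, not the `get_y` source.

```r
# Base-R sketch: the left-hand side of a two-sided formula is its second element.
f <- log(price) ~ I(carat^2) + cut + carat + table + carat:table

y_model <- deparse(f[[2]])   # the response as written in the formula
y_raw   <- all.vars(f[[2]])  # the underlying variable name(s)

print(y_model)  # "log(price)"
print(y_raw)    # "price"
```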

## 1.1 contrast: get_contrast(model)

Contrasts are how categorical variables show up in coefficients.

When R evaluates categorical variables in a linear model, it transforms them into sets of 'contrasts' using a particular contrast coding scheme. See UCLA idre for details.

For example, for the categorical variable 'cut' in the above model, we can get its contrasts through the function `get_contrast`:

``````# get_contrast will return a list with each element as the contrasts of a categorical variable in the model
get_contrast(diamond_lm)``````
``````## $cut
## [1] "cut.L" "cut.Q" "cut.C" "cut^4"``````

You can also return the contrast method.

``get_contrast(diamond_lm, return_method = T)``
``````## $cut
## contr.poly``````
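The same information is available from a fitted `lm` object in base R: the `contrasts` component stores the contrast method per factor, and `contr.poly()` shows the coding matrix itself. The sketch below uses a tiny invented data set (`toy`) and assumes R's default `options("contrasts")`, under which ordered factors use polynomial contrasts.

```r
# Base-R sketch: inspect the contrast method and the resulting coefficient names.
toy <- data.frame(
  y = c(1.2, 2.1, 2.9, 4.2, 4.8, 6.1, 7.0, 8.3, 8.9),
  g = factor(rep(c("low", "mid", "high"), 3),
             levels = c("low", "mid", "high"), ordered = TRUE)
)
fit <- lm(y ~ g, data = toy)

print(fit$contrasts)     # contrast method used for each factor
print(contr.poly(3))     # coding matrix with columns .L and .Q
print(names(coef(fit)))  # "(Intercept)" "g.L" "g.Q"
```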

# 2 Second Part: Evaluate Marginal Effect

In the formula `y ~ a + I(a^2) + b`, we define the 'marginal effect' of `a` on `y` as how a change in `a` affects the value of `y` while `b` is held fixed. Note that the marginal effect here is not simply the coefficient of `a` or of `I(a^2)`, nor their sum.
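To make this definition concrete, the sketch below computes such a marginal effect by hand in base R: it fits the formula on simulated data (invented here for illustration), predicts `y` over a grid of `a` with `b` fixed at its mean, and looks at the successive changes in the prediction. In this quadratic case the slope in `a` is the coefficient of `a` plus twice the coefficient of `I(a^2)` times `a`, so it depends on where `a` is evaluated.

```r
# Base-R sketch of a marginal effect: vary `a`, hold `b` fixed.
set.seed(42)
d <- data.frame(a = runif(200, 0, 2), b = rnorm(200))
d$y <- 1 + 3 * d$a - 0.5 * d$a^2 + 2 * d$b + rnorm(200, sd = 0.1)

fit <- lm(y ~ a + I(a^2) + b, data = d)

# Grid over the focus variable; the other variable is fixed at its mean.
grid <- data.frame(a = seq(0, 2, by = 0.25), b = mean(d$b))
grid$y_hat <- predict(fit, newdata = grid)

# Change in predicted y for each step in a.
effect_steps <- diff(grid$y_hat)
monotone_increase <- all(effect_steps > 0)
print(effect_steps)
print(monotone_increase)
```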

### 2.0.1 evaluate marginal effect: `effect`

We provide an easy tool to show the marginal effect and check its monotonicity. The example below evaluates how the `carat` of a diamond affects its `price` in a particular model.

``````# more carats, higher price.
diamond_lm3 = lm(price~  carat + I(carat^2) + I(carat^3) , ggplot2::diamonds) # a linear model with polynomial terms in carat

test1 = effect(model = diamond_lm3, focus_var_raw = c('carat'), focus_value =list(carat = seq(0.5,1,0.1))) ``````

``test1$Monoton_Increase``
``## [1] TRUE``

You can see that the model captures the monotonically increasing relation between `carat` and `price` when `carat` ranges from 0.5 to 1 (`$Monoton_Increase` is `TRUE`).

PS: A more interesting case is that, if you interact `carat` with the categorical variable `cut`, you can examine the marginal effects of `carat` under different categories of `cut`:

``````test_interaction = effect(model = lm(price~  carat*cut + I(carat^2)*cut, ggplot2::diamonds),
focus_var_raw = c('carat','cut'), focus_value =list(carat = seq(0.5,1,0.1))
) ``````

However, when we let `carat` range from 0.5 to 6 in the model `diamond_lm3`, the model fails to produce a monotonically increasing relation: in the example below, once carat exceeds roughly 3, the higher the carat, the lower the predicted price!

``test2 = effect(model = diamond_lm3, focus_var_raw = c('carat'), focus_value =list(carat = seq(0.5,6,0.1))) ``

``test2$Monoton_Increase``
``## [1] FALSE``
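The turning point near 3 carats can be checked directly from the fitted cubic. The sketch below hard-codes the coefficient estimates that `deleting_wrongeffect` prints for `diamond_lm3` in the next section, differentiates the cubic with respect to `carat`, and solves for the point where the slope becomes zero. The variable names are illustrative only.

```r
# Coefficients of diamond_lm3, copied from the printed "initial model" output.
b1 <- 812.3639     # carat
b2 <- 5813.2637    # I(carat^2)
b3 <- -1308.8438   # I(carat^3)

# Slope of the fitted curve: d(price)/d(carat) = b1 + 2*b2*x + 3*b3*x^2.
slope <- function(x) b1 + 2 * b2 * x + 3 * b3 * x^2

# polyroot() expects coefficients in increasing order of power.
roots <- polyroot(c(b1, 2 * b2, 3 * b3))
turning_point <- max(Re(roots))   # the positive root, roughly 3 carats

print(turning_point)
print(slope(1) > 0)   # price still rising at 1 carat
print(slope(4) < 0)   # price falling at 4 carats
```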

### 2.0.2 delete variables with a wrong marginal effect and re-evaluate

When a model has a wrong marginal effect, we can use function `deleting_wrongeffect` to delete a model variable that potentially causes the wrong marginal impacts and then re-estimate the model. This function can keep doing this until the correct marginal impacts are found.

The example below will

• first test the marginal effect of `carat` on `price`, which is supposed to be monotonically increasing.
• then, when it finds an incorrect marginal effect, delete the rightmost model variable that contains `carat` and recheck the marginal effect.
• keep doing so until the marginal effect is correct or all model variables containing `carat` have been deleted.
``````model_correct_effect = deleting_wrongeffect(model = diamond_lm3,
focus_var_raw = 'carat',
focus_value = list(carat=seq(0.5,6,0.1)),
data = ggplot2::diamonds,
PRINT = T,STOP =F, PLOT = T,
Reverse = F)``````
``````##
## initial model:
##               Estimate     Pr(>|t|)
## (Intercept)  -198.3337 3.930283e-11
## carat         812.3639 1.540245e-19
## I(carat^2)   5813.2637 0.000000e+00
## I(carat^3)  -1308.8438 0.000000e+00
##
##
## check raw var:  carat
## check model var:  carat, I(carat^2), I(carat^3)
## Correct Monotonicity is supposed to be:  Increasing``````
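The delete-and-recheck procedure described above can be sketched as a simple loop in base R. This is a hypothetical illustration on simulated data, not the source of `deleting_wrongeffect`: the helper `is_increasing`, the data frame `d`, and the grid are all invented here, and the real function offers more options (printing, plotting, reverse order).

```r
# Hypothetical helper: TRUE if predictions rise strictly along the grid.
is_increasing <- function(fit, grid) {
  all(diff(predict(fit, newdata = grid)) > 0)
}

set.seed(7)
d <- data.frame(x = runif(300, 0.5, 2))
d$y <- 2 + 3 * d$x + rnorm(300, sd = 0.5)   # truly increasing in x
grid <- data.frame(x = seq(0.5, 6, by = 0.1))

fit <- lm(y ~ x + I(x^2) + I(x^3), data = d)
repeat {
  labels <- attr(terms(fit), "term.labels")
  # Model variables whose underlying raw variable is x.
  has_x <- vapply(labels,
                  function(term) "x" %in% all.vars(parse(text = term)),
                  logical(1))
  candidates <- labels[has_x]
  if (is_increasing(fit, grid) || length(candidates) == 0) break
  # Drop the rightmost candidate and re-estimate the model.
  drop_term <- candidates[length(candidates)]
  fit <- update(fit, as.formula(paste(". ~ . -", drop_term)))
}

print(formula(fit))
print(is_increasing(fit, grid))
```

Each pass removes exactly one term, so the loop always terminates, either with a monotone model or with no term containing the focus variable left.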