The testing module

CheckModel Class

class CheckModel(fitted_model: RegressionResultsWrapper)

Check Linear Regression Assumptions

The CheckModel class provides methods to test the assumptions of the linear regression model. These assumptions are:

  • Normality of residuals

  • Homoscedasticity (equal variance) of residuals

  • Independence of residuals

  • No multicollinearity among predictors

__init__(fitted_model: RegressionResultsWrapper)

Initializes the CheckModel class.

Parameters:

fitted_model – The fitted linear regression model, which must be an instance of RegressionResultsWrapper.

check_normality(alpha=0.05, plot=True, return_pvals=False)

Checks the normality assumption of the residuals using several statistical tests.
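As an illustration of what one of these tests measures, the Jarque-Bera statistic listed in the example output can be sketched with the standard library alone. This is the textbook definition, not the code used by CheckModel; the helper name jarque_bera is hypothetical.

```python
import math


def jarque_bera(residuals):
    """Return (JB statistic, asymptotic p-value) for a sequence of residuals.

    Hypothetical helper: textbook Jarque-Bera, not part of estyp.
    """
    n = len(residuals)
    mean = sum(residuals) / n
    centered = [r - mean for r in residuals]
    # Population (biased) central moments, as in the textbook definition.
    m2 = sum(c ** 2 for c in centered) / n
    m3 = sum(c ** 3 for c in centered) / n
    m4 = sum(c ** 4 for c in centered) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    jb = n / 6 * (skewness ** 2 + (kurtosis - 3) ** 2 / 4)
    # The chi-squared survival function with 2 degrees of freedom is exp(-x / 2).
    p_value = math.exp(-jb / 2)
    return jb, p_value


jb, p = jarque_bera([-2.0, -1.0, 0.0, 1.0, 2.0])
print(f"JB = {jb:.4f}, p-value = {p:.4f}")
```

A large p-value, as in the example output, means the test finds no evidence against normality.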

check_homocedasticity(alpha=0.05, plot=True, return_pvals=False)

Checks the homoscedasticity assumption (equal variance of residuals) using several statistical tests.

check_independence(alpha=0.05, plot=True, return_vals=False)

Checks the independence assumption of the residuals using several statistical tests.
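The Durbin-Watson statistic reported in the example output (DW-Statistic = 2.029) has a simple closed form. The sketch below uses the textbook definition and a hypothetical helper name; it is not the library's implementation.

```python
def durbin_watson(residuals):
    """Textbook Durbin-Watson statistic (hypothetical helper, not estyp code)."""
    # Sum of squared differences between consecutive residuals.
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    # Sum of squared residuals.
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Alternating signs: strong negative autocorrelation pushes the value toward 4.
print(durbin_watson([1.0, -1.0, 1.0, -1.0]))
# Runs of equal signs: positive autocorrelation pulls the value toward 0.
print(durbin_watson([1.0, 1.0, -1.0, -1.0]))
```

Values close to 2 suggest no first-order autocorrelation in the residuals.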

check_multicollinearity(plot=True, return_cm=False)

Checks for multicollinearity among predictors using the variance inflation factor (VIF).

check_all(alpha=0.05, plot=True, return_vals=False)

Checks all the assumptions of the linear regression model.

Examples

import statsmodels.api as sm
from sklearn.datasets import load_diabetes
from estyp.testing import CheckModel

diabetes = load_diabetes()
X = diabetes["data"]
y = diabetes["target"]
X = sm.add_constant(X)
model = sm.OLS(y, X)
fitted_model = model.fit()
cm = CheckModel(fitted_model)
cm.check_all()
Made by Esteban Rucán. Contact me in LinkedIn: https://www.linkedin.com/in/estebanrucan/
Normality tests results:
- Residuals appear as normally distributed according to KS test (p-value = 0.974).
- Residuals appear as normally distributed according to Shapiro-Wilk test (p-value = 0.616).
- Residuals appear as normally distributed according to Jarque-Bera test (p-value = 0.496).
- Residuals appear as normally distributed according to Omni test (p-value = 0.471).
Homocedasticity tests results:
- Heteroscedasticity (non-constant error variance) detected according to Breusch-Pagan test (p-value = 0.003).
- Heteroscedasticity (non-constant error variance) detected according to White test (p-value = 0.012).
- Error variance appears to be homoscedastic according to Goldfeld-Quandt test (p-value = 0.595).
Independence tests results:
- Residuals appear to be independent and not autocorrelated according to DW test (DW-Statistic = 2.029)
- Residuals appear to be independent and not autocorrelated according to Box-Pierce test (p-value = 0.351).
- Residuals appear to be independent and not autocorrelated according to Breusch-Godfrey test (p-value = 0.184).
Multicollinearity test results:
- The model may have multicollinearity problems (condition number = 227.22).

F Test to Compare Two Variances

var_test(x, y, ratio=1, alternative='two-sided', conf_level=0.95)

Performs an F test to compare the variances of two samples from normal populations. This function is inspired by the var.test() function of the software R.

Parameters:
  • x, y – numeric list, np.array or pd.Series of data values.

  • ratio – the hypothesized ratio of the population variances of x and y.

  • alternative – a character string specifying the alternative hypothesis, must be one of “two-sided” (default), “greater” or “less”. You can specify just the initial letter.

  • conf_level – a number between 0 and 1 indicating the confidence level of the interval.

Details

The null hypothesis is that the ratio of the variances of the populations from which x and y were drawn is equal to ratio.
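Under this null hypothesis, the test statistic is the ratio of the sample variances divided by ratio. A minimal standard-library sketch follows; the helper name f_statistic is hypothetical, and the p-value is omitted because the F distribution is not available in the standard library.

```python
from statistics import variance


def f_statistic(x, y, ratio=1.0):
    """Return (F statistic, variance-ratio estimate, degrees of freedom).

    Hypothetical helper illustrating the formula; not the estyp implementation.
    """
    estimate = variance(x) / variance(y)  # ratio of sample variances
    statistic = estimate / ratio          # scaled by the hypothesized ratio
    df = {"x": len(x) - 1, "y": len(y) - 1}
    return statistic, estimate, df


stat, est, df = f_statistic([1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 4.0, 6.0, 8.0, 10.0], ratio=0.5)
print(stat, est, df)

# Consistency check against the documented example with ratio=0.9:
# estimate 1.280457 divided by 0.9 gives the printed F = 1.4227.
print(round(1.280457 / 0.9, 4))
```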

Value

An instance of the TestResults class containing the following attributes:

  • statistic: the value of the F test statistic.

  • df: the degrees of freedom for the F test statistic.

  • p_value: the p-value for the test.

  • ci: a confidence interval for the ratio of the population variances.

  • estimate: the ratio of the sample variances of x and y.

  • alternative: a string describing the alternative hypothesis.

Examples

import numpy as np
from estyp.testing import var_test

np.random.seed(2023)
x = np.random.normal(size=100)
y = np.random.normal(size=100)

print("1 - F Test for Two Samples")
print(var_test(x, y))
print("2 - F Test for Two Samples changing alternative hypothesis")
print(var_test(x, y, alternative="less"))
print("3 - F Test for Two Samples changing ratio")
print(var_test(x, y, ratio=0.9, alternative="greater"))
1 - F Test for Two Samples

    F test to compare two variances
    F = 1.2805 | df: {'x': 99, 'y': 99} | p-value = 0.2205
    alternative hypothesis: true ratio of variances is not equal to 1
    95 percent confidence interval:
     0.861545 1.903058
    sample estimates:
      ratio of variances: 1.280457
    
2 - F Test for Two Samples changing alternative hypothesis

    F test to compare two variances
    F = 1.2805 | df: {'x': 99, 'y': 99} | p-value = 0.8898
    alternative hypothesis: true ratio of variances is less than 1
    95 percent confidence interval:
     0.000000 1.785035
    sample estimates:
      ratio of variances: 1.280457
    
3 - F Test for Two Samples changing ratio

    F test to compare two variances
    F = 1.4227 | df: {'x': 99, 'y': 99} | p-value = 0.0405
    alternative hypothesis: true ratio of variances is greater than 0.9
    95 percent confidence interval:
     0.918508 inf
    sample estimates:
      ratio of variances: 1.280457
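The p-values printed for the three alternatives are related through the distribution function of the statistic. The sketch below assumes the same convention as R's var.test (two-sided p-value equal to twice the smaller tail); the helper name is hypothetical.

```python
def p_values_from_cdf(cdf_at_statistic):
    """Map P(F <= observed statistic) to the p-value of each alternative.

    Assumes the R convention for the two-sided case; hypothetical helper.
    """
    less = cdf_at_statistic
    greater = 1.0 - cdf_at_statistic
    two_sided = 2.0 * min(less, greater)
    return {"less": less, "greater": greater, "two-sided": two_sided}


# 0.8898 is the "less" p-value printed in example 2, where ratio = 1.
p_values = p_values_from_cdf(0.8898)
print(p_values)
```

The two-sided value obtained this way agrees with the 0.2205 printed in example 1 up to rounding of the inputs.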
    

Student’s t-Test

t_test(x, y=None, alternative='two-sided', mu=0, paired=False, var_equal=False, conf_level=0.95)

Performs one and two sample t-tests on groups of data. This function is inspired by the t.test() function of the software R.

Parameters:
  • x – a (non-empty) numeric container of data values.

  • y – an (optional) numeric container of data values.

  • alternative – a string specifying the alternative hypothesis, must be one of “two-sided” (default), “greater” or “less”.

  • mu – a number indicating the true value of the mean (or difference in means if you are performing a two sample test).

  • paired – a logical indicating whether you want a paired t-test.

  • var_equal – a logical variable indicating whether to treat the two variances as being equal. If True, the pooled variance is used to estimate the variance; otherwise, the Welch (or Satterthwaite) approximation to the degrees of freedom is used.

  • conf_level – a number between 0 and 1 indicating the confidence level of the interval.

Details

alternative = “greater” is the alternative that x has a larger mean than y. For the one-sample case, it is the alternative that the mean is greater than mu (that is, positive when mu is 0, the default).

If paired is True, both x and y must be specified and they must be the same length. Missing values are silently removed (in pairs if paired is True). If var_equal is True, the pooled estimate of the variance is used. Otherwise (var_equal is False, the default), the variance is estimated separately for each group and the Welch modification to the degrees of freedom is used.
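The difference between the pooled and Welch variants can be sketched as follows. This uses the textbook formulas with a hypothetical helper name and is not the estyp implementation; the p-value is left out because the Student t distribution is not in the standard library.

```python
import math
from statistics import mean, variance


def two_sample_t(x, y, mu=0.0, var_equal=False):
    """Return (t statistic, degrees of freedom) for a two-sample t-test.

    Hypothetical helper based on the textbook formulas.
    """
    nx, ny = len(x), len(y)
    vx, vy = variance(x), variance(y)
    diff = mean(x) - mean(y) - mu
    if var_equal:
        # Pooled variance with nx + ny - 2 degrees of freedom.
        df = nx + ny - 2
        pooled = ((nx - 1) * vx + (ny - 1) * vy) / df
        se = math.sqrt(pooled * (1 / nx + 1 / ny))
    else:
        # Separate variances with the Welch-Satterthwaite degrees of freedom.
        se = math.sqrt(vx / nx + vy / ny)
        df = (vx / nx + vy / ny) ** 2 / (
            (vx / nx) ** 2 / (nx - 1) + (vy / ny) ** 2 / (ny - 1)
        )
    return diff / se, df


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 6.0, 8.0, 10.0]
print(two_sample_t(x, y))                  # Welch
print(two_sample_t(x, y, var_equal=True))  # pooled
```

With equal sample sizes the two variants give the same statistic and differ only in the degrees of freedom, which is the pattern visible in examples 2 and 3 below (T = -1.2046 in both, df 195.05 versus 198).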

Value

An instance of the TestResults class containing the following attributes:

  • statistic: the value of the t-statistic.

  • df: the degrees of freedom for the t-statistic.

  • p_value: the p-value for the test.

  • ci: a confidence interval for the mean appropriate to the specified alternative hypothesis.

  • estimate: the estimated mean or list of estimated means depending on whether it was a one-sample test or a two-sample test.

  • alternative: a character string describing the alternative hypothesis.

  • mu: the mean of the null hypothesis.

Examples

import numpy as np
from estyp.testing import t_test

np.random.seed(2023)
x = np.random.normal(size=100)
y = np.random.normal(size=100)
mu = 0.1

print("1 - One Sample Test")
print(t_test(x, mu=mu, alternative="less"))
print("2 - Two Sample Test")
print(t_test(x, y, mu=mu))
print("3 - Two Sample Test with Equal Variances")
print(t_test(x, y, mu=mu, var_equal=True, alternative="greater"))
print("4 - Paired Test")
print(t_test(x, y, mu=mu, paired=True))
1 - One Sample Test

    One Sample t-test
    T = -1.3237 | df: 99 | p-value = 0.0943
    alternative hypothesis: true mean is less than 0.1
    95 percent confidence interval:
     -inf 0.138028
    sample estimates:
      mean of x: -0.049492
    
2 - Two Sample Test

    Welch's Two Sample t-test
    T = -1.2046 | df: 195.05 | p-value = 0.2298
    alternative hypothesis: true difference in means is not equal to 0.1
    95 percent confidence interval:
     -0.478794 0.115698
    sample estimates:
      [mean of x, mean of y]: [-0.049492, 0.032056]
    
3 - Two Sample Test with Equal Variances

    Two Sample t-test
    T = -1.2046 | df: 198 | p-value = 0.8851
    alternative hypothesis: true difference in means is greater than 0.1
    95 percent confidence interval:
     -0.430622 inf
    sample estimates:
      [mean of x, mean of y]: [-0.049492, 0.032056]
    
4 - Paired Test

    Paired t-test
    T = -1.2772 | df: 99 | p-value = 0.2045
    alternative hypothesis: true mean difference is not equal to 0.1
    95 percent confidence interval:
     -0.463595 0.100499
    sample estimates:
      [mean of x, mean of y]: [-0.049492, 0.032056]
    

Nested Models F-Test Function

nested_models_test(fitted_small_model, fitted_big_model)

This function performs a nested models F-test using the deviances of two models fitted with the statsmodels library. The test compares two nested models: a larger or “big” model and a smaller or “small” model. Its purpose is to determine whether the additional predictors in the larger model significantly improve the fit compared to the smaller model.

Parameters:
  • fitted_small_model (RegressionResultsWrapper) – The fitted model representing the smaller/nested model. It has to come from statsmodels.

  • fitted_big_model (RegressionResultsWrapper) – The fitted model representing the larger model, which includes all the predictors from the smaller model and potentially additional predictors. It has to come from statsmodels.

Returns:

The function returns an object of class TestResults that contains the following information:

  • method: A string indicating the name of the statistical test (Nested models F-test).

  • statistic: The computed F-statistic value.

  • estimate: The difference in deviances between the models.

  • df: A dictionary with the degrees of freedom for the numerator and denominator of the F-statistic.

  • p_value: The p-value associated with the F-statistic.

Examples:

  • Example 1: With OLS

import pandas as pd
import statsmodels.api as sm
from estyp.testing import nested_models_test

data = pd.DataFrame({
    "x": [2.01, 2.99, 4.01, 5.01, 6.89],
    "y": [2, 3, 4, 5, 6]
})
model_small = sm.OLS.from_formula("y ~ 1", data).fit()
model_big = sm.OLS.from_formula("y ~ x", data).fit()
print(nested_models_test(model_small, model_big))

    Nested models F-test
    F = 134.2747 | df: {'df_num': 1, 'df_den': 3} | p-value = 0.0014
    alternative hypothesis: big model is true
    sample estimates:
      Difference in deviances between models: 9.781460
    
  • Example 2: With Logit

data = pd.DataFrame({
    "x": [2.01, 2.99, 4.01, 3.01, 4.89],
    "y": [0, 1, 1, 0, 1]
})
model_small = sm.Logit.from_formula("y ~ 1", data).fit()
model_big = sm.Logit.from_formula("y ~ x", data).fit()
print(nested_models_test(model_small, model_big))
Optimization terminated successfully.
         Current function value: 0.673012
         Iterations 4
Optimization terminated successfully.
         Current function value: 0.290002
         Iterations 10

    Nested models F-test
    F = 3.9621 | df: {'df_num': 1, 'df_den': 3} | p-value = 0.1406
    alternative hypothesis: big model is true
    sample estimates:
      Difference in deviances between models: 3.830096
    
  • Example 3: With GLM

data = pd.DataFrame({
    "x": [2.01, 2.99, 4.01, 5.01, 6.89],
    "y": [2, 3, 4, 5, 6]
})
model_small = sm.GLM.from_formula("y ~ 1", data, family=sm.families.Gamma()).fit()
model_big = sm.GLM.from_formula("y ~ x", data, family=sm.families.Gamma()).fit()
print(nested_models_test(model_small, model_big))

    Nested models F-test
    F = 13.0985 | df: {'df_num': 1, 'df_den': 3} | p-value = 0.0363
    alternative hypothesis: big model is true
    sample estimates:
      Difference in deviances between models: 0.573166
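The numbers printed in Example 1 can be reproduced by hand. The sketch below assumes that, for OLS, the deviance is the residual sum of squares and that the statistic is the deviance difference per numerator degree of freedom divided by the big model's deviance per denominator degree of freedom. These assumptions reproduce the printed output but are not taken from the library's source; all helper names are hypothetical.

```python
from statistics import mean


def simple_ols_rss(x, y):
    """Residual sum of squares of the least-squares line y ~ x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))


def nested_f(dev_small, dev_big, df_num, df_den):
    """Return (F statistic, difference in deviances)."""
    diff = dev_small - dev_big
    return (diff / df_num) / (dev_big / df_den), diff


x = [2.01, 2.99, 4.01, 5.01, 6.89]
y = [2.0, 3.0, 4.0, 5.0, 6.0]
dev_small = sum((yi - mean(y)) ** 2 for yi in y)  # intercept-only model
dev_big = simple_ols_rss(x, y)                    # model with x
f_stat, diff = nested_f(dev_small, dev_big, df_num=1, df_den=len(y) - 2)
print(f"F = {f_stat:.4f}, difference in deviances = {diff:.6f}")
```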
    

Test of Equal or Given Proportions

prop_test(x, n=None, p=None, alternative='two-sided', conf_level=0.95, correct=True)
Parameters:
  • x (array_like) – A vector of counts of successes, a one-dimensional table with two entries, or a two-dimensional table (or matrix) with 2 columns, giving the counts of successes and failures, respectively.

  • n (array_like, optional) – A vector of counts of trials; ignored if x is a matrix or a table. If not provided, it is calculated as the sum of the elements in x.

  • p (array_like, optional) – A vector of probabilities of success. The length of p must be the same as the number of groups specified by x, and its elements must be greater than 0 and less than 1.

  • alternative (str, optional) – A character string specifying the alternative hypothesis, must be one of “two-sided” (default), “greater” or “less”. You can specify just the initial letter. Only used for testing the null that a single proportion equals a given value, or that two proportions are equal; ignored otherwise.

  • conf_level (float, optional) – Confidence level of the returned confidence interval. Must be a single number between 0 and 1. Only used when testing the null that a single proportion equals a given value, or that two proportions are equal; ignored otherwise.

  • correct (bool, optional) – A logical indicating whether Yates’ continuity correction should be applied where possible.

Returns

rtype:

TestResults

A data class with the following attributes:

  • statistic: float, The value of Pearson’s chi-squared test statistic.

  • df: int, The degrees of freedom of the approximate chi-squared distribution of the test statistic.

  • p_value: float, The p-value of the test.

  • estimate: array_like, A vector with the sample proportions x/n.

  • null_value: float or array_like, The value of p if specified by the null hypothesis.

  • conf_int: array_like, A confidence interval for the true proportion if there is one group, or for the difference in proportions if there are 2 groups and p is not given, or None otherwise. In the cases where it is not None, the returned confidence interval has an asymptotic confidence level as specified by conf_level, and is appropriate to the specified alternative hypothesis.

  • alternative: str, A character string describing the alternative.

  • method: str, A character string indicating the method used, and whether Yates’ continuity correction was applied.

Examples

import numpy as np
from estyp.testing import prop_test

x = np.array([83, 90, 129, 70])
n = np.array([86, 93, 136, 82])
result = prop_test(x, n)
print(result)

    4-sample test for given proportions without continuity correction
    X-squared = 12.6004 | df: 3 | p-value = 0.0056
    alternative hypothesis: the true proportions are not all equal
    sample estimates:
      proportion(s): [0.965116, 0.967742, 0.948529, 0.853659]
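The statistic above can be checked by hand with Pearson's chi-squared formula for k groups sharing a common proportion. This is a standard-library sketch with hypothetical helper names, not the estyp code; the closed-form survival function used for the p-value is valid for 3 degrees of freedom only.

```python
import math


def k_sample_prop_chisq(successes, trials):
    """Return (X-squared, df, sample proportions) for equality of k proportions."""
    pooled = sum(successes) / sum(trials)  # common proportion under the null
    stat = sum(
        (x - n * pooled) ** 2 / (n * pooled * (1 - pooled))
        for x, n in zip(successes, trials)
    )
    estimates = [x / n for x, n in zip(successes, trials)]
    return stat, len(successes) - 1, estimates


def chi2_sf_df3(x):
    """Chi-squared survival function, closed form for exactly 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


stat, df, estimates = k_sample_prop_chisq([83, 90, 129, 70], [86, 93, 136, 82])
p_value = chi2_sf_df3(stat)
print(f"X-squared = {stat:.4f} | df: {df} | p-value = {p_value:.4f}")
```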
    

Test for Association/Correlation Between Paired Samples

cor_test(x, y, method='pearson', alternative='two-sided', conf_level=0.95, continuity=False)
Parameters:
  • x (array_like) – A numeric one-dimensional array, list or pd.Series of data values. x and y must have the same length.

  • y (array_like) – A numeric one-dimensional array, list or pd.Series of data values. x and y must have the same length.

  • method (str, optional) – A string indicating which correlation coefficient is to be used for the test. One of “pearson”, “kendall”, or “spearman”.

  • alternative (str, optional) – Indicates the alternative hypothesis and must be one of “two-sided”, “greater” or “less”. “greater” corresponds to positive association, “less” to negative association.

  • conf_level (float, optional) – Confidence level for the returned confidence interval. Currently only used for the Pearson product moment correlation coefficient if there are at least 4 complete pairs of observations.

  • continuity (bool, optional) – If True, a continuity correction is used for Kendall’s tau.

Returns

rtype:

TestResults

A TestResults instance containing the following attributes:

  • statistic: the value of the test statistic.

  • df (if applicable): the degrees of freedom of the test statistic.

  • p_value: the p-value of the test.

  • estimate: the estimated measure of association.

  • null_value: the value of the association measure under the null hypothesis, always 0.

  • alternative: a string describing the alternative hypothesis.

  • method: a string indicating how the association was measured.

  • conf_int (if applicable): a confidence interval for the measure of association.

Details

The three methods each estimate the association between paired samples and compute a test of the value being zero. They use different measures of association, all in the range [-1, 1] with 0 indicating no association. These are sometimes referred to as tests of no correlation, but that term is often confined to the default method.

References

[1] D. J. Best & D. E. Roberts (1975). Algorithm AS 89: The Upper Tail Probabilities of Spearman’s rho. Applied Statistics, 24, 377–379. doi:10.2307/2347111.

[2] Myles Hollander & Douglas A. Wolfe (1973). Nonparametric Statistical Methods. New York: John Wiley & Sons. Pages 185–194 (Kendall and Spearman tests).

Example

Using the iris dataset to test the association between sepal length and petal length using Pearson’s correlation:

from sklearn import datasets
from estyp.testing import cor_test

iris = datasets.load_iris()
sepal_length = iris.data[:, 0]
petal_length = iris.data[:, 2]

result = cor_test(sepal_length, petal_length, method="pearson")
print(result)

    Pearson's product-moment correlation
    t = 21.6460 | df: 148 | p-value = <0.0001
    alternative hypothesis: true correlation is not equal to 0
    95 percent confidence interval:
     0.827036 0.905508
    sample estimates:
      cor: 0.871754
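For the Pearson method, the statistic and confidence interval above can be recovered from the estimate and the sample size alone. The sketch assumes the usual t transformation of r and a Fisher z interval, as in R's cor.test; the helper name is hypothetical, and the inputs are the rounded values printed above.

```python
import math
from statistics import NormalDist


def pearson_t_and_ci(r, n, conf_level=0.95):
    """Return (t statistic, df, confidence interval) for a Pearson correlation."""
    df = n - 2
    t_stat = r * math.sqrt(df / (1 - r ** 2))
    # Fisher z transform: atanh(r) is roughly normal with sd 1 / sqrt(n - 3).
    z = math.atanh(r)
    half_width = NormalDist().inv_cdf(1 - (1 - conf_level) / 2) / math.sqrt(n - 3)
    return t_stat, df, (math.tanh(z - half_width), math.tanh(z + half_width))


# r is the printed estimate; the iris dataset has 150 observations (df 148 + 2).
t_stat, df, ci = pearson_t_and_ci(0.871754, 150)
print(f"t = {t_stat:.4f} | df: {df} | CI = ({ci[0]:.6f}, {ci[1]:.6f})")
```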
    

Pearson’s Chi-squared Test for Count Data

chisq_test(x, y=None, p=None, correct=True, rescale_p=False)
Parameters:
  • x (array_like) – A numeric list or 2D list (matrix). x and y can also both be lists.

  • y (array_like, optional) – Numeric data; ignored if x is a matrix. If x is a list, y should be a list of the same length. The default is None.

  • p (array_like, optional) – A list of probabilities of the same length as x. An error is raised if any entry of p is negative.

  • correct (bool, optional) – A boolean indicating whether to apply continuity correction when computing the test statistic for 2x2 tables: one half is subtracted from all abs(O-E) differences; however, the correction will not be bigger than the differences themselves. The default is True.

  • rescale_p (bool, optional) – A boolean; if True then p is rescaled (if necessary) to sum to 1. If rescale_p is False, and p does not sum to 1, an error is raised.

Returns

rtype:

TestResults

A TestResults instance containing the following attributes:

  • statistic: The value of the chi-squared test statistic.

  • df: The degrees of freedom of the approximate chi-squared distribution of the test statistic.

  • p_value: The p-value for the test.

  • method: A string indicating the type of test performed, and whether continuity correction was used.

  • expected: The expected counts under the null hypothesis.

Details

If x is a matrix with one row or column, or if x is a list and y is not given, then a goodness-of-fit test is performed (x is treated as a one-dimensional contingency table). The entries of x must be non-negative integers. In this case, the hypothesis tested is whether the population probabilities equal those in p, or are all equal if p is not given.
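The goodness-of-fit case can be sketched with the standard library as follows. The helper name is hypothetical and the code follows the textbook formula rather than the estyp source; the data are taken from Examples 2, 3.1 and 3.2 below.

```python
import math


def goodness_of_fit(observed, p=None, rescale_p=False):
    """Return (X-squared, df, expected counts) for a one-dimensional table."""
    k, n = len(observed), sum(observed)
    if p is None:
        p = [1 / k] * k                # all categories equally likely
    elif rescale_p:
        total = sum(p)
        p = [pi / total for pi in p]   # rescale so the probabilities sum to 1
    elif abs(sum(p) - 1) > 1e-8:
        raise ValueError("probabilities must sum to 1")
    expected = [n * pi for pi in p]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, k - 1, expected


stat31, df31, _ = goodness_of_fit([20, 15, 25])
# With 2 degrees of freedom the chi-squared survival function is exp(-x / 2).
p31 = math.exp(-stat31 / 2)
stat2, _, _ = goodness_of_fit([12, 5, 7, 7], p=[0.4, 0.4, 0.2, 0.2], rescale_p=True)
stat32, _, _ = goodness_of_fit([89, 37, 30, 28, 2], p=[0.40, 0.20, 0.20, 0.19, 0.01])
print(f"{stat31:.4f} {p31:.4f} {stat2:.4f} {stat32:.4f}")
```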

If x is a matrix with at least two rows and columns, it is taken as a two-dimensional contingency table: the entries of x must be non-negative integers. Otherwise, x and y must be lists of the same length; cases with None values are removed, the lists are treated as factors, and the contingency table is computed from these. Then Pearson’s chi-squared test is performed of the null hypothesis that the joint distribution of the cell counts in a 2-dimensional contingency table is the product of the row and column marginals.

The p-value is computed from the asymptotic chi-squared distribution of the test statistic; continuity correction is only used in the 2-by-2 case (if correct is True, the default).
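For a two-dimensional table, the expected counts are the products of the row and column totals divided by the grand total. The sketch below applies this to the table from Example 1; the helper name is hypothetical, no continuity correction is involved because the table is not 2-by-2, and the closed-form p-value holds only because this table has 2 degrees of freedom.

```python
import math


def independence_chisq(table):
    """Return (X-squared, df, expected counts) for a two-dimensional table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count of each cell under independence of rows and columns.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df, expected


stat, df, expected = independence_chisq([[762, 327, 468], [484, 239, 477]])
p_value = math.exp(-stat / 2)  # valid only because df == 2 here
print(f"X-squared = {stat:.4f} | df: {df} | p-value < 0.0001: {p_value < 0.0001}")
```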

Examples

  • Example 1: From Agresti (2007), p. 39

from estyp.testing import chisq_test

M = [[762, 327, 468], [484, 239, 477]]
result1 = chisq_test(M)
print(result1)

    Pearson's Chi-squared test
    X-squared = 30.0701 | df: 2 | p-value = <0.0001
    alternative hypothesis: true frequencies are not equal to expected frequencies
  • Example 2: Effect of rescale_p

x = [12, 5, 7, 7]
p = [0.4, 0.4, 0.2, 0.2]
result2 = chisq_test(x, p=p, rescale_p=True)
print(result2)

    Chi-squared test for given probabilities
    X-squared = 4.3226 | df: 3 | p-value = 0.2287
    alternative hypothesis: true frequencies are not equal to expected frequencies
  • Example 3.1: Testing for population probabilities

x = [20, 15, 25]
result31 = chisq_test(x)
print(result31)

    Chi-squared test for given probabilities
    X-squared = 2.5000 | df: 2 | p-value = 0.2865
    alternative hypothesis: true frequencies are not equal to expected frequencies
  • Example 3.2: A second example of testing for population probabilities

x = [89, 37, 30, 28, 2]
p = [0.40, 0.20, 0.20, 0.19, 0.01]
result32 = chisq_test(x, p=p)
print(result32)

    Chi-squared test for given probabilities
    X-squared = 5.7947 | df: 4 | p-value = 0.2150
    alternative hypothesis: true frequencies are not equal to expected frequencies
  • Example 4: Test of independence from two lists

x = [1, 2, 3, 4, 5, 6]
y = [6, 1, 2, 3, 4, 5]
result4 = chisq_test(x, y)
print(result4)

    Pearson's Chi-squared test
    X-squared = 30.0000 | df: 25 | p-value = 0.2243
    alternative hypothesis: true frequencies are not equal to expected frequencies