The linear_model module

Module contents

Logistic Regression

class LogisticRegression(X, y, penalty, dual, tol, C, fit_intercept, intercept_scaling, class_weight, random_state, solver, max_iter, multi_class, verbose, warm_start, n_jobs, l1_ratio)

This class implements a logistic regression model. It inherits from sklearn.linear_model.LogisticRegression and adds methods for computing confidence intervals, p-values, and model summaries.

__init__(X, y, penalty, dual, tol, C, fit_intercept, intercept_scaling, class_weight, random_state, solver, max_iter, multi_class, verbose, warm_start, n_jobs, l1_ratio)
Parameters:
  • X (Union[DataFrame, ndarray, None]) – A Pandas DataFrame or a NumPy array containing the model predictors.

  • y (Union[Series, ndarray, None]) – A Pandas Series or a NumPy array containing the model response.

  • penalty (Literal['l1', 'l2', 'elasticnet']) – The type of penalty to use. Can be one of "none" (default), "l1", "l2", or "elasticnet".

  • dual (bool) – Whether to use the dual formulation of the problem.

  • tol (float) – The tolerance for convergence.

  • C (int) – The inverse of the regularization strength. As in the parent scikit-learn class, smaller values mean stronger regularization.

  • fit_intercept (bool) – Whether to fit an intercept term.

  • intercept_scaling (int) – The scaling factor for the intercept term.

  • class_weight (Union[None, str, dict]) – None (default), “balanced” or a dictionary that maps class labels to weights.

  • random_state (int) – The random seed.

  • solver (Literal['lbfgs', 'liblinear', 'newton-cg', 'newton-cholesky', 'sag', 'saga']) – The solver to use. Can be one of "lbfgs" (default), "liblinear", "newton-cg", "newton-cholesky", "sag", or "saga".

  • max_iter (int) – The maximum number of iterations.

  • multi_class (Literal['auto', 'ovr', 'multinomial']) – The type of multi-class classification to use. Can be one of "auto", "ovr", or "multinomial".

  • verbose (int) – The verbosity level.

  • warm_start (bool) – Whether to reuse the solution of the previous call to fit as the starting point for the next fit.

  • n_jobs (int) – The number of jobs to use for parallel processing.

  • l1_ratio (Union[float, None]) – The elastic-net mixing parameter, between 0 and 1. Only used when penalty is "elasticnet".

fit()

Fits the model to the data supplied when the instance was created and returns the fitted model.

predict(new_data: DataFrame)

Predicts the class labels for new data.
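As a rough illustration of what predicting a class label involves for a binary logistic model, the sketch below applies the logistic function to a linear predictor and thresholds the probability at 0.5. The coefficients are copied from the example summary in this section. The helper names (`predict_proba_row`, `predict_label`) and the 0.5 threshold are assumptions made for illustration and are not part of the estyp API.

```python
import math

# Coefficients copied from the example summary in this section.
COEFS = {"Intercept": -0.200864, "x1": 0.032006, "x2": 0.438665}


def predict_proba_row(row):
    """Probability of class 1 for one observation (hypothetical helper)."""
    # Linear predictor: intercept plus weighted sum of the predictors.
    eta = COEFS["Intercept"] + COEFS["x1"] * row["x1"] + COEFS["x2"] * row["x2"]
    # Logistic (sigmoid) function maps the linear predictor to (0, 1).
    return 1.0 / (1.0 + math.exp(-eta))


def predict_label(row, threshold=0.5):
    """Class label obtained by thresholding the probability (assumed 0.5)."""
    return int(predict_proba_row(row) >= threshold)


label_a = predict_label({"x1": 0.0, "x2": 1.0})  # linear predictor is positive
label_b = predict_label({"x1": 0.0, "x2": 0.0})  # linear predictor is negative
print(label_a, label_b)
```

With a 0.5 threshold, the label is 1 exactly when the linear predictor is non-negative, which is why the second observation (intercept only) falls into class 0.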

conf_int(conf_level=0.95)

Calculates the confidence intervals for the model coefficients.
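The bounds printed in the example summary in this section are consistent with a normal-approximation (Wald) interval, estimate ± z·S.E. The sketch below reproduces the intercept row under that assumption. The function name `wald_interval` is hypothetical, and the library's actual implementation may differ.

```python
from statistics import NormalDist


def wald_interval(estimate, std_error, conf_level=0.95):
    """Return (lower, upper) for a normal-approximation interval."""
    # Two-sided critical value, about 1.96 when conf_level is 0.95.
    z_crit = NormalDist().inv_cdf(0.5 + conf_level / 2)
    margin = z_crit * std_error
    return estimate - margin, estimate + margin


# Estimate and S.E. of the Intercept row in the example summary.
lower, upper = wald_interval(-0.200864, 0.202894)
print(round(lower, 4), round(upper, 4))
```

Raising conf_level widens the interval, because the critical value grows while the standard error stays fixed.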

se()

Calculates the standard errors for the model coefficients.
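Standard errors are conventionally the square roots of the diagonal entries of the parameter covariance matrix. The sketch below shows that relationship on a small hypothetical matrix. The helper `standard_errors` and the numbers are illustrative assumptions, not values or code from the library.

```python
import math


def standard_errors(cov):
    """Square roots of the diagonal of a square covariance matrix."""
    return [math.sqrt(cov[i][i]) for i in range(len(cov))]


# Hypothetical 2x2 covariance matrix (not taken from the example below).
cov = [
    [0.04, 0.001],
    [0.001, 0.09],
]
se = standard_errors(cov)
print(se)  # roughly 0.2 and 0.3
```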

z_values()

Calculates the z-scores for the model coefficients.

p_values()

Calculates the p-values for the model coefficients.
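The z and Pr(>|z|) columns of the example summary in this section match the usual Wald test: z is the estimate divided by its standard error, and the p-value is the two-sided tail probability under a standard normal distribution. The sketch below checks this for the x2 row. The function names are hypothetical and only mirror the method names for readability.

```python
import math


def z_value(estimate, std_error):
    """Wald z-statistic for a single coefficient."""
    return estimate / std_error


def two_sided_p_value(z):
    """P(|Z| > |z|) for a standard normal Z, via the complementary error function."""
    return math.erfc(abs(z) / math.sqrt(2))


# Estimate and S.E. of the x2 row in the example summary.
z = z_value(0.438665, 0.344263)
p = two_sided_p_value(z)
print(round(z, 4), round(p, 4))
```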

summary(conf_level=0.95)

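Prints a summary of the model: a table with one row per coefficient showing the estimate, standard error, z-value, p-value, and the confidence bounds at conf_level.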

from_formula(formula, data)

Class method that creates an instance from a formula string (for example "y ~ x1 + x2") and a DataFrame containing the columns the formula refers to.

params

Returns the estimated values for model parameters.

aic

Calculates the Akaike information criterion (AIC) for the model.

bic

Calculates the Bayesian information criterion (BIC) for the model.
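For reference, the textbook definitions are AIC = 2k − 2·ln L and BIC = k·ln(n) − 2·ln L, where k is the number of estimated parameters, n the number of observations, and ln L the maximized log-likelihood. The sketch below applies them to a small hypothetical set of labels and fitted probabilities. Whether the library counts parameters in exactly this way is an assumption, and all function names are illustrative.

```python
import math


def bernoulli_log_likelihood(y, probs):
    """Log-likelihood of binary labels y given fitted probabilities of class 1."""
    return sum(
        math.log(p) if label == 1 else math.log(1.0 - p)
        for label, p in zip(y, probs)
    )


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k ln(n) - 2 ln L."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


# Hypothetical labels and fitted probabilities, for illustration only.
y = [1, 0, 1, 1]
probs = [0.8, 0.3, 0.6, 0.9]
ll = bernoulli_log_likelihood(y, probs)
print(aic(ll, n_params=2), bic(ll, n_params=2, n_obs=len(y)))
```

Lower values indicate a better trade-off between fit and complexity. BIC penalizes each parameter more heavily than AIC once ln(n) exceeds 2, that is, for n of 8 or more.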

cov_matrix

Returns the covariance matrix for model parameters.

Examples

import numpy as np
import pandas as pd
from estyp.linear_model import LogisticRegression

np.random.seed(123)
data = pd.DataFrame({
   "y": np.random.randint(2, size=100),
   "x1": np.random.uniform(-1, 1, size=100),
   "x2": np.random.uniform(-1, 1, size=100),
})

formula = "y ~ x1 + x2"
spec = LogisticRegression.from_formula(formula, data)
model = spec.fit()

print(model.summary())
Made by Esteban Rucán. Contact me in LinkedIn: https://www.linkedin.com/in/estebanrucan/
           Estimate      S.E.         z  Pr(>|z|)   [Lower,    Upper]
Intercept -0.200864  0.202894 -0.989996  0.322176 -0.598530  0.196801
x1         0.032006  0.375254  0.085292  0.932030 -0.703478  0.767490
x2         0.438665  0.344263  1.274215  0.202587 -0.236078  1.113407