Signatures
- tka.metrics.signatures.activity_score(ss: int, rep_corr_coeff: float, num_features: int)
Returns the activity score (originally termed the transcriptional activity score, TAS), computed from the signature strength (the number of feature hits), the replicate correlation coefficient, and the number of features. See https://clue.io/connectopedia/signature_quality_metrics for more information.
- Parameters:
ss (int) – signature strength
rep_corr_coeff (float) – replicate correlation coefficient
num_features (int) – number of features in each signature
- Returns:
activity score
- Return type:
float
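The Connectopedia page linked above describes the score as the geometric mean of signature strength and replicate correlation, normalized by the number of features. The following is a minimal, dependency-free sketch of that formula. `activity_score_sketch` is a hypothetical stand-in, not the library function, and clamping negative correlations to zero is an assumption taken from the Connectopedia description rather than from the `tka` source.

```python
import math


def activity_score_sketch(ss: int, rep_corr_coeff: float, num_features: int) -> float:
    """Hypothetical stand-in: sqrt(ss * max(cc, 0) / num_features)."""
    if num_features <= 0:
        raise ValueError("num_features must be positive")
    # Assumption: a negative replicate correlation contributes nothing.
    clamped_cc = max(rep_corr_coeff, 0.0)
    return math.sqrt(ss * clamped_cc / num_features)


# 100 hits out of 400 features with a replicate correlation of 0.64.
score = activity_score_sketch(ss=100, rep_corr_coeff=0.64, num_features=400)
```

Under this formula the score stays between 0 and 1 whenever `ss` does not exceed `num_features` and the correlation does not exceed 1.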
- tka.metrics.signatures.collapse_and_adjust_signature(df_replicates: DataFrame, rep_corr_coeff: float, spearman_coeffs: dict)
Collapses and adjusts the signature as described at https://clue.io/connectopedia/replicate_collapse. Weighting is determined via Spearman correlation between each pair of replicate profiles from each perturbagen experiment in the level 4 data. Since Spearman correlation operates on ranked lists, the raw z-scores are first converted to ranks from 1 to n within a replicate, where n is the number of genes in the replicates. The weight of each replicate is then calculated as the normalized sum of its correlations with the other replicates. These normalized values act as multipliers for the respective replicate vectors.
- Parameters:
df_replicates (pd.DataFrame) – a pd.DataFrame whose columns are z-score normalized features and whose index identifies the replicate samples
rep_corr_coeff (float) – replicate correlation coefficient - first return value of replicate_correlation_coefficient()
spearman_coeffs (dict) – dict of pairwise replicate correlation coefficients - second return value of replicate_correlation_coefficient()
- Returns:
adjusted signature of shape (df_replicates.shape[1],)
- Return type:
pd.Series
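The weighting step described above can be sketched without any dependencies. This is an illustration under stated assumptions, not the library's implementation: `collapse_signature_sketch` is a hypothetical name, replicates are passed as a plain dict of lists, and the pairwise coefficients are assumed to be keyed by `(replicate_a, replicate_b)` tuples. The sketch covers only the weighted collapse and ignores `rep_corr_coeff`, since the description does not say how that value enters the adjustment.

```python
def collapse_signature_sketch(replicates, pair_coeffs):
    """Weighted average of replicate profiles (hypothetical stand-in).

    replicates:  {replicate name: list of z-scores, one per feature}
    pair_coeffs: {(name_a, name_b): Spearman correlation} (assumed key layout)
    """
    names = list(replicates)
    # Sum each replicate's correlations with all the others.
    totals = {name: 0.0 for name in names}
    for (a, b), rho in pair_coeffs.items():
        totals[a] += rho
        totals[b] += rho
    norm = sum(totals.values())
    if norm == 0:
        raise ValueError("correlations sum to zero; weights are undefined")
    # Normalize so the weights sum to 1.
    weights = {name: totals[name] / norm for name in names}
    n_features = len(replicates[names[0]])
    # Each weight multiplies its replicate vector; the results are summed.
    return [
        sum(weights[name] * replicates[name][i] for name in names)
        for i in range(n_features)
    ]


replicates = {"rep1": [1.0, 2.0], "rep2": [3.0, 4.0], "rep3": [5.0, 6.0]}
pair_coeffs = {("rep1", "rep2"): 0.8, ("rep1", "rep3"): 0.4, ("rep2", "rep3"): 0.8}
signature = collapse_signature_sketch(replicates, pair_coeffs)
```

Here `rep2` correlates best with the other two, so it receives the largest weight (0.4 versus 0.3 for each of the others).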
- tka.metrics.signatures.replicate_correlation_coefficient(df_replicates: DataFrame)
Computes the replicate correlation coefficient (CC) as described at https://clue.io/connectopedia/signature_quality_metrics. Replicate correlation measures how consistent the replicates are in a given experiment. It is computed as the 75th quantile of all pairwise Spearman correlations between replicate level 4 profiles. A higher CC indicates that the given treatment induced a consistent response.
- Parameters:
df_replicates (pd.DataFrame) – a pd.DataFrame whose columns are z-score normalized features and whose index identifies the replicate samples
- Raises:
ValueError – if the shapes are invalid
- Returns:
the replicate correlation coefficient, followed by a dictionary whose values are the pairwise correlation coefficients and whose keys are the replicates’ indices
- Return type:
tuple(float, dict)
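The computation described above (rank each replicate, correlate every pair, take the 75th quantile) can be sketched with the standard library alone. All names here are hypothetical stand-ins. The tuple keys of the returned dict and the linear-interpolation quantile are assumptions, so the library may differ in both.

```python
import math
from itertools import combinations


def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1  # mean of the 1-based positions
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)  # undefined for constant profiles


def quantile(values, q):
    """Quantile with linear interpolation between neighbouring sorted values."""
    ordered = sorted(values)
    position = q * (len(ordered) - 1)
    lower, upper = math.floor(position), math.ceil(position)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (position - lower)


def replicate_correlation_sketch(replicates):
    """Return (75th-quantile Spearman correlation, dict of pairwise values)."""
    if len(replicates) < 2:
        raise ValueError("need at least two replicates")
    if len({len(profile) for profile in replicates.values()}) != 1:
        raise ValueError("all replicates must have the same number of features")
    pair_coeffs = {
        (a, b): spearman(replicates[a], replicates[b])
        for a, b in combinations(replicates, 2)
    }
    return quantile(list(pair_coeffs.values()), 0.75), pair_coeffs


profiles = {
    "rep1": [0.1, 1.2, -0.7, 2.3],
    "rep2": [0.3, 1.0, -0.2, 1.9],
    "rep3": [2.0, 0.5, 1.1, -1.4],
}
cc, pair_coeffs = replicate_correlation_sketch(profiles)
```

In this toy data `rep1` and `rep2` order the features identically, while `rep3` disagrees with both, which pulls the 75th quantile well below the best pairwise value.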
- tka.metrics.signatures.signature_strength(adjusted_signature: Series, population_means: Series, population_stds: Series, num_stds: float = 1.96)
Computes the signature strength: the total number of features that deviate from the population mean by more than num_stds standard deviations (1.96 by default, i.e. roughly 2 STDs). See https://clue.io/connectopedia/signature_quality_metrics for more information.
- Parameters:
adjusted_signature (pd.Series) – return value of collapse_and_adjust_signature() of shape (num_features,)
population_means (pd.Series) – mean values of all features for the entire plate population. The number of features must exactly match the number of features in adjusted_signature.
population_stds (pd.Series) – std values of all features for the entire plate population. The number of features must exactly match the number of features in adjusted_signature.
num_stds (float) – number of standard deviations in either direction required for a feature to be considered a hit (defaults to 1.96)
- Returns:
total number of features that deviate from the population mean by more than num_stds standard deviations
- Return type:
int
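The hit count can be illustrated with a short, dependency-free sketch. `signature_strength_sketch` is a hypothetical stand-in that takes plain lists instead of pd.Series, and the strict "greater than" comparison is an assumption based on the wording "more than".

```python
def signature_strength_sketch(signature, means, stds, num_stds=1.96):
    """Count features lying more than num_stds standard deviations from the mean."""
    if not (len(signature) == len(means) == len(stds)):
        raise ValueError("signature, means and stds must have the same length")
    # A feature is a hit if it deviates in either direction.
    return sum(
        1
        for value, mean, std in zip(signature, means, stds)
        if abs(value - mean) > num_stds * std
    )


signature = [0.0, 2.5, -3.0, 1.0]
means = [0.0, 0.0, 0.0, 0.0]
stds = [1.0, 1.0, 1.0, 1.0]
hits_default = signature_strength_sketch(signature, means, stds)
hits_strict = signature_strength_sketch(signature, means, stds, num_stds=2.75)
```

Raising `num_stds` makes the threshold stricter, so the count can only stay the same or drop.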