Sonia modules in a Python script
To incorporate the core algorithm into an analysis pipeline (or to write your own script wrappers), all you need to do is import the modules. Each module defines a class on which only a few methods need to be called.
The modules are:
Module name | Classes |
---|---|
evaluate_model.py | EvaluateModel |
sequence_generation.py | SequenceGeneration |
plotting.py | Plotter |
sonia_leftpos_rightpos.py | SoniaLeftposRightpos |
sonia_length_pos.py | SoniaLengthPos |
sonia_vjl.py | SoniaVJL |
sonia.py | Sonia |
utils.py | N/A (contains util functions) |
The classes of most interest are EvaluateModel (to evaluate sequences), SequenceGeneration (to generate sequences), SoniaLeftposRightpos or SoniaLengthPos (to initialise and infer the models), and Plotter (to plot results).
You can find some examples in the examples folder; here we demonstrate some basic usage. Data and generated-sequence files are included in the GitHub repository to demonstrate usage; however, we recommend using larger sets of both data and generated sequences for an accurate inference.
import os
import sonia
from sonia.sonia_leftpos_rightpos import SoniaLeftposRightpos
from sonia.plotting import Plotter
from sonia.evaluate_model import EvaluateModel
from sonia.sequence_generation import SequenceGeneration
work_folder = 'examples/' # where data files are and output folder should be
data_file = work_folder + 'data_seqs.txt' # file with data sequences
gen_file = work_folder + 'gen_seqs.txt' # file with generated sequences if not generated internally
output_folder = work_folder + 'selection/' # location to save model
# load lists of sequences with gene specification
with open(data_file) as f: # this assumes the data sequences are in a semicolon-separated text file, with gene specification
    data_seqs = [x.strip().split(';') for x in f]
gen_seqs = []
with open(gen_file) as f: # this assumes the generated sequences are in a semicolon-separated text file, with gene specification
    gen_seqs = [x.strip().split(';') for x in f]
# create the model object, load the sequences and set the features to learn
qm = SoniaLeftposRightpos(data_seqs=data_seqs, gen_seqs=gen_seqs)
# infer model
qm.infer_selection()
# plot results
pl=Plotter(qm)
pl.plot_model_learning('model_learning.png')
pl.plot_vjl('marginals.png')
pl.plot_logQ('log_Q.png')
# save the model
if not os.path.isdir(output_folder):
    os.mkdir(output_folder)
qm.save_model(output_folder + 'SONIA_model_example')
# load default model (human TRA)
model_dir=os.path.join(os.path.dirname(sonia.sonia_leftpos_rightpos.__file__),'default_models','human_T_alpha')
qm=SoniaLeftposRightpos(load_dir=model_dir,chain_type='human_T_alpha')
# load evaluation and generation classes
ev=EvaluateModel(sonia_model=qm)
sq=SequenceGeneration(sonia_model=qm)
# generate seqs pre
seqs_pre=sq.generate_sequences_pre(10)
# generate seqs post
seqs_post = sq.generate_sequences_post(10)
print(seqs_post)
# evaluate Q, pgen and ppost of sequences
# NB: data has to be in format: list(array((n_seqs,3 or more))). Check output of generate_sequences_post method for an example (4th column is not used in the evaluate_seqs method).
qs,pgens,pposts= ev.evaluate_seqs(seqs_post)
print(pgens,pposts,qs)
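The file-loading step in the example above splits every line on semicolons, so a blank line (for instance a trailing newline at the end of the file) would produce a record containing a single empty string. The following is a minimal sketch of a slightly more defensive parser. The helper name `parse_seq_lines` is hypothetical and not part of the package, and the sample records are made up purely for illustration.

```python
def parse_seq_lines(lines, sep=";"):
    """Split each non-empty line into a list of fields.

    `lines` can be any iterable of strings, e.g. an open file object.
    This helper is an illustrative assumption, not part of the sonia package.
    """
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            # skip blank lines instead of emitting [''] records
            continue
        # strip whitespace around each field as well
        records.append([field.strip() for field in line.split(sep)])
    return records


# Made-up records, used only to show the behaviour.
example_lines = [
    "CASSLGQETQYF;TRBV7-9;TRBJ2-5\n",
    "\n",
    "CASSPGTEAFF; TRBV5-1 ;TRBJ1-1\n",
]
example_records = parse_seq_lines(example_lines)
print(example_records)
```

Because the function accepts any iterable of strings, it could be used in place of the list comprehensions above by passing the open file object `f` directly.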
Additional documentation of the modules is found in their docstrings (accessible either through pydoc or by using help() within the Python interpreter).
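As a small sketch of the programmatic route, the standard-library `pydoc` module can render a module's docstrings as plain text. The helper `module_doc` below is a hypothetical convenience wrapper, not something the package provides; it returns `None` when the requested module cannot be imported.

```python
import importlib
import pydoc


def module_doc(module_name):
    """Return the plain-text pydoc page for a module, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    # pydoc.plaintext renders the page without terminal bold/overstrike characters
    return pydoc.render_doc(module, renderer=pydoc.plaintext)


# Demonstrated on a standard-library module so the snippet runs anywhere.
doc_text = module_doc("json")
print(doc_text.splitlines()[0])
```

With the package installed, the same call with a module name such as `"sonia.evaluate_model"` should return that module's documentation, which is equivalent to running `help()` on the imported module in an interactive session.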