3. SVM_predict.py

3.1. Description

Build SVM model from “train_file” and then predict cases in “data_file”

3.2. Options

--version

show program’s version number and exit

-h, --help

show this help message and exit

-t TRAIN_FILE, --train_file=TRAIN_FILE

Tab or space separated file (for tranining purpose, to build SVM model). The first column contains sample IDs; the second column contains sample labels in integer (must be 0 or 1); the third column contains sample label names (string, must be consistent with column-2). The remaining columns contain featuers used to build SVM model.

-d DATA_FILE, --data_file=DATA_FILE

Tab or space separated file (new data to predict the label). The first column contains sample IDs; the second column contains sample labels in integer (must be 0 or 1); the third column contains sample label names (string, must be consistent with column-2). The remaining columns contain featuers used to build SVM model.

-C C_VALUE, --cvalue=C_VALUE

C value. default=1.0

-k S_KERNEL, --kernel=S_KERNEL

Specifies the kernel type to be used in the algorithm. It must be one of ‘linear’, ‘poly’, ‘rbf’, ‘sigmoid’, ‘precomputed’ or a callable. If none is given, ‘rbf’ will be used. default=linear

3.3. Input files format

TRAIN_FILE and DATA_FILE use the same format as below. the 2nd and 3rd columns in DATA_FILE can be consideres as Original Label and Original Name.

ID

Label

Label_name

feature_1

feature_2

feature_3

feature_n

sample_1

1

WT

1560

795

0.9716

feature_n

sample_2

1

WT

784

219

0.4087

feature_n

sample_3

1

WT

2661

2268

1.1691

feature_n

sample_4

0

Mut

643

198

0.5458

feature_n

sample_5

0

Mut

534

87

1.0545

feature_n

sample_6

0

Mut

332

75

0.5115

feature_n

3.4. Command

$ python3  SVM_predict.py -t lung_CES_5features.tsv  -d lung_CES_data_to_predict.tsv -C 10

3.5. Output to screen

TCGA_ID        Ori_Label       Ori_name        Predict_Label   Predict_Name
TCGA-05-4244   unknown TP53_WT 1       Truncating
TCGA-05-4249   unknown TP53_WT 1       Truncating
TCGA-05-4250   unknown TP53_WT 1       Truncating
TCGA-05-4389   unknown TP53_WT 1       Truncating
TCGA-05-4390   unknown TP53_WT 1       Truncating
TCGA-05-4403   unknown TP53_WT 1       Truncating
TCGA-38-7271   unknown TP53_WT 1       Truncating
TCGA-38-A44F   unknown TP53_WT 0       Normal
TCGA-39-5030   unknown TP53_WT 1       Truncating