Module save
A module containing the Saver class, used for storing DataFrames with molecules on disk.
- class npfc.save.SafeHDF5Store(*args, **kwargs)
Implements a safe HDFStore by obtaining a file lock. If the lock cannot be obtained, concurrent writes queue until it is released.
Adapted from: https://stackoverflow.com/questions/41231678/obtaining-a-exclusive-lock-when-writing-to-an-hdf5-file
The constructor initializes the store and obtains the file lock.
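The locking idea can be sketched with the standard library alone. This is a minimal sketch and not npfc's actual code. The helper names `exclusive_lock` and `append_line`, the `.lock` suffix and the polling interval are all assumptions. It relies on `os.open` with `O_CREAT | O_EXCL`, which fails atomically when the lock file already exists. A second writer therefore sleeps and retries until the first one releases the lock. A plain text file stands in for the HDF5 store so that the example runs without PyTables.

```python
import os
import tempfile
import threading
import time
from contextlib import contextmanager


@contextmanager
def exclusive_lock(path, probe_interval=0.01):
    """Hypothetical helper: hold an exclusive lock file next to ``path``."""
    lock_path = path + ".lock"  # assumed naming convention for the lock file
    while True:
        try:
            # O_CREAT | O_EXCL fails atomically if the lock file already exists
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            # another writer holds the lock: wait, then try again (writes queue up)
            time.sleep(probe_interval)
    try:
        yield
    finally:
        # release: close and delete the lock file so the next writer can proceed
        os.close(fd)
        os.remove(lock_path)


def append_line(path, text):
    """Hypothetical writer that appends one line while holding the lock."""
    with exclusive_lock(path):
        with open(path, "a") as handle:
            handle.write(text + "\n")


if __name__ == "__main__":
    target = os.path.join(tempfile.mkdtemp(), "records.txt")
    writers = [threading.Thread(target=append_line, args=(target, f"writer {i}"))
               for i in range(5)]
    for w in writers:
        w.start()
    for w in writers:
        w.join()
    with open(target) as handle:
        print(handle.read())
```

In an HDFStore subclass, the acquire loop would presumably run before `HDFStore.__init__`, and the release step would run when the store is closed. The pattern only protects writers that follow the same lock-file convention. A process that crashes while holding the lock leaves a stale `.lock` file behind.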
- npfc.save.file(df, output_file, shuffle=False, random_seed=None, chunk_size=None, encode=True, col_mol='mol', col_id='idm', csv_sep='|')
A function for saving DataFrames with molecules to different file types. This is a handy way of using the Saver class without having to keep a Saver object.
- Parameters
  - df (DataFrame) – the input DataFrame
  - output_file (str) – the output file
  - shuffle (bool) – randomize records
  - random_seed (Optional[int]) – a number for reproducing the shuffling
  - chunk_size (Optional[int]) – the maximum number of records per chunk. If this value is unset, no chunking is performed; otherwise each chunk filename gets a suffix: file_XXX.ext.
  - encode (bool) – encode RDKit Mol objects and other objects in predefined columns as base64 strings.
  - col_mol (str) – if molecules need to be encoded, the encoding is performed on this column.
  - csv_sep (str) – separator to use in case of csv output
- Returns
  the list of output files with their number of records
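The shuffling and chunking steps described for `file()` can be sketched with pandas. This is a hypothetical sketch and not npfc's implementation. The helper names `split_for_output` and `write_csv_chunks` are invented for illustration. The chunk numbering (starting at 1, zero-padded to three digits) is an assumption about the `file_XXX.ext` pattern. Base64 encoding of molecules is omitted.

```python
import os

import pandas as pd


def split_for_output(df, output_file, shuffle=False, random_seed=None, chunk_size=None):
    """Hypothetical helper: return (filename, chunk) pairs for a DataFrame."""
    if shuffle:
        # frac=1 returns every row in random order; random_state makes it reproducible
        df = df.sample(frac=1, random_state=random_seed)
    if chunk_size is None:
        # no chunking: a single output file with all records
        return [(output_file, df)]
    stem, ext = os.path.splitext(output_file)
    chunks = []
    for n, start in enumerate(range(0, len(df), chunk_size), start=1):
        # assumed naming scheme: file_001.ext, file_002.ext, ...
        chunks.append((f"{stem}_{n:03d}{ext}", df.iloc[start:start + chunk_size]))
    return chunks


def write_csv_chunks(chunks, csv_sep="|"):
    """Hypothetical writer: save each chunk and return [filename, record count] pairs."""
    written = []
    for filename, chunk in chunks:
        chunk.to_csv(filename, sep=csv_sep, index=False)
        written.append([filename, len(chunk)])
    return written
```

For example, 5 records with `chunk_size=2` would produce `out_001.csv`, `out_002.csv` and `out_003.csv`, holding 2, 2 and 1 records respectively. The return value of `write_csv_chunks` mirrors the documented return value of `file()`: a list of output files paired with their record counts.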
- npfc.save.random()
Returns a random floating-point number x in the interval [0, 1).
- npfc.save.write_sdf(df, out, molColName='ROMol', idName=None, properties=None, allNumeric=False)
Redefinition of PandasTools.WriteSDF, because RDKit 2019.03.1 is incompatible with Pandas 0.25.1.
Write an SD file for the molecules in the DataFrame. DataFrame columns can be exported as SDF tags if they are listed in “properties”; “properties=list(df.columns)” exports all columns. The “allNumeric” flag automatically includes all numeric columns in the output. The user must make sure that the correct data type is assigned to each column. “idName” selects a column to serve as the molecule title; setting it to “RowID” uses the DataFrame row key as the title.
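The column-selection rules above can be sketched without RDKit. This is an illustrative sketch only, and `select_sdf_tags` and `record_titles` are hypothetical helpers. It assumes that the molecule column and the title column are not repeated as tags, which matches the behaviour of RDKit's `PandasTools.WriteSDF`. It also assumes that "numeric" means integer or float dtypes. The actual writing of records would still go through RDKit's `SDWriter`.

```python
import pandas as pd
from pandas.api.types import is_float_dtype, is_integer_dtype


def select_sdf_tags(df, mol_col="ROMol", id_name=None, properties=None, all_numeric=False):
    """Hypothetical helper: list the columns that would be written as SD tags."""
    tags = list(properties) if properties is not None else []
    if all_numeric:
        # integer and float columns are added; other dtypes need an explicit entry
        for col in df.columns:
            if (is_integer_dtype(df[col]) or is_float_dtype(df[col])) and col not in tags:
                tags.append(col)
    # the molecule column and the title column are not repeated as tags
    return [col for col in tags if col not in (mol_col, id_name)]


def record_titles(df, id_name=None):
    """Hypothetical helper: title (_Name) for each record, or None to keep the existing one."""
    if id_name is None:
        return [None] * len(df)
    if id_name == "RowID":
        # use the DataFrame index (row key) as the title
        return [str(key) for key in df.index]
    return [str(value) for value in df[id_name]]
```

With a DataFrame holding `ROMol`, `name`, `mw` (float) and `n_atoms` (int) columns, `allNumeric` alone selects `mw` and `n_atoms`. `properties=list(df.columns)` combined with `idName="name"` selects the same two columns, because the molecule and title columns are dropped.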