File I/O & Processing (mpes.fprocessing)
¶
Custom methods for ARPES data I/O and standard data processing (filtering, dewarping, etc.).
@author: R. Patrick Xian, L. Rettig
-
mpes.fprocessing.
_arraysum
(array_a, array_b)¶ Calculate the sum of two arrays.
-
mpes.fprocessing.
_hist1d_numba_seq
(sample, bins, ranges)¶ 1D binning function, pre-compiled by Numba for performance. Behaves much like numpy.histogramdd, but calculates and returns unsigned 32-bit integers.
-
mpes.fprocessing.
_hist2d_numba_seq
(sample, bins, ranges)¶ 2D binning function, pre-compiled by Numba for performance. Behaves much like numpy.histogramdd, but calculates and returns unsigned 32-bit integers.
-
mpes.fprocessing.
_hist3d_numba_seq
(sample, bins, ranges)¶ 3D binning function, pre-compiled by Numba for performance. Behaves much like numpy.histogramdd, but calculates and returns unsigned 32-bit integers.
-
mpes.fprocessing.
_hist4d_numba_seq
(sample, bins, ranges)¶ 4D binning function, pre-compiled by Numba for performance. Behaves much like numpy.histogramdd, but calculates and returns unsigned 32-bit integers.
-
mpes.fprocessing.
applyJitter
(df, amp, col, type)¶ Add jittering to a dataframe column.
- Parameters
- dfdataframe
Dataframe to add noise/jittering to.
- ampnumeric
Amplitude scaling for the jittering noise.
- colstr
Name of the column to add jittering to.
- Return
Uniformly distributed noise vector with specified amplitude and size.
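For illustration, a minimal sketch of calling applyJitter on an in-memory pandas dataframe; the column name, the amplitude, and the 'uniform' value of the type argument are assumptions, not values taken from this page:

    import numpy as np
    import pandas as pd
    from mpes import fprocessing as fp

    # Hypothetical events table with a detector coordinate column.
    df = pd.DataFrame({'X': np.random.randint(0, 1800, size=1000).astype(float)})

    # Add jitter of amplitude 0.5 to the 'X' column; 'uniform' is an assumed
    # value for the otherwise undocumented 'type' argument.
    fp.applyJitter(df, amp=0.5, col='X', type='uniform')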
-
mpes.fprocessing.
binDataframe
(df, ncores=4, axes=None, nbins=None, ranges=None, binDict=None, pbar=True, jittered=True, pbenv='classic', **kwds)¶ Calculate multidimensional histogram from columns of a dask dataframe. Prof. Yves Acremann’s method.
- Parameters
- axes(list of) strings | None
Names the axes to bin.
- nbins(list of) int | None
Number of bins along each axis.
- ranges(list of) tuples | None
Ranges of binning along every axis.
- binDictdict | None
Dictionary with specifications of axes, nbins and ranges. If binDict is not None, it will override the specifications from the other arguments.
- pbarbool | True
Option to display a progress bar.
- pbenvstr | ‘classic’
Progress bar environment (‘classic’ for generic version and ‘notebook’ for notebook compatible version).
- jitteredbool | True
Option to add histogram jittering during binning.
- **kwdskeyword arguments
See keyword arguments in
mpes.fprocessing.hdf5Processor.localBinning()
.
- Return
- histdictdict
Dictionary containing binned data and the axes values (if
ret = True
).
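A hedged usage sketch for binDataframe on a dask dataframe; the column names, bin numbers and ranges below are placeholders:

    import numpy as np
    import pandas as pd
    import dask.dataframe as dd
    from mpes import fprocessing as fp

    # Synthetic single-event table converted to a dask dataframe.
    events = pd.DataFrame({'X': np.random.uniform(0, 1800, 10000),
                           'Y': np.random.uniform(0, 1800, 10000),
                           't': np.random.uniform(68000, 74000, 10000)})
    ddf = dd.from_pandas(events, npartitions=4)

    # Multidimensional binning over three example axes.
    histdict = fp.binDataframe(ddf, ncores=4,
                               axes=['X', 'Y', 't'],
                               nbins=[128, 128, 256],
                               ranges=[(0, 1800), (0, 1800), (68000, 74000)],
                               pbar=True, jittered=True)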
-
mpes.fprocessing.
binDataframe_fast
(df, ncores=4, axes=None, nbins=None, ranges=None, binDict=None, pbar=True, jittered=True, pbenv='classic', jpart=True, **kwds)¶ Calculate multidimensional histogram from columns of a dask dataframe.
- Parameters
- axes(list of) strings | None
Names the axes to bin.
- nbins(list of) int | None
Number of bins along each axis.
- ranges(list of) tuples | None
Ranges of binning along every axis.
- binDictdict | None
Dictionary with specifications of axes, nbins and ranges. If binDict is not None, it will override the specifications from the other arguments.
- pbarbool | True
Option to display a progress bar.
- pbenvstr | ‘classic’
Progress bar environment (‘classic’ for generic version and ‘notebook’ for notebook compatible version).
- jitteredbool | True
Option to add histogram jittering during binning.
- **kwdskeyword arguments
See keyword arguments in
mpes.fprocessing.hdf5Processor.localBinning()
.
- Return
- histdictdict
Dictionary containing binned data and the axes values (if
ret = True
).
-
mpes.fprocessing.
binDataframe_lean
(df, ncores=4, axes=None, nbins=None, ranges=None, binDict=None, pbar=True, jittered=True, pbenv='classic', **kwds)¶ Calculate multidimensional histogram from columns of a dask dataframe.
- Parameters
- axes(list of) strings | None
Names the axes to bin.
- nbins(list of) int | None
Number of bins along each axis.
- ranges(list of) tuples | None
Ranges of binning along every axis.
- binDictdict | None
Dictionary with specifications of axes, nbins and ranges. If binDict is not None, it will override the specifications from the other arguments.
- pbarbool | True
Option to display a progress bar.
- pbenvstr | ‘classic’
Progress bar environment (‘classic’ for generic version and ‘notebook’ for notebook compatible version).
- jitteredbool | True
Option to add histogram jittering during binning.
- **kwdskeyword arguments
See keyword arguments in
mpes.fprocessing.hdf5Processor.localBinning()
.
- Return
- histdictdict
Dictionary containing binned data and the axes values (if
ret = True
).
-
mpes.fprocessing.
binDataframe_numba
(df, ncores=4, axes=None, nbins=None, ranges=None, binDict=None, pbar=True, jittered=True, pbenv='classic', jpart=True, **kwds)¶ Calculate multidimensional histogram from columns of a dask dataframe.
- Parameters
- axes(list of) strings | None
Names the axes to bin.
- nbins(list of) int | None
Number of bins along each axis.
- ranges(list of) tuples | None
Ranges of binning along every axis.
- binDictdict | None
Dictionary with specifications of axes, nbins and ranges. If binDict is not None, it will override the specifications from the other arguments.
- pbarbool | True
Option to display a progress bar.
- pbenvstr | ‘classic’
Progress bar environment (‘classic’ for generic version and ‘notebook’ for notebook compatible version).
- jitteredbool | True
Option to add histogram jittering during binning.
- **kwdskeyword arguments
See keyword arguments in
mpes.fprocessing.hdf5Processor.localBinning()
.
- Return
- histdictdict
Dictionary containing binned data and the axes values (if
ret = True
).
-
mpes.fprocessing.
binPartition
(partition, binaxes, nbins, binranges, jittered=False, jitter_params={})¶ Bin the data within a file partition (e.g. dask dataframe).
- Parameters
- partitiondataframe partition
Partition of a dataframe.
- binaxeslist
List of axes to bin.
- nbinslist
Number of bins for each binning axis.
- binrangeslist
The range of each axis to bin.
- jitteredbool | False
Option to include jittering in binning.
- jitter_paramsdict | {}
Parameters used to set jittering.
- Return
- hist_partitionndarray
Histogram from the binning process.
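A minimal sketch of binning a single partition, assuming a plain pandas dataframe stands in for the dask partition; axis names, bin numbers and ranges are illustrative:

    import numpy as np
    import pandas as pd
    from mpes import fprocessing as fp

    # Example stand-in for one dataframe partition.
    part = pd.DataFrame({'X': np.random.uniform(0, 1800, 5000),
                         'Y': np.random.uniform(0, 1800, 5000)})

    # Histogram of the partition over two example axes.
    hist_partition = fp.binPartition(part,
                                     binaxes=['X', 'Y'],
                                     nbins=[64, 64],
                                     binranges=[(0, 1800), (0, 1800)])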
-
mpes.fprocessing.
binPartition_numba
(partition, binaxes, nbins, binranges, jittered=False, jitter_params={})¶ Bin the data within a file partition (e.g. dask dataframe).
- Parameters
- partitiondataframe partition
Partition of a dataframe.
- binaxeslist
List of axes to bin.
- nbinslist
Number of bins for each binning axis.
- binrangeslist
The range of each axis to bin.
- jitteredbool | False
Option to include jittering in binning.
- jitter_paramsdict | {}
Parameters used to set jittering.
- Return
- hist_partitionndarray
Histogram from the binning process.
-
class
mpes.fprocessing.
dataframeProcessor
(datafolder, paramfolder='', datafiles=[], ncores=None)¶ Process the parquet files converted from single-event data.
-
_addBinners
(axes=None, nbins=None, ranges=None, binDict=None)¶ Construct the binning parameters within an instance.
-
appendColumn
(colnames, colvals)¶ Append columns to dataframe.
- Parameters
- colnameslist/tuple
New column names.
- colvalsnumpy array/list
Entries of the new columns.
-
appendEAxis
(E0, **kwds)¶ Calculate and append the E axis to the events dataframe. This method can be reused.
- Parameter
- E0numeric
Time-of-flight offset.
-
appendKAxis
(x0, y0, X='X', Y='Y', newX='kx', newY='ky', **kwds)¶ Calculate and append the k axis coordinates (kx, ky) to the events dataframe. This method can be reused.
-
appendMarker
(source_name='ADC', mapping=<function multithresh>, marker_name='Marker', lower_bounds=[], upper_bounds=[], thresholds=[], update='append', **kwds)¶ Append markers to specific ranges in a source column. The mapping of the marker is usually a piecewise-defined function. This enables binning in nonequivalent steps in a subsequent binning step.
-
appendRow
(folder=None, df=None, ftype='parquet', **kwds)¶ Append rows read from other files to existing dataframe.
- Parameters
- folderstr | None
Folder directory for the files to append to the existing dataframe (i.e. when appending parquet files).
- dfdataframe | None
Dataframe to append to the existing dataframe.
- ftypestr | ‘parquet’
File type (‘parquet’ or ‘dataframe’).
- **kwdskeyword arguments
Additional arguments to submit to
dask.dataframe.append()
.
-
applyECorrection
(type, **kwds)¶ Apply correction to the time-of-flight (TOF) axis of single-event data.
- Parameters
- typestr
Type of correction to apply to the TOF axis.
- **kwdskeyword arguments
Additional parameters to use for the correction.
- corraxis
str | ‘t’ String name of the axis to correct.
- center
list/tuple | (650, 650) Image center pixel positions in (row, column) format.
- amplitude
numeric | -1 Amplitude of the time-of-flight correction term (negative sign meaning subtracting the curved wavefront).
- d
numeric | 0.9 Field-free drift distance.
- t0
numeric | 0.06 Time zero position corresponding to the tip of the valence band.
- gam
numeric Linewidth value for correction using a 2D Lorentz profile.
- sig
numeric Standard deviation for correction using a 2D Gaussian profile.
- gam2
numeric Linewidth value for correction using an asymmetric 2D Lorentz profile, X-direction.
- amplitude2
numeric Amplitude value for correction using an asymmetric 2D Lorentz profile, X-direction.
-
applyFilter
(colname, lb=- inf, ub=inf, update='replace', ret=False)¶ Application of bound filters to a specified column (can be used consecutively).
- Parameters
- colnamestr
Name of the column to filter.
- lb, ubnumeric, numeric | -infinity, infinity
The lower and upper bounds used in the filtering.
- updatestr | ‘replace’
Update option for the filtered dataframe.
- retbool | False
Return option for the filtered dataframe.
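A hedged sketch of consecutive bound filtering on a dataframeProcessor; the parquet folder path and the bounds are placeholders:

    from mpes import fprocessing as fp

    dfp = fp.dataframeProcessor(datafolder='path/to/parquet')
    dfp.read(source='folder', ftype='parquet')

    # Keep only events inside an example time-of-flight window,
    # then restrict the ADC channel as a second, consecutive filter.
    dfp.applyFilter(colname='t', lb=68000, ub=74000)
    dfp.applyFilter(colname='ADC', lb=0, ub=500)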
-
applyKCorrection
(X='X', Y='Y', newX='Xm', newY='Ym', type='mattrans', **kwds)¶ Calculate and replace the X and Y values with their distortion-corrected version. This method can be reused.
- Parameters
- X, Ystr, str | ‘X’, ‘Y’
Labels of the columns before momentum distortion correction.
- newX, newYstr, str | ‘Xm’, ‘Ym’
Labels of the columns after momentum distortion correction.
-
columnApply
(mapping, rescolname, **kwds)¶ Apply a user-defined function (e.g. partial function) to an existing column.
- Parameters
- mappingfunction
Function to apply to the column.
- rescolnamestr
Name of the resulting column.
- **kwdskeyword arguments
Keyword arguments of the user-input mapping function.
-
convert
(form='parquet', save_addr=None, namestr='/data', pq_append=False, **kwds)¶ Update or convert to other file formats.
- Parameters
- formstr | ‘parquet’
File format to convert into.
- save_addrstr | None
Path of the folder to save the converted files to.
- namestrstr | ‘/data’
Extra namestring attached to the filename.
- pq_appendbool | False
Option to append to the existing parquet file (if
True
) in the specified folder, otherwise the existing parquet files will be deleted before writing new files in.
- **kwdskeyword arguments
See extra keyword arguments in
dask.dataframe.to_parquet()
for parquet conversion, or in
dask.dataframe.to_hdf()
for HDF5 conversion.
-
deleteColumn
(colnames)¶ Delete columns.
- Parameters
- colnamesstr/list/tuple
List of column names to be dropped.
-
distributedBinning
(axes, nbins, ranges, binDict=None, pbar=True, binmethod='numba', ret=False, **kwds)¶ Bin the dataframe into a multidimensional histogram.
- Parameters
- axes, nbins, ranges, binDict, pbar
See
mpes.fprocessing.binDataframe()
.
- binmethodstr | ‘numba’
Dataframe binning method (‘original’, ‘lean’, ‘fast’ or ‘numba’).
- retbool | False
Option to return binning results as a dictionary.
- **kwdskeyword arguments
See
mpes.fprocessing.binDataframe()
or mpes.fprocessing.binDataframe_lean()
.
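A hedged end-to-end sketch using distributedBinning after reading parquet files; the folder path, axes, bin numbers and ranges are placeholders:

    from mpes import fprocessing as fp

    dfp = fp.dataframeProcessor(datafolder='path/to/parquet')
    dfp.read(source='folder', ftype='parquet')

    # Bin the distributed dataframe with the numba backend and return the result.
    result = dfp.distributedBinning(axes=['X', 'Y', 't'],
                                    nbins=[128, 128, 256],
                                    ranges=[(0, 1800), (0, 1800), (68000, 74000)],
                                    binmethod='numba', ret=True)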
-
getCountRate
(fids='all', plot=False)¶ Create count rate data for the files in the dataframe processor specified in ‘fids’.
- Parameters
- fids‘all’ | list
IDs of the files to include. See arguments in
parallelHDF5Processor.subset()
and hdf5Processor.getCountRate()
.
-
getElapsedTime
(fids='all')¶ Return the elapsed time in the file from the msMarkers wave.
return: secs: the length of the file in seconds.
-
mapColumn
(mapping, *args, **kwds)¶ Apply a dataframe-partition based mapping function to an existing column.
- Parameters
- oldcolnamestr
The name of the column to use for computation.
- mappingfunction
Functional map to apply to the values of the old column. Takes the dataframe as the first argument; further arguments are passed via **kwds.
- newcolnamestr | ‘Transformed’
New column name to be added to the dataframe.
- argstuple | ()
Additional arguments of the functional map.
- updatestr | ‘append’
Updating option. ‘append’ = append to the current dask dataframe as a new column with the new column name. ‘replace’ = replace the values of the old column.
- **kwdskeyword arguments
Additional arguments for the
dask.dataframe.apply()
function.
-
property
ncol
¶ Number of columns in the distributed dataframe.
-
property
nrow
¶ Number of rows in the distributed dataframe.
-
read
(source='folder', ftype='parquet', fids=[], update='', timeStamps=False, **kwds)¶ Read into distributed dataframe.
- Parameters
- sourcestr | ‘folder’
Source of the file readout (‘folder’ = read from the provided data folder; ‘files’ = read from the provided list of file addresses).
- ftypestr | ‘parquet’
Type of file to read into dataframe (‘h5’ or ‘hdf5’, ‘parquet’, ‘json’, ‘csv’).
- fidslist | []
IDs of the files to be selected (see
mpes.base.FileCollection.select()
). Specify ‘all’ to read all files of the given file type.
- updatestr | ‘’
File selection update option (see
mpes.base.FileCollection.select()
).
- **kwdskeyword arguments
See keyword arguments in
mpes.readDataframe()
.
-
saveHistogram
(form, save_addr, dictname='histdict', **kwds)¶ Export binned histogram in other formats.
- Parameters
See
mpes.fprocessing.saveDict()
.
-
toBandStructure
()¶ Convert to the xarray data structure from existing binned data.
- Return
An instance of
BandStructure()
or MPESDataset()
from the mpes.bandstructure
module.
-
transformColumn
(oldcolname, mapping, newcolname='Transformed', args=(), update='append', **kwds)¶ Apply a simple function to an existing column.
- Parameters
- oldcolnamestr
The name of the column to use for computation.
- mappingfunction
Functional map to apply to the values of the old column.
- newcolnamestr | ‘Transformed’
New column name to be added to the dataframe.
- argstuple | ()
Additional arguments of the functional map.
- updatestr | ‘append’
Updating option. ‘append’ = append to the current dask dataframe as a new column with the new column name. ‘replace’ = replace the values of the old column.
- **kwdskeyword arguments
Additional arguments for the
dask.dataframe.apply()
function.
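A minimal sketch of transformColumn, assuming a processor with a dataframe already read in; the column names and the scaling function are illustrative:

    from mpes import fprocessing as fp

    dfp = fp.dataframeProcessor(datafolder='path/to/parquet')
    dfp.read(source='folder', ftype='parquet')

    # Append a rescaled copy of the 't' column as a new column 't_scaled'.
    dfp.transformColumn('t', lambda t: 1e-3 * t,
                        newcolname='t_scaled', update='append')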
-
transformColumn2D
(map2D, X, Y, **kwds)¶ Apply a mapping simultaneously to two dimensions.
- Parameters
- map2Dfunction
2D mapping function.
- X, Yseries, series
The two columns of the dataframe to apply mapping to.
- **kwdskeyword arguments
Additional arguments for the 2D mapping function.
-
viewEventHistogram
(dfpid, ncol, axes=['X', 'Y', 't', 'ADC'], bins=[80, 80, 80, 80], ranges=[0, 1800, 0, 1800, 68000, 74000, 0, 500], backend='bokeh', legend=True, histkwds={}, legkwds={}, **kwds)¶ Plot individual histograms of specified dimensions (axes) from a substituent dataframe partition.
- Parameters
- dfpidint
Number of the data frame partition to look at.
- ncolint
Number of columns in the plot grid.
- axeslist/tuple
Name of the axes to view.
- binslist/tuple
Bin values of all specified axes.
- rangeslist
Value ranges of all specified axes.
- backendstr | ‘bokeh’
Backend of the plotting library (‘matplotlib’ or ‘bokeh’).
- legendbool | True
Option to include a legend in the histogram plots.
- histkwds, legkwds, **kwdsdict, dict, keyword arguments
Extra keyword arguments passed to
mpes.visualization.grid_histogram()
.
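For example, a hedged sketch of inspecting the raw event distributions of one partition, relying on the documented default axes, bins and ranges; the folder path is a placeholder:

    from mpes import fprocessing as fp

    dfp = fp.dataframeProcessor(datafolder='path/to/parquet')
    dfp.read(source='folder', ftype='parquet')

    # Plot the per-axis histograms of partition 0 in a 2-column grid.
    dfp.viewEventHistogram(dfpid=0, ncol=2, backend='matplotlib')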
-
-
mpes.fprocessing.
extractEDC
(folder=None, files=[], axes=['t'], bins=[1000], ranges=[65000, 100000], binning_kwds={'jittered': True}, ret=True, **kwds)¶ Extract EDCs from a list of bias scan files.
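A hedged sketch of extractEDC over a set of bias-scan files; the file names are placeholders and the documented defaults for axes, bins and ranges are used:

    from mpes import fprocessing as fp

    # Extract energy distribution curves from two example bias-scan files.
    edcs = fp.extractEDC(files=['bias_scan_01.h5', 'bias_scan_02.h5'], ret=True)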
-
class
mpes.fprocessing.
hdf5Processor
(f_addr, **kwds)¶ Class for generating multidimensional histogram from hdf5 files.
-
_addBinners
(axes=None, nbins=None, ranges=None, binDict=None, irregular_bins=False)¶ Construct the binning parameters within an instance.
-
getCountRate
(plot=False)¶ Create count rate trace from the msMarker field in the hdf5 file.
- Parameters
- plotbool | False
No function yet.
- Return
countRate: the count rate in Hz.
secs: the seconds into the scan.
-
getElapsedTime
()¶ Return the elapsed time in the file from the msMarkers wave.
return: secs: the length of the file in seconds.
-
loadMapping
(energy, momentum)¶ Load the mapping parameters.
-
localBinning
(axes=None, nbins=None, ranges=None, binDict=None, jittered=False, histcoord='midpoint', ret='dict', **kwds)¶ Compute the photoelectron intensity histogram locally after loading all data into RAM.
- Parameters
- axes(list of) strings | None
Names the axes to bin.
- nbins(list of) int | None
Number of bins along each axis.
- ranges(list of) tuples | None
Ranges of binning along every axis.
- binDictdict | None
Dictionary with specifications of axes, nbins and ranges. If binDict is not None, it will override the specifications from the other arguments.
- jitteredbool | False
Determines whether to add jitter to the data to avoid rebinning artefacts.
- histcoordstring | ‘midpoint’
The coordinates of the histogram. Specify ‘edge’ to get the bar edges (every dimension has one value more), specify ‘midpoint’ to get the midpoint of the bars (same length as the histogram dimensions).
- retbool | True
- True
returns the dictionary containing binned data explicitly
- False
no explicit return of the binned data, the dictionary
generated in the binning is still retained as an instance attribute.
- **kwdskeyword argument
keyword            data type       default   meaning
amin               numeric/None    None      minimum value of electron sequence
amax               numeric/None    None      maximum value of electron sequence
jitter_axes        list            axes      list of axes to jitter
jitter_bins        list            nbins     list of the number of bins
jitter_amplitude   numeric/array   0.5       jitter amplitude (single number for all)
jitter_ranges      list            ranges    list of the binning ranges
- Return
- histdictdict
Dictionary containing binned data and the axes values (if
ret = True
).
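A minimal sketch of in-memory binning with an hdf5Processor; the file path, axes, bin numbers and ranges are placeholders:

    from mpes import fprocessing as fp

    hp = fp.hdf5Processor('single_event_scan.h5')

    # Bin three example axes after loading the data into RAM; jittering
    # suppresses rebinning artefacts.
    histdict = hp.localBinning(axes=['X', 'Y', 't'],
                               nbins=[128, 128, 256],
                               ranges=[(0, 1800), (0, 1800), (68000, 74000)],
                               jittered=True, histcoord='midpoint', ret='dict')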
-
localBinning_numba
(axes=None, nbins=None, ranges=None, binDict=None, jittered=False, histcoord='midpoint', ret='dict', **kwds)¶ Compute the photoelectron intensity histogram locally after loading all data into RAM.
- Parameters
- axes(list of) strings | None
Names the axes to bin.
- nbins(list of) int | None
Number of bins along each axis.
- ranges(list of) tuples | None
Ranges of binning along every axis.
- binDictdict | None
Dictionary with specifications of axes, nbins and ranges. If binDict is not None, it will override the specifications from the other arguments.
- jitteredbool | False
Determines whether to add jitter to the data to avoid rebinning artefacts.
- histcoordstring | ‘midpoint’
The coordinates of the histogram. Specify ‘edge’ to get the bar edges (every dimension has one value more), specify ‘midpoint’ to get the midpoint of the bars (same length as the histogram dimensions).
- retbool | True
- True
returns the dictionary containing binned data explicitly
- False
no explicit return of the binned data, the dictionary
generated in the binning is still retained as an instance attribute.
- **kwdskeyword argument
keyword            data type       default   meaning
amin               numeric/None    None      minimum value of electron sequence
amax               numeric/None    None      maximum value of electron sequence
jitter_axes        list            axes      list of axes to jitter
jitter_bins        list            nbins     list of the number of bins
jitter_amplitude   numeric/array   0.5       jitter amplitude (single number for all)
jitter_ranges      list            ranges    list of the binning ranges
- Return
- histdictdict
Dictionary containing binned data and the axes values (if
ret = True
).
-
saveHistogram
(dictname='histdict', form='h5', save_addr='./histogram', **kwds)¶ Save binned histogram and the axes. See
mpes.fprocessing.saveDict()
.
-
saveParameters
(form='h5', save_addr='./binning')¶ Save all the attributes of the binning instance for later use (e.g. binning axes, ranges, etc).
- Parameters
- formstr | ‘h5’
File format for saving the parameters (‘h5’/’hdf5’, ‘mat’).
- save_addrstr | ‘./binning’
The file path to save the parameters to.
-
toBandStructure
()¶ Convert to an instance of BandStructure.
-
toSplitter
()¶ Convert to an instance of hdf5Splitter.
-
updateHistogram
(axes=None, sliceranges=None, ret=False)¶ Update the dimensional sizes of the binning results.
-
viewEventHistogram
(ncol, axes=['X', 'Y', 't', 'ADC'], bins=[80, 80, 80, 80], ranges=[0, 1800, 0, 1800, 68000, 74000, 0, 500], axes_name_type='alias', backend='bokeh', legend=True, histkwds={}, legkwds={}, **kwds)¶ Plot individual histograms of specified dimensions (axes).
- Parameters
- ncolint
Number of columns in the plot grid.
- axeslist/tuple
Name of the axes to view.
- binslist/tuple
Bin values of all specified axes.
- rangeslist
Value ranges of all specified axes.
- axes_name_typestr | ‘alias’
Type of the specified axes names.
- ‘alias’
human-comprehensible aliases of the datasets from the hdf5 file (e.g. ‘X’, ‘ADC’, etc.)
- ‘original’
original names of the datasets from the hdf5 file (e.g. ‘Stream0’, etc.).
- backendstr | ‘bokeh’
Backend of the plotting library (‘matplotlib’ or ‘bokeh’).
- legendbool | True
Option to include a legend in the histogram plots.
- histkwds, legkwds, **kwdsdict, dict, keyword arguments
Extra keyword arguments passed to
mpes.visualization.grid_histogram()
.
-
-
class
mpes.fprocessing.
hdf5Reader
(f_addr, ncores=None, **kwds)¶ HDF5 reader class.
-
_assembleGroups
(gnames, amin=None, amax=None, use_alias=True, dtyp='float32', timeStamps=False, ret='array')¶ Assemble the content values of the selected groups.
- Parameters
- gnameslist
List of group names.
- amin, amaxnumeric, numeric | None, None
Index selection range for all groups.
- use_aliasbool | True
See
hdf5Reader.getGroupNames()
.
- dtypestr | ‘float32’
Data type string.
- retstr | ‘array’
Return type specification (‘array’ or ‘dict’).
-
convert
(form, save_addr='./summary', pq_append=False, **kwds)¶ Format conversion from hdf5 to mat (for Matlab/Python) or ibw (for Igor).
- Parameters
- formstr
The format of the data to convert into.
- save_addrstr | ‘./summary’
File address to save to.
- pq_appendbool | False
Option to append to parquet files (True = append to existing parquet files; False = delete the existing parquet files before creating new ones).
-
getAttributeNames
(wexpr=None, woexpr=None)¶ Retrieve attribute names from the loaded hdf5 file with string filtering.
- Parameters
- wexprstr | None
Expression in a name to leave in the attribute name list (w = with).
- woexprstr | None
Expression in a name to leave out of the attribute name list (wo = without).
- Return
- filteredAttributeNameslist
List of filtered attribute names.
-
getGroupNames
(wexpr=None, woexpr=None, use_alias=False)¶ Retrieve group names from the loaded hdf5 file with string filtering.
- Parameters
- wexprstr | None
Expression in a name to leave in the group name list (w = with).
- woexprstr | None
Expression in a name to leave out of the group name list (wo = without).
- use_aliasbool | False
Specification on the use of alias to replace the variable name.
- Return
- filteredGroupNameslist
List of filtered group names.
-
name2alias
(names_to_convert)¶ Find corresponding aliases of the named groups.
- Parameter
- names_to_convertlist/tuple
Names to convert to aliases.
- Return
- aliaseslist/tuple
Aliases corresponding to the names.
-
static
readAttribute
(element, *attribute, nullval='None')¶ Retrieve the content of the attribute(s) in the loaded hdf5 file.
- Parameter
- attributelist/tuple
Collection of attribute names.
- nullvalstr | ‘None’
Null value to retrieve as a replacement of NoneType.
- Return
- attributeContentlist/tuple
Collection of values of the corresponding attributes.
-
static
readGroup
(element, *group, amin=None, amax=None, sliced=True)¶ Retrieve the content of the group(s) in the loaded hdf5 file.
- Parameter
- grouplist/tuple
Collection of group names.
- amin, amaxnumeric, numeric | None, None
Minimum and maximum indices to select from the group (dataset).
- slicedbool | True
Perform slicing on the group (dataset), if
True
.
- Return
- groupContentlist/tuple
Collection of values of the corresponding groups.
-
summarize
(form='text', use_alias=True, timeStamps=False, ret=False, **kwds)¶ Summarize the content of the hdf5 file (names of the groups, attributes and the selected contents). Output in various user-specified formats.
- Parameters
- formstr | ‘text’
- ‘dataframe’
HDF5 content summarized into a dask dataframe.
- ‘dict’
HDF5 content (both data and metadata) summarized into a dictionary.
- ‘metadict’
HDF5 metadata summarized into a dictionary.
- ‘text’
descriptive text summarizing the HDF5 content.
Format to summarize the content of the file into.
- use_aliasbool | True
Specify whether to use the alias to rename the groups.
- retbool | False
Specify whether function return is sought.
- **kwdskeyword arguments
- Return
- hdfdictdict
Dictionary including both the attributes and the groups, using their names as the keys.
- edfdataframe
Dataframe (edf = electron dataframe) constructed using only the group values, and the column names are the corresponding group names (or aliases).
-
-
class
mpes.fprocessing.
hdf5Splitter
(f_addr, **kwds)¶ Class to split large hdf5 files.
-
split
(nsplit, save_addr='./', namestr='split_', split_group='Stream_0', pbar=False)¶ Split and save an hdf5 file.
- Parameters
- nsplitint
Number of split files.
- save_addrstr | ‘./’
Directory to store the split files.
- namestrstr | ‘split_’
Additional namestring attached to the front of the filename.
- split_groupstr | ‘Stream_0’
Name of the example group to split for file length reference.
- pbarbool | False
Enable (when True)/Disable (when False) the progress bar.
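A hedged sketch of splitting a large file and continuing with one of the parts; the file path and the number of splits are placeholders:

    from mpes import fprocessing as fp

    hs = fp.hdf5Splitter('large_single_event_file.h5')

    # Split into 10 smaller files in the current directory, then spawn an
    # hdf5Processor from the first split file.
    hs.split(nsplit=10, save_addr='./', namestr='split_', pbar=True)
    hp = hs.subset(file_id=0)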
-
subset
(file_id)¶ Spawn an instance of hdf5Processor from a specified split file.
-
toProcessor
()¶ Change to an hdf5Processor instance.
-
-
mpes.fprocessing.
im2mat
(fdir)¶ Convert image to numpy ndarray.
-
mpes.fprocessing.
mat2im
(datamat, dtype='uint8', scaling=['normal'], savename=None)¶ Convert a data matrix to an image.
-
mpes.fprocessing.
metaReadHDF5
(hfile, attributes=[], groups=[])¶ Parse the attribute (i.e. metadata) tree in the input HDF5 file and construct a dictionary of attributes.
- Parameters
- hfileHDF5 file instance
Instance of the
h5py.File
class.- attributes, groupslist, list | [], []
List of strings representing the names of the specified attribute/group names. When specified as None, the components (all attributes or all groups) are ignored. When specified as [], all components (attributes/groups) are included. When specified as a list of strings, only the attribute/group names matching the strings are retrieved.
-
mpes.fprocessing.
numba_histogramdd
(sample, bins, ranges)¶ Wrapper for the Numba pre-compiled binning functions. Behaves much like numpy.histogramdd, but returns uint32 arrays. This type was chosen because it offers a significant performance improvement over uint64 for large binning volumes. Be aware that this can cause overflows for very large sample sets exceeding 3E9 counts in a single bin, which should never happen in a realistic photoemission experiment with useful bin sizes.
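A minimal sketch, assuming the wrapper mirrors the (histogram, edges) return of numpy.histogramdd as the description suggests; the sample, bins and ranges are illustrative:

    import numpy as np
    from mpes import fprocessing as fp

    # Random 2D sample binned on a 64 x 64 grid over the unit square.
    sample = np.random.uniform(0, 1, size=(100000, 2))
    hist, edges = fp.numba_histogramdd(sample,
                                       bins=(64, 64),
                                       ranges=((0, 1), (0, 1)))
    print(hist.dtype)  # expected to be uint32 per the description above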
-
class
mpes.fprocessing.
parallelHDF5Processor
(files=[], file_sorting=True, folder=None, ncores=None)¶ Class for parallel processing of hdf5 files.
-
_parse_metadata
(attributes, groups)¶ Parse the metadata from all HDF5 files.
- Parameters
- attributes, groupslist, list
See
mpes.fprocessing.metaReadHDF5()
.
-
combineResults
(ret=True)¶ Combine the results from all segments (only when self.results is non-empty).
- Parameters
- retbool | True
- True
returns the dictionary containing binned data explicitly
- False
no explicit return of the binned data, the dictionary
generated in the binning is still retained as an instance attribute.
- Return
- combinedresultdict
Return combined result dictionary (if
ret == True
).
-
convert
(form='parquet', save_addr='./summary', append_to_folder=False, pbar=True, pbenv='classic', **kwds)¶ Convert files to another format (e.g. parquet).
- Parameters
- formstr | ‘parquet’
File format to convert into.
- save_addrstr | ‘./summary’
Path of the folder for saving parquet files.
- append_to_folderbool | False
Option to append to the existing parquet files in the specified folder, otherwise the existing parquet files will be deleted first. The HDF5 files in the same folder are kept intact.
- pbarbool | True
Option to display progress bar.
- pbenvstr | ‘classic’
Specification of the progress bar environment (‘classic’ for generic version and ‘notebook’ for notebook compatible version).
- **kwdskeyword arguments
See
mpes.fprocessing.hdf5Processor.convert()
.
-
getCountRate
(fids='all', plot=False)¶ Create count rate data for the files in the parallel hdf5 processor specified in ‘fids’.
- Parameters
- fids‘all’ | list
IDs of the files to include. See arguments in
parallelHDF5Processor.subset()
and hdf5Processor.getCountRate()
.
-
getElapsedTime
(fids='all')¶ Return the elapsed time in the file from the msMarkers wave.
return: secs: the length of the file in seconds.
-
parallelBinning
(axes, nbins, ranges, scheduler='threads', combine=True, histcoord='midpoint', pbar=True, binning_kwds={}, compute_kwds={}, pbenv='classic', ret=False)¶ Parallel computation of the multidimensional histogram from file segments. Version with serialized loop over processor threads and parallel recombination to save memory.
- Parameters
- axes(list of) strings | None
Names the axes to bin.
- nbins(list of) int | None
Number of bins along each axis.
- ranges(list of) tuples | None
Ranges of binning along every axis.
- schedulerstr | ‘threads’
Type of distributed scheduler (‘threads’, ‘processes’, ‘synchronous’)
- histcoordstring | ‘midpoint’
The coordinates of the histogram. Specify ‘edge’ to get the bar edges (every dimension has one value more), specify ‘midpoint’ to get the midpoint of the bars (same length as the histogram dimensions).
- pbarbool | True
Option to display the progress bar.
- binning_kwdsdict | {}
Keyword arguments to be included in
mpes.fprocessing.hdf5Processor.localBinning()
.- compute_kwdsdict | {}
Keyword arguments to specify in
dask.compute()
.
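A hedged sketch of parallel binning over the file segments of a measurement, followed by saving the combined result; the folder path, axes, bin numbers and ranges are placeholders:

    from mpes import fprocessing as fp

    pp = fp.parallelHDF5Processor(folder='path/to/hdf5_files')

    # Bin all file segments and combine the per-file histograms.
    pp.parallelBinning(axes=['X', 'Y', 't'],
                       nbins=[128, 128, 256],
                       ranges=[(0, 1800), (0, 1800), (68000, 74000)],
                       scheduler='threads', pbar=True)

    # Save the combined histogram and axes (see saveHistogram()/saveDict()).
    pp.saveHistogram(form='h5', save_addr='./binned_result')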
-
parallelBinning_old
(axes, nbins, ranges, scheduler='threads', combine=True, histcoord='midpoint', pbar=True, binning_kwds={}, compute_kwds={}, ret=False)¶ Parallel computation of the multidimensional histogram from file segments. Old version with completely parallel binning with unlimited memory consumption. Crashes for very large data sets.
- Parameters
- axes(list of) strings | None
Names the axes to bin.
- nbins(list of) int | None
Number of bins along each axis.
- ranges(list of) tuples | None
Ranges of binning along every axis.
- schedulerstr | ‘threads’
Type of distributed scheduler (‘threads’, ‘processes’, ‘synchronous’)
- combinebool | True
Option to combine the results obtained from distributed binning.
- histcoordstring | ‘midpoint’
The coordinates of the histogram. Specify ‘edge’ to get the bar edges (every dimension has one value more), specify ‘midpoint’ to get the midpoint of the bars (same length as the histogram dimensions).
- pbarbool | True
Option to display the progress bar.
- binning_kwdsdict | {}
Keyword arguments to be included in
mpes.fprocessing.hdf5Processor.localBinning()
.- compute_kwdsdict | {}
Keyword arguments to specify in
dask.compute()
.
-
saveHistogram
(dictname='combinedresult', form='h5', save_addr='./histogram', **kwds)¶ Save binned histogram and the axes.
- Parameters
See
mpes.fprocessing.saveDict()
.
-
saveParameters
(form='h5', save_addr='./binning')¶ Save all the attributes of the binning instance for later use (e.g. binning axes, ranges, etc).
- Parameters
- formstr | ‘h5’
File format for saving the parameters (‘h5’/’hdf5’, ‘mat’).
- save_addrstr | ‘./binning’
The file path to save the parameters to.
-
subset
(file_id)¶ Spawn an instance of
mpes.fprocessing.hdf5Processor
from a specified substituent file.
- Parameter
- file_idint
Integer-numbered file ID (any integer from 0 to self.nfiles - 1).
-
summarize
(form='dataframe', ret=False, **kwds)¶ Summarize the measurement information from all HDF5 files.
- Parameters
- formstr | ‘dataframe’
Format of the files to summarize into.
- retbool | False
Specification on value return.
- **kwdskeyword arguments
See keyword arguments in
mpes.fprocessing.readDataframe()
.
-
updateHistogram
(axes=None, sliceranges=None, ret=False)¶ Update the dimensional sizes of the binning results.
- Parameters
- axestuple/list | None
Collection of the names of axes for size change.
- slicerangestuple/list/array | None
Collection of ranges, e.g. (start_position, stop_position) pairs, for each axis to be updated.
- retbool | False
Option to return updated histogram.
-
viewEventHistogram
(fid, ncol, **kwds)¶ Plot individual histograms of specified dimensions (axes) from a substituent file.
- Parameters
See arguments in
parallelHDF5Processor.subset()
and hdf5Processor.viewEventHistogram()
.
-
-
class
mpes.fprocessing.
parquetProcessor
(folder, files=[], source='folder', ftype='parquet', fids=[], update='', ncores=None, **kwds)¶ Legacy version of the
mpes.fprocessing.dataframeProcessor
class.
-
mpes.fprocessing.
readARPEStxt
(fdir, withCoords=True)¶ Read and convert Igor-generated ARPES .txt files into numpy arrays. The withCoords option specifies whether the energy and angle information is included.
-
mpes.fprocessing.
readBinnedhdf5
(fpath, combined=True, typ='float32')¶ Read binned hdf5 file (3D/4D data) into a dictionary.
- Parameters
- fpathstr
File path.
- combinedbool | True
Specify if the volume slices are combined.
- typstr | ‘float32’
Data type of the numerical values in the output dictionary.
- Return
- outdict
Dictionary with keys being the axes and the volume (slices).
-
mpes.fprocessing.
readDataframe
(folder=None, files=None, ftype='parquet', timeStamps=False, **kwds)¶ Read stored files from a folder into a dataframe.
- Parameters
- folder, filesstr, list/tuple | None, None
Folder path of the files or a list of file paths. The folder path takes priority: if it is specified, the list of files is ignored.
- ftypestr | ‘parquet’
File type to read (‘h5’ or ‘hdf5’, ‘parquet’, ‘json’, ‘csv’, etc). If a folder path is given, all files of the specified type are read into the dataframe in the reading order.
- **kwdskeyword arguments
See the keyword arguments for the specific file parser in
dask.dataframe
module.
- Return
Dask dataframe read from specified files.
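A minimal sketch of reading a folder of converted files into a dask dataframe; the folder path is a placeholder:

    from mpes import fprocessing as fp

    # Lazily read all parquet files from an example folder.
    edf = fp.readDataframe(folder='path/to/parquet', ftype='parquet')
    print(edf.columns)        # column names of the assembled dataframe
    print(edf.npartitions)    # number of dask partitions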
-
mpes.fprocessing.
readIgorBinFile
(fdir, **kwds)¶ Read Igor binary formats (pxp and ibw).
-
mpes.fprocessing.
readimg
(f_addr)¶ Read images (jpg, png, 2D/3D tiff).
-
mpes.fprocessing.
readtsv
(fdir, header=None, dtype='float', **kwds)¶ Read a tsv file from the hemispherical detector.
Parameters
- fdirstr
File directory.
- headerint | None
Number of header lines.
- dtypestr | ‘float’
Data type of the returned numpy.ndarray.
- **kwdskeyword arguments
Other keyword arguments for pandas.read_table().
Return
- datanumpy ndarray
Read and type-converted data.
-
mpes.fprocessing.
rot2d
(th, angle_unit)¶ Construct a 2D rotation matrix.
-
mpes.fprocessing.
saveDict
(dct={}, processor=None, dictname='', form='h5', save_addr='./histogram', **kwds)¶ Save the binning result dictionary, including the histogram and the axes values (edges or midpoints).
- Parameters
- dctdict | {}
A dictionary containing the binned data and axes values to be exported.
- processorclass | None
Class including all attributes.
- dictnamestr | ‘’
Namestring of the dictionary to save (such as the attribute name in a class).
- formstr | ‘h5’
Save format, supporting ‘mat’, ‘h5’/’hdf5’, ‘tiff’ (need tifffile) or ‘png’ (need imageio).
- save_addrstr | ‘./histogram’
File path to save the binning result.
- **kwdskeyword arguments
keyword            data type   default     meaning
dtyp               string      ‘float32’   Data type of the histogram
cutaxis            int         3           The index of axis to cut the 4D data
slicename          string      ‘V’         The shared namestring for the 3D slice
binned_data_name   string      ‘binned’    Namestring of the binned data
otheraxes          dict        None        Values along other or converted axes
mat_compression    bool        False       Matlab file compression
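A hedged sketch of exporting a binned result held by an hdf5Processor instance; the file paths are placeholders and ‘histdict’ follows the attribute name used by hdf5Processor.saveHistogram():

    from mpes import fprocessing as fp

    hp = fp.hdf5Processor('single_event_scan.h5')
    hp.localBinning(axes=['X', 'Y'], nbins=[128, 128],
                    ranges=[(0, 1800), (0, 1800)], ret=False)

    # Write the histogram dictionary to an HDF5 file with float32 values.
    fp.saveDict(processor=hp, dictname='histdict', form='h5',
                save_addr='./histogram', dtyp='float32')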
-
mpes.fprocessing.
sgfltr2d
(datamat, span, order, axis=0)¶ Savitzky-Golay filter for two-dimensional data, operated in a line-by-line fashion along one axis. Returns the filtered data.
-
mpes.fprocessing.
sortNamesBy
(namelist, pattern, gp=0, slicerange=(None, None))¶ Sort a list of names according to a particular sequence of numbers (specified by a regular expression search pattern).
Parameters
- nameliststr
List of name strings.
- patternstr
Regular expression of the pattern.
- gpint
Grouping number.
Returns
- orderedseqarray
Ordered sequence from sorting.
- sortednameliststr
Sorted list of name strings.
-
mpes.fprocessing.
txtlocate
(ffolder, keytext)¶ Locate specific txt files containing experimental parameters.