Author: | Ralf Schlatterbeck <rsc@runtux.com> |
---|
News 03-2022: Attempt to make this installable on Windows. This involves some workaround in the code because the visual compiler does not support certain C99 constructs.
News 01-2022: This version wraps multiple evaluation with NSGA-III (note the additional 'I').
News 12-2021: This version wraps multiple evaluation values from your objective function: Now you can return more than one value to either use it for constraints (that must be fulfilled before the objective is optimized) or for multi-objective optimization with the Nondominated Sorting Genetic Algorithm V.2 (NSGA-II). You can combine both, multi-objective optimization and constraints.
News: This version wraps the Differential Evolution method (that's quite an old method but is newly implemented in pgapack).
PGAPy is a wrapper for PGAPack, the parallel genetic algorithm library (see PGAPack Readme), a powerfull genetic algorithm library by D. Levine, Mathematics and Computer Science Division Argonne National Laboratory. The library is written in C. PGAPy wraps this library for use with Python. The original PGAPack library is already quite old but is one of the most complete and accurate (and fast, although this is not my major concern when wrapping it to python) genetic algorithm implementations out there with a lot of bells and whistles for experimentation. It also has shown a remarkably small number of bugs over the years. It supports parallel execution via the message passing interface MPI in addition to a normal "serial" version. That's why I wanted to use it in Python, too.
To get started you need the PGAPack library, although it now comes bundled with PGApy, to install a parallel version you currently need a pre-installed PGApack compiled for the MPI library of choice. See Installation section for details.
There currently is not much documentation for PGAPy. You really, absolutely need to read the documentation that comes with PGAPack. The PGAPack user guide is now shipped together with PGAPy. It is installed together with some examples in share/pgapy, wherever the Python installer decides to place this in the final installation (usually /usr/local/share on Linux).
The original PGAPack library can still be downloaded from the PGAPack ftp site, it is written in ANSI C and therefore should run on most platforms. Note that this version is not very actively maintained. I've started a PGAPack fork on github where I've ported the library to the latest version of the MPI standard and have fixed some minor inconsistencies in the documentation. I've also implemented some new features (notably enhancements in selection schemes and a new replacement strategy called restricted tournament replacement and, more recently, the differential evolution strategy.)
Note: When using NSGA_III replacement for multi (or many-) objective optimization you need to either
set reference points on the hyperplane intersecting all axes at offset 1. These reference points can be obtained with the convenience function pga.das_dennis, it creates a regular set of reference points using an algorithm originally publised by I. Das and J. E. Dennis. These points are then passed as the paramter reference_points to the PGA constructor.
See examples/dtlz2.py for a usage example and the user guide for the bibliographic reference. The function gets the dimensionality of the objective space (num_eval minus num_constraint) and the number of partition to use.
Or set reference directions (in the objective space) with the reference_directions parameter, number of partitions for these directions with the refdir_partitions parameter (see das_dennis above, this uses Das/Dennis points internally), and a scale factor with the parameter refdir_scale.
You can set both, these parameters are not mutually exclusive.
I have tested pgapy on Linux only and I'll currently not provide Windows versions. You also can find my PGAPack fork on github this repository has the three upstream releases as versions in git and contains some updates concerning support of newer MPI versions and documentation updates. I've also included patches in the git repository of the Debian maintainer of the package, Dirk Eddelbuettel.
To get you started, I've included some very simple examples in examples, e.g., one-max.py implements the "Maxbit" example similar to one in the PGAPack documentation. The examples were inspired by the book "Genetic Algorithms in Python" but are written from scratch and don't include any code from the book. The examples illustrates several points:
When you extend PGAPy – remember not all functions of PGAPack are wrapped yet and you may need additional functions – you should stick to my naming conventions when making changes. The following naming conventions were used for the wrapper:
PGApack has a lot of PGASet and PGAGet functions for setting parameters. These are reflected in constructor parameters on the one hand and in read-only properties of a PGA object on the other hand. The following table gives an overview of all the original PGApack names and the names of the python wrapper. For the PGApack name I've only listed the PGASet function, in many cases there is a corresponding PGAGet function. If a corresponding read-only property exists for a constructor parameter this is indicated in the "Prop" column. In some cases properties are missing because no corresponding PGAGet function is implemented in PGApack, in other cases returning a numeric value that has a symbolic constant in PGApy doesn't make much sense. The properties have the same name as the constructor parameter. There are Properties that don't have a corresponding constructor parameter, namely the eval_count property (returning the count of function evaluations) and the GA_iter property that returns the current GA generation. In the type column I'm listing the Python type. If the type is followed by a number, more than one item of that type is specified (a sequence in Python). Some entries contain "sym", these are integer values with a symbolic constant, the value "msym" indicates that several values denoted by a list of symbolic constants can be given. A special case are the PGASetRealInitRange, PGASetRealInitPercent, PGASetIntegerInitRange functions. These take two values for each allele of the gene. In python this is a sequence of 2-tuples. Note that this means that you can have different ranges of allowed values for each allele.
The num_eval property is special: Due to limitations of the C programming language, for multiple evaluations in C the first evaluation is returned as the function return-value of the evaluate function and all other parameters are returned in an auxiliary array. PGApack specifies the number of auxiliary evaluations to be returned. In Python the evaluation function can always return a sequence of evaluation values and the num_eval is one more than PGAGetNumAuxEval would return. The default for num_eval is 1.
The first two (mandatory) constructor parameters are the type of the gene (this takes a Python type, e.g., bool for a binary genome or int for an integer genome) and the length. Note that the string_length is implicitly set with the length parameter. The string_length is also available as the length of the PGA object using the Python built-in len function.
PGApack name | Constructor parameter | Type | Prop |
---|---|---|---|
PGASetCrossoverBoundedFlag | crossover_bounded | int | yes |
PGASetCrossoverBounceBackFlag | crossover_bounce_back | int | yes |
PGASetCrossoverSBXEta | crossover_SBX_eta | float | yes |
PGASetCrossoverSBXOncePerString | crossover_SBX_once_per_string | int | yes |
PGASetCrossoverProb | crossover_prob | float | yes |
PGASetCrossoverType | crossover_type | sym | no |
PGAGetEvalCount | eval_count | int | yes |
PGASetFitnessCmaxValue | fitness_cmax | float | yes |
PGASetFitnessType | fitness_type | sym | no |
PGAGetGAIterValue | GA_iter | int | yes |
PGASetIntegerInitPermute | integer_init_permute | int2 | no |
PGASetIntegerInitRange | init | no | |
PGASetMaxFitnessRank | max_fitness_rank | float | yes |
PGASetMaxGAIterValue | max_GA_iter | int | yes |
PGASetMaxNoChangeValue | max_no_change | int | no |
PGASetMaxSimilarityValue | max_similarity | int | no |
PGASetMutationAndCrossoverFlag | mutation_and_crossover | int | yes |
PGASetMutationBounceBackFlag | mutation_bounce_back | int | yes |
PGASetMutationBoundedFlag | mutation_bounded | int | yes |
PGASetMutationIntegerValue | mutation_value | int | yes |
PGASetMutationOrCrossoverFlag | mutation_or_crossover | int | yes |
PGASetMutationPolyEta | mutation_poly_eta | float | yes |
PGASetMutationPolyValue | mutation_poly_value | float | yes |
PGASetMutationProb | mutation_prob | float | yes |
PGASetMutationRealValue | mutation_value | float | yes |
PGASetMutationType | mutation_type | sym | no |
PGASetNoDuplicatesFlag | no_duplicates | int | no |
PGASetNumAuxEval | num_eval | int | yes |
PGASetNumConstraint | num_constraint | int | yes |
PGASetNumReplaceValue | num_replace | int | yes |
PGASetPopSize | pop_size | int | yes |
PGASetPopReplaceType | pop_replace_type | sym | no |
PGASetPrintFrequencyValue | print_frequency | int | yes |
PGASetPrintOptions | print_options | msym | no |
PGASetPTournamentProb | p_tournament_prob | float | yes |
PGASetRandomizeSelect | randomize_select | int | yes |
PGASetRandomSeed | random_seed | int | yes |
PGASetRealInitRange | init | no | |
PGASetRealInitPercent | init_percent | no | |
PGASetRestartFlag | restart | int | yes |
PGASetRestartFrequencyValue | restart_frequency | int | yes |
PGASetRTRWindowSize | rtr_window_size | int | yes |
PGASetSelectType | select_type | sym | no |
PGASetStoppingRuleType | stopping_rule_types | msym | no |
PGASetStringLength | string_length | int | yes |
PGASetSumConstraintsFlag | sum_constraints | int | yes |
PGASetTournamentSize | tournament_size | int | yes |
PGASetTournamentWithReplacement | tournament_with_replacement | int | yes |
PGASetTruncationProportion | truncation_proportion | float | yes |
PGASetUniformCrossoverProb | uniform_crossover_prob | float | yes |
The following are the methods that can be used during the run of the genetic search. The run method is used to start the search. This can be used, to, e.g., set an allele during hill-climbing in a custom endofgen method. Note that some methods only apply to certain gene types, e.g. the encode_int_ methods can only be used on binary alleles (they encode an integer value as a binary or gray code representation into the gene). Other methods take or return different types depending on the type of gene, e.g. get_allele or set_allele, they call different backend functions depending on the gene type. With the set_random_seed method, the random number generator can be re-seeded. It is usually best to seed the generator once at (before) the beginning by specifying random_seed in the constructor. For further details consult the user guide. The method get_evaluation will return a double for a single evaluation and a tuple of double for multiple evaluations (when num_eval is >1)
Method | Parameters | Return |
---|---|---|
check_stopping_conditions | True if stop should occur | |
encode_int_as_binary | p, pop, frm, to, val | None |
encode_int_as_gray_code | p, pop, frm, to, val | None |
encode_real_as_binary | p, pop, frm, to l, u, val | None |
encode_real_as_gray_code | p, pop, frm, to l, u, val | None |
euclidian_distance | p1, pop1 p2, pop2 | float |
fitness | pop | None |
get_allele | p, pop, index | allele value |
get_best_index | pop | index of best string |
get_best_report_index | pop, idx | index of best eval with idx |
get_evaluation | p, pop | evaluation of p |
get_evaluation_up_to_date | p, pop | True if up-to-date |
get_fitness | p, pop | fitness of p (float) |
get_int_from_binary | p, pop, frm, to | int |
get_int_from_gray_code | p, pop, frm, to | int |
get_iteration | deprecated, use GA_iter | |
get_real_from_binary | p, pop, frm, to, l, u | float |
get_real_from_gray_code | p, pop, frm, to, l, u | float |
random01 | float between 0 and 1 | |
random_flip | probability | 0 or 1 |
random_gaussian | mean, stddev | float |
random_interval | l, r | int between l, r |
random_uniform | l, r | float between l, r |
run | None | |
select_next_index | pop | index selected individual |
set_allele | p, pop, i, value | None |
set_evaluation | p, pop, value | None |
set_evaluation_up_to_date | p, pop, status | None |
set_random_seed | seed | None (use constructor!) |
PGApack has the concept of user functions. These allow customization of different areas of a genetic algorihm. In Python they are implemented as methods that can be changed in a derived class. One of the methods that must be implemented in a derived class is the evaluate function (although technically it is not a user function in PGApack). It interprets the gene and returns an evaluation value or a sequence of evaluation values if you set the num_eval constructor parameter. PGApack computes a fitness from the raw evaluation value. For some methods an up-call into the PGA class is possible, for some methods this is not possible (and in most cases not reasonable). Note that for the stop_cond method, the standard check for stopping conditions can be called with:
self.check_stopping_conditions()
The following table lists the overridable methods with their parameters (for the function signature the first parameter self is omitted). Note that in PGApack there are additional user functions that are needed for user-defined data types which are currently not exposed in Python. In the function signatures p denotes the index of the individual and pop denotes the population. If more than one individual is specified (e.g., for crossover) these can be followed by a number. For crossover c1 and c2 denote the destination individuals (children). The propability for the mutation method is a floating-point value between 0 and 1. Remember to count the number of mutations that happen, and return that value for the mutation method!
Method | Call Signature | Return Value | Up-Call |
---|---|---|---|
check_duplicate | p1, pop1, p2, pop2 | True if dupe | no |
stop_cond | True to stop | no | |
crossover | p1, p2, p_pop, c1, c2, c_pop | None | no |
endofgen | None | no | |
evaluate | p, pop | sequence of float | no |
gene_difference | p1, pop1, p2, pop2 | float | no |
initstring | p, pop | None | no |
mutation | p, pop, propability | #mutations | no |
pre_eval | pop | None | no |
print_string | file, p, pop | None | yes |
The following PGApack constants are available:
Constant | Description |
---|---|
PGA_CROSSOVER_ONEPT | One-point Crossover |
PGA_CROSSOVER_SBX | Simulated Binary Crossover |
PGA_CROSSOVER_TWOPT | Two-point Crossover |
PGA_CROSSOVER_UNIFORM | Uniform Crossover |
PGA_FITNESSMIN_CMAX | Map fitness by subtracting worst |
PGA_FITNESSMIN_RECIPROCAL | Map fitness via reciprocal |
PGA_FITNESS_NORMAL | Linear normalization of fitness |
PGA_FITNESS_RANKING | Linear fitness ranking |
PGA_FITNESS_RAW | Identity fitness function |
PGA_MUTATION_CONSTANT | Mutation by adding/subtracting constant |
PGA_MUTATION_GAUSSIAN | Mutation by selecting from Gaussian distribution |
PGA_MUTATION_PERMUTE | Mutation swaps two random genes |
PGA_MUTATION_POLY | Polynomial Mutation |
PGA_MUTATION_RANGE | Replace gene with uniform selection from init range |
PGA_MUTATION_UNIFORM | Mutation uniform from interval |
PGA_NEWPOP | Symbolic constant for new population |
PGA_OLDPOP | Symbolic constant for old population |
PGA_POPREPL_BEST | Population replacement best strings |
PGA_POPREPL_NSGA_II | Use NSGA-II replacement for multi-objective opt. |
PGA_POPREPL_NSGA_III | Use NSGA-III replacement for multi-objective opt. |
PGA_POPREPL_PAIRWISE_BEST | Compare same index in old and new population |
PGA_POPREPL_RANDOM_NOREP | Population replacement random no replacement |
PGA_POPREPL_RANDOM_REP | Population replacement random with replacement |
PGA_POPREPL_RTR | Restricted Tournament Replacement |
PGA_REPORT_AVERAGE | Report average evaluation |
PGA_REPORT_HAMMING | Report hamming distance |
PGA_REPORT_OFFLINE | Report offline |
PGA_REPORT_ONLINE | Report online |
PGA_REPORT_STRING | Report the string |
PGA_REPORT_WORST | Report the worst evaluation |
PGA_SELECT_LINEAR | Return individuals in population order |
PGA_SELECT_PROPORTIONAL | Fitness-proportional selection |
PGA_SELECT_PTOURNAMENT | Binary probabilistic tournament selection |
PGA_SELECT_SUS | Stochastic universal selection |
PGA_SELECT_TOURNAMENT | Tournament selection |
PGA_SELECT_TRUNCATION | Truncation selection |
PGA_STOP_MAXITER | Stop on max iterations |
PGA_STOP_NOCHANGE | Stop on max number of generations no change |
PGA_STOP_TOOSIMILAR | Stop when individuals too similar |
As already mentioned, not all functions and constants of PGAPack are wrapped yet – still for many applications the given set should be enough. If you need additional functions, you may want to wrap these and send me a patch.
Another feature of PGAPack is currently not implemented in the wrapper, the usage of custom datatypes. With PGAPack you can define your own datatypes complete with their custom implementations of the genetic algorithm functionality like crossover, mutation, etc. I don't expect problems implementing these, though.
Please use the Sourceforge Bug Tracker or the Github Bug Tracker and
Project information and download from Sourceforge main page
or checkout from Github
or directly install via pypi.
PGApy, as the name suggests, supports parallelizing the evaluation function of the genetic algorithm. This uses the Message Passing Interface (MPI) standard.
To install a serial version (without parallel programming using MPI) you can simply install from pypi using pip. Alternatively when you have unpacked or checked out from sources you can install with:
python3 setup.py install --prefix=/usr/local
If you want a parallel version using an MPI (Message-Passing Interface) library you will have to install a parallel version of PGApack first. The easiest way to do this is to use my pgapack debian package builder from github. Clone this repository, check out the branch debian/sid, install the build dependencies, they're listed in the file debian/control and build the debian packages using:
dpkg-buildpackage -rfakeroot
This builds pgapack debian packages for all supported MPI libraries in debian, currently these are mpich, openmpi, and lam. In addition to the MPI libraries a serial version of the pgapack library is also built. Proceed by installing the package pgapack and the MPI backend library of choice. If you don't have a preference for an MPI library, libpgapack-openmpi is the package that uses the Debians default preferences of an MPI library.
Once a parallel version of PGApack is installed, you can install PGApy as follows: You set environment variables for the PGA_PARALLEL_VARIANT (one of mpich, openmpi, or lam) and set the PGA_MODULE to module_from_parallel_install. Finally you envoke the setup, e.g.:
export PGA_PARALLEL_VARIANT=openmpi export PGA_MODULE=module_from_parallel_install python3 setup.py install --prefix=/usr/local
If your MPI library is installed in a different place you should study the Extension configurations in setup.py to come up with an Extension definition that fits your installation. If your installation is interesting to more people, feel free to submit a patch that adds your Extension-configuration to the standard setup.py.
Version 1.2: Many-objective optimization with NSGA-III
Version 1.1.6: Polynomial mutation and simulated binary crossover (SBX)
Version 1.1.1-1.1.5: Small PGAPack updates, fixes for non-debian
Version 1.1: Add multi-objective optimization with NSGA-II
Version 1.0: Add constraint handling
Version 0.9: Allow installation of parallel version
Version 0.8: Bugfix in real mutation
Version 0.7: Major changes in wrapping
Version 0.6: Major changes in wrapping
Version 0.5: Bug-fix release
Version 0.4: Bundle PGAPack
Version 0.3: Feature enhancements, Bug fixes
Port to Python3, Python2 is still supported, license change.
Version 0.2: Feature enhancements, Bug fixes
64 bit support, more PGAPack functions and attributes wrapped, Readme-update: Sourceforge logo, Changes chapter.
Version 0.1: Initial freshmeat announcement
PGAPy is a wrapper for PGAPack, the parallel genetic algorithm library, a powerful genetic algorithm library. PGAPy wraps this library for use with Python. Pgapack is one of the most complete and accurate genetic algorithm implementations out there with a lot of features for experimentation.