# 1 08-ParameterControl

Can we optimally vary parameters over the run?

## 1.1 Introduction

### 1.1.1 Parameter Control

• Motivation
• Parameter setting
• Tuning
• Control
• Examples
• Where to apply parameter control
• How to apply parameter control

### 1.1.2 Motivation

An EA has many strategy parameters, e.g.
* mutation operator and mutation rate
* crossover operator and crossover rate
* selection mechanism and selective pressure (e.g. tournament size)
* population size

Good parameter values facilitate good performance.

Q1: How to find good parameter values?
EA parameters are rigid (constant during a run).
But, an EA is a dynamic, adaptive process.
Thus, optimal parameter values may vary during a run.

Q2: How to vary parameter values?

### 1.1.4 Tuning

Parameter tuning:
the traditional way of testing and comparing different values before the “real” run.

Problems:
* user mistakes in settings can be a source of errors or sub-optimal performance
* parameters interact: exhaustive search is not practicable
* costs much time even with “smart” tuning
* good values may become bad during the run

### 1.1.5 Control

Parameter control:
setting values on-line, during the actual run, e.g.
* a predetermined time-varying schedule p = p(t)
* using feedback from the search process
* encoding parameters in chromosomes and relying on selection

Problems:
* finding optimal p is hard, finding optimal p(t) is harder
* the feedback mechanism is still user-defined; how to “optimize” it?
* when would natural selection work for strategy parameters?

## 1.2 Initial examples

We’ve covered some of these previously:

### 1.2.1 Varying mutation step size

• min f(x1,…,xn), subject to:
• Li ≤ xi ≤ Ui for i = 1,…,n (bounds)
• gi(x) ≤ 0 for i = 1,…,q (inequality constraints)
• hi(x) = 0 for i = q+1,…,m (equality constraints)
• Algorithm:
• EA with real-valued representation (x1,…,xn)
• arithmetic averaging crossover
• Gaussian mutation: x’i = xi + N(0, σ)
• standard deviation σ is called mutation step size

#### 1.2.1.1 option 1 deterministic

• Replace the constant σ by a function σ(t)
$$\sigma(t) = 1 - 0.9 \frac{t}{T}$$
• 0 ≤ t ≤ T is the current generation number
• Features:
• changes in σ are independent from the search progress
• strong user control of σ by the above formula
• σ is fully predictable
• a given σ acts on all individuals of the population
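
The deterministic schedule above can be sketched in a few lines. This is a minimal illustration, assuming a list-of-floats genotype; the function names `sigma_schedule` and `gaussian_mutation` are hypothetical, and bound handling (Li ≤ xi ≤ Ui) is deliberately left out.

```python
import random


def sigma_schedule(t: int, T: int) -> float:
    """Deterministic step size sigma(t) = 1 - 0.9 * t / T for 0 <= t <= T."""
    if T <= 0 or not 0 <= t <= T:
        raise ValueError("need T > 0 and 0 <= t <= T")
    return 1.0 - 0.9 * t / T


def gaussian_mutation(x, sigma, rng=random):
    """Add N(0, sigma) noise to every gene; sigma is shared by the population."""
    return [xi + rng.gauss(0.0, sigma) for xi in x]


# Usage: the step size shrinks linearly from 1.0 (t = 0) to 0.1 (t = T),
# regardless of how the search is going.
T = 100
for t in (0, 50, 100):
    child = gaussian_mutation([0.0, 0.0], sigma_schedule(t, T))
```

Because σ depends only on t, the same value can be computed once per generation and reused for every individual.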

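#### 1.2.1.2 option 2 adaptive

• Replace the constant σ by a function σ(t) updated after every n steps by the 1/5 success rule
Remember 1/5=0.2 …
$$\sigma(t) = \left\{ \begin{array}{ll} \sigma(t - n) / c, & \text{if } p_s > 0.2 \\ \sigma(t - n) \cdot c, & \text{if } p_s < 0.2 \\ \sigma(t - n), & \text{otherwise} \end{array} \right.$$
• where p_s is the fraction of successful mutations over the last n steps and 0 < c < 1, so dividing by c enlarges σ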
• Features:
• changes in σ are based on feedback from the search progress
• some user control of σ by the above formula
• σ is not predictable
• a given σ acts on all individuals of the population
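
A minimal sketch of the 1/5 success rule update follows. The function name, the bookkeeping via `successes` and `trials`, and the default c = 0.9 are assumptions made for illustration; the only requirement used here is 0 < c < 1, so that dividing by c enlarges σ.

```python
def one_fifth_rule(sigma: float, successes: int, trials: int, c: float = 0.9) -> float:
    """Return the step size to use for the next n = trials mutation steps.

    successes counts the mutations (out of trials) whose offspring was
    better than its parent, so p_s = successes / trials.
    """
    if trials <= 0 or not 0.0 < c < 1.0:
        raise ValueError("need trials > 0 and 0 < c < 1")
    p_s = successes / trials
    if p_s > 0.2:
        return sigma / c  # many successes: widen the search
    if p_s < 0.2:
        return sigma * c  # few successes: search closer to the parent
    return sigma          # exactly 1/5: leave sigma unchanged


# Usage: 5 successes in 10 trials is above 1/5, so sigma grows.
sigma = one_fifth_rule(1.0, successes=5, trials=10)
```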

#### 1.2.1.3 option 3 self-adaptive evolving

• Assign a personal σ to each individual
• Incorporate this σ into the chromosome: (x1, …, xn, σ)
• Apply variation operators to xi’s and σ
• Features:
• changes in σ are results of natural selection
• (almost) no user control of σ
• σ is not predictable
• a given σ acts on one individual
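
The notes do not say which variation operator is applied to σ. The sketch below assumes the log-normal update that is common in evolution strategies (σ is mutated first, then used to mutate x); the learning rate τ = 1/√n, the lower bound `min_sigma`, and the function name are all assumptions.

```python
import math
import random


def self_adaptive_mutation(x, sigma, rng=random, min_sigma=1e-8):
    """Mutate an individual (x1, ..., xn, sigma) carrying one personal sigma.

    Assumed operator: sigma' = sigma * exp(tau * N(0, 1)), then
    x'_i = x_i + N(0, sigma'). Selection acts on x only, so a sigma
    survives when it tends to produce good offspring.
    """
    tau = 1.0 / math.sqrt(len(x))  # assumed learning rate
    new_sigma = max(min_sigma, sigma * math.exp(tau * rng.gauss(0.0, 1.0)))
    new_x = [xi + rng.gauss(0.0, new_sigma) for xi in x]
    return new_x, new_sigma


# Usage: each individual carries and mutates its own step size.
child_x, child_sigma = self_adaptive_mutation([0.0, 0.0, 0.0], 1.0)
```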

#### 1.2.1.4 option 4 self-adaptive evolving more

• Assign a personal σ to each variable in each individual
• Incorporate σ’s into the chromosomes: (x1, …, xn, σ1, …, σn)
• Apply variation operators to xi’s and σi’s
• Features:
• changes in σi are results of natural selection
• (almost) no user control of σi
• σi is not predictable
• a given σi acts on one gene of one individual
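
With one σi per gene, a hedged sketch looks as follows. It again assumes a log-normal update, here with a shared and a per-gene random component; the learning rates `tau_global` and `tau_local` follow a common convention from the evolution strategies literature and are not taken from these notes.

```python
import math
import random


def mutate_per_gene(x, sigmas, rng=random, min_sigma=1e-8):
    """Mutate an individual (x1, ..., xn, sigma1, ..., sigman).

    Assumed operator: sigma'_i = sigma_i * exp(shared + tau_local * N_i(0, 1)),
    where shared = tau_global * N(0, 1) is drawn once per individual,
    followed by x'_i = x_i + N(0, sigma'_i).
    """
    n = len(x)
    if len(sigmas) != n:
        raise ValueError("need exactly one sigma per gene")
    tau_global = 1.0 / math.sqrt(2.0 * n)            # assumed learning rate
    tau_local = 1.0 / math.sqrt(2.0 * math.sqrt(n))  # assumed learning rate
    shared = tau_global * rng.gauss(0.0, 1.0)
    new_sigmas = [
        max(min_sigma, s * math.exp(shared + tau_local * rng.gauss(0.0, 1.0)))
        for s in sigmas
    ]
    new_x = [xi + rng.gauss(0.0, si) for xi, si in zip(x, new_sigmas)]
    return new_x, new_sigmas


# Usage: every gene is perturbed with its own step size.
child_x, child_sigmas = mutate_per_gene([0.0, 0.0], [1.0, 0.1])
```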

### 1.2.2 Varying penalties

• Constraints in CSP
• gi(x) ≤ 0 for i = 1,…,q (inequality constraints)
• hi(x) = 0 for i = q+1,…,m (equality constraints)
• are handled by penalties:
eval(x) = f(x) + W × penalty(x)
• where
$$penalty(x) = \sum_{j=1}^{m} \left\{ \begin{array}{ll} 1 & \text{for a violated constraint,} \\ 0 & \text{for a satisfied constraint.} \end{array} \right.$$
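
The penalty-based evaluation can be sketched as below, assuming minimisation and constraints passed as plain Python callables. The helper names and the tolerance `tol` for equality constraints are hypothetical choices for the example.

```python
from typing import Callable, Sequence

Vector = Sequence[float]
Constraint = Callable[[Vector], float]


def penalty(x: Vector, inequalities: Sequence[Constraint],
            equalities: Sequence[Constraint], tol: float = 1e-9) -> int:
    """Count violated constraints: g(x) <= 0 and h(x) = 0 (within tol)."""
    violated = sum(1 for g in inequalities if g(x) > 0.0)
    violated += sum(1 for h in equalities if abs(h(x)) > tol)
    return violated


def evaluate(x: Vector, f: Callable[[Vector], float], W: float,
             inequalities: Sequence[Constraint],
             equalities: Sequence[Constraint]) -> float:
    """eval(x) = f(x) + W * penalty(x)."""
    return f(x) + W * penalty(x, inequalities, equalities)


# Usage: minimise x0^2 + x1^2 subject to x0 >= 1 and x0 = x1.
f = lambda x: x[0] ** 2 + x[1] ** 2
g = [lambda x: 1.0 - x[0]]   # 1 - x0 <= 0
h = [lambda x: x[0] - x[1]]  # x0 - x1 = 0
score = evaluate((0.0, 0.0), f, 10.0, g, h)
```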

#### 1.2.2.1 option 1 deterministic

• Replace the constant W by a function W(t):
$$W(t) = (C \cdot t)^{\alpha}$$
• 0 ≤ t ≤ T is the current generation number
• Features:
• changes in W independent from the search progress
• strong user control of W by the above formula
• W is fully predictable
• a given W acts on all individuals of the population
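
A small sketch of the deterministic penalty schedule, reading the formula as W(t) = (C·t)^α as in the summary table later in these notes. The default values for `C` and `alpha` are arbitrary assumptions; in practice they are user-chosen constants.

```python
def penalty_weight(t: int, C: float = 0.5, alpha: float = 2.0) -> float:
    """Deterministic penalty weight W(t) = (C * t) ** alpha for t >= 0."""
    if t < 0 or C <= 0.0:
        raise ValueError("need t >= 0 and C > 0")
    return (C * t) ** alpha


# Usage: with alpha > 0 the weight grows with the generation number, so
# constraint violations are punished more heavily late in the run.
weights = [penalty_weight(t) for t in range(5)]
```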

#### 1.2.2.2 option 2 adaptive

• Replace the constant W by W(t), updated in each generation:
$$W(t + 1) = \left\{ \begin{array}{ll} \beta \cdot W(t) & \text{if the last } k \text{ champions were all feasible,} \\ \gamma \cdot W(t) & \text{if the last } k \text{ champions were all infeasible,} \\ W(t) & \text{otherwise} \end{array} \right.$$
• β < 1, γ > 1, β × γ ≠ 1
• champion: best individual of its generation
• Features:
• changes in W are based on feedback from the search progress
• some user control of W by the above formula
• W is not predictable
• a given W acts on all individuals of the population
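
The feedback rule for W can be sketched as follows. The history is assumed to be a list of booleans, one per generation, recording whether that generation's champion was feasible; the default values of `beta` and `gamma` are illustrative only (they satisfy β < 1, γ > 1, β × γ ≠ 1).

```python
def update_weight(W: float, champion_feasible: list, k: int,
                  beta: float = 0.5, gamma: float = 3.0) -> float:
    """Return W(t + 1) given the feasibility history of past champions."""
    if k <= 0:
        raise ValueError("k must be positive")
    recent = champion_feasible[-k:]
    if len(recent) < k:
        return W          # not enough history yet (assumption: keep W)
    if all(recent):
        return beta * W   # last k champions all feasible: relax the penalty
    if not any(recent):
        return gamma * W  # last k champions all infeasible: tighten it
    return W              # mixed history: keep W


# Usage: three infeasible champions in a row triple the weight.
W_next = update_weight(10.0, [True, False, False, False], k=3)
```

Choosing β × γ ≠ 1 avoids W cycling between the same two values when the history alternates between the feasible and infeasible cases.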

#### 1.2.2.3 option 3 self-adaptive evolving

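• Assign a personal W to each individual
• Incorporate this W into the chromosome: (x1, …, xn, W)
• Apply variation operators to xi’s and W
• The evaluation then depends on the evolved parameter itself:
eval((x, W)) = f(x) + W × penalty(x)
while for mutation step sizes we had
eval((x, σ)) = f(x)
• This option is thus sensitive to “cheating”: an individual can improve its evaluation by lowering W instead of satisfying constraints ⇒ makes no sense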

However, one could use an objective tournament to evaluate fitness,
and still evolve the fitness function.

### 1.2.3 Past lessons learned

• Various forms of parameter control can be distinguished by:
• primary features:
• what component of the EA is changed
• how the change is made
• secondary features:
• evidence/data backing up changes
• level/scope of change

The forms of parameter control discussed above compare as follows:

| Mechanism | What | Evidence | Scope |
|---|---|---|---|
| σ(t) = 1 − 0.9·t/T | Step size | Time | Population |
| σ’ = σ/c, if r > ⅕ … | Step size | Successful mutations rate | Population |
| (x1, …, xn, σ) | Step size | (Fitness) | Individual |
| (x1, …, xn, σ1, …, σn) | Step size | (Fitness) | Gene |
| W(t) = (C·t)^α | Penalty weight | Time | Population |
| W’ = β·W, if bi ∈ F | Penalty weight | Constraint satisfaction history | Population |
| (x1, …, xn, W) | Penalty weight | (Fitness) | Individual |

## 1.3 Classification of control methods

1. What is changed?
2. How is the change made?
3. What evidence informs the change?
4. What is the scope of the change?

### 1.3.1 1. Where to apply parameter control?

What is changed?

Practically any EA component can be parameterized and
thus controlled on-the-fly:
* representation
* evaluation function
* variation operators
* selection operator (parent or mating selection)
* replacement operator (survival or environmental selection)
* population (size, topology)

Note that each component can be parameterized,
and that the number of parameters is not clearly defined.
Despite the somewhat arbitrary character of this list of components and of the list of parameters of each component,
we will maintain the ‘what-aspect’ as one of the main classification features,
since this allows us to locate where a specific mechanism has its effect.

### 1.3.2 2. How to apply parameter control?

Three major types of parameter control:

#### 1.3.2.1 Deterministic

some rule modifies the strategy parameter without feedback from the search
(based on some counter)

#### 1.3.2.2 Adaptive

a feedback rule based on some measure monitoring search progress

#### 1.3.2.3 Self-adaptive

parameter values evolve along with the solutions;
they are encoded into chromosomes and undergo variation and selection

### 1.3.3 3. What evidence informs changes?

• The parameter changes may be based on:
• time or number of evaluations (deterministic control)
• population diversity
• gene distribution, etc.
• relative fitness of individuals created with given values (adaptive or self-adaptive control)

#### 1.3.3.1 Absolute evidence

• predefined event triggers change, e.g.
• increase pm by 10% if population diversity falls under threshold x
• Direction and magnitude of change are fixed

#### 1.3.3.2 Relative evidence

• compare values through solutions created with them, e.g.
• increase pm if top quality offspring came by high mutation rates
• Direction and magnitude of change are not fixed
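
To make the contrast concrete, here is a hedged sketch of one rule of each kind for the mutation rate pm. The absolute rule follows the example above (increase pm by 10% when diversity drops below a threshold). The relative rule is only one possible illustration: it assumes maximisation and simply moves pm to the mean rate used by the best offspring. All names and default values are assumptions.

```python
def absolute_update(p_m: float, diversity: float, threshold: float,
                    factor: float = 1.1, p_max: float = 1.0) -> float:
    """Absolute evidence: a predefined event triggers a fixed change."""
    if diversity < threshold:
        return min(p_max, p_m * factor)  # always +10%, capped at p_max
    return p_m


def relative_update(p_m: float, offspring: list, top_fraction: float = 0.5) -> float:
    """Relative evidence: compare rates through the offspring they created.

    offspring is a list of (fitness, rate_used) pairs; higher fitness is
    assumed to be better. Direction and size of the change depend on the data.
    """
    if not offspring:
        return p_m
    ranked = sorted(offspring, key=lambda pair: pair[0], reverse=True)
    top = ranked[:max(1, int(len(ranked) * top_fraction))]
    return sum(rate for _, rate in top) / len(top)


# Usage: the best two of four offspring were created with rates 0.3 and 0.2.
new_rate = relative_update(0.1, [(5.0, 0.3), (1.0, 0.05), (4.0, 0.2), (0.5, 0.01)])
```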

### 1.3.4 4. What is the scope/level of the change?

The parameter may take effect on different levels:
* environment (fitness function)
* population
* individual
* sub-individual

Note: the given component (parameter) determines which scopes are possible

Thus: scope/level is a derived or secondary feature in the classification scheme

## 1.4 Refined taxonomy

Combinations of types and evidences
* Possible: +
* Impossible: -

## 1.5 Evaluation/Summary

• Parameter control offers the possibility to use appropriate values in various stages of the search
• It offers users “liberation” from parameter tuning
• It delegates the parameter setting task to the evolutionary process
• The latter implies a double task for an EA: problem solving + self-calibrating (overhead)

## 1.6 More examples

Of each “What”?

### 1.6.1 Representation

L.D. Whitley, K.E. Mathias, and P. Fitzhorn.
Delta coding: An iterative search strategy for genetic algorithms.
In Belew and Booker, pages 77–84.

We illustrate variable representations with the delta coding algorithm of Mathias and Whitley,
which effectively modifies the encoding of the function parameters.
The motivation behind this algorithm is to
maintain a good balance between fast search and sustaining diversity.
In our taxonomy, it can be categorised as an adaptive adjustment of the representation,
based on absolute evidence.

The GA is used with multiple restarts;
the first run is used to find an interim solution,
and subsequent runs decode the genes as distances (delta values) from the last interim solution.
This way each restart forms a new hypercube,
with the interim solution at its origin.
The resolution of the delta values can also be altered at the restarts,
to expand or contract the search space.
Population diversity is measured
by the Hamming distance (HD) between the best and worst strings of the current population.
The restarts are triggered when this distance
is not greater than 1.
The sketch of the algorithm showing the main idea is given below.
Note that the number of bits for δ can be increased if the same solution INTERIM is found.

    BEGIN
        /* given a starting population and genotype-phenotype encoding */
        WHILE (HD > 1) DO
            RUN GA with k bits per object variable;
        OD
        REPEAT UNTIL (global termination is satisfied) DO
            save best solution as INTERIM;
            reinitialise population with new coding;
            /* k-1 bits as the distance δ to the object value in */
            /* INTERIM and one sign bit */
            WHILE (HD > 1) DO
                RUN GA with this encoding;
            OD
        OD
    END
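
The two building blocks of the sketch above, the delta decoding and the restart trigger, might look like this in Python. The bit layout (sign bit first, then k-1 magnitude bits), the scaling factor `step`, and the function names are assumptions made for this example; the GA itself is not shown.

```python
def hamming_distance(a, b) -> int:
    """Number of positions where two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))


def needs_restart(best, worst) -> bool:
    """Restart when the best and worst strings differ in at most one bit."""
    return hamming_distance(best, worst) <= 1


def decode_delta(bits, interim, k: int, step: float):
    """Decode genes as signed distances (delta values) from INTERIM.

    Each object variable uses k bits: one sign bit followed by k-1 magnitude
    bits. step converts the integer magnitude into the variable's units;
    changing step or k at a restart contracts or expands the hypercube.
    """
    if k < 2 or len(bits) != k * len(interim):
        raise ValueError("need k >= 2 and k bits per object variable")
    values = []
    for i, base in enumerate(interim):
        gene = bits[i * k:(i + 1) * k]
        sign = -1 if gene[0] == 1 else 1
        magnitude = int("".join(str(b) for b in gene[1:]), 2)
        values.append(base + sign * magnitude * step)
    return values


# Usage: two variables, 3 bits each, decoded around INTERIM = (10, 20).
decoded = decode_delta([0, 1, 1, 1, 0, 1], [10.0, 20.0], k=3, step=0.5)
```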

### 1.6.2 Evaluation function

A.E. Eiben and J.K. van der Hauw.
Solving 3-SAT with adaptive genetic algorithms.
In ICEC-97, pages 81–86.

Evaluation functions are typically not varied in an EA,
because they are often considered as part of the problem to be solved,
and not as part of the problem-solving algorithm.
In fact, an evaluation function forms the bridge between the two,
so both views are at least partially true.
In many EAs, the evaluation function is derived from the (optimization) problem at hand,
with a simple transformation of the objective function.
In the class of constraint satisfaction problems, however,
there is no objective function in the problem definition.
One possible approach here is based on penalties.
Let us assume that we have m constraints ci (i ∈ {1, …, m})
and n variables vj (j ∈ {1, …, n}) with the same domain S.
Then the penalties can be defined as follows:
$$f(\bar{s}) = \sum_{i=1}^m w_i \times \chi (\bar{s}, c_i),$$
where wi is the weight associated with violating ci, and
$$\chi(\bar{s}, c_i) = \left\{ \begin{array}{ll} 1 & \text{if } \bar{s} \text{ violates } c_i, \\ 0 & \text{otherwise.} \end{array} \right.$$
Obviously, the setting of these weights has a large impact on the EA performance,
and ideally wi should reflect how hard ci is to satisfy.
The problem is that
finding the appropriate weights requires much insight into the given problem instance,
and therefore it might not be practicable.
The step-wise adaptation of weights (SAW) mechanism,
introduced by Eiben and van der Hauw,
provides a simple and effective way to set these weights.
The basic idea behind the SAW mechanism is that
constraints that are not satisfied after a certain number of steps (fitness evaluations)
must be difficult, and thus must be given a high weight (penalty).
SAW-ing changes the evaluation function adaptively in an EA,
by periodically checking the best individual in the population,
and raising the weights of those constraints this individual violates.
Then the run continues with the new evaluation function.
A nice feature of SAW-ing is that
it liberates the user from seeking good weight settings,
thereby eliminating a possible source of error.
Furthermore, the weights used reflect the difficulty of constraints,
for the given algorithm, on the given problem instance,
in the given stage of the search.
This property is also valuable since, in principle,
different weights could be appropriate for different algorithms.
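
A minimal sketch of the SAW idea follows, with constraints represented as predicates that return True when satisfied. This representation, the increment `delta_w`, and the function names are assumptions for illustration; how often the update is applied (every so many fitness evaluations) is left to the caller.

```python
def saw_fitness(s, constraints, weights) -> float:
    """f(s) = sum of w_i over the constraints c_i that s violates."""
    return sum(w for c, w in zip(constraints, weights) if not c(s))


def saw_update(best, constraints, weights, delta_w: float = 1.0):
    """Raise the weight of every constraint violated by the best individual."""
    return [w + delta_w if not c(best) else w
            for c, w in zip(constraints, weights)]


# Usage: two "not equal" constraints over a three-variable assignment.
constraints = [lambda s: s[0] != s[1], lambda s: s[1] != s[2]]
weights = [1.0, 1.0]
best = (1, 1, 2)                                    # violates only the first
before = saw_fitness(best, constraints, weights)
weights = saw_update(best, constraints, weights)    # periodic check
after = saw_fitness(best, constraints, weights)
```

After the update the same individual scores worse, which pushes the search towards satisfying the constraint that has proven hard so far.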

### 1.6.3 Mutation

Oliver Kramer.
Evolutionary self-adaptation: a survey of operators and strategy parameters.
Evolutionary Intelligence, 3(2):51–65, 2010.

A large majority of work on adapting or self-adapting EA parameters concerns variation operators:
mutation and recombination (crossover).
The 1/5 rule of Rechenberg, discussed earlier,
is a classic example of adaptive mutation step size control in ES.

### 1.6.4 Crossover

L. Davis, editor.
Handbook of Genetic Algorithms.
Van Nostrand Reinhold, 1991.

The classic example for adapting crossover rates in GAs is Davis’s adaptive operator fitness.
The method adapts the rates of crossover operators,
by rewarding those that are successful in creating better offspring.
This reward is propagated back, in diminishing amounts, to the operators of a few generations earlier
that helped set it all up;
the reward is an increase in probability at the cost of other operators.
The GA using this method applies several crossover operators simultaneously,
within the same generation, each having its own crossover rate pc(opi).
Additionally, each operator has its local delta value, di,
that represents the strength of the operator,
measured by the advantage of a child created by using that operator,
with respect to the best individual in the population.
The local deltas are updated after every use of operator i.
The adaptation mechanism recalculates the crossover rates periodically,
redistributing 15% of the probabilities biased by the accumulated operator strengths,
that is, the local deltas.
To this end, these di values are normalized
so that their sum equals 15, yielding $d^{norm}_i$ for each i.
Then the new value for each pc(opi) is 85% of its old value plus its normalized strength:

$$p_c(op_i) = 0.85 \cdot p_c(op_i) + d^{norm}_i$$

Clearly, this method is adaptive based on relative evidence.
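
The periodic rate recalculation can be sketched as follows, assuming the rates are expressed in percent and sum to 100, which is what makes a normalised total of 15 consistent with the 85% factor. The function name and the handling of the no-evidence case are assumptions, and the backward propagation of rewards to earlier operators is not modelled.

```python
def update_operator_rates(rates, deltas, share: float = 15.0):
    """Redistribute `share` percent of the probability mass by operator strength.

    rates:  current crossover rates p_c(op_i) in percent, summing to 100.
    deltas: accumulated local deltas d_i (non-negative operator strengths).
    """
    if len(rates) != len(deltas):
        raise ValueError("need one delta per operator")
    total = sum(deltas)
    if total <= 0:
        return list(rates)  # no evidence collected: keep the old rates
    normalized = [share * d / total for d in deltas]  # sums to `share`
    keep = 1.0 - share / 100.0                        # 0.85 for share = 15
    return [keep * p + dn for p, dn in zip(rates, normalized)]


# Usage: the third operator produced the strongest children and gains rate.
new_rates = update_operator_rates([50.0, 30.0, 20.0], [1.0, 1.0, 2.0])
```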

### 1.6.5 Selection

S.W. Mahfoud.
Boltzmann selection.
In Bäck et al., pages C2.5:1–4.

T. Bäck, D.B. Fogel, and Z. Michalewicz, editors.
Handbook of Evolutionary Computation.
Institute of Physics Publishing, Bristol, and Oxford University
Press, New York, 1997.

https://en.wikipedia.org/wiki/Simulated_annealing

• todo

### 1.6.6 Population

J. Arabas, Z. Michalewicz, and J. Mulawka. GAVaPS – a genetic algorithm
with varying population size. In ICEC-94, pages 73–78.

Z. Michalewicz. Genetic Algorithms + Data Structures = Evolution Programs.
Springer, 3rd edition, 1996.

• todo

### 1.6.7 Multiple simultaneously

T. Bäck.
The interaction of mutation rate, selection and self-adaptation within a genetic algorithm.
In Männer and Manderick, pages 85–94.

T. Bäck, A.E. Eiben, and N.A.L. van der Vaart.
An empirical study on GAs “without parameters”.
In Schoenauer et al., pages 315–324.

• todo

## 1.7 Discussion

Tuning works, and increases the search space.
Control works, and increases it further.
Be systematic in both approaches!