Added mathsym+tcc and boost against all advice
This commit is contained in:
parent
58ae49dd99
commit
90702a435d
136 changed files with 14409 additions and 0 deletions
172
eo/contrib/mathsym/README
Normal file
172
eo/contrib/mathsym/README
Normal file
|
|
@ -0,0 +1,172 @@
|
|||
|
||||
This is not yet another gp system (nyagp). For one, it is not general.
|
||||
It does one thing, find mathematical functions, and tries to do that well.
|
||||
|
||||
So, if you're trying to steer ants on various New Mexico trails, or build your
|
||||
own tiny block world, you're in the wrong place. However, if you're interested
|
||||
in finding mathematical functions either through direct application on data or
|
||||
running it through a simulator, you might find what you're looking for here.
|
||||
|
||||
=== Representation (sym/ + gen/) ========
|
||||
|
||||
Mathsym has a few interesting characteristics. First and foremost is the
|
||||
basic representation. It uses trees, but these trees are stored in a
|
||||
reference counted hashtable. This means that every subtree that is alive
|
||||
is stored once and only once. The reference counting mechanism takes care
|
||||
of memory management.
|
||||
|
||||
The idea of using a hashtable (for offline analysis) comes from Walter Tackett, in his
|
||||
1994 dissertation. The current system is just a real-time implementation of this
|
||||
idea, adding the reference counting for ease of use.
|
||||
|
||||
The hashtable brings overhead. It's still pretty fast, but a string based representation
|
||||
would run rounds around it. However, by virtue of it storing every subtree only once, it
|
||||
is fairly tight on memory. This helps tremendously when confronted with growing populations, bloat.
|
||||
The hashtable implementation can not stop bloat, but does make it more manageable. In a typical
|
||||
GP run, the number of distinct subtrees is only 10-20% of the total number of subtrees.
|
||||
|
||||
Other advantages of the hashtable are in the ability to examine the run more thoroughly. It is easy
|
||||
to check how many subtrees are present in the system, and for each subtree you can check the reference
|
||||
count.
|
||||
|
||||
The basic tree is called a Sym. A Sym is simply a tree, and has children, accessible through args().
|
||||
A Sym simply contains an iterator (== decorated pointer) to an entry in the hashtable.
|
||||
Every time you create a Sym, it is either looked up in the hashtable or added to the hashtable.
|
||||
A Sym has several members: size, depth, args, etc. One interesting member is the refcount().
|
||||
This returns the reference count of the Sym in the hashtable, and thus returns the number
|
||||
of distinct contexts in which the Sym is used.
|
||||
|
||||
Another nice thing of these hashtable Syms is that a check for equality reduces to a pointer comparison.
|
||||
|
||||
The Sym nodes are identified by a simple token, of type token_t (usually an unsigned int). It
|
||||
is completely generic and could conceivably be adapted to steer ants. The rest of the library
|
||||
is however targeted at mathematical functions purely.
|
||||
|
||||
sym/Sym.h is the file to look into for the functionality provided by Sym. The sym/ directory
|
||||
is where the source files are stored that are relevant for the generic Sym functionality. The
|
||||
'gen/' directory contains some generic functionality to build and traverse trees, independent of
|
||||
the function and terminal set.
|
||||
|
||||
The file sym/README.cpp documents the use of the sym library for general GP use.
|
||||
|
||||
=== Function Set (fun/) ===
|
||||
|
||||
The standard GP function set of binary functions: addition, multiplication, subtraction and
|
||||
division is NOT supported.
|
||||
|
||||
What is however supported are the functions of:
|
||||
|
||||
summation: arbitrary arity, arity zero meaning 0.0. Arity 2 is standard addition
|
||||
product: arbitrary arity, arity zero meaning 1.0. Arity 2 is standard multiplication
|
||||
inversion: 1.0 / x. Only arity 1
|
||||
unary minus: -x. Only arity 1
|
||||
|
||||
Plus a whole bunch of other functions (see "fun/FunDef.h")
|
||||
|
||||
The reason for this is the observation (actually from a friend of mine, thanks Luuk),
|
||||
that this set of functions is complete and slightly more orthogonal than a binary set.
|
||||
|
||||
The directory 'fun' contains the functionality for the function and terminal set, together
|
||||
with ERC's etc. fun/FunDef.cpp contains the definition of the functionality. Stuff can be
|
||||
added here, but best to contact me if you miss particular functions.
|
||||
|
||||
=== Evaluation (eval/) ===
|
||||
|
||||
The second important thing is evaluation. Although Syms can be evaluated through an interpreter,
|
||||
this is not the fastest way to go about with it. The standard way of evaluating a Sym is to
|
||||
first *compile* it to a function, and then run it in your favourite environment. Compilation
|
||||
is done through the use of the excellent tinycc compiler, which is blazingly fast and produces
|
||||
pretty good functions.
|
||||
|
||||
Compilation comes in several flavours: compile a single function and retrieve a pointer to a function
|
||||
of signature:
|
||||
|
||||
double func(const double* x);
|
||||
|
||||
where x is the input array. Another option is to compile a bunch of functions in one go, and retrieve an array
|
||||
of such function pointers. The Syms are simply printed and compiled. An example:
|
||||
|
||||
double func(const double* x) { return x*x + x * 1./x; }
|
||||
|
||||
The batch version proceeds significantly more quickly than calling compile every time. The function pointers
|
||||
can be given to a simulation for extremely quick evaluation.
|
||||
|
||||
A third option is to compile a complete population in one go, and return a single pointer of signature
|
||||
|
||||
void func(const double* x, double* y);
|
||||
|
||||
Where 'y' is the (preallocated) output array. This allows to evaluate a complete population in one function
|
||||
call, storing the results in 'y'. It uses the hashtable to store every calculation only once. An example
|
||||
for the two function x*x + x*1./x and x + sin(x*x) is:
|
||||
|
||||
void func(const double* x, double* y) {
|
||||
double a0 = x;
|
||||
double a1 = a0 * a0;
|
||||
double a2 = 1.0;
|
||||
double a3 = a2 / a0;
|
||||
double a4 = a2 * a3;
|
||||
y[0] = a4;
|
||||
double a5 = sin(a1);
|
||||
double a6 = a0 + a5;
|
||||
y[1] = a6;
|
||||
}
|
||||
|
||||
This is the fastest way to evaluate even humongous populations quickly. You might be surprised at
|
||||
the amount of code re-use in a GP population.
|
||||
|
||||
The three compilation functions can be found in eval/sym_compile.h
|
||||
|
||||
A limiting factor in tinycc is that the struct TCCState that is used to hold the compilation context,
|
||||
is not really self-contained. This unfortunately means that with every call to 'compile' ALL previous
|
||||
pointers that have been produced are invalidated. I'm still looking at ways to circumvent this.
|
||||
|
||||
To work with mathsym, a few small changes in tccelf.c were necessary, check README.TCC for details.
|
||||
|
||||
=== Interval Arithmetic (eval/) ===
|
||||
|
||||
GP is pretty good at finding mathematical expressions that are numerically unsound. Take for instance
|
||||
the function '1 / x'. This is well defined only when x is strictly positive, but will lead to problems
|
||||
when x equals 0. The standard answer is to define some pseudo-arithmetical function called 'protected
|
||||
division' that will return some value (usually 1) when a division by zero occurs. This leads to a number
|
||||
of protected functions (sqrt, log, tan, etc.) which all need to be protected. Interpreting results from
|
||||
GP using such functions is in general hard.
|
||||
|
||||
Interval arithmetic (through another excellent library boost/numeric/interval) is used to calculate
|
||||
if particular functions can conceivably produce problems. This completely annihilates the use for Koza-style
|
||||
protected operators and is a more safe and sound method. For interval arithmetic to function, the bounds
|
||||
on the input variables need to be known. As for every function we can calculate a guarenteed,
|
||||
though not necessarily tight, output interval given the input intervals, we can check arbitrary functions
|
||||
for possible problems. If, for example for division, the input interval contains 0, we know that a division
|
||||
by zero is theoretically possible. It's then best to throw away the entire function.
|
||||
|
||||
Interval Arithmetic is accessible through the class IntervalBoundsCheck (eval/BoundsCheck.h)
|
||||
|
||||
=== More generic support (gen/) ===
|
||||
|
||||
The gen subdirectory contains some general utility classes for defining function sets and for
|
||||
creating trees. The idea is that these functions are generic and only append on the sym/ part
|
||||
of the library. Unfortunately, the language table currently needs an ERC function, a default
|
||||
implementation is hidden inside fun/FunDef.cpp. Will fix at some point.
|
||||
|
||||
gen/LanguageTable.cpp -> defines the functions/terminals that can be used
|
||||
gen/TreeBuilder.cpp -> can create trees based on a LanguageTable
|
||||
|
||||
=== Data and Errors (regression/) ===
|
||||
|
||||
The above classes are generic and apply for any type of problem where a mathematical function can be
|
||||
used to steer some process, run a simulation, whatever. First check the intervals, then compile the
|
||||
Sym(s) to a (set of) function pointer(s), and use the pointers in some way to evaluate for fitness.
|
||||
One particular type of problem for which support is built in is 'symbolic regression'. This type of
|
||||
problem involves finding an mathematical input/output relationship based on some data.
|
||||
|
||||
To enable this, regression/ introduces the class Dataset to contain the data and ErrorMeasure to calculate
|
||||
error. Currently supported: mean squared error, mean absolute error and mean squared error scaled (proportional
|
||||
to correlation squared). They use some helper classes such as Scaling and TargetInfo.
|
||||
|
||||
=== EO interface (eo_interface/) ===
|
||||
|
||||
Contains the classes to make it all work with EO. Check the root application 'symreg' for ways to use this
|
||||
|
||||
|
||||
|
||||
|
||||
Reference in a new issue