Symbolic Regression of Nonlinear Time Continuous Systems

Note

Symbolic regression uses regularized evolution, simulated annealing, and gradient-free optimization to find suitable equations. Hence, the performance may vary and depends strongly on the hyperparameters of the optimization. This example might not recover the ground truth, but showcases the usage within DataDrivenDiffEq.jl.

DataDrivenDiffEq offers an interface to SymbolicRegression.jl to infer more complex functions. To use it, simply load a compatible version of SymbolicRegression (currently, version >= 0.9 is supported).
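If SymbolicRegression.jl is not yet installed, it can be added via Pkg; the version note above applies:

```julia
using Pkg
# Install SymbolicRegression.jl; any version >= 0.9 works with this interface
Pkg.add("SymbolicRegression")
```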

using DataDrivenDiffEq
using ModelingToolkit
using LinearAlgebra
using OrdinaryDiffEq
using SymbolicRegression
using Plots

A = [-0.9 0.2; 0.0 -0.5]
B = [0.0; 1.0]
u0 = [10.0; -10.0]
tspan = (0.0, 20.0)

f(u,p,t) = A*u .+ B .* sin(0.5*t)

sys = ODEProblem(f, u0, tspan)
sol = solve(sys, Tsit5(), saveat = 0.01);

We will use the data provided by our problem, but add the control signal U = sin(0.5*t) to it as an array of sampled values, instead of passing it as a function as in other examples.

X = Array(sol)
t = sol.t
U = permutedims(sin.(0.5*t))
prob = ContinuousDataDrivenProblem(X, t, U = U)
Continuous DataDrivenProblem{Float64} ##DDProblem#560 in 2 dimensions and 2001 samples with controls
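As an aside, the control input can also be supplied as a function rather than a precomputed array; this is a minimal sketch, assuming the installed DataDrivenDiffEq version accepts a function of (u, p, t) for U:

```julia
# Sketch: supply the control signal as a function of (u, p, t)
# rather than as a matrix of sampled values.
control_signal(u, p, t) = [sin(0.5 * t)]
prob_f = ContinuousDataDrivenProblem(X, t, U = control_signal)
```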

And plot the problem's data.

plot(prob)

To solve our problem, we will use EQSearch, which provides a wrapper for the symbolic regression interface. By default, it takes a Vector of Functions and additional keyword arguments. We will stick to simple operations like subtraction and multiplication, use an L1DistLoss, limit the maximum size, and penalize complex equations while fitting on minibatches.

alg = EQSearch([-, *], loss = L1DistLoss(), verbosity = 0, maxsize = 9, batching = true, batchSize = 50, parsimony = 0.01f0)
EQSearch(Function[-, *], Base.Pairs{Symbol, Any, NTuple{6, Symbol}, NamedTuple{(:loss, :verbosity, :maxsize, :batching, :batchSize, :parsimony), Tuple{L1DistLoss, Int64, Int64, Bool, Int64, Float32}}}(:loss => L1DistLoss(), :verbosity => 0, :maxsize => 9, :batching => true, :batchSize => 50, :parsimony => 0.01f0))
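Since the search is stochastic (see the note above), fixing a random seed can help reproducibility. This is a sketch under the assumption that the installed SymbolicRegression version's Options accepts a seed keyword, which EQSearch forwards; verify against your installed version:

```julia
# Hypothetical: pass a seed through EQSearch's keyword arguments.
# Assumes the installed SymbolicRegression.Options supports `seed`.
alg_seeded = EQSearch([-, *], loss = L1DistLoss(), maxsize = 9, seed = 42)
```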

Again, we solve the problem to obtain a DataDrivenSolution. Note that any additional keyword arguments are passed on to SymbolicRegression's EquationSearch, with the exception of niterations, which is set via max_iter.

res = solve(prob, alg, max_iter = 1_0, numprocs = 0, multithreading = false)
println(res)
Started!

Cycles per second: 1.300e+04
Head worker occupation: 86.1%
Progress: 36 / 300 total iterations (12.000%)
==============================
Best equations for output 1
Hall of Fame:
-----------------------------------------
Complexity  loss       Score     Equation
1           5.772e-01  4.002e-01  0.04618871
3           2.187e-01  4.852e-01  (x1 * -1.1506101)
5           4.955e-02  7.424e-01  ((x2 * 0.1916655) - x1)
7           9.565e-03  8.225e-01  ((x2 * 0.19820975) - (x1 * 0.92271245))

==============================
Best equations for output 2
Hall of Fame:
-----------------------------------------
Complexity  loss       Score     Equation
1           6.990e-01  2.029e-01  x3
3           6.534e-01  3.370e-02  (x2 * -0.3798564)
5           2.076e-03  2.876e+00  (x3 - (x2 * 0.5002925))
7           1.354e-03  2.138e-01  ((x3 - (x2 - x3)) * 0.50092924)

==============================
Press 'q' and then <enter> to stop execution early.

Cycles per second: 1.350e+04
Head worker occupation: 80.8%
Progress: 75 / 300 total iterations (25.000%)
==============================
Best equations for output 1
Hall of Fame:
-----------------------------------------
Complexity  loss       Score     Equation
1           5.753e-01  4.035e-01  0.03991594
3           2.187e-01  4.836e-01  (x1 * -1.1506101)
5           4.955e-02  7.424e-01  ((x2 * 0.1916655) - x1)
7           3.911e-03  1.270e+00  ((0.19820975 * x2) - (x1 * 0.91051865))
9           3.147e-03  1.086e-01  ((((x1 * -0.4825664) - x2) * -0.19908988) - x1)

==============================
Best equations for output 2
Hall of Fame:
-----------------------------------------
Complexity  loss       Score     Equation
1           6.990e-01  2.029e-01  x3
3           6.156e-01  6.350e-02  (x2 * -0.51250565)
5           2.037e-03  2.856e+00  (x3 - (x2 * 0.5009787))
7           1.354e-03  2.044e-01  ((x3 - (x2 - x3)) * 0.50092924)
9           1.196e-03  6.175e-02  (x3 - ((x2 - (x1 * 0.005515127)) * 0.49938065))

==============================
Press 'q' and then <enter> to stop execution early.

Cycles per second: 1.370e+04
Head worker occupation: 86.0%
Progress: 115 / 300 total iterations (38.333%)
==============================
Best equations for output 1
Hall of Fame:
-----------------------------------------
Complexity  loss       Score     Equation
1           5.670e-01  4.181e-01  -0.018630804
3           2.187e-01  4.763e-01  (x1 * -1.1506101)
5           4.955e-02  7.424e-01  ((x2 * 0.1916655) - x1)
7           3.911e-03  1.270e+00  ((0.19820975 * x2) - (x1 * 0.91051865))
9           6.550e-04  8.934e-01  (((x1 * 0.0955102) - (x2 * -0.2011499)) - x1)

==============================
Best equations for output 2
Hall of Fame:
-----------------------------------------
Complexity  loss       Score     Equation
1           6.990e-01  2.029e-01  x3
3           6.155e-01  6.359e-02  (x2 * -0.5165656)
5           2.037e-03  2.855e+00  (x3 - (x2 * 0.5009787))
7           1.354e-03  2.044e-01  ((x3 - (x2 - x3)) * 0.50092924)
9           1.195e-03  6.235e-02  (((x3 - (x2 - x3)) * 0.5006994) - -0.0012932614)

==============================
Press 'q' and then <enter> to stop execution early.

Cycles per second: 1.390e+04
Head worker occupation: 88.7%
Progress: 155 / 300 total iterations (51.667%)
==============================
Best equations for output 1
Hall of Fame:
-----------------------------------------
Complexity  loss       Score     Equation
1           5.670e-01  4.181e-01  -0.018630804
3           2.187e-01  4.763e-01  (x1 * -1.1506101)
5           4.955e-02  7.424e-01  ((x2 * 0.1916655) - x1)
7           3.911e-03  1.270e+00  ((0.19820975 * x2) - (x1 * 0.91051865))
9           6.550e-04  8.934e-01  (((x1 * 0.0955102) - (x2 * -0.2011499)) - x1)

==============================
Best equations for output 2
Hall of Fame:
-----------------------------------------
Complexity  loss       Score     Equation
1           6.990e-01  2.029e-01  x3
3           6.155e-01  6.359e-02  (x2 * -0.51654696)
5           2.037e-03  2.855e+00  (x3 - (x2 * 0.5009787))
7           1.352e-03  2.052e-01  ((x3 - (x2 - x3)) * 0.50094223)
9           1.195e-03  6.156e-02  (((x3 - (x2 - x3)) * 0.5006994) - -0.0012932614)

==============================
Press 'q' and then <enter> to stop execution early.

Cycles per second: 1.400e+04
Head worker occupation: 90.5%
Progress: 196 / 300 total iterations (65.333%)
==============================
Best equations for output 1
Hall of Fame:
-----------------------------------------
Complexity  loss       Score     Equation
1           5.670e-01  4.181e-01  -0.018630804
3           2.187e-01  4.763e-01  (x1 * -1.1506101)
5           4.955e-02  7.424e-01  ((x2 * 0.1916655) - x1)
7           3.911e-03  1.270e+00  ((0.19820975 * x2) - (x1 * 0.91051865))
9           6.515e-04  8.961e-01  (((x1 * 0.09559753) - (x2 * -0.20119086)) - x1)

==============================
Best equations for output 2
Hall of Fame:
-----------------------------------------
Complexity  loss       Score     Equation
1           6.990e-01  2.029e-01  x3
3           6.155e-01  6.359e-02  (x2 * -0.51654696)
5           2.037e-03  2.855e+00  (x3 - (x2 * 0.5009787))
7           1.352e-03  2.052e-01  ((x3 - (x2 - x3)) * 0.5009425)
9           1.195e-03  6.156e-02  (((x3 - (x2 - x3)) * 0.5006994) - -0.0012932614)

==============================
Press 'q' and then <enter> to stop execution early.

Cycles per second: 1.410e+04
Head worker occupation: 91.7%
Progress: 237 / 300 total iterations (79.000%)
==============================
Best equations for output 1
Hall of Fame:
-----------------------------------------
Complexity  loss       Score     Equation
1           5.670e-01  4.181e-01  -0.018630804
3           2.187e-01  4.763e-01  (x1 * -1.1506101)
5           4.955e-02  7.424e-01  ((x2 * 0.1916655) - x1)
7           3.708e-03  1.296e+00  ((x2 * 0.19820975) - (x1 * 0.9098831))
9           6.515e-04  8.695e-01  (((x1 * 0.09559753) - (x2 * -0.20119086)) - x1)

==============================
Best equations for output 2
Hall of Fame:
-----------------------------------------
Complexity  loss       Score     Equation
1           6.990e-01  2.029e-01  x3
3           6.155e-01  6.359e-02  (x2 * -0.51654696)
5           2.037e-03  2.855e+00  (x3 - (x2 * 0.5009787))
7           1.352e-03  2.052e-01  ((x3 - (x2 - x3)) * 0.5009425)
9           1.166e-03  7.378e-02  (x3 - ((x2 - (x1 * 0.0051128366)) * 0.4992699))

==============================
Press 'q' and then <enter> to stop execution early.

Cycles per second: 1.420e+04
Head worker occupation: 92.6%
Progress: 277 / 300 total iterations (92.333%)
==============================
Best equations for output 1
Hall of Fame:
-----------------------------------------
Complexity  loss       Score     Equation
1           5.670e-01  4.181e-01  -0.018720917
3           2.187e-01  4.763e-01  (x1 * -1.1505861)
5           4.955e-02  7.424e-01  ((x2 * 0.1916655) - x1)
7           7.987e-04  2.064e+00  (((x2 * 0.2226061) - x1) * 0.9039281)
9           6.515e-04  1.018e-01  (((x1 * 0.09559753) - (x2 * -0.20119086)) - x1)

==============================
Best equations for output 2
Hall of Fame:
-----------------------------------------
Complexity  loss       Score     Equation
1           6.990e-01  2.029e-01  x3
3           6.155e-01  6.359e-02  (x2 * -0.51654696)
5           2.037e-03  2.855e+00  (x3 - (x2 * 0.5009787))
7           1.352e-03  2.052e-01  ((x3 - (x2 - x3)) * 0.5009425)
9           1.166e-03  7.378e-02  (x3 - ((x2 - (x1 * 0.0051128366)) * 0.4992699))

==============================
Press 'q' and then <enter> to stop execution early.
Nonlinear Solution with 2 equations and 0 parameters.
Returncode: converged
L₂ Norm error : [0.013097956696176185, 0.00858541025345419]
AIC : [-8674.933375702132, -9520.13968806776]
R² : [0.9999976252527984, 0.9999970697781533]

We see that the system has been recovered correctly, as indicated by the small error. A closer look at the equations reveals the recovered structure.

system = result(res)

println(system)
println(res)
Model ##Basis#576 with 2 equations
States : x[1](t) x[2](t)
Independent variable: t
Equations
Differential(t)(x[1](t)) = 0.20119086x[2](t) - 0.9044025x[1](t)
Differential(t)(x[2](t)) = 0.0025526853x[1](t) + u[1](t) - 0.4992699x[2](t)
Nonlinear Solution with 2 equations and 0 parameters.
Returncode: converged
L₂ Norm error : [0.013097956696176185, 0.00858541025345419]
AIC : [-8674.933375702132, -9520.13968806776]
R² : [0.9999976252527984, 0.9999970697781533]

While the coefficients are not recovered exactly, the representation closely matches the original system. We can also plot the prediction of the recovered dynamics.

plot(res)

To convert the result into an ODESystem, we substitute the control signal.

u = controls(system)
t = get_iv(system)

subs_control = (u[1] => sin(0.5*t))

eqs = map(equations(system)) do eq
    eq.lhs ~ substitute(eq.rhs, subs_control)
end

@named sys = ODESystem(
    eqs,
    get_iv(system),
    states(system),
    []
    );

The system can then be simulated with OrdinaryDiffEq.jl using the (known) initial conditions. Since the parameters are hard-coded numerical values, we do not need to include a parameter mapping.

x = states(system)
x0 = [x[1] => u0[1], x[2] => u0[2]]

ode_prob = ODEProblem(sys, x0, tspan)
estimate = solve(ode_prob, Tsit5(), saveat = prob.t);

And look at the result

plot(sol, color = :black)
plot!(estimate, color = :red, linestyle = :dash)
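Beyond the visual comparison, a quick numerical check of the fit is possible; a minimal sketch, assuming sol and estimate share the same save points:

```julia
# Compare ground truth and estimate pointwise (both saved on prob.t).
X_true = Array(sol)
X_est = Array(estimate)
println("Maximum absolute error: ", maximum(abs, X_true .- X_est))
```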

Copy-Pasteable Code

using DataDrivenDiffEq
using ModelingToolkit
using LinearAlgebra
using OrdinaryDiffEq
using SymbolicRegression

A = [-0.9 0.2; 0.0 -0.5]
B = [0.0; 1.0]
u0 = [10.0; -10.0]
tspan = (0.0, 20.0)

f(u,p,t) = A*u .+ B .* sin(0.5*t)

sys = ODEProblem(f, u0, tspan)
sol = solve(sys, Tsit5(), saveat = 0.01);

X = Array(sol)
t = sol.t
U = permutedims(sin.(0.5*t))
prob = ContinuousDataDrivenProblem(X, t, U = U)

alg = EQSearch([-, *], loss = L1DistLoss(), verbosity = 0, maxsize = 9, batching = true, batchSize = 50, parsimony = 0.01f0)

res = solve(prob, alg, max_iter = 1_0, numprocs = 0, multithreading = false)

system = result(res)
println(system)
println(res)

u = controls(system)
t = get_iv(system)

subs_control = (u[1] => sin(0.5*t))

eqs = map(equations(system)) do eq
    eq.lhs ~ substitute(eq.rhs, subs_control)
end

@named sys = ODESystem(
    eqs,
    get_iv(system),
    states(system),
    []
    );

x = states(system)
x0 = [x[1] => u0[1], x[2] => u0[2]]

ode_prob = ODEProblem(sys, x0, tspan)
estimate = solve(ode_prob, Tsit5(), saveat = prob.t);

# This file was generated using Literate.jl, https://github.com/fredrikekre/Literate.jl

This page was generated using Literate.jl.