Sparse Identification with noisy data

Many real world data sources are corrupted with measurment noise, which can have a big impact on the recovery of the underlying equations of motion. This example show how we can use collocation and batching to perform SINDy in the presence of noise.

using DataDrivenDiffEq
using LinearAlgebra
using ModelingToolkit
using OrdinaryDiffEq
using Plots
gr()

function pendulum(u, p, t)
    x = u[2]
    y = -9.81sin(u[1]) - 0.3u[2]^3 -3.0*cos(u[1]) - 10.0*exp(-((t-5.0)/5.0)^2)
    return [x;y]
end

u0 = [0.99π; -1.0]
tspan = (0.0, 15.0)
prob = ODEProblem(pendulum, u0, tspan)
sol = solve(prob, Tsit5(), saveat = 0.01);

We add random noise to our measurements.

X = sol[:,:] + 0.2 .* randn(size(sol));
ts = sol.t;

plot(ts, X', color = :red)
plot!(sol, color = :black)

To estimate the system, we first create a DataDrivenProblem via feeding in the measurement data. Using a Collocation method, it automatically provides the derivative and smoothes the trajectory. Control signals can be passed in as a function (u,p,t)->control or an array of measurements.

prob = ContinuousDataDrivenProblem(X, ts, GaussianKernel() ,
    U = (u,p,t)->[exp(-((t-5.0)/5.0)^2)], p = ones(2))

plot(prob, size = (600,600))

Now we infer the system structure. First we define a Basis which collects all possible candidate terms. Since we want to use SINDy, we call solve with an Optimizer, in this case STLSQ which iterates different sparsity thresholds and returns a pareto optimal solution of the underlying sparse_regression!. Note that we include the control signal in the basis as an additional variable c.

@variables u[1:2] c[1:1]
@parameters w[1:2]
u = collect(u)
c = collect(c)
w = collect(w)

h = Num[sin.(w[1].*u[1]);cos.(w[2].*u[1]); polynomial_basis(u, 5); c]

basis = Basis(h, u, parameters = w, controls = c);
Model ##Basis#361 with 24 equations
States : u[1] u[2]
Parameters : w[1] w[2]
Independent variable: t
Equations
φ₁ = sin(u[1]*w[1])
φ₂ = cos(u[1]*w[2])
φ₃ = 1
φ₄ = u[1]
...
φ₂₄ = c[1]

To solve the problem, we also define a DataSampler which defines randomly shuffled minibatches of our data and selects the best fit.

sampler = DataSampler(Batcher(n = 5, shuffle = true, repeated = true))
λs = exp10.(-10:0.1:-1)
opt = STLSQ(λs)
res = solve(prob, basis, opt, progress = false, sampler = sampler, denoise = false, normalize = false, maxiter = 5000)
Linear Solution with 2 equations and 7 parameters.
Returncode: solved
L₂ Norm error : [68.58981603423906, 228.31842830788685]
AIC : [6360.444248169436, 8165.542646239592]
R² : [0.9788629108077467, 0.9723820605878214]
Info

A more detailed description of the result can be printed via print(res, Val{true}), which also includes the discovered equations and parameter values.

Where the resulting DataDrivenSolution stores information about the inferred model and the parameters:

system = result(res)
params = parameters(res)
Model ##Basis#364 with 2 equations
States : u[1] u[2]
Parameters : 7
Independent variable: t
Equations
Differential(t)(u[1]) = p₃*u[2]
Differential(t)(u[2]) = p₆*(u[2]^3) + p₄*sin(u[1]*w[1]) + p₅*cos(u[1]*w[2]) + p₇*c[1]
[1.0, 1.0, 1.0024, -9.3, -2.8, -0.29, -9.5]

And a visual check of the result can be perfomed via plotting the result

plot(
    plot(prob), plot(res), layout = (1,2)
)

Copy-Pasteable Code

using DataDrivenDiffEq
using LinearAlgebra
using ModelingToolkit
using OrdinaryDiffEq

function pendulum(u, p, t)
    x = u[2]
    y = -9.81sin(u[1]) - 0.3u[2]^3 -3.0*cos(u[1]) - 10.0*exp(-((t-5.0)/5.0)^2)
    return [x;y]
end

u0 = [0.99π; -1.0]
tspan = (0.0, 15.0)
prob = ODEProblem(pendulum, u0, tspan)
sol = solve(prob, Tsit5(), saveat = 0.01);

X = sol[:,:] + 0.2 .* randn(size(sol));
ts = sol.t;

prob = ContinuousDataDrivenProblem(X, ts, GaussianKernel() ,
    U = (u,p,t)->[exp(-((t-5.0)/5.0)^2)], p = ones(2))

@variables u[1:2] c[1:1]
@parameters w[1:2]
u = collect(u)
c = collect(c)
w = collect(w)

h = Num[sin.(w[1].*u[1]);cos.(w[2].*u[1]); polynomial_basis(u, 5); c]

basis = Basis(h, u, parameters = w, controls = c);
println(basis) # hide

sampler = DataSampler(Batcher(n = 5, shuffle = true, repeated = true))
λs = exp10.(-10:0.1:-1)
opt = STLSQ(λs)
res = solve(prob, basis, opt, progress = false, sampler = sampler, denoise = false, normalize = false, maxiter = 5000)
println(res) #hide

system = result(res)
params = parameters(res)
println(system) #hide
println(params) #hide

# This file was generated using Literate.jl, https://github.com/fredrikekre/Literate.jl

This page was generated using Literate.jl.