Sparse Identification with noisy data
Many data real world data sources are corrupted with measurment noise, which can have a big impact on the recovery of the underlying equations of motion. This example show how we can use collocation
and batching
to perform SINDy in the presence of noise.
using DataDrivenDiffEq
using LinearAlgebra
using ModelingToolkit
using OrdinaryDiffEq
using Plots
gr()
function pendulum(u, p, t)
x = u[2]
y = -9.81sin(u[1]) - 0.3u[2]^3 -3.0*cos(u[1]) - 10.0*exp(-((t-5.0)/5.0)^2)
return [x;y]
end
u0 = [0.99π; -1.0]
tspan = (0.0, 15.0)
prob = ODEProblem(pendulum, u0, tspan)
sol = solve(prob, Tsit5(), saveat = 0.01);
We add random noise to our measurements.
X = sol[:,:] + 0.2 .* randn(size(sol));
ts = sol.t;
plot(ts, X', color = :red)
plot!(sol, color = :black)
To estimate the system, we first create a DataDrivenProblem
via feeding in the measurement data. Using a Collocation method, it automatically provides the derivative and smoothes the trajectory. Control signals can be passed in as a function (u,p,t)->control
or an array of measurements.
prob = ContinuousDataDrivenProblem(X, ts, GaussianKernel() ,
U = (u,p,t)->[exp(-((t-5.0)/5.0)^2)], p = ones(2))
plot(prob, size = (600,600))
Now we infer the system structure. First we define a Basis
which collects all possible candidate terms. Since we want to use SINDy, we call solve
with an Optimizer
, in this case STLSQ
which iterates different sparsity thresholds and returns a pareto optimal solution of the underlying sparse_regression!
. Note that we include the control signal in the basis as an additional variable c
.
@variables u[1:2] c[1:1]
@parameters w[1:2]
u = collect(u)
c = collect(c)
w = collect(w)
h = Num[sin.(w[1].*u[1]);cos.(w[2].*u[1]); polynomial_basis(u, 5); c]
basis = Basis(h, u, parameters = w, controls = c)
\begin{align} \varphi1 =& \sin\left( u1 w1 \right) \ \varphi2 =& \cos\left( u1 w2 \right) \ \varphi3 =& 1 \ \varphi4 =& u1 \ \varphi5 =& u1^{2} \ \varphi6 =& u1^{3} \ \varphi7 =& u1^{4} \ \varphi8 =& u1^{5} \ \varphi9 =& u2 \ \varphi{1 0} =& u1 u2 \ \varphi{1 1} =& u1^{2} u2 \ \varphi{1 2} =& u1^{3} u2 \ \varphi{1 3} =& u1^{4} u2 \ \varphi{1 4} =& u2^{2} \ \varphi{1 5} =& u2^{2} u1 \ \varphi{1 6} =& u2^{2} u1^{2} \ \varphi{1 7} =& u2^{2} u1^{3} \ \varphi{1 8} =& u2^{3} \ \varphi{1 9} =& u2^{3} u1 \ \varphi{2 0} =& u2^{3} u1^{2} \ \varphi{2 1} =& u2^{4} \ \varphi{2 2} =& u2^{4} u1 \ \varphi{2 3} =& u2^{5} \ \varphi{2 4} =& c_1 \end{align}
To solve the problem, we also define a DataSampler
which defines randomly shuffled minibatches of our data and selects the best fit.
sampler = DataSampler(Batcher(n = 5, shuffle = true, repeated = true))
λs = exp10.(-10:0.1:-1)
opt = STLSQ(λs)
res = solve(prob, basis, opt, progress = false, sampler = sampler, denoise = false, normalize = false, maxiter = 5000)
Linear Solution with 2 equations and 7 parameters.
Returncode: solved
L₂ Norm error : [65.26399418020465, 197.01483873182758]
AIC : [9426.609907370652, 6109.884562225435]
R² : [0.99387690992186, 1.0113767852722464]
A more detailed description of the result can be printed via print(res, Val{true})
, which also includes the discovered equations and parameter values.
Where the resulting DataDrivenSolution
stores information about the inferred model and the parameters:
system = result(res)
params = parameters(res)
Model ##Basis#383 with 2 equations
States : u[1] u[2]
Parameters : 7
Independent variable: t
Equations
Differential(t)(u[1]) = p₃*u[2]
Differential(t)(u[2]) = p₆*(u[2]^3) + p₄*sin(u[1]*w[1]) + p₅*cos(u[1]*w[2]) + p₇*c[1]
[1.0, 1.0, 1.0026, -9.5, -2.9, -0.31, -9.7]
And a visual check of the result can be perfomed via plotting the result
plot(
plot(prob), plot(res), layout = (1,2)
)
Copy-Pasteable Code
using DataDrivenDiffEq
using LinearAlgebra
using ModelingToolkit
using OrdinaryDiffEq
function pendulum(u, p, t)
x = u[2]
y = -9.81sin(u[1]) - 0.3u[2]^3 -3.0*cos(u[1]) - 10.0*exp(-((t-5.0)/5.0)^2)
return [x;y]
end
u0 = [0.99π; -1.0]
tspan = (0.0, 15.0)
prob = ODEProblem(pendulum, u0, tspan)
sol = solve(prob, Tsit5(), saveat = 0.01);
X = sol[:,:] + 0.2 .* randn(size(sol));
ts = sol.t;
prob = ContinuousDataDrivenProblem(X, ts, GaussianKernel() ,
U = (u,p,t)->[exp(-((t-5.0)/5.0)^2)], p = ones(2))
@variables u[1:2] c[1:1]
@parameters w[1:2]
u = collect(u)
c = collect(c)
w = collect(w)
h = Num[sin.(w[1].*u[1]);cos.(w[2].*u[1]); polynomial_basis(u, 5); c]
basis = Basis(h, u, parameters = w, controls = c)
sampler = DataSampler(Batcher(n = 5, shuffle = true, repeated = true))
λs = exp10.(-10:0.1:-1)
opt = STLSQ(λs)
res = solve(prob, basis, opt, progress = false, sampler = sampler, denoise = false, normalize = false, maxiter = 5000)
println(res) #hide
system = result(res)
params = parameters(res)
println(system) #hide
println(params) #hide
# This file was generated using Literate.jl, https://github.com/fredrikekre/Literate.jl
This page was generated using Literate.jl.