# Sparse Identification of Nonlinear Dynamics

Sparse Identification of Nonlinear Dynamics - or SINDy - identifies the equations of motion of a system as the result of a sparse regression over a chosen basis. In particular, it tries to find coefficients $\Xi$ such that:

where, in most cases, $Y$is the data matrix containing the derivatives of the state data stored in $X$. $\Theta$ is a matrix containing candidate functions $\xi$ over the measurements in $X$.

## Example

As in the original paper, we will estimate the Lorenz System. First, let's create the necessary data and have a look at the trajectory.

```
using DataDrivenDiffEq
using ModelingToolkit
using OrdinaryDiffEq
using LinearAlgebra
using Plots
gr()
# Create a test problem
function lorenz(u,p,t)
x, y, z = u
ẋ = 10.0*(y - x)
ẏ = x*(28.0-z) - y
ż = x*y - (8/3)*z
return [ẋ, ẏ, ż]
end
u0 = [-8.0; 7.0; 27.0]
p = [10.0; -10.0; 28.0; -1.0; -1.0; 1.0; -8/3]
tspan = (0.0,100.0)
dt = 0.001
problem = ODEProblem(lorenz,u0,tspan)
solution = solve(problem, Tsit5(), saveat = dt, atol = 1e-7, rtol = 1e-8)
```

Additionally, we generate the *ideal* derivative data.

```
X = Array(solution)
DX = similar(X)
for (i, xi) in enumerate(eachcol(X))
DX[:,i] = lorenz(xi, [], 0.0)
end
```

To generate the symbolic equations, we need to define a `Basis`

over the variables `x y z`

. In this example, we will use all monomials up to degree of 4 and their products:

```
@variables x y z
u = Operation[x; y; z]
polys = Operation[]
for i ∈ 0:4
for j ∈ 0:i
for k ∈ 0:j
push!(polys, u[1]^i*u[2]^j*u[3]^k)
push!(polys, u[2]^i*u[3]^j*u[1]^k)
push!(polys, u[3]^i*u[1]^j*u[2]^k)
end
end
end
basis = Basis(polys, u)
```

*A Basis consists of unique functions, so duplicates will be included just once*

To perform the sparse identification on our data, we need to define an `Optimizer`

. Here, we will use `STRRidge`

, which is described in the original paper. The threshold of the optimizer is set to `0.1`

. An overview of the different optimizers can be found below.

```
opt = STRRidge(0.1)
Ψ = SINDy(X, DX, basis, opt, maxiter = 100, normalize = true)
```

`Sparse Identification Result with 7 active terms.`

`Ψ`

is a `SINDyResult`

, which stores some about the regression. As we can see, we have 7 active terms inside the model. To look at the equations, simply type

`print_equations(Ψ)`

```
3 dimensional basis in ["x", "y", "z"]
dx = p₁ * x + p₂ * y
dy = p₃ * x + p₄ * y + p₅ * x * z
dz = p₆ * z + p₇ * x * y
```

First, let's have a look at the $L2$-Error and Akaikes Information Criterion of the result

`get_error(Ψ)`

```
3-element Array{Float64,1}:
6.7202639134663155e-12
3.4645801800720905e-11
1.2876598297504605e-11
```

`get_aicc(Ψ)`

```
3-element Array{Float64,1}:
106.90381476450982
102.34377824714254
104.30267815708592
```

We can also access the coefficient matrix $\Xi$ directly via `get_coefficients(Ψ)`

.

To generate a numerical model usable in `DifferentialEquations`

, we simply use the `ODESystem`

function from `ModelingToolkit`

. The resulting parameters used for the identification can be accessed via `parameters(Ψ)`

. The returned vector also includes the parameters of the original `Basis`

used to generate the result.

```
ps = parameters(Ψ)
sys = ODESystem(Ψ)
dudt = ODEFunction(sys)
prob = ODEProblem(dudt, u0, tspan, ps)
sol = solve(prob, Tsit5(), saveat = solution.t, atol = 1e-7, rtol = 1e-8)
```

Let's have a look at the trajectory of $u_{1}(t)$.

Finally, let's investigate the error of the chaotic equations:

which resembles the papers results. Next, we could use classical parameter estimation methods or use DiffEqFlux to fine-tune our result (if needed).

## Functions

`DataDrivenDiffEq.SINDy`

— Function```
SINDy(X, Y, basis, opt = STRRidge(); p, t, maxiter, convergence_error, denoise, normalize)
SINDy(X, Y, basis, lambdas, opt = STRRidge(); f, g, p, t, opt, maxiter, convergence_error, denoise, normalize)
```

Performs Sparse Identification of Nonlinear Dynamics given the data matrices `X`

and `Y`

via the `AbstractBasis`

`basis.`

Keyworded arguments include the parameter (values) of the basis `p`

and the timepoints `t`

which are passed in optionally. `opt`

is an `AbstractOptimizer`

useable for sparse regression, `maxiter`

the maximum iterations to perform and `convergence_error`

the bound which causes the optimizer to stop. `denoise`

defines if the matrix holding candidate trajectories should be thresholded via the optimal threshold for singular values. `normalize`

normalizes the matrix holding candidate trajectories via the L2-Norm over each function.

Typically `X`

represent the state measurements and `Y`

the measurements of the differential state. Since only the number of measurements (column dimension of the matrices) have to be equal, it is possible to augment `X`

with additional data, e.g. external forcing or inputs.

If `SINDy`

is called with an additional array of thresholds contained in `lambdas`

, it performs a multi objective optimization over all thresholds. The best candidate is determined via the mapping onto a feature space `f`

and an (scalar, positive definite) evaluation `g`

. The signature of should be `f(xi, theta, yi)`

where `xi`

are the coefficients of the sparse optimization,`theta`

is the evaluated candidate library and `yi`

are the rows of the matrix `Y`

. Returns a `SINDyResult`

. If the pareto optimization is used, the result combines the best candidate for each row of `Y`

.

`DataDrivenDiffEq.sparse_regression`

— Function```
sparse_regression(X, Y, basis, p, t, maxiter, opt, denoise, normalize, convergence_error)
sparse_regression!(Xi, X, Y, basis, p, t, maxiter, opt, denoise, normalize, convergence_error)
spares_regression!(Xi, Theta, Y, maxiter, opt, denoise, normalize, convergence_error)
```

Performs a sparse regression via the algorithm `opt <: AbstractOptimizer`

. `maxiter`

specifies the upper bound of the iterations of the optimizer, `convergence_error`

the breaking condition due to convergence. `denoise`

defines if the matrix holding candidate trajectories should be thresholded via the optimal threshold for singular values. `normalize`

normalizes the matrix holding candidate trajectories via the L2-Norm over each function.

If the data matrices `X`

, `Y`

are given with a `Basis`

`basis`

and the additional information for parameters `p`

and time points of the measurements `t`

, it returns the coefficient matrix `Xi`

and the iterations taken. This function is also available in place, which returns just the iterations.

If `Xi`

, `Theta`

and `Y`

are given, the sparse regression will find the coefficients `Xi`

, which minimize the objective and return the iterations needed.

**Example**

```
opt = STRRidge()
maxiter = 10
c_error = 1e-3
Xi, iters = sparse_regression(X, Y, basis, [], [], maxiter, opt, false, false, c_error)
iters = sparse_regression!(Xi,X, Y, basis, [], [], maxiter, opt, false, false, c_error)
Xi2 = zeros(size(Y, 1), size(X, 1))
iters = sparse_regression!(Xi2, X, Y, maxiter, opt, false, false, c_error)
```

`DataDrivenDiffEq.SparseIdentificationResult`

— Type`SparseIdentificationResult()`

Contains the result of a sparse identification. Contains the coefficient matrix `Ξ`

, the equations of motion, and its associated parameters. It also stores the optimizer, iteration counter, and convergence status.

Additionally, the model is evaluated over the training data and the $L_2$-error, Akaikes Information Criterion, and the $L_0$-Norm of the coefficients is stored.

`DataDrivenDiffEq.get_sparsity`

— Function`get_sparsity(res)`

Return the $L_0$-Norm of the `SparseIdentificationResult`

`DataDrivenDiffEq.get_error`

— Function`get_error(res)`

Return the $L_2$-Error of the `SparseIdentificationResult`

over the training data.

`DataDrivenDiffEq.get_aicc`

— Function`get_aicc(res)`

Return Akaikakes Information Criterion of the `SparseIdentificationResult`

over the training data.

`DataDrivenDiffEq.get_coefficients`

— Function`get_coefficients(res)`

Return the coefficient matrix `Ξ`

of the `SparseIdentificationResult`

.