Here is an outline of the required elements and choices:
- Define a problem using your data.
  - Data can be discrete, continuous, or direct.
- Choose a basis.
  - This is optional depending on which solver you choose.
- Solve the problem.
  - Many solvers exist; see the docs.
```julia
using DataDrivenDiffEq, ModelingToolkit

# The function we are trying to find
f(u) = u^2 + 4u + 4
X = f.(1:100) # Generate data
X = reshape(X, length(X), 1) # Reshape into a matrix

# Create a problem from the data
problem = DiscreteDataDrivenProblem(X)

# Choose a basis
@variables u[1:1]
using Symbolics: scalarize
u = scalarize(u)
basis = Basis(monomial_basis(u, 2), u)

# Solve the problem, using the solver of your choosing
res = solve(problem, basis, STLSQ())
```
Problems of identification, estimation, or inference are defined by data. These data contain at least measurements of the states `X`, which would be sufficient to describe a `DiscreteDataDrivenProblem` with unit time steps, similar to the first example on dynamic mode decomposition. Of course, we can extend this to include time points `t`, control signals `U`, or a function describing those, `u(x,p,t)`. Additionally, any parameters `p` known a priori can be included in the problem. In practice, this looks like:
```julia
problem = DiscreteDataDrivenProblem(X)
problem = DiscreteDataDrivenProblem(X, t)
problem = DiscreteDataDrivenProblem(X, t, U)
problem = DiscreteDataDrivenProblem(X, t, U, p = p)
problem = DiscreteDataDrivenProblem(X, t, (x,p,t)->u(x,p,t))
```
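To illustrate why state measurements alone can describe a discrete problem with unit time steps, here is a minimal plain-Julia sketch that pairs each snapshot with its successor and fits a linear map by least squares, in the spirit of dynamic mode decomposition. It does not call the package; the matrix `A`, the variable names, and the one-snapshot-per-column layout are assumptions made for illustration.

```julia
using LinearAlgebra

# A known linear map, used only to generate example data (hypothetical system)
A = [0.9 0.1;
     0.0 0.8]

# Simulate 20 snapshots with unit time steps, one snapshot per column (assumed layout)
X = zeros(2, 20)
X[:, 1] = [1.0, 2.0]
for k in 1:19
    X[:, k + 1] = A * X[:, k]
end

# Pair each snapshot with its successor and fit x[k+1] ≈ K * x[k] by least squares
X_now = X[:, 1:end-1]
X_next = X[:, 2:end]
K = X_next / X_now
```

With noise-free data from a linear system, `K` should agree with `A` up to floating-point error. Time points, controls, or parameters only become necessary when the sampling is not uniform or the system depends on them.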
A `ContinuousDataDrivenProblem` needs at least measurements and time derivatives (`DX`), or measurements, time information, and a way to derive the time derivatives (`t` and a collocation method). Again, this can be extended by including a control input, given either as measurements or as a function, and any known parameters:
```julia
problem = ContinuousDataDrivenProblem(X, DX)
problem = ContinuousDataDrivenProblem(X, t, DX)
problem = ContinuousDataDrivenProblem(X, t, DX, U, p = p)
problem = ContinuousDataDrivenProblem(X, t, DX, (x,p,t)->u(x,p,t))

# Using collocation
problem = ContinuousDataDrivenProblem(X, t, InterpolationMethod())
problem = ContinuousDataDrivenProblem(X, t, GaussianKernel())
problem = ContinuousDataDrivenProblem(X, t, U, InterpolationMethod())
problem = ContinuousDataDrivenProblem(X, t, U, GaussianKernel(), p = p)
```
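When only `X` and `t` are available, the time derivatives have to be estimated from the data. The sketch below uses plain finite differences as a simple stand-in for that step. It is not the package's `InterpolationMethod` or `GaussianKernel` collocation; the helper name `finite_difference_derivative` and the one-sample-per-column layout are assumptions made for illustration.

```julia
# Estimate time derivatives of sampled states with finite differences.
# Assumed layout: X is (states × samples), t holds one time point per column.
function finite_difference_derivative(X::AbstractMatrix, t::AbstractVector)
    m = size(X, 2)
    m == length(t) || throw(DimensionMismatch("need one time point per column of X"))
    m >= 2 || throw(ArgumentError("need at least two samples"))
    DX = similar(X, float(eltype(X)))
    # One-sided differences at the two boundary samples
    DX[:, 1] = (X[:, 2] - X[:, 1]) / (t[2] - t[1])
    DX[:, m] = (X[:, m] - X[:, m - 1]) / (t[m] - t[m - 1])
    # Central differences in the interior
    for k in 2:(m - 1)
        DX[:, k] = (X[:, k + 1] - X[:, k - 1]) / (t[k + 1] - t[k - 1])
    end
    return DX
end

# Example: x(t) = sin(t), whose exact derivative is cos(t)
t = collect(range(0.0, 2π; length = 201))
X = reshape(sin.(t), 1, :)
DX = finite_difference_derivative(X, t)

# Largest interior deviation from the exact derivative
max_err = maximum(abs.(DX[1, 2:end-1] .- cos.(t[2:end-1])))
```

A derivative estimate like this `DX` plays the role of the `DX` argument in the constructors above. Finite differences amplify measurement noise, which is one reason to prefer a smoothing collocation method on real data.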
You can also directly use a `DESolution` as an input to your `DataDrivenProblem`:

```julia
problem = DataDrivenProblem(sol; kwargs...)
```

This evaluates the function at the specific time points `t` using the parameters `p` of the original problem instead of using the interpolation. If you want to use the interpolated data, add the keyword `use_interpolation = true`.
An additional type of problem is the `DirectDataDrivenProblem`, which does not assume any kind of causal relationship. It is defined by `X` and an observed output `Y` in addition to the usual arguments:
```julia
problem = DirectDataDrivenProblem(X, Y)
problem = DirectDataDrivenProblem(X, t, Y)
problem = DirectDataDrivenProblem(X, t, Y, U)
problem = DirectDataDrivenProblem(X, t, Y, p = p)
problem = DirectDataDrivenProblem(X, t, Y, (x,p,t)->u(x,p,t), p = p)
```
A basis is optional, depending on the solver and solution method you are using. For instance, DMD does not require a basis, but SINDy using `STLSQ()` does.
A basis can be defined like:
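```julia
@variables u[1:2]
u = scalarize(u) # scalarize comes from Symbolics, as in the first example
Ψ = Basis([u; u .^ 2], u) # squaring is applied elementwise
```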
See the Implicit Systems tutorials for more complex examples of defining a Basis.
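To make the role of a basis concrete, the following sketch evaluates the candidate functions `[u; u .^ 2]` on a small data matrix in plain Julia. It does not use the `Basis` type; the helper `lift`, the sample values, and the one-sample-per-column layout are assumptions for illustration only.

```julia
# Candidate functions for a two-state system: [u1, u2, u1^2, u2^2]
lift(u::AbstractVector) = vcat(u, u .^ 2)

# Three samples of a two-dimensional state, one sample per column (assumed layout)
X = [1.0 2.0 3.0;
     0.5 1.0 1.5]

# Apply the candidate functions to every sample; rows are candidates, columns are samples
Θ = reduce(hcat, map(lift, eachcol(X)))
```

Each row of `Θ` holds one candidate function evaluated over all samples. A sparse solver then looks for a small set of rows that explains the target data.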
Next up, we choose a method to solve the `DataDrivenProblem`. Depending on the input arguments and the type of problem, the function will return a result derived via Koopman-based inference, sparse optimization, or general symbolic regression. Depending on the inference method, different options can be provided, for example for rounding, normalization, or the progress bar. A `Basis` can be used for lifting the measurements.
```julia
# Use a Koopman-based inference
res = solve(problem, DMDSVD(); kwargs...)

# Use a sparse identification
res = solve(problem, basis, STLSQ(); kwargs...)
```
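For intuition about what a sparse identification does, here is a minimal plain-Julia sketch of sequentially thresholded least squares: fit all candidate terms, zero the coefficients below a threshold, and refit the rest until the set of active terms stops changing. This is not the package's `STLSQ` implementation; the function name `stlsq_sketch`, the threshold `λ`, and the data layout are assumptions made for illustration.

```julia
using LinearAlgebra

# Θ holds candidate functions (rows) evaluated on samples (columns); y is the target.
function stlsq_sketch(Θ::AbstractMatrix, y::AbstractVector; λ = 0.1, maxiter = 10)
    A = permutedims(Θ)               # samples × candidates
    ξ = A \ y                        # dense least-squares start
    active = trues(length(ξ))
    for _ in 1:maxiter
        # Drop every active coefficient whose magnitude falls below the threshold
        new_active = active .& (abs.(ξ) .>= λ)
        new_active == active && break
        active = new_active
        ξ = zeros(length(ξ))
        any(active) || break
        # Refit using only the remaining candidates
        ξ[active] = A[:, active] \ y
    end
    return ξ
end

# Data from f(u) = u^2 + 4u + 4 with candidates 1, u, u^2, u^3
u = collect(range(-2.0, 2.0; length = 50))
y = u .^ 2 .+ 4 .* u .+ 4
Θ = permutedims(hcat(ones(length(u)), u, u .^ 2, u .^ 3))
ξ = stlsq_sketch(Θ, y; λ = 0.1)
```

On this noise-free data the cubic coefficient is removed and the remaining coefficients are close to 4, 4, and 1. The threshold controls the trade-off between sparsity and fit, so it usually needs tuning on real data.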
`res` contains a `result`, which is the inferred system, and `metrics`, a `NamedTuple` containing different metrics such as the L2 error of the inferred system with respect to the provided data and the `AICC`. These can be accessed via:
```julia
# The inferred system
system = result(res)

# The metrics
m = metrics(res)
m.Sparsity # No. of active terms / nonzero coefficients
m.Error # L2 Error of all data
m.Errors # Individual error of the different data rows
m.AICC # AICC
m.AICCs # ....
```
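As a rough illustration of what the sparsity and error metrics measure, the sketch below computes a count of nonzero coefficients and an L2 residual norm by hand for a small made-up example. The package's exact definitions (and its AICC formula) may differ; all names and values here are hypothetical.

```julia
using LinearAlgebra

# Candidates 1, u, u^2, u^3 evaluated at u = 0, 1, 2, 3 (rows are candidates)
Θ = [1 1 1 1;
     0 1 2 3;
     0 1 4 9;
     0 1 8 27]

# Hypothetical identified coefficients and observed data
ξ = [4.0, 4.0, 1.0, 0.0]
y = [4.0, 9.0, 16.0, 26.0]

# Number of active terms
sparsity = count(!iszero, ξ)

# L2 norm of the residual between the data and the model prediction
residual = y - Θ' * ξ
l2_error = norm(residual)
```

Here three terms are active and the only mismatch is in the last sample, so the residual norm equals that single deviation.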
Since the inferred system is a parametrized equation, the corresponding parameters can be accessed and returned via:

```julia
# Vector
ps = parameters(res)

# Parameter map
ps = parameter_map(res)
```
The keyword argument `eval_expression` controls the function creation behavior. `eval_expression=true` means that `eval` is used, so normal world-age behavior applies (i.e., the functions cannot be called from the function that generates them). If `eval_expression=false`, then construction via GeneralizedGenerated.jl is used to allow for same world-age evaluation. However, this can cause Julia to segfault on sufficiently large basis functions. By default, `eval_expression=false`.
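The world-age restriction mentioned above can be reproduced in plain Julia without the package. The sketch below builds a function with `eval` inside another function and tries to call it immediately; the function name `build_and_call` is hypothetical, and the direct call is expected to fail with a `MethodError` on current Julia versions, although only the `invokelatest` result is relied on here.

```julia
# Build a function from an expression at runtime and try to call it right away.
function build_and_call()
    f = eval(:(x -> x + 1))   # the new method lives in a newer world than this caller
    direct_call_failed = try
        f(1)                  # typically throws: method too new for this world
        false
    catch err
        err isa MethodError
    end
    # invokelatest looks the method up in the latest world, so this call succeeds
    value = Base.invokelatest(f, 1)
    return (direct_call_failed = direct_call_failed, value = value)
end

outcome = build_and_call()
```

After `build_and_call` returns, the generated function can be called normally from the top level, which is why the restriction only matters when a function is created and used within the same call.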