struct DataDrivenProblem{dType, cType, probType} <: DataDrivenDiffEq.AbstractDataDrivenProblem{dType, cType, probType}

The DataDrivenProblem defines a general estimation problem given measurements, inputs and (in the near future) observations. Three construction methods are available:

  • DirectDataDrivenProblem for direct mappings
  • DiscreteDataDrivenProblem for time discrete systems
  • ContinuousDataDrivenProblem for systems continuous in time

All three are aliases for constructing a DataDrivenProblem.


  • X

    State measurements

  • t

    Time measurements (optional)

  • DX

    Differential state measurements (optional); used for time-continuous problems

  • Y

    Output measurements (optional); used for direct problems

  • U

    Input measurements (optional); used for non-autonomous problems

  • p

    Parameters associated with the problem (optional)

  • name

    Name of the problem



X, DX, t = data...

# Define a discrete time problem
prob = DiscreteDataDrivenProblem(X)

# Define a continuous time problem without explicit time points
prob = ContinuousDataDrivenProblem(X, DX)

# Define a continuous time problem without explicit derivatives
prob = ContinuousDataDrivenProblem(X, t)

# Define a discrete time problem with the input signal given as a function
input_signal(u,p,t) = t^2
prob = DiscreteDataDrivenProblem(X, t, input_signal)

Defining a Problem

Problems of identification, estimation, or inference are defined by data. These data contain at least measurements of the states X, which is sufficient to describe a DiscreteDataDrivenProblem with unit time steps, similar to the first example on dynamic mode decomposition. This can be extended to include time points t, control signals U, or a function u(x,p,t) describing them. Additionally, any parameters p known a priori can be included in the problem. In practice, this looks like:

problem = DiscreteDataDrivenProblem(X)
problem = DiscreteDataDrivenProblem(X, t)
problem = DiscreteDataDrivenProblem(X, t, U)
problem = DiscreteDataDrivenProblem(X, t, U, p = p)
problem = DiscreteDataDrivenProblem(X, t, (x,p,t)->u(x,p,t))
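
As a minimal sketch of what such data might look like, the snippet below simulates a hypothetical linear map x[k+1] = A x[k] in plain Julia and recovers the matrix by least squares on the shifted snapshots, in the spirit of dynamic mode decomposition. The matrix A_true, the initial state, and the data layout (one state per row, one measurement per column) are assumptions made for illustration; no package functions are used.

```julia
# Hypothetical system matrix; chosen for illustration only.
A_true = [0.9 -0.2; 0.0 0.5]

n_steps = 10
X = zeros(2, n_steps)          # assumed layout: states in rows, samples in columns
X[:, 1] = [1.0, 1.0]           # arbitrary initial state
for k in 1:(n_steps - 1)
    X[:, k + 1] = A_true * X[:, k]
end

# Unit time steps, one time point per column of X.
t = collect(0.0:1.0:(n_steps - 1))

# Least-squares fit of X[:, 2:end] ≈ A_est * X[:, 1:end-1].
A_est = X[:, 2:end] / X[:, 1:(end - 1)]
```

Under these assumptions, X and t have the shape used in the calls above, for example DiscreteDataDrivenProblem(X, t).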

Similarly, a ContinuousDataDrivenProblem needs at least measurements and time derivatives (X and DX), or measurements, time information, and a way to derive the time derivatives (X, t, and a collocation method). Again, this can be extended by including a control input, given as measurements or as a function, and possibly parameters:

# Using available data
problem = ContinuousDataDrivenProblem(X, DX)
problem = ContinuousDataDrivenProblem(X, t, DX)
problem = ContinuousDataDrivenProblem(X, t, DX, U, p = p)
problem = ContinuousDataDrivenProblem(X, t, DX, (x,p,t)->u(x,p,t))

# Using collocation
problem = ContinuousDataDrivenProblem(X, t, InterpolationMethod())
problem = ContinuousDataDrivenProblem(X, t, GaussianKernel())
problem = ContinuousDataDrivenProblem(X, t, U, InterpolationMethod())
problem = ContinuousDataDrivenProblem(X, t, U, GaussianKernel(), p = p)
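
To make the idea of deriving DX from X and t concrete, here is a rough sketch that uses plain finite differences as a stand-in. This is not the collocation or interpolation scheme used by the package; the helper name finite_difference and the data layout (samples in columns) are assumptions for illustration.

```julia
# Hypothetical helper: estimate time derivatives column by column.
# Central differences in the interior, one-sided differences at both ends.
function finite_difference(X::AbstractMatrix, t::AbstractVector)
    n = length(t)
    size(X, 2) == n || throw(DimensionMismatch("X needs one column per time point"))
    n >= 2 || throw(ArgumentError("at least two time points are required"))
    DX = similar(X, Float64)
    DX[:, 1] = (X[:, 2] - X[:, 1]) / (t[2] - t[1])
    DX[:, n] = (X[:, n] - X[:, n - 1]) / (t[n] - t[n - 1])
    for k in 2:(n - 1)
        DX[:, k] = (X[:, k + 1] - X[:, k - 1]) / (t[k + 1] - t[k - 1])
    end
    return DX
end

# Example signal x(t) = sin(t), whose exact derivative is cos(t).
t = collect(0.0:0.01:1.0)
X = permutedims(sin.(t))        # 1 × N matrix
DX = finite_difference(X, t)

max_error = maximum(abs.(DX .- permutedims(cos.(t))))
```

Finite differences amplify measurement noise, which is one reason smoothing approaches such as kernel collocation or interpolation are preferable on real data. An estimate obtained this way could be passed on as ContinuousDataDrivenProblem(X, t, DX).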

You can also directly use a DESolution as an input to your DataDrivenProblem:

problem = DataDrivenProblem(sol; kwargs...)

This evaluates the function at the specific time points t using the parameters p of the original problem instead of using the interpolation. To use the interpolated data instead, pass the additional keyword use_interpolation = true.

An additional type of problem is the DirectDataDrivenProblem, which does not assume any kind of causal relationship. It is defined by X and an observed output Y in addition to the usual arguments:

problem = DirectDataDrivenProblem(X, Y)
problem = DirectDataDrivenProblem(X, t, Y)
problem = DirectDataDrivenProblem(X, t, Y, U)
problem = DirectDataDrivenProblem(X, t, Y, p = p)
problem = DirectDataDrivenProblem(X, t, Y, (x,p,t)->u(x,p,t), p = p)

Concrete Types


A time-continuous DataDrivenProblem usable for problems of the form f(x,p,t,u) ↦ dx/dt.

ContinuousDataDrivenProblem(X, DX; kwargs...)

When no derivatives DX are supplied, they are constructed automatically via an additional collocation method, which can be either a collocation kernel (for example GaussianKernel) or an interpolation from DataInterpolations.jl wrapped in an InterpolationMethod.



struct DataDrivenDataset{N, U, C} <: DataDrivenDiffEq.AbstractDataDrivenProblem{N, U, C}

A collection of DataDrivenProblems used to concatenate different trajectories or experiments.

Can be called with either an NTuple of problems or a NamedTuple of NamedTuples. Similar to the DataDrivenProblem, it has three constructors available:

  • DirectDataset for direct problems
  • DiscreteDataset for discrete problems
  • ContinuousDataset for continuous problems


  • name

    Name of the dataset

  • probs

    The problems

  • sizes

    The length of each problem - for internal use



A DataDrivenDataset collects several DataDrivenProblems of the same type but treats them as a union for system identification.
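
Conceptually, treating several trajectories as one union amounts to concatenating their samples while remembering how long each one is. The sketch below shows this bookkeeping in plain Julia; it illustrates the idea behind the sizes field and is not the package's internal implementation. All variable names are hypothetical.

```julia
# Three hypothetical trajectories with two states each (samples in columns).
trajectories = [fill(1.0, 2, 5), fill(2.0, 2, 3), fill(3.0, 2, 4)]

# Length of each trajectory, analogous in spirit to the `sizes` field.
sizes = [size(X, 2) for X in trajectories]

# One wide matrix holding all samples.
X_all = reduce(hcat, trajectories)

# Column ranges that recover each trajectory from the concatenated data.
stops = cumsum(sizes)
starts = stops .- sizes .+ 1
ranges = [a:b for (a, b) in zip(starts, stops)]
```

Indexing X_all[:, ranges[i]] returns the samples of the i-th trajectory.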

Concrete Types


A time-continuous DataDrivenDataset usable for problems of the form f(x,p,t,u) ↦ dx/dt.

ContinuousDataset(s; name, collocation, kwargs...)

Automatically constructs derivatives via an additional collocation method, which can be either a collocation or an interpolation from DataInterpolations.jl wrapped in an InterpolationMethod, supplied through the collocation keyword argument.



struct DataSampler{T} <: DataDrivenDiffEq.AbstractSampler

A simple sampler container. It takes AbstractSamplers and applies them to a DataDrivenProblem in the order they are given. If a Split sampler is provided, it is always moved to the first position.

struct Split <: DataDrivenDiffEq.AbstractSampler

Performs a train-test split of the DataDrivenProblem, where ratio defines the (approximate) share of the data used for training.

If the optional keyword shuffle is set, samples are drawn from random shuffles of the data, allowing for repetition.

Returns ranges for training and testing data.
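
A rough sketch of how such ranges could be computed without shuffling is shown below. The helper split_ranges and its rounding rule are assumptions for illustration and may differ from what the package does.

```julia
# Hypothetical helper: split sample indices 1:n_samples into two ranges.
function split_ranges(n_samples::Int, ratio::Real)
    0 < ratio < 1 || throw(ArgumentError("ratio must lie strictly between 0 and 1"))
    n_samples >= 2 || throw(ArgumentError("need at least two samples"))
    # Round to the nearest sample count, keeping both parts non-empty.
    n_train = clamp(round(Int, ratio * n_samples), 1, n_samples - 1)
    return 1:n_train, (n_train + 1):n_samples
end

train, test = split_ranges(10, 0.8)   # train == 1:8, test == 9:10
```

Because the training size is rounded to a whole number of samples, the realized share only approximates ratio, which matches the "approximate" wording above.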

struct Batcher <: DataDrivenDiffEq.AbstractSampler

Partitions the DataDrivenProblem into n equal partitions. If used after a train-test Split, it operates only on the training data.

If the optional keyword shuffle is set, samples are drawn from random shuffles of the data, allowing for repetition.

The optional keyword repeated allows data points to be sampled repeatedly.

batchsize_min is the minimum batch size to be used within each partition of the dataset.

Returns ranges for each partition of the provided data.
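
The partitioning step can be pictured with the following plain-Julia sketch, which cuts an index range into n contiguous pieces of nearly equal length. The helper partition_ranges is hypothetical; shuffling, repeated sampling, and batchsize_min are deliberately left out.

```julia
# Hypothetical helper: split an index range into n contiguous partitions
# whose lengths differ by at most one.
function partition_ranges(r::UnitRange{Int}, n::Int)
    1 <= n <= length(r) || throw(ArgumentError("n must be between 1 and length(r)"))
    base, extra = divrem(length(r), n)
    parts = UnitRange{Int}[]
    start = first(r)
    for i in 1:n
        len = base + (i <= extra ? 1 : 0)   # the first `extra` parts get one more
        push!(parts, start:(start + len - 1))
        start += len
    end
    return parts
end

# For example, eight training samples split into three batches.
batches = partition_ranges(1:8, 3)   # [1:3, 4:6, 7:8]
```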