
Model Data Lifecycle

The XINA model data lifecycle involves four primary phases:

Import and Mining

Source Files

In XINA data models, source files are the input data files for mnemonic data (and potentially events). They come in two kinds: buffer source files and archive source files. Archive files are considered the definitive record of source data for a single origin (see below). Buffer source files are an optional feature for less structured data inputs. Scheduled asynchronous tasks merge the buffer files for each origin into archive files, after which the buffer files can be deleted.

Buffer Files

Each origin may contain a single buffer source file database. It is configured as single-file-per-record.

Required Fields

| Field  | Type                     | Description                                   |
|--------|--------------------------|-----------------------------------------------|
| uuid   | uuid                     | universally unique ID                         |
| name   | utf8vstring(128)         | file name                                     |
| t_min  | instant(us)              | earliest time of data in file                 |
| t_max  | instant(us)              | latest time of data in file                   |
| format | asciivstring(16)         | file format (see below)                       |
| conf   | jsonobject (may be null) | configuration parameters, depending on format |
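
As an illustration, the required fields above could be assembled on the client roughly as follows. This is only a sketch: `buffer_file_record` and `to_micros` are hypothetical helpers, not part of any XINA client library. Encoding `instant(us)` as integer Unix microseconds and treating the string limits as character counts are assumptions. The actual import call is not shown.

```python
import uuid
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_micros(dt: datetime) -> int:
    """Convert a timezone-aware datetime to integer Unix microseconds.

    Assumption: instant(us) values are expressed this way.
    """
    if dt.tzinfo is None:
        raise ValueError("datetime must be timezone-aware")
    return (dt - EPOCH) // timedelta(microseconds=1)


def buffer_file_record(name, t_min, t_max, fmt, conf=None):
    """Build a dict with the required buffer file fields (hypothetical helper)."""
    if len(name) > 128:  # assumed to be a character limit
        raise ValueError("name exceeds 128 characters")
    if not fmt.isascii() or len(fmt) > 16:
        raise ValueError("format must be ASCII and at most 16 characters")
    if t_max < t_min:
        raise ValueError("t_max must not precede t_min")
    return {
        "uuid": str(uuid.uuid4()),
        "name": name,
        "t_min": to_micros(t_min),
        "t_max": to_micros(t_max),
        "format": fmt,
        "conf": conf,  # may be None (null) when the format needs no parameters
    }


record = buffer_file_record(
    "telemetry_001.csv",
    datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc),
    datetime(2024, 1, 1, 0, 5, tzinfo=timezone.utc),
    "csv",
)
```

Checking `t_min` and `t_max` before upload catches swapped timestamps on the client rather than at import time.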

Archive Files

Each origin must contain a single archive source file database. It is configured as a multiple-file-per-record structure.

Required Fields

| Field    | Type                     | Description                                   |
|----------|--------------------------|-----------------------------------------------|
| uuid     | uuid                     | universally unique ID                         |
| p_id     | int(8)                   | primary ID                                    |
| s_id     | int(4)                   | secondary ID                                  |
| t_start  | instant(us)              | start time of data in file                    |
| t_end    | instant(us)              | end time of data in file                      |
| duration | duration                 | t_end - t_start                               |
| t_min    | instant(us)              | earliest time of data in file                 |
| t_max    | instant(us)              | latest time of data in file                   |
| label    | utf8vstring(128)         | plain text label                              |
| content  | utf8text                 | extended text / CSV / HTML                    |
| meta     | jsonobject (may be null) | arbitrary metadata as needed                  |
| format   | asciivstring(16)         | file format (see below)                       |
| conf     | jsonobject (may be null) | configuration parameters, depending on format |
| type     | int(2)                   | interval type code                            |
| level    | int(1)                   | level code                                    |

Source File Formats

Currently there are two natively supported general-purpose formats: a delimited text format using the codes csv/tsv (full documentation here), and a binary format using the code xbin (full documentation here). Additional formats will be added in the future, and custom project-specific formats may be added as needed.

Data Flow

XINA model data input involves two phases, the import phase and the mining phase. The approach to these phases differs depending on whether data is being imported with buffer files or archive files.

Buffer Import

Buffer files are imported with the STRUCT_BUFFER_IMPORT action. This has three effects:

  • the raw buffer file is parsed, validated, and stored in the model origin buffer file database
  • new definitions are created for any unrecognized mnemonic labels
  • data is added to the mnemonic buffer database for the associated origin

No additional data processing occurs as part of this step. XINA models that use buffer source files must routinely execute the STRUCT_BUFFER_ARCHIVE asynchronous task (typically every 24 hours) to merge the buffer files into archive files. MODEL_ARCH_MINE tasks can then process the archive files fully into the model standard databases.

Pros

  • minimal client side configuration required to get started
  • allows smaller, faster file uploads to view data close to real-time
  • flexible and responsive to changing environments, mnemonics, requirements

Cons

  • performance is worse than client side aggregation
  • not recommended above 1k total data points per second

Archive Import

Alternatively, archive files may be imported directly with the MODEL_ARCH_IMPORT action.

Pros

  • much higher performance ceiling than server side aggregation
  • stringent validation ensures data conforms to standard

Cons

  • more complex initial setup
  • mnemonic definitions need coordination between client and server
  • changes are more complex and likely involve human interaction

Assumptions and Limitations

Each archive source file is considered the single source of truth for all mnemonics, instants, and intervals for its associated origin over its time range. This has the following implications:

Archive files with the same origin cannot contain overlapping time ranges. If an import operation is performed with a file that violates this constraint, the operation will fail and return an error.
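
A client can check for overlaps before attempting an import. The following is a minimal sketch, not XINA's own validation: `find_overlap` is a hypothetical helper, each range is a `(start, end)` pair of integer microseconds, and ranges are assumed to be half-open, so a file ending exactly where the next begins does not count as overlapping.

```python
from typing import List, Optional, Tuple

Range = Tuple[int, int]  # (start, end) in microseconds, with start <= end


def find_overlap(ranges: List[Range]) -> Optional[Tuple[Range, Range]]:
    """Return one pair of overlapping ranges for a single origin, or None.

    After sorting by start time, any overlap must show up between
    neighbouring entries, so comparing adjacent pairs is sufficient.
    """
    ordered = sorted(ranges)
    for prev, cur in zip(ordered, ordered[1:]):
        if cur[0] < prev[1]:  # next file starts before the previous one ends
            return prev, cur
    return None
```

Running this over the ranges of the files already archived for an origin plus the candidate file gives an early indication of whether the import would be rejected.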

Within a single model, each mnemonic may only come from a single origin. Because mnemonics are not necessarily strictly associated with models, and the source may vary between models, this cannot be verified on import and must be verified on the client prior to importing data.
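
Since this check falls to the client, one possible approach is to compare the mnemonic lists of all origins in a model before importing. This sketch assumes the client already knows which mnemonics each origin will supply. The function name and the input shape (a mapping from origin name to mnemonic names) are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def mnemonics_with_multiple_origins(
    by_origin: Dict[str, Iterable[str]]
) -> Dict[str, Set[str]]:
    """Map each mnemonic supplied by more than one origin to those origins."""
    origins_for: Dict[str, Set[str]] = defaultdict(set)
    for origin, mnemonics in by_origin.items():
        for mnemonic in mnemonics:
            origins_for[mnemonic].add(origin)
    # Keep only the mnemonics that violate the single-origin rule.
    return {m: o for m, o in origins_for.items() if len(o) > 1}
```

An empty result means every mnemonic in the model comes from exactly one origin. Any entries returned identify conflicts to resolve before the import.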