Model Data Lifecycle
The XINA model data lifecycle involves four primary phases:
Import and Mining
Source Files
Each origin maintains a dataset of source files, containing all data imported into XINA for that origin. These come in two flavors: buffer source files and archive source files. Archive files are considered the definitive record of source data for a range of time for a single origin (see below). Buffer source files are an optional feature for less structured data inputs. Scheduled asynchronous tasks merge buffer files for each origin into archive files, allowing the buffer files to be deleted.
In general, there are three supported approaches for origin data flow: variable-time archive import, fixed-time archive import, and buffer import. While a single origin may only support one approach, a model may combine multiple approaches with different origins.
Direct Archive Import
Archive files are imported directly with the STRUCT_ARCHIVE_IMPORT API action, which triggers an immediate mining operation to store, index, and optimize data in XINA databases.
Pros
- much higher performance ceiling than server side aggregation
- stringent validation ensures data conforms to standard
Cons
- more complex initial setup
- mnemonic definitions need coordination between client and server
- changes are more complex and likely involve human interaction
Variable-Time Archive Import
With variable-time archive import each archive specifies a custom time range. This is a recommended solution for projects which generate their own archival equivalent (for example, outputting a discrete data set after running a script). Because the time ranges are determined by the source data, the archive database for this approach includes all interval fields as well, to support handling the database as an interval database. It is configured as a multiple-file-per-record structure.
Required Fields
Field | Type | Description |
---|---|---|
uuid | uuid | universally unique ID |
p_id | int(8) | primary ID |
s_id | int(8) | secondary ID |
t_start | instant(us) | start time of data in file |
t_end | instant(us) | end time of data in file |
duration | duration | t_end - t_start |
t_min | instant(us) | earliest time of data in file |
t_max | instant(us) | latest time of data in file |
label | utf8vstring(128) | plain text label |
content | utf8text | extended text / CSV / HTML |
type | int(2) | interval type code |
level | int(1) | level code |
meta | jsonobject (may be null) | arbitrary metadata as needed |
format | asciivstring(16) | file format (see below) |
conf | jsonobject (may be null) | configuration parameters, depending on format |
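As a rough illustration of how the time fields in this table relate to each other, the sketch below derives duration, t_min, and t_max from a chosen time range and a list of sample timestamps. Representing instant(us) values as integer microseconds since the Unix epoch is an assumption made for the example, and the helper names (to_us, time_fields) are hypothetical; the actual XINA record encoding may differ.

```python
from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_us(dt):
    """Convert a timezone-aware datetime to integer microseconds since the Unix epoch."""
    delta = dt - EPOCH
    return (delta.days * 86_400 + delta.seconds) * 1_000_000 + delta.microseconds


def time_fields(t_start_us, t_end_us, sample_times_us):
    """Build the time-related fields of a variable-time archive record.

    t_start/t_end are chosen by the client; t_min/t_max are taken from the data.
    """
    if t_end_us < t_start_us:
        raise ValueError("t_end must not precede t_start")
    if not sample_times_us:
        raise ValueError("at least one sample timestamp is required")
    return {
        "t_start": t_start_us,
        "t_end": t_end_us,
        "duration": t_end_us - t_start_us,  # matches the table: t_end - t_start
        "t_min": min(sample_times_us),      # earliest time of data in file
        "t_max": max(sample_times_us),      # latest time of data in file
    }


fields = time_fields(
    to_us(datetime(2024, 1, 1, 0, 0, 0, tzinfo=timezone.utc)),
    to_us(datetime(2024, 1, 1, 0, 10, 0, tzinfo=timezone.utc)),
    [to_us(datetime(2024, 1, 1, 0, 2, 0, tzinfo=timezone.utc)),
     to_us(datetime(2024, 1, 1, 0, 7, 30, tzinfo=timezone.utc))],
)
print(fields["duration"])
```

The remaining fields (label, type, level, and so on) are project-specific and are left out of the sketch.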
Fixed-Time Archive Import
With fixed-time archive import each archive has a fixed time range. This is a recommended solution for projects which generate a persistent data stream (for example, data sources piped through a FEDS server). Unlike the variable-time archive database, this database is not treated as an interval database, because the time windows are arbitrary. It is configured as a multiple-file-per-record structure.
Required Fields
Field | Type | Description |
---|---|---|
uuid | uuid | universally unique ID |
t_start | instant(us) | start time of data in file |
t_end | instant(us) | end time of data in file |
duration | duration | t_end - t_start |
t_min | instant(us) | earliest time of data in file |
t_max | instant(us) | latest time of data in file |
meta | jsonobject (may be null) | arbitrary metadata as needed |
format | asciivstring(16) | file format (see below) |
conf | jsonobject (may be null) | configuration parameters, depending on format |
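One way a client might split a continuous stream into fixed-time archives is to assign each timestamp to a window of constant length. The sketch below assumes integer microsecond timestamps, a hypothetical one-hour window, and half-open windows (start inclusive, end exclusive); none of these choices come from XINA itself.

```python
# Hypothetical window length: one hour, in microseconds.
WINDOW_US = 3_600 * 1_000_000


def window_for(t_us, window_us=WINDOW_US):
    """Return the (t_start, t_end) of the fixed window containing t_us."""
    start = (t_us // window_us) * window_us
    return start, start + window_us


def group_by_window(times_us, window_us=WINDOW_US):
    """Group timestamps by the fixed window they fall into."""
    groups = {}
    for t in times_us:
        groups.setdefault(window_for(t, window_us), []).append(t)
    return groups


samples = [10, 3_599_999_999, 3_600_000_000, 7_000_000_000]
for (t_start, t_end), members in sorted(group_by_window(samples).items()):
    print(t_start, t_end, len(members))
```

Each group would then correspond to one archive file whose t_start and t_end are the window bounds, with t_min and t_max taken from the samples in the group.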
Buffer Import
Buffer files are imported with the STRUCT_BUFFER_IMPORT action. This invokes three effects:
- the raw buffer file is parsed, validated, and stored in the model origin buffer file database
- new definitions are created for any unrecognized mnemonic labels
- data is added to the mnemonic buffer database for the associated origin
No additional data processing occurs as part of this step. XINA models utilizing buffer source files must implement routine execution of the STRUCT_BUFFER_ARCHIVE asynchronous task (typically every 24 hours) to merge the files into archive files in a fixed-time archive format, which can then be processed by STRUCT_ARCHIVE_MINE tasks to fully process data into model standard databases.
Pros
- minimal client side configuration required to get started
- allows smaller, faster file uploads to view data close to real-time
- flexible and responsive to changing environments, mnemonics, requirements
Cons
- performance is worse than client side aggregation
- not recommended above 1k total data points per second
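The 1k data points per second guideline can be checked against a sample of real data before choosing an approach. The sketch below is only a back-of-the-envelope estimate under assumed conventions: timestamps are integer microseconds, and the function names and the "buffer"/"archive" return values are hypothetical, not part of any XINA API.

```python
# Guideline from the list above: buffer import is not recommended
# above roughly 1,000 total data points per second.
BUFFER_RATE_LIMIT = 1_000


def points_per_second(point_count, t_min_us, t_max_us):
    """Average data rate over the span covered by a sample of data."""
    span_s = (t_max_us - t_min_us) / 1_000_000
    if span_s <= 0:
        raise ValueError("time span must be positive")
    return point_count / span_s


def suggested_approach(rate):
    """Return 'buffer' when the rate is within the guideline, else 'archive'."""
    return "buffer" if rate <= BUFFER_RATE_LIMIT else "archive"


rate = points_per_second(120_000, 0, 60_000_000)  # 120k points over 60 seconds
print(rate, suggested_approach(rate))
```

An average hides bursts, so a source with large short-term peaks may warrant a more conservative choice than this estimate suggests.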
An origin must include a single buffer source file database to support buffer importing. Unlike the archive database, it is configured as single-file-per-record.
Required Fields
Field | Type | Description |
---|---|---|
uuid | uuid | universally unique ID |
name | utf8vstring(128) | file name |
t_min | instant(us) | earliest time of data in file |
t_max | instant(us) | latest time of data in file |
format | asciivstring(16) | file format (see below) |
conf | jsonobject (may be null) | configuration parameters, depending on format |
Archive Files
Each origin must contain a single archive source file database. It is configured as a multiple-file-per-record structure.
Source File Formats
Currently there are two natively supported general-purpose formats: one using the codes csv/tsv (full documentation here), and a binary format using the code xbin (full documentation here). Additional formats will be added in the future, and custom project-specific formats may be added as needed.
Data Flow
XINA model data input involves two phases, the import phase and the mining phase. The approach to these phases differs depending on whether data is being imported with buffer files or archive files.
Buffer Import
Buffer files are imported with the STRUCT_BUFFER_IMPORT action. This invokes three effects:
- the raw buffer file is parsed, validated, and stored in the model origin buffer file database
- new definitions are created for any unrecognized mnemonic labels
- data is added to the mnemonic buffer database for the associated origin
No additional data processing occurs as part of this step. XINA models utilizing buffer source files must implement routine execution of the STRUCT_BUFFER_ARCHIVE asynchronous task (typically every 24 hours) to merge the files into archive files, which can then be processed by MODEL_ARCH_MINE tasks to fully process data into model standard databases.
Pros
- minimal client side configuration required to get started
- allows smaller, faster file uploads to view data close to real-time
- flexible and responsive to changing environments, mnemonics, requirements
Cons
- performance is worse than client side aggregation
- not recommended above 1k total data points per second
Archive Import
Alternatively, archive files may be imported directly with the MODEL_ARCH_IMPORT action.
Pros
- much higher performance ceiling than server side aggregation
- stringent validation ensures data conforms to standard
Cons
- more complex initial setup
- mnemonic definitions need coordination between client and server
- changes are more complex and likely involve human interaction
Assumptions and Limitations
Each archive source file is considered the single source of truth for all mnemonics, instants, and intervals for its associated origin for its time range. This has the following implications:
- Archive files with the same origin cannot contain overlapping time ranges. If an import operation is performed with a file violating this constraint, the operation will fail and return an error.
- Within a single model, each mnemonic may only come from a single origin. Because mnemonics are not necessarily strictly associated with models, and the source may vary between models, this cannot be verified on import and must be verified on the client prior to importing data.
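Both constraints can be checked on the client before any import is attempted. The following is a minimal sketch, assuming time ranges are (t_start, t_end) pairs of integer microseconds and that ranges which merely touch (one ends exactly where the next begins) do not count as overlapping; that boundary convention and the function names are assumptions for illustration, not XINA behavior.

```python
def find_overlaps(ranges):
    """Return pairs of overlapping (t_start, t_end) ranges for a single origin.

    Ranges are sorted by start time and each one is compared against the
    earlier range that extends furthest in time.
    """
    overlaps = []
    latest = None  # earlier range with the greatest end time seen so far
    for cur in sorted(ranges):
        if latest is not None and cur[0] < latest[1]:
            overlaps.append((latest, cur))
        if latest is None or cur[1] > latest[1]:
            latest = cur
    return overlaps


def mnemonics_with_multiple_origins(origin_mnemonics):
    """Given {origin: [mnemonic, ...]}, return mnemonics claimed by more than one origin."""
    owners = {}
    for origin, names in origin_mnemonics.items():
        for name in names:
            owners.setdefault(name, set()).add(origin)
    return {name: sorted(o) for name, o in owners.items() if len(o) > 1}


print(find_overlaps([(0, 10), (10, 20), (15, 30)]))
print(mnemonics_with_multiple_origins({"a": ["volt", "temp"], "b": ["volt"]}))
```

An empty result from both functions means the planned imports satisfy the two implications listed above, at least for the files and origins the client knows about.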