Source Files

In XINA data models, source files refer to the input data files for mnemonic data (and potentially instants/intervals). These come in two flavors, buffer source files and archive source files. Archive files are considered the definitive record of source data for a single origin (see below). Buffer source files are an optional feature for less structured data inputs. Scheduled asynchronous tasks merge buffer files for each origin into archive files, allowing the buffer files to be deleted.

Origin

Abstractly, a data origin (or simply origin) is a single point of data import to a model. In many cases, a model will only have a single data origin; for example, if all data is provided directly from a single instrument, or multiple components are merged into a single data stream through FEDS before import into XINA. In these cases delineation by origin is not required in model organization, and the model should use this pattern:

In a model group, the data group will be used as the default location for source files which do not specify an origin.

However, in environments with multiple import points running in parallel, databases must be designed with multiple origins.

In this example each source file would need to specify either origin_a or origin_b. Additionally, each origin has distinct databases for instant, interval, and mnemonic data. This would be required if each data source provided all three data types. As requirements for instants and intervals are less stringent than those for mnemonics, in some circumstances instants and intervals could be treated as a single source and populated independently:

Buffer Database

Each model must contain a single buffer source file database. It is configured as single-file-per-record.

Required Fields

Field	Type	Description
`org`	`asciivstring(32)` (may be `null`)	origin name
`u_id`	`uuid`	universally unique ID
`name`	`utf8vstring(128)`	file name
`t_min`	`instant(us)`	earliest time of data in file
`t_max`	`instant(us)`	latest time of data in file
`format`	`asciivstring(16)`	file format (see below)
`conf`	`jsonobject` (may be `null`)	configuration parameters, depending on `format`

If org is null, the telemetry file will be associated with the default source (and data) group.
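As a rough illustration of the required fields above, the following Python sketch mirrors a buffer record on the client side and checks it before upload. The class name `BufferRecord`, the `DEFAULT_GROUP` constant, and the interpretation of the string limits as character counts are assumptions made for this example; they are not part of the XINA API.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional

# Hypothetical name for the default data group; not defined by XINA.
DEFAULT_GROUP = "data"


@dataclass
class BufferRecord:
    """Illustrative client-side mirror of the buffer database fields."""
    name: str                    # utf8vstring(128)
    t_min: datetime              # earliest time of data in file
    t_max: datetime              # latest time of data in file
    format: str                  # asciivstring(16)
    org: Optional[str] = None    # asciivstring(32), may be null
    conf: Optional[dict] = None  # jsonobject, may be null
    u_id: uuid.UUID = field(default_factory=uuid.uuid4)

    def validate(self) -> None:
        # Assumption: the declared lengths are treated as character limits.
        if self.org is not None and (len(self.org) > 32 or not self.org.isascii()):
            raise ValueError("org must be ASCII and at most 32 characters")
        if len(self.name) > 128:
            raise ValueError("name must be at most 128 characters")
        if len(self.format) > 16 or not self.format.isascii():
            raise ValueError("format must be ASCII and at most 16 characters")
        if self.t_min > self.t_max:
            raise ValueError("t_min must not be later than t_max")

    def group(self) -> str:
        # A null org falls back to the default source (and data) group.
        return self.org if self.org is not None else DEFAULT_GROUP
```

Calling `validate()` before upload catches simple mistakes early; the server remains the authority on what it accepts.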

Archive Database

Each model must contain a single archive source file database. It may be configured with either a single-file-per-record or a multiple-file-per-record structure, depending on the nature of the archive files.

Required Fields

Field	Type	Description
`org`	`asciivstring(32)` (may be `null`)	origin name
`u_id`	`uuid`	universally unique ID
`name`	`utf8vstring(128)`	file name
`t_start`	`instant(us)`	start time of data in file
`t_end`	`instant(us)`	end time of data in file
`meta`	`jsonobject` (may be `null`)	arbitrary metadata as needed
`format`	`asciivstring(16)`	file format (see below)
`conf`	`jsonobject` (may be `null`)	configuration parameters, depending on `format`

If org is null, the telemetry file will be associated with the default source (and data) group.

Source File Formats

Currently there are two natively supported general purpose formats: one using the codes csv/tsv (full documentation here), and a binary format using the code xbin (full documentation here). Additional formats will be added in the future, and custom project-specific formats may be added as needed.

Data Flow

XINA model data input involves two phases, the import phase and the mining phase. The approach to these phases differs depending on whether data is being imported with buffer files or archive files.

Buffer Import

Buffer files are imported with the MODEL_BUFF_IMPORT action. This invokes three effects:

the raw buffer file is parsed, validated, and stored in the model source buffer database

new definitions are created for any unrecognized mnemonic labels

data is added to the mnemonic buffer database for the associated origin

No additional data processing occurs as part of this step. XINA models utilizing buffer source files must implement routine execution of the MODEL_BUFF_ARCH asynchronous task (typically every 24 hours) to merge the files into archive files, which can then be processed by MODEL_ARCH_MINE tasks to fully process data into model standard databases.
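The three effects of a buffer import can be pictured with a small in-memory simulation. This is only a sketch of the bookkeeping described above, not the server implementation: the class `ModelSketch`, its attributes, and the row layout `(time_us, label, value)` are all hypothetical.

```python
from collections import defaultdict


class ModelSketch:
    """Hypothetical in-memory stand-in for a model's buffer-related databases."""

    def __init__(self):
        self.source_buffer = []                    # stored buffer file records
        self.mnemonic_defs = {}                    # label -> definition
        self.mnemonic_buffer = defaultdict(list)   # origin -> list of rows

    def buffer_import(self, name, rows, org=None):
        """Simulate one buffer import; rows are (time_us, label, value) tuples."""
        rows = list(rows)
        if not rows:
            raise ValueError("buffer file contains no data")

        # Effect 1: the file is validated and recorded in the source buffer database.
        times = [t for t, _, _ in rows]
        self.source_buffer.append(
            {"name": name, "org": org, "t_min": min(times), "t_max": max(times)}
        )

        # Effect 2: definitions are created for any unrecognized mnemonic labels.
        for _, label, _ in rows:
            self.mnemonic_defs.setdefault(label, {"label": label})

        # Effect 3: data is added to the mnemonic buffer for the associated origin.
        self.mnemonic_buffer[org].extend(rows)
```

In this sketch nothing is merged or mined at import time, which matches the note above that further processing is left to the scheduled archive and mining tasks.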

Pros

minimal client side configuration required to get started

allows smaller, faster file uploads to view data close to real-time

flexible and responsive to changing environments, mnemonics, requirements

Cons

performance is worse than client side aggregation

not recommended above 1k total data points per second
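The 1k data points per second guideline can be checked with simple arithmetic before choosing an import path. The sketch below assumes the per-mnemonic sample rates are known in advance; the function names and the rate dictionary are hypothetical, and the threshold is only the guideline quoted above, not a hard server limit.

```python
# Guideline from the list above: buffer import is not recommended
# above roughly 1k total data points per second.
BUFFER_MAX_POINTS_PER_SECOND = 1_000


def total_points_per_second(mnemonic_rates_hz):
    """Sum the sample rates (points per second) across all mnemonics."""
    return sum(mnemonic_rates_hz.values())


def suggested_import_path(mnemonic_rates_hz):
    """Return "buffer" at or below the guideline, otherwise "archive"."""
    rate = total_points_per_second(mnemonic_rates_hz)
    return "buffer" if rate <= BUFFER_MAX_POINTS_PER_SECOND else "archive"
```

For example, 200 mnemonics sampled at 1 Hz total 200 points per second and stay well within the guideline, while 200 mnemonics at 10 Hz total 2,000 points per second and would point toward archive import.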

Archive Import

Alternatively, archive files may be imported directly with the MODEL_ARCH_IMPORT action.

Pros

much higher performance ceiling than server side aggregation

stringent validation ensures data conforms to standard

Cons

more complex initial setup

mnemonic definitions need coordination between client and server

changes are more complex and likely involve human interaction

Assumptions and Limitations

Each archive source file is considered the single source of truth for all mnemonics, instants, and intervals for its associated origin for its time range. This has the following implications:

Archive files with the same origin cannot contain overlapping time ranges. If an import operation is performed with a file violating this constraint the operation will fail and return an error.
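Because an overlapping import fails, it can be useful to check time ranges on the client first. The following is a minimal sketch, assuming each file is described by a hypothetical `(org, name, t_start, t_end)` tuple and that ranges are half-open, so a file may start exactly where the previous one ends. Whether the server treats touching ranges as overlapping is not stated here and should be confirmed.

```python
from collections import defaultdict


def find_overlaps(files):
    """Return (org, earlier_name, later_name) for files whose ranges overlap.

    files: iterable of (org, name, t_start, t_end) tuples.
    Ranges are assumed half-open: [t_start, t_end).
    """
    by_origin = defaultdict(list)
    for org, name, t_start, t_end in files:
        by_origin[org].append((t_start, t_end, name))

    conflicts = []
    for org, spans in by_origin.items():
        spans.sort()  # order by start time within each origin
        latest_end, latest_name = spans[0][1], spans[0][2]
        for t_start, t_end, name in spans[1:]:
            # Overlap if this file starts before the furthest end seen so far.
            if t_start < latest_end:
                conflicts.append((org, latest_name, name))
            if t_end > latest_end:
                latest_end, latest_name = t_end, name
    return conflicts
```

Files from different origins are never compared, since the constraint applies only within a single origin.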

Within a single model, each mnemonic may only come from a single origin. Because mnemonics are not necessarily strictly associated with models, and the source may vary between models, this cannot be verified on import and must be verified on the client prior to importing data.
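Since this constraint cannot be verified on import, a client-side check along the following lines may help. It is a sketch under the assumption that the client can list the mnemonic labels contained in each file together with the file's origin; the function name and input shape are hypothetical.

```python
from collections import defaultdict


def mnemonics_with_multiple_origins(files):
    """Return {label: [origins]} for every mnemonic seen under more than one origin.

    files: iterable of (org, labels) pairs, where labels is an iterable of
    mnemonic labels contained in a file from that origin.
    """
    origins = defaultdict(set)
    for org, labels in files:
        for label in labels:
            origins[label].add(org)
    # key=str keeps the sort well-defined if a null (None) origin is present.
    return {
        label: sorted(orgs, key=str)
        for label, orgs in origins.items()
        if len(orgs) > 1
    }
```

An empty result means every mnemonic in the checked files comes from exactly one origin.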

Server Side Aggregation

With server side aggregation, the XINA server is responsible for aggregating one or more data services into a model. Typically this means XINA is also responsible for management of mnemonic definitions, using the MODEL_MN_IMPORT API call.

Client Side Aggregation

With client side aggregation, data for a model is entirely aggregated into a single data source on the client. This solution is common for telemetry generated directly by an instrument, or multiple sources merged through FEDS. Typically the merged file(s) are in a binary format and require custom utilities to convert to XINA formats, which can be deployed within the XINA ecosystem to XINA Run servers.
