
Experimental Data Redundancy



Last update: 15-11-2023

Experimental data redundancy during an experiment means that every dataset uploaded to a data repository, for instance a Dataverse, also contains the values of previously uploaded datasets, re-uploaded together with the newer data. On the repository side, as an experiment advances, the number of available dataset files increases, with overlapping measured sensor values between consecutive files.

The most recent dataset upload therefore holds all data values; each value carries a unique fingerprint ID, and the values are linked together in a blockchain-like fashion.
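As a minimal sketch of this idea, assuming a SHA-256 fingerprint and illustrative names (this is not the project's actual firmware code), each upload could repeat all previously measured values and carry a fingerprint chained to the previous upload:

```python
import hashlib
import json

def fingerprint(payload: dict, previous_fingerprint: str) -> str:
    """Hash the dataset content together with the previous upload's fingerprint."""
    raw = json.dumps(payload, sort_keys=True) + previous_fingerprint
    return hashlib.sha256(raw.encode()).hexdigest()

measurements = []          # grows as the experiment advances
previous_fp = "GENESIS"    # the first upload has no predecessor
uploads = []

for new_value in [21.4, 21.7, 22.1]:          # e.g. scheduled sensor readings
    measurements.append(new_value)
    payload = {"values": list(measurements)}  # older values are re-uploaded too
    fp = fingerprint(payload, previous_fp)
    uploads.append({"payload": payload, "fingerprint": fp, "previous": previous_fp})
    previous_fp = fp                          # the next upload links to this one
```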

It is also important to stress the role of time-linear dataset uploads as a way to help validate the origin of experimental data. For this reason, two things should happen:

  • truly randomized dataset uploads to a data repository, initiated by the smart DAQ setup in any experiment and throughout the experimental campaign
  • truly randomized dataset upload requests made remotely by the data repository.

These uploads should take place around the clock: during working hours as well as at night, when the laboratory is dark.
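A possible way to obtain such unpredictable upload times on the DAQ side is sketched below; the `upload_dataset` placeholder and the 10 to 60 minute window are assumptions, not values taken from the firmware:

```python
import random
import time

rng = random.SystemRandom()    # OS entropy, so intervals cannot be anticipated

def upload_dataset():
    print("dataset uploaded")  # placeholder for the real repository upload

for _ in range(3):             # in firmware this would run for the whole campaign
    # wait an unpredictable interval (here between 10 and 60 minutes)
    time.sleep(rng.uniform(10 * 60, 60 * 60))
    upload_dataset()
```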


Smart DAQ Autonomous Cooperation

(Figure: datasets record history)

One possible way to improve the trustworthiness of experimental data collected from local sensors is to use redundant datasets. Each smart DAQ connected in a swarm-like manner holds, in real time, a copy of the datasets of every other device it connects to; these copies are updated and verified on each new measurement and replicated across the swarm. In case of external interference, it then becomes possible to determine whether the interference was continuous or intermittent during the experimental campaign, by identifying mismatched values between the current dataset and older datasets previously received from the other devices. Each smart DAQ has the task of comparing the datasets it already holds for another device with the newer ones received from that same device, while also verifying changes in the experimental data blockchain hashes. A sketch of such a comparison is shown below.
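A minimal sketch of that comparison, assuming each smart DAQ keeps a simple timestamp-to-value mapping for every peer (function and field names are illustrative, not taken from the firmware):

```python
def find_mismatches(old_copy: dict, new_copy: dict) -> list:
    """Return timestamps whose already-recorded values changed in the newer copy."""
    mismatches = []
    for timestamp, value in old_copy.items():
        if timestamp in new_copy and new_copy[timestamp] != value:
            mismatches.append(timestamp)
    return mismatches

# copy of device B's dataset held locally on device A
stored = {"2023-11-15T10:00": 21.4, "2023-11-15T10:10": 21.7}
# newer copy just received from device B (one older value was altered)
received = {"2023-11-15T10:00": 21.4, "2023-11-15T10:10": 25.0, "2023-11-15T10:20": 22.1}

print(find_mismatches(stored, received))   # -> ['2023-11-15T10:10']
```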

Another way to improve the trustworthiness of the collected experimental data is to upload to the data repository all copies of a dataset, including the one holding the data collected from every connected smart DAQ device. In this case, the data repository must verify incoming datasets by comparing the previously stored ones with the newly arrived file, check for changes in the experimental data blockchain hashes across all dataset files, and notify the smart DAQ whenever a mismatch is found.
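A repository-side check could look roughly like the sketch below, under the assumption that every dataset file declares the fingerprint of its predecessor; `notify_daq` is a hypothetical callback, not an existing repository API:

```python
import hashlib

def file_fingerprint(content: bytes, previous_fp: str) -> str:
    return hashlib.sha256(content + previous_fp.encode()).hexdigest()

def notify_daq(message: str):
    print("notify DAQ:", message)   # placeholder for the real notification channel

def verify_incoming(stored_chain: list, incoming: bytes, claimed_previous: str) -> bool:
    """Accept the new file only if it links to the last stored fingerprint."""
    last_fp = stored_chain[-1]["fingerprint"] if stored_chain else "GENESIS"
    if claimed_previous != last_fp:
        notify_daq("fingerprint mismatch: dataset does not link to the chain")
        return False
    stored_chain.append({"content": incoming,
                         "fingerprint": file_fingerprint(incoming, last_fp)})
    return True

chain = []
verify_incoming(chain, b"t,temp\n0,21.4\n", "GENESIS")        # accepted and linked
verify_incoming(chain, b"t,temp\n0,21.4\n1,21.7\n", "BOGUS")  # rejected, DAQ notified
```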

Finally, it is also possible to apply an MD5 checksum, or another file authenticity verification algorithm, to each dataset file saved over time during an ongoing experiment. This links each dataset file to the next, much like a blockchain, so any attempt to change a dataset is easily identified when performing experimental data validation tasks.
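A short sketch of such MD5 chaining is shown below: each file's checksum is computed over its bytes plus the checksum of the previous file, so altering any older file breaks every later link. The file names are hypothetical.

```python
import hashlib
from pathlib import Path

def chained_md5(files: list) -> list:
    """Build a blockchain-like list of MD5 checksums over the dataset files."""
    chain = []
    previous = ""                      # the first file has no predecessor
    for name in files:
        data = Path(name).read_bytes()
        digest = hashlib.md5(data + previous.encode()).hexdigest()
        chain.append({"file": str(name), "md5": digest, "previous": previous})
        previous = digest
    return chain

# e.g. chained_md5(["dataset_001.csv", "dataset_002.csv", "dataset_003.csv"])
```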


DAQ Cooperation in the same batch and the same experiment

It is also possible to set up many smart DAQs to collaborate autonomously. For instance, in an experimental setup where the same batch of specimens is tested simultaneously, each specimen with its own smart DAQ collecting sensor data, the DAQs can be configured to exchange experimental data among themselves and to:

  • synchronize the collected data with one another, for instance by using the same primary key as a common index across the different databases holding experimental data
  • link individual measurements on each specimen together, by sharing individual data fingerprint IDs with one another, stored redundantly and under the same index key across the different local databases.

This blockchain-like logic includes not only the fingerprint ID of the previous data measurement but also all fingerprint IDs generated for the other specimens in the same experimental setup. For example, say a researcher has set up 3 specimens for a round of testing. Each scheduled sensor measurement, on each of the 3 specimens, will include the sensor data itself with its unique fingerprint ID, plus all fingerprint IDs from the other 2 specimens for the current time-indexed measurement and for the previous one. In total, for this particular example with only 3 specimens, any sensor data point, at any given time, holds 6 unique fingerprint IDs in every individual "experimental data block". For a setup with 5 specimens, each sensor data block holds 10 unique fingerprint IDs.
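A minimal sketch of one such "experimental data block" for the 3-specimen example is given below: it stores the measured value, the specimen's own fingerprint ID, and the fingerprint IDs of all specimens for the current and previous time index, 6 IDs in total. The structure and names are illustrative assumptions.

```python
import hashlib

def fingerprint(specimen: str, t_index: int, value: float) -> str:
    return hashlib.sha256(f"{specimen}|{t_index}|{value}".encode()).hexdigest()

def build_block(specimen, t_index, value, current_fps, previous_fps):
    return {
        "specimen": specimen,
        "time_index": t_index,
        "value": value,
        "fingerprint": current_fps[specimen],
        "current_round": dict(current_fps),    # all 3 IDs of the current round
        "previous_round": dict(previous_fps),  # all 3 IDs of the previous round
    }

previous_fps = {s: fingerprint(s, 0, 20.0) for s in ("S1", "S2", "S3")}
current_fps  = {s: fingerprint(s, 1, 21.0) for s in ("S1", "S2", "S3")}
block = build_block("S1", 1, 21.0, current_fps, previous_fps)
print(len(block["current_round"]) + len(block["previous_round"]))   # -> 6
```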

Recording experimental data this way has advantages over current experimental data acquisition setups. Experimental data can be uploaded live, in real time, to a remote data repository, allowing remote research teams to perform experimental data validation with latency limited only by technology constraints and network bandwidth at any given moment (time here is to be perceived in milliseconds). Any data forgery therefore requires direct access at the very moment an experimental dataset is being uploaded, together with the ability to understand the data and change its values away from what was factually measured in the experimental setup.



DAQ Cooperation in the same batch and different experiments

It is also possible to link experimental data blocks collected from the same batch of specimens across different experimental setups:

  • setups that start on the same date and time
  • setups that start at different dates and times but share the same measurement interval

In such cases, each individual "experimental data block" records a time-dependent index key linking individual sensor measurements, and also includes all fingerprint IDs for the corresponding index key from every specimen across the different experimental setups. To make this concrete, consider 2 different experiments, each with the same batch of 5 specimens. In this setup, each sensor measurement is stored in its own "data block" with its own unique fingerprint ID and also includes all unique fingerprint IDs from all specimens across the 2 experiments, a total of 10 fingerprint IDs.
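A sketch of such a block for the 2-experiment, 5-specimens-each case is given below: the block is keyed by a shared time-dependent index and carries the fingerprint IDs of all 10 specimens for that index. The names and key format are illustrative assumptions.

```python
import hashlib

def fingerprint(experiment: str, specimen: str, index_key: str) -> str:
    return hashlib.sha256(f"{experiment}|{specimen}|{index_key}".encode()).hexdigest()

index_key = "2023-11-15T10:00"        # shared, time-dependent index key
specimens = [(exp, f"S{i}") for exp in ("EXP-A", "EXP-B") for i in range(1, 6)]

block = {
    "index_key": index_key,
    "value": 21.4,                    # this specimen's own sensor reading
    "linked_fingerprints": {f"{e}/{s}": fingerprint(e, s, index_key)
                            for e, s in specimens},
}
print(len(block["linked_fingerprints"]))   # -> 10
```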

This kind of data validation logic, based on data redundancy and time-aware experimental measurements, allows a researcher to present dataset files in which sensor data is linked to any and all other experiments previously set up and configured to take linked measurements. This increases the trustworthiness of data stored in a public data repository while at the same time allowing remote researchers to program and automate experimental data linkage, for instance in an Excel workbook or in a Python script.


DAQ Cooperation with different batches in the same project

It is also possible to link experimental data blocks collected from different batches of specimens across different experimental setups, in a manner very similar, if not identical, to what was described previously.


DAQ Cooperation with different research projects

Finally, this smart DAQ allows individual sensor measurements to be linked across different research projects, whether they take place in the same physical laboratory or at different laboratory locations. The same principles stated previously apply: individual smart DAQs are set up and configured to exchange data autonomously, in order to generate, store, manage, and exchange unique data fingerprint IDs from the different experimental data sources.


Verification of the blockchain integrity

Verification of the blockchain integrity is done in parallel during an experimental campaign, using separate computing resources. It can be carried out on the laptop or computer of the scientific researcher and of every member of a research team, as well as by the editorial staff of a journal when a communication relating to the datasets held in a data repository is submitted.
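As a minimal sketch, assuming downloaded dataset files use the payload/fingerprint/previous layout from the earlier upload example (field names are illustrative), such an offline check could walk the chain like this:

```python
import hashlib
import json
from pathlib import Path

def verify_chain(files: list) -> bool:
    """Recompute each file's fingerprint and confirm it links to its predecessor."""
    previous = "GENESIS"
    for name in files:
        record = json.loads(Path(name).read_text())
        expected = hashlib.sha256(
            (json.dumps(record["payload"], sort_keys=True) + previous).encode()
        ).hexdigest()
        if record["fingerprint"] != expected or record["previous"] != previous:
            print("chain broken at", name)
            return False
        previous = record["fingerprint"]
    return True

# e.g. verify_chain(sorted(Path("downloads").glob("dataset_*.json")))
```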


