
Experimental Data Redundancy


Experimental data redundancy during an experiment is applied to every dataset uploaded to a data repository (for instance, a Dataverse): the values from previously uploaded datasets are re-uploaded in each subsequent dataset together with the newer data. On the repository side, as an experiment advances, the number of available dataset files increases, with overlapping measured sensor data values between consecutive files.

The most recent dataset upload therefore holds all data values, each with a unique fingerprint ID, linked together in a blockchain-like logic.
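A minimal sketch of how such a fingerprint-linked chain of data blocks could be built follows. It assumes SHA-256 as the fingerprint function; the field and function names are illustrative, not the smart DAQ's actual schema.

```python
import hashlib
import json
import time

def fingerprint(block: dict) -> str:
    """Return a unique fingerprint ID for a data block (hex SHA-256 digest)."""
    payload = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

def new_data_block(sensor_values: dict, previous_fingerprint: str) -> dict:
    """Create a measurement block that references the previous block's
    fingerprint, so consecutive uploads form a blockchain-like chain."""
    block = {
        "timestamp": time.time(),
        "sensor_values": sensor_values,
        "previous_fingerprint": previous_fingerprint,
    }
    block["fingerprint"] = fingerprint(block)
    return block

# Example: two consecutive measurements, the second referencing the first.
b1 = new_data_block({"strain": 0.0012, "temperature_C": 21.4}, previous_fingerprint="")
b2 = new_data_block({"strain": 0.0013, "temperature_C": 21.5}, previous_fingerprint=b1["fingerprint"])
```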

It is also important to stress the role of time-linear dataset uploads in validating the origins of experimental data. For that reason, two kinds of uploads should take place:

  • TRUE randomized dataset uploads to a data repository, initiated by the smart DAQ set up in any experiment, over the course of an experimental campaign
  • TRUE randomized dataset upload requests made remotely by the data repository.

These uploads should happen at any time: during the day, at work hours, and during the night when all is dark.
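A minimal sketch of a truly randomized upload schedule on the DAQ side is shown below; the upload_dataset callable and the interval bounds are assumptions made only for illustration.

```python
import random
import time

def randomized_upload_loop(upload_dataset, min_interval_s=1800, max_interval_s=14400):
    """Upload the accumulated dataset at random times, day and night,
    so the upload schedule itself cannot be easily predicted or spoofed."""
    while True:
        # Draw a random delay from a system entropy source, then upload.
        wait_s = random.SystemRandom().uniform(min_interval_s, max_interval_s)
        time.sleep(wait_s)
        upload_dataset()  # push the cumulative dataset to the repository
```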


Collaborative Smart DAQ

In the same batch and the same experiment

It is also possible to set up many smart DAQs to collaborate autonomously. For instance, in an experimental setup where the same batch of specimens is tested simultaneously, each specimen with its own smart DAQ collecting sensor data, the DAQs can be configured to exchange experimental data among themselves and to:

  • synchronize collected data with one another, for instance by using the same primary key as a common index across the different databases holding experimental data
  • link the individual measurements made on each specimen by sharing individual data fingerprint IDs with one another, stored redundantly and under the same index key across the different local databases.

This blockchain-like logic includes not only the fingerprint ID of the previous data measurement but also all fingerprint IDs generated for the other specimens in the same experimental setup. As an example, suppose a researcher has set up 3 specimens for a round of testing. Each scheduled sensor measurement on each of the 3 specimens will include the sensor data itself with its own unique fingerprint ID, plus all fingerprint IDs from the other 2 specimens for the current time-indexed measurement, and all fingerprint IDs from the previous measurement. In total, for this example with only 3 specimens, every individual "experimental data block" holds 6 unique fingerprint IDs at any given time. For a setup with 5 specimens, each sensor data block holds 10 unique fingerprint IDs.
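A minimal sketch of such an "experimental data block" for one specimen in a 3-specimen setup is shown below; the structure and field names are hypothetical, chosen only to make the fingerprint count explicit.

```python
import hashlib
import json

def fingerprint(payload: dict) -> str:
    return hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()

def build_block(index_key, own_reading, other_current_ids, previous_ids):
    """Build a data block holding the specimen's own fingerprint ID plus the
    fingerprint IDs of the other specimens (current and previous measurement)."""
    block = {
        "index_key": index_key,                        # shared primary key across local databases
        "reading": own_reading,
        "other_specimens_current": other_current_ids,  # 2 IDs in a 3-specimen setup
        "previous_measurement": previous_ids,          # 3 IDs (all specimens, previous step)
    }
    block["fingerprint_id"] = fingerprint(block)
    return block

# For 3 specimens: 1 own + 2 other (current) + 3 (previous) = 6 fingerprint IDs per block.
block = build_block(
    index_key="2023-03-25T12:00:00Z",
    own_reading={"strain": 0.0012},
    other_current_ids=["id_spec2_t", "id_spec3_t"],
    previous_ids=["id_spec1_t-1", "id_spec2_t-1", "id_spec3_t-1"],
)
```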

Recording experimental data this way has advantages over current experimental data acquisition setups. Experimental data can be uploaded in real time, live, to a remote data repository, which helps remote research teams validate experimental data with the lowest possible latency, limited only by technology constraints and network bandwidth at any given time (latency here is on the order of milliseconds). Any data forgery would therefore require direct access at the moment an experimental dataset is being uploaded, as well as the ability to understand it and alter values other than those factually measured and intended for the experimental setup.


In the same batch and different experiments

It is also possible to link experimental data blocks collected from the same batch of specimens across different experimental setups:

  • setups that start on the same date and time
  • setups that start at different dates and times but use the same measurement interval.

For such cases, each individual "experimental data block" is recorded with a time-dependent index key that links the individual sensor data measurements, and it also includes all fingerprint IDs for the corresponding index key from every specimen across the different experimental setups. To make this concrete, consider 2 different experiments run with the same batch of 5 specimens each. In this setup, each sensor data measurement is stored in its own "data block" with its own unique fingerprint ID and also includes all unique fingerprint IDs from all specimens across the 2 experiments, a total of 10 fingerprint IDs.
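A minimal sketch of how a time-dependent index key could link blocks across the two setups follows; it assumes both experiments share the same measurement interval, and the names and structures are illustrative.

```python
from datetime import datetime, timezone

def time_index_key(start: datetime, measurement_interval_s: int, step: int) -> str:
    """Time-dependent index key shared by all specimens for measurement `step`."""
    t = start.timestamp() + step * measurement_interval_s
    return datetime.fromtimestamp(t, tz=timezone.utc).isoformat()

def link_block(own_fingerprint: str, index_key: str, fingerprints_by_experiment: dict) -> dict:
    """A data block that carries, for the same index key, the fingerprint IDs of
    every specimen across the linked experiments (10 IDs for 2 x 5 specimens)."""
    return {
        "index_key": index_key,
        "fingerprint_id": own_fingerprint,
        # e.g. {"experiment_A": [5 IDs], "experiment_B": [5 IDs]}
        "linked_fingerprints": fingerprints_by_experiment,
    }
```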

This data validation logic, using data redundancy and time-aware experimental data measurements, allows a researcher to present dataset files in which sensor data is linked to any and all other experiments previously set up and linked for data measurement. This increases the trustworthiness of data stored in a public data repository while at the same time allowing remote researchers to program and automate experimental data linkage, for instance in an Excel workbook or in a Python script.
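For instance, a short Python script along these lines could automate the linkage check between two consecutive uploads; it assumes each dataset file is a JSON list of data blocks keyed by fingerprint_id, which is an assumption about the file format made only for this sketch.

```python
import json

def load_blocks(path):
    """Read a dataset file and index its data blocks by fingerprint ID."""
    with open(path) as f:
        return {block["fingerprint_id"]: block for block in json.load(f)}

def verify_overlap(older_path, newer_path):
    """Every block from the older upload must reappear unchanged in the newer,
    cumulative upload; any mismatch points to tampering or data loss."""
    older = load_blocks(older_path)
    newer = load_blocks(newer_path)
    return all(fid in newer and newer[fid] == block for fid, block in older.items())
```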


With different batches in the same project

It is also possible to link experimental data blocks collected from different batches of specimens across different experimental setups. The approach is in all respects similar, if not identical, to what was described previously.


With different research projects

Finally, this smart DAQ allows the linkage of individual sensor data measurements across different research projects, whether they take place in the same physical laboratory or at different laboratory locations. The same principles stated previously apply: individual smart DAQs are set up and configured to exchange data autonomously, with the purpose of generating, storing, managing, and exchanging unique data fingerprint IDs from the different experimental data sources.


