
Introduction




Current scientific practice uses traditional dataloggers (DAQ) for experimental data measurement and collection. Collected data is often stored on paper or, most of the time, in a conventional CSV/Excel data file. This is prone to errors and, even worse, to forgery of experimental data. To date, there is no datalogger able to automate experimental data acquisition in a scientific experiment in a way that makes it transparent and trustworthy beyond the common critiques found nowadays.

To overcome these limitations, a Swarm Learning (SL) architecture (hardware and software) was conceptualized and prototyped as a fully decentralized machine learning principle to improve data trustworthiness. Conceptually, SL is a decentralized approach that validates and maintains a trustworthy database of experimental data, publicly accessible in real time, through data redundancy, validation, and authentication of datasets across the multiple smart data acquisition devices connected, locally or remotely. Every participating site is a node in the Swarm network and takes part in the data validation and authentication tasks by sharing local hardware resources. Data security and sovereignty are ensured through public permissioned blockchain technology. New smart data acquisition devices can enter a "Swarm network" via a blockchain smart contract that regulates access and operational conditions in a fully autonomous way. New Swarm nodes agree to the collaboration terms, obtain the model, and perform local validation and authentication until all tasks are completed.

This allows the acquisition of much larger experimental datasets, validated and authenticated publicly and available for analysis from sources outside the primary scientific research of a given site. It also offers new opportunities to overcome the limitations of collaborative work in science, as several research sites may easily join forces to tackle the same research question from individual points of view, with increased trustworthiness of experimental data coming from unknown sources.

The proposed smart DAQ device prototype has the minimum hardware characteristics needed to handle data measurements collected from sensors locally connected to it, store them in a local CSV or SQLite database file, and finally connect to and synchronize the collected measurements with a data repository hosted remotely on a Dataverse.
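As a rough illustration of this synchronization step, the sketch below pushes a local CSV file into a Dataverse dataset using the Dataverse native API's file-upload endpoint. The server URL, API token, and dataset DOI are placeholders, and `upload_measurements` is a hypothetical helper, not part of the actual firmware:

```python
import requests

SERVER_URL = "https://demo.dataverse.org"              # placeholder instance
API_TOKEN = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"     # placeholder API token
PERSISTENT_ID = "doi:10.5072/FK2/EXAMPLE"              # placeholder dataset DOI

def upload_measurements(csv_path: str) -> None:
    """Push a local CSV of collected measurements to the remote dataset."""
    url = f"{SERVER_URL}/api/datasets/:persistentId/add"
    with open(csv_path, "rb") as f:
        response = requests.post(
            url,
            params={"persistentId": PERSISTENT_ID},
            headers={"X-Dataverse-key": API_TOKEN},
            files={"file": f},
        )
    response.raise_for_status()

upload_measurements("measurements.csv")
```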

These smart DAQ devices are "Internet of Everything" (IoE) smart devices, able to connect with each other using swarm intelligence. The main purpose is to increase data integrity and trustworthiness among the connected DAQ devices and across all experimental data collected during an experiment or research project.

Experimental data is stored in a block format: a single block stores an individual piece of experimental data written to it, the hash of the previous block, and its own hash.
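A minimal sketch of this block format in Python, assuming SHA-256 as the hash function (the concrete hash used on the device may differ):

```python
import hashlib
import json
import time

def sha256_hex(payload: str) -> str:
    """Hex digest of a string; stands in for the device's hash function."""
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

class Block:
    """One measurement plus the previous block's hash and this block's own hash."""

    def __init__(self, data: dict, prev_hash: str):
        self.timestamp = time.time()
        self.data = data                 # a single piece of experimental data
        self.prev_hash = prev_hash       # link to the preceding block
        self.hash = self.compute_hash()  # fingerprint of this block's contents

    def compute_hash(self) -> str:
        body = json.dumps(
            {"t": self.timestamp, "data": self.data, "prev": self.prev_hash},
            sort_keys=True,
        )
        return sha256_hex(body)
```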

This is the main principle of operation behind blockchain technologies: it makes it very difficult to modify experimental data once it is written to a block, since the hashes are interlinked from the beginning of an experiment, an experimental campaign, or even an entire research project. Every block written references the hash of its previous block. Any modification to the data stored in a block changes that block's hash, which in turn invalidates every following block (since each must contain the hash of its predecessor). Modifying a single block therefore requires rewriting all subsequent blocks.
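Continuing the sketch above, a short verification routine shows how this plays out: recomputing the hashes of a chain in which one measurement was altered breaks the links. `verify_chain` and `GENESIS_HASH` are illustrative names, not part of the actual firmware:

```python
GENESIS_HASH = "0" * 64  # sentinel for the first block of an experiment

def verify_chain(chain: list) -> bool:
    """Recompute every hash; any edit to stored data breaks the links."""
    prev_hash = GENESIS_HASH
    for block in chain:
        if block.prev_hash != prev_hash:
            return False                       # broken link to predecessor
        if block.hash != block.compute_hash():
            return False                       # stored data was altered
        prev_hash = block.hash
    return True

# Build a short chain, then tamper with it:
chain = []
prev = GENESIS_HASH
for value in (21.4, 21.6, 21.5):
    block = Block({"sensor": "T1", "value": value}, prev)
    chain.append(block)
    prev = block.hash

assert verify_chain(chain)
chain[1].data["value"] = 99.9   # forge one measurement...
assert not verify_chain(chain)  # ...and the whole chain fails validation
```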

In everyday laboratory science, these smart DAQ devices can connect to each other in a swarm-like manner and, in doing so, increase the trustworthiness and authenticity of experimental data in an experiment that is part of a research project or experimental campaign. Setting up a Swarm network of smart DAQ devices increases the quality of research results by tagging each individual piece of experimental data collected from each individual sensor with a unique data fingerprint ID (hash) at the exact moment of data collection, broadcasting it to other nearby smart DAQ devices, and finally uploading the data to a repository, where a new, additional data fingerprint is added to the existing, locally generated ones. In this way, data integrity is maintained and guaranteed from the moment of collection in the laboratory until the data is received and stored in a data repository on a cloud server.
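The two-stage fingerprinting described here might look like the following sketch, where a local fingerprint ID is computed at the moment of collection and the repository later layers a second fingerprint over the record and its local ID (the names and payload layout are assumptions for illustration):

```python
import hashlib
import json
import time

def fingerprint(payload: dict) -> str:
    """Deterministic hash of a record; illustrative, not the firmware's exact scheme."""
    return hashlib.sha256(
        json.dumps(payload, sort_keys=True).encode("utf-8")
    ).hexdigest()

# On the smart DAQ, at the exact moment of collection:
reading = {"sensor": "T1", "value": 21.4, "t": time.time()}
local_id = fingerprint(reading)  # locally generated data fingerprint ID

# On the repository side, after ingest: a second fingerprint is added,
# covering both the record and its locally generated ID.
repository_id = fingerprint({"record": reading, "local_id": local_id})
```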

Early concept versions of this smart DAQ were simple assemblies of ready-to-buy modules for the Arduino Nano series. Eventually, I decided to put my effort into it and design, from the ground up, a new PCB layout using the well-known ESP32 microcontroller from Espressif Systems. Over the next (almost) two years, the PCB design and initial concept went through many changes and revisions before maturing into the current state of development. The March 2023 PCB revision utilizes a Tensilica Xtensa LX7 32-bit MCU with the capability to store up to 4 GB of data locally. It is capable of live experimental data measurements, storing each with a blockchain-like unique fingerprint ID, and of live dataset uploads to a data repository, in particular to a Dataverse (compatibility with other open repositories will be added in the future), in a way never seen before. The technologies being developed are all based on OPEN guidelines and work methodologies and include, for instance, experimental data redundancy as a form of remote data validation.


Symbols used

| Symbol | Meaning |
|--------|---------|
| ∆ | Max({i}) − Min({i}) |
| θ | node weight |

