Datachain

From Matthews Lab

Future decentralized autonomous systems (DAS) will likely be focused on distributing specialized AI agents for finding patterns in bulk data sets. These DAS will be funded with a pre-existing cryptocurrency like Bitcoin and then pegged to the entity in such a way that the assets can be given out as rewards for correct solutions without human intervention. The resulting system forms a decentralized, autonomous, peer-to-peer, client-to-client, datachain that rewards assets for finding data that the AI likes.

The use cases for such a system would be big data processing, web scraping, and data mining, where remote files are scattered all over the Internet and searching them for specific patterns is too resource intensive for any one organization. Such a system may help us find new meaning in the vast stores of content already accessible via the Internet, for example crowd-sourced cat pictures.

Many types of content people like follow a similar theme. For example, memes tend to follow a common format often employing slight remixes of the same image. I argue this could be a good place to start for crowd sourced content discovery via deep learning algorithms.

These deep learning algorithms can be applied to produce an arbitrary AI that could be downloaded and run by anyone to detect content you may like in exchange for receiving a reward – almost like proof-of-work using an AI. With memes this would be like paying a website for content, only the content you receive would be based on your taste and provided by the crowd - all without having to expose your credit card details.

Another interesting bonus is that you wouldn’t need to broadcast your preferences to other people. Instead, the AI would contain an abstract description of the content you find interesting (based on your organized, terabyte-sized meme collection), and the exact qualities that it recognizes could only be vaguely deduced by comparing a collection of compatible matches that share a theme.

Once a person has found a match, they can submit the content to the DAS to verify that the match is correct. Since anyone can claim to have found a match and waste resources, it would also be necessary to back rewards in the datachain with collateral in another concrete cryptocurrency like Bitcoin, using something like a two-way peg, so that spammers incur a cost if they waste resources. This could be done using smart contracts, or perhaps by encoding the rules in two different blockchains that negotiate the trading of these collateral-backed crypto-derivatives.

Miners would then be able to check that a unique, collateral-backed deposit existed when a solution was submitted. Because that check is very fast, miners could skip running the computationally expensive AI on any solution that wasn’t backed by collateral. If a given solution was incorrect, its collateral could be distributed to miners so that the network stayed protected. If a solution was correct it would enter the chain, becoming part of a permanent record of solutions shared throughout the network, with the original collateral released back to the submitter.
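The order of checks described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `Ledger`, `Submission`, `process_submission`, and the in-memory dictionaries are hypothetical stand-ins for whatever the two chains would actually record, and `classifier` stands in for the expensive AI.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Submission:
    submitter: str
    content: bytes
    deposit_id: str  # hypothetical identifier of the collateral deposit


@dataclass
class Ledger:
    # Hypothetical in-memory stand-in for on-chain state.
    deposits: dict = field(default_factory=dict)  # deposit_id -> locked collateral
    balances: dict = field(default_factory=dict)  # account -> spendable amount
    chain: list = field(default_factory=list)     # accepted solutions


def process_submission(ledger: Ledger, sub: Submission,
                       classifier: Callable[[bytes], bool],
                       miner: str, reward: int) -> str:
    # Cheap check first: without a unique, unspent deposit the AI never runs.
    # Popping the deposit means the same collateral cannot back two submissions.
    collateral = ledger.deposits.pop(sub.deposit_id, None)
    if collateral is None:
        return "ignored"

    # Only now pay for the expensive step.
    if classifier(sub.content):
        ledger.chain.append(sub)
        # Correct solution: collateral is released and the reward is paid.
        ledger.balances[sub.submitter] = (
            ledger.balances.get(sub.submitter, 0) + collateral + reward
        )
        return "accepted"

    # Incorrect solution: the collateral goes to the miner who did the work.
    ledger.balances[miner] = ledger.balances.get(miner, 0) + collateral
    return "rejected"
```

In this sketch a spammer without a deposit costs the miner one dictionary lookup, while a spammer with a deposit pays for the classifier run they forced.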

To put this another way: you’ve now set up a decentralized, unstoppable system that’s run autonomously by AIs to find you files you like, which you’ve incentivized people to keep running by using cryptocurrencies as a reward. In this way datachains are somewhat like DNA in that they’re self-replicating entities as long as there is a survival advantage in replicating, which in this case means the entities funding the datachain continue to receive some benefit from it.

Attacking the datachain

Since you’re attaching submissions to the minting of new rewards within a chain, you would need to use a hybrid proof-of-work system where submissions are also mined using standard hashcash-style proof-of-work (or just anchor them to the Bitcoin blockchain). Otherwise an organization with enough computing power, such as Google, could come along and replace the entire chain almost instantly.
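A minimal sketch of hashcash-style proof-of-work attached to a submission might look like the following. The choice of SHA-256, the 8-byte nonce encoding, and the function names are assumptions made for illustration, not part of any specified datachain format.

```python
import hashlib


def pow_hash(payload: bytes, nonce: int) -> int:
    # Hash the submission together with the nonce and read it as an integer.
    digest = hashlib.sha256(payload + nonce.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big")


def mine(payload: bytes, difficulty_bits: int) -> int:
    # Search for the first nonce whose hash has `difficulty_bits` leading zero bits.
    target = 1 << (256 - difficulty_bits)
    nonce = 0
    while pow_hash(payload, nonce) >= target:
        nonce += 1
    return nonce


def verify(payload: bytes, nonce: int, difficulty_bits: int) -> bool:
    # Verification costs one hash, no matter how long mining took.
    return pow_hash(payload, nonce) < (1 << (256 - difficulty_bits))


if __name__ == "__main__":
    nonce = mine(b"example submission", 12)
    print(nonce, verify(b"example submission", nonce, 12))
```

Each extra bit of difficulty doubles the expected mining work, so rewriting a long chain of submissions means redoing all of that work rather than just re-running the AI.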

There is also the issue of reverse engineering the AI to produce thousands of random garbage images that would all be considered valid. To stop that problem, the data set would likely have to be specified beforehand by using hashes – for example by using meta-data collected from a list of torrent files that is then distributed with the DAS at the time of its creation.
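Restricting the data set by hashes could be sketched like this. Hashing whole files with SHA-256 is an assumption made to keep the example short (real torrent metadata stores per-piece hashes instead), and the function names are hypothetical.

```python
import hashlib


def content_hash(data: bytes) -> str:
    # Assumption: a file is identified by the SHA-256 of its full contents.
    return hashlib.sha256(data).hexdigest()


def build_search_space(files) -> frozenset:
    # Fixed once, when the DAS is created, and shipped with it.
    return frozenset(content_hash(f) for f in files)


def in_search_space(candidate: bytes, allowed: frozenset) -> bool:
    # A generated or altered image hashes to something outside the set,
    # so it is rejected before the AI is ever consulted.
    return content_hash(candidate) in allowed
```

With this check in place, an attacker who reverse engineers the AI can still only submit files whose hashes were published up front.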

The final problem is that you would need to find a way to verify that submissions were unique so that minor alterations couldn’t produce millions of new matches. There are algorithms that can do that without affecting the decentralized nature of the DAS. The problem is that the cost would grow quickly as the chain got longer, since the algorithm would need to check every past image against each new solution (this is bad).
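To make the cost concrete, here is a rough sketch of a near-duplicate check. It assumes each image has already been reduced to a fixed-size integer fingerprint by some perceptual hash (not shown, and not specified by the original text), and the distance threshold is an arbitrary illustrative value.

```python
def hamming(a: int, b: int) -> int:
    # Number of bit positions in which two fingerprints differ.
    return bin(a ^ b).count("1")


def is_near_duplicate(fp: int, accepted: list, max_distance: int) -> bool:
    # Linear scan: every new fingerprint is compared with every accepted one.
    return any(hamming(fp, old) <= max_distance for old in accepted)


def try_accept(fp: int, accepted: list, max_distance: int = 4) -> bool:
    # Reject slight alterations of images that are already in the chain.
    if is_near_duplicate(fp, accepted, max_distance):
        return False
    accepted.append(fp)
    return True
```

Each call scans the whole accepted list, so the total work over the life of the chain grows roughly with the square of the number of accepted solutions.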

Alternatively, you could just ignore duplicates. Needless to say, a vast number of duplicates would still occur, but if you restricted solutions so that they could only come from a valid, pre-existing set of files based on meta-data, the incentive structure would probably still work.

Future work

All of the major problems with a datachain seem to be solved when the DAS has tight restrictions on the parameters for accepted solutions. For example, rather than saying a solution can come from anywhere, you have a big list of hashes and tell the network to process files in that search space. This is an easy way to solve things, as there is a huge amount of data in torrents that fits that definition, but it’s still slightly impractical.

In the future one can imagine a datachain being used to run decentralized, autonomous, websites based around organizing content mined from other services that fit a certain pattern. The content would be provided by miners who maintain the chain in exchange for a reward where the chain is kept around so long as the content is found to be useful. This would be quite an interesting spin on dynamic, user-generated content websites while providing a robust platform for keeping such information available.

Another idea is to use something that I call a “programmatic organization” to incentivize the creation of new content. For example, you could specify that submissions could only come from certain places (organizations) after a certain point in time and remove some of the restrictions for submissions. This would mean introducing third-party trust (so no longer a DAS) but could be another interesting structure for funding new content.

I can think of some more programmatic organizations that might make sense in this context. Article production comes to mind, where payment is guaranteed based on pattern matching regardless of how the employer acts. Then, in the more specialized case of a datachain as a DAS, you might also have things like search engines that would benefit from being funded by the user for content procurement.

There are definitely more datachains left to discover and different types of DAS but the rest I’ll leave up to the reader to discover.