Conf42 Chaos Engineering 2022 - Online

Application for Blockchain in Crowdsourcing Data

Abstract

Dealing with a crowd is never easy. We need to make our systems resilient against a number of attacks, and the way we do it is to utilise the crowd itself. We will talk about:

  • Chainlink oracles,
  • distributed open source systems,
  • incentivisation,
  • application of machine learning
  • and more.

Summary

  • Real-time feedback into the behavior of your distributed systems, and observing changes, exceptions, and errors in real time, allows you to not only experiment with confidence but also respond instantly to get things working again. New data sources are coming online and evolving all the time due to the changing world. The amount of data is growing massively.
  • The number of data collectors and analysts can be nearly limitless. A lot of organizations have data silos. For others, data access is often restricted to protect their competitive advantage. When dealing with a crowd, this protection is no longer viable.
  • Crowdsourcing data can also enable data analysis on a scale and depth that is not viable for a small organization. By providing small incentives for actions done on the blockchain, you encourage people to work together. Decentralized systems can be more scalable and fault tolerant.
  • Ethereum was the first blockchain to implement smart contract functionality. Cardano is my favorite blockchain because it takes a different approach towards development. New blockchains are always coming online and they implement new functionality. This is only a review of the top blockchains to work on.

Transcript

This transcript was autogenerated. To make changes, submit a PR.
Real-time feedback into the behavior of your distributed systems, and observing changes, exceptions, and errors in real time, allows you to not only experiment with confidence but also respond instantly to get things working again.
Let's talk about the application for blockchain in crowdsourcing data. What are the problems in data sourcing? The world is changing around us all the time. Companies are formed; others become smaller and go bankrupt. New technologies come to light, governments come into power either through democratic means or otherwise, and government policies are changing all the time. We live in a very turbulent time right now due to coronavirus and the conflict in Ukraine. Things that we thought were going to stay the same are changing in a rapid and massive way.
New data sources are coming online and evolving all the time. Due to the changing world, more and more data is becoming available, and it is difficult to keep up. The amount of data is growing massively: 2.5 quintillion bytes are generated every day. Also, each data source has its own design, format, or language, and the culture it is embedded in might be different. It might operate under different laws and be subject to different regulations, which can be very hard to keep up with. All this change is affecting the world on a massive scale.
Some data sources are resistant to quantification. Scientists, engineers, and analysts love data that is uniform and quantifiable and that is easy to enrich and transform. They love standard data formats because they are important. For example, when calculating inflation, you need to know what category a particular product is in, how to convert units into a standard format, and what currency it is measured in. If you are at the scale of Google, everyone will try to serve you. It is different if you are at the scale of a startup.
A large established corporation such as Google has massive scale and a near-monopoly grip on the web, which means everybody is trying to serve them. Everybody is trying to adapt their websites to make it easier for the Google algorithm to extract the data and to rank their website higher than their competitors. Everyone is trying to produce content that is suitable for the Google algorithm. However, if you are a small startup, your challenges are going to be very different. Nobody is going to care to transform the data for you, which means it is going to be your own responsibility. Your resources are limited, so your challenge is to do it at any scale.
So what are the advantages of crowdsourcing data collection? Naturally, the first advantage is that you can scale it up easily. As long as you can incentivize the crowd to contribute to your project, the number of data collectors and analysts can be nearly limitless. This is like a cloud service provider for computing or storage services. The crowd can also discover new means to defeat resistance to data quantification.
A lot of organizations have data silos. They do not necessarily want to make them available because it can reduce their competitive advantage. However, to interact with individuals such as their customers, they do have to share small parts of that data set. Some have implemented automated data scraping protection, which is curious considering that decades ago Google started scraping without much permission, when there was no such protection. These days everybody knows that Google brings customers over. However, for others, data access is often restricted to protect their competitive advantage. When crowdsourcing this data, this type of protection can be defeated on several levels, because there is no way to share your data with the customer while protecting it as well.
Another approach an organization can take is to rapidly modify their user interface, changing where things appear, how they look, and how they are presented. However, when dealing with a crowd, this protection is no longer viable, because an individual can always understand the data and help quantify it into an easily understandable format. Sometimes data is also made available to others through APIs and interfaces. However, due to the large number of different interfaces and the lack of standards, it can be difficult to have the resources to integrate them, and this is where crowdsourcing data can also help.
It can also enable data analysis on a scale and depth that is not viable for a small organization. After gathering raw data, it is important to perform analysis to convert it into useful knowledge. Raw data is only as valuable as the insights you can obtain by analyzing it. A smaller organization can have difficulty performing many different types of analysis, because for each analysis you might need a different skill set. You might need a different type of expert, and it can be very expensive either to contract them for a short time at a high rate or to hire somebody permanently, which takes time to train and incorporate into an organization. But you can also crowdsource analysis if you provide an incentive. The expert can perform a small step toward achieving a result, and you can share whatever benefits come from the analysis with that person to incentivize them. This means you don't necessarily have to come up with sufficient capital in advance to contract the expert.
You can also collect data from online and offline sources. A lot of the data is online, and more and more of it becomes available. However, no matter how effective the collection and upload means are, there is always more information available. Data is also often available offline first, before it comes online.
When crowdsourcing the data, you can use the wisdom of the crowd to note offline events and incentivize the crowd to share those events as they happen. This enables functionality that is faster than would be possible without a crowd.
What are the differences between centralized and decentralized systems? Centralized systems can be less complicated to build and to control. However, they are opaque. They are not tolerant to attacks and can be less secure. They can be less scalable and have a central point of failure: at the very least, the organization that created them. Decentralized systems can be more scalable and fault tolerant, and can be more secure. As all nodes are treated fairly, a decentralized system eliminates the need for intermediaries and is more transparent.
So how decentralized are blockchains? There is no such thing as a perfectly decentralized system. The Nakamoto coefficient was invented to attempt to quantify the level of decentralization of a blockchain system. It takes into account various parameters such as how many active wallets there are, how many developers there are, how many nodes are active, and so on. It might not be a perfect measure, because Bitcoin has three large mining pools that in principle could coordinate what is called a 51% attack. However, its Nakamoto coefficient is the highest. An attack like that would result in Bitcoin's price dropping and the value of their mining work being reduced, and as such, there is no incentive to do so.
What is a blockchain? A blockchain is a decentralized ledger which is implemented as a series of blocks. Each block contains a number of transactions and refers to the previous block and its hash. This ensures that the order of transactions can always be verified. Modern blockchains also support smart contracts, which can run code based on conditions in the blockchain and can interact with the outside world through interfaces called oracles.
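The block structure described above, where each block holds transactions and refers to the hash of the previous block, can be illustrated with a minimal Python sketch. It is not how any particular blockchain is implemented: the field names (index, prev_hash, transactions) and the helper functions are assumptions made for illustration, and there is no consensus, signing, or networking.

```python
import hashlib
import json


def block_hash(block: dict) -> str:
    """Hash a block's contents, which include the previous block's hash."""
    payload = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def add_block(chain: list, transactions: list) -> None:
    """Append a block that refers to the hash of the block before it."""
    # The first block has no predecessor, so it points at a placeholder hash.
    prev_hash = block_hash(chain[-1]) if chain else "0" * 64
    chain.append(
        {"index": len(chain), "prev_hash": prev_hash, "transactions": transactions}
    )


def is_valid(chain: list) -> bool:
    """Check that every block still points at the hash of its predecessor."""
    return all(
        chain[i]["prev_hash"] == block_hash(chain[i - 1])
        for i in range(1, len(chain))
    )


# Hypothetical example data.
chain = []
add_block(chain, ["alice pays bob 5"])
add_block(chain, ["bob pays carol 2"])
add_block(chain, ["carol pays dave 1"])
```

In this sketch, changing a transaction in an earlier block changes that block's hash, so the next block no longer points at it and is_valid(chain) returns False. That is the sense in which the order and content of past transactions can be verified.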
As an example of a smart contract, you could set up one that pays somebody cryptocurrency into their account every month, like a loan. Or you could set up a system where people play roulette, and this would be more transparent than a traditional casino website, as you could verify everything on the blockchain. Blockchain can also incorporate actual legal contracts with people, which are called Ricardian contracts. However, this is largely unrecognized to date.
Blockchain requires a consensus algorithm to make sure that we know which data is valid. For example, Bitcoin uses longest-chain consensus, which means that everybody is trying to build on the longest chain to make sure their work is included. Bitcoin uses a proof-of-work consensus mechanism, which is very energy intensive. Alternative consensus mechanisms such as proof of stake have been implemented to solve that problem on other blockchains such as Cardano.
What is blockchain good at? It is good at incentivizing people, which means giving them something of value, often of monetary value. By providing small incentives for actions done on the blockchain, you can encourage people to work together with you. A lot of cryptocurrencies used on the blockchain are deflationary, which means that over time they appreciate in value. That can be a powerful long-term incentive for some investors of sweat equity.
It is good at determining the order of things. On a blockchain, all transactions in a block have to be ordered. It is easy to determine if someone, for example, provided a piece of information before somebody else, and to make sure the first person to do so gets a reward.
It is good at anonymity or pseudonymity. Most blockchains are pseudonymous, which means that you do not know who owns a particular wallet, but normally you can track the transactions from one address to another. You can also preserve anonymity by, for example, generating a new address for each transaction.
It is good at artificial scarcity.
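The ordering property described above, where the first person to provide a piece of information gets the reward, can be sketched in Python as follows. This is only an illustration under stated assumptions: the Submission fields, the wallet names, and the data identifiers are hypothetical, and a real system would read the block height and transaction position from the chain itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Submission:
    block_height: int  # block in which the transaction was included
    tx_index: int      # position of the transaction inside that block
    sender: str        # pseudonymous wallet address
    data_id: str       # identifier of the piece of data being reported


def first_submitters(submissions):
    """Return, for each data_id, the sender whose submission is earliest on chain."""
    winners = {}
    # Blocks are ordered, and transactions are ordered inside each block,
    # so (block_height, tx_index) gives a total order over submissions.
    for sub in sorted(submissions, key=lambda s: (s.block_height, s.tx_index)):
        winners.setdefault(sub.data_id, sub.sender)
    return winners


# Hypothetical submissions, deliberately listed out of order.
submissions = [
    Submission(102, 0, "wallet_b", "price:milk:2022-03-01"),
    Submission(101, 7, "wallet_a", "price:milk:2022-03-01"),
    Submission(101, 3, "wallet_c", "price:bread:2022-03-01"),
]
winners = first_submitters(submissions)
```

Here wallet_a wins the reward for the milk price because its transaction sits in an earlier block than wallet_b's, regardless of the order in which the submissions were listed.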
Artificial scarcity is a way to make something scarce by design. For example, Bitcoin has a limited supply, and therefore, as demand goes up, the price of Bitcoin goes up as well. It is easy to make a token that is artificially scarce on a blockchain, and you can also create non-fungible tokens which represent unique things like pieces of art, digital or otherwise. It is also good at decentralization, as it can easily scale. If the incentives are set up right, the number of nodes can increase as the demand for the network increases.
What is blockchain lacking? It is difficult to store large amounts of data on the blockchain, because a copy of the data needs to be available on every node that is participating in the network. There are attempts to solve this, and some blockchains, such as Filecoin, are storing large amounts of data. Also, enabling queries of the data is not yet a fully solved problem, so a lot of blockchains integrate with centralized systems for the storage of data. You also cannot make data changes easily, as each change is preserved and as such is costly.
The trilemma of blockchain challenges has been described by Vitalik Buterin, the founder of Ethereum. It means that of three properties (decentralized, scalable, and secure) you can only have two. Multiple blockchains have made attempts to solve it, or at least to find a workable middle ground. For example, Polkadot has a small number of validator nodes that are elected, and as such it can be very fast but is not very decentralized.
Application for blockchain in crowdsourcing data: how can we use blockchain to obtain the data? First, we can incentivize people by providing them with a token that ideally would appreciate in value over time, so they would be interested in investing their time and effort to obtain more of this token. Also, the blockchain ensures the security of this reward and enables trading it in for money if necessary.
We can determine who owns what piece of work. We can use blockchain to determine who has performed what work or provided what data first. This can enable people to easily obtain proof, and therefore incentivize people who are pioneering a particular type of task.
We can enable on-chain governance. Blockchain is a good means of governance, as you can prove who owns a particular number of votes. The only drawback is that blockchain voting is currently public only, so to implement private voting you would need new technologies such as zero-knowledge proofs. Decentralized autonomous organizations have been around for a while now and are using on-chain governance to disrupt the work of executives.
Choosing a blockchain. Ethereum was the first blockchain to implement smart contract functionality. Its founder, Vitalik, actually tried to implement smart contracts on Bitcoin first, but his efforts were rejected, and as such he created his own blockchain. Ethereum is the oldest one and has the best community, documentation, and infrastructure for learning more about programming on a blockchain. It uses the Solidity language, which has become a standard for most other blockchains. The drawback of this blockchain is that it is very expensive to make transactions on it due to scalability challenges. There are hopes that the next version of Ethereum is going to address these issues, but this has been promised for a while. SushiSwap is the most popular decentralized exchange on Ethereum. Ethereum is also the most popular platform for non-fungible tokens.
Binance Smart Chain is a fork of Ethereum and has much lower fees. However, it is much more centralized, as it relies on nodes that are run by Binance. A lot of projects use Binance Smart Chain due to its low fees and trade on decentralized exchanges such as PancakeSwap.
Solana is one of the fastest blockchains and a very popular one as well. It has a very high transactions-per-second rate compared to others.
However, because of its speed and low cost, Solana has been attacked and taken down several times by denial-of-service attacks.
Cardano is my favorite blockchain because it takes a different approach towards development. They aim to write academic papers first and only develop code after the papers are reviewed. They also employ mathematical proofs to make sure that the quality of their code is good compared to other blockchains like Ethereum and Solana. As a result of this approach, it is a very safe blockchain with no known exploits to date. However, they have recently been dealing with some congestion problems after the first popular decentralized exchange on Cardano, called SundaeSwap, was launched. This has been somewhat improved by tweaking the parameters and hopefully will be improved later with further scalability advancements such as sharding.
Filecoin is a good blockchain for storing a lot of data. It is somewhat similar to S3 buckets on AWS.
Chainlink is a blockchain for oracles. It integrates with other blockchains to enable their smart contracts to obtain data from the offline or centralized world. For example, you can make a database query or request information from a web API to get the price of Bitcoin.
This is only a review of the top blockchains to work on. New blockchains are always coming online, and they implement new functionality as well, so there is plenty to watch out for.
Where can you learn more? Naturally, you can start with the documentation of specific blockchains such as Ethereum. Once you advance a bit further, Emurgo provides courses on a variety of technical and non-technical topics in the blockchain world, and you can obtain certification if you complete one of the courses. Buildspace is also a good place to learn to code on the blockchain. While doing it, you will also earn non-fungible tokens for each course you complete on time.
Crowdflation is an open source project building a decentralized autonomous organization on the blockchain to crowdsource data from people and build an alternative inflation index. Participants are rewarded with a cryptocurrency token. Thank you all for listening and for making this talk happen.

Ignas Galvelis

Software Developer @ Sky
