DeFi Builders Are Neglecting Data-Related Risks And It Could Lead to Massive Failures: Sergey Nazarov
Hello Defiers! This week’s interview is with Sergey Nazarov, the cofounder of oracle provider Chainlink. The way blockchain applications get their data has proven to be crucial, as failures in those systems have been at the core of many of the latest attacks in decentralized finance. As Chainlink’s large and active group of supporters and token holders, known as “Marines,” will often remind us, this platform is meant to help avoid those risks. But how exactly can Chainlink help?
In this conversation, Nazarov makes the case for why Chainlink can provide secure data feeds, and warns about the risks developers take when they underestimate the complexities of building data aggregators and oracle systems and try to tackle both. His advice: trust the oracle mechanism to Chainlink, while Chainlink leaves data quality to experienced data companies.
Nazarov talks about the trends he’s seeing across blockchain ecosystems; he sees most of the action happening around DeFi, insurance and gaming. He’s excited to see a growing number of more centralized companies move to decentralize part of their operations. In the coming months, he’s most looking forward to continuing to support DeFi, increasing the number of data inputs Chainlink provides, and adding staking to better align incentives for node operators.
Nazarov also talked about his vision for a future where finance will be increasingly based on blockchain technology.
You’re a free signup, which means you get only part of the transcript below. Stay tuned for the podcast on this interview, which I’ll be sending soon!
The open economy is taking over the old one. Subscribe to keep up with this revolution. Click here to pay with DAI (for 70 Dai/yr vs $100/yr).
🙌 Together with Eidoo, a cryptocurrency-powered debit card and platform for easy access to decentralized finance.
Camila Russo: If you're interested in DeFi, you've heard about Chainlink. It's an oracle provider. But I want to talk with you about what exactly the importance of oracles is for DeFi specifically, and how Chainlink addresses that problem.
Sergey Nazarov: Excited to discuss those things. I think the way to look at oracles is that there are contracts that need to interact with external data. They need to know about something that happened, for example, a price change; or there’s an insurance contract that needs to know whether goods were delivered and whether, while they were in transit, they remained frozen; or they need to know whether there was rainfall to know if they should pay out an insurance policy to a farmer.
All of these more advanced contracts require the ability to know these things. Now, what people may not know, because of the semantics of smart contracts, is that despite being called smart contracts, they really should be called Tamper-Proof Digital Agreements, or something like that, because what they really do is create a record and a space for conditional logic that can be written around events. But the systems in which smart contracts actually run do not have the capability to actually know about these events.
This is the sense in which something is called an oracle. It can know something that the system to which it gives that data cannot know. It is a source of truth about events.
The Oracle Problem
I think there's a nuance here around why smart contracts can't access these events. And it's basically because of the way that they're secured: by these independent entities called miners, which package together transactions and secure those transactions through what's called consensus, which is basically an agreement around the transactions. And if a key input into those transactions is then controllable by one of those miners, you lose a lot of the security guarantees over those transactions.
So, I think the first problem to understand is: what is the oracle problem? The oracle problem is that smart contracts, or logic that's executing on these blockchain environments, cannot access external data, because of the security requirements a blockchain has to meet to provide its value. There's kind of a big, highly secure wall, which is what gives the blockchain its security, but at the same time precludes the logic inside of that wall from accessing any information.
“The oracle problem is that smart contracts, or logic that's executing on these blockchain environments, cannot access external data, because of the security requirements a blockchain has to meet to provide its value.”
Image source: Web3 / YouTube
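To make that wall concrete, here is a minimal sketch, in Python with invented names, of the request-and-callback pattern most oracle designs use. It is illustrative only, not Chainlink's actual interfaces: the on-chain contract can only emit a request and accept a transaction in response, while the off-chain node is the part that can actually reach an external API.

```python
# Illustrative sketch of the request/callback pattern most oracle designs use.
# Names and interfaces here are hypothetical, not Chainlink's actual contracts.

class OnChainContract:
    """Deterministic logic: it can only react to transactions sent to it."""
    def __init__(self):
        self.price = None

    def request_price(self, oracle):
        # The contract cannot call an HTTP API itself; it can only emit a
        # request that an off-chain oracle node watches for.
        oracle.enqueue_request(callback=self.fulfill)

    def fulfill(self, reported_price):
        # The oracle answers by sending a normal transaction back on-chain.
        self.price = reported_price


class OffChainOracleNode:
    """Runs outside the blockchain, so it *can* reach external systems."""
    def __init__(self, fetch_fn):
        self.fetch_fn = fetch_fn
        self.pending = []

    def enqueue_request(self, callback):
        self.pending.append(callback)

    def process(self):
        for callback in self.pending:
            callback(self.fetch_fn())  # fetch from an exchange API, then write back
        self.pending.clear()


contract = OnChainContract()
node = OffChainOracleNode(fetch_fn=lambda: 2450.17)  # pretend API call
contract.request_price(node)
node.process()
print(contract.price)  # 2450.17
```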
CR: So, what I've seen that oracle systems are trying to do to solve this issue is have a system that's also decentralized; that is, one that gets its data in a decentralized way. And in some ways, it starts to look a little bit like a blockchain, even if it's not. Like, I know Chainlink uses nodes, which is a concept that we usually associate with blockchains. So, can you explain more about how Chainlink works and how different oracle systems are trying to solve this issue of providing really secure and decentralized data?
SN: I would say that oracles have certain concepts which they borrow from blockchains, but they are not blockchains. And I think some of the problems people have had are where they've tried to take certain blockchain concepts and apply them entirely to oracles and just ignore the difference.
So, the difference is, for example, that blockchains have a purposefully limited set of transaction types or computational operations they can do and that's all that they can do. And they have certain size limits in their blocks and they have certain limits on what the virtual machine can do.
Now, what oracles focus on is an entirely different problem. They approach the problem of taking non-deterministic, unvalidated, insecure, in some cases untrustworthy data from other places and trying to put it through a system that then raises its reliability. The first fundamental difference here is that you're dealing with other systems. You're not creating an encapsulated system that is deterministic and exists and lives in its own universe. You're actually taking non-deterministic systems that have these security and reliability issues and, by combining a few of them or using any number of other cryptographic means to validate where data came from, you're trying to make sure that that data, in many cases collectively, now meets those high standards.
So, I'll give you an example. Some people build oracle systems with something called Dynamic Membership. Dynamic Membership allows random people to just show up and make transactions. We don't really have that approach. The approach that we have is you have node operators. Those node operators can prove their security and their reliability. Those node operators then form something called a service agreement with a user contract. So, on chain, you have a transaction that will commit the node operator to delivering data at a certain level of quality, frequency and deviation from other sources.
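As a rough illustration of what such a service agreement might capture, here is a hypothetical sketch; the field names are invented for this example and are not Chainlink's actual on-chain schema.

```python
# Hypothetical sketch of what an on-chain "service agreement" between a user
# contract and a set of node operators might capture. Field names are
# illustrative, not Chainlink's actual schema.
from dataclasses import dataclass

@dataclass(frozen=True)
class ServiceAgreement:
    node_operators: tuple[str, ...]   # addresses of the committed operators
    data_source: str                  # e.g. "ETH/USD"
    update_interval_seconds: int      # how often fresh data must be delivered
    max_deviation_pct: float          # answer must stay within this band of peers
    payment_per_update_link: float    # fee paid to each operator per delivery

agreement = ServiceAgreement(
    node_operators=("0xNodeA", "0xNodeB", "0xNodeC"),
    data_source="ETH/USD",
    update_interval_seconds=60,
    max_deviation_pct=0.5,
    payment_per_update_link=0.1,
)
```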
Node Freedom
CR: And how do they prove their reliability, the node operators?
SN: On our system, people are meant to be able to intelligently compose an oracle network on the basis of a lot of information about node operators and data providers. Because once again, the situation here is not “I am making a subset of computational capabilities and that's all that I'm doing,” or “I'm making a subset of transaction types.” It's really about people being able to choose a specific configuration of oracles, of node operators, and a specific configuration of data sources. And then scale those configurations as the value which they control scales.
“It's really about people being able to choose a specific configuration of oracles, of node operators, and a specific configuration of data sources. And then scale those configurations as the value which they control scales.”
So, this means if I have a contract on a very secure system, say Ethereum with thousands and thousands of node operators, then great, that's a secure system. But let's say the contract only holds $10. At that point, I don't necessarily need 1,000 Chainlink oracles. What I do need to do is match my need for security to my budget. And that's sometimes determined by the value secured. Sometimes it's determined by the user fees people pay to use the application.
And then I want to scale that security on the data provider side, on both the quality of the data and the node operators that transport the data. And in our system, the service agreements and a lot of other insight these nodes generate are easy to analyze and open. Somebody in our ecosystem launched something called Reputation.link, which is an entirely separate team from ours that built a framework to analyze all this data and help people make informed decisions about which node operators they want to use.
Another kind of platform or category of this is something called Market.link, which is also run by a separate team, where you can see the certifications that a team has, whether it's been security reviewed, whether it's been identity reviewed. And generally speaking, right now, in our system we focus predominantly on the quality of node operators, and we're slowly expanding the number of node operators. I mean, we essentially have hundreds of node operators in different stages of being live or in testing, but our system really focuses on the highest quality node operators making up these oracle networks. So, there might be one oracle network of seven nodes for one contract and there might be another oracle network of 50 nodes for another contract.
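A made-up sketch of the kind of informed composition he describes might look like the following: candidate operators are filtered by their track record, and the network size scales with the value the contract secures. The thresholds and reputation fields are invented for illustration, not figures from Reputation.link or Market.link.

```python
# A made-up sketch of "informed composition": filter candidate operators by
# their track record, then scale how many you use with the value at stake.
# Thresholds and fields are invented for illustration.
def compose_oracle_network(candidates, value_secured_usd):
    reliable = [c for c in candidates
                if c["uptime"] >= 0.99 and c["jobs_completed"] >= 1_000]
    reliable.sort(key=lambda c: c["uptime"], reverse=True)
    # e.g. 7 nodes for a small contract, more as the value at stake grows
    size = 7 if value_secured_usd < 1_000_000 else 31
    return reliable[:size]

candidates = [{"name": f"node{i}", "uptime": 0.999, "jobs_completed": 5_000}
              for i in range(40)]
print(len(compose_oracle_network(candidates, value_secured_usd=50_000)))      # 7
print(len(compose_oracle_network(candidates, value_secured_usd=20_000_000)))  # 31
```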
CR: Okay. And so, does Chainlink approve node operators as having high enough quality to join the network, or who approves them?
SN: It depends on the different networks. We generally don't approve quality. We have a review that we do on node operators so that they meet a certain level of quality. There are networks that people can compose those node operators into.
CR: But you have certain requirements that node operators need to meet in order to start providing data?
SN: Yeah, and that, once again, depends on the network. So, the point here is we are not saying we will have 50,000 anonymous people who could really be one person or two people. We're not saying that. We are saying we have hundreds of extremely reliable entities with extremely high-quality DevOps teams of 10, 15, 20 people that have already successfully secured hundreds of millions, in some cases billions, of dollars on an ongoing daily basis. And these teams are premium node operators that we can then compose together with other node operators, and we can also select the right data providers. And this doesn't mean that you don't get to thousands of nodes and it doesn't mean that you don't arrive at thousands of anonymous nodes.
It just means that if you want a certain level of security, you should be able to make an informed decision about the node operators you need to select to reach this level of security. And if I want anonymity, then those are the node operators over there that I want to select. And you can combine them or not combine them. But the point is, you should have an informed way of doing that, which is essentially what we've built. We've built an informed way for users to select high quality oracles and, very soon, to also make informed decisions about data quality and data providers.
CR: And right now, are all of your nodes known? Or are some of them anonymous?
SN: No, there are people that run nodes and we don't know who they are. It depends on what you mean by nodes. In most of the higher quality networks that provide premium data to real applications, most of those node operators are known. And they have very large, well-known teams. There are many anonymous node operators that people can compose into an oracle network of anonymous node operators. They simply take a different set of risks.
CR: I think it's an interesting question, because there's the risk, when you're dealing with a few known nodes, that they can be corrupted, right? They can be susceptible to manipulation and so forth. I think that's the idea of using a wide network of anonymous nodes, so it's harder to corrupt it.
SN: Right. Yeah, that's security through obscurity. And the reality is that we don't preclude thousands of nodes. The point is that if somebody wants to compose a network of 1,000 anonymous nodes, they can. What we seek to do is give people choice and an ability to make an informed security assessment, and to say, I want 100 nodes of this quality, I want them to be running in three different data centers, so I want a third of the nodes in Amazon, a third of them in Azure and a third of them in GCP. And I want all of those different nodes to be providing different sets of guarantees. So, one set of nodes uses trusted execution environments, one set of nodes uses zero-knowledge proofs and one set of nodes simply has an impeccable historic reputation.
So, I'm not saying that you don't want many nodes, you do want many nodes. You just want that to be at the discretion of users, and you want it to scale with the value secured, and you want people to make informed decisions about who their node operators should be. And if people decide that they want thousands of anonymous nodes that don't have a performance history and don't provide any guarantees, then they can do that, if they value anonymity at that high price and cost of securing something.
What they'll have to do is they'll have to say, okay, I can't have 50 pseudo-anonymous node operators, that's not going to work. I need over 1,000. And I think there will be use cases that have a portion of their oracle network that is pseudo-anonymous or anonymous and then there will be oracle networks that are completely that way.
Right now, the right security dynamic seems to be high quality nodes, high quality data providers. That's what's really, I think, necessary to provide the highest level of data quality to an application. Because if you have lower quality node operators and they don't deliver data, you could run into problems. And this, once again, goes to the difference of the problem we're solving here. We're really solving a problem where you need highly reliable, always-up middleware that provides you these guarantees.
Underestimating Data Quality
CR: And talking about these problems, we have seen a few problems recently in DeFi and some of these have been triggered by faulty oracles or pricing mechanisms. One of the major crashes in DeFi happened when the MakerDAO liquidation system broke down, and part of it was because prices weren't updating quickly enough while Ether was crashing. Then earlier in the year, we saw the bZx attacks with flash loans, and I think the issue was that the attacker manipulated the Uniswap and Kyber prices that were used for pricing. So, we've seen less than ideal pricing and oracle systems trigger attacks in DeFi and cause problems for DeFi applications. Could a different oracle solution have prevented these issues? Or are these problems just kind of innate in these DeFi apps?
SN: No, no, they're not innate. They are consequences of architecting an application a certain way. The one thing that I can say is that during that period that you've described, none of our users had any losses from using our oracles, so that's something we can definitively say.
Now, I think the other nuance here is that this is a multilayered problem that seems like it's simple. It seems simple because people basically take their experience from building web applications and they say, I'm going to just do that here. And in the web application world, they have a lot of frameworks and a lot of plumbing and a lot of infrastructure that's already built to allow them to connect all kinds of services. There can be a service like Twilio to send an SMS. There can be a service like Google Maps to get the location of a user. And then there can be a service like Stripe to make a payment.
That API infrastructure for data or payments or anything else is what oracles are built around and give access to. But I think what ends up happening in some of the cases you've described, and many others that I've seen, some of which are not exactly public and don't need to be, because they've luckily educated the people that have been building, is that people ignore, I think to their peril, data quality, and they ignore the quality of their node operators and the assurances that their node operators give to users.
“People ignore, I think to their peril, data quality, and they ignore the quality of their node operators”
There are two approaches that I see people taking to solve this problem. One approach is: I'm going to take blockchains and I'm going to replicate everything I do in blockchain-land onto oracles. That usually misses the point of the key problem you're solving. The key problem you're solving is highly reliable, highly secure, highly available access to external systems, and you don't even know if those systems are secure, so you then also need to deal with that problem. And so that blockchain approach of, “it's just a blockchain but different,” is one of the first issues.
The second issue is that —and this is something we take very seriously, and I've done a presentation on it recently, and I advise anybody building an oracle mechanism on their own internally, or whatever system they use, to look at it very seriously— I've seen one or two oracle mechanisms where people do not take into account data quality. So, they basically ignore the fact that in traditional finance there is a large group of data companies, like Bloomberg, Reuters and others, that are hugely successful companies in hugely competitive markets, because they smooth out risks related to data.
And there are people that basically say, I'm going to completely ignore all the risks from data; I'm going to both make a data aggregation methodology and build an oracle mechanism, so I'm going to do both. I'm going to create a data company and I'm going to create a piece of software that's meant to provide security around the transportation of data. And then they begin to make all the mistakes that people who have never built a data company make, which is why we don't do that.
We do not actually generate the data. We go to data providers, such as the ones that sell crypto data to Bloomberg and Reuters to power their systems and all the systems they sell into, and we take that high-quality data. We leave that problem with truly experienced data aggregation teams, with multiple decades of experience, for crypto prices and other categories of data, and we focus on the proper transport of that data with the maximum amount of guarantees: that it came from where it says it came from, that it's going to keep coming under a predetermined set of conditions, and, in certain cases soon, the oracle will be able to guarantee, with a deposit or staking, that it will arrive.
And I think these issues have flown under the radar a little bit. Like the fact that we're talking about these issues now is due to some kind of failure, which I'm not happy about. I think that people who build these systems should seriously consider what set of problems they want to solve, they should understand the full depth of the problem they're solving and they should seriously consider if they want to solve a data quality problem, which is its own problem and then the data transport oracle problem.
We, for example, don't solve the data quality problem. We go to people that solve it and then we combine the efforts of those people in a single system that seeks to minimize the risk of that problem. So, we only work with the highest quality data providers.
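One way to picture that separation of concerns is the sketch below, with made-up provider names and prices: professional data firms handle aggregation and data quality, while the oracle layer only transports the result, taking a median across independent nodes so no single node controls the on-chain answer.

```python
# Illustrative sketch of the separation described above, with made-up names:
# professional data providers produce already-aggregated prices, and the
# oracle layer only transports them, taking a median across independent nodes
# so that no single node (or single source) controls the on-chain answer.
import statistics

def provider_prices():
    # Stand-ins for feeds bought from experienced data-aggregation firms.
    return {"provider_a": 2451.2, "provider_b": 2450.8, "provider_c": 2451.0}

def node_report(prices):
    # Each node independently reads the providers and reports a value.
    return statistics.median(prices.values())

def on_chain_answer(node_reports):
    # The oracle contract takes the median of node reports, so a minority of
    # faulty or malicious nodes cannot move the final value.
    return statistics.median(node_reports)

reports = [node_report(provider_prices()) for _ in range(7)]
print(on_chain_answer(reports))  # 2451.0
```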
Confusing Practices
I'll tell you some of the things I've seen which absolutely, you know, confused me. I've seen people say that I'm going to go to two exchanges and those exchanges are going to define the market prices for this asset. And then I've seen another, third exchange show up and get all the volume. And therefore, the two exchanges they have integrated into their oracle system are now representing a very small percentage of the volume, which is very easy to manipulate by people without any technical experience; just a trader can go into those environments and manipulate the prices on those exchanges. And the reason that isn't seen as a clear issue is because the people who put together that architecture don't run a data company. They don't have pager-duty alerts when volume shifts to another exchange, which creates this large kind of existential risk. And the folks that we work with have come to appreciate this risk and realize that you need a secure data transport layer and you need secure data providers.
“I'll tell you some of the things I've seen which absolutely, you know, confused me. I've seen people say that I'm going to go to two exchanges and those exchanges are going to define the market prices for this asset.”
Another thing I've seen that very much concerns me is: we're going to use a single exchange to define the price. How can you predict what the volume of that exchange will be, especially for certain asset classes or tokens that are thinly traded?
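The market-coverage check he implies a real data company would run could look roughly like this; the volumes and the 60% threshold are invented for illustration.

```python
# A minimal sketch of a "market coverage" check: how much of an asset's total
# volume do the venues feeding your price actually represent?
# Numbers and threshold are made up for illustration.
def coverage_alert(volume_by_exchange, integrated_exchanges, min_share=0.6):
    total = sum(volume_by_exchange.values())
    covered = sum(volume_by_exchange[e] for e in integrated_exchanges)
    share = covered / total
    if share < min_share:
        print(f"ALERT: price sources cover only {share:.0%} of volume")
    return share

volumes = {"exchange_a": 1_000_000, "exchange_b": 800_000, "new_exchange_c": 6_000_000}
coverage_alert(volumes, integrated_exchanges=["exchange_a", "exchange_b"])
# -> ALERT: price sources cover only 23% of volume
```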
I think the only reason these very dangerous patterns are not as discussed is because the losses have not been Mt. Gox level. And I think that it's very possible that somebody who nonchalantly just says, I'm going to use one exchange and it'll be fine for some category of tokens, finds themselves in a bad situation, especially if crypto values rise and especially if the amount secured in one or another DeFi application rises. They could find themselves in a situation where some home-baked oracle, where they're taking on three or four very difficult problems that require double-digit teams of experienced people and that they can hardly solve, results in the type of loss that then colors the whole DeFi space in a bad light. And this is what I think people should seriously avoid.
“I think the only reason these very dangerous patterns are not as discussed is because the losses have not been Mt. Gox level.”
So, they can use whatever oracle mechanism they feel comfortable with. But I think what people should understand is it's not as simple as, I'm going to just connect to an API and I'll be okay. That's really what we do, is we allow people to quickly build a DeFi application without making these design decisions of, like, I'm just going to have one exchange which is going to completely control my contract. And you can ask them, you can say, hey, what happens if that exchange suddenly becomes thinly traded? Do you have pager-duty alerts to let you know that this oracle system you set up is now exposed, and that it'll only cost $100,000 to manipulate the price input into your DeFi contract, which is irreversible, and from which people could take money that you'll never get back?
Ticking Time Bomb
CR: And it’s easier than ever to do with flash loans.
SN: Exactly. So, the environment is making these things easier, and people are looking at this problem as if it's something they can solve with web development experience. This is a data quality problem, which is complicated, and it is a security problem, which is complicated.
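As a hypothetical illustration of why a spot price read from a single constant-product pool is fragile against a flash-loaned trade, consider the toy math below; the pool sizes and swap amount are invented.

```python
# Hypothetical illustration of why reading a spot price from a single
# constant-product pool is fragile: a large (possibly flash-loaned) swap can
# move the quoted price for the duration of one transaction. Numbers invented.
def spot_price(reserve_token, reserve_usd):
    return reserve_usd / reserve_token

def swap_usd_for_token(reserve_token, reserve_usd, usd_in):
    # x * y = k constant-product math, fees ignored for simplicity
    k = reserve_token * reserve_usd
    new_reserve_usd = reserve_usd + usd_in
    new_reserve_token = k / new_reserve_usd
    return new_reserve_token, new_reserve_usd

r_token, r_usd = 10_000.0, 1_000_000.0           # pool: 10k tokens vs $1M
print(spot_price(r_token, r_usd))                # $100 before the swap
r_token, r_usd = swap_usd_for_token(r_token, r_usd, usd_in=500_000)
print(spot_price(r_token, r_usd))                # ~$225 right after the swap
```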
CR: Right. So, I guess the root of the issue that we're seeing in the latest attacks is people underestimating the difficulty of creating good data aggregators and thinking they can…
SN: No, it's that they misunderstand that in the financial world, there are data companies that are massively successful because they manage a very large portion of the risk related to this. And ignoring those facts is very, very dangerous in my opinion. And it just hasn't blown up, because either the numbers aren't high enough, or somebody is waiting to exploit a system like that, or maybe there are other targets that are higher on people's target lists. But I really wouldn't underestimate either data quality or the quality of an oracle mechanism and taking on both of those problems simultaneously, building a data company for various types of data and building a highly secure oracle mechanism.
“I really wouldn't underestimate either data quality or the quality of an oracle mechanism and taking on both of those problems simultaneously, building a data company for various types of data and building a highly secure oracle mechanism.”
Like I said, we've built a mechanism where we focused on security and provability, and we were very lucky to work with top data providers. And we're working with as many of them as we can, and more and more of them, to make sure that the high quality data that triggers these contracts is coming from a place where people have experience and actual secured systems to minimize risks like market coverage risk, or those manipulation attacks, or even a whole bunch of other attacks which people haven't experienced yet and which, you know, we explain on deeper integration calls. That's why I think people end up going with us: because we don't hand-wave away these problems.
We go to them, we tell them, look, you have a serious attack vector here, you have attack surface area; you can secure it or we can secure it, but somebody needs to secure it. Because, you know, imagine the price of crypto goes up and to the right, and then imagine the amount of value locked in DeFi goes up and to the right. The only reason some of these oracle issues haven't been front-page news is because the numbers have been sufficiently low.
CR: So, say that the numbers do increase in value, prices, number of users. Looking at how DeFi is secured right now, do you think it's safe enough, or could it blow up tomorrow?
SN: I tend not to comment publicly on other people's applications; I just don't tend to do those things. I can tell you some general principles and I'm always glad to describe to people the security dynamics they should be aware of. I think people should very seriously consider whether this is a simple, easy problem or whether it's a problem with multiple depths of complexity that they discover as they get into it, and whether they want to put the fate of their entire system in the hands of something that doesn't even know there are certain risks out there, based on how it's architected. That's a serious consideration I would recommend people make.
[ … ]
Paid subscribers have access to the full transcript, including sections on:
The risks lurking in DeFi
“The issue here is that there are people basically saying, I'm going to make a data company, it's not hard. I'm going to make an oracle mechanism, it's not hard.”
Trends in the blockchain space
“Most environments want a stablecoin. Most environments want a lending capability. Many environments want a derivatives (…)”
Provable randomness and other inputs
“As we get more inputs, like randomness, like price data, like weather data, we are likely to see more and more very creative things get built.”
Being blockchain agnostic
“The more blockchains we do integrate, the more attractive it becomes for data providers to provide data through Chainlink, because they suddenly have a larger universe.”
A third path to decentralization
“There has been usually only two paths. There has been the centralized path and the decentralized path and it's been all in on one or the other (…) I think this third path is very attractive, because it doesn't require a huge investment. It allows people to use their existing system and it allows them to gradually provide decentralization guarantees.”
The next year for Chainlink
“The chief focus is to make sure that decentralized financial products that are built in these various blockchain environments, get high quality data, and eventually get data with increasingly large guarantees (…) [those guarantees include] staking.”
Subscribe now so you don’t miss any of The Defiant content. Subscribers reading this post: Head to posts marked with the little lock to see the full content.
The Defiant is a daily newsletter focusing on decentralized finance, a new financial system that’s being built on top of open blockchains. The space is evolving at breakneck speed and revolutionizing tech and money. Sign up to learn more and keep up on the latest, most interesting developments. Subscribers get full access at $10/month or $100/year, while free signups get only part of the content.
Click here to pay with DAI. There’s a limited number of OG Memberships at 70 Dai per annual subscription ($100/yr normal price).
About the founder: I’m Camila Russo, a financial journalist writing a book on Ethereum with Harper Collins. (Pre-order The Infinite Machine here). I was previously at Bloomberg News in New York, Madrid and Buenos Aires covering markets. I’ve extensively covered crypto and finance, and now I’m diving into DeFi, the intersection of the two.