Cover image: How OneSoil uses data science to transform farming (OneSoil Blog)

How OneSoil uses big data for farming

Reading time: 10 min

We don't talk often enough about how our technology works, and it's time to fix that. The OneSoil apps are built on big data analysis and machine learning. The more data we have, the more accurate our calculations and forecasts become, and that directly affects the features farmers see in the apps. So we're giving you a peek behind the curtain at OneSoil's modern-day magic: data science.

But first, how can we tell that this magic works? By looking at the numbers.

We delineated field boundaries in 57 countries using satellite images so that farmers could start using our app right away: they just need to select their field on the map. We often hear that our apps have a great design and work fast. It must be true: in just a year and a half, 120,000 users had signed up with OneSoil (as of February 2020).

Here's how the number of fields has grown since we introduced the app.

Our users had entered 30 million hectares of land in the app as of February 2020. According to the FAO's latest estimates, the world has 1.567 billion hectares of arable land. That means users have entered 1.9% of the Earth's arable land right here on OneSoil!
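The 1.9% figure is a simple ratio of the two areas. As a quick sanity check in Python, using the values from the paragraph above:

```python
arable_land_ha = 1.567e9  # FAO estimate of world arable land (about 1.567 billion ha)
onesoil_ha = 30e6         # hectares entered by OneSoil users as of February 2020

share_percent = onesoil_ha / arable_land_ha * 100
print(f"{share_percent:.1f}%")  # → 1.9%
```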

One country has a third of all its agricultural land on the OneSoil platform. Hi, Ukraine!

How do we do it? We build our apps with big data and data science, and our eight-person R&D team leads the way.

Data 'harvesting'

We need a lot – A LOT – of data to train our machine learning algorithms. We collect data on real fields. Then, using special mathematical operations, we increase the amount of data we have thousands-fold. We then teach the neural network to find patterns and trends.
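The post doesn't say which "special mathematical operations" OneSoil uses. One common family in image-based machine learning is geometric augmentation: rotating and mirroring an image tile together with its label mask, so each labeled example yields several new ones. The sketch below is an illustration under that assumption only (tiles as nested lists of pixel values; function names and toy data are hypothetical). On its own it gives an 8-fold increase; bigger multipliers would need more operations stacked on top.

```python
from typing import Iterator, List, Tuple

Grid = List[List[int]]

def rotate90(grid: Grid) -> Grid:
    """Rotate a 2-D grid 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]

def mirror(grid: Grid) -> Grid:
    """Flip a 2-D grid left to right."""
    return [row[::-1] for row in grid]

def augment_pair(image: Grid, mask: Grid) -> Iterator[Tuple[Grid, Grid]]:
    """Yield the 8 rotations/reflections of an (image, mask) pair.

    The same transform is applied to both grids, so the field labels
    stay aligned with the pixels they describe.
    """
    for _ in range(4):
        yield image, mask
        yield mirror(image), mirror(mask)
        image, mask = rotate90(image), rotate90(mask)

tile = [[1, 2], [3, 4]]    # toy "image" with distinct pixel values
labels = [[1, 0], [1, 0]]  # toy field mask: the left column is field
pairs = list(augment_pair(tile, labels))
print(len(pairs))  # → 8
```

Transforming the mask in lockstep with the image is the important design choice: a rotated image paired with an unrotated label would teach the network the wrong boundaries.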

A neural network is a computer program loosely inspired by the way the human brain works. Data goes into the network, which processes it and produces a result. During training, the network compares that result with the correct answer and adjusts itself to reduce the error, so each run builds on the previous ones and the result improves.
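The "improves on each run" part is usually done with gradient descent: the model compares its output with the correct answer and nudges its parameters to shrink the error. Below is a toy illustration with a single artificial neuron (logistic regression) on a made-up one-feature dataset. It is nowhere near a real field-detection network, and all names and numbers are hypothetical.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def log_loss(w: float, b: float, data) -> float:
    """Average cross-entropy error of the neuron on (x, label) pairs."""
    total = 0.0
    for x, y in data:
        p = sigmoid(w * x + b)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(data)

def train(data, lr: float = 0.5, epochs: int = 200):
    """Gradient descent: measure the error, then adjust w and b to reduce it."""
    w, b = 0.0, 0.0
    history = []
    for _ in range(epochs):
        history.append(log_loss(w, b, data))
        grad_w = grad_b = 0.0
        for x, y in data:
            err = sigmoid(w * x + b) - y  # prediction minus correct answer
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / len(data)
        b -= lr * grad_b / len(data)
    return w, b, history

# Toy data: label 1 = "field", 0 = "not a field" (purely illustrative).
data = [(-1.0, 0), (-0.5, 0), (-0.2, 0), (0.2, 1), (0.5, 1), (1.0, 1)]
w, b, history = train(data)
# The starting loss is log(2), about 0.693; it shrinks as training runs.
print(f"loss before: {history[0]:.3f}, after: {history[-1]:.3f}")
```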

Let's look at one example: delineating field boundaries from satellite images. At first, we only had data from a few farms in Belarus and the Baltic states, and our machine learning algorithms used it to learn how to predict field boundaries. Here's how it worked. For each field whose owners had told us where the real boundaries were, we measured how well the predicted boundary matched the real one: the area where the two shapes overlap, divided by the total area they cover together. If the algorithm outlined areas outside the real boundaries, its score dropped. That penalty is how it learned. This metric is called 'intersection over union'. It takes values from 0 to 1, where 1 is a 100% match. For us, it varied from region to region, but on average it hovered between 0.85 and 0.88.
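Intersection over union divides the overlap of the predicted and real field by the total area the two cover together. A minimal sketch, assuming both boundaries have already been rasterized into same-sized grids of 0/1 pixels (the function name and toy masks are illustrative, not OneSoil's code):

```python
from typing import List

Mask = List[List[int]]

def iou(pred: Mask, truth: Mask) -> float:
    """Intersection over union of two same-shaped binary masks."""
    if len(pred) != len(truth) or any(len(p) != len(t) for p, t in zip(pred, truth)):
        raise ValueError("masks must have the same shape")
    intersection = union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                intersection += 1  # pixel inside both shapes
            if p or t:
                union += 1         # pixel inside at least one shape
    # Convention chosen here: two empty masks count as a perfect match.
    return intersection / union if union else 1.0

truth = [[1, 1, 0, 0] for _ in range(4)]  # real field: left half of a 4x4 tile
pred = [[1, 1, 1, 0] for _ in range(4)]   # prediction spills one column too far
print(round(iou(pred, truth), 3))  # → 0.667
```

The extra column costs the prediction a third of its score: 8 overlapping pixels out of 12 covered in total. Missing part of the real field lowers the score in the same way, because those pixels still count toward the union.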

Then we started showing the neural network millions of images of agricultural fields so it could learn to tell a field apart from a building or an airstrip. The algorithm takes a long time to learn, and we monitor the results and keep improving it until it detects field boundaries accurately. How do we know its accuracy has improved? We compare its predictions against the real data we have on the fields. We can currently detect field boundaries with a high degree of accuracy in 57 countries.

Being able to confidently detect field boundaries in, say, Ukraine doesn't mean that everything will work the same way in Brazil. That country has its own fields and its own agricultural practices. Once again, we need real data to refine and improve our algorithm.

We're sharing this to drive home the point that we can't go far without data on real fields. Finding that data isn't easy, and we have to use all kinds of sources. Here's how we do it.

We get data from users
OneSoil users enter data in our apps about the crops growing in their fields, sowing and harvesting dates, average yield, and phenophases (crop growth stages). A few months ago, we started using this data to train our machine learning algorithms for the first time. The R&D team used it to verify the accuracy of past sowing date forecasts in one region.

30 mln ha and 1 mln fields: data our users have entered on the OneSoil platform as of February 2020

This is a good place for an important digression. User data that enters our algorithms is generalized. For us (or, more precisely, for our neural networks), it doesn't really matter who the fields belong to. What the algorithm needs to know is what's growing in them. In other words, it doesn't matter that a cornfield belongs to Dean Rohrbacher from Columbus, Texas, in the United States. What DOES matter is how many cornfields there are in total in the Columbus area. We care less about the individual details than about the big picture. That's why users' data in the OneSoil apps is both securely stored and useful for making our neural networks smarter.
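In its simplest form, generalizing like this means counting fields by area and crop without ever carrying the owner's name into the result. A minimal sketch with hypothetical record fields and toy data (not OneSoil's actual schema or privacy pipeline):

```python
from collections import Counter
from typing import Iterable, NamedTuple

class FieldRecord(NamedTuple):
    owner: str  # personal detail: present in the input, never used below
    area: str   # hypothetical region label, e.g. a county or grid cell
    crop: str

def crop_counts(records: Iterable[FieldRecord]) -> Counter:
    """Count fields per (area, crop); the owner is dropped from the result."""
    return Counter((r.area, r.crop) for r in records)

records = [
    FieldRecord("farmer A", "Columbus area", "corn"),
    FieldRecord("farmer B", "Columbus area", "corn"),
    FieldRecord("farmer B", "Columbus area", "soybean"),
    FieldRecord("farmer C", "Other area", "corn"),
]
counts = crop_counts(records)
print(counts[("Columbus area", "corn")])  # → 2
```

Real privacy protection takes more than this (for example, care with areas that contain only one or two farms), but the sketch shows why the owner's identity isn't needed for the big picture.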

We do outreach
Our R&D team is constantly connecting with research institutes and individual researchers who work in the same field as we do. Often, they reach out to us first.

When the interactive OneSoil Map was released in 2018, we received a message from Guido Lemoine, a department director at the Joint Research Centre (JRC). Last year, at a European Space Agency conference, our data science expert Christina Butsko met him in person.

"They shared a list of open-source data that they use and that isn't very easy to find," Christina shares. "I'm really looking forward to seeing their unique dataset on plant phenophases that they've been compiling over two years of field studies."

Last year, our R&D team worked hard on predicting crop phenophases using satellite images. The JRC's dataset will help them move closer to solving this task.

ESA Living Planet Symposium 2019 in Milan, May 2019. One of the conferences where we meet future partners and share knowledge.

We exchange knowledge
Our precision farming expert and OneSoil cofounder, Usevalad Henin, is rarely in the office because he spends most of his time in the field. Or, quite literally, the fields. Usevalad speaks with farmers, analyzes their fields, and runs joint experiments with them on variable-rate seeding and on applying fertilizers and pesticides. He talks to farmers a lot, and they fairly often agree to partner up.

Last year, several dozen Ukrainian and Russian companies gave us four years of data on fields covering a total of 7 million hectares, and in exchange we analyzed that data for them. The database includes information on crops, sowing dates, harvest dates, and average crop yields. It's a true gift for our R&D team. Thanks in large part to this data, we can now determine sowing dates in Ukrainian fields to within 2-3 days, which helps farmers plan fieldwork. And there's more to come.
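The post doesn't define how the 2-3 day accuracy is measured. One natural reading is the average gap, in days, between predicted and farmer-reported sowing dates. The sketch below computes that mean absolute error on made-up dates; the function name and data are hypothetical.

```python
from datetime import date
from typing import Sequence

def mean_abs_error_days(predicted: Sequence[date], reported: Sequence[date]) -> float:
    """Average absolute gap, in days, between predicted and reported dates."""
    if len(predicted) != len(reported) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    # Subtracting two dates gives a timedelta; .days may be negative.
    gaps = [abs((p - r).days) for p, r in zip(predicted, reported)]
    return sum(gaps) / len(gaps)

predicted = [date(2019, 4, 20), date(2019, 4, 25), date(2019, 5, 2)]
reported = [date(2019, 4, 18), date(2019, 4, 28), date(2019, 5, 2)]
print(round(mean_abs_error_days(predicted, reported), 2))  # → 1.67
```

Here the individual errors are 2, 3, and 0 days. Taking the absolute value matters: otherwise early and late predictions would cancel out and make the forecast look better than it is.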

"In 2020, we will conduct variable-rate seeding experiments in fields covering over 100,000 hectares," Usevalad says.

We probably won't see him in the office much in 2020, either!

Usevalad researching fields for an experiment

We ask questions
In 2018, the data we had on fields and crops in Canada wasn't enough to verify the accuracy of our machine learning algorithms' calculations. So our CEO, Slava Mazai, wrote a letter to a Canadian ministry that began, "Dear Canada". Seriously.

The letter to Canada

The cool part is that they wrote back, a year later. That's how we got data on 50,000 fields in three Canadian provinces, which has helped us identify crops in Canada more accurately and make the OneSoil platform even more convenient for farmers in the region.

170 mln ha and 30 mln fields: field data from our partners as of February 2020

So, when we have a lot of data from open sources and partners, we can improve the algorithms we already use in the OneSoil apps (or will use in the future). And the more data we get from users, the more accurate our estimates become. That's how data and technology work together.

The magic behind our algorithms was explored by Olga Polevikova
Illustrations created by Vanya Uvarov and Dasha Sazanovich
Layout by Anton Sidorov
