Navigating and querying the LiteFarm database

 

This article will cover:

  • An introduction to SQL queries

  • The overall structure of the LiteFarm database schema

  • Navigation of the LiteFarm schema using foreign keys

  • Common or particularly valuable queries for extracting data from LiteFarm

 

No prior knowledge is necessary, though a basic understanding of relational databases will help.

 

Things you’ll need to follow-along:

 

  • A tool to open a “.erd” file - DBeaver is a free example

  • A Postgres client

  • Read-only access to the beta environment: Talk to @Lite Farm about this

  • The first page of this cheat sheet

  • This .erd (accurate as of March 22nd 2022)

 

 

Reference videos

 

Nathan Williams does a really good introduction to SQL broken down into easy-to-learn videos, so I’ve linked to the most relevant videos in his playlist for all but the first video below. I won’t be using pgAdmin or using the data sets he does (which he covers in the first and second video, respectively) for our LiteFarm specific session, but it probably makes sense to watch those videos (and use that tool and dataset, respectively) when you’re getting started so that you can more effectively follow along with the videos we do cover. I’ll also refer back to specific videos as we progress through some of the LiteFarm examples.

 

  1. What is SQL? (4 minutes)

  2. Basic SQL queries (7 minutes)

  3. ORDER BY and LIMIT (3 minutes)

  4. Filtering using the WHERE clause (11 minutes)

  5. Basic aggregations (6 minutes)

  6. Advanced aggregations (6 minutes)

  7. String functions (12 minutes)

  8. Rounding (3 minutes)

  9. Formatting dates using TO_CHAR (4 minutes)

    1. TO_CHAR reference sheet

  10. Date arithmetic using DATE_TRUNC (7 minutes)

  11. Joins for beginners (15 minutes)

  12. Combining queries (5 minutes)

  13. Case statements (6 minutes)

 

Since this is an introduction, I’ve decided to omit a few of the less relevant or more complicated actions you can take using SQL. If you feel pretty comfortable with everything above, feel free to keep going. Depending on future needs, we may do a future training covering these topics too:

 

What is a database schema?

 

A database schema is a structured collection of database tables (tables). Following object oriented principles, tables are generally created to hold information about a conceptual idea such as a task, farm, or user. There are important exceptions to this rule, but in general, this is the case. Each table has certain attributes or columns that fit within the concept of the table. For example, the user table would have columns such as username, email, and name that relate to every user. Each row in that table is a one instance - or record - of that table. Using our user table example, you could have:

 

id

Username

Email

Name

id

Username

Email

Name

1

jdoe

jdoe@gmail.com

Jane Doe

2

LightningBoltBoiZ

zeus@olympus.gr

Zues

 

The above user table has 2 records. One for Jane Doe and another for Zues. Videos 1 - 10 in the reference videos section above give you background on many ways you can modify or display specific columns on individual tables - so please watch them and experiment. However, where things get really interesting is video 11. In this video, Nathan shows us how to connect different tables together using primary and foreign keys using an operation called a “JOIN”. This is a somewhat difficult concept, but is absolutely essential to getting the most of the LiteFarm database. We’ll get into that in just a moment, but first…

A note on ‘deleted’

 

In LiteFarm, we almost always use “soft deletes”. This means that when a user “deletes” data in the app what’s actually happening is we’re setting a flag in the database that says that data is now deleted, e.g. deleted = true. A “hard delete” is when the row of data is actually just removed from the table altogether. In LiteFarm, the user interface knows to not display those records that have deleted = true. As such, you’ll also need to make sure and exclude deleted records when you’re pulling information from the database or risk pulling incorrect information. In most cases this just means you’ll need to add the following to whatever table or tables you’re querying:

 

AND deleted = FALSE

 

Users, userFarms, and Farms

 

Commonly, we’ll need to track down information about:

  • A user

  • Their farm

  • Who has access to which farm(s)

 

For this, we’ll need to join the user, userFarm, and farm tables.

 

 

Looking at the ERD we can see:

  • The users table has the primary key: user_id

  • The farm table has the primary key: farm_id

  • The userFarm table has two foreign keys: user_id & farm_id

What this means is that we can JOIN these tables based on their shared keys. Below is a query that will do exactly this by finding users based on either their name or email address and then finding any userFarms and farms that the selected user is connected to via join. By joining these tables, we see which farms each person has access to, their roles at those farms, and the details of the farm.

 

SELECT * -- You can modify the * to be just the fields you want, e.g. first_name FROM users u JOIN "userFarm" uf ON u.user_id = uf.user_id -- This combines the user and what farms they can access JOIN farm f ON uf.farm_id = f.farm_id -- This pulls in the information about the farm WHERE email LIKE '%%' OR -- When searching by email, remove the "OR" and the name portion CONCAT(first_name, last_name) LIKE '%%' -- Put in the name you're looking for and remove email

 

A quick note: “--” tells SQL to ignore anything after it. It’s a way to add human readable context or “comments” to your queries.

 

Using the above as a base, we can make a few small tweaks in order to ask the database to show us every person with the email address “kcussen@litefarm.org”, their full name, their role on a farm, and the name of the farm where they have that role:

 

 

In this case, there is 1 user, with access to 2 different farms (Joykill Farms and Cultivando sonrisas) where they are the Farm Owner and Farm Manager, respectively.

A quick note: Role_id’s can be translated into things like “Farm Owner” by querying the ‘Role’ table, e.g. “SELECT role FROM roles WHERE role_id = 1”. This table doesn’t change, so you should quickly be able to memorize which number is which role. I’ve shown the Role table contents below:

 

 

We could also turn this query around and find information about farms by making the farm the target of the query (the table named after ‘FROM’). We could then pull in the people working on those farms by joining on the farms that are returned. For example, we might want to know everyone that is a “Farm owner” on a farm in the US.

 

 

A quick note: Here I’m identifying the country based on the country_id foreign key held in the farm table. You can query the ‘countries’ table to see which country_id’s map to which country. The United States is 212. Canada is 37. Farms created on LiteFarm prior to May 2021 may not have a country_id (due to a bug that was fixed at that time), so you may need to query based on a combination or country_id, address, and currency if you’re trying to find older farms.

 

There are innumerable ways to query information about farms and users, so please bring any specific problems you may have in mind to our training and we’ll work them through together.

 

Locations

 

Farms consist of many different location types, such as farm site boundaries, fields, and gates. Many attributes of these specific location types are general attributes of every location, so you’ll typically need to join the type and location to see all of what you’re looking for.

Additionally, there’s a separate idea regarding locations that describes the “physical” attributes of the location. Things like it’s location (using latitude and longitude or a collection of latitudes and longitudes - collectively known as ‘grid points’), it’s area, perimeter, etc. We link this idea to a location through the “figure” table. Furthermore, there are three types of figures: areas (things that have area), points, and lines.

You can learn much more about these concepts on this page: Location categories: Areas, Lines, & Points. For now, it’s just important to know you may need to do joins across each of these tables to get all the information for a particular location or set of locations.

 

Lets start with a very simple query to find out how many locations exist on a particular farm:

SELECT COUNT(*) FROM location WHERE farm_id = 'abc' -- You would input the actual farm_id in the place of 'abc' AND deleted = FALSE

 

Digging a little deeper, we could list out the name and some of the attributes of all of the locations for a specific farm by executing the following:

 

Here’s an example of what the above query returns when run on an actual demo farm:

A quick note: All areas are stored in m^2. All lengths are stored in m. If you need other units, you’ll need to convert them during your query or in post-processing. For example, to convert m^2 to Ha, you could do the following:

ROUND(a.total_area, 2)/10000 AS "Total Area (Ha)"

Obviously, this will be a case by case basis on the types of figures you’re dealing with.

 

Taking a step back. If we were interested to know about mapped locations across the entire dataset rather than a specific farm, we could extend this to learn how many locations all farms in the system have by executing the following:

 

Executing this query would tell us the number of each type of location across the entire system:

 

Doing so, we can see many more fields are created than any other type of location. No surprise there!

We might also be interested in learning more about specific types of locations on a farm. For example, it might be interesting to know how many fields different farms have.

 

We can easily extend this query to do some descriptive analysis of those fields:

 

Leaving out the farm names for privacy, we can see a wide variety of field sizes here:

 

A brief aside on data quality

 

It might be good to pause here and briefly discuss data quality. LiteFarm is a publicly available app and anyone can enter whatever data they would like. We should make an effort to analyze data returned in our queries and attempt to make them more defensive in nature to exclude obviously bad data. For example, Using the last query in the locations section, sorted in DESC order by “Total area under production (Ha)” we see the following 2 farms:

According to Herdsy.com, the biggest farm in the world is 22.5m acres. The 5th largest is 4m acres. 4m acres is roughly 1.6m hectares. In fact, most of the largest farms in the world (by area) are cattle or dairy operations with the largest crop producing farms being significantly smaller (less than 200,000 Ha). Since LiteFarm doesn’t support dairy or cattle operations, and we don’t exactly target super large farms, we can probably safely set our upper bounds on area significantly lower, perhaps 50,000 Ha.

We might also consider excluding some obvious terms dabblers might put into their farm names to indicate they aren’t real data points, such as “Fake” or “Test”. As such, our WHERE clause might grow to look as follows: