This article will cover:
An introduction to SQL queries
The overall structure of the LiteFarm data modeldatabase schema
Navigation of the LiteFarm schema using foreign keys
Common or particularly valuable queries for extracting data Navigation of the data model using foreign keysfrom LiteFarm
No prior knowledge is neceasrynecessary, though a basic understanding of relational databases will help.
Things you’ll need to follow-along:
A tool to open a “.erd” file - DBeaver is a free example
A Postgres client (
We’ll use Postico (Mac only)
though DBeaver also works
and pgAdmin is advised for engineers
Read-only access to the beta environment: Talk to Lite Farm about this
The first page of this cheat sheet
This .erd 👇 (accurate as of March 22nd 2022)
...
View file | ||
---|---|---|
|
Reference videos
Nathan Williams does a really good introduction to SQL broken down into easy-to-learn videos, so I’ve linked to the most relevant videos in his playlist for all but the first video below. I won’t be using pgAdmin or using the data sets he does (which he covers in the first and second video, respectively) for our LiteFarm specific session, but it probably makes sense to watch those videos (and use that tool and dataset, respectively) when you’re getting started so that you can more effectively follow along with the videos we do cover. I’ll also refer back to specific videos as we progress through some of the LiteFarm examples.
...
Since this is an introduction, I’ve decided to omit a few of the less relevant or more complicated actions you can take using SQL. If you feel pretty comfortable with everything above, feel free to keep going. Depending on future needs, we may do a future training covering these topics too:
What is a database schema?
A database schema is a structured collection of database tables (tables). Following object oriented principles, tables are generally created to hold information about a conceptual idea such as a task, farm, or user. There are important exceptions to this rule, but in general, this is the case. Each table has certain attributes or columns that fit within the concept of the table. For example, the user table would have columns such as username, email, and name that relate to every user. Each row in that table is a one instance - or record - of that table. Using our user table example, you could have:
id | Username | Name | |
---|---|---|---|
1 | jdoe | jdoe@gmail.com | Jane Doe |
2 | LightningBoltBoiZ | zeus@olympus.gr | Zues |
...
The above user table has 2 records. One for Jane Doe and another for Zues. Videos 1 - 10 in the reference videos section above give you background on many ways you can modify or display specific columns on individual tables - so please watch them and experiment. However, where things get really interesting is video 11. In this video, Nathan shows us how to connect different tables together using primary and foreign keys using an operation called a “JOIN”. This is a somewhat difficult concept, but is absolutely essential to getting the most of the LiteFarm database. We’ll get into that in just a moment, but first…
A note on ‘deleted’
In LiteFarm, we almost always use “soft deletes”. This means that when a user “deletes” data in the app what’s actually happening is we’re setting a flag in the database that says that data is now deleted, e.g. deleted = true. A “hard delete” is when the row of data is actually just removed from the table altogether. In LiteFarm, the user interface knows to not display those records that have deleted = true. As such, you’ll also need to make sure and exclude deleted records when you’re pulling information from the database or risk pulling incorrect information. In most cases this just means you’ll need to add the following to whatever table or tables you’re querying:
Code Block |
---|
AND deleted = FALSE |
Users, userFarms, and Farms
Commonly, we’ll need to track down information about:
A user
Their farm
Who has access to which farm(s)
For this, we’ll need to join the user, userFarm, and farm tables.
Looking at the ERD we can see:
The users table has the primary key: user_id
The farm table has the primary key: farm_id
The userFarm table has two foreign keys: user_id & farm_id
What this means is that we can JOIN these tables based on their shared keys. Below is a query that will do exactly this by finding users based on either their name or email address and then finding any userFarms and farms that the selected user is connected to via join. By joining these tables, we see which farms each person has access to, their roles at those farms, and the details of the farm.
Code Block |
---|
SELECT * -- You can modify the * to be just the fields you want, e.g. first_name
FROM users u
JOIN "userFarm" uf ON u.user_id = uf.user_id -- This combines the user and what farms they can access
JOIN farm f ON uf.farm_id = f.farm_id -- This pulls in the information about the farm
WHERE email LIKE '%%' OR -- When searching by email, remove the "OR" and the name portion
CONCAT(first_name, last_name) LIKE '%%' -- Put in the name you're looking for and remove email |
A quick note: “--” tells SQL to ignore anything after it. It’s a way to add human readable context or “comments” to your queries.
Using the above as a base, we can make a few small tweaks in order to ask the database to show us every person with the email address “kcussen@litefarm.org”, their full name, their role on a farm, and the name of the farm where they have that role:
In this case, there is 1 user, with access to 2 different farms (Joykill Farms and Cultivando sonrisas) where they are the Farm Owner and Farm Manager, respectively.
A quick note: Role_id’s can be translated into things like “Farm Owner” by querying the ‘Role’ table, e.g. “SELECT role FROM roles WHERE role_id = 1”. This table doesn’t change, so you should quickly be able to memorize which number is which role. I’ve shown the Role table contents below:
We could also turn this query around and find information about farms by making the farm the target of the query (the table named after ‘FROM’). We could then pull in the people working on those farms by joining on the farms that are returned. For example, we might want to know everyone that is a “Farm owner” on a farm in the US.
A quick note: Here I’m identifying the country based on the country_id foreign key held in the farm table. You can query the ‘countries’ table to see which country_id’s map to which country. The United States is 212. Canada is 37. Farms created on LiteFarm prior to May 2021 may not have a country_id (due to a bug that was fixed at that time), so you may need to query based on a combination or country_id, address, and currency if you’re trying to find older farms.
There are innumerable ways to query information about farms and users, so please bring any specific problems you may have in mind to our training and we’ll work them through together.
Locations
Farms consist of many different location types, such as farm site boundaries, fields, and gates. Many attributes of these specific location types are general attributes of every location, so you’ll typically need to join the type and location to see all of what you’re looking for.
Additionally, there’s a separate idea regarding locations that describes the “physical” attributes of the location. Things like it’s location (using latitude and longitude or a collection of latitudes and longitudes - collectively known as ‘grid points’), it’s area, perimeter, etc. We link this idea to a location through the “figure” table. Furthermore, there are three types of figures: areas (things that have area), points, and lines.
You can learn much more about these concepts on this page: Location categories: Areas, Lines, & Points. For now, it’s just important to know you may need to do joins across each of these tables to get all the information for a particular location or set of locations.
...
Lets start with a very simple query to find out how many locations exist on a particular farm:
Code Block |
---|
SELECT COUNT(*)
FROM location
WHERE farm_id = 'abc' -- You would input the actual farm_id in the place of 'abc'
AND deleted = FALSE |
Digging a little deeper, we could list out the name and some of the attributes of all of the locations for a specific farm by executing the following:
Code Block | ||
---|---|---|
| ||
SELECT name, type, ROUND(a.total_area) AS "Total Area m^2",
ROUND(a.perimeter) AS "Total perimeter m", notes
FROM location l
JOIN figure f ON f.location_id = l.location_id
FULL JOIN area a ON a.figure_id = f.figure_id -- Note the full join here since not all figures are areas
JOIN farm fa ON l.farm_id = fa.farm_id
WHERE l.farm_id = 'abc' -- You would input the actual farm_id in the place of 'abc'
AND fa.deleted = FALSE
AND l.deleted = FALSE |
Here’s an example of what the above query returns when run on an actual demo farm:
...
A quick note: All areas are stored in m^2. All lengths are stored in m. If you need other units, you’ll need to convert them during your query or in post-processing. For example, to convert m^2 to Ha, you could do the following:
ROUND(a.total_area, 2)/10000 AS "Total Area (Ha)"
Obviously, this will be a case by case basis on the types of figures you’re dealing with.
Taking a step back. If we were interested to know about mapped locations across the entire dataset rather than a specific farm, we could extend this to learn how many locations all farms in the system have by executing the following:
Code Block |
---|
SELECT f.type, COUNT(f.type)
FROM location l
JOIN figure f ON f.location_id = l.location_id
FULL JOIN area a ON a.figure_id = f.figure_id
JOIN farm fa ON l.farm_id = fa.farm_id
WHERE fa.deleted = FALSE
AND l.deleted = FALSE
GROUP BY f.type |
Executing this query would tell us the number of each type of location across the entire system:
Doing so, we can see many more fields are created than any other type of location. No surprise there!
We might also be interested in learning more about specific types of locations on a farm. For example, it might be interesting to know how many fields different farms have.
Code Block |
---|
SELECT farm_name, COUNT(f.type) AS "Number of fields"
FROM location l
JOIN figure f ON f.location_id = l.location_id
FULL JOIN area a ON a.figure_id = f.figure_id
JOIN farm fa ON l.farm_id = fa.farm_id
WHERE f.type = 'field' -- Here we limit the query to only view fields
AND fa.deleted = FALSE
AND l.deleted = FALSE
GROUP BY farm_name, f.type
ORDER BY COUNT(f.type) DESC |
We can easily extend this query to do some descriptive analysis of those fields:
Code Block |
---|
SELECT farm_name, COUNT(f.type) AS "Number of fields",
SUM(a.total_area)/10000 AS "Total Area (Ha)",
ROUND(AVG(a.total_area)/10000, 4) AS "Avg. field size (Ha)",
MAX(a.total_area)/10000 AS "Largest field (Ha)",
MIN(a.total_area)/10000 AS "Smallest field (Ha)"
FROM location l
JOIN figure f ON f.location_id = l.location_id
FULL JOIN area a ON a.figure_id = f.figure_id
JOIN farm fa ON l.farm_id = fa.farm_id
WHERE f.type = 'field'
AND fa.deleted = FALSE
AND l.deleted = FALSE
GROUP BY farm_name, f.type
ORDER BY COUNT(f.type) DESC |
Leaving out the farm names for privacy, we can see a wide variety of field sizes here:
...
A brief aside on data quality
It might be good to pause here and briefly discuss data quality. LiteFarm is a publicly available app and anyone can enter whatever data they would like. We should make an effort to analyze data returned in our queries and attempt to make them more defensive in nature to exclude obviously bad data. For example, Using the last query in the locations section, sorted in DESC order by “Total area under production (Ha)” we see the following 2 farms:
...
According to Herdsy.com, the biggest farm in the world is 22.5m acres. The 5th largest is 4m acres. 4m acres is roughly 1.6m hectares. In fact, most of the largest farms in the world (by area) are cattle or dairy operations with the largest crop producing farms being significantly smaller (less than 200,000 Ha). Since LiteFarm doesn’t support dairy or cattle operations, and we don’t exactly target super large farms, we can probably safely set our upper bounds on area significantly lower, perhaps 50,000 Ha.
We might also consider excluding some obvious terms dabblers might put into their farm names to indicate they aren’t real data points, such as “Fake” or “Test”. As such, our WHERE clause might grow to look as follows:
Code Block |
---|
WHERE ...
AND farm.deleted = FALSE
AND location.deleted = FALSE
AND LOWER(farm_name) NOT LIKE '%test%'
AND LOWER(farm_name) NOT LIKE '%fake%'
AND (a.total_area/10000) < 50000 |