Skip to main content

Command Palette

Search for a command to run...

Introduction to PostgreSQL Aggregate Functions

Updated
4 min read

Aggregation is a really helpful tool in the SQL arsenal, and it enables us to ask slightly more sophisticated questions of our data than we would be able to otherwise. This is a slightly more advanced topic, so be sure to brush up on basic SQL syntax before jumping into this article.

We'll be using the following data model for the examples.

Count

Here is a relatively simple example of when COUNT can be useful. We're calculating the number of expensive locations, which are those with a per-encounter cost of over $100. Here the WHERE clause is filtering out the cheap rows before our aggregation function operates against the resulting record set and produces a scalar value.

select count(*)
    from locations
    where cost > 100

Group by

Group by allows us to slightly modify the behavior of our aggregation function in an interesting way. Consider the below example where we've chosen to group our encounters by patient ID and then apply the COUNT aggregation function. Because the aggregation is happening for each group, the output here is the number of encounters a given patient has had.

select patientid, count(*)
    from encounters
    group by patientid
order by patientid

Sum

In this example, we are calculating the total revenue for each of our business' locations. First we are joining our location and encounters tables to produce an intermediate record set that has all encounters listed, along with the associated cost for each encounter. We group these per location and SUM adds up the cost values for each location.

select name, sum(cost) as revenue
    from locations loc
    inner join encounters enc
        on loc.id = enc.locid
    group by name
order by revenue

Extract

If we want to calculate the total number of encounters per location in April 2024, we could do it this way.

select locid, count(*)
    from encounters
    where starttime >= '2024-04-01'
            and starttime < '2024-05-01'
            and status = 'cancelled'
    group by locid
order by count(*) desc

However, we also could make use of the extract function instead. The exact function lets us pull out pieces of PostgreSQL timestamps.

select locid, count(*)
    from encounters
    where extract(month from starttime) = 4
            and extract(year from starttime) = 2024
            and status = 'cancelled'
    group by locid
order by count(*) desc

Here is a slightly more sophisticated usage of extract, where we seek to calculate total encounters per location per month in 2024.

select locid, extract(month from starttime) as q_month, count(*)
    from encounters
    where extract(year from starttime) = 2024
    group by locid, q_month
order by locid, q_month

Having

The HAVING clause differs from WHERE because it is applied after the grouping has already occurred. In other words, HAVING filters the output of the aggregation function, while WHERE filters what goes into the aggregation function. This example shows how we would output only locations that have more than 100 total encounters.

select locid, count(*) as "Total Encounters"
    from encounters
    group by locid
    having count(*) > 100
order by locid

Here is another example where we list locations that have less than $1000 of revenue.

select name, sum(cost) as revenue
    from locations loc
    inner join encounters enc
        on loc.id = enc.locid
    group by loc.name
    having sum(cost) < 1000
order by revenue

Just for science, here is an alternative query that uses a subquery instead of HAVING. Note that in the subquery version, we only have to describe our aggregation once sum(cost) while in the HAVING query we actually have to repeat it in the SELECT. Unfortunately, this is because unlike some other flavors of SQL, PostgreSQL does not support putting column alias names in the HAVING clause. This means that modifying our previous example to look like HAVING revenue < 1000 results in an error.

select name, revenue
    from (select name, sum(cost) as revenue
            from locations loc
            inner join encounters enc
                on loc.id = enc.locid
            group by loc.name
        ) as sub
    where sub.revenue < 1000
order by revenue

Conclusion

In this post we covered aggregation functions like COUNT and SUM alongside grouping tools like GROUP BY and HAVING. We also covered EXTRACT, which makes aggregating data based on dates more manageable. Combined, these tools allow us to ask more detailed questions about our data and retrieve grouped and aggregated metrics from our tables.