Using dplyr and pool to query a database
By: Bárbara Borges Ribeiro
pool is now compatible with
dplyr 0.7.0 and the brand new
dbplyr. If you want to continue using pool with an older version of dplyr, please install
There are four packages that you need throughout this series of articles. Here are the installation instructions, so your code runs smoothly:
Many of you might be using
dplyr to connect to your database, especially because it allows you to query it using R code, rather than the more cumbersome SQL equivalent. For this reason and for performance, we’d actually recommend users to use
dplyr by default, unless they actually need more functionality and must turn to
DBI instead (when it comes to databases,
dplyr only allows you to do queries that can be transformed into a
SELECT SQL statement). If you’re not familiar with
dplyr applied to databases, make sure to read the section about this on the first article of this series.
dplyr tends to perform better than
dplyr’s vignette about databases):
When working with databases, dplyr tries to be as lazy as possible:
- It never pulls data into R unless you explicitly ask for it.
- It delays doing any work until the last possible moment: it collects together everything you want to do and then sends it to the database in one step.
But on top of this, it’s possible to get even better performance by using a pool. Most importantly, however, you should use a pool for its connection management benefits. To do so, you only need to do minor changes to your existing code.
First, let’s consider how you’d connect to and query a MySQL database using only
Now, let’s do the same thing using a Pool object:
What’s the advantage of using
dplyr, rather than just using
dplyr to query a database? As usual with
pool, the answer is performance and connection management. In order to reap these benefits within a Shiny app, however, you need to be careful about where you create your pool and where you use
tbl (or equivalent). The same idea that we used for combining
pool, also applies here. First, you should create your pool at the top of
server.R (or in
global.R), but outside the actual server function. Then, for each query, you should use
tbl (or equivalent). In a Shiny app, this means that each reactive or function that queries the database has its own call to
tbl. For example, here’s the same app used in the
pool basics article, but written with
dplyr instead of
You’ll note that there is no need to do our own input sanitizing for SQL injection prevention (i.e. no need to call a function like DBI’s
sqlInterpolate). This is because
dplyr acts a middleman between you and the actual SQL query sent to the database; as that middleman, it also sanitizes your inputs for you.
If you have questions about this article or would like to discuss ideas presented here, please post on RStudio Community. Our developers monitor these forums and answer questions periodically. See help for more help with all things Shiny.