Monday, July 3, 2017

Connect to Microsoft SQL database with dplyr 0.7 and R

Leave a Comment

I'm trying to use dplyr/dbplyr (version 0.7) with a database (Microsoft SQL Server 2014). I've been able to connect to this and extract data using the RODBC, DBI and odbc packages. The problem arises when I try to use dplyr verbs directly with the base.

When I attempt to use dplyr, I get the following error:

Error in new_result(connection@ptr, statement) : std::bad_alloc 

I raised this on the dplyr repo, where Hadley told me it was most likely an odbc error. Jim was super helpful when I raised the issue on the odbc repo, but wasn't able to solve the problem.

To be clear:

RODBC::sqlQuery() and DBI::dbGetQuery() both work: I get a dataframe back as I would expect, and I can use SQL queries to get back whatever I want. I only run into the memory allocation error when attempting to use dplyr verbs (actually, even when I try tbl()). There is a lot of data, but I'm querying for a subset of it, and it fits comfortably in R memory.

Ideally, I would rather keep the data out of R's memory altogether, if possible. The reason for this is that I'm building a Shiny app that will produce plots etc based on aspects of the data that the user chooses. As you can imagine, connecting to the database, sending queries and receiving data everytime the user does this renders the app somewhat useless because it's so slow. My ideal situation is like the RStudio example:

my_db <- src_mysql(   dbname = "shinydemo",   host = "shiny-demo.csa7qlmguqrf.us-east-1.rds.amazonaws.com",   user = "guest",   password = "guest" )  > my_db %>% tbl("City") %>% head(5) # Source:   lazy query [?? x 5] # Database: mysql 10.0.17-MariaDB [guest@shiny-demo.csa7qlmguqrf.us-east-1.rds.amazonaws.com:/shinydemo]      ID           Name CountryCode      District Population   <dbl>          <chr>       <chr>         <chr>      <dbl> 1     1          Kabul         AFG         Kabol    1780000 2     2       Qandahar         AFG      Qandahar     237500 3     3          Herat         AFG         Herat     186800 4     4 Mazar-e-Sharif         AFG         Balkh     127800 5     5      Amsterdam         NLD Noord-Holland     731200 

However, with my base I am unable to do that. Any help would be much appreciated.

1 Answers

Answers 1

Sigh. (Answer for anybody who runs into something similar.)
Turns out this is a problem with the specific table I'm trying to access. This table comes back fine without problems with RODBC and DBI/odbc, but as soon as I use dplyr it spits out this error. Other tables in the database work fine with dplyr, so it must be something to do with this specific table. Worse, I have no idea why.

If You Enjoyed This, Take 5 Seconds To Share It

0 comments:

Post a Comment