I'm trying to use dplyr/dbplyr (version 0.7) with a database (Microsoft SQL Server 2014). I've been able to connect to this and extract data using the RODBC, DBI and odbc packages. The problem arises when I try to use dplyr verbs directly with the base.
When I attempt to use dplyr, I get the following error:
Error in new_result(connection@ptr, statement) : std::bad_alloc
I raised this on the dplyr repo, where Hadley told me it was most likely an odbc error. Jim was super helpful when I raised the issue on the odbc repo, but wasn't able to solve the problem.
To be clear:
RODBC::sqlQuery()
and DBI::dbGetQuery()
both work: I get a dataframe back as I would expect, and I can use SQL queries to get back whatever I want. I only run into the memory allocation error when attempting to use dplyr verbs (actually, even when I try tbl()
). There is a lot of data, but I'm querying for a subset of it, and it fits comfortably in R memory.
Ideally, I would rather keep the data out of R's memory altogether, if possible. The reason for this is that I'm building a Shiny app that will produce plots etc based on aspects of the data that the user chooses. As you can imagine, connecting to the database, sending queries and receiving data everytime the user does this renders the app somewhat useless because it's so slow. My ideal situation is like the RStudio example:
my_db <- src_mysql( dbname = "shinydemo", host = "shiny-demo.csa7qlmguqrf.us-east-1.rds.amazonaws.com", user = "guest", password = "guest" ) > my_db %>% tbl("City") %>% head(5) # Source: lazy query [?? x 5] # Database: mysql 10.0.17-MariaDB [guest@shiny-demo.csa7qlmguqrf.us-east-1.rds.amazonaws.com:/shinydemo] ID Name CountryCode District Population <dbl> <chr> <chr> <chr> <dbl> 1 1 Kabul AFG Kabol 1780000 2 2 Qandahar AFG Qandahar 237500 3 3 Herat AFG Herat 186800 4 4 Mazar-e-Sharif AFG Balkh 127800 5 5 Amsterdam NLD Noord-Holland 731200
However, with my base I am unable to do that. Any help would be much appreciated.
1 Answers
Answers 1
Sigh. (Answer for anybody who runs into something similar.)
Turns out this is a problem with the specific table I'm trying to access. This table comes back fine without problems with RODBC and DBI/odbc, but as soon as I use dplyr it spits out this error. Other tables in the database work fine with dplyr, so it must be something to do with this specific table. Worse, I have no idea why.
0 comments:
Post a Comment