Why does tensorflow/keras choke when I try to fit multiple models in parallel?

I'm trying to fit a finite mixture model, with the mixture models for each class being neural networks. It'd be super-useful for me to be able to parallelize, because keras doesn't max out all of the available cores on my laptop, let alone on a large cluster.

But when I try to set different learning rates for different models inside of a parallel foreach loop, the whole thing chokes.

What is going on? I suspect that it has something to do with scope -- the workers aren't running on separate instantiations of tensorflow, maybe. But I really don't know. How can I make this work? And what do I need to understand to know why this doesn't work?

Here's a MWE. Set the foreach loop to %do% and it works fine. Set it to %dopar% and it chokes on the fitting stage.

    library(foreach)
    library(doParallel)
    registerDoParallel(2)
    library(keras)
    library(tensorflow)

    mnist <- dataset_mnist()
    x_train <- mnist$train$x
    y_train <- mnist$train$y
    x_test <- mnist$test$x
    y_test <- mnist$test$y

    x_train <- array_reshape(x_train, c(nrow(x_train), 784))
    x_test <- array_reshape(x_test, c(nrow(x_test), 784))
    # rescale
    x_train <- x_train / 255
    x_test <- x_test / 255

    y_train <- to_categorical(y_train, 10)
    y_test <- to_categorical(y_test, 10)

    # make tensorflow run single-threaded
    session_conf <- tf$ConfigProto(intra_op_parallelism_threads = 1L,
                                   inter_op_parallelism_threads = 1L)
    # Create the session using the custom configuration
    sess <- tf$Session(config = session_conf)
    K <- backend()
    K$set_session(sess)

    models <- foreach(i = 1:2) %dopar% {
      model <- keras_model_sequential()
      model %>%
        layer_dense(units = 256/i, activation = 'relu', input_shape = c(784)) %>%
        layer_dropout(rate = 0.4) %>%
        layer_dense(units = 128/i, activation = 'relu') %>%
        layer_dropout(rate = 0.3) %>%
        layer_dense(units = 10, activation = 'softmax')

      print("A")
      model %>% compile(
        loss = 'categorical_crossentropy',
        optimizer = optimizer_rmsprop(),
        metrics = c('accuracy')
      )
      print("B")
      history <- model %>% fit(
        x_train, y_train,
        epochs = 3, batch_size = 128,
        validation_split = 0.2, verbose = 0
      )
      print("done")
    }

Here's sessionInfo():

    R version 3.5.1 (2018-07-02)
    Platform: x86_64-pc-linux-gnu (64-bit)
    Running under: Ubuntu 18.04.1 LTS

    Matrix products: default
    BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
    LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

    locale:
     [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8
     [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C
    [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

    attached base packages:
    [1] splines   parallel  stats     graphics  grDevices utils     datasets  methods   base

    other attached packages:
     [1] panelNNET_1.0       matrixStats_0.54.0  MASS_7.3-50         lfe_2.8-2           tensorflow_1.9      keras_2.1.6.9005
     [7] mgcv_1.8-24         nlme_3.1-137        scales_1.0.0        forcats_0.3.0       stringr_1.3.1       purrr_0.2.5
    [13] readr_1.1.1         tidyr_0.8.1         tibble_1.4.2        tidyverse_1.2.1     maptools_0.9-3      rgeos_0.3-28
    [19] rgdal_1.3-4         sp_1.3-1            broom_0.5.0         ggplot2_3.0.0       randomForest_4.6-14 dplyr_0.7.6
    [25] glmnet_2.0-16       Matrix_1.2-14       doBy_4.6-2          doParallel_1.0.11   iterators_1.0.10    foreach_1.4.4

    loaded via a namespace (and not attached):
     [1] httr_1.3.1          jsonlite_1.5        modelr_0.1.2        Formula_1.2-3       assertthat_0.2.0    cellranger_1.1.0
     [7] yaml_2.2.0          pillar_1.3.0        backports_1.1.2     lattice_0.20-35     glue_1.3.0          reticulate_1.10
    [13] digest_0.6.15       RcppEigen_0.3.3.4.0 rvest_0.3.2         colorspace_1.3-2    sandwich_2.5-0      plyr_1.8.4
    [19] pkgconfig_2.0.1     haven_1.1.2         xtable_1.8-2        whisker_0.3-2       withr_2.1.2         lazyeval_0.2.1
    [25] cli_1.0.0           magrittr_1.5        crayon_1.3.4        readxl_1.1.0        xml2_1.2.0          foreign_0.8-70
    [31] tools_3.5.1         hms_0.4.2           munsell_0.5.0       bindrcpp_0.2.2      compiler_3.5.1      rlang_0.2.2
    [37] grid_3.5.1          rstudioapi_0.7      base64enc_0.1-3     labeling_0.3        gtable_0.2.0        codetools_0.2-15
    [43] R6_2.2.2            tfruns_1.3          zoo_1.8-3           lubridate_1.7.4     zeallot_0.1.0       bindr_0.1.1
    [49] stringi_1.2.4       Rcpp_0.12.18        tidyselect_0.2.4

1 Answer

Keras requires that only one model be trained in a given session. I would try creating a separate session for each model.

I would move this part of the code inside the %dopar% block, so that each model gets its own session:

    sess <- tf$Session(config = session_conf)
    K <- backend()
    K$set_session(sess)
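To make the suggestion concrete, here is a rough sketch of what the loop might look like with the session set up inside each worker. I have not run it against the exact setup in the question, so treat it as a starting point. It makes a few assumptions that are not in the original code: it uses a PSOCK cluster from makeCluster() instead of registerDoParallel(2), so that each worker is a fresh R process that does not inherit any TensorFlow state from the parent; it loads MNIST inside the worker, assuming the dataset is already cached locally; and it returns the training metrics instead of the model, since the model object lives in the worker's Python process. It uses the TensorFlow 1.x calls from the question (in TensorFlow 2.x, Session and ConfigProto sit under tf$compat$v1).

```r
library(parallel)
library(foreach)
library(doParallel)

# Assumption: PSOCK workers (fresh R processes), not forked workers, so no
# TensorFlow state created in the parent is shared with the workers.
cl <- makeCluster(2)
registerDoParallel(cl)

results <- foreach(i = 1:2, .packages = c("keras", "tensorflow")) %dopar% {
  # One single-threaded session per worker, as suggested in the answer.
  session_conf <- tf$ConfigProto(intra_op_parallelism_threads = 1L,
                                 inter_op_parallelism_threads = 1L)
  sess <- tf$Session(config = session_conf)
  K <- backend()
  K$set_session(sess)

  # Assumption: MNIST is already cached locally, so both workers only read it.
  mnist <- dataset_mnist()
  x_train <- array_reshape(mnist$train$x, c(nrow(mnist$train$x), 784)) / 255
  y_train <- to_categorical(mnist$train$y, 10)

  # Same architecture as in the question; layer widths depend on i.
  model <- keras_model_sequential()
  model %>%
    layer_dense(units = 256 / i, activation = 'relu', input_shape = c(784)) %>%
    layer_dropout(rate = 0.4) %>%
    layer_dense(units = 128 / i, activation = 'relu') %>%
    layer_dropout(rate = 0.3) %>%
    layer_dense(units = 10, activation = 'softmax')

  model %>% compile(
    loss = 'categorical_crossentropy',
    optimizer = optimizer_rmsprop(),
    metrics = c('accuracy')
  )

  history <- model %>% fit(
    x_train, y_train,
    epochs = 3, batch_size = 128,
    validation_split = 0.2, verbose = 0
  )

  # Return plain R data; the model itself is a reference to a Python object
  # in this worker and would not survive the trip back to the parent.
  history$metrics
}

stopCluster(cl)
```

If you need the fitted models themselves in the parent process, one option is to save them to disk inside the worker with save_model_hdf5() and read them back with load_model_hdf5() afterwards.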

Coding Question

Sunday, September 16, 2018
