Monday, April 23, 2018

Spring Data MongoRepository save(T) not working… sometimes

Leave a Comment

So there's this little Angular + Java + Spring Boot + MongoDB app I'm working with. It's getting quite a lot of action (read: code modifications) lately, but the data access classes have gone largely untouched AFAIK.
However, it seems like MongoRepository suddenly decided to stop persisting the changes I'm save()ing to DB.

Inspecting mongod.log this is what I see when the save() works:

2018-04-11T15:04:06.840+0200 I COMMAND  [conn6] command pdfviewer.bookData command: find { find: "bookData", filter: { _id: "ID_1" }, limit: 1, singleBatch: true } planSummary: IDHACK keysExamined:1 docsExamined:1 idhack:1 cursorExhausted:1 keyUpdates:0 writeConflicts:0 numYields:1 nreturned:1 reslen:716 locks:{ Global: { acquireCount: { r: 4 } }, Database: { acquireCount: { r: 2 } }, Collection: { acquireCount: { r: 2 } } } protocol:op_query 102ms 2018-04-11T17:30:19.615+0200 I WRITE    [conn7] update pdfviewer.bookData query: { _id: "ID_1" } update: { _class: "model.BookData", _id: "ID_1", config: { mode: "normal", offlineEnabled: true }, metadata: { title: "PDFdePrueba3pag   copia  6 ", ...}, downloaded: false, currentPageNumber: 2, availablePages: 3, bookmarks: [], stats: { _id: "c919e517-3c68-462c-8396-d4ba391762e6", dateOpen: new Date(1523460575872), dateClose: new Date(1523460575951), timeZone: "+2", ... }, ... } keysExamined:1 docsExamined:1 nMatched:1 nModified:1 keyUpdates:0 writeConflicts:1 numYields:1 locks:{ Global: { acquireCount: { r: 2, w: 2 } }, Database: { acquireCount: { w: 2 } }, Collection: { acquireCount: { w: 2 } } } 315ms 2018-04-11T17:30:19.615+0200 I COMMAND  [conn7] command pdfviewer.$cmd command: update { update: "bookData", ordered: false, updates: [ { q: { _id: "ID_1" }, u: { _class: "model.BookData", _id: "ID_1", config: { mode: "normal", offlineEnabled: true }, metadata: { title: "PDFdePrueba3pag   copia  6 ", ...}, downloaded: false, currentPageNumber: 2, availablePages: 3, bookmarks: [], stats: { _id: "c919e517-3c68-462c-8396-d4ba391762e6", dateOpen: new Date(1523460575872), dateClose: new Date(1523460575951), timeZone: "+2", ... }, ... }, upsert: true } ] } keyUpdates:0 writeConflicts:0 numYields:0 reslen:55 locks:{ Global: { acquireCount: { r: 2, w: 2 } }, Database: { acquireCount: { w: 2 } }, Collection: { acquireCount: { w: 2 } } } protocol:op_query 316ms 

And this is what I see when it doesn't:

2018-04-11T18:13:21.864+0200 I NETWORK  [initandlisten] connection accepted from 127.0.0.1:64271 #1 (1 connection now open) 2018-04-11T18:18:51.425+0200 I NETWORK  [initandlisten] connection accepted from 127.0.0.1:64329 #2 (2 connections now open) 2018-04-11T18:19:06.967+0200 I NETWORK  [initandlisten] connection accepted from 127.0.0.1:64346 #3 (3 connections now open) 

By doing a tail -f1 on the log file while debugging, I've seen those connections appear right when my code calls findById() or save(), so it seems like the app can reach the DB.

This is (more or less) the relevant Java code:

/* BookData.java */ @Document public class BookData {      @Id private String id;     // Some more non-Id Strings...     private Config config;     private Metadata metadata;     private Boolean downloaded;     private Integer currentPageNumber;     private int availablePages;     private List<Bookmark> bookmarks;     private StatsModel stats;      @Transient private byte[] contents;      public BookData() {}      // getters and setters }  /* BookDataRepository.java */ // MongoRepository comes from spring-boot-starter-parent-1.4.5.RELEASE public interface BookDataRepository extends MongoRepository<BookData, String> {     BookData findById(String id); }  /* BookDataServiceImpl.java */ public BookData updateBookData(String id, BookData newData) {     final BookData original = bookDataRepository.findById(id);     if (original == null) {         return null;     }     original.setCurrentPageNumber(Optional.ofNullable(newData.getCurrentPageNumber()).orElseGet(original::getCurrentPageNumber));     // similar code for a couple other fields      return bookDataRepository.save(original); } 

I've stepped through that part a hundred times while debugging and everything seems to be OK:

  • findById(id) correctly returns the expected BookData original object: check ✓
  • newData contains the expected values to be used for updating: check ✓
  • right before calling save(original), original has been correctly modified using newData values: check ✓
  • save() executes without errors: check ✓
  • save() returns a new BookData with correctly updated values: to my own surprise, check ✓
  • after save() returns, a db.bookData.find() query in Mongo Shell shows that the values have been updated: fail.
  • after save() returns, the BookData object retrieved by new calls to findById() contains the updated values: fail (sometimes it does, sometimes it doesn't).

It just looks like MongoDB is waiting for some kind of flush(), but this is not a JPA repository where one could call saveAndFlush() instead.

Any ideas why this might be happening?

EDIT: versions (as requested):

  • Java 8
  • Spring Boot 1.4.5
  • MongoDB 3.2.6
  • Windows 10

I also included BookData above.

2 Answers

Answers 1

Problem solved.
A different async call from the JS client, to a different endpoint in the Java backend, was overwriting my updated document in a different thread with the original values.

Both update operations were calling findById before saving. Problem was that they did so at the same time, so they were getting the same original values.
Each then carried on with updating their relevant fields and calling save at the end, resulting in the other thread effectively overriding my changes.
Each call was logged with just the relevant modified fields, so I didn't realize that one of them was overwriting the changes of the other.

Once I added systemLog.verbosity: 3 to MongoDB's config.cfg so it logged all operations, it was clear that 2 different WRITE operations were happening at the same time (~500 ms apart) but using different values.
Then it was just a matter of moving the findById closer to the save and ensuring that the JS calls were done in order (by making one of the promises depend on the other).

In hindsight, this probably wouldn't have happened if I used MongoOperations or MongoTemplate, which offer single update and findAndModify methods that also allow single-field operations, instead of MongoRepository where I'm forced to do it in 3 steps (find, modify returned entity, save) and to work with the full document.


EDIT: I didn't really like my first "move findById closer to save" approach, so in the end I did what I felt was right and implemented custom save methods that used MongoTemplate's finer-grained update API. Final code:

/* MongoRepository provides entity-based default Spring Data methods */ /* BookDataRepositoryCustom provides field-level update methods */ public interface BookDataRepository extends MongoRepository<BookData, String>, BookDataRepositoryCustom {      BookData findById(String id);  }  /* Interface for the custom methods */ public interface BookDataRepositoryCustom {      int saveCurrentPage(String id, Integer currentPage); }  /* Custom implementation using MongoTemplate. */ @SuppressWarnings("unused") public class BookDataRepositoryImpl implements BookDataRepositoryCustom {     @Inject     MongoTemplate mongoTemplate;      @Override     public int saveCurrentPage(String id, Integer currentPage) {         Query query = new Query(Criteria.where("_id").is(id));         Update update = new Update();         update.set("currentPage", currentPage);          WriteResult result = mongoTemplate.updateFirst(query, update, BookData.class);          return result == null ? 0 : result.getN();     } }  // Old code: get entity from DB, update, save. 3 steps with plenty of room for interferences. //        BookData bookData = bookDataRepository.findById(bookDataId); //        bookData.setCurrentPage(currentPage); //        bookDataRepository.save(bookData); // New code: update single field. 1 step, 0 problems.         bookDataRepository.saveCurrentPage(bookDataId, currentPage); 

By doing so, each endpoint can update as frequently as needed via MongoTemplate without ever worrying about overwriting unrelated fields, and I still keep the entity-based MongoRepository methods for things like new entity creation, findBy methods, annotated @Querys etc.

Answers 2

MongoDB is inherently a cache store, by which I mean, the contents are not guaranteed to be latest or necessarily correct. I haven't been able to find the configuration options for flush time (but they would be configured in the DB itself), but MongoDB has added features so that you can choose fast+dirty, or slow+clean. This "freshness" factor is most likely your issue if you are seeing this kind of problem. (Even if you aren't running distributed, there is a timing difference between request acknowledge, and request committed)

Here is a link to post regarding "clean reading" (Key point in following quote)

http://www.dagolden.com/index.php/2633/no-more-dirty-reads-with-mongodb/

I encourage MongoDB users to place themselves (or at least, their application activities) into one of the following groups:

"I want low latency" – Dirty reads are OK as long as things are fast. Use w=1 and read concern 'local'. (These are the default settings.) "I want consistency" – Dirty reads are not OK, even at the cost of latency or slightly out of date data. Use w='majority' and read concern 'majority. use MongoDB v1.2.0;

my $mc = MongoDB->connect(     $uri,     {         read_concern_level => 'majority',         w => 'majority',     } ); 

further reading that may or may not be useful

Update

If running in a multi-thread environment, make sure your threads aren't trampling over another's updates. You can verify if this is happening by configuring the system or query logging level to 5. https://docs.mongodb.com/manual/reference/log-messages/#log-messages-configure-verbosity

If You Enjoyed This, Take 5 Seconds To Share It

0 comments:

Post a Comment