Saturday, July 8, 2017

Spark streaming mapWithState timeout without remove

Imagine a use case where events stream in per user, but only each user's first week of events is of interest. Within that time frame, stateful logic runs using mapWithState. After that period, the user's incoming events should be disregarded.

Because each user's state takes memory, it makes sense to replace it, once the user's first week is over, with a simple already-seen marker.

If any event comes in for that user a week or more after their first event, it is easy to change the state to that already-seen marker.
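The transition described above can be sketched as a plain Java model of the per-user update, with no Spark dependency. The class name `FirstWeekState`, its fields, and the use of an event count as the "stateful logic" are all assumptions made for illustration; in a real job this logic would sit inside the mapping function passed to mapWithState.

```java
import java.io.Serializable;
import java.time.Duration;
import java.util.Optional;

/** Hypothetical per-user state: live data during the first week, then a bare marker. */
final class FirstWeekState implements Serializable {
    static final long WEEK_MS = Duration.ofDays(7).toMillis();

    final long firstEventMs; // timestamp of the user's first event
    final long eventCount;   // stand-in for whatever the real stateful logic tracks
    final boolean marker;    // true once the state has been reduced to "already seen"

    private FirstWeekState(long firstEventMs, long eventCount, boolean marker) {
        this.firstEventMs = firstEventMs;
        this.eventCount = eventCount;
        this.marker = marker;
    }

    static FirstWeekState seenMarker() {
        return new FirstWeekState(0L, 0L, true);
    }

    /** Computes the next state for one incoming event; previous is empty for a new user. */
    static FirstWeekState onEvent(Optional<FirstWeekState> previous, long eventMs) {
        if (!previous.isPresent()) {
            return new FirstWeekState(eventMs, 1L, false);
        }
        FirstWeekState s = previous.get();
        // Already reduced, or this event arrives a week or more after the first one.
        if (s.marker || eventMs - s.firstEventMs >= WEEK_MS) {
            return seenMarker();
        }
        return new FirstWeekState(s.firstEventMs, s.eventCount + 1, false);
    }
}
```

Note that the switch to the marker only happens inside `onEvent`, so in this model it is always triggered by an incoming event for that user.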

But if no events arrive after that week, the state never changes to the already-seen marker, and the full state continues to occupy memory forever.

As far as I understand, adding a timeout to the user's state will not help, because you are not allowed to update the state of a key that is timing out (which makes sense, as it is about to be removed).
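To make the restriction concrete, here is a small plain Java model of it. `ModelState` is a hypothetical stand-in for Spark's `State` object and models only `get`, `update`, and `isTimingOut`; `TimeoutGuard` and the output strings are likewise invented for illustration. The exception type follows my reading of the Spark API documentation for `State.update`, so treat it as an assumption.

```java
import java.util.Optional;

/** Hypothetical stand-in for Spark's State<S>; only the calls used below are modeled. */
final class ModelState<S> {
    private S value;
    private final boolean timingOut;

    ModelState(S value, boolean timingOut) {
        this.value = value;
        this.timingOut = timingOut;
    }

    boolean isTimingOut() { return timingOut; }

    S get() { return value; }

    void update(S newValue) {
        // Models the restriction from the question: no updates for a key that is timing out.
        if (timingOut) {
            throw new IllegalArgumentException("Cannot update the state that is timing out");
        }
        value = newValue;
    }
}

final class TimeoutGuard {
    /** Returns the record that would be emitted downstream for this invocation. */
    static String handle(String userId, Optional<String> event, ModelState<Long> state) {
        if (state.isTimingOut()) {
            // Final call for this key: event is empty, and the state can be read but not rewritten.
            return userId + ":expired-after-" + state.get() + "-events";
        }
        state.update(state.get() + 1);
        return userId + ":counted";
    }
}
```

In this model the timeout branch can still read the old state and emit a record, but it has no way to leave a smaller marker behind under the same key, which is exactly the gap the question is about.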

Is there a simple way to achieve this use case?

0 Answers
