Thursday, July 19, 2018

Best JPA entity model for Hashtags

Leave a Comment

I need to design a JPA entity model for hashtags in my application similar to Instagram, twitter and Stack Overflow. My application. The following are the schema specific points that our application uses

  • Hashtag should be belongs to a question
  • Hashtag is user specific.
  • Each user can tag their question with same hashtag

Currently, I have two JPA model for the above schema

Common tables applicable for both models

table: questions

columns: id, question, description, etc.,

table: users

columns: id, name, role, group, etc.,

Model 1

table: question_hash_tags

columns: id, question_id, user_id, hashtag_text

Model 2

table: hashtags

columns: id, hastag_text

table: user_hashtags

columns: user_id, hashtag_id, question_id

Model 1 will have each row even if hashtag is same between users.

Model 2 will have unique hashtag row and it is reffered across users using user_hashtags.

I am expecting a better and standard model beyond these two.

Note: Questions are searchable based on hashtags as well as users

2 Answers

Answers 1

We have three "things" here: Users, Questions, and Hashtags. I'd avoid model #1 in your example mainly because it's not flexible. What happens when the product owner later decides to add a description to tags? What happens if the product owner next week wants a user to only choose tags from an existing pool? While your domain requirements are incomplete and unclear, a more flexible solution allows for implementing future functionality on top of your model without major refactoring. With that said, I would definitely consider Hashtag to be its own independent entity.

In your second model, the tertiary relationship represented in the USER_HASHTAGS table seems to assume an optional many-to-many relationship between Users and Questions. If indeed your domain requires that many users can author the same question, I think model #2 would suite your needs. More likely, your requirement might explicitly restrict the ability for more than one user to author a single specific question. If so, such constraints should be accommodated in the model. Plus if a user decides to tag a question with 5 different tags, that one-to-many relationship between users and questions would be asserted 5 times in your user_hashtags table for each tag. A simple query to display the questions authored by a user would involve using DISTINCT which should immediately raise alarm bells about the design.

Assuming a one-to-many relationship between users and questions, I'd remove the USER_ID from your "USER_HASHTAGS" table and instead put the USER_ID into the QUESTIONS table as a foreign key. Then rename your USER_HASHTAGS table to QUESTION_HASHTAGS. That makes it both simple and performant because now you can simply query a single table to get the questions authored by a specific USER_ID without joining and adding who-knows-how-many duplicates the database would need to filter with DISTINCT. Also now supported is the possibility where a user chooses not to include a hashtag for their question (no blank foreign key).

NOTE:

There are many factors that influence the physical design of a database - not only the "select" access patterns, but potentially even the relative frequency of each. The ratio between database writes to reads, what can be updated, and how often in relation to other updates, etc. can also influence how you structure your tables in the end. So there is no "definitive" answer, just an answer based on a few assumptions and the limited information provided in your question.

Answers 2

Hashtags are already a kind of ID, so no need for a dedicated table for them. All you need is questions table:

create table questions (     id bigint not null constraint questions_pkey primary key,     user_id bigint constraint fk_questions_users references users,     question text not null; ) 

and questions_hashtags relation table with index by hashtag field:

create table questions_hashtags (     question_id bigint not null fk_questions_hashtags_questions references questions,     hashtag text not null,     constraint uk_questions_hashtags unique (question_id, hashtag) );  create index index_questions_hashtags_hashtag on questions_hashtags(hashtag); 

(here is a PostgreSQL dialect.)

These tables simple map to the single(!) entity (not considering User entity):

@Entity @Table(name = "questions") public class Question {     @Id     @GeneratedValue     private Long id;      @Column(nullable = false)     private String question;      @ManyToOne(optional = false)     private User user;      @CollectionTable(name = "questions_hashtags", joinColumns = @JoinColumn(name = "question_id"))     @Column(name = "hashtag")     @ElementCollection(fetch = FetchType.EAGER)     @BatchSize(size = 20)     private Set<String> hashtags = new HashSet<>();      public Question(User user, String question) {         this.user = user;         this.question = question;     }      private Set<String> extractHashtags() {       // extract hashtags from question to Set...     }      @PrePersist     @PreUpdate     private void populateHashtags() {         hashtags.clear();         hashtags.addAll(extractHashtags());     }      // other stuff } 

It's a very convenient model. To create and save question with hashtags you simply do something like this:

questionRepo.save(new Question(user, question)); 

To get all hashtags you can use this query method of your questionRepo:

@Query("select distinct h as hashtag from Question q join q.hashtags h") List<String> getAllHashtags(); 

To find all questions related to particular hashtag you can use the following query method:

@Query("select q from Question q join q.hashtags h where h = ?1") List<Question> getQuestionsByHashtag(String hashtag); 

To find questions by several hashtags you can use this method:

@Query("select q from Question q join q.hashtags h where h in (?1)") List<Question> getQuestionsByHashtag(Set<String> hashtags); 

And to find users related to the given hashtags you can use this method:

@Query("select distinct u from Question q join q.user u join q.hashtags h where h in (?1)") List<User> getUsersByHashtag(Set<String> hashtags); 

See my REST service example of hashtag usage sb-hashtag-usage-example:

1) POST /users - create new user

{     "name": "user1" } 

2) POST /questions - create new question

{     "question": "How implement best JPA #entity #model for Hashtags?",     "user": "/user/1" } 

3) GET /hashtags - get all hashtags

4) GET/questions/search/by_hashtag?hashtag=%23model - get questions by one hashtag

5) GET /questions/search/by_hashtags?hashtags=%23entity,%23model - get questions by several hashtags

6) GET /users/search/by_hashtags?hashtags=%23entity - get users by several hashtags

(Other methods like PATCH, DELETE also available.)

If You Enjoyed This, Take 5 Seconds To Share It

0 comments:

Post a Comment