Tuesday, August 8, 2017

select all records holding some condition in has_many association - Ruby On Rails

Leave a Comment

I have a model profile.rb with following association

class User < ActiveRecord::Base    has_one :profile end  class Profile < ActiveRecord::Base     has_many :skills     belongs_to :user end 

I have a model skills.rb with following association

class Skill < ActiveRecord::Base     belongs_to :profile end 

I have following entries in skills table

id:         name:           profile_id: ==================================================== 1           accounting          1 2           martial arts        2 3           law                 1 4           accounting          2 5           journalist          3 6           administration      1 

and so on , how can i query all the profiles with ,lets say, "accounting" & "administration" skills which will be profile with id 1 considering the above recode. so far i have tried following

Profile.includes(:skills).where(skills: {name: ["accounting" , "administration"]} ) 

but instead of finding profile with id 1 - It gets me [ 1, 2 ] because profile with id 2 holds "accounting" skills and it's performing an "IN" operation in database

Note: I'm using postgresql and question is not only about a specific id of profile as described (which i used only as an example) - The original question is to get all the profiles which contain these two mentioned skills.

My activerecord join fires the following query in postgres

SELECT FROM "profiles" LEFT OUTER JOIN "skills" ON "skills"."profile_id" = "profiles"."id" WHERE "skills"."name" IN ('Accounting', 'Administration') 

In below Vijay Agrawal's answer is something which i already have in my application and both, his and mine, query use IN wildcard which result in profile ids which contain either of skills while my question is to get profile ids which contain both the skills. I'm sure that there must be a way to fix this thing in the same query way which is listed in original question and i'm curious to learn that way . I hope that i'll get some more help with you guys - thanks

For clarity, I want to query all the profiles with multiple skills in a model with has_many relationship with profile model - using the Profile as primary table not the skills

Reason for using Profile as primary table is that in pagination i don't want to get all skills from related table ,say 20_000 or more rows and then filter according to profile.state column . instead anyone would like to select only 5 records which meet the profile.state , profile.user.is_active and other columns condition and match the skills without retrieving thousands of irrelevant records and then filter them again.

5 Answers

Answers 1

You should do this to get all profile_ids which have both accounting and administration skills :

Skill.where(name: ["accounting", "administration"]).group(:profile_id).having("count('id') = 2").pluck(:profile_id) 

If you need profiles details, you can put this query in where clause of Profile for id.

Note the number 2 in query, it is length of your array used in where clause. In this case ["accounting", "administration"].length

UPDATE::

Based on updated question description, instead of pluck you can use select and add subquery to make sure it happens in one query.

Profile.where(id: Skill.where(name: ["accounting", "administration"]).group(:profile_id).having("count('id') = 2").select(:profile_id)) 

More over you have control over sorting, pagination and additional where clause. Don't see any concerns over there which are mentioned in question edit.

UPDATE 2::

Another way to get intersect of profiles with both the skills (likely to be less efficient than above solution):

profiles = Profile  ["accounting", "administration"].each do |name|   profiles = profiles.where(id: Skill.where(name: name).select(:profile_id)) end 

Answers 2

Profile.includes(:skills).where("skills.name" => %w(accounting administration)) 

For more information, read about finding through ActiveRecord associations.

Update

If this is not working for you then you likely do not have your database and models properly configured, because in a brand new Rails app this works as expected.

class CreateProfiles < ActiveRecord::Migration[5.1]   def change     create_table :profiles do |t|       t.timestamps     end   end end  class CreateSkills < ActiveRecord::Migration[5.1]   def change     create_table :skills do |t|       t.string :name       t.integer :profile_id       t.timestamps     end   end end  class Profile < ApplicationRecord   has_many :skills end  class Skill < ApplicationRecord   belongs_to :profile end  Profile.create Profile.create Skill.create(name: 'foo', profile_id: 1) Skill.create(name: 'bar', profile_id: 1) Skill.create(name: 'baz', profile_id: 2)  Profile.includes(:skills).where("skills.name" => %w(foo))   SQL (0.3ms)  SELECT  DISTINCT "profiles"."id" FROM "profiles" LEFT OUTER JOIN "skills" ON "skills"."profile_id" = "profiles"."id" WHERE "skills"."name" = 'foo' LIMIT ?  [["LIMIT", 11]]   SQL (0.1ms)  SELECT "profiles"."id" AS t0_r0, "profiles"."created_at" AS t0_r1, "profiles"."updated_at" AS t0_r2, "skills"."id" AS t1_r0, "skills"."name" AS t1_r1, "skills"."profile_id" AS t1_r2, "skills"."created_at" AS t1_r3, "skills"."updated_at" AS t1_r4 FROM "profiles" LEFT OUTER JOIN "skills" ON "skills"."profile_id" = "profiles"."id" WHERE "skills"."name" = 'foo' AND "profiles"."id" = 1  => #<ActiveRecord::Relation [#<Profile id: 1, created_at: "2017-07-28 21:52:56", updated_at: "2017-07-28 21:52:56">]>  Profile.includes(:skills).where("skills.name" => %w(bar))   SQL (0.3ms)  SELECT  DISTINCT "profiles"."id" FROM "profiles" LEFT OUTER JOIN "skills" ON "skills"."profile_id" = "profiles"."id" WHERE "skills"."name" = 'bar' LIMIT ?  [["LIMIT", 11]]   SQL (0.1ms)  SELECT "profiles"."id" AS t0_r0, "profiles"."created_at" AS t0_r1, "profiles"."updated_at" AS t0_r2, "skills"."id" AS t1_r0, "skills"."name" AS t1_r1, "skills"."profile_id" AS t1_r2, "skills"."created_at" AS t1_r3, "skills"."updated_at" AS t1_r4 FROM "profiles" LEFT OUTER JOIN "skills" ON "skills"."profile_id" = "profiles"."id" WHERE "skills"."name" = 'bar' AND "profiles"."id" = 1  => #<ActiveRecord::Relation [#<Profile id: 1, created_at: "2017-07-28 21:52:56", updated_at: "2017-07-28 21:52:56">]>  Profile.includes(:skills).where("skills.name" => %w(baz))   SQL (0.3ms)  SELECT  DISTINCT "profiles"."id" FROM "profiles" LEFT OUTER JOIN "skills" ON "skills"."profile_id" = "profiles"."id" WHERE "skills"."name" = 'baz' LIMIT ?  [["LIMIT", 11]]   SQL (0.1ms)  SELECT "profiles"."id" AS t0_r0, "profiles"."created_at" AS t0_r1, "profiles"."updated_at" AS t0_r2, "skills"."id" AS t1_r0, "skills"."name" AS t1_r1, "skills"."profile_id" AS t1_r2, "skills"."created_at" AS t1_r3, "skills"."updated_at" AS t1_r4 FROM "profiles" LEFT OUTER JOIN "skills" ON "skills"."profile_id" = "profiles"."id" WHERE "skills"."name" = 'baz' AND "profiles"."id" = 2  => #<ActiveRecord::Relation [#<Profile id: 2, created_at: "2017-07-28 21:53:34", updated_at: "2017-07-28 21:53:34">]> 

Update 2

Downvoting an answer because you changed your question later is poor form.

You should change your model relationships from has_many and belongs_to to has_and_belongs_to_many. This will allow you to stop recording a new skill every time; if someone adds the skill administration and then later on someone else adds that skill, you don't have to create a new skill. You just re-use the existing skill and associate it with multiple profiles:

class Profile < ApplicationRecord   has_and_belongs_to_many :skills end  class Skill < ApplicationRecord   has_and_belongs_to_many :profiles end 

Add a join table with a unique index (so each profile can have each skill once and only once):

class Join < ActiveRecord::Migration[5.1]   def change     create_table :profiles_skills, id: false do |t|       t.belongs_to :profile, index: true       t.belongs_to :skill, index: true       t.index ["profile_id", "skill_id"], name: "index_profiles_skills_on_profile_id_skill_id", unique: true, using: :btree     end   end end 

Create your models:

Profile.create Profile.create Skill.create(name: 'foo') Skill.create(name: 'bar') Skill.create(name: 'baz') Profile.first.skills << Skill.first Profile.first.skills << Skill.second Profile.second.skills << Skill.second Profile.second.skills << Skill.third 

And then run your query to return just the first profile:

skills = %w(foo bar).uniq Profile.includes(:skills).where('skills.name' => skills).group(:id).having("count(skills.id) >= #{skills.size}")   SQL (0.4ms)  SELECT  DISTINCT "profiles"."id" FROM "profiles" LEFT OUTER JOIN "profiles_skills" ON "profiles_skills"."profile_id" = "profiles"."id" LEFT OUTER JOIN "skills" ON "skills"."id" = "profiles_skills"."skill_id" WHERE "skills"."name" IN ('foo', 'bar') GROUP BY "profiles"."id" HAVING (count(skills.id) = 2) LIMIT ?  [["LIMIT", 11]]   SQL (0.2ms)  SELECT "profiles"."id" AS t0_r0, "profiles"."created_at" AS t0_r1, "profiles"."updated_at" AS t0_r2, "skills"."id" AS t1_r0, "skills"."name" AS t1_r1, "skills"."profile_id" AS t1_r2, "skills"."created_at" AS t1_r3, "skills"."updated_at" AS t1_r4 FROM "profiles" LEFT OUTER JOIN "profiles_skills" ON "profiles_skills"."profile_id" = "profiles"."id" LEFT OUTER JOIN "skills" ON "skills"."id" = "profiles_skills"."skill_id" WHERE "skills"."name" IN ('foo', 'bar') AND "profiles"."id" = 1 GROUP BY "profiles"."id" HAVING (count(skills.id) = 2)  => #<ActiveRecord::Relation [#<Profile id: 1, created_at: "2017-07-28 21:52:56", updated_at: "2017-07-28 21:52:56">]> 

Confirm with additional testing:

Should return both profiles:

skills = %w(bar).uniq Profile.includes(:skills).where('skills.name' => skills).group(:id).having("count(skills.id) >= #{skills.size}")   SQL (0.4ms)  SELECT  DISTINCT "profiles"."id" FROM "profiles" LEFT OUTER JOIN "profiles_skills" ON "profiles_skills"."profile_id" = "profiles"."id" LEFT OUTER JOIN "skills" ON "skills"."id" = "profiles_skills"."skill_id" WHERE "skills"."name" = 'bar' GROUP BY "profiles"."id" HAVING (count(skills.id) >= 1) LIMIT ?  [["LIMIT", 11]]   SQL (0.3ms)  SELECT "profiles"."id" AS t0_r0, "profiles"."created_at" AS t0_r1, "profiles"."updated_at" AS t0_r2, "skills"."id" AS t1_r0, "skills"."name" AS t1_r1, "skills"."profile_id" AS t1_r2, "skills"."created_at" AS t1_r3, "skills"."updated_at" AS t1_r4 FROM "profiles" LEFT OUTER JOIN "profiles_skills" ON "profiles_skills"."profile_id" = "profiles"."id" LEFT OUTER JOIN "skills" ON "skills"."id" = "profiles_skills"."skill_id" WHERE "skills"."name" = 'bar' AND "profiles"."id" IN (1, 2) GROUP BY "profiles"."id" HAVING (count(skills.id) >= 1)  => #<ActiveRecord::Relation [#<Profile id: 1, created_at: "2017-07-28 21:52:56", updated_at: "2017-07-28 21:52:56">, #<Profile id: 2, created_at: "2017-07-28 21:53:34", updated_at: "2017-07-28 21:53:34">]> 

Should return just the second profile:

skills = %w(bar baz).uniq   SQL (0.3ms)  SELECT  DISTINCT "profiles"."id" FROM "profiles" LEFT OUTER JOIN "profiles_skills" ON "profiles_skills"."profile_id" = "profiles"."id" LEFT OUTER JOIN "skills" ON "skills"."id" = "profiles_skills"."skill_id" WHERE "skills"."name" IN ('bar', 'baz') GROUP BY "profiles"."id" HAVING (count(skills.id) >= 2) LIMIT ?  [["LIMIT", 11]]   SQL (0.2ms)  SELECT "profiles"."id" AS t0_r0, "profiles"."created_at" AS t0_r1, "profiles"."updated_at" AS t0_r2, "skills"."id" AS t1_r0, "skills"."name" AS t1_r1, "skills"."profile_id" AS t1_r2, "skills"."created_at" AS t1_r3, "skills"."updated_at" AS t1_r4 FROM "profiles" LEFT OUTER JOIN "profiles_skills" ON "profiles_skills"."profile_id" = "profiles"."id" LEFT OUTER JOIN "skills" ON "skills"."id" = "profiles_skills"."skill_id" WHERE "skills"."name" IN ('bar', 'baz') AND "profiles"."id" = 2 GROUP BY "profiles"."id" HAVING (count(skills.id) >= 2)  => #<ActiveRecord::Relation [#<Profile id: 2, created_at: "2017-07-28 21:53:34", updated_at: "2017-07-28 21:53:34">]> 

Should return no profiles:

skills = %w(foo baz).uniq Profile.includes(:skills).where('skills.name' => skills).group(:id).having("count(skills.id) >= #{skills.size}")   SQL (0.3ms)  SELECT  DISTINCT "profiles"."id" FROM "profiles" LEFT OUTER JOIN "profiles_skills" ON "profiles_skills"."profile_id" = "profiles"."id" LEFT OUTER JOIN "skills" ON "skills"."id" = "profiles_skills"."skill_id" WHERE "skills"."name" IN ('foo', 'baz') GROUP BY "profiles"."id" HAVING (count(skills.id) >= 2) LIMIT ?  [["LIMIT", 11]]  => #<ActiveRecord::Relation []> 

Answers 3

PostgreSQL dependent solution:

where_clause = <<~SQL   ARRAY(     SELECT name FROM skills WHERE profile_id = profiles.id   ) @> ARRAY[?] SQL Profile.where(where_clause, %w[skill1 skill2]) 

It works, but it makes sense to change DB structure for speed-up. There are two options:

  • has_and_belongs_to_many way adds consistency (skills tables turns into the dictionary) and ability to use indexes
  • skills as array|jsonb column of profile - adds fast search by index without sub-selects or joins.

Answers 4

I would use EXISTS with a correlated sub-query, like this:

required_skills = %w{accounting administration} q = Profile.where("1=1") required_skills.each do |sk|   q = q.where(<<-EOQ, sk)     EXISTS (SELECT 1             FROM   skills s             WHERE  s.profile_id = profiles.id             AND    s.name = ?)   EOQ end 

There are some other ideas at this similar question but I think in your case multiple EXISTS clauses is the simplest and most likely fastest.

(By the way in Rails 4+ you can start with Profile.all instead of Profile.where("1=1"), because all returns a Relation, but in the old days it used to return an array.)

Answers 5

The problem with the following query

Profile.includes(:skills).where(skills: { name: ["accounting" , "administration"] } ) is that it create a query with IN operator like IN ('Accounting', 'Administration')

Now as per the SQA standard, it will match all the records which matches any value and not all the values from the array.

Here is a simplest solution

skills = ["accounting" , "administration"]  Profile.includes(:skills).where(skills: { name: skills }).group(:profile_id).having("count(*) = #{skills.length}") 

P.S. This assumes you will have at least one skill. Adjust having condition as per your usecase

If You Enjoyed This, Take 5 Seconds To Share It

0 comments:

Post a Comment