I have a model profile.rb
with following association
class User < ActiveRecord::Base has_one :profile end class Profile < ActiveRecord::Base has_many :skills belongs_to :user end
I have a model skills.rb
with following association
class Skill < ActiveRecord::Base belongs_to :profile end
I have following entries in skills table
id: name: profile_id: ==================================================== 1 accounting 1 2 martial arts 2 3 law 1 4 accounting 2 5 journalist 3 6 administration 1
and so on , how can i query all the profiles with ,lets say, "accounting" & "administration" skills which will be profile with id 1 considering the above recode. so far i have tried following
Profile.includes(:skills).where(skills: {name: ["accounting" , "administration"]} )
but instead of finding profile with id 1
- It gets me [ 1, 2 ]
because profile with id 2 holds "accounting" skills
and it's performing an "IN" operation
in database
Note: I'm using postgresql
and question is not only about a specific id of profile as described (which i used only as an example) - The original question is to get all the profiles which contain these two mentioned skills.
My activerecord join fires the following query in postgres
SELECT FROM "profiles" LEFT OUTER JOIN "skills" ON "skills"."profile_id" = "profiles"."id" WHERE "skills"."name" IN ('Accounting', 'Administration')
In below Vijay Agrawal's answer is something which i already have in my application and both, his and mine, query use IN
wildcard which result in profile ids which contain either of skills while my question is to get profile ids which contain both the skills. I'm sure that there must be a way to fix this thing in the same query way which is listed in original question and i'm curious to learn that way . I hope that i'll get some more help with you guys - thanks
For clarity, I want to query all the profiles with multiple skills in a model with has_many relationship with profile model - using the Profile
as primary table not the skills
Reason for using Profile as primary table is that in pagination i don't want to get all skills from related table ,say 20_000 or more rows and then filter according to profile.state
column . instead anyone would like to select only 5 records
which meet the profile.state , profile.user.is_active and other columns condition
and match the skills without retrieving thousands of irrelevant records and then filter them again.
5 Answers
Answers 1
You should do this to get all profile_id
s which have both accounting and administration skills :
Skill.where(name: ["accounting", "administration"]).group(:profile_id).having("count('id') = 2").pluck(:profile_id)
If you need profiles details, you can put this query in where clause of Profile
for id
.
Note the number 2
in query, it is length of your array used in where clause. In this case ["accounting", "administration"].length
UPDATE::
Based on updated question description, instead of pluck
you can use select
and add subquery to make sure it happens in one query.
Profile.where(id: Skill.where(name: ["accounting", "administration"]).group(:profile_id).having("count('id') = 2").select(:profile_id))
More over you have control over sorting, pagination and additional where clause. Don't see any concerns over there which are mentioned in question edit.
UPDATE 2::
Another way to get intersect of profiles with both the skills (likely to be less efficient than above solution):
profiles = Profile ["accounting", "administration"].each do |name| profiles = profiles.where(id: Skill.where(name: name).select(:profile_id)) end
Answers 2
Profile.includes(:skills).where("skills.name" => %w(accounting administration))
For more information, read about finding through ActiveRecord associations.
Update
If this is not working for you then you likely do not have your database and models properly configured, because in a brand new Rails app this works as expected.
class CreateProfiles < ActiveRecord::Migration[5.1] def change create_table :profiles do |t| t.timestamps end end end class CreateSkills < ActiveRecord::Migration[5.1] def change create_table :skills do |t| t.string :name t.integer :profile_id t.timestamps end end end class Profile < ApplicationRecord has_many :skills end class Skill < ApplicationRecord belongs_to :profile end Profile.create Profile.create Skill.create(name: 'foo', profile_id: 1) Skill.create(name: 'bar', profile_id: 1) Skill.create(name: 'baz', profile_id: 2) Profile.includes(:skills).where("skills.name" => %w(foo)) SQL (0.3ms) SELECT DISTINCT "profiles"."id" FROM "profiles" LEFT OUTER JOIN "skills" ON "skills"."profile_id" = "profiles"."id" WHERE "skills"."name" = 'foo' LIMIT ? [["LIMIT", 11]] SQL (0.1ms) SELECT "profiles"."id" AS t0_r0, "profiles"."created_at" AS t0_r1, "profiles"."updated_at" AS t0_r2, "skills"."id" AS t1_r0, "skills"."name" AS t1_r1, "skills"."profile_id" AS t1_r2, "skills"."created_at" AS t1_r3, "skills"."updated_at" AS t1_r4 FROM "profiles" LEFT OUTER JOIN "skills" ON "skills"."profile_id" = "profiles"."id" WHERE "skills"."name" = 'foo' AND "profiles"."id" = 1 => #<ActiveRecord::Relation [#<Profile id: 1, created_at: "2017-07-28 21:52:56", updated_at: "2017-07-28 21:52:56">]> Profile.includes(:skills).where("skills.name" => %w(bar)) SQL (0.3ms) SELECT DISTINCT "profiles"."id" FROM "profiles" LEFT OUTER JOIN "skills" ON "skills"."profile_id" = "profiles"."id" WHERE "skills"."name" = 'bar' LIMIT ? [["LIMIT", 11]] SQL (0.1ms) SELECT "profiles"."id" AS t0_r0, "profiles"."created_at" AS t0_r1, "profiles"."updated_at" AS t0_r2, "skills"."id" AS t1_r0, "skills"."name" AS t1_r1, "skills"."profile_id" AS t1_r2, "skills"."created_at" AS t1_r3, "skills"."updated_at" AS t1_r4 FROM "profiles" LEFT OUTER JOIN "skills" ON "skills"."profile_id" = "profiles"."id" WHERE "skills"."name" = 'bar' AND "profiles"."id" = 1 => #<ActiveRecord::Relation [#<Profile id: 1, created_at: "2017-07-28 21:52:56", updated_at: "2017-07-28 21:52:56">]> Profile.includes(:skills).where("skills.name" => %w(baz)) SQL (0.3ms) SELECT DISTINCT "profiles"."id" FROM "profiles" LEFT OUTER JOIN "skills" ON "skills"."profile_id" = "profiles"."id" WHERE "skills"."name" = 'baz' LIMIT ? [["LIMIT", 11]] SQL (0.1ms) SELECT "profiles"."id" AS t0_r0, "profiles"."created_at" AS t0_r1, "profiles"."updated_at" AS t0_r2, "skills"."id" AS t1_r0, "skills"."name" AS t1_r1, "skills"."profile_id" AS t1_r2, "skills"."created_at" AS t1_r3, "skills"."updated_at" AS t1_r4 FROM "profiles" LEFT OUTER JOIN "skills" ON "skills"."profile_id" = "profiles"."id" WHERE "skills"."name" = 'baz' AND "profiles"."id" = 2 => #<ActiveRecord::Relation [#<Profile id: 2, created_at: "2017-07-28 21:53:34", updated_at: "2017-07-28 21:53:34">]>
Update 2
Downvoting an answer because you changed your question later is poor form.
You should change your model relationships from has_many
and belongs_to
to has_and_belongs_to_many
. This will allow you to stop recording a new skill every time; if someone adds the skill administration
and then later on someone else adds that skill, you don't have to create a new skill. You just re-use the existing skill and associate it with multiple profiles:
class Profile < ApplicationRecord has_and_belongs_to_many :skills end class Skill < ApplicationRecord has_and_belongs_to_many :profiles end
Add a join table with a unique index (so each profile can have each skill once and only once):
class Join < ActiveRecord::Migration[5.1] def change create_table :profiles_skills, id: false do |t| t.belongs_to :profile, index: true t.belongs_to :skill, index: true t.index ["profile_id", "skill_id"], name: "index_profiles_skills_on_profile_id_skill_id", unique: true, using: :btree end end end
Create your models:
Profile.create Profile.create Skill.create(name: 'foo') Skill.create(name: 'bar') Skill.create(name: 'baz') Profile.first.skills << Skill.first Profile.first.skills << Skill.second Profile.second.skills << Skill.second Profile.second.skills << Skill.third
And then run your query to return just the first profile:
skills = %w(foo bar).uniq Profile.includes(:skills).where('skills.name' => skills).group(:id).having("count(skills.id) >= #{skills.size}") SQL (0.4ms) SELECT DISTINCT "profiles"."id" FROM "profiles" LEFT OUTER JOIN "profiles_skills" ON "profiles_skills"."profile_id" = "profiles"."id" LEFT OUTER JOIN "skills" ON "skills"."id" = "profiles_skills"."skill_id" WHERE "skills"."name" IN ('foo', 'bar') GROUP BY "profiles"."id" HAVING (count(skills.id) = 2) LIMIT ? [["LIMIT", 11]] SQL (0.2ms) SELECT "profiles"."id" AS t0_r0, "profiles"."created_at" AS t0_r1, "profiles"."updated_at" AS t0_r2, "skills"."id" AS t1_r0, "skills"."name" AS t1_r1, "skills"."profile_id" AS t1_r2, "skills"."created_at" AS t1_r3, "skills"."updated_at" AS t1_r4 FROM "profiles" LEFT OUTER JOIN "profiles_skills" ON "profiles_skills"."profile_id" = "profiles"."id" LEFT OUTER JOIN "skills" ON "skills"."id" = "profiles_skills"."skill_id" WHERE "skills"."name" IN ('foo', 'bar') AND "profiles"."id" = 1 GROUP BY "profiles"."id" HAVING (count(skills.id) = 2) => #<ActiveRecord::Relation [#<Profile id: 1, created_at: "2017-07-28 21:52:56", updated_at: "2017-07-28 21:52:56">]>
Confirm with additional testing:
Should return both profiles:
skills = %w(bar).uniq Profile.includes(:skills).where('skills.name' => skills).group(:id).having("count(skills.id) >= #{skills.size}") SQL (0.4ms) SELECT DISTINCT "profiles"."id" FROM "profiles" LEFT OUTER JOIN "profiles_skills" ON "profiles_skills"."profile_id" = "profiles"."id" LEFT OUTER JOIN "skills" ON "skills"."id" = "profiles_skills"."skill_id" WHERE "skills"."name" = 'bar' GROUP BY "profiles"."id" HAVING (count(skills.id) >= 1) LIMIT ? [["LIMIT", 11]] SQL (0.3ms) SELECT "profiles"."id" AS t0_r0, "profiles"."created_at" AS t0_r1, "profiles"."updated_at" AS t0_r2, "skills"."id" AS t1_r0, "skills"."name" AS t1_r1, "skills"."profile_id" AS t1_r2, "skills"."created_at" AS t1_r3, "skills"."updated_at" AS t1_r4 FROM "profiles" LEFT OUTER JOIN "profiles_skills" ON "profiles_skills"."profile_id" = "profiles"."id" LEFT OUTER JOIN "skills" ON "skills"."id" = "profiles_skills"."skill_id" WHERE "skills"."name" = 'bar' AND "profiles"."id" IN (1, 2) GROUP BY "profiles"."id" HAVING (count(skills.id) >= 1) => #<ActiveRecord::Relation [#<Profile id: 1, created_at: "2017-07-28 21:52:56", updated_at: "2017-07-28 21:52:56">, #<Profile id: 2, created_at: "2017-07-28 21:53:34", updated_at: "2017-07-28 21:53:34">]>
Should return just the second profile:
skills = %w(bar baz).uniq SQL (0.3ms) SELECT DISTINCT "profiles"."id" FROM "profiles" LEFT OUTER JOIN "profiles_skills" ON "profiles_skills"."profile_id" = "profiles"."id" LEFT OUTER JOIN "skills" ON "skills"."id" = "profiles_skills"."skill_id" WHERE "skills"."name" IN ('bar', 'baz') GROUP BY "profiles"."id" HAVING (count(skills.id) >= 2) LIMIT ? [["LIMIT", 11]] SQL (0.2ms) SELECT "profiles"."id" AS t0_r0, "profiles"."created_at" AS t0_r1, "profiles"."updated_at" AS t0_r2, "skills"."id" AS t1_r0, "skills"."name" AS t1_r1, "skills"."profile_id" AS t1_r2, "skills"."created_at" AS t1_r3, "skills"."updated_at" AS t1_r4 FROM "profiles" LEFT OUTER JOIN "profiles_skills" ON "profiles_skills"."profile_id" = "profiles"."id" LEFT OUTER JOIN "skills" ON "skills"."id" = "profiles_skills"."skill_id" WHERE "skills"."name" IN ('bar', 'baz') AND "profiles"."id" = 2 GROUP BY "profiles"."id" HAVING (count(skills.id) >= 2) => #<ActiveRecord::Relation [#<Profile id: 2, created_at: "2017-07-28 21:53:34", updated_at: "2017-07-28 21:53:34">]>
Should return no profiles:
skills = %w(foo baz).uniq Profile.includes(:skills).where('skills.name' => skills).group(:id).having("count(skills.id) >= #{skills.size}") SQL (0.3ms) SELECT DISTINCT "profiles"."id" FROM "profiles" LEFT OUTER JOIN "profiles_skills" ON "profiles_skills"."profile_id" = "profiles"."id" LEFT OUTER JOIN "skills" ON "skills"."id" = "profiles_skills"."skill_id" WHERE "skills"."name" IN ('foo', 'baz') GROUP BY "profiles"."id" HAVING (count(skills.id) >= 2) LIMIT ? [["LIMIT", 11]] => #<ActiveRecord::Relation []>
Answers 3
PostgreSQL
dependent solution:
where_clause = <<~SQL ARRAY( SELECT name FROM skills WHERE profile_id = profiles.id ) @> ARRAY[?] SQL Profile.where(where_clause, %w[skill1 skill2])
It works, but it makes sense to change DB structure for speed-up. There are two options:
has_and_belongs_to_many
way adds consistency (skills
tables turns into the dictionary) and ability to use indexesskills
asarray
|jsonb
column ofprofile
- adds fast search by index without sub-selects or joins.
Answers 4
I would use EXISTS
with a correlated sub-query, like this:
required_skills = %w{accounting administration} q = Profile.where("1=1") required_skills.each do |sk| q = q.where(<<-EOQ, sk) EXISTS (SELECT 1 FROM skills s WHERE s.profile_id = profiles.id AND s.name = ?) EOQ end
There are some other ideas at this similar question but I think in your case multiple EXISTS
clauses is the simplest and most likely fastest.
(By the way in Rails 4+ you can start with Profile.all
instead of Profile.where("1=1")
, because all
returns a Relation
, but in the old days it used to return an array.)
Answers 5
The problem with the following query
Profile.includes(:skills).where(skills: { name: ["accounting" , "administration"] } )
is that it create a query with IN
operator like IN ('Accounting', 'Administration')
Now as per the SQA standard, it will match all the records which matches any value and not all the values from the array.
Here is a simplest solution
skills = ["accounting" , "administration"] Profile.includes(:skills).where(skills: { name: skills }).group(:profile_id).having("count(*) = #{skills.length}")
P.S. This assumes you will have at least one skill. Adjust having
condition as per your usecase
0 comments:
Post a Comment