Thursday, July 19, 2018

Sparksql query an array<string> contains a string,use predicate pushdown

Leave a Comment

I'm trying to query an array in ElasticSearch

data: "names":[{"name":"allen"},{"name":"bill"},{"name":"dave"},{"name":"poter"}] goal: "select names from table where array_contains(names.name, "bill")" 

but spark won't do predicate pushdown if SQL statement use array_contains function.
hint: names.name = ["allen","bill","dave","poter"]
I've tried

select * from table where array_contains(names.name,"bill")  -- and   select explode(names.name) as name from table as t1;select * from t1 where name = "bill"  -- and   select * from table where cast(names.name as string) like '%bill%' 

All failed to do pushdown, any other ways to do it?

1 Answers

Answers 1

The failure to pushdown is expected. For predicate to be delegated you need a Data Source support, and ElasticSearch connector doesn't list array_contains among pushed operations, which as today include:

  • =, => , <, >= , <=
  • is_null / is_not_null
  • in
  • String[Starts|Ends]With, StringContains
  • NULL safe equality.
  • Application of Boolean operators AND / OR / NOT.

Also any additional transformations (including CAST) disable predicate pushdown.

If You Enjoyed This, Take 5 Seconds To Share It

0 comments:

Post a Comment