Sunday, March 6, 2016

Looping using Hiveql

Leave a Comment

I'm trying to merge 2 datasets, say A and B. The dataset A has a variable "Flag" which takes 2 values. Rather than jut merging both data together I was trying to merge 2 datasets based on "flag" variable.

The merging code is the following:

create table new_data as select a.*,b.y from A as a left join B as b on a.x=b.x 

Since I'm running Hive code through CLI, I'm calling this through the following command

hive -f new_data.hql 

The looping part of the code I'm calling to merge data based on "Flag" variable is the following:

for flag in 1 2; do   hive -hivevar flag=$flag -f new_data.hql done 

I put the above code in another ".hql" file asn calling it:

hive -f loop_data.hql 

But it's throwing error.

cannot recognize input near 'for' 'flag' 'in'

Can anybody please tell me where I'm making mistake.

Thanks!

1 Answers

Answers 1

  1. You should be adding the loop logic to shell script.

File Name: loop_data.sh

for flag in 1 2; do   hive -hivevar flag=$flag -f new_data.hql done 

And execute the script like:

sh loop_data.sh 
  1. In your new_data.hql script, you are creating table. Since you should split out the DDL & DML in 2 separate scripts. Like

DDL: create_new_data.hql

create table new_data as select    a.*,   b.y from    A as a left join    B as b on    a.x = b.x where    1 = 0; 

DML: insert_new_data.hql

insert into new_data  select    a.*,   b.y from    A as a left join    B as b on    a.x = b.x where   flag = ${hiveconf:flag} 

And update you shell script like:

File Name: loop_new_data.sh

# Create table hive -f create_new_data.hql  # Insert data for flag in 1 2; do   hive -hiveconf flag=$flag -f insert_new_data.hql done 

And execute it like:

sh loop_new_data.sh 

Let me know if you want more info.

If You Enjoyed This, Take 5 Seconds To Share It

0 comments:

Post a Comment