
Monday 6 March 2017

PIG LOAD DUMP STORE DESCRIBE ORDER FILTER JOINS

grunt> rec1 = load 'emp.txt' using PigStorage(',') as (eno:int,ename:chararray,sal:int,dname:chararray);
grunt> dump rec1;

grunt> hsal = filter rec1 by sal > 25000;
grunt> dump hsal;

grunt> store hsal into 'pigoutput99' using PigStorage (',');



[training@localhost ~]$ hadoop fs -cat pigoutput99/part-m-00000
101,aaa,34000,hr
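The LOAD, FILTER and STORE steps above can be modelled in a few lines of plain Python. This is a sketch, not Pig: it assumes emp.txt contains the five records that appear in the GROUP output later in this post (the file itself is not shown), and the helper names load_pigstorage, filter_relation and store_pigstorage are hypothetical.

```python
# A plain-Python model of LOAD ... USING PigStorage(','), FILTER and STORE.
# Not Pig itself; the emp.txt contents are assumed from the GROUP output in this post.

EMP_TXT = """101,aaa,34000,hr
102,bbb,23000,fin
103,ccc,23000,fin
104,ddd,25000,mark
105,eee,24000,hr
"""

# Schema from the LOAD statement: (eno:int, ename:chararray, sal:int, dname:chararray)
SCHEMA = [("eno", int), ("ename", str), ("sal", int), ("dname", str)]


def load_pigstorage(text, schema, delimiter=","):
    """Hypothetical helper: split each line on the delimiter and cast fields per the schema."""
    rows = []
    for line in text.splitlines():
        if not line:
            continue  # skip blank lines
        fields = line.split(delimiter)
        rows.append(tuple(cast(value) for (_, cast), value in zip(schema, fields)))
    return rows


def filter_relation(rows, predicate):
    """Hypothetical helper: keep only tuples for which the predicate holds (like FILTER ... BY)."""
    return [row for row in rows if predicate(row)]


def store_pigstorage(rows, delimiter=","):
    """Hypothetical helper: serialise tuples as delimited text lines (like STORE ... USING PigStorage)."""
    return "\n".join(delimiter.join(str(field) for field in row) for row in rows)


rec1 = load_pigstorage(EMP_TXT, SCHEMA)
sal_index = [name for name, _ in SCHEMA].index("sal")
hsal = filter_relation(rec1, lambda row: row[sal_index] > 25000)
print(store_pigstorage(hsal))  # prints: 101,aaa,34000,hr
```

Only employee 101 earns more than 25000, which matches the single line in part-m-00000. A real STORE writes part files into an HDFS directory rather than returning a string.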

---> dump: The dump operator runs the Pig Latin statements needed to compute a relation
and displays the result on the screen.

---> describe: The describe operator displays the schema of a relation.

---> explain: The explain operator displays the
logical, physical, and MapReduce execution plans of a relation.


grunt> group1 = group rec1 by dname;
grunt> dump group1;


(hr,{(101,aaa,34000,hr),(105,eee,24000,hr)})
(fin,{(102,bbb,23000,fin),(103,ccc,23000,fin)})
(mark,{(104,ddd,25000,mark)})
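GROUP collects every tuple that shares a key into a bag, and dump prints each group as (key,{tuples}). A minimal Python sketch of that behaviour, assuming the same five emp.txt records; group_by and fmt are hypothetical names, and Python lists stand in for Pig bags (Pig itself does not guarantee the order of groups or of tuples inside a bag).

```python
# Plain-Python model of GROUP rec1 BY dname and of how dump prints the result.
# Records assumed from the output shown in this post.
rec1 = [
    (101, "aaa", 34000, "hr"),
    (102, "bbb", 23000, "fin"),
    (103, "ccc", 23000, "fin"),
    (104, "ddd", 25000, "mark"),
    (105, "eee", 24000, "hr"),
]


def group_by(rows, key_index):
    """Hypothetical helper: map each key to the list (bag) of rows carrying it."""
    groups = {}  # dicts keep insertion order, so groups appear in first-seen order
    for row in rows:
        groups.setdefault(row[key_index], []).append(row)
    return list(groups.items())  # [(group_key, bag), ...]


def fmt(value):
    """Hypothetical helper: render tuples as (...), bags as {...} and nulls as empty, dump-style."""
    if isinstance(value, tuple):
        return "(" + ",".join(fmt(v) for v in value) + ")"
    if isinstance(value, list):
        return "{" + ",".join(fmt(v) for v in value) + "}"
    return "" if value is None else str(value)


group1 = group_by(rec1, 3)  # index 3 is dname
for group in group1:
    print(fmt(group))
```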


[training@localhost ~]$ cat emp33.txt
101,aaa,hr,11
102,bbb,fin,12
103,ccc,mark,13
104,ddd,sales,14
[training@localhost ~]$ cat dep33.txt
11,hr
12,fin
13,mark
15,shipping
16,accounts
[training@localhost ~]$


Copy both files into HDFS:


[training@localhost ~]$ hadoop fs -put emp33.txt emp33.txt
[training@localhost ~]$ hadoop fs -put dep33.txt dep33.txt


grunt> empdata = load 'emp33.txt' using PigStorage(',') as (eno:int,ename:chararray,dname:chararray,did:int);
grunt> dump empdata;

grunt> deptdata = load 'dep33.txt' using PigStorage(',') as (did:int,dname:chararray);
grunt> dump deptdata;

Inner Join: The inner join is used quite frequently; it is also referred to as an equijoin.
An inner join returns rows only when the join key has a match in both relations.


Relation3_name = JOIN Relation1_name BY key, Relation2_name BY key ;

grunt> join1 = join empdata by did, deptdata by did;
grunt> dump join1;

(101,aaa,hr,11,11,hr)
(102,bbb,fin,12,12,fin)
(103,ccc,mark,13,13,mark)
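The inner join above can be sketched as a simple hash join in Python: index deptdata by did, then probe with each empdata tuple and concatenate the matching pairs, which is why did appears twice in every output tuple. This models the result only, not how Pig actually executes the join; inner_join is a hypothetical helper.

```python
# Plain-Python model of: join1 = join empdata by did, deptdata by did;
empdata = [
    (101, "aaa", "hr", 11),
    (102, "bbb", "fin", 12),
    (103, "ccc", "mark", 13),
    (104, "ddd", "sales", 14),
]
deptdata = [(11, "hr"), (12, "fin"), (13, "mark"), (15, "shipping"), (16, "accounts")]


def inner_join(left, left_key, right, right_key):
    """Hypothetical helper: equijoin two lists of tuples on the given field positions."""
    index = {}
    for right_row in right:
        if right_row[right_key] is not None:  # null keys are never matched
            index.setdefault(right_row[right_key], []).append(right_row)
    joined = []
    for left_row in left:
        for right_row in index.get(left_row[left_key], []):
            joined.append(left_row + right_row)  # all left fields, then all right fields
    return joined


join1 = inner_join(empdata, 3, deptdata, 0)  # did is field 3 of empdata, field 0 of deptdata
for row in join1:
    print(row)
```

Employee 104 (did 14) and departments 15 and 16 have no partner, so they drop out of the result.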

Outer Join
Unlike an inner join, an outer join returns all the rows from at least one of the relations.
An outer join operation is carried out in three ways:
• Left outer join
• Right outer join
• Full outer join
Left Outer Join: The left outer join operation returns all rows from the left relation,
even if there are no matches in the right relation.

grunt> leftjoin1 = join empdata by did LEFT OUTER, deptdata by did;
grunt> dump leftjoin1;


(101,aaa,hr,11,11,hr)
(102,bbb,fin,12,12,fin)
(103,ccc,mark,13,13,mark)
(104,ddd,sales,14,,)
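In the left outer join, a left tuple with no match is still emitted, padded with nulls for every field of the right relation; dump prints those nulls as empty fields, hence (104,ddd,sales,14,,). A minimal Python model, with None standing in for Pig's null and left_outer_join a hypothetical helper:

```python
# Plain-Python model of: leftjoin1 = join empdata by did LEFT OUTER, deptdata by did;
empdata = [
    (101, "aaa", "hr", 11),
    (102, "bbb", "fin", 12),
    (103, "ccc", "mark", 13),
    (104, "ddd", "sales", 14),
]
deptdata = [(11, "hr"), (12, "fin"), (13, "mark"), (15, "shipping"), (16, "accounts")]


def left_outer_join(left, left_key, right, right_key, right_width):
    """Hypothetical helper: keep every left tuple, padding with None when nothing matches."""
    index = {}
    for right_row in right:
        if right_row[right_key] is not None:
            index.setdefault(right_row[right_key], []).append(right_row)
    joined = []
    for left_row in left:
        matches = index.get(left_row[left_key], [])
        if matches:
            joined.extend(left_row + right_row for right_row in matches)
        else:
            joined.append(left_row + (None,) * right_width)  # pad missing right-hand fields
    return joined


leftjoin1 = left_outer_join(empdata, 3, deptdata, 0, right_width=2)
for row in leftjoin1:
    print(row)
```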

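Right Outer Join: The right outer join operation returns all rows from the right relation,
even if there are no matches in the left relation.

grunt> rightjoin1 = join empdata by did RIGHT OUTER, deptdata by did;
grunt> dump rightjoin1;
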
(101,aaa,hr,11,11,hr)
(102,bbb,fin,12,12,fin)
(103,ccc,mark,13,13,mark)
(,,,,15,shipping)
(,,,,16,accounts)
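Right and full outer joins follow the same pattern: unmatched right tuples are padded with nulls on the left, and a full outer join keeps unmatched tuples from both sides. The post lists the full outer join but does not run it; in Pig it would use FULL OUTER in place of RIGHT OUTER. A Python sketch covering all three variants, with outer_join a hypothetical helper and None standing in for null:

```python
# Plain-Python model of LEFT / RIGHT / FULL OUTER joins between empdata and deptdata.
empdata = [
    (101, "aaa", "hr", 11),
    (102, "bbb", "fin", 12),
    (103, "ccc", "mark", 13),
    (104, "ddd", "sales", 14),
]
deptdata = [(11, "hr"), (12, "fin"), (13, "mark"), (15, "shipping"), (16, "accounts")]


def outer_join(left, left_key, right, right_key, left_width, right_width, kind):
    """Hypothetical helper; kind is 'left', 'right' or 'full'."""
    index = {}
    for pos, right_row in enumerate(right):
        if right_row[right_key] is not None:  # null keys never match
            index.setdefault(right_row[right_key], []).append(pos)
    matched_right = set()
    joined = []
    for left_row in left:
        hits = index.get(left_row[left_key], [])
        for pos in hits:
            matched_right.add(pos)
            joined.append(left_row + right[pos])
        if not hits and kind in ("left", "full"):
            joined.append(left_row + (None,) * right_width)  # unmatched left tuple
    if kind in ("right", "full"):
        for pos, right_row in enumerate(right):
            if pos not in matched_right:
                joined.append((None,) * left_width + right_row)  # unmatched right tuple
    return joined


rightjoin1 = outer_join(empdata, 3, deptdata, 0, 4, 2, "right")
fulljoin1 = outer_join(empdata, 3, deptdata, 0, 4, 2, "full")
for row in fulljoin1:
    print(row)
```

The right outer result matches the dump above, and the full outer join adds the unmatched employee 104 on top of it.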
