Tuesday, 25 March 2025

pig scripts

First, create the people_data.txt file, with one record per line (name, age, city):

Alice,30,New York
Bob,25,California
Charlie,35,Texas
David,30,California
Eva,20,New York
Frank,40,California
Grace,30,Texas
Hannah,45,New York
Ivy,25,California
Jack,20,Texas

--------------------------


Copy the data file to the Hadoop (HDFS) directory:


> hadoop fs -put people_data.txt people_data.txt

---------------------------------------


Write the Pig scripts:


pig1.pig


people = LOAD 'people_data.txt' USING PigStorage(',') AS (name:chararray, age:int, city:chararray);


dump people;
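
Before dumping a relation, it can help to check the schema Pig attached to it. A minimal sketch using the built-in DESCRIBE operator on the same people relation; nothing is assumed here beyond the file and schema used in pig1.pig:

```pig
-- Load the same file with the same schema as pig1.pig
people = LOAD 'people_data.txt' USING PigStorage(',') AS (name:chararray, age:int, city:chararray);

-- Print the field names and types of the relation (no job is launched for this)
DESCRIBE people;
```

If a field shows up with an unexpected type here, the AS clause of the LOAD statement is the place to look.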

-------------------------------------------
pig2.pig

people = LOAD 'people_data.txt' USING PigStorage(',') AS (name:chararray, age:int, city:chararray);

filtered_data = FILTER people BY age > 30;

dump filtered_data;
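
Note that age > 30 leaves out people who are exactly 30. Conditions can also be combined with AND, OR, and NOT. The following is only a sketch; the relation name ca_30_plus is made up for illustration. On the ten-record people_data.txt created at the start, it should keep only David and Frank:

```pig
-- Load the same file with the same schema as pig2.pig
people = LOAD 'people_data.txt' USING PigStorage(',') AS (name:chararray, age:int, city:chararray);

-- Keep people aged 30 or more who live in California
-- (ca_30_plus is an illustrative name, not part of the original scripts)
ca_30_plus = FILTER people BY age >= 30 AND city == 'California';

DUMP ca_30_plus;
```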
-----------------------------------------------------------

pig3.pig


people = LOAD 'people_data.txt' USING PigStorage(',') AS (name:chararray, age:int, city:chararray);

Group_data = GROUP people by city;

dump Group_data;
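
GROUP produces one row per city, holding the key (named group) and a bag with all the people rows for that city. A common next step is to aggregate each bag. A minimal sketch using the built-in COUNT and AVG functions; the names city_stats, num_people, and avg_age are chosen here for illustration:

```pig
-- Load and group exactly as in pig3.pig
people = LOAD 'people_data.txt' USING PigStorage(',') AS (name:chararray, age:int, city:chararray);
Group_data = GROUP people BY city;

-- 'group' is the grouping key (the city); 'people' is the bag of rows for that city
city_stats = FOREACH Group_data GENERATE
    group AS city,
    COUNT(people) AS num_people,
    AVG(people.age) AS avg_age;

DUMP city_stats;
```

On the ten-record people_data.txt created at the start, this should report 4 people for California and 3 each for New York and Texas.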


----------------------------------------------------

pig4.pig

people = LOAD 'people_data.txt' USING PigStorage(',') AS (name:chararray, age:int, city:chararray);

sort_age = ORDER people BY age DESC;

dump sort_age;
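
To see only the top few rows of a sorted relation, ORDER can be followed by LIMIT. A short sketch; the relation name oldest_three is an illustrative choice:

```pig
-- Load and sort exactly as in pig4.pig
people = LOAD 'people_data.txt' USING PigStorage(',') AS (name:chararray, age:int, city:chararray);
sort_age = ORDER people BY age DESC;

-- Keep only the first three rows of the sorted relation
oldest_three = LIMIT sort_age 3;

DUMP oldest_three;
```

On the ten-record people_data.txt created at the start, the three oldest people are Hannah (45), Frank (40), and Charlie (35).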

--------------------------------------------------------

pig5.pig

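This example uses two input files. Write one record per line, with no spaces after the commas: PigStorage(',') does not trim whitespace, so a city stored as " New York" would not match "New York" in the join. Copy both files to HDFS as before (this people_data.txt replaces the earlier one; hadoop fs -put -f overwrites an existing file).

people_data.txt:

John,30,New York
Alice,25,Los Angeles
Bob,35,Chicago
Charlie,28,New York

city_data.txt:

New York,8000000,New York
Los Angeles,4000000,California
Chicago,2700000,Illinois
San Francisco,870000,California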

Now, let's perform the join on the city field.

Pig Script:

-- Load the people data
people = LOAD 'people_data.txt' USING PigStorage(',') AS (name:chararray, age:int, city:chararray);

-- Load the city data
city_data = LOAD 'city_data.txt' USING PigStorage(',') AS (city:chararray, population:int, state:chararray);

-- Perform the join on the 'city' field
joined_data = JOIN people BY city, city_data BY city;

-- Display the result of the join
DUMP joined_data;
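
The joined relation contains every field from both inputs, including two copies of city. Pig distinguishes them with a relation prefix such as people::city. A sketch of how the result could be trimmed to one row of useful fields per person; the relation name result and the output aliases are chosen here for illustration:

```pig
-- Load and join exactly as in pig5.pig
people = LOAD 'people_data.txt' USING PigStorage(',') AS (name:chararray, age:int, city:chararray);
city_data = LOAD 'city_data.txt' USING PigStorage(',') AS (city:chararray, population:int, state:chararray);
joined_data = JOIN people BY city, city_data BY city;

-- After a join, fields carry their source relation as a prefix (relation::field)
result = FOREACH joined_data GENERATE
    people::name AS name,
    people::age AS age,
    people::city AS city,
    city_data::state AS state,
    city_data::population AS population;

DUMP result;
```

JOIN is an inner join by default, so San Francisco, which has no matching person, does not appear in the output.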


----------------------------------



