
Wednesday 22 March 2017

Pull Twitter Data into HDFS Using Flume

Create a Twitter app:
Open dev.twitter.com and create a new app.
My app is Susshmatweets.

Get the keys and access tokens for the app:

consumerKey = jocL1adBdVN4l1kHfKJCmha77
consumerSecret = VMl0s0T9SoXG9XpWn5OvVZbo6r9OrY8QH7c0yBjT4gNFC3MJMg
accessToken = 372205344-Oe32hIDaKvijuFoHKU0GpUvnuywAMvRG1cDKeqFx
accessTokenSecret = UAFNdhzmSBa2fN79OwORcjswjdWguUU0CBR7YmQHzvbDj

Create a configuration file for the Twitter agent in your Flume conf directory (here conf/flume.conf):
TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS
TwitterAgent.sources.Twitter.type = com.cloudera.flume.source.TwitterSource
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sources.Twitter.consumerKey = jocL1adBdVN4l1kHfKJCmha77
TwitterAgent.sources.Twitter.consumerSecret = VMl0s0T9SoXG9XpWn5OvVZbo6r9OrY8QH7c0yBjT4gNFC3MJMg
TwitterAgent.sources.Twitter.accessToken = 372205344-Oe32hIDaKvijuFoHKU0GpUvnuywAMvRG1cDKeqFx
TwitterAgent.sources.Twitter.accessTokenSecret = UAFNdhzmSBa2fN79OwORcjswjdWguUU0CBR7YmQHzvbDj
TwitterAgent.sources.Twitter.keywords = sushmaswaraj
TwitterAgent.sinks.HDFS.channel = MemChannel
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.hdfs.path = /user/training/tweet_data/
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
TwitterAgent.sinks.HDFS.hdfs.batchSize = 1000
TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
TwitterAgent.sinks.HDFS.hdfs.rollCount = 1000
TwitterAgent.sinks.HDFS.hdfs.rollInterval = 600
TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 1000
TwitterAgent.channels.MemChannel.transactionCapacity = 100
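
Before starting the agent, it can help to sanity-check the wiring in the file above. The following is only a rough sketch in Python, not part of Flume: the helper names parse_properties and check_agent are made up for illustration, and the parser handles only plain key = value lines. It checks that every source and sink points at a declared channel, and flags a sink batchSize larger than the channel's transactionCapacity, which (as far as I understand) can make the memory channel fail under load. The defaults of 100 used when a value is missing are assumed from the Flume user guide.

```python
def parse_properties(text):
    """Parse simple 'key = value' lines, ignoring blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def check_agent(props, agent):
    """Return a list of human-readable problems in the agent's wiring."""
    problems = []
    channels = props.get(f"{agent}.channels", "").split()
    # Every channel a source writes to must be declared on the agent.
    for source in props.get(f"{agent}.sources", "").split():
        for ch in props.get(f"{agent}.sources.{source}.channels", "").split():
            if ch not in channels:
                problems.append(f"source {source} uses undeclared channel {ch}")
    for sink in props.get(f"{agent}.sinks", "").split():
        ch = props.get(f"{agent}.sinks.{sink}.channel", "")
        if ch not in channels:
            problems.append(f"sink {sink} uses undeclared channel '{ch}'")
            continue
        # Assumed defaults of 100 for both settings when they are not set.
        batch = int(props.get(f"{agent}.sinks.{sink}.hdfs.batchSize", "100"))
        tx = int(props.get(f"{agent}.channels.{ch}.transactionCapacity", "100"))
        if batch > tx:
            problems.append(
                f"sink {sink} batchSize {batch} exceeds "
                f"channel {ch} transactionCapacity {tx}"
            )
    return problems


# Trimmed copy of the wiring lines from the config above (no keys needed).
SAMPLE = """
TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sinks.HDFS.channel = MemChannel
TwitterAgent.sinks.HDFS.hdfs.batchSize = 1000
TwitterAgent.channels.MemChannel.transactionCapacity = 100
"""

if __name__ == "__main__":
    for problem in check_agent(parse_properties(SAMPLE), "TwitterAgent"):
        print(problem)
```

With the values used in this post, the check reports the batchSize / transactionCapacity mismatch. For a low-volume keyword this may never bite, but raising transactionCapacity or lowering batchSize is worth considering.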

Start the Flume agent (-c takes the conf directory, -f the agent's config file):
[training@localhost apache-flume-1.6.0-bin]$ flume-ng agent -n TwitterAgent -c conf -f conf/flume.conf -Dflume.root.logger=WARN,console -Dtwitter4j.http.proxyHost=10.0.0.2 -Dtwitter4j.http.proxyPort=808 -Dtwitter4j.http.proxyUser=swamy@bdps.in -Dtwitter4j.http.proxyPassword=swamy@123 -Dtwitter4j.streamBaseURL=https://stream.twitter.com/1.1/



View the collected tweets in HDFS:
[training@localhost ~]$ hadoop fs -cat tweet_data/FlumeData.1490170721558.tmp
text":"@SushmaSwaraj ji steps in to rescue Indian woman facing domestic abuse in Pakistan really a bold action taken by her\nhttps://t.co/0t6S011o8r","contributors":null,"geo":null,"entities":{"symbols":[],"urls":[{"expanded_url":"http://www.thehindu.com/news/national/sushma-swaraj-steps-in-to-rescue-indian-woman-facing-domestic-abuse-in-pakistan/article17568863.ece","indices":[117,140],"display_url":"thehindu.com/news/national/\u2026","url":"https://t.co/0t6S011o8r"}],"hashtags":[],"user_mentions":[{"id":219617448,"name":"Sushma Swaraj","indices":[0,13],"screen_name":"SushmaSwaraj","id_str":"219617448"}]},"is_quote_status":false,"source":"<a href=\"http://twitter.com/download/android\" rel=\"nofollow\">Twitter for Android<\/a>","favorited":false,"in_reply_to_user_id":219617448,"retweet_count":40,"id_str":"844405931362455558","user":{"location":"Dalhousie, India","default_profile":false,"statuses_count":13964,"profile_background_tile":false,"lang":"en","profile_link_color":"FF691F","profile_banner_url":"https://pbs.twimg.com/profile_banners/288716551/1489699188","id":288716551,"following":null,"favourites_count":5205,"protected":false,"profile_text_color":"000000","verified":false,"description":"#SwayamSewak🇮🇳   #Biblophile, Truth perfective\n Next target #loksabha2019 RTs not are endorsements","contributors_enabled":false,"profile_sidebar_bord
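
The raw JSON above is hard to read, so here is a small Python sketch for pulling out a few fields. It assumes each tweet is written as one JSON object per line, and it uses only fields visible in the output above (id_str, retweet_count, text). The function name extract and the script name extract_tweets.py are made up for illustration. Lines that do not parse, such as a partial record in a .tmp file that Flume is still writing, are skipped.

```python
import json
import sys


def extract(lines):
    """Yield (id_str, retweet_count, text) for each line that is a JSON tweet."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            tweet = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip partial or truncated records
        if not isinstance(tweet, dict) or "text" not in tweet:
            continue  # not a tweet object (for example a delete notice)
        yield tweet.get("id_str"), tweet.get("retweet_count", 0), tweet["text"]


if __name__ == "__main__":
    # Read tweets from standard input and print one tab-separated row each.
    for id_str, retweets, text in extract(sys.stdin):
        print(f"{id_str}\t{retweets}\t{text}")
```

One possible way to run it is to pipe the file through the script: hadoop fs -cat tweet_data/FlumeData.1490170721558.tmp | python extract_tweets.py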









