Flume

Apache Flume is a distributed, reliable, and available system for efficiently collecting, aggregating, and moving large amounts of log data from many different sources to a centralized data store. Since data sources are customizable, Flume can be used to transport massive quantities of event data, including but not limited to network traffic data, social-media-generated data, email messages, and nearly any other data source.

Flume Design Goals:

  • Reliability
  • Scalability
  • Manageability
  • Extensibility

Apache Flume Components

  • Source → Channel → Sink
  • Data is moved between source, channel, and sink in the form of an “Event”.
  • An Event consists of a header and a payload.
  • The payload is a byte array.
  • There can be multiple channels as well.
  • A source can write to multiple channels.
  • A channel is a holding area that holds events from the source before passing them to a sink.
  • A sink cannot take an event directly from a source; the event has to come from a channel.
  • Source, channel, and sink run inside a daemon process called an Agent.
  • An agent can have multiple sources, multiple channels, and multiple sinks (see the fan-out sketch after this list).
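To make the last few points concrete, here is a minimal fan-out sketch: one agent with a single source writing to two channels, each drained by its own sink. The agent name (a1), the netcat source and its port, and the logger sinks are assumptions chosen for illustration; netcat, memory, and logger are standard Flume component types.

# hypothetical agent 'a1': one source, two channels, two sinks
a1.sources = r1
a1.channels = c1 c2
a1.sinks = k1 k2

# netcat source listening on a local port; it writes to both channels
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
a1.sources.r1.channels = c1 c2

# two independent memory channels
a1.channels.c1.type = memory
a1.channels.c2.type = memory

# each sink reads from exactly one channel
a1.sinks.k1.type = logger
a1.sinks.k1.channel = c1
a1.sinks.k2.type = logger
a1.sinks.k2.channel = c2

By default the source replicates every event to all of its configured channels, so both sinks see the same stream.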

A Simple Example

# sources, channels, and sinks are defined per agent,
# in this case named 'agent'

agent.sources = seqGenSrc
agent.channels = memoryChannel
agent.sinks = loggerSink

# for each one of the sources, the type is defined
agent.sources.seqGenSrc.type = exec
agent.sources.seqGenSrc.command = tail -F /home/ruchi/flumedata/logfile
agent.sources.seqGenSrc.channels = memoryChannel

# the channel's type is defined
agent.channels.memoryChannel.type = memory
agent.channels.memoryChannel.capacity = 100

# the sink is defined

agent.sinks.loggerSink.channel = memoryChannel

agent.sinks.loggerSink.type = hdfs
agent.sinks.loggerSink.hdfs.path = hdfs://:8020/user/ruchi/logs
agent.sinks.loggerSink.hdfs.fileType = DataStream
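
Optionally, the HDFS sink's rolling behaviour can be tuned with the standard hdfs.roll* properties. The lines below are an illustrative addition, not part of the original example; the values shown match Flume's documented defaults.

# optional: control when the HDFS sink rolls to a new file
agent.sinks.loggerSink.hdfs.rollInterval = 30
agent.sinks.loggerSink.hdfs.rollSize = 1024
agent.sinks.loggerSink.hdfs.rollCount = 10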

  • Copy flume.conf to /usr/local/flume/conf (for example).
  • At the command prompt, use the following command to start Flume:
  • flume-ng agent --conf-file /usr/local/flume/conf/flume.conf --name agent -Dflume.root.logger=INFO,console
  • For /home/ruchi/flumedata/logfile, you will see a corresponding file in the HDFS logs directory holding the tailed data.
  • If you append to /home/ruchi/flumedata/logfile, a new file will be created in the HDFS logs directory (see the commands below to verify).
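
As a quick check (a sketch using the paths from the configuration above), append a line to the tailed file and then list the HDFS output directory:

echo "another log line" >> /home/ruchi/flumedata/logfile
hdfs dfs -ls /user/ruchi/logs
hdfs dfs -cat /user/ruchi/logs/*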

Hope this helps…
