Apache Flume is a distributed, reliable, and available system for efficiently collecting, aggregating, and moving large amounts of log data from many different sources to a centralized data store. Since data sources are customizable, Flume can be used to transport massive quantities of event data, including but not limited to network traffic data, social-media-generated data, email messages, and almost any other kind of data source.
Flume Design Goals:
Apache Flume Components
- Source -> Channel -> Sink
- Data moves between the source, channel, and sink in the form of an "Event".
- An event consists of a header and a payload.
- The payload is a byte array.
- There can be multiple channels as well.
- A source can write to multiple channels.
- A channel is a holding area that buffers events from a source before passing them to a sink.
- A sink cannot take events directly from a source; events must come through a channel.
- Source, channel, and sink run inside a daemon process called an agent.
- An agent can have multiple sources, multiple channels, and multiple sinks (see the fan-out sketch after this list).
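As a minimal sketch of that fan-out (hypothetical agent, source, channel, and sink names; using the built-in netcat source and logger sink), one source can replicate events to two channels, each drained by its own sink:

agent.sources = src1
agent.channels = ch1 ch2
agent.sinks = sink1 sink2

# a netcat source listening on a local port; by default the source replicates
# each event to every channel listed here
agent.sources.src1.type = netcat
agent.sources.src1.bind = localhost
agent.sources.src1.port = 44444
agent.sources.src1.channels = ch1 ch2

# two independent in-memory channels
agent.channels.ch1.type = memory
agent.channels.ch2.type = memory

# each sink reads from exactly one channel
agent.sinks.sink1.type = logger
agent.sinks.sink1.channel = ch1
agent.sinks.sink2.type = logger
agent.sinks.sink2.channel = ch2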
A Simple Example
# sources, channels and sinks are defined per agent,
# which in this case is called 'agent'
agent.sources = seqGenSrc
agent.channels = memoryChannel
agent.sinks = loggerSink
# for each of the sources, the type is defined
agent.sources.seqGenSrc.type = exec
agent.sources.seqGenSrc.command = tail -F /home/ruchi/flumedata/logfile
agent.sources.seqGenSrc.channels = memoryChannel
# the channel's type is defined
agent.channels.memoryChannel.type = memory
agent.channels.memoryChannel.capacity = 100
agent.sinks.loggerSink.channel = memoryChannel
agent.sinks.loggerSink.type = hdfs
agent.sinks.loggerSink.hdfs.path = hdfs://:8020/user/ruchi/logs
agent.sinks.loggerSink.hdfs.fileType = DataStream
- Copy flume.conf to /usr/local/flume/conf (for example).
- From the command prompt, start Flume with the following command:
- flume-ng agent --conf-file /usr/local/flume/conf/flume.conf --name agent -Dflume.root.logger=INFO,console
- For /home/ruchi/flumedata/logfile, you will see a corresponding file in the HDFS logs directory containing the tailed data.
- If you append new lines to /home/ruchi/flumedata/logfile, new files are created in the HDFS logs directory, as shown below.
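For example, you can verify the flow like this (assuming the paths from the config above; output file names depend on the HDFS sink's hdfs.filePrefix setting, which defaults to FlumeData):

# append a test line to the tailed file
echo "hello flume" >> /home/ruchi/flumedata/logfile

# list and inspect the files the HDFS sink has written
hdfs dfs -ls /user/ruchi/logs
hdfs dfs -cat /user/ruchi/logs/FlumeData.*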
Hope this helps…