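Apache Flume (flume.apache.org) is a service for collecting, aggregating, and moving large amounts of log data. A typical use is shipping web-server log files to a central store. A Flume agent is built from three components: a source that receives the data, a channel that buffers it, and a sink that delivers it.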


Source     Description
netcat     The Flume agent listens on a TCP/IP port and turns each received line of text into an event. Often used for testing.
syslogtcp  Can be used if the web server exposes its logs through tools like Rsyslog or Syslog-ng.
exec       Runs a command on the web server, such as tail -f /etc/httpd/logs/access_log, and reads its output. The agent must run on the same machine as the log file.
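
The three source types above might be configured as in the following fragment. This is a sketch, not a complete agent file: the agent name a1, the source names r1 to r3, and the hosts and ports are assumptions chosen for illustration, and each source still has to be listed under a1.sources and bound to a channel.

```properties
# Hypothetical agent "a1" with sources "r1".."r3"; use only the source you need.

# netcat: listen on a TCP port, one event per received line
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# syslogtcp: receive syslog messages forwarded by Rsyslog or Syslog-ng
# (port 5140 is an arbitrary choice)
a1.sources.r2.type = syslogtcp
a1.sources.r2.host = localhost
a1.sources.r2.port = 5140

# exec: run a command and read its standard output
# (-F instead of -f keeps following the file across log rotation)
a1.sources.r3.type = exec
a1.sources.r3.command = tail -F /etc/httpd/logs/access_log
```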


Channel  Description
memory   Keeps the events in memory. Events are lost if the agent process dies, so it should not be used in production.
jdbc     Uses JDBC to persist the events in a database.
kafka    Writes the events to a Kafka topic. If the topic is the final destination, no sink is needed.
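
A channel definition for each type could look roughly like this. The agent name a1, the channel names c1 to c3, the broker address, and the topic name weblogs are assumptions. The Kafka property names follow Flume 1.7 and later; older releases used different keys.

```properties
# Hypothetical agent "a1" with channels "c1".."c3"; use only the channel you need.

# memory: fast, but not durable
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# jdbc: events are persisted in a database
a1.channels.c2.type = jdbc

# kafka: events are written to a Kafka topic (broker and topic are assumed values)
a1.channels.c3.type = org.apache.flume.channel.kafka.KafkaChannel
a1.channels.c3.kafka.bootstrap.servers = localhost:9092
a1.channels.c3.kafka.topic = weblogs
```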


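Sink    Description
logger  Logs each event at INFO level, which normally shows up in the agent's console window. Use this sink type for testing.
avro    Sends the events to the Avro source of another Flume agent. Use Avro sinks to consolidate log files from different web servers on one collector agent, which can then store them on HDFS.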
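
The two sink types might be set up as follows. Again this is only a sketch: the agent name a1, the sink names k1 and k2, the channel names, and the collector host and port are made up for the example. Note that a sink is attached to exactly one channel through the singular key channel.

```properties
# Hypothetical agent "a1" with sinks "k1" and "k2"; use only the sink you need.

# logger: log every event at INFO level, for testing
a1.sinks.k1.type = logger
a1.sinks.k1.channel = c1

# avro: forward events to the Avro source of a collector agent
# (hostname and port are placeholders for that collector)
a1.sinks.k2.type = avro
a1.sinks.k2.hostname = collector.example.com
a1.sinks.k2.port = 4545
a1.sinks.k2.channel = c2
```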


To run Flume, you only need to define a source, a channel, and a sink in the agent's configuration file, connect them to each other, and start the Flume agent.
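
A minimal test setup, assuming a netcat source, a memory channel, and a logger sink, could look like the file below. The file name example.conf and the names a1, r1, c1, and k1 are arbitrary. The start command in the comment matches the usual flume-ng launcher, but the logging option may differ between Flume versions.

```properties
# example.conf: netcat source -> memory channel -> logger sink
#
# Possible way to start the agent from the Flume installation directory:
#   bin/flume-ng agent --conf conf --conf-file example.conf --name a1 -Dflume.root.logger=INFO,console

# Name the components of agent "a1"
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Source: listen for lines of text on localhost:44444
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# Channel: buffer events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Sink: log events so they can be inspected
a1.sinks.k1.type = logger

# Wire source and sink to the channel
# (a source takes "channels", a sink takes a single "channel")
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
```

Once the agent is running, text sent to port 44444 (for example with telnet localhost 44444) should appear as events in the agent's log output.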