Apache Flume (flume.apache.org) is a tool for collecting, aggregating, and moving log data. It is especially well suited to web-server log files.
Source | Description |
netcat | The Flume agent listens on a TCP/IP port and turns each line of text it receives into an event. Often used for testing. |
syslogtcp | Can be used if the web server forwards its logs over TCP using tools like Rsyslog or Syslog-ng. |
exec | Runs a command on the web server, such as tail -f /etc/httpd/logs/access_log, and reads its output. |
Channel | Description |
memory | Stores events in memory. Events are lost if the agent stops or crashes, so it shouldn't be used in production. |
JDBC | Uses JDBC to persist events in a database. |
kafka | Writes events to a Kafka topic. A sink isn't needed in this case. |
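
The Kafka channel case without a sink can be sketched as the configuration below. It is only an illustration: the agent name a1, the component names r1 and c1, the broker address kafka-1:9092, and the topic name weblogs are assumptions, not values from this article.

```properties
# Hypothetical agent "a1": an exec source feeding a Kafka channel.
# No sink is declared; the events end up in the Kafka topic.
a1.sources = r1
a1.channels = c1

# Source: read the output of a command (same command as in the table above)
a1.sources.r1.type = exec
a1.sources.r1.command = tail -f /etc/httpd/logs/access_log
a1.sources.r1.channels = c1

# Channel: Kafka-backed (broker address and topic name are assumptions)
a1.channels.c1.type = org.apache.flume.channel.kafka.KafkaChannel
a1.channels.c1.kafka.bootstrap.servers = kafka-1:9092
a1.channels.c1.kafka.topic = weblogs
# Write plain event bodies so that non-Flume consumers can read the topic
a1.channels.c1.parseAsFlumeEvent = false
```

Setting parseAsFlumeEvent to false is a design choice for the case where other applications consume the topic directly. If another Flume agent reads from it, the default is usually kept.
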
Sink | Description |
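logger | Logs each event at INFO level, which typically ends up on the agent's console or in its log file. Use this sink type for testing. |
avro | Forwards events to the Avro source of another Flume agent. Use Avro sinks to consolidate log files from different web servers on a collector agent, which can then store them on HDFS. |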
You only need to configure a source, a channel, and a sink, and then start the Flume agent.
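
As a minimal sketch of such a configuration, the file below wires the netcat source, the memory channel, and the logger sink from the tables above into one agent. The agent name a1, the component names r1, c1, and k1, the file name example.conf, port 44444, and the capacity values are assumptions chosen for illustration.

```properties
# example.conf: a single-node test agent (all names are illustrative)
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Source: listen for lines of text on a TCP port
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# Channel: buffer events in memory (testing only)
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Sink: log each event
a1.sinks.k1.type = logger

# Wiring: a source can feed several channels, a sink drains exactly one
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

# Start the agent from the Flume installation directory with:
#   bin/flume-ng agent --conf conf --conf-file example.conf --name a1
```

With the agent running, text sent to localhost:44444 (for example with telnet) should show up in the agent's log output. Whether that output appears on the console depends on the logging configuration of the Flume version in use.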