Details
Feature Request
Resolution: Obsolete
Major
None
7.0.0.Alpha4
None
Description
Map/Reduce tasks should collect statistics during execution and return them to the user, to help determine the optimal settings for the task. Some ideas for useful statistics:
- Final status: completed, failed, cancelled, etc.
- Duration: overall, per node, or per phase (map, reduce, combine, collate)
- Number of nodes participating in the task
- Number of keys in the intermediate cache
- Number of keys in the result map
- Node-specific statistics:
  - Status of the node: completed, failed, cancelled, etc.
  - Number of keys processed
  - Maximum size of the collector
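A minimal sketch of how the statistics listed above might be modeled as plain Java types. All names here (`TaskStatus`, `Phase`, `NodeStatistics`, `TaskStatistics`) are hypothetical and not part of Infinispan's API. Taking a phase's duration as that of its slowest node is also an assumption, chosen because nodes run each phase in parallel.

```java
import java.time.Duration;
import java.util.EnumMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical types for illustration only; not an existing Infinispan API.
enum TaskStatus { COMPLETED, FAILED, CANCELLED }

enum Phase { MAP, COMBINE, REDUCE, COLLATE }

/** Statistics reported by a single node taking part in the task. */
final class NodeStatistics {
    final String nodeName;
    TaskStatus status;
    long keysProcessed;
    int maxCollectorSize;
    final Map<Phase, Duration> phaseDurations = new EnumMap<>(Phase.class);

    NodeStatistics(String nodeName) {
        this.nodeName = nodeName;
    }

    /** Records the current collector size, keeping the largest value seen. */
    void observeCollectorSize(int size) {
        maxCollectorSize = Math.max(maxCollectorSize, size);
    }
}

/** Task-wide statistics returned to the user when the task finishes. */
final class TaskStatistics {
    TaskStatus finalStatus;
    Duration overallDuration = Duration.ZERO;
    long intermediateCacheKeys;
    long resultMapKeys;
    final Map<String, NodeStatistics> nodes = new LinkedHashMap<>();

    /** Returns the record for a node, creating it on first use. */
    NodeStatistics node(String name) {
        return nodes.computeIfAbsent(name, NodeStatistics::new);
    }

    int participatingNodes() {
        return nodes.size();
    }

    /** Total keys processed across all nodes. */
    long totalKeysProcessed() {
        return nodes.values().stream().mapToLong(n -> n.keysProcessed).sum();
    }

    /** Duration of one phase, assumed to be that of the slowest node. */
    Duration phaseDuration(Phase phase) {
        return nodes.values().stream()
                .map(n -> n.phaseDurations.getOrDefault(phase, Duration.ZERO))
                .max(Duration::compareTo)
                .orElse(Duration.ZERO);
    }
}
```

Keeping per-node records and deriving the task-wide figures (node count, total keys, per-phase duration) from them avoids storing the same number in two places.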
For comparison, these are the built-in counters reported by Hadoop:
https://www.inkling.com/read/hadoop-definitive-guide-tom-white-3rd/chapter-8/counters
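Hadoop-style counters could be sketched as an enum of counter names backed by `AtomicLong`s, with one counter set per node that the coordinating node merges into a total. The counter names below are hypothetical, only loosely modeled on Hadoop's map, combine and reduce record counters; they are not an existing Infinispan or Hadoop API.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical counter names, loosely modeled on Hadoop's task counters. */
enum MapReduceCounter {
    MAP_INPUT_KEYS,
    MAP_OUTPUT_PAIRS,
    COMBINE_INPUT_PAIRS,
    COMBINE_OUTPUT_PAIRS,
    REDUCE_INPUT_KEYS,
    REDUCE_OUTPUT_KEYS
}

/** Counter set for one node; the map is filled once, values are atomic. */
final class Counters {
    private final Map<MapReduceCounter, AtomicLong> values =
            new EnumMap<>(MapReduceCounter.class);

    Counters() {
        // Pre-populate every counter so lookups never return null.
        for (MapReduceCounter c : MapReduceCounter.values()) {
            values.put(c, new AtomicLong());
        }
    }

    void increment(MapReduceCounter counter, long delta) {
        values.get(counter).addAndGet(delta);
    }

    long get(MapReduceCounter counter) {
        return values.get(counter).get();
    }

    /** Adds another node's counters into this set when aggregating results. */
    void mergeFrom(Counters other) {
        for (MapReduceCounter c : MapReduceCounter.values()) {
            increment(c, other.get(c));
        }
    }
}
```

Because the map is fully populated in the constructor and never structurally modified afterwards, worker threads on a node can increment counters concurrently without extra locking.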