Object recognition has advanced significantly in recent years, with large-scale benchmarks and CNNs pushing the domain in excellent directions. Action recognition has made similar pushes toward large-scale benchmarks; however, due to the high cost of labelling data, it has struggled to achieve this aim.
The definition of an action is often left implicit within papers and ends up being a concatenation of various appearance-based models. To this end, we are interested in exploring to what extent actions can be described through combinations of object detectors (a purely appearance-based model). Building on top of recent object detection work allows us to answer this question more conclusively. We explore various combinations of, and parameter settings for, techniques for combining the output of object detectors.
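As a rough illustrative sketch (not a method from this work), one simple way to combine object detector outputs into action scores is to max-pool each detector's confidence over an image and take a linear combination per action. All object names, action names, and weights below are invented for illustration:

```python
import numpy as np

# Hypothetical vocabularies; a real system would use a detector's
# label set and weights learned from labelled action data.
OBJECTS = ["horse", "saddle", "ball", "racket", "person"]
ACTIONS = ["riding_horse", "playing_tennis"]

# Hand-set weights mapping object evidence to each action.
W = np.array([
    [0.9, 0.7, 0.0, 0.0, 0.3],   # riding_horse
    [0.0, 0.0, 0.6, 0.9, 0.3],   # playing_tennis
])

def action_scores(detections):
    """detections: list of (object_name, confidence) pairs
    produced by an object detector on one image."""
    x = np.zeros(len(OBJECTS))
    for name, conf in detections:
        if name in OBJECTS:
            i = OBJECTS.index(name)
            x[i] = max(x[i], conf)   # max-pool per object class
    return dict(zip(ACTIONS, W @ x))

scores = action_scores([("horse", 0.95), ("person", 0.9), ("saddle", 0.4)])
```

Here an image with a detected horse, saddle, and person scores highest for "riding_horse"; richer combination schemes (e.g. learned classifiers over detector score vectors) follow the same pattern of mapping object evidence to action labels.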