We present a hierarchical model for human activity recognition in entire multi-person scenes. Our model describes human behaviour at multiple levels of detail, ranging from low-level actions through to high-level events. We also include a model of social roles, the expected behaviours of certain people, or groups of people, in a scene. The hierarchical model includes these varied representations, and various forms of interactions between people present in a scene. The model is trained in a discriminative max-margin framework. Experimental results demonstrate that this model can improve performance at all considered levels of detail, on two challenging datasets.
This paper describes a general model that subsumes many parametric models for continuous data. The model comprises hidden layers of state-space or dynamic causal models, arranged so that the output of one provides input to another. The ensuing hierarchy furnishes a model for many types of data, of arbitrary complexity. Special cases range from the general linear model for static data to generalised convolution models, with system noise, for nonlinear time-series analysis. Crucially, all of these models can be inverted using exactly the same scheme, namely, dynamic expectation maximization. This means that a single model and optimisation scheme can be used to invert a wide range of models. We present the model and a brief review of its inversion to disclose the relationships among, apparently, diverse generative models of empirical data. We then show that this inversion can be formulated as a simple neural network and may provide a useful metaphor for inference and learning in the brain.
The recognition of hierarchical structure in human behavior was one of the founding insights of the cognitive revolution. Despite decades of research, however, the computational mechanisms underlying hierarchically organized behavior are still not fully understood. Recent findings from behavioral and neuroscientific research have fueled a resurgence of interest in the problem, inspiring a new generation of computational models. In addition to developing some classic proposals, these models also break fresh ground, teasing apart different forms of hierarchical structure, placing a new focus on the issue of learning and addressing recent findings concerning the representation of behavioral hierarchies within the prefrontal cortex. In addition to offering explanations for some key aspects of behavior and functional neuroanatomy, the latest models also pose new questions for empirical research.
The ventral visual stream underlies key human visual object recognition abilities. However, neural encoding in the higher areas of the ventral stream remains poorly understood. Here, we describe a modeling approach that yields a quantitatively accurate model of inferior temporal (IT) cortex, the highest ventral cortical area. Using high-throughput computational techniques, we discovered that, within a class of biologically plausible hierarchical neural network models, there is a strong correlation between a model's categorization performance and its ability to predict individual IT neural unit response data. To pursue this idea, we then identified a high-performing neural network that matches human performance on a range of recognition tasks. Critically, even though we did not constrain this model to match neural data, its top output layer turns out to be highly predictive of IT spiking responses to complex naturalistic images at both the single site and population levels. Moreover, the model's intermediate layers are highly predictive of neural responses in the V4 cortex, a midlevel visual area that provides the dominant cortical input to IT. These results show that performance optimization—applied in a biologically appropriate model class— can be used to build quantitative predictive models of neural processing.
Associated to each simplicial complex is a binary hierarchical model. We classify the simplicial complexes that yield unimodular binary hierarchical models. Our main theorem provides both a construction of all unimodular binary hierarchical models, together with a characterization in terms of excluded minors, where our definition of a minor allows the taking of links and induced complexes. A key tool in the proof is the lemma that the class of unimodular binary hierarchical models is closed under the Alexander duality operation on simplicial complexes.
Semantic segmentation is the problem of assigning an object label to each pixel. It unifies the image segmentation and object recognition problems. The importance of using contextual information in semantic segmentation frameworks has been widely realized in the field. We propose a contextual framework, called contextual hierarchical model (CHM), which learns contextual information in a hierarchical framework for semantic segmentation. At each level of the hierarchy, a classifier is trained based on downsampled input images and outputs of previous levels. Our model then incorporates the resulting multi-resolution contextual information into a classifier to segment the input image at original resolution. This training strategy allows for optimization of a joint posterior probability at multiple resolutions through the hierarchy. Contextual hierarchical model is purely based on the input image patches and does not make use of any fragments or shape examples. Hence, it is applicable to a variety of problems such as object segmentation and edge detection. We demonstrate that CHM performs at par with state-of-the-art on Stanford background and Weizmann horse datasets. It also outperforms state-of-the-art edge detection methods on NYU depth dataset and achieves state-of-the-art on Berkeley segmentation dataset (BSDS 500).
Ecological research uses data collection techniques that are prone to substantial and unique types of measurement error to address scientific questions about species abundance and distribution. These data collection schemes include a number of survey methods in which unmarked individuals are counted, or determined to be present, at spatially-referenced sites. Examples include site occupancy sampling, repeated counts, distance sampling, removal sampling, and double observer sampling. To appropriately analyze these data, hierarchical models have been developed to separately model explanatory variables of both a latent abundance or occurrence process and a conditional detection process. Because these models have a straightforward interpretation paralleling mechanisms under which the data arose, they have recently gained immense popularity. The common hierarchical structure of these models is well-suited for a unified modeling interface. The R package unmarked provides such a unified modeling framework, including tools for data exploration, model fitting, model criticism, post-hoc analysis, and model comparison.
Visual processing in cortex is classically modeled as a hierarchy of increasingly sophisticated representations, naturally extending the model of simple to complex cells of Hubel and Wiesel. Surprisingly, little quantitative modeling has been done to explore the biological feasibility of this class of models to explain aspects of higher-level visual processing such as object recognition. We describe a new hierarchical model consistent with physiological data from inferotemporal cortex that accounts for this complex visual task and makes testable predictions. The model is based on a MAX-like operation applied to inputs to certain cortical neurons that may have a general role in cortical function.