Understanding human behaviour in crowded spaces is challenging because it requires locating pedestrians and estimating their body and head poses. Human detection and pose estimation are two closely related problems that have been tackled independently. By coupling detection and pose estimation, it is possible to incrementally learn models that address both problems. This work presents a novel framework for the joint detection of people and recognition of both head and body poses. The framework is based on learning an ensemble of pose-sensitive human body models whose outputs provide a new representation for poses. To avoid tedious and inconsistent manual annotation when learning pose-sensitive models, I formulate a semi-supervised learning method for model training, which bootstraps an initial model using a small set of labelled data and subsequently improves the model iteratively by data mining from a large unlabelled dataset. Experiments on CCTV videos from a busy London Underground station demonstrate that the proposed method significantly outperforms a state-of-the-art person detector and yields extremely accurate head and body pose estimation in crowded public spaces.
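The bootstrapping idea described above (train on a small labelled set, then repeatedly mine confident samples from an unlabelled pool and retrain) can be illustrated with a minimal self-training sketch. This is not the pose-sensitive ensemble used in this work: the nearest-centroid classifier, the margin-based confidence score, the `threshold` value, the toy 2-D features, and the names `centroids`, `predict`, and `self_train` are all assumptions chosen only to keep the example short and runnable.

```python
import math


def centroids(samples, labels):
    """Fit a toy model: the mean feature vector of each class."""
    sums, counts = {}, {}
    for x, y in zip(samples, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def predict(model, x):
    """Return (label, confidence) for one sample.

    Confidence is the relative margin between the two nearest centroids,
    so the model needs at least two classes (an assumption of this sketch).
    """
    dists = sorted((math.dist(x, c), y) for y, c in model.items())
    (d1, y1), (d2, _) = dists[0], dists[1]
    return y1, (d2 - d1) / (d2 + 1e-12)


def self_train(labelled_x, labelled_y, unlabelled_x, threshold=0.5, max_rounds=10):
    """Bootstrap from labelled data, then iteratively mine the unlabelled pool."""
    train_x, train_y = list(labelled_x), list(labelled_y)
    pool = list(unlabelled_x)
    model = centroids(train_x, train_y)  # initial model from the small labelled set
    for _ in range(max_rounds):
        remaining, added = [], 0
        for x in pool:
            y, conf = predict(model, x)
            if conf >= threshold:  # accept confident pseudo-labels only
                train_x.append(x)
                train_y.append(y)
                added += 1
            else:
                remaining.append(x)
        if added == 0:  # nothing new was mined, so stop
            break
        pool = remaining
        model = centroids(train_x, train_y)  # retrain on the enlarged set
    return model, pool


# Hypothetical toy data: two labelled samples and four unlabelled ones.
labelled_x = [(0.0, 0.0), (10.0, 10.0)]
labelled_y = ["front", "back"]
unlabelled_x = [(1.0, 1.0), (9.0, 9.0), (2.0, 1.0), (5.0, 5.2)]
model, leftover = self_train(labelled_x, labelled_y, unlabelled_x)
```

In this toy run, the three samples near a labelled example are absorbed into the training set, while the ambiguous sample midway between the two classes stays in `leftover`. Keeping low-confidence samples out of training is one common way to limit the drift that wrong pseudo-labels can cause.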