Multi-modal analysis for music performances