Towards Natural Language Understanding Using Multimodal Deep Learning