Exploiting Application Characteristics for Efficient System Support of Data-Parallel Machine Learning