Optimizing parallel programs using composable locality models