Characterization and exploitation of nested parallelism and concurrent kernel execution to accelerate high performance applications