A data science approach to pattern discovery in complex structures with applications in bioinformatics