Computational Challenges in Protein-RNA Interactions
Protein-RNA binding, mediated through both RNA sequence and structure, plays a vital role in many cellular processes and is implicated in neurodegenerative diseases. Modeling the sequence and structure binding preferences of an RNA-binding protein is a key computational challenge. Accurate models will enable prediction of new interactions and a better understanding of the binding mechanism. In addition, designing compact and efficient sequence libraries to measure these interactions experimentally is necessary to discover novel binding preferences. In this talk, I will present my work on these two challenges. In the first part, I will describe RCK, an efficient algorithm for learning k-mer based sequence and structure scores, which outperforms the state of the art. I will give examples of novel biological insights we can gain by applying RCK to the largest dataset of protein-RNA interactions. In the second part, I will consider the problem of generating a minimum-size set of unstructured RNA sequences that covers all k-mers. I will prove that a general definition of this problem is NP-hard, and describe CurlCAKE, a greedy heuristic for this problem that works well in practice. I will conclude with open questions and future plans.
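To make the k-mer covering problem concrete, here is a minimal sketch of a generic greedy set-cover heuristic in Python. It is not CurlCAKE and should not be read as the method presented in the talk. The function names (kmers_of, greedy_cover) and the idea of choosing from a fixed candidate pool are assumptions made for illustration. The sketch also ignores the requirement that sequences be unstructured, which would need an RNA folding predictor.

```python
from itertools import product

ALPHABET = "ACGU"  # RNA nucleotides


def kmers_of(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def greedy_cover(candidates, k):
    """Generic greedy set cover over k-mers (illustrative, not CurlCAKE).

    Repeatedly picks the candidate sequence that covers the most
    still-uncovered k-mers. Returns the chosen sequences and the set of
    k-mers that no candidate could cover.
    """
    uncovered = {"".join(p) for p in product(ALPHABET, repeat=k)}
    chosen = []
    if not candidates:
        return chosen, uncovered
    while uncovered:
        best = max(candidates, key=lambda s: len(kmers_of(s, k) & uncovered))
        gain = kmers_of(best, k) & uncovered
        if not gain:
            break  # the remaining k-mers appear in no candidate
        chosen.append(best)
        uncovered -= gain
    return chosen, uncovered


# Hypothetical candidate pool: every RNA sequence of length 4.
pool = ["".join(p) for p in product(ALPHABET, repeat=4)]
library, missing = greedy_cover(pool, k=2)
print(len(library), "sequences chosen;", len(missing), "2-mers left uncovered")
```

A length-4 sequence contains at most three distinct 2-mers, so any library covering all 16 2-mers from this pool needs at least six sequences. The greedy choice gives no optimality guarantee, which is consistent with the problem being hard in general.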