Guía Docente 2020-21
APRENDIZAJE AUTÓNOMO Y EVOLUCIÓN

BASIC DETAILS:

Subject: APRENDIZAJE AUTÓNOMO Y EVOLUCIÓN
Id.: 33295
Programme: GRADUADO EN BIOINFORMÁTICA. PLAN 2019 (BOE 06/02/2019)
Module: BIOINFORMÁTICA
Subject type: OBLIGATORIA
Year: 2 Teaching period: Segundo Cuatrimestre
Credits: 6 Total hours: 150
Classroom activities: 60 Individual study: 90
Main teaching language: Inglés Secondary teaching language: Castellano
Lecturer: Email:

PRESENTATION:

The main goal of this subject is to explore the algorithms for bioinformatics pourpouses and their computational application in order to understand the mechanisms of biological database searching, protein and nucleic acid aligments, perform predictions of bidimensional structures, identify genes and detect regulatory sequences. Statistics and machine learning will be included to support the basis of the algorithms that are based on them.

Search algorithms as greedy or exhaustive will be studied to find repeated sequence as motifs or in the DNA mapping.

Dynamic programming will be used to analyse the local and global alignments of aminoacid and nucleotides sequences based on similarity or distance calculations. Softwares, as GENESCAN, will be studied to see a direct application of the statistical Likehood Ratio.

Identification of DNA, proteins as well as genome assembly will be assess through graphical algorithms, for what Euler's theorem will be introduced.

A machine learning approximation will be done with Hidden Markov Models as long as the Montecarlo sampling. The searching of gene regulatory sequence, crutial structures in epigenetics, the CpG islands need from the HMM with the decoding algorithms of Viterbia and Baum-Welch. This part will also include stochastic models and log odd ratio.

More complex applications of dynamic programming and stochastic models will be studied through the determination of the RNA secondary structure. Structural aligments require from HMM and stochastic context free grammar to make probabilistic competitive approximations. Other machine learning algorithms will be applied in order to determine the secondary structure of proteins, as well as the Chou and Fasman methods design for the same aim.

Algorithms were developped in Python and R languages, being chosen whichever suits better for the computational requirements.

PROFESSIONAL COMPETENCES ACQUIRED IN THE SUBJECT:

General programme competences G01 Use learning strategies autonomously for their application in the continuous improvement of professional practice.
G02 Perform the analysis and synthesis of problems of their professional activity and apply them in similar environments.
G04 Reason critically based on information, data and lines of action and their application on relevant issues of a social, scientific or ethical nature.
G05 Communicate professional topics in Spanish and / or English both orally and in writing.
G07 Choose between different complex models of knowledge to solve problems.
G10 Apply creativity, independence of thought, self-criticism and autonomy in the professional practice.
Specific programme competences E02 Develop the use and programming of computers, databases and computer programs and their application in bioinformatics.
E03 Apply the fundamental concepts of mathematics, logic, algorithmics and computational complexity to solve problems specific to bioinformatics.
E06 Apply the fundamental principles and basic techniques of intelligent systems and their practical application in the field of bioinformatics.
E12 Apply the principles and techniques of protein computational modelling to predict their biological function, their activity or new therapeutic targets (Structural Bioinformatics, Computational Toxicology).
E14 Use programming languages, most commonly used in the field of Life Sciences, to develop and evaluate techniques and/ or computational tools.
E15 Infer the evolutionary history of genes and proteins through the creation and interpretation of phylogenetic trees.
E18 Apply statistical and computational methods to solve problems in the fields of molecular biology, genomics, medical research and population genetics.
Learning outcomes R01 Contrast the different autonomous learning algorithms for use in different biotechnological areas.
R02 Choose the best autonomous learning tools in each case.
R03 Apply existing bioinformatics tools to search for sequence and structural homologies.
R04 Predict genes that encode proteins or RNA in an unknown genomic environment.
R05 Derive the evolutionary history of biological sequences and genomic rearrangements.
R06 Calculate the secondary structure of RNA and proteins.
R07 Classify the relevant results of a gene expression microarray.
R08 Discriminate biologically important information from public databases for use in autonomous learning algorithms.

PRE-REQUISITES:

Statistics is needed to understand the underlying knowledge of bioinformatics algorithms. Programming skills in R and Python would be necessary in order to understand and to code properly the bioinformatics algorithms explained. Knowledge in molecular biology and structural features of the main biomolecules are essential.

 

SUBJECT PROGRAMME:

Observations:


The order of the units might vary depeding on the learning pace of the students. The content of the units can be modify to adapt it to all the students.

It is highly recommended to the student to study, revise and practise the learnt algorithms before starting with the next.

Unexpected events might change the subject, as resources disponibility, academic calendar or group capacity, for these reasons cannot be considerated definitely closed.

Subject contents:

1 - Introduction to bioinformatics algorithms
    1.1 - Algorithm definition. Types of algorithms: recursive, iterative, fast, slow. Big-O Notation. Examples
    1.2 - Algorithms in bioinformatics and their application. Examples.
2 - Mapping DNA and Finding Signals.
    2.1 - Exhaustive search / Bruteforce algorithms: restriction mapping. Coding in python bruteforce algorithms.
    2.2 - Exhaustive search / Bruteforce algorithms.
    2.3 - Greedy algorithms: A greedy approach to motif finding.
3 - Comparing sequences and predicting genes.
    3.1 - Dynamic programming algorithms. Python/R code for alignments.
    3.2 - Dynamic programming algorithms. Python/R code for gene prediction.
    3.3 - Divide-and-conquer algorithms. Python code for: Space-efficient Sequence and block aligment.
    3.4 - Combinatorial pattern matching. Python/R code: exact pattern matching.
    3.5 - Combinatorial pattern matching. BLAST.
    3.6 - Burrows-Wheeler Matrix: NGS aligments.
4 - CpG islands search.
    4.1 - Montecarlo sampling. Stochastic models. Markov chains.
    4.2 - Hidden Markov Models.
5 - DNA Sequencing.
    5.1 - Graph algorithms. DNA microarrays.
    5.2 - Graph algorithms. Genome assembly.
6 - RNA secondary structure.
    6.1 - RNA secondary structure.
    6.2 - Zuker and Nussinov algorithms for prediction.
    6.3 - Competitive probabilistic aproximation.
    6.4 - Structural aligments.
7 - Protein Sencondary Structure.
    7.1 - Protein secondary structures.
    7.2 - Chou-Fasman method.
    7.3 - Improvement of Chou-Fasman method. GOR method.

Subject planning could be modified due unforeseen circumstances (group performance, availability of resources, changes to academic calendar etc.) and should not, therefore, be considered to be definitive.


TEACHING AND LEARNING METHODOLOGIES AND ACTIVITIES:

Teaching and learning methodologies and activities applied:

The student must be concerned to practise everytime after a lecture the coding work proposed in order to not get behind the rest. Programming skills are trained through practise, for that reason it is highly recommended a continous studying of the subject, reproducing examples and doing the proposal exercices.

Magistral lectures will be alterned with practical in obligatory assistance sessions. The programming languages that are going to be used are Python and R. Practical lessons will be aimed to get fluency in coding in Python languages using laboratory work (real examples of sequences to be compared, assembled...) and cases analysis. Most of the time students will have to translate from pseudocode to the corresponding language the orders the programmes have to follow.

As the subject needs from logical intelligence application and coding practise, all works to hand will be individuals. Nevertheless, common sessions can be used to put in commons problems, doubts or any kind of difficulty students might find. Students will be supported by the teacher all the time through e-mail and tutor appointments.

Two individual works will be handed through all the semester duration. This way, students must demonstrate their programming habilities and the capacities to use bioinformatics algorithms to solve the assigned cases. Moreover, students have to show understanding of the probabilistic issues behind of algorithms applied in each of the cases. Both coursework will be presented to the class, students must show the programme elaborated by themselves works come along with a clear and concise explanation.

 

Student work load:

Teaching mode Teaching methods Estimated hours
Classroom activities
Master classes 25
Practical exercises 15
Practical work, exercises, problem-solving etc. 16
Workshops 4
Individual study
Tutorials 3
Individual study 24
Individual coursework preparation 35
Research work 15
Compulsory reading 4
Recommended reading 4
Other individual study activities 5
Total hours: 150

ASSESSMENT SCHEME:

Calculation of final mark:

Individual coursework: 30 %
Final exam: 50 %
Presentaciones: 20 %
TOTAL 100 %

*Las observaciones específicas sobre el sistema de evaluación serán comunicadas por escrito a los alumnos al inicio de la materia.

BIBLIOGRAPHY AND DOCUMENTATION:

Basic bibliography:

JONES, Neil C.; PEVZNER, Pavel A. An introduction to bioinformatics algorithms. MIT press, 2004.
COMPEAU, Phillip; PEVZNER, P. A. Bioinformatics algorithms: an active learning approach, Vol. I. Sl: ACTIVE LEARNING, 2015.
COMPEAU, Phillip; PEVZNER, P. A. Bioinformatics algorithms: an active learning approach, Vol. II. Sl: ACTIVE LEARNING, 2015.
DURBIN, Richard, et al. Biological sequence analysis: probabilistic models of proteins and nucleic acids. Cambridge university press, 1998.

Recommended bibliography:

ANDERSON, James WJ, et al. Evolving stochastic context-free grammars for RNA secondary structure prediction. BMC bioinformatics, 2012, vol. 13, no 1, p. 78.
NEBEL, Markus E.; SCHEID, Anika. Evaluation of a sophisticated SCFG design for RNA secondary structure prediction. Theory in Biosciences, 2011, vol. 130, no 4, p. 313-336.
MODEL, Mitchell L. Bioinformatics Programming Using Python: Practical Programming for Biological Data. " O'Reilly Media, Inc.", 2009.

Recommended websites:

Documentación python https://docs.python.org/2.7/
Problemas python http://rosalind.info/problems/locations/
Bioconductor http://bioconductor.org/
Algoritmos bioinformáticos http://www.bioalgorithms.info


* Guía Docente sujeta a modificaciones