Teaching Guide 2020-21


Id.: 33709
Subject type: Compulsory
Year: 3 Teaching period: First semester
Credits: 3 Total hours: 75
Classroom activities: 27 Individual study: 48
Main teaching language: English Secondary teaching language: Spanish
Lecturer: JIMENO MARTIN, ANGELA (T) Email:


This subject presents the principles needed to integrate different biological data sources. It introduces the most common approaches through a review of their architectures, which are described through practical case studies of existing integrated biological repositories. The subject also provides an overview of the XML language and related tools (XSLT, XPath, XQuery and XML Schema) used to manage and retrieve information from biological databases and to handle the meta-data on which the different integration architectures rely. Finally, the subject covers the main matching and mapping techniques that make the semantic and syntactic integration of information possible.
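As a minimal illustration of the kind of retrieval XPath enables, the sketch below uses Python's standard `xml.etree.ElementTree`, which implements only a limited subset of XPath, to select the accession numbers of entries from a given organism. The element names (`database`, `entry`, `accession`, `gene`, `organism`), the function name and the sample values are hypothetical; real biological repositories define their own schemas and namespaces, and the richer XPath/XQuery features covered in the course need a full XPath/XQuery processor.

```python
import xml.etree.ElementTree as ET

# Hypothetical export with two entries; element names and values are illustrative only.
SAMPLE = """<database>
  <entry id="1">
    <accession>P00001</accession>
    <gene>geneA</gene>
    <organism>Homo sapiens</organism>
  </entry>
  <entry id="2">
    <accession>P00002</accession>
    <gene>geneB</gene>
    <organism>Mus musculus</organism>
  </entry>
</database>"""


def accessions_for_organism(xml_text: str, organism: str) -> list[str]:
    """Return the accession of every <entry> whose <organism> text equals `organism`."""
    root = ET.fromstring(xml_text)
    # ElementTree supports a subset of XPath, including the [child='text'] predicate.
    # The organism value is inserted as-is, so it must not contain a single quote.
    path = f"./entry[organism='{organism}']"
    return [entry.findtext("accession") for entry in root.findall(path)]


if __name__ == "__main__":
    print(accessions_for_organism(SAMPLE, "Homo sapiens"))
```

Running the script prints `['P00001']`; an organism with no matching entries simply yields an empty list.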


General programme competences:
G01 Use learning strategies autonomously for their application in the continuous improvement of professional practice.
G02 Analyse and synthesise problems arising in their professional activity and apply the results in similar environments.
G03 Cooperate to achieve common results through teamwork in a context of integration, collaboration and empowerment of critical discussion.
G04 Reason critically on the basis of information, data and lines of action, and apply this reasoning to relevant issues of a social, scientific or ethical nature.
G05 Communicate professional topics in Spanish and/or English, both orally and in writing.
G06 Solve complex or unforeseen problems that arise during the professional activity within any type of organisation and adapt to the needs and demands of their professional environment.
G07 Choose between different complex models of knowledge to solve problems.
G09 Apply information and communication technologies in the professional field.
G10 Apply creativity, independence of thought, self-criticism and autonomy in the professional practice.

Specific programme competences:
E02 Develop the use and programming of computers, databases and computer programs and their application in bioinformatics.
E03 Apply the fundamental concepts of mathematics, logic, algorithmics and computational complexity to solve problems specific to bioinformatics.
E04 Program applications in a robust, correct, and efficient way, choosing the paradigm and the most appropriate programming languages, applying knowledge about basic algorithmic procedures and using the most appropriate types and data structures.
E05 Implement well-founded applications, previously designed and analysed, based on the characteristics of the databases.
E06 Apply the fundamental principles and basic techniques of intelligent systems and their practical application in the field of bioinformatics.
E07 Apply the principles, methodologies and life cycles of software engineering to the development of a project in the field of bioinformatics.
E12 Apply the principles and techniques of protein computational modelling to predict their biological function, their activity or new therapeutic targets (Structural Bioinformatics, Computational Toxicology).
E13 Apply omics technologies to extract statistically significant information and to create relational databases of biodata that can be updated and made publicly accessible to the scientific community.
E14 Use the programming languages most commonly used in the field of Life Sciences to develop and evaluate techniques and/or computational tools.
E15 Infer the evolutionary history of genes and proteins through the creation and interpretation of phylogenetic trees.
E16 Plan linkage and association studies for medical and environmental purposes.
E17 Induce complex relationships between samples by applying statistical and classification techniques.
E18 Apply statistical and computational methods to solve problems in the fields of molecular biology, genomics, medical research and population genetics.
E21 Apply computational and data processing techniques for the integration of physical, chemical and biological concepts and data for the description and/or prediction of the activity of a substance in a given context.


It is recommended that students have a general overview of the main biological databases and understand basic SQL syntax.



Because this degree follows a blended learning format (online and on-site learning activities), students are expected to alternate weeks of traditional classes at the University facilities with weeks of online webinars (via Microsoft Teams) and activities. However, students may choose to attend in person during the regular class schedule to carry out on-site activities.

Given the health situation caused by the COVID-19 pandemic, any assessment tests that cannot be held on site will be carried out through the online platform.

Subject contents:

1 - Introduction
    1.1 - Overview of information systems in bioinformatics
    1.2 - Types and requirements
2 - Architectures for Information Integration
    2.1 - Data Warehouse
    2.2 - Federated Databases
    2.3 - Mediator-based Databases
    2.4 - Peer-to-peer Databases
3 - XML language applied to Bioinformatics
    3.1 - Introduction to XML
    3.2 - XPath and XSLT
    3.3 - DTD and XML Schema
    3.4 - XQuery
4 - Schema and meta-data management in information integration systems
    4.1 - Matching techniques
    4.2 - Mapping techniques
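To give a flavour of topic 4, the sketch below shows one very simple matching technique: a name-based matcher that compares attribute names from two hypothetical schemas with `difflib.SequenceMatcher` and keeps the best-scoring pair above a threshold. The resulting correspondences are then used as a mapping to rename the fields of a source record. Real matching systems combine many more signals (data types, instance values, structure, ontologies); the schemas, the 0.75 threshold and the function names here are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def normalise(name: str) -> str:
    """Lower-case a name and drop common separators ('Gene_Name' -> 'genename')."""
    return name.lower().replace("_", "").replace("-", "").replace(" ", "")


def match_schemas(source: list[str], target: list[str],
                  threshold: float = 0.75) -> dict[str, str]:
    """Name-based matching: pair each source attribute with its most similar target
    attribute, keeping only pairs whose similarity ratio reaches `threshold`."""
    correspondences: dict[str, str] = {}
    for s in source:
        best, best_score = None, 0.0
        for t in target:
            score = SequenceMatcher(None, normalise(s), normalise(t)).ratio()
            if score > best_score:
                best, best_score = t, score
        if best is not None and best_score >= threshold:
            correspondences[s] = best
    return correspondences


def apply_mapping(record: dict, correspondences: dict[str, str]) -> dict:
    """Mapping step: rename a source record's fields into the target schema,
    dropping fields that have no correspondence."""
    return {correspondences[k]: v for k, v in record.items() if k in correspondences}


if __name__ == "__main__":
    source = ["Gene_Name", "Organism", "SeqLength"]
    target = ["geneName", "organism", "sequence_length", "taxon_id"]
    mapping = match_schemas(source, target)
    print(mapping)
    print(apply_mapping({"Gene_Name": "geneA", "Organism": "Homo sapiens"}, mapping))
```

The threshold is the main design trade-off: `SeqLength` and `sequence_length` score about 0.78, so raising the threshold to 0.8 would miss that correspondence, while lowering it too far invites false matches. This sketch also does not stop two source attributes from mapping to the same target attribute.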

Subject planning may be modified due to unforeseen circumstances (group performance, availability of resources, changes to the academic calendar, etc.) and should therefore not be considered definitive.


Teaching and learning methodologies and activities applied:

Theory Sessions: Lectures will be used to explain the fundamentals of each chapter. Wherever possible, explanations will be accompanied by images, text or sounds used as practical examples and topics for discussion. During the sessions, the lecturer will propose activities and information searches to be carried out outside class, and will answer students' questions.

Practical activities: During these sessions, students will examine real examples of the integration architectures explained in class, available through different websites, and will learn to use the data mining tools offered by each site to retrieve data. They will also apply the concepts explained in class through hands-on practice creating XML files, which they should be able to extend using the content explained in class and other bibliographic resources.
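XML files like the ones created in the practical sessions can be written by hand or generated programmatically. The minimal sketch below builds a record with Python's standard `xml.etree.ElementTree` and serialises it as indented text; the element and attribute names, the function names and the sample sequence are hypothetical. ElementTree does not validate documents, so checking a file against a DTD or XML Schema (topic 3.3) would require an additional tool.

```python
import xml.etree.ElementTree as ET


def build_entry(accession: str, gene: str, organism: str, sequence: str) -> ET.Element:
    """Build one <entry> element; the element and attribute names are hypothetical."""
    entry = ET.Element("entry", attrib={"accession": accession})
    ET.SubElement(entry, "gene").text = gene
    ET.SubElement(entry, "organism").text = organism
    # Store the sequence length as an attribute, computed from the sequence itself.
    seq = ET.SubElement(entry, "sequence", attrib={"length": str(len(sequence))})
    seq.text = sequence
    return entry


def database_to_string(entries: list[ET.Element]) -> str:
    """Wrap entries in a <database> root and return indented XML text."""
    root = ET.Element("database")
    root.extend(entries)
    ET.indent(root)  # pretty-printing helper, available from Python 3.9
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    entry = build_entry("P00001", "geneA", "Homo sapiens", "MKTAYIAK")
    print(database_to_string([entry]))
```

To save a document to disk rather than print it, the root element can be wrapped in `ET.ElementTree(...)` and written with its `write(path, encoding="utf-8", xml_declaration=True)` method.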

The lecturer will be available to students during tutorial hours to help them with all matters concerning the course. On request, group tutorials may be scheduled to monitor the group's work. The concepts explained in each chapter will be used in the following ones.

Student workload:

Teaching mode / teaching method                      Estimated hours
Classroom activities                                 27
    Master classes                                   15
    Other theory activities                          5
    Practical work, exercises, problem-solving etc.  3
    Coursework presentations                         2
    Assessment activities                            2
Individual study                                     48
    Tutorials                                        3
    Individual study                                 11
    Individual coursework preparation                15
    Project work                                     6
    Compulsory reading                               5
    Recommended reading                              3
    Other individual study activities                2
    Information research                             3
Total hours                                          75


Calculation of final mark:

Written tests: 20 %
Individual coursework: 15 %
Final exam: 45 %
Individual project: 20 %
TOTAL 100 %

*Specific remarks on the assessment system will be communicated to students in writing at the start of the course.


Basic bibliography:

SILBERSCHATZ, Abraham; KORTH, Henry F. and SUDARSHAN, S. Database System Concepts. McGraw-Hill S.A., 2007.
LACROIX, Zoe and CRITCHLOW, Terence. Bioinformatics: Managing Scientific Data (The Morgan Kaufmann Series in Multimedia Information and Systems). Morgan Kaufmann Publishers Inc., 2003.
CERAMI, Ethan. XML for Bioinformatics. Springer, 2005.

Recommended bibliography:

ABITEBOUL, Serge; MANOLESCU, Ioana et al. Web Data Management. Cambridge University Press, 2011.
CHEN, Ming and HOFESTÄDT, Ralf. Approaches in Integrative Bioinformatics (Towards the Virtual Cell). Springer, 2014.

Recommended websites:

About data warehouse
Apache Cassandra
NeuroImaging tools and resources collaboratory
W3C Extensible Markup Language (XML) 1.0
XPath and XQuery Data Model 3.1
XSL Transformations (XSLT) Version 3.1
XML Schema