69443 - ALGORITHMS AND DATA STRUCTURES FOR COMPUTATIONAL BIOLOGY

Academic Year 2013/2014

  • Teaching Mode: Traditional lectures
  • Campus: Bologna
  • Corso: Second cycle degree programme (LM) in Bioinformatics (cod. 8020)

Learning outcomes

At the end of the cycle of lessons, the student has the basic knowledge of the methodologies and issues for the definition and complexity analysis of correct and efficient algorithms and data structures. In particular, the student has basic knowledge on abstract data types and is able to design scalable and arbitrarily complex data structures functional to the algorithmic requirements. The student will be able to create a functional synergy between the design of correct and efficient algorithms and the corresponding well designed data structures for common computational tasks, under the optimization criteria of both computational and space complexity for algorithms execution. The algorithmic methodologies will be presented with concrete examples and applications, including iterative and recursive programming, divide et impera, greedy and dynamic programming techniques. The student will be able to analyze the complexity of existing algorithms and data structures, using analysis methodologies for algorithmic complexity evaluation, and will be trained on the design and analysis new algorithms and data structures for computational biology.

Course contents

The international master in bioinformatics requires that all the classes are kept in english.
Introduction to algorithms and basic data structures: definitions, algorithms, simple data structures. Lists, arrays, simple operations, examples, implementation. Complex data structures, computation efficiency and relation between computation efficiency and data structures. Stacks and queues (definition, examples, implementation). Trees, visit algorithms of a tree and tree operations. Sets, dictionaries and hash tables, their operations and implementation concepts. Graphs, visit algorithms for graphs, operations and implementation concepts. Exercices and tests ( graphs and trees, priority queues and heaps). Binary trees, search trees, balanced binary trees, B-trees, AVL trees.
Algorithms. Decision tree. Analysis of computational complexity of algorithms. O(), Omega() and Theta() functions. Iterative and recursive programming methodologies. Divide and conquer. Recursion. Greedy techniques (knapsack problem, radix sorting problem, Huffman codes). Binary search. Sorting algorithms. Sequence pattern analysis and matching. Algorithm optimization. Complexity analysis of algorithms. Exercises and tests. 

Readings/Bibliography

It is fundamental to use the electronic slides and material provided during the classes, including live exercises. 
 Electronic slides will be provided (also made available on the teachers' website). 
 
Suggested readings (not to be necessarily purchased).  
 
Books more specific on computational biology topics:       
- Carlos Setubal, Joao Meidanis, Introduction to computational molecular biology, Cengage Learning eds.;    
- 1997 C. Gibas, P. Jambeck, Developing Bioinformatics Computer Skills, O'Reilly media, 2001  
 
Books more general on computer algorithms and data structures topics:   
- Alfred V. Aho, Jeffrey D. Ullman, John E. Hopcroft. Data Structures and Algorithms. Addison Wesley, 1983.   
- Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, Clifford Stein. Introduction to Algorithms. McGraw-Hill, 2001.   
- Donald E. Knuth. The Art of Computer Programming, Volumes 1-3. Addison-Wesley Professional, 1998.   
- S.B. Kishor Data Structures, Edition 3. Das Ganu Prakashan, Nagpur, 2008.

Teaching methods

This course is allocated in the second semester, as a single module totalling 10 cfu. The course is structured with live lessons, kept in english, illustrating the theoretical aspects of the topics concerning the data structures and algorithmic design for computational biology. After the introduction of basic definitions and concepts, the course illustrates the design of simple data structures and the relationship with the costs of execution (computational efficiency) and memory occupation (space) on the computer architecture.  Some results are presented and demonstrated concerning the design of efficient algorithms and their relationship with efficient and well designed data structures, functional to the algorithm execution. Some programming and algorithmic design techniques are introduced and discussed in terms of computational and space complexity: iterative, recursive, divide et impera, greedy, dynamic programming. Finally, space is given to the application of design concepts towards the realization of solutions for exercises and analysis methodologies in bioinformatics and molecular biology. In addition to the live lessons, live exercises in class and homeworks exercises are provided.

Assessment methods

The final exam is composed of a written part (3 hours) and an oral part (30 minutes). During the written part it is allowed the use of books and notes. It is NOT allowed the use of calculator and communication devices, including Internet access. The written part is composed of two sections: the first section concerns the design and analysis of algorithms and data structures oriented to the maximum computational and space efficiency, under some predefined constraints of the execution architecture, by adopting the techniques and knowledge learnt during the classes. Specifically, the candidates must develop the pseudo code of the algorithm and the corresponding well designed data structures, by commenting and motivating the choices made, and defining the space and time complexity of execution. The first exercise will takes more or less 90 minutes, and the weigth of the score is more or less 50% of the total score. Such details on time and weight are provided in the text, case by case. The second section of the written part concerns exercises on the topics of the classes, including interpretation and design of the concepts learned. Such exercises are derived and similar to those made in class. The time limit is 90 minutes and the weight is more or less 50% of the total score. The calendar of written exams is provided by the end of the classes, and at least one writen exam session is allocated in June, July, September, October, December and February. It is mandatory to subscribe to the list of participants on almaesami within the announced deadlines. Provided that the candidate passes the written exam (with evaluation greater or equal to 18/30), the candidate is scheduled for the oral part, which must be afforded within the end of the exam session, in the days following the written exam. In case of insufficient evaluation or no show at the oral part, the candidate must repeat the written exam in next sessions. Upon sufficient evaluation in the oral part, the candidate is provided with a final evaluation computer as weighted average of the evaluations in the written and oral parts. The weight of the written part is 80% and the oral part is 30% (total 110%). The sum (written score * 0,80) + (oral score * 0,30) provides the final score. In case the final score is above 30/30 the candidate will be awarded the "cum laude" evaluation. To pass the exam it is mandatory to receive a sufficient evaluation in both the written and oral parts. It is facultative for candidates to realize a personal lab project dealing with the realization of algorithmic solutions to bioinformatic problems, based on, or extending the current state of the art. The facultative project will provide a range of (0, +3) points to be added to the final evaluation of the written and oral part.


Teaching tools

Electronic slides and projector.
Personal computer.
Black board.

Links to further information

http://www.cs.unibo.it/~bononi/

Office hours

See the website of Luciano Bononi