[ Announcements | Homeworks | Projects ]



CSC 450/550  Algorithms in Bioinformatics

Class meetings Fall 2024
Tuesdays and Thursdays, 2:00-3:15pm
in-person at Gould-Simpson 906
Dates
August 27
November 28
December 10
  First class
  Thanksgiving recess
  Last class
Course description This course introduces fundamental results in discrete algorithms for combinatorial problems in bioinformatics and computational biology. The emphasis is on realistic models of computational problems that arise in the analysis of biological data, and practical algorithms for their solution. The content has depth in the area of biological sequence analysis, and breadth in areas such as pathway inference in cellular reaction networks. Grades are based on homeworks, exams, and a term paper.
Official catalog description The official catalog description of the course is available at the University course catalog here.
Prerequisites CSC 345 (Analysis of Discrete Structures), or permission of instructor
Instructor John Kececioglu
kece@cs.arizona.edu
Gould-Simpson 727
(520) 621-4526
Assistant Aunabil Chakma
aunabilchakma@arizona.edu
Office hours
Monday   11am-12pm    Aunabil    Gould-Simpson 942
Tuesday   3:30-4:30pm    John    Gould-Simpson 727
Wednesday   1:30-2:30pm    John    Gould-Simpson 727
Web
Course homepage
Instructor homepage
D2L site
  www.cs.arizona.edu/classes/cs450/fall24/
  www.cs.arizona.edu/people/kece/
  d2l.arizona.edu/d2l/home/1507437
Course format The course is taught in-person in a lecture format (without an online Zoom stream or video recordings of lectures). Lectures are given by the instructor. Homeworks and exams all involve individual work, while the term paper can involve group work.
Course objectives The teaching objectives for the course are algorithms for: biological sequence analysis, specifically string matching and sequence alignment; and higher-level analysis, specifically pathway inference in reaction networks.
Learning outcomes CSC 450 and 550   Students successfully completing the course will learn fundamental algorithms that are widely used in bioinformatics, and will gain the capacity to design new algorithms for bioinformatics and computational biology applications. Skill in algorithm design for bioinformatics will be acquired through problem solving on homeworks and exams.

CSC 550   For graduate students taking CSC 550, the ability to perform problem reduction (reducing new problems to known solved problems) will also be acquired through problem solving on homeworks.

Course communications Online communication outside the lectures will be via postings on the announcements section at the bottom of this page, and by email with the instructor at the address given above.
Required text Wing-Kin Sung,
Algorithms in Bioinformatics: A Practical Introduction,
Chapman and Hall, Boca Raton, Florida, 2010.

(The University Library's ebook for the above required text, which provides free access for students, is available here.)

Related books The following books are also relevant.
Handouts The handouts that have been used in lectures are listed below.
Homeworks There are four homework assignments, which emphasize algorithm design and analysis. The last homework is optional and not required, and may be used to replace your lowest prior homework score.
Exams There is one exam: a midterm.
  • Midterm exam (450/550), due October 17 by 11:59pm.
The final project that is described below will serve as the final exam for the course. (The University's final exam regulations are here, and the University's final exam schedule is here.)
Project There is one final project for the course, that serves as the final exam. The project consists of a term paper that surveys the literature on a topic outside the lectures, or presents work toward developing a new result in bioinformatics.
Grading

Course grades use the regular scale of A, B, C, D, and E. Grades are determined from a weighted average of the percentage of points earned on homeworks, the midterm exam, and the term paper for the final project.

30%   Homeworks
40%   Midterm exam
30%   Final project

Attaining a weighted score of at least 90% guarantees an A in the course, at least 80% guarantees a B, at least 70% guarantees a C, and at least 60% guarantees a D. These thresholds may be lowered, but will not be raised.

The awarding of points on homework and exam questions is roughly according to the following scheme. Having the right solution idea and the right technical execution of this idea earns at least 90%. (Only a perfect write-up that cannot be improved earns 100%.) Having the right idea but with errors in its execution is at least 80%. Having the wrong idea and errors in its execution, but demonstrating comprehension of the material, is at least 70%. Having the wrong idea, errors in execution, and deficiencies in comprehension, is roughly 60%. Work that shows no understanding is roughly 50%.

On homework and exam questions, writing an answer that relates to the question guarantees at least 50% of the points for the question (which is a failing percentage). No points are awarded for writing nothing.

Homeworks may contain bonus problems. Points earned on bonus questions are not added to points for required questions. Bonus points are tallied separately at the end of the semester, and considered subjectively as a measure of effort when deciding whether to move up a student who is near a letter-grade boundary.

On homework, very-high-level ideas can be discussed with friends, but solutions must represent individual work and must be written up separately. Use of solutions from previous offerings of the course is not permitted. Any material from the Internet that is used in a solution must be cited by its URL; to not cite it is plagiarism, which is considered cheating. (For more information, see the Department of Computer Science Course Policy on Collaboration and the University Code of Academic Integrity.)

When turning in solutions to homeworks, start each problem on a separate page, and put the problems in the correct order. Neatness, and especially conciseness, is required to earn the highest marks. If a problem eludes solution, state this up front and write down only what you know to be correct. Rambling at length about attempts that didn't succeed may result in more points being deducted.

Homework that is late (submitted after the due date and time) is not accepted.

Convened 400/500-level course   In this co-convened course, for graduate students taking CSC 550, the problems on homeworks and exams will be at a more challenging level of difficulty than for undergraduate students taking CSC 450. Other than this difference, the lectures, assignments, and grading are the same for both CSC 450 and CSC 550.

Incomplete or withdrawal   Requests for an incomplete (I) or withdrawal (W) must be made in accordance with University policies, which are available here.

Regrades   Requests for regrades of homeworks or exams must be submitted within one week of receiving the graded homework or exam.

Attendance Attendance of lectures is required. Roll is taken at the start of class with a sign-in sheet; if your signature is not on the roll sheet, it counts as an absence. Students who miss class due to illness or emergency are required to bring documentation from their healthcare provider or other relevant, professional third parties. Failure to submit third-party documentation will result in unexcused absences.

The attendance policy is that you are allowed one unexcused absence in the first week, two unexcused absences in the first four weeks, and four unexcused absences in the first eight weeks. On exceeding these limits, you may be dropped from class.

Honors credit Undergraduate students who wish to take the course for honors credit should email the instructor to make an appointment to discuss the terms and to sign the honors course request form.
Course topics We cover the topics of the course in three parts. We go into depth in the area of sequence analysis, with the first part focusing on string matching, and the second part on sequence alignment. For breadth we then move to higher-level analysis, with the third part focusing on pathway inference in cellular reaction networks.

I   String matching

  • Suffix arrays:  Exact string matching using suffix arrays; linear-time construction; height array construction.
  • Ferragina-Manzini index:  Optimal string matching using the Ferragina-Manzini index; the Burrows-Wheeler transform.
  • Applications:  Solving diverse string matching problems using suffix arrays and the Ferragina-Manzini index.

II   Sequence alignment

  • Two-sequence alignment:  Local and global alignment; affine gap costs; linear-space alignment; inverse parametric alignment.
  • Multiple-sequence alignment:  Models of multiple sequence alignment; exact algorithms, approximation algorithms, heuristics; parameter advising.

III   Pathway inference

  • Shortest hyperpaths:  Inferring systems of reactions that produce target molecules from source compounds via shortest hyperpaths in directed hypergraphs.
  • Shortest factories:  Inferring reactions that produce targets from sources while conserving or not depleting intermediate metabolites via shortest factories in metabolic networks.
Weekly schedule We cover these topics in the following weekly schedule. (This schedule is tentative, and subject to adjustment based on instructor discretion. When appropriate, relevant readings in the sections of the textbook, and links to handouts, will be posted.)

August 27 August 29 Overview; biology background
Septemer 3 September 5 Suffix arrays and their use in string matching
September 10 September 12
(Homework 1 out)
Suffix array construction in linear time
September 17 September 19 Height array construction
September 24 September 26
(Homework 1 due)
(Homework 2 out)
Burrows-Wheeler transform and Ferragina-Manzini index
October 1 October 3
(Homework 1 back)
Applications of suffix arrays
October 8 October 10
(Homework 2 due)
Applications continued; homework solutions discussed
October 15
(Homework 2 back)
October 17
(Midterm exam)
Exam review; midterm exam
October 22 October 24
(Midterm exam back)
(Homework 3 out)
Sequence alignment; relation to shortest paths
October 29 October 31 Linear space alignment
November 5 November 7
(Homework 3 due)
(optional Homework 4 out)
Multiple sequence alignment models; exact algorithms
November 12 November 14
(Term paper out)
(Homework 3 back)
Approximation algorithms for multiple alignment
November 19 November 21
(optional Homework 4 due)
Aligning alignments exactly
November 26 (November 28)
(No class)
Heuristics for multiple alignment; Thanksgiving recess
December 3
(Homework 4 back)
December 5 Parameter advising for multiple alignment; pathway inference by shortest hyperpaths
December 10
(Term paper due)
Pathway inference by shortest factories

Classroom behavior

To foster a positive learning environment, students and instructors have a shared responsibility. We want a safe, welcoming, and inclusive environment where all of us feel comfortable with each other and where we can challenge ourselves to succeed. To that end, our focus is on the tasks at hand and not on extraneous activities (such as texting, chatting, reading a newspaper, making phone calls, web surfing, and so on).

Students are asked to refrain from disruptive conversations with people sitting around them during lecture. Students observed engaging in disruptive activity will be asked to cease this behavior. Those who continue to disrupt the class will be asked to leave lecture or discussion and may be reported to the Dean of Students.

Some learning styles are best served by using personal electronics, such as laptops and iPads. These devices can be distracting to other learners. Therefore, students who prefer to use electronic devices for note-taking during lecture should use one side of the classroom.

Safety on campus For a list of emergency procedures for all types of incidents, please visit the website of the Critical Incident Response Team (CIRT), available here.

Please also watch the video available here.

University-wide policies Links to the following University policies (listed below) are available here.
  • Absence and class participation policies
  • Threatening behavior policy
  • Accessibility and accommodations policy
  • Code of academic integrity
  • Nondiscrimination and anti-harrassment policy
Department-wide policies Links to the following Department policies (listed below) are available here.
  • Department code of conduct
  • Class recordings
  • Illnesses and emergencies
  • Obtaining help
  • Preferred names and pronouns
  • Confidentiality of student records
  • Additional resources
Land Acknowledgement Statement We respectfully acknowledge the University of Arizona is on the land and territories of indigenous peoples. Today, Arizona is home to twenty-two federally recognized tribes, with Tucson being home to the O’odham and the Yaqui. Committed to diversity and inclusion, the University strives to build sustainable relationships with sovereign native nations and indigenous communities through education offerings, partnerships, and community service.
Subject to change

Information contained in the course syllabus, other than the grade and absence policy, may be subject to change with advance notice, as deemed appropriate by the instructor.

Distributions Grade distributions are available here.
Announcements

The grade distributions for the Term Paper, and for the Overall course score, have been posted above.

The final response rate for the SCS survey at the end of Wednesday, December 11, reached 72.41%.

(Please complete the SCS survey on course feedback. If the response rate is at least 80% by December 11 at 5pm, all enrolled students will receive 5 bonus points.)


[ Top | Department ]

http://www.cs.arizona.edu/classes/cs450/fall24/
John Kececioglu (kece@cs.arizona.edu)
20 December 2024