Bioinformtics and functional genomics.pdf (Size: 37.61 MB)
Description
BIOINFORMATICS AND FUNCTIONAL GENOMICS
Second Edition
Jonathan Pevsner
Department of Neurology, Kennedy Krieger Institute and Department of Neuroscience and Division of Health Sciences Informatics, The Johns Hopkins School of Medicine, Baltimore, Maryland
Cover illustration includes detail from Leonardo da Vinci (1452–1519), dated c.1506–1507, courtesy of the Schlossmuseum (Weimar).
ISBN: 978-0-470-08585-1
Preface to the Second Edition
The Neurobehavioral Unit of the Kennedy Krieger Institute has 16 hospital beds. Most of the patients are children who have been diagnosed with autism, and most engage in self-injurious behavior. They engage in self-biting, self-hitting, headbanging, and other destructive behaviors. In most cases, we do not understand the genetic contributions to such behaviors, limiting the available strategies for treatment. In my research, I am motivated to understand molecular changes that underlie childhood brain diseases. The field of bioinformatics provides tools we can use to understand disease processes through the analysis of molecular sequence data. More broadly, bioinformatics facilitates our understanding of the basic aspects of biology including development, metabolism, adaptation to the environment, genetics (e.g., the basis of individual differences), and evolution.
Since the publication of the first edition of this textbook in 2003, the fields of bioinformatics and genomics have grown explosively. In the preface to the first edition (2003) I noted that tens of billions of base pairs (gigabases) of DNA had been deposited in GenBank. Now in 2009 we are reaching tens of trillions (terabases) of DNA, presenting us with unprecedented challenges in how to store, analyze, and interpret sequence data.
In this second edition I have made numerous changes to the content and organization of the book. All of the chapters are rewritten, and about 90% of the figures and tables are updated. There are two new chapters, one on functional genomics and one on the eukaryotic chromosome.
I now focus on the globins as examples throughout the book. Globins have a special place in the history of biology, as they were among the first proteins to be identified (in the 1830s) and sequenced (in the 1950s and 1960s). The first protein to have its structure solved by X-ray crystallography was myoglobin (Chapter 11); molecular phylogeny was applied to the globins in the 1960s (Chapter 7); and the globin gene loci were among the first to be sequenced (in the 1980s; see Chapter 16).
The fields of bioinformatics and genomics are far too broad to be understood by one person. Thus many textbooks are written by multiple authors, each of whom brings a deeper knowledge of the subject matter. I hope that this book at least offers the benefit of a single author’s vision of how to present the material.
This is essentially two textbooks: one on bioinformatics (parts I and II) and one on genomics (part III). I feel that presenting bioinformatics on its own would be incomplete without further applying those approaches to sequence analysis of genomes across the tree of life. Similarly, I feel that it is not possible to approach genomics without first treating the bioinformatics tools that are essential engines of that field.
As with the previous edition, a companion website is available that provides up-to-date web links referred to in the book and PowerPoint slides arranged by chapter (www.bioinfbook.org). A resource site for instructors is also available, giving detailed solutions to problems (www.wiley.com/go/pevsnerbioinformatics).
In preparing each edition of this book I read many papers and reviewed several thousand websites. I sincerely apologize to those authors, researchers and others whose work I did not cite. It is a great pleasure to acknowledge my colleagues who have helped in the preparation of this book.
Some read chapters, including Jef Boeke (Chapter 12), Rafael Irizarry (Chapter 9), Stuart Ray (Chapter 7), Ingo Ruczinski (Chapter 11), and Sarah Wheelan (Chapters 3 and 5–7). I thank many students and faculty at Johns Hopkins and elsewhere who have provided critical feedback, including those who have lectured in bioinformatics and genomics courses (Judith Bender, Jef Boeke, Egbert Hoiczyk, Ingo Ruczinski, Alan Scott, David Sullivan, David Valle, and Sarah Wheelan). Many others engaged in helpful discussions, including Charles D. Cohen, Bob Cole, Donald Coppock, Laurence Frelin, Hugh Gelch, Gary W. Goldstein, Marjan Gucek, Ada Hamosh, Nathaniel Miller, Akhilesh Pandey, Elisha Roberson, Kirby D. Smith, Jason Ting, and N. Varg. I thank my wife Barbara for her support and love as I prepared this book.
Foreword
Ask 10 investigators in human genetics what resources they need most and it is highly likely that computational skills and tools will be at the top of the list. Genomics, with its reliance on microarrays, genotyping, high throughput sequencing and the like, is intensely data-rich and for this reason is impossible to disentangle from bioinformatics. This text, with its clear descriptions, practical examples and focus on the overlaps and interdependence of these two fields, is thus an essential resource for students and practitioners alike.
Interestingly, bioinformatics and genomics are both relatively recent disciplines. Each emerged in the course of the Human Genome Project (HGP) that was conceived in the mid-1980s and began officially on October 1, 1990. As the HGP matured from its initial focus on gene maps in model organisms to the massive efforts to produce a reference human whole genome sequence, there was an increasing need for computational biology tools to store, analyze and disseminate large amounts of sequence data. For this reason, genomics increasingly relied on bioinformatics and, in turn, the field of bioinformatics flourished.
Today, no serious student of genomics can imagine life without bioinformatics. This interdependence continues to grow by leaps and bounds as the questions and activities of investigators in genomics become bolder and more expansive; consider, for example, whole genome association studies (GWAS), the ENCODE project, the challenge of copy number variants, the 1000 Genomes project, epigenomics, and the looming growth of personal genome sequences and their analysis.
This textbook provides a clear and timely introduction to both bioinformatics and genomics. It is organized so that each chapter can correspond to a lecture for a course on bioinformatics or genomics and, indeed, we have used it this way for our students. Also, for readers not taking courses, the book provides essential background material. For computer scientists and biologists alike the book offers explanations of available methods and the kinds of problems for which they can be used.
The sections on bioinformatics in the first part of the book describe many of the basic tools that are used to analyze and compare DNA and protein sequences. The tone is inviting as the reader is guided to learn to use different software by example. Multiple approaches for solving particular problems, such as sequence alignment and molecular phylogeny, are presented. The middle part of the book introduces functional genomics. Here again the focus is on helping the reader to learn how to do analyses (such as microarray data analysis or protein structure prediction) in a practical way. A companion website provides many data sets, so the student can get experience in performing analyses. Chapter 12 provides a roadmap to the very complicated topic of functional genomics, spanning a range of techniques and model organisms used to study gene function. The last third of the book provides a survey of the tree of life from a genomics perspective.
There is an attempt to be comprehensive, and at the same time, to present the material in an interesting way, highlighting the fascinating features that make each genome unique.
Far from being a dry account of the facts of genomics and bioinformatics, the book offers many features that highlight the vitality of this field. There are discussions throughout about how to critically evaluate the performance of different software. For example, there are ‘competitions’ in which different research groups perform computational analyses on data sets that have been validated with some ‘gold standard’, allowing false positive and false negative error rates to be determined. These competitions are described in areas such as microarray data analysis (Chapter 9), mass spectrometry (Chapter 10), protein structure prediction (Chapter 11), or gene prediction (Chapter 16). The book also includes descriptions of important movements in the fields of bioinformatics and genomics, ranging from the RefSeq project for organizing sequences to the ENCODE and HapMap projects. Similarly, there is a rich description of the historical context for different aspects of bioinformatics and genomics, such as Garrod’s views on disease (Chapter 20); Ohno’s classic 1970 book on genome duplication (Chapter 17); and the earliest attempts to create alignments and phylogenetic trees of the globins.
Where will the fields of bioinformatics and genomics go in the next five to 10 years? The opportunities are vast and any prediction will certainly be incomplete, but it is certain that the rapid technological advances in sequencing will provide an unprecedented view of human genetic variation and how this relates to phenotype. In the area of human disease studies, genome-wide association studies can be expected to lead to the identification of hundreds of genes underlying complex disorders. Finally, our understanding of evolution and its relevance to medicine will expand dramatically.
Dr Pevsner’s valuable book will help the student or researcher access the tools and learn the principles that will enable this exciting research.
David Valle, M.D.
Henry J. Knott Professor and Director
McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine
Contents in Brief
PART I ANALYZING DNA, RNA, AND PROTEIN SEQUENCES IN DATABASES
1 Introduction
2 Access to Sequence Data and Literature Information
3 Pairwise Sequence Alignment
4 Basic Local Alignment Search Tool (BLAST)
5 Advanced Database Searching
6 Multiple Sequence Alignment
7 Molecular Phylogeny and Evolution
PART II GENOMEWIDE ANALYSIS OF RNA AND PROTEIN
8 Bioinformatic Approaches to Ribonucleic Acid (RNA)
9 Gene Expression: Microarray Data Analysis
10 Protein Analysis and Proteomics
11 Protein Structure
12 Functional Genomics
PART III GENOME ANALYSIS
13 Completed Genomes
14 Completed Genomes: Viruses
15 Completed Genomes: Bacteria and Archaea
16 The Eukaryotic Chromosome
17 Eukaryotic Genomes: Fungi
18 Eukaryotic Genomes: From Parasites to Primates
19 Human Genome
20 Human Disease
Glossary
Answers to Self-Test Quizzes
Author Index
Subject Index