Bioinformtics and functional genomics.pdf

Added on July 28, 2012 by bellatrix231 in Books



Bioinformtics and functional genomics.pdf (Size: 37.61 MB)

Description

BIOINFORMATICS AND FUNCTIONAL GENOMICS

Second Edition

Jonathan Pevsner

Department of Neurology, Kennedy Krieger Institute
and
Department of Neuroscience and Division of Health Sciences
Informatics, The Johns Hopkins School of Medicine,
Baltimore, Maryland




Cover illustration includes detail from Leonardo da Vinci (1452–1519), dated c.1506–1507, courtesy
of the Schlossmuseum (Weimar).
ISBN: 978-0-470-08585-1


Preface to the Second Edition

The Neurobehavioral Unit of the Kennedy Krieger Institute has 16 hospital beds.
Most of the patients are children who have been diagnosed with autism, and most
engage in self-injurious behavior. They engage in self-biting, self-hitting, headbanging,
and other destructive behaviors. In most cases, we do not understand the
genetic contributions to such behaviors, limiting the available strategies for treatment.
In my research, I am motivated to understand molecular changes that underlie
childhood brain diseases. The field of bioinformatics provides tools we can use to
understand disease processes through the analysis of molecular sequence data.
More broadly, bioinformatics facilitates our understanding of the basic aspects of
biology including development, metabolism, adaptation to the environment, genetics
(e.g., the basis of individual differences), and evolution.
Since the publication of the first edition of this textbook in 2003, the fields of
bioinformatics and genomics have grown explosively. In the preface to the first edition
(2003) I noted that tens of billions of base pairs (gigabases) of DNA had been deposited
in GenBank. Now in 2009 we are reaching tens of trillions (terabases) of DNA,
presenting us with unprecedented challenges in how to store, analyze, and interpret
sequence data. In this second edition I have made numerous changes to the content
and organization of the book. All of the chapters are rewritten, and about 90% of the
figures and tables are updated. There are two new chapters, one on functional
genomics and one on the eukaryotic chromosome. I now focus on the globins as
examples throughout the book. Globins have a special place in the history of biology,
as they were among the first proteins to be identified (in the 1830s) and sequenced (in
the 1950s and 1960s). The first protein to have its structure solved by X-ray crystallography
was myoglobin (Chapter 11); molecular phylogeny was applied to the globins
in the 1960s (Chapter 7); and the globin gene loci were among the first to be
sequenced (in the 1980s; see Chapter 16).
The fields of bioinformatics and genomics are far too broad to be understood by
one person. Thus many textbooks are written by multiple authors, each of whom
brings a deeper knowledge of the subject matter. I hope that this book at least
offers the benefit of a single author’s vision of how to present the material. This is
essentially two textbooks: one on bioinformatics (parts I and II) and one on genomics
(part III). I feel that presenting bioinformatics on its own would be incomplete without
further applying those approaches to sequence analysis of genomes across the tree
of life. Similarly I feel that it is not possible to approach genomics without first treating
the bioinformatics tools that are essential engines of that field.
As with the previous edition a companion website is available which provides
up-to-date web links referred to in the book and PowerPoint slides arranged by
chapter (www.bioinfbook.org). A resource site for instructors is also available giving
detailed solutions to problems (www.wiley.com/go/pevsnerbioinformatics).
In preparing each edition of this book I read many papers and reviewed several
thousand websites. I sincerely apologize to those authors, researchers and others
whose work I did not cite. It is a great pleasure to acknowledge my colleagues who
have helped in the preparation of this book. Some read chapters including Jef
Boeke (Chapter 12), Rafael Irizarry (Chapter 9), Stuart Ray (Chapter 7), Ingo
Ruczinski (Chapter 11), and Sarah Wheelan (Chapters 3 and 5–7). I thank many
students and faculty at Johns Hopkins and elsewhere who have provided critical feedback,
including those who have lectured in bioinformatics and genomics courses
(Judith Bender, Jef Boeke, Egbert Hoiczyk, Ingo Ruczinski, Alan Scott, David
Sullivan, David Valle, and Sarah Wheelan). Many others engaged in helpful discussions
including Charles D. Cohen, Bob Cole, Donald Coppock, Laurence Frelin,
Hugh Gelch, Gary W. Goldstein, Marjan Gucek, Ada Hamosh, Nathaniel Miller,
Akhilesh Pandey, Elisha Roberson, Kirby D. Smith, Jason Ting, and N. Varg.
I thank my wife Barbara for her support and love as I prepared this book.


Foreword


Ask 10 investigators in human genetics what resources they need most and it is highly
likely that computational skills and tools will be at the top of the list. Genomics, with
its reliance on microarrays, genotyping, high throughput sequencing and the like, is
intensely data-rich and for this reason is impossible to disentangle from bioinformatics.
This text, with its clear descriptions, practical examples and focus on the
overlaps and interdependence of these two fields, is thus an essential resource for
students and practitioners alike.
Interestingly, bioinformatics and genomics are both relatively recent disciplines.
Each emerged in the course of the Human Genome Project (HGP) that was conceived
in the mid-1980s and began officially on October 1, 1990. As the HGP
matured from its initial focus on gene maps in model organisms to the massive efforts
to produce a reference human whole genome sequence, there was an increasing need
for computational biology tools to store, analyze and disseminate large amounts of
sequence data. For this reason, genomics increasingly relied on bioinformatics
and, in turn, the field of bioinformatics flourished. Today, no serious student of genomics
can imagine life without bioinformatics. This interdependence continues to
grow by leaps and bounds as the questions and activities of investigators in genomics
become bolder and more expansive; consider, for example, whole genome association
studies (GWAS), the ENCODE project, the challenge of copy number variants,
the 1000 Genomes project, epigenomics, and the looming growth of personal
genome sequences and their analysis.
This textbook provides a clear and timely introduction to both bioinformatics
and genomics. It is organized so that each chapter can correspond to a lecture for a
course on bioinformatics or genomics and, indeed, we have used it this way for our
students. Also, for readers not taking courses, the book provides essential
background material. For computer scientists and biologists alike the book offers
explanations of available methods and the kinds of problems for which they can be
used. The sections on bioinformatics in the first part of the book describe many of
the basic tools that are used to analyze and compare DNA and protein sequences.
The tone is inviting as the reader is guided to learn to use different software by
example. Multiple approaches for solving particular problems, such as sequence
alignment and molecular phylogeny, are presented. The middle part of the book
introduces functional genomics. Here again the focus is on helping the reader to
learn how to do analyses (such as microarray data analysis or protein structure
prediction) in a practical way. A companion website provides many data sets, so
the student can get experience in performing analyses. Chapter 12 provides a
roadmap to the very complicated topic of functional genomics, spanning a range
of techniques and model organisms used to study gene function. The last third of
the book provides a survey of the tree of life from a genomics perspective. There is an
attempt to be comprehensive, and at the same time, to present the material in an
interesting way, highlighting the fascinating features that make each genome unique.
Far from being a dry account of the facts of genomics and bioinformatics, the
book offers many features that highlight the vitality of this field. There are discussions
throughout about how to critically evaluate the performance of different software.
For example, there are ‘competitions’ in which different research groups perform
computational analyses on data sets that have been validated with some ‘gold standard’,
allowing false positive and false negative error rates to be determined. These
competitions are described in areas such as microarray data analysis (Chapter 9),
mass spectrometry (Chapter 10), protein structure prediction (Chapter 11), or
gene prediction (Chapter 16). The book also includes descriptions of important
movements in the fields of bioinformatics and genomics, ranging from the RefSeq
project for organizing sequences to the ENCODE and HapMap projects.
Similarly, there is a rich description of the historical context for different aspects of
bioinformatics and genomics, such as Garrod’s views on disease (Chapter 20);
Ohno’s classic 1970 book on genome duplication (Chapter 17); and, the earliest
attempts to create alignments and phylogenetic trees of the globins.
Where will the fields of bioinformatics and genomics go in the next five to 10
years? The opportunities are vast and any prediction will certainly be incomplete,
but it is certain that the rapid technological advances in sequencing will provide an
unprecedented view of human genetic variation and how this relates to phenotype.
In the area of human disease studies, genome-wide association studies can be
expected to lead to the identification of hundreds of genes underlying complex disorders.
Finally, our understanding of evolution and its relevance to medicine will
expand dramatically. Dr Pevsner’s valuable book will help the student or researcher
access the tools and learn the principles that will enable this exciting research.
David Valle, M.D.
Henry J. Knott Professor and Director, McKusick-Nathans Institute of Genetic Medicine,
Johns Hopkins University School of Medicine

Contents in Brief


PART I ANALYZING DNA, RNA, AND PROTEIN SEQUENCES IN DATABASES

1 Introduction

2 Access to Sequence Data and Literature Information

3 Pairwise Sequence Alignment

4 Basic Local Alignment Search Tool (BLAST)

5 Advanced Database Searching

6 Multiple Sequence Alignment

7 Molecular Phylogeny and Evolution



PART II GENOMEWIDE ANALYSIS OF RNA AND PROTEIN

8 Bioinformatic Approaches to Ribonucleic Acid (RNA)

9 Gene Expression: Microarray Data Analysis

10 Protein Analysis and Proteomics

11 Protein Structure

12 Functional Genomics



PART III GENOME ANALYSIS

13 Completed Genomes

14 Completed Genomes: Viruses

15 Completed Genomes: Bacteria and Archaea

16 The Eukaryotic Chromosome

17 Eukaryotic Genomes: Fungi

18 Eukaryotic Genomes: From Parasites to Primates

19 Human Genome

20 Human Disease



Glossary

Answers to Self-Test Quizzes

Author Index

Subject Index

