Information Theory, Inference, and Learning Algorithms
David J.C. MacKay
Information Theory,
Inference,
and Learning Algorithms
David J.C. MacKay
mackay@mrao.cam.ac.uk
© 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003
Version 6.0 (as published) June 26, 2003
Please send feedback on this book via
http://www.inference.phy.cam.ac.uk/mackay/itila/
This book will be published by C.U.P. in September 2003. It will remain
viewable on-screen on the above website, in postscript, djvu, and pdf
formats.
(C.U.P. replace this page with their own page ii.)
Contents
Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . v
1 Introduction to Information Theory . . . . . . . . . . . . . . . . 3
2 Probability, Entropy, and Inference . . . . . . . . . . . . . . . . . 22
3 More about Inference . . . . . . . . . . . . . . . . . . . . . . . . 48
I Data Compression . . . . . . . . . . . . . . . . . . . . . . . . 65
4 The Source Coding Theorem . . . . . . . . . . . . . . . . . . . . 67
5 Symbol Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
6 Stream Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
7 Codes for Integers . . . . . . . . . . . . . . . . . . . . . . . . . . 132
II Noisy-Channel Coding . . . . . . . . . . . . . . . . . . . . . . 137
8 Correlated Random Variables . . . . . . . . . . . . . . . . . . . . 138
9 Communication over a Noisy Channel . . . . . . . . . . . . . . . 146
10 The Noisy-Channel Coding Theorem . . . . . . . . . . . . . . . . 162
11 Error-Correcting Codes and Real Channels . . . . . . . . . . . . 177
III Further Topics in Information Theory . . . . . . . . . . . . . 191
12 Hash Codes: Codes for Efficient Information Retrieval . . . . . 193
13 Binary Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . 206
14 Very Good Linear Codes Exist . . . . . . . . . . . . . . . . . . . 229
15 Further Exercises on Information Theory . . . . . . . . . . . . . 233
16 Message Passing . . . . . . . . . . . . . . . . . . . . . . . . . . . 241
17 Communication over Constrained Noiseless Channels . . . . . . 248
18 Crosswords and Codebreaking . . . . . . . . . . . . . . . . . . . 260
19 Why have Sex? Information Acquisition and Evolution . . . . . 269
IV Probabilities and Inference . . . . . . . . . . . . . . . . . . . 281
20 An Example Inference Task: Clustering . . . . . . . . . . . . . . 284
21 Exact Inference by Complete Enumeration . . . . . . . . . . . . 293
22 Maximum Likelihood and Clustering . . . . . . . . . . . . . . . . 300
23 Useful Probability Distributions . . . . . . . . . . . . . . . . . . 311
24 Exact Marginalization . . . . . . . . . . . . . . . . . . . . . . . . 319
25 Exact Marginalization in Trellises . . . . . . . . . . . . . . . . . 324
26 Exact Marginalization in Graphs . . . . . . . . . . . . . . . . . . 334
27 Laplace's Method . . . . . . . . . . . . . . . . . . . . . . . . . . . 341
28 Model Comparison and Occam's Razor . . . . . . . . . . . . . . 343
29 Monte Carlo Methods . . . . . . . . . . . . . . . . . . . . . . . . 357
30 Efficient Monte Carlo Methods . . . . . . . . . . . . . . . . . . . 387
31 Ising Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 400
32 Exact Monte Carlo Sampling . . . . . . . . . . . . . . . . . . . . 413
33 Variational Methods . . . . . . . . . . . . . . . . . . . . . . . . . 422
34 Independent Component Analysis and Latent Variable Modelling . . 437
35 Random Inference Topics . . . . . . . . . . . . . . . . . . . . . . 445
36 Decision Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . 451
37 Bayesian Inference and Sampling Theory . . . . . . . . . . . . . 457
V Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . . 467
38 Introduction to Neural Networks . . . . . . . . . . . . . . . . . . 468
39 The Single Neuron as a Classifier . . . . . . . . . . . . . . . . . . 471
40 Capacity of a Single Neuron . . . . . . . . . . . . . . . . . . . . . 483
41 Learning as Inference . . . . . . . . . . . . . . . . . . . . . . . . 492
42 Hopfield Networks . . . . . . . . . . . . . . . . . . . . . . . . . . 505
43 Boltzmann Machines . . . . . . . . . . . . . . . . . . . . . . . . . 522
44 Supervised Learning in Multilayer Networks . . . . . . . . . . . . 527
45 Gaussian Processes . . . . . . . . . . . . . . . . . . . . . . . . . 535
46 Deconvolution . . . . . . . . . . . . . . . . . . . . . . . . . . . . 549
VI Sparse Graph Codes . . . . . . . . . . . . . . . . . . . . . . 555
47 Low-Density Parity-Check Codes . . . . . . . . . . . . . . . . . 557
48 Convolutional Codes and Turbo Codes . . . . . . . . . . . . . . . 574
49 Repeat–Accumulate Codes . . . . . . . . . . . . . . . . . . . . . 582
50 Digital Fountain Codes . . . . . . . . . . . . . . . . . . . . . . . 589
VII Appendices . . . . . . . . . . . . . . . . . . . . . . . . . . . . 597
A Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 598
B Some Physics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 601
C Some Mathematics . . . . . . . . . . . . . . . . . . . . . . . . . . 605
Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 613
Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 620
Preface
This book is aimed at senior undergraduates and graduate students in Engineering, Science, Mathematics, and Computing. It expects familiarity with calculus, probability theory, and linear algebra as taught in a first- or second-year undergraduate course on mathematics for scientists and engineers.
Conventional courses on information theory cover not only the beautiful theoretical ideas of Shannon, but also practical solutions to communication problems. This book goes further, bringing in Bayesian data modelling, Monte Carlo methods, variational methods, clustering algorithms, and neural networks.
Why unify information theory and machine learning? Because they are two sides of the same coin. In the 1960s, a single field, cybernetics, was populated by information theorists, computer scientists, and neuroscientists, all studying common problems. Information theory and machine learning still belong together. Brains are the ultimate compression and communication systems. And the state-of-the-art algorithms for both data compression and error-correcting codes use the same tools as machine learning.
How to use this book
The essential dependencies between chapters are indicated in the figure on the next page. An arrow from one chapter to another indicates that the second chapter requires some of the first.
Within Parts I, II, IV, and V of this book, chapters on advanced or optional topics are towards the end. All chapters of Part III are optional on a first reading, except perhaps for Chapter 16 (Message Passing).
The same system sometimes applies within a chapter: the final sections often deal with advanced topics that can be skipped on a first reading. For example, in two key chapters – Chapter 4 (The Source Coding Theorem) and Chapter 10 (The Noisy-Channel Coding Theorem) – the first-time reader should detour at section 4.5 and section 10.4 respectively.
Pages vii–x show a few ways to use this book. First, I give the roadmap for
a course that I teach in Cambridge: `Information theory, pattern recognition,
and neural networks'. The book is also intended as a textbook for traditional
courses in information theory. The second roadmap shows the chapters for an
introductory information theory course and the third for a course aimed at an
understanding of state-of-the-art error-correcting codes. The fourth roadmap
shows how to use the text in a conventional course on machine learning.