Information Theory, Inference, and Learning Algorithms
David J.C. MacKay
Information Theory,
Inference,
and Learning Algorithms
David J.C. MacKay
mackay@mrao.cam.ac.uk
© 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003
Version 6.0 (as published) June 26, 2003
Please send feedback on this book via
http://www.inference.phy.cam.ac.uk/mackay/itila/
This book will be published by C.U.P. in September 2003. It will remain
viewable on-screen on the above website, in postscript, djvu, and pdf
formats.
(C.U.P. replace this page with their own page ii.)
Contents
Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . v
1 Introduction to Information Theory . . . . . . . . . . . . . . . . 3
2 Probability, Entropy, and Inference . . . . . . . . . . . . . . . . . 22
3 More about Inference . . . . . . . . . . . . . . . . . . . . . . . . 48
I Data Compression . . . . . . . . . . . . . . . . . . . . . . . . 65
4 The Source Coding Theorem . . . . . . . . . . . . . . . . . . . . 67
5 Symbol Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
6 Stream Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
7 Codes for Integers . . . . . . . . . . . . . . . . . . . . . . . . . . 132
II Noisy-Channel Coding . . . . . . . . . . . . . . . . . . . . . . 137
8 Correlated Random Variables . . . . . . . . . . . . . . . . . . . . 138
9 Communication over a Noisy Channel . . . . . . . . . . . . . . . 146
10 The Noisy-Channel Coding Theorem . . . . . . . . . . . . . . . . 162
11 Error-Correcting Codes and Real Channels . . . . . . . . . . . . 177
III Further Topics in Information Theory . . . . . . . . . . . . . 191
12 Hash Codes: Codes for Efficient Information Retrieval . . . . . 193
13 Binary Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . 206
14 Very Good Linear Codes Exist . . . . . . . . . . . . . . . . . . . 229
15 Further Exercises on Information Theory . . . . . . . . . . . . . 233
16 Message Passing . . . . . . . . . . . . . . . . . . . . . . . . . . . 241
17 Communication over Constrained Noiseless Channels . . . . . . 248
18 Crosswords and Codebreaking . . . . . . . . . . . . . . . . . . . 260
19 Why have Sex? Information Acquisition and Evolution . . . . . 269
IV Probabilities and Inference . . . . . . . . . . . . . . . . . . . 281
20 An Example Inference Task: Clustering . . . . . . . . . . . . . . 284
21 Exact Inference by Complete Enumeration . . . . . . . . . . . . 293
22 Maximum Likelihood and Clustering . . . . . . . . . . . . . . . . 300
23 Useful Probability Distributions . . . . . . . . . . . . . . . . . . 311
24 Exact Marginalization . . . . . . . . . . . . . . . . . . . . . . . . 319
25 Exact Marginalization in Trellises . . . . . . . . . . . . . . . . . 324
26 Exact Marginalization in Graphs . . . . . . . . . . . . . . . . . . 334
27 Laplace's Method . . . . . . . . . . . . . . . . . . . . . . . . . . . 341
28 Model Comparison and Occam's Razor . . . . . . . . . . . . . . 343
29 Monte Carlo Methods . . . . . . . . . . . . . . . . . . . . . . . . 357
30 Efficient Monte Carlo Methods . . . . . . . . . . . . . . . . . . . 387
31 Ising Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 400
32 Exact Monte Carlo Sampling . . . . . . . . . . . . . . . . . . . . 413
33 Variational Methods . . . . . . . . . . . . . . . . . . . . . . . . . 422
34 Independent Component Analysis and Latent Variable Modelling . . 437
35 Random Inference Topics . . . . . . . . . . . . . . . . . . . . . . 445
36 Decision Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . 451
37 Bayesian Inference and Sampling Theory . . . . . . . . . . . . . 457
V Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . . 467
38 Introduction to Neural Networks . . . . . . . . . . . . . . . . . . 468
39 The Single Neuron as a Classifier . . . . . . . . . . . . . . . . . . 471
40 Capacity of a Single Neuron . . . . . . . . . . . . . . . . . . . . . 483
41 Learning as Inference . . . . . . . . . . . . . . . . . . . . . . . . 492
42 Hopfield Networks . . . . . . . . . . . . . . . . . . . . . . . . . . 505
43 Boltzmann Machines . . . . . . . . . . . . . . . . . . . . . . . . . 522
44 Supervised Learning in Multilayer Networks . . . . . . . . . . . . 527
45 Gaussian Processes . . . . . . . . . . . . . . . . . . . . . . . . . 535
46 Deconvolution . . . . . . . . . . . . . . . . . . . . . . . . . . . . 549
VI Sparse Graph Codes . . . . . . . . . . . . . . . . . . . . . . 555
47 Low-Density Parity-Check Codes . . . . . . . . . . . . . . . . . 557
48 Convolutional Codes and Turbo Codes . . . . . . . . . . . . . . . 574
49 Repeat–Accumulate Codes . . . . . . . . . . . . . . . . . . . . . 582
50 Digital Fountain Codes . . . . . . . . . . . . . . . . . . . . . . . 589
VII Appendices . . . . . . . . . . . . . . . . . . . . . . . . . . . . 597
A Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 598
B Some Physics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 601
C Some Mathematics . . . . . . . . . . . . . . . . . . . . . . . . . . 605
Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 613
Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 620
Preface
This book is aimed at senior undergraduates and graduate students in Engineering, Science, Mathematics, and Computing. It expects familiarity with calculus, probability theory, and linear algebra as taught in a first- or second-year undergraduate course on mathematics for scientists and engineers.
Conventional courses on information theory cover not only the beautiful theoretical ideas of Shannon, but also practical solutions to communication problems. This book goes further, bringing in Bayesian data modelling, Monte Carlo methods, variational methods, clustering algorithms, and neural networks.
Why unify information theory and machine learning? Because they are two sides of the same coin. In the 1960s, a single field, cybernetics, was populated by information theorists, computer scientists, and neuroscientists, all studying common problems. Information theory and machine learning still belong together. Brains are the ultimate compression and communication systems. And the state-of-the-art algorithms for both data compression and error-correcting codes use the same tools as machine learning.
How to use this book
The essential dependencies between chapters are indicated in the figure on the next page. An arrow from one chapter to another indicates that the second chapter requires some of the first.
Within Parts I, II, IV, and V of this book, chapters on advanced or optional topics are towards the end. All chapters of Part III are optional on a first reading, except perhaps for Chapter 16 (Message Passing).
The same system sometimes applies within a chapter: the final sections often deal with advanced topics that can be skipped on a first reading. For example, in two key chapters – Chapter 4 (The Source Coding Theorem) and Chapter 10 (The Noisy-Channel Coding Theorem) – the first-time reader should detour at section 4.5 and section 10.4 respectively.
Pages vii–x show a few ways to use this book. First, I give the roadmap for
a course that I teach in Cambridge: `Information theory, pattern recognition,
and neural networks'. The book is also intended as a textbook for traditional
courses in information theory. The second roadmap shows the chapters for an
introductory information theory course and the third for a course aimed at an
understanding of state-of-the-art error-correcting codes. The fourth roadmap
shows how to use the text in a conventional course on machine learning.