Mining the Social Web: Analyzing Data from Facebook, Twitter, LinkedIn, ... (Russell, 2011-02-11)
Table of Contents
Preface
1. Introduction: Hacking on Twitter Data
Installing Python Development Tools
Collecting and Manipulating Twitter Data
Tinkering with Twitter’s API
Frequency Analysis and Lexical Diversity
Visualizing Tweet Graphs
Synthesis: Visualizing Retweets with Protovis
Closing Remarks
2. Microformats: Semantic Markup and Common Sense Collide
XFN and Friends
Exploring Social Connections with XFN
A Breadth-First Crawl of XFN Data
Geocoordinates: A Common Thread for Just About Anything
Wikipedia Articles + Google Maps = Road Trip?
Slicing and Dicing Recipes (for the Health of It)
Collecting Restaurant Reviews
Summary
3. Mailboxes: Oldies but Goodies
mbox: The Quick and Dirty on Unix Mailboxes
mbox + CouchDB = Relaxed Email Analysis
Bulk Loading Documents into CouchDB
Sensible Sorting
Map/Reduce-Inspired Frequency Analysis
Sorting Documents by Value
couchdb-lucene: Full-Text Indexing and More
Threading Together Conversations
Look Who’s Talking
Visualizing Mail “Events” with SIMILE Timeline
Analyzing Your Own Mail Data
The Graph Your (Gmail) Inbox Chrome Extension
Closing Remarks
4. Twitter: Friends, Followers, and Setwise Operations
RESTful and OAuth-Cladded APIs
No, You Can’t Have My Password
A Lean, Mean Data-Collecting Machine
A Very Brief Refactor Interlude
Redis: A Data Structures Server
Elementary Set Operations
Souping Up the Machine with Basic Friend/Follower Metrics
Calculating Similarity by Computing Common Friends and Followers
Measuring Influence
Constructing Friendship Graphs
Clique Detection and Analysis
The Infochimps “Strong Links” API
Interactive 3D Graph Visualization
Summary
5. Twitter: The Tweet, the Whole Tweet, and Nothing but the Tweet
Pen : Sword :: Tweet : Machine Gun (?!?)
Analyzing Tweets (One Entity at a Time)
Tapping (Tim’s) Tweets
Who Does Tim Retweet Most Often?
What’s Tim’s Influence?
How Many of Tim’s Tweets Contain Hashtags?
Juxtaposing Latent Social Networks (or #JustinBieber Versus #TeaParty)
What Entities Co-Occur Most Often with #JustinBieber and #TeaParty Tweets?
On Average, Do #JustinBieber or #TeaParty Tweets Have More Hashtags?
Which Gets Retweeted More Often: #JustinBieber or #TeaParty?
How Much Overlap Exists Between the Entities of #TeaParty and #JustinBieber Tweets?
Visualizing Tons of Tweets
Visualizing Tweets with Tricked-Out Tag Clouds
Visualizing Community Structures in Twitter Search Results
Closing Remarks
6. LinkedIn: Clustering Your Professional Network for Fun (and Profit?)
Motivation for Clustering