# Machine Learning For Hackers

#### Machine Learning for Hackers

by Drew Conway, John Myles White

If you’re an experienced programmer interested in crunching data, this book will get you started with machine learning—a toolkit of algorithms that enables computers to train themselves to automate useful tasks. Authors Drew Conway and John Myles White help you understand machine learning and statistics tools through a series of hands-on case studies, instead of a traditional math-heavy presentation.

Each chapter focuses on a specific problem in machine learning, such as classification, prediction, optimization, and recommendation. Using the R programming language, you’ll learn how to analyze sample datasets and write simple machine learning algorithms. *Machine Learning for Hackers* is ideal for programmers from any background, including business, government, and academic research.

- Develop a naïve Bayesian classifier to determine if an email is spam, based only on its text
- Use linear regression to predict the number of page views for the top 1,000 websites
- Learn optimization techniques by attempting to break a simple letter cipher
- Compare and contrast U.S. Senators statistically, based on their voting records
- Build a “whom to follow” recommendation system from Twitter data

#### Bayesian Methods for Hackers

by Cameron Davidson-Pilon

**Master Bayesian Inference through Practical Examples and Computation–Without Advanced Mathematical Analysis**

Bayesian methods of inference are deeply natural and extremely powerful. However, most discussions of Bayesian inference rely on intensely complex mathematical analyses and artificial examples, making it inaccessible to anyone without a strong mathematical background. Now, though, Cameron Davidson-Pilon introduces Bayesian inference from a computational perspective, bridging theory to practice–freeing you to get results using computing power.

**Bayesian Methods for Hackers** illuminates Bayesian inference through probabilistic programming with the powerful PyMC language and the closely related Python tools NumPy, SciPy, and Matplotlib. Using this approach, you can reach effective solutions in small increments, without extensive mathematical intervention.

Davidson-Pilon begins by introducing the concepts underlying Bayesian inference, comparing it with other techniques and guiding you through building and training your first Bayesian model. Next, he introduces PyMC through a series of detailed examples and intuitive explanations that have been refined after extensive user feedback. You’ll learn how to use the Markov Chain Monte Carlo algorithm, choose appropriate sample sizes and priors, work with loss functions, and apply Bayesian inference in domains ranging from finance to marketing. Once you’ve mastered these techniques, you’ll constantly turn to this guide for the working PyMC code you need to jumpstart future projects.

**Coverage includes**

• Learning the Bayesian “state of mind” and its practical implications

• Understanding how computers perform Bayesian inference

• Using the PyMC Python library to program Bayesian analyses

• Building and debugging models with PyMC

• Testing your model’s “goodness of fit”

• Opening the “black box” of the Markov Chain Monte Carlo algorithm to see how and why it works

• Leveraging the power of the “Law of Large Numbers”

• Mastering key concepts, such as clustering, convergence, autocorrelation, and thinning

• Using loss functions to measure an estimate’s weaknesses based on your goals and desired outcomes

• Selecting appropriate priors and understanding how their influence changes with dataset size

• Overcoming the “exploration versus exploitation” dilemma: deciding when “pretty good” is good enough

• Using Bayesian inference to improve A/B testing

• Solving data science problems when only small amounts of data are available

**Cameron Davidson-Pilon** has worked in many areas of applied mathematics, from the evolutionary dynamics of genes and diseases to stochastic modeling of financial prices. His contributions to the open source community include lifelines, an implementation of survival analysis in Python. Educated at the University of Waterloo and at the Independent University of Moscow, he currently works with the online commerce leader Shopify.

#### Programming Collective Intelligence

by Toby Segaran

Want to tap the power behind search rankings, product recommendations, social bookmarking, and online matchmaking? This fascinating book demonstrates how you can build Web 2.0 applications to mine the enormous amount of data created by people on the Internet. With the sophisticated algorithms in this book, you can write smart programs to access interesting datasets from other web sites, collect data from users of your own applications, and analyze and understand the data once you’ve found it.

Programming Collective Intelligence takes you into the world of machine learning and statistics, and explains how to draw conclusions about user experience, marketing, personal tastes, and human behavior in general — all from information that you and others collect every day. Each algorithm is described clearly and concisely with code that can immediately be used on your web site, blog, Wiki, or specialized application. This book explains:

- Collaborative filtering techniques that enable online retailers to recommend products or media
- Methods of clustering to detect groups of similar items in a large dataset
- Search engine features — crawlers, indexers, query engines, and the PageRank algorithm
- Optimization algorithms that search millions of possible solutions to a problem and choose the best one
- Bayesian filtering, used in spam filters for classifying documents based on word types and other features
- Using decision trees not only to make predictions, but to model the way decisions are made
- Predicting numerical values rather than classifications to build price models
- Support vector machines to match people in online dating sites
- Non-negative matrix factorization to find the independent features in a dataset
- Evolving intelligence for problem solving — how a computer develops its skill by improving its own code the more it plays a game

Each chapter includes exercises for extending the algorithms to make them more powerful. Go beyond simple database-backed applications and put the wealth of Internet data to work for you.

“Bravo! I cannot think of a better way for a developer to first learn these algorithms and methods, nor can I think of a better way for me (an old AI dog) to reinvigorate my knowledge of the details.”

— Dan Russell, Google

“Toby’s book does a great job of breaking down the complex subject matter of machine-learning algorithms into practical, easy-to-understand examples that can be directly applied to analysis of social interaction across the Web today. If I had this book two years ago, it would have saved precious time going down some fruitless paths.”

— Tim Wolters, CTO, Collective Intellect

#### Machine Learning for Email

by Drew Conway, John Myles White

If you’re an experienced programmer willing to crunch data, this concise guide will show you how to use machine learning to work with email. You’ll learn how to write algorithms that automatically sort and redirect email based on statistical patterns. Authors Drew Conway and John Myles White approach the process in a practical fashion, using a case-study driven approach rather than a traditional math-heavy presentation.

This book also includes a short tutorial on using the popular R language to manipulate and analyze data. You’ll get clear examples for analyzing sample data and writing machine learning programs with R.

- Mine email content with R functions, using a collection of sample files
- Analyze the data and use the results to write a Bayesian spam classifier
- Rank email by importance, using factors such as thread activity
- Use your email ranking analysis to write a priority inbox program
- Test your classifier and priority inbox with a separate email sample set

#### Machine Learning and Security

by Clarence Chio, David Freeman

Can machine learning techniques solve our computer security problems and finally put an end to the cat-and-mouse game between attackers and defenders? Or is this hope merely hype? Now you can dive into the science and answer this question for yourself! With this practical guide, you’ll explore ways to apply machine learning to security issues such as intrusion detection, malware classification, and network analysis.

Machine learning and security specialists Clarence Chio and David Freeman provide a framework for discussing the marriage of these two fields, as well as a toolkit of machine-learning algorithms that you can apply to an array of security problems. This book is ideal for security engineers and data scientists alike.

- Learn how machine learning has contributed to the success of modern spam filters
- Quickly detect anomalies, including breaches, fraud, and impending system failure
- Conduct malware analysis by extracting useful information from computer binaries
- Uncover attackers within the network by finding patterns inside datasets
- Examine how attackers exploit consumer-facing websites and app functionality
- Translate your machine learning algorithms from the lab to production
- Understand the threat attackers pose to machine learning solutions

#### Virtual & Augmented Reality For Dummies

by Paul Mealy

**An easy-to-understand primer on Virtual Reality and Augmented Reality**

Virtual Reality (VR) and Augmented Reality (AR) are driving the next technological revolution. If you want to get in on the action, this book helps you understand what these technologies are, their history, how they’re being used, and how they’ll affect consumers both personally and professionally in the very near future.

With VR and AR poised to become mainstream within the next few years, an accessible book to bring users up to speed on the subject is sorely needed—and that’s where this handy reference comes in! Rather than focusing on a specific piece of hardware (HTC Vive, Oculus Rift, iOS ARKit) or software (Unity, Unreal Engine), *Virtual & Augmented Reality For Dummies* offers a broad look at both VR and AR, giving you a bird’s eye view of what you can expect as they continue to take the world by storm.

* Keeps you up-to-date on the pulse of this fast-changing technology

* Explores the many ways AR/VR are being used in fields such as healthcare, education, and entertainment

* Includes interviews with designers, developers, and technologists currently working in the fields of VR and AR

Perfect for both potential content creators and content consumers, this book will change the way you approach and contribute to these emerging technologies.

#### Data Science from Scratch

by Joel Grus

Data science libraries, frameworks, modules, and toolkits are great for doing data science, but they’re also a good way to dive into the discipline without actually understanding data science. In this book, you’ll learn how many of the most fundamental data science tools and algorithms work by implementing them *from scratch*.

If you have an aptitude for mathematics and some programming skills, author Joel Grus will help you get comfortable with the math and statistics at the core of data science, and with the hacking skills you need to get started as a data scientist. Today’s messy glut of data holds answers to questions no one’s even thought to ask. This book provides you with the know-how to dig those answers out.

- Get a crash course in Python
- Learn the basics of linear algebra, statistics, and probability—and understand how and when they’re used in data science
- Collect, explore, clean, munge, and manipulate data
- Dive into the fundamentals of machine learning
- Implement models such as k-nearest neighbors, naive Bayes, linear and logistic regression, decision trees, neural networks, and clustering
- Explore recommender systems, natural language processing, network analysis, MapReduce, and databases

#### Machine Learning in Action

by Peter Harrington

**Summary**

*Machine Learning in Action* is a unique book that blends the foundational theories of machine learning with the practical realities of building tools for everyday data analysis. You’ll use the flexible Python programming language to build programs that implement algorithms for data classification, forecasting, recommendations, and higher-level features like summarization and simplification.

**About the Book**

A machine is said to learn when its performance improves with experience. Learning requires algorithms and programs that capture data and ferret out the interesting or useful patterns. Once the specialized domain of analysts and mathematicians, machine learning is becoming a skill needed by many.

*Machine Learning in Action* is a clearly written tutorial for developers. It avoids academic language and takes you straight to the techniques you’ll use in your day-to-day work. Many (Python) examples present the core algorithms of statistical data processing, data analysis, and data visualization in code you can reuse. You’ll understand the concepts and how they fit in with tactical tasks like classification, forecasting, recommendations, and higher-level features like summarization and simplification.

Readers need no prior experience with machine learning or statistical processing. Familiarity with Python is helpful.

Purchase of the print book comes with an offer of a free PDF, ePub, and Kindle eBook from Manning. All code from the book is also available.

**What’s Inside**

- A no-nonsense introduction
- Examples showing common ML tasks
- Everyday data analysis
- Implementing classic algorithms like Apriori and AdaBoost

**Table of Contents**

PART 1 CLASSIFICATION

- Machine learning basics
- Classifying with k-Nearest Neighbors
- Splitting datasets one feature at a time: decision trees
- Classifying with probability theory: naïve Bayes
- Logistic regression
- Support vector machines
- Improving classification with the AdaBoost meta algorithm

PART 2 FORECASTING NUMERIC VALUES WITH REGRESSION

- Predicting numeric values: regression
- Tree-based regression

PART 3 UNSUPERVISED LEARNING

- Grouping unlabeled items using k-means clustering
- Association analysis with the Apriori algorithm
- Efficiently finding frequent itemsets with FP-growth

PART 4 ADDITIONAL TOOLS

- Using principal component analysis to simplify data
- Simplifying data with the singular value decomposition
- Big data and MapReduce