What is Machine Learning?
Machine learning is a field of study devoted to understanding and building methods that "learn", that is, methods that use data to improve their performance on some set of tasks. In practice, much of it comes down to computing the parameters that best model the relationship between a set of features (the inputs) and targets (the outputs).
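To make "computing the best parameters" concrete, here is a minimal sketch that fits a straight line to data with ordinary least squares. The synthetic data, the true slope and intercept, and the variable names are assumptions made for illustration. Least squares is only one of many ways to fit a model.

```python
import numpy as np

# Synthetic data (an assumption for illustration): one feature x and a target y
# that follows y = 3x + 2 plus a little random noise.
rng = np.random.default_rng(seed=0)
x = rng.uniform(0.0, 10.0, size=200)
y = 3.0 * x + 2.0 + rng.normal(0.0, 0.5, size=200)

# Design matrix: the feature column plus a column of ones for the intercept.
X = np.column_stack([x, np.ones_like(x)])

# Ordinary least squares: find the parameters that minimize the squared error
# between the predictions X @ params and the targets y.
params, *_ = np.linalg.lstsq(X, y, rcond=None)
slope, intercept = params

print(f"learned slope={slope:.2f}, intercept={intercept:.2f}")
```

The learned slope and intercept should land close to the values used to generate the data, which is the sense in which the model has "learned" the relationship between feature and target.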
Introduction to the ML ecosystem
Python is a general-purpose interpreted programming language. It is easy to learn and use primarily because the language focuses on readability.
It is a popular language in general, consistently appearing in the top 10 programming languages in surveys on Stack Overflow (for example the 2015 survey results). It is a dynamic language, well suited to interactive development and quick prototyping, yet powerful enough to support the development of large applications.
It is also widely used for machine learning and data science because of its excellent library support and because it is a general-purpose programming language (unlike R or MATLAB). For example, see the results of the Kaggle platform survey in 2011 and the KDnuggets 2015 tool survey.
SciPy is an ecosystem of Python libraries for mathematics, science, and engineering. It is an add-on to Python that you will need for machine learning.
The SciPy ecosystem includes the following core modules relevant to machine learning:
- NumPy: A foundation for SciPy that allows you to efficiently work with data in arrays.
- Matplotlib: This allows you to create 2D charts and plots from data.
- Pandas: Tools and data structures to organize and analyze your data.
To be effective at machine learning in Python, you must install and become familiar with SciPy. Specifically:
- You will use Pandas to load, explore, and better understand your data.
- You will use Matplotlib (and wrappers of Matplotlib in other frameworks) to create plots and charts of your data.
- You will prepare your data as NumPy arrays for modeling in machine learning algorithms.
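The three roles described above can be sketched in a few lines. The tiny dataset and its column names (height_cm, weight_kg) are hypothetical and exist only for illustration. In a real project you would more likely load a file with pd.read_csv.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical toy dataset built in memory (an assumption for illustration).
df = pd.DataFrame({
    "height_cm": [150.0, 160.0, 170.0, 180.0, 190.0],
    "weight_kg": [52.0, 58.0, 67.0, 77.0, 88.0],
})

# Pandas: explore the data with summary statistics.
print(df.describe())

# Matplotlib: plot one column against the other.
fig, ax = plt.subplots()
ax.scatter(df["height_cm"], df["weight_kg"])
ax.set_xlabel("height_cm")
ax.set_ylabel("weight_kg")

# NumPy: convert to arrays in the shape most modeling code expects,
# a 2D feature matrix X and a 1D target vector y.
X = df[["height_cm"]].to_numpy()
y = df["weight_kg"].to_numpy()
print(X.shape, y.shape)
```

In a notebook the figure is typically rendered when the cell finishes. In a plain script you would call plt.show() or fig.savefig(...) to see it.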
The Scikit-Learn library is how you can develop and practice machine learning in Python.
The focus of the library is machine learning algorithms for classification, regression, clustering, and more. It also provides tools for related tasks such as evaluating models, tuning parameters, and pre-processing data.
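As a rough illustration of how these pieces fit together, the sketch below pre-processes data, trains a classifier, tunes one parameter, and evaluates the result. The choice of the bundled iris dataset, logistic regression, and the candidate values for C are assumptions made for the example, not recommendations.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Load a small dataset that ships with the library (no download needed).
X, y = load_iris(return_X_y=True)

# Hold out a test set so the evaluation uses data the model has not seen.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Pre-processing and classification chained into one estimator.
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Parameter tuning: try a few regularization strengths with 5-fold
# cross-validation. make_pipeline names each step after its lowercased class.
search = GridSearchCV(
    pipeline,
    param_grid={"logisticregression__C": [0.1, 1.0, 10.0]},
    cv=5,
)
search.fit(X_train, y_train)

# Evaluation on the held-out test set.
accuracy = accuracy_score(y_test, search.predict(X_test))
print("best parameters:", search.best_params_)
print(f"test accuracy: {accuracy:.3f}")
```

Wrapping the scaler and the model in a single pipeline means the scaler is refit on each cross-validation training fold, which keeps information from the validation folds out of the pre-processing step.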
Like Python and SciPy, Scikit-Learn is open source and commercially usable; it is released under the BSD license. This means that you can learn about machine learning, develop models, and put them into operation all with the same ecosystem and code, which is a powerful reason to use Scikit-Learn.
Colaboratory, or “Colab” for short, is a product from Google Research. Colab allows anybody to write and execute arbitrary Python code through the browser and is especially well suited to machine learning, data analysis, and education.
Colab notebooks allow you to combine executable code and rich text in a single document, along with images, HTML, LaTeX, and more. When you create your own Colab notebooks, they are stored in your Google Drive account. You can easily share your Colab notebooks with co-workers or friends, allowing them to comment on your notebooks or even edit them.