Scikit-learn is one of the most useful open-source libraries in Python. It was developed by David Cournapeau in 2007 as a Google Summer of Code project and has since become the most popular and robust machine-learning library for Python. The library is written primarily in Python and is built on top of NumPy, SciPy, and Matplotlib. Both Canopy and Anaconda ship the latest version of scikit-learn.
Installation
If you have already installed NumPy and SciPy, the following are the two easiest ways to install scikit-learn.
Using pip
The following command can be used to install it:
pip install -U scikit-learn
Using conda
The following command can be used to install it:
conda install scikit-learn
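Either way, a quick sanity check is to import the package and print its version:
import sklearn
print(sklearn.__version__)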
Features
- Supervised Learning algorithms
- Almost all the popular supervised learning algorithms, such as Linear Regression, Support Vector Machines, and Decision Trees, are part of scikit-learn.
- Unsupervised Learning algorithms
- It also has all the popular unsupervised learning algorithms, from clustering, factor analysis, and PCA to unsupervised neural networks.
- Clustering
- scikit-learn's clustering models are used for grouping unlabelled data.
- Cross Validation
- It is used to check the accuracy of supervised models on unseen data.
- Dimensionality Reduction
- It is used for reducing the number of attributes in data, which helps with summarisation, visualisation, and feature selection.
- Ensemble methods
- As the name suggests, they are used for combining the predictions of multiple supervised models.
- Feature extraction
- It is used to extract features from data, for example to define attributes in image and text data.
- Feature selection
- It is used to identify useful attributes for building supervised models (a small sketch follows this list).
- Open Source
- It is an open-source library and is also commercially usable under the BSD license.
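For illustration, here is a minimal feature-selection sketch. The dataset (the built-in iris data) and the choice of k=2 are only assumptions for the example; SelectKBest keeps the k highest-scoring features according to the chosen statistical test.
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif
# Load a small example dataset and keep only the two best-scoring features
X, y = load_iris(return_X_y=True)
selector = SelectKBest(score_func=f_classif, k=2)
X_new = selector.fit_transform(X, y)
print(X.shape, X_new.shape)   # (150, 4) -> (150, 2)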
Modelling Process
Dataset Loading
A collection of data is known as a dataset.
It has the following two components:
- Features
- Response
Features − The variables of the data are known as features. They are also known as predictors, inputs, or attributes.
- Matrix − It is the collection of features, in case there is more than one.
- Names − It is the list of all the names of the features.
Response − It is the output variable that depends on the feature variables.
- Vector − It is used to represent the response column. Generally, we have just one response column.
- Names − It represents the possible values taken by a response vector.
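As a minimal sketch, the built-in iris dataset can be used to show these components (iris is just a convenient example here):
from sklearn.datasets import load_iris
iris = load_iris()
X = iris.data                         # feature matrix
feature_names = iris.feature_names    # feature names
y = iris.target                       # response vector
target_names = iris.target_names      # names of the possible response values
print(feature_names)
print(target_names)
print(X.shape, y.shape)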
Splitting the Dataset
Split the dataset into two pieces:
- Training set
- The training set is used to train the model.
- Testing set
- The testing set is used to test the model.
Example of Splitting the Dataset
from sklearn.model_selection import train_test_split
# df is assumed to be a pandas DataFrame loaded earlier, with the features
# in all but the last column and the response in the last column
X = df.iloc[:, :-1].values
y = df.iloc[:, -1].values
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1
)
print(X_train.shape)
print(X_test.shape)
print(y_train.shape)
print(y_test.shape)
Output
(105, 3)
(45, 3)
(105,)
(45,)
Tree Algorithm
The decision tree is one of the most powerful non-parametric supervised learning methods. In a decision tree, an internal node represents a feature, a branch represents a decision rule, and each leaf node represents the outcome.
Example of Tree
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score, train_test_split
X, Y = load_iris(return_X_y=True)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.4, random_state=0)
dtc = DecisionTreeClassifier(random_state=0)
dtc.fit(X_train, Y_train)
# 10-fold cross-validation on the full dataset
score = cross_val_score(dtc, X, Y, cv=10)
print("Accuracy scores: ", score)
print("Mean accuracy score: ", np.mean(score))
Output of Tree
Accuracy scores: [1. 0.93333333 1. 0.93333333 0.93333333 0.86666667 0.93333333 1. 1. 1.]
Mean accuracy score: 0.96
Gradient Boosting
Gradient boosting builds an ensemble of weak learners (typically shallow decision trees) in stages, with each new learner fitted to correct the errors of the ones before it.
Example of Gradient Boosting
from sklearn.datasets import make_hastie_10_2
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
X, Y = make_hastie_10_2(random_state=10)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.4, random_state=0)
gbc = GradientBoostingClassifier(n_estimators=100, learning_rate=1.0, max_depth=1, random_state=0)
gbc.fit(X_train, Y_train)
score = gbc.score(X_test, Y_test)
print("Accuracy score: ", score)
Output of Gradient Boosting
Accuracy score: 0.9185416666666667
Dimensionality Reduction using PCA in Sklearn
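PCA (Principal Component Analysis) projects the data onto a smaller set of uncorrelated components while preserving as much of the variance as possible.
Example of PCA
A minimal sketch, assuming the iris dataset as input and keeping two components:
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
X, Y = load_iris(return_X_y=True)
# Reduce the four iris features to two principal components
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)
print(X.shape, X_reduced.shape)
print(pca.explained_variance_ratio_)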
Clustering Methods
Clustering is one of the best unsupervised ML techniques for finding patterns of similarity and relationships among data samples.
KMeans
The KMeans algorithm clusters data by computing centroids and assigning each sample to the nearest centroid, iterating until the within-cluster variance is minimised.
Example of Kmeans
from sklearn.cluster import KMeans
import numpy as np
from sklearn.datasets import load_diabetes
X, Y = load_diabetes(return_X_y = True)
# Fit KMeans with 10 clusters on the first 50 samples
cluster = KMeans(n_clusters=10)
cluster.fit(X[:50, :])
print("Cluster labels: ", cluster.labels_)
Output of Kmeans
Cluster labels: [6 0 6 2 0 8 8 5 6 2 8 6 0 6 0 5 3 5 2 2 8 8 2 7 2 6 8 2 4 3 2 4 1 4 4 9 3 2 5 6 5 8 6 9 1 6 2 8 0 1]
Spectral Clustering
Spectral clustering uses the eigenvalues of a similarity matrix of the data to embed the samples in a lower-dimensional space before clustering them, which works well for non-convex cluster shapes.
Example of Spectral Clustering
from sklearn.cluster import SpectralClustering
import numpy as np
from sklearn.datasets import load_diabetes
X, Y = load_diabetes(return_X_y = True)
# Fit spectral clustering with 10 clusters on the first 50 samples
cluster = SpectralClustering(n_clusters=10)
cluster.fit(X[:50, :])
print("Cluster labels: ", cluster.labels_)
Output of Spectral Clustering
Cluster labels: [0 2 0 8 4 3 6 4 9 1 3 0 4 6 2 8 5 4 7 1 7 6 9 5 2 8 3 9 1 3 9 5 0 5 4 5 1 5 8 1 7 3 6 5 0 6 1 3 6 8]
Hierarchical Clustering
Hierarchical (agglomerative) clustering builds nested clusters by repeatedly merging the pair of clusters that are closest together.
Example of Hierarchical Clustering
from sklearn.cluster import AgglomerativeClustering
import numpy as np
from sklearn.datasets import load_diabetes
X, Y = load_diabetes(return_X_y = True)
# Fit agglomerative clustering with 10 clusters on the first 50 samples
cluster = AgglomerativeClustering(n_clusters=10, compute_distances=True)
cluster.fit(X[:50, :])
print("Cluster labels: ", cluster.labels_)
Output of Hierarchical Clustering
Cluster labels: [3 6 3 5 6 0 0 1 3 5 0 2 6 3 6 1 4 1 5 6 0 0 5 9 5 2 0 5 6 4 5 0 8 7 6 7 4 5 1 3 1 0 2 7 8 3 0 0 3 2]
If you have any queries regarding this article, or if I have missed something on this topic, please feel free to add it in the comments below for other readers. See you in another article.
To know more about the scikit-learn library, see its Wikipedia page.
Stay connected, stay safe. Thank you.