Scikit-learn is by far one of the most useful open-source libraries for Python. It was developed by David Cournapeau in 2007 as a Google Summer of Code project and has grown into the most useful and robust machine learning library for Python. The library is primarily written in Python and is built upon SciPy, NumPy, and Matplotlib. Canopy and Anaconda both ship the latest version of scikit-learn.

Scikit-learn

Installation

If you have already installed NumPy and SciPy, the following are the two easiest ways to install scikit-learn.

Using pip

The following command can be used to install it:

pip install -U scikit-learn

Using conda

The following command can be used to install it:

conda install scikit-learn
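
Either way, you can verify that the installation worked by printing the installed version:

python -c "import sklearn; print(sklearn.__version__)"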

Features

  • Supervised learning algorithms
    • Almost all the popular supervised learning algorithms, such as Linear Regression, Support Vector Machines, and Decision Trees, are part of scikit-learn.
  • Unsupervised learning algorithms
    • It also has all the popular unsupervised learning algorithms, from clustering, factor analysis, and PCA to unsupervised neural networks.
  • Clustering
    • Used for grouping unlabeled data.
  • Cross-validation
    • Used to check the accuracy of supervised models on unseen data.
  • Dimensionality reduction
    • Used to reduce the number of attributes in data, which helps with summarisation, visualisation, and feature selection.
  • Ensemble methods
    • As the name suggests, used to combine the predictions of multiple supervised models.
  • Feature extraction
    • Used to extract features from data, for example to define the attributes in image and text data.
  • Feature selection
    • Used to identify the most useful attributes for creating supervised models; see the sketch after this list.
  • Open source
    • It is an open-source library and is also commercially usable under the BSD license.
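
As a quick illustration of the feature-selection capability, here is a minimal sketch that keeps the two most informative features of the iris dataset using SelectKBest; the choice of dataset and scoring function is ours, purely for illustration.

from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y = True)

# Keep the 2 features with the highest ANOVA F-scores
selector = SelectKBest(score_func = f_classif, k = 2)
X_new = selector.fit_transform(X, y)

print(X.shape)      # (150, 4)
print(X_new.shape)  # (150, 2)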

Modelling Process

Dataset Loading

A collection of data is known as a dataset.

A dataset has the following two components:

  • Features
  • Response

Features − The variables of the data are known as features. They are also called predictors, inputs, or attributes.

  • Feature matrix − It is the collection of features, in case there are more than one.
  • Feature names − It is the list of all the names of the features.

Response − The output variable that basically depends upon the feature variables is known as the response.

  • Response vector − It is used to represent the response column. Generally, we have just one response column.
  • Target names − It represents the possible values taken by the response vector (see the sketch below).
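
To make these terms concrete, here is a minimal sketch that loads the built-in iris dataset and prints its feature matrix, feature names, response vector, and target names; the choice of iris is ours, purely for illustration.

from sklearn.datasets import load_iris

iris = load_iris()

X = iris.data    # feature matrix
y = iris.target  # response vector

print(X.shape)             # (150, 4)
print(iris.feature_names)  # the names of the four features
print(y.shape)             # (150,)
print(iris.target_names)   # ['setosa' 'versicolor' 'virginica']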

Splitting the Dataset

We split the dataset into two pieces:

  • Training set
    • used to train the model.
  • Testing set
    • used to evaluate the trained model on unseen data.

Example of Splitting the Dataset

from sklearn.model_selection import train_test_split

# df is assumed to be a pandas DataFrame loaded earlier (e.g. via pd.read_csv),
# with the feature columns first and the response in the last column
X = df.iloc[:, :-1].values
y = df.iloc[:, -1].values

X_train, X_test, y_train, y_test = train_test_split(
   X, y, test_size = 0.3, random_state = 1
)

print(X_train.shape)
print(X_test.shape)
print(y_train.shape)
print(y_test.shape)

Output

(105, 3)
(45, 3)
(105,)
(45,)

Tree Algorithm

The decision tree is a powerful non-parametric supervised learning method. In a decision tree, a node represents a feature, a branch represents a decision rule, and each leaf node represents the outcome.

Example of Tree

import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score, train_test_split

# Load the iris dataset
X, Y = load_iris(return_X_y = True)

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size = 0.4, random_state = 0)

# Fit a decision tree classifier on the training split
dtc = DecisionTreeClassifier(random_state = 0)
dtc.fit(X_train, Y_train)

# Evaluate the classifier with 10-fold cross-validation on the full dataset
score = cross_val_score(dtc, X, Y, cv = 10)

print("Accuracy scores: ", score)
print("Mean accuracy score: ", np.mean(score))

Output of Tree

Accuracy scores:  [1. 0.93333333 1. 0.93333333 0.93333333 0.86666667 0.93333333 1. 1. 1.]
Mean accuracy score:  0.96

Gradient Boosting

Gradient boosting builds an ensemble of weak learners (typically shallow decision trees) stage by stage, where each new tree tries to correct the errors of the ones before it.

Example of Gradient Boosting

from sklearn.datasets import make_hastie_10_2
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Generate the synthetic Hastie dataset
X, Y = make_hastie_10_2(random_state = 10)

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size = 0.4, random_state = 0)

# Fit a gradient boosting classifier built from 100 depth-1 trees
gbc = GradientBoostingClassifier(n_estimators = 100, learning_rate = 1.0, max_depth = 1, random_state = 0)
gbc.fit(X_train, Y_train)

# Mean accuracy on the held-out test split
score = gbc.score(X_test, Y_test)
print("Accuracy score: ", score)

Output of Gradient Boosting

Accuracy score:  0.9185416666666667

Dimensionality Reduction using PCA in Sklearn
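
Principal Component Analysis (PCA) projects the data onto the directions of maximum variance, reducing the number of features while keeping as much information as possible. Here is a minimal sketch on the diabetes dataset; the choice of dataset and number of components is ours, purely for illustration.

Example of PCA

from sklearn.datasets import load_diabetes
from sklearn.decomposition import PCA

X, Y = load_diabetes(return_X_y = True)

# Reduce the 10 original features to 2 principal components
pca = PCA(n_components = 2)
X_reduced = pca.fit_transform(X)

print(X.shape)          # (442, 10)
print(X_reduced.shape)  # (442, 2)
print(pca.explained_variance_ratio_)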

Clustering Methods

Clustering is one of the most useful unsupervised ML techniques, used to find patterns of similarity and relationships among data samples.

KMeans

The KMeans algorithm clusters data by repeatedly assigning each sample to its nearest centroid and recomputing the centroids until they stabilise.

Example of Kmeans

from sklearn.cluster import KMeans
from sklearn.datasets import load_diabetes

# Load the diabetes dataset (only the features are used here)
X, Y = load_diabetes(return_X_y = True)

# Fit KMeans with 10 clusters on the first 50 samples
cluster = KMeans(n_clusters = 10)
cluster.fit(X[:50, :])

print("Cluster labels: ", cluster.labels_)

Output of Kmeans

Cluster labels:  [6 0 6 2 0 8 8 5 6 2 8 6 0 6 0 5 3 5 2 2 8 8 2 7 2 6 8 2 4 3 2 4 1 4 4 9 3 2 5 6 5 8 6 9 1 6 2 8 0 1]

Spectral Clustering

Before clustering, spectral clustering performs a dimensionality reduction on the affinity matrix of the samples, which makes it well suited to clusters that are not necessarily convex.

Example of Spectral Clustering

from sklearn.cluster import SpectralClustering
from sklearn.datasets import load_diabetes

X, Y = load_diabetes(return_X_y = True)

# Fit spectral clustering with 10 clusters on the first 50 samples
cluster = SpectralClustering(n_clusters = 10)
cluster.fit(X[:50, :])

print("Cluster labels: ", cluster.labels_)

Output of Spectral Clustering

Cluster labels:  [0 2 0 8 4 3 6 4 9 1 3 0 4 6 2 8 5 4 7 1 7 6 9 5 2 8 3 9 1 3 9 5 0 5 4 5 1 5 8 1 7 3 6 5 0 6 1 3 6 8]

Hierarchical Clustering

Hierarchical (agglomerative) clustering builds nested clusters by successively merging the closest pair of clusters until only the requested number of clusters remains.

Example of Hierarchical Clustering

from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import load_diabetes

X, Y = load_diabetes(return_X_y = True)

# Fit agglomerative clustering on the first 50 samples;
# compute_distances = True stores the merge distances needed for a dendrogram
cluster = AgglomerativeClustering(n_clusters = 10, compute_distances = True)
cluster.fit(X[:50, :])

print("Cluster labels: ", cluster.labels_)

Output of Hierarchical Clustering

Cluster labels:  [3 6 3 5 6 0 0 1 3 5 0 2 6 3 6 1 4 1 5 6 0 0 5 9 5 2 0 5 6 4 5 0 8 7 6 7 4 5 1 3 1 0 2 7 8 3 0 0 3 2]
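
Because the model was fitted with compute_distances = True, it exposes a distances_ attribute describing the full merge tree, which is exactly what is needed to draw a dendrogram. Here is a minimal sketch of that follow-up step, adapted from the common linkage-matrix pattern and assuming SciPy and Matplotlib are installed.

import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram

def plot_dendrogram(model):
    # Count the samples under each internal node of the merge tree
    counts = np.zeros(model.children_.shape[0])
    n_samples = len(model.labels_)
    for i, merge in enumerate(model.children_):
        count = 0
        for child in merge:
            # Indices below n_samples are leaves (original samples)
            count += 1 if child < n_samples else counts[child - n_samples]
        counts[i] = count
    # SciPy expects the columns: child a, child b, merge distance, sample count
    linkage_matrix = np.column_stack(
        [model.children_, model.distances_, counts]
    ).astype(float)
    dendrogram(linkage_matrix)

plot_dendrogram(cluster)
plt.show()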

If you have any queries regarding this article, or if I have missed something on this topic, please feel free to add it in the comments below for the audience. See you in another article.

To know more about the scikit-learn library, see the scikit-learn article on Wikipedia.

Stay connected, stay safe. Thank you.


