Machine Learning: Simple Linear Regression with Python, ASSIGNMENT - 7, GO_STP_8113
SOLUTIONS
In this task we predict students' scores from the number of hours they studied. This is a simple linear regression problem because it involves only two variables: one predictor (Hours) and one target (Scores).
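To make the idea concrete, here is a minimal sketch of what fitting a straight line to two variables means. It uses the usual least-squares formulas for the slope and intercept on a handful of (hours, score) pairs taken from the dataset used in this post. The variable names are my own, and this is only an illustration, not the method used in the rest of the assignment (which relies on scikit-learn).

```python
import numpy as np

# A few (Hours, Scores) pairs from the dataset, used only for illustration.
hours = np.array([7.7, 5.9, 4.5, 3.3, 1.1])
scores = np.array([79, 60, 45, 33, 12], dtype=float)

# Least-squares estimates for the line: score = intercept + slope * hours
h_mean, s_mean = hours.mean(), scores.mean()
slope = np.sum((hours - h_mean) * (scores - s_mean)) / np.sum((hours - h_mean) ** 2)
intercept = s_mean - slope * h_mean

print(f"slope={slope:.3f}, intercept={intercept:.3f}")
```

The slope says roughly how many points the score changes per extra hour of study, and the intercept is the score the line predicts for zero hours.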
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
data = pd.read_csv("/content/StudentHoursScores.csv")
print(data)
OUTPUT:
    Hours  Scores
0     7.7      79
1     5.9      60
2     4.5      45
3     3.3      33
4     1.1      12
5     8.9      87
6     2.5      21
7     1.9      19
8     2.7      29
9     8.3      81
10    5.5      58
11    9.2      88
12    1.5      14
13    3.5      34
14    8.5      85
15    3.2      32
16    6.5      66
17    2.5      21
18    9.6      96
19    4.3      42
20    4.1      40
21    3.0      30
22    2.6      25
# scatter plot: one point per student (a line plot would join the unsorted rows in file order)
plt.scatter(data['Hours'],data['Scores'],color='g')
plt.xlabel('Hours')
plt.ylabel('Scores')
plt.title('StudentHoursScores')
plt.show()
print(data.head())
print()
print(data.tail())
print()
print(data.shape)
print()
print(data.dtypes)
print()
print(data.columns)
print()
print(data.corr())
print()
print(data.describe())
print()
print(data.min())
print()
print(data.info())
print()
x = data.iloc[:,0:-1]
y = data.iloc[:, 1]
print(x.head())
print(y.head())
OUTPUT:
   Hours  Scores
0    7.7      79
1    5.9      60
2    4.5      45
3    3.3      33
4    1.1      12

    Hours  Scores
18    9.6      96
19    4.3      42
20    4.1      40
21    3.0      30
22    2.6      25

(23, 2)

Hours     float64
Scores      int64
dtype: object

Index(['Hours', 'Scores'], dtype='object')

           Hours    Scores
Hours   1.000000  0.997656
Scores  0.997656  1.000000

           Hours     Scores
count  23.000000  23.000000
mean    4.817391  47.695652
std     2.709688  27.103228
min     1.100000  12.000000
25%     2.650000  27.000000
50%     4.100000  40.000000
75%     7.100000  72.500000
max     9.600000  96.000000

Hours      1.1
Scores    12.0
dtype: float64

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 23 entries, 0 to 22
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   Hours   23 non-null     float64
 1   Scores  23 non-null     int64
dtypes: float64(1), int64(1)
memory usage: 496.0 bytes
None

   Hours
0    7.7
1    5.9
2    4.5
3    3.3
4    1.1
0    79
1    60
2    45
3    33
4    12
Name: Scores, dtype: int64
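Notice in the output above that x still prints with its column header while y prints with a Name line. That is because the two iloc calls return different kinds of objects. The short sketch below shows this on a small stand-in DataFrame built from the first three rows (the stand-in is my assumption, used so the snippet runs without the CSV file).

```python
import pandas as pd

# Small stand-in for the CSV, built from the first three rows of the dataset.
data = pd.DataFrame({"Hours": [7.7, 5.9, 4.5], "Scores": [79, 60, 45]})

x = data.iloc[:, 0:-1]   # slice of columns (all but the last) -> DataFrame, 2-D
y = data.iloc[:, 1]      # a single column position -> Series, 1-D

print(type(x).__name__, x.shape)   # DataFrame (3, 1)
print(type(y).__name__, y.shape)   # Series (3,)
```

Keeping x two-dimensional matters here because scikit-learn estimators expect the features as a 2-D table (rows are samples, columns are features), even when there is only one feature.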
from sklearn.model_selection import train_test_split
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size = 0.2, random_state = 0)
from sklearn.linear_model import LinearRegression
model = LinearRegression()
model.fit(xtrain, ytrain)
ypred = model.predict(xtest)
print("Prediction of testing data by model:\n", ypred)
OUTPUT:
Prediction of testing data by model:
[91.81882791 54.56931042 29.40071751 84.7716219 40.47489839]
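The predictions alone do not tell us how good the fit is. As a possible next step, the sketch below repeats the same split and fit, then prints the learned slope and intercept and compares the predictions with the true test scores. The dataset is typed in from the table printed earlier instead of being read from the CSV, and the choice of mean absolute error and R² as metrics is my assumption, not part of the original assignment.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Dataset copied from the table printed earlier in this post (23 rows).
data = pd.DataFrame({
    "Hours": [7.7, 5.9, 4.5, 3.3, 1.1, 8.9, 2.5, 1.9, 2.7, 8.3, 5.5, 9.2,
              1.5, 3.5, 8.5, 3.2, 6.5, 2.5, 9.6, 4.3, 4.1, 3.0, 2.6],
    "Scores": [79, 60, 45, 33, 12, 87, 21, 19, 29, 81, 58, 88,
               14, 34, 85, 32, 66, 21, 96, 42, 40, 30, 25],
})

x = data[["Hours"]]   # 2-D feature table
y = data["Scores"]    # 1-D target

# Same split settings as in the post: 20% test data, fixed random seed.
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.2, random_state=0)

model = LinearRegression()
model.fit(xtrain, ytrain)
ypred = model.predict(xtest)

# Parameters of the fitted line: score = intercept + slope * hours
print("slope:", model.coef_[0])
print("intercept:", model.intercept_)

# How far the predictions are from the true test scores.
print("MAE:", mean_absolute_error(ytest, ypred))
print("R^2:", r2_score(ytest, ypred))
```

An R² close to 1 and a small mean absolute error would be in line with the strong correlation between Hours and Scores reported by data.corr() above.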