
[Kaggle] Bike Sharing Demand: Bicycle Demand Prediction

Potato 🥔 2021. 5. 5. 04:00

 

Getting started

This bicycle demand prediction project went up as a Kaggle competition a full six years ago!
To prepare for a data analysis competency test, I decided to practice on the most basic Kaggle competitions first.
Here's hoping this practice pays off and I pass the coding test. Study review, start!

 

1. Project description

www.kaggle.com/c/bike-sharing-demand/

 


This project predicts bicycle rental demand for the Capital Bikeshare program in Washington, D.C. It combines historical usage patterns with weather data to forecast demand.

 

2. Project goal

Predict how many people rent a bicycle in a given time slot.

 

3. Data description

Three files are provided: train.csv, test.csv, and SampleSubmission.csv. train.csv is the training data, test.csv is the test data, and SampleSubmission.csv shows the final submission format.

The columns are as follows.

Column        Description
Datetime      Timestamp (YYYY-MM-DD 00:00:00)
Season        Spring (1), Summer (2), Fall (3), Winter (4)
Holiday       Public holiday (1), otherwise (0)
Workingday    Working day (1), otherwise (0)
Weather       Clear (1), light mist and clouds (2), light snow or rain (3), heavy rain and hail (4)
Temp          Temperature (in Celsius)
Atemp         Feels-like temperature (in Celsius)
Humidity      Humidity
Windspeed     Wind speed
Casual        Rentals by non-registered users
Registered    Rentals by registered users
Count         Total rentals (non-registered + registered)

 

4. Library Import / Data Check

4.1 Library Import

First, import the libraries needed to inspect the data.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import datetime as dt
import scipy

 

4.2 Loading the data

Load the train and test data.

train = pd.read_csv("train.csv")
test = pd.read_csv("test.csv")
submission = pd.read_csv("SampleSubmission.csv")

 

4.3 Checking the data

Now let's see what the data looks like.

train.columns

test.columns

The train and test columns differ. The train data has casual, registered, and count, but the test data does not. So the variable to predict is count. (The data description states that count = casual + registered, so count, which holds the final total, is the real target.)
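The claim that count = casual + registered is easy to verify before trusting it. The sketch below uses a tiny hypothetical frame (`toy` and its values are my own illustration, not the competition files); the same comparison should work on the real train DataFrame.

```python
import pandas as pd

# Tiny hypothetical frame with the same three target-related columns as train.csv
toy = pd.DataFrame({
    "casual": [3, 8, 5],
    "registered": [13, 32, 27],
    "count": [16, 40, 32],
})

# True only if every row satisfies count == casual + registered
is_sum = bool((toy["casual"] + toy["registered"] == toy["count"]).all())
print(is_sum)
```

If this ever printed False on the real data, count could not be treated as a simple total and the two parts would need a closer look.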

train.head()

test.head()

submission.head()

The head() output shows what train, test, and the sample submission look like. As the sample submission makes plain, the label to predict is count.

To have the datetime column recognized as dates, one extra step is needed: pandas' to_datetime converts the datetime column to a datetime type.

train['datetime'] = pd.to_datetime(train['datetime'])
test['datetime'] = pd.to_datetime(test['datetime'])

Printing info() confirms that datetime was converted correctly and shows the dtypes of the remaining columns, which are all integers or floats.

Many of these columns, weather and season in particular, hold category codes (or yes/no flags such as holiday) rather than continuous quantities, so I expected a type conversion to be needed later.

train.info()

test.info()
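One way to make the "these integers are really categories" point explicit is to cast the coded columns to pandas' category dtype. This is only a sketch on a hypothetical frame (`toy` and `categorical_cols` are names I made up); the post itself handles these columns later with one-hot encoding instead.

```python
import pandas as pd

# Hypothetical sample with two coded columns and one truly continuous column
toy = pd.DataFrame({
    "season": [1, 2, 3, 4],
    "weather": [1, 1, 2, 3],
    "temp": [9.84, 9.02, 22.14, 30.34],
})

# Cast only the coded columns; temp stays a float
categorical_cols = ["season", "weather"]
for col in categorical_cols:
    toy[col] = toy[col].astype("category")

print(toy.dtypes)
```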

 

Now let's check the shape of the data.

print(train.shape)
print(test.shape)

-- Result
# (10886, 12)
# (6493, 9)

 

4.4 EDA

So far we have looked at the shape and dtypes of the data. Building on that, let's work through the EDA step by step: visualizing the data, handling null values, and otherwise preparing for modeling.

4.4.1 Bicycle demand by various criteria (visualization)

The timestamps currently have the form yyyy-mm-dd 00:00:00. Splitting them into year, month, day, hour, minute, and second lets us first check how demand changes by year, by month, and so on.

To look at the weekly pattern as well, I also extracted dayofweek, which the pandas .dt accessor provides.

train['year'] = train['datetime'].dt.year
train['month'] = train['datetime'].dt.month
train['day'] = train['datetime'].dt.day
train['hour'] = train['datetime'].dt.hour
train['minute'] = train['datetime'].dt.minute
train['second'] = train['datetime'].dt.second
# dayofweek returns the day of the week
# Mon(0) Tue(1) Wed(2) Thu(3) Fri(4) Sat(5) Sun(6)
train['dayofweek'] = train['datetime'].dt.dayofweek

test['year'] = test['datetime'].dt.year
test['month'] = test['datetime'].dt.month
test['day'] = test['datetime'].dt.day
test['hour'] = test['datetime'].dt.hour
test['minute'] = test['datetime'].dt.minute
test['second'] = test['datetime'].dt.second
# dayofweek returns the day of the week
# Mon(0) Tue(1) Wed(2) Thu(3) Fri(4) Sat(5) Sun(6)
test['dayofweek'] = test['datetime'].dt.dayofweek
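The two blocks above repeat the same seven assignments for train and test. A small helper can do this once; the sketch below is my own refactoring (the function name `add_datetime_features` and the `demo` frame are hypothetical), not code from the original notebook.

```python
import pandas as pd

def add_datetime_features(df: pd.DataFrame, col: str = "datetime") -> pd.DataFrame:
    """Return a copy of df with the calendar parts of `col` added as columns."""
    out = df.copy()
    ts = pd.to_datetime(out[col])
    for part in ["year", "month", "day", "hour", "minute", "second", "dayofweek"]:
        # Each name is an attribute of the pandas .dt accessor
        out[part] = getattr(ts.dt, part)
    return out

# Two made-up timestamps: a Saturday morning and a Wednesday evening
demo = pd.DataFrame({"datetime": ["2011-01-01 05:00:00", "2012-07-04 17:00:00"]})
demo = add_datetime_features(demo)
print(demo[["year", "month", "hour", "dayofweek"]])
```

With a helper like this, train and test go through identical code, which removes one source of copy-paste mistakes.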

 

(1) Bicycle demand by year

sns.barplot(data = train, x = 'year', y = 'count')

  • Demand jumped from one year to the next. But with only two years in the data, we cannot tell whether demand keeps rising; all we can say is that the bike rental company grew or that bicycle demand increased.
  • Bikes may simply have boomed in 2012, and 2013 could go either way.
  • Either way, within the given data there is a clear difference in demand between years, so year can be used as a predictor.

(2) Bicycle demand by month

sns.barplot(data = train, x = 'month', y = 'count')

  • Demand is comparatively low in December, January, and February.
  • Demand is highest in June, July, and August. So month can also be used as a predictor.

(3) Bicycle demand by day

sns.barplot(data = train, x = 'day', y = 'count')

  • Demand varies somewhat by day.
  • But the differences are not clear-cut, so I will decide later whether to drop this variable.

(4) By season

sns.barplot(data = train, x = 'season', y = 'count')

  • In the monthly plot, demand was clearly lowest in December, January, and February (winter), yet the season plot above tells a different story. So the season boundaries must be drawn differently from what I assumed.
print(train[train['season'] == 1].month.unique())
print(train[train['season'] == 2].month.unique())
print(train[train['season'] == 3].month.unique())
print(train[train['season'] == 4].month.unique())

  • Either way, demand differs by season, so season can also be used as a predictor.

(5) Point plot by hour

fig, ax1 = plt.subplots(1, 1)
fig.set_size_inches(20, 5) # width, height

sns.pointplot(data = train, x = 'hour', y = 'count', ax = ax1)

  • By hour, demand peaks around 8:00 and 17:00, which are the commuting hours.

(6) workingday (categorical): point plot by hour

fig, ax1 = plt.subplots(1, 1)
fig.set_size_inches(20, 5)

# To draw a seaborn plot split by a categorical variable, pass it as hue.
# Demand per hour is therefore drawn separately for workingday == 1 and workingday == 0.
sns.pointplot(data = train, x = 'hour', y = 'count', hue = 'workingday', ax = ax1)

  • 1: working day / 0: not a working day
  • On working days demand surges during commuting hours; on non-working days it rises in the afternoon. So I judged that workingday will also affect the prediction.

(7) holiday (categorical): point plot by hour

fig, ax1 = plt.subplots(1, 1)
fig.set_size_inches(20, 5)

sns.pointplot(data = train, x = 'hour', y = 'count', hue = 'holiday', ax = ax1)

  • 1: holiday / 0: not a holiday
  • On non-holidays demand rises during commuting hours; on holidays it rises in the afternoon. So I will also try using holiday as a predictor.

(8) weather (categorical): point plot by hour

fig, ax1 = plt.subplots(1, 1)
fig.set_size_inches(20, 5)

sns.pointplot(data = train, x = 'hour', y = 'count', hue = 'weather', ax = ax1)

  • Clear (1), light mist and clouds (2), light snow or rain (3), heavy rain and hail (4)
  • Demand: weather 1 and 2 > weather 3; there is almost no data for weather 4.

(9) dayofweek (categorical): point plot by hour

fig, ax1 = plt.subplots(1, 1)
fig.set_size_inches(20, 5)

sns.pointplot(data = train, x = 'hour', y = 'count', hue = 'dayofweek', ax = ax1)

  • The patterns look similar across dayofweek values, but I will still try it as a predictor.

In conclusion, year, month, day, hour, weather, holiday, workingday, dayofweek, and season are the candidates to use for prediction later.

 

4.4.2 Correlation between variables

# datetime is not a numeric column, so it is left out of the correlation matrix
corr_data = train[['season', 'holiday', 'weather', 'temp', 'atemp', 'humidity', 'windspeed']]
colormap = plt.cm.PuBu
sns.heatmap(corr_data.corr()
            , linewidths = 0.1
            , square = True
            , annot = True
            , cmap = colormap)

  • temp and atemp are very highly correlated, so multicollinearity is suspected. I will therefore use only temp.

4.4.3 Looking at temperature, humidity, and wind speed

Having covered the categorical variables, let's turn to the continuous ones.
To see how Temp, Humidity, and Windspeed are distributed, draw a scatter plot for each.

fig, (ax1, ax2, ax3) = plt.subplots(ncols = 3, figsize=(12,5))

sns.scatterplot(data = train, x = 'windspeed', y = 'count', ax = ax1)
sns.scatterplot(data = train, x = 'temp', y = 'count', ax = ax2)
sns.scatterplot(data = train, x = 'humidity', y =  'count', ax = ax3)

From left to right, these show the distributions of wind speed, temperature, and humidity. I did not spot anything odd here at first, but HONG_YP's blog gave me a real 'aha!' insight, and applying it actually improved my Kaggle score.

The insight HONG_YP caught was: 'Isn't it rare for wind speed to be exactly 0?' Looking at the distribution, the windspeed variable contains a considerable number of 0 values.

len(train[train['windspeed']==0])
# Result: 1313

Printing the count shows 1313 rows where windspeed is 0. Out of roughly 10,000 rows that is about 12%, a sizable share. So one more feature engineering step is needed to replace the zero windspeed values.

 

5. Feature Engineering

I will proceed in three steps: remove outliers with the IQR method, examine and adjust the skewness and kurtosis of the data, and then do the feature engineering on windspeed mentioned earlier.

5.1 Removing outliers

5.1.1 Boxplots of the continuous variables and checking for outliers

Before removing outliers, let's draw box plots. I did this for the continuous variables only, not the categorical ones.

fig, (ax1, ax2, ax3, ax4, ax5, ax6) = plt.subplots(nrows = 6, figsize = (12,10))
sns.boxplot(data = train, x = 'windspeed', ax = ax1)
sns.boxplot(data = train, x = 'humidity', ax = ax2)
sns.boxplot(data = train, x = 'temp', ax = ax3)
sns.boxplot(data = train, x = 'casual', ax = ax4)
sns.boxplot(data = train, x = 'registered', ax = ax5)
sns.boxplot(data = train, x = 'count', ax = ax6)

5.1.2 Removing outliers with the IQR method

The IQR method starts from quartiles. Sort the data in ascending order and split it into four equal parts (at 25%, 50%, 75%, and 100%). The difference between the values at the 75% point (Q3) and the 25% point (Q1) is called the IQR, and values below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR are judged to be outliers. (For an explanation of IQR, see hwi-doc.tistory.com/entry/IQR-%EB%B0%A9%EC%8B%9D%EC%9D%84-%EC%9D%B4%EC%9A%A9%ED%95%9C-%EC%9D%B4%EC%83%81%EC%B9%98-%EB%8D%B0%EC%9D%B4%ED%84%B0Outlier-%EC%A0%9C%EA%B1%B0)

https://towardsdatascience.com/understanding-boxplots-5e2df7bcbd51

Now let's write a function to remove outliers. The code is taken as-is from HONG_YP's blog (link at the bottom).

from collections import Counter

def detect_outliers(data, n, cols):
    # Collect the index of every row that is an outlier in some column
    outlier_indices = []
    for col in cols:
        Q1 = np.percentile(data[col], 25)
        Q3 = np.percentile(data[col], 75)
        IQR = Q3 - Q1

        outlier_step = 1.5 * IQR

        outlier_list_col = data[(data[col] < Q1 - outlier_step) | (data[col] > Q3 + outlier_step)].index
        outlier_indices.extend(outlier_list_col)
    # Count how many columns flagged each row; keep rows flagged in more than n columns
    outlier_indices = Counter(outlier_indices)
    multiple_outliers = list(k for k, v in outlier_indices.items() if v > n)

    return multiple_outliers

Outliers_to_drop = detect_outliers(train, 2, ["temp", "atemp", "casual", "registered", "humidity", "windspeed", "count"])

Check the data before dropping the outliers

train.shape

Drop the outliers!

train = train.drop(Outliers_to_drop, axis = 0).reset_index(drop = True)
train.shape

Only a few rows, but the outliers have been removed.
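To make the IQR fences concrete, here is a minimal worked example on a made-up array (the values and variable names are mine, chosen so the arithmetic is easy to follow). It applies the 1.5 * IQR rule to a single column, which is the per-column step inside detect_outliers.

```python
import numpy as np

# Ten made-up values; 100 is the obvious outlier
values = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])

q1, q3 = np.percentile(values, [25, 75])   # 3.25 and 7.75 with linear interpolation
iqr = q3 - q1                              # 4.5
lower = q1 - 1.5 * iqr                     # -3.5
upper = q3 + 1.5 * iqr                     # 14.5

# Anything outside [lower, upper] is flagged
outliers = values[(values < lower) | (values > upper)]
print(q1, q3, iqr, lower, upper, outliers)
```

Note that detect_outliers above is stricter than this: with n = 2 it only drops rows flagged in more than two columns at once, which is why so few rows disappear.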

5.2 Checking skewness and kurtosis

Skewness and kurtosis matter in data analysis because their values show how lopsided the data are. In brief:

  • Skewness
    • Measures how far the distribution leans to one side
    • A skew roughly between -2 and +2 is treated as not skewed
    • Below -2 the data are negatively skewed (long left tail); above +2 they are positively skewed (long right tail)
      • Transforms for positive skew: square root, cube root, log
      • Transforms for negative skew: square, cube, exponential
  • Kurtosis
    • Not about how peaked or flat the distribution is, but all about its tails
    • Reflects how extreme the values in the tails are
    • Mainly used when looking for outliers
    • High kurtosis -> many outliers
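The effect of a log transform on positive skew can be seen on a small made-up sample. This is only an illustration (the `raw` values are hypothetical, and I use np.log1p here so that a zero would not break the log); it is not the competition data.

```python
import numpy as np
import pandas as pd

# Made-up right-skewed sample: many small values, a few large ones
raw = pd.Series([1, 1, 1, 2, 2, 3, 5, 8, 20, 60], dtype=float)
logged = np.log1p(raw)  # log(1 + x), a compressing transform for positive skew

raw_skew, raw_kurt = raw.skew(), raw.kurt()
log_skew, log_kurt = logged.skew(), logged.kurt()

print("raw    skew=%.2f kurt=%.2f" % (raw_skew, raw_kurt))
print("logged skew=%.2f kurt=%.2f" % (log_skew, log_kurt))
```

The logged series should show a noticeably smaller skew than the raw one, which is the behaviour the list above describes for positively skewed data.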

(1) Visualizing skewness and kurtosis

Let's draw the most basic histogram (distplot) to inspect the data.

fig, ax = plt.subplots(1, 1, figsize = (10, 6))

# distplot is deprecated since seaborn 0.11; sns.histplot(..., kde = True) is the replacement
graph = sns.distplot(train['count']
                    , color = 'b'
                    , label = 'Skewness: {:.2f}'.format(train['count'].skew())
                    , ax = ax)

graph = graph.legend(loc = 'best')

print('skewness: %f' %train['count'].skew())
print('kurtosis: %f' %train['count'].kurt())

In short, the skewness and kurtosis values look unproblematic. But the histogram shows that count is piled up heavily near 0. So let's normalize it with log scaling.

(Note: because we take the log of the target count, the final predictions must be transformed back with the inverse, exp, to get the values we actually want!)

(2) Normalizing count by taking its log

# Use a lambda to create a count_log column holding the log of count
train['count_log'] = train['count'].map(lambda i:np.log(i) if i > 0 else 0)

fig, ax = plt.subplots(1,1, figsize = (10, 6))
graph = sns.distplot(train['count_log']
            , color = 'b'
            , label = 'skewness: {:.2f}'.format(train['count_log'].skew())
            , ax = ax)
graph = graph.legend(loc = 'best')

print("skewness: %f" %train['count_log'].skew())
print("kurtosis: %f" %train['count_log'].kurt())

# Drop the count column, which is no longer needed
train.drop('count', axis = 1, inplace = True)

The skewness and kurtosis values also look fine, and the pile-up at 0 has improved. So count_log will be the prediction target, and the predictions will be converted back with exp at the end.
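The round trip between the log scale and the original scale is worth checking once on toy numbers. The sketch below uses np.log1p and np.expm1, which is a swapped-in variant rather than the np.log-with-zero-guard used above; the advantage is that the pair is an exact inverse even at 0. The `counts` array is made up.

```python
import numpy as np

# Made-up rental counts, including 0
counts = np.array([0, 1, 16, 40, 977], dtype=float)

log_counts = np.log1p(counts)    # forward: log(1 + count)
restored = np.expm1(log_counts)  # inverse: exp(value) - 1

# Applying log a second time would NOT recover the counts; the inverse is exp
print(np.allclose(restored, counts))
```

If the log1p variant were used for training, the submission step would need np.expm1 instead of np.exp; with the np.log version from this post, np.exp is the matching inverse.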

5.3 Finding replacement values for windspeed = 0

As mentioned earlier, wind speed is rarely exactly 0, so the zero windspeed values will be replaced.

<< Ways to handle missing values >>

- Fill forward or backward
- Fill with the per-variable mean
- Delete the rows that have missing values
- Replace with 0 or with a sentinel value such as -999
- Replace with predicted values (using machine learning)

My guess is that null windspeed readings were recorded as 0, so I will replace the zeros using one of the methods above: predicted values.

  • Replacing windspeed = 0.0 with values predicted by a RandomForest
from sklearn.ensemble import RandomForestClassifier

def predict_windspeed(data):
    # .copy() avoids pandas' SettingWithCopyWarning when the slices are modified below
    wind0 = data.loc[data['windspeed'] == 0].copy()
    windnot0 = data.loc[data['windspeed'] != 0].copy()

    # windspeed is a weather variable, so predict it from the other weather variables
    col = ['season', 'weather', 'temp', 'humidity', 'atemp', 'day']
    # The classifier needs discrete labels, so treat each windspeed value as a string class
    windnot0['windspeed'] = windnot0['windspeed'].astype('str')

    rf = RandomForestClassifier()
    # Fit on the rows where windspeed is not 0
    # model.fit(X_train, y_train)
    rf.fit(windnot0[col], windnot0['windspeed'])

    # Predict windspeed for the rows where it is 0
    # model.predict(X_test)
    pred_wind0 = rf.predict(X = wind0[col])

    # Replace the windspeed values in wind0 with pred_wind0
    wind0['windspeed'] = pred_wind0

    # Combine windnot0 and wind0 (DataFrame.append was removed in pandas 2.0)
    data = pd.concat([windnot0, wind0])
    data['windspeed'] = data['windspeed'].astype('float')

    data.reset_index(inplace = True)
    data.drop("index", inplace = True, axis = 1)

    return data

train = predict_windspeed(train)
test = predict_windspeed(test)

 

  • Check whether any windspeed = 0 values remain
train[train['windspeed'] == 0.0]

 

  • Visualize the windspeed values in train and test
fig, (ax1, ax2) = plt.subplots(2,1)
fig.set_size_inches(20,15)

# We need counts, so use countplot
sns.countplot(data = train, x = 'windspeed', ax = ax1)
sns.countplot(data = test, x = 'windspeed', ax = ax2)

The 0.0 values are gone from windspeed.
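The function above treats each windspeed value as a class label. Since wind speed is a continuous quantity, a regressor is a natural alternative. The sketch below is my own variant, not the approach from HONG_YP's blog: it uses RandomForestRegressor on synthetic data (the frame `toy`, the helper `impute_zero_windspeed`, and the feature list are all hypothetical) and fills the zeros in place so the row order is preserved.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

def impute_zero_windspeed(df, feature_cols):
    """Return a copy of df where windspeed == 0 is replaced by a regressor's prediction."""
    out = df.copy()
    is_zero = out["windspeed"] == 0
    # Only fit when there are both rows to learn from and rows to fill
    if is_zero.any() and (~is_zero).any():
        model = RandomForestRegressor(n_estimators=50, random_state=0)
        model.fit(out.loc[~is_zero, feature_cols], out.loc[~is_zero, "windspeed"])
        out.loc[is_zero, "windspeed"] = model.predict(out.loc[is_zero, feature_cols])
    return out

# Synthetic stand-in for the weather columns
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "temp": rng.uniform(0, 35, 200),
    "humidity": rng.uniform(20, 100, 200),
})
toy["windspeed"] = rng.uniform(6, 30, 200)
toy.loc[:19, "windspeed"] = 0.0  # pretend the first 20 readings were recorded as 0

filled = impute_zero_windspeed(toy, ["temp", "humidity"])
print((filled["windspeed"] == 0).sum())
```

Keeping the original row order matters for the test set, because the saved datetime column has to line up with the predictions later.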

 

5.4 One-hot encoding the categorical variables

Season        Spring (1), Summer (2), Fall (3), Winter (4)
Holiday       Public holiday (1), otherwise (0)
Workingday    Working day (1), otherwise (0)
Weather       Clear (1), light mist and clouds (2), light snow or rain (3), heavy rain and hail (4)

For these variables the numeric codes do not represent quantities, so I one-hot encoded them.

# prefix controls the names of the generated columns, e.g. weather_1
train = pd.get_dummies(train, columns = ['weather'], prefix = 'weather')
test = pd.get_dummies(test, columns = ['weather'], prefix = 'weather')

train = pd.get_dummies(train, columns = ['season'], prefix = 'season')
test = pd.get_dummies(test, columns = ['season'], prefix = 'season')

train = pd.get_dummies(train, columns = ['holiday'], prefix = 'holiday')
test = pd.get_dummies(test, columns = ['holiday'], prefix = 'holiday')
train.columns

test.columns
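One thing to watch with get_dummies on two separate frames: if a category appears in one frame but not in the other, the encoded columns no longer match. A defensive sketch, using made-up frames (`train_toy` and `test_toy` are hypothetical), is to reindex the test columns against the train columns.

```python
import pandas as pd

# Made-up frames: weather code 4 exists in train but not in test
train_toy = pd.DataFrame({"weather": [1, 2, 3, 4], "temp": [9.8, 9.0, 13.1, 8.2]})
test_toy = pd.DataFrame({"weather": [1, 2, 3], "temp": [10.7, 11.5, 12.3]})

train_enc = pd.get_dummies(train_toy, columns=["weather"], prefix="weather")
test_enc = pd.get_dummies(test_toy, columns=["weather"], prefix="weather")

# Align test to train's columns; categories missing from test become all-zero columns
test_enc = test_enc.reindex(columns=train_enc.columns, fill_value=0)
print(list(test_enc.columns))
```

In a real pipeline the target column would be excluded from the reindex, since test has no label.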

 

 

6. Modeling

6.1 Selecting the variables used for training

Based on the column lists above, I chose which columns to drop.

# The sample submission records predictions keyed by datetime.
# So store test's datetime separately for the submission file later.
test_datetime = test['datetime']

train.drop(['datetime', 'workingday', 'atemp', 'registered', 'casual', 'minute', 'second'], axis = 1, inplace = True)
test.drop(['datetime', 'workingday', 'atemp', 'minute', 'second'], axis = 1, inplace = True)

workingday shows a pattern very similar to holiday, so I dropped workingday. temp and atemp are highly correlated and multicollinearity is suspected, so I dropped atemp. Because separate time variables such as year, month, and day already exist, datetime was dropped too, and minute and second were removed because I judged that changes in demand at that resolution would be hard to detect. casual and registered are dropped from train because test does not have them. The final columns are as follows.

train.columns

test.columns

 

6.2 Training the Gradient Boosting model

6.2.1 Splitting the dataset

from sklearn.model_selection import train_test_split
from sklearn import metrics
# Take .values because arrays, not DataFrames, are passed to the model
x_train = train.drop('count_log', axis = 1).values
target_label = train['count_log'].values
x_test = test.values

# Split train : val = 0.8 : 0.2
x_train, x_val, y_train, y_val = train_test_split(x_train, target_label, test_size = 0.2, random_state = 2000)

The data are now split into arrays.

6.2.2 Modeling and training

I will use a gradient boosting model. A detailed post on its pros and cons and why I chose it will follow later.

from sklearn.ensemble import GradientBoostingRegressor
regressor = GradientBoostingRegressor(n_estimators = 2000
                                    , learning_rate = 0.05
                                    , max_depth = 5
                                    , min_samples_leaf = 15
                                    , min_samples_split = 10
                                    , random_state = 42)
# model.fit(x, y)
regressor.fit(x_train, y_train)

Ideally I would tune the parameters to find better values, but the data competency test is limited to 2 hours, so I have not practiced that yet. I plan to tune this model with grid search later.
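For reference, the grid search mentioned above could look roughly like this with scikit-learn's GridSearchCV. This is a sketch on synthetic data from make_regression, with a deliberately tiny grid and few trees so it runs in seconds; the grid values are my own assumptions, not tuned settings for this competition.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic regression data standing in for x_train / y_train
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# Hypothetical, intentionally small search space
param_grid = {"max_depth": [2, 3], "learning_rate": [0.05, 0.1]}

search = GridSearchCV(
    GradientBoostingRegressor(n_estimators=50, random_state=42),
    param_grid,
    cv=3,                              # 3-fold cross-validation
    scoring="neg_mean_squared_error",  # higher (closer to 0) is better
)
search.fit(X, y)
print(search.best_params_)
```

On the real data, each extra grid value multiplies the training time, which is the reason a 2-hour exam leaves little room for this step.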

6.2.3 Evaluating model performance

As a quick check I printed the model's score, which for a regressor is R^2 (the coefficient of determination) rather than accuracy. If overfitting shows up here, try lowering learning_rate or max_depth (the tree depth). For now the test score comes out higher, but the gap is small, so I will move on.

score_train = regressor.score(x_train, y_train)
score_val = regressor.score(x_val, y_val)

print("train score: %f" %score_train)
print("validation score: %f" %score_val)
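The score printed above is R^2 on the log scale, while the competition leaderboard is scored with RMSLE (root mean squared logarithmic error). A small helper makes it possible to estimate the leaderboard metric on the validation split. The function below is my own sketch and the arrays are made up; it expects counts on the original scale, so log-scale predictions would need np.exp applied first.

```python
import numpy as np

def rmsle(y_true, y_pred):
    """Root mean squared logarithmic error between two arrays of non-negative counts."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((np.log1p(y_pred) - np.log1p(y_true)) ** 2)))

# Made-up actual counts and two sets of predictions
actual = np.array([16, 40, 32, 13])
perfect = rmsle(actual, actual)       # identical predictions
doubled = rmsle(actual, actual * 2)   # every prediction twice too large

print(perfect, doubled)
```

Because the error is measured on log(1 + count), being off by a factor of two costs about the same whether the true count is 13 or 400.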

7. Prediction and creating submission.csv

7.1 Prediction

Now feed x_test into the regressor model built above to make predictions.

pred = regressor.predict(x_test)

7.2 Creating the submission file

The submission file provided by the competition looks like this.

sample = pd.read_csv("SampleSubmission.csv")
sample.head()

The submission must have exactly this format.
Recall that when selecting the variables for train and test, we saved test's datetime in the test_datetime variable.

So create a new DataFrame and put test_datetime and the predictions pred into it.

submission = pd.DataFrame()
submission['datetime'] = test_datetime
submission['count_log'] = pred

The most important point here is that this column is count_log. Submitting it as count unchanged would be a disaster: all the work for nothing.
Remember that we took the log of count to normalize it early on.

So as the last step, apply exp to count_log to recover the original scale.

submission['count'] = np.exp(submission['count_log'])
    
submission.drop('count_log', axis = 1, inplace = True)
submission.head()

submission.to_csv("Bike.csv", index = False)

 

8. Submitting to Kaggle

www.kaggle.com/c/bike-sharing-demand/submissions

Log in at the link above, go to Late Submission, drag and drop the file to upload it, and it is scored right away.

The file I just submitted scored 0.41826. The competition is closed, so no rank is shown. The competition is scored with RMSLE, so lower is better. With about 0.41, I estimate I would have placed around 426th. ... Haha.... Using a different model or thinking harder about the features, for example by creating derived variables, could probably improve the score considerably.

 

Closing thoughts

Because I have to take a data analysis competency test, I studied the whole flow, from inspecting the data through EDA to modeling. Across many projects, I have never done machine learning on my own without references (almighty Google, books, and so on)...
In the exam I have to explore the data and build models relying only on the reference documentation, and that pressure is huge. I worry that I will not be able to put the modeling knowledge I have to good use ㅠㅠ

I have two days left. I hope to do my best learning modeling and feature engineering methods and get the result I want in the exam!!!

There are so many people in the world who are good at AI ʘ̥_ʘ̥

Oh, and as noted throughout the post, I relied heavily on HONG_YP's blog while writing this. From the idea about the windspeed variable to normalizing count with skewness and kurtosis in mind, I still have a lot to learn about feature engineering. I resolved to keep studying other people's work diligently until the day I can extract insights from data on my own!

hong-yp-ml-records.tistory.com/77?category=823206

 

[Bike Sharing Demand] Kaggle Bicycle Demand Prediction Part 4

https://www.kaggle.com/kongnyooong/bike-sharing-demand-for-korean-beginners [Bike Sharing Demand] for Korean Beginners (Korean kernel) Explore and run machine learning code with Kaggle Notebooks | Using..