Potato
Hello, I'm Potato 🥔 ^___^ 😺


[Recommender Systems] Implementing a Collaborative Filtering Recommender System with Movie Rating Data

Potato 🥔 2021. 4. 29. 02:57

Getting started..! :)

Recommender systems, which I learned about while taking a data analysis class!
These days, with YouTube everywhere, I keep saying "the YouTube algorithm is doing a great job~", which shows how deeply recommender systems are woven into daily life. Just being able to implement one myself got me really interested, and my graduation project ended up being about recommender systems too... Anyway, what kind of data will I work with in industry someday, and how great a recommender system will I be able to build!? I'm already excited! Let's get a job soon....★

-- This project was carried out using lecture materials from the Data Analytics course at Dongguk University :)

 

1. Project direction

I will use R to implement a recommendation system based on collaborative filtering. Among collaborative filtering techniques, I will try two approaches: item-based and user-based! I should upload a post on the differences between the two, with a detailed explanation, sometime soon :) (I need to study...!)
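
Until that post exists, here is a minimal sketch of the one idea that separates the two approaches: user-based CF compares rows (users) of the rating matrix, while item-based CF compares columns (movies). The toy matrix and the `cosine_rows` helper are made up for illustration, and treating 0 as "not rated" is a simplification, not what recommenderlab does internally.

```r
# Toy rating matrix: rows are users, columns are movies (0 = not rated).
# All values are made up for illustration.
ratings <- matrix(c(5, 4, 0,
                    4, 5, 1,
                    1, 0, 5),
                  nrow = 3, byrow = TRUE,
                  dimnames = list(c("u1", "u2", "u3"), c("m1", "m2", "m3")))

# Hypothetical helper: cosine similarity between every pair of rows
cosine_rows <- function(m) {
  norms <- sqrt(rowSums(m^2))
  (m %*% t(m)) / (norms %o% norms)
}

user_sim <- cosine_rows(ratings)      # user-based: compare users (rows)
item_sim <- cosine_rows(t(ratings))   # item-based: compare movies (columns)

round(user_sim, 2)
round(item_sim, 2)
```

In this toy data u1 and u2 rate the movies alike, so their user-user similarity is much higher than that of u1 and u3.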

 

2. Project topic

Implement a recommender system using movie rating data left by users.

Try both the user-based and item-based approaches and see which model is best.

 

3. Data used

  • I used the data provided by MovieLens.
  • It contains 100,000 ratings left by 943 users on 1,682 movies. (Ratings range from 1 to 5.)
  • Each user rated at least 20 movies.
  • Simple demographic information about the users is also included. (age, gender, occupation, zip)

 

3. Loading the data

3.1 Environment setup

Install and load the libraries needed for this project.

install.packages("SnowballC") # package for building tf-idf
install.packages("class") # package for KNN analysis
install.packages("dbscan") # package for KNN analysis
install.packages("proxy") # package for computing cosine similarity and distances
install.packages("recommenderlab") # package for recommender systems
install.packages("dplyr") # package of functions for handling data frames
install.packages("tm") # package for building the tf-idf matrix
install.packages("caTools") # provides sample.split() for the train/test split

library(recommenderlab)
library(dplyr)
library(tm)
library(SnowballC)
library(class)
library(dbscan)
library(proxy)
library(caTools)

 

3.2 Loading the data

# Set the working directory and read the data
setwd("C:\\Users\\ml-100k")
data <- read.table('u.data')
colnames(data) <- c('user_id','item_id','rating','timestamp')
data = data[,-which(names(data) %in% c('timestamp'))]   # drop the timestamp column

# Inspect the loaded data
summary(data)
str(data)
hist(data$rating, main="Histogram of Rating")

# Data sparsity
RatingNumber = nrow(data)
MovieNumber = length(unique(data$item_id))
UserNumber = length(unique(data$user_id))

Output of the data summary
Histogram of the data
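
The "Data sparsity" counts can be turned into an actual density figure. The sketch below plugs in the numbers quoted in the data description (100,000 ratings, 1,682 movies, 943 users) instead of reading `u.data`; the lowercase variable names are my own.

```r
# Counts quoted in the data description (MovieLens 100k)
rating_number <- 100000
movie_number  <- 1682
user_number   <- 943

# Share of user-movie cells that actually hold a rating
density  <- rating_number / (movie_number * user_number)
sparsity <- 1 - density

round(density, 4)
round(sparsity, 4)
```

Only about 6.3% of the user-movie cells are filled, which is why the matrix conversion in the next step leaves most entries as NA.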

 

3.3 Converting the data to matrix form

Recommender systems operate on matrices, so the data has to be converted into a matrix before it can be used.

# Convert to matrix form
data.frame2matrix = function(data, rowtitle, coltitle, datatitle, 
                             rowdecreasing = FALSE, coldecreasing = FALSE,
                             default_value = NA) {
  ## Check that the requested columns exist
  if ( (!(rowtitle%in%names(data))) 
       || (!(coltitle%in%names(data))) 
       || (!(datatitle%in%names(data))) ) {
    stop('data.frame2matrix: bad row-, col-, or datatitle.')
  }
  
  ## Number of rows in the data
  ndata = dim(data)[1]
  
  ## Row and column names
  rownames = sort(unique(data[[rowtitle]]), decreasing = rowdecreasing)
  nrows = length(rownames)
  colnames = sort(unique(data[[coltitle]]), decreasing = coldecreasing)
  ncols = length(colnames)
  
  ## Initialize the matrix
  out_matrix = matrix(NA, 
                      nrow = nrows, ncol = ncols,
                      dimnames=list(rownames, colnames))
  
  ## Loop over the rows of the data
  for (i1 in seq_len(ndata)) {
    ## Get the matrix row and column index for the current data row
    iR = which(rownames==data[[rowtitle]][i1])
    iC = which(colnames==data[[coltitle]][i1])
    
    ## Raise an error if matrix entry (iR, iC) is already filled
    if (!is.na(out_matrix[iR, iC])) stop('data.frame2matrix: double entry in data.frame')
    out_matrix[iR, iC] = data[[datatitle]][i1]
  }
  
  ## Fill the missing matrix entries with the default value
  out_matrix[is.na(out_matrix)] = default_value
  
  # return matrix
  return(out_matrix)
  
}
# colnames(data) <- c('user_id','item_id','rating','timestamp')

pre_data = data.frame2matrix(data, 'user_id', 'item_id', 'rating')
target_data <- as(as.matrix(pre_data), "realRatingMatrix")
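
The row-by-row loop in `data.frame2matrix` can be slow on 100,000 ratings. As a rough alternative, base R can fill the whole matrix in one step with a two-column index matrix. This is a sketch on a made-up data frame `toy`, not a drop-in replacement: unlike the function above, it does not stop on duplicate (user, item) pairs.

```r
# Toy long-format ratings (made-up values, same column names as in the post)
toy <- data.frame(user_id = c(1, 1, 2, 3),
                  item_id = c(10, 20, 10, 30),
                  rating  = c(5, 3, 4, 2))

users <- sort(unique(toy$user_id))
items <- sort(unique(toy$item_id))

# Start with an all-NA user x item matrix
m <- matrix(NA_real_, nrow = length(users), ncol = length(items),
            dimnames = list(as.character(users), as.character(items)))

# Fill every (user, item) cell at once: each row of the index matrix
# is one (row position, column position) pair
m[cbind(match(toy$user_id, users), match(toy$item_id, items))] <- toy$rating
m
```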

3.4 Data partitions

Only movies that have at least 30 reviews will be used.

# Use only movies that have at least 30 reviews
movie_counts <- table(data$item_id)
useData <- data[data$item_id
                %in% names(movie_counts)[movie_counts >= 30], ]
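
To see what a per-movie filter like the one described above does, here is a small sketch on made-up data. The threshold `min_ratings` is a hypothetical stand-in for the 30 used in the post, lowered so the toy example has something to drop.

```r
# Toy ratings: movie 10 has three ratings, movie 20 has one (made-up values)
toy <- data.frame(user_id = c(1, 2, 3, 1),
                  item_id = c(10, 10, 10, 20),
                  rating  = c(5, 4, 3, 2))

min_ratings <- 3  # hypothetical threshold; the post uses 30

# Number of ratings per movie, then the IDs that reach the threshold
counts   <- table(toy$item_id)
keep_ids <- names(counts)[counts >= min_ratings]

# Keep only the rows whose movie is in the kept set
filtered <- toy[toy$item_id %in% keep_ids, ]
filtered
```

Counting on `item_id` filters movies; counting on `user_id` instead would filter users, so it is worth checking which column the count is built from.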

 

Then I split the data into Train and Test at a 7:3 ratio.

# Train:Test = 7:3
spl <- sample.split(useData$rating,0.7)
train <- subset(useData,spl==TRUE)
test <- subset(useData,spl==FALSE)

 

Let's look up the movies with the highest ratings and the movies with the most views.

# Top 5 by average rating
avg_top5 <- target_data %>% normalize %>% colMeans %>% sort(decreasing=TRUE) %>% head(5)
avg_top5

# Top 5 by view count
freq_top5 <- target_data %>% normalize %>% colCounts %>% sort(decreasing=TRUE) %>% head(5)
freq_top5

# Top 5 by rating & view count
avg_freq_top5 <- target_data %>% normalize %>% binarize(minRating=1) %>% colCounts() %>% sort(decreasing=TRUE) %>% head(5)
avg_freq_top5

Results

  • Top 5 movies by average rating
    • Movies 814, 1536, 1467, 1500, and 1599 have the highest ratings, in that order.
    • Their average (normalized) ratings are 1.90, 1.70, 1.49, 1.47, and 1.41, respectively!
  • Top 5 movies by view count
    • Movies 50, 258, 100, 181, and 294 have the most views, in that order.
    • Their view counts are 583, 509, 508, 507, and 485, respectively.
  • Top 5 movies by the score that considers rating and view count together
    • Movies 50, 100, 127, 174, and 313 score highest, in that order.
    • Their scores are 280, 190, 182, 168, and 159, respectively.
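
The "average ratings" above are below 2 because the pipeline normalizes before averaging. Here is a base R sketch of what I understand the three pipelines to compute, assuming `normalize` centers each user's ratings on that user's mean (its default, as far as I know) and that `binarize(minRating = 1)` keeps ratings at or above 1. The toy matrix is made up.

```r
# Toy user x movie matrix, NA = not rated (made-up values)
r <- matrix(c(5, 3, NA,
              4, NA, 2,
              NA, 1, 5),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("u1", "u2", "u3"), c("m1", "m2", "m3")))

# Center each user's ratings on that user's own mean
centered <- r - rowMeans(r, na.rm = TRUE)

# Average centered rating per movie (analogue of the "average rating" ranking)
avg_score <- colMeans(centered, na.rm = TRUE)

# Number of ratings per movie (analogue of the "view count" ranking)
n_ratings <- colSums(!is.na(r))

# Number of centered ratings at or above 1 per movie
# (analogue of binarize(minRating = 1) followed by colCounts)
n_high <- colSums(centered >= 1, na.rm = TRUE)

avg_score
n_ratings
n_high
```

Read this way, the third score counts how many users rated a movie at least one point above their own average, which is why it blends rating and view count.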

4. User-based Collaborative Filtering

4.1 Splitting the data into train/test

I split the data into a 75% train set and a 25% test set.

# Split into Train / Test data
set.seed(100)
index <- sample(1:nrow(target_data), size=nrow(target_data)*0.75)

# Separate the data (into train/test data)
train <- target_data[index, ]
test <- target_data[-index, ]

dim(train) 

 

4.1 User-based Collaborative Filtering modeling

# User-based Collaborative Filtering modeling
recommender_models <- recommenderRegistry$get_entries(dataType="realRatingMatrix")
model_UBCF <- Recommender(data=train, method="UBCF")
model_UBCF
model_UBCF@model$data
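
To get a feel for what a user-based model does, here is a hand-rolled sketch in base R. It is a simplification under my own assumptions (cosine similarity over co-rated movies, all other users as neighbours, no normalization) and not how recommenderlab's UBCF is implemented. The matrix and the `cosine_sim` helper are made up.

```r
# Toy user x movie matrix, NA = not rated (made-up values)
r <- matrix(c(5, 4, NA,
              5, 5, 4,
              1, 2, 5),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("u1", "u2", "u3"), c("m1", "m2", "m3")))

# Hypothetical helper: cosine similarity over the movies both users rated
cosine_sim <- function(a, b) {
  both <- !is.na(a) & !is.na(b)
  sum(a[both] * b[both]) / (sqrt(sum(a[both]^2)) * sqrt(sum(b[both]^2)))
}

target <- "u1"
others <- setdiff(rownames(r), target)

# Similarity of the target user to every other user
sims <- sapply(others, function(u) cosine_sim(r[target, ], r[u, ]))

# Predict u1's rating for m3 as the similarity-weighted mean
# of the neighbours' ratings for m3
pred_m3 <- sum(sims * r[others, "m3"]) / sum(sims)

round(sims, 3)
round(pred_m3, 2)
```

The prediction lands between the two neighbours' ratings for m3, pulled toward the rating of the more similar user.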

 

4.2 Running the prediction

# Predict on the test data (top 10 movies most likely to be recommended)
pred_UBCF <- predict(model_UBCF, newdata=test, n=10)
pred_UBCF

# Map the item indices recommended to each test user to movie IDs
# (lapply keeps one list element per user even when every list has 10 items)
pred_list_UBCF <- lapply(pred_UBCF@items, function(x) { colnames(pre_data)[x] })
pred_list_UBCF[[5]]   ## movies recommended to the 5th user in the test set

# How many movies were recommended to each user in the test set?
table(unlist(lapply(pred_list_UBCF, length)))

 

4.3 Model evaluation

# Model evaluation ( train = 0.8, k = 10 )
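# Keep only users with at least 30 ratings (threshold assumed from the 30-review filter in 3.4)
data_modify <- target_data[rowCounts(target_data) >= 30, ]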

eval_sets_UBCF <- evaluationScheme(data = data_modify,
                                   method = "cross-validation",
                                   train = 0.8, k = 10, goodRating = 3, given = 3)
sapply(eval_sets_UBCF@runsTrain, length)
getData(eval_sets_UBCF, "train")


# Training
recomm_eval <- Recommender(data = getData(eval_sets_UBCF, "train"),
                           method = "UBCF", parameter = NULL)
recomm_eval

# prediction
pred_eval <- predict(recomm_eval, 
                     newdata = getData(eval_sets_UBCF, "known"),
                     n = 10, type = "ratings")
pred_eval

Output of the code above

 

# Calculate accuracy (whether the recommendations were appropriate for each user)
accuracy_eval_UBCF <- calcPredictionAccuracy(x = pred_eval,
                                             data = getData(eval_sets_UBCF,
                                                            "unknown"),
                                             byUser = TRUE)
head( accuracy_eval_UBCF, 10 )

# Accuracy of User-based Collaborative Filtering
colMeans(accuracy_eval_UBCF, na.rm = TRUE)

# Accuracy based on precision/recall
accuracy_eval2_UBCF <- evaluate(x=eval_sets_UBCF, method="UBCF" )
head( getConfusionMatrix(accuracy_eval2_UBCF), 10)

# ROC curve
plot(accuracy_eval2_UBCF, annotate=TRUE, main="ROC Curve")
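
The confusion matrix and ROC curve above are built from counts of hits and misses in each top-N list. The sketch below works through those counts for a single hypothetical user; the item names and the "good" set are made up, with "good" meaning a held-out rating at or above `goodRating`.

```r
# Hypothetical example for a single user: 20 candidate movies,
# of which 4 are "good" (held-out rating >= goodRating)
all_items   <- paste0("m", 1:20)
good_items  <- c("m2", "m5", "m7", "m11")
recommended <- c("m2", "m3", "m5", "m9", "m14")   # a top-5 list

TP <- length(intersect(recommended, good_items))  # good and recommended
FP <- length(setdiff(recommended, good_items))    # recommended but not good
FN <- length(setdiff(good_items, recommended))    # good but missed
TN <- length(all_items) - TP - FP - FN            # neither good nor recommended

precision <- TP / (TP + FP)
recall    <- TP / (TP + FN)   # also the true positive rate (ROC y-axis)
fpr       <- FP / (FP + TN)   # false positive rate (ROC x-axis)

c(precision = precision, recall = recall, fpr = fpr)
```

Repeating this for several list lengths n gives one point per n, and connecting those points gives the ROC and precision-recall curves.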

 

5. Item-based Collaborative Filtering

5.1 Item-based Collaborative Filtering modeling

In list(k = ...), k is the number of neighbors considered when computing item similarity values. I set k=30!

# Item-based Collaborative Filtering
model_IBCF <- Recommender(data = train,
                          method = "IBCF",
                          parameter = list(k = 30))
model_IBCF
str( getModel(model_IBCF) )
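
Here is a rough base R sketch of what "k neighbors" means for an item-based model: compute item-item similarities, then keep only the k most similar items for each item. The toy matrix is made up, 0 stands in for "not rated", and k = 1 is chosen only to keep the output small; recommenderlab's own IBCF may differ in the details.

```r
# Toy user x movie matrix (0 = not rated; made-up values)
r <- matrix(c(5, 4, 1, 0,
              4, 5, 0, 1,
              1, 0, 5, 4,
              0, 1, 4, 5),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("u", 1:4), paste0("m", 1:4)))

# Item-item cosine similarity (columns are items)
norms <- sqrt(colSums(r^2))
item_sim <- crossprod(r) / (norms %o% norms)
diag(item_sim) <- 0   # an item is not its own neighbour

# Keep only the k most similar items in each row, zero out the rest
k <- 1
top_k <- t(apply(item_sim, 1, function(s) {
  s[rank(-s, ties.method = "first") > k] <- 0
  s
}))

round(top_k, 2)
```

With k = 1, m1 keeps only m2 and m3 keeps only m4, so predictions for a movie would draw on just its single closest neighbour.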

 

5.2 Prediction

# Predict on the test data
pred_IBCF <- predict(model_IBCF, newdata=test, n=10)
pred_list_IBCF <- lapply(pred_IBCF@items, function(x) { colnames(target_data)[x] })
pred_list_IBCF[[1]]   ## movies recommended to the 1st user in the test set
table(unlist(lapply(pred_list_IBCF, length)))

5.3 Model evaluation

Using only the data with at least 30 reviews, I split Train and Test 7:3 and predicted the top 10 results.
The value of k was set to 5.

# Model evaluation
# Keep only users with at least 30 ratings (threshold assumed from the text above)
data_modify <- target_data[rowCounts(target_data) >= 30, ]

eval_sets_IBCF <- evaluationScheme(data = data_modify,
                                   method = "cross-validation",
                                   train = 0.7,
                                   k = 5,
                                   goodRating = 3,
                                   given = 15)

# Extract the data sets
sapply(eval_sets_IBCF@runsTrain, length)

getData(eval_sets_IBCF, "train")

# Training
recomm_eval_IBCF <- Recommender(data = getData(eval_sets_IBCF, "train"),
                                method = "IBCF", 
                                parameter = NULL)
recomm_eval_IBCF

# Prediction
pred_eval_IBCF <- predict(recomm_eval_IBCF, 
                          newdata = getData(eval_sets_IBCF, "known"),
                          n = 10, type = "ratings")
pred_eval_IBCF

# Calculate accuracy
accuracy_eval_IBCF <- calcPredictionAccuracy(x = pred_eval_IBCF,
                                             data = getData(eval_sets_IBCF, 
                                                            "unknown"),
                                             byUser = TRUE)

head(accuracy_eval_IBCF,10)

# Mean accuracy
meanitem<- colMeans(accuracy_eval_IBCF,na.rm = TRUE)

# Accuracy of Item-based Collaborative Filtering
meanitem

# Accuracy based on precision/recall
# (stored under a separate name so the per-user accuracy above is not overwritten)
accuracy_eval2_IBCF <- evaluate(x = eval_sets_IBCF, 
                                method = "IBCF", 
                                n = seq(10, 100, by = 10))
head( getConfusionMatrix(accuracy_eval2_IBCF) )

# ROC curve
plot(accuracy_eval2_IBCF, annotate = TRUE, main = "ROC Curve")

5.4 Parameter tuning

  • I tuned the parameters to find the optimal number of neighbors (K) for computing the similarity between items.
  • I also chose the similarity measure, between cosine and Pearson correlation.

# Parameter tuning
# 1. Optimal number of neighbors for computing the similarity between items
# 2. Choice of similarity measure: cosine or Pearson correlation
vector_k <- c(5, 10, 20, 30, 40)
mod1 <- lapply(vector_k, function(k) { list(name = "IBCF", 
                                            parameter = list(method = "cosine", 
                                                             k = k)) })
names(mod1) <- paste0("IBCF_cos_k_", vector_k)
names(mod1)

mod2 <- lapply(vector_k, function(k) { list(name = "IBCF", 
                                            parameter = list(method = "pearson", 
                                                             k = k)) })
names(mod2) <- paste0("IBCF_pea_k_", vector_k)
names(mod2)

mod <- append(mod1, mod2)

list_results <- evaluate(x = eval_sets_IBCF, 
                         method = mod,
                         n = c(1, 5, seq(10, 100, by = 10)))

# ROC curves by parameter
plot(list_results, annotate = c(1, 2), legend = "topleft")
title("ROC Curve")

# Precision/recall by parameter
plot(list_results, "prec/rec", annotate = 1, legend = "bottomright")
title("Precision-Recall")

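
The two `lapply` calls above build one model list per similarity measure. As an alternative sketch, the same ten configurations can come from a single grid, which is easier to extend if a third measure is ever added. The names `methods`, `grid`, and `models` are my own; the `list(name, parameter)` shape and the naming scheme are copied from the code above.

```r
# Same candidate values as in the post
vector_k <- c(5, 10, 20, 30, 40)
methods  <- c("cosine", "pearson")

# One row per (similarity measure, k) combination; k varies fastest
grid <- expand.grid(k = vector_k, method = methods, stringsAsFactors = FALSE)

# Turn each row into the list(name, parameter) shape used above
models <- Map(function(k, method) {
  list(name = "IBCF", parameter = list(method = method, k = k))
}, grid$k, grid$method)
names(models) <- paste0("IBCF_", substr(grid$method, 1, 3), "_k_", grid$k)

length(models)
names(models)
```

The resulting `models` list should be usable in place of `mod` in the `evaluate` call, since it has the same names and structure.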

 

6. Final model selection

RMSE is the square root of the mean of the squared errors between the predicted and observed values. I'll cover RMSE in detail in another post; what matters here is building the model so that RMSE is as small as possible.

So, of the two models above (User-based, Item-based), the Item-based model, which has the lower RMSE, can be chosen as the final model.

The Item-based RMSE is smaller.
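
For reference, the RMSE definition above fits in a few lines of base R. The predicted and observed ratings below are made up; MSE and MAE are included because `calcPredictionAccuracy` reports them next to RMSE.

```r
# Made-up predicted and observed ratings for illustration
predicted <- c(4, 3, 5)
observed  <- c(5, 3, 3)

err  <- predicted - observed
mse  <- mean(err^2)     # mean squared error
rmse <- sqrt(mse)       # root mean squared error
mae  <- mean(abs(err))  # mean absolute error

round(c(RMSE = rmse, MSE = mse, MAE = mae), 3)
```

Because the errors are squared before averaging, the single miss of 2 points weighs more in RMSE than it does in MAE.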

 

๋งˆ์น˜๋ฉฐ,,!

์ˆ˜์—…์—์„œ ๋ฐฐ์šด ๋‚ด์šฉ์„ ํ™œ์šฉํ•˜์—ฌ ๋‘ ๊ฐ€์ง€์˜ ๋ชจ๋ธ์„ ๊ตฌํ˜„ํ•ด๋ณด๊ณ , RMSE๋ฅผ ํ™œ์šฉํ•˜์—ฌ best ๋ชจ๋ธ์„ ์„ ์ •ํ•ด ๋ณด์•˜๋‹ค. ๋‹ค์Œ์—๋Š” ๋ฆฌ๋ทฐ ํ…์ŠคํŠธ ๋ฐ์ดํ„ฐ๋ฅผ ํ™œ์šฉํ•ด์„œ ์ถ”์ฒœ์‹œ์Šคํ…œ์„ ๊ตฌํ˜„ํ•ด๋ณด๊ณ ์‹ถ๋‹ค๊ณ  ๋Š๊ผˆ๋‹ค.
