
[NLP] Sequence Modeling for Natural Language Processing (Sequences and RNNs)

감자 πŸ₯” 2021. 8. 3. 11:12
λ°˜μ‘ν˜•

-- λ³Έ ν¬μŠ€νŒ…μ€ νŒŒμ΄ν† μΉ˜λ‘œ λ°°μš°λŠ” μžμ—°μ–΄ 처리 (ν•œλΉ›λ―Έλ””μ–΄) 책을 μ°Έκ³ ν•΄μ„œ μž‘μ„±λœ κΈ€μž…λ‹ˆλ‹€.
-- μ†ŒμŠ€μ½”λ“œλŠ” μ—¬κΈ°

 

1. Sequences in NLP

1.1 What Is a Sequence?

  • A collection of items in a specific order
  • Sequential data

β–Ά Example

The book is on the table.
The books are on the table.

μœ„ λ‘κ°œμ˜ λ¬Έμž₯은 μ˜μ–΄μ—μ„œ λ‹€μˆ˜μΈμ§€ λ³΅μˆ˜μΈμ§€μ— 따라 동사가 달라진닀. 이런 λ¬Έμž₯은 μ•„λž˜μ™€ 같이 λ¬Έμž₯이 κΈΈμ–΄μ§ˆ 수둝 μ˜μ‘΄μ„±μ΄ 더 λ†’μ•„μ§ˆ 수 μžˆλ‹€.

The book that I got yesterday is on the table.
The books read by the second-grade children are shelved in the lower rack.

 

λ”₯λŸ¬λ‹μ—μ„œμ˜ μ‹œν€€μŠ€ λͺ¨λΈλ§μ€ μˆ¨κ²¨μ§„ 'μƒνƒœ 정보(은닉 μƒνƒœ)'λ₯Ό μœ μ§€ν•˜λŠ” 것과 관련이 μžˆλ‹€. μ‹œν€€μŠ€μ— μžˆλŠ” 각 ν•­λͺ©μ΄ 은닉 μƒνƒœλ₯Ό μ—…λ°μ΄νŠΈ ν•˜κ³ , μ‹œν€€μŠ€ ν‘œν˜„μ΄λΌκ³  λΆˆλ¦¬λŠ” 이 μ€λ‹‰μƒνƒœμ˜ 벑터λ₯Ό μ‹œν€€μŠ€ λͺ¨λΈλ§ μž‘μ—…μ— ν™œμš©ν•˜λŠ” 과정을 κ±°μΉœλ‹€. κ°€μž₯ λŒ€ν‘œμ μΈ μ‹œν€€μŠ€ 신경망 λͺ¨λΈμ€ 'RNN(Recurrent nerual network)'이닀. 그럼 NLPμ—μ„œμ˜ μ‹œν€€μŠ€ λͺ¨λ°λ£¨ RNN에 λŒ€ν•΄μ„œ μ•Œμ•„λ³΄λ„λ‘ ν•˜μž.

 

2. The Recurrent Neural Network (RNN)

  • RNN의 λͺ©μ μ€ μ‹œν€€μŠ€ ν…μ„œλ₯Ό λͺ¨λΈλ§ ν•˜λŠ” 것
  • μž…λ ₯κ³Ό 좜λ ₯을 μ‹œν€€μŠ€ λ‹¨μœ„λ‘œ μ²˜λ¦¬ν•¨
  • RNN의 μ’…λ₯˜λŠ” μ—¬λŸ¬κ°€μ§€κ°€ μžˆμ§€λ§Œ, ν•΄λ‹Ή ν¬μŠ€νŒ…μ—μ„œλŠ” μ—˜λ§ŒRNN에 λŒ€ν•΄ λ‹€λ£° κ²ƒμž„
    • λ‘κ°œμ˜ RNN 을 ν™œμš©ν•œ sequence2sequence λ‹€μ–‘ν•œ RNNλͺ¨λΈμ΄ NLPμ˜μ—­μ—μ„œ ν™œμš©λ˜κ³  μžˆλ‹€.
  • 같은 νŒŒλΌλ―Έν„°λ₯Ό ν™œμš©ν•΄μ„œ νƒ€μž„ μŠ€ν…λ§ˆλ‹€ 좜λ ₯을 κ³„μ‚°ν•˜κ³ , μ΄λ•Œ μ€λ‹‰ μƒνƒœμ˜ 벑터에 μ˜μ‘΄ν•΄μ„œ μ‹œν€€μŠ€μ˜ μƒνƒœλ₯Ό κ°μ§€ν•œλ‹€.
  • RNN의 μ£Ό λͺ©μ μ€ 주어진 은닉 μƒνƒœ 벑터와 μž…λ ₯ 벑터에 λŒ€ν•œ 좜λ ₯을 κ³„μ‚°ν•¨μœΌλ‘œμ¨ μ‹œν€€μŠ€μ˜ λΆˆλ³€μ„±μ„ ν•™μŠ΅ν•˜λŠ” 것이닀.

 

2.1 λ™μž‘ 방식

  • ν”Όλ“œ ν¬μ›Œλ“œ μ‹ κ²½λ§κ³ΌλŠ” λ‹€λ₯΄κ²Œ, 은닉측 λ…Έλ“œμ—μ„œ ν™œμ„±ν™” ν•¨μˆ˜λ₯Ό 톡해 λ‚˜μ˜¨ 결과값을 좜λ ₯μΈ΅ λ°©ν–₯μœΌλ‘œλ„ λ΄¬λ©΄μ„œ λ™μ‹œμ— λ‹€μ‹œ 은닉측 λ…Έλ“œμ˜ λ‹€μŒ 계산 μž…λ ₯으둜 λ³΄λ‚΄λŠ” νŠΉμ§•μ„ 가짐
  • 즉, ν˜„μž¬ μž…λ ₯ 벑터와 이전 은닉 μƒνƒœ λ²‘ν„°λ‘œ ν˜„μž¬μ˜ 은닉 μƒνƒœ 벑터λ₯Ό 계산함
  • μ—˜λ§ŒRNN μ—μ„œλŠ” 은닉벑터가 예츑 λŒ€μƒμ΄λ‹€.

νŒŒμ΄ν† μΉ˜λ‘œ λ°°μš°λŠ” μžμ—°μ–΄μ²˜λ¦¬ p183

  • ꡬ체적인 계산 방식은 μ•„λž˜μ™€ κ°™λ‹€

νŒŒμ΄ν† μΉ˜λ‘œ λ°°μš°λŠ” μžμ—°μ–΄ 처리 p184

  • 은닉-은닉 κ°€μ€‘μΉ˜ 행렬을 μ‚¬μš©ν•΄ 이전 은닉 μƒνƒœ 벑터λ₯Ό 맀핑
  • μž…λ ₯-은닉 κ°€μ€‘μΉ˜ 행렬을 μ‚¬μš©ν•΄ ν˜„μž¬ μž…λ ₯ 벑터λ₯Ό 맀핑
  • 두 개의 값을 λ”ν•˜μ—¬ μƒˆλ‘œμš΄ 은닉 벑터λ₯Ό μƒμ„±ν•˜λŠ” 과정을 κ±°μΉœλ‹€.

 

3. Implementing an RNN

β–Ά The column_gather function

  • 배치 ν–‰ 인덱슀λ₯Ό μˆœνšŒν•˜λ©΄μ„œ x_lengths에 μžˆλŠ” 값에 ν•΄λ‹Ήν•˜λŠ” 인덱슀 μœ„μΉ˜ (즉, μ‹œν€€μŠ€μ˜ λ§ˆμ§€λ§‰ μΈλ±μŠ€μ— μžˆλŠ”)의 벑터λ₯Ό λ°˜ν™˜ν•˜λŠ” ν•¨μˆ˜
import torch

def column_gather(y_out, x_lengths):
    ''' Extracts the last vector from each data point in y_out.

    More concretely, it iterates over the batch row indices and returns,
    for each row, the vector at the index position given by the value
    in x_lengths (i.e., at the last index of that sequence).

    Args:
        y_out (torch.FloatTensor, torch.cuda.FloatTensor)
            shape: (batch, sequence, feature)
        x_lengths (torch.LongTensor, torch.cuda.LongTensor)
            shape: (batch,)

    Returns:
        y_out (torch.FloatTensor, torch.cuda.FloatTensor)
            shape: (batch, feature)
    '''
    # convert lengths into zero-based indices of the last valid time step
    x_lengths = x_lengths.long().detach().cpu().numpy() - 1

    out = []
    for batch_index, column_index in enumerate(x_lengths):
        out.append(y_out[batch_index, column_index])

    return torch.stack(out)
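
As a quick illustration of what column_gather returns (toy numbers of my own, not from the book), take two padded sequences with true lengths 3 and 1:

# toy batch: 2 sequences padded to 4 steps, 2 features per step
y_out = torch.arange(16, dtype=torch.float).view(2, 4, 2)
x_lengths = torch.tensor([3, 1])         # true lengths before padding

last = column_gather(y_out, x_lengths)
print(last.shape)                        # torch.Size([2, 2])
print(last)                              # rows y_out[0, 2] and y_out[1, 0]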

 

 

β–Ά The ElmanRNN class

  • RNNCell을 μ‚¬μš©ν•˜μ—¬ λ§Œλ“  RNN λͺ¨λΈ
  • μ΅œμ’…κ°’μœΌλ‘œ 각 νƒ€μž„stepμ—μ„œμ˜ 은닉벑터λ₯Ό λ°˜ν™˜ν•œλ‹€
import torch.nn as nn

class ElmanRNN(nn.Module):
    """ An Elman RNN built with RNNCell """
    def __init__(self, input_size, hidden_size, batch_first=False):
        """
        Args:
            input_size (int): size of the input vectors
            hidden_size (int): size of the hidden-state vectors
            batch_first (bool): whether the 0th dimension is the batch
        """
        super(ElmanRNN, self).__init__()
        
        # RNNCell creates the input-to-hidden and hidden-to-hidden weight matrices;
        # each call to the cell takes an input vector and a hidden-state matrix
        self.rnn_cell = nn.RNNCell(input_size, hidden_size)
        
        self.batch_first = batch_first
        self.hidden_size = hidden_size

    def _initial_hidden(self, batch_size):
        return torch.zeros((batch_size, self.hidden_size))

    def forward(self, x_in, initial_hidden=None):
        """ The forward pass of the ElmanRNN: iterates over the input tensor,
        computing the hidden-state vector at each time step.
        
        Args:
            x_in (torch.Tensor): an input data tensor
                If self.batch_first: x_in.shape = (batch_size, seq_size, feat_size)
                Else: x_in.shape = (seq_size, batch_size, feat_size)
            initial_hidden (torch.Tensor): the initial hidden state of the RNN
        Returns:
            hiddens (torch.Tensor): the RNN output at each time step
                If self.batch_first: 
                   hiddens.shape = (batch_size, seq_size, hidden_size)
                Else: hiddens.shape = (seq_size, batch_size, hidden_size)
        """
        # if batch_first, move the sequence dimension to the front
        # so that we can iterate over time steps below
        if self.batch_first:
            batch_size, seq_size, feat_size = x_in.size()
            x_in = x_in.permute(1, 0, 2)
        else:
            seq_size, batch_size, feat_size = x_in.size()
    
        hiddens = []

        if initial_hidden is None:
            initial_hidden = self._initial_hidden(batch_size)
            initial_hidden = initial_hidden.to(x_in.device)

        hidden_t = initial_hidden
                    
        for t in range(seq_size):
            hidden_t = self.rnn_cell(x_in[t], hidden_t)
            hiddens.append(hidden_t)
            
        hiddens = torch.stack(hiddens)

        # move the batch dimension back to the front for the caller
        if self.batch_first:
            hiddens = hiddens.permute(1, 0, 2)

        return hiddens
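
A quick shape check on the class (sizes are made up for illustration):

rnn = ElmanRNN(input_size=8, hidden_size=16, batch_first=True)
x = torch.randn(4, 10, 8)    # (batch, seq, feature)
hiddens = rnn(x)
print(hiddens.shape)         # torch.Size([4, 10, 16]), one hidden vector per step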

 

β–Άν›ˆλ ¨μ—μ„œ μ‚¬μš©ν•  SurnameClassifier ν•¨μˆ˜

  • RNNκ³Ό Linear측으둜 λ‚˜λ‰¨
  • RNNμΈ΅μ—μ„œ μ‹œν€€μŠ€μ˜ 벑터 ν‘œν˜„ (νžˆλ“ λ²‘ν„°)λ₯Ό 계산해주고 μ„±μ”¨μ˜ λ§ˆμ§€λ§‰ λ¬Έμžμ— ν•΄λ‹Ήν•˜λŠ” 벑터λ₯Ό μΆ”μΆœν•΄μ£ΌλŠ” 역할을 μˆ˜ν–‰ (μ„±μ”¨μ˜ λ§ˆμ§€λ§‰ λ¬Έμžμ— ν•΄λ‹Ήν•˜λŠ” λ²‘ν„°λž€, μ΅œμ’… 벑터λ₯Ό λ§ν•œλ‹€.)
  • μ΅œμ’…λ²‘ν„°κ°€ μ „μ²΄μ‹œν€€μŠ€ μž…λ ₯을 거쳐 μ „λ‹¬λœ 결과물인 μš”μ•½λ²‘ν„°λΌκ³  ν•  μˆ˜μžˆλ‹€.
  • LinearμΈ΅μ—μ„œ μš”μ•½λ²‘ν„°λ₯Ό ν™œμš©ν•˜μ—¬ μ˜ˆμΈ‘λ²‘ν„°λ₯Ό κ³„μ‚°ν•œλ‹€.
import torch.nn.functional as F

# split into an RNN layer and Linear layers
class SurnameClassifier(nn.Module):
    """ A classification model that extracts features with an RNN and classifies with an MLP """
    def __init__(self, embedding_size, num_embeddings, num_classes,
                 rnn_hidden_size, batch_first=True, padding_idx=0):
        """
        Args:
            embedding_size (int): size of the character embeddings
            num_embeddings (int): number of characters to embed
            num_classes (int): size of the prediction vector
                Note: the number of nationalities
            rnn_hidden_size (int): size of the RNN's hidden state
            batch_first (bool): flag indicating whether the 0th dimension of
                the input tensor is the batch or the sequence
            padding_idx (int): index used for tensor padding;
                see torch.nn.Embedding
        """
        super(SurnameClassifier, self).__init__()

        # first, an embedding layer maps the integer indices to embeddings
        self.emb = nn.Embedding(num_embeddings=num_embeddings,
                                embedding_dim=embedding_size,
                                padding_idx=padding_idx)
        # next, the RNN layer computes the vector representation of the sequence;
        # these vectors are the hidden states for each character in the surname.
        # the vector for the surname's last character is extracted (the final vector);
        # this final vector is the result of passing the entire sequence through
        # the input (a vector that summarizes the surname)
        self.rnn = ElmanRNN(input_size=embedding_size,
                            hidden_size=rnn_hidden_size,
                            batch_first=batch_first)
        # the summary vector is passed to the Linear layers to compute the
        # prediction vector, which is then fed to a softmax or to the training
        # loss to form a probability distribution over nationalities
        self.fc1 = nn.Linear(in_features=rnn_hidden_size,
                             out_features=rnn_hidden_size)
        self.fc2 = nn.Linear(in_features=rnn_hidden_size,
                             out_features=num_classes)
    
    # the sequence lengths x_lengths are needed: they let column_gather()
    # extract and return the last vector of each sequence in the tensor
    def forward(self, x_in, x_lengths=None, apply_softmax=False):
        """ The forward pass of the classifier
        
        Args:
            x_in (torch.Tensor): an input data tensor
                x_in.shape should be (batch, input_dim)
            x_lengths (torch.Tensor): the lengths of each sequence in the batch,
                used to find the last vector of each sequence
            apply_softmax (bool): flag for the softmax activation;
                should be False when used with the cross-entropy loss
        Returns:
            the resulting tensor. tensor.shape should be (batch, output_dim)
        """
        x_embedded = self.emb(x_in)
        y_out = self.rnn(x_embedded)

        if x_lengths is not None:
            # column_gather: iterate over the batch row indices and return the
            # vector at the index position given by the value in x_lengths
            # (i.e., at the last index of the sequence)
            y_out = column_gather(y_out, x_lengths)
        else:
            y_out = y_out[:, -1, :]

        # training=self.training keeps dropout active only during training
        y_out = F.relu(self.fc1(F.dropout(y_out, 0.5, training=self.training)))
        y_out = self.fc2(F.dropout(y_out, 0.5, training=self.training))

        if apply_softmax:
            y_out = F.softmax(y_out, dim=1)

        return y_out
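
A minimal smoke test of the classifier (all hyperparameter values here are made up for illustration; the book's training script sets its own):

classifier = SurnameClassifier(embedding_size=32, num_embeddings=80,
                               num_classes=18, rnn_hidden_size=64)

x_in = torch.randint(1, 80, (4, 12))      # (batch, max_seq_length) of character indices
x_lengths = torch.tensor([12, 7, 9, 4])   # true lengths before padding

probs = classifier(x_in, x_lengths, apply_softmax=True)
print(probs.shape)                        # torch.Size([4, 18]), one distribution per surname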

 

 

 

λ°˜μ‘ν˜•