-- This post was written with reference to the book Natural Language Processing with PyTorch (Hanbit Media).
-- The source code is here
1. Sequences in NLP
1.1 What is a sequence?
- A collection of items that have an order
- Sequential data
▶ Example
The book is on the table.
The books are on the table.
In the two sentences above, the English verb changes depending on whether the subject is singular or plural. As the sentences below show, such dependencies can stretch over longer distances as the sentence grows longer.
The book that I got yesterday is on the table.
The books read by the second-grade children are shelved in the lower rack.
Sequence modeling in deep learning revolves around maintaining hidden state information (the hidden state). Each item in the sequence updates the hidden state, and the resulting hidden-state vector, called the sequence representation, is then used for the sequence modeling task. The most representative sequence neural network model is the RNN (recurrent neural network). So let's look at the RNN, the sequence model for NLP.
2. Recurrent neural network (RNN)
- The goal of an RNN is to model a sequence tensor
- Both input and output are processed sequence by sequence
- There are several kinds of RNN, but this post covers the Elman RNN
- Many RNN variants are used in NLP tasks, such as sequence-to-sequence models built from two RNNs.
- The same parameters are used to compute the output at every time step, and this computation depends on the hidden-state vector, which tracks the state of the sequence.
- The main goal of an RNN is to learn the invariances of a sequence by computing an output from a given hidden-state vector and input vector.
2.1 How it works
- Unlike a feed-forward network, the result of the activation function at a hidden-layer node is sent toward the output layer and, at the same time, fed back in as input to the hidden-layer node's next computation
- In other words, the current hidden-state vector is computed from the current input vector and the previous hidden-state vector
- In the Elman RNN, the hidden vector is the prediction target.
- Concretely, the computation works as follows:
  - Map the previous hidden-state vector with the hidden-to-hidden weight matrix
  - Map the current input vector with the input-to-hidden weight matrix
  - Add the two results to produce the new hidden vector.
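The two mappings and the addition above can be sketched in a few lines of PyTorch (a minimal illustration; the names W_hh and W_ih, the tanh nonlinearity, and all sizes are assumptions for this sketch, not the book's code):

```python
import torch

# sizes chosen only for illustration
feat_size, hidden_size = 4, 3

# hidden-to-hidden and input-to-hidden weight matrices (randomly initialized here)
W_hh = torch.randn(hidden_size, hidden_size)
W_ih = torch.randn(hidden_size, feat_size)
b = torch.zeros(hidden_size)

def rnn_step(x_t, h_prev):
    """One Elman RNN step: map the previous hidden state, map the
    current input, add the two, and squash with tanh."""
    return torch.tanh(h_prev @ W_hh.T + x_t @ W_ih.T + b)

h = torch.zeros(hidden_size)     # initial hidden state
x = torch.randn(5, feat_size)    # a sequence of 5 input vectors
for t in range(5):
    h = rnn_step(x[t], h)        # the hidden state carries the sequence context
```

Note that the same W_hh and W_ih are reused at every time step; that parameter sharing is what the bullets above describe.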
3. Implementing an RNN
- We will build a 'surname classification' example using an RNN
- The Dataset, Vocabulary, and Vectorizer classes that prepare the data are the same as in the previous post (click)
- The difference is that the input data is used in sequence form
▶ The column_gather function
- Iterates over the batch row indices and returns, for each row, the vector at the index position given by x_lengths (i.e., the hidden state at the sequence's last index)
import torch

def column_gather(y_out, x_lengths):
    '''Extracts the last vector from each data point in y_out.

    More concretely, iterates over the batch row indices and returns
    the vector at the index position given by x_lengths.

    Args:
        y_out (torch.FloatTensor, torch.cuda.FloatTensor)
            shape: (batch, sequence, feature)
        x_lengths (torch.LongTensor, torch.cuda.LongTensor)
            shape: (batch,)
    Returns:
        y_out (torch.FloatTensor, torch.cuda.FloatTensor)
            shape: (batch, feature)
    '''
    x_lengths = x_lengths.long().detach().cpu().numpy() - 1
    out = []
    for batch_index, column_index in enumerate(x_lengths):
        out.append(y_out[batch_index, column_index])
    return torch.stack(out)
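To see what this does on a toy batch (shapes and values made up for illustration; the indexing below reproduces the function's core logic inline):

```python
import torch

# toy batch: 2 sequences, padded to length 4, feature size 3
y_out = torch.arange(24, dtype=torch.float32).view(2, 4, 3)
x_lengths = torch.tensor([2, 4])  # true (unpadded) lengths

# same logic as column_gather: pick the vector at the last real
# time step of each sequence (length - 1, since indices are 0-based)
last = torch.stack([y_out[i, l - 1] for i, l in enumerate(x_lengths)])

print(last.shape)  # torch.Size([2, 3])
```

The first row of `last` is `y_out[0, 1]` (length 2), the second is `y_out[1, 3]` (length 4), so padded positions never leak into the result.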
▶ The ElmanRNN class
- An RNN model built with RNNCell
- Returns the hidden vector at each time step as its output
import torch
import torch.nn as nn

class ElmanRNN(nn.Module):
    """ An Elman RNN built with RNNCell """
    def __init__(self, input_size, hidden_size, batch_first=False):
        """
        Args:
            input_size (int): size of the input vectors
            hidden_size (int): size of the hidden-state vectors
            batch_first (bool): whether dimension 0 is the batch dimension
        """
        super(ElmanRNN, self).__init__()
        # RNNCell creates the input-to-hidden and hidden-to-hidden weight matrices;
        # each call to RNNCell takes an input vector and a hidden-state matrix
        self.rnn_cell = nn.RNNCell(input_size, hidden_size)
        self.batch_first = batch_first
        self.hidden_size = hidden_size

    def _initial_hidden(self, batch_size):
        return torch.zeros((batch_size, self.hidden_size))

    def forward(self, x_in, initial_hidden=None):
        """ The forward pass of the ElmanRNN: iterates over the input
        tensor and computes the hidden-state vector at each time step.

        Args:
            x_in (torch.Tensor): an input data tensor
                If self.batch_first: x_in.shape = (batch_size, seq_size, feat_size)
                Else: x_in.shape = (seq_size, batch_size, feat_size)
            initial_hidden (torch.Tensor): the initial hidden state of the RNN
        Returns:
            hiddens (torch.Tensor): the RNN outputs at each time step
                If self.batch_first:
                    hiddens.shape = (batch_size, seq_size, hidden_size)
                Else: hiddens.shape = (seq_size, batch_size, hidden_size)
        """
        # if batch_first, move the sequence dimension to position 0
        # so the loop below can iterate over time steps
        if self.batch_first:
            batch_size, seq_size, feat_size = x_in.size()
            x_in = x_in.permute(1, 0, 2)
        else:
            seq_size, batch_size, feat_size = x_in.size()

        hiddens = []
        if initial_hidden is None:
            initial_hidden = self._initial_hidden(batch_size)
            initial_hidden = initial_hidden.to(x_in.device)
        hidden_t = initial_hidden

        for t in range(seq_size):
            hidden_t = self.rnn_cell(x_in[t], hidden_t)
            hiddens.append(hidden_t)
        hiddens = torch.stack(hiddens)

        # move the batch dimension back to position 0 for the output
        if self.batch_first:
            hiddens = hiddens.permute(1, 0, 2)
        return hiddens
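Functionally, the class above mirrors PyTorch's built-in nn.RNN (single layer, tanh nonlinearity), which makes for a quick shape check of what ElmanRNN.forward returns (the sizes here are arbitrary):

```python
import torch
import torch.nn as nn

batch_size, seq_size, feat_size, hidden_size = 2, 5, 4, 3

# nn.RNN with batch_first=True also returns the per-step hidden states,
# like ElmanRNN.forward does
rnn = nn.RNN(input_size=feat_size, hidden_size=hidden_size, batch_first=True)
x_in = torch.randn(batch_size, seq_size, feat_size)
hiddens, h_n = rnn(x_in)

print(hiddens.shape)  # torch.Size([2, 5, 3])
# for a single-layer RNN, the last per-step output equals the final hidden state
assert torch.allclose(hiddens[:, -1, :], h_n.squeeze(0))
```

The hand-rolled ElmanRNN is kept here for transparency; in practice nn.RNN is faster because its time-step loop runs in C++/cuDNN.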
▶ The SurnameClassifier class used for training
- Split into an RNN layer and Linear layers
- The RNN layer computes a vector representation of the sequence and extracts the vector corresponding to the last character of the surname (the vector for the last character is the final vector.)
- The final vector can be viewed as a summary vector: the result of passing the whole sequence input through the RNN.
- The Linear layers use the summary vector to compute the prediction vector.
import torch
import torch.nn as nn
import torch.nn.functional as F

# split into an RNN layer and Linear layers
class SurnameClassifier(nn.Module):
    """ A classification model that extracts features with an RNN
    and classifies them with an MLP """
    def __init__(self, embedding_size, num_embeddings, num_classes,
                 rnn_hidden_size, batch_first=True, padding_idx=0):
        """
        Args:
            embedding_size (int): size of the character embeddings
            num_embeddings (int): number of characters to embed
            num_classes (int): size of the prediction vector
                Note: the number of nationalities
            rnn_hidden_size (int): size of the RNN's hidden state
            batch_first (bool): flag indicating whether dimension 0 of the
                input tensor is the batch or the sequence
            padding_idx (int): index used to pad the tensors;
                see torch.nn.Embedding
        """
        super(SurnameClassifier, self).__init__()
        # first, an embedding layer embeds the integer indices
        self.emb = nn.Embedding(num_embeddings=num_embeddings,
                                embedding_dim=embedding_size,
                                padding_idx=padding_idx)
        # next, the RNN layer computes vector representations of the sequence;
        # these vectors are the hidden states for each character in the surname.
        # the vector for the surname's last character is extracted (the final vector);
        # this final vector is the result of passing the whole sequence input
        # through the RNN (a vector summarizing the surname)
        self.rnn = ElmanRNN(input_size=embedding_size,
                            hidden_size=rnn_hidden_size,
                            batch_first=batch_first)
        # the summary vector is passed through the Linear layers to compute the
        # prediction vector; the prediction vector is fed to the softmax function,
        # or used to compute the training loss, to produce a probability
        # distribution over the nationalities
        self.fc1 = nn.Linear(in_features=rnn_hidden_size,
                             out_features=rnn_hidden_size)
        self.fc2 = nn.Linear(in_features=rnn_hidden_size,
                             out_features=num_classes)

    # the sequence lengths x_lengths are needed:
    # column_gather() uses them to extract and return the last vector
    # of each sequence in the tensor
    def forward(self, x_in, x_lengths=None, apply_softmax=False):
        """ The forward pass of the classifier

        Args:
            x_in (torch.Tensor): an input data tensor
                x_in.shape is (batch, input_dim)
            x_lengths (torch.Tensor): the length of each sequence in the batch,
                used to find the last vector of each sequence
            apply_softmax (bool): flag for the softmax activation;
                set to False when using the cross-entropy loss
        Returns:
            the resulting tensor. tensor.shape is (batch, output_dim).
        """
        x_embedded = self.emb(x_in)
        y_out = self.rnn(x_embedded)

        if x_lengths is not None:
            # column_gather: iterates over the batch row indices and returns
            # the vector at the index position given by x_lengths
            # (the hidden state at the sequence's last index)
            y_out = column_gather(y_out, x_lengths)
        else:
            y_out = y_out[:, -1, :]

        # training=self.training keeps dropout active only during training
        y_out = F.relu(self.fc1(F.dropout(y_out, 0.5, training=self.training)))
        y_out = self.fc2(F.dropout(y_out, 0.5, training=self.training))

        if apply_softmax:
            y_out = F.softmax(y_out, dim=1)
        return y_out
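Putting it together, here is a shape walk-through of the same embed → RNN → gather → Linear pipeline using built-in modules (the vocabulary size, class count, and sequence lengths below are made up; this is a sketch of the data flow, not the book's training code):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

num_embeddings, embedding_size = 30, 8   # e.g. ~30 distinct characters
rnn_hidden_size, num_classes = 16, 18    # e.g. 18 nationalities
batch_size, seq_size = 4, 7

emb = nn.Embedding(num_embeddings, embedding_size, padding_idx=0)
rnn = nn.RNN(embedding_size, rnn_hidden_size, batch_first=True)
fc = nn.Linear(rnn_hidden_size, num_classes)

x_in = torch.randint(1, num_embeddings, (batch_size, seq_size))  # character indices
x_lengths = torch.tensor([7, 5, 3, 6])                           # true lengths

y_out, _ = rnn(emb(x_in))                              # (batch, seq, hidden)
last = torch.stack([y_out[i, l - 1]                    # summary vector per surname
                    for i, l in enumerate(x_lengths)])
probs = F.softmax(fc(last), dim=1)                     # (batch, num_classes)

print(probs.shape)  # torch.Size([4, 18])
```

Each row of `probs` sums to 1: one probability distribution over nationalities per surname in the batch.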