Tutorial: How to Pre-Train ELECTRA for Spanish from Scratch


    Skim AI์˜ ๋จธ์‹ ๋Ÿฌ๋‹ ์—ฐ๊ตฌ์› ํฌ๋ฆฌ์Šค ํŠธ๋ž€์ด ์ฒ˜์Œ ๊ฒŒ์‹œํ–ˆ์Šต๋‹ˆ๋‹ค.

Run on Google Colab

Introduction

์ด ๊ธ€์—์„œ๋Š” Transformer ์‚ฌ์ „ ํ›ˆ๋ จ ๋ฐฉ๋ฒ• ์ œํ’ˆ๊ตฐ์˜ ๋˜ ๋‹ค๋ฅธ ๊ตฌ์„ฑ์›์ธ ELECTRA๋ฅผ ์ŠคํŽ˜์ธ์–ด์šฉ์œผ๋กœ ์‚ฌ์ „ ํ›ˆ๋ จํ•˜์—ฌ ์ž์—ฐ์–ด ์ฒ˜๋ฆฌ ๋ฒค์น˜๋งˆํฌ์—์„œ ์ตœ์ฒจ๋‹จ ๊ฒฐ๊ณผ๋ฅผ ์–ป๋Š” ๋ฐฉ๋ฒ•์— ๋Œ€ํ•ด ์„ค๋ช…ํ•ฉ๋‹ˆ๋‹ค. ๋‹ค์–‘ํ•œ ์‚ฌ์šฉ ์‚ฌ๋ก€๋ฅผ ์œ„ํ•œ ์ŠคํŽ˜์ธ์–ด์šฉ ๋งž์ถคํ˜• BERT ์–ธ์–ด ๋ชจ๋ธ ํ›ˆ๋ จ์— ๋Œ€ํ•œ ์‹œ๋ฆฌ์ฆˆ ์ค‘ 3๋ถ€์ž…๋‹ˆ๋‹ค:


1. Introduction

At ICLR 2020, ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators, a novel method for self-supervised language representation learning, was introduced. ELECTRA is another member of the Transformer pre-training method family, whose predecessors such as BERT, GPT-2, and RoBERTa have achieved many state-of-the-art results on natural language processing benchmarks.

๋‹ค๋ฅธ ๋งˆ์Šคํฌ ์–ธ์–ด ๋ชจ๋ธ๋ง ๋ฐฉ๋ฒ•๊ณผ ๋‹ฌ๋ฆฌ, ELECTRA๋Š” ๋Œ€์ฒด ํ† ํฐ ๊ฐ์ง€๋ฅผ ํ†ตํ•ด ์ƒ˜ํ”Œ ํšจ์œจ์„ฑ์ด ๋†’์€ ์‚ฌ์ „ ํ•™์Šต ์ž‘์—…์„ ์ˆ˜ํ–‰ํ•ฉ๋‹ˆ๋‹ค. ์†Œ๊ทœ๋ชจ์˜ ๊ฒฝ์šฐ, ๋‹จ์ผ GPU์—์„œ 4์ผ ๋™์•ˆ ELECTRA-small์„ ํ›ˆ๋ จํ•˜์—ฌ ์„ฑ๋Šฅ์„ ํ–ฅ์ƒ์‹œํ‚ฌ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. GPT(Radford et al., 2018) (30๋ฐฐ ๋” ๋งŽ์€ ์ปดํ“จํŒ…์„ ์‚ฌ์šฉํ•˜์—ฌ ํ›ˆ๋ จ๋จ) GLUE ๋ฒค์น˜๋งˆํฌ์—์„œ. ๋Œ€๊ทœ๋ชจ์—์„œ๋Š” ELECTRA-large๊ฐ€ ๋‹ค์Œ์„ ๋Šฅ๊ฐ€ํ•ฉ๋‹ˆ๋‹ค. ALBERT(Lan et al., 2019) ๋ฅผ ํ†ตํ•ด SQuAD 2.0์„ ์œ„ํ•œ ์ƒˆ๋กœ์šด ์ตœ์ฒจ๋‹จ ๊ธฐ์ˆ ์„ ์„ ๋ณด์ž…๋‹ˆ๋‹ค.

ELECTRA consistently outperforms masked language model pre-training approaches.
{: .text-center}

2. Method

๋‹ค์Œ๊ณผ ๊ฐ™์€ ๋งˆ์Šคํฌ๋“œ ์–ธ์–ด ๋ชจ๋ธ๋ง ์‚ฌ์ „ ํ•™์Šต ๋ฐฉ๋ฒ• BERT(Devlin ์™ธ, 2019) ์ผ๋ถ€ ํ† ํฐ(์ผ๋ฐ˜์ ์œผ๋กœ ์ž…๋ ฅ์˜ 15%)์„ ๋‹ค์Œ๊ณผ ๊ฐ™์ด ๋Œ€์ฒดํ•˜์—ฌ ์ž…๋ ฅ์„ ์†์ƒ์‹œํ‚ต๋‹ˆ๋‹ค. [๋งˆ์Šคํฌ] ๋ฅผ ์ž…๋ ฅํ•œ ๋‹ค์Œ ๋ชจ๋ธ์„ ํ•™์Šต์‹œ์ผœ ์›๋ž˜ ํ† ํฐ์„ ์žฌ๊ตฌ์„ฑํ•ฉ๋‹ˆ๋‹ค.

Instead of masking, ELECTRA corrupts the input by replacing some tokens with samples from the output of a small masked language model. Then, a discriminative model is trained to predict whether each token is an original or a replacement. After pre-training, the generator is thrown away and the discriminator is fine-tuned on downstream tasks.
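The corrupt-then-label step can be sketched in a few lines of plain Python. This is only a toy illustration: a random vocabulary pick stands in for the small generator's sample, and the helper name `corrupt_and_label` and the tiny vocabulary are made up for the example.

```python
import random

random.seed(0)

def corrupt_and_label(tokens, mask_frac=0.15, generator_vocab=("el", "la", "es", "un")):
    """Toy sketch of ELECTRA's replaced-token-detection data construction.

    A real setup masks ~15% of positions and lets a small masked LM
    propose replacements; here a random vocabulary pick stands in for
    the generator's sample.
    """
    n_masked = max(1, int(len(tokens) * mask_frac))
    masked_positions = random.sample(range(len(tokens)), n_masked)
    corrupted = list(tokens)
    for pos in masked_positions:
        corrupted[pos] = random.choice(generator_vocab)
    # The discriminator's target: 1 = replaced, 0 = original. A generator
    # sample can coincide with the original token, in which case the
    # label stays 0 (the paper counts such tokens as "original").
    labels = [int(c != t) for c, t in zip(corrupted, tokens)]
    return corrupted, labels

tokens = ["los", "pájaros", "están", "cantando"]
corrupted, labels = corrupt_and_label(tokens)
print(corrupted, labels)
```

The discriminator then receives `corrupted` as input and is trained with a binary loss against `labels` at every position.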

ELECTRA์— ๋Œ€ํ•œ ๊ฐœ์š”์ž…๋‹ˆ๋‹ค.
{: .text-center}

GAN๊ณผ ๊ฐ™์€ ์ƒ์„ฑ๊ธฐ์™€ ๊ฐ๋ณ„๊ธฐ๊ฐ€ ์žˆ์ง€๋งŒ, ELECTRA๋Š” ์†์ƒ๋œ ํ† ํฐ์„ ์ƒ์„ฑํ•˜๋Š” ์ƒ์„ฑ๊ธฐ๊ฐ€ ๊ฐ๋ณ„๊ธฐ๋ฅผ ์†์ด๋„๋ก ํ›ˆ๋ จ๋˜๋Š” ๊ฒƒ์ด ์•„๋‹ˆ๋ผ ์ตœ๋Œ€ํ•œ์˜ ํ™•๋ฅ ๋กœ ํ›ˆ๋ จ๋œ๋‹ค๋Š” ์ ์—์„œ ์ ๋Œ€์ ์ด์ง€ ์•Š์Šต๋‹ˆ๋‹ค.

Why is ELECTRA so efficient?

์ƒˆ๋กœ์šด ๊ต์œก ๋ชฉํ‘œ๋ฅผ ํ†ตํ•ด ELECTRA๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์€ ๊ฐ•๋ ฅํ•œ ๋ชจ๋ธ์— ํ•„์ ํ•˜๋Š” ์„ฑ๋Šฅ์„ ๋‹ฌ์„ฑํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. RoBERTa(Liu ์™ธ, (2019)) ๋Š” ๋” ๋งŽ์€ ํŒŒ๋ผ๋ฏธํ„ฐ๋ฅผ ๊ฐ€์ง€๊ณ  ์žˆ์œผ๋ฉฐ ํ›ˆ๋ จ์— 4๋ฐฐ ๋” ๋งŽ์€ ์ปดํ“จํŒ…์ด ํ•„์š”ํ•ฉ๋‹ˆ๋‹ค. ์ด ๋…ผ๋ฌธ์—์„œ๋Š” ELECTRA์˜ ํšจ์œจ์„ฑ์— ์‹ค์ œ๋กœ ๊ธฐ์—ฌํ•˜๋Š” ์š”์†Œ๋ฅผ ํŒŒ์•…ํ•˜๊ธฐ ์œ„ํ•ด ๋ถ„์„์„ ์ˆ˜ํ–‰ํ–ˆ์Šต๋‹ˆ๋‹ค. ์ฃผ์š” ๊ฒฐ๊ณผ๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค:

  • ELECTRA benefits greatly from defining its loss over all input tokens rather than just a subset. More specifically, in ELECTRA the discriminator predicts on every token in the input, while in BERT the model only predicts on the 15% of tokens that were masked.
  • BERT's performance is slightly harmed because the model sees [MASK] tokens during the pre-training phase but not during the fine-tuning phase.
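The first point can be put in rough numbers. For a 128-token sequence with BERT's usual 15% masking rate (exact counts vary by implementation), a quick sketch of how many positions receive a training signal under each objective:

```python
seq_len = 128     # matches the --max-seq-length used later in this tutorial
mask_rate = 0.15  # BERT's usual masking rate

bert_supervised = int(seq_len * mask_rate)  # loss only on masked positions
electra_supervised = seq_len                # discriminator scores every token

print(bert_supervised, electra_supervised)  # -> 19 128
```

So per sequence, ELECTRA's discriminator gets roughly 6–7x as many supervised positions as BERT's masked-LM objective.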


์ผ๋ ‰ํŠธ๋ผ์™€ ๋ฒ„ํŠธ
{: .text-center}

3. Pre-training ELECTRA

์ด ์„น์…˜์—์„œ๋Š” ELECTRA์˜ ์ž‘์„ฑ์ž๊ฐ€ ์ œ๊ณตํ•œ ์Šคํฌ๋ฆฝํŠธ๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ TensorFlow๋กœ ELECTRA๋ฅผ ์ฒ˜์Œ๋ถ€ํ„ฐ ํ•™์Šตํ•ฉ๋‹ˆ๋‹ค. ๊ตฌ๊ธ€-์—ฐ๊ตฌ/์ผ๋ ‰ํŠธ๋ผ. ๊ทธ๋Ÿฐ ๋‹ค์Œ ๋ชจ๋ธ์„ ํŒŒ์ดํ† ์น˜์˜ ์ฒดํฌํฌ์ธํŠธ๋กœ ๋ณ€ํ™˜ํ•˜๊ณ , ํ—ˆ๊น… ํŽ˜์ด์Šค์˜ ํŠธ๋žœ์Šคํฌ๋จธ ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ.

Setup

!pip install tensorflow==1.15
!pip install transformers==2.8.0
!git clone https://github.com/google-research/electra.git

import os
import json
from transformers import AutoTokenizer

๋ฐ์ดํ„ฐ

์˜คํ”ˆ์„œ๋ธŒํƒ€์ดํ‹€์—์„œ ๊ฒ€์ƒ‰ํ•œ ์ŠคํŽ˜์ธ์–ด ์˜ํ™” ์ž๋ง‰ ๋ฐ์ดํ„ฐ์…‹์œผ๋กœ ELECTRA๋ฅผ ์‚ฌ์ „ ํ›ˆ๋ จํ•ฉ๋‹ˆ๋‹ค. ์ด ๋ฐ์ดํ„ฐ ์„ธํŠธ์˜ ํฌ๊ธฐ๋Š” 5.4GB์ด๋ฉฐ ํ”„๋ ˆ์  ํ…Œ์ด์…˜์„ ์œ„ํ•ด ์•ฝ 30MB์˜ ์ž‘์€ ํ•˜์œ„ ์ง‘ํ•ฉ์œผ๋กœ ํ›ˆ๋ จํ•  ๊ฒƒ์ž…๋‹ˆ๋‹ค.

DATA_DIR = "./data" #@param {type: "string"}
TRAIN_SIZE = 1000000 #@param {type: "integer"}
MODEL_NAME = "electra-spanish" #@param {type: "string"}

# Download and unzip the Spanish movie subtitle dataset
if not os.path.exists(DATA_DIR):
  !mkdir -p $DATA_DIR
  !wget "https://object.pouta.csc.fi/OPUS-OpenSubtitles/v2016/mono/es.txt.gz" -O $DATA_DIR/OpenSubtitles.txt.gz
  !gzip -d $DATA_DIR/OpenSubtitles.txt.gz
  !head -n $TRAIN_SIZE $DATA_DIR/OpenSubtitles.txt > $DATA_DIR/train_data.txt
  !rm $DATA_DIR/OpenSubtitles.txt

์‚ฌ์ „ ํ•™์Šต ๋ฐ์ดํ„ฐ ์ง‘ํ•ฉ์„ ๊ตฌ์ถ•ํ•˜๊ธฐ ์ „์— ๋ง๋ญ‰์น˜์˜ ํ˜•์‹์ด ๋‹ค์Œ๊ณผ ๊ฐ™์€์ง€ ํ™•์ธํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค:

  • ๊ฐ ์ค„์€ ๋ฌธ์žฅ์ž…๋‹ˆ๋‹ค.
  • ๋นˆ ์ค„์€ ๋‘ ๋ฌธ์„œ๋ฅผ ๊ตฌ๋ถ„ํ•ฉ๋‹ˆ๋‹ค.

์‚ฌ์ „ ํ•™์Šต ๋ฐ์ดํ„ฐ ์„ธํŠธ ๊ตฌ์ถ•

We will use the tokenizer of bert-base-multilingual-cased to process Spanish texts.

# Save the pre-trained WordPiece tokenizer to get vocab.txt
tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
tokenizer.save_pretrained(DATA_DIR)

We use build_pretraining_dataset.py to create a pre-training dataset from a dump of raw text.

!python3 electra/build_pretraining_dataset.py \
  --corpus-dir $DATA_DIR \
  --vocab-file $DATA_DIR/vocab.txt \
  --output-dir $DATA_DIR/pretrain_tfrecords \
  --max-seq-length 128 \
  --blanks-separate-docs False \
  --no-lower-case \
  --num-processes 5

๊ต์œก ์‹œ์ž‘

We use run_pretraining.py to pre-train an ELECTRA model.

To train a small ELECTRA model for 1 million steps, run:

python3 run_pretraining.py --data-dir $DATA_DIR --model-name electra_small

This takes slightly over 4 days on a Tesla V100 GPU. However, the model should achieve decent results after 200k steps (10 hours of training on a V100 GPU).

๊ต์œก์„ ์‚ฌ์šฉ์ž ์ง€์ •ํ•˜๋ ค๋ฉด .json ํ•˜์ดํผํŒŒ๋ผ๋ฏธํ„ฐ๊ฐ€ ํฌํ•จ๋œ ํŒŒ์ผ์ž…๋‹ˆ๋‹ค. ๋‹ค์Œ์„ ์ฐธ์กฐํ•˜์„ธ์š”. configure_pretraining.py ๋ฅผ ์ž…๋ ฅํ•˜๋ฉด ๋ชจ๋“  ํ•˜์ดํผํŒŒ๋ผ๋ฏธํ„ฐ์˜ ๊ธฐ๋ณธ๊ฐ’์„ ํ™•์ธํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

์•„๋ž˜์—์„œ๋Š” 100๋‹จ๊ณ„์— ๋Œ€ํ•ด์„œ๋งŒ ๋ชจ๋ธ์„ ํ•™์Šตํ•˜๋„๋ก ํ•˜์ดํผํŒŒ๋ผ๋ฏธํ„ฐ๋ฅผ ์„ค์ •ํ–ˆ์Šต๋‹ˆ๋‹ค.

hparams = {
    "do_train": "true",
    "do_eval": "false",
    "model_size": "small",
    "do_lower_case": "false",
    "vocab_size": 119547,
    "num_train_steps": 100,
    "์ €์žฅ_์ฒดํฌํฌ์ธํŠธ_๋‹จ๊ณ„": 100,
    "train_batch_size": 32,
}

with open("hparams.json", "w") as f:
    json.dump(hparams, f)

๊ต์œก์„ ์‹œ์ž‘ํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค:

!python3 electra/run_pretraining.py \
  --data-dir $DATA_DIR \
  --model-name $MODEL_NAME \
  --hparams "hparams.json"

๊ฐ€์ƒ ๋จธ์‹ ์—์„œ ํ›ˆ๋ จํ•˜๋Š” ๊ฒฝ์šฐ, ํ„ฐ๋ฏธ๋„์—์„œ ๋‹ค์Œ ๋ช…๋ น์„ ์‹คํ–‰ํ•˜์—ฌ TensorBoard๋กœ ํ›ˆ๋ จ ๊ณผ์ •์„ ๋ชจ๋‹ˆํ„ฐ๋งํ•˜์„ธ์š”.

pip install -U tensorboard
tensorboard dev upload --logdir data/models/electra-spanish

์ด๊ฒƒ์€ ํ…์„œ๋ณด๋“œ V100 GPU์—์„œ 4์ผ ๋™์•ˆ 1๋ฐฑ๋งŒ ๊ฑธ์Œ์„ ๊ฑธ์„ ์ˆ˜ ์žˆ๋Š” ์ผ๋ ‰ํŠธ๋ผ ์Šค๋ชฐ ํŠธ๋ ˆ์ด๋‹์„ ์ˆ˜ํ–‰ํ–ˆ์Šต๋‹ˆ๋‹ค.

{: .align-center}

4. Convert the TensorFlow Checkpoint to PyTorch Format

ํ—ˆ๊น… ํŽ˜์ด์Šค์—๋Š” ๋„๊ตฌ ๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ํ…์„œํ”Œ๋กœ์šฐ ์ฒดํฌํฌ์ธํŠธ๋ฅผ PyTorch๋กœ ๋ณ€ํ™˜ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ๊ทธ๋Ÿฌ๋‚˜ ์ด ๋„๊ตฌ๋Š” ์•„์ง ELECTRA์šฉ์œผ๋กœ ์—…๋ฐ์ดํŠธ๋˜์ง€ ์•Š์•˜์Šต๋‹ˆ๋‹ค. ๋‹คํ–‰ํžˆ๋„ ์ด ์ž‘์—…์— ๋„์›€์ด ๋  ์ˆ˜ ์žˆ๋Š” @lonePatient์˜ GitHub ๋ฆฌํฌ์ง€ํ† ๋ฆฌ๋ฅผ ๋ฐœ๊ฒฌํ–ˆ์Šต๋‹ˆ๋‹ค.

!git clone https://github.com/lonePatient/electra_pytorch.git
MODEL_DIR = "data/models/electra-spanish/"

config = {
  "vocab_size": 119547,
  "embedding_size": 128,
  "hidden_size": 256,
  "num_hidden_layers": 12,
  "num_attention_heads": 4,
  "intermediate_size": 1024,
  "generator_size": "0.25",
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "attention_probs_dropout_prob": 0.1,
  "max_position_embeddings": 512,
  "type_vocab_size": 2,
  "initializer_range": 0.02
}

with open(MODEL_DIR + "config.json", "w") as f:
    json.dump(config, f)

!python electra_pytorch/convert_electra_tf_checkpoint_to_pytorch.py \
    --tf_checkpoint_path=$MODEL_DIR \
    --electra_config_file=$MODEL_DIR/config.json \
    --pytorch_dump_path=$MODEL_DIR/pytorch_model.bin

Use ELECTRA with transformers

๋ชจ๋ธ ์ฒดํฌํฌ์ธํŠธ๋ฅผ PyTorch ํ˜•์‹์œผ๋กœ ๋ณ€ํ™˜ํ•œ ํ›„, ์‚ฌ์ „ ํ•™์Šต๋œ ELECTRA ๋ชจ๋ธ์„ ๋‹ค์šด์ŠคํŠธ๋ฆผ ์ž‘์—…์— ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ํŠธ๋žœ์Šคํฌ๋จธ ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ.

import torch
from transformers import ElectraForPreTraining, ElectraTokenizerFast

discriminator = ElectraForPreTraining.from_pretrained(MODEL_DIR)
tokenizer = ElectraTokenizerFast.from_pretrained(DATA_DIR, do_lower_case=False)

sentence = "Los pájaros están cantando"       # The birds are singing
fake_sentence = "Los pájaros están hablando"  # The birds are speaking

fake_tokens = tokenizer.tokenize(fake_sentence, add_special_tokens=True)
fake_inputs = tokenizer.encode(fake_sentence, return_tensors="pt")
discriminator_outputs = discriminator(fake_inputs)
predictions = discriminator_outputs[0] > 0

[print("%7s" % token, end="") for token in fake_tokens]
print("\n")
[print("%7s" % int(prediction), end="") for prediction in predictions[0].tolist()];
  [CLS]    los     pá##jaros  están  habla  ##ndo  [SEP]

      1      0      0      0      0      0      0      0

์ €ํฌ ๋ชจ๋ธ์€ 100๋‹จ๊ณ„๋งŒ ํ•™์Šต๋˜์—ˆ๊ธฐ ๋•Œ๋ฌธ์— ์˜ˆ์ธก์ด ์ •ํ™•ํ•˜์ง€ ์•Š์Šต๋‹ˆ๋‹ค. ์™„์ „ํžˆ ํ•™์Šต๋œ ์ŠคํŽ˜์ธ์–ด์šฉ ์ผ๋ ‰ํŠธ๋ผ ์Šค๋ชฐ์€ ์•„๋ž˜์™€ ๊ฐ™์ด ๋กœ๋“œํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค:

ํŒ๋ณ„์ž = ElectraForPreTraining.from_pretrained("skimai/electra-small-spanish")
ํ† ํฐํ™”๊ธฐ = ElectraTokenizerFast.from_pretrained("skimai/electra-small-spanish", do_lower_case=False)

5. Conclusion

์ด ๊ธ€์—์„œ๋Š” ELECTRA ๋ฐฑ์„œ๋ฅผ ํ†ตํ•ด ELECTRA๊ฐ€ ํ˜„์žฌ ๊ฐ€์žฅ ํšจ์œจ์ ์ธ ํŠธ๋žœ์Šคํฌ๋จธ ์‚ฌ์ „ ํ›ˆ๋ จ ๋ฐฉ์‹์ธ ์ด์œ ๋ฅผ ์‚ดํŽด๋ดค์Šต๋‹ˆ๋‹ค. ์†Œ๊ทœ๋ชจ์˜ ๊ฒฝ์šฐ, ELECTRA-small์€ ํ•˜๋‚˜์˜ GPU๋กœ 4์ผ ๋™์•ˆ ํŠธ๋ ˆ์ด๋‹ํ•˜์—ฌ GLUE ๋ฒค์น˜๋งˆํฌ์—์„œ GPT๋ฅผ ๋Šฅ๊ฐ€ํ•˜๋Š” ์„ฑ๋Šฅ์„ ๋‚ผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ๋Œ€๊ทœ๋ชจ๋กœ ๋ณด๋ฉด ELECTRA-large๋Š” SQuAD 2.0์˜ ์ƒˆ๋กœ์šด ๊ธฐ์ค€์„ ์ œ์‹œํ•ฉ๋‹ˆ๋‹ค.

๊ทธ๋Ÿฐ ๋‹ค์Œ ์‹ค์ œ๋กœ ์ŠคํŽ˜์ธ์–ด ํ…์ŠคํŠธ์— ๋Œ€ํ•ด ELECTRA ๋ชจ๋ธ์„ ํ•™์Šตํ•˜๊ณ  Tensorflow ์ฒดํฌํฌ์ธํŠธ๋ฅผ PyTorch๋กœ ๋ณ€ํ™˜ํ•œ ํ›„ ํŠธ๋žœ์Šคํฌ๋จธ ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ.
