{"id":3732,"date":"2020-12-28T23:33:59","date_gmt":"2020-12-29T04:33:59","guid":{"rendered":"http:\/\/skimai.com\/?p=3732"},"modified":"2024-05-20T07:38:31","modified_gmt":"2024-05-20T12:38:31","slug":"%e5%90%8d%e5%89%8d%e4%bb%98%e3%81%8d%e3%82%a8%e3%83%b3%e3%83%86%e3%82%a3%e3%83%86%e3%82%a3%e8%aa%8d%e8%ad%98%e3%81%ae%e3%81%9f%e3%82%81%e3%81%aebert%e3%81%ae%e5%be%ae%e8%aa%bf%e6%95%b4%e6%96%b9","status":"publish","type":"post","link":"https:\/\/skimai.com\/ja\/how-to-fine-tune-bert-for-named-entity-recognition-ner\/","title":{"rendered":"\u30c1\u30e5\u30fc\u30c8\u30ea\u30a2\u30eb\u540d\u524d\u4ed8\u304d\u56fa\u6709\u8868\u73fe\u8a8d\u8b58\uff08NER\uff09\u306e\u305f\u3081\u306eBERT\u306e\u5fae\u8abf\u6574\u65b9\u6cd5"},"content":{"rendered":"<div id=\"ez-toc-container\" class=\"ez-toc-v2_0_82_1 counter-hierarchy ez-toc-counter ez-toc-grey ez-toc-container-direction\">\n<div class=\"ez-toc-title-container\">\n<p class=\"ez-toc-title\" style=\"cursor:inherit\">Table of Contents<\/p>\n<span class=\"ez-toc-title-toggle\"><a href=\"#\" class=\"ez-toc-pull-right ez-toc-btn ez-toc-btn-xs ez-toc-btn-default ez-toc-toggle\" aria-label=\"Toggle Table of Content\"><span class=\"ez-toc-js-icon-con\"><span class=\"\"><span class=\"eztoc-hide\" style=\"display:none;\">Toggle<\/span><span class=\"ez-toc-icon-toggle-span\"><svg style=\"fill: #999;color:#999\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" class=\"list-377408\" width=\"20px\" height=\"20px\" viewBox=\"0 0 24 24\" fill=\"none\"><path d=\"M6 6H4v2h2V6zm14 0H8v2h12V6zM4 11h2v2H4v-2zm16 0H8v2h12v-2zM4 16h2v2H4v-2zm16 0H8v2h12v-2z\" fill=\"currentColor\"><\/path><\/svg><svg style=\"fill: #999;color:#999\" class=\"arrow-unsorted-368013\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"10px\" height=\"10px\" viewBox=\"0 0 24 24\" version=\"1.2\" baseProfile=\"tiny\"><path d=\"M18.2 9.3l-6.2-6.3-6.2 6.3c-.2.2-.3.4-.3.7s.1.5.3.7c.2.2.4.3.7.3h11c.3 0 .5-.1.7-.3.2-.2.3-.5.3-.7s-.1-.5-.3-.7zM5.8 14.7l6.2 6.3 
6.2-6.3c.2-.2.3-.5.3-.7s-.1-.5-.3-.7c-.2-.2-.4-.3-.7-.3h-11c-.3 0-.5.1-.7.3-.2.2-.3.5-.3.7s.1.5.3.7z\"\/><\/svg><\/span><\/span><\/span><\/a><\/span><\/div>\n<nav><ul class='ez-toc-list ez-toc-list-level-1 ' ><li class='ez-toc-page-1 ez-toc-heading-level-1'><a class=\"ez-toc-link ez-toc-heading-1\" href=\"https:\/\/skimai.com\/ja\/how-to-fine-tune-bert-for-named-entity-recognition-ner\/#Tutorial_How_to_Fine-tune_BERT_for_NER\" >Tutorial: How to Fine-tune BERT for NER<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-1'><a class=\"ez-toc-link ez-toc-heading-2\" href=\"https:\/\/skimai.com\/ja\/how-to-fine-tune-bert-for-named-entity-recognition-ner\/#Introduction\" >Introduction<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-1'><a class=\"ez-toc-link ez-toc-heading-3\" href=\"https:\/\/skimai.com\/ja\/how-to-fine-tune-bert-for-named-entity-recognition-ner\/#Setup\" >Setup<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-1'><a class=\"ez-toc-link ez-toc-heading-4\" href=\"https:\/\/skimai.com\/ja\/how-to-fine-tune-bert-for-named-entity-recognition-ner\/#Data\" >Data<\/a><ul class='ez-toc-list-level-2' ><li class='ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-5\" href=\"https:\/\/skimai.com\/ja\/how-to-fine-tune-bert-for-named-entity-recognition-ner\/#1_Download_Datasets\" >1. Download Datasets<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-6\" href=\"https:\/\/skimai.com\/ja\/how-to-fine-tune-bert-for-named-entity-recognition-ner\/#2_Preprocessing\" >2. Preprocessing<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-7\" href=\"https:\/\/skimai.com\/ja\/how-to-fine-tune-bert-for-named-entity-recognition-ner\/#3_Labels\" >3. 
Labels<\/a><\/li><\/ul><\/li><li class='ez-toc-page-1 ez-toc-heading-level-1'><a class=\"ez-toc-link ez-toc-heading-8\" href=\"https:\/\/skimai.com\/ja\/how-to-fine-tune-bert-for-named-entity-recognition-ner\/#Fine-tuning_Model\" >Fine-tuning Model<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-1'><a class=\"ez-toc-link ez-toc-heading-9\" href=\"https:\/\/skimai.com\/ja\/how-to-fine-tune-bert-for-named-entity-recognition-ner\/#Pipeline\" >Pipeline<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-1'><a class=\"ez-toc-link ez-toc-heading-10\" href=\"https:\/\/skimai.com\/ja\/how-to-fine-tune-bert-for-named-entity-recognition-ner\/#Conclusion\" >Conclusion<\/a><\/li><\/ul><\/nav><\/div>\n<h1><span class=\"ez-toc-section\" id=\"Tutorial_How_to_Fine-tune_BERT_for_NER\"><\/span>Tutorial: How to Fine-tune BERT for NER<span class=\"ez-toc-section-end\"><\/span><\/h1>\n<pre><code>    Originally published by Skim AI's Machine Learning Researcher, Chris Tran.<\/code><\/pre>\n<p><a href=\"https:\/\/colab.research.google.com\/drive\/1ezuE7wC7Fa21Wu3fvzRffx2m14CAySS1#scrollTo=LhKZ3vItVBzi\"><img decoding=\"async\" src=\"https:\/\/img.shields.io\/badge\/Colab-Run_in_Google_Colab-blue?logo=Google&#038;logoColor=FDBA18\" alt=\"Run in Google Colab\"><\/a><\/p>\n<\/p>\n<h1><span class=\"ez-toc-section\" id=\"Introduction\"><\/span>Introduction<span class=\"ez-toc-section-end\"><\/span><\/h1>\n<p>This article is on how to fine-tune BERT for Named Entity Recognition (NER). Specifically, how to train a BERT variation, SpanBERTa, for NER. 
It is Part II of III in a series on training custom BERT Language Models for Spanish for a variety of use cases:<\/p>\n<\/p>\n<ul>\n<li><a href=\"http:\/\/skimai.com\/roberta-language-model-for-spanish\/\">Part I: How to Train a RoBERTa Language Model for Spanish from Scratch<\/a><\/li>\n<li><a href=\"http:\/\/skimai.com\/how-to-train-electra-language-model-for-spanish\/\">Part III: How to Train an ELECTRA Language Model for Spanish from Scratch<\/a><\/li>\n<\/ul>\n<p>In my previous blog post, I discussed how my team pretrained SpanBERTa, a transformer language model for Spanish, on a large corpus from scratch. The model proved able to correctly predict masked words in a sequence based on their context. In this blog post, to really leverage the power of transformer models, we will fine-tune SpanBERTa for a named-entity recognition task.<\/p>\n<p>According to its definition on <a href=\"https:\/\/en.wikipedia.org\/wiki\/Named-entity_recognition\">Wikipedia<\/a>, named-entity recognition (NER) (also known as entity identification, entity chunking and entity extraction) is a subtask of information extraction that seeks to locate and classify named entities mentioned in unstructured text into pre-defined categories such as person names, organizations, locations, medical codes, time expressions, quantities, monetary values, percentages, etc.<\/p>\n<p>We will use the script <a href=\"https:\/\/github.com\/huggingface\/transformers\/blob\/master\/examples\/ner\/run_ner.py\"><code>run_ner.py<\/code><\/a> by Hugging Face and the <a href=\"https:\/\/www.kaggle.com\/nltkdata\/conll-corpora\">CoNLL-2002 dataset<\/a> to fine-tune SpanBERTa.<\/p>\n<h1><span class=\"ez-toc-section\" id=\"Setup\"><\/span>Setup<span class=\"ez-toc-section-end\"><\/span><\/h1>\n<p>Download the <code>transformers<\/code> repo and install the required packages.<\/p>\n<pre><code>%%capture\n!git clone https:\/\/github.com\/huggingface\/transformers\n%cd transformers\n!pip install .\n!pip install -r 
.\/examples\/requirements.txt\n%cd ..\n<\/code><\/pre>\n<h1><span class=\"ez-toc-section\" id=\"Data\"><\/span>Data<span class=\"ez-toc-section-end\"><\/span><\/h1>\n<h2><span class=\"ez-toc-section\" id=\"1_Download_Datasets\"><\/span>1. Download Datasets<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>The command below will download and unzip the dataset. The files contain the train and test data for three parts of the <a href=\"https:\/\/www.clips.uantwerpen.be\/conll2002\/ner\/\">CoNLL-2002<\/a> shared task:<\/p>\n<ul>\n<li>esp.testa: Spanish test data for the development stage<\/li>\n<li>esp.testb: Spanish test data<\/li>\n<li>esp.train: Spanish train data<\/li>\n<\/ul>\n<pre><code>%%capture\n!wget -O 'conll2002.zip' 'https:\/\/drive.google.com\/uc?export=download&id=1Wrl1b39ZXgKqCeAFNM9EoXtA1kzwNhCe'\n!unzip 'conll2002.zip'\n<\/code><\/pre>\n<p>The size of each dataset:<\/p>\n<pre><code>!wc -l conll2002\/esp.train\n!wc -l conll2002\/esp.testa\n!wc -l conll2002\/esp.testb\n<\/code><\/pre>\n<pre><code>273038 conll2002\/esp.train\n54838 conll2002\/esp.testa\n53050 conll2002\/esp.testb\n<\/code><\/pre>\n<p>All data files have three columns: words, associated part-of-speech tags, and named entity tags in the IOB2 format. Sentence breaks are encoded by empty lines.<\/p>\n<pre><code>!head -n20 conll2002\/esp.train\n<\/code><\/pre>\n<pre><code>Melbourne NP B-LOC\n( Fpa O\nAustralia NP B-LOC\n) Fpt O\n, Fc O\n25 Z O\nmay NC O\n( Fpa O\nEFE NC B-ORG\n) Fpt O\n. 
Fp O\n- Fg O\nEl DA O\nAbogado NC B-PER\nGeneral AQ I-PER\ndel SP I-PER\nEstado NC I-PER\n, Fc O\n<\/code><\/pre>\n<p>We will only keep the word column and the named entity tag column for our train, dev and test datasets.<\/p>\n<pre><code>!cat conll2002\/esp.train | cut -d \" \" -f 1,3 > train_temp.txt\n!cat conll2002\/esp.testa | cut -d \" \" -f 1,3 > dev_temp.txt\n!cat conll2002\/esp.testb | cut -d \" \" -f 1,3 > test_temp.txt\n<\/code><\/pre>\n<h2><span class=\"ez-toc-section\" id=\"2_Preprocessing\"><\/span>2. Preprocessing<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Let&#8217;s define some variables that we need for further pre-processing steps and training the model:<\/p>\n<pre><code>MAX_LENGTH = 120 #@param {type: \"integer\"}\nMODEL = \"chriskhanhtran\/spanberta\" #@param [\"chriskhanhtran\/spanberta\", \"bert-base-multilingual-cased\"]\n<\/code><\/pre>\n<p>The script below will split sentences longer than <code>MAX_LENGTH<\/code> (in terms of tokens) into smaller ones. Otherwise, long sentences will be truncated when tokenized, causing a loss of training data and leaving some tokens in the test set unpredicted.<\/p>\n<pre><code>%%capture\n!wget \"https:\/\/raw.githubusercontent.com\/stefan-it\/fine-tuned-berts-seq\/master\/scripts\/preprocess.py\"\n<\/code><\/pre>\n<pre><code>!python3 preprocess.py train_temp.txt $MODEL $MAX_LENGTH > train.txt\n!python3 preprocess.py dev_temp.txt $MODEL $MAX_LENGTH > dev.txt\n!python3 preprocess.py test_temp.txt $MODEL $MAX_LENGTH > test.txt\n<\/code><\/pre>\n<pre><code>2020-04-22 23:02:05.747294: I tensorflow\/stream_executor\/platform\/default\/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1\nDownloading: 100% 1.03k\/1.03k [00:00<00:00, 704kB\/s]\nDownloading: 100% 954k\/954k [00:00<00:00, 1.89MB\/s]\nDownloading: 100% 512k\/512k [00:00<00:00, 1.19MB\/s]\nDownloading: 100% 16.0\/16.0 [00:00<00:00, 12.6kB\/s]\n2020-04-22 23:02:23.409488: I 
tensorflow\/stream_executor\/platform\/default\/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1\n2020-04-22 23:02:31.168967: I tensorflow\/stream_executor\/platform\/default\/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1\n<\/code><\/pre>\n<h2><span class=\"ez-toc-section\" id=\"3_Labels\"><\/span>3. Labels<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>In the CoNLL-2002\/2003 datasets, there are 9 classes of NER tags. In the IOB2 format used here, a B- tag marks the first token of an entity and an I- tag marks each following token of the same entity:<\/p>\n<ul>\n<li>O, outside of a named entity<\/li>\n<li>B-MISC, beginning of a miscellaneous entity<\/li>\n<li>I-MISC, inside a miscellaneous entity<\/li>\n<li>B-PER, beginning of a person\u2019s name<\/li>\n<li>I-PER, inside a person\u2019s name<\/li>\n<li>B-ORG, beginning of an organisation<\/li>\n<li>I-ORG, inside an organisation<\/li>\n<li>B-LOC, beginning of a location<\/li>\n<li>I-LOC, inside a location<\/li>\n<\/ul>\n<p>If your dataset has labels that differ from those in the CoNLL-2002\/2003 datasets, run the line below to get the unique labels from your data and save them into <code>labels.txt<\/code>. This file will be used when we start fine-tuning our model.<\/p>\n<pre><code>!cat train.txt dev.txt test.txt | cut -d \" \" -f 2 | grep -v \"^$\" | sort | uniq > labels.txt\n<\/code><\/pre>\n<h1><span class=\"ez-toc-section\" id=\"Fine-tuning_Model\"><\/span>Fine-tuning Model<span class=\"ez-toc-section-end\"><\/span><\/h1>\n<p>Below are the example scripts from the <code>transformers<\/code> repo that we will use to fine-tune our model for NER. On 04\/21\/2020, Hugging Face updated its example scripts to use a new <code>Trainer<\/code> class. 
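As an aside, if shell utilities like cut and grep are unavailable, the labels.txt extraction above can be mirrored in plain Python. This is a minimal sketch, assuming the two-column word/tag files produced by preprocess.py:

```python
def collect_labels(paths):
    """Return the sorted set of NER tags found in two-column CoNLL-style files."""
    labels = set()
    for path in paths:
        with open(path, encoding="utf-8") as f:
            for line in f:
                parts = line.strip().split(" ")
                if len(parts) > 1:  # blank lines mark sentence breaks; skip them
                    labels.add(parts[-1])
    return sorted(labels)
```

Writing `"\n".join(collect_labels(["train.txt", "dev.txt", "test.txt"]))` to labels.txt reproduces the output of the shell pipeline.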
To avoid conflicts, let's use the version from before these updates.<\/p>\n<pre><code>%%capture\n!wget \"https:\/\/raw.githubusercontent.com\/chriskhanhtran\/spanish-bert\/master\/ner\/run_ner.py\"\n!wget \"https:\/\/raw.githubusercontent.com\/chriskhanhtran\/spanish-bert\/master\/ner\/utils_ner.py\"\n<\/code><\/pre>\n<p>Now it's time for transfer learning. In my <a href=\"https:\/\/chriskhanhtran.github.io\/posts\/spanberta-bert-for-spanish-from-scratch\/\">previous blog post<\/a>, I pretrained a RoBERTa language model on a very large Spanish corpus to predict masked words based on the context they appear in. In doing so, the model learned inherent properties of the language. I have uploaded the pretrained model to Hugging Face's server. Now we will load the model and start fine-tuning it for the NER task.<\/p>\n<p>Below are our training hyperparameters.<\/p>\n<pre><code>MAX_LENGTH = 128 #@param {type: \"integer\"}\nMODEL = \"chriskhanhtran\/spanberta\" #@param [\"chriskhanhtran\/spanberta\", \"bert-base-multilingual-cased\"]\nOUTPUT_DIR = \"spanberta-ner\" #@param [\"spanberta-ner\", \"bert-base-ml-ner\"]\nBATCH_SIZE = 32 #@param {type: \"integer\"}\nNUM_EPOCHS = 3 #@param {type: \"integer\"}\nSAVE_STEPS = 100 #@param {type: \"integer\"}\nLOGGING_STEPS = 100 #@param {type: \"integer\"}\nSEED = 42 #@param {type: \"integer\"}\n<\/code><\/pre>\n<p>Let's start training.<\/p>\n<pre><code>!python3 run_ner.py \\\n  --data_dir .\/ \\\n  --model_type bert \\\n  --labels .\/labels.txt \\\n  --model_name_or_path $MODEL \\\n  --output_dir $OUTPUT_DIR \\\n  --max_seq_length $MAX_LENGTH \\\n  --num_train_epochs $NUM_EPOCHS \\\n  --per_gpu_train_batch_size $BATCH_SIZE \\\n  --save_steps $SAVE_STEPS \\\n  --logging_steps $LOGGING_STEPS \\\n  --seed $SEED \\\n  --do_train \\\n  --do_eval \\\n  --do_predict \\\n  --overwrite_output_dir\n<\/code><\/pre>\n<p>Performance on the dev set:<\/p>\n<pre><code>04\/21\/2020 02:24:31 - INFO - __main__ -   ***** Eval results  *****\n04\/21\/2020 
02:24:31 - INFO - __main__ -     f1 = 0.831027443864822\n04\/21\/2020 02:24:31 - INFO - __main__ -     loss = 0.1004064822183894\n04\/21\/2020 02:24:31 - INFO - __main__ -     precision = 0.8207885304659498\n04\/21\/2020 02:24:31 - INFO - __main__ -     recall = 0.8415250344510795\n<\/code><\/pre>\n<p>Performance on the test set:<\/p>\n<pre><code>04\/21\/2020 02:24:48 - INFO - __main__ -   ***** Eval results  *****\n04\/21\/2020 02:24:48 - INFO - __main__ -     f1 = 0.8559533721898419\n04\/21\/2020 02:24:48 - INFO - __main__ -     loss = 0.06848683688204177\n04\/21\/2020 02:24:48 - INFO - __main__ -     precision = 0.845858475041141\n04\/21\/2020 02:24:48 - INFO - __main__ -     recall = 0.8662921348314607\n<\/code><\/pre>\n<p>Here are the TensorBoard logs of fine-tuning <a href=\"https:\/\/tensorboard.dev\/experiment\/Ggs7aCjWQ0exU2Nbp3pPlQ\/#scalars&#038;_smoothingWeight=0.265\">spanberta<\/a> and <a href=\"https:\/\/tensorboard.dev\/experiment\/M9AXw2lORjeRzFZzEJOxkA\/#scalars\">bert-base-multilingual-cased<\/a> for 5 epochs. 
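As a quick sanity check on the logged dev-set numbers, the reported F1 is simply the harmonic mean of the precision and recall above:

```python
# Dev-set values logged by run_ner.py above
precision = 0.8207885304659498
recall = 0.8415250344510795

# Micro-averaged F1 is the harmonic mean of precision and recall
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 6))  # 0.831027, matching the logged f1
```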
We can see that the models overfit the training data after 3 epochs.<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/raw.githubusercontent.com\/chriskhanhtran\/spanish-bert\/master\/img\/spanberta-ner-tb-5.JPG\" alt=\"\"><\/p>\n<p><strong>Classification Report<\/strong><\/p>\n<p>To understand how well our model actually performs, let's load its predictions and examine the classification report.<\/p>\n<pre><code>def read_examples_from_file(file_path):\n    \"\"\"Read words and labels from a CoNLL-2002\/2003 data file.\n    Args:\n      file_path (str): path to NER data file.\n    Returns:\n      examples (dict): a dictionary with two keys: <code>words<\/code> (list of lists)\n        holding words in each sequence, and <code>labels<\/code> (list of lists) holding\n        corresponding labels.\n    \"\"\"\n    with open(file_path, encoding=\"utf-8\") as f:\n        examples = {\"words\": [], \"labels\": []}\n        words = []\n        labels = []\n        for line in f:\n            if line.startswith(\"-DOCSTART-\") or line == \"\" or line == \"\\n\":\n                if words:\n                    examples[\"words\"].append(words)\n                    examples[\"labels\"].append(labels)\n                    words = []\n                    labels = []\n            else:\n                splits = line.split(\" \")\n                words.append(splits[0])\n                if len(splits) > 1:\n                    labels.append(splits[-1].replace(\"\\n\", \"\"))\n                else:\n                    # Examples could have no label for mode = \"test\"\n                    labels.append(\"O\")\n    return examples\n<\/code><\/pre>\n<p>Read data and labels from the raw text files:<\/p>\n<pre><code>y_true = read_examples_from_file(\"test.txt\")[\"labels\"]\ny_pred = read_examples_from_file(\"spanberta-ner\/test_predictions.txt\")[\"labels\"]\n<\/code><\/pre>\n<p>Print the classification report:<\/p>\n<pre><code>from seqeval.metrics import classification_report as 
classification_report_seqeval\nprint(classification_report_seqeval(y_true, y_pred))\n<\/code><\/pre>\n<pre><code>           precision    recall  f1-score   support\n      LOC       0.87      0.84      0.85      1084\n      ORG       0.82      0.87      0.85      1401\n     MISC       0.63      0.66      0.65       340\n      PER       0.94      0.96      0.95       735\nmicro avg       0.84      0.86      0.85      3560\nmacro avg       0.84      0.86      0.85      3560\n<\/code><\/pre>\n<p>The metrics in this report are computed at the entity level, as is standard for tasks such as NER and POS tagging: every token of an entity must be predicted correctly for the entity to count as one correct prediction. As a result, the scores in this report are much lower than those in the token-level <a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.metrics.classification_report.html\">scikit-learn classification report<\/a>.<\/p>\n<pre><code>import numpy as np\nfrom sklearn.metrics import classification_report\nprint(classification_report(np.concatenate(y_true), np.concatenate(y_pred)))\n<\/code><\/pre>\n<pre><code>              precision    recall  f1-score   support\n       B-LOC       0.88      0.85      0.86      1084\n      B-MISC       0.73      0.73      0.73       339\n       B-ORG       0.87      0.91      0.89      1400\n       B-PER       0.95      0.96      0.95       735\n       I-LOC       0.82      0.81      0.81       325\n      I-MISC       0.85      0.76      0.80       557\n       I-ORG       0.89      0.87      0.88      1104\n       I-PER       0.98      0.98      0.98       634\n           O       1.00      1.00      1.00     45355\n    accuracy                           0.98     51533\n   macro avg       0.89      0.87      0.88     51533\nweighted avg       0.98      0.98      0.98     51533\n<\/code><\/pre>\n<p>From the above reports, our model performs well at predicting person, location and organization entities. 
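The gap between the two reports can be made concrete with a toy example. The sketch below uses a simplified IOB2 span extractor (not seqeval's actual implementation; stray I- tags without a preceding B- are ignored here) to show how a single wrong token costs a whole entity at the entity level:

```python
def extract_entities(tags):
    """Return (type, start, end) spans from an IOB2 tag sequence (simplified)."""
    spans, start, etype = [], None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # trailing "O" flushes the last span
        if etype is not None and tag != "I-" + etype:
            spans.append((etype, start, i))  # current entity ends before token i
            etype = None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return spans

y_true = ["B-PER", "I-PER", "O", "B-LOC"]
y_pred = ["B-PER", "O", "O", "B-LOC"]  # second token of the PER name is missed

token_acc = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
true_spans = set(extract_entities(y_true))
pred_spans = set(extract_entities(y_pred))
entity_recall = len(true_spans & pred_spans) / len(true_spans)

print(token_acc)      # token level: 3 of 4 tags are correct
print(entity_recall)  # entity level: only 1 of 2 gold entities is fully matched
```

One mislabeled token leaves 75% token accuracy but drops entity-level recall to 50%, which is why the seqeval numbers sit well below the scikit-learn ones.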
We would need more data for <code>MISC<\/code> entities to improve the model's performance on that class.<\/p>\n<h1><span class=\"ez-toc-section\" id=\"Pipeline\"><\/span>Pipeline<span class=\"ez-toc-section-end\"><\/span><\/h1>\n<p>After fine-tuning our models, we can share them with the community by following the tutorial on this <a href=\"https:\/\/huggingface.co\/transformers\/model_sharing.html\">page<\/a>. Now we can load the fine-tuned model from Hugging Face's server and use it to predict named entities in Spanish documents.<\/p>\n<pre><code>from transformers import pipeline, AutoModelForTokenClassification, AutoTokenizer\nmodel = AutoModelForTokenClassification.from_pretrained(\"skimai\/spanberta-base-cased-ner-conll02\")\ntokenizer = AutoTokenizer.from_pretrained(\"skimai\/spanberta-base-cased-ner-conll02\")\nner_model = pipeline('ner', model=model, tokenizer=tokenizer)\n<\/code><\/pre>\n<p>The example below is taken from <a href=\"https:\/\/laopinion.com\/2020\/04\/19\/secretario-del-tesoro-advierte-que-la-economia-de-estados-unidos-tardara-meses-en-recuperarse-tras-coronavirus\/\">La Opini\u00f3n<\/a> and means \"<em>The economic recovery of the United States after the coronavirus pandemic will be a matter of months, said Treasury Secretary Steven Mnuchin.<\/em>\"<\/p>\n<pre><code>sequence = \"La recuperaci\u00f3n econ\u00f3mica de los Estados Unidos despu\u00e9s de la \" \\\n           \"pandemia del coronavirus ser\u00e1 cuesti\u00f3n de meses, afirm\u00f3 el \" \\\n           \"Secretario del Tesoro, Steven Mnuchin.\"\nner_model(sequence)\n<\/code><\/pre>\n<pre><code>[{'entity': 'B-ORG', 'score': 0.9155661463737488, 'word': '\u0120Estados'},\n {'entity': 'I-ORG', 'score': 0.800682544708252, 'word': '\u0120Unidos'},\n {'entity': 'I-MISC', 'score': 0.5006815791130066, 'word': '\u0120corona'},\n {'entity': 'I-MISC', 'score': 0.510674774646759, 'word': 'virus'},\n {'entity': 'B-PER', 'score': 0.5558510422706604, 'word': '\u0120Secretario'},\n 
{'entity': 'I-PER', 'score': 0.7758238315582275, 'word': '\u0120del'},\n {'entity': 'I-PER', 'score': 0.7096233367919922, 'word': '\u0120Tesoro'},\n {'entity': 'B-PER', 'score': 0.9940345883369446, 'word': '\u0120Steven'},\n {'entity': 'I-PER', 'score': 0.9962581992149353, 'word': '\u0120M'},\n {'entity': 'I-PER', 'score': 0.9918380379676819, 'word': 'n'},\n {'entity': 'I-PER', 'score': 0.9848328828811646, 'word': 'uch'},\n {'entity': 'I-PER', 'score': 0.8513168096542358, 'word': 'in'}]\n<\/code><\/pre>\n<p>Looks great! The fine-tuned model successfully recognizes all entities in our example, and even recognizes \"corona virus.\"<\/p>\n<h1><span class=\"ez-toc-section\" id=\"Conclusion\"><\/span>Conclusion<span class=\"ez-toc-section-end\"><\/span><\/h1>\n<p>Named-entity recognition can help us quickly extract important information from texts. Therefore, its application in business can have a direct impact on improving productivity when reading contracts and documents. However, it is a challenging NLP task because NER requires accurate classification at the word level, making simple approaches such as bag-of-words unsuitable for the task.<\/p>\n<p>We have walked through how to leverage a pretrained BERT model to quickly achieve excellent performance on the NER task for Spanish. The pretrained SpanBERTa model can also be fine-tuned for other tasks such as document classification. I have written a detailed tutorial on fine-tuning BERT for sequence classification and sentiment analysis.<\/p>\n<ul>\n<li><a href=\"http:\/\/skimai.com\/fine-tuning-bert-for-sentiment-analysis\/\">Fine-tuning BERT for Sentiment Analysis<\/a><\/li>\n<\/ul>\n<p>Next in this series, in Part III, we will discuss how to use ELECTRA, a more efficient pre-training approach for transformer models that can quickly achieve state-of-the-art performance. 
Stay tuned!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Tutorial: How to Fine-tune BERT for NER Originally published by Skim AI&#8217;s Machine Learning Researcher, Chris Tran. Introduction This article is on how to fine-tune BERT for Named Entity Recognition (NER). Specifically, how to train a BERT variation, SpanBERTa, for NER. It is Part II of III in a series on training custom BERT Language [&hellip;]<\/p>\n","protected":false},"author":1003,"featured_media":3734,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"single-custom-post-template.php","format":"image","meta":{"_et_pb_use_builder":"on","_et_pb_old_content":"<h1>Tutorial: How to Fine-tune BERT for NER<\/h1>        \n<pre><code>    Originally published by Skim AI's Machine Learning Researcher, Chris Tran.<\/code><\/pre>\n<p><a href=\"https:\/\/colab.research.google.com\/drive\/1ezuE7wC7Fa21Wu3fvzRffx2m14CAySS1#scrollTo=LhKZ3vItVBzi\"><img src=\"https:\/\/img.shields.io\/badge\/Colab-Run_in_Google_Colab-blue?logo=Google&logoColor=FDBA18\" alt=\"Run in Google Colab\"><\/a><\/p><\/p>\n<h1>Introduction<\/h1>\n<p>This article is on how to fine-tune BERT for Named Entity Recognition (NER). Specifically, how to train a BERT variation, SpanBERTa, for NER. It is Part II of III in a series on training custom BERT Language Models for Spanish for a variety of use cases:<\/p>\n<p><br><br><\/p>\n<ul>\n<li><a href=\"http:\/\/skimai.com\/roberta-language-model-for-spanish\/\">Part I: How to Train a RoBERTa Language Model for Spanish from Scratch<\/a><\/li>\n<li><a href=\"http:\/\/skimai.com\/how-to-train-electra-language-model-for-spanish\/\">Part III: How to Train an ELECTRA Language Model for Spanish from Scratch<\/a><\/li>\n<\/ul>\n<p>In my previous blog post, we have discussed how my team pretrained SpanBERTa, a transformer language model for Spanish, on a big corpus from scratch. The model has shown to be able to predict correctly masked words in a sequence based on its context. 
In this blog post, to really leverage the power of transformer models, we will fine-tune SpanBERTa for a named-entity recognition task.<\/p>\n<p>According to its definition on <a href=\"https:\/\/en.wikipedia.org\/wiki\/Named-entity_recognition\">Wikipedia<\/a>, Named-entity recognition (NER) (also known as entity identification, entity chunking and entity extraction) is a subtask of information extraction that seeks to locate and classify named entity mentioned in unstructured text into pre-defined categories such as person names, organizations, locations, medical codes, time expressions, quantities, monetary values, percentages, etc.<\/p>\n<p>We will use the script <a href=\"https:\/\/github.com\/huggingface\/transformers\/blob\/master\/examples\/ner\/run_ner.py\"><code>run_ner.py<\/code><\/a> by Hugging Face and <a href=\"https:\/\/www.kaggle.com\/nltkdata\/conll-corpora\">CoNLL-2002 dataset<\/a> to fine-tune SpanBERTa.<\/p>\n<h1>Setup<\/h1>\n<p>Download <code>transformers<\/code> and install required packages.<\/p>\n<pre><code>%%capture\n!git clone https:\/\/github.com\/huggingface\/transformers\n%cd transformers\n!pip install .\n!pip install -r .\/examples\/requirements.txt\n%cd ..\n<\/code><\/pre>\n<h1>Data<\/h1>\n<h2>1. Download Datasets<\/h2>\n<p>The below command will download and unzip the dataset. 
The files contain the train and test data for three parts of the <a href=\"https:\/\/www.clips.uantwerpen.be\/conll2002\/ner\/\">CoNLL-2002<\/a> shared task:<\/p>\n<ul>\n<li>esp.testa: Spanish test data for the development stage<\/li>\n<li>esp.testb: Spanish test data<\/li>\n<li>esp.train: Spanish train data<\/li>\n<\/ul>\n<pre><code>%%capture\n!wget -O 'conll2002.zip' 'https:\/\/drive.google.com\/uc?export=download&id=1Wrl1b39ZXgKqCeAFNM9EoXtA1kzwNhCe'\n!unzip 'conll2002.zip'\n<\/code><\/pre>\n<p>The size of each dataset:<\/p>\n<pre><code>!wc -l conll2002\/esp.train\n!wc -l conll2002\/esp.testa\n!wc -l conll2002\/esp.testb\n<\/code><\/pre>\n<pre><code>273038 conll2002\/esp.train\n54838 conll2002\/esp.testa\n53050 conll2002\/esp.testb\n<\/code><\/pre>\n<p>All data files has three columns: words, associated part-of-speech tags and named entity tags in the IOB2 format. Sentence breaks are encoded by empty lines.<\/p>\n<pre><code>!head -n20 conll2002\/esp.train\n<\/code><\/pre>\n<pre><code>Melbourne NP B-LOC\n( Fpa O\nAustralia NP B-LOC\n) Fpt O\n, Fc O\n25 Z O\nmay NC O\n( Fpa O\nEFE NC B-ORG\n) Fpt O\n. Fp O\n- Fg O\nEl DA O\nAbogado NC B-PER\nGeneral AQ I-PER\ndel SP I-PER\nEstado NC I-PER\n, Fc O\n<\/code><\/pre>\n<p>We will only keep the word column and the named entity tag column for our train, dev and test datasets.<\/p>\n<pre><code>!cat conll2002\/esp.train | cut -d \" \" -f 1,3 > train_temp.txt\n!cat conll2002\/esp.testa | cut -d \" \" -f 1,3 > dev_temp.txt\n!cat conll2002\/esp.testb | cut -d \" \" -f 1,3 > test_temp.txt\n<\/code><\/pre>\n<h2>2. 
Preprocessing<\/h2>\n<p>Let's define some variables that we need for further pre-processing steps and training the model:<\/p>\n<pre><code>MAX_LENGTH = 120 #@param {type: \"integer\"}\nMODEL = \"chriskhanhtran\/spanberta\" #@param [\"chriskhanhtran\/spanberta\", \"bert-base-multilingual-cased\"]\n<\/code><\/pre>\n<p>The script below will split sentences longer than <code>MAX_LENGTH<\/code> (in terms of tokens) into small ones. Otherwise, long sentences will be truncated when tokenized, causing the loss of training data and some tokens in the test set not being predicted.<\/p>\n<pre><code>%%capture\n!wget \"https:\/\/raw.githubusercontent.com\/stefan-it\/fine-tuned-berts-seq\/master\/scripts\/preprocess.py\"\n<\/code><\/pre>\n<pre><code>!python3 preprocess.py train_temp.txt $MODEL $MAX_LENGTH > train.txt\n!python3 preprocess.py dev_temp.txt $MODEL $MAX_LENGTH > dev.txt\n!python3 preprocess.py test_temp.txt $MODEL $MAX_LENGTH > test.txt\n<\/code><\/pre>\n<pre><code>2020-04-22 23:02:05.747294: I tensorflow\/stream_executor\/platform\/default\/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1\nDownloading: 100% 1.03k\/1.03k [00:00<00:00, 704kB\/s]\nDownloading: 100% 954k\/954k [00:00<00:00, 1.89MB\/s]\nDownloading: 100% 512k\/512k [00:00<00:00, 1.19MB\/s]\nDownloading: 100% 16.0\/16.0 [00:00<00:00, 12.6kB\/s]\n2020-04-22 23:02:23.409488: I tensorflow\/stream_executor\/platform\/default\/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1\n2020-04-22 23:02:31.168967: I tensorflow\/stream_executor\/platform\/default\/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1\n<\/code><\/pre>\n<h2>3. 
Labels<\/h2>\n<p>In CoNLL-2002\/2003 datasets, there are have 9 classes of NER tags:<\/p>\n<ul>\n<li>O, Outside of a named entity<\/li>\n<li>B-MIS, Beginning of a miscellaneous entity right after another miscellaneous entity<\/li>\n<li>I-MIS, Miscellaneous entity<\/li>\n<li>B-PER, Beginning of a person\u2019s name right after another person\u2019s name<\/li>\n<li>I-PER, Person\u2019s name<\/li>\n<li>B-ORG, Beginning of an organisation right after another organisation<\/li>\n<li>I-ORG, Organisation<\/li>\n<li>B-LOC, Beginning of a location right after another location<\/li>\n<li>I-LOC, Location<\/li>\n<\/ul>\n<p>If your dataset has different labels or more labels than CoNLL-2002\/2003 datasets, run the line below to get unique labels from your data and save them into <code>labels.txt<\/code>. This file will be used when we start fine-tuning our model.<\/p>\n<pre><code>!cat train.txt dev.txt test.txt | cut -d \" \" -f 2 | grep -v \"^$\"| sort | uniq > labels.txt\n<\/code><\/pre>\n<h1>Fine-tuning Model<\/h1>\n<p>These are the example scripts from <code>transformers<\/code>'s repo that we will use to fine-tune our model for NER. After 04\/21\/2020, Hugging Face has updated their example scripts to use a new <code>Trainer<\/code> class. To avoid any future conflict, let's use the version before they made these updates.<\/p>\n<pre><code>%%capture\n!wget \"https:\/\/raw.githubusercontent.com\/chriskhanhtran\/spanish-bert\/master\/ner\/run_ner.py\"\n!wget \"https:\/\/raw.githubusercontent.com\/chriskhanhtran\/spanish-bert\/master\/ner\/utils_ner.py\"\n<\/code><\/pre>\n<p>Now it's time for transfer learning. In my <a href=\"https:\/\/chriskhanhtran.github.io\/posts\/spanberta-bert-for-spanish-from-scratch\/\">previous blog post<\/a>, I have pretrained a RoBERTa language model on a very large Spanish corpus to predict masked words based on the context they are in. By doing that, the model has learned inherent properties of the language. 
I have uploaded the pretrained model to Hugging Face's server. Now we will load the model and start fine-tuning it for the NER task.<\/p>\n<p>Below are our training hyperparameters.<\/p>\n<pre><code>MAX_LENGTH = 128 #@param {type: \"integer\"}\nMODEL = \"chriskhanhtran\/spanberta\" #@param [\"chriskhanhtran\/spanberta\", \"bert-base-multilingual-cased\"]\nOUTPUT_DIR = \"spanberta-ner\" #@param [\"spanberta-ner\", \"bert-base-ml-ner\"]\nBATCH_SIZE = 32 #@param {type: \"integer\"}\nNUM_EPOCHS = 3 #@param {type: \"integer\"}\nSAVE_STEPS = 100 #@param {type: \"integer\"}\nLOGGING_STEPS = 100 #@param {type: \"integer\"}\nSEED = 42 #@param {type: \"integer\"}\n<\/code><\/pre>\n<p>Let's start training.<\/p>\n<pre><code>!python3 run_ner.py \n  --data_dir .\/ \n  --model_type bert \n  --labels .\/labels.txt \n  --model_name_or_path $MODEL \n  --output_dir $OUTPUT_DIR \n  --max_seq_length  $MAX_LENGTH \n  --num_train_epochs $NUM_EPOCHS \n  --per_gpu_train_batch_size $BATCH_SIZE \n  --save_steps $SAVE_STEPS \n  --logging_steps $LOGGING_STEPS \n  --seed $SEED \n  --do_train \n  --do_eval \n  --do_predict \n  --overwrite_output_dir\n<\/code><\/pre>\n<p>Performance on the dev set:<\/p>\n<pre><code>04\/21\/2020 02:24:31 - INFO - __main__ -   ***** Eval results  *****\n04\/21\/2020 02:24:31 - INFO - __main__ -     f1 = 0.831027443864822\n04\/21\/2020 02:24:31 - INFO - __main__ -     loss = 0.1004064822183894\n04\/21\/2020 02:24:31 - INFO - __main__ -     precision = 0.8207885304659498\n04\/21\/2020 02:24:31 - INFO - __main__ -     recall = 0.8415250344510795\n<\/code><\/pre>\n<p>Performance on the test set:<\/p>\n<pre><code>04\/21\/2020 02:24:48 - INFO - __main__ -   ***** Eval results  *****\n04\/21\/2020 02:24:48 - INFO - __main__ -     f1 = 0.8559533721898419\n04\/21\/2020 02:24:48 - INFO - __main__ -     loss = 0.06848683688204177\n04\/21\/2020 02:24:48 - INFO - __main__ -     precision = 0.845858475041141\n04\/21\/2020 02:24:48 - INFO - __main__ -     recall = 
0.8662921348314607\n<\/code><\/pre>\n<p>Here are the TensorBoard logs of fine-tuning <a href=\"https:\/\/tensorboard.dev\/experiment\/Ggs7aCjWQ0exU2Nbp3pPlQ\/#scalars&_smoothingWeight=0.265\">spanberta<\/a> and <a href=\"https:\/\/tensorboard.dev\/experiment\/M9AXw2lORjeRzFZzEJOxkA\/#scalars\">bert-base-multilingual-cased<\/a> for 5 epochs. We can see that the models overfit the training data after 3 epochs.<\/p>\n<p><img src=\"https:\/\/raw.githubusercontent.com\/chriskhanhtran\/spanish-bert\/master\/img\/spanberta-ner-tb-5.JPG\" alt=\"\"><\/p>\n<p><strong>Classification Report<\/strong><\/p>\n<p>To understand how well our model actually performs, let's load its predictions and examine the classification report.<\/p>\n<pre><code>def read_examples_from_file(file_path):\n    \"\"\"Read words and labels from a CoNLL-2002\/2003 data file.\n    Args:\n      file_path (str): path to NER data file.\n    Returns:\n      examples (dict): a dictionary with two keys: `words` (list of lists)\n        holding words in each sequence, and `labels` (list of lists) holding\n        corresponding labels.\n    \"\"\"\n    with open(file_path, encoding=\"utf-8\") as f:\n        examples = {\"words\": [], \"labels\": []}\n        words = []\n        labels = []\n        for line in f:\n            if line.startswith(\"-DOCSTART-\") or line == \"\" or line == \"\\n\":\n                if words:\n                    examples[\"words\"].append(words)\n                    examples[\"labels\"].append(labels)\n                    words = []\n                    labels = []\n            else:\n                splits = line.split(\" \")\n                words.append(splits[0])\n                if len(splits) > 1:\n                    labels.append(splits[-1].replace(\"\\n\", \"\"))\n                else:\n                    # Examples could have no label for mode = \"test\"\n                    labels.append(\"O\")\n        # Flush the last example if the file does not end with a blank line\n        if words:\n            examples[\"words\"].append(words)\n            examples[\"labels\"].append(labels)\n    return examples\n<\/code><\/pre>\n<p>Read data and 
labels from the raw text files:<\/p>\n<pre><code>y_true = read_examples_from_file(\"test.txt\")[\"labels\"]\ny_pred = read_examples_from_file(\"spanberta-ner\/test_predictions.txt\")[\"labels\"]\n<\/code><\/pre>\n<p>Print the classification report:<\/p>\n<pre><code>from seqeval.metrics import classification_report as classification_report_seqeval\nprint(classification_report_seqeval(y_true, y_pred))\n<\/code><\/pre>\n<pre><code>           precision    recall  f1-score   support\n      LOC       0.87      0.84      0.85      1084\n      ORG       0.82      0.87      0.85      1401\n     MISC       0.63      0.66      0.65       340\n      PER       0.94      0.96      0.95       735\nmicro avg       0.84      0.86      0.85      3560\nmacro avg       0.84      0.86      0.85      3560\n<\/code><\/pre>\n<p>The metrics we are seeing in this report are designed specifically for NLP tasks such as NER and POS tagging, in which all words of an entity need to be predicted correctly to be counted as one correct prediction. 
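To see why entity-level scoring is stricter, here is a simplified, pure-Python sketch of the span matching that seqeval performs under the hood. The `entities` helper and the toy sequences are illustrative only, not seqeval's actual implementation.

```python
def entities(tags):
    """Extract (type, start, end) entity spans from an IOB tag sequence."""
    spans, start, etype = [], None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel flushes the last span
        if tag.startswith("I-") and etype == tag[2:]:
            continue  # same entity continues
        if etype is not None:
            spans.append((etype, start, i - 1))  # close the open span
            etype = None
        if tag.startswith(("B-", "I-")):  # a stray I- also opens a span (IOB1-tolerant)
            etype, start = tag[2:], i
    return spans

true = ["B-PER", "I-PER", "O", "B-LOC"]
pred = ["B-PER", "O",     "O", "B-LOC"]

# Token-level accuracy looks decent: 3 of 4 tags match.
token_acc = sum(t == p for t, p in zip(true, pred)) / len(true)  # 0.75

# Entity-level: the truncated PER span does not match, so only LOC counts.
hits = set(entities(true)) & set(entities(pred))
precision = len(hits) / len(entities(pred))  # 0.5
recall = len(hits) / len(entities(true))     # 0.5
```

A single wrong tag inside a multi-token entity therefore costs the whole entity, which is why entity-level precision and recall sit below their token-level counterparts.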
Therefore, the metrics in this classification report are much lower than those in <a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.metrics.classification_report.html\">scikit-learn's classification report<\/a>.<\/p>\n<pre><code>import numpy as np\nfrom sklearn.metrics import classification_report\nprint(classification_report(np.concatenate(y_true), np.concatenate(y_pred)))\n<\/code><\/pre>\n<pre><code>              precision    recall  f1-score   support\n       B-LOC       0.88      0.85      0.86      1084\n      B-MISC       0.73      0.73      0.73       339\n       B-ORG       0.87      0.91      0.89      1400\n       B-PER       0.95      0.96      0.95       735\n       I-LOC       0.82      0.81      0.81       325\n      I-MISC       0.85      0.76      0.80       557\n       I-ORG       0.89      0.87      0.88      1104\n       I-PER       0.98      0.98      0.98       634\n           O       1.00      1.00      1.00     45355\n    accuracy                           0.98     51533\n   macro avg       0.89      0.87      0.88     51533\nweighted avg       0.98      0.98      0.98     51533\n<\/code><\/pre>\n<p>From the reports above, our model performs well at predicting person, location, and organization entities. We will need more data for <code>MISC<\/code> entities to improve its performance on them.<\/p>\n<h1>Pipeline<\/h1>\n<p>After fine-tuning our models, we can share them with the community by following the tutorial on this <a href=\"https:\/\/huggingface.co\/transformers\/model_sharing.html\">page<\/a>. 
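One practical note before running the pipeline: older versions of `transformers` return one prediction per subword piece rather than per word. Here is a hedged, pure-Python sketch of regrouping such output, where the leading `Ġ` marks a word-initial piece in RoBERTa's BPE tokenizer; `merge_subwords` and the sample pieces are illustrative, not the library's own aggregation logic.

```python
def merge_subwords(preds):
    """Merge per-subword NER predictions into word-level predictions.

    Keeps the first piece's tag for the whole word and averages the scores.
    Toy regrouping only; newer transformers versions do this internally.
    """
    words = []
    for p in preds:
        piece = p["word"]
        if piece.startswith("\u0120") or not words:  # 'Ġ' starts a new word
            words.append({"word": piece.lstrip("\u0120"),
                          "entity": p["entity"],
                          "scores": [p["score"]]})
        else:  # continuation piece: glue onto the previous word
            words[-1]["word"] += piece
            words[-1]["scores"].append(p["score"])
    return [{"word": w["word"], "entity": w["entity"],
             "score": sum(w["scores"]) / len(w["scores"])} for w in words]

pieces = [{"entity": "B-PER", "score": 0.99, "word": "\u0120Steven"},
          {"entity": "I-PER", "score": 0.99, "word": "\u0120M"},
          {"entity": "I-PER", "score": 0.99, "word": "n"},
          {"entity": "I-PER", "score": 0.98, "word": "uch"},
          {"entity": "I-PER", "score": 0.85, "word": "in"}]
merged = merge_subwords(pieces)
# "Steven" and "Mnuchin" come out as two word-level predictions.
```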
Now we can start loading the fine-tuned model from Hugging Face's server and use it to predict named entities in Spanish documents.<\/p>\n<pre><code>from transformers import pipeline, AutoModelForTokenClassification, AutoTokenizer\nmodel = AutoModelForTokenClassification.from_pretrained(\"skimai\/spanberta-base-cased-ner-conll02\")\ntokenizer = AutoTokenizer.from_pretrained(\"skimai\/spanberta-base-cased-ner-conll02\")\nner_model = pipeline('ner', model=model, tokenizer=tokenizer)\n<\/code><\/pre>\n<p>The example below is taken from <a href=\"https:\/\/laopinion.com\/2020\/04\/19\/secretario-del-tesoro-advierte-que-la-economia-de-estados-unidos-tardara-meses-en-recuperarse-tras-coronavirus\/\">La Opini\u00f3n<\/a> and translates to \"<em>The economic recovery of the United States after the coronavirus pandemic will be a matter of months, said Treasury Secretary Steven Mnuchin.<\/em>\"<\/p>\n<pre><code>sequence = \"La recuperaci\u00f3n econ\u00f3mica de los Estados Unidos despu\u00e9s de la \" \\\n           \"pandemia del coronavirus ser\u00e1 cuesti\u00f3n de meses, afirm\u00f3 el \" \\\n           \"Secretario del Tesoro, Steven Mnuchin.\"\nner_model(sequence)\n<\/code><\/pre>\n<pre><code>[{'entity': 'B-ORG', 'score': 0.9155661463737488, 'word': '\u0120Estados'},\n {'entity': 'I-ORG', 'score': 0.800682544708252, 'word': '\u0120Unidos'},\n {'entity': 'I-MISC', 'score': 0.5006815791130066, 'word': '\u0120corona'},\n {'entity': 'I-MISC', 'score': 0.510674774646759, 'word': 'virus'},\n {'entity': 'B-PER', 'score': 0.5558510422706604, 'word': '\u0120Secretario'},\n {'entity': 'I-PER', 'score': 0.7758238315582275, 'word': '\u0120del'},\n {'entity': 'I-PER', 'score': 0.7096233367919922, 'word': '\u0120Tesoro'},\n {'entity': 'B-PER', 'score': 0.9940345883369446, 'word': '\u0120Steven'},\n {'entity': 'I-PER', 'score': 0.9962581992149353, 'word': '\u0120M'},\n {'entity': 'I-PER', 'score': 0.9918380379676819, 'word': 'n'},\n {'entity': 'I-PER', 'score': 0.9848328828811646, 'word': 
'uch'},\n {'entity': 'I-PER', 'score': 0.8513168096542358, 'word': 'in'}]\n<\/code><\/pre>\n<p>Looks great! The fine-tuned model successfully recognizes all entities in our example, and even recognizes \"corona virus.\"<\/p>\n<h1>Conclusion<\/h1>\n<p>Named-entity recognition can help us quickly extract important information from texts. Therefore, its application in business can have a direct impact on improving human productivity in reading contracts and documents. However, it is a challenging NLP task because NER requires accurate classification at the word level, making simple approaches such as bag-of-words unsuitable for this task.<\/p>\n<p>We have walked through how we can leverage a pretrained BERT model to quickly achieve excellent performance on the NER task for Spanish. The pretrained SpanBERTa model can also be fine-tuned for other tasks such as document classification. I have written a detailed tutorial on fine-tuning BERT for sequence classification and sentiment analysis.<\/p>\n<ul>\n<li><a href=\"http:\/\/skimai.com\/fine-tuning-bert-for-sentiment-analysis\/\">Fine-tuning BERT for Sentiment Analysis<\/a><\/li>\n<\/ul>\n<p>Next in this series, in Part 3, we will discuss how to use ELECTRA, a more efficient pre-training approach for transformer models that can quickly achieve state-of-the-art performance. 
Stay tuned!","_et_gb_content_width":"","footnotes":""},"categories":[125,64,67],"tags":[74,79,76,75,82],"class_list":["post-3732","post","type-post","status-publish","format-image","has-post-thumbnail","hentry","category-enterprise-ai-blog","category-how-to","category-ml-nlp","tag-bert","tag-how-to","tag-ner","tag-nlp","tag-tutorial","post_format-post-format-image"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v24.1 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Tutorial: How to Fine-Tune BERT for Named Entity Recognition (NER)<\/title>\n<meta name=\"description\" content=\"&quot;How to&quot; fine-tune BERT for NER. Part 2 in a 3-part series on how to train BERT, roBERTa, and ELECTRA language models for multiple use cases\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/skimai.com\/ja\/\u540d\u524d\u4ed8\u304d\u30a8\u30f3\u30c6\u30a3\u30c6\u30a3\u8a8d\u8b58\u306e\u305f\u3081\u306ebert\u306e\u5fae\u8abf\u6574\u65b9\/\" \/>\n<meta property=\"og:locale\" content=\"ja_JP\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Tutorial: How to Fine-Tune BERT for Named Entity Recognition (NER)\" \/>\n<meta property=\"og:description\" content=\"&quot;How to&quot; fine-tune BERT for NER. 
Part 2 in a 3-part series on how to train BERT, roBERTa, and ELECTRA language models for multiple use cases\" \/>\n<meta property=\"og:url\" content=\"https:\/\/skimai.com\/ja\/\u540d\u524d\u4ed8\u304d\u30a8\u30f3\u30c6\u30a3\u30c6\u30a3\u8a8d\u8b58\u306e\u305f\u3081\u306ebert\u306e\u5fae\u8abf\u6574\u65b9\/\" \/>\n<meta property=\"og:site_name\" content=\"Skim AI\" \/>\n<meta property=\"article:published_time\" content=\"2020-12-29T04:33:59+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2024-05-20T12:38:31+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/skimai.com\/wp-content\/uploads\/2020\/12\/part2-1.png\" \/>\n\t<meta property=\"og:image:width\" content=\"2048\" \/>\n\t<meta property=\"og:image:height\" content=\"1152\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"Greggory Elias\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"\u57f7\u7b46\u8005\" \/>\n\t<meta name=\"twitter:data1\" content=\"Greggory Elias\" \/>\n\t<meta name=\"twitter:label2\" content=\"\u63a8\u5b9a\u8aad\u307f\u53d6\u308a\u6642\u9593\" \/>\n\t<meta name=\"twitter:data2\" content=\"6\u5206\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/skimai.com\/how-to-fine-tune-bert-for-named-entity-recognition-ner\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/skimai.com\/how-to-fine-tune-bert-for-named-entity-recognition-ner\/\"},\"author\":{\"name\":\"Greggory Elias\",\"@id\":\"https:\/\/skimai.com\/uk\/#\/schema\/person\/7a883b4a2d2ea22040f42a7975eb86c6\"},\"headline\":\"Tutorial: How to Fine-Tune BERT for Named Entity Recognition 
(NER)\",\"datePublished\":\"2020-12-29T04:33:59+00:00\",\"dateModified\":\"2024-05-20T12:38:31+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/skimai.com\/how-to-fine-tune-bert-for-named-entity-recognition-ner\/\"},\"wordCount\":1013,\"publisher\":{\"@id\":\"https:\/\/skimai.com\/uk\/#organization\"},\"image\":{\"@id\":\"https:\/\/skimai.com\/how-to-fine-tune-bert-for-named-entity-recognition-ner\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/skimai.com\/wp-content\/uploads\/2020\/12\/part2-1.png\",\"keywords\":[\"bert\",\"how to\",\"ner\",\"nlp\",\"tutorial\"],\"articleSection\":[\"Enterprise AI\",\"How to\",\"LLMs \/ NLP\"],\"inLanguage\":\"ja\"},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/skimai.com\/how-to-fine-tune-bert-for-named-entity-recognition-ner\/\",\"url\":\"https:\/\/skimai.com\/how-to-fine-tune-bert-for-named-entity-recognition-ner\/\",\"name\":\"Tutorial: How to Fine-Tune BERT for Named Entity Recognition (NER)\",\"isPartOf\":{\"@id\":\"https:\/\/skimai.com\/uk\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/skimai.com\/how-to-fine-tune-bert-for-named-entity-recognition-ner\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/skimai.com\/how-to-fine-tune-bert-for-named-entity-recognition-ner\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/skimai.com\/wp-content\/uploads\/2020\/12\/part2-1.png\",\"datePublished\":\"2020-12-29T04:33:59+00:00\",\"dateModified\":\"2024-05-20T12:38:31+00:00\",\"description\":\"\\\"How to\\\" fine-tune BERT for NER. 
Part 2 in a 3-part series on how to train BERT, roBERTa, and ELECTRA language models for multiple use cases\",\"breadcrumb\":{\"@id\":\"https:\/\/skimai.com\/how-to-fine-tune-bert-for-named-entity-recognition-ner\/#breadcrumb\"},\"inLanguage\":\"ja\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/skimai.com\/how-to-fine-tune-bert-for-named-entity-recognition-ner\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"ja\",\"@id\":\"https:\/\/skimai.com\/how-to-fine-tune-bert-for-named-entity-recognition-ner\/#primaryimage\",\"url\":\"https:\/\/skimai.com\/wp-content\/uploads\/2020\/12\/part2-1.png\",\"contentUrl\":\"https:\/\/skimai.com\/wp-content\/uploads\/2020\/12\/part2-1.png\",\"width\":2048,\"height\":1152,\"caption\":\"part2 1\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/skimai.com\/how-to-fine-tune-bert-for-named-entity-recognition-ner\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/skimai.com\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Tutorial: How to Fine-Tune BERT for Named Entity Recognition (NER)\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/skimai.com\/uk\/#website\",\"url\":\"https:\/\/skimai.com\/uk\/\",\"name\":\"Skim AI\",\"description\":\"The AI Agent Workforce Platform\",\"publisher\":{\"@id\":\"https:\/\/skimai.com\/uk\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/skimai.com\/uk\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"ja\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/skimai.com\/uk\/#organization\",\"name\":\"Skim 
AI\",\"url\":\"https:\/\/skimai.com\/uk\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"ja\",\"@id\":\"https:\/\/skimai.com\/uk\/#\/schema\/logo\/image\/\",\"url\":\"http:\/\/skimai.com\/wp-content\/uploads\/2020\/07\/SKIM-AI-Header-Logo.png\",\"contentUrl\":\"http:\/\/skimai.com\/wp-content\/uploads\/2020\/07\/SKIM-AI-Header-Logo.png\",\"width\":194,\"height\":58,\"caption\":\"Skim AI\"},\"image\":{\"@id\":\"https:\/\/skimai.com\/uk\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/www.linkedin.com\/company\/skim-ai\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/skimai.com\/uk\/#\/schema\/person\/7a883b4a2d2ea22040f42a7975eb86c6\",\"name\":\"Greggory Elias\",\"url\":\"https:\/\/skimai.com\/ja\/author\/gregg\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"\u30c1\u30e5\u30fc\u30c8\u30ea\u30a2\u30eb\u540d\u524d\u4ed8\u304d\u56fa\u6709\u8868\u73fe\u8a8d\u8b58\uff08NER\uff09\u306e\u305f\u3081\u306eBERT\u306e\u5fae\u8abf\u6574\u65b9\u6cd5","description":"BERT \u3092 NER \u7528\u306b\u5fae\u8abf\u6574\u3059\u308b\u300c\u65b9\u6cd5\u300d\u3002\u8907\u6570\u306e\u30e6\u30fc\u30b9\u30b1\u30fc\u30b9\u306b\u5bfe\u5fdc\u3059\u308b BERT\u3001roBERTa\u3001ELECTRA \u8a00\u8a9e\u30e2\u30c7\u30eb\u306e\u30c8\u30ec\u30fc\u30cb\u30f3\u30b0\u65b9\u6cd5\u306b\u95a2\u3059\u308b 3 \u90e8\u69cb\u6210\u306e\u30d1\u30fc\u30c8 2","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/skimai.com\/ja\/\u540d\u524d\u4ed8\u304d\u30a8\u30f3\u30c6\u30a3\u30c6\u30a3\u8a8d\u8b58\u306e\u305f\u3081\u306ebert\u306e\u5fae\u8abf\u6574\u65b9\/","og_locale":"ja_JP","og_type":"article","og_title":"Tutorial: How to Fine-Tune BERT for Named Entity Recognition (NER)","og_description":"\"How to\" fine-tune BERT for NER. 
Part 2 in a 3-part series on how to train BERT, roBERTa, and ELECTRA language models for multiple use cases","og_url":"https:\/\/skimai.com\/ja\/\u540d\u524d\u4ed8\u304d\u30a8\u30f3\u30c6\u30a3\u30c6\u30a3\u8a8d\u8b58\u306e\u305f\u3081\u306ebert\u306e\u5fae\u8abf\u6574\u65b9\/","og_site_name":"Skim AI","article_published_time":"2020-12-29T04:33:59+00:00","article_modified_time":"2024-05-20T12:38:31+00:00","og_image":[{"width":2048,"height":1152,"url":"https:\/\/skimai.com\/wp-content\/uploads\/2020\/12\/part2-1.png","type":"image\/png"}],"author":"Greggory Elias","twitter_card":"summary_large_image","twitter_misc":{"\u57f7\u7b46\u8005":"Greggory Elias","\u63a8\u5b9a\u8aad\u307f\u53d6\u308a\u6642\u9593":"6\u5206"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/skimai.com\/how-to-fine-tune-bert-for-named-entity-recognition-ner\/#article","isPartOf":{"@id":"https:\/\/skimai.com\/how-to-fine-tune-bert-for-named-entity-recognition-ner\/"},"author":{"name":"Greggory Elias","@id":"https:\/\/skimai.com\/uk\/#\/schema\/person\/7a883b4a2d2ea22040f42a7975eb86c6"},"headline":"Tutorial: How to Fine-Tune BERT for Named Entity Recognition (NER)","datePublished":"2020-12-29T04:33:59+00:00","dateModified":"2024-05-20T12:38:31+00:00","mainEntityOfPage":{"@id":"https:\/\/skimai.com\/how-to-fine-tune-bert-for-named-entity-recognition-ner\/"},"wordCount":1013,"publisher":{"@id":"https:\/\/skimai.com\/uk\/#organization"},"image":{"@id":"https:\/\/skimai.com\/how-to-fine-tune-bert-for-named-entity-recognition-ner\/#primaryimage"},"thumbnailUrl":"https:\/\/skimai.com\/wp-content\/uploads\/2020\/12\/part2-1.png","keywords":["bert","how to","ner","nlp","tutorial"],"articleSection":["Enterprise AI","How to","LLMs \/ 
NLP"],"inLanguage":"ja"},{"@type":"WebPage","@id":"https:\/\/skimai.com\/how-to-fine-tune-bert-for-named-entity-recognition-ner\/","url":"https:\/\/skimai.com\/how-to-fine-tune-bert-for-named-entity-recognition-ner\/","name":"\u30c1\u30e5\u30fc\u30c8\u30ea\u30a2\u30eb\u540d\u524d\u4ed8\u304d\u56fa\u6709\u8868\u73fe\u8a8d\u8b58\uff08NER\uff09\u306e\u305f\u3081\u306eBERT\u306e\u5fae\u8abf\u6574\u65b9\u6cd5","isPartOf":{"@id":"https:\/\/skimai.com\/uk\/#website"},"primaryImageOfPage":{"@id":"https:\/\/skimai.com\/how-to-fine-tune-bert-for-named-entity-recognition-ner\/#primaryimage"},"image":{"@id":"https:\/\/skimai.com\/how-to-fine-tune-bert-for-named-entity-recognition-ner\/#primaryimage"},"thumbnailUrl":"https:\/\/skimai.com\/wp-content\/uploads\/2020\/12\/part2-1.png","datePublished":"2020-12-29T04:33:59+00:00","dateModified":"2024-05-20T12:38:31+00:00","description":"BERT \u3092 NER \u7528\u306b\u5fae\u8abf\u6574\u3059\u308b\u300c\u65b9\u6cd5\u300d\u3002\u8907\u6570\u306e\u30e6\u30fc\u30b9\u30b1\u30fc\u30b9\u306b\u5bfe\u5fdc\u3059\u308b BERT\u3001roBERTa\u3001ELECTRA \u8a00\u8a9e\u30e2\u30c7\u30eb\u306e\u30c8\u30ec\u30fc\u30cb\u30f3\u30b0\u65b9\u6cd5\u306b\u95a2\u3059\u308b 3 \u90e8\u69cb\u6210\u306e\u30d1\u30fc\u30c8 2","breadcrumb":{"@id":"https:\/\/skimai.com\/how-to-fine-tune-bert-for-named-entity-recognition-ner\/#breadcrumb"},"inLanguage":"ja","potentialAction":[{"@type":"ReadAction","target":["https:\/\/skimai.com\/how-to-fine-tune-bert-for-named-entity-recognition-ner\/"]}]},{"@type":"ImageObject","inLanguage":"ja","@id":"https:\/\/skimai.com\/how-to-fine-tune-bert-for-named-entity-recognition-ner\/#primaryimage","url":"https:\/\/skimai.com\/wp-content\/uploads\/2020\/12\/part2-1.png","contentUrl":"https:\/\/skimai.com\/wp-content\/uploads\/2020\/12\/part2-1.png","width":2048,"height":1152,"caption":"part2 
1"},{"@type":"BreadcrumbList","@id":"https:\/\/skimai.com\/how-to-fine-tune-bert-for-named-entity-recognition-ner\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/skimai.com\/"},{"@type":"ListItem","position":2,"name":"Tutorial: How to Fine-Tune BERT for Named Entity Recognition (NER)"}]},{"@type":"WebSite","@id":"https:\/\/skimai.com\/uk\/#website","url":"https:\/\/skimai.com\/uk\/","name":"\u30b9\u30ad\u30e0AI","description":"AI\u30a8\u30fc\u30b8\u30a7\u30f3\u30c8\u30fb\u30ef\u30fc\u30af\u30d5\u30a9\u30fc\u30b9\u30fb\u30d7\u30e9\u30c3\u30c8\u30d5\u30a9\u30fc\u30e0","publisher":{"@id":"https:\/\/skimai.com\/uk\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/skimai.com\/uk\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"ja"},{"@type":"Organization","@id":"https:\/\/skimai.com\/uk\/#organization","name":"\u30b9\u30ad\u30e0AI","url":"https:\/\/skimai.com\/uk\/","logo":{"@type":"ImageObject","inLanguage":"ja","@id":"https:\/\/skimai.com\/uk\/#\/schema\/logo\/image\/","url":"http:\/\/skimai.com\/wp-content\/uploads\/2020\/07\/SKIM-AI-Header-Logo.png","contentUrl":"http:\/\/skimai.com\/wp-content\/uploads\/2020\/07\/SKIM-AI-Header-Logo.png","width":194,"height":58,"caption":"Skim 
AI"},"image":{"@id":"https:\/\/skimai.com\/uk\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.linkedin.com\/company\/skim-ai"]},{"@type":"Person","@id":"https:\/\/skimai.com\/uk\/#\/schema\/person\/7a883b4a2d2ea22040f42a7975eb86c6","name":"\u30b0\u30ec\u30b4\u30ea\u30fc\u30fb\u30a8\u30ea\u30a2\u30b9","url":"https:\/\/skimai.com\/ja\/author\/gregg\/"}]}},"_links":{"self":[{"href":"https:\/\/skimai.com\/ja\/wp-json\/wp\/v2\/posts\/3732","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/skimai.com\/ja\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/skimai.com\/ja\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/skimai.com\/ja\/wp-json\/wp\/v2\/users\/1003"}],"replies":[{"embeddable":true,"href":"https:\/\/skimai.com\/ja\/wp-json\/wp\/v2\/comments?post=3732"}],"version-history":[{"count":0,"href":"https:\/\/skimai.com\/ja\/wp-json\/wp\/v2\/posts\/3732\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/skimai.com\/ja\/wp-json\/wp\/v2\/media\/3734"}],"wp:attachment":[{"href":"https:\/\/skimai.com\/ja\/wp-json\/wp\/v2\/media?parent=3732"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/skimai.com\/ja\/wp-json\/wp\/v2\/categories?post=3732"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/skimai.com\/ja\/wp-json\/wp\/v2\/tags?post=3732"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}