{"id":3733,"date":"2020-12-28T12:06:26","date_gmt":"2020-12-28T17:06:26","guid":{"rendered":"http:\/\/skimai.com\/?p=3733"},"modified":"2024-05-20T07:38:31","modified_gmt":"2024-05-20T12:38:31","slug":"reconocimiento-de-entidades-con-nombre-mediante-transformadores","status":"publish","type":"post","link":"https:\/\/skimai.com\/es\/named-entity-recognition-with-transformers\/","title":{"rendered":"Reconocimiento de entidades con nombre mediante transformadores"},"content":{"rendered":"<h1><span class=\"ez-toc-section\" id=\"Named_Entity_Recognition_with_Transformers\"><\/span>Named Entity Recognition with Transformers<span class=\"ez-toc-section-end\"><\/span><\/h1>\n<p><a href=\"https:\/\/colab.research.google.com\/drive\/1ezuE7wC7Fa21Wu3fvzRffx2m14CAySS1#scrollTo=LhKZ3vItVBzi\">Run this tutorial in Google Colab<\/a><\/p>\n<section>\n<h1 id=\"introduction\"><span class=\"ez-toc-section\" id=\"IntroductionPermalink\"><\/span>Introduction<a title=\"Permalink\" href=\"#introduction\">Permalink<\/a><span class=\"ez-toc-section-end\"><\/span><\/h1>\n<ul>\n<li><a href=\"https:\/\/chriskhanhtran.github.io\/posts\/spanberta-bert-for-spanish-from-scratch\/\">Part I: How We Trained RoBERTa Language Model for Spanish from Scratch<\/a><\/li>\n<\/ul>\n<p>In my previous blog post, we discussed how we pretrained SpanBERTa, a transformer language model for Spanish, on a large corpus from scratch. The model proved able to correctly predict masked words in a sequence based on the surrounding context. 
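<\/p>
<p>As an illustrative aside (not part of the original pipeline), the masked-language-model objective behind this pretraining can be sketched in a few lines of plain Python: hide a random subset of tokens behind a <code>&lt;mask&gt;<\/code> token and train the model to recover the originals. The function below is a toy stand-in for the real data collator, not the actual SpanBERTa code:<\/p>

```python
# Toy sketch of the masked-language-model pretraining objective:
# a random subset of tokens is hidden behind "<mask>" and the model
# is trained to recover them. This is an illustration only, not the
# actual SpanBERTa data pipeline.
import random

def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Return (masked_tokens, targets); targets maps position -> original token."""
    rng = random.Random(seed)
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            masked.append("<mask>")
            targets[i] = tok
        else:
            masked.append(tok)
    return masked, targets

tokens = "El color del cielo es azul".split()
masked, targets = mask_tokens(tokens, mask_prob=0.3)
print(masked)   # original sentence with some tokens hidden
print(targets)  # positions and words the model must predict
```

During pretraining the model sees only the masked sequence and is scored on how well it recovers the hidden tokens.
<p>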
In this blog post, to really leverage the power of transformer models, we will fine-tune SpanBERTa for a named-entity recognition task.<\/p>\n<p>According to its definition on <a href=\"https:\/\/en.wikipedia.org\/wiki\/Named-entity_recognition\">Wikipedia<\/a>, named-entity recognition (NER), also known as entity identification, entity chunking and entity extraction, is a subtask of information extraction that seeks to locate and classify named entities mentioned in unstructured text into pre-defined categories such as person names, organizations, locations, medical codes, time expressions, quantities, monetary values, percentages, etc.<\/p>\n<p>We will use the script <a href=\"https:\/\/github.com\/huggingface\/transformers\/blob\/master\/examples\/ner\/run_ner.py\"><code>run_ner.py<\/code><\/a> by Hugging Face and the <a href=\"https:\/\/www.kaggle.com\/nltkdata\/conll-corpora\">CoNLL-2002 dataset<\/a> to fine-tune SpanBERTa.<\/p>\n<h1 id=\"setup\"><span class=\"ez-toc-section\" id=\"SetupPermalink\"><\/span>Setup<a title=\"Permalink\" href=\"#setup\">Permalink<\/a><span class=\"ez-toc-section-end\"><\/span><\/h1>\n<p>Download <code>transformers<\/code> and install the required packages.<\/p>\n<pre><code>%%capture\n!git clone https:\/\/github.com\/huggingface\/transformers\n%cd transformers\n!pip install .\n!pip install -r .\/examples\/requirements.txt\n%cd ..\n<\/code><\/pre>\n<h1 id=\"data\"><span class=\"ez-toc-section\" id=\"DataPermalink\"><\/span>Data<a title=\"Permalink\" href=\"#data\">Permalink<\/a><span class=\"ez-toc-section-end\"><\/span><\/h1>\n<h2 id=\"1-download-datasets\"><span class=\"ez-toc-section\" id=\"1_Download_DatasetsPermalink\"><\/span>1. Download Datasets<a title=\"Permalink\" href=\"#1-download-datasets\">Permalink<\/a><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>The command below will download and unzip the dataset. 
The files contain the train and test data for three parts of the <a href=\"https:\/\/www.clips.uantwerpen.be\/conll2002\/ner\/\">CoNLL-2002<\/a> shared task:<\/p>\n<ul>\n<li>esp.testa: Spanish test data for the development stage<\/li>\n<li>esp.testb: Spanish test data<\/li>\n<li>esp.train: Spanish train data<\/li>\n<\/ul>\n<pre><code>%%capture\n!wget -O 'conll2002.zip' 'https:\/\/drive.google.com\/uc?export=download&amp;id=1Wrl1b39ZXgKqCeAFNM9EoXtA1kzwNhCe'\n!unzip 'conll2002.zip'\n<\/code><\/pre>\n<p>The size of each dataset:<\/p>\n<pre><code>!wc -l conll2002\/esp.train\n!wc -l conll2002\/esp.testa\n!wc -l conll2002\/esp.testb\n<\/code><\/pre>\n<pre><code>273038 conll2002\/esp.train\n54838 conll2002\/esp.testa\n53050 conll2002\/esp.testb\n<\/code><\/pre>\n<p>All data files have three columns: words, associated part-of-speech tags and named entity tags in the IOB2 format. Sentence breaks are encoded by empty lines.<\/p>\n<pre><code>!head -n20 conll2002\/esp.train\n<\/code><\/pre>\n<pre><code>Melbourne NP B-LOC\n( Fpa O\nAustralia NP B-LOC\n) Fpt O\n, Fc O\n25 Z O\nmay NC O\n( Fpa O\nEFE NC B-ORG\n) Fpt O\n. Fp O\n- Fg O\nEl DA O\nAbogado NC B-PER\nGeneral AQ I-PER\ndel SP I-PER\nEstado NC I-PER\n, Fc O\n<\/code><\/pre>\n<p>We will keep only the word column and the named entity tag column for our train, dev and test datasets.<\/p>\n<pre><code>!cat conll2002\/esp.train | cut -d \" \" -f 1,3 &gt; train_temp.txt\n!cat conll2002\/esp.testa | cut -d \" \" -f 1,3 &gt; dev_temp.txt\n!cat conll2002\/esp.testb | cut -d \" \" -f 1,3 &gt; test_temp.txt\n<\/code><\/pre>\n<h2 id=\"2-preprocessing\"><span class=\"ez-toc-section\" id=\"2_PreprocessingPermalink\"><\/span>2. 
Preprocessing<a title=\"Permalink\" href=\"#2-preprocessing\">Permalink<\/a><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Let\u2019s define some variables that we need for further pre-processing steps and for training the model:<\/p>\n<pre><code>MAX_LENGTH = 120 #@param {type: \"integer\"}\nMODEL = \"chriskhanhtran\/spanberta\" #@param [\"chriskhanhtran\/spanberta\", \"bert-base-multilingual-cased\"]\n<\/code><\/pre>\n<p>The script below will split sentences longer than <code>MAX_LENGTH<\/code> (in terms of tokens) into smaller ones. Otherwise, long sentences will be truncated when tokenized, causing a loss of training data and leaving some tokens in the test set unpredicted.<\/p>\n<pre><code>%%capture\n!wget \"https:\/\/raw.githubusercontent.com\/stefan-it\/fine-tuned-berts-seq\/master\/scripts\/preprocess.py\"\n<\/code><\/pre>\n<pre><code>!python3 preprocess.py train_temp.txt $MODEL $MAX_LENGTH &gt; train.txt\n!python3 preprocess.py dev_temp.txt $MODEL $MAX_LENGTH &gt; dev.txt\n!python3 preprocess.py test_temp.txt $MODEL $MAX_LENGTH &gt; test.txt\n<\/code><\/pre>\n<pre><code>2020-04-22 23:02:05.747294: I tensorflow\/stream_executor\/platform\/default\/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1\nDownloading: 100% 1.03k\/1.03k [00:00&lt;00:00, 704kB\/s]\nDownloading: 100% 954k\/954k [00:00&lt;00:00, 1.89MB\/s]\nDownloading: 100% 512k\/512k [00:00&lt;00:00, 1.19MB\/s]\nDownloading: 100% 16.0\/16.0 [00:00&lt;00:00, 12.6kB\/s]\n2020-04-22 23:02:23.409488: I tensorflow\/stream_executor\/platform\/default\/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1\n2020-04-22 23:02:31.168967: I tensorflow\/stream_executor\/platform\/default\/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1\n<\/code><\/pre>\n<h2 id=\"3-labels\"><span class=\"ez-toc-section\" id=\"3_LabelsPermalink\"><\/span>3. 
Labels<a title=\"Permalink\" href=\"#3-labels\">Permalink<\/a><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>In the CoNLL-2002\/2003 datasets, there are 9 classes of NER tags. In the IOB2 scheme, a B- tag marks the first token of an entity and an I- tag marks the tokens inside it:<\/p>\n<ul>\n<li>O, Outside of a named entity<\/li>\n<li>B-MISC, Beginning of a miscellaneous entity<\/li>\n<li>I-MISC, Inside a miscellaneous entity<\/li>\n<li>B-PER, Beginning of a person\u2019s name<\/li>\n<li>I-PER, Inside a person\u2019s name<\/li>\n<li>B-ORG, Beginning of an organisation<\/li>\n<li>I-ORG, Inside an organisation<\/li>\n<li>B-LOC, Beginning of a location<\/li>\n<li>I-LOC, Inside a location<\/li>\n<\/ul>\n<p>If your dataset has different labels or more labels than the CoNLL-2002\/2003 datasets, run the line below to get the unique labels from your data and save them into <code>labels.txt<\/code>. This file will be used when we start fine-tuning our model.<\/p>\n<pre><code>!cat train.txt dev.txt test.txt | cut -d \" \" -f 2 | grep -v \"^$\"| sort | uniq &gt; labels.txt\n<\/code><\/pre>\n<h1 id=\"fine-tuning-model\"><span class=\"ez-toc-section\" id=\"Fine-tuning_ModelPermalink\"><\/span>Fine-tuning Model<a title=\"Permalink\" href=\"#fine-tuning-model\">Permalink<\/a><span class=\"ez-toc-section-end\"><\/span><\/h1>\n<p>These are the example scripts from the <code>transformers<\/code> repo that we will use to fine-tune our model for NER. On 04\/21\/2020, Hugging Face updated the example scripts to use a new <code>Trainer<\/code> class. To avoid any future conflicts, let\u2019s use the versions from before that update.<\/p>\n<pre><code>%%capture\n!wget \"https:\/\/raw.githubusercontent.com\/chriskhanhtran\/spanish-bert\/master\/ner\/run_ner.py\"\n!wget \"https:\/\/raw.githubusercontent.com\/chriskhanhtran\/spanish-bert\/master\/ner\/utils_ner.py\"\n<\/code><\/pre>\n<p>Now it\u2019s time for transfer learning. 
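<\/p>
<p>Before moving on, note that the label file is all the classification head needs: it fixes the number and ordering of output classes. The snippet below is an illustrative sketch of the id mappings the fine-tuning script derives from it, with the sorted CoNLL-2002 labels inlined instead of read from <code>labels.txt<\/code>:<\/p>

```python
# Illustrative sketch: the sorted label list from labels.txt defines the
# output space of the token-classification head. run_ner.py builds the
# equivalent mappings internally; here the labels are inlined.
labels = ["B-LOC", "B-MISC", "B-ORG", "B-PER",
          "I-LOC", "I-MISC", "I-ORG", "I-PER", "O"]

label2id = {label: i for i, label in enumerate(labels)}
id2label = {i: label for label, i in label2id.items()}

num_labels = len(labels)  # 9 output classes for CoNLL-2002
print(num_labels, label2id["O"], id2label[0])
```

<p>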
In my <a href=\"https:\/\/chriskhanhtran.github.io\/posts\/spanberta-bert-for-spanish-from-scratch\/\">previous blog post<\/a>, I pretrained a RoBERTa language model on a very large Spanish corpus to predict masked words from the context they appear in. In doing so, the model learned inherent properties of the language. I have uploaded the pretrained model to Hugging Face\u2019s server. Now we will load the model and start fine-tuning it for the NER task.<\/p>\n<p>Below are our training hyperparameters.<\/p>\n<pre><code>MAX_LENGTH = 128 #@param {type: \"integer\"}\nMODEL = \"chriskhanhtran\/spanberta\" #@param [\"chriskhanhtran\/spanberta\", \"bert-base-multilingual-cased\"]\nOUTPUT_DIR = \"spanberta-ner\" #@param [\"spanberta-ner\", \"bert-base-ml-ner\"]\nBATCH_SIZE = 32 #@param {type: \"integer\"}\nNUM_EPOCHS = 3 #@param {type: \"integer\"}\nSAVE_STEPS = 100 #@param {type: \"integer\"}\nLOGGING_STEPS = 100 #@param {type: \"integer\"}\nSEED = 42 #@param {type: \"integer\"}\n<\/code><\/pre>\n<p>Let\u2019s start training.<\/p>\n<pre><code>!python3 run_ner.py \\\n  --data_dir .\/ \\\n  --model_type bert \\\n  --labels .\/labels.txt \\\n  --model_name_or_path $MODEL \\\n  --output_dir $OUTPUT_DIR \\\n  --max_seq_length $MAX_LENGTH \\\n  --num_train_epochs $NUM_EPOCHS \\\n  --per_gpu_train_batch_size $BATCH_SIZE \\\n  --save_steps $SAVE_STEPS \\\n  --logging_steps $LOGGING_STEPS \\\n  --seed $SEED \\\n  --do_train \\\n  --do_eval \\\n  --do_predict \\\n  --overwrite_output_dir\n<\/code><\/pre>\n<p>Performance on the dev set:<\/p>\n<pre><code>04\/21\/2020 02:24:31 - INFO - __main__ - ***** Eval results *****\n04\/21\/2020 02:24:31 - INFO - __main__ - f1 = 0.831027443864822\n04\/21\/2020 02:24:31 - INFO - __main__ - loss = 0.1004064822183894\n04\/21\/2020 02:24:31 - INFO - __main__ - precision = 0.8207885304659498\n04\/21\/2020 02:24:31 - INFO - __main__ - recall = 0.8415250344510795\n<\/code><\/pre>\n<p>Performance on the test set:<\/p>\n<pre><code>04\/21\/2020 02:24:48 - INFO - __main__ - ***** Eval 
results *****\n04\/21\/2020 02:24:48 - INFO - __main__ - f1 = 0.8559533721898419\n04\/21\/2020 02:24:48 - INFO - __main__ - loss = 0.06848683688204177\n04\/21\/2020 02:24:48 - INFO - __main__ - precision = 0.845858475041141\n04\/21\/2020 02:24:48 - INFO - __main__ - recall = 0.8662921348314607\n<\/code><\/pre>\n<p>Here are the tensorboards of fine-tuning <a href=\"https:\/\/tensorboard.dev\/experiment\/Ggs7aCjWQ0exU2Nbp3pPlQ\/#scalars&_smoothingWeight=0.265\">spanberta<\/a> and <a href=\"https:\/\/tensorboard.dev\/experiment\/M9AXw2lORjeRzFZzEJOxkA\/#scalars\">bert-base-multilingual-cased<\/a> for 5 epochs. We can see that the models overfit the training data after 3 epochs.<\/p>\n<p><strong>Classification Report<\/strong><\/p>\n<p>To understand how well our model actually performs, let\u2019s load its predictions and examine the classification report.<\/p>\n<pre><code>def read_examples_from_file(file_path):\n    \"\"\"Read words and labels from a CoNLL-2002\/2003 data file.\n\n    Args:\n        file_path (str): path to NER data file.\n\n    Returns:\n        examples (dict): a dictionary with two keys: `words` (list of lists)\n            holding words in each sequence, and `labels` (list of lists)\n            holding corresponding labels.\n    \"\"\"\n    with open(file_path, encoding=\"utf-8\") as f:\n        examples = {\"words\": [], \"labels\": []}\n        words = []\n        labels = []\n        for line in f:\n            if line.startswith(\"-DOCSTART-\") or line == \"\" or line == \"\\n\":\n                if words:\n                    examples[\"words\"].append(words)\n                    examples[\"labels\"].append(labels)\n                    words = []\n                    labels = []\n            else:\n                splits = line.split(\" \")\n                words.append(splits[0])\n                if len(splits) &gt; 1:\n                    labels.append(splits[-1].replace(\"\\n\", \"\"))\n                else:\n                    # Examples could have no label for mode = \"test\"\n                    labels.append(\"O\")\n    return examples\n<\/code><\/pre>\n<p>Read data and labels from 
the raw text files:<\/p>\n<pre><code>y_true = read_examples_from_file(\"test.txt\")[\"labels\"]\ny_pred = read_examples_from_file(\"spanberta-ner\/test_predictions.txt\")[\"labels\"]\n<\/code><\/pre>\n<p>Print the classification report:<\/p>\n<pre><code>from seqeval.metrics import classification_report as classification_report_seqeval\nprint(classification_report_seqeval(y_true, y_pred))\n<\/code><\/pre>\n<pre><code>           precision    recall  f1-score   support\n      LOC       0.87      0.84      0.85      1084\n      ORG       0.82      0.87      0.85      1401\n     MISC       0.63      0.66      0.65       340\n      PER       0.94      0.96      0.95       735\nmicro avg       0.84      0.86      0.85      3560\nmacro avg       0.84      0.86      0.85      3560\n<\/code><\/pre>\n<p>The metrics in this report are designed specifically for NLP tasks such as NER and POS tagging, in which all words of an entity need to be predicted correctly in order to count as one correct prediction. Therefore, the metrics in this classification report are much lower than in <a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.metrics.classification_report.html\">scikit-learn\u2019s classification report<\/a>.<\/p>\n<pre><code>import numpy as np\nfrom sklearn.metrics import classification_report\nprint(classification_report(np.concatenate(y_true), np.concatenate(y_pred)))\n<\/code><\/pre>\n<pre><code>              precision    recall  f1-score   support\n       B-LOC       0.88      0.85      0.86      1084\n      B-MISC       0.73      0.73      0.73       339\n       B-ORG       0.87      0.91      0.89      1400\n       B-PER       0.95      0.96      0.95       735\n       I-LOC       0.82      0.81      0.81       325\n      I-MISC       0.85      0.76      0.80       557\n       I-ORG       0.89      0.87      0.88      1104\n       I-PER       0.98      0.98      0.98       634\n           O       1.00      1.00      1.00     45355\n    accuracy                           0.98     51533\n   macro avg       0.89      0.87      0.88     51533\nweighted avg       0.98      0.98      0.98     51533\n<\/code><\/pre>\n<p>From the reports above, our model performs well in predicting person, location and organization entities. 
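<\/p>
<p>To make the entity-level scoring concrete, here is a small hand-rolled check (an illustration of the logic seqeval implements, not its actual code) showing that a prediction covering only part of an entity counts as a complete miss:<\/p>

```python
# Entity-level scoring in IOB2, as used by seqeval: an entity is correct
# only if its type AND its full span match exactly. This is a simplified
# illustration, not seqeval's implementation.
def extract_entities(tags):
    """Return a set of (type, start, end) spans from an IOB2 tag sequence."""
    entities, start, etype = set(), None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel flushes the last span
        if tag == "O" or tag.startswith("B-") or (etype and tag[2:] != etype):
            if etype is not None:
                entities.add((etype, start, i - 1))
                etype = None
        if tag.startswith("B-"):
            etype, start = tag[2:], i
    return entities

y_true = ["B-PER", "I-PER", "O", "B-LOC"]
y_pred = ["B-PER", "O",     "O", "B-LOC"]  # person span cut short

correct = extract_entities(y_true) & extract_entities(y_pred)
print(correct)  # only the LOC span matches; the truncated PER span is a miss
```

Token-level metrics would credit three of the four tags here, while entity-level metrics count only one of two entities as correct, which is why the seqeval numbers are lower.
<p>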
We will need more data for <code>MISC<\/code> entities to improve the model\u2019s performance on that class.<\/p>\n<h1 id=\"pipeline\"><span class=\"ez-toc-section\" id=\"PipelinePermalink\"><\/span>Pipeline<a title=\"Permalink\" href=\"#pipeline\">Permalink<\/a><span class=\"ez-toc-section-end\"><\/span><\/h1>\n<p>After fine-tuning our models, we can share them with the community by following the tutorial on this <a href=\"https:\/\/huggingface.co\/transformers\/model_sharing.html\">page<\/a>. Now we can load the fine-tuned model from Hugging Face\u2019s server and use it to predict named entities in Spanish documents.<\/p>\n<pre><code>from transformers import pipeline, AutoModelForTokenClassification, AutoTokenizer\nmodel = AutoModelForTokenClassification.from_pretrained(\"skimai\/spanberta-base-cased-ner-conll02\")\ntokenizer = AutoTokenizer.from_pretrained(\"skimai\/spanberta-base-cased-ner-conll02\")\nner_model = pipeline('ner', model=model, tokenizer=tokenizer)\n<\/code><\/pre>\n<p>The example below is taken from <a href=\"https:\/\/laopinion.com\/2020\/04\/19\/secretario-del-tesoro-advierte-que-la-economia-de-estados-unidos-tardara-meses-en-recuperarse-tras-coronavirus\/\">La Opini\u00f3n<\/a> and means \u201c<em>The economic recovery of the United States after the coronavirus pandemic will be a matter of months, said Treasury Secretary Steven Mnuchin.<\/em>\u201d<\/p>\n<pre><code>sequence = \"La recuperaci\u00f3n econ\u00f3mica de los Estados Unidos despu\u00e9s de la \" \\\n           \"pandemia del coronavirus ser\u00e1 cuesti\u00f3n de meses, afirm\u00f3 el \" \\\n           \"Secretario del Tesoro, Steven Mnuchin.\"\nner_model(sequence)\n<\/code><\/pre>\n<pre><code>[{'entity': 'B-ORG', 'score': 0.9155661463737488, 'word': '\u0120Estados'},\n {'entity': 'I-ORG', 'score': 0.800682544708252, 'word': '\u0120Unidos'},\n {'entity': 'I-MISC', 'score': 0.5006815791130066, 'word': '\u0120corona'},\n {'entity': 'I-MISC', 'score': 0.510674774646759, 'word': 'virus'},\n 
{'entity': 'B-PER', 'score': 0.5558510422706604, 'word': '\u0120Secretario'},\n {'entity': 'I-PER', 'score': 0.7758238315582275, 'word': '\u0120del'},\n {'entity': 'I-PER', 'score': 0.7096233367919922, 'word': '\u0120Tesoro'},\n {'entity': 'B-PER', 'score': 0.9940345883369446, 'word': '\u0120Steven'},\n {'entity': 'I-PER', 'score': 0.9962581992149353, 'word': '\u0120M'},\n {'entity': 'I-PER', 'score': 0.9918380379676819, 'word': 'n'},\n {'entity': 'I-PER', 'score': 0.9848328828811646, 'word': 'uch'},\n {'entity': 'I-PER', 'score': 0.8513168096542358, 'word': 'in'}]\n<\/code><\/pre>\n<p>Looks great! The fine-tuned model successfully recognizes all the entities in our example, and even recognizes \u201ccorona virus.\u201d<\/p>\n<h1 id=\"conclusion\"><span class=\"ez-toc-section\" id=\"ConclusionPermalink\"><\/span>Conclusion<a title=\"Permalink\" href=\"#conclusion\">Permalink<\/a><span class=\"ez-toc-section-end\"><\/span><\/h1>\n<p>Named-entity recognition can help us quickly extract important information from texts, so its application in business can have a direct impact on human productivity in reading contracts and documents. However, NER is a challenging NLP task because it requires accurate classification at the word level, which makes simple approaches such as bag-of-words unable to handle it.<\/p>\n<p>We have walked through how to leverage a pretrained BERT model to quickly obtain excellent performance on the NER task for Spanish. The pretrained SpanBERTa model can also be fine-tuned for other tasks such as document classification. 
I have written a detailed tutorial to fine-tune BERT for sequence classification and sentiment analysis.<\/p>\n<ul>\n<li><a href=\"https:\/\/chriskhanhtran.github.io\/posts\/bert-for-sentiment-analysis\/\">Fine-tuning BERT for Sentiment Analysis<\/a><\/li>\n<\/ul>\n<p>Next in this series, we will discuss ELECTRA, a more efficient pre-training approach for transformer models that can quickly achieve state-of-the-art performance. Stay tuned!<br \/>\n<\/section><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Named Entity Recognition with Transformers IntroductionPermalink Part I: How We Trained RoBERTa Language Model for Spanish from Scratch In my previous blog post, we discussed how we pretrained SpanBERTa, a transformer language model for Spanish, on a big corpus from scratch. The model has shown to be able to predict correctly masked words in a [&hellip;]<\/p>\n","protected":false},"author":1003,"featured_media":3017,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"single-custom-post-template.php","format":"standard","meta":{"_et_pb_use_builder":"","_et_pb_old_content":"","_et_gb_content_width":"","footnotes":""},"categories":[125,64,67],"tags":[],"class_list":["post-3733","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-enterprise-ai-blog","category-how-to","category-ml-nlp"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v24.1 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Named Entity Recognition with Transformers - Skim AI<\/title>\n<meta name=\"description\" content=\"A Skim AI expert walks you through fine tuning BERT for sentiment analysis using HuggingFace\u2019s transformers library and compares it to a baseline.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" 
href=\"https:\/\/skimai.com\/es\/reconocimiento-de-entidades-con-nombre-mediante-transformadores\/\" \/>\n<meta property=\"og:locale\" content=\"es_ES\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Named Entity Recognition with Transformers - Skim AI\" \/>\n<meta property=\"og:description\" content=\"A Skim AI expert walks you through fine tuning BERT for sentiment analysis using HuggingFace\u2019s transformers library and compares it to a baseline.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/skimai.com\/es\/reconocimiento-de-entidades-con-nombre-mediante-transformadores\/\" \/>\n<meta property=\"og:site_name\" content=\"Skim AI\" \/>\n<meta property=\"article:published_time\" content=\"2020-12-28T17:06:26+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2024-05-20T12:38:31+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/skimai.com\/wp-content\/uploads\/2020\/03\/Screen-Shot-2020-04-13-at-5.59.33-PM.png\" \/>\n\t<meta property=\"og:image:width\" content=\"1288\" \/>\n\t<meta property=\"og:image:height\" content=\"486\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"Greggory Elias\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Escrito por\" \/>\n\t<meta name=\"twitter:data1\" content=\"Greggory Elias\" \/>\n\t<meta name=\"twitter:label2\" content=\"Tiempo de lectura\" \/>\n\t<meta name=\"twitter:data2\" content=\"9 minutos\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/skimai.com\/es\/reconocimiento-de-entidades-con-nombre-mediante-transformadores\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/skimai.com\/es\/reconocimiento-de-entidades-con-nombre-mediante-transformadores\/\"},\"author\":{\"name\":\"Greggory 
Elias\",\"@id\":\"https:\/\/skimai.com\/uk\/#\/schema\/person\/7a883b4a2d2ea22040f42a7975eb86c6\"},\"headline\":\"Named Entity Recognition with Transformers\",\"datePublished\":\"2020-12-28T17:06:26+00:00\",\"dateModified\":\"2024-05-20T12:38:31+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/skimai.com\/es\/reconocimiento-de-entidades-con-nombre-mediante-transformadores\/\"},\"wordCount\":939,\"publisher\":{\"@id\":\"https:\/\/skimai.com\/uk\/#organization\"},\"image\":{\"@id\":\"https:\/\/skimai.com\/es\/reconocimiento-de-entidades-con-nombre-mediante-transformadores\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/skimai.com\/wp-content\/uploads\/2020\/03\/Screen-Shot-2020-04-13-at-5.59.33-PM.png\",\"articleSection\":[\"Enterprise AI\",\"How to\",\"LLMs \/ NLP\"],\"inLanguage\":\"es\"},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/skimai.com\/es\/reconocimiento-de-entidades-con-nombre-mediante-transformadores\/\",\"url\":\"https:\/\/skimai.com\/es\/reconocimiento-de-entidades-con-nombre-mediante-transformadores\/\",\"name\":\"Named Entity Recognition with Transformers - Skim AI\",\"isPartOf\":{\"@id\":\"https:\/\/skimai.com\/uk\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/skimai.com\/es\/reconocimiento-de-entidades-con-nombre-mediante-transformadores\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/skimai.com\/es\/reconocimiento-de-entidades-con-nombre-mediante-transformadores\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/skimai.com\/wp-content\/uploads\/2020\/03\/Screen-Shot-2020-04-13-at-5.59.33-PM.png\",\"datePublished\":\"2020-12-28T17:06:26+00:00\",\"dateModified\":\"2024-05-20T12:38:31+00:00\",\"description\":\"A Skim AI expert walks you through fine tuning BERT for sentiment analysis using HuggingFace\u2019s transformers library and compares it to a 
baseline.\",\"breadcrumb\":{\"@id\":\"https:\/\/skimai.com\/es\/reconocimiento-de-entidades-con-nombre-mediante-transformadores\/#breadcrumb\"},\"inLanguage\":\"es\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/skimai.com\/es\/reconocimiento-de-entidades-con-nombre-mediante-transformadores\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"es\",\"@id\":\"https:\/\/skimai.com\/es\/reconocimiento-de-entidades-con-nombre-mediante-transformadores\/#primaryimage\",\"url\":\"https:\/\/skimai.com\/wp-content\/uploads\/2020\/03\/Screen-Shot-2020-04-13-at-5.59.33-PM.png\",\"contentUrl\":\"https:\/\/skimai.com\/wp-content\/uploads\/2020\/03\/Screen-Shot-2020-04-13-at-5.59.33-PM.png\",\"width\":1288,\"height\":486,\"caption\":\"Screen Shot 2020 04 13 at 5.59.33 PM\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/skimai.com\/es\/reconocimiento-de-entidades-con-nombre-mediante-transformadores\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/skimai.com\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Named Entity Recognition with Transformers\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/skimai.com\/uk\/#website\",\"url\":\"https:\/\/skimai.com\/uk\/\",\"name\":\"Skim AI\",\"description\":\"The AI Agent Workforce Platform\",\"publisher\":{\"@id\":\"https:\/\/skimai.com\/uk\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/skimai.com\/uk\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"es\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/skimai.com\/uk\/#organization\",\"name\":\"Skim 
AI\",\"url\":\"https:\/\/skimai.com\/uk\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"es\",\"@id\":\"https:\/\/skimai.com\/uk\/#\/schema\/logo\/image\/\",\"url\":\"http:\/\/skimai.com\/wp-content\/uploads\/2020\/07\/SKIM-AI-Header-Logo.png\",\"contentUrl\":\"http:\/\/skimai.com\/wp-content\/uploads\/2020\/07\/SKIM-AI-Header-Logo.png\",\"width\":194,\"height\":58,\"caption\":\"Skim AI\"},\"image\":{\"@id\":\"https:\/\/skimai.com\/uk\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/www.linkedin.com\/company\/skim-ai\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/skimai.com\/uk\/#\/schema\/person\/7a883b4a2d2ea22040f42a7975eb86c6\",\"name\":\"Greggory Elias\",\"url\":\"https:\/\/skimai.com\/es\/author\/gregg\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Reconocimiento de entidades con nombre mediante transformadores - Skim AI","description":"Un experto en IA de Skim le guiar\u00e1 a trav\u00e9s del ajuste fino de BERT para el an\u00e1lisis de sentimientos mediante la biblioteca de transformadores de HuggingFace y lo comparar\u00e1 con una l\u00ednea de base.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/skimai.com\/es\/reconocimiento-de-entidades-con-nombre-mediante-transformadores\/","og_locale":"es_ES","og_type":"article","og_title":"Named Entity Recognition with Transformers - Skim AI","og_description":"A Skim AI expert walks you through fine tuning BERT for sentiment analysis using HuggingFace\u2019s transformers library and compares it to a baseline.","og_url":"https:\/\/skimai.com\/es\/reconocimiento-de-entidades-con-nombre-mediante-transformadores\/","og_site_name":"Skim 
AI","article_published_time":"2020-12-28T17:06:26+00:00","article_modified_time":"2024-05-20T12:38:31+00:00","og_image":[{"width":1288,"height":486,"url":"https:\/\/skimai.com\/wp-content\/uploads\/2020\/03\/Screen-Shot-2020-04-13-at-5.59.33-PM.png","type":"image\/png"}],"author":"Greggory Elias","twitter_card":"summary_large_image","twitter_misc":{"Escrito por":"Greggory Elias","Tiempo de lectura":"9 minutos"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/skimai.com\/es\/reconocimiento-de-entidades-con-nombre-mediante-transformadores\/#article","isPartOf":{"@id":"https:\/\/skimai.com\/es\/reconocimiento-de-entidades-con-nombre-mediante-transformadores\/"},"author":{"name":"Greggory Elias","@id":"https:\/\/skimai.com\/uk\/#\/schema\/person\/7a883b4a2d2ea22040f42a7975eb86c6"},"headline":"Named Entity Recognition with Transformers","datePublished":"2020-12-28T17:06:26+00:00","dateModified":"2024-05-20T12:38:31+00:00","mainEntityOfPage":{"@id":"https:\/\/skimai.com\/es\/reconocimiento-de-entidades-con-nombre-mediante-transformadores\/"},"wordCount":939,"publisher":{"@id":"https:\/\/skimai.com\/uk\/#organization"},"image":{"@id":"https:\/\/skimai.com\/es\/reconocimiento-de-entidades-con-nombre-mediante-transformadores\/#primaryimage"},"thumbnailUrl":"https:\/\/skimai.com\/wp-content\/uploads\/2020\/03\/Screen-Shot-2020-04-13-at-5.59.33-PM.png","articleSection":["Enterprise AI","How to","LLMs \/ NLP"],"inLanguage":"es"},{"@type":"WebPage","@id":"https:\/\/skimai.com\/es\/reconocimiento-de-entidades-con-nombre-mediante-transformadores\/","url":"https:\/\/skimai.com\/es\/reconocimiento-de-entidades-con-nombre-mediante-transformadores\/","name":"Reconocimiento de entidades con nombre mediante transformadores - Skim 
AI","isPartOf":{"@id":"https:\/\/skimai.com\/uk\/#website"},"primaryImageOfPage":{"@id":"https:\/\/skimai.com\/es\/reconocimiento-de-entidades-con-nombre-mediante-transformadores\/#primaryimage"},"image":{"@id":"https:\/\/skimai.com\/es\/reconocimiento-de-entidades-con-nombre-mediante-transformadores\/#primaryimage"},"thumbnailUrl":"https:\/\/skimai.com\/wp-content\/uploads\/2020\/03\/Screen-Shot-2020-04-13-at-5.59.33-PM.png","datePublished":"2020-12-28T17:06:26+00:00","dateModified":"2024-05-20T12:38:31+00:00","description":"Un experto en IA de Skim le guiar\u00e1 a trav\u00e9s del ajuste fino de BERT para el an\u00e1lisis de sentimientos mediante la biblioteca de transformadores de HuggingFace y lo comparar\u00e1 con una l\u00ednea de base.","breadcrumb":{"@id":"https:\/\/skimai.com\/es\/reconocimiento-de-entidades-con-nombre-mediante-transformadores\/#breadcrumb"},"inLanguage":"es","potentialAction":[{"@type":"ReadAction","target":["https:\/\/skimai.com\/es\/reconocimiento-de-entidades-con-nombre-mediante-transformadores\/"]}]},{"@type":"ImageObject","inLanguage":"es","@id":"https:\/\/skimai.com\/es\/reconocimiento-de-entidades-con-nombre-mediante-transformadores\/#primaryimage","url":"https:\/\/skimai.com\/wp-content\/uploads\/2020\/03\/Screen-Shot-2020-04-13-at-5.59.33-PM.png","contentUrl":"https:\/\/skimai.com\/wp-content\/uploads\/2020\/03\/Screen-Shot-2020-04-13-at-5.59.33-PM.png","width":1288,"height":486,"caption":"Screen Shot 2020 04 13 at 5.59.33 PM"},{"@type":"BreadcrumbList","@id":"https:\/\/skimai.com\/es\/reconocimiento-de-entidades-con-nombre-mediante-transformadores\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/skimai.com\/"},{"@type":"ListItem","position":2,"name":"Named Entity Recognition with Transformers"}]},{"@type":"WebSite","@id":"https:\/\/skimai.com\/uk\/#website","url":"https:\/\/skimai.com\/uk\/","name":"Skim AI","description":"La plataforma AI Agent
Workforce","publisher":{"@id":"https:\/\/skimai.com\/uk\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/skimai.com\/uk\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"es"},{"@type":"Organization","@id":"https:\/\/skimai.com\/uk\/#organization","name":"Skim AI","url":"https:\/\/skimai.com\/uk\/","logo":{"@type":"ImageObject","inLanguage":"es","@id":"https:\/\/skimai.com\/uk\/#\/schema\/logo\/image\/","url":"http:\/\/skimai.com\/wp-content\/uploads\/2020\/07\/SKIM-AI-Header-Logo.png","contentUrl":"http:\/\/skimai.com\/wp-content\/uploads\/2020\/07\/SKIM-AI-Header-Logo.png","width":194,"height":58,"caption":"Skim AI"},"image":{"@id":"https:\/\/skimai.com\/uk\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.linkedin.com\/company\/skim-ai"]},{"@type":"Person","@id":"https:\/\/skimai.com\/uk\/#\/schema\/person\/7a883b4a2d2ea22040f42a7975eb86c6","name":"Greggory
Elias","url":"https:\/\/skimai.com\/es\/author\/gregg\/"}]}},"_links":{"self":[{"href":"https:\/\/skimai.com\/es\/wp-json\/wp\/v2\/posts\/3733","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/skimai.com\/es\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/skimai.com\/es\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/skimai.com\/es\/wp-json\/wp\/v2\/users\/1003"}],"replies":[{"embeddable":true,"href":"https:\/\/skimai.com\/es\/wp-json\/wp\/v2\/comments?post=3733"}],"version-history":[{"count":0,"href":"https:\/\/skimai.com\/es\/wp-json\/wp\/v2\/posts\/3733\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/skimai.com\/es\/wp-json\/wp\/v2\/media\/3017"}],"wp:attachment":[{"href":"https:\/\/skimai.com\/es\/wp-json\/wp\/v2\/media?parent=3733"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/skimai.com\/es\/wp-json\/wp\/v2\/categories?post=3733"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/skimai.com\/es\/wp-json\/wp\/v2\/tags?post=3733"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}