Fast and easy text generation in any language with the Huggingface framework
As part of the "Machine Learning. Advanced" course we have prepared a translation of an interesting article.
We also invite you to join the open webinar "Multi-armed bandits for optimizing A/B tests". In the webinar, participants together with an expert will walk through one of the most effective use cases for reinforcement learning and will also look at how to reformulate the A/B testing problem as a Bayesian inference problem.
Introduction
Text generation is one of the most interesting tasks in Natural Language Processing (NLP). In recent years it has attracted a lot of attention, largely thanks to large language models such as GPT-3, but you do not need a research budget to try it yourself.
In this article we will use GPT-2, the predecessor of GPT-3, through the Transformers library from Huggingface. If you want to understand in detail how GPT-2 is implemented, take a look at GPT2 Pytorch.
Generating text with GPT-2 takes only a few lines of code, so let's get started. Here is the plan:
Step 1: Install the library
Step 2: Import the pipeline
Step 3: Create a text generation pipeline
Step 4: Define the prefix text
Step 5: Generate text
Bonus: text generation in any other language
Step 1: Install the library
We will use Huggingface Transformers, which runs on top of PyTorch. If you do not have PyTorch yet, install it first (the official PyTorch site has instructions for every platform).
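If PyTorch is missing, a plain pip install is usually enough for a CPU-only setup; this is just a minimal sketch, the official PyTorch site lists the exact command for your platform and CUDA version:
pip install torch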
Once PyTorch is installed, install Huggingface Transformers with a single command:
pip install transformers
Step 2: Import the pipeline
From the Transformers library, import pipeline:
from transformers import pipeline
pipeline is a high-level interface that takes care of loading the model and the tokenizer for us.
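The same pipeline function covers many other tasks besides text generation. Purely as an illustration of the interface (not needed for the rest of this tutorial), a sentiment-analysis pipeline is created and called in exactly the same way:
from transformers import pipeline

# a different task, same one-line interface; the default model is downloaded automatically
classifier = pipeline("sentiment-analysis")
print(classifier("Huggingface makes text generation easy!"))  # list of dicts with 'label' and 'score'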
Step 3: Create a text generation pipeline
Now create a pipeline for the text generation task:
text_generation = pipeline("text-generation")
The default model for this task is GPT-2; it is downloaded automatically the first time the pipeline is created.
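If you would rather not rely on the default checkpoint, the model can be named explicitly; for example, the larger gpt2-medium checkpoint from the model hub (shown only as an illustration, the rest of the article keeps the default):
# explicitly pick a (larger) GPT-2 checkpoint instead of the default "gpt2"
text_generation = pipeline("text-generation", model="gpt2-medium")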
Step 4: Define the prefix text
The model continues whatever text we give it, so we need to define a prefix (prompt) for it to start from. For example:
The world is
prefix_text = "The world is"
Step 5: Generate text
Everything is in place, so we can finally generate some text. Pass the prefix to the pipeline:
generated_text = text_generation(prefix_text, max_length=50, do_sample=False)[0]
print(generated_text['generated_text'])
The max_length parameter limits the output to 50 tokens. Here is the result:
The world is a better place if you’re a good person.
I’m not saying that you should be a bad person. I’m saying that you should be a good person.
I’m not saying that you should be a bad
Not bad for a few lines of code, but notice that the text quickly starts repeating itself, and the last sentence is cut off because we hit the max_length limit. The repetition is a known side effect of greedy decoding (do_sample=False), which always picks the most probable next token; turning on sampling (for example, top-k/top-p sampling) usually produces more varied and natural text. All of these options, and many more, are described in the Huggingface documentation for TextGenerationPipeline.
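As a quick sketch of what that looks like in practice (do_sample, top_k and top_p are forwarded by the pipeline to the model's generate method; the exact values here are only an example):
# sampled generation instead of greedy decoding
generated_text = text_generation(
    prefix_text,
    max_length=50,
    do_sample=True,   # sample from the distribution instead of taking the argmax
    top_k=50,         # consider only the 50 most probable next tokens
    top_p=0.95,       # nucleus sampling: keep the smallest token set covering 95% of the probability mass
)[0]
print(generated_text['generated_text'])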
Bonus: text generation in any other language
English is, of course, not the only language people want to generate; the default GPT-2 model was trained on English text, so it will not help much with other languages. Fortunately, the Huggingface model hub hosts community models pre-trained in many languages, so all we have to do is load a different model and tokenizer.
As an example, let's generate text in Chinese. We will use the Chinese GPT2 model from CKIPLab together with a matching tokenizer.
from transformers import BertTokenizerFast, AutoModelWithLMHead
Load the tokenizer and the model:
tokenizer = BertTokenizerFast.from_pretrained('bert-base-chinese')
model = AutoModelWithLMHead.from_pretrained('ckiplab/gpt2-base-chinese')
Then create the pipeline with this model and tokenizer:
text_generation = pipeline("text-generation", model=model, tokenizer=tokenizer)
As before, we define a prefix text, this time in Chinese:
我 想 要 去
prefix_text = "我 想 要 去"
(The prefix roughly means "I want to go".)
Finally, generate the text exactly as we did before:
generated_text= text_generation(prefix_text, max_length=50, do_sample=False)[0]
print(generated_text['generated_text'])
The output:
我 想 要 去 看 看 。 」 他 說 : 「 我 們 不 能 說, 我 們 不 能 說, 我 們 不 能 說, 我 們 不 能 說, 我 們 不 能 說, 我 們 不 能 說, 我 們
(Roughly: "I want to go and take a look." He said: "We cannot say, we cannot say, we cannot say …")
As in the English example, the model keeps going but quickly falls into repetition, again because of greedy decoding.
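One simple way to fight this kind of looping, besides sampling, is the no_repeat_ngram_size argument, which the pipeline also forwards to generate; a sketch with values chosen purely for illustration:
# allow sampling and forbid the model from repeating any 2-gram
generated_text = text_generation(
    prefix_text,
    max_length=50,
    do_sample=True,
    no_repeat_ngram_size=2,  # no 2-gram may appear twice in the output
)[0]
print(generated_text['generated_text'])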
That is all there is to it: swapping in a different model and tokenizer is enough to generate text in another language.
Congratulations! You can now generate text in just a few lines of code with the high-level Huggingface API. Here is the complete Jupyter notebook:
In [1]:
from transformers import pipeline
In [ ]:
text_generation = pipeline("text-generation")
In [7]:
prefix_text = "The world is"
In [8]:
generated_text= text_generation(prefix_text, max_length=50, do_sample=False)[0]
print(generated_text['generated_text'])
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
The world is a better place if you're a good person.
I'm not saying that you should be a bad person. I'm saying that you should be a good person.
I'm not saying that you should be a bad
That's it! Text generation, in English or any other language, really does come down to a few lines of code with modern libraries. Try it with your own prompts and models. Thanks for reading!
References
Brown, Tom B., et al. “Language models are few-shot learners.” arXiv preprint arXiv:2005.14165 (2020).
Radford, Alec, et al. “Language models are unsupervised multitask learners.” OpenAI blog 1.8 (2019): 9.
Transformers Github, Huggingface
Transformers Official Documentation, Huggingface
Pytorch Official Website, Facebook AI Research
Fan, Angela, Mike Lewis, and Yann Dauphin. “Hierarchical neural story generation.” arXiv preprint arXiv:1805.04833 (2018).
Welleck, Sean, et al. “Neural text generation with unlikelihood training.” arXiv preprint arXiv:1908.04319 (2019).
CKIPLab Transformers Github, Chinese Knowledge and Information Processing at the Institute of Information Science and the Institute of Linguistics of Academia Sinica
Sign up for the open webinar «Multi-armed bandits for optimizing A/B tests».