Fast and easy text generation in any language with the Huggingface framework
As part of the "Machine Learning. Advanced" course we have prepared a translation of an interesting article.
We also invite you to join the open webinar "Multi-armed bandits for optimizing A/B tests". In the webinar, participants together with an expert will walk through one of the most effective use cases for reinforcement learning and will also look at how to reformulate the A/B testing problem as a Bayesian inference problem.
Introduction
Text generation is one of the most interesting tasks in Natural Language Processing (NLP). In recent years it has attracted a lot of attention, largely thanks to large language models such as GPT-3, but you do not need a research budget to try it yourself.
In this article we will use GPT-2, the predecessor of GPT-3, through the Transformers library from Huggingface. If you want to understand in detail how GPT-2 is implemented, take a look at GPT2 Pytorch.
Generating text with GPT-2 takes only a few lines of code, so let's get started. Here is the plan:
Step 1: Install the library
Step 2: Import the pipeline
Step 3: Create a text generation pipeline
Step 4: Define the prefix text
Step 5: Generate text
Bonus: text generation in any other language
Step 1: Install the library
We will use Huggingface Transformers, which runs on top of PyTorch. If you do not have PyTorch yet, install it first (the official PyTorch site has instructions for every platform).
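If PyTorch is missing, a plain pip install is usually enough for a CPU-only setup; this is just a minimal sketch, the official PyTorch site lists the exact command for your platform and CUDA version:
pip install torch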
Once PyTorch is installed, install Huggingface Transformers with a single command:
pip install transformers
Step 2: Import the pipeline
From the Transformers library, import pipeline:
from transformers import pipeline
pipeline is a high-level interface that takes care of loading the model and the tokenizer for us.
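The same pipeline function covers many other tasks besides text generation. Purely as an illustration of the interface (not needed for the rest of this tutorial), a sentiment-analysis pipeline is created and called in exactly the same way:
from transformers import pipeline

# a different task, same one-line interface; the default model is downloaded automatically
classifier = pipeline("sentiment-analysis")
print(classifier("Huggingface makes text generation easy!"))  # list of dicts with 'label' and 'score'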
Step 3: Create a text generation pipeline
Now create a pipeline for the text generation task:
text_generation = pipeline("text-generation")
The default model for this task is GPT-2; it is downloaded automatically the first time the pipeline is created.
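If you would rather not rely on the default checkpoint, the model can be named explicitly; for example, the larger gpt2-medium checkpoint from the model hub (shown only as an illustration, the rest of the article keeps the default):
# explicitly pick a (larger) GPT-2 checkpoint instead of the default "gpt2"
text_generation = pipeline("text-generation", model="gpt2-medium")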
Step 4: Define the prefix text
The model continues whatever text we give it, so we need to define a prefix (prompt) for it to start from. For example:
The world is
prefix_text = "The world is"
Step 5: Generate text
Everything is in place, so we can finally generate some text. Pass the prefix to the pipeline:
generated_text = text_generation(prefix_text, max_length=50, do_sample=False)[0]
print(generated_text['generated_text'])
The max_length parameter limits the output to 50 tokens. Here is the result:
The world is a better place if you’re a good person.
I’m not saying that you should be a bad person. I’m saying that you should be a good person.
I’m not saying that you should be a bad
Not bad for a few lines of code, but notice that the text quickly starts repeating itself, and the last sentence is cut off because we hit the max_length limit. The repetition is a known side effect of greedy decoding (do_sample=False), which always picks the most probable next token; turning on sampling (for example, top-k/top-p sampling) usually produces more varied and natural text. All of these options, and many more, are described in the Huggingface documentation for TextGenerationPipeline.
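As a quick sketch of what that looks like in practice (do_sample, top_k and top_p are forwarded by the pipeline to the model's generate method; the exact values here are only an example):
# sampled generation instead of greedy decoding
generated_text = text_generation(
    prefix_text,
    max_length=50,
    do_sample=True,   # sample from the distribution instead of taking the argmax
    top_k=50,         # consider only the 50 most probable next tokens
    top_p=0.95,       # nucleus sampling: keep the smallest token set covering 95% of the probability mass
)[0]
print(generated_text['generated_text'])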
Bonus: text generation in any other language
English is, of course, not the only language people want to generate; the default GPT-2 model was trained on English text, so it will not help much with other languages. Fortunately, the Huggingface model hub hosts community models pre-trained in many languages, so all we have to do is load a different model and tokenizer.
As an example, let's generate text in Chinese. We will use the Chinese GPT2 model from CKIPLab together with a matching tokenizer.
from transformers import BertTokenizerFast, AutoModelWithLMHead
Load the tokenizer and the model:
tokenizer = BertTokenizerFast.from_pretrained('bert-base-chinese')
model = AutoModelWithLMHead.from_pretrained('ckiplab/gpt2-base-chinese')
Then create the pipeline with this model and tokenizer:
text_generation = pipeline("text-generation", model=model, tokenizer=tokenizer)
As before, we define a prefix text, this time in Chinese:
我 想 要 去
prefix_text = "我 想 要 去"
(The prefix roughly means "I want to go".)
Finally, generate the text exactly as we did before:
generated_text= text_generation(prefix_text, max_length=50, do_sample=False)[0]
print(generated_text['generated_text'])
The output:
我 想 要 去 看 看 。 」 他 說 : 「 我 們 不 能 說, 我 們 不 能 說, 我 們 不 能 說, 我 們 不 能 說, 我 們 不 能 說, 我 們 不 能 說, 我 們
(Roughly: "I want to go and take a look." He said: "We cannot say, we cannot say, we cannot say …")
As in the English example, the model keeps going but quickly falls into repetition, again because of greedy decoding.
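One simple way to fight this kind of looping, besides sampling, is the no_repeat_ngram_size argument, which the pipeline also forwards to generate; a sketch with values chosen purely for illustration:
# allow sampling and forbid the model from repeating any 2-gram
generated_text = text_generation(
    prefix_text,
    max_length=50,
    do_sample=True,
    no_repeat_ngram_size=2,  # no 2-gram may appear twice in the output
)[0]
print(generated_text['generated_text'])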
That is all there is to it: swapping in a different model and tokenizer is enough to generate text in another language.
Congratulations! You can now generate text in just a few lines of code with the high-level Huggingface API. Here is the complete Jupyter notebook:
In [1]:
from transformers import pipeline
In [ ]:
text_generation = pipeline("text-generation")
In [7]:
prefix_text = "The world is"
In [8]:
generated_text= text_generation(prefix_text, max_length=50, do_sample=False)[0]
print(generated_text['generated_text'])
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
The world is a better place if you're a good person.
I'm not saying that you should be a bad person. I'm saying that you should be a good person.
I'm not saying that you should be a bad
That's it! Text generation, in English or any other language, really does come down to a few lines of code with modern libraries. Try it with your own prompts and models. Thanks for reading!
References
Brown, Tom B., et al. “Language models are few-shot learners.” arXiv preprint arXiv:2005.14165 (2020).
Radford, Alec, et al. “Language models are unsupervised multitask learners.” OpenAI blog 1.8 (2019): 9.
Transformers Github, Huggingface
Transformers Official Documentation, Huggingface
Pytorch Official Website, Facebook AI Research
Fan, Angela, Mike Lewis, and Yann Dauphin. “Hierarchical neural story generation.” arXiv preprint arXiv:1805.04833 (2018).
Welleck, Sean, et al. “Neural text generation with unlikelihood training.” arXiv preprint arXiv:1908.04319 (2019).
CKIPLab Transformers Github, Chinese Knowledge and Information Processing at the Institute of Information Science and the Institute of Linguistics of Academia Sinica
Sign up for the open webinar «Multi-armed bandits for optimizing A/B tests».