Generating text with GPT-2 and PyTorch

Fast and easy text generation in any language using the Huggingface framework

As part of the "Machine Learning. Advanced" course, we have prepared a translation of this interesting material.



We also invite you to take part in the open webinar "Multi-armed bandits for A/B test optimization". Together with an expert, webinar participants will look at one of the most effective use cases for reinforcement learning and see how the A/B testing problem can be reformulated as a Bayesian inference problem.






Introduction

Natural Language Processing (NLP) has been advancing at a remarkable pace. Models such as GPT-3 have attracted enormous attention for the quality of the text they can produce. The good news is that you do not need a model of that size to generate decent text yourself, quickly and in any language.





GPT-2 is the predecessor of GPT-3. It is openly available through the Transformers library from Huggingface, and that is exactly what we will use in this article: GPT-2 with PyTorch.





Generating text with GPT-2 takes only a few minutes, and you do not even have to train anything yourself! Here is the plan:





  • Step 1: Install the library
  • Step 2: Import the pipeline
  • Step 3: Create a text generation pipeline
  • Step 4: Define the text we want to continue
  • Step 5: Start generating
  • Bonus: Generate text in any language





Step 1: Install the library

We will need the Huggingface Transformers library, which runs on top of either PyTorch or TensorFlow; in this article we use PyTorch. If you do not have PyTorch yet, install it first following the instructions on the official PyTorch website.
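As a minimal sketch (the exact command depends on your OS and whether you need CUDA support, so check the selector on the PyTorch site; this line is not part of the original material), a CPU-only installation via pip looks like this:

pip install torch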





Once PyTorch is in place, install Huggingface Transformers with a single command:





pip install transformers
      
      



Step 2: Import the pipeline

From the Transformers library we only need the pipeline function:





from transformers import pipeline
      
      



The pipeline function downloads a pretrained model and tokenizer for a given task and wraps them in a simple, ready-to-use interface.





Step 3: Create a text generation pipeline

Now we create a pipeline for the text generation task. It takes just one line:





text_generation = pipeline("text-generation")
      
      



By default this pipeline loads GPT-2, so we do not need to specify a model explicitly.
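If you prefer a specific checkpoint instead of the default one, you can pass its model hub identifier; as a small illustration (the gpt2-medium checkpoint is a larger publicly available variant and is not used in the original material):

text_generation = pipeline("text-generation", model="gpt2-medium")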





Step 4: Define the text we want to continue

Next we define the prefix text, that is, the beginning of the text that the model will continue. For example:





The world is


prefix_text = "The world is"
      
      



Step 5: Start generating

Now everything is ready and we can finally generate text! We run the generation, passing in the prefix text, and print the result:





generated_text = text_generation(prefix_text, max_length=50, do_sample=False)[0]

print(generated_text['generated_text'])
      
      



The max_length argument limits the length of the generated text, in this case to 50 tokens. The output looks like this:





The world is a better place if you're a good person.

I'm not saying that you should be a bad person. I'm saying that you should be a good person.

I'm not saying that you should be a bad
      
      



As you can see, after a while the model simply starts repeating itself. This is a well-known problem of neural text generation with greedy decoding. To get more varied and interesting text, you can switch from greedy search to sampling (for example, top-k/top-p sampling) by changing the generation parameters. For the full list of options, see the Huggingface documentation for TextGenerationPipeline.
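A minimal sketch of what that might look like (the do_sample, top_k and top_p arguments are standard generation parameters forwarded by the pipeline; the specific values below are illustrative and not taken from the original material):

# Sampled generation instead of greedy search: top-k/top-p (nucleus) sampling
# usually produces more varied text. The values here are illustrative only.
sampled = text_generation(
    prefix_text,
    max_length=50,
    do_sample=True,  # sample instead of always taking the most likely token
    top_k=50,        # consider only the 50 most likely next tokens
    top_p=0.95,      # nucleus sampling: keep tokens covering 95% of the probability mass
)[0]
print(sampled['generated_text'])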





Bonus: Generate text in any language

First of all, some readers will ask: what if I want to generate text in a language other than English? Fortunately, the Huggingface model hub hosts plenty of models trained by the community for other languages (and not only GPT-2 variants), so we can take one of them and plug it into exactly the same pipeline.





As an example, let's generate some Chinese text. We will use the Chinese GPT-2 model published by CKIPLab together with a suitable tokenizer.





First, import the required classes:





from transformers import BertTokenizerFast, AutoModelWithLMHead
      
      



Then load the tokenizer and the model:





tokenizer = BertTokenizerFast.from_pretrained('bert-base-chinese')

model = AutoModelWithLMHead.from_pretrained('ckiplab/gpt2-base-chinese')
      
      



Create the pipeline, this time passing the model and the tokenizer explicitly:





text_generation = pipeline("text-generation", model=model, tokenizer=tokenizer)
      
      



Define the prefix text, this time in Chinese:





我 想 要 去

prefix_text = "我 想 要 去"

## I want to go
      
      



Generate and print the text exactly as before:





generated_text = text_generation(prefix_text, max_length=50, do_sample=False)[0]

print(generated_text['generated_text'])
      
      



And here is the output:





我 想 要 去 看 看 。 」 他 說 : 「 我 們 不 能 說, 我 們 不 能 說, 我 們 不 能 說, 我 們 不 能 說, 我 們 不 能 說, 我 們 不 能 說, 我 們

## I want to go and take a look." He said: "We cannot say, we cannot say,
## we cannot say, we cannot say, we cannot say, we cannot say, we
      
      



As you can see, here too the model quickly falls into repetition, but the overall approach works for any language in exactly the same way.
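One way to soften the repetition (not covered in the original material; repetition_penalty and the sampling arguments below are standard Transformers generation parameters, and the values are arbitrary) is to combine sampling with a repetition penalty:

# Illustrative only: sampling plus a repetition penalty for the Chinese pipeline
generated_text = text_generation(
    prefix_text,
    max_length=50,
    do_sample=True,
    top_p=0.9,               # nucleus sampling
    repetition_penalty=1.2,  # discourage tokens that have already been generated
)[0]
print(generated_text['generated_text'])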





That's it! I hope you can see how easy text generation has become thanks to the power of the Huggingface API. The same code collected in a Jupyter notebook looks like this:









In [1]:
from transformers import pipeline
 
In [ ]:
text_generation = pipeline("text-generation")
 
In [7]:
prefix_text = "The world is"
 
In [8]:
generated_text= text_generation(prefix_text, max_length=50, do_sample=False)[0]
print(generated_text['generated_text'])
 
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
 
The world is a better place if you're a good person.
 
I'm not saying that you should be a bad person. I'm saying that you should be a good person.
 
I'm not saying that you should be a bad

      
      



That's all! I hope this short tutorial helps you in your own projects. Text generation is a large and rapidly developing area of NLP, so if you want to dig deeper, take a look at the references below.





Brown, Tom B., et al. “Language models are few-shot learners.” arXiv preprint arXiv:2005.14165 (2020).





Radford, Alec, et al. “Language models are unsupervised multitask learners.” OpenAI blog 1.8 (2019): 9.





Transformers Github, Huggingface





Transformers Official Documentation, Huggingface





Pytorch Official Website, Facebook AI Research





Fan, Angela, Mike Lewis, and Yann Dauphin. “Hierarchical neural story generation.” arXiv preprint arXiv:1805.04833 (2018).





Welleck, Sean, et al. “Neural text generation with unlikelihood training.” arXiv preprint arXiv:1908.04319 (2019).





CKIPLab Transformers Github, Chinese Knowledge and Information Processing at the Institute of Information Science and the Institute of Linguistics of Academia Sinica






Learn more about the "Machine Learning. Advanced" course.





Sign up for the open webinar "Multi-armed bandits for A/B test optimization".







