Blog
Archive
2025
The Hidden Truth About LLM Performance: Why Your Benchmark Results Might Be Misleading
There are two systematic issues with benchmarks, which were already present in the community before the advent of LLMs but became worse…
2024
Exploring Advanced Prompt Engineering with Google’s New Gemma Models
Today, Google launched a new set of models named Gemma. These models are based on the same tech and research used for creating the Gemini…
2023
🚀 Launching Napolab: The Natural Portuguese Language Benchmark 📊
Napolab is here: a curated collection of Portuguese datasets designed for easy evaluation of language models.
Hashformers v2.0.0 is out! 🚀
Hashtag segmentation, the task of adding spaces between words in a hashtag, can now be done with Large Language Models (LLMs).
A Simple Method to Detect In-Demand Tech Skills
It’s no secret that the tech landscape is dynamic and ever-evolving. New technologies are born, they mature, and then, often, they are…
Hashformers: Hashtag Segmentation Applications in Abusive Language Detection
Abusive language detection, a critical aspect of modern NLP research, is often challenged by the lack of generalization across different…
📢 New Portuguese NLP Model Alert! 🇵🇹 🇧🇷
I am thrilled to announce the latest milestone in the advancement of Portuguese language technology: the Albertina PT! This breakthrough…
2022
15 Datasets for Word Segmentation on the Hugging Face Hub
Word segmentation is the task of adding spaces between words. It can be an important preprocessing step in Natural Language Processing…
The cold start problem in NLP
The word ‘had’ and the calendar in Finnegans Wake
A calendar in the Wake
Integrating Ray Tune, Hugging Face Transformers and W&B
Update 03/21/2021: I published my modified version of run_glue.py as a public gist on GitHub.