NLP¶
This page describes the collection of Natural Language Processing (NLP) software on HiPerGator. NLP is a branch of artificial intelligence (AI) that helps computers understand and respond to human language. It is used in applications such as voice assistants, chatbots, and translation. NLP combines linguistic rules with machine learning so that computers can grasp not just words but also the intent and sentiment behind them. It is applied across many fields; in healthcare, for example, it helps analyze medical records to aid patient care. Research Computing can assist with language modeling for knowledge exploration, measurement, classification, summarization, conversational AI, and more through support requests or consulting.
Environment Modules for NLP¶
- NeMo:
module load nemo
loads a Singularity container environment with Python and NVIDIA NeMo. NeMo provides NLP task training, speech-to-text and text-to-speech models, and allows use of pretrained Megatron language models.
- To list available versions on HiPerGator-AI:
module spider nemo
- BioNeMo:
module load bionemo
launches a Singularity container environment with Python and NVIDIA BioNeMo, specialized in biomedical NLP tasks like medical text analysis and patient information processing. It also allows integration of pretrained Megatron language models.
- To list available versions on HiPerGator-AI:
module spider bionemo
- Llama:
module load llama
provides tools for fine-tuning Meta Llama models, supporting domain adaptation and building LLM-based applications locally, in the cloud, or on-premises.
- To list available versions on HiPerGator-AI:
module spider llama
- Mistral AI:
module load mistral
provides a set of tools for Mistral models, including tokenization and tools for structured conversation.
- To list available versions on HiPerGator-AI:
module spider mistral
- Gemma LLMs:
module load gemma_llm
provides tools for working with Google Gemma models, compatible with PyTorch, Keras-NLP, NVIDIA NeMo, and Hugging Face Transformers.
- To list available versions on HiPerGator-AI:
module spider gemma_llm
- PyTorch or TensorFlow: Use
module load pytorch
or
module load tensorflow
to access these environments. For additional libraries, create a custom Conda environment. See Conda and Managing Python environments and Jupyter kernels for details.
- To list available versions on HiPerGator-AI:
module spider pytorch
or
module spider tensorflow
- ngc-pytorch:
module load ngc-pytorch
provides a container environment with PyTorch and NVIDIA Apex, supporting large-parameter Megatron language models. See
/data/ai/examples/nlp
or AI Examples for details.
- To list available versions on HiPerGator-AI:
module spider ngc-pytorch
- Transformers: Available in nlp/1.3 and llama/3, this library supports NLP models like BERT and GPT and handles tasks across text, vision, and audio modalities.
- LangChain: A framework that simplifies building applications with large language models, supporting document analysis, summarization, and chatbots. Available in nlp/1.3 and llama/3.
- LlamaIndex: A flexible data framework for connecting custom data sources to large language models (LLMs). Available in nlp/1.3 and llama/3.
- TensorRT-LLM: An open-source library that optimizes LLM inference performance on NVIDIA platforms. Available in llama/3.
- nlp:
module load nlp
loads a Python environment with PyTorch, torchtext, nltk, spaCy, transformers, and other NLP tools.
- To list available versions on HiPerGator-AI:
module spider nlp
- spark-nlp: See Spark for instructions to start a Spark cluster. Available in tensorflow/2.4.1.
- FlairNLP: See the FlairNLP documentation for more information.
- parlai: A conversational AI framework by Facebook, with models ranging from 110M to 9B parameters.
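After loading one of the modules listed above, it can be useful to confirm that the expected Python packages actually import. The following is a minimal sketch rather than an official HiPerGator script: check_nlp_env is a hypothetical helper name, and the function assumes that the loaded module places python on the PATH. The packages it imports are the ones this page lists for the nlp module.

```shell
#!/bin/sh
# Sketch: load an NLP module and confirm that its Python packages import.
# Assumes an Lmod-style `module` command, as used in the list above.
# check_nlp_env is a hypothetical helper, not part of HiPerGator.

check_nlp_env() {
    mod="${1:-nlp}"   # module to load; defaults to the nlp module described above

    # Fail early when the module system is unavailable (for example, on a laptop).
    if ! type module >/dev/null 2>&1; then
        echo "module command not found; run this on HiPerGator" >&2
        return 1
    fi

    # Stop if the requested module cannot be loaded.
    module load "$mod" || return 1

    # torch, transformers, nltk, and spacy are named on this page as part of nlp.
    # Assumption: the module makes `python` available on PATH.
    python -c 'import torch, transformers, nltk, spacy; print("NLP environment OK")'
}
```

Calling check_nlp_env with no argument checks the default nlp module; pass a versioned name such as nlp/1.3 to check a specific version. For other modules, the import list would need to be adjusted to match what that module provides.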
Large Language Models¶
HiPerGator offers access to various LLMs for download, including starter LLMs trained with Megatron-LM, Llama 2, and Llama 3. These models, including a 20B-parameter GPT and a 9B-parameter BERT, can be further trained or fine-tuned. For advanced LLMs such as Llama, Gemma, and Mistral AI models, submit a help ticket for support. Additional details are available on the AI Models page.
Examples and Reference Data¶
Please see the /data/ai/ folder, AI_Examples, and AI_Reference_Datasets for helpful resources. Notebooks and batch scripts cover everything from pretraining and inference to summarization, information extraction, and topic modeling. Additional reference data, including benchmarks such as the popular SuperGLUE, is already available in /data/ai/benchmarks/nlp.
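To browse these resources from a terminal, a small helper like the one below can list several directories in one pass and skip any that are not present. This is only an illustrative sketch: list_ai_resources is a hypothetical function name, and the directory contents will depend on what is currently installed under /data/ai/.

```shell
#!/bin/sh
# Sketch: list the contents of one or more resource directories.
# list_ai_resources is a hypothetical helper, not part of HiPerGator.

list_ai_resources() {
    for dir in "$@"; do
        if [ -d "$dir" ]; then
            # Print a header line, then one entry per line.
            echo "== $dir"
            ls -1 "$dir"
        else
            # Missing directories are reported on stderr and skipped.
            echo "skipping $dir (not found; are you on HiPerGator?)" >&2
        fi
    done
}
```

On HiPerGator, running list_ai_resources /data/ai/examples/nlp /data/ai/benchmarks/nlp would print the contents of the two NLP directories named on this page.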