NLP

This page describes the Natural Language Processing (NLP) software available on HiPerGator. NLP is a branch of artificial intelligence (AI) that helps computers understand and respond to human language. It is used in applications such as voice assistants, chatbots, and translation. NLP combines linguistic rules with machine learning so that computers can grasp not just words but also the intent and sentiment behind them. It is applied across many fields, such as healthcare, where it helps analyze medical records to aid patient care. Research Computing can assist with language modeling for knowledge exploration, measurement, classification, summarization, conversational AI, and more through support requests or consulting.

Environment Modules for NLP

NeMo: module load nemo loads a Singularity container environment with Python and NVIDIA NeMo. NeMo supports training models for NLP tasks, provides speech-to-text and text-to-speech models, and allows use of pretrained Megatron language models.
- To list available versions on HiPerGator-AI:
```
module spider nemo
```

BioNeMo: module load bionemo loads a Singularity container environment with Python and NVIDIA BioNeMo, which specializes in biomedical NLP tasks such as medical text analysis and patient information processing. It also allows integration of pretrained Megatron language models.
- To list available versions on HiPerGator-AI:
```
module spider bionemo
```

Llama: module load llama provides tools for fine-tuning Meta Llama models, supporting domain adaptation and building LLM-based applications locally, in the cloud, or on-premises.
- To list available versions on HiPerGator-AI:
```
module spider llama
```

Mistral AI: module load mistral provides a set of tools for Mistral models, including tokenization and structured conversation tools.
- To list available versions on HiPerGator-AI:
```
module spider mistral
```

Gemma LLMs: module load gemma_llm provides tools for working with Google Gemma models, compatible with PyTorch, KerasNLP, NVIDIA NeMo, and Hugging Face Transformers.
- To list available versions on HiPerGator-AI:
```
module spider gemma_llm
```

PyTorch or TensorFlow: Use module load pytorch or module load tensorflow to access these environments. For additional libraries, create a custom Conda environment. See Conda and Managing Python environments and Jupyter kernels for details.
- To list available versions on HiPerGator-AI:
```
module spider pytorch
```
  or
```
module spider tensorflow
```

ngc-pytorch: module load ngc-pytorch provides a container environment with PyTorch and NVIDIA Apex, supporting large-parameter Megatron language models. See /data/ai/examples/nlp or AI Examples for details.
- To list available versions on HiPerGator-AI:
```
module spider ngc-pytorch
```

- Transformers: Supports models such as BERT and GPT for a variety of tasks across text, vision, and audio modalities. Available in nlp/1.3 and llama/3.
- LangChain: A framework that simplifies building applications on large language models (LLMs), supporting document analysis, summarization, and chatbots. Available in nlp/1.3 and llama/3.
- LlamaIndex: A flexible data framework for connecting custom data sources to LLMs. Available in nlp/1.3 and llama/3.
- TensorRT-LLM: An open-source library that optimizes LLM inference performance on NVIDIA platforms. Available in llama/3.
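
To confirm that a loaded module actually provides the packages listed above, a quick import check can help. The following is a minimal sketch rather than an official HiPerGator tool: check_pkgs is a hypothetical helper name, and the import names in the usage comment (transformers, langchain, llama_index) are the usual Python import names for these packages, assumed here to match what the modules install.

```shell
#!/bin/bash
# Sketch: report whether Python packages can be imported in the
# currently loaded environment. check_pkgs is a hypothetical helper.
check_pkgs() {
  local pkg status=0
  for pkg in "$@"; do
    # Try the import quietly; only the exit code matters here.
    if python -c "import ${pkg}" >/dev/null 2>&1; then
      echo "${pkg}: available"
    else
      echo "${pkg}: not found"
      status=1
    fi
  done
  # Non-zero if at least one package could not be imported.
  return "${status}"
}

# Example use after loading a module (version taken from the list above):
#   module load nlp/1.3
#   check_pkgs transformers langchain llama_index
```

The function returns a non-zero status when any package is missing, so it can also gate later steps in a job script.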

nlp: module load nlp loads a Python environment with PyTorch, torchtext, NLTK, spaCy, transformers, and other NLP tools.
- To list available versions on HiPerGator-AI:
```
module spider nlp
```
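
In a job script it can be useful to stop early, with a clear message, when a module fails to load. The sketch below assumes that the module command is available as a shell function (as it is under Lmod or Environment Modules) and that module load returns a non-zero status on failure. load_or_report is a hypothetical helper name, not part of HiPerGator.

```shell
#!/bin/bash
# Sketch: load a module and print a clear message about the outcome.
# load_or_report is a hypothetical helper name.
load_or_report() {
  local name="$1"
  # Outside the cluster the `module` command may not exist at all.
  if ! type module >/dev/null 2>&1; then
    echo "module command not found; skipping ${name}"
    return 1
  fi
  # Assumption: `module load` exits non-zero when the module is unavailable.
  if module load "${name}"; then
    echo "loaded ${name}"
  else
    echo "could not load ${name}"
    return 1
  fi
}

# Example use in a batch script:
#   load_or_report nlp || exit 1
```

Calling the function directly, rather than inside a subshell, keeps the loaded environment active for the commands that follow.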

- spark-nlp: See Spark for instructions to start a Spark cluster. Available in tensorflow/2.4.1.
- FlairNLP: See the FlairNLP documentation for more information.
- parlai: A conversational AI framework by Facebook, with models ranging from 110M to 9B parameters.

Large Language Models

HiPerGator offers access to various LLMs for download, including starter LLMs trained with Megatron-LM as well as Llama 2 and Llama 3. These models, including a 20B-parameter GPT and a 9B-parameter BERT, can be further trained or fine-tuned. For advanced LLMs such as Llama, Gemma, and Mistral, submit a help ticket for support. Additional details are available on the AI Models page.

Examples and Reference Data

Please see the /data/ai/ folder, AI_Examples, and AI_Reference_Datasets for helpful resources. Notebooks and batch scripts cover everything from pretraining and inference to summarization, information extraction, and topic modeling. Additional reference data, including benchmarks such as the popular SuperGLUE, are already available in /data/ai/benchmarks/nlp.
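
A quick way to see what these shared folders contain is to list them from a login or compute node. This is a minimal sketch: list_resources is a hypothetical helper name, and the two paths are the ones named on this page. On a system where the paths are not mounted, the function prints a note instead of failing.

```shell
#!/bin/bash
# Sketch: list the top-level entries of a shared directory, or say that
# it is not visible from this machine. list_resources is hypothetical.
list_resources() {
  local dir="$1"
  if [ -d "${dir}" ]; then
    ls -1 "${dir}"
  else
    echo "${dir} is not available on this system"
  fi
}

# Paths taken from this page.
list_resources /data/ai/examples/nlp
list_resources /data/ai/benchmarks/nlp
```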