# Llama2 Installation Guide for Mac (M1 Chip)
Guide for setting up and running Llama2 on Mac systems with Apple silicon. This repo provides instructions for installing prerequisites like Python and Git, cloning the necessary repositories, downloading and converting the Llama models, and finally running the model with example prompts.
## Prerequisites

Before starting, ensure your system meets the following requirements:

- **Python 3.8+ (Python 3.11 recommended).** Check your Python version:

  ```sh
  python3 --version
  ```

  Install Python 3.11 if needed:

  ```sh
  brew install python@3.11
  ```

- **Miniconda**, for managing the conda environment used later in this guide.
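Before moving on, it can help to confirm the Python requirement is met. A minimal check:

```sh
# Sanity check (a sketch): verify the installed Python meets the 3.8+ requirement
python3 -c 'import sys; assert sys.version_info >= (3, 8), "Python 3.8+ required"'
```

If the assertion fails, install a newer Python with the Homebrew command above before continuing.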
## Cloning the Llama2 Repository

Clone the Llama2 repository:

```sh
git clone https://github.com/facebookresearch/llama.git
```

Clone the llama.cpp (C++ port) repository:

```sh
git clone https://github.com/ggerganov/llama.cpp.git
```

Now both repositories should be in your `llama2` directory. Inside the `llama.cpp` directory, build it:

```sh
cd llama.cpp
make
```
## Requesting Access to Llama Models

- Visit Meta AI Resources.
- Fill in your details in the request form.
- You'll receive an email with a unique URL to download the models.
## Downloading the Models

- In your terminal, navigate to the `llama` directory:

  ```sh
  cd llama
  ```

- Run the download script:

  ```sh
  /bin/bash ./download.sh
  ```

- When prompted, enter the custom URL from the email.
## Converting the Downloaded Models

- Navigate back to the `llama.cpp` repository:

  ```sh
  cd llama.cpp
  ```

- Create a conda environment named `llama2`:

  ```sh
  conda create --name llama2
  ```

- Activate the environment:

  ```sh
  conda activate llama2
  ```

- Install the Python dependencies:

  ```sh
  python3 -m pip install -r requirements.txt
  ```

- Convert the model to f16 format:

  ```sh
  python3 convert.py --outfile models/7B/ggml-model-f16.bin --outtype f16 ../llama2/llama-2-7b-chat --vocab-dir ../llama2
  ```

  Note: If you encounter an error about a vocab size mismatch (the model has -1, but `tokenizer.model` has 32000), update `params.json` in `../llama2/llama-2-7b-chat`, changing the vocab size from -1 to 32000.

- Quantize the model to reduce its size:

  ```sh
  ./quantize ./models/7B/ggml-model-f16.bin ./models/7B/ggml-model-q4_0.bin q4_0
  ```
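The `params.json` edit for the vocab size mismatch can also be scripted. A sketch, assuming the mismatched key in your `params.json` is named `vocab_size` (check the file before running):

```sh
# Hypothetical helper: change vocab_size from -1 to 32000 in params.json.
# The PARAMS path and the "vocab_size" key name are assumptions; verify them
# against your own download before running.
PARAMS=../llama2/llama-2-7b-chat/params.json
python3 - "$PARAMS" <<'EOF'
import json, os, sys

path = sys.argv[1]
if not os.path.exists(path):
    # Nothing to do if the file is not where this sketch expects it
    print(f"{path} not found; adjust PARAMS for your setup")
    raise SystemExit(0)

with open(path) as f:
    params = json.load(f)

if params.get("vocab_size") == -1:
    params["vocab_size"] = 32000  # match the 32000 entries in tokenizer.model
    with open(path, "w") as f:
        json.dump(params, f, indent=2)
EOF
```

After this, re-run the `convert.py` command from the previous step.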
## Running the Model

- Execute the following command:

  ```sh
  ./main -m ./models/7B/ggml-model-q4_0.bin -n 1024 --repeat_penalty 1.0 --color -i -r "User:" -f ./prompts/chat-with-bob.txt
  ```

  - `-m`: Model file
  - `-n`: Number of tokens to generate
  - `--color`: Colorized output to distinguish prompt and user input
  - `-i`: Interactive mode
  - `-r "User:"`: Reverse prompt marking where user input begins
  - `-f`: Path to prompt file
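That invocation is long, so it can be convenient to wrap it in a small launcher script. A sketch (the `chat.sh` name and argument handling are this guide's assumptions, not part of llama.cpp):

```sh
#!/usr/bin/env bash
# chat.sh - hypothetical wrapper around the ./main invocation above.
# Usage: ./chat.sh [model] [prompt-file]
MODEL="${1:-./models/7B/ggml-model-q4_0.bin}"      # quantized model from the previous step
PROMPT_FILE="${2:-./prompts/chat-with-bob.txt}"    # example prompt shipped with llama.cpp

# Build the command as an array so quoted flags like -r "User:" survive intact
CMD=(./main -m "$MODEL" -n 1024 --repeat_penalty 1.0 --color -i -r "User:" -f "$PROMPT_FILE")
echo "Running: ${CMD[*]}"

if [ -x ./main ]; then
  "${CMD[@]}"
else
  echo "./main not found; run this script from inside the llama.cpp directory"
fi
```

Run it from the `llama.cpp` directory, optionally passing a different model path as the first argument.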
Now you're ready to use Llama2!