## Images
We have three Docker images available for this project:
1. `ghcr.io/ggml-org/llama.cpp:full`: This image includes both the `llama-cli` and `llama-completion` executables, along with the tools to convert LLaMA models into ggml format and quantize them to 4-bit. (platforms: `linux/amd64`, `linux/arm64`, `linux/s390x`)
2. `ghcr.io/ggml-org/llama.cpp:light`: This image only includes the `llama-cli` and `llama-completion` executables. (platforms: `linux/amd64`, `linux/arm64`, `linux/s390x`)
3. `ghcr.io/ggml-org/llama.cpp:server`: This image only includes the `llama-server` executable. (platforms: `linux/amd64`, `linux/arm64`, `linux/s390x`)
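For example, to fetch the prebuilt server image (a minimal sketch; `docker pull` automatically selects the variant matching your architecture from the multi-arch manifest):

```bash
docker pull ghcr.io/ggml-org/llama.cpp:server
```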
Additionally, there are the following images, similar to the above:
On completion, you are ready to play!
```bash
docker run --entrypoint /app/llama-cli -v /path/to/models:/models ghcr.io/ggml-org/llama.cpp:full -m /models/7B/ggml-model-q4_0.gguf
docker run --entrypoint /app/llama-completion -v /path/to/models:/models ghcr.io/ggml-org/llama.cpp:full -m /models/32B/ggml-model-q8_0.gguf -no-cnv -p "Building a mobile app can be done in 15 steps:" -n 512
```
or with a light image:
```bash
docker run --entrypoint /app/llama-cli -v /path/to/models:/models ghcr.io/ggml-org/llama.cpp:light -m /models/7B/ggml-model-q4_0.gguf
docker run --entrypoint /app/llama-completion -v /path/to/models:/models ghcr.io/ggml-org/llama.cpp:light -m /models/32B/ggml-model-q8_0.gguf -no-cnv -p "Building a mobile app can be done in 15 steps:" -n 512
```
In the light-image examples above, `--entrypoint /app/llama-cli` is specified for clarity, but you can safely omit it, since it is that image's default entrypoint.
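Similarly, the server image runs `llama-server`; a minimal sketch (assuming llama-server's default port of 8080 and a model mounted as above):

```bash
docker run -v /path/to/models:/models -p 8080:8080 ghcr.io/ggml-org/llama.cpp:server -m /models/7B/ggml-model-q4_0.gguf --host 0.0.0.0 --port 8080
```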
## Docker With CUDA
Assuming one has the [nvidia-container-toolkit](https://github.com/NVIDIA/nvidia-container-toolkit) properly installed on Linux, or is using a GPU-enabled cloud, `cuBLAS` should be accessible inside the container.
The resulting images are essentially the same as the non-CUDA images:
1. `local/llama.cpp:full-cuda`: This image includes both the `llama-cli` and `llama-completion` executables, along with the tools to convert LLaMA models into ggml format and quantize them to 4-bit.
2. `local/llama.cpp:light-cuda`: This image only includes the `llama-cli` and `llama-completion` executables.
3. `local/llama.cpp:server-cuda`: This image only includes the `llama-server` executable.
## Usage
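After building locally, usage follows the same pattern as the non-CUDA images, with the GPU exposed to the container; a sketch (assuming the `local/llama.cpp:light-cuda` image built above, with `--n-gpu-layers` controlling how many layers are offloaded to the GPU):

```bash
docker run --gpus all -v /path/to/models:/models local/llama.cpp:light-cuda -m /models/7B/ggml-model-q4_0.gguf --n-gpu-layers 99
```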
The resulting images are essentially the same as the non-MUSA images:
1. `local/llama.cpp:full-musa`: This image includes both the `llama-cli` and `llama-completion` executables, along with the tools to convert LLaMA models into ggml format and quantize them to 4-bit.
2. `local/llama.cpp:light-musa`: This image only includes the `llama-cli` and `llama-completion` executables.
3. `local/llama.cpp:server-musa`: This image only includes the `llama-server` executable.