Commit 99ead70

CLI: fixed adding cli and completion into docker containers, improved docs

1 parent 5266379

5 files changed: +23, -14 lines

.devops/cann.Dockerfile (1 addition, 1 deletion)

```diff
@@ -107,7 +107,7 @@ ENTRYPOINT ["/app/tools.sh"]
 # ENTRYPOINT ["/app/llama-server"]
 
 ### Target: light
-# Lightweight image containing only llama-cli
+# Lightweight image containing only llama-cli and llama-completion
 # ==============================================================================
 FROM base AS light
 
```
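
Since `light` is a named build stage, it can be built on its own with Docker's standard `--target` flag; a minimal sketch (the output tag is illustrative, not part of this commit):

```bash
# Build just the lightweight stage declared by "FROM base AS light";
# the tag "local/llama.cpp:light-cann" is a placeholder.
docker build --target light -f .devops/cann.Dockerfile -t local/llama.cpp:light-cann .
```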

.devops/llama-cli-cann.Dockerfile (3 additions, 2 deletions)

```diff
@@ -23,11 +23,12 @@ ENV LD_LIBRARY_PATH=${ASCEND_TOOLKIT_HOME}/runtime/lib64/stub:$LD_LIBRARY_PATH
 RUN echo "Building with static libs" && \
     source /usr/local/Ascend/ascend-toolkit/set_env.sh --force && \
     cmake -B build -DGGML_NATIVE=OFF -DGGML_CANN=ON -DBUILD_SHARED_LIBS=OFF -DLLAMA_BUILD_TESTS=OFF && \
-    cmake --build build --config Release --target llama-cli
+    cmake --build build --config Release --target llama-cli && \
+    cmake --build build --config Release --target llama-completion
 
 # TODO: use image with NNRT
 FROM ascendai/cann:$ASCEND_VERSION AS runtime
-COPY --from=build /app/build/bin/llama-cli /llama-cli
+COPY --from=build /app/build/bin/llama-cli /app/build/bin/llama-completion /
 
 ENV LC_ALL=C.utf8
 
```
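
Aside: the two chained `--build` calls could also be collapsed into one, since `cmake --build` accepts multiple `--target` arguments from CMake 3.15 onward; an equivalent sketch, assuming the toolchain in the build image is recent enough:

```bash
# Equivalent to the two chained invocations above (requires CMake >= 3.15).
cmake --build build --config Release --target llama-cli llama-completion
```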

.devops/llama-cpp-cuda.srpm.spec (2 additions, 0 deletions)

```diff
@@ -37,6 +37,7 @@ make -j GGML_CUDA=1
 %install
 mkdir -p %{buildroot}%{_bindir}/
 cp -p llama-cli %{buildroot}%{_bindir}/llama-cuda-cli
+cp -p llama-completion %{buildroot}%{_bindir}/llama-cuda-completion
 cp -p llama-server %{buildroot}%{_bindir}/llama-cuda-server
 cp -p llama-simple %{buildroot}%{_bindir}/llama-cuda-simple
 
@@ -68,6 +69,7 @@ rm -rf %{_builddir}/*
 
 %files
 %{_bindir}/llama-cuda-cli
+%{_bindir}/llama-cuda-completion
 %{_bindir}/llama-cuda-server
 %{_bindir}/llama-cuda-simple
 /usr/lib/systemd/system/llamacuda.service
```
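
A quick way to sanity-check the spec change is to rebuild the package and list its contents; a sketch that assumes a default `~/rpmbuild` tree and an RPM named after the spec (neither is stated in the commit):

```bash
# Rebuild the binary RPM from the updated spec (hypothetical local setup).
rpmbuild -bb .devops/llama-cpp-cuda.srpm.spec
# Confirm the new binary is packaged (the package filename is a guess).
rpm -qlp ~/rpmbuild/RPMS/*/llama-cpp-cuda-*.rpm | grep llama-cuda-completion
```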

.devops/llama-cpp.srpm.spec (2 additions, 0 deletions)

```diff
@@ -39,6 +39,7 @@ make -j
 %install
 mkdir -p %{buildroot}%{_bindir}/
 cp -p llama-cli %{buildroot}%{_bindir}/llama-cli
+cp -p llama-completion %{buildroot}%{_bindir}/llama-completion
 cp -p llama-server %{buildroot}%{_bindir}/llama-server
 cp -p llama-simple %{buildroot}%{_bindir}/llama-simple
 
@@ -70,6 +71,7 @@ rm -rf %{_builddir}/*
 
 %files
 %{_bindir}/llama-cli
+%{_bindir}/llama-completion
 %{_bindir}/llama-server
 %{_bindir}/llama-simple
 /usr/lib/systemd/system/llama.service
```
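
Both halves of each spec edit are needed: by default, rpmbuild refuses to build when a file lands in the buildroot without a matching `%files` entry. An illustrative run, with the error that default behavior produces (not output from this commit):

```bash
# With only the %install cp and no %files entry, the build would stop with:
#   error: Installed (but unpackaged) file(s) found:
#      /usr/bin/llama-completion
rpmbuild -bb .devops/llama-cpp.srpm.spec
```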

docs/docker.md (15 additions, 11 deletions)

````diff
@@ -7,9 +7,9 @@
 ## Images
 We have three Docker images available for this project:
 
-1. `ghcr.io/ggml-org/llama.cpp:full`: This image includes both the main executable file and the tools to convert LLaMA models into ggml and convert into 4-bit quantization. (platforms: `linux/amd64`, `linux/arm64`, `linux/s390x`)
-2. `ghcr.io/ggml-org/llama.cpp:light`: This image only includes the main executable file. (platforms: `linux/amd64`, `linux/arm64`, `linux/s390x`)
-3. `ghcr.io/ggml-org/llama.cpp:server`: This image only includes the server executable file. (platforms: `linux/amd64`, `linux/arm64`, `linux/s390x`)
+1. `ghcr.io/ggml-org/llama.cpp:full`: This image includes both the `llama-cli` and `llama-completion` executables and the tools to convert LLaMA models into ggml and convert into 4-bit quantization. (platforms: `linux/amd64`, `linux/arm64`, `linux/s390x`)
+2. `ghcr.io/ggml-org/llama.cpp:light`: This image only includes the `llama-cli` and `llama-completion` executables. (platforms: `linux/amd64`, `linux/arm64`, `linux/s390x`)
+3. `ghcr.io/ggml-org/llama.cpp:server`: This image only includes the `llama-server` executable. (platforms: `linux/amd64`, `linux/arm64`, `linux/s390x`)
 
 Additionally, there the following images, similar to the above:
 
@@ -44,13 +44,15 @@ docker run -v /path/to/models:/models ghcr.io/ggml-org/llama.cpp:full --all-in-one
 On completion, you are ready to play!
 
 ```bash
-docker run -v /path/to/models:/models ghcr.io/ggml-org/llama.cpp:full --run -m /models/7B/ggml-model-q4_0.gguf -p "Building a website can be done in 10 simple steps:" -n 512
+docker run -v /path/to/models:/models ghcr.io/ggml-org/llama.cpp:full --run -m /models/7B/ggml-model-q4_0.gguf
+docker run -v /path/to/models:/models ghcr.io/ggml-org/llama.cpp:full --run-legacy -m /models/32B/ggml-model-q8_0.gguf -no-cnv -p "Building a mobile app can be done in 15 steps:" -n 512
 ```
 
 or with a light image:
 
 ```bash
-docker run -v /path/to/models:/models ghcr.io/ggml-org/llama.cpp:light -m /models/7B/ggml-model-q4_0.gguf -p "Building a website can be done in 10 simple steps:" -n 512
+docker run -v /path/to/models:/models --entrypoint /app/llama-cli ghcr.io/ggml-org/llama.cpp:light -m /models/7B/ggml-model-q4_0.gguf
+docker run -v /path/to/models:/models --entrypoint /app/llama-completion ghcr.io/ggml-org/llama.cpp:light -m /models/32B/ggml-model-q8_0.gguf -no-cnv -p "Building a mobile app can be done in 15 steps:" -n 512
 ```
 
 or with a server image:
@@ -59,6 +61,8 @@ or with a server image:
 docker run -v /path/to/models:/models -p 8080:8080 ghcr.io/ggml-org/llama.cpp:server -m /models/7B/ggml-model-q4_0.gguf --port 8080 --host 0.0.0.0 -n 512
 ```
 
+In the above examples, `--entrypoint /app/llama-cli` is specified for clarity, but you can safely omit it since it's the default entrypoint in the container.
+
 ## Docker With CUDA
 
 Assuming one has the [nvidia-container-toolkit](https://github.com/NVIDIA/nvidia-container-toolkit) properly installed on Linux, or is using a GPU enabled cloud, `cuBLAS` should be accessible inside the container.
@@ -80,9 +84,9 @@ The defaults are:
 
 The resulting images, are essentially the same as the non-CUDA images:
 
-1. `local/llama.cpp:full-cuda`: This image includes both the main executable file and the tools to convert LLaMA models into ggml and convert into 4-bit quantization.
-2. `local/llama.cpp:light-cuda`: This image only includes the main executable file.
-3. `local/llama.cpp:server-cuda`: This image only includes the server executable file.
+1. `local/llama.cpp:full-cuda`: This image includes both the `llama-cli` and `llama-completion` executables and the tools to convert LLaMA models into ggml and convert into 4-bit quantization.
+2. `local/llama.cpp:light-cuda`: This image only includes the `llama-cli` and `llama-completion` executables.
+3. `local/llama.cpp:server-cuda`: This image only includes the `llama-server` executable.
 
 ## Usage
 
@@ -114,9 +118,9 @@ The defaults are:
 
 The resulting images, are essentially the same as the non-MUSA images:
 
-1. `local/llama.cpp:full-musa`: This image includes both the main executable file and the tools to convert LLaMA models into ggml and convert into 4-bit quantization.
-2. `local/llama.cpp:light-musa`: This image only includes the main executable file.
-3. `local/llama.cpp:server-musa`: This image only includes the server executable file.
+1. `local/llama.cpp:full-musa`: This image includes both the `llama-cli` and `llama-completion` executables and the tools to convert LLaMA models into ggml and convert into 4-bit quantization.
+2. `local/llama.cpp:light-musa`: This image only includes the `llama-cli` and `llama-completion` executables.
+3. `local/llama.cpp:server-musa`: This image only includes the `llama-server` executable.
 
 ## Usage
 
````
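
The new doc note depends on `/app/llama-cli` being the light image's default entrypoint; one way to verify that locally, assuming the image has been pulled (a sketch, not from the commit):

```bash
# Print the image's configured entrypoint; if the docs are accurate,
# this prints ["/app/llama-cli"].
docker inspect --format '{{json .Config.Entrypoint}}' ghcr.io/ggml-org/llama.cpp:light
```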
