Commit 862d8cd

cli: fixed dead links to tools/main for cli and completion, fixed code owners
1 parent 4d5ae24 commit 862d8cd

7 files changed: +28, -24 lines changed

CODEOWNERS

Lines changed: 2 additions & 1 deletion
@@ -87,7 +87,8 @@
 /tests/ @ggerganov
 /tests/test-chat-.* @pwilkin
 /tools/batched-bench/ @ggerganov
-/tools/main/ @ggerganov
+/tools/cli/ @ngxson
+/tools/completion/ @ggerganov
 /tools/mtmd/ @ngxson
 /tools/perplexity/ @ggerganov
 /tools/quantize/ @ggerganov

README.md

Lines changed: 3 additions & 2 deletions
@@ -313,7 +313,7 @@ The Hugging Face platform provides a variety of online tools for converting, qua

 To learn more about model quantization, [read this documentation](tools/quantize/README.md)

-## [`llama-cli`](tools/main)
+## [`llama-cli`](tools/cli)

 #### A CLI tool for accessing and experimenting with most of `llama.cpp`'s functionality.

@@ -525,7 +525,8 @@ To learn more about model quantization, [read this documentation](tools/quantize

 ## Other documentation

-- [main (cli)](tools/main/README.md)
+- [cli](tools/cli/README.md)
+- [completion](tools/completion/README.md)
 - [server](tools/server/README.md)
 - [GBNF grammars](grammars/README.md)

docs/development/HOWTO-add-model.md

Lines changed: 2 additions & 1 deletion
@@ -9,7 +9,8 @@ Adding a model requires few steps:
 After following these steps, you can open a PR.

 Also, it is important to check that the examples and main ggml backends (CUDA, METAL, CPU) are working with the new architecture, especially:
-- [main](/tools/main/)
+- [cli](/tools/cli/)
+- [completion](/tools/completion/)
 - [imatrix](/tools/imatrix/)
 - [quantize](/tools/quantize/)
 - [server](/tools/server/)
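A quick smoke test of a converted model against the renamed tools might look like this (a hedged sketch, not part of this commit; the model path and prompt are assumptions, the flags are those documented elsewhere in this commit):

```bash
# Hypothetical model path; any converted GGUF for the new architecture would do
./llama-cli -m models/new-arch.Q4_K_M.gguf -p "Hello"
./llama-completion -m models/new-arch.Q4_K_M.gguf -no-cnv -p "Hello"
```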

grammars/README.md

Lines changed: 3 additions & 3 deletions
@@ -1,6 +1,6 @@
 # GBNF Guide

-GBNF (GGML BNF) is a format for defining [formal grammars](https://en.wikipedia.org/wiki/Formal_grammar) to constrain model outputs in `llama.cpp`. For example, you can use it to force the model to generate valid JSON, or speak only in emojis. GBNF grammars are supported in various ways in `tools/main` and `tools/server`.
+GBNF (GGML BNF) is a format for defining [formal grammars](https://en.wikipedia.org/wiki/Formal_grammar) to constrain model outputs in `llama.cpp`. For example, you can use it to force the model to generate valid JSON, or speak only in emojis. GBNF grammars are supported in various ways in `tools/cli`, `tools/completion` and `tools/server`.

 ## Background

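For context, a minimal grammar and invocation could look like this (a sketch, not part of this commit; the grammar file name and model path are assumptions):

```bash
# Write a tiny GBNF grammar that only admits "yes" or "no"
cat > yesno.gbnf <<'EOF'
root ::= ("yes" | "no")
EOF

# Constrain generation with it; --grammar-file is the flag documented further down
./llama-cli -m models/gemma-1.1-7b-it.Q4_K_M.gguf --grammar-file yesno.gbnf -p "Is water wet? Answer: "
```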
@@ -135,7 +135,7 @@ While semantically correct, the syntax `x? x? x?.... x?` (with N repetitions) ma
 You can use GBNF grammars:

 - In [llama-server](../tools/server)'s completion endpoints, passed as the `grammar` body field
-- In [llama-cli](../tools/main), passed as the `--grammar` & `--grammar-file` flags
+- In [llama-cli](../tools/cli) and [llama-completion](../tools/completion), passed as the `--grammar` & `--grammar-file` flags
 - With [test-gbnf-validator](../tests/test-gbnf-validator.cpp), to test them against strings.

 ## JSON Schemas → GBNF
@@ -145,7 +145,7 @@ You can use GBNF grammars:
 - In [llama-server](../tools/server):
   - For any completion endpoints, passed as the `json_schema` body field
   - For the `/chat/completions` endpoint, passed inside the `response_format` body field (e.g. `{"type": "json_object", "schema": {"items": {}}}` or `{ type: "json_schema", json_schema: {"schema": ...} }`)
-- In [llama-cli](../tools/main), passed as the `--json` / `-j` flag
+- In [llama-cli](../tools/cli) and [llama-completion](../tools/completion), passed as the `--json` / `-j` flag
 - To convert to a grammar ahead of time:
   - in CLI, with [examples/json_schema_to_grammar.py](../examples/json_schema_to_grammar.py)
   - in JavaScript with [json-schema-to-grammar.mjs](../tools/server/public_legacy/json-schema-to-grammar.mjs) (this is used by the [server](../tools/server)'s Web UI)
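As a rough illustration of the server-side route (a hedged sketch; the host and port assume llama-server's defaults, and the schema is made up):

```bash
# Constrain a completion by passing a JSON schema in the json_schema body field
curl http://localhost:8080/completion -d '{
  "prompt": "Name one fruit as JSON: ",
  "json_schema": {"type": "object", "properties": {"name": {"type": "string"}}, "required": ["name"]}
}'
```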

tools/cli/README.md

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+TODO

tools/completion/README.md

Lines changed: 16 additions & 16 deletions
@@ -1,4 +1,4 @@
-# llama.cpp/tools/main
+# llama.cpp/tools/completion

 This example program allows you to use various LLaMA language models easily and efficiently. It is specifically designed to work with the [llama.cpp](https://github.com/ggml-org/llama.cpp) project, which provides a plain C/C++ implementation with optional 4-bit quantization support for faster, lower memory inference, and is optimized for desktop CPUs. This program can be used to perform various inference tasks with LLaMA models, including generating text based on user-provided prompts and chat-like interactions with reverse prompts.


@@ -27,64 +27,64 @@ Once downloaded, place your model in the models folder in llama.cpp.
 ##### Input prompt (One-and-done)

 ```bash
-./llama-cli -m models/gemma-1.1-7b-it.Q4_K_M.gguf -no-cnv --prompt "Once upon a time"
+./llama-completion -m models/gemma-1.1-7b-it.Q4_K_M.gguf -no-cnv --prompt "Once upon a time"
 ```

 ##### Conversation mode (Allow for continuous interaction with the model)

 ```bash
-./llama-cli -m models/gemma-1.1-7b-it.Q4_K_M.gguf --chat-template gemma
+./llama-completion -m models/gemma-1.1-7b-it.Q4_K_M.gguf --chat-template gemma
 ```

 ##### Conversation mode using built-in jinja chat template

 ```bash
-./llama-cli -m models/gemma-1.1-7b-it.Q4_K_M.gguf --jinja
+./llama-completion -m models/gemma-1.1-7b-it.Q4_K_M.gguf --jinja
 ```

 ##### One-and-done query using jinja with custom system prompt and a starting prompt

 ```bash
-./llama-cli -m models/gemma-1.1-7b-it.Q4_K_M.gguf --jinja --single-turn -sys "You are a helpful assistant" -p "Hello"
+./llama-completion -m models/gemma-1.1-7b-it.Q4_K_M.gguf --jinja --single-turn -sys "You are a helpful assistant" -p "Hello"
 ```

 ##### Infinite text from a starting prompt (you can use `Ctrl-C` to stop it):

 ```bash
-./llama-cli -m models/gemma-1.1-7b-it.Q4_K_M.gguf --ignore-eos -n -1
+./llama-completion -m models/gemma-1.1-7b-it.Q4_K_M.gguf --ignore-eos -n -1
 ```

 ### Windows:

 ##### Input prompt (One-and-done)

 ```powershell
-./llama-cli.exe -m models\gemma-1.1-7b-it.Q4_K_M.gguf -no-cnv --prompt "Once upon a time"
+./llama-completion.exe -m models\gemma-1.1-7b-it.Q4_K_M.gguf -no-cnv --prompt "Once upon a time"
 ```

 ##### Conversation mode (Allow for continuous interaction with the model)

 ```powershell
-./llama-cli.exe -m models\gemma-1.1-7b-it.Q4_K_M.gguf --chat-template gemma
+./llama-completion.exe -m models\gemma-1.1-7b-it.Q4_K_M.gguf --chat-template gemma
 ```

 ##### Conversation mode using built-in jinja chat template

 ```powershell
-./llama-cli.exe -m models\gemma-1.1-7b-it.Q4_K_M.gguf --jinja
+./llama-completion.exe -m models\gemma-1.1-7b-it.Q4_K_M.gguf --jinja
 ```

 ##### One-and-done query using jinja with custom system prompt and a starting prompt

 ```powershell
-./llama-cli.exe -m models\gemma-1.1-7b-it.Q4_K_M.gguf --jinja --single-turn -sys "You are a helpful assistant" -p "Hello"
+./llama-completion.exe -m models\gemma-1.1-7b-it.Q4_K_M.gguf --jinja --single-turn -sys "You are a helpful assistant" -p "Hello"
 ```

 ##### Infinite text from a starting prompt (you can use `Ctrl-C` to stop it):

 ```powershell
-llama-cli.exe -m models\gemma-1.1-7b-it.Q4_K_M.gguf --ignore-eos -n -1
+llama-completion.exe -m models\gemma-1.1-7b-it.Q4_K_M.gguf --ignore-eos -n -1
 ```

 ## Common Options

-In this section, we cover the most commonly used options for running the `llama-cli` program with the LLaMA models:
+In this section, we cover the most commonly used options for running the `llama-completion` program with the LLaMA models:

 - `-m FNAME, --model FNAME`: Specify the path to the LLaMA model file (e.g., `models/gemma-1.1-7b-it.Q4_K_M.gguf`; inferred from `--model-url` if set).
 - `-mu MODEL_URL, --model-url MODEL_URL`: Specify a remote HTTP URL from which to download the model file (e.g. [https://huggingface.co/ggml-org/gemma-1.1-7b-it-Q4_K_M-GGUF/resolve/main/gemma-1.1-7b-it.Q4_K_M.gguf?download=true](https://huggingface.co/ggml-org/gemma-1.1-7b-it-Q4_K_M-GGUF/resolve/main/gemma-1.1-7b-it.Q4_K_M.gguf?download=true)).
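Putting the two options together, a first run could fetch the model by URL instead of pointing at a local file (a sketch; the URL is the example given above):

```bash
# -m may be omitted here: the local model path is inferred from --model-url
./llama-completion -mu "https://huggingface.co/ggml-org/gemma-1.1-7b-it-Q4_K_M-GGUF/resolve/main/gemma-1.1-7b-it.Q4_K_M.gguf?download=true" -no-cnv -p "Once upon a time"
```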
@@ -97,7 +97,7 @@ In this section, we cover the most commonly used options for running the `llama-

 ## Input Prompts

-The `llama-cli` program provides several ways to interact with the LLaMA models using input prompts:
+The `llama-completion` program provides several ways to interact with the LLaMA models using input prompts:

 - `--prompt PROMPT`: Provide a prompt directly as a command-line option.
 - `--file FNAME`: Provide a file containing a prompt or multiple prompts.
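For instance, the two prompt sources above can be exercised like this (a sketch; the prompt file path is an assumption):

```bash
# Inline prompt
./llama-completion -m models/gemma-1.1-7b-it.Q4_K_M.gguf --prompt "Once upon a time"

# Prompt read from a file
./llama-completion -m models/gemma-1.1-7b-it.Q4_K_M.gguf --file prompts/story.txt
```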
@@ -107,7 +107,7 @@ The `llama-cli` program provides several ways to interact with the LLaMA models

 ## Interaction

-The `llama-cli` program offers a seamless way to interact with LLaMA models, allowing users to engage in real-time conversations or provide instructions for specific tasks. The interactive mode can be triggered using various options, including `--interactive` and `--interactive-first`.
+The `llama-completion` program offers a seamless way to interact with LLaMA models, allowing users to engage in real-time conversations or provide instructions for specific tasks. The interactive mode can be triggered using various options, including `--interactive` and `--interactive-first`.

 In interactive mode, users can participate in text generation by injecting their input during the process. Users can press `Ctrl+C` at any time to interject and type their input, followed by pressing `Return` to submit it to the LLaMA model. To submit additional lines without finalizing input, users can end the current line with a backslash (`\`) and continue typing.

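A rough sketch tying these options together (the model path and reverse prompt are illustrative; `--in-prefix` is described below):

```bash
# Start interactively with the user taking the first turn; control returns at "User:"
./llama-completion -m models/gemma-1.1-7b-it.Q4_K_M.gguf --interactive-first -r "User:" --in-prefix " "
```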

@@ -136,15 +136,15 @@ To overcome this limitation, you can use the `--in-prefix` flag to add a space o
 The `--in-prefix` flag is used to add a prefix to your input; primarily, this is used to insert a space after the reverse prompt. Here's an example of how to use the `--in-prefix` flag in conjunction with the `--reverse-prompt` flag:

 ```sh
-./llama-cli -r "User:" --in-prefix " "
+./llama-completion -r "User:" --in-prefix " "
 ```

 ### In-Suffix

 The `--in-suffix` flag is used to add a suffix after your input. This is useful for adding an "Assistant:" prompt after the user's input. It's added after the new-line character (`\n`) that's automatically added to the end of the user's input. Here's an example of how to use the `--in-suffix` flag in conjunction with the `--reverse-prompt` flag:

 ```sh
-./llama-cli -r "User:" --in-prefix " " --in-suffix "Assistant:"
+./llama-completion -r "User:" --in-prefix " " --in-suffix "Assistant:"
 ```

 When the `--in-prefix` or `--in-suffix` option is used, the chat template (`--chat-template`) is disabled.

tools/llama-bench/README.md

Lines changed: 1 addition & 1 deletion
@@ -80,7 +80,7 @@ Each test is repeated the number of times given by `-r`, and the results are ave

 Using the `-d <n>` option, each test can be run at a specified context depth, prefilling the KV cache with `<n>` tokens.

-For a description of the other options, see the [main example](../main/README.md).
+For a description of the other options, see the [completion example](../completion/README.md).

 > [!NOTE]
 > The measurements with `llama-bench` do not include the times for tokenization and for sampling.
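Combining the options described above, a hedged sketch of a benchmark run (the model path is an assumption):

```bash
# Repeat each test 5 times, prefilling the KV cache with 4096 tokens
./llama-bench -m models/gemma-1.1-7b-it.Q4_K_M.gguf -r 5 -d 4096
```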
