**grammars/README.md** (+3, −3)
@@ -1,6 +1,6 @@
# GBNF Guide
-GBNF (GGML BNF) is a format for defining [formal grammars](https://en.wikipedia.org/wiki/Formal_grammar) to constrain model outputs in `llama.cpp`. For example, you can use it to force the model to generate valid JSON, or speak only in emojis. GBNF grammars are supported in various ways in `tools/main` and `tools/server`.
+GBNF (GGML BNF) is a format for defining [formal grammars](https://en.wikipedia.org/wiki/Formal_grammar) to constrain model outputs in `llama.cpp`. For example, you can use it to force the model to generate valid JSON, or speak only in emojis. GBNF grammars are supported in various ways in `tools/cli`, `tools/completion` and `tools/server`.
## Background
@@ -135,7 +135,7 @@ While semantically correct, the syntax `x? x? x?.... x?` (with N repetitions) ma
You can use GBNF grammars:
- In [llama-server](../tools/server)'s completion endpoints, passed as the `grammar` body field
-- In [llama-cli](../tools/main), passed as the `--grammar` & `--grammar-file` flags
+- In [llama-cli](../tools/cli) and [llama-completion](../tools/completion), passed as the `--grammar` & `--grammar-file` flags
- With [test-gbnf-validator](../tests/test-gbnf-validator.cpp), to test them against strings.
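The `--grammar-file` path can be sketched concretely. A minimal example, assuming a built `llama-cli` binary and a local model file (both paths hypothetical):

```bash
# Write a minimal GBNF grammar that constrains output to "yes" or "no".
cat > yesno.gbnf <<'EOF'
root ::= "yes" | "no"
EOF

# Pass it to llama-cli (model path is an assumption; adjust to your setup):
# ./llama-cli -m models/gemma-1.1-7b-it.Q4_K_M.gguf --grammar-file yesno.gbnf -p "Is water wet? Answer:"
```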
## JSON Schemas → GBNF
@@ -145,7 +145,7 @@ You can use GBNF grammars:
- In [llama-server](../tools/server):
  - For any completion endpoints, passed as the `json_schema` body field
  - For the `/chat/completions` endpoint, passed inside the `response_format` body field (e.g. `{"type": "json_object", "schema": {"items": {}}}` or `{"type": "json_schema", "json_schema": {"schema": ...}}`)
-- In [llama-cli](../tools/main), passed as the `--json` / `-j` flag
+- In [llama-cli](../tools/cli) and [llama-completion](../tools/completion), passed as the `--json` / `-j` flag
- To convert to a grammar ahead of time:
  - in CLI, with [examples/json_schema_to_grammar.py](../examples/json_schema_to_grammar.py)
  - in JavaScript with [json-schema-to-grammar.mjs](../tools/server/public_legacy/json-schema-to-grammar.mjs) (this is used by the [server](../tools/server)'s Web UI)
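The ahead-of-time conversion can be sketched as follows; the schema content here is illustrative, and the converter script path is the one referenced in the list above (run from the repository root):

```bash
# A small JSON schema to convert.
cat > schema.json <<'EOF'
{"type": "object", "properties": {"name": {"type": "string"}}, "required": ["name"]}
EOF

# Convert it to a GBNF grammar (run from the llama.cpp repository root):
# python examples/json_schema_to_grammar.py schema.json > schema.gbnf
```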

**tools/completion/README.md** (+16, −16)
@@ -1,4 +1,4 @@
-# llama.cpp/tools/main
+# llama.cpp/tools/completion
This example program allows you to use various LLaMA language models easily and efficiently. It is specifically designed to work with the [llama.cpp](https://github.com/ggml-org/llama.cpp) project, which provides a plain C/C++ implementation with optional 4-bit quantization support for faster, lower memory inference, and is optimized for desktop CPUs. This program can be used to perform various inference tasks with LLaMA models, including generating text based on user-provided prompts and chat-like interactions with reverse prompts.
@@ -27,64 +27,64 @@ Once downloaded, place your model in the models folder in llama.cpp.
##### Input prompt (One-and-done)
```bash
-./llama-cli -m models/gemma-1.1-7b-it.Q4_K_M.gguf -no-cnv --prompt "Once upon a time"
+./llama-completion -m models/gemma-1.1-7b-it.Q4_K_M.gguf -no-cnv --prompt "Once upon a time"
```
##### Conversation mode (Allow for continuous interaction with the model)
-In this section, we cover the most commonly used options for running the `llama-cli` program with the LLaMA models:
+In this section, we cover the most commonly used options for running the `llama-completion` program with the LLaMA models:
- `-m FNAME, --model FNAME`: Specify the path to the LLaMA model file (e.g., `models/gemma-1.1-7b-it.Q4_K_M.gguf`; inferred from `--model-url` if set).
- `-mu MODEL_URL --model-url MODEL_URL`: Specify a remote HTTP URL to download the file (e.g. [https://huggingface.co/ggml-org/gemma-1.1-7b-it-Q4_K_M-GGUF/resolve/main/gemma-1.1-7b-it.Q4_K_M.gguf?download=true](https://huggingface.co/ggml-org/gemma-1.1-7b-it-Q4_K_M-GGUF/resolve/main/gemma-1.1-7b-it.Q4_K_M.gguf?download=true)).
@@ -97,7 +97,7 @@ In this section, we cover the most commonly used options for running the `llama-
## Input Prompts
-The `llama-cli` program provides several ways to interact with the LLaMA models using input prompts:
+The `llama-completion` program provides several ways to interact with the LLaMA models using input prompts:
- `--prompt PROMPT`: Provide a prompt directly as a command-line option.
- `--file FNAME`: Provide a file containing a prompt or multiple prompts.
@@ -107,7 +107,7 @@ The `llama-cli` program provides several ways to interact with the LLaMA models
## Interaction
-The `llama-cli` program offers a seamless way to interact with LLaMA models, allowing users to engage in real-time conversations or provide instructions for specific tasks. The interactive mode can be triggered using various options, including `--interactive` and `--interactive-first`.
+The `llama-completion` program offers a seamless way to interact with LLaMA models, allowing users to engage in real-time conversations or provide instructions for specific tasks. The interactive mode can be triggered using various options, including `--interactive` and `--interactive-first`.
In interactive mode, users can participate in text generation by injecting their input during the process. Users can press `Ctrl+C` at any time to interject and type their input, followed by pressing `Return` to submit it to the LLaMA model. To submit additional lines without finalizing input, users can end the current line with a backslash (`\`) and continue typing.
@@ -136,15 +136,15 @@ To overcome this limitation, you can use the `--in-prefix` flag to add a space o
The `--in-prefix` flag is used to add a prefix to your input, primarily, this is used to insert a space after the reverse prompt. Here's an example of how to use the `--in-prefix` flag in conjunction with the `--reverse-prompt` flag:
```sh
-./llama-cli -r "User:" --in-prefix " "
+./llama-completion -r "User:" --in-prefix " "
```
### In-Suffix
The `--in-suffix` flag is used to add a suffix after your input. This is useful for adding an "Assistant:" prompt after the user's input. It's added after the new-line character (`\n`) that's automatically added to the end of the user's input. Here's an example of how to use the `--in-suffix` flag in conjunction with the `--reverse-prompt` flag:
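The `--reverse-prompt`, `--in-prefix`, and `--in-suffix` flags combine naturally into a chat-style invocation. A sketch (the model path is an assumption; adjust to your setup):

```sh
# Chat-style session: stop generation at "User:", insert a space before the
# user's input, and cue the model with "Assistant:" after each user turn.
# ./llama-completion -m models/gemma-1.1-7b-it.Q4_K_M.gguf \
#   -r "User:" --in-prefix " " --in-suffix "Assistant:"
```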