Running LLMs on iOS

ExecuTorch's LLM-specific runtime components provide experimental Objective-C and Swift wrappers around the core C++ LLM runtime.

Prerequisites

Make sure you have a model and tokenizer files ready, as described in the Prerequisites section of the Running LLMs with C++ guide.

Runtime API

Once you link against the executorch_llm framework, you can import the necessary components.

Importing

Objective-C

#import <ExecuTorchLLM/ExecuTorchLLM.h>

Swift

import ExecuTorchLLM

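TextLLMRunner

The ExecuTorchTextLLMRunner class (bridged to Swift as TextLLMRunner) provides a simple Objective-C/Swift interface for loading a text-generation model, configuring its tokenizer with custom special tokens, generating token streams, and stopping execution. This API is experimental and subject to change.

Initialization

Create a runner by specifying the paths to your serialized model (.pte) and tokenizer data, plus an array of special tokens to use during tokenization. Initialization itself is lightweight and does not load the program data immediately.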

Objective-C

NSString *modelPath     = [[NSBundle mainBundle] pathForResource:@"llama-3.2-instruct" ofType:@"pte"];
NSString *tokenizerPath = [[NSBundle mainBundle] pathForResource:@"tokenizer" ofType:@"model"];
NSArray<NSString *> *specialTokens = @[ @"<|bos|>", @"<|eos|>" ];

ExecuTorchTextLLMRunner *runner = [[ExecuTorchTextLLMRunner alloc] initWithModelPath:modelPath
                                                                       tokenizerPath:tokenizerPath
                                                                       specialTokens:specialTokens];

Swift

let modelPath     = Bundle.main.path(forResource: "llama-3.2-instruct", ofType: "pte")!
let tokenizerPath = Bundle.main.path(forResource: "tokenizer", ofType: "model")!
let specialTokens = ["<|bos|>", "<|eos|>"]

let runner = TextLLMRunner(
  modelPath: modelPath,
  tokenizerPath: tokenizerPath,
  specialTokens: specialTokens
)
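The Swift snippet above force-unwraps the bundle lookups, which crashes if a file was not copied into the app bundle. A minimal sketch of a safer lookup follows. `ResourceError`, `resourcePath`, and `makeRunner` are hypothetical names introduced for illustration and are not part of ExecuTorchLLM; the initializer call is the one shown above, guarded with `canImport` so the helper also compiles where the framework is not linked.

```swift
import Foundation

/// Hypothetical error type for this sketch; not part of ExecuTorchLLM.
enum ResourceError: Error, Equatable {
  case missing(name: String, ext: String)
}

/// Returns the path of a bundled resource, or throws instead of crashing.
func resourcePath(_ name: String, ofType ext: String, in bundle: Bundle = .main) throws -> String {
  guard let path = bundle.path(forResource: name, ofType: ext) else {
    throw ResourceError.missing(name: name, ext: ext)
  }
  return path
}

#if canImport(ExecuTorchLLM)
import ExecuTorchLLM

/// Builds a runner from bundled files using the initializer shown above.
/// File names and special tokens are the ones from this guide's example.
func makeRunner() throws -> TextLLMRunner {
  let modelPath = try resourcePath("llama-3.2-instruct", ofType: "pte")
  let tokenizerPath = try resourcePath("tokenizer", ofType: "model")
  return TextLLMRunner(
    modelPath: modelPath,
    tokenizerPath: tokenizerPath,
    specialTokens: ["<|bos|>", "<|eos|>"]
  )
}
#endif
```

Throwing a typed error lets the caller show a useful message (for example, which file is missing) rather than terminating the app.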

Loading

Explicitly load the model before generation to avoid paying the load cost during your first generate call.

Objective-C

NSError *error = nil;
BOOL success = [runner loadWithError:&error];
if (!success) {
  NSLog(@"Failed to load: %@", error);
}

Swift

do {
  try runner.load()
} catch {
  print("Failed to load: \(error)")
}

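Generating

Generate up to a given number of tokens from an initial prompt. The callback block is invoked once for each token as it is produced.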

Objective-C

NSError *error = nil;
BOOL success = [runner generate:@"Once upon a time"
                 sequenceLength:50
              withTokenCallback:^(NSString *token) {
                NSLog(@"Generated token: %@", token);
              }
                          error:&error];
if (!success) {
  NSLog(@"Generation failed: %@", error);
}

Swift

do {
  try runner.generate("Once upon a time", sequenceLength: 50) { token in
    print("Generated token:", token)
  }
} catch {
  print("Generation failed:", error)
}
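Because the callback receives one token at a time, a caller that wants the complete text has to join the pieces itself. The following is a minimal sketch of that pattern. `collectTokens` and `generateText` are hypothetical helpers, not part of ExecuTorchLLM, and the sketch assumes that `generate` returns only after the last token has been delivered and that `sequenceLength` accepts an `Int`.

```swift
import Foundation

/// Hypothetical helper: runs a token-streaming operation and joins
/// every emitted token into a single string.
func collectTokens(
  from stream: (@escaping (String) -> Void) throws -> Void
) rethrows -> String {
  var text = ""
  // Each emitted token is appended in arrival order.
  try stream({ token in text += token })
  return text
}

#if canImport(ExecuTorchLLM)
import ExecuTorchLLM

/// Assumption: generate(_:sequenceLength:) blocks until generation ends,
/// so the joined text is complete when collectTokens returns.
func generateText(with runner: TextLLMRunner, prompt: String, maxTokens: Int) throws -> String {
  try collectTokens { emit in
    try runner.generate(prompt, sequenceLength: maxTokens) { token in
      emit(token)
    }
  }
}
#endif
```

Keeping the joining logic separate from the runner makes it easy to exercise with a fake token source, for example `collectTokens { emit in emit("Hello"); emit(", world") }`.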

Stopping Generation

If you need to interrupt a long-running generation, call stop on the runner:

Objective-C

[runner stop];

Swift

runner.stop()
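One common reason to stop early is that the model has produced a particular marker string. The sketch below tracks the streamed text and calls `stop()` once that marker appears. `StopSequenceDetector` and `generate(with:prompt:until:)` are hypothetical names for illustration, the token limit of 256 is arbitrary, and the sketch assumes it is safe to call `stop()` from inside the token callback, which this guide does not confirm.

```swift
import Foundation

/// Hypothetical helper: reports when the streamed text contains a stop sequence.
struct StopSequenceDetector {
  let stopSequence: String
  private(set) var buffer = ""

  init(stopSequence: String) {
    self.stopSequence = stopSequence
  }

  /// Appends a token and returns true once the stop sequence has appeared,
  /// even if it was split across several tokens.
  mutating func consume(_ token: String) -> Bool {
    buffer += token
    return buffer.contains(stopSequence)
  }
}

#if canImport(ExecuTorchLLM)
import ExecuTorchLLM

/// Assumption: stop() may be called from within the token callback.
func generate(with runner: TextLLMRunner, prompt: String, until stopSequence: String) throws {
  var detector = StopSequenceDetector(stopSequence: stopSequence)
  try runner.generate(prompt, sequenceLength: 256) { token in
    if detector.consume(token) {
      runner.stop()
    }
  }
}
#endif
```

The detector keeps the whole text in memory for simplicity; a production version might retain only the last few characters needed to match the stop sequence.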

Demo

Get hands-on with our LLaMA iOS Demo App to see the LLM runtime APIs in action.
