Ollama

Log location: %LOCALAPPDATA%\Ollama

Start qwen in Ollama with thinking disabled

ollama run qwen3 --think=false
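The same switch can also be set per request over the HTTP API. The sketch below is a minimal illustration, assuming a recent Ollama version whose native /api/chat endpoint accepts a "think" field and a server on the default port 11434; the helper name build_no_think_request is made up for this note.

```python
import json
import urllib.request


def build_no_think_request(prompt, model="qwen3", base_url="http://localhost:11434"):
    """Build a POST request for /api/chat with thinking turned off.

    Assumption: the running Ollama version supports the "think" field.
    """
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "think": False,   # API counterpart of --think=false
        "stream": False,  # ask for one JSON object instead of a stream
    }
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned request to urllib.request.urlopen sends it; with "stream" set to false, the reply text should be under message.content in the JSON response.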

Ollama is a small tool for running large language models (LLMs) on your own computer.

Reference

Default configuration

OLLAMA_MODELS: model storage location

Create a model from a Modelfile

GGUF-format models

Disable deep thinking

Model list

Run a model

ollama run deepseek-r1:7b

Configuration

OLLAMA_HOST 0.0.0.0

netsh advfirewall firewall add rule name="Allow Port 11434" dir=in action=allow protocol=TCP localport=11434

Start the server

ollama serve
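Note that 0.0.0.0 is a bind address (listen on all interfaces), not something a client connects to. The sketch below turns an OLLAMA_HOST value into a URL a local client could use and checks whether the server answers. It assumes a plain host or host:port value (no scheme, no IPv6) and the default port 11434; both function names are made up for this note.

```python
import urllib.request


def client_url(ollama_host, default_port=11434):
    """Map an OLLAMA_HOST value such as "0.0.0.0" to a URL a client can reach.

    Assumption: the value is "host" or "host:port" without a scheme.
    """
    host, _, port = ollama_host.partition(":")
    if host in ("", "0.0.0.0"):
        host = "127.0.0.1"  # the wildcard bind address is reachable via loopback
    return f"http://{host}:{port or default_port}"


def is_running(base_url, timeout=5):
    """Return True if a GET on the server root succeeds with HTTP 200."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # covers connection refused, timeouts, and HTTP errors
        return False
```

From another machine on the network, pass the server's LAN address instead, for example is_running(client_url("192.168.1.5")), where the address is only a placeholder.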

nginx proxy

    location /v1/ {
        # Handle preflight requests (OPTIONS)
        if ($request_method = OPTIONS) {
            add_header Access-Control-Allow-Origin "*";
            add_header Access-Control-Allow-Methods "GET, POST, OPTIONS";
            add_header Access-Control-Allow-Headers "Authorization, Content-Type";
            return 204;
        }

        # Regular requests go through the proxy
        proxy_pass http://localhost:11434;

        # CORS headers
        add_header Access-Control-Allow-Origin "*";
        add_header Access-Control-Allow-Headers "Authorization, Content-Type";
        add_header Access-Control-Allow-Methods "GET, POST, OPTIONS";
    }
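To confirm the proxy answers preflight requests as intended, something like the following sketch could be used. It sends an OPTIONS request and reports which of the three CORS headers from the config above are missing. The helper names and the example Origin value are assumptions made for illustration.

```python
import urllib.request

EXPECTED_CORS_HEADERS = (
    "Access-Control-Allow-Origin",
    "Access-Control-Allow-Methods",
    "Access-Control-Allow-Headers",
)


def build_preflight_request(url, origin="http://example.test"):
    """Build an OPTIONS request resembling a browser preflight."""
    return urllib.request.Request(
        url,
        method="OPTIONS",
        headers={
            "Origin": origin,  # placeholder origin, not from the notes
            "Access-Control-Request-Method": "POST",
            "Access-Control-Request-Headers": "Authorization, Content-Type",
        },
    )


def missing_cors_headers(headers):
    """Return the expected CORS headers absent from a header mapping."""
    present = {name.lower() for name in headers}  # header names are case-insensitive
    return [h for h in EXPECTED_CORS_HEADERS if h.lower() not in present]


def check_preflight(url, timeout=10):
    """Send the preflight and return (status, list of missing headers)."""
    with urllib.request.urlopen(build_preflight_request(url), timeout=timeout) as resp:
        return resp.status, missing_cors_headers(dict(resp.headers.items()))
```

With the config above, check_preflight on a /v1/ URL would be expected to return status 204 and an empty list.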

Test

curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3",
    "messages": [
      { "role": "system", "content": "You are a helpful assistant." },
      { "role": "user", "content": "Hello!" }
    ]
  }'

curl https://catpd.cn/v1/chat/completions -H "Content-Type: application/json" -H "Authorization: Bearer c4cd6c77-37ee-4d0a-bf55-999e0ffceb88" -d "{ \"messages\": [ { \"role\": \"system\", \"content\": \"You are a test assistant.\" }, { \"role\": \"user\", \"content\": \"Testing. Just say hi and nothing else.\" } ], \"model\": \"llama3\" }"
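The two curl calls can be reproduced from a script with the standard library alone. This is a minimal sketch, assuming the endpoint returns the usual OpenAI-style response with choices[0].message.content; the function names are made up, and the API key is only needed when the proxy in front of Ollama checks it.

```python
import json
import urllib.request


def build_chat_completion_request(base_url, model, messages, api_key=None):
    """Build a POST request for the OpenAI-compatible /v1/chat/completions."""
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/chat/completions",
        data=body,
        headers=headers,
        method="POST",
    )


def extract_reply(response_json):
    """Pull the assistant text out of an OpenAI-style response dict."""
    return response_json["choices"][0]["message"]["content"]


def chat(base_url, model, messages, api_key=None, timeout=60):
    """Send the request and return the assistant's reply text."""
    req = build_chat_completion_request(base_url, model, messages, api_key)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return extract_reply(json.load(resp))
```

For the local test this would be called as chat("http://localhost:11434", "llama3", [{"role": "user", "content": "Hello!"}]).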

Errors

(status code 0) TypeError: Failed to fetch

This error can be ignored: everything still works normally.

GPU acceleration

nvidia-smi

OLLAMA_GPU_LAYER cuda

OLLAMA_HOST 0.0.0.0

CUDA_VISIBLE_DEVICES GPU-32c44cdf-114a-b09c-0dd0-431d3faa9eab

nvidia-smi -L
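The UUID used for CUDA_VISIBLE_DEVICES comes from the nvidia-smi -L listing. The sketch below extracts it from that output, assuming the usual line format "GPU 0: <name> (UUID: GPU-...)"; the function names are made up for this note.

```python
import re
import subprocess

# Matches lines such as: GPU 0: <name> (UUID: GPU-xxxxxxxx-xxxx-...)
GPU_LINE = re.compile(r"^GPU (\d+): (.+?) \(UUID: (GPU-[0-9a-fA-F-]+)\)\s*$")


def parse_gpu_list(text):
    """Return (index, name, uuid) tuples parsed from `nvidia-smi -L` output."""
    gpus = []
    for line in text.splitlines():
        match = GPU_LINE.match(line.strip())
        if match:
            gpus.append((int(match.group(1)), match.group(2), match.group(3)))
    return gpus


def list_gpus():
    """Run `nvidia-smi -L` and parse its output (requires the NVIDIA driver)."""
    result = subprocess.run(
        ["nvidia-smi", "-L"], capture_output=True, text=True, check=True
    )
    return parse_gpu_list(result.stdout)
```

The third element of a returned tuple is the value to put in CUDA_VISIBLE_DEVICES to pin Ollama to that GPU.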
