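• Trying to install ollama on FreeBSD


    Ollama is an open-source framework for running large language models (LLMs) locally. It supports several operating systems, but FreeBSD is not among them, so I tried compiling and installing it on FreeBSD myself.

    The conclusion first: the official ollama source did not build, but a patched fork with BSD support did. Because the fork carries modified code, I did the final build inside a FreeBSD jail to be on the safe side.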

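    Installing ollama on FreeBSD (first attempt, failed)

    Setting up the build environment

    First, install a recent Go:

    pkg install go122-1.22.5 cmake
    

    At first this did not seem to work, so I installed the default go package instead. (It turned out that the go122 package installs its binary under the name go122, so it has to be invoked with that command.)

    pkg install go
    

    But the default package is an older version.

    So try a newer release instead. Download: https://go.dev/dl/go1.22.5.freebsd-amd64.tar.gz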

    wget https://go.dev/dl/go1.22.5.freebsd-amd64.tar.gz

    Unpack it:

    tar -xzvf go1.22.5.freebsd-amd64.tar.gz

    Add it to the PATH (this assumes the archive was unpacked in /home/skywalk/work):

    export PATH=/home/skywalk/work/go/bin:$PATH

    Now go reports version 1.22.5:

    $ go version
    go version go1.22.5 freebsd/amd64
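
    Before starting a long build it can help to confirm that the go on the PATH is new enough. The following is a minimal sketch, not part of the original steps: the helper name go_at_least and the 1.22 threshold are assumptions made for illustration, and the parsing only handles release versions such as go1.22.5.

```shell
# Hypothetical helper: succeed if the Go toolchain is at least MAJOR.MINOR.
# $1 = required version (for example 1.22)
# $2 = optional `go version` output; defaults to running `go version`
go_at_least() {
    want=$1
    out=${2:-$(go version)}
    # The third field looks like "go1.22.5"; drop the "go" prefix.
    ver=$(printf '%s\n' "$out" | awk '{print $3}' | sed 's/^go//')
    major=${ver%%.*}
    rest=${ver#*.}
    minor=${rest%%.*}
    want_major=${want%%.*}
    want_minor=${want#*.}
    if [ "$major" -gt "$want_major" ]; then
        return 0
    fi
    [ "$major" -eq "$want_major" ] && [ "$minor" -ge "$want_minor" ]
}
```

    A typical call would be go_at_least 1.22 && echo "go is new enough". Passing the version string as a second argument makes the check usable with go122 as well, for example go_at_least 1.22 "$(go122 version)".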

    Speeding up Go module downloads

    # Set the GOPROXY environment variable
    export GOPROXY=https://goproxy.io,direct
    # Optional: let the listed private repositories bypass the proxy (the values below are placeholders)
    export GOPRIVATE=git.mycompany.com,github.com/my/private

    Building ollama

    Clone ollama from the official repository:

    git clone https://github.com/ollama/ollama
    cd ollama

    Generate:

    go generate ./...

    Build:

    go build .

    The build did not succeed. It ended with this error:

    skywalk@fbhost:~/github/ollama $ go build .
    package github.com/ollama/ollama
    imports github.com/ollama/ollama/cmd
    imports github.com/ollama/ollama/server
    imports github.com/ollama/ollama/gpu: C source files not allowed when not using cgo or SWIG: gpu_info_cudart.c gpu_info_nvcuda.c gpu_info_nvml.c gpu_info_oneapi.c

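    Building in a FreeBSD jail (second attempt, failed)

    Create a FreeBSD jail and log in to it:

    # cbsd jlogin fb12

    The login shell is csh; switch to bash if you are more comfortable with it.

    Install the required packages:

    # pkg install -y git go122 cmake vulkan-headers vulkan-loader

    Download the patched fork: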

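    I tried both a shallow clone and a full clone:

    # git clone --depth 1 https://github.com/prep/ollama.git

    # git clone https://github.com/prep/ollama.git

    Switch to the BSD-support branch. (The checkout did not succeed in this attempt. One possible reason: a shallow clone made with --depth 1 fetches only the default branch.)

    # cd ollama && git checkout feature/add-bsd-support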

    Set up the module proxy first.

    In csh (setenv, not set, is what exports a variable to the environment):

    # setenv GO111MODULE on

    # setenv GOPROXY https://goproxy.io,direct
    # setenv GOPRIVATE git.mycompany.com,github.com/my/private

    In bash:

    # Enable Go modules

    export GO111MODULE=on

    # Set the GOPROXY environment variable
    export GOPROXY=https://goproxy.io,direct
    # Optional: let the listed private repositories bypass the proxy (placeholder values)
    export GOPRIVATE=git.mycompany.com,github.com/my/private

    Run go generate and go build:

    # go122 generate ./...

    # go122 build .

    The build ended with this error:

    go122 build .
    go: downloading github.com/pdevine/tensor v0.0.0-20240228013915-64ccaa8d9ca9
    convert/gemma.go:12:2: github.com/pdevine/tensor@v0.0.0-20240228013915-64ccaa8d9ca9: invalid version: unknown revision 64ccaa8d9ca9
    convert/gemma.go:13:2: github.com/pdevine/tensor@v0.0.0-20240228013915-64ccaa8d9ca9: invalid version: unknown revision 64ccaa8d9ca9

    Building the patched ollama fork as an unprivileged user in a FreeBSD jail (third attempt, succeeded)

    If the build reports errors, the go.sum and go.mod files need to be edited (see the debugging notes below).

    Use the following commands:

    bash
    mkdir github.com
    cd github.com
    git clone https://github.com/prep/ollama.git
    cd ollama && git checkout feature/add-bsd-support
    # Enable Go modules
    export GO111MODULE=on
    # Set the GOPROXY environment variable
    export GOPROXY=https://goproxy.io,direct
    # Optional: let the listed private repositories bypass the proxy (placeholder values)
    export GOPRIVATE=git.mycompany.com,github.com/my/private
    go122 generate ./...
    go122 build .

    Debugging the build errors

    The build still failed:

    go122 build .
    go: downloading github.com/pdevine/tensor v0.0.0-20240228013915-64ccaa8d9ca9
    convert/gemma.go:12:2: github.com/pdevine/tensor@v0.0.0-20240228013915-64ccaa8d9ca9: invalid version: unknown revision 64ccaa8d9ca9
    convert/gemma.go:13:2: github.com/pdevine/tensor@v0.0.0-20240228013915-64ccaa8d9ca9: invalid version: unknown revision 64ccaa8d9ca9

    Edit go.sum and replace the pdevine/tensor entries with:

    github.com/pdevine/tensor v0.0.0-20240510204454-f88f4562727c h1:GwiUUjKefgvSNmv3NCvI/BL0kDebW6Xa+kcdpdc1mTY=
    github.com/pdevine/tensor v0.0.0-20240510204454-f88f4562727c/go.mod h1:PSojXDXF7TbgQiD6kkd98IHOS0QqTyUEaWRiS8+BLu8=

    go.mod needs the same change: set the pdevine/tensor version to the latest pseudo-version, dated 2024-05-10:

    github.com/pdevine/tensor v0.0.0-20240510204454-f88f4562727c
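
    The two manual edits above can also be scripted. This is a rough sketch under stated assumptions, not the method used in this write-up: the function name pin_tensor is made up, it must be run from the repository root, and it assumes go.mod lists github.com/pdevine/tensor on a single require line. The version string and the two go.sum lines are the ones quoted above.

```shell
# Hypothetical helper: point go.mod and go.sum at the newer pdevine/tensor
# pseudo-version. Run it in the directory that contains go.mod and go.sum.
pin_tensor() {
    mod=github.com/pdevine/tensor
    new=v0.0.0-20240510204454-f88f4562727c

    # Rewrite the version on the go.mod line that mentions the module.
    # A temp file is used instead of `sed -i`, whose syntax differs
    # between BSD and GNU sed.
    sed "s|\(${mod}\) v[^ ]*|\1 ${new}|" go.mod > go.mod.tmp &&
        mv go.mod.tmp go.mod || return 1

    # Drop the stale checksum lines, then append the two lines quoted above.
    sed "\|^${mod} |d" go.sum > go.sum.tmp &&
        mv go.sum.tmp go.sum || return 1
    cat >> go.sum <<EOF
${mod} ${new} h1:GwiUUjKefgvSNmv3NCvI/BL0kDebW6Xa+kcdpdc1mTY=
${mod} ${new}/go.mod h1:PSojXDXF7TbgQiD6kkd98IHOS0QqTyUEaWRiS8+BLu8=
EOF
}
```

    Both files are tracked by git, so git checkout go.mod go.sum restores the originals if the result is not what you wanted.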

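    Then run generate and build again.

    If you skip the generate step, the error output suggests that the convert package probably needs to be fetched again:

    go122 get github.com/ollama/ollama/convert

    Then continue with the build:

    go122 build .
     

    Done!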

    A quick test:

    ./ollama help | head -n 5
    Large language model runner

    Usage:
      ollama [flags]
      ollama [command]

    This confirms that the build really succeeded.

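    Starting ollama

    Start the ollama server first:

    ./ollama serve

    Then run the llama3 model:

    ./ollama run llama3

    ollama downloads the model automatically. Once the download finishes, it drops into an interactive prompt.

    Interactive output from ollama

    A single reply took about 50 minutes (the log reports 50m59s for the /api/chat request)... but it worked: ollama ran successfully on FreeBSD!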

    1. [skywalk@fb12 ~/gihub.com/ollama]$ ./ollama run llama3
    2. [GIN] 2024/07/15 - 12:01:47 | 200 | 466.704µs | 10.0.0.12 | HEAD "/"
    3. [GIN] 2024/07/15 - 12:01:47 | 404 | 450.54µs | 10.0.0.12 | POST "/api/show"
    4. pulling manifest ⠦ time=2024-07-15T12:01:50.016+08:00 level=INFO source=download.go:136 msg="downloading 6a0746a1ec1a in 47 100 MB part(s)"
    5. pulling manifest
    6. pulling 6a0746a1ec1a... 100% ▕████████████████▏ 4.7 GB time=2024-07-15T12:20:25.740+08:00 level=INFO source=download.go:136 msg="downloapulling manifest
    7. pulling 6a0746a1ec1a... 100% ▕████████████████▏ 4.7 GB
    8. pulling 4fa551d4f938... 100% ▕████████████████▏ 12 KB tpulling manifest
    9. pulling 6a0746a1ec1a... 100% ▕████████████████▏ 4.7 GB
    10. pulling 4fa551d4f938... 100% ▕████████████████▏ 12 KB
    11. pulling manifest
    12. pulling 6a0746a1ec1a... 100% ▕████████████████▏ 4.7 GB
    13. pulling 4fa551d4f938... 100% ▕████████████████▏ 12 KB
    14. pulling manifest
    15. pulling 6a0746a1ec1a... 100% ▕████████████████▏ 4.7 GB
    16. pulling manifest
    17. pulling 6a0746a1ec1a... 100% ▕████████████████▏ 4.7 GB
    18. pulling 4fa551d4f938... 100% ▕████████████████▏ 12 KB
    19. pulling 8ab4849b038c... 100% ▕████████████████▏ 254 B
    20. pulling 577073ffcc6c... 100% ▕████████████████▏ 110 B
    21. pulling 3f8eb4da87fa... 100% ▕████████████████▏ 485 B
    22. verifying sha256 digest
    23. writing manifest
    24. removing any unused layers
    25. success
    26. [GIN] 2024/07/15 - 12:22:06 | 200 | 1.786897ms | 10.0.0.12 | POST "/api/show"
    27. [GIN] 2024/07/15 - 12:22:06 | 200 | 1.384117ms | 10.0.0.12 | POST "/api/show"
    28. time=2024-07-15T12:22:06.288+08:00 level=INFO source=cpu_common.go:18 msg="CPU does not have vector extensions"
    29. ⠴ time=2024-07-15T12:22:20.820+08:00 level=INFO source=cpu_common.go:18 msg="CPU does not have vector extensions"
    30. time=2024-07-15T12:22:20.821+08:00 level=INFO source=server.go:289 msg="starting llama server" cmd="/tmp/ollama1084183988/runners/cpu/ollama_llama_server --model /home/skywalk/.ollama/models/blobs/sha256-6a0746a1ec1aef3e7ec53868f220ff6e389f6f8ef87a01d77c96807de94ca2aa --ctx-size 2048 --batch-size 512 --embedding --log-disable --parallel 1 --port 62268"
    31. time=2024-07-15T12:22:20.847+08:00 level=INFO source=sched.go:340 msg="loaded runners" count=1
    32. time=2024-07-15T12:22:20.847+08:00 level=INFO source=server.go:432 msg="waiting for llama runner to start responding"
    33. {"function":"server_params_parse","level":"INFO","line":2604,"msg":"logging to file is disabled.","tid":"0x10139f812000","timestamp":1721017340}
    34. ⠦ {"build":2770,"commit":"952d03db","function":"main","level":"INFO","line":2821,"msg":"build info","tid":"0x10139f812000","timestamp":1721017340}
    35. {"function":"main","level":"INFO","line":2828,"msg":"system info","n_threads":4,"n_threads_batch":-1,"system_info":"AVX = 0 | AVX_VNNI = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 | ","tid":"0x10139f812000","timestamp":1721017340,"total_threads":4}
    36. ⠧ llama_model_loader: loaded meta data with 22 key-value pairs and 291 tensors from /home/skywalk/.ollama/models/blobs/sha256-6a0746a1ec1aef3e7ec53868f220ff6e389f6f8ef87a01d77c96807de94ca2aa (version GGUF V3 (latest))
    37. llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
    38. llama_model_loader: - kv 0: general.architecture str = llama
    39. llama_model_loader: - kv 1: general.name str = Meta-Llama-3-8B-Instruct
    40. llama_model_loader: - kv 2: llama.block_count u32 = 32
    41. llama_model_loader: - kv 3: llama.context_length u32 = 8192
    42. llama_model_loader: - kv 4: llama.embedding_length u32 = 4096
    43. llama_model_loader: - kv 5: llama.feed_forward_length u32 = 14336
    44. llama_model_loader: - kv 6: llama.attention.head_count u32 = 32
    45. llama_model_loader: - kv 7: llama.attention.head_count_kv u32 = 8
    46. llama_model_loader: - kv 8: llama.rope.freq_base f32 = 500000.000000
    47. llama_model_loader: - kv 9: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
    48. llama_model_loader: - kv 10: general.file_type u32 = 2
    49. llama_model_loader: - kv 11: llama.vocab_size u32 = 128256
    50. llama_model_loader: - kv 12: llama.rope.dimension_count u32 = 128
    51. llama_model_loader: - kv 13: tokenizer.ggml.model str = gpt2
    52. llama_model_loader: - kv 14: tokenizer.ggml.pre str = llama-bpe
    53. ⠇ llama_model_loader: - kv 15: tokenizer.ggml.tokens arr[str,128256] = ["!", "\"", "#", "$", "%", "&", "'", ...
    54. ⠏ llama_model_loader: - kv 16: tokenizer.ggml.token_type arr[i32,128256] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
    55. ⠙ llama_model_loader: - kv 17: tokenizer.ggml.merges arr[str,280147] = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
    56. llama_model_loader: - kv 18: tokenizer.ggml.bos_token_id u32 = 128000
    57. llama_model_loader: - kv 19: tokenizer.ggml.eos_token_id u32 = 128009
    58. llama_model_loader: - kv 20: tokenizer.chat_template str = {% set loop_messages = messages %}{% ...
    59. llama_model_loader: - kv 21: general.quantization_version u32 = 2
    60. llama_model_loader: - type f32: 65 tensors
    61. llama_model_loader: - type q4_0: 225 tensors
    62. llama_model_loader: - type q6_K: 1 tensors
    63. ⠹ llm_load_vocab: special tokens definition check successful ( 256/128256 ).
    64. llm_load_print_meta: format = GGUF V3 (latest)
    65. llm_load_print_meta: arch = llama
    66. llm_load_print_meta: vocab type = BPE
    67. llm_load_print_meta: n_vocab = 128256
    68. llm_load_print_meta: n_merges = 280147
    69. llm_load_print_meta: n_ctx_train = 8192
    70. llm_load_print_meta: n_embd = 4096
    71. llm_load_print_meta: n_head = 32
    72. llm_load_print_meta: n_head_kv = 8
    73. llm_load_print_meta: n_layer = 32
    74. llm_load_print_meta: n_rot = 128
    75. llm_load_print_meta: n_embd_head_k = 128
    76. llm_load_print_meta: n_embd_head_v = 128
    77. llm_load_print_meta: n_gqa = 4
    78. llm_load_print_meta: n_embd_k_gqa = 1024
    79. llm_load_print_meta: n_embd_v_gqa = 1024
    80. llm_load_print_meta: f_norm_eps = 0.0e+00
    81. llm_load_print_meta: f_norm_rms_eps = 1.0e-05
    82. llm_load_print_meta: f_clamp_kqv = 0.0e+00
    83. llm_load_print_meta: f_max_alibi_bias = 0.0e+00
    84. llm_load_print_meta: f_logit_scale = 0.0e+00
    85. llm_load_print_meta: n_ff = 14336
    86. llm_load_print_meta: n_expert = 0
    87. llm_load_print_meta: n_expert_used = 0
    88. llm_load_print_meta: causal attn = 1
    89. llm_load_print_meta: pooling type = 0
    90. llm_load_print_meta: rope type = 0
    91. llm_load_print_meta: rope scaling = linear
    92. llm_load_print_meta: freq_base_train = 500000.0
    93. llm_load_print_meta: freq_scale_train = 1
    94. llm_load_print_meta: n_yarn_orig_ctx = 8192
    95. llm_load_print_meta: rope_finetuned = unknown
    96. llm_load_print_meta: ssm_d_conv = 0
    97. llm_load_print_meta: ssm_d_inner = 0
    98. llm_load_print_meta: ssm_d_state = 0
    99. llm_load_print_meta: ssm_dt_rank = 0
    100. llm_load_print_meta: model type = 8B
    101. llm_load_print_meta: model ftype = Q4_0
    102. llm_load_print_meta: model params = 8.03 B
    103. llm_load_print_meta: model size = 4.33 GiB (4.64 BPW)
    104. llm_load_print_meta: general.name = Meta-Llama-3-8B-Instruct
    105. llm_load_print_meta: BOS token = 128000 '<|begin_of_text|>'
    106. llm_load_print_meta: EOS token = 128009 '<|eot_id|>'
    107. llm_load_print_meta: LF token = 128 'Ä'
    108. llm_load_print_meta: EOT token = 128009 '<|eot_id|>'
    109. llm_load_tensors: ggml ctx size = 0.15 MiB
    110. llm_load_tensors: CPU buffer size = 4437.80 MiB
    111. .......................................................................................
    112. ⠸ llama_new_context_with_model: n_ctx = 2048
    113. llama_new_context_with_model: n_batch = 512
    114. llama_new_context_with_model: n_ubatch = 512
    115. llama_new_context_with_model: freq_base = 500000.0
    116. llama_new_context_with_model: freq_scale = 1
    117. ⠦ llama_kv_cache_init: CPU KV buffer size = 256.00 MiB
    118. llama_new_context_with_model: KV self size = 256.00 MiB, K (f16): 128.00 MiB, V (f16): 128.00 MiB
    119. llama_new_context_with_model: CPU output buffer size = 0.50 MiB
    120. llama_new_context_with_model: CPU compute buffer size = 258.50 MiB
    121. llama_new_context_with_model: graph nodes = 1030
    122. llama_new_context_with_model: graph splits = 1
    123. ⠧ {"function":"initialize","level":"INFO","line":448,"msg":"initializing slots","n_slots":1,"tid":"0x10139f812000","timestamp":1721017395}
    124. {"function":"initialize","level":"INFO","line":460,"msg":"new slot","n_ctx_slot":2048,"slot_id":0,"tid":"0x10139f812000","timestamp":1721017395}
    125. {"function":"main","level":"INFO","line":3065,"msg":"model loaded","tid":"0x10139f812000","timestamp":1721017395}
    126. {"function":"main","hostname":"127.0.0.1","level":"INFO","line":3268,"msg":"HTTP server listening","n_threads_http":"3","port":"62268","tid":"0x10139f812000","timestamp":1721017395}
    127. {"function":"update_slots","level":"INFO","line":1579,"msg":"all slots are idle and system prompt is empty, clear the KV cache","tid":"0x10139f812000","timestamp":1721017395}
    128. {"function":"process_single_task","level":"INFO","line":1511,"msg":"slot data","n_idle_slots":1,"n_processing_slots":0,"task_id":0,"tid":"0x10139f812000","timestamp":1721017395}
    129. {"function":"process_single_task","level":"INFO","line":1511,"msg":"slot data","n_idle_slots":1,"n_processing_slots":0,"task_id":1,"tid":"0x10139f812000","timestamp":1721017395}
    130. {"function":"log_server_request","level":"INFO","line":2742,"method":"GET","msg":"request","params":{},"path":"/health","remote_addr":"10.0.0.12","remote_port":37211,"status":200,"tid":"0x1013dbe0ae00","timestamp":1721017395}
    131. {"function":"process_single_task","level":"INFO","line":1511,"msg":"slot data","n_idle_slots":1,"n_processing_slots":0,"task_id":2,"tid":"0x10139f812000","timestamp":1721017395}
    132. {"function":"log_server_request","level":"INFO","line":2742,"method":"GET","msg":"request","params":{},"path":"/health","remote_addr":"10.0.0.12","remote_port":60236,"status":200,"tid":"0x1013dbe0a700","timestamp":1721017395}
    133. {"function":"process_single_task","level":"INFO","line":1511,"msg":"slot data","n_idle_slots":1,"n_processing_slots":0,"task_id":3,"tid":"0x10139f812000","timestamp":1721017395}
    134. {"function":"log_server_request","level":"INFO","line":2742,"method":"GET","msg":"request","params":{},"path":"/health","remote_addr":"10.0.0.12","remote_port":43135,"status":200,"tid":"0x1013dbe0a000","timestamp":1721017395}
    135. {"function":"process_single_task","level":"INFO","line":1511,"msg":"slot data","n_idle_slots":1,"n_processing_slots":0,"task_id":4,"tid":"0x10139f812000","timestamp":1721017395}
    136. {"function":"log_server_request","level":"INFO","line":2742,"method":"GET","msg":"request","params":{},"path":"/health","remote_addr":"10.0.0.12","remote_port":31620,"status":200,"tid":"0x1013dbe0ae00","timestamp":1721017395}
    137. {"function":"process_single_task","level":"INFO","line":1511,"msg":"slot data","n_idle_slots":1,"n_processing_slots":0,"task_id":5,"tid":"0x10139f812000","timestamp":1721017395}
    138. {"function":"log_server_request","level":"INFO","line":2742,"method":"GET","msg":"request","params":{},"path":"/health","remote_addr":"10.0.0.12","remote_port":56527,"status":200,"tid":"0x1013dbe0a700","timestamp":1721017395}
    139. {"function":"process_single_task","level":"INFO","line":1511,"msg":"slot data","n_idle_slots":1,"n_processing_slots":0,"task_id":6,"tid":"0x10139f812000","timestamp":1721017395}
    140. {"function":"log_server_request","level":"INFO","line":2742,"method":"GET","msg":"request","params":{},"path":"/health","remote_addr":"10.0.0.12","remote_port":53213,"status":200,"tid":"0x1013dbe0a000","timestamp":1721017395}
    141. {"function":"process_single_task","level":"INFO","line":1511,"msg":"slot data","n_idle_slots":1,"n_processing_slots":0,"task_id":7,"tid":"0x10139f812000","timestamp":1721017395}
    142. {"function":"log_server_request","level":"INFO","line":2742,"method":"GET","msg":"request","params":{},"path":"/health","remote_addr":"10.0.0.12","remote_port":21875,"status":200,"tid":"0x1013dbe0ae00","timestamp":1721017395}
    143. {"function":"log_server_request","level":"INFO","line":2742,"method":"GET","msg":"request","params":{},"path":"/health","remote_addr":"10.0.0.12","remote_port":47567,"status":200,"tid":"0x1013dbe0a700","timestamp":1721017395}
    144. {"function":"process_single_task","level":"INFO","line":1511,"msg":"slot data","n_idle_slots":1,"n_processing_slots":0,"task_id":8,"tid":"0x10139f812000","timestamp":1721017395}
    145. {"function":"log_server_request","level":"INFO","line":2742,"method":"GET","msg":"request","params":{},"path":"/health","remote_addr":"10.0.0.12","remote_port":56264,"status":200,"tid":"0x1013dbe0a700","timestamp":1721017395}
    146. ⠇ [GIN] 2024/07/15 - 12:23:15 | 200 | 1m8s | 10.0.0.12 | POST "/api/chat"
    147. >>> hello
    148. time=2024-07-15T14:22:47.710+08:00 level=INFO source=cpu_common.go:18 msg="CPU does not have vector extensions"
    149. ⠋ time=2024-07-15T14:23:02.785+08:00 level=INFO source=cpu_common.go:18 msg="CPU does not have vector extensions"
    150. ⠙ time=2024-07-15T14:23:02.789+08:00 level=INFO source=server.go:289 msg="starting llama server" cmd="/tmp/ollama1084183988/runners/cpu/ollama_llama_server --model /home/skywalk/.ollama/models/blobs/sha256-6a0746a1ec1aef3e7ec53868f220ff6e389f6f8ef87a01d77c96807de94ca2aa --ctx-size 2048 --batch-size 512 --embedding --log-disable --parallel 1 --port 61604"
    151. time=2024-07-15T14:23:02.811+08:00 level=INFO source=sched.go:340 msg="loaded runners" count=1
    152. time=2024-07-15T14:23:02.812+08:00 level=INFO source=server.go:432 msg="waiting for llama runner to start responding"
    153. {"function":"server_params_parse","level":"INFO","line":2604,"msg":"logging to file is disabled.","tid":"0x20da49412000","timestamp":1721024582}
    154. {"build":2770,"commit":"952d03db","function":"main","level":"INFO","line":2821,"msg":"build info","tid":"0x20da49412000","timestamp":1721024582}
    155. {"function":"main","level":"INFO","line":2828,"msg":"system info","n_threads":4,"n_threads_batch":-1,"system_info":"AVX = 0 | AVX_VNNI = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 | ","tid":"0x20da49412000","timestamp":1721024582,"total_threads":4}
    156. ⠸ llama_model_loader: loaded meta data with 22 key-value pairs and 291 tensors from /home/skywalk/.ollama/models/blobs/sha256-6a0746a1ec1aef3e7ec53868f220ff6e389f6f8ef87a01d77c96807de94ca2aa (version GGUF V3 (latest))
    157. llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
    158. llama_model_loader: - kv 0: general.architecture str = llama
    159. llama_model_loader: - kv 1: general.name str = Meta-Llama-3-8B-Instruct
    160. llama_model_loader: - kv 2: llama.block_count u32 = 32
    161. llama_model_loader: - kv 3: llama.context_length u32 = 8192
    162. llama_model_loader: - kv 4: llama.embedding_length u32 = 4096
    163. llama_model_loader: - kv 5: llama.feed_forward_length u32 = 14336
    164. llama_model_loader: - kv 6: llama.attention.head_count u32 = 32
    165. llama_model_loader: - kv 7: llama.attention.head_count_kv u32 = 8
    166. llama_model_loader: - kv 8: llama.rope.freq_base f32 = 500000.000000
    167. llama_model_loader: - kv 9: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
    168. llama_model_loader: - kv 10: general.file_type u32 = 2
    169. llama_model_loader: - kv 11: llama.vocab_size u32 = 128256
    170. llama_model_loader: - kv 12: llama.rope.dimension_count u32 = 128
    171. llama_model_loader: - kv 13: tokenizer.ggml.model str = gpt2
    172. llama_model_loader: - kv 14: tokenizer.ggml.pre str = llama-bpe
    173. ⠼ llama_model_loader: - kv 15: tokenizer.ggml.tokens arr[str,128256] = ["!", "\"", "#", "$", "%", "&", "'", ...
    174. llama_model_loader: - kv 16: tokenizer.ggml.token_type arr[i32,128256] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
    175. ⠧ llama_model_loader: - kv 17: tokenizer.ggml.merges arr[str,280147] = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
    176. llama_model_loader: - kv 18: tokenizer.ggml.bos_token_id u32 = 128000
    177. llama_model_loader: - kv 19: tokenizer.ggml.eos_token_id u32 = 128009
    178. llama_model_loader: - kv 20: tokenizer.chat_template str = {% set loop_messages = messages %}{% ...
    179. llama_model_loader: - kv 21: general.quantization_version u32 = 2
    180. llama_model_loader: - type f32: 65 tensors
    181. llama_model_loader: - type q4_0: 225 tensors
    182. llama_model_loader: - type q6_K: 1 tensors
    183. ⠇ llm_load_vocab: special tokens definition check successful ( 256/128256 ).
    184. llm_load_print_meta: format = GGUF V3 (latest)
    185. llm_load_print_meta: arch = llama
    186. llm_load_print_meta: vocab type = BPE
    187. llm_load_print_meta: n_vocab = 128256
    188. llm_load_print_meta: n_merges = 280147
    189. llm_load_print_meta: n_ctx_train = 8192
    190. llm_load_print_meta: n_embd = 4096
    191. llm_load_print_meta: n_head = 32
    192. llm_load_print_meta: n_head_kv = 8
    193. llm_load_print_meta: n_layer = 32
    194. llm_load_print_meta: n_rot = 128
    195. llm_load_print_meta: n_embd_head_k = 128
    196. llm_load_print_meta: n_embd_head_v = 128
    197. llm_load_print_meta: n_gqa = 4
    198. llm_load_print_meta: n_embd_k_gqa = 1024
    199. llm_load_print_meta: n_embd_v_gqa = 1024
    200. llm_load_print_meta: f_norm_eps = 0.0e+00
    201. llm_load_print_meta: f_norm_rms_eps = 1.0e-05
    202. llm_load_print_meta: f_clamp_kqv = 0.0e+00
    203. llm_load_print_meta: f_max_alibi_bias = 0.0e+00
    204. llm_load_print_meta: f_logit_scale = 0.0e+00
    205. llm_load_print_meta: n_ff = 14336
    206. llm_load_print_meta: n_expert = 0
    207. llm_load_print_meta: n_expert_used = 0
    208. llm_load_print_meta: causal attn = 1
    209. llm_load_print_meta: pooling type = 0
    210. llm_load_print_meta: rope type = 0
    211. llm_load_print_meta: rope scaling = linear
    212. llm_load_print_meta: freq_base_train = 500000.0
    213. llm_load_print_meta: freq_scale_train = 1
    214. llm_load_print_meta: n_yarn_orig_ctx = 8192
    215. llm_load_print_meta: rope_finetuned = unknown
    216. llm_load_print_meta: ssm_d_conv = 0
    217. llm_load_print_meta: ssm_d_inner = 0
    218. llm_load_print_meta: ssm_d_state = 0
    219. llm_load_print_meta: ssm_dt_rank = 0
    220. llm_load_print_meta: model type = 8B
    221. llm_load_print_meta: model ftype = Q4_0
    222. llm_load_print_meta: model params = 8.03 B
    223. llm_load_print_meta: model size = 4.33 GiB (4.64 BPW)
    224. llm_load_print_meta: general.name = Meta-Llama-3-8B-Instruct
    225. llm_load_print_meta: BOS token = 128000 '<|begin_of_text|>'
    226. llm_load_print_meta: EOS token = 128009 '<|eot_id|>'
    227. llm_load_print_meta: LF token = 128 'Ä'
    228. llm_load_print_meta: EOT token = 128009 '<|eot_id|>'
    229. llm_load_tensors: ggml ctx size = 0.15 MiB
    230. ⠙ llm_load_tensors: CPU buffer size = 4437.80 MiB
    231. .......................................................................................
    232. llama_new_context_with_model: n_ctx = 2048
    233. llama_new_context_with_model: n_batch = 512
    234. llama_new_context_with_model: n_ubatch = 512
    235. llama_new_context_with_model: freq_base = 500000.0
    236. llama_new_context_with_model: freq_scale = 1
    237. ⠴ llama_kv_cache_init: CPU KV buffer size = 256.00 MiB
    238. llama_new_context_with_model: KV self size = 256.00 MiB, K (f16): 128.00 MiB, V (f16): 128.00 MiB
    239. llama_new_context_with_model: CPU output buffer size = 0.50 MiB
    240. llama_new_context_with_model: CPU compute buffer size = 258.50 MiB
    241. llama_new_context_with_model: graph nodes = 1030
    242. llama_new_context_with_model: graph splits = 1
    243. ⠦ {"function":"initialize","level":"INFO","line":448,"msg":"initializing slots","n_slots":1,"tid":"0x20da49412000","timestamp":1721024651}
    244. {"function":"initialize","level":"INFO","line":460,"msg":"new slot","n_ctx_slot":2048,"slot_id":0,"tid":"0x20da49412000","timestamp":1721024651}
    245. {"function":"main","level":"INFO","line":3065,"msg":"model loaded","tid":"0x20da49412000","timestamp":1721024651}
    246. {"function":"main","hostname":"127.0.0.1","level":"INFO","line":3268,"msg":"HTTP server listening","n_threads_http":"3","port":"61604","tid":"0x20da49412000","timestamp":1721024651}
    247. {"function":"update_slots","level":"INFO","line":1579,"msg":"all slots are idle and system prompt is empty, clear the KV cache","tid":"0x20da49412000","timestamp":1721024651}
    248. {"function":"process_single_task","level":"INFO","line":1511,"msg":"slot data","n_idle_slots":1,"n_processing_slots":0,"task_id":0,"tid":"0x20da49412000","timestamp":1721024651}
    249. {"function":"process_single_task","level":"INFO","line":1511,"msg":"slot data","n_idle_slots":1,"n_processing_slots":0,"task_id":1,"tid":"0x20da49412000","timestamp":1721024651}
    250. {"function":"log_server_request","level":"INFO","line":2742,"method":"GET","msg":"request","params":{},"path":"/health","remote_addr":"10.0.0.12","remote_port":48229,"status":200,"tid":"0x20da85a0a000","timestamp":1721024651}
    251. {"function":"process_single_task","level":"INFO","line":1511,"msg":"slot data","n_idle_slots":1,"n_processing_slots":0,"task_id":2,"tid":"0x20da49412000","timestamp":1721024651}
    252. {"function":"log_server_request","level":"INFO","line":2742,"method":"GET","msg":"request","params":{},"path":"/health","remote_addr":"10.0.0.12","remote_port":33319,"status":200,"tid":"0x20da85a0a700","timestamp":1721024651}
    253. {"function":"process_single_task","level":"INFO","line":1511,"msg":"slot data","n_idle_slots":1,"n_processing_slots":0,"task_id":3,"tid":"0x20da49412000","timestamp":1721024651}
    254. {"function":"log_server_request","level":"INFO","line":2742,"method":"GET","msg":"request","params":{},"path":"/health","remote_addr":"10.0.0.12","remote_port":54187,"status":200,"tid":"0x20da85a0ae00","timestamp":1721024651}
    255. {"function":"process_single_task","level":"INFO","line":1511,"msg":"slot data","n_idle_slots":1,"n_processing_slots":0,"task_id":4,"tid":"0x20da49412000","timestamp":1721024651}
    256. {"function":"log_server_request","level":"INFO","line":2742,"method":"GET","msg":"request","params":{},"path":"/health","remote_addr":"10.0.0.12","remote_port":28162,"status":200,"tid":"0x20da85a0a000","timestamp":1721024651}
    257. {"function":"log_server_request","level":"INFO","line":2742,"method":"GET","msg":"request","params":{},"path":"/health","remote_addr":"10.0.0.12","remote_port":33773,"status":200,"tid":"0x20da85a0a700","timestamp":1721024651}
    258. {"function":"process_single_task","level":"INFO","line":1511,"msg":"slot data","n_idle_slots":1,"n_processing_slots":0,"task_id":5,"tid":"0x20da49412000","timestamp":1721024651}
    259. {"function":"process_single_task","level":"INFO","line":1511,"msg":"slot data","n_idle_slots":1,"n_processing_slots":0,"task_id":6,"tid":"0x20da49412000","timestamp":1721024651}
    260. {"function":"log_server_request","level":"INFO","line":2742,"method":"GET","msg":"request","params":{},"path":"/health","remote_addr":"10.0.0.12","remote_port":19633,"status":200,"tid":"0x20da85a0ae00","timestamp":1721024651}
    261. {"function":"process_single_task","level":"INFO","line":1511,"msg":"slot data","n_idle_slots":1,"n_processing_slots":0,"task_id":7,"tid":"0x20da49412000","timestamp":1721024651}
    262. {"function":"log_server_request","level":"INFO","line":2742,"method":"GET","msg":"request","params":{},"path":"/health","remote_addr":"10.0.0.12","remote_port":35779,"status":200,"tid":"0x20da85a0a000","timestamp":1721024651}
    263. {"function":"log_server_request","level":"INFO","line":2742,"method":"GET","msg":"request","params":{},"path":"/health","remote_addr":"10.0.0.12","remote_port":18413,"status":200,"tid":"0x20da85a0a700","timestamp":1721024651}
    264. ⠧ {"function":"process_single_task","level":"INFO","line":1511,"msg":"slot data","n_idle_slots":1,"n_processing_slots":0,"task_id":8,"tid":"0x20da49412000","timestamp":1721024651}
    265. {"function":"log_server_request","level":"INFO","line":2742,"method":"GET","msg":"request","params":{},"path":"/health","remote_addr":"10.0.0.12","remote_port":36742,"status":200,"tid":"0x20da85a0a700","timestamp":1721024651}
    {"function":"process_single_task","level":"INFO","line":1511,"msg":"slot data","n_idle_slots":1,"n_processing_slots":0,"task_id":9,"tid":"0x20da49412000","timestamp":1721024651}
    {"function":"log_server_request","level":"INFO","line":2742,"method":"GET","msg":"request","params":{},"path":"/health","remote_addr":"10.0.0.12","remote_port":36742,"status":200,"tid":"0x20da85a0a700","timestamp":1721024651}
    ⠇ {"function":"log_server_request","level":"INFO","line":2742,"method":"POST","msg":"request","params":{},"path":"/tokenize","remote_addr":"10.0.0.12","remote_port":36742,"status":200,"tid":"0x20da85a0a700","timestamp":1721024651}
    {"function":"process_single_task","level":"INFO","line":1511,"msg":"slot data","n_idle_slots":1,"n_processing_slots":0,"task_id":10,"tid":"0x20da49412000","timestamp":1721024651}
    {"function":"log_server_request","level":"INFO","line":2742,"method":"GET","msg":"request","params":{},"path":"/health","remote_addr":"10.0.0.12","remote_port":36742,"status":200,"tid":"0x20da85a0a700","timestamp":1721024651}
    ⠏ {"function":"launch_slot_with_data","level":"INFO","line":833,"msg":"slot is processing task","slot_id":0,"task_id":11,"tid":"0x20da49412000","timestamp":1721024651}
    {"function":"update_slots","ga_i":0,"level":"INFO","line":1817,"msg":"slot progression","n_past":0,"n_past_se":0,"n_prompt_tokens_processed":10,"slot_id":0,"task_id":11,"tid":"0x20da49412000","timestamp":1721024651}
    {"function":"update_slots","level":"INFO","line":1841,"msg":"kv cache rm [p0, end)","p0":0,"slot_id":0,"task_id":11,"tid":"0x20da49412000","timestamp":1721024651}
    Hello! It's nice to meet you. Is there something I can help you with, or
    would you like to chat?{"function":"print_timings","level":"INFO","line":276,"msg":"prompt eval time = 106459.91 ms / 10 tokens (10645.99 ms per token, 0.09 tokens per second)","n_prompt_tokens_processed":10,"n_tokens_second":0.09393207617523164,"slot_id":0,"t_prompt_processing":106459.906,"t_token":10645.990600000001,"task_id":11,"tid":"0x20da49412000","timestamp":1721027627}
    {"function":"print_timings","level":"INFO","line":290,"msg":"generation eval time = 2868918.63 ms / 26 runs (110343.02 ms per token, 0.01 tokens per second)","n_decoded":26,"n_tokens_second":0.00906264811318913,"slot_id":0,"t_token":110343.0241923077,"t_token_generation":2868918.629,"task_id":11,"tid":"0x20da49412000","timestamp":1721027627}
    {"function":"print_timings","level":"INFO","line":299,"msg":" total time = 2975378.54 ms","slot_id":0,"t_prompt_processing":106459.906,"t_token_generation":2868918.629,"t_total":2975378.535,"task_id":11,"tid":"0x20da49412000","timestamp":1721027627}
    {"function":"update_slots","level":"INFO","line":1649,"msg":"slot released","n_cache_tokens":36,"n_ctx":2048,"n_past":35,"n_system_tokens":0,"slot_id":0,"task_id":11,"tid":"0x20da49412000","timestamp":1721027627,"truncated":false}
    {"function":"log_server_request","level":"INFO","line":2742,"method":"POST","msg":"request","params":{},"path":"/completion","remote_addr":"10.0.0.12","remote_port":36742,"status":200,"tid":"0x20da85a0a700","timestamp":1721027627}
    [GIN] 2024/07/15 - 15:13:47 | 200 | 50m59s | 10.0.0.12 | POST "/api/chat"

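    Summary

    Ollama can be compiled on FreeBSD, but only from a patched fork. The upstream repository is https://github.com/ollama/ollama (Get up and running with Llama 3, Mistral, Gemma 2, and other large language models). The patched fork is https://github.com/prep/ollama

    If the fork fails to build, read the error message and adjust the github.com/pdevine/tensor line in go.sum and go.mod accordingly, changing it to the May 10 revision: github.com/pdevine/tensor v0.0.0-20240510204454-f88f4562727c

    The whole setup was debugged successfully on a J1900 CPU with 8 GB of RAM, running FreeBSD 14.1-RELEASE (host fbhost). Ollama is extremely slow on this machine, taking about 50 minutes to answer one question, but it does work!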
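The 50-minute figure can be cross-checked against the print_timings lines in the log above. This is a minimal sketch, assuming a POSIX shell with awk. The helper names tokens_per_sec and ms_to_minutes are made up for this illustration, and the numbers are copied from the log.

```shell
#!/bin/sh
# Hypothetical helpers: turn the millisecond timings from the server log
# into more readable figures.

# tokens_per_sec <elapsed_ms> <token_count>
tokens_per_sec() {
    awk -v ms="$1" -v n="$2" 'BEGIN { printf "%.4f\n", n / (ms / 1000) }'
}

# ms_to_minutes <elapsed_ms>
ms_to_minutes() {
    awk -v ms="$1" 'BEGIN { printf "%.1f\n", ms / 60000 }'
}

tokens_per_sec 106459.91 10    # prompt evaluation -> 0.0939
tokens_per_sec 2868918.63 26   # generation        -> 0.0091
ms_to_minutes 2975378.54       # total time        -> 49.6
```

Generation dominates: at roughly 0.009 tokens per second, the 26 generated tokens account for nearly all of the 49.6 minutes.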

    Troubleshooting

    Error during go build

    skywalk@fbhost:~/github/ollama $ go build .
    package github.com/ollama/ollama
        imports github.com/ollama/ollama/cmd
        imports github.com/ollama/ollama/server
        imports github.com/ollama/ollama/gpu: C source files not allowed when not using cgo or SWIG: gpu_info_cudart.c gpu_info_nvcuda.c gpu_info_nvml.c gpu_info_oneapi.c

    Why is anything GPU-related being built here? What is misconfigured?
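One possible explanation, not confirmed in this post: Go reports this error when a package directory contains .c files but none of the Go files selected for the current platform uses import "C". That can happen when build constraints exclude those files on FreeBSD, or when cgo is disabled (go env CGO_ENABLED shows the setting). The sketch below is a rough grep-based check that lists each Go file importing "C" with its //go:build line. The helper name cgo_constraints is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: for every .go file in a package directory that
# contains `import "C"`, print its //go:build line, or a note if it has
# none. This is a grep heuristic: constraints expressed only through
# file name suffixes (such as _darwin.go) are not detected.
cgo_constraints() {
    dir="$1"
    for f in "$dir"/*.go; do
        [ -f "$f" ] || continue
        grep -q 'import "C"' "$f" || continue
        constraint=$(grep '^//go:build' "$f" | head -n 1)
        echo "$(basename "$f"): ${constraint:-no build constraint}"
    done
}

# Example (assumes the current directory is the ollama checkout):
# cgo_constraints ./gpu
# go env CGO_ENABLED
```

If every listed file carries a constraint that leaves out freebsd, the C sources in that directory have no cgo file to attach to on this platform.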

    To get it building on FreeBSD, I looked through the ollama issues:

    Ollama on FreeBSD · Issue #1102 · ollama/ollama · GitHub

    That issue describes an approach that uses a different repo:

    # pkg install -y git go122 cmake vulkan-headers vulkan-loader

    # git clone https://github.com/prep/ollama.git

    # cd ollama && git checkout feature/add-bsd-support

    # go122 generate ./...

    # go122 build .

    # ./ollama help | head -n 5
    Large language model runner
    Usage:
    ollama [flags]
    ollama [command]

    Works fine for me, no problems encountered.

    It seems the main repo could also be built on FreeBSD at one point, but that stopped working after May 6: Make maximum pending request configurable by dhiltgen · Pull Request #4144 · ollama/ollama · GitHub

    git checkout feature/add-bsd-support fails

    git checkout feature/add-bsd-support
    error: pathspec 'feature/add-bsd-support' did not match any file(s) known to git

    It turned out that the earlier clone had not fetched the whole repository:

    # git clone --depth 1 https://github.com/prep/ollama.git

    Switch branch (this did not work):

    # cd ollama && git checkout feature/add-bsd-support

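    --depth 1 cannot be used here. A shallow clone fetches only the default branch, so the feature branch is missing. Drop the option:

    git clone https://github.com/prep/ollama.git

    After that, git checkout feature/add-bsd-support succeeds.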
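If a shallow clone is still preferred, git clone accepts --branch together with --depth 1, which fetches only the tip of the named branch. A minimal sketch follows. The wrapper name clone_branch and the destination directory are illustrative only.

```shell
#!/bin/sh
# Hypothetical wrapper: shallow-clone a single named branch.
# clone_branch <url> <branch> <dest>
clone_branch() {
    git clone --quiet --depth 1 --branch "$2" "$1" "$3"
}

# Example, using the fork and branch named in this post:
# clone_branch https://github.com/prep/ollama.git feature/add-bsd-support ollama
```

The resulting checkout is already on the requested branch, so no separate git checkout is needed.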

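    What the vulkan-headers and vulkan-loader packages do

    vulkan-headers and vulkan-loader are two key components for working with the Vulkan API, and both matter when building applications that use Vulkan for graphics and compute. vulkan-headers provides the C header files, and vulkan-loader provides the loader library that dispatches API calls to the installed driver. Vulkan is a cross-platform graphics and compute API developed by the Khronos Group, designed for high-performance 3D rendering.

    Build in the jail fails with "C source files not allowed"

    Conclusion first: GitHub connectivity was flaky.

    Building in the jail failed with: imports github.com/ollama/ollama/gpu: C source files not allowed when not using cgo or SWIG: gpu_info_cpu.c gpu_info_cudart.c

    There was also an error about failing to connect to GitHub: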

     fatal: unable to access 'https://github.com/pdevine/tensor/': Failed to connect to github.com port 443 after 75025 ms: Couldn't connect to server
     

    go: downloading github.com/pdevine/tensor v0.0.0-20240228013915-64ccaa8d9ca9
    package github.com/ollama/ollama
        imports github.com/ollama/ollama/cmd
        imports github.com/ollama/ollama/server
        imports github.com/ollama/ollama/gpu: C source files not allowed when not using cgo or SWIG: gpu_info_cpu.c gpu_info_cudart.c
    convert/gemma.go:12:2: github.com/pdevine/tensor@v0.0.0-20240228013915-64ccaa8d9ca9: invalid version: git fetch -f origin refs/heads/*:refs/heads/* refs/tags/*:refs/tags/* in /root/go/pkg/mod/cache/vcs/6bf5b14e60582bdf39d55e6388653dd8c2addad6937480b86ddb5a729a838afe: exit status 128:
        fatal: unable to access 'https://github.com/pdevine/tensor/': Failed to connect to github.com port 443 after 75025 ms: Couldn't connect to server
    convert/gemma.go:13:2: github.com/pdevine/tensor@v0.0.0-20240228013915-64ccaa8d9ca9: invalid version: git fetch -f origin refs/heads/*:refs/heads/* refs/tags/*:refs/tags/* in /root/go/pkg/mod/cache/vcs/6bf5b14e60582bdf39d55e6388653dd8c2addad6937480b86ddb5a729a838afe: exit status 128:
        fatal: unable to access 'https://github.com/pdevine/tensor/': Failed to connect to github.com port 443 after 75025 ms: Couldn't connect to server
     

    After the first generate, the build did not succeed

    + echo 'go generate completed.  LLM runners: cpu cpu_avx cpu_avx2 vulkan'
    go generate completed.  LLM runners: cpu cpu_avx cpu_avx2 vulkan
    [root@fb12 ollama]# go122 build .
    go: downloading github.com/pdevine/tensor v0.0.0-20240228013915-64ccaa8d9ca9
    convert/gemma.go:12:2: github.com/pdevine/tensor@v0.0.0-20240228013915-64ccaa8d9ca9: invalid version: unknown revision 64ccaa8d9ca9
    convert/gemma.go:13:2: github.com/pdevine/tensor@v0.0.0-20240228013915-64ccaa8d9ca9: invalid version: unknown revision 64ccaa8d9ca9
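    The cause is unclear, but it may well be GitHub acting up again.

    Ran generate again. Still flaky.

    Everything so far had been done as root, so I tried building as a regular user.

    The regular user gets the same error.

    Edit go.sum and change the pdevine/tensor entries to: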

    github.com/pdevine/tensor v0.0.0-20240510204454-f88f4562727c h1:GwiUUjKefgvSNmv3NCvI/BL0kDebW6Xa+kcdpdc1mTY=
    github.com/pdevine/tensor v0.0.0-20240510204454-f88f4562727c/go.mod h1:PSojXDXF7TbgQiD6kkd98IHOS0QqTyUEaWRiS8+BLu8=

    After this change, go build fails

    go build error: convert/gemma.go:13:2: missing go.sum entry for module providing package

    go122 build .
    convert/gemma.go:12:2: missing go.sum entry for module providing package github.com/pdevine/tensor (imported by github.com/ollama/ollama/convert); to add:
        go get github.com/ollama/ollama/convert
    convert/gemma.go:13:2: missing go.sum entry for module providing package github.com/pdevine/tensor/native (imported by github.com/ollama/ollama/convert); to add:
        go get github.com/ollama/ollama/convert
    go.mod also records a version for this module. Change it to the current one:

    github.com/pdevine/tensor v0.0.0-20240510204454-f88f4562727c
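Rather than editing by hand, the version can be rewritten with a small script. This is a sketch, assuming the module sits on its own line inside a require ( ... ) block in the form "<module> <version>". The helper name pin_module is made up. The go tool's own go mod edit -require=<module>@<version> does the same job.

```shell
#!/bin/sh
# Hypothetical helper: set the version of one module in a go.mod file.
# pin_module <go.mod> <module> <version>
pin_module() {
    file="$1"
    tmp="$file.tmp.$$"
    awk -v mod="$2" -v ver="$3" '
        $1 == mod {
            match($0, /^[ \t]*/)                 # keep the indentation
            indent = substr($0, 1, RLENGTH)
            rest = ""
            for (i = 3; i <= NF; i++) rest = rest " " $i   # e.g. "// indirect"
            print indent mod " " ver rest
            next
        }
        { print }
    ' "$file" > "$tmp" && mv "$tmp" "$file"
}

# Example, run from the ollama checkout:
# pin_module go.mod github.com/pdevine/tensor v0.0.0-20240510204454-f88f4562727c
```

Only the line whose first field equals the module path is touched. Every other line is copied through unchanged.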

    But that produced another error.

    Error after changing the tensor version in go.sum and go.mod

    verifying github.com/google/flatbuffers@v1.12.0: checksum mismatch
        downloaded: h1:N8EguYFm2wwdpoNcpchQY0tPs85vOJkboFb2dPxmixo=
        go.sum:     h1:/PtAHvnBY4Kqnx/xCQ3OIV9uYcSFGScBsWI3Oogeh6w=

    SECURITY ERROR
    This download does NOT match an earlier download recorded in go.sum.
    The bits may have been replaced on the origin server, or an attacker may
    have intercepted the download attempt.
     

    go122 generate ./...
    go: downloading github.com/google/flatbuffers v1.12.0
    go: downloading gonum.org/v1/gonum v0.8.2
    verifying github.com/google/flatbuffers@v1.12.0: checksum mismatch
        downloaded: h1:N8EguYFm2wwdpoNcpchQY0tPs85vOJkboFb2dPxmixo=
        go.sum:     h1:/PtAHvnBY4Kqnx/xCQ3OIV9uYcSFGScBsWI3Oogeh6w=
    SECURITY ERROR
    This download does NOT match an earlier download recorded in go.sum.
    The bits may have been replaced on the origin server, or an attacker may
    have intercepted the download attempt.
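To see which recorded hash the error is comparing against, the matching lines can be pulled out of go.sum. This is a sketch with a hypothetical helper sum_entries. It relies on each go.sum line having the form "<module> <version>[/go.mod] <hash>".

```shell
#!/bin/sh
# Hypothetical helper: print the go.sum lines for one module version,
# i.e. the entry for the module contents and the entry for its go.mod.
# sum_entries <go.sum> <module> <version>
sum_entries() {
    awk -v mod="$2" -v ver="$3" '
        $1 == mod && ($2 == ver || $2 == (ver "/go.mod")) { print }
    ' "$1"
}

# Example, run from the ollama checkout:
# sum_entries go.sum github.com/google/flatbuffers v1.12.0
```

The h1: value on the first printed line is the one the go tool reports as "go.sum:" in the message above. The hashes alone do not show which side is wrong.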

    Baffling. Something is off with this fork.

    Try setting the line in go.mod to: github.com/pdevine/tensor v0.0.0-20240510204454-f88f4562727c

    Then run

    go122 get github.com/ollama/ollama/convert

    Then run

    go122 build .

    At last, the build completed.

  • Original article: https://blog.csdn.net/skywalk8163/article/details/140408428