• OFI libfabric: Principles and Applications


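    Agenda

    • Building communication software
    • Challenges posed by hardware and software
    • Why libfabric is needed
    • libfabric architecture
    • API groups
    • Socket applications vs. libfabric applications
    • GPU data transfer examples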

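    Building communication software

    • Reliable, connection-oriented TCP and connectionless datagram UDP
    • High-performance computing (HPC) and artificial intelligence (AI)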
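
As a small illustration of the TCP/UDP distinction above, the sketch below opens one socket of each kind and asks the kernel which type it recorded. The helper names `open_transport` and `transport_type` are hypothetical, chosen only for this example; the calls themselves are the standard POSIX socket API.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Open an IPv4 socket. reliable != 0 selects TCP (SOCK_STREAM),
 * otherwise UDP (SOCK_DGRAM). Returns the descriptor, or -1 on error. */
int open_transport(int reliable)
{
    return socket(AF_INET, reliable ? SOCK_STREAM : SOCK_DGRAM, 0);
}

/* Return the socket type the kernel recorded for fd, or -1 on error. */
int transport_type(int fd)
{
    int type = 0;
    socklen_t len = sizeof(type);

    if (getsockopt(fd, SOL_SOCKET, SO_TYPE, &type, &len) < 0)
        return -1;
    return type;
}
```

Calling `open_transport(1)` gives a descriptor that must go through connect/accept before data flows, while `open_transport(0)` gives one that can send datagrams right away with no delivery guarantee.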

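    Challenges posed by hardware and software complexity

    • Clusters with thousands of nodes, different network types (Ethernet, InfiniBand, optical fiber, etc.), and the range of CPU/GPU/XPU/IPU devices that Dr. Panda mentioned
    • Interconnects and accelerators such as Xe Link, NVLink, and GPUs
    • Software libraries such as NCCL and Intel MPI
    • So we need a general-purpose communication library to bridge these complex hardware and software resources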

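    libfabric addresses the problems above

    • A unified API that makes life easier for programmers
    • High performance and high scalability
    • Core components: provider libraries for many NICs, core services, test programs, etc.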

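    Why libfabric is needed

    • It sits in the middle, bridging the complex and varied lower-layer transports (sockets, RDMA, GPU, shared memory, etc.) and upper-layer consumers such as MPI, CCL, and shared-memory applications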

    Socket programming vs. libfabric programming

    • The semantics are similar (query information, bind, connect, and so on), but libfabric supports many network types underneath
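
For comparison, the "query information" step on the socket side is usually `getaddrinfo`, which is similar in spirit to libfabric's `fi_getinfo` (the latter also selects a provider). The sketch below is a minimal POSIX-only illustration; the helper name `resolve_ipv4` is hypothetical and port 9228 is simply the value used by the libfabric example in this post.

```c
#define _POSIX_C_SOURCE 200112L
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Resolve a numeric IPv4 address and port into a sockaddr_in.
 * With host == NULL and passive != 0 the result is the wildcard address,
 * which is what a server passes to bind(). Returns 0 on success,
 * otherwise the getaddrinfo() error code. */
int resolve_ipv4(const char *host, const char *port, int passive,
                 struct sockaddr_in *out)
{
    struct addrinfo hints, *res = NULL;
    int ret;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_INET;        /* IPv4 only, to keep the sketch small */
    hints.ai_socktype = SOCK_STREAM;  /* connection-oriented, like FI_EP_MSG */
    hints.ai_flags = AI_NUMERICHOST | AI_NUMERICSERV |
                     (passive ? AI_PASSIVE : 0);

    ret = getaddrinfo(host, port, &hints, &res);
    if (ret != 0)
        return ret;

    /* Take the first result, much as the libfabric example takes the first fi_info. */
    memcpy(out, res->ai_addr, sizeof(*out));
    freeaddrinfo(res);
    return 0;
}
```

A client would call `resolve_ipv4("192.168.5.6", "9228", 0, &addr)` and pass `addr` to `connect()`; a server would call it with a NULL host and `passive = 1` and pass the result to `bind()`.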

    GPU communication support

    • GPU communication example; supports Intel GPUs, DPUs, and devices from other vendors

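    Architecture and the four services

    • Control: discover the underlying devices and their attributes and capabilities
    • Communication: establish connections and initialize resources
    • Data transfer: send and receive data
    • Completion: report the status of sends and receives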
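
To make the completion service more concrete, here is a toy model of a completion queue in plain C. It is not the libfabric API: every `toy_*` name is hypothetical, and the fixed-depth ring buffer is only an assumption made for illustration. The idea it shows is that the transport writes an entry when an operation finishes, and the application polls to learn which of its requests completed.

```c
#include <stddef.h>

#define TOY_CQ_DEPTH 8

enum toy_op { TOY_SEND = 1, TOY_RECV = 2 };

struct toy_completion {
    enum toy_op op;   /* which kind of operation finished */
    void *context;    /* caller-supplied tag identifying the request */
};

struct toy_cq {
    struct toy_completion entries[TOY_CQ_DEPTH];
    size_t head;      /* index of the next entry to read */
    size_t count;     /* number of entries currently queued */
};

/* Called by the (hypothetical) transport when an operation finishes.
 * Returns 0 on success, -1 if the queue is full. */
int toy_cq_write(struct toy_cq *cq, enum toy_op op, void *context)
{
    size_t tail;

    if (cq->count == TOY_CQ_DEPTH)
        return -1;
    tail = (cq->head + cq->count) % TOY_CQ_DEPTH;
    cq->entries[tail].op = op;
    cq->entries[tail].context = context;
    cq->count++;
    return 0;
}

/* Called by the application to harvest up to max completions.
 * Returns the number of entries copied; 0 means nothing is ready yet. */
size_t toy_cq_read(struct toy_cq *cq, struct toy_completion *out, size_t max)
{
    size_t n = 0;

    while (n < max && cq->count > 0) {
        out[n++] = cq->entries[cq->head];
        cq->head = (cq->head + 1) % TOY_CQ_DEPTH;
        cq->count--;
    }
    return n;
}
```

A zero-initialized `struct toy_cq` is an empty queue. The real `fi_cq_read` follows the same polling shape, except that it reports "nothing ready" as `-FI_EAGAIN` rather than 0.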

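    libfabric API groups

    • Providers below and application developers above program against one unified API. As Dr. Panda noted, this makes it easier for providers to ship plug-ins and easier for application developers to use them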

    TCP socket code (screenshot)

    Socket send/receive example

    Start the server: ./example_socket
    Client connects and sends: ./example_socket 192.168.5.6
    Message typed at the client prompt: Hello server this is client.

    example_socket.c source

    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>

    #define PORT 43192

    char *dst_addr = NULL;

    int main(int argc, char *argv[])
    {
        int socket_desc, client_sock = -1;
        socklen_t client_size;
        struct sockaddr_in server_addr, client_addr;
        char server_message[2000], client_message[2000];

        // Server runs with no arguments; client passes the server's IP address:
        dst_addr = (argc > 1) ? argv[1] : NULL;

        // Clean buffers:
        memset(server_message, '\0', sizeof(server_message));
        memset(client_message, '\0', sizeof(client_message));
        memset(&server_addr, 0, sizeof(server_addr));

        // Create socket:
        socket_desc = socket(AF_INET, SOCK_STREAM, 0);
        if (socket_desc < 0) {
            printf("Error while creating socket\n");
            return -1;
        }
        printf("Socket created successfully\n");

        // Set port and IP (the port must be in network byte order):
        server_addr.sin_family = AF_INET;
        server_addr.sin_port = htons(PORT);
        server_addr.sin_addr.s_addr = dst_addr ? inet_addr(dst_addr) : htonl(INADDR_ANY);
        if (dst_addr && server_addr.sin_addr.s_addr == INADDR_NONE) {
            printf("Invalid server address\n");
            return -1;
        }

        if (!dst_addr) {
            // Server: bind to the set port and IP:
            if (bind(socket_desc, (struct sockaddr *)&server_addr, sizeof(server_addr)) < 0) {
                printf("Couldn't bind to the port\n");
                return -1;
            }
            printf("Binding complete\n");

            // Listen for clients:
            if (listen(socket_desc, 1) < 0) {
                printf("Error while listening\n");
                return -1;
            }
            printf("Listening for incoming connections...\n");

            // Accept an incoming connection:
            client_size = sizeof(client_addr);
            client_sock = accept(socket_desc, (struct sockaddr *)&client_addr, &client_size);
            if (client_sock < 0) {
                printf("Can't accept\n");
                return -1;
            }
            printf("Client connected at IP: %s and port: %i\n",
                   inet_ntoa(client_addr.sin_addr), ntohs(client_addr.sin_port));

            // Receive client's message (leave room for the terminating NUL):
            if (recv(client_sock, client_message, sizeof(client_message) - 1, 0) < 0) {
                printf("Couldn't receive\n");
                return -1;
            }
            printf("Msg from client: %s\n", client_message);

            // Respond to client:
            strcpy(server_message, "This is the server's message.");
            if (send(client_sock, server_message, strlen(server_message), 0) < 0) {
                printf("Can't send\n");
                return -1;
            }
        } else {
            // Client: send connection request to server:
            if (connect(socket_desc, (struct sockaddr *)&server_addr, sizeof(server_addr)) < 0) {
                printf("Unable to connect\n");
                return -1;
            }
            printf("Connected with server successfully\n");

            // Get input from the user (fgets is bounded, unlike gets):
            printf("Enter message: ");
            if (!fgets(client_message, sizeof(client_message), stdin))
                return -1;
            client_message[strcspn(client_message, "\n")] = '\0';

            // Send the message to server:
            if (send(socket_desc, client_message, strlen(client_message), 0) < 0) {
                printf("Unable to send message\n");
                return -1;
            }

            // Receive the server's response:
            if (recv(socket_desc, server_message, sizeof(server_message) - 1, 0) < 0) {
                printf("Error while receiving server's msg\n");
                return -1;
            }
            printf("Server's response: %s\n", server_message);
        }

        // Close the client socket (only opened on the server side):
        if (client_sock >= 0)
            close(client_sock);
        // Close the main socket:
        close(socket_desc);
        return 0;
    }
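
If a second machine is not at hand, the same bind/listen/connect/accept/send/recv sequence can be tried inside a single process over the loopback interface. The following is a minimal sketch under that assumption; the helper name `loopback_exchange` is hypothetical, and it relies on the kernel completing the TCP handshake for a listening socket before `accept()` is called.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <unistd.h>

/* Send msg from a client socket to a server socket over 127.0.0.1 within one
 * process, and copy what the server received into out (NUL-terminated).
 * Returns the number of bytes received, or -1 on any error. */
ssize_t loopback_exchange(const char *msg, char *out, size_t outlen)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    int lfd = -1, cfd = -1, sfd = -1;
    ssize_t got = -1;

    if (outlen == 0 || msg[0] == '\0')
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(0);            /* let the kernel pick a free port */

    /* Server side: bind, listen, then learn which port was assigned. */
    lfd = socket(AF_INET, SOCK_STREAM, 0);
    if (lfd < 0)
        goto out;
    if (bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(lfd, 1) < 0 ||
        getsockname(lfd, (struct sockaddr *)&addr, &len) < 0)
        goto out;

    /* Client side: connect() returns once the connection sits in the backlog. */
    cfd = socket(AF_INET, SOCK_STREAM, 0);
    if (cfd < 0 || connect(cfd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        goto out;

    sfd = accept(lfd, NULL, NULL);
    if (sfd < 0)
        goto out;

    if (send(cfd, msg, strlen(msg), 0) < 0)
        goto out;
    got = recv(sfd, out, outlen - 1, 0);
    if (got >= 0)
        out[got] = '\0';
out:
    if (sfd >= 0) close(sfd);
    if (cfd >= 0) close(cfd);
    if (lfd >= 0) close(lfd);
    return got;
}
```

Because TCP is a byte stream, a single `recv()` is not guaranteed to return the whole message; for a short string over loopback it normally does, which is all this sketch assumes.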

    libfabric send/receive example (screenshot)

    1. Start the server: ./example_msg
    2. Client connects and sends data: ./example_msg 192.168.5.6

    Reference code

    https://github.com/ssbandjl/libfabric/blob/main/fabtests/functional/example_msg.c

    /*
     *
     * This software is available to you under the BSD license
     * below:
     *
     *     Redistribution and use in source and binary forms, with or
     *     without modification, are permitted provided that the following
     *     conditions are met:
     *
     *      - Redistributions of source code must retain the above
     *        copyright notice, this list of conditions and the following
     *        disclaimer.
     *
     *      - Redistributions in binary form must reproduce the above
     *        copyright notice, this list of conditions and the following
     *        disclaimer in the documentation and/or other materials
     *        provided with the distribution.
     *
     * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
     * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
     * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
     * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
     * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
     * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
     * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
     * SOFTWARE.
     */

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <stdint.h>
    #include <sys/types.h>
    #include <getopt.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>
    #include <netdb.h>
    #include <rdma/fabric.h>
    #include <rdma/fi_errno.h>
    #include <rdma/fi_domain.h>
    #include <rdma/fi_endpoint.h>
    #include <rdma/fi_eq.h>
    #include <rdma/fi_cm.h>

    // Build with
    // gcc -o example_msg example_msg.c -L<path to libfabric lib> -I<path to libfabric include> -lfabric
    // for example:
    // gcc -o example_msg example_msg.c -L/home/xb/project/libfabric/libfabric/build/lib -I/home/xb/project/libfabric/libfabric/build/include -lfabric

    #define BUF_SIZE 64

    char *dst_addr = NULL;
    const char *port = "9228";
    struct fi_info *hints, *info, *fi_pep;
    struct fid_fabric *fabric = NULL;
    struct fid_domain *domain = NULL;
    struct fid_ep *ep = NULL;
    struct fid_pep *pep = NULL;
    struct fid_cq *cq = NULL;
    struct fid_eq *eq = NULL;
    struct fi_cq_attr cq_attr = {0};
    struct fi_eq_attr eq_attr = {
        .wait_obj = FI_WAIT_UNSPEC
    };
    int ret;
    char buffer[BUF_SIZE];
    fi_addr_t fi_addr = FI_ADDR_UNSPEC;
    struct fi_eq_cm_entry entry;
    uint32_t event;
    ssize_t rd;

    /* Client side: initializes all basic OFI resources and connects to the server. */
    static int start_client(void)
    {
        ret = fi_getinfo(FI_VERSION(1,9), dst_addr, port, dst_addr ? 0 : FI_SOURCE,
                         hints, &info);
        if (ret) {
            printf("fi_getinfo: %d\n", ret);
            return ret;
        }

        ret = fi_fabric(info->fabric_attr, &fabric, NULL);
        if (ret) {
            printf("fi_fabric: %d\n", ret);
            return ret;
        }

        ret = fi_eq_open(fabric, &eq_attr, &eq, NULL);
        if (ret) {
            printf("fi_eq_open: %d\n", ret);
            return ret;
        }

        ret = fi_domain(fabric, info, &domain, NULL);
        if (ret) {
            printf("fi_domain: %d\n", ret);
            return ret;
        }

        /* Initialize our completion queue. Completion queues are used to report events associated
         * with data transfers. In this example, we use one CQ that tracks sends and receives, but
         * often times there will be separate CQs for sends and receives.
         */
        cq_attr.size = 128;
        cq_attr.format = FI_CQ_FORMAT_MSG;
        ret = fi_cq_open(domain, &cq_attr, &cq, NULL);
        if (ret) {
            printf("fi_cq_open error (%d)\n", ret);
            return ret;
        }

        /* Active endpoint: the object used to connect and to send/receive data. */
        ret = fi_endpoint(domain, info, &ep, NULL);
        if (ret) {
            printf("fi_endpoint: %d\n", ret);
            return ret;
        }

        /* Bind our CQ to our endpoint to track any sends and receives that come in or out on that endpoint.
         * A CQ can be bound to multiple endpoints but one EP can only have one send CQ and one receive CQ
         * (which can be the same CQ).
         */
        ret = fi_ep_bind(ep, &cq->fid, FI_SEND | FI_RECV);
        if (ret) {
            printf("fi_ep_bind cq error (%d)\n", ret);
            return ret;
        }

        /* Bind the EQ so that connection events for this endpoint are reported on it. */
        ret = fi_ep_bind(ep, &eq->fid, 0);
        if (ret) {
            printf("fi_ep_bind: %d\n", ret);
            return ret;
        }

        ret = fi_enable(ep);
        if (ret) {
            printf("fi_enable: %d\n", ret);
            return ret;
        }

        ret = fi_connect(ep, info->dest_addr, NULL, 0);
        if (ret) {
            printf("fi_connect: %d\n", ret);
            return ret;
        }

        /* Wait until the server accepts: the EQ reports FI_CONNECTED. */
        rd = fi_eq_sread(eq, &event, &entry, sizeof(entry), -1, 0);
        if (rd != sizeof(entry) || event != FI_CONNECTED) {
            printf("fi_eq_sread: %zd (event %u)\n", rd, event);
            return rd < 0 ? (int) rd : -FI_EOTHER;
        }
        return 0;
    }

    static int start_server(void)
    {
        /* The first OFI call to happen for initialization is fi_getinfo which queries libfabric
         * and returns any appropriate providers that fulfill the hints requirements. Any applicable
         * providers will be returned as a list of fi_info structs (&fi_pep). Any info can be selected.
         * In this test we select the first fi_info struct. Assuming all hints were set appropriately,
         * the first fi_info should be most appropriate.
         * The flag FI_SOURCE is set for the server to indicate that the address/port refer to source
         * information. This is not set for the client because the fields refer to the server, not
         * the caller (client). */
        ret = fi_getinfo(FI_VERSION(1,9), dst_addr, port, dst_addr ? 0 : FI_SOURCE,
                         hints, &fi_pep);
        if (ret) {
            printf("fi_getinfo error (%d)\n", ret);
            return ret;
        }

        /* Initialize our fabric. The fabric network represents a collection of hardware and software
         * resources that access a single physical or virtual network. All network ports on a system
         * that can communicate with each other through their attached networks belong to the same fabric.
         * The fabric must be opened before any other resource is created.
         */
        ret = fi_fabric(fi_pep->fabric_attr, &fabric, NULL);
        if (ret) {
            printf("fi_fabric error (%d)\n", ret);
            return ret;
        }

        /* Open the event queue (EQ). It reports connection-management events
         * such as FI_CONNREQ and FI_CONNECTED. */
        ret = fi_eq_open(fabric, &eq_attr, &eq, NULL);
        if (ret) {
            printf("fi_eq_open: %d\n", ret);
            return ret;
        }

        /* Initialize our endpoint. Endpoints are transport level communication portals which are used to
         * initiate and drive communication. There are three main types of endpoints:
         * FI_EP_MSG - connected, reliable
         * FI_EP_RDM - unconnected, reliable
         * FI_EP_DGRAM - unconnected, unreliable
         * The type of endpoint will be requested in hints/fi_getinfo. Different providers support different
         * types of endpoints. TCP supports only FI_EP_MSG but when used with RxM, can support FI_EP_RDM.
         * In this application, we requested TCP and FI_EP_MSG.
         * The server first opens a passive endpoint, which only listens for connection requests;
         * several clients may connect through it.
         */
        ret = fi_passive_ep(fabric, fi_pep, &pep, NULL);
        if (ret) {
            printf("fi_passive_ep: %d\n", ret);
            return ret;
        }

        /* Bind the EQ to the passive endpoint so connection requests are reported on it. */
        ret = fi_pep_bind(pep, &eq->fid, 0);
        if (ret) {
            printf("fi_pep_bind %d\n", ret);
            return ret;
        }

        /* Start listening for client connection requests. */
        ret = fi_listen(pep);
        if (ret) {
            printf("fi_listen %d\n", ret);
            return ret;
        }
        return 0;
    }

    static int complete_connection(void)
    {
        /* Block until a client's connection request (FI_CONNREQ) arrives on the EQ. */
        rd = fi_eq_sread(eq, &event, &entry, sizeof(entry), -1, 0);
        if (rd != sizeof(entry) || event != FI_CONNREQ) {
            printf("fi_eq_sread: %zd (event %u)\n", rd, event);
            return rd < 0 ? (int) rd : -FI_EOTHER;
        }

        /* The request carries the fi_info that describes the new connection.
         * Keep it: the domain and the active endpoint are created from it, and
         * cleanup() releases it with fi_freeinfo(). */
        info = entry.info;

        /* A domain groups resources; they can be managed per domain. */
        ret = fi_domain(fabric, info, &domain, NULL);
        if (ret) {
            printf("fi_domain: %d\n", ret);
            goto err;
        }

        ret = fi_domain_bind(domain, &eq->fid, 0);
        if (ret) {
            printf("fi_domain_bind: %d\n", ret);
            goto err;
        }

        /*
         * Initialize our completion queue. Completion queues are used to report events associated
         * with data transfers. In this example, we use one CQ that tracks sends and receives, but
         * often times there will be separate CQs for sends and receives.
         */
        cq_attr.size = 128;
        cq_attr.format = FI_CQ_FORMAT_MSG;
        ret = fi_cq_open(domain, &cq_attr, &cq, NULL);
        if (ret) {
            printf("fi_cq_open error (%d)\n", ret);
            goto err;
        }

        /* Active endpoint that carries the data for this accepted connection. */
        ret = fi_endpoint(domain, info, &ep, NULL);
        if (ret) {
            printf("fi_endpoint: %d\n", ret);
            goto err;
        }

        /* Bind our CQ to our endpoint to track any sends and receives that
         * come in or out on that endpoint. A CQ can be bound to multiple
         * endpoints but one EP can only have one send CQ and one receive CQ
         * (which can be the same CQ).
         */
        ret = fi_ep_bind(ep, &cq->fid, FI_SEND | FI_RECV);
        if (ret) {
            printf("fi_ep_bind cq error (%d)\n", ret);
            goto err;
        }

        ret = fi_ep_bind(ep, &eq->fid, 0);
        if (ret) {
            printf("fi_ep_bind: %d\n", ret);
            goto err;
        }

        ret = fi_enable(ep);
        if (ret) {
            printf("fi_enable: %d\n", ret);
            goto err;
        }

        ret = fi_accept(ep, NULL, 0);
        if (ret) {
            printf("fi_accept: %d\n", ret);
            goto err;
        }

        /* Wait for FI_CONNECTED, which confirms the connection is established. */
        rd = fi_eq_sread(eq, &event, &entry, sizeof(entry), -1, 0);
        if (rd != sizeof(entry) || event != FI_CONNECTED) {
            printf("fi_eq_sread: %zd (event %u)\n", rd, event);
            return rd < 0 ? (int) rd : -FI_EOTHER;
        }
        return 0;

    err:
        /* Setup failed before the connection was accepted: tell the client. */
        fi_reject(pep, info->handle, NULL, 0);
        return ret;
    }

    static void cleanup(void)
    {
        int ret;

        /* All OFI resources are cleaned up using the same fi_close(fid) call.
         * Each pointer is checked because setup may have failed part-way. */
        if (ep) {
            ret = fi_close(&ep->fid);
            if (ret)
                printf("warning: error closing EP (%d)\n", ret);
        }
        if (pep) {
            ret = fi_close(&pep->fid);
            if (ret)
                printf("warning: error closing PEP (%d)\n", ret);
        }
        if (cq) {
            ret = fi_close(&cq->fid);
            if (ret)
                printf("warning: error closing CQ (%d)\n", ret);
        }
        if (domain) {
            ret = fi_close(&domain->fid);
            if (ret)
                printf("warning: error closing domain (%d)\n", ret);
        }
        if (eq) {
            ret = fi_close(&eq->fid);
            if (ret)
                printf("warning: error closing EQ (%d)\n", ret);
        }
        if (fabric) {
            ret = fi_close(&fabric->fid);
            if (ret)
                printf("warning: error closing fabric (%d)\n", ret);
        }
        if (info)
            fi_freeinfo(info);
        if (fi_pep)
            fi_freeinfo(fi_pep);
        if (hints)
            fi_freeinfo(hints);
    }

    /* Post a receive buffer. This call does not ensure a message has been received, just
     * that a buffer has been passed to OFI for the next message the provider receives.
     * On a connected (FI_EP_MSG) endpoint the address argument is ignored, so we simply
     * pass FI_ADDR_UNSPEC; on connectionless endpoints it can direct the receive to
     * a specific peer.
     * When posting a buffer, if the provider is not ready to process messages (because
     * of connection initialization for example), it may return -FI_EAGAIN. This does
     * not indicate an error, but rather that the application should try again later.
     * This is why we almost always wrap sends and receives in a do/while. Some providers
     * may need the application to drive progress in order to get out of the -FI_EAGAIN
     * loop. To drive progress, the application needs to call fi_cq_read (not necessarily
     * reading any completion entries).
     */
    static int post_recv(void)
    {
        int ret;

        do {
            ret = fi_recv(ep, buffer, BUF_SIZE, NULL, fi_addr, NULL);
            if (ret && ret != -FI_EAGAIN) {
                printf("error posting recv buffer (%d)\n", ret);
                return ret;
            }
            if (ret == -FI_EAGAIN)
                (void) fi_cq_read(cq, NULL, 0);
        } while (ret);
        return 0;
    }

    /* Post a send buffer. This call does not ensure a message has been sent, just that
     * a buffer has been submitted to OFI to be sent. On a connected endpoint the
     * destination is the connected peer, so the address argument is ignored here too.
     * Similar to the receive buffer posting process, when posting a send buffer, if the
     * provider is not ready to process messages, it may return -FI_EAGAIN. This does not
     * indicate an error, but rather that the application should try again later. Just like
     * the receive, we drive progress with fi_cq_read if this is the case.
     */
    static int post_send(void)
    {
        const char *msg = "Hello, server! I am the client you've been waiting for!";
        int ret;

        (void) snprintf(buffer, BUF_SIZE, "%s", msg);
        do {
            ret = fi_send(ep, buffer, BUF_SIZE, NULL, fi_addr, NULL);
            if (ret && ret != -FI_EAGAIN) {
                printf("error posting send buffer (%d)\n", ret);
                return ret;
            }
            if (ret == -FI_EAGAIN)
                (void) fi_cq_read(cq, NULL, 0);
        } while (ret);
        return 0;
    }

    /* Wait for the message to be sent/received using the CQ. fi_cq_read not only drives progress
     * but also returns any completed events to notify the application that it can reuse
     * the send/recv buffer. The returned completion entry will have fields set to let the application
     * know what operation completed. Not all fields will be valid. The fields set will be indicated
     * by the cq format (when creating the CQ). In this example, we use FI_CQ_FORMAT_MSG in order to
     * use the flags field.
     */
    static int wait_cq(void)
    {
        struct fi_cq_msg_entry comp;   /* matches FI_CQ_FORMAT_MSG */
        int ret;

        do {
            ret = fi_cq_read(cq, &comp, 1);
            if (ret < 0 && ret != -FI_EAGAIN) {
                printf("error reading cq (%d)\n", ret);
                return ret;
            }
        } while (ret != 1);

        if (comp.flags & FI_RECV)
            printf("I received a message!\n");
        else if (comp.flags & FI_SEND)
            printf("My message got sent!\n");
        return 0;
    }

    static int run(void)
    {
        int ret;

        if (dst_addr) {
            printf("Client: send to server %s\n", dst_addr);
            ret = post_send();
            if (ret)
                return ret;
            ret = wait_cq();
            if (ret)
                return ret;
        } else {
            printf("Server: post buffer and wait for message from client\n");
            ret = post_recv();
            if (ret)
                return ret;
            ret = wait_cq();
            if (ret)
                return ret;
            printf("This is the message I received: %s\n", buffer);
        }
        return 0;
    }

    int main(int argc, char **argv)
    {
        int ret;

        (void) argc;

        /* Hints are used to request support for specific features from a provider */
        hints = fi_allocinfo();
        if (!hints)
            return EXIT_FAILURE;

        /* Server runs with no args, client has server's address as an argument */
        dst_addr = argv[1];

        // Set anything in hints that the application needs

        /* Request an FI_EP_MSG (connected, reliable) endpoint. Like a TCP socket, it
         * needs a listen/connect/accept handshake before messages can be exchanged. */
        hints->ep_attr->type = FI_EP_MSG;

        /* Request basic messaging capabilities from the provider (no tag matching,
         * no RMA, no atomic operations) */
        hints->caps = FI_MSG;

        /* Specifically request the tcp provider for the simple test. The name must be
         * heap-allocated because fi_freeinfo() frees it. On RDMA hardware "verbs" can be
         * requested instead; the layered "ofi_rxm;verbs" provider offers FI_EP_RDM
         * endpoints and so does not match the FI_EP_MSG request above. */
        hints->fabric_attr->prov_name = strdup("tcp");
        if (!hints->fabric_attr->prov_name) {
            ret = -FI_ENOMEM;
            goto out;
        }

        /* Specifically request SOCKADDR_IN address format to simplify addressing for test */
        hints->addr_format = FI_SOCKADDR_IN;

        /* Default to FI_DELIVERY_COMPLETE which will make sure completions do not get generated
         * until our message arrives at the destination. Otherwise, the client might get a completion
         * and exit before the server receives the message. This is to make the test simpler */
        hints->tx_attr->op_flags = FI_DELIVERY_COMPLETE;

        // Done setting hints

        if (!dst_addr) {
            ret = start_server();
            if (ret)
                goto out;
        }

        ret = dst_addr ? start_client() : complete_connection();
        if (ret)
            goto out;

        ret = run();
    out:
        cleanup();
        return ret ? EXIT_FAILURE : EXIT_SUCCESS;
    }
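
The "retry while the provider says try again, and drive progress in between" loop used by post_recv and post_send above can be isolated as a small pattern. The sketch below is a toy model in plain C, not libfabric code: every `toy_*` name is hypothetical, `TOY_EAGAIN` merely stands in for `FI_EAGAIN`, and the retry bound `max_tries` is an addition that the original unbounded do/while loop does not have.

```c
#include <stddef.h>

#define TOY_EAGAIN 11   /* stand-in for FI_EAGAIN; the value is arbitrary here */

/* post returns 0 on success, -TOY_EAGAIN when busy, or another negative error. */
typedef int (*toy_post_fn)(void *state);
/* progress gives the transport a chance to advance (the role fi_cq_read plays above). */
typedef void (*toy_progress_fn)(void *state);

/* Keep posting until the operation is accepted or fails for a real reason.
 * max_tries bounds the loop so a stuck transport cannot spin forever. */
int toy_post_with_retry(toy_post_fn post, toy_progress_fn progress,
                        void *state, size_t max_tries)
{
    size_t i;
    int ret = -TOY_EAGAIN;

    for (i = 0; i < max_tries; i++) {
        ret = post(state);
        if (ret != -TOY_EAGAIN)
            return ret;      /* success (0) or a hard error */
        progress(state);     /* busy: drive progress, then try again */
    }
    return ret;              /* still busy after max_tries attempts */
}

/* Toy transport: refuses posts until progress has been driven `busy` times. */
struct toy_transport {
    int busy;
    int posted;
};

int toy_post(void *state)
{
    struct toy_transport *t = state;

    if (t->busy > 0)
        return -TOY_EAGAIN;
    t->posted++;
    return 0;
}

void toy_progress(void *state)
{
    struct toy_transport *t = state;

    if (t->busy > 0)
        t->busy--;
}
```

With a transport that starts with `busy = 3`, the helper posts successfully on the fourth attempt; with too small a `max_tries` it gives up and hands `-TOY_EAGAIN` back to the caller, who can decide whether to wait or abort.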

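    Diagram: socket vs. libfabric message endpoints (both can establish a connection and send/receive messages)

    GPU data transfer examples

    On the left is a direct memory access (DMA) example using ibv verbs; on the right is the equivalent DMA example using libfabric's unified API semantics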

    # verbs
    Server: ./rdmabw-xe -m host
    Client: ./rdmabw-xe -m host -S 1 -t write 192.168.5.6   # both sides use host memory; performs a 1-byte remote memory write

    Host-to-host DMA example with libfabric

    # libfabric, host -> host DMA
    Server: ./fi-rdmabw-xe -m host
    Client: ./fi-rdmabw-xe -m host -S 1 -t write 192.168.5.6   # both sides use host memory; performs a 1-byte remote memory write
    Relative path of the verbs source: fabtests/component/dmabuf-rdma/rdmabw-xe.c

    Host-to-GPU-device DMA example with libfabric

    # libfabric, host -> GPU device
    Server: ./fi-rdmabw-xe -m device   # uses GPU device memory
    Client: ./fi-rdmabw-xe -m host -S 1 -t write 192.168.5.6   # uses host memory; performs a 1-byte remote memory write
    Relative path of the libfabric source: fabtests/component/dmabuf-rdma/fi-rdmabw-xe.c

  • Original article: https://blog.csdn.net/weixin_43778179/article/details/134524691