Exploring BPF LSM support on aarch64 with ftrace

 7 months ago
source link: https://www.cnxct.com/exploring-bpf-lsm-support-on-aarch64-with-ftrace/

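Translator's note: while developing eBPF programs in a Linux virtual machine on a MacBook M2, I found that some LSM eBPF programs would not run. While trying to pin down the differences, I came across this article, which gives a more direct view of how LSM eBPF differs between ARM64 and AMD64 kernels.
Original article: Exploring BPF LSM support on aarch64 with ftrace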

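This blog post summarizes our internal research into BPF LSM support on aarch64 in Linux. Reading kernel source is hard to start if you are not familiar with the code base, so we decided to publish this post and show our approach, since it may help anyone who wants to explore kernel internals.

On x86_64 we already use BPF LSM, while on aarch64 we rely on Kprobes, so we wanted to know which features are missing from the kernel to make BPF LSM usable on aarch64.

We have dug into the kernel source many times, but usually we search for something that already exists in order to understand how it works. In this case we were looking for something that does not exist: we were chasing code paths that return an error because they are not implemented.

Recalling Steven Rostedt's talk on how to start learning the Linux kernel, we began with ftrace (and the tools built on the tracing infrastructure) to understand what happens when we load an unsupported BPF program into the kernel.

This is the output of our software, pulsar, when we try to load a BPF LSM program on an aarch64 Linux 5.15 kernel: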

    root@pine64-1:/home/exein# ./pulsar-enterprise-exec pulsard
    [2023-02-16T14:52:45Z INFO  pulsar::pulsard::daemon] Starting module process-monitor
    [2023-02-16T14:52:45Z INFO  pulsar::pulsard::daemon] Starting module file-system-monitor
    [2023-02-16T14:52:46Z INFO  pulsar::pulsard::daemon] Starting module network-monitor
    [2023-02-16T14:52:46Z INFO  pulsar::pulsard::daemon] Starting module logger
    [2023-02-16T14:52:46Z INFO  pulsar::pulsard::daemon] Starting module rules-engine
    [2023-02-16T14:52:46Z INFO  pulsar::pulsard::daemon] Starting module desktop-notifier
    [2023-02-16T14:52:46Z ERROR pulsar::pulsard::module_manager] Module error in file-system-monitor: failed program attach lsm path_mknod

        Caused by:
            0: `bpf_raw_tracepoint_open` failed
            1: No error information (os error 524)
    [2023-02-16T14:52:46Z INFO  pulsar::pulsard::daemon] Starting module anomaly-detection
    [2023-02-16T14:52:46Z INFO  pulsar::pulsard::daemon] Starting module malware-detection
    [2023-02-16T14:52:46Z ERROR pulsar::pulsard::module_manager] Module error in malware-detection: /var/lib/pulsar/malware_detection/models/parameters.json not found
    [2023-02-16T14:52:46Z INFO  pulsar::pulsard::daemon] Starting module platform-connector
    [2023-02-16T14:52:46Z INFO  platform_connector::client] Connected to https://platform-dev-instance.exein.io:8001/
    [2023-02-16T14:52:46Z INFO  pulsar::pulsard::daemon] Starting module threat-response
    [2023-02-16T14:52:46Z ERROR pulsar::pulsard::module_manager] Module error in network-monitor: failed program attach lsm socket_bind

        Caused by:
            0: `bpf_raw_tracepoint_open` failed
            1: No error information (os error 524)

pulsar fails with error 524 (ENOTSUPP) when it tries to load the BPF program attached to the path_mknod LSM hook. Let's dig into the problem.
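
A note on the message itself: 524 is ENOTSUPP, an errno that the kernel defines in its internal include/linux/errno.h and that is not part of the userspace errno headers, which would explain why the C library has no text for it. A minimal sketch of how a loader could label the code explicitly follows; `KERNEL_ENOTSUPP` and `describe_bpf_errno` are hypothetical names introduced for illustration and are not part of pulsar or libbpf.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumption: value copied from the kernel-internal include/linux/errno.h.
 * It is not exported by the userspace <errno.h>, hence the local define. */
#define KERNEL_ENOTSUPP 524

/* Hypothetical helper: writes a readable description of `err` into `buf`
 * and returns `buf`. Accepts both the positive and the negative form. */
const char *describe_bpf_errno(int err, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);

    if (err < 0)
        err = -err;

    if (err == KERNEL_ENOTSUPP)
        snprintf(buf, len, "ENOTSUPP (%d): not supported by this kernel", err);
    else
        snprintf(buf, len, "%s (%d)", strerror(err), err);
    return buf;
}
```

Calling `describe_bpf_errno(524, buf, sizeof buf)` yields a string that starts with `ENOTSUPP (524)`, while ordinary codes fall through to `strerror`.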

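Note: at the time of this research we could not find a prebuilt aarch64 kernel with BPF and BTF enabled, so we had to compile a custom kernel. We also enabled the tracing options and the function_graph plugin in order to use the tools below.
All experiments were run on a Pine A64 with a custom Armbian image.
These images have a custom kernel with a standard Ubuntu 22.04 LTS Jammy userspace.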

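To investigate the problem we used the following tools:

  • bpftrace: a BPF-based tool that attaches probes dynamically using a custom C-like language.
  • trace-cmd: a wrapper around the tracefs filesystem for interacting with the ftrace infrastructure.

To use these tools you need some options enabled in the Linux kernel; see the official documentation for the full requirements.

Note: other tools can do the same job, for example funcgraph and kprobe from perf-tools.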

Linux 5.15

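Now let's use these tools to see what happens in kernel 5.15 when we try to load our BPF program.

From here to the end of the article we use the probe binary instead of pulsar, because it is simpler. As a brief summary of how it works, here is its command-line help: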

    exein@pine64-1:~$ ./probe 
    Test runner for eBPF programs

    Usage: probe [OPTIONS] <COMMAND>

    Commands:
      file-system-monitor  Watch file creations
      process-monitor      Watch process events (fork/exec/exit)
      network-monitor      Watch network events
      help                 Print this message or the help of the given subcommand(s)

    Options:
      -v, --verbose  
      -h, --help     Print help
      -V, --version  Print version

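In these examples we will try to load the file-system-monitor probe.

By running the following commands we can see the function call graph of __sys_bpf, the entry point of the BPF system call: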

    trace-cmd record -p function_graph -g __sys_bpf ./probe file-system-monitor
    trace-cmd report

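The output is a very large function graph, too big to paste here. Since we hit an error, we are interested in the last few functions before the program stopped. These are the last lines of the trace-cmd report output: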

    ...
     tokio-runtime-w-1666  [003]  1318.058019: funcgraph_entry:                   |        bpf_trampoline_link_prog() {
     tokio-runtime-w-1666  [003]  1318.058020: funcgraph_entry:        2.292 us   |          bpf_attach_type_to_tramp();
     tokio-runtime-w-1666  [003]  1318.058024: funcgraph_entry:        1.250 us   |          mutex_lock();
     tokio-runtime-w-1666  [003]  1318.058028: funcgraph_entry:                   |          bpf_trampoline_update() {
     tokio-runtime-w-1666  [003]  1318.058030: funcgraph_entry:                   |            kmem_cache_alloc_trace() {
     tokio-runtime-w-1666  [003]  1318.058031: funcgraph_entry:        1.167 us   |              should_failslab();
     tokio-runtime-w-1666  [003]  1318.058036: funcgraph_exit:         6.792 us   |            }
     tokio-runtime-w-1666  [003]  1318.058039: funcgraph_entry:                   |            kmem_cache_alloc_trace() {
     tokio-runtime-w-1666  [003]  1318.058042: funcgraph_entry:        2.750 us   |              should_failslab();
     tokio-runtime-w-1666  [003]  1318.058046: funcgraph_exit:         6.417 us   |            }
     tokio-runtime-w-1666  [003]  1318.058048: funcgraph_entry:        2.708 us   |            bpf_jit_charge_modmem();
     tokio-runtime-w-1666  [003]  1318.058053: funcgraph_entry:                   |            bpf_jit_alloc_exec_page() {
     tokio-runtime-w-1666  [003]  1318.058055: funcgraph_entry:                   |              bpf_jit_alloc_exec() {
     tokio-runtime-w-1666  [003]  1318.058057: funcgraph_entry:                   |                vmalloc() {
     tokio-runtime-w-1666  [003]  1318.058059: funcgraph_entry:                   |                  __vmalloc_node() {
     tokio-runtime-w-1666  [003]  1318.058061: funcgraph_entry:                   |                    __vmalloc_node_range() {
     tokio-runtime-w-1666  [003]  1318.058064: funcgraph_entry:                   |                      __get_vm_area_node.constprop.64() {
     tokio-runtime-w-1666  [003]  1318.058067: funcgraph_entry:                   |                        kmem_cache_alloc_node_trace() {
     tokio-runtime-w-1666  [003]  1318.058069: funcgraph_entry:        1.459 us   |                          should_failslab();
     tokio-runtime-w-1666  [003]  1318.058073: funcgraph_exit:         6.292 us   |                        }
     tokio-runtime-w-1666  [003]  1318.058075: funcgraph_entry:                   |                        alloc_vmap_area() {
     tokio-runtime-w-1666  [003]  1318.058077: funcgraph_entry:                   |                          kmem_cache_alloc_node() {
     tokio-runtime-w-1666  [003]  1318.058079: funcgraph_entry:        1.167 us   |                            should_failslab();
     tokio-runtime-w-1666  [003]  1318.058085: funcgraph_exit:         7.625 us   |                          }
     tokio-runtime-w-1666  [003]  1318.058088: funcgraph_entry:                   |                          kmem_cache_alloc_node() {
     tokio-runtime-w-1666  [003]  1318.058089: funcgraph_entry:        1.208 us   |                            should_failslab();
     tokio-runtime-w-1666  [003]  1318.058092: funcgraph_exit:         4.584 us   |                          }
     tokio-runtime-w-1666  [003]  1318.058104: funcgraph_entry:                   |                          kmem_cache_free() {
     tokio-runtime-w-1666  [003]  1318.058107: funcgraph_entry:        2.084 us   |                            __slab_free();
     tokio-runtime-w-1666  [003]  1318.058110: funcgraph_exit:         5.667 us   |                          }
     tokio-runtime-w-1666  [003]  1318.058112: funcgraph_entry:        6.375 us   |                          insert_vmap_area.constprop.74();
     tokio-runtime-w-1666  [003]  1318.058119: funcgraph_exit:       + 44.667 us  |                        }
     tokio-runtime-w-1666  [003]  1318.058122: funcgraph_exit:       + 58.250 us  |                      }
     tokio-runtime-w-1666  [003]  1318.058124: funcgraph_entry:                   |                      __kmalloc_node() {
     tokio-runtime-w-1666  [003]  1318.058125: funcgraph_entry:        1.625 us   |                        kmalloc_slab();
     tokio-runtime-w-1666  [003]  1318.058128: funcgraph_entry:        1.167 us   |                        should_failslab();
     tokio-runtime-w-1666  [003]  1318.058131: funcgraph_exit:         7.208 us   |                      }
     tokio-runtime-w-1666  [003]  1318.058133: funcgraph_entry:                   |                      alloc_pages() {
     tokio-runtime-w-1666  [003]  1318.058135: funcgraph_entry:        1.583 us   |                        get_task_policy.part.48();
     tokio-runtime-w-1666  [003]  1318.058138: funcgraph_entry:        1.500 us   |                        policy_node();
     tokio-runtime-w-1666  [003]  1318.058141: funcgraph_entry:        1.209 us   |                        policy_nodemask();
     tokio-runtime-w-1666  [003]  1318.058143: funcgraph_entry:                   |                        __alloc_pages() {
     tokio-runtime-w-1666  [003]  1318.058145: funcgraph_entry:        1.458 us   |                          should_fail_alloc_page();
     tokio-runtime-w-1666  [003]  1318.058147: funcgraph_entry:                   |                          get_page_from_freelist() {
     tokio-runtime-w-1666  [003]  1318.058150: funcgraph_entry:        1.583 us   |                            prep_new_page();
     tokio-runtime-w-1666  [003]  1318.058153: funcgraph_exit:         5.459 us   |                          }
     tokio-runtime-w-1666  [003]  1318.058154: funcgraph_exit:       + 10.542 us  |                        }
     tokio-runtime-w-1666  [003]  1318.058155: funcgraph_exit:       + 22.083 us  |                      }
     tokio-runtime-w-1666  [003]  1318.058157: funcgraph_entry:                   |                      __cond_resched() {
     tokio-runtime-w-1666  [003]  1318.058158: funcgraph_entry:        1.833 us   |                        rcu_all_qs();
     tokio-runtime-w-1666  [003]  1318.058161: funcgraph_exit:         4.167 us   |                      }
     tokio-runtime-w-1666  [003]  1318.058166: funcgraph_entry:        5.542 us   |                      vmap_pages_range_noflush();
     tokio-runtime-w-1666  [003]  1318.058173: funcgraph_exit:       ! 112.375 us |                    }
     tokio-runtime-w-1666  [003]  1318.058175: funcgraph_exit:       ! 116.000 us |                  }
     tokio-runtime-w-1666  [003]  1318.058176: funcgraph_exit:       ! 119.292 us |                }
     tokio-runtime-w-1666  [003]  1318.058177: funcgraph_exit:       ! 122.542 us |              }
     tokio-runtime-w-1666  [003]  1318.058179: funcgraph_entry:                   |              find_vm_area() {
     tokio-runtime-w-1666  [003]  1318.058180: funcgraph_entry:        1.375 us   |                find_vmap_area();
     tokio-runtime-w-1666  [003]  1318.058183: funcgraph_exit:         4.333 us   |              }
     tokio-runtime-w-1666  [003]  1318.058185: funcgraph_entry:                   |              set_memory_x() {
     tokio-runtime-w-1666  [003]  1318.058186: funcgraph_entry:                   |                change_memory_common() {
     tokio-runtime-w-1666  [003]  1318.058188: funcgraph_entry:                   |                  find_vm_area() {
     tokio-runtime-w-1666  [003]  1318.058189: funcgraph_entry:        1.333 us   |                    find_vmap_area();
     tokio-runtime-w-1666  [003]  1318.058192: funcgraph_exit:         3.875 us   |                  }
     tokio-runtime-w-1666  [003]  1318.058193: funcgraph_entry:                   |                  vm_unmap_aliases() {
     tokio-runtime-w-1666  [003]  1318.058194: funcgraph_entry:                   |                    _vm_unmap_aliases.part.58() {
     tokio-runtime-w-1666  [003]  1318.058196: funcgraph_entry:        1.542 us   |                      rcu_read_unlock_strict();
     tokio-runtime-w-1666  [003]  1318.058199: funcgraph_entry:        1.208 us   |                      rcu_read_unlock_strict();
     tokio-runtime-w-1666  [003]  1318.058202: funcgraph_entry:        1.166 us   |                      rcu_read_unlock_strict();
     tokio-runtime-w-1666  [003]  1318.058205: funcgraph_entry:        1.208 us   |                      rcu_read_unlock_strict();
     tokio-runtime-w-1666  [003]  1318.058207: funcgraph_entry:        1.208 us   |                      mutex_lock();
     tokio-runtime-w-1666  [003]  1318.058210: funcgraph_entry:                   |                      purge_fragmented_blocks_allcpus() {
     tokio-runtime-w-1666  [003]  1318.058212: funcgraph_entry:        1.500 us   |                        rcu_read_unlock_strict();
     tokio-runtime-w-1666  [003]  1318.058214: funcgraph_entry:        1.500 us   |                        rcu_read_unlock_strict();
     tokio-runtime-w-1666  [003]  1318.058217: funcgraph_entry:        1.500 us   |                        rcu_read_unlock_strict();
     tokio-runtime-w-1666  [003]  1318.058220: funcgraph_entry:        1.167 us   |                        rcu_read_unlock_strict();
     tokio-runtime-w-1666  [003]  1318.058222: funcgraph_exit:       + 11.917 us  |                      }
     tokio-runtime-w-1666  [003]  1318.058224: funcgraph_entry:                   |                      __purge_vmap_area_lazy() {
     tokio-runtime-w-1666  [003]  1318.058232: funcgraph_entry:                   |                        kmem_cache_free() {
     tokio-runtime-w-1666  [003]  1318.058234: funcgraph_entry:        1.250 us   |                          __slab_free();
     tokio-runtime-w-1666  [003]  1318.058237: funcgraph_exit:         4.791 us   |                        }
     tokio-runtime-w-1666  [003]  1318.058241: funcgraph_entry:        1.209 us   |                        __cond_resched_lock();
     tokio-runtime-w-1666  [003]  1318.058244: funcgraph_exit:       + 19.625 us  |                      }
     tokio-runtime-w-1666  [003]  1318.058245: funcgraph_entry:        1.167 us   |                      mutex_unlock();
     tokio-runtime-w-1666  [003]  1318.058247: funcgraph_exit:       + 53.042 us  |                    }
     tokio-runtime-w-1666  [003]  1318.058248: funcgraph_exit:       + 55.625 us  |                  }
     tokio-runtime-w-1666  [003]  1318.058250: funcgraph_entry:                   |                  __change_memory_common() {
     tokio-runtime-w-1666  [003]  1318.058251: funcgraph_entry:                   |                    apply_to_page_range() {
     tokio-runtime-w-1666  [003]  1318.058253: funcgraph_entry:                   |                      __apply_to_page_range() {
     tokio-runtime-w-1666  [003]  1318.058255: funcgraph_entry:        1.250 us   |                        pud_huge();
     tokio-runtime-w-1666  [003]  1318.058258: funcgraph_entry:        1.166 us   |                        pmd_huge();
     tokio-runtime-w-1666  [003]  1318.058260: funcgraph_entry:        1.208 us   |                        change_page_range();
     tokio-runtime-w-1666  [003]  1318.058263: funcgraph_exit:         9.834 us   |                      }
     tokio-runtime-w-1666  [003]  1318.058264: funcgraph_exit:       + 12.709 us  |                    }
     tokio-runtime-w-1666  [003]  1318.058266: funcgraph_exit:       + 15.459 us  |                  }
     tokio-runtime-w-1666  [003]  1318.058268: funcgraph_exit:       + 80.791 us  |                }
     tokio-runtime-w-1666  [003]  1318.058270: funcgraph_exit:       + 84.834 us  |              }
     tokio-runtime-w-1666  [003]  1318.058272: funcgraph_exit:       ! 218.500 us |            }
     tokio-runtime-w-1666  [003]  1318.058274: funcgraph_entry:                   |            __alloc_percpu_gfp() {
     tokio-runtime-w-1666  [003]  1318.058276: funcgraph_entry:                   |              pcpu_alloc() {
     tokio-runtime-w-1666  [003]  1318.058281: funcgraph_entry:        2.250 us   |                mutex_lock_killable();
     tokio-runtime-w-1666  [003]  1318.058290: funcgraph_entry:                   |                pcpu_find_block_fit() {
     tokio-runtime-w-1666  [003]  1318.058293: funcgraph_entry:        2.833 us   |                  pcpu_next_fit_region.constprop.38();
     tokio-runtime-w-1666  [003]  1318.058299: funcgraph_exit:         9.084 us   |                }
     tokio-runtime-w-1666  [003]  1318.058301: funcgraph_entry:                   |                pcpu_alloc_area() {
     tokio-runtime-w-1666  [003]  1318.058315: funcgraph_entry:        4.000 us   |                  pcpu_block_update_hint_alloc();
     tokio-runtime-w-1666  [003]  1318.058320: funcgraph_entry:        2.208 us   |                  pcpu_chunk_relocate();
     tokio-runtime-w-1666  [003]  1318.058324: funcgraph_exit:       + 22.625 us  |                }
     tokio-runtime-w-1666  [003]  1318.058327: funcgraph_entry:        1.208 us   |                mutex_unlock();
     tokio-runtime-w-1666  [003]  1318.058332: funcgraph_entry:        1.584 us   |                pcpu_memcg_post_alloc_hook();
     tokio-runtime-w-1666  [003]  1318.058335: funcgraph_exit:       + 58.833 us  |              }
     tokio-runtime-w-1666  [003]  1318.058336: funcgraph_exit:       + 61.834 us  |            }
     tokio-runtime-w-1666  [003]  1318.058338: funcgraph_entry:                   |            kmem_cache_alloc_trace() {
     tokio-runtime-w-1666  [003]  1318.058339: funcgraph_entry:        1.167 us   |              should_failslab();
     tokio-runtime-w-1666  [003]  1318.058342: funcgraph_exit:         4.458 us   |            }
     tokio-runtime-w-1666  [003]  1318.058359: funcgraph_entry:                   |            bpf_image_ksym_add() {
     tokio-runtime-w-1666  [003]  1318.058360: funcgraph_entry:                   |              bpf_ksym_add() {
     tokio-runtime-w-1666  [003]  1318.058363: funcgraph_entry:        1.583 us   |                __local_bh_enable_ip();
     tokio-runtime-w-1666  [003]  1318.058366: funcgraph_exit:         5.750 us   |              }
     tokio-runtime-w-1666  [003]  1318.058369: funcgraph_exit:         9.834 us   |            }
     tokio-runtime-w-1666  [003]  1318.058371: funcgraph_entry:        1.250 us   |            arch_prepare_bpf_trampoline();
     tokio-runtime-w-1666  [003]  1318.058373: funcgraph_entry:        2.292 us   |            kfree();
     tokio-runtime-w-1666  [003]  1318.058377: funcgraph_exit:       ! 348.625 us |          }
     tokio-runtime-w-1666  [003]  1318.058379: funcgraph_entry:        1.250 us   |          mutex_unlock();
     tokio-runtime-w-1666  [003]  1318.058382: funcgraph_exit:       ! 363.167 us |        }
     tokio-runtime-w-1666  [003]  1318.058384: funcgraph_entry:                   |        bpf_link_cleanup() {
     tokio-runtime-w-1666  [003]  1318.058386: funcgraph_entry:                   |          bpf_link_free_id.part.30() {
     tokio-runtime-w-1666  [003]  1318.058392: funcgraph_entry:                   |            call_rcu() {
     tokio-runtime-w-1666  [003]  1318.058396: funcgraph_entry:        1.834 us   |              rcu_segcblist_enqueue();
     tokio-runtime-w-1666  [003]  1318.058401: funcgraph_exit:         9.333 us   |            }
     tokio-runtime-w-1666  [003]  1318.058403: funcgraph_entry:        1.542 us   |            __local_bh_enable_ip();
     tokio-runtime-w-1666  [003]  1318.058406: funcgraph_exit:       + 19.542 us  |          }
     tokio-runtime-w-1666  [003]  1318.058408: funcgraph_entry:                   |          fput() {
     tokio-runtime-w-1666  [003]  1318.058409: funcgraph_entry:                   |            fput_many() {
     tokio-runtime-w-1666  [003]  1318.058411: funcgraph_entry:                   |              task_work_add() {
     tokio-runtime-w-1666  [003]  1318.058414: funcgraph_entry:        1.625 us   |                kick_process();
     tokio-runtime-w-1666  [003]  1318.058418: funcgraph_exit:         6.750 us   |              }
     tokio-runtime-w-1666  [003]  1318.058419: funcgraph_exit:       + 10.333 us  |            }
     tokio-runtime-w-1666  [003]  1318.058420: funcgraph_exit:       + 12.708 us  |          }
     tokio-runtime-w-1666  [003]  1318.058422: funcgraph_entry:        2.250 us   |          put_unused_fd();
     tokio-runtime-w-1666  [003]  1318.058426: funcgraph_exit:       + 41.416 us  |        }
     tokio-runtime-w-1666  [003]  1318.058428: funcgraph_entry:        1.292 us   |        mutex_unlock();
     tokio-runtime-w-1666  [003]  1318.058430: funcgraph_entry:        1.250 us   |        kfree();
     tokio-runtime-w-1666  [003]  1318.058433: funcgraph_exit:       ! 567.458 us |      }
     tokio-runtime-w-1666  [003]  1318.058435: funcgraph_entry:        2.125 us   |      __bpf_prog_put.isra.47();
     tokio-runtime-w-1666  [003]  1318.058438: funcgraph_exit:       ! 602.291 us |    }
     tokio-runtime-w-1666  [003]  1318.058439: funcgraph_exit:       ! 631.791 us |  }

This is the source code in kernel/bpf/trampoline.c for bpf_trampoline_update, the last function executed:

    static int bpf_trampoline_update(struct bpf_trampoline *tr)
    {
        struct bpf_tramp_image *im;
        struct bpf_tramp_progs *tprogs;
        u32 flags = BPF_TRAMP_F_RESTORE_REGS;
        bool ip_arg = false;
        int err, total;

        tprogs = bpf_trampoline_get_progs(tr, &total, &ip_arg);
        if (IS_ERR(tprogs))
            return PTR_ERR(tprogs);

        if (total == 0) {
            err = unregister_fentry(tr, tr->cur_image->image);
            bpf_tramp_image_put(tr->cur_image);
            tr->cur_image = NULL;
            tr->selector = 0;
            goto out;
        }

        im = bpf_tramp_image_alloc(tr->key, tr->selector);
        if (IS_ERR(im)) {
            err = PTR_ERR(im);
            goto out;
        }

        if (tprogs[BPF_TRAMP_FEXIT].nr_progs ||
            tprogs[BPF_TRAMP_MODIFY_RETURN].nr_progs)
            flags = BPF_TRAMP_F_CALL_ORIG | BPF_TRAMP_F_SKIP_FRAME;

        if (ip_arg)
            flags |= BPF_TRAMP_F_IP_ARG;

        err = arch_prepare_bpf_trampoline(im, im->image, im->image + PAGE_SIZE,
                          &tr->func.model, flags, tprogs,
                          tr->func.addr);
        if (err < 0)
            goto out;

        WARN_ON(tr->cur_image && tr->selector == 0);
        WARN_ON(!tr->cur_image && tr->selector);
        if (tr->cur_image)
            /* progs already running at this address */
            err = modify_fentry(tr, tr->cur_image->image, im->image);
        else
            /* first time registering */
            err = register_fentry(tr, im->image);
        if (err)
            goto out;
        if (tr->cur_image)
            bpf_tramp_image_put(tr->cur_image);
        tr->cur_image = im;
        tr->selector++;
    out:
        kfree(tprogs);
        return err;
    }

From the earlier output we can see:

     tokio-runtime-w-1666  [003]  1318.058371: funcgraph_entry:        1.250 us   |            arch_prepare_bpf_trampoline();
     tokio-runtime-w-1666  [003]  1318.058373: funcgraph_entry:        2.292 us   |            kfree();

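There are no other function calls between arch_prepare_bpf_trampoline and kfree, so the first function most likely returned an error code in the err variable, sending execution straight to the out label. Let's verify!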
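
The same reading can be done mechanically on a saved report. The sketch below parses a single line of `trace-cmd report` function_graph output and returns the nesting depth of an entry line, assuming the two-space indentation per level seen in the trace above; `parse_funcgraph_entry` is a hypothetical helper written for this post, not part of trace-cmd.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: on a funcgraph_entry line, copies the function name
 * into `name` and returns the nesting depth (1 for the traced root, one more
 * for every two spaces after the '|'). Returns -1 for any other line, for
 * example a funcgraph_exit line. */
int parse_funcgraph_entry(const char *line, char *name, size_t len)
{
    const char *bar, *p, *paren;
    size_t n;
    int spaces = 0;

    assert(line != NULL && name != NULL);

    if (len == 0 || strstr(line, "funcgraph_entry:") == NULL)
        return -1;
    bar = strchr(line, '|');
    if (bar == NULL)
        return -1;
    for (p = bar + 1; *p == ' '; p++)
        spaces++;
    paren = strchr(p, '(');
    if (paren == NULL)
        return -1;

    n = (size_t)(paren - p);
    if (n >= len)
        n = len - 1;          /* truncate names that do not fit */
    memcpy(name, p, n);
    name[n] = '\0';
    return spaces / 2;
}
```

Fed the two lines quoted above, it should report the same depth for arch_prepare_bpf_trampoline and kfree, which is consistent with both being called directly from bpf_trampoline_update.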

By starting bpftrace in a shell as follows, we can capture the return value of arch_prepare_bpf_trampoline and print it to the console:

    bpftrace -e 'kretprobe:arch_prepare_bpf_trampoline { printf("retval link: %d\n", retval); }'

After starting probe in another terminal, we get the following output from bpftrace:

    root@pine64-1:/home/exein# bpftrace -e 'kretprobe:arch_prepare_bpf_trampoline { printf("retval link: %d\n", retval); }'
    Attaching 1 probe...
    retval link: -524

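This is because kernel 5.15 lacks an arch_prepare_bpf_trampoline implementation for the aarch64 architecture and falls back to the default placeholder implementation: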

    int __weak
    arch_prepare_bpf_trampoline(struct bpf_tramp_image *tr, void *image, void *image_end,
                    const struct btf_func_model *m, u32 flags,
                    struct bpf_tramp_links *tlinks,
                    void *orig_call)
    {
        return -ENOTSUPP;
    }

So this feature is not supported on this kernel version. The good news is that, thanks to this patch, it has been implemented in the 6.x kernels.
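
For readers unfamiliar with `__weak`: the kernel macro marks a definition as a weak symbol, so the linker uses it only when no architecture provides a regular definition of the same function. The userspace sketch below imitates that fallback with the GCC/Clang `weak` attribute; `demo_prepare_trampoline`, `demo_trampoline_update` and `KERNEL_ENOTSUPP` are hypothetical names, and the real kernel function takes the arguments shown above.

```c
/* Assumption: kernel-internal errno value, not exported to userspace. */
#define KERNEL_ENOTSUPP 524

/* Weak default, analogous to the kernel's __weak placeholder. A non-weak
 * definition of the same symbol in another object file would replace it
 * at link time; this sketch deliberately provides none. */
__attribute__((weak)) int demo_prepare_trampoline(void)
{
    return -KERNEL_ENOTSUPP;
}

/* Hypothetical caller mirroring the `if (err < 0) goto out;` check
 * in bpf_trampoline_update. */
int demo_trampoline_update(void)
{
    int err = demo_prepare_trampoline();

    if (err < 0)
        return err;   /* placeholder path: the error is propagated */
    return 0;         /* an overriding implementation succeeded */
}
```

With only the weak definition linked in, `demo_trampoline_update()` returns -524, the same value bpftrace printed on the 5.15 kernel.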

Let's move on to the 6.x kernel.

Linux 6.1

If we try to run probe on kernel 6.1, we get the following output:

    root@pine64:/home/exein# ./probe file-system-monitor
    thread 'main' panicked at 'initialization failed: ProgramAttachError { program: "lsm path_mknod", program_error: SyscallError { call: "bpf_raw_tracepoint_open", io_error: Os { code: 524, kind: Uncategorized, message: "No error information" } } }', src/bin/probe.rs:72:43
    note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

On kernel 6.1 we still hit the same error as on kernel 5.15! Let's find out why.

This time, running bpftrace on arch_prepare_bpf_trampoline gives the following output:

    root@pine64:/home/exein# bpftrace -e 'kretprobe:arch_prepare_bpf_trampoline { printf("retval tp link: %d\n", retval); }'
    Attaching 1 probe...
    retval tp link: 284

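So the problem is not here: the return value is positive, which means the function no longer returns an error. Let's go back to the function call graph.

This time we start trace-cmd and skip some functions to get cleaner output: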

    trace-cmd record \
        -p function_graph \
        -g bpf_trampoline_link_prog \
        -n bpf_jit_alloc_exec \
        -n kmalloc_trace \
        -n arch_prepare_bpf_trampoline \
        -n generic_handle_domain_irq \
        -n do_interrupt_handler \
        -n irq_exit_rcu \
        ./probe file-system-monitor

We get the following output from trace-cmd report:

    root@pine64:/home/exein# trace-cmd report 
    CPU 0 is empty
    CPU 1 is empty
    CPU 3 is empty
    cpus=4
     tokio-runtime-w-11886 [002] 193385.056283: funcgraph_entry:                   |  bpf_trampoline_link_prog() {
     tokio-runtime-w-11886 [002] 193385.056321: funcgraph_entry:      + 15.042 us  |    mutex_lock();
     tokio-runtime-w-11886 [002] 193385.056373: funcgraph_entry:                   |    __bpf_trampoline_link_prog() {
     tokio-runtime-w-11886 [002] 193385.056395: funcgraph_entry:      + 14.833 us  |      bpf_attach_type_to_tramp();
     tokio-runtime-w-11886 [002] 193385.056428: funcgraph_entry:                   |      bpf_trampoline_update.isra.23() {
     tokio-runtime-w-11886 [002] 193385.056459: funcgraph_entry:        2.917 us   |        bpf_jit_charge_modmem();
     tokio-runtime-w-11886 [002] 193385.056531: funcgraph_entry:                   |        find_vm_area() {
     tokio-runtime-w-11886 [002] 193385.056540: funcgraph_entry:        3.000 us   |          find_vmap_area();
     tokio-runtime-w-11886 [002] 193385.056547: funcgraph_exit:       + 16.208 us  |        }
     tokio-runtime-w-11886 [002] 193385.056554: funcgraph_entry:                   |        __alloc_percpu_gfp() {
     tokio-runtime-w-11886 [002] 193385.056563: funcgraph_entry:                   |          pcpu_alloc() {
     tokio-runtime-w-11886 [002] 193385.056568: funcgraph_entry:        4.875 us   |            mutex_lock_killable();
     tokio-runtime-w-11886 [002] 193385.056591: funcgraph_entry:                   |            pcpu_find_block_fit() {
     tokio-runtime-w-11886 [002] 193385.056599: funcgraph_entry:        8.625 us   |              pcpu_next_fit_region.constprop.38();
     tokio-runtime-w-11886 [002] 193385.056608: funcgraph_exit:       + 17.166 us  |            }
     tokio-runtime-w-11886 [002] 193385.056610: funcgraph_entry:                   |            pcpu_alloc_area() {
     tokio-runtime-w-11886 [002] 193385.056639: funcgraph_entry:        9.167 us   |              pcpu_block_update();
     tokio-runtime-w-11886 [002] 193385.056656: funcgraph_entry:        7.667 us   |              pcpu_block_update_hint_alloc();
     tokio-runtime-w-11886 [002] 193385.056671: funcgraph_entry:        7.750 us   |              pcpu_chunk_relocate();
     tokio-runtime-w-11886 [002] 193385.056679: funcgraph_exit:       + 69.667 us  |            }
     tokio-runtime-w-11886 [002] 193385.056682: funcgraph_entry:        7.042 us   |            mutex_unlock();
     tokio-runtime-w-11886 [002] 193385.056703: funcgraph_entry:        2.792 us   |            pcpu_memcg_post_alloc_hook();
     tokio-runtime-w-11886 [002] 193385.056712: funcgraph_exit:       ! 148.709 us |          }
     tokio-runtime-w-11886 [002] 193385.056719: funcgraph_exit:       ! 165.250 us |        }
     tokio-runtime-w-11886 [002] 193385.056866: funcgraph_entry:                   |        bpf_image_ksym_add() {
     tokio-runtime-w-11886 [002] 193385.056873: funcgraph_entry:                   |          bpf_ksym_add() {
     tokio-runtime-w-11886 [002] 193385.056882: funcgraph_entry:        2.750 us   |            __local_bh_disable_ip();
     tokio-runtime-w-11886 [002] 193385.056897: funcgraph_entry:        4.625 us   |            __local_bh_enable_ip();
     tokio-runtime-w-11886 [002] 193385.056905: funcgraph_exit:       + 32.459 us  |          }
     tokio-runtime-w-11886 [002] 193385.056922: funcgraph_entry:        7.584 us   |          perf_event_ksymbol();
     tokio-runtime-w-11886 [002] 193385.056944: funcgraph_exit:       + 78.417 us  |        }
     tokio-runtime-w-11886 [002] 193385.057492: funcgraph_entry:                   |        set_memory_ro() {
     tokio-runtime-w-11886 [002] 193385.057501: funcgraph_entry:                   |          change_memory_common() {
     tokio-runtime-w-11886 [002] 193385.057504: funcgraph_entry:                   |            find_vm_area() {
     tokio-runtime-w-11886 [002] 193385.057506: funcgraph_entry:        8.875 us   |              find_vmap_area();
     tokio-runtime-w-11886 [002] 193385.057518: funcgraph_exit:       + 14.250 us  |            }
     tokio-runtime-w-11886 [002] 193385.057522: funcgraph_entry:                   |            __change_memory_common() {
     tokio-runtime-w-11886 [002] 193385.057531: funcgraph_entry:                   |              apply_to_page_range() {
     tokio-runtime-w-11886 [002] 193385.057538: funcgraph_entry:                   |                __apply_to_page_range() {
     tokio-runtime-w-11886 [002] 193385.057544: funcgraph_entry:      + 12.791 us  |                  pud_huge();
     tokio-runtime-w-11886 [002] 193385.057559: funcgraph_entry:        2.708 us   |                  pmd_huge();
     tokio-runtime-w-11886 [002] 193385.057574: funcgraph_entry:      + 15.125 us  |                  change_page_range();
     tokio-runtime-w-11886 [002] 193385.057591: funcgraph_exit:       + 53.792 us  |                }
     tokio-runtime-w-11886 [002] 193385.057597: funcgraph_exit:       + 66.083 us  |              }
     tokio-runtime-w-11886 [002] 193385.057610: funcgraph_exit:       + 88.125 us  |            }
     tokio-runtime-w-11886 [002] 193385.057619: funcgraph_entry:                   |            vm_unmap_aliases() {
     tokio-runtime-w-11886 [002] 193385.057622: funcgraph_entry:                   |              _vm_unmap_aliases.part.77() {
     tokio-runtime-w-11886 [002] 193385.057625: funcgraph_entry:        9.125 us   |                mutex_lock();
     tokio-runtime-w-11886 [002] 193385.057637: funcgraph_entry:        3.084 us   |                purge_fragmented_blocks_allcpus();
     tokio-runtime-w-11886 [002] 193385.057643: funcgraph_entry:                   |                __purge_vmap_area_lazy() {
     tokio-runtime-w-11886 [002] 193385.057687: funcgraph_entry:                   |                  kmem_cache_free() {
     tokio-runtime-w-11886 [002] 193385.057693: funcgraph_entry:      + 13.250 us  |                    __slab_free();
     tokio-runtime-w-11886 [002] 193385.057705: funcgraph_exit:       + 18.750 us  |                  }
     tokio-runtime-w-11886 [002] 193385.057718: funcgraph_entry:        7.416 us   |                  __cond_resched_lock();
     tokio-runtime-w-11886 [002] 193385.057733: funcgraph_exit:       + 90.042 us  |                }
     tokio-runtime-w-11886 [002] 193385.057741: funcgraph_entry:        2.792 us   |                mutex_unlock();
     tokio-runtime-w-11886 [002] 193385.057747: funcgraph_exit:       ! 124.666 us |              }
     tokio-runtime-w-11886 [002] 193385.057749: funcgraph_exit:       ! 130.291 us |            }
     tokio-runtime-w-11886 [002] 193385.057756: funcgraph_entry:                   |            __change_memory_common() {
     tokio-runtime-w-11886 [002] 193385.057759: funcgraph_entry:                   |              apply_to_page_range() {
     tokio-runtime-w-11886 [002] 193385.057765: funcgraph_entry:                   |                __apply_to_page_range() {
     tokio-runtime-w-11886 [002] 193385.057768: funcgraph_entry:        4.125 us   |                  pud_huge();
     tokio-runtime-w-11886 [002] 193385.057778: funcgraph_entry:        8.750 us   |                  pmd_huge();
     tokio-runtime-w-11886 [002] 193385.057790: funcgraph_entry:        4.625 us   |                  change_page_range();
     tokio-runtime-w-11886 [002] 193385.057797: funcgraph_exit:       + 31.958 us  |                }
     tokio-runtime-w-11886 [002] 193385.057803: funcgraph_exit:       + 44.375 us  |              }
     tokio-runtime-w-11886 [002] 193385.057817: funcgraph_exit:       + 61.208 us  |            }
     tokio-runtime-w-11886 [002] 193385.057820: funcgraph_exit:       ! 319.292 us |          }
     tokio-runtime-w-11886 [002] 193385.057826: funcgraph_exit:       ! 333.667 us |        }
     tokio-runtime-w-11886 [002] 193385.057840: funcgraph_entry:                   |        set_memory_x() {
     tokio-runtime-w-11886 [002] 193385.057847: funcgraph_entry:                   |          change_memory_common() {
     tokio-runtime-w-11886 [002] 193385.057855: funcgraph_entry:                   |            find_vm_area() {
     tokio-runtime-w-11886 [002] 193385.057858: funcgraph_entry:        2.917 us   |              find_vmap_area();
     tokio-runtime-w-11886 [002] 193385.057870: funcgraph_exit:       + 14.375 us  |            }
     tokio-runtime-w-11886 [002] 193385.057876: funcgraph_entry:                   |            vm_unmap_aliases() {
     tokio-runtime-w-11886 [002] 193385.057879: funcgraph_entry:                   |              _vm_unmap_aliases.part.77() {
     tokio-runtime-w-11886 [002] 193385.057882: funcgraph_entry:        3.959 us   |                mutex_lock();
     tokio-runtime-w-11886 [002] 193385.057893: funcgraph_entry:        3.000 us   |                purge_fragmented_blocks_allcpus();
     tokio-runtime-w-11886 [002] 193385.057900: funcgraph_entry:        2.791 us   |                __purge_vmap_area_lazy();
     tokio-runtime-w-11886 [002] 193385.057907: funcgraph_entry:        2.709 us   |                mutex_unlock();
     tokio-runtime-w-11886 [002] 193385.057913: funcgraph_exit:       + 33.708 us  |              }
     tokio-runtime-w-11886 [002] 193385.057915: funcgraph_exit:       + 43.000 us  |            }
     tokio-runtime-w-11886 [002] 193385.057922: funcgraph_entry:                   |            __change_memory_common() {
     tokio-runtime-w-11886 [002] 193385.057925: funcgraph_entry:                   |              apply_to_page_range() {
     tokio-runtime-w-11886 [002] 193385.057930: funcgraph_entry:                   |                __apply_to_page_range() {
     tokio-runtime-w-11886 [002] 193385.057933: funcgraph_entry:        4.292 us   |                  pud_huge();
     tokio-runtime-w-11886 [002] 193385.057945: funcgraph_entry:        8.750 us   |                  pmd_huge();
     tokio-runtime-w-11886 [002] 193385.057956: funcgraph_entry:        3.958 us   |                  change_page_range();
     tokio-runtime-w-11886 [002] 193385.058037: funcgraph_exit:       + 32.083 us  |                }
     tokio-runtime-w-11886 [002] 193385.058089: funcgraph_entry:        7.667 us   |                irq_enter_rcu();
     tokio-runtime-w-11886 [002] 193385.058233: funcgraph_exit:       ! 308.041 us |              }
     tokio-runtime-w-11886 [002] 193385.058239: funcgraph_exit:       ! 316.709 us |            }
     tokio-runtime-w-11886 [002] 193385.058247: funcgraph_exit:       ! 400.417 us |          }
     tokio-runtime-w-11886 [002] 193385.058255: funcgraph_exit:       ! 415.000 us |        }
     tokio-runtime-w-11886 [002] 193385.058555: funcgraph_entry:        8.250 us   |        irq_enter_rcu();
     tokio-runtime-w-11886 [002] 193385.058958: funcgraph_entry:                   |        kallsyms_lookup_size_offset() {
     tokio-runtime-w-11886 [002] 193385.058974: funcgraph_entry:      + 36.333 us  |          get_symbol_pos();
     tokio-runtime-w-11886 [002] 193385.059017: funcgraph_exit:       + 59.750 us  |        }
     tokio-runtime-w-11886 [002] 193385.059043: funcgraph_entry:                   |        kfree() {
     tokio-runtime-w-11886 [002] 193385.059057: funcgraph_entry:        3.000 us   |          __kmem_cache_free();
     tokio-runtime-w-11886 [002] 193385.059065: funcgraph_exit:       + 22.833 us  |        }
     tokio-runtime-w-11886 [002] 193385.059073: funcgraph_exit:       # 2644.708 us |      }
     tokio-runtime-w-11886 [002] 193385.059079: funcgraph_exit:       # 2706.292 us |    }
     tokio-runtime-w-11886 [002] 193385.059095: funcgraph_entry:        2.792 us   |    mutex_unlock();
     tokio-runtime-w-11886 [002] 193385.059101: funcgraph_exit:       # 2870.416 us |  }

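This time the program gets past `arch_prepare_bpf_trampoline`, `set_memory_ro`, and `set_memory_x`; the last function we see in the trace is `kallsyms_lookup_size_offset`.

Yet, as the `bpf_trampoline_update` function in `kernel/bpf/trampoline.c` shows, `kallsyms_lookup_size_offset` is never called explicitly there: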

    static int bpf_trampoline_update(struct bpf_trampoline *tr, bool lock_direct_mutex)
    {

    // ... OTHER CODE ...

    #ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
    again:
        if ((tr->flags & BPF_TRAMP_F_SHARE_IPMODIFY) &&
            (tr->flags & BPF_TRAMP_F_CALL_ORIG))
            tr->flags |= BPF_TRAMP_F_ORIG_STACK;
    #endif

        err = arch_prepare_bpf_trampoline(im, im->image, im->image + PAGE_SIZE,
                          &tr->func.model, tr->flags, tlinks,
                          tr->func.addr);
        if (err < 0)
            goto out;

        set_memory_ro((long)im->image, 1);
        set_memory_x((long)im->image, 1);

        WARN_ON(tr->cur_image && tr->selector == 0);
        WARN_ON(!tr->cur_image && tr->selector);
        if (tr->cur_image)
            /* progs already running at this address */
            err = modify_fentry(tr, tr->cur_image->image, im->image, lock_direct_mutex);
        else
            /* first time registering */
            err = register_fentry(tr, im->image);

    #ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
        if (err == -EAGAIN) {
            /* -EAGAIN from bpf_tramp_ftrace_ops_func. Now
             * BPF_TRAMP_F_SHARE_IPMODIFY is set, we can generate the
             * trampoline again, and retry register.
             */
            /* reset fops->func and fops->trampoline for re-register */
            tr->fops->func = NULL;
            tr->fops->trampoline = 0;

            /* reset im->image memory attr for arch_prepare_bpf_trampoline */
            set_memory_nx((long)im->image, 1);
            set_memory_rw((long)im->image, 1);
            goto again;
        }
    #endif
        if (err)
            goto out;

        if (tr->cur_image)
            bpf_tramp_image_put(tr->cur_image);
        tr->cur_image = im;
        tr->selector++;
    out:
        /* If any error happens, restore previous flags */
        if (err)
            tr->flags = orig_flags;
        kfree(tlinks);
        return err;
    }

> **Note:** the implementation of `bpf_trampoline_update` shown here differs slightly from the one in the 5.15 kernel used earlier.

The call to `kallsyms_lookup_size_offset` is hidden inside another function, which does not appear in the function graph because the compiler inlined it.

It looks like `kallsyms_lookup_size_offset` is called by `ftrace_location`:

    unsigned long ftrace_location(unsigned long ip)
    {
        struct dyn_ftrace *rec;
        unsigned long offset;
        unsigned long size;

        rec = lookup_rec(ip, ip);
        if (!rec) {
            if (!kallsyms_lookup_size_offset(ip, &size, &offset))
                goto out;

            /* map sym+0 to __fentry__ */
            if (!offset)
                rec = lookup_rec(ip, ip + size - 1);
        }

        if (rec)
            return rec->ip;

    out:
        return 0;
    }
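
The two-step lookup in `ftrace_location` can be easier to follow on a toy model. The sketch below is a user-space approximation, not kernel code: the `*_model` names, the symbol table, and the patch-site addresses are all made up for illustration. It only mirrors the control flow: try an exact match first, and if the address is the first byte of a symbol, widen the search to the whole function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a kallsyms entry: [start, start + size). */
struct sym_model {
    unsigned long start;
    unsigned long size;
};

/* Made-up patch sites, 8 bytes into the first two functions. */
static const unsigned long patch_sites[] = { 0x1008, 0x2008 };

/* Made-up symbols; the third one has no patch site at all. */
static const struct sym_model symbols[] = {
    { 0x1000, 0x40 }, { 0x2000, 0x80 }, { 0x3000, 0x20 },
};

/* Model of lookup_rec(): first patch site inside [start, end], else 0. */
static unsigned long lookup_rec_model(unsigned long start, unsigned long end)
{
    for (size_t i = 0; i < sizeof(patch_sites) / sizeof(patch_sites[0]); i++)
        if (patch_sites[i] >= start && patch_sites[i] <= end)
            return patch_sites[i];
    return 0;
}

/* Model of kallsyms_lookup_size_offset(): 1 if ip lies inside a symbol. */
static int size_offset_model(unsigned long ip, unsigned long *size,
                             unsigned long *offset)
{
    assert(size != NULL && offset != NULL);
    for (size_t i = 0; i < sizeof(symbols) / sizeof(symbols[0]); i++) {
        if (ip >= symbols[i].start &&
            ip < symbols[i].start + symbols[i].size) {
            *size = symbols[i].size;
            *offset = ip - symbols[i].start;
            return 1;
        }
    }
    return 0;
}

/* Same control flow as ftrace_location() above, on the toy tables. */
unsigned long ftrace_location_model(unsigned long ip)
{
    unsigned long size, offset;
    unsigned long rec = lookup_rec_model(ip, ip);

    if (!rec) {
        if (!size_offset_model(ip, &size, &offset))
            return 0;
        /* Only a symbol's first byte is widened to the whole function. */
        if (!offset)
            rec = lookup_rec_model(ip, ip + size - 1);
    }
    return rec;
}
```

In this model, passing the function's start address `0x1000` yields the patch site `0x1008`, while an address in the middle of the function that is not a patch site yields 0. That is the behaviour the `/* map sym+0 to __fentry__ */` comment describes.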

`ftrace_location` is called by `register_fentry`, which, right after that call, checks the `fops` field of `struct bpf_trampoline *tr`.

    /* first time registering */
    static int register_fentry(struct bpf_trampoline *tr, void *new_addr)
    {
        void *ip = tr->func.addr;
        unsigned long faddr;
        int ret;

        faddr = ftrace_location((unsigned long)ip);
        if (faddr) {
            if (!tr->fops)
                return -ENOTSUPP;
            tr->func.ftrace_managed = true;
        }

        if (bpf_trampoline_module_get(tr))
            return -ENOENT;

        if (tr->func.ftrace_managed) {
            ftrace_set_filter_ip(tr->fops, (unsigned long)ip, 0, 1);
            ret = register_ftrace_direct_multi(tr->fops, (long)new_addr);
        } else {
            ret = bpf_arch_text_poke(ip, BPF_MOD_CALL, NULL, new_addr);
        }

        if (ret)
            bpf_trampoline_module_put(tr);
        return ret;
    }

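Indeed, when `ftrace_location` returns a non-zero address and `tr->fops` is NULL, the function returns `-ENOTSUPP`. Its kernel-internal value is 524, which matches the `os error 524` reported when the program failed to attach.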
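
The number 524 is worth a closer look. `ENOTSUPP` is defined in the kernel's `include/linux/errno.h` and is not part of the errno values exported to user space, so the C library has no message for it. That is presumably why the log at the start of the article shows "No error information" for the error. The helper below is a minimal sketch of how a user-space tool could special-case it; `describe_errno` and `KERNEL_ENOTSUPP` are hypothetical names, not part of any library.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Kernel-internal value of ENOTSUPP; user-space headers do not define it. */
#define KERNEL_ENOTSUPP 524

/*
 * Write a readable description of err into buf and return buf.
 * Accepts both the positive errno and the negative kernel-style value.
 */
const char *describe_errno(int err, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);

    if (err < 0)
        err = -err;

    if (err == KERNEL_ENOTSUPP)
        snprintf(buf, len, "ENOTSUPP (%d): operation not supported by this kernel", err);
    else
        snprintf(buf, len, "%s (%d)", strerror(err), err);

    return buf;
}
```

A loader could call `describe_errno(errno, buf, sizeof(buf))` after a failed `bpf()` syscall and print a clearer hint than the C library's generic message for an unknown code.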

Let's find out where `tr->fops` is initialized.

If we are right, the trampoline should be created inside the `bpf_trampoline_lookup` function.

    static struct bpf_trampoline *bpf_trampoline_lookup(u64 key)
    {
        struct bpf_trampoline *tr;
        struct hlist_head *head;
        int i;

        mutex_lock(&trampoline_mutex);
        head = &trampoline_table[hash_64(key, TRAMPOLINE_HASH_BITS)];
        hlist_for_each_entry(tr, head, hlist) {
            if (tr->key == key) {
                refcount_inc(&tr->refcnt);
                goto out;
            }
        }
        tr = kzalloc(sizeof(*tr), GFP_KERNEL);
        if (!tr)
            goto out;
    #ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
        tr->fops = kzalloc(sizeof(struct ftrace_ops), GFP_KERNEL);
        if (!tr->fops) {
            kfree(tr);
            tr = NULL;
            goto out;
        }
        tr->fops->private = tr;
        tr->fops->ops_func = bpf_tramp_ftrace_ops_func;
    #endif

        tr->key = key;
        INIT_HLIST_NODE(&tr->hlist);
        hlist_add_head(&tr->hlist, head);
        refcount_set(&tr->refcnt, 1);
        mutex_init(&tr->mutex);
        for (i = 0; i < BPF_TRAMP_MAX; i++)
            INIT_HLIST_HEAD(&tr->progs_hlist[i]);
    out:
        mutex_unlock(&trampoline_mutex);
        return tr;
    }

After allocation, the trampoline's `fops` field is populated only when `CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS` is set. That option depends on `HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS`, which is not present on aarch64.
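
The interaction between the compile-time option and the runtime check can be reproduced in a few lines of user-space C. This is only a sketch under stated assumptions: `MODEL_DIRECT_CALLS` stands in for `CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS`, and the `*_model` types and functions are hypothetical simplifications of `bpf_trampoline_lookup` and `register_fentry`.

```c
#include <assert.h>
#include <stdlib.h>

#define KERNEL_ENOTSUPP 524 /* kernel-internal errno value */

struct fops_model {
    void *private_data;
};

struct trampoline_model {
    struct fops_model *fops; /* stays NULL without MODEL_DIRECT_CALLS */
    int ftrace_managed;
};

/* Allocate a trampoline; fops is filled in only if the feature is built. */
struct trampoline_model *trampoline_lookup_model(void)
{
    struct trampoline_model *tr = calloc(1, sizeof(*tr));

    if (!tr)
        return NULL;
#ifdef MODEL_DIRECT_CALLS
    tr->fops = calloc(1, sizeof(*tr->fops));
    if (!tr->fops) {
        free(tr);
        return NULL;
    }
    tr->fops->private_data = tr;
#endif
    return tr;
}

/* faddr != 0 means the target is an ftrace-managed location. */
int register_fentry_model(struct trampoline_model *tr, unsigned long faddr)
{
    assert(tr != NULL);

    if (faddr) {
        if (!tr->fops)
            return -KERNEL_ENOTSUPP;
        tr->ftrace_managed = 1;
    }
    return 0;
}

void trampoline_free_model(struct trampoline_model *tr)
{
    if (tr) {
        free(tr->fops);
        free(tr);
    }
}
```

Compiled as-is, `register_fentry_model` fails with -524 for any non-zero `faddr`, which is the aarch64 situation described above. Compiled with `-DMODEL_DIRECT_CALLS`, the same call succeeds.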

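As things stand, BPF LSM cannot be used on aarch64 because the *ftrace direct calls* feature is missing. Fortunately, a [patch](https://lore.kernel.org/bpf/[email protected]/T/) that will enable LSMs (among other features) on aarch64 has already been merged into the mainline branch.

These changes are expected to ship in the upcoming Linux 6.4 release.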

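CFC4N's blog, created by CFC4N, is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported license. Based on a work at https://www.cnxct.com. When reposting, please credit the source: 探索aarch64架构上使用ftrace的BPF LSM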

