• Apollo Application and Source Code Analysis: Monitor - Software Monitoring - Localization, Camera, Functional Safety, and Data Recorder Monitoring


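    Contents

    Localization Monitoring

    Code

    Analysis

    Notes

    Camera Monitoring

    Code

    Analysis

    Functional Safety Monitoring

    Code

    Analysis

    CheckSafety Function Analysis

    RunOnce Function Analysis

    Recorder Monitoring

    Code

    Analysis

    SmartRecorderStatus proto

    Where the Status Is Reported

    Monitoring Summary Service

    Code

    Analysis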


    Localization Monitoring

    Code

    class LocalizationMonitor : public RecurrentRunner {
     public:
      LocalizationMonitor();
      void RunOnce(const double current_time) override;
    };

    void LocalizationMonitor::RunOnce(const double current_time) {
      auto manager = MonitorManager::Instance();
      auto* component = apollo::common::util::FindOrNull(
          *manager->GetStatus()->mutable_components(),
          FLAGS_localization_component_name);
      if (component == nullptr) {
        // localization is not monitored in current mode, skip.
        return;
      }
      static auto reader =
          manager->CreateReader<LocalizationStatus>(FLAGS_localization_msf_status);
      reader->Observe();
      const auto status = reader->GetLatestObserved();
      ComponentStatus* component_status = component->mutable_other_status();
      component_status->clear_status();
      if (status == nullptr) {
        SummaryMonitor::EscalateStatus(ComponentStatus::ERROR,
                                       "No LocalizationStatus received",
                                       component_status);
        return;
      }
      // Translate LocalizationStatus to ComponentStatus. Note that ERROR and FATAL
      // will trigger safety mode in current settings.
      switch (status->fusion_status()) {
        case MeasureState::OK:
          SummaryMonitor::EscalateStatus(ComponentStatus::OK, "", component_status);
          break;
        case MeasureState::WARNNING:
          SummaryMonitor::EscalateStatus(
              ComponentStatus::WARN,
              absl::StrCat("WARNNING: ", status->state_message()),
              component_status);
          break;
        case MeasureState::ERROR:
          SummaryMonitor::EscalateStatus(
              ComponentStatus::WARN,
              absl::StrCat("ERROR: ", status->state_message()), component_status);
          break;
        case MeasureState::CRITICAL_ERROR:
          SummaryMonitor::EscalateStatus(
              ComponentStatus::ERROR,
              absl::StrCat("CRITICAL_ERROR: ", status->state_message()),
              component_status);
          break;
        case MeasureState::FATAL_ERROR:
          SummaryMonitor::EscalateStatus(
              ComponentStatus::FATAL,
              absl::StrCat("FATAL_ERROR: ", status->state_message()),
              component_status);
          break;
        default:
          AFATAL << "Unknown fusion_status: " << status->fusion_status();
          break;
      }
    }

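    Analysis

    1. Check whether the localization module is a monitored component in the current mode; if it is not, return immediately.
    2. Subscribe to the topic named by FLAGS_localization_msf_status; if no message has been received, report an ERROR-level fault.
    3. Read fusion_status directly from the message and report the corresponding status: OK maps to OK, WARNNING and ERROR both map to WARN, CRITICAL_ERROR maps to ERROR, and FATAL_ERROR maps to FATAL.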
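The translation in step 3 can be sketched as a small standalone function. This is only an illustration: the two enums below are hypothetical stand-ins for the protobuf-generated types, and `TranslateFusionStatus` is a name invented here, not an Apollo function.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-ins for the protobuf enums used in the listing above.
enum class MeasureState { OK, WARNNING, ERROR, CRITICAL_ERROR, FATAL_ERROR };
enum class Status { UNKNOWN, OK, WARN, ERROR, FATAL };

// Returns the component status and the message prefix for a fusion status,
// following the switch statement in LocalizationMonitor::RunOnce.
std::pair<Status, std::string> TranslateFusionStatus(MeasureState state) {
  switch (state) {
    case MeasureState::OK:
      return {Status::OK, ""};
    case MeasureState::WARNNING:
      return {Status::WARN, "WARNNING: "};
    case MeasureState::ERROR:
      // A localization ERROR is only reported as WARN by the monitor.
      return {Status::WARN, "ERROR: "};
    case MeasureState::CRITICAL_ERROR:
      return {Status::ERROR, "CRITICAL_ERROR: "};
    case MeasureState::FATAL_ERROR:
      return {Status::FATAL, "FATAL_ERROR: "};
  }
  // Corresponds to the AFATAL branch in the original code.
  assert(false && "unknown MeasureState");
  return {Status::UNKNOWN, ""};
}
```

Since the code comment says that ERROR and FATAL trigger safety mode, this mapping means only CRITICAL_ERROR and FATAL_ERROR from localization can do so; a plain localization ERROR stays at WARN.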

    Notes

    ## Check MSF Localization Status

    We provide a simple way to check lidar localization, GNSS localization and fusion localization status. There are four states {NOT_VALID, NOT_STABLE, OK, VALID} for localization status. You can simply use `rostopic echo /apollo/localization/msf_status` to check localization status. If fusion_status is VALID or OK, the output of msf localization is reliable.

    The passage above is how Apollo describes checking the MSF localization status. The faults listed earlier are all set and published by the localization module itself; the monitor only translates them.

    Below is the status-filling part of modules/localization/rtk/rtk_localization.cc:

    void RTKLocalization::FillLocalizationStatusMsg(
        const drivers::gnss::InsStat &status,
        LocalizationStatus *localization_status) {
      apollo::common::Header *header = localization_status->mutable_header();
      double timestamp = apollo::cyber::Clock::NowInSeconds();
      header->set_timestamp_sec(timestamp);
      localization_status->set_measurement_time(status.header().timestamp_sec());
      if (!status.has_pos_type()) {
        localization_status->set_fusion_status(MeasureState::ERROR);
        localization_status->set_state_message(
            "Error: Current Localization Status Is Missing.");
        return;
      }
      // ... (the excerpt ends here; the rest of the function is not shown)
    }

    Camera Monitoring

    Code

    class CameraMonitor : public RecurrentRunner {
     public:
      CameraMonitor();
      void RunOnce(const double current_time) override;

     private:
      static void UpdateStatus(ComponentStatus* status);
    };

    void CameraMonitor::RunOnce(const double current_time) {
      auto* manager = MonitorManager::Instance();
      auto* component = apollo::common::util::FindOrNull(
          *manager->GetStatus()->mutable_components(), FLAGS_camera_component_name);
      if (component == nullptr) {
        // camera is not monitored in current mode, skip.
        return;
      }
      auto* status = component->mutable_other_status();
      UpdateStatus(status);
    }

    Analysis

    Apart from checking whether the camera is configured as a monitored component, the core logic lives in UpdateStatus:

    void CameraMonitor::UpdateStatus(ComponentStatus* status) {
      status->clear_status();
      std::string frame_id = "";
      for (const auto& topic : camera_topic_set) {
        const auto& reader_message_pair = CreateReaderAndLatestsMessage(topic);
        const auto& reader = reader_message_pair.first;
        const auto& message = reader_message_pair.second;
        if (reader != nullptr && message != nullptr) {
          if (frame_id.empty()) {
            const auto& header = message->header();
            if (header.has_frame_id()) {
              frame_id = header.frame_id();
            }
          } else {
            SummaryMonitor::EscalateStatus(
                ComponentStatus::ERROR,
                absl::StrCat("Only one camera is permitted"), status);
          }
        }
      }
      if (frame_id.empty()) {
        SummaryMonitor::EscalateStatus(
            ComponentStatus::ERROR, absl::StrCat("No camera is detected"), status);
      } else {
        SummaryMonitor::EscalateStatus(
            ComponentStatus::OK, absl::StrCat("Detected one camera: ", frame_id),
            status);
      }
    }

    static const auto camera_topic_set = std::set<std::string>{
        FLAGS_image_long_topic,         FLAGS_camera_image_long_topic,
        FLAGS_camera_image_short_topic, FLAGS_camera_front_6mm_topic,
        FLAGS_camera_front_6mm_2_topic, FLAGS_camera_front_12mm_topic,
        // Add more cameras here if you want to monitor.
    };
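    1. For each topic in camera_topic_set, get the latest message.
    2. Read the message header and take its frame id. If a frame id has already been recorded and another topic also delivers a message, report ERROR with the message "Only one camera is permitted".
    3. If frame_id is still empty after all topics have been checked, report ERROR with the message "No camera is detected"; otherwise report OK with "Detected one camera: " followed by the frame id.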
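The decision logic in the steps above can be sketched without any Cyber RT dependency. Everything in this sketch is hypothetical: `TopicSample` models "the latest message on one topic", `CheckCameras` is an invented name, and the final result assumes that EscalateStatus never lowers a status that is already ERROR, as its declaration comment suggests.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class Status { UNKNOWN, OK, WARN, ERROR, FATAL };

// Hypothetical model of the latest message on one camera topic.
struct TopicSample {
  bool has_message;      // false if nothing was received on the topic
  std::string frame_id;  // frame id from the message header, may be empty
};

struct CameraCheck {
  Status status;
  std::string message;
};

// Mirrors the structure of CameraMonitor::UpdateStatus: exactly one
// publishing camera is OK, zero or several is an ERROR.
CameraCheck CheckCameras(const std::vector<TopicSample>& samples) {
  std::string frame_id;
  bool multiple = false;
  for (const auto& sample : samples) {
    if (!sample.has_message) continue;  // this topic is silent
    if (frame_id.empty()) {
      frame_id = sample.frame_id;       // first camera seen
    } else {
      multiple = true;                  // a second topic is also publishing
    }
  }
  if (frame_id.empty()) return {Status::ERROR, "No camera is detected"};
  if (multiple) return {Status::ERROR, "Only one camera is permitted"};
  return {Status::OK, "Detected one camera: " + frame_id};
}
```

For example, two silent topics give "No camera is detected", one publishing topic gives OK, and two publishing topics give "Only one camera is permitted".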

    Functional Safety Monitoring

    Code

    // Check if we need to switch to safe mode, and then
    // 1. Notify driver to take action.
    // 2. Trigger Guardian if no proper action was taken.
    class FunctionalSafetyMonitor : public RecurrentRunner {
     public:
      FunctionalSafetyMonitor();
      void RunOnce(const double current_time);

     private:
      bool CheckSafety();
    };

    void FunctionalSafetyMonitor::RunOnce(const double current_time) {
      auto* system_status = MonitorManager::Instance()->GetStatus();
      // Everything looks good or has been handled properly.
      if (CheckSafety()) {
        system_status->clear_passenger_msg();
        system_status->clear_safety_mode_trigger_time();
        system_status->clear_require_emergency_stop();
        return;
      }
      if (system_status->require_emergency_stop()) {
        // EStop has already been triggered.
        return;
      }
      // Newly entered safety mode.
      system_status->set_passenger_msg("Error! Please disengage.");
      if (!system_status->has_safety_mode_trigger_time()) {
        system_status->set_safety_mode_trigger_time(current_time);
        return;
      }
      // Trigger EStop if no action was taken in time.
      if (system_status->safety_mode_trigger_time() +
              FLAGS_safety_mode_seconds_before_estop <
          current_time) {
        system_status->set_require_emergency_stop(true);
      }
    }

    Analysis

    CheckSafety Function Analysis

    bool FunctionalSafetyMonitor::CheckSafety() {
      // We only check safety in self driving mode.
      auto manager = MonitorManager::Instance();
      if (!manager->IsInAutonomousMode()) {
        return true;
      }
      // Check HMI modules status.
      const auto& mode = manager->GetHMIMode();
      const auto& hmi_modules = manager->GetStatus()->hmi_modules();
      for (const auto& iter : mode.modules()) {
        const std::string& module_name = iter.first;
        const auto& module = iter.second;
        if (module.required_for_safety() &&
            !IsSafe(module_name, hmi_modules.at(module_name))) {
          return false;
        }
      }
      // Check monitored components status.
      const auto& components = manager->GetStatus()->components();
      for (const auto& iter : mode.monitored_components()) {
        const std::string& component_name = iter.first;
        const auto& component = iter.second;
        if (component.required_for_safety() &&
            !IsSafe(component_name, components.at(component_name).summary())) {
          return false;
        }
      }
      // Everything looks good.
      return true;
    }

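    1. Check whether the vehicle is in autonomous driving mode; if it is not, return true (safe) immediately.
    2. For every HMI module and monitored component marked required_for_safety, check whether its status is ERROR or FATAL.
    3. If any of them is, the system is unsafe (return false); otherwise it is safe (return true).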
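The check described above can be sketched as follows. IsSafe is not shown in the listing, so the version here is an assumption based on step 2 (ERROR or FATAL means unsafe); the `Monitored` struct and the simplified `CheckSafety` signature are hypothetical stand-ins for the HMI mode configuration and the system status proto.

```cpp
#include <cassert>
#include <map>
#include <string>

enum class Status { UNKNOWN, OK, WARN, ERROR, FATAL };

// Hypothetical merge of a module/component's config flag and current status.
struct Monitored {
  bool required_for_safety;
  Status summary;
};

// Assumed behaviour of IsSafe: only ERROR and FATAL count as unsafe.
bool IsSafe(Status status) {
  return status != Status::ERROR && status != Status::FATAL;
}

// Returns true when no safety-relevant item is in ERROR or FATAL, or when
// the vehicle is not driving autonomously at all.
bool CheckSafety(bool in_autonomous_mode,
                 const std::map<std::string, Monitored>& items) {
  if (!in_autonomous_mode) return true;
  for (const auto& entry : items) {
    if (entry.second.required_for_safety && !IsSafe(entry.second.summary)) {
      return false;
    }
  }
  return true;
}
```

Under this sketch, an item that is not marked required_for_safety never affects the result, whatever its status.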

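    RunOnce Function Analysis

    1. Check the current safety state with CheckSafety.
    2. If it is safe, clear the previously recorded state (passenger message, safety-mode trigger time, emergency-stop request) and return.
    3. If it is not safe, check whether the safety response, EStop, has already been requested; if so, return.
    4. If EStop has not been requested yet, set the passenger message and look at the safety-mode trigger time: on the first unsafe cycle, record the current time and return; on later cycles, keep waiting until more than FLAGS_safety_mode_seconds_before_estop seconds have passed since the trigger time, then set require_emergency_stop.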
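The four steps form a small state machine, which can be sketched in isolation. `SafetyState` and `Step` are hypothetical names; the struct stands in for the three SystemStatus fields that RunOnce touches, and `seconds_before_estop` plays the role of FLAGS_safety_mode_seconds_before_estop (its real default is not assumed here).

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the SystemStatus fields used by RunOnce.
struct SafetyState {
  std::string passenger_msg;
  bool has_trigger_time = false;
  double trigger_time = 0.0;
  bool require_estop = false;
};

// One monitoring cycle. `safe` is the result of the safety check.
void Step(bool safe, double now, double seconds_before_estop,
          SafetyState* state) {
  assert(state != nullptr);
  if (safe) {
    *state = SafetyState{};  // clear message, trigger time and EStop request
    return;
  }
  if (state->require_estop) return;  // EStop has already been requested
  state->passenger_msg = "Error! Please disengage.";
  if (!state->has_trigger_time) {
    // First unsafe cycle: start the grace period.
    state->has_trigger_time = true;
    state->trigger_time = now;
    return;
  }
  // Grace period expired without recovery: request an emergency stop.
  if (state->trigger_time + seconds_before_estop < now) {
    state->require_estop = true;
  }
}
```

With a grace period of 10 seconds and a first unsafe cycle at t = 100, cycles at t = 105 only keep the warning message, while a cycle at t = 111 sets the emergency-stop request. Any safe cycle resets everything.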

    Recorder Monitoring

    The recorder monitor is Apollo's monitor for the recording service. It subscribes to the /apollo/data/recorder/status topic to obtain the recorder status.

    Code

    class RecorderMonitor : public RecurrentRunner {
     public:
      RecorderMonitor();
      void RunOnce(const double current_time) override;
    };

    void RecorderMonitor::RunOnce(const double current_time) {
      auto manager = MonitorManager::Instance();
      auto* component = apollo::common::util::FindOrNull(
          *manager->GetStatus()->mutable_components(),
          FLAGS_smart_recorder_component_name);
      if (component == nullptr) {
        // SmartRecorder is not monitored in current mode, skip.
        return;
      }
      static auto reader =
          manager->CreateReader<SmartRecorderStatus>(FLAGS_recorder_status_topic);
      reader->Observe();
      const auto status = reader->GetLatestObserved();
      ComponentStatus* component_status = component->mutable_other_status();
      component_status->clear_status();
      if (status == nullptr) {
        SummaryMonitor::EscalateStatus(ComponentStatus::ERROR,
                                       "No SmartRecorderStatus received",
                                       component_status);
        return;
      }
      // Translate SmartRecorderStatus to ComponentStatus. Note that ERROR and FATAL
      // will trigger safety mode in current settings.
      switch (status->recording_state()) {
        case RecordingState::RECORDING:
          SummaryMonitor::EscalateStatus(ComponentStatus::OK, "", component_status);
          break;
        case RecordingState::TERMINATING:
          SummaryMonitor::EscalateStatus(
              ComponentStatus::WARN,
              absl::StrCat("WARNNING: ", status->state_message()),
              component_status);
          break;
        case RecordingState::STOPPED:
          SummaryMonitor::EscalateStatus(
              ComponentStatus::OK,
              absl::StrCat("STOPPED: ", status->state_message()), component_status);
          break;
        default:
          AFATAL << "Unknown recording status: " << status->recording_state();
          break;
      }
    }

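    Analysis

    The first step is again to check whether the recorder is configured as a monitored component; if it is not, return immediately. If no status message has been received, an ERROR is reported.

    After that, the code simply inspects status->recording_state(): RECORDING and STOPPED are reported as OK, while RecordingState::TERMINATING is reported as a WARN-level fault.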

    SmartRecorderStatus proto

    enum RecordingState {
      STOPPED = 0;
      RECORDING = 1;
      TERMINATING = 2;
    }

    message SmartRecorderStatus {
      optional apollo.common.Header header = 1;
      optional RecordingState recording_state = 2;
      optional string state_message = 3;
    }

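    Where the Status Is Reported

    modules/data/tools/smart_recorder/realtime_record_processor.cc

    The assignments to the recorder state can be found in the file above. Unfortunately, no module in Apollo currently sets the RecordingState::TERMINATING state on its own.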

    Monitoring Summary Service

    Code

    // A monitor which summarize other monitors' result and publish the whole status
    // if it has changed.
    class SummaryMonitor : public RecurrentRunner {
     public:
      SummaryMonitor();
      void RunOnce(const double current_time) override;
      // Escalate the status to a higher priority new status:
      // FATAL > ERROR > WARN > OK > UNKNOWN.
      static void EscalateStatus(const ComponentStatus::Status new_status,
                                 const std::string& message,
                                 ComponentStatus* current_status);

     private:
      size_t system_status_fp_ = 0;
      double last_broadcast_ = 0;
    };

    void SummaryMonitor::RunOnce(const double current_time) {
      auto manager = MonitorManager::Instance();
      auto* status = manager->GetStatus();
      // Escalate the summary status to the most severe one.
      for (auto& component : *status->mutable_components()) {
        auto* summary = component.second.mutable_summary();
        const auto& process_status = component.second.process_status();
        EscalateStatus(process_status.status(), process_status.message(), summary);
        const auto& module_status = component.second.module_status();
        EscalateStatus(module_status.status(), module_status.message(), summary);
        const auto& channel_status = component.second.channel_status();
        EscalateStatus(channel_status.status(), channel_status.message(), summary);
        const auto& resource_status = component.second.resource_status();
        EscalateStatus(resource_status.status(), resource_status.message(),
                       summary);
        const auto& other_status = component.second.other_status();
        EscalateStatus(other_status.status(), other_status.message(), summary);
      }
      // Get fingerprint of current status.
      // Don't use DebugString() which has known bug on Map field. The string
      // doesn't change though the value has changed.
      static std::hash<std::string> hash_fn;
      std::string proto_bytes;
      status->SerializeToString(&proto_bytes);
      const size_t new_fp = hash_fn(proto_bytes);
      if (system_status_fp_ != new_fp ||
          current_time - last_broadcast_ > FLAGS_system_status_publish_interval) {
        static auto writer =
            manager->CreateWriter<SystemStatus>(FLAGS_system_status_topic);
        apollo::common::util::FillHeader("SystemMonitor", status);
        writer->Write(*status);
        status->clear_header();
        system_status_fp_ = new_fp;
        last_broadcast_ = current_time;
      }
    }

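    Analysis

    The summary monitor consolidates the fault information reported by all the preceding monitors: for each component it escalates the summary to the most severe of the process, module, channel, resource, and other statuses. It then publishes the result on the /apollo/monitor/system_status topic, either when the serialized status has changed or when FLAGS_system_status_publish_interval has elapsed since the last broadcast.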
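The two mechanisms used here, escalation and change detection by fingerprint, can be sketched in isolation. The body of EscalateStatus is not shown in the listing, so the version below is an assumption derived from its declaration comment (FATAL > ERROR > WARN > OK > UNKNOWN). `ComponentStatus` is a hypothetical plain struct, and `ShouldPublish` is an invented helper that stands in for the condition inside RunOnce.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Ordered so that a larger value means a more severe status.
enum class Status { UNKNOWN = 0, OK = 1, WARN = 2, ERROR = 3, FATAL = 4 };

// Hypothetical stand-in for the ComponentStatus proto.
struct ComponentStatus {
  Status status = Status::UNKNOWN;
  std::string message;
};

// Assumed behaviour: overwrite only when the new status is more severe.
void EscalateStatus(Status new_status, const std::string& message,
                    ComponentStatus* current) {
  assert(current != nullptr);
  if (new_status > current->status) {
    current->status = new_status;
    current->message = message;
  }
}

// Publish when the serialized status changed or the interval has elapsed.
bool ShouldPublish(const std::string& serialized, double now, double interval,
                   std::size_t* last_fp, double* last_broadcast) {
  const std::size_t fp = std::hash<std::string>{}(serialized);
  if (fp != *last_fp || now - *last_broadcast > interval) {
    *last_fp = fp;
    *last_broadcast = now;
    return true;
  }
  return false;
}
```

Because escalation never lowers a status, the order in which the individual monitors write into a component does not matter: the summary always ends up at the most severe value seen in that cycle.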

  • Original article: https://blog.csdn.net/qq_32378713/article/details/128112717