Product Direction - Monitor


This is the product direction for Monitor. If you'd like to discuss this direction directly with the product managers for Monitor, feel free to reach out to Dov Hershkovitch (PM of the APM Group) (GitLab, Email), Sarah Waldner (PM of the Health Group) (GitLab, Email, Zoom call), or Kevin Chu (Group PM of Monitor) (GitLab, Email, Zoom call).

Overview

The Monitor stage comes after you've configured your production infrastructure and deployed your application to it.

  1. The Monitor stage is part of the verification and release process - immediate performance validation helps to ensure your service(s) maintain the expected service-level objectives (SLOs) for your users.
  2. The Monitor stage is an observability platform. Observability is the ability to infer internal states of a system based on the system’s external outputs. Whether there are known ways to understand the total health of your systems, or your complex microservices system is full of unknowns, we want you to be able to export your system's telemetry to GitLab and use it to debug and diagnose any potential problem.
  3. The Monitor stage helps you respond when things go wrong. It enables the aggregation of errors and alerts to identify problems and to find improvements. The Monitor stage also enables responders to streamline incident response, so production issues are less frequent and severe.
  4. The Monitor stage also provides user feedback. Understanding how users experience your product and how they actually use it is critical to making the right improvements.

Mission

The mission of the GitLab Monitor stage is to provide feedback that decreases the frequency and severity of incidents and improves operational and product performance.

The categories within the Monitor stage fit together to support the mission in the following way:

stateDiagram
  Development --> Monitor: Code Deploy
  state Monitor {
    s1 --> s2: Daily Operations
    s2 --> s3: Incident
    s3 --> s4: Resolution
    s2 --> s4
    s1: Verification
    s1: Metrics
    s1: DEM (Synthetics)
    s1: DEM (Web Performance Monitoring)
    s2: Observability
    s2: Metrics
    s2: Traces
    s2: Logs
    s2: Errors
    s3: Response
    s3: Incident Management
    s3: Observability
    s4: Feedback
    s4: DEM (Real User Monitoring)
    s4: Product Analytics
  }
  Monitor --> Development: Continuous Improvement

Landscape

The Monitor stage directly competes in several markets, including Application Performance Monitoring (APM), Log Management, Infrastructure Monitoring, IT Service Management (ITSM), Digital Experience Management (DEM), and Product Analytics. The total addressable market for the Monitor stage was already more than $1.5 billion in 2018 and is expected to grow as businesses continue to shift to digital.

All of these markets are well-established and crowded. However, they are also being disrupted by the underlying technologies used. The shift to cloud, containers, and microservices architectures changed users' expectations, and many existing vendors have struggled to keep pace. Successful vendors, such as market leader Datadog, have leveraged a platform strategy to expand their markets (such as the acquisition of Undefined Labs to expand beyond production applications to provide code insights during development, or their expansion to incident management in 2020), and even to expand into other stages within DevOps. Competition among market leaders today is also geared toward making the whole stack observable. New Relic's updated business model reflects the need for vendors to capture the increasing footprint (and spend) of enterprises while enabling future growth by making a significant part of their business free.

The changes in the market have also revealed opportunities that new entrants into this stage, like GitLab, can take advantage of. Specifically, the Ops section opportunities worth re-emphasizing are:

Vision

The vision of the Monitor stage is to enable DevOps teams to operate their applications by enabling verification, observability, incident response, and feedback, all within GitLab. This vision is part of the overall GitLab vision and enables teams to complete the DevOps loop.

GitLab is uniquely qualified to deliver on this bold and ambitious vision because:

  1. GitLab is a complete DevOps tool that is connected across the DevOps stages. Being one tool makes the circular DevOps workflow, and feedback, seamless and achievable.
  2. The Monitor stage is pursuing a differentiated strategy from other observability vendors by not pursuing a usage-based business model that charges for processing and storage of observability data. Instead, we lean on powerful open source software, such as Prometheus and OpenTelemetry, along with commodity cloud services to enable customers to set up and operate Monitor stage observability solutions effectively. We will be successful because we are well-practiced in integrating different parts of the tool chain together.
  3. Going cloud-native is a disruption to operations as usual. Cloud-native systems are constantly changing, are ephemeral, and are complex. As more and more companies adopt cloud-native, GitLab can create a well-integrated central control-pane that enables broad adoption by building on top of the tools that cloud-native teams are already familiar with and are using.

A trade-off in our approach is that we are explicitly not striving to be a fully turn-key experience that can be used to monitor all applications, particularly legacy applications. Removing an existing monitoring solution wholesale is painful, so a land-and-expand strategy is prudent here. As a customer recently explained, "Every greenfield application that we can deploy with your monitoring tools saves us money on New Relic licenses."

As this stage matures, we will begin to shift our attention and compete more directly with incumbent players as a holistic Monitoring solution for modern applications.

3 Year Strategy

Dovetailing on our 2-year vision statement, our 3-year goal is to have built an integrated package of observability and operations tools that can displace today's front-runner in modern observability, Datadog, and compete in all Monitor categories. We'll do that by focusing on the four core workflows of Instrument, Triage, Resolve, and Improve.

The following links describe our strategy for each individual workflow:

Pricing

Monitor is a critical component for all software development and operations. The Monitor stage's tier strategy will be broken down by workflow as described below.

Core/Free

To execute our land-and-expand strategy and to receive as much feedback as possible from our potential user base, Core contains the vast majority of the Monitor features, including metrics, logs, incident management, traces, and error management.

Limits:

Starter/Bronze

Upcoming Starter Monitor functionality includes:

Premium/Silver

Upcoming Premium Monitor functionality includes:

Ultimate/Gold

Upcoming Ultimate Monitor functionality includes:

What's next

From 2020-08 through 2020-10, the following are the goals we are pursuing within the Monitor stage.

  1. The Monitor::Health group is maturing the Incident Management category so that the GitLab SRE team can dogfood it
  2. The Monitor::APM group enables GitLab users to display any metric on any GitLab dashboard from any Prometheus instance or instances.
    • Key Result 1: Being able to connect multiple Prometheus instances
    • Key Result 2: Complete the Add metric to a custom dashboard workflow
    • Key Result 3: Number of custom dashboards created increases 100% through an improved onboarding experience
  3. The Monitor::APM group enables users to view all logs across their entire Kubernetes cluster in the GitLab UI
    • Key Result 1: Improve APM PI by 25%
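
To make the "multiple Prometheus instances" key result more concrete, here is a minimal sketch of fanning one PromQL query out to several servers. It only builds request URLs for the Prometheus HTTP API instant-query endpoint (`/api/v1/query`) and does not contact any server. The instance URLs and the function name `instant_query_urls` are hypothetical, and this is not a description of how GitLab implements the feature.

```python
from urllib.parse import urlencode

# Hypothetical instance URLs (assumption); replace with your own servers.
PROMETHEUS_INSTANCES = [
    "http://prometheus-a.example.com:9090",
    "http://prometheus-b.example.com:9090",
]


def instant_query_urls(promql, instances=PROMETHEUS_INSTANCES):
    """Build one /api/v1/query URL per Prometheus instance for the same PromQL."""
    params = urlencode({"query": promql})  # URL-encodes the PromQL expression
    return [f"{base.rstrip('/')}/api/v1/query?{params}" for base in instances]


# Print the URL that would be requested on each instance.
for url in instant_query_urls("up"):
    print(url)
```

A dashboard backed by several instances would then need to merge or label the per-instance results; that step is left out here because the source does not describe it.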

The quarterly goals fit within the larger overarching objectives of the Monitor stage described below.

First, we plan to provide a streamlined triage experience that allows our users to quickly identify and effectively troubleshoot an application problem, as described in the following flow:

graph TB
  A[Alerts] -->|Embedded Metric Chart in Incident| B
  B[Metrics] -->|Timespan Log Drilldown| C
  C[Logs] -->|TraceID Search| D[Traces]

Detailed information can be found in the triage to minimal epic.

Second, we plan to dogfood our current capabilities. Monitoring and observability solutions, by their nature, have a high bar to meet before adoption. By continuing to improve the triage workflow, we will at the same time enable our GitLab teammates to use GitLab Monitor more fully.

You can see our entire public backlog for Monitor at this link; filtering by labels or milestones will allow you to explore. If you find something you're interested in, you're encouraged to jump into the conversation and participate. At GitLab, everyone can contribute!

Performance Indicators (PIs)

Our Key Performance Indicator for the Monitor stage is the Monitor SMAU (stage monthly active users).

Monitor SMAU is determined by tracking how users configure, interact, and view the features contained within the stage. The following features are considered:

| Configure | Interact | View |
|---|---|---|
| Install Prometheus | Add/Update/Delete Metric Chart | View Metrics Dashboard |
| Enable external Prometheus instance integration | Download CSV data from a Metric chart | View Kubernetes pod logs |
| Enable Jaeger for Tracing | Generate a link to a Metric chart | View Environments |
| Enable Sentry integration for Error Tracking | Add/remove an alert | View Tracing |
| Enable auto-creation of issues on alerts | Change the environment when looking at pod logs | View operations settings |
| Enable Generic Alert endpoint | Select issue template for auto-creation | View Prometheus Integration page |
| Enable email notifications for auto-creation of issues | Use /zoom and /remove_zoom quick actions | View error list |
| | Click on metrics dashboard links in issues | |
| | Click View in Sentry button in errors list | |

See the corresponding Periscope dashboard (internal).
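
As a rough illustration of the SMAU definition above, the sketch below counts distinct users who triggered at least one Monitor feature event in a calendar month. The event names, the tuple layout, and the calendar-month window are assumptions made for this example; the actual GitLab calculation may differ.

```python
from datetime import date

# Hypothetical event names (assumption), loosely based on the feature list above.
MONITOR_EVENTS = {
    "install_prometheus",       # Configure
    "add_metric_chart",         # Interact
    "view_metrics_dashboard",   # View
}


def monitor_smau(events, year, month):
    """Count distinct users with at least one Monitor event in the given month.

    `events` is an iterable of (user_id, event_name, event_date) tuples.
    """
    active_users = {
        user_id
        for user_id, name, day in events
        if name in MONITOR_EVENTS and (day.year, day.month) == (year, month)
    }
    return len(active_users)


events = [
    (1, "view_metrics_dashboard", date(2020, 8, 3)),
    (1, "add_metric_chart", date(2020, 8, 9)),         # same user, counted once
    (2, "install_prometheus", date(2020, 8, 15)),
    (3, "view_metrics_dashboard", date(2020, 7, 30)),  # outside August
    (4, "push_commit", date(2020, 8, 1)),              # not a Monitor event
]
print(monitor_smau(events, 2020, 8))  # prints 2 (users 1 and 2)
```

Using a set of user IDs means a user who configures, interacts, and views in the same month still counts once, which matches the "active users" wording.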

Workflows

There are a few workflows that are critical to our users in this stage.

Each of these workflows has a designated level of maturity; you can read more about our category maturity model to help you decide which categories you want to start using and when.

Monitoring - Instrument

This workflow is planned, but not yet available.
Direction

Monitoring - Triage

Triage starts with the highest-level alert, uses preconfigured dashboards to review relevant metrics, and enables ad-hoc visualization and immediate drill-down from time-sliced metrics into logs and traces on the same screen. This workflow is planned, but not yet available.

Direction

Monitoring - Resolve

This workflow is planned, but not yet available.
Documentation • Direction

Monitoring - Improve

This workflow is planned, but not yet available.
Direction

Categories

There are a few product categories that are critical for success here; each one is intended to represent what you might find as an entire product out in the market. We want our single application to solve the important problems solved by other tools in this space - if you see an opportunity where we can deliver a specific solution that would be enough for you to switch over to GitLab, please reach out to the PM for this stage and let us know.

Each of these categories has a designated level of maturity; you can read more about our category maturity model to help you decide which categories you want to start using and when.

Metrics

Using Prometheus, GitLab can collect and display performance metrics for deployed applications. Developers can easily see the impact a merge has on production without leaving GitLab. This category is at the "complete" level of maturity.

Priority: high • Documentation • Direction

Alert Management

Configure and manage APM alerts in GitLab. This category is at the "complete" level of maturity.

Priority: high • Documentation • Direction

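Incident Management

Track incidents within GitLab, with a consolidated location for understanding the who, what, when, and where of each incident. Define service level objectives and error budgets to achieve the desired balance of development velocity and stability. This category is at the "complete" level of maturity.

Priority: high • Documentation • Direction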
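
The relationship between a service level objective and its error budget can be shown with a small worked example. This is a sketch under stated assumptions, not a GitLab feature: it treats the SLO as an availability fraction over a fixed window (30 days by default), and the function names are hypothetical.

```python
def error_budget_minutes(slo, window_days=30):
    """Allowed downtime in minutes for an availability SLO over the window."""
    if not 0 < slo < 1:
        raise ValueError("slo must be a fraction strictly between 0 and 1")
    return (1 - slo) * window_days * 24 * 60


def budget_remaining(slo, downtime_minutes, window_days=30):
    """Fraction of the error budget still unspent (negative if overspent)."""
    budget = error_budget_minutes(slo, window_days)
    return (budget - downtime_minutes) / budget


# A 99.9% SLO over 30 days allows 0.1% of 43,200 minutes of downtime.
print(f"{error_budget_minutes(0.999):.1f} minutes")
# After 21.6 minutes of downtime, about half of the budget remains.
print(f"{budget_remaining(0.999, 21.6):.0%} remaining")
```

A team can use the remaining fraction as a signal: plenty of budget left favors shipping faster, while an exhausted budget favors stability work.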

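Logging

GitLab makes it easy to view logs distributed across multiple pods and services using log aggregation with Elastic Stack. Once Elastic Stack is enabled, you can view aggregated Kubernetes logs across multiple services and infrastructure, go back in time, scroll infinitely, and search your application logs from within the GitLab UI. This category is at the "complete" level of maturity.

Priority: medium • Documentation • Direction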

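Tracing

Tracing provides insight into the performance and health of a deployed application, tracking each function or microservice that handles a given request. This makes it easy to understand the end-to-end flow of a request, regardless of whether you are using a monolithic or a distributed system. This category is at the "complete" level of maturity.

Priority: medium • Documentation • Direction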

GitLab Self-Monitoring

Self-managed GitLab instances come with strong observability tools, reducing the time and effort required to maintain a GitLab instance.

Priority: low • Documentation • Direction

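Error Tracking

Error tracking allows developers to easily discover and review the errors that their application may be generating. Surfacing error information where the code is being developed increases efficiency and awareness. This category is at the "complete" level of maturity.

Priority: low • Documentation • Direction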

Product Analytics

This category is at the "minimal" level of maturity.
Priority: medium • Direction

Synthetic Monitoring

Proactively simulate, monitor, and report on the success rate and execution of user actions and journeys. This category is at the "minimal" level of maturity.

Priority: high • Direction

Upcoming Releases

15.11 (2023-04-22)

Other Interesting Items

There are a number of other issues that we've identified as being interesting that we are potentially thinking about, but do not currently have planned by setting a milestone for delivery. Some are good ideas we want to do, but don't yet know when; some we may never get around to, some may be replaced by another idea, and some are just waiting for that right spark of inspiration to turn them into something special.

Remember that at GitLab, everyone can contribute! This is one of our fundamental values and something we truly believe in, so if you have feedback on any of these items you're more than welcome to jump into the discussion. Our vision and product are truly something we build together!