More and more companies are now shifting to a cloud-native & microservices-based architecture. Having an application monitoring tool is critical in this world because you can’t just log into a machine and figure out what’s going wrong.
We have spent years learning about application monitoring & observability. What are the key features an observability tool should have to enable fast resolution of issues.
In our opinion, good observability tools should have
- Out of the box application metrics like latency, request rates, error rate, etc.
- All telemetry signals under a single pane
- Way to go from metrics to traces to find why some issues are happening
- Seamless flow between metrics, traces & logs — the three pillars of observability
- Ability to set dynamic thresholds for alerts
- Transparency in pricing
User experience not great in current open-source tools
We found that though there are open-source tools like Prometheus & Jaeger, they don’t provide a great user experience as SaaS products do. It takes lots of time and effort to get them working, figuring out the long-term storage, etc. And if you want metrics and traces, it’s not possible as Prometheus metrics & Jaeger traces have different formats.
SaaS tools like DataDog and NewRelic do a much better job at many of these aspects:
- They are easy to setup & get started
- Provide out-of-box application metrics
- Provides correlation between signals for contextual troubleshooting
But it has the following issues:
- Crazy node-based pricing, which doesn’t make sense in today’s micro-services architecture. Any node which is live for more than 8hrs in a month is charged. So, unsuitable for spiky workloads
- Very costly. They charge custom metrics for $5/100 metrics
- It is cloud-only, so not suitable for companies that have concerns with sending data outside their infra
- some tools charge based on user seats which becomes very limiting and costly if your engineering team is large
- For any small feature, you are dependent on their roadmap. We think this is an unnecessary restriction for a product which developers use. A product used by developers should be extendible
To fill this gap we built SigNoz, an open-source alternative to DataDog.
Key Features of SigNoz - a DataDog alternative
Some of our key features which makes SigNoz vastly superior to current open-source products and a great alternative to DataDog are:
- Metrics, traces, and logs under a single pane of glass
- Opentelemetry-native
- Correlation of telemetry signals based on OpenTelemetry's semantic conventions
- Out of the box charts for application metrics
- Seamless flow between metrics, traces & logs
- Distributed tracing with Flamegraphs & Gantt charts to visualize user requests
- Filtering based on tags & attributes
- Custom aggregates on filtered traces
- Monitoring of messaging queues
- Infrastructure dashboards
- Exceptions monitoring
- Transparent usage data & pricing
Application metrics
Get out of the box p90, p99 latencies, RPS, Error rates and top endpoints for a service out of the box.
Seamless flow between telemetry signals
Powered by OpenTelemetry's semantic conventions, you can quickly jump between telemetry signals in SigNoz. Found something suspicious in a metric, just click that point in the graph & get details of traces which may be causing the issues. Seamless, Intuitive.
Similarly, we have enabled correlation between other telemetry signals.
APM Metrics to Traces & Logs
Traces to Logs
If you see a API call taking more time than usual, you can go to related logs to investigate further.
Similarly you can click on detailed view of logs and then go to related trace ID to see the flow of user requests.
Logs with Infrastructure metrics
While troubleshooting with logs, you can investigate the related infrastructure metrics to see if issues are happening becuase of that.
Quick filers & Advanced Query Builder for all telemetry signals
For all telemetry signals, there are quick filters to quickly filter out the data needed. We have built customized query builders for each signal to make your troublshooting work easier.
For example, query builder for traces allows you to create queries for finding the p99 latency of services.
Similarly, use quick filters to quickly filter the data that you need.
You can create custom metrics from filtered traces to find metrics of any type of request. Want to find p99 latency of customer_type: premium
who are seeing status_code:400
. Just set the filters, and you have the graph. Boom!
Flamegraphs & Gantt charts
Detailed flamegraph & Gantt charts to find the exact cause of the issue and which underlying requests are causing the problem. Is it a SQL query gone rogue or a Redis operation is causing an issue? Get more context on your spans with tags and events.
Logs Management
SigNoz provides Logs management with advanced log query builder. You can also monitor your logs in real-time using live tailing. SigNoz uses a columnar database ClickHouse to store logs, which is very efficient at ingesting and storing logs data. Columnar databases like ClickHouse are very effective in storing log data and making it available for analysis.
Metrics & Dashboards
Monitor any metrics important to you. Ingest metrics from your infrastructure or applications and create customized dashboards to monitor them.
Exceptions Monitoring
Monitor exceptions automatically in Python, Java, Ruby, and Javascript. For other languages, just drop in a few lines of code and start monitoring exceptions.
Simple usage-based pricing with cost control features
Open-source SigNoz is free to self-host and use. SigNoz cloud is the easiest way of running SigNoz without any hassle of maintenance. For cloud, we have a simple usage-based pricing based on amount of data sent.
- Logs: $0.3/GB ingested
- Traces: $0.3/GB ingested
- Metrics: $0.1/mil samples
Check out our pricing plans for more details. You can also estimate your monthly bill using the pricing calculator.
Getting started with SigNoz
Related Content
SigNoz vs Datadog
Beware these surprises in Datadog pricing