Doing a Group By On Datadog Logs

Date: 8/31/2023

I had some queries that were timing out in a web app, which were identified by a log pattern where the request had a log with request_started but was missing a log with request_finished.

I wanted to find queries that matched that pattern and as far as I could tell, datadog didn’t support an aggregate query across request ids like this. So I exported the data to a CSV and queried it directly.

  1. Go to datadog logs and add the query and time range. My query was service:odyn-prod @event:request_*.
  2. Dump the logs: click Download as CSV. I download about 60k rows and it took <90 seconds.
  3. I wanted to use datasette, so I converted the csv into a sqlite database: sqlite-utils insert data.db data data.csv --csv
  4. I opened the database with datasette: datasette data.db
  5. I ran this query:
with og as (
  select
    rowid,
    Date,
    Host,
    Service,
    [@log.level],
    [@code],
    [@event],
    Message,
    json_extract(Message, '$.request_id') as request_id,
    json_extract(Message, '$.request') as request
  from
    data
)
select
  request_id,
  count(*) as c,
  group_concat([@event], ' | ') as m,
  group_concat(request, ' | ') as request
from
  og
group by
  request_id
order by
  c

If you’re running datasette locally with the steps above, you can visit this link.

Screenshot of query and query results