HMS-9021 (Insights Experiences)

Only two worker pods are processing tasks

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Critical
    • Content
    • insights-content

      Currently only two of the four worker pods are processing tasks. I'm not sure whether this is related to the change that took Postgres down to 20 GB in production, but it seems to have started around the same time.

       

      We saw this error:

       

      46PM ERR Connect database=postgres err="failed to connect to `user=postgres database=postgres`: 10.0.217.176:5432 (content-sources-prod.ckxxru2ayexw.us-east-1.rds.amazonaws.com): dial error: timeout: context deadline exceeded" host=content-sources-prod.ckxxru2ayexw.us-east-1.rds.amazonaws.com module=pgx port=5432
      1:46PM ERR Acquire err="failed to connect to `user=postgres database=postgres`: 10.0.217.176:5432 (content-sources-prod.ckxxru2ayexw.us-east-1.rds.amazonaws.com): dial error: timeout: context deadline exceeded" module=pgx
      panic: error connecting to database: failed to connect to `user=postgres database=postgres`: 10.0.217.176:5432 (content-sources-prod.ckxxru2ayexw.us-east-1.rds.amazonaws.com): dial error: timeout: context deadline exceeded
      goroutine 117 [running]:
      github.com/content-services/content-sources-backend/pkg/tasks/queue.(*PgQueue).waitAndNotify(0xc0000acc80, {0x1e36848, 0xc00012e910})
          /go/src/app/pkg/tasks/queue/pgqueue.go:303 +0x2d3
      github.com/content-services/content-sources-backend/pkg/tasks/queue.(*PgQueue).listen(0xc0000acc80, {0x1e36848, 0xc00012e910}, 0x0?)
          /go/src/app/pkg/tasks/queue/pgqueue.go:279 +0x45
      created by github.com/content-services/content-sources-backend/pkg/tasks/queue.NewPgQueue in goroutine 1
          /go/src/app/pkg/tasks/queue/pgqueue.go:267 +0x225

       

      I think this was around when the DB was changed.
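
      The panic in the trace comes from the queue's listener goroutine failing to connect. One possible mitigation, sketched here and not taken from content-sources-backend, is to retry the connection with capped exponential backoff instead of panicking on the first timeout. The names retryConnect, connectFunc, flakyConnect and demo are hypothetical, and the simulated connect function stands in for whatever pgx call the real code makes.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// connectFunc is a hypothetical stand-in for the call that opens the
// listener's database connection.
type connectFunc func(ctx context.Context) error

// retryConnect calls connect until it succeeds, ctx is cancelled, or
// maxAttempts is reached. The wait doubles after each failure and is capped
// at maxDelay. It returns the number of attempts made and the final error.
func retryConnect(ctx context.Context, connect connectFunc, maxAttempts int, baseDelay, maxDelay time.Duration) (int, error) {
	if maxAttempts < 1 {
		maxAttempts = 1
	}
	var err error
	delay := baseDelay
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		if err = connect(ctx); err == nil {
			return attempt, nil
		}
		if attempt == maxAttempts {
			break
		}
		select {
		case <-ctx.Done():
			return attempt, ctx.Err()
		case <-time.After(delay):
		}
		delay *= 2
		if delay > maxDelay {
			delay = maxDelay
		}
	}
	return maxAttempts, fmt.Errorf("giving up after %d attempts: %w", maxAttempts, err)
}

// flakyConnect simulates a database that times out for the first
// `failures` calls and then accepts the connection.
func flakyConnect(failures int) connectFunc {
	calls := 0
	return func(ctx context.Context) error {
		calls++
		if calls <= failures {
			return errors.New("dial error: timeout: context deadline exceeded")
		}
		return nil
	}
}

// demo runs retryConnect against the simulated database with short delays.
func demo(failures, maxAttempts int) (int, error) {
	return retryConnect(context.Background(), flakyConnect(failures), maxAttempts, time.Millisecond, 5*time.Millisecond)
}

func main() {
	attempts, err := demo(2, 5)
	fmt.Println("attempts:", attempts, "err:", err)
}
```

      Whether retrying is the right behaviour here is an open question. A worker that exits and is restarted by the platform may be acceptable, as long as it does not end up running without a working listener.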

       

      The two pods are just showing:

      7:41PM INF Query args=[] commandTag=BEGIN module=pgx pid=26646 sql=begin
      7:41PM INF Query args=[] commandTag=LISTEN module=pgx pid=26636 sql="LISTEN tasks"
      7:41PM INF Query args=[] commandTag=BEGIN module=pgx pid=26667 sql=begin
      7:41PM INF Query args=[] commandTag=UNLISTEN module=pgx pid=26637 sql="UNLISTEN tasks"
      7:41PM INF Query args=[] commandTag=BEGIN module=pgx pid=26637 sql=begin
      7:41PM INF Query args=[] commandTag=UNLISTEN module=pgx pid=26636 sql="UNLISTEN tasks"
      7:41PM INF Query args=[] commandTag=LISTEN module=pgx pid=26636 sql="LISTEN tasks"
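
      To make the LISTEN/UNLISTEN churn in these lines easier to see, the following sketch tallies commandTag values per pid. It assumes the key=value log layout shown above and that pid identifies a database connection. The helper names field and tally are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// field returns the value of key=value in a space-separated log line,
// or "" if the key is absent.
func field(line, key string) string {
	prefix := key + "="
	for _, tok := range strings.Fields(line) {
		if strings.HasPrefix(tok, prefix) {
			return strings.TrimPrefix(tok, prefix)
		}
	}
	return ""
}

// tally counts how often each commandTag appears for each pid.
func tally(lines []string) map[string]map[string]int {
	counts := map[string]map[string]int{}
	for _, line := range lines {
		pid, tag := field(line, "pid"), field(line, "commandTag")
		if pid == "" || tag == "" {
			continue // not a query log line
		}
		if counts[pid] == nil {
			counts[pid] = map[string]int{}
		}
		counts[pid][tag]++
	}
	return counts
}

// sample holds the log lines quoted in this ticket.
var sample = []string{
	`7:41PM INF Query args=[] commandTag=BEGIN module=pgx pid=26646 sql=begin`,
	`7:41PM INF Query args=[] commandTag=LISTEN module=pgx pid=26636 sql="LISTEN tasks"`,
	`7:41PM INF Query args=[] commandTag=BEGIN module=pgx pid=26667 sql=begin`,
	`7:41PM INF Query args=[] commandTag=UNLISTEN module=pgx pid=26637 sql="UNLISTEN tasks"`,
	`7:41PM INF Query args=[] commandTag=BEGIN module=pgx pid=26637 sql=begin`,
	`7:41PM INF Query args=[] commandTag=UNLISTEN module=pgx pid=26636 sql="UNLISTEN tasks"`,
	`7:41PM INF Query args=[] commandTag=LISTEN module=pgx pid=26636 sql="LISTEN tasks"`,
}

func main() {
	for pid, tags := range tally(sample) {
		fmt.Println(pid, tags)
	}
}
```

      Run over a longer log window, a tally like this could show whether the pods keep cycling LISTEN and UNLISTEN on the same connections without ever dequeuing a task.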

      I had app-sre restart both pods. They initially picked up work, but then stopped again fairly quickly.

              Assignee: Unassigned
              Reporter: Justin Sherrill (hms-jsherrill)