Job Scheduler and pgagent error investigation

Matthew Gwillam
Matthew Gwillam

Investigating erroring jobs and logs.

The edb job scheduler in EPAS uses pgagent for it's functionality.

Finding the pgagent errors

Based on the scheduler being pgagent, you can extract, both the error of the code and the jobstep that fails, using the following query. The following is equally valid for for both EPAS and community postgres.

select j.jobid,j.jobname,js.jstname,js.jstcode, jsl.jslstart,jsl.jsloutput
from pgagent.pga_job j, 
pgagent.pga_jobstep js,
pgagent.pga_joblog jl,
pgagent.pga_jobsteplog jsl
where j.jobid=js.jstjobid
and j.jobid=jl.jlgjobid 
and js.jstid=jsl.jsljstid
and jsl.jsljlgid=jl.jlgid
and js.jstjobid=j.jobid
and jsl.jslstatus !='s' 
and jl.jlgstatus !='s' 
order by jl.jlgstart desc,j.jobid,js.jstname asc;

The above query will generate results only where effectively a jobstep has failed. The particularly useful fields are ..

  1. js.jstcode
  2. jsl.jsloutput

These 2 fields will provide the jobstep code and the error message, as we are after failures, generated for that particular jobstep.

If using PEM or PgAdmin you can export these results to a file, it will be in csv format, could uploaded to a ticket.

The output from this query will be similar to that shown below:

jobid | jobname | jstname | jstcode | jslstart | jsloutput 
1 | 01-populate-pcutime | 10-error | insert into tryme (a) values(10); | 10-MAR-23 14:00:03.100989 +00:00 | ERROR: relation "tryme" does not exist +
| | | | | LINE 1: insert into tryme (a) values(10);+
| | | | | ^
1 | 01-populate-pcutime | 10-error | insert into tryme (a) values(10); | 10-MAR-23 13:55:03.440661 +00:00 | ERROR: relation "tryme" does not exist +
| | | | | LINE 1: insert into tryme (a) values(10);+
| | | | | ^
1 | 01-populate-pcutime | 10-error | insert into tryme (a) values(10); | 10-MAR-23 13:50:00.960536 +00:00 | ERROR: relation "tryme" does not exist +
| | | | | LINE 1: insert into tryme (a) values(10);+

Alternatively, if using psql you could simply execute the following, this will generate a csv file that can be opened in say LibreOffice Calc or Microsoft Exel amongst others.

create temporary table pgagent_errors as 
select j.jobid,j.jobname,js.jstname,js.jstcode, jsl.jslstart,jsl.jsloutput
from pgagent.pga_job j, 
pgagent.pga_jobstep js,
pgagent.pga_joblog jl,
pgagent.pga_jobsteplog jsl
where j.jobid=js.jstjobid
and j.jobid=jl.jlgjobid 
and js.jstid=jsl.jsljstid
and jsl.jsljlgid=jl.jlgid
and js.jstjobid=j.jobid
and jsl.jslstatus !='s' 
and jl.jlgstatus !='s' 
order by jl.jlgstart desc,j.jobid,js.jstname asc;
\copy pgagent_errors to 'pgagent_errors.csv' (header, format csv);

The above query could be amended to further filter the results by adding the following, if you only want the errors from the last 7 days by adding the following to the where clause

and jl.jlgstart::DATE > CURRENT_DATE -7

Additional pgagent helpful information

It may sometimes be beneficial to query all the pgagent jobs to check to see if the jobs and jobsteps are enabled, and also to view the joblastrun and jobnextrun values.

The following query will provide details on pgagent jobs, such as whether they are enabled, the code associated with the jobsteps of the job etc.

select j.jobname ,
j.jobenabled,
j.jobnextrun,
j.joblastrun,
js.jstname,
js.jstcode,
js.jstenabled
from pgagent.pga_job j, pgagent.pga_jobstep js
where j.jobid=js.jstjobid
--and j.jobenabled is true
--and js.jstenabled is true
order by j.jobname,js.jstname asc;

The out from this query will be similar to that shown below:

jobname | jobenabled | jobnextrun | joblastrun | jstname | jstcode | jstenabled 
01-populate-pcutime | t | 10-MAR-23 14:10:00 +00:00 | 10-MAR-23 14:05:00.405007 +00:00 | 01-copy_cputime_to_arch | insert into pgpool_mon_arch.cputime select * from pgpool_mon.ft_cputime; | t
01-populate-pcutime | t | 10-MAR-23 14:10:00 +00:00 | 10-MAR-23 14:05:00.405007 +00:00 | 02-pop_job_run | insert into pgagent_job_run.jobrun(time_run) select current_time; | t
01-populate-pcutime | t | 10-MAR-23 14:10:00 +00:00 | 10-MAR-23 14:05:00.405007 +00:00 | 03-test | select current_date | t
01-populate-pcutime | t | 10-MAR-23 14:10:00 +00:00 | 10-MAR-23 14:05:00.405007 +00:00 | 10-error | insert into tryme (a) values(10); | t
02-truncate_arch_tables | t | 10-MAR-23 14:23:00 +00:00 | 10-MAR-23 14:03:04.227323 +00:00 | 01-truncate cputime | truncate table pgpool_mon_arch.cputime; | t
02-truncate_arch_tables | t | 10-MAR-23 14:23:00 +00:00 | 10-MAR-23 14:03:04.227323 +00:00 | 02 - truncate table pgagent_job_run.jobrun | truncate table pgagent_job_run.jobrun; | t
no_sched | t | | 09-MAR-23 14:32:12.849283 +00:00 | 01 - test | select current_date; | t
test2-not-run | t | | | 01-norun | select current_date; | t

To generate the output as a csv file that can be uploaded:

drop table if exists pgagent_jobs_enabled;
create temporary table pgagent_jobs_enabled as
select j.jobname ,
j.jobenabled,
j.jobnextrun,
j.joblastrun,
js.jstname,
js.jstcode,
js.jstenabled
from pgagent.pga_job j, pgagent.pga_jobstep js
where j.jobid=js.jstjobid
--and j.jobenabled is true
--and js.jstenabled is true
order by j.jobname,js.jstname asc;
\copy pgagent_jobs_enabled to pgagent_jobs_enabled.csv (header, format csv);

Combining the above to give the latest log entry for a job and jobsteps.

We can effectively combine the 2 methods as described above, by getting the latest log entries for the pgagent jobs, using the following query, the result of this query will also return values for jobs that have never run.

WITH jlm AS (
select max(jlgid) as jlgid_max, jlgjobid 
from pgagent.pga_joblog
group by jlgjobid
)
select j.jobid,j.jobname, j.jobenabled, j.joblastrun,j.jobnextrun, jl.jlgstatus,js.jstname,js.jstcode, js.jstenabled,jsl.jslstart,jsl.jslstatus,jsl.jsloutput
from pgagent.pga_job j
LEFT JOIN jlm 
on (j.jobid=jlm.jlgjobid)
left join pgagent.pga_joblog jl
on (j.jobid=jl.jlgjobid) 
left join pgagent.pga_jobstep js
on j.jobid=js.jstjobid 
left join pgagent.pga_jobsteplog jsl
on jsl.jsljlgid=jl.jlgid
where j.jobid=jlm.jlgjobid
and jlm.jlgid_max=jl.jlgid 
and js.jstid=jsl.jsljstid
or jl.jlgjobid is null and jl.jlgid is null
order by jl.jlgstart desc,j.jobname,js.jstname asc;

The out from this query will be similar to that shown below:

jobid | jobname | jobenabled | joblastrun | jobnextrun | jlgstatus | jstname | jstcode | jstenabled | jslstart | jslstatus | jsloutput 
4 | test2-not-run | t | | | | 01-norun | select current_date; | t | | | 
1 | 01-populate-pcutime | t | 10-MAR-23 14:10:03.242028 +00:00 | 10-MAR-23 14:15:00 +00:00 | f | 01-copy_cputime_to_arch | insert into pgpool_mon_arch.cputime select * from pgpool_mon.ft_cputime; | t | 10-MAR-23 14:10:03.248059 +00:00 | s | 
1 | 01-populate-pcutime | t | 10-MAR-23 14:10:03.242028 +00:00 | 10-MAR-23 14:15:00 +00:00 | f | 02-pop_job_run | insert into pgagent_job_run.jobrun(time_run) select current_time; | t | 10-MAR-23 14:10:03.392861 +00:00 | s | 
1 | 01-populate-pcutime | t | 10-MAR-23 14:10:03.242028 +00:00 | 10-MAR-23 14:15:00 +00:00 | f | 03-test | select current_date | t | 10-MAR-23 14:10:03.396919 +00:00 | s | 
1 | 01-populate-pcutime | t | 10-MAR-23 14:10:03.242028 +00:00 | 10-MAR-23 14:15:00 +00:00 | f | 10-error | insert into tryme (a) values(10); | t | 10-MAR-23 14:10:03.399162 +00:00 | f | ERROR: relation "tryme" does not exist +
| | | | | | | | | | | LINE 1: insert into tryme (a) values(10);+
| | | | | | | | | | | ^
2 | 02-truncate_arch_tables | t | 10-MAR-23 14:03:04.227323 +00:00 | 10-MAR-23 14:23:00 +00:00 | s | 01-truncate cputime | truncate table pgpool_mon_arch.cputime; | t | 10-MAR-23 14:03:04.236215 +00:00 | s | 
2 | 02-truncate_arch_tables | t | 10-MAR-23 14:03:04.227323 +00:00 | 10-MAR-23 14:23:00 +00:00 | s | 02 - truncate table pgagent_job_run.jobrun | truncate table pgagent_job_run.jobrun; | t | 10-MAR-23 14:03:04.243503 +00:00 | s | 
3 | no_sched | t | 09-MAR-23 14:32:12.849283 +00:00 | | s | 01 - test | select current_date; | t | 09-MAR-23 14:32:12.855754 +00:00 | s | 

To generate the output as a csv file that can be uploaded:

drop table if exists pgagent_jobs_combined;
create temporary table pgagent_jobs_combined as
WITH jlm AS (
select max(jlgid) as jlgid_max, jlgjobid 
from pgagent.pga_joblog
group by jlgjobid
)
select j.jobid,j.jobname, j.jobenabled, j.joblastrun,j.jobnextrun, jl.jlgstatus,js.jstname,js.jstcode, js.jstenabled,jsl.jslstart,jsl.jslstatus,jsl.jsloutput
from pgagent.pga_job j
LEFT JOIN jlm 
on (j.jobid=jlm.jlgjobid)
left join pgagent.pga_joblog jl
on (j.jobid=jl.jlgjobid) 
left join pgagent.pga_jobstep js
on j.jobid=js.jstjobid 
left join pgagent.pga_jobsteplog jsl
on jsl.jsljlgid=jl.jlgid
where j.jobid=jlm.jlgjobid
and jlm.jlgid_max=jl.jlgid 
and js.jstid=jsl.jsljstid
or jl.jlgjobid is null and jl.jlgid is null
order by jl.jlgstart desc,j.jobname,js.jstname asc;
\copy pgagent_jobs_combined to pgagent_jobs_combined.csv (header, format csv);

Was this article helpful?

0 out of 0 found this helpful