Hi All,
I need some help in the below query. There are 2 queries.
1st query which will tell me the monthly wise job report which includes columns such as TotalJobExecutions, Success Count, Failure count, Retry count, Cancel count.
2nd query, is completly concentrated on Failure count. Here i wanted to find out the reasons for the job failures.
These 2 queries is showing correct output on my local instance. bascially, I installed a sql instance and created 3 jobs and made them fail for few runs. On my local instance, the queries and counts are matching perfectly. But, when I run the set of queries,
the counts are mismatching. i.e. The sum(failure count) column in the 1st output doesnt match up with the Sum(totalCount) in the 2nd output.
Note: also I am considering only last 3 months worth data.i.e. i am interested in last 3 months job failures. Please check with WHERE clause conditions.
Now I wanted to understand when I ran the same 2 queries in PROD server, why there is count mistmatch. I am completely not able to understand. How can I fix this? Is there anything am I missing in the 2nd query. Also, in the 2nd query I am considering step
id =0 (i.e. job outcome) like I did in 1st query.Instead, to get the exact error message, I am considering the failed step to know the exact error message and I am joining the messageid to sys.sysmessages dmv to know the actual reasons for the job failures.
(stepid<>0)
Is there any special scenario or extra condition I am missing in the 2nd query. I am literally confused here. does running rerunning any failed step from point of failure will that a make a difference in the counts? I have also tested that case as well still
counts show me correct on local instant but on prod where we have around 100 jobs which is showing wrong results.
or else if a job is has multiple schedules does it make any difference????
All I am looking for is, if I sum up the values of "FailureCount" column values, that should match if I sum TotalCount column values in the 2 query output.
Can anybody help ?
Query 1
==============
use msdb
go
;
with MyCte
as
(
SELECT j.name JobName,
h.run_status,
msdb.dbo.agent_datetime(h.run_date, h.run_time) rundatetime,
case h.run_status when 0 then 'failed'
when 1 then 'Succeded'
when 2 then 'Retry'
when 3 then 'Cancelled'
when 4 then 'In Progress'
end as ExecutionStatus
--h.message MessageGenerated
FROM sysjobhistory h inner join sysjobs j
ON j.job_id = h.job_id
where h.step_id = 0
and msdb.dbo.agent_datetime(h.run_date, h.run_time) between '2016-06-01 00:00:00.000' and '2016-09-21 23:59:59.000'
)
select
@@servername AS Servername, --//CHANGE THE SERVERNAME
[Month]=cast(month(rundatetime) as varchar(2))+'/'+cast(year(rundatetime) as varchar(4)) ,
[TotalExecutions] = count(*)
,SuccessCount = sum(case when ExecutionStatus = 'Succeded' then 1 else 0 end)
,FailureCount = sum(case when ExecutionStatus = 'failed' then 1 else 0 end)
,RetryCount = sum(case when ExecutionStatus = 'Retry' then 1 else 0 end)
,CancelledCount = sum(case when ExecutionStatus = 'Cancelled' then 1 else 0 end)
from MyCte
group by cast(month(rundatetime) as varchar(2))+'/'+cast(year(rundatetime) as varchar(4))
go
Query 2
==============
;
with FailureReasons
AS
(
SELECT [JobName] = JOB.name,
msdb.dbo.agent_datetime(HIST.run_date, HIST.run_time) as RunDateTime,
[Step id] = HIST.step_id,
[StepName] = HIST.step_name,
HIST.run_status,
[Status] = CASE WHEN HIST.run_status = 0 THEN 'Failed'
WHEN HIST.run_status = 1 THEN 'Succeeded'
WHEN HIST.run_status = 2 THEN 'Retry'
WHEN HIST.run_status = 3 THEN 'Canceled'
END,
HIST.sql_severity,
HIST.sql_message_id,
HIST.message
FROM sysjobs JOB
INNER JOIN sysjobhistory HIST ON HIST.job_id = JOB.job_id
where HIST.run_status = 0 -- means only failed
and msdb.dbo.agent_datetime(HIST.run_date, HIST.run_time) between '2016-06-01 00:00:00.000' and '2016-09-21 23:59:59.000'
and step_id <> 0
)
SELECT t1.sql_message_id "Error #",t2.[Description], [TotalCount] = COUNT(1)
from FailureReasons t1
inner join sys.sysmessages t2 on (t1.sql_message_id = t2.error)
where t2.msglangid = 1033
group by t1.sql_message_id,t2.[Description]
ORDER BY [TotalCount] DESC
Sample output
Thanks,
Sam