Two weeks back I was working on one of the client "Chedraui" issue. Implementation team reported that two of the IGP jobs are running long from last couple of weeks. Unfortunately these were weekly jobs so if missed this time, next time we can debug after a week only.
Also it was reported that same job was running longer earlier as well and after disabling a trigger it went fine.
Recently they refreshed environment from prod again.
When I started looking into the issue, observed there were no active sessions in database and it was custom IGP loading job, so there cannot be anything on app server.
I have verified job logs as well and every time job took close to 2 hours whereas earlier it used to take close to 10 mins.
I have verified statspack report as well and nothing was useful in statspack report as well.
It was really difficult what's going wrong. After going through complete job code and all related packages and procedures I started querying into job tables. As these were oracle scheduled jobs so started querying below tables.
select * from sys.scheduler$_job
Above table had an entry without end date populated. That gave some hint that job in fact in place but waiting for something.
ex: 241680 BEGIN NWMGR.call_proc; END; 7/30/2018 6:29:29.000000 AM -07:00 7/30/2018 4:28:59.594874 AM -07:00 17372 3 1 0 9.0071992548803E15 Test job to check completion time NWMGR NLS_LANGUAGE='AMERICAN' NLS_TERRITORY='AMERICA' NLS_CURRENCY='$' NLS_ISO_CURRENCY='AMERICA' NLS_NUMERIC_CHARACTERS='.,' NLS_CALENDAR='GREGORIAN' NLS_DATE_FORMAT='DD-MON-RR' NLS_DATE_LANGUAGE='AMERICAN' NLS_SORT='BINARY' NLS_TIME_FORMAT='HH.MI.SSXFF AM' NLS_TIMESTAMP_FORMAT='DD-MON-RR HH.MI.SSXFF AM' NLS_TIME_TZ_FORMAT='HH.MI.SSXFF AM TZR' NLS_TIMESTAMP_TZ_FORMAT='DD-MON-RR HH.MI.SSXFF AM TZR' NLS_DUAL_CURRENCY='$' NLS_COMP='BINARY' NLS_LENGTH_SEMANTICS='CHAR' NLS_NCHAR_CONV_EXCP='FALSE' 0102000000000000 0 PRIMARY 0 1 7/30/2018 6:29:29.000000 AM -07:00 0 0 0 2147483648 0 0 0 0
Then I queried below table.
SELECT * FROM USER_SCHEDULER_RUNNING_JOBS;
I think this table was also having running jobs information. I missed to capture details on that day so does not have current information.
So to check historical details of these jobs I started querying job history tables based on the keywords provided in code.
select * from USER_SCHEDULER_JOB_RUN_DETAILS where job_name LIKE '%LOAD_OO6%' ORDER BY ACTUAL_START_DATE DESC;
With historical data in this table , things were more clear. My first observation was that each job was taking exactly 2 hours time. This cannot be just coincidence.
100900 7/5/2018 2:19:17.210435 PM -05:00 NWMGR LOAD_OO6 SUCCEEDED 0 7/5/2018 12:18:39.000000 PM -07:00 7/5/2018 12:18:39.408172 PM -07:00 +00 00:00:38.000000 1 533,36729 12685 +00 00:00:36.920000
99830 7/4/2018 11:26:46.054728 AM -05:00 NWMGR LOAD_OO6 SUCCEEDED 0 7/4/2018 9:26:25.000000 AM -07:00 7/4/2018 9:26:25.747494 AM -07:00 +00 00:00:20.000000 1 289,12857 19307 +00 00:00:20.090000
In above records you can see that there is close to 2 hours difference in start time and finish time. Job took 38 seconds and 20 seconds respectively.
Also the most important information you can collect fro this is time zone. You can see that start date has -7 in timezone and log date has -5. So it was pretty clear by now that it not database that is taking time but job is waiting before it starts executing code.
So my suspect was that client machine and DB server were configured with different time zone.
But I was not having access to both servers and there was nobody to help find out current time zone configuration on client and DB servers.
So I started writing code to test the same.
Sample code:
CREATE TABLE job_test
(
id NUMBER,
descr VARCHAR2 (200));
CREATE OR REPLACE PROCEDURE test_proc
AS
BEGIN
INSERT INTO job_test
VALUES (1, 'Job completed at' || SYSTIMESTAMP);
END;
/
show errors;
CREATE OR REPLACE PROCEDURE call_proc
AS
v_stat VARCHAR2 (200);
begin
v_stat :='BEGIN TEST_PROC;COMMIT;END;';
EXECUTE IMMEDIATE v_stat;
END;
/
show errors;
CREATE OR REPLACE PROCEDURE run_proc
AS
BEGIN
DBMS_SCHEDULER.create_job (
job_name => 'test_job',
job_type => 'PLSQL_BLOCK',
job_action => 'BEGIN NWMGR.call_proc; END;',
start_date => SYSTIMESTAMP + (0.5 / 1440),
repeat_interval => NULL,
end_date => NULL,
ENABLED => TRUE,
comments => 'Test job to check completion time');
END;
/
show errors;
begin
run_proc;
end;
/
alter session set time_zone='-07:00';
begin
run_proc;
end;
/
In first test job executed immediately and record got inserted into job_test table. But when I changed time zone and re-submitted job, I noticed that my job was waiting. After two hours job completed.
Now I have proof to that it up with implementation team. I advised them to change timezone on application server and they realized that it was changed recently as part of server maintenance activity.
After changing timezone ,job started completing within few seconds.
Party time!!!!
No comments:
Post a Comment