
List:       cypherpunks
Subject:    A Detailed Example of a Real Y2K Problem
From:       Tim May <tcmay () got ! net>
Date:       1998-06-30 19:58:49



This is one of the more interesting and detailed case studies of a real Y2K
bug I've seen. I have not been forwarding most Y2K news because there are
better fora for such discussions. But this one is worth reading and
thinking about.

Though this particular situation got resolved, the implications are
sobering for "live environments," which is where most of the problems will
actually first surface (think of the tax system, the FAA system, many or
even most factories and refineries, etc.).

The URL is:

http://www.theaustralian.com.au/extras/007/4051126.htm

Here's the article: (sorry for any formatting problems)


  Routine that became a meltdown
   By SUE ASHTON DAVIES

   30jun98


  A ROUTINE Friday afternoon batch job turned into disaster when a computer
meltdown brought a manufacturing system to its knees.

  The computer room was humming, and all systems were go for one of
Australia's largest manufacturers.

  Then Jeff Steel, project manager of Infact Consultants, reset the system
clock to January 7, 2000, and waited to see what would happen.

  The routine batch job, which involved 800 custom-built Cobol and PL/I
programs in a manufacturing mainframe environment, was expected to take six
hours to run.

  Close by, a terminal in the control room was set up to track the programs
as they went through the batch run.

  Although he anticipated some problems, Steel was not prepared for
anything coming out of left field.

  His team of 12 programmers had worked methodically for nine months,
manually sifting through millions of lines of code, rectifying the
two-digit year issue to take account of the year 2000.
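
  To illustrate the kind of two-digit year defect such a team looks for,
here is a minimal sketch in Python rather than Cobol or PL/I. The function
names and the pivot value of 50 are assumptions made for illustration; the
article does not say which remediation technique Steel's team used.

```python
def years_between_naive(start_yy, end_yy):
    """Subtract two-digit years directly, as much legacy code did."""
    return end_yy - start_yy


def expand_year(yy, pivot=50):
    """Windowing: read years below the pivot as 20xx, the rest as 19xx.

    The pivot of 50 is an assumed value chosen for this example.
    """
    if not 0 <= yy <= 99:
        raise ValueError("expected a two-digit year")
    return 2000 + yy if yy < pivot else 1900 + yy


def years_between(start_yy, end_yy, pivot=50):
    """Subtract after expanding both years to four digits."""
    return expand_year(end_yy, pivot) - expand_year(start_yy, pivot)
```

  With a start year of 99 and an end year of 00, the naive subtraction
yields -99, while the windowed version yields 1.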

  Great care had been taken to keep the crew motivated and focused on
their tasks to ensure time was spent productively and any reworking was
kept to a minimum.

  At worst, he expected to make some specific changes that could be easily
spotted.

  Operations had hardly begun before the first programs started to run slowly.

  By the time the sixth program started, the system began to falter. Then,
one after another, programs fell over.

  By the time the 10th program failed, Steel decided to let the job run to
the end, because in all likelihood, it would be all over in half an hour
anyway.

  Within minutes, 750 programs had fallen over. One of the few programs to
continue running was invoicing, but it was producing invoices for the 43rd
day of the 14th month.

  As the job finally ground to a halt, a silence hung over the room as
everyone stared vacantly into the terminal.

  Steel stood frozen to the floor in shock, as did his team, which had been
engaged to fulfil a $3 million contract.

  Twelve people stared at the terminal where a complete suite of programs
had died instantly.

  Fortunately the meltdown had taken place in a test environment.

  The search was now on to diagnose the problem. One of the team tracked
down the problem to an obscure mainframe program.

  The culprit was a non-Y2K-compliant link editor on a PL/I program that
last ran in 1987.

  A link editor takes different modules of a program and puts them together
in the right place at the right time.

  With the problem identified and a Y2K-compliant link editor installed,
the 30 programs were rerun and the problem was solved.

  Steel says the use of the test environment saved the company from
bankruptcy.

  "The consequences in a live environment would have been devastating," he
says. As well as bringing the business to a standstill, it would have
rendered it unable to operate for six months - and possibly taken suppliers
and customers down with it.

  Situations like this are typical of what's happening and testify to the
truth of rumours about large companies not yet meeting Y2K compliance
requirements, Steel says.

  The post-mortem meeting found that the collective time required to
diagnose such an obscure problem in a live environment would have been
about a month, and a fix would have taken six months.

  "The problem was so unusual, you wouldn't have known if it was hardware,
software or system utilities," Steel says. "The horrible thing about it was
that it was such an obscure component that nobody even thought that it
could fail."

  Even with hindsight, the problem could never have been spotted before
testing because it was too obscure.

  "In nine months of remediation, no-one had ever got near this problem,"
he says.

  Steel says the meltdown was so catastrophic that even a contingency plan
wouldn't have saved the day.

  The only way to find the Y2K bugs in a system is to manually trawl the
program code line by line to find the date fields, some of which are very
obscure, he says.

  One area for dates was embedded deep in a job control language, where a
sort of 30 characters revealed six characters making up a date.
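
  A hidden date inside a sort key can be sketched as follows, again in
Python for illustration. The 30-character record layout, the offset of 12,
the YYMMDD format and the pivot of 50 are all assumptions; the article
gives only the key length and the six-character date.

```python
# Hypothetical 30-character sort keys: 12 characters of identifier,
# 6 characters of YYMMDD date, 12 characters of sequence number.
RECORDS = [
    "PLANT01ORDER" + "000107" + "000000000002",  # 7 January 2000
    "PLANT01ORDER" + "991231" + "000000000001",  # 31 December 1999
]

DATE_START = 12  # assumed offset of the embedded date within the key


def naive_key(record):
    """Sort on the raw characters, so the two-digit year is compared as text."""
    return record


def windowed_key(record, pivot=50):
    """Expand the embedded two-digit year to four digits before sorting."""
    yymmdd = record[DATE_START:DATE_START + 6]
    century = "20" if int(yymmdd[:2]) < pivot else "19"
    return record[:DATE_START] + century + yymmdd + record[DATE_START + 6:]


naive_order = sorted(RECORDS, key=naive_key)
windowed_order = sorted(RECORDS, key=windowed_key)
```

  Sorted on the raw key, the January 2000 record lands ahead of the
December 1999 record; with the year expanded, the 1999 record comes first.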

  Even though the testing is complete, Steel cannot say definitely that the
system is now 100 per cent Y2K compliant.

  As part of the strategy to protect himself and the company from any legal
recourse, he operated with an auditor looking over his shoulder at every
stage, concurring that the way he was progressing was the best available
method.

  "All I can say to the client is that I can't guarantee that there will
not be any problems after the year 2000," he says.

  Steel says most organisations don't understand Y2K.

  "Until something like this happens, they don't understand what Y2K can do
to them," he says.



  © News Limited 1998

 
 
