
BumRushDaShow

(129,440 posts)
Wed Jan 11, 2023, 07:30 PM

US Aviation System Meltdown Tied to Corrupted Digital File

Source: Bloomberg

A corrupted computer file led to the breakdown of an air-safety system that prompted flights to be grounded across the US, according to people familiar with the preliminary findings.

The glitch that affected the Notice to Air Missions, or Notam, system on Wednesday also caused a failure in a related backup system, said the people, who asked not to be identified discussing the ongoing investigation. They cautioned that the information was not final.

The computer system that shares the notices to pilots, airlines and other users began developing problems late Tuesday night and had to be completely taken down in the early hours of Wednesday. The Federal Aviation Administration temporarily halted domestic departures, leading to thousands of flight delays.

Technology workers tried to activate a backup system and it initially seemed to function, but the same or a similar corrupted file caused problems there as well, said one of the people. A halt to all flights across the country is extremely rare and has only been done a handful of times, such as after the Sept. 11, 2001, terrorist attacks. 

Read more: https://www.bloomberg.com/news/articles/2023-01-11/us-aviation-system-meltdown-related-to-corrupted-digital-file

iluvtennis

(19,871 posts)
1. So glad they were able to determine the root cause of the problem. You'd think a system as critical as
Wed Jan 11, 2023, 07:45 PM

this needs not only a hot backup, but a backup that is one rev level back from the official version.

It seems like they have the hot backup which is why the same corrupt file was in place on the backup.

Would love to understand why testing/quality control didn't catch the corrupt file before it was rolled to the production system.
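For anyone wondering what such a check could look like, here is a minimal sketch. It assumes a SHA-256 digest was recorded while the file was known to be good; the function names and the whole "digest gate" are hypothetical illustrations, not anything the FAA or its contractor is known to use.

```python
import hashlib
import shutil


def sha256_of(path):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def promote_if_intact(source, destination, expected_digest):
    """Copy source to destination only if its digest still matches.

    Hypothetical gate: the expected digest is assumed to have been
    recorded when the file was last known to be good.
    """
    if sha256_of(source) != expected_digest:
        return False  # refuse to propagate a file that no longer matches
    shutil.copyfile(source, destination)
    return True
```

A gate like this only catches corruption that happens after the digest was recorded. If the file was already bad when the digest was taken, the check passes and the bad file still reaches the backup.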

BumRushDaShow

(129,440 posts)
2. Assuming they are using some kind of RAID array(s)
Wed Jan 11, 2023, 08:02 PM

it could have been a failing drive that ended up with enough errors to finally trigger a more obvious problem but the errors were already starting to corrupt the backups.

iluvtennis

(19,871 posts)
3. + agree. But I would have expected more data corruption on HD than just a single file.
Wed Jan 11, 2023, 08:08 PM

to a fellow tech nerd.

BumRushDaShow

(129,440 posts)
7. I saw an article from CNN
Wed Jan 11, 2023, 08:31 PM

that mentioned that they apparently noticed the problem yesterday and decided to reboot early this morning, which normally takes 90 min including its checks. But they found it was taking too long after that normal boot time to push the info out, and that is when they decided to do the ground stop to troubleshoot.

And yup - tech nerd here hoping my NAS holds up with my backups!

James48

(4,440 posts)
19. Answer:
Wed Jan 11, 2023, 09:48 PM

The software is now maintained by the contractor, Lockheed Martin, which is interested in the greatest profit, not the highest quality. FAA hasn’t actually had control of the software for years.

Lithos

(26,404 posts)
21. Sounds like a bad file was sync'd
Wed Jan 11, 2023, 10:20 PM

if they are using a Hot-hot HA mode (which is now pushing about 15-20 years old in obsolescence), then a simple sync would have replicated this.

And the file could have been the result of a bad write somewhere caused by a transient side effect (think partial disk failure) which was not caught.

L-

RobinA

(9,894 posts)
32. This Was My Thought
Thu Jan 12, 2023, 08:55 AM

but I know nothing about IT. I would think there would be redundancy on top of redundancy, but mental health is my game, so what do I know. A little gift to Southwest from the computer gods.

Major Nikon

(36,827 posts)
34. It may not be as simple as rolling back to a previous version
Thu Jan 12, 2023, 10:44 AM

You have static source code and dynamic data. If the corruption is in the source code it's easy enough to roll back. If it's in the dynamic data it's not that simple. NOTAM data is constantly changing as new ones are added and older ones are cancelled. So you can't just roll back to archived data because critical information not included in the backup would be lost. Imagine a closed runway NOTAM at an airport which was lost because they rolled back to a version that didn't include it.

In this case the data loss would be significant. They knew they had a problem hours before and waited until a low traffic period to perform the reset. In that time there would be countless NOTAMs added and removed.
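One common way around that problem is to keep a log of every change made after a snapshot and replay it on restore. The sketch below is purely illustrative: the notice IDs, the tuple layout of the log, and the `rebuild_state` helper are all made up for the example, and nothing here reflects how the real NOTAM system stores its data.

```python
def rebuild_state(snapshot, change_log):
    """Replay add/cancel events recorded after the snapshot was taken."""
    active = dict(snapshot)  # copy so the archived snapshot is untouched
    for action, notice_id, text in change_log:
        if action == "add":
            active[notice_id] = text
        elif action == "cancel":
            active.pop(notice_id, None)
    return active


# Hypothetical data: one archived notice, then two later changes.
snapshot = {"N1": "Taxiway B closed"}
change_log = [
    ("add", "N2", "Runway 09/27 closed"),
    ("cancel", "N1", None),
]
current = rebuild_state(snapshot, change_log)
print(current)  # {'N2': 'Runway 09/27 closed'}
```

Restoring the snapshot alone would bring back the cancelled taxiway notice and lose the runway closure, which is exactly the risk described above. Replay only helps if the log itself is intact, of course.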

bluevoter4life

(788 posts)
9. Yep. You'd be horrified what old tech we use
Wed Jan 11, 2023, 08:44 PM

I'm ATC and some of our computers are still using a DOS-based operating system. Some of our equipment is so old, they are starting to have problems finding replacement parts. Our radar system is several generations behind the rest of the developed world, and we still use paper flight strips.

iluvtennis

(19,871 posts)
11. Wow DOS. But not surprised given all of the 60s based COBOL programs
Wed Jan 11, 2023, 09:04 PM

still crunching business processes.

NullTuples

(6,017 posts)
16. I know a programmer who wrote a small COBOL program...
Wed Jan 11, 2023, 09:17 PM

Over the years the city moved from IBM mainframe for that functionality to Windows based COBOL, but his program didn't change. Then they wrapped it in something Java-esque to present the data it exposed to web users. He moved cross-country, then retired. But he's heard from friends that his module is still running, inside several layers of wrappers, because the source is long gone & it's not worth it to reverse engineer (it's for a single, narrow function that's due to be replaced any day now...for the last 20 years). At this point it's just a black box compiled executable that will be in use until 32-bit is retired.

Lithos

(26,404 posts)
22. They use tons of strategies to remove the old hardware
Wed Jan 11, 2023, 10:29 PM

Sounds like they rehosted the code into an emulator more than likely running in the cloud. Other strategies include taking the old COBOL and C code, compiling it to some intermediate model and then converting it to a more modern architecture with the goal of removing the "COBOL" flavorings. It gives code which is more approachable for today's developers to maintain.

Though frankly you can create extremely unmaintainable Java by over-using Dependency Injection and overly complicated OOP.

L-

Evolve Dammit

(16,763 posts)
30. I heard maybe 15 years ago that our nuclear systems still had similar outdated tech. Tell me this
Thu Jan 12, 2023, 08:43 AM

won't bite us in the ass at some point. I give up.

Major Nikon

(36,827 posts)
35. Still it works better than anywhere else
Thu Jan 12, 2023, 10:56 AM

Nowhere else in the world will you find an ATC system that moves as many airplanes anywhere near as efficiently and as safely as in the US.

Newer technology isn't always better. If a system works reliably, does the job intended, and is sustainable, replacing it with a system that happens to be newer doesn't always result in any improvement and could be a step backwards.

As far as radar systems go, ADS-B supplemented data is the way of the future and the US has a far better system than anywhere else in the world and it has far better growth potential.

rickford66

(5,528 posts)
5. I had a copy command drop a line of a digital file.
Wed Jan 11, 2023, 08:11 PM

There was a manual copy needed in the process. Pretty straightforward. Well the system crashed and I finally had to eyeball the two files line by line. This was on a customer's proprietary S/W with no DIFF command. Thanks Murphy.
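For comparison, where a scripting language is available the same line-by-line check takes a few lines. This is a small sketch using Python's standard `difflib` module; the sample lines and the `dropped_lines` helper are invented for illustration.

```python
import difflib


def dropped_lines(original, copied):
    """Return lines from the original that have no match in the copy."""
    return [
        line[2:]
        for line in difflib.ndiff(original, copied)
        if line.startswith("- ")
    ]


# Hypothetical file contents: the copy is missing the middle line.
original = ["line one\n", "line two\n", "line three\n"]
copied = ["line one\n", "line three\n"]

print(dropped_lines(original, copied))  # ['line two\n']
```

Lines that were altered rather than dropped are reported too, since the changed original line has no exact match in the copy.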

BumRushDaShow

(129,440 posts)
8. I have a little Raspberry Pi4
Wed Jan 11, 2023, 08:39 PM

running a local weather station data capturing and formatting web program that was written in python and I know editing that can be a bear... And although I always keep a backup of the previous config files before editing, simple little misspellings or misplaced brackets can torpedo the program.

rickford66

(5,528 posts)
10. Glad I'm retired
Wed Jan 11, 2023, 08:44 PM

Third shifts in cold computer rooms are not missed.

I should mention the system I was working on was a Falcon 50 simulator and the Notice To Airmen (NOTAM) was one of the updates I had to debug.

BumRushDaShow

(129,440 posts)
12. Retired here too!
Wed Jan 11, 2023, 09:09 PM

My dad was a programmer for the VA (before they became a department) from the '50s - '70s programming COBOL (for veterans' checks). He used to bring his punch card decks home and we used to play with the mag tape write-protect tabs that he also would hand us.

Never had the patience to do programming but had a PASCAL course in college and did just rudimentary other languages as they came out, for hobby stuff to at least be able to customize the configs.

I can imagine trying to debug something like that though when your backup is flaky.

regnaD kciN

(26,045 posts)
26. Don't you mean...
Thu Jan 12, 2023, 12:39 AM
"I should mention the system I was working on was a Falcon 50 simulator and the Notice To Airmen (NOTAM) was one of the updates I had to debug."

...Notice to Air Missions?

I just noticed that change in terminology today. While I have no problem with adopting gender-neutral language, it amuses me to no end that "NOTAM" is such a familiar acronym that they had to stretch to find a new name that would fit the acronym (never heard of civilian flights being called "air missions" before), instead of just coming up with a logical name, like "Notice to Pilots," and then creating a new acronym from that.

rickford66

(5,528 posts)
29. It was "Airmen" when I worked on it.
Thu Jan 12, 2023, 08:16 AM

We would have loved to have Airwomen around. The only women we ever saw were stewardesses at the various airline training centers. They did occasionally stop by for a tour, but I heard rumors of pilots bringing in GFs for a simulated mile high flight.

chowder66

(9,080 posts)
6. Also happened in Canada.
Wed Jan 11, 2023, 08:24 PM

Canada’s air traffic system suffered a similar outage to the one that occurred in the US for a brief period on Wednesday.

US air travel was badly disrupted by the failure of the Federal Aviation Administration’s Notice to Air Missions system (NOTAM) overnight on Tuesday, forcing a full ground stop of domestic aviation on Wednesday morning.

Nav Canada, the Canadian national air navigation service provider, released a statement just after 12.30pm as US airlines struggled to resume normal service.


https://www.independent.co.uk/news/world/americas/canada-flights-system-outage-grounded-b2260500.html

BumRushDaShow

(129,440 posts)
14. Could be that whatever files were corrupted
Wed Jan 11, 2023, 09:14 PM

were transferred between systems as I expect these systems interoperate through some special data pipe for obvious cross-border continuity purposes.

regnaD kciN

(26,045 posts)
27. I had the same thought...
Thu Jan 12, 2023, 12:43 AM

It seems unbelievable that the FAA's system is so fragile that a single accidentally-corrupted file could shut down the entire airspace, without the ability to restore a backup and continue as usual.

OTOH, we haven't been hearing much about Russian cybersabotage recently, have we?

Aussie105

(5,432 posts)
28. Could be the Russians or Aliens!
Thu Jan 12, 2023, 06:40 AM

Or Elon Musk messing about.

More likely though, this was a file error that didn't rear its ugly head until well after it became part of the backups that were kept.

Find the error in your working file, look at the stored time sequence of backups and find the same error.

It happens.
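That search through the backup sequence can be sketched as a binary search. It assumes the backups are ordered oldest to newest and that once the error appears it stays in every later backup; the `first_bad_backup` name and the `is_corrupt` check are hypothetical stand-ins for whatever validation is actually available.

```python
def first_bad_backup(backups, is_corrupt):
    """Return the index of the earliest corrupt backup, or None if all are clean.

    Assumes backups are ordered oldest to newest and that once the error
    appears it is present in every later backup.
    """
    low, high = 0, len(backups)
    while low < high:
        mid = (low + high) // 2
        if is_corrupt(backups[mid]):
            high = mid  # the first bad backup is at mid or earlier
        else:
            low = mid + 1  # everything up to mid is clean
    return low if low < len(backups) else None
```

The backup just before the returned index is the newest clean one. With a long history this needs far fewer checks than inspecting every backup in order.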
