
I work at a science facility where the software and physical configuration of a large collection of instruments is modified for experimental research and must then be returned to precisely its previous state and functionality. There are some great solutions and expertise in place at our facility already (particularly for safety-critical systems), but I want to learn how to write good-quality checklists and verification procedures so that I don't personally make mistakes.

The closest thing I've found so far is NASA's lesson learned on the NOAA-N Prime mishap (https://llis.nasa.gov/lesson/1580), in which the configuration was changed and, in the lesson's words:

The necessary 24 bolts to secure the adapter plate were not in place and the team relied on paperwork rather than through visual and mechanical verification as required by the procedures.

Does anyone happen to know what this field is called, or have any recommendations on books or standards that cover how to write these kinds of procedures and checklists?

I've been looking at books on "change management" or "change control", but this isn't quite right.

Crew Resource Management and airline maintenance checklists are exactly what I'm looking for, but I haven't found a great guide on how to write them.

Apologies if this isn't the right SE site or the question is too open-ended!

0xDBFB7
  • Service bulletins and work procedures come to mind. – Solar Mike Jun 14 '22 at 05:50
  • I'd think of an FMEA (failure mode and effects analysis), where you go through part by part, step by step, and list everything that could fail, what the result of the failure would be, what would cause it, and how severe the result is. Then you work out how to prevent it from happening, estimate how much that reduces the risk, and define whether and how it can be checked, so you can detect when the prevention wasn't enough. – kruemi Jun 14 '22 at 08:32
  • It's called configuration control. It's what didn't happen when an F-15 Strike Eagle was leased to a test range as a photo bird. The guns were replaced with cameras. When it was returned and refurbished with guns, the crew set off on a check flight and toggled the weapons enable switch as part of the normal preflight on the ramp. This emptied the 20mm cannon. It turned out the camera crew had rewired the switch so the camera came on when they hit the enable switch. Somehow, no one got hurt, with 35,000 people in the general area. – Phil Sweet Jul 02 '22 at 20:33
  • https://www.product-lifecycle-management.com/mil-hdbk-61a-6-1.htm – Phil Sweet Jul 02 '22 at 20:40
  • To need a "field" dedicated to trusting reality and the actual state of things over what people say or write says a lot about the state of things... but "configuration control" or "configuration management" is what you seek (it's about paperwork which unfortunately does not address the needs directly) – Abel Jul 03 '22 at 12:54
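kruemi's FMEA suggestion can be sketched as a small worksheet. A common FMEA convention (not stated in the comment itself) is to rate each failure mode's severity, occurrence and detection on 1 to 10 scales and rank the rows by the risk priority number, RPN = S × O × D. The rows below, including the bolt example, are hypothetical illustrations, not real data.

```python
from dataclasses import dataclass

@dataclass
class FailureMode:
    step: str        # procedure step being analysed
    failure: str     # what could go wrong at this step
    effect: str      # consequence if it does
    severity: int    # 1 (negligible) .. 10 (catastrophic)
    occurrence: int  # 1 (very unlikely) .. 10 (almost certain)
    detection: int   # 1 (certain to be caught) .. 10 (will not be caught)

    @property
    def rpn(self) -> int:
        # Risk priority number: higher means "address this first".
        return self.severity * self.occurrence * self.detection

def ranked(modes):
    """Return failure modes sorted from highest to lowest RPN."""
    return sorted(modes, key=lambda m: m.rpn, reverse=True)

# Hypothetical worksheet rows for a reconfigure-and-restore procedure.
worksheet = [
    FailureMode("Reinstall adapter plate", "Bolts not reinstalled",
                "Fixture falls when moved", severity=9, occurrence=3, detection=8),
    FailureMode("Restore control software", "Old config file not restored",
                "Instrument runs with experimental settings",
                severity=6, occurrence=4, detection=4),
    FailureMode("Reconnect sensor cable", "Cable left unplugged",
                "Missing data channel", severity=4, occurrence=3, detection=2),
]

for m in ranked(worksheet):
    print(f"{m.rpn:4d}  {m.step}: {m.failure}")
```

Adding a better check (for example, replacing a paperwork sign-off with a physical inspection of the bolts) lowers the detection rating and therefore the RPN, which corresponds to the "how can something be checked" part of kruemi's description.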

2 Answers


The fields that would incorporate this would be:

Some of this requires personal experience and knowledge of systems, work practices and technology. It usually involves more than one person, so that you can draw on a pool of collective knowledge, particularly when dealing with unfamiliar situations.

Fred

Fred and kruemi's pointers to FMEA are very helpful - I hadn't considered including human error in an FMEA chart. All the same, I feel like there must be a more detailed guideline that e.g. the aviation industry follows for their checklists.

Meister and Enderwick, Human Factors in System Design, Development, and Testing (CRC Press, 2001), isn't exactly what I was looking for, but it has some interesting pointers.

(Amusingly, one chapter is devoted to studies on human factors that occur when human factors designers try to find and apply guidelines in human factors handbooks, which is very self-referential. I expect the next level will be "Human factors design of 'Human factors in Human Factors in System Design, Meister & Enderwick on human factors handbooks'")

One field I thought might be relevant is "task analysis", in particular "time-focused analysis" or "timeline charts", in which the operator's mental workload is added to the procedure flow chart.
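A minimal sketch of the timeline idea, assuming the common timeline-analysis approach of comparing the effort a procedure demands against what one operator can supply in each time interval. The task list, the 0 to 1 "demand" rating, the one-minute interval and the overload threshold of 1.0 are all hypothetical choices for illustration.

```python
import math
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    start: float   # minutes from the start of the procedure
    end: float     # task occupies [start, end)
    demand: float  # hypothetical rating: fraction of one operator's attention (0..1)

def load_profile(tasks, horizon, interval=1.0):
    """Return (interval_start, load) pairs: the summed demand of all tasks
    overlapping each interval, weighted by how much of it they cover."""
    profile = []
    for i in range(math.ceil(horizon / interval)):
        t0, t1 = i * interval, (i + 1) * interval
        load = 0.0
        for task in tasks:
            overlap = max(0.0, min(task.end, t1) - max(task.start, t0))
            load += task.demand * overlap / interval
        profile.append((t0, load))
    return profile

def overloaded(profile, threshold=1.0):
    """Start times of intervals where the required effort exceeds the
    (assumed) capacity of a single operator."""
    return [t0 for t0, load in profile if load > threshold]

# Hypothetical steps from a reconfiguration procedure.
tasks = [
    Task("Swap detector cable", 0, 4, 0.6),
    Task("Monitor vacuum gauge", 2, 6, 0.5),
    Task("Log serial numbers", 3, 5, 0.3),
]

print(overloaded(load_profile(tasks, horizon=6)))  # → [2.0, 3.0]
```

Intervals flagged this way are candidates for reordering steps or adding a second person, which is where an error-prone moment in a checklist is most likely to hide.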


A few handbooks mentioned in the text:

  • MIL-STD-1472F (1999)
  • Human Factors Design Handbook (Woodson et al., 1992)
  • Engineering Data Compendium (Boff and Lincoln)

Other suggestions in the text:

  • Software and physical simulation. In the SRE world this is a very popular tactic (the production/test server split): build a non-critical replica of the system and try the new configuration on it first; software simulation is also used in, e.g., unit and integration tests. In my situation it obviously isn't practical to build a complete replica of the facility, but I can identify situations where mocking up certain parts would have caught procedural mistakes. Physical simulators are also used extensively in aerospace to develop checklists.
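As a rough illustration of rehearsing a procedure against a mock-up, the sketch below snapshots a mocked instrument's configuration, runs hypothetical "experiment" and "restore" step lists against it, and diffs the result against the baseline. All class names, settings and steps are invented for illustration. On real equipment the final check should read the state back from the hardware (or inspect it physically) rather than trust the procedure's own record, which is the lesson of the NOAA-N Prime mishap.

```python
import copy

class MockInstrument:
    """Hypothetical stand-in for one instrument's settable configuration."""
    def __init__(self):
        self.config = {"gain": 10, "filter": "lowpass",
                       "trigger": "internal", "mount_bolts": 24}

    def set(self, key, value):
        self.config[key] = value

    def read_back(self):
        # On real hardware this would query the device, not trust a log.
        return copy.deepcopy(self.config)

def run(instrument, steps):
    """Apply a procedure given as a list of (setting, value) steps."""
    for key, value in steps:
        instrument.set(key, value)

def diff(baseline, current):
    """Return {setting: (expected, actual)} for every setting that differs."""
    keys = baseline.keys() | current.keys()
    return {k: (baseline.get(k), current.get(k))
            for k in sorted(keys) if baseline.get(k) != current.get(k)}

inst = MockInstrument()
baseline = inst.read_back()

# Hypothetical procedures; the restore list "forgets" the trigger setting.
experiment = [("gain", 50), ("trigger", "external"), ("mount_bolts", 0)]
restore = [("gain", 10), ("mount_bolts", 24)]

run(inst, experiment)
run(inst, restore)
print(diff(baseline, inst.read_back()))  # → {'trigger': ('internal', 'external')}
```

The point of the design is that the restore procedure is judged against an independently captured baseline, so a step missing from the written procedure shows up as a difference instead of being signed off.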

I was able to contact the individuals responsible for the organization's quality process management, and they seemed to have the right training and were very receptive, so that might be another good place to start.

0xDBFB7