Kahveh SaramoutMarch 24, 2023

How to leverage foundation models in manufacturing

For several years, I worked in robotics at a large pharmaceutical manufacturing company, bringing in emerging technology to automate testing processes and manage incoming and outgoing materials. While the work I did ended up saving the company $50M over five years, I realized that the real problem I was trying to solve wasn't the application of the automation assets themselves, but the data architecture that enabled them to communicate, share information to other systems, and execute tasks based on events.

Building a process like this, which takes significant capital investment, just isn't sustainable on a site-by-site basis — it needs to be a standardized architecture that gets deployed globally. This realization is no secret for those in the field — the process of applying emerging technologies, integrating them at the data level, standardizing their design, and deploying them quickly and globally is known as the fourth industrial revolution.

Today, we're standing on the precipice of the fifth: AI.

When I first used tools like Copilot, ChatGPT, Stable Diffusion, and DALL-E, I was blown away. It was immediately clear that I was witnessing a paradigm shift in technology. For the first time, I experienced a computer program that was able to infer context, follow instruction in natural language, use knowledge from other sources, and make decisions in a human-like way. These programs, known as foundation models, can be fine-tuned for a particular application and perform exceptionally well at that application with significantly less development and training than typical algorithms.

Think about all the situations in your factory where you need humans to watch a process, identify an event, and then make a decision based on SOPs or other protocols. The bulk of this work can be offloaded to large language models (LLMs). In situations where a human is still needed, LLMs can significantly improve peoples’ access to information and guide them on how to solve a problem within the framework of their organizations SOPs, technical specifications, and other process defining documents.

Save millions by leveraging large language models

Extracting value from LLMs is easier than you may think. In these examples, I used GPT3.5 and GPT4 to solve problems and make decisions in a simulated factory environment. I shared specific SOPs, tech specs, details of how to communicate with the Manufacturing Execution System (MES), the status of individual manufacturing assets, and the production schedule of multiple manufacturing sites with the LLM.

Using GPT3.5 to help troubleshoot an alarm with a palletizer and manage communications and scheduling



The robotic palletizer on line 4 is throwing an error code on each reboot. What can I try? 

I understand. Based on the error code, it appears that there may be an issue with the communication between the robotic palletizer and the control system. One option to try would be to perform a system reboot to see if that clears the error. If the issue persists, I suggest checking the network connections and communication protocols to ensure they are configured correctly. If needed, I can also dispatch a notification to the maintenance team to investigate and resolve the issue.

Let the maintenance team know and inform the vendors on-site field engineers of the problem. Inform on-call operations associate of the downtime. Monitor the line status and make changes to the schedule as needed. 

Understood. I have dispatched a notification to the maintenance team to investigate the issue with the robotic palletizer on Line 4. Additionally, I have contacted the on-site field engineers from the vendor to provide assistance if needed. I have also informed the on-call operations associate about the downtime and the potential impact on the production schedule.

I am now monitoring the status of Line 4 and will update the production schedule accordingly. If the maintenance team requires more time to resolve the issue, I can assess the available production time on other lines and adjust the production plan accordingly to minimize any production delays.

Expanding on this example, I checked to see whether the LLM can help make up for a production shortfall by utilizing capacity at other sites.



Check if there is production time available on other lines, in other sites, to make up the production shortfall.

Certainly. I am accessing the production schedule for all lines across all sites and analyzing the available production time. Based on the current production schedule and inventory levels, I have identified that there is available production time on Line 3 in Site 2. I will update the production plan to increase production on Line 3 and will dispatch a notification to the material handling system to transport any necessary components to Site 2 to support this increased production.

Using the supplied production schedule and status of equipment at other sites, the LLM was able to correctly identify capacity at another location, potentially reducing the impact of down time.

I assumed a conservative reduction in the need for three line leads across all shifts, a 12-month implementation time, and a $500,000 capital investment (primarily for contractor time spent building the required data interfaces for existing systems), the financial benefits are substantial. Leaving aside the savings due to reduction in downtime and automated production rescheduling (which has the potential to be massive), over a 5-year period, the net present value (NPV) is estimated at $1 million, with an internal rate of return (IRR) of 66% and a gross return of 260%.

To take it further, I replicated this system at seven other facilities. Leveraging the significant capital investment reduction enjoyed when I replicated assets, I invested $1.4M across 7 sites. In the first year, only our first POC system is online. In year two, we will bring four other facilities online. In year three, we will bring the last two sites online. The value of this project now sits at a very cool $5.5M five-year net-present value (NPV), an internal rate of return (IRR) of 77%, and a gross return of 539%. This is how you supercharge your career — not to mention your company.

Using GPT4 to troubleshoot an alarm

In this example, I took on the persona of a maintenance technician. GPT4 has access to the state of the machine and our tech specs. The LLM communicates with the technician and uses its knowledge of the tech specs to help troubleshoot the alarm code. Without the use of a LLM, the technician would have had to search the document repository for the relevant tech spec and spend time reading and understanding the document. Using the LLM to automate that process gets us to a decision much faster.



I am standing at the workcell and see error code 79. What can I try?

Error code 79 on a FANUC robotic arm usually indicates a servo motor problem. The error typically means that there is a mismatch between the position of the servo motor and its expected position as determined by the control system. Here are a few steps you can try to troubleshoot the problem:

  1. Check for mechanical issues: Inspect the robotic arm for any signs of mechanical issues, such as loose bolts or damaged components that could be causing excessive play or backlash in the arm's movement.

  2. Inspect the cables and connectors: Check all cables and connectors related to the servo motor, including the power and feedback cables. Look for any signs of damage or loose connections that could be causing intermittent communication with the controller.

  3. Verify the motor and encoder: Confirm that the servo motor and its associated encoder are functioning correctly. You can use the robot controller's diagnostic tools to monitor the motor's performance and verify that it is responding as expected.

  4. Reset the error: After addressing any issues you have identified, try resetting the error by powering off the robotic arm, waiting a few moments, and then powering it back on.

I inspected the cables and found a damaged connector. I replaced the connector and power cycled the machine. I now see the error code C1650.

If are encountering error code C1650 after replacing the damaged connector and power cycling the machine, you can try the following steps to further troubleshoot the issue:

  1. Check for software or programming errors: Ensure that the robot program is correct and doesn't contain any errors that might cause the system to generate an error code. If necessary, reload a known-good program and test the robot arm again.

  2. Inspect the servo amplifier: The servo amplifier is responsible for controlling the servo motor. Ensure that the servo amplifier is properly connected, functioning correctly, and not generating any fault codes.

  3. Perform a system reset: In some cases, performing a complete system reset can help clear persistent error codes. Follow the procedure outlined in your FANUC robotic arm user manual for resetting the system.

  4. Review the error history: Check the error history on the robot controller to see if any other errors have occurred recently. This information can help you identify any patterns or related issues that might be contributing to the C1650 error.

I reloaded a known good program and it is operating correctly. Does this mean I need to contact central automation engineering to evaluate the code?

Yes, since reloading a known-good program resolved the issue, it is likely that the problem originated from a software or programming error in the previous program. In this case, it would be a good idea to contact the central automation engineering team to evaluate the original code. They can help identify and fix any programming errors or discrepancies that might have caused the C1650 error. By doing so, we can ensure the robot arm operates correctly and avoid similar issues in the future.

Send a Slack message to central automation engineering with a memo of what we tried. 

Message sent to Central Automation Engineering Slack channel:

Subject: Memo - Troubleshooting Robotic Arm Issue

Hello Central Automation Engineering team,

I'd like to update you on the recent troubleshooting efforts for the robotic arm error we encountered. The initial issue was identified as error code C1650. Here's a summary of the steps we took:

Inspected cables and found a damaged connector.

Replaced the damaged connector.

Power cycled the machine, which resulted in error code C1650.

Reloaded a known good program.

After reloading the known good program, the robotic arm began operating correctly. We would like your team to evaluate the code to determine the root cause and deploy a fix.

Thank you

Based on my assumptions, the system implementation would take twelve months and cost $500,000 in capital investment. With a 10% productivity increase across the maintenance staff and a 20% reduction in maintenance-related downtime, we will enjoy a five-year NPV of $700,000, an IRR of 53%, and a gross return of 200%.

Once again, I replicated this project across seven sites. The enterprise wide value is a five-year NPV of $5M, IRR of 75%, and gross return of 479%.

Adopting AI and foundation models

The opportunities are endless when it comes to AI and foundation models, but adopting this technology at the usual pace won’t be enough to help you maximize its value — you will need to move quickly. We are in a winner-take-all economy, so lagging behind in the adoption of emerging technologies can have severe repercussions on competitive standing. By hesitating to implement these innovations, organizations inadvertently provide competitors with an opportunity to outpace them. The good news is that we are still early, as even the most advanced manufacturing organizations are just now beginning to develop POCs.

Stay tuned next week for part two of this blog series, where we’ll discuss best practices and tips for building an AI-friendly architecture that can help you turn your unstructured data into powerful, cost-saving solutions faster than your competitors.