So I have been working on a list of blog topics. I actually have an outline that is 3 pages long, but I am not going to blog on any of the topics on the outline because my experience this week at the client site I think was useful and relevant. We are at crunch time. A lot of things have stacked up and as usual we are working to a very aggressive deadline. In recent years I have typically held many leadership roles on SAP projects. Usually heading up the technical team, working as the project architect or serving as the overall project manager for the SAP project. In my current roll I heading up the Basis Team in a very hands on role. I have been doing my best not to interfere with the development team. The development team is headed up by a good friend of mine and a very talented senior SAP consultant. This week we have had a number of SAP performance defects logged. Upon a quick initial analysis it was clear that the SAP performance issues were not Basis related. The development team is currently buried in the perfect storm of conversions finally coming together and the test team finally kicking the tires on the rest of the delivered development interface objects. This gave me the opportunity to pitch in get my hands dirty and look at some code and performance issues. We knocked off a significant SAP ABAP performance issue with a 100 fold improvement when all was said and done. Looking back on the problem it was obvious to me, but I missed a couple of clues that should have pointed me to the problem even sooner. This is what this blog post is about.
I started off staging a test to look at a performance issue related to an SAP customer load and ended up diagnosing a custom SAP invoice interface problem with Vistex. We had staged a test to look at why SAP V2 update processes where significantly bottle necking when the customer load program ran. Just before I pulled the trigger on the test. I decided to fire up Topas on the AIX to look at how the SAP server was running. That is when I saw something very interest. A custom developed SAP invoice interface program had been running for nearly 16 hours and was consuming nearly 1/3 of the CPU capacity of the instance (3 Cores – 4 and change GHz processors). I did a little checking and we were processing a file with approximately 47,000 invoices. Now that just ain’t right! You should be able to process 100’s of thousands of invoices an hour on an SAP instances configured as we are configured.
It was a programming problem and just using Topas, I should have been able to predict the problem and known where to look. I will explain why I should have known where the problem was and then go into how I actually figured out the issue. The smoking gun should have been the following clues. The SAP server we are testing on is a 2 box instance, separate SAP CI and DB. Topas was reporting one SAP batch process on the CI was consuming 1/3 of the CPU capacity on the box, that process was consuming a pretty normal amount of memory and was stable. The SAP DB server throughout the process was under no visible load. There was also no real network activity to be measured. The interface program reads a file, formats the data and passes the data to a standard SAP posting function. We were posting an invoice approximately once every 2 seconds.
Given just the above information what do we know? Experience tells me that my server should be sized to handle hundreds of thousands of invoices an hour, we are processing 2 per second. My process is consuming significant CPU. My process is stable from a memory perspective. My process is not hitting the DB. What is my process doing? It is crunching though ABAP. More specifically it is crunch through ABAP and it is not doing DB operations.
So what was the problem? We had 3 nested loops. The outer loop had 50k rows, the next loop had 150k rows, and the final loop had 800k rows. That is a lot of looping. In testing the developer used small files maybe 20 invoices max in the tests. While the code was still broken during unit testing it would not be visible with small test files. So how did I go about finding the problem? I kicked off a SQL trace using ST01 and could see that for each invoice we had about 2000 microseconds of operations. I then reviewed the code, selection screen, file import, and then processing. It took less than a minute to import the file and start the processing routine. I commented out the call to the posting function module (AKA the heavy lifting) and replaced it with a write statement. It was taking approximately 2 seconds per write still. The problem therefore lied in the routine that formatted the data from the file to present it to the posting function. As soon as I opened that part of the code it was obvious we had nested loops and looking at the counts on the internal tables it was pretty clear why it was running so long.