Using The Disassembler To Highlight Optimization Targets


The changes from this blog post can be seen in our GitHub Pull Request:

It’s sometimes amazing how many string functions show up in profiling runs of UE4 – whether running final games, commandlets (such as content cooking) or the editor.

The FPaths::IsRelative() function showed up on a recent test – this can be found in Paths.cpp:-

bool FPaths::IsRelative(const FString& InPath)
{
  const bool IsRooted = InPath.StartsWith(TEXT("\\"), ESearchCase::CaseSensitive)  ||
              InPath.StartsWith(TEXT("/"), ESearchCase::CaseSensitive)  ||
              InPath.StartsWith(TEXT("root:/")) |
              (InPath.Len() >= 2 && FChar::IsAlpha(InPath[0]) && InPath[1] == TEXT(':'));
  return !IsRooted;
}

It looks innocent enough, right..? Just some harmless string tests to determine whether InPath is a relative path (e.g. “../engine/myfile.uasset”) or an absolute one (e.g. “c:\\myfile.uasset”).

To look into this one, I did what I often do… fired up the debugger, set a breakpoint in the function … and then looked at the disassembly. It was at this point that the true horror of the situation became immediately apparent. Here’s a small piece of the disassembly:-

00007FF6DFF97B4E  mov         ecx,2  
00007FF6DFF97B53  xor         edi,edi  
00007FF6DFF97B55  xor         edx,edx  
00007FF6DFF97B57  mov         r8d,ecx  
00007FF6DFF97B5A  mov         qword ptr [rsp+70h],rbx  
00007FF6DFF97B5F  mov         dword ptr [rbp+28h],edi  
00007FF6DFF97B62  mov         qword ptr [rbp-10h],rdi  
00007FF6DFF97B66  mov         qword ptr [rbp-8],2  
00007FF6DFF97B6E  call        DefaultCalculateSlack (07FF6DFEC7DD0h)  
00007FF6DFF97B73  movsxd      rcx,eax  
00007FF6DFF97B76  mov         rax,qword ptr [rbp-10h]  
00007FF6DFF97B7A  mov         dword ptr [rbp-4],ecx  
00007FF6DFF97B7D  test        rax,rax  
00007FF6DFF97B80  jne         FPaths::IsRelative+46h (07FF6DFF97B86h)  
00007FF6DFF97B82  test        ecx,ecx  
00007FF6DFF97B84  je          FPaths::IsRelative+5Bh (07FF6DFF97B9Bh)  
00007FF6DFF97B86  mov         rdx,rcx  
00007FF6DFF97B89  xor         r8d,r8d  
00007FF6DFF97B8C  mov         rcx,rax  
00007FF6DFF97B8F  add         rdx,rdx  
00007FF6DFF97B92  call        FMemory::Realloc (07FF6DFF04CB0h)  
00007FF6DFF97B97  mov         qword ptr [rbp-10h],rax  
00007FF6DFF97B9B  lea         rdx,[ToUpperAdjustmentTable+2ABCh (07FF6E1EFFB9Ch)]  
00007FF6DFF97BA2  mov         r8d,4  
00007FF6DFF97BA8  mov         rcx,rax  
00007FF6DFF97BAB  call        FGenericPlatformString::Memcpy (07FF6DFED3CA0h)  
00007FF6DFF97BB0  lea         rdx,[rbp-10h]  
00007FF6DFF97BB4  xor         r8d,r8d  
00007FF6DFF97BB7  mov         rcx,rsi  
00007FF6DFF97BBA  mov         ebx,1  
00007FF6DFF97BBF  call        FString::StartsWith (07FF6DFEDD440h)  
00007FF6DFF97BC4  test        al,al  
00007FF6DFF97BC6  jne         FPaths::IsRelative+1BBh (07FF6DFF97CFBh)

The code above accounts for just a quarter of the whole function. All of this is just to do a single line of code from the C++… scary!

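Not only is the code long, you should note the call to FMemory::Realloc() … that’s just one of the three that occur in the full disassembly. Later in the code, and not shown in the snippet above, were calls to FMemory::Free() (three of these, too). The reason is that StartsWith() only has an FString implementation, so each TEXT() literal passed to it is first copied into a temporary FString (the DefaultCalculateSlack, Realloc and Memcpy calls above) and then freed again once the test is done. And finally, StartsWith() itself isn’t exactly a cheap function to be calling here either.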
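To make the hidden cost concrete, here is a minimal sketch in plain C++ rather than UE4. HeapString, GAllocCount, GFreeCount and IsRootedNaive are hypothetical names invented for this illustration; HeapString is only a stand-in for a heap-backed string class and is not FString. It assumes, as in the code above, that the only StartsWith() overload takes the string class itself, so every literal has to be converted first.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical counters, for illustration only.
static int GAllocCount = 0;
static int GFreeCount = 0;

// Hypothetical stand-in for a heap-backed string class (not UE4's FString).
class HeapString
{
public:
    // Implicit conversion from a literal: allocates and copies every time.
    HeapString(const char* Literal)
        : Length(std::strlen(Literal)), Data(new char[Length + 1])
    {
        ++GAllocCount;
        std::memcpy(Data, Literal, Length + 1);
    }

    HeapString(const HeapString& Other)
        : Length(Other.Length), Data(new char[Length + 1])
    {
        ++GAllocCount;
        std::memcpy(Data, Other.Data, Length + 1);
    }

    HeapString& operator=(const HeapString&) = delete;

    ~HeapString()
    {
        ++GFreeCount;
        delete[] Data;
    }

    // The only overload: the prefix must already be a HeapString.
    bool StartsWith(const HeapString& Prefix) const
    {
        return Prefix.Length <= Length
            && std::memcmp(Data, Prefix.Data, Prefix.Length) == 0;
    }

private:
    std::size_t Length;
    char* Data;
};

// Each literal below becomes a temporary HeapString that is allocated,
// compared against, and then freed at the end of the expression.
bool IsRootedNaive(const HeapString& Path)
{
    return Path.StartsWith("\\") || Path.StartsWith("/") || Path.StartsWith("root:/");
}

// For a relative path none of the prefixes match, so all three
// temporaries get built: one allocation for Path plus three more.
void CheckTemporaryCount()
{
    GAllocCount = 0;
    GFreeCount = 0;
    HeapString Path("../engine/myfile.uasset");
    assert(!IsRootedNaive(Path));
    assert(GAllocCount == 4);
    assert(GFreeCount == 3);
}
```

Calling CheckTemporaryCount() shows three allocations and three frees for what reads like three trivial comparisons, which lines up with the three Realloc/Free pairs seen in the disassembly.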

So… here’s what I did:-

  1. reduced the number of calls to StartsWith() by replacing tests such as InPath.StartsWith(TEXT("\\"), ESearchCase::CaseSensitive) with direct character comparisons, ((InPath[0] == '\\') && (InPath[1] == '\\')) (nb. you also need to check the length of InPath first, to make sure that we’re not accessing invalid memory);
  2. removed the remaining runtime TEXT() conversion by constructing that FString once, outside the function, as a static member;
  3. wrapped one of the tests with WITH_EDITOR (the root pathing is only appropriate there).
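The three steps above can be sketched in isolation using std::string instead of FString. Everything named here (SKETCH_WITH_EDITOR, StartsWithDoubleBackslash, GRootPrefix, StartsWithRootPrefix) is hypothetical and exists only for this illustration; the prefix comparison is also case-sensitive, which is a simplification on my part.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical switch standing in for UE4's WITH_EDITOR define.
#ifndef SKETCH_WITH_EDITOR
#define SKETCH_WITH_EDITOR 1
#endif

// Step 1: compare characters directly, with the length checked first
// so that Path[1] is never read from a string that is too short.
bool StartsWithDoubleBackslash(const std::string& Path)
{
    return Path.size() >= 2 && Path[0] == '\\' && Path[1] == '\\';
}

#if SKETCH_WITH_EDITOR
// Step 2: the prefix is built once, during static initialisation,
// rather than being rebuilt from a literal on every call.
static const std::string GRootPrefix = "root:/";

// Step 3: this test is only compiled when the editor switch is on.
bool StartsWithRootPrefix(const std::string& Path)
{
    return Path.compare(0, GRootPrefix.size(), GRootPrefix) == 0;
}
#endif

void CheckSteps()
{
    assert(StartsWithDoubleBackslash("\\\\server\\share"));
    assert(!StartsWithDoubleBackslash("\\"));
    assert(!StartsWithDoubleBackslash(""));
#if SKETCH_WITH_EDITOR
    assert(StartsWithRootPrefix("root:/Game"));
    assert(!StartsWithRootPrefix("roo"));
#endif
}
```

The length guard matters most: without it, the empty-string and single-character cases in CheckSteps() would read past the end of the string.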

My final code looked like this:-

// Paths.cpp:-

#if WITH_EDITOR
FString FPaths::RootPrefix = TEXT("root:/");
#endif // WITH_EDITOR

bool FPaths::IsRelative(const FString& InPath)
{
  const uint32 PathLen = InPath.Len();

  const bool IsRooted = PathLen &&
    ((InPath[0] == '/') ||
      (PathLen >= 2 && (
        ((InPath[0] == '\\') && (InPath[1] == '\\'))
        || (InPath[1] == ':' && FChar::IsAlpha(InPath[0]))
#if WITH_EDITOR
        || (InPath.StartsWith(RootPrefix))
#endif // WITH_EDITOR
        ))
      );
  return !IsRooted;
}

// Paths.h:-


#if WITH_EDITOR
  static FString RootPrefix;
#endif // WITH_EDITOR
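When rewriting a predicate like this, it is worth running the old and new logic over a handful of paths to see where they agree. The following is a rough, portable paraphrase of both versions using std::string and std::isalpha. IsRelativeOld, IsRelativeNew and CheckVersions are hypothetical names, and the editor-only "root:/" test is left out of both, so this models a non-editor build only.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Paraphrase of the original test ("root:/" check omitted).
bool IsRelativeOld(const std::string& Path)
{
    const bool bIsRooted =
        Path.compare(0, 1, "\\") == 0 ||
        Path.compare(0, 1, "/") == 0 ||
        (Path.size() >= 2
            && std::isalpha(static_cast<unsigned char>(Path[0]))
            && Path[1] == ':');
    return !bIsRooted;
}

// Paraphrase of the rewritten test ("root:/" check omitted).
bool IsRelativeNew(const std::string& Path)
{
    const std::size_t PathLen = Path.size();
    const bool bIsRooted = PathLen &&
        ((Path[0] == '/') ||
         (PathLen >= 2 && (
            (Path[0] == '\\' && Path[1] == '\\') ||
            (Path[1] == ':' && std::isalpha(static_cast<unsigned char>(Path[0]))))));
    return !bIsRooted;
}

void CheckVersions()
{
    // Cases where both versions give the same answer.
    assert(IsRelativeOld("../engine/myfile.uasset") && IsRelativeNew("../engine/myfile.uasset"));
    assert(!IsRelativeOld("c:\\myfile.uasset") && !IsRelativeNew("c:\\myfile.uasset"));
    assert(!IsRelativeOld("/usr/lib") && !IsRelativeNew("/usr/lib"));
    assert(!IsRelativeOld("\\\\server\\share") && !IsRelativeNew("\\\\server\\share"));
    assert(IsRelativeOld("") && IsRelativeNew(""));
    assert(IsRelativeOld("c") && IsRelativeNew("c"));

    // A single leading backslash: the old test matches one backslash,
    // the new one requires two, so the answers differ here.
    assert(!IsRelativeOld("\\Windows"));
    assert(IsRelativeNew("\\Windows"));
}
```

In this paraphrase the two versions agree on relative, drive-letter, forward-slash and UNC-style paths. The one difference is a path that starts with a single backslash, so that case is worth checking against whatever callers actually pass in.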

Here’s how this all looks when we disassemble the new code:-

00007FF7147B695A  mov         edx,dword ptr [r8+8]  
00007FF7147B695E  mov         rsi,rcx  
00007FF7147B6961  test        edx,edx  
00007FF7147B6963  je          FPaths::ConvertRelativePathToFull+67h (07FF7147B6997h)  
00007FF7147B6965  dec         edx  
00007FF7147B6967  je          FPaths::ConvertRelativePathToFull+67h (07FF7147B6997h)  
00007FF7147B6969  mov         rax,qword ptr [r8]  
00007FF7147B696C  movzx       ecx,word ptr [rax]  
00007FF7147B696F  cmp         cx,2Fh  
00007FF7147B6973  je          FPaths::ConvertRelativePathToFull+0A9h (07FF7147B69D9h)  
00007FF7147B6975  cmp         edx,2  
00007FF7147B6978  jb          FPaths::ConvertRelativePathToFull+67h (07FF7147B6997h)  
00007FF7147B697A  cmp         cx,5Ch  
00007FF7147B697E  jne         FPaths::ConvertRelativePathToFull+56h (07FF7147B6986h)  
00007FF7147B6980  cmp         word ptr [rax+2],cx  
00007FF7147B6984  je          FPaths::ConvertRelativePathToFull+0A9h (07FF7147B69D9h)  
00007FF7147B6986  cmp         word ptr [rax+2],3Ah  
00007FF7147B698B  jne         FPaths::ConvertRelativePathToFull+67h (07FF7147B6997h)  
00007FF7147B698D  call        qword ptr [__imp_iswalpha (07FF716557B48h)]  
00007FF7147B6993  test        eax,eax  
00007FF7147B6995  jne         FPaths::ConvertRelativePathToFull+0A9h (07FF7147B69D9h)

This is the -entire- function in a non-editor build. Much, much better I’m sure you’ll agree.

A nice side effect of optimising this function is that the compiler is now happy to inline it – without us even specifying FORCEINLINE or INLINE!

My final tests showed a greater than 20 times performance increase for IsRelative() with 10% of the code footprint of the old version.

Credit(s): Robert Troughton (Coconut Lizard)
Status: Currently unimplemented in 4.12
