How to debug a Stack Overflow for beginners?

How do you debug a stack overflow?

If you rarely touch debuggers, that question is hard to answer, especially when you are staring at a cryptic failure and a pile of error codes. Awww.

Today I am going to share my experience with a powerful debugger called WinDBG. This is going to be a very long journey. On we go!

===

The Beginning

The first step in dealing with any bug is to find a reliable way to reproduce it. If it cannot be reproduced, how can you prove it is gone after you fix it? Absence of evidence is not evidence of absence! Since the steps are different for every bug, work out how to reproduce yours now and come back.

Have you got it? Make sure you do. You really need it.

Let’s begin.

Start the faulting process and attach WinDBG to it. Supply the path to the right symbols, and to the source files if you feel you need them. The symbol files are called PDBs. Without symbols, you will have a very hard time debugging anything. With the right source, you can free yourself from reading assembly. ( Note that the source can be out of date or wrong! Assembly does not lie. )

An example symbol path, combining the Microsoft Symbol Server ( cached locally ) with my own symbols :

srv*D:\local_SymbolDownstreamStore*http://msdl.microsoft.com/download/symbols;D:\Symbols
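The general shape of a symbol server entry is srv*<local cache>*<server URL>, with further entries separated by semicolons. Here is a minimal Python sketch that assembles such a path. The function name is my own invention, and the directories are simply the ones used in this article, so adjust them to your machine:

```python
def build_symbol_path(cache_dir, server_url, extra_dirs=()):
    """Compose a WinDBG symbol path: one srv* entry plus plain directories."""
    entries = [f"srv*{cache_dir}*{server_url}"]
    entries.extend(extra_dirs)
    return ";".join(entries)


symbol_path = build_symbol_path(
    r"D:\local_SymbolDownstreamStore",             # local cache (downstream store)
    "http://msdl.microsoft.com/download/symbols",  # Microsoft Symbol Server
    [r"D:\Symbols"],                               # my own PDBs
)
print(symbol_path)
```

The resulting string is what you would hand to the .sympath command or the symbol path dialog.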

Now let’s download the symbols; /f forces them to load immediately instead of lazily. Each DLL has embedded information that tells the debugger which PDB to look for on the symbol server.

.reload /f

This forces all the loaded modules to find their corresponding PDB symbols. It will take some time. The symbols are cached at D:\local_SymbolDownstreamStore as specified above, so next time you will not need to wait that long.

Next, list the loaded modules; with no arguments, the command shows all of them, that is, every DLL that has been loaded into memory so far. Before we start debugging, we have to make sure the modules we want to debug have the right symbols.

lm

If you are successful, you will see something like below :

01000000 012ac000 CrashingProgram (private pdb symbols) D:\symbols\CrashingProgram.pdb

A line like this, on the other hand, means no symbols were found for the module :

10000000 100c8000 ws03res (no symbols)
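If the module list is long, eyeballing it gets tedious. Below is a rough Python sketch that scans pasted lm output and lists modules that did not report PDB symbols. The regular expression is only my guess at the line shape shown above ( start, end, name, status in parentheses ), so treat it as an assumption. It also flags statuses such as deferred or export symbols, not just "no symbols":

```python
import re

# start address, end address, module name, status in parentheses
LM_LINE = re.compile(
    r"^(?P<start>[0-9a-fA-F]{8})\s+(?P<end>[0-9a-fA-F]{8})\s+"
    r"(?P<module>\S+)\s+\((?P<status>[^)]*)\)"
)


def modules_without_symbols(lm_output):
    """Return names of modules whose lm status does not mention pdb symbols."""
    missing = []
    for line in lm_output.splitlines():
        match = LM_LINE.match(line.strip())
        if match and "pdb symbols" not in match.group("status"):
            missing.append(match.group("module"))
    return missing


sample = """\
01000000 012ac000 CrashingProgram (private pdb symbols) D:\\symbols\\CrashingProgram.pdb
10000000 100c8000 ws03res (no symbols)
"""
print(modules_without_symbols(sample))
```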

Let’s set a breakpoint on all first chance exceptions, “*” for all of them. The breakpoint freezes the program so we can examine it.

sxe *

Let’s return the control flow back to the application.

g

It should show a little *BUSY* status. Now it is your job to reproduce the bug.

Reproduce the bug now.

The Debug

(9df4.4f7c): Stack overflow - code c00000fd (first/second chance not available)
eax=0000c94a ebx=80000000 ecx=00d6389c edx=7ffb001c esi=00000104 edi=77f670e9
eip=77d06628 esp=00d62fc0 ebp=00d63858 iopl=0         nv up ei pl nz na pe nc
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00010206
oleaut32!LoadTypeLibEx+0x13:
77d06628 53              push    ebx

Got it? Good. WinDBG should have halted by now, and the *BUSY* is gone. The next step is optional : create a memory dump in case you need to take the debugging elsewhere or do it later.

.dump /ma C:\memory.dmp

The /m flag creates a minidump, and the “a” suffix is equivalent to “fFhut”, which effectively means dump everything. The funny thing is that such a minidump can end up bigger than a “full” dump, for legacy reasons.

The first thing to do on a crash is to run !analyze, with -v for verbose output. It does all the grunt work of analyzing the information for you and saves you a lot of time.

!analyze -v


0:001> !analyze -v
*******************************************************************************
* *
* Exception Analysis *
* *
*******************************************************************************

FAULTING_IP:
ole32!ModalLoop+5b [d:\nt\com\ole32\com\dcomrem\chancont.cxx @ 200]
776c1d74 57 push edi

EXCEPTION_RECORD: ffffffff -- (.exr 0xffffffffffffffff)
ExceptionAddress: 77d06628 (oleaut32!LoadTypeLibEx+0x00000013)
ExceptionCode: c00000fd (Stack overflow)
ExceptionFlags: 00000000
NumberParameters: 2
Parameter[0]: 00000001
Parameter[1]: 00d62fbc

BUGCHECK_STR: c00000fd
DEFAULT_BUCKET_ID: STATUS_STACKOVERFLOW
PROCESS_NAME: CrashingProgram.exe
ERROR_CODE: (NTSTATUS) 0xc00000fd - A new guard page for the stack cannot be created.
RECURRING_STACK: From frames 0x70 to 0x98
NTGLOBALFLAG: 0
APPLICATION_VERIFIER_FLAGS: 0
LAST_CONTROL_TRANSFER: from 77d06c53 to 77d06628

STACK_COMMAND: ~1s; .ecxr ; kb

FOLLOWUP_IP:
CrashingProgram!_com_ptr_t<_com_IIID<CrashingServiceLib::IShared,&_GUID_e55d5bc5_0eff_4ca9_ae3f_63f6203afe18> >::CreateInstance+3a [d:\l\src\sdk\inc\comip.h @ 516]
010ab6ca 8945fc mov dword ptr [ebp-4],eax

FAULTING_SOURCE_CODE:
512:
513: if (dwClsContext & (CLSCTX_LOCAL_SERVER | CLSCTX_REMOTE_SERVER)) {
514: IUnknown* pIUnknown;
515:
> 516: hr = CoCreateInstance(rclsid, pOuter, dwClsContext, __uuidof(IUnknown), reinterpret_cast<void**>(&pIUnknown));
517:
518: if (FAILED(hr)) {
519: return hr;
520: }
521:

SYMBOL_STACK_INDEX: 82
FOLLOWUP_NAME: MachineOwner
MODULE_NAME: CrashingProgram
IMAGE_NAME: CrashingProgram.exe
DEBUG_FLR_IMAGE_TIMESTAMP: 4937c905
FAULTING_THREAD: 00004f7c
SYMBOL_NAME: CrashingProgram!_com_ptr_t<_com_IIID<CrashingServiceLib::IShared,&_GUID_e55d5bc5_0eff_4ca9_ae3f_63f6203afe18> >::CreateInstance+3a
FAILURE_BUCKET_ID: c00000fd_CrashingProgram!_com_ptr_t__com_IIID_CrashingServiceLib::IShared,__GUID_e55d5bc5_0eff_4ca9_ae3f_63f6203afe18___::CreateInstance+3a
BUCKET_ID: c00000fd_CrashingProgram!_com_ptr_t__com_IIID_CrashingServiceLib::IShared,__GUID_e55d5bc5_0eff_4ca9_ae3f_63f6203afe18___::CreateInstance+3a
Followup: MachineOwner
---------

We are concerned only with these :

FAULTING_IP – The instruction pointer at the time of the fault, i.e. the instruction that was executing when the crash happened.

STACK_COMMAND – A shorthand for getting more information on the stack; execute it at the WinDBG command prompt.

MODULE_NAME – The name of the module the crash is attributed to.

IMAGE_NAME – The file name of that module on the file system.

FAULTING_THREAD – The thread ID of the thread that was active at the moment of the crash.

FAULTING_SOURCE_CODE – If you have the right source code and symbols, this can pinpoint the source line where the crash happens.

STACK_TEXT – If you do not have the right source code, this gives you an idea of what happened.

ERROR_CODE – The error code of the exception that caused this crash.

DEFAULT_BUCKET_ID – The category of the problem we experience.

We got a stack overflow, 0xc00000fd ( you can find that in ntstatus.h ). We also know that the faulting thread is debugger thread number 1 : the prompt reads 0:001 and STACK_COMMAND starts with ~1s, while FAULTING_THREAD shows its system thread ID, 4f7c. Sometimes you might not get FAULTING_SOURCE_CODE but STACK_TEXT instead, which is the stack trace of the faulting thread. To switch threads manually, type ~<thread number>s, where the thread number is the thread you want to see; in this case it is 1. You can also run the STACK_COMMAND supplied by the analysis as it is : ~1s; .ecxr ; kb
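As an aside, 0xc00000fd is an NTSTATUS value, and its bits follow a fixed layout : two severity bits at the top, a customer flag, a reserved bit, a 12-bit facility and a 16-bit code. A small Python sketch that pulls those fields apart; the function name and the dictionary keys are mine, not part of any Windows API:

```python
def decode_ntstatus(value):
    """Split a 32-bit NTSTATUS value into its bit fields."""
    severity_names = ["success", "informational", "warning", "error"]
    return {
        "severity": severity_names[(value >> 30) & 0x3],  # bits 31-30
        "customer": bool((value >> 29) & 0x1),            # bit 29
        "facility": (value >> 16) & 0xFFF,                # bits 27-16
        "code": value & 0xFFFF,                           # bits 15-0
    }


fields = decode_ntstatus(0xC00000FD)
print(fields)
```

For our crash this reports an error severity, no customer flag, facility 0 and code 0xfd, which is what you would expect for a status defined by the system itself.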

Let’s work on thread one.

~1s

Display the exception context for more information, just in case. Registers do not lie either. =)

.ecxr

0:001> .ecxr
eax=0000c94a ebx=80000000 ecx=00d6389c edx=7ffb001c esi=00000104 edi=77f670e9
eip=77d06628 esp=00d62fc0 ebp=00d63858 iopl=0 nv up ei pl nz na pe nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00010206
oleaut32!LoadTypeLibEx+0x13:
77d06628 53 push ebx

Now dump the thread’s stack. “kb” also shows the first three arguments passed to each function. Stack traces can be corrupted, so do not trust them blindly.

kb

Since we are debugging a stack overflow, we have probably used up the whole stack reserve. Let’s see how much memory we are allowed by dumping the Thread Environment Block (TEB) :

!teb

0:001> !teb
TEB at 7ffdc000
ExceptionList: 00d63ca0
StackBase: 00da0000
StackLimit: 00d61000

SubSystemTib: 00000000
FiberData: 00001e00
ArbitraryUserPointer: 00000000
Self: 7ffdc000
EnvironmentPointer: 00000000
ClientId: 00009df4 . 00004f7c
RpcHandle: 00000000
Tls Storage: 00000000
PEB Address: 7ffdf000
LastErrorValue: 14007
LastStatusValue: 0
Count Owned Locks: 0
HardErrorMode: 0

Ahhh, do you see the StackBase and StackLimit? Their difference is the stack space this thread has to work with; note that it can be different for every other thread. ( DA0000 - D61000 = 3F000 ), which is exactly 252 KB.
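If hex subtraction in your head is not your thing, the same arithmetic takes a few lines of Python. This is only a sketch using the two values from the !teb output above; the function name is made up:

```python
def stack_span_kb(stack_base, stack_limit):
    """Size in KB of the region between StackLimit and StackBase."""
    return (stack_base - stack_limit) / 1024


span_bytes = 0x00DA0000 - 0x00D61000
span_kb = stack_span_kb(0x00DA0000, 0x00D61000)
print(hex(span_bytes), span_kb)
```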

Now let’s dump all of those stack frames and see what we’ve got.

~1kb 0xffff


0:001> ~1kb 0xffff
ChildEBP RetAddr  Args to Child
00d63858 77d06c53 00d638d4 00000000 00d6389c oleaut32!LoadTypeLibEx+0x13 [(truncated)]
00d6386c 77d0e9f8 00d638d4 00d6389c 07df2d4c oleaut32!LoadTypeLib+0x12 [(truncated)]
00d63c4c 77d0ed1b 07df2d4c 00d63c68 0012f070 oleaut32!GetTypeInfoOfIID+0x371 [(truncated)]
00d63c6c 7778d01b 07df2d38 07defd58 022117dc oleaut32!CUnivStubWrapper::Invoke+0x7c [(truncated)]
… ( truncated for clarity )

00d9ffa4 77f65e91 00000001 000a142c 00000000 CrashingModule!CServiceModule::_ServiceMain+0x57 [d:\l\src\CrashingProgram\CrashingServiceMain.cpp @ 514]
00d9ffb8 77e64829 000a1420 00000000 00000000 advapi32!ScSvcctrlThreadA+0x21 [(truncated)]
00d9ffec 00000000 77f65e70 000a1420 00000000 kernel32!BaseThreadStart+0x34 [(truncated)]

The first column ( ChildEBP ) is the frame’s address on the stack. The second column is the return address. The third, fourth and fifth columns are the first three arguments to the function call. We are concerned with the first column at the top and at the bottom of the stack trace : their difference gives the amount of stack memory used.

To calculate the amount of memory used, do a subtraction ( D9FFEC - D63858 = 3C794 ) and … about 242 KB. There we go! This thread is right up against its 252 KB limit, and it seems it just hit the top. However, we still have to find out what caused this memory usage in the first place.
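To put the two numbers side by side, here is a short Python sketch that takes the ChildEBP of the top and bottom frames plus the StackBase and StackLimit from the TEB and reports how much of the available stack the trace covers. The function and variable names are my own:

```python
KB = 1024


def stack_usage(top_frame_ebp, bottom_frame_ebp, stack_base, stack_limit):
    """Return (bytes used by the trace, bytes available, used fraction)."""
    used = bottom_frame_ebp - top_frame_ebp
    available = stack_base - stack_limit
    return used, available, used / available


used, available, ratio = stack_usage(
    0x00D63858,  # ChildEBP of the top frame (LoadTypeLibEx)
    0x00D9FFEC,  # ChildEBP of the bottom frame (BaseThreadStart)
    0x00DA0000,  # StackBase from !teb
    0x00D61000,  # StackLimit from !teb
)
print(f"used {used / KB:.1f} KB of {available / KB:.1f} KB ({ratio:.0%})")
```

Measuring between frame pointers slightly undercounts, since the top frame’s own locals sit below its ChildEBP, but it is close enough to show the stack is essentially full.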

Phew!

The Cause

Now we have to find out what is happening. Since the problem lies in using up the whole stack, let’s analyze the stack frames for any recursion. According to the WinDBG online help, a stack overflow can have these causes :

  • A thread uses the entire stack reserved for it. This is often caused by infinite recursion.
  • A thread cannot extend the stack because the page file is maxed out, and therefore no additional pages can be committed to extend the stack.
  • A thread cannot extend the stack because the system is within the brief period used to extend the page file.

For cause 1, infinite recursion, the crash comes sooner if each call also allocates large buffers, such as huge strings, on the stack.

So. Let’s look into the full stack trace and see if there are any apparent recursions :

0:001> ~1kb 0xffff
00d652f8 010af822 01012b68 00000000 00000017 CrashingProgram!_com_ptr_t<_com_IIID<CrashingServiceLib::IShared3,&_GUID_e55d5bc5_0eff_4ca9_ae3f_63f6203afe18> >::CreateInstance+0x3a [d:\l\src\sdk\inc\comip.h @ 516]
00d65344 010adf2e 00bb93c8 6c3ba3d9 00d6540c CrashingProgram!CShared2::CreateShared3Service+0x72 [d:\l\src\CrashingProgram\Shared2.cpp @ 958]
00d65408 77c80193 00bb93cc 00d65618 02020202 CrashingProgram!CShared2::GetSettings+0x15e [d:\l\src\CrashingProgram\Shared2.cpp @ 458]
… (truncated for clarity)
00d67178 77c80193 00bc2f80 00000018 00000001 CrashingProgram!CrashingProgram::SetHealthStatus+0x84 [d:\l\src\CrashingProgram\CrashingService.cpp @ 13073]
… (truncated for clarity)
00d68c80 010af822 01012b68 00000000 00000017 CrashingProgram!_com_ptr_t<_com_IIID<CrashingServiceLib::IShared3,&_GUID_e55d5bc5_0eff_4ca9_ae3f_63f6203afe18> >::CreateInstance+0x3a [d:\l\src\sdk\inc\comip.h @ 516]
00d68ccc 010adf2e 00bc2658 6c3b7a41 00d68d94 CrashingProgram!CShared2::CreateShared3Service+0x72 [d:\l\src\CrashingProgram\Shared2.cpp @ 958]
00d68d90 77c80193 00bc265c 00d68fa0 02020202 CrashingProgram!CShared2::GetSettings+0x15e [d:\l\src\CrashingProgram\Shared2.cpp @ 458]
… (truncated for clarity)

I have truncated the stack trace above, which runs to thousands of lines. CreateInstance keeps recurring throughout it. I counted 37 instances of it, and each of them uses about 6.8 KB. ( 6.8 KB * 37 = 251.6 KB ) Boom!!!
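Counting recurring frames by hand in thousands of lines is no fun. The following Python sketch groups kb lines by symbol and reports the ChildEBP distance between successive occurrences of the same symbol. The regular expression is my assumption about the line layout ( five hex columns, then symbol+offset ), and I dropped the source paths from the sample lines to keep them short. Since the excerpt is heavily truncated, the distance between two listed occurrences may span more than one trip around the loop:

```python
import re
from collections import defaultdict

# ChildEBP, RetAddr, three argument columns, then symbol+0xoffset
FRAME = re.compile(
    r"^(?P<ebp>[0-9a-f]{8}) (?P<ret>[0-9a-f]{8}) "
    r"[0-9a-f]{8} [0-9a-f]{8} [0-9a-f]{8} "
    r"(?P<symbol>.+?)\+0x[0-9a-f]+"
)


def recurring_frames(kb_output):
    """Map each symbol seen more than once to the ChildEBP values of its frames."""
    seen = defaultdict(list)
    for line in kb_output.splitlines():
        match = FRAME.match(line.strip())
        if match:
            seen[match.group("symbol")].append(int(match.group("ebp"), 16))
    return {symbol: ebps for symbol, ebps in seen.items() if len(ebps) > 1}


sample = """\
00d65344 010adf2e 00bb93c8 6c3ba3d9 00d6540c CrashingProgram!CShared2::CreateShared3Service+0x72
00d65408 77c80193 00bb93cc 00d65618 02020202 CrashingProgram!CShared2::GetSettings+0x15e
00d67178 77c80193 00bc2f80 00000018 00000001 CrashingProgram!CrashingProgram::SetHealthStatus+0x84
00d68ccc 010adf2e 00bc2658 6c3b7a41 00d68d94 CrashingProgram!CShared2::CreateShared3Service+0x72
00d68d90 77c80193 00bc265c 00d68fa0 02020202 CrashingProgram!CShared2::GetSettings+0x15e
"""

for symbol, ebps in recurring_frames(sample).items():
    gaps = [later - earlier for earlier, later in zip(ebps, ebps[1:])]
    print(symbol, len(ebps), gaps)
```

Feed it the full ~1kb 0xffff output and the symbols with the highest counts are your recursion suspects.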

Remember what we saw above in the “!analyze -v” results?

CrashingProgram!_com_ptr_t<_com_IIID<CrashingServiceLib::IShared,&_GUID_e55d5bc5_0eff_4ca9_ae3f_63f6203afe18> >::CreateInstance+3a [d:\l\src\sdk\inc\comip.h @ 516]

Ignore this paragraph : After some studying, it turns out that the cause is that a COM Single-Threaded Apartment allows re-entrant calls while the main thread is performing an out-of-process call, which is a by-product of the Windows message loop being pumped during the wait. The official workaround is to implement IMessageFilter and handle the whole thing yourself. YUCKS! Whatever, that is the cause in the program I’m working on. It might be different for you.

MACHINE BENT! ( This is slang in my native language for almost anything, in this case “Gotcha!” )

By the way, you can also check whether this thread was specially allocated only 252 KB of stack, or whether it is an executable-wide limit. Let’s dump the executable’s header information. The generic command is “!dh <module start addr>”, and -f limits the output to the file headers :

!dh -f 01000000

I got the 01000000 from the “lm” command above. The two hex numbers are the starting and ending addresses of the loaded module :

01000000 012ac000 CrashingProgram (private pdb symbols) D:\symbols\CrashingProgram.pdb

Then scroll down to the stack and heap sizes in the headers. In this case the stack reserve is 0x40000 bytes, or 256 KB, which is one 4 KB page more than the 252 KB measured from the TEB :

00040000 size of stack reserve
00002000 size of stack commit
00100000 size of heap reserve
00001000 size of heap commit

That is pathetically small. MSDN says the default stack reserve used by the linker is 1 MB. It seems this executable was built with its own, smaller limit instead ( the linker’s /STACK option controls this ).
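To tie the header back to the TEB numbers, here is a last Python sketch comparing the “size of stack reserve” field with the StackBase and StackLimit span. The names are invented for this example. My reading of the 4 KB difference is that one page at the very bottom of the reservation never becomes usable stack, but treat that as an assumption rather than something this dump proves:

```python
KB = 1024


def compare_stack_reserve(header_reserve, stack_base, stack_limit):
    """Compare the header's stack reserve with the span seen in the TEB."""
    teb_span = stack_base - stack_limit
    return {
        "header_kb": header_reserve // KB,
        "teb_kb": teb_span // KB,
        "difference_kb": (header_reserve - teb_span) // KB,
    }


report = compare_stack_reserve(
    0x00040000,  # "size of stack reserve" from !dh
    0x00DA0000,  # StackBase from !teb
    0x00D61000,  # StackLimit from !teb
)
print(report)
```

So the limit comes from the executable header, not from anything special about this thread, and it is a quarter of the 1 MB linker default.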

The Words

Wow! That’s all for such a boring tutorial. I removed some information from the stack traces above, though I believe what is left is adequate for your understanding. I hope you find this article helpful for bootstrapping your debugging experience, as it can be very fun.

If you spot anything I got wrong, remember to tell me, as I am … a beginner! ( Hey! I am a web application security dude! )

===

Resources :

WinDBG help online – http://msdn.microsoft.com/en-us/library/cc267445.aspx

Thread Stack Size – http://msdn.microsoft.com/en-us/library/ms686774(VS.85).aspx

Crash Dump Analysis – http://www.dumpanalysis.org/blog/

Advanced Windows Debugging – http://www.amazon.com/Advanced-Debugging-Addison-Wesley-Microsoft-Technology/dp/0321374460

Windows Internals – http://www.amazon.com/Microsoft-Windows-Internals-4th-Server/dp/0735619174/

5 Responses to “How to debug a Stack Overflow for beginners?”


  1. 1 Chris Weber December 9, 2008 at 12:17 am

    It’s important to enable full pageheap checking with gflags too, so you can catch heap corruption issues. If you found a buffer overrun here, Windbg would have reported STATUS_STACK_BUFFER_OVERRUN too, but with the STACK_OVERRUN it looks like you have a DoS instead – or did you find a better exploit?

  2. 2 log0 December 9, 2008 at 1:58 am

    >>Chris

    Yep. I know of gflags. That was something I didn’t use though, but I’ll be testing another. Those are good things, along with a lot of the tools Microsoft developed.

    That is a DoS, by flooding the program with arbitrary client requests. Unfortunately, I haven’t located any user-input buffers ( well, I have the source code =) )

  3. 3 Klischeepunk August 12, 2011 at 9:41 am

    Thanx a lot 4 this gr8 Article.

  4. 4 Dwayne Robinson February 24, 2012 at 12:54 am

    Thanks for the article. Seeing the output from DH and TEB was useful for a stack overflow today.


  1. 1 I am NOT dead. « Paralliverse Trackback on December 5, 2008 at 9:23 am
