Re: [edk2] NT32pkg unstable on Dell Precision T3500 / Win7 Pro

Subject: Re: [edk2] NT32pkg unstable on Dell Precision T3500 / Win7 Pro

From: Andrew Fish <afish@apple.com>

To: edk2-devel@lists.sourceforge.net

Date: 2012-02-17 22:51:10

Alex,

Sorry I don't run NT32 and have not look at the code in years, but I did write the pthreads AP implementation so I can speak from a design point of view.

In EFI only the BSP can call EFI services, so the critical section that you claim is missing is already implemented. The DXE core protects from interrupt based reentrancy by raising to TPL_HIGH which masks interrupts. Given this works on real hardware and with the pthread based emulator implementation I think you are probably seeing a bug in NT32 emulation of  the timer interrupt. 

Sorry if I'm looking in the wrong place but it looks to me like the CPU DXE driver is not masking interrupts and that is causing reentrancy failures in the DXE core, that smash the stack. So it looks to me like a race condition that you are just lucky enough to be able to reproduce!

https://edk2.svn.sourceforge.net/svnroot/edk2/trunk/edk2/Nt32Pkg/CpuRuntimeDxe/Cpu.c
EFI_STATUS
EFIAPI
WinNtDisableInterrupt (
  IN EFI_CPU_ARCH_PROTOCOL  *This
  )
{
  CPU_ARCH_PROTOCOL_PRIVATE *Private;

  Private                 = CPU_ARCH_PROTOCOL_PRIVATE_DATA_FROM_THIS (This);
  Private->InterruptState = FALSE;
  return EFI_SUCCESS;
}
vs.

EFI_STATUS
EFIAPI
EmuDisableInterrupt (
  IN EFI_CPU_ARCH_PROTOCOL  *This
  )
{
  CPU_ARCH_PROTOCOL_PRIVATE *Private;

  Private                 = CPU_ARCH_PROTOCOL_PRIVATE_DATA_FROM_THIS (This);
  Private->InterruptState = FALSE;
  gEmuThunk->DisableInterrupt ();
  return EFI_SUCCESS;
}

VOID
SecDisableInterrupt (
  VOID
  )
{
  sigset_t  sigset;

  // Since SetTimer() uses SIGALRM we emulate turning on and off interrupts
  // by enabling/disabling SIGALRM.
  sigemptyset (&sigset);
  sigaddset (&sigset, SIGALRM);
  pthread_sigmask (SIG_BLOCK, &sigset, NULL);
  gEmulatorInterruptEnabled = FALSE;
}


On Feb 17, 2012, at 1:41 PM, Alexander Goykhman wrote:

I traced the missing stack frame to a "frame-pointer omission" compiler option (/Oy) implicitely turned on by /O1xxx.  Disabling optimization noticeably
reduced the failure rate but did not fix the problem:
 
    DxeCore.dll!CpuBreakpoint()  Line 39 C
    DxeCore.dll!DebugAssert(const char * FileName=0x0031d970, unsigned int LineNumber=0x00000069, const char * Description=0x0031d9c8)  Line 304 + 0x5 bytes     C
>    DxeCore.dll!CoreRestoreTpl(unsigned int NewTpl=0x00000010)  Line 105 + 0x25 bytes C
    05e535fa() [Timer.dll!MMTimerThread]
    winmm.dll!_DriverCallback@28()  + 0x4e bytes   
    winmm.dll!_TimerCompletion@4()  + 0x95 bytes   
    winmm.dll!_timeThread@4()  + 0x54 bytes   
    kernel32.dll!@BaseThreadInitThunk@12()  + 0x12 bytes
    ntdll.dll!___RtlUserThreadStart@8()  + 0x27 bytes   
    ntdll.dll!__RtlUserThreadStart@8()  + 0x1b bytes    
 
 
VOID
EFIAPI
CoreRestoreTpl (
  IN EFI_TPL NewTpl  // 0x10
  )
{
  EFI_TPL     OldTpl;
 
  OldTpl = gEfiCurrentTpl; 
  ASSERT (NewTpl <= OldTpl);  <===  ASSERT fails
 
Here gEfiCurrentTpl == 0x10, OldIpl == 0x4 (not 0x10 as one would expect), NewTpl == 0x10
Another word, the value of gEfiCurrentTpl was modified after it was saved on OldTpl,
apparently by another NT32 thread.
 
The following code fragment, and others like it, should be atomic (e.g. inside of a critical section).
 
-------------Enter CS---------------
  OldTpl = gEfiCurrentTpl;
  ASSERT (NewTpl <= OldTpl);
  ASSERT (VALID_TPL (NewTpl));
 
  //
  // If lowering below HIGH_LEVEL, make sure
  // interrupts are enabled
  //
 
  if (OldTpl >= TPL_HIGH_LEVEL  &&  NewTpl < TPL_HIGH_LEVEL) {
    gEfiCurrentTpl = TPL_HIGH_LEVEL;
  }
--------------Leave CS---------------
 
 
The lack of synchronization explains why the failure rate correlates with the number of CPU's.
 
 
From: Sun, Rui [mailto:rui.sun@intel.com] 
Sent: Friday, February 17, 2012 2:11 AM
To: edk2-devel@lists.sourceforge.net
Subject: Re: [edk2] NT32pkg unstable on Dell Precision T3500 / Win7 Pro
 
The call stack records you listed are a little weird.
It is expected: winmm.dll -> Timer.dll!MMTimerThread -> DxeCore.dll!CoreTimerTick ->
Seems Timer.dll!MMTimerThread is missing.
 
Have you modified Nt32Pkg\TimerDxe in your local copy?
 
From: Alexander Goykhman [mailto:Alexander.Goykhman@citrix.com] 
Sent: Thursday, February 16, 2012 5:53 AM
To: edk2-devel@lists.sourceforge.net
Subject: [edk2] NT32pkg unstable on Dell Precision T3500 / Win7 Pro
 
Hello,
 
I started experiencing problems with the NT32 emulator when I switched from a Dell OptiPlex 755 to Precision T3500.   More often than not, NT32 crashes during startup.  If not, it crashes several minutes thereafter.   The crashes fall into two categories:  1) an MdeModulePkg\Core\Dxe\Event\Tpl.c ASSERT failure while lowering/raising TPL and 2) an MdeModulePkg\Core\Dxe\Library\Library.c ASSERT failure while trying to acquire the global event lock due to an apparent stack corruption.  The problems seem to disappear when the system is rebooted with MP disabled (bcdedit /set onecpu on).   All symptoms are consistent with the NT32pkg not being properly synchronized.
 
A search of the edk2-devel  yielded one TPL.c  ASSERT problem reported against Ovmfx64 on 1/19/2012.    This makes me wonder why I seem to be the only affected.  Does anyone out there use the NT32  emulator?
 
Thanks,
Alex Goykhman
 
P.S. Here is a sample of what I have seen so far:
 
-------------------------------------------------------------------------------------------------------------
ASSERT!: c:\temp\edk2\MdeModulePkg\Core\Dxe\Event\Tpl.c (105): NewTpl <= OldTpl
 
               DxeCore.dll!DebugAssert(const char * FileName=0x00342630, unsigned int LineNumber=0x00000044, const char * Description=0x0034261c)  Line 304           C
               DxeCore.dll!CoreRaiseTpl(unsigned int NewTpl=0x00000010)  Line 68 + 0x1a bytes           C
               DxeCore.dll!CoreAcquireLockOrFail(EFI_LOCK * Lock=0x00337d3b)  Line 48 + 0xb bytes C
               DxeCore.dll!CoreAllocatePool(EFI_MEMORY_TYPE PoolType=EfiBootServicesData, unsigned int Size=0x00000074, void * * Buffer=0x0a0efb2c)  Line 207    C
               DxeCore.dll!ReportStatusCodeEx(unsigned int Type=0x90000002, unsigned int Value=0x03050007, unsigned int Instance=0x00000000, const GUID * CallerId=0x00000000, const GUID * ExtendedDataGuid=0x00000000, const void * ExtendedData=0x0a0efd68, unsigned int ExtendedDataSize=0x00000060)  Line 517          C
               DxeCore.dll!DebugAssert(const char * FileName=0x00342630, unsigned int LineNumber=0x00000069, const char * Description=0x00342674)  Line 297 + 0x1a bytes C
               DxeCore.dll!CoreRestoreTpl(unsigned int NewTpl=0x0000001f)  Line 105 + 0x1d bytes    C
               DxeCore.dll!CoreReleaseLock(EFI_LOCK * Lock=0x003453e0)  Line 102 + 0xf bytes            C
>             DxeCore.dll!CoreTimerTick(unsigned __int64 Duration=0x0000000000000010)  Line 223 + 0x6 bytes          C
               winmm.dll!733e3af0()  
                [Frames below may be incorrect and/or missing, no symbols loaded for winmm.dll]       
               winmm.dll!733ea535()  
                winmm.dll!733ea434()  
                kernel32.dll!769f339a()                
                ntdll.dll!77b39ef2()        
                ntdll.dll!77b39ec5()        
---------------------------------------------------------------------------------------------------------------
ASSERT!: c:\temp\edk2\MdeModulePkg\Core\Dxe\Event\Tpl.c (68): OldTpl <= NewTpl
                                NewTpl == 0x04edf810
 
               DxeCore.dll!DebugAssert(const char * FileName, unsigned int LineNumber, const char * Description)  Line 303  C
               DxeCore.dll!CoreReleaseLock(EFI_LOCK * Lock)  Line 96 + 0x13 bytes     C
               DxeCore.dll!CoreReleaseMemoryLock()  Line 123 + 0xa bytes     C
               DxeCore.dll!CoreAllocatePool(EFI_MEMORY_TYPE PoolType, unsigned int Size, void * * Buffer)  Line 213             C
               DxeCore.dll!ReportStatusCodeEx(unsigned int Type, unsigned int Value, unsigned int Instance, const GUID * CallerId, const GUID * ExtendedDataGuid, const void * ExtendedData, unsigned int ExtendedDataSize)  Line 517              C
               DxeCore.dll!DebugAssert(const char * FileName, unsigned int LineNumber, const char * Description)  Line 296 + 0x1a bytes          C
>             DxeCore.dll!CoreRestoreTpl(unsigned int NewTpl)  Line 105 + 0x1d bytes             C
               05ebc2d0()        
               winmm.dll!733e3af0()  
                [Frames below may be incorrect and/or missing, no symbols loaded for winmm.dll]       
               winmm.dll!733ea535()  
                winmm.dll!733ea434()  
                kernel32.dll!769f339a()                
                ntdll.dll!77b39ef2()        
                ntdll.dll!77b39ec5()        
-------------------------------------------------------------------------------------------------------
ASSERT!: c:\temp\edk2\MdeModulePkg\Core\Dxe\Library\Library.c (71): Lock->Lock == EfiLockReleased
 
               DxeCore.dll!DebugAssert(const char * FileName, unsigned int LineNumber, const char * Description)  Line 304  C
               DxeCore.dll!CoreAcquireLock(EFI_LOCK * Lock)  Line 71 + 0x13 bytes      C
>             DxeCore.dll!CoreCurrentSystemTime()  Line 84 C
               DxeCore.dll!CoreCheckTimers(void * CheckEvent, void * Context)  Line 112 + 0x5 bytes                C
               DxeCore.dll!CoreDispatchEventNotifies()  Line 199 + 0x7 bytes  C
               DxeCore.dll!CoreRestoreTpl(unsigned int NewTpl)  Line 123        C
               DxeCore.dll!CoreReleaseLock(EFI_LOCK * Lock)  Line 102 + 0xf bytes      C
               DxeCore.dll!CoreReleaseEventLock()  Line 118 + 0xa bytes           C
               DxeCore.dll!CoreSignalEvent(void * UserEvent)  Line 527             C
               DxeCore.dll!CoreTimerTick(unsigned __int64 Duration)  Line 219 + 0xb bytes      C
               winmm.dll!73e53af0()  
                [Frames below may be incorrect and/or missing, no symbols loaded for winmm.dll]       
               winmm.dll!73e5a535()  
                winmm.dll!73e5a434()  
                kernel32.dll!7516339a()                
                ntdll.dll!77249ef2()         
                ntdll.dll!77249ec5()        
 
--------------------------------------------------------------------------------------------------------------------------------------
ASSERT!: c:\temp\edk2\MdeModulePkg\Core\Dxe\Library\Library.c (71): Lock->Lock == EfiLockReleased
 
               DxeCore.dll!DebugAssert(const char * FileName, unsigned int LineNumber, const char * Description)  Line 304  C
>             DxeCore.dll!CoreAcquireLock(EFI_LOCK * Lock)  Line 71 + 0x13 bytes      C
               DxeCore.dll!CoreAcquireEventLock()  Line 106   C
               DxeCore.dll!CoreSignalEvent(void * UserEvent)  Line 505             C
               DxeCore.dll!CoreTimerTick(unsigned __int64 Duration)  Line 219 + 0xb bytes      C
               winmm.dll!73e53af0()  
                [Frames below may be incorrect and/or missing, no symbols loaded for winmm.dll]       
               winmm.dll!73e5a535()  
                winmm.dll!73e5a434()  
                kernel32.dll!7516339a()                
                ntdll.dll!77249ef2()         
                ntdll.dll!77249ec5()        
-----------------------------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------
Virtualization & Cloud Management Using Capacity Planning
Cloud computing makes use of virtualization - but cloud computing 
also focuses on allowing computing to be delivered as a service.
http://www.accelacomm.com/jaw/sfnl/114/51521223/_______________________________________________
edk2-devel mailing list
edk2-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/edk2-devel