Re: [edk2] [PATCH 2/2] MdeModulePkg: CoreStall: improve accuracy in iterating branch

Subject: Re: [edk2] [PATCH 2/2] MdeModulePkg: CoreStall: improve accuracy in iterating branch

From: "Tian, Feng" <feng.tian@intel.com>

To: "edk2-devel@lists.sourceforge.net" <edk2-devel@lists.sourceforge.net>

Date: 2014-09-15 15:00:37

Hi, Laszlo

Here is our intention for the logic:

If Microseconds is greater than or equal to 0x1999999999999999ULL(about 58,494 years), then the divide operation is performed first, which means we will lose some accuracy, but this is a very large delay request, so waiting a little bit extra in this extreme case is an acceptable compromise.  

So I would prefer not to add this patch if there is no real impact.

Thanks
Feng

-----Original Message-----
From: Laszlo Ersek [mailto:lersek@redhat.com] 
Sent: Sunday, September 14, 2014 21:07
To: edk2-devel@lists.sourceforge.net
Subject: [edk2] [PATCH 2/2] MdeModulePkg: CoreStall: improve accuracy in iterating branch

The iterating branch of CoreStall() adds an extra tick for each iteration if the division returned a nonzero remainder; effectively rounding up to whole ticks separately per iteration. This has the potential to waste up to 9 ticks.

While the current code is technically correct ("the Stall() function stalls execution on the processor for *at least* the requested number of microseconds", according to the UEFI spec, emphasis mine), we can improve the accuracy by rounding up to the exact whole number of ticks. We accumulate fractional ticks during the iteration, and flush whole ticks as they become available.

Contributed-under: TianoCore Contribution Agreement 1.0
Signed-off-by: Laszlo Ersek 
---
 MdeModulePkg/Core/Dxe/Misc/Stall.c | 44 +++++++++++++++++++++++++++++---------
 1 file changed, 34 insertions(+), 10 deletions(-)

diff --git a/MdeModulePkg/Core/Dxe/Misc/Stall.c b/MdeModulePkg/Core/Dxe/Misc/Stall.c
index aae54fd..4c63624 100644
--- a/MdeModulePkg/Core/Dxe/Misc/Stall.c
+++ b/MdeModulePkg/Core/Dxe/Misc/Stall.c
@@ -51,51 +51,75 @@ CoreInternalWaitForTick (  **/  EFI_STATUS  EFIAPI  CoreStall (
   IN UINTN            Microseconds
   )
 {
   UINT64  Counter;
   UINT32  Remainder;
   UINTN   Index;
+  UINT64  Accumulator;
 
   if (gMetronome == NULL) {
     return EFI_NOT_AVAILABLE_YET;
   }
 
   //
   // Counter = Microseconds * 10 / gMetronome->TickPeriod
   // 0x1999999999999999 = (2^64 - 1) / 10
   //
   if (Microseconds > 0x1999999999999999ULL) {
     //
     // Microseconds is too large to multiple by 10 first.  Perform the divide 
     // operation first and loop 10 times to avoid 64-bit math overflow.
     //
     Counter = DivU64x32Remainder (
                 Microseconds,
                 gMetronome->TickPeriod,
                 &Remainder
                 );
+
+    Accumulator = 0;
+    //
+    // In each iteration we wait for Counter whole ticks, and
+    // (Remainder/TickPeriod) fractional ticks (the latter being strictly less
+    // than one). Clearly we can only pass whole ticks to
+    // CoreInternalWaitForTick(), therefore we keep a running account of the
+    // fractional ticks. Whenever enough fractional ticks accumulate, we flush
+    // a whole tick from them.
+    //
+    // Note that Accumulator will never reach or exceed 2*TickPeriod, because
+    // Remainder < TickPeriod. This has two consequences:
+    // - We never have to flush more than 1 whole tick.
+    // - Accumulator is always expressible as UINT64 (TickPeriod being UINT32).
+    //
     for (Index = 0; Index < 10; Index++) {
-      CoreInternalWaitForTick (Counter);
-    }      
+      Accumulator += Remainder;
+      if (Accumulator < gMetronome->TickPeriod) {
+        CoreInternalWaitForTick (Counter);
+      } else {
+        Accumulator -= gMetronome->TickPeriod;
+        if (Counter == MAX_UINT64) {
+          CoreInternalWaitForTick (Counter);
+          CoreInternalWaitForTick (1);
+        } else {
+          CoreInternalWaitForTick (Counter + 1);
+        }
+      }
+    }
 
-    if (Remainder != 0) {
-      //
-      // If Remainder was not zero, then normally, Counter would be rounded 
-      // up by 1 tick.  In this case, since a loop for 10 counts was used
-      // to emulate the multiply by 10 operation, Counter needs to be rounded
-      // up by 10 counts.
-      //
-      CoreInternalWaitForTick (10);
+    //
+    // Flush any remaining fractional ticks.
+    //
+    if (Accumulator > 0) {
+      CoreInternalWaitForTick (1);
     }
   } else {
     //
     // Calculate the number of ticks by dividing the number of microseconds by
     // the TickPeriod.  Calculation is based on 100ns unit.
     //
     Counter = DivU64x32Remainder (
                 MultU64x32 (Microseconds, 10),
                 gMetronome->TickPeriod,
                 &Remainder
--
1.8.3.1


------------------------------------------------------------------------------
Want excitement?
Manually upgrade your production database.
When you want reliability, choose Perforce
Perforce version control. Predictably reliable.
http://pubads.g.doubleclick.net/gampad/clk?id=157508191&iu=/4140/ostg.clktrk
_______________________________________________
edk2-devel mailing list
edk2-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/edk2-devel

------------------------------------------------------------------------------
Want excitement?
Manually upgrade your production database.
When you want reliability, choose Perforce
Perforce version control. Predictably reliable.
http://pubads.g.doubleclick.net/gampad/clk?id=157508191&iu=/4140/ostg.clktrk
_______________________________________________
edk2-devel mailing list
edk2-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/edk2-devel