Subject: Re: [edk2] Xen OVMF early discussions

From: Bei Guan <gbtju85@gmail.com>

To: Andrei Warkentin <andreiw@motorola.com>

Date: 2011-05-10 10:29:25



2011/5/9 Bei Guan <gbtju85@gmail.com>


2011/5/9 Andrei Warkentin <andreiw@motorola.com>

Alright, so you should instrument the Bds and video driver and see where it gets wedged.

Are you able to use gdbsx to figure out where you are in OVMF?

I think I have to do it as follows.
1) Use DEBUG ((EFI_D_INFO, "*.c line %d\n", __LINE__)) to find the function where it hangs.
2) Use "add-symbol-file <video driver> <load address>" and gdbsx to find the exact location in OVMF where it hangs.
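A minimal host-side sketch of the breadcrumb idea in step 1, written in plain C so it builds without edk2. `TRACE_HERE`, `g_last_func`, `g_last_line` and `video_driver_start` are hypothetical names; inside OVMF the equivalent would presumably be a call such as `DEBUG ((EFI_D_INFO, "%a:%d\n", __FUNCTION__, __LINE__))`, and the last line printed on the Xen console before the hang brackets the code that wedged.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical breadcrumb state: the most recent checkpoint reached. */
static const char *g_last_func = NULL;
static int g_last_line = 0;

/* Record and print the current function and line. If the code hangs,
   the last line printed names the last checkpoint that was reached. */
#define TRACE_HERE()                                           \
    do {                                                       \
        g_last_func = __func__;                                \
        g_last_line = __LINE__;                                \
        fprintf(stderr, "%s line %d\n", __func__, __LINE__);   \
        fflush(stderr);                                        \
    } while (0)

/* Hypothetical stand-in for an instrumented driver entry point. */
static int video_driver_start(int have_vga)
{
    TRACE_HERE();
    if (!have_vga) {
        TRACE_HERE();
        return -1;
    }
    TRACE_HERE();
    return 0;
}

/* Report the last checkpoint, e.g. from a debugger once wedged. */
static void print_last_checkpoint(void)
{
    assert(g_last_func != NULL); /* at least one TRACE_HERE() ran */
    fprintf(stderr, "last checkpoint: %s line %d\n", g_last_func, g_last_line);
}
```

Flushing after every message matters on a serial or Xen console, because output still buffered at the moment of the hang would otherwise be lost.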

I tried to load the driver's symbols using add-symbol-file.

The following debug trace from "xl console dom" is for BdsDxe.efi:

Loading driver FC5C7020-1A48-4198-9BE2-EAD5ABC8CF2F
InstallProtocolInterface: 5B1B31A1-9562-11D2-8E3F-00A0C969723B 3F41B628
Loading driver at 0x0003F5CD000 EntryPoint=0x0003F5CD400

So I loaded BdsDxe.dll at 0x3F5CD000:

(gdb) add-symbol-file /root/src/edk2/Build/OvmfIa32/DEBUG_UNIXGCC/IA32/IntelFrameworkModulePkg/Universal/BdsDxe/BdsDxe/DEBUG/BdsDxe.dll 0x3F5CD000
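A small sketch that turns a DXE "Loading driver at ..." console line into an add-symbol-file command. One caveat, hedged: gdb treats the address given to add-symbol-file as the address of the .text section, not the image base. The image begins with PE/COFF headers, and in this log the entry point sits at base + 0x400, so passing the bare image base may place the symbols at the wrong addresses. `text_offset` below is an assumed input that would have to be read from the built image's section headers; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Parse a line such as
   "Loading driver at 0x0003F5CD000 EntryPoint=0x0003F5CD400".
   Returns 1 on success, 0 if the line does not match. */
static int parse_load_line(const char *line,
                           unsigned long long *base,
                           unsigned long long *entry)
{
    const char *p = strstr(line, "Loading driver at");
    if (p == NULL) {
        return 0;
    }
    return sscanf(p, "Loading driver at 0x%llx EntryPoint=0x%llx",
                  base, entry) == 2;
}

/* Build "add-symbol-file <dll> <text address>". text_offset is the
   (assumed) offset of .text from the image base. Returns 1 if the
   command fit in buf. */
static int format_add_symbol_file(char *buf, size_t len, const char *dll,
                                  unsigned long long base,
                                  unsigned long long text_offset)
{
    assert(buf != NULL && dll != NULL);
    int n = snprintf(buf, len, "add-symbol-file %s 0x%llX",
                     dll, base + text_offset);
    return n >= 0 && (size_t)n < len;
}
```

With `text_offset` set to 0 this reproduces the command used above; if breakpoints and locals do not line up, a non-zero .text offset is the first thing to check.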

Then I need to let OVMF continue running to the point where it hangs. But something goes wrong with the symbol loading:

(gdb) set debug_attached=1
No symbol "debug_attached" in current context.

So, is there anything wrong with the way I load the driver symbols using add-symbol-file?
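One plausible reason for "No symbol ... in current context" is that `debug_attached` is a block-local variable: gdb only resolves it while the selected frame is inside that block, and only if the symbols were loaded at the right address. A minimal sketch of a variant that does not depend on the selected frame, assuming a hypothetical file-scope flag `gDebugAttached` (not part of edk2). Once the module's symbols are loaded correctly, `(gdb) set var gDebugAttached = 1` should release the loop from any frame.

```c
#include <assert.h>

/* Hypothetical file-scope flag. Unlike a block-local variable, gdb can
   resolve it regardless of the selected frame. volatile keeps the
   compiler from hoisting the load out of the loop below. */
volatile int gDebugAttached = 0;

/* Spin until a debugger sets gDebugAttached to a non-zero value. */
static void wait_for_debugger(void)
{
    while (!gDebugAttached) {
        /* busy-wait; break in with the debugger here */
    }
    assert(gDebugAttached != 0);
}
```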


Thanks,
Bei Guan


Is there something wrong with it? I will try it.


Thanks,
Bei Guan

Thanks,
A

On May 8, 2011 8:19 PM, "Bei Guan" <gbtju85@gmail.com> wrote:
> 2011/5/9 Andrei Warkentin <andreiw@motorola.com>
>
>> Looks good! And then what? Hang? Crash? Does vga initialize properly?
>>
> Oh, sorry. I haven't reported the result.
> The ovmf_on_xen log contains all the output messages I got using "xl domain
> -c". There is no crash message in the log file
> qemu-dm-domname.log, so I think it hung after printing the last message
> below. The QEMU screen is the same as before: white, with nothing
> shown.
>
> ...
> Found PCI VGA device
> InstallProtocolInterface: 09576E91-6D3F-11D2-8E39-00A0C969723B 3E981710
> <Maybe it hung here>
>
>
> Thanks,
> Bei Guan
>
>
>
>
>> Thanks,
>> A
>> On May 8, 2011 11:28 AM, "Bei Guan" <gbtju85@gmail.com> wrote:
>> > 2011/5/8 Andrei Warkentin <andreiw@motorola.com>
>> >
>> >> On Sat, May 7, 2011 at 3:19 AM, Bei Guan <gbtju85@gmail.com> wrote:
>> >>
>> >> > You say "blacklist" the device; does that mean I need to add a check to
>> >> > the edk2 source code, like the following pseudo-code?
>> >> > if (CurrentPciDevice == PciDeviceNeedToBeBlacklist) {
>> >> > skip and deal with the next PCI device;
>> >> > }
>> >> >
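A minimal sketch of the pseudo-code above, written as plain C against a hypothetical device list rather than the edk2 PCI bus driver. `pci_id`, `k_blacklist`, `pci_is_blacklisted` and `enumerate` are illustrative names, not edk2 APIs, and the single table entry is only an example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical blacklist entry, matched on PCI vendor and device ID. */
typedef struct {
    uint16_t vendor_id;
    uint16_t device_id;
} pci_id;

/* Illustrative entry only: 8086:7010 is the Intel PIIX3 IDE ID. */
static const pci_id k_blacklist[] = {
    { 0x8086, 0x7010 },
};

static int pci_is_blacklisted(uint16_t vendor_id, uint16_t device_id)
{
    for (size_t i = 0; i < sizeof k_blacklist / sizeof k_blacklist[0]; i++) {
        if (k_blacklist[i].vendor_id == vendor_id &&
            k_blacklist[i].device_id == device_id) {
            return 1;
        }
    }
    return 0;
}

/* Count the devices that would be enumerated, skipping blacklisted ones. */
static size_t enumerate(const pci_id *devs, size_t count)
{
    assert(devs != NULL || count == 0);
    size_t used = 0;
    for (size_t i = 0; i < count; i++) {
        if (pci_is_blacklisted(devs[i].vendor_id, devs[i].device_id)) {
            continue; /* skip and deal with the next PCI device */
        }
        used++;
    }
    return used;
}
```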
>> >>
>> >> Yeah...in general, but read below, there may be little value in trying.
>> >>
>> >> >>
>> >> >> Also, consider the fact that you don't really have a "hardware device"
>> >> >> per se, but a software simulation of one in qemu-dm. Questions you
>> >> >> want to answer -
>> >> >>
>> >> >> 1) Could the domain crash be caused by a crash in qemu-dm (and
>> >> >> subsequent exit)? How could you test this theory for yourself?
>> >> >> 2) What's happening inside qemu-dm when you write that register? (gdb
>> >> >> and breakpoints are your friends)
>> >> >
>> >> > Sorry, I have to ask a question here. How do I debug OVMF code with
>> >> > gdb when it is loaded by Xen hvmloader?
>> >> > I used the same method you used for hvmloader, but it doesn't work,
>> >> > so maybe it's a little more complicated.
>> >> > I tried to debug OVMF, as loaded by Xen hvmloader, as follows.
>> >> > a) Add this code before Pci.Write() in the edk2 code:
>> >> > {
>> >> >   volatile int debug_attached = 0;
>> >> >   DEBUG ((EFI_D_INFO, "Debug in Pci.Write: Waiting for debugger...\n"));
>> >> >   while (!debug_attached) {
>> >> >   }
>> >> > }
>> >> > b) Use gdb to attach to hvmloader:
>> >> > #gdbsx -a domainId 32 9999
>> >> > #gdb hvmloader
>> >> > (gdb) target remote localhost:9999
>> >> > (gdb) set debug_attached=1
>> >> > No symbol "debug_attached" in current context.
>> >> >
>> >>
>> >> Because as far as gdb is concerned, you are debugging hvmloader. The
>> >> Pci bus driver code is loaded at a different address by the firmware.
>> >> On the vm console (xm console domainname) you should see loading
>> >> messages that give you a base address.
>> >>
>> >> Then in gdb you should run a command like:
>> >> add-symbol-file /path/to/edk2//Build/./DEBUG_GCC44/X64/MdeModulePkg/Bus/Pci/PciBusDxe/PciBusDxe/DEBUG/PciBusDxe.dll 0x87EAB000
>> >>
>> >> That should allow you to do the same kind of gdbsx stepping... I haven't
>> >> tried it explicitly with gdbsx and OVMF yet, but that would be the
>> >> general way of doing this. Don't forget to run gdbsx in 64-bit mode if
>> >> you boot OVMF-X64.
>> >>
>> >> So from my end I did verify that it was a QEMU crash (you can see for
>> >> yourself in /var/log/xen/qemu-dm-domname.log).
>> >>
>> >> I rebuilt qemu-dm with debugging info (under tools/ioemu-dir, I just
>> >> edited Makefile, make clean, all, install).
>> >> Then I did something like -
>> >> $ xm create test.xm && sleep 1 && xm pause testxm && ps aux | grep qemu-dm
>> >> <note the PID>
>> >>
>> >> In a separate console:
>> >> ~/src/xen-unstable.hg/tools/ioemu-dir$ gdb -p <noted PID> i386-dm/qemu-dm
>> >> (gdb) c
>> >> Continuing
>> >>
>> >> Back on previous console
>> >> $ xm unpause testxm
>> >>
>> >> Stepped on a SIGABRT. Here is the backtrace I got:
>> >>
>> >> Program received signal SIGABRT, Aborted.
>> >> 0x00007f791f00ba75 in raise () from /lib/libc.so.6
>> >> (gdb) bt
>> >> #0 0x00007f791f00ba75 in raise () from /lib/libc.so.6
>> >> #1 0x00007f791f00f5c0 in abort () from /lib/libc.so.6
>> >> #2 0x0000000000409803 in hw_error (fmt=0x4c0a18
>> >> "register_ioport_read: invalid opaque (old is %x new is %x)") at
>> >> /home/fjnh84/src/xen-unstable.hg/tools/ioemu-dir/vl.c:520
>> >> #3 0x00000000004098ad in register_ioport_write (start=<value
>> >> optimized out>, length=<value optimized out>, size=<value optimized
>> >> out>, func=0xffffffffffffffff,
>> >> opaque=0x7f7921711760) at
>> >> /home/fjnh84/src/xen-unstable.hg/tools/ioemu-dir/vl.c:410
>> >> #4 0x0000000000432599 in bmdma_map (pci_dev=<value optimized out>,
>> >> region_num=<value optimized out>, addr=4352, size=<value optimized
>> >> out>, type=<value optimized out>)
>> >> at /home/fjnh84/src/xen-unstable.hg/tools/ioemu-dir/hw/ide.c:3520
>> >> #5 0x0000000000411cbb in pci_update_mappings (d=<value optimized
>> >> out>) at /home/fjnh84/src/xen-unstable.hg/tools/ioemu-dir/hw/pci.c:423
>> >> #6 0x0000000000411e07 in pci_default_write_config (d=0x21b5,
>> >> address=8629, val=6, len=-1) at
>> >> /home/fjnh84/src/xen-unstable.hg/tools/ioemu-dir/hw/pci.c:566
>> >> #7 0x0000000000411566 in pci_data_write (opaque=<value optimized
>> >> out>, addr=8629, val=6, len=-1) at
>> >> /home/fjnh84/src/xen-unstable.hg/tools/ioemu-dir/hw/pci.c:593
>> >> #8 0x000000000046541c in pci_host_data_writew (opaque=<value
>> >> optimized out>, addr=<value optimized out>, val=6)
>> >> at /home/fjnh84/src/xen-unstable.hg/tools/ioemu-dir/hw/pci_host.h:62
>> >> #9 0x0000000000406c9a in ioport_write (index=<value optimized out>,
>> >> address=4294967295, data=6) at
>> >> /home/fjnh84/src/xen-unstable.hg/tools/ioemu-dir/vl.c:316
>> >> #10 0x0000000000406dd1 in cpu_outw (env=<value optimized out>,
>> >> addr=8629, val=6) at
>> >> /home/fjnh84/src/xen-unstable.hg/tools/ioemu-dir/vl.c:448
>> >> #11 0x0000000000472c9e in do_outp (env=0x21b5, addr=8629, size=<value
>> >> optimized out>, val=18446744073709551615) at helper2.c:311
>> >> #12 0x0000000000472e0e in cpu_ioreq_pio (env=0x2753140,
>> >> req=0x7f7921740000) at helper2.c:351
>> >> #13 __handle_ioreq (env=0x2753140, req=0x7f7921740000) at helper2.c:446
>> >> #14 0x00000000004735a1 in cpu_handle_ioreq (opaque=<value optimized
>> >> out>) at helper2.c:515
>> >> #15 0x0000000000408cdf in main_loop_wait (timeout=<value optimized
>> >> out>) at /home/fjnh84/src/xen-unstable.hg/tools/ioemu-dir/vl.c:3794
>> >> #16 0x00000000004732d4 in main_loop () at helper2.c:580
>> >> #17 0x000000000040dc22 in main (argc=<value optimized out>,
>> >> argv=<value optimized out>, envp=<value optimized out>)
>> >> at /home/fjnh84/src/xen-unstable.hg/tools/ioemu-dir/vl.c:6161
>> >>
>> >> (gdb) x/26x aac640 <----------- old opaque value
>> >> No symbol "aac640" in current context.
>> >> (gdb) x/26x 0xaac640
>> >> 0xaac640 <php_devfn>: 0x00 0x00 0x00 0x00 0x00 0x00
>> >> 0x00 0x00
>> >> 0xaac648 <php_devfn+8>: 0x00 0x00 0x00 0x00 0x00 0x00
>> >> 0x00 0x00
>> >> 0xaac650 <php_devfn+16>: 0x00 0x00 0x00 0x00 0x00
>> >> 0x00 0x00 0x00
>> >> 0xaac658 <php_devfn+24>: 0x00 0x00
>> >> (gdb)
>> >> 0xaac65a <php_devfn+26>: 0x00 0x00 0x00 0x00 0x00
>> >> 0x00 0x00 0x00
>> >> 0xaac662 <php_devfn+34>: 0x00 0x00 0x00 0x00 0x00
>> >> 0x00 0x00 0x00
>> >> 0xaac66a <php_devfn+42>: 0x00 0x00 0x00 0x00 0x00
>> >> 0x00 0x00 0x00
>> >> 0xaac672 <php_devfn+50>: 0x00 0x00
>> >>
>> >> The crash appears to be due to register_ioport_read failing to register I/O
>> >> ports, because they are already registered
>> >> for a different device - namely, by code within hw/piix4acpi.c. I
>> >> still haven't quite figured out what's going on, though...
>> >>
>> >> The offending I/O port I believe is 10c0. The dsdt.asl under
>> >> firmware/hvmloader/acpi.c marks [0x10c1, 0x1101] and [0xb044, 0xb047]
>> >> as being reserved (along with a bunch of ports < 0x1000, but that's
>> >> not too interesting). Point is, the port I/O range in
>> >> edk2/OvmfPkg/PlatformPei/Platform.c is specified as 0x1000-0xF000, so
>> >> the PCI bus code will try to assign some of the
>> >> reserved ports to devices and you will crash qemu-dm. Arghhh...
>> >>
>> >> //
>> >> // Add PCI IO Port space available for PCI resource allocations.
>> >> //
>> >> BuildResourceDescriptorHob (
>> >>   EFI_RESOURCE_IO,
>> >>   EFI_RESOURCE_ATTRIBUTE_PRESENT |
>> >>   EFI_RESOURCE_ATTRIBUTE_INITIALIZED,
>> >>   0x1000,
>> >>   0xF000
>> >>   );
>> >>
>> >> ...make the range 0xc000-0x10000 and see how far you get.
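The port-range argument can be checked mechanically. `BuildResourceDescriptorHob ()` takes a start address and a length, so the call above describes ports 0x1000-0xFFFF, and the suggested 0xc000-0x10000 range corresponds to a start of 0xC000 and a length of 0x4000. The sketch below assumes the two reserved ranges quoted from dsdt.asl are the only ones that matter and counts how many of them each window overlaps. It also agrees with the backtrace: `bmdma_map` was called with `addr=4352`, which is 0x1100 and falls inside [0x10c1, 0x1101].

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Inclusive I/O port range [first, last]. */
typedef struct {
    uint32_t first;
    uint32_t last;
} port_range;

/* Ranges marked reserved in the hvmloader DSDT, as quoted above. */
static const port_range k_reserved[] = {
    { 0x10c1, 0x1101 },
    { 0xb044, 0xb047 },
};

/* Return how many reserved ranges intersect the window that a
   resource HOB with this start and length would describe. */
static size_t count_conflicts(uint32_t start, uint32_t length)
{
    assert(length > 0);
    uint32_t end = start + length - 1; /* inclusive */
    size_t hits = 0;
    for (size_t i = 0; i < sizeof k_reserved / sizeof k_reserved[0]; i++) {
        if (start <= k_reserved[i].last && k_reserved[i].first <= end) {
            hits++;
        }
    }
    return hits;
}
```

`count_conflicts(0x1000, 0xF000)` reports both reserved ranges, while `count_conflicts(0xC000, 0x4000)` reports none.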
>> >>
>> >
>> > Yes, when I change the port I/O range in edk2/OvmfPkg/PlatformPei/Platform.c
>> > to 0xc000-0x10000, it gets a little further. The boot trace and qemu-dm debug
>> > messages are attached.
>> >
>> >
>> > Thanks,
>> > Bei Guan
>> >
>> >
>> >> ...note that the PCI memory range should be changed too... the range
>> >> should be 0xf0000000 until 0xfc000000. I've updated my patch at
>> >> https://github.com/andreiw/andreiw-wip/raw/master/xen/ovmf-support/ovmf-unstable-4.2.patch
>> >> (didn't split it yet), although nothing there should impact this bug
>> >> (as long as you don't have 3.8GB of RAM assigned to VM ;-))
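A small sketch of the arithmetic behind the suggested PCI memory window and the RAM caveat: 0xf0000000-0xfc000000 spans 0x0C000000 bytes (192 MiB) and starts at 3840 MiB (3.75 GiB), so guest RAM mapped below 4 GiB has to end at or below 0xF0000000. In PlatformPei this would presumably be a resource HOB of type `EFI_RESOURCE_MEMORY_MAPPED_IO` with start 0xF0000000 and length 0x0C000000. `low_ram_end` is a hypothetical input standing in for wherever the guest's low RAM actually ends.

```c
#include <assert.h>
#include <stdint.h>

/* PCI memory window suggested above: [0xF0000000, 0xFC000000). */
#define PCI_MMIO_BASE  0xF0000000ULL
#define PCI_MMIO_LIMIT 0xFC000000ULL /* exclusive */

/* Length of the window, i.e. the HOB's length argument. */
static uint64_t pci_mmio_length(void)
{
    return PCI_MMIO_LIMIT - PCI_MMIO_BASE;
}

/* Returns 1 if guest RAM occupying [0, low_ram_end) would overlap
   the PCI memory window. low_ram_end is a hypothetical input. */
static int ram_overlaps_mmio(uint64_t low_ram_end)
{
    assert(PCI_MMIO_BASE < PCI_MMIO_LIMIT);
    return low_ram_end > PCI_MMIO_BASE;
}
```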
>> >>
>> >> I looked briefly at some relatively-fresh qemu sources I have, and
>> >> hw/piix4acpi.c isn't there, which makes me think it's something
>> >> specific to the Xen fork.
>> >>
>> >> A
>> >>
>>