[SeaBIOS] varlow/extrastack vs code

Tue Feb 14 17:36:34 CET 2017

On 02/14/17 17:22, Kevin O'Connor wrote:
> On Tue, Feb 14, 2017 at 02:50:23PM +0000, Dr. David Alan Gilbert wrote:
>> I've been sporadically carrying on debugging this and I
>> think I have a bit more understanding, but not quite all the way.
>>
>> I'm pretty sure we are trying to run code that's been over written
>> by variable data - which I believe to be TimerLast, and I'm also
>> fairly sure the problem is not a delayed reset from multiple reboot signals.
>>
>> Some detail:
>> a) I added some debug to qemu to print what triggered the reboot by
>>   always passing qemu_system_reset_request a string; the kernel
>>   is always rebooting via KBD_CCMD_RESET, I see that enter the BIOS
>>   and then see SeaBIOS then does KBD_CCMD_RESET again - which I think
>>   is the one coming from handle_resume32.
>>   But then I noticed that every so often we got a reboot from a KVM_SHUTDOWN
>>   which is apparently a triple-fault or the like.
>>
>> b) With a KVM trace I got the addresses the fault was happening at,
>>   and it was always an address in the copy fo the bios around bffbcf..
>>   although the actual failure jumped about a bit, taking the:
>>     Relocating init from 0x000e4810 to 0xbffb2050 (size 57120)
>>   I worked the fault address back to 'ef799' and from the objdump
>>   found that was an address in maininit but also corresponded to
>>   TimerLast:
> 
> I can confirm there is a defect with TimerLast and reboot handling.
> I'm not sure what the best fix is - suggestions welcome.
> 
> The root of the problem is that QEMU does not reset the f-segment on a
> reboot signal

Are you sure this applies to recent QEMU too, not just ancient QEMU?

> and so SeaBIOS must manually reset that memory.

Do you mean a workaround for the (historically lacking) emulation of the
PAM chipset registers? PAM emulation has been correct for quite a while
now in QEMU, as far as I can tell.

(Sorry if I'm completely off.)

Thanks,
Laszlo

> It does
> this in the call to qemu_prep_reset() in tryReboot().  That call
> copies the pristine copy of the bios at 0xffff0000 to 0xf0000.  The
> code then goes on to signal a hard reboot so that all the hardware is
> reset and so that the code starts just like it would on first boot.
> 
> Unfortunately, the hard reboot logic makes calls to udelay() -
> specifically in i8042_reboot() and pci_reboot().  These calls to
> udelay() are not valid here as the timer isn't even necessarily
> configured at this point.  Alas - it also has the nasty possibility of
> corrupting the pristine copy of the bios that was just made in
> qemu_prep_reset().  Random crashes can then occur on the next boot.
> 
> It's easy to catch this with the following code change:
> 
> --- a/src/hw/timer.c
> +++ b/src/hw/timer.c
> @@ -146,6 +146,8 @@ timer_adjust_bits(u32 value, u32 validbits)
>  static u32
>  timer_read(void)
>  {
> +    if (!GET_GLOBAL(HaveRunPost))
> +        panic("timer read too early\n");
>      u16 port = GET_GLOBAL(TimerPort);
>      if (CONFIG_TSC_TIMER && !port)
>          // Read from CPU TSC
> 
> With that, I get a panic on every reboot.  Commenting out the udelay()
> calls then gets reboots working again.
> 
> The simplest fix is to remove the udelay() calls from the reboot
> logic.  However, it's not clear to me if it's valid to omit the delays
> when running on real hardware.  Another possibility would be to try
> and harden timer_read() so that it can be run very early (prior to
> code relocation), but that's tricky to implement correctly..
> 
> Good find Dave - I know that wasn't easy to track down.
> -Kevin
>