Help to evaluate the system crash resistance method

Posted: 01-16-2006, 03:13 PM
Hi,

I have thought of an approach to prevent system crashes caused by minor
software problems. Please help me evaluate it:

1. Hook the IDT: replace the exception vectors for faults such as memory
access violation and divide by zero with our own handling logic.

2. In our handler, if the IRQL is above dispatch level, transfer control to
the original OS handler, which will eventually show the blue screen. If it
is not, call KeDelayExecutionThread() to hold the faulting thread for a
while, so that the kernel reschedules to other threads. In this case the
system stays alive instead of going to a blue screen.

I have spent several years fixing kernel bugs, and it is a pity to see the
system die so often because of a minor driver problem. I would like to
develop a kernel component that helps with this. It does NOT target every
crash case: I am aware that many crashes are so severe that holding the
thread for a while is of no use. It targets software faults such as divide
by zero (DBZ) and memory access violations, each of which has a separate
IDT entry that can be selectively replaced.

My questions are:
1. Is there an official, supported way to get the IDT address and
selectively replace individual IDT entries?
2. How long can KeDelayExecutionThread() hold the faulting thread in
practice?
3. Will the overall mechanism work when driver code triggers a kernel crash?

Thank you!

TR



Responses to "Help to evaluate the system crash resistance method"

Don Burn
Re: Help to evaluate the system crash resistance method
Posted: 01-16-2006, 03:17 PM
As someone who has worked in the fault-tolerant part of the computer
industry, I can tell you the problem is a lot harder than you imagine. I
know of a number of companies working on potential solutions (I am a
founder of one of them), but you are not going to see discussions of their
technology here; you won't get that without an NDA.

What I can say is you either have to wrap a layer of protection around the
whole driver (such as moving it into its own address space) or provide a way
to capture enough system state to go back before the crash and do something
to avert it. Neither of these is a small task such as tweaking an IDT
member, and neither can easily be explained in a newsgroup.


--
Don Burn (MVP, Windows DDK)
Windows 2k/XP/2k3 Filesystem and Driver Consulting
Remove StopSpam from the email to reply

darkside
Re: Help to evaluate the system crash resistance method
Posted: 01-16-2006, 03:27 PM
I don't understand what you mean by "such as moving it into its own address
space" (moving what into whose address space?). Can you explain why?

As for the system state, I think the processor should preserve most of it,
if not all. This is a standard kernel exception handling mechanism; the
processor and OS know which things they should keep in mind...

Don Burn
Re: Help to evaluate the system crash resistance method
Posted: 01-16-2006, 03:38 PM
See my comments on microsoft.public.development.device.drivers. Posting the
same question independently to multiple groups is not a nice idea.


--
Don Burn (MVP, Windows DDK)
Windows 2k/XP/2k3 Filesystem and Driver Consulting
Remove StopSpam from the email to reply