Using Error Reporting on WinCE
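I carefully read the article on WinCE error reporting that I dug up last night and learned a great deal from it. It already covers the workflow in enough detail, so I won't repeat it here; anyone who needs it should read the original directly. What I want to add is one point that the original does not seem to cover and that I consider critical: how to retain error reports in a shipping product.
The article says to copy the error report from the Windows directory to some other non-volatile storage for later analysis and processing. That is fine during development. But once the product has shipped and we need to collect error reports from customers' devices, it no longer works.
In my first tests I found that when an application fails, an error report is generated under the Windows directory, but once you choose "Don't Send" and the Error Reporting dialog closes, the generated report file is deleted automatically. I suspect that is why the original article says to copy it out. Clearly, we need to move the dump directory to the SD card or a NAND directory and stop the system from deleting the file automatically.
After a long stretch of trial and error I finally found the solution: change the relevant registry values. The main changes are in the following two keys.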
[HKEY_LOCAL_MACHINE\System\ErrorReporting\DumpSettings]
    "DumpDirectory"="\\Windows\\DumpFiles"              ; change to the target directory
    "ExtraFilesDirectory"="\\Windows\\ExtraDumpFiles"   ; change to the target directory
    "CabDirectory"="\\Windows\\DumpFiles\\CabFiles"     ; change to the target directory
    "UploadClient"="\\Windows\\Dw.exe"
    "MaxDiskUsage"=dword:80000
    "DumpEnabled"=dword:1
[HKEY_LOCAL_MACHINE\System\ErrorReporting\UploadSettings]
    "NoConsentRequired"=dword:0    ; change to 1 to suppress the prompt dialog
    "DontUpload"=dword:0           ; change to 1 to disable upload and automatic deletion, so the report stays in the directory
    "MaxWeeklyReports"=dword:0
    "MaxDailyReports"=dword:0
    "UploadSucceededDlg"=dword:0
    "UploadFailedDlg"=dword:0
[HKEY_LOCAL_MACHINE\init]
    "Launch95"="dw.exe"            ; delete this entry to stop the program from launching at startup
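In addition, you must create the Dumpfiles directory under the target location; otherwise the error report will not be generated correctly. Pay particular attention to this. With everything configured correctly, running Crash.exe again produces a file named Ce092009-01.kdmp in the B:\Dumpfiles\Ce092009-01 directory. I tested in the emulator with B: (a RAM disk) serving as the emulator's Storage Card, which is why the file was generated there. The error report can now be analyzed with WinDbg. Set the source path, image path, and symbol path first, then open Ce092009-01.kdmp, and you can locate the faulting position in the source code, as shown in the figure below.
At this point the bug is plain to see, and the fix is easy.
After tinkering for most of a day, the actual change turned out to be tiny: just a few registry values. But without the tinkering, how would I have known what to change and how? I had the same feeling recently while bringing up a new development board: the final change may be simple, but finding where and how to make it takes a lot of time. No matter, I rather enjoy this kind of tinkering.
Yesterday I worked out a workaround that should ease the current problem. Today I worked out a proper approach that should help solve it for good. With both in hand, I hope the product becomes more stable.
WinDbg download: http://msdl.microsoft.com/download/symbols/debuggers/dbg_x86_6.11.1.402.msi. It is a very powerful tool, and using it well is a skill in its own right; I will study it further when I have time.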
_____________________________________________________________________________________________________________
by Abraham Kcholi and Gad Meir
Introduction
Because we believe that we are perfect, it follows that we create perfect software.
Therefore, it is the hardware's fault when our systems crash. By "systems," of course, we're referring to the combination of hardware, operating system, and applications that make up the whole embedded system.
The scenario goes like this... We deliver the system to the client, get paid (hopefully), and then a week later, we get a nervous call in the middle of the night: "Your system crashed."
Trying to get oriented and open our eyes, we start to query the person on the other end of the line regarding what really occurred, and we end up realizing that something caused the system to crash. We promise to start investigating the problem first thing
in the morning. But our beauty sleep has now evaporated, and it's time to go and trace that crash.
If only we had incorporated the Windows Error Reporting (WER) module into our system! This would have let us retrieve the state of our device at the time the program crashed. More than that, we could have uploaded it from the device, stuck it in WinDbg, and determined the exact line where our mischievous code broke down.
Motivation
As is usually the case, demonstrating this new feature of Windows CE 5.0 is the best way to explain what it does and illustrate its usefulness. Our scenario assumes you developed a program or module running on a Windows CE 5.0 based device, the program is installed
on thousands of units, and complaints are flowing in from end users that the application sometimes crashes. Wouldn't it be nice to know exactly why each crash happens, to the level of having a stack trace with the source code line number and the value of local
variables at the point of the crash? Well, WER gives you just that.
Our application can be any application, whether native or managed. To demonstrate that it could be any type of application, we will use a simple console application running on a Pocket PC device, but it could be a Windows application, or a special purpose Windows
CE module.
The process we are going to describe is CPU agnostic and can be used for any type of hardware running Windows CE 5.0.
Here is the source code of our sample culprit application:
Figure 1
After deploying the application onto the device, running the application will obviously cause a divide-by-zero exception. Since our device has the Error Reporting feature incorporated into its image, a polite message will pop up, asking the user if he or she
would like to share their unfortunate experience of that offensive application with Microsoft.
Figure 2
There are two links on the page. Let's examine the second link a little bit further: the link indicated by "To View technical information contained in this error report."
Figure 3
It looks like two files are about to be uploaded. Clicking each of the links, to find out more about what is sent, yields:
Figure 4
Figure 5
The first looks like a report, and the second looks like a memory dump.
Later in the article, we'll see how this information can travel from Microsoft buckets directly to your product support FTP. For now, let's dig a little bit into the CE device file system.
We're interested specifically in the My Device/Windows/System/DumpFiles folder. It is a hidden folder, so you'll need to set "show all" for the file explorer to view it.
Figure 6
In this folder, you are going to find another folder with the prefix "CE," the date of the application crash in the format "MMDDYY," and a sequence number, in case you are lucky enough to have several applications crash on the same day. In that folder, you
can find one or two files that are actually the sources of the data you have seen previously in the Windows CE ER reports.
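The naming scheme described above can be sketched in a few lines of C. This is only an illustration, not code from the error reporting sources: the function name format_dump_folder is hypothetical, and the exact layout ("Ce" prefix, MMDDYY, then a two-digit sequence number after a hyphen) is an assumption inferred from the single folder name Ce092009-01 quoted at the top of this page.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: builds a dump folder name such as "Ce092009-01".
 * Assumed layout (inferred from one example): "Ce" + MMDDYY + "-" + NN.
 * Returns 1 on success, 0 if an input is out of range or the buffer
 * is too small to hold the whole name. */
int format_dump_folder(char *out, size_t out_size,
                       unsigned month, unsigned day,
                       unsigned year2, unsigned seq)
{
    int n;

    if (month < 1 || month > 12 || day < 1 || day > 31 ||
        year2 > 99 || seq > 99) {
        return 0;
    }

    n = snprintf(out, out_size, "Ce%02u%02u%02u-%02u",
                 month, day, year2, seq);

    /* snprintf returns the length it wanted to write; reject truncation. */
    return n > 0 && (size_t)n < out_size;
}
```

Calling format_dump_folder(name, sizeof name, 9, 20, 9, 1) with a 16-byte buffer fills name with the folder name from that example. A support tool could use a helper like this to predict which folder to look for on a returned device.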
Since those files are deleted after you make up your mind about sending the report to Microsoft, let's copy that folder to a safe place for further examination (copy and paste to another folder, or to an SD card).
Figure 7
Assuming you got the dump somehow -- from Microsoft, from your customer, or you grabbed it yourself from the customer's device during maintenance -- let's see what can be learned from that dump.
Some sort of analysis tool is needed in order to analyze the dump, and the best one is WinDbg. WinDbg is included in the "Debugging Tools for Windows" package, freely available from the Microsoft WHDC site. We'll talk more about that package later, but first we must set the stage.
We need access to the source code of our application; we need the symbol files produced by the compiler and the application image (exe file). Since we created that program in the first place, it's probably a very straightforward process to get it onto our workstation.
So, assuming we have installed and configured the needed tools on our workstation, let's start the analysis process:
Start WinDbg
Drag and drop the dump onto the WinDbg window
Assuming everything is set up correctly, the result would be as shown in Figure 8.
Figure 8
It becomes clear that it may be a divide-by-zero exception:
The assembly snippet shows the exact instruction that caused the crash. However, that's just the beginning. Let's click the stack trace button:
Figure 9
As can be seen, we can tell the exact source file and line number that are causing the problem -- and that's not the end of the story. If we move to the stack frame in our program, we can open up a new source window with the faulty line clearly marked:
Figure 10
And, last but not least, the locals window is going to give us the local values at that frame, including the value of "i":
Figure 11
It's very tempting to change the value of "i" and retry the application in the debugger, but there are several practical reasons why you can't do that.
First, the host workstation we are using to analyze the data is most likely an x86-based computer, whereas the target device may be an ARM-based device or a MIPS-based device, etc., and although it looks like a live debugging session, you are actually debugging
a piece of dump memory created automatically for you by the Windows CE WER function. Nevertheless, if you can get the dump to your analyzing machine, you can tell exactly what happened to your app at the moment of the crash, which is the primary motivation
for the article you are reading right now.
By the way, if you are too lazy to remember all those debugger commands, just remember one: the !analyze -v command. The following output of that command might explain why it is probably the most useful command in WinDbg:
Figure 12
Adding Error reporting to the image
Among Windows CE 5.0's most interesting new features is a set of error reporting components that we can add to our image. There are four components that can be added to the image from the catalog. The report upload component, however, can either provide a graphical user interface or not. Figure 13 shows the error reporting catalog items.
Figure 13
With error reporting incorporated into our device, when a program crashes, the device will automatically save the state of the device at the point in time the program crashed. The error report generator will save a dump file, which includes some very useful
information that should be helpful in eliminating bugs that escaped the testing process.
Error Report Generator
The Error Report Generator is the component responsible for the creation of dump files using the configuration options set in the registry.
The dump file formats are compatible with the requirements of Microsoft's Watson website.
This enables the uploading server to handle classification of -- and reporting of -- the uploaded dump files.
To generate an error report dump file, at least 128KB of memory must be reserved. The OAL developer initializes the size of the memory to be reserved by setting a variable named dwNKDrWatsonSize. This is done in the OEMInit function, as shown in Figure 14.
Figure 14
The kernel will use this size to reserve a block of memory at the end of the main memory. The Sysgen variable SYSGEN_WATSON_DMPGEN must be set to include the Error Report Generator in the image.
The HKEY_LOCAL_MACHINE/System/ErrorReporting/DumpSettings registry key holds the registry values for error report generation. Figure 15 is a sample of such a registry setting.
Figure 15
The Error Report Transfer Driver transfers registry setting values to the aforementioned reserved memory. The Error Report Generator then retrieves these settings from memory, in order to generate the appropriate dump file. The settings tell the Error Report Generator where to generate the dump file and what type of dump to create; in this case it's the system dump, and the maximum disk size to use is four times the size of the reserved memory.
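The relationship between the reserved memory and the disk budget in this sample is simple arithmetic, sketched below in C. The macro and function names are hypothetical, and the factor of four is taken from this sample configuration rather than being a documented rule.

```c
#include <stdint.h>

/* Minimum reserved memory the article cites for dump generation: 128 KB. */
#define WATSON_MIN_RESERVED_BYTES (128u * 1024u)

/* Hypothetical helper: disk budget for stored dumps, assuming the
 * "four times the reserved memory" ratio of the sample configuration. */
uint32_t max_disk_usage_bytes(uint32_t reserved_bytes)
{
    return reserved_bytes * 4u;
}
```

With the 128 KB minimum, this gives 524288 bytes, which is 0x80000 in hexadecimal and matches the "MaxDiskUsage"=dword:80000 value in the DumpSettings listing at the top of this page.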
While developing an OS design, the developer sets the type of crash dump to be generated. Three types can be generated, all following the same file format:
Context dumps, 4 KB to 64 KB
    Information about the crashing system
    The exception that initiated the crash
    The context record of the faulting thread
    A module list, limited to the faulting thread's owner process
    A thread list, limited to the faulting thread's owner process
    The call stack of the faulting thread
    64 bytes of memory above and below the instruction pointer of the faulting thread
    Stack memory dump of the faulting thread, truncated to fit a 64 KB limit
System dumps, 64 KB to several MB
    All information in a context dump
    Call stacks and context records for all threads
    Complete module, process, and thread lists for the entire device
    2048 bytes of memory above and below the instruction pointer of the faulting thread
    Global variables for the process that was current at the time of the crash
Complete dumps, all physical memory plus at least 64 KB
    All information in a context dump
    A complete dump of all used memory
The Error Report Generator generates files in a well-defined format. Each file starts with a single MINIDUMP_HEADER structure, followed by a number of MINIDUMP_DIRECTORY entries. Each entry describes a data type (such as system info or exception info), the size in bytes of the data stored in the file, and a Relative Virtual Address (RVA), that is, an offset from the beginning of the file to where the data begins.
All the relevant structures can be found in $(_COMMONOAKROOT)/INC/DwCeDump.h.
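To make the header-plus-directory layout concrete, here is a minimal C sketch of how a reader might walk such a file. The structures below are simplified and hypothetical stand-ins; the real MINIDUMP_HEADER and MINIDUMP_DIRECTORY definitions are in DwCeDump.h and have more fields, possibly in a different order. The sketch shows only the idea that each directory entry carries a type, a size, and an offset from the start of the file.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified layouts for illustration only.
 * The real definitions are in DwCeDump.h and may differ. */
typedef struct {
    uint32_t signature;      /* file magic (value not checked here)        */
    uint32_t stream_count;   /* number of directory entries                */
    uint32_t directory_rva;  /* offset of the first directory entry        */
} DumpHeader;

typedef struct {
    uint32_t stream_type;    /* kind of data, e.g. system or exception info */
    uint32_t data_size;      /* size of the data in bytes                   */
    uint32_t data_rva;       /* offset of the data from the file start      */
} DumpDirectoryEntry;

/* Searches the directory for the first entry of the wanted type.
 * Returns 1 and fills *out if the entry exists and its data lies
 * inside the file; returns 0 otherwise. */
int find_stream(const uint8_t *file, size_t file_size,
                uint32_t wanted_type, DumpDirectoryEntry *out)
{
    DumpHeader hdr;
    uint32_t i;

    if (file_size < sizeof hdr) {
        return 0;
    }
    memcpy(&hdr, file, sizeof hdr);

    for (i = 0; i < hdr.stream_count; i++) {
        DumpDirectoryEntry entry;
        size_t off = (size_t)hdr.directory_rva + (size_t)i * sizeof entry;

        /* Stop if the directory entry itself would run past the file. */
        if (off > file_size || file_size - off < sizeof entry) {
            return 0;
        }
        memcpy(&entry, file + off, sizeof entry);

        if (entry.stream_type == wanted_type) {
            /* Reject entries whose data range falls outside the file. */
            if (entry.data_rva > file_size ||
                file_size - entry.data_rva < entry.data_size) {
                return 0;
            }
            *out = entry;
            return 1;
        }
    }
    return 0;
}
```

The bounds checks matter in practice: a dump retrieved from a customer device may be truncated, so offsets read from the file should never be trusted without validation.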
Error Report Transfer Driver
The Error Report Transfer Driver moves the registry values (needed by the Error Report Generator) from the registry to the reserved memory block, and moves the generated files from reserved memory into persistent files.
After transferring a dump file to persistent storage, the Error Report Transfer Driver launches the Report Upload Client specified in the registry.
The Sysgen variable "SYSGEN_WATSON_XFER" must be set to include the Error Report Transfer Driver in the image.
The HKEY_LOCAL_MACHINE/Drivers/BuiltIn/ErrorReporting registry key holds the registry values for the Error Report Transfer Driver. Figure 16 shows a sample of such a registry setting, in which the time interval for transfer polling is set to 5 minutes and the poll priority is set to 249.
Figure 16
Error Report Control Panel
The Error Reporting Control Panel allows the user of a display-based device to configure options for dump file generation by way of a Control Panel applet. The options available to the user are:
Enable/disable error reporting -- on a display-based device, error reporting is enabled by default. On a headless device, error reporting is disabled by default.
Control the amount of storage space allocated for dump files -- the control panel dialog box contains a set of radio buttons that allow the user to select the amount of storage space for storing dump files, as can be seen in Figure 17.
Enable user notification dialogs
Figure 17
The Sysgen variable "SYSGEN_WATSON_CTLPNL" must be set to include the Error Reporting Control Panel in the image.
The registry settings contained in the HKLM/System/ErrorReporting/DumpSettings registry key and in the HKLM/System/ErrorReporting/UploadSettings registry key are used by the Error Reporting Control Panel to set the initial values in the control panel dialog.
Report Upload Client
The upload client is responsible for uploading the generated dump file to the watson.microsoft.com error reporting web site. It is, however, possible to upload this file to another web site, but that involves code changes. For example, the function FValidBucketResponseURL, implemented in (_PUBLICROOT)/WCESHELLFE/OAK/WATSON/DWUI/DWUIDLGS.CPP, must be changed so that it validates a different website than the one mentioned above.
Another file you want to look at is (_PUBLICROOT)/COMMON/OAK/INC/DWPUBLIC.H. Here, you can define a valid response server (VALID_RESPONSE_SERVER) for your server, and, of course, you need to create an upload website capable of getting bucket parameters, grouping minidumps into buckets, and responding to the upload client. While all this is possible, it might not be worth the trouble.
Minidumps and Buckets
A minidump is a dump file generated on the device by Dr. Watson, containing the most important parts of a crashed application. Its "mini" name results from the fact that it contains only what is needed to identify and analyze the crashed application. A bucket represents a unique bug or problem and identifies the component responsible for the bug. Bucketing helps the upload server to organize uploaded minidumps: minidumps describing the same problem are grouped together in what is termed a bucket.
The structure DMPFILEINFO contains all the information needed to group a minidump file in a bucket:
// Structure to contain information regarding the dump file
typedef struct tagDMPFILEINFO
{
    WORD    wBucketParams;      // how many bucket parameters are being used
    LPWSTR  rgwzBucketParams[MAX_BUCKETPARAMS]; // bucket parameters for generic mode
    LPWSTR  pwzQueryString;     // additional query string
    LPWSTR  pwzAppName;         // Name to display in the UI.
    LPWSTR  pwzFilesToKeep;     // files to include in log but not delete
    LPWSTR  pwzFilesToDelete;   // files to include in log but delete when finished
    BOOL    fGenericParams;     // True indicates the bucket parameters are generic parameters
} DMPFILEINFO, *PDMPFILEINFO;
Figure 18
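The grouping idea can be illustrated with a short C sketch. The article does not describe how the upload server actually compares reports, so everything here is an assumption: the BucketKey type, the MAX_PARAMS limit, and the rule that two reports share a bucket exactly when they carry the same number of parameters with identical values are all hypothetical.

```c
#include <wchar.h>
#include <stddef.h>

#define MAX_PARAMS 10  /* hypothetical stand-in for MAX_BUCKETPARAMS */

/* Hypothetical, reduced view of the bucket fields of a dump report. */
typedef struct {
    unsigned short param_count;          /* how many parameters are used */
    const wchar_t *params[MAX_PARAMS];   /* the bucket parameter strings */
} BucketKey;

/* Assumed rule: two reports belong to the same bucket when they have
 * the same number of parameters and every parameter string matches.
 * Returns 1 for a match, 0 otherwise. */
int same_bucket(const BucketKey *a, const BucketKey *b)
{
    unsigned short i;

    if (a->param_count != b->param_count || a->param_count > MAX_PARAMS) {
        return 0;
    }
    for (i = 0; i < a->param_count; i++) {
        if (wcscmp(a->params[i], b->params[i]) != 0) {
            return 0;
        }
    }
    return 1;
}
```

Under this rule, two crashes of the same application build would land in one bucket, while a crash from a different build would open a new one, which is the behavior a custom upload site would need to reproduce in some form.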
How does it work?
When an application crashes, Dr. Watson goes into action and calls a function, GenerateDumpFileContent, implemented in (_WINCEROOT)/PRIVATE/WINCEOS/UTILS/USREXCEPTDMP/UDUMPGEN.CPP.
This function does most of the work. To make sure Dr. Watson is not preempted before it completes its job, the function sets its thread to the highest priority and its quantum to run to completion. It then gathers system, module, exception, process, and thread information into a CRASH_DATA structure defined in the same file. This structure actually defines a collection of structures. Once crash data has been collected, it resets the thread to its original state and writes the crash data to a dump file. That's it.
Epilogue
This article is by no means a comprehensive view on the subject of post mortem debugging and error reporting of retail devices. However, it should be viewed as a teaser for the reader to delve into the subject and take a look at the sources available. The following
locations are a good place to begin:
(_PUBLICROOT)/WCESHELLFE/OAK/WATSON/DWUI
(_WINCEROOT)/PRIVATE/WINCEOS/COREOS/NK/OSAXS
(_WINCEROOT)/PRIVATE/WINCEOS/UTILS/USREXCEPTDMP
We hope that error reporting will become part of the retail images you create, mainly so you can provide better and more robust systems for your clients.