2005년 03월 25일
Tips for Improving Time-Critical Code
Sorting and Searching
Sorting is inherently time consuming compared to many typical operations. The best way to avoid unnecessary slowdown is to avoid sorting at critical times. You may be able to:
Defer sorting until a non-performance–critical time.
Sort the data at an earlier, non-performance–critical time.
Sort only the part of the data that truly needs sorting.
Sometimes, you can build the list in sorted order. Be careful, because if you need to insert data in sorted order, you may require a more complicated data structure with poor locality of reference, leading to cache misses and page faults. There is no approach that works in all cases. Try several approaches and measure the differences.
Here are some general tips for sorting:
Use a stock sort to minimize bugs.
Any work you can do beforehand to reduce the complexity of the sort is worthwhile. If a one-time pass over your data simplifies the comparisons and reduces the sort from O(n log n) to O(n), you will almost certainly come out ahead.
Think about the locality of reference of the sort algorithm and the data you expect it to run on.
There are fewer alternatives for searches than for sorting. If the search is time-critical, a binary search or hash table lookup is almost always best, but as with sorting, you must keep locality in mind. A linear search through a small array can be faster than a binary search through a data structure with a lot of pointers that causes page faults or cache misses.
Shared Libraries
Code reuse is desirable. However, if you are going to use someone else's code, you should make sure you know exactly what it does in those cases where performance is critical to you. The best way to understand this is by stepping through the source code or by measuring with tools such as PView or Performance Monitor.
Heaps
Use multiple heaps with discretion. Additional heaps created with HeapCreate and HeapAlloc let you manage and then dispose of a related set of allocations. Don't commit too much memory. If you're using multiple heaps, pay special attention to the amount of memory that is initially committed.
Instead of multiple heaps, you can use helper functions to interface between your code and the default heap. Helper functions facilitate custom allocation strategies that can improve the performance of your application. For example, if you frequently perform small allocations, you may want to localize these allocations to one part of the default heap. You can allocate a large block of memory and then use a helper function to suballocate from that block. If you do this, you will not have additional heaps with unused memory because the allocation is coming out of the default heap.
In some cases, however, using the default heap can reduce locality of reference. Use Process Viewer, Spy++, or Performance Monitor to measure the effects of moving objects from heap to heap.
Measure your heaps so you can account for every allocation on the heap. Use the C run-time debug heap routines to checkpoint and dump your heap. You can read the output into a spreadsheet program like Microsoft Excel and use pivot tables to view the results. Note the total number, size, and distribution of allocations. Compare these with the size of working sets. Also look at the clustering of related-sized objects.
You can also use the performance counters to monitor memory usage.
Threads
For background tasks, effective idle handling of events may be faster than using threads. It's easier to understand locality of reference in a single-threaded program.
A good rule of thumb is to use a thread only if an operating system notification that you block on is at the root of the background work. Threads are the best solution in such a case because it is impractical to block a main thread on an event.
Threads also present communication problems. You must manage the communication link between your threads, with a list of messages or by allocating and using shared memory. Managing the communication link usually requires synchronization to avoid race conditions and deadlock problems. This complexity can easily turn into bugs and performance problems.
For additional information, see Idle Loop Processing and Multithreading.
Small Working Set
Smaller working sets mean better locality of reference, fewer page faults, and more cache hits. The process working set is the closest metric the operating system directly provides for measuring locality of reference.
ㆍTo set the upper and lower limits of the working set, use SetProcessWorkingSetSize.
ㆍTo get the upper and lower limits of the working set, use GetProcessWorkingSetSize.
ㆍTo view the size of the working set, use Spy++.
ms-help://MS.MSDNQTR.2003FEB.1033/vccore/html/_core_tips_for_improving_time.2d.critical_code.htm
Sorting is inherently time consuming compared to many typical operations. The best way to avoid unnecessary slowdown is to avoid sorting at critical times. You may be able to:
Defer sorting until a non-performance–critical time.
Sort the data at an earlier, non-performance–critical time.
Sort only the part of the data that truly needs sorting.
Sometimes, you can build the list in sorted order. Be careful, because if you need to insert data in sorted order, you may require a more complicated data structure with poor locality of reference, leading to cache misses and page faults. There is no approach that works in all cases. Try several approaches and measure the differences.
Here are some general tips for sorting:
Use a stock sort to minimize bugs.
Any work you can do beforehand to reduce the complexity of the sort is worthwhile. If a one-time pass over your data simplifies the comparisons and reduces the sort from O(n log n) to O(n), you will almost certainly come out ahead.
Think about the locality of reference of the sort algorithm and the data you expect it to run on.
There are fewer alternatives for searches than for sorting. If the search is time-critical, a binary search or hash table lookup is almost always best, but as with sorting, you must keep locality in mind. A linear search through a small array can be faster than a binary search through a data structure with a lot of pointers that causes page faults or cache misses.
Shared Libraries
Code reuse is desirable. However, if you are going to use someone else's code, you should make sure you know exactly what it does in those cases where performance is critical to you. The best way to understand this is by stepping through the source code or by measuring with tools such as PView or Performance Monitor.
Heaps
Use multiple heaps with discretion. Additional heaps created with HeapCreate and HeapAlloc let you manage and then dispose of a related set of allocations. Don't commit too much memory. If you're using multiple heaps, pay special attention to the amount of memory that is initially committed.
Instead of multiple heaps, you can use helper functions to interface between your code and the default heap. Helper functions facilitate custom allocation strategies that can improve the performance of your application. For example, if you frequently perform small allocations, you may want to localize these allocations to one part of the default heap. You can allocate a large block of memory and then use a helper function to suballocate from that block. If you do this, you will not have additional heaps with unused memory because the allocation is coming out of the default heap.
In some cases, however, using the default heap can reduce locality of reference. Use Process Viewer, Spy++, or Performance Monitor to measure the effects of moving objects from heap to heap.
Measure your heaps so you can account for every allocation on the heap. Use the C run-time debug heap routines to checkpoint and dump your heap. You can read the output into a spreadsheet program like Microsoft Excel and use pivot tables to view the results. Note the total number, size, and distribution of allocations. Compare these with the size of working sets. Also look at the clustering of related-sized objects.
You can also use the performance counters to monitor memory usage.
Threads
For background tasks, effective idle handling of events may be faster than using threads. It's easier to understand locality of reference in a single-threaded program.
A good rule of thumb is to use a thread only if an operating system notification that you block on is at the root of the background work. Threads are the best solution in such a case because it is impractical to block a main thread on an event.
Threads also present communication problems. You must manage the communication link between your threads, with a list of messages or by allocating and using shared memory. Managing the communication link usually requires synchronization to avoid race conditions and deadlock problems. This complexity can easily turn into bugs and performance problems.
For additional information, see Idle Loop Processing and Multithreading.
Small Working Set
Smaller working sets mean better locality of reference, fewer page faults, and more cache hits. The process working set is the closest metric the operating system directly provides for measuring locality of reference.
ㆍTo set the upper and lower limits of the working set, use SetProcessWorkingSetSize.
ㆍTo get the upper and lower limits of the working set, use GetProcessWorkingSetSize.
ㆍTo view the size of the working set, use Spy++.
ms-help://MS.MSDNQTR.2003FEB.1033/vccore/html/_core_tips_for_improving_time.2d.critical_code.htm
# by | 2005/03/25 11:26 | 시간.. | 트랙백 | 덧글(8)





☞ 내 이글루에 이 글과 관련된 글 쓰기 (트랙백 보내기) [도움말]