Tips for Improving Time-Critical Code

Sorting and Searching
Sorting is inherently time consuming compared to many typical operations. The best way to avoid unnecessary slowdown is to avoid sorting at critical times. You may be able to:

Defer sorting until a non-performance–critical time.
Sort the data at an earlier, non-performance–critical time.
Sort only the part of the data that truly needs sorting.
Sometimes, you can build the list in sorted order. Be careful, because if you need to insert data in sorted order, you may require a more complicated data structure with poor locality of reference, leading to cache misses and page faults. There is no approach that works in all cases. Try several approaches and measure the differences.

Here are some general tips for sorting:

Use a stock sort to minimize bugs.
Any work you can do beforehand to reduce the complexity of the sort is worthwhile. If a one-time pass over your data simplifies the comparisons and reduces the sort from O(n log n) to O(n), you will almost certainly come out ahead.
Think about the locality of reference of the sort algorithm and the data you expect it to run on.
There are fewer alternatives for searches than for sorting. If the search is time-critical, a binary search or hash table lookup is almost always best, but as with sorting, you must keep locality in mind. A linear search through a small array can be faster than a binary search through a data structure with a lot of pointers that causes page faults or cache misses.

Shared Libraries
Code reuse is desirable. However, if you are going to use someone else's code, you should make sure you know exactly what it does in those cases where performance is critical to you. The best way to understand this is by stepping through the source code or by measuring with tools such as PView or Performance Monitor.

Heaps
Use multiple heaps with discretion. Additional heaps created with HeapCreate and HeapAlloc let you manage and then dispose of a related set of allocations. Don't commit too much memory. If you're using multiple heaps, pay special attention to the amount of memory that is initially committed.

Instead of multiple heaps, you can use helper functions to interface between your code and the default heap. Helper functions facilitate custom allocation strategies that can improve the performance of your application. For example, if you frequently perform small allocations, you may want to localize these allocations to one part of the default heap. You can allocate a large block of memory and then use a helper function to suballocate from that block. If you do this, you will not have additional heaps with unused memory because the allocation is coming out of the default heap.

In some cases, however, using the default heap can reduce locality of reference. Use Process Viewer, Spy++, or Performance Monitor to measure the effects of moving objects from heap to heap.

Measure your heaps so you can account for every allocation on the heap. Use the C run-time debug heap routines to checkpoint and dump your heap. You can read the output into a spreadsheet program like Microsoft Excel and use pivot tables to view the results. Note the total number, size, and distribution of allocations. Compare these with the size of working sets. Also look at the clustering of related-sized objects.

You can also use the performance counters to monitor memory usage.

Threads
For background tasks, effective idle handling of events may be faster than using threads. It's easier to understand locality of reference in a single-threaded program.

A good rule of thumb is to use a thread only if an operating system notification that you block on is at the root of the background work. Threads are the best solution in such a case because it is impractical to block a main thread on an event.

Threads also present communication problems. You must manage the communication link between your threads, with a list of messages or by allocating and using shared memory. Managing the communication link usually requires synchronization to avoid race conditions and deadlock problems. This complexity can easily turn into bugs and performance problems.

For additional information, see Idle Loop Processing and Multithreading.

Small Working Set
Smaller working sets mean better locality of reference, fewer page faults, and more cache hits. The process working set is the closest metric the operating system directly provides for measuring locality of reference.

ㆍTo set the upper and lower limits of the working set, use SetProcessWorkingSetSize.
ㆍTo get the upper and lower limits of the working set, use GetProcessWorkingSetSize.
ㆍTo view the size of the working set, use Spy++.

ms-help://MS.MSDNQTR.2003FEB.1033/vccore/html/_core_tips_for_improving_time.2d.critical_code.htm

by 드레 | 2005/03/25 11:26 | 시간.. | 트랙백 | 덧글(8)

Dynamic Clustering..

그림은 분산 서버를 구현하는 방법을 나타낸다. 그 중에 나에 눈에 띄는 방식이 있어 올려본다. Dynamic Clustering..


아마 Sphere-Tree의 알고리즘을 이용하지 않을까 라는 생각이다.

by 드레 | 2005/03/23 18:32 | 시간.. | 트랙백 | 덧글(0)

JAVA LiveConnect

JavaScript to Java communication :
ㆍ http://wp.netscape.com/eng/mozilla/3.0/handbook/javascript/livecon.htm#1007749

LiveConnect provides three ways for JavaScript to communicate with Java:

Call Java methods directly.
Control Java applets.
Control Java plug-ins.
more...


Java to JavaScript communication
ㆍ http://wp.netscape.com/eng/mozilla/3.0/handbook/javascript/livecon.htm#1007828
To access JavaScript methods, properties, and data structures from your Java applet, import the Netscape javascript package:

import netscape.javascript.*
The package netscape.javascript defines the JSObject class and the JSException exception object.
The author of an HTML page must permit an applet to access JavaScript by specifying the MAYSCRIPT attribute of the <APPLET> tag. This prevents an applet from accessing JavaScript on a page without the knowledge of the page author. Attempting to access JavaScript from an applet that does not have the MAYSCRIPT attribute generates an exception. The MAYSCRIPT tag is needed only for Java to access JavaScript; it is not needed for JavaScript to access Java.

more...

LiveConnect 으로 해본 것은 모든 인터페이스를 스크립트로 구여하여 만든 소켓 사용 채팅이 있었다.
이 방법은 javascript의 많은 부분의 제약을 어느정도 효과적으로 활용 할 수 있는 방법이라 생각되어 진다.

by 드레 | 2005/03/23 10:53 | 시간.. | 트랙백 | 덧글(0)

Codepage & CharSet

HTML 이나 ASP 를 서비스할때 언어로 인한 문제에 대해 간단히 적어본다.

CODEPAGE 나 CHARSET 을 바꿔주는 설정으로 언어설정으로 가능하다. 중국어 간체를 예로 보겠다.

첫번째 1 ASP

<%
Response.CharSet = "GB2312"
Session.CodePage = 936


Response.Write("返回部落")
Response.Write(", ")
Response.Write("返回上?")

// 깨지지 않는 이유는 UTF-8로 저장하였을시 화면에 뿌려줄때는 유니코드의 문자들이
// Response.CharSet으로 지정한 GB2312의 간체의 맞는 ANSI 로 변환해준다.

// 하지만 이 방법도 ANSI로 저장하였을 경우 마지막 글씨가 깨진다.
// ANSI -> GB2312 변환시 마지막 ?로 표시되는
// 글씨는 현재 설정된 언어 ANSI코드에 없는 문자이기 때문에 표시되지 않는다.
// 한자를 잘 복사해서 붙여 넣고 제대로 返回上级로 표시되는
// 상태에서 UTF-8로 저장을 하면 제대로 표시된다 권장한다.


%>

두번째 경우 2 HTML

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

返回部落, 返回上?

<!--
UTF-8 코드 저장해도 깨지는 이유는
유니코드로 저장되어 있지만 화면에 뿌려주는건 GB2312 간체이다. 깨지지 않으려면 utf-8로 인코딩을
바꿔주거나 된다.
이런식으로 설정하면 된다.

또는 ANSI로 저장하고 싶다면 제어판에서 Regional and language Options에서 Language for non-unicode 를
Chinese(PRC)바꾸고 중국어로 입력후 ANSI로 저장하는 방식이 있다.
-->

by 드레 | 2005/03/09 22:33 | 시간.. | 트랙백 | 덧글(0)

Cache Misses and Page Faults

MSDN : ms-help://MS.MSDNQTR.2003FEB.1033/vccore/html/_core_Tips_for_Improving_Time.2d.Critical_Code.htm

Cache Misses and Page Faults
Missed cache hits, on both the internal and external cache, as well as page faults (going to secondary storage for program instructions and data) slow the performance of a program.

A CPU cache hit can cost your program 10-20 clock cycles. An external cache hit can cost 20-40 clock cycles. A page fault can cost one million clock cycles (assuming a processor that handles 500 million instructions/second and a time of 2 millisecond for a page fault). Therefore, it is in the best interest of program execution to write code that will reduce the number of missed cache hits and page faults.

One reason for slow programs is that they take more page faults or miss the cache more often than necessary. To avoid this, it's important to use data structures with good locality of reference, which means keeping related things together. Sometimes a data structure that looks great turns out to be horrible because of poor locality of reference, and sometimes the reverse is true. Here are two examples:

Dynamically allocated linked lists can reduce program performance because when you search for an item or when you traverse a list to the end, each skipped link could miss the cache or cause a page fault. A list implementation based on simple arrays might actually be much faster because of better caching and fewer page faults even - allowing for the fact that the array would be harder to grow, it still might be faster.

Hash tables that use dynamically allocated linked lists can degrade performance. By extension, hash tables that use dynamically allocated linked lists to store their contents might perform substantially worse. In fact, in the final analysis, a simple linear search through an array might actually be faster (depending on the circumstances). Array-based hash tables (so-called "closed hashing") is an often-overlooked implementation which frequently has superior performance

캐쉬 실패율과 페이지 실패율에 신경이 쓰이는 이유이다.

by 드레 | 2005/01/07 16:21 | 시간.. | 트랙백 | 덧글(2)

<< 이전 페이지다음 페이지 >>