JIT Loop Invariant Hoisting

在上一篇文章中,我们已经对 Loop Invariant 概念有一个简单的了解,在文章的最后提到的 Loop Invariant 将在这篇文章做一个简单的介绍。

在我们开始之前

如果你了解 C# 程序的运行,应该知道 C# 代码先被编译器编译为 MSIL 中间代码,在实际运行的时候才通过 JIT 编译器将 MSIL 代码编译成机器代码并运行。因此,编译器有两个时段可以对代码进行优化:

  • C# -> MSIL
  • MSIL -> MACHINE CODE
阅读更多

Loop Hoisting

上篇文章中,提到 Loop Hoisting ,这是一个常见的编译器优化项。我们总是能通过汇编代码等低级语言来“窥探”代码实际是怎么“指示”硬件运行的(这边文章不会涉及到详细的汇编内容,但是会用C#反编译后得到的汇编代码来辅助说明)。如果你看过我前面的几篇文章,会发现我用了大量反编译后的汇编代码来辅助说明,毕竟,千言不如实际的“证据”有说服力。

阅读更多

C# 内存模型

在 C# 的语言规范中 ECMA-334,对于Volatile关键字的描述:

15.5.4 Volatile fields
When a field-declaration includes a volatile modifier, the fields introduced by that declaration are volatile fields. For non-volatile fields, optimization techniques that reorder instructions can lead to unexpected and unpredictable results in multi-threaded programs that access fields without synchronization such as that provided by the lock-statement (§13.13). These optimizations can be
performed by the compiler, by the run-time system, or by hardware. For volatile fields, such reordering optimizations are restricted:

  • A read of a volatile field is called a volatile read. A volatile read has “acquire semantics”; that is, it is guaranteed to occur prior to any references to memory that occur after it in the instruction sequence.
  • A write of a volatile field is called a volatile write. A volatile write has “release semantics”; that is, it is guaranteed to happen after any memory references prior to the write instruction in the instruction sequence.

简单来说,对于常规字段,由于代码优化而导致指令顺序改变,如果没有进行一定的同步控制,在多线程应用中可能会导致意想不到的结果,而造成这种意外的原因可能是编译器优化、运行时系统的优化或者因为硬件的原因(即CPU和主存储器的通信模型)。可变(volatile)字段会限制这种优化的发生,在这里引入两个定义:

  • 可变读: 对于可变字段的读操作会获取语义。即,其可以保证对于可变字段的内存读取操作一定发生在其后内存操作指令的前面。进一步解释,与 Thread.MemoryBarrier 类似,获取语义会保证在读取可变字段指令前的指令可以跨越它出现在它后面,但是相反地,在它后面的指令不能跨越它出现在它的前面。例子:
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    class Volatile_class
    {
    private int _a;
    private volatile int _b;
    private int _c;

    private void Call()
    {
    int temp=_a;
    //由于_b是可变字段,这样可以保证编译器不会将temp2=_c的指令提前到其之前
    //但是,可以将temp=_a提到其之后
    int temp1=_b;
    int temp2=_c;

    ...
    }

    private void OtherCall(){...}
    }
  • 可变写: 对于可变字段的写操作会释放语义。即,其可以保证对于可变字段的写操作发生在其前面指令执行之后,但是在它之后的指令可以跨域它提前执行。
阅读更多

C# 尾递归优化

何为尾递归

有时候我们使用递归来解决一些特定的问题,但是使用递归需要注意不要导致栈溢出,这是使用递归的一个常见问题,对于规模足够大的问题,使用递归必定会导致栈溢出。通常,我们可以通过尾递归进行优化,尾递归可以避免栈溢出的问题(暂且这样认为)。
尾递归并不是什么新奇的东西,理解起来很简单,对于递归,如果上层调用的返回结果不依赖子调用的结果,那么,这就是一个尾递归。例如:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
///这是一个简单的尾递归例子
namespace tail
{
class Program
{
static void Main(string[] args)
{
var test=RetrieveData(100000000000000);
}

static long RetrieveData(long unit)
{
if (unit == 0)
return unit;
return RetrieveData(--unit);
}
}
}

等等,好像有什么地方不对

从上面的例子分析,代码依旧会导致栈溢出,不是吗?是的,聪明的你答对了。那为什么说尾递归可以避免栈溢出问题?当然,从刚才的结论看,这个问题提的并不准确,尾递归并不能避免栈溢出问题。
仔细想想,尾递归结构和循环结构是类似的,上面的尾递归可以写成:

阅读更多
Light Weighted DropshadowEffect

Light Weighted DropshadowEffect

Let’s create a light weighted wpf drop shadow effect, considering that the origin one performs badly in some special occasions.
As I mentioned before (check it out), the original WPF DropShadow Effect can cause severe preformance problem. Due to the “flaw” M$FT brought to the HLSL support for WPF, the Effect class that implements the visual effect creates and destroy GPU resource each frame, which is the worst thing you could do with GPU resources. So, what about implementing a custom shadow effect to avoid it? This sounds interesting.

– 春节补充

关于 WPF 透明窗口的内存占用

关于 WPF 透明窗口的内存占用

要实现一个透明的 WPF 窗口?
What an easy task! By setting AllowTransparency and WindowStyle, you could finish it in seconds.

AllowTransparency="True" WindowStyle="None" Background="Transparent"

Correct? Of course.

However (you know this is coming), look at your task manager, easy task comes with large memory consumption, especially for 4K transparent window. 100+ MB ram are wasted for just showing an empty, transparent window! That’s unacceptable. A year ago, you might be right saying “Who cares about RAM, they are cheap as hell”, but check out the price they’ve grown over this year, they are expensive as hell now.

Fun fact of WPF transparent window

  • RAM usage increase as window size enlarge
  • Win32 window has no such problem

Wait, what? The larger the window is, the more RAM it consumes? Hmmmmm… this looks familiar, just like a Bitmap. For now, we don’t know how WPF handles transparent window, but the symptom shows that it’s like using the whole screen as a bitmap and the window updating itself with portion of that bitmap, making it “transparent”. What a smart move…
Back in the days when WPF was first released, low screen resolution was the main stream. Even today, most laptops still are shipping with a monitor of 1366*768 (ridiculous, right?). Let’s despise the nonsense the OEM told us and think about program running in computers with higher screen resolution.

RAM is not free, do not waste it

Obviously, costing 100+ mb of ram for showing a transparent window in 4K is unacceptable, especially compared with Win32 transparent window, which costs only 10+ mb. The gap between them makes WPF look dump.
Enough complaining, what can we do about it? I don’t want to write UI code with GDI using C++, that’s inefficient and not modern, plus, no one would abandon their beautiful, easy to maintain xaml UI code for this.

Hosting WPF content in Win32 Window

Well, indeed, no one would rewrite their UI code for just about 90mb of RAM. Compared with the work needed to rewrite C++ UI code, the RAM consumption gap seems acceptable (#smile face). But please remember, we can always host WPF content in win32 window.
Let say, we want to create a full screen notification dialog with semi-transparent background and apaque notication content in the center. To avoid the WPF ram problem, we create a semi-transparent window using win32:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
DWORD Flags1 = WS_EX_LAYERED;
DWORD Flags2 = WS_POPUP;

HWND hWnd = CreateWindowEx(Flags1,szWindowClass, szTitle, Flags2,
CW_USEDEFAULT, 0, 3840,2160, nullptr, nullptr, hInstance, nullptr);

SetLayeredWindowAttributes(hWnd, RRR, (BYTE)125, LWA_ALPHA);
ShowWindow(hWnd, nCmdShow);
UpdateWindow(hWnd);

case WM_ERASEBKGND:
RECT rect;
GetClientRect(hWnd, &rect);
FillRect((HDC)wParam, &rect, CreateSolidBrush(RGB(0, 0, 0)));
break;

By enabling C++/CLI, we can access WPF content directly

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
namespace ManagedCode
{
using namespace System;
using namespace System::Windows;
using namespace System::Windows::Interop;
using namespace System::Windows::Media;

HWND GetHwnd(HWND parent, int x, int y, int width, int height) {
HwndSource^ source = gcnew HwndSource(
0, // class style
WS_VISIBLE | WS_CHILD, // style
0, // exstyle
x, y, width, height,
"hi", // NAME
IntPtr(parent) // parent window
);

UIElement^ page = gcnew ManagedContent::WPFContent();
source->RootVisual = page;
return (HWND)source->Handle.ToPointer();
}
}

and finally

1
2
//managed content
ManagedCode::GetHwnd(hWnd, 0, 0, 200, 200);

Due to the different technologies behind WPF and GDI, more work needed to be done for the unavoidable alpha blending problem, but, you can use wpf popup to achieve your goal for short.

使用 OpenCL 实现图片高斯模糊

使用 OpenCL 实现图片高斯模糊

高斯模糊( https://zh.wikipedia.org/wiki/%E9%AB%98%E6%96%AF%E6%A8%A1%E7%B3%8A )是一种常见的图像处理算法,使用高斯分布与图像做卷积,得到模糊的效果。其二维定义:

σ是正态分布的标准偏差。在应用的时候,假设σ为2.5。对于模糊半径为1,则高斯矩阵为3*3的一个矩阵,以[1,1]为中心,带入公式计算高斯矩阵的值,得到:

阅读更多
.NET Standard CLOO

.NET Standard CLOO

鉴于 .NET Standard 2.0 已经支持大量的.NET api,移植CLOO已经是毫无难度的一件事情 Github

CLOO使用p/invoke方式调用opencl api,但是对于不同平台下,opencl 的名称并不一致,例如在linux下为libOpenCL.so,Windows下为OpenCL.dll,且 .NET Standard 没有提供 Mono 类似的 dllmap 模式,因此,现在来说还不能达到用一个package,在所有平台引用的目的。