How Chromium Displays Web Pages

Sep 14th, 2012

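Modules involved

WebKit: the HTML rendering engine.
Glue: converts WebKit types into Chromium types. It is known as the "WebKit embedding layer" and is the foundation of both Chromium and test_shell (which can be used to test WebKit).
Renderer/Render host: Chromium's multi-process embedding layer, which relays notifications and commands between the browser process and the render processes.
WebContents: makes it easy to combine HTML rendered in multiple sandboxed processes into one complete view.
TabContentsWrapper: contains a complete WebContents instance together with a plugin interface.
Browser: the browser window, which contains multiple TabContentsWrappers.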

WebKit

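WebKit renders web pages; its source lives in /third_party/WebKit. WebKit consists of "WebCore" and "JavaScriptCore". The latter is used mainly for testing and is normally replaced by V8. "WebCore" is the core rendering engine. Chromium does not use the native "WebKit" interface the way Safari does; the layer is called "WebKit" only for convenience.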

WebKit Port

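At the lowest level is the "port", Google's implementation of the system-dependent functionality. Some of this code is not platform specific and can be regarded as part of "WebCore", but operations such as font rendering must be done in each platform's own way.

Network traffic goes through Chromium's multi-process resource loading system rather than through system calls made directly by the render process.
Everything except fonts is drawn with Skia, the cross-platform graphics library that comes from Android. The code is in /third_party/skia, and its entry point is /webkit/port/platform/graphics/GraphicsContextSkia.cpp.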

WebKit Glue

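Glue wraps WebKit's types and interfaces (for example, it uses std::string instead of WebCore::String). All Chromium code is built on glue, which keeps types and coding style uniform and limits the impact of WebKit type and API changes on Chromium.

The "test shell" is a bare-bones way of calling WebKit. It calls WebKit through glue in exactly the same way Chromium does, so it can be used to test new code without interference from Chromium's many architectural features, threads, and processes. It also serves as an automated test tool for WebKit. Its drawback is that it does not follow Chromium's multi-process model; instead, each shell embeds its own "content shell".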

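The render process

The render process integrates the WebKit port through glue. It contains little code; its main job is to receive tasks from the browser process over IPC. The most important class is RenderView, in /content/renderer/render_view_impl.cc. An object of this class represents one web page and handles navigation commands from the browser process. RenderView derives from RenderWidget, which provides the painting and event-handling interface. RenderView talks to the browser process through the RenderProcess object in the render process.

RenderWidget implements WebWidgetDelegate, an abstract interface in glue, and corresponds to WebCore::Widget. It is a basic window that displays content and handles events. RenderView inherits from RenderWidget and displays the contents of a tab or a popup window. In Chromium, the only case where a RenderWidget exists without a RenderView is a select box on a web page.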

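Render threads

Each renderer has two threads. The render thread runs RenderView and all WebKit code, and the main thread handles IPC. When the render thread needs to talk to the outside, it first sends the message to the renderer's main thread, which passes it on to the browser process. This arrangement is what allows the renderer to send messages to the browser synchronously.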
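
The two-thread arrangement can be modelled with an ordinary queue. This is only an illustrative sketch, not Chromium code: `run_renderer`, `outbox`, and `pipe_to_browser` are names made up for the example, and a plain list stands in for the IPC pipe.

```python
import queue
import threading


def run_renderer(messages):
    """Model a renderer: the render thread produces messages and the
    main (IPC) thread is the only one that touches the pipe."""
    outbox = queue.Queue()      # render thread -> main thread
    pipe_to_browser = []        # stand-in for the IPC pipe (assumption)

    def render_thread():
        for msg in messages:
            outbox.put(msg)     # never writes to the pipe directly
        outbox.put(None)        # sentinel: nothing more to send

    def main_thread():
        while True:
            msg = outbox.get()
            if msg is None:
                break
            pipe_to_browser.append(msg)   # forward to the browser process

    threads = [threading.Thread(target=render_thread),
               threading.Thread(target=main_thread)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return pipe_to_browser
```

Because only one thread owns the pipe, message order is preserved without any locking around the pipe itself.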

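The browser process

Low-level browser process objects

The browser process has a UI thread and an IPC thread. The IPC thread handles communication between the browser and the render processes and also manages external network connections. Whenever the UI thread creates a RenderProcessHost, a new ChannelProxy IPC object is created on the IPC thread, and the renderer communicates with the UI thread through the pipe inside that object. The object contains a ResourceMessageFilter that handles network requests; this happens in ResourceMessageFilter::OnMessageReceived.

The RenderProcessHost object on the UI thread dispatches view-related messages for its renderer; this happens in RenderProcessHost::OnMessageReceived.

High-level browser process objects

Most view-related messages are handled in RenderViewHost::OnMessageReceived, and the rest are passed to the RenderWidgetHost base class. These two classes correspond to RenderView and RenderWidget in the render process. Each platform has its own view implementation (RenderWidgetHostView[Aura|Gtk|Mac|Win]).

Above RenderView/RenderWidget sits the WebContents object, where most messages end up as function calls. A WebContents corresponds to one web page and is the top level of the content module. It is responsible for displaying the contents of a tab. Each WebContents is held in a TabContentsWrapper, which corresponds to one tab in Chrome.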

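Examples

Life of a "set cursor" message

In the render process

WebKit generates a set-cursor message internally. It is sent out by RenderWidget::SetCursor in content/renderer/render_widget.cc.
RenderWidget::Send, as inherited by RenderView, dispatches the message, and RenderThread::Send then sends it from the render process toward the browser process.
The IPC::SyncChannel on the renderer's main thread is then called, and it puts the message on the designated pipe.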

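In the browser process

The IPC::ChannelProxy in the RenderProcessHost receives the message on the IPC thread. The ResourceMessageFilter first checks whether it is a network message. Since it is not, it is not filtered out and is passed on to the UI thread.
RenderProcessHost::OnMessageReceived in content/browser/renderer_host/render_process_host_impl.cc receives the message, handles a few message types itself, and forwards the rest to the RenderViewHost that corresponds to the RenderView the message came from.
RenderViewHost::OnMessageReceived in content/browser/renderer_host/render_view_host_impl.cc receives the message. Many messages are handled here, but this one is passed on to RenderWidgetHost.
content/browser/renderer_host/render_widget_host_impl.cc holds the message map and the corresponding handler, RenderWidgetHost::OnMsgSetCursor, which then calls the appropriate UI function.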
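
The hand-off described in these steps is a chain of "handle it or pass it on" calls. The following is a toy model rather than Chromium's implementation: the Python classes only borrow the C++ class names, and the message dictionaries, the "Navigate" type, and `ipc_thread_receive` are assumptions made for illustration.

```python
class RenderWidgetHost:
    def __init__(self):
        self.cursor = None

    def on_message_received(self, msg):
        if msg["type"] == "SetCursor":
            self.cursor = msg["cursor"]
            return True
        return False


class RenderViewHost(RenderWidgetHost):
    def on_message_received(self, msg):
        if msg["type"] == "Navigate":       # handled at the view level
            return True
        # Everything else falls through to the widget base class.
        return super().on_message_received(msg)


class RenderProcessHost:
    def __init__(self):
        self.views = {}                      # routing id -> RenderViewHost

    def on_message_received(self, msg):
        view = self.views.get(msg["routing_id"])
        return view is not None and view.on_message_received(msg)


def ipc_thread_receive(msg, process_host):
    """Filter resource requests on the IPC thread; forward the rest."""
    if msg["type"] == "ResourceRequest":
        return "handled on IPC thread"
    handled = process_host.on_message_received(msg)
    return "handled on UI thread" if handled else "unhandled"
```

Each layer returns a boolean so that the caller knows whether to keep passing the message along.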

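Life of a mouse click

In the browser process

RenderWidgetHostViewWin::OnMouseEvent on the UI thread receives the window message and calls ForwardMouseEventToRenderer.
That function packages the event as a cross-platform WebMouseEvent and sends it to the corresponding RenderWidgetHost.
RenderWidgetHost::ForwardInputEvent creates the IPC message ViewMsg_HandleInputEvent, serializes the WebInputEvent into it, and calls RenderWidgetHost::Send.
That in turn calls RenderProcessHost::Send, which passes the message to the corresponding IPC::ChannelProxy.
The IPC::ChannelProxy hands the message to the IPC thread of the browser process, which writes it to the corresponding pipe.

Many messages, such as navigation messages, are created in WebContents. They follow the same path.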

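In the render process

The IPC::Channel on the IPC thread (should this say main thread?) reads the message and passes it through a proxy to the render thread.
RenderView::OnMessageReceived gets the message. Many messages are handled right here, but this one is not; it is passed to RenderWidget::OnMessageReceived and then handled by RenderWidget::OnHandleInputEvent.
WebWidgetImpl::HandleInputEvent processes the message. This function converts the event into WebKit's PlatformMouseEvent and passes it to WebCore::Widget inside WebKit.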
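
Serializing an input event on one side of the pipe and reading it back on the other can be sketched with `struct`. The wire layout here is hypothetical and chosen only for the example; Chromium's real IPC format is different, and `HANDLE_INPUT_EVENT`, `pack_mouse_event`, and `unpack_mouse_event` are made-up names.

```python
import struct

# Hypothetical layout: message type (uint32), routing id, x, y (int32 each),
# little-endian with no padding.
_LAYOUT = struct.Struct("<Iiii")
HANDLE_INPUT_EVENT = 1


def pack_mouse_event(routing_id, x, y):
    """Browser side: turn the event into bytes for the pipe."""
    return _LAYOUT.pack(HANDLE_INPUT_EVENT, routing_id, x, y)


def unpack_mouse_event(data):
    """Renderer side: recover the event and check the message type."""
    msg_type, routing_id, x, y = _LAYOUT.unpack(data)
    if msg_type != HANDLE_INPUT_EVENT:
        raise ValueError("unexpected message type %d" % msg_type)
    return routing_id, x, y
```

The routing id travels with the payload so that the receiving process can find the view the event belongs to.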

References

Using Libevent

Sep 11th, 2012

References

libevent-book, libevent reference manual, blog: level trigger & edge trigger

Notes on Chrome's Multi-process Architecture

Sep 6th, 2012

Introduction

A summary of Chromium's multi-process architecture.

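Problems the multi-process architecture solves

It is very hard to build a rendering engine that never crashes or hangs and is also completely secure. Modern browsers resemble the early single-user, multitasking operating systems, in which one misbehaving operation could easily bring down the whole system: an error in one tab or plugin crashes every tab.

Modern operating systems, by contrast, get their robustness from separating tasks into different processes, and each user can access only that user's own data.

A browser can be architected in the same way.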

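Architectural overview

Chromium handles different tabs in different processes so that a crash in one does not affect the rest, and it strictly limits each tab's access to memory.

The process that runs the UI and manages the tab and plugin processes is called the browser process, or browser.

The processes that run individual tabs are called render processes, or renderers.

Renderers use WebKit to parse and render HTML.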

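Managing render processes

Each render process has a single global RenderProcess object that handles communication with the browser. The browser keeps a corresponding RenderProcessHost that manages state and communication.

Managing views

Each render process has one or more RenderView instances managed by its RenderProcess, and each instance corresponds to a tab. In the browser, the RenderProcessHost holds multiple RenderViewHosts, each with a distinct ID. The browser talks to a particular tab through its RenderViewHost object, which sends messages via the RenderProcessHost to the RenderView inside the RenderProcess.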

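Components and interfaces

In the render process

The RenderProcess handles IPC messages in the render process, as the RenderProcessHost does in the browser.
The RenderView communicates with its RenderViewHost and with the WebKit embedding layer.

In the browser process

Browser is a top-level browser window.
RenderProcessHost is the browser-side end of the IPC connection between the browser process and a render process.
RenderViewHost encapsulates communication with the RenderView.

For details, see How Chromium displays web pages.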

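Sharing the render process

In general, each tab opens in a new process. Sometimes, however, several tabs need to share one render process, for example when a web application opens a window that it needs to communicate with synchronously (window.open in JavaScript).

Detecting crashed renderers

The browser process monitors every IPC connection. When a connection is broken, the browser treats the corresponding render process as crashed. At present the crash is handled by showing a page that tells the user about it.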
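
Detecting a dead peer from a broken channel can be demonstrated with a socket pair. This is a minimal sketch under the assumption that a closed stream socket stands in for a crashed renderer; `channel_is_broken` and `demo` are illustrative names, not Chromium functions.

```python
import socket


def channel_is_broken(sock):
    """Return True if the peer has closed its end of the channel."""
    sock.setblocking(False)
    try:
        data = sock.recv(1, socket.MSG_PEEK)   # peek: do not consume data
    except BlockingIOError:
        return False        # nothing to read, but the peer is still there
    return data == b""      # an empty read means EOF: the peer is gone


def demo():
    browser_end, renderer_end = socket.socketpair()
    broken_before = channel_is_broken(browser_end)
    renderer_end.close()    # simulate the renderer crashing
    broken_after = channel_is_broken(browser_end)
    browser_end.close()
    return broken_before, broken_after
```

When a process dies, the operating system closes its descriptors, so the surviving side sees end-of-file without any cooperation from the crashed process.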

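Sandboxing the renderer

Because WebKit runs in its own process, we can control whether and how it accesses system resources. For example, a render process can reach the network only through the main process. We can likewise control its access to the user's display and related objects. When a new window is opened or keystrokes are captured, the isolation between render processes ensures that no incorrect display results.

Memory management

The goal is to keep the foreground tab responsive when memory is low. This is done mainly by shrinking the working set of any RenderProcess that has no foreground tab, which also speeds up switching.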

References

Multi-process Architecture

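The Traits Technique in C++

Sep 6th, 2012

Introduction

Traits are a technique for extracting the characteristics of a type. They are widely used in generic programming, typically to let different types be used with the same operation, or to provide different implementations for different types. A type and its characteristics are normally coupled together; traits decouple the two. In a sense, traits also generalize a type's characteristics: the characteristics exposed through traits are generic ones.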

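Examples

Example 1

Implementing traits usually relies on the following three basic C++ features:

enum
typedef
template (partial) specialization

An enum unifies a flag that varies between types into a single name. In C++ it is often used inside a class in place of #define; you can think of it as a class-scoped define. A typedef defines the form in which your template class supports a characteristic. The template class must support the characteristic in some form, or the traits extractor will not work. That may sound too strict, but it is not: declaring that a characteristic is unsupported is itself a form of support (see Example 2, where two tags, __xtrue_type and __xfalse_type, mean that a characteristic is or is not supported). Template (partial) specialization supplies the correct, or a more suitable, version for a particular type. With these simple techniques we can use traits to extract the characteristics defined in a class and provide different implementations for different characteristics. You could call the whole chain, from defining a characteristic to extracting it to actually using it, the traits technique, but that makes traits seem too complicated. I prefer to restrict the term to characteristic extraction, because that definition is simpler and easier to understand.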

#include <iostream>
using namespace std;

class CComplexObject // a demo class
{
  public:
      void clone() { cout << "in clone" << endl; }
};

// Solving the problem of choosing method to call by inner traits class
template <typename T, bool isClonable>
class XContainer
{
  public:
      enum {Clonable = isClonable};

      // Standard C++ does not portably allow explicit specialization of a
      // member template at class scope, so a dummy parameter turns the two
      // cases into partial specializations.
      template <bool flag, typename Dummy = void>
          class Traits
          {
          };

      template <typename Dummy>
          class Traits<true, Dummy>
          {
              public:
                  void clone(T* pObj)
                  {
                      cout << "before cloning Clonable type" << endl;
                      pObj->clone();
                      cout << "after cloning Clonable type" << endl;
                  }
          };

      template <typename Dummy>
          class Traits<false, Dummy>
          {
              public:
                  void clone(T* /*pObj*/)
                  {
                      cout << "cloning non Clonable type" << endl;
                  }
          };

      void clone(T* pObj)
      {
          Traits<isClonable>().clone(pObj);
      }
};

int main()
{
  int value = 0;
  CComplexObject object;
  int* p1 = &value;
  CComplexObject* p2 = &object;   // a real object, so pObj->clone() is well defined

  XContainer<int, false> n1;
  XContainer<CComplexObject, true> n2;

  n1.clone(p1);
  n2.clone(p2);
  return 0;
}

Output:

cloning non Clonable type
before cloning Clonable type
in clone
after cloning Clonable type

Source: Traits初探 ("A First Look at Traits")

Example 2

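As the figure shows, the destroy algorithm does not need to care about the concrete type traits, and the client does not need to care about the concrete destroy. The conceptual base class of destroy is realized through parametric polymorphism, and the conceptual base class of traits is realized through the type_traits programming technique. Note also that the STL's iterator-related use of type_traits differs slightly from what is described here, but if you look at the type characteristics separately from the class, the two are exactly the same. To separate them, treat any type_traits-related class that carries type characteristics as nothing but those characteristics, and ignore everything unrelated to them.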

my_type_traits.h

#ifndef MY_TYPE_TRAITS_H
#define MY_TYPE_TRAITS_H

struct my_true_type {
};

struct my_false_type {
};

template <class T>
struct my_type_traits
{
  typedef my_false_type has_trivial_destructor;
};

template<> struct my_type_traits<int>
{
  typedef my_true_type has_trivial_destructor;
};

#endif

my_destruct.h

#ifndef MY_DESTRUCT_H
#define MY_DESTRUCT_H
#include <iostream>
#include <new>        // placement new

#include "my_type_traits.h"

using std::cout;
using std::endl;

  template <class T1, class T2>
inline void myconstruct(T1 *p, const T2& value)
{
  new (p) T1(value);
}

  template <class T>
inline void mydestroy(T *p)
{
  typedef typename my_type_traits<T>::has_trivial_destructor trivial_destructor;
  _mydestroy(p, trivial_destructor());
}

  template <class T>
inline void _mydestroy(T *p, my_true_type)
{
  cout << " do the trivial destructor " << endl;
}

  template <class T>
inline void _mydestroy(T *p, my_false_type)
{
  cout << " do the real destructor " << endl;
  p->~T();
}

#endif

test_type_traits.cpp

#include <cstdlib>    // malloc, free
#include <iostream>
#include "my_destruct.h"

using std::cout;
using std::endl;

class TestClass
{
  public:
      TestClass()
      {
          cout << "TestClass constructor call" << endl;
          data = new int(3);
      }
      TestClass(const TestClass& test_class)
      {
          cout << "TestClass copy constructor call. copy data:"
              << *(test_class.data) << endl;
          data = new int;
          *data = *(test_class.data) * 2;
      }
      ~TestClass()
      {
          cout << "TestClass destructor call. delete the data:" << *data << endl;
          delete data;
      }
  private:
      int *data;
};

int main(void)
{
  {
      TestClass *test_class_buf;
      TestClass test_class;

      test_class_buf = (TestClass *)malloc(sizeof(TestClass));
      myconstruct(test_class_buf, test_class);
      mydestroy(test_class_buf);
      free(test_class_buf);
  }

  {
      int *int_p;
      int_p = new int;
      mydestroy(int_p);
      delete int_p;   // allocated with new, so it must be released with delete
  }
}

Source: type traits 之"本质论" ("The Essence of Type Traits")

Example 3

Suppose we have the following generic iterator class, where the type parameter T is the type the iterator points to:

template <typename T>
class myIterator
{
   ...
};

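When we use myIterator, how can we find out the type of the element it points to? We can add a nested type to the class, like this:

template <typename T>
class myIterator
{
public:
        typedef  T value_type;   // must be public so that code outside the class can use it
        ...
};

Now, whenever we use a myIterator type, myIterator<T>::value_type gives us the type it points to.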

Now let us design an algorithm that uses this information.

template <typename T>
typename myIterator<T>::value_type Foo(myIterator<T> i)
{
   ...
}

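Here we define a function Foo whose return type is the type that the parameter i points to, namely T. So why go to the trouble of using value_type? Because when we want to change Foo so that it works with every kind of iterator, we can write:

template <typename I> // I can be any type of iterator
typename I::value_type Foo(I i)
{
   ...
}

Now any iterator that defines a nested value_type can be passed to Foo, and Foo's return type will match the element type of that iterator. At this point every problem seems to be solved, and we have not used any special technique. A new problem appears, however, when we consider the following case:

A raw pointer can serve perfectly well as an iterator, yet we obviously cannot add a nested value_type to a raw pointer. Foo() therefore cannot accept raw pointers, which is a real shortcoming. How can we fix this? This is where our protagonist, the type-information extractor Traits, comes on stage.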

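template <typename T>
class Traits
{
public:
        typedef typename T::value_type value_type;
};

template <typename I> // I can be any type of iterator
typename Traits<I>::value_type Foo(I i)
{
   ...
}

Partial specialization for raw pointers

template <typename T>
class Traits<T*> // note: this is a partial specialization for raw pointers
{
public:
        typedef T value_type;   // T is not a qualified dependent name, so no typename here
};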

test.cpp

int * p;
....
int i = Foo(p);

C++ Traits

References

C++之traits小记 ("Notes on C++ traits")

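Reading the Nginx Source

Aug 29th, 2012

Introduction

nginx [engine x] is an HTTP and reverse proxy server written by Igor Sysoev; it can also act as a mail proxy server. It has been running since 2004 on many heavily loaded Russian sites, including Yandex, Mail.Ru, VKontakte, and Rambler. According to Netcraft, in October 2011 nginx served or proxied 7.84% of the world's busiest sites. Success stories include FastMail.FM and Wordpress.com.

The nginx source code is distributed under a 2-clause BSD-like license.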

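One master process and several worker processes; the workers run as an unprivileged user.
Supported event mechanisms: kqueue (FreeBSD 4.1+), epoll (Linux 2.6+), rt signals (Linux 2.2.19+), /dev/poll (Solaris 7 11/99+), event ports (Solaris 10), select, and poll.
Support for many kqueue features, including EV_CLEAR, EV_DISABLE (to disable an event temporarily), NOTE_LOWAT, EV_EOF, the amount of available data, and error codes.
Support for sendfile (FreeBSD 3.1+, Linux 2.2+, Mac OS X 10.5), sendfile64 (Linux 2.4.21+), and sendfilev (Solaris 8 7/01+).
File AIO (FreeBSD 4.3+, Linux 2.6.22+).
Accept-filters (FreeBSD 4.1+) and TCP_DEFER_ACCEPT (Linux 2.4+).
10,000 inactive HTTP keep-alive connections use only about 2.5 MB of memory.
Data copy operations are kept to a minimum.

References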

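Downloading and Building Chrome

Aug 28th, 2012

Install dependencies and get the code

Make sure you can extract .tgz files
Download the source tarball
Make sure the partition that will hold the code has enough free space
Extract the code
Install depot_tools
On Ubuntu, also run the following:

$ cd /path/to/chromium/src
$ sudo ./build/install-build-deps.sh

Update the code:

$ gclient sync --force

For details, see Get the code.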

Install the clang dependency

Chrome is slow to build, so here we try clang to speed up compilation and improve the quality of the build.

$tools/clang/scripts/update.sh

Build

gcc

$./build/gyp_chromium
$make chrome -j4

clang

$GYP_GENERATORS=ninja GYP_DEFINES=clang=1 ./build/gyp_chromium
$ninja -C out/Debug chrome #fast

or

$GYP_GENERATORS=make GYP_DEFINES=clang=1 ./build/gyp_chromium
$make chrome -j4  # 4: Number of cores, change accordingly

chrome clang

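Problems

NaCl timeout

While updating the code you may run into

download_nacl_toolchain.py  timeout

If you build before the download has finished, you may see this error:

LASTCHANGE is needed

In that case, try updating the code again,

or, in build/common.gypi, change 'disable_nacl%': 0 to 1 (this is the fix the official site suggests for timeouts when building Chrome OS; I have not tried it).

WebKit svn timeout

One option: if you do not need WebKit's layout tests, exclude them in .gclient: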

solutions = [
{ "name"        : "src",
  "url"         : "https://src.chromium.org/chrome/trunk/src",
  "deps_file"   : "DEPS",
  "managed"     : True,
  "custom_deps" : {
      "src/third_party/WebKit/LayoutTests": None,
      "src/content/test/data/layout_tests/LayoutTests": None,
      "src/chrome_frame/tools/test/reference_build/chrome": None,
      "src/chrome_frame/tools/test/reference_build/chrome_win": None,
      "src/chrome/tools/test/reference_build/chrome": None,
      "src/chrome/tools/test/reference_build/chrome_linux": None,
      "src/chrome/tools/test/reference_build/chrome_mac": None,
      "src/chrome/tools/test/reference_build/chrome_win": None,
  },
  "safesync_url": "",
},
  ]

Another option: download WebKit manually and drop it into the Chrome tree.

References

Chromium Project chrome clang Get the code

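Splitting Strings

Aug 22nd, 2012

Splitting strings

Splitting a string is a basic task that comes up all the time. Below is a comparison of how it is done in three languages.

C++

Neither C++ nor C provides such a function in its standard library, so there are many approaches based on third-party libraries or on creative use of the standard library.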

#include <string>
#include <vector>
#include <boost/algorithm/string.hpp>

using namespace std;
using namespace boost;

int main()
{
  string s = "a,b, c ,,e,f,";
  vector <string> fields;
  split( fields, s, is_any_of( "," ) );
  return 0;
}

Boost can also split strings with regular expressions

string s = "one->two->thirty-four";
vector <string> fields;
split_regex( fields, s, regex( "->" ) );

The Boost Tokenizer library can do the same job; see Boost.Tokenizer.

Qt

Qt's QString handles Unicode-aware splitting; see QString::split().

GNU

glib has a corresponding split function

const char* s = ",,three,,five,,";
char** fields = g_strsplit( s, ",", 0 );      /* the delimiter is a string, not a char */
for (char** field = fields; *field; ++field)  /* the array is NULL-terminated */
{
  printf( "\"%s\"\n", *field );
}
g_strfreev( fields );
fields = NULL;

std::iostream

std::getline() can split a line on whatever delimiter character you give it

string s = "string, to, split";
istringstream ss( s );
while (!ss.eof())
{
    string x;               // here's a nice, empty string
    getline( ss, x, ',' );  // try to read the next field into it
    cout << x << endl;      // print it out, even if we already hit EOF
}

std::string

You can also process the string in a loop with std::string's find_first_of()

string s = "string, to, split";
string delimiters = " ,";
size_t current;
size_t next = -1;
do
{
    current = next + 1;
    next = s.find_first_of( delimiters, current );
    cout << s.substr( current, next - current ) << endl;
}
while (next != string::npos);

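There are other, less mainstream approaches as well; see reference 1.

C#

Compared with C/C++, the C# standard library is much richer; for example, Split is a method of string itself.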

string s = "abcdeabcdeabcde";
string[] sArray = s.Split('c');
foreach (string i in sArray)
    Console.WriteLine(i);

You can also split on a set of characters

string s="abcdeabcdeabcde";
string[] sArray1=s.Split(new char[3]{'c','d','e'}) ;
foreach(string i in sArray1)
Console.WriteLine(i.ToString());

Regular expressions are supported as well

string content = "agcsmallmacsmallgggsmallytx";
string[] resultString = Regex.Split(content, "small", RegexOptions.IgnoreCase);
foreach (string i in resultString)
    Console.WriteLine(i);

Python

As in the previous section, string splitting in Python is a built-in string method

text = 'a,b,c,d'
strlist = text.split(',')  # split text on commas and store the pieces in a list
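
Two details are easy to miss when comparing with the C++ versions above: `str.split` keeps empty fields, and splitting on a set of delimiter characters needs `re.split`. The helper names below are made up for this sketch.

```python
import re


def split_fields(text, sep=","):
    """Split on one fixed separator, keeping empty fields."""
    return text.split(sep)


def split_on_any(text, delimiters=" ,"):
    """Split on any of several delimiter characters, dropping empty fields."""
    pattern = "[" + re.escape(delimiters) + "]+"
    return [field for field in re.split(pattern, text) if field]
```

`split_fields` behaves like the Boost `split` call with `is_any_of(",")`, while `split_on_any` plays the role of the `find_first_of` loop except that it discards the empty pieces between adjacent delimiters.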

References

Split a string Splitting a string in C++ python string

Tornado Report

Aug 17th, 2012

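Source

These are brief notes from reading the article Understanding the code inside Tornado, the asynchronous web server.

Introduction

Tornado is an asynchronous web framework written in Python. It was originally developed by FriendFeed and was open-sourced after Facebook acquired the company. Compared with Django, the popular Python web framework, it is simpler, more flexible, and more customizable, but its learning curve is steep. A comparable framework is web.py.

Asynchronous interface

On the server side, a synchronous interface looks something like this: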

def handler_request(self, request):
    answ = self.remote_server.query(request) # this takes 5 seconds
    request.write_response(answ)

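Concurrency is poor. You can use multiple threads (or processes), but that carries significant overhead.

With an asynchronous interface the code looks like this instead: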

def handler_request(self, request):
    self.remote_server.query_async(request, self.response_received)

def response_received(self, request, answ):    # this is called 5 seconds later
    request.write(answ)

See also Twisted, an asynchronous networking framework for Python.
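
The callback style shown above can be tried out with a toy loop. Nothing here is Tornado code: `ToyLoop`, `FakeRemoteServer`, and `serve` are hypothetical names, and the "remote server" answers immediately instead of after 5 seconds.

```python
from collections import deque


class ToyLoop:
    """A hypothetical single-threaded loop that runs queued callbacks in order."""

    def __init__(self):
        self._ready = deque()

    def call_soon(self, callback, *args):
        self._ready.append((callback, args))

    def run(self):
        while self._ready:
            callback, args = self._ready.popleft()
            callback(*args)


class FakeRemoteServer:
    def __init__(self, loop):
        self._loop = loop

    def query_async(self, request, callback):
        # Return at once; the answer is delivered later by the loop.
        self._loop.call_soon(callback, request, "answer to " + request)


def serve(requests):
    loop = ToyLoop()
    remote = FakeRemoteServer(loop)
    log = []

    def response_received(request, answer):
        log.append(("response", request, answer))

    for request in requests:
        remote.query_async(request, response_received)
        log.append(("accepted", request))   # the handler returns before the answer exists
    loop.run()
    return log
```

Both requests are accepted before either response is written, which is the property that lets one thread keep many slow requests in flight.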

Source and installation

Install with easy_install or pip:

$sudo easy_install tornado

Or download the source and install it:

git clone http://github.com/facebook/tornado.git
cd tornado
sudo python setup.py install

The source package includes demo examples.

IOLoop

This module is the core of the framework: ioloop.py

A socket is registered with the following code:

def add_handler(self, fd, handler, events):
    """Registers the given handler to receive the given events for fd."""
    self._handlers[fd] = handler
    self._impl.register(fd, events | self.ERROR)

_handlers is a dictionary that mirrors the set of sockets registered with epoll.

self._impl is either select.epoll() or a fallback based on select.select()
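
The handler table and the poll step can be reduced to a few lines. This is a sketch under stated assumptions, not Tornado's IOLoop: `MiniLoop` and `demo` are made-up names, only read events are supported, and `select.select` is used instead of epoll so that the example is portable.

```python
import select
import socket


class MiniLoop:
    """A toy reactor: maps file descriptors to handlers and polls with select()."""
    READ = 1

    def __init__(self):
        self._handlers = {}

    def add_handler(self, fd, handler):
        self._handlers[fd] = handler

    def poll_once(self, timeout=1.0):
        """Wait for readable descriptors, run their handlers, return how many fired."""
        readable, _, _ = select.select(list(self._handlers), [], [], timeout)
        for fd in readable:
            self._handlers[fd](fd, self.READ)
        return len(readable)


def demo():
    left, right = socket.socketpair()
    received = []
    loop = MiniLoop()
    loop.add_handler(right.fileno(),
                     lambda fd, events: received.append(right.recv(16)))
    idle = loop.poll_once(timeout=0)    # nothing written yet, so nothing fires
    left.send(b"ping")
    fired = loop.poll_once(timeout=1.0)
    left.close()
    right.close()
    return idle, fired, received
```

The loop itself never reads from a socket; it only reports readiness and leaves the I/O to the registered handler.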

start

def start(self):
    """Starts the I/O loop.

    The loop will run until one of the I/O handlers calls stop(), which
    will make the loop stop after the current event iteration completes.
    """
    self._running = True
    while True:
        [ ... ]
        if not self._running:
            break
        [ ... ]
        try:
            event_pairs = self._impl.poll(poll_timeout)
        except Exception, e:
            if e.args == (4, "Interrupted system call"):
                logging.warning("Interrupted system call", exc_info=1)
                continue
            else:
                raise
        # Pop one fd at a time from the set of pending fds and run
        # its handler. Since that handler may perform actions on
        # other file descriptors, there may be reentrant calls to
        # this IOLoop that update self._events
        self._events.update(event_pairs)
        while self._events:
            fd, events = self._events.popitem()
            try:
                self._handlers[fd](fd, events)
            except KeyboardInterrupt:
                raise
            except OSError, e:
                if e[0] == errno.EPIPE:
                    # Happens when the client closes the connection
                    pass
                else:
                    logging.error("Exception in I/O handler for fd %d",
                                  fd, exc_info=True)
            except:
                logging.error("Exception in I/O handler for fd %d",
                              fd, exc_info=True)

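The loop can be stopped by setting self._running to False. If the stop request itself has to arrive as an event, poll's own mechanism can be used: register an anonymous pipe, and when the loop should wake up or exit, write some data to the pipe to trigger an event. The code is as follows: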

def __init__(self, impl=None):
    [...]
    # Create a pipe that we send bogus data to when we want to wake
    # the I/O loop when it is idle
    r, w = os.pipe()
    self._set_nonblocking(r)
    self._set_nonblocking(w)
    self._waker_reader = os.fdopen(r, "r", 0)
    self._waker_writer = os.fdopen(w, "w", 0)
    # The loop waits for the read end of the pipe to become readable
    self.add_handler(r, self._read_waker, self.READ)

def _wake(self):
    try:
        self._waker_writer.write("x")
    except IOError:
        pass
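
The wake-up pipe can be tried in isolation. This sketch assumes a Unix-like system, where `select` accepts pipe descriptors; `wait_until_woken` is a made-up name, and a timer thread stands in for whatever code decides to wake the loop.

```python
import os
import select
import threading
import time


def wait_until_woken(timeout=5.0):
    """Block in select() on a pipe; return (woken, seconds waited)."""
    reader, writer = os.pipe()

    def wake():
        os.write(writer, b"x")      # any byte will do; only readability matters

    threading.Timer(0.05, wake).start()
    started = time.monotonic()
    readable, _, _ = select.select([reader], [], [], timeout)
    elapsed = time.monotonic() - started
    if readable:
        os.read(reader, 1)          # drain the pipe so it does not stay readable
    os.close(reader)
    os.close(writer)
    return bool(readable), elapsed
```

The call returns shortly after the byte is written instead of waiting for the full timeout, which is exactly what an idle loop needs when another thread adds work.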

Timers

Implementing a timer in the IOLoop module is very simple. Using Python's bisect module:

def add_timeout(self, deadline, callback):
    """Calls the given callback at the time deadline from the I/O loop."""
    timeout = _Timeout(deadline, callback)
    bisect.insort(self._timeouts, timeout)
    return timeout

In the IOLoop module, epoll is much faster than select.
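
Keeping timeouts in a sorted list with `bisect.insort` means the next one due is always at the front. The sketch below is an assumption-laden stand-in for Tornado's `_Timeout` bookkeeping: the `Timeouts` class and its tuple layout are invented for the example.

```python
import bisect
import itertools


class Timeouts:
    """Keep (deadline, sequence, callback) entries sorted by deadline."""

    def __init__(self):
        self._timeouts = []
        self._counter = itertools.count()   # tie-breaker so callbacks are never compared

    def add_timeout(self, deadline, callback):
        entry = (deadline, next(self._counter), callback)
        bisect.insort(self._timeouts, entry)
        return entry

    def run_due(self, now):
        """Run and remove every callback whose deadline has passed."""
        ran = 0
        while self._timeouts and self._timeouts[0][0] <= now:
            _, _, callback = self._timeouts.pop(0)
            callback()
            ran += 1
        return ran
```

A loop would call `run_due` once per iteration and use the deadline at the front of the list to choose its next poll timeout.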

The IOStream module

IOStream provides non-blocking socket operations:

read_until
read_bytes
write

HTTP server

Combining the IOLoop and IOStream modules gives us an asynchronous HTTP server: httpserver.py

The HTTPServer class only accepts sockets and adds them to the IOLoop.

def listen(self, port, address=""):
    assert not self._socket
    self._socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM, 0)
    flags = fcntl.fcntl(self._socket.fileno(), fcntl.F_GETFD)
    flags |= fcntl.FD_CLOEXEC
    fcntl.fcntl(self._socket.fileno(), fcntl.F_SETFD, flags)
    self._socket.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    self._socket.setblocking(0)
    self._socket.bind((address, port))
    self._socket.listen(128)
    self.io_loop.add_handler(self._socket.fileno(), self._handle_events,
                             self.io_loop.READ)

_handle_events() responds to the resulting events:

def _handle_events(self, fd, events):
    while True:
        try:
            connection, address = self._socket.accept()
        except socket.error, e:
            if e[0] in (errno.EWOULDBLOCK, errno.EAGAIN):
                return
            raise
        try:
            stream = iostream.IOStream(connection, io_loop=self.io_loop)
            HTTPConnection(stream, address, self.request_callback,
                           self.no_keep_alive, self.xheaders)
        except:
            logging.error("Error in connection callback", exc_info=True)

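ZooKeeper Guide

Aug 15th, 2012

Introduction

An introduction to ZooKeeper's design and data model, with a simple example and usage notes.

Data model

A hierarchical namespace in which every node can have children. It resembles a file system in which a node can be both a directory and a file.

znode

A znode maintains a stat structure.

See the discussion of znodes in Zookeeper Intro for details.

Terms used in the ZooKeeper documentation:

znode: a data node
servers: the machines that make up the ZooKeeper ensemble
quorum peers: the servers that form the quorum (I am not sure what this means)
client: a machine or process that uses the ZooKeeper service

For developers, znodes are the part they deal with most.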

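watches

A client can act as an observer of a znode. A change to the znode triggers the observer. Details follow below.

Data Access

The data on each znode is read and written atomically and is protected by access control. Znodes are meant to hold coordination data, not general-purpose data; coordination data is normally under 1 MB. Large data should go in something like HDFS or NFS.

Ephemeral Nodes

Ephemeral nodes live exactly as long as the session that created them.

Sequence Nodes

Sequence nodes are created with a monotonically increasing counter, maintained by the parent node, appended to the name. This guarantees unique names.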
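
How the counter makes names unique can be shown with an in-memory stand-in. `ToyParent` is a hypothetical class for illustration and talks to no ZooKeeper server; the zero-padded 10-digit suffix follows the format ZooKeeper documents for sequence nodes.

```python
class ToyParent:
    """In-memory stand-in for a parent znode that hands out sequence numbers."""

    def __init__(self):
        self._next = 0
        self.children = []

    def create_sequential(self, prefix):
        # The counter belongs to the parent, so it is shared by all prefixes.
        name = "%s%010d" % (prefix, self._next)
        self._next += 1
        self.children.append(name)
        return name
```

Because the names sort in creation order, clients can use them for queues or for deciding which participant goes first.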

Time in ZooKeeper

ZooKeeper tracks time in several ways:

zxid
version number
ticks
real time: used only for the timestamps written into the stat structure, and nowhere else

The stat structure

czxid
mzxid
ctime
mtime
version
cversion
aversion
ephemeralOwner
dataLen
numChildren

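Sessions

A client creates a session when it connects to a server, and the session is removed when the client closes it or a failure occurs. The state transitions are shown in the diagram below.

The application gives the client a comma-separated string in which each item is the IP address and port of one ZooKeeper server, for example:

"127.0.0.1:4545" or "127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002"

The client picks one server and connects to it; if that fails, it automatically tries the next.

Version 3.2.0 added an optional chroot suffix, which becomes the root directory after connecting, for example "127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002/app/a"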
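
The shape of the connect string is easy to see once it is taken apart. `parse_connect_string` is a hypothetical helper written for this note, not part of any ZooKeeper client, and falling back to port 2181 when no port is given is an assumption of the sketch.

```python
def parse_connect_string(connect_string):
    """Split 'host:port,host:port[/chroot]' into ([(host, port), ...], chroot or None)."""
    slash = connect_string.find("/")
    if slash == -1:
        hosts_part, chroot = connect_string, None
    else:
        hosts_part, chroot = connect_string[:slash], connect_string[slash:]
    servers = []
    for item in hosts_part.split(","):
        host, _, port = item.strip().partition(":")
        servers.append((host, int(port) if port else 2181))   # assumed default port
    return servers, chroot
```

With a chroot of "/app/a", a client that asks for "/foo/bar" is really operating on "/app/a/foo/bar".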

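Watches

Every read operation, such as getData(), getChildren(), and exists(), can set a watch. There are three key points:

One-time trigger. When the watched data changes, a single watch event is sent to the client, and the watch is then used up.
Sent to the client. The event is sent to the client right away, but there is no guarantee that the client has received it and set a new watch before the next change. Delivery is asynchronous, but the server guarantees that different clients see events in the same order.
The data for which the watch was set. ZooKeeper keeps two lists of watches: data watches and child watches. setData triggers data watches; create and delete trigger both.

Watches are stored on the server the client is connected to, which lightens the load and suits distributed operation.

For watches, ZooKeeper guarantees the following:

Watches are ordered.
A client sees the watch event for a znode before it sees the changed data.
The order of watch events matches the order in which the ZooKeeper service saw the changes.

Things to keep in mind about watches:

A watch is a one-time trigger.
Because of network latency, a znode may change several times before a new watch is in place.
Watches have no effect once the client is disconnected.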
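
The one-shot behaviour, and the missed updates it can cause, show up clearly in a small in-memory model. `ToyZnode` is a made-up stand-in, not a ZooKeeper client API; the event name is borrowed from ZooKeeper's NodeDataChanged event type.

```python
class ToyZnode:
    """In-memory stand-in for a znode with one-shot data watches."""

    def __init__(self, data):
        self._data = data
        self._watchers = []

    def get_data(self, watcher=None):
        if watcher is not None:
            self._watchers.append(watcher)
        return self._data

    def set_data(self, data):
        self._data = data
        # Watches fire once and are then cleared.
        watchers, self._watchers = self._watchers, []
        for watcher in watchers:
            watcher("NodeDataChanged")
```

After the first change the watcher is gone, so a second change produces no event. A client that wants to keep following the znode has to set a new watch when it re-reads the data in its callback.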

Access control with ACLs

ACLs resemble UNIX file permissions, but they are not limited to the user, group, and world scopes.

ACL permissions

CREATE: you can create a child node
READ: you can get data from a node and list its children.
WRITE: you can set data for a node
DELETE: you can delete a child node
ADMIN: you can set permissions

Builtin ACL Schemes

world
auth
digest
ip

A simple example follows:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>

#include "zookeeper.h"

static zhandle_t *zh;

/**
 * In this example this method gets the cert for your
 *   environment -- you must provide
 */
char *foo_get_cert_once(char* id) { return 0; }

/** Watcher function -- empty for this example, not something you should
 * do in real code */
void watcher(zhandle_t *zzh, int type, int state, const char *path,
      void *watcherCtx) {}

int main(int argc, char **argv) {
  char buffer[512];
  char p[2048];
  char *cert=0;
  char appId[64];

  strcpy(appId, "example.foo_test");
  cert = foo_get_cert_once(appId);
  if(cert!=0) {
      fprintf(stderr,
              "Certificate for appid [%s] is [%s]\n",appId,cert);
      strncpy(p,cert, sizeof(p)-1);
      free(cert);
  } else {
      fprintf(stderr, "Certificate for appid [%s] not found\n",appId);
      strcpy(p, "dummy");
  }

  zoo_set_debug_level(ZOO_LOG_LEVEL_DEBUG);

  zh = zookeeper_init("localhost:3181", watcher, 10000, 0, 0, 0);
  if (!zh) {
      return errno;
  }
  if(zoo_add_auth(zh,"foo",p,strlen(p),0,0)!=ZOK)
      return 2;

  struct ACL CREATE_ONLY_ACL[] = {{ZOO_PERM_CREATE, ZOO_AUTH_IDS}};
  struct ACL_vector CREATE_ONLY = {1, CREATE_ONLY_ACL};
  int rc = zoo_create(zh,"/xyz","value", 5, &CREATE_ONLY, ZOO_EPHEMERAL,
          buffer, sizeof(buffer)-1);

  /** this operation will fail with a ZNOAUTH error */
  int buflen= sizeof(buffer);
  struct Stat stat;
  rc = zoo_get(zh, "/xyz", 0, buffer, &buflen, &stat);
  if (rc) {
      fprintf(stderr, "Error %d for %d\n", rc, __LINE__);
  }

  zookeeper_close(zh);
  return 0;
}

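Consistency guarantees

ZooKeeper is fast and scalable, with high read and write throughput (reads are faster than writes). It gets there by relaxing consistency somewhat: a client may read stale data from a server. It does guarantee the following, however:

Sequential consistency
Atomic updates
Single system image
Reliability: the monotonicity condition in Paxos; data is never rolled back when a server recovers
Timeliness: a client is guaranteed to see up-to-date data within a bounded time

These guarantees make it easy to build leader election, barriers, queues, and read/write revocable locks on top of the ZooKeeper API.

ZooKeeper does not guarantee the following: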

Simultaneously Consistent Cross-Client Views

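Bindings

Java and C bindings are available

Basic operations

Handling errors

Both the Java and the C bindings report errors

Java: throws KeeperException; calling code() on the exception returns the specific error code
C: returns an error code as defined in the enum ZOO_ERRORS

Connecting to ZooKeeper

Read operations

Write operations

Handling watches

Miscellaneous ZooKeeper operations

Java example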

Executor.java

/**
 * A simple example program to use DataMonitor to start and
 * stop executables based on a znode. The program watches the
 * specified znode and saves the data that corresponds to the
 * znode in the filesystem. It also starts the specified program
 * with the specified arguments when the znode exists and kills
 * the program if the znode goes away.
 */
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

public class Executor
implements Watcher, Runnable, DataMonitor.DataMonitorListener
{
  String znode;

  DataMonitor dm;

  ZooKeeper zk;

  String filename;

  String exec[];

  Process child;

  public Executor(String hostPort, String znode, String filename,
          String exec[]) throws KeeperException, IOException {
      this.filename = filename;
      this.exec = exec;
      zk = new ZooKeeper(hostPort, 3000, this);
      dm = new DataMonitor(zk, znode, null, this);
  }

  /**
  * @param args
  */
  public static void main(String[] args) {
      if (args.length < 4) {
          System.err
              .println("USAGE: Executor hostPort znode filename program [args ...]");
          System.exit(2);
      }
      String hostPort = args[0];
      String znode = args[1];
      String filename = args[2];
      String exec[] = new String[args.length - 3];
      System.arraycopy(args, 3, exec, 0, exec.length);
      try {
          new Executor(hostPort, znode, filename, exec).run();
      } catch (Exception e) {
          e.printStackTrace();
      }
  }

  /***************************************************************************
  * We do not process any events ourselves, we just need to forward them on.
  *
  * @see org.apache.zookeeper.Watcher#process(org.apache.zookeeper.proto.WatcherEvent)
  */
  public void process(WatchedEvent event) {
      dm.process(event);
  }

  public void run() {
      try {
          synchronized (this) {
              while (!dm.dead) {
                  wait();
              }
          }
      } catch (InterruptedException e) {
      }
  }

  public void closing(int rc) {
      synchronized (this) {
          notifyAll();
      }
  }

  static class StreamWriter extends Thread {
      OutputStream os;

      InputStream is;

      StreamWriter(InputStream is, OutputStream os) {
          this.is = is;
          this.os = os;
          start();
      }

      public void run() {
          byte b[] = new byte[80];
          int rc;
          try {
              while ((rc = is.read(b)) > 0) {
                  os.write(b, 0, rc);
              }
          } catch (IOException e) {
          }

      }
  }

  public void exists(byte[] data) {
      if (data == null) {
          if (child != null) {
              System.out.println("Killing process");
              child.destroy();
              try {
                  child.waitFor();
              } catch (InterruptedException e) {
              }
          }
          child = null;
      } else {
          if (child != null) {
              System.out.println("Stopping child");
              child.destroy();
              try {
                  child.waitFor();
              } catch (InterruptedException e) {
                  e.printStackTrace();
              }
          }
          try {
              FileOutputStream fos = new FileOutputStream(filename);
              fos.write(data);
              fos.close();
          } catch (IOException e) {
              e.printStackTrace();
          }
          try {
              System.out.println("Starting child");
              child = Runtime.getRuntime().exec(exec);
              new StreamWriter(child.getInputStream(), System.out);
              new StreamWriter(child.getErrorStream(), System.err);
          } catch (IOException e) {
              e.printStackTrace();
          }
      }
  }
}

DataMonitor.java

/**
 * A simple class that monitors the data and existence of a ZooKeeper
 * node. It uses asynchronous ZooKeeper APIs.
 */
import java.util.Arrays;

import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.AsyncCallback.StatCallback;
import org.apache.zookeeper.KeeperException.Code;
import org.apache.zookeeper.data.Stat;

public class DataMonitor implements Watcher, StatCallback {

  ZooKeeper zk;

  String znode;

  Watcher chainedWatcher;

  boolean dead;

  DataMonitorListener listener;

  byte prevData[];

  public DataMonitor(ZooKeeper zk, String znode, Watcher chainedWatcher,
          DataMonitorListener listener) {
      this.zk = zk;
      this.znode = znode;
      this.chainedWatcher = chainedWatcher;
      this.listener = listener;
      // Get things started by checking if the node exists. We are going
      // to be completely event driven
      zk.exists(znode, true, this, null);
  }

  /**
  * Other classes use the DataMonitor by implementing this method
  */
  public interface DataMonitorListener {
      /**
      * The existence status of the node has changed.
      */
      void exists(byte data[]);

      /**
      * The ZooKeeper session is no longer valid.
      *
      * @param rc
      *                the ZooKeeper reason code
      */
      void closing(int rc);
  }

  public void process(WatchedEvent event) {
      String path = event.getPath();
      if (event.getType() == Event.EventType.None) {
          // We are are being told that the state of the
          // connection has changed
          switch (event.getState()) {
              case SyncConnected:
                  // In this particular example we don't need to do anything
                  // here - watches are automatically re-registered with 
                  // server and any watches triggered while the client was 
                  // disconnected will be delivered (in order of course)
                  break;
              case Expired:
                  // It's all over
                  dead = true;
                  listener.closing(KeeperException.Code.SessionExpired);
                  break;
          }
      } else {
          if (path != null && path.equals(znode)) {
              // Something has changed on the node, let's find out
              zk.exists(znode, true, this, null);
          }
      }
      if (chainedWatcher != null) {
          chainedWatcher.process(event);
      }
  }

  public void processResult(int rc, String path, Object ctx, Stat stat) {
      boolean exists;
      switch (rc) {
          case Code.Ok:
              exists = true;
              break;
          case Code.NoNode:
              exists = false;
              break;
          case Code.SessionExpired:
          case Code.NoAuth:
              dead = true;
              listener.closing(rc);
              return;
          default:
              // Retry errors
              zk.exists(znode, true, this, null);
              return;
      }

      byte b[] = null;
      if (exists) {
          try {
              b = zk.getData(znode, false, null);
          } catch (KeeperException e) {
              // We don't need to worry about recovering now. The watch
              // callbacks will kick off any exception handling
              e.printStackTrace();
          } catch (InterruptedException e) {
              return;
          }
      }
      if ((b == null && b != prevData)
              || (b != null && !Arrays.equals(prevData, b))) {
          listener.exists(b);
          prevData = b;
      }
  }
}

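References

ZooKeeper Intro

Aug 15th, 2012

Introduction

ZooKeeper is an open-source distributed coordination service, aimed mainly at distributed applications. It provides a set of simple primitives for synchronization, configuration maintenance, naming, and similar services. It is implemented in Java and has bindings for Java and C programs.

Its main purpose is to provide coordination so that distributed applications do not have to implement it themselves.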

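Design goals

Simple

In ZooKeeper, each distributed node looks like a file in a file system, and ZooKeeper provides coordination and communication through this hierarchical namespace. It keeps its data in memory rather than on disk, so latency is low.

Replicated

ZooKeeper itself is distributed.

ZooKeeper servers running on different machines communicate with one another. As long as a majority of the servers are working, the service remains available.

In other words, the failure of a minority of machines does not affect the service as a whole. Each client keeps a TCP connection to one server and uses it to send requests, receive responses, and send heartbeats. If that server fails, the client can immediately connect to another one.

Ordered

ZooKeeper stamps each update with a global sequence number. Later operations can use this number to build higher-level consistency abstractions such as synchronization.

ZooKeeper provides global locking, similar to Google's Chubby?

Fast

ZooKeeper serves reads faster than writes. ZooKeeper applications run on thousands of machines, and it performs best when reads outnumber writes by about 10:1.

Because the service is distributed, reads can be spread across servers, whereas writes require locking and agreement.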

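Architecture

ZooKeeper provides a hierarchical namespace. Every node can have child nodes, much like a file system in which every file can also be a directory. A node is called a znode.

A znode maintains a stat structure containing version numbers for data changes and ACL changes, plus timestamps. This allows cache validation and coordinated updates. Each time the znode's data changes, the version number increases by one.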
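
The version number is what makes coordinated updates possible: a writer states which version it read, and the write is rejected if someone else got there first. The model below is an in-memory sketch; `VersionedNode` and `BadVersionError` are hypothetical names, and treating -1 as "skip the check" mirrors the convention of ZooKeeper's setData.

```python
class BadVersionError(Exception):
    pass


class VersionedNode:
    """In-memory stand-in for a znode's data and its data-version counter."""

    def __init__(self, data):
        self.data = data
        self.version = 0

    def set_data(self, data, expected_version):
        # -1 means "do not check"; otherwise the caller's version must match.
        if expected_version != -1 and expected_version != self.version:
            raise BadVersionError(
                "expected %d, found %d" % (expected_version, self.version))
        self.data = data
        self.version += 1
        return self.version
```

A client that gets a version error re-reads the node, reapplies its change to the fresh data, and tries again with the new version.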

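Reads and writes of the data on a znode are atomic. ZooKeeper also has ephemeral nodes, which live as long as the session that created them.

ZooKeeper supports watches: a client can watch a znode, and it is notified when the znode's data changes.

Consistency

Sequential consistency: updates from a client are applied in the order they were sent
Atomicity: an update either succeeds or fails; there are no partial results
Single system image: all clients see the same view of the system
Reliability: once an update has been applied, it persists until a later update overwrites it
Timeliness: the data a client sees is up to date within a bounded time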

API

create
delete
exists
get data
set data
get children
sync

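Implementation

The structure of the ZooKeeper service is shown below.

Every server in the ZooKeeper service contains each of the components above.

The replicated database is an in-memory database; updates are logged to disk.

Each client connects to exactly one server to submit its requests. Reads are served from that server.

As part of the agreement protocol, all writes are forwarded to a single server, called the leader. The other servers are called followers, and they receive message proposals from the leader. ZooKeeper's messaging layer takes care of electing a new leader on failure and of syncing followers with the leader.

Usage and performance

With ZooKeeper's high-level API you can implement synchronization primitives, group membership, and ownership management for a distributed system. Performance measurements follow.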

Overview
