Getting started with rspec

2013 October-04

rspec

init

gem install rspec
rspec --init

.rspec

We can use the .rspec file to configure rspec.

--color
--format documentation

Here, documentation makes rspec print the description of every example, whereas the default, --format progress, only shows a test-unit style progress display.

rspec with rails

rails new *application* --skip-test-unit

This tells rails not to generate any test-unit content. Then add the following to the Gemfile:

group :development, :test do
  gem 'rspec-rails'
end

Then run

rails generate rspec:install

After that, running rails g scaffold will also create the corresponding specs under the spec directory.

rspec with guard

group :development, :test do
  gem 'rspec-rails'
  gem 'rb-fsevent', :require => false if RUBY_PLATFORM =~ /darwin/i
  gem 'guard-rspec'
  gem 'guard-livereload'
end

Adding guard support is very simple. With the gems above in the Gemfile, run

guard init rspec
guard init livereload
guard

For details, see railscasts #264 Guard.

pending

This is extremely useful when you have not yet worked out what a test should contain.

First, you can leave out the block:

describe Hotel do
  it "should have method price"
end

Or call pending inside it:

describe Hotel do
  it "should have method price" do
    pending
  end
end

A whole bunch of matchers

zombie = Zombie.new(name: 'Ash')

zombie.name.should == 'Ash'
zombie.should respond_to(:name)
zombie.weapons.should include(weapon)
zombie.weapon.should_not be_nil

mysql and python tips

2013 September-28

I have been writing python again recently, and after a long break I feel rusty. I was importing data with mysql, so here are a few notes.

Installing python-MySQLdb

Before installing it with pip, you need to run

apt-get install libmysqlclient-dev python-dev

About Chinese text

table creation

For mysql to handle Chinese well, you have to be careful right from table creation. I looked at the sql table generated by a ruby migration, and it looks like this:

CREATE TABLE `projects` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `title` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
  `description` text COLLATE utf8_unicode_ci,
  `created_at` datetime NOT NULL,
  `updated_at` datetime NOT NULL,
  `user_id` int(11) DEFAULT NULL,
  `is_archived` tinyint(1) DEFAULT '0',
  PRIMARY KEY (`id`),
  KEY `index_projects_on_user_id` (`user_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

Note the last line:

ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

This part must be included.

python lang support

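The python 2/3 split really is painful. python3 has good unicode support, but unfortunately it has never really taken off. I am still stuck with python2.7, where I have to convert every string myself. Before a string is handed to MySQLdb, it has to be turned into unicode.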
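A minimal sketch of that conversion step. The helper name to_unicode and the choice of utf-8 as the source encoding are my assumptions here; the real encoding depends on where the bytes came from.

```python
def to_unicode(value, encoding="utf-8"):
    """Return text as unicode, decoding byte strings first.

    to_unicode is a hypothetical helper; utf-8 is an assumed default.
    In python2.7 a plain str is a byte string, so it gets decoded.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    # Already unicode (or not a string at all): leave it alone.
    return value
```

Calling this on every string just before it goes into a query keeps the decoding in one place.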

setdefaultencoding

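On linux systems the default encoding in python is usually 'ascii', which causes problems when the bytes returned by urllib2.urlopen().read() get implicitly converted to unicode.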

import sys

reload(sys)  # python 2 only: brings back setdefaultencoding, which site.py removes at startup
sys.setdefaultencoding('utf-8')

This avoids a lot of problems.

Also, the HOWTO UNICODE explains encoding issues very thoroughly.

python MySQLdb

A single MySQLdb connection cannot be shared between threads. To use multiple threads, each thread needs its own connection.
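One way to get a connection per thread is threading.local. This is only a sketch: PerThreadConnection is a name I made up, and connect stands for whatever zero-argument function opens your connection (for example a lambda around MySQLdb.connect with your own parameters).

```python
import threading


class PerThreadConnection(object):
    """Hands each thread its own lazily created connection."""

    def __init__(self, connect):
        # connect: hypothetical zero-argument callable that opens a new connection
        self._connect = connect
        self._local = threading.local()

    def get(self):
        # threading.local keeps a separate 'conn' attribute for every thread
        conn = getattr(self._local, "conn", None)
        if conn is None:
            conn = self._connect()
            self._local.conn = conn
        return conn
```

Each worker calls get() and never passes the result to another thread.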

connect mysql from remote

In /etc/mysql/my.cnf there is a bind-address setting. To allow remote connections, it must not be 127.0.0.1.

Then grant the user the privileges needed to connect remotely.

grant all on *.* to adminm@'%' identified by '123456'

mysql text limit

  TINYBLOB, TINYTEXT       L + 1 bytes, where L < 2^8    (255 Bytes)
  BLOB, TEXT               L + 2 bytes, where L < 2^16   (64 Kilobytes)
  MEDIUMBLOB, MEDIUMTEXT   L + 3 bytes, where L < 2^24   (16 Megabytes)
  LONGBLOB, LONGTEXT       L + 4 bytes, where L < 2^32   (4 Gigabytes)

So a text column holds at most 64K.

mysql text index

If you want to index a text column, you need to specify the index length.

CREATE UNIQUE INDEX index_name ON misc_info (`key`(10));

UPDATE: I stopped doing this later. I now take the md5 of the url, which gives me a hexadecimal string of length 32, and I index the md5 instead.
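Roughly what computing that key looks like with hashlib. The function name url_key is my own, and encoding unicode urls as utf-8 before hashing is an assumption.

```python
import hashlib


def url_key(url):
    """Return the md5 of a url as a 32-character hex string."""
    # md5 works on bytes; encode unicode urls first (utf-8 is an assumption)
    if not isinstance(url, bytes):
        url = url.encode("utf-8")
    return hashlib.md5(url).hexdigest()
```

The result has a fixed length, so it fits a char(32) column that can be indexed without a prefix length.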

mysql OperationalError: (2006, 'MySQL server has gone away')

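This problem bothered me for a long time. To describe it first: it seems that a mysql connection has a maximum lifetime. The crawler holds a connection for a long time, or leaves it waiting, so at some point the connection gets dropped.

First look at all the timeout-related parameters in /etc/mysql/my.cnf.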

connect_timeout         = 10
wait_timeout            = 180
net_read_timeout        = 30
net_write_timeout       = 30

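These values are all in seconds. I looked around and found a detailed explanation of these parameters, here.

connect_timeout shows up in many places. It is the timeout for establishing a connection to mysql, so it is probably not the culprit. wait_timeout is how long mysql waits for activity on a connection once it is established. In other words, if the connection sees no activity within this time, mysql closes it.

Even when the connection is not in the sleep state, i.e. the client is busy computing or storing data, MySQL still only waits conditionally. While packets are being exchanged, the client may fail to respond in time (it sends, receives, or processes packets too slowly). So that connections are not wasted on endless waiting, MySQL will also actively close the connection under certain conditions (net_read_timeout and net_write_timeout).

This left me stuck. I tried changing the parameters above, but that did not fix the problem at its root. In the end I went with a rather brute-force approach.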

  def execute(self, *args, **kwargs):
    try:
      cursor = self.conn.cursor()
      cursor.execute(*args, **kwargs)
    except (AttributeError, MySQLdb.OperationalError):
      # The connection is missing or was dropped by the server:
      # reconnect and retry the statement once.
      self.connect()
      cursor = self.conn.cursor()
      cursor.execute(*args, **kwargs)
    return cursor

Ha, if the connection is gone, I just reconnect...

It is not an elegant solution, but it does work.
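The snippet only shows execute, so here is a self-contained sketch of a class it could live in. Everything around execute is assumed: the class name ReconnectingDB, the connect factory, and the retry_on tuple are illustrative, not the original code.

```python
class ReconnectingDB(object):
    """Wraps a DB-API style connection and reconnects once when a query fails."""

    def __init__(self, connect, retry_on):
        # connect: hypothetical zero-argument callable returning a new connection
        # retry_on: exception classes treated as "connection lost", e.g.
        #           (AttributeError, MySQLdb.OperationalError)
        self._connect = connect
        self._retry_on = retry_on
        self.conn = None  # opened lazily on the first execute

    def connect(self):
        self.conn = self._connect()

    def execute(self, *args, **kwargs):
        try:
            cursor = self.conn.cursor()
            cursor.execute(*args, **kwargs)
        except self._retry_on:
            # Reconnect and retry once; a second failure propagates to the caller
            self.connect()
            cursor = self.conn.cursor()
            cursor.execute(*args, **kwargs)
        return cursor
```

Because self.conn starts as None, the first call raises AttributeError and goes through the same reconnect path, which is presumably why AttributeError appears in the except clause.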

sql performance problems

select url from taobao limit 1 offset 2000000;

This statement takes more than ten seconds on my table of 3 million rows... The explain output is:

mysql> explain select url from taobao limit 1 offset 2000000;
+---------------+------+---------+------+---------+-------+
| possible_keys | key  | key_len | ref  | rows    | Extra |
+---------------+------+---------+------+---------+-------+
| NULL          | NULL | NULL    | NULL | 3298171 |       |
+---------------+------+---------+------+---------+-------+

Doing it another way:

mysql> explain select url from taobao where id < 2000001 and id >= 2000000;
+---------------+---------+---------+------+------+-------------+
| possible_keys | key     | key_len | ref  | rows | Extra       |
+---------------+---------+---------+------+------+-------------+
| PRIMARY       | PRIMARY | 4       | NULL |    1 | Using where |
+---------------+---------+---------+------+------+-------------+

This one runs in 0.05 seconds. Note that it only lines up with the offset query when the ids have no gaps.
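To walk the whole table this way, the id ranges can be generated in batches. A small sketch, assuming an integer auto-increment id; id_ranges is a name I made up.

```python
def id_ranges(start, stop, batch_size):
    """Yield (lo, hi) pairs covering [start, stop) for use in
    'where id >= lo and id < hi' queries."""
    lo = start
    while lo < stop:
        hi = min(lo + batch_size, stop)
        yield lo, hi
        lo = hi
```

Each pair goes into the where clause as parameters, so every query uses the primary key. Gaps in the ids only make some batches smaller.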

Self join graph relation in rails

2013 August-24

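I recently had an odd requirement: building a many-to-many graph relationship. There is a pile of tags that used to be flat, and now they need to be organized into a hierarchy. A tag can then have many parent nodes as well as many child nodes. Put simply, it is a many-to-many relationship between a model and itself.

Normally, a many-to-many relationship between two models looks like this. link here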

class Physician < ActiveRecord::Base
  has_many :appointments
  has_many :patients, through: :appointments
end

class Appointment < ActiveRecord::Base
  belongs_to :physician
  belongs_to :patient
end

class Patient < ActiveRecord::Base
  has_many :appointments
  has_many :physicians, through: :appointments
end

And a model with the usual tree relationship to itself looks like this: link here

class Employee < ActiveRecord::Base
  has_many :subordinates, class_name: "Employee",
                          foreign_key: "manager_id"

  belongs_to :manager, class_name: "Employee"
end

What I need now is more or less a combination of the two.

class Word < ActiveRecord::Base

  has_many :parent_relations, class_name: :WordRelation, foreign_key: :child_id
  has_many :child_relations, class_name: :WordRelation, foreign_key: :parent_id

  has_many :parents, through: :parent_relations
  has_many :children, through: :child_relations
end

class WordRelation < ActiveRecord::Base
  attr_accessible :child_id, :parent_id, :parent, :child

  belongs_to :parent, class_name: :Word
  belongs_to :child, class_name: :Word
end

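There are two models, word and word_relation. For this kind of self join the default foreign key usually does not work, so, as in the second example, we have to specify foreign_key ourselves. There is one peculiar thing here.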

has_many :parent_relations, class_name: :WordRelation, foreign_key: :child_id

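The foreign key that parent_relations needs turns out to be child_id, which feels a bit odd. It makes sense once you work out what sql activerecord generates for you. To find a word's parent nodes, the sql looks roughly like this: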

select * from word_relations where [one column] = '[this word id]'

To find the parents, the where clause has to pick the rows in which this word is the child. So the foreign key really should be the "opposite" one.
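The same lookup can be tried outside rails. Below is a small sketch using python's sqlite3 with a made-up word_relations table and sample ids; it is not the sql activerecord emits, only the same idea.

```python
import sqlite3

# In-memory stand-in for the word_relations table (sample data is made up)
conn = sqlite3.connect(":memory:")
conn.execute("create table word_relations (parent_id integer, child_id integer)")
conn.executemany(
    "insert into word_relations (parent_id, child_id) values (?, ?)",
    [(1, 3), (2, 3), (3, 4)],
)


def parent_ids(word_id):
    # Parents: rows where this word is the child
    rows = conn.execute(
        "select parent_id from word_relations where child_id = ? order by parent_id",
        (word_id,),
    )
    return [row[0] for row in rows]


def child_ids(word_id):
    # Children: rows where this word is the parent
    rows = conn.execute(
        "select child_id from word_relations where parent_id = ? order by child_id",
        (word_id,),
    )
    return [row[0] for row in rows]
```

Word 3 has parents 1 and 2, found by filtering on child_id, and child 4, found by filtering on parent_id.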

With the above in place, a graph-shaped word relationship is done.


Eisen's Blog