Python Study Notes

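- References
  - diveintopython.org
- Python syntax highlights (Something strange ...)
  - Indentation is no longer just cosmetic: it is part of the syntax!
  - Argument passing: keyword arguments make the order of arguments irrelevant.
  - Help text embedded in the code: DocStrings
  - Triple-quoted strings
  - while and for loops can take an else block
  - Swap assignment: a,b = b,a
  - The first parameter of a method in a class is special: it must be declared (self) but is not supplied in the call (Python adds it automatically).
  - A class's constructor is named __init__(self, ...)
  - Class variables and object (instance) variables
  - Everything is an object: even strings, variables, and functions are objects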
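- Getting help
  - How to get help?
  - 1. Start the python interactive prompt
  - 2. Import the module you want to look up, e.g. import sys
  - 3. List the attributes of the module: dir(sys)
  - 4. Read the module's help: help(sys)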
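- Source file encoding
  - To use Chinese (or any other non-ASCII text), add a specially formatted comment on the first or second line of the source file (usually the second) that declares the file's character encoding. The default is 7-bit ASCII.
  - Format: # -*- coding: <encoding-name> -*-
    - See: http://www.python.org/dev/peps/pep-0263/
    - Example, gbk encoding:
      #!/usr/bin/python
      # -*- coding: gbk -*-
    - Example, utf-8 encoding:
      #!/usr/bin/python
      # -*- coding: utf-8 -*-
  - Note: emacs also recognizes this syntax, while VIM recognizes # vim:fileencoding=<encoding-name>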
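- Constants and variables
  - Variables
    - Naming rules are similar to those of C
      - Valid names include __my_name, name_23, a1b2_c3
    - Reserved keywords (cannot be used as names)
      - and, assert, break, class, continue, def, del, elif, else, except, exec, finally, for, from, global, if, import, in, is, lambda, not, or, pass, print, raise, return, try, while, yield
    - No type declarations; variables are used directly
  - Type overview / inspecting types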
    - int
      - >>> type(17)
        <type 'int'>
    - float
      - >>> type(3.2)
        <type 'float'>
    - long
      - >>> type(1L)
        <type 'long'>
      - >>> type(long(1))
        <type 'long'>
    - bool
      - True and False; note the capitalization
      - >>> type(True)
        <type 'bool'>
      - >>> type(1>2)
        <type 'bool'>
    - string
      - >>> type("Hello, World!")
        <type 'str'>
      - >>> type("WorldHello"[0])
        <type 'str'>
        In other words, Python has no char type
    - list
      - >>> type(['a','b','c'])
        <type 'list'>
      - >>> type([])
        <type 'list'>
    - tuple
      - >>> type(('a','b','c'))
        <type 'tuple'>
      - >>> type(())
        <type 'tuple'>
    - dict
      - >>> type({'color1':'red','color12':'blue'})
        <type 'dict'>
      - >>> type({})
        <type 'dict'>
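  - Strings
    - Triple quotes
      - Triple quotes, ''' or """, are a distinctive Python feature. A triple-quoted string can span several lines and contain quote characters without escaping (the content may include newlines and quotes).
      - For example:
        '''This is a multi-line string. This is the first line.
        This is the second line.
        "What's your name?," I asked.
        He said "Bond, James Bond."
        '''
    - Both single and double quotes create strings.
      - Note: there is no difference at all between single and double quotes, unlike in PHP or Perl
    - \ is the escape character; a \ at the end of a line continues the line
    - The prefix r or R introduces a raw string
      - For example: r"Newlines are indicated by \n."
      - Prefer raw strings when writing regular expressions, to avoid doubling backslashes. For example, r'\1' is equivalent to '\\1'.
    - The prefix u or U introduces a Unicode string
      - For example: u"This is a Unicode string."
    - u and r can be combined, with u before r
      - For example, ur"\u0062\n" contains three characters: \u0062 (the letter b), a backslash, and n
    - String concatenation: two string literals placed side by side are joined into one
      - 'What\'s ' "your name?" is automatically converted to "What's your name?".
      - Use 1: fewer \ line continuations.
      - Use 2: each fragment can carry its own comment. For example:
        re.compile("[A-Za-z_]"       # letter or underscore
                   "[A-Za-z0-9_]*"   # letter, digit or underscore
                  )
      - Wrapping a multi-line string in parentheses
        >>> test= ("case 1: something;" # test case 1
        ... "case 2: something;" #test case 2
        ... "case 3: something." #test case 3
        ... )
        >>> test
        'case 1: something;case 2: something;case 3: something.'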
    - sprintf-like string formatting
      - header1 = "Dear %s," % name
      - header2 = "Dear %(title)s %(name)s," % vars()
    - String operations
      - String slices
        [n] : the (n+1)-th character of the string
        print "WorldHello"[0]

        str="WorldHello"
        print str[len(str)-1]

        [n:m] : the substring from n up to m, including n but excluding m
        >>> s = "0123456789"
        >>> print s[0:5]
        01234
        >>> print s[3:5]
        34
        >>> print s[7:21]
        789
        >>> print s[:5]
        01234
        >>> print s[7:]
        789
        >>> print s[21:]
        (prints an empty line)
      - len : string length
        len("WorldHello")
      - String comparison
        ==, >, < can be used to compare strings
      - The string module
    - Warning: strings in Python are immutable
      - # Wrong! Strings cannot be modified
        greeting = "Hello, world!"
        greeting[0] = 'J' # ERROR!
        print greeting
        # Can be rewritten as:
        greeting = "Hello, world!"
        newGreeting = 'J' + greeting[1:]
        print newGreeting
  - Numbers
    - Integers and long integers
      - longinteger ::= integer ("l" | "L")
        integer ::= decimalinteger | octinteger | hexinteger
        decimalinteger ::= nonzerodigit digit* | "0"
        octinteger ::= "0" octdigit+
        hexinteger ::= "0" ("x" | "X") hexdigit+
        nonzerodigit ::= "1"..."9"
        octdigit ::= "0"..."7"
        hexdigit ::= digit | "a"..."f" | "A"..."F"
    - Floating-point numbers
  - Type conversion
    - int("32")
    - int(-2.3)
    - float(32)
    - float("3.14159")
    - str(3.14149)
    - ord('A') : returns the ASCII value of the letter 'A'
  - Compound types such as list, tuple, and dict are covered in later sections
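  - Local and global variables
    - A function can read the value of a global variable directly, without declaring it. Assigning to that name inside the function, however, creates a local variable, so the effect is limited to the function.
    - A variable assigned in a function without a global declaration is local and does not affect the value of the global variable
    - global declares a global variable
      - #!/usr/bin/python
        def func1():
            print "func1: local x is", x

        def func2():
            x = 2
            print 'func2: local x is', x

        def func3():
            global x
            print "func3: before change, x is", x
            x = 2
            print 'func3: changed x to', x

        x = 1
        print 'Global x is', x
        func1()
        print 'Global x is', x
        func2()
        print 'Global x is', x
        func3()
        print 'Global x is', x
    - locals() and globals() are two special functions that return the local and the global variables
      - locals() returns a copy of the local variables; changing the copy does not change them
      - globals() returns the actual global namespace, so global variables can be modified through it
    - vars() without arguments is equivalent to locals(). At module level, where locals() and globals() are the same dictionary, vars()['key'] = 'value' adds a variable dynamically; inside a function this is not reliable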
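The scoping rules above can be sketched as follows. This is a minimal illustration written in Python 3 syntax (an assumption, since the notes target Python 2.4); the function names and the variable y are made up for the example.

```python
# Sketch of local vs. global scope; all names here are illustrative.
x = 1

def read_global():
    # Reading a global needs no declaration.
    return x

def shadow_global():
    # Assignment without "global" creates a new local x.
    x = 2
    return x

def change_global():
    # "global" makes the assignment rebind the module-level x.
    global x
    x = 3
    return x

print(read_global())    # reads the module-level x
print(shadow_global())  # returns the local x; the global is untouched
print(x)
print(change_global())  # rebinds the global
print(x)

# globals() is the real module namespace, so writes through it stick.
globals()["y"] = "added via globals()"
print(y)
```

Only change_global() alters the module-level x; shadow_global() works on its own local copy.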
- Compound types
  - string/unicode (strings)
  - list
    - Lists built with square brackets
      - [10, 20, 30, 40]
      - ["spam", "bungee", "swallow"]
      - ["hello", 2.0, 5, [10, 20]]
    - Lists built with the range function
      - >>> range(1,5)
        [1, 2, 3, 4]
        From 1 to 5, including 1 but excluding 5 (implicit step of 1).
      - >>> range(10)
        [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
        From 0 to 10, including 0 but excluding 10 (implicit step of 1).
      - >>> range(1, 10, 2)
        [1, 3, 5, 7, 9]
        Step of 2
    - Accessing list elements
      - Like array subscripts
      - print numbers[0]
      - numbers[1] = 5
    - Printing lists with the print statement
      - vocabulary = ["ameliorate", "castigate", "defenestrate"]
        numbers = [17, 123]
        empty = []
        print vocabulary, numbers, empty
        ['ameliorate', 'castigate', 'defenestrate'] [17, 123] []
    - List operations
      - List length
        The len() function
      - + (concatenation)
        >>> a = [1, 2, 3]
        >>> b = [4, 5, 6]
        >>> c = a + b
        >>> print c
        [1, 2, 3, 4, 5, 6]
      - * (repetition)
        >>> [0] * 4
        [0, 0, 0, 0]
        >>> [1, 2, 3] * 3
        [1, 2, 3, 1, 2, 3, 1, 2, 3]
      - List slices
        See String slices
      - Lists are mutable
        Unlike a str, a list can be modified in place

        >>> fruit = ["banana", "apple", "quince"]
        >>> fruit[0] = "pear"
        >>> fruit[-1] = "orange"
        >>> print fruit
        ['pear', 'apple', 'orange']

        >>> list = ['a', 'b', 'c', 'd', 'e', 'f']
        >>> list[1:3] = ['x', 'y']
        >>> print list
        ['a', 'x', 'y', 'd', 'e', 'f']
      - Adding elements to a list
        >>> list = ['a', 'd', 'f']
        >>> list[1:1] = ['b', 'c']
        >>> print list
        ['a', 'b', 'c', 'd', 'f']
        >>> list[4:4] = ['e']
        >>> print list
        ['a', 'b', 'c', 'd', 'e', 'f']
      - Deleting elements from a list
        Deleting by assigning an empty list to a slice
        >>> list = ['a', 'b', 'c', 'd', 'e', 'f']
        >>> list[1:3] = []
        >>> print list
        ['a', 'd', 'e', 'f']

        Using the del keyword
        >>> a = ['one', 'two', 'three']
        >>> del a[1]
        >>> a
        ['one', 'three']

        >>> list = ['a', 'b', 'c', 'd', 'e', 'f']
        >>> del list[1:5]
        >>> print list
        ['a', 'f']
      - Inspecting the id of a list
        >>> a = [1, 2, 3]
        >>> b = [1, 2, 3]
        >>> print id(a), id(b)
        418650444 418675820
        >>> b = a
        >>> print id(a), id(b)
        418650444 418650444
        >>> b = a[:]
        >>> print id(a), id(b)
        418650444 418675692
    - References and copy/clone
      - b = a makes both names refer to the same object, so a change made through one is visible through the other
      - b = a[:] creates a clone; b and a refer to different objects and are independent
      - A list passed as a function argument is passed by reference, so changes the function makes to the list affect the list object itself
    - Nested lists and matrices
      - Nesting
        >>> list = ["hello", 2.0, 5, [10, 20]]
        >>> list[3][1]
        20
      - Matrices
        >>> matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
        >>> matrix[1]
        [4, 5, 6]
        >>> matrix[1][1]
        5
    - Strings and lists
      - The string.split function
        >>> import string
        >>> song = "The rain in Spain..."
        >>> string.split(song)
        ['The', 'rain', 'in', 'Spain...']

        >>> string.split(song, 'ai')
        ['The r', 'n in Sp', 'n...']
      - The string.join function
        >>> list = ['The', 'rain', 'in', 'Spain...']
        >>> string.join(list)
        'The rain in Spain...'

        >>> string.join(list, '_')
        'The_rain_in_Spain...'

        >>> list = ['The', 'rain', 'in', 'Spain...']
        >>> '|'.join(list)
        'The|rain|in|Spain...'
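  - Tuples
    - Tuples built with parentheses
      - Enclose the values in parentheses
        >>> type((1,2,3))
        <type 'tuple'>
      - The values must be separated by commas; a single value needs a trailing comma
        >>> type((1))
        <type 'int'>

        >>> type((1,))
        <type 'tuple'>

        >>> type(('WorldHello'))
        <type 'str'>

        >>> type(('WorldHello',))
        <type 'tuple'>
    - Tuple vs list
      - The difference between a tuple and a list: a tuple is immutable, while a list is mutable
      - A single element in square brackets makes a list, but a single element in parentheses is not a tuple unless it is followed by a comma
        >>> type([1])
        <type 'list'>

        >>> type((1))
        <type 'int'>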
  - Dictionaries (hash tables)
    - Dictionaries built with curly braces
      - Perl calls this type a hash or an associative array: an array whose subscripts can be strings
      - >>> eng2sp = {}
        >>> eng2sp['one'] = 'uno'
        >>> eng2sp['two'] = 'dos'
        >>> print eng2sp
        {'one': 'uno', 'two': 'dos'}
    - Accessing dictionary elements: the subscript is a string
      - >>> print eng2sp
        {'one': 'uno', 'three': 'tres', 'two': 'dos'}
        >>> print eng2sp['two']
        dos
    - Dictionary operations
      - keys() returns a list of the keys
        >>> eng2sp.keys()
        ['one', 'three', 'two']
      - values() returns a list of the values
        >>> eng2sp.values()
        ['uno', 'tres', 'dos']
      - items() returns a list of (key, value) tuples
        >>> eng2sp.items()
        [('one', 'uno'), ('three', 'tres'), ('two', 'dos')]

        from MoinMoin.util.chartypes import _chartypes
        for key, val in _chartypes.items():
            if not vars().has_key(key):
                vars()[key] = val
      - has_key() returns a boolean
        >>> eng2sp.has_key('one')
        True
        >>> eng2sp.has_key('deux')
        False
      - get()
        Returns the value stored for a key
        e.g. eng2sp.get('one')

        This is equivalent to eng2sp['one'] when the key exists

        get() accepts a default value, which is returned when the key is not defined
        e.g. eng2sp.get('none', 0) returns 0, rather than None, if the key 'none' is not defined
    - References and copy/clone
      - Cloning a dictionary: the copy() method
        >>> opposites = {'up': 'down', 'right': 'wrong', 'true': 'false'}
        >>> copy = opposites.copy()
  - Iterators
  - The type function returns the type of a variable
    - isinstance(varname, type({}))
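The difference between a reference and a clone, for both lists and dictionaries, can be sketched as below. This is a minimal illustration in Python 3 syntax (an assumption; the notes use Python 2.4), and the helper append_item is hypothetical.

```python
# Aliasing vs. copying for lists and dictionaries (illustrative names).
a = [1, 2, 3]
alias = a        # same object as a
clone = a[:]     # new list with the same elements

a[0] = 99
print(alias[0], clone[0])          # the alias sees the change, the clone does not
print(alias is a, clone is a)

def append_item(items):
    # The caller's list is modified, because the list object itself is passed.
    items.append("new")

append_item(a)
print(a)

eng2sp = {"one": "uno", "two": "dos"}
copy_of = eng2sp.copy()
copy_of["three"] = "tres"
print("three" in eng2sp)           # the original dict is unchanged
print(eng2sp.get("none", 0))       # default value instead of a KeyError
print(isinstance(eng2sp, dict), isinstance(clone, type([])))
```

Slicing and copy() both make shallow copies: nested lists or dictionaries inside the container are still shared.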
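- Statements
  - A statement on a line of its own needs no terminating semicolon!
  - Several statements on one line must be separated by semicolons.
  - Explicit line joining with "\"
    - For example:
      i=10
      print \
      i
  - Implicit line joining
    - Content inside square brackets, parentheses, or curly braces can span several lines and is continued implicitly, without \
    - For example:
      month_names = ['Januari', 'Februari', 'Maart',      # These are the
                     'April',   'Mei',      'Juni',       # Dutch names
                     'Juli',    'Augustus', 'September',  # for the months
                     'Oktober', 'November', 'December']   # of the year
  - Indentation
    - Whitespace (spaces, tabs) at the start of a statement is significant!
    - Statements with the same indentation form one logical block
    - Wrong indentation causes an error!
    - Indentation is measured in spaces. A tab is replaced by 1 to 8 spaces, such that the total number of spaces up to that point is a multiple of 8.
  - The empty statement pass
    - def someFunction():
          pass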
- Operators and expressions
  - ** is exponentiation
    - 3 ** 4 gives 81 (i.e. 3 * 3 * 3 * 3)
  - // is floor division
    - 4 // 3.0 gives 1.0
  - % is the remainder (modulo)
    - -25.5 % 2.25 gives 1.5 .
  - << left shift
  - >> right shift
  - <, >, <=, >=, ==, != work as in C
  - Comparisons can be chained. For example:
    - if 0 < x < 10: print "x is a positive single digit."
  - ~, &, |, ^ are the same as in C
    - 5 & 3 gives 1.
    - 5 | 3 gives 7.
    - 5 ^ 3 gives 6
    - ~5 gives -6
      - Bitwise inversion. ~x is equivalent to -(x+1)
  - and, or, not are logical AND, OR, and NOT
    - if 0 < x and x < 10: print "x is a positive single digit."
  - is and is not test whether two names refer to the same object
    - Two names refer to the same object only if the IDs are equal.
    - is: id(obj1) == id(obj2)
    - is not: id(obj1) != id(obj2)
  - in and not in test membership
    - 'a' in ['a', 'b', 'c'] # True
  - Swap assignment a,b = b,a
    - To swap the values of a and b, other languages may need a temporary variable
      - temp=a
        a=b
        b=temp
    - Python has a swap-assignment form: a,b = b,a
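A few of the operators above, checked in code. This is a small sketch in Python 3 syntax (an assumption; the notes use Python 2.4), and the variable names are illustrative only.

```python
x = 5
print(0 < x < 10)              # chained comparison
print(0 < x and x < 10)        # the same test written with "and"
print(4 // 3.0, -25.5 % 2.25)  # floor division and remainder
print(~5 == -(5 + 1))          # ~x is -(x+1)

first = [1, 2, 3]
second = [1, 2, 3]
# Equal values, but two distinct objects with different ids.
print(first == second, first is second)
print(id(first) == id(second))

a, b = "left", "right"
a, b = b, a                    # swap without a temporary variable
print(a, b)
```

The == operator compares values, while is compares identity, which is why the two lists are equal but not the same object.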
- Control statements
  - The if statement
    - if ... elif ... else, for example (note the colons and the indentation):
      - #!/usr/bin/python
        # Filename : if.py

        number = 23
        guess = int(raw_input('Enter an integer : '))

        if guess == number:
            print 'Congratulations, you guessed it.' # new block starts here
            print "(but you don't win any prizes!)" # new block ends here
        elif guess < number:
            print 'No, it is a little higher than that.' # another block
            # You can do whatever you want in a block ...
        else:
            print 'No, it is a little lower than that.'
            # you must have guess > number to reach here

        print 'Done'
        # This last statement is always executed, after the if statement
        # is executed.
    - Note: there is no switch ... case statement!
  - The while loop
    - while ... [else ...], for example (else is optional):
      - #!/usr/bin/python
        # Filename : while.py

        number = 23
        stop = False

        while not stop:
            guess = int(raw_input('Enter an integer : '))

            if guess == number:
                print 'Congratulations, you guessed it.'
                stop = True # This causes the while loop to stop
            elif guess < number:
                print 'No, it is a little higher than that.'
            else: # you must have guess > number to reach here
                print 'No, it is a little lower than that.'
        else:
            print 'The while loop is over.'
            print 'I can do whatever I want here.'

        print 'Done.'
    - break and continue statements
      - break leaves the loop without executing the else block
  - The for loop
    - for ... else ..., for example (else is optional):
      - #!/usr/bin/python
        # Filename : for.py

        for i in range(1, 5):
            print i
        else:
            print 'The for loop is over.'
        range(1,5) is equivalent to range(1,5,1); the third argument is the step

        range stops at the second argument but does not include it
    - break and continue statements
      - break leaves the loop without executing the else block
    - Postfix for (list comprehensions)
      - [ name for name in wikiaction.__dict__ ]
      - actions = [name[3:] for name in wikiaction.__dict__ if name.startswith('do_')]
    - Examples
      - Characters in a string
        prefixes = "JKLMNOPQ"
        suffix = "ack"
        for letter in prefixes:
            print letter + suffix
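The interaction between break and the loop's else block is easy to get wrong, so here is a minimal sketch in Python 3 syntax (an assumption; the notes use Python 2.4). The function find_first_negative and the list names are hypothetical.

```python
def find_first_negative(numbers):
    """Return a message describing the first negative number, if any."""
    for n in numbers:
        if n < 0:
            result = "found %d" % n
            break          # leaving via break skips the else block
    else:
        # Runs only when the loop finished without break.
        result = "no negative number"
    return result

print(find_first_negative([3, 1, -4, 1, -5]))
print(find_first_negative([3, 1, 4]))

# The comprehension form from the notes, applied to plain strings.
names = ["do_edit", "do_save", "helper", "do_print"]
actions = [name[3:] for name in names if name.startswith("do_")]
print(actions)
```

The else block acts as a "nothing found" branch, which avoids a separate flag variable.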
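- Functions
  - Function declaration
    - The def keyword
      - Function name
      - Parentheses and parameters
      - Colon
    - For example:
      - #!/usr/bin/python
        # Filename : func_param.py

        def printMax(a, b):
            if a > b:
                print a, 'is maximum'
            else:
                print b, 'is maximum'

        printMax(3, 4) # Directly give literal values
  - Default parameter values
    - As in C++
      - #!/usr/bin/python
        # Filename : func_default.py

        def say(s, times = 1):
            print s * times

        say('Hello')
        say('World', 5)
  - Keyword arguments
    - Languages such as C++ have this annoyance: with a long list of parameters that all have defaults, changing just one of the later ones still requires passing values for all the earlier ones. Python calls this style positional argument passing.
    - A distinctive Python feature is passing arguments by keyword
    - For example:
      - #!/usr/bin/python
        # Filename : func_key.py

        def func(a, b=5, c=10):
            print 'a is', a, 'and b is', b, 'and c is', c

        func(3, 7)
        func(25, c=24)
        func(c=50, a=100)
  - Variable-length arguments
    - A parameter prefixed with * collects the extra positional arguments into a tuple; a parameter prefixed with ** collects the extra keyword arguments into a dictionary
    - Example 1
      - #!/usr/bin/python

        def sum(*args):
            '''Return the sum of the arguments.'''
            total = 0
            for i in range(0, len(args)):
                total += args[i]
            return total

        print sum(10, 20, 30, 40, 50)
  - Return values
    - The return statement provides the function's return value
    - Without a return, the function returns None
  - DocStrings
    - DocStrings provide help for a function
      - A string that starts on the first line of the function body is the DocString
      - DocStrings usually span several lines
        A DocString is a multi-line string enclosed in triple quotes

        The first line is a summary

        The second line is blank

        The detailed description starts on the third line
      - The existence of DocStrings shows that functions are objects too
        The function's __doc__ attribute holds its DocString

        For example, print printMax.__doc__ prints the DocString of the printMax function
      - help( ) displays a function's DocString
  - Lambda Forms
    - A lambda form creates and returns a new function, so it can serve as a function generator
    - Example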
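The notes leave the lambda example empty; a minimal sketch could look like this. It is written in Python 3 syntax (an assumption; the notes use Python 2.4), and make_incrementor, describe, and func are illustrative names. The same block also shows what * and ** parameters receive.

```python
def make_incrementor(n):
    """Return a new function that adds n to its argument."""
    # The lambda captures n and is returned as a function object.
    return lambda value: value + n

add_two = make_incrementor(2)
print(add_two(40))

def describe(*args, **kwargs):
    """Return the positional and keyword arguments that were received."""
    # args is a tuple, kwargs is a dictionary.
    return args, kwargs

positional, named = describe(1, 2, colour="red")
print(positional, named)

def func(a, b=5, c=10):
    return (a, b, c)

print(func(25, c=24))       # b keeps its default
print(func(c=50, a=100))    # order does not matter with keywords
print(describe.__doc__)     # the DocString is an attribute of the function
```

Returning a lambda from a function is what makes it act as a "function generator": each call to make_incrementor produces an independent function.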
- Built-in functions and objects
  - Help: import __builtin__; help (__builtin__)
  - Functions
    - Math / logic / algorithms
      - abs(number) : absolute value
      - cmp(x,y) : compares x and y; returns 1, 0, or -1
      - divmod(x, y) -> (div, mod) : returns the quotient and the remainder
      - pow(x, y[, z]) -> number
      - round(number[, ndigits]) -> floating point number : rounds to ndigits decimal places
      - sum(sequence, start=0) -> value : the sum of the sequence
      - hex(number) -> string : hexadecimal representation
      - oct(number) -> string : octal representation
      - len(object) -> integer
      - max(sequence) -> value
      - min(sequence) -> value
      - range([start,] stop[, step]) -> list of integers
        >>> range(10)
        [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
      - filter(function or None, sequence) -> list, tuple, or string
        Applies function to each element of sequence and returns the elements for which it returns true. The result type follows the type of sequence.

        If function is None, returns the elements that are themselves true
      - map(function, sequence[, sequence, ...]) -> list
        Applies function to each element of sequence and builds a list of the results

        >>> map(lambda x : x*2, [1,2,3,4,5])
        [2, 4, 6, 8, 10]
      - reduce(function, sequence[, initial]) -> value
        Applies function cumulatively to the elements of sequence, from left to right, reducing the sequence to a single value.

        >>> reduce(lambda x, y: x+y, [1, 2, 3, 4, 5])
        15
        which is equivalent to ((((1+2)+3)+4)+5)
      - sorted(iterable, cmp=None, key=None, reverse=False) : sorting
      - zip(seq1 [, seq2 [...]]) -> [(seq1[0], seq2[0] ...), (...)]
        >>> zip('1234','789')
        [('1', '7'), ('2', '8'), ('3', '9')]
      - coerce(x, y) -> (x1, y1)
        Return a tuple consisting of the two numeric arguments converted to a common type, using the same rules as used by arithmetic operations. If coercion is not possible, raise TypeError.
    - Strings
      - chr(i) : for 0 <= i < 256, returns the character whose ASCII code is i
      - unichr(i) -> Unicode character : returns a Unicode character, for 0 <= i <= 0x10ffff
      - ord(c) : returns the ASCII code of the character c
    - Object-related
      - delattr(object,name) : deletes the attribute name from object
        delattr(x, 'y') is equivalent to del x.y
      - getattr(object, name[, default]) -> value
        getattr(x, 'y') is equivalent to x.y

        The default is the value returned when the object has no such attribute
      - hasattr(object, name) -> bool
      - id(object) -> integer : returns the object's ID, which corresponds to its address in memory
      - hash(object) -> integer : two objects with equal values have equal hashes, but the converse does not necessarily hold
      - setattr(object, name, value) : equivalent to the assignment x.y = v
      - isinstance(object, class-or-type-or-tuple) -> bool
      - issubclass(C, B) -> bool
      - globals() -> dictionary
      - locals() -> dictionary
      - vars([object]) -> dictionary
        Without an argument, equivalent to locals()

        With an object as argument, equivalent to object.__dict__
      - dir([object]) : lists the attributes of an object
      - repr(object) -> string : the canonical string representation of object
      - reload(module) -> module : reloads module
      - iter
        iter(collection) -> iterator
        Get an iterator from an object. In the first form, the argument must supply its own iterator, or be a sequence.

        iter(callable, sentinel) -> iterator
        In the second form, the callable is called until it returns the sentinel.
    - Input and output
      - input([prompt]) -> value : reads input; equivalent to eval(raw_input(prompt)).
      - raw_input([prompt]) -> string : returns the input unprocessed, as a string
    - Other
      - __import__(name, globals, locals, fromlist) -> module : loads a module dynamically
        The module in an import module statement cannot be a variable. To load a module whose name is held in a variable, use the following approach.

        def importName(modulename, name):
            """ Import name dynamically from module

            Used to do dynamic import of modules and names that you know their
            names only in runtime.

            Any error raised here must be handled by the caller.

            @param modulename: full qualified module name, e.g. x.y.z
            @param name: name to import from modulename
            @rtype: any object
            @return: name from module
            """
            module = __import__(modulename, globals(), {}, [name])
            return getattr(module, name)
      - callable(object) : whether the object can be called, as a function can. Objects (instances) can be callable too.
      - compile(source, filename, mode[, flags[, dont_inherit]]) -> code object
      - eval(source[, globals[, locals]]) -> value
        Evaluates code; source can be a string containing the code or a code object returned by compile
      - execfile(filename[, globals[, locals]])
      - intern(string) -> string
  - Objects
    - basestring
      - str
      - unicode
    - buffer
    - classmethod
    - complex
    - dict
    - enumerate
    - file
    - float
    - frozenset
    - int
      - bool
    - list
    - long
    - property
    - reversed
    - set
    - slice
    - staticmethod
    - super
    - tuple
    - type
    - xrange
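Several of the built-in functions listed above, exercised together. This is a sketch in Python 3 syntax, which is an assumption: there map, filter, and zip return iterators (hence the list() calls) and reduce is imported from functools, whereas in the Python 2.4 of the notes they are builtins returning lists. The class Point is hypothetical.

```python
from functools import reduce  # reduce is a builtin in Python 2

numbers = [1, 2, 3, 4, 5]
doubled = list(map(lambda x: x * 2, numbers))
odd = list(filter(lambda x: x % 2, numbers))
total = reduce(lambda x, y: x + y, numbers)   # ((((1+2)+3)+4)+5)
pairs = list(zip("1234", "789"))              # stops at the shorter sequence
print(doubled, odd, total, pairs)
print(divmod(17, 5), pow(2, 10), abs(-3))

class Point(object):
    pass

p = Point()
setattr(p, "x", 10)                 # same as p.x = 10
print(hasattr(p, "x"), getattr(p, "x"), getattr(p, "y", "missing"))
delattr(p, "x")                     # same as del p.x
print(hasattr(p, "x"))
```

The getattr default plays the same role for attributes that dict.get plays for keys.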
- Input and output
  - Input: raw_input vs input
    - Prefer raw_input
      - name = raw_input ("What...is your name? ")
    - input is only suitable for entering numbers, because it evaluates what is typed as a Python expression
      - age = input ("How old are you? ")
      - If the input is not a number, an error is raised and the program exits!
  - Files
    - Opening a file
      - For reading
        >>> f = open("test.dat","r")
      - For writing
        >>> f = open("test.dat","w")
        >>> print f
        <open file 'test.dat', mode 'w' at fe820>
    - write("content") : writes to the file
      - >>> f.write("Now is the time")
        >>> f.write("to close the file")
    - read(count) : reads from the file
      - Reading all the data
        >>> text = f.read()
        >>> print text
        Now is the timeto close the file
      - Reading a fixed amount of data
        >>> f = open("test.dat","r")
        >>> print f.read(5)
        Now i
      - Detecting end of file: the read returns an empty string
    - readline() : reads one line, including the trailing newline
    - readlines() : reads all lines into a list
    - Closing a file
      - >>> f.close()
    - Example
      - def copyFile(oldFile, newFile):
            f1 = open(oldFile, "r")
            f2 = open(newFile, "w")
            while True:
                text = f1.read(50)
                if text == "":
                    break
                f2.write(text)
            f1.close()
            f2.close()
            return
  - % formatted output
    - Applied to numbers, % is the remainder operator.
    - When the left operand of % is a string, it performs printf-style formatting as in C.
    - Examples
      - >>> cars = 52
        >>> "In July we sold %d cars." % cars
        'In July we sold 52 cars.'
      - >>> "In %d days we made %f million %s." % (34,6.1,'dollars')
        'In 34 days we made 6.100000 million dollars.'
  - pickle and cPickle
    - Comparable to serialization in C++
    - Example
      - >>> import pickle
        >>> f = open("test.pck","w")
        >>> pickle.dump(12.3, f)
        >>> pickle.dump([1,2,3], f)
        >>> f.close()
        >>> f = open("test.pck","r")
        >>> x = pickle.load(f)
        >>> x
        12.3
        >>> type(x)
        <type 'float'>
        >>> y = pickle.load(f)
        >>> y
        [1, 2, 3]
        >>> type(y)
        <type 'list'>
    - Using cPickle
      - cPickle is implemented in C and is faster
      - Timing comparison of the two
        bash$ x=1; time while [ $x -lt 20 ]; do x=`expr $x + 1`; ./pickle.py ; done
        real 0m5.743s
        user 0m2.368s
        sys 0m2.932s
        bash$ x=1; time while [ $x -lt 20 ]; do x=`expr $x + 1`; ./cpickle.py ; done
        real 0m3.826s
        user 0m2.194s
        sys 0m1.958s
    - cPickle example
      - #!/usr/bin/python
        # Filename: pickling.py

        import cPickle

        shoplistfile = 'shoplist.data' # The name of the file we will use

        shoplist = ['apple', 'mango', 'carrot']

        # Write to the storage
        f = file(shoplistfile, 'w')
        cPickle.dump(shoplist, f) # dump the data to the file
        f.close()

        del shoplist # Remove shoplist

        # Read back from storage
        f = file(shoplistfile)
        storedlist = cPickle.load(f)
        print storedlist
  - Pipes
    - os.popen('ls /etc').read()
    - os.popen('ls /etc').readlines()
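The chunked file copy and the pickle round trip from this section can be sketched as below. The code uses Python 3 syntax (an assumption; the notes use Python 2.4), so it opens pickle files in binary mode and uses with blocks instead of explicit close() calls. The function copy_file and all file names are illustrative; the files are created in a temporary directory.

```python
import os
import pickle
import tempfile

def copy_file(old_path, new_path, chunk_size=50):
    """Copy a text file in fixed-size chunks, as copyFile does in the notes."""
    with open(old_path, "r") as source, open(new_path, "w") as target:
        while True:
            text = source.read(chunk_size)
            if text == "":      # an empty read means end of file
                break
            target.write(text)

workdir = tempfile.mkdtemp()
original = os.path.join(workdir, "test.dat")
duplicate = os.path.join(workdir, "copy.dat")

with open(original, "w") as f:
    f.write("Now is the time")
    f.write("to close the file")

copy_file(original, duplicate)
with open(duplicate, "r") as f:
    copied_text = f.read()
print(copied_text)

# Pickle round trip; binary mode is required on Python 3.
pickle_path = os.path.join(workdir, "test.pck")
with open(pickle_path, "wb") as f:
    pickle.dump(12.3, f)
    pickle.dump([1, 2, 3], f)
with open(pickle_path, "rb") as f:
    first = pickle.load(f)
    second = pickle.load(f)
print(first, second)
```

Objects come back from pickle.load in the order they were dumped, with their original types.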
- About Python
  - Python links
    - http://www.python.org
    - wxPython
    - Boa
    - Eclipse
  - Python version
    - 2.4.3
- About this document
  - Authors
    - J
      - Jiang Xin
    - Waiting for you to join...
  - References
    - "A Byte of Python", by Swaroop C H
    - "How to Think Like a Computer Scientist: Learning with Python"
- Object orientation: programming with classes
  - Even strings, variables, and functions are objects
  - Concepts
    - class and object
      - A class is a new type created with the class keyword
      - An object is an instance of that class
    - fields and methods
      - Variables defined in a class are called fields
      - Functions defined in a class are called methods
    - Two kinds of fields
      - instance variables
        Belong to each object instance of the class
      - class variables
        Belong to the class itself
  - Differences between methods and functions
    - The first parameter of a method is special
      - It must be declared in the method definition, but must not be supplied in the call
      - It refers to the object itself and is conventionally named self
      - Python supplies it automatically at call time
    - For example, for an instance MyObject of MyClass, the call MyObject.method(arg1, arg2) is automatically turned by Python into MyClass.method(MyObject, arg1, arg2).
  - Class variables and object variables
    - Suppose the variables var1 and var2 are defined in class ClassName
    - If var1 is accessed as ClassName.var1, it is a class variable, shared by all instances of the class
    - If var2 is accessed as self.var2, it is an object variable, isolated from other objects
    - Example
      - Class Person: each new person increments the class variable population by one
      - Code
        #!/usr/bin/python
        # Filename: objvar.py

        class Person:
            '''Represents a person.'''
            population = 0

            def __init__(self, name):
                '''Initializes the person.'''
                self.name = name
                print '(Initializing %s)' % self.name

                # When this person is created,
                # he/she adds to the population
                Person.population += 1

            def sayHi(self):
                '''Greets the other person.

                Really, that's all it does.'''
                print 'Hi, my name is %s.' % self.name

            def howMany(self):
                '''Prints the current population.'''
                # There will always be atleast one person
                if Person.population == 1:
                    print 'I am the only person here.'
                else:
                    print 'We have %s persons here.' % \
                          Person.population

        swaroop = Person('Swaroop')
        swaroop.sayHi()
        swaroop.howMany()

        kalam = Person('Abdul Kalam')
        kalam.sayHi()
        kalam.howMany()

        swaroop.sayHi()
        swaroop.howMany()
  - Constructors and destructors
    - __init__ constructor
      - Runs automatically when the object is created.
    - __del__ destructor
      - Runs automatically when the object is destroyed, for example after del removes the last reference to it.
  - Other special methods of classes
    - Most of them relate to operator overloading
    - __lt__(self, other)
      - Overloads <
    - __getitem__(...) x.__getitem__(y) <==> x[y]
      - Overloads [key]
    - __len__(self)
      - Overloads the len() function
    - __str__(self)
      - The text shown by print object
    - __iter__(self)
      - Supports iteration: returns an object that has a next() method. If the class itself defines next(), __iter__ can simply return self
    - __getattribute__(...) x.__getattribute__('name') <==> x.name
  - Class inheritance
    - Syntax: put the base class in parentheses in the subclass declaration
    - Example
      - # Filename: inheritance.py

        class SchoolMember:
            '''Represents any school member.'''
            def __init__(self, name, age):
                self.name = name
                self.age = age
                print '(Initialized SchoolMember: %s)' % self.name

            def tell(self):
                print 'Name:"%s" Age:"%s" ' % (self.name, self.age),

        class Teacher(SchoolMember):
            '''Represents a teacher.'''
            def __init__(self, name, age, salary):
                SchoolMember.__init__(self, name, age)
                self.salary = salary
                print '(Initialized Teacher: %s)' % self.name

            def tell(self):
                SchoolMember.tell(self)
                print 'Salary:"%d"' % self.salary

        class Student(SchoolMember):
            '''Represents a student.'''
            def __init__(self, name, age, marks):
                SchoolMember.__init__(self, name, age)
                self.marks = marks
                print '(Initialized Student: %s)' % self.name

            def tell(self):
                SchoolMember.tell(self)
                print 'Marks:"%d"' % self.marks

        t = Teacher('Mrs. Abraham', 40, 30000)
        s = Student('Swaroop', 21, 75)

        print # prints a blank line

        members = [t, s]
        for member in members:
            member.tell() # Works for instances of Student as well as Teacher
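The special methods listed in this section can be seen working together in a small class. This is a sketch in Python 3 syntax (an assumption; the notes use Python 2.4, where an iterator's method is next() rather than __next__()). The class Deck is hypothetical and exists only for illustration.

```python
class Deck(object):
    """A hypothetical container used only to illustrate special methods."""
    created = 0                      # class variable, shared by all instances

    def __init__(self, cards):
        self.cards = list(cards)     # instance variable
        Deck.created += 1

    def __len__(self):               # len(deck)
        return len(self.cards)

    def __getitem__(self, index):    # deck[index]
        return self.cards[index]

    def __lt__(self, other):         # deck < other
        return len(self) < len(other)

    def __str__(self):               # print(deck)
        return "Deck(%s)" % ", ".join(self.cards)

    def __iter__(self):              # for card in deck
        return iter(self.cards)

small = Deck(["ace"])
large = Deck(["ace", "king", "queen"])
print(len(large), large[1], small < large)
print(str(large))
print([card.upper() for card in large])
print(Deck.created)
```

Here __iter__ delegates to the list's own iterator instead of defining a next method by hand, which is usually the simplest choice.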
- Exception handling
  - Try..Except
    - In the Python interpreter, enter s = raw_input('Enter something --> ') and press Ctrl-D or Ctrl-C to see what is displayed.
    - Use Try..Except to catch the abnormal input. Example
      - #!/usr/bin/python
        # Filename: try_except.py

        import sys

        try:
            s = raw_input('Enter something --> ')
        except EOFError:
            print '\nWhy did you do an EOF on me?'
            sys.exit() # Exit the program
        except:
            print '\nSome error/exception occurred.'
            # Here, we are not exiting the program

        print 'Done'
  - Try..Finally
    - finally: marks a block that is executed no matter what
  - Raising Exceptions
    - To define your own exception, create a subclass of Exception
    - Example of a user-defined exception class, ShortInputException
      - #!/usr/bin/python
        # Filename: raising.py

        class ShortInputException(Exception):
            '''A user-defined exception class.'''
            def __init__(self, length, atleast):
                self.length = length
                self.atleast = atleast
    - Raising and catching the exception
      - try:
            s = raw_input('Enter something --> ')
            if len(s) < 3:
                raise ShortInputException(len(s), 3)
            # Other work can go as usual here.
        except EOFError:
            print '\nWhy did you do an EOF on me?'
        except ShortInputException, x:
            print '\nThe input was of length %d, it should be at least %d'\
                  % (x.length, x.atleast)
        else:
            print 'No exception was raised.'
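The notes give no example for finally, so here is a minimal sketch that combines it with the user-defined exception. It uses Python 3 syntax (an assumption): "except X as error" replaces the Python 2.4 form "except X, x", and combining except, else, and finally in one try statement requires Python 2.5 or later. The function check and the list events are hypothetical.

```python
class ShortInputException(Exception):
    """User-defined exception, mirroring the class in the notes."""
    def __init__(self, length, atleast):
        Exception.__init__(self)
        self.length = length
        self.atleast = atleast

events = []

def check(text):
    """Return a status string and record that the finally block ran."""
    try:
        if len(text) < 3:
            raise ShortInputException(len(text), 3)
    except ShortInputException as error:
        status = "too short: %d < %d" % (error.length, error.atleast)
    else:
        # Runs only when no exception was raised.
        status = "ok"
    finally:
        # Runs whether or not an exception was raised.
        events.append("finally for %r" % text)
    return status

print(check("ab"))
print(check("abcd"))
print(events)
```

The finally block is the natural place for cleanup such as closing files, since it runs on both the normal and the error path.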
- Modules and packages
  - Examples
    - a.py
      - # -*- python -*-
        version = "0.1.a"
        (The examples below also read a.author; its definition is missing from this listing.)
    - b.py uses a.py as a module
      - a.py and b.py are in the same directory
      - Plain import
        Variables and functions defined in a.py are referenced through the namespace of module a

        import a
        print "version:%s, author:%s" % (a.version, a.author)
      - The from module import syntax
        Variables and functions defined in a.py behave as if they were defined in b.py

        from a import *
        print "version:%s, author:%s" % (version, author)

        from a import author # import only one variable from module a
        print "author:", author
      - Copy a.py into the directory dir_a
      - Modify sys.path to include dir_a
        import sys
        sys.path.insert(0, "dir_a")
        import a
        print "author:", a.author

        import sys
        sys.path.insert(0, "dir_a")
        from a import *
        print "version:%s, author:%s" % (version, author)
      - Treat dir_a as a package
        See: python.org > Doc > Essays > Packages

        Create the file __init__.py in the dir_a directory (an empty file is enough)

        from dir_a import a # import module a from package dir_a
        print "author:", a.author

        # b.py
        from dir_a.a import *
        print "version:%s, author:%s" % (version, author)
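  - Notes
    - Module files are *.py files
    - Module files are found in the directories listed in PYTHONPATH; print sys.path shows the search path
      - import sys
        print sys.path
    - After a module has been imported once, it is compiled to a *.pyc binary file for efficiency
  - The import statement
    - Syntax 1: "import" module ["as" name] ( "," module ["as" name] )*
    - Syntax 2: "from" module "import" identifier ["as" name] ( "," identifier ["as" name] )*
  - The __name__ variable
    - Every module has a name, which statements inside the module can obtain from the __name__ attribute.
    - When the module is run directly, __name__ is set to __main__
    - For example, the following statements in a module
      - #!/usr/bin/python
        # Filename: using_name.py

        if __name__ == '__main__':
            print 'This program is being run by itself'
        else:
            print 'I am being imported from another module'
  - __dict__
    - Modules, classes, and class instances all have __dict__ attributes that hold the namespace contents for that object.
  - The dir() function
    - Lists the names defined in a module
  - About packages
    - Packages organize modules more effectively.
    - The __init__.py file marks a directory as a Python package rather than an ordinary directory
      - __init__.py may be empty
      - __init__.py may define the __all__ variable
    - A package is a directory that contains *.py module files together with an __init__.py file
    - One problem: because file names on Mac, Windows, etc. are case-insensitive, it is hard to map file names to module names when from package import * is used
    - The __all__ variable is one solution
      - For the example above, defining __all__ = ["a"] in __init__.py means that from dir_a import * imports the modules listed in __all__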
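The package layout described above can be tried out without touching an existing project by building a throwaway package in a temporary directory. This is a sketch in Python 3 syntax (an assumption; the notes use Python 2.4). The package name notes_demo_pkg and the attribute module_name are made up for the example and stand in for dir_a and a.py.

```python
import os
import sys
import tempfile

# Build a throwaway package on disk (all names are hypothetical).
root = tempfile.mkdtemp()
package_dir = os.path.join(root, "notes_demo_pkg")
os.mkdir(package_dir)

# __init__.py marks the directory as a package; __all__ lists its modules.
with open(os.path.join(package_dir, "__init__.py"), "w") as f:
    f.write('__all__ = ["a"]\n')
with open(os.path.join(package_dir, "a.py"), "w") as f:
    f.write('version = "0.1.a"\n')
    f.write('module_name = __name__\n')

sys.path.insert(0, root)             # make the package importable

from notes_demo_pkg import a
print(a.version, a.module_name)

# "from package import *" imports the modules listed in __all__.
namespace = {}
exec("from notes_demo_pkg import *", namespace)
print("a" in namespace)
```

Inside an imported module, __name__ holds the dotted module name rather than __main__, which is what module_name records here.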
- sys, os: the Python core libraries
- Python libraries
  - sys
    - System information: sys.platform, sys.version_info, sys.maxint
      - >>> import sys
        >>> sys.version
        '2.4.1 (#1, May 27 2005, 18:02:40) \n[GCC 3.3.3 (cygwin special)]'
        >>> sys.version_info
        (2, 4, 1, 'final', 0)
        >>> sys.platform, sys.maxint
        ('linux2', 9223372036854775807)
    - Module search path: sys.path
      - Show the search path: sys.path
      - Add a directory to the module search path: sys.path.append( '/home/user')
    - Exception information: sys.exc_type
      - >>> try:
        ...     raise IndexError
        ... except:
        ...     print sys.exc_info()
      - try:
            raise TypeError, "Bad Thing"
        except:
            print sys.exc_info()
            print sys.exc_type, sys.exc_value
    - Command-line arguments: sys.argv
      - Number of arguments: len(sys.argv), which includes the program name itself
      - sys.argv[0] is the program name, sys.argv[1] is the first argument, and so on
      - Example 1
        def main(arg1, arg2):
            """main entry point"""
            ... ...

        if __name__ == '__main__':
            if len(sys.argv) < 3:
                sys.stderr.write("Usage: %s ARG1 ARG2\n" % (sys.argv[0]))
            else:
                main(sys.argv[1], sys.argv[2])
      - Example 2
        #!/usr/bin/python
        # Filename : using_sys.py

        import sys

        print 'The command line arguments used are:'
        for i in sys.argv:
            print i

        print '\n\nThe PYTHONPATH is', sys.path, '\n'
    - Exiting: sys.exit
    - Standard input, output, and error: sys.stdin, sys.stdout, sys.stderr
  - os
    - Separators: os.sep, os.pathsep, os.linesep
    - Get the process ID: os.getpid()
    - Get the current directory: os.getcwd()
    - Change directory: os.chdir(r'c:\temp')
    - Split a path into directory and file name: os.path.split(), os.path.dirname()
      - >>> os.path.split('/home/swaroop/poem.txt')
        ('/home/swaroop', 'poem.txt')
      - os.path.dirname('/etc/init.d/apachectl')
      - os.path.basename('/etc/init.d/apachectl')
    - Test whether a path is a file or a directory: os.path.isdir(r'c:\temp'), os.path.isfile(r'c:\temp'), which return a true or false value (1 or 0)
    - Test whether a file or directory exists: os.path.exists('/etc/passwd')
    - Run a system command: os.system('ls -l /etc')
    - Run a system command and open a pipe to it: os.popen(command [, mode='r' [, bufsize]])
      - os.popen('ls /etc').read()
      - os.popen('ls /etc').readlines()
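The os.path helpers above can be checked against a temporary directory, as in this sketch. It uses Python 3 syntax (an assumption; the notes use Python 2.4). The file name sample.txt is made up, and os.listdir is an extra call that the notes do not mention.

```python
import os
import tempfile

# Pure string manipulation: the paths do not need to exist.
directory, filename = os.path.split("/home/swaroop/poem.txt")
print(directory, filename)
print(os.path.dirname("/etc/init.d/apachectl"))
print(os.path.basename("/etc/init.d/apachectl"))

# Checks against the real file system, using a temporary directory.
workdir = tempfile.mkdtemp()
sample = os.path.join(workdir, "sample.txt")
with open(sample, "w") as f:
    f.write("hello\n")

print(os.path.isdir(workdir), os.path.isfile(workdir))
print(os.path.exists(sample), os.path.isfile(sample))

previous = os.getcwd()
os.chdir(workdir)
print(os.listdir("."))      # os.listdir is not mentioned in the notes
os.chdir(previous)          # restore the original working directory
```

os.path.join is used instead of concatenating with os.sep by hand, so the code stays portable across platforms.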
  - string (string handling)
    - Help: help('string')
    - Example
      - import string
        fruit = "banana"
        index = string.find(fruit, "a")
        print index
  - math (mathematical functions)
    - For example
      - import math
        x = math.cos(angle + math.pi/2)
        x = math.exp(math.log(10.0))
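  - re
    - Help
      - Regular expressions. Reference: http://docs.python.org/lib/module-re.html
      - >>> help('sre')
    - Regular expression syntax
      - ^, $ match the start and the end of the string. In re.MULTILINE mode they also match the start and the end of each line
      - [ ] character set; a leading ^ inside it means "not"
      - *, +, ?, {m,n} : quantifiers (greedy by default, matching as much as possible)
        For example, matching "<.*>" against the string '<H1>title</H1>' matches the whole string rather than '<H1>'

        >>> re.match('<.*>', '<H1>title</H1>').group()
        '<H1>title</H1>'
      - *?, +?, ?? : non-greedy quantifiers
        For example, matching "<.*?>" against the string '<H1>title</H1>' matches only '<H1>'

        >>> re.match('<.*?>', '<H1>title</H1>').group()
        '<H1>'
      - {m,n}? : likewise matches as little as possible (non-greedy)
        >>> re.match('<.{,20}>', '<H1>title</H1>').group()
        '<H1>title</H1>'

        >>> re.match('<.{,20}?>', '<H1>title</H1>').group()
        '<H1>'
      - [(] [)]
        ( and ) group patterns; to match literal parentheses, use \( and \), or [(] and [)]
      - ( ) : groups an expression so that it can be referred to later
      - (?iLmsux)
        (? followed by any of the letters iLmsux sets the corresponding flags re.I, re.L, re.M, re.S, re.U, re.X

        See the re options

        >>> re.search('(?i)(T[A-Z]*)','<h1>title</h1>').groups()
        ('title',)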
      - (?P<name>pattern): refer to a group by name
        >>> m = re.match('(?P<p>.*?)(?::\s*)(?P<msg>.*)', 'prompt: enter your name')
        >>> m.group('p')
        'prompt'
        >>> m.group('msg')
        'enter your name'
        >>> m.group(0)
        'prompt: enter your name'
        >>> m.group(1)
        'prompt'
        >>> m.group(2)
        'enter your name'
        
        Refer to the group in the replacement with r'\1':
        >>> re.sub('id:\s*(?P<id>\d+)', 'N:\\1', 'userlist\nid:001,user001:jiangxin\nid:002,user003:tom\nid:003,user003:jerry\n')
        'userlist\nN:001,user001:jiangxin\nN:002,user003:tom\nN:003,user003:jerry\n'
        >>> re.sub('id:\s*(?P<id>\d+)', r'N:\1', 'userlist\nid:001,user001:jiangxin\nid:002,user003:tom\nid:003,user003:jerry\n')
        'userlist\nN:001,user001:jiangxin\nN:002,user003:tom\nN:003,user003:jerry\n'
        
        Refer to the group in the replacement with r'\g<name>':
        >>> re.sub('id:\s*(?P<id>\d+)', r'N:\g<id>', 'userlist\nid:001,user001:jiangxin\nid:002,user003:tom\nid:003,user003:jerry\n')
        'userlist\nN:001,user001:jiangxin\nN:002,user003:tom\nN:003,user003:jerry\n'
      - (?P=name): matches the same text as the earlier group called name
        >>> re.findall('id:\s*(?P<id>\d+)', 'userlist\nid:001,user001:jiangxin\nid:002,user003:tom\nid:003,user003:jerry\n')
        ['001', '002', '003']
        
        >>> re.findall('id:\s*(?P<id>\d+),\s*user(?P=id):', 'userlist\nid:001,user001:jiangxin\nid:002,user003:tom\nid:003,user003:jerry\n')
        ['001', '003']
      - (?#...): a comment; its contents are ignored
      - (?:pattern)
        Groups a sub-expression without creating a numbered group
        
        Compare the following two examples:
        >>> re.match('(.*?:\s*)(.*)', 'prompt: enter your name').group(1)
        'prompt: '
        >>> re.match('(?:.*?:\s*)(.*)', 'prompt: enter your name').group(1)
        'enter your name'
      - (?=pattern): positive lookahead assertion
        Matches if pattern matches next, but doesn't consume any of the string.
        
        For example (in Perl):
        change only the foo in foobar, not the foo in words such as fool or foolish
        
        $line="foobar\nfool";
        ## foo is followed by bar, and bar itself is not part of what gets replaced.
        $line =~ s/foo(?=bar)/something/gm;
        print "$line\n";
        
        Output:
        somethingbar
        fool
      - (?!pattern): negative lookahead assertion
        The opposite of (?=pattern). Matches if ... doesn't match next. This is a negative lookahead assertion.
        
        For example (in Perl): change the foo in every word except foobar, such as the foo in fool or foolish.
        $line="foobar\nfool";
        ## foo is not followed by bar, and the contents of (?!..) are not part of what gets replaced.
        $line =~ s/foo(?!bar)/something/gm;
        print "$line\n";
        
        Output:
        foobar
        somethingl
      - (?<=pattern): positive lookbehind assertion
        Matches if the current position in the string is preceded by a match for ... that ends at the current position.
        
        For example (in Perl):
        $line="foobar\nbarfoo\nbar foo\na fool";
        ## replace the foo that follows bar; (bar) itself is not replaced.
        $line =~ s/(?<=bar)foo/something/gm;
        print "$line\n";
        
        Output:
        foobar
        barsomething
        bar foo
        a fool
      - (?<!pattern): negative lookbehind assertion
        Matches if the current position in the string is not preceded by a match for .... This is called a negative lookbehind assertion.
        
        For example (in Perl):
        $line="foobar\nbarfoo\nbar foo\na fool";
        ## replace foo, unless it is preceded by bar.
        $line =~ s/(?<!bar)foo/something/gm;
        print "$line\n";
        
        Output:
        somethingbar
        barfoo
        bar something
        a somethingl
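      - The lookaround examples above are written in Perl. A rough Python equivalent is sketched below, assuming Python 3 (the rest of these notes use Python 2 syntax); the variable names are illustrative only.

```python
import re

text = "foobar\nbarfoo\nbar foo\na fool"

# (?=bar): replace foo only when bar follows; bar is not consumed.
lookahead = re.sub(r"foo(?=bar)", "something", text)

# (?!bar): replace foo only when bar does not follow.
negative_lookahead = re.sub(r"foo(?!bar)", "something", text)

# (?<=bar): replace foo only when bar comes immediately before it.
lookbehind = re.sub(r"(?<=bar)foo", "something", text)

# (?<!bar): replace foo only when bar does not come immediately before it.
negative_lookbehind = re.sub(r"(?<!bar)foo", "something", text)

for label, result in [
    ("(?=bar)", lookahead),
    ("(?!bar)", negative_lookahead),
    ("(?<=bar)", lookbehind),
    ("(?<!bar)", negative_lookbehind),
]:
    print(label, repr(result))
```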
    - Regular expression special sequences
      - \A Matches only at the start of the string.
      - \b Matches the empty string, but only at the beginning or end of a word
      - \B Matches the empty string, but only when it is not at the beginning or end of a word.
      - \d When the UNICODE flag is not specified, matches any decimal digit; this is equivalent to the set [0-9]. With UNICODE, it will match whatever is classified as a digit in the Unicode character properties database.
      - \D When the UNICODE flag is not specified, matches any non-digit character; this is equivalent to the set [^0-9]. With UNICODE, it will match anything other than characters marked as digits in the Unicode character properties database.
      - \s When the LOCALE and UNICODE flags are not specified, matches any whitespace character; this is equivalent to the set [ \t\n\r\f\v]. With LOCALE, it will match this set plus whatever characters are defined as space for the current locale. If UNICODE is set, this will match the characters [ \t\n\r\f\v] plus whatever is classified as space in the Unicode character properties database.
      - \S When the LOCALE and UNICODE flags are not specified, matches any non-whitespace character; this is equivalent to the set [^ \t\n\r\f\v]. With LOCALE, it will match any character not in this set, and not defined as space in the current locale. If UNICODE is set, this will match anything other than [ \t\n\r\f\v] and characters marked as space in the Unicode character properties database.
      - \w When the LOCALE and UNICODE flags are not specified, matches any alphanumeric character and the underscore; this is equivalent to the set [a-zA-Z0-9_]. With LOCALE, it will match the set [0-9_] plus whatever characters are defined as alphanumeric for the current locale. If UNICODE is set, this will match the characters [0-9_] plus whatever is classified as alphanumeric in the Unicode character properties database.
      - \W When the LOCALE and UNICODE flags are not specified, matches any non-alphanumeric character; this is equivalent to the set [^a-zA-Z0-9_]. With LOCALE, it will match any character not in the set [0-9_], and not defined as alphanumeric for the current locale. If UNICODE is set, this will match anything other than [0-9_] and characters marked as alphanumeric in the Unicode character properties database.
      - \Z Matches only at the end of the string.
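      - A small sketch of how some of these sequences behave, assuming Python 3; the sample strings are made up for illustration.

```python
import re

text = "cat concat cats\ncat"

# \b anchors at word boundaries, so only the standalone word matches.
whole_words = re.findall(r"\bcat\b", text)

# \B requires a non-boundary: here, "cat" preceded by another word character.
inside_words = re.findall(r"\Bcat", text)

# \A and \Z anchor at the very start and end of the string, even with re.MULTILINE,
# whereas ^ then also matches at the start of each line.
string_start = re.findall(r"\Acat", text, re.MULTILINE)
line_starts = re.findall(r"^cat", text, re.MULTILINE)
string_end = re.findall(r"cat\Z", text, re.MULTILINE)

# \d matches digits; \w matches letters, digits and the underscore.
numbers = re.findall(r"\d+", "id:001, id:42")
words = re.findall(r"\w+", "snake_case, x2!")

print(whole_words, inside_words, string_start, line_starts, string_end)
print(numbers, words)
```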
    - re options
      - re.I, re.IGNORECASE: ignore case
      - re.L, re.LOCALE: \w, \W, \b, \B, \s and \S depend on the current locale
      - re.M, re.MULTILINE: treat the string as multiple lines, so ^ and $ also match right after and right before each newline in the string. By default they match only at the start and end of the string
      - re.S, re.DOTALL: . matches any character, including a newline. By default it matches any character except a newline
      - re.U, re.UNICODE: \w, \W, \b, \B, \d, \D, \s and \S follow the Unicode character properties
        >>> re.compile(ur'----(-)*\r?\n.*\b(网页类)\b',re.U).search("--------\r\nCategoryX 网页类 CategoryY".decode('utf-8')).groups()
        (u'-', u'\u7f51\u9875\u7c7b')
        
        >>> re.compile(ur'----(-)*\r?\n.*\b(网页类)\b',re.U).search(u"--------\r\nCategoryX 网页类 CategoryY").groups()
        (u'-', u'\u7f51\u9875\u7c7b')
      - re.X, re.VERBOSE: allows # comments in the pattern to make it more readable
        Whitespace in the pattern is ignored, and # starts a comment
        
        For example:
        page_invalid_chars_regex = re.compile(
            ur"""
            \u0000 | # NULL
        
            # Bidi control characters
            \u202A | # LRE
            \u202B | # RLE
            \u202C | # PDF
            \u202D | # LRO
            \u202E   # RLO
            """,
            re.UNICODE | re.VERBOSE
        )
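      - The effect of the main flags can be compared side by side. This is only a sketch, assuming Python 3; the sample text and variable names are made up.

```python
import re

text = "Title: first\ntitle: second"

# re.I: case-insensitive matching (the inline form (?i) does the same).
ignore_case = re.findall(r"title", text, re.I)
inline_flag = re.findall(r"(?i)title", text)

# re.M: ^ also matches at the start of each line.
without_multiline = re.findall(r"^title", text, re.I)
with_multiline = re.findall(r"^title", text, re.I | re.M)

# re.S: . also matches the newline.
dot_default = re.match(r".*", text).group()
dot_all = re.match(r".*", text, re.S).group()

# re.X: whitespace in the pattern is ignored and # starts a comment.
key_value = re.compile(r"""
    (\w+)   # key
    :\s*    # separator
    (\w+)   # value
""", re.X)
first_pair = key_value.match(text).groups()

print(ignore_case, without_multiline, with_multiline, first_pair)
```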
    - Note the difference between match and search
      - re.match( pattern, string[, flags]) matches only at the beginning of the string, roughly as if a '^' were prepended to the pattern!
        >>> p.match("")
        >>> print p.match("")
        None
        
        p = re.compile( ... )
        m = p.match( 'string goes here' )
        if m:
            print 'Match found: ', m.group()
        else:
            print 'No match'
      - re.search( pattern, string[, flags]) scans the whole string for a match
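      - A minimal sketch of the difference, assuming Python 3. It also shows that match stays anchored to the start of the string even with re.MULTILINE, while search with ^ can find a later line.

```python
import re

digits = re.compile(r"\d+")

# match only tries at the beginning of the string.
match_at_start = digits.match("123 abc")
match_later = digits.match("abc 123")

# search scans forward until it finds a match.
search_later = digits.search("abc 123")

# Even in MULTILINE mode, match does not try the start of later lines.
multiline_match = re.match(r"^\d+", "abc\n123", re.M)
multiline_search = re.search(r"^\d+", "abc\n123", re.M)

print(match_at_start.group(), match_later, search_later.group())
print(multiline_match, multiline_search.group())
```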
    - re.compile( pattern[, flags])
      - Using re.compile is more efficient for an expression that is used repeatedly
      - prog = re.compile(pat)
        result = prog.match(str)
        is equivalent to
        result = re.match(pat, str)
    - re.split( pattern, string[, maxsplit = 0]): split a string
      - >>> re.split('\W+', 'Words, words, words.')
        ['Words', 'words', 'words', '']
        >>> re.split('(\W+)', 'Words, words, words.')
        ['Words', ', ', 'words', ', ', 'words', '.', '']
        >>> re.split('\W+', 'Words, words, words.', 1)
        ['Words', 'words, words.']
    - re.findall( pattern, string[, flags])
      - Finds all matches and returns a list
      - >>> p = re.compile('\d+')
        >>> p.findall('12 drummers drumming, 11 pipers piping, 10 lords a-leaping')
        ['12', '11', '10']
    - re.finditer( pattern, string[, flags])
      - Finds all matches and returns an iterator
      - >>> p = re.compile('\d+')
        >>> iterator = p.finditer('12 drummers drumming, 11 ... 10 ...')
        >>> iterator
        <callable-iterator object at 0x401833ac>
        >>> for match in iterator:
        ...     print match.span()
        ...
        (0, 2)
        (22, 24)
        (29, 31)
    - re.sub(pattern, repl, string[, count])
      - >>> re.sub('id:\s*(?P<id>\d+)', 'N:\\1', 'userlist\nid:001,user001:jiangxin\nid:002,user003:tom\nid:003,user003:jerry\n')
        'userlist\nN:001,user001:jiangxin\nN:002,user003:tom\nN:003,user003:jerry\n'
        >>> re.sub('id:\s*(?P<id>\d+)', r'N:\1', 'userlist\nid:001,user001:jiangxin\nid:002,user003:tom\nid:003,user003:jerry\n')
        'userlist\nN:001,user001:jiangxin\nN:002,user003:tom\nN:003,user003:jerry\n'
      - >>> re.sub('id:\s*(?P<id>\d+)', r'N:\g<id>', 'userlist\nid:001,user001:jiangxin\nid:002,user003:tom\nid:003,user003:jerry\n')
        'userlist\nN:001,user001:jiangxin\nN:002,user003:tom\nN:003,user003:jerry\n'
    - re.subn( pattern, repl, string[, count]): like re.sub, but with a different return value
      - Returns a tuple (new_string, number_of_subs_made)
    - re.escape(string): escapes the special characters in a string so that they are matched literally in a regular expression
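    - re.subn and re.escape have no example above, so here is a short sketch, assuming Python 3. The sample strings are invented; the exact escaped form produced by re.escape varies between Python versions, so only its effect is shown.

```python
import re

# subn returns the new string together with the number of substitutions made.
new_text, count = re.subn(r"\d+", "#", "a1 b22 c333")
limited_text, limited_count = re.subn(r"\d+", "#", "a1 b22 c333", count=2)

# Without escaping, "+", "(", "?" and ")" are treated as regex syntax.
literal = "1+1=2 (really?)"
haystack = "note: 1+1=2 (really?) yes"
unescaped_hit = re.search(literal, haystack)

# re.escape makes every special character match itself.
escaped_hit = re.search(re.escape(literal), haystack)

print(new_text, count)
print(unescaped_hit, escaped_hit.group())
```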
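    - Compiled pattern objects
      - Each method of the compiled object returned by re.compile has a matching re module function; only the parameters differ
      - The re module functions take a flags parameter. A compiled object received its flags when it was created, so its methods take pos and endpos (the start and end positions of the search) in place of flags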
      - match( string[, pos[, endpos]])
      - search( string[, pos[, endpos]])
      - split( string[, maxsplit = 0])
      - findall( string[, pos[, endpos]])
      - finditer( string[, pos[, endpos]])
      - sub( repl, string[, count = 0])
      - subn( repl, string[, count = 0])
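      - A sketch of how pos and endpos restrict the region that is searched, assuming Python 3; the sample sentence is shortened from the findall example and the names are illustrative.

```python
import re

digits = re.compile(r"\d+")
text = "12 drummers, 11 pipers, 10 lords"

everything = digits.findall(text)

# pos=3 skips the leading "12 ".
from_pos = digits.findall(text, 3)

# endpos=15 stops the search before index 15, so "10" is out of range.
window = digits.findall(text, 3, 15)

# match with pos tries to match exactly at that index.
at_13 = digits.match(text, 13)
at_3 = digits.match(text, 3)

# Positions reported by the match are still relative to the whole string.
first_after_3 = digits.search(text, 3).start()

print(everything, from_pos, window, at_13.group(), at_3, first_after_3)
```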
    - match objects
      - expand( template)
        Expands template using the results of the match
        
        Supports "\1", "\2", "\g<1>", "\g<name>"
      - group( [group1, ...])
        Example: m = re.match(r"(?P<int>\d+)\.(\d*)", '3.14')
        Result: m.group(1) is '3', as is m.group('int'), and m.group(2) is '14'.
        
        >>> p = re.compile('(a(b)c)d')
        >>> m = p.match('abcd')
        >>> m.group(0)
        'abcd'
        >>> m.group(1)
        'abc'
        >>> m.group(2)
        'b'
        >>> m.groups()
        ('abc', 'b')
      - groups( [default])
        Returns a tuple containing all groups, starting from group 1
      - groupdict( [default])
        Returns a dictionary containing all named groups
      - start( [group]) and end( [group])
        The start and end positions in the string of the text matched by group
      - span( [group])
        Returns the 2-tuple (start, end)
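      - The match object methods listed above can be tried together. A sketch assuming Python 3; the pattern extends the '3.14' example with a hypothetical optional third group to show the default argument.

```python
import re

# Two named groups plus an optional group that does not take part in the match.
m = re.match(r"(?P<int>\d+)\.(?P<frac>\d*)(x)?", "3.14")

whole = m.group(0)
pair = m.group("int", "frac")          # several groups at once give a tuple
all_groups = m.groups()                # unmatched groups are None by default
with_default = m.groups("")            # or the default that is passed in
named = m.groupdict()                  # only the named groups
frac_span = m.span("frac")             # same as (m.start("frac"), m.end("frac"))
unmatched_span = m.span(3)             # (-1, -1) for a group that did not match
swapped = m.expand(r"\g<frac>.\1")     # template with a named and a numbered reference

print(whole, pair, all_groups, with_default)
print(named, frac_span, unmatched_span, swapped)
```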
  - getopt (command-line processing)
    - getopt.getopt( args, options[, long_options])
      - args is the argument list without the program name, i.e. sys.argv[1:]
      - options is the string of supported short options. An option that takes a value is followed by a colon ":". See Unix getopt()
      - long_options is the list of supported long options. An option that takes a value is followed by an equals sign "="
      - Return value: two elements
        First: a list of (option, value) pairs
        
        Second: the list of remaining arguments
    - Exception: GetoptError, also available as error
    - Example:
      - >>> import getopt
        >>> args = '-a -b -cfoo -d bar a1 a2'.split()
        >>> args
        ['-a', '-b', '-cfoo', '-d', 'bar', 'a1', 'a2']
        >>> optlist, args = getopt.getopt(args, 'abc:d:')
        >>> optlist
        [('-a', ''), ('-b', ''), ('-c', 'foo'), ('-d', 'bar')]
        >>> args
        ['a1', 'a2']
      - """Module docstring.
        
        This serves as a long usage message.
        """
        import sys
        import getopt
        
        def main():
            # parse command line options
            try:
                opts, args = getopt.getopt(sys.argv[1:], "hp:", ["help", "port="])
            except getopt.error, msg:
                print msg
                print "for help use --help"
                sys.exit(2)
            # process options
            for o, a in opts:
                if o in ("-h", "--help"):
                    print __doc__
                    sys.exit(0)
                elif o in ("-p", "--port"):
                    # option values are strings, so format with %s rather than %d
                    print "port is %s" % a
            # process arguments
            for arg in args:
                process(arg) # process() is defined elsewhere
        
        if __name__ == "__main__":
            main()
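      - The same option handling can be sketched for Python 3 (an assumption; the examples above use Python 2 syntax). parse_args and the settings dictionary are hypothetical names chosen for this sketch.

```python
import getopt


def parse_args(argv):
    """Parse argv (without the program name) into (settings, remaining_args)."""
    settings = {"help": False, "port": None}
    # "hp:" means -h takes no value and -p takes one; "port=" does the same for --port.
    opts, rest = getopt.getopt(argv, "hp:", ["help", "port="])
    for opt, value in opts:
        if opt in ("-h", "--help"):
            settings["help"] = True
        elif opt in ("-p", "--port"):
            settings["port"] = int(value)  # option values always arrive as strings
    return settings, rest


print(parse_args(["-p", "8080", "a1", "a2"]))
print(parse_args(["--help", "--port=9"]))

try:
    parse_args(["-x"])
except getopt.GetoptError as err:
    # Unknown options raise GetoptError (also available as getopt.error).
    print("error:", err)
```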
  - Databases
    - See: http://mysql-python.sourceforge.net/MySQLdb.html
  - LDAP
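  - time (time functions)
    - time.time(): returns the Unix epoch time in seconds, as a float
    - time.clock(): seconds as a float: CPU time used by the process on Unix, wall-clock time since the first call on Windows
    - gmtime(): returns the UTC time as a tuple
    - localtime(): returns the local time as a tuple
    - asctime(): converts a time tuple to a string
    - ctime(): converts seconds to a string
    - mktime(): converts a local time tuple to epoch seconds
    - strftime(): formats a time tuple according to a format string
    - strptime(): parses a string into a time tuple according to a format string
    - tzset(): sets the time zone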
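    - A short sketch of how these functions fit together, assuming Python 3 (time.clock() is left out because it was removed in Python 3.8). The date string is arbitrary.

```python
import time

# Seconds -> UTC time tuple -> formatted string.
utc_tuple = time.gmtime(0)  # the Unix epoch itself
formatted = time.strftime("%Y-%m-%d %H:%M:%S", utc_tuple)

# String -> time tuple.
parsed = time.strptime("2001-02-03 04:05:06", "%Y-%m-%d %H:%M:%S")

# Seconds -> local time tuple -> seconds again.
now = int(time.time())
local_tuple = time.localtime(now)
round_trip = int(time.mktime(local_tuple))

print(formatted)
print(parsed.tm_year, parsed.tm_mon, parsed.tm_mday, parsed.tm_hour)
print(time.asctime(utc_tuple))  # time tuple -> fixed-format string
print(time.ctime(now))          # seconds -> fixed-format string in local time
```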
  - logging
    - logging levels
      - Level      Numeric value
        CRITICAL   50
        ERROR      40
        WARNING    30
        INFO       20
        DEBUG      10
        NOTSET     0
    - getLogger()
      - With no name it returns the root logger; getLogger(name) creates a new logger with that name
      - logging.basicConfig()
        logging.getLogger("").setLevel(logging.DEBUG)
        
        ERR = logging.getLogger("ERR")
        ERR.setLevel(logging.ERROR)
        
        # nextmessage() is not defined in these notes; it stands for any function that returns the text to log
        # These should log
        logging.log(logging.CRITICAL, nextmessage())
        logging.debug(nextmessage())
        ERR.log(logging.CRITICAL, nextmessage())
        ERR.error(nextmessage())
        
        # These should not log
        ERR.debug(nextmessage())
    - basicConfig sets the log level, format, and so on
      - logging.basicConfig(level=logging.DEBUG, format="%(levelname)s : %(asctime)-15s > %(message)s")
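    - The level filtering described above can be observed by logging into an in-memory stream. This is a sketch assuming Python 3; the handler setup replaces basicConfig so that the output can be inspected, and the message texts are invented.

```python
import io
import logging

# Send log records to a string buffer instead of stderr.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s : %(name)s > %(message)s"))

root = logging.getLogger()
root.addHandler(handler)
root.setLevel(logging.DEBUG)

err = logging.getLogger("ERR")
err.setLevel(logging.ERROR)

logging.debug("root debug")   # logged: the root logger accepts DEBUG and above
err.error("err error")        # logged: ERROR meets the ERR logger's threshold
err.debug("err debug")        # dropped: DEBUG is below ERROR on the ERR logger

lines = stream.getvalue().splitlines()
print(lines)
```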
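- Python in practice
  - Help framework
    - __doc__
      - '''PROGRAM INTRODUCTION
        
        Usage: %(PROGRAM)s [options]
        
        Options:
        
            -h|--help
                Print this message and exit.
        '''
    - The usage function
      - def usage(code, msg=''):
            if code:
                fd = sys.stderr
            else:
                fd = sys.stdout
            # _ is not defined in these notes; it is assumed to be a gettext-style translation function
            print >> fd, _(__doc__)
            if msg:
                print >> fd, msg
            sys.exit(code)
      - Note: code is the exit status; msg is an optional additional error message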
  - Command-line processing
    - Command-line framework
      - #!/usr/bin/python
        # -*- coding: utf-8 -*-
        
        import sys
        import getopt
        
        def main(argv=None):
            if argv is None:
                argv = sys.argv
            try:
                opts, args = getopt.getopt(
                    argv[1:], "hn:", ["help", "name="])
            except getopt.error, msg:
                return usage(1, msg)
        
            for opt, arg in opts:
                if opt in ('-h', '--help'):
                    return usage(0)
                #elif opt in ('--more_options'):
        
        if __name__ == "__main__":
            sys.exit(main())
    - Notes
      - The __name__ attribute is used to wrap the code, so that it runs only when the file is executed as a script
      - See: sys.argv
      - main takes a default argument so that it can be called with arguments from interactive mode
        def main(argv=None):
            if argv is None:
                argv = sys.argv
            # etc., replacing sys.argv with argv in the getopt() call.
      - To keep a sys.exit() call inside main from terminating the interactive session, main uses return statements instead of sys.exit
        if __name__ == "__main__":
            sys.exit(main())
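      - A Python 3 sketch of the same framework (an assumption; the framework above is Python 2). USAGE and the greeting are placeholders invented for the sketch. Because main returns its exit status instead of calling sys.exit, it can be called repeatedly from an interactive session.

```python
import getopt
import sys

USAGE = "Usage: prog [-h] [-n NAME]"  # placeholder usage text


def main(argv=None):
    """Return the exit status instead of exiting, so callers stay in control."""
    if argv is None:
        argv = sys.argv
    try:
        opts, args = getopt.getopt(argv[1:], "hn:", ["help", "name="])
    except getopt.GetoptError as err:
        print(err, file=sys.stderr)
        print(USAGE, file=sys.stderr)
        return 2
    name = "world"
    for opt, value in opts:
        if opt in ("-h", "--help"):
            print(USAGE)
            return 0
        if opt in ("-n", "--name"):
            name = value
    print("Hello, %s" % name)
    return 0


# In a script, the file would end with the usual guard:
#     if __name__ == "__main__":
#         sys.exit(main())
# Here main is called directly, as one would from interactive mode.
print(main(["prog", "--name", "Python"]))
```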
  - Reading and writing files
  - unicode
    - In Python, encoding and decoding are conversions between the two forms unicode and str. Encoding is unicode -> str; conversely, decoding is str -> unicode
    - Understanding unicode
      - # The current locale uses UTF-8, so string literals are UTF-8 encoded by default
        >>> '中文'
        '\xe4\xb8\xad\xe6\x96\x87'
        >>> isinstance('中文', unicode)
        False
        >>> isinstance('中文', str)
        True
      - # decode converts str to unicode
        >>> '中文'.decode('utf-8')
        u'\u4e2d\u6587'
        >>> isinstance('中文'.decode('utf-8'), unicode)
        True
        >>> isinstance('中文'.decode('utf-8'), str)
        False
      - # the u prefix defines a unicode string
        >>> u'中文'
        u'\u4e2d\u6587'
        >>> isinstance(u'中文', unicode)
        True
        >>> isinstance(u'中文', str)
        False
      - # encode converts unicode to str
        >>> u'中文'.encode('utf-8')
        '\xe4\xb8\xad\xe6\x96\x87'
        >>> isinstance(u'中文'.encode('utf-8'), unicode)
        False
        >>> isinstance(u'中文'.encode('utf-8'), str)
        True
      - >>> len(u'中文')
        2
        >>> len(u'中文'.encode('utf-8'))
        6
        >>> len(u'中文'.encode('utf-8').decode('utf-8'))
        2
    - Typical Unicode error 1
      - >>> "str1: %s, str2: %s" % ('中文', u'中文')
        Traceback (most recent call last):
          File "<stdin>", line 1, in ?
        UnicodeDecodeError: 'ascii' codec can't decode byte 0xe4 in position 6: ordinal not in range(128)
      - Solution: use the same type for both arguments
        >>> "str1: %s, str2: %s" % ('中文', '中文')
        'str1: \xe4\xb8\xad\xe6\x96\x87, str2: \xe4\xb8\xad\xe6\x96\x87'
        
        >>> "str1: %s, str2: %s" % (u'中文', u'中文')
        u'str1: \u4e2d\u6587, str2: \u4e2d\u6587'
    - Typical Unicode error 2
      - mystr = '中文'
        mystr.encode('gb18030')
        
        Error:
        Traceback (most recent call last):
          File "<stdin>", line 1, in ?
        UnicodeDecodeError: 'ascii' codec can't decode byte 0xe4 in position 0: ordinal not in range(128)
      - Analysis: mystr.encode('gb18030') re-encodes mystr as gb18030, that is, it performs a unicode -> str conversion. Because mystr is already a str, Python first implicitly decodes it to unicode and then encodes the result as gb18030. No codec was given for that implicit decoding, so Python uses the default encoding reported by sys.getdefaultencoding(). In most cases the default is ASCII, and decoding fails if mystr contains non-ASCII bytes. In the case above the default encoding is ascii, while mystr is encoded the same way as the source file, in utf-8, hence the error.
    - Setting the default string encoding with sys.setdefaultencoding
      - #! /usr/bin/env python
        # -*- coding: utf-8 -*-
        import sys
        reload(sys)  # Python 2.5 removes sys.setdefaultencoding after initialization, so sys must be reloaded
        sys.setdefaultencoding('utf-8')
        
        mystr = '中文'
        # The str is first decoded to unicode with the default encoding set above,
        # then encoded as gb18030
        mystr.encode('gb18030')
    - Explicitly converting str to unicode, then encoding
      - #! /usr/bin/env python
        # -*- coding: gb2312 -*-
        s = '中文'
        s.decode('gb2312').encode('big5')
      - #! /usr/bin/env python
        # -*- coding: utf-8 -*-
        s = '中文'
        # Even though the file is encoded as utf-8, the sys default encoding is still ascii,
        # so the codec for decoding must be given explicitly as utf-8
        print s.decode('utf-8')
        print s.decode('utf-8').encode('gb18030')
    - The unicode function
      - A Python built-in function. It converts a string from the 'charset' encoding to unicode
      - unicode (message, charset)
      - unicode('中文字符串', 'gbk')
    - encode converts unicode --> str
      - unicode('中文字符串', 'gbk').encode('gb18030')
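    - For comparison, a sketch of the same conversions under Python 3 (an assumption; everything above is Python 2). There the text type is str and the byte type is bytes, so Python 2's unicode corresponds to str and Python 2's str corresponds to bytes.

```python
text = "\u4e2d\u6587"  # the same two characters used in the notes above

utf8_bytes = text.encode("utf-8")          # text -> bytes (Python 2: unicode -> str)
round_trip = utf8_bytes.decode("utf-8")    # bytes -> text (Python 2: str -> unicode)

# Re-encoding always goes through text: decode with the source codec, then encode.
gb_bytes = utf8_bytes.decode("utf-8").encode("gb18030")

# Decoding with the wrong codec reproduces the error shown in the notes.
try:
    utf8_bytes.decode("ascii")
    wrong_codec_failed = False
except UnicodeDecodeError:
    wrong_codec_failed = True

# Python 3 never decodes implicitly; mixing text and bytes is a TypeError.
try:
    "str1: " + utf8_bytes
    mixing_failed = False
except TypeError:
    mixing_failed = True

print(len(text), len(utf8_bytes), len(gb_bytes))
print(wrong_codec_failed, mixing_failed)
```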
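  - Debugging
    - Debugging a function by hand
      - Start the python command line
      - Load the program with import; the module name is the program's name
      - Call the function as program_name.function_name(arguments) to debug it
  - Syntax checking
    - Besides checking for syntax errors, PyLint offers many suggestions for improvement, for example detecting indentation that mixes tabs and spaces
    - PyLint website: http://www.logilab.org/projects/pylint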
  - Python IDE
    - Eclipse
      - http://www.eclipse.org/
      - http://www.javasoft.com/
      - Pydev
    - Boa
