Python Study Notes
- Features of Python syntax
(Something strange ……)
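- Getting help
- 2. Import the module you want to look up, e.g. import sys
- 3. List the attributes the module contains with dir(sys)
- Declaring the source file's character set
- To support Chinese, add a specially formatted comment on the first or second line of the source (usually the second) that declares the file's character set. The default is 7-bit ASCII
- Format: # -*- coding: <encoding-name> -*-
- See: http://www.python.org/dev/peps/pep-0263/
- Example: declaring the gbk encoding:
#!/usr/bin/python
# -*- coding: gbk -*-
- Example: declaring the utf-8 encoding:
#!/usr/bin/python
# -*- coding: utf-8 -*-
- Note: Emacs also recognizes this syntax, while Vim recognizes # vim:fileencoding=<encoding-name>
- Constants and variables
- Variables
- Variable naming rules are similar to those of C
- Valid variable names include __my_name, name_23 and a1b2_c3
- Reserved keywords (names must not clash with these)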
- and def exec if not return
assert del finally import or try
break elif for in pass while
class else from is print yield
continue except global lambda raise
- Type overview / checking a type
- int
- >>> type(17)
<type 'int'>
- float
- >>> type(3.2)
<type 'float'>
- long
- >>> type(1L)
<type 'long'>
- >>> type(long(1))
<type 'long'>
- bool
- >>> type(True)
<type 'bool'>
- >>> type(1>2)
<type 'bool'>
- string
- >>> type("Hello, World!")
<type 'str'>
- >>> type("WorldHello"[0])
<type 'str'>
- list
- >>> type(['a','b','c'])
<type 'list'>
- >>> type([])
<type 'list'>
- tuple
- >>> type(('a','b','c'))
<type 'tuple'>
- >>> type(())
<type 'tuple'>
- dict
- >>> type({'color1':'red','color12':'blue'})
<type 'dict'>
- >>> type({})
<type 'dict'>
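- Strings
- Triple quotes
- Triple quotes, ''' or """, are a Python feature. A triple-quoted string can span several lines, and quotes inside it need no escaping (the content may contain both newlines and quote characters).
- Example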
- '''This is a multi-line string. This is the first line.
This is the second line.
"What's your name?," I asked.
He said "Bond, James Bond."
'''
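- Both single and double quotes can be used to create strings.
- Note: there is no difference between single and double quotes, unlike in PHP or Perl
- The prefix r or R introduces a raw string
- Example: r"Newlines are indicated by \n."
- When writing regular expressions, prefer raw strings so that backslashes need not be doubled. For example, r'\1' is equivalent to '\\1'.
- The prefix u or U introduces a Unicode string
- Example: u"This is a Unicode string."
- String concatenation: two string literals placed side by side are joined into one string
- 'What\'s ' "your name?" is automatically converted to "What's your name?".
- A second use: each piece of the string can carry its own comment. For example: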
- re.compile("[A-Za-z_]" # letter or underscore
"[A-Za-z0-9_]*" # letter, digit or underscore
)
- Wrapping a multi-line string in parentheses
- >>> test= ("case 1: something;" # test case 1
... "case 2: something;" #test case 2
... "case 3: something." #test case 3
... )
>>> test
'case 1: something;case 2: something;case 3: something.'
- sprintf-style string formatting
- header1 = "Dear %s," % name
- header2 = "Dear %(title)s %(name)s," % vars()
- String operations
- String slices
- [n] : the (n+1)-th character of the string (indexes start at 0)
- str="WorldHello"
print str[len(str)-1]
- [n:m] : the substring from index n up to index m, including n but excluding m
- >>> s = "0123456789"
>>> print s[0:5]
01234
>>> print s[3:5]
34
>>> print s[7:21]
789
>>> print s[:5]
01234
>>> print s[7:]
789
>>> print s[21:]
- Warning: strings in Python are immutable
- # Wrong! A string cannot be modified in place
greeting = "Hello, world!"
greeting[0] = 'J' # ERROR!
print greeting
- # It can be rewritten as:
greeting = "Hello, world!"
newGreeting = 'J' + greeting[1:]
print newGreeting
- Numbers
- Integers and long integers
- longinteger ::= integer ("l" | "L")
integer ::= decimalinteger | octinteger | hexinteger
decimalinteger ::= nonzerodigit digit* | "0"
octinteger ::= "0" octdigit+
hexinteger ::= "0" ("x" | "X") hexdigit+
nonzerodigit ::= "1"..."9"
octdigit ::= "0"..."7"
hexdigit ::= digit | "a"..."f" | "A"..."F"
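- Type conversion
- ord('A') : returns the ASCII value of the letter 'A'
- Local and global variables
- A function can read the value of a global variable directly, without declaring it. Assigning to that name inside the function, however, creates a local variable, so the change is visible only inside the function.
- A variable assigned in a function without a global declaration is local and does not affect the global variable of the same name
- The global statement declares a name as global
- #!/usr/bin/python
def func1():
    print "func1: local x is", x
def func2():
    x = 2
    print 'func2: local x is', x
def func3():
    global x
    print "func3: before change, x is", x
    x = 2
    print 'func3: changed x to', x
x = 1
print 'Global x is', x
func1()
print 'Global x is', x
func2()
print 'Global x is', x
func3()
print 'Global x is', x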
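- locals() and globals() are two special functions that return the local and the global variables
- locals() returns a copy of the local variables; changing the copy does not change the variables
- globals() returns the global namespace itself, so global variables can be modified through it
- vars() with no argument is equivalent to locals(). At module level, where locals() is the same dictionary as globals(), vars()['key'] = 'value' adds a variable dynamically; inside a function such an assignment is not reliable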
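A minimal sketch of the point that locals() hands back a copy. It is written for Python 3 (the notes use Python 2 syntax), and the function name is hypothetical:

```python
def show_locals_is_a_snapshot():
    x = 1
    snapshot = locals()      # dictionary describing the local variables
    snapshot["x"] = 99       # changes the dictionary only
    return x                 # the real local variable is unchanged


print(show_locals_is_a_snapshot())
```

Writing through globals() at module level, by contrast, changes the real variable.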
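- Compound types
- list
- Lists built with square brackets
- ["spam", "bungee", "swallow"]
- ["hello", 2.0, 5, [10, 20]]
- Lists built with the range function
- >>> range(1,5)
[1, 2, 3, 4]
- From 1 to 5, including 1 but excluding 5 (the implied step is 1)
- >>> range(10)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
- From 0 to 10, including 0 but excluding 10 (the implied step is 1)
- >>> range(1, 10, 2)
[1, 3, 5, 7, 9]
- Displaying lists with the print statement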
- vocabulary = ["ameliorate", "castigate", "defenestrate"]
numbers = [17, 123]
empty = []
print vocabulary, numbers, empty
['ameliorate', 'castigate', 'defenestrate'] [17, 123] []
- List operations
- + (concatenation)
- >>> a = [1, 2, 3]
>>> b = [4, 5, 6]
>>> c = a + b
>>> print c
[1, 2, 3, 4, 5, 6]
- * (repetition)
- >>> [0] * 4
[0, 0, 0, 0]
>>> [1, 2, 3] * 3
[1, 2, 3, 1, 2, 3, 1, 2, 3]
- Lists are mutable: their elements can be changed in place
- >>> fruit = ["banana", "apple", "quince"]
>>> fruit[0] = "pear"
>>> fruit[-1] = "orange"
>>> print fruit
['pear', 'apple', 'orange']
- >>> list = ['a', 'b', 'c', 'd', 'e', 'f']
>>> list[1:3] = ['x', 'y']
>>> print list
['a', 'x', 'y', 'd', 'e', 'f']
- Inserting elements into a list
- >>> list = ['a', 'd', 'f']
>>> list[1:1] = ['b', 'c']
>>> print list
['a', 'b', 'c', 'd', 'f']
>>> list[4:4] = ['e']
>>> print list
['a', 'b', 'c', 'd', 'e', 'f']
- Deleting elements from a list
- Deleting by assigning an empty list to a slice
- >>> list = ['a', 'b', 'c', 'd', 'e', 'f']
>>> list[1:3] = []
>>> print list
['a', 'd', 'e', 'f']
- Using the del keyword
- >>> a = ['one', 'two', 'three']
>>> del a[1]
>>> a
['one', 'three']
- >>> list = ['a', 'b', 'c', 'd', 'e', 'f']
>>> del list[1:5]
>>> print list
['a', 'f']
- Inspecting the id of a list
- >>> a = [1, 2, 3]
>>> b = [1, 2, 3]
>>> print id(a), id(b)
418650444 418675820
>>> b = a
>>> print id(a), id(b)
418650444 418650444
>>> b = a[:]
>>> print id(a), id(b)
418650444 418675692
- References and copy/clone
- A list passed as a function argument is passed by reference, so changes the function makes to the list affect the list object itself
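A small sketch of the difference between mutating the caller's list and working on a slice copy, using the same b = a[:] idiom shown earlier. Python 3 syntax is assumed, and the function names are hypothetical:

```python
def add_item(items):
    items.append("added")          # mutates the list object the caller passed in


def add_item_to_copy(items):
    clone = items[:]               # slice copy: a new list with the same elements
    clone.append("added")
    return clone


original = ["a", "b"]
copied = add_item_to_copy(original)
print(original)                    # the original is untouched by the copy version
add_item(original)
print(original)                    # now the caller's list has grown
```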
- Nested lists and matrices
- Nesting
- >>> list = ["hello", 2.0, 5, [10, 20]]
>>> list[3][1]
20
- Matrices
- >>> matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
>>> matrix[1]
[4, 5, 6]
>>> matrix[1][1]
5
- Strings and lists
- The string.split function
- >>> import string
>>> song = "The rain in Spain..."
>>> string.split(song)
['The', 'rain', 'in', 'Spain...']
- >>> string.split(song, 'ai')
['The r', 'n in Sp', 'n...']
- The string.join function
- >>> list = ['The', 'rain', 'in', 'Spain...']
>>> string.join(list)
'The rain in Spain...'
- >>> string.join(list, '_')
'The_rain_in_Spain...'
- >>> list = ['The', 'rain', 'in', 'Spain...']
>>> '|'.join(list)
'The|rain|in|Spain...'
- Tuples
- Creating a tuple with parentheses
- Enclose the values in an outer pair of parentheses
- >>> type((1,2,3))
<type 'tuple'>
- The values must be separated by commas; a single value needs a trailing comma
- >>> type((1))
<type 'int'>
- >>> type((1,))
<type 'tuple'>
- >>> type(('WorldHello'))
<type 'str'>
- >>> type(('WorldHello',))
<type 'tuple'>
- Tuple vs list
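- The difference between a tuple and a list: a tuple is immutable, while a list can be modified
- A single element in square brackets is already a list, but a single element in parentheses is not a tuple unless it is followed by a comma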
- >>> type([1])
<type 'list'>
- >>> type((1))
<type 'int'>
- Dictionaries (hash tables)
- Creating a dictionary with braces
- Perl calls this type a hash or an associative array, i.e. an array whose subscripts can be strings
- >>> eng2sp = {}
>>> eng2sp['one'] = 'uno'
>>> eng2sp['two'] = 'dos'
>>> print eng2sp
{'one': 'uno', 'two': 'dos'}
- Accessing dictionary elements: the subscript is a string
- >>> print eng2sp
{'one': 'uno', 'three': 'tres', 'two': 'dos'}
>>> print eng2sp['two']
dos
- Dictionary operations
- keys() method: returns a list of the keys
- >>> eng2sp.keys()
['one', 'three', 'two']
- values() method: returns a list of the values
- >>> eng2sp.values()
['uno', 'tres', 'dos']
- items() method: returns a list of (key, value) tuples
- >>> eng2sp.items()
[('one','uno'), ('three', 'tres'), ('two', 'dos')]
- from MoinMoin.util.chartypes import _chartypes
for key, val in _chartypes.items():
    if not vars().has_key(key):
        vars()[key] = val
- has_key() method: returns a boolean
- >>> eng2sp.has_key('one')
True
>>> eng2sp.has_key('deux')
False
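- get() method
- get() accepts a default value, which is returned when the key is not defined
- For example, eng2sp.get('none', 0) returns 0 rather than None when the key 'none' is not defined
- References and copy/clone
- Cloning a dictionary: the copy() method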
- >>> opposites = {'up': 'down', 'right': 'wrong', 'true': 'false'}
>>> copy = opposites.copy()
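A short sketch of get() with a default and of copy() versus a plain reference. The variable names are illustrative only, and the code is written for Python 3:

```python
eng2sp = {"one": "uno", "two": "dos"}

fallback = eng2sp.get("three", 0)   # key is missing, so the default 0 comes back
no_default = eng2sp.get("three")    # without a default, get() returns None

alias = eng2sp                      # second name for the same dictionary
clone = eng2sp.copy()               # independent (shallow) copy
alias["one"] = "UNO"                # visible through eng2sp, not through clone

print(fallback, no_default, eng2sp["one"], clone["one"])
```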
- The type function returns the type of a variable
- isinstance(varname, type({}))
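- Statements
- Implicit line joining
- Content inside square brackets, parentheses or braces can span several lines without a \ continuation character; the lines are joined implicitly
- For example: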
month_names = ['Januari', 'Februari', 'Maart', # These are the
'April', 'Mei', 'Juni', # Dutch names
'Juli', 'Augustus', 'September', # for the months
'Oktober', 'November', 'December'] # of the year
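- Indentation
- Indentation is measured in spaces. A tab is converted to 1-8 spaces, so that the total number of spaces up to and including the tab is a multiple of 8.
- Operators and expressions
- ** is the power operator
- 3 ** 4 gives 81 (i.e. 3 * 3 * 3 * 3)
- <, >, <=, >=, ==, != work as in C
- and, or, not are the logical AND, OR and NOT operators
- if 0 < x and x < 10:
    print "x is a positive single digit."
- is and is not test whether two names refer to the same object
- is not: id(obj1) != id(obj2)
- in and not in test membership
- 'a' in ['a', 'b', 'c'] # True
- Swapping assignment: a,b = b,a
- To swap the values of variables a and b, other languages may need a temporary variable
- Python can swap them in a single assignment: a,b = b,a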
- Control statements
- if statement
- if ... elif ... else, example (note the colons and the indentation):
- #!/usr/bin/python
# Filename : if.py
number = 23
guess = int(raw_input('Enter an integer : '))
if guess == number:
    print 'Congratulations, you guessed it.' # new block starts here
    print "(but you don't win any prizes!)" # new block ends here
elif guess < number:
    print 'No, it is a little higher than that.' # another block
    # You can do whatever you want in a block ...
else:
    print 'No, it is a little lower than that.'
    # you must have guess > number to reach here
print 'Done'
# This last statement is always executed, after the if statement
# is executed.
- Note: there is no switch ... case statement!
- while loop
- while ... [else ...], example (the else clause is optional):
- #!/usr/bin/python
# Filename : while.py
number = 23
stop = False
while not stop:
    guess = int(raw_input('Enter an integer : '))
    if guess == number:
        print 'Congratulations, you guessed it.'
        stop = True # This causes the while loop to stop
    elif guess < number:
        print 'No, it is a little higher than that.'
    else: # you must have guess > number to reach here
        print 'No, it is a little lower than that.'
else:
    print 'The while loop is over.'
    print 'I can do whatever I want here.'
print 'Done.'
- break and continue statements
- break exits the loop, and the else clause is not executed
- for loop
- for ... else ..., example (the else clause is optional):
- #!/usr/bin/python
# Filename : for.py
for i in range(1, 5):
    print i
else:
    print 'The for loop is over.'
- break and continue statements
- break exits the loop, and the else clause is not executed
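A minimal sketch of how break interacts with the else clause of a loop. The function name is hypothetical and the code is written for Python 3:

```python
def find_first_even(numbers):
    for n in numbers:
        if n % 2 == 0:
            result = "found %d" % n
            break                       # leaving via break skips the else clause
    else:
        result = "no even number"       # runs only when the loop was not broken
    return result


print(find_first_even([1, 3, 4, 5]))
print(find_first_even([1, 3]))
```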
- Trailing for clause (list comprehension)
- [ name for name in wikiaction.__dict__ ]
- actions = [name[3:] for name in wikiaction.__dict__ if name.startswith('do_')]
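The two lines above depend on a wikiaction module that is not shown. The sketch below applies the same filtering comprehension to a hypothetical stand-in class, in Python 3 syntax, so that it can be run on its own:

```python
class WikiActions(object):
    """Hypothetical stand-in for the wikiaction module used above."""

    def do_show(self):
        pass

    def do_edit(self):
        pass

    def helper(self):
        pass


# Keep only names that start with 'do_' and strip that prefix.
actions = sorted([name[3:] for name in WikiActions.__dict__
                  if name.startswith("do_")])
print(actions)
```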
- Examples
- Characters in a string
- prefixes = "JKLMNOPQ"
suffix = "ack"
for letter in prefixes:
    print letter + suffix
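- Functions
- Function definitions
- Example:
- #!/usr/bin/python
# Filename : func_param.py
def printMax(a, b):
    if a > b:
        print a, 'is maximum'
    else:
        print b, 'is maximum'
printMax(3, 4) # Directly give literal values
- Default argument values
- As in C++
- #!/usr/bin/python
# Filename : func_default.py
def say(s, times = 1):
    print s * times
say('Hello')
say('World', 5)
- Keyword arguments
- Languages such as C++ have the following annoyance: given a long list of parameters that all have defaults, changing only one of the later ones still requires passing all the earlier ones. Python calls this style positional argument passing; keyword arguments let a call name just the parameters it wants to set.
- Example:
- #!/usr/bin/python
# Filename : func_key.py
def func(a, b=5, c=10):
    print 'a is', a, 'and b is', b, 'and c is', c
func(3, 7)
func(25, c=24)
func(c=50, a=100)
- Variable arguments
- A parameter prefixed with * collects the extra positional arguments into a tuple; one prefixed with ** collects the extra keyword arguments into a dictionary
- Example 1
- #!/usr/bin/python
def sum(*args):
    '''Return the sum of the args.'''
    total = 0
    for i in range(0, len(args)):
        total += args[i]
    return total
print sum(10, 20, 30, 40, 50)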
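Example 1 covers only the * form. As a complement, here is a small sketch of * and ** together; the function name describe is made up for illustration and the code is written for Python 3:

```python
def describe(*args, **kwargs):
    # args is a tuple of the extra positional arguments,
    # kwargs is a dictionary of the extra keyword arguments.
    return type(args).__name__, type(kwargs).__name__, len(args), sorted(kwargs)


print(describe(1, 2, 3, color="red", size=10))
```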
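- DocStrings
- DocStrings provide help for a function
- A string literal that appears as the first statement in a function body is its DocString
- That a function carries a DocString shows that functions are objects too
- The function's __doc__ attribute holds the DocString
- For example, print printMax.__doc__ prints the DocString of the printMax function
- help() shows a function's help by displaying its DocString
- Lambda Forms
- A lambda form creates and returns a new anonymous function whose body is a single expression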
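A brief sketch of a lambda used to build and return a function. The name make_repeater is illustrative, and the code is written for Python 3:

```python
def make_repeater(n):
    # The lambda captures n and returns a new function each time.
    return lambda s: s * n


twice = make_repeater(2)
print(twice("word"))
print(twice(5))

# A lambda is also handy as a short key function.
pairs = sorted([("b", 2), ("a", 3), ("c", 1)], key=lambda pair: pair[1])
print(pairs)
```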
- Built-in functions and objects
- Help: import __builtin__; help (__builtin__)
- Functions
- Math / logic / algorithms
- cmp(x,y) : compares x and y; returns 1, 0 or -1
- divmod(x, y) -> (div, mod) : returns the quotient and the remainder
- round(number[, ndigits]) -> floating point number : rounds to ndigits decimal places
- sum(sequence, start=0) -> value : returns the sum of the items in sequence
- hex(number) -> string : returns the hexadecimal representation
- oct(number) -> string : returns the octal representation
- range([start,] stop[, step]) -> list of integers
- >>> range(10)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
- filter(function or None, sequence) -> list, tuple, or string
- Applies function to each element of sequence and returns the elements for which it returns true. The result has the same type as sequence.
- If function is None, returns the elements that are themselves true
- map(function, sequence[, sequence, ...]) -> list
- Applies function to each element of sequence and builds a list from the results
- >>> map(lambda x : x*2, [1,2,3,4,5])
[2, 4, 6, 8, 10]
- reduce(function, sequence[, initial]) -> value
- Applies function cumulatively to the items of sequence, from left to right, reducing the sequence to a single value.
- >>> reduce(lambda x, y: x+y, [1, 2, 3, 4, 5])
15
Equivalent to ((((1+2)+3)+4)+5)
- sorted(iterable, cmp=None, key=None, reverse=False) : returns a sorted list
- zip(seq1 [, seq2 [...]]) -> [(seq1[0], seq2[0] ...), (...)]
- >>> zip('1234','789')
[('1', '7'), ('2', '8'), ('3', '9')]
- coerce(x, y) -> (x1, y1)
- Return a tuple consisting of the two numeric arguments converted to a common type, using the same rules as used by arithmetic operations. If coercion is not possible, raise TypeError.
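The signatures above are the Python 2 ones. As a hedged sketch, the same map, filter, reduce and divmod calls look like this in Python 3, where map and filter return iterators and reduce lives in functools:

```python
from functools import reduce

doubled = list(map(lambda x: x * 2, [1, 2, 3, 4, 5]))
evens = list(filter(lambda x: x % 2 == 0, range(10)))
truthy = list(filter(None, [0, 1, "", "a", None]))   # None keeps the true items
total = reduce(lambda x, y: x + y, [1, 2, 3, 4, 5])  # ((((1+2)+3)+4)+5)
quotient, remainder = divmod(17, 5)

print(doubled, evens, truthy, total, quotient, remainder)
```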
- Strings
- chr(i) : 0<=i<256, returns the character whose ASCII code is i
- unichr(i) -> Unicode character : returns the Unicode character for i, 0 <= i <= 0x10ffff
- ord(c) : returns the ASCII code of the character c
- Object-related
- delattr(object,name) : deletes the attribute name from object
- delattr(x, 'y') is equivalent to del x.y
- getattr(object, name[, default]) -> value
- hasattr(object, name) -> bool
- id(object) -> integer : returns the object's ID (in CPython, its memory address)
- hash(object) -> integer : two objects with the same value have the same hash; the converse is not necessarily true
- setattr(object, name, value) : equivalent to the assignment x.y = v
- isinstance(object, class-or-type-or-tuple) -> bool
- vars([object]) -> dictionary
- With an object as argument, equivalent to object.__dict__
- repr(object) -> string : the canonical string representation of object
- reload(module) -> module : reloads module
- iter
- iter(collection) -> iterator
- Get an iterator from an object. In the first form, the argument must
supply its own iterator, or be a sequence.
- iter(callable, sentinel) -> iterator
- In the second form, the callable is called until it returns the sentinel.
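A sketch of the second, two-argument form of iter. It assumes Python 3 and uses an in-memory io.StringIO as a stand-in for a real file; the variable names are illustrative:

```python
import io

stream = io.StringIO("alpha\nbeta\n\ngamma\n")

# iter(callable, sentinel): stream.readline is called repeatedly
# until it returns the sentinel, here a blank line ("\n").
lines_before_blank = [line.strip() for line in iter(stream.readline, "\n")]
print(lines_before_blank)
```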
- Input and output
- input([prompt]) -> value : reads input; equivalent to eval(raw_input(prompt)).
- raw_input([prompt]) -> string : returns the input unprocessed, as a string
- Miscellaneous
- __import__(name, globals, locals, fromlist) -> module : loads a module dynamically
- The module in an import module statement cannot be a variable. To load a module whose name is held in a variable, use the approach below.
- def importName(modulename, name):
    """ Import name dynamically from module
    Used to do dynamic import of modules and names that you know their
    names only in runtime.
    Any error raised here must be handled by the caller.
    @param modulename: full qualified module name, e.g. x.y.z
    @param name: name to import from modulename
    @rtype: any object
    @return: name from module
    """
    module = __import__(modulename, globals(), {}, [name])
    return getattr(module, name)
- callable(object) : whether the object can be called, as a function can. Class instances can be callable too.
- compile(source, filename, mode[, flags[, dont_inherit]]) -> code object
- eval(source[, globals[, locals]]) -> value
- Evaluates an expression; source can be a string containing the code or a code object returned by compile
- execfile(filename[, globals[, locals]])
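- Input and output
- Input: raw_input vs input
- Prefer raw_input
- name = raw_input ("What...is your name? ")
- input evaluates what is typed as a Python expression, so in practice it is suitable only for entering numbers
- age = input ("How old are you? ")
- Files
- Opening a file
- Read
- >>> f = open("test.dat","r")
- Write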
- >>> f = open("test.dat","w")
>>> print f
<open file 'test.dat', mode 'w' at fe820>
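- write("content"): writes to the file
- >>> f.write("Now is the time")
>>> f.write("to close the file")
>>> f.close()
- read(count): reads from the file
- Reading all the data
- >>> f = open("test.dat","r")
>>> text = f.read()
>>> print text
Now is the timeto close the file
- Reading a fixed amount of data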
- >>> f = open("test.dat","r")
>>> print f.read(5)
Now i
- readline(): reads one line, including the trailing newline
- Example
- def copyFile(oldFile, newFile):
    f1 = open(oldFile, "r")
    f2 = open(newFile, "w")
    while True:
        text = f1.read(50)
        if text == "":
            break
        f2.write(text)
    f1.close()
    f2.close()
    return
- % formatted output
- When the left operand of % is a string, it performs C printf-style formatting.
- Example
- >>> cars = 52
>>> "In July we sold %d cars." % cars
'In July we sold 52 cars.'
- >>> "In %d days we made %f million %s." % (34,6.1,'dollars')
'In 34 days we made 6.100000 million dollars.'
- pickle and cPickle
- Example
- >>> import pickle
>>> f = open("test.pck","w")
>>> pickle.dump(12.3, f)
>>> pickle.dump([1,2,3], f)
>>> f.close()
>>> f = open("test.pck","r")
>>> x = pickle.load(f)
>>> x
12.3
>>> type(x)
<type 'float'>
>>> y = pickle.load(f)
>>> y
[1, 2, 3]
>>> type(y)
<type 'list'>
- Using cPickle
- Timing comparison of the two
- bash$ x=1; time while [ $x -lt 20 ]; do x=`expr $x + 1`; ./pickle.py ; done
real 0m5.743s
user 0m2.368s
sys 0m2.932s
bash$ x=1; time while [ $x -lt 20 ]; do x=`expr $x + 1`; ./cpickle.py ; done
real 0m3.826s
user 0m2.194s
sys 0m1.958s
- cPickle example
- #!/usr/bin/python
# Filename: pickling.py
import cPickle
shoplistfile = 'shoplist.data' # The name of the file we will use
shoplist = ['apple', 'mango', 'carrot']
# Write to the storage
f = file(shoplistfile, 'w')
cPickle.dump(shoplist, f) # dump the data to the file
f.close()
del shoplist # Remove shoplist
# Read back from storage
f = file(shoplistfile)
storedlist = cPickle.load(f)
print storedlist
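The pickle examples above are Python 2 code that writes to text-mode files. As a hedged sketch, the same round trip in Python 3 needs a binary stream; an in-memory io.BytesIO stands in here for open("test.pck", "wb"):

```python
import io
import pickle

buffer = io.BytesIO()              # in-memory stand-in for a file opened in "wb" mode
pickle.dump(12.3, buffer)          # objects are written one after another
pickle.dump([1, 2, 3], buffer)

buffer.seek(0)                     # rewind, as if the file had been reopened for reading
x = pickle.load(buffer)            # objects come back in the order they were dumped
y = pickle.load(buffer)
print(x, type(x).__name__, y, type(y).__name__)
```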
- Pipes
- os.popen('ls /etc').read()
- os.popen('ls /etc').readlines()
- About these notes
- References
- "A Byte of Python", by Swaroop C H
- "How to Think Like a Computer Scientist: Learning with Python"
- Object-oriented programming: classes
- Concepts
- class and object
- A class is a new type created with the class keyword
- How a method differs from a function
- The first parameter of a method is special
- It must be listed when the method is defined, but must not be supplied when the method is called
- For example, calling a method on MyObject, an instance of MyClass:
MyObject.method(arg1, arg2) makes Python automatically call MyClass.method(MyObject, arg1, arg2).
- Class variables and object variables
- Consider variables var1 and var2 defined in class ClassName
- If var1 is accessed as ClassName.var1, it is a class variable, shared by all instances of the class
- If var2 is accessed as self.var2, it is an object variable, isolated from other objects
- Example
- Class Person: each time a person is created, the class variable population is incremented by one
- Code
- #!/usr/bin/python
# Filename: objvar.py
class Person:
    '''Represents a person.'''
    population = 0
    def __init__(self, name):
        '''Initializes the person.'''
        self.name = name
        print '(Initializing %s)' % self.name
        # When this person is created,
        # he/she adds to the population
        Person.population += 1
    def sayHi(self):
        '''Greets the other person.
        Really, that's all it does.'''
        print 'Hi, my name is %s.' % self.name
    def howMany(self):
        '''Prints the current population.'''
        # There will always be atleast one person
        if Person.population == 1:
            print 'I am the only person here.'
        else:
            print 'We have %s persons here.' % \
                Person.population
swaroop = Person('Swaroop')
swaroop.sayHi()
swaroop.howMany()
kalam = Person('Abdul Kalam')
kalam.sayHi()
kalam.howMany()
swaroop.sayHi()
swaroop.howMany()
- Other special methods of a class
- __getitem__(...)
x.__getitem__(y) <==> x[y]
- __iter__(self)
- Supports iteration: returns an object that has a next() method. If the class itself defines next(), __iter__ can simply return self
- __getattribute__(...)
x.__getattribute__('name') <==> x.name
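A minimal sketch of a class whose __iter__ returns self. The class name Countdown is made up; Python 3 calls the method __next__, so the sketch defines that and aliases it to the Python 2 name next used in the notes:

```python
class Countdown(object):
    """Iterates from start down to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self                 # the object is its own iterator

    def __next__(self):
        if self.current <= 0:
            raise StopIteration     # signals the end of the iteration
        value = self.current
        self.current -= 1
        return value

    next = __next__                 # Python 2 spelling of the same method


print(list(Countdown(3)))
```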
- Class inheritance
- Example
- # Filename: inheritance.py
class SchoolMember:
    '''Represents any school member.'''
    def __init__(self, name, age):
        self.name = name
        self.age = age
        print '(Initialized SchoolMember: %s)' % self.name
    def tell(self):
        print 'Name:"%s" Age:"%s" ' % (self.name, self.age),
class Teacher(SchoolMember):
    '''Represents a teacher.'''
    def __init__(self, name, age, salary):
        SchoolMember.__init__(self, name, age)
        self.salary = salary
        print '(Initialized Teacher: %s)' % self.name
    def tell(self):
        SchoolMember.tell(self)
        print 'Salary:"%d"' % self.salary
class Student(SchoolMember):
    '''Represents a student.'''
    def __init__(self, name, age, marks):
        SchoolMember.__init__(self, name, age)
        self.marks = marks
        print '(Initialized Student: %s)' % self.name
    def tell(self):
        SchoolMember.tell(self)
        print 'Marks:"%d"' % self.marks
t = Teacher('Mrs. Abraham', 40, 30000)
s = Student('Swaroop', 21, 75)
print # prints a blank line
members = [t, s]
for member in members:
    member.tell()
    # Works for instances of Student as well as Teacher
- Exception handling
- Try..Except
- In the Python interpreter, enter s = raw_input('Enter something --> '),
then press Ctrl-D or Ctrl-C and see what is displayed.
- Catch the abnormal input with Try..Except. Example
- #!/usr/bin/python
# Filename: try_except.py
import sys
try:
    s = raw_input('Enter something --> ')
except EOFError:
    print '\nWhy did you do an EOF on me?'
    sys.exit() # Exit the program
except:
    print '\nSome error/exception occurred.'
    # Here, we are not exiting the program
print 'Done'
- Raising Exceptions
- To define your own exception, create a subclass of Exception
- Example: creating a custom exception class ShortInputException
- #!/usr/bin/python
# Filename: raising.py
class ShortInputException(Exception):
    '''A user-defined exception class.'''
    def __init__(self, length, atleast):
        self.length = length
        self.atleast = atleast
- Raising and catching the exception
- try:
    s = raw_input('Enter something --> ')
    if len(s) < 3:
        raise ShortInputException(len(s), 3)
    # Other work can go as usual here.
except EOFError:
    print '\nWhy did you do an EOF on me?'
except ShortInputException, x:
    print '\nThe input was of length %d, it should be at least %d'\
        % (x.length, x.atleast)
else:
    print 'No exception was raised.'
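The except ShortInputException, x form above is Python 2 syntax. A hedged sketch of the same idea in Python 3, which writes except ... as ...; the names ShortInputError and check_length are hypothetical:

```python
class ShortInputError(Exception):
    """A user-defined exception carrying the offending lengths."""

    def __init__(self, length, atleast):
        Exception.__init__(self, length, atleast)
        self.length = length
        self.atleast = atleast


def check_length(text, atleast=3):
    try:
        if len(text) < atleast:
            raise ShortInputError(len(text), atleast)
    except ShortInputError as err:      # Python 3 spelling of "except X, err"
        return "too short: %d < %d" % (err.length, err.atleast)
    else:
        return "ok"                     # runs only when nothing was raised


print(check_length("ab"))
print(check_length("abcd"))
```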
- Modules and packages
- Example
- a.py example
- # -*- python -*-
version = "0.1.a"
author = "unknown" # placeholder value; the original notes do not show it
- b.py uses a.py as a module
- Plain import
- Variables and functions defined in a.py are referenced through the namespace of module a
- import a
print "version:%s, author:%s" % (a.version, a.author)
- Using the from module import syntax
- Variables and functions defined in a.py can be used as if they were defined in b.py
- from a import *
print "version:%s, author:%s" % (version, author)
- from a import author
# import only one variable from module a
print "author:", author
- Modifying sys.path to include dir_a
- import sys
sys.path.insert(0, "dir_a")
import a
print "author:", a.author
- import sys
sys.path.insert(0, "dir_a")
from a import *
print "version:%s, author:%s" % (version, author)
- Using dir_a as a package
- See: python.org > Doc > Essays > Packages
- Create the file __init__.py in the dir_a directory (an empty file is enough)
- from dir_a import a
# import module a from the package dir_a
print "author:", a.author
- # b.py
from dir_a.a import *
print "version:%s, author:%s" % (version, author)
- Notes
- Module files are located in the directories listed in PYTHONPATH; print sys.path shows the search path
- import sys
print sys.path
- After a module is imported for the first time, it is compiled to a *.pyc bytecode file to make later imports faster
- The import statement brings in modules
- Syntax 1: "import" module ["as" name] ( "," module ["as" name] )*
- Syntax 2: "from" module "import" identifier ["as" name] ( "," identifier ["as" name] )*
- The __name__ variable
- Every module has a name, and statements in the module can read it through the __name__ attribute.
- When a module is run directly, __name__ is set to __main__
- For example, the following statements in a module
- #!/usr/bin/python
# Filename: using_name.py
if __name__ == '__main__':
    print 'This program is being run by itself'
else:
    print 'I am being imported from another module'
- __dict__
- Modules, classes, and class instances all have __dict__ attributes that holds the namespace contents for that object.
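- About packages
- Packages organize modules more effectively.
- The __init__.py file marks a directory as a Python package rather than an ordinary directory
- __init__.py may define the __all__ variable
- A package is simply a directory that contains *.py module files together with an __init__.py file
- One problem: because Mac, Windows and similar systems do not distinguish case in file names, it is hard to map file names to module names for from package import *
- The __all__ variable is one solution
- For the example above, define in __init__.py
__all__ = ["a"]
so that from dir_a import * imports exactly the modules listed in __all__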
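A self-contained sketch of the __all__ behaviour described above. It builds a throwaway dir_a package in a temporary directory; the extra module hidden.py and all file contents are invented for the demonstration:

```python
import os
import sys
import tempfile

root = tempfile.mkdtemp()
pkg = os.path.join(root, "dir_a")
os.mkdir(pkg)

# Hypothetical package contents: two modules, only one listed in __all__.
files = {
    "__init__.py": '__all__ = ["a"]\n',
    "a.py": 'version = "0.1.a"\n',
    "hidden.py": "secret = 1\n",
}
for filename, source in files.items():
    with open(os.path.join(pkg, filename), "w") as f:
        f.write(source)

sys.path.insert(0, root)            # make the temporary package importable
namespace = {}
exec("from dir_a import *", namespace)

print("a" in namespace, "hidden" in namespace)
```

Only the module named in __all__ ends up in the importing namespace.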
- Python library modules
- sys
- System information: sys.platform, sys.version_info, sys.maxint
- >>> import sys
>>> sys.version
'2.4.1 (#1, May 27 2005, 18:02:40) \n[GCC 3.3.3 (cygwin special)]'
>>> sys.version_info
(2, 4, 1, 'final', 0)
>>> sys.platform, sys.maxint
('linux2', 9223372036854775807)
- Module search path: sys.path
- Adding a directory to the module search path: sys.path.append( '/home/user')
- Exception information: sys.exc_type
- >>> try:
...     raise IndexError
... except:
...     print sys.exc_info()
- try:
    raise TypeError, "Bad Thing"
except:
    print sys.exc_info()
    print sys.exc_type, sys.exc_value
- Command-line arguments: sys.argv
- Number of command-line arguments: len(sys.argv), which counts the program name itself
- sys.argv[0] is the program name, sys.argv[1] is the first argument, and so on
- Example 1
- def main(arg1, arg2):
    """main entry point"""
    ... ...
if __name__ == '__main__':
    if len(sys.argv) < 3:
        sys.stderr.write("Usage: %s ARG1 ARG2\n" % (sys.argv[0]))
    else:
        main(sys.argv[1], sys.argv[2])
- Example 2
- #!/usr/bin/python
# Filename : using_sys.py
import sys
print 'The command line arguments used are:'
for i in sys.argv:
    print i
print '\n\nThe PYTHONPATH is', sys.path, '\n'
- Standard input, output and error: sys.stdin, sys.stdout, sys.stderr
- os
- Separators: os.sep, os.pathsep, os.linesep
- Changing the working directory: os.chdir(r'c:\temp')
- Splitting a path into directory and file name: os.path.split(), os.path.dirname()
- >>> os.path.split('/home/swaroop/poem.txt')
('/home/swaroop', 'poem.txt')
- os.path.dirname('/etc/init.d/apachectl')
- os.path.basename('/etc/init.d/apachectl')
- Testing for a file or a directory: os.path.isdir(r'c:\temp'), os.path.isfile(r'c:\temp'); each returns True or False
- Testing whether a file or directory exists: os.path.exists('/etc/passwd')
- Running a system command: os.system('ls -l /etc')
- Running a system command and opening a pipe to it: os.popen(command [, mode='r' [, bufsize]])
- os.popen('ls /etc').read()
- os.popen('ls /etc').readlines()
- string (string handling)
- Example
- import string
fruit = "banana"
index = string.find(fruit, "a")
print index
- math (mathematical functions)
- Example
- import math
x = math.cos(angle + math.pi/2)
x = math.exp(math.log(10.0))
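- re
- Help
- Regular expressions. Reference: http://docs.python.org/lib/module-re.html
- Regular expression syntax
- ^, $ match the start and the end of the string. In re.MULTILINE mode, ^ and $ also match at the start and end of each line
- *, +, ?, {m,n} : quantifiers (greedy by default, matching as much as possible)
- For example, the expression "<.*>" applied to the string '<H1>title</H1>' matches the whole string, not just '<H1>'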
- >>> re.match('<.*>', '<H1>titile</H1>').group()
'<H1>titile</H1>'
- *?, +?, ?? : non-greedy versions of the quantifiers
- For example, the expression "<.*?>" applied to the string '<H1>title</H1>' matches only '<H1>'
- >>> re.match('<.*?>', '<H1>titile</H1>').group()
'<H1>'
- {m,n}? : likewise matches as little as possible (non-greedy)
- >>> re.match('<.{,20}>', '<H1>titile</H1>').group()
'<H1>titile</H1>'
- >>> re.match('<.{,20}?>', '<H1>titile</H1>').group()
'<H1>'
- [(] [)]
- ( and ) group patterns; to match literal parentheses use \(, \) or [(], [)]
- (?iLmsux)
- One or more of the letters iLmsux after (? set the corresponding flags re.I, re.L, re.M, re.S, re.U, re.X
- >>> re.search('(?i)(T[A-Z]*)','<h1>title</h1>').groups()
('title',)
- (?P<name>pattern) : gives the group a name by which the match can be referenced
- >>> re.match('(?P<p>.*?)(?::\s*)(?P<msg>.*)', 'prompt: enter your name').group('p')
'prompt'
>>> re.match('(?P<p>.*?)(?::\s*)(?P<msg>.*)', 'prompt: enter your name').group('msg')
'enter your name'
>>> re.match('(?P<p>.*?)(?::\s*)(?P<msg>.*)', 'prompt: enter your name').group(0)
'prompt: enter your name'
>>> re.match('(?P<p>.*?)(?::\s*)(?P<msg>.*)', 'prompt: enter your name').group(1)
'prompt'
>>> re.match('(?P<p>.*?)(?::\s*)(?P<msg>.*)', 'prompt: enter your name').group(2)
'enter your name'
- Referring to a group with r'\1'
>>> re.sub ( 'id:\s*(?P<id>\d+)', 'N:\\1', 'userlist\nid:001,user001:jiangxin\nid:002,user003:tom\nid:003,user003:jerry\n')
'userlist\nN:001,user001:jiangxin\nN:002,user003:tom\nN:003,user003:jerry\n'
>>> re.sub ( 'id:\s*(?P<id>\d+)', r'N:\1', 'userlist\nid:001,user001:jiangxin\nid:002,user003:tom\nid:003,user003:jerry\n')
'userlist\nN:001,user001:jiangxin\nN:002,user003:tom\nN:003,user003:jerry\n'
- Referring to a group with r'\g<name>'
>>> re.sub ( 'id:\s*(?P<id>\d+)', r'N:\g<id>', 'userlist\nid:001,user001:jiangxin\nid:002,user003:tom\nid:003,user003:jerry\n')
'userlist\nN:001,user001:jiangxin\nN:002,user003:tom\nN:003,user003:jerry\n'
- (?P=name) : backreference to an earlier named group
- >>> re.findall ( 'id:\s*(?P<id>\d+)', 'userlist\nid:001,user001:jiangxin\nid:002,user003:tom\nid:003,user003:jerry\n')
['001', '002', '003']
- >>> re.findall ( 'id:\s*(?P<id>\d+),\s*user(?P=id):', 'userlist\nid:001,user001:jiangxin\nid:002,user003:tom\nid:003,user003:jerry\n')
['001', '003']
- (?:pattern)
- Compare the following two examples:
>>> re.match('(.*?:\s*)(.*)', 'prompt: enter your name').group(1)
'prompt: '
>>> re.match('(?:.*?:\s*)(.*)', 'prompt: enter your name').group(1)
'enter your name'
- (?=pattern) positive lookahead assertion
- Matches if pattern matches next, but doesn't consume any of the string.
- For example (in Perl syntax):
- Change only the foo that appears in foobar, not the foo in fool or foolish
-
$line="foobar\nfool";
## foo must be followed by bar, and bar itself is not part of the replacement.
$line =~ s/foo(?=bar)/something/gm;
print "$line\n";
Output
somethingbar
fool
- (?!pattern) negative lookahead assertion
- The opposite of (?=pattern). Matches if ... doesn't match next. This is a negative lookahead assertion.
- For example (in Perl syntax): change the foo in every word except foobar, such as the foo in fool or foolish.
-
$line="foobar\nfool";
## foo must not be followed by bar, and the content of (?!..) is not part of the replacement.
$line =~ s/foo(?!bar)/something/gm;
print "$line\n";
Output
foobar
somethingl
- (?<=pattern) positive lookbehind assertion
- Matches if the current position in the string is preceded by a match for ... that ends at the current position.
- Example (in Perl syntax):
- $line="foobar\nbarfoo\nbar foo\na fool";
## Replace the foo that follows bar; bar itself is not part of the replacement.
$line =~ s/(?<=bar)foo/something/gm;
print "$line\n";
Output
foobar
barsomething
bar foo
a fool
- (?<!pattern) negative lookbehind assertion
- Matches if the current position in the string is not preceded by a match for .... This is called a negative lookbehind assertion.
- Example (in Perl syntax):
- $line="foobar\nbarfoo\nbar foo\na fool";
## Replace foo, but only where it is not preceded by bar.
$line =~ s/(?<!bar)foo/something/gm;
print "$line\n";
Output
somethingbar
barfoo
bar something
a somethingl
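The four lookaround examples above are written in Perl. A sketch of the same substitutions with Python's re.sub, on the same input string; the variable names are illustrative and the code is written for Python 3:

```python
import re

text = "foobar\nbarfoo\nbar foo\na fool"

lookahead = re.sub(r"foo(?=bar)", "something", text)        # foo followed by bar
neg_lookahead = re.sub(r"foo(?!bar)", "something", text)    # foo not followed by bar
lookbehind = re.sub(r"(?<=bar)foo", "something", text)      # foo preceded by bar
neg_lookbehind = re.sub(r"(?<!bar)foo", "something", text)  # foo not preceded by bar

for result in (lookahead, neg_lookahead, lookbehind, neg_lookbehind):
    print(result)
    print("--")
```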
- Regular expression special characters
- \A Matches only at the start of the string.
- \b Matches the empty string, but only at the beginning or end of a word
- \B Matches the empty string, but only when it is not at the beginning or end of a word.
- \d When the UNICODE flag is not specified, matches any decimal digit; this is equivalent to the set [0-9]. With UNICODE, it will match whatever is classified as a digit in the Unicode character properties database.
- \D When the UNICODE flag is not specified, matches any non-digit character; this is equivalent to the set [^0-9]. With UNICODE, it will match anything other than character marked as digits in the Unicode character properties database.
- \s When the LOCALE and UNICODE flags are not specified, matches any whitespace character; this is equivalent to the set [ \t\n\r\f\v]. With LOCALE, it will match this set plus whatever characters are defined as space for the current locale. If UNICODE is set, this will match the characters [ \t\n\r\f\v] plus whatever is classified as space in the Unicode character properties database.
- \S When the LOCALE and UNICODE flags are not specified, matches any non-whitespace character; this is equivalent to the set [^ \t\n\r\f\v] With LOCALE, it will match any character not in this set, and not defined as space in the current locale. If UNICODE is set, this will match anything other than [ \t\n\r\f\v] and characters marked as space in the Unicode character properties database.
- \w When the LOCALE and UNICODE flags are not specified, matches any alphanumeric character and the underscore; this is equivalent to the set [a-zA-Z0-9_]. With LOCALE, it will match the set [0-9_] plus whatever characters are defined as alphanumeric for the current locale. If UNICODE is set, this will match the characters [0-9_] plus whatever is classified as alphanumeric in the Unicode character properties database.
- \W When the LOCALE and UNICODE flags are not specified, matches any non-alphanumeric character; this is equivalent to the set [^a-zA-Z0-9_]. With LOCALE, it will match any character not in the set [0-9_], and not defined as alphanumeric for the current locale. If UNICODE is set, this will match anything other than [0-9_] and characters marked as alphanumeric in the Unicode character properties database.
- \Z Matches only at the end of the string.
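- re flags
- re.L, re.LOCALE : \w, \W, \b, \B, \s and \S follow the current locale
- re.M, re.MULTILINE : treats the string as multiple lines, so ^ and $ also match just after and just before each newline. By default they match only at the start and end of the string.
- re.S, re.DOTALL : . matches any character, including a newline. By default it matches any character except a newline
- re.U, re.UNICODE : \w, \W, \b, \B, \d, \D, \s and \S follow the Unicode character properties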
- >>> re.compile(ur'----(-)*\r?\n.*\b(网页类)\b',re.U).search("--------\r\nCategoryX 网页类 CategoryY".decode('utf-8')).groups()
(u'-', u'\u7f51\u9875\u7c7b')
- >>> re.compile(ur'----(-)*\r?\n.*\b(网页类)\b',re.U).search(u"--------\r\nCategoryX 网页类 CategoryY").groups()
(u'-', u'\u7f51\u9875\u7c7b')
- re.X, re.VERBOSE : allows # comments inside the pattern to make the expression more readable.
- For example:
page_invalid_chars_regex = re.compile(
    ur"""
    \u0000 | # NULL
    # Bidi control characters
    \u202A | # LRE
    \u202B | # RLE
    \u202C | # PDF
    \u202D | # LRO
    \u202E # RLO
    """,
    re.UNICODE | re.VERBOSE
    )
- Note the difference between match and search
- re.match( pattern, string[, flags]) matches only at the beginning of the string, as if a '^' had been prepended to the pattern
- >>> p.match("")
>>> print p.match("")
None
p = re.compile( ... )
m = p.match( 'string goes here' )
if m:
    print 'Match found: ', m.group()
else:
    print 'No match'
- re.search( pattern, string[, flags]) searches the whole string
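A short sketch contrasting match and search with one compiled pattern. The variable names are illustrative and the code is written for Python 3:

```python
import re

pattern = re.compile(r"\d+")

at_start = pattern.match("abc 123")      # digits are not at the start: no match
anywhere = pattern.search("abc 123")     # search scans the whole string
leading = pattern.match("123 abc")       # match succeeds when the digits come first

print(at_start, anywhere.group(), leading.group())
```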
- re.compile( pattern[, flags])
- For an expression that is used repeatedly, compiling it with re.compile is more efficient
- prog = re.compile(pat)
result = prog.match(str)
is equivalent to
result = re.match(pat, str)
- re.split( pattern, string[, maxsplit = 0]) splits a string
- >>> re.split('\W+', 'Words, words, words.')
['Words', 'words', 'words', '']
>>> re.split('(\W+)', 'Words, words, words.')
['Words', ', ', 'words', ', ', 'words', '.', '']
>>> re.split('\W+', 'Words, words, words.', 1)
['Words', 'words, words.']
- re.findall( pattern, string[, flags])
- >>> p = re.compile('\d+')
>>> p.findall('12 drummers drumming, 11 pipers piping, 10 lords a-leaping')
['12', '11', '10']
- re.finditer( pattern, string[, flags])
- >>> p = re.compile('\d+')
>>> iterator = p.finditer('12 drummers drumming, 11 ... 10 ...')
>>> iterator
<callable-iterator object at 0x401833ac>
>>> for match in iterator:
... print match.span()
...
(0, 2)
(22, 24)
(29, 31)
- re.sub(pattern, repl, string[, count])
- >>> re.sub ( 'id:\s*(?P<id>\d+)', 'N:\\1', 'userlist\nid:001,user001:jiangxin\nid:002,user003:tom\nid:003,user003:jerry\n')
'userlist\nN:001,user001:jiangxin\nN:002,user003:tom\nN:003,user003:jerry\n'
>>> re.sub ( 'id:\s*(?P<id>\d+)', r'N:\1', 'userlist\nid:001,user001:jiangxin\nid:002,user003:tom\nid:003,user003:jerry\n')
'userlist\nN:001,user001:jiangxin\nN:002,user003:tom\nN:003,user003:jerry\n'
- >>> re.sub ( 'id:\s*(?P<id>\d+)', r'N:\g<id>', 'userlist\nid:001,user001:jiangxin\nid:002,user003:tom\nid:003,user003:jerry\n')
'userlist\nN:001,user001:jiangxin\nN:002,user003:tom\nN:003,user003:jerry\n'
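- re.subn( pattern, repl, string[, count]) works like re.sub but returns a different value
- Returns a tuple (new_string, number_of_subs_made).
- re.escape(string) : escapes the string so that any special characters in it are treated literally inside a regular expression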
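- A short sketch of re.subn and re.escape in Python 3 syntax (an assumption; the sample strings are invented for illustration).

```python
import re

text = "id:001,id:002,name:tom"

# subn returns the new string together with the number of replacements made.
new_text, count = re.subn(r"id:(\d+)", r"N:\1", text)
print(new_text)  # N:001,N:002,name:tom
print(count)     # 2

# Without escaping, '.' matches any character; re.escape makes it literal.
literal = "a.b"
print(re.findall(literal, "a.b axb"))             # ['a.b', 'axb']
print(re.findall(re.escape(literal), "a.b axb"))  # ['a.b']
```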
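- Compiled pattern objects
- Every method of the compiled pattern object returned by re.compile has a counterpart among the re module functions; only the parameters differ
- The re module functions take a flags parameter. A compiled object was given its flags when it was created,
so its matching methods (match, search, findall, finditer) take pos and endpos, the start and end positions, in place of flags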
- match( string[, pos[, endpos]])
- search( string[, pos[, endpos]])
- split( string[, maxsplit = 0])
- findall( string[, pos[, endpos]])
- finditer( string[, pos[, endpos]])
- sub( repl, string[, count = 0])
- subn( repl, string[, count = 0])
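- The pos and endpos parameters can be sketched like this, in Python 3 syntax (an assumption; the sample text is shortened from the findall example above).

```python
import re

digits = re.compile(r"\d+")
text = "12 drummers, 11 pipers, 10 lords"

print(digits.findall(text))         # ['12', '11', '10']
print(digits.findall(text, 3))      # ['11', '10']  (start scanning at index 3)
print(digits.findall(text, 3, 15))  # ['11']        (stop before index 15)

# match() with pos anchors the match at that position instead of index 0.
print(digits.match(text, 13).group())  # 11
print(digits.search(text, 3).span())   # (13, 15)
```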
- match 对象
- expand( template)
- 支持 "\1", "\2", "\g<1>", "\g<name>"
- group( [group1, ...])
- Example
m = re.match(r"(?P<int>\d+)\.(\d*)", '3.14')
Result
m.group(1) is '3', as is m.group('int'), and m.group(2) is '14'.
- >>> p = re.compile('(a(b)c)d')
>>> m = p.match('abcd')
>>> m.group(0)
'abcd'
>>> m.group(1)
'abc'
>>> m.group(2)
'b'
>>> m.groups()
('abc', 'b')
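- groups( [default])
- Returns a tuple of all subgroups, starting from group 1; default is used for groups that did not take part in the match
- groupdict( [default])
- Returns a dictionary of all named subgroups, keyed by group name
- start( [group]) and end( [group])
- Return the start and end positions, within the string, of the substring matched by group
- span( [group])
- Returns the 2-tuple (start, end)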
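- The match object methods listed above can be tried out as follows, in Python 3 syntax (an assumption). The group name frac is added here for illustration and does not appear in the original example.

```python
import re

m = re.match(r"(?P<int>\d+)\.(?P<frac>\d*)", "3.14")

print(m.groups())                       # ('3', '14')
print(m.groupdict())                    # {'int': '3', 'frac': '14'}
print(m.start("frac"), m.end("frac"))   # 2 4
print(m.span("int"))                    # (0, 1)
print(m.expand(r"\g<frac>/\1"))         # 14/3

# default fills in groups that did not take part in the match.
m2 = re.match(r"(\d+)\.?(\d+)?", "3")
print(m2.groups())     # ('3', None)
print(m2.groups("0"))  # ('3', '0')
```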
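- getopt (command-line processing)
- getopt.getopt( args, options[, long_options])
- args is the argument list without the program name, typically sys.argv[1:]
- options is the string of short option letters. An option that takes a value is followed by a colon ":". See Unix getopt()
- long_options is the list of long option names. An option that takes a value is followed by an equals sign "=".
- Return value: two elements
- First: a list of (option, value) pairs
- Second: the list of arguments left over after the options have been stripped
- Example: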
- >>> import getopt
>>> args = '-a -b -cfoo -d bar a1 a2'.split()
>>> args
['-a', '-b', '-cfoo', '-d', 'bar', 'a1', 'a2']
>>> optlist, args = getopt.getopt(args, 'abc:d:')
>>> optlist
[('-a', ''), ('-b', ''), ('-c', 'foo'), ('-d', 'bar')]
>>> args
['a1', 'a2']
- """Module docstring.
This serves as a long usage message.
"""
import sys
import getopt
def main():
    # parse command line options
    try:
        opts, args = getopt.getopt(sys.argv[1:], "hp:", ["help", "port="])
    except getopt.error, msg:
        print msg
        print "for help use --help"
        sys.exit(2)
    # process options
    for o, a in opts:
        if o in ("-h", "--help"):
            print __doc__
            sys.exit(0)
        elif o in ("-p", "--port"):
            print "port is %s" % a  # option values are strings, so %d would raise TypeError
    # process arguments
    for arg in args:
        process(arg) # process() is defined elsewhere
if __name__ == "__main__":
    main()
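- A minimal Python 3 sketch (an assumption; the script above is Python 2) of the same short and long options, wrapped in a hypothetical parse helper that returns the result rather than printing it. In Python 3 the exception is usually written getopt.GetoptError.

```python
import getopt


def parse(argv):
    """Parse -h/--help and -p/--port; argv excludes the program name."""
    opts, args = getopt.getopt(argv, "hp:", ["help", "port="])
    settings = {"help": False, "port": None}
    for opt, value in opts:
        if opt in ("-h", "--help"):
            settings["help"] = True
        elif opt in ("-p", "--port"):
            settings["port"] = int(value)  # values always arrive as strings
    return settings, args


print(parse(["--port=8080", "a1", "a2"]))
# ({'help': False, 'port': 8080}, ['a1', 'a2'])

try:
    parse(["-z"])  # unknown option
except getopt.GetoptError as err:
    print("error:", err)
```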
- Databases
- See: http://mysql-python.sourceforge.net/MySQLdb.html
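- time (time functions)
- time.time() : returns the time in seconds since the Unix epoch, as a floating-point number
- time.clock() : on Unix, the CPU time used by the process in seconds (floating point); on Windows, wall-clock seconds since the first call
- gmtime() : returns UTC time as a time tuple
- localtime() : returns local time as a time tuple
- asctime() : converts a time tuple to a string
- mktime() : converts a local-time tuple to seconds since the epoch
- strftime() : formats a time tuple according to a format string
- strptime() : parses a string according to a format string into a time tuple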
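- How these functions fit together can be sketched as below, in Python 3 syntax (an assumption). The format string and the sample timestamp are made up for illustration.

```python
import time

fmt = "%Y-%m-%d %H:%M:%S"

# string -> time tuple -> string
parsed = time.strptime("2009-02-13 23:31:30", fmt)
text = time.strftime(fmt, parsed)
print(parsed.tm_year, parsed.tm_hour)  # 2009 23
print(text)                            # 2009-02-13 23:31:30

# epoch seconds -> local time tuple -> epoch seconds
now = int(time.time())
round_trip = int(time.mktime(time.localtime(now)))
print(round_trip == now)               # True

# gmtime(0) is the Unix epoch itself, expressed in UTC.
print(time.gmtime(0)[:6])              # (1970, 1, 1, 0, 0, 0)
```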
- logging
- logging 级别
- Level Numeric value
CRITICAL 50
ERROR 40
WARNING 30
INFO 20
DEBUG 10
NOTSET 0
- getLogger()
- With no argument it returns the root logger; pass a name to getLogger to create (or look up) a named logger
- logging.basicConfig()
logging.getLogger("").setLevel(logging.DEBUG)
ERR = logging.getLogger("ERR")
ERR.setLevel(logging.ERROR)
#These should log
logging.log(logging.CRITICAL, nextmessage())
logging.debug(nextmessage())
ERR.log(logging.CRITICAL, nextmessage())
ERR.error(nextmessage())
#These should not log
ERR.debug(nextmessage())
- basicConfig sets the log level, the message format, and so on
- logging.basicConfig(level=logging.DEBUG,
format="%(levelname)s : %(asctime)-15s > %(message)s")
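- A minimal sketch of level filtering on a named logger, in Python 3 syntax (an assumption). The logger name demo and the in-memory stream are hypothetical choices that make the output easy to inspect.

```python
import io
import logging

stream = io.StringIO()  # capture output instead of writing to stderr
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s:%(name)s:%(message)s"))

log = logging.getLogger("demo")
log.setLevel(logging.WARNING)
log.addHandler(handler)
log.propagate = False  # keep the demo independent of the root logger

log.info("dropped")  # INFO (20) is below WARNING (30): not emitted
log.error("kept")    # ERROR (40) is at or above WARNING: emitted

print(stream.getvalue())  # ERROR:demo:kept
```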
- Python in practice
- Help skeleton
- __doc__
- '''PROGRAM INTRODUCTION
Usage: %(PROGRAM)s [options]
Options:
-h|--help
Print this message and exit.
'''
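- The usage function
- def usage(code, msg=''):
    # Errors (non-zero code) go to stderr; plain help goes to stdout.
    if code:
        fd = sys.stderr
    else:
        fd = sys.stdout
    # _() is assumed to be the gettext translation function, installed elsewhere
    print >> fd, _(__doc__)
    if msg:
        print >> fd, msg
    sys.exit(code)
- Note: code is the exit status; msg is an additional error message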
- Command-line processing
- Command-line skeleton
- #!/usr/bin/python
# -*- coding: utf-8 -*-
import sys
import getopt
def main(argv=None):
    if argv is None:
        argv = sys.argv
    try:
        opts, args = getopt.getopt(
            argv[1:], "hn:",
            ["help", "name="])
    except getopt.error, msg:
        return usage(1, msg)
    for opt, arg in opts:
        if opt in ('-h', '--help'):
            return usage(0)
        #elif opt in ('--more_options'):
if __name__ == "__main__":
    sys.exit(main())
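- Notes
- main takes a default argument so that it can be called from the interactive prompt with the arguments passed in explicitly
- def main(argv=None):
    if argv is None:
        argv = sys.argv
    # etc., replacing sys.argv with argv in the getopt() call.
- To keep a sys.exit() call inside main from terminating the interactive session, main uses return statements instead of sys.exit
- if __name__ == "__main__":
    sys.exit(main())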
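- A sketch, in Python 3 syntax (an assumption), of why the main(argv=None) pattern is convenient: main can be called directly with a hand-built argument list and its return value inspected. The greeting and the usage text are made up for illustration.

```python
import getopt
import sys


def main(argv=None):
    if argv is None:
        argv = sys.argv
    try:
        opts, args = getopt.getopt(argv[1:], "hn:", ["help", "name="])
    except getopt.GetoptError as err:
        print(err, file=sys.stderr)
        return 2  # return, rather than sys.exit, so the caller stays alive
    name = "world"
    for opt, arg in opts:
        if opt in ("-h", "--help"):
            print("usage: prog [-n NAME]")
            return 0
        elif opt in ("-n", "--name"):
            name = arg
    print("hello, %s" % name)
    return 0


# Called the way one would from the interactive prompt:
status = main(["prog", "-n", "tom"])   # prints "hello, tom"
bad_status = main(["prog", "--bogus"])  # reports the error on stderr
print(status, bad_status)               # 0 2
```

In a script, the usual last step would still be sys.exit(main()) under the __name__ == "__main__" guard, as in the skeleton above.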
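- unicode
- In Python 2, encoding and decoding are simply conversions between the two types unicode and str.
Encoding is unicode -> str; conversely, decoding is str -> unicode
- Getting to know unicode
- # The current locale uses UTF-8, so a str literal typed here holds UTF-8 bytes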
>>> '中文'
'\xe4\xb8\xad\xe6\x96\x87'
>>> isinstance('中文', unicode)
False
>>> isinstance('中文', str)
True
- # decode converts a str to unicode
>>> '中文'.decode('utf-8')
u'\u4e2d\u6587'
>>> isinstance('中文'.decode('utf-8'), unicode)
True
>>> isinstance('中文'.decode('utf-8'), str)
False
- # The u prefix defines a unicode string
>>> u'中文'
u'\u4e2d\u6587'
>>> isinstance(u'中文', unicode)
True
>>> isinstance(u'中文', str)
False
- # encode converts unicode to a str
>>> u'中文'.encode('utf-8')
'\xe4\xb8\xad\xe6\x96\x87'
>>> isinstance(u'中文'.encode('utf-8'), unicode)
False
>>> isinstance(u'中文'.encode('utf-8'), str)
True
- >>> len(u'中文')
2
>>> len(u'中文'.encode('utf-8'))
6
>>> len(u'中文'.encode('utf-8').decode('utf-8'))
2
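- For comparison, a minimal sketch of the same round trip in Python 3 (an assumption; everything above is Python 2). There, str plays the role of Python 2's unicode and bytes the role of Python 2's str. The text is written with \u escapes so the snippet does not depend on the source file encoding.

```python
text = "\u4e2d\u6587"        # str: two Unicode code points (the characters 中文)
data = text.encode("utf-8")  # bytes: the UTF-8 encoded form

print(isinstance(text, str), isinstance(data, bytes))  # True True
print(len(text))                     # 2 characters
print(len(data))                     # 6 bytes
print(data)                          # b'\xe4\xb8\xad\xe6\x96\x87'
print(data.decode("utf-8") == text)  # True
```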
- Typical Unicode error 1
- >>> "str1: %s, str2: %s" % ('中文', u'中文')
Traceback (most recent call last):
File "<stdin>", line 1, in ?
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe4 in position 6: ordinal not in range(128)
- Solution: make both operands the same type (all str or all unicode)
- >>> "str1: %s, str2: %s" % ('中文', '中文')
'str1: \xe4\xb8\xad\xe6\x96\x87, str2: \xe4\xb8\xad\xe6\x96\x87'
- >>> "str1: %s, str2: %s" % (u'中文', u'中文')
u'str1: \u4e2d\u6587, str2: \u4e2d\u6587'
- Typical Unicode error 2
- mystr = '中文'
mystr.encode('gb18030')
Raises:
Traceback (most recent call last):
File "<stdin>", line 1, in ?
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe4 in position 0: ordinal not in range(128)
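- Analysis:
mystr.encode('gb18030') asks to re-encode mystr as gb18030, that is, to perform a unicode -> str conversion. Because mystr is already a str, Python first implicitly decodes it to unicode and only then encodes it as gb18030.
Since that decoding is implicit and no codec was specified, Python uses the one reported by sys.getdefaultencoding(). In most cases that is ASCII, so the decode fails whenever mystr contains non-ASCII bytes.
In the case above, the default encoding is ASCII while mystr, like the source file, is encoded as UTF-8, hence the error.
- Set the default string encoding with sys.setdefaultencoding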
- #! /usr/bin/env python
# -*- coding: utf-8 -*-
import sys
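reload(sys) # Python 2.5 deletes sys.setdefaultencoding after startup, so sys must be reloaded to get it back
sys.setdefaultencoding('utf-8')
mystr = '中文'
# The str is first implicitly decoded to unicode with the default encoding set above,
# and then encoded as gb18030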
mystr.encode('gb18030')
- Explicitly decode the str to unicode, then encode
- #! /usr/bin/env python
# -*- coding: gb2312 -*-
s = '中文'
s.decode('gb2312').encode('big5')
- #! /usr/bin/env python
# -*- coding: utf-8 -*-
s = '中文'
# Even though the file is encoded as utf-8, the sys default encoding is still ascii, so utf-8 must be named explicitly when decoding
print s.decode('utf-8')
print s.decode('utf-8').encode('gb18030')
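- The explicit decode-then-encode step can be sketched in Python 3 syntax as well (an assumption). Python 3 has no implicit ASCII decoding, but naming the wrong codec still fails in the same way as the typical errors above.

```python
utf8_bytes = "\u4e2d\u6587".encode("utf-8")  # the UTF-8 form of 中文

# Transcode: decode with the codec the bytes really use, then encode anew.
gb_bytes = utf8_bytes.decode("utf-8").encode("gb18030")
print(len(utf8_bytes), len(gb_bytes))  # 6 4

# Decoding with the wrong codec raises, as in the tracebacks shown earlier.
try:
    utf8_bytes.decode("ascii")
except UnicodeDecodeError as err:
    print("decode failed:", err.reason)
```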
- The unicode function
- A Python built-in that converts a string from the 'charset' encoding to unicode
- unicode (message, charset)
- encode handles unicode --> str
- unicode('中文字符串', 'gbk').encode('gb18030')
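- Syntax checking
- Besides checking for syntax errors, PyLint also makes many suggestions for improvement, for example flagging indentation that mixes tabs and spaces
- PyLint website: http://www.logilab.org/projects/pylint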