DBF is a file format used by databases such as dBase, Visual FoxPro, and FoxBase+. This library reads DBF files and returns the data as native Python data types for further processing. It is mainly intended for batch jobs and one-off scripts.
>>> from dbfread import DBF
>>> for record in DBF('people.dbf'):
...     print(record)
OrderedDict([('NAME', 'Alice'), ('BIRTHDATE', datetime.date(1987, 3, 1))])
OrderedDict([('NAME', 'Bob'), ('BIRTHDATE', datetime.date(1980, 11, 12))])
By default, records are streamed directly from the file. If you have enough memory, you can instead load them into a list, which allows random access:
>>> table = DBF('people.dbf', load=True)
>>> print(table.records[1]['NAME'])
Bob
>>> print(table.records[0]['NAME'])
Alice
Full documentation is available at https://dbfread.readthedocs.io/
For a full list of changes in each release, see docs/changes.rst.
Written for Python 3, but also works with 2.7.
Simple but flexible API.
Requires Python 3.2 or 2.7.
pip install dbfread
dbfread is a pure Python module and does not depend on any packages outside the standard library.
To build the documentation locally:
python setup.py docs
This requires Sphinx. The resulting files can be found in docs/_build/.
Latest stable release: http://github.com/olemb/dbfread/
Development version: http://github.com/olemb/dbfread/tree/develop/
As of version 2.0, dbfread.open() and dbfread.read() are deprecated and will be removed in 2.2.
The DBF class is no longer a subclass of list. This makes the API cleaner and easier to understand, but old code that relies on that behavior will break. Iteration and record counting work the same as before. Other list operations can be rewritten using the records attribute. For example:
table = dbfread.read('people.dbf')
print(table[1])
can be rewritten as:
table = DBF('people.dbf', load=True)
print(table.records[1])
Both open() and read() return DeprecatedDBF, a subclass of both DBF and list, so they remain backward compatible.
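As a minimal sketch of what that backward compatibility looks like (assuming dbfread 2.0/2.1, where the deprecated calls still exist, and the same people.dbf as above):

import dbfread
from dbfread import DBF

old_table = dbfread.read('people.dbf')     # DeprecatedDBF: still behaves like a list
print(old_table[1])                        # old list-style indexing keeps working

new_table = DBF('people.dbf', load=True)   # the replacement API
print(new_table.records[1])                # explicit access through the records attribute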
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from dbfread import DBF
def read_dbf(dbf_file):
    # table = DBF(dbf_file)
    table = DBF(dbf_file, encoding='gbk')    # specify the encoding explicitly
    for record in table:                     # records without the deletion flag
        # for record in table.deleted:       # iterate deleted records instead
        # print(record)
        for field in record:
            print("%s = %s" % (field, record[field]), end=", ")
        break                                # only look at the first record
    print(len(table))                        # number of records

if __name__ == '__main__':
    dbf_file = 'dbf.dbf'
    read_dbf(dbf_file)
I came across the following code in a search. It is fairly lightweight, and even though it is roughly ten years old by now, I have not found any major problems with it, and it is convenient to debug:
Created by Raymond Hettinger on Tue, 11 Jan 2005 (PSF) (Licensed under the PSF License)
Reader iterates over records in Dbase or Xbase files. Writer creates dbf files from Python sequences.
import struct, datetime, decimal, itertools
def dbfreader(f):
    """Returns an iterator over records in a Xbase DBF file.

    The first row returned contains the field names.
    The second row contains field specs: (type, size, decimal places).
    Subsequent rows contain the data records.
    If a record is marked as deleted, it is skipped.

    File should be opened for binary reads.
    """
    # See DBF format spec at:
    #     http://www.pgts.com.au/download/public/xbase.htm#DBF_STRUCT
    numrec, lenheader = struct.unpack('<xxxxLH22x', f.read(32))
    numfields = (lenheader - 33) // 32

    fields = []
    for fieldno in xrange(numfields):
        name, typ, size, deci = struct.unpack('<11sc4xBB14x', f.read(32))
        name = name.replace('\0', '')       # eliminate NULs from string
        fields.append((name, typ, size, deci))
    yield [field[0] for field in fields]
    yield [tuple(field[1:]) for field in fields]

    terminator = f.read(1)
    assert terminator == '\r'

    fields.insert(0, ('DeletionFlag', 'C', 1, 0))
    fmt = ''.join(['%ds' % fieldinfo[2] for fieldinfo in fields])
    fmtsiz = struct.calcsize(fmt)
    for i in xrange(numrec):
        record = struct.unpack(fmt, f.read(fmtsiz))
        if record[0] != ' ':
            continue                        # deleted record
        result = []
        for (name, typ, size, deci), value in itertools.izip(fields, record):
            if name == 'DeletionFlag':
                continue
            if typ == "N":
                value = value.replace('\0', '').lstrip()
                if value == '':
                    value = 0
                elif deci:
                    value = decimal.Decimal(value)
                else:
                    value = int(value)
            elif typ == 'D':
                y, m, d = int(value[:4]), int(value[4:6]), int(value[6:8])
                value = datetime.date(y, m, d)
            elif typ == 'L':
                value = (value in 'YyTt' and 'T') or (value in 'NnFf' and 'F') or '?'
            elif typ == 'F':
                value = float(value)
            result.append(value)
        yield result
def dbfwriter(f, fieldnames, fieldspecs, records):
    """Writes the given field names, field specs, and records to f as a binary dbf file.

    File f should be open for writing in a binary mode.

    Fieldnames should be no longer than ten characters and not include \x00.
    Fieldspecs are in the form (type, size, deci) where
        type is one of:
            C for ascii character data
            M for ascii character memo data (real memo fields not supported)
            D for datetime objects
            N for ints or decimal objects
            L for logical values 'T', 'F', or '?'
        size is the field width
        deci is the number of decimal places in the provided decimal object
    Records can be an iterable over the records (sequences of field values).
    """
    # header info
    ver = 3
    now = datetime.datetime.now()
    yr, mon, day = now.year - 1900, now.month, now.day
    numrec = len(records)
    numfields = len(fieldspecs)
    lenheader = numfields * 32 + 33
    lenrecord = sum(field[1] for field in fieldspecs) + 1
    hdr = struct.pack('<BBBBLHH20x', ver, yr, mon, day, numrec, lenheader, lenrecord)
    f.write(hdr)

    # field specs
    for name, (typ, size, deci) in itertools.izip(fieldnames, fieldspecs):
        name = name.ljust(11, '\x00')
        fld = struct.pack('<11sc4xBB14x', name, typ, size, deci)
        f.write(fld)

    # terminator
    f.write('\r')

    # records
    for record in records:
        f.write(' ')                        # deletion flag
        for (typ, size, deci), value in itertools.izip(fieldspecs, record):
            if typ == "N":
                value = str(value).rjust(size, ' ')
            elif typ == 'D':
                value = value.strftime('%Y%m%d')
            elif typ == 'L':
                value = str(value)[0].upper()
            else:
                value = str(value)[:size].ljust(size, ' ')
            assert len(value) == size
            f.write(value)

    # End of file
    f.write('\x1A')
# -------------------------------------------------------
# Example calls
if __name__ == '__main__':
    import sys, csv
    from cStringIO import StringIO
    from operator import itemgetter

    # Read a database
    filename = '/pydev/databases/orders.dbf'
    if len(sys.argv) == 2:
        filename = sys.argv[1]
    f = open(filename, 'rb')
    db = list(dbfreader(f))
    f.close()
    for record in db:
        print record
    fieldnames, fieldspecs, records = db[0], db[1], db[2:]

    # Alter the database
    del records[4]
    records.sort(key=itemgetter(4))

    # Remove a field
    del fieldnames[0]
    del fieldspecs[0]
    records = [rec[1:] for rec in records]

    # Create a new DBF
    f = StringIO()
    dbfwriter(f, fieldnames, fieldspecs, records)

    # Read the data back from the new DBF
    print '-' * 20
    f.seek(0)
    for line in dbfreader(f):
        print line
    f.close()

    # Convert to CSV
    print '.' * 20
    f = StringIO()
    csv.writer(f).writerow(fieldnames)
    csv.writer(f).writerows(records)
    print f.getvalue()
    f.close()
# Example Output
"""
['ORDER_ID', 'CUSTMR_ID', 'EMPLOY_ID', 'ORDER_DATE', 'ORDER_AMT']
[('C', 10, 0), ('C', 11, 0), ('C', 11, 0), ('D', 8, 0), ('N', 12, 2)]
['10005 ', 'WALNG ', '555 ', datetime.date(1995, 5, 22), Decimal("173.40")]
['10004 ', 'BMARK ', '777 ', datetime.date(1995, 5, 18), Decimal("3194.20")]
['10029 ', 'SAWYH ', '777 ', datetime.date(1995, 6, 29), Decimal("97.30")]
['10013 ', 'RITEB ', '777 ', datetime.date(1995, 6, 2), Decimal("560.40")]
['10024 ', 'RATTC ', '444 ', datetime.date(1995, 6, 21), Decimal("2223.50")]
['10018 ', 'RATTC ', '444 ', datetime.date(1995, 6, 12), Decimal("1076.05")]
['10025 ', 'RATTC ', '444 ', datetime.date(1995, 6, 23), Decimal("185.80")]
['10038 ', 'OLDWO ', '111 ', datetime.date(1995, 7, 14), Decimal("863.96")]
['10002 ', 'MTIME ', '333 ', datetime.date(1995, 5, 16), Decimal("731.80")]
['10007 ', 'MORNS ', '444 ', datetime.date(1995, 5, 24), Decimal("1405.00")]
['10026 ', 'MORNS ', '555 ', datetime.date(1995, 6, 26), Decimal("17.40")]
['10030 ', 'LILLO ', '111 ', datetime.date(1995, 7, 3), Decimal("909.91")]
['10022 ', 'LAPLA ', '111 ', datetime.date(1995, 6, 19), Decimal("671.50")]
['10035 ', 'HIGHG ', '111 ', datetime.date(1995, 7, 11), Decimal("1984.83")]
['10033 ', 'FOODG ', '333 ', datetime.date(1995, 7, 6), Decimal("3401.32")]
--------------------
['CUSTMR_ID', 'EMPLOY_ID', 'ORDER_DATE', 'ORDER_AMT']
[('C', 11, 0), ('C', 11, 0), ('D', 8, 0), ('N', 12, 2)]
['MORNS ', '555 ', datetime.date(1995, 6, 26), Decimal("17.40")]
['SAWYH ', '777 ', datetime.date(1995, 6, 29), Decimal("97.30")]
['WALNG ', '555 ', datetime.date(1995, 5, 22), Decimal("173.40")]
['RATTC ', '444 ', datetime.date(1995, 6, 23), Decimal("185.80")]
['RITEB ', '777 ', datetime.date(1995, 6, 2), Decimal("560.40")]
['LAPLA ', '111 ', datetime.date(1995, 6, 19), Decimal("671.50")]
['MTIME ', '333 ', datetime.date(1995, 5, 16), Decimal("731.80")]
['OLDWO ', '111 ', datetime.date(1995, 7, 14), Decimal("863.96")]
['LILLO ', '111 ', datetime.date(1995, 7, 3), Decimal("909.91")]
['RATTC ', '444 ', datetime.date(1995, 6, 12), Decimal("1076.05")]
['MORNS ', '444 ', datetime.date(1995, 5, 24), Decimal("1405.00")]
['HIGHG ', '111 ', datetime.date(1995, 7, 11), Decimal("1984.83")]
['BMARK ', '777 ', datetime.date(1995, 5, 18), Decimal("3194.20")]
['FOODG ', '333 ', datetime.date(1995, 7, 6), Decimal("3401.32")]
....................
CUSTMR_ID,EMPLOY_ID,ORDER_DATE,ORDER_AMT
MORNS ,555 ,1995-06-26,17.40
SAWYH ,777 ,1995-06-29,97.30
WALNG ,555 ,1995-05-22,173.40
RATTC ,444 ,1995-06-23,185.80
RITEB ,777 ,1995-06-02,560.40
LAPLA ,111 ,1995-06-19,671.50
MTIME ,333 ,1995-05-16,731.80
OLDWO ,111 ,1995-07-14,863.96
LILLO ,111 ,1995-07-03,909.91
RATTC ,444 ,1995-06-12,1076.05
MORNS ,444 ,1995-05-24,1405.00
HIGHG ,111 ,1995-07-11,1984.83
BMARK ,777 ,1995-05-18,3194.20
FOODG ,333 ,1995-07-06,3401.32
"""
The dbf file format is old, precise, commonplace, and widely supported by everything from calendar software in PDAs, to contact managers, to Excel and Access. It can be a good way to get Python to interoperate with pre-existing, non-Python apps.
Tim Morgan 18 years, 7 months ago
Visual FoxPro. With a Visual FoxPro dbf file I get an AssertionError regarding the terminator.
>>> db = list(dbfreader(f))
Traceback (most recent call last):
File "", line 1, in ?
File "dbf.py", line 29, in dbfreader
assert terminator == '\r'
AssertionError
Can your recipe be modified easily to work with Visual FoxPro?
Marco Molinari 18 years, 6 months ago
Small typo. There is a small typo in the example code: the main function is written twice, and the second "version" has a bug in it (a missing argument in dbfwriter()); the first version works, though.
Anthony Dycks 18 years, 3 months ago
yield statement syntax error using ActiveState Python V2.2. Attempting to run the DBF reader portion of the code results in the following invalid syntax error ...
yield [field[0] for field in fields]
An accompanying warning message indicates that 'yield' will become a reserved word.
Chris Arndt 18 years, 1 month ago
Generators in Python 2.2. Generators were still an optional feature in Python 2.2.
Just add "from future import generators" to the top of the module.
hamish steiner 17 years, 11 months ago
1 byte out.... Great code; as a newbie I would love to write code this tight. When I ran this on a DBF file it was one byte out, and adding a new line to grab an extra byte sorted this out.
terminator = f.read(1)
assert terminator == '\r'
fields.insert(0, ('DeletionFlag', 'C', 1, 0))
fmt = ''.join(['%ds' % fieldinfo[2] for fieldinfo in fields])
fmtsiz = struct.calcsize(fmt)
hamish = f.read(1)    # ---- NEW LINE
Then I had no problems. I don't know if this is due to an extra byte after the terminator '\r'? Anyway, that helped.
The other problem I had was empty dates. If the value was ' ' then I would get errors, i.e. a = int(' ') raises an error.
Changing the code for the date ('D') field to this
elif typ == 'D':
    y, m, d = value[:4], value[4:6], value[6:8]
    value = d + "/" + m + "/" + y
worked and gave me the date in a d/m/y format. If the string was ' ' then I get ' / / ', which is OK.
Hamish
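A minimal alternative sketch for the same blank-date problem (not from the recipe or the thread) keeps datetime.date objects and yields None for empty date fields:

elif typ == 'D':
    value = value.strip()
    if value:
        y, m, d = int(value[:4]), int(value[4:6]), int(value[6:8])
        value = datetime.date(y, m, d)
    else:
        value = None                        # blank date field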
Alex Baraev 17 years, 10 months ago
Note that file handles passed to dbfwriter must be opened in 'wb' mode, especially on Windows, because in text mode the write method will convert 0x0A to 0x0D 0x0A (LF to CRLF) and you'll get a corrupted DBF.
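A minimal sketch of that point, reusing the recipe's dbfwriter with a made-up output file name:

out = open('orders_out.dbf', 'wb')          # binary mode, never plain 'w'
dbfwriter(out, fieldnames, fieldspecs, records)
out.close()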
Dennis Christopher 16 years, 9 months ago
Problem writing a dbf file to disk. I am butting my head against a wall trying to do something very simple with this: write a dbf file to disk (on XP), e.g. by modifying the sample script:
filename = 'c:\MyDBF.dbf'
g = open(filename, 'wb')
f = StringIO()
dbfwriter(f, fieldnames, fieldspecs, records)
g.write(f.getvalue())
g.flush()
Access doesn't like the resulting file, and the dbfreader function also complains about it.
Any insight would be appreciated.
Dennis Christopher
Yusdi Santoso 16 years, 1 month ago
Python code for reading Visual FoxPro DBF. This is a Python module that reads Visual FoxPro DBF files with memos:
http://www.physics.ox.ac.uk/users/santoso/Software.Repository.html
Hope this can be useful.
tyrion-mx 14 years, 11 months ago
I have made a simple lib to read DBF files that looks like the Django db/model API. It supports searching the databases, creating new databases from a model definition, and creating a model from an existing database. There is also a lower-level interface if you don't like the "Django-like" syntax. It's still in the early stages of development; I hope to have the time to finish it.
david SHI 14 years, 7 months ago
How come I cannot update or delete a record?
david SHI 14 years, 7 months ago
Can you make it generic?
Andrew Leonard 12 years, 8 months ago
I recently found this code, and I am using it to read ArcMap shapefile dbf files. When I read in the data using the dbfreader function, some of the fields come back as variable-length strings of '*'. I don't know why the code is doing this; I think it may be the data, but Excel and ArcMap read the data perfectly fine. Some of the data columns are over the character length, which may be the problem. Any help would be greatly appreciated, and I can provide the data that is giving issues too.
Thanks, Andrew
John Machin 12 years, 8 months ago
To Andrew Leonard: I have a non-published DBF reader module which may be able to help you. You can find my e-mail address by google_search("John Machin xlrd").
Raymond Hettinger (author) 12 years, 4 months ago
New link to the spec: http://www.clicketyclick.dk/databases/xbase/format/dbf.html#DBF_STRUCT
Ken MacDonald 11 years, 1 month ago
We are using the code above with a few mods for dumping CodeBase files, which has been a HUGE help for our development. However, CodeBase can create character fields with lengths greater than 255, and the above code chokes to death on that. This may be the solution to the problem Andrew Leonard mentions above, since he mentions over-length data columns.
As it turns out, the following line:
fmt = ''.join(['%ds' % fieldinfo[2] for fieldinfo in fields])
should look like:
fmt = ''.join(['%ds' % (fieldinfo[2] + (fieldinfo[3] * 256)) for fieldinfo in fields])
as fieldinfo[3] is the high-order byte of the field length according to the spec at clicketyclick.dk (see the comment above for the URL). Once the above change is in place, files with long character fields read/dump properly.
CAVEAT: the spec mentions that this is the format of the high-order byte for FoxPro and Clipper files for NON-NUMERIC data; the original posted code (comment, line 7) indicates that for some numeric types fieldinfo[3] may be the number of decimal places. You may have to write some additional code if you have BOTH numeric and long character fields in your table. I didn't have any tables with decimal data to verify whether or not this is the case.
Good luck!
Ken
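A rough sketch of that caveat (untested against FoxPro/Clipper tables): apply the high-order byte only to character fields and keep the plain width for everything else.

def field_width(typ, size, deci):
    # For 'C' fields, deci holds the high byte of the length (FoxPro/Clipper);
    # for numeric fields it is the number of decimal places, so use size as-is.
    if typ == 'C':
        return size + deci * 256
    return size

fmt = ''.join(['%ds' % field_width(typ, size, deci)
               for (name, typ, size, deci) in fields])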
Tomas Nordin 9 years, 1 month ago
For my needs (reading in a dbf file), this awesome function just replaced a dependency on a 6000+ line, Java-style module I was using before. Loading a file (126K, 8) took a bit over 3 seconds, while with that module it took just under 6. So the function is fast. Awesome and fast.
What's more, the module I was using forced field names to lowercase, and it also refused duplicate field names (even when reading!), raising an error. This function does no such thing. So it's also beautiful.