DBF is a file format used by databases such as dBase, Visual FoxPro, and FoxBase+. This library reads DBF files and returns the data as native Python data types for further processing. It is mainly intended for batch jobs and one-off scripts.
>>> from dbfread import DBF
>>> for record in DBF('people.dbf'):
...     print(record)
OrderedDict([('NAME', 'Alice'), ('BIRTHDATE', datetime.date(1987, 3, 1))])
OrderedDict([('NAME', 'Bob'), ('BIRTHDATE', datetime.date(1980, 11, 12))])
By default, records are streamed directly from the file. If you have enough memory, you can instead load them into a list, which allows random access:
>>> table = DBF('people.dbf', load=True)
>>> print(table.records[1]['NAME'])
Bob
>>> print(table.records[0]['NAME'])
Alice
Full documentation is available at https://dbfread.readthedocs.io/
For a full list of changes in each release, see docs/changes.rst.
Written for Python 3, but also works with 2.7.
Simple but flexible API.
Requires Python 3.2 or 2.7.
pip install dbfread
dbfread is a pure Python module and does not depend on any packages outside the standard library.
To build the documentation locally:
python setup.py docs
This requires Sphinx. The resulting files can be found in docs/_build/.
Latest stable release: http://github.com/olemb/dbfread/
Development version: http://github.com/olemb/dbfread/tree/develop/
As of version 2.0, dbfread.open() and dbfread.read() are deprecated and will be removed in 2.2.
The DBF class is no longer a subclass of list. This makes the API cleaner and easier to understand, but old code that relies on that behavior will break. Iteration and record counting work the same as before. Other list operations can be rewritten using the records attribute. For example:
table = dbfread.read('people.dbf')
print(table[1])
can be rewritten as:
table = DBF('people.dbf', load=True)
print(table.records[1])
Both open() and read() return DeprecatedDBF, a subclass of both DBF and list, so they remain backward compatible.
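As a minimal sketch of what that backward compatibility looks like (assuming dbfread 2.0/2.1, where the deprecated calls still exist, and the same people.dbf as above):

import dbfread
from dbfread import DBF

old_table = dbfread.read('people.dbf')     # DeprecatedDBF: still behaves like a list
print(old_table[1])                        # old list-style indexing keeps working

new_table = DBF('people.dbf', load=True)   # the replacement API
print(new_table.records[1])                # explicit access through the records attribute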
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from dbfread import DBF
def read_dbf(dbf_file):
    # table = DBF(dbf_file)
    table = DBF(dbf_file, encoding='gbk')    # specify the encoding explicitly
    for record in table:                     # records without the deletion flag
        # for record in table.deleted:       # iterate deleted records instead
        # print(record)
        for field in record:
            print("%s = %s" % (field, record[field]), end=", ")
        break                                # only look at the first record
    print(len(table))                        # number of records

if __name__ == '__main__':
    dbf_file = 'dbf.dbf'
    read_dbf(dbf_file)
I came across the following code in a search. It is fairly lightweight, and even though it is roughly ten years old by now, I have not found any major problems with it, and it is convenient to debug:
Created by Raymond Hettinger on Tue, 11 Jan 2005 (PSF) (Licensed under the PSF License)
Reader iterates over records in Dbase or Xbase files. Writer creates dbf files from Python sequences.
import struct, datetime, decimal, itertools
def dbfreader(f):
    """Returns an iterator over records in a Xbase DBF file.

    The first row returned contains the field names.
    The second row contains field specs: (type, size, decimal places).
    Subsequent rows contain the data records.
    If a record is marked as deleted, it is skipped.

    File should be opened for binary reads.
    """
    # See DBF format spec at:
    #     http://www.pgts.com.au/download/public/xbase.htm#DBF_STRUCT
    numrec, lenheader = struct.unpack('<xxxxLH22x', f.read(32))
    numfields = (lenheader - 33) // 32

    fields = []
    for fieldno in xrange(numfields):
        name, typ, size, deci = struct.unpack('<11sc4xBB14x', f.read(32))
        name = name.replace('\0', '')       # eliminate NULs from string
        fields.append((name, typ, size, deci))
    yield [field[0] for field in fields]
    yield [tuple(field[1:]) for field in fields]

    terminator = f.read(1)
    assert terminator == '\r'

    fields.insert(0, ('DeletionFlag', 'C', 1, 0))
    fmt = ''.join(['%ds' % fieldinfo[2] for fieldinfo in fields])
    fmtsiz = struct.calcsize(fmt)
    for i in xrange(numrec):
        record = struct.unpack(fmt, f.read(fmtsiz))
        if record[0] != ' ':
            continue                        # deleted record
        result = []
        for (name, typ, size, deci), value in itertools.izip(fields, record):
            if name == 'DeletionFlag':
                continue
            if typ == "N":
                value = value.replace('\0', '').lstrip()
                if value == '':
                    value = 0
                elif deci:
                    value = decimal.Decimal(value)
                else:
                    value = int(value)
            elif typ == 'D':
                y, m, d = int(value[:4]), int(value[4:6]), int(value[6:8])
                value = datetime.date(y, m, d)
            elif typ == 'L':
                value = (value in 'YyTt' and 'T') or (value in 'NnFf' and 'F') or '?'
            elif typ == 'F':
                value = float(value)
            result.append(value)
        yield result
def dbfwriter(f, fieldnames, fieldspecs, records):
    """Writes the given field names, field specs, and records to f as a binary dbf file.

    File f should be open for writing in a binary mode.

    Fieldnames should be no longer than ten characters and not include \x00.
    Fieldspecs are in the form (type, size, deci) where
        type is one of:
            C for ascii character data
            M for ascii character memo data (real memo fields not supported)
            D for datetime objects
            N for ints or decimal objects
            L for logical values 'T', 'F', or '?'
        size is the field width
        deci is the number of decimal places in the provided decimal object
    Records can be an iterable over the records (sequences of field values).
    """
    # header info
    ver = 3
    now = datetime.datetime.now()
    yr, mon, day = now.year - 1900, now.month, now.day
    numrec = len(records)
    numfields = len(fieldspecs)
    lenheader = numfields * 32 + 33
    lenrecord = sum(field[1] for field in fieldspecs) + 1
    hdr = struct.pack('<BBBBLHH20x', ver, yr, mon, day, numrec, lenheader, lenrecord)
    f.write(hdr)

    # field specs
    for name, (typ, size, deci) in itertools.izip(fieldnames, fieldspecs):
        name = name.ljust(11, '\x00')
        fld = struct.pack('<11sc4xBB14x', name, typ, size, deci)
        f.write(fld)

    # terminator
    f.write('\r')

    # records
    for record in records:
        f.write(' ')                        # deletion flag
        for (typ, size, deci), value in itertools.izip(fieldspecs, record):
            if typ == "N":
                value = str(value).rjust(size, ' ')
            elif typ == 'D':
                value = value.strftime('%Y%m%d')
            elif typ == 'L':
                value = str(value)[0].upper()
            else:
                value = str(value)[:size].ljust(size, ' ')
            assert len(value) == size
            f.write(value)

    # End of file
    f.write('\x1A')
# -------------------------------------------------------
# Example calls
if __name__ == '__main__':
    import sys, csv
    from cStringIO import StringIO
    from operator import itemgetter

    # Read a database
    filename = '/pydev/databases/orders.dbf'
    if len(sys.argv) == 2:
        filename = sys.argv[1]
    f = open(filename, 'rb')
    db = list(dbfreader(f))
    f.close()
    for record in db:
        print record
    fieldnames, fieldspecs, records = db[0], db[1], db[2:]

    # Alter the database
    del records[4]
    records.sort(key=itemgetter(4))

    # Remove a field
    del fieldnames[0]
    del fieldspecs[0]
    records = [rec[1:] for rec in records]

    # Create a new DBF
    f = StringIO()
    dbfwriter(f, fieldnames, fieldspecs, records)

    # Read the data back from the new DBF
    print '-' * 20
    f.seek(0)
    for line in dbfreader(f):
        print line
    f.close()

    # Convert to CSV
    print '.' * 20
    f = StringIO()
    csv.writer(f).writerow(fieldnames)
    csv.writer(f).writerows(records)
    print f.getvalue()
    f.close()
# Example Output
"""
['ORDER_ID', 'CUSTMR_ID', 'EMPLOY_ID', 'ORDER_DATE', 'ORDER_AMT']
[('C', 10, 0), ('C', 11, 0), ('C', 11, 0), ('D', 8, 0), ('N', 12, 2)]
['10005 ', 'WALNG ', '555 ', datetime.date(1995, 5, 22), Decimal("173.40")]
['10004 ', 'BMARK ', '777 ', datetime.date(1995, 5, 18), Decimal("3194.20")]
['10029 ', 'SAWYH ', '777 ', datetime.date(1995, 6, 29), Decimal("97.30")]
['10013 ', 'RITEB ', '777 ', datetime.date(1995, 6, 2), Decimal("560.40")]
['10024 ', 'RATTC ', '444 ', datetime.date(1995, 6, 21), Decimal("2223.50")]
['10018 ', 'RATTC ', '444 ', datetime.date(1995, 6, 12), Decimal("1076.05")]
['10025 ', 'RATTC ', '444 ', datetime.date(1995, 6, 23), Decimal("185.80")]
['10038 ', 'OLDWO ', '111 ', datetime.date(1995, 7, 14), Decimal("863.96")]
['10002 ', 'MTIME ', '333 ', datetime.date(1995, 5, 16), Decimal("731.80")]
['10007 ', 'MORNS ', '444 ', datetime.date(1995, 5, 24), Decimal("1405.00")]
['10026 ', 'MORNS ', '555 ', datetime.date(1995, 6, 26), Decimal("17.40")]
['10030 ', 'LILLO ', '111 ', datetime.date(1995, 7, 3), Decimal("909.91")]
['10022 ', 'LAPLA ', '111 ', datetime.date(1995, 6, 19), Decimal("671.50")]
['10035 ', 'HIGHG ', '111 ', datetime.date(1995, 7, 11), Decimal("1984.83")]
['10033 ', 'FOODG ', '333 ', datetime.date(1995, 7, 6), Decimal("3401.32")]
--------------------
['CUSTMR_ID', 'EMPLOY_ID', 'ORDER_DATE', 'ORDER_AMT']
[('C', 11, 0), ('C', 11, 0), ('D', 8, 0), ('N', 12, 2)]
['MORNS ', '555 ', datetime.date(1995, 6, 26), Decimal("17.40")]
['SAWYH ', '777 ', datetime.date(1995, 6, 29), Decimal("97.30")]
['WALNG ', '555 ', datetime.date(1995, 5, 22), Decimal("173.40")]
['RATTC ', '444 ', datetime.date(1995, 6, 23), Decimal("185.80")]
['RITEB ', '777 ', datetime.date(1995, 6, 2), Decimal("560.40")]
['LAPLA ', '111 ', datetime.date(1995, 6, 19), Decimal("671.50")]
['MTIME ', '333 ', datetime.date(1995, 5, 16), Decimal("731.80")]
['OLDWO ', '111 ', datetime.date(1995, 7, 14), Decimal("863.96")]
['LILLO ', '111 ', datetime.date(1995, 7, 3), Decimal("909.91")]
['RATTC ', '444 ', datetime.date(1995, 6, 12), Decimal("1076.05")]
['MORNS ', '444 ', datetime.date(1995, 5, 24), Decimal("1405.00")]
['HIGHG ', '111 ', datetime.date(1995, 7, 11), Decimal("1984.83")]
['BMARK ', '777 ', datetime.date(1995, 5, 18), Decimal("3194.20")]
['FOODG ', '333 ', datetime.date(1995, 7, 6), Decimal("3401.32")]
....................
CUSTMR_ID,EMPLOY_ID,ORDER_DATE,ORDER_AMT
MORNS ,555 ,1995-06-26,17.40
SAWYH ,777 ,1995-06-29,97.30
WALNG ,555 ,1995-05-22,173.40
RATTC ,444 ,1995-06-23,185.80
RITEB ,777 ,1995-06-02,560.40
LAPLA ,111 ,1995-06-19,671.50
MTIME ,333 ,1995-05-16,731.80
OLDWO ,111 ,1995-07-14,863.96
LILLO ,111 ,1995-07-03,909.91
RATTC ,444 ,1995-06-12,1076.05
MORNS ,444 ,1995-05-24,1405.00
HIGHG ,111 ,1995-07-11,1984.83
BMARK ,777 ,1995-05-18,3194.20
FOODG ,333 ,1995-07-06,3401.32
"""
The dbf file format is old, precise, commonplace, and widely supported by everything from calendar software in PDAs, to contact managers, to Excel and Access. It can be a good way to get Python to interoperate with pre-existing, non-Python apps.
Tim Morgan 18 years, 7 months ago
Visual FoxPro. With a Visual FoxPro dbf file I get an AssertionError regarding the terminator.
>>> db = list(dbfreader(f))
Traceback (most recent call last):
File "", line 1, in ?
File "dbf.py", line 29, in dbfreader
assert terminator == '\r'
AssertionError
Can your recipe be modified easily to work with Visual FoxPro?
Marco Molinari 18 years, 6 months ago
Small typo. There is a small typo in the example code: the main function is written twice, and the second "version" has a bug in it (a missing argument in dbfwriter()); the first version works, though.
Anthony Dycks 18 years, 3 months ago
yield statement syntax error using ActiveState Python V2.2. Attempting to run the DBF reader portion of the code results in the following invalid syntax error ...
yield [field[0] for field in fields]
An accompanying warning message indicates that 'yield' will become a reserved word.
Chris Arndt 18 years, 1 month ago
Generators in Python 2.2. Generators were still an optional feature in Python 2.2.
Just add "from future import generators" to the top of the module.
hamish steiner 17 years, 11 months ago
1 byte out.... Great code; as a newbie I would love to write code this tight. When I ran this on a DBF file it was one byte out, and adding a new line to grab an extra byte sorted this out.
terminator = f.read(1)
assert terminator == '\r'
fields.insert(0, ('DeletionFlag', 'C', 1, 0))
fmt = ''.join(['%ds' % fieldinfo[2] for fieldinfo in fields])
fmtsiz = struct.calcsize(fmt)
hamish = f.read(1)    # ---- NEW LINE
Then I had no problems. I don't know if this is due to an extra byte after the terminator '\r'? Anyway, that helped.
The other problem I had was empty dates. If the value was ' ' then I would get errors, i.e. a = int(' ') raises an error.
Changing the code for the date ('D') field to this
elif typ == 'D':
    y, m, d = value[:4], value[4:6], value[6:8]
    value = d + "/" + m + "/" + y
worked and gave me the date in a d/m/y format. If the string was ' ' then I get ' / / ', which is OK.
Hamish
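A minimal alternative sketch for the same blank-date problem (not from the recipe or the thread) keeps datetime.date objects and yields None for empty date fields:

elif typ == 'D':
    value = value.strip()
    if value:
        y, m, d = int(value[:4]), int(value[4:6]), int(value[6:8])
        value = datetime.date(y, m, d)
    else:
        value = None                        # blank date field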
Alex Baraev 17 years, 10 months ago
Note that file handles passed to dbfwriter must be opened in 'wb' mode, especially on Windows, because in text mode the write method will convert 0x0A to 0x0D 0x0A (LF to CRLF) and you'll get a corrupted DBF.
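A minimal sketch of that point, reusing the recipe's dbfwriter with a made-up output file name:

out = open('orders_out.dbf', 'wb')          # binary mode, never plain 'w'
dbfwriter(out, fieldnames, fieldspecs, records)
out.close()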
Dennis Christopher 16 years, 9 months ago
Problem writing a dbf file to disk. I am butting my head against a wall trying to do something very simple with this: write a dbf file to disk (on XP), e.g. by modifying the sample script:
filename = 'c:\MyDBF.dbf'
g = open(filename, 'wb')
f = StringIO()
dbfwriter(f, fieldnames, fieldspecs, records)
g.write(f.getvalue())
g.flush()
Access doesn't like the resulting file, and the dbfreader function also complains about it.
Any insight would be appreciated.
Dennis Christopher
Yusdi Santoso 16 years, 1 month ago
Python code for reading Visual FoxPro DBF. This is a Python module that reads Visual FoxPro DBF files with memos:
http://www.physics.ox.ac.uk/users/santoso/Software.Repository.html
Hope this can be useful.
tyrion-mx 14 years, 11 months ago
I have made a simple lib to read DBF files that looks like the Django db/model API. It supports searching the databases, creating new databases from a model definition, and creating a model from an existing database. There is also a lower-level interface if you don't like the "Django-like" syntax. It's still in the early stages of development; I hope to have the time to finish it.
david SHI 14 years, 7 months ago
How come I cannot update or delete a record?
david SHI 14 years, 7 months ago
Can you make it generic?
Andrew Leonard 12 years, 8 months ago
I recently found this code, and I am using it to read ArcMap shapefile dbf files. When I read in the data using the dbfreader function, some of the fields come back as variable-length strings of '*'. I don't know why the code is doing this; I think it may be the data, but Excel and ArcMap read the data perfectly fine. Some of the data columns are over the character length, which may be the problem. Any help would be greatly appreciated, and I can provide the data that is giving issues too.
Thanks, Andrew
John Machin 12 years, 8 months ago
To Andrew Leonard: I have a non-published DBF reader module which may be able to help you. You can find my e-mail address by google_search("John Machin xlrd").
Raymond Hettinger (author) 12 years, 4 months ago
New link to the spec: http://www.clicketyclick.dk/databases/xbase/format/dbf.html#DBF_STRUCT
Ken MacDonald 11 years, 1 month ago
We are using the code above with a few mods for dumping CodeBase files, which has been a HUGE help for our development. However, CodeBase can create character fields with lengths greater than 255, and the above code chokes to death on that. This may be the solution to the problem Andrew Leonard mentions above, since he mentions over-length data columns.
As it turns out, the following line:
fmt = ''.join(['%ds' % fieldinfo[2] for fieldinfo in fields])
should look like:
fmt = ''.join(['%ds' % (fieldinfo[2] + (fieldinfo[3] * 256)) for fieldinfo in fields])
as fieldinfo[3] is the high-order byte of the field length according to the spec at clicketyclick.dk (see the comment above for the URL). Once the above change is in place, files with long character fields read/dump properly.
CAVEAT: the spec mentions that this is the format of the high-order byte for FoxPro and Clipper files for NON-NUMERIC data; the original posted code (comment, line 7) indicates that for some numeric types fieldinfo[3] may be the number of decimal places. You may have to write some additional code if you have BOTH numeric and long character fields in your table. I didn't have any tables with decimal data to verify whether or not this is the case.
Good luck!
Ken
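A rough sketch of that caveat (untested against FoxPro/Clipper tables): apply the high-order byte only to character fields and keep the plain width for everything else.

def field_width(typ, size, deci):
    # For 'C' fields, deci holds the high byte of the length (FoxPro/Clipper);
    # for numeric fields it is the number of decimal places, so use size as-is.
    if typ == 'C':
        return size + deci * 256
    return size

fmt = ''.join(['%ds' % field_width(typ, size, deci)
               for (name, typ, size, deci) in fields])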
Tomas Nordin 9 years, 1 month ago
For my needs (reading in a dbf file), this awesome function just replaced a dependency on a 6000+ line, Java-style module I was using before. Loading a file (126K, 8) took a bit over 3 seconds, while with that module it took just under 6. So the function is fast. Awesome and fast.
What's more, the module I was using forced field names to lowercase, and it also refused duplicate field names (even when reading!), raising an error. This function does no such thing. So it's also beautiful.