P.J. LaBrocca
Revision 1.0, September 1999
p@labrocca.com |
This document is a detailed description of PeekDBF1, an application for peeking into and converting DBF files generated by database management systems such as dBase III+ (and up), FoxPro, Visual FoxPro, Clipper, CodeBase and others claiming dBase/xBase compatibility. You do not need the application that created the DBF.
You can extract and convert DBFs to standard data format (SDF), comma separated values (CSV), and HTML tables. PeekDBF transforms DBFs into human readable and/or computer usable form.
I use `DBF' to refer to any file that has an xBased/dBase table format. This includes Visual FoxPro files that end in .dbc, .frx, .lbx, .mnx, .pjx, .scx, and .vcx--there may be others, too.
The following sections explain how to install and use PeekDBF. Some background information is also presented.
peekdbf [-a|-s|-c|-h|-t] file.dbfPeekdbf's output goes to the screen. Use lines like
default header information
-a all
-s standard data format
-h HTML table
-c comma separated values
-t test
peekdbf table1.dbf | moreor
peekdbf table1.dbf > tempfileto make it easier to read. It may also help to view tempfile with an editor that does not wrap lines.
Create a subdirectory called PeekDBF and extract peekdbf1-2.zip into it. Copy peekdbf.exe to a directory listed in your computer's path. PeekDBF does not alter your system files in any way. To remove it simply delete the file peekdbf.exe and the subdirectory PeekDBF.
Create a directory called ~/PeekDBF and extract peekdbf1-2.tar.gz into it. Copy peekdbf to a directory listed in your computer's path, for example ~/bin.
DBFs are binary database table files produced by many database management programs. They consist of two parts: a header and the data records. You usually don't get to see most of the information stored in the header.
The header contains information about the structure of the table. In other words, the header explains how the data stored in the records should be interpreted.
The data within records may be stored in arbitrary combinations of fields of these types:
C | Character | |
Y | Currency | * |
N | Numeric | |
F | Float | |
D | DateTime | * |
B | Double | * |
I | Integer | * |
L | Logical | |
M | Memo | * |
G | General | * |
C | Character (binary) | * |
M | Memo (binary) | * |
(P | Picture | *) |
The records follow one after the other--nothing separates them. A database program figures out where one record ends and another begins by doing calculations based on information stored in the header.
The first byte in each record holds the `delete flag.' It contains an asterisk, *, 0x2A (42), if the record is deleted, otherwise, a blank 0x20 (32).
Fields contain either text (ASCII) or binary (coded) data. PeekDBF converts the ASCII data byte by byte to the corresponding ASCII characters. Binary data has to be converted based on the way the information is encoded. Memo fields contain information about the location of the memo text (memo text is stored in a separate file), while DateTime fields contain two four-byte integers that represent a date and a time.
The conversion of binary data to human readable form does not always correspond to expectations. Consider converting a DBF containing binary fields to an SDF file. SDFs should contain only ASCII characters in fixed-width fields. In DBFs, binary fields have a standard width that is specific to the internal representation of the data type. There exists no `standard' way to convert the binary data to fixed-width text. Fields marked with an asterisk in the table above contain binary information.
Consider the number 1,234 stored in a Number field (ASCII) and an Integer field (binary). If a Number field holds the digits '1234,' those digits are stored literally as the ASCII codes:
00 00 00 00 00 00 31 32 33 34 | ASCII Codes (hex) |
1 2 3 4 | Corresponding ASCII Characters |
Note that a number field is always ten bytes wide. The four digits are right justified; the left side is padded with nulls. In an SDF file, this data should be converted to seven blanks followed by the ASCII characters 1, 2, 3, 4.
In an Integer field, which is always four bytes wide, the same 1234 looks like this:
D2 04 00 00 | Hex |
210 4 0 0 | Decimal |
The bytes are decoded like this:
210 + (4 * 256) + (0 * 65,536) + (0 * 16,777216) = 1,234In this case the value fits in four bytes. If the field had contained D2 04 01 00 instead, we would need five bytes. Please bear in mind that comparisons between the sizes of binary representations and the sizes of their literal reprepresentations are meaningless.
PeekDBF handles binary data in a reasonable way, based on the command line option supplied.
PeekDBF offers two views of DBF files that show usually hidden information: a summary of the header and a dump of everything in the file. Use these options to view the dirty details of DBFs.
See table1.txt.2 PeekDBF's default action is to display a summary of the information contained in the header. Binary data is converted to readable form. All the numbers displayed are in decimal. The number of records shown includes those marked deleted. (`Deleted' records are not removed from a DBF until it is explicitly PACKed by a database program such as dBase.)
See table1-a.txt. The -a option dumps everything in the DBF. In addition to the default output (see above), bytes used by the system or reserved for future expansion are displayed. Some of these bytes are used by certain database programs. At the end of the field descriptions is the header terminator byte, 0x0d (13 ). (All numbers up to here are displayed in decimal.)
If the DBF was created by Visual FoxPro, the next 263 bytes describes the table's relationship to other files in a multi-file database.
Finally the data records start. Records marked for deletion are displayed. (In table1.dbf the third record is deleted.) Although they are not separated in any way in the DBF (the record length is used to locate records), PeekDBF breaks each record so they line up nicely for your viewing pleasure.
Here is were the mixture of text and binary can be confusing. With the -a option, PeekDBF displays exactly what is in the data records. ASCII data is displayed as printable characters. Binary data is displayed as hex bytes enclosed in angle brackets.
This option is useful for examining the contents of DBFs.
PeekDBF offers three conversions of DBFs. Of course, these conversions provide useful views of your DBFs, too. Use conversions to put the data from your DBFs into formats easily read by humans and/or computer programs.
See table1-s.txt. An SDF file contains records with fixed-width fields. Records are separated by newlines. SDF files should contain only text. If the DBF contains any binary fields (see above) a place holder is inserted into the output that looks like this
<Y!!!!!>The place holder is the same length (in bytes) as the field. It is set off with angle brackets and padded with exclamation points. This makes it easy to index into the output--you know exactly where every field starts. This allows you to extract information from your DBF using indexing methods, as you might from C/C++.
This format is useful when you want to access fields by position within a record.
Records marked deleted are NOT included in the output.
See table1-c.txt. The -c option offers the most complete conversion of data into text. All the binary fields except for M and G are converted into their actual values. Since the fields are delimited by commas, the length of the output string is irrelevant. Excess blanks, fore and aft, are stripped. If you need to pull information out of a DBF file this is probably your best choice. Memo fields produce the string `memo' and general fields produce the string `gen.' Empty fields produce a comma on output, so it is possible to have several commas in a row.
The output from -c is ideal for use with Awk and Perl.
Records marked deleted are NOT included in the output.
See table1-h.html. This option produces an HTML table in a complete Web page. The first row of the table contains the names of each field in <TH></TH> tags. Blank fields have a dash inserted (some browsers choke on empty table entries). The conversions done by -h are the same as those of -c. The only attributes are those in the table-tag:
<TABLE BORDER CELLPADDING="5">
This is the wishful thinking section. A future version of PeekDBF will test and analyze your troubled DBFs.
See table1-t.txt. This option lists the type of DBF and compares the stored header length with one calculated from analysis.
The current version of PeekDBF is available at
http://labrocca.com/
A single user license for PeekDBF costs only $25. If you choose not to license your copy of PeekDBFtm you must remove it from your system.
You may not distribute PeekDBF in any form by any means without written permission from the copyright owner. Purchase License
HTML tables generated by PeekDBF contain copyright information that refers to PeekDBF, not your converted data. You can publish these tables anyway you want provided the copyright information remains in the HTML file unaltered.
PeekDBF only reads files. It never alters them in any way. Nevertheless, you use PeekDBF at your own risk. Back up important files often.PeekDBF(tm) Copyright ©1994-1999 by P.J. LaBrocca. All rights reserved.
This document was generated using the LaTeX2HTML translator Version 98.1p1 release (March 2nd, 1998)
Copyright © 1993, 1994, 1995, 1996, 1997, Nikos Drakos, Computer Based Learning Unit, University of Leeds.
The command line arguments were:
latex2html -split 0 -no_navigation -show_section_numbers peekmod.tex.
The translation was initiated by P.J. LaBrocca on 1999-09-04