A direct access database system
Posted: Sat Dec 10, 2016 5:31 pm
Back in the late sixties (1966-1969) i was a Fortran programmer on one of the first IBM 360 series mainframes in the Netherlands. For those who have little idea about the hardware in those days: the 360/30 CPU was a 32-bit system, running at about 1 Megaherz, internal memory was 64 Kbytes, 4 removable discdrives at 7.2 Mbytes each, some tape-units, and a lineprinter. All this in a computerroom of some 100 m2.
Hence, capacity was at a premium, which affected programming style and data organization.
One such consequence was that files always had to reside on disc (DASD in IBM talk). Loading data files in large arrays for processing was mostly out of the question.
For that reason we used a standard IBM file organization, ISAM, ("indexed sequential access method"), when we needed direct access to records in the file, or BDAM ("basic direct access method") with an extra, self written, layer for logical access.
This last method is described herafter, and the method is interesting enough (for me, that is) to do the coding once again, but now in SmartBasic.
The basic organization is shown in the following picture.
The DB (database) consists of a number of fixed length "slots". One slot contains one record. The "logical" access consists of a key, which may be one field or a combination of fields out of the record. This key is as such added to the slot. Finally, a link field is added to enable a "linked list" mechanism. So, a slot contains 3 strings.
Physical access is by a simple adress (or file-pointer) in the linear adress space of the file. It is simply: adress = slotnumber x slotsize.
The mapping of the (large) logical key-space into the relatively small slotnumber-space is done with a hashing algorithm. The input is a lengthy complex key, out comes a simple slotnumber which fits in what is called the "primary" area of the DB.
It is always possible that two or even more different logical keys are mapped into the same slotnumber. Those "synonimes" are then placed into a "secondary" or "overflow" area, connected with the parent record in the primary area using a linked list method see figure).
The advantage of packing the fields of a record in one string is that the "low level" functions to create and maintain the DB are independent from the actual record
lay-out. I will gradually provide some basic functions, starting with creation of a DB and adding records to the DB. Later on: deletion of records, getting records for updates and write them back. "Low level" : those functions handle entire records in the form of a string. It's to the user to do the user interface and to assemble and disassemble the string from and into the specific record lay-out. Just for testing I use a testcase in the form of some "personnel" records with a simple lay-out: Surname, name, department, age, salary. In my testprogram, the test-DB is fed 9 records from a read_data construct, instead from a normal "add record" panel.
The number of slots might be chosen between 1x and 2x the expected amount of records to be stored in the DB. At 1x #records, about 60% of the records will be placed in the primary area and 40% in the overflow area (causing a few extra accesses to the DB).
Currently there are 4 functions that are normally used by the user:
- create_db(filename,DBname,slotsize,expected # of records) for creating a new DB
- db(filename) for opening an existing DB
- add_rec(key,record) for adding records to the DB
- get_rec$(key) for retrieving record with key <key>
The other functions are "low level" functions, used by the above mentioned functions.
Coming up: edit_rec(...) and del_rec(...)
Hence, capacity was at a premium, which affected programming style and data organization.
One such consequence was that files always had to reside on disc (DASD in IBM talk). Loading data files in large arrays for processing was mostly out of the question.
For that reason we used a standard IBM file organization, ISAM, ("indexed sequential access method"), when we needed direct access to records in the file, or BDAM ("basic direct access method") with an extra, self written, layer for logical access.
This last method is described herafter, and the method is interesting enough (for me, that is) to do the coding once again, but now in SmartBasic.
The basic organization is shown in the following picture.
The DB (database) consists of a number of fixed length "slots". One slot contains one record. The "logical" access consists of a key, which may be one field or a combination of fields out of the record. This key is as such added to the slot. Finally, a link field is added to enable a "linked list" mechanism. So, a slot contains 3 strings.
Physical access is by a simple adress (or file-pointer) in the linear adress space of the file. It is simply: adress = slotnumber x slotsize.
The mapping of the (large) logical key-space into the relatively small slotnumber-space is done with a hashing algorithm. The input is a lengthy complex key, out comes a simple slotnumber which fits in what is called the "primary" area of the DB.
It is always possible that two or even more different logical keys are mapped into the same slotnumber. Those "synonimes" are then placed into a "secondary" or "overflow" area, connected with the parent record in the primary area using a linked list method see figure).
The advantage of packing the fields of a record in one string is that the "low level" functions to create and maintain the DB are independent from the actual record
lay-out. I will gradually provide some basic functions, starting with creation of a DB and adding records to the DB. Later on: deletion of records, getting records for updates and write them back. "Low level" : those functions handle entire records in the form of a string. It's to the user to do the user interface and to assemble and disassemble the string from and into the specific record lay-out. Just for testing I use a testcase in the form of some "personnel" records with a simple lay-out: Surname, name, department, age, salary. In my testprogram, the test-DB is fed 9 records from a read_data construct, instead from a normal "add record" panel.
The number of slots might be chosen between 1x and 2x the expected amount of records to be stored in the DB. At 1x #records, about 60% of the records will be placed in the primary area and 40% in the overflow area (causing a few extra accesses to the DB).
Currently there are 4 functions that are normally used by the user:
- create_db(filename,DBname,slotsize,expected # of records) for creating a new DB
- db(filename) for opening an existing DB
- add_rec(key,record) for adding records to the DB
- get_rec$(key) for retrieving record with key <key>
The other functions are "low level" functions, used by the above mentioned functions.
Coming up: edit_rec(...) and del_rec(...)
Code: Select all
first_time=0
dim field$(5) ' test DB has staff records with 5 fields
f$="testdb"
if first_time then
create_db(f$,"Test_database",50,10)
for i=0 to 8
read rec$
split rec$ to field$,n with " " ' to extract the key
key$=field$(1)
add_rec(key$,rec$)
next i
data "John Friedman sales 32 4500 "
data " Betty Longa ledger 28 3700"
data " Ann Strady sales 43 5300 "
data "John-John Brady production 25 3250 "
data " Lydia Carter sales 36 3300 "
data " Peter Pan R&D 48 4100"
data " Reginald Nobody production 23 3100 "
data " Mary Poppins sales 29 3650 "
data " Elly Sunshine production 32 3560 "
end if
db(f$) ' open database f$
do
input "Name? ": key$
if key$="stop" or key$="Stop" then break
record$=get_record$(key$)
if record$="not found" then
print ! print record$ ! continue
end if
split record$ to field$,n with " "
print
print "surname - " & field$(0)
print "name - " & field$(1)
print "department - " & field$(2)
print "age - " & field$(3)
print "salary - " & field$(4)
until forever
end
' Create a direct access database with fixed length slots
' fn$ - filename on "disc"
' db_name$ - database name for readability
' S_size - slotsize for key + record + link fields
' N_rec - estimated max. # of records
'
def create_db(fn$,db_name$,S_size,N_rec)
N_prim=N_rec ' # of slots in primary area
S_free=N_rec+1 ' initial first free slot
'
dim slot(S_size)
if file_exists(fn$) then file fn$ delete
'
' create space for the database
for i=0 to N_prim ! file fn$ writedim slot ! next i
'
' write slot number 0 with meta data
file fn$ setpos 0
file fn$ writeline db_name$,str$(S_size),str$(N_prim),str$(S_free)
'
' init the slots in the primary area
for i=1 to N_rec
file fn$ setpos S_size*i
file fn$ writeline "","","0"
next i
db(fn$)
end def
def db(fn$) ' open database f$ (read meta data in slot 0)
file fn$ setpos 0
file fn$ readline db_name$,size$,prim$,free$
S_size=size$ ! N_prim=prim$ ! S_free=free$
end def
def add_rec(key$,rec$)
adr=get_adr(key$)
if adr>0 then return -1 ' record with same key already present
if adr<0 then ' free slot in primary area
adr=-adr
file db.fn$ setpos adr*db.S_size
file db.fn$ writeline key$,rec$,"0"
return adr
end if
new=hash(key$)
file db.fn$ setpos new*db.S_size
file db.fn$ readline aux$,trec$,link$
rec_to=get_1st_free()
file db.fn$ setpos rec_to*db.S_size ! file db.fn$ writeline " "
file db.fn$ setpos rec_to*db.S_size
file db.fn$ writeline key$,rec$,link$
file db.fn$ setpos new*db.S_size
file db.fn$ writeline aux$,trec$,str$(rec_to)
put_1st_free(rec_to+1)
end def
' retrieve record with key <key$> from the DB
' returns <not found> if not found
'
def get_record$(key$)
slot=get_adr(key$)
if slot>0 then
file db.fn$ setpos slot*db.S_size
file db.fn$ readline t$,rec$,t$
return rec$
else
return "not found"
end if
end def
' find adress of record with given key
' key$ - the key on which to search
' function returns one of 3 values:
' adr > 0 : record found in slot <adr> (update or delete record)
' adr = 0 : no record found with that key (error key or add new record)
' adr < 0 : not found, but slot <abs(adr)> is free (add new record)
'
def get_adr(key$)
adr=hash(key$)
while adr>0
file db.fn$ setpos adr*db.S_size
file db.fn$ readline aux$,rec$,link$
if aux$=key$ then return adr
if aux$="" then return -adr
adr=val(link$)
end while
return 0
end def
' returns the adress of the first free slot
'
def get_1st_free()
file db.fn$ setpos 0
file db.fn$ readline t$,t$,t$,t$
return val(t$)
end def
' edits the adress of the first free slot
'
def put_1st_free(adr)
file db.fn$ setpos 0
file db.fn$ readline t1$,t2$,t3$,t4$
file db.fn$ setpos 0
file db.fn$ writeline t1$,t2$,t3$,str$(adr)
return
end def
' transform key tt$ into a DB adress (slotnumber)
'
def hash(tt$)
nt=len(tt$) ! adr=1
for i=0 to nt-1 ! adr+=asc(mid$(tt$,i,1))-32 ! adr*=7 ! next i
return 1+adr%db.N_prim
end def