Reading nth line in a file

Posted: Thu Apr 08, 2021 9:47 am
by Erg2
Hi,

I’ve tried for a while and can’t find a solution to my problem. I’m a beginner and a learner.

My file contains 6000 text lines.
I read it with FILE N$ READLINE X$.
I want READLINE to start reading from, say, line 5000 onwards, not from the beginning of the file.

FILE N$ READLINE X$ works, but it takes a very long time to process before it gets to line 5000. I think that if it started reading from line 5000, the processing time would drop significantly.

My question is: how can I use FILE N$ READLINE X$ to start reading from line 5000 (or the nth line in the file)?

Would appreciate if anyone can help with a sample routine. :)

Re: Reading nth line in a file

Posted: Thu Apr 08, 2021 6:10 pm
by matt7
I recently tried to figure out how to do this but ended up settling for doing a bunch of READLINEs until I reached the line I wanted. However, your post got me thinking about the problem again, and I realized that if you know the byte position at which the line you want starts (that is, the size of the file up to that line), you can jump there with FILE N$ SETPOS X.

I briefly tested the function below on a Unicode data file I have that has a file size of 955,710, and it seems to work, but do your own testing. Memory may be an issue depending on the file size, as I'm not sure what the limits are for FILE N$ READDIM M, N on various devices.

Code:

n$ = "/data/unicode_table"
FILE_GOTO_LINE(n$, 5000)
FILE n$ READLINE line$
PRINT line$

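DEF FILE_GOTO_LINE (name$, line)
  ' Positions the file so that the next READLINE returns line number "line",
  ' counting from 0 (the position is just past the first "line" newlines).
  
  FILE name$ SETPOS 0
  IF line = 0 THEN RETURN
  ' Read the file into bytes(); nBytes receives the number of bytes read.
  FILE name$ READDIM bytes, nBytes
  
  lineCount = 0
  FOR i = 0 TO nBytes-1
    IF bytes(i) = 10 THEN    ' newline character
      lineCount += 1
      IF lineCount = line THEN BREAK i
    END IF
  NEXT i
  
  ' i is the index of the newline found above; the wanted line starts right after it.
  FILE name$ SETPOS i+1
  
END DEF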
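
The SETPOS idea above pays off most when the byte positions of the line starts are computed once and then reused for many lookups. As an illustration only, here is a minimal sketch in Python rather than smart BASIC, so none of these names are smart BASIC commands: `build_line_offsets` and `read_line_at` are hypothetical helpers, and the sketch assumes a UTF-8 text file that does not change between lookups.

```python
import os
import tempfile


def build_line_offsets(path):
    """Return a list where offsets[k] is the byte position at which line k starts (0-based)."""
    offsets = [0]
    with open(path, "rb") as f:
        for raw_line in f:
            # Each line starts where the previous one ended.
            offsets.append(offsets[-1] + len(raw_line))
    offsets.pop()  # the last entry is the end of the file, not the start of a line
    return offsets


def read_line_at(path, offsets, line_index):
    """Jump straight to a line using a precomputed offset and return its text."""
    with open(path, "rb") as f:
        f.seek(offsets[line_index])
        return f.readline().decode("utf-8").rstrip("\r\n")


if __name__ == "__main__":
    # Small self-contained demo on a temporary file.
    with tempfile.TemporaryDirectory() as folder:
        demo_path = os.path.join(folder, "demo.txt")
        with open(demo_path, "wb") as out:
            out.write(b"zero\none\ntwo\nthree\n")
        line_starts = build_line_offsets(demo_path)
        print(read_line_at(demo_path, line_starts, 2))
```

Building the list still costs one full pass over the file, so this only helps when the same file is queried more than once.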

Re: Reading nth line in a file

Posted: Thu Apr 08, 2021 6:29 pm
by matt7
Follow-up thought: If memory is an issue or you just want a safer way to do this, modify the above function to use FILE N$ READDIM M, N, K in a loop, reading K bytes at a time. Then you can keep overwriting the bytes() variable as you count newline characters, keeping a running total of the bytes read until you reach the target line. I believe every READDIM begins reading at the current file position, so you don't have to read a very large file into an array in its entirety.
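
The chunked idea can be sketched as follows. This is an illustration in Python rather than smart BASIC, so it does not use READDIM or SETPOS; `seek_to_line` and its `buf_size` parameter are hypothetical names chosen for the example, and the file is assumed to be opened in binary mode with `\n` line endings.

```python
import io


def seek_to_line(f, line, buf_size=10000):
    """Position binary file object f just past the first `line` newline characters.

    Returns True if that many newlines were found, False if the file ended first
    (in which case f is left at the end of the file).
    """
    f.seek(0)
    if line == 0:
        return True

    newlines_seen = 0
    chunk_start = 0  # byte position in the file of the current chunk's first byte
    while True:
        chunk = f.read(buf_size)  # only buf_size bytes are held in memory at a time
        if not chunk:
            return False
        pos = -1
        while True:
            pos = chunk.find(b"\n", pos + 1)
            if pos == -1:
                break  # no more newlines in this chunk
            newlines_seen += 1
            if newlines_seen == line:
                # The wanted line starts right after this newline.
                f.seek(chunk_start + pos + 1)
                return True
        chunk_start += len(chunk)


if __name__ == "__main__":
    demo = io.BytesIO(b"a\nbb\nccc\ndddd\n")
    if seek_to_line(demo, 2, buf_size=3):
        print(demo.readline())
```

The running total `chunk_start` plays the role of the "total running file size" described above: it is advanced by the full length of each chunk, and the position inside the chunk is added only when the target newline is found.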

Re: Reading nth line in a file

Posted: Fri Apr 09, 2021 8:12 pm
by Erg2
Hi Matt,

Thank you so much, works great.
Regards,

Re: Reading nth line in a file

Posted: Sun Apr 11, 2021 6:05 pm
by matt7
You're welcome. However, after some testing I determined that using a bunch of READLINE calls is by far the fastest method.

Here is my test file that times three different FILE_SETLINE functions:

Code:

DEF FILE_SETLINE_RD (file$, line)

  FILE file$ SETPOS 0
  IF line = 0 THEN RETURN
  FILE file$ READDIM bytes, nBytes

  lineCount = 0
  FOR b = 0 TO nBytes-1
    IF bytes(b) = 10 THEN    ' newline character
      lineCount += 1
      IF lineCount = line THEN BREAK b
    END IF
  NEXT b

  FILE file$ SETPOS b+1

END DEF


FILE_SETLINE_RDK.bufSize = 10000

DEF FILE_SETLINE_RDK (file$, line)

  FILE file$ SETPOS 0
  IF line = 0 THEN RETURN

  byteCount = 0
  lineCount = 0
  DO

    FILE file$ READDIM bytes, nBytes, bufSize

    FOR b = 0 TO nBytes-1
      IF bytes(b) = 10 THEN    ' newline character
        lineCount += 1
        IF lineCount = line THEN BREAK b
      END IF
    NEXT b

    byteCount += b

  UNTIL lineCount = line OR FILE_END(file$)

  FILE file$ SETPOS byteCount+1

END DEF


DEF FILE_SETLINE_RL (file$, line)

  FILE file$ SETPOS 0
  FOR i = 0 TO line-1
    FILE file$ READLINE line$
  NEXT i

END DEF


n$ = "/data/unicode_table"
x = 5000

t = TIME()
FILE_SETLINE_RD(n$,x)
t = TIME() - t
FILE n$ READLINE lineRD$
PRINT "READDIM   ="; t

t = TIME()
FILE_SETLINE_RDK(n$,x)
t = TIME() - t
FILE n$ READLINE lineRDK$
PRINT "READDIM K ="; t

t = TIME()
FILE_SETLINE_RL(n$,x)
t = TIME() - t
FILE n$ READLINE lineRL$
PRINT "READLINE  ="; t

PRINT ""
IF lineRD$ = lineRDK$ AND lineRD$ = lineRL$ THEN
  PRINT "Lines match ✅"
  PRINT lineRL$
ELSE
  PRINT "Lines don't match ⛔"
  PRINT lineRD$
  PRINT lineRDK$
  PRINT lineRL$
END IF

Setting the file position to line 5 results in the following times:

Code:

READDIM   = 0.048583
READDIM K = 0.000683069
READLINE  = 0.000148892

Setting the file position to line 5000 results in the following times:

Code:

READDIM   = 0.191181
READDIM K = 0.153618
READLINE  = 0.062402