
I have a csv file that grows until it reaches approximately 48 million lines.

Before adding new lines to it, I need to read the last line.

I tried the code below, but it got too slow and I need a faster alternative:

def return_last_line(filepath):
    with open(filepath, 'r') as file:
        for x in file:
            pass
        return x

return_last_line('lala.csv')
  • You can use f.seek(-1, 2). Quoting the Python docs: "To change the file object’s position, use f.seek(offset, whence). The position is computed from adding offset to a reference point; the reference point is selected by the whence argument. A whence value of 0 measures from the beginning of the file, 1 uses the current file position, and 2 uses the end of the file as the reference point. whence can be omitted and defaults to 0, using the beginning of the file as the reference point." So -1 means the last byte and 2 means measure from the end of the file. Mar 6, 2021 at 15:35
  • f.seek(offset, whence): docs.python.org/3/tutorial/inputoutput.html Mar 6, 2021 at 15:40
  • BTW, isn't it possible to keep the file open across the writes? Mar 6, 2021 at 16:38
  • That is basically a subset of my solution here: stackoverflow.com/a/26747854/122033 Mar 16, 2021 at 22:28
  • Does this answer your question? Get last n lines of a file, similar to tail Mar 17, 2021 at 19:32
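A minimal sketch of the seek-from-the-end idea suggested in the comments. Note that relative seeks from the end only work on files opened in binary mode; the filename used here is a placeholder, and the fallback assumes short or newline-free files should be read from the start.

```python
import os


def read_last_line(filepath):
    """Return the last line of a file by scanning backwards from the end."""
    with open(filepath, "rb") as f:
        try:
            # Start at the second-to-last byte, so a trailing '\n' is skipped,
            # then walk backwards one byte at a time until a newline is found.
            f.seek(-2, os.SEEK_END)
            while f.read(1) != b"\n":
                f.seek(-2, os.SEEK_CUR)
        except OSError:
            # The file is tiny or has no newline before the last line:
            # fall back to reading from the start.
            f.seek(0)
        return f.readline().decode()
```

This only touches a handful of bytes at the end of the file, so its cost does not grow with the file size.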

9 Answers


Here is my take, in Python: a function that lets you choose how many of the last lines to return, because the last lines may be empty.

def get_last_line(filepath, how_many_last_lines=1):

    # open your file using with: safety first, kids!
    with open(filepath, 'r') as file:

        # find the position of the end of the file: end of the file stream
        end_of_file = file.seek(0, 2)

        # trace back each character of your file in a loop
        for num in range(end_of_file + 1):
            file.seek(end_of_file - num)

            # save the last characters of your file as a string: last_line
            last_line = file.read()

            # count how many '\n' you have in your string:
            # if you have 1, you are in the last line; if you have 2, you have the two last lines
            if last_line.count('\n') == how_many_last_lines:
                return last_line

get_last_line('lala.csv', 2)

This lala.csv has 48 million lines, as in your example. It took practically no time to get the last line.

  • This isn't actually correct. The '\n' count is one too low for Unix text files. A line is terminated by \n, therefore a text file ends with '\n', and by default your get_last_line would just return the line terminator of the last line, not the last line itself. Mar 6, 2021 at 16:11
  • But it worked... Sorry, I did not understand your complaint. Are you saying it would not work outside of Windows? Mar 6, 2021 at 17:13
  • I see what you mean now. However, in a txt file I created as a test, there was only one \n. That's why I enabled the option to select how many last lines. Here follows my test: with open('teste.txt', 'w') as x: x.write('lala\nfifi\ndede') Mar 6, 2021 at 18:46
  • Yes, it needs to work in both cases. However, a text file must end in a \n on Unix. Mar 6, 2021 at 20:25
  • A Mac is Unix, right? The file I created on a Mac did not end with a '\n'. I created the file with Python. Anyway, this conversation helped me further understand the code. Thanks! Mar 6, 2021 at 21:38

Here is code for finding the last line of a file using mmap. It should work on Unixen and derivatives and Windows alike (I've tested this on Linux only, please tell me if it works on Windows too ;), i.e. pretty much everywhere it matters. Since it uses memory-mapped I/O, it can be expected to be quite performant.

It expects that you can map the entire file into the address space of a process - that should be fine for a 50 MB file everywhere, but for a 5 GB file you'd need a 64-bit system or some extra slicing.

import mmap


def iterate_lines_backwards(filename):
    with open(filename, "rb") as f:
        # memory-map the file, size 0 means whole file
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            start = len(mm)

            while start > 0:
                start, prev = mm.rfind(b"\n", 0, start), start
                chunk = mm[start + 1:prev + 1]
                # if the last character in the file was a '\n',
                # technically the empty string after that is not a line.
                if chunk:
                    yield chunk.decode()


def get_last_nonempty_line(filename):
    for line in iterate_lines_backwards(filename):
        if stripped := line.rstrip("\r\n"):
            return stripped


print(get_last_nonempty_line("datafile.csv"))

As a bonus, there is a generator, iterate_lines_backwards, that efficiently iterates over the lines of a file in reverse, for any number of lines:

print("Iterating the lines of datafile.csv backwards")
for l in iterate_lines_backwards("datafile.csv"):
    print(l, end="")
  • Worked on Windows! Mar 6, 2021 at 17:06
  • I like this solution very much, because it's very efficient. Instead of throwing empty lines away, the caller should do this if it's required. But there is one pain point: what happens with the open file after the iteration is stopped by the caller? – DeaD_EyE Mar 17, 2021 at 10:31
  • @DeaD_EyE good point: it stays open until the iteration is exhausted and there is an active reference to the file handle. Of course, the open file could be passed in instead of the filename. Mar 17, 2021 at 12:52

If you are running your code in a Unix-based environment, you can execute the tail shell command from Python to read the last line. Note that subprocess.run only returns the output if you ask it to capture it:

import subprocess

result = subprocess.run(['tail', '-n', '1', '/path/to/lala.csv'],
                        capture_output=True, text=True)
last_line = result.stdout.rstrip('\n')

This is generally a rather tricky thing to do. A very efficient way of getting a chunk that includes the last lines is the following:

import os


def get_last_lines(path, offset=500):
    """ An efficient way to get the last lines of a file.

    IMPORTANT: 
    1. Choose offset to be greater than 
    max_line_length * number of lines that you want to recover.
    2. This will throw an os.OSError if the file is shorter than
    the offset.
    """
    with path.open("rb") as f:
        f.seek(-offset, os.SEEK_END)
        while f.read(1) != b"\n":
            f.seek(-2, os.SEEK_CUR)
        return f.readlines()

You need to know the maximum line length, though, and ensure that the file is at least one offset long!

To use it, do the following:

from pathlib import Path


n_last_lines = 10
last_bit_of_file = get_last_lines(Path("/path/to/my/file"))
real_last_n_lines = last_bit_of_file[-n_last_lines:]

Now finally you need to decode the binary to strings:

real_last_n_lines_non_binary = [x.decode() for x in real_last_n_lines]

Probably all of this could be wrapped in one more convenient function.


You could additionally store the last line in a separate file, which you update whenever you add new lines to the main file.
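A minimal sketch of that idea, assuming you control every write to the main file: mirror each appended row into a tiny sidecar file, then read the last line from the sidecar in constant time. The file names and helper names here are placeholders, not part of the original answer.

```python
def append_with_sidecar(data_path, sidecar_path, row):
    """Append a row to the main CSV and mirror it into a tiny sidecar file."""
    with open(data_path, "a") as data_file:
        data_file.write(row + "\n")
    # overwrite the sidecar so it always holds exactly the last line
    with open(sidecar_path, "w") as sidecar:
        sidecar.write(row + "\n")


def last_line_from_sidecar(sidecar_path):
    """Read the last line in O(1), without touching the big file at all."""
    with open(sidecar_path) as sidecar:
        return sidecar.readline().rstrip("\n")
```

The trade-off is that every writer must go through the helper; if anything appends to the main file directly, the sidecar goes stale.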


This works well for me:
https://pypi.org/project/file-read-backwards/

from file_read_backwards import FileReadBackwards

with FileReadBackwards("/tmp/file", encoding="utf-8") as frb:
    # getting lines, line by line, starting from the last line up
    for l in frb:
        if l:
            print(l)
            break

An easy way to do this is with deque:

from collections import deque

def return_last_line(filepath):
    with open(filepath, 'r') as f:
        # stream the file through a one-element deque: only the last line is kept
        q = deque(f, maxlen=1)
    return q[0]
  • This isn't any more performant than the original. Mar 6, 2021 at 16:01
  • @AnttiHaapala. I timed mine. It took 0.53 seconds for a 56MB file. While yours is more efficient, this gets the job done for a 40MB file with a lot less code. – pakpe Mar 6, 2021 at 16:29
  • OP said that their method, to which yours is algorithmically exactly equivalent, is not performant enough: "I tried the code below, but it got too slow and I need a faster alternative" Mar 6, 2021 at 16:35
  • "It took 0.53 seconds for a 56MB file" - How long did the OP's code take for that file? – Manuel Mar 15, 2021 at 22:22

Since seek() returns the position that it moved to, you can use it to move backward and position the cursor at the beginning of the last line.

with open("test.txt") as f:
    p = f.seek(0,2)-1              # ignore trailing end of line
    while p>0 and f.read(1)!="\n": # detect end of line (or start of file)
        p = f.seek(p-1,0)          # search backward
    lastLine = f.read().strip()    # read from start of last line
print(lastLine)

To get the last non-empty line, you can add a while loop around the search:

with open("test.txt") as f:
    p,lastLine = f.seek(0,2),""    # start from end of file
    while p and not lastLine:      # want last non-empty line
        while p>0 and f.read(1)!="\n": # detect end of line (or start of file)
            p = f.seek(p-1,0)          # search backward
        lastLine = f.read().strip()    # read from start of last line
  • Returns an empty string. Mar 6, 2021 at 17:04
  • That would happen if your last line is empty and the file has a trailing end of line (as is often standard). Do you need to get the last non-empty line instead? – Alain T. Mar 6, 2021 at 17:06

Based on @kuropan's answer, but faster and shorter:

# 60.lastlinefromlargefile.py
# juanfc 2021-03-17

import os


def get_last_lines(fileName, offset=500):
    """ An efficient way to get the last lines of a file.

    IMPORTANT:
    1. Choose offset to be greater than
    max_line_length * number of lines that you want to recover.
    2. This will throw an os.OSError if the file is shorter than
    the offset.
    """
    with open(fileName, "rb") as f:
        f.seek(-offset, os.SEEK_END)
        return f.read().decode('utf-8').rstrip().split('\n')[-1]


print(get_last_lines('60.lastlinefromlargefile.py'))
