I have been searching for a good way to read a very large CSV file.
It is over 100 GB, so I need to know how to handle the chunked reading
and how to make the concatenation faster.
%%time
import time
import pandas as pd

filename = "../code/csv/file.csv"
lines_number = sum(1 for line in open(filename))  # count total lines in the file
lines_in_chunk = 100  # I don't know what size is better
counter = 0
completed = 0
# returns a TextFileReader that yields DataFrames of lines_in_chunk rows each
reader = pd.read_csv(filename, chunksize=lines_in_chunk)
CPU times: user 36.3 s, sys: 30.3 s, total: 1min 6s
Wall time: 1min 7s
This part does not take long, but the problem is the concat.
%%time
df = pd.concat(reader, ignore_index=True)
This part takes too long and also uses too much memory.
Is there a way to make this concat process faster and more memory-efficient?
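For reference, here is a minimal sketch of the per-chunk alternative I am considering instead of one big concat; the column name "value" and the sum are just placeholders for my real processing:

%%time
import pandas as pd

filename = "../code/csv/file.csv"
lines_in_chunk = 1_000_000  # guessing at a larger chunk size

# process each chunk as it arrives instead of holding the whole file in memory;
# "value" is a placeholder column name for my real computation
total = 0
for chunk in pd.read_csv(filename, chunksize=lines_in_chunk):
    total += chunk["value"].sum()
print(total)

Would something like this avoid the concat bottleneck, or is there a way to speed up pd.concat itself?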