
I have a csv file with 1000 columns and 50 rows that was collected over a few months. Embedded at irregular positions in the file are ~1000 small datasets with semi-standard formats (the columns differ somewhat, as does the number of rows). There are also identifying parameters in the cells around each dataset, although exactly where they sit relative to a dataset varies.

I want to get this messy Excel file into R or Python. How should I do it?

(I know I should buckle down and copy and paste the datasets into a standard format, but that seems tedious and error-prone. So does writing an ad hoc function to do it. Do I have any other options??)
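For concreteness, the ad hoc route could start with something like the following minimal sketch in Python/pandas. It assumes the small datasets are stacked vertically and separated by fully blank rows, which may not match the actual layout; the file name `messy.csv` is hypothetical.

```python
import pandas as pd

# A minimal sketch, assuming the small datasets are stacked vertically and
# separated by fully blank rows; side-by-side blocks would need a similar
# pass over the columns. The file name "messy.csv" is hypothetical.
raw = pd.read_csv("messy.csv", header=None, dtype=str)

# Rows where every cell is empty act as separators; the running count of
# blank rows gives each contiguous non-blank run its own block id.
blank = raw.isna().all(axis=1)
block_id = blank.cumsum()

blocks = []
for _, chunk in raw[~blank].groupby(block_id[~blank]):
    # Trim columns that are entirely empty within this block.
    chunk = chunk.dropna(axis=1, how="all")
    blocks.append(chunk.reset_index(drop=True))

print(f"found {len(blocks)} candidate blocks")
```

The identifying parameters around each block would still need a second pass, and since their position varies, that is exactly where this kind of script gets fragile.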

  • I would write a small script to cover the most regular parts and fix the rest manually. The script could also do some sanity checks (e.g. checking for a valid number of columns; a sketch along these lines follows these comments). – Erwan Sep 17 '19 at 23:44
  • Do you have a small example of the file? – Edmund Sep 17 '19 at 23:52
  • I really recommend writing a function using [tidyxl](https://www.r-bloggers.com/tidying-messy-excel-data-tidyxl/) and unpivotr (a Python analogue is sketched after these comments). – Lincoln Sep 19 '19 at 20:38
  • Thanks for the comments. Fortunately I was given the raw data used to make the large file, so I no longer need to read the large file. – R Greg Stacey Sep 19 '19 at 21:37
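Picking up the two suggestions above: tidyxl and unpivotr are R packages, but the same cell-level idea can be sketched in Python, treating every non-empty cell as one row of (row, col, value) and locating datasets by content rather than by position. The marker `"ID="` below is purely hypothetical, standing in for whatever identifying parameters actually surround each dataset, and `messy.csv` is again a made-up file name.

```python
import pandas as pd

# A rough Python analogue of tidyxl's cell-level approach: one row per
# non-empty cell, keyed by (row, col). "messy.csv" is hypothetical.
raw = pd.read_csv("messy.csv", header=None, dtype=str)

cells = (
    raw.stack()                      # Series of non-empty cells, (row, col) index
       .rename_axis(["row", "col"])  # name the MultiIndex levels
       .reset_index(name="value")
)

# Locate datasets by content rather than position, e.g. via the identifying
# parameters near each block. "ID=" is a made-up marker; substitute whatever
# actually labels the datasets.
anchors = cells[cells["value"].str.startswith("ID=", na=False)]

# A sanity check of the kind suggested above: the anchor count can be
# compared against the ~1000 datasets expected.
print(f"{len(anchors)} candidate dataset anchors found")
```

The advantage of the long format is that blocks are found by pattern matching on values, so the script tolerates datasets that shift position between files.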

0 Answers