
I have a csv file with 1000 columns and 50 rows that was collected over a few months. Embedded at irregular positions in the file are ~1000 small datasets with semi-standard formats (the columns differ somewhat, as does the number of rows). There are also identifying parameters in the cells around each dataset, although exactly where they sit relative to a dataset varies.

I want to get this messy Excel file into R or Python. How should I do it?

(I know I should buckle down and copy and paste the datasets into a standard format, but that seems tedious and error-prone. So does writing an ad hoc function to do it. Do I have any other options??)
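For concreteness, the ad hoc route could start with something like the following minimal sketch in Python/pandas. It assumes the small datasets are stacked vertically and separated by fully blank rows, which may not match the actual layout; the file name `messy.csv` is hypothetical.

```python
import pandas as pd

# A minimal sketch, assuming the small datasets are stacked vertically and
# separated by fully blank rows; side-by-side blocks would need a similar
# pass over the columns. The file name "messy.csv" is hypothetical.
raw = pd.read_csv("messy.csv", header=None, dtype=str)

# Rows where every cell is empty act as separators; the running count of
# blank rows gives each contiguous non-blank run its own block id.
blank = raw.isna().all(axis=1)
block_id = blank.cumsum()

blocks = []
for _, chunk in raw[~blank].groupby(block_id[~blank]):
    # Trim columns that are entirely empty within this block.
    chunk = chunk.dropna(axis=1, how="all")
    blocks.append(chunk.reset_index(drop=True))

print(f"found {len(blocks)} candidate blocks")
```

The identifying parameters around each block would still need a second pass, and since their position varies, that is exactly where this kind of script gets fragile.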

  • I would write a small script to cover the most regular parts and fix the rest manually. The script could also do some sanity checks (e.g. checking for a valid number of columns; a sketch along these lines follows these comments). – Erwan Sep 17 '19 at 23:44
  • Do you have a small example of the file? – Edmund Sep 17 '19 at 23:52
  • I really recommend writing a function using [tidyxl](https://www.r-bloggers.com/tidying-messy-excel-data-tidyxl/) and unpivotr (a Python analogue is sketched after these comments). – Lincoln Sep 19 '19 at 20:38
  • Thanks for the comments. Fortunately I was given the raw data used to make the large file, so I no longer need to read the large file. – R Greg Stacey Sep 19 '19 at 21:37
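Picking up the two suggestions above: tidyxl and unpivotr are R packages, but the same cell-level idea can be sketched in Python, treating every non-empty cell as one row of (row, col, value) and locating datasets by content rather than by position. The marker `"ID="` below is purely hypothetical, standing in for whatever identifying parameters actually surround each dataset, and `messy.csv` is again a made-up file name.

```python
import pandas as pd

# A rough Python analogue of tidyxl's cell-level approach: one row per
# non-empty cell, keyed by (row, col). "messy.csv" is hypothetical.
raw = pd.read_csv("messy.csv", header=None, dtype=str)

cells = (
    raw.stack()                      # Series of non-empty cells, (row, col) index
       .rename_axis(["row", "col"])  # name the MultiIndex levels
       .reset_index(name="value")
)

# Locate datasets by content rather than position, e.g. via the identifying
# parameters near each block. "ID=" is a made-up marker; substitute whatever
# actually labels the datasets.
anchors = cells[cells["value"].str.startswith("ID=", na=False)]

# A sanity check of the kind suggested above: the anchor count can be
# compared against the ~1000 datasets expected.
print(f"{len(anchors)} candidate dataset anchors found")
```

The advantage of the long format is that blocks are found by pattern matching on values, so the script tolerates datasets that shift position between files.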

0 Answers