Questions tagged [csv]

Comma-Separated Values are a list of plain text values delimited by commas, or a file containing one or more lists in that format.

CSV is a file format describing a plain text file with information separated by commas (,).

Although CSV files are technically plain text, they are given the .csv extension to indicate that the data is delimited by commas and should be parsed accordingly.

The MIME type for CSV files is text/csv.

Information is often stored in CSV format to make it easy to transfer tables of data between applications. Each row of a table is represented as a list of plain text (human-readable) values, with commas as delimiters between the discrete pieces of data. Values may be enclosed in quotes, especially if they contain spaces (which some lenient parsers might otherwise treat as delimiters), or if the data itself contains commas or line breaks. The first row of data often contains the headings of the table columns, which describe the meaning of the data in each column.
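As a minimal sketch of the quoting behaviour described above, the following uses Python's standard csv module to write a few rows and read them back. The rows are hypothetical sample data, not taken from any question on this page; the point is only that a writer using the default dialect adds quotes where a value contains a comma or a line break, and that a reader recovers the original values.

```python
import csv
import io

# Hypothetical rows: one description contains commas, another a line break.
rows = [
    ["Time", "Description"],
    ["11:45", "Hazy, Hot, and Humid"],
    ["14:30", "Freezing\nand windy"],
]

# Write to an in-memory buffer; the default dialect quotes only the
# fields that need it (csv.QUOTE_MINIMAL).
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
csv_text = buffer.getvalue()

# Read the text back. newline="" leaves line endings untouched so the
# reader can handle the line break embedded in the quoted field.
round_trip = list(csv.reader(io.StringIO(csv_text, newline="")))
```

When working with real files rather than an in-memory buffer, the csv module documentation recommends opening them with newline="" for the same reason.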

Example:

Tabular format:

+-------+-------------+----------+----------------------+
| Time  | Temperature | Humidity | Description          |
+-------+-------------+----------+----------------------+
| 08:00 |     70      |    35    | Sunny and Clear      |
| 11:45 |     94      |    90    | Hazy, Hot, and Humid |
| 14:30 |     18      |          | Freezing             |
+-------+-------------+----------+----------------------+

CSV format:

Time,Temperature,Humidity,Description
08:00,70,35,"Sunny and Clear"
11:45,94,90,"Hazy, Hot, and Humid"
14:30,18,,Freezing

In this example, the first row of CSV data serves as the "header", which describes the corresponding data below it. Each field in every subsequent line then falls under the heading in the same position on the first line. There is no inherent way to indicate within a CSV file whether the first row is a header row or not.
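Because the file itself cannot say whether its first row is a header, the decision is left to whoever reads it. The sketch below, using Python's standard csv module on the example data, shows both interpretations: csv.DictReader treats the first row as column names by default, while csv.reader returns every row as plain data. The variable names are illustrative only.

```python
import csv
import io

csv_text = (
    "Time,Temperature,Humidity,Description\n"
    '08:00,70,35,"Sunny and Clear"\n'
    '11:45,94,90,"Hazy, Hot, and Humid"\n'
    "14:30,18,,Freezing\n"
)

# Interpretation 1: the caller decides the first row is a header.
# DictReader uses it as the keys for each following row.
records = list(csv.DictReader(io.StringIO(csv_text, newline="")))

# Interpretation 2: every row is data; nothing is treated as a header.
raw_rows = list(csv.reader(io.StringIO(csv_text, newline="")))

# The quoted description keeps its commas as part of a single value.
hazy_description = records[1]["Description"]
```

Here records holds three dictionaries keyed by column name, whereas raw_rows holds four lists, the first of which is the header row itself.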

Note that empty fields (fields with no available data, such as the third field in the last line) are still delimited by commas so that the fields that follow remain in the correct columns.
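A short sketch of how such an empty field typically appears to a program: with Python's standard csv module, the missing value comes back as an empty string, and the fields after it keep their positions. The helper to_optional_int is a hypothetical name introduced here for illustration; how to represent missing data is a choice for the consuming code.

```python
import csv
import io

# The last data line of the example, with no humidity value.
line = "14:30,18,,Freezing\n"
fields = next(csv.reader(io.StringIO(line, newline="")))


def to_optional_int(text):
    """Hypothetical helper: map an empty field to None, otherwise parse an int."""
    return int(text) if text else None


temperature = to_optional_int(fields[1])
humidity = to_optional_int(fields[2])  # empty field, so this becomes None
description = fields[3]                # still the fourth field
```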

Questions tagged [csv] are expected to relate to programming in some way, for example, parsing/importing CSV files or creating them programmatically.

Related links:

Reference: Stack Overflow SE

102 questions
16
votes
2 answers

How to store strings in CSV with new line characters?

My question is: what are ways I can store strings in a CSV that contain newline characters (i.e. \n), where each data point is in one line? Sample data This is a sample of the data I have: data = [ ['some text in one line', 1], ['text…
Bruno Lubascher
  • 3,488
  • 1
  • 11
  • 35
8
votes
2 answers

ValueError: could not convert string to float: '���'

I have a (2M, 23) dimensional numpy array X. It has a dtype of
cappy0704
  • 231
  • 1
  • 3
  • 7
8
votes
5 answers

I got the following error : 'DataFrame' object has no attribute 'data'

I am trying to get the 'data' and the 'target' of the iris setosa database, but I can't. For example, when I load the iris setosa directly from sklearn datasets I get a good result: Program: from sklearn import datasets import numpy as np iris =…
user58187
  • 81
  • 1
  • 1
  • 2
7
votes
2 answers

Merging large CSV files in pandas

I have two CSV files (each of the file size is in GBs) which I am trying to merge, but every time I do that, my computer hangs. Is there no way to merge them in chunks in pandas itself?
enterML
  • 3,011
  • 9
  • 26
  • 38
6
votes
3 answers

How to get K most different rows in csv?

We have boring CSV with 10000 rows of ages (float), titles (enum/int), scores (float). How to select 1000 most different rows? I look for a general solution that would work for more than one case. What do I mean by different: We have N columns each…
Blender
  • 161
  • 3
6
votes
3 answers

Reading CSVs with new lines in fields with Spark

I was trying to load the below weblogic domain log (application error log) into Spark dataframe. I created a RDD and converted the RDD into dataframe. I was able to load the data successfully for the first two rows because the records are not spread…
uk2016
  • 61
  • 1
  • 1
  • 2
6
votes
4 answers

How can I observe my CSV files better?

I'm running a lot of experiments that give their output as CSV files. An experiment might be running for hours, with a new line being added to the CSV every 10 seconds. Right now I'm opening these CSV files in a text editor, which isn't too…
Ram Rachum
  • 255
  • 1
  • 5
6
votes
3 answers

Summarize and visualize a CSV in Java/Scala?

I would like to summarize (as in R) the contents of a CSV (possibly after loading it, or storing it somewhere, that's not a problem). The summary should contain the quartiles, mean, median, min and max of the data in a CSV file for each numeric…
Trylks
  • 178
  • 8
5
votes
2 answers

What kind of statistical analyses can I do with my data?

I'm trying to analyze human intentions in clicking google ad word keywords. In this dataset I have the usual adword details, for example CTR = Clicks / Impressions CPC = Cost / Clicks CPA = Cost / Converted Clicks ROI = Total Conversion Value /…
Miller
  • 287
  • 2
  • 9
4
votes
2 answers

Recommendations for storing time series data

As part of my thesis I've done some experiments that have resulted in a reasonable amount of time-series data (motion-capture + eye movements). I have a way of storing and organizing all of this data, but it's made me wonder whether there are best…
lmjohns3
  • 558
  • 6
  • 19
4
votes
1 answer

Any one read the over 100gb csv file and successfully concatenation?

I have been searching for the deal with large CSV file read method Its over 100gb and need to know how deal with the chunk file processing and make concatenation faster %%time import time filename = "../code/csv/file.csv" …
slowmonk
  • 513
  • 1
  • 7
  • 16
4
votes
2 answers

File format where column names are repeated on each row

I have received a dataset in text file with the following format col1=datac1r1,col2=datac2r1,col3=datac3r1 col1=datac1r2,col2=datac2r2,col3=datac3r2 col1=datac1r3,col2=datac2r3,col3=datac3r3 col1=datac1r4,col2=datac2r4,col3=datac3r4 Each row is a…
tallharish
  • 153
  • 3
3
votes
5 answers

Quick way to visually explore data?

I have a very large csv file which Apple Numbers won't open. I can open it with TextEdit, but each row is so long it forms multiple lines in the document and makes the document difficult to understand. Is there a tool for opening a csv and exploring…
user6309
  • 131
  • 2
3
votes
3 answers

Merging repeating data cells in csv

I have a CSV file with around 1 Million rows. Let say its have details like Name | Age | Salary name 1 52 10000 name 2 55 10043 name 3 50 100054 name 2 55 10023 name 1 52 …
Miller
  • 287
  • 2
  • 9
3
votes
3 answers

How to read file from user in Shiny and assign it to a variable in global.r?

I want to read a csv file as input from user in Shiny and assign it to a variable in global.r file. The code I have in ui.R is fileInput('file1', 'Choose CSV File', accept=c('text/csv', 'text/comma-separated-values,text/plain',…