Questions tagged [data-table]

20 questions
30
votes
4 answers

Is pandas now faster than data.table?

Here is the GitHub link to the most recent data.table benchmark. The data.table benchmark has not been updated since 2014. I heard somewhere that pandas is now faster than data.table. Is this true? Has anyone done any benchmarks? I have never used…
xiaodai
  • 620
  • 1
  • 5
  • 12
13
votes
4 answers

How to Write Multiple Data Frames in an Excel Sheet

I have multiple data frames with the same column names. I want to write them together to an Excel sheet, stacked vertically on top of each other, with a row of text between each of them. This is what I have in mind. I tried the…
Della
  • 315
  • 1
  • 3
  • 9
4
votes
2 answers

Mean across every several rows in pandas

I have a table of features and labels where each row has a timestamp. Labels are categorical. They come in batches, where one label repeats several times. Batches with the same label do not have a specific order. The number of repetitions of the same…
Munira
  • 157
  • 2
  • 9
3
votes
1 answer

Look for previous date in dataframe that has certain column category in R

I have the following data frame:

  Date.POSIXct        Date       WeekDay DayCategory Hour Holidays value
1 2018-05-01 00:00:00 2018-05-01 MA      MA-MI-JU    0    0        30
2 2018-05-01 01:00:00 2018-05-01 MA      MA-MI-JU    1    …
alvaropr
  • 141
  • 2
3
votes
4 answers

rows to columns in data.table R (or Python)

This is something I can't achieve with the reshape2 library for R. I have the following data:

   zone code literal
1:    A   14    bicl
2:    B   14    bicl
3:    B   24   calso
4:    A   51 …
cpumar
  • 807
  • 1
  • 9
  • 14
2
votes
1 answer

How to make smaller categories with factor character variables

I have a data set which consists of ISO 3166 alpha-2 codes for countries, for example DE, AD, AE, etc. They are coded as factor variables in R and there are about 173 observations. Now because there are too many and this would just overwhelm a boxplot, I…
Beharrlich
  • 29
  • 1
1
vote
1 answer

Pandas: Group by Single Column Entries

So I have the table above. I'm trying to aggregate the occupations such that the table results in: I've tried using df.groupby(['Occupation']) but I get an error. All I know is that my final step would be to set the index to "Occupation". But I…
J. Doe
  • 13
  • 2
1
vote
0 answers

Are there decisive leaders in programming with tabular data?

What are the most effective bread-and-butter in-memory open source tabular data frameworks today? I have been working with tabular data for years with an in-house solution that integrates with Excel well, but falls short of many other expectations.…
Monolithguy
  • 111
  • 1
1
vote
0 answers

Calculating % by Dividing Filtered Matrix Columns in MS Power BI

Given: A monthly percentage (%) metric has to be calculated by dividing a column ('Numerator') from one table by a column ('Denominator') from another table, both filtered by month, as in the example below: Table 1: Date_1 …
V_B
  • 11
  • 2
1
vote
2 answers

Theoretical Question: Data.table vs Data.frame with Big Data

I know that I can read in a very large csv file much faster with fread using the data.table library than with read.csv that reads a file in as a data.frame. However, dplyr can only perform operations on data.frame. My questions are: Why was…
Bear
  • 145
  • 1
  • 9
1
vote
0 answers

Table function output and order of arguments

I have a silly question. Below is the output of a logistic regression analysis I did. I notice that when I switch the order of the arguments I pass to the table function in R, it also switches the false positive and false negative values but…
Darrin Thomas
  • 191
  • 3
  • 13
0
votes
2 answers

Identifying patterns in tabular data

I have a set of tables, each containing a few thousand entries and some tens of columns of machine status values from production. The entries are of mixed types like string, float, or timestamp. Each table is pre-labeled with a certain failure mode (e.g.…
0
votes
1 answer

How do you view the elements of a CSV table with many columns (>30) whose column names are longer than 10 characters in pandas?

How do you view in pandas the elements of a CSV table with many columns (>25) whose column names are longer than 10 characters? I have 5000 rows and 32 columns, and the labels of some columns are longer than 10 characters. How can I see them and…
user10296606
  • 1,784
  • 5
  • 17
  • 31
0
votes
1 answer

Software for automated database processing

I am facing a problem that I'd like to solve without any programming, and I am looking for software to do this. I have a dataset, for example: (brand-id, brand-name, product-class-name;) 0, Audi, economy business premium; 1, Rolls Royce, luxury; 2, Seat,…
0
votes
1 answer

R: Calculations based on frequencies / grouped / aggregate data

I am trying to do simple calculations in R when no raw data is available, only grouped data with frequencies. This is the case when I have a large number of records in a database, say a large SQL table, and then for given reasons GROUP BY and…
joffdd
  • 11
  • 1