Questions tagged [descriptive-statistics]

Descriptive statistics summarize features of a sample, such as mean and standard deviations, median and quartiles, the maximum and minimum. With multiple variables, may include correlations and crosstabs. Can include visual displays - boxplots, histograms, scatterplots and so on.

Descriptive statistics summarize features of a sample.

Common descriptive statistics include mean and standard deviations, particular quantiles like the median and quartiles, the maximum and minimum, range and interquartile range, five number summaries and so on, but with multiple variables, may include correlations and crosstabs.

Descriptive statistics may include visual displays such as boxplots, histograms and scatterplots.

90 questions

votes

5 answers

How can I ensure anonymity with queries to small datasets?

I'm building a service that will contain personal data relating to real people. Initially the dataset will be quite small, and as such it may be possible to identify individuals if the search parameters are narrowed sufficiently. An example of a…

asked Mar 23 '23 at 08:57

mal

votes

5 answers

When to use mean vs median

I'm new to data science and stats, so this might seems like a beginner question. I'm working on a dataset where I've user's Twitter followers gain per day. I want to measure the average growth he had over a period of time, which I did by finding the…

statistics descriptive-statistics

asked Mar 06 '19 at 03:30

Mukul Jain

votes

2 answers

Which statistical test tells which classifier performs better than the other?

I have 3 classifiers: A, B and C. According to accuracy, specificity, sensitivity, f-score, and g-mean, say classifier B performs best. Now I want to statistically validate this claim. How should I do it? Will McNemar's test be enough to validate…

machine-learning classification statistics performance descriptive-statistics

asked Dec 20 '19 at 04:51

girl101

1,161
2
11
25

votes

1 answer

How does Seaborn calculate error bars when using estimators other than the arithmetic mean?

If I create a barplot using Seaborn and specify the geometric mean or the median as the estimator, does Seaborn know to use the appropriate standard error formula to create error bars?

python pandas descriptive-statistics seaborn

asked Mar 01 '16 at 16:44

LizK

votes

2 answers

Approach to creating a user profile in music web application

I am working on a use case, and I'm unsure of the best way to proceed: in order to analyze the behavior of users of a web-based music application, we retain all songs each has played since 2009. We store this information in flat files, each…

feature-construction descriptive-statistics aggregation

asked Sep 29 '15 at 19:33

user17241

votes

1 answer

Range to define emotions

We are capturing emotions as survey responses. We need to assign values for the responses(emotions) for analysis purposes. Is there an optimum range that can be assigned to achieve this? (like from -100 to 100). An example of a question and a set of…

sentiment-analysis weighted-data descriptive-statistics

asked Jun 13 '17 at 16:38

user33293

votes

2 answers

How to measure the performance of an imputation technique

I would like to know how I can measure the performance of an imputation technique. I have read a lot about this. Most literature on the web are applying a classifier after the data has been completed. So this classifier will be used in order to make…

data-mining descriptive-statistics data-imputation

asked Mar 10 '16 at 21:48

user6046209

votes

2 answers

Distribution vs Histogram

I am not very good at understanding statistics jargon I tried to read articles on statistics every article says distribution and show a picture of bell curve. I understand that x axis of bell curve will take the value of data but does y-axis always…

statistics descriptive-statistics

asked Jun 08 '20 at 15:51

Hitesh Somani

votes

2 answers

evaluation metrics for multiple values per session

I have an application that executes my foo() function several times for each user session. There are 2 alternate algorithms that i can implement as "foo" function and my goal is to evaluate them based on execution delay . The number of times foo()…

statistics accuracy model-evaluations distribution descriptive-statistics

asked Dec 30 '19 at 06:34

sbr

votes

2 answers

How do I handle string feature while performing model generation

I have data which looks like this shift_id user_id status organization_id location_id department_id open_positions city zip role_id specialty_id latitude longitude years_of_experience …

python scikit-learn pandas descriptive-statistics

asked Feb 15 '19 at 08:16

nlper

votes

1 answer

Churn definition for non-contractual services

I am computing churn definition i.e, the number of days after which we will say a customer has churned in fashion retail, etc. Currently, I am using the transaction dates to get the average days between two purchases of each customer, other stats…

machine-learning statistics machine-learning-model descriptive-statistics

asked Jul 05 '18 at 05:19

aspiring1

votes

1 answer

Statistically Robust Distance Measure/Metric for comparing more than two network data series

I have about 30 lists of unequal length (some of which are triplicates of the data), corresponding to metrics relating to nodes of different graphs. I want to compare their similarity using a distance metric, but was unsure which method I can use…

data-mining distance descriptive-statistics

asked Mar 10 '18 at 14:11

user112237

votes

1 answer

Making Use of the Target Values for Regression

Problem: I have a regression problem and I decided to useg Gradient Boosting Regression Trees to solve it. After all the preprocessing, I end up having around 130 attributes, 70K rows, and my cross-validated R-squared is 0.62. Work So Far: To…

clustering regression statistics decision-trees descriptive-statistics

asked Nov 08 '17 at 09:16

mari

votes

2 answers

p-value and effect size

Is it correct to say that the lower the p-value is the higher is the difference between the two means of the two groups in the t-test? For example, if I apply the t-test between two groups of measurements A and B and then to two groups of…

statistics descriptive-statistics inference pvalue

asked Feb 12 '21 at 20:44

HelpNeederStudent

votes

0 answers

Statistical method to find the value which preserves the most information inside "most" of data points. (resize images to a common height)

So I have this data of around 88K images and I found out some interesting properties for my images. print(np.median(width),np.mean(width),scipy.stats.mode(width)) print(np.median(height),np.mean(height),scipy.stats.mode(height)) >> 1280.0…

machine-learning deep-learning computer-vision image-preprocessing descriptive-statistics

asked Jan 06 '21 at 08:25

Deshwal

2 3 4 5 6 Next