Questions tagged [descriptive-statistics]

Descriptive statistics summarize features of a sample.

Common descriptive statistics include means and standard deviations, particular quantiles such as the median and quartiles, the maximum and minimum, the range and interquartile range, and five-number summaries. With multiple variables, they may also include correlations and crosstabs.

Descriptive statistics may include visual displays such as boxplots, histograms and scatterplots.
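
As a rough illustration of the summaries named above, here is a minimal Python sketch using only the standard library. The helper names `describe` and `pearson` and the sample data are hypothetical, made up for this example. The quartiles use the "inclusive" interpolation method, which is only one of several common conventions, so other tools may report slightly different values.

```python
import statistics


def describe(values):
    """Return common descriptive statistics for a list of numbers."""
    ordered = sorted(values)
    # Quartiles via linear interpolation; other conventions exist.
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return {
        "mean": statistics.mean(ordered),
        "stdev": statistics.stdev(ordered),  # sample standard deviation (n - 1)
        "min": ordered[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": ordered[-1],
        "range": ordered[-1] - ordered[0],
        "iqr": q3 - q1,
    }


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs) ** 0.5
    spread_y = sum((y - mean_y) ** 2 for y in ys) ** 0.5
    return cov / (spread_x * spread_y)


# Made-up sample data, for illustration only.
heights = [150, 160, 165, 170, 180]
weights = [50, 58, 63, 68, 80]

summary = describe(heights)
correlation = pearson(heights, weights)

for name, value in summary.items():
    print(f"{name}: {value}")
print(f"correlation(heights, weights): {correlation:.3f}")
```

The five-number summary is the `min`, `q1`, `median`, `q3` and `max` entries taken together.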

90 questions
15 votes, 5 answers

How can I ensure anonymity with queries to small datasets?

I'm building a service that will contain personal data relating to real people. Initially the dataset will be quite small, and as such it may be possible to identify individuals if the search parameters are narrowed sufficiently. An example of a…
8 votes, 5 answers

When to use mean vs median

I'm new to data science and stats, so this might seem like a beginner question. I'm working on a dataset containing a user's Twitter follower gain per day. I want to measure the average growth the user had over a period of time, which I did by finding the…
Mukul Jain
7 votes, 2 answers

Which statistical test tells which classifier performs better than the other?

I have 3 classifiers: A, B and C. According to accuracy, specificity, sensitivity, f-score, and g-mean, say classifier B performs best. Now I want to statistically validate this claim. How should I do it? Will McNemar's test be enough to validate…
6 votes, 1 answer

How does Seaborn calculate error bars when using estimators other than the arithmetic mean?

If I create a barplot using Seaborn and specify the geometric mean or the median as the estimator, does Seaborn know to use the appropriate standard error formula to create error bars?
LizK
4 votes, 2 answers

Approach to creating a user profile in music web application

I am working on a use case, and I'm unsure of the best way to proceed: in order to analyze the behavior of users of a web-based music application, we retain all songs each has played since 2009. We store this information in flat files, each…
4 votes, 1 answer

Range to define emotions

We are capturing emotions as survey responses. We need to assign values to the responses (emotions) for analysis purposes. Is there an optimum range that can be assigned to achieve this (for example, from -100 to 100)? An example of a question and a set of…
user33293
4 votes, 2 answers

How to measure the performance of an imputation technique

I would like to know how I can measure the performance of an imputation technique. I have read a lot about this. Most of the literature on the web applies a classifier after the data has been completed. So this classifier will be used in order to make…
3 votes, 2 answers

Distribution vs Histogram

I am not very good at understanding statistics jargon. I tried to read articles on statistics, and every article says "distribution" and shows a picture of a bell curve. I understand that the x-axis of the bell curve takes the values of the data, but does the y-axis always…
Hitesh Somani
3 votes, 2 answers

evaluation metrics for multiple values per session

I have an application that executes my foo() function several times for each user session. There are two alternative algorithms that I can implement as the foo() function, and my goal is to evaluate them based on execution delay. The number of times foo()…
3 votes, 2 answers

How do I handle string feature while performing model generation

I have data which looks like this: shift_id user_id status organization_id location_id department_id open_positions city zip role_id specialty_id latitude longitude years_of_experience …
nlper
3 votes, 1 answer

Churn definition for non-contractual services

I am computing a churn definition, i.e., the number of days after which we will say a customer has churned, in fashion retail, etc. Currently, I am using the transaction dates to get the average number of days between two purchases for each customer, other stats…
3 votes, 1 answer

Statistically Robust Distance Measure/Metric for comparing more than two network data series

I have about 30 lists of unequal length (some of which are triplicates of the data), corresponding to metrics relating to nodes of different graphs. I want to compare their similarity using a distance metric, but was unsure which method I can use…
user112237
3 votes, 1 answer

Making Use of the Target Values for Regression

Problem: I have a regression problem and I decided to use Gradient Boosting Regression Trees to solve it. After all the preprocessing, I end up having around 130 attributes, 70K rows, and my cross-validated R-squared is 0.62. Work So Far: To…
2 votes, 2 answers

p-value and effect size

Is it correct to say that the lower the p-value is, the larger the difference between the means of the two groups in the t-test? For example, if I apply the t-test between two groups of measurements A and B and then to two groups of…
2 votes, 0 answers

Statistical method to find the value which preserves the most information inside "most" of data points. (resize images to a common height)

So I have this data of around 88K images and I found out some interesting properties for my images. print(np.median(width),np.mean(width),scipy.stats.mode(width)) print(np.median(height),np.mean(height),scipy.stats.mode(height)) >> 1280.0…