
I am a Python newbie and want to plot a list of values between -0.2 and 0.2. The list looks like this:

[...-0.01501152092971969,
  -0.01501152092971969,
  -0.01501152092971969,
  -0.01501152092971969,
  -0.01501152092971969,
  -0.01501152092971969,
  -0.01501152092971969,
  -0.01501152092971969,
  -0.01501152092971969,
  -0.01489985147131656,
  -0.015833709930856088,
  -0.015833709930856088,
  -0.015833709930856088,
  -0.015833709930856088,
  -0.015833709930856088...and so on].

In statistics I learned to group data into classes to get a useful histogram when there is this much data.

How can I add classes to my plot in Python?

My code is

plt.hist(data)

and the histogram looks like this: [image: current histogram]

But it should look like this: [image: desired histogram]

Brian Spiering
Thomas
  • This is unclear. Are you asking for how to group the data, or how to plot grouped data? – Stephen Rauch Jan 27 '18 at 19:19
  • @Stephen Rauch: I am asking how to group the data, with plt.hist() or in another way. After grouping the data I want to plot it. @Media: `plt.hist(cum_returns_10_5, bins=range(min(cum_returns_10_5), max(cum_returns_10_5) + binwidth, binwidth))` raises `NameError: name 'binwidth' is not defined`. Your solution `plt.hist(data, bins=range(min(data), max(data) + binwidth, binwidth))` produces this error. – Tom Jan 27 '18 at 19:43
  • You should not put this information into an answer. You can comment, or edit your question, or both. – Stephen Rauch Jan 27 '18 at 19:48
  • Welcome to the community @Tom, please use comments. The reason it is not working is that you have to set those variables yourself. They are placeholders for illustration purposes; you have to put actual values in their place. – Green Falcon Jan 27 '18 at 19:49
  • Thank you for that hint @Media! @Stephen Rauch: Would you be so kind as to tell me how to group data in Python that is stored in a list so that it can be plotted? Thanks for your help :) – Tom Jan 27 '18 at 20:45
  • @Tom that is not how it works here. Questions need to be clear to be answered. I suggest you take the [tour](https://datascience.stackexchange.com/tour), and look the rest of the help center, starting [here](https://datascience.stackexchange.com/help/how-to-ask). – Stephen Rauch Jan 27 '18 at 20:51
  • @Tom what do you mean by grouping? If you mean you need to specify axis values, you can do that using the arguments of the functions themselves. You don't need to group the values yourself. – Green Falcon Jan 28 '18 at 04:25

2 Answers


Your histogram is valid, but it has too many bins to be useful.

If you want a number of equally spaced bins, you can simply pass that number through the bins argument of plt.hist, e.g.:

plt.hist(data, bins=10)
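To see which classes an integer `bins` produces without drawing anything, `numpy.histogram` accepts the same `bins` argument and returns the counts and the bin edges. A minimal sketch, assuming made-up data in the question's -0.2 to 0.2 range (the `data` below is hypothetical):

```python
import numpy as np

# Hypothetical stand-in for the question's list: values between -0.2 and 0.2
rng = np.random.default_rng(0)
data = rng.uniform(-0.2, 0.2, size=1000).tolist()

# bins=10 -> 10 equal-width classes spanning min(data) to max(data)
counts, edges = np.histogram(data, bins=10)

print(len(edges))      # 11: eleven edges delimit ten classes
print(counts.sum())    # 1000: every value falls into exactly one class
print(np.diff(edges))  # all class widths are equal
```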

If you want your bins to have specific edges, you can pass these as a list to bins:

plt.hist(data, bins=[0, 5, 10, 15, 20, 25, 30, 35, 40, 60, 100])
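The edges above suit data between 0 and 100; for values between -0.2 and 0.2, as in the question, the edges have to lie in that range. A minimal sketch with made-up values and edges chosen purely for illustration. Note that `numpy.histogram` (and therefore the plot) ignores values outside the first and last edge:

```python
import numpy as np

# Made-up values in the question's range, plus one value (0.35) outside the chosen edges
data = [-0.12, -0.015, -0.015, -0.0149, 0.02, 0.11, 0.19, 0.35]

# Eight classes of width 0.05 covering [-0.2, 0.2]
edges = np.linspace(-0.2, 0.2, 9)
counts, _ = np.histogram(data, bins=edges)

print(counts)        # number of values per class
print(counts.sum())  # 7, not 8: the value 0.35 lies outside the edges
```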

Finally, you can also specify a method to calculate the bin edges automatically, such as 'auto' (the available methods are listed in the documentation of numpy.histogram_bin_edges):

plt.hist(data, bins='auto')
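To check how many bins a method such as 'auto' picks for your data, `numpy.histogram_bin_edges` returns the edges without plotting. A minimal sketch, assuming hypothetical normally distributed values of roughly the question's scale:

```python
import numpy as np

# Hypothetical data standing in for the question's list
rng = np.random.default_rng(1)
data = rng.normal(0.0, 0.05, size=2000)

# Inspect the edges that bins='auto' would use
edges = np.histogram_bin_edges(data, bins='auto')
print(f"'auto' chose {len(edges) - 1} bins of width {edges[1] - edges[0]:.4f}")
```

Passing these edges back with `plt.hist(data, bins=edges)` should draw the same histogram as `bins='auto'`.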

Complete code sample

import matplotlib.pyplot as plt
import numpy as np

# fix the random state for reproducibility
np.random.seed(19680801)

# sum of 2 normally distributed variables (mean 20)
n = 500
data = 10 * np.random.randn(n) + 20 * np.random.randn(n) + 20

# plot histograms with various bins
fig, axs = plt.subplots(1, 3, sharey=True, tight_layout=True, figsize=(9, 3))
axs[0].hist(data, bins=10)
axs[1].hist(data, bins=[0, 5, 10, 15, 20, 25, 30, 35, 40, 60, 100])
axs[2].hist(data, bins='auto')

plt.show()

[image: the three resulting histograms]

Xavier

If I've understood the question correctly, you have to specify the bins, as stated here.

You can give a list with the bin boundaries.

plt.hist(data, bins=[0, 10, 20, 30, 40, 50, 100])
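If "grouping into classes" is meant in the statistical sense of a frequency table, the count per class is available from `numpy.histogram` with the same list of boundaries. A minimal sketch with made-up values and hand-picked boundaries (both hypothetical) for data in the question's range:

```python
import numpy as np

# Hypothetical values in the question's range (illustration only)
data = [-0.015, -0.015, -0.0149, -0.0158, -0.0158, 0.004, 0.031, 0.12]

# Hand-picked class boundaries; the classes need not be equally wide
edges = [-0.2, -0.1, -0.02, 0.0, 0.02, 0.1, 0.2]
counts, _ = np.histogram(data, bins=edges)

# NumPy's classes are half-open [a, b), except the last one, which includes its right edge
rows = []
for i, n in enumerate(counts):
    closing = "]" if i == len(counts) - 1 else ")"
    rows.append(f"[{edges[i]:+.2f}, {edges[i + 1]:+.2f}{closing}: {n}")
print("\n".join(rows))
```

`plt.hist(data, bins=edges)` draws bars for these same classes.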

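If you just want them equally spaced, you can use numpy.arange (with numpy imported as np). The built-in range only accepts integers, and binwidth is a placeholder you have to set yourself:

binwidth = 0.01  # choose a width that suits your data
plt.hist(data, bins=np.arange(min(data), max(data) + binwidth, binwidth))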
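The built-in `range` only accepts integers, so for data between -0.2 and 0.2 equally spaced edges have to be computed with floats. A minimal sketch of one way to do that, using a hypothetical helper `equal_width_edges` (not part of matplotlib or NumPy) that counts the bins first and then multiplies, rather than stepping with a float:

```python
import math

import numpy as np


def equal_width_edges(values, binwidth):
    """Hypothetical helper: equally spaced bin edges of width `binwidth`,
    starting at min(values) and reaching at least max(values)."""
    lo, hi = min(values), max(values)
    # Count the bins first, then multiply, instead of accumulating a float step
    n_bins = max(1, math.ceil((hi - lo) / binwidth))
    edges = lo + binwidth * np.arange(n_bins + 1)
    # Guard against rounding leaving the last edge just below the maximum
    if edges[-1] < hi:
        edges = np.append(edges, edges[-1] + binwidth)
    return edges


# Made-up values in the question's range
data = [-0.0150, -0.0149, -0.0158, 0.004, 0.031]
edges = equal_width_edges(data, binwidth=0.01)
print(len(edges) - 1, "bins from", edges[0], "to", edges[-1])
```

The result can be passed directly as `plt.hist(data, bins=edges)`.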

You can also take a look here and here.

Green Falcon
  • If you want them equally distributed, there is a simpler way: instead of giving the bin boundaries as an argument, just tell matplotlib how many bins you want, e.g. `plt.hist(data, bins=20)`. – Xavier Sep 21 '19 at 09:31
  • @Xavier Thank you for your response. I guess you may want to submit that as an answer. As you may have noticed, the question is not closed yet :) – Green Falcon Sep 22 '19 at 10:40