Questions tagged [open-source]
20 questions
202
votes
35 answers
Publicly Available Datasets
One of the common problems in data science is gathering data from various sources in a somewhat cleaned (semi-structured) format and combining metrics from those sources for a higher-level analysis. Looking at other people's efforts,…
Amir Ali Akbari
28
votes
7 answers
Publicly available social network datasets/APIs
As an extension to our great list of publicly available datasets, I'd like to know if there is any list of publicly available social network datasets/crawling APIs. It would be very nice if, alongside a link to the dataset/API, characteristics…
Rubens
19
votes
5 answers
Open source data science projects to contribute
Contributing to open source projects is typically a good way for newbies to get some practice, and for experienced data scientists and analysts to try a new area.
Which projects do you contribute to? Please provide a short intro and a link on GitHub.
IgorS
7
votes
5 answers
Where can I find a free spatio-temporal dataset for download?
Where can I find a free spatio-temporal dataset for download so that I can play with it in R?
mynameisJEFF
7
votes
2 answers
Item Based Collaborative Filtering with No Ratings
I am building a recommender for web pages. For each web page in our data set, we wish to generate a list of web pages that other users have also visited.
Our data only shows that a user has either visited a page, or they have not. Users do not…
sheldonkreger
4
votes
3 answers
What open-source books (or other materials) provide a relatively thorough overview of data science?
As a researcher and instructor, I'm looking for open-source books (or similar materials) that provide a relatively thorough overview of data science from an applied perspective. To be clear, I'm especially interested in a thorough overview that…
statsRus
4
votes
3 answers
Data available from industry operations
I'm about to start my degree thesis, and I want to build a fault detection system using machine learning techniques. I need datasets for my thesis, but I don't know where to get them. I'm looking for historical operation/maintenance/fault datasets…
Juan David
4
votes
1 answer
Open source solver for large mixed integer programming task?
I'm currently using General Algebraic Modeling System (GAMS), and more specifically CPLEX within GAMS, to solve a very large mixed integer programming problem. This allows me to parallelize the process over 4 cores (although I have more, CPLEX…
rnorberg
2
votes
0 answers
Regression dataset with categorical features
I have thought of a regression technique that I want to try on several datasets. I would like these datasets to have the following properties:
Be a tabular dataset (no images).
Have at least 20k rows, and ideally around 100k.
Have some categorical…
David Masip
2
votes
0 answers
Which algorithm is good for detecting and recognizing faces from a variety of angles?
I am building a face recognition app for my class attendance system. I collect training data from social websites such as Facebook, Instagram, and others. As you can see, the images I get from there are usually not frontal but taken at a variety of angles. I…
RISHABH RAI
2
votes
1 answer
Difficulties of getting raw data
I am trying to obtain raw data for (violent) crime rates of a US/Canadian city (any city would do), but I need the data to be granular and raw. All I could find is either interpretations, summary data or useless editorials. I'm trying to do…
LearnByReading
1
vote
1 answer
Publicly available news APIs/datasets?
In addition to our list of publicly available datasets, I'd like to know if there is any list of publicly available news datasets/crawling APIs. It would be very nice if, alongside a link to the dataset/API, characteristics of the data available…
stevec
1
vote
3 answers
Tools to preprocess big data for dashboards?
I have a complex dataset with more than 16M rows from the pharmaceutical industry. The data is stored in a SQL Server database with multiple (more than 400) relational tables. The data has several levels of hierarchy, such as province, city, postal…
JeanVuda
1
vote
0 answers
Code or Package to cluster sequences (or time series) of different lengths based on HMM?
Are there any existing code or packages in Python, R, Java, Matlab, or Scala that implement the sequence clustering algorithms in either of the following two papers?
1) 'Clustering Sequences with Hidden Markov Models' by Padhraic Smyth (1997):…
mflowww
1
vote
4 answers
Which is the most effective (accurate) face detection method in Python?
I tried haar_cascade for face detection and LBPH for face recognition, but the results weren't good enough. Please suggest good ways to detect and recognize faces.
My aim is to create an app which takes a photograph of students and by scanning this one…
RISHABH RAI