Questions tagged [databases]

A comprehensive collection of related data organized for convenient access, generally associated with software to update and query the data.

From Wikipedia:

A database is an organized collection of data. The data is typically organized to model relevant aspects of reality (for example, the availability of rooms in hotels), in a way that supports processes requiring this information (for example, finding a hotel with vacancies).

A large proportion of websites and applications rely on databases. They are a crucial component of telecommunications systems, banking systems, video games, and just about any other software system or electronic device that maintains some amount of persistent information. In addition to persistence, database systems provide a number of other properties that make them exceptionally useful and convenient: reliability, efficiency, scalability, concurrency control, data abstraction, and high-level query languages. Databases are so ubiquitous and important that computer science graduates frequently cite their database class as the one most useful to them in their industry or graduate-school careers.

The term database should not be confused with Database Management System (DBMS). A DBMS is the system software used to create and manage databases and provide users and applications with access to the database(s). A database is to a DBMS as a document is to a word processor.
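
As a rough illustration of the distinction, here is a minimal sketch that uses Python's built-in `sqlite3` module as the DBMS and an in-memory database modeled on the hotel-vacancy example from the Wikipedia excerpt. The table name `hotels`, its columns, the sample rows, and the helper `find_hotels_with_vacancies` are all made up for this example.

```python
import sqlite3


def find_hotels_with_vacancies(conn):
    """Return the names of hotels with at least one vacant room, sorted by name."""
    rows = conn.execute(
        "SELECT name FROM hotels WHERE vacant_rooms > 0 ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


# SQLite plays the role of the DBMS; ":memory:" asks it for a throwaway database.
conn = sqlite3.connect(":memory:")

# The database itself: a hypothetical table modeling room availability.
conn.execute(
    "CREATE TABLE hotels (name TEXT PRIMARY KEY, vacant_rooms INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO hotels (name, vacant_rooms) VALUES (?, ?)",
    [("Alpine Lodge", 3), ("Harbor Inn", 0), ("City Suites", 12)],
)
conn.commit()

vacancies = find_hotels_with_vacancies(conn)
print(vacancies)
```

The data (the table and its rows) is the database; `sqlite3` is the software that creates it, stores it, and answers the query against it.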

98 questions
59 votes, 9 answers

How to deal with version control of large amounts of (binary) data

I am a PhD student in Geophysics and work with large amounts of image data (hundreds of GB, tens of thousands of files). I know svn and git fairly well and have come to value a project history, combined with the ability to easily work together and have…
Johann
38 votes, 5 answers

Best practices to store Python machine learning models

What are the best practices to save, store, and share machine learning models? In Python, we generally store the binary representation of the model, using pickle or joblib. Models, in my case, can be ~100 MB in size. Also, joblib can save one model to…
Antoine Dusséaux
27 votes, 4 answers

What makes columnar databases suitable for data science?

What are some of the advantages of columnar data-stores which make them more suitable for data science and analytics?
Dawny33
15 votes, 1 answer

What is the difference between ImageNet and ImageNet1k? How to download it?

Some papers mention just ImageNet and some mention the ImageNet 1k database. What is the difference between the two? Are they the same, or is the latter a subset of the former? I'm working on Generative Adversarial Nets. I wanted to train it on…
Nagabhushan S N
14 votes, 5 answers

Advantages of pandas dataframe to regular relational database

In Data Science, many seem to be using pandas dataframes as the datastore. What are the features of pandas that make it a superior datastore compared to regular relational databases like MySQL, which are used to store data in many other fields of…
Simon Boehm
13 votes, 3 answers

Efficient database model for storing data indexed by n-grams

I'm working on an application which requires creating a very large database of n-grams that exist in a large text corpus. I need three efficient operation types: Lookup and insertion indexed by the n-gram itself, and querying for all n-grams that…
Phonon
13 votes, 1 answer

Neo4j vs OrientDB vs Titan

I am working on a data-science project related to social relationship mining and need to store data in a graph database. Initially I chose Neo4j as the database. But it seems Neo4j doesn't scale well. The alternatives I found are Titan and…
Sreejithc321
13 votes, 1 answer

When does a relational database have better performance than a non-relational one?

When does a relational database, like MySQL, have better performance than a non-relational one, like MongoDB? I saw a question on Quora the other day about why Quora still uses MySQL as their backend, and that their performance is still good.
11 votes, 3 answers

Which is faster: PostgreSQL vs MongoDB on large JSON datasets?

I have a large dataset with 9m JSON objects at ~300 bytes each. They are posts from a link aggregator: basically links (a URL, title and author id) and comments (text and author ID) + metadata. They could very well be relational records in a table,…
blue-dino
10 votes, 2 answers

Is this Neo4j comparison to RDBMS execution time correct?

Background: Following is from the book Graph Databases, which covers a performance test mentioned in the book Neo4j in Action: Relationships in a graph naturally form paths. Querying, or traversing, the graph involves following paths. Because of…
blunders
10 votes, 7 answers

What is the ideal database that allows fast cosine distance?

I'm currently trying to store many feature vectors in a database so that, upon request, I can compare an incoming feature vector against many others (if not all) stored in the db. I would need to compute the cosine distance and only return, for…
G4bri3l
10 votes, 2 answers

Have 100% images from ImageNet been proven to belong to the class annotated?

Is it proven that all 15M images were manually classified correctly and there are no mistakes or randomly selected responses collected?
ivan866
9 votes, 3 answers

Human activity recognition using smartphone data set problem

I'm new to this community and hopefully my question will fit in well here. As part of my undergraduate data analytics course I have chosen to do a project on human activity recognition using smartphone data sets. As far as I'm concerned this topic…
Jakubee
6 votes, 2 answers

Python interface to Titan Database

How can I connect to a Titan database from Python? What I understand is that Titan (graph database) provides an interface (Blueprint) to Cassandra (column store) and bulb is a Python interface to graph DBs. Now how can I start programming in Python…
Sreejithc321
6 votes, 1 answer

Is Data Science just a trend or is a long term concept?

I have seen a lot of courses in Data Science emerging in the last 2 years. Even big universities like Stanford and Columbia offer an MS specifically in Data Science. But as far as I can see, it looks like data science is just a mix of computer science and…