Structured Query Language (SQL) is a language for querying databases. Questions should include code examples, table structure, sample data, and a tag for the DBMS implementation being used (e.g. MySQL, PostgreSQL, Oracle, MS SQL Server, IBM DB2).
Questions tagged [sql]
101 questions
142
votes
12 answers
Why do people prefer Pandas to SQL?
I've been using SQL since 1996, so I may be biased. I've used MySQL and SQLite 3 extensively, but have also used Microsoft SQL Server and Oracle.
The vast majority of the operations I've seen done with Pandas can be done more easily with SQL. This…
vy32
26
votes
5 answers
Natural Language to SQL query
I have been working on developing a system for converting natural language to SQL queries.
I have read the answers to similar questions, but was not able to find the information I was looking for.
Below is the flowchart for such a system, which I…
deepguy
20
votes
5 answers
Do modern R and/or Python libraries make SQL obsolete?
I work in an office where SQL Server is the backbone of everything we do, from data processing to cleaning to munging. My colleague specializes in writing complex functions and stored procedures to methodically process incoming data so that it can…
AffableAmbler
11
votes
3 answers
Which is faster: PostgreSQL vs MongoDB on large JSON datasets?
I have a large dataset with 9m JSON objects at ~300 bytes each. They are posts from a link aggregator: basically links (a URL, title and author id) and comments (text and author ID) + metadata.
They could very well be relational records in a table,…
blue-dino
11
votes
2 answers
Tools for automatic anomaly detection on a SQL table?
I have a large SQL table that is essentially a log. The data is pretty complex and I'm trying to find some way to identify anomalies without me understanding all the data. I've found lots of tools for Anomaly Detection but most of them require a…
THE JOATMON
10
votes
4 answers
How to debug data analysis?
I've come across the following problem, which I reckon is rather typical.
I have some large data, say, a few million rows. I run some non-trivial analysis on it, e.g. an SQL query consisting of several sub-queries. I get some result, stating, for…
Little Bobby Tables
7
votes
2 answers
How important is advanced SQL for data science?
Is advanced-level SQL required to be competitive as a data scientist? Is it more important for a data analyst to be good at SQL? Is it enough to be able to extract data using simple SQL queries?
I know it is faster to manipulate data in SQL than to…
user62049
6
votes
2 answers
How to best accomplish high speed comparison of like data?
I frequently tackle this problem inefficiently because it's always pretty low on the priority list and my clients are resistant to change until things break. I would like some input on how to speed things up.
I have multiple datasets of…
Steve Kallestad
5
votes
3 answers
Storing Sensor Data for Analysis of the Office
I have been tasked with designing an application that tracks several different measurements around the office, e.g. the temperature, light, presence of people, etc. Having never really worked on data analysis before, I would like some…
Utsav Tiwary
5
votes
1 answer
Data enrichment of geographical records
I have a user_data table with various fields, some of which are based on geography.
I'd like to enrich the data with additional columns, such as expected_income_in_region, city_population, life_expectancy_in_state, etc., for each user record.
I'd like…
Uri Goren
5
votes
1 answer
Python & Pandas : TypeError: to_sql() got an unexpected keyword argument 'flavor'
I want to store JSON data in a MySQL database using Python.
I used a pandas DataFrame. I found that to_sql() can do this job easily.
Python Code:
jdata=json.loads(json_data)
df=pandas.DataFrame(jdata)
df.to_sql(con=con, name='crashTable',…
Dipankar Nalui
5
votes
1 answer
Data engineering good and bad practice?
I'm a Data Analyst in a pretty big company and I'm having a really bad time with the data I'm being given. I spend about 70% of my time thinking about where to find the data and how to pull it instead of analyzing it. I have to pull from tables that…
Marc
4
votes
1 answer
Traversing trees in SQL: JOINs vs imperative algorithm
I have a table representing posts on a message board. Posts may or may not have parents.
What is the most common way to get all posts starting from a given post, or to find the root post of any given post?
I can think of using JOINs to join parents…
blue-dino
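One possible way to approach the tree question above is a recursive common table expression, which keeps the traversal inside SQL rather than in an imperative loop. The sketch below is only an illustration under stated assumptions: the `posts` table, its `id` and `parent_id` columns, the sample rows, and the helper names `descendants` and `root_of` are all hypothetical, and SQLite (via Python's standard `sqlite3` module) stands in for whatever DBMS the question actually uses.

```python
import sqlite3

# Hypothetical schema: each post optionally points at its parent post.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE posts ("
    "id INTEGER PRIMARY KEY, "
    "parent_id INTEGER REFERENCES posts(id), "
    "body TEXT)"
)
# Made-up sample data: post 1 is a root with replies, post 5 is a separate root.
conn.executemany(
    "INSERT INTO posts (id, parent_id, body) VALUES (?, ?, ?)",
    [
        (1, None, "root"),
        (2, 1, "reply to 1"),
        (3, 1, "another reply to 1"),
        (4, 2, "reply to 2"),
        (5, None, "unrelated root"),
    ],
)


def descendants(conn, post_id):
    """Return the ids of post_id and every post below it, sorted by id."""
    rows = conn.execute(
        """
        WITH RECURSIVE subtree(id) AS (
            SELECT id FROM posts WHERE id = ?
            UNION ALL
            SELECT p.id FROM posts AS p JOIN subtree AS s ON p.parent_id = s.id
        )
        SELECT id FROM subtree ORDER BY id
        """,
        (post_id,),
    ).fetchall()
    return [r[0] for r in rows]


def root_of(conn, post_id):
    """Follow parent links upward and return the id of the root post."""
    row = conn.execute(
        """
        WITH RECURSIVE ancestors(id, parent_id) AS (
            SELECT id, parent_id FROM posts WHERE id = ?
            UNION ALL
            SELECT p.id, p.parent_id
            FROM posts AS p JOIN ancestors AS a ON p.id = a.parent_id
        )
        SELECT id FROM ancestors WHERE parent_id IS NULL
        """,
        (post_id,),
    ).fetchone()
    return row[0] if row else None
```

With the sample rows, `descendants(conn, 2)` collects post 2 and its reply, and `root_of(conn, 4)` walks up to post 1. Whether this beats a fixed chain of JOINs depends on the DBMS and on how deep the threads get; a chain of JOINs only reaches a depth known in advance.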
4
votes
1 answer
Data store for testing data products?
Is there a recommended approach for storing processed data for testing new data products?
Basically, I'd like to have a system where a data scientist or an analyst could think of a new data product to present to users, do the data processing to…
bobfet1
4
votes
2 answers
How to deal with millions of rows of data for analysis/visualization purposes
I have data in 2 tables in SQL Server.
The first table has around 10 million rows and 8 columns.
The second table has 6 million rows and 60 columns.
I want to import those tables into a Python notebook using pandas (I am importing with "chunksize") and…
Chandrashekhar Patil