DISTINCT can be particularly helpful when exploring a new data set. In many real-world scenarios, you will generally end up writing several preliminary queries in order to figure out the best approach to answering your initial question. Looking at the unique values in each column can help identify how you might want to group or filter the data.

Write a query that returns the unique values in the year column, in chronological order.

You can use DISTINCT when performing an aggregation. You'll probably use it most commonly with the COUNT function. In this case, you should run the query below, which counts the unique values in the month column.

    SELECT COUNT(DISTINCT month) AS unique_months
    FROM tutorial.aapl_historical_stock_price

The results show that there are 12 unique values (other examples may be less obvious). That's a small enough number that you might be able to aggregate by month and interpret the results fairly easily. For example, you might follow this up by taking average trade volumes by month to get a sense of when Apple stock really moves:

    SELECT month, …

You'll notice that DISTINCT goes inside the aggregate function rather than at the beginning of the SELECT clause. Of course, you can SUM or AVG the distinct values in a column, but there are fewer practical applications for them. For MAX and MIN, you probably shouldn't ever use DISTINCT because the results will be the same as without DISTINCT, and adding DISTINCT will make your query substantially slower to return results.

It's worth noting that using DISTINCT, particularly in aggregations, can slow your queries down quite a bit. We'll cover this in greater depth in a later lesson.

Given ~300K records in the following table

    CREATE TABLE t ( … );
    CREATE INDEX ix_t ON t (deleted) WHERE deleted = 0;

and the following queries

    1) SELECT Count(id) FROM t WHERE deleted = 0;
    2) SELECT Count(DISTINCT a) FROM t WHERE deleted = 0;
    3) SELECT Count(DISTINCT b) FROM t WHERE deleted = 0;
    4) SELECT Count(DISTINCT c) FROM t WHERE deleted = 0;

The other three queries, the COUNT(DISTINCT ...) ones, take 600-900ms. How can I speed up these queries to the same order as the first one? I created the following indexes, but they didn't help:

    CREATE INDEX ix_t_a ON t (a, deleted) WHERE deleted = 0;

(and so on for the columns b and c as well)

    EXPLAIN QUERY PLAN SELECT Count(DISTINCT a) FROM t WHERE deleted = 0;

    output: SEARCH TABLE t USING INDEX ix_t (deleted=?)

As we can see above, the new index is not being used.

Update: added background info and more explanation

The deleted column is a flag to keep track of rows that are to be excluded from all queries. For the most part, such rows will be very few, but it is essential that they are not used in the counts and selects. Eventually the number of rows might grow to 3 times as much, let's say 1M. Even then, the number of deleted rows will be minimal.

The COUNT(DISTINCT column) counts are to create facets. Let's say someone searches for all rows WHERE a = 'foo'. I need to return the matching rows, and I also need to return how many DISTINCT b and c are present in those rows. So, I will do something like

    -- number of rows in the result set
    SELECT COUNT(id) FROM t WHERE deleted = 0 AND a = 'foo'
    SELECT * FROM t WHERE deleted = 0 AND a = 'foo' LIMIT 30 OFFSET 0
    SELECT COUNT(DISTINCT b) FROM t WHERE deleted = 0 AND a = 'foo'
    SELECT COUNT(DISTINCT c) FROM t WHERE deleted = 0 AND a = 'foo'

In the very first instance, since there is no WHERE clause, the above queries will be

    -- number of rows in the result set
    SELECT COUNT(id) FROM t WHERE deleted = 0
    SELECT * FROM t WHERE deleted = 0 LIMIT 30 OFFSET 0
    SELECT COUNT(DISTINCT a) FROM t WHERE deleted = 0
    SELECT COUNT(DISTINCT b) FROM t WHERE deleted = 0
    SELECT COUNT(DISTINCT c) FROM t WHERE deleted = 0

The actual columns in the table that make up the facets are about 8 or 9. So, I have to do about 8 or 9 SELECT COUNT(DISTINCT col).
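To make the faceted COUNT(DISTINCT ...) queries discussed above concrete, here is a minimal sketch. It assumes SQLite (the query-plan output quoted in the question suggests it), accessed through Python's standard `sqlite3` module. The column types, the sample rows, and the helper `facet_counts` are hypothetical, since the original CREATE TABLE statement is not shown in full. It runs the row count plus one COUNT(DISTINCT col) per facet column, first with no extra filter and then with a = 'foo', and prints the plan for one distinct count so you can check which index your SQLite version picks.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema: the real column definitions are not shown.
    CREATE TABLE t (
        id      INTEGER PRIMARY KEY,
        a       TEXT,
        b       TEXT,
        c       TEXT,
        deleted INTEGER NOT NULL DEFAULT 0
    );
    -- Partial index from the question.
    CREATE INDEX ix_t ON t (deleted) WHERE deleted = 0;
""")

# Made-up sample rows; the third one is flagged as deleted.
rows = [
    ("foo", "x", "p", 0),
    ("foo", "y", "p", 0),
    ("foo", "y", "q", 1),
    ("bar", "x", "q", 0),
]
conn.executemany(
    "INSERT INTO t (a, b, c, deleted) VALUES (?, ?, ?, ?)", rows
)

FACETS = ("a", "b", "c")  # fixed whitelist, safe to interpolate into SQL


def facet_counts(conn, extra_where="", params=()):
    """Return (row_count, {column: distinct_count}) over non-deleted rows.

    extra_where is a trusted SQL fragment with ? placeholders, e.g. "a = ?".
    """
    where = "WHERE deleted = 0"
    if extra_where:
        where += f" AND {extra_where}"
    total = conn.execute(
        f"SELECT COUNT(id) FROM t {where}", params
    ).fetchone()[0]
    distinct = {
        col: conn.execute(
            f"SELECT COUNT(DISTINCT {col}) FROM t {where}", params
        ).fetchone()[0]
        for col in FACETS
    }
    return total, distinct


# First page load: no search filter.
all_total, all_facets = facet_counts(conn)
# After someone searches for a = 'foo'.
foo_total, foo_facets = facet_counts(conn, "a = ?", ("foo",))

# The last column of each EXPLAIN QUERY PLAN row is the human-readable detail.
plan = [
    row[-1]
    for row in conn.execute(
        "EXPLAIN QUERY PLAN SELECT COUNT(DISTINCT a) FROM t WHERE deleted = 0"
    )
]

print(all_total, all_facets)
print(foo_total, foo_facets)
print(plan)
```

The plan text varies between SQLite versions, so treat the printed plan as something to inspect rather than a fixed result. Timings on four rows say nothing about the 300K-row case; the sketch only shows the shape of the queries.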