0.125.0
Release date: 2024-04-01 20:38:32
Latest dathere/qsv release: 0.134.0 (2024-09-10 20:11:27)
In this release, we focused on the 🏎️ need for even more speed 🏎️.
This was done primarily by tweaking several supporting qsv crates. `qsv-docopt` now parses command-line arguments slightly faster. `qsv-stats`, the crate behind commands like `stats`, `schema`, `tojsonl`, and `frequency`, has been further optimized for speed. `qsv-dateparser` has been updated to support new timezone handling options in `datefmt`. `qsv-sniffer` also got a speed boost.
Per the benchmark suite, `stats` is 25% faster (1.563 secs vs 2.067 secs) when computing the 13 "streaming" stats and 13% faster when computing `--everything` (17 columns of additional stats - 3.149 secs vs 3.656 secs) for the 1M row, 41 column, 520 MB sample of NYC's 311 data.
The `count` command has been refactored to utilize Polars' SQLContext, which leverages LazyFrame evaluation to automagically count even very large files in just a few seconds. Previously, `count` was already using Polars, but it mistakenly fell back to a slower counting mode. Now, it consistently delivers fast performance, even without an index. On the same benchmark suite, it takes 0.052 secs vs 0.503 secs - almost 10x faster!
As `count` is not just a top-level command, but also a helper used by several qsv commands, this gives the entire suite a nice performance boost.
Continuing on the performance front, the `excel` command now has a new short `--metadata` mode, allowing users to get a shorter version of the metadata report that only lists the workbook's top-level metadata (sheet index, sheet name, sheet type, visibility) instead of the full metadata report (which also has info like number of rows, column metadata, etc.). On the benchmark suite, the short metadata report takes all of 0.005 secs vs 11.237 secs for the 1M row xlsx version of the same NYC 311 data - more than 3 orders of magnitude faster! (It may actually be faster still, since 0.005 secs is at the limit of what hyperfine can measure.)
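The release notes do not say how the quoted speedups were derived. As a rough check, the sketch below assumes "N% faster" means the reduction in wall-clock time relative to the older timing, and "Nx faster" means the old time divided by the new time. The function names are illustrative and are not part of qsv or its benchmark suite.

```rust
/// Percentage reduction in wall-clock time when going from `old_secs` to `new_secs`.
/// Assumption: this is how an "N% faster" figure is derived.
fn time_reduction_pct(old_secs: f64, new_secs: f64) -> f64 {
    (old_secs - new_secs) / old_secs * 100.0
}

/// How many times faster the new run is: old time divided by new time.
fn speedup_factor(old_secs: f64, new_secs: f64) -> f64 {
    old_secs / new_secs
}

/// Speedup expressed in orders of magnitude (base-10 logarithm of the factor).
fn orders_of_magnitude(old_secs: f64, new_secs: f64) -> f64 {
    speedup_factor(old_secs, new_secs).log10()
}
```

Under these assumptions, the `stats` timings (2.067 secs to 1.563 secs) give a reduction of about 24.4%, and the `--everything` timings (3.656 secs to 3.149 secs) about 13.9%. The `count` timings give a factor of about 9.7, and the `excel` short metadata timings a factor of about 2,247, or roughly 3.35 orders of magnitude.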
The `datefmt` command also got some major enhancements with new timezone handling and timestamp parsing options, though at the cost of a small 15% performance penalty.
Lastly, we are excited to announce that qsv will be featured at the CSV,Conf,V8 conference in Puebla, Mexico on May 28-29. I'll be presenting a talk titled "qsv: A Blazing Fast CSV Data-Wrangling Toolkit". Hope to see you there!
Added
- `excel`: added short mode to `--metadata` option https://github.com/jqnatividad/qsv/pull/1699
- `datefmt`: added `ts-resolution` option to specify resolution to use when parsing Unix timestamps https://github.com/jqnatividad/qsv/pull/1704
- `datefmt`: added timezone handling options https://github.com/jqnatividad/qsv/pull/1706 https://github.com/jqnatividad/qsv/pull/1707 https://github.com/jqnatividad/qsv/pull/1642
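A timestamp resolution setting matters because the same integer denotes very different instants depending on its unit. The sketch below illustrates the idea with the standard library only. It is not qsv's implementation: `TsResolution` and `unix_to_system_time` are hypothetical names, and the four resolutions shown (seconds, milliseconds, microseconds, nanoseconds) are an assumption about what such an option would typically cover. Negative (pre-1970) timestamps are left out for brevity.

```rust
use std::time::{Duration, SystemTime, UNIX_EPOCH};

/// Hypothetical set of resolutions for interpreting an integer Unix timestamp.
#[derive(Clone, Copy, Debug)]
enum TsResolution {
    Sec,
    Milli,
    Micro,
    Nano,
}

/// Interprets `ts` as a count of units since the Unix epoch, where the unit
/// is chosen by `res`, and returns the corresponding point in time.
fn unix_to_system_time(ts: u64, res: TsResolution) -> SystemTime {
    let since_epoch = match res {
        TsResolution::Sec => Duration::from_secs(ts),
        TsResolution::Milli => Duration::from_millis(ts),
        TsResolution::Micro => Duration::from_micros(ts),
        TsResolution::Nano => Duration::from_nanos(ts),
    };
    UNIX_EPOCH + since_epoch
}
```

With this sketch, `unix_to_system_time(1_700_000_000, TsResolution::Sec)` and `unix_to_system_time(1_700_000_000_000, TsResolution::Milli)` yield the same instant. Reading the millisecond value as seconds would instead land about a thousand times further from the epoch, which is the kind of misparse an explicit resolution setting avoids.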
Changed
- `count`: refactored to use Polars SQLContext https://github.com/jqnatividad/qsv/commit/43a236f6a45c890d2bb6b4c43eb469bd627f82e1
- `stats`: refactored `stats_path` helper function https://github.com/jqnatividad/qsv/commit/174c30e3b87470613ff34a98617d44e477a4296a
- `apply`, `applydp`, `datefmt`, `excel`, `geocode`, `py`, `validate`: use `std::mem::take` to avoid clone https://github.com/jqnatividad/qsv/commit/1fd187f23262b51e0f431664895d49fd930d011a https://github.com/jqnatividad/qsv/commit/8402d3a8063ef161fc9ec68dd7f0f0601802d21d https://github.com/jqnatividad/qsv/commit/849615775505a25888a50b255ba0d544e878aeaf
- `excel`: optimized workbook opening operation https://github.com/jqnatividad/qsv/commit/67f662eba501e543ec44e5daf5eb175f8a8ae7b1
- build(deps): bump flexi_logger from 0.27.4 to 0.28.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1673
- build(deps): bump polars from 0.38.2 to 0.38.3 by @dependabot in https://github.com/jqnatividad/qsv/pull/1674
- build(deps): bump uuid from 1.7.0 to 1.8.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1675
- build(deps): bump hashbrown from 0.14.3 to 0.14.4 by @dependabot in https://github.com/jqnatividad/qsv/pull/1680
- build(deps): bump reqwest from 0.11.26 to 0.11.27 by @dependabot in https://github.com/jqnatividad/qsv/pull/1679
- build(deps): bump bytes from 1.5.0 to 1.6.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1685
- build(deps): bump regex from 1.10.3 to 1.10.4 by @dependabot in https://github.com/jqnatividad/qsv/pull/1686
- build(deps): bump indexmap from 2.2.5 to 2.2.6 by @dependabot in https://github.com/jqnatividad/qsv/pull/1687
- build(deps): bump rayon from 1.9.0 to 1.10.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1688
- build(deps): bump qsv_docopt from 1.6.0 to 1.7.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1691
- build(deps): bump reqwest from 0.12.1 to 0.12.2 by @dependabot in https://github.com/jqnatividad/qsv/pull/1693
- build(deps): bump serde_json from 1.0.114 to 1.0.115 by @dependabot in https://github.com/jqnatividad/qsv/pull/1694
- build(deps): bump itoa from 1.0.10 to 1.0.11 by @dependabot in https://github.com/jqnatividad/qsv/pull/1695
- build(deps): bump actions/setup-python from 5.0.0 to 5.1.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1700
- build(deps): bump rust_decimal from 1.34.3 to 1.35.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1701
- build(deps): bump chrono from 0.4.35 to 0.4.37 by @dependabot in https://github.com/jqnatividad/qsv/pull/1702
- build(deps): bump tokio from 1.36.0 to 1.37.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1703
- build(deps): bump qsv-sniffer from 0.10.2 to 0.10.3 by @dependabot in https://github.com/jqnatividad/qsv/pull/1708
- build(deps): bump titlecase from 2.2.1 to 3.0.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1709
- build(deps): bump qsv-stats from 0.13.0 to 0.14.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1710
- applied select clippy recommendations
- updated several indirect dependencies
- added several benchmarks for new/changed commands
- bumped MSRV to 1.77.1
- use `#[cfg(debug_assertions)]` conditional compilation to avoid compiling debug code in release mode
- use patched forks of `jsonschema`, `cached`, `self_update` and `localzone` crates to avoid old dependencies which were causing dependency bloat
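Two of the refactorings listed above, `std::mem::take` in place of `clone` and `#[cfg(debug_assertions)]` gating, are general Rust techniques. The sketch below shows each in isolation. It is not taken from the qsv codebase: `Record`, `take_field` and `process` are hypothetical names used only for illustration.

```rust
use std::mem;

/// Hypothetical record type, used only for illustration.
struct Record {
    field: String,
}

/// Moves the string out of the record and leaves an empty `String` behind.
/// Unlike `record.field.clone()`, this neither allocates nor copies the bytes.
fn take_field(record: &mut Record) -> String {
    mem::take(&mut record.field)
}

/// Returns the number of rows it was given.
/// The logging statement is compiled only when debug assertions are enabled
/// (the default for `cargo build`), so release builds do not contain it.
fn process(rows: &[&str]) -> usize {
    #[cfg(debug_assertions)]
    eprintln!("processing {} rows", rows.len());

    rows.len()
}
```

`mem::take` only fits when the original value is no longer needed afterwards, since the source is reset to its `Default` value. Where the caller still reads the field later, a clone or a borrow remains the right choice.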
Fixed
- `count`: fixed `polars_count_input` helper, as it was always falling back to "slow" counting mode https://github.com/jqnatividad/qsv/commit/3484c89080d41d2e39457c918a893189aee64753
Full Changelog: https://github.com/jqnatividad/qsv/compare/0.124.1...0.125.0
1. qsv-0.125.0-aarch64-apple-darwin.zip 119.92MB
2. qsv-0.125.0-aarch64-unknown-linux-gnu.zip 14.72MB
3. qsv-0.125.0-geocode-index.bincode 14.12MB
4. qsv-0.125.0-geocode-index.bincode.cities15000 14.12MB
5. qsv-0.125.0-geocode-index.bincode.cities15000.sz 5.58MB
6. qsv-0.125.0-i686-pc-windows-msvc.zip 14.19MB
7. qsv-0.125.0-i686-unknown-linux-gnu.zip 15.21MB
8. qsv-0.125.0-x86_64-apple-darwin.zip 133.7MB
9. qsv-0.125.0-x86_64-pc-windows-gnu.zip 31.45MB
10. qsv-0.125.0-x86_64-pc-windows-msvc.zip 136.72MB
11. qsv-0.125.0-x86_64-unknown-linux-gnu.zip 190.27MB
12. qsv-0.125.0-x86_64-unknown-linux-musl.zip 57.34MB
13. qsv-0.125.0.msi 32.55MB