[ { "title": "Huggingface/datasets", "url": "https://alongwy.top/opensource/huggingface-datasets/", "body": "🤗 Datasets is a lightweight library providing two main features:\none-line dataloaders for many public datasets: \n\none liners to download and pre-process any of the number of datasets major public datasets (in 467 languages and dialects!) provided on the HuggingFace Datasets Hub. With a simple command like squad_dataset = load_dataset("squad"), get any of these datasets ready to use in a dataloader for training/evaluating a ML model (Numpy/Pandas/PyTorch/TensorFlow/JAX),\nefficient data pre-processing: simple, fast and reproducible data pre-processing for the above public datasets as well as your own local datasets in CSV/JSON/text. With simple commands like tokenized_dataset = dataset.map(tokenize_example), efficiently prepare the dataset for inspection and ML model evaluation and training.\n\n" } , { "title": "Language Technology Platform", "url": "https://alongwy.top/projects/language-technology-platform/", "body": "Intro\nAn open-source neural language technology platform supporting six fundamental Chinese NLP tasks:\n\nlexical analysis (Chinese word segmentation, part-of-speech tagging, and named entity recognition)\nsyntactic parsing (dependency parsing)\nsemantic parsing (semantic dependency parsing and semantic role labeling). \n\nQuickstart\nfrom ltp import LTP\n\nltp = LTP() # 默认加载 Small 模型\nseg, hidden = ltp.seg(["他叫汤姆去拿外衣。"])\npos = ltp.pos(hidden)\nner = ltp.ner(hidden)\nsrl = ltp.srl(hidden)\ndep = ltp.dep(hidden)\nsdp = ltp.sdp(hidden)\n\nPerformance\nModelCWSPOSNERSRLDEPSDPSpeed(Sents/S)\nLTP 4.0 (Base)98.7098.5095.480.6089.5075.2039.12\nLTP 4.0 (Base1)99.2298.7396.3979.2889.5776.57--.--\nLTP 4.0 (Base2)99.1898.6995.9779.4990.1976.62--.--\nLTP 4.0 (Small)98.4098.2094.3078.4088.3074.7043.13\nLTP 4.0 (Tiny)96.8097.1091.6070.9083.8070.1053.22\n\nCite\n@article{che2020n,\n title={N-LTP: A Open-source Neural Chinese Language Technology Platform with Pretrained Models},\n author={Che, Wanxiang and Feng, Yunlong and Qin, Libo and Liu, Ting},\n journal={arXiv preprint arXiv:2009.11616},\n year={2020}\n}\n\n" } , { "title": "NotFeed: A RSS Reader on GitHub", "url": "https://alongwy.top/projects/notfeed-a-rss-reader-on-github/", "body": "NotCraft::NotFeed\nAn RSS reader running entirely from your GitHub repo.\n\nFree hosting on GitHub Pages. No ads. No third party tracking.\nNo need for backend. Content updates via GitHub Actions.\nCustomizable layouts and styles via templating and theming API. Just bring your HTML and CSS.\nFree and open source. No third-party tracking.\n\nHow to use it?\nGithub Pages\n\n\nUse the NotFeed-Template generate your own repository.\n\n\nIn the repository root, open Config.toml file, click the "Pencil (Edit this file)" button to edit.\n\n\nRemove # to uncommend the cacheUrl property, replace <github_username> with your GitHub username, and\nreplace <repo> with your GitHub repo name.\n\n\nIn the sources, update the items to the sources you want to follow. 
" } , { "title": "Language Technology Platform", "url": "https://alongwy.top/projects/language-technology-platform/", "body": "Intro\nAn open-source neural language technology platform supporting six fundamental Chinese NLP tasks:\n\nlexical analysis (Chinese word segmentation, part-of-speech tagging, and named entity recognition)\nsyntactic parsing (dependency parsing)\nsemantic parsing (semantic dependency parsing and semantic role labeling)\n\nQuickstart\nfrom ltp import LTP\n\nltp = LTP() # loads the Small model by default\nseg, hidden = ltp.seg(["他叫汤姆去拿外衣。"]) # word segmentation ("He told Tom to go get the coat.")\npos = ltp.pos(hidden) # part-of-speech tagging\nner = ltp.ner(hidden) # named entity recognition\nsrl = ltp.srl(hidden) # semantic role labeling\ndep = ltp.dep(hidden) # dependency parsing\nsdp = ltp.sdp(hidden) # semantic dependency parsing\n\nPerformance\nModel | CWS | POS | NER | SRL | DEP | SDP | Speed (Sents/s)\nLTP 4.0 (Base) | 98.70 | 98.50 | 95.40 | 80.60 | 89.50 | 75.20 | 39.12\nLTP 4.0 (Base1) | 99.22 | 98.73 | 96.39 | 79.28 | 89.57 | 76.57 | --.--\nLTP 4.0 (Base2) | 99.18 | 98.69 | 95.97 | 79.49 | 90.19 | 76.62 | --.--\nLTP 4.0 (Small) | 98.40 | 98.20 | 94.30 | 78.40 | 88.30 | 74.70 | 43.13\nLTP 4.0 (Tiny) | 96.80 | 97.10 | 91.60 | 70.90 | 83.80 | 70.10 | 53.22\n\nCite\n@article{che2020n,\n title={N-LTP: An Open-source Neural Chinese Language Technology Platform with Pretrained Models},\n author={Che, Wanxiang and Feng, Yunlong and Qin, Libo and Liu, Ting},\n journal={arXiv preprint arXiv:2009.11616},\n year={2020}\n}\n\n" } ,
 { "title": "NotFeed: An RSS Reader on GitHub", "url": "https://alongwy.top/projects/notfeed-a-rss-reader-on-github/", "body": "NotCraft::NotFeed\nAn RSS reader running entirely from your GitHub repo.\n\nFree hosting on GitHub Pages. No ads. No third-party tracking.\nNo backend needed. Content updates via GitHub Actions.\nCustomizable layouts and styles via the templating and theming API. Just bring your HTML and CSS.\nFree and open source.\n\nHow to use it?\nGitHub Pages\n\n\nUse the NotFeed-Template to generate your own repository.\n\n\nIn the repository root, open the Config.toml file and click the "Pencil (Edit this file)" button to edit.\n\n\nRemove the # to uncomment the cache_url property, replace <github_username> with your GitHub username, and replace <repo> with your GitHub repo name.\n\n\nIn sources, update the items to the sources you want to follow. The final content of the file should look similar to this:\n# Config.toml\n\nsite_title = "ArxivDaily"\ncache_max_days = 7\nsources = [\n "https://export.arxiv.org/rss/cs.CL"\n]\n# proxy = "http://127.0.0.1:7890" ## Optional: default is None\n# statics_dir = "statics" ## Optional: default is "statics"\n# templates_dir = "includes" ## Optional: default is "includes"\n# cache_url = "https://GITHUB_USERNAME.github.io/REPO_NAME/cache.json"\n# minify = true\n# [scripts]\n# highlight = "scripts/highlight.rhai"\n\n\n\nScroll to the bottom of the page and click the "Commit changes" button.\n\n\nOnce the rebuild finishes, your feed will be available at https://<github_username>.github.io/<repo>\n\n\nLocalhost\n\n\nClone the NotFeed-Template repository.\n\n\nEdit the Config.toml file.\n\n\nRun notfeed\n\nbuild: notfeed build\nserve: notfeed serve --addr 127.0.0.1 --port 8080 or simply notfeed serve\n\n\n\nThanks\n\nInspired by osmos::feed\n\n" } ,
 { "title": "gpustat: a Rust version of gpustat", "url": "https://alongwy.top/projects/gpustat-a-rust-version-of-gpustat/", "body": "gpustat\n\n\nA Rust version of gpustat.\nJust less than nvidia-smi?\nUsage\n$ gpustat\nOptions:\n\n--color : Force colored output (even when stdout is not a tty)\n--no-color : Suppress colored output\n-u, --show-user : Display username of the process owner\n-c, --show-cmd : Display the process name\n-f, --show-full-cmd : Display full command and CPU stats of running processes\n-p, --show-pid : Display PID of the process\n-F, --show-fan : Display GPU fan speed\n-e, --show-codec : Display encoder and/or decoder utilization\n-a, --show-all : Display all GPU properties above\n\nQuick Installation\nInstall from Cargo:\ncargo install gpustat\n\nDefault display\n\n[0] | A100-PCIE-40GB | 65'C | 75 % | 33409 / 40536 MB | along(33407M)\n\n\n[0]: GPU index (starts from 0) as PCI_BUS_ID\nA100-PCIE-40GB: GPU name\n65'C: Temperature\n75 %: Utilization\n33409 / 40536 MB: GPU memory usage\nalong(33407M): Username of the owner of the processes running on the GPU (and their memory usage)\n\nLicense\nGPL v2 License\n" } ,
 { "title": "N-LTP: An Open-source Neural Chinese Language Technology Platform with Pretrained Models", "url": "https://alongwy.top/publications/n-ltp-a-open-source-neural-chinese-language-technology-platform-with-pretrained-models/", "body": "An open-source neural language technology platform supporting six fundamental Chinese NLP tasks:\n\nlexical analysis (Chinese word segmentation, part-of-speech tagging, and named entity recognition)\nsyntactic parsing (dependency parsing)\nsemantic parsing (semantic dependency parsing and semantic role labeling)\n\nUnlike existing state-of-the-art toolkits such as Stanza, which adopt an independent model for each task, N-LTP adopts a multi-task framework built on a shared pre-trained model, which has the advantage of capturing knowledge shared across related Chinese tasks.\nIn addition, knowledge distillation, in which the single-task models teach the multi-task model, is introduced to encourage the multi-task model to surpass its single-task teachers.\nFinally, we provide a collection of easy-to-use APIs and a visualization tool, making it easier for users to run the toolkit and inspect the processing results directly. To the best of our knowledge, this is the first toolkit to support all six fundamental Chinese NLP tasks.
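\nThe shared-encoder multi-task idea can be sketched in a few lines (illustrative only: the module layout, the hfl/chinese-electra-180g-base-discriminator checkpoint, and the per-task label counts are assumptions, not N-LTP's actual implementation; the distillation step is omitted):\nimport torch.nn as nn\nfrom transformers import AutoModel\n\nclass MultiTaskModel(nn.Module):\n    def __init__(self, encoder_name, num_labels_per_task):\n        super().__init__()\n        # A single pre-trained encoder shared by all six tasks.\n        self.encoder = AutoModel.from_pretrained(encoder_name)\n        hidden = self.encoder.config.hidden_size\n        # One lightweight head per task on top of the shared representation.\n        self.heads = nn.ModuleDict({task: nn.Linear(hidden, n) for task, n in num_labels_per_task.items()})\n\n    def forward(self, task, input_ids, attention_mask):\n        shared = self.encoder(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state\n        return self.heads[task](shared)  # per-token logits for the requested task\n\n# Label counts here are placeholders, not the real tag-set sizes.\nmodel = MultiTaskModel("hfl/chinese-electra-180g-base-discriminator", {"cws": 4, "pos": 27, "ner": 13})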
\n" } , { "title": "HIT-SCIR at MRP 2020: Transition-based Parser and Iterative Inference Parser", "url": "https://alongwy.top/publications/hit-scir-at-mrp-2020-transition-based-parser-and-iterative-inference-parser/", "body": "This paper describes our submission system (HIT-SCIR) for the CoNLL 2020 shared task: Cross-Framework and Cross-Lingual Meaning Representation Parsing. \nThe task includes five frameworks for graph-based meaning representations, i.e., UCCA, EDS, PTG, AMR, and DRG. \nOur solution consists of two sub-systems: \n+ transition-based parser for Flavor (1) frameworks (UCCA, EDS, PTG)\n+ iterative inference parser for Flavor (2) frameworks (DRG, AMR). \nIn the final evaluation, our system is ranked 3rd among the seven team both in Cross-Framework Track and Cross-Lingual Track, with the macro-averaged MRP F1 score of 0.81/0.69.\n" } ]