Clelia Astra Bertelli

as-cle-bert

https://www.clelia.dev

AI & ML interests

Biology + Artificial Intelligence = ❤️ | AI for sustainable development, sustainable development for AI | Researching Machine Learning Enhancement | I love automation for everyday things | Blogger | Open Source

Organizations

Posts 47

Let's pipe some 𝗱𝗮𝘁𝗮 𝗳𝗿𝗼𝗺 𝘁𝗵𝗲 𝘄𝗲𝗯 into our vector database, shall we?🤠

With 𝐢𝐧𝐠𝐞𝐬𝐭-𝐚𝐧𝐲𝐭𝐡𝐢𝐧𝐠 𝐯𝟏.𝟑.𝟎 (https://github.com/AstraBert/ingest-anything) you can now scrape content simply starting from URLs, extract the text from it, chunk it and put it into your favorite LlamaIndex-compatible database!🕸️

You can do it thanks to 𝗰𝗿𝗮𝘄𝗹𝗲𝗲 by Apify, an open-source crawling library for Python and JavaScript that handles all the data flow from the web. ingest-anything then combines it with 𝗕𝗲𝗮𝘂𝘁𝗶𝗳𝘂𝗹𝗦𝗼𝘂𝗽, 𝗣𝗱𝗳𝗜𝘁𝗗𝗼𝘄𝗻 and 𝗣𝘆𝗠𝘂𝗣𝗱𝗳 to scrape HTML files, convert them to PDF and extract the text - hassle-free!😸
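To make the "extract the text, chunk it" part of that flow concrete, here is a minimal sketch using only the Python standard library. It is not the ingest-anything API and it does not use crawlee, BeautifulSoup, PdfItDown, PyMuPDF or Chonkie: `TextExtractor`, `extract_text` and `chunk_words` are hypothetical names made up for illustration, the HTML is a hard-coded string rather than a crawled page, the PDF conversion step is skipped entirely, and the fixed-size word window with overlap is just a simple stand-in for a real chunker.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside <script> or <style>
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def extract_text(html: str) -> str:
    """Hypothetical helper: return the visible text of an HTML string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def chunk_words(text: str, chunk_size: int = 8, overlap: int = 2) -> list[str]:
    """Hypothetical helper: split text into overlapping windows of words."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


if __name__ == "__main__":
    # Stand-in for a page that a crawler would have fetched.
    page = "<html><body><h1>Hello</h1><p>one two three four five</p></body></html>"
    for chunk in chunk_words(extract_text(page), chunk_size=4, overlap=1):
        print(chunk)
```

In a real pipeline each chunk would then be embedded and written to the vector database; that step is left out here because it depends on the database and embedding model you pick.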

Check the attached code snippet if you're curious to know how to get started🎬

PS: Don't tell anybody, but this release also has another gem... It supports OpenAI models for agentic chunking, following the new releases of Chonkie🦛✨

If you don't want to miss out on the new features, leave us a little star on GitHub ➡️ https://github.com/AstraBert/ingest-anything
And join our Discord community! ➡️ https://discord.gg/kDqHNjks

Articles 10

Why we (don't) need export control

Collections 2

Spaces 24

Pdfitdown

Convert (almost) everything to PDF!

PapersChat

Chatting with scientific papers made easy

Pokemon Bot

A bot that knows a lot about Pokémon

What A Git Year

Showcase your GitHub achievements in the past year!

Bsky Feedllama Demo

Demo for BlueSky FeedLlama with Streamlit and Cohere

BioMedicalPapersBot

Bot that scrapes Pubmed

Models 11

as-cle-bert/bcus-class-segformer

Image Classification • 24.2M • Updated Apr 29, 2024 • 13

as-cle-bert/tinyllama-essay-scorer

Text Generation • 1B • Updated Apr 29, 2024 • 24 • 2

as-cle-bert/tiny-fungal-llama

Text Generation • 1B • Updated Apr 15, 2024 • 20 • 1

as-cle-bert/carbon-footprint-prediction

Tabular Regression • Updated Apr 14, 2024 • 1

as-cle-bert/saccharomyces-pythia-v1

Text Generation • 0.2B • Updated Apr 12, 2024 • 14

as-cle-bert/resistBERT

Text Classification • 0.4B • Updated Apr 2, 2024 • 29 • 1

as-cle-bert/segformer-v1-breastcancer

Image Segmentation • 3.72M • Updated Apr 1, 2024 • 1.51k

as-cle-bert/segformer-breastcancer

Image Segmentation • 3.72M • Updated Mar 31, 2024 • 28

as-cle-bert/beit-banana-diseases

Image Classification • 85.8M • Updated Mar 31, 2024 • 16 • 1

as-cle-bert/bus-deit

Image Classification • 21.7M • Updated Mar 29, 2024 • 16 • 1

Datasets 15

as-cle-bert/DebateLLMs

Viewer • Updated Dec 30, 2024 • 20 • 32 • 4

as-cle-bert/architecture_vs_normal_image_prompts

Viewer • Updated Nov 8, 2024 • 6k • 25 • 2

as-cle-bert/speckledata

Viewer • Updated Jun 3, 2024 • 2.43k • 26

as-cle-bert/saccaromyces-cerevisiae-base

Viewer • Updated Apr 16, 2024 • 368 • 69 • 1

as-cle-bert/AMR-Gene-Families

Viewer • Updated Apr 1, 2024 • 1.5k • 42 • 1

as-cle-bert/scerevisiae-proteins-reduced

Viewer • Updated Apr 1, 2024 • 600 • 60

as-cle-bert/plastic-enzymes

Viewer • Updated Apr 1, 2024 • 1.64k • 53 • 1

as-cle-bert/scerevisiae-transcripts-biotypes

Viewer • Updated Mar 31, 2024 • 6.72k • 53 • 1

as-cle-bert/breastcancer-semantic-segmentation

Viewer • Updated Mar 31, 2024 • 40 • 34

as-cle-bert/banana-disease-classification

Viewer • Updated Mar 31, 2024 • 777 • 82 • 2
