New Fresh Open Source Object Storage SNIA SDC 2016


OpenIO, Jean-François Smigielski
SNIA SDC 2016, Santa Clara, CA
@OpenIO: Lille, Tokyo, San Francisco, Montreal, Madrid
[Timeline 2006-2015: idea, development, open source, 1 PB, 60M users, fork, OpenIO, 15+ PB]

Achievements
● E-mails + consumer cloud for a major French ISP
● Strong SLAs (latency, availability)
● 15+ PiB, 60M end users, 50 Gib/s IN+OUT

#idea

2006: the painful days
● Too much data, growth acceleration
● Silos, sharding, migrations, too many operators
● TCO too high!
● Large stacks of interfaces become I/O-unfriendly
Powering real apps: the necessity of an application-aware solution with a pragmatic Software Defined approach.

"Double-humped" data: mixed data profiles
● Most of the capacity is taken by large, cold files
● Most of the I/O ops hit small, hot files
E.g. a mailbox index vs. its contents: colocating them seems a bad idea.
[Chart: I/O ops and GiB used as a function of content size]

Grow when necessary, not big at day 0
● Start small: no big day-0 investment, just-in-time (JIT) investment appreciated
● Recycle HW without limit, keep the HW up to date
● Uninterrupted growth, uninterrupted deliveries
[Chart: TOTAL vs. USED capacity]
The problem with JIT investment: HW decommissioning, heterogeneity, new vendor deals, etc. Hence the need for versatility and adaptation!

Real humans served, real humans managed
● Strong end-user orientation: bandwidth IN+OUT follows the observed daylight rhythm of users online (e.g. a 6PM-8PM peak vs. 12AM)
● Goal of flexibility acceptable for admins

Major ISP use case
● E-mails, videos, archives
● 10⁸ users per ISP, 10⁶ contents per user
● < 100 ms to download 1 MiB, < 20 ms latency
Data life cycle
● Recent data is used, large data is ignored, plus buzz effects
→ Room for caches & multiple tiers

#Architecture

Go for an Elastic Storage
● Split hot/cold, offload cold to independent low-cost drives
● Software Defined glue

The idea (to avoid): consistent hashing algorithms!
● Scale-up → rebalancing
● Decommissioning → rebalancing
Both happen every week… Locations not fixed? Choose + remember.

Conscience & directories
● The Conscience discovers the services, qualifies them, load-balances on them, and distributes recent snapshots of your platform
● The directories remember the polled locations for 10⁸ users × 10⁶ contents per user
● Divided into trivial problems by naming indirection: 1 directory of each user's services, 1 directory of contents per user

The directory of services
● Layered as a hash table: a highly static 1st level, a 2nd level sharded into 64k slots
● Replicated (replicas + backlinks)
● SQLite-powered: 1 file per shard replica
● Sharded yet small & non-volatile → cached

The containers
● Object view: versions, hardlinks, properties
● Map<name, Set<@>>: efficient listing
● Notifications → asynchronous management
● Replicated (replicas + backlinks)
● SQLite-powered: 1 file per replica

The SDK's model
[Diagrams: Client App. → oio-sds client (data-client + dir-client) talking HTTP / HTTP+JSON to proxy-data and proxy-dir; erasure coding, caching, replication and load-balancing sit in the client path where latency matters, behind the proxies otherwise.]

Connectors
● OSS: SDKs for Python, C, Java; interfaces: S3, Swift
● Specific editions: e-mail (Cyrus, Dovecot, Zimbra), media (tailor-made HTTP), filesystem (FUSE, NFS, ...)

#Tiering #Flexibility

Easy tiering: pointers everywhere (directory of users, directories of contents)!
Storage Policy = Storage Pool ("where") + Data Protection ("how")
● Storage pools ("where"): tag-based with fallbacks, fine load-balancing, distance constraints, geo-distribution
● Data protection ("how"): erasure code? replication? plain?
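To make the pairing concrete, here is a minimal Python sketch of "Storage Policy = Storage Pool + Data Protection": a tag-based pool with a fallback choosing the "where", and a protection descriptor carrying the "how". All class, field and tag names are assumptions made for illustration, not OpenIO SDS configuration syntax.

    # Sketch of "Storage Policy = Storage Pool + Data Protection" (illustrative only).
    import random
    from dataclasses import dataclass

    @dataclass
    class StoragePool:                        # the "where": tag-based selection, with fallback
        name: str
        required_tags: dict                   # e.g. {"tier": "sata", "site": "lille"}
        fallback: "StoragePool" = None

        def choose(self, services, count):
            """Pick `count` services whose tags match, falling back when too few do."""
            matching = [s for s in services
                        if all(s["tags"].get(k) == v for k, v in self.required_tags.items())]
            if len(matching) < count and self.fallback is not None:
                return self.fallback.choose(services, count)
            return random.sample(matching, count)   # crude stand-in for real load-balancing

    @dataclass
    class DataProtection:                     # the "how": plain, replication or erasure coding
        kind: str                             # "plain" | "replication" | "ec"
        copies: int = 1                       # replicas to write, or fragments for "ec"

    @dataclass
    class StoragePolicy:
        pool: StoragePool
        protection: DataProtection

        def place(self, services):
            return self.pool.choose(services, self.protection.copies)

    # Three-way replication on low-cost SATA drives, falling back to any drive at all.
    any_pool = StoragePool("any", {})
    policy = StoragePolicy(StoragePool("sata", {"tier": "sata"}, fallback=any_pool),
                           DataProtection("replication", copies=3))
    services = [{"addr": "10.0.0.%d" % i, "tags": {"tier": "sata"}} for i in range(1, 7)]
    print(policy.place(services))             # three distinct services from the pool

The point of the split is that the two halves vary independently: the same pool can back several protections, and a policy change only touches the pointer stored in the directories.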
Storage policies in practice ("when")
● Set upon an upload
● Configuration per content > container > account
● Dynamic storage policy, managed asynchronously

Tiering rules ("why", still under active development)
● Filter on metadata (Mime-Type, size, ctime, mtime, ...); a sketch follows at the end of this section
● Possible partitions: platform (high-end, low-end), user sets (gold, regular)
● Immediate benefits: all are QoS elements, and all allow cost optimisations!

#hybrid #cloud

Why public tiers?
● TCO is still too high for ultra-cold data
● An alternative to tape archiving for the ultra-cold tier

Public tiers: how?
● A dedicated storage policy
● An embedded connector
● Asynchronism to cope with limited bandwidth
[Diagram: Client App. → oio-sds client (data-client + dir-client) → proxy-data / proxy-dir → public tier]
First partner: Backblaze B2

#serverless #storage

Could we drop servers? Still...
● Too much HW complexity
● Too much sysadmin required
● A technical "lasagna" with bugs and latencies on each layer

The Kinetic opportunity
● TCP/IP/Ethernet-connected drives
● Direct access to data, no host required
● A sorted map of <Key, Value>
The perfect tier!

Same vision!
● Sleek protocol, no host required
● Direct access to the data (when it matters), proxied access (when enough); a sketch of the sorted <Key, Value> model closes this section
[Diagram: Client App. → oio-sds client (data-client + dir-client) and proxy-dir in front of a grid of Kinetic drives]
Meet the Kinetic Open Storage Group and OpenIO at the Plugfest, Sonoma Room, 09/20.

Apps need more than just storage
● Processing colocated with the data
● Metadata handling
● Full-text search!
We call this Grid for Apps … out of scope of SDC.

http://openio.io
#scalable #opensource #objectstorage #tiering #serverless #hybrid #cloud
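As referenced in the tiering slide above, a tiering rule is essentially a metadata filter that selects a storage policy. The Python sketch below shows that idea under assumed rule, field and policy names; it is not OpenIO's actual rule syntax, which was still under active development at the time of the talk.

    # Hypothetical metadata-driven tiering rules (names and format are assumptions).
    import time

    # Each rule pairs a predicate on the content's metadata with the policy to apply.
    TIERING_RULES = [
        # Small objects of a "hot" kind (think mailbox indexes): keep them on fast storage.
        (lambda md: md["size"] < 1 << 20 and md["mime_type"] == "application/octet-stream",
         "THREECOPIES-FAST"),
        # Anything untouched for 90 days: erasure-code it on the cold tier.
        (lambda md: time.time() - md["mtime"] > 90 * 86400,
         "EC-COLD"),
    ]
    DEFAULT_POLICY = "THREECOPIES"

    def pick_policy(metadata):
        """Return the policy name of the first matching rule, or the default policy."""
        for predicate, policy in TIERING_RULES:
            if predicate(metadata):
                return policy
        return DEFAULT_POLICY

    # A 300 MiB video last modified six months ago lands on the cold tier.
    print(pick_policy({"size": 300 << 20,
                       "mime_type": "video/mp4",
                       "mtime": time.time() - 180 * 86400}))    # -> EC-COLD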
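Finally, the Kinetic slide describes each drive as an Ethernet-connected, sorted <Key, Value> map with no host in the data path. The toy class below imitates that interface in memory to show why sorted keys matter for chunk storage; the class and method names are illustrative assumptions, not the Kinetic protocol's real API.

    # Toy stand-in for a Kinetic-style drive: a network-addressed, sorted <Key, Value> map.
    class FakeKineticDrive:
        def __init__(self):
            self._kv = {}

        def put(self, key: bytes, value: bytes):
            self._kv[key] = value

        def get(self, key: bytes) -> bytes:
            return self._kv[key]

        def get_key_range(self, start: bytes, end: bytes):
            # Sorted keys are what make range scans cheap on the drive itself,
            # e.g. listing every chunk of one object without any host server involved.
            return sorted(k for k in self._kv if start <= k <= end)

    drive = FakeKineticDrive()
    # Store an object's chunks under ordered keys: <container>/<object>/<chunk index>.
    for i, chunk in enumerate([b"hello ", b"kinetic ", b"world"]):
        drive.put(b"mails/msg-42/%04d" % i, chunk)

    # Reassemble the object with one range scan over the key space.
    keys = drive.get_key_range(b"mails/msg-42/", b"mails/msg-42/\xff")
    print(b"".join(drive.get(k) for k in keys))    # -> b'hello kinetic world'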
