#+Title: Twitter
#+Author: fcuny@twitter.com

* People
** Cory
*** TODO Get up to speed on alerts / viz
*** TODO Work with Philip to take the work / knowledge on the load test cluster
*** TODO Should I set up a 1:1 with him?
*** TODO I don't want to split ownership
** Mahak
*** DONE Complete feedback
*** TODO Should we do a 'one day cleanup' where we go through alerts / warnings and prune / fix?
*** TODO Running custdevel is getting more and more expensive. What if we were to limit the size of the streams in there?
*** TODO We keep adding features (queue model, placement), but this won't be stable for months.
    How do we justify pushing back priority for life cycle? This adds a
    clear benefit right now to the stability of the system.
*** TODO We don't have the expertise we need on our system.
    For months I've been asking what the consequence of having many
    partitions is. The answer is either "it's fine" or "we don't know". In
    the last few weeks, we've seen issues because of the number of
    partitions:
    + IM with zookeeper
    + moving dataproducts

    We need to stop working on long-term solutions that might give us
    benefits, but we're not sure. We need to spend time understanding our
    current system.

    Without reporting we are blind. We have no idea how many resources are
    used, necessary, wasted, etc.

* Projects
** OS 7 Migration
   + update for an aggressive [[https://docs.google.com/document/d/1bv_tGtB2mNgaA5ToQLRseY0lzE2vlALTiM5NraQtSkE/edit#][timeline]].
   + shared services are in progress ([[https://docs.google.com/document/d/1TVIIgc1mfvghj-cFUX0iINh6SgF1OmRSAvK_b4UjJKU/edit][doc]])
   + progress for our various services, run ~[[file+emacs:../bin/tw-os7-report][tw-os7-report]]~.

* Tasks
** TODO Update Mesos ticket regarding zombie shards
** TODO Review Dan's doc for zookeeper tasks, and create tickets
** TODO Add more capacity to data product WP cluster if we don't get the new hosts
   SCHEDULED: <2017-02-17 Fri>

* Notes
** Manhattan
** Hybrid Mesos for messaging services